# CryoDRGN User Guide

This page contains a guide to the **cryoDRGN** 🐉 ❄️ open-source software package, including in-depth tutorials for training and analyzing cryoDRGN models.

## Quick Start

CryoDRGN can be installed with `pip`, and we recommend installing `cryodrgn` in a separate anaconda environment.

<pre class="language-bash"><code class="lang-bash"><strong>$ conda create --name cryodrgn-env python=3.13
</strong>$ conda activate cryodrgn-env
(cryodrgn-env) $ pip install cryodrgn
</code></pre>

All cryoDRGN commands are accessed through the `cryodrgn` and `cryodrgn_utils` executables. Use the `-h` flag to display all available subcommands, and `cryodrgn <command> -h` to see the parameters for each command.

```bash
(cryodrgn-env) $ cryodrgn -h
(cryodrgn-env) $ cryodrgn_utils -h
```

See [installation](https://ez-lab.gitbook.io/cryodrgn/installation "mention") for more details and advanced installation instructions.

## Recent Releases

See [news-and-release-notes](https://ez-lab.gitbook.io/cryodrgn/cryodrgn-user-guide/news-and-release-notes "mention") for more details!

#### Updates in Version 4.2.x ***(last release: Feb. 2026)***

* **\[NEW]** `cryodrgn abinit` command for cryoDRGN-AI *ab initio* reconstruction
  * cryoDRGN2 *ab initio* commands `abinit_homo` and `abinit_het` are now deprecated and renamed `abinit_homo_old` and `abinit_het_old`
* adding support for Python 3.13 and PyTorch up to 2.9
* more memory-efficient *ab initio* reconstruction

#### Updates in Version 3.5.x  ***(last release: Nov. 2025)***

* 1-indexing of output volumes and epochs, replacing the previous 0-indexing
* **\[NEW]** volume reconstruction using an autodecoder with `cryodrgn train_dec`  *(beta)*
* **\[NEW]** `cryodrgn parse_relion` for parsing RELION5 3D tomo files to the cryoDRGN 2D input format
* improved landscape analysis using Leiden clustering
* adding support for Python 3.12, deprecating support for Python 3.9
* **\[NEW]** consolidated `cryodrgn parse_star` command (merging `parse_pose_star` and `parse_ctf_star`)
* `analyze` is now run automatically on the final epoch once model training is complete
* faster backprojection and downsampling; faster landscape analysis with `--multigpu`

#### Updates in Version 3.x  *(initial release: Sep. 2023)*

The official release of [cryoDRGN-ET](https://www.biorxiv.org/content/10.1101/2023.08.18.553799v1) for heterogeneous subtomogram analysis.

* **\[NEW]** Heterogeneous reconstruction of subtomograms. See documentation [on gitbook](https://ez-lab.gitbook.io/cryodrgn/)
* Updated `cryodrgn backproject_voxel` for voxel-based homogeneous reconstruction
* Major refactor of dataset loading for handling large datasets

## Background

CryoDRGN is a neural network-based method for heterogeneous reconstruction. Instead of *discrete* methods like 3D classification that produce an ensemble of K density maps, cryoDRGN performs heterogeneous reconstruction by learning a *continuous distribution* of density maps parameterized by a coordinate-based neural network.

{% embed url="<https://figshare.com/articles/media/S4_video_spliceosome/21170578>" %}
Principal component trajectories and graph traversal trajectories of the pre-catalyic spliceosome. SI Video 4 from [Zhong et al 2021](https://www.nature.com/articles/s41592-020-01049-4)
{% endembed %}

The inputs to a cryoDRGN training run are **1) extracted particle images**, **2) the CTF parameters** associated with each particle, and, optionally, **3) poses** for each particle from a C1 (asymmetric) 3D refinement. CryoDRGN2's *ab initio* reconstruction algorithms do not require input poses.

The outputs of a training run are **1) latent embeddings** for each particle image in the form of a real-valued vector (usually denoted z, and saved by the software as a `z.pkl` file), and **2) neural network weights** modeling the distribution of density maps (parameterizing the function z→V). Once trained, the software can reconstruct a 3D density map given a value of z.
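
As a concrete sketch of the first output: the latent embeddings can be inspected with a few lines of Python, assuming `z.pkl` holds a pickled N×zdim NumPy array (the layout used by recent cryoDRGN versions; check your version's output). The block below writes a synthetic `z.pkl` so it is self-contained:

```python
import pickle
import numpy as np

# Stand-in for a cryoDRGN output: write a synthetic N x zdim latent array.
# (In a real run, cryoDRGN writes z.pkl itself; this block is self-contained.)
rng = np.random.default_rng(0)
z_true = rng.normal(size=(1000, 8)).astype(np.float32)
with open("z.pkl", "wb") as f:
    pickle.dump(z_true, f)

# Load the per-particle latent embeddings as an (n_particles, zdim) array.
with open("z.pkl", "rb") as f:
    z = pickle.load(f)

print(z.shape)  # (1000, 8)
```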

**How do you interpret the resulting distribution of structures?** Since different datasets have diverse sources of heterogeneity (e.g. discrete vs. continuous), cryoDRGN contains a variety of automated and interactive tools to analyze the reconstructed distribution of structures. The starting point for analysis is the `cryodrgn analyze` pipeline, which generates a sample of 3D density maps and visualizations of the latent space. Specifically, `cryodrgn analyze` will produce **1) N density maps** sampled from different regions of the latent space (N=20 by default), **2) continuous trajectories** along the principal component axes of the latent space embeddings, and **3) visualizations of the latent space** with PCA and UMAP.
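
The PCA step of this analysis can be sketched by hand. A minimal example using NumPy, with a synthetic latent array standing in for real cryoDRGN output (the percentile scheme for the trajectory is illustrative, not cryoDRGN's exact sampling):

```python
import numpy as np

# Synthetic stand-in for cryoDRGN latent embeddings (n_particles x zdim).
rng = np.random.default_rng(0)
z = rng.normal(size=(5000, 8))

# PCA via SVD of the mean-centered embeddings.
zc = z - z.mean(axis=0)
U, S, Vt = np.linalg.svd(zc, full_matrices=False)
pc = zc @ Vt[:2].T               # 2D projection for plotting
explained = S**2 / np.sum(S**2)  # fraction of variance per component

# Sample z values along the first principal component, e.g. to generate a
# continuous trajectory of density maps from the trained network.
t = np.percentile(pc[:, 0], np.linspace(5, 95, 10))
z_traj = z.mean(axis=0) + np.outer(t, Vt[0])

print(pc.shape, z_traj.shape)
```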

CryoDRGN also provides interactive tools to further explore the learned ensemble, implemented as **Jupyter notebooks** with interactive widgets for visualizing the dataset, extracting particles, and generating more volumes. Additional tools are also available that can generate trajectories given user-defined endpoints and convert particle selections to `.star` files for further refinement in other tools. An overview of these functionalities will be demonstrated in the tutorial.

Furthermore, because the model is trained to reconstruct *image heterogeneity*, any non-structural image variation that is not captured by the image formation model **(e.g. junk particles and artifacts)** can be reflected in the latent embeddings. In practice, junk particles are often easily identified in the latent embeddings and can then be filtered out; a Jupyter notebook is provided for filtering particle stacks.
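
A minimal sketch of index-based filtering, with a synthetic latent array and a toy distance threshold standing in for the interactive selection done in the notebook (the index-file name and the flag for applying it are assumptions; consult your version's documentation):

```python
import pickle
import numpy as np

# Synthetic latent embeddings; in practice, load these from cryoDRGN's z.pkl.
rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 8))

# Toy criterion: keep particles within 2.5 latent-space units of the mean.
# (Real filtering is usually interactive, e.g. lasso selection on a UMAP plot.)
dist = np.linalg.norm(z - z.mean(axis=0), axis=1)
keep = np.where(dist < 2.5)[0]

# Save the kept particle indices as a pickle; cryoDRGN training commands can
# restrict a particle stack to an index selection of this form.
with open("ind_keep.pkl", "wb") as f:
    pickle.dump(keep, f)

print(len(keep), "of", len(z), "particles kept")
```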

**What settings should I use for training cryoDRGN networks?** Common hyperparameters when training a cryoDRGN model are: **1) the size of the neural network**, which controls the capacity of the model, **2) the input image size**, which bounds the resolution of the reconstruction and greatly impacts training speed, and **3) the latent variable dimension**, the bottleneck layer that bounds the expressiveness of the model. Together, these three parameters determine the expressiveness and complexity of the learned model. Based on our experience with many real datasets, we provide reasonable defaults and recommended settings for these parameters.
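
For intuition on how the input image size bounds resolution: downsampling raises the effective pixel size, and the Nyquist limit is twice the pixel size. A small worked example (the pixel size and box sizes below are hypothetical):

```python
# Nyquist resolution limit of a downsampled particle image.
# (apix and box sizes are hypothetical example values.)
apix_orig = 1.0   # Å/pixel at the original box size
box_orig = 256    # original box size in pixels
box_train = 128   # downsampled box size used for training

apix_train = apix_orig * box_orig / box_train   # effective Å/pixel
nyquist = 2 * apix_train                        # resolution limit in Å

print(f"training at {box_train}px limits resolution to {nyquist:.1f} Å")
```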

## Input data requirements

* Extracted single-particle images (in `.mrcs`/`.cs`/`.star`/`.txt` format), ideally free of edge, ice, or hot-pixel artifacts
* For cryoDRGN1, a C1 consensus reconstruction with:
  * High-quality CTF parameters
  * High-quality image poses (also called particle alignments)
* Image poses are not required for cryoDRGN2's *ab initio* reconstruction tools

## Tutorial overview

See [cryodrgn-empiar-10076-tutorial](https://ez-lab.gitbook.io/cryodrgn/cryodrgn-empiar-10076-tutorial "mention") for a step-by-step guide for running cryoDRGN. This walkthrough of cryoDRGN analysis of the **assembling ribosome dataset (EMPIAR-10076)** covers all steps used to reproduce the analysis in [Zhong et al.](https://www.nature.com/articles/s41592-020-01049-4), including:

1. preprocessing of inputs,
2. initial cryoDRGN training and explanation of outputs,
3. particle filtering to remove junk particles,
4. high-resolution cryoDRGN training,
5. extracting particle subsets for traditional refinement, and
6. generation of trajectories.

For an abbreviated overview of the steps for running cryoDRGN, see the GitHub [README](https://github.com/zhonge/cryodrgn).

{% embed url="<https://figshare.com/articles/media/S3_video_LSU_assembly/23574909>" %}
SI Video 3 from [Zhong et al 2021](https://www.nature.com/articles/s41592-020-01049-4)
{% endembed %}

## References

For a complete description of the method, see:

* **CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks**\
  Ellen D. Zhong, Tristan Bepler, Bonnie Berger\*, Joseph H. Davis\*\
  Nature Methods 2021, <https://doi.org/10.1038/s41592-020-01049-4> \[[pdf](https://ezlab.princeton.edu/assets/pdf/2021_cryodrgn_nature_methods.pdf)]

An earlier version of this work appeared at the International Conference on Learning Representations (ICLR):

* **Reconstructing continuous distributions of protein structure from cryo-EM images**\
  Ellen D. Zhong, Tristan Bepler, Joseph H. Davis\*, Bonnie Berger\*\
  ICLR 2020, Spotlight, <https://arxiv.org/abs/1909.05215>

The CryoDRGN-AI *ab initio* reconstruction algorithm is described here:

* **CryoDRGN-AI: neural ab initio reconstruction of challenging cryo-EM and cryo-ET datasets**\
  Axel Levy, Rishwanth Raghu, Ryan Feathers, Michal Grzadkowski, Frederic Poitevin, Jake D. Johnston, Francesca Vallese, Oliver B. Clarke, Gordon Wetzstein, and Ellen D. Zhong\
  Nature Methods 2025, [nature.com/articles/s41592-025-02720-4](https://nature.com/articles/s41592-025-02720-4) \[[pdf](https://www.nature.com/articles/s41592-025-02720-4.pdf)]

The cryoDRGN2 *ab initio* reconstruction algorithm is described here:

* **CryoDRGN2: Ab Initio Neural Reconstruction of 3D Protein Structures From Real Cryo-EM Images**\
  Ellen D. Zhong, Adam Lerer, Joseph H. Davis, and Bonnie Berger\
  International Conference on Computer Vision (ICCV) 2021, \[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Zhong_CryoDRGN2_Ab_Initio_Neural_Reconstruction_of_3D_Protein_Structures_From_ICCV_2021_paper.pdf)]

A protocols paper that describes the analysis of the EMPIAR-10076 assembling ribosome dataset:

* **Uncovering structural ensembles from single particle cryo-EM data using cryoDRGN**\
  Laurel Kinman, Barrett Powell, Ellen D. Zhong\*, Bonnie Berger\*, Joseph H. Davis\*\
  Nature Protocols 2023, <https://doi.org/10.1038/s41596-022-00763-x>

CryoDRGN-ET for heterogeneous subtomogram analysis:

* **CryoDRGN-ET: deep reconstructing generative networks for visualizing dynamic biomolecules inside cells**\
  Ramya Rangan, Ryan Feathers, Sagar Khavnekar, Adam Lerer, Jake Johnston, Ron Kelley, Martin Obr, Abhay Kotecha, Ellen D. Zhong\
  Nature Methods 2024, <https://www.nature.com/articles/s41592-024-02340-4> \[[pdf](https://ezlab.cs.princeton.edu/assets/pdf/2024_cryodrgnet.pdf)]
