# CryoDRGN User Guide

This page contains a guide to the **cryoDRGN** 🐉 ❄️ open-source software package, including in-depth tutorials for training and analyzing cryoDRGN models.

## Quick Start

CryoDRGN can be installed with `pip`, and we recommend installing `cryodrgn` in a separate anaconda environment.

```bash
$ conda create --name cryodrgn-env python=3.13
$ conda activate cryodrgn-env
(cryodrgn-env) $ pip install cryodrgn
```

All cryoDRGN commands are accessed through the `cryodrgn` and `cryodrgn_utils` executables. Use the `-h` flag to display all available subcommands, and `cryodrgn <command> -h` to see the parameters for each command.

```bash
(cryodrgn-env) $ cryodrgn -h
(cryodrgn-env) $ cryodrgn_utils -h
```

See [Installation](/cryodrgn/installation.md) for more details and advanced installation instructions.

## Recent Releases

See [News and Release Notes](/cryodrgn/cryodrgn-user-guide/news-and-release-notes.md) for more details!

#### Updates in Version 4.2.x ***(last release: Apr. 2026)***

* **\[NEW]** `cryodrgn dashboard` command implementing an interactive dashboard web app for analyzing results
* **\[NEW]** `cryodrgn abinit` command for cryoDRGN-AI *ab initio* reconstruction
  * the cryoDRGN2 *ab initio* commands `abinit_homo` and `abinit_het` are now deprecated and available as `abinit_homo_old` and `abinit_het_old`
* adding support for Python 3.13 and PyTorch up to 2.9
* more memory-efficient *ab initio* reconstruction

#### Updates in Version 3.5.x  ***(last release: Nov. 2025)***

* 1-indexing of output volumes and epochs replacing the previous 0-indexing
* **\[NEW]** volume reconstruction using an autodecoder with `cryodrgn train_dec`  *(beta)*
* **\[NEW]** `cryodrgn parse_relion` for parsing RELION5 3D tomo files to the cryoDRGN 2D input format
* improved landscape analysis using Leiden clustering
* adding support for Python 3.12, deprecating support for Python 3.9
* **\[NEW]** consolidated `cryodrgn parse_star` command (merging `parse_pose_star` and `parse_ctf_star`)
* `analyze` is now run automatically on the final epoch once model training is complete
* faster backprojection and downsampling; faster landscape analysis with `--multigpu`

#### Updates in Version 3.x  *(initial release: Sep. 2023)*

The official release of [cryoDRGN-ET](https://www.biorxiv.org/content/10.1101/2023.08.18.553799v1) for heterogeneous subtomogram analysis.

* **\[NEW]** Heterogeneous reconstruction of subtomograms. See documentation [on gitbook](https://ez-lab.gitbook.io/cryodrgn/)
* Updated `cryodrgn backproject_voxel` for voxel-based homogeneous reconstruction
* Major refactor of dataset loading for handling large datasets

## Background

CryoDRGN is a neural network-based method for heterogeneous reconstruction. Unlike *discrete* methods such as 3D classification, which produce an ensemble of K density maps, cryoDRGN performs heterogeneous reconstruction by learning a *continuous distribution* of density maps parameterized by a coordinate-based neural network.

{% embed url="https://figshare.com/articles/media/S4_video_spliceosome/21170578" %}
Principal component trajectories and graph traversal trajectories of the pre-catalytic spliceosome. SI Video 4 from [Zhong et al 2021](https://www.nature.com/articles/s41592-020-01049-4)
{% endembed %}

The inputs to a cryoDRGN training run are **1) extracted particle images**, **2) the CTF parameters** associated with each particle, and, optionally, **3) poses** for each particle from a C1 (asymmetric) 3D refinement. CryoDRGN2's *ab initio* reconstruction algorithms do not require input poses.

A cryoDRGN training run produces **1) latent embeddings** for each particle image in the form of a real-valued vector (usually denoted with z, and output as a `z.pkl` file by the software), and **2) neural network weights** modeling the distribution of density maps (parameterizing the function from z→V). Once trained, the network can reconstruct a 3D density map for any given value of z.
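Once training is complete, the latent embeddings can be inspected directly in Python. The sketch below is illustrative rather than official API usage: it assumes `z.pkl` is a pickled NumPy array with one row per particle, and it writes a synthetic stand-in first so the snippet runs without a training run.

```python
import pickle

import numpy as np

# Synthetic stand-in for a cryoDRGN output: an (N_particles, zdim) array.
# In a real run, z.pkl is found in the training output directory.
z_fake = np.random.default_rng(0).normal(size=(1000, 8)).astype(np.float32)
with open("z.pkl", "wb") as f:
    pickle.dump(z_fake, f)

# Load the latent embeddings for downstream analysis.
with open("z.pkl", "rb") as f:
    z = pickle.load(f)

print(z.shape)  # one row per particle, one column per latent dimension
```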

**How do you interpret the resulting distribution of structures?** Since different datasets have diverse sources of heterogeneity (e.g. discrete vs. continuous), cryoDRGN contains a variety of automated and interactive tools to analyze the reconstructed distribution of structures. The starting point for analysis is the `cryodrgn analyze` pipeline, which generates a sample of 3D density maps and visualizations of the latent space. Specifically, the `cryodrgn analyze` pipeline will produce **1) N density maps** sampled from different regions of the latent space (N=20, by default), **2) continuous trajectories** along the principal component axes of the latent space embeddings, and **3) visualizations of the latent space** with PCA and UMAP.
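As a sketch of the PCA visualization step, the projection of the latent embeddings onto their top principal components can be computed with plain NumPy. This is a minimal, illustrative implementation under the assumption that `z` is the (N, zdim) embedding array; `cryodrgn analyze` uses its own implementation and additionally runs UMAP.

```python
import numpy as np

def pca(z: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project latent embeddings onto their top principal components."""
    z_centered = z - z.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes,
    # ordered by decreasing singular value (i.e. explained variance).
    _, _, vt = np.linalg.svd(z_centered, full_matrices=False)
    return z_centered @ vt[:n_components].T

# Stand-in for latent embeddings loaded from a training run.
z = np.random.default_rng(0).normal(size=(1000, 8))
z_pc = pca(z)
print(z_pc.shape)  # (1000, 2), ready for a 2D scatter plot
```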

CryoDRGN also provides interactive tools to further explore the learned ensemble, implemented as **Jupyter notebooks** with interactive widgets for visualizing the dataset, extracting particles, and generating more volumes. Additional tools are also available that can generate trajectories given user-defined endpoints and convert particle selections to `.star` files for further refinement in other tools. An overview of these functionalities will be demonstrated in the tutorial.

Furthermore, because the model is trained to reconstruct the heterogeneity present in the images, any non-structural image heterogeneity that is not captured by the image formation model **(e.g. junk particles and artifacts)** can be reflected in the latent embeddings. In practice, junk particles are often easily identified in the latent embeddings and can then be filtered out. A Jupyter notebook is provided for filtering particle stacks.
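Conceptually, filtering reduces to saving the indices of the particles to keep. The sketch below is a hypothetical, non-interactive version of what the notebook does: the distance-based selection criterion and the `ind_keep.pkl` filename are illustrative choices for this example, not prescribed by cryoDRGN.

```python
import pickle

import numpy as np

# Stand-in for latent embeddings loaded from a training run.
z = np.random.default_rng(0).normal(size=(1000, 8))

# Illustrative criterion: keep particles within 2 std of the mean
# latent-space distance (outliers often correspond to junk particles).
dist = np.linalg.norm(z - z.mean(axis=0), axis=1)
keep = np.where(dist < dist.mean() + 2 * dist.std())[0]

# Save the kept indices as a pickled array for a subsequent training run.
with open("ind_keep.pkl", "wb") as f:
    pickle.dump(keep, f)

print(f"kept {len(keep)} / {len(z)} particles")
```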

**What settings should I use for training cryoDRGN networks?** The main hyperparameters when training a cryoDRGN model are **1) the size of the neural network**, which controls the capacity of the model, **2) the input image size**, which bounds the available resolution information and greatly impacts training speed, and **3) the latent variable dimension**, the bottleneck layer that bounds the expressiveness of the model. Together, these three parameters determine the expressiveness and complexity of the learned model. Based on our experience with many real datasets, we provide reasonable defaults and recommended settings for these parameters.

## Input data requirements

* Extracted single-particle images (in .mrcs/.cs/.star/.txt format), ideally free of edge, ice, or hot-pixel artifacts
* For cryoDRGN1, a C1 consensus reconstruction with:
  * High-quality CTF parameters
  * High-quality image poses (also called particle alignments)
* Image poses are not required for cryoDRGN2's *ab initio* reconstruction tools

## Tutorial overview

See [CryoDRGN EMPIAR-10076 Tutorial](/cryodrgn/cryodrgn-empiar-10076-tutorial.md) for a step-by-step guide for running cryoDRGN. This walkthrough of cryoDRGN analysis of the **assembling ribosome dataset (EMPIAR-10076)** covers all steps used to reproduce the analysis in [Zhong et al.](https://www.nature.com/articles/s41592-020-01049-4), including:

1. preprocessing of inputs,
2. initial cryoDRGN training and explanation of outputs,
3. particle filtering to remove junk particles,
4. high-resolution cryoDRGN training,
5. extracting particle subsets for traditional refinement, and
6. generation of trajectories.

For an abbreviated overview of the steps for running cryoDRGN, see the GitHub [README](https://github.com/zhonge/cryodrgn).

{% embed url="https://figshare.com/articles/media/S3_video_LSU_assembly/23574909" %}
SI Video 3 from [Zhong et al 2021](https://www.nature.com/articles/s41592-020-01049-4)
{% endembed %}

## References

For a complete description of the method, see:

* **CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks**\
  Ellen D. Zhong, Tristan Bepler, Bonnie Berger\*, Joseph H. Davis\*\
  Nature Methods 2021, <https://doi.org/10.1038/s41592-020-01049-4> \[[pdf](https://ezlab.princeton.edu/assets/pdf/2021_cryodrgn_nature_methods.pdf)]

An earlier version of this work appeared at the International Conference of Learning Representations (ICLR):

* **Reconstructing continuous distributions of protein structure from cryo-EM images**\
  Ellen D. Zhong, Tristan Bepler, Joseph H. Davis\*, Bonnie Berger\*\
  ICLR 2020, Spotlight, <https://arxiv.org/abs/1909.05215>

The CryoDRGN-AI *ab initio* reconstruction algorithm is described here:

* **CryoDRGN-AI: neural ab initio reconstruction of challenging cryo-EM and cryo-ET datasets**\
  Axel Levy, Rishwanth Raghu, Ryan Feathers, Michal Grzadkowski, Frederic Poitevin, Jake D. Johnston, Francesca Vallese, Oliver B. Clarke, Gordon Wetzstein, and Ellen D. Zhong\
  Nature Methods 2025, <https://www.nature.com/articles/s41592-025-02720-4> \[[pdf](https://www.nature.com/articles/s41592-025-02720-4.pdf)]

CryoDRGN2's *ab initio* reconstruction algorithm is described here:

* **CryoDRGN2: Ab Initio Neural Reconstruction of 3D Protein Structures From Real Cryo-EM Images**\
  Ellen D. Zhong, Adam Lerer, Joseph H. Davis, and Bonnie Berger\
  International Conference on Computer Vision (ICCV) 2021, \[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Zhong_CryoDRGN2_Ab_Initio_Neural_Reconstruction_of_3D_Protein_Structures_From_ICCV_2021_paper.pdf)]

A protocols paper that describes the analysis of the EMPIAR-10076 assembling ribosome dataset:

* **Uncovering structural ensembles from single particle cryo-EM data using cryoDRGN**\
  Laurel Kinman, Barrett Powell, Ellen D. Zhong\*, Bonnie Berger\*, Joseph H. Davis\*\
  Nature Protocols 2023, <https://doi.org/10.1038/s41596-022-00763-x>

CryoDRGN-ET for heterogeneous subtomogram analysis:

* **CryoDRGN-ET: deep reconstructing generative networks for visualizing dynamic biomolecules inside cells**\
  Ramya Rangan, Ryan Feathers, Sagar Khavnekar, Adam Lerer, Jake Johnston, Ron Kelley, Martin Obr, Abhay Kotecha, Ellen D. Zhong\
  Nature Methods 2024, <https://www.nature.com/articles/s41592-024-02340-4> \[[pdf](https://ezlab.cs.princeton.edu/assets/pdf/2024_cryodrgnet.pdf)]

