# CryoDRGN User Guide

Welcome to cryoDRGN's documentation!

This page contains user guides for the **cryoDRGN** open-source software package, including in-depth tutorials for training and analyzing cryoDRGN models.

## Quick Start

CryoDRGN can be installed with `pip` and supports Python 3.9 and 3.10. We recommend installing `cryodrgn` in a separate anaconda environment.

All cryoDRGN commands are accessed through the `cryodrgn` and `cryodrgn_utils` executables. Use the `-h` flag to display all available subcommands, and `cryodrgn <command> -h` to see the parameters for each command.

See Installation for more details and advanced installation instructions.

## Recent Highlights

* April 2024: **cryodrgn version 3.3.0** released! First non-beta release of cryoDRGN-ET, new command `cryodrgn direct_traversal`, improved testing, and cleaned-up Jupyter notebooks
* Sept 2023: CryoDRGN-ET for subtomogram analysis is now available in beta, as **cryodrgn version 3.0.0-beta**
* Jun 2023: Documentation clean-up, available here on GitBook
* May 2023: **cryodrgn version 2.3.0** released with improvements and fixes to ab initio tools
* Jan 2023: **cryodrgn version 2.2.0** released with new ab initio reconstruction tools and more
* July 2022: **cryodrgn version 1.1.0** released with an updated default architecture
* May 2022: **cryodrgn version 1.0.0** released with new landscape analysis, `cryodrgn_utils`, and more

See News and release notes for additional details.

## Background

CryoDRGN is a neural network-based method for heterogeneous reconstruction. Instead of *discrete* methods like 3D classification that produce an ensemble of K density maps, cryoDRGN performs heterogeneous reconstruction by learning a *continuous distribution* of density maps parameterized by a coordinate-based neural network.

The inputs to a cryoDRGN training run are **1) extracted particle images**, **2) the CTF parameters** associated with each particle, and, optionally, **3) poses** for each particle from a C1 (asymmetric) 3D refinement. CryoDRGN2's *ab initio* reconstruction algorithms do not require input poses.

The final result of the software will be **1) latent embeddings** for each particle image in the form of a real-valued vector (usually denoted with z, and output as a `z.pkl` file by the software), and **2) neural network weights** modeling the distribution of density maps (parameterizing the function from z → V). Once trained, the software can reconstruct a 3D density map given a value of z.
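As an illustration of the `z.pkl` output, here is a minimal sketch that writes and reads back a simulated embedding array. The particle count and latent dimension are made up for the example; cryoDRGN stores the real embeddings in the same form, as a pickled NumPy array with one row per particle.

```python
import pickle

import numpy as np

# Simulate latent embeddings for 1000 particles with an 8-D latent variable
# (both numbers are illustrative) and save them the way cryoDRGN does.
z = np.random.default_rng(0).normal(size=(1000, 8)).astype(np.float32)
with open("z.pkl", "wb") as f:
    pickle.dump(z, f)

# Load the embeddings back for downstream analysis:
with open("z.pkl", "rb") as f:
    z_loaded = pickle.load(f)

print(z_loaded.shape)  # one row per particle, one column per latent dimension
```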

How do you interpret the resulting distribution of structures? Since different datasets have diverse sources of heterogeneity (e.g. discrete vs. continuous), cryoDRGN contains a variety of automated and interactive tools to analyze the reconstructed distribution of structures. The starting point for analysis is the `cryodrgn analyze` pipeline, which generates a sample of 3D density maps and visualizations of the latent space. Specifically, the `cryodrgn analyze` pipeline will produce **1) N density maps** sampled from different regions of the latent space (N=20 by default), **2) continuous trajectories** along the principal component axes of the latent space embeddings, and **3) visualizations of the latent space** with PCA and UMAP.
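As a simplified sketch of what the PCA visualization step computes (not the actual `cryodrgn analyze` implementation; dimensions are illustrative):

```python
import numpy as np

# Simulated latent embeddings: 500 particles in an 8-D latent space.
rng = np.random.default_rng(1)
z = rng.normal(size=(500, 8))

# PCA via SVD of the mean-centered embeddings, then projection onto the
# first two principal axes -- the coordinates used for a 2-D latent plot.
z_centered = z - z.mean(axis=0)
U, S, Vt = np.linalg.svd(z_centered, full_matrices=False)
z_pc = z_centered @ Vt[:2].T

print(z_pc.shape)  # (500, 2)
```

Sampling along a single principal axis (varying the PC1 coordinate while holding the others fixed) is the idea behind the "continuous trajectories" that the pipeline generates.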

CryoDRGN also provides interactive tools to further explore the learned ensemble, implemented as **Jupyter notebooks** with interactive widgets for visualizing the dataset, extracting particles, and generating more volumes. Additional tools can generate trajectories given user-defined endpoints and convert particle selections to `.star` files for further refinement in other tools. An overview of these functionalities is demonstrated in the tutorial.

Furthermore, because the model is trained to reconstruct *image heterogeneity*, any non-structural image heterogeneity that is not captured by the image formation model **(e.g. junk particles and artifacts)** can be reflected in the latent embeddings. In practice, junk particles are often easily identified in the latent embeddings and can then be filtered out. A Jupyter notebook is provided to filter particle stacks.
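A toy sketch of latent-space filtering, using a simple distance-from-origin criterion invented for this example (real filtering is typically done interactively in the provided notebook; the output is a pickled array of kept particle indices, the index-file format accepted by cryoDRGN training commands via `--ind`):

```python
import pickle

import numpy as np

# Simulated latent embeddings; as a toy criterion, treat the 10% of
# particles farthest from the latent-space origin as "junk".
rng = np.random.default_rng(2)
z = rng.normal(size=(200, 8))

dist = np.linalg.norm(z, axis=1)
keep = np.where(dist < np.percentile(dist, 90))[0]  # indices of kept particles

# Save the kept indices as a pickled array for use with --ind.
with open("ind_keep.pkl", "wb") as f:
    pickle.dump(keep, f)

print(len(keep))  # 180 of 200 particles retained
```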

What settings should I use for training cryoDRGN networks? Common hyperparameters when training a cryoDRGN model are: **1) the size of the neural network**, which controls the capacity of the model, **2) the input image size**, which bounds the resolution information and greatly impacts the training speed, and **3) the latent variable dimension**, the bottleneck layer that bounds the expressiveness of the model. Together, these three parameters determine the expressiveness and complexity of the learned model. After exploring many real datasets, we provide reasonable defaults and recommended settings for these parameters.
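For intuition on how the input image size bounds resolution, a quick back-of-the-envelope calculation (the box and pixel sizes below are illustrative, not recommendations):

```python
# The input image size bounds recoverable resolution via the Nyquist limit:
# resolution (A) = 2 * pixel size (A/px).  Example: particles extracted in a
# 300-px box at 1.0 A/px, downsampled to 128 px for training.
orig_box, orig_apix = 300, 1.0
train_box = 128

train_apix = orig_apix * orig_box / train_box  # A/px after downsampling
nyquist = 2 * train_apix                       # best representable resolution (A)

print(f"{train_apix:.3f} A/px -> Nyquist limit {nyquist:.2f} A")
```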

## Input data requirements

* Extracted single-particle images (in `.mrcs`/`.cs`/`.star`/`.txt` format), ideally free of edge, ice, or hot-pixel artifacts
* For cryoDRGN1, a C1 consensus reconstruction with:
  * High-quality CTF parameters
  * High-quality image poses (also called particle alignments)
* Image poses are not required for cryoDRGN2's *ab initio* reconstruction tools

## Tutorial overview

See the CryoDRGN EMPIAR-10076 tutorial for a step-by-step guide to running cryoDRGN. This walkthrough of cryoDRGN analysis of the **assembling ribosome dataset (EMPIAR-10076)** covers all steps used to reproduce the analysis in Zhong et al., including:

* preprocessing of inputs,
* initial cryoDRGN training and explanation of outputs,
* particle filtering to remove junk particles,
* high-resolution cryoDRGN training,
* extracting particle subsets for traditional refinement, and
* generation of trajectories.

For an abbreviated overview of the steps for running cryoDRGN, see the GitHub README.

A protocols paper that describes the analysis of the assembling ribosome dataset is now published. See Kinman*, Powell*, Zhong* et al.

## References

For a complete description of the method, see:

**CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks.** Ellen D. Zhong, Tristan Bepler, Bonnie Berger*, Joseph H. Davis*. Nature Methods 2021, https://doi.org/10.1038/s41592-020-01049-4 [pdf]

An earlier version of this work appeared at the International Conference of Learning Representations (ICLR):

**Reconstructing continuous distributions of protein structure from cryo-EM images.** Ellen D. Zhong, Tristan Bepler, Joseph H. Davis*, Bonnie Berger*. ICLR 2020, Spotlight, https://arxiv.org/abs/1909.05215

CryoDRGN's *ab initio* reconstruction algorithms are described here:

**CryoDRGN2: Ab Initio Neural Reconstruction of 3D Protein Structures From Real Cryo-EM Images.** Ellen D. Zhong, Adam Lerer, Joseph H. Davis, and Bonnie Berger. International Conference on Computer Vision (ICCV) 2021, [paper]

A protocols paper that describes the analysis of the EMPIAR-10076 assembling ribosome dataset:

**Uncovering structural ensembles from single particle cryo-EM data using cryoDRGN.** Laurel Kinman, Barrett Powell, Ellen D. Zhong*, Bonnie Berger*, Joseph H. Davis*. Nature Protocols 2023, https://doi.org/10.1038/s41596-022-00763-x

CryoDRGN-ET for heterogeneous subtomogram analysis:

**Deep reconstructing generative networks for visualizing dynamic biomolecules inside cells.** Ramya Rangan, Sagar Khavnekar, Adam Lerer, Jake Johnston, Ron Kelley, Martin Obr, Abhay Kotecha, Ellen D. Zhong. bioRxiv 2023, https://www.biorxiv.org/content/10.1101/2023.08.18.553799v1
