CryoDRGN User Guide
Welcome to cryoDRGN's detailed documentation!
This page contains user guides for the cryoDRGN open-source software package, including in-depth tutorials for training and analyzing cryoDRGN models.
Quick Start
CryoDRGN can be installed with pip and supports Python 3.9, 3.10, and 3.11. We recommend installing cryodrgn in a separate anaconda environment.
All cryoDRGN commands are accessed through the cryodrgn and cryodrgn_utils executables. Use the -h flag to display all available subcommands, and cryodrgn <command> -h to see the parameters for each command.
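For example, a typical installation and first look at the command-line interface might be (the environment name and Python version below are illustrative choices):

  conda create --name cryodrgn python=3.10
  conda activate cryodrgn
  pip install cryodrgn

  # list all available subcommands, then show the parameters of one of them
  cryodrgn -h
  cryodrgn analyze -h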
See Installation for more details and advanced installation instructions.
Recent Highlights
Sep 2024: cryodrgn version 3.4.0 released! This version includes a new plot_classes utility, full support for RELION 3.1 .star files, and cryoSPARC-style phase randomization applied to FSCs
April 2024: cryodrgn version 3.3.0 released: first non-beta release of cryoDRGN-ET, new command cryodrgn direct_traversal, improved testing, and cleaned-up jupyter notebooks
Sep 2023: cryoDRGN-ET for subtomogram analysis is now available in beta, as cryodrgn version 3.0.0-beta
Jun 2023: Documentation clean up, available here on gitbook
May 2023: cryodrgn version 2.3.0 released with improvements and fixes to ab initio tools.
Jan 2023: cryodrgn version 2.2.0 released with new ab initio reconstruction tools and more.
July 2022: cryodrgn version 1.1.0 released with updated default architecture
May 2022: cryodrgn version 1.0.0 released with new landscape analysis, cryodrgn_utils, and more
See News and Release Notes for additional details.
Background
CryoDRGN is a neural network-based method for heterogeneous reconstruction. Unlike discrete approaches such as 3D classification, which produce an ensemble of K density maps, cryoDRGN performs heterogeneous reconstruction by learning a continuous distribution of density maps parameterized by a coordinate-based neural network.
The inputs to a cryoDRGN training run are 1) extracted particle images, 2) the CTF parameters associated with each particle, and, optionally, 3) poses for each particle from a C1 (asymmetric) 3D refinement. CryoDRGN2's ab initio reconstruction algorithms do not require input poses.
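As a hedged sketch of how these inputs are typically prepared (the file names are placeholders, and the exact flags required depend on your cryoDRGN version and on the upstream refinement software; check each command's -h output):

  # downsample the extracted particle stack for faster initial training
  cryodrgn downsample particles.mrcs -D 128 -o particles.128.mrcs

  # extract per-particle CTF parameters and poses from a consensus refinement .star file
  cryodrgn parse_ctf_star particles.star -o ctf.pkl
  cryodrgn parse_pose_star particles.star -o poses.pkl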
The final results of a training run are 1) latent embeddings for each particle image in the form of a real-valued vector (usually denoted z and output as a z.pkl file by the software), and 2) neural network weights modeling the distribution of density maps (parameterizing the function from z to V). Once trained, the software can reconstruct a 3D density map given a value of z.
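For example, once training is complete, a density map can be generated at a chosen value of z with cryodrgn eval_vol. The sketch below assumes a 2-dimensional latent variable and the default file names written to the training output directory (the config file may be config.pkl in older versions); see cryodrgn eval_vol -h for the exact options in your version:

  # reconstruct a single density map at z = (0.5, -1.2)
  cryodrgn eval_vol outdir/weights.pkl --config outdir/config.yaml -z 0.5 -1.2 -o outdir/vol.mrc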
How do you interpret the resulting distribution of structures? Since different datasets have diverse sources of heterogeneity (e.g. discrete vs. continuous), cryoDRGN contains a variety of automated and interactive tools for analyzing the reconstructed distribution of structures. The starting point for analysis is the cryodrgn analyze pipeline, which generates a sample of 3D density maps and visualizations of the latent space. Specifically, the cryodrgn analyze pipeline produces 1) N density maps sampled from different regions of the latent space (N=20 by default), 2) continuous trajectories along the principal component axes of the latent space embeddings, and 3) visualizations of the latent space with PCA and UMAP.
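For example, to run the pipeline on the model saved at epoch 24 of a training run whose outputs were written to outdir (cryoDRGN epochs are zero-indexed, so this is the 25th epoch):

  cryodrgn analyze outdir 24

Optional flags (see cryodrgn analyze -h) control details such as the number of density maps sampled from the latent space.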
CryoDRGN also provides interactive tools to further explore the learned ensemble, implemented as Jupyter notebooks with interactive widgets for visualizing the dataset, extracting particles, and generating more volumes. Additional tools can generate trajectories between user-defined endpoints and convert particle selections to .star files for further refinement in other software. These functionalities are demonstrated in the tutorial.
Furthermore, because the model is trained to reconstruct the particle images, any non-structural image heterogeneity that is not captured by the image formation model (e.g. junk particles and imaging artifacts) can also be reflected in the latent embeddings. In practice, junk particles are often easily identified in the latent embeddings and can then be filtered out. A Jupyter notebook is provided to filter particle stacks.
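As a sketch of the usual workflow (the notebook name and analysis directory follow cryoDRGN's conventions but are worth confirming against your own output directory), the filtering notebook written by cryodrgn analyze can be opened with Jupyter, and the particle index file it saves can then be passed to a new training run with --ind:

  # open the filtering notebook generated alongside the epoch-24 analysis results
  jupyter notebook outdir/analyze.24/cryoDRGN_filtering.ipynb

  # retrain using only the kept particles (ind_keep.pkl is the index file saved by the notebook)
  cryodrgn train_vae particles.mrcs --ctf ctf.pkl --poses poses.pkl --zdim 8 --ind ind_keep.pkl -o outdir_filtered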
What settings should I use for training cryoDRGN networks? The main hyperparameters when training a cryoDRGN model are: 1) the size of the neural network, which controls the capacity of the model; 2) the input image size, which bounds the resolution information and greatly impacts training speed; and 3) the latent variable dimension, the bottleneck layer that bounds the expressiveness of the model. Together, these three parameters determine the expressiveness and complexity of the learned model. Based on our experience with many real datasets, we provide reasonable defaults and recommended settings for these parameters.
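For illustration, the hedged command below shows where each hyperparameter enters a training run: --enc-dim/--enc-layers and --dec-dim/--dec-layers set the network size, the box size of the input particle stack (here, images previously downsampled to 256 pixels) sets the image size, and --zdim sets the latent variable dimension. The values shown are typical choices rather than prescriptions; see cryodrgn train_vae -h for the defaults in your version:

  cryodrgn train_vae particles.256.mrcs \
      --ctf ctf.pkl --poses poses.pkl \
      --zdim 8 \
      --enc-dim 1024 --enc-layers 3 \
      --dec-dim 1024 --dec-layers 3 \
      -n 25 \
      -o outdir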
Input data requirements
Extracted single-particle images (in .mrcs/.cs/.star/.txt format), ideally free of edge, ice, or hot-pixel artifacts
For cryoDRGN1, a C1 consensus reconstruction with:
High-quality CTF parameters
High-quality image poses (also called particle alignments)
Image poses are not required for cryoDRGN2's ab initio reconstruction tools
Tutorial overview
See CryoDRGN EMPIAR-10076 Tutorial for a step-by-step guide for running cryoDRGN. This walkthrough of cryoDRGN analysis of the assembling ribosome dataset (EMPIAR-10076) covers all steps used to reproduce the analysis in Zhong et al., including:
preprocessing of inputs,
initial cryoDRGN training and explanation of outputs,
particle filtering to remove junk particles,
high-resolution cryoDRGN training,
extracting particle subsets for traditional refinement, and
generation of trajectories.
For an abbreviated overview of the steps for running cryoDRGN, see the GitHub README.
A protocols paper that describes the analysis of the assembling ribosome dataset is now published. See Kinman*, Powell*, Zhong* et al.
References
For a complete description of the method, see:
CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Ellen D. Zhong, Tristan Bepler, Bonnie Berger*, Joseph H. Davis*. Nature Methods 2021. https://doi.org/10.1038/s41592-020-01049-4
An earlier version of this work appeared at the International Conference on Learning Representations (ICLR):
Reconstructing continuous distributions of protein structure from cryo-EM images. Ellen D. Zhong, Tristan Bepler, Joseph H. Davis*, Bonnie Berger*. ICLR 2020 (Spotlight). https://arxiv.org/abs/1909.05215
CryoDRGN's ab initio reconstruction algorithms are described here:
CryoDRGN2: Ab Initio Neural Reconstruction of 3D Protein Structures From Real Cryo-EM Images. Ellen D. Zhong, Adam Lerer, Joseph H. Davis, Bonnie Berger. International Conference on Computer Vision (ICCV) 2021.
A protocols paper that describes the analysis of the EMPIAR-10076 assembling ribosome dataset:
Uncovering structural ensembles from single particle cryo-EM data using cryoDRGN. Laurel Kinman, Barrett Powell, Ellen D. Zhong*, Bonnie Berger*, Joseph H. Davis*. Nature Protocols 2023. https://doi.org/10.1038/s41596-022-00763-x
CryoDRGN-ET for heterogeneous subtomogram analysis:
Deep reconstructing generative networks for visualizing dynamic biomolecules inside cells. Ramya Rangan, Sagar Khavnekar, Adam Lerer, Jake Johnston, Ron Kelley, Martin Obr, Abhay Kotecha, Ellen D. Zhong. bioRxiv 2023. https://www.biorxiv.org/content/10.1101/2023.08.18.553799v1