
CryoDRGN User Guide

Welcome to cryoDRGN's detailed documentation!

This page contains user guides for the cryoDRGN 🐉 ❄️ open-source software package, including in-depth tutorials for training and analyzing cryoDRGN models.

Quick Start

CryoDRGN can be installed with pip and supports Python 3.9, 3.10, and 3.11. We recommend installing cryoDRGN in its own conda environment.

$ conda create --name cryodrgn-env python=3.10
$ conda activate cryodrgn-env
(cryodrgn-env) $ pip install cryodrgn

All cryoDRGN commands are accessed through the cryodrgn and cryodrgn_utils executables. Use the -h flag to display all available subcommands, and cryodrgn <command> -h to see the parameters for each command.

(cryodrgn-env) $ cryodrgn -h
(cryodrgn-env) $ cryodrgn_utils -h

See Installation for more details and advanced installation instructions.

Recent Highlights

  • Sep 2024: cryodrgn version 3.4.0 released! This version includes a new plot_classes utility, full support for RELION 3.1 .star files, and cryoSPARC-style phase-randomization applied to FSCs

  • April 2024: cryodrgn version 3.3.0 released: first non-beta release of cryoDRGN-ET, new command cryodrgn direct_traversal, improved testing, and cleaned-up Jupyter notebooks

  • Sep 2023: cryoDRGN-ET for subtomogram analysis is now available in beta, as cryodrgn version 3.0.0-beta

  • Jun 2023: Documentation cleanup, available here on GitBook

  • May 2023: cryodrgn version 2.3.0 released with improvements and fixes to ab initio tools.

  • Jan 2023: cryodrgn version 2.2.0 released with new ab initio reconstruction tools and more.

  • July 2022: cryodrgn version 1.1.0 released with updated default architecture

  • May 2022: cryodrgn version 1.0.0 released with new landscape analysis, cryodrgn_utils, and more

See News and Release Notes for additional details.

Background

CryoDRGN is a neural network-based method for heterogeneous reconstruction. Unlike discrete methods such as 3D classification, which produce an ensemble of K density maps, cryoDRGN performs heterogeneous reconstruction by learning a continuous distribution of density maps parameterized by a coordinate-based neural network.

Principal component trajectories and graph traversal trajectories of the pre-catalytic spliceosome. SI Video 4 from Zhong et al. 2021

The inputs to a cryoDRGN training run are 1) extracted particle images, 2) the CTF parameters associated with each particle, and, optionally, 3) poses for each particle from a C1 (asymmetric) 3D refinement. CryoDRGN2's ab initio reconstruction algorithms do not require input poses.

The outputs of training are 1) a latent embedding for each particle image in the form of a real-valued vector (usually denoted z, and saved as a z.pkl file by the software), and 2) neural network weights modeling the distribution of density maps (parameterizing the function z→V). Once trained, the network can reconstruct a 3D density map for any given value of z.
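For reference, a fixed-pose training run and subsequent map generation look roughly like the sketch below. The file names, number of epochs, latent dimension, and output paths are placeholders, and the exact flags (e.g. config.yaml vs. config.pkl) may differ between cryoDRGN versions; consult cryodrgn train_vae -h and cryodrgn eval_vol -h.

(cryodrgn-env) $ cryodrgn train_vae particles.128.mrcs --ctf ctf.pkl --poses poses.pkl \
    --zdim 8 -n 25 -o 00_vae128
(cryodrgn-env) $ cryodrgn eval_vol 00_vae128/weights.pkl --config 00_vae128/config.yaml \
    --zfile zvalues.txt -o sampled_volumes/

Here zvalues.txt contains one z vector per row; a volume is generated for each row.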

How do you interpret the resulting distribution of structures? Because different datasets have diverse sources of heterogeneity (e.g. discrete vs. continuous), cryoDRGN provides a variety of automated and interactive tools for analyzing the reconstructed distribution of structures. The starting point for analysis is the cryodrgn analyze pipeline, which generates a sample of 3D density maps and visualizations of the latent space. Specifically, cryodrgn analyze produces 1) N density maps sampled from different regions of the latent space (N=20 by default), 2) continuous trajectories along the principal component axes of the latent space embeddings, and 3) visualizations of the latent space with PCA and UMAP.
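For example, after 25 training epochs (so the last epoch index is 24), a run directory can be analyzed as follows; the directory name, pixel size, and number of sampled maps are illustrative (see cryodrgn analyze -h):

(cryodrgn-env) $ cryodrgn analyze 00_vae128 24 --Apix 1.7 --ksample 20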

CryoDRGN also provides interactive tools to further explore the learned ensemble, implemented as Jupyter notebooks with interactive widgets for visualizing the dataset, extracting particles, and generating more volumes. Additional tools are also available that can generate trajectories given user-defined endpoints and convert particle selections to .star files for further refinement in other tools. An overview of these functionalities will be demonstrated in the tutorial.
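As a rough sketch of two such utilities, the commands below generate a graph traversal through user-chosen anchor points in latent space and convert a particle selection to a .star file. The anchor indices, index file, and output paths are placeholders, and the exact flags may vary across versions; check cryodrgn graph_traversal -h and cryodrgn_utils write_star -h.

(cryodrgn-env) $ cryodrgn graph_traversal 00_vae128/z.24.pkl --anchors 10 350 1427 \
    -o path.txt --out-z z.path.txt
(cryodrgn-env) $ cryodrgn_utils write_star particles.128.mrcs --ctf ctf.pkl --poses poses.pkl \
    --ind selected_particles.pkl -o selected_particles.star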

Furthermore, because the model is trained to reconstruct the particle images, any image heterogeneity that is not captured by the image formation model (e.g. junk particles and imaging artifacts) will also be reflected in the latent embeddings. In practice, junk particles are often easily identified in the latent embeddings and can then be filtered out. A Jupyter notebook is provided for filtering particle stacks.
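In practice, cryodrgn analyze places the filtering notebook (named cryoDRGN_filtering.ipynb in recent versions) in the analysis directory, from which it can be launched; the directory names below are placeholders:

(cryodrgn-env) $ cd 00_vae128/analyze.24
(cryodrgn-env) $ jupyter notebook cryoDRGN_filtering.ipynb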

What settings should I use for training cryoDRGN networks? The main hyperparameters when training a cryoDRGN model are: 1) the size of the neural network, which controls the capacity of the model, 2) the input image size, which bounds the resolution information and strongly affects training speed, and 3) the latent variable dimension, i.e. the bottleneck layer that bounds the expressiveness of the model. Together, these three parameters determine the expressiveness/complexity of the learned model. After exploring many real datasets, we provide reasonable defaults and recommended settings of these parameters for training.
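For illustration, here is how these settings appear in a train_vae command; the values shown are placeholders rather than recommendations, and the input image size itself is fixed earlier when the particles are extracted or downsampled:

# The latent dimension (--zdim) and encoder/decoder sizes are set at training time;
# the input image size (here D=256) is determined by the particle stack itself.
(cryodrgn-env) $ cryodrgn train_vae particles.256.mrcs --ctf ctf.pkl --poses poses.pkl \
    --zdim 8 --enc-layers 3 --enc-dim 1024 --dec-layers 3 --dec-dim 1024 \
    -n 50 -o 01_vae256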

Input data requirements

  • Extracted single-particle images (in .mrcs/.cs/.star/.txt format), ideally free of edge, ice, or hot-pixel artifacts (see the preprocessing sketch after this list)

  • For cryoDRGN1, a C1 consensus reconstruction with:

    • High-quality CTF parameters

    • High-quality image poses (also called particle alignments)

  • Image poses are not required for cryoDRGN2's ab initio reconstruction tools
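A sketch of a typical preprocessing workflow for inputs exported from a .star-based consensus refinement is shown below. The file names, box size (-D), and pixel size (--Apix) are placeholders for your dataset; depending on the cryoDRGN version, some of these values can be read directly from the .star file, so consult each command's -h output.

(cryodrgn-env) $ cryodrgn downsample particles.star -D 128 -o particles.128.mrcs
(cryodrgn-env) $ cryodrgn parse_ctf_star particles.star -D 256 --Apix 1.0 -o ctf.pkl
(cryodrgn-env) $ cryodrgn parse_pose_star particles.star -D 256 -o poses.pkl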

Tutorial overview

See CryoDRGN EMPIAR-10076 Tutorial for a step-by-step guide to running cryoDRGN. This walkthrough analyzes the assembling ribosome dataset (EMPIAR-10076) and covers all steps used to reproduce the analysis in Zhong et al., including:

  1. preprocessing of inputs,

  2. initial cryoDRGN training and explanation of outputs,

  3. particle filtering to remove junk particles,

  4. high-resolution cryoDRGN training,

  5. extracting particle subsets for traditional refinement, and

  6. generation of trajectories.

For an abbreviated overview of the steps for running cryoDRGN, see the GitHub README.

A protocols paper that describes the analysis of the assembling ribosome dataset is now published. See Kinman*, Powell*, Zhong* et al.

SI Video 3 from Zhong et al. 2021

References

For a complete description of the method, see:

  • CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Ellen D. Zhong, Tristan Bepler, Bonnie Berger*, Joseph H. Davis*. Nature Methods 2021. https://doi.org/10.1038/s41592-020-01049-4

An earlier version of this work appeared at the International Conference on Learning Representations (ICLR):

  • Reconstructing continuous distributions of protein structure from cryo-EM images. Ellen D. Zhong, Tristan Bepler, Joseph H. Davis*, Bonnie Berger*. ICLR 2020 (Spotlight). https://arxiv.org/abs/1909.05215

CryoDRGN's ab initio reconstruction algorithms are described here:

  • CryoDRGN2: Ab Initio Neural Reconstruction of 3D Protein Structures From Real Cryo-EM Images. Ellen D. Zhong, Adam Lerer, Joseph H. Davis, Bonnie Berger. International Conference on Computer Vision (ICCV) 2021.

A protocols paper that describes the analysis of the EMPIAR-10076 assembling ribosome dataset:

  • Uncovering structural ensembles from single particle cryo-EM data using cryoDRGN. Laurel Kinman, Barrett Powell, Ellen D. Zhong*, Bonnie Berger*, Joseph H. Davis*. Nature Protocols 2023. https://doi.org/10.1038/s41596-022-00763-x

CryoDRGN-ET for heterogeneous subtomogram analysis:
