CryoDRGN-ET Subtomogram Analysis

How to perform heterogeneous reconstruction using cryo-ET subtomograms

Now available as part of a production/stable release in cryodrgn version 3.3.0! See News and Release Notes

CryoDRGN-ET for subtomogram analysis is available in versions 3.0.0 and later through additional flags passed to the train_vae command:

cryodrgn train_vae particles_from_M.star --encode-mode tilt --dose-per-tilt 2.93 --angle-per-tilt 3.0 --ctf ctf.pkl --poses pose.pkl --datadir /data/subtiltstacks/ --zdim 8 -n 50 --beta 0.025 -o my_output_directory/

Note that both --encode-mode tilt and a value for --dose-per-tilt are required to activate subtomogram analysis, while --angle-per-tilt is optional, with a default value of 3 degrees.

We describe here a typical workflow for preparing tilt series inputs for use with cryoDRGN heterogeneous reconstruction, training a reconstruction model, and analyzing its outputs. See our preprint here for a description of the cryoDRGN-ET method and associated results.

Preprocessing

Export particles

CryoDRGN-ET expects 2D particle tilt series images in a .star file exported from Windows Warp/M or RELION5 --tomo. From RELION5, 2D particle tilt series images without CTF premultiplication should be extracted using the --no_ctf option. We are actively working to support CTF-corrected images exported from WarpTools, but this data type is not currently supported.

Windows Warp/M

If you intend to perform subvolume refinement or tomogram visualization, we recommend also exporting highly binned subvolumes to generate a 3D star file at this stage, since additional Warp/M processing could result in particle reordering (!), which would affect downstream refinement if filtering with cryoDRGN-ET.

Example settings for exporting unbinned subtilts from M
Example settings for exporting binned subvolumes from M

Example 2D Warp/M star file:

RELION5

Export refined particles as 2D stacks using the --no_ctf additional argument. It is important to consider using a larger box to prevent CTF aliasing.

Although RELION5 exports 2D particle stacks rather than subvolumes, the resulting star file refers to the 3D tomogram coordinates. Following RELION5 extraction, 3D particle star files need to be converted to the 2D format used for cryoDRGN-ET.

Run the RELION5 3D-to-2D conversion utility, parse_relion, in your RELION5 project directory. The utility requires a particles.star file, the associated tomograms.star file, and the raw tilt image dimensions as inputs. The tiltseries.star files are read from the relative paths provided in the tomograms.star file. To run the conversion outside of a project directory, create a symlink so that the tiltseries.star relative paths remain accessible.

(cryodrgn) $ cryodrgn_utils parse_relion -t Polish/jobxxx/tomograms.star -p Extract/jobxxx/particles.star --tilt-dim 4096 4096 -o particles_2d.star

Prepare cryoDRGN-ET input files

Here we will assume our tilt series images have been exported to the file particles_2d.star, and that we have already loaded a conda environment named cryodrgn with cryoDRGN v3.5.1+ installed. We will need to extract separate .pkl files containing CTF parameters and pose estimates from this .star file for use with cryoDRGN commands.

  1. Downsample images (if necessary)

To optimize cryoDRGN training, you may first want to consider downsampling your images to a more manageable size. The downsampled pixel size should be chosen based on the quality of the consensus reconstruction and the resolution required to observe the types of changes you anticipate in your data. For example, a 6 Å Nyquist limit should be sufficient to observe changes in secondary structure. For cryo-ET data, a pixel size of up to 10 Å may be useful for particle classification based on low-resolution features.
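As a rough guide to choosing a box size, the pixel size after Fourier cropping scales with the ratio of the original box size to the new one, and the Nyquist limit is twice the pixel size. The sketch below works through that arithmetic. The function names are illustrative and not part of cryoDRGN, and rounding up to an even box size is an assumption about what the downsampling step expects.

```python
import math


def downsampled_apix(apix: float, box: int, new_box: int) -> float:
    """Pixel size (Å/pixel) after Fourier-cropping a box of `box` pixels to `new_box`."""
    if new_box > box:
        raise ValueError("new_box must not exceed the original box size")
    return apix * box / new_box


def nyquist_limit(apix: float) -> float:
    """Nyquist resolution limit in Å (two pixels per period)."""
    return 2.0 * apix


def box_for_nyquist(apix: float, box: int, target_nyquist: float) -> int:
    """Smallest even box size whose Nyquist limit is at or below `target_nyquist` Å.

    Rounding up to an even size is an assumption, not a documented requirement.
    The result is capped at the original box size (no upsampling).
    """
    needed = math.ceil(2.0 * apix * box / target_nyquist)
    if needed % 2:
        needed += 1
    return min(needed, box)
```

With the values used in the example commands on this page (1.54 Å/pixel, original box 280), downsampling to 128 gives about 3.37 Å/pixel and a Nyquist limit of about 6.74 Å, while box_for_nyquist(1.54, 280, 6.0) suggests a box of 144 to reach a 6 Å Nyquist limit.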

Downsampled particle images need to maintain the same stack organization and ordering to prevent metadata mismatch. This is handled automatically by cryodrgn downsample when a star file is used as the input.

(cryodrgn) $ cryodrgn downsample particles_2d.star -D 128 -o particles_2d.128.star --outdir downsampled_128

For subsequent commands, the star file will contain relative paths to the provided outdir location. Alternatively, you can use the full path to downsampled_128/ with the --datadir argument (and the same .star file) to use these downsampled subtilts from a different working directory.

Particles may also be downsampled at the time of extraction in RELION5 or Warp/M.

  2. Parse additional pose and CTF information

The additional CTF parameter and pose estimate files can be generated with the parse_star command installed as part of cryoDRGN:

(cryodrgn) $ cryodrgn parse_star particles_2d.star --ctf ctf.pkl --poses pose.pkl --Apix 1.54 -D 280

The command should be run on the Warp/M star file or the converted RELION5 2D star file. Do not parse pose and CTF information from the star file written by cryodrgn downsample.

Note that this command requires you to specify the original box size and Å/pixel values if these are not listed in the .star file under fields such as _rlnImageSize and _rlnImagePixelSize. The values should match the output of particle extraction. If particles were downsampled during extraction, use the downsampled pixel size and box size. If cryodrgn downsample was used instead, provide the original box size and pixel size.

  3. Perform a sanity check using backprojection

We can confirm our inputs were correctly parsed using traditional homogeneous reconstruction. This step will run 10x faster on a GPU compute node! Note that this command also requires extra metadata on how the tilts were collected.

Each particle must have at least --ntilts tilt images (10 by default). If this is not the case, you can generate an indices file that excludes particles with fewer than --ntilts tilts by providing the --force-ntilts argument at this stage. An indices .pkl file will be written to the backprojection output directory, and it can also be used for the subsequent training job.

(cryodrgn) $ cryodrgn backproject_voxel particles_2d.128.star --ctf ctf.pkl --poses pose.pkl --dose-per-tilt 3 -o 00_backproject_128

Note: For large datasets or limited GPU RAM, the --lazy argument will prevent out-of-memory errors.
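To see ahead of time how many particles fall below the --ntilts threshold, you can count tilt images per particle yourself. The sketch below assumes that all tilt images of one particle share a common group label in the star file (for example a _rlnGroupName value) and that particles are indexed in order of first appearance. Both are assumptions about the file layout, and the function name is hypothetical. The indices file written by --force-ntilts remains the authoritative source.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_by_tilt_count(
    group_names: Iterable[str], ntilts: int = 10
) -> Tuple[List[int], List[int]]:
    """Split particle indices into (kept, dropped) by number of tilt images.

    `group_names` holds one label per tilt image; images with the same label
    are assumed to belong to the same particle.
    """
    names = list(group_names)
    counts = Counter(names)
    # Particle index = order of first appearance of its label (assumption).
    ordered = list(dict.fromkeys(names))
    kept = [i for i, name in enumerate(ordered) if counts[name] >= ntilts]
    dropped = [i for i, name in enumerate(ordered) if counts[name] < ntilts]
    return kept, dropped
```

For instance, labels for three particles with 10, 4, and 12 tilt images would keep the first and third particles and drop the second at the default threshold.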

CryoDRGN-ET training

Once we have obtained our input files, we are ready to train the reconstruction model on our tilt series. Here we use an example command using the files described above:

(cryodrgn) $ cryodrgn train_vae particles_2d.128.star --ctf ctf.pkl --poses pose.pkl --encode-mode tilt --dose-per-tilt 3.0 --zdim 8 -n 50 --beta 0.025 -o 01_trainvae_128/

In particular:

  • --encode-mode tilt is required to properly treat tilt series data

  • In our current experiments, we use a KL regularization weight of --beta 0.025. We recommend this setting as a starting point for all tilt series experiments!

  • --dose-per-tilt and --angle-per-tilt are used for dose exposure correction. The default value of --angle-per-tilt is 3 degrees, so it is omitted from the example command.
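To give a feel for why the dose and tilt angle matter, the sketch below implements the exposure filter published by Grant and Grigorieff (2015) for 300 kV data, in which higher spatial frequencies are down-weighted more strongly as dose accumulates. It is a stand-in for illustration: whether cryoDRGN-ET applies exactly this weighting is an assumption, as are the dose-symmetric tilt ordering and the convention that the i-th exposure (0-based) has accumulated (i + 1) doses.

```python
import math


def critical_exposure(freq: float) -> float:
    """Critical exposure (e-/Å^2) at spatial frequency `freq` (1/Å), 300 kV fit."""
    return 0.245 * freq ** -1.665 + 2.81


def dose_weight(freq: float, cumulative_dose: float) -> float:
    """Exposure-filter weight in (0, 1] for a given frequency and accumulated dose."""
    return math.exp(-0.5 * cumulative_dose / critical_exposure(freq))


def cumulative_dose(tilt_index: int, dose_per_tilt: float) -> float:
    """Dose accumulated after the 0-based `tilt_index`-th exposure (assumed convention)."""
    return (tilt_index + 1) * dose_per_tilt


def tilt_angle(tilt_index: int, angle_per_tilt: float = 3.0) -> float:
    """Absolute tilt angle in degrees, assuming a dose-symmetric scheme
    (0, +a, -a, +2a, -2a, ...) indexed by acquisition order."""
    return angle_per_tilt * math.ceil(tilt_index / 2)
```

Under these assumptions, later tilts are both more highly tilted and more heavily dose-damaged, so their high-frequency signal contributes less than that of the first few exposures.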

Training a model on 16,655 particles for 50 epochs on 1 A100 GPU took 3h, 38min.

Analysis

Once a cryoDRGN-ET model has finished training, use cryodrgn analyze to visualize the latent space and generate volumes.

(cryodrgn) $ cryodrgn analyze output_directory 50 # or replace with a different epoch number

This portion of the analysis is similar to the workflow in single-particle cryoDRGN. See the EMPIAR-10076 tutorial for further documentation.

Additional tools

Landscape analysis: The commands cryodrgn analyze_landscape and cryodrgn analyze_landscape_full can be used for further analysis of the landscape of reconstructed volumes (as opposed to the landscape of latent space coordinates). See the cryoDRGN landscape analysis tutorial for more information.

Particle selection: We implemented a standalone filtering tool to enable lasso selection from the UMAP representation outside of the Jupyter notebook. Run cryodrgn filter . from your results directory to launch an interactive plot in an X11 window. If running remotely, you must be connected with ssh -Y.

Star file filtering: Particle indices identified by cryoDRGN-ET can be used to filter 2D subtilt star files from Warp/M or 3D subvolume star files for downstream processing or visualization in ArtiaX. Filtering a 3D subvolume star file with cryodrgn_utils filter_star using the --micrograph-files or -m option produces a directory containing one star file per tomogram.
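When a selection marks the junk rather than the particles you want to keep, it can be handy to invert it before filtering. The sketch below assumes that a selection is simply a sequence of zero-based particle indices. The function name is hypothetical and not part of cryoDRGN.

```python
from typing import Iterable, List


def invert_selection(selected: Iterable[int], n_particles: int) -> List[int]:
    """Return the sorted particle indices in range(n_particles) not in `selected`."""
    chosen = {int(i) for i in selected}
    out_of_range = [i for i in chosen if i < 0 or i >= n_particles]
    if out_of_range:
        raise ValueError(f"indices outside 0..{n_particles - 1}: {sorted(out_of_range)}")
    return [i for i in range(n_particles) if i not in chosen]
```

The result can be saved with pickle.dump and passed wherever an indices file is expected, provided the saved format matches what cryoDRGN reads, which is worth checking on a small example first.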

Experiment Workflow

Similarly to the reconstruction workflow for single particle analysis, we recommend an iterative process for training successive cryoDRGN models on a new dataset:

  1. First, train on downsampled images (e.g. D=128) as an initial pass to sanity check results and remove junk particles:

(cryodrgn) $ cryodrgn train_vae particles_2d.128.star --ctf ctf.pkl --poses pose.pkl --datadir downsampled_128/ --encode-mode tilt --dose-per-tilt 3.0 --zdim 8 -n 50 --beta 0.025 -o my_output_directory/01_trainvae_128
  2. Validate selection and junk classifications with backprojections:

(cryodrgn) $ cryodrgn backproject_voxel particles_2d.128.star --poses pose.pkl --ctf ctf.pkl --tilt --ntilts 10 --dose-per-tilt 3 -o 02_backproject_128_selected_particles --lazy --ind 01_trainvae_128/selected_particles.pkl
  3. After validating selected particles (--ind selected_particles.pkl), train a new model excluding any junk:

(cryodrgn) $ cryodrgn train_vae particles_2d.128.star --ctf ctf.pkl --poses pose.pkl --datadir downsampled_128/ --encode-mode tilt --ind selected_particles.pkl --dose-per-tilt 3 --zdim 8 -n 50 --beta 0.025 -o my_output_directory/03_trainvae_selected.128
  4. Analyze heterogeneity and separate discrete states using:

    • Clustering methods available in output_dir/analyze.50/cryoDRGN_ET_viz.ipynb

    • Standalone lasso GUI cryodrgn filter

  5. Optional - Filter the original 3D star file with the cryoDRGN-selected particles for further refinement in RELION/M:

(cryodrgn) $ cryodrgn_utils filter_star Extract/jobxxx/particles.star --ind 01_trainvae_128/sel_ind.pkl -o cryodrgn_particles.star
  6. Optional - Train a new cryoDRGN model on higher resolution images to better resolve conformational heterogeneity:

(cryodrgn) $ cryodrgn train_vae particles_2d.star --ctf ctf.pkl --poses pose.pkl --datadir Extract/jobxxx/Micrographs/ --encode-mode tilt --dose-per-tilt 3.0 --zdim 8 -n 50 --beta 0.025 -o my_output_directory/04_trainvae_256
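Because the training commands in this workflow differ in only a few arguments, it can help to build them from one set of shared settings, which guards against typos such as a single-dash option. The following is a minimal sketch in Python. The helper name and its defaults are hypothetical, and it uses only flags that appear in the commands on this page.

```python
import shlex
from typing import List, Optional


def train_vae_command(
    particles: str,
    outdir: str,
    *,
    ctf: str = "ctf.pkl",
    poses: str = "pose.pkl",
    dose_per_tilt: float = 3.0,
    zdim: int = 8,
    epochs: int = 50,
    beta: float = 0.025,
    datadir: Optional[str] = None,
    ind: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a tilt-series `cryodrgn train_vae` run."""
    cmd = [
        "cryodrgn", "train_vae", particles,
        "--ctf", ctf,
        "--poses", poses,
        "--encode-mode", "tilt",
        "--dose-per-tilt", str(dose_per_tilt),
        "--zdim", str(zdim),
        "-n", str(epochs),
        "--beta", str(beta),
        "-o", outdir,
    ]
    # Optional arguments are appended only when provided.
    if datadir is not None:
        cmd += ["--datadir", datadir]
    if ind is not None:
        cmd += ["--ind", ind]
    return cmd


def as_shell(cmd: List[str]) -> str:
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

The returned list can be passed to subprocess.run(cmd, check=True), or printed with as_shell to review the command before launching a long training job.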

Feedback

Please file a GitHub issue or contact Ellen (zhonge@princeton.edu) with any questions or feedback!
