CryoDRGN-ET Subtomogram Analysis

How to perform heterogeneous reconstruction using cryo-ET subtomograms

Now available as part of a production/stable release in cryodrgn version 3.3.0! See News and Release Notes

CryoDRGN-ET for subtomogram analysis is available in versions 3.0.0 and later through additional flags passed to the train_vae command:

cryodrgn train_vae particles_from_M.star --encode-mode tilt --dose-per-tilt 2.93 --angle-per-tilt 3.0 --ctf ctf.pkl --poses pose.pkl --datadir /data/subtiltstacks/ --zdim 8 -n 50 --beta 0.025 -o my_output_directory/

Note that --encode-mode tilt and a value for --dose-per-tilt are both required to activate subtomogram analysis, while --angle-per-tilt is optional and defaults to 3 degrees.

We describe here a typical workflow for preparing tilt series inputs for use with cryoDRGN heterogeneous reconstruction, training a reconstruction model, and analyzing its outputs. See our preprint here for a description of the cryoDRGN-ET method and associated results.

Preprocessing

Export particles

CryoDRGN-ET expects 2D particle tilt series images in a .star file exported from Windows Warp/M. From RELION5, 2D particle tilt series images without CTF premultiplication can be extracted using the --no_ctf option. We are currently beta-testing a conversion script; please contact us if you are interested in testing cryoDRGN-ET with particles exported from RELION5 --tomo. Additionally, we are actively working to support CTF-corrected images exported from WarpTools.

If you intend to perform subvolume refinement or tomogram visualization, we recommend also exporting highly binned subvolumes to generate a 3D star file at this stage, since additional Warp/M processing could result in particle reordering (!), which would affect downstream refinement if filtering with cryoDRGN-ET.

Prepare cryoDRGN-ET input files

Here we will assume our tilt series images have been exported to the file particles_from_M.star, and that we have already loaded a conda environment named cryodrgn with cryoDRGN installed. We will need to extract separate .pkl files containing CTF parameters and pose estimates from this .star file for use with cryoDRGN commands.

Example 2D star file: FAS_fromM_subtilts.star (13 KB)

Downsample images (if necessary)

To reduce cryoDRGN runtimes, you may first want to consider downsampling your images to a more manageable size. This must be done for each of the image stacks referenced in the .star file individually.

For example, if /data/subtiltstacks/ is the --datadir containing these stacks, you can use the following bash command to downsample each file in this directory to a size of 128x128 and store the new stacks in a new directory downsampled-128/:

(cryodrgn) $ cd /data/subtiltstacks/ && mkdir -p downsampled-128 && for fl in *.mrcs; do cryodrgn downsample "$fl" -o "downsampled-128/$fl" -D 128; done

For subsequent commands you can now use downsampled-128/ with the --datadir argument (and the same .star file) to use these downsampled subtilts instead.
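Downsampling also changes the effective pixel size, which is worth keeping in mind when judging the resolution of later reconstructions. As a minimal sketch (the function names are hypothetical and not part of cryoDRGN, and it assumes downsampling keeps the same field of view, as with Fourier cropping), the arithmetic for the example values used on this page, a 256-pixel box at 1.96 Å/px reduced to 128 pixels, is:

```python
def downsampled_apix(apix: float, orig_size: int, new_size: int) -> float:
    """Effective pixel size (A/px) after shrinking a box from orig_size to new_size pixels."""
    if new_size > orig_size:
        raise ValueError("new_size must not exceed orig_size")
    return apix * orig_size / new_size


def nyquist_resolution(apix: float) -> float:
    """Finest resolution (A) that can be represented at a given pixel size."""
    return 2.0 * apix


# Example values from this page: D=256 at 1.96 A/px, downsampled to D=128
new_apix = downsampled_apix(1.96, 256, 128)
print(f"{new_apix:.2f} A/px, Nyquist limit {nyquist_resolution(new_apix):.2f} A")
```

At D=128 the pixel size doubles to 3.92 Å/px, so volumes from the downsampled stacks cannot resolve features finer than 7.84 Å.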

Parse additional pose and CTF information

Obtaining the additional CTF parameter and pose estimate files can be done using the utility commands installed as part of cryoDRGN:

(cryodrgn) $ cryodrgn parse_pose_star particles_from_M.star -o pose.pkl -D 256 --Apix 1.96
(cryodrgn) $ cryodrgn parse_ctf_star particles_from_M.star -o ctf.pkl -D 256 --Apix 1.96

Note that these commands require you to specify the original image size in pixels (-D) and the pixel size in Å/px (--Apix) if these are not listed in the .star file under fields such as _rlnImageSize and _rlnImagePixelSize.
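To check that parsing produced something sensible, the resulting .pkl files can be opened with Python's pickle module. The sketch below is a generic summary helper (summarize and summarize_pickle are hypothetical names, not cryoDRGN functions). It assumes nothing about the file layout; it only reports the shape of array-like objects and looks inside short tuples or lists. The demonstration uses a small stand-in pickle rather than real cryoDRGN output.

```python
import pickle
import tempfile
from pathlib import Path


def summarize(obj, depth=0):
    """Return lines describing a loaded object: shapes for arrays, lengths for sequences."""
    pad = "  " * depth
    shape = getattr(obj, "shape", None)
    if shape is not None:  # NumPy arrays and similar objects
        return [f"{pad}array shape={tuple(shape)} dtype={getattr(obj, 'dtype', '?')}"]
    if isinstance(obj, (tuple, list)):
        lines = [f"{pad}{type(obj).__name__} of length {len(obj)}"]
        # Only descend into short containers (e.g. a pair of arrays), not long data lists
        if len(obj) <= 4:
            for item in obj:
                lines.extend(summarize(item, depth + 1))
        return lines
    return [f"{pad}{type(obj).__name__}: {obj!r}"]


def summarize_pickle(path):
    """Load a pickle file and summarize its contents."""
    with open(path, "rb") as f:
        return summarize(pickle.load(f))


# Stand-in data for demonstration only
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.pkl"
    demo.write_bytes(pickle.dumps(([1.0, 2.0, 3.0, 4.0, 5.0], [0.1, 0.2, 0.3, 0.4, 0.5])))
    demo_summary = summarize_pickle(demo)

print("\n".join(demo_summary))
```

Inside the cryodrgn environment, calling summarize_pickle("pose.pkl") and summarize_pickle("ctf.pkl") gives a quick view of the array shapes, which you can compare against the number of tilt images in the .star file.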

Perform a sanity check using backprojection

We can confirm our inputs were correctly parsed using traditional homogeneous reconstruction. This step runs roughly 10x faster on a GPU compute node. Note that this command also requires extra metadata on how the tilts were collected (here, --dose-per-tilt).

(cryodrgn) $ cryodrgn backproject_voxel particles_from_M.star --datadir /data/subtiltstacks/ --ctf ctf.pkl --poses pose.pkl --dose-per-tilt 2.93 -o reconstruct.mrc

CryoDRGN-ET training

Once we have obtained our input files, we are ready to train the reconstruction model on our tilt series. Here is an example command using the files described above:

(cryodrgn) $ cryodrgn train_vae particles_from_M.star --ctf ctf.pkl --poses pose.pkl --datadir /data/subtiltstacks/ --encode-mode tilt --dose-per-tilt 2.93 --zdim 8 -n 50 --beta 0.025 -o my_output_directory/

In particular:

--encode-mode tilt is required to properly treat tilt series data
In our current experiments, we use a KL regularization weight of --beta 0.025. We recommend this setting as a starting point for all tilt series experiments!
--dose-per-tilt and --angle-per-tilt are used for dose exposure correction. The default value of --angle-per-tilt is 3 degrees and is left off of the example command.
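To give a feel for what dose exposure correction involves, here is a minimal sketch of a widely used exposure filter, the critical-exposure model of Grant and Grigorieff (2015) with its 300 kV constants. It illustrates the general idea only and is not taken from the cryoDRGN source. The function names, the assumption that tilt i has accumulated (i + 1) times the dose per tilt, and the dose-symmetric ordering used to turn a tilt index into an angle are all illustrative assumptions.

```python
import math

# Constants fitted by Grant & Grigorieff (2015) for 300 kV data
A, B, C = 0.245, -1.665, 2.81


def critical_exposure(spatial_freq: float) -> float:
    """Critical exposure (e-/A^2) at a spatial frequency given in 1/A."""
    return A * spatial_freq ** B + C


def dose_weight(spatial_freq: float, cumulative_dose: float) -> float:
    """Exposure filter exp(-N / (2 * Ne(k))): 1 at zero dose, smaller as dose accumulates."""
    return math.exp(-0.5 * cumulative_dose / critical_exposure(spatial_freq))


def abs_tilt_angle(tilt_index: int, angle_per_tilt: float) -> float:
    """Assumed dose-symmetric scheme: |angle| = 0, step, step, 2*step, 2*step, ..."""
    return math.ceil(tilt_index / 2) * angle_per_tilt


dose_per_tilt, angle_per_tilt = 2.93, 3.0  # values from the example command
freq = 1 / 10.0  # spatial frequency corresponding to 10 A resolution
for i in (0, 10, 40):
    dose = (i + 1) * dose_per_tilt  # assumption: dose accumulated after tilt i
    print(i, abs_tilt_angle(i, angle_per_tilt), round(dose_weight(freq, dose), 3))
```

Under this model, later tilts, which have received more dose and usually sit at higher angles, contribute less, particularly at high spatial frequencies. This is why the per-tilt dose has to be supplied.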

Training a model on 16,655 particles for 50 epochs on a single A100 GPU took 3 h 38 min.

Analysis

Once a cryoDRGN-ET model has finished training, use cryodrgn analyze to visualize the latent space and generate volumes.

(cryodrgn) $ cryodrgn analyze my_output_directory 49 # or replace with a different epoch number

This portion of the analysis is similar to the workflow in single-particle cryoDRGN. See the EMPIAR-10076 tutorial for further documentation.
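The epoch passed to cryodrgn analyze is 49 here because training ran with -n 50 and this example counts epochs from zero. If you are unsure which epochs were saved, a small helper can look for them. The sketch below assumes latent embeddings are written as z.<epoch>.pkl in the output directory. Both that file-name pattern and the function name latest_epoch are assumptions to check against your own output directory.

```python
import re
import tempfile
from pathlib import Path


def latest_epoch(outdir):
    """Return the highest epoch N for which a file named z.N.pkl exists, or None."""
    pattern = re.compile(r"^z\.(\d+)\.pkl$")
    epochs = [
        int(m.group(1))
        for p in Path(outdir).iterdir()
        if (m := pattern.match(p.name))
    ]
    return max(epochs) if epochs else None


# Demonstration on a stand-in directory of empty files, not real training output
with tempfile.TemporaryDirectory() as tmp:
    for name in ("z.0.pkl", "z.9.pkl", "z.49.pkl", "weights.49.pkl", "z.pkl"):
        (Path(tmp) / name).touch()
    demo_epoch = latest_epoch(tmp)

print(demo_epoch)
```

The returned number can then be used as the epoch argument to cryodrgn analyze.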

Additional tools

Landscape analysis: The commands cryodrgn analyze_landscape and cryodrgn analyze_landscape_full can be used for further analysis of the landscape of reconstructed volumes (as opposed to the landscape of latent space coordinates). See the cryoDRGN landscape analysis tutorial for more information.

Particle selection: We implemented a standalone filtering tool to enable lasso selection from the UMAP representation outside of the Jupyter notebook. Run cryodrgn filter . from your results directory to launch an interactive plot in an X11 window. If running remotely, you must be connected with ssh -Y.

Star file filtering: Particle indices identified by cryoDRGN-ET can be used to filter 2D subtilt star files from Warp/M or 3D subvolume star files for downstream processing or visualization in ArtiaX. Filtering a 3D subvolume star file with cryodrgn_utils filter_star using the --micrograph-files or -m option produces a directory containing one star file per tomogram.
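Selections are passed between these tools as a pickled set of particle indices (for example the chosen_particles.pkl used with --ind below). As a hedged sketch of how such a file could be produced by hand, the following inverts a list of junk indices and pickles the result. The helper names are hypothetical, and the sketch assumes the index file is a pickled NumPy integer array of zero-based particle indices; compare with a file written by cryodrgn filter or the Jupyter notebook before relying on it.

```python
import pickle
import tempfile
from pathlib import Path

try:
    import numpy as np  # available in the cryodrgn environment
except ImportError:  # keeps the sketch runnable without NumPy
    np = None


def keep_indices(n_particles, junk):
    """Zero-based indices of all particles not listed in junk, in ascending order."""
    junk = set(junk)
    bad = [i for i in junk if not 0 <= i < n_particles]
    if bad:
        raise ValueError(f"indices out of range: {sorted(bad)}")
    return [i for i in range(n_particles) if i not in junk]


def save_indices(indices, path):
    """Pickle the indices, as an integer array when NumPy is available."""
    data = np.asarray(indices, dtype=int) if np is not None else list(indices)
    with open(path, "wb") as f:
        pickle.dump(data, f)


# Toy example: 10 particles, 3 of them flagged as junk
kept = keep_indices(10, junk=[2, 5, 7])
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "chosen_particles.pkl"
    save_indices(kept, out)
    with open(out, "rb") as f:
        reloaded = [int(i) for i in pickle.load(f)]

print(reloaded)
```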

Experiment Workflow

As with the reconstruction workflow for single particle analysis, we recommend an iterative process for training successive cryoDRGN models on a new dataset:

First, train on lower resolution images (e.g. D=128) using a relatively small architecture, which trains quickly, as an initial pass to sanity check results and remove junk particles:

(cryodrgn) $ cryodrgn train_vae particles_from_M.star --ctf ctf.pkl --poses pose.pkl --datadir downsampled-128/ --encode-mode tilt --dec-dim=256 --enc-dim=256 --dose-per-tilt 2.93 --zdim 8 -n 50 --beta 0.025 -o my_output_directory/001_small.128

After creating a particle filter (--ind chosen_particles.pkl), train a larger model with the --enc-dim 1024 and --dec-dim 1024 arguments, which will have more parameters and can potentially learn more heterogeneity:

(cryodrgn) $ cryodrgn train_vae particles_from_M.star --ctf ctf.pkl --poses pose.pkl --datadir downsampled-128/ --encode-mode tilt --ind chosen_particles.pkl --dec-dim=1024 --enc-dim=1024 --dose-per-tilt 2.93 --zdim 8 -n 50 --beta 0.025 -o my_output_directory/002_big.128

Optional: filter the 3D star file for further refinement in RELION/M, then export new 2D and 3D subtomograms.

Finally, after validation, pose optimization, and any necessary particle filtering, train on the full resolution image stack (up to D=256) with a large architecture:

(cryodrgn) $ cryodrgn train_vae particles_from_M.star --ctf ctf.pkl --poses pose.pkl --datadir /data/subtiltstacks/ --encode-mode tilt --ind chosen_particles.pkl --dec-dim=1024 --enc-dim=1024 --dose-per-tilt 2.93 --zdim 8 -n 50 --beta 0.025 -o my_output_directory/003_big.256
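The three training runs above differ only in the image directory, the network size, the optional --ind filter, and the output directory. One way to keep them consistent is to generate the commands from a single template. The sketch below only assembles and prints command lines using the flags and file names shown on this page. The helper name train_vae_command and the STAGES table are illustrative and not part of cryoDRGN.

```python
import shlex

# Arguments shared by every stage (values from the examples on this page)
COMMON = [
    "--ctf", "ctf.pkl", "--poses", "pose.pkl",
    "--encode-mode", "tilt", "--dose-per-tilt", "2.93",
    "--zdim", "8", "-n", "50", "--beta", "0.025",
]

# (datadir, layer dimension, use the particle filter?, output directory)
STAGES = [
    ("downsampled-128/", 256, False, "my_output_directory/001_small.128"),
    ("downsampled-128/", 1024, True, "my_output_directory/002_big.128"),
    ("/data/subtiltstacks/", 1024, True, "my_output_directory/003_big.256"),
]


def train_vae_command(datadir, dim, filtered, outdir, star="particles_from_M.star"):
    """Build the argument list for one training stage."""
    cmd = ["cryodrgn", "train_vae", star, "--datadir", datadir, *COMMON]
    cmd += ["--enc-dim", str(dim), "--dec-dim", str(dim)]
    if filtered:
        cmd += ["--ind", "chosen_particles.pkl"]
    return cmd + ["-o", outdir]


commands = [train_vae_command(*stage) for stage in STAGES]
for cmd in commands:
    print(shlex.join(cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True) to launch the corresponding run once the previous stage has been inspected.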

Feedback

Please file a GitHub issue or contact Ellen (zhonge@princeton.edu) with any questions or feedback!

