CryoDRGN-ET Subtomogram Analysis
how to perform heterogeneous reconstruction using cryo-ET subtomograms
Now available as part of a production/stable release in cryodrgn version 3.3.0! See News and Release Notes
CryoDRGN-ET for subtomogram analysis has been available since version 3.0.0 through additional flags passed to the train_vae
command:
Note that --encode-mode tilt
as well as a given value for --dose-per-tilt
are required to activate subtomogram analysis, while --angle-per-tilt
is optional with a default value of 3.
We describe here a typical workflow for preparing tilt series inputs for use with cryoDRGN heterogeneous reconstruction, training a reconstruction model, and analyzing its outputs. See our preprint here for a description of the cryoDRGN-ET method and associated results.
Export particles
cryoDRGN-ET expects 2D particle tilt series images in a .star
file exported from Windows Warp/M. 2D particle tilt series images without CTF premultiplication can be extracted in RELION5 using the --no_ctf
option; however, RELION5 and Linux Warp/M star files are not currently supported.
If you intend to perform subvolume refinement or tomogram visualization, we recommend also exporting highly binned subvolumes to generate a 3D star file at this stage, since additional Warp/M processing could result in particle reordering.
Prepare cryoDRGN-ET input files
Here we will assume our tilt series images have been exported to the file particles_from_M.star
, and that we have already loaded a conda environment named cryodrgn
with cryoDRGN installed. We will need to extract separate .pkl
files containing CTF parameters and pose estimates from this .star
file for use with cryoDRGN commands.
Downsample images (if necessary)
To reduce cryoDRGN runtimes, you may first want to consider downsampling your images to a more manageable size. This must be done for each of the image stacks referenced in the .star
file individually.
For example, if /data/subtiltstacks/
is the --datadir
containing these stacks, you can use the following bash command to downsample each file in this directory to a size of 128x128 and store the new stacks in a new directory downsampled-128/
:
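As a rough sketch of this per-stack loop (written in Python rather than bash), the snippet below assembles one `cryodrgn downsample` call per stack and prints it. The `*.mrcs` glob pattern and the helper name `downsample_commands` are assumptions made for illustration; adapt them to how your stacks are actually named.

```python
import shlex
from pathlib import Path


def downsample_commands(stacks, out_dir, box_size=128):
    """Build one `cryodrgn downsample` call per image stack (hypothetical helper)."""
    out_dir = Path(out_dir)
    commands = []
    for stack in stacks:
        stack = Path(stack)
        commands.append([
            "cryodrgn", "downsample", str(stack),
            "-D", str(box_size),              # new box size in pixels
            "-o", str(out_dir / stack.name),  # keep the original file name
        ])
    return commands


# Assumption: the stacks referenced by the .star file end in .mrcs.
stacks = sorted(Path("/data/subtiltstacks").glob("*.mrcs"))
for cmd in downsample_commands(stacks, "downsampled-128", box_size=128):
    # Printed for review; run with subprocess.run(cmd, check=True) once verified.
    print(shlex.join(cmd))
```

Keeping the original file names in the output directory is what allows the same .star file to be reused with a different `--datadir`.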
For subsequent commands you can now use downsampled-128/
with the --datadir
argument (and the same .star
file) to use these downsampled subtilts instead.
Parse additional pose and CTF information
Obtaining the additional CTF parameter and pose estimate files can be done using the utility commands installed as part of cryoDRGN:
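A minimal sketch of the two parsing calls is given below, assuming the output names `ctf.pkl` and `pose.pkl`. The helper `parse_commands` is hypothetical, and the box size and pixel size in the usage lines are placeholder values, not values from any real dataset.

```python
import shlex


def parse_commands(star_file, ctf_out="ctf.pkl", pose_out="pose.pkl",
                   box_size=None, apix=None):
    """Build the parse_ctf_star and parse_pose_star calls (hypothetical helper)."""
    extra = []
    if box_size is not None:
        extra += ["-D", str(box_size)]   # original image size in pixels
    if apix is not None:
        extra += ["--Apix", str(apix)]   # original pixel size in A/px
    return [
        ["cryodrgn", "parse_ctf_star", star_file, "-o", ctf_out] + extra,
        ["cryodrgn", "parse_pose_star", star_file, "-o", pose_out] + extra,
    ]


# Placeholder values: pass box_size/apix only if the .star file lacks them.
for cmd in parse_commands("particles_from_M.star", box_size=256, apix=3.0):
    print(shlex.join(cmd))
```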
Note that these commands require you to specify the original resolution and A/px values if these are not listed in the .star
file under fields such as _rlnImageSize
and _rlnImagePixelSize
.
Perform a sanity check using backprojection
We can confirm our inputs were correctly parsed using traditional homogeneous reconstruction. This step will run 10x faster on a GPU compute node! Note that this command also requires extra metadata on how the tilts were collected.
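The backprojection call might be assembled along the following lines. This is a sketch under assumptions: the `--tilt` switch and the dose value are our best reading of the `backproject_voxel` options and should be checked against `cryodrgn backproject_voxel --help` for your installed version; whether `-o` names a file or a directory also depends on the version.

```python
import shlex


def backproject_command(star_file, datadir, ctf, poses, out,
                        dose_per_tilt, angle_per_tilt=None):
    """Build a tilt-series backprojection call (hypothetical helper)."""
    cmd = [
        "cryodrgn", "backproject_voxel", star_file,
        "--datadir", datadir,
        "--ctf", ctf,
        "--poses", poses,
        "-o", out,
        "--tilt",                              # assumed flag: treat input as tilt series
        "--dose-per-tilt", str(dose_per_tilt),
    ]
    if angle_per_tilt is not None:
        cmd += ["--angle-per-tilt", str(angle_per_tilt)]
    return cmd


# 2.93 is a placeholder dose; use the value from your own collection scheme.
print(shlex.join(backproject_command(
    "particles_from_M.star", "downsampled-128", "ctf.pkl", "pose.pkl",
    "backproject", dose_per_tilt=2.93)))
```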
Once we have obtained our input files, we are ready to train the reconstruction model on our tilt series. Here we use an example command using the files described above:
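A hedged sketch of such a training call is shown below. The latent dimension, the dose value, and the output directory name are placeholder assumptions, and `train_vae_command` is a hypothetical helper that only assembles the argument list.

```python
import shlex


def train_vae_command(star_file, datadir, ctf, poses, outdir, dose_per_tilt,
                      zdim=8, epochs=50, beta=0.025, angle_per_tilt=None):
    """Build a cryoDRGN-ET train_vae call (hypothetical helper)."""
    cmd = [
        "cryodrgn", "train_vae", star_file,
        "--datadir", datadir,
        "--ctf", ctf,
        "--poses", poses,
        "--zdim", str(zdim),
        "-n", str(epochs),
        "--beta", str(beta),
        "--encode-mode", "tilt",               # activates subtomogram analysis
        "--dose-per-tilt", str(dose_per_tilt),
        "-o", outdir,
    ]
    if angle_per_tilt is not None:             # omitted: the default applies
        cmd += ["--angle-per-tilt", str(angle_per_tilt)]
    return cmd


# Placeholder dose and output directory; file names follow the steps above.
print(shlex.join(train_vae_command(
    "particles_from_M.star", "downsampled-128", "ctf.pkl", "pose.pkl",
    "00_tilt_z8", dose_per_tilt=2.93)))
```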
In particular:
--encode-mode tilt
is required to properly treat tilt series data
In our current experiments, we use a KL regularization weight of --beta 0.025
. We recommend this setting as a starting point for all tilt series experiments!
--dose-per-tilt
and --angle-per-tilt
are used for dose exposure correction. The default value of --angle-per-tilt
is 3 degrees and is left off of the example command.
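To give a feel for how the two tilt parameters could enter a dose exposure correction, the sketch below computes a per-tilt weight. It assumes a dose-symmetric collection scheme, the exposure filter of Grant and Grigorieff (2015) with its 300 kV constants, and an additional cosine weighting by tilt angle; these are illustrative assumptions, and cryoDRGN's own implementation may differ in detail.

```python
import math


def tilt_angle(order, angle_per_tilt=3.0):
    """Absolute tilt angle (degrees) of the image acquired at 0-based position
    `order`, assuming a dose-symmetric scheme: 0, 3, 3, 6, 6, ..."""
    return angle_per_tilt * math.ceil(order / 2)


def critical_exposure(freq):
    """Critical exposure (e-/A^2) at spatial frequency `freq` (1/A, > 0),
    using the 300 kV fit from Grant and Grigorieff (2015)."""
    return 0.24499 * freq ** -1.6649 + 2.8141


def tilt_weight(order, freq, dose_per_tilt, angle_per_tilt=3.0):
    """Illustrative weight of one tilt image at one spatial frequency."""
    cumulative_dose = dose_per_tilt * (order + 1)  # dose received so far
    exposure = math.exp(-0.5 * cumulative_dose / critical_exposure(freq))
    return exposure * math.cos(math.radians(tilt_angle(order, angle_per_tilt)))


# Placeholder dose of 2.93 e-/A^2 per tilt, evaluated at 10 A resolution.
for order in (0, 5, 20):
    print(order, tilt_angle(order), round(tilt_weight(order, 0.1, 2.93), 3))
```

Under these assumptions, later and more highly tilted images contribute less, and high spatial frequencies are attenuated faster than low ones.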
Training a model on 16,655 particles for 50 epochs on one A100 GPU took 3 h 38 min.
Once a cryoDRGN-ET model has finished training, use cryodrgn analyze
to visualize the latent space and generate volumes.
This portion of the analysis is similar to the workflow in single-particle cryoDRGN. See the EMPIAR-10076 tutorial for further documentation.
Landscape analysis: The commands cryodrgn analyze_landscape
and cryodrgn analyze_landscape_full
can be used for further analysis on the landscape of reconstructed volumes (as opposed to the landscape of latent space co-ordinates). See the cryoDRGN landscape analysis tutorial for more information.
Particle selection: We implemented a standalone filtering tool to enable lasso selection from the UMAP representation outside of the Jupyter notebook. Run cryodrgn filter .
from your results directory to launch an interactive plot in an X11 window. If running remotely, you must be connected with ssh -Y.
Star file filtering: Particle indices identified by cryoDRGN-ET can be used to filter 2D subtilt star files from Warp/M or 3D subvolume star files for downstream processing or visualization in ArtiaX. Filtering a 3D subvolume star file with cryodrgn_utils filter_star
using the --micrograph-files
or -m
option produces a directory containing one star file per tomogram.
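The logic behind the per-tomogram output can be illustrated with a small stand-alone sketch. It is not how `cryodrgn_utils filter_star` is implemented; it only shows the idea of keeping the selected rows and grouping them by tomogram. The field name `_rlnMicrographName`, the example rows, and the assumption that row order matches the cryoDRGN-ET particle order are all hypothetical.

```python
from collections import defaultdict


def split_by_tomogram(rows, keep, tomogram_field="_rlnMicrographName"):
    """Keep the rows whose position is in `keep` and group them by tomogram.

    Assumes one row per particle (3D subvolume star file) in the same
    order as the particles used for training.
    """
    keep = set(keep)
    per_tomogram = defaultdict(list)
    for index, row in enumerate(rows):
        if index in keep:
            per_tomogram[row[tomogram_field]].append(row)
    return dict(per_tomogram)


# Hypothetical rows standing in for a parsed 3D subvolume star file.
rows = [
    {"_rlnMicrographName": "tomo_01", "_rlnCoordinateX": 10.0},
    {"_rlnMicrographName": "tomo_02", "_rlnCoordinateX": 22.5},
    {"_rlnMicrographName": "tomo_01", "_rlnCoordinateX": 31.0},
]
for tomogram, kept in split_by_tomogram(rows, keep=[0, 2]).items():
    print(tomogram, len(kept))  # one output star file per tomogram
```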
As in the reconstruction workflow for single particle analysis, we recommend an iterative process for training successive cryoDRGN models on a new dataset:
First, train on lower resolution images (e.g. D=128) using a relatively small architecture (fast) as an initial pass to sanity check results and remove junk particles:
After creating a particle filter (--ind chosen_particles.pkl
), train a larger model with the --enc-dim 1024
and --dec-dim 1024
arguments, which will have more parameters and can potentially learn more heterogeneity:
Optional: filter the 3D star file for further refinement in RELION/M, then export new 2D and 3D subtomograms.
Finally, after validation, pose optimization, and any necessary particle filtering, train on the full resolution image stack (up to D=256) with a large architecture:
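The three training rounds above can be summarized as a small table of per-stage settings. The sketch below prints only the flags that change between rounds, to be appended to a `cryodrgn train_vae` call. The output directory names, the small architecture size of 256, and the `downsampled-256` directory are illustrative assumptions.

```python
import shlex

# Hypothetical per-stage settings; adjust names and sizes to your dataset.
STAGES = [
    {"name": "01_pilot_128", "datadir": "downsampled-128", "dim": 256, "ind": None},
    {"name": "02_filtered_128", "datadir": "downsampled-128", "dim": 1024,
     "ind": "chosen_particles.pkl"},
    {"name": "03_full_256", "datadir": "downsampled-256", "dim": 1024, "ind": None},
]


def stage_flags(stage):
    """Return the train_vae flags that differ from one stage to the next."""
    flags = [
        "--datadir", stage["datadir"],
        "-o", stage["name"],
        "--enc-dim", str(stage["dim"]),
        "--dec-dim", str(stage["dim"]),
    ]
    if stage["ind"]:
        flags += ["--ind", stage["ind"]]  # restrict training to selected particles
    return flags


for stage in STAGES:
    print(stage["name"] + ":", shlex.join(stage_flags(stage)))
```

If particles are re-exported after refinement, the .star file and the CTF and pose .pkl files for the final stage need to be regenerated from the new export as well.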
Please file a GitHub issue or contact Ellen (zhonge@princeton.edu) with any questions or feedback!