Ab-initio reconstruction with the assembling ribosome dataset using cryoDRGN-AI
Here we provide a walkthrough of the ab-initio analysis of the assembling ribosome dataset (EMPIAR-10076) from Figure 5 of Zhong et al., performed using the cryodrgn abinit command.
We will follow the general recommended workflow for cryoDRGN training:
First, train on lower-resolution images (e.g. D=128) using the default (fast) architecture as an initial pass to sanity-check results and remove junk particles. (D=128, smaller 256x3 architecture)
After any particle filtering, train a larger or longer-running model using the --dim 1024 and --epochs-pose-search=5 arguments, which can potentially learn more heterogeneity. (D=128, larger 1024x3 architecture)
Finally, after validation, pose optimization, and any necessary particle filtering, train on the full-resolution image stack (up to D=256) with a large architecture. (D=256, larger 1024x3 architecture)
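To make the motivation for starting at low resolution concrete, the short Python sketch below compares the pixel count per image and the approximate uncompressed stack size at D=128 and D=256. It assumes square images stored as 32-bit floats, and it borrows the particle count (131,899) from the training log shown later in this walkthrough; the storage format is an assumption for illustration, not something stated by cryoDRGN.

```python
# Rough size comparison between the downsampled and full-resolution stacks.
# Assumptions (for illustration only): images are square D x D arrays of
# 32-bit floats, and the dataset holds the 131,899 particles reported in
# the training log of this walkthrough.

N_PARTICLES = 131_899   # particle count taken from the run.log excerpt
BYTES_PER_PIXEL = 4     # assumed float32 storage


def stack_size_gb(n_particles: int, box_size: int) -> float:
    """Approximate size of an uncompressed particle stack in gigabytes."""
    return n_particles * box_size * box_size * BYTES_PER_PIXEL / 1e9


for d in (128, 256):
    size = stack_size_gb(N_PARTICLES, d)
    print(f"D={d}: {d * d} pixels per image, roughly {size:.1f} GB")
```

Doubling the box size quadruples the number of pixels per image, which is one reason the initial sanity-check pass is run at D=128 before moving to the full-resolution stack.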
For a full step-by-step tutorial that includes all of the preprocessing steps required to prepare an input dataset for analysis with cryoDRGN, see the original tutorial, which is broadly similar to the material below but uses the cryodrgn train_vae command.
1) Initial cryoDRGN-AI training
We begin by running the default architecture, which is designed to run relatively quickly to provide an initial pass for checking results and filtering particles. We assume here that we have downsampled our particles to D=128 at particles.128.txt:
Note that we are also using the --multigpu argument so that the command uses all four GPUs available on our compute node!
With four H100 GPUs, we were able to finish training this model in about 40 minutes, with pose search epochs taking roughly 15 minutes each and SGD epochs taking about 30 seconds each. This information, and much else, is available in the log file run.log saved by cryoDRGN in the output folder:
50S_abinit/001_init.128/run.log
...
# [Train Epoch: 30/30] [104960/131899 particles]
# [Train Epoch: 30/30] [115200/131899 particles]
# [Train Epoch: 30/30] [124928/131899 particles]
# =====> SGD Epoch: 30 finished in 0:00:24.261025; total loss = 0.650518
Finished in 0:40:40.433245 (0:01:21.347775 per epoch)
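If you want to tabulate these timings rather than read them by eye, a small script can pull them out of run.log. The sketch below is only an illustration: the regular expression is inferred from the excerpt above (it has not been checked against other cryoDRGN versions or against the lines written for pose search epochs), and the helper names parse_duration and parse_epochs are hypothetical.

```python
import re
from datetime import timedelta

# Lines copied from the run.log excerpt above; in practice you would read
# the text from 50S_abinit/001_init.128/run.log instead.
LOG_EXCERPT = """\
# [Train Epoch: 30/30] [124928/131899 particles]
# =====> SGD Epoch: 30 finished in 0:00:24.261025; total loss = 0.650518
Finished in 0:40:40.433245 (0:01:21.347775 per epoch)
"""

# Pattern for epoch summary lines. The format is inferred from the excerpt
# only, so treat it as an assumption rather than a stable log specification.
EPOCH_RE = re.compile(
    r"=====> (?P<kind>.+?) Epoch: (?P<epoch>\d+) finished in "
    r"(?P<time>\d+:\d{2}:\d{2}(?:\.\d+)?); total loss = (?P<loss>[-+0-9.eE]+)"
)


def parse_duration(text: str) -> timedelta:
    """Convert an 'H:MM:SS.ffffff' string into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def parse_epochs(log_text: str) -> list:
    """Return one record per epoch summary line found in the log text."""
    records = []
    for line in log_text.splitlines():
        match = EPOCH_RE.search(line)
        if match:
            records.append({
                "kind": match.group("kind"),
                "epoch": int(match.group("epoch")),
                "seconds": parse_duration(match.group("time")).total_seconds(),
                "loss": float(match.group("loss")),
            })
    return records


epochs = parse_epochs(LOG_EXCERPT)
for rec in epochs:
    print(f"{rec['kind']} epoch {rec['epoch']}: {rec['seconds']:.1f} s, loss {rec['loss']:.6f}")

# Cross-check the reported average: total wall time divided by 30 epochs.
total_seconds = parse_duration("0:40:40.433245").total_seconds()
print(f"mean time per epoch: {total_seconds / 30:.2f} s")
```

The mean computed on the last line agrees with the 0:01:21 per epoch that cryoDRGN reports, which is a quick way to confirm the durations are being parsed correctly.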
Analyzing cryoDRGN-AI results
The abinit command, like the other reconstruction commands included in cryoDRGN, runs cryodrgn analyze on the final output epoch once model training is complete. The output of these analyses can be found in our cryoDRGN experiments folder under analyze.30/:
These outputs are the same as for commands such as cryodrgn train_vae and are fully described in our main tutorial. Here we note that our ab-initio model still identifies heterogeneity, as shown in the visualization of the z-latent-space generated by the model:
z-latent-space UMAP embeddings found at analyze.30/kmeans20/umap_hex.png
The annotated points in this visualization identify images chosen by a k-means algorithm as centroids of clusters in the z-latent-space. We can use ChimeraX to visualize the corresponding volumes that cryoDRGN-AI has reconstructed for these centroid images, which are saved in the same folder (analyze.30/kmeans20/).
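As a rough illustration of how centroid images like these can be picked, the sketch below runs a plain Lloyd's k-means on a toy two-dimensional latent space and then selects, for each cluster center, the nearest actual data point. Everything here is an assumption made for illustration: the synthetic z values, the choice of k=2, and the helper names are hypothetical, and the clustering code inside cryoDRGN may differ in its details.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign(points, centers)


def nearest_point_indices(points, centers):
    """For each center, return the index of the closest actual data point."""
    return [min(range(len(points)), key=lambda i: sq_dist(points[i], c))
            for c in centers]


# Toy latent space: two well-separated blobs of 100 particles each.
rng = random.Random(1)
z = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(100)]
z += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(100)]

centers, labels = kmeans(z, k=2)
representatives = nearest_point_indices(z, centers)
print("representative particle indices:", representatives)
```

Selecting the nearest data point, rather than using the cluster mean itself, means every representative corresponds to a real particle image, which matches the way the annotated points in the UMAP plot are described above.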
2) Retraining with more pose search epochs and filtering
Let's use the cryodrgn filter tool to interactively select one of the clusters identified in the z-latent-space and create a filtering index: