# CryoDRGN-ET Subtomogram Analysis

{% hint style="info" %}
Now available as part of a production/stable release in cryodrgn version 3.3.0! See [news-and-release-notes](https://ez-lab.gitbook.io/cryodrgn/cryodrgn-user-guide/news-and-release-notes "mention")
{% endhint %}

CryoDRGN-ET for subtomogram analysis is available in cryoDRGN version 3.0.0 and later through additional flags passed to the `train_vae` command:

{% code overflow="wrap" %}

```
cryodrgn train_vae particles_from_M.star --encode-mode tilt --dose-per-tilt 2.93 --angle-per-tilt 3.0 --ctf ctf.pkl --poses pose.pkl --datadir /data/subtiltstacks/ --zdim 8 -n 50 --beta 0.025 -o my_output_directory/
```

{% endcode %}

Note that `--encode-mode tilt` as well as a given value for `--dose-per-tilt` are required to activate subtomogram analysis, while `--angle-per-tilt` is optional with a default value of 3.

We describe here a typical workflow for preparing tilt series inputs for use with cryoDRGN heterogeneous reconstruction, training a reconstruction model, and analyzing its outputs. See our preprint [here](https://www.biorxiv.org/content/10.1101/2023.08.18.553799v1) for a description of the cryoDRGN-ET method and associated results.

## Preprocessing

### **Export particles**

CryoDRGN-ET expects 2D *particle* tilt series images in a `.star` file exported from Windows Warp/M or RELION5 `--tomo`. From RELION5, 2D particle tilt series images without CTF premultiplication should be extracted using the `--no_ctf` option. We are actively working to support CTF-corrected images exported from WarpTools, but this data type is not currently supported.&#x20;

#### Windows Warp/M

If you intend to perform subvolume refinement or tomogram visualization, we recommend also exporting highly binned subvolumes to generate a 3D star file at this stage, since additional Warp/M processing could result in particle reordering (!), which would affect downstream refinement if filtering with cryoDRGN-ET. &#x20;

<div><figure><img src="https://3487200171-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fz7E99j3jJUSLJDvU43U6%2Fuploads%2FD1gn3el4qeVMIdj3Kh35%2Fimage.png?alt=media&#x26;token=bc13ded5-0d7a-46a3-bd83-129e81d0e820" alt="" width="188"><figcaption><p>Example settings for exporting<br> unbinned subtilts from M</p></figcaption></figure> <figure><img src="https://3487200171-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fz7E99j3jJUSLJDvU43U6%2Fuploads%2FYcYd0qROh5m58qK4Tey2%2Fimage.png?alt=media&#x26;token=6e1098da-fb49-4ed1-a2b1-733112bc8c8f" alt="" width="188"><figcaption><p>Example settings for exporting <br>binned subvolumes from M</p></figcaption></figure></div>

Example 2D Warp/M star file:

{% file src="<https://3487200171-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fz7E99j3jJUSLJDvU43U6%2Fuploads%2FsZoFpTHSoEfVHMzwhBw6%2FFAS_fromM_subtilts.star?alt=media&token=0dd8d831-9818-411c-a49b-33d6decde99e>" %}

#### RELION5

Export refined particles as 2D stacks using the `--no_ctf` additional argument. It is important to consider using a larger box to prevent CTF aliasing.&#x20;

<figure><img src="https://3487200171-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fz7E99j3jJUSLJDvU43U6%2Fuploads%2FhQEjp3sX35BPg4dvLTUy%2Fimage.png?alt=media&#x26;token=e7078f56-9424-468c-bf81-1e7a5d2a35bb" alt=""><figcaption></figcaption></figure>

Although RELION5 exports 2D particle stacks rather than subvolumes, the resulting star file refers to the 3D tomogram coordinates. Following RELION5 extraction, 3D particle star files need to be converted to the 2D format used for cryoDRGN-ET.&#x20;

Run the RELION5 3D to 2D conversion `parse_relion` in your RELION5 project directory. The conversion utility requires a `particles.star` file, the associated `tomograms.star` file, and the raw tilt dimensions as inputs. The `tiltseries.star` files are read from the relative paths provided in the `tomograms.star` file. To run the conversion outside of a project directory, create a symlink to make the `tiltseries.star` relative paths accessible.&#x20;

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn_utils parse_relion -t Polish/jobxxx/tomograms.star -p Extract/jobxxx/particles.star --tilt-dim 4096 4096 -o particles_2d.star
```

{% endcode %}

### **Prepare cryoDRGN-ET input files**

Here we will assume our tilt series images have been exported to the file `particles_2d.star`, and that we have already loaded a conda environment named `cryodrgn` with cryoDRGN v3.5.1+ installed. We will need to extract separate `.pkl` files containing CTF parameters and pose estimates from this `.star` file for use with cryoDRGN commands.

1. **Downsample images (if necessary)**

To optimize cryoDRGN training, you may first want to consider downsampling your images to a more manageable size. The downsampled pixel size should be chosen based on the quality of the consensus reconstruction and the resolution required to observe the types of changes you anticipate in your data. For example, a 6 Å Nyquist limit should be sufficient to observe changes in secondary structure. For cryo-ET data, a pixel size of up to 10 Å may be useful for particle classification based on low-resolution features.&#x20;

Downsampled particle images need to maintain the same stack organization and ordering to prevent metadata mismatch. This is handled automatically by `cryodrgn downsample` when a star file is used as the input.&#x20;

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn downsample particles_2d.star -D 128 -o particles_2d.128.star --outdir downsampled_128
```

{% endcode %}

For subsequent commands, the star file will contain relative paths to the provided `outdir` location. Alternatively, you can use the full path to `downsampled_128/` with the `--datadir` argument (and the same `.star` file) to use these downsampled subtilts from a different working directory.

Particles may also be downsampled at the time of extraction in RELION5 or Warp/M.
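
The relationship between box size, pixel size, and Nyquist limit can be checked with a few lines of arithmetic before choosing `-D`. The sketch below is illustrative only: the helper names are hypothetical (they are not part of cryoDRGN), the example values of 1.54 Å/pixel and a 280-pixel box are taken from the `parse_star` example in this guide, and an even box size is assumed.

```python
import math


def downsampled_pixel_size(apix: float, box: int, new_box: int) -> float:
    """Pixel size (Å/pixel) after Fourier-cropping a `box`-pixel image to `new_box` pixels."""
    return apix * box / new_box


def nyquist_limit(apix: float) -> float:
    """Nyquist limit in Å: twice the pixel size."""
    return 2.0 * apix


def box_for_nyquist(apix: float, box: int, target_nyquist: float) -> int:
    """Smallest even box size whose Nyquist limit is at or below `target_nyquist` Å."""
    new_box = math.ceil(2.0 * apix * box / target_nyquist)
    new_box += new_box % 2  # round up to an even number of pixels
    return min(new_box, box)  # never upsample past the original box


if __name__ == "__main__":
    apix, box = 1.54, 280  # values from the parse_star example
    new_apix = downsampled_pixel_size(apix, box, 128)
    print(f"D=128: {new_apix:.3f} Å/pixel, Nyquist {nyquist_limit(new_apix):.2f} Å")
    print(f"Box size for a 6 Å Nyquist limit: {box_for_nyquist(apix, box, 6.0)}")
```

With these example values, `-D 128` gives a Nyquist limit somewhat coarser than 6 Å, which is usually acceptable for a first pass aimed at removing junk particles.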

2. **Parse additional pose and CTF information**

Obtaining the additional CTF parameter and pose estimate files can be done using the utility command installed as part of cryoDRGN:

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn parse_star particles_2d.star --ctf ctf.pkl --poses pose.pkl --Apix 1.54 -D 280
```

{% endcode %}

**The command should be run on the Warp/M or converted RELION5 2D star file.** Do not parse pose and ctf information from the `cryodrgn downsample` star file.&#x20;

Note that this command requires you to specify the original box size and Å/pixel values if these are not listed in the `.star` file under fields such as `_rlnImageSize` and `_rlnImagePixelSize`. **The values should match the output of particle extraction.** If particles were downsampled during extraction, use the downsampled pixel size and box size. If `cryodrgn downsample` was used instead, provide the original box size and pixel size.&#x20;
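
Before moving on, it can be worth confirming that the parsed files record the values you expect. The following is a minimal sketch, not an official cryoDRGN utility. It assumes that `ctf.pkl` holds an array with one row per tilt image whose first two columns are box size and pixel size, and that `pose.pkl` holds a `(rotations, translations)` tuple with shapes `(N, 3, 3)` and `(N, 2)`. Verify these layout assumptions against your cryoDRGN version; the function name `check_inputs` is hypothetical.

```python
import pickle

import numpy as np


def check_inputs(ctf_path, poses_path, expected_apix, expected_box):
    """Compare parsed CTF/pose pickles against the expected extraction parameters."""
    with open(ctf_path, "rb") as f:
        ctf = np.asarray(pickle.load(f))
    with open(poses_path, "rb") as f:
        poses = pickle.load(f)

    # Assumed layout: poses = (rotation matrices, in-plane translations)
    rot, trans = np.asarray(poses[0]), np.asarray(poses[1])
    n_images = ctf.shape[0]

    return {
        "n_images": n_images,
        # Assumed layout: column 0 = box size, column 1 = pixel size (Å/pixel)
        "box_matches": bool(np.allclose(ctf[:, 0], expected_box)),
        "apix_matches": bool(np.allclose(ctf[:, 1], expected_apix)),
        "poses_match_ctf": rot.shape == (n_images, 3, 3)
        and trans.shape == (n_images, 2),
    }
```

For the `parse_star` example above, one would call `check_inputs("ctf.pkl", "pose.pkl", expected_apix=1.54, expected_box=280)` and expect every check in the returned dictionary to be `True`, provided the layout assumptions hold.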

3. **Perform a sanity check using backprojection**

We can confirm our inputs were correctly parsed using traditional homogeneous reconstruction. This step will run 10x faster on a GPU compute node! Note that this command also requires extra metadata on how the tilts were collected.&#x20;

Each particle must have at least `--ntilts` tilt images (10 by default). If this is not the case, you can generate an indices file that excludes particles with fewer than `--ntilts` tilts by providing the `--force-ntilts` argument at this stage. An indices `.pkl` file will be written to the backprojection output directory, and it can also be used for the subsequent training job.&#x20;

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn backproject_voxel particles_2d.128.star --ctf ctf.pkl --poses pose.pkl --tilt --dose-per-tilt 3 -o 00_backproject_128
```

{% endcode %}

Note: For large datasets or limited GPU RAM, the `--lazy` argument will prevent out-of-memory errors.&#x20;
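
To make the `--ntilts` requirement concrete, the sketch below counts tilt images per particle and lists the particles that have enough of them. It illustrates the idea only and is not cryoDRGN's implementation. It assumes that every tilt image row in the 2D star file carries a per-particle label (for example, a column such as `_rlnGroupName`) and that particle indices follow the order in which labels first appear. The function name is hypothetical.

```python
from collections import Counter


def particles_with_enough_tilts(particle_labels, ntilts=10):
    """Return zero-based indices of particles that have at least `ntilts` tilt images.

    `particle_labels` holds one label per tilt image, identifying its particle.
    """
    counts = Counter(particle_labels)
    # Particle order = order of first appearance of each label (an assumption)
    ordered = list(dict.fromkeys(particle_labels))
    return [i for i, label in enumerate(ordered) if counts[label] >= ntilts]


if __name__ == "__main__":
    # Three toy particles with 3, 2, and 3 tilt images respectively
    labels = ["p0"] * 3 + ["p1"] * 2 + ["p2"] * 3
    print(particles_with_enough_tilts(labels, ntilts=3))
```

In this toy example the second particle has only two tilt images, so it is the one that would be excluded when three tilts are required.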

## CryoDRGN-ET training

Once we have obtained our input files, we are ready to train the reconstruction model on our tilt series. Here we use an example command using the files described above:

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn train_vae particles_2d.128.star --ctf ctf.pkl --poses pose.pkl --encode-mode tilt --dose-per-tilt 3.0 --zdim 8 -n 50 --beta 0.025 -o 01_trainvae_128/
```

{% endcode %}

In particular:

* `--encode-mode tilt` is required to properly treat tilt series data
* In our current experiments, we use a KL regularization weight of `--beta 0.025`. We recommend this setting as a starting point for all tilt series experiments!
* `--dose-per-tilt` and `--angle-per-tilt` are used for dose exposure correction. The default value of `--angle-per-tilt` is 3 degrees and is left off of the example command.
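
As a rough illustration of why the dose and tilt angle matter, the sketch below evaluates the widely used exposure filter of Grant and Grigorieff (2015) with its 300 kV constants, together with a cosine tilt weight. It is not cryoDRGN's implementation. The mapping from acquisition order to cumulative dose and to tilt angle assumes a dose-symmetric scheme starting at 0 degrees, and all function names are hypothetical.

```python
import math


def critical_exposure(freq: float) -> float:
    """Critical exposure (e-/Å^2) at spatial frequency `freq` (1/Å), 300 kV constants."""
    return 0.245 * freq ** -1.665 + 2.81


def dose_weight(freq: float, tilt_order: int, dose_per_tilt: float) -> float:
    """Exposure filter value for the tilt acquired at position `tilt_order` (0 = first)."""
    # Assumption: dose accumulates by `dose_per_tilt` with every acquired tilt
    cumulative_dose = dose_per_tilt * (tilt_order + 1)
    return math.exp(-0.5 * cumulative_dose / critical_exposure(freq))


def tilt_angle_deg(tilt_order: int, angle_per_tilt: float) -> float:
    """Absolute tilt angle, assuming a dose-symmetric scheme: 0, a, a, 2a, 2a, ..."""
    return angle_per_tilt * math.ceil(tilt_order / 2)


def tilt_weight(tilt_order: int, angle_per_tilt: float) -> float:
    """Cosine weight that down-weights high tilts (thicker effective sample)."""
    return math.cos(math.radians(tilt_angle_deg(tilt_order, angle_per_tilt)))


if __name__ == "__main__":
    for order in (0, 10, 20, 40):
        w = dose_weight(0.1, order, dose_per_tilt=3.0)  # 0.1 1/Å = 10 Å resolution
        t = tilt_weight(order, angle_per_tilt=3.0)
        print(f"tilt {order:2d}: dose weight {w:.3f}, tilt weight {t:.3f}")
```

The point of the sketch is qualitative: later tilts carry more accumulated dose and are attenuated more strongly at high spatial frequencies, which is why an accurate `--dose-per-tilt` value matters.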

Training a model on 16,655 particles for 50 epochs on 1 A100 GPU took 3h, 38min.

## Analysis

Once a cryoDRGN-ET model has finished training, use `cryodrgn analyze` to visualize the latent space and generate volumes.

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn analyze output_directory 50 # or replace with a different epoch number
```

{% endcode %}

This portion of the analysis is similar to the workflow in single-particle cryoDRGN. See the [EMPIAR-10076 tutorial](https://ez-lab.gitbook.io/cryodrgn/cryodrgn-empiar-10076-tutorial) for further documentation.

## **Additional tools**

**Landscape analysis:** The commands `cryodrgn analyze_landscape` and `cryodrgn analyze_landscape_full` can be used for further analysis on the landscape of reconstructed volumes (as opposed to the landscape of latent space co-ordinates). See the [cryoDRGN landscape analysis tutorial ](https://ez-lab.gitbook.io/cryodrgn/cryodrgn-conformational-landscape-analysis)for more information.

**Particle selection:** We implemented a standalone filtering tool to enable lasso selection from the UMAP representation outside of the Jupyter notebook. Run `cryodrgn filter .` from your results directory to launch an interactive plot in an X11 window. If running remotely, you must be connected with `ssh -Y`.

<figure><img src="https://3487200171-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fz7E99j3jJUSLJDvU43U6%2Fuploads%2FWxifSiB7tXLa36I0MBbW%2Fcryodrgnfilter(1).gif?alt=media&#x26;token=8b24b13a-847c-430b-9aa9-2307483ab1a5" alt="" width="375"><figcaption></figcaption></figure>

**Star file filtering:** Particle indices identified by cryoDRGN-ET can be used to filter 2D subtilt star files from Warp/M or 3D subvolume star files for downstream processing or visualization in [ArtiaX](https://github.com/FrangakisLab/ArtiaX). Filtering a 3D subvolume star file with `cryodrgn_utils filter_star` using the `--micrograph-files` or `-m` option produces a directory containing one star file per tomogram. &#x20;
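
Selection files can also be inspected or manipulated directly in Python, for example to build the complementary "junk" set for a validation backprojection. The sketch below assumes that an index file is a pickled array of zero-based particle indices. The helper names and any file names used with them are hypothetical.

```python
import pickle

import numpy as np


def load_indices(path):
    """Load a pickled array of zero-based particle indices."""
    with open(path, "rb") as f:
        return np.asarray(pickle.load(f), dtype=int)


def invert_selection(selected, n_particles):
    """Return the sorted indices in [0, n_particles) that are NOT in `selected`."""
    return np.setdiff1d(np.arange(n_particles), selected)


def save_indices(indices, path):
    """Write indices as a pickled integer array."""
    with open(path, "wb") as f:
        pickle.dump(np.asarray(indices, dtype=int), f)
```

A typical use would be to load the selection saved by the filtering tool, invert it against the total particle count, and save the result under a new name to pass to `--ind`.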

## Experiment Workflow

Similarly to [the reconstruction workflow for single particle analysis](https://ez-lab.gitbook.io/cryodrgn/cryodrgn-empiar-10076-tutorial#general-recommended-workflow), we recommend an iterative process for training successive cryoDRGN models on a new dataset:

1. First, train on downsampled images (e.g. D=128) as an initial pass to sanity check results and remove junk particles:

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn train_vae particles_2d.128.star --ctf ctf.pkl --poses pose.pkl --datadir downsampled_128/ --encode-mode tilt --dose-per-tilt 3.0 --zdim 8 -n 50 --beta 0.025 -o my_output_directory/01_trainvae_128
```

{% endcode %}

2. Validate selection and junk classifications with backprojections:

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn backproject_voxel particles_2d.128.star --poses pose.pkl --ctf ctf.pkl --tilt --ntilts 10 --dose-per-tilt 3 -o 02_backproject_128_selected_particles --lazy --ind 01_trainvae_128/selected_particles.pkl
```

{% endcode %}

3. After validating selected particles (`--ind selected_particles.pkl`), train a new model excluding any junk:

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn train_vae particles_2d.128.star --ctf ctf.pkl --poses pose.pkl --datadir downsampled_128/ --encode-mode tilt --ind selected_particles.pkl --dose-per-tilt 3 --zdim 8 -n 50 --beta 0.025 -o my_output_directory/03_trainvae_selected.128
```

{% endcode %}

4. Analyze heterogeneity and separate discrete states using:
   * Clustering methods available in `output_dir/analyze.50/cryoDRGN_ET_viz.ipynb`
   * Standalone lasso GUI `cryodrgn filter`&#x20;
5. Optional - Filter the original 3D star file with the cryoDRGN-selected particles for further refinement in [RELION/M](https://teamtomo.org/walkthroughs/EMPIAR-10164/relion.html):

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn_utils filter_star Extract/jobxxx/particles.star --ind 01_trainvae_128/sel_ind.pkl -o cryodrgn_particles.star
```

{% endcode %}

6. Optional - Train a new cryoDRGN model on higher resolution images to better resolve conformational heterogeneity:

{% code overflow="wrap" %}

```
(cryodrgn) $ cryodrgn train_vae particles_2d.star --ctf ctf.pkl --poses pose.pkl --datadir Extract/jobxxx/Micrographs/ --encode-mode tilt --dose-per-tilt 3.0 --zdim 8 -n 50 --beta 0.025 -o my_output_directory/04_trainvae_256
```

{% endcode %}

## Feedback

Please file a GitHub issue or contact Ellen (<zhonge@princeton.edu>) with any questions or feedback!

