Getting started

How to get up and running with DRGN-AI

Installation

We recommend installing DRGN-AI in a clean conda environment. First clone the git repository, and then use pip to install the package from source code:

(base) $ conda create --name drgnai-env python=3.9
(base) $ conda activate drgnai-env
(drgnai-env) $ git clone git@github.com:ml-struct-bio/drgnai.git
(drgnai-env) $ cd drgnai/
(drgnai-env) $ pip install .

You can also install the latest development version of DRGN-AI if you have access to the drgnai repository:

(drgnai-env) $ git clone git@github.com:ml-struct-bio/drgnai.git -b v1.0.0
(drgnai-env) $ cd drgnai/
(drgnai-env) $ pip install .

To confirm that the package was installed successfully, use drgnai test:

(drgnai-env) $ drgnai test
Installation was successful!

You can also run a more comprehensive test of the package and its installation using pytest. The full suite takes the better part of an hour, so we recommend running it in the background or on a CPU (not GPU) compute node. For example, on a Slurm cluster:

sbatch -t 61 --wrap='pytest' --mem=16G

Setting up an experiment

To run an experiment, first we need to set up an experiment folder. For DRGN-AI to recognize a folder as an experiment folder, all it needs to contain is a configs.yaml file listing parameters used by the reconstruction model for dataset acquisition and model training:

particles: inputs/particles.mrcs
ctf: inputs/ctf.pkl
quick_config:
    capture_setup: spa
    reconstruction_type: het
    conf_estimation: autodecoder
    pose_estimation: abinit
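For readers who prefer to script this step, the following is a minimal Python sketch that writes the same configs.yaml into a new folder. The helper name write_minimal_config is hypothetical and not part of DRGN-AI; the YAML text is assembled by hand so that only the standard library is needed.

```python
from pathlib import Path


def write_minimal_config(workdir, particles, ctf, quick_config):
    """Create `workdir` and write a minimal configs.yaml into it.

    Hypothetical helper, not part of DRGN-AI; it only reproduces the
    file layout shown above.
    """
    folder = Path(workdir)
    folder.mkdir(parents=True, exist_ok=True)

    lines = [f"particles: {particles}", f"ctf: {ctf}", "quick_config:"]
    # Sub-parameters are nested under quick_config with four-space indentation.
    lines += [f"    {key}: {value}" for key, value in quick_config.items()]

    config_path = folder / "configs.yaml"
    config_path.write_text("\n".join(lines) + "\n")
    return config_path
```

Calling it with the four quick_config values from the example should produce a file equivalent to the one above; relative paths such as inputs/particles.mrcs are written as given.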

For those unfamiliar with creating files and folders on the command line, we have created the drgnai setup utility to help with the steps needed to set up a DRGN-AI experiment. For example, the above config file can be created and placed in a new experiment folder your_workdir using:

drgnai setup your_workdir --particles inputs/particles.mrcs --ctf inputs/ctf.pkl \
                     --capture-setup spa --reconstruction-type het \
                     --conf-estimation autodecoder --pose-estimation abinit
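When many experiments need to be set up in a loop, it can help to build this command programmatically. The sketch below assembles the argument list for the call shown above; build_setup_command is a hypothetical helper, and it uses only flags that appear in this guide.

```python
import shlex


def build_setup_command(workdir, particles, ctf, quick_config):
    """Return the argv list for the `drgnai setup` call shown above.

    Hypothetical helper; it uses only flags that appear in this guide.
    """
    command = ["drgnai", "setup", workdir, "--particles", particles, "--ctf", ctf]
    for key, value in quick_config.items():
        # quick_config keys use underscores; the CLI flags use hyphens.
        command += ["--" + key.replace("_", "-"), value]
    return command


command = build_setup_command(
    "your_workdir",
    "inputs/particles.mrcs",
    "inputs/ctf.pkl",
    {
        "capture_setup": "spa",
        "reconstruction_type": "het",
        "conf_estimation": "autodecoder",
        "pose_estimation": "abinit",
    },
)
# Print the command as a single shell-safe line for inspection.
print(shlex.join(command))
```

With DRGN-AI installed, the resulting list could be passed to subprocess.run(command, check=True) to execute it.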

To begin an experiment it is sufficient to specify an input dataset and the four quick_config parameters described below. Major parameters have their own flags for drgnai setup; the remainder can be added to configs.yaml using the --cfgs flag:

drgnai setup your_workdir --particles inputs/particles.mrcs --ctf inputs/ctf.pkl \
                     --capture-setup spa --reconstruction-type homo \
                     --conf-estimation autodecoder --pose-estimation abinit \
                     --cfgs 'learning-rate=0.003' 'num-epochs=50' 

Use drgnai setup -h to get a list of all parameters that have their own setup flags.
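To make the key=value convention concrete, here is a small Python sketch of how such strings could be split into parameter names and typed values. It is an illustration only: parse_overrides is a hypothetical function, and DRGN-AI's own handling of --cfgs may differ.

```python
import ast


def parse_overrides(pairs):
    """Split 'key=value' strings into a dict, guessing simple value types.

    Illustrative only: DRGN-AI's own handling of --cfgs may differ.
    """
    overrides = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected 'key=value', got {pair!r}")
        try:
            # Turns '50' into an int and '0.003' into a float.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # Keep anything else as a plain string.
        overrides[key] = value
    return overrides


print(parse_overrides(["learning-rate=0.003", "num-epochs=50"]))
```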

Quick Config

As a shortcut for the most important parameter settings, configs.yaml supports the quick_config parameter, which is unique among DRGN-AI parameters in that it contains four sub-parameters:

  • capture_setup For the moment, only single-particle imaging (spa) is supported.

  • reconstruction_type Whether we want to model a latent space for conformations (het) or instead do homogeneous reconstruction (homo).

  • conf_estimation If doing heterogeneous reconstruction, which type of model to use for conformations (autodecoder or encoder).

  • pose_estimation Whether to model poses from scratch (abinit), use known poses (fixed), or refine known poses (refine).

These are listed in a nested manner under the quick_config entry in a configs.yaml as demonstrated in the example above.
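The allowed values listed above can be checked before launching an experiment. The following is a minimal sketch, assuming the value sets are exactly those named in this section; check_quick_config and QUICK_CONFIG_CHOICES are hypothetical names, not part of DRGN-AI.

```python
# Allowed values as listed in this guide; other DRGN-AI versions may accept more.
QUICK_CONFIG_CHOICES = {
    "capture_setup": {"spa"},
    "reconstruction_type": {"het", "homo"},
    "conf_estimation": {"autodecoder", "encoder"},
    "pose_estimation": {"abinit", "fixed", "refine"},
}


def check_quick_config(quick_config):
    """Return a list of problems found in a quick_config mapping (empty if none)."""
    problems = []
    for key, value in quick_config.items():
        if key not in QUICK_CONFIG_CHOICES:
            problems.append(f"unknown sub-parameter: {key}")
        elif value not in QUICK_CONFIG_CHOICES[key]:
            allowed = ", ".join(sorted(QUICK_CONFIG_CHOICES[key]))
            problems.append(f"{key}={value} is not one of: {allowed}")
    return problems
```

Returning a list of messages rather than raising on the first error lets a caller report every problem in a config at once.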

Input Datasets

We also have to specify the input dataset. A DRGN-AI experiment relies upon a stack of particles picked from a cryo-EM imaging run; an input dataset for DRGN-AI thus consists of, at minimum, a file with the picked particles and a file with the CTF parameters. There are multiple ways of telling DRGN-AI where these files are located:

  1. Add particles and ctf entries to the configs.yaml file in your experiment folder, as well as a datadir entry if necessary when using a .star or .cs file particle stack.

  2. Use the --particles and --ctf arguments (and --datadir if necessary) to the drgnai setup tool, which will place these entries in the configs file for you.

  3. Set the DRGNAI_DATASETS environment variable to point to a file with an entry for the dataset.

We have already seen examples of the first two approaches; in the third approach, we create a file called e.g. /home/drgnai-paths.yaml that will contain dataset entries:

new_data:
  particles: /scratch/consensus_particles/particles.128.txt
  pose: /scratch/consensus_particles/pose.pkl
  ctf: /scratch/consensus_particles/ctf.pkl
ankyrin_256:
  particles: /scratch/43_empiar_11043/11043/data/consensus_particles/particles.256.txt
  pose: /scratch/43_empiar_11043/consensus_particles/pose.pkl
  ctf: /scratch/43_empiar_11043/11043/data/consensus_particles/ctf.pkl
ankyrin_256_filtered:
  particles: /scratch/43_empiar_11043/11043/data/consensus_particles/particles.256.txt
  pose: /scratch/43_empiar_11043/consensus_particles/pose.pkl
  ctf: /scratch/43_empiar_11043/11043/data/consensus_particles/ctf.pkl
  ind: /scratch/43_empiar_11043/11043/data/consensus_particles/ind.pkl

We then set the environment variable:

export DRGNAI_DATASETS=/home/drgnai-paths.yaml

Now we can use the dataset labels defined in the paths file as shortcuts when using either drgnai setup or configs.yaml:

drgnai setup your_workdir --dataset ankyrin_256 \
                     --capture-setup spa --reconstruction-type het \
                     --conf-estimation autodecoder --pose-estimation abinit
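To illustrate how a dataset label maps to file paths, the sketch below looks up a label in the file named by DRGNAI_DATASETS. Both functions are hypothetical and not DRGN-AI's own code; the parser is a simplified stand-in for a real YAML library and handles only the flat two-level layout used in the paths file shown earlier.

```python
import os


def load_dataset_paths(paths_file):
    """Parse a two-level 'label: / key: value' file into nested dicts.

    Simplified stand-in for a real YAML parser; it handles only the flat
    layout used by the paths file in this guide.
    """
    datasets, current = {}, None
    with open(paths_file) as handle:
        for line in handle:
            if not line.strip() or line.lstrip().startswith("#"):
                continue
            key, _, value = line.strip().partition(":")
            if not line[0].isspace():
                # Unindented line: starts a new dataset label.
                current = datasets.setdefault(key, {})
            elif current is not None:
                # Indented line: a field of the current dataset.
                current[key.strip()] = value.strip()
    return datasets


def resolve_dataset(label, env=os.environ):
    """Look up `label` in the file named by DRGNAI_DATASETS."""
    paths_file = env.get("DRGNAI_DATASETS")
    if paths_file is None:
        raise KeyError("DRGNAI_DATASETS is not set")
    return load_dataset_paths(paths_file)[label]
```

For the paths file above, resolve_dataset("ankyrin_256") would return a dictionary with particles, pose, and ctf entries; an unknown label raises KeyError.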
