Getting started
How to get up and running with cryoDRGN-AI
Installation
We recommend installing cryoDRGN-AI in a clean conda environment. First create and activate the environment, then clone the git repository and use pip to install the package from source code:
(base) $ conda create --name drgnai-env python=3.9
(base) $ conda activate drgnai-env
(drgnai-env) $ git clone git@github.com:ml-struct-bio/drgnai.git
(drgnai-env) $ cd drgnai/
(drgnai-env) $ pip install .
If you don't want to create a local clone of the cryoDRGN-AI repository, you can instead install it directly from GitHub using pip:
(drgnai-env) $ pip install git+https://github.com/ml-struct-bio/drgnai.git
You can also install the latest development version of cryoDRGN-AI if you have access to the drgnai-internal repository:
(drgnai-env) $ git clone git@github.com:ml-struct-bio/drgnai-internal.git
(drgnai-env) $ cd drgnai-internal/
(drgnai-env) $ pip install .
Whichever installation method you used, confirm that the package was installed successfully with drgnai test:
(drgnai-env) $ drgnai test
Installation was successful!
You can also perform a more comprehensive test of the package and its installation using pytest. This takes the better part of an hour, so we recommend running it in the background or on a CPU (not a GPU) compute node. For example, on a Slurm cluster the following submits the test suite as a batch job with a 61-minute time limit and 16 GB of memory:
sbatch -t 61 --wrap='pytest' --mem=16G
Setting up an experiment
To run an experiment, we first need to set up an experiment folder. For cryoDRGN-AI to recognize a folder as an experiment folder, all it needs to contain is a configs.yaml file listing the parameters the reconstruction model uses for specifying the input dataset and for model training:
particles: inputs/particles.mrcs
ctf: inputs/ctf.pkl
quick_config:
  capture_setup: spa
  reconstruction_type: het
  conf_estimation: autodecoder
  pose_estimation: abinit
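For readers comfortable with the command line, the same experiment folder can also be created by hand. The shell sketch below does this with mkdir and a quoted heredoc; your_workdir and the inputs/ paths are simply the placeholders from the example above. Note that the quick_config sub-parameters must be indented with spaces, since YAML does not allow tabs for indentation.

```bash
#!/usr/bin/env bash
# Create an experiment folder by hand, without `drgnai setup`.
# "your_workdir" and the inputs/ paths are placeholders from the example above.
set -euo pipefail

workdir="your_workdir"
mkdir -p "$workdir"

# The quoted 'EOF' delimiter stops the shell from expanding anything in the body.
# quick_config entries are indented with two spaces (YAML forbids tab indentation).
cat > "$workdir/configs.yaml" <<'EOF'
particles: inputs/particles.mrcs
ctf: inputs/ctf.pkl
quick_config:
  capture_setup: spa
  reconstruction_type: het
  conf_estimation: autodecoder
  pose_estimation: abinit
EOF

echo "Wrote $workdir/configs.yaml"
```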
For those unfamiliar with creating files and folders on the command line, we have created the drgnai setup utility to help with the steps needed to set up a cryoDRGN-AI experiment. For example, the config file above can be created and placed in a new experiment folder your_workdir using:
drgnai setup your_workdir --particles inputs/particles.mrcs --ctf inputs/ctf.pkl \
--capture-setup spa --reconstruction-type het \
--conf-estimation autodecoder --pose-estimation abinit
To begin an experiment, it is sufficient to specify an input dataset and the four quick_config parameters described below. Major parameters have their own flags in drgnai setup; any other parameter can be added to configs.yaml using the --cfgs flag. For example, the following sets up a homogeneous reconstruction and also sets learning-rate and num-epochs:
drgnai setup your_workdir --particles inputs/particles.mrcs --ctf inputs/ctf.pkl \
--capture-setup spa --reconstruction-type homo \
--conf-estimation autodecoder --pose-estimation abinit \
--cfgs 'learning-rate=0.003' 'num-epochs=50'
Quick Config
As a shortcut for the most important parameter settings, we have introduced the quick_config parameter for use in configs.yaml, which, uniquely among cryoDRGN-AI parameters, contains four sub-parameters:
capture_setup
For the moment, only single-particle imaging (spa) is supported.
reconstruction_type
Whether we want to model a latent space for conformations (het) or instead do homogeneous reconstruction (homo).
conf_estimation
If doing heterogeneous reconstruction, what type of model to use for conformations (autodecoder or encoder).
pose_estimation
Whether to model poses from scratch (abinit), use known poses (fixed), or refine known poses (refine).
These sub-parameters are listed in a nested manner under the quick_config entry in configs.yaml, as demonstrated in the example above.
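The allowed values listed above can be checked before writing configs.yaml. The shell function below, check_quick_config, is a hypothetical helper written for illustration and is not part of cryoDRGN-AI. It encodes only the options given in this section and assumes that conf_estimation only needs checking when reconstruction_type is het.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of cryoDRGN-AI: check the four quick_config
# values against the options listed in this section.
# Usage: check_quick_config CAPTURE_SETUP RECONSTRUCTION_TYPE CONF_ESTIMATION POSE_ESTIMATION
check_quick_config() {
  local capture="$1" recon="$2" conf="$3" pose="$4"

  case "$capture" in
    spa) ;;
    *) echo "capture_setup must be 'spa', got '$capture'" >&2; return 1 ;;
  esac

  case "$recon" in
    het|homo) ;;
    *) echo "reconstruction_type must be 'het' or 'homo', got '$recon'" >&2; return 1 ;;
  esac

  # Assumption: conf_estimation only matters for heterogeneous reconstruction,
  # so it is only checked when reconstruction_type is 'het'.
  if [ "$recon" = "het" ]; then
    case "$conf" in
      autodecoder|encoder) ;;
      *) echo "conf_estimation must be 'autodecoder' or 'encoder', got '$conf'" >&2; return 1 ;;
    esac
  fi

  case "$pose" in
    abinit|fixed|refine) ;;
    *) echo "pose_estimation must be 'abinit', 'fixed' or 'refine', got '$pose'" >&2; return 1 ;;
  esac
}

# Example: the settings from the configs.yaml example above.
check_quick_config spa het autodecoder abinit && echo "quick_config OK"
```

The function returns a non-zero status and prints which sub-parameter is wrong, so a typo such as autodecodr is caught before an experiment is launched.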
Input Datasets
We also have to specify the input dataset. A cryoDRGN-AI experiment relies upon a stack of particles picked from a cryoEM imaging run; an input dataset for cryoDRGN-AI thus consists of, at minimum, a file containing the picked particles and a file containing their CTF parameters. There are three ways of telling cryoDRGN-AI where these files are located:
1. Add particles and ctf entries to the configs.yaml file in your experiment folder, as well as a datadir entry if necessary when using a .star or .cs file particle stack.
2. Use the --particles and --ctf arguments (and --datadir if necessary) of the drgnai setup tool, which will place these entries in the configs.yaml file for you.
3. Set the DRGNAI_DATASETS environment variable to point to a file with an entry for the dataset.
We have already seen examples of the first two approaches. In the third approach, we create a file, such as /home/drgnai-paths.yaml, that contains the dataset entries:
new_data:
  particles: /scratch/consensus_particles/particles.128.txt
  pose: /scratch/consensus_particles/pose.pkl
  ctf: /scratch/consensus_particles/ctf.pkl
ankyrin_256:
  particles: /scratch/43_empiar_11043/11043/data/consensus_particles/particles.256.txt
  pose: /scratch/43_empiar_11043/consensus_particles/pose.pkl
  ctf: /scratch/43_empiar_11043/11043/data/consensus_particles/ctf.pkl
ankyrin_256_filtered:
  particles: /scratch/43_empiar_11043/11043/data/consensus_particles/particles.256.txt
  pose: /scratch/43_empiar_11043/consensus_particles/pose.pkl
  ctf: /scratch/43_empiar_11043/11043/data/consensus_particles/ctf.pkl
  ind: /scratch/43_empiar_11043/11043/data/consensus_particles/ind.pkl
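Because a paths file like this one lists many absolute paths, it can be worth checking that each of them exists before pointing DRGNAI_DATASETS at it. The function below, check_dataset_paths, is a hypothetical helper and not part of cryoDRGN-AI. It is a line-based sketch that assumes the simple layout shown above (unindented labels, indented key: path lines) rather than a full YAML parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of cryoDRGN-AI: report files listed in a
# dataset paths file (layout as in the example above) that do not exist.
# Line-based sketch, not a YAML parser: it only looks at indented
# "key: path" lines and ignores the unindented dataset labels.
check_dataset_paths() {
  local paths_file="$1"
  if [ ! -f "$paths_file" ]; then
    echo "no such paths file: $paths_file" >&2
    return 1
  fi
  grep -E '^[[:space:]]+[A-Za-z_]+:[[:space:]]*[^[:space:]]' "$paths_file" | {
    missing=0
    # `read` strips the indentation; "key" gets e.g. "particles:", "path" the rest.
    while read -r key path; do
      if [ ! -e "$path" ]; then
        echo "missing ${key%:} file: $path" >&2
        missing=1
      fi
    done
    # The group's exit status (and so the function's) reports the result.
    [ "$missing" -eq 0 ]
  }
}
```

For example, check_dataset_paths /home/drgnai-paths.yaml prints one line per missing file to standard error and returns a non-zero status if anything is missing.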
We then set the environment variable:
export DRGNAI_DATASETS=/home/drgnai-paths.yaml
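To see which labels are available once the variable is set, a small helper can print them. list_dataset_labels below is hypothetical and not part of cryoDRGN-AI; it assumes labels are the unindented name: lines of the paths file, as in the example above, and reads lines rather than parsing YAML.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of cryoDRGN-AI: print the dataset labels
# defined in a paths file, taken to be the unindented "name:" lines.
# Uses the file given as $1, or else the file DRGNAI_DATASETS points to.
list_dataset_labels() {
  local paths_file="${1:-$DRGNAI_DATASETS}"
  if [ -z "$paths_file" ] || [ ! -f "$paths_file" ]; then
    echo "no paths file given, and DRGNAI_DATASETS is unset or not a file" >&2
    return 1
  fi
  # Keep lines such as "ankyrin_256:" and strip the trailing colon.
  grep -E '^[A-Za-z0-9_.-]+:[[:space:]]*$' "$paths_file" | sed 's/:[[:space:]]*$//'
}
```

With the example file above, running list_dataset_labels prints new_data, ankyrin_256 and ankyrin_256_filtered, one per line.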
Now we can use the dataset labels defined in the paths file as shortcuts, either with drgnai setup or in configs.yaml:
drgnai setup your_workdir --dataset ankyrin_256 \
--capture-setup spa --reconstruction-type het \
--conf-estimation autodecoder --pose-estimation abinit