# Getting started

## Installation

We recommend installing cryoDRGN-AI in a clean conda environment. First clone the git repository, and then use `pip` to install the package from source code:

```
(base) $ conda create --name drgnai-env python=3.9
(base) $ conda activate drgnai-env
(drgnai-env) $ git clone git@github.com:ml-struct-bio/drgnai.git
(drgnai-env) $ cd drgnai/
(drgnai-env) $ pip install .
```

If you don't want to create a local clone of the cryoDRGN-AI repo, you can instead have `pip` install the package directly from GitHub:

```
(drgnai-env) $ pip install git+https://github.com/ml-struct-bio/drgnai.git
```

You can also install the latest development version of cryoDRGN-AI if you have access to the `drgnai-internal` repository:

```
(drgnai-env) $ git clone git@github.com:ml-struct-bio/drgnai-internal.git
(drgnai-env) $ cd drgnai-internal/
(drgnai-env) $ pip install .
```

Whichever method you used, confirm that the package was installed successfully with `drgnai test`:

```
(drgnai-env) $ drgnai test
Installation was successful!
```
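If `drgnai test` reports that the command cannot be found, a likely cause is that the conda environment is not active in the current shell. The following is a minimal sketch of a check for that situation; `require_cmd` is a hypothetical helper written for this page, not something cryoDRGN-AI provides.

```shell
# Hypothetical helper (not part of cryoDRGN-AI): succeed only if the
# given command can be found on the current PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (is the conda environment active?)" >&2
        return 1
    fi
}

# Example: look for the drgnai entry point before running `drgnai test`.
require_cmd drgnai || echo "activate drgnai-env, then reinstall if needed"
```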

You can also run a more comprehensive test of the package and its installation using `pytest`. The full suite takes the better part of an hour, so we recommend running it in the background or on a CPU (not a GPU) compute node. For example, on a Slurm cluster (here `-t 61` requests a 61-minute time limit):

```
sbatch -t 61 --wrap='pytest' --mem=16G
```

## Setting up an experiment

To run an experiment, first we need to set up an experiment folder. For cryoDRGN-AI to recognize a folder as an experiment folder, all it needs to contain is a `configs.yaml` file listing parameters used by the reconstruction model for dataset acquisition and model training:

```yaml
particles: inputs/particles.mrcs
ctf: inputs/ctf.pkl
quick_config:
    capture_setup: spa
    reconstruction_type: het
    conf_estimation: autodecoder
    pose_estimation: abinit
```
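Because an experiment folder only needs this one file, it can also be created by hand. The sketch below writes the same example config with standard shell tools; `WORKDIR` is a placeholder variable introduced here for illustration, and the input paths are simply the ones from the example, so adjust them to your own data.

```shell
# Create the experiment folder and write the minimal configs.yaml by hand.
# WORKDIR is a placeholder name chosen for this sketch.
WORKDIR="${WORKDIR:-your_workdir}"
mkdir -p "$WORKDIR"

# The quoted 'EOF' stops the shell from expanding anything in the file body.
cat > "$WORKDIR/configs.yaml" <<'EOF'
particles: inputs/particles.mrcs
ctf: inputs/ctf.pkl
quick_config:
    capture_setup: spa
    reconstruction_type: het
    conf_estimation: autodecoder
    pose_estimation: abinit
EOF

# Show what was written.
cat "$WORKDIR/configs.yaml"
```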

For those unfamiliar with creating files and folders on the command line, we have created the `drgnai setup` utility to assist with the steps needed to set up a cryoDRGN-AI experiment. For example, the above config file can be created and placed in a new experiment folder `your_workdir` using:

```
drgnai setup your_workdir --particles inputs/particles.mrcs --ctf inputs/ctf.pkl \
                     --capture-setup spa --reconstruction-type het \
                     --conf-estimation autodecoder --pose-estimation abinit
```

To begin an experiment it is sufficient to specify an input dataset and the four `quick_config` parameters described below. Major parameters have their own flags for `drgnai setup`; the remainder can be added to `configs.yaml` using the `--cfgs` flag:

```
drgnai setup your_workdir --particles inputs/particles.mrcs --ctf inputs/ctf.pkl \
                     --capture-setup spa --reconstruction-type homo \
                     --conf-estimation autodecoder --pose-estimation abinit \
                     --cfgs 'learning-rate=0.003' 'num-epochs=50'
```

{% hint style="info" %}
Use `drgnai setup -h` to get a list of all parameters that have their own `setup` flags.
{% endhint %}

#### Quick Config

As a shortcut for the most important parameter settings we have introduced the `quick_config` parameter for use in `configs.yaml`, which uniquely amongst cryoDRGN-AI parameters contains four sub-parameters:

* `capture_setup`: The imaging modality. For the moment, only single-particle imaging (`spa`) is supported.
* `reconstruction_type`: Whether to model a latent space for conformations (`het`) or to do homogeneous reconstruction instead (`homo`).
* `conf_estimation`: If doing heterogeneous reconstruction, which type of model to use for conformations (`autodecoder` or `encoder`).
* `pose_estimation`: Whether to estimate poses from scratch (`abinit`), use known poses (`fixed`), or refine known poses (`refine`).

These are nested under the `quick_config` entry in `configs.yaml`, as demonstrated in the example above.
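As a quick sanity check before writing these values into a config, the options listed above can be encoded in a small shell function. This is only a sketch based on the values named on this page; `check_quick_config` is a hypothetical helper, and cryoDRGN-AI performs its own validation, which may differ.

```shell
# Hypothetical helper (not part of cryoDRGN-AI): check one quick_config
# key/value pair against the options listed in this section.
check_quick_config() {
    key=$1
    value=$2
    case "$key=$value" in
        capture_setup=spa) ;;
        reconstruction_type=het|reconstruction_type=homo) ;;
        conf_estimation=autodecoder|conf_estimation=encoder) ;;
        pose_estimation=abinit|pose_estimation=fixed|pose_estimation=refine) ;;
        *)
            echo "unsupported quick_config setting: $key: $value" >&2
            return 1
            ;;
    esac
    echo "ok: $key: $value"
}

check_quick_config reconstruction_type het
check_quick_config pose_estimation abinit
```

A misspelled key or value makes the function print a message to standard error and return a non-zero status, so it can be chained with `&&` in front of other commands.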

#### Input Datasets

We also have to specify the input dataset. A cryoDRGN-AI experiment relies upon a stack of particles picked from a cryoEM imaging run; an input dataset for cryoDRGN-AI thus consists of, at minimum, a file with the picked particles and a file with the CTF parameters. There are multiple ways of telling cryoDRGN-AI where these files are located:

1. Add `particles` and `ctf` entries to the `configs.yaml` file in your experiment folder, plus a `datadir` entry if one is needed for a `.star` or `.cs` particle stack.
2. Use the `--particles` and `--ctf` arguments (and `--datadir` if necessary) to the `drgnai setup` tool, which will place these entries in the `configs.yaml` file for you.
3. Set the `DRGNAI_DATASETS` environment variable to point to a file with an entry for the dataset.

We have already seen examples of the first two approaches. In the third approach, we create a file, e.g. `/home/drgnai-paths.yaml`, that contains one entry per dataset:

```yaml
new_data:
  particles: /scratch/consensus_particles/particles.128.txt
  pose: /scratch/consensus_particles/pose.pkl
  ctf: /scratch/consensus_particles/ctf.pkl
ankyrin_256:
  particles: /scratch/43_empiar_11043/11043/data/consensus_particles/particles.256.txt
  pose: /scratch/43_empiar_11043/consensus_particles/pose.pkl
  ctf: /scratch/43_empiar_11043/11043/data/consensus_particles/ctf.pkl
ankyrin_256_filtered:
  particles: /scratch/43_empiar_11043/11043/data/consensus_particles/particles.256.txt
  pose: /scratch/43_empiar_11043/consensus_particles/pose.pkl
  ctf: /scratch/43_empiar_11043/11043/data/consensus_particles/ctf.pkl
  ind: /scratch/43_empiar_11043/11043/data/consensus_particles/ind.pkl
```

We then set the environment variable:

```
export DRGNAI_DATASETS=/home/drgnai-paths.yaml
```
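To confirm that the variable is set and see which labels it makes available, a short text-based check can help. The sketch below assumes the paths file is laid out like the example above, with each label on its own unindented line; `list_dataset_labels` is a hypothetical helper, not a cryoDRGN-AI command.

```shell
# Hypothetical helper (not part of cryoDRGN-AI): print the dataset labels,
# i.e. the unindented `label:` lines, in the file DRGNAI_DATASETS points to.
list_dataset_labels() {
    if [ -z "${DRGNAI_DATASETS:-}" ]; then
        echo "DRGNAI_DATASETS is not set" >&2
        return 1
    fi
    if [ ! -r "$DRGNAI_DATASETS" ]; then
        echo "cannot read $DRGNAI_DATASETS" >&2
        return 1
    fi
    # Keep lines that start in column one and end with a colon; drop the colon.
    sed -n 's/^\([A-Za-z0-9_-][A-Za-z0-9_-]*\):[[:space:]]*$/\1/p' "$DRGNAI_DATASETS"
}

list_dataset_labels || echo "set DRGNAI_DATASETS first"
```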

Now we can use the dataset labels defined in the paths file as shortcuts when using either `drgnai setup` or `configs.yaml`:

```
drgnai setup your_workdir --dataset ankyrin_256 \
                     --capture-setup spa --reconstruction-type het \
                     --conf-estimation autodecoder --pose-estimation abinit
```
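Before relying on a label, it can be useful to see which files it stands for. The following sketch does a plain text lookup in a paths file laid out like the example above; it does not reproduce how cryoDRGN-AI itself resolves labels, `dataset_field` is a hypothetical helper, and it assumes the paths contain no spaces.

```shell
# Hypothetical helper (not part of cryoDRGN-AI): print the value stored under
# `field` for dataset `label`. The paths file defaults to $DRGNAI_DATASETS.
dataset_field() {
    label=$1
    field=$2
    file=${3:-${DRGNAI_DATASETS:-}}
    awk -v label="$label" -v field="$field" '
        # An unindented line starts a new dataset entry; remember its label.
        /^[A-Za-z0-9_]/ { current = $1; sub(/:$/, "", current); next }
        # Inside the requested entry, print the value of the requested field.
        current == label && $1 == (field ":") { print $2; found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Example: show which particle stack the label used above expands to.
if [ -n "${DRGNAI_DATASETS:-}" ]; then
    dataset_field ankyrin_256 particles || echo "label or field not found" >&2
fi
```

The function returns a non-zero status when the label or field is missing, which makes a mistyped label easy to spot before starting a run.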

