Full Documentation

Training Parameters

Quick Configs

The “quick configs” can be used to quickly define frequently used configurations for the reconstruct command. quick_config is a dictionary that can be defined in your config file. For example:

dataset: my_dataset
quick_config:
  capture_setup: spa
  reconstruction_type: homo
  pose_estimation: refine
  conf_estimation: autodecoder

capture_setup: [string, default = “spa”] Capture setup: “spa” for single-particle imaging.

reconstruction_type: [string, default = “het”] Reconstruction type: “het” for heterogeneous or “homo” for homogeneous.

pose_estimation: [string, default = “abinit”] Pose estimation mode: “abinit” for no initialization, “refine” to refine ground truth poses by gradient descent or “fixed” to use ground truth poses (you must then define pose).

conf_estimation: [string, default = “autodecoder”] Conformation estimation mode: “autodecoder”, “encoder” or “refine” to refine conformations by gradient descent (you must then define initial_conf).

⚠️ Other parameters defined in the config file will overwrite the values of the parameters defined through `quick_config`.
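
The precedence rule above can be pictured as a dictionary merge. The following is a minimal sketch, not the tool's actual code: the `QUICK_PRESETS` table and the `resolve_config` helper are hypothetical, and the parameters each preset expands to are assumptions built only from flags documented on this page.

```python
# Hypothetical mapping from quick_config choices to detailed parameters.
# The real expansion performed by the software may differ.
QUICK_PRESETS = {
    "pose_estimation": {
        "abinit": {},
        "refine": {"refine_gt_poses": True},
        "fixed": {"use_gt_poses": True},
    },
    "conf_estimation": {
        "autodecoder": {"use_conf_encoder": False},
        "encoder": {"use_conf_encoder": True},
    },
}


def resolve_config(user_config):
    """Expand quick_config presets, then let explicit parameters win."""
    resolved = {}
    for key, choice in user_config.get("quick_config", {}).items():
        resolved.update(QUICK_PRESETS.get(key, {}).get(choice, {}))
    # Explicit top-level parameters overwrite the values set by the presets.
    for key, value in user_config.items():
        if key != "quick_config":
            resolved[key] = value
    return resolved


config = {
    "dataset": "my_dataset",
    "quick_config": {"pose_estimation": "refine", "conf_estimation": "encoder"},
    "use_conf_encoder": False,  # explicit value overrides the preset
}
resolved = resolve_config(config)
print(resolved)
```

In this sketch the explicit `use_conf_encoder: False` wins over the value implied by `conf_estimation: encoder`, which is the behavior the warning describes.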

Dataset

particles: [string, required unless dataset is specified] Path to the picked particles (.mrcs / .star / .txt).

ctf: [string, required unless dataset is specified] Path to the CTF parameters (.pkl). Must be the size of the full dataset.

pose: [string, default = None] Path to the poses (.pkl). Must be the size of the full dataset.

dataset: [string, default = None] Name of the dataset, as defined in paths.json.

datadir: [string, default = None] When using a .star file or .cs file that specifies relative paths to .mrcs, this should be the path to the directory containing the .mrcs files.

ind: [string or int, default = None] Path to indices (.pkl) or number of images to keep (the first images of the dataset are kept). Will use the full dataset if None.

labels: [string, default = None] Path to ground truth labels (.pkl). Must be the size of the full dataset. Used for visualization purposes. Ignored if None.

relion31: [bool, default = False] Flag for the RELION 3.1 data format.

no_trans: [bool, default = False] Flag that indicates the dataset does not contain translations.

invert_data: [bool, default = True] Flag for inverting input data. Set to False to leave the data uninverted (e.g. for EMPIAR-10076).

norm_mean: [float, default = None] Manually override data normalization mean.

norm_std: [float, default = None] Manually override data normalization standard deviation.
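
The two forms accepted by `ind` (a path to a pickled list of indices, or an integer count) can be illustrated as follows. This is a sketch under assumptions: `select_indices` is a hypothetical helper, and the pickle file is assumed to contain a flat sequence of integer indices.

```python
import pickle


def select_indices(ind, n_images):
    """Return the image indices to keep (hypothetical helper)."""
    if ind is None:
        # No selection: use the full dataset.
        return list(range(n_images))
    if isinstance(ind, int):
        # Integer: keep the first `ind` images of the dataset.
        return list(range(min(ind, n_images)))
    # Otherwise, treat `ind` as a path to a pickled sequence of indices.
    with open(ind, "rb") as f:
        return [int(i) for i in pickle.load(f)]


print(select_indices(3, 5))
```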

Initialization

use_gt_poses: [bool, default = False] Use ground truth poses.

refine_gt_poses: [bool, default = False] Flag for refining the poses.

use_gt_trans: [bool, default = False] Flag for using ground truth translations.

load: [string, default = None] Path to a checkpoint (.pkl).

initial_conf: [string, default = None] Path to initial conformations (.pkl). Will randomly initialize conformations if None.

Logging

log_interval: [int, default = 10,000] Number of images between printed outputs.

log_heavy_interval: [int, default = 5] Number of epochs between tensorboard updates and checkpoint saves.

verbose_time: [bool, default = False] Report the speed of the algorithm in the printed outputs.

Data Loading

shuffle: [bool, default = True] Flag for shuffling the dataset.

lazy: [bool, default = False] Flag for lazy data loading.

num_workers: [int, default = 2] Number of workers.

max_threads: [int, default = 16] Number of threads.

fast_dataloading: [bool, default = False] Flag for using accelerated data loading (beta version).

shuffler_size: [int, default = 32,768] Size of the shuffler, when using accelerated data loading.

batch_size_known_poses: [int, default = 16] Batch size when using known poses (random poses during pre-training or ground truth poses).

batch_size_hps: [int, default = 8] Batch size during hierarchical pose search.

batch_size_sgd: [int, default = 32] Batch size when using stochastic gradient descent.

Optimizers

hypervolume_optimizer_type: [string, default = “adam”] Optimizer type for the hypervolume (choice: “adam”).

pose_table_optimizer_type: [string, default = “adam”] Optimizer for the pose table (choice: “adam”, “lbfgs”).

conf_table_optimizer_type: [string, default = “adam”] Optimizer for the conformation table (choice: “adam”).

conf_encoder_optimizer_type: [string, default = “adam”] Optimizer for the conformation encoder.

lr: [float, default = 1e-4] Learning rate of the hypervolume.

lr_pose_table: [float, default = 1e-3] Learning rate of the pose table.

lr_conf_table: [float, default = 1e-2] Learning rate of the conformation table.

lr_conf_encoder: [float, default = 1e-4] Learning rate of the conformation encoder.

wd: [float, default = 0.0] Weight decay used by Adam optimizers.

Scheduling

n_imgs_pose_search: [int, default = 500,000] Number of images used by hierarchical pose search.

epochs_sgd: [int, default = 100] Number of epochs of stochastic gradient descent.

pose_only_phase: [int, default = 0] Number of images during which conformations are kept random (pose-only phase).

Masking

output_mask: [string, default = “circ”] Type of output mask (choice: “circ”, “frequency_marching”).

add_one_frequency_every: [int, default = 100,000] Number of images between successive additions of a new frequency to the output mask, during HPS.

n_frequencies_per_epoch: [int, default = 10] Number of frequencies to add in the output mask at each epoch, during SGD.

max_freq: [int, default = None] Highest frequency to use in the loss. Use all frequencies if None.

window_radius_gt_real: [float, default = 0.85] Radius of the circular mask applied on images in real space (maximum radius is 1).

l_start_fm: [int, default = 12] Starting size for the output mask when using frequency marching.
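
One way to read the frequency-marching parameters during HPS is as a mask radius that grows with the number of images seen. The sketch below rests on assumptions: `mask_radius_hps` and its `max_radius` argument are hypothetical, and the exact growth rule used by the software is not stated on this page.

```python
def mask_radius_hps(n_images_seen, l_start_fm=12,
                    add_one_frequency_every=100_000, max_radius=None):
    """Assumed output-mask radius after `n_images_seen` images of HPS."""
    # Start at l_start_fm and add one frequency per block of images.
    radius = l_start_fm + n_images_seen // add_one_frequency_every
    # Optional cap (hypothetical), e.g. the largest radius the images allow.
    if max_radius is not None:
        radius = min(radius, max_radius)
    return radius


for n in (0, 250_000, 500_000):
    print(n, mask_radius_hps(n))
```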

Loss

beta_conf: [float, default = 0.0] Beta term penalizing the KL divergence of the posterior distribution of conformations. Only used in variational mode.

trans_l1_regularizer: [float, default = 0.0] Strength of the L1 regularizer applied on estimated translations.

l2_smoothness_regularizer: [float, default = 0.0] Strength of the L2 smoothness regularization (penalization of strong gradients).

Conformations

variational_het: [bool, default = False] Flag activating the variational mode of conformation estimation.

z_dim: [int, default = 4] Dimension of conformations.

std_z_init: [float, default = 0.1] Standard deviation of the initial conformations (i.i.d. with a centered Gaussian distribution).

use_conf_encoder: [bool, default = False] Flag for using an encoder to predict conformations.

depth_cnn: [int, default = 5] Depth of the encoder.

channels_cnn: [int, default = 32] Number of channels in the encoder.

kernel_size_cnn: [int, default = 3] Size of the kernels in the encoder.

resolution_encoder: [int, default = None] Resolution of images given to the encoder. Images are not downsampled if None.
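
As an illustration of how `z_dim` and `std_z_init` interact, the sketch below draws one conformation vector per image from a centered Gaussian. The `init_conformations` helper and its `seed` argument are hypothetical; the software's own initialization code may differ.

```python
import random


def init_conformations(n_images, z_dim=4, std_z_init=0.1, seed=0):
    """Draw i.i.d. centered Gaussian conformations, one row per image."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, std_z_init) for _ in range(z_dim)]
        for _ in range(n_images)
    ]


table = init_conformations(n_images=3)
print(len(table), len(table[0]))
```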

Hypervolume

explicit_volume: [bool, default = False] Flag for using an explicit volume (voxel array).

hypervolume_layers: [int, default = 3] Number of hidden layers in the hypervolume.

hypervolume_dim: [int, default = 256] Dimension of hidden layers in the hypervolume.

pe_type: [string, default = “gaussian”] Type of positional encoding for Fourier coordinates (choice: “gaussian”).

pe_dim: [int, default = 64] Number of frequencies used for positional encoding.

feat_sigma: [float, default = 0.5] Standard deviation of encoding frequencies.

hypervolume_domain: [string, default = “hartley”] Domain of the hypervolume (choice: “hartley”).

pe_type_conf: [string, default = None] Type of positional encoding for conformations (choice: None, “geom”).

Pre-training

n_imgs_pretrain: [int, default = 10,000] Number of images used for pre-training.

pretrain_with_gt_poses: [bool, default = False] Flag for using ground truth poses at pre-training time.
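
The image-count parameters (`n_imgs_pretrain`, `n_imgs_pose_search`) suggest a training run split into successive phases. The sketch below assumes the order pre-training, then hierarchical pose search, then SGD. That ordering and the `training_phase` helper are assumptions for illustration, and phases may be skipped depending on the pose estimation mode.

```python
def training_phase(n_images_seen, n_imgs_pretrain=10_000,
                   n_imgs_pose_search=500_000):
    """Return the assumed phase after `n_images_seen` images."""
    if n_images_seen < n_imgs_pretrain:
        return "pretrain"
    if n_images_seen < n_imgs_pretrain + n_imgs_pose_search:
        return "pose_search"
    # After pose search, training continues with SGD for epochs_sgd epochs.
    return "sgd"


for n in (0, 10_000, 510_000):
    print(n, training_phase(n))
```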

Pose Search

l_start: [int, default = 12] Number of frequencies used during the first pose search step.

l_end: [int, default = 32] Number of frequencies used during the last pose search step.

n_iter: [int, default = 4] Number of pose search iterations.

t_extent: [float, default = 20.0] Extent of the translation search grid, in pixels.

t_n_grid: [int, default = 7] Number of points per dimension in the translation search grid.

t_x_shift: [float, default = 0.0] X-axis shift of the translation search grid.

t_y_shift: [float, default = 0.0] Y-axis shift of the translation search grid.

no_trans_search_at_pose_search: [bool, default = False] Flag for bypassing the translation search.

n_kept_poses: [int, default = 8] Number of poses kept per image.

base_healpy: [int, default = 2] Base healpy index.
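
To make the translation grid parameters (`t_extent`, `t_n_grid`, `t_x_shift`, `t_y_shift`) concrete, the sketch below builds a square grid of candidate shifts. It assumes evenly spaced points spanning [-t_extent, t_extent] along each axis; the `translation_grid` helper is hypothetical, and the layout used by the software may differ (for instance, cell-centered points).

```python
def translation_grid(t_extent=20.0, t_n_grid=7, t_x_shift=0.0, t_y_shift=0.0):
    """Return a list of (x, y) candidate translations, in pixels."""
    if t_n_grid == 1:
        offsets = [0.0]
    else:
        step = 2 * t_extent / (t_n_grid - 1)
        offsets = [-t_extent + i * step for i in range(t_n_grid)]
    # Shift the whole grid by (t_x_shift, t_y_shift).
    return [(x + t_x_shift, y + t_y_shift) for x in offsets for y in offsets]


grid = translation_grid()
print(len(grid), grid[0], grid[-1])
```

With the default values this gives 7 x 7 = 49 candidate translations per pose.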

Subtomogram Averaging (experimental!)

subtomogram_averaging: [bool, default = False] Flag for subtomogram averaging.

n_tilts: [int, default = 11] Number of tilts kept per particle.

dose_per_tilt: [float, default = 2.93] Dose per tilt (TODO: specify units).

angle_per_tilt: [float, default = 3.0] Angle between consecutive tilts, in degrees.

n_tilts_pose_search: [int, default = 11] Number of tilts used during pose search.

average_over_tilts: [bool, default = False] Flag for averaging subtomograms over the tilt dimension before pose search.

tilt_axis_angle: [float, default = 0.0] Angle between the vertical axis and the tilt axis.

dose_exposure_correction: [bool, default = True] Flag for using a dose exposure correction.

Others

color_palette: [string, default = None] Type of palette used to visualize the predicted conformations (choice: None, “rainbow”, “linear”).

test_installation: [bool, default = False] Flag for testing the installation.

seed: [int, default = -1] Random seed. Randomly chosen if negative.

multigpu: [bool, default = False] Flag for activating multi-GPU mode if more than one GPU is available.
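
The convention for `seed` (a negative value means "pick one at random") can be sketched as below. The `resolve_seed` helper and the range of the random draw are hypothetical; only the negative-means-random rule comes from this page.

```python
import random


def resolve_seed(seed=-1):
    """Return the seed actually used; draw one at random if negative."""
    if seed < 0:
        seed = random.randrange(2**31)  # hypothetical range
    random.seed(seed)
    return seed


print(resolve_seed(42))
```

Logging the returned value makes a run reproducible even when the seed was chosen randomly.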

Analysis Parameters

epoch: [int, default = 100] Number of epochs the model has been trained for.

skip_umap: [bool, default = False] Flag for skipping UMAP (can be time consuming on large datasets).

pc: [int, default = 2] Number of principal components that will be analyzed.

n_per_pc: [int, default = 10] Number of volumes sampled along each principal component.

ksample: [int, default = 20] Number of centroids used for k-means.

invert: [bool, default = True] Flag to multiply the volumes by -1 before saving them.

sample_z_idx: [list of ints, default = None] List of indices to sample. Not used if None.

trajectory_1d: [list of 3 ints, default = None] Used to generate a linear trajectory between two points. The first entry in the list is the index of the first volume, the second entry is the index of the last volume and the third entry is the number of volumes that will be generated between the two. Not used if None.

direct_traversal_txt: [string, default = None] Path to a list of indices (.txt). Will generate volumes sampled on a linear path between consecutive pairs of indices (10 volumes between each pair). Not used if None.

z_values_txt: [string, default = None] Path to a list of z’s (.txt) to sample volumes from. Not used if None.

seed: [int, default = -1] Random seed. Randomly chosen if negative.
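
The linear paths used by `trajectory_1d` and `direct_traversal_txt` amount to interpolating between latent coordinates. The sketch below takes the conformation vectors of the selected indices directly. Both helpers are hypothetical, and whether the endpoints count toward the number of generated volumes is an assumption (here they are included).

```python
def linear_trajectory(z_start, z_end, n_points):
    """Return n_points evenly spaced latent vectors, endpoints included."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    path = []
    for i in range(n_points):
        t = i / (n_points - 1)
        path.append([a + t * (b - a) for a, b in zip(z_start, z_end)])
    return path


def direct_traversal(anchors, n_points=10):
    """Chain linear segments between consecutive anchor points."""
    path = []
    for z_a, z_b in zip(anchors[:-1], anchors[1:]):
        # Shared anchors appear at the end of one segment and the start of the next.
        path.extend(linear_trajectory(z_a, z_b, n_points))
    return path


print(linear_trajectory([0.0, 0.0], [1.0, 2.0], 3))
```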
