Command Line Interface

Biotrainer provides several command-line interface (CLI) commands for different tasks. This document gives an overview of all available commands and their options.

Training

The train command is used to start a training run with a specified configuration:

biotrainer train --config path/to/config.yml

This command accepts either a path to a YAML configuration file or a configuration dictionary and executes the training pipeline according to the specified parameters.
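
For scripted workflows, the same command can be launched from Python. The following is a minimal sketch, assuming the biotrainer executable is installed and on the PATH; the helper names build_train_command and run_training are hypothetical and not part of biotrainer.

```python
import subprocess
from pathlib import Path


def build_train_command(config_path: str) -> list[str]:
    """Return the argument list for `biotrainer train --config <path>`."""
    return ["biotrainer", "train", "--config", str(config_path)]


def run_training(config_path: str) -> None:
    """Start a training run (hypothetical helper, not a biotrainer API)."""
    # Fail early with a clear message instead of letting the CLI fail later.
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"Config file not found: {config_path}")
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(build_train_command(config_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.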

Prediction

The predict command allows you to use a trained model for making predictions:

biotrainer predict --training-output-file path/to/training_output.json --model-input input_sequences [--save-embeddings]

Parameters:

--training-output-file: Path to the training output file generated during model training
--model-input: Either a path to a FASTA file or a comma-separated list of sequences
--save-embeddings: Optional flag to save the computed embeddings (Default: False)

The command will output predictions for each input sequence, displaying the sequence ID and its corresponding prediction.

Example

biotrainer predict --training-output-file examples/residue_to_class/output/out.yml --model-input "HMMHM","MAHM"
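
For more than a handful of sequences, a FASTA file is usually easier to handle than a comma-separated list. The sketch below writes sequences to a FASTA file and assembles the predict command for either input style. The helper names and the sequence IDs Seq1 and Seq2 are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def write_fasta(records: dict[str, str], path: str) -> None:
    """Write {sequence_id: sequence} pairs as a FASTA file."""
    lines = []
    for seq_id, sequence in records.items():
        lines.append(f">{seq_id}")  # header line with the sequence ID
        lines.append(sequence)      # sequence on the following line
    Path(path).write_text("\n".join(lines) + "\n")


def build_predict_command(training_output_file: str, model_input: str,
                          save_embeddings: bool = False) -> list[str]:
    """Return the argument list for `biotrainer predict` (hypothetical helper)."""
    command = ["biotrainer", "predict",
               "--training-output-file", training_output_file,
               "--model-input", model_input]
    if save_embeddings:
        command.append("--save-embeddings")
    return command


# Comma-separated input: the shell example above passes the same value,
# because the shell removes the quotes around each sequence.
inline_input = ",".join(["HMMHM", "MAHM"])
inline_command = build_predict_command(
    "examples/residue_to_class/output/out.yml", inline_input)
```

To use a file instead, call write_fasta first and pass the resulting path as model_input.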

Format Conversion

The convert command converts the deprecated three-file biotrainer input (separate sequence, labels, and masks files) into the current single-file format (input_file):

biotrainer convert --sequence-file sequences.fasta [--labels-file labels.txt] [--masks-file masks.txt] [--converted-file output.fasta] [--target-format fasta] [--skip_inconsistencies]

Parameters:

--sequence-file: Input sequence file
--labels-file: Optional file containing labels
--masks-file: Optional file containing masks
--converted-file: Output file name (Default: "converted.fasta")
--target-format: Target format for conversion (Default: "fasta", "csv" is planned but not currently supported)
--skip_inconsistencies: Whether to skip sequences that are inconsistent across the input files, e.g. a sequence without a label (Default: False)
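
Because most convert parameters are optional, a script typically adds only the flags it needs. The following sketch shows one way to assemble the argument list from the parameters listed above; build_convert_command is a hypothetical helper, and omitted options are left to the documented defaults.

```python
from typing import Optional


def build_convert_command(sequence_file: str,
                          labels_file: Optional[str] = None,
                          masks_file: Optional[str] = None,
                          converted_file: Optional[str] = None,
                          target_format: Optional[str] = None,
                          skip_inconsistencies: bool = False) -> list[str]:
    """Return the argument list for `biotrainer convert` (hypothetical helper)."""
    command = ["biotrainer", "convert", "--sequence-file", sequence_file]
    optional_values = {
        "--labels-file": labels_file,
        "--masks-file": masks_file,
        "--converted-file": converted_file,
        "--target-format": target_format,
    }
    # Only pass options that were set; the CLI applies its defaults otherwise.
    for flag, value in optional_values.items():
        if value is not None:
            command += [flag, value]
    if skip_inconsistencies:
        command.append("--skip_inconsistencies")
    return command
```

For example, build_convert_command("sequences.fasta", labels_file="labels.txt") yields a command that converts sequences and labels and writes to the default output file.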

Auto-Evaluation

The autoeval command performs automatic evaluation of embedder model performance on a given framework of downstream tasks:

biotrainer autoeval --embedder-name embedder_name --framework framework_name [--min-seq-length min_length] [--max-seq-length max_length] [--use-half-precision]

Parameters:

--embedder-name: Name of the embedder to evaluate
--framework: Name of the framework to use (currently only "flip" is supported)
--min-seq-length: Minimum sequence length to consider (Default: 0)
--max-seq-length: Maximum sequence length to consider (Default: 2000)
--use-half-precision: Whether to use half-precision computation (Default: False)

The command will print progress updates during the evaluation process.
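
Before starting a long evaluation, it can help to validate the options up front. The sketch below checks the values against the constraints listed above and then assembles the command. build_autoeval_command is a hypothetical helper, and the check that the minimum length does not exceed the maximum is an assumption about sensible input, not documented CLI behavior.

```python
SUPPORTED_FRAMEWORKS = {"flip"}  # per the parameter list above


def build_autoeval_command(embedder_name: str, framework: str,
                           min_seq_length: int = 0,
                           max_seq_length: int = 2000,
                           use_half_precision: bool = False) -> list[str]:
    """Return the argument list for `biotrainer autoeval` (hypothetical helper)."""
    if framework not in SUPPORTED_FRAMEWORKS:
        raise ValueError(f"Unsupported framework: {framework}")
    # Assumed sanity check: an empty length range would select no sequences.
    if min_seq_length > max_seq_length:
        raise ValueError("min_seq_length must not exceed max_seq_length")
    command = ["biotrainer", "autoeval",
               "--embedder-name", embedder_name,
               "--framework", framework,
               "--min-seq-length", str(min_seq_length),
               "--max-seq-length", str(max_seq_length)]
    if use_half_precision:
        command.append("--use-half-precision")
    return command
```

The length limits are always passed explicitly here, so the resulting command records exactly which range was evaluated.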

Important Notes

Required parameters must be set explicitly on the command line; optional parameters (shown in square brackets) fall back to the defaults listed above
File paths can be provided as either relative or absolute paths
For the predict command, sequences can be provided either through a FASTA file or as a comma-separated list
The convert command is particularly useful for converting deprecated file formats to the current standard
