
Command Line Interface

Biotrainer provides several command-line interface (CLI) commands for different tasks. This document gives an overview of the available commands and their options.

Training

The train command is used to start a training run with a specified configuration:

biotrainer train --config path/to/config.yml

This command accepts either a path to a YAML configuration file or a configuration dictionary and executes the training pipeline according to the specified parameters.

Prediction

The predict command allows you to use a trained model for making predictions:

biotrainer predict --training-output-file path/to/training_output.json --model-input input_sequences [--save-embeddings]

Parameters:

  • --training-output-file: Path to the training output file generated during model training
  • --model-input: Either a path to a FASTA file or a comma-separated list of sequences
  • --save-embeddings: Optional flag to save the computed embeddings (Default: False)

The command will output predictions for each input sequence, displaying the sequence ID and its corresponding prediction.

Example

biotrainer predict --training-output-file examples/residue_to_class/output/out.yml --model-input "HMMHM","MAHM"
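
In the example above, the shell strips the double quotes, so biotrainer receives the single argument HMMHM,MAHM. When the sequences come from a script rather than being typed by hand, the comma-separated value can be assembled first. The following is a minimal sketch, not part of biotrainer itself: the variable name model_input is an assumption made for illustration, and the command is only printed (a dry run), not executed.

```shell
# Hypothetical sequences; replace with your own.
# paste -s joins the input lines into one line, using "," as the delimiter.
model_input=$(printf '%s\n' HMMHM MAHM | paste -sd, -)

# Dry run: print the command instead of executing it.
echo biotrainer predict \
  --training-output-file examples/residue_to_class/output/out.yml \
  --model-input "$model_input"
```

Removing the leading echo would run the command for real. For more than a handful of sequences, passing a FASTA file path to --model-input is likely the more practical choice.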

Format Conversion

The convert command converts the deprecated three-file biotrainer input (separate sequence, labels, and masks files) to the current single-file input (input_file):

biotrainer convert --sequence-file sequences.fasta [--labels-file labels.txt] [--masks-file masks.txt] [--converted-file output.fasta] [--target-format fasta] [--skip_inconsistencies]

Parameters:

  • --sequence-file: Input sequence file
  • --labels-file: Optional file containing labels
  • --masks-file: Optional file containing masks
  • --converted-file: Output file name (Default: "converted.fasta")
  • --target-format: Target format for conversion (Default: "fasta", "csv" is planned but not currently supported)
  • --skip_inconsistencies: Skip sequences that are inconsistent across the input files, e.g. a sequence that has no label (Default: False)
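
Before converting, it can help to see which sequences would count as inconsistent. The sketch below lists sequence IDs that have no matching entry in the labels file. It rests on an assumption not stated in this document: that both files use FASTA-style header lines (">ID"). The file contents, the workdir variable, and the ids helper are hypothetical and exist only for illustration.

```shell
# Create tiny hypothetical input files for illustration.
workdir=$(mktemp -d)
printf '>Seq1\nSEQWENCE\n>Seq2\nPRTEIN\n' > "$workdir/sequences.fasta"
printf '>Seq1\nDDDDDDDD\n' > "$workdir/labels.txt"

# Print the sorted, unique IDs of a file: the text after ">" up to the
# first whitespace character (assumes FASTA-style headers).
ids() { grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*$//' | sort -u; }
ids "$workdir/sequences.fasta" > "$workdir/seq_ids.txt"
ids "$workdir/labels.txt" > "$workdir/label_ids.txt"

# comm -23 keeps lines that appear only in the first file,
# i.e. IDs that have a sequence but no label.
missing=$(comm -23 "$workdir/seq_ids.txt" "$workdir/label_ids.txt")
echo "Sequences without a label: $missing"

# Clean up the temporary files.
rm -rf "$workdir"
```

If the list is non-empty, either fix the input files or pass --skip_inconsistencies so that the affected sequences are left out of the converted file.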

Auto-Evaluation

The autoeval command performs automatic evaluation of embedder model performance on a given framework of downstream tasks:

biotrainer autoeval --embedder-name embedder_name --framework framework_name [--min-seq-length min_length] [--max-seq-length max_length] [--use-half-precision]

Parameters:

  • --embedder-name: Name of the embedder to evaluate
  • --framework: Name of the framework to use (currently only "flip" is supported)
  • --min-seq-length: Minimum sequence length to consider (Default: 0)
  • --max-seq-length: Maximum sequence length to consider (Default: 2000)
  • --use-half-precision: Whether to use half-precision computation (Default: False)

The command will print progress updates during the evaluation process.

Important Notes

  • Required parameters must be set explicitly on the command line; optional parameters (shown in square brackets) fall back to the defaults listed above
  • File paths can be provided as either relative or absolute paths
  • For the predict command, sequences can be provided either through a FASTA file or as a comma-separated list
  • The convert command is particularly useful for converting deprecated file formats to the current standard