Training¶
Training pipeline¶
Training runs from `src/mlops_project/train.py` and is configured via Hydra using `configs/train_config.yaml`.
The training run:

- validates CUDA availability when `meta.require_cuda=true`,
- ensures the dataset and pairs exist (preprocessing if needed),
- loads the training and evaluation pairs,
- instantiates the SentenceTransformer model,
- trains with `SentenceTransformerTrainer`, evaluating precision@k during training.
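The precision@k metric used during evaluation measures how many of the top-k retrieved items are actually relevant. As a rough illustration (this is not the project's actual evaluator, which comes from SentenceTransformers), it can be computed as:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that appear in the relevant set."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k
```

For example, if the model retrieves `["a", "b", "c", "d"]` and the relevant set is `{"a", "c"}`, precision@3 is 2/3.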
Key configuration options include:

- `train.model` (default `all-MiniLM-L6-v2`),
- `train.loss`, set to `MultipleNegativesRankingLoss` or `ContrastiveLoss`,
- the usual training controls: `train.batch_size`, `train.epochs`, `train.warmup_ratio`, and `train.seed`,
- `meta.save_model` to toggle checkpoint saving,
- `meta.use_gcs` plus `meta.bucket_name` to load from and save to GCS paths,
- `wandb.enabled=false` to disable Weights & Biases logging for offline runs.
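The exact contents of `configs/train_config.yaml` are not reproduced here; a sketch consistent with the options above (the field layout and the default values shown are assumptions, not the project's actual defaults) might look like:

```yaml
meta:
  require_cuda: true
  save_model: true
  use_gcs: false
  bucket_name: null

train:
  model: all-MiniLM-L6-v2
  loss: MultipleNegativesRankingLoss  # or ContrastiveLoss
  batch_size: 32
  epochs: 1
  warmup_ratio: 0.1
  seed: 42

wandb:
  enabled: true
```

Any of these keys can be overridden from the command line using Hydra's `key=value` syntax.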
Run training and outputs¶
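Run training with the defaults from `configs/train_config.yaml`:

```bash
uv run python src/mlops_project/train.py
```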
Override parameters:
```bash
uv run python src/mlops_project/train.py train.loss=ContrastiveLoss train.epochs=3 wandb.enabled=false
```
Docker + CUDA¶
Requires Docker with NVIDIA GPU support.
Build the CUDA training image:
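A typical build command, assuming the CUDA training Dockerfile is named `train_cuda.dockerfile` at the repository root (adjust the `-f` path to match where the repo keeps its Dockerfiles):

```bash
docker build -f train_cuda.dockerfile -t mlops-train:cuda .
```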
Run training on GPU with mounted volumes:
```bash
docker run --rm --gpus all \
  -v ${PWD}/data:/app/data \
  -v ${PWD}/models:/app/models \
  mlops-train:cuda uv run python src/mlops_project/train.py
```
Disable W&B logging:
```bash
docker run --rm --gpus all \
  -v ${PWD}/data:/app/data \
  -v ${PWD}/models:/app/models \
  mlops-train:cuda uv run python src/mlops_project/train.py wandb.enabled=false
```
Models are saved under `models/<model>-<loss>-<pairs>-<balanced>` when `meta.save_model=true`. Logs stream to stdout and to `train.log` via Loguru.
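As an illustration only, the documented `models/<model>-<loss>-<pairs>-<balanced>` pattern can be composed like this (the exact formatting of the pairs and balanced fields is an assumption, so check the project's actual save logic):

```python
def model_dir(model: str, loss: str, pairs: int, balanced: bool) -> str:
    # Hypothetical helper: joins the four components with hyphens in the
    # documented models/<model>-<loss>-<pairs>-<balanced> order.
    return f"models/{model}-{loss}-{pairs}-{balanced}"
```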