LLM Fine-Tuning Pipeline
End-to-end custom model training, delivered as Docker
Built by Rogue AI · Engineered end-to-end · Production since 2026
First clean docker-compose run end-to-end: late January 2026. Fifty-plus commits across training, conversion, and benchmark layers since. Most recent iteration: April 2026.
The problem
Off-the-shelf LLMs don't know your domain. Cloud fine-tuning APIs are expensive, slow, and require handing proprietary training data to a third party. Most open-source fine-tuning recipes are notebook demos that fall over in production.
What I built
A containerized fine-tuning pipeline that runs on a single commodity GPU. Ingests JSONL training data, runs QLoRA training with a configurable base model, merges adapter weights, exports to GGUF for Ollama, and runs a benchmark harness — all from one docker compose up.
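To make the ingest step concrete, here is a minimal sketch of a JSONL validation pass of the kind that could sit in front of training. It is not the pipeline's actual loader: the `prompt` and `response` field names and the `parse_jsonl` helper are assumptions for illustration.

```python
import json
from typing import Iterable

# Hypothetical schema: these field names are an assumption, not the pipeline's real format.
REQUIRED_FIELDS = ("prompt", "response")


def parse_jsonl(lines: Iterable[str]):
    """Split JSONL lines into valid records and (line_number, reason) errors."""
    records, errors = [], []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            errors.append((lineno, "not a JSON object"))
            continue
        # A field counts as missing if it is absent, not a string, or only whitespace.
        missing = [
            field for field in REQUIRED_FIELDS
            if not isinstance(obj.get(field), str) or not obj[field].strip()
        ]
        if missing:
            errors.append((lineno, f"missing or empty: {', '.join(missing)}"))
            continue
        records.append(obj)
    return records, errors
```

In use, `parse_jsonl(open("train.jsonl", encoding="utf-8"))` returns the usable rows plus a list of line numbers to fix, so a bad file fails before any GPU time is spent. The file name is a placeholder.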
Architecture
Tech stack
What broke first
- Dataset quality dwarfs everything. Spent two weekends tuning rank/alpha before admitting the JSONL was the problem: 1,200 cleaned rows beat 8,000 dirty ones.
- Q4_K_M is a trap for technical text. The model started hallucinating CLI flags. Default quantization is now Q5_K_M: slower inference, less lying.
- Rank > 64 on a 13B base OOMs a 24 GB card no matter how clever your bitsandbytes config is. Page-table thrashing masquerades as 'just slow' until it isn't.
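The dataset lesson above is the easiest one to act on in code. The sketch below shows common hygiene steps (Unicode and whitespace normalization, length bounds, exact-duplicate removal). It is an illustration under assumptions, not the cleaning that produced the 1,200-row set: the `prompt`/`response` fields, the `clean_rows` helper, and the length thresholds are all hypothetical.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """NFKC-normalize and collapse whitespace so near-identical rows hash the same."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def clean_rows(rows, min_chars=1, max_chars=8000):
    """Drop rows with an empty prompt, an out-of-bounds response, or a duplicate pair.

    Field names and thresholds are assumptions for illustration.
    """
    seen = set()
    kept = []
    for row in rows:
        prompt = normalize(row.get("prompt", ""))
        response = normalize(row.get("response", ""))
        if not prompt or not (min_chars <= len(response) <= max_chars):
            continue
        # Hash the normalized pair; the NUL separator avoids ambiguous concatenations.
        key = hashlib.sha256(f"{prompt}\x00{response}".encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append({"prompt": prompt, "response": response})
    return kept
```

This only catches exact duplicates after normalization. Near-duplicates and factually wrong rows still need a separate pass or manual review.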
Outcome
Trained custom models on domain-specific corpora without sending data to third-party APIs. Inference served locally via Ollama on the same host. Replaces recurring fine-tuning spend with a one-time training run.
Honest limits
No eval harness for tool-use or function-calling yet; evaluation covers only perplexity and held-out task accuracy. The Modelfile generator hardcodes the system prompt; multi-tenant use would need that templated. RunPod is still cheaper for experiment runs, so the pipeline supports both RunPod and local training, but the local path is what production uses.
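For reference, the two metrics named above reduce to a few lines. This is a sketch of the standard definitions, not the benchmark harness itself: perplexity is the exponential of the negative mean natural-log probability over held-out tokens, and exact match after whitespace stripping is an assumed scoring rule for task accuracy. Both function names are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability) over held-out tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after stripping whitespace.

    Exact match is an assumed scoring rule; a real task may need a looser comparison.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    if not references:
        raise ValueError("need at least one example")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```

A model that assigns every held-out token probability 0.25 scores a perplexity of 4. Lower is better, and the number is only comparable across models that share a tokenizer.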
