# Training the Math LoRA Adapter

Step-by-step instructions for training the math adapter on neurocore (or any machine with an NVIDIA GPU and 8GB+ VRAM).

No GPU? Use the Colab notebook instead; it runs on a free T4 GPU and produces the same GGUF file.
## Prerequisites

- NVIDIA GPU with 8GB+ VRAM (tested on an RTX 2060 SUPER)
- CUDA 12.1+ with compatible PyTorch
- Python 3.10+
- Ollama installed and running
- The `qwen3:4b` model pulled in Ollama
## Overview

The training pipeline:

1. **Prepare data**: download GSM8K examples and format them for the Qwen3 chat template
2. **Train**: run QLoRA adapter training on Qwen3-4B with Unsloth, then merge the LoRA weights and export as GGUF
3. **Deploy**: load the merged GGUF into Ollama and verify with the eval benchmark
The key design choice: we merge the LoRA weights into the base model and export a standalone GGUF. Ollama's `ADAPTER` directive does not support Qwen3 LoRA adapters (only Llama, Mistral, and Gemma), so merging is required.
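Merging folds the low-rank update into the dense weights, so no separate adapter file is needed at inference time. A minimal NumPy sketch of the arithmetic (toy shapes for illustration, not the real model dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions -- the real q/k/v/o_proj matrices are much larger.
d, r, alpha = 64, 16, 32

W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # LoRA down-projection
B = np.zeros((d, r))                 # LoRA up-projection (zero-init)

# During training only A and B receive gradients. At export time the
# low-rank update is folded into the base weight with alpha/r scaling:
W_merged = W + (alpha / r) * (B @ A)

# With B still zero-initialized, the merged weight equals the base weight.
assert np.allclose(W_merged, W)
```

After the merge, the resulting dense matrix is quantized and written out as a plain GGUF, which is why Ollama can load it without any adapter support.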
## Phase 1: Environment Setup

Training uses Unsloth, which requires specific versions of transformers, peft, and torch that conflict with the locollm runtime. We use a separate venv (`.venv-train`) so training dependencies don't break the main project.

```shell
cd ~/projects/research/loco-llm
python3 -m venv .venv-train
source .venv-train/bin/activate
pip install --upgrade pip
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
```

Why a separate venv? The main project uses `uv sync`, but Unsloth's dependency matrix is incompatible with locollm's runtime deps. Keep the training venv separate.
Pull the base model in Ollama (if not already present):

```shell
ollama pull qwen3:4b
```

## Phase 2: Prepare Training Data
With the training venv active:

```shell
source .venv-train/bin/activate
python scripts/prepare_gsm8k.py --num-examples 200
```

This produces `adapters/math/training_data.jsonl`: 200 math problems in Qwen3 chat format:

```json
{"conversations": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

Each answer includes step-by-step reasoning ending with "The answer is N".
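For reference, here is a small sketch of how one problem/solution pair could be wrapped in this format (`to_chat_record` is a hypothetical helper for illustration, not necessarily what `prepare_gsm8k.py` does internally):

```python
import json

def to_chat_record(question: str, answer: str) -> str:
    """Wrap one GSM8K problem/solution pair in the conversations format."""
    record = {
        "conversations": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }
    # One JSON object per line -> JSONL
    return json.dumps(record, ensure_ascii=False)

line = to_chat_record(
    "Natalia sold clips to 48 friends. Each friend bought 2. How many clips did she sell?",
    "48 friends x 2 clips each = 96 clips. The answer is 96",
)
parsed = json.loads(line)
assert parsed["conversations"][0]["role"] == "user"
assert parsed["conversations"][1]["content"].endswith("The answer is 96")
```

Keeping the trailing "The answer is N" sentence in every assistant turn is what lets the eval benchmark extract answers reliably later.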
## Phase 3: Train and Export

```shell
source .venv-train/bin/activate
python scripts/train_math_adapter.py
```

This script:

- Loads Qwen3-4B in 4-bit quantization via Unsloth
- Applies LoRA (r=16, alpha=32) to the attention projections (q/k/v/o_proj)
- Trains the adapter for 3 epochs (effective batch size 8) with SFTTrainer
- Merges the LoRA weights into the base model
- Exports a Q4_K_M GGUF to `adapters/math/gguf/`

Training takes ~15 minutes on an RTX 2060 SUPER with 200 examples.
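With these numbers, the optimizer step count is easy to sanity-check. A back-of-the-envelope calculation (assuming no sequence packing or dropped examples):

```python
examples = 200          # GSM8K records from Phase 2
per_device_batch = 2    # batch size from the hyperparameter table
grad_accum = 4          # gradient accumulation steps
epochs = 3

effective_batch = per_device_batch * grad_accum   # 2 * 4 = 8
steps_per_epoch = examples // effective_batch     # 200 // 8 = 25
total_steps = steps_per_epoch * epochs            # 25 * 3 = 75

print(f"{total_steps} optimizer steps across {epochs} epochs")
```

Seventy-five optimizer steps is a very short run, which is why training finishes in minutes rather than hours on a consumer GPU.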
## Training Hyperparameters

| Parameter | Value | Rationale |
|---|---|---|
| LoRA rank | 16 | Standard middle ground for math reasoning |
| LoRA alpha | 32 | 2x rank scaling factor |
| Learning rate | 2e-4 | Standard QLoRA default |
| Epochs | 3 | Small dataset needs multiple passes |
| Batch size | 2 (gradient accum 4) | Fits 8GB VRAM, effective batch 8 |
| Max seq length | 1024 | Sufficient for GSM8K problems |
| Quantization | Q4_K_M | Matches project standard (~2.5GB) |
## Phase 4: Deploy to Ollama

Create a Modelfile and load it into Ollama:

```shell
echo 'FROM ./adapters/math/gguf/unsloth.Q4_K_M.gguf' > adapters/math/Modelfile
ollama create locollm-math -f adapters/math/Modelfile
```

Or use the CLI:

```shell
loco setup
```

## Phase 5: Verify
### Manual smoke test

```shell
ollama run locollm-math "What is 15 + 27?"
ollama run locollm-math "Solve for x: 2x + 5 = 13"
```

### Automated evaluation

```shell
loco eval math
```

This runs the 20-problem benchmark comparing the base `qwen3:4b` against the merged math adapter. Expect the adapter-trained model to score higher due to consistent answer formatting.
## Troubleshooting

**Out of memory during training:** reduce `BATCH_SIZE` to 1 in `scripts/train_math_adapter.py`; gradient accumulation will compensate.

**Unsloth install fails:** check CUDA version compatibility. Unsloth requires CUDA 12.1+. Run `nvidia-smi` to verify the driver version.

**GGUF file not found after training:** check `adapters/math/gguf/` for any `.gguf` files; Unsloth may use a slightly different naming convention than expected.

**`loco setup` skips the math adapter:** the GGUF must exist before setup can register it. Train first, then run setup.
## Generic Training Script

The math adapter can also be trained with the generic `train_adapter.py`:

```shell
python scripts/train_adapter.py --adapter-name math
```

This is equivalent to `train_math_adapter.py` with the same defaults. The generic script supports all adapters (math, code, analysis); see Training New Adapters for the full workflow.
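One plausible way a generic script stays equivalent to the specialized one is a per-adapter defaults table keyed by `--adapter-name`. A hedged sketch (the `math` values come from the hyperparameter table above; the dict name, helper, and structure are illustrative and not guaranteed to match the real `train_adapter.py`):

```python
import argparse

# Hypothetical per-adapter defaults; only "math" is filled in here,
# using values from this document. Other adapters would add entries.
ADAPTER_CONFIGS = {
    "math": {
        "data": "adapters/math/training_data.jsonl",
        "lora_r": 16,
        "lora_alpha": 32,
        "learning_rate": 2e-4,
        "epochs": 3,
        "max_seq_length": 1024,
    },
}

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train a LoRA adapter")
    parser.add_argument("--adapter-name", required=True,
                        choices=sorted(ADAPTER_CONFIGS))
    return parser.parse_args(argv)

args = parse_args(["--adapter-name", "math"])
config = ADAPTER_CONFIGS[args.adapter_name]
assert config["lora_r"] == 16
```

Centralizing defaults this way keeps `--adapter-name math` behaviorally identical to the dedicated script while letting new adapters be added as data rather than code.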
## Output Files

| File | Purpose |
|---|---|
| `adapters/math/training_data.jsonl` | Formatted GSM8K training data |
| `adapters/math/gguf/unsloth.Q4_K_M.gguf` | Merged GGUF ready for Ollama |
| `adapters/math/checkpoints/` | Training checkpoints (can be deleted after export) |
| `adapters/math/Modelfile` | Ollama Modelfile pointing to the GGUF |