# Training the Math LoRA Adapter

Step-by-step instructions for training the math adapter on neurocore (or any machine with an NVIDIA GPU and 8GB+ VRAM).

No GPU? Use the Colab notebook instead; it runs on a free T4 GPU and produces the same GGUF file.
## Prerequisites

- NVIDIA GPU with 8GB+ VRAM (tested on an RTX 2060 SUPER)
- CUDA 12.1+ with compatible PyTorch
- Python 3.10+
- Ollama installed and running
- The `qwen3:4b` model pulled in Ollama
## Overview

The training pipeline:

1. **Prepare data**: download GSM8K examples and format them for the Qwen3 chat template
2. **Train**: run QLoRA adapter training on Qwen3-4B with Unsloth, then merge the LoRA weights and export as GGUF
3. **Deploy**: load the merged GGUF into Ollama and verify with the eval benchmark
The key design choice: we merge the LoRA weights into the base model and export a standalone GGUF. Ollama's `ADAPTER` directive does not support Qwen3 LoRA adapters (only Llama, Mistral, and Gemma), so merging is required.
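Merging folds the low-rank update into the dense weights, so no separate adapter file is needed at inference time. A minimal NumPy sketch of the arithmetic (toy shapes for illustration, not the real model dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions -- the real q/k/v/o_proj matrices are much larger.
d, r, alpha = 64, 16, 32

W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # LoRA down-projection
B = np.zeros((d, r))                 # LoRA up-projection (zero-init)

# During training only A and B receive gradients. At export time the
# low-rank update is folded into the base weight with alpha/r scaling:
W_merged = W + (alpha / r) * (B @ A)

# With B still zero-initialized, the merged weight equals the base weight.
assert np.allclose(W_merged, W)
```

After the merge, the resulting dense matrix is quantized and written out as a plain GGUF, which is why Ollama can load it without any adapter support.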
## Phase 1: Environment Setup

Training uses Unsloth, which requires specific versions of transformers, peft, and torch that conflict with the locollm runtime. We use a separate venv (`.venv-train`) so training dependencies don't break the main project.

```shell
cd ~/projects/research/loco-llm
python3 -m venv .venv-train
source .venv-train/bin/activate
pip install --upgrade pip
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
```

Why a separate venv? The main project uses `uv sync`, but Unsloth's dependency matrix is incompatible with locollm's runtime deps. Keep the training venv separate.
Pull the base model in Ollama (if not already present):

```shell
ollama pull qwen3:4b
```

## Phase 2: Prepare Training Data
With the training venv active:

```shell
source .venv-train/bin/activate
python scripts/prepare_gsm8k.py --num-examples 200
```

This produces `adapters/math/training_data.jsonl`: 200 math problems in Qwen3 chat format:

```json
{"conversations": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

Each answer includes step-by-step reasoning ending with "The answer is N".
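For reference, here is a small sketch of how one problem/solution pair could be wrapped in this format (`to_chat_record` is a hypothetical helper for illustration, not necessarily what `prepare_gsm8k.py` does internally):

```python
import json

def to_chat_record(question: str, answer: str) -> str:
    """Wrap one GSM8K problem/solution pair in the conversations format."""
    record = {
        "conversations": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }
    # One JSON object per line -> JSONL
    return json.dumps(record, ensure_ascii=False)

line = to_chat_record(
    "Natalia sold clips to 48 friends. Each friend bought 2. How many clips did she sell?",
    "48 friends x 2 clips each = 96 clips. The answer is 96",
)
parsed = json.loads(line)
assert parsed["conversations"][0]["role"] == "user"
assert parsed["conversations"][1]["content"].endswith("The answer is 96")
```

Keeping the trailing "The answer is N" sentence in every assistant turn is what lets the eval benchmark extract answers reliably later.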
## Phase 3: Train and Export

```shell
source .venv-train/bin/activate
python scripts/train_math_adapter.py
```

This script:

- Loads Qwen3-4B in 4-bit quantization via Unsloth
- Applies LoRA (r=16, alpha=32) to the attention projections (q/k/v/o_proj)
- Trains the adapter for 3 epochs (effective batch size 8) with SFTTrainer
- Merges the LoRA weights into the base model
- Exports a Q4_K_M GGUF to `adapters/math/gguf/`

Training takes ~15 minutes on an RTX 2060 SUPER with 200 examples.
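With these numbers, the optimizer step count is easy to sanity-check. A back-of-the-envelope calculation (assuming no sequence packing or dropped examples):

```python
examples = 200          # GSM8K records from Phase 2
per_device_batch = 2    # batch size from the hyperparameter table
grad_accum = 4          # gradient accumulation steps
epochs = 3

effective_batch = per_device_batch * grad_accum   # 2 * 4 = 8
steps_per_epoch = examples // effective_batch     # 200 // 8 = 25
total_steps = steps_per_epoch * epochs            # 25 * 3 = 75

print(f"{total_steps} optimizer steps across {epochs} epochs")
```

Seventy-five optimizer steps is a very short run, which is why training finishes in minutes rather than hours on a consumer GPU.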
## Training Hyperparameters

| Parameter | Value | Rationale |
|---|---|---|
| LoRA rank | 16 | Standard middle ground for math reasoning |
| LoRA alpha | 32 | 2x rank scaling factor |
| Learning rate | 2e-4 | Standard QLoRA default |
| Epochs | 3 | Small dataset needs multiple passes |
| Batch size | 2 (gradient accum 4) | Fits 8GB VRAM, effective batch 8 |
| Max seq length | 1024 | Sufficient for GSM8K problems |
| Quantization | Q4_K_M | Matches project standard (~2.5GB) |
## Phase 4: Deploy to Ollama

Create a Modelfile and load it into Ollama:

```shell
echo 'FROM ./adapters/math/gguf/unsloth.Q4_K_M.gguf' > adapters/math/Modelfile
ollama create locollm-math -f adapters/math/Modelfile
```

Or use the CLI:

```shell
loco setup
```

## Phase 5: Verify
### Manual smoke test

```shell
ollama run locollm-math "What is 15 + 27?"
ollama run locollm-math "Solve for x: 2x + 5 = 13"
```

### Automated evaluation

```shell
loco eval math
```

This runs the 20-problem benchmark comparing the base `qwen3:4b` against the merged math adapter. Expect the adapter-trained model to score higher due to consistent answer formatting.
## Troubleshooting

**Out of memory during training:** reduce `BATCH_SIZE` to 1 in `scripts/train_math_adapter.py`; gradient accumulation will compensate.

**Unsloth install fails:** check CUDA version compatibility. Unsloth requires CUDA 12.1+. Run `nvidia-smi` to verify the driver version.

**GGUF file not found after training:** check `adapters/math/gguf/` for any `.gguf` files; Unsloth may use a slightly different naming convention than expected.

**`loco setup` skips the math adapter:** the GGUF must exist before setup can register it. Train first, then run setup.
## Generic Training Script

The math adapter can also be trained with the generic `train_adapter.py`:

```shell
python scripts/train_adapter.py --adapter-name math
```

This is equivalent to `train_math_adapter.py` with the same defaults. The generic script supports all adapters (math, code, analysis); see Training New Adapters for the full workflow.
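One plausible way a generic script stays equivalent to the specialized one is a per-adapter defaults table keyed by `--adapter-name`. A hedged sketch (the `math` values come from the hyperparameter table above; the dict name, helper, and structure are illustrative and not guaranteed to match the real `train_adapter.py`):

```python
import argparse

# Hypothetical per-adapter defaults; only "math" is filled in here,
# using values from this document. Other adapters would add entries.
ADAPTER_CONFIGS = {
    "math": {
        "data": "adapters/math/training_data.jsonl",
        "lora_r": 16,
        "lora_alpha": 32,
        "learning_rate": 2e-4,
        "epochs": 3,
        "max_seq_length": 1024,
    },
}

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train a LoRA adapter")
    parser.add_argument("--adapter-name", required=True,
                        choices=sorted(ADAPTER_CONFIGS))
    return parser.parse_args(argv)

args = parse_args(["--adapter-name", "math"])
config = ADAPTER_CONFIGS[args.adapter_name]
assert config["lora_r"] == 16
```

Centralizing defaults this way keeps `--adapter-name math` behaviorally identical to the dedicated script while letting new adapters be added as data rather than code.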
## Output Files

| File | Purpose |
|---|---|
| `adapters/math/training_data.jsonl` | Formatted GSM8K training data |
| `adapters/math/gguf/unsloth.Q4_K_M.gguf` | Merged GGUF ready for Ollama |
| `adapters/math/checkpoints/` | Training checkpoints (can be deleted after export) |
| `adapters/math/Modelfile` | Ollama Modelfile pointing to the GGUF |