
Training the Math LoRA Adapter

Step-by-step instructions for training the math adapter on neurocore (or any machine with an NVIDIA GPU and 8GB+ VRAM).

No GPU? Use the Colab notebook instead — it runs on a free T4 GPU and produces the same GGUF file.

Prerequisites:

  • NVIDIA GPU with 8GB+ VRAM (tested on RTX 2060 SUPER)
  • CUDA 12.1+ with compatible PyTorch
  • Python 3.10+
  • Ollama installed and running
  • The qwen3:4b model pulled in Ollama
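
A quick way to confirm some of these prerequisites before creating the training venv is sketched below. It is a minimal check, assuming that `nvidia-smi` and `ollama` are expected on the PATH; the function name `check_prerequisites` is illustrative and not part of the project. It does not verify VRAM size, the CUDA version, or whether qwen3:4b has been pulled.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which of the listed prerequisites are visible from this Python process."""
    return {
        # The guide asks for Python 3.10 or newer.
        "python_3_10_plus": sys.version_info >= (3, 10),
        # nvidia-smi ships with the NVIDIA driver; if it is missing, the driver probably is too.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # The ollama CLI is needed to pull the base model and create the merged one.
        "ollama_on_path": shutil.which("ollama") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Any line marked MISSING points at the prerequisite to fix first.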

The training pipeline:

  1. Prepare data: download GSM8K examples and format them for the Qwen3 chat template
  2. Train: train a QLoRA adapter on Qwen3-4B with Unsloth, then merge the LoRA weights and export as GGUF
  3. Deploy: load the merged GGUF into Ollama and verify it with the eval benchmark

The key design choice: we merge LoRA weights into the base model and export a standalone GGUF. Ollama does not support Qwen3 LoRA adapters via the ADAPTER directive (only Llama/Mistral/Gemma), so merging is required.
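
To make "merging" concrete: in the standard LoRA formulation the adapter adds a low-rank update to each targeted weight matrix, and merging folds that update into the matrix as W' = W + (alpha / r) · B · A. The toy sketch below uses plain Python lists and made-up 2x2 values to show that the merged matrix gives the same output as applying the adapter at runtime. It illustrates the arithmetic only and is not the code Unsloth runs.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter (illustrative values; this guide uses rank 16).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # shape (rank, in_features) = (1, 2)
B = [[0.5], [0.25]]   # shape (out_features, rank) = (2, 1)
x = [[3.0], [4.0]]    # input as a column vector
ALPHA, RANK = 2, 1

merged = merge_lora(W, A, B, ALPHA, RANK)

# Adapter applied at runtime: W x + (alpha / r) * B (A x)
wx = matmul(W, x)
bax = matmul(B, matmul(A, x))
runtime_out = [[wx[i][0] + (ALPHA / RANK) * bax[i][0]] for i in range(len(wx))]

# Merged weights applied directly: W' x
merged_out = matmul(merged, x)

assert merged_out == runtime_out
```

Because the merged matrix already contains the update, the exported GGUF needs no adapter support from the runtime, which is why the ADAPTER limitation does not affect it.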

Training uses Unsloth, which requires specific versions of transformers, peft, and torch that conflict with the locollm runtime. We use a separate venv (.venv-train) so training dependencies don’t break the main project.

Terminal window
cd ~/projects/research/loco-llm
python3 -m venv .venv-train
source .venv-train/bin/activate
pip install --upgrade pip
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo

Why a separate venv? The main project uses uv sync, but Unsloth’s dependency matrix is incompatible with locollm’s runtime deps. Keep the training venv separate.

Pull the base model in Ollama (if not already present):

Terminal window
ollama pull qwen3:4b

With the training venv active:

Terminal window
source .venv-train/bin/activate
python scripts/prepare_gsm8k.py --num-examples 200

This produces adapters/math/training_data.jsonl — 200 math problems in Qwen3 chat format:

{"conversations": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}

Each answer includes step-by-step reasoning ending with “The answer is N”.
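
As a rough illustration of that conversion, the sketch below turns one GSM8K-style record into a training line. It assumes the public GSM8K layout, where the answer field holds the worked reasoning followed by `#### <final answer>` and may contain inline calculator annotations such as `<<3*4=12>>`. The function name `format_example` and the sample record are invented for this example; scripts/prepare_gsm8k.py may do this differently.

```python
import json
import re


def format_example(question: str, answer: str) -> dict:
    """Convert one GSM8K-style record into a chat-style training record."""
    # GSM8K answers end with "#### <final answer>" after the worked reasoning.
    reasoning, _, final = answer.partition("####")
    # Drop inline calculator annotations such as <<3*4=12>>.
    reasoning = re.sub(r"<<.*?>>", "", reasoning).strip()
    # Normalise the final number: trim whitespace and remove thousands separators.
    final = final.strip().replace(",", "")
    return {
        "conversations": [
            {"role": "user", "content": question.strip()},
            {"role": "assistant", "content": f"{reasoning}\nThe answer is {final}"},
        ]
    }


# Invented sample in the GSM8K layout (not taken from the dataset).
sample = {
    "question": "Tom has 3 boxes with 4 pens each. How many pens does he have?",
    "answer": "Tom has 3 * 4 = <<3*4=12>>12 pens.\n#### 12",
}
record = format_example(sample["question"], sample["answer"])
line = json.dumps(record)  # one line of training_data.jsonl
print(line)
```

Ending every assistant turn with the same closing phrase is what lets an evaluator pull the final number out of the response reliably.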

Terminal window
source .venv-train/bin/activate
python scripts/train_math_adapter.py

This script:

  1. Loads Qwen3-4B in 4-bit quantization via Unsloth
  2. Applies LoRA (r=16, alpha=32) to attention layers (q/k/v/o_proj)
  3. Trains the adapter for 3 epochs (effective batch size 8) with SFTTrainer
  4. Merges LoRA weights into the base model
  5. Exports as Q4_K_M GGUF to adapters/math/gguf/

Training takes ~15 minutes on an RTX 2060 SUPER with 200 examples.

| Parameter | Value | Rationale |
| --- | --- | --- |
| LoRA rank | 16 | Standard middle ground for math reasoning |
| LoRA alpha | 32 | 2x rank scaling factor |
| Learning rate | 2e-4 | Standard QLoRA default |
| Epochs | 3 | Small dataset needs multiple passes |
| Batch size | 2 (gradient accum 4) | Fits 8GB VRAM, effective batch 8 |
| Max seq length | 1024 | Sufficient for GSM8K problems |
| Quantization | Q4_K_M | Matches project standard (~2.5GB) |
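
The numbers above fit together as follows. This is a small worked calculation, assuming the 200-example dataset from the data preparation step. Apart from BATCH_SIZE, the constant names are illustrative and may not match the ones used in scripts/train_math_adapter.py.

```python
import math

NUM_EXAMPLES = 200      # from prepare_gsm8k.py --num-examples 200
BATCH_SIZE = 2          # per-device batch size
GRAD_ACCUM_STEPS = 4    # gradient accumulation steps
EPOCHS = 3
LORA_RANK = 16
LORA_ALPHA = 32

# Gradients from several small batches are summed before each optimizer step.
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS               # 2 * 4 = 8
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)   # 200 / 8 = 25
total_steps = steps_per_epoch * EPOCHS                        # 25 * 3 = 75

# LoRA scales the low-rank update by alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK                           # 32 / 16 = 2.0

print(effective_batch, steps_per_epoch, total_steps, lora_scale)
```

Dropping the per-device batch size to 1 keeps the effective batch at 8 only if gradient accumulation rises to 8 at the same time.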

Create a Modelfile and load into Ollama:

Terminal window
echo 'FROM ./adapters/math/gguf/unsloth.Q4_K_M.gguf' > adapters/math/Modelfile
ollama create locollm-math -f adapters/math/Modelfile

Or use the CLI:

Terminal window
loco setup

Verify that the model responds:

Terminal window
ollama run locollm-math "What is 15 + 27?"
ollama run locollm-math "Solve for x: 2x + 5 = 13"

Run the evaluation benchmark:

Terminal window
loco eval math

This runs the 20-problem benchmark comparing base qwen3:4b vs the merged math adapter. Expect the adapter-trained model to score higher due to consistent answer formatting.
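
Why answer formatting matters for the score can be seen in a small sketch of how a benchmark might extract the final number. This is an assumption about the general approach, not the actual `loco eval` implementation; `extract_answer` and `is_correct` are hypothetical helpers.

```python
import re

# Matches the trained closing phrase, e.g. "The answer is 42" or "The answer is $1,234.5".
ANSWER_PATTERN = re.compile(r"The answer is\s*\$?(-?[\d,]*\.?\d+)")


def extract_answer(text: str):
    """Return the last 'The answer is N' value as a float, or None if absent."""
    matches = ANSWER_PATTERN.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def is_correct(model_output: str, expected: float) -> bool:
    """True when the extracted answer matches the expected value."""
    answer = extract_answer(model_output)
    return answer is not None and abs(answer - expected) < 1e-6


print(is_correct("2x = 8, so x = 4. The answer is 4", 4))
print(is_correct("I think x is four.", 4))
```

Under a checker like this, a response that reasons correctly but never states "The answer is N" is counted as wrong, so a model trained to always end that way has an advantage.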

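Out of memory during training: Reduce BATCH_SIZE to 1 in scripts/train_math_adapter.py. To keep the effective batch size at 8, gradient accumulation has to rise from 4 to 8; check that the script adjusts it, or change it alongside BATCH_SIZE.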

Unsloth install fails: Check CUDA version compatibility. Unsloth requires CUDA 12.1+. Run nvidia-smi to check the driver version and the highest CUDA version that driver supports.

GGUF file not found after training: Check adapters/math/gguf/ for any .gguf files — Unsloth may use a slightly different naming convention than expected.
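
If the filename differs, one option is to locate whatever .gguf file was produced and point the Modelfile at it. The sketch below assumes the directory layout used in this guide; `find_gguf` and `write_modelfile` are hypothetical helpers, not part of the project. It writes an absolute path so that the FROM line does not depend on the directory a relative path would be resolved against.

```python
from pathlib import Path


def find_gguf(gguf_dir: Path):
    """Return the most recently modified .gguf file in gguf_dir, or None."""
    if not gguf_dir.is_dir():
        return None
    candidates = sorted(gguf_dir.glob("*.gguf"), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None


def write_modelfile(gguf_path: Path, modelfile_path: Path) -> str:
    """Write a one-line Modelfile whose FROM points at gguf_path; return its contents."""
    content = f"FROM {gguf_path.resolve()}\n"
    modelfile_path.write_text(content)
    return content


if __name__ == "__main__":
    gguf = find_gguf(Path("adapters/math/gguf"))
    if gguf is None:
        print("No .gguf file found; training or export may not have completed.")
    else:
        write_modelfile(gguf, Path("adapters/math/Modelfile"))
        print(f"Modelfile now points at {gguf}")
```

After this, `ollama create locollm-math -f adapters/math/Modelfile` can be run as before.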

loco setup skips math adapter: The GGUF must exist before setup can register it. Train first, then run setup.

The math adapter can also be trained using the generic train_adapter.py:

Terminal window
python scripts/train_adapter.py --adapter-name math

This is equivalent to train_math_adapter.py with the same defaults. The generic script supports all adapters (math, code, analysis) — see Training New Adapters for the full workflow.

| File | Purpose |
| --- | --- |
| adapters/math/training_data.jsonl | Formatted GSM8K training data |
| adapters/math/gguf/unsloth.Q4_K_M.gguf | Merged GGUF ready for Ollama |
| adapters/math/checkpoints/ | Training checkpoints (can be deleted after export) |
| adapters/math/Modelfile | Ollama Modelfile pointing to the GGUF |