Yes, we're a little loco · Open Source · MIT License

LocoLLM

Frontier AI on a budget. Crazy, right? We're building a routed swarm of tiny specialist models and testing whether they can outperform generalists on real tasks. No cloud. No API keys. Just your hardware doing more than you'd expect.

Get Started · View on GitHub
terminal
$ loco setup
✓ Pulled qwen3:4b via Ollama (2.5 GB)
$ loco query "solve 2x + 5 = 13"
$ loco query "solve 2x + 5 = 13" --adapter math
✓ Math adapter loaded · x = 4
$ loco eval math # how much does the adapter help?
How It Works

One base model. Many specialists.
Sounds loco. Works great.

The idea is simple: instead of one giant model that's okay at everything, route each query to a lightweight specialist fine-tuned for that task. Research suggests this approach has real potential. We're building the tools to find out.

1. Single Base Model

Qwen3-4B quantized to Q4_K_M. Fits in 2.5 GB of VRAM. Runs on any GPU from the last 6 years, or even a laptop CPU.

2. LoRA Adapters

Tiny specialist layers (50-300 MB each) fine-tuned for specific domains. Hot-swap in milliseconds. We're starting with math, code, writing, and analysis.

3. Smart Router

Classifies your query and picks the best adapter automatically. No manual switching. Just ask your question. (This is the part we're building next.)
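The router in step 3 is still being built, but the core idea can be sketched in a few lines. This is a minimal illustration only — the adapter names come from the roadmap above, while the keyword lists and the `route` function are hypothetical stand-ins, not the real LocoLLM classifier.

```python
# Minimal sketch of the routing idea: score a query against each
# specialist's keywords and pick the best match. Keyword lists here
# are illustrative; a real router would use a trained classifier.

ADAPTER_KEYWORDS = {
    "math": ["solve", "equation", "integral", "derivative", "="],
    "code": ["function", "bug", "python", "compile", "refactor"],
    "writing": ["essay", "rewrite", "tone", "draft", "summarise"],
    "analysis": ["compare", "evaluate", "trade-off", "pros and cons"],
}

def route(query: str) -> str:
    """Return the adapter whose keywords best match the query.

    Falls back to the bare base model when nothing matches.
    """
    q = query.lower()
    scores = {
        name: sum(kw in q for kw in keywords)
        for name, keywords in ADAPTER_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "base"

print(route("solve 2x + 5 = 13"))  # → math
print(route("tell me a story"))    # → base
```

Even this toy version shows the shape of the problem: classification is cheap, and a wrong pick just means answering with the base model instead of a specialist.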

We're all a little loco here.

The sensible approach is to pay for API access and let someone else handle it. We'd rather find out what's possible on a GPU we bought secondhand for the price of a nice dinner. Maybe that's loco. We think it's worth finding out.

Skills Over Gear

Built on the 80-20 principle. You don't need a $10k GPU rig. A secondhand graphics card and good training data will get you 80% of the way there. That's not a limitation. That's the point.

🎓 Built Together

Every adapter, benchmark, and routing improvement makes the whole system smarter. Contribute a specialist and everyone benefits. That's the theory. We're building the evidence.

🔒 Runs Offline

No API keys. No cloud bills. No data leaving your machine. Everything runs locally through Ollama. Your queries, your hardware, your business.

📈 AI Last Resort

LocoLLM is a thinking partner, not an answer machine. Do the work first, then use AI to check, challenge, and sharpen. That's not old-fashioned. That's how you actually learn.

Who It's For

Are you loco enough?

If any of these sound like you, welcome to the club.

💰 The Budget Rebel

You refuse to pay per-token for something your own hardware can do. You've done the math on API costs and it offends you. Good. Channel that energy.

🤖 The Tinkerer

You want to understand how LLMs actually work by cracking them open and rewiring the internals. Fine-tuning a real adapter teaches more than any tutorial ever will.

🔬 The Researcher

You need reproducible local inference for experiments. You want to test whether a team of specialists really can beat a generalist. That's an open question. Help us answer it.

🏫 The Educator

You teach AI or computing and want a real project your classes can contribute to. Not a toy demo. Real infrastructure that grows with every cohort.

🔐 The Vault

Your data doesn't leave your machine. Period. Medical notes, legal research, personal journals, proprietary code. Local means local.

The 80-20er

You know the best gear doesn't make the best work. A $150 secondhand GPU and sharp training data might just surprise you. That's what we're betting on.

Architecture

Deceptively simple.

A query comes in, the router classifies it, the matching adapter loads onto the base model, and the response goes out. No orchestration frameworks. No agent graphs. No PhD required. Whether this simplicity is a strength or a limitation is what we're here to find out.

Your Query
  ↓
Router (classifier)
  ↓
math · code · writing · analysis · yours?
LoRA Adapters (50-300 MB each) · 4 building now, more to come
  ↓
Qwen3-4B · Q4_K_M · 2.5 GB
Base model loaded once, adapters hot-swap
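The "loaded once, hot-swap" flow above can be sketched as a tiny runtime object. This is a stand-in sketch, not the real implementation — the `AdapterRuntime` class and its methods are hypothetical, and actual inference in LocoLLM runs through Ollama rather than the stubbed string below.

```python
# Sketch of the hot-swap flow: the base model stays resident, and only
# the small LoRA adapter changes between queries. Loading and inference
# are stubbed; real serving would go through Ollama.

class AdapterRuntime:
    def __init__(self, base_model: str):
        self.base_model = base_model   # loaded once, kept resident
        self.active_adapter = None

    def _load_adapter(self, name: str) -> None:
        # Stand-in for mapping 50-300 MB of LoRA weights onto the
        # resident base model.
        self.active_adapter = name

    def query(self, prompt: str, adapter: str) -> str:
        if adapter != self.active_adapter:
            self._load_adapter(adapter)  # the cheap swap, not a reload
        return f"[{self.base_model}+{self.active_adapter}] {prompt}"

rt = AdapterRuntime("qwen3:4b")
print(rt.query("solve 2x + 5 = 13", adapter="math"))
```

The design point is in the `if`: switching specialists touches only the adapter, never the 2.5 GB base model, which is why the swap takes milliseconds.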

Early days. Eyes wide open.

LocoLLM is a young project. We're not pretending otherwise. Here's what exists today and where we're headed.

Proof of Concept · Now

Single math adapter on Qwen3-4B, basic CLI, manual adapter loading. It works. We can measure the difference.

MVP: Four Specialists · Building

Math, code, writing, and analysis adapters. Simple router. Standardised evaluation. The first real test of whether routing beats a generalist.

Validation · Next

Rigorous benchmarks comparing specialist routing vs. base model across domains. Honest results, published openly, whatever they show.

The Vision

A growing ecosystem of community-trained specialist adapters, smart routing, and one-command setup. Frontier-capable AI that runs on hardware you already own.
Get Involved

Join the loco ones.

LocoLLM is a collaborative project. Every adapter, benchmark, and improvement makes the whole system better for everyone. The barrier to entry is low. The ceiling is high.

🧪 Train an Adapter

Pick a domain you care about, curate a dataset, fine-tune with Unsloth or MLX, and contribute it back. Your specialisation becomes everyone's tool.
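The curation step is the part anyone can start today, no GPU required. A common input format for fine-tuning tools such as Unsloth is JSONL, one example per line. The file name and the instruction/response field names below are illustrative — check your trainer's expected schema before committing to one.

```python
# A tiny example of curating fine-tuning data as JSONL: one JSON object
# per line, each a self-contained instruction/response pair. Field names
# follow a common convention but vary between trainers.
import json

examples = [
    {"instruction": "Solve 2x + 5 = 13 for x.",
     "response": "2x = 8, so x = 4."},
    {"instruction": "Differentiate x^2 with respect to x.",
     "response": "d/dx x^2 = 2x."},
]

with open("math_adapter.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

print(f"wrote {len(examples)} examples to math_adapter.jsonl")
```

Quality beats quantity here: a few hundred carefully checked pairs in a narrow domain is exactly the "sharp training data" the 80-20 bet depends on.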

📊 Benchmark & Evaluate

Run standardised evaluations. Compare adapters against base models. Publish reproducible results. Help us prove what works.
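A standardised evaluation can be as plain as running the same problems through two answer functions and comparing accuracy. The harness below is a sketch: both "models" are stubs, and the eval set, function names, and answers are invented for illustration — swap in real inference (e.g. via Ollama) to benchmark for real.

```python
# Sketch of an adapter-vs-base evaluation harness. The two answer
# functions are stubs standing in for real model calls; the point is
# the shape of the comparison, not the numbers.

EVAL_SET = [
    ("solve 2x + 5 = 13", "4"),
    ("solve 3x - 6 = 9", "5"),
]

def base_model(q: str) -> str:
    # Stand-in generalist: always guesses "4", so it is sometimes wrong.
    return "4"

def math_adapter(q: str) -> str:
    # Stand-in specialist: answers the eval set correctly.
    return {"solve 2x + 5 = 13": "4", "solve 3x - 6 = 9": "5"}[q]

def accuracy(answer_fn) -> float:
    correct = sum(answer_fn(q) == gold for q, gold in EVAL_SET)
    return correct / len(EVAL_SET)

print(f"base:    {accuracy(base_model):.0%}")    # → 50%
print(f"adapter: {accuracy(math_adapter):.0%}")  # → 100%
```

The same fixed eval set, run against every new adapter, is what makes results reproducible and comparable across contributors.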

💻 Improve the Router

The router is what makes the system feel smart. Better classification means better adapter selection. Bring your NLP skills.

📝 Write Documentation

Guides, tutorials, setup instructions, troubleshooting. Good docs lower the barrier for the next contributor.

"Frontier AI on consumer hardware? That's loco."
Yeah. That's the name. Want to help us find out if it works?

Get in early.

We're just getting started. The best time to shape a project is before it's finished.

View on GitHub · Learn More