Frontier AI on a budget. Crazy, right? We're building a routed swarm of tiny specialist models and testing whether they can outperform generalists on real tasks. No cloud. No API keys. Just your hardware doing more than you'd expect.
The idea is simple: instead of one giant model that's okay at everything, route each query to a lightweight specialist fine-tuned for that task. Research suggests this approach has real potential. We're building the tools to find out.
Qwen3-4B quantized to Q4_K_M. Fits in 2.5 GB of VRAM. Runs on most GPUs from the last six years, or even a laptop CPU.
Tiny specialist layers (50-300 MB each) fine-tuned for specific domains. Hot-swap in milliseconds. We're starting with math, code, writing, and analysis.
The router classifies your query and picks the best adapter automatically. No manual switching. Just ask your question. (This is the part we're building next.)
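To make the routing idea concrete, here is a minimal sketch of a keyword-based classifier. The domain names match the four launch adapters above, but the keyword lists, function name, and "base" fallback are illustrative placeholders, not the real router.

```python
# Toy router: score each domain by keyword hits, fall back to the base model.
# Keyword lists here are illustrative; a real router would learn these.
ADAPTER_KEYWORDS = {
    "math": ["solve", "integral", "equation", "derivative", "prove"],
    "code": ["function", "bug", "python", "compile", "refactor"],
    "writing": ["essay", "rewrite", "draft", "tone", "paragraph"],
    "analysis": ["compare", "summarise", "trend", "dataset", "report"],
}

def route(query: str, default: str = "base") -> str:
    """Pick the domain with the most keyword hits; default if none match."""
    words = set(query.lower().split())
    scores = {
        domain: sum(k in words for k in keywords)
        for domain, keywords in ADAPTER_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(route("solve this equation for x"))  # math
print(route("tell me a story"))            # base
```

A keyword table is obviously crude; the point is that the router's contract is tiny: text in, adapter name out. Anything that honours that contract, from regexes to a small classifier model, can slot in.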
The sensible approach is to pay for API access and let someone else handle it. We'd rather find out what's possible on a GPU we bought secondhand for the price of a nice dinner. Maybe that's loco. We think it's worth finding out.
Built on the 80-20 principle. You don't need a $10k GPU rig. A secondhand graphics card and good training data will get you 80% of the way there. That's not a limitation. That's the point.
Every adapter, benchmark, and routing improvement makes the whole system smarter. Contribute a specialist and everyone benefits. That's the theory. We're building the evidence.
No API keys. No cloud bills. No data leaving your machine. Everything runs locally through Ollama. Your queries, your hardware, your business.
LocoLLM is a thinking partner, not an answer machine. Do the work first, then use AI to check, challenge, and sharpen. That's not old-fashioned. That's how you actually learn.
If any of these sound like you, welcome to the club.
You refuse to pay per-token for something your own hardware can do. You've done the math on API costs and it offends you. Good. Channel that energy.
You want to understand how LLMs actually work by cracking them open and rewiring the internals. Fine-tuning a real adapter teaches more than any tutorial ever will.
You need reproducible local inference for experiments. You want to test whether a team of specialists really can beat a generalist. That's an open question. Help us answer it.
You teach AI or computing and want a real project your classes can contribute to. Not a toy demo. Real infrastructure that grows with every cohort.
Your data doesn't leave your machine. Period. Medical notes, legal research, personal journals, proprietary code. Local means local.
You know the best gear doesn't make the best work. A $150 secondhand GPU and sharp training data might just surprise you. That's what we're betting on.
A query comes in, the router classifies it, the matching adapter loads onto the base model, and the response goes out. No orchestration frameworks. No agent graphs. No PhD required. Whether this simplicity is a strength or a limitation is what we're here to find out.
LocoLLM is a young project. We're not pretending otherwise. Here's what exists today and where we're headed.
Single math adapter on Qwen3-4B, basic CLI, manual adapter loading. It works. We can measure the difference.
Now: Math, code, writing, and analysis adapters. Simple router. Standardised evaluation. The first real test of whether routing beats a generalist.
Building: Rigorous benchmarks comparing specialist routing vs. base model across domains. Honest results, published openly, whatever they show.
Next: A growing ecosystem of community-trained specialist adapters, smart routing, and one-command setup. Frontier-capable AI that runs on hardware you already own.
Vision: LocoLLM is a collaborative project. Every adapter, benchmark, and improvement makes the whole system better for everyone. The barrier to entry is low. The ceiling is high.
Pick a domain you care about, curate a dataset, fine-tune with Unsloth or MLX, and contribute it back. Your specialisation becomes everyone's tool.
Run standardised evaluations. Compare adapters against base models. Publish reproducible results. Help us prove what works.
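The core of a standardised evaluation is small: run the same tasks through two model callables and compare a shared metric. This is a toy sketch with a two-item task set and trivial stand-in "models"; it is not LocoLLM's benchmark suite, just the shape of one.

```python
# Toy evaluation harness: same tasks, two models, exact-match accuracy.
def accuracy(model, tasks):
    """Fraction of tasks where the model's answer matches the reference."""
    correct = sum(model(t["prompt"]) == t["answer"] for t in tasks)
    return correct / len(tasks)

def base_model(prompt: str) -> str:
    # Stand-in generalist that always guesses "4".
    return "4"

def math_adapter(prompt: str) -> str:
    # Stand-in specialist that actually computes the arithmetic.
    a, op, b = prompt.removeprefix("What is ").removesuffix("?").split()
    a, b = int(a), int(b)
    return str(a + b if op == "+" else a * b)

tasks = [
    {"prompt": "What is 2 + 2?", "answer": "4"},
    {"prompt": "What is 3 * 3?", "answer": "9"},
]
print(f"base: {accuracy(base_model, tasks):.2f}")        # 0.50
print(f"specialist: {accuracy(math_adapter, tasks):.2f}")  # 1.00
```

Swap the stand-ins for real Ollama calls and a real task file, and the same accuracy() function produces a reproducible, publishable comparison.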
The router is what makes the system feel smart. Better classification means better adapter selection. Bring your NLP skills.
Guides, tutorials, setup instructions, troubleshooting. Good docs lower the barrier for the next contributor.
"Frontier AI on consumer hardware? That's loco."
Yeah. That's the name. Want to help us find out if it works?
We're just getting started. The best time to shape a project is before it's finished.