Frontier AI on a budget. Crazy, right? We're building a routed swarm of tiny specialist models and testing whether they can outperform generalists on real tasks. No cloud. No API keys. Just your hardware doing more than you'd expect.
The idea is simple: instead of one giant model that's okay at everything, route each query to a lightweight specialist fine-tuned for that task. Research suggests this approach has real potential. We're building the tools to find out.
Qwen3-4B quantized to Q4_K_M. Fits in 2.5 GB of VRAM. Tested on GPUs from a GTX 1050 Ti to an RTX 2060 Super. Runs on CPU too, just slower.
Tiny specialist adapters, low-rank layers fine-tuned for specific domains and merged into standalone GGUFs. We're starting with math, code, and analysis.
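For the curious, the merge step looks roughly like this. A minimal sketch using Hugging Face's peft library; the repo name and paths are illustrative, not the project's actual ones:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then apply a fine-tuned LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
merged = PeftModel.from_pretrained(base, "./adapters/math").merge_and_unload()

# Save the merged weights as a standalone model. From here, llama.cpp's
# convert_hf_to_gguf.py plus Q4_K_M quantisation produces a GGUF Ollama can serve.
merged.save_pretrained("./models/locollm-math")
AutoTokenizer.from_pretrained("Qwen/Qwen3-4B").save_pretrained("./models/locollm-math")
```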
Classifies your query and picks the best adapter automatically. No manual switching. Just ask your question. Keyword-based now, ML classifier next.
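A keyword router is surprisingly little code. Here's a minimal sketch of the idea; the keyword lists and adapter names are illustrative, not the project's actual routing table:

```python
# Hypothetical routing table. The real keyword lists live in the project.
ROUTES = {
    "math": {"solve", "equation", "integral", "probability", "calculate"},
    "code": {"function", "debug", "python", "compile", "refactor"},
    "analysis": {"summarise", "compare", "evaluate", "critique"},
}
DEFAULT = "base"  # fall back to the generalist when nothing matches

def route(query: str) -> str:
    """Pick the adapter whose keywords overlap most with the query."""
    words = set(query.lower().split())
    scores = {name: len(words & keywords) for name, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT
```

route("debug this python function") returns "code"; a query with no keyword hits falls through to the base model. The planned ML classifier would swap the overlap score for a learned one behind the same interface.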
The sensible approach is to pay for API access and let someone else handle it. We'd rather find out what's possible on a GPU we bought secondhand for the price of a nice dinner. Maybe that's loco. We think it's worth finding out.
You don't need a $10k GPU rig. A secondhand graphics card and good training data will get you most of the way there. That's not a limitation. That's the point.
Every adapter, benchmark, and routing improvement makes the whole system smarter. Contribute a specialist and everyone benefits. That's the theory. We're building the evidence.
No API keys. No cloud bills. No data leaving your machine. Everything runs locally through Ollama. Your queries, your hardware, your business.
LocoLLM is a thinking partner, not an answer machine. Do the work first, then use AI to check, challenge, and sharpen. That's not old-fashioned. That's how you actually learn.
If any of these sound like you, welcome to the club.
You refuse to pay per-token for something your own hardware can do. You've done the math on API costs and it offends you. Good. Channel that energy.
You want to understand how LLMs actually work by cracking them open and rewiring the internals. Fine-tuning a real adapter teaches more than any tutorial ever will.
You need reproducible local inference for experiments. You want to test whether a team of specialists really can beat a generalist. That's an open question. Help us answer it.
You teach AI or computing and want a real project your classes can contribute to. Not a toy demo. Real infrastructure that grows with every cohort.
Your data doesn't leave your machine. Period. Medical notes, legal research, personal journals, proprietary code. Local means local.
You know the best gear doesn't make the best work. A $150 secondhand GPU and sharp training data might just surprise you. That's what we're betting on.
A query comes in, the router picks an adapter, Ollama loads the matching model, and the response goes out. No orchestration frameworks. No agent graphs. No PhD required. Whether this simplicity is a strength or a limitation is what we're here to find out.
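In code, that whole loop is a few lines. A minimal sketch using the ollama Python client and the route() function sketched earlier; the locollm-* model names are placeholders for whatever the merged GGUFs are registered as locally:

```python
import ollama  # pip install ollama; talks to a locally running Ollama server

def answer(query: str) -> str:
    # 1. Classify the query and pick a specialist (route() from the sketch above).
    model = f"locollm-{route(query)}"
    # 2. Ollama loads the matching model and generates the response locally.
    response = ollama.chat(model=model, messages=[{"role": "user", "content": query}])
    # 3. The response goes out. No cloud round-trip anywhere.
    return response["message"]["content"]
```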
LocoLLM is a young project. We're not pretending otherwise. Here's what exists today and where we're headed.
Now: Three adapters (math, code, analysis) on Qwen3-4B, CLI with query and chat, keyword router, evaluation harness. It works. We can measure the difference.
Building: First student cohort adds new adapters, improves the router (keyword → ML classifier), and publishes benchmark results. The first real test of whether routing beats a generalist.
Next: Rigorous benchmarks comparing specialist routing vs. base model across domains. Honest results, published openly, whatever they show.
Vision: A growing ecosystem of community-trained specialist adapters, smart routing, and one-command setup. Competitive AI that runs on hardware you already own.

LocoLLM is a collaborative project. Every adapter, benchmark, and improvement makes the whole system better for everyone. The barrier to entry is low. The ceiling is high.
Pick a domain you care about — math, code, business writing, security, legal. Curate a dataset, fine-tune, and contribute it back. Your specialisation becomes everyone's tool.
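If you've never fine-tuned a model, the entry point is smaller than it looks. A sketch of the adapter setup with peft; the rank and target modules are generic defaults, not the project's settings:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")

# A small low-rank adapter: only the injected matrices are trained,
# so the trainable parameter count stays tiny relative to the 4B base.
config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
# Train on your curated dataset with your preferred trainer, save the
# adapter, and contribute it back.
```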
Run standardised evaluations. Compare adapters against base models. Publish reproducible results. Help us prove what works.
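Concretely, a comparison run can be as simple as scoring the same test set twice: once through the router, once through the base model. A sketch, assuming a JSONL test set and exact-match scoring (real benchmarks need task-appropriate metrics, and the model tags here are placeholders):

```python
import json
import ollama

def score(pick_model, cases):
    """Fraction of cases answered exactly right; pick_model maps a query to a model name."""
    hits = 0
    for case in cases:
        reply = ollama.chat(model=pick_model(case["query"]),
                            messages=[{"role": "user", "content": case["query"]}])
        hits += reply["message"]["content"].strip() == case["answer"]
    return hits / len(cases)

cases = [json.loads(line) for line in open("eval/math.jsonl")]  # {"query": ..., "answer": ...}
print("routed:", score(lambda q: f"locollm-{route(q)}", cases))
print("base:  ", score(lambda q: "qwen3:4b", cases))
```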
The router is what makes the system feel smart. Better classification means better adapter selection. Bring your NLP skills.
How do students actually use local AI? Design a study, observe real usage, and turn findings into actionable improvements. Research methods meet real users.
Is local AI actually cheaper than cloud? Build a cost model, run the numbers, and find out. Financial modelling meets open-source AI.
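The skeleton of such a model fits in a dozen lines. Every number below is a placeholder to show the shape of the comparison, not a finding:

```python
def monthly_cost_local(gpu_price=150, lifetime_months=36, watts=120,
                       hours_per_day=2, kwh_price=0.30):
    """Amortised hardware plus electricity per month. All inputs are placeholders."""
    hardware = gpu_price / lifetime_months
    electricity = watts / 1000 * hours_per_day * 30 * kwh_price
    return hardware + electricity

def monthly_cost_api(tokens_per_month=5_000_000, usd_per_million_tokens=2.00):
    """Cloud API spend per month. Pricing is a placeholder, not a quote."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens

print(f"local: ${monthly_cost_local():.2f}/mo vs api: ${monthly_cost_api():.2f}/mo")
```

The interesting work is in the inputs: realistic token volumes, actual electricity tariffs, and how fast the hardware depreciates.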
Guides, tutorials, video walkthroughs, onboarding redesign. Good docs lower the barrier for the next contributor.
If a project interests you but you're not sure you have the skills, that's probably the right project. The one that stretches you is the one you'll learn the most from.
LocoLLM is a School of Management and Marketing initiative at Curtin University. Whether you're a student looking for a capstone project, a researcher interested in collaboration, or just curious — we'd love to hear from you.
Project Lead: Michael Borck