LocoLLM Project Ideas
A catalogue of self-contained projects for capstone students and contributors. Each project is scoped to be achievable in one semester by a team of 2-4 people. Projects are grouped by discipline focus, but most are cross-disciplinary — a web interface project needs both design thinking and technical implementation.
How to read this page: Pick a project that interests you. Read the description and deliverables. If it sounds like something you want to work on, talk to the project lead. Most of the skills listed are learnable during the project — attitude and curiosity matter more than existing expertise.
Adapters: New Domains
Build a new specialist adapter that extends the LocoLLM system. Each adapter project follows the same cycle: curate data, train the adapter, evaluate it, document the results, and submit a PR.
A1. Statistics & Data Analysis Adapter
Train an adapter that handles descriptive statistics, hypothesis testing, and data interpretation. Data sources: statistics textbook problems, Kaggle dataset descriptions, introductory stats course materials.
Why it matters: Statistics sits at the intersection of math and analysis — the current router has no good home for “what does this p-value mean?” questions. This adapter tests whether domain specialisation helps at the boundary between existing domains.
Deliverables: Trained adapter (GGUF), evaluation dataset (50+ problems), training log, benchmark comparison vs base model and vs math adapter on statistics queries.
Discipline fit: Information Systems, Data Analysis, Business Analytics
Skills: Data curation (most important), basic Python, statistical literacy, technical writing
A2. Business Writing Adapter
Train an adapter for business communication: emails, proposals, executive summaries, meeting agendas. Training data from business writing guides, style manuals, and curated examples.
Why it matters: Most AI writing tools optimise for generic fluency. A business writing specialist could learn conventions (brevity, action orientation, audience awareness) that a general model handles poorly.
Deliverables: Trained adapter, evaluation rubric (scored by LLM-judge or human evaluation), style guide for the training data, benchmark results.
Discipline fit: Marketing, Management, Business Communication
Skills: Strong writing ability, data curation, understanding of business communication norms, basic Python
A3. Security & Risk Analysis Adapter
Train an adapter for cybersecurity concepts: risk assessment, vulnerability analysis, policy interpretation, incident response. Data from NIST frameworks, OWASP guides, and security case studies.
Why it matters: Security is a high-value domain where incorrect answers are dangerous. This project tests whether a small model can learn to be cautious and accurate in a domain where confidence calibration matters.
Deliverables: Trained adapter, evaluation dataset covering multiple security sub-domains, analysis of failure modes (where does the adapter give dangerously wrong answers?), comparison with base model.
Discipline fit: Information Systems, Cybersecurity, IT Management
Skills: Security domain knowledge, data curation, evaluation design, Python
A4. Legal / Compliance Adapter
Train an adapter for plain-language explanation of legal and compliance concepts: privacy law (GDPR, Australian Privacy Act), contract interpretation, regulatory compliance. The goal is understanding, not legal advice.
Why it matters: Business students encounter legal concepts constantly but legal language is opaque. An adapter that translates legalese into plain language has clear educational value. Also tests the “good enough threshold” — how accurate must a legal adapter be to be useful rather than harmful?
Deliverables: Trained adapter, evaluation dataset with expert-verified answers, explicit limitations document (what the adapter should NOT be used for), benchmark results.
Discipline fit: Law, Business Law, Compliance, Information Systems
Skills: Legal literacy, careful data curation (accuracy is critical), evaluation design, technical writing
Infrastructure: Making the System Better
Improve LocoLLM’s core infrastructure. These projects work on the system itself rather than individual adapters.
I1. Classifier Router
Replace the keyword router with a machine learning classifier. The current router matches keywords: it works for 3 adapters but will not scale to 10+. Build a router that uses text classification (TF-IDF + logistic regression, or a small sentence transformer) to route queries.
Why it matters: This is the core research question — does intelligent routing add value over a well-prompted base model? The classifier router is the first step toward answering it with real data.
Deliverables: Working classifier router (pluggable, same interface as keyword router), training pipeline using existing benchmark examples as labelled data, routing accuracy benchmark, comparison with keyword router.
Discipline fit: Information Systems, Data Analysis, AI/ML
Skills: Python, basic ML concepts (classification, train/test split), evaluation methodology
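For I1, a minimal sketch of what "pluggable, same interface" might look like, under several assumptions: the `route(query)` method name, the `base` fallback label, and the class names are hypothetical, since the actual router interface is not shown on this page. To stay dependency-free, the classifier below uses TF-IDF features with a nearest-centroid rule (cosine similarity) as a stand-in; it is not logistic regression, and a project team would likely swap in the classifier described above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordRouter:
    """Baseline: route to the first adapter whose keyword appears in the query."""

    def __init__(self, keywords, default="base"):
        self.keywords = keywords  # adapter name -> set of trigger words (hypothetical)
        self.default = default

    def route(self, query):
        tokens = set(tokenize(query))
        for adapter, words in self.keywords.items():
            if tokens & words:
                return adapter
        return self.default


class TfidfCentroidRouter:
    """TF-IDF features, one centroid per adapter, routed by cosine similarity."""

    def __init__(self, default="base"):
        self.default = default
        self.idf = {}
        self.centroids = {}

    def _vector(self, text):
        # Unit-length TF-IDF vector over terms seen during fit().
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else {}

    def fit(self, examples):
        """examples: list of (query, adapter_name) pairs used as labelled data."""
        doc_freq = Counter()
        for query, _ in examples:
            doc_freq.update(set(tokenize(query)))
        n = len(examples)
        # Smoothed IDF so every seen term gets a positive weight.
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
        sums = defaultdict(Counter)
        for query, adapter in examples:
            sums[adapter].update(self._vector(query))
        self.centroids = {a: dict(c) for a, c in sums.items()}
        return self

    def route(self, query):
        vec = self._vector(query)
        best, best_score = self.default, 0.0
        for adapter, centroid in self.centroids.items():
            norm = math.sqrt(sum(v * v for v in centroid.values()))
            if not norm:
                continue  # adapter had no usable training text
            score = sum(w * centroid.get(t, 0.0) for t, w in vec.items()) / norm
            if score > best_score:
                best, best_score = adapter, score
        return best
```

Because both classes expose the same `route` method, a routing accuracy benchmark can run the same labelled queries through each and compare the results directly. Queries with no known terms fall back to the default label rather than being forced onto an adapter.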
I2. Web Chat Interface
Build a browser-based chat interface for LocoLLM. The current CLI works but is not approachable for non-technical users. A web UI would make the system accessible to students who are not comfortable in a terminal.
Why it matters: If LocoLLM is a teaching tool, it needs to be usable by the people it is meant to teach. Not everyone learns best in a terminal. A web interface also opens the door to features like conversation history, adapter selection dropdowns, and visual feedback on routing decisions.
Deliverables: Working web chat UI (Flask/FastAPI + simple frontend), adapter selection, routing indicator (shows which adapter is handling the query), conversation history, deployment documentation.
Discipline fit: Information Systems, Web Development, UX Design
Skills: Python (Flask or FastAPI), basic HTML/CSS/JavaScript, API design, UX thinking
I3. Leaderboard & Adapter Dashboard
Build an automated leaderboard that ranks adapters by domain using benchmark scores. Display training metadata, version history, and performance trends across semesters.
Why it matters: As the adapter library grows, students need to see where their work stands relative to others. A leaderboard creates healthy competition and makes the project’s progress visible. It also provides the data needed to decide which adapters to activate for routing.
Deliverables: Leaderboard CLI command (loco leaderboard), generated static report (HTML or markdown), integration with registry benchmark scores, semester-over-semester trend tracking.
Discipline fit: Information Systems, Data Analysis, Data Visualisation
Skills: Python, data processing, basic web or reporting (static site generation), registry/YAML understanding
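For I3, the ranking step could be sketched roughly as follows. The entry fields `name`, `domain`, and `score` are assumptions made for illustration; the real registry schema is not described on this page, and the function names are hypothetical.

```python
from collections import defaultdict


def build_leaderboard(entries):
    """Group adapters by domain and rank each group by benchmark score, best first."""
    by_domain = defaultdict(list)
    for entry in entries:
        by_domain[entry["domain"]].append(entry)
    board = {}
    for domain, group in by_domain.items():
        # Ties break alphabetically by name so the output is deterministic.
        ranked = sorted(group, key=lambda e: (-e["score"], e["name"]))
        board[domain] = [
            (rank, e["name"], e["score"]) for rank, e in enumerate(ranked, start=1)
        ]
    return board


def to_markdown(board):
    """Render the leaderboard as one markdown table per domain."""
    lines = []
    for domain in sorted(board):
        lines += [f"## {domain}", "", "| Rank | Adapter | Score |", "| --- | --- | --- |"]
        lines += [f"| {rank} | {name} | {score:.2f} |" for rank, name, score in board[domain]]
        lines.append("")
    return "\n".join(lines)
```

Keeping ranking and rendering separate means the same `build_leaderboard` output could feed either the CLI command or the generated static report.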
I4. Tool Use Integration
Add a Python sandbox that adapters can call during inference. Train a proof-of-concept adapter (math) that generates tool calls instead of computing answers directly. Benchmark tool-calling vs direct computation.
Why it matters: Language models are bad at arithmetic. Computers are good at it. This project tests whether small models can learn to delegate computation to tools — the same pattern that frontier models use with code interpreter. See Architecture Vision for the full rationale.
Deliverables: Python sandbox (safe subprocess execution), tool-call training data for math adapter, retrained math adapter, benchmark comparison (tool-calling vs direct), latency analysis.
Discipline fit: Information Systems, Software Engineering, AI/ML
Skills: Python, subprocess/sandboxing, training data design, evaluation methodology
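For I4, a starting point for the subprocess side might look like the sketch below. The `<tool>...</tool>` tag format and both function names are hypothetical; the project would define its own tool-call syntax. Note the stated limitation: a child process with a timeout stops runaway code but is not a real security boundary, so a production sandbox would need OS-level isolation on top.

```python
import re
import subprocess
import sys

# Hypothetical tool-call syntax: the model wraps Python code in <tool> tags.
TOOL_CALL = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def run_in_sandbox(code, timeout_s=2.0):
    """Run a Python snippet in a separate interpreter and capture its output.

    This limits runaway code via a timeout but does NOT restrict file or
    network access; treat it as a sketch, not a hardened sandbox.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "output": "", "error": "timeout"}
    if result.returncode != 0:
        return {"ok": False, "output": result.stdout, "error": result.stderr.strip()}
    return {"ok": True, "output": result.stdout.strip(), "error": ""}


def resolve_tool_calls(model_output):
    """Replace each tool call in the model's text with the sandbox result."""

    def _run(match):
        result = run_in_sandbox(match.group(1))
        if result["ok"]:
            return result["output"]
        return f"[tool error: {result['error']}]"

    return TOOL_CALL.sub(_run, model_output)
```

With this shape, the adapter only has to learn to emit the expression, and the arithmetic itself is done by the interpreter. The per-call process start-up cost is one thing the latency analysis would need to measure.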
I5. Automated Rebuild Pipeline
Build a pipeline that retrains all adapters when the base model changes. Automate: pull new base model, retrain each adapter, export GGUFs, run benchmarks, generate comparison report.
Why it matters: The base model changes yearly. With 3 adapters, manual retraining is fine. With 15, it is a day of manual work. Automation turns the annual rebuild into a one-command operation and makes base model migration decisions evidence-based.
Deliverables: Pipeline script (or Makefile), parameterised training configs, automated benchmark runner, comparison report generator (old base vs new base for each adapter).
Discipline fit: Information Systems, DevOps, Software Engineering
Skills: Python, scripting/automation, CI/CD concepts, testing methodology
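For I5, the final step (the old-base vs new-base comparison) could be sketched as below. It assumes benchmark results arrive as plain mappings from adapter name to score; the function name, the `tolerance` parameter, and the verdict labels are illustrative choices, not part of the existing project.

```python
def compare_bases(old_scores, new_scores, tolerance=0.01):
    """Compare per-adapter benchmark scores between the old and new base model.

    Score changes within +/- tolerance are reported as "unchanged" so that
    small benchmark noise is not read as a real difference.
    """
    rows = []
    for adapter in sorted(set(old_scores) | set(new_scores)):
        old = old_scores.get(adapter)
        new = new_scores.get(adapter)
        if old is None or new is None:
            # Adapter was only benchmarked on one of the two base models.
            delta, verdict = None, "missing"
        else:
            delta = new - old
            if delta > tolerance:
                verdict = "improved"
            elif delta < -tolerance:
                verdict = "regressed"
            else:
                verdict = "unchanged"
        rows.append(
            {"adapter": adapter, "old": old, "new": new, "delta": delta, "verdict": verdict}
        )
    return rows
```

A report generator could then render these rows as a table, and a migration decision could be framed in terms of how many adapters regress on the candidate base model.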
Evaluation & Research
Projects focused on understanding how well the system works and generating publishable findings.
R1. Domain Benchmark Suite
Design and build comprehensive evaluation benchmarks for each adapter domain. The current benchmarks are minimal (50 questions). Production-quality benchmarks need 200+ questions per domain, difficulty tiers, and cross-domain contamination checks.
Why it matters: Benchmarks are the foundation of every claim the project makes. Weak benchmarks mean weak conclusions. This project directly improves the rigour of every other project’s results.
Deliverables: Expanded benchmark datasets for each domain, difficulty-tiered questions, contamination analysis (do benchmark questions appear in training data?), benchmark methodology document.
Discipline fit: Data Analysis, Research Methods, Information Systems
Skills: Domain knowledge (for question quality), data analysis, research methodology, attention to detail
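For R1, one simple way to approach the contamination analysis is word n-gram overlap between benchmark questions and training text. This is a rough sketch: the function names, the 5-gram window, and the 0.5 threshold are assumptions chosen for illustration, and exact n-gram matching will miss paraphrased leaks.

```python
import re


def ngrams(text, n=5):
    """Return the set of word n-grams in the text, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_report(benchmark_questions, training_texts, n=5, threshold=0.5):
    """Flag benchmark questions whose n-grams overlap heavily with training data.

    Returns (question, overlap) pairs, where overlap is the fraction of the
    question's n-grams that also occur somewhere in the training texts.
    """
    train_grams = set()
    for text in training_texts:
        train_grams |= ngrams(text, n)
    flagged = []
    for question in benchmark_questions:
        grams = ngrams(question, n)
        if not grams:
            continue  # question is shorter than n words, cannot be judged
        overlap = len(grams & train_grams) / len(grams)
        if overlap >= threshold:
            flagged.append((question, overlap))
    return flagged
```

Flagged questions are candidates for manual review rather than automatic removal, since shared boilerplate phrasing can also produce overlap.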
R2. Base Model Comparison Study
Systematically compare 3-5 candidate base models across all adapter domains. For each base model: train all adapters, run all benchmarks, measure quality, speed, and memory. Produce an evidence-based recommendation for the next academic year’s base model.
Why it matters: The base model decision is the highest-impact choice in the project. Currently it is made by reviewing published benchmarks and community consensus. This project would generate first-party evidence under LocoLLM’s specific constraints (4-bit quantisation, consumer hardware, LoRA-trained adapters).
Deliverables: Comparison report covering quality (benchmark scores per domain), speed (tokens/second), memory (peak RAM), and adapter training responsiveness (how much does each base model improve with LoRA?). Recommendation with supporting data.
Discipline fit: Data Analysis, Research Methods, Information Systems
Skills: Systematic evaluation, data analysis, technical writing, basic Python
R3. User Experience Study
How do students actually use LocoLLM? Observe students using the system for real tasks. Document usage patterns, pain points, feature requests, and whether the “conversation not delegation” philosophy holds in practice.
Why it matters: The project makes assumptions about how students will use AI tools. This project tests those assumptions with real users. The findings inform both the technical roadmap and the pedagogical framing.
Deliverables: Study design (ethics approval if required), observation protocol, user interviews or surveys, findings report, actionable recommendations for the project.
Discipline fit: Marketing (consumer behaviour), Management (organisational behaviour), Information Systems (technology adoption)
Skills: Research design, qualitative analysis, interviewing, report writing
R4. Cost-Benefit Analysis: Local vs Cloud AI
Quantify the total cost of ownership for LocoLLM vs cloud AI alternatives across a semester. Include hardware, electricity, setup time, maintenance, and the value of capabilities that are hard to price (privacy, no rate limits, offline access).
Why it matters: “It’s free” is an oversimplification. Students’ time has value. Hardware has cost. This project produces an honest economic analysis that either supports or challenges the local-first thesis.
Deliverables: Cost model (spreadsheet or tool), sensitivity analysis (what if electricity costs X? what if free tiers improve?), comparison across student personas (budget-constrained, time-constrained, privacy-conscious), written report.
Discipline fit: Business Analytics, Economics, Information Systems, Management
Skills: Financial modelling, cost analysis, research methodology, clear writing
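For R4, the core of a cost model and a one-parameter sensitivity analysis might be sketched as follows. Every parameter name here is an assumption, and any numbers passed in are placeholders for the team to replace with measured values. The sketch covers only the directly priceable items (hardware, electricity, setup time); maintenance and the hard-to-price capabilities are left out.

```python
def semester_cost_local(
    hardware_price,
    hardware_life_semesters,
    watts,
    hours,
    price_per_kwh,
    setup_hours,
    hourly_value,
):
    """Estimate one semester's cost of running a model locally."""
    amortised_hardware = hardware_price / hardware_life_semesters
    electricity = watts / 1000 * hours * price_per_kwh  # kW * h * price per kWh
    time_cost = setup_hours * hourly_value  # the student's time has value
    return amortised_hardware + electricity + time_cost


def semester_cost_cloud(monthly_fee, months):
    """Estimate one semester's cost of a paid cloud subscription."""
    return monthly_fee * months


def sensitivity(cost_fn, base_params, param, values):
    """Recompute the cost while varying one parameter and holding the rest fixed."""
    return [(v, cost_fn(**{**base_params, param: v})) for v in values]
```

Running `sensitivity` over electricity price, setup hours, or subscription fee for each student persona would show which assumptions the local-vs-cloud conclusion actually depends on.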
Documentation & Communication
Projects that improve how the project communicates with its audiences.
D1. Onboarding Experience
Redesign the getting-started experience for new users and contributors. Current docs are thorough but assume technical comfort. Create a guided pathway that takes someone from “what is this?” to “I have a working adapter” with minimal friction.
Why it matters: The best system in the world is useless if people cannot get started with it. The onboarding experience determines whether a new student contributes or gives up in the first hour.
Deliverables: Revised getting-started guide, quick-start tutorial (15 minutes to first result), troubleshooting FAQ, user testing with 3-5 new users, before/after comparison of setup success rate.
Discipline fit: Marketing (communications), Information Systems, UX Design
Skills: Clear writing, empathy for beginners, user testing, basic familiarity with git/CLI
D2. Project Website & Marketing
Design and build a public-facing project website that communicates what LocoLLM is, why it matters, and how to get involved. A landing page already exists but could be more compelling. Consider the audiences: prospective students, academic colleagues, open-source contributors.
Why it matters: The project needs to attract contributors and communicate its value to stakeholders (faculty, potential collaborators, funding bodies). The website is the first impression.
Deliverables: Redesigned website (static site), clear messaging for each audience, visual design, analytics setup, content strategy document.
Discipline fit: Marketing, Communications, Web Design, Information Systems
Skills: Web design (static site generators), copywriting, visual design, audience analysis, basic analytics
D3. Video Tutorials & Demos
Create a series of short video tutorials (3-5 minutes each) covering key workflows: setup, training an adapter, running evaluation, using chat, contributing a PR.
Why it matters: Not everyone learns from documentation. Video lowers the barrier for visual learners and creates shareable content that can be used in lectures, social media, and conference presentations.
Deliverables: 4-6 short videos, scripts, screen recordings with narration, published to YouTube or equivalent, linked from project docs.
Discipline fit: Marketing (content creation), Communications, Education
Skills: Screencasting, clear narration, basic video editing, understanding of the workflows being demonstrated
A Note on Skills
Every project lists relevant skills, but here is the honest truth: most of these skills are learnable during the project. Nobody arrives at a capstone knowing everything they need. The projects are designed so that the learning is the work.
What matters more than any specific skill:
- Curiosity — willingness to figure out how things work, not just follow instructions
- Persistence — things will break, results will be unexpected, the first attempt will not work
- Communication — the ability to explain what you did and what you found, clearly and honestly
- Collaboration — these are team projects; showing up and contributing consistently matters more than brilliance
If a project interests you but you are not sure you have the skills, that is probably the right project. The one that stretches you is the one you will learn the most from.
How Projects Connect
These projects are not isolated. They feed into each other:
New adapters (A1-A4)
    |
    +--> need benchmarks (R1) to prove they work
    +--> appear on the leaderboard (I3)
    +--> benefit from tool use (I4)
    |
Router upgrade (I1)
    |
    +--> uses benchmark data as training labels
    +--> tested by the UX study (R3)
    |
Web interface (I2)
    |
    +--> makes UX study (R3) possible at scale
    +--> benefits from onboarding work (D1)
    |
Rebuild pipeline (I5)
    |
    +--> uses base model comparison (R2) to decide what to rebuild against
    +--> retrains all adapters (A1-A4) automatically

A team working on any one project benefits from and contributes to the others. This is by design: it mirrors how real software projects work.