10 Hands‑On Projects to Explore the Raspberry Pi 5 AI HAT+ 2
10 practical Raspberry Pi 5 AI HAT+ 2 projects to test speech, LLMs, vision, and micro‑apps—actionable steps and starter code for developers (2026).
Ship demos faster: test the Raspberry Pi 5 AI HAT+ 2 with tiny, real projects
If you’re a developer or maker who needs reliable sample builds to learn the new AI HAT+ 2, this article gives you 10 focused, hands‑on projects that prove concepts fast. You’ll avoid the usual “big research project” trap and get working demos that test speech, on‑device LLMs, local transcription, vision, micro‑apps, and privacy‑first assistants. Each entry includes components, difficulty, a step‑by‑step plan, and code or command snippets you can copy and adapt.
Why these projects matter in 2026
Late‑2025 and early‑2026 brought two important shifts for edge developers: affordable NPUs and better on‑device model runtimes, and the rise of “micro apps” — short‑lived, single‑purpose applications people create for personal productivity. The Raspberry Pi 5 paired with the AI HAT+ 2 (released in late 2025) gives hobbyists and teams a small, affordable platform to run quantized GGUF/ggml LLMs, offline speech recognition, and multimodal inference at the edge. These 10 projects were chosen to help you validate practical features quickly and build a portfolio of demos for product decisions.
Preparation: base setup and tools
- Hardware: Raspberry Pi 5, AI HAT+ 2, USB microphone or HAT mic array (if not onboard), SSD or fast SD card for models, optional Pi camera.
- OS: Raspberry Pi OS (64‑bit) or Ubuntu 24.04/26.04 arm64 builds optimized for Pi 5.
- Drivers & firmware: install the official AI HAT+ 2 drivers and runtime from the vendor (often a Debian package or apt repo).
- Runtimes: llama.cpp / GGML / GGUF runtimes for local LLMs, whisper.cpp or faster‑whisper for transcription, ONNX Runtime or Torch‑script/TVM for vision models.
- Model formats: use quantized GGUF/ggml models (4- or 8-bit) to fit in device memory and accelerate inference.
How to read each project
For each project below you’ll find: goal, components, difficulty, a concise implementation plan, minimal code or commands, and extension ideas to take the demo further.
1. Chatbot kiosk — wall‑mounted assistant for demos
Goal
Build a local, privacy‑first chatbot kiosk with voice I/O and a small touchscreen. Ideal for museum demos, retail, or a home QA station.
Components
- Pi 5 + AI HAT+ 2, 7” touchscreen, USB mic or mic array, speaker
- Quantized local LLM GGUF model (small/medium, e.g., 3B quantized)
- llama.cpp (or similar) and a simple web frontend (Flask/React)
Difficulty
Intermediate — systems integration and UX polish.
Quick steps
- Install runtime and download a quantized GGUF model to /opt/models.
- Run a local LLM server using llama.cpp:
./llama-server -m /opt/models/gguf-model.gguf --host 127.0.0.1 --port 8000
- Create a minimal Flask app to forward messages to the server and display responses on the touchscreen.
- Hook up hotword detection (Porcupine) to wake the kiosk and start recording (a minimal wake-word sketch follows the proxy code below).
# minimal flask proxy (app.py)
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)
LLM_SERVER = 'http://127.0.0.1:8000'

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    # forward the prompt to the local llama.cpp server (its /completion
    # endpoint returns JSON with a 'content' field) and relay the reply
    rsp = requests.post(LLM_SERVER + '/completion',
                        json={'prompt': data['prompt'], 'n_predict': 256})
    return jsonify(rsp.json())

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
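For the hotword step above, here is a minimal wake-word sketch using the pvporcupine and pvrecorder Python packages; the access key is a placeholder you obtain from Picovoice, and the built-in "porcupine" keyword stands in for your own wake word.
# wake_word.py: minimal hotword sketch (assumes pvporcupine/pvrecorder installed)
import pvporcupine
from pvrecorder import PvRecorder

ACCESS_KEY = 'YOUR_PICOVOICE_ACCESS_KEY'  # placeholder

porcupine = pvporcupine.create(access_key=ACCESS_KEY, keywords=['porcupine'])
recorder = PvRecorder(frame_length=porcupine.frame_length, device_index=-1)
recorder.start()
try:
    while True:
        pcm = recorder.read()             # one frame of 16 kHz, 16-bit audio
        if porcupine.process(pcm) >= 0:   # returns keyword index, -1 if none
            print('wake word detected: start recording and POST to /chat')
finally:
    recorder.stop()
    porcupine.delete()
    recorder.delete()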
Extensions
- Add TTS with Coqui TTS or EdgeTTS for smoother voice output.
- Integrate camera vision to answer questions about objects placed in front of the kiosk.
2. Offline transcription station — local speech‑to‑text for interviews
Goal
Transcribe interviews and meetings on the device without cloud uploads — great for privacy‑sensitive workflows.
Components
- AI HAT+ 2, good USB/XLR mic, whisper.cpp/faster‑whisper, optional noise‑reduction model (ONNX)
Difficulty
Easy to intermediate.
Quick steps
- Install whisper.cpp and build with ARM optimizations.
- Download a small quantized whisper model or community tiny model.
- Record audio and run offline transcription:
./main -m tiny.en.gguf -f interview.wav -otxt
- Postprocess: punctuation, speaker diarization via pyannote (if the CPU budget allows).
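If you prefer to drive this from Python rather than the CLI, here is a minimal sketch using faster-whisper (listed in the components); the model size, compute type, and file names are assumptions to adapt for your Pi's memory budget. It prints SRT-style blocks, which also feeds the export extension below.
# transcribe.py: sketch with faster-whisper; tune model size and compute type
from faster_whisper import WhisperModel

def fmt(t):
    # seconds -> SRT timestamp (HH:MM:SS,mmm)
    h, rem = divmod(t, 3600)
    m, s = divmod(rem, 60)
    return f'{int(h):02}:{int(m):02}:{s:06.3f}'.replace('.', ',')

model = WhisperModel('tiny.en', device='cpu', compute_type='int8')
segments, info = model.transcribe('interview.wav')
for i, seg in enumerate(segments, 1):
    print(f'{i}\n{fmt(seg.start)} --> {fmt(seg.end)}\n{seg.text.strip()}\n')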
Extensions
- Bundle a simple UI that lets you tag and export timestamps as SRT or JSON.
- Run on‑device summarization with a small LLM after transcription.
3. Personal assistant agent — calendars, local tools, and shortcuts
Goal
Run a local agent that automates tasks like file search, calendar lookup, or executing scripts — ideal for power users.
Components
- Pi 5 + AI HAT+ 2, local LLM, simple agent code (Python), OAuth tokens stored locally
Difficulty
Intermediate to advanced (security considerations).
Quick steps
- Run an LLM server and build an agent loop that accepts commands, plans tasks, and executes allowed actions.
- Define safe action primitives (list files, open URL, run backup). Use an allowlist for CLI commands.
- Implement a prompt template for planning and verification before executing any action (a sketch of that loop follows the runner code below).
# safe_runner.py — sketch
import subprocess

# allowlist of named actions mapped to the exact commands they may run
ALLOWED = {'list_dir': 'ls -la', 'backup': '/usr/local/bin/backup.sh'}

def run(action):
    if action in ALLOWED:
        return subprocess.check_output(ALLOWED[action].split())
    return b'not allowed'
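To close the loop, here is a sketch of the plan-then-verify step around the runner above; the /completion endpoint and port assume a llama.cpp-style server like the one from project 1, and the console confirmation stands in for whatever verification UI you build.
# agent_loop.py: plan with the LLM, verify, then call safe_runner
import requests
from safe_runner import ALLOWED, run

PLAN_TEMPLATE = (
    'You are a local assistant. The only allowed actions are: {actions}. '
    'User request: {request}\n'
    'Reply with exactly one action name, or none if nothing fits.'
)

def handle(user_request):
    prompt = PLAN_TEMPLATE.format(actions=', '.join(ALLOWED), request=user_request)
    rsp = requests.post('http://127.0.0.1:8000/completion',
                        json={'prompt': prompt, 'n_predict': 16})
    action = rsp.json().get('content', '').strip()
    if action not in ALLOWED:
        return b'no safe action found'
    # verification step: require explicit confirmation before anything runs
    if input(f"Run '{action}'? [y/N] ").lower() != 'y':
        return b'cancelled'
    return run(action)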
Security tips
- Keep tokens local and encrypted using system keystore.
- Require a PIN or physical button press for sensitive actions.
4. Micro‑apps host — run tiny personalised apps locally
Goal
Host and serve “micro apps” like a mood tracker, quick recipes, or one‑off utilities — aligned with the 2026 micro‑apps movement where people build ephemeral, highly personal apps.
“A new era of app creation: micro apps let people quickly build single‑purpose tools.” — 2025 trend analysis
Components
- Pi 5 + AI HAT+ 2, Docker or Podman for containerised micro apps, lightweight reverse proxy (Caddy/Nginx)
Difficulty
Easy to intermediate.
Quick steps
- Install Podman/Docker and Caddy for automatic HTTPS on your LAN.
- Define a micro app template: single Flask/Node service with a model call or simple JS logic (a Python template sketch follows the Dockerfile below).
- Provide a web UI and an API to list installed micro apps and launch them locally.
# Dockerfile example (micro app)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . .
CMD ["python","app.py"]
Extensions
- Implement an in‑browser editor and one‑click deploy for new micro apps — pair that with proven cloud patterns for simple CI and delivery.
- Support model sandboxing to limit memory and CPU per app.
5. Edge AI vision + chat — ask about what the camera sees
Goal
Combine a vision model with an LLM: take a picture and ask questions (object counts, labels, or simple scene descriptions).
Components
- Pi Camera, OpenCV, ONNX object detection model (YOLOv8/Edge), small LLM for Q&A
Difficulty
Intermediate.
Quick steps
- Run a lightweight object detector via ONNX Runtime with CPU/NPU acceleration.
- Extract detected labels, bounding boxes, and counts.
- Format a prompt to the LLM describing the scene and ask follow‑ups locally.
# pseudo: build prompt for LLM
labels = ['chair','person','table']
prompt = f"I detected: {', '.join(labels)}. Answer: How many people?"
Extensions
- Add person re‑identification or simple behavior rules for home automation.
6. Privacy‑first smart speaker — local wake‑word + offline TTS/STT
Goal
Build a smart speaker that never sends audio to the cloud: local wake detection, offline STT, local intent handling, and on‑device TTS.
Components
- Porcupine or similar wake‑word, whisper.cpp, Coqui TTS or small quantized TTS, audio manager
Difficulty
Intermediate to advanced (audio pipeline complexity).
Quick steps
- Set up wake‑word engine and continuous audio loop.
- On wake, record short clip, run whisper.cpp, parse the transcription, and call intent handlers.
- Respond using on‑device TTS. Consider audio ducking and priority sounds.
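Here is a minimal sketch of the transcribe, parse, and respond part of that loop, using the whisper.cpp CLI via subprocess, keyword-matched intents, and espeak-ng as a stand-in for TTS; the binary paths, model file, and flags are assumptions to match your own build.
# intents.py: transcribe a clip, match a simple intent, speak a reply
import subprocess

INTENTS = {
    'time': ['what time', 'current time'],
    'lights_off': ['lights off', 'turn off the lights'],
}

def transcribe(wav_path):
    # -nt suppresses timestamps so stdout is just the recognized text
    out = subprocess.check_output(
        ['/opt/whisper/main', '-m', '/opt/models/tiny.gguf', '-f', wav_path, '-nt'])
    return out.decode().strip().lower()

def match_intent(text):
    for intent, phrases in INTENTS.items():
        if any(p in text for p in phrases):
            return intent
    return None

def speak(text):
    subprocess.run(['espeak-ng', text])   # simple offline TTS stand-in

text = transcribe('clip.wav')
intent = match_intent(text)
speak(f'Okay, handling {intent}' if intent else "Sorry, I didn't catch that")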
Extensions
- Federated learning: collect anonymized intent counts and only send metadata for model improvement — pair this with secure data workflows and team processes from operational playbooks like operationalizing secure collaboration.
7. Local LLM playground — benchmark quantized models
Goal
Compare memory, latency, and quality for different quantized GGUF models and runtimes on the AI HAT+ 2.
Components
- Multiple GGUF models, llama.cpp, bench scripts, monitoring (htop, perf)
Difficulty
Easy to intermediate.
Quick steps
- Download models (1B, 3B, 7B quantized variants).
- Run repeatable prompts and measure time to token, memory usage, and output quality.
- Document tradeoffs — use this to choose a model size for production edge services.
Commands
# bench script (bash)
for m in /opt/models/*.gguf; do
echo "Testing $m"
/opt/llama/main -m "$m" -p "Explain recursion in 50 words" -n 128 --timings
done
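The loop above relies on llama.cpp's own timing output; if you also want memory numbers, here is a small Python harness sketch that records wall time and the peak RSS of child processes. The paths and prompt mirror the bash loop and are assumptions.
# bench.py: wall time plus child peak RSS per model
import glob, resource, subprocess, time

for model in sorted(glob.glob('/opt/models/*.gguf')):
    start = time.time()
    subprocess.run(['/opt/llama/main', '-m', model,
                    '-p', 'Explain recursion in 50 words', '-n', '128'],
                   stdout=subprocess.DEVNULL)
    elapsed = time.time() - start
    # note: ru_maxrss is the max over all children so far, so run models
    # smallest-first or in separate processes for strict per-model numbers
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    print(f'{model}: {elapsed:.1f}s, peak RSS ~{peak_kb // 1024} MB')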
8. Classroom AI tutor — small K‑12 lesson helper
Goal
Provide local exam practice, short quizzes, or code exercises in classrooms with limited internet access.
Components
- Raspberry Pi 5, AI HAT+ 2, LLM model tuned or prompt‑engineered, web UI for students
Difficulty
Easy to intermediate (pedagogical design required).
Quick steps
- Set up a simple web app that serves exercises and collects answers.
- Use prompt templates for question generation and grading heuristics.
- Run everything locally on the Pi and allow teacher overrides.
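The prompt-template step is the core of this project; here is a sketch of the two templates, with the topic, grade level, and rubric as placeholders to adapt.
# tutor_prompts.py: question-generation and grading templates
QUESTION_TEMPLATE = (
    'Write one short {difficulty} question about {topic} for a {grade} student. '
    "Then write the expected answer on a new line prefixed with 'ANSWER:'."
)

GRADING_TEMPLATE = (
    'Question: {question}\nExpected answer: {expected}\nStudent answer: {student}\n'
    "Reply with 'correct' or 'incorrect' plus one sentence of feedback."
)

prompt = QUESTION_TEMPLATE.format(difficulty='easy', topic='fractions', grade='grade 5')
# send `prompt` to the local LLM, parse out the ANSWER: line, and reuse
# GRADING_TEMPLATE when a student submits a response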
Extensions
- Support multi‑seat labs using local network discovery and load balancing across multiple Pi units.
9. Smart home voice rules — conditional automation without cloud
Goal
Use natural language rules to control devices: “If no movement after 10 PM, set thermostat to eco.”
Components
- Home automation hub (Home Assistant), Pi with AI HAT+ 2 for NLP parsing, MQTT or local API hooks
Difficulty
Intermediate.
Quick steps
- Capture voice commands and transcribe locally.
- Use LLM to parse action and convert to structured rule (JSON) for Home Assistant.
- Store rules locally and add an audit UI for reviewing and approving rules before activation.
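Here is a sketch of the parse-and-validate step; the JSON shape is a deliberately simple assumption, not Home Assistant's native automation schema, so translate the validated rule into your hub's format before activation.
# rule_parser.py: ask the LLM for a structured rule and validate it
# before it ever reaches Home Assistant
import json
import requests

RULE_PROMPT = (
    "Convert this request into JSON with keys 'trigger', 'condition', 'action'. "
    'Reply with JSON only.\nRequest: {text}'
)

def parse_rule(text):
    rsp = requests.post('http://127.0.0.1:8000/completion',
                        json={'prompt': RULE_PROMPT.format(text=text), 'n_predict': 128})
    raw = rsp.json().get('content', '')
    try:
        rule = json.loads(raw)
    except ValueError:
        return None
    if not isinstance(rule, dict) or not {'trigger', 'condition', 'action'} <= rule.keys():
        return None   # reject anything that doesn't match the expected shape
    return rule       # hold for review in the audit UI before activation

print(parse_rule('If no movement after 10 PM, set thermostat to eco.'))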
10. Mini MLOps flow — CI for edge models and OTA updates
Goal
Establish a small, repeatable pipeline for deploying updated quantized models to your fleet of Pi 5 devices running AI HAT+ 2.
Components
- Git repo for model config, build scripts for quantization, an artifact server (S3/local), and a lightweight updater on the Pi
Difficulty
Advanced but high value for production projects.
Quick steps
- Automate quantization in CI: input model -> quantize -> test accuracy/latency -> publish artifact.
- Pi runs a signed update check and downloads new models only when passing checksum and signature verification.
- Provide rollback and staging channels for safe rollout — use established cloud patterns for safe distribution and rollout.
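On the device side, here is a sketch of the verification step before a downloaded model is swapped in; it only shows the SHA-256 check and assumes signature verification (for example with minisign or a vendor-provided key) happens alongside it.
# updater.py: verify a checksum before installing a new model
import hashlib
import shutil

def sha256(path):
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

def install(candidate, expected_sha, target='/opt/models/current.gguf'):
    if sha256(candidate) != expected_sha:
        raise RuntimeError('checksum mismatch, refusing to install')
    # in production, write to a temp path and rename, keeping the old model for rollback
    shutil.copy2(candidate, target)

install('/tmp/new-model.gguf', 'EXPECTED_SHA256_FROM_THE_ARTIFACT_SERVER')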
Security considerations
- Sign model artifacts and use TLS for transfer. Keep private keys off device.
- Limit who can trigger deployments and record everything in an audit log.
Practical tips for success
- Start small: one model and one I/O modality — e.g., speech only — then expand.
- Quantize aggressively: 4/8‑bit GGUF models dramatically reduce memory and make on‑device LLMs practical.
- Monitor resource usage: use top, htop, and runtime logging; the NPU offloads inference, but watch for fallback to the CPU.
- Prioritize privacy: store audio and tokens locally; encrypt sensitive data at rest.
- Design UX for latency: prefetch and cache prompts or partial outputs to mask cold‑start delays.
- Use hybrid architectures: fall back to cloud only when necessary (and explicit), keeping day‑to‑day inference local.
2026 trends & future directions
In 2026 you'll see more off‑the‑shelf quantization toolchains and broader support for GGUF/ggml formats across runtimes. Expect the HAT ecosystem to standardize APIs for NPU offload, making porting models easier. The micro‑apps movement — people building quick personal apps — will continue to grow: edge devices running small private LLMs will power a wave of personal automation and domain‑specific assistants. Finally, federated and privacy‑preserving training will be viable for hobbyist fleets, enabling continuous improvement without centralizing raw data.
Actionable starter checklist (15–30 minutes)
- Flash a 64‑bit OS and apply vendor HAT drivers.
- Install llama.cpp and whisper.cpp (or your preferred runtimes).
- Download a small quantized LLM and a tiny whisper model into /opt/models.
- Run a simple prompt to confirm inference works:
./llama/main -m /opt/models/small.gguf -p "Hello"
- Wire a microphone and test local transcription:
./whisper/main -m /opt/models/tiny.gguf -f test.wav
Final takeaways
The Raspberry Pi 5 + AI HAT+ 2 combination turns a hobby board into a capable edge AI workstation. The projects above are intentionally practical: you’ll get working demos in hours (not months) and iterate toward production patterns like OTA model updates and local privacy controls. Focus on one modality, pick a quantized model, and ship a demo. That concrete experience is the best way to evaluate on‑device tradeoffs and inform architecture decisions.
Call to action
Pick one project above and get started today — then share your build and metrics. If you want a tailored starter repo (scripts, Dockerfiles, and prompt templates) for any of these demos, request a kit in the comments or download our free Pi 5 AI HAT+ 2 starter bundle. Ship a demo, learn the limits, and iterate — edge AI is practical now, and these mini projects are the fastest route from concept to working proof.
Related Reading
- Evolving Edge Hosting in 2026: Advanced Strategies for Portable Cloud Platforms and Developer Experience
- Pop‑Up to Persistent: Cloud Patterns, On‑Demand Printing and Seller Workflows for 2026
- Operationalizing Secure Collaboration and Data Workflows in 2026
- The Creator Synopsis Playbook 2026: AI Orchestration, Micro-Formats, and Distribution Signals