10 Hands‑On Projects to Explore the Raspberry Pi 5 AI HAT+ 2

codeguru
2026-01-26

10 practical Raspberry Pi 5 AI HAT+ 2 projects to test speech, LLMs, vision, and micro‑apps—actionable steps and starter code for developers (2026).

Ship demos faster: test the Raspberry Pi 5 AI HAT+ 2 with tiny, real projects

If you’re a developer or maker who needs reliable sample builds to learn the new AI HAT+ 2, this article gives you 10 focused, hands‑on projects that prove concepts fast. You’ll avoid the usual “big research project” trap and get working demos that test speech, on‑device LLMs, local transcription, vision, micro‑apps, and privacy‑first assistants. Each entry includes components, difficulty, a step‑by‑step plan, and code or command snippets you can copy and adapt.

Why these projects matter in 2026

Late‑2025 and early‑2026 brought two important shifts for edge developers: affordable NPUs and better on‑device model runtimes, and the rise of “micro apps” — short‑lived, single‑purpose applications people create for personal productivity. The Raspberry Pi 5 paired with the AI HAT+ 2 (released in late 2025) gives hobbyists and teams a small, affordable platform to run quantized GGUF/ggml LLMs, offline speech recognition, and multimodal inference at the edge. These 10 projects were chosen to help you validate practical features quickly and build a portfolio of demos for product decisions.

Preparation: base setup and tools

  • Hardware: Raspberry Pi 5, AI HAT+ 2, USB microphone or HAT mic array (if not onboard), SSD or fast SD card for models, optional Pi camera.
  • OS: Raspberry Pi OS (64‑bit) or Ubuntu 24.04/26.04 arm64 builds optimized for Pi 5.
  • Drivers & firmware: install the official AI HAT+ 2 drivers and runtime from the vendor (often a Debian package or apt repo).
  • Runtimes: llama.cpp / GGML / GGUF runtimes for local LLMs, whisper.cpp or faster‑whisper for transcription, ONNX Runtime or Torch‑script/TVM for vision models.
  • Model formats: use quantized GGUF/ggml models (8/4bit) to fit on device memory and accelerate inference.
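
Once the runtime and a model are in place, run a quick sanity check before starting any project. The sketch below uses the llama-cpp-python bindings (pip install llama-cpp-python); the model path is a placeholder for whichever quantized GGUF file you downloaded.

# sanity_check.py: hedged sketch to confirm a quantized GGUF model loads and generates
from llama_cpp import Llama

llm = Llama(model_path="/opt/models/small.gguf", n_ctx=2048, verbose=False)  # placeholder path
out = llm("Say hello in five words.", max_tokens=32)
print(out["choices"][0]["text"])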

How to read each project

For each project below you’ll find: goal, components, difficulty, a concise implementation plan, minimal code or commands, and extension ideas to take the demo further.

1. Chatbot kiosk — wall‑mounted assistant for demos

Goal

Build a local, privacy‑first chatbot kiosk with voice I/O and a small touchscreen. Ideal for museum demos, retail, or a home QA station.

Components

  • Pi 5 + AI HAT+ 2, 7” touchscreen, USB mic or mic array, speaker
  • Quantized local LLM in GGUF format (small or medium, e.g., a 3B model)
  • llama.cpp (or similar) and a simple web frontend (Flask/React)

Difficulty

Intermediate — systems integration and UX polish.

Quick steps

  1. Install the runtime and download a quantized GGUF model to /opt/models.
  2. Start a local LLM server, e.g. with llama.cpp's server binary: ./llama-server -m /opt/models/gguf-model.gguf --host 127.0.0.1 --port 8000 (exact binary name and flags vary by version; check --help).
  3. Create a minimal Flask app to forward messages to the server and display responses on the touchscreen.
  4. Hook up hotword detection (Porcupine) to wake the kiosk and start recording; a minimal wake-word loop is sketched after the proxy code below.
# minimal flask proxy (app.py)
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

# Local LLM server started in step 2; adjust the port and the endpoint path/payload
# to match your runtime's API (llama.cpp's server exposes /completion, for example).
LLM_SERVER = 'http://127.0.0.1:8000'

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json  # expects {"prompt": "..."} from the touchscreen frontend
    rsp = requests.post(LLM_SERVER + '/generate', json={'prompt': data['prompt']})
    return jsonify(rsp.json())

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
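
For step 4, the sketch below shows a minimal wake-word loop with Porcupine. It assumes pvporcupine and pyaudio are installed and that you have a Picovoice access key; the built-in "porcupine" keyword and the record-and-send hand-off are placeholders for your own wake word and recording code.

# wake_word.py: hedged sketch that wakes the kiosk on a hotword
import struct
import pvporcupine
import pyaudio

porcupine = pvporcupine.create(
    access_key="YOUR_PICOVOICE_ACCESS_KEY",   # placeholder
    keywords=["porcupine"],                   # swap in a custom wake word
)

pa = pyaudio.PyAudio()
stream = pa.open(rate=porcupine.sample_rate, channels=1, format=pyaudio.paInt16,
                 input=True, frames_per_buffer=porcupine.frame_length)

try:
    while True:
        pcm = stream.read(porcupine.frame_length, exception_on_overflow=False)
        pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)
        if porcupine.process(pcm) >= 0:
            print("Wake word detected: start recording and POST the transcription to /chat")
            # record_and_send() would capture audio, transcribe it, and call the Flask proxy
finally:
    stream.close()
    pa.terminate()
    porcupine.delete()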

Extensions

  • Add TTS with Coqui TTS or EdgeTTS for smoother voice output.
  • Integrate camera vision to answer questions about objects placed in front of the kiosk.

2. Offline transcription station — local speech‑to‑text for interviews

Goal

Transcribe interviews and meetings on the device without cloud uploads — great for privacy‑sensitive workflows.

Components

  • AI HAT+ 2, good USB/XLR mic, whisper.cpp/faster‑whisper, optional noise‑reduction model (ONNX)

Difficulty

Easy to intermediate.

Quick steps

  1. Install whisper.cpp and build it with ARM optimizations.
  2. Download a small whisper model (e.g., ggml-tiny.en.bin) or a quantized variant.
  3. Record audio and run offline transcription: ./main -m ggml-tiny.en.bin -f interview.wav -otxt
  4. Postprocess: punctuation, plus speaker diarization via pyannote if the CPU budget allows.

Extensions

  • Bundle a simple UI that lets you tag and export timestamps as SRT or JSON.
  • Run on‑device summarization with a small LLM after transcription.
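
For the SRT export extension, the sketch below writes standard SRT from (start, end, text) segments. The segment tuples are an assumed intermediate format; adapt the parsing of your transcriber's output (whisper.cpp can also emit timestamps directly) to produce them.

# srt_export.py: hedged sketch that converts (start, end, text) segments to SRT
def to_timestamp(seconds):
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_srt(segments, path):
    """segments: iterable of (start_seconds, end_seconds, text)."""
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, text) in enumerate(segments, 1):
            f.write(f"{i}\n{to_timestamp(start)} --> {to_timestamp(end)}\n{text.strip()}\n\n")

# example usage with dummy segments
write_srt([(0.0, 2.5, "Hello and welcome."), (2.5, 6.0, "Let's begin the interview.")], "interview.srt")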

3. Personal assistant agent — calendars, local tools, and shortcuts

Goal

Run a local agent that automates tasks like file search, calendar lookup, or executing scripts — ideal for power users.

Components

  • Pi 5 + AI HAT+ 2, local LLM, simple agent code (Python), OAuth tokens stored locally

Difficulty

Intermediate to advanced (security considerations).

Quick steps

  1. Run an LLM server and build an agent loop that accepts commands, plans tasks, and executes allowed actions.
  2. Define safe action primitives (list files, open URL, run backup). Use an allowlist for CLI commands.
  3. Implement a prompt template for planning and verification before executing any action; an agent-loop sketch follows the runner code below.
# safe_runner.py: execute only allow-listed actions
import subprocess

# Map agent-visible action names to fixed commands; never pass raw LLM output to a shell.
ALLOWED = {'list_dir': 'ls -la', 'backup': '/usr/local/bin/backup.sh'}

def run(action):
    if action in ALLOWED:
        return subprocess.check_output(ALLOWED[action].split())
    return b'not allowed'
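
The agent loop around run() can stay small. The sketch below asks the local LLM to pick exactly one allow-listed action as JSON, validates it, and requires a confirmation before executing; the /generate endpoint and the response shape are assumptions to match to your LLM server.

# agent_loop.py: hedged sketch of plan -> verify -> execute
import json
import requests
from safe_runner import run, ALLOWED

LLM_SERVER = "http://127.0.0.1:8000"   # assumed local LLM endpoint

PLAN_PROMPT = (
    "You can perform exactly one of these actions: {actions}.\n"
    "User request: {request}\n"
    'Reply with JSON only, e.g. {{"action": "list_dir"}}.'
)

def handle(user_request):
    prompt = PLAN_PROMPT.format(actions=", ".join(ALLOWED), request=user_request)
    rsp = requests.post(LLM_SERVER + "/generate", json={"prompt": prompt})
    try:
        action = json.loads(rsp.json()["text"])["action"]   # response shape is an assumption
    except (KeyError, ValueError):
        return b"could not parse a plan"
    if action not in ALLOWED:
        return b"not allowed"
    if input(f"Run '{action}'? [y/N] ").lower() != "y":     # swap for a PIN or physical button
        return b"cancelled"
    return run(action)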

Security tips

  • Keep tokens local and encrypted using system keystore.
  • Require a PIN or physical button press for sensitive actions.

4. Micro‑apps host — run tiny personalised apps locally

Goal

Host and serve “micro apps” like a mood tracker, quick recipes, or one‑off utilities — aligned with the 2026 micro‑apps movement where people build ephemeral, highly personal apps.

“A new era of app creation: micro apps let people quickly build single‑purpose tools.” — 2025 trend analysis

Components

  • Pi 5 + AI HAT+ 2, Podman or Docker, Caddy for local HTTPS, Flask/Node micro‑app templates

Difficulty

Easy to intermediate.

Quick steps

  1. Install Podman/Docker and Caddy for automatic HTTPS on your LAN.
  2. Define a micro app template: single Flask/Node service with a model call or simple JS logic.
  3. Provide a web UI and an API to list installed micro apps and launch them locally; a minimal host API is sketched after the Dockerfile below.
# Dockerfile example (micro app)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . .
CMD ["python","app.py"]
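
For step 3, the sketch below is one way to build the host API with Flask and the Docker SDK for Python (pip install docker; Podman can expose a Docker-compatible socket). The microapp=true label and the memory limit are assumptions, not a fixed convention.

# host_api.py: hedged sketch of a micro-app registry and launcher
import docker
from flask import Flask, jsonify

app = Flask(__name__)
client = docker.from_env()   # talks to the local Docker (or Podman-compatible) socket

@app.route("/apps")
def list_apps():
    # running containers labelled as micro apps (label name is an assumption)
    apps = client.containers.list(filters={"label": "microapp=true"})
    return jsonify([{"name": c.name, "status": c.status} for c in apps])

@app.route("/apps/<path:image>/launch", methods=["POST"])
def launch(image):
    c = client.containers.run(image, detach=True, labels={"microapp": "true"},
                              mem_limit="256m")   # basic per-app sandboxing
    return jsonify({"id": c.id, "name": c.name})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9000)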

Extensions

  • Implement an in‑browser editor and one‑click deploy for new micro apps, backed by a simple CI pipeline for building and delivering images.
  • Support model sandboxing to limit memory and CPU per app.

5. Edge AI vision + chat — ask about what the camera sees

Goal

Combine a vision model with an LLM: take a picture and ask questions (object counts, labels, or simple scene descriptions).

Components

  • Pi Camera, OpenCV, ONNX object detection model (YOLOv8/Edge), small LLM for Q&A

Difficulty

Intermediate.

Quick steps

  1. Run a lightweight object detector via ONNX Runtime with CPU/NPU acceleration.
  2. Extract detected labels, bounding boxes, and counts.
  3. Format a prompt to the LLM describing the scene and ask follow‑ups locally.
# build a prompt for the LLM from the detector output
from collections import Counter
labels = ['chair', 'person', 'person', 'table']   # labels returned by the detector
counts = ', '.join(f"{n} {name}" for name, n in Counter(labels).items())
prompt = f"I detected: {counts}. How many people are in the scene?"
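
Step 1 can be prototyped with ONNX Runtime before worrying about NPU offload. The sketch below loads an exported detector and runs a single frame; the model file, input size, and output layout are assumptions, so adapt the post-processing to whatever detector you export.

# detect.py: hedged sketch of running an ONNX object detector on one frame
import cv2
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("yolo.onnx", providers=["CPUExecutionProvider"])  # swap in the HAT's provider if one is supplied
input_name = sess.get_inputs()[0].name

frame = cv2.imread("frame.jpg")                                        # or grab a frame from the Pi camera
img = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)   # assumed input size
tensor = img.astype(np.float32).transpose(2, 0, 1)[None] / 255.0       # NCHW, normalized

outputs = sess.run(None, {input_name: tensor})
# Post-processing (score threshold, NMS, class lookup) depends on the exported model;
# the end result should be the label list fed into the prompt snippet above.
print("raw output shapes:", [o.shape for o in outputs])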

Extensions

  • Add person re‑identification or simple behavior rules for home automation.

6. Privacy‑first smart speaker — local wake‑word + offline TTS/STT

Goal

Build a smart speaker that never sends audio to the cloud: local wake detection, offline STT, local intent handling, and on‑device TTS.

Components

  • Porcupine or similar wake‑word, whisper.cpp, Coqui TTS or small quantized TTS, audio manager

Difficulty

Intermediate to advanced (audio pipeline complexity).

Quick steps

  1. Set up wake‑word engine and continuous audio loop.
  2. On wake, record short clip, run whisper.cpp, parse the transcription, and call intent handlers.
  3. Respond using on‑device TTS. Consider audio ducking and priority sounds.
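
For step 2, intent handling can start as a tiny keyword dispatcher before you reach for an LLM or a grammar. The sketch below maps a transcription to handler functions; the intents and handlers are placeholders to replace with your own automation hooks.

# intents.py: hedged sketch of a minimal local intent dispatcher
import datetime

def tell_time(_text):
    return datetime.datetime.now().strftime("It is %H:%M.")

def lights_on(_text):
    # call your local automation hook here (MQTT, GPIO, Home Assistant API, ...)
    return "Turning the lights on."

# keyword groups -> handler; first match wins
INTENTS = [
    (("time", "clock"), tell_time),
    (("light", "lamp"), lights_on),
]

def handle(transcription):
    text = transcription.lower()
    for keywords, handler in INTENTS:
        if any(k in text for k in keywords):
            return handler(text)
    return "Sorry, I did not understand that."

print(handle("what time is it"))   # feed this string from the whisper.cpp output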

Extensions

  • Federated‑style improvement: collect anonymized intent counts and send only metadata off‑device for model tuning, paired with secure data‑handling workflows on the aggregation side.

7. Local LLM playground — benchmark quantized models

Goal

Compare memory, latency, and quality for different quantized GGUF models and runtimes on the AI HAT+ 2.

Components

  • Multiple GGUF models, llama.cpp, bench scripts, monitoring (htop, perf)

Difficulty

Easy to intermediate.

Quick steps

  1. Download models (1B, 3B, 7B quantized variants).
  2. Run repeatable prompts and measure time to token, memory usage, and output quality.
  3. Document tradeoffs — use this to choose a model size for production edge services.

Commands

# bench script (bash)
for m in /opt/models/*.gguf; do
  echo "Testing $m"
  /opt/llama/main -m "$m" -p "Explain recursion in 50 words" -n 128
done
# llama.cpp prints per-run timing stats by default; llama-bench gives more structured numbers
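
If you prefer collecting numbers from Python, the sketch below uses the llama-cpp-python bindings plus psutil (pip install llama-cpp-python psutil). Treat the time-to-first-token and memory figures as approximate; they are convenient for comparing models, not rigorous profiling.

# bench.py: hedged sketch measuring latency and memory with llama-cpp-python
import glob
import time
import psutil
from llama_cpp import Llama

PROMPT = "Explain recursion in 50 words"

for path in glob.glob("/opt/models/*.gguf"):
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    proc = psutil.Process()

    start = time.perf_counter()
    first_token, tokens = None, 0
    for _chunk in llm(PROMPT, max_tokens=128, stream=True):
        if first_token is None:
            first_token = time.perf_counter() - start
        tokens += 1
    total = time.perf_counter() - start

    if tokens:
        print(f"{path}: first token {first_token:.2f}s, {tokens / total:.1f} tok/s, "
              f"RSS {proc.memory_info().rss / 1e6:.0f} MB")
    else:
        print(f"{path}: no tokens generated")
    del llm   # release the model before loading the next one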

8. Classroom AI tutor — small K‑12 lesson helper

Goal

Provide local exam practice, short quizzes, or code exercises in classrooms with limited internet access.

Components

  • Raspberry Pi 5, AI HAT+ 2, LLM model tuned or prompt‑engineered, web UI for students

Difficulty

Easy to intermediate (pedagogical design required).

Quick steps

  1. Set up a simple web app that serves exercises and collects answers.
  2. Use prompt templates for question generation and grading heuristics.
  3. Run everything locally on the Pi and allow teacher overrides.
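
For step 2, the sketch below shows example prompt templates for question generation and grading. The wording and rubric are illustrative assumptions to refine with teachers, not vetted pedagogy.

# prompts.py: hedged sketch of tutor prompt templates
QUESTION_TEMPLATE = (
    "You are a patient K-12 tutor. Write one {difficulty} multiple-choice question "
    "about {topic} for grade {grade}. Give four options labelled A-D and finish with "
    "'Answer: X' on the last line."
)

GRADING_TEMPLATE = (
    "Question: {question}\n"
    "Correct answer: {answer}\n"
    "Student answer: {student_answer}\n"
    "Say whether the student is correct, then give a two-sentence explanation a "
    "grade {grade} student can follow."
)

prompt = QUESTION_TEMPLATE.format(difficulty="easy", topic="fractions", grade=5)
# send `prompt` to the local LLM server, show the question in the web UI,
# then fill GRADING_TEMPLATE with the student's submission for feedback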

Extensions

  • Support multi‑seat labs using local network discovery and load balancing across multiple Pi units.

9. Smart home voice rules — conditional automation without cloud

Goal

Use natural language rules to control devices: “If no movement after 10 PM, set thermostat to eco.”

Components

  • Home automation hub (Home Assistant), Pi with AI HAT+ 2 for NLP parsing, MQTT or local API hooks

Difficulty

Intermediate.

Quick steps

  1. Capture voice commands and transcribe locally.
  2. Use LLM to parse action and convert to structured rule (JSON) for Home Assistant.
  3. Store rules locally and add an audit UI for reviewing and approving rules before activation.
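
A sketch of steps 2 and 3: ask the local LLM for a strictly shaped JSON rule, validate it, and publish it over MQTT for Home Assistant (or your hub) to pick up. The /generate endpoint, the rule schema, and the MQTT topic are assumptions; align them with your own broker and automation setup.

# rule_parser.py: hedged sketch of natural language -> JSON rule -> MQTT
import json
import requests
import paho.mqtt.publish as mqtt_publish

LLM_SERVER = "http://127.0.0.1:8000"   # assumed local LLM endpoint
RULE_SCHEMA = '{"trigger": "...", "condition": "...", "action": "..."}'

def parse_rule(utterance):
    prompt = (
        f"Convert this request into a JSON automation rule shaped like {RULE_SCHEMA}. "
        f"Reply with JSON only.\nRequest: {utterance}"
    )
    rsp = requests.post(LLM_SERVER + "/generate", json={"prompt": prompt})
    rule = json.loads(rsp.json()["text"])           # response shape is an assumption
    if not {"trigger", "condition", "action"} <= rule.keys():
        raise ValueError("rule is missing required fields")
    return rule

def publish_rule(rule):
    # publish to a pending-rules topic; only promote it after approval in the audit UI
    mqtt_publish.single("pi5/rules/pending", json.dumps(rule), qos=1,
                        hostname="homeassistant.local")   # your broker address

rule = parse_rule("If no movement after 10 PM, set thermostat to eco.")
publish_rule(rule)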

10. Mini MLOps flow — CI for edge models and OTA updates

Goal

Establish a small, repeatable pipeline for deploying updated quantized models to your fleet of Pi 5 devices running AI HAT+ 2.

Components

  • Fleet of Pi 5 units with AI HAT+ 2, CI runner, artifact store for quantized models, signing keys, on‑device update agent

Difficulty

Advanced but high value for production projects.

Quick steps

  1. Automate quantization in CI: input model -> quantize -> test accuracy/latency -> publish artifact.
  2. Pi runs a signed update check and downloads new models only when passing checksum and signature verification.
  3. Provide rollback and staging channels for a safe, gradual rollout.
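
For the device side of step 2, the sketch below verifies a SHA-256 checksum and an Ed25519 signature (via the cryptography package) before a model is installed. The file layout and key handling are assumptions; any signing scheme you can verify offline works.

# verify_update.py: hedged sketch of checksum + signature checks before installing a model
import hashlib
from pathlib import Path
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def verify(model_path, expected_sha, signature, pubkey_bytes):
    if sha256(model_path) != expected_sha:
        return False
    key = Ed25519PublicKey.from_public_bytes(pubkey_bytes)    # 32-byte raw public key shipped with the device
    try:
        key.verify(signature, Path(model_path).read_bytes())  # raises on a bad signature
    except InvalidSignature:
        return False
    return True

# only move the artifact into /opt/models and restart the runtime when verify() returns True;
# keep the previous model on disk so rollback is a file swap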

Security considerations

  • Sign model artifacts and use TLS for transfer. Keep private keys off device.
  • Limit who can trigger deployments and record everything in an audit log.

Practical tips for success

  • Start small: one model and one I/O modality — e.g., speech only — then expand.
  • Quantize aggressively: 4/8‑bit GGUF models dramatically reduce memory and make on‑device LLMs practical.
  • Monitor resource usage: use top, htop, and runtime logging; the NPU offloads inference, but watch for silent fallback to the CPU.
  • Prioritize privacy: store audio and tokens locally; encrypt sensitive data at rest.
  • Design UX for latency: prefetch and cache prompts or partial outputs to mask cold‑start delays.
  • Use hybrid architectures: fall back to cloud only when necessary (and explicit), keeping day‑to‑day inference local.

What to expect next

In 2026 you'll see more off‑the‑shelf quantization toolchains and broader support for GGUF/ggml formats across runtimes. Expect the HAT ecosystem to standardize APIs for NPU offload, making porting models easier. The micro‑apps movement — people building quick personal apps — will continue to grow: edge devices running small private LLMs will power a wave of personal automation and domain‑specific assistants. Finally, federated and privacy‑preserving training will be viable for hobbyist fleets, enabling continuous improvement without centralizing raw data.

Actionable starter checklist (15–30 minutes)

  1. Flash a 64‑bit OS and apply vendor HAT drivers.
  2. Install llama.cpp and whisper.cpp (or your preferred runtimes).
  3. Download a small quantized LLM and a tiny whisper model into /opt/models.
  4. Run a simple prompt to confirm inference works: ./llama/main -m /opt/models/small.gguf -p "Hello"
  5. Wire a microphone and test local transcription: ./whisper/main -m /opt/models/ggml-tiny.en.bin -f test.wav

Final takeaways

The Raspberry Pi 5 + AI HAT+ 2 combination turns a hobby board into a capable edge AI workstation. The projects above are intentionally practical: you’ll get working demos in hours (not months) and iterate toward production patterns like OTA model updates and local privacy controls. Focus on one modality, pick a quantized model, and ship a demo. That concrete experience is the best way to evaluate on‑device tradeoffs and inform architecture decisions.

Call to action

Pick one project above and get started today — then share your build and metrics. If you want a tailored starter repo (scripts, Dockerfiles, and prompt templates) for any of these demos, request a kit in the comments or download our free Pi 5 AI HAT+ 2 starter bundle. Ship a demo, learn the limits, and iterate — edge AI is practical now, and these mini projects are the fastest route from concept to working proof.

codeguru

Contributor

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.
