Show HN: Talk to your Mac offline – sub-second Voice AI (Apple Silicon and MLX)
I wanted a voice assistant that feels realtime but runs completely offline. This prototype uses MLX + FastAPI on Apple Silicon to hit sub-second latency for speech-to-speech conversations.
Repo: https://github.com/shubhdotai/offline-voice-ai
It’s fast, minimal, and hackable — would love feedback on latency tricks, model swaps, or use cases you’d like to see next.
A few details for anyone curious:

• Hardware: MacBook Pro M3 (16GB)
• STT: mlx-community/whisper-small.en-mlx-q4
• LLM: mlx-community/LFM2-1.2B-4bit
• TTS: hexgrad/Kokoro-82M
• Backend: FastAPI + WebSocket streaming (rough sketch below)
• Interruption: VAD with a configurable “quiet probability” (barge-in sketch at the end)
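To give a sense of the backend shape, here’s a minimal sketch of the WebSocket loop. This is not the repo’s actual code: transcribe/generate/synthesize are placeholder stubs standing in for the MLX Whisper, LFM2, and Kokoro calls, and the empty-frame end-of-turn signal is an invented convention for the sketch.

    from fastapi import FastAPI, WebSocket, WebSocketDisconnect

    app = FastAPI()

    # Placeholder stubs for the real MLX Whisper / LFM2 / Kokoro wrappers.
    def transcribe(pcm: bytes) -> str: ...
    def generate(prompt: str) -> str: ...
    def synthesize(text: str):            # yields PCM chunks as they're ready
        yield b""

    @app.websocket("/ws")
    async def speech_to_speech(ws: WebSocket):
        await ws.accept()
        buf = bytearray()
        try:
            while True:
                frame = await ws.receive_bytes()
                if frame:                 # client streams raw PCM frames
                    buf += frame
                    continue
                # Empty frame = end-of-turn in this sketch; in practice the
                # server-side VAD decides when the user has stopped talking.
                reply = generate(transcribe(bytes(buf)))
                for chunk in synthesize(reply):
                    await ws.send_bytes(chunk)  # stream TTS back as it's generated
                buf.clear()
        except WebSocketDisconnect:
            pass

Streaming TTS chunks back as they’re generated, rather than waiting for the full waveform, is what keeps the perceived latency low.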
Current avg latency: ~850 ms end-to-end (speech → LLM → speech).
Goal: keep it fast, clean, and under ~1K LOC so anyone can swap models or adapt it to their use case.
Feedback welcome on model choices, latency wins, and better UX for barge-in/turn-taking.
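On barge-in, here’s the gist of the “quiet probability” gate as a minimal sketch. It assumes a frame-level VAD (Silero-style) emitting P(speech) per ~30 ms frame; the class name, threshold, and window size are illustrative, not the repo’s exact API.

    from collections import deque

    class BargeInDetector:
        """Fires when the user starts talking over the assistant."""

        def __init__(self, quiet_prob: float = 0.85, window_frames: int = 10):
            self.quiet_prob = quiet_prob       # below this, the user is talking
            self.frames = deque(maxlen=window_frames)

        def update(self, p_speech: float) -> bool:
            """Feed one VAD frame; True means cut TTS playback."""
            self.frames.append(p_speech)
            if len(self.frames) < self.frames.maxlen:
                return False                   # not enough context yet
            # Average "quietness" over the recent window; a sustained drop
            # below the configured threshold counts as an interruption.
            avg_quiet = sum(1.0 - p for p in self.frames) / len(self.frames)
            return avg_quiet < self.quiet_prob

While the assistant is speaking, every mic frame goes through update(); on True, cancel the TTS stream and feed the buffered audio into STT as the user’s next turn.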