mshubham 2 days ago

A few details for anyone curious:

• MacBook Pro M3 (16GB)
• STT: mlx-community/whisper-small.en-mlx-q4
• LLM: mlx-community/LFM2-1.2B-4bit
• TTS: hexgrad/Kokoro-82M
• Backend: FastAPI + WebSocket streaming
• Interruption: VAD with configurable “quiet probability”
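For anyone wondering what "VAD with configurable quiet probability" might look like in practice, here's a minimal sketch of the idea as a per-frame state machine. This is my guess at the shape, not the project's actual code: it assumes per-frame speech probabilities from a VAD (e.g. Silero), and all the names and threshold values (`quiet_prob`, `barge_in_prob`, `quiet_frames_to_end`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurnTaker:
    """Toy endpointing/barge-in state machine driven by per-frame VAD
    speech probabilities. Thresholds are illustrative, not the author's."""
    quiet_prob: float = 0.8        # speech prob <= (1 - quiet_prob) counts as quiet
    quiet_frames_to_end: int = 15  # ~480 ms of quiet at 32 ms frames -> end of turn
    barge_in_prob: float = 0.9     # speech prob needed to interrupt TTS playback
    _quiet_run: int = 0            # consecutive quiet frames seen so far

    def feed(self, speech_prob: float, tts_playing: bool) -> str:
        """Process one audio frame; return 'barge_in', 'end_of_turn', or 'continue'."""
        # User started talking over the assistant: cut TTS immediately.
        if tts_playing and speech_prob >= self.barge_in_prob:
            self._quiet_run = 0
            return "barge_in"
        # Track a run of quiet frames; enough of them ends the user's turn.
        if speech_prob <= 1.0 - self.quiet_prob:
            self._quiet_run += 1
            if self._quiet_run >= self.quiet_frames_to_end:
                self._quiet_run = 0
                return "end_of_turn"
        else:
            self._quiet_run = 0
        return "continue"
```

The nice property of folding both decisions into one loop is that barge-in and endpointing share the same VAD stream, so there's no extra model in the hot path.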

Current avg latency: ~850 ms end-to-end (speech → LLM → speech).
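If you're chasing latency wins, a per-stage breakdown of that ~850 ms is the first thing to instrument. A minimal sketch (the `time.sleep` calls stand in for the real Whisper/LLM/Kokoro calls, and the stage names are just illustrative):

```python
import time
from contextlib import contextmanager

latencies: dict[str, float] = {}


@contextmanager
def stage(name: str):
    """Record wall-clock time for one pipeline stage, in milliseconds."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        latencies[name] = (time.perf_counter() - t0) * 1000.0


# Time each leg of the speech -> LLM -> speech pipeline.
with stage("stt"):
    time.sleep(0.01)  # stand-in for Whisper transcription
with stage("llm"):
    time.sleep(0.01)  # stand-in for time-to-first-token
with stage("tts"):
    time.sleep(0.01)  # stand-in for Kokoro synthesis of the first chunk
total_ms = sum(latencies.values())
```

Measuring time-to-first-token and time-to-first-audio-chunk (rather than full-generation time) usually matters most, since streaming hides the rest.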

Goal: keep it fast, under ~1K LOC, and clean so anyone can swap models or adapt it to their use case.

Feedback welcome on model choices, latency wins, and better UX for barge-in/turn-taking.