Show HN: Open-Source Voice AI Badge Powered by ESP32+WebRTC
github.com
hi!
video[0]
The idea is you could carry around this hardware and ask it any questions about the conference: who is speaking, what are they speaking about, etc. It connects via WebRTC to an LLM and you get a bunch of info.
This is a workshop/demo project I did for a conference. When I was talking to the organizers I mentioned that I enjoy doing hardware + WebRTC projects. They thought that was cool and so we ran with it.
I have been doing these ESP32 + voice AI projects for a bit now. I started with an embedded SDK for LiveKit[1] in July 2024 and have been noodling with it since then. This code then found its way into pipecat/livekit etc...
So I hope it inspires you to go build with hardware and webrtc. It's a REALLY fun space right now. Lots of different cheap microcontrollers and even more cool projects.
To make it open source in the fullest sense, you need to document what you've done. This ESP repo could use some details on what protocols the hardware speaks, sequence diagrams, auth, etc. I doubt you're running WebRTC on the ESP.
WebRTC is running on the esp32! Library is libpeer[0]
[0] https://github.com/sepfy/libpeer
Hello,
Would it be possible to propose, as a challenge, an ESP32 project that plays music based on what is stored on the memory card (completely offline)?
Voice recognition already works offline, but there isn't yet something that can find relevant music and play it offline.
There are a great many.
https://www.google.com/search?q=esp32+mp3+player&oq=esp32+mp...
Interesting point about the ESP32 and music playback! I've been tinkering with similar projects, and it’s wild how much potential these little devices have. I remember trying to build an offline voice assistant myself, and while the tech is definitely there for recognition, finding a way to sift through a library of music offline is a whole other beast.
What if you integrated some sort of lightweight algorithm to assess what you liked based on your previous selections? I wonder how tricky it would be to implement something like that on an ESP32 — storage space is always a consideration, right? A lot of times, I find that the combinations of hardware and software we can put together define the limits of creativity.
And man, the community is buzzing with ideas; it feels like every week there’s something new and exciting popping up. I can't help but imagine what's next! Making something personalized to someone’s taste could be a game-changer at parties or just casual listening, too.
I think you would need the ESP32 to connect to another host. Doing speech-to-text, LLM, and text-to-speech is pretty intensive, even if you connect to a Raspberry Pi.
But totally possible! It's a great idea and I would love to help you build it :)
Wire some Open Source together and just start with a small collection of ogg files.
There was once a startup called Snips (https://snips.ai/) which made an open-source voice recognition engine running on an RPi.
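For the "find relevant music offline" part, here is a rough sketch of the matching step only, assuming the speech recognizer has already produced a text transcript. It picks the file whose name shares the most words with what was said. The function names and the file names in LIBRARY are made up for illustration, and it is written in plain Python for readability rather than as firmware code.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def best_match(transcript, filenames):
    """Return the filename sharing the most words with the transcript, or None."""
    query = tokenize(transcript)
    best_name, best_score = None, 0
    for name in filenames:
        # Drop the extension so "ogg" never counts as a matching word.
        stem = name.rsplit(".", 1)[0]
        score = len(query & tokenize(stem))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical file names standing in for a memory card listing.
LIBRARY = [
    "miles_davis-so_what.ogg",
    "nina_simone-feeling_good.ogg",
    "daft_punk-around_the_world.ogg",
]

if __name__ == "__main__":
    print(best_match("play feeling good by nina simone", LIBRARY))
```

Word overlap on file names is about the cheapest thing that could work with a small collection. It returns None when nothing matches, so the device could fall back to asking again.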