I would like to use an RPi Zero W to detect and produce audio for a voice assistant project. Basically I need the Pi to detect a wake word, record audio, and play audio back to the user. I was wondering if the MIC+ or the ULTRA could be a good choice for this. What is the range of the microphone? And can it play audio back to the user without connecting external speakers?
Thank you. I was hoping to avoid using an internet connection for some basic voice recognition, for security and reliability reasons. My overall idea is to have a dedicated subnet, segmented by a firewall, with a home assistant engine able to operate independently even if the internet connection is down, if possible.
So the firewall may allow a few basic, controlled external egresses, for example for the weather forecast, some RSS feeds, and system updates (to be checked, because once a system like this is created, any update must be verified before being applied), and nothing else.
I have some past experience with text to speech using Loquendo, but that seems to me like a cannon to kill a mosquito!
Nice that there are other people interested in this!
Let me explain a bit more what I am trying to achieve with this project, hopefully @giangiacomo you can pick up something useful here.
Basically I am trying to set up the following: I want every room in the house to have an audio streaming/speaking satellite that I can activate with a wakeword and give commands to, and that can respond to these commands. Currently in the prototype I have a Raspberry Pi 4 which is connected to a PlayStation Eye (for the mic array, audio input) and to a pair of cheap speakers (the Pebble, for audio output). On the Pi I am running rhasspy (https://rhasspy.readthedocs.io/en/latest/) which takes care of wakeword detection, playing response wav audio, and streaming recorded wav audio on MQTT topics when a wakeword is detected.
Then on another machine (a standard laptop for now) I have running the MQTT broker, the speech-to-text service (also managed through rhasspy, using Kaldi) and a custom Python app that listens to the MQTT topics and uses a Rasa NLP bot (https://rasa.com/) to extract the intent of the speech and to forward detected actions to a Rasa action server. The action server basically calls Home Assistant or any other third-party service to execute the detected action. All of this happens offline; only the local network is needed.
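To give an idea of what the custom app in the middle does, here is a minimal sketch of the decision step between the STT output and the action server. The function names, payload shape and confidence threshold are my own assumptions for illustration, not part of rhasspy or Rasa; in the real app this logic would sit inside the MQTT message callback.

```python
import json

# Assumed confidence cutoff below which an intent is ignored
CONFIDENCE_THRESHOLD = 0.7

def extract_intent(nlu_json: str):
    """Pull (intent_name, confidence) out of a Rasa-style NLU parse result."""
    data = json.loads(nlu_json)
    intent = data.get("intent", {})
    return intent.get("name"), intent.get("confidence", 0.0)

def should_forward(nlu_json: str) -> bool:
    """Decide whether the parsed intent is confident enough to send on
    to the action server (hypothetical policy, tune to taste)."""
    name, confidence = extract_intent(nlu_json)
    return name is not None and confidence >= CONFIDENCE_THRESHOLD

# Example parse result for "turn on the kitchen light"
sample = json.dumps({
    "text": "turn on the kitchen light",
    "intent": {"name": "light_on", "confidence": 0.93},
    "entities": [{"entity": "room", "value": "kitchen"}],
})

print(extract_intent(sample))   # -> ('light_on', 0.93)
print(should_forward(sample))   # -> True
```

In the actual pipeline the JSON would come from Rasa's parse endpoint and the forwarding would be an HTTP call or MQTT publish to the action server, but the gating idea is the same.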
So my question here was whether I can replace the PlayStation Eye + speakers in this setup with something more compact like the MIC+ or ULTRA+. Even more awesome would be if I could enable the following scenario using the MIC+ or ULTRA+:
I speak the wakeword and utter the command “play Bill Evans Blue in Green”
The satellite sends the recorded wav over to the base station, where STT and intent recognition happen; then the Rasa action server pings the satellite back, which does the job of starting playback of the song on the satellite (e.g. using Spotify or YouTube) (still need to think about how to implement this with a custom app)
So yeah, I was wondering if the MIC+ or ULTRA+ could be useful for this kind of thing.
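For the playback leg of that scenario, one possible sketch of the satellite side: the action server publishes a play request, and the satellite turns it into a local player command. The topic shape, payload fields and the choice of mpv with its youtube-dl search prefix are all my assumptions here, not something the thread's stack prescribes.

```python
import json

def build_play_command(payload: str) -> list:
    """Map an assumed {"query": ...} play request to an mpv command line.
    mpv is just one option; a Spotify client would slot in here instead."""
    request = json.loads(payload)
    query = request["query"]               # e.g. "Bill Evans Blue in Green"
    url = "ytdl://ytsearch1:" + query      # mpv hands this to youtube-dl
    return ["mpv", "--no-video", url]

cmd = build_play_command(json.dumps({"query": "Bill Evans Blue in Green"}))
print(cmd)
# In a real satellite app this list would go to subprocess.run(cmd)
# inside the on_message callback of the MQTT client.
```

The nice property is that the base station never touches audio output; it only sends a small JSON message back over the same local MQTT broker.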
@rpino that’s exactly the kind of thing I was wondering about!
I was aware of Rasa, but only for text messaging, not for audio; the implementation of the bridge with the speech-to-text capability is smart.
What is the main “init” command you are using? I’d like to have an init keyword like “Hey Siri” or “Ok Google”, for example (mine would be something similar to “Hey Jarvis”, in honour of Edwin Jarvis, Tony Stark’s AI).
Would it be possible to set up this kind of specific command, or is the mic in always-on active listening? Or maybe it’s a mix of both, and when the right pattern is identified, it starts?
Another thing I was wondering about is vocal inflection: the Apple assistant uses pattern recognition on one specific voice (I guess it would be the same for Google), preventing it from accepting commands from non-authorised people.
About security: since I also want to add some actions (such as closing the windows, for example), I want to add another security layer, probably with a biometric check (fingerprint), so that for a specific command such as ", please " it might say something like “Ok, place your finger on the reader to authorise”.
For me the music & lights and other simple actuations, such as “make a phone call to xxx”, are fine, and maybe available to almost everyone (with the basic inflection auth system), but the more complex ones must require a second level of check.
Definitely the MIC+ or ULTRA+ could be useful for this; these HAT boards are cheap and useful. Even so, they require at least a Pi for every room, and the total cost may be quite high, I fear. Because of this I was also wondering if there’s any chance of having something less “intelligent” than a Pi, and definitely cheaper, that can easily be placed in every room and communicate with the main server. A Pi 4 with 8 GB of RAM could be enough as that server, and this component would be the only one applying the “intelligence”.
@giangiacomo rhasspy has a wakeword detection module called Raven that lets you specify a custom wakeword. Basically you record a few audio files where you speak the word, and rhasspy uses them to train the wakeword algorithm.
Unfortunately it does not do speaker recognition, so by default you can’t lock it to recognise only specific people. However, rhasspy being open source, I was indeed thinking of writing a module for this, but for now I am busy developing an open source app to tie all this together into a homeAI software.
As for the audio recording, you can set up rhasspy so that the audio recorded by the satellite device remains on the device until a wakeword is detected, at which point it is sent to the MQTT topic.
As for the additional authentication, that would indeed be a very cool feature to have. I don’t think there’s anything out of the box for that; one would need a fingerprint scanner and custom code, but it’s definitely possible.
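The two-tier authorisation idea discussed above could be sketched like this: intents are split into tiers, and only the high-risk ones block on a fingerprint check before reaching the action server. The tier list and check_fingerprint() are hypothetical placeholders, not part of rhasspy or Rasa.

```python
# Hypothetical set of intents that require the second security layer
HIGH_RISK_INTENTS = {"close_windows", "unlock_door", "disarm_alarm"}

def check_fingerprint() -> bool:
    # Placeholder: a real implementation would talk to a fingerprint
    # reader (e.g. over USB/serial) and match against enrolled prints.
    return True

def authorise(intent_name: str) -> bool:
    """Low-risk intents pass directly; high-risk ones are gated on the
    fingerprint reader (assumed policy, adjust the tiers to taste)."""
    if intent_name not in HIGH_RISK_INTENTS:
        return True
    return check_fingerprint()

print(authorise("play_music"))      # low-risk: allowed without extra check
print(authorise("close_windows"))   # high-risk: goes through the reader
```

The gate would naturally live between intent recognition and the action server, so the satellite only has to play the “place your finger on the reader” prompt when asked.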
One thing I want to do with the homeAI app I mentioned is add image recognition capabilities. E.g. if there’s a camera attached to the Pi, it could recognise the face of the command issuer. Of course the main use of this is security cameras, so I wouldn’t have it in every room, only a selected few. For authentication and user recognition I think speaker recognition from a voice database would be the best.
I am currently working on a number of different voice assistant based projects. GassistPi has a number of features for both entertainment and home automation. There are branches of the GassistPi project like smart screen, Volumio built in with the assistant, etc. Though GassistPi is based on the Google Assistant SDK, I am slowly working on adding options to take it off Google’s grid. There is also another project, or rather a set of scripts, called Assistant-Pi that helps people set up both Alexa and Google Assistant on a single Pi.
I am posting the Assistants’ details here just in case someone finds them interesting or finds they match their use case or needs.
I like your project, this is great, and we would like to offer you 1 ULTRA++ and 1 MIC+ to support your project.
I hope that one day you can get rid of the hegemonic Google Assistant SDK, even if I recognise that it is not an easy thing to do.
No news from shivasiddharth. Yes, I think the Luxe is a good alternative for a satellite; it could be done now without the wakeup word. Implementing the local wakeup word is another story. It is definitely something I’m looking at for 2023.