On the right, what you refer to as ‘ESP32’ is the Muse Proto (yellow board), correct?
Could it be an issue with the Whisper model that is used and the processing power of the Raspberry Pi 4?
I will also make a recording on the Luxe, but from what I remember the quality is comparable to the Proto. This is why I think it is more a matter of tweaking some settings. We have somebody on the team working on that subject this week; I will post results here.
No, I mean an ESP32 D1 mini + INMP441 + MAX98357 (and 11 wires).
And yes, it could very well be an issue with Whisper; there are lots of them (like not giving an audio reply). I believe it could be a matter of raising the gain, like I did with the ESP/INMP441. I only did the 11 dB gain and have not tried it on the Muses.
It’s the microphone we are discussing, yes. The sound is great on both the Luxe and the Proto, but for the voice assistant to work it needs a good mic (gain), and even then Whisper is horrible at “decoding” that into words. But it’s a work in progress; the year has many months left :))
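For reference, an ESP32 D1 mini + INMP441 + MAX98357 hookup like the one mentioned above can be sketched in ESPHome YAML roughly as follows. The pin assignments here are my own placeholder choices for illustration, not the actual 11-wire layout used above:

```yaml
# Placeholder pin mapping; adjust to your actual wiring.
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO25   # INMP441 WS
    i2s_bclk_pin: GPIO26    # INMP441 SCK
  - id: i2s_out
    i2s_lrclk_pin: GPIO18   # MAX98357 LRC
    i2s_bclk_pin: GPIO19    # MAX98357 BCLK

microphone:
  - platform: i2s_audio
    id: external_mic
    i2s_audio_id: i2s_in
    adc_type: external
    i2s_din_pin: GPIO33     # INMP441 SD
    pdm: false

speaker:
  - platform: i2s_audio
    id: external_speaker
    i2s_audio_id: i2s_out
    dac_type: external
    i2s_dout_pin: GPIO22    # MAX98357 DIN
```

Using two separate I2S buses keeps the mic and speaker clocks independent; with careful pin sharing a single bus can also work on some boards.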
Thanks, I have installed this YAML on the Luxe without issue. I can play TTS, but I am not sure how to record. When pressing the play button it seems to be in “Assist in progress”, but where should I see the speech-to-text result?
I figured it out for the Luxe and for the ESP32, but not the Proto. I just tried the same YAML from the Luxe on the Proto, and I am not sure what is different between them, but the Luxe finds an I2C device and the Proto does not:
Luxe:
[19:33:22][I][i2c.arduino:069]: Results from i2c bus scan:
[19:33:22][I][i2c.arduino:075]: Found i2c device at address 0x10
Proto:
[19:32:03][I][i2c.arduino:069]: Results from i2c bus scan:
[19:32:03][I][i2c.arduino:071]: Found no i2c devices!
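The 0x10 device the Luxe finds is presumably its audio codec, so one thing worth checking is whether the Proto YAML points the I2C bus at the right pins for that board. A generic sketch of the relevant block, with placeholder pins that would need to match the actual board wiring:

```yaml
# Placeholder SDA/SCL pins; the correct pins differ per board,
# so check the Muse Proto pinout before using these.
i2c:
  sda: GPIO18
  scl: GPIO23
  scan: true   # prints the "Results from i2c bus scan" lines at boot
```

If the pins are wrong (or the codec is not on I2C at all on the Proto), the scan will report “Found no i2c devices!” exactly as in the log above.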
What are you using, Proto or Luxe? And sure, I can send you the YAML I use on my Luxe if that’s what you need :), or the one for the ESP32 + external I2S mic.
In Settings → Voice assistants, open the assistant you use (default), then top right corner → Debug. There you can follow it as it goes along the pipeline.
I can play TTS too, but I can’t get the voice assistant to do it when replying. I can see in debug, under raw, that the result is created and ready to be played, but nothing happens.
In the meantime I used the YAML @DTTerastar posted earlier and it kind of works.
I had to change the GPIO12 parameter to be pulled up for the play button to work, and changed the i2s_audio module parameters to GPIOxx instead of just a number. I have a Luxe and it seems to work, so it sends something when I press the play button. The only problem is that it fails on STT with
[E][voice_assistant:145]: Error: stt-stream-failed - Speech to text failed
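For anyone else applying the same change: the pulled-up play button described above can be expressed in ESPHome roughly like this (GPIO12 per the post; the rest is a generic sketch of an active-low button):

```yaml
# Generic sketch: active-low button on GPIO12 with internal pull-up.
binary_sensor:
  - platform: gpio
    pin:
      number: GPIO12
      mode:
        input: true
        pullup: true
      inverted: true   # pressed pulls the pin low
    name: "Play Button"
```
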
I tried to use the built-in HA cloud-based speech-to-text service, which works on mobile phones and on the web interface.
In the logs I also saw this:
Voice error: Error processing en-AU speech: 400 No audio data received
So there is a chance that nothing is sent by the device or there is a network error somewhere.
It would also be great to understand how ESPHome sends the data to HA. I use a different subnet for Wi-Fi clients, and I hope it uses TCP or UDP, not some non-routable protocol. (I have found no documentation on this.)
Didn’t work for me; still no output from the microphone, resulting in a “voice_assistant:145]: Error: stt-no-text-recognized - No text recognized” error (and the LED turns red).
I think the problem is in reading the microphone? Is it the same microphone in both the Luxe and the Proto?
BTW, shouldn’t it be GPIO0 for the button when it’s the Proto?
Thanks. Almost the same as mine. It should work in theory but I have a network issue.
The device is on a different subnet; HA runs in Docker with direct host networking. I figured out the voice stream uses UDP, with a 512 kb packet stream starting just as I press the play button. It uses a random port on the destination side each time I press the button. I have yet to figure out what the problem could be (I let through every possible port for testing, without luck). There are no real parameters for the voice assistant on the ESPHome side to tweak, so I will open a ticket on GitHub. Anyhow, thanks again for sharing your config.
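To narrow down whether the UDP audio actually crosses the subnet boundary, one quick check is a throwaway listener on the HA host. This is a generic sketch, not anything from ESPHome itself; the port number is arbitrary, since the real voice assistant port is negotiated per session, so capture the actual destination port with tcpdump/Wireshark first and substitute it:

```python
import socket

def listen_udp(port: int = 54545, timeout: float = 5.0):
    """Wait for a single UDP datagram on the given port.

    Returns (sender_address, payload_length), or None on timeout.
    The default port is a placeholder; replace it with the port
    observed in a packet capture of the voice stream.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    sock.settimeout(timeout)
    try:
        data, addr = sock.recvfrom(4096)
        return addr, len(data)
    except socket.timeout:
        # No datagram arrived: points at routing/firewall, not HA itself.
        return None
    finally:
        sock.close()
```

If this receives packets while HA still logs “No audio data received”, the problem is inside the container networking rather than the inter-subnet routing.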
OK, I got the speech-to-text (STT) part working; here is the proof.
I can only get the spoken sentence transcribed in the logs of the Muse Luxe.
I think the result is very good, taking into account my French accent and the fact that I am using the simplest tiny-int8 Whisper model running on a small Raspberry Pi 4.
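For anyone wanting to reproduce this: a tiny-int8 Whisper model like the one mentioned above is commonly run as a Wyoming-protocol service on the Pi. The compose sketch below is my assumption based on the rhasspy/wyoming-whisper image; verify the image name, flags, and port against its current README before relying on them:

```yaml
# Assumed image and flags; check the rhasspy wyoming-whisper docs.
services:
  whisper:
    image: rhasspy/wyoming-whisper
    command: --model tiny-int8 --language fr
    volumes:
      - ./whisper-data:/data   # model cache
    ports:
      - "10300:10300"          # default Wyoming STT port
    restart: unless-stopped
```

The service is then added in HA through the Wyoming integration, pointing at the Pi’s IP and port 10300, and selected as the STT engine in the voice assistant pipeline.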