Muse Luxe voice assistant now possible?

On the right, what you refer to as ‘ESP32’ is the Muse Proto (yellow board), correct?
Could it be an issue with the Whisper model that is used and the processing power of the Raspberry Pi 4?

I have attached an example of the sound recorded on a Muse Proto; I think the quality is quite good. It is a recording made with an Arduino app we wrote, talking at 30 cm from the mic, so it sets your expectations on the quality.

The Muse Proto sounds really good! So it seems the Luxe is significantly worse? I guess then I will rather use the Proto for voice assistant purposes.

I will also make a recording on the Luxe, but from what I remember the quality is comparable to the Proto. This is why I think it is more a matter of tweaking some settings. Somebody on the team is working on that subject this week; I will post the results here.


No, I mean an ESP32 D1 mini + INMP441 + MAX98357 (and 11 wires) :slight_smile:
And yes, it could very likely be an issue with Whisper, there are lots of them (like not giving an audio reply). I'd believe it could be a matter of raising the gain, like I did with the ESP/INMP441. I only did the 11db gain and have not tried it on the Muses.

It’s the microphone we are discussing :slight_smile: Yes, the sound is great on both the Luxe and the Proto, but for the voice assistant to work it needs a good mic (gain), and even then Whisper is horrible at “decoding” that into words. But it’s a work in progress, the year has many months left :))
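
A minimal sketch of where the mic gain can be raised on the ESPHome side. It assumes a later ESPHome release than the PR builds used in this thread: the `noise_suppression_level`, `auto_gain` and `volume_multiplier` options of `voice_assistant` were added after these posts, and the values below are untuned starting points, not recommendations.

```yaml
voice_assistant:
  microphone: mic_i2s          # id of the I2S microphone defined elsewhere in the config
  noise_suppression_level: 2   # 0 (off) to 4
  auto_gain: 31dBFS            # 0dBFS (off) to 31dBFS
  volume_multiplier: 2.0       # plain multiplier applied to the audio; 1.0 leaves it unchanged
```

These settings are applied in the Assist pipeline, so the effect can be checked in the pipeline debug view rather than on the device itself.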

Deco, would you please share your ESPhome config. It seems you figured out the audio part too. :slight_smile:


Thanks, I have installed this YAML on the Luxe without issue. I can play TTS, but I am not sure how to record: when pressing the play button it seems to be in “Assist in progress”, but where should I see the speech-to-text result?

thanks


I figured it out for the Luxe and for the ESP32, but not for the Proto. I just tried the same YAML from the Luxe on the Proto, and I am not sure what is different between them, but the Luxe finds an I2C device and the Proto does not:

Luxe:
[19:33:22][I][i2c.arduino:069]: Results from i2c bus scan:
[19:33:22][I][i2c.arduino:075]: Found i2c device at address 0x10

Proto:
[19:32:03][I][i2c.arduino:069]: Results from i2c bus scan:
[19:32:03][I][i2c.arduino:071]: Found no i2c devices!
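
For reference, these scan lines come from the ESPHome `i2c` component, which scans the bus at boot when `scan` is enabled. A minimal sketch using the Luxe pins from the configuration posted in this thread; that the device at 0x10 is the ES8388 codec is an assumption based on the `es8388` component used in the Luxe config, and whether the Proto has any I2C device on these pins is not confirmed here.

```yaml
i2c:
  sda: GPIO18
  scl: GPIO23
  scan: true   # log every address that answers on the bus at startup
```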

What are you using, Proto or Luxe? And sure, I can send you the YAML I use on my Luxe if that’s what you need :), or the one for ESP32 + external I2S mic. :wink:


In Settings, Voice assistants, open the assistant you use (default), then top right corner, Debug. There you can follow it as it goes along the pipeline.

I can play TTS too, but I can’t get the voice assistant to do it when replying. I can see in debug, under raw, that the result is created and ready to be played, but nothing happens.

In the meantime I used the YAML @DTTerastar posted earlier and it kind of works.
I had to change the GPIO12 parameter to be pulled up for the play button to work, and changed the i2s_audio module parameters to GPIOxx instead of just a number. I have a Luxe and it seems to work, so it sends something when I press the play button. The only problem is that it fails on STT with:
[E][voice_assistant:145]: Error: stt-stream-failed - Speech to text failed
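
The two changes described above might look roughly like this. It is a sketch pieced together from the pin numbers that appear in the Luxe configuration later in this thread, not the exact diff that was applied.

```yaml
i2s_audio:
  - i2s_lrclk_pin: GPIO25   # was: 25
    i2s_bclk_pin: GPIO5     # was: 5

binary_sensor:
  - platform: gpio
    name: Button
    pin:
      number: GPIO12
      inverted: true
      mode:
        input: true
        pullup: true        # enable the internal pull-up so the play button registers
```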

I tried to use the built in HA cloud based speech to text service which works on mobile phones and on the web interface.

In the logs I also saw this:

  • Voice error: Error processing en-AU speech: 400 No audio data received
  • Voice error: Error processing en-AU speech: 400 Invalid HTTP request.

So there is a chance that nothing is sent by the device or there is a network error somewhere.

It would also be great to understand how ESPHome sends the data to HA. I use a different subnet for Wi-Fi clients and I hope it uses TCP or UDP, not some non-routable protocol. (I have found no documentation on this.)

it looks like this when it works:

[05:35:51][D][binary_sensor:036]: 'Assist Button': Sending state ON
[05:35:51][D][voice_assistant:065]: Requesting start...
[05:35:51][D][voice_assistant:045]: Starting...
[05:35:51][D][voice_assistant:083]: Assist Pipeline running
[05:35:53][D][binary_sensor:036]: 'Assist Button': Sending state OFF
[05:35:53][D][voice_assistant:073]: Signaling stop...
[05:35:55][D][voice_assistant:097]: Speech recognised as: " How many lights are on?"
[05:35:55][D][voice_assistant:112]: Response: "2"
[05:35:55][D][voice_assistant:127]: Response URL: "http://10.66.66.4:8123/api/tts_proxy/da4b9237bacccdf19c0760cab7aec4a8359010b0_en-us_f7a2402831_tts.piper.raw"
[05:35:55][D][media_player:059]: 'Notifier' - Setting
[05:35:55][D][media_player:066]: Media URL: http://10.66.66.4:8123/api/tts_proxy/da4b9237bacccdf19c0760cab7aec4a8359010b0_en-us_f7a2402831_tts.piper.raw
[05:35:55][D][voice_assistant:132]: Assist Pipeline ended

And in debug you will see it does return a media source to be played.

and here is my working configuration for Muse Luxe:

esphome:
  name: raspiaudio-muse-luxe-0096f0
  friendly_name: RaspiAudio Muse Luxe

esp32:
  board: esp-wrover-kit
  framework:
    type: arduino

external_components:
  - source: github://pr#3552
    components: [es8388]
    refresh: 0s
  - source: github://pr#4775
    components: [adc, i2s_audio, microphone]
    refresh: 0s

i2c:
  sda: GPIO18
  scl: GPIO23

es8388:

logger:

api:
  encryption:
    key: vuw4DCRhZUaPpQbhnbsDT25d1oS2qmikefWHlplU8Xc=

ota:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

captive_portal:

i2s_audio:
  - i2s_lrclk_pin: 25
    i2s_bclk_pin: 5

media_player:
  - platform: i2s_audio
    name: ''
    dac_type: external
    id: speaker_i2s
    i2s_dout_pin: 26
    mode: mono

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    i2s_din_pin: 35

voice_assistant:
  microphone: mic_i2s
  on_start:
    then:
      - light.turn_on:
          id: led
          blue: 1.0
          red: 0.0
          green: 0.0
          state: true
  on_tts_start:
    then:
      - light.turn_on:
          id: led
          blue: 0.0
          red: 0.0
          green: 1.0
          state: true
  on_tts_end:
    then:
      - light.turn_on:
          id: led
          blue: 0.0
          red: 0.0
          green: 1.0
          state: true
      - media_player.play_media:
          media_url: !lambda |-
            return x;
  on_end:
    then:
      - delay: 1s
      - if:
          condition:
            media_player.is_playing: {}
          then:
            - wait_until:
                condition:
                  media_player.is_idle: {}
            - light.turn_off:
                id: led
                state: false
          else:
            - light.turn_off:
                id: led
                state: false
  on_error:
    then:
      - light.turn_on:
          id: led
          blue: 0.0
          red: 1.0
          green: 0.0
          state: true
      - delay: 1s
      - light.turn_off:
          id: led
          state: false

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO12
      inverted: true
      mode:
        input: true
        output: false
        open_drain: false
        pullup: true
        pulldown: false
      drive_strength: 20.0
    name: Button
    on_press:
      - then:
          - voice_assistant.start: {}
    on_release:
      - then:
          - voice_assistant.stop: {}
  - platform: gpio
    pin:
      number: GPIO19
      inverted: true
      mode:
        input: true
        pullup: true
    name: ${friendly_name} Volume Up
    on_click:
      - media_player.volume_up:
  - platform: gpio
    pin:
      number: GPIO32
      inverted: true
      mode:
        input: true
        pullup: true
    name: ${friendly_name} Volume Down
    on_click:
      - media_player.volume_down:

light:
  - platform: fastled_clockless
    id: led
    name: ''
    disabled_by_default: true
    entity_category: config
    pin: 22
    default_transition_length: 0s
    chipset: SK6812
    num_leds: 1
    rgb_order: GRB
    restore_mode: ALWAYS_OFF
    gamma_correct: 2.8
    flash_transition_length: 0s

sensor:
  - platform: adc
    pin: GPIO33
    name: ${name} Battery
    icon: "mdi:battery-outline"
    update_interval: 15s
    accuracy_decimals: 3
    attenuation: 11db
    raw: true
    filters:
      - multiply: 0.00173913 # 2300 → 4, for attenuation 11db, based on Olivier’s code
      - exponential_moving_average:
          alpha: 0.2
          send_every: 2
      - delta: 0.002

I got it working by activating SSL and HTTPS on my HA instance.

Try to access the return response in your browser, if it fails, activate https.
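
A minimal sketch of enabling HTTPS in Home Assistant’s `configuration.yaml`, assuming a certificate and key already exist. The file paths below are placeholders, not values from this thread.

```yaml
# configuration.yaml (Home Assistant)
http:
  ssl_certificate: /ssl/fullchain.pem   # placeholder path to the certificate chain
  ssl_key: /ssl/privkey.pem             # placeholder path to the private key
```

Home Assistant needs a restart for this to take effect.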

For the Muse Proto (yellow board), here is the untested YAML:

esphome:
  name: smart-speaker
  friendly_name: Smart SpeakerP
  name_add_mac_suffix: false
  min_version: 2023.4.4
  on_boot:
    then:
      - output.turn_on: pw
      - output.turn_off: gain

esp32:
  board: esp-wrover-kit
  framework:
    version: 2.0.5
    source: ~3.20005.0
    platform_version: platformio/espressif32 @ 5.3.0
    type: arduino
  variant: ESP32

external_components:
  - source: github://pr#4775
    components: [adc, i2s_audio, microphone]
    refresh: 0s

output:
  - platform: gpio
    pin:
      number: 21
      mode: OUTPUT
    id: pw
  - platform: gpio
    pin:
      number: 23
      mode: OUTPUT
    id: gain

logger:
  baud_rate: 115200
  tx_buffer_size: 512
  deassert_rts_dtr: false
  hardware_uart: UART0
  level: DEBUG
  logs: {}

api:
  port: 6053
  password: ''
  reboot_timeout: 15min

ota:
  safe_mode: true
  port: 3232
  reboot_timeout: 5min
  num_attempts: 10

wifi:
  ap:
    password: ${wifi_ap_password}
    ap_timeout: 1min
  domain: .z13.org
  reboot_timeout: 15min
  power_save_mode: LIGHT
  fast_connect: false
  networks:
    - ssid: wifilr
      password: casanice
      priority: 0.0

captive_portal: {}
improv_serial: {}

i2s_audio:
  - i2s_lrclk_pin: 25
    i2s_bclk_pin: 5

microphone:
  - platform: i2s_audio
    id: echo_microphone
    i2s_din_pin: 35
    adc_type: external
    pdm: false

media_player:
  - platform: i2s_audio
    name: ''
    id: echo_audio
    i2s_dout_pin: 26
    mode: mono
    disabled_by_default: false
    dac_type: external

voice_assistant:
  microphone: echo_microphone
  on_start:
    then:
      - light.turn_on:
          id: led
          blue: 1.0
          red: 0.0
          green: 0.0
          state: true
  on_tts_start:
    then:
      - light.turn_on:
          id: led
          blue: 0.0
          red: 0.0
          green: 1.0
          state: true
  on_tts_end:
    then:
      - light.turn_on:
          id: led
          blue: 0.0
          red: 0.0
          green: 1.0
          state: true
      - media_player.play_media:
          media_url: !lambda |-
            return x;
  on_end:
    then:
      - delay: 1s
      - if:
          condition:
            media_player.is_playing: {}
          then:
            - wait_until:
                condition:
                  media_player.is_idle: {}
            - light.turn_off:
                id: led
                state: false
          else:
            - light.turn_off:
                id: led
                state: false
  on_error:
    then:
      - light.turn_on:
          id: led
          blue: 0.0
          red: 1.0
          green: 0.0
          state: true
      - delay: 1s
      - light.turn_off:
          id: led
          state: false

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO12
      inverted: true
      mode:
        input: true
        output: false
        open_drain: false
        pullup: false
        pulldown: false
      drive_strength: 20.0
    name: Button
    disabled_by_default: true
    entity_category: diagnostic
    id: echo_button
    on_press:
      - then:
          - voice_assistant.start: {}
    on_release:
      - then:
          - voice_assistant.stop: {}
  - platform: gpio
    pin:
      number: GPIO19
      inverted: true
      mode:
        input: true
        pullup: true
    name: ${friendly_name} Volume Up
    on_click:
      - media_player.volume_up:
  - platform: gpio
    pin:
      number: GPIO32
      inverted: true
      mode:
        input: true
        pullup: true
    name: ${friendly_name} Volume Down
    on_click:
      - media_player.volume_down:

light:
  - platform: fastled_clockless
    id: led
    name: ''
    disabled_by_default: true
    entity_category: config
    pin: 22
    default_transition_length: 0s
    chipset: SK6812
    num_leds: 1
    rgb_order: GRB
    restore_mode: ALWAYS_OFF
    gamma_correct: 2.8
    flash_transition_length: 0s

sensor:
  - platform: adc
    pin: GPIO33
    name: ${name} Battery
    icon: "mdi:battery-outline"
    update_interval: 15s
    accuracy_decimals: 3
    attenuation: 11db
    raw: true
    filters:
      - multiply: 0.00173913 # 2300 → 4, for attenuation 11db, based on Olivier’s code
      - exponential_moving_average:
          alpha: 0.2
          send_every: 2
      - delta: 0.002

That’s the web interface. We don’t need that when using the voice assistant from other devices like the Muse, only from browsers, due to their security requirements.

It didn’t work for me: still no output from the microphone, resulting in a “[voice_assistant:145]: Error: stt-no-text-recognized - No text recognized” error (and the LED turns red).

I think the problem is in reading the microphone. Is it the same microphone in both the Luxe and the Proto?

By the way, shouldn’t it be GPIO0 for the button when it’s the Proto? :slight_smile:
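
If the Proto’s button is indeed on GPIO0, as suggested above (an assumption that has not been checked against the board’s schematic here), the push-to-talk button entry would change along these lines:

```yaml
binary_sensor:
  - platform: gpio
    name: Button
    pin:
      number: GPIO0         # assumed Proto button pin; the Luxe config uses GPIO12
      inverted: true
      mode:
        input: true
        pullup: true
    on_press:
      - voice_assistant.start:
    on_release:
      - voice_assistant.stop:
```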

here is the simplified and working code for ESP32 (WRoom Dev. Board) + INMP441 + MAX98357:

esphome:
  name: esphome-web-f1d734
  friendly_name: wroom32

esp32:
  board: esp32dev
  framework:
    type: arduino

external_components:
  - source: github://pr#4775
    components: [adc, i2s_audio, microphone]
    refresh: 0s

logger:

api:
  encryption:
    key: "gZD5pY+6PnlZiI012s2HwSHF4TWZ/NwRQD+Lq50uzSU="

ota:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  ap:
    ssid: "Esphome-Web-F1D734"
    password: "8qrKcUO9erk0"

captive_portal:

i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO27
  - id: i2s_in
    i2s_lrclk_pin: GPIO19
    i2s_bclk_pin: GPIO18

media_player:
  - platform: i2s_audio
    id: media_out
    name: Notifier
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO14
    mode: mono

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO23

voice_assistant:
  microphone: mic_i2s
  on_tts_end:
    then:
      - media_player.play_media:
          media_url: !lambda |-
            return x;

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO05
      inverted: true
      mode:
        input: true
        pullup: true
    name: Assist Button
    on_press:
      - voice_assistant.start:
    on_release:
      - voice_assistant.stop:

Thanks. Almost the same as mine. It should work in theory, but I have a network issue.
The device is on a different subnet, and HA runs in Docker with direct host networking. I figured out that the voice stream uses UDP, with a 512kb packet stream starting just as I press the play button. It uses a random port on the destination side each time I press the button. I have yet to figure out what the problem could be (I let through every possible port for testing, without luck). There are no real parameters for the voice assistant on the ESPHome side to tweak :frowning: I will open a ticket on GitHub. Anyhow, thanks again for sharing your config.


What do you see in the device log in ESPHome, and in the assistant debug? Nothing that gives a clue?
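
To get more detail in the device log, the logger level can be raised. A minimal sketch; the tag names are the ones visible in the log lines in this thread, and how much extra output the `voice_assistant` component produces at this level depends on the ESPHome build.

```yaml
logger:
  level: VERY_VERBOSE            # global ceiling; a per-tag level cannot exceed it
  logs:
    voice_assistant: VERY_VERBOSE
    media_player: DEBUG          # keep tags that are not under investigation quieter
    binary_sensor: DEBUG
```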

OK, I got the speech-to-text (STT) part working. Here is the proof :slight_smile:

I can only get the spoken sentence transcribed in the logs of the Muse Luxe.

I think the result is very good, taking into account my French accent and the fact that I am using the simplest tiny-int8 Whisper model running on a small Raspberry Pi 4.
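
For anyone wanting to reproduce a similar setup, one way to run Whisper with the tiny-int8 model is the Wyoming Whisper container. This is a sketch under assumptions: the thread does not say how Whisper was installed, and the host data path and language below are placeholders.

```yaml
# docker-compose.yml
services:
  whisper:
    image: rhasspy/wyoming-whisper
    command: --model tiny-int8 --language en
    ports:
      - "10300:10300"          # Wyoming protocol port, added to HA via the Wyoming integration
    volumes:
      - ./whisper-data:/data   # placeholder host path for downloaded models
    restart: unless-stopped
```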
