Voice Assist and ESPHome

Nicolas117 · March 18, 2024, 10:37

Hello,

For the past few days, I've been trying to set up an ESP32 and an ESP32-S3 as voice assistants. I'm using an INMP441 microphone on both.

Here are the configs for my two ESPs (the ESP32-S3 first, then the ESP32):

esphome:
  name: assistant-chambre
  friendly_name: Assistant Chambre
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
    - light.turn_on:
        id: led_ww
        blue: 100%
        brightness: 60%
        effect: fast pulse
esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  framework:
    type: esp-idf

    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"
   
psram:
  mode: octal
  speed: 80MHz

# Enable logging
logger:
  level: DEBUG

ota:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
      # Set this to the IP of the ESP
      static_ip: 192.168.1.23
      # Set this to the IP address of the router. Often ends with .1
      gateway: 192.168.1.1
      # The subnet of the network. 255.255.255.0 works for most home networks.
      subnet: 255.255.255.0

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:

# Enable Home Assistant API
api:
  on_client_connected:
        then:
          - delay: 50ms
          - light.turn_off: led_ww
          - micro_wake_word.start:
  on_client_disconnected:
        then:
          - voice_assistant.stop: 

button:
  - platform: restart
    name: "Restart"
    id: but_rest

switch:
  - platform: template
    id: mute
    name: mute
    optimistic: true
    on_turn_on: 
      - micro_wake_word.stop:
      - voice_assistant.stop:
      - light.turn_on:
          id: led_ww           
          red: 100%
          green: 0%
          blue: 0%
          brightness: 60%
          effect: fast pulse 
      - delay: 2s
      - light.turn_off:
          id: led_ww
      - light.turn_on:
          id: led_ww           
          red: 100%
          green: 0%
          blue: 0%
          brightness: 30%
    on_turn_off:
      - micro_wake_word.start:
      - light.turn_on:
          id: led_ww           
          red: 0%
          green: 100%
          blue: 0%
          brightness: 60%
          effect: fast pulse 
      - delay: 2s
      - light.turn_off:
          id: led_ww 
   
light:
  - platform: esp32_rmt_led_strip
    id: led_ww
    rgb_order: GRB
    pin: GPIO48
    num_leds: 1
    rmt_channel: 0
    chipset: ws2812
    name: "on board light"
    effects:
      - pulse:
      - pulse:
          name: "Fast Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
          min_brightness: 0%
          max_brightness: 100%
          
          
 # Audio and Voice Assistant Config      

i2s_audio:
  - id: i2s_mic
    i2s_lrclk_pin: GPIO3
    i2s_bclk_pin: GPIO2
  - id: i2s_spk
    i2s_lrclk_pin: GPIO6
    i2s_bclk_pin: GPIO7

microphone:
  platform: i2s_audio
  id: va_mic
  adc_type: external
  i2s_audio_id: i2s_mic
  i2s_din_pin: GPIO4
  channel: left
  pdm: false

speaker:
  platform: i2s_audio
  id: va_spk
  dac_type: external
  i2s_audio_id: i2s_spk
  i2s_dout_pin: GPIO8
    
micro_wake_word:
  on_wake_word_detected:
    - voice_assistant.start:
    - light.turn_on:
        id: led_ww           
        red: 30%
        green: 30%
        blue: 70%
        brightness: 60%
        effect: fast pulse 
  model: hey_jarvis
    
voice_assistant:
  id: va
  microphone: va_mic
  speaker: va_spk
  noise_suppression_level: 2.0
  volume_multiplier: 4.0
  on_stt_end:
       then: 
         - light.turn_off: led_ww
  on_error:
          - micro_wake_word.start:  
  on_end:
        then:
          - light.turn_off: led_ww
          - wait_until:
              not:
                voice_assistant.is_running:
          - micro_wake_word.start:

esphome:
  name: test
  friendly_name: test

esp32:
  board: esp32dev
  framework:
    type: arduino
    version: recommended

# Enable logging
logger:
  level: DEBUG

# Enable Home Assistant API
api:

ota:

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
      # Set this to the IP of the ESP
      static_ip: 192.168.1.19
      # Set this to the IP address of the router. Often ends with .1
      gateway: 192.168.1.1
      # The subnet of the network. 255.255.255.0 works for most home networks.
      subnet: 255.255.255.0

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:

captive_portal:

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26 #WS 
    i2s_bclk_pin: GPIO25 #SCK

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: left
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33  #SD Pin from the INMP441 Microphone


voice_assistant:
  microphone: mic_i2s
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  use_wake_word: false

  
  on_wake_word_detected: 
    - light.turn_on:
        id: led_light
  on_listening: 
    - light.turn_on:
        id: led_light
        effect: "Scan Effect With Custom Values"
        red: 63%
        green: 13%
        blue: 93%
  
  on_stt_end:
    - light.turn_on:
        id: led_light
        effect: "None"
        red: 0%
        green: 100%
        blue: 0%

  on_error: 
    - light.turn_on:
        id: led_light
        effect: "None"
    - if:
        condition:
          switch.is_on: use_wake_word
        then:

          - switch.turn_off: use_wake_word
          - delay: 1sec 
          - switch.turn_on: use_wake_word

  on_tts_start:                                    # this is required to play the output on a media player
    - homeassistant.service:
        service: tts.speak
        data:
          media_player_entity_id: media_player.marantz_sr6015    #replace this with your media player entity id
          message: !lambda 'return x;'
          entity_id: tts.piper                 #replace this with your piper tts id.


  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:

  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
 
  on_end:
    - light.turn_off:
        id: led_light



binary_sensor:
  - platform: status
    name: API Connection
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
    on_release:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.stop:


switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);

light:
  - platform: neopixelbus
    id: led_light
    type: grb
    pin: GPIO32      # DIN pin of the LED Strip
    num_leds: 9      # change the Number of LEDS according to your LED Strip.
    name: "Light"
    variant: SK6812
    default_transition_length: 0.5s
      
    effects:
      - addressable_scan:
          name: Scan Effect With Custom Values
          move_interval: 50ms
          scan_width: 2

Home Assistant, OpenWakeWord, Whisper and Piper are installed in Docker. When I test the whole pipeline from my phone in debug mode, both the wake word and the spoken commands are detected. On my ESPs, however, the wake word is never detected.
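For reference, a Docker setup like the one described above might look like the docker-compose sketch below. The images and their default ports (10300 for Whisper, 10200 for Piper, 10400 for openWakeWord) are those of the rhasspy Wyoming images as far as I know; the volume paths and the French Piper voice `fr_FR-siwis-medium` are assumptions, not taken from the post. The `small-int8` model is the one mentioned later in the thread.

```yaml
# Sketch of the Wyoming services; paths and voice name are illustrative assumptions.
services:
  whisper:
    image: rhasspy/wyoming-whisper
    command: --model small-int8 --language fr   # model from the thread, French STT
    volumes:
      - ./whisper-data:/data                     # downloaded models are cached here
    ports:
      - "10300:10300"
    restart: unless-stopped

  piper:
    image: rhasspy/wyoming-piper
    command: --voice fr_FR-siwis-medium          # assumed French voice
    volumes:
      - ./piper-data:/data
    ports:
      - "10200:10200"
    restart: unless-stopped

  openwakeword:
    image: rhasspy/wyoming-openwakeword
    ports:
      - "10400:10400"
    restart: unless-stopped
```

Each service is then added in Home Assistant through the Wyoming Protocol integration, using the Docker host's IP and the matching port.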

I then tried microWakeWord on my ESP32-S3, and this time the wake word worked.

However, now it's the Whisper part that no longer seems to work. It almost looks like a connection problem between my ESPs and HA/OpenWakeWord/Whisper/Piper.
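One hedged lead on this "connection problem": as far as I know, ESPHome 2024.2 streams the microphone audio to Home Assistant over UDP, on a port HA chooses at run time, so it cannot be published in advance with `ports:` when the HA container sits on Docker's default bridge network. The phone is presumably unaffected because it sends audio through HA's own API connection. The usual arrangement, which is also what the Home Assistant Docker install instructions use, is host networking; a minimal sketch, with illustrative paths:

```yaml
# Sketch: Home Assistant container on the host network (assumption: compose is used).
# With network_mode: host, any UDP port HA opens is directly reachable by the ESPs.
services:
  homeassistant:
    image: ghcr.io/home-assistant/home-assistant:stable
    network_mode: host
    volumes:
      - ./ha-config:/config                  # illustrative path
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
```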

I don't understand what's going on and I've run out of ideas. Could someone tell me where the problem might come from, please?

WarC0zes · March 19, 2024, 3:54

Hi,
The best combo is Vosk instead of Whisper, and Porcupine or Snowboy instead of OpenWakeWord.

Bob · March 19, 2024, 7:18

Hello @Nicolas117
I use the same solutions as @WarC0zes. The S3 is very responsive, but I haven't gotten any audio output out of it. If it works for you, I'd gladly take your config, please.
Bob

Nicolas117 · March 19, 2024, 7:31

Thanks @WarC0zes, but before making things more complicated I'd like to get my ESPs working.
My pipeline works fairly well when I use my phone (Whisper answers in 2 to 3 s).
The problem really seems to be on the ESPHome side.
I've gone through every tutorial on the web and nothing works. I'm getting desperate.

Krull56 · March 19, 2024, 7:50

Unless you have a very good GPU, that's a surprisingly fast response time for Whisper in French.
Or is it witchcraft?

Try Vosk and keep us posted on your results with your ESPs. I have 4 in production (plain ESP32s, not S3) without the slightest problem.

We'd also need your ESP logs to try to pin down the problem.

Nicolas117 · March 19, 2024, 9:25

I'm running the Whisper container on an i3 with no GPU, using the small-int8 model.
OK, I'll try Vosk, but I didn't find any documentation on Docker Hub for installing the image. From what I understand from the GitHub repo, do I need to download the French model into the container and pass the model to use as an argument?

Here are the logs from my ESPs:

[20:23:14][I][app:102]: ESPHome version 2024.2.2 compiled on Mar 19 2024, 20:22:36
[20:23:14][C][wifi:577]: WiFi:
[20:23:14][C][wifi:409]:   Local MAC: A0:B7:65:63:24:7C
[20:23:14][C][wifi:414]:   SSID: [redacted]
[20:23:14][C][wifi:415]:   IP Address: 192.168.1.19
[20:23:14][C][wifi:417]:   BSSID: [redacted]
[20:23:14][C][wifi:418]:   Hostname: 'test'
[20:23:14][C][wifi:420]:   Signal strength: -50 dB ▂▄▆█
[20:23:14][C][wifi:424]:   Channel: 11
[20:23:14][C][wifi:425]:   Subnet: 255.255.255.0
[20:23:14][C][wifi:426]:   Gateway: 192.168.1.1
[20:23:14][C][wifi:427]:   DNS1: 0.0.0.0
[20:23:14][C][wifi:428]:   DNS2: 0.0.0.0
[20:23:14][C][logger:447]: Logger:
[20:23:14][C][logger:448]:   Level: DEBUG
[20:23:14][C][logger:449]:   Log Baud Rate: 115200
[20:23:14][C][logger:451]:   Hardware UART: UART0
[20:23:14][C][template.switch:068]: Template Switch 'Use wake word'
[20:23:14][C][template.switch:091]:   Restore Mode: restore defaults to ON
[20:23:14][C][template.switch:057]:   Optimistic: YES
[20:23:14][C][captive_portal:088]: Captive Portal:
[20:23:14][C][mdns:115]: mDNS:
[20:23:14][C][mdns:116]:   Hostname: test
[20:23:14][C][ota:096]: Over-The-Air Updates:
[20:23:14][C][ota:097]:   Address: 192.168.1.19:3232
[20:23:14][C][ota:100]:   Using Password.
[20:23:14][C][ota:103]:   OTA version: 2.
[20:23:14][C][api:139]: API Server:
[20:23:14][C][api:140]:   Address: 192.168.1.19:6053
[20:23:14][C][api:142]:   Using noise encryption: YES
[20:25:13][D][switch:012]: 'Use wake word' Turning ON.
[20:25:13][D][switch:055]: 'Use wake word': Sending state ON
[20:25:13][D][voice_assistant:416]: State changed from IDLE to START_PIPELINE
[20:25:13][D][voice_assistant:422]: Desired state set to START_MICROPHONE
[20:25:13][D][voice_assistant:118]: microphone not running
[20:25:13][D][voice_assistant:202]: Requesting start...
[20:25:13][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[20:25:13][D][voice_assistant:437]: Client started, streaming microphone
[20:25:13][D][voice_assistant:416]: State changed from STARTING_PIPELINE to START_MICROPHONE
[20:25:13][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[20:25:13][D][voice_assistant:155]: Starting Microphone
[20:25:13][D][voice_assistant:416]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[20:25:13][D][voice_assistant:523]: Event Type: 1
[20:25:13][D][voice_assistant:526]: Assist Pipeline running
[20:25:13][D][voice_assistant:416]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[20:25:13][D][voice_assistant:523]: Event Type: 9
[20:25:15][D][switch:016]: 'Use wake word' Turning OFF.
[20:25:15][D][switch:055]: 'Use wake word': Sending state OFF
[20:25:15][D][voice_assistant:516]: Signaling stop...
[20:25:15][D][voice_assistant:416]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[20:25:15][D][voice_assistant:422]: Desired state set to IDLE
[20:25:15][D][voice_assistant:416]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[20:25:15][D][voice_assistant:523]: Event Type: 0
[20:25:15][E][voice_assistant:653]: Error: no_wake_word - No wake word detected
[20:25:15][D][voice_assistant:516]: Signaling stop...
[20:25:15][D][voice_assistant:416]: State changed from STOPPING_MICROPHONE to STOP_MICROPHONE
[20:25:15][D][voice_assistant:422]: Desired state set to IDLE
[20:25:15][D][voice_assistant:416]: State changed from STOP_MICROPHONE to IDLE
[20:25:15][D][voice_assistant:523]: Event Type: 2
[20:25:15][D][voice_assistant:613]: Assist Pipeline ended
[20:25:21][D][switch:012]: 'Use wake word' Turning ON.
[20:25:21][D][switch:055]: 'Use wake word': Sending state ON
[20:25:21][D][voice_assistant:416]: State changed from IDLE to START_PIPELINE
[20:25:21][D][voice_assistant:422]: Desired state set to START_MICROPHONE
[20:25:21][D][voice_assistant:118]: microphone not running
[20:25:21][D][voice_assistant:202]: Requesting start...
[20:25:21][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[20:25:21][D][voice_assistant:437]: Client started, streaming microphone
[20:25:21][D][voice_assistant:416]: State changed from STARTING_PIPELINE to START_MICROPHONE
[20:25:21][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[20:25:21][D][voice_assistant:155]: Starting Microphone
[20:25:21][D][voice_assistant:416]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[20:25:21][D][voice_assistant:523]: Event Type: 1
[20:25:21][D][voice_assistant:526]: Assist Pipeline running
[20:25:21][D][voice_assistant:416]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[20:25:21][D][voice_assistant:523]: Event Type: 9
[20:25:57][D][esp32.preferences:114]: Saving 1 preferences to flash...
[20:25:57][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[20:27:58][I][ota:117]: Boot seems successful, resetting boot loop counter.
[20:27:58][D][esp32.preferences:114]: Saving 1 preferences to flash...
[20:27:58][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
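Both devices report `DNS1: 0.0.0.0`, because `manual_ip` is set without any DNS server. This probably does not explain the silent STT stream, but it can matter for audio output if Home Assistant hands the device a TTS URL built on a hostname rather than an IP. A minimal sketch, assuming the router at 192.168.1.1 also answers DNS queries:

```yaml
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  manual_ip:
    static_ip: 192.168.1.19
    gateway: 192.168.1.1
    subnet: 255.255.255.0
    # Assumption: the router also acts as the DNS resolver
    dns1: 192.168.1.1
```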

And for the S3 with microWakeWord:

[22:22:32][C][wifi:577]: WiFi:
[22:22:32][C][wifi:409]:   Local MAC: 48:27:E2:D3:0D:A0
[22:22:32][C][wifi:414]:   SSID: [redacted]
[22:22:32][C][wifi:415]:   IP Address: 192.168.1.23
[22:22:32][C][wifi:417]:   BSSID: [redacted]
[22:22:32][C][wifi:418]:   Hostname: 'assistant-chambre'
[22:22:32][C][wifi:420]:   Signal strength: -39 dB ▂▄▆█
[22:22:32][C][wifi:424]:   Channel: 11
[22:22:32][C][wifi:425]:   Subnet: 255.255.255.0
[22:22:32][C][wifi:426]:   Gateway: 192.168.1.1
[22:22:32][C][wifi:427]:   DNS1: 0.0.0.0
[22:22:32][C][wifi:428]:   DNS2: 0.0.0.0
[22:22:32][D][light:036]: 'on board light' Setting:
[22:22:32][D][light:085]:   Transition length: 1.0s
[22:22:32][W][micro_wake_word:150]: Wake word is already running
[22:22:32][C][logger:447]: Logger:
[22:22:32][C][logger:448]:   Level: DEBUG
[22:22:32][C][logger:449]:   Log Baud Rate: 115200
[22:22:32][C][logger:451]:   Hardware UART: USB_SERIAL_JTAG
[22:22:32][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[22:22:32][C][esp32_rmt_led_strip:176]:   Pin: 11
[22:22:32][C][esp32_rmt_led_strip:177]:   Channel: 0
[22:22:32][C][esp32_rmt_led_strip:202]:   RGB Order: GRB
[22:22:32][C][esp32_rmt_led_strip:203]:   Max refresh rate: 0
[22:22:32][C][esp32_rmt_led_strip:204]:   Number of LEDs: 1

There isn't much information in the logs.
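To get more out of the logs, one option is to log each pipeline stage from the `voice_assistant` triggers. This is a hypothetical diagnostic sketch for the ESP32-S3 config, reusing its ids (`va`, `va_mic`, `va_spk`); the triggers would be merged into the existing `voice_assistant:` block rather than added as a second one. If "listening" appears but never an STT result or an error, the audio is probably not reaching Home Assistant.

```yaml
logger:
  level: VERBOSE                     # more detail than DEBUG

voice_assistant:
  id: va
  microphone: va_mic
  speaker: va_spk
  on_listening:
    - logger.log: "VA: listening"
  on_stt_end:
    - logger.log:
        format: "VA: STT text: %s"
        args: ['x.c_str()']          # x holds the recognized text
  on_tts_start:
    - logger.log:
        format: "VA: TTS text: %s"
        args: ['x.c_str()']          # x holds the text to be spoken
  on_error:
    - logger.log:
        format: "VA: error %s: %s"
        args: ['code.c_str()', 'message.c_str()']
  on_end:
    - logger.log: "VA: pipeline ended"
```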

Nicolas117 · March 20, 2024, 7:23

I added Porcupine and Vosk. Vosk is faster (about 0.3 s with the small model) but somewhat less accurate than Whisper.

As for my ESPs, unfortunately the results are the same: the wake word isn't detected, and nothing happens on the STT side.

INFO ESPHome 2024.2.2
INFO Reading configuration /config/test.yaml...
INFO Starting log output from 192.168.1.19 using esphome API
INFO Successfully connected to test @ 192.168.1.19 in 0.211s
INFO Successful handshake with test @ 192.168.1.19 in 0.099s
[08:02:47][I][app:102]: ESPHome version 2024.2.2 compiled on Mar 20 2024, 07:57:22
[08:02:47][C][wifi:577]: WiFi:
[08:02:47][C][wifi:409]:   Local MAC: A0:B7:65:63:24:7C
[08:02:47][C][wifi:414]:   SSID: [redacted]
[08:02:47][C][wifi:415]:   IP Address: 192.168.1.19
[08:02:47][C][wifi:417]:   BSSID: [redacted]
[08:02:47][C][wifi:418]:   Hostname: 'test'
[08:02:47][C][wifi:420]:   Signal strength: -57 dB ▂▄▆█
[08:02:47][C][wifi:424]:   Channel: 11
[08:02:47][C][wifi:425]:   Subnet: 255.255.255.0
[08:02:47][C][wifi:426]:   Gateway: 192.168.1.1
[08:02:47][C][wifi:427]:   DNS1: 0.0.0.0
[08:02:47][C][wifi:428]:   DNS2: 0.0.0.0
[08:02:47][C][logger:447]: Logger:
[08:02:47][C][logger:448]:   Level: DEBUG
[08:02:47][C][logger:449]:   Log Baud Rate: 115200
[08:02:47][C][logger:451]:   Hardware UART: UART0
[08:02:47][C][light:103]: Light 'Light'
[08:02:47][C][light:105]:   Default Transition Length: 0.5s
[08:02:47][C][light:106]:   Gamma Correct: 2.80
[08:02:47][C][template.switch:068]: Template Switch 'Use wake word'
[08:02:47][C][template.switch:091]:   Restore Mode: restore defaults to ON
[08:02:47][C][template.switch:057]:   Optimistic: YES
[08:02:47][C][status:034]: Status Binary Sensor 'API Connection'
[08:02:47][C][status:034]:   Device Class: 'connectivity'
[08:02:47][C][captive_portal:088]: Captive Portal:
[08:02:47][C][mdns:115]: mDNS:
[08:02:47][C][mdns:116]:   Hostname: test
[08:02:47][C][ota:096]: Over-The-Air Updates:
[08:02:47][C][ota:097]:   Address: 192.168.1.19:3232
[08:02:47][C][ota:100]:   Using Password.
[08:02:47][C][ota:103]:   OTA version: 2.
[08:02:47][C][api:139]: API Server:
[08:02:47][C][api:140]:   Address: 192.168.1.19:6053
[08:02:47][C][api:142]:   Using noise encryption: YES
[08:02:58][D][switch:012]: 'Use wake word' Turning ON.
[08:02:58][D][switch:055]: 'Use wake word': Sending state ON
[08:02:58][D][voice_assistant:416]: State changed from IDLE to START_PIPELINE
[08:02:58][D][voice_assistant:422]: Desired state set to START_MICROPHONE
[08:02:58][D][voice_assistant:118]: microphone not running
[08:02:58][D][voice_assistant:202]: Requesting start...
[08:02:58][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:02:58][D][voice_assistant:437]: Client started, streaming microphone
[08:02:58][D][voice_assistant:416]: State changed from STARTING_PIPELINE to START_MICROPHONE
[08:02:58][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:02:58][D][voice_assistant:155]: Starting Microphone
[08:02:58][D][voice_assistant:416]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[08:02:58][D][voice_assistant:523]: Event Type: 1
[08:02:58][D][voice_assistant:526]: Assist Pipeline running
[08:02:58][D][voice_assistant:416]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[08:02:58][D][voice_assistant:523]: Event Type: 9
[08:03:18][D][esp32.preferences:114]: Saving 1 preferences to flash...
[08:03:18][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed

When I enable debugging for the Assist pipeline in HA, with my ESP32-S3 running microWakeWord, the .wav file passed to Whisper or Vosk is only 44 bytes.
The same happens with the ESP32, both for the wake word with Porcupine/openWakeWord and when I turn off wake word detection with the switch.
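A 44-byte WAV file is essentially just the header. Assuming the canonical 44-byte PCM WAV header and 16 kHz, 16-bit, mono audio for the Assist pipeline recording (my assumption about the format), with file size $S$, sample rate $f_s$, channel count $c$ and bytes per sample $b$, the recorded duration is:

```latex
t = \frac{S - 44}{f_s \cdot c \cdot b}
  = \frac{44 - 44}{16000 \times 1 \times 2}
  = 0\ \text{s}
```

By comparison, 3 s of speech would give $S = 44 + 3 \times 32000 = 96044$ bytes. So Home Assistant opened the STT stream but apparently received no audio samples at all, which points at the audio path from the ESP to HA rather than at Whisper or Vosk themselves.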

I really don't understand what's wrong with my configuration.