HAOS instability under Proxmox

Hello everyone,

I've been having instability problems since migrating my system to a dedicated machine (a Beelink U59 upgraded to 16 GB of RAM), where it runs as a VM under Proxmox. The installation dates from around September, and I don't think the problem comes from HA itself, since it doesn't change with updates.

The crashes are fairly regular (every 1 to 4 days, I'd say). HA is completely down (no access, no Alexa link, no automations, etc.) while Proxmox shows the VM as up (see screenshot below). The only fix I've found is a quick STOP/START of the VM to get everything back.

The logs don't go back as far as the last crash (the one before yesterday's screenshot), so I'm short on details… Is there a trick to recover that information, other than home-assistant.log and home-assistant.log.1?

Thanks in advance.

My configuration


version core-2022.12.8
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.10.7
os_name Linux
os_version 5.15.80
arch x86_64
timezone Europe/Paris
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 5000
Installed Version 1.29.0
Stage running
Available Repositories 1271
Downloaded Repositories 33
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 9.4
update_channel stable
supervisor_version supervisor-2022.12.1
agent_version 1.4.1
docker_version 20.10.19
disk_total 30.8 GB
disk_used 8.0 GB
healthy true
supported true
board ova
supervisor_api ok
version_api ok
installed_addons Samba share (10.0.0), Terminal & SSH (9.6.1), AppDaemon (0.11.0), Let’s Encrypt (4.12.7), Studio Code Server (5.5.1), Home Assistant Google Drive Backup (0.109.2), Mosquitto broker (6.1.3), Zigbee2MQTT (1.28.4-1)
Dashboards
dashboards 4
resources 25
views 9
mode storage
Recorder
oldest_recorder_run December 20, 2022 at 16:42
current_recorder_run December 30, 2022 at 07:29
estimated_db_size 52.69 MiB
database_engine sqlite
database_version 3.38.5
___

Hi

log.1 should contain the entries from before the last reboot (so from the time of the crash).

As for the setup, I also run HAOS under Proxmox without any trouble, with a much smaller configuration (1 CPU / 2 GB of RAM). So with 16 GB of RAM you should have plenty of headroom.


Also take a look at the graphs to see whether there is an overload at some point, caused by an automation for example.

If the VM is still up, you can look at its console through Proxmox before doing the stop/start. The "ha" command should tell you a few things. You can also read the logs that way, or through the Samba share you've installed.

I have a similar setup (HA Container in a Proxmox VM) and it's very solid. I update every 2 to 3 months and have no crashes in between.

Thanks for the info. I figured a properly done installation should be very stable on this kind of mini PC, but I'm struggling to find where this comes from… As for the load, you can see this morning's crash as well as the restart when I got home:

So my log.1 dates from 7 a.m. this morning, even though I didn't restart anything at that time.
In case it rings a bell:

log.1:

2022-12-30 05:35:04.637 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration ui_lovelace_minimalist which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:04.643 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration tapo which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:04.645 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration lovelace_gen which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:04.654 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration hacs which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:04.659 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration fontawesome which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:04.665 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration rte_ecowatt which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:04.675 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration xiaomi_cloud_map_extractor which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:04.677 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration alexa_media which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:04.678 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration browser_mod which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:04.679 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration entity_controller which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:04.680 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration garbage_collection which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 05:35:07.401 WARNING (Recorder) [homeassistant.components.recorder.util] The system could not validate that the sqlite3 database at //config/home-assistant_v2.db was shutdown cleanly
2022-12-30 05:35:07.624 WARNING (Recorder) [homeassistant.components.recorder.util] Ended unfinished session (id=572 from 2022-12-29 17:54:04.945960)
2022-12-30 05:35:22.752 ERROR (MainThread) [homeassistant.components.sensor] Error while setting up rte_ecowatt platform for sensor
Traceback (most recent call last):
  File "/config/custom_components/rte_ecowatt/__init__.py", line 187, in update_method
    raise UpdateFailed(
homeassistant.helpers.update_coordinator.UpdateFailed: Error communicating with RTE API: requests too frequent to RTE API

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 225, in _async_refresh
    self.data = await self._async_update_data()
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 181, in _async_update_data
    return await self.update_method()
  File "/config/custom_components/rte_ecowatt/__init__.py", line 206, in update_method
    raise UpdateFailed(f"Error communicating with API: {err}")
homeassistant.helpers.update_coordinator.UpdateFailed: Error communicating with API: Error communicating with RTE API: requests too frequent to RTE API

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 281, in _async_setup_platform
    await asyncio.shield(task)
  File "/config/custom_components/rte_ecowatt/sensor.py", line 98, in async_setup_entry
    await rte_coordinator.async_config_entry_first_refresh()
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 197, in async_config_entry_first_refresh
    raise ex
homeassistant.exceptions.ConfigEntryNotReady: Error communicating with API: Error communicating with RTE API: requests too frequent to RTE API
2022-12-30 06:14:03.192 WARNING (MainThread) [homeassistant.helpers.entity] Update of binary_sensor.vicare_dhw_charging_active is taking over 10 seconds
2022-12-30 06:14:03.408 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.vicare_outside_temperature is taking over 10 seconds
2022-12-30 06:14:05.367 ERROR (MainThread) [homeassistant.helpers.entity] Update for binary_sensor.vicare_dhw_charging_active fails
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 527, in async_update_ha_state
    await self.async_device_update()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 730, in async_device_update
    await task
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/src/homeassistant/homeassistant/components/vicare/binary_sensor.py", line 236, in update
    self._state = self.entity_description.value_getter(self._api)
  File "/usr/src/homeassistant/homeassistant/components/vicare/binary_sensor.py", line 82, in <lambda>
    value_getter=lambda api: api.getDomesticHotWaterChargingActive(),
  File "/usr/local/lib/python3.10/site-packages/PyViCare/PyViCareUtils.py", line 45, in feature_flag_wrapper
    return wrapper(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/PyViCare/PyViCareUtils.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/PyViCare/PyViCareDevice.py", line 164, in getDomesticHotWaterChargingActive
    return self.service.getProperty("heating.dhw.charging")["properties"]["active"]["value"]
  File "/usr/local/lib/python3.10/site-packages/PyViCare/PyViCareCachedService.py", line 24, in getProperty
    data = self.__get_or_update_cache()
  File "/usr/local/lib/python3.10/site-packages/PyViCare/PyViCareCachedService.py", line 42, in __get_or_update_cache
    data = self.fetch_all_features()
  File "/usr/local/lib/python3.10/site-packages/PyViCare/PyViCareService.py", line 64, in fetch_all_features
    return self.oauth_manager.get(url)
  File "/usr/local/lib/python3.10/site-packages/PyViCare/PyViCareAbstractOAuthManager.py", line 41, in get
    self.__handle_server_error(response)
  File "/usr/local/lib/python3.10/site-packages/PyViCare/PyViCareAbstractOAuthManager.py", line 60, in __handle_server_error
    raise PyViCareInternalServerError(response)
PyViCare.PyViCareUtils.PyViCareInternalServerError: (PyViCareInternalServerError(...), 'Request failed with status code 502 and message "DEVICE_COMMUNICATION_ERROR". ViCare ErrorId: req-65de31d4840043689a703626fa1ae42d')

log:

2022-12-30 07:29:48.911 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration ui_lovelace_minimalist which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:48.912 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration tapo which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:48.913 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration lovelace_gen which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:48.915 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration hacs which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:48.916 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration fontawesome which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:48.917 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration rte_ecowatt which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:48.918 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration xiaomi_cloud_map_extractor which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:48.919 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration alexa_media which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:48.920 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration browser_mod which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:48.922 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration entity_controller which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:48.923 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration garbage_collection which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 07:29:51.861 WARNING (Recorder) [homeassistant.components.recorder.util] The system could not validate that the sqlite3 database at //config/home-assistant_v2.db was shutdown cleanly
2022-12-30 07:29:52.011 WARNING (Recorder) [homeassistant.components.recorder.util] Ended unfinished session (id=573 from 2022-12-30 04:35:07.116559)
2022-12-30 07:30:11.948 WARNING (MainThread) [homeassistant.helpers.entity] Update of camera.aspirateur is taking over 10 seconds
2022-12-30 08:19:48.446 ERROR (MainThread) [homeassistant.components.alexa.state_report] Error when sending ChangeReport for light.sde_ampoule_wc to Alexa: INVALID_ACCESS_TOKEN_EXCEPTION: Access token is not valid.
2022-12-30 09:50:47.504 WARNING (MainThread) [homeassistant.helpers.entity] Update of binary_sensor.vicare_dhw_charging_active is taking over 10 seconds
2022-12-30 09:50:47.613 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.vicare_outside_temperature is taking over 10 seconds
2022-12-30 13:31:33.665 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration ui_lovelace_minimalist which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 13:31:33.669 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration tapo which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 13:31:33.671 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration lovelace_gen which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 13:31:33.672 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration hacs which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 13:31:33.673 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration fontawesome which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 13:31:33.673 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration rte_ecowatt which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 13:31:33.674 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration xiaomi_cloud_map_extractor which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 13:31:33.675 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration alexa_media which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 13:31:33.676 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration browser_mod which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 13:31:33.677 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration entity_controller which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2022-12-30 13:31:33.678 WARNING (SyncWorker_13) [homeassistant.loader] We found a custom integration garbage_collection which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant

I dug into this at one point: the Alexa error is a known one but doesn't cause any trouble, and on the ViCare side the link drops briefly from time to time, but it's better than nothing.

Anyway, I'm waiting for the next incident so I can quickly pull the log.1 file and the Proxmox logs.
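
While waiting, the two log excerpts can already be mined for restart times. Here is a minimal Python sketch, assuming the recorder keeps writing its "Ended unfinished session" warning in the format pasted above; the function name `unclean_restarts` is made up for illustration. Note that the "from" time in those lines looks to be one hour behind the local log timestamps (session 573 starts at 04:35:07 while log.1 starts at 05:35:04), which would fit UTC versus Europe/Paris, but that is an inference from these two excerpts only.

```python
import re
from datetime import datetime

# Matches the recorder warning written at startup after an unclean shutdown, e.g.
# "... WARNING (Recorder) [...] Ended unfinished session (id=572 from 2022-12-29 17:54:04.945960)"
UNFINISHED = re.compile(
    r"^(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) WARNING \(Recorder\) .*"
    r"Ended unfinished session \(id=(?P<id>\d+) from (?P<started>[\d\-]+ [\d:.]+)\)"
)

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def unclean_restarts(log_text):
    """Return (session_id, session_start, logged_at) for each unclean restart found.

    session_start is when the interrupted session began; logged_at is when the
    next start-up noticed it. Both are naive datetimes, exactly as written in the log.
    """
    found = []
    for line in log_text.splitlines():
        match = UNFINISHED.match(line)
        if match:
            started = datetime.strptime(match["started"], TIME_FORMAT)
            logged = datetime.strptime(match["logged"], TIME_FORMAT)
            found.append((int(match["id"]), started, logged))
    return found
```

To use it, read /config/home-assistant.log and /config/home-assistant.log.1 into strings (for example through the Samba share) and pass each one to `unclean_restarts`. It only lists start-ups that followed an unclean stop; it says nothing about why the previous session died.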

At one point I also had crashes of a VM under Proxmox, the Jeedom one, so I looked in every direction.
It turned out to be the drive: 2 or 3 bad clusters on my M.2 SSD. I replaced it without changing anything else in my configuration, and it has run without a crash since.
So it may be worth checking the Proxmox disk from a bootable Linux USB stick.

And you had a crash on the Jeedom side with no sign of it on the Proxmox side? I thought a hardware problem would be visible in Proxmox, but oh well…

Keeping a closer eye on it, I had another HA reboot this morning around 6 a.m., with no consequences (I added uptime to keep track). Nothing in the log around that time and no sign on the Proxmox side. So I have regular reboots and occasional crashes.
Is there a way to raise the log level, maybe, to see something more?
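
On the log-level question, one possible route is the `logger.set_level` service, called through Home Assistant's REST API. The Python sketch below only builds the request, assuming the logger integration is enabled in configuration.yaml and that you have created a long-lived access token. The host name, token, logger name and the function name `build_set_level_request` are placeholders, not values from this thread. More verbose HA logs may still show nothing if the whole VM freezes.

```python
import json
import urllib.request


def build_set_level_request(base_url, token, levels):
    """Build (but do not send) a POST to the logger.set_level service.

    levels maps logger names to level names,
    e.g. {"homeassistant.components.recorder": "debug"}.
    """
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/logger/set_level",
        data=json.dumps(levels).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. A level set this way should only last until the next restart, so it would have to be reapplied after each reboot.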

Hello,

I'm in the same situation as you: a Beelink U59 with Proxmox, and random crashes of the VM.

Many similar cases can be found, and the problem seems to affect this generation of processors under Proxmox.

For some people, moving to kernel 5.19.17-1-pve solved the problem, or at least reduced the number of crashes.
That's what I did: apt install pve-kernel-5.19
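
To check whether a host is already on a new enough kernel, a small comparison helper can do the job. This is only a sketch in Python: the function names and the (5, 19, 0) threshold are my own choices, taken from the version mentioned above. The string to test is the Proxmox host's kernel release (what `uname -r` prints there); the 5.15.80 in the HA system information earlier in the thread is most likely the HAOS guest's kernel, not the host's.

```python
import re


def kernel_tuple(release):
    """Turn a release string such as '5.19.17-1-pve' into (5, 19, 17)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch number (e.g. '6.1') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def at_least(release, minimum=(5, 19, 0)):
    """True if the release is the given minimum version or newer."""
    return kernel_tuple(release) >= minimum
```

For example, `at_least("5.19.17-1-pve")` is true and `at_least("5.15.80")` is false. After installing the package the host has to be rebooted before the new kernel is the one running.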

I've only had this mini PC for a week, so it's hard to be sure the problem is solved, but I have the impression that the crashes have stopped.

I also set up a watchdog at the VM level and it seems to work: at my last crash the VM simply rebooted (but was that before or after the kernel update? I can't remember).
Edit the VM configuration:

nano /etc/pve/qemu-server/[server_id].conf

and add the following line:

watchdog: model=i6300esb,action=reset

Then install the watchdog-dev add-on in HA.
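
For anyone who would rather script the config edit than do it in nano, here is a minimal Python sketch. It works on the file's text and assumes that snapshot sections, if present, start with a "[name]" header after the main section, so the line has to go before the first such header. The function name `ensure_watchdog` is hypothetical. Back up the .conf file first.

```python
WATCHDOG_LINE = "watchdog: model=i6300esb,action=reset"


def ensure_watchdog(conf_text, line=WATCHDOG_LINE):
    """Return conf_text with a watchdog entry in the main (first) section."""
    lines = conf_text.splitlines()
    # The main section ends at the first "[name]" header, if there is one.
    end = next(
        (i for i, text in enumerate(lines) if text.startswith("[")), len(lines)
    )
    if any(text.startswith("watchdog:") for text in lines[:end]):
        return conf_text  # already configured, leave the file untouched
    # Insert right after the last non-blank line of the main section.
    insert_at = end
    while insert_at > 0 and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, line)
    return "\n".join(lines) + "\n"
```

The function leaves an existing watchdog entry alone, so running it twice changes nothing. The VM most likely needs a full stop/start from Proxmox, not just a reboot from inside the guest, before the new virtual device appears.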

I also set the CPU governor to powersave, following some recommendations from the Proxmox forum; for that I used tteck's script (see Proxmox CPU Scaling Governor).

That's it. I don't have enough hindsight to know whether the problem is fixed, but know that you're not the only one affected, and I hope I've given you some leads to follow.
Unfortunately the forum won't let me post links, but Google should help you.

Thanks for the info.

I had already set up the CPU governor, but I've now updated the kernel to see how it goes. All that's left is to cross my fingers for better stability!

Hi,

Same problem here; I moved to kernel 6.1 and have had no crash for 4 days. :pray:

OK, I'll keep that as a fallback, but for now it seems stable on 5.19, so I'll keep testing. If it could cut down on the "that thing of yours doesn't work" complaints, that would be a good first step!

No, I didn't see anything in the Proxmox logs.
But I'm no expert on the subject either, so I may have missed something.
When I checked the disk's health, though, it wasn't at 100%, and it had a few TB of data written, which I find alarming.
Today I've recycled the disk as a USB drive holding unimportant stuff.

Hi all,

For SMART problems with Proxmox,
there's the video by tonton jo.

In addition to kernel 5.19 (and the scaling governor), I installed intel-microcode as suggested on the Proxmox forum. So far I'm at 4 days of uptime.

Hello,

Same problem for me with a Beelink U59 running Proxmox and an HA VM: random crashes at night…

Does this kind of log mean anything to you?

If you're on the original kernel, you should try the solutions described by balloo; they seem to fix these crashes.