[Tracking issue] Monitoring #150

Open
opened 2024-10-10 18:32:02 +02:00 by rlahfa · 0 comments
Owner

Database

Move to the VictoriaMetrics database with long-term retention.

Agent

Move all agents to the VictoriaMetrics agent (vmagent). We also gain a new query language (MetricsQL), which is much better.

Exporters

Move to a style similar to the one in ForkOS, i.e. you can enable any exporter on any node and its metrics are automatically collected for you by the local agent, which then sends them to the VictoriaMetrics database.
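As a rough illustration of the "enable an exporter on a node and the local agent picks it up" idea, the sketch below derives a Prometheus-style scrape configuration from the list of exporters enabled on a node. vmagent accepts Prometheus-compatible scrape configs. The `EXPORTER_PORTS` table and the `scrape_config_for` helper are hypothetical, and the ports are only the exporters' conventional defaults, not necessarily what this infrastructure uses.

```python
import json

# Hypothetical table of exporter name -> listening port.
# These are conventional defaults and are assumptions, not values from this repo.
EXPORTER_PORTS = {
    "node": 9100,
    "nginx": 9113,
    "php-fpm": 9253,
}


def scrape_config_for(enabled_exporters):
    """Build a Prometheus-style scrape config for exporters enabled on the local node."""
    jobs = []
    for name in sorted(enabled_exporters):
        if name not in EXPORTER_PORTS:
            raise ValueError(f"unknown exporter: {name}")
        jobs.append(
            {
                "job_name": name,
                # The agent runs on the same node, so it scrapes localhost.
                "static_configs": [
                    {"targets": [f"127.0.0.1:{EXPORTER_PORTS[name]}"]}
                ],
            }
        )
    return {"scrape_configs": jobs}


if __name__ == "__main__":
    # YAML parsers generally accept JSON, so this output can serve as a config file.
    print(json.dumps(scrape_config_for(["nginx", "node"]), indent=2))
```

The generated file would be handed to the agent through its `-promscrape.config` flag, with `-remoteWrite.url` pointing at the database. In practice this would most likely be expressed in the repository's own declarative configuration rather than in a script.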

Dashboards

@rlahfa I need to add my own declarative dashboard system.

Alerting

VMAlert is good; we can even make use of it in CI via the integration testing system. I have never built that for my own infra yet, but it seems to be a must to be good™.
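To give an idea of what a CI check on alerting rules could look like, here is a minimal sketch that lints a Prometheus-style rules document, the format vmalert reads. It is much simpler than running VMAlert inside the integration testing system and is not a substitute for it. The `lint_rule_groups` helper, the `EXAMPLE` rules, and the "every alert needs a `severity` label" policy are all assumptions made for illustration.

```python
# Keys every alerting rule is expected to carry in this sketch.
REQUIRED_RULE_KEYS = {"alert", "expr"}


def lint_rule_groups(config):
    """Return a list of problems found in a Prometheus-style alerting rules document."""
    problems = []
    for group in config.get("groups", []):
        group_name = group.get("name")
        if not group_name:
            problems.append("group without a name")
            group_name = "<unnamed>"
        for index, rule in enumerate(group.get("rules", [])):
            missing = REQUIRED_RULE_KEYS - rule.keys()
            for key in sorted(missing):
                problems.append(f"{group_name}[{index}]: missing '{key}'")
            # Assumed house policy: every alert carries a severity label.
            if "severity" not in rule.get("labels", {}):
                problems.append(f"{group_name}[{index}]: no severity label")
    return problems


# Hypothetical rules document, already parsed into Python data.
EXAMPLE = {
    "groups": [
        {
            "name": "node",
            "rules": [
                {
                    "alert": "InstanceDown",
                    "expr": "up == 0",
                    "for": "5m",
                    "labels": {"severity": "critical"},
                },
                {"alert": "NoExpr"},
            ],
        }
    ]
}

if __name__ == "__main__":
    for problem in lint_rule_groups(EXAMPLE):
        print(problem)
```

A CI job could fail whenever the returned list is non-empty. Checking that the alerts actually fire would still need the integration testing system mentioned above.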

New exporters

  • Scaphandre for power monitoring per process & per VM via the QEMU feature
  • iDRAC exporter
  • php-fpm exporter
  • pve exporter for krz01
  • nginx exporter
  • cgroup exporter from Arian (I can provide some code for that)

New services

  • Add Tempo for OTEL collection: useful for our most advanced stuff, probably Nix-related, as Lix & Tvix will introduce more and more OTEL
  • Add Loki for log collection: useful for the ISP and for our AP, so that we can send our logs and forget about them
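For the "send our logs and forget about them" part, a minimal sketch of the JSON body that Loki's push endpoint (`POST /loki/api/v1/push`) expects is shown below. The `loki_push_payload` helper and the label values are hypothetical. A real deployment would more likely rely on an existing log shipper than on hand-written code.

```python
import json
import time


def loki_push_payload(labels, lines, now_ns=None):
    """Build the JSON body for Loki's push endpoint (POST /loki/api/v1/push)."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Each entry is [timestamp in nanoseconds as a string, log line].
    # Lines are spaced 1 ns apart so that their order is preserved.
    values = [[str(now_ns + offset), line] for offset, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


if __name__ == "__main__":
    # Hypothetical labels for an access point.
    body = loki_push_payload(
        {"job": "ap", "host": "ap01"},
        ["link up", "client joined"],
    )
    print(json.dumps(body, indent=2))
```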
Luj added this to the Monitoring project 2024-10-12 20:56:46 +02:00
Reference: DGNum/infrastructure#150