add documentation about monitoring
BIN
docs/assets/monitoring_grafana_alert_rules_5xx_1.png
Normal file
|
After Width: | Height: | Size: 233 KiB |
BIN
docs/assets/monitoring_grafana_alert_rules_5xx_2.png
Normal file
|
After Width: | Height: | Size: 246 KiB |
BIN
docs/assets/monitoring_grafana_dashboards_Errors_1.png
Normal file
|
After Width: | Height: | Size: 444 KiB |
BIN
docs/assets/monitoring_grafana_dashboards_Errors_2.png
Normal file
|
After Width: | Height: | Size: 441 KiB |
BIN
docs/assets/monitoring_grafana_dashboards_Performance_1.png
Normal file
|
After Width: | Height: | Size: 628 KiB |
BIN
docs/assets/monitoring_grafana_dashboards_Performance_2.png
Normal file
|
After Width: | Height: | Size: 267 KiB |
BIN
docs/assets/monitoring_grafana_folder.png
Normal file
|
After Width: | Height: | Size: 48 KiB |
53
docs/blocks/monitoring.md
Normal file
|
|
@ -0,0 +1,53 @@
|
|||
# Monitoring Block
|
||||
|
||||
This block sets up the monitoring stack for Self Host Blocks. It is composed of:
|
||||
|
||||
- Grafana as the dashboard frontend.
|
||||
- Prometheus as the database for metrics.
|
||||
- Loki as the database for logs.
|
||||
|
||||
## Provisioning
|
||||
|
||||
Self Host Blocks will create automatically the following resources:
|
||||
|
||||
- For Grafana:
|
||||
- datasources
|
||||
- dashboards
|
||||
- contact points
|
||||
- notification policies
|
||||
- alerts
|
||||
- For Prometheus, the following exporters and related scrapers:
|
||||
- node
|
||||
- smartctl
|
||||
- nginx
|
||||
- For Loki, the following exporters and related scrapers:
|
||||
- systemd
|
||||
|
||||
Those resources are namespaced as appropriate under the Self Host Blocks namespace:
|
||||
|
||||

|
||||
|
||||
## Errors Dashboard
|
||||
|
||||
This dashboard is meant to be the first stop to understand why a service is misbehaving.
|
||||
|
||||

|
||||

|
||||
|
||||
The yellow and red dashed vertical bars correspond to the [Requests Error Budget
|
||||
Alert](#requests-error-budget-alert) firing.
|
||||
|
||||
## Performance Dashboard
|
||||
|
||||
This dashboard is meant to be the first stop to understand why a service is performing poorly.
|
||||
|
||||

|
||||

|
||||
|
||||
## Requests Error Budget Alert
|
||||
|
||||
This alert will fire when the ratio between number of requests getting a 5XX response from a service
|
||||
and the total requests to that service exceeds 1%.
|
||||
|
||||

|
||||

|
||||