include monitoring documentation in manual
|
|
@ -1,118 +0,0 @@
|
|||
# Monitoring Block
|
||||
|
||||
This block sets up the monitoring stack for Self Host Blocks. It is composed of:
|
||||
|
||||
- Grafana as the dashboard frontend.
|
||||
- Prometheus as the database for metrics.
|
||||
- Loki as the database for logs.
|
||||
|
||||
## Configuration
|
||||
|
||||
```nix
|
||||
shb.monitoring = {
|
||||
enable = true;
|
||||
subdomain = "grafana";
|
||||
inherit domain;
|
||||
contactPoints = [ "me@example.com" ];
|
||||
adminPasswordFile = config.sops.secrets."monitoring/admin_password".path;
|
||||
secretKeyFile = config.sops.secrets."monitoring/secret_key".path;
|
||||
};
|
||||
|
||||
sops.secrets."monitoring/admin_password" = {
|
||||
sopsFile = ./secrets.yaml;
|
||||
mode = "0400";
|
||||
owner = "grafana";
|
||||
group = "grafana";
|
||||
restartUnits = [ "grafana.service" ];
|
||||
};
|
||||
sops.secrets."monitoring/secret_key" = {
|
||||
sopsFile = ./secrets.yaml;
|
||||
mode = "0400";
|
||||
owner = "grafana";
|
||||
group = "grafana";
|
||||
restartUnits = [ "grafana.service" ];
|
||||
};
|
||||
```
|
||||
|
||||
With that, Grafana, Prometheus, Loki and Promtail are set up! You can access `Grafana` at
|
||||
`grafana.example.com` with user `admin` and the password stored in the file referenced by `adminPasswordFile`.
|
||||
|
||||
I recommend adding an SMTP server configuration so you receive alerts by email:
|
||||
|
||||
```nix
|
||||
shb.monitoring.smtp = {
|
||||
from_address = "grafana@example.com";
|
||||
from_name = "Grafana";
|
||||
host = "smtp.mailgun.org";
|
||||
port = 587;
|
||||
username = "postmaster@mg.example.com";
|
||||
passwordFile = config.sops.secrets."monitoring/smtp".path;
|
||||
};
|
||||
|
||||
sops.secrets."monitoring/smtp" = {
|
||||
sopsFile = ./secrets.yaml;
|
||||
mode = "0400";
|
||||
owner = "grafana";
|
||||
group = "grafana";
|
||||
restartUnits = [ "grafana.service" ];
|
||||
};
|
||||
```
|
||||
|
||||
Since all logs are now stored in Loki, you can probably reduce the systemd journal retention
|
||||
time with:
|
||||
|
||||
```nix
|
||||
# See https://www.freedesktop.org/software/systemd/man/journald.conf.html#SystemMaxUse=
|
||||
services.journald.extraConfig = ''
|
||||
SystemMaxUse=2G
|
||||
SystemKeepFree=4G
|
||||
SystemMaxFileSize=100M
|
||||
MaxFileSec=day
|
||||
'';
|
||||
```
|
||||
|
||||
## Provisioning
|
||||
|
||||
Self Host Blocks automatically creates the following resources:
|
||||
|
||||
- For Grafana:
|
||||
- datasources
|
||||
- dashboards
|
||||
- contact points
|
||||
- notification policies
|
||||
- alerts
|
||||
- For Prometheus, the following exporters and related scrapers:
|
||||
- node
|
||||
- smartctl
|
||||
- nginx
|
||||
- For Loki, the following exporters and related scrapers:
|
||||
- systemd
|
||||
|
||||
Those resources are namespaced as appropriate under the Self Host Blocks namespace:
|
||||
|
||||

|
||||
|
||||
## Errors Dashboard
|
||||
|
||||
This dashboard is meant to be the first stop to understand why a service is misbehaving.
|
||||
|
||||

|
||||

|
||||
|
||||
The yellow and red dashed vertical bars correspond to the [Requests Error Budget
|
||||
Alert](#requests-error-budget-alert) firing.
|
||||
|
||||
## Performance Dashboard
|
||||
|
||||
This dashboard is meant to be the first stop to understand why a service is performing poorly.
|
||||
|
||||

|
||||

|
||||
|
||||
## Requests Error Budget Alert
|
||||
|
||||
This alert fires when the ratio of requests receiving a 5XX response from a service to the total
number of requests to that service exceeds 1%.
|
||||
|
||||

|
||||

|
||||
7
docs/blocks/monitoring/alerts-requests-error-budger.md
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# Requests Error Budget Alert {#blocks-monitoring-budget-alerts}
|
||||
|
||||
This alert fires when the ratio of requests receiving a 5XX response from a service to the total
number of requests to that service exceeds 1%.
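
As a rough illustration of the 1% threshold, the check can be sketched as a small Nix expression. The names `errorBudget`, `exceedsBudget`, `errors5xx` and `total` are hypothetical and not part of Self Host Blocks; the actual alert is evaluated by Grafana, not by Nix, and the request counts below are made up.

```nix
let
  # Threshold stated above: 1% of requests.
  errorBudget = 0.01;

  # Hypothetical helper: true when the share of 5XX responses exceeds the budget.
  # Multiplying by 1.0 forces floating-point division.
  exceedsBudget = { errors5xx, total }:
    total > 0 && (errors5xx * 1.0 / total) > errorBudget;
in
{
  quiet = exceedsBudget { errors5xx = 5; total = 1000; };   # 0.5% of requests
  firing = exceedsBudget { errors5xx = 12; total = 1000; }; # 1.2% of requests
}
```

Saved to a file and evaluated with `nix-instantiate --eval --strict`, this yields `quiet = false` and `firing = true`.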
|
||||
|
||||

|
||||

|
||||
|
64
docs/blocks/monitoring/configuration.md
Normal file
|
|
@ -0,0 +1,64 @@
|
|||
# Configuration {#blocks-monitoring-configuration}
|
||||
|
||||
```nix
shb.monitoring = {
  enable = true;
  subdomain = "grafana";
  inherit domain;
  contactPoints = [ "me@example.com" ];
  adminPasswordFile = config.sops.secrets."monitoring/admin_password".path;
  secretKeyFile = config.sops.secrets."monitoring/secret_key".path;
};

sops.secrets."monitoring/admin_password" = {
  sopsFile = ./secrets.yaml;
  mode = "0400";
  owner = "grafana";
  group = "grafana";
  restartUnits = [ "grafana.service" ];
};
sops.secrets."monitoring/secret_key" = {
  sopsFile = ./secrets.yaml;
  mode = "0400";
  owner = "grafana";
  group = "grafana";
  restartUnits = [ "grafana.service" ];
};
```
|
||||
|
||||
With that, Grafana, Prometheus, Loki and Promtail are set up! You can access `Grafana` at
|
||||
`grafana.example.com` with user `admin` and the password stored in the file referenced by `adminPasswordFile`.
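
The configuration above uses `inherit domain;`, which only works if a `domain` binding is in scope. A minimal sketch of one way to provide it, assuming the base domain is `example.com` and the snippet lives in its own NixOS module (the remaining options from the configuration above are omitted here and still need to be set):

```nix
{ config, ... }:
let
  # Assumption: the base domain under which services are exposed.
  domain = "example.com";
in
{
  shb.monitoring = {
    enable = true;
    # Grafana is then served at grafana.example.com.
    subdomain = "grafana";
    # Shorthand for `domain = domain;`, taking the value from the `let` above.
    inherit domain;
  };
}
```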
|
||||
|
||||
I recommend adding an SMTP server configuration so you receive alerts by email:
|
||||
|
||||
```nix
shb.monitoring.smtp = {
  from_address = "grafana@example.com";
  from_name = "Grafana";
  host = "smtp.mailgun.org";
  port = 587;
  username = "postmaster@mg.example.com";
  passwordFile = config.sops.secrets."monitoring/smtp".path;
};

# The secret name must match the one referenced by passwordFile above.
sops.secrets."monitoring/smtp" = {
  sopsFile = ./secrets.yaml;
  mode = "0400";
  owner = "grafana";
  group = "grafana";
  restartUnits = [ "grafana.service" ];
};
```
|
||||
|
||||
Since all logs are now stored in Loki, you can probably reduce the systemd journal retention
|
||||
time with:
|
||||
|
||||
```nix
# See https://www.freedesktop.org/software/systemd/man/journald.conf.html#SystemMaxUse=
services.journald.extraConfig = ''
  SystemMaxUse=2G
  SystemKeepFree=4G
  SystemMaxFileSize=100M
  MaxFileSec=day
'';
```
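
The settings above cap the journal by disk usage and rotate journal files daily. If a time-based limit is preferred, journald also provides `MaxRetentionSec=`. A minimal sketch, where the one-week value is an arbitrary assumption and not a recommendation from Self Host Blocks:

```nix
services.journald.extraConfig = ''
  # Assumption: delete journal entries older than one week.
  MaxRetentionSec=1week
'';
```

When both sets of limits are wanted in the same module, add the `MaxRetentionSec=` line to the existing `extraConfig` string rather than defining the option a second time.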
|
||||
9
docs/blocks/monitoring/dashboard-errors.md
Normal file
|
|
@ -0,0 +1,9 @@
|
|||
# Errors Dashboard {#blocks-monitoring-error-dashboard}
|
||||
|
||||
This dashboard is meant to be the first stop to understand why a service is misbehaving.
|
||||
|
||||

|
||||

|
||||
|
||||
The yellow and red dashed vertical bars correspond to the [Requests Error Budget
|
||||
Alert](#blocks-monitoring-budget-alerts) firing.
|
||||
6
docs/blocks/monitoring/dashboard-performance.md
Normal file
|
|
@ -0,0 +1,6 @@
|
|||
# Performance Dashboard {#blocks-monitoring-performance-dashboard}
|
||||
|
||||
This dashboard is meant to be the first stop to understand why a service is performing poorly.
|
||||
|
||||

|
||||

|
||||
17
docs/blocks/monitoring/default.md
Normal file
|
|
@ -0,0 +1,17 @@
|
|||
# Monitoring Block {#blocks-monitoring}
|
||||
|
||||
Defined in [`/modules/blocks/monitoring.nix`](@REPO@/modules/blocks/monitoring.nix).
|
||||
|
||||
This block sets up the monitoring stack for Self Host Blocks. It is composed of:
|
||||
|
||||
- Grafana as the dashboard frontend.
|
||||
- Prometheus as the database for metrics.
|
||||
- Loki as the database for logs.
|
||||
|
||||
```{=include=} parts
|
||||
configuration.md
|
||||
provisioning.md
|
||||
dashboard-errors.md
|
||||
dashboard-performance.md
|
||||
alerts-requests-error-budger.md
|
||||
```
|
||||
20
docs/blocks/monitoring/provisioning.md
Normal file
|
|
@ -0,0 +1,20 @@
|
|||
# Provisioning {#blocks-monitoring-provisioning}
|
||||
|
||||
Self Host Blocks automatically creates the following resources:
|
||||
|
||||
- For Grafana:
  - datasources
  - dashboards
  - contact points
  - notification policies
  - alerts
- For Prometheus, the following exporters and related scrapers:
  - node
  - smartctl
  - nginx
- For Loki, the following exporters and related scrapers:
  - systemd
|
||||
|
||||
Those resources are namespaced as appropriate under the Self Host Blocks namespace:
|
||||
|
||||
[](./assets/folder.png)
|
||||
|
|
@ -7,6 +7,10 @@
|
|||
preface.md
|
||||
```
|
||||
|
||||
```{=include=} parts html:into-file=//blocks-monitoring.html
|
||||
blocks/monitoring/default.md
|
||||
```
|
||||
|
||||
```{=include=} appendix html:into-file=//options.html
|
||||
options.md
|
||||
```
|
||||
|
|
|
|||