# Pi Monitoring Usage Guide
Step-by-step runbook for setting up a brand-new Raspberry Pi as your monitoring and container admin node.
## 1) Prepare Raspberry Pi OS
Recommended image: Raspberry Pi OS Lite 64-bit (Bookworm).
In Raspberry Pi Imager advanced options:
- set hostname (example: `pi-ops`)
- enable SSH
- configure SSH key auth
- set username/password
- set timezone/locale
Use wired Ethernet and an SSD for persistent data.
## 2) Bootstrap host
SSH to the Pi:
```bash
ssh <user>@<pi-ip>
cd /path/to/gitea-migration/setup/pi-monitoring
./bootstrap_pi.sh --timezone=America/New_York --yes
```
What this does:
- updates OS packages
- installs hardening tools (`ufw`, `fail2ban`, unattended upgrades)
- installs Docker Engine + Compose plugin
- configures Docker daemon log rotation and live-restore
- opens firewall ports for monitoring stack
If this is the first Docker install, log out and log back in once so your new `docker` group membership takes effect.
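To confirm that the re-login took effect, a small check like the following may help. It is a minimal sketch: `session_has_group` is a hypothetical helper name, not part of the scripts in this directory, and it assumes the bootstrap added your user to the `docker` group.

```bash
# Succeed if the current login session already carries the given group.
# `id -nG` with no user argument reports the groups of the running shell,
# which is what changes after logging out and back in.
session_has_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}
```

For example, `session_has_group docker && docker compose version` should print the Compose plugin version without `sudo` once the new session is active.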
## 3) Mount SSD
Identify disk/partition:
```bash
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
```
Mount it at `/srv/ops` (example partition `/dev/sda1`):
```bash
./mount_ssd.sh --device=/dev/sda1 --mount-point=/srv/ops --yes
```
This creates persistent directories:
- `/srv/ops/portainer/data`
- `/srv/ops/grafana/data`
- `/srv/ops/prometheus/data`
- `/srv/ops/prometheus/targets`
- `/srv/ops/uptime-kuma/data`
- `/srv/ops/backups`
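Before moving on, it can be worth confirming that the directories listed above exist. The function below is a sketch; `check_ops_dirs` is a hypothetical helper, and the directory list is copied from this guide rather than read from `mount_ssd.sh`.

```bash
# Report any expected persistent directory that is missing under a root.
# Returns non-zero if at least one directory is absent.
check_ops_dirs() {
  local root="${1:-/srv/ops}" missing=0 sub
  for sub in portainer/data grafana/data prometheus/data \
             prometheus/targets uptime-kuma/data backups; do
    if [ ! -d "$root/$sub" ]; then
      echo "missing: $root/$sub"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `findmnt /srv/ops` first shows whether the SSD is actually mounted there; `check_ops_dirs /srv/ops` then prints nothing when every directory is in place.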
## 4) Configure stack env
```bash
cp stack.env.example stack.env
```
Edit `stack.env` and set at minimum:
- `OPS_ROOT` (usually `/srv/ops`)
- `GRAFANA_ADMIN_PASSWORD` (strong value)
- any non-default ports if needed
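A quick way to catch a forgotten value before deploying is sketched below. `check_stack_env` is a hypothetical helper, and it assumes `stack.env` holds plain `KEY=value` lines without quoting; it only tests that the two variables named above are non-empty.

```bash
# Report required keys that are missing or empty in an env file.
# Assumes plain KEY=value lines (no quotes, no "export" prefix).
check_stack_env() {
  local file="${1:-stack.env}" key rc=0
  for key in OPS_ROOT GRAFANA_ADMIN_PASSWORD; do
    # Match "KEY=" followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "unset or empty: $key"
      rc=1
    fi
  done
  return "$rc"
}
```

If you need a strong value for `GRAFANA_ADMIN_PASSWORD`, `openssl rand -base64 24` is one way to generate it.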
## 5) Deploy stack
```bash
./deploy_stack.sh --yes
./status.sh
```
Expected endpoints:
- Portainer: `https://<pi-ip>:9443`
- Grafana: `http://<pi-ip>:3000`
- Prometheus: `http://<pi-ip>:9090`
- Uptime Kuma: `http://<pi-ip>:3001`
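The endpoints can also be probed from a terminal. This is a sketch that assumes the default ports listed above; `stack_urls` and `probe_stack` are hypothetical helper names, and `-k` is passed to `curl` in case Portainer is serving a self-signed certificate.

```bash
# Print the four endpoint URLs from this guide for a given host.
stack_urls() {
  local host="$1"
  printf '%s\n' \
    "https://$host:9443" \
    "http://$host:3000" \
    "http://$host:9090" \
    "http://$host:3001"
}

# Probe each URL and print the HTTP status code curl receives.
# A code of 000 means curl got no HTTP response within 5 seconds.
probe_stack() {
  local url code
  for url in $(stack_urls "$1"); do
    code="$(curl -ks -o /dev/null -m 5 -w '%{http_code}' "$url")" || true
    echo "$url -> ${code:-000}"
  done
}
```

Call it as `probe_stack <pi-ip>`. Any 2xx or 3xx code suggests the service is answering; redirects to a login page are normal.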
## 6) Add Fedora + Unraid into a single Portainer view
Install Portainer Agent on each remote Docker host.
From your admin machine (or from the Pi, if it can reach the hosts over SSH):
```bash
./install_portainer_agent_remote.sh --host=<unraid-ip> --user=<unraid-user> --port=<unraid-ssh-port> --yes
./install_portainer_agent_remote.sh --host=<fedora-ip> --user=<fedora-user> --port=<fedora-ssh-port> --yes
```
Then in Portainer UI:
1. `Environments` -> `Add environment`
2. Choose `Docker Standalone`
3. Endpoint URL examples:
   - `tcp://<unraid-ip>:9001`
   - `tcp://<fedora-ip>:9001`
## 7) Add Prometheus targets for Fedora/Unraid node metrics
`deploy_stack.sh` creates `/srv/ops/prometheus/targets/external.yml` from a template.
Edit that file to point to remote node-exporter targets:
```yaml
- labels:
    job: unraid-node
  targets:
    - 192.168.1.82:9100
- labels:
    job: fedora-node
  targets:
    - 192.168.1.90:9100
```
Reload Prometheus config:
```bash
docker compose --env-file stack.env -f docker-compose.yml exec prometheus \
  wget -qO- --post-data='' http://127.0.0.1:9090/-/reload
```
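After the reload, Prometheus's HTTP API can show whether the new targets are being scraped. The function below is a rough sketch: `count_target_health` is a hypothetical helper that counts `health` values in the JSON returned by `/api/v1/targets`, and it relies on that JSON being compact (no spaces around the colon). If `jq` is installed, it would be a more robust choice.

```bash
# Count targets by health state from /api/v1/targets JSON on stdin.
# Matches compact pairs such as "health":"up" and tallies each value.
count_target_health() {
  grep -o '"health":"[a-z]*"' | sort | uniq -c
}
```

Usage: `curl -s http://<pi-ip>:9090/api/v1/targets | count_target_health`. A non-zero count for `"health":"down"` points at a target worth checking in the Troubleshooting section.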
## 8) Day-2 operations
Upgrade stack:
```bash
./upgrade_stack.sh --yes
```
Upgrade and prune old dangling images:
```bash
./upgrade_stack.sh --prune --yes
```
Backup stack:
```bash
./backup_stack.sh --retention-days=14 --yes
```
Restore stack:
```bash
./restore_stack.sh --archive=/srv/ops/backups/pi-monitoring-<timestamp>.tar.gz --yes
./deploy_stack.sh --yes
```
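To restore the most recent backup without typing the timestamp, something like this could be used. It is a sketch: `latest_backup` is a hypothetical helper, and it assumes archives follow the `pi-monitoring-<timestamp>.tar.gz` naming shown above and that file modification times reflect backup order.

```bash
# Print the newest pi-monitoring archive in a backup directory, if any.
latest_backup() {
  local dir="${1:-/srv/ops/backups}"
  # ls -t sorts by modification time, newest first.
  ls -1t "$dir"/pi-monitoring-*.tar.gz 2>/dev/null | head -n 1
}
```

For example, `./restore_stack.sh --archive="$(latest_backup)" --yes`. Listing the archive first with `tar -tzf "$(latest_backup)"` is a cheap way to confirm it is readable.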
Stop stack:
```bash
./teardown_stack.sh --yes
```
Stop and delete persistent data:
```bash
./teardown_stack.sh --remove-data --yes
```
## 9) Recommended hardening
- Keep all services on LAN/VPN only; avoid WAN exposure.
- Use unique strong admin passwords for Portainer/Grafana.
- Keep `stack.env` readable only by admin user (`chmod 600 stack.env`).
- Back up `/srv/ops/backups` to another host/NAS.
- Regularly patch OS + container images.
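The `stack.env` permission recommendation can be checked mechanically. This is a minimal sketch; `env_is_private` is a hypothetical helper and it uses GNU `stat`, which Raspberry Pi OS provides.

```bash
# Succeed only if the file's mode is exactly 600 (owner read/write only).
env_is_private() {
  local file="${1:-stack.env}"
  [ "$(stat -c '%a' "$file")" = "600" ]
}
```

Usage: `env_is_private stack.env || chmod 600 stack.env`.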
## 10) Troubleshooting
Docker permission denied:
- re-login after `usermod -aG docker <user>`
Grafana container restart loop:
- check permissions on `${OPS_ROOT}/grafana/data` (UID/GID 472)
Prometheus not scraping remote hosts:
- verify remote exporter reachable: `curl http://<host>:9100/metrics`
- verify entries in `/srv/ops/prometheus/targets/external.yml`
Portainer cannot connect to endpoint:
- verify agent running on remote host: `docker ps | grep portainer_agent`
- check firewall for TCP `9001`
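For the reachability checks above, a plain TCP probe can separate firewall problems from application problems. The sketch below uses bash's `/dev/tcp` redirection together with `timeout` from coreutils; `port_open` is a hypothetical helper name.

```bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# A non-zero status means the port is closed, filtered, or unreachable.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

Usage: `port_open <host> 9001 && echo "agent port reachable"`, or the same with port `9100` for node-exporter. If the probe fails from the Pi but succeeds on the remote host itself, a firewall between them is the likely cause.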