feat: add runner conversion scripts and strengthen cutover automation
416
runners-conversion/augur/README.md
Normal file
@@ -0,0 +1,416 @@
# Self-Hosted GitHub Actions Runner (Docker)

Run GitHub Actions CI on your own Linux server instead of GitHub-hosted runners.
This removes the CPU burden from your laptop, avoids runner-minute quotas, and gives faster feedback.

## How It Works

Each runner container:
1. Starts up and generates a short-lived registration token from your GitHub PAT
2. Registers with GitHub in **ephemeral mode** (one job per lifecycle)
3. Picks up a CI job, executes it, and exits
4. Is brought back by Docker's `restart: unless-stopped` policy for the next job

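The lifecycle above can be sketched as a small entrypoint. This is a hypothetical illustration, not the repository's actual `entrypoint.sh` (which is not shown here): the function names `registration_token_url` and `run_once` are made up, and it assumes the runner agent's `config.sh` and `run.sh` are in the working directory and that `GITHUB_PAT`, `REPO_URL`, and `RUNNER_NAME` are set. The `sed` extraction of the token is a shortcut; `jq` would be more robust if it is available in the image.

```bash
# Hypothetical sketch of the runner lifecycle; not the project's entrypoint.sh.

# Turn https://github.com/OWNER/REPO into the registration-token API endpoint.
registration_token_url() {
  local slug="${1#https://github.com/}"  # strip the host prefix -> OWNER/REPO
  slug="${slug%/}"                       # tolerate a trailing slash
  slug="${slug%.git}"                    # tolerate a trailing .git
  printf 'https://api.github.com/repos/%s/actions/runners/registration-token\n' "$slug"
}

# Steps 1-3: mint a token, register in ephemeral mode, run exactly one job.
run_once() {
  local token
  token="$(curl -fsS -X POST \
    -H "Authorization: Bearer ${GITHUB_PAT}" \
    -H "Accept: application/vnd.github+json" \
    "$(registration_token_url "$REPO_URL")" \
    | sed -n 's/.*"token": *"\([^"]*\)".*/\1/p')" || return 1

  ./config.sh --unattended --ephemeral \
    --url "$REPO_URL" --token "$token" \
    --name "$RUNNER_NAME" \
    --labels "${RUNNER_LABELS:-self-hosted,Linux,X64}" || return 1

  # run.sh returns after the single job; the container then exits and
  # Docker's restart policy (step 4) starts a fresh one.
  ./run.sh
}
```

Because the PAT is only used for the token request, the runner agent itself sees just the short-lived registration token.
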
## Prerequisites

- Docker Engine 24+ and Docker Compose v2
- A GitHub Personal Access Token (classic) with **`repo`** and **`read:packages`** scopes
- Network access to `github.com`, `api.github.com`, and `ghcr.io`

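A preflight check for these prerequisites might look like the sketch below. The function names `major_at_least` and `preflight` are hypothetical; the Docker commands used (`docker version --format` and `docker compose version --short`) are standard CLI calls. Any HTTP response from a host is treated as "reachable", since the goal is only to confirm DNS, routing, and TLS.

```bash
# Succeeds when dotted version $1 has a major version of at least $2.
major_at_least() {
  local major="${1%%.*}"
  [ "$major" -ge "$2" ]
}

# Check Docker Engine, Docker Compose, and outbound connectivity.
preflight() {
  local engine compose host
  engine="$(docker version --format '{{.Server.Version}}')" || return 1
  major_at_least "$engine" 24 || { echo "Docker Engine $engine is older than 24" >&2; return 1; }

  compose="$(docker compose version --short)" || return 1
  major_at_least "${compose#v}" 2 || { echo "Docker Compose $compose is older than v2" >&2; return 1; }

  for host in github.com api.github.com ghcr.io; do
    # No -f flag: an HTTP error status still proves the host is reachable.
    curl -sS -o /dev/null --max-time 10 "https://$host" \
      || { echo "cannot reach $host" >&2; return 1; }
  done
  echo "preflight OK"
}
```

Run `preflight` on the target server before deploying; it does not validate the PAT.
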
## One-Time GitHub Setup

Before deploying, the repository needs write permissions for the image build workflow.

### Enable GHCR image builds

The `build-runner-image.yml` workflow pushes Docker images to GHCR using the
`GITHUB_TOKEN`. By default, this token is read-only and the workflow will fail
silently (zero steps executed, no runner assigned).

Fix by allowing write permissions for Actions workflows:

```bash
gh api -X PUT repos/OWNER/REPO/actions/permissions/workflow \
  -f default_workflow_permissions=write \
  -F can_approve_pull_request_reviews=false
```

Alternatively, keep read-only defaults and create a dedicated PAT secret with
`write:packages` scope, then reference it in the workflow instead of `GITHUB_TOKEN`.

### Build the runner image

Trigger the GHCR image build (first time and whenever Dockerfile/entrypoint changes):

```bash
gh workflow run build-runner-image.yml
```

Wait for the workflow to complete (~5 min):

```bash
gh run list --workflow=build-runner-image.yml --limit=1
```

The image is also rebuilt automatically:
- On push to `main` when `infra/runners/Dockerfile` or `entrypoint.sh` changes
- Weekly (Monday 06:00 UTC) to pick up OS patches and runner agent updates

## Deploy on Your Server

### Choose an image source

| Method | Files needed on server | Registry auth? | Best for |
|--------|------------------------|----------------|----------|
| **Self-hosted registry** | `docker-compose.yml`, `.env`, `envs/augur.env` | No (your network) | Production: push once, pull from any machine |
| **GHCR** | `docker-compose.yml`, `.env`, `envs/augur.env` | Yes (`docker login ghcr.io`) | GitHub-native workflow |
| **Build locally** | All 5 files (+ `Dockerfile`, `entrypoint.sh`) | No | Quick start, no registry needed |

### Option A: Self-hosted registry (recommended)

For the full end-to-end workflow (build image on your Mac, push to Unraid registry,
start runner), see the [CI Workflow Guide](../../docs/ci-workflows.md#lifecycle-2-offload-ci-to-a-server-unraid).

The private Docker registry is configured at `infra/registry/`. It listens on port 5000
and is accessible from the LAN. Docker treats `localhost` registries as insecure by default,
so no `daemon.json` changes are needed on the server. To push from another machine, add
`<UNRAID_IP>:5000` to `insecure-registries` in that machine's Docker daemon config.

### Option B: GHCR

Requires the `build-runner-image.yml` workflow to have run successfully
(see [One-Time GitHub Setup](#one-time-github-setup)).

```bash
# 1. Copy environment templates
cp .env.example .env
cp envs/augur.env.example envs/augur.env

# 2. Edit .env: set your GITHUB_PAT
# 3. Edit envs/augur.env: set REPO_URL, RUNNER_NAME, resource limits

# 4. Authenticate Docker with GHCR (one-time, persists to ~/.docker/config.json)
#    GITHUB_PAT must be set in this shell; Compose reads .env, your shell does not.
echo "$GITHUB_PAT" | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin

# 5. Pull and start
docker compose pull
docker compose up -d

# 6. Verify runner is registered
docker compose ps
docker compose logs -f runner-augur
```

### Option C: Build locally

No registry needed: this builds the image directly on the target machine.
Requires `Dockerfile` and `entrypoint.sh` alongside the compose file.

```bash
# 1. Copy environment templates
cp .env.example .env
cp envs/augur.env.example envs/augur.env

# 2. Edit .env: set your GITHUB_PAT
# 3. Edit envs/augur.env: set REPO_URL, RUNNER_NAME, resource limits

# 4. Build and start
docker compose up -d --build

# 5. Verify runner is registered
docker compose ps
docker compose logs -f runner-augur
```

### Verify the runner is online in GitHub

```bash
gh api repos/OWNER/REPO/actions/runners \
  --jq '.runners[] | {name, status, labels: [.labels[].name]}'
```

## Activate Self-Hosted CI

Set the repository variable `CI_RUNS_ON` so the CI workflow targets your runner:

```bash
gh variable set CI_RUNS_ON --body '["self-hosted", "Linux", "X64"]'
```

To revert to GitHub-hosted runners:
```bash
gh variable delete CI_RUNS_ON
```

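The value of `CI_RUNS_ON` is a JSON array, while `RUNNER_LABELS` is a comma-separated string. To keep the two in sync, a small helper can derive one from the other. The sketch below uses a hypothetical function name, `labels_to_json`, and assumes labels contain no spaces, quotes, or commas of their own.

```bash
# Convert "self-hosted,Linux,X64" into '["self-hosted", "Linux", "X64"]'.
# Assumes labels are plain words (no quotes, spaces, or glob characters).
labels_to_json() {
  local labels="$1" out="" label
  local IFS=','
  for label in $labels; do
    out="${out:+$out, }\"$label\""
  done
  printf '[%s]\n' "$out"
}
```

For example, `gh variable set CI_RUNS_ON --body "$(labels_to_json "$RUNNER_LABELS")"` would set the variable from the same label list the runner registers with.
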
## Configuration

### Shared Config (`.env`)

| Variable | Required | Description |
|----------|----------|-------------|
| `GITHUB_PAT` | Yes | GitHub PAT with `repo` + `read:packages` scopes |

### Per-Repo Config (`envs/<repo>.env`)

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `REPO_URL` | Yes | (none) | Full GitHub repository URL |
| `RUNNER_NAME` | Yes | (none) | Unique runner name within the repo |
| `RUNNER_LABELS` | No | `self-hosted,Linux,X64` | Comma-separated runner labels |
| `RUNNER_GROUP` | No | `default` | Runner group |
| `RUNNER_IMAGE` | No | `ghcr.io/aiinfuseds/augur-runner:latest` | Docker image to use |
| `RUNNER_CPUS` | No | `6` | CPU limit for the container |
| `RUNNER_MEMORY` | No | `12G` | Memory limit for the container |

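Before starting a runner, it can help to confirm that the required variables from the tables above are actually set. The following is a minimal sketch with a hypothetical helper name, `check_env_file`; it assumes plain `KEY=value` lines without `export` prefixes or quoting tricks.

```bash
# Report every required key that is missing or empty in a KEY=value env file.
# Usage: check_env_file FILE KEY [KEY...]
check_env_file() {
  local file="$1" key missing=0
  shift
  for key in "$@"; do
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the layout described here, that would be `check_env_file .env GITHUB_PAT` and `check_env_file envs/augur.env REPO_URL RUNNER_NAME`.
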
## Adding More Repos

1. Copy the per-repo env template:
   ```bash
   cp envs/augur.env.example envs/myrepo.env
   ```

2. Edit `envs/myrepo.env`: set `REPO_URL`, `RUNNER_NAME`, and resource limits.

3. Add a service block to `docker-compose.yml`:
   ```yaml
     runner-myrepo:
       image: ${RUNNER_IMAGE:-ghcr.io/aiinfuseds/augur-runner:latest}
       build: .
       env_file:
         - .env
         - envs/myrepo.env
       init: true
       read_only: true
       tmpfs:
         - /tmp:size=2G
       security_opt:
         - no-new-privileges:true
       stop_grace_period: 5m
       deploy:
         resources:
           limits:
             cpus: "${RUNNER_CPUS:-6}"
             memory: "${RUNNER_MEMORY:-12G}"
       restart: unless-stopped
       healthcheck:
         test: ["CMD", "pgrep", "-f", "Runner.Listener"]
         interval: 30s
         timeout: 5s
         retries: 3
         start_period: 30s
       logging:
         driver: json-file
         options:
           max-size: "50m"
           max-file: "3"
       volumes:
         - myrepo-work:/home/runner/_work
   ```

4. Add the volume at the bottom of `docker-compose.yml`:
   ```yaml
   volumes:
     augur-work:
     myrepo-work:
   ```

5. Start: `docker compose up -d`

## Scaling

Run multiple concurrent runners for the same repo:

```bash
# Scale to 3 runners for augur
docker compose up -d --scale runner-augur=3
```

Each container gets a unique runner name (Docker appends a suffix).
Set `RUNNER_NAME` to a base name like `unraid-augur`; scaled instances become
`unraid-augur-1`, `unraid-augur-2`, etc.

## Resource Tuning

Each repo can have different resource limits in its env file:

```env
# Lightweight repo (linting only)
RUNNER_CPUS=2
RUNNER_MEMORY=4G

# Heavy repo (Go builds + extensive tests)
RUNNER_CPUS=8
RUNNER_MEMORY=16G
```

### tmpfs Sizing

The `/tmp` tmpfs defaults to 2G. If your CI writes large temp files,
increase it in `docker-compose.yml`:

```yaml
tmpfs:
  - /tmp:size=4G
```

## Monitoring

```bash
# Container status and health
docker compose ps

# Live logs
docker compose logs -f runner-augur

# Last 50 log lines
docker compose logs --tail 50 runner-augur

# Resource usage (resolve container IDs via Compose, so scaled
# instances and project-prefixed container names are covered)
docker stats $(docker compose ps -q runner-augur)
```

## Updating the Runner Image

To pull the latest GHCR image:
```bash
docker compose pull
docker compose up -d
```

To rebuild locally:
```bash
docker compose build
docker compose up -d
```

### Using a Self-Hosted Registry

See the [CI Workflow Guide](../../docs/ci-workflows.md#lifecycle-2-offload-ci-to-a-server-unraid)
for the full build-push-start workflow with a self-hosted registry.

## Troubleshooting

### Image build workflow fails with zero steps

The `build-runner-image.yml` workflow needs `packages: write` permission.
If the repo's default workflow permissions are read-only, the job fails
instantly (0 steps, no runner assigned). See [One-Time GitHub Setup](#one-time-github-setup).

### `docker compose pull` returns "access denied" or 403

The GHCR package inherits the repository's visibility. For private repos,
authenticate Docker first:

```bash
echo "$GITHUB_PAT" | docker login ghcr.io -u USERNAME --password-stdin
```

Or make the package public from the package's settings page on GitHub (Change visibility).
An API call such as the following may also work, but GitHub's REST reference does not
list a visibility update endpoint for packages, so treat it as unverified:
```bash
gh api -X PATCH /user/packages/container/augur-runner -f visibility=public
```

Or skip GHCR entirely and build locally: `docker compose build`.

### Runner doesn't appear in GitHub

1. Check logs: `docker compose logs runner-augur`
2. Verify `GITHUB_PAT` has `repo` scope
3. Verify `REPO_URL` is correct (full HTTPS URL)
4. Check network: `docker compose exec runner-augur curl -s https://api.github.com`

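Checks 2 and 3 can be partly automated. The sketch below uses two hypothetical helpers: `is_repo_url` validates the shape of `REPO_URL`, and `repo_status` asks the GitHub API whether the PAT can see the repository. It assumes a `github.com` repository (not GitHub Enterprise Server).

```bash
# Succeeds only for URLs of the form https://github.com/OWNER/REPO.
is_repo_url() {
  printf '%s\n' "$1" | grep -Eq '^https://github\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$'
}

# Print the HTTP status GitHub returns for the repository with the given PAT.
# 200 means the token can see the repo; 401 points at a bad token, and 404 at
# a wrong URL or a token without access to the repo.
repo_status() {
  local slug="${1#https://github.com/}"
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -H "Authorization: Bearer $2" \
    "https://api.github.com/repos/$slug"
}
```

A typical check would be `is_repo_url "$REPO_URL" && repo_status "$REPO_URL" "$GITHUB_PAT"`.
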
### Runner appears "offline"

The runner may have exited after a job. Check:
```bash
docker compose ps                    # Is the container running?
docker compose restart runner-augur  # Force restart
```

### OOM (Out of Memory) kills

Increase `RUNNER_MEMORY` in the per-repo env file:
```env
RUNNER_MEMORY=16G
```

Then recreate the container with the new limit: `docker compose up -d`

### Stale/ghost runners in GitHub

Ephemeral runners deregister automatically after each job. If a container
was killed ungracefully (power loss, `docker kill`), the runner may appear
stale. GitHub removes ephemeral runners that have not connected for more than
a day; to clean up sooner, remove them manually:

```bash
# List runners
gh api repos/OWNER/REPO/actions/runners --jq '.runners[] | {id, name, status}'

# Remove stale runner by ID
gh api -X DELETE repos/OWNER/REPO/actions/runners/RUNNER_ID
```

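When several stale entries pile up, the list and delete calls above can be combined. The sketch below is one possible way to do it; `offline_runner_ids` and `remove_runners` are hypothetical helper names, and `DRY_RUN=1` prints the delete commands instead of running them. Note that the list endpoint is paginated, so very large runner lists may need `gh api --paginate`.

```bash
# List the IDs of runners GitHub currently reports as offline.
# Usage: offline_runner_ids OWNER/REPO
offline_runner_ids() {
  gh api "repos/$1/actions/runners" \
    --jq '.runners[] | select(.status == "offline") | .id'
}

# Read runner IDs on stdin and delete each one from the given OWNER/REPO.
# With DRY_RUN=1 the gh commands are printed instead of executed.
remove_runners() {
  local repo="$1" id
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "gh api -X DELETE repos/$repo/actions/runners/$id"
    else
      gh api -X DELETE "repos/$repo/actions/runners/$id"
    fi
  done
}
```

Start with `offline_runner_ids OWNER/REPO | DRY_RUN=1 remove_runners OWNER/REPO` and drop `DRY_RUN=1` once the printed list looks right. A runner that is merely restarting between jobs can briefly show as offline, so it is safer to run this while the containers are up and idle.
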
### Disk space

Check work directory volume usage:
```bash
docker system df -v
```

Clean up unused volumes:
```bash
docker compose down -v   # Stop the runners and remove their work volumes
docker volume prune      # Remove unused anonymous volumes (add --all to include named ones)
```

## Unraid Notes

- **Docker login persistence**: `docker login ghcr.io` writes credentials to
  `/root/.docker/config.json`. Unraid runs its root filesystem from RAM and only
  `/boot` (the USB flash drive) survives a reboot, so check
  `cat /root/.docker/config.json` after a reboot and re-run `docker login`
  (or restore a copy kept under `/boot`) if the file is gone.
- **Compose file location**: Place the 3 files (`docker-compose.yml`, `.env`,
  `envs/augur.env`) in a share directory (e.g., `/mnt/user/appdata/augur-runner/`).
- **Alternative to GHCR**: If you don't want to deal with registry auth on Unraid,
  copy the `Dockerfile` and `entrypoint.sh` alongside the compose file and use
  `docker compose up -d --build` instead. No registry needed.

## Security

| Measure | Description |
|---------|-------------|
| Ephemeral mode | Fresh runner state per job; no cross-job contamination |
| PAT scope isolation | PAT generates a short-lived registration token; PAT never touches the runner agent |
| Non-root user | Runner process runs as UID 1000, not root |
| no-new-privileges | Prevents privilege escalation via setuid/setgid binaries |
| tini (PID 1) | Proper signal forwarding and zombie process reaping |
| Log rotation | Prevents disk exhaustion from verbose CI output (50 MB x 3 files) |

### PAT Scope

Use the minimum scope required:
- **Classic token**: `repo` + `read:packages` scopes
- **Fine-grained token**: Repository access → Only select repositories, then Repository permissions → Administration: Read and write

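For a classic token, GitHub reports the granted scopes in the `X-OAuth-Scopes` response header, which makes the scope requirement easy to check. The sketch below uses hypothetical helper names (`has_scope`, `token_scopes`). The match is exact, so a token with `write:packages` but not `read:packages` would be reported as missing `read:packages` even though it can read packages. Fine-grained tokens do not return this header.

```bash
# Succeeds when scope $2 appears in the comma-separated scope list $1,
# e.g. the value of the X-OAuth-Scopes response header.
has_scope() {
  case ", $1, " in
    *", $2, "*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Print the scopes GitHub reports for a classic token.
token_scopes() {
  curl -fsS -o /dev/null -D - \
    -H "Authorization: Bearer $1" https://api.github.com/user \
    | tr -d '\r' | sed -n 's/^[Xx]-[Oo][Aa]uth-[Ss]copes: *//p'
}
```

For example: `scopes="$(token_scopes "$GITHUB_PAT")"; has_scope "$scopes" repo && has_scope "$scopes" read:packages`.
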
### Network Considerations

The runner container needs outbound access to:
- `github.com` (clone repos, download actions)
- `api.github.com` (registration, status)
- `ghcr.io` (pull runner image, only if using GHCR)
- Package registries (`proxy.golang.org`, `registry.npmjs.org`, etc.)

No inbound ports are required.

## Stopping and Removing

```bash
# Stop runners (waits up to stop_grace_period for a running job to finish)
docker compose down

# Stop and remove work volumes
docker compose down -v

# Stop, remove volumes, and delete the locally built image
docker compose down -v --rmi local
```