# Self-Hosted GitHub Actions Runner (Docker)

Run GitHub Actions CI on your own Linux server instead of GitHub-hosted runners.
This takes CI load off your laptop, avoids runner-minute quotas, and can shorten feedback time.

## How It Works

Each runner container:
1. Starts up and uses your GitHub PAT to request a short-lived registration token
2. Registers with GitHub in **ephemeral mode** (one job per lifecycle)
3. Picks up a CI job, executes it, and exits
4. Docker's `restart: unless-stopped` brings it back for the next job
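
The lifecycle above can be sketched as a small entrypoint script. This is an illustration only, not the repository's actual `entrypoint.sh`: the function names are hypothetical, and it assumes `curl` and `jq` are installed in the image and that the runner agent's `config.sh` and `run.sh` sit in the working directory.

```bash
# Hypothetical sketch of an ephemeral-runner entrypoint (not the real entrypoint.sh).

# Turn https://github.com/OWNER/REPO(.git) into OWNER/REPO for API calls.
repo_slug_from_url() {
  printf '%s\n' "$1" |
    sed -e 's#^https://github\.com/##' -e 's#\.git$##' -e 's#/$##'
}

# Exchange the PAT for a short-lived registration token.
request_registration_token() {
  slug=$(repo_slug_from_url "$REPO_URL")
  curl -fsSL -X POST \
    -H "Authorization: Bearer ${GITHUB_PAT}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/${slug}/actions/runners/registration-token" |
    jq -r '.token'
}

run_ephemeral_runner() {
  token=$(request_registration_token)
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "could not obtain a registration token" >&2
    return 1
  fi
  # --ephemeral: the agent accepts exactly one job, then deregisters and exits.
  ./config.sh --unattended --ephemeral \
    --url "$REPO_URL" --token "$token" \
    --name "$RUNNER_NAME" \
    --labels "${RUNNER_LABELS:-self-hosted,Linux,X64}"
  # When the agent exits, the container exits and the restart policy starts a new one.
  exec ./run.sh
}

# Example (inside the container): run_ephemeral_runner
```

Only the registration token is passed to `config.sh`; the PAT is used solely for the API call.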

## Prerequisites

- Docker Engine 24+ and Docker Compose v2
- A GitHub Personal Access Token (classic) with **`repo`** and **`read:packages`** scopes
- Network access to `github.com`, `api.github.com`, and `ghcr.io`

## One-Time GitHub Setup

Before deploying, the repository needs write permissions for the image build workflow.

### Enable GHCR image builds

The `build-runner-image.yml` workflow pushes Docker images to GHCR using the
`GITHUB_TOKEN`. By default, this token is read-only and the workflow will fail
silently (zero steps executed, no runner assigned).

Fix by allowing write permissions for Actions workflows:

```bash
gh api -X PUT repos/OWNER/REPO/actions/permissions/workflow \
  -f default_workflow_permissions=write \
  -F can_approve_pull_request_reviews=false
```

Alternatively, keep read-only defaults and create a dedicated PAT secret with
`write:packages` scope, then reference it in the workflow instead of `GITHUB_TOKEN`.

### Build the runner image

Trigger the GHCR image build (first time and whenever Dockerfile/entrypoint changes):

```bash
gh workflow run build-runner-image.yml
```

Check the status of the latest run (the build takes about 5 minutes):

```bash
gh run list --workflow=build-runner-image.yml --limit=1
```
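
To block until the build finishes rather than re-running the status check, a helper along these lines may be useful. It is a sketch: `wait_for_workflow` is a hypothetical name, and it assumes the most recent run in the list is the one you just triggered.

```bash
# Block until the most recent run of a workflow finishes.
# Returns non-zero if no run is found or the run fails.
wait_for_workflow() {
  workflow=$1
  run_id=$(gh run list --workflow="$workflow" --limit=1 \
    --json databaseId --jq '.[0].databaseId')
  if [ -z "$run_id" ] || [ "$run_id" = "null" ]; then
    echo "no runs found for ${workflow}" >&2
    return 1
  fi
  # --exit-status makes gh exit non-zero when the run itself fails.
  gh run watch "$run_id" --exit-status
}

# Example: wait_for_workflow build-runner-image.yml
```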

The image is also rebuilt automatically:
- On push to `main` when `infra/runners/Dockerfile` or `entrypoint.sh` changes
- Weekly (Monday 06:00 UTC) to pick up OS patches and runner agent updates

## Deploy on Your Server

### Choose an image source

| Method | Files needed on server | Registry auth? | Best for |
|--------|----------------------|----------------|----------|
| **Self-hosted registry** | `docker-compose.yml`, `.env`, `envs/augur.env` | No (your network) | Production — push once, pull from any machine |
| **GHCR** | `docker-compose.yml`, `.env`, `envs/augur.env` | Yes (`docker login ghcr.io`) | GitHub-native workflow |
| **Build locally** | All 5 files (+ `Dockerfile`, `entrypoint.sh`) | No | Quick start, no registry needed |

### Option A: Self-hosted registry (recommended)

For the full end-to-end workflow (build image on your Mac, push to Unraid registry,
start runner), see the [CI Workflow Guide](../../docs/ci-workflows.md#lifecycle-2-offload-ci-to-a-server-unraid).

The private Docker registry is configured at `infra/registry/`. It listens on port 5000
and is reachable from the LAN. Docker permits plain-HTTP access to registries on
`localhost` (127.0.0.0/8) by default, so the server itself needs no `daemon.json` changes.
To push from another machine, add `<UNRAID_IP>:5000` to `insecure-registries` in that
machine's Docker daemon config.

### Option B: GHCR

Requires the `build-runner-image.yml` workflow to have run successfully
(see [One-Time GitHub Setup](#one-time-github-setup)).

```bash
# 1. Copy environment templates
cp .env.example .env
cp envs/augur.env.example envs/augur.env

# 2. Edit .env — set your GITHUB_PAT
# 3. Edit envs/augur.env — set REPO_URL, RUNNER_NAME, resource limits

# 4. Authenticate Docker with GHCR (one-time, persists to ~/.docker/config.json)
echo "$GITHUB_PAT" | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin

# 5. Pull and start
docker compose pull
docker compose up -d

# 6. Verify runner is registered
docker compose ps
docker compose logs -f runner-augur
```

### Option C: Build locally

No registry needed — builds the image directly on the target machine.
Requires `Dockerfile` and `entrypoint.sh` alongside the compose file.

```bash
# 1. Copy environment templates
cp .env.example .env
cp envs/augur.env.example envs/augur.env

# 2. Edit .env — set your GITHUB_PAT
# 3. Edit envs/augur.env — set REPO_URL, RUNNER_NAME, resource limits

# 4. Build and start
docker compose up -d --build

# 5. Verify runner is registered
docker compose ps
docker compose logs -f runner-augur
```

### Verify the runner is online in GitHub

```bash
gh api repos/OWNER/REPO/actions/runners \
  --jq '.runners[] | {name, status, labels: [.labels[].name]}'
```
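
Registration can take a few seconds after the container starts, so a deploy script may want to poll instead of checking once. The helper below is a sketch: the function name and the `POLL_SECONDS` variable are hypothetical, and it only inspects the first page of runners returned by the API.

```bash
# Poll the runners API until the named runner reports "online".
# Usage: wait_for_runner_online OWNER/REPO RUNNER_NAME [ATTEMPTS]
wait_for_runner_online() {
  repo=$1
  name=$2
  attempts=${3:-12}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(gh api "repos/${repo}/actions/runners" \
      --jq ".runners[] | select(.name == \"${name}\") | .status")
    if [ "$status" = "online" ]; then
      echo "${name} is online"
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_SECONDS:-5}"
  done
  echo "${name} did not come online after ${attempts} checks" >&2
  return 1
}

# Example: wait_for_runner_online OWNER/REPO unraid-augur
```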

## Activate Self-Hosted CI

Set the repository variable `CI_RUNS_ON` so the CI workflow targets your runner:

```bash
gh variable set CI_RUNS_ON --body '["self-hosted", "Linux", "X64"]'
```

To revert to GitHub-hosted runners:
```bash
gh variable delete CI_RUNS_ON
```
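
Assuming the CI workflow feeds `CI_RUNS_ON` into `runs-on` as a JSON array (the workflow file is not shown here), the array should list only labels the runner actually carries. One way to keep the two in sync is to derive the JSON from the comma-separated `RUNNER_LABELS` value. `labels_to_json` is a hypothetical helper, and it does not escape quotes inside labels.

```bash
# Convert "self-hosted,Linux,X64" into '["self-hosted","Linux","X64"]'.
labels_to_json() {
  printf '%s\n' "$1" | awk -F',' '{
    out = "["
    for (i = 1; i <= NF; i++) {
      gsub(/^[ \t]+|[ \t]+$/, "", $i)   # trim whitespace around each label
      out = out (i > 1 ? "," : "") "\"" $i "\""
    }
    print out "]"
  }'
}

# Example:
#   gh variable set CI_RUNS_ON --body "$(labels_to_json "self-hosted,Linux,X64")"
```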

## Configuration

### Shared Config (`.env`)

| Variable | Required | Description |
|----------|----------|-------------|
| `GITHUB_PAT` | Yes | GitHub PAT with `repo` + `read:packages` scopes |

### Per-Repo Config (`envs/<repo>.env`)

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `REPO_URL` | Yes | — | Full GitHub repository URL |
| `RUNNER_NAME` | Yes | — | Unique runner name within the repo |
| `RUNNER_LABELS` | No | `self-hosted,Linux,X64` | Comma-separated runner labels |
| `RUNNER_GROUP` | No | `default` | Runner group |
| `RUNNER_IMAGE` | No | `ghcr.io/aiinfuseds/augur-runner:latest` | Docker image to use |
| `RUNNER_CPUS` | No | `6` | CPU limit for the container |
| `RUNNER_MEMORY` | No | `12G` | Memory limit for the container |
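
Before starting the stack, it can help to confirm that the required variables from both tables are set. The check below is a minimal sketch; `check_env_file` is a hypothetical name, and it assumes plain `KEY=value` lines without quoting or `export` prefixes.

```bash
# Report any required keys that are missing or empty in an env file.
# Usage: check_env_file FILE KEY [KEY...]
check_env_file() {
  file=$1
  shift
  missing=0
  for key in "$@"; do
    # Require KEY=value with a non-empty value; commented-out lines do not match.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: ${key} (${file})" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_env_file .env GITHUB_PAT &&
#     check_env_file envs/augur.env REPO_URL RUNNER_NAME
```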

## Adding More Repos

1. Copy the per-repo env template:
   ```bash
   cp envs/augur.env.example envs/myrepo.env
   ```

2. Edit `envs/myrepo.env` — set `REPO_URL`, `RUNNER_NAME`, and resource limits.

3. Add a service block to `docker-compose.yml`:
   ```yaml
   runner-myrepo:
     image: ${RUNNER_IMAGE:-ghcr.io/aiinfuseds/augur-runner:latest}
     build: .
     env_file:
       - .env
       - envs/myrepo.env
     init: true
     read_only: true
     tmpfs:
       - /tmp:size=2G
     security_opt:
       - no-new-privileges:true
     stop_grace_period: 5m
     deploy:
       resources:
         limits:
           cpus: "${RUNNER_CPUS:-6}"
           memory: "${RUNNER_MEMORY:-12G}"
     restart: unless-stopped
     healthcheck:
       test: ["CMD", "pgrep", "-f", "Runner.Listener"]
       interval: 30s
       timeout: 5s
       retries: 3
       start_period: 30s
     logging:
       driver: json-file
       options:
         max-size: "50m"
         max-file: "3"
     volumes:
       - myrepo-work:/home/runner/_work
   ```

4. Add the volume at the bottom of `docker-compose.yml`:
   ```yaml
   volumes:
     augur-work:
     myrepo-work:
   ```

5. Start: `docker compose up -d`
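
Steps 1 and 2 can be scripted if you add repos often. The sketch below assumes the template contains uncommented `REPO_URL=` and `RUNNER_NAME=` lines, which is not confirmed here; `scaffold_repo_env` and `ENV_TEMPLATE` are hypothetical names. Values containing `|` or `&` would need escaping before being passed to `sed`.

```bash
# Create envs/<name>.env from the template and fill in the two required keys.
# Usage: scaffold_repo_env NAME REPO_URL RUNNER_NAME
scaffold_repo_env() {
  name=$1
  repo_url=$2
  runner_name=$3
  template=${ENV_TEMPLATE:-envs/augur.env.example}
  target="envs/${name}.env"
  if [ -e "$target" ]; then
    echo "${target} already exists, refusing to overwrite" >&2
    return 1
  fi
  # Rewrite only the two required keys; every other line is copied unchanged.
  sed -e "s|^REPO_URL=.*|REPO_URL=${repo_url}|" \
      -e "s|^RUNNER_NAME=.*|RUNNER_NAME=${runner_name}|" \
      "$template" > "$target"
}

# Example:
#   scaffold_repo_env myrepo https://github.com/OWNER/myrepo unraid-myrepo
```

The service block and volume entry in `docker-compose.yml` still need to be added by hand.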

## Scaling

Run multiple concurrent runners for the same repo:

```bash
# Scale to 3 runners for augur
docker compose up -d --scale runner-augur=3
```

Each container gets a unique runner name (Docker appends a suffix).
Set `RUNNER_NAME` to a base name like `unraid-augur` — scaled instances become
`unraid-augur-1`, `unraid-augur-2`, etc.

## Resource Tuning

Each repo can have different resource limits in its env file:

```env
# Lightweight repo (linting only)
RUNNER_CPUS=2
RUNNER_MEMORY=4G

# Heavy repo (Go builds + extensive tests)
RUNNER_CPUS=8
RUNNER_MEMORY=16G
```

### tmpfs Sizing

The `/tmp` tmpfs defaults to 2G. If your CI writes large temp files,
increase it in `docker-compose.yml`:

```yaml
tmpfs:
  - /tmp:size=4G
```

## Monitoring

```bash
# Container status and health
docker compose ps

# Live logs
docker compose logs -f runner-augur

# Last 50 log lines
docker compose logs --tail 50 runner-augur

# Resource usage (runner-augur is the service name, so resolve its container IDs)
docker stats $(docker compose ps -q runner-augur)
```

## Updating the Runner Image

To pull the latest GHCR image:
```bash
docker compose pull
docker compose up -d
```

To rebuild locally:
```bash
docker compose build
docker compose up -d
```

### Using a Self-Hosted Registry

See the [CI Workflow Guide](../../docs/ci-workflows.md#lifecycle-2-offload-ci-to-a-server-unraid)
for the full build-push-start workflow with a self-hosted registry.

## Troubleshooting

### Image build workflow fails with zero steps

The `build-runner-image.yml` workflow needs `packages: write` permission.
If the repo's default workflow permissions are read-only, the job fails
instantly (0 steps, no runner assigned). See [One-Time GitHub Setup](#one-time-github-setup).

### `docker compose pull` returns "access denied" or 403

The GHCR package inherits the repository's visibility. For private repos,
authenticate Docker first:

```bash
echo "$GITHUB_PAT" | docker login ghcr.io -u USERNAME --password-stdin
```

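Or make the package public from its settings page on GitHub (**Package settings** → **Change visibility**).
The GitHub REST API has no endpoint for changing a package's visibility, so this step
cannot be scripted with `gh api`.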

Or skip GHCR entirely and build locally: `docker compose build`.

### Runner doesn't appear in GitHub

1. Check logs: `docker compose logs runner-augur`
2. Verify `GITHUB_PAT` has `repo` scope
3. Verify `REPO_URL` is correct (full HTTPS URL)
4. Check network: `docker compose exec runner-augur curl -s https://api.github.com`

### Runner appears "offline"

The runner may have exited after a job. Check:
```bash
docker compose ps          # Is the container running?
docker compose restart runner-augur  # Force restart
```

### OOM (Out of Memory) kills

Increase `RUNNER_MEMORY` in the per-repo env file:
```env
RUNNER_MEMORY=16G
```

Then: `docker compose up -d`
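
To confirm that a failure really was an OOM kill before raising the limit, Docker's container state can be inspected. This is a sketch: `report_oom` is a hypothetical helper, and because the container restarts after every job, the flag may already have been reset by the time you look.

```bash
# Print the OOMKilled flag and last exit code for every container of a service.
# Usage: report_oom SERVICE
report_oom() {
  service=$1
  # -a includes stopped containers, -q prints only container IDs.
  for id in $(docker compose ps -a -q "$service"); do
    docker inspect \
      --format '{{.Name}} oom_killed={{.State.OOMKilled}} exit_code={{.State.ExitCode}}' \
      "$id"
  done
}

# Example: report_oom runner-augur
```

An exit code of 137 (SIGKILL) alongside `oom_killed=true` is the usual signature of the kernel OOM killer.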

### Stale/ghost runners in GitHub

Ephemeral runners deregister automatically after each job. If a container
was killed ungracefully (power loss, `docker kill`), the runner may linger as
offline. GitHub removes an ephemeral runner automatically once it has been
disconnected for more than a day; to clear it sooner, remove it manually:

```bash
# List runners
gh api repos/OWNER/REPO/actions/runners --jq '.runners[] | {id, name, status}'

# Remove stale runner by ID
gh api -X DELETE repos/OWNER/REPO/actions/runners/RUNNER_ID
```
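
When several ghosts have piled up, the list and delete calls above can be combined. The loop below is a sketch with a hypothetical function name. It removes every runner reported as offline, including one that is merely between jobs, so run it when no restart is in progress.

```bash
# Delete every runner GitHub reports as offline for the given OWNER/REPO.
# Usage: remove_offline_runners OWNER/REPO
remove_offline_runners() {
  repo=$1
  gh api --paginate "repos/${repo}/actions/runners" \
    --jq '.runners[] | select(.status == "offline") | .id' |
  while read -r id; do
    echo "removing runner ${id}"
    # Detach stdin so the DELETE call cannot consume the ID list.
    gh api -X DELETE "repos/${repo}/actions/runners/${id}" </dev/null
  done
}

# Example: remove_offline_runners OWNER/REPO
```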

### Disk space

Check work directory volume usage:
```bash
docker system df -v
```

Clean up unused volumes:
```bash
docker compose down -v   # Stop runners and remove their work volumes
docker volume prune      # Remove unused anonymous volumes (add --all to include named ones)
```

## Unraid Notes

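- **Docker login persistence**: `docker login ghcr.io` writes credentials to
  `/root/.docker/config.json`. Unraid loads its root filesystem into RAM at boot,
  so `/root` may not survive a reboot; only the flash drive (`/boot`) and array
  shares are persistent. After a reboot, check with `cat /root/.docker/config.json`
  and re-run `docker login` if the file is gone.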
- **Compose file location**: Place the 3 files (`docker-compose.yml`, `.env`,
  `envs/augur.env`) in a share directory (e.g., `/mnt/user/appdata/augur-runner/`).
- **Alternative to GHCR**: If you don't want to deal with registry auth on Unraid,
  copy the `Dockerfile` and `entrypoint.sh` alongside the compose file and use
  `docker compose up -d --build` instead. No registry needed.

## Security

| Measure | Description |
|---------|-------------|
| Ephemeral mode | Fresh runner state per job — no cross-job contamination |
| PAT scope isolation | PAT generates a short-lived registration token; PAT never touches the runner agent |
| Non-root user | Runner process runs as UID 1000, not root |
| no-new-privileges | Prevents privilege escalation via setuid/setgid binaries |
| tini (PID 1) | Proper signal forwarding and zombie process reaping |
| Log rotation | Prevents disk exhaustion from verbose CI output (50MB x 3 files) |

### PAT Scope

Use the minimum scope required:
- **Classic token**: `repo` + `read:packages` scopes
- **Fine-grained token**: Repository access → Only select repositories → Read and Write for Administration

### Network Considerations

The runner container needs outbound access to:
- `github.com` (clone repos, download actions)
- `api.github.com` (registration, status)
- `ghcr.io` (pull runner image — only if using GHCR)
- Package registries (`proxy.golang.org`, `registry.npmjs.org`, etc.)

No inbound ports are required.
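
A quick way to test the outbound requirements listed above is to probe each endpoint with `curl`. The helper is a sketch with a hypothetical name. Any HTTP response, including a 401 or 404, counts as reachable, since the goal is to detect DNS, firewall, or proxy problems rather than authentication issues.

```bash
# Probe each URL and report which ones cannot be reached at all.
# Usage: check_outbound URL [URL...]
check_outbound() {
  failed=0
  for url in "$@"; do
    # No -f flag: HTTP error statuses still prove the host is reachable.
    if curl -sS -o /dev/null --max-time 10 "$url" 2>/dev/null; then
      echo "ok    ${url}"
    else
      echo "FAIL  ${url}"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
#   check_outbound https://github.com https://api.github.com https://ghcr.io/v2/
```

Run it on the Docker host for a first check; to test from inside the runner's network namespace, paste the function into a shell opened with `docker compose exec runner-augur sh`.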

## Stopping and Removing

```bash
# Stop runners (waits for stop_grace_period)
docker compose down

# Stop and remove work volumes
docker compose down -v

# Stop, remove volumes, and delete the locally built image
docker compose down -v --rmi local
```