# TODO — Phase 7.5 Nginx -> Caddy Consolidation

## Why this exists

This file captures the decisions and migration context for the one-time "phase 7.5"
work so we do not lose reasoning between sessions.

## What happened so far

1. The original `phase8_cutover.sh` was designed for one wildcard zone
   (`*.${CADDY_DOMAIN}`), mainly for Gitea cutover.
2. The homelab currently has two active DNS zones in scope:
   - `sintheus.com` (legacy services behind Nginx)
   - `privacyindesign.com` (new Gitea public endpoint)
3. Decision made: run a one-time migration where a single Caddy instance serves
   both zones, then gradually retire Nginx.
4. Implemented: `phase7_5_nginx_to_caddy.sh` to generate/deploy a multi-domain
   Caddyfile and run canary/full rollout modes.

## Current design decisions

1. Public ingress should be HTTPS-only for all migrated hostnames.
2. Backend scheme is mixed for now:
   - Keep `http://` upstreams where the service does not yet have TLS.
   - Keep `https://` upstreams where TLS is already available.
3. End-to-end HTTPS is a target state, not an immediate requirement.
4. A strict toggle exists in phase 7.5:
   - `--strict-backend-https` fails if any upstream is `http://`.
5. Canary-first rollout:
   - The first migration target is `tower.sintheus.com`.
6. Canary mode is additive:
   - It preserves existing Caddy routes.
   - It updates only a managed canary block for `tower.sintheus.com`.

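The strict toggle in decision 4 can be pictured as a scan over the host map. The sketch below is illustrative only: the function name `check_strict_backend_https` and the "hostname upstream" input format are assumptions, not the actual logic inside `phase7_5_nginx_to_caddy.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from phase7_5_nginx_to_caddy.sh.
# Reads lines of the form "<hostname> <upstream-url>" on stdin.
# Reports every plain-HTTP upstream on stderr and returns 1 if any exist.
check_strict_backend_https() {
  local host upstream failed
  failed=0
  while read -r host upstream; do
    [ -z "$host" ] && continue
    case "$upstream" in
      http://*)
        echo "strict check failed: $host -> $upstream" >&2
        failed=1
        ;;
    esac
  done
  return "$failed"
}

# Example input: two entries copied from the host map in this file.
HOST_MAP='tower.sintheus.com https://192.168.1.82:443
ai.sintheus.com http://192.168.1.82:8181'

if printf '%s\n' "$HOST_MAP" | check_strict_backend_https; then
  echo "all upstreams use https://"
else
  echo "strict mode would abort the rollout"
fi
```

With the mixed backends listed in this file, a check of this shape fails on the first `http://` entry, which is why strict mode stays off until the hardening pass.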
## Host map and backend TLS status

### Canary scope (default mode)

- `tower.sintheus.com -> https://192.168.1.82:443` (TLS backend; cert verify skipped)
- `${GITEA_DOMAIN} -> http://${UNRAID_GITEA_IP}:3000` (HTTP backend for now)

### Full migration scope

- `ai.sintheus.com -> http://192.168.1.82:8181`
- `photos.sintheus.com -> http://192.168.1.222:2283`
- `fin.sintheus.com -> http://192.168.1.233:8096`
- `disk.sintheus.com -> http://192.168.1.52:80`
- `pi.sintheus.com -> http://192.168.1.4:80`
- `plex.sintheus.com -> http://192.168.1.111:32400`
- `sync.sintheus.com -> http://192.168.1.119:8384`
- `syno.sintheus.com -> https://100.108.182.16:5001` (verify skipped)
- `tower.sintheus.com -> https://192.168.1.82:443` (verify skipped)
- `${GITEA_DOMAIN} -> http://${UNRAID_GITEA_IP}:3000`

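To make the host map concrete, here is a rough sketch of how one entry might translate into a Caddyfile site block. The function `emit_site_block` and its `skip-verify` argument are hypothetical, and the generated Caddyfile from the real script may look different (for example, it may carry extra TLS or logging settings). The `reverse_proxy` directive and the `tls_insecure_skip_verify` transport option are standard Caddy configuration.

```shell
#!/usr/bin/env bash
# Hypothetical generator: prints one Caddyfile site block per call.
# $1 = hostname, $2 = upstream URL,
# $3 = "skip-verify" to disable upstream certificate verification.
emit_site_block() {
  local host upstream verify
  host="$1"
  upstream="$2"
  verify="${3:-}"
  printf '%s {\n' "$host"
  if [ "$verify" = "skip-verify" ]; then
    # HTTPS backend with a certificate Caddy cannot verify (e.g. self-signed).
    printf '\treverse_proxy %s {\n' "$upstream"
    printf '\t\ttransport http {\n'
    printf '\t\t\ttls_insecure_skip_verify\n'
    printf '\t\t}\n'
    printf '\t}\n'
  else
    printf '\treverse_proxy %s\n' "$upstream"
  fi
  printf '}\n'
}

# Two entries from the host map above.
emit_site_block tower.sintheus.com https://192.168.1.82:443 skip-verify
emit_site_block ai.sintheus.com http://192.168.1.82:8181
```

Skipping verification keeps the hop encrypted but unauthenticated, so it fits the "target state, not immediate requirement" stance rather than the final hardened setup.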
## Definition of done (phase 7.5)

Phase 7.5 is done only when all of the following are true:

1. Caddy is running on Unraid with the generated multi-domain config.
2. Canary host `tower.sintheus.com` is reachable over HTTPS through Caddy.
3. Canary routing is proven by at least one path:
   - `curl --resolve` tests, or
   - split-DNS/hosts override, or
   - intentional DNS cutover.
4. Legacy Nginx remains available for non-migrated hosts during canary.
5. No critical regressions are observed for at least 24 hours on canary traffic.

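The `curl --resolve` path in item 3 pins a hostname to a chosen IP for a single request, leaving DNS untouched. A minimal sketch follows, assuming a wrapper named `canary_check` and a `CADDY_IP` variable, neither of which comes from the phase 7.5 script. It defaults to a dry run that only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `curl --resolve`.
# $1 = hostname to test, $2 = IP address where Caddy listens.
canary_check() {
  local host caddy_ip
  host="$1"
  caddy_ip="$2"
  # --resolve host:443:ip makes curl connect to ip while still sending the
  # real hostname for SNI and the Host header.
  set -- curl -sS -o /dev/null -w '%{http_code}\n' --max-time 10 \
    --resolve "$host:443:$caddy_ip" "https://$host/"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # show the command without sending traffic
  else
    "$@"                 # send the request and print the HTTP status code
  fi
}

# 192.0.2.10 is a documentation placeholder; substitute the real Caddy address.
CADDY_IP="${CADDY_IP:-192.0.2.10}"
canary_check tower.sintheus.com "$CADDY_IP"
```

Running it with `DRY_RUN=0` sends the request. Because curl still validates the certificate against the hostname, a successful status code also exercises the certificate that Caddy serves for that host.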
## Definition of done (final state after full migration)

1. All selected domains route to Caddy through the intended ingress path:
   - LAN-only: split-DNS/private resolution to Caddy, or
   - public: DNS to WAN ingress that forwards 443 to Caddy.
2. Caddy serves valid certificates for both zones.
3. Functional checks pass for each service (UI load, API, websocket/streaming where relevant).
4. Nginx is no longer on the request path for migrated domains.
5. Long-term target: all backends upgraded to `https://` and strict mode passes.

## What remains to happen

1. Run canary:
   - `./phase7_5_nginx_to_caddy.sh --mode=canary`
2. Route canary traffic to Caddy using one method:
   - `curl --resolve` for zero-DNS-change testing, or
   - split-DNS/private DNS, or
   - explicit DNS cutover if desired.
3. Observe errors/latency/app behavior for at least 24 hours.
4. If canary is clean, run full:
   - `./phase7_5_nginx_to_caddy.sh --mode=full`
5. Move remaining routes in batches (DNS or split-DNS, depending on ingress model).
6. Validate each app after each batch.
7. After everything is stable, plan Nginx retirement.
8. Later hardening pass:
   - enable TLS on each backend service one by one
   - flip each corresponding upstream to `https://`
   - finally run `--strict-backend-https` and require it to pass.

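The 24-hour observation window between steps 1 and 4 is easy to cut short by accident. One possible guard, sketched below, compares a recorded canary start time against the current time before suggesting the full run. The function `soak_complete` and the variable `CANARY_STARTED_AT` are hypothetical; the script itself is not known to enforce this.

```shell
#!/usr/bin/env bash
# Hypothetical soak-window gate.
# $1 = canary start (epoch seconds), $2 = current time (epoch seconds),
# $3 = minimum soak in hours (defaults to 24).
# Returns 0 once the minimum soak time has elapsed.
soak_complete() {
  local start now min_hours
  start="$1"
  now="$2"
  min_hours="${3:-24}"
  [ $(( now - start )) -ge $(( min_hours * 3600 )) ]
}

# Assumption: CANARY_STARTED_AT is recorded by hand when the canary goes live.
# If it is unset, the window is treated as starting now.
CANARY_STARTED_AT="${CANARY_STARTED_AT:-$(date +%s)}"
if soak_complete "$CANARY_STARTED_AT" "$(date +%s)"; then
  echo "soak window met; next step: ./phase7_5_nginx_to_caddy.sh --mode=full"
else
  echo "canary soak still in progress; keep observing"
fi
```

Elapsed time alone does not prove the canary is clean; the check only prevents an early full rollout, and the error/latency review in step 3 still has to be done by a person.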
## Risks and why mixed backend HTTP is acceptable short-term

1. Risk: backend HTTP is unencrypted on the LAN.
   - Mitigation: traffic stays on the trusted local network, and this is a temporary state only.
2. Risk: if strict mode is enabled too early, the rollout blocks.
   - Mitigation: keep strict mode off until backend TLS coverage improves.
3. Risk: moving all DNS at once can create a broad outage.
   - Mitigation: canary-first rollout and batched DNS cutover.

## Operational notes

1. If a Caddyfile already exists, phase 7.5 backs it up as:
   - `${CADDY_DATA_PATH}/Caddyfile.pre_phase7_5.<timestamp>`
2. Compose stack path for Caddy:
   - `${UNRAID_COMPOSE_DIR}/caddy/docker-compose.yml`
3. The script does not change Cloudflare DNS records automatically.
   - DNS updates are intentional/manual to keep the blast radius controlled.
4. Do not set public Cloudflare proxied records to private `192.168.x.x` addresses.
5. Canary updates are enclosed between markers:
   - `# BEGIN_PHASE7_5_CANARY`
   - `# END_PHASE7_5_CANARY`

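Notes 1 and 5 together describe a backup-then-replace pattern around a marker-delimited block. The sketch below shows one way such an update could work. The function `update_canary_block`, the backup location (same directory as the Caddyfile), and the timestamp format are assumptions; the real script may differ on all three.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a marker-managed block update.
BEGIN_MARK='# BEGIN_PHASE7_5_CANARY'
END_MARK='# END_PHASE7_5_CANARY'

# $1 = path to the Caddyfile, $2 = file holding the new canary block body.
update_canary_block() {
  local caddyfile block_file tmp
  caddyfile="$1"
  block_file="$2"
  # Timestamped backup; the date format here is an assumption.
  cp "$caddyfile" "$(dirname "$caddyfile")/Caddyfile.pre_phase7_5.$(date +%Y%m%d_%H%M%S)"
  tmp=$(mktemp)
  # Copy every line outside the markers unchanged; drop any old managed block.
  awk -v b="$BEGIN_MARK" -v e="$END_MARK" '
    $0 == b { skip = 1; next }
    $0 == e { skip = 0; next }
    !skip   { print }
  ' "$caddyfile" > "$tmp"
  # Append the fresh managed block between the markers.
  {
    printf '%s\n' "$BEGIN_MARK"
    cat "$block_file"
    printf '%s\n' "$END_MARK"
  } >> "$tmp"
  # Overwrite in place so the file keeps its inode and permissions.
  cat "$tmp" > "$caddyfile"
  rm -f "$tmp"
}

# Usage (paths are placeholders):
#   update_canary_block "${CADDY_DATA_PATH}/Caddyfile" ./canary_block.txt
```

Overwriting in place rather than moving a new file over the old one matters when the Caddyfile is bind-mounted into a container as a single file, since the mount tracks the original inode.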