# TODO: Phase 7.5 Nginx -> Caddy Consolidation

## Why this exists

This file captures the decisions and migration context for the one-time "phase 7.5" work so we do not lose reasoning between sessions.

## What happened so far

1. The original `phase8_cutover.sh` was designed for one wildcard zone (`*.${CADDY_DOMAIN}`), mainly for Gitea cutover.
2. The homelab currently has two active DNS zones in scope:
   - `sintheus.com` (legacy services behind Nginx)
   - `privacyindesign.com` (new Gitea public endpoint)
3. Decision made: run a one-time migration where a single Caddy instance serves both zones, then gradually retire Nginx.
4. Implemented: `phase7_5_nginx_to_caddy.sh` to generate/deploy a multi-domain Caddyfile and run canary/full rollout modes.

## Current design decisions

1. Public ingress should be HTTPS-only for all migrated hostnames.
2. Backend scheme is mixed for now:
   - Keep `http://` upstream where service does not yet have TLS.
   - Keep `https://` where already available.
3. End-to-end HTTPS is a target state, not an immediate requirement.
4. A strict toggle exists in phase 7.5:
   - `--strict-backend-https` fails if any upstream is `http://`.
5. Canary-first rollout:
   - first migration target is `tower.sintheus.com`.
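
The strict toggle described above can be illustrated with a minimal sketch. This is not the actual logic in `phase7_5_nginx_to_caddy.sh`. The function name `check_strict_backends` is hypothetical, and the sketch assumes upstreams are passed as plain URL strings.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a strict-backend check; the real script may differ.
# Returns non-zero if any upstream URL uses plain http://.
check_strict_backends() {
  local failed=0 upstream
  for upstream in "$@"; do
    case "$upstream" in
      http://*)
        # Report every offending upstream, not just the first one.
        echo "strict mode: insecure upstream: $upstream" >&2
        failed=1
        ;;
    esac
  done
  return "$failed"
}
```

Called with the canary pair `https://192.168.1.82:443` and `http://${UNRAID_GITEA_IP}:3000`, this sketch would return non-zero, which matches the decision to keep strict mode off until backend TLS coverage improves.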

## Host map and backend TLS status

### Canary scope (default mode)

- `tower.sintheus.com -> https://192.168.1.82:443` (TLS backend; cert verify skipped)
- `${GITEA_DOMAIN} -> http://${UNRAID_GITEA_IP}:3000` (HTTP backend for now)

### Full migration scope

- `ai.sintheus.com -> http://192.168.1.82:8181`
- `photos.sintheus.com -> http://192.168.1.222:2283`
- `fin.sintheus.com -> http://192.168.1.233:8096`
- `disk.sintheus.com -> http://192.168.1.52:80`
- `pi.sintheus.com -> http://192.168.1.4:80`
- `plex.sintheus.com -> http://192.168.1.111:32400`
- `sync.sintheus.com -> http://192.168.1.119:8384`
- `syno.sintheus.com -> https://100.108.182.16:5001` (verify skipped)
- `tower.sintheus.com -> https://192.168.1.82:443` (verify skipped)
- `${GITEA_DOMAIN} -> http://${UNRAID_GITEA_IP}:3000`

## Definition of done (phase 7.5)

Phase 7.5 is done only when all are true:

1. Caddy is running on Unraid with generated multi-domain config.
2. Canary host `tower.sintheus.com` is reachable over HTTPS through Caddy.
3. Canary routing is proven by at least one path:
   - `curl --resolve` tests, or
   - split-DNS/hosts override, or
   - intentional DNS cutover.
4. Legacy Nginx remains available for non-migrated hosts during canary.
5. No critical regressions observed for at least 24 hours on canary traffic.

## Definition of done (final state after full migration)

1. All selected domains route to Caddy through the intended ingress path:
   - LAN-only: split-DNS/private resolution to Caddy, or
   - public: DNS to WAN ingress that forwards 443 to Caddy.
2. Caddy serves valid certificates for both zones.
3. Functional checks pass for each service (UI load, API, websocket/streaming where relevant).
4. Nginx is no longer on the request path for migrated domains.
5. Long-term target: all backends upgraded to `https://` and strict mode passes.

## What remains to happen

1. Run canary:
   - `./phase7_5_nginx_to_caddy.sh --mode=canary`
2. Route canary traffic to Caddy using one method:
   - `curl --resolve` for zero-DNS-change testing, or
   - split-DNS/private DNS, or
   - explicit DNS cutover if desired.
3. Observe errors/latency/app behavior for at least 24 hours.
4. If canary is clean, run full:
   - `./phase7_5_nginx_to_caddy.sh --mode=full`
5. Move remaining routes in batches (DNS or split-DNS, depending on ingress model).
6. Validate each app after each batch.
7. After everything is stable, plan Nginx retirement.
8. Later hardening pass:
   - enable TLS on each backend service one by one
   - flip each corresponding upstream to `https://`
   - finally run `--strict-backend-https` and require it to pass.

## Risks and why mixed backend HTTP is acceptable short-term

1. Risk: backend HTTP is unencrypted on LAN.
   - Mitigation: traffic stays on trusted local network, temporary state only.
2. Risk: if strict mode is enabled too early, rollout blocks.
   - Mitigation: keep strict mode off until backend TLS coverage improves.
3. Risk: moving all DNS at once can create broad outage.
   - Mitigation: canary-first and batch DNS cutover.

## Operational notes

1. If Caddyfile already exists, phase 7.5 backs it up as:
   - `${CADDY_DATA_PATH}/Caddyfile.pre_phase7_5.`
2. Compose stack path for Caddy:
   - `${UNRAID_COMPOSE_DIR}/caddy/docker-compose.yml`
3. Script does not change Cloudflare DNS records automatically.
   - DNS updates are intentional/manual to keep blast radius controlled.
4. Do not set public Cloudflare proxied records to private `192.168.x.x` addresses.
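
The zero-DNS-change canary test (`curl --resolve`) listed under the remaining steps could look roughly like the sketch below. The helper names `resolve_arg` and `canary_check` are hypothetical, and `192.0.2.10` is a documentation placeholder for the Caddy host address, which this file does not record.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of phase7_5_nginx_to_caddy.sh.

# Build the host:port:address triple that curl's --resolve option expects.
resolve_arg() {
  local host="$1" ip="$2" port="${3:-443}"
  printf '%s:%s:%s' "$host" "$port" "$ip"
}

# Request https://<host>/ while pinning the hostname to the given IP,
# then print only the HTTP status code. DNS is left untouched.
canary_check() {
  local host="$1" ip="$2"
  curl --silent --show-error --output /dev/null \
       --write-out '%{http_code}\n' \
       --resolve "$(resolve_arg "$host" "$ip")" \
       "https://${host}/"
}

# Example (replace the placeholder IP with the real Caddy address):
#   canary_check tower.sintheus.com 192.0.2.10
```

Because `--resolve` only overrides name resolution for that one request, the TLS handshake still uses the real hostname. A certificate error here would therefore point at Caddy's certificate for that zone, not at the test method.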