TODO — Phase 7.5 Nginx -> Caddy Consolidation

Why this exists

This file records the decisions and migration context for the one-time "phase 7.5" work so that the reasoning is not lost between sessions.

What happened so far

  1. The original phase8_cutover.sh was designed for one wildcard zone (*.${CADDY_DOMAIN}), mainly for Gitea cutover.
  2. The homelab currently has two active DNS zones in scope:
    • sintheus.com (legacy services behind Nginx)
    • privacyindesign.com (new Gitea public endpoint)
  3. Decision made: run a one-time migration where a single Caddy instance serves both zones, then gradually retire Nginx.
  4. Implemented: phase7_5_nginx_to_caddy.sh to generate/deploy a multi-domain Caddyfile and run canary/full rollout modes.

Current design decisions

  1. Public ingress should be HTTPS-only for all migrated hostnames.
  2. Backend scheme is mixed for now:
    • Keep http:// upstreams where the service does not yet have TLS.
    • Keep https:// upstreams where TLS is already available.
  3. End-to-end HTTPS is a target state, not an immediate requirement.
  4. A strict toggle exists in phase 7.5:
    • --strict-backend-https fails if any upstream is http://.
  5. Canary-first rollout:
    • first migration target is tower.sintheus.com.
  6. Canary mode is additive:
    • preserves existing Caddy routes
    • updates only a managed canary block for tower.sintheus.com.
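As an illustration of the strict toggle described above, here is a minimal POSIX shell sketch. It is not the code in phase7_5_nginx_to_caddy.sh; the function name check_strict_backend_https and the "hostname|upstream" entry format are assumptions made for this example.

```shell
# Hypothetical sketch of a strict-backend check; not the real script's code.
# Each argument is one "hostname|upstream" pair (format assumed for this example).
# Returns 0 when every upstream is https://, 1 when any upstream is http://.
check_strict_backend_https() {
  failed=0
  for entry in "$@"; do
    host="${entry%%|*}"       # text before the first "|"
    upstream="${entry#*|}"    # text after the first "|"
    case "$upstream" in
      http://*)
        echo "strict mode: ${host} still uses ${upstream}" >&2
        failed=1
        ;;
    esac
  done
  return "$failed"
}
```

Called with the two canary entries, the check would fail as long as the Gitea upstream is still http://, which is the reason strict mode stays off during rollout.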

Host map and backend TLS status

Canary scope (default mode)

  • tower.sintheus.com -> https://192.168.1.82:443 (TLS backend; cert verify skipped)
  • ${GITEA_DOMAIN} -> http://${UNRAID_GITEA_IP}:3000 (HTTP backend for now)

Full migration scope

  • ai.sintheus.com -> http://192.168.1.82:8181
  • photos.sintheus.com -> http://192.168.1.222:2283
  • fin.sintheus.com -> http://192.168.1.233:8096
  • disk.sintheus.com -> http://192.168.1.52:80
  • pi.sintheus.com -> http://192.168.1.4:80
  • plex.sintheus.com -> http://192.168.1.111:32400
  • sync.sintheus.com -> http://192.168.1.119:8384
  • syno.sintheus.com -> https://100.108.182.16:5001 (verify skipped)
  • tower.sintheus.com -> https://192.168.1.82:443 (verify skipped)
  • ${GITEA_DOMAIN} -> http://${UNRAID_GITEA_IP}:3000
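The host map above translates fairly directly into Caddyfile site blocks. The sketch below shows one way a generator could render them; render_site_block and its "skip-verify" argument are hypothetical names, and the real script may structure its output differently. The Caddy directives used (reverse_proxy, transport http, tls_insecure_skip_verify) are standard Caddy v2 directives.

```shell
# Hypothetical helper: print a Caddyfile site block for one host.
# $1 = hostname, $2 = upstream URL,
# $3 = "skip-verify" to disable backend certificate verification (optional).
render_site_block() {
  host="$1"; upstream="$2"; verify="${3:-}"
  printf '%s {\n' "$host"
  if [ "$verify" = "skip-verify" ]; then
    # HTTPS backend whose certificate Caddy should not verify.
    printf '\treverse_proxy %s {\n' "$upstream"
    printf '\t\ttransport http {\n'
    printf '\t\t\ttls_insecure_skip_verify\n'
    printf '\t\t}\n'
    printf '\t}\n'
  else
    printf '\treverse_proxy %s\n' "$upstream"
  fi
  printf '}\n'
}

# Two entries taken from the host map above.
render_site_block tower.sintheus.com https://192.168.1.82:443 skip-verify
render_site_block photos.sintheus.com http://192.168.1.222:2283
```

Keeping the verify-skip decision as an explicit per-host argument makes it easy to see which backends still need a trusted certificate during the later hardening pass.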

Definition of done (phase 7.5)

Phase 7.5 is done only when all of the following are true:

  1. Caddy is running on Unraid with generated multi-domain config.
  2. Canary host tower.sintheus.com is reachable over HTTPS through Caddy.
  3. Canary routing is proven by at least one method:
    • curl --resolve tests, or
    • split-DNS/hosts override, or
    • intentional DNS cutover.
  4. Legacy Nginx remains available for non-migrated hosts during canary.
  5. No critical regressions observed for at least 24 hours on canary traffic.
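The curl --resolve option mentioned above can be wrapped in a small helper so the canary host is tested through Caddy without any DNS change. This is a sketch only: resolve_arg and canary_check are hypothetical names, and the Caddy IP address is an assumption that must be supplied by the operator.

```shell
# Hypothetical helper: build the value for curl's --resolve option,
# which has the form host:port:address.
# $1 = hostname, $2 = IP address of the Caddy host, $3 = port (default 443).
resolve_arg() {
  printf '%s:%s:%s\n' "$1" "${3:-443}" "$2"
}

# Request the host over HTTPS via the given Caddy IP and print the HTTP status.
# Fails (non-zero exit) on connection errors, TLS errors, or HTTP status >= 400.
# $1 = hostname, $2 = IP address of the Caddy host (assumed, not from this file).
canary_check() {
  host="$1"; caddy_ip="$2"
  curl --silent --show-error --fail --output /dev/null \
       --max-time 10 \
       --resolve "$(resolve_arg "$host" "$caddy_ip")" \
       --write-out '%{http_code}\n' \
       "https://${host}/"
}

# Example (not run here): canary_check tower.sintheus.com "$CADDY_IP"
```

Because certificate verification stays enabled, a successful run also confirms that Caddy presents a certificate valid for the requested hostname.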

Definition of done (final state after full migration)

  1. All selected domains route to Caddy through the intended ingress path:
    • LAN-only: split-DNS/private resolution to Caddy, or
    • public: DNS to WAN ingress that forwards 443 to Caddy.
  2. Caddy serves valid certificates for both zones.
  3. Functional checks pass for each service (UI load, API, websocket/streaming where relevant).
  4. Nginx is no longer on the request path for migrated domains.
  5. Long-term target: all backends upgraded to https:// and strict mode passes.

What remains to happen

  1. Run canary:
    • ./phase7_5_nginx_to_caddy.sh --mode=canary
  2. Route canary traffic to Caddy using one method:
    • curl --resolve for zero-DNS-change testing, or
    • split-DNS/private DNS, or
    • explicit DNS cutover if desired.
  3. Observe errors/latency/app behavior for at least 24 hours.
  4. If canary is clean, run full:
    • ./phase7_5_nginx_to_caddy.sh --mode=full
  5. Move remaining routes in batches (DNS or split-DNS, depending on ingress model).
  6. Validate each app after each batch.
  7. After everything is stable, plan Nginx retirement.
  8. Later hardening pass:
    • enable TLS on each backend service one by one
    • flip each corresponding upstream to https://
    • finally run --strict-backend-https and require it to pass.

Risks and why mixed backend HTTP is acceptable short-term

  1. Risk: backend HTTP is unencrypted on LAN.
    • Mitigation: traffic stays on the trusted local network, and this is a temporary state only.
  2. Risk: if strict mode is enabled too early, rollout blocks.
    • Mitigation: keep strict mode off until backend TLS coverage improves.
  3. Risk: moving all DNS at once can cause a broad outage.
    • Mitigation: roll out canary-first, then cut DNS over in batches.

Operational notes

  1. If a Caddyfile already exists, phase 7.5 backs it up as:
    • ${CADDY_DATA_PATH}/Caddyfile.pre_phase7_5.<timestamp>
  2. Compose stack path for Caddy:
    • ${UNRAID_COMPOSE_DIR}/caddy/docker-compose.yml
  3. The script does not change Cloudflare DNS records automatically.
    • DNS updates are intentional/manual to keep blast radius controlled.
  4. Do not point public Cloudflare proxied records at private 192.168.x.x addresses; they are not routable from the internet.
  5. Canary upsert behavior is domain-aware:
    • if a site block for the canary domain does not exist, it is added
    • if a site block exists, it is replaced in-place
    • previous block content is printed in logs before replacement
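The upsert behavior in note 5 could look roughly like the sketch below. It assumes the managed block is delimited by BEGIN/END marker comments, which is an assumption for this example and not something this file confirms; upsert_managed_block is likewise a hypothetical name.

```shell
# Hypothetical sketch: upsert a marker-delimited site block in a Caddyfile.
# The "# BEGIN managed" / "# END managed" markers are assumed, not the script's format.
# $1 = Caddyfile path, $2 = domain, $3 = file holding the new site block.
upsert_managed_block() {
  caddyfile="$1"; domain="$2"; block_file="$3"
  begin="# BEGIN managed ${domain}"
  end="# END managed ${domain}"
  tmp="$(mktemp)"

  if grep -qxF "$begin" "$caddyfile"; then
    # Log the previous block to stderr before replacing it.
    echo "previous block for ${domain}:" >&2
    awk -v b="$begin" -v e="$end" '$0==b{p=1} p{print} $0==e{p=0}' "$caddyfile" >&2
    # Replace in place: emit the new block where the old one started.
    awk -v b="$begin" -v e="$end" -v f="$block_file" '
      $0==b { print b; while ((getline line < f) > 0) print line; print e; skip=1; next }
      $0==e { skip=0; next }
      !skip { print }
    ' "$caddyfile" > "$tmp"
  else
    # Append: all existing routes are left untouched.
    { cat "$caddyfile"; printf '%s\n' "$begin"; cat "$block_file"; printf '%s\n' "$end"; } > "$tmp"
  fi

  # Overwrite the contents rather than moving the temp file, so the original
  # file keeps its inode and permissions (relevant for bind-mounted files).
  cat "$tmp" > "$caddyfile" && rm -f "$tmp"
}
```

Running the function twice with different block files leaves exactly one managed block for the domain, containing the second version, while unrelated site blocks stay byte-for-byte unchanged.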