Gitea Migration Toolkit

Automated migration of GitHub repositories to self-hosted Gitea, with backup mirroring and push-mirror offsite redundancy. 43 shell scripts, 9 config templates, ~8,000 lines of bash.

What This Does

Moves GitHub repos to a self-hosted Gitea instance on Unraid, sets up a backup Gitea mirror on Fedora, and keeps GitHub as an offsite push mirror. After migration, Gitea is the primary git host — all CI runs on Gitea Actions, GitHub receives automatic push mirrors, and Fedora pulls from Unraid on a schedule. Supports any number of repos via space-delimited REPO_NAMES in .env.

The entire process is driven from a MacBook over SSH. Nothing is installed on the remote machines beyond what the setup scripts explicitly provision.

Architecture

                    ┌──────────────────────────────────────────────┐
                    │               MacBook (Control Plane)        │
                    │   Runs all scripts locally, SSHs into hosts  │
                    │   Native macOS runner (launchd)              │
                    └──────────┬──────────────────┬────────────────┘
                               │ SSH              │ SSH
                    ┌──────────▼──────────┐  ┌────▼───────────────┐
                    │   Unraid (Primary)  │  │   Fedora (Backup)  │
                    │   Gitea + Caddy     │  │   Gitea (mirror)   │
                    │   Docker runners    │  │   Docker runners   │
                    │  macvlan networking │  │   Backup storage   │
                    └──────────┬──────────┘  └────▲───────────────┘
                               │                  │
                               │  pull mirror     │
                               │  (8h interval)   │
                               ├──────────────────┘
                               │
                               │ push mirror (on commit + 8h)
                               ▼
                    ┌──────────────────────┐
                    │   GitHub (Offsite)   │
                    │   Read-only mirror   │
                    │   Actions disabled   │
                    └──────────────────────┘

Data flow after migration:

  • Developers push to Gitea on Unraid (via HTTPS reverse proxy)
  • Gitea pushes to GitHub on every commit and on an 8-hour schedule
  • Fedora pulls from Unraid on an 8-hour schedule
  • Backup dumps are created on Unraid and SCP'd directly to Fedora
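
The push-mirror leg of this flow corresponds to Gitea's push-mirror API. The sketch below only builds and prints such a request rather than sending it. The host, organization, repo, and user names are placeholders, and Phase 6 may well construct the call differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values for illustration; none of these come from the toolkit.
GITEA_URL="https://gitea.example.com"
ORG="myorg"
REPO="myapp"
GITHUB_USER="octocat"

# Request body: sync on every commit and on an 8-hour interval.
payload=$(cat <<EOF
{
  "remote_address": "https://github.com/${GITHUB_USER}/${REPO}.git",
  "remote_username": "${GITHUB_USER}",
  "remote_password": "<github-token>",
  "interval": "8h0m0s",
  "sync_on_commit": true
}
EOF
)

endpoint="${GITEA_URL}/api/v1/repos/${ORG}/${REPO}/push_mirrors"

# Dry run: show what would be POSTed instead of calling the API.
printf 'POST %s\n%s\n' "$endpoint" "$payload"
```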

The 9-Phase Pipeline

| Phase | Script | What It Does |
|---|---|---|
| 1 | phase1_gitea_unraid.sh | Deploy Gitea on Unraid via Docker Compose, create admin user, generate API token, create organization |
| 2 | phase2_gitea_fedora.sh | Deploy Gitea on Fedora (backup instance), create admin user, generate backup API token |
| 3 | phase3_runners.sh | Get runner registration token, deploy all runners from runners.conf |
| 4 | phase4_migrate_repos.sh | Import repos from GitHub to Unraid, create pull mirrors on Fedora |
| 5 | phase5_migrate_pipelines.sh | Copy .github/workflows/ to .gitea/workflows/, apply context variable fixes |
| 6 | phase6_github_mirrors.sh | Configure push mirrors from Gitea to GitHub, disable GitHub Actions |
| 7 | phase7_branch_protection.sh | Apply branch protection rules to all repos |
| 8 | phase8_cutover.sh | Deploy Caddy HTTPS reverse proxy (Cloudflare DNS-01 or existing certs), mark GitHub repos as mirrors |
| 7.5 (optional) | phase7_5_nginx_to_caddy.sh | One-time multi-domain Nginx -> Caddy migration helper (canary/full), supports sintheus.com + privacyindesign.com in one Caddy |
| 9 | phase9_security.sh | Deploy Semgrep + Trivy + Gitleaks security scanning workflows |

Each phase has three scripts: the main script, a _post_check.sh that independently verifies success, and a _teardown.sh that cleanly reverses the phase.

File Structure

gitea-migration/
├── .env.example              # Configuration template (copy to .env)
├── runners.conf.example      # Runner definitions template
├── lib/common.sh             # Shared functions + .env validators
├── setup/
│   ├── configure_env.sh      # Interactive .env wizard (~63 prompts)
│   ├── configure_runners.sh  # Interactive runner definition wizard
│   ├── macbook.sh            # Local prerequisites (brew packages)
│   ├── unraid.sh             # Remote prerequisites (static binaries)
│   ├── fedora.sh             # Remote prerequisites (dnf packages)
│   ├── cross_host_ssh.sh     # SSH key exchange between Unraid and Fedora
│   ├── env_to_bitwarden.sh   # Export .env to Bitwarden JSON import format
│   ├── bitwarden_to_env.sh   # Restore .env from Bitwarden CLI
│   ├── nginx-to-caddy/       # Nginx inventory + basic conversion toolkit
│   ├── pi-monitoring/        # Raspberry Pi monitoring/control-plane module
│   └── cleanup.sh            # Manifest-driven rollback of setup
├── templates/                # Config templates (.tpl + envsubst)
│   ├── app.ini.tpl
│   ├── docker-compose-gitea.yml.tpl
│   ├── docker-compose-runner.yml.tpl
│   ├── Caddyfile.tpl
│   ├── docker-compose-caddy.yml.tpl
│   ├── runner-config.yaml.tpl
│   ├── com.gitea.runner.plist.tpl
│   ├── com.gitea.runner.newsyslog.conf.tpl
│   └── workflows/security-scan.yml.tpl
├── contracts/gitea-api.md    # API contract documentation
├── backup/
│   ├── backup_primary.sh     # Gitea dump, SCP to Fedora
│   └── restore_to_primary.sh # Restore dump to Unraid
├── preflight.sh              # 24 pre-flight validation checks
├── run_all.sh                # Full pipeline orchestration
├── post-migration-check.sh   # Read-only infrastructure state check
├── teardown_all.sh           # Reverse teardown (9 to 1)
├── phase7_5_nginx_to_caddy.sh # Optional one-time Nginx -> Caddy consolidation step
├── TODO.md                   # Phase 7.5 migration context, backlog, and DoD
├── manage_runner.sh          # Dynamic runner add/remove/list
├── phase{1-9}_*.sh           # Main phase scripts
├── phase{1-9}_post_check.sh  # Verification scripts
└── phase{1-9}_teardown.sh    # Reversal scripts

Design Decisions and Rationale

Why bash scripts instead of Ansible/Terraform/Pulumi?

The migration targets a handful of repos across 3 machines with a one-time execution path. Ansible is agentless, but it still needs a Python-based control node and Python on the managed hosts; Terraform manages ongoing state, which doesn't apply to a one-shot migration; Pulumi requires a language runtime. Bash scripts over SSH need only a few Homebrew packages on the Mac, are readable without framework knowledge, and leave no ongoing state to manage. The downside is more verbose error handling and no built-in parallelism, but for a sequential 9-phase pipeline that's acceptable.

Why a single MacBook control plane?

All scripts run from the MacBook and SSH into remotes. This means:

  • No agents, daemons, or software installed on servers beyond the migration targets
  • One place to look at logs, one place to re-run failed phases
  • The MacBook doesn't need to stay connected between phases; each phase is a self-contained run that can be re-run or torn down on its own
  • Trade-off: the MacBook must be on the same network (or VPN) as both servers

Why Docker Compose for Gitea but native binary for macOS runner?

Docker Desktop on macOS is heavyweight (~4 GB), requires a paid subscription for larger organizations, and is unreliable for long-running background services (it suspends when the Mac sleeps). A native act_runner binary with a launchd plist is about 30 MB, survives sleep/wake cycles, and by default starts at login via ~/Library/LaunchAgents/. For headless Macs or dedicated CI machines, set boot = true in runners.conf to install the plist to /Library/LaunchDaemons/ instead. The runner then starts at boot, before any user logs in (this requires sudo for plist installation and launchctl load/unload). On Linux, Docker runs containers natively with no VM layer, so Docker Compose is the obvious choice there.
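
As a rough illustration of the login-versus-boot choice, the helper below picks the plist directory from a boot flag. The function name is hypothetical and manage_runner.sh may be organized quite differently; only the two launchd directories are taken from the text above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# plist_dir BOOT: print the launchd directory for the runner plist.
# BOOT mirrors the boot = true|false setting in runners.conf.
plist_dir() {
  if [ "$1" = "true" ]; then
    # LaunchDaemons load at boot, before login; writing here needs sudo.
    printf '/Library/LaunchDaemons\n'
  else
    # LaunchAgents load when the owning user logs in; no sudo needed.
    printf '%s/Library/LaunchAgents\n' "$HOME"
  fi
}

echo "boot=false -> $(plist_dir false)"
echo "boot=true  -> $(plist_dir true)"
```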

Why envsubst templates instead of Jinja2/Helm/gomplate?

envsubst is a single binary from GNU gettext with zero dependencies. Templates are plain config files with ${VAR} placeholders — anyone can read them without learning a template language. The trade-off is no conditionals or loops in templates. The scripts work around this by using marker-block stripping with sed (e.g., sqlite3 vs external DB blocks in the docker-compose template).
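
A minimal sketch of the marker-block idea follows. The `# BEGIN:<name>` / `# END:<name>` marker syntax, the function names, and the template fragment are all assumptions made for illustration; the toolkit's actual markers may look different.

```shell
#!/usr/bin/env bash
set -euo pipefail

# strip_block NAME: delete the named block, markers included, from stdin.
strip_block() {
  sed "/^[[:space:]]*# BEGIN:$1\$/,/^[[:space:]]*# END:$1\$/d"
}

# keep_block NAME: keep the block body but drop the two marker lines.
keep_block() {
  sed -e "/^[[:space:]]*# BEGIN:$1\$/d" -e "/^[[:space:]]*# END:$1\$/d"
}

# Illustrative template fragment (single-quoted so ${VAR} stays literal).
template='services:
  gitea:
    environment:
      - GITEA__database__DB_TYPE=${GITEA_DB_TYPE}
      # BEGIN:external_db
      - GITEA__database__HOST=${GITEA_DB_HOST}
      # END:external_db'

export GITEA_DB_TYPE=sqlite3 GITEA_DB_HOST=""

# The "conditional" lives in the script, not in the template.
if [ "$GITEA_DB_TYPE" = "sqlite3" ]; then
  stripped=$(printf '%s\n' "$template" | strip_block external_db)
else
  stripped=$(printf '%s\n' "$template" | keep_block external_db)
fi

# Substitute only the listed variables so other ${...} text is left alone.
if command -v envsubst >/dev/null 2>&1; then
  printf '%s\n' "$stripped" | envsubst '${GITEA_DB_TYPE} ${GITEA_DB_HOST}'
fi
```

Passing an explicit variable list to envsubst keeps it from blanking out any other `$`-prefixed text that happens to appear in a config file.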

Why check-before-act idempotency instead of desired-state?

Every operation checks if its target already exists before creating it. This is simpler to implement in bash and easier to debug — you can see exactly which step was skipped vs executed. The trade-off is that it cannot detect drift (e.g., someone manually changed a Gitea setting between runs). For a one-time migration, drift detection adds complexity without value.
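
The pattern can be sketched as a small helper that takes a check command and a create command. Everything here is hypothetical: the `ensure` function is not from lib/common.sh, and a temp directory stands in for a real target such as a Gitea organization.

```shell
#!/usr/bin/env bash
set -euo pipefail

log_info() { printf '[INFO] %s\n' "$*" >&2; }

# ensure DESCRIPTION CHECK_CMD CREATE_CMD
# Runs CREATE_CMD only when CHECK_CMD reports that the target is missing.
ensure() {
  local desc=$1 check=$2 create=$3
  if "$check"; then
    log_info "SKIP   $desc (already exists)"
  else
    log_info "CREATE $desc"
    "$create"
  fi
}

# Stand-in target: a directory under a temp dir instead of a Gitea resource.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
created=0
org_exists() { [ -d "$workdir/org" ]; }
org_create() { mkdir "$workdir/org"; created=$((created + 1)); }

ensure "organization" org_exists org_create   # first run: creates
ensure "organization" org_exists org_create   # second run: skips
```

The log shows one CREATE line followed by one SKIP line, which is the "see exactly which step was skipped vs executed" property described above.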

Database support

All four Gitea-supported database backends are available: sqlite3, mysql, postgres, and mssql. Set GITEA_DB_TYPE in .env — sqlite3 is the default and needs no additional configuration. For external databases, the toolkit deploys a containerized database alongside Gitea (PostgreSQL 16, MySQL 8.0, or MSSQL 2022) with health checks, and the wizard prompts for connection details (host, port, name, user, password) only when needed. Backup/restore handles SQL dump import into the correct database engine.

Why Caddy reverse proxy?

Caddy with the Cloudflare DNS plugin handles wildcard TLS certificates automatically via DNS-01 challenge — no port 80 exposure needed, no certbot cron jobs, and zero-touch renewal. The slothcroissant/caddy-cloudflaredns Docker image bundles the plugin. For environments without Cloudflare, TLS_MODE=existing supports manual cert/key paths. Each host gets its own Caddy container on a dedicated macvlan IP.

Why mark GitHub repos as mirrors instead of archiving them?

An earlier version archived GitHub repos during Phase 8. This was changed because archived repos reject all pushes, which breaks the push mirrors configured in Phase 6. Instead, repos are marked with a [MIRROR] description prefix, wiki/projects/Pages are disabled, and the original settings are saved to a JSON state file for exact restoration on teardown.

Why separate Gitea instances instead of built-in replication?

Gitea doesn't have built-in multi-node replication. The Fedora instance is a completely independent Gitea that pulls mirrors from Unraid. This is simpler than database replication, works across different networks, and provides a fully functional standby — if Unraid dies, Fedora has a complete Gitea instance with all repos, not just a database replica.

Why the three-script-per-phase pattern (do / verify / undo)?

  • The main script may partially succeed before failing. The post-check tells you exactly what's working.
  • Post-checks can run independently — useful for debugging without re-running the whole phase.
  • Teardown scripts reverse only what their phase created, making selective rollback possible.

Why send logs to stderr and data to stdout?

All log_* functions write to stderr. API wrappers return JSON on stdout. This means you can do result=$(gitea_api GET /user) without log messages contaminating the JSON. Piping through jq works cleanly.
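
A small demonstration of the split, using a stub in place of the real wrapper. `fake_api` is a stand-in invented for this example; the actual gitea_api presumably makes an HTTP call.

```shell
#!/usr/bin/env bash
set -euo pipefail

log_info() { printf '[INFO] %s\n' "$*" >&2; }   # logs go to stderr

# Stub API wrapper: logs to stderr, prints a fixed JSON body on stdout.
fake_api() {
  log_info "GET $1"
  printf '{"login":"admin"}\n'
}

# Command substitution captures stdout only; the log line still reaches
# the terminal through stderr.
result=$(fake_api /user)

# Swapping the redirections captures the log line and discards the data.
logs=$(fake_api /user 2>&1 >/dev/null)

echo "data: $result"
echo "logs: $logs"
```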

Compromises

Shared admin credentials across instances

Unraid and Fedora use the same GITEA_ADMIN_USER and GITEA_ADMIN_PASSWORD. This simplifies setup (one set of credentials) and makes the pull mirror authentication straightforward (Fedora authenticates to Unraid using the shared admin password). The trade-off is reduced isolation — compromising one set of credentials compromises both instances. For a personal or small-team setup, this is acceptable.

Dynamic repo list

The scripts read REPO_NAMES from .env, a space-delimited list of repo names (e.g., REPO_NAMES="myapp backend infra"; the quotes matter if .env is sourced by the shell, since an unquoted value would be parsed as a command with arguments). The get_repo_list() helper in lib/common.sh splits it into individual names. Phase scripts use read -ra REPOS <<< "$REPO_NAMES" to build an array, supporting any number of repos.
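
The array-building step looks like this in isolation. The repo names are the example values from above, and the loop body is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

REPO_NAMES="myapp backend infra"   # normally loaded from .env

# read -a splits on IFS (whitespace by default) into the REPOS array.
read -ra REPOS <<< "$REPO_NAMES"

for repo in "${REPOS[@]}"; do
  printf 'would process: %s\n' "$repo"
done
printf '%d repos\n' "${#REPOS[@]}"
```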

Workflow migration is syntactic, not semantic

Phase 5 copies workflow files and does a sed replacement of github.* context variables to gitea.* inside ${{ }} expressions. It does NOT:

  • Validate YAML syntax
  • Check if referenced GitHub marketplace actions exist in Gitea
  • Migrate secrets, OIDC providers, or environment configurations
  • Handle composite actions or reusable workflows

Full semantic migration would require parsing YAML, understanding the GitHub Actions schema, and mapping every action to a Gitea equivalent. For a small number of repos, manual review after automated migration is faster than building a full converter.
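
To make "syntactic" concrete, here is one way such a replacement could be written. This is a sketch, not the expression Phase 5 actually uses: it rewrites github. to gitea. only inside ${{ }} expressions and leaves other text, such as github.com URLs, untouched. It assumes no literal } appears inside an expression.

```shell
#!/usr/bin/env bash
set -euo pipefail

# rewrite_contexts: filter stdin, renaming the github context to gitea
# inside ${{ ... }} only. The loop (:a ... ta) repeats the substitution
# until no expression on the line still contains "github.".
rewrite_contexts() {
  sed -E \
    -e ':a' \
    -e 's/(\$\{\{([^}]*[^[:alnum:]_])?)github\./\1gitea./' \
    -e 'ta'
}

sample='run: echo ${{ github.sha }} from https://github.com/org/repo
if: ${{ github.ref == github.head_ref }}'

converted=$(printf '%s\n' "$sample" | rewrite_contexts)
printf '%s\n' "$converted"
```

Because the rewrite is purely textual, it says nothing about whether the converted workflow is valid or whether its actions exist on the Gitea side, which is why the manual review step remains.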

No automatic rollback on failure

If Phase 5 fails halfway through, Phase 4's repos are still migrated and Phase 3's runners are still running. The user must manually run teardown_all.sh --through=5 to roll back. Automatic rollback was rejected because:

  • Determining "what succeeded" in a partially-failed phase is complex
  • Some failures are transient (network timeout) and re-running the phase is the correct fix
  • Automatic rollback of destructive operations (deleting repos) should always require human confirmation

Migration polling is timeout-based, not event-driven

Phase 4 polls the Gitea API every N seconds to check if a migration completed, with a configurable timeout. Gitea's migration API doesn't support webhooks or long-polling, so polling is the only option. The defaults (3-second interval, 600-second timeout) work for repos up to ~1 GB. Larger repos need a higher timeout via MIGRATION_POLL_TIMEOUT_SEC in .env.
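
The polling loop can be sketched as below. `poll_until` and `migration_done` are hypothetical names, the check is a counter rather than a Gitea API call, and the interval and timeout are shortened so the example finishes in a couple of seconds.

```shell
#!/usr/bin/env bash
set -euo pipefail

# poll_until CHECK_CMD INTERVAL_SEC TIMEOUT_SEC
# Returns 0 as soon as CHECK_CMD succeeds, 1 once TIMEOUT_SEC has elapsed.
poll_until() {
  local check=$1 interval=$2 timeout=$3
  local deadline=$((SECONDS + timeout))
  while true; do
    if "$check"; then return 0; fi
    if [ "$SECONDS" -ge "$deadline" ]; then return 1; fi
    sleep "$interval"
  done
}

# Stand-in check: succeeds on the third call. A real check would ask the
# Gitea API whether the repo has finished importing.
attempts=0
migration_done() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

# The text's defaults would be interval 3 and timeout 600.
if poll_until migration_done 1 10; then
  echo "migration finished after $attempts checks"
else
  echo "timed out" >&2
fi
```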

No parallel phase execution

Phases run strictly sequentially. Phase 4 could potentially import repos in parallel, and Phase 3 could deploy runners concurrently. Sequential execution was chosen because:

  • Bash parallelism (& + wait) makes error handling complex
  • The total migration time is dominated by network transfers, not script execution
  • Sequential execution produces readable, linear logs
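
For comparison, this is roughly what the rejected parallel variant involves. It is a sketch with a stub `import_repo` that fails for one repo; the bookkeeping of PIDs and names is the extra complexity the first bullet refers to.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stub import: fails for one name to exercise the error path.
import_repo() {
  [ "$1" != "backend" ]
}

# Start every import in the background, remembering which PID is which.
pids=()
names=()
for repo in myapp backend infra; do
  import_repo "$repo" &
  pids+=("$!")
  names+=("$repo")
done

# A bare "wait" would hide individual failures, so wait on each PID
# and collect the names of the jobs that exited non-zero.
failed=()
for i in "${!pids[@]}"; do
  if ! wait "${pids[$i]}"; then
    failed+=("${names[$i]}")
  fi
done

echo "failed imports: ${failed[*]:-none}"
```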

Docker socket mounted in runner containers

Runner containers get /var/run/docker.sock mounted, giving them root-equivalent access to the host's Docker daemon. This is required for runners to spawn job containers but is a security concern for untrusted code. For a private instance with trusted users, this is the standard Gitea runner deployment.

Native runner boot mode requires sudo

When boot = true is set in runners.conf, manage_runner.sh uses sudo for three operations: copying the plist to /Library/LaunchDaemons/, loading/unloading the service via launchctl, and removing the plist on teardown. The plist includes a <key>UserName</key> entry so the daemon process runs as the deploying user, not root. The newsyslog config (log rotation) always requires sudo regardless of boot mode, since it installs to /etc/newsyslog.d/.

Backup archives are unencrypted

gitea dump produces a zip file containing the database, all repos, and config. This is transferred over SSH (encrypted in transit) and stored on Fedora's filesystem. At-rest encryption is the user's responsibility (e.g., LUKS on the Fedora backup volume).

Phase 8 state snapshot lives in .manifests/

The JSON file that records pre-cutover GitHub repo settings is stored alongside install manifests in .manifests/. This directory is gitignored (machine-specific state). If the user deletes .manifests/ before running Phase 8 teardown, the teardown falls back to parsing the original description from the [MIRROR] ... — was: ORIGINAL format, but cannot restore homepage, wiki, projects, or Pages settings.

TLS certificate renewal

When TLS_MODE=cloudflare, Caddy handles certificate renewal automatically via the Cloudflare DNS-01 challenge, with no cron jobs or manual intervention. Caddy renews certificates ahead of expiry (roughly 30 days before, for 90-day certificates) and persists them in $CADDY_DATA_PATH/data. When TLS_MODE=existing, cert renewal is the user's responsibility.

Security Notes

  • Sensitive files (.env, runners.conf, .manifests/, *.pem, *.key, *.crt) are in .gitignore
  • API tokens are generated by the scripts and written to .env — never hardcoded
  • SSH uses BatchMode=yes (no password prompts) and StrictHostKeyChecking=accept-new
  • Passwords are only used for initial admin creation and token generation — all subsequent API calls use tokens
  • Runner containers mount the Docker socket — this is root-equivalent access to the host
  • Cross-host SSH keys are ed25519 with no passphrase (automation keys)

Prerequisites

| Machine | Requirements |
|---|---|
| MacBook | macOS, Homebrew, jq >= 1.6, curl >= 7.70, git >= 2.30, shellcheck >= 0.8, gh >= 2.0, bw >= 2.0 |
| Unraid | Linux, Docker >= 20.0, docker-compose >= 2.0, jq >= 1.6, passwordless sudo for SSH user |
| Fedora | Linux with dnf, Docker CE >= 20.0, docker-compose >= 2.0, jq >= 1.6, passwordless sudo for SSH user |
| Network | MacBook can SSH to both servers; for TLS_MODE=cloudflare, provide CLOUDFLARE_API_TOKEN plus PUBLIC_DNS_TARGET_IP (public ingress IP recommended; private IP requires PHASE8_ALLOW_PRIVATE_DNS_TARGET=true) |

Quick Start

cp .env.example .env
cp runners.conf.example runners.conf
# Edit both files, then:
./run_all.sh

Internal API URLs are not manually configured: scripts derive GITEA_INTERNAL_URL from UNRAID_GITEA_IP and GITEA_BACKUP_INTERNAL_URL from FEDORA_GITEA_IP.
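
The derivation presumably amounts to string assembly along these lines. The scheme, the GITEA_HTTP_PORT variable, and the addresses are assumptions for illustration (3000 is Gitea's default HTTP port); the scripts may use a different port or scheme.

```shell
#!/usr/bin/env bash
set -euo pipefail

UNRAID_GITEA_IP="192.0.2.10"   # example addresses from the documentation range
FEDORA_GITEA_IP="192.0.2.20"
GITEA_HTTP_PORT=3000           # hypothetical variable name

GITEA_INTERNAL_URL="http://${UNRAID_GITEA_IP}:${GITEA_HTTP_PORT}"
GITEA_BACKUP_INTERNAL_URL="http://${FEDORA_GITEA_IP}:${GITEA_HTTP_PORT}"

echo "primary: $GITEA_INTERNAL_URL"
echo "backup:  $GITEA_BACKUP_INTERNAL_URL"
```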

See USAGE_GUIDE.md for the full walkthrough, edge cases, and rollback procedures.
