refactor(k8s): invert scheduler env var to make affinity default
Change ACTIONS_RUNNER_USE_KUBE_SCHEDULER to ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER to make affinity-based scheduling the first-class (default) implementation.

Breaking Change:
- OLD: Set ACTIONS_RUNNER_USE_KUBE_SCHEDULER=true to enable affinity (opt-in)
- NEW: Affinity is enabled by default; set ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER=true to disable it (opt-out)

Code changes:
- utils.ts: Rename constant and invert useKubeScheduler() logic
- rwo-affinity-test.ts: Update tests to verify default affinity behavior
- ADR 0135: Update to reflect opt-out model
- README: Update guidance to reflect default behavior

Co-authored-by: Sisyphus <sisyphus@ohmyopencode.com>
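The breaking change is easiest to see value by value. The sketch below re-implements the old and new `useKubeScheduler()` predicates from the utils.ts hunk as hypothetical pure functions (`useKubeSchedulerOld`, `useKubeSchedulerNew`) that take the environment as a parameter instead of reading `process.env`; it illustrates the logic and is not the hook's code.

```typescript
// Hypothetical pure versions of the old and new useKubeScheduler() predicates.
// They take the environment as a parameter instead of reading process.env.
type Env = Record<string, string | undefined>

// Before this commit: affinity only when the opt-in variable is exactly 'true'.
function useKubeSchedulerOld(env: Env): boolean {
  return env['ACTIONS_RUNNER_USE_KUBE_SCHEDULER'] === 'true'
}

// After this commit: affinity unless the opt-out variable is exactly 'true'.
function useKubeSchedulerNew(env: Env): boolean {
  return env['ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER'] !== 'true'
}

// true -> node affinity through the scheduler, false -> legacy nodeName pinning.
const placement = (useScheduler: boolean): string =>
  useScheduler ? 'nodeAffinity' : 'nodeName'

const cases: Array<[string, Env]> = [
  ['nothing set', {}],
  ['USE=true', { ACTIONS_RUNNER_USE_KUBE_SCHEDULER: 'true' }],
  ['USE=false', { ACTIONS_RUNNER_USE_KUBE_SCHEDULER: 'false' }],
  ['DISABLE=true', { ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER: 'true' }],
  ['DISABLE=TRUE', { ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER: 'TRUE' }]
]

for (const [label, env] of cases) {
  console.log(
    `${label}: ${placement(useKubeSchedulerOld(env))} -> ${placement(useKubeSchedulerNew(env))}`
  )
}
```

Under these predicates, an unset environment moves from `nodeName` to `nodeAffinity` (the breaking part), a leftover `ACTIONS_RUNNER_USE_KUBE_SCHEDULER=false` no longer selects `nodeName`, and because the new check is an exact string comparison, `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER=TRUE` does not opt out.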
@@ -21,16 +21,16 @@ For environments where RWX is unavailable or undesirable, we support a `ReadWrit
 ### Operational Guidance

 1. **Preferred Model (RWX):** Operators should configure the runner with a PVC supporting `ReadWriteMany`.
-2. **Fallback Model (RWO):** If using `ReadWriteOnce`, operators must enable the Kubernetes scheduler integration by setting `ACTIONS_RUNNER_USE_KUBE_SCHEDULER=true`.
-3. **Node Selection:** When scheduler integration is enabled, the hook applies a `requiredDuringSchedulingIgnoredDuringExecution` node affinity targeting the runner's current node (`kubernetes.io/hostname`).
+2. **Fallback Model (RWO):** If using `ReadWriteOnce`, the Kubernetes scheduler integration is enabled by default. Operators can optionally disable it by setting `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER=true` (not recommended).
+3. **Node Selection:** By default, the hook applies a `requiredDuringSchedulingIgnoredDuringExecution` node affinity targeting the runner's current node (`kubernetes.io/hostname`).
 4. **Implementation Details:**
 - The hook determines the node name via `getCurrentNodeName()` and applies affinity in `packages/k8s/src/k8s/index.ts` (lines 101, 165).
-- The scheduler behavior is toggled by the `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` environment variable, as defined in `packages/k8s/src/k8s/utils.ts` (line 16).
+- The scheduler is enabled by default. Setting `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER=true` disables it, as defined in `packages/k8s/src/k8s/utils.ts` (line 16).
 - The PVC claim name defaults to `${ACTIONS_RUNNER_POD_NAME}-work` unless overridden by `ACTIONS_RUNNER_CLAIM_NAME` (`packages/k8s/src/hooks/constants.ts`, lines 27-33).

 ### Non-Recommendations

-We explicitly do **not** recommend the use of `spec.nodeName` for operator-driven scheduling. While the hook uses `nodeName` as a legacy fallback when `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` is not set to `true` (`packages/k8s/src/k8s/index.ts`, lines 103, 167), this bypasses the Kubernetes scheduler and can lead to scheduling failures or resource imbalances. Operators should always prefer the affinity-based approach for RWO volumes.
+We explicitly do **not** recommend the use of `spec.nodeName` for operator-driven scheduling. While the hook uses `nodeName` as a legacy fallback when `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER` is set to `true` (`packages/k8s/src/k8s/index.ts`, lines 103, 167), this bypasses the Kubernetes scheduler and can lead to scheduling failures or resource imbalances. Operators should prefer the default affinity-based approach for RWO volumes.

 ## Alternatives

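To make the two placement strategies concrete, here is a sketch of the pod-spec fragment each one produces. `placeOnRunnerNode` and the simplified interfaces are hypothetical stand-ins for the hook's logic in `packages/k8s/src/k8s/index.ts`; only the Kubernetes field names (`nodeName`, `affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution`, `nodeSelectorTerms`, `matchExpressions`) come from the real API. Like the ADR, it assumes the node's `kubernetes.io/hostname` label equals its node name, which holds on many clusters but is not guaranteed by Kubernetes.

```typescript
// Simplified structural types for the PodSpec fields involved; the field
// names match the Kubernetes API, the interfaces themselves are stand-ins.
interface NodeSelectorRequirement {
  key: string
  operator: 'In'
  values: string[]
}

interface PodPlacement {
  nodeName?: string
  affinity?: {
    nodeAffinity: {
      requiredDuringSchedulingIgnoredDuringExecution: {
        nodeSelectorTerms: { matchExpressions: NodeSelectorRequirement[] }[]
      }
    }
  }
}

// Hypothetical helper: pin a job pod to the runner's node.
// - useKubeScheduler = true (the new default): a hard node affinity, so the pod
//   still goes through the scheduler, which checks resource fit before binding.
// - useKubeScheduler = false (DISABLE_KUBE_SCHEDULER=true): set nodeName
//   directly, which bypasses the scheduler entirely.
function placeOnRunnerNode(nodeName: string, useKubeScheduler: boolean): PodPlacement {
  if (!useKubeScheduler) {
    return { nodeName }
  }
  return {
    affinity: {
      nodeAffinity: {
        requiredDuringSchedulingIgnoredDuringExecution: {
          nodeSelectorTerms: [
            {
              matchExpressions: [
                { key: 'kubernetes.io/hostname', operator: 'In', values: [nodeName] }
              ]
            }
          ]
        }
      }
    }
  }
}

// 'worker-1' is a placeholder node name.
console.log(JSON.stringify(placeOnRunnerNode('worker-1', true), null, 2))
console.log(JSON.stringify(placeOnRunnerNode('worker-1', false)))
```

Both variants pin the pod to the same node; the difference the ADR cares about is whether the scheduler is involved in that decision.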
@@ -40,13 +40,13 @@ We explicitly do **not** recommend the use of `spec.nodeName` for operator-drive
 ## Consequences

 - **Flexibility:** RWX users benefit from the ability to schedule job pods on any node in the cluster, maximizing resource utilization.
-- **Node Coupling:** RWO users remain coupled to the node where the runner pod is running. The hook ensures job pods are scheduled on the same node via affinity to maintain workspace integrity.
-- **Configuration:** Operators must be aware of the `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` toggle when using RWO. This toggle controls whether the hook uses `nodeName` (bypassing the scheduler) or node affinity (using the scheduler) to pin the pod to the runner's node. RWX configurations do not require this toggle for basic operation.
+- **Node Coupling:** RWO users remain coupled to the node where the runner pod is running. The hook ensures job pods are scheduled on the same node via affinity (enabled by default) to maintain workspace integrity.
+- **Configuration:** Operators using RWO can rely on the default affinity-based scheduling. Setting `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER=true` falls back to legacy `nodeName` pinning (not recommended). RWX setups require no special configuration for basic operation.

 ## Migration Guidance

-Operators migrating from an RWO setup that relied on default `nodeName` behavior to a more robust affinity-based setup should:
-1. Ensure the runner pod has the `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` environment variable set to `true`.
+Operators migrating from an RWO setup that relied on legacy `nodeName` behavior can keep that behavior by setting `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER=true`, but should move to the default affinity-based scheduling for better scheduler integration:
+1. Remove the `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER` environment variable to use the default affinity-based scheduling.
 2. Verify that the runner's ServiceAccount has the necessary permissions to list pods (to determine its own node).

 ## Non-Goals

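As far as this diff shows, the old opt-in variable is no longer consulted at all: the `ENV_USE_KUBE_SCHEDULER` constant is removed and `useKubeScheduler()` reads only the new variable, so a leftover `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` is silently ignored. A pre-flight check along these lines could flag that during migration; `checkSchedulerEnv` and its warning texts are hypothetical and not part of the hook.

```typescript
type Env = Record<string, string | undefined>

interface SchedulerEnvReport {
  mode: 'nodeAffinity' | 'nodeName'
  warnings: string[]
}

// Hypothetical migration check: reports the placement mode implied by the new
// opt-out logic and flags settings that no longer do what an operator expects.
function checkSchedulerEnv(env: Env): SchedulerEnvReport {
  const warnings: string[] = []
  const legacy = env['ACTIONS_RUNNER_USE_KUBE_SCHEDULER']
  const disable = env['ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER']

  if (legacy !== undefined) {
    warnings.push(`ACTIONS_RUNNER_USE_KUBE_SCHEDULER=${legacy} is ignored; remove it`)
  }
  if (disable === 'true') {
    warnings.push('legacy nodeName pinning is active (not recommended)')
  } else if (disable !== undefined) {
    // Only the exact string 'true' disables affinity, so any other value is a no-op.
    warnings.push(`ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER=${disable} has no effect`)
  }

  return { mode: disable === 'true' ? 'nodeName' : 'nodeAffinity', warnings }
}

console.log(checkSchedulerEnv({ ACTIONS_RUNNER_USE_KUBE_SCHEDULER: 'true' }))
console.log(checkSchedulerEnv({ ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER: 'true' }))
```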
@@ -30,7 +30,7 @@ rules:
 - The `ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER` env should be set to true to prevent the runner from running any jobs outside of a container
 - The runner pod should map a persistent volume claim into the `_work` directory
 - The `ACTIONS_RUNNER_CLAIM_NAME` env should be set to the persistent volume claim that contains the runner's working directory, otherwise it defaults to `${ACTIONS_RUNNER_POD_NAME}-work`
-- The `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` env can be set to `true` to enable the Kubernetes scheduler for job pods. When set to `true`, the hook uses `nodeAffinity` to ensure job pods are scheduled correctly (essential for `ReadWriteOnce` volumes). If not set, the hook defaults to a legacy mode where job pods are pinned to the same node as the runner pod using `nodeName`.
+- By default, the hook uses the Kubernetes scheduler with `nodeAffinity` to ensure job pods are scheduled correctly (essential for `ReadWriteOnce` volumes). Setting `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER=true` will fall back to legacy `nodeName` pinning (not recommended).

 ## Storage Guidance
 The K8s hooks require a shared volume between the runner pod and the job pods to share the workspace and other internal directories.

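The claim-name bullet above describes a simple fallback. A sketch of that rule, where `resolveClaimName` is a hypothetical name (the real default lives in `packages/k8s/src/hooks/constants.ts`, per the ADR) and the handling of empty or missing values is an assumption of this sketch:

```typescript
type Env = Record<string, string | undefined>

// Hypothetical helper mirroring the documented rule: an explicit
// ACTIONS_RUNNER_CLAIM_NAME wins, otherwise `${ACTIONS_RUNNER_POD_NAME}-work`.
// Treating an empty string as "unset" and throwing when the pod name is missing
// are assumptions of this sketch, not confirmed hook behavior.
function resolveClaimName(env: Env): string {
  const explicit = env['ACTIONS_RUNNER_CLAIM_NAME']
  if (explicit) {
    return explicit
  }
  const podName = env['ACTIONS_RUNNER_POD_NAME']
  if (!podName) {
    throw new Error('ACTIONS_RUNNER_POD_NAME is required to derive the default claim name')
  }
  return `${podName}-work`
}

// 'runner-abc' and 'shared-work' are placeholder values.
console.log(resolveClaimName({ ACTIONS_RUNNER_POD_NAME: 'runner-abc' }))
console.log(
  resolveClaimName({ ACTIONS_RUNNER_POD_NAME: 'runner-abc', ACTIONS_RUNNER_CLAIM_NAME: 'shared-work' })
)
```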
@@ -41,13 +41,13 @@ The preferred way to configure storage is using a `ReadWriteMany` (RWX) Persiste
 To migrate from RWO to RWX:
 1. Provision a new `ReadWriteMany` StorageClass if one is not available.
 2. Update your PVC definition to use `accessModes: [ReadWriteMany]`.
-3. Remove the `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` environment variable, as affinity is no longer required for pod placement.
+3. No additional environment variables are needed; affinity-based scheduling is the default.

 ### RWO Fallback (Affinity-based)
 If `ReadWriteMany` storage is not available, you can use `ReadWriteOnce` (RWO) storage. In this mode, all job pods must be scheduled on the same node as the runner pod that owns the PVC.

 To enable this safely:
-1. Ensure `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` is set to `true`.
+1. The default affinity-based scheduling works automatically. Do not set `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER`.
 2. The hooks will automatically add a `nodeAffinity` to the job pods, ensuring they are scheduled on the same node as the runner pod (`kubernetes.io/hostname` match).

 > **Note:** We do not recommend manually setting `nodeName` in the pod template, as the hooks handle node placement automatically via affinity when the scheduler is enabled.

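Step 2 of the RWX migration changes only the access mode of the work-volume claim. The sketch below builds such a claim as a plain object and records why RWO forces co-location. `workVolumeClaim`, `requiresSameNode`, the `runner-abc` and `my-rwx-class` names, and the `10Gi` size are hypothetical placeholders; `accessModes`, `storageClassName`, and `resources.requests.storage` are standard `PersistentVolumeClaim` fields.

```typescript
type AccessMode = 'ReadWriteOnce' | 'ReadWriteMany'

// Simplified shape of a PersistentVolumeClaim manifest; field names follow the
// Kubernetes API, but only the fields used here are modeled.
interface PersistentVolumeClaim {
  apiVersion: 'v1'
  kind: 'PersistentVolumeClaim'
  metadata: { name: string }
  spec: {
    accessModes: AccessMode[]
    storageClassName: string
    resources: { requests: { storage: string } }
  }
}

// Hypothetical builder for the runner's work-volume claim. The name follows the
// documented `${ACTIONS_RUNNER_POD_NAME}-work` default; the size is a placeholder.
function workVolumeClaim(
  runnerPodName: string,
  accessMode: AccessMode,
  storageClassName: string
): PersistentVolumeClaim {
  return {
    apiVersion: 'v1',
    kind: 'PersistentVolumeClaim',
    metadata: { name: `${runnerPodName}-work` },
    spec: {
      accessModes: [accessMode],
      storageClassName,
      resources: { requests: { storage: '10Gi' } }
    }
  }
}

// Storage constraint only: an RWO volume is mounted read-write by a single node,
// so every pod using it must land on that node. Placement itself is still
// governed by the hook's scheduler setting.
function requiresSameNode(claim: PersistentVolumeClaim): boolean {
  return claim.spec.accessModes.indexOf('ReadWriteMany') === -1
}

const rwx = workVolumeClaim('runner-abc', 'ReadWriteMany', 'my-rwx-class')
console.log(JSON.stringify(rwx, null, 2), requiresSameNode(rwx))
```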
@@ -13,7 +13,8 @@ export const DEFAULT_CONTAINER_ENTRY_POINT_ARGS = [`-f`, `/dev/null`]
 export const DEFAULT_CONTAINER_ENTRY_POINT = 'tail'

 export const ENV_HOOK_TEMPLATE_PATH = 'ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE'
-export const ENV_USE_KUBE_SCHEDULER = 'ACTIONS_RUNNER_USE_KUBE_SCHEDULER'
+export const ENV_DISABLE_KUBE_SCHEDULER =
+  'ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER'

 export function containerVolumes(
   userMountVolumes: Mount[] = [],

@@ -374,7 +375,7 @@ export function readExtensionFromFile(): k8s.V1PodTemplateSpec | undefined {
 }

 export function useKubeScheduler(): boolean {
-  return process.env[ENV_USE_KUBE_SCHEDULER] === 'true'
+  return process.env[ENV_DISABLE_KUBE_SCHEDULER] !== 'true'
 }

 export enum PodPhase {

@@ -3,7 +3,7 @@ import { cleanupJob } from '../src/hooks'
 import { prepareJob } from '../src/hooks/prepare-job'
 import { TestHelper } from './test-setup'
 import { getPodByName } from '../src/k8s'
-import { ENV_USE_KUBE_SCHEDULER } from '../src/k8s/utils'
+import { ENV_DISABLE_KUBE_SCHEDULER } from '../src/k8s/utils'

 jest.useRealTimers()

@@ -22,12 +22,10 @@ describe('RWO Affinity Behavior (Scheduler Mode)', () => {
   afterEach(async () => {
     await cleanupJob()
     await testHelper.cleanup()
-    delete process.env[ENV_USE_KUBE_SCHEDULER]
+    delete process.env[ENV_DISABLE_KUBE_SCHEDULER]
   })

-  it('should add nodeAffinity with hostname selector when scheduler mode is enabled', async () => {
-    process.env[ENV_USE_KUBE_SCHEDULER] = 'true'
-
+  it('should add nodeAffinity with hostname selector by default', async () => {
     await prepareJob(prepareJobData.args, prepareJobOutputFilePath)

     const content = JSON.parse(

@@ -65,7 +63,7 @@ describe('RWO Affinity Behavior (Scheduler Mode)', () => {
   })

   it('should NOT add nodeAffinity when scheduler mode is disabled', async () => {
-    process.env[ENV_USE_KUBE_SCHEDULER] = 'false'
+    process.env[ENV_DISABLE_KUBE_SCHEDULER] = 'true'

     await prepareJob(prepareJobData.args, prepareJobOutputFilePath)

@@ -82,9 +80,7 @@ describe('RWO Affinity Behavior (Scheduler Mode)', () => {
     expect(pod.spec?.nodeName).toBeDefined()
   })

-  it('should fail assertion if affinity block is missing when scheduler mode is enabled', async () => {
-    process.env[ENV_USE_KUBE_SCHEDULER] = 'true'
-
+  it('should fail assertion if affinity block is missing by default', async () => {
     await prepareJob(prepareJobData.args, prepareJobOutputFilePath)

     const content = JSON.parse(

@@ -113,9 +109,7 @@ describe('RWO Affinity Behavior (Scheduler Mode)', () => {
     ).toBeGreaterThan(0)
   })

-  it('should use correct node name from runner pod in affinity values', async () => {
-    process.env[ENV_USE_KUBE_SCHEDULER] = 'true'
-
+  it('should use correct node name from runner pod in affinity values by default', async () => {
     const runnerPodName = process.env.ACTIONS_RUNNER_POD_NAME

     await prepareJob(prepareJobData.args, prepareJobOutputFilePath)

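The `afterEach` hook now deletes `ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER` because the default-behavior tests depend on that variable being absent: if the "disabled" test leaked `'true'` into later tests, they would silently exercise `nodeName` pinning instead of the default. A framework-free sketch of the same isolation, with `withEnv` and `isAffinityEnabled` as hypothetical helpers that work on any env-like record (for example `process.env`):

```typescript
type Env = Record<string, string | undefined>

// Hypothetical helper: apply overrides for the duration of fn, then restore the
// previous values, deleting keys that were absent before, even if fn throws.
function withEnv<T>(env: Env, overrides: Env, fn: () => T): T {
  const saved = new Map<string, string | undefined>()
  for (const key of Object.keys(overrides)) {
    saved.set(key, env[key])
    const value = overrides[key]
    if (value === undefined) {
      delete env[key]
    } else {
      env[key] = value
    }
  }
  try {
    return fn()
  } finally {
    saved.forEach((value, key) => {
      if (value === undefined) {
        delete env[key]
      } else {
        env[key] = value
      }
    })
  }
}

// Same predicate as the new useKubeScheduler(), on an injected env record.
const isAffinityEnabled = (env: Env): boolean =>
  env['ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER'] !== 'true'

const env: Env = {}
const insideEnabled = withEnv(env, { ACTIONS_RUNNER_DISABLE_KUBE_SCHEDULER: 'true' }, () =>
  isAffinityEnabled(env)
)
console.log(insideEnabled, isAffinityEnabled(env))
```

Here the first value logged is `false` (opted out inside the callback) and the second is `true`, because the variable is removed again afterwards, which is the property the updated `afterEach` preserves between tests.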