Skip to content

CI: Detect changed paths and gate cuda.bindings tests#1908

Open
cpcloud wants to merge 5 commits intoNVIDIA:mainfrom
cpcloud:ci-detect-changes-299
Open

CI: Detect changed paths and gate cuda.bindings tests#1908
cpcloud wants to merge 5 commits intoNVIDIA:mainfrom
cpcloud:ci-detect-changes-299

Conversation

@cpcloud
Copy link
Copy Markdown
Contributor

@cpcloud cpcloud commented Apr 14, 2026

Summary

  • Adds a detect-changes job to ci.yml using native git merge-base + git diff --name-only to classify which top-level modules changed in a PR
  • Composes gating outputs respecting the dependency graph: cuda_pathfinder → cuda_bindings → cuda_core
  • Threads skip-bindings-test input through test-wheel-linux.yml and test-wheel-windows.yml so cuda.bindings tests are skipped when only unrelated modules changed
  • Splits the overloaded SKIP_CUDA_BINDINGS_TEST env var into two orthogonal flags: USE_BACKPORT_BINDINGS (artifact fetch for CTK major mismatch) and SKIP_CUDA_BINDINGS_TEST (test-time gate)
  • Non-PR events (push to main, tag, schedule) unconditionally run everything
  • Build-side gating outputs are emitted but not yet consumed (follow-up PR)

Closes #299

🤖 Generated with Claude Code

@cpcloud cpcloud added this to the cuda.core v1.0.0 milestone Apr 14, 2026
@cpcloud cpcloud added enhancement Any code-related improvements P0 High priority - Must do! CI/CD CI/CD infrastructure labels Apr 14, 2026
@cpcloud cpcloud self-assigned this Apr 14, 2026
@github-actions
Copy link
Copy Markdown

cpcloud and others added 5 commits April 14, 2026 18:15
Add a detect-changes job to ci.yml that classifies which top-level
modules were touched by a PR (cuda_bindings, cuda_core, cuda_pathfinder,
cuda_python, cuda_python_test_helpers, shared infra) using
dorny/paths-filter. The job emits composed gating outputs that account
for the dependency graph (pathfinder -> bindings -> core).

Thread a new skip-bindings-test input through the reusable test-wheel
workflows so cuda.bindings tests (and their Cython counterparts) are
skipped when the detect-changes output for test_bindings is false. For
PRs that only touch cuda_core this skips the expensive bindings suite
while still running cuda.core and cuda.pathfinder tests against the
wheel built in the current CI run.

Split the existing SKIP_CUDA_BINDINGS_TEST env var in ci/tools/env-vars
into two orthogonal flags: USE_BACKPORT_BINDINGS drives the backport
branch download path (CTK major mismatch), while SKIP_CUDA_BINDINGS_TEST
remains the test-time gate. This lets path-filter-based skips reuse the
existing SKIP_CUDA_BINDINGS_TEST plumbing without triggering a
cross-branch artifact fetch.

Non-PR events (push to main, tag, schedule, workflow_dispatch) still
exercise the full pipeline.

Refs NVIDIA#299
Two fixes from code review:

1. Set `base: main` on dorny/paths-filter so backport PRs targeting
   non-default branches still diff against main for path detection.

2. Decouple SKIP_CYTHON_TEST from SKIP_BINDINGS_TEST_OVERRIDE. The
   path-filter override only skips bindings tests; cuda.core Cython
   tests should still run on core-only PRs. Cython skip is now driven
   solely by CTK minor-version mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dorny/paths-filter v3 already diffs against the repo default branch
for non-default-branch pushes. Explicit base: main was redundant and
would produce wrong baselines for backport PRs targeting release
branches (all files seen as changed vs main).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove third-party dorny/paths-filter action dependency. Use
git merge-base + git diff --name-only + grep instead. Same
behavior, zero supply-chain risk, full control over base ref
resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cpcloud cpcloud force-pushed the ci-detect-changes-299 branch from 67f8baf to d205955 Compare April 14, 2026 22:15
Copy link
Copy Markdown
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated with Cursor GPT-5.4 Extra High Fast


Reviewed the final tree for PR 1908 at commit d205955

actionlint passed on the touched workflows, so the notes below are about CI behavior / control flow, not YAML syntax.

1. High: detect-changes can fail without clearly failing the final required status

Evidence

  • All three test jobs now depend on detect-changes:
    • .github/workflows/ci.yml:284
    • .github/workflows/ci.yml:309
    • .github/workflows/ci.yml:334
  • The terminal checks job does not depend on detect-changes directly and only inspects:
    • doc
    • test-linux-64
    • test-linux-aarch64
    • test-windows
      at .github/workflows/ci.yml:365
  • The shell logic in checks only treats failure and cancelled as errors:
    • .github/workflows/ci.yml:394
    • .github/workflows/ci.yml:398

Why this matters

  • If detect-changes fails, the downstream test jobs can become skipped because a required dependency did not complete successfully.
  • In that state, checks may still exit 0 because it does not treat those skipped results as an error.
  • Result: a broken path-detection step could suppress or reduce CI coverage without producing a clearly failing final status.

Agent hint

  • Re-check GitHub Actions needs semantics for always() plus an upstream failed prerequisite leading to downstream skipped.
  • The most direct fix is probably one of:
    • add detect-changes to checks.needs and explicitly require needs.detect-changes.result == 'success'
    • or treat unexpected skipped test jobs as fatal when doc_only != true
  • Preserve the intended [doc-only] skip behavior while making accidental skips fail closed.

2. Medium: diff baseline is hardcoded to origin/main

Evidence

  • detect-changes computes the merge base against origin/main at .github/workflows/ci.yml:118.
  • The repo also maintains a release/backport branch in ci/versions.yml:4:
    • backport_branch: "12.9.x"

Why this matters

  • For PR 1908, whose base is main, this is fine.
  • If the same pull-request/* CI path is ever used for PRs targeting 12.9.x or another non-main branch, changed-path classification will be wrong.
  • Wrong classification can suppress outputs such as test_bindings, which means under-testing on backport / release PRs.

Agent hint

  • First confirm whether pull-request/* branches are ever created for non-main PR bases in this repo.
  • If yes, derive the true PR base branch before git merge-base instead of assuming main.
  • Because this workflow runs on pushes to pull-request/*, github.base_ref may not be available; one workable approach is to resolve the PR number from github.ref_name and query gh pr view --json baseRefName.
  • If non-main PR bases are impossible by construction, add a short comment documenting that invariant so future readers do not "fix" it incorrectly.
  • This area already had churn within the PR history, so the next agent should validate the actual branch-generation semantics before changing behavior.

Bottom line

  • I did not find a concrete bug in the USE_BACKPORT_BINDINGS / SKIP_CUDA_BINDINGS_TEST split itself.
  • The two findings above are the only items that looked important enough to hand back to the PR author.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI/CD CI/CD infrastructure enhancement Any code-related improvements P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: Detect if build/test is needed

2 participants