feat: auto-resolve stale run locks [DATA-31226] #19
Open
quocnguyendinh wants to merge 5 commits into master from
Add CLAUDE.md and knowledge/ folder documenting architecture, design patterns, conventions, and component layers to enable AI coding agents to follow existing patterns when working on the codebase.
Refs: DATA-32873
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move knowledge files from knowledge/ to .claude/rules/ so they are automatically loaded into every Claude Code session. Replace root CLAUDE.md with .claude/CLAUDE.md.
Refs: DATA-32873
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each rule file now has a changelog table at the bottom tracking date, PR, and description of changes for decision traceability.
Refs: DATA-32873
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When pods are killed during redeployment, RUNNING locks are left unreleased, blocking future runs. This adds automatic resolution of stale RUNNING records older than a configurable timeout (default 3h) before checking for active locks.
- Add --lock-timeout CLI option (hours, default 3)
- Add resolve_stale_running_records() at repository and service layer
- Call resolve before checking for running records in RunManager
- Map created_at column in DiffaCheckRun ORM model
Refs: DATA-31226
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
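At the repository layer, resolving stale locks amounts to one conditional UPDATE. The following is a minimal sketch using the standard-library sqlite3 module. It assumes created_at is stored as Unix epoch seconds and that statuses are the literal strings 'RUNNING' and 'FAILED'. The real resolve_stale_running_records() goes through the DiffaCheckRun ORM model, and its actual signature is not shown in this PR.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_stale_running_records(
    conn: sqlite3.Connection,
    lock_timeout_hours: float,
    now: Optional[datetime] = None,
) -> int:
    """Mark RUNNING rows older than the timeout as FAILED; return how many changed.

    Sketch only: assumes a diffa_check_runs table whose created_at column
    holds Unix epoch seconds (the real schema and ORM mapping may differ).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = (now - timedelta(hours=lock_timeout_hours)).timestamp()
    with conn:  # commits on success, rolls back if the UPDATE raises
        cur = conn.execute(
            "UPDATE diffa_check_runs SET status = 'FAILED' "
            "WHERE status = 'RUNNING' AND created_at < ?",
            (cutoff,),
        )
    return cur.rowcount  # number of stale locks released
```

The sketch filters on both status and created_at inside a single UPDATE rather than reading rows and writing them back. This avoids overwriting a run that finishes between the read and the write.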
Update architecture, cli, configuration, and database-layer rule files with lock timeout feature details and changelog entries for PR #19.
Refs: DATA-31226
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updated from 0a26841 to 281919c
Context
When Kubernetes pods are killed (e.g., during downgrade/redeployment), the RUNNING record that Diffa's start_run() created in diffa_check_runs is never released, because the signal handlers never fire. Future runs see the stale RUNNING record and throw RunningCheckRunsException, requiring manual intervention.
The fix: before raising RunningCheckRunsException, check whether the RUNNING records are older than a timeout threshold. If they are, automatically mark them as FAILED and proceed.
Summary
- Add --lock-timeout CLI option to configure the timeout in hours (default 3)
Jira
DATA-31226
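The flow described under Context can be sketched in plain Python: resolve stale locks first, then raise RunningCheckRunsException only if a RUNNING record survives. Everything here is hypothetical scaffolding. CheckRun, InMemoryCheckRunService, check_run_lock and parse_lock_timeout stand in for the real service layer, RunManager and CLI, none of which appear in this PR. The argparse parser and the float type for --lock-timeout are also assumptions; the PR only states that the value is in hours with a default of 3.

```python
import argparse
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


class RunningCheckRunsException(Exception):
    """Raised when a RUNNING check run younger than the timeout still holds the lock."""


@dataclass
class CheckRun:
    run_id: int
    status: str
    created_at: datetime


@dataclass
class InMemoryCheckRunService:
    """Hypothetical stand-in for the service layer over diffa_check_runs."""

    runs: List[CheckRun] = field(default_factory=list)

    def resolve_stale_check_runs(self, lock_timeout_hours: float, now: datetime) -> int:
        cutoff = now - timedelta(hours=lock_timeout_hours)
        stale = [r for r in self.runs if r.status == "RUNNING" and r.created_at < cutoff]
        for run in stale:
            run.status = "FAILED"  # release the lock left behind by a killed pod
        return len(stale)

    def getting_running_check_runs(self) -> List[CheckRun]:
        return [r for r in self.runs if r.status == "RUNNING"]


def check_run_lock(service: InMemoryCheckRunService, lock_timeout_hours: float,
                   now: Optional[datetime] = None) -> None:
    """Resolve stale locks first, then refuse to start only if a fresh RUNNING record remains."""
    if now is None:
        now = datetime.now(timezone.utc)
    service.resolve_stale_check_runs(lock_timeout_hours, now)
    if service.getting_running_check_runs():
        raise RunningCheckRunsException("another check run is still RUNNING")


def parse_lock_timeout(argv: List[str]) -> float:
    """Parse --lock-timeout (hours, default 3) from a hypothetical data-diff argv."""
    parser = argparse.ArgumentParser(prog="diffa data-diff")
    parser.add_argument("--lock-timeout", type=float, default=3.0,
                        help="hours after which a RUNNING check run is treated as stale")
    return parser.parse_args(argv).lock_timeout
```

As in the run_manager.py change, the resolve step runs unconditionally before the check. A record younger than the timeout is therefore never touched and still blocks the new run.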
Changes
- data_models.py: map created_at column in DiffaCheckRun ORM model
- diffa_check_run.py: add resolve_stale_running_records() (repository) and resolve_stale_check_runs() (service)
- config.py: add lock_timeout_hours to DiffaConfig and thread it through ConfigManager.configure()
- cli.py: add --lock-timeout option
- run_manager.py: call resolve before checking for running records
- test_run_manager.py: add tests for stale lock resolution and call ordering
Test plan
- resolve_stale_check_runs called before getting_running_check_runs
- RunningCheckRunsException
- created_at, run diffa data-diff, verify auto-resolution
🤖 Generated with Claude Code
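The test-plan item requiring resolve_stale_check_runs to be called before getting_running_check_runs can be asserted with unittest.mock, because a Mock records calls to its child attributes, in order, in mock_calls. The following is a minimal sketch. It assumes a hypothetical check_run_lock helper standing in for the RunManager step. The actual tests in test_run_manager.py are not shown in this PR and may be structured differently.

```python
from unittest import mock


class RunningCheckRunsException(Exception):
    """Stand-in for the exception RunManager raises on an active lock."""


def check_run_lock(service, lock_timeout_hours):
    """Hypothetical RunManager step: resolve stale locks, then check for active ones."""
    service.resolve_stale_check_runs(lock_timeout_hours)
    if service.getting_running_check_runs():
        raise RunningCheckRunsException("active RUNNING check run found")


def test_resolve_called_before_running_check():
    service = mock.Mock()
    service.getting_running_check_runs.return_value = []  # no fresh lock left
    check_run_lock(service, lock_timeout_hours=3)
    # mock_calls lists calls on child mocks in the order they happened,
    # so comparing the whole list checks ordering, not just that both ran.
    assert service.mock_calls == [
        mock.call.resolve_stale_check_runs(3),
        mock.call.getting_running_check_runs(),
    ]


def test_fresh_running_record_still_raises():
    service = mock.Mock()
    service.getting_running_check_runs.return_value = [object()]  # a fresh RUNNING record
    try:
        check_run_lock(service, lock_timeout_hours=3)
    except RunningCheckRunsException:
        return
    raise AssertionError("expected RunningCheckRunsException")
```

Under pytest, the second test would more idiomatically use pytest.raises. The try/except form keeps the sketch dependent only on the standard library.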