perf(worker): Bulk fetch and update testruns for flake processing #802
sentry[bot] wants to merge 1 commit into main from
Conversation
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 5654748. Configure here.
```python
# Bulk-update flakes that have expired (reached 30 consecutive passes).
if expiring_flakes:
    Flake.objects.bulk_update(expiring_flakes, ["end_date"])
```
Expiring flakes lose `count` and `recent_passes_count` updates
High Severity
When a flake expires in `handle_pass`, `recent_passes_count` and `count` are incremented in memory, then the flake is removed from `curr_flakes` and appended to `expiring_flakes`. However, `bulk_update(expiring_flakes, ["end_date"])` only persists the `end_date` field. The modified `recent_passes_count` and `count` values are never saved to the database. The original code used `.save()`, which persisted all fields. The existing test `test_flake_expiry_and_recreation` asserts `recent_passes_count == 30` and would fail with this change.
Additional Locations (1)
```python
# Fetch all testruns for all uploads in a single query, filtered to only
# relevant outcomes (failures and passes for known flaky tests).
all_testruns = list(get_testruns(upload_ids, list(curr_flakes.keys())))
```
Pre-filtered query misses passes for newly created flakes
Medium Severity
`get_testruns` is called once with `curr_flakes.keys()` captured before processing begins, filtering passes to only known flaky test IDs. But `handle_failure` can create new flakes during processing. Passes for these newly created flakes in subsequent uploads are excluded from the query results and silently dropped. The original code fetched all testruns per upload without outcome filtering, so those passes were previously processed.
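One way to address this, sketched below with hypothetical names (not the repo's actual code), is to keep the single bulk fetch but re-snapshot the flaky-test filter per upload rather than once per commit, so passes for flakes created by earlier uploads in the same batch are still processed:

```python
from collections import defaultdict

def process_commit(upload_ids, all_testruns, curr_flake_ids):
    """Process uploads in order, recomputing the flaky-test filter
    for each upload instead of once up front."""
    runs_by_upload = defaultdict(list)
    for run in all_testruns:
        runs_by_upload[run["upload_id"]].append(run)

    passes_processed = []
    for upload_id in upload_ids:
        # Snapshot AFTER earlier uploads may have created new flakes.
        flaky_ids = set(curr_flake_ids)
        for run in runs_by_upload[upload_id]:
            if run["outcome"] == "failure":
                curr_flake_ids.add(run["test_id"])  # may create a new flake
            elif run["outcome"] == "pass" and run["test_id"] in flaky_ids:
                passes_processed.append((upload_id, run["test_id"]))
    return passes_processed
```

With the upfront filter, the pass in upload 2 below would have been excluded from the query entirely; here it is seen because the filter is rebuilt after upload 1 creates the flake.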
Additional Locations (1)
```python
# Bulk-update flakes that have expired (reached 30 consecutive passes).
if expiring_flakes:
```
Bug: The `bulk_update` for expiring flakes only saves `end_date`, discarding in-memory updates to `count` and `recent_passes_count`.
Severity: MEDIUM
Suggested Fix
Add `count` and `recent_passes_count` to the list of fields in the `bulk_update` call for `expiring_flakes`. The updated call should be `Flake.objects.bulk_update(expiring_flakes, ["end_date", "count", "recent_passes_count"])`.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: apps/worker/services/test_analytics/ta_process_flakes.py#L164
Potential issue: When a test flake reaches 30 consecutive passes and expires, its
`count` and `recent_passes_count` are incremented in memory. However, the subsequent
`Flake.objects.bulk_update` call is configured to only persist the `end_date` field. As
a result, the updated counts for the final pass are not saved to the database, leaving
the expired flake record with stale data. This is a regression from the previous
implementation that used `.save()` and persisted all modified fields.
Did we get this right? 👍 / 👎 to inform future reviews.
```python
upload_filter = Q(upload_id__in=upload_ids)
flaky_pass_filter = Q(outcome="pass") & Q(test_id__in=curr_flake_test_ids)
```
Bug: Passes for newly created flakes are missed because test runs are fetched upfront before all flakes in a batch are identified.
Severity: MEDIUM
Suggested Fix
Refactor the logic to fetch test runs on a per-upload basis within the loop, or re-query for test runs after processing failures and creating new flakes. This ensures that pass events for newly created flakes within the same batch are correctly processed.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: apps/worker/services/test_analytics/ta_process_flakes.py#L42-L43
Potential issue: The logic fetches all test runs for a batch of uploads at the
beginning, based on the set of flakes that exist at that moment. If a test becomes flaky
for the first time in an early upload within the batch, any subsequent 'pass' results
for that same test in later uploads of the same batch are ignored. This happens because
the initial query for test runs did not account for tests that would become flaky during
processing, causing their pass events to be missed.
Did we get this right? 👍 / 👎 to inform future reviews.


Fixes WORKER-Y78. The issue was that N+1 queries occur because `get_testruns` is called in a loop for each upload, and `handle_pass` individually saves expiring flakes. The fix refactors `get_testruns` to fetch testruns for multiple uploads and intelligently filter for only failures or passes for currently known flaky tests.

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13290548
Not quite right? Click here to continue debugging with Seer.
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.
Note
Medium Risk
Touches core flake-processing query/filtering and update paths; behavior should be equivalent but incorrect filtering or grouping could cause missed/extra flake state updates.
Overview
Refactors flake processing to avoid N+1 DB queries by fetching all relevant `Testrun`s for a commit's uploads in a single query and grouping them by `upload_id` before processing. Tightens the fetched `Testrun` set to failures/errors plus passes only for currently-known flaky tests, and replaces per-flake `.save()` calls on expiration (30 consecutive passes) with a `bulk_update` of `end_date`.

Reviewed by Cursor Bugbot for commit 5654748. Bugbot is set up for automated code reviews on this repo. Configure here.
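The bulk-fetch-then-group pattern the overview describes can be sketched as follows (the function name and dict-shaped rows are illustrative, not the actual worker code):

```python
from collections import defaultdict

def group_testruns_by_upload(testruns):
    """Replace one query per upload with one bulk fetch per commit,
    then group the rows in memory by upload_id for per-upload processing."""
    by_upload = defaultdict(list)
    for run in testruns:
        by_upload[run["upload_id"]].append(run)
    return by_upload
```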