feat: Resolve OOM in high-cardinality dimension merging [DATA-23847] #20

Open
quocnguyendinh wants to merge 4 commits into master from DATA-23847/resolve-OOM-dimension-merging

quocnguyendinh (Collaborator) commented Apr 5, 2026

Summary

Resolves OOM when dimension columns have high cardinality by making the merge pipeline memory-efficient.

  • Replace the dict-based merge, which uses O(n) memory, with a two-pointer generator algorithm that uses O(1) memory
  • Use server-side cursors (generator factory pattern) to stream DB rows in batches instead of buffering entire result sets
  • Remove _build_check_summary and its related helper methods; invalid diffs are now logged inline during _merge_by_check_date, which avoids materializing the full merged list a second time
  • Fix check_date sort priority in dimension tuple to match SQL ORDER BY
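
For reviewers, a minimal sketch of the two-pointer merge idea, not the code in this PR. It assumes both inputs arrive sorted by `(check_date, dimensions)`, which is why the tuple order has to match the SQL ORDER BY. The names `Row`, `Merged`, `sort_key`, and `merge_sorted` are hypothetical and do not come from the diff.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    # Hypothetical row shape: one count per (check_date, dimensions).
    check_date: str
    dimensions: Tuple[str, ...]
    count: int


class Merged(NamedTuple):
    # A side is None when the key exists only in the other input.
    check_date: str
    dimensions: Tuple[str, ...]
    source_count: Optional[int]
    target_count: Optional[int]


def sort_key(row: Row) -> Tuple[str, Tuple[str, ...]]:
    # check_date comes first, mirroring an assumed
    # "ORDER BY check_date, <dimension columns>" in both queries.
    return (row.check_date, row.dimensions)


def merge_sorted(source: Iterable[Row], target: Iterable[Row]) -> Iterator[Merged]:
    """Merge two sorted streams while holding one row per side in memory."""
    src, tgt = iter(source), iter(target)
    s, t = next(src, None), next(tgt, None)
    while s is not None or t is not None:
        if t is None or (s is not None and sort_key(s) < sort_key(t)):
            # Key present only in the source stream.
            yield Merged(s.check_date, s.dimensions, s.count, None)
            s = next(src, None)
        elif s is None or sort_key(t) < sort_key(s):
            # Key present only in the target stream.
            yield Merged(t.check_date, t.dimensions, None, t.count)
            t = next(tgt, None)
        else:
            # Same key on both sides: emit one merged row, advance both.
            yield Merged(s.check_date, s.dimensions, s.count, t.count)
            s, t = next(src, None), next(tgt, None)
```

Because the function is a generator, a caller can log each invalid diff as it is yielded instead of building a list first. One caveat: Python compares strings by code point, so the database collation has to produce the same order or keys that should match will be emitted as separate rows.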

Jira: DATA-23847

Test plan

  • All 15 existing unit tests pass
  • Verify with high-cardinality dimension config in staging

- Replace O(n) dict-based merge with O(1) two-pointer generator
- Use server-side cursors with generator factory pattern for true DB streaming
- Fix check_date sort priority in dimension tuple ordering
- Fix MergedCountCheck dimensions property and equality check
- Inline diff logging in _merge_by_check_date, remove _build_check_summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
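
The generator factory pattern named in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: it uses the standard-library `sqlite3` module as a stand-in, and `sqlite3` has no server-side cursors, so only the batching via `fetchmany` is shown. With a driver that supports server-side cursors, the cursor creation line would be the part that changes. The name `make_row_stream` and the `batch_size` default are hypothetical.

```python
import sqlite3
from typing import Callable, Iterator, Tuple

RowTuple = Tuple[str, str, int]


def make_row_stream(
    conn: sqlite3.Connection, query: str, batch_size: int = 1000
) -> Callable[[], Iterator[RowTuple]]:
    """Return a factory; each call opens a fresh cursor and streams rows."""

    def stream() -> Iterator[RowTuple]:
        cursor = conn.cursor()
        try:
            cursor.execute(query)
            while True:
                # Pull at most batch_size rows at a time rather than fetchall().
                batch = cursor.fetchmany(batch_size)
                if not batch:
                    break
                yield from batch
        finally:
            # Runs when the generator is exhausted or closed early.
            cursor.close()

    return stream
```

Returning a factory rather than a generator lets the caller restart the stream, since a generator can only be consumed once. At most one batch of rows is held in Python at a time.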
quocnguyendinh self-assigned this Apr 5, 2026
quocnguyendinh and others added 3 commits April 6, 2026 01:02
…om __eq__

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
quocnguyendinh changed the title from "DATA-23847: Resolve OOM in high-cardinality dimension merging" to "feat: Resolve OOM in high-cardinality dimension merging [DATA-23847]" Apr 5, 2026