Skip to content

Origin tracking test for native Postgres#114

Merged
mason-sharp merged 2 commits intomainfrom
task/native-pg-origin-test
Apr 20, 2026
Merged

Origin tracking test for native Postgres#114
mason-sharp merged 2 commits intomainfrom
task/native-pg-origin-test

Conversation

@mason-sharp
Copy link
Copy Markdown
Member

Also make tests less flakey

Extend PG test to connect to all three nodes. The new test sets up
n3 → n1/n2 logical replication, verifies origin tracking through
GetNodeOriginNames, then runs a full diff + preserve-origin repair
cycle confirming both origins and timestamps are preserved.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 19, 2026

Warning

Rate limit exceeded

@mason-sharp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 9 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 46 minutes and 9 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 714f3134-6005-4136-8450-46c9b3b9e1c1

📥 Commits

Reviewing files that changed from the base of the PR and between 1f13996 and 8ba45ca.

📒 Files selected for processing (2)
  • tests/integration/merkle_tree_test.go
  • tests/integration/test_env_test.go
📝 Walkthrough

Walkthrough

Added a third native PostgreSQL node to integration tests, introduced a data-sync helper, and refactored test setup and preserve-origin verification to operate across a 3-node logical replication topology.

Changes

Cohort / File(s) Summary
Docker Infrastructure
tests/integration/docker-compose-native.yaml
Added native-n3 service (postgres:17) with same credentials and startup flags (track_commit_timestamp=on, wal_level=logical) and port mapping as existing native nodes.
Test Environment Helpers
tests/integration/test_env_test.go
Added (*testEnv).awaitDataSync(t, qualifiedTableName) which polls row counts on N1 and N2 until they match within an assertEventually window.
Merkle Tree Tests
tests/integration/merkle_tree_test.go
Inserted env.awaitDataSync(...) before merkle-tree setup in multiple tests to ensure counts are synchronized prior to mtree initialization/build.
Native PG Tests
tests/integration/native_pg_test.go
Expanded cluster to 3 nodes (nativeServiceN3, n3Host, n3Port, n3Pool); added waitForPort() and getHostPort() helpers; create pgcrypto on all three nodes; refactored testNativePreserveOrigin to publish on n3 and subscribe on n1/n2, capture baseline origins/timestamps on n1, set repairTask.PreserveOrigin = true, and add post-repair origin and commit-timestamp verifications on n2.

Poem

🐰 Three hops, three nodes, I bound and sing,
Docker burrow hums, replication on the wing,
Counts align like carrots in a row,
Origins kept where commit-timestamps glow,
Thump — tests pass, the meadow's all aglow! 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Origin tracking test for native Postgres' directly reflects the main objective of the PR, which adds a native Postgres origin-tracking test and ensures it's clear and specific.
Description check ✅ Passed The description 'Also make tests less flakey' is related to the changeset, as multiple changes are specifically designed to reduce test flakiness (awaitDataSync gates in merkle tree tests).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch task/native-pg-origin-test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link
Copy Markdown

codacy-production bot commented Apr 19, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 14 complexity · -4 duplication

Metric Results
Complexity 14
Duplication -4

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/integration/native_pg_test.go (1)

609-617: Minor: SET session_replication_role leaks to the pool after commit.

SET (without LOCAL) inside a transaction persists at session scope even after COMMIT, so the connection returned to state.n2Pool here stays in replica role. In this test everything that follows is read-only, and this is the last t.Run in TestNativePG, so there's no concrete impact today — but it's a subtle footgun if the test grows or the pool is reused elsewhere. Prefer SET LOCAL, or explicitly RESET session_replication_role before commit.

♻️ Proposed fix
 	tx, err := state.n2Pool.Begin(ctx)
 	require.NoError(t, err)
-	_, err = tx.Exec(ctx, "SET session_replication_role = 'replica'")
+	_, err = tx.Exec(ctx, "SET LOCAL session_replication_role = 'replica'")
 	require.NoError(t, err)
 	for _, id := range sampleIDs {
 		_, err = tx.Exec(ctx, fmt.Sprintf("DELETE FROM %s WHERE id = $1", qualifiedTableName), id)
 		require.NoError(t, err)
 	}
 	require.NoError(t, tx.Commit(ctx))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/native_pg_test.go` around lines 609 - 617, The transaction
sets session_replication_role at session scope which leaks into the pooled
connection; change the test code around state.n2Pool.Begin / tx to avoid
persisting the role by using a transaction-local setting or resetting it before
returning the connection: replace the "SET session_replication_role = 'replica'"
call with "SET LOCAL session_replication_role = 'replica'" (or alternatively
execute "RESET session_replication_role" on tx before tx.Commit) so the pool
connection returned by tx.Commit does not remain in replica mode.
tests/integration/docker-compose-native.yaml (1)

44-57: Optional: deduplicate the three near-identical service blocks via YAML anchors.

All three services share image, env, command, and ports — a YAML anchor (&native-pg + <<: *native-pg) would remove the copy-paste and ensure n3 stays in sync with any future change to n1/n2. Purely a readability/maintenance win; not required.

♻️ Sketch
services:
  native-n1: &native-pg
    image: postgres:17
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: testdb
    command:
      - "postgres"
      - "-c"
      - "track_commit_timestamp=on"
      - "-c"
      - "wal_level=logical"
    ports:
      - "5432"
  native-n2:
    <<: *native-pg
  native-n3:
    <<: *native-pg
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/docker-compose-native.yaml` around lines 44 - 57, Create a
YAML anchor for the shared Postgres service config and apply it to native-n1,
native-n2, and native-n3 to avoid duplication: extract the common keys (image,
environment, command, ports) into an anchor (e.g., &native-pg) and replace each
service body with an alias merge (<<: *native-pg) so native-n1, native-n2, and
native-n3 all inherit the same configuration.
tests/integration/test_env_test.go (1)

398-414: Optional: give each count query a bounded context.

assertEventually only checks its 30 s deadline between polls, so if a count(*) call ever stalls (e.g., a stuck connection during cleanup of a prior subtest) this gate can block well past 30 s. Since the helper exists specifically to de-flake tests, a per-attempt timeout would make the worst case deterministic. Not required — current implementation is fine for the healthy path.

♻️ Proposed tweak
 func (e *testEnv) awaitDataSync(t *testing.T, qualifiedTableName string) {
 	t.Helper()
-	ctx := context.Background()
 	assertEventually(t, 30*time.Second, func() error {
+		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		defer cancel()
 		var n1Count, n2Count int
 		if err := e.N1Pool.QueryRow(ctx, fmt.Sprintf("SELECT count(*) FROM %s", qualifiedTableName)).Scan(&n1Count); err != nil {
 			return fmt.Errorf("counting rows on n1: %w", err)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_env_test.go` around lines 398 - 414, awaitDataSync's
per-poll DB queries (N1Pool.QueryRow and N2Pool.QueryRow) can block longer than
the assertEventually poll period; wrap each QueryRow call in a short per-attempt
context (e.g., ctxAttempt, cancel := context.WithTimeout(ctx, 2*time.Second)
with defer cancel()) and use that ctxAttempt for QueryRow/Scan so each attempt
times out deterministically; ensure you propagate or wrap context deadline
errors in the returned errors so assertEventually sees the failure and continues
polling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/merkle_tree_test.go`:
- Line 571: The comment above the test misstates behavior: although it says
"this test only uses n1", the test later calls newTableDiffTask(ServiceN1,
ServiceN2) after CDC events, so it can be affected by prior subtest writes to
n2; either update the comment to explicitly explain why you intentionally
omitted awaitDataSync (e.g., to test CDC latency or to allow eventual
consistency) or add the same awaitDataSync(...) gate used in other subtests
before building the merkle tree on ServiceN1 to ensure no bleed from previous
cleanup/replication; locate the test around newTableDiffTask and the surrounding
comment and apply the chosen change (update comment text or insert awaitDataSync
referencing ServiceN1 and ServiceN2).

---

Nitpick comments:
In `@tests/integration/docker-compose-native.yaml`:
- Around line 44-57: Create a YAML anchor for the shared Postgres service config
and apply it to native-n1, native-n2, and native-n3 to avoid duplication:
extract the common keys (image, environment, command, ports) into an anchor
(e.g., &native-pg) and replace each service body with an alias merge (<<:
*native-pg) so native-n1, native-n2, and native-n3 all inherit the same
configuration.

In `@tests/integration/native_pg_test.go`:
- Around line 609-617: The transaction sets session_replication_role at session
scope which leaks into the pooled connection; change the test code around
state.n2Pool.Begin / tx to avoid persisting the role by using a
transaction-local setting or resetting it before returning the connection:
replace the "SET session_replication_role = 'replica'" call with "SET LOCAL
session_replication_role = 'replica'" (or alternatively execute "RESET
session_replication_role" on tx before tx.Commit) so the pool connection
returned by tx.Commit does not remain in replica mode.

In `@tests/integration/test_env_test.go`:
- Around line 398-414: awaitDataSync's per-poll DB queries (N1Pool.QueryRow and
N2Pool.QueryRow) can block longer than the assertEventually poll period; wrap
each QueryRow call in a short per-attempt context (e.g., ctxAttempt, cancel :=
context.WithTimeout(ctx, 2*time.Second) with defer cancel()) and use that
ctxAttempt for QueryRow/Scan so each attempt times out deterministically; ensure
you propagate or wrap context deadline errors in the returned errors so
assertEventually sees the failure and continues polling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d8c39172-b83e-4311-b0c1-2c4f46a65938

📥 Commits

Reviewing files that changed from the base of the PR and between 16354d3 and 23c4880.

📒 Files selected for processing (4)
  • tests/integration/docker-compose-native.yaml
  • tests/integration/merkle_tree_test.go
  • tests/integration/native_pg_test.go
  • tests/integration/test_env_test.go

Comment thread tests/integration/merkle_tree_test.go Outdated
@mason-sharp mason-sharp force-pushed the task/native-pg-origin-test branch from 23c4880 to 1f13996 Compare April 19, 2026 21:30
Sequential merkle tree subtests share a table whose cleanup (repair via
spock) may still be replicating when the next subtest starts, causing
spurious failures. awaitDataSync polls until n1 and n2 row counts match
before each data-mutating subtest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit 2760bb0e7bac8c8e0c74f474f0986b3720514759)
@mason-sharp mason-sharp force-pushed the task/native-pg-origin-test branch from 1f13996 to 8ba45ca Compare April 19, 2026 21:44
@mason-sharp mason-sharp requested a review from rasifr April 19, 2026 21:58
@mason-sharp mason-sharp merged commit e64de5c into main Apr 20, 2026
3 checks passed
@mason-sharp mason-sharp deleted the task/native-pg-origin-test branch April 20, 2026 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants