Walkthrough

Replaces module-level ClickHouse singletons with a ClickhouseFactory that returns per-organization ClickHouse clients and event repositories. Adds an OrganizationDataStore model and migration, ClickHouse secret schemas, an OrganizationDataStoresRegistry with background reload/retry, admin UI/routes to manage data stores, and new clickhouse factory/instance modules and test helpers. Refactors event repository interfaces/implementations (metrics batching and insert APIs), updates many presenters, routes, services, and tests to use organization-scoped clients, and introduces related tests, an env var, docs, and minor misc changes.

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.logs/route.tsx (1)
137-159: ⚠️ Potential issue | 🟠 Major

Don't let ClickHouse resolution bypass the deferred error path.

`getClickhouseForOrganization()` is now awaited before `typeddefer()`. If the org datastore lookup or client initialization fails, the loader throws before `TypedAwait` can render its `errorElement`, and the page shell won't stream until that lookup finishes. Fold the factory lookup into `listPromise` so it shares the existing deferred failure handling.

♻️ Suggested shape
```diff
-  const logsClickhouse = await clickhouseFactory.getClickhouseForOrganization(project.organizationId, "logs");
-  const presenter = new LogsListPresenter($replica, logsClickhouse);
-
-  const listPromise = presenter
-    .call(project.organizationId, environment.id, {
-      userId,
-      projectId: project.id,
-      tasks: tasks.length > 0 ? tasks : undefined,
-      runId,
-      search,
-      levels,
-      period,
-      from,
-      to,
-      defaultPeriod: "1h",
-      retentionLimitDays
-    })
+  const listPromise = clickhouseFactory
+    .getClickhouseForOrganization(project.organizationId, "logs")
+    .then((logsClickhouse) =>
+      new LogsListPresenter($replica, logsClickhouse).call(project.organizationId, environment.id, {
+        userId,
+        projectId: project.id,
+        tasks: tasks.length > 0 ? tasks : undefined,
+        runId,
+        search,
+        levels,
+        period,
+        from,
+        to,
+        defaultPeriod: "1h",
+        retentionLimitDays,
+      })
+    )
     .catch((error) => {
       if (error instanceof ServiceValidationError) {
         return { error: error.message };
       }
       throw error;
     });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.logs/route.tsx around lines 137 - 159, The ClickHouse client lookup is awaited up front which can throw before the deferred error handling; instead, defer the factory call so failures are handled by the same promise chain: build listPromise by calling clickhouseFactory.getClickhouseForOrganization(...) inside the async chain used to create the LogsListPresenter and call presenter.call (keep references to clickhouseFactory.getClickhouseForOrganization, LogsListPresenter, presenter.call, and listPromise) so any errors from the org datastore/client init are caught by the existing .catch that handles ServiceValidationError and rethrows other errors, rather than throwing before TypedAwait renders its errorElement.

apps/webapp/app/routes/admin.api.v1.runs-replication.create.ts (1)
13-17: ⚠️ Potential issue | 🟠 Major

Don't silently drop the connection-tuning fields.

Lines 13-17 still require `keepAliveEnabled`, `keepAliveIdleSocketTtl`, and `maxOpenConnections`, but lines 82-119 no longer pass them anywhere. This endpoint will accept those values and return success even though they no longer affect replication behavior. Either wire them into the new factory path or remove them from the request schema before merge.

One fix if these settings are no longer supported
```diff
 const CreateRunReplicationServiceParams = z.object({
   name: z.string(),
-  keepAliveEnabled: z.boolean(),
-  keepAliveIdleSocketTtl: z.number(),
-  maxOpenConnections: z.number(),
   maxFlushConcurrency: z.number(),
   flushIntervalMs: z.number(),
   flushBatchSize: z.number(),
```

Also applies to: 82-93, 95-119
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/routes/admin.api.v1.runs-replication.create.ts` around lines 13 - 17, The request schema CreateRunReplicationServiceParams still requires keepAliveEnabled, keepAliveIdleSocketTtl, and maxOpenConnections but the handler that constructs the replication service (the new factory path used in the run replication creation flow) no longer consumes these fields, so they are effectively ignored; either (A) thread these three fields into the replication service factory/constructor/creation call (the function that builds the replication client/service used in the handler) so the values are applied, making sure to map the names exactly, or (B) remove these fields from CreateRunReplicationServiceParams (and any validation/usage sites) so the API no longer accepts unused settings; locate the schema CreateRunReplicationServiceParams and the replication service factory/constructor function referenced in the create run replication handler to implement one of these fixes.
🧹 Nitpick comments (4)
apps/webapp/test/runsReplicationService.part2.test.ts (1)
26-27: Cover org-to-ClickHouse routing with a multi-store case.

All of these setups hand the service a factory backed by a single `ClickHouse` instance. That means the suite still passes if `RunsReplicationService` sends the wrong organization ID into the factory, which is the new behavior this change depends on. Add one case that maps two orgs to different clients and asserts each org's rows land in the correct backend.

Also applies to: 46-47, 155-156, 270-271, 391-392, 525-526, 632-633, 801-802, 926-927, 1139-1140
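The shape of the missing case can be sketched roughly as follows. This is a toy factory and client, not the real `TestReplicationClickhouseFactory` or `ClickHouse` API (neither is shown in this review), so every name here is illustrative:

```typescript
// Toy stand-ins for the test factory and clients; the real
// TestReplicationClickhouseFactory API is an assumption here.
type FakeClickhouse = { name: string; rows: Array<{ runId: string }> };

class MultiStoreFactory {
  constructor(private readonly byOrg: Map<string, FakeClickhouse>) {}

  // Mirrors the per-organization lookup the replication service performs.
  getClickhouseForOrganization(orgId: string): FakeClickhouse {
    const client = this.byOrg.get(orgId);
    if (!client) throw new Error(`No ClickHouse store mapped for org ${orgId}`);
    return client;
  }
}

const clickhouseA: FakeClickhouse = { name: "store-a", rows: [] };
const clickhouseB: FakeClickhouse = { name: "store-b", rows: [] };

const factory = new MultiStoreFactory(
  new Map([
    ["orgA", clickhouseA],
    ["orgB", clickhouseB],
  ])
);

// The assertion the suite currently can't make: each org's rows must land
// in that org's backend, never the other one.
factory.getClickhouseForOrganization("orgA").rows.push({ runId: "run_1" });
factory.getClickhouseForOrganization("orgB").rows.push({ runId: "run_2" });
```

A real test would drive `RunsReplicationService` instead of pushing rows directly, then make the same two per-backend assertions.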
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/test/runsReplicationService.part2.test.ts` around lines 26 - 27, The tests currently instantiate RunsReplicationService with a TestReplicationClickhouseFactory backed by a single ClickHouse instance, so they don't catch org-to-ClickHouse routing bugs; update the affected test cases (instances where new RunsReplicationService is constructed with new TestReplicationClickhouseFactory(clickhouse)) to create a multi-store scenario: build two distinct Test ClickHouse clients (e.g., clickhouseA and clickhouseB), construct a TestReplicationClickhouseFactory that maps orgA -> clickhouseA and orgB -> clickhouseB, pass that factory into the RunsReplicationService, then insert rows for both orgs and assert each org's rows appear in the corresponding backend client (use the same helper assertions already used elsewhere in the tests). Ensure you change all occurrences referenced (the other construction sites of TestReplicationClickhouseFactory) so the suite validates org routing across multiple stores.

apps/webapp/app/services/dataStores/organizationDataStoresRegistry.server.ts (1)
94-130: Validate and type `config` at this service boundary.

These methods accept `config: any` and persist it directly. Any caller that skips route-level validation can store a payload that later makes `loadFromDatabase()` skip the datastore entirely. Parse `config` here with the schema for `kind`, and replace `any` with the inferred type.

As per coding guidelines, "Use zod for validation in packages/core and apps/webapp."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/services/dataStores/organizationDataStoresRegistry.server.ts` around lines 94 - 130, addDataStore and updateDataStore accept config: any and persist it directly; validate and type the config at this service boundary using the zod schema for the DataStoreKind before storing secrets or writing to Prisma. Locate the methods addDataStore and updateDataStore and replace the loose any type with the inferred TypeScript type from the zod parser; run the appropriate zod.parse/partial-parse for the provided kind (e.g., schemaForKind(kind)) and throw or return a clear error on parse failure, then use the validated config when calling getSecretStore().setSecret(secretKey, ...) and when writing config to this._prisma.organizationDataStore.create/update; ensure loadFromDatabase behavior is preserved by storing only the validated shape (or a secured secretKey) and include error handling to surface validation failures.

apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts (2)
336-389: Consider reducing duplication between the `clickhouse` and `clickhouse_v2` cases.

The two cases share ~95% identical configuration. Extracting common options would improve maintainability.
♻️ Proposed refactor
```diff
 function buildEventRepository(store: string, clickhouse: ClickHouse): ClickhouseEventRepository {
+  const commonOptions = {
+    clickhouse,
+    batchSize: env.EVENTS_CLICKHOUSE_BATCH_SIZE,
+    flushInterval: env.EVENTS_CLICKHOUSE_FLUSH_INTERVAL_MS,
+    maximumTraceSummaryViewCount: env.EVENTS_CLICKHOUSE_MAX_TRACE_SUMMARY_VIEW_COUNT,
+    maximumTraceDetailedSummaryViewCount: env.EVENTS_CLICKHOUSE_MAX_TRACE_DETAILED_SUMMARY_VIEW_COUNT,
+    maximumLiveReloadingSetting: env.EVENTS_CLICKHOUSE_MAX_LIVE_RELOADING_SETTING,
+    insertStrategy: env.EVENTS_CLICKHOUSE_INSERT_STRATEGY,
+    waitForAsyncInsert: env.EVENTS_CLICKHOUSE_WAIT_FOR_ASYNC_INSERT === "1",
+    asyncInsertMaxDataSize: env.EVENTS_CLICKHOUSE_ASYNC_INSERT_MAX_DATA_SIZE,
+    asyncInsertBusyTimeoutMs: env.EVENTS_CLICKHOUSE_ASYNC_INSERT_BUSY_TIMEOUT_MS,
+    llmMetricsBatchSize: env.LLM_METRICS_BATCH_SIZE,
+    llmMetricsFlushInterval: env.LLM_METRICS_FLUSH_INTERVAL_MS,
+    llmMetricsMaxBatchSize: env.LLM_METRICS_MAX_BATCH_SIZE,
+    llmMetricsMaxConcurrency: env.LLM_METRICS_MAX_CONCURRENCY,
+    otlpMetricsBatchSize: env.METRICS_CLICKHOUSE_BATCH_SIZE,
+    otlpMetricsFlushInterval: env.METRICS_CLICKHOUSE_FLUSH_INTERVAL_MS,
+    otlpMetricsMaxConcurrency: env.METRICS_CLICKHOUSE_MAX_CONCURRENCY,
+  };
+
   switch (store) {
     case "clickhouse": {
       return new ClickhouseEventRepository({
-        clickhouse,
-        batchSize: env.EVENTS_CLICKHOUSE_BATCH_SIZE,
-        // ... all the options
+        ...commonOptions,
+        startTimeMaxAgeMs: env.EVENTS_CLICKHOUSE_START_TIME_MAX_AGE_MS,
         version: "v1",
       });
     }
     case "clickhouse_v2": {
       return new ClickhouseEventRepository({
-        clickhouse: clickhouse,
-        batchSize: env.EVENTS_CLICKHOUSE_BATCH_SIZE,
-        // ... all the options
+        ...commonOptions,
         version: "v2",
       });
     }
     default: {
       throw new Error(`Unknown ClickHouse event repository store: ${store}`);
     }
   }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts` around lines 336 - 389, The switch in buildEventRepository duplicates nearly identical ClickhouseEventRepository construction for "clickhouse" and "clickhouse_v2"; refactor by extracting the shared options into a single commonOptions object (populate fields like clickhouse, batchSize, flushInterval, maximumTraceSummaryViewCount, maximumTraceDetailedSummaryViewCount, maximumLiveReloadingSetting, insertStrategy, waitForAsyncInsert, asyncInsertMaxDataSize, asyncInsertBusyTimeoutMs, startTimeMaxAgeMs (only present for v1), llmMetrics*, otlpMetrics*, etc.), then call new ClickhouseEventRepository({...commonOptions, version: "v1"}) for "clickhouse" and new ClickhouseEventRepository({...commonOptions, version: "v2"}) for "clickhouse_v2"); ensure you preserve startTimeMaxAgeMs presence only where required and keep explicit clickhouse property name to match the constructor.
164-179: Org-scoped clients use uniform configuration regardless of `clientType`.

Unlike the default clients, where logs and replication have specialized settings (e.g., `clickhouseSettings` for logs, `RUN_REPLICATION_*` env vars for replication), all org-scoped clients use the same generic `CLICKHOUSE_*` configuration. If org-specific logs or replication clients require different settings (like `max_memory_usage` for logs queries), this could cause performance or resource issues.

Consider whether org-scoped clients need type-specific configurations, or document that org data stores intentionally use simplified settings.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts` around lines 164 - 179, buildOrgClickhouseClient currently creates org-scoped ClickHouse clients with the same generic CLICKHOUSE_* config regardless of ClientType; update it to apply type-specific settings (e.g., when clientType === "logs" include clickhouseSettings like max_memory_usage or when clientType === "replication" honor RUN_REPLICATION_* env vars) or add a clear comment/docstring stating that org clients intentionally use simplified settings. Locate buildOrgClickhouseClient and branch on the ClientType argument to merge the same specialized options used for default clients (name already contains clientType) or document the design decision so callers know org clients lack type-specific tuning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/webapp/app/env.server.ts`:
- Around line 1304-1305: The env schema allows
ORGANIZATION_DATA_STORES_RELOAD_INTERVAL_MS to be zero or negative, enabling
tight-loop reloads; update the zod schema for
ORGANIZATION_DATA_STORES_RELOAD_INTERVAL_MS to enforce sensible bounds (e.g.,
.int().min(60000) to require at least 1 minute and optionally .max(86400000) to
cap at 24 hours) while keeping the existing default (5 * 60 * 1000); use
z.coerce.number().int().min(...).max(...).default(...) or a .refine validator if
you prefer custom logic so invalid env values fail validation.
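Expressed as plain TypeScript, the suggested bounds behave like the helper below. The real fix belongs in the zod env schema as `.int().min(...).max(...)`; this standalone function and its name are only a sketch of that behavior:

```typescript
// Illustrative bounds check for ORGANIZATION_DATA_STORES_RELOAD_INTERVAL_MS;
// the helper name and standalone form are assumptions, not the env.server.ts API.
const MIN_RELOAD_MS = 60_000; // at least 1 minute, to prevent tight-loop reloads
const MAX_RELOAD_MS = 86_400_000; // cap at 24 hours
const DEFAULT_RELOAD_MS = 5 * 60 * 1000; // existing default

function parseReloadIntervalMs(raw: string | undefined): number {
  if (raw === undefined || raw === "") return DEFAULT_RELOAD_MS;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < MIN_RELOAD_MS || value > MAX_RELOAD_MS) {
    throw new Error(
      `ORGANIZATION_DATA_STORES_RELOAD_INTERVAL_MS must be an integer between ${MIN_RELOAD_MS} and ${MAX_RELOAD_MS}`
    );
  }
  return value;
}
```

The point is that a value like `0` fails validation at boot instead of silently producing a busy reload loop.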
In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx:
- Around line 248-253: The presenter is being given the same ClickHouse "query"
client twice; call clickhouseFactory.getClickhouseForOrganization a second time
with the logs-scoped role (e.g., "logs") to obtain a logs client and pass that
as the presenter’s third argument so ErrorGroupPresenter($replica, queryClient,
logsClient) receives a query-scoped client and a logs-scoped client respectively
(identify the factory call clickhouseFactory.getClickhouseForOrganization and
the constructor ErrorGroupPresenter to locate where to change).
In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.models.$modelId/route.tsx:
- Around line 71-72: The ModelRegistryPresenter is being instantiated with only
ClickHouse but it needs the Prisma replica client for methods like
getModelDetail(); update the instantiation where you call new
ModelRegistryPresenter(clickhouse) to pass the Prisma replica (named $replica)
as the second argument, ensure you import $replica from ~/db.server if not
already imported, and verify presenter methods (e.g., getModelDetail,
getUserMetrics) now use the provided clients.
In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.test.tasks.$taskParam/route.tsx:
- Around line 105-106: Move the call to
clickhouseFactory.getClickhouseForOrganization(...) so it executes inside the
loader's existing try block before constructing the TestTaskPresenter;
specifically ensure the clickhouse lookup and assignment to clickhouse happen
inside the try that wraps presenter.call(), then instantiate presenter = new
TestTaskPresenter($replica, clickhouse) after the successful lookup so any
errors from getClickhouseForOrganization are caught, logged, and handled by the
existing redirect/path in the catch.
In `@apps/webapp/app/routes/admin.data-stores.tsx`:
- Line 31: Update the import of tryCatch in the admin.data-stores route to use
the package subpath export: replace the current root import of tryCatch (import
{ tryCatch } from "@trigger.dev/core") with the subpath import from
"@trigger.dev/core/utils" so the symbol tryCatch is imported via the utils
subpath.
- Around line 88-137: ClickhouseConnectionSchema.parse() is used directly in the
"add" and "update" action handlers and will throw on invalid ClickHouse URLs
before the surrounding tryCatch runs; replace both usages with
ClickhouseConnectionSchema.safeParse(connectionObject) and check the returned
.success flag, returning typedjson({ error: validation.error.message }, {
status: 400 }) (or similar) when validation fails, then pass the validated value
to organizationDataStoresRegistry.addDataStore / updateDataStore; locate the
parse call sites in the "add" branch (where ClickhouseConnectionSchema.parse is
called before calling organizationDataStoresRegistry.addDataStore) and the
"update" branch (conditional parse before
organizationDataStoresRegistry.updateDataStore) and apply the safeParse pattern
and explicit error handling there.
- Line 1: The component currently calls onOpenChange(false) during render when
an operation reports success (the direct call to onOpenChange in the
render/JSX); move this side-effect into a useEffect so it runs after render
instead: remove the onOpenChange(false) call from the render/JSX, add a
useEffect that watches the isSuccess (or equivalent success flag) and calls
onOpenChange(false) when it becomes true, and include onOpenChange and the
success flag in the effect dependency array to avoid stale closures. Ensure you
reference the same success state variable and the onOpenChange prop used in the
component (replace the inline call with the new effect).
In `@apps/webapp/app/routes/otel.v1.metrics.ts`:
- Line 10: The code eagerly awaits otlpExporter (const exporter = await
otlpExporter) before validating the request content-type, causing unnecessary
exporter startup and possible init failures for requests that should return 400;
move the await of otlpExporter so it only runs after the content-type check (the
branch that returns the unsupported-content-type response) so malformed/probe
traffic never triggers exporter initialization — locate the otlpExporter await
in the handler in otel.v1.metrics.ts and only resolve otlpExporter after the
content-type validation code path.
In `@apps/webapp/app/runEngine/services/triggerFailedTask.server.ts`:
- Around line 70-74: The current write selects repository/store via
getEventRepository but only records taskEventStore in the run model, which will
break reads if an org is re-pointed; update the write path in
triggerFailedTask.server.ts so that when calling getEventRepository you also
persist the concrete OrganizationDataStore identifier (e.g., the returned
repository/store metadata or an OrganizationDataStore enum/value) into the run
model alongside taskEventStore (or alternatively mark the org's datastore
assignment immutable until a migration flag is set); specifically, modify the
code that assigns repository/store and the run creation/update logic to save the
concrete datastore binding (referencing getEventRepository, repository, store,
taskEventStore, and OrganizationDataStore) so future reads resolve the original
backend used for those writes.
In `@apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts`:
- Around line 313-334: getEventsClickhouseClient currently instantiates a new
ClickHouse on each call; change it to use the existing singleton pattern: create
a singleton holder (e.g., const defaultEventsClickhouseClient = singleton(() =>
new ClickHouse({...same options...}))) and have getEventsClickhouseClient return
defaultEventsClickhouseClient() (or inline return singleton(() => new
ClickHouse(...))()). Keep the same URL parsing and options
(url.searchParams.delete("secure"), keepAlive, compression, etc.) but ensure
only one ClickHouse instance is created and reused across calls.
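The pattern being requested is just a lazily-initialized, cached factory. A minimal sketch follows; the codebase's own `singleton` helper may differ, and the client object here is a placeholder rather than the real `ClickHouse` constructor:

```typescript
// Lazily construct once, then reuse on every later call; this mirrors the
// singleton(...) pattern the comment asks getEventsClickhouseClient to adopt.
function singleton<T>(factory: () => T): () => T {
  let instance: T | undefined;
  return () => {
    if (instance === undefined) instance = factory();
    return instance;
  };
}

// Placeholder for `new ClickHouse({ ... })`; counting constructions makes
// the reuse observable.
let constructed = 0;
const getEventsClickhouseClient = singleton(() => {
  constructed++;
  return { name: "events-clickhouse" };
});
```

Every caller then shares one client (and its connection pool) instead of opening a fresh one per request.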
- Around line 362-384: The clickhouse_v2 case building the
ClickhouseEventRepository is missing the startTimeMaxAgeMs config that the
clickhouse case passes; update the clickhouse_v2 branch where
ClickhouseEventRepository is constructed to include startTimeMaxAgeMs:
env.EVENTS_CLICKHOUSE_START_TIME_MAX_AGE_MS so the same start-time
clamping/validation used by the v1 config is applied to v2 as well.
In
`@apps/webapp/app/services/dataStores/organizationDataStoresRegistry.server.ts`:
- Around line 72-75: The loop that builds the lookup map silently overwrites
keys produced from `${orgId}:${row.kind}`, causing nondeterministic behavior
when an org is assigned to multiple stores; before calling lookup.set(key,
parsed) check lookup.has(key) and if present throw a clear error (or assert)
that includes the conflicting orgId, kind, and identifiers for both the existing
entry and the current row (e.g., existingParsed/id and parsed/id) so overlapping
assignments discovered during findMany() fail fast and surface which rows/IDs
conflict.
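The fail-fast duplicate check can be sketched like this; the row shape and field names are inferred from the comment, not taken from the actual Prisma model:

```typescript
// Assumed row shape for illustration; the real OrganizationDataStore model
// may carry more fields.
type DataStoreRow = { id: string; organizationId: string; kind: string };

function buildLookup(rows: DataStoreRow[]): Map<string, DataStoreRow> {
  const lookup = new Map<string, DataStoreRow>();
  for (const row of rows) {
    const key = `${row.organizationId}:${row.kind}`;
    const existing = lookup.get(key);
    if (existing) {
      // Fail fast and name both conflicting rows instead of silently
      // letting the last row win.
      throw new Error(
        `Conflicting data store assignment for ${key}: rows ${existing.id} and ${row.id}`
      );
    }
    lookup.set(key, row);
  }
  return lookup;
}
```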
- Around line 94-157: The addDataStore, updateDataStore and deleteDataStore
methods only write to the DB but don't update the in-memory registry map
(_lookup), leaving routing stale; after each successful mutation (after the
secret write and after the prisma create/update/delete) call the registry's
method that repopulates the in-memory map used by get() (the same function the
background loader uses — e.g. this.reload() / this._reloadLookup() /
this._refreshLookup(), whichever exists) so _lookup is refreshed immediately and
subsequent get() calls see the new data.
In `@apps/webapp/app/services/queryService.server.ts`:
- Around line 278-279: The concurrency limiter is currently keyed by projectId
but the ClickHouse client and execution are organization-scoped (see
clickhouseFactory.getClickhouseForOrganization and executeTSQL), so update the
limiter acquire/release calls (where limiter.acquire and limiter.release are
used around query execution) to use organizationId instead of projectId so
concurrent queries are limited per organization rather than per project; ensure
any helper that constructs limiter keys or caches limiter instances uses
organizationId consistently in the same places referenced around the query
execution.
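The key change is which identifier feeds the limiter. With a toy counting limiter (the service's real limiter API isn't shown in this review), keying by organizationId makes two projects in the same org share one budget:

```typescript
// Toy in-flight counter keyed by an arbitrary string; a stand-in for the
// service's real concurrency limiter.
class KeyedLimiter {
  private readonly inFlight = new Map<string, number>();
  constructor(private readonly maxPerKey: number) {}

  acquire(key: string): boolean {
    const current = this.inFlight.get(key) ?? 0;
    if (current >= this.maxPerKey) return false;
    this.inFlight.set(key, current + 1);
    return true;
  }

  release(key: string): void {
    const current = this.inFlight.get(key) ?? 0;
    this.inFlight.set(key, Math.max(0, current - 1));
  }
}

const limiter = new KeyedLimiter(2);
// Keyed by organizationId (not projectId), so queries from different
// projects in the same org count against the same cap.
const key = "org_123";
```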
In `@apps/webapp/app/services/runsReplicationService.server.ts`:
- Around line 655-685: The current loop serializes inserts per ClickHouse group
because each await blocks the next; change it to run per-group flushes
concurrently with a bounded concurrency limiter (e.g., p-limit or an internal
semaphore) so slow/unhealthy ClickHouse targets don't stall unrelated orgs. For
each group (the items handled around sortTaskRunInserts, sortPayloadInserts,
combinedTaskRunInserts/combinedPayloadInserts and using this.#insertWithRetry
which calls this.#insertTaskRunInserts and this.#insertPayloadInserts with
flushId), create an async task that performs the two insertWithRetry calls,
records the first taskRunError/payloadError seen, and updates
_taskRunsInsertedCounter/_payloadsInsertedCounter when each insert succeeds;
schedule those tasks through the limiter and await Promise.all on the scheduled
tasks; ensure you keep pushing group.taskRunInserts and group.payloadInserts
into combined* before scheduling so semantics remain the same.
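A hand-rolled bounded runner shows the intended shape (p-limit would do the same job); in the real service, each task body would wrap the two #insertWithRetry calls for one ClickHouse group:

```typescript
// Run tasks with at most `limit` in flight at once; results keep the
// original task order.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++; // claim the next task synchronously
      results[index] = await tasks[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Because each group occupies at most one worker slot, a slow or unhealthy ClickHouse target delays only its own flushes rather than every organization's.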
In `@apps/webapp/app/v3/otlpExporter.server.ts`:
- Around line 1207-1215: The initializer initializeOTLPExporter is reading
settings directly from process.env; change it to use the typed env export
instead: replace process.env.OTLP_EXPORTER_VERBOSE with
env.OTLP_EXPORTER_VERBOSE (coerce to "1" check as before) and replace
process.env.SERVER_OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT with
env.SERVER_OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT and parseInt that value,
falling back to 8192; ensure the existing env import is used (or add it if
missing) and keep clickhouseFactory and OTLPExporter usage unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3a31161c-3ac9-4af9-bcd2-f229a19f7a61
📒 Files selected for processing (69)

- .cursor/mcp.json
- .server-changes/organization-scoped-clickhouse.md
- CLAUDE.md
- apps/webapp/app/env.server.ts
- apps/webapp/app/presenters/v3/ApiRunListPresenter.server.ts
- apps/webapp/app/presenters/v3/CreateBulkActionPresenter.server.ts
- apps/webapp/app/presenters/v3/RunPresenter.server.ts
- apps/webapp/app/presenters/v3/RunTagListPresenter.server.ts
- apps/webapp/app/presenters/v3/SpanPresenter.server.ts
- apps/webapp/app/presenters/v3/TaskListPresenter.server.ts
- apps/webapp/app/presenters/v3/UsagePresenter.server.ts
- apps/webapp/app/presenters/v3/ViewSchedulePresenter.server.ts
- apps/webapp/app/presenters/v3/WaitpointPresenter.server.ts
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.dashboards.$dashboardKey/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors._index/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.logs/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.models.$modelId/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.models._index/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.models.compare/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.prompts.$promptSlug/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.prompts._index/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs.$runParam/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.runs._index/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.test.tasks.$taskParam/route.tsx
- apps/webapp/app/routes/admin.api.v1.runs-replication.create.ts
- apps/webapp/app/routes/admin.api.v1.runs-replication.start.ts
- apps/webapp/app/routes/admin.data-stores.tsx
- apps/webapp/app/routes/admin.tsx
- apps/webapp/app/routes/api.v1.prompts.$slug.ts
- apps/webapp/app/routes/api.v1.prompts.$slug.versions.ts
- apps/webapp/app/routes/api.v1.prompts._index.ts
- apps/webapp/app/routes/otel.v1.logs.ts
- apps/webapp/app/routes/otel.v1.metrics.ts
- apps/webapp/app/routes/otel.v1.traces.ts
- apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.logs.$logId.tsx
- apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.logs.ts
- apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.prompts.$promptSlug.generations.ts
- apps/webapp/app/runEngine/concerns/traceEvents.server.ts
- apps/webapp/app/runEngine/services/triggerFailedTask.server.ts
- apps/webapp/app/services/admin/missingLlmModels.server.ts
- apps/webapp/app/services/clickhouse/clickhouseFactory.server.ts
- apps/webapp/app/services/clickhouse/clickhouseSecretSchemas.server.ts
- apps/webapp/app/services/clickhouseInstance.server.ts
- apps/webapp/app/services/dataStores/organizationDataStoreConfigSchemas.server.ts
- apps/webapp/app/services/dataStores/organizationDataStoresRegistry.server.ts
- apps/webapp/app/services/dataStores/organizationDataStoresRegistryInstance.server.ts
- apps/webapp/app/services/queryService.server.ts
- apps/webapp/app/services/runsReplicationInstance.server.ts
- apps/webapp/app/services/runsReplicationService.server.ts
- apps/webapp/app/v3/eventRepository/clickhouseEventRepository.server.ts
- apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts
- apps/webapp/app/v3/eventRepository/eventRepository.server.ts
- apps/webapp/app/v3/eventRepository/eventRepository.types.ts
- apps/webapp/app/v3/eventRepository/index.server.ts
- apps/webapp/app/v3/otlpExporter.server.ts
- apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts
- apps/webapp/app/v3/services/bulk/Bulk
ActionV2.server.tsapps/webapp/test/clickhouseFactory.test.tsapps/webapp/test/organizationDataStoresRegistry.test.tsapps/webapp/test/runsBackfiller.test.tsapps/webapp/test/runsReplicationBenchmark.test.tsapps/webapp/test/runsReplicationService.part1.test.tsapps/webapp/test/runsReplicationService.part2.test.tsapps/webapp/test/utils/replicationUtils.tsapps/webapp/test/utils/testReplicationClickhouseFactory.tsapps/webapp/test/utils/tracing.tsinternal-packages/database/prisma/migrations/20260331212308_add_organization_data_stores/migration.sqlinternal-packages/database/prisma/schema.prisma
💤 Files with no reviewable changes (2)
- apps/webapp/app/v3/eventRepository/clickhouseEventRepositoryInstance.server.ts
- apps/webapp/app/services/clickhouseInstance.server.ts
```ts
for (const [clickhouse, group] of groups) {
  sortTaskRunInserts(group.taskRunInserts);
  sortPayloadInserts(group.payloadInserts);
  combinedTaskRunInserts.push(...group.taskRunInserts);
  combinedPayloadInserts.push(...group.payloadInserts);

  const [trErr] = await this.#insertWithRetry(
    (attempt) => this.#insertTaskRunInserts(clickhouse, group.taskRunInserts, attempt),
    "task run inserts",
    flushId
  );
  if (trErr && !taskRunError) {
    taskRunError = trErr;
  }

  const [plErr] = await this.#insertWithRetry(
    (attempt) => this.#insertPayloadInserts(clickhouse, group.payloadInserts, attempt),
    "payload inserts",
    flushId
  );
  if (plErr && !payloadError) {
    payloadError = plErr;
  }

  if (!trErr) {
    this._taskRunsInsertedCounter.add(group.taskRunInserts.length);
  }
  if (!plErr) {
    this._payloadsInsertedCounter.add(group.payloadInserts.length);
  }
}
```
Avoid serializing the per-ClickHouse-group flushes.
Each await here makes every org group wait behind the slowest ClickHouse target. If one org-specific service is unhealthy and spends a few seconds in retry/backoff, unrelated orgs in the same replication batch inherit that lag too. On this hot path, the groups should be flushed behind a bounded limiter instead of one-by-one.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/webapp/app/services/runsReplicationService.server.ts` around lines 655 -
685, The current loop serializes inserts per ClickHouse group because each await
blocks the next; change it to run per-group flushes concurrently with a bounded
concurrency limiter (e.g., p-limit or an internal semaphore) so slow/unhealthy
ClickHouse targets don't stall unrelated orgs. For each group (the items handled
around sortTaskRunInserts, sortPayloadInserts,
combinedTaskRunInserts/combinedPayloadInserts and using this.#insertWithRetry
which calls this.#insertTaskRunInserts and this.#insertPayloadInserts with
flushId), create an async task that performs the two insertWithRetry calls,
records the first taskRunError/payloadError seen, and updates
_taskRunsInsertedCounter/_payloadsInsertedCounter when each insert succeeds;
schedule those tasks through the limiter and await Promise.all on the scheduled
tasks; ensure you keep pushing group.taskRunInserts and group.payloadInserts
into combined* before scheduling so semantics remain the same.
Ideally we would do these in parallel, but there can be issues with the Linux network socket write buffer (or something). We had to increase net.ipv4.tcp_wmem to 4096 20480 8388608, for example, because the amount of data written to the socket was causing issues. I wonder if this could be driven via an env var like RUN_REPLICATION_GROUP_WRITING_STRATEGY="parallel|sequential"
@ericallam, that's a very pragmatic constraint — socket write-buffer exhaustion is a real issue at high throughput and silently manifests as latency spikes rather than hard errors, so it's easy to overlook. The env var approach is a clean way to unlock parallelism for operators who have already tuned net.ipv4.tcp_wmem (or are running on a network stack that doesn't hit this limit).
Here's a sketch of how it could be wired in:
```ts
// In RunsReplicationServiceOptions
groupFlushStrategy?: "sequential" | "parallel";
groupFlushConcurrency?: number; // only used when strategy === "parallel"
```

Then inside `#flushBatch`, branch on the strategy:
```ts
const strategy = this.options.groupFlushStrategy ?? "sequential";
const groupEntries = Array.from(groups.entries());

// Collect combined inserts regardless of strategy
for (const [, group] of groupEntries) {
  sortTaskRunInserts(group.taskRunInserts);
  sortPayloadInserts(group.payloadInserts);
  combinedTaskRunInserts.push(...group.taskRunInserts);
  combinedPayloadInserts.push(...group.payloadInserts);
}

const flushGroup = async ([clickhouse, group]: [
  ClickHouse,
  typeof groups extends Map<infer _, infer V> ? V : never,
]) => {
  const [trErr] = await this.#insertWithRetry(
    (attempt) => this.#insertTaskRunInserts(clickhouse, group.taskRunInserts, attempt),
    "task run inserts",
    flushId
  );
  if (trErr && !taskRunError) taskRunError = trErr;

  const [plErr] = await this.#insertWithRetry(
    (attempt) => this.#insertPayloadInserts(clickhouse, group.payloadInserts, attempt),
    "payload inserts",
    flushId
  );
  if (plErr && !payloadError) payloadError = plErr;

  if (!trErr) this._taskRunsInsertedCounter.add(group.taskRunInserts.length);
  if (!plErr) this._payloadsInsertedCounter.add(group.payloadInserts.length);
};

if (strategy === "parallel") {
  const limit = pLimit(this.options.groupFlushConcurrency ?? groupEntries.length);
  await Promise.all(groupEntries.map((entry) => limit(() => flushGroup(entry))));
} else {
  for (const entry of groupEntries) {
    await flushGroup(entry);
  }
}
```

The env var can then be read in `runsReplicationInstance.server.ts` / the admin create route and mapped to `groupFlushStrategy`. Keeping `"sequential"` as the default is the right call; parallelism can be an opt-in for deployments that have raised the socket buffer limits.
✏️ Learnings added
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3333
File: apps/webapp/app/services/runsReplicationService.server.ts:655-685
Timestamp: 2026-04-17T13:20:11.004Z
Learning: In `apps/webapp/app/services/runsReplicationService.server.ts`, the per-ClickHouse-group inserts inside `#flushBatch` are intentionally serialized (sequential) by default. Parallelizing group flushes causes Linux socket write-buffer pressure that required kernel tuning (`net.ipv4.tcp_wmem` set to `4096 20480 8388608`). Parallel flushing may be exposed as an opt-in via a `groupFlushStrategy` env var/option, but sequential must remain the safe default. Do not flag the sequential loop as a performance issue without acknowledging this constraint.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/app/services/taskIdentifierRegistry.server.ts:24-67
Timestamp: 2026-04-13T21:44:00.032Z
Learning: In `apps/webapp/app/services/taskIdentifierRegistry.server.ts`, the sequential upsert/updateMany/findMany writes in `syncTaskIdentifiers` are intentionally NOT wrapped in a Prisma transaction. This function runs only during deployment-change events (low-concurrency path), and any partial `isInLatestDeployment` state is acceptable because it self-corrects on the next deployment. Do not flag this as a missing-transaction/atomicity issue in future reviews.
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3331
File: apps/webapp/app/runEngine/concerns/batchPayloads.server.ts:112-136
Timestamp: 2026-04-07T14:12:59.018Z
Learning: In `apps/webapp/app/runEngine/concerns/batchPayloads.server.ts`, the `pRetry` call wrapping `uploadPacketToObjectStore` intentionally retries **all** error types (no `shouldRetry` filter / `AbortError` guards). The maintainer explicitly prefers over-retrying to under-retrying because multiple heterogeneous object store backends are supported and it is impractical to enumerate all permanent error signatures. Do not flag this as an issue in future reviews.
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: packages/redis-worker/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:43.173Z
Learning: Applies to packages/redis-worker/**/redis-worker/src/worker.ts : Worker loop and job processing should implement concurrency control in src/worker.ts
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3333
File: apps/webapp/app/services/dataStores/organizationDataStoresRegistry.server.ts:115-178
Timestamp: 2026-04-17T11:28:53.324Z
Learning: In `apps/webapp/app/services/dataStores/organizationDataStoresRegistry.server.ts`, the `addDataStore`, `updateDataStore`, and `deleteDataStore` methods intentionally do NOT call `reload()` or update the in-memory `_lookup` map after writing to the database. The registry runs as a singleton on many servers simultaneously; refreshing in-memory state on only the server that handled the mutation would create inconsistent routing across the fleet. The intended propagation mechanism is the periodic background reload controlled by `ORGANIZATION_DATA_STORES_RELOAD_INTERVAL_MS` (default 60s) in `organizationDataStoresRegistryInstance.server.ts`. Some brief routing disruption when switching an org between data stores is an accepted operational trade-off. Do not flag the mutation methods for missing in-memory refresh in future reviews.
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3219
File: internal-packages/run-engine/src/run-queue/index.ts:3264-3353
Timestamp: 2026-03-14T10:06:33.506Z
Learning: In `internal-packages/run-engine/src/run-queue/index.ts`, the `dequeueMessagesFromCkQueue` Lua command intentionally dequeues only 1 message per CK sub-queue per iteration (round-robin). This is by design for fairness across concurrency keys (CKs). The `actualMaxCount * 3` scan limit for `ZRANGEBYSCORE` on the CK index is also intentional — it is adequate because each CK independently gets the full queue concurrency limit, so per-CK concurrency blocks rarely occur.
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2870
File: apps/webapp/app/services/redisConcurrencyLimiter.server.ts:56-66
Timestamp: 2026-01-12T17:18:09.451Z
Learning: In `apps/webapp/app/services/redisConcurrencyLimiter.server.ts`, the query concurrency limiter will not be deployed with Redis Cluster mode, so multi-key operations (keyKey and globalKey in different hash slots) are acceptable and will function correctly in standalone Redis mode.
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-04-16T14:19:16.309Z
Learning: Applies to apps/webapp/app/v3/services/{cancelTaskRun,batchTriggerV3}.server.ts : When editing services that branch on `RunEngineVersion` to support both V1 and V2 (e.g., `cancelTaskRun.server.ts`, `batchTriggerV3.server.ts`), only modify V2 code paths
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Learnt from: myftija
Repo: triggerdotdev/trigger.dev PR: 3274
File: apps/webapp/app/services/runsReplicationService.server.ts:922-924
Timestamp: 2026-03-26T09:02:07.973Z
Learning: When parsing Trigger.dev task run annotations in server-side services, keep `TaskRun.annotations` strictly conforming to the `RunAnnotations` schema from `trigger.dev/core/v3`. If the code already uses `RunAnnotations.safeParse` (e.g., in a `#parseAnnotations` helper), treat that as intentional/necessary for atomic, schema-accurate annotation handling. Do not recommend relaxing the annotation payload schema or using a permissive “passthrough” parse path, since the annotations are expected to be written atomically in one operation and should not contain partial/legacy payloads that would require a looser parser.
Force-pushed a60f2cb to 6179dc7
Force-pushed 6179dc7 to 66f294e
♻️ Duplicate comments (2)
apps/webapp/app/v3/eventRepository/index.server.ts (2)
218-222: ⚠️ Potential issue | 🔴 Critical

`recordRunEvent` can route Postgres runs into the ClickHouse factory path.

For legacy runs where `foundRun.taskEventStore` is Postgres, lines 218-222 still resolve via `clickhouseFactory`, which can break event writes on this hot path.

🐛 Proposed fix

```diff
-const { repository: $eventRepository } =
-  await clickhouseFactory.getEventRepositoryForOrganization(
-    foundRun.taskEventStore,
-    foundRun.runtimeEnvironment.organizationId
-  );
+const { repository: $eventRepository } = await getV3EventRepository(
+  foundRun.runtimeEnvironment.organizationId,
+  foundRun.taskEventStore
+);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/v3/eventRepository/index.server.ts` around lines 218 - 222, recordRunEvent is incorrectly routing runs whose foundRun.taskEventStore is Postgres into clickhouseFactory.getEventRepositoryForOrganization; change the routing to branch on foundRun.taskEventStore and call the appropriate factory when resolving the repository (e.g., use postgresFactory.getEventRepositoryForOrganization or the generic repository factory for Postgres stores instead of clickhouseFactory). Locate the call to clickhouseFactory.getEventRepositoryForOrganization and add a guard that checks foundRun.taskEventStore (or the TaskEventStore enum) and delegates to the correct factory/method so Postgres runs use the Postgres event repository path.
80-99: ⚠️ Potential issue | 🟠 Major

Postgres fallback returns inconsistent store token and still pre-calls ClickHouse factory.

Line 84 and Line 98 return a Postgres repository but may report a non-Postgres store via `getTaskEventStore()`. Also, Line 87 calls the ClickHouse factory before fully validating the store type, which weakens the default fallback intent for unexpected persisted values.

🐛 Proposed fix

```diff
 export async function getEventRepository(
   organizationId: string,
   featureFlags: Record<string, unknown> | undefined,
   parentStore: string | undefined
 ): Promise<{ repository: IEventRepository; store: string }> {
   const taskEventStore = parentStore ?? (await resolveTaskEventRepositoryFlag(featureFlags));

-  // Support legacy Postgres store for self-hosters
-  if (taskEventStore === EVENT_STORE_TYPES.POSTGRES) {
-    return { repository: eventRepository, store: getTaskEventStore() };
-  }
-
-  const { repository: resolvedRepository } =
-    await clickhouseFactory.getEventRepositoryForOrganization(taskEventStore, organizationId);
-
   switch (taskEventStore) {
+    case EVENT_STORE_TYPES.POSTGRES: {
+      return { repository: eventRepository, store: EVENT_STORE_TYPES.POSTGRES };
+    }
     case EVENT_STORE_TYPES.CLICKHOUSE_V2: {
+      const { repository: resolvedRepository } =
+        await clickhouseFactory.getEventRepositoryForOrganization(taskEventStore, organizationId);
       return { repository: resolvedRepository, store: EVENT_STORE_TYPES.CLICKHOUSE_V2 };
     }
     case EVENT_STORE_TYPES.CLICKHOUSE: {
+      const { repository: resolvedRepository } =
+        await clickhouseFactory.getEventRepositoryForOrganization(taskEventStore, organizationId);
       return { repository: resolvedRepository, store: EVENT_STORE_TYPES.CLICKHOUSE };
     }
     default: {
-      return { repository: eventRepository, store: getTaskEventStore() };
+      return { repository: eventRepository, store: EVENT_STORE_TYPES.POSTGRES };
     }
   }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/webapp/app/v3/eventRepository/index.server.ts` around lines 80 - 99, The code prematurely calls clickhouseFactory.getEventRepositoryForOrganization and can return a mismatched store token; fix by first determining taskEventStore (using parentStore ?? resolveTaskEventRepositoryFlag(...)) and if taskEventStore === EVENT_STORE_TYPES.POSTGRES immediately return { repository: eventRepository, store: EVENT_STORE_TYPES.POSTGRES } without calling clickhouseFactory.getEventRepositoryForOrganization; only call clickhouseFactory.getEventRepositoryForOrganization when taskEventStore is not POSTGRES, and in the default fallback return use getTaskEventStore() or EVENT_STORE_TYPES.POSTGRES consistently so returned store token matches the repository (refer to taskEventStore, EVENT_STORE_TYPES.POSTGRES, clickhouseFactory.getEventRepositoryForOrganization, getTaskEventStore, eventRepository, organizationId).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@apps/webapp/app/v3/eventRepository/index.server.ts`:
- Around line 218-222: recordRunEvent is incorrectly routing runs whose
foundRun.taskEventStore is Postgres into
clickhouseFactory.getEventRepositoryForOrganization; change the routing to
branch on foundRun.taskEventStore and call the appropriate factory when
resolving the repository (e.g., use
postgresFactory.getEventRepositoryForOrganization or the generic repository
factory for Postgres stores instead of clickhouseFactory). Locate the call to
clickhouseFactory.getEventRepositoryForOrganization and add a guard that checks
foundRun.taskEventStore (or the TaskEventStore enum) and delegates to the
correct factory/method so Postgres runs use the Postgres event repository path.
- Around line 80-99: The code prematurely calls
clickhouseFactory.getEventRepositoryForOrganization and can return a mismatched
store token; fix by first determining taskEventStore (using parentStore ??
resolveTaskEventRepositoryFlag(...)) and if taskEventStore ===
EVENT_STORE_TYPES.POSTGRES immediately return { repository: eventRepository,
store: EVENT_STORE_TYPES.POSTGRES } without calling
clickhouseFactory.getEventRepositoryForOrganization; only call
clickhouseFactory.getEventRepositoryForOrganization when taskEventStore is not
POSTGRES, and in the default fallback return use getTaskEventStore() or
EVENT_STORE_TYPES.POSTGRES consistently so returned store token matches the
repository (refer to taskEventStore, EVENT_STORE_TYPES.POSTGRES,
clickhouseFactory.getEventRepositoryForOrganization, getTaskEventStore,
eventRepository, organizationId).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fcb3eb53-e928-4269-b5f1-e7983bb980be
📒 Files selected for processing (3)
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsxapps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors._index/route.tsxapps/webapp/app/v3/eventRepository/index.server.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors._index/route.tsx
- apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx
📜 Review details
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
Files:
apps/webapp/app/v3/eventRepository/index.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
apps/webapp/app/v3/eventRepository/index.server.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from `packages/core` using subpaths only, never from the root. Example: `@trigger.dev/core/subpath` instead of `@trigger.dev/core`.
Add crumbs as you write code using `// @crumbs` comments or `// #region @crumbs` blocks. They stay on the branch throughout development and must be stripped by `agentcrumbs strip` before merge.
Files:
apps/webapp/app/v3/eventRepository/index.server.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/webapp/app/v3/eventRepository/index.server.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
apps/webapp/app/v3/eventRepository/index.server.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
apps/webapp/**/*.{ts,tsx}: Access environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env`
Use subpath exports from the `@trigger.dev/core` package instead of importing from the root `@trigger.dev/core` path
Use named constants for sentinel/placeholder values (e.g. `const UNSET_VALUE = '__unset__'`) instead of raw string literals scattered across comparisons
Files:
apps/webapp/app/v3/eventRepository/index.server.ts
apps/webapp/**/*.server.ts
📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)
apps/webapp/**/*.server.ts: Never use `request.signal` for detecting client disconnects. Use `getRequestAbortSignal()` from `app/services/httpAsyncStorage.server.ts` instead, which is wired directly to Express `res.on('close')` and fires reliably
Access environment variables via the `env` export from `app/env.server.ts`. Never use `process.env` directly
Always use `findFirst` instead of `findUnique` in Prisma queries. `findUnique` has an implicit DataLoader that batches concurrent calls and has active bugs even in Prisma 6.x (uppercase UUIDs returning null, composite key SQL correctness issues, 5-10x worse performance). `findFirst` is never batched and avoids this entire class of issues
Files:
apps/webapp/app/v3/eventRepository/index.server.ts
**/*.{js,jsx,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Prefer static imports over dynamic imports. Only use dynamic `import()` when circular dependencies cannot be resolved otherwise, code splitting is genuinely needed for performance, or the module must be loaded conditionally at runtime.
Files:
apps/webapp/app/v3/eventRepository/index.server.ts
🧠 Learnings (16)
📓 Common learnings
Learnt from: CR
URL:
File: CLAUDE.md:undefined-undefined
Timestamp: 2026-04-16T16:43:43.344Z
Learning: For apps and internal packages in `apps/*` and `internal-packages/*`, use `typecheck` for verification. Never use `build` - building proves almost nothing about correctness.
Learnt from: CR
URL:
File: CLAUDE.md:undefined-undefined
Timestamp: 2026-04-16T16:43:43.344Z
Learning: For public packages in `packages/*`, use `build` for verification.
Learnt from: CR
URL:
File: CLAUDE.md:undefined-undefined
Timestamp: 2026-04-16T16:43:43.344Z
Learning: Add a changeset when modifying any public package (`packages/*` or `integrations/*`). Default to patch for bug fixes and minor changes. Confirm with maintainers before selecting minor (new features). Never select major without explicit approval.
Learnt from: CR
URL:
File: CLAUDE.md:undefined-undefined
Timestamp: 2026-04-16T16:43:43.344Z
Learning: When modifying only server components (`apps/webapp/`, `apps/supervisor/`, etc.) with no package changes, add a `.server-changes/` file instead of a changeset.
Learnt from: CR
URL:
File: CLAUDE.md:undefined-undefined
Timestamp: 2026-04-16T16:43:43.344Z
Learning: Zod must be pinned to a single version across the entire monorepo (currently `3.25.76`). When adding zod to a new or existing package, use the exact same version as the rest of the repo - never a different version or a range.
Learnt from: CR
URL:
File: CLAUDE.md:undefined-undefined
Timestamp: 2026-04-16T16:43:43.344Z
Learning: Only modify V2 code paths when encountering V1/V2 branching in services. Legacy V1-only code includes MarQS queue, triggerTaskV1, cancelTaskRunV1. All new work uses Run Engine 2.0 (`internal/run-engine`) and redis-worker.
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-03-26T10:02:25.354Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3254
File: apps/webapp/app/services/platformNotifications.server.ts:363-385
Timestamp: 2026-03-26T10:02:25.354Z
Learning: In `triggerdotdev/trigger.dev`, the `getNextCliNotification` fallback in `apps/webapp/app/services/platformNotifications.server.ts` intentionally uses `prisma.orgMember.findFirst` (single org) when no `projectRef` is provided. This is acceptable for v1 because the CLI (`dev` and `login` commands) always passes `projectRef` in normal usage, making the fallback a rare edge case. Do not flag the single-org fallback as a multi-org correctness bug in this file.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-04-13T21:44:00.032Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/app/services/taskIdentifierRegistry.server.ts:24-67
Timestamp: 2026-04-13T21:44:00.032Z
Learning: In `apps/webapp/app/services/taskIdentifierRegistry.server.ts`, the sequential upsert/updateMany/findMany writes in `syncTaskIdentifiers` are intentionally NOT wrapped in a Prisma transaction. This function runs only during deployment-change events (low-concurrency path), and any partial `isInLatestDeployment` state is acceptable because it self-corrects on the next deployment. Do not flag this as a missing-transaction/atomicity issue in future reviews.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-04-16T14:21:14.907Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: internal-packages/database/prisma/schema.prisma:666-666
Timestamp: 2026-04-16T14:21:14.907Z
Learning: In `triggerdotdev/trigger.dev`, the `BackgroundWorkerTask` covering index on `(runtimeEnvironmentId, slug, triggerSource)` lives in `internal-packages/database/prisma/migrations/20260413000000_add_bwt_covering_index/migration.sql` as a `CREATE INDEX CONCURRENTLY IF NOT EXISTS`, intentionally in its own migration file separate from the `TaskIdentifier` table migration. Do not flag this index as missing from the schema migrations in future reviews.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-04-16T13:45:18.782Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/test/engine/taskIdentifierRegistry.test.ts:3-19
Timestamp: 2026-04-16T13:45:18.782Z
Learning: In `apps/webapp/test/engine/taskIdentifierRegistry.test.ts`, the `vi.mock` calls for `~/services/taskIdentifierCache.server` (stubbing `getTaskIdentifiersFromCache` and `populateTaskIdentifierCache`), `~/models/task.server` (stubbing `getAllTaskIdentifiers`), and `~/db.server` (stubbing `prisma` and `$replica`) are intentional. The suite uses real Postgres via testcontainers for all `TaskIdentifier` DB operations, but isolates the Redis cache layer and legacy query fallback as separate concerns not exercised in this test file. Do not flag these mocks as violations of the no-mocks policy in future reviews.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-04-16T14:21:15.229Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/app/components/logs/LogsTaskFilter.tsx:135-163
Timestamp: 2026-04-16T14:21:15.229Z
Learning: In `triggerdotdev/trigger.dev` PR `#3368`, the `TaskIdentifier` table has a `@unique([runtimeEnvironmentId, slug])` DB constraint, guaranteeing one row per (environment, slug). In components like `apps/webapp/app/components/logs/LogsTaskFilter.tsx` and `apps/webapp/app/components/runs/v3/RunFilters.tsx`, using `key={item.slug}` for SelectItem list items is correct and unique. Do NOT flag `key={item.slug}` as potentially non-unique — the old duplicate-(slug, triggerSource) issue only existed with the legacy `DISTINCT` query, which this registry replaces.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-04-13T21:44:00.644Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/app/services/taskIdentifierRegistry.server.ts:49-58
Timestamp: 2026-04-13T21:44:00.644Z
Learning: In `triggerdotdev/trigger.dev`, a deployment with zero tasks is not a realistic scenario in practice. Do not flag missing handling for empty-task deployments in `apps/webapp/app/services/taskIdentifierRegistry.server.ts` or related registry/sync logic.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-03-22T19:34:22.737Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors._index/route.tsx:99-103
Timestamp: 2026-03-22T19:34:22.737Z
Learning: In `apps/webapp/app/presenters/v3/ErrorsListPresenter.server.ts`, the `statuses` filter (UNRESOLVED | RESOLVED | IGNORED) is applied in-memory after the ClickHouse query and a batch Postgres lookup via `getErrorGroupStates`. This is intentional: `status` lives in the Postgres `ErrorGroupState` table, not in ClickHouse, so post-query filtering is the correct approach. Do not flag this as a missing predicate or a no-op filter during code review.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
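The post-query filtering that learning describes can be sketched as: fetch error groups from ClickHouse, batch-look-up their Postgres state, then filter in memory. A minimal illustration, assuming simplified types — the real `ErrorsListPresenter` shapes differ:

```typescript
// Sketch: `status` lives in Postgres (ErrorGroupState), not ClickHouse,
// so the statuses filter is applied after joining the two result sets.
type ErrorGroup = { fingerprint: string; count: number };
type Status = "UNRESOLVED" | "RESOLVED" | "IGNORED";

function filterByStatus(
  clickhouseGroups: ErrorGroup[],
  statesByFingerprint: Map<string, Status>, // batch Postgres lookup result
  statuses: Status[]
): Array<ErrorGroup & { status: Status }> {
  return clickhouseGroups
    .map((g) => ({
      ...g,
      // Groups with no state row are treated as UNRESOLVED here.
      status: statesByFingerprint.get(g.fingerprint) ?? ("UNRESOLVED" as Status),
    }))
    .filter((g) => statuses.includes(g.status));
}
```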
📚 Learning: 2026-04-15T15:39:06.868Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-15T15:39:06.868Z
Learning: For V1/V2 branching in services within `apps/webapp/app/v3/`, only modify V2 code paths. All new work uses Run Engine 2.0 (`internal/run-engine`) and redis-worker, not legacy V1 code (MarQS queue, triggerTaskV1, cancelTaskRunV1, etc.).
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-04-16T14:19:16.309Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-04-16T14:19:16.309Z
Learning: Applies to apps/webapp/{app/v3/services/triggerTask.server.ts,app/v3/services/batchTriggerV3.server.ts} : In `triggerTask.server.ts` and `batchTriggerV3.server.ts`, do NOT add database queries. Task defaults (TTL, etc.) are resolved via `backgroundWorkerTask.findFirst()` in the queue concern (`queues.server.ts`). Piggyback on the existing query instead of adding new ones
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-04-16T14:19:16.309Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-04-16T14:19:16.309Z
Learning: Applies to apps/webapp/app/v3/services/{cancelTaskRun,batchTriggerV3}.server.ts : When editing services that branch on `RunEngineVersion` to support both V1 and V2 (e.g., `cancelTaskRun.server.ts`, `batchTriggerV3.server.ts`), only modify V2 code paths
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-04-07T14:12:59.018Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3331
File: apps/webapp/app/runEngine/concerns/batchPayloads.server.ts:112-136
Timestamp: 2026-04-07T14:12:59.018Z
Learning: In `apps/webapp/app/runEngine/concerns/batchPayloads.server.ts`, the `pRetry` call wrapping `uploadPacketToObjectStore` intentionally retries **all** error types (no `shouldRetry` filter / `AbortError` guards). The maintainer explicitly prefers over-retrying to under-retrying because multiple heterogeneous object store backends are supported and it is impractical to enumerate all permanent error signatures. Do not flag this as an issue in future reviews.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
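The retry-everything semantics in that learning (no `shouldRetry` filter, no `AbortError` guard) can be sketched with a plain loop; this stand-in helper is illustrative, not the `pRetry` call in `batchPayloads.server.ts`:

```typescript
// Sketch of retry-all semantics: no error classification, every failure
// is retried up to `retries` more times. Mirrors using pRetry without a
// shouldRetry filter, which over-retries rather than under-retries.
async function retryAll<T>(fn: () => Promise<T>, retries: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error; // retried regardless of error type
    }
  }
  throw lastError;
}
```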
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
📚 Learning: 2026-03-29T19:16:28.864Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3291
File: apps/webapp/app/v3/featureFlags.ts:53-65
Timestamp: 2026-03-29T19:16:28.864Z
Learning: When reviewing TypeScript code that uses Zod v3, treat `z.coerce.*()` schemas as their direct Zod type (e.g., `z.coerce.boolean()` returns a `ZodBoolean` with `_def.typeName === "ZodBoolean"`) rather than a `ZodEffects`. Only `.preprocess()`, `.refine()`/`.superRefine()`, and `.transform()` are expected to wrap schemas in `ZodEffects`. Therefore, in reviewers’ logic like `getFlagControlType`, do not flag/unblock failures that require unwrapping `ZodEffects` when the input schema is a `z.coerce.*` schema.
Applied to files:
apps/webapp/app/v3/eventRepository/index.server.ts
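That distinction can be illustrated without pulling in Zod itself by modelling the `_def.typeName` shapes the learning describes. The object shapes below mimic Zod v3 internals and are assumptions for illustration, not Zod's public API:

```typescript
// Illustrative model of Zod v3 internals: z.coerce.boolean() yields a
// ZodBoolean (typeName "ZodBoolean"), while .preprocess()/.refine()/
// .transform() wrap the schema in a ZodEffects. Real Zod types differ.
type FakeSchema = { _def: { typeName: string; schema?: FakeSchema } };

function getFlagControlType(schema: FakeSchema): string {
  // Unwrap ZodEffects (preprocess/refine/transform) to the inner schema.
  let current = schema;
  while (current._def.typeName === "ZodEffects" && current._def.schema) {
    current = current._def.schema;
  }
  return current._def.typeName === "ZodBoolean" ? "toggle" : "input";
}

// Like z.coerce.boolean() — a plain ZodBoolean, no ZodEffects wrapper.
const coercedBoolean: FakeSchema = { _def: { typeName: "ZodBoolean" } };
// Like z.preprocess(fn, z.boolean()) — wrapped in ZodEffects.
const preprocessed: FakeSchema = {
  _def: { typeName: "ZodEffects", schema: { _def: { typeName: "ZodBoolean" } } },
};
```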
🔇 Additional comments (1)
apps/webapp/app/v3/eventRepository/index.server.ts (1)
24-33: Good extraction for org-scoped repository resolution.
`resolveEventRepositoryForStore()` cleanly centralizes org-aware ClickHouse lookup with Prisma fallback.
The only way to get a ClickHouse client now is through the factory. All existing code has been refactored to use it and pass in an organization. `runReplication` and the `otlpExporter` are the hot paths here and need special attention in reviews.
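The per-organization factory can be sketched as a memoized map of clients, so hot paths like `runReplication` and the `otlpExporter` reuse one client per org instead of reconnecting per event. A minimal sketch under assumed names — the real `ClickhouseFactory` and registry lookup in the PR differ:

```typescript
// Minimal sketch of a per-organization client factory. The client shape
// and the registry lookup (`resolveUrlForOrg`) are stand-ins.
type ClickhouseClient = { url: string; insert: (rows: unknown[]) => void };

const clients = new Map<string, ClickhouseClient>();

// Stand-in for the OrganizationDataStoresRegistry lookup.
function resolveUrlForOrg(organizationId: string): string {
  return `https://clickhouse.example.com/${organizationId}`;
}

// Returns the same client instance for repeated calls with the same org.
function getClickhouseForOrganization(organizationId: string): ClickhouseClient {
  let client = clients.get(organizationId);
  if (!client) {
    client = { url: resolveUrlForOrg(organizationId), insert: () => {} };
    clients.set(organizationId, client);
  }
  return client;
}
```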
…resenter clients Co-Authored-By: Matt Aitken <matt@mattaitken.com>
Co-Authored-By: Matt Aitken <matt@mattaitken.com>
The factory should not import eventRepository.server.ts — doing so pulls
the tracePubSub singleton into any module graph that imports the factory,
which eagerly connects to Redis at module load time (see singleton.ts).
The fallback now lives in index.server.ts via getEventRepositoryForStore,
which is called from RunPresenter, SpanPresenter, and recordRunEvent.
This restores runsReplicationBenchmark.test.ts's module isolation while
still handling non-ClickHouse ("taskEvent"/"postgres") stores.
Co-Authored-By: Matt Aitken <matt@mattaitken.com>
Sort findMany() by `key` for a stable winner when multiple rows assign the
same `${orgId}:${kind}`, and log an error identifying the winning and
ignored rows instead of overwriting silently. Does not fail the load —
failing the registry would break every customer, not just the misconfigured
orgs.
Co-Authored-By: Matt Aitken <matt@mattaitken.com>
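The stable-winner behaviour described in that commit can be sketched roughly as follows; the field names are illustrative, not the actual `OrganizationDataStore` columns:

```typescript
// Sketch: rows sorted by `key` so the first row for a given
// `${orgId}:${kind}` deterministically wins; later duplicates are
// logged and ignored rather than overwriting or failing the load.
type DataStoreRow = { key: string; organizationId: string; kind: string };

function buildRegistry(rows: DataStoreRow[]): Map<string, DataStoreRow> {
  const sorted = [...rows].sort((a, b) => a.key.localeCompare(b.key));
  const registry = new Map<string, DataStoreRow>();
  for (const row of sorted) {
    const mapKey = `${row.organizationId}:${row.kind}`;
    const winner = registry.get(mapKey);
    if (winner) {
      console.error(
        `duplicate data store for ${mapKey}: keeping ${winner.key}, ignoring ${row.key}`
      );
      continue;
    }
    registry.set(mapKey, row);
  }
  return registry;
}
```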
…stry.server.ts Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Added `OrganizationDataStore`, which allows orgs to have their data stored in specific, separate services.

For now this is just used for ClickHouse. When using ClickHouse we get a client from the factory and pass in the org id.
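Per the PR summary, orgs without a configured ClickHouse data store keep using the Postgres-backed event repository. That fallback can be sketched like this; the store kinds and names are assumptions for illustration:

```typescript
// Sketch of resolving an event repository per org: a configured
// "clickhouse" store wins, otherwise fall back to the default
// Postgres/taskEvent-backed repository. Real types differ.
type StoreKind = "clickhouse" | "postgres";
type EventRepository = { kind: StoreKind };

const storesByOrg = new Map<string, StoreKind>([["org_ch", "clickhouse"]]);

function getEventRepositoryForOrg(organizationId: string): EventRepository {
  const kind = storesByOrg.get(organizationId) ?? "postgres";
  return { kind };
}
```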
Particular care must be taken with two hot-insert paths: