Add builder-based execution path (AppendSlot + builder kernels)#7600
Add builder-based execution path (AppendSlot + builder kernels)#7600joseph-isaacs wants to merge 10 commits intodevelopfrom
Conversation
Introduces a builder-driven execution mode alongside the existing
`ExecuteSlot` / `Done` model. Encodings can now append directly into an
`ArrayBuilder` without materializing intermediate canonical arrays.
New types (in `vortex_array::builder_kernel`):
* `BuilderStep { Done, ExecuteSlot, AppendSlot }`
* `BuilderResult { array, builder, step }`
* `AppendToBuilderKernel<V>` typed kernel trait
* `DynAppendToBuilderKernel` type-erased dispatch trait
* `BuilderKernelSession` session variable keyed by encoding id
New executor surface:
* `ExecutionStep::AppendSlot(usize)` to bootstrap the builder path
* `execute_into_builder(array, builder, ctx)` public entry point
* internal `drive_builder` + `execute_until_predicate` helpers
The `Chunked` encoding gets a builder kernel that scans its chunk slots,
returning `BuilderStep::AppendSlot` for the first still-populated slot.
When a chunk is driven through the builder path the executor takes the
slot, leaving it as `None` — the kernel iterates to the next `Some` on
the next call and returns `Done` once all chunks are consumed.
Tests exercise both `Chunked(Primitive)` and nested
`Chunked(Chunked(Primitive))` through the builder path.
Known issue documented via comment in `drive_builder`: the plan relies on
`take_slot_unchecked` truly leaving the taken slot as `None`, but that
only happens when the `Arc` has refcount 1. The driver now explicitly
drops extra references after `optimize()` and passes arrays by value
through the kernel to keep refcount at one; any future caller that keeps
an outstanding clone of the array they hand to the builder path would
re-trigger the original infinite loop.
Signed-off-by: Claude <noreply@anthropic.com>
Previously `take_slot_unchecked` only actually mutated the parent when the `Arc` had refcount 1; otherwise it cloned the child and left the slot as `Some`. That was fine for the existing take->put-back pattern but broke the builder path, which relies on `slot(i).is_none()` to track consumed children (causing an infinite loop whenever a caller held an outstanding clone of the array). Fix: on the shared-Arc path, clone the child out and construct a fresh parent with that slot set to None via a new unsafe `DynArray::with_slots_unchecked` that skips `V::validate`. The None state is invisible outside the executor — callers still must put the slot back (or drive it to completion via the builder path) before the parent escapes. Update the chunked builder-kernel tests to deliberately hold an extra clone of the array across the `execute_into_builder` call, exercising the previously-broken shared-Arc path. Signed-off-by: Claude <noreply@anthropic.com>
`Executable for ArrayRef::execute` is defined as "remove one node from the execution path." `ExecutionStep::AppendSlot` is an exception: there is no useful partial progress point inside a builder-driven sub-tree, so single-step execute drives the slot all the way to canonical and splices the result back in one call. Call that out on both the trait impl and the `AppendSlot` variant so the collapsed semantics aren't a surprise to callers. Signed-off-by: Claude <noreply@anthropic.com>
…-execution-path-0qvq2 Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk> # Conflicts: # vortex-array/src/executor.rs
Merging this PR will degrade performance by 37.99%
Performance ChangesComparing |
Polar Signals Profiling ResultsLatest Run
Previous Runs (2)
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingVortex (geomean): 0.980x ➖ datafusion / vortex-file-compressed (0.980x ➖, 1↑ 0↓)
|
File Sizes: PolarSignals ProfilingNo file size changes detected. |
Benchmarks: FineWeb NVMeVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.989x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.013x ➖, 0↑ 0↓)
datafusion / parquet (0.999x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.004x ➖, 1↑ 1↓)
duckdb / vortex-compact (1.010x ➖, 0↑ 0↓)
duckdb / parquet (1.021x ➖, 0↑ 1↓)
Full attributed analysis
|
File Sizes: FineWeb NVMeNo file size changes detected. |
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.914x ➖, 34↑ 1↓)
datafusion / vortex-compact (0.928x ➖, 24↑ 0↓)
datafusion / parquet (0.950x ➖, 17↑ 0↓)
duckdb / vortex-file-compressed (1.001x ➖, 3↑ 2↓)
duckdb / vortex-compact (1.006x ➖, 1↑ 2↓)
duckdb / parquet (1.005x ➖, 1↑ 2↓)
duckdb / duckdb (1.022x ➖, 1↑ 7↓)
Full attributed analysis
|
File Sizes: TPC-DS SF=1 on NVMENo file size changes detected. |
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨Benchmark |
Benchmarks: FineWeb S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.854x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.748x ➖, 3↑ 0↓)
datafusion / parquet (0.882x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.945x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.920x ➖, 1↑ 0↓)
duckdb / parquet (0.910x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.971x ➖, 2↑ 0↓)
datafusion / vortex-compact (1.102x ❌, 0↑ 15↓)
datafusion / parquet (0.981x ➖, 0↑ 0↓)
datafusion / arrow (0.951x ➖, 5↑ 0↓)
duckdb / vortex-file-compressed (1.019x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.031x ➖, 0↑ 2↓)
duckdb / parquet (1.035x ➖, 0↑ 1↓)
duckdb / duckdb (1.043x ➖, 0↑ 2↓)
Full attributed analysis
|
File Sizes: TPC-H SF=10 on NVMENo file size changes detected. |
Benchmarks: TPC-H SF=1 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.864x ➖, 2↑ 0↓)
datafusion / vortex-compact (0.853x ➖, 4↑ 1↓)
datafusion / parquet (0.915x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.929x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.004x ➖, 0↑ 0↓)
duckdb / parquet (0.939x ➖, 0↑ 0↓)
Full attributed analysis
|
BENCHMARK FAILEDBenchmark |
Benchmarks: TPC-H SF=10 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.733x ➖, 8↑ 0↓)
datafusion / vortex-compact (0.914x ➖, 0↑ 0↓)
datafusion / parquet (0.844x ➖, 3↑ 0↓)
duckdb / vortex-file-compressed (0.985x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.934x ➖, 0↑ 0↓)
duckdb / parquet (0.956x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.954x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.958x ➖, 0↑ 0↓)
datafusion / parquet (0.976x ➖, 2↑ 1↓)
datafusion / arrow (0.976x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.991x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.985x ➖, 0↑ 0↓)
duckdb / parquet (0.924x ➖, 6↑ 0↓)
duckdb / duckdb (0.979x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=1 on NVMENo file size changes detected. |
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨Benchmark |
BENCHMARK FAILEDBenchmark |
No description provided.