Add refinement types (and NormalizedVector) #7614
Closed
connortsui20 wants to merge 6 commits into develop from
Conversation
Force-pushed from cfd8f52 to aed8bf9
Changed the title from "(and NormalizedExtension)" to "(and NormalizedVector)"
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Adds a sibling extension type to Vector over FixedSizeList<float, N>, tagged
vortex.tensor.normalized_vector, whose rows are guaranteed unit-norm or zero.

- New types/normalized_vector/ module with the ExtVTable impl, constructors
  (try_new validates; unsafe new_unchecked for lossy encodings), and the
  AnyNormalizedVector strict matcher.
- AnyVector now accepts both Vector and NormalizedVector; VectorMatcherMetadata
  gains an is_normalized bit so callers can distinguish.
- Both types share validate_vector_storage_dtype, so the storage-shape contract
  stays in one place.
- Register NormalizedVector in the session and re-export the module.

No scalar fn or encoding behaviour changes; that wiring comes in a follow-up
commit.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
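The unit-norm-or-zero row invariant that try_new enforces can be sketched in a few lines. This is a standalone illustration, not the crate's actual constructor: the flat-buffer layout, the helper name rows_unit_norm_or_zero, and the tolerance are all assumptions.

```rust
// Tolerance for the norm check: an assumption, not the crate's value.
const EPS: f32 = 1e-5;

/// Returns true iff every `dim`-wide row of `data` has L2 norm ~1 or is all
/// zero — the invariant a `NormalizedVector::try_new` would validate before
/// accepting storage (and that `unsafe new_unchecked` would skip).
fn rows_unit_norm_or_zero(data: &[f32], dim: usize) -> bool {
    data.chunks_exact(dim).all(|row| {
        let norm_sq: f32 = row.iter().map(|x| x * x).sum();
        norm_sq == 0.0 || (norm_sq.sqrt() - 1.0).abs() <= EPS
    })
}

fn main() {
    // One unit vector and one zero vector: accepted.
    assert!(rows_unit_norm_or_zero(&[0.6, 0.8, 0.0, 0.0], 2));
    // A non-normalized row: rejected, so `try_new` would return an error.
    assert!(!rows_unit_norm_or_zero(&[3.0, 4.0], 2));
}
```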
…boquant
- L2Denorm::try_new_array now validates + promotes a plain Vector child to
NormalizedVector so downstream operators can rely on the type-level invariant.
- Replace DenormOrientation with NormalForm { Plain, Normalized, Denormalized },
and route CosineSimilarity, InnerProduct, and L2Norm through it. Cosine of
two normalized inputs collapses to a plain dot product; L2Norm of a
NormalizedVector short-circuits to a constant 1.0.
- SorfTransform accepts both Vector and NormalizedVector children (via the
widened AnyVector matcher) and always produces a plain Vector output, since
the inverse transform does not preserve unit norm.
- Split turboquant_encode_unchecked into turboquant_encode_normalized, which
takes an AnyNormalizedVector extension view and drops the unsafe contract in
favour of a type-level precondition. turboquant_encode normalizes up front
and forwards. Update the sole out-of-crate caller in
vortex/benches/single_encoding_throughput.rs.
- Add the normalized_vector_array test helper in utils.rs for the new
scalar-fn tests.
- Regenerate vortex-tensor/public-api.lock.
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
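The NormalForm routing described in this commit can be illustrated with a small self-contained sketch. The enum variants mirror the commit message; the cosine and dot function bodies are assumptions for illustration, not the vortex-tensor scalar-fn code (which operates on arrays, not slices).

```rust
#[derive(Clone, Copy, PartialEq)]
enum NormalForm {
    Plain,
    Normalized,
    Denormalized,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity routed through `NormalForm`: when both inputs carry the
/// type-level unit-norm invariant, the norms are known to be 1 and the
/// computation collapses to a plain dot product.
fn cosine(a: &[f32], fa: NormalForm, b: &[f32], fb: NormalForm) -> f32 {
    if fa == NormalForm::Normalized && fb == NormalForm::Normalized {
        dot(a, b)
    } else {
        let (na, nb) = (dot(a, a).sqrt(), dot(b, b).sqrt());
        dot(a, b) / (na * nb)
    }
}

fn main() {
    // Orthogonal unit vectors: the fast path returns the raw dot product.
    let sim = cosine(&[1.0, 0.0], NormalForm::Normalized, &[0.0, 1.0], NormalForm::Normalized);
    assert_eq!(sim, 0.0);
    // Plain inputs take the full normalizing path.
    assert_eq!(cosine(&[2.0, 0.0], NormalForm::Plain, &[2.0, 0.0], NormalForm::Plain), 1.0);
}
```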
…r Vector
NormalizedVector previously sat alongside Vector as a sibling ExtVTable with the
same FSL storage. Promote it to a true
RefinementVTable<Source = ExtRefinedSource<Vector>> so its storage dtype is
Extension(Vector(FSL)) and the type system records it as a refinement of Vector
rather than a lookalike.

- AnyVector goes back to strictly matching plain Vector; AnyNormalizedVector is
  the matcher for the refinement. TensorMatch gains a NormalizedVector variant
  and AnyTensor considers all three families.
- NormalizedVector constructors build an inner Vector extension first, then the
  outer refinement. wrap_vector_unchecked is the new entry point for callers
  that already have a validated Vector in hand.
- Add inner_vector_array / vector_fsl_storage_dtype helpers so scalar fns and
  TurboQuant can drill past the extra extension layer when they need FSL.
- L2Denorm::try_new_array promotes plain Vector children via
  wrap_vector_unchecked after validation; turboquant_encode_normalized drills
  to the FSL before reading elements.
- try_build_constant_l2_denorm is gated on AnyVector (not AnyTensor) so
  FixedShapeTensor constants stay in the generic cosine path rather than
  getting wrapped as a NormalizedVector of the wrong family.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
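The Extension(Vector(FSL)) nesting this commit introduces can be modeled with a toy dtype enum. This is not the vortex-array DType; the enum shape and the plain Vector's extension id are assumptions, with only "vortex.tensor.normalized_vector" taken from the PR description.

```rust
/// Toy model of an extension dtype wrapping a storage dtype.
#[derive(Debug, PartialEq)]
enum DType {
    FixedSizeList { elem: &'static str, len: usize },
    Extension { id: &'static str, storage: Box<DType> },
}

/// Plain Vector: extension over FSL<f32, len>. The id here is a placeholder.
fn vector(len: usize) -> DType {
    DType::Extension {
        id: "vortex.tensor.vector",
        storage: Box::new(DType::FixedSizeList { elem: "f32", len }),
    }
}

/// NormalizedVector as a refinement: its storage is the inner Vector
/// extension itself, not the raw FSL — Extension(Vector(FSL)).
fn normalized_vector(len: usize) -> DType {
    DType::Extension {
        id: "vortex.tensor.normalized_vector",
        storage: Box::new(vector(len)),
    }
}

fn main() {
    // Drilling one level past the refinement recovers the plain Vector dtype,
    // which is what helpers like inner_vector_array rely on.
    match normalized_vector(4) {
        DType::Extension { storage, .. } => assert_eq!(*storage, vector(4)),
        other => panic!("expected extension, got {:?}", other),
    }
}
```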
…nement
Replaces the separate RefinementVTable trait hierarchy (with a typed Source
associated type, two source markers, and a blanket ExtVTable impl) with a single
defaulted method on ExtVTable:
fn is_refinement(&self) -> bool { false }
When true, the vtable declares that its storage dtype is the type it refines, so
scalar-fn dispatch can transparently peel the refinement when a fn does not
accept it. The peel rule itself lands in a follow-up commit; this one is a pure
trait refactor.
- Delete vortex-array/src/dtype/extension/refinement.rs entirely
(RefinementVTable, RefinedSource, PrimitiveRefinedSource, ExtRefinedSource,
refine_array_scalar_default, and the blanket impl).
- Convert DivisibleInt and EvenDivisibleInt test extensions from RefinementVTable
to plain ExtVTable. Divisibility and even-ness checks now run in unpack_native;
the RefinementVTable-specific validate_array override test is dropped because
there is no analogue in the new design.
- Convert NormalizedVector from RefinementVTable to plain ExtVTable. Storage
layout is unchanged (still Extension(Vector, FSL<float, N>)); validate_dtype
confirms the inner Vector extension, unpack_native forwards the storage value
through untouched, and the bulk unit-norm check stays in try_new as before.
- Regenerate vortex-array/public-api.lock and vortex-tensor/public-api.lock.
No behavioural changes. All 308 vortex-tensor tests (307 prior + one new
is_refinement_is_true) pass.
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
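The defaulted-method design above can be sketched as a minimal trait. The trait shape imitates the commit's description; the id method and both impls are simplified stand-ins, not the actual vortex-array ExtVTable.

```rust
/// Minimal stand-in for an extension vtable with the defaulted refinement hook.
trait ExtVTable {
    fn id(&self) -> &'static str;

    /// When true, the vtable declares that its storage dtype is the type it
    /// refines. Defaults to false, so existing extensions need no changes.
    fn is_refinement(&self) -> bool {
        false
    }
}

struct Uuid;
impl ExtVTable for Uuid {
    fn id(&self) -> &'static str { "vortex.uuid" }
    // Uses the default: Uuid is not a refinement of its storage type.
}

struct NormalizedVector;
impl ExtVTable for NormalizedVector {
    fn id(&self) -> &'static str { "vortex.tensor.normalized_vector" }
    fn is_refinement(&self) -> bool { true }
}

fn main() {
    assert!(!Uuid.is_refinement());
    assert!(NormalizedVector.is_refinement());
}
```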
…truction
When a scalar fn is given a refinement-typed input that its `return_dtype`
rejects, the framework now transparently peels the refinement one level at a
time until the fn accepts the shape. This lands in `ScalarFnArray::try_new`
rather than as a reduce rule because `try_new` already calls `return_dtype`
and aborts on error — no ScalarFnArray tree is ever constructed to reduce when
a refinement input is rejected.
Algorithm (see `peel_refinements_and_resolve_dtype` for details):
1. Compute `scalar_fn.return_dtype(arg_dtypes)` on the current children.
2. If it succeeds, done. This covers both non-refinement inputs and fns
that explicitly accept the refinement (category B / C / D from the plan
— specialization path).
3. If it errors, peel one level from every child whose dtype is an
extension dtype with `is_refinement() == true`. Replace each with its
storage array.
4. If no children were peeled, return the original error.
5. Otherwise, loop back to step 1 with the peeled children. Multi-level
refinement chains (e.g. EvenDivisibleInt → DivisibleInt → U64) unwind one
level per iteration.
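The five steps above can be sketched over a toy dtype model. This is not the real `peel_refinements_and_resolve_dtype`: here `return_dtype` only accepts U64, and the loop peels dtypes directly, whereas the actual code replaces child arrays with their storage arrays.

```rust
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum DType {
    U64,
    Ext { id: &'static str, refinement: bool, storage: Box<DType> },
}

/// Stand-in for a strict scalar fn's `return_dtype`: only plain U64 inputs.
fn return_dtype(args: &[DType]) -> Result<DType, String> {
    if args.iter().all(|d| *d == DType::U64) {
        Ok(DType::U64)
    } else {
        Err("unsupported dtype".to_string())
    }
}

fn peel_refinements_and_resolve(mut args: Vec<DType>) -> Result<DType, String> {
    loop {
        // Steps 1–2: try the fn on the current children; success means done.
        let err = match return_dtype(&args) {
            Ok(d) => return Ok(d),
            Err(e) => e,
        };
        // Step 3: peel one level from every refinement-typed child.
        let mut peeled = false;
        for a in args.iter_mut() {
            let inner = match a {
                DType::Ext { refinement: true, storage, .. } => Some((**storage).clone()),
                _ => None,
            };
            if let Some(inner) = inner {
                *a = inner;
                peeled = true;
            }
        }
        // Step 4: nothing was peeled — surface the original error.
        if !peeled {
            return Err(err);
        }
        // Step 5: loop back with the peeled children.
    }
}

fn main() {
    // EvenDivisibleInt(DivisibleInt(U64)) unwinds one level per iteration.
    let div = DType::Ext { id: "divisible", refinement: true, storage: Box::new(DType::U64) };
    let even = DType::Ext { id: "even_divisible", refinement: true, storage: Box::new(div) };
    assert_eq!(peel_refinements_and_resolve(vec![even]), Ok(DType::U64));
    // A non-refinement extension is never peeled: the original error surfaces.
    let uuid = DType::Ext { id: "uuid", refinement: false, storage: Box::new(DType::U64) };
    assert!(peel_refinements_and_resolve(vec![uuid]).is_err());
}
```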
Implements category A from the plan (refinement-transparent scalar fns): when
a generic fn doesn't know about a refinement, the refinement is lost and the
fn operates on the source storage. Refinement-preserving semantics
(categories C and D) are deferred; the TODO(connor) in
`peel_refinements_and_resolve_dtype` documents the intended direction — an
inverted-control hook on the refinement vtable (rather than per-fn
specialization, which is blocked by the vortex-array → downstream crate
dependency direction).
Exposes `ExtDTypeRef::is_refinement()` as a type-erased forwarder on
`DynExtDType`. No other public API changes.
Four new unit tests in `arrays::scalar_fn::array::tests`:
- peels_single_level_refinement_through_strict_add: `Binary(Add)` over
`DivisibleInt(U64)` succeeds and returns `U64`.
- peels_two_level_refinement_chain_through_strict_add:
`EvenDivisibleInt(DivisibleInt(U64))` unwinds both layers.
- does_not_peel_non_refinement_extension: `Uuid` (is_refinement == false)
is not peeled; the fn's original error surfaces.
- does_not_peel_when_scalar_fn_accepts_refinement: `Binary(Eq)` accepts
extension inputs directly, so children retain their refinement dtypes.
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Force-pushed from cbc8dfb to 3913b1e
I might separate this into 2 PRs
Summary
Closes: #000
Testing