
Add refinement types (and NormalizedVector)#7614

Closed
connortsui20 wants to merge 6 commits into develop from ct/refinement

Conversation

@connortsui20
Contributor

I might separate this into 2 PRs

Summary

Closes: #000

Testing

@connortsui20 changed the title from Add refinement types (and NormalizedExtension to Add refinement types (and NormalizedExtension) on Apr 23, 2026
@connortsui20 changed the title from Add refinement types (and NormalizedExtension) to Add refinement types (and NormalizedVector) on Apr 23, 2026
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Adds a sibling extension type to Vector over FixedSizeList<float, N>, tagged
vortex.tensor.normalized_vector, whose rows are guaranteed unit-norm or zero.

- New types/normalized_vector/ module with the ExtVTable impl, constructors
  (try_new validates; unsafe new_unchecked for lossy encodings), and the
  AnyNormalizedVector strict matcher.
- AnyVector now accepts both Vector and NormalizedVector; VectorMatcherMetadata
  gains an is_normalized bit so callers can distinguish.
- Both types share validate_vector_storage_dtype, so the storage-shape contract
  stays in one place.
- Register NormalizedVector in the session and re-export the module.

No scalar fn or encoding behaviour changes; that wiring comes in a follow-up
commit.
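The invariant that `try_new` is described as enforcing can be sketched as a standalone check. This is a minimal illustration, not the PR's actual code: the function name `is_unit_or_zero_norm`, the flat `&[f32]` buffer representation, and the tolerance parameter are all assumptions.

```rust
/// Returns true when every row of a flattened FixedSizeList buffer has an
/// L2 norm of 1.0 (within `eps`) or exactly 0.0, i.e. the "unit-norm or
/// zero" invariant that `NormalizedVector::try_new` is said to validate.
/// (Illustrative sketch; names and representation are assumptions.)
fn is_unit_or_zero_norm(rows: &[f32], dim: usize, eps: f32) -> bool {
    rows.chunks_exact(dim).all(|row| {
        let norm_sq: f32 = row.iter().map(|x| x * x).sum();
        norm_sq == 0.0 || (norm_sq.sqrt() - 1.0).abs() <= eps
    })
}
```

Under this framing, `unsafe new_unchecked` would skip exactly this scan, which is why lossy encodings that already guarantee the invariant can use it without paying the validation cost.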

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…boquant

- L2Denorm::try_new_array now validates + promotes a plain Vector child to
  NormalizedVector so downstream operators can rely on the type-level invariant.
- Replace DenormOrientation with NormalForm { Plain, Normalized, Denormalized },
  and route CosineSimilarity, InnerProduct, and L2Norm through it. Cosine of
  two normalized inputs collapses to a plain dot product, and L2Norm of a
  NormalizedVector short-circuits to a constant 1.0.
- SorfTransform accepts both Vector and NormalizedVector children (via the
  widened AnyVector matcher) and always produces a plain Vector output, since
  the inverse transform does not preserve unit norm.
- Split turboquant_encode_unchecked into turboquant_encode_normalized, which
  takes an AnyNormalizedVector extension view and drops the unsafe contract in
  favour of a type-level precondition. turboquant_encode normalizes up front
  and forwards. Update the sole out-of-crate caller in
  vortex/benches/single_encoding_throughput.rs.
- Add the normalized_vector_array test helper in utils.rs for the new
  scalar-fn tests.
- Regenerate vortex-tensor/public-api.lock.
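The two shortcuts above can be sketched with a toy version of the routing. The `NormalForm` variant names match the commit text, but the dispatch functions below are illustrative stand-ins, not the real scalar-fn implementations.

```rust
/// Matches the commit's NormalForm { Plain, Normalized, Denormalized }.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum NormalForm {
    Plain,
    Normalized,
    Denormalized,
}

/// Cosine similarity with the described shortcut: when both inputs are
/// known-normalized, cosine collapses to a plain dot product.
fn cosine(a: &[f32], b: &[f32], fa: NormalForm, fb: NormalForm) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    match (fa, fb) {
        (NormalForm::Normalized, NormalForm::Normalized) => dot,
        _ => {
            let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
            let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
            dot / (na * nb)
        }
    }
}

/// L2Norm of a NormalizedVector short-circuits to a constant 1.0.
fn l2_norm(v: &[f32], form: NormalForm) -> f32 {
    match form {
        NormalForm::Normalized => 1.0,
        _ => v.iter().map(|x| x * x).sum::<f32>().sqrt(),
    }
}
```

The type-level invariant is what makes the `Normalized` arms sound: the shortcut is only valid because `try_new` (or the encoder) has already established unit norm.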

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…r Vector

NormalizedVector previously sat alongside Vector as a sibling ExtVTable with the
same FSL storage. Promote it to a true RefinementVTable<Source = ExtRefinedSource<Vector>>
so its storage dtype is Extension(Vector(FSL)) and the type system records it as
a refinement of Vector rather than a lookalike.

- AnyVector goes back to strictly matching plain Vector; AnyNormalizedVector is
  the matcher for the refinement. TensorMatch gains a NormalizedVector variant
  and AnyTensor considers all three families.
- NormalizedVector constructors build an inner Vector extension first, then the
  outer refinement. wrap_vector_unchecked is the new entry point for callers
  that already have a validated Vector in hand.
- Add inner_vector_array / vector_fsl_storage_dtype helpers so scalar fns and
  TurboQuant can drill past the extra extension layer when they need FSL.
- L2Denorm::try_new_array promotes plain Vector children via
  wrap_vector_unchecked after validation; turboquant_encode_normalized drills
  to the FSL before reading elements.
- try_build_constant_l2_denorm is gated on AnyVector (not AnyTensor) so
  FixedShapeTensor constants stay in the generic cosine path rather than
  getting wrapped as a NormalizedVector of the wrong family.
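The layered storage described above, Extension(NormalizedVector, Extension(Vector, FSL)), can be modelled with a toy dtype enum to show what a `vector_fsl_storage_dtype`-style helper has to do. This is a deliberately minimal model; the real vortex DType and the helper's actual signature are assumptions here.

```rust
/// Toy model of a dtype tree with extension layers (illustrative only).
#[derive(Clone, PartialEq, Debug)]
enum ToyDType {
    FixedSizeList { elem: &'static str, n: usize },
    Extension { id: &'static str, storage: Box<ToyDType> },
}

/// Drill past every extension layer to the underlying FSL storage, the way
/// scalar fns and TurboQuant are described as doing when they need raw FSL.
fn fsl_storage(dtype: &ToyDType) -> &ToyDType {
    match dtype {
        ToyDType::Extension { storage, .. } => fsl_storage(storage),
        fsl => fsl,
    }
}
```

With the refinement layering, callers that previously matched one extension deep now have to recurse through two, which is exactly the gap the `inner_vector_array` / `vector_fsl_storage_dtype` helpers close.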

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…nement

Replaces the separate RefinementVTable trait hierarchy (with a typed Source
associated type, two source markers, and a blanket ExtVTable impl) with a single
defaulted method on ExtVTable:

    fn is_refinement(&self) -> bool { false }

When true, the vtable declares that its storage dtype is the type it refines, so
scalar-fn dispatch can transparently peel the refinement when a fn does not
accept it. The peel rule itself lands in a follow-up commit; this one is a pure
trait refactor.
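The shape of this design can be sketched with a stand-in trait. The `is_refinement` default matches the commit; the trait body and the two example vtables are simplified illustrations, not the real `ExtVTable`.

```rust
/// Stand-in for the real ExtVTable, showing the defaulted method.
trait ExtVTable {
    fn id(&self) -> &'static str;

    /// When true, this vtable declares that its storage dtype is the type
    /// it refines. Defaults to false, so existing extensions are untouched.
    fn is_refinement(&self) -> bool {
        false
    }
}

/// An ordinary extension keeps the default.
struct Uuid;
impl ExtVTable for Uuid {
    fn id(&self) -> &'static str {
        "vortex.uuid"
    }
}

/// A refinement opts in by overriding the default.
struct NormalizedVector;
impl ExtVTable for NormalizedVector {
    fn id(&self) -> &'static str {
        "vortex.tensor.normalized_vector"
    }
    fn is_refinement(&self) -> bool {
        true
    }
}
```

Compared with the deleted `RefinementVTable` hierarchy, the single boolean trades static knowledge of the Source type for a much smaller API surface: dispatch only needs to know whether it may peel, not what it will find underneath.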

- Delete vortex-array/src/dtype/extension/refinement.rs entirely
  (RefinementVTable, RefinedSource, PrimitiveRefinedSource, ExtRefinedSource,
  refine_array_scalar_default, and the blanket impl).
- Convert DivisibleInt and EvenDivisibleInt test extensions from RefinementVTable
  to plain ExtVTable. Divisibility and even-ness checks now run in unpack_native;
  the RefinementVTable-specific validate_array override test is dropped because
  there is no analogue in the new design.
- Convert NormalizedVector from RefinementVTable to plain ExtVTable. Storage
  layout is unchanged (still Extension(Vector, FSL<float, N>)); validate_dtype
  confirms the inner Vector extension, unpack_native forwards the storage value
  through untouched, and the bulk unit-norm check stays in try_new as before.
- Regenerate vortex-array/public-api.lock and vortex-tensor/public-api.lock.

No behavioural changes. All 308 vortex-tensor tests (307 prior + one new
is_refinement_is_true) pass.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…truction

When a scalar fn is given a refinement-typed input that its `return_dtype`
rejects, the framework now transparently peels the refinement one level at a
time until the fn accepts the shape. This lands in `ScalarFnArray::try_new`
rather than as a reduce rule because `try_new` already calls `return_dtype`
and aborts on error — no ScalarFnArray tree is ever constructed to reduce when
a refinement input is rejected.

Algorithm (see `peel_refinements_and_resolve_dtype` for details):

  1. Compute `scalar_fn.return_dtype(arg_dtypes)` on the current children.
  2. If it succeeds, done. This covers both non-refinement inputs and fns
     that explicitly accept the refinement (category B / C / D from the plan
     — specialization path).
  3. If it errors, peel one level from every child whose dtype is an
     extension dtype with `is_refinement() == true`. Replace each with its
     storage array.
  4. If no children were peeled, return the original error.
  5. Otherwise, loop back to step 1 with the peeled children. Multi-level
     refinement chains (e.g. EvenDivisibleInt → DivisibleInt → U64) unwind one
     level per iteration.
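The five steps above can be sketched as a self-contained loop. The dtype enum and the `resolve_dtype` signature are illustrative stand-ins for the real framework types; only the control flow mirrors `peel_refinements_and_resolve_dtype`.

```rust
/// Toy dtype with refinement layers (stand-in for the real DType).
#[derive(Clone, PartialEq, Debug)]
enum ToyDType {
    U64,
    Refinement { id: &'static str, storage: Box<ToyDType> },
}

/// Mirrors the peel loop: try `return_dtype`; on error, strip one
/// refinement layer from every refinement-typed child and retry. If no
/// child could be peeled, surface the original error (step 4).
fn resolve_dtype(
    return_dtype: impl Fn(&[ToyDType]) -> Result<ToyDType, String>,
    mut children: Vec<ToyDType>,
) -> Result<ToyDType, String> {
    loop {
        match return_dtype(&children) {
            Ok(dtype) => return Ok(dtype), // steps 1-2
            Err(err) => {
                let mut peeled = false;
                for child in &mut children {
                    // Step 3: replace each refinement child with its storage.
                    if let ToyDType::Refinement { storage, .. } = child {
                        let inner = (**storage).clone();
                        *child = inner;
                        peeled = true;
                    }
                }
                if !peeled {
                    return Err(err); // step 4: nothing left to peel
                }
                // Step 5: loop back with the peeled children; multi-level
                // chains unwind one layer per iteration.
            }
        }
    }
}
```

Because the loop peels at most one layer per iteration and each iteration strictly shrinks the total refinement depth, termination is guaranteed even for chains like EvenDivisibleInt over DivisibleInt over U64.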

Implements category A from the plan (refinement-transparent scalar fns): when
a generic fn doesn't know about a refinement, the refinement is lost and the
fn operates on the source storage. Refinement-preserving semantics
(categories C and D) are deferred; the TODO(connor) in
`peel_refinements_and_resolve_dtype` documents the intended direction — an
inverted-control hook on the refinement vtable (rather than per-fn
specialization, which is blocked by the vortex-array → downstream crate
dependency direction).

Exposes `ExtDTypeRef::is_refinement()` as a type-erased forwarder on
`DynExtDType`. No other public API changes.

Four new unit tests in `arrays::scalar_fn::array::tests`:

  - peels_single_level_refinement_through_strict_add: `Binary(Add)` over
    `DivisibleInt(U64)` succeeds and returns `U64`.
  - peels_two_level_refinement_chain_through_strict_add:
    `EvenDivisibleInt(DivisibleInt(U64))` unwinds both layers.
  - does_not_peel_non_refinement_extension: `Uuid` (is_refinement == false)
    is not peeled; the fn's original error surfaces.
  - does_not_peel_when_scalar_fn_accepts_refinement: `Binary(Eq)` accepts
    extension inputs directly, so children retain their refinement dtypes.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 deleted the ct/refinement branch April 24, 2026 17:38