Add refinement types (and NormalizedVector) #7614
Closed
connortsui20 wants to merge 6 commits into develop from
Conversation
Force-pushed from cfd8f52 to aed8bf9
Changed the title from "(and NormalizedExtension)" to "(and NormalizedVector)"
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Adds a sibling extension type to Vector over FixedSizeList<float, N>, tagged
vortex.tensor.normalized_vector, whose rows are guaranteed unit-norm or zero.

- New types/normalized_vector/ module with the ExtVTable impl, constructors
  (try_new validates; unsafe new_unchecked for lossy encodings), and the
  AnyNormalizedVector strict matcher.
- AnyVector now accepts both Vector and NormalizedVector; VectorMatcherMetadata
  gains an is_normalized bit so callers can distinguish.
- Both types share validate_vector_storage_dtype, so the storage-shape contract
  stays in one place.
- Register NormalizedVector in the session and re-export the module.

No scalar fn or encoding behaviour changes; that wiring comes in a follow-up
commit.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
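The unit-norm-or-zero row invariant that try_new enforces can be sketched in a few lines. This is a standalone illustration, not the crate's actual constructor: the flat-buffer layout, the helper name rows_unit_norm_or_zero, and the tolerance are all assumptions.

```rust
// Tolerance for the norm check: an assumption, not the crate's value.
const EPS: f32 = 1e-5;

/// Returns true iff every `dim`-wide row of `data` has L2 norm ~1 or is all
/// zero — the invariant a `NormalizedVector::try_new` would validate before
/// accepting storage (and that `unsafe new_unchecked` would skip).
fn rows_unit_norm_or_zero(data: &[f32], dim: usize) -> bool {
    data.chunks_exact(dim).all(|row| {
        let norm_sq: f32 = row.iter().map(|x| x * x).sum();
        norm_sq == 0.0 || (norm_sq.sqrt() - 1.0).abs() <= EPS
    })
}

fn main() {
    // One unit vector and one zero vector: accepted.
    assert!(rows_unit_norm_or_zero(&[0.6, 0.8, 0.0, 0.0], 2));
    // A non-normalized row: rejected, so `try_new` would return an error.
    assert!(!rows_unit_norm_or_zero(&[3.0, 4.0], 2));
}
```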
…boquant
- L2Denorm::try_new_array now validates + promotes a plain Vector child to
NormalizedVector so downstream operators can rely on the type-level invariant.
- Replace DenormOrientation with NormalForm { Plain, Normalized, Denormalized },
and route CosineSimilarity, InnerProduct, and L2Norm through it. Cosine of
two normalized inputs collapses to a plain dot product; L2Norm of a
NormalizedVector short-circuits to a constant 1.0.
- SorfTransform accepts both Vector and NormalizedVector children (via the
widened AnyVector matcher) and always produces a plain Vector output, since
the inverse transform does not preserve unit norm.
- Split turboquant_encode_unchecked into turboquant_encode_normalized, which
takes an AnyNormalizedVector extension view and drops the unsafe contract in
favour of a type-level precondition. turboquant_encode normalizes up front
and forwards. Update the sole out-of-crate caller in
vortex/benches/single_encoding_throughput.rs.
- Add the normalized_vector_array test helper in utils.rs for the new
scalar-fn tests.
- Regenerate vortex-tensor/public-api.lock.
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
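The NormalForm routing described in this commit can be illustrated with a small self-contained sketch. The enum variants mirror the commit message; the cosine and dot function bodies are assumptions for illustration, not the vortex-tensor scalar-fn code (which operates on arrays, not slices).

```rust
#[derive(Clone, Copy, PartialEq)]
enum NormalForm {
    Plain,
    Normalized,
    Denormalized,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity routed through `NormalForm`: when both inputs carry the
/// type-level unit-norm invariant, the norms are known to be 1 and the
/// computation collapses to a plain dot product.
fn cosine(a: &[f32], fa: NormalForm, b: &[f32], fb: NormalForm) -> f32 {
    if fa == NormalForm::Normalized && fb == NormalForm::Normalized {
        dot(a, b)
    } else {
        let (na, nb) = (dot(a, a).sqrt(), dot(b, b).sqrt());
        dot(a, b) / (na * nb)
    }
}

fn main() {
    // Orthogonal unit vectors: the fast path returns the raw dot product.
    let sim = cosine(&[1.0, 0.0], NormalForm::Normalized, &[0.0, 1.0], NormalForm::Normalized);
    assert_eq!(sim, 0.0);
    // Plain inputs take the full normalizing path.
    assert_eq!(cosine(&[2.0, 0.0], NormalForm::Plain, &[2.0, 0.0], NormalForm::Plain), 1.0);
}
```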
…r Vector
NormalizedVector previously sat alongside Vector as a sibling ExtVTable with the
same FSL storage. Promote it to a true
RefinementVTable<Source = ExtRefinedSource<Vector>> so its storage dtype is
Extension(Vector(FSL)) and the type system records it as a refinement of Vector
rather than a lookalike.

- AnyVector goes back to strictly matching plain Vector; AnyNormalizedVector is
  the matcher for the refinement. TensorMatch gains a NormalizedVector variant
  and AnyTensor considers all three families.
- NormalizedVector constructors build an inner Vector extension first, then the
  outer refinement. wrap_vector_unchecked is the new entry point for callers
  that already have a validated Vector in hand.
- Add inner_vector_array / vector_fsl_storage_dtype helpers so scalar fns and
  TurboQuant can drill past the extra extension layer when they need FSL.
- L2Denorm::try_new_array promotes plain Vector children via
  wrap_vector_unchecked after validation; turboquant_encode_normalized drills
  to the FSL before reading elements.
- try_build_constant_l2_denorm is gated on AnyVector (not AnyTensor) so
  FixedShapeTensor constants stay in the generic cosine path rather than
  getting wrapped as a NormalizedVector of the wrong family.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
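The Extension(Vector(FSL)) nesting this commit introduces can be modeled with a toy dtype enum. This is not the vortex-array DType; the enum shape and the plain Vector's extension id are assumptions, with only "vortex.tensor.normalized_vector" taken from the PR description.

```rust
/// Toy model of an extension dtype wrapping a storage dtype.
#[derive(Debug, PartialEq)]
enum DType {
    FixedSizeList { elem: &'static str, len: usize },
    Extension { id: &'static str, storage: Box<DType> },
}

/// Plain Vector: extension over FSL<f32, len>. The id here is a placeholder.
fn vector(len: usize) -> DType {
    DType::Extension {
        id: "vortex.tensor.vector",
        storage: Box::new(DType::FixedSizeList { elem: "f32", len }),
    }
}

/// NormalizedVector as a refinement: its storage is the inner Vector
/// extension itself, not the raw FSL — Extension(Vector(FSL)).
fn normalized_vector(len: usize) -> DType {
    DType::Extension {
        id: "vortex.tensor.normalized_vector",
        storage: Box::new(vector(len)),
    }
}

fn main() {
    // Drilling one level past the refinement recovers the plain Vector dtype,
    // which is what helpers like inner_vector_array rely on.
    match normalized_vector(4) {
        DType::Extension { storage, .. } => assert_eq!(*storage, vector(4)),
        other => panic!("expected extension, got {:?}", other),
    }
}
```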
…nement
Replaces the separate RefinementVTable trait hierarchy (with a typed Source
associated type, two source markers, and a blanket ExtVTable impl) with a single
defaulted method on ExtVTable:
fn is_refinement(&self) -> bool { false }
When true, the vtable declares that its storage dtype is the type it refines, so
scalar-fn dispatch can transparently peel the refinement when a fn does not
accept it. The peel rule itself lands in a follow-up commit; this one is a pure
trait refactor.
- Delete vortex-array/src/dtype/extension/refinement.rs entirely
(RefinementVTable, RefinedSource, PrimitiveRefinedSource, ExtRefinedSource,
refine_array_scalar_default, and the blanket impl).
- Convert DivisibleInt and EvenDivisibleInt test extensions from RefinementVTable
to plain ExtVTable. Divisibility and even-ness checks now run in unpack_native;
the RefinementVTable-specific validate_array override test is dropped because
there is no analogue in the new design.
- Convert NormalizedVector from RefinementVTable to plain ExtVTable. Storage
layout is unchanged (still Extension(Vector, FSL<float, N>)); validate_dtype
confirms the inner Vector extension, unpack_native forwards the storage value
through untouched, and the bulk unit-norm check stays in try_new as before.
- Regenerate vortex-array/public-api.lock and vortex-tensor/public-api.lock.
No behavioural changes. All 308 vortex-tensor tests (307 prior + one new
is_refinement_is_true) pass.
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
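The defaulted-method design above can be sketched as a minimal trait. The trait shape imitates the commit's description; the id method and both impls are simplified stand-ins, not the actual vortex-array ExtVTable.

```rust
/// Minimal stand-in for an extension vtable with the defaulted refinement hook.
trait ExtVTable {
    fn id(&self) -> &'static str;

    /// When true, the vtable declares that its storage dtype is the type it
    /// refines. Defaults to false, so existing extensions need no changes.
    fn is_refinement(&self) -> bool {
        false
    }
}

struct Uuid;
impl ExtVTable for Uuid {
    fn id(&self) -> &'static str { "vortex.uuid" }
    // Uses the default: Uuid is not a refinement of its storage type.
}

struct NormalizedVector;
impl ExtVTable for NormalizedVector {
    fn id(&self) -> &'static str { "vortex.tensor.normalized_vector" }
    fn is_refinement(&self) -> bool { true }
}

fn main() {
    assert!(!Uuid.is_refinement());
    assert!(NormalizedVector.is_refinement());
}
```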
…truction
When a scalar fn is given a refinement-typed input that its `return_dtype`
rejects, the framework now transparently peels the refinement one level at a
time until the fn accepts the shape. This lands in `ScalarFnArray::try_new`
rather than as a reduce rule because `try_new` already calls `return_dtype`
and aborts on error — no ScalarFnArray tree is ever constructed to reduce when
a refinement input is rejected.
Algorithm (see `peel_refinements_and_resolve_dtype` for details):
1. Compute `scalar_fn.return_dtype(arg_dtypes)` on the current children.
2. If it succeeds, done. This covers both non-refinement inputs and fns
that explicitly accept the refinement (category B / C / D from the plan
— specialization path).
3. If it errors, peel one level from every child whose dtype is an
extension dtype with `is_refinement() == true`. Replace each with its
storage array.
4. If no children were peeled, return the original error.
5. Otherwise, loop back to step 1 with the peeled children. Multi-level
refinement chains (e.g. EvenDivisibleInt → DivisibleInt → U64) unwind one
level per iteration.
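The five steps above can be sketched over a toy dtype model. This is not the real `peel_refinements_and_resolve_dtype`: here `return_dtype` only accepts U64, and the loop peels dtypes directly, whereas the actual code replaces child arrays with their storage arrays.

```rust
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum DType {
    U64,
    Ext { id: &'static str, refinement: bool, storage: Box<DType> },
}

/// Stand-in for a strict scalar fn's `return_dtype`: only plain U64 inputs.
fn return_dtype(args: &[DType]) -> Result<DType, String> {
    if args.iter().all(|d| *d == DType::U64) {
        Ok(DType::U64)
    } else {
        Err("unsupported dtype".to_string())
    }
}

fn peel_refinements_and_resolve(mut args: Vec<DType>) -> Result<DType, String> {
    loop {
        // Steps 1–2: try the fn on the current children; success means done.
        let err = match return_dtype(&args) {
            Ok(d) => return Ok(d),
            Err(e) => e,
        };
        // Step 3: peel one level from every refinement-typed child.
        let mut peeled = false;
        for a in args.iter_mut() {
            let inner = match a {
                DType::Ext { refinement: true, storage, .. } => Some((**storage).clone()),
                _ => None,
            };
            if let Some(inner) = inner {
                *a = inner;
                peeled = true;
            }
        }
        // Step 4: nothing was peeled — surface the original error.
        if !peeled {
            return Err(err);
        }
        // Step 5: loop back with the peeled children.
    }
}

fn main() {
    // EvenDivisibleInt(DivisibleInt(U64)) unwinds one level per iteration.
    let div = DType::Ext { id: "divisible", refinement: true, storage: Box::new(DType::U64) };
    let even = DType::Ext { id: "even_divisible", refinement: true, storage: Box::new(div) };
    assert_eq!(peel_refinements_and_resolve(vec![even]), Ok(DType::U64));
    // A non-refinement extension is never peeled: the original error surfaces.
    let uuid = DType::Ext { id: "uuid", refinement: false, storage: Box::new(DType::U64) };
    assert!(peel_refinements_and_resolve(vec![uuid]).is_err());
}
```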
Implements category A from the plan (refinement-transparent scalar fns): when
a generic fn doesn't know about a refinement, the refinement is lost and the
fn operates on the source storage. Refinement-preserving semantics
(categories C and D) are deferred; the TODO(connor) in
`peel_refinements_and_resolve_dtype` documents the intended direction — an
inverted-control hook on the refinement vtable (rather than per-fn
specialization, which is blocked by the vortex-array → downstream crate
dependency direction).
Exposes `ExtDTypeRef::is_refinement()` as a type-erased forwarder on
`DynExtDType`. No other public API changes.
Four new unit tests in `arrays::scalar_fn::array::tests`:
- peels_single_level_refinement_through_strict_add: `Binary(Add)` over
`DivisibleInt(U64)` succeeds and returns `U64`.
- peels_two_level_refinement_chain_through_strict_add:
`EvenDivisibleInt(DivisibleInt(U64))` unwinds both layers.
- does_not_peel_non_refinement_extension: `Uuid` (is_refinement == false)
is not peeled; the fn's original error surfaces.
- does_not_peel_when_scalar_fn_accepts_refinement: `Binary(Eq)` accepts
extension inputs directly, so children retain their refinement dtypes.
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Force-pushed from cbc8dfb to 3913b1e
I might separate this into 2 PRs
Summary
Closes: #000
Testing