fix(sandbox): canonicalize HTTP request-targets before L7 policy evaluation#878
fix(sandbox): canonicalize HTTP request-targets before L7 policy evaluation#878johntmyers merged 2 commits intomainfrom
Conversation
…uation The L7 REST proxy evaluated OPA path rules against the raw request-target and forwarded the raw header block byte-for-byte to the upstream. The upstream normalized `..`, `%2e%2e`, and `//` before dispatching, so a compromised agent inside the sandbox could bypass any path-based allow rule by prefixing the blocked path with an allowed one (e.g. `GET /public/../secret` matches `/public/**` at the proxy but the upstream serves `/secret`). The inference-routing detection had the same normalization gap via `normalize_inference_path`, which only stripped scheme+authority and preserved dot-segments verbatim. - Add `crates/openshell-sandbox/src/l7/path.rs` with `canonicalize_request_target`. Percent-decodes (with uppercase hex re-emission), resolves `.` / `..` segments per RFC 3986 5.2.4, collapses doubled slashes, strips trailing `;params`, rejects fragments / raw non-ASCII / control bytes / encoded slashes (unless explicitly enabled per-endpoint), and returns a `rewritten` flag so callers know when to rewrite the outbound request line. - Wire the canonicalizer into `rest.rs::parse_http_request`. The canonical path is the `target` on `L7Request` that OPA sees, and when the canonical form differs from the raw input the request line is rebuilt in `raw_header` so the upstream dispatches on the exact bytes the policy evaluated. - Canonicalize the forward (plain HTTP proxy) path at `proxy.rs`'s second L7 evaluation site. A non-canonical request-target is rejected with 400 Bad Request and an OCSF `NetworkActivity` Fail event rather than silently passed through. - Replace the old `normalize_inference_path` body with a call to the canonicalizer. On canonicalization error the raw path is returned (so inference detection simply misses and the normal forward path handles and rejects). - Document the invariant in `sandbox-policy.rego`: `input.request.path` is pre-canonicalized so rules must not attempt defensive matching against `..` or `%2e%2e`. - Architecture doc (`architecture/sandbox.md`) lists the new module. - 24 new unit tests cover dot segments, percent-encoded dot segments, double-slash collapse, encoded-slash reject/opt-in, null/control byte rejection, legitimate percent-encoded bytes round-tripping, mixed-case percent normalization, fragment rejection, absolute-form stripping, empty / length-bounded / non-ASCII handling, and regression payloads for the specific bypasses above. Tracks OS-99.
|
+1 on security merit. Path-based OPA rules are load-bearing for binary-scoped network policies in real deployments, and |
|
From description:
From what I can tell, only requests that the canonicalization fails for are rejected, but non-canonical requests that pass canonicalization are accepted. This would be more accurate: "Requests that fail canonizalication are hard-rejected regardless of EnforcementMode::Audit..." |
I was looking for things this could potentially break. One example is GitLab that uses That would mean mean that:
|
|
Security review — one finding I'd like to raise before merge, plus a note on tests. Finding:
|
…encoded_slash
Addresses two review findings on the L7 path canonicalization change.
1. `handle_forward_proxy` previously evaluated OPA on the canonical path
but wrote the raw path to the upstream via `rewrite_forward_request`,
leaving the parser-differential open for plain-HTTP forward-proxy
traffic even though it was closed for the L7 tunnel path. `path` is
now reassigned to the canonical form inside the L7 block before the
outbound write, so the bytes dispatched to the upstream match what
OPA evaluated. Covered by a new regression test against
`rewrite_forward_request`.
2. `allow_encoded_slash` is now wired through the YAML policy schema,
the proto `NetworkEndpoint` message, the Rego endpoint config, and
the `L7EndpointConfig` → `RestProvider` canonicalization options.
Operators can opt in per-endpoint for upstreams like GitLab that
embed `%2F` in namespaced resource paths; the default remains
strict. The `RestProvider` gains a constructor
`RestProvider::with_options` so `relay_rest` constructs a provider
with per-endpoint options while `relay_passthrough_with_credentials`
retains strict defaults.
Integration tests:
- `parse_http_request_canonicalizes_target_and_rewrites_raw_header`
drives a non-canonical request through the parser and asserts both
`L7Request.target` (what OPA evaluates) and `raw_header` (what the
upstream receives) are canonical.
- `parse_http_request_canonicalization_preserves_query_string`,
`parse_http_request_leaves_canonical_input_byte_for_byte`,
`parse_http_request_rejects_traversal_above_root`,
`parse_http_request_preserves_http_10_version_on_rewrite`.
- `parse_http_request_accepts_encoded_slash_when_endpoint_opts_in` /
`_rejects_encoded_slash_by_default` verify the
`allow_encoded_slash` gate.
- `test_rewrite_forward_request_uses_canonical_path_on_the_wire`
asserts the fix to the forward-proxy flow.
- `parse_l7_config_allow_encoded_slash_{defaults_false,opt_in}` verify
the YAML/Rego wiring.
|
Thanks for the careful review — all three points addressed in cb87e28. Replying inline so each fix maps back to what it's responding to.
Good catch — my wording conflated the two cases. Updated the PR description to
Fair — "follow-up" was the wrong answer given you can hit this today. Wired through end to end:
GitLab usage: set
Confirmed the gap, thanks — this was real. Went with Option A:
Also fair. Added end-to-end assertions:
524 sandbox + 47 policy tests pass. |
Summary
..,%2e%2e, and//before dispatching, so a compromised agent inside the sandbox can bypass any path-based allow rule — e.g.GET /public/../secretmatches/public/**at the proxy but the upstream serves/secret.normalize_inference_path) had the same normalization gap: it only stripped scheme+authority and preserved dot-segments verbatim, so the same class of payload could also dodge inference routing.closes OS-99
Changes
crates/openshell-sandbox/src/l7/path.rs(new)canonicalize_request_target(target, opts) -> Result<(CanonicalPath, Option<String> /* raw query */), CanonicalizeError>../..per RFC 3986 5.2.4 (rejects underflow rather than silently clamping to/), collapses doubled slashes, strips trailing;params, handles origin-form and absolute-form targets, enforces a 4 KiB length bound, rejects fragments / raw non-ASCII / control bytes / encoded slashes (unless explicitly opted in per-endpoint viaallow_encoded_slash).rewrittenflag so callers know whether the outbound request line needs to be rebuilt.crates/openshell-sandbox/src/l7/rest.rsparse_http_requestcanonicalizes the request-target before returning theL7Request. If the canonical form differs from the raw input, the request line inraw_headeris rebuilt viarewrite_request_line_targetso the upstream dispatches on the same bytes the policy evaluated.RestProvidernow carries aCanonicalizeOptionsfield. Callers that need per-endpoint options construct viaRestProvider::with_options(opts); default strict behavior viaRestProvider::default().parse_query_paramsis nowpub(crate)soproxy.rscan reuse it.crates/openshell-sandbox/src/l7/relay.rsrelay_restbuilds itsRestProviderfrom the endpoint'sallow_encoded_slash, so L7 endpoints backed by upstreams that legitimately embed%2F(e.g. GitLab namespaced paths) can opt in per endpoint.relay_passthrough_with_credentialsuses the strict default.crates/openshell-sandbox/src/l7/mod.rsL7EndpointConfiggainsallow_encoded_slash: bool(default false).parse_l7_configreads it from the Rego endpoint object.crates/openshell-policy/src/lib.rs+proto/sandbox.protoNetworkEndpointDef(YAML),NetworkEndpoint(proto) gain theallow_encoded_slashfield;to_protocopies it. Operators setallow_encoded_slash: trueon an endpoint in YAML to opt in.crates/openshell-sandbox/src/proxy.rspathso the laterrewrite_forward_requestcall writes canonical bytes to the upstream (closes the parser-differential for this site — previously only the L7 tunnel was closed). Non-canonical input is denied with 400 Bad Request and an OCSFNetworkActivityFail event.normalize_inference_pathis now a thin wrapper over the canonicalizer (falls back to the raw path on canonicalization error so the normal forward path can reject properly).crates/openshell-sandbox/data/sandbox-policy.regoinput.request.pathis pre-canonicalized, so rules must not attempt defensive matching against..or%2e%2e.architecture/sandbox.mdl7/path.rsmodule and notes theallow_encoded_slashper-endpoint opt-in.Testing
cargo test -p openshell-sandbox— 524 lib + integration tests pass.cargo test -p openshell-policy— 47 tests pass.mise run pre-commit— passes (lint, format, license headers, rust tests).l7::path::tests(24) cover dot segments, percent-encoded dot segments, double-slash collapse, encoded-slash reject/opt-in, null/control byte rejection, fragment rejection, absolute-form stripping, legitimate percent-encoded bytes round-tripping, mixed-case percent normalization, length bound, non-ASCII rejection, query-suffix separation, rewritten-flag semantics.l7::rest::tests(9 new):parse_http_request_canonicalizes_target_and_rewrites_raw_headerasserts bothL7Request.target(what OPA sees) andraw_header(what the upstream receives) are canonical.parse_http_request_canonicalization_preserves_query_string,parse_http_request_leaves_canonical_input_byte_for_byte,parse_http_request_preserves_http_10_version_on_rewrite.parse_http_request_accepts_encoded_slash_when_endpoint_opts_in/_rejects_encoded_slash_by_defaultverify theallow_encoded_slashgate end-to-end.parse_http_request_rejects_traversal_above_root.proxy::tests::test_rewrite_forward_request_uses_canonical_path_on_the_wire— regression test asserting that once the caller canonicalizes,rewrite_forward_requestproduces canonical bytes on the wire.l7::tests::parse_l7_config_allow_encoded_slash_{defaults_false,opt_in}verify the YAML/proto/Rego wiring of the opt-in./public/../secret→/secret/public/%2e%2e/secret→/secret/public/%2E%2E%2Fsecret→EncodedSlashNotAllowed//public/../secret→/secret/public/./../secret→/secret/public/%00/secret→NullOrControlByte/..→TraversalAboveRoot/files/hello%20world.txt,/users/caf%C3%A9,/v1/chat/completions?stream=true, paths with:and@in segments.Trade-offs
%2Finside a segment is rejected by default. Operators who need it for artifact-registry-style APIs (e.g. GitLab namespaced project paths) setallow_encoded_slash: trueon the endpoint in their YAML policy.EnforcementMode::Audit— audit mode is for policy-tuning, not for tolerating protocol violations. Requests that succeed canonicalization are rewritten (if non-canonical) and then handled per the endpoint's enforcement mode as usual.%20and similar legitimate percent-encoded bytes are preserved in the canonical form (only unreserved pchars get decoded), so policy patterns that currently match/files/hello%20world.txtcontinue to work.Checklist