Remove sqllogictest fork swap from regenerate_sqlite_files.sh#21578
Remove sqllogictest fork swap from regenerate_sqlite_files.sh#21578RafaelHerrero wants to merge 1 commit intoapache:mainfrom
Conversation
The Omega359 fork features (valuewise result mode, valuesort, custom normalizer) were upstreamed to risinglightdb/sqllogictest-rs in v0.24.0 (PR risinglightdb/sqllogictest-rs#237). DataFusion already uses upstream v0.29.1 which includes everything, so the fork swap is no longer needed.
alamb
left a comment
There was a problem hiding this comment.
Thank you for this @RafaelHerrero . I tested running this script locally like
PG_URI=postgresql://postgres@localhost:5432/postgres nice ./datafusion/sqllogictest/regenerate_sqlite_files.sh
And it made a bunch of errors like this:
External error: task 948 panicked with message "called `Result::unwrap()` on an `Err` value: ParseError { kind: InvalidLine(\"onlyif mysql # use DIV operator for integer division\"), loc: Location { file: \"../../datafusion-testing/data/sqlite/random/select/slt_good_99.slt\", line: 60, upper: None } }"
Error: Execution("428 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`
Caused by:
process didn't exit successfully: `/Users/andrewlamb/Software/datafusion2/target/release-nonlto/deps/sqllogictests-a999e2d90c5771c0 --include-sqlite --postgres-runner --complete` (exit status: 1)
Completion of sqlite test files failed!
Cleaning up source code changes and temporary files and directories
Do you have any idea what is going on?
maybe @Omega359 has some ideas as he was the original author of this script
|
I thought I replied to this already but apparently hitting the 'comment' button was too much for me the other day :/ I think the fork does more than what was upstreamed though I'd have to diff the code at this point to know what exactly I changed. Ideally, yes I'd like to not have a fork for this but I haven't had the time nor desire to see what that would involve. |
|
Thanks for the feedback @alamb @Omega359! I diffed the fork (73c47cf7) against upstream v0.29.1 to understand the full delta. Here's what I found: Parsing is identical. The onlyif/skipif condition parsing uses the same 2-token pattern (["onlyif", label]) in both versions, and lines are tokenized the same way (split_whitespace). A line like onlyif mysql # use DIV operator for integer division produces 9 tokens, doesn't match the 2-token pattern, and hits InvalidLine in both versions. The 428 parse errors come from the sd regex cleanup in the script not catching all onlyif mysql blocks — the fixed-line-count patterns (3-12 lines + blank line) miss blocks with different sizes or edge cases. This is a script bug, not a parser compatibility issue. The fork is actually behind upstream — it's missing Record::Let, ExpectedError::SqlState, slt:ignore marker, and Partitioner that were added to upstream after the fork was created. The fork's real addition is in runner.rs: a dual-runner update_test_file that takes an optional second runner (Postgres). During completion, it runs each query on both DataFusion and Postgres, auto-updates expected results when they match, and adds diagnostic comments about mismatches (type differences, result differences, errors). It also creates .bak backup files. The upstream update_test_file only supports a single runner. I plan to fix the sd cleanup by replacing the fragile regex patterns with a proper block-removal approach. But before removing the fork entirely: @Omega359 — is the dual-runner completion (DataFusion + Postgres comparison) essential to the regeneration workflow, or would single-runner Postgres completion produce equivalent results? I want to make sure we're not losing something important. |
|
The dual runner was to validate results against a 'known good' database when generating the slt files. The results that came out of sqlite in a few cases didn't match up with what df nor pg resulted in thus I wanted the verification step. There is logic in there for handling double/float similarity iirc too since that often is different after 3 or 4 decimal places. Removing that functionality I don't think would be a great idea overall as having the validation of the results is a pretty useful feature. |
Which issue does this PR close?
Related to #6543 and #21260
Rationale for this change
The
regenerate_sqlite_files.shscript swaps in a forked version ofsqllogictest-rs(Omega359/sqllogictest-rs v0.27.2) before running test completion. This fork added valuewise result mode, valuesort, and custom normalizer support.These features were upstreamed to
risinglightdb/sqllogictest-rsin v0.24.0 (PR #237). DataFusion already uses upstream v0.29.1 which includes everything, so the fork swap is no longer needed.What changes are included in this PR?
Cargo.tomldependency swap to the Omega359 fork (line 167)bin/sqllogictests.rsfile swap with the custom regenerate version (line 172)Are these changes tested?
The script has valid bash syntax. The change is purely removing dead code — the fork features are already available in the upstream version DataFusion uses.
Are there any user-facing changes?
No.