[WIP] Re-implementing Restore to use Pydantic #554
ormsbee wants to merge 3 commits into openedx:main
Conversation
This is a re-implementation of the restore part of backup_restore, with the goal of making it more robust and maintainable in the long term.
@bradenmacdonald, @kdmccormick: The WIP (but functioning) new restore code. There are a lot of missing low-level things and tests, so it's not really ready for review. But please let me know if you have any high level concerns with the overall approach. Thank you.
```python
for static_file_path in fs.glob(f"{comp_ver_dir}/static/**"):
    if fs.isfile(static_file_path):
        rel_path = os.path.relpath(static_file_path, comp_ver_dir)
        media[rel_path] = f"fs:{static_file_path}"
```
@kdmccormick, @bradenmacdonald: I was worried about storing potentially enormous inline assets in a giant JSON doc, so I'm thinking of building the component version media mapping with this convention:
- `text:` will be followed by simple text
- `base64:` indicates base64-encoded data is to follow
- `fs:` indicates that what follows is a path within the filesystem object we're passing around from the input.
So an example:

```json
{
  "block.xml": "text:<problem>(bunch of OLX here)</problem>",
  "static/figure1.webp": "fs:entities/xblock.v1/problem/intro-to-everything/component_versions/v2/static/figure1.webp",
  "static/logo.svg": "base64:SXMgc29tZW9uZSBjdXJpb3VzIGVub3VnaCB0byBhY3R1YWxseSBkZWNvZGUgdGhpcz8="
}
```
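To make the convention concrete, a hypothetical resolver for these prefixes might look something like this sketch. The function name and the `fs.readbytes()` call are assumptions for illustration, not code from this PR:

```python
import base64


def resolve_media_value(value: str, fs) -> bytes:
    """Resolve a media-map value using the proposed prefix convention.

    Hypothetical helper: `fs` stands in for the filesystem object
    passed around from the input, assumed to expose a PyFilesystem2-style
    readbytes() method.
    """
    if value.startswith("text:"):
        # Simple text, stored inline.
        return value[len("text:"):].encode("utf-8")
    if value.startswith("base64:"):
        # Base64-encoded binary data, stored inline.
        return base64.b64decode(value[len("base64:"):])
    if value.startswith("fs:"):
        # A path into the filesystem object from the input archive.
        return fs.readbytes(value[len("fs:"):])
    raise ValueError(f"Unknown media value prefix: {value!r}")
```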
Sounds nice, but that won't be backwards compatible right?
It's also very similar to data URLs; I like the simplicity here, but it could be worth considering standard data URLs, since there are already lots of libraries for parsing and generating them. (Although the syntax is not as nice; you have these mandatory `,` and `;` characters, which look weird when MIME types are omitted.)
```json
{
  "block.xml": "data:,<problem>(bunch of OLX here)</problem>",
  "static/logo.svg": "data:;base64,SXMgc29tZW9uZSBjdXJpb3VzIGVub3VnaCB0byBhY3R1YWxseSBkZWNvZGUgdGhpcz8="
}
```
> Sounds nice, but that won't be backwards compatible right?
This would be for the currently internal format after things get extracted and pieced together, but before Pydantic model validation. So it wouldn't break backwards compatibility with anything.
> It's also very similar to data URLs;
That occurred to me, but I had only ever used those for base64 encoding binary data. I hadn't realized you could use them for plain text in a clear, non-escaped way. Thank you! I'll do that I think. So if it starts with "data:", I'll parse it like a data URL, and if it's just a path, I'll assume it comes from the archive.
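A rough sketch of that dispatch could look like the following. All names here are hypothetical, and real data URLs (RFC 2397) also allow percent-encoded payloads, which this simplified version ignores:

```python
import base64


def decode_entry(value: str, read_from_archive) -> bytes:
    """Decode one media-map value: a data URL, or else an archive path.

    Hypothetical sketch of the approach described above.
    `read_from_archive` stands in for reading a path out of the
    backup archive's filesystem object.
    """
    if value.startswith("data:"):
        # Data URL: everything before the first comma is the header
        # (MIME type and optional ";base64"), everything after is payload.
        header, _, payload = value[len("data:"):].partition(",")
        if header.endswith(";base64"):
            return base64.b64decode(payload)
        # Note: percent-decoding of non-base64 payloads is omitted here.
        return payload.encode("utf-8")
    # Anything else is assumed to be a path inside the archive.
    return read_from_archive(value)
```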
This code will import things through the existing Studio interface, but it doesn't report errors properly. It's a parallel implementation (most of the original restore code still exists so that I can copy tests and compare implementation logic).
If you want to test on the command line, the new (temporary) command `lp_load2` will work. The starting point for all the logic is in `backup_restore.api.load_learning_package`.

New modules:

- `payload.py` is where the extraction of TOML files happens, and things get assembled into a big dict for validation. It would also eventually be used to map the other way. It understands the backup archive's internal format.
- `validation.py` is where that gets turned into a validated Pydantic model and errors will be accumulated (might just fold this into `api.py`).
- `schema.py` defines the actual Pydantic models. It needs a lot of validation rules (just the bare field stuff exists now).
- `loading.py` is the logic for actually loading validated input into the database. It initializes a `Loader` object with input data and then calls `loader.load_into()` to push that data into a target `LearningPackage`.

I have a lot of tests to write and some details to work out regarding error reporting, but I think I'm comfortable with this as the basic structure. I intend to keep full compatibility with the existing REST endpoint and just do conversions as needed.
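Read together, the extract → validate → load flow those modules describe could be sketched very loosely as follows. Everything here is an invented stand-in (including a plain dataclass in place of the real Pydantic model); only the names mirror the description:

```python
from dataclasses import dataclass


def extract_payload(archive: dict) -> dict:
    """payload.py's role: extract TOML data from the archive and
    assemble it into one big dict for validation. The archive is
    faked as a plain dict here."""
    return {"title": archive.get("package.toml", {}).get("title", "")}


@dataclass
class LearningPackageModel:
    """schema.py's role: the Pydantic model; a dataclass stands in."""
    title: str


def validate(raw: dict) -> LearningPackageModel:
    """validation.py's role: turn the assembled dict into a validated
    model. (Real code would accumulate errors, not raise on the first.)"""
    if not raw.get("title"):
        raise ValueError("title is required")
    return LearningPackageModel(title=raw["title"])


@dataclass
class Loader:
    """loading.py's role: push validated input into a target
    LearningPackage (a dict stands in for the database here)."""
    data: LearningPackageModel

    def load_into(self, target: dict) -> None:
        target["title"] = self.data.title


target_package: dict = {}
raw = extract_payload({"package.toml": {"title": "Intro to Everything"}})
Loader(validate(raw)).load_into(target_package)
```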