Skip to content

[WIP] Re-implementing Restore to use Pydantic#554

Draft
ormsbee wants to merge 3 commits intoopenedx:mainfrom
ormsbee:backup-restore-pydantic
Draft

[WIP] Re-implementing Restore to use Pydantic#554
ormsbee wants to merge 3 commits intoopenedx:mainfrom
ormsbee:backup-restore-pydantic

Conversation

@ormsbee
Copy link
Copy Markdown
Contributor

@ormsbee ormsbee commented Apr 20, 2026

This code will import things through the existing Studio interface, but it doesn't report errors properly. It's a parallel implementation (most of the original restore code still exists so that I can copy tests and compare implementation logic).

If you want to test on the command line, the new (temporary) command lp_load2 will work. You'd invoke it like this:

python manage.py cms lp_load2 [path] [package_ref] [username]

Starting point for all the logic is in backup_restore.api.load_learning_package.

New modules:

  • payload.py is where the extraction of TOML files happens, and things get assembled into a big dict for validation. It would also eventually be used to map the other way. It understands the backup archive's internal format.
  • validation.py is where that's gets turned into a validated Pydantic model and errors will be accumulated (might just fold this into api.py).
  • schema.py defines the actual Pydantic models. It needs a lot of validation rules (just the bare field stuff exists now).
  • loading.py is the logic for actually loading validated input into the database. It initializes a Loader object with input data and then calls loader.load_into() to push that data into a target LearningPackage.

I have a lot of tests to write and some details to work out regarding error reporting, but I think I'm comfortable with this as the basic structure. I intend to keep full compatibility with the existing REST endpoint and just do conversions as needed.

@ormsbee ormsbee force-pushed the backup-restore-pydantic branch from 2518655 to cdfb9db Compare April 20, 2026 17:49
This is a re-implementation of the restore part of backup_restore, with
the goal of making it more robust and maintainable in the long term.
@ormsbee ormsbee force-pushed the backup-restore-pydantic branch from cdfb9db to 22d7143 Compare April 20, 2026 17:50
@ormsbee
Copy link
Copy Markdown
Contributor Author

ormsbee commented Apr 20, 2026

@bradenmacdonald, @kdmccormick: The WIP (but functioning) new restore code. There are a lot of missing low-level things and tests, so it's not really ready for review. But please let me know if you have any high level concerns with the overall approach. Thank you.

for static_file_path in fs.glob(f"{comp_ver_dir}/static/**"):
if fs.isfile(static_file_path):
rel_path = os.path.relpath(static_file_path, comp_ver_dir)
media[rel_path] = f"fs:{static_file_path}"
Copy link
Copy Markdown
Contributor Author

@ormsbee ormsbee Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kdmccormick, @bradenmacdonald: I was worried about storing potentially enormous inline assets in a giant JSON doc, so I'm thinking of building the component version media mapping with this convention:

  • "text:" will be simple text
  • "base64:" indicates base64 encoded data is to follow
  • "fs:" indicates that what follows is a path from the filesystem object we're passing around from the input.

So an example:

{
  "block.xml": "text:<problem>(bunch of OLX here)</problem>",
  "static/figure1.webp": "fs:entities/xblock.v1/problem/intro-to-everything/component_versions/v2/static/figure1.webp",
  "static/logo.svg": "base64:SXMgc29tZW9uZSBjdXJpb3VzIGVub3VnaCB0byBhY3R1YWxseSBkZWNvZGUgdGhpcz8="
}

Copy link
Copy Markdown
Contributor

@bradenmacdonald bradenmacdonald Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds nice, but that won't be backwards compatible right?

It's also very similar to data URLs; I like the simplicity here but could be worth considering using standard data URLs since there are already lots of libraries for parsing and generating them. (Although the syntax is not as nice; you have these mandatory , and ; which look weird when omitting MIME types.)

{
  "block.xml": "data:,<problem>(bunch of OLX here)</problem>",
  "static/logo.svg": "data:;base64,SXMgc29tZW9uZSBjdXJpb3VzIGVub3VnaCB0byBhY3R1YWxseSBkZWNvZGUgdGhpcz8="
}

Copy link
Copy Markdown
Contributor Author

@ormsbee ormsbee Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds nice, but that won't be backwards compatible right?

This would be for the currently internal format after things get extracted and pieced together, but before Pydantic model validation. So it wouldn't break backwards compatibility with anything.

It's also very similar to data URLs;

That occurred to me, but I had only ever used those for base64 encoding binary data. I hadn't realized you could use them for plain text in a clear, non-escaped way. Thank you! I'll do that I think. So if it starts with "data:", I'll parse it like a data URL, and if it's just a path, I'll assume it comes from the archive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants