esoteric-ephemera commented Apr 21, 2026
- Delta table querying of phase diagrams
- Tooling to establish persistent query builder in web server
pd_tbl = DeltaTable(
    "s3://materialsproject-build/objects/phase-diagrams/",
    storage_options={"AWS_SKIP_SIGNATURE": "true", "AWS_REGION": "us-east-1"},
)
qb = self._query_builder.register("phase_diagrams", pd_tbl)
table = pa.table(
    qb.execute(
        f"""SELECT phase_diagram
        FROM phase_diagrams
        WHERE chemsys='{sorted_chemsys}'
        AND version='{version}'
        AND thermo_type='{thermo_type}'
        """
    )
)
as_py = table["phase_diagram"].to_pylist(maps_as_pydicts="strict")
@tsmathis: I followed your examples here for getting the phase diagram from a Delta table. It works if you set version = "2026-04-13".
Were you envisioning that the user downloads the full set of phase diagrams first? Otherwise there's a good amount of latency for me (roughly 15 seconds) between executing the query and getting a pymatgen object back.
You have it right, but it will unfortunately be slower for single phase diagram retrievals. That's the tradeoff of moving away from individual JSON files per run type + chemsys.
You are paying a few seconds here, though:
pd_tbl = DeltaTable(
    "s3://materialsproject-build/objects/phase-diagrams/",
    storage_options={"AWS_SKIP_SIGNATURE": "true", "AWS_REGION": "us-east-1"},
)
Every DeltaTable(...) call checks that there is a valid table at the location passed to the constructor.
It would also be good to see where that time is going.
Here:
api/mp_api/client/routes/materials/thermo.py
Lines 177 to 186 in 26a4a5b
or here:
api/mp_api/client/routes/materials/thermo.py
Lines 187 to 191 in 26a4a5b
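One way to see where the time goes is to wrap each of the two spans in a small timer. This is a minimal sketch, not code from the PR: `timed` and `timings` are hypothetical names, and the `time.sleep` calls are stand-ins for the query step and the conversion step.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the wrapped block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}

# Stand-in for opening the table and executing the query.
with timed("query", timings):
    time.sleep(0.01)

# Stand-in for converting the result into Python objects.
with timed("convert", timings):
    time.sleep(0.01)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Swapping the stand-ins for the real spans would show whether the table open, the query, or the conversion dominates the latency.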
Code block 2 above (lines 187 to 191) could get pretty costly for phase diagrams with numerous entries.
OK, so it probably makes sense to cache the DeltaTable on an instance of MPRester?
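A rough sketch of what per-instance caching could look like, assuming a hypothetical `TableCache` helper held by the client. Neither `TableCache` nor `fake_opener` exists in the PR; the fake opener stands in for the `DeltaTable(...)` constructor so the example runs without network access.

```python
from typing import Any, Callable, Dict


class TableCache:
    """Hypothetical helper: open each table URI once and reuse the handle."""

    def __init__(self, opener: Callable[[str], Any]) -> None:
        self._opener = opener
        self._tables: Dict[str, Any] = {}

    def get(self, uri: str) -> Any:
        # Only pay the open/validation cost the first time a URI is requested.
        if uri not in self._tables:
            self._tables[uri] = self._opener(uri)
        return self._tables[uri]


open_calls = []


def fake_opener(uri: str) -> dict:
    """Stand-in for the real table constructor; records each call."""
    open_calls.append(uri)
    return {"uri": uri}


cache = TableCache(fake_opener)
first = cache.get("s3://materialsproject-build/objects/phase-diagrams/")
second = cache.get("s3://materialsproject-build/objects/phase-diagrams/")
print(len(open_calls))  # the opener ran once for two lookups
```

In the client, the opener would presumably wrap the `DeltaTable(...)` call with the same `storage_options` used in the diff.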
Yeah, we can even pre-register some of the tables at startup that might be used repeatedly. I guess phase-diagrams might be the only applicable table though. All the others I imagine would only be used for full downloads.
Another thing to note: once a table has been registered with the query builder, it can be referenced again anywhere using the string passed to .register(...) (for the lifetime of the query builder, anyway). It's similar-ish to a CREATE VIEW ... operation.
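To illustrate the CREATE VIEW comparison only, here is a small sketch using the standard-library `sqlite3` module rather than the query builder itself. The table name `raw_phase_diagrams`, the `lookup` helper, and the two rows are made up for the example; the point is that a name defined once can be reused by any later query on the same connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_phase_diagrams (chemsys TEXT, thermo_type TEXT)")
conn.executemany(
    "INSERT INTO raw_phase_diagrams VALUES (?, ?)",
    [("Fe-O", "GGA_GGA+U"), ("Li-O", "R2SCAN")],  # illustrative rows only
)

# Define the name once; a TEMP view lives as long as this connection does.
conn.execute("CREATE TEMP VIEW phase_diagrams AS SELECT * FROM raw_phase_diagrams")


def lookup(chemsys: str) -> list:
    """Any later query can refer to the view purely by its name."""
    return conn.execute(
        "SELECT thermo_type FROM phase_diagrams WHERE chemsys = ?",
        (chemsys,),
    ).fetchall()


print(lookup("Fe-O"))
```

The analogy is loose: the view here is scoped to one connection, much as a registered table name is scoped to one query builder.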