[WIP] Emmet v0.87.0 compatibility#1087

Open
esoteric-ephemera wants to merge 9 commits into main from emmet87

@esoteric-ephemera
Collaborator

  • Delta table querying of phase diagrams
  • Tooling to establish persistent query builder in web server

Comment on lines +172 to +187
pd_tbl = DeltaTable(
    "s3://materialsproject-build/objects/phase-diagrams/",
    storage_options={"AWS_SKIP_SIGNATURE": "true", "AWS_REGION": "us-east-1"},
)
qb = self._query_builder.register("phase_diagrams", pd_tbl)
table = pa.table(
    qb.execute(
        f"""SELECT phase_diagram
        FROM phase_diagrams
        WHERE chemsys='{sorted_chemsys}'
        AND version='{version}'
        AND thermo_type='{thermo_type}'
        """
    )
)
as_py = table["phase_diagram"].to_pylist(maps_as_pydicts="strict")
Collaborator Author

@esoteric-ephemera esoteric-ephemera Apr 21, 2026

@tsmathis: Followed your examples here for getting the phase diagram from a delta table. Works if you set version = "2026-04-13"

Were you envisioning that the user downloads the full set of phase diagrams first? Otherwise there's a good amount of latency for me (about 15 seconds) between executing the query and getting a pymatgen object back.

Collaborator

You have it right, but it will unfortunately be slower for single phase diagram retrievals. That's the tradeoff of moving away from individual JSON files per run type + chemsys.

You are paying a few seconds here, though:

        pd_tbl = DeltaTable(
            "s3://materialsproject-build/objects/phase-diagrams/",
            storage_options={"AWS_SKIP_SIGNATURE": "true", "AWS_REGION": "us-east-1"},
        )

Every DeltaTable(...) call checks that there is a valid table at the location passed to the constructor.

Collaborator

It would also be good to see where that time is going.

Here:

table = pa.table(
    qb.execute(
        f"""SELECT phase_diagram
        FROM phase_diagrams
        WHERE chemsys='{sorted_chemsys}'
        AND version='{version}'
        AND thermo_type='{thermo_type}'
        """
    )
)

or here:

as_py = table["phase_diagram"].to_pylist(maps_as_pydicts="strict")
pd: PhaseDiagram | None = None
if len(pds := TypeAdapter(list[PhaseDiagramType]).validate_python(as_py)) > 0:
    pd = pds[0]
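
One way to find out which of the two blocks dominates is to time them separately. The following is only a minimal sketch using the standard library: `timed`, `run_query`, and `convert` are hypothetical names, and the two stand-in functions take the place of the `pa.table(qb.execute(...))` call and the `to_pylist` + `TypeAdapter` validation step.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def run_query():
    # Stand-in (assumption) for: pa.table(qb.execute(...))
    return [{"phase_diagram": {"entries": []}}]


def convert(rows):
    # Stand-in (assumption) for: to_pylist(...) followed by TypeAdapter validation
    return [row["phase_diagram"] for row in rows]


timings = {}
with timed("query", timings):
    rows = run_query()
with timed("convert", timings):
    pds = convert(rows)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Swapping the stand-ins for the real calls should show whether the latency sits in the query itself or in the conversion to Python objects.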

Collaborator

Code block 2 above could get pretty costly for phase diagrams with numerous entries.

Collaborator Author

OK so it probably makes sense to cache the DeltaTable on an instance of MPRester?
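
A rough sketch of what that caching could look like, assuming the table is opened lazily and stored on the instance. `TableHolder` and `fake_open_table` are hypothetical names used only for illustration; this is not the actual MPRester code, and the factory callable stands in for the real `DeltaTable(...)` constructor call.

```python
from functools import cached_property
from typing import Any, Callable


class TableHolder:
    """Hypothetical stand-in for the part of MPRester that would own the table."""

    def __init__(self, open_table: Callable[[], Any]) -> None:
        # In real use this callable would wrap the DeltaTable(...) call (assumption).
        self._open_table = open_table

    @cached_property
    def phase_diagram_table(self) -> Any:
        # Runs once per instance; later accesses reuse the stored object.
        return self._open_table()


calls = []


def fake_open_table():
    # Counts invocations so the caching behaviour is observable.
    calls.append(1)
    return object()


holder = TableHolder(fake_open_table)
first = holder.phase_diagram_table
second = holder.phase_diagram_table
print(first is second, len(calls))
```

With this shape, the table validity check would be paid on first access only, and instances that never touch phase diagrams would not pay it at all.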

Collaborator

Yeah, we can even pre-register some of the tables at startup that might be used repeatedly. I guess phase-diagrams might be the only applicable table, though. All the others, I imagine, would only be used for full downloads.

Another thing to note: once a table has been registered with the query builder, it can be referenced again anywhere using the string passed to .register(...) (for the lifetime of the query builder, anyway). It's similar-ish to a CREATE VIEW ... operation.
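
The register-once idea could be sketched roughly as below. `FakeQueryBuilder` and `TableRegistry` are hypothetical and only mimic a `register(name, table)` method that returns a builder, as in the snippet under review; the real query builder may behave differently.

```python
from typing import Any, Callable


class FakeQueryBuilder:
    """Illustrative stand-in; it only mimics register(name, table)."""

    def __init__(self) -> None:
        self.tables: dict[str, Any] = {}

    def register(self, name: str, table: Any) -> "FakeQueryBuilder":
        self.tables[name] = table
        return self


class TableRegistry:
    """Hypothetical helper that opens and registers each named table at most once."""

    def __init__(self, query_builder: Any) -> None:
        self._qb = query_builder
        self._registered: set[str] = set()

    def ensure(self, name: str, open_table: Callable[[], Any]) -> Any:
        # Only open the table (the slow part) if this name is not yet registered.
        if name not in self._registered:
            self._qb = self._qb.register(name, open_table())
            self._registered.add(name)
        return self._qb


opened = []


def open_phase_diagrams():
    # Stand-in (assumption) for the DeltaTable(...) constructor call.
    opened.append("phase_diagrams")
    return "phase-diagram-table"


registry = TableRegistry(FakeQueryBuilder())
qb_first = registry.ensure("phase_diagrams", open_phase_diagrams)
qb_again = registry.ensure("phase_diagrams", open_phase_diagrams)
print(opened)
```

Pre-registering at startup would then amount to calling `ensure` for the chosen names up front, while other tables stay unopened until something asks for them.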
