Augment the scheduler with resources to allow more fine-grained parallelism limitation by AlexJones0 · Pull Request #184 · lowRISC/dvsim

AlexJones0 · 2026-04-16T12:23:16Z

This PR implements "resources" in the scheduler: a mechanism for finer-grained parallelism limits than the standard --max-parallel flag currently offers. Note: this PR is quite large; let me know if it needs to be split up for review.

The JobSpec model is changed so that each job can declare the resources it uses, as a mapping from resource name to a count. These resources are managed by a ResourceManager, operated by the scheduler, which strictly ensures that running jobs never hold more of a resource than is available. For now, resources are only defined for the various sim tools, but this could be extended to the other flows in the future (and will probably be easier once the flows/deploys are better refactored).
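To make the allocate/release behaviour concrete, here is a minimal sketch of the idea. It is not the code from this PR: `JobSketch`, `SketchResourceManager`, `try_allocate`, `release` and the resource name `"vcs"` are hypothetical names chosen for illustration, and I am assuming all-or-nothing allocation.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass
class JobSketch:
    """Hypothetical stand-in for a job that declares its resources."""

    name: str
    resources: dict[str, int] = field(default_factory=dict)


class SketchResourceManager:
    """Hypothetical manager: tracks usage against fixed limits."""

    def __init__(self, limits: Mapping[str, int]) -> None:
        self._limits = dict(limits)
        self._in_use: dict[str, int] = {}

    def try_allocate(self, request: Mapping[str, int]) -> bool:
        """Allocate everything in `request`, or nothing at all."""
        for name, amount in request.items():
            limit = self._limits.get(name)  # None: no limit defined, treated as unbounded
            if limit is not None and self._in_use.get(name, 0) + amount > limit:
                return False
        for name, amount in request.items():
            self._in_use[name] = self._in_use.get(name, 0) + amount
        return True

    def release(self, request: Mapping[str, int]) -> None:
        """Return previously allocated resources to the pool."""
        for name, amount in request.items():
            self._in_use[name] = self._in_use.get(name, 0) - amount


manager = SketchResourceManager({"vcs": 2})
job = JobSketch(name="smoke_test", resources={"vcs": 1})

first = manager.try_allocate(job.resources)
second = manager.try_allocate(job.resources)
third = manager.try_allocate(job.resources)  # a third job would exceed the limit of 2
manager.release(job.resources)  # one job finishes
fourth = manager.try_allocate(job.resources)  # now there is room again
```

The first pass checks every requested resource before the second pass mutates anything, so a job that fails on one resource never leaves another partially held.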

Resources are determined via a ResourceProvider interface/protocol. Currently, this PR only implements static resources, where users can pass e.g. --resource A=20 --resource B=50 flags on the command line to define static resource limits {"A": 20, "B": 50}. The main goal of this interface is that it can be extended in the future to support more dynamic resources if needed, for example license or compute availability that is actively polled via some command. While no dynamic resources are implemented in this PR, the integration of resources into the scheduler is designed such that no changes should be needed if/when they are introduced.
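As a rough illustration of how the flags map to limits, the sketch below parses repeated `--resource NAME=COUNT` arguments with `argparse` and hands them to a static provider. Only the `--resource A=20` flag shape and the resulting mapping come from the description above; `ResourceProviderSketch`, `StaticProviderSketch`, `get_limits` and `parse_resource` are hypothetical names, and the real interface may look different.

```python
from __future__ import annotations

import argparse
from typing import Protocol


class ResourceProviderSketch(Protocol):
    """Hypothetical protocol: anything that can report resource limits."""

    def get_limits(self) -> dict[str, int]: ...


class StaticProviderSketch:
    """Hypothetical static provider: limits are fixed at construction."""

    def __init__(self, limits: dict[str, int]) -> None:
        self._limits = dict(limits)

    def get_limits(self) -> dict[str, int]:
        return dict(self._limits)


def parse_resource(text: str) -> tuple[str, int]:
    """Turn 'NAME=COUNT' into ('NAME', COUNT), rejecting malformed input."""
    name, sep, value = text.partition("=")
    if not sep or not name or not value.isdigit():
        raise argparse.ArgumentTypeError(f"expected NAME=COUNT, got {text!r}")
    return name, int(value)


parser = argparse.ArgumentParser()
parser.add_argument(
    "--resource",
    action="append",  # the flag may be repeated
    type=parse_resource,
    default=[],
    metavar="NAME=COUNT",
)
args = parser.parse_args(["--resource", "A=20", "--resource", "B=50"])

provider: ResourceProviderSketch = StaticProviderSketch(dict(args.resource))
limits = provider.get_limits()
```

A dynamic provider would satisfy the same protocol but compute its answer on each call (for example by polling a license server), which is why the scheduler side would not need to change.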

See the individual commit messages for more details. Also relevant: see my local branch with some experimentation for dynamic resource availability.

AlexJones0 · 2026-04-16T19:29:32Z

Note: after some further discussion, I also dropped the commit that adds more granular per-tool license knowledge as part of the SimTool plugin. This might be nice eventually, but it is a bit too complex for now and doesn't give us any real advantage yet (all the useful sub-licenses, e.g. those for formal, would only ever be used by one job at a time, because those use the GUI). When this PR is merged I'll create an issue to track this in case we find it useful to add in the future.

Instead we just treat the tool itself as the resource. This has the nice advantage, for now, that it generalizes to any other flow that also defines a tool (e.g. linting, I think).

This will eventually be used by the Scheduler to manage more in-depth parallelism, where jobs will define resources and the scheduler will have to respect parallelism limits on those resources. The abstract `ResourceProvider` is designed in such a way that more complicated resource provider implementations could be added in the future (when compared to the StaticResourceProviders), with the ability to eventually support dynamic resource allocation. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>

This is just an idempotent constructor; it is confirmed via the arg type (and by manually inspecting possible call sites) that this will always be a `Path` already. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
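A quick sketch of why the extra construction is redundant: wrapping an existing `Path` in `Path(...)` yields an equal object of the same type. The path used here is a made-up example, not one from the xcelium tool code.

```python
from pathlib import Path

log_path = Path("scratch") / "xrun.log"  # hypothetical path, for illustration only
rewrapped = Path(log_path)  # constructing a Path from a Path changes nothing observable

same_value = rewrapped == log_path
same_type = type(rewrapped) is type(log_path)
```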

`OrderedDict` is redundant in modern Python, so let's type it properly and use modern conveniences. Likewise, we shouldn't be returning `dict_keys` if we intend to return a `Sequence` in the output tuples. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
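For context, a small sketch of both points: plain `dict` has preserved insertion order as a language guarantee since Python 3.7, and a `dict_keys` view is not a `Sequence`, so it needs converting before being returned under that type. The function name and metric names are hypothetical, not taken from the changed code.

```python
from __future__ import annotations

from collections.abc import Sequence


def summarize(results: dict[str, float]) -> tuple[Sequence[str], Sequence[float]]:
    """Return names and values as real sequences rather than dict views."""
    return list(results), list(results.values())


metrics: dict[str, float] = {}
metrics["line"] = 91.5
metrics["branch"] = 84.0
metrics["toggle"] = 77.25

names, values = summarize(metrics)

# A keys view supports iteration and set operations, but not indexing.
keys_view_is_sequence = isinstance(metrics.keys(), Sequence)
```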

pyright is correct, we should be handling the case where `cov_total is None` which is when the coverage summary doesn't contain the expected "Score" metric. This should result in an appropriate error from parsing. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
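The shape of the fix is roughly the following. This is a sketch only: `parse_total`, the summary layout and the error type are assumptions for illustration, and only the `cov_total is None` check on a missing "Score" metric comes from the commit message.

```python
from __future__ import annotations


def parse_total(summary: dict[str, str]) -> float:
    """Extract the total coverage, failing loudly if "Score" is absent."""
    cov_total = summary.get("Score")
    if cov_total is None:
        # Without this check the None would flow onwards and fail somewhere less obvious.
        raise ValueError('coverage summary does not contain the expected "Score" metric')
    return float(cov_total.rstrip("%").strip())
```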

Rather than use an `int` and make `0` an implementation-defined "unbounded", it is nicer to explicitly support `max_parallelism=None` to more clearly refer to this case. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
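A sketch of what the `None` convention buys: the unbounded case becomes an explicit branch rather than a magic `0`. The helper `free_slots` and its parameters other than `max_parallelism` are hypothetical and not part of the Scheduler API.

```python
from __future__ import annotations


def free_slots(max_parallelism: int | None, running: int, queued: int) -> int:
    """How many queued jobs may be started right now."""
    if max_parallelism is None:
        return queued  # unbounded: everything queued may start
    return max(0, min(queued, max_parallelism - running))
```

With this convention a type checker can also flag call sites that forget to handle the unbounded case, which it cannot do for a sentinel integer.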

Integrate the previously introduced `ResourceManager` into the scheduler. The scheduler now attempts to allocate resources for jobs when it decides to run them, which are then released when the job finishes execution (or fails to launch).

All resources go through the manager, which will fail to allocate them if there are not enough resources to provide within the defined limits. If no resource limits are defined, behaviour depends on the `ResourceManager` configuration, but the default is to assume an unbounded limit.

At the start of the scheduler run, all jobs are validated against the limits defined in the `ResourceManager`. For example, if static resources are used and there exists some job whose resource requirements cannot be satisfied by the defined limits, then this will be caught in advance of execution and reported early as an error. If the resources are dynamic, then this case only results in a warning. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
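The up-front validation described in this commit could look roughly like the sketch below: a job that asks for more than a limit allows is an error when limits are static, and only a warning when they are dynamic (since the limit might grow later). All names here (`validate_jobs`, `UnsatisfiableJobError`, the `dynamic` flag, the job and resource names) are hypothetical.

```python
from __future__ import annotations

import logging
from collections.abc import Mapping

log = logging.getLogger("scheduler_sketch")


class UnsatisfiableJobError(Exception):
    """Hypothetical error: a job can never fit within the static limits."""


def validate_jobs(
    jobs: Mapping[str, Mapping[str, int]],
    limits: Mapping[str, int],
    *,
    dynamic: bool,
) -> list[str]:
    """Check every job's requirements against the limits before running anything."""
    problems: list[str] = []
    for job_name, needs in jobs.items():
        for resource, amount in needs.items():
            limit = limits.get(resource)  # None: no limit defined, treated as unbounded
            if limit is not None and amount > limit:
                problems.append(f"{job_name}: needs {amount} of {resource!r}, limit is {limit}")
    if problems and not dynamic:
        raise UnsatisfiableJobError("; ".join(problems))
    for problem in problems:
        log.warning(problem)  # dynamic limits may change, so only warn
    return problems


jobs = {"big_regression": {"vcs": 4}, "smoke": {"vcs": 1}}
warnings_seen = validate_jobs(jobs, {"vcs": 2}, dynamic=True)
```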

This commit provides command-line options for creating and configuring a `ResourceManager` backed by a static resource provider that uses the resource limits passed on the command line. This can then be given to the scheduler to provide more fine-grained parallelism limiting than `--max-parallel` allows: jobs are now only scheduled such that the defined resource limits are always respected. Note the TODO about Python 3.11: when the minimum Python version is bumped we can make the enum a StrEnum, which has much better native str() behaviour than the existing Enum type and removes some of the extra glue code. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
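To illustrate the glue the TODO refers to: a plain `Enum` stringifies as `ClassName.MEMBER`, so showing the raw value (e.g. in CLI help text) needs an explicit `__str__`. The enum names and members below are hypothetical, not the ones in this commit.

```python
from enum import Enum


class PlainMode(Enum):
    """Hypothetical enum with the default str() behaviour."""

    STATIC = "static"


class GluedMode(Enum):
    """Hypothetical enum with the extra glue that StrEnum would make unnecessary."""

    STATIC = "static"

    def __str__(self) -> str:
        return self.value


plain_text = str(PlainMode.STATIC)
glued_text = str(GluedMode.STATIC)
parsed = GluedMode("static")  # lookup by value, as a CLI `type=` callable would do
```

On Python 3.11 and later, `enum.StrEnum` members already return their value from `str()`, so the `__str__` override could be dropped.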

Add some extra scheduler tests to cover the functionality of the new resource-level parallelism feature that was introduced. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>

machshev

Thanks @AlexJones0!

AlexJones0 requested review from hcallahan-lowrisc, machshev and rswarbrick April 16, 2026 12:24

machshev reviewed Apr 16, 2026

Comment thread src/dvsim/scheduler/resources.py

AlexJones0 force-pushed the scheduler_resource_limits branch 2 times, most recently from 3bec966 to 4ce9dd9 Compare April 16, 2026 19:21

AlexJones0 added 5 commits April 16, 2026 22:34

fix: remove superfluous Path construction in xcelium sim tool

0e78d65

refactor: Change Scheduler max_parallelism type

a0c4721

AlexJones0 force-pushed the scheduler_resource_limits branch from 4ce9dd9 to e378ebf Compare April 16, 2026 21:34

AlexJones0 added 3 commits April 16, 2026 23:51

test: add scheduler tests for resource-level parallelism

0aad34d

AlexJones0 force-pushed the scheduler_resource_limits branch from e378ebf to 0aad34d Compare April 16, 2026 22:51

AlexJones0 requested a review from machshev April 17, 2026 09:37

machshev approved these changes Apr 17, 2026

AlexJones0 added this pull request to the merge queue Apr 17, 2026

Merged via the queue into lowRISC:master with commit 030a584 Apr 17, 2026
6 checks passed

AlexJones0 merged 8 commits into lowRISC:master from AlexJones0:scheduler_resource_limits
