
[PyTorch] Fix FA4 selection when FA3 is unavailable.#2909

Merged
ptrendx merged 1 commit into NVIDIA:main from bbuschkaemper:fix-select-fa4
Apr 23, 2026

Conversation

@bbuschkaemper
Contributor

Description

Fixes FA3 being preferred over FA4 on SM90 before confirming that FA3 is actually installed, causing FA4 to be unusable on Hopper devices.
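The "actually installed" check that the fix hinges on can be probed at import time. The snippet below is a generic illustration, not the real TransformerEngine detection code, and the module name `flash_attn_3` is only a hypothetical example:

```python
# Hedged sketch: a generic way to test whether a package is importable
# without importing it. TransformerEngine's own v3_is_installed /
# v4_is_installed checks may differ; this only shows the pattern.
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named module can be found on the current path."""
    return importlib.util.find_spec(module_name) is not None


# Module name below is illustrative, not the actual FA3 package name.
fa3_available = is_installed("flash_attn_3")
```

A selection routine that calls a probe like this before expressing a backend preference avoids the bug this PR fixes: preferring a backend that cannot actually be imported.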

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • utils.py confirms that FA3 is actually installed and available before discarding FA4 in favor of FA3
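The change described above can be sketched as follows. This is a minimal stand-in, not the actual code from utils.py: the `FlashAttentionUtils` dataclass and the function signature are hypothetical, and only the gating condition mirrors the fix:

```python
# Hedged sketch of the fixed SM90 gating logic (assumptions: this
# FlashAttentionUtils stand-in and the select_fa4 signature are invented
# for illustration; only the condition structure reflects the PR).
from dataclasses import dataclass


@dataclass
class FlashAttentionUtils:
    v3_is_installed: bool
    v4_is_installed: bool


def select_fa4(fa_utils, sm90, use_flash_attention_3, use_flash_attention_4):
    # Buggy pre-fix rule: on SM90, FA4 was discarded whenever FA3 was
    # requested, without checking that FA3 is actually installed.
    # Fixed rule: only discard FA4 in favor of FA3 when FA3 is both
    # requested AND installed (and FA4 is itself installed).
    if (
        sm90
        and use_flash_attention_3
        and fa_utils.v3_is_installed  # the guard added by this PR
        and use_flash_attention_4
        and fa_utils.v4_is_installed
    ):
        use_flash_attention_4 = False  # prefer FA3 on Hopper
    return use_flash_attention_4


# Before the fix, an SM90 machine with only FA4 installed hit this branch
# and lost FA4; with the guard, FA4 stays enabled.
utils = FlashAttentionUtils(v3_is_installed=False, v4_is_installed=True)
assert select_fa4(utils, True, True, True) is True
```

With both backends installed the SM90 preference still fires and FA3 wins, so the pre-existing behavior on fully provisioned machines is unchanged.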

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Björn Buschkämper <bjoern.buschkaemper@gmail.com>
Copilot AI review requested due to automatic review settings April 21, 2026 09:30
@greptile-apps
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR fixes a bug where FA4 was incorrectly disabled on SM90 (Hopper) devices even when FA3 was not actually installed. The original condition only checked FlashAttentionUtils.v4_is_installed (and whether FA3 was requested via env var), but not FlashAttentionUtils.v3_is_installed, so the "prefer FA3 over FA4" logic fired even when FA3 was absent — effectively leaving SM90 users without any flash attention backend.

Confidence Score: 5/5

Safe to merge — targeted one-line logic fix with no side effects on other code paths.

The change is minimal, correct, and directly addresses the described bug. The new condition properly requires both FA3 to be requested and actually installed before suppressing FA4. No new edge cases are introduced and all other filter paths are unaffected.

No files require special attention.

Important Files Changed

Filename Overview
transformer_engine/pytorch/attention/dot_product_attention/utils.py Adds FlashAttentionUtils.v3_is_installed guard to the SM90 FA3-over-FA4 preference logic, preventing FA4 from being disabled when FA3 is not actually installed.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[SM90 device detected] --> B{use_flash_attention_3?}
    B -- No --> E[FA4 remains enabled]
    B -- Yes --> C{v3_is_installed?}
    C -- No --> E
    C -- Yes --> D{use_flash_attention_4?}
    D -- No --> F[FA4 already disabled]
    D -- Yes --> G{v4_is_installed?}
    G -- No --> F
    G -- Yes --> H[Disable FA4, prefer FA3 on SM90]

    style H fill:#f96,color:#000
    style E fill:#6f6,color:#000


Contributor

Copilot AI left a comment


Pull request overview

This PR fixes backend selection on SM90 (Hopper) so FlashAttention 4 (FA4) is not disabled in favor of FlashAttention 3 (FA3) unless FA3 is actually installed, preventing FA4 from being inadvertently made unusable when FA3 isn’t present.

Changes:

  • Gate the SM90 “prefer FA3 over FA4” rule on FlashAttentionUtils.v3_is_installed (and v4_is_installed) so FA4 is only disabled when both backends are truly available.


@bbuschkaemper
Contributor Author

@yaox12 I believe this is related to your #2432 PR — could you take a look at whether this change makes sense?

Member

@yaox12 yaox12 left a comment


LGTM

@yaox12
Member

yaox12 commented Apr 21, 2026

@vcherepanov-nv Could you review and add a 2.15.0 tag on this PR?

@ptrendx ptrendx added the 2.15.0 label Apr 21, 2026
@ptrendx
Member

ptrendx commented Apr 21, 2026

/te-ci pytorch

@ptrendx ptrendx added the community-contribution PRs from external contributor outside the core maintainers, representing community-driven work. label Apr 21, 2026
@ptrendx ptrendx self-assigned this Apr 21, 2026
@bbuschkaemper
Contributor Author

@ptrendx Is the failed build check something I can fix, or a runner issue? It looks to me like the build was cancelled.

@ptrendx
Member

ptrendx commented Apr 23, 2026

@bbuschkaemper it was a runner issue. I reran the CI internally and it passed. Merging, thanks :-)!

@ptrendx ptrendx merged commit 9e55a25 into NVIDIA:main Apr 23, 2026
24 of 28 checks passed