Pull requests: NVIDIA/TransformerEngine

Pull requests list

guard fuser grad checks on non-leaf nodes
#2919 opened Apr 23, 2026 by CarlosGomes98 Contributor Draft
13 tasks
[PyTorch] Fix CP A2A F16 when NVTE_FP8_DPA_BWD=1 2.15.0
#2917 opened Apr 22, 2026 by cyanguwa Collaborator
8 of 13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916 opened Apr 22, 2026 by sudhakarsingh27 Collaborator Draft
1 of 3 tasks
Variable Grouped Swizzle
#2914 opened Apr 22, 2026 by int-smart Contributor
8 of 13 tasks
NVFP4 per-token recipe
#2913 opened Apr 21, 2026 by YigongQin Draft
1 of 13 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing community-contribution
#2911 opened Apr 21, 2026 by NoonePauseferg
[PyTorch] Fix FA4 selection when FA3 is unavailable. 2.15.0 community-contribution
#2909 opened Apr 21, 2026 by bbuschkaemper Contributor
8 of 13 tasks
[Common][PyTorch] Fix int32 overflow and -1 sentinel handling in moe_permute community-contribution
#2907 opened Apr 21, 2026 by jing-4369
3 of 4 tasks
Add head dim 256 support for SDPA on Blackwell
#2906 opened Apr 21, 2026 by yaox12 Member
1 of 13 tasks
[PyTorch] Expose function to bulk-allocate tensors backed by the same buffer
#2900 opened Apr 18, 2026 by timmoon10 Collaborator
9 of 13 tasks
Improve the dimension checks for the FP8 recipes
#2894 opened Apr 16, 2026 by ptrendx Member
13 tasks
Add optimised top-k kernel AIR.
#2890 opened Apr 16, 2026 by dcampora
8 of 13 tasks
Add AI written qwen3_moe example
#2887 opened Apr 15, 2026 by skyw
4 of 13 tasks
[Debug] Add AutoswitchGEmm for Debug Precision Tool
#2883 opened Apr 15, 2026 by shangxiaokang Draft
3 of 13 tasks
SMEM offset caching RHT
#2882 opened Apr 15, 2026 by sraman-rgb
13 tasks
[PyTorch] Split TE ops op_forward into op_forward and setup_context
#2877 opened Apr 14, 2026 by pggPL Collaborator
5 of 7 tasks
[DONOT MERGE] Wgrad cute dsl v2
#2872 opened Apr 13, 2026 by vthumbe1503 Collaborator Draft
13 tasks
Optimizations for MXFP8/NVFP4 dequantize kernels
#2865 opened Apr 10, 2026 by YigongQin
8 of 13 tasks
Adds GEMM Profiling Guide to TE
#2863 opened Apr 9, 2026 by jomitchellnv Contributor
7 tasks
[DO NOT MERGE] Test CI
#2862 opened Apr 9, 2026 by cyanguwa Collaborator Draft
13 tasks
Add cpplint and ruff linter to pre-commit and fix lint violations
#2853 opened Apr 8, 2026 by pstjohn Contributor
Bump transformers from 4.55.0 to 5.0.0rc3 in /docs/examples/te_gemma dependencies python
#2851 opened Apr 8, 2026 by dependabot Bot