[PyTorch] Guard/document single parameter feature for grouped linear #2955

ksivaman wants to merge 6 commits into NVIDIA:main
Conversation
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
/te-ci pytorch L0
Greptile Summary

This PR guards the experimental single-parameter feature for grouped linear behind the `NVTE_GROUPED_LINEAR_SINGLE_PARAM` environment variable.

Confidence Score: 5/5. Safe to merge; the gating logic is correct, warnings are informative, and CI coverage of the new path is added. Only P2 findings (minor warning-message wording). No logic bugs, no data-corruption risk, no security concerns. The function correctly short-circuits when neither flag is set, resolves at construction time, and the stacklevel is accurate for both direct-instantiation call sites. No files require special attention.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["GroupedLinear.__init__(single_grouped_weight, single_grouped_bias)"]
    A --> B["resolve_grouped_linear_single_param_flags()"]
    B --> C{Either flag True?}
    C -- No --> D["Return flags unchanged (no warning)"]
    C -- Yes --> E{NVTE_GROUPED_LINEAR_SINGLE_PARAM > 0?}
    E -- No --> F["Warn: env var not enabled\nForce both flags to False"]
    E -- Yes --> G["Warn: experimental feature active\nReturn flags unchanged"]
    F --> H["self.single_grouped_weight = False\nself.single_grouped_bias = False"]
    G --> I["self.single_grouped_weight = requested\nself.single_grouped_bias = requested"]
    D --> J["self.single_grouped_weight = False\nself.single_grouped_bias = False"]
```
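The gating logic in the flowchart above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the function name matches the summary, but the warning messages and `stacklevel` handling are assumptions.

```python
import os
import warnings


def resolve_grouped_linear_single_param_flags(
    single_grouped_weight: bool,
    single_grouped_bias: bool,
) -> tuple[bool, bool]:
    """Gate the experimental single-parameter flags behind an env var.

    Sketch of the flowchart logic; warning text is illustrative.
    """
    # Short-circuit: nothing requested, so nothing to gate or warn about.
    if not (single_grouped_weight or single_grouped_bias):
        return single_grouped_weight, single_grouped_bias

    env_enabled = int(os.environ.get("NVTE_GROUPED_LINEAR_SINGLE_PARAM", "0")) > 0
    if not env_enabled:
        # Flags were requested but the feature is not enabled: force both off.
        warnings.warn(
            "single_grouped_weight/single_grouped_bias were requested but "
            "NVTE_GROUPED_LINEAR_SINGLE_PARAM is not enabled; forcing both "
            "flags to False.",
            stacklevel=3,
        )
        return False, False

    # Feature enabled: honor the requested flags but flag the instability.
    warnings.warn(
        "Single grouped parameters are an experimental feature.",
        stacklevel=3,
    )
    return single_grouped_weight, single_grouped_bias
```

The resolution happens once, at module construction time, so the environment variable is read only when at least one flag is requested.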
Reviews (4): Last reviewed commit: "Merge branch 'main' into single_param_be..."
```python
if not (single_grouped_weight or single_grouped_bias):
    return single_grouped_weight, single_grouped_bias

env_enabled = int(os.environ.get("NVTE_GROUPED_LINEAR_SINGLE_PARAM", "0")) > 0
```
Non-integer env var value will raise `ValueError`

`int(os.environ.get("NVTE_GROUPED_LINEAR_SINGLE_PARAM", "0"))` will throw an uncaught `ValueError` if the variable is set to a non-numeric string (e.g. `"true"`, `"yes"`). Wrapping it in a try/except would give a cleaner error message. This is consistent with other similar env-var checks in the file, so it is a pre-existing pattern rather than a new regression; flagging for awareness.
Suggested change:

```diff
-env_enabled = int(os.environ.get("NVTE_GROUPED_LINEAR_SINGLE_PARAM", "0")) > 0
+try:
+    env_enabled = int(os.environ.get("NVTE_GROUPED_LINEAR_SINGLE_PARAM", "0")) > 0
+except ValueError:
+    env_enabled = False
```
timmoon10 left a comment:
This logic breaks backward compatibility since we can no longer rely on the module kwargs to configure single grouped params. I guess we've already been treating single grouped params as an experimental feature. We should keep this instability in mind whenever we use this feature externally, e.g. in Mcore.
```python
single_grouped_weight: bool,
single_grouped_bias: bool,
```
While we are breaking backward compatibility, we might consider consolidating these options together. Do we really want to take on the burden of supporting the case with a single grouped weight and discrete bias, or discrete weights and single grouped bias?
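The consolidation proposed in this comment could look roughly like the sketch below. This is not the actual `GroupedLinear` API; the class and kwarg names are placeholders illustrating one flag replacing two:

```python
class GroupedLinearSketch:
    """Illustrative only: one kwarg covering both weight and bias.

    Consolidating the flags drops support for the mixed cases
    (single grouped weight with discrete biases, or vice versa)
    that the review comment questions the value of.
    """

    def __init__(self, single_grouped_params: bool = False):
        # A single switch configures both parameter groups together.
        self.single_grouped_weight = single_grouped_params
        self.single_grouped_bias = single_grouped_params
```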
```python
if not (single_grouped_weight or single_grouped_bias):
    return single_grouped_weight, single_grouped_bias

env_enabled = int(os.environ.get("NVTE_GROUPED_LINEAR_SINGLE_PARAM", "0")) > 0
```
If we only respect the kwargs when an envvar is set, it doesn't really make sense to keep the kwargs rather than just checking the envvar. I guess we're half-heartedly maintaining/preparing the stable API for this feature.
/te-ci pytorch
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
/te-ci pytorch
Description
Guard/document single parameter feature for grouped linear.
Type of change
Changes
Checklist: