Add Storage TSGs: physical disk add (HowTo) + CanPool=False (Troubleshoot)#281

Open
AlBurns-MSFT wants to merge 2 commits into Azure:main from AlBurns-MSFT:tsg/storage-add-physical-disks

@AlBurns-MSFT (Collaborator) commented May 6, 2026

Summary

Adds two new Storage TSGs. Both files follow the repo HowTo and Troubleshoot templates and update the Storage component README.

| File | Type | Purpose |
| --- | --- | --- |
| TSG/Storage/HowTo-Storage-AddPhysicalDisksToS2DPool.md | HowTo | End-to-end safe procedure for online capacity expansion of an existing Storage Spaces Direct pool. |
| TSG/Storage/Troubleshoot-Storage-PhysicalDiskCanPoolFalse.md | Troubleshoot | Resolution paths for every common CannotPoolReason value when newly inserted disks are not claimed. |
| TSG/Storage/README.md | Index | Adds entries for the two new TSGs. |

What's in each TSG

HowTo: Add physical disks to an existing Azure Local cluster

Covers the full safe sequence:

  1. Pre-checks (cluster health, storage health, disk inventory baseline, per-node symmetry).
  2. Symmetric insertion guidance (do not reboot unless the OEM requires it).
  3. Automatic pool claim verification, with a guarded manual Add-PhysicalDisk path when needed.
  4. Storage job monitoring with explicit hard gates against reboots, updates, more disk adds, and volume expansion while jobs are active.
  5. Capacity confirmation and a hard gate on volume expansion (Get-StorageJob must be empty).
  6. Final validation snapshot and cross-link to the companion troubleshoot TSG.
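
A minimal sketch of the hard gate in steps 4 and 5, assuming the gate is simply "no storage jobs are listed". `Test-StorageJobGate` is a hypothetical helper name, not something taken from the TSG; it accepts the job list as a parameter so the logic can be checked without a cluster.

```powershell
# Hypothetical helper (not from the TSG): returns $true only when no storage jobs are listed.
function Test-StorageJobGate {
    param(
        [AllowNull()]
        [AllowEmptyCollection()]
        [object[]] $Jobs
    )
    # Drop nulls so that an empty Get-StorageJob result counts as zero jobs.
    $listed = @($Jobs | Where-Object { $null -ne $_ })
    return ($listed.Count -eq 0)
}

# Assumed usage on a cluster node:
# if (-not (Test-StorageJobGate -Jobs (Get-StorageJob))) {
#     throw 'Storage jobs are still listed. Do not reboot, update, add disks, or expand volumes.'
# }
```

Treating any listed job as a blocker is deliberately stricter than filtering on job state, which matches the "must be empty" wording above.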

Troubleshoot: Physical disks not claimed after insertion (CanPool=False)

Covers each CannotPoolReason value with a dedicated step:

  • In a Pool (no fix needed; verify pool membership)
  • Verification in progress (wait, do not reset)
  • Verification failed (cluster + storage state checks; Start-AzsSupportStorageDiagnostic)
  • Hardware not compliant (vendor escalation; do not bypass)
  • Firmware not compliant (firmware alignment via vendor)
  • Offline / read-only (identify exact disk first; Set-Disk -IsOffline $false)
  • Stale metadata (gated Reset-PhysicalDisk only after positive confirmation the disk is wipeable)

Includes a data-collection checklist for opening a support case.
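
As a rough illustration of the first triage step, the sketch below lists disks that report CanPool=False together with their CannotPoolReason values. `Get-UnpoolableDiskSummary` is a hypothetical helper name, and the snippet is not taken from the TSG; it only reads the `FriendlyName`, `SerialNumber`, `CanPool`, and `CannotPoolReason` properties that `Get-PhysicalDisk` exposes.

```powershell
# Hypothetical helper (not from the TSG): lists disks that cannot be pooled, with their reasons.
function Get-UnpoolableDiskSummary {
    param(
        [AllowNull()]
        [AllowEmptyCollection()]
        [object[]] $Disks
    )
    $Disks |
        Where-Object { $null -ne $_ -and -not $_.CanPool } |
        Select-Object FriendlyName, SerialNumber,
            @{ Name = 'Reason'; Expression = { @($_.CannotPoolReason) -join ', ' } }
}

# Assumed usage on a cluster node (read-only, changes nothing):
# Get-UnpoolableDiskSummary -Disks (Get-PhysicalDisk) | Format-Table -AutoSize
```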

Why this matters

Disk-add operations are routine, but they change the storage infrastructure. The most common failure mode in the field is a disk that reports CanPool=False after insertion with no clear next step, which leads to either premature destructive actions (Reset-PhysicalDisk against the wrong disk) or unnecessary support escalations. These two TSGs give operators a single canonical safe sequence and a reason-by-reason resolution map.

Conventions followed

  • Template structure from TSG/Templates/HowTo-Template.md and TSG/Templates/Troubleshoot-Template.md.
  • File naming per CONTRIBUTING.md (Type-Topic-Specifics.md).
  • Metadata table at the top of each file:
    • HowTo: Component / Topic / Applicable Scenarios (per the HowTo template).
    • Troubleshoot: Component / Severity / Applicable Scenarios / Affected Versions (per the Troubleshoot template). Affected Versions is set to All Azure Local releases (Storage Spaces Direct) because the cmdlet surface and CannotPoolReason enum are platform-stable across releases; the issue is scoped to S2D-backed clusters.
  • Inline # comments on every PowerShell command per the template pattern.
  • GitHub admonition syntax (> [!IMPORTANT], > [!WARNING], > [!CAUTION], > [!NOTE]) for risk callouts per TSG/Templates/Markdown-Snippets.md.
  • Cross-link to the existing Troubleshooting-Storage-With-Support-Diagnostics-Tool.md TSG (verified path).
  • All commands use placeholders (<disk-serial-number>, <disk-number>, <unique-id>, <serial-number-1>); no real telemetry, KQL, customer names, or ARM URIs.

Validation

  • Both files were reviewed for public safety: no internal telemetry, KQL, customer identifiers, ARM URIs, or hostnames.
  • The Start-AzsSupportStorageDiagnostic cmdlet referenced in the troubleshoot TSG is documented in the existing Troubleshooting-Storage-With-Support-Diagnostics-Tool.md.
  • Cross-links between the two new TSGs and to the existing diagnostics-tool TSG resolve to valid relative paths.
  • Reset-PhysicalDisk is gated behind explicit identity confirmation and an in-text warning, consistent with the repo's coding-standards guidance ("All code snippets MUST be safe to execute in a production environment").
  • The manual Add-PhysicalDisk snippet (in both TSGs) requires exactly one non-primordial pool and explicit serial-number enumeration of the intended new disks, so it cannot accidentally claim other CanPool=True disks. The single-pool invariant matches the documented S2D design (Enable-ClusterStorageSpacesDirect creates one pool per cluster; Microsoft Learn explicitly recommends one pool per cluster).
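
The two guards described in the last bullet can be sketched roughly as follows. This is not the snippet from the TSGs: `Select-IntendedDisk` is a hypothetical helper name, and the pool and disk lists are passed in as parameters so the selection logic can be exercised without a cluster.

```powershell
# Hypothetical helper (not the PR's actual snippet): validates the pool count and
# returns only the poolable disks whose serial numbers were listed explicitly.
function Select-IntendedDisk {
    param(
        [AllowNull()] [AllowEmptyCollection()] [object[]] $Pools,
        [AllowNull()] [AllowEmptyCollection()] [object[]] $Disks,
        [Parameter(Mandatory)] [string[]] $SerialNumbers
    )
    # Guard 1: exactly one non-primordial pool.
    $poolList = @($Pools | Where-Object { $null -ne $_ })
    if ($poolList.Count -ne 1) {
        throw "Expected exactly one non-primordial pool, found $($poolList.Count)."
    }
    # Guard 2: only disks that are poolable AND were named by serial number.
    $selected = @($Disks | Where-Object {
        $null -ne $_ -and $_.CanPool -and ($SerialNumbers -contains $_.SerialNumber)
    })
    $found = @($selected | ForEach-Object { $_.SerialNumber })
    $missing = @($SerialNumbers | Where-Object { $found -notcontains $_ })
    if ($missing.Count -gt 0) {
        throw "Not found or not poolable: $($missing -join ', ')"
    }
    return $selected
}

# Assumed usage on a cluster node (placeholder serial number):
# $pool  = @(Get-StoragePool -IsPrimordial $false)
# $disks = @(Select-IntendedDisk -Pools $pool -Disks (Get-PhysicalDisk) -SerialNumbers '<serial-number-1>')
# Add-PhysicalDisk -StoragePoolFriendlyName $pool[0].FriendlyName -PhysicalDisks $disks
```

Failing when any requested serial number is missing or not poolable means a typo stops the operation instead of silently adding a subset of the intended disks.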

…hoot)

Adds two new Storage TSGs derived from a customer disk-add engagement:

- HowTo-Storage-AddPhysicalDisksToS2DPool.md
  End-to-end safe procedure for online capacity expansion: pre-checks,
  symmetric insertion, automatic vs manual pool claim, monitoring storage
  jobs, capacity confirmation, and final validation.

- Troubleshoot-Storage-PhysicalDiskCanPoolFalse.md
  Resolution paths for every common CannotPoolReason value (In a Pool,
  Verification in progress / failed, Hardware/Firmware not compliant,
  Offline, Stale metadata), plus data-collection checklist for support.

Both TSGs follow the HowTo-Template and Troubleshoot-Template and reference
the existing Troubleshooting-Storage-With-Support-Diagnostics-Tool TSG.

Internal tracking: msazure/One #37486687.
Drafts were reviewed for public safety (no telemetry, KQL, customer names,
or ARM URIs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 6, 2026 20:35
@AlBurns-MSFT (Collaborator, Author)

@microsoft-github-policy-service agree company="Microsoft"

Copilot AI (Contributor) left a comment

Pull request overview

Adds two new Storage troubleshooting guides (TSGs) to document a safe disk-add workflow for S2D on Azure Local and a reason-by-reason resolution map for Get-PhysicalDisk cases where newly inserted disks show CanPool=False. Updates the Storage component index to include the new guides.

Changes:

  • Added a HowTo guide describing the end-to-end, “safe sequence” procedure for online capacity expansion by adding physical disks to an existing S2D pool.
  • Added a Troubleshoot guide mapping common CannotPoolReason values to recommended verification and remediation steps (including guarded destructive actions).
  • Updated the Storage README to index the two new TSGs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| TSG/Storage/HowTo-Storage-AddPhysicalDisksToS2DPool.md | New HowTo for safe, step-by-step disk addition to an existing S2D pool, including pre-checks, monitoring, and validation. |
| TSG/Storage/Troubleshoot-Storage-PhysicalDiskCanPoolFalse.md | New Troubleshoot guide for diagnosing and resolving CanPool=False scenarios by CannotPoolReason. |
| TSG/Storage/README.md | Adds index links to the two new storage TSGs. |

Comment on lines +3 to +16
<table border="1" cellpadding="6" cellspacing="0" style="border-collapse:collapse; margin-bottom:1em;">
<tr>
<th style="text-align:left; width: 180px;">Component</th>
<td><strong>Storage</strong></td>
</tr>
<tr>
<th style="text-align:left; width: 180px;">Topic</th>
<td><strong>Storage Spaces Direct</strong>: Add physical disks to an existing pool for online capacity expansion</td>
</tr>
<tr>
<th style="text-align:left; width: 180px;">Applicable Scenarios</th>
<td><strong>Day 2 Operations</strong>: Capacity expansion / Add disk</td>
</tr>
</table>
@AlBurns-MSFT (Collaborator, Author)

Thanks. Resolved in a6de64a. The HowTo template doesn't include an Affected Versions row, so the row was never present in this file; only the Troubleshoot file has it (the Troubleshoot template requires it). Updated the PR description so it no longer claims both files include the field. The Troubleshoot file's value was changed from 'All versions' to 'All Azure Local releases (Storage Spaces Direct)' to convey the actual scope.

Comment on lines +210 to +213
```powershell
# Add the eligible disks to the target pool
Add-PhysicalDisk -StoragePoolFriendlyName $pool.FriendlyName -PhysicalDisks $eligibleDisks
```
@AlBurns-MSFT (Collaborator, Author)

Resolved in a6de64a. Replaced the manual-add snippet with a defensive variant that (a) throws if the non-primordial pool count is not exactly 1 (the documented healthy state for S2D, since Enable-ClusterStorageSpacesDirect creates a single pool per cluster), and (b) requires the operator to enumerate intended new disks by serial number rather than blindly piping all CanPool=True disks into Add-PhysicalDisk.

Comment on lines +231 to +234
```powershell
# Manually add the eligible disks to the target pool
Add-PhysicalDisk -StoragePoolFriendlyName $pool.FriendlyName -PhysicalDisks $eligibleDisks
```
@AlBurns-MSFT (Collaborator, Author)

Resolved in a6de64a. Same hardening applied here as in the HowTo: pool-count guard plus serial-number enumeration before Add-PhysicalDisk runs.

…arify version applicability

- Both files: replace manual Add-PhysicalDisk snippet with a defensive
  variant that validates exactly one non-primordial pool and requires
  the operator to enumerate intended new disks by serial number, so
  Add-PhysicalDisk cannot accidentally claim unintended CanPool=True
  disks.

- Troubleshoot file: clarify Affected Versions from 'All versions' to
  'All Azure Local releases (Storage Spaces Direct)' to convey scope
  rather than appearing version-agnostic.

Addresses Copilot review threads on PR Azure#281.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AlBurns-MSFT added a commit to AlBurns-MSFT/AzureLocal-Supportability that referenced this pull request May 6, 2026
AlBurns-MSFT force-pushed the tsg/storage-add-physical-disks branch from a6de64a to 267151c on May 6, 2026 21:44