fix(docs): skip ms.date updates and republish for metadata-only changes#2154
fix(docs): skip ms.date updates and republish for metadata-only changes#2154MSBrett wants to merge 1 commit into
Conversation
The update-mslearn-dates GitHub Action and Publish-Toolkit both treated any change to a docs file as a content change, so editing only YAML frontmatter (or other metadata-only fields) was enough to bump ms.date and trigger a docs republish. Workflow (.github/workflows/update-mslearn-dates.yml) - Strip YAML frontmatter from both the working copy and the pull_request.base.sha version of each changed file and cmp the remaining article body. Skip the ms.date rewrite when the bodies match. Skip files that no longer exist. Publish-Toolkit (src/scripts/Publish-Toolkit.ps1) - Add Get-DocsArticleBody to strip frontmatter, markdownlint and prettier ignore comments, and normalize trailing whitespace and blank-line runs. - Add Remove-DocsMetadataOnlyChanges to walk every docs/*.md in the publish target and git-restore files whose normalized body matches HEAD. Invoked from Set-RepoContent when -Template is 'docs'. Tests (src/powershell/Tests/Unit/Action.UpdateMsLearnDates.Tests.ps1) - Add three Pester assertions that pin the workflow to github.event.pull_request.base.sha, the strip_frontmatter helper, and the cmp -s body-diff guard. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates the docs publishing and update-mslearn-dates workflow to avoid treating metadata-only edits (YAML frontmatter/ignore directives) as content changes, preventing unnecessary ms.date bumps and republish churn.
Changes:
- Workflow: compare PR base SHA vs working copy after stripping frontmatter; skip
ms.dateupdates when article bodies match and skip deleted files. - Publish toolkit: normalize/strip docs metadata and auto-revert files whose article body matches
HEADfor docs templates. - Tests: add Pester assertions to pin the workflow gating logic.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
src/scripts/Publish-Toolkit.ps1 |
Adds body-normalization and a cleanup pass to revert metadata-only docs changes during publish. |
src/powershell/Tests/Unit/Action.UpdateMsLearnDates.Tests.ps1 |
Adds assertions ensuring the workflow uses base SHA + body-diff guardrails. |
.github/workflows/update-mslearn-dates.yml |
Implements frontmatter-stripped body comparison against PR base SHA before updating ms.date. |
| Set-Location $docsPath | ||
| $repoRoot = git rev-parse --show-toplevel | ||
| Set-Location $repoRoot | ||
|
|
||
| Get-ChildItem $docsPath -Recurse -Filter "*.md" | ForEach-Object { | ||
| $repoFile = [System.IO.Path]::GetRelativePath($repoRoot, $_.FullName).Replace("\", "/") | ||
| git cat-file -e "HEAD:$repoFile" 2>$null | ||
|
|
||
| if ($LASTEXITCODE -ne 0) | ||
| { | ||
| return | ||
| } | ||
|
|
||
| $previousContent = [string]::Join("`n", (git show "HEAD:$repoFile")) | ||
| $currentContent = Get-Content $_.FullName -Raw | ||
|
|
||
| if ((Get-DocsArticleBody $previousContent) -eq (Get-DocsArticleBody $currentContent)) | ||
| { | ||
| git restore --source HEAD -- $repoFile | ||
| } | ||
| } | ||
|
|
||
| Pop-Location |
| function Get-DocsArticleBody([string]$content) | ||
| { | ||
| $lines = $content -split "`r?`n" | ||
| $start = 0 | ||
|
|
||
| if ($lines.Count -gt 0 -and $lines[0] -eq "---") | ||
| { | ||
| for ($i = 1; $i -lt $lines.Count; $i++) | ||
| { | ||
| if ($lines[$i] -eq "---") | ||
| { | ||
| $start = $i + 1 | ||
| break | ||
| } | ||
| } | ||
| } | ||
|
|
||
| $ignoreLines = @( | ||
| "<!-- markdownlint-disable-next-line MD025 -->", | ||
| "<!-- prettier-ignore-start -->", | ||
| "<!-- prettier-ignore-end -->" | ||
| ) | ||
|
|
||
| $body = $lines[$start..($lines.Count - 1)] ` | ||
| | Where-Object { $ignoreLines -notcontains $_.Trim() } ` | ||
| | ForEach-Object { $_.TrimEnd() } | ||
|
|
||
| return (($body -join "`n") -replace "`n{3,}", "`n`n").Trim() | ||
| } |
| Get-ChildItem $docsPath -Recurse -Filter "*.md" | ForEach-Object { | ||
| $repoFile = [System.IO.Path]::GetRelativePath($repoRoot, $_.FullName).Replace("\", "/") | ||
| git cat-file -e "HEAD:$repoFile" 2>$null | ||
|
|
||
| if ($LASTEXITCODE -ne 0) | ||
| { | ||
| return | ||
| } | ||
|
|
||
| $previousContent = [string]::Join("`n", (git show "HEAD:$repoFile")) | ||
| $currentContent = Get-Content $_.FullName -Raw |
| return | ||
| } | ||
|
|
||
| $previousContent = [string]::Join("`n", (git show "HEAD:$repoFile")) |
|
|
||
| if ((Get-DocsArticleBody $previousContent) -eq (Get-DocsArticleBody $currentContent)) | ||
| { | ||
| git restore --source HEAD -- $repoFile |
RolandKrummenacher
left a comment
There was a problem hiding this comment.
A couple of additional notes on top of Copilot's review:
1. Asymmetric "what counts as body" between workflow and publish script
The workflow's strip_frontmatter (awk) only removes the ---...--- block. The PowerShell Get-DocsArticleBody in Publish-Toolkit.ps1 also strips:
<!-- markdownlint-disable-next-line MD025 --><!-- prettier-ignore-start -->/<!-- prettier-ignore-end -->- Trailing whitespace
- Triple+ blank lines collapsed to two
If a PR changes only ignore-comments or whitespace, the workflow treats it as a body change and bumps ms.date, then at publish time Remove-DocsMetadataOnlyChanges reverts the file to HEAD (with the OLD ms.date). Net result is correct, but the ms.date bump on the PR is wasted churn — and the two definitions will drift over time.
Worth extracting body normalization to a single source of truth (either both call the same helper, or both apply the same rule set).
2. Get-DocsArticleBody / Remove-DocsMetadataOnlyChanges have no behavior tests
The new Pester tests follow the repo convention of regex-matching workflow YAML, which is fine for the workflow guard. But the two new PowerShell functions are pure-ish and easy to unit test:
It 'Get-DocsArticleBody strips frontmatter'
It 'Get-DocsArticleBody strips markdownlint/prettier ignore lines'
It 'Get-DocsArticleBody normalizes blank-line runs to max two'
It 'Get-DocsArticleBody trims trailing whitespace per line'
It 'Get-DocsArticleBody returns "" for empty or frontmatter-only input'These would also be the most useful guard against the asymmetry in #1 — if both sides are required to satisfy a shared test fixture, drift becomes visible.
flanakin
left a comment
There was a problem hiding this comment.
🤖 [AI][Claude Opus 4.7] PR Review
Summary: Focused, well-scoped fix that achieves its stated goal. Workflow guards and the publish-side cleanup work together correctly. No blockers. Re-emphasizing two items already raised by @RolandKrummenacher at the line level for code-review visibility, plus two additional suggestions.
⚠️ Should fix (2)
- Asymmetric body normalization between workflow (
strip_frontmatteronly) andGet-DocsArticleBody(frontmatter + ignore comments + whitespace) — already raised in @RolandKrummenacher's top-level comment; pinning at the line for context. - No behavior tests for
Get-DocsArticleBody/Remove-DocsMetadataOnlyChanges— also from @RolandKrummenacher's comment.
💡 Suggestions (2)
Get-DocsArticleBodyproduces a reversed range when a file is frontmatter-only ($start == $lines.Count).Remove-DocsMetadataOnlyChangesrelies on cwd forgit restoreto resolve paths —git -C $repoRootwould be more robust and self-documenting.
| $ignoreLines = @( | ||
| "<!-- markdownlint-disable-next-line MD025 -->", | ||
| "<!-- prettier-ignore-start -->", | ||
| "<!-- prettier-ignore-end -->" | ||
| ) | ||
|
|
||
| $body = $lines[$start..($lines.Count - 1)] ` | ||
| | Where-Object { $ignoreLines -notcontains $_.Trim() } ` | ||
| | ForEach-Object { $_.TrimEnd() } | ||
|
|
||
| return (($body -join "`n") -replace "`n{3,}", "`n`n").Trim() |
There was a problem hiding this comment.
🤖 [AI][Claude Opus 4.7]
The workflow's strip_frontmatter (awk) only removes the ---...--- block, but Get-DocsArticleBody here additionally strips markdownlint/prettier ignore comments and normalizes trailing whitespace + blank-line runs. The two definitions disagree on "what counts as body."
A PR that changes only an ignore-comment or whitespace is seen as a real body change by the workflow (so ms.date gets bumped), then reverted to HEAD at publish time (with the OLD ms.date). Net result is correct today, but:
- The PR carries a wasted
ms.datebump that will be silently undone. - The two normalization implementations will drift over time.
Consider a single source of truth — e.g., a small helper script the workflow sources, or matching the workflow's awk to also drop the ignore-comment lines. (Re-raising @RolandKrummenacher's top-level point at the line level for visibility.)
| | ForEach-Object { $_.TrimEnd() } | ||
|
|
||
| return (($body -join "`n") -replace "`n{3,}", "`n`n").Trim() | ||
| } |
There was a problem hiding this comment.
🤖 [AI][Claude Opus 4.7]
Get-DocsArticleBody is pure and easy to unit test, but there are no Pester tests covering it. The new tests in Action.UpdateMsLearnDates.Tests.ps1 regex-match the workflow YAML (fine for the workflow guard) but don't exercise this function at all.
Minimal coverage worth adding (paraphrasing @RolandKrummenacher's suggestion):
It 'Get-DocsArticleBody strips frontmatter'
It 'Get-DocsArticleBody strips markdownlint/prettier ignore lines'
It 'Get-DocsArticleBody collapses 3+ blank lines to 2'
It 'Get-DocsArticleBody trims trailing whitespace per line'
It 'Get-DocsArticleBody returns "" for empty or frontmatter-only input'If both the workflow's strip_frontmatter and this function are required to satisfy a shared fixture, the asymmetry above becomes visible the moment one drifts.
| "<!-- prettier-ignore-end -->" | ||
| ) | ||
|
|
||
| $body = $lines[$start..($lines.Count - 1)] ` |
There was a problem hiding this comment.
🤖 [AI][Claude Opus 4.7] 💡 Suggestion
Edge case: if a file is frontmatter-only (closing --- is the last line), $start equals $lines.Count and $lines[$start..($lines.Count - 1)] becomes $lines[Count..(Count-1)], which PowerShell evaluates as a reversed range and returns the last element instead of an empty slice. The function would then return that line (likely empty), which happens to still compare equal across versions, so the visible blast radius is small — but the logic is wrong.
Guard with a length check, e.g.:
| $body = $lines[$start..($lines.Count - 1)] ` | |
| if ($start -ge $lines.Count) | |
| { | |
| return "" | |
| } | |
| $body = $lines[$start..($lines.Count - 1)] ` |
(Existing pipeline continues from | Where-Object ....)
| Push-Location | ||
| Set-Location $docsPath | ||
| $repoRoot = git rev-parse --show-toplevel | ||
| Set-Location $repoRoot | ||
|
|
There was a problem hiding this comment.
🤖 [AI][Claude Opus 4.7] 💡 Suggestion
The function changes the process cwd (Set-Location $repoRoot) so that git restore --source HEAD -- $repoFile later resolves $repoFile against the correct repo. It works, but it's implicit — anyone reading the loop has to remember the cwd was changed 15 lines earlier.
Passing -C $repoRoot to each git call would make the intent explicit and remove the need to change cwd:
git -C $repoRoot cat-file -e "HEAD:$repoFile" 2>$null
...
$previousContent = [string]::Join("`n", (git -C $repoRoot show "HEAD:$repoFile"))
...
git -C $repoRoot restore --source HEAD -- $repoFileMinor — purely a readability/robustness improvement.
Problem
The
update-mslearn-datesGitHub Action andPublish-Toolkit.ps1both treated any change to a docs markdown file as a content change. Editing only YAML frontmatter (or other metadata-only fields) was enough to bumpms.dateon PR merge and to produce a docs republish PR full of metadata-only churn.Fix
Three coordinated changes that gate updates on actual article-body changes.
Workflow —
.github/workflows/update-mslearn-dates.ymlpull_request.base.shaversion of each changed file andcmpthe remaining article body.ms.daterewrite when the bodies match.Publish-Toolkit —
src/scripts/Publish-Toolkit.ps1Get-DocsArticleBody— strips frontmatter, markdownlint and prettier ignore comments, normalizes trailing whitespace and blank-line runs.Remove-DocsMetadataOnlyChanges— walks everydocs/*.mdin the publish target andgit restores files whose normalized body matchesHEAD.Set-RepoContentwhen-Templateis'docs'.Tests —
src/powershell/Tests/Unit/Action.UpdateMsLearnDates.Tests.ps1github.event.pull_request.base.sha, thestrip_frontmatterhelper, and thecmp -sbody-diff guard.Effect
ms.date.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com