Improve changelog generator and split CHANGELOG.md into per-version files #1768
Open
andygrove wants to merge 12 commits into
The tests in dev/release/tests/test_split_changelog.py read the live root CHANGELOG.md, which is no longer the multi-version document the splitter operates on. The splitter has served its one-shot purpose; the tests would only ever pass against a snapshot of the pre-migration file, which is recoverable from git history if anyone needs to re-run the migration.
The splitter has served its purpose. The pre-migration CHANGELOG.md is recoverable from git history if anyone needs to re-run it.
martin-g
reviewed
May 26, 2026
commit_count = subprocess.check_output(f"git log --pretty=oneline {tag1}..{tag2} | wc -l", shell=True, text=True).strip()

# get number of contributors
contributor_count = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2} | wc -l", shell=True, text=True).strip()
Member
Suggested change
- contributor_count = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2} | wc -l", shell=True, text=True).strip()
+ shortlog_output = subprocess.check_output(["git", "shortlog", "-sn", f"{tag1}..{tag2}"], text=True)
+ contributor_count = len(shortlog_output.strip().splitlines())
print_pulls(repo_name, "Other", other)

# show code contributions
credits = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2}", shell=True, text=True).rstrip()
Member
Suggested change
- credits = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2}", shell=True, text=True).rstrip()
+ credits = shortlog_output.rstrip()
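The two suggestions above reuse a single `git shortlog -sn` capture for both the contributor count and the credits text. A minimal sketch of that idea follows; `parse_shortlog` and the sample text are hypothetical and not part of the PR, and the sketch assumes each output line has the form `<count><whitespace><author>`.

```python
def parse_shortlog(shortlog_output):
    """Parse `git shortlog -sn` text into (commit_count, author) pairs.

    Hypothetical helper for illustration; assumes one
    "<count><whitespace><author>" entry per line.
    """
    entries = []
    for line in shortlog_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        count, author = line.split(None, 1)
        entries.append((int(count), author.strip()))
    return entries


# Made-up sample standing in for the captured shortlog output.
shortlog_output = "    12\tAlice Example\n     3\tBob Example\n"

entries = parse_shortlog(shortlog_output)
contributor_count = len(entries)     # one entry per contributor
credits = shortlog_output.rstrip()   # same text, reused for the credits section
```

Capturing the output once avoids running `git shortlog` twice and removes the dependency on `wc -l`.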
cc_type = ''
cc_scope = ''
cc_breaking = ''
parts = re.findall(r'^([a-z]+)(\([a-z]+\))?(!)?:', pull.title)
Member
Suggested change
+ parts = re.findall(r'^([a-zA-Z]+)(\([a-zA-Z0-9_-]+\))?(!)?:', pull.title)
Conventional commits allow upper-case too
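To illustrate the difference, here is a small sketch that applies the suggested pattern to a few made-up titles. `parse_cc_title` is a hypothetical helper, and lower-casing the type is an assumption added here, not something the PR does.

```python
import re

# Pattern from the suggestion above: mixed-case type, and a scope that may
# contain digits, underscores, and hyphens.
CC_TITLE = re.compile(r'^([a-zA-Z]+)(\([a-zA-Z0-9_-]+\))?(!)?:')


def parse_cc_title(title):
    """Return (type, scope, breaking) for a conventional-commit style title.

    Hypothetical helper; returns ('', '', False) when the title does not match.
    """
    match = CC_TITLE.match(title)
    if match is None:
        return ('', '', False)
    cc_type = match.group(1).lower()  # assumption: normalize the type's case
    cc_scope = (match.group(2) or '').strip('()')
    cc_breaking = match.group(3) == '!'
    return (cc_type, cc_scope, cc_breaking)


# Made-up titles for illustration.
print(parse_cc_title("Feat(core-api)!: drop old flag"))
print(parse_cc_title("fix: typo"))
print(parse_cc_title("Update README"))
```

With the original lower-case-only pattern, the first title would not match at all, so the pull request would lose its type, scope, and breaking-change marker.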
Comment on lines +152 to +154
# If it can't be resolved locally, return as-is (e.g. a tag name
# that the GitHub API can resolve)
return ref
Member
Suggested change
- # If it can't be resolved locally, return as-is (e.g. a tag name
- # that the GitHub API can resolve)
- return ref
+ # If it can't be resolved locally, return as-is (e.g. a remote tag).
+ # The GitHub API will attempt to resolve it.
+ print(f"Note: Could not resolve '{ref}' locally; passing to GitHub API as-is", file=sys.stderr)
+ return ref
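For context, a resolve-then-fall-back function along these lines might look like the sketch below. This is not the PR's actual code: `resolve_ref` and its injectable `run` parameter are hypothetical, and only the fallback message is taken from the suggestion.

```python
import subprocess
import sys


def resolve_ref(ref, run=subprocess.check_output):
    """Return the local commit SHA for `ref`, or `ref` unchanged if git cannot resolve it.

    Hypothetical sketch; `run` is injectable so the fallback path can be
    exercised without a git checkout.
    """
    try:
        # "<ref>^{commit}" asks git to peel tags down to the commit they point at.
        out = run(
            ["git", "rev-parse", "--verify", "--quiet", f"{ref}^{{commit}}"],
            text=True,
            stderr=subprocess.DEVNULL,
        )
        return out.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not resolvable locally (or git is missing): warn and let the
        # GitHub API attempt to resolve the name.
        print(f"Note: Could not resolve '{ref}' locally; passing to GitHub API as-is", file=sys.stderr)
        return ref
```

Writing the note to stderr keeps it out of the generated changelog when stdout is redirected to a file.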
print(f"# Apache DataFusion Ballista {version} Changelog\n")

# get the number of commits
commit_count = subprocess.check_output(f"git log --pretty=oneline {tag1}..{tag2} | wc -l", shell=True, text=True).strip()
Member
Suggested change
- commit_count = subprocess.check_output(f"git log --pretty=oneline {tag1}..{tag2} | wc -l", shell=True, text=True).strip()
+ commit_count = subprocess.check_output(["git", "rev-list", "--count", f"{tag1}..{tag2}"], text=True).strip()
git rev-list --count is more efficient since there is no need to pipe to wc -l (it is also more Windows-friendly).
Also, passing the command as a list, with string formatting used only for the tag range, and leaving shell=False prevents command injection.
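A minimal sketch of why the list form is safer: each list element reaches git as one literal argument, so shell metacharacters in a tag name are never interpreted. `commit_count_argv` and `count_commits` are hypothetical names, and the hostile tag is made up.

```python
import subprocess


def commit_count_argv(tag1, tag2):
    """Build the argument list; the tag range is a single element."""
    return ["git", "rev-list", "--count", f"{tag1}..{tag2}"]


def count_commits(tag1, tag2):
    """Run git without a shell (shell=False is the default) and parse the count."""
    output = subprocess.check_output(commit_count_argv(tag1, tag2), text=True)
    return int(output.strip())


# A hostile "tag" stays inside one argument instead of starting a second command.
argv = commit_count_argv("v1.0.0", "v2.0.0; echo pwned")
print(argv)
```

With `shell=True` and an f-string, the same input would be handed to the shell, which would treat everything after the `;` as a separate command.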
milenkovicm
reviewed
May 28, 2026
@@ -75,26 +75,83 @@
cc_breaking = parts_tuple[2] == '!'

labels = [label.name for label in pull.labels]
Contributor
Can we ignore auto-dependencies PRs? Currently I'm filtering them out manually.
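One possible shape for such a filter, sketched under the assumption that these PRs carry a label literally named `auto-dependencies`. The helper names are hypothetical, and `SimpleNamespace` objects stand in for the pull request objects the script receives.

```python
from types import SimpleNamespace

# Assumption: dependency-bump PRs are tagged with this label name.
SKIP_LABELS = {"auto-dependencies"}


def is_dependency_bump(pull):
    """True if the pull request carries any label in SKIP_LABELS."""
    labels = {label.name for label in pull.labels}
    return bool(labels & SKIP_LABELS)


def without_dependency_bumps(pulls):
    """Return the pulls that should still appear in the changelog."""
    return [pull for pull in pulls if not is_dependency_bump(pull)]


# Stand-in objects for illustration only.
bump = SimpleNamespace(title="chore(deps): bump a crate",
                       labels=[SimpleNamespace(name="auto-dependencies")])
feature = SimpleNamespace(title="feat: add a new flag",
                          labels=[SimpleNamespace(name="enhancement")])

kept = without_dependency_bumps([bump, feature])
```

Filtering on labels reuses the same `pull.labels` data the generator already reads, so no extra API calls are needed.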
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
- `dev/release/generate-changelog.py` to the Comet version, adapted to Ballista's project name and CLI shape (`<tag1> <tag2> <version>`). The new script also resolves refs to SHAs before handing them to the GitHub API, which fixes a latent bug where `HEAD` was resolved by the API to the default branch instead of the caller's local HEAD.
- `docs/source/changelog/`, generated from the existing root `CHANGELOG.md`. Drop the noisy pre-split `apache/arrow-datafusion` umbrella-repo entries (`6.0.0`, `6.0.0-rc0`, `7.0.0`, `7.0.0-rc2`, `7.1.0-rc1`); the pre-migration `CHANGELOG.md` remains in git history if anyone needs to re-run the split.
- `docs/source/changelog/index.md` with a Sphinx toctree listing every kept release newest-first.
- `Changelog` toctree caption to `docs/source/index.rst` so the per-version pages appear in the rendered site.
- `CHANGELOG.md` with a short stub pointing at the new location.
- `.github_changelog_generator` config (Ruby tool, no longer authoritative).
- `### Change Log` section of `dev/release/README.md` to document the single-command flow.
Are there any user-facing changes?
The published docs now include a "Changelog" caption in the top-level TOC linking to a separate page per release. The root `CHANGELOG.md` becomes a one-screen stub pointing readers at `docs/source/changelog/index.md`. No code or runtime behavior changes.
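Listing releases newest-first for the index page requires a numeric sort, since a plain string sort would place `0.9.0` above `0.12.0`. The sketch below shows one way to order version strings. `version_key` and `newest_first` are hypothetical, the version list is made up, and the code assumes versions of the form `X.Y.Z` with an optional `-rcN` suffix.

```python
import re


def version_key(version):
    """Sort key: numeric parts first, then final releases above their release candidates.

    Hypothetical helper; assumes "X.Y.Z" with an optional "-rcN" suffix.
    """
    main, _, pre = version.partition("-")
    numbers = tuple(int(part) for part in main.split("."))
    pre_number = int(re.sub(r"\D", "", pre) or 0)  # "rc4" -> 4, "" -> 0
    return (numbers, 0 if pre else 1, pre_number)


def newest_first(versions):
    """Return the versions ordered from newest to oldest."""
    return sorted(versions, key=version_key, reverse=True)


# Made-up version list for illustration.
ordered = newest_first(["0.9.0", "0.12.0", "43.0.0", "0.12.0-rc4", "0.10.0"])
print(ordered)
```

The resulting order can then be written into the toctree body, one entry per line.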