Improve changelog generator and split CHANGELOG.md into per-version files #1768

Open
andygrove wants to merge 12 commits into apache:main from andygrove:improve-changelog-generator
@andygrove andygrove commented May 25, 2026

Member

Which issue does this PR close?

Closes #.

Rationale for this change

  • Simplify changelog generation so it no longer requires manual effort
  • Use the changelog generator script from Comet, which includes a credits section at the end showing commits per contributor
  • Regenerated the changelog for 53.0.0 using the new script - here
  • Split the single ever-growing changelog page into one file per release and move it to the documentation site so that it is easier for people to find

What changes are included in this PR?

  • Port dev/release/generate-changelog.py to the Comet version, adapted to Ballista's project name and CLI shape (<tag1> <tag2> <version>). The new script also resolves refs to SHAs before handing them to the GitHub API, which fixes a latent bug where HEAD was resolved by the API to the default branch instead of the caller's local HEAD.
  • Add 19 per-version files under docs/source/changelog/, generated from the existing root CHANGELOG.md. Drop the noisy pre-split apache/arrow-datafusion umbrella-repo entries (6.0.0, 6.0.0-rc0, 7.0.0, 7.0.0-rc2, 7.1.0-rc1); the pre-migration CHANGELOG.md remains in git history if anyone needs to re-run the split.
  • Add docs/source/changelog/index.md with a Sphinx toctree listing every kept release newest-first.
  • Add a Changelog toctree caption to docs/source/index.rst so the per-version pages appear in the rendered site.
  • Replace the root CHANGELOG.md with a short stub pointing at the new location.
  • Delete the unused .github_changelog_generator config (Ruby tool, no longer authoritative).
  • Update the ### Change Log section of dev/release/README.md to document the single-command flow.
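
A minimal sketch of the ref-resolution step described in the first bullet, assuming the script shells out to `git rev-parse`. The function name `resolve_ref` and the injectable `run` parameter are hypothetical illustrations, not taken from the PR's script:

```python
import subprocess


def resolve_ref(ref, run=subprocess.run):
    """Resolve a git ref (e.g. HEAD or a tag) to a commit SHA using the local repo.

    Falls back to returning the ref unchanged when it cannot be resolved
    locally, so that the GitHub API can attempt to resolve it instead.
    `run` is a hypothetical injection point used only to make this testable.
    """
    result = run(
        # "<ref>^{commit}" peels annotated tags down to the commit they point at.
        ["git", "rev-parse", "--verify", "--quiet", f"{ref}^{{commit}}"],
        capture_output=True,
        text=True,
    )
    sha = result.stdout.strip()
    if result.returncode == 0 and sha:
        return sha
    # Not resolvable locally (for example, a tag that was never fetched).
    return ref
```

Resolving locally first means `HEAD` refers to the caller's checkout rather than whatever the API treats as the default branch.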

Are there any user-facing changes?

The published docs now include a "Changelog" caption in the top-level TOC linking to a separate page per release. The root CHANGELOG.md becomes a one-screen stub pointing readers at docs/source/changelog/index.md. No code or runtime behavior changes.

@github-actions github-actions Bot added documentation Improvements or additions to documentation development-process labels May 25, 2026
andygrove added 3 commits May 25, 2026 08:34
The splitter has served its purpose. The pre-migration CHANGELOG.md is recoverable from git history if anyone needs to re-run it.
@andygrove andygrove marked this pull request as ready for review May 25, 2026 14:44
@andygrove andygrove requested a review from milenkovicm May 25, 2026 14:44
commit_count = subprocess.check_output(f"git log --pretty=oneline {tag1}..{tag2} | wc -l", shell=True, text=True).strip()

# get number of contributors
contributor_count = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2} | wc -l", shell=True, text=True).strip()

Member

Suggested change
- contributor_count = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2} | wc -l", shell=True, text=True).strip()
+ shortlog_output = subprocess.check_output(["git", "shortlog", "-sn", f"{tag1}..{tag2}"], text=True)
+ contributor_count = len(shortlog_output.strip().splitlines())

print_pulls(repo_name, "Other", other)

# show code contributions
credits = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2}", shell=True, text=True).rstrip()

Member

Suggested change
- credits = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2}", shell=True, text=True).rstrip()
+ credits = shortlog_output.rstrip()

cc_type = ''
cc_scope = ''
cc_breaking = ''
parts = re.findall(r'^([a-z]+)(\([a-z]+\))?(!)?:', pull.title)

Member

Suggested change
parts = re.findall(r'^([a-zA-Z]+)(\([a-zA-Z0-9_-]+\))?(!)?:', pull.title)

Conventional commits allow upper-case too
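
A small sketch of how the suggested pattern behaves on PR titles, in case it helps evaluate the change. The helper name `parse_cc_prefix` is hypothetical, and the return shape is an assumption rather than what the script actually does:

```python
import re

# Pattern from the suggested change: type, optional "(scope)", optional "!" marker.
CC_PATTERN = re.compile(r'^([a-zA-Z]+)(\([a-zA-Z0-9_-]+\))?(!)?:')


def parse_cc_prefix(title):
    """Return (type, scope, breaking) for a PR title.

    The scope keeps its surrounding parentheses, matching the capture group.
    Titles without a conventional-commit prefix yield ('', '', False).
    """
    match = CC_PATTERN.match(title)
    if match is None:
        return ('', '', False)
    cc_type = match.group(1)
    cc_scope = match.group(2) or ''
    cc_breaking = match.group(3) == '!'
    return (cc_type, cc_scope, cc_breaking)
```

With the original lowercase-only pattern, a title such as `Fix(data-source_2): ...` would not match at all, because of both the capital letter and the hyphen, digit, and underscore in the scope.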

Comment on lines +152 to +154
# If it can't be resolved locally, return as-is (e.g. a tag name
# that the GitHub API can resolve)
return ref

Member

Suggested change
- # If it can't be resolved locally, return as-is (e.g. a tag name
- # that the GitHub API can resolve)
- return ref
+ # If it can't be resolved locally, return as-is (e.g. a remote tag).
+ # The GitHub API will attempt to resolve it.
+ print(f"Note: Could not resolve '{ref}' locally; passing to GitHub API as-is", file=sys.stderr)
+ return ref

print(f"# Apache DataFusion Ballista {version} Changelog\n")

# get the number of commits
commit_count = subprocess.check_output(f"git log --pretty=oneline {tag1}..{tag2} | wc -l", shell=True, text=True).strip()

Member

Suggested change
- commit_count = subprocess.check_output(f"git log --pretty=oneline {tag1}..{tag2} | wc -l", shell=True, text=True).strip()
+ commit_count = subprocess.check_output(["git", "rev-list", "--count", f"{tag1}..{tag2}"], text=True).strip()

git rev-list --count is more efficient because there is no need to pipe to wc -l, and it is also more Windows-friendly.
Also, passing the command as an argument list, with string formatting used only for the tag range and shell=False (the default), prevents shell command injection.
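
Putting the suggestions in this review together, a sketch of what the statistics step could look like without `shell=True`. The helper names (`rev_list_count_argv`, `shortlog_argv`, `count_contributors`, `release_stats`) are hypothetical and only illustrate the shape:

```python
import subprocess


def rev_list_count_argv(tag1, tag2):
    """Argument list for counting commits in tag1..tag2 (no shell involved)."""
    return ["git", "rev-list", "--count", f"{tag1}..{tag2}"]


def shortlog_argv(tag1, tag2):
    """Argument list for the per-contributor commit summary in tag1..tag2."""
    return ["git", "shortlog", "-sn", f"{tag1}..{tag2}"]


def count_contributors(shortlog_output):
    """Count contributors: `git shortlog -sn` prints one line per author."""
    return len(shortlog_output.strip().splitlines())


def release_stats(tag1, tag2):
    """Return (commit_count, contributor_count, credits) for the range.

    Runs git twice; the shortlog output is reused for both the contributor
    count and the credits text instead of invoking shortlog a second time.
    """
    commit_count = int(
        subprocess.check_output(rev_list_count_argv(tag1, tag2), text=True).strip()
    )
    shortlog_output = subprocess.check_output(shortlog_argv(tag1, tag2), text=True)
    return commit_count, count_contributors(shortlog_output), shortlog_output.rstrip()
```

Because each tag ends up inside a single list element, shell metacharacters in a tag value are passed to git as literal text rather than interpreted by a shell.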

@@ -75,26 +75,83 @@
cc_breaking = parts_tuple[2] == '!'

labels = [label.name for label in pull.labels]

Contributor

Can we ignore auto-dependencies PRs? Currently I'm filtering them out manually.
