feat(incidents): Create postmortem doc at incident declaration #222
Open
rgibert wants to merge 12 commits into
Move Notion postmortem page creation from the dumpslack flow to on_incident_created so the doc exists from the start. The page is added as a Slack bookmark without posting a message. When dumpslack runs later (on status change to mitigated/done/postmortem), it finds the existing page and populates it with channel content.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/4jYVXERt_otp5j1rF_YAEgnh2RY8c8mk3U9J19IuwUI
…th declaration

When _create_postmortem_doc commits a placeholder row (url="") and is still calling the Notion API, _trigger_slack_dump could see the empty URL and fall through to create a second page. Poll for the URL to appear before proceeding, avoiding orphaned Notion pages.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/zceqUSIMWMxN-_wwsojxMiKzfo3l8cdsSTarMkl9IOM
…ing loop

When _create_postmortem_doc fails and cleans up the placeholder row, refresh_from_db() raises DoesNotExist. When the process crashes without cleanup, the empty placeholder blocks all retries indefinitely. Now catch DoesNotExist during polling and re-acquire the row on either deletion or timeout, taking ownership of orphaned placeholders instead of telling the user to retry manually.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/gaiK6_e-BFE9nt5GSu3LgLCnqJ_a_AIUTpycqehKAUs
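The polling behavior described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `wait_for_url`, `RowDeleted`, and the `fetch_url` callback are hypothetical names, `RowDeleted` stands in for Django's `DoesNotExist`, and the 0.5-second interval is an assumption (the 15-second timeout matches the delay mentioned later in this conversation).

```python
import time
from typing import Callable, Optional


class RowDeleted(Exception):
    """Hypothetical stand-in for Django's Model.DoesNotExist."""


def wait_for_url(
    fetch_url: Callable[[], str],
    timeout: float = 15.0,
    interval: float = 0.5,  # assumed value, not taken from the PR
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll a placeholder row until another process fills in its URL.

    Returns the URL once it is non-empty. Returns None when the row was
    deleted (the creator failed and cleaned up) or the timeout elapsed
    (the creator likely crashed); in both cases the caller is expected to
    re-acquire the row and create the page itself.
    """
    deadline = clock() + timeout
    while True:
        try:
            url = fetch_url()
        except RowDeleted:
            return None  # placeholder removed: caller takes over
        if url:
            return url  # the other process finished first
        if clock() >= deadline:
            return None  # orphaned placeholder: caller takes over
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; a caller would pass a `fetch_url` that refreshes the row from the database and returns its `url` field.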
…rning empty

When dumpslack loses the re-lock race (another process saved a Notion URL first), use the winner's page URL and continue with apply_template to populate it with Slack history. Previously the loser returned immediately, leaving the winning page empty.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/cl-NMNVbu5jWyVJ1ATQwcvv_niIuI5oowy27M4fC3KY
        "Race condition: concurrent call already created postmortem doc for %s",
        incident.incident_number,
    )
    return
Race loser drops orphan Notion page
Medium Severity
When _create_postmortem_doc re-locks and finds link.url already set, it logs and returns without using the stored URL. If dumpslack (or another racer) saved first, this path leaves the Notion page it just created unused.
Reviewed by Cursor Bugbot for commit 84673d7.
…ing tests

_trigger_slack_dump left an empty-URL ExternalLink placeholder when the Notion API call failed, causing a 15-second polling delay on retry and blocking _create_postmortem_doc from creating the page. Add the same cleanup that _create_postmortem_doc already performs. Also add tests for the pre-existing URL branch (dumpslack finds an already-populated Notion URL and calls apply_template with update_slack=True) and for the placeholder cleanup on failure.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/3OqlwFB6o1EuMYtSikhAnJGwfLF5tR9hl2uTJbVDUTQ
_create_postmortem_doc now renders and sends the postmortem template markdown after page creation, matching the troubleshooting doc pattern. Previously the template was never applied because dumpslack set update_slack=True when finding an existing URL, which skipped the template in apply_template. Also moves archive_page calls outside transaction blocks to avoid holding SELECT FOR UPDATE locks during external API calls, and removes the notion_page_created guard on add_bookmark in dumpslack so the bookmark is retried if it failed during initial creation.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/NM6HrlV9VN1emYV0tYeQpLcIBCEpOKuxvRwAjLyU-CQ
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 7de74bd.
page_id is guaranteed non-None at this point since all None paths return early, but mypy cannot narrow across the branching logic.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/2SzUTaoLPySZkmO8sOLmQWGNoEW2pY0lIjleNhsJ7NQ
Move template application from _create_postmortem_doc to apply_template using a blocks-based idempotency check: the template is only sent when the page has no existing blocks. This ensures the template is applied exactly once by dumpslack (after the Linear issue exists) rather than too early during incident creation. Also archive orphaned Notion pages in dumpslack exception handler, and deduplicate bookmarks by checking bookmarks_list before adding.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/wh_HvrZWRy7fnDnoSmn1C_mLmzeK2uRVjwn4hnUFysY
Fix union-attr mypy errors in notion.py and dumpslack.py. Archive orphaned Notion pages on unexpected DB errors in _create_postmortem_doc. Treat transient API errors in apply_template as "has content" to prevent duplicate template application.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/6wZMYD_w5Q44B88Gyky9rTHQSH61cCc7pC8_AKSNs7k
apply_template silently set has_content=True on Notion API failure, permanently skipping template application. Re-raise instead so callers can retry. Fix test_db_error_archives_orphan_notion_page which deleted non-existent ExternalLink records (no-op) instead of simulating a DB error. Use patch.object on ExternalLink.save to raise during URL update, properly exercising the outer exception handler's cleanup path.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/iyF3yLBeORDyqOkG5o2PqyXy3DuvoRTZuVdblGqXkCU
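The blocks-based idempotency check, combined with the re-raise behavior from the commit above, might look something like this sketch. `apply_template_once`, `list_blocks`, and `send_template` are hypothetical names used only for illustration; the real apply_template talks to the Notion API and its signature is not shown in this conversation.

```python
from typing import Callable, List


def apply_template_once(
    list_blocks: Callable[[], List[dict]],
    send_template: Callable[[], None],
) -> bool:
    """Send the template only when the page has no existing blocks.

    Returns True if the template was sent, False if it was skipped.
    Any exception raised by list_blocks propagates to the caller rather
    than being treated as "has content", so a transient API failure does
    not permanently skip the template and the caller can retry.
    """
    existing = list_blocks()  # may raise; deliberately not caught here
    if existing:
        return False  # page already populated: do not apply again
    send_template()
    return True
```

Calling it twice against the same page sends the template once, which is the property the commit message describes; the check is only as atomic as the underlying API allows, so two truly simultaneous callers could still both see an empty page.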


Move Notion postmortem page creation from the dumpslack flow (triggered on status change to mitigated/done/postmortem) to on_incident_created so the doc exists from the start of the incident.

The page is added as a Slack channel bookmark without posting a message to the channel. When dumpslack runs later on status change, it finds the existing page via the ExternalLink(type=NOTION) record and populates it with Slack channel content and AI timeline -- no duplicate page creation occurs.

The new _create_postmortem_doc function follows the same DB-dedup pattern as _create_troubleshooting_doc: SELECT FOR UPDATE to claim the ExternalLink row, Notion API call outside the transaction, race-condition guard on re-lock, and placeholder cleanup on failure.

Resolves RELENG-32
Agent transcript: https://claudescope.sentry.dev/share/UOIi3zUcU_qU3uMtvatWsPs__r3nGeTIqoLl5MUiFnI
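As a rough illustration of the DB-dedup pattern named in the description (claim the row, call the external API outside the lock, guard against a race on re-lock, clean up the placeholder on failure), here is a self-contained sketch. Everything in it is an assumption made for illustration: `LinkStore` is an in-memory stand-in for the ExternalLink table, a `threading.Lock` plays the role of SELECT FOR UPDATE, and `create_page` / `archive_page` are hypothetical callbacks in place of the Notion client.

```python
import threading
from typing import Callable, Dict, Optional


class LinkStore:
    """In-memory stand-in for the ExternalLink table (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # plays the role of SELECT FOR UPDATE
        self._urls: Dict[int, str] = {}

    def claim(self, incident_id: int) -> bool:
        """Insert an empty-URL placeholder; False if a row already exists."""
        with self._lock:
            if incident_id in self._urls:
                return False
            self._urls[incident_id] = ""
            return True

    def finalize(self, incident_id: int, url: str) -> str:
        """Re-lock and store url unless a URL is already set; return the stored one."""
        with self._lock:
            current = self._urls.get(incident_id, "")
            if current:
                return current  # another writer won the race
            self._urls[incident_id] = url
            return url

    def release(self, incident_id: int) -> None:
        """Delete the row only if it is still an empty placeholder."""
        with self._lock:
            if self._urls.get(incident_id) == "":
                del self._urls[incident_id]

    def get(self, incident_id: int) -> Optional[str]:
        with self._lock:
            return self._urls.get(incident_id)


def create_doc(
    store: LinkStore,
    incident_id: int,
    create_page: Callable[[], str],
    archive_page: Callable[[str], None],
) -> Optional[str]:
    """Create at most one page per incident; return the stored URL or None."""
    if not store.claim(incident_id):
        return None  # someone else owns (or already finished) the doc
    try:
        url = create_page()  # external call, made while no lock is held
    except Exception:
        store.release(incident_id)  # placeholder cleanup on failure
        raise
    stored = store.finalize(incident_id, url)
    if stored != url:
        archive_page(url)  # lost the race: archive our orphan, outside the lock
    return stored
```

Keeping the external call outside the locked sections mirrors the stated goal of not holding row locks during API requests; the price is the extra re-lock step, which is where the race-loser handling lives.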