Skip to content

feat(incidents): Create postmortem doc at incident declaration#222

Open
rgibert wants to merge 12 commits into
mainfrom
rgibert/postmortem-doc-at-declaration
Open

feat(incidents): Create postmortem doc at incident declaration#222
rgibert wants to merge 12 commits into
mainfrom
rgibert/postmortem-doc-at-declaration

Conversation

@rgibert

@rgibert rgibert commented May 28, 2026

Copy link
Copy Markdown
Member

Move Notion postmortem page creation from the dumpslack flow (triggered on status change to mitigated/done/postmortem) to on_incident_created so the doc exists from the start of the incident.

The page is added as a Slack channel bookmark without posting a message to the channel. When dumpslack runs later on status change, it finds the existing page via the ExternalLink(type=NOTION) record and populates it with Slack channel content and AI timeline -- no duplicate page creation occurs.

The new _create_postmortem_doc function follows the same DB-dedup pattern as _create_troubleshooting_doc: SELECT FOR UPDATE to claim the ExternalLink row, Notion API call outside the transaction, race-condition guard on re-lock, and placeholder cleanup on failure.

Resolves RELENG-32

Agent transcript: https://claudescope.sentry.dev/share/UOIi3zUcU_qU3uMtvatWsPs__r3nGeTIqoLl5MUiFnI

Move Notion postmortem page creation from the dumpslack flow to
on_incident_created so the doc exists from the start. The page is
added as a Slack bookmark without posting a message. When dumpslack
runs later (on status change to mitigated/done/postmortem), it finds
the existing page and populates it with channel content.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/4jYVXERt_otp5j1rF_YAEgnh2RY8c8mk3U9J19IuwUI
Comment thread src/firetower/incidents/hooks.py
@linear-code

linear-code Bot commented May 31, 2026

Copy link
Copy Markdown

RELENG-32

…th declaration

When _create_postmortem_doc commits a placeholder row (url="") and is
still calling the Notion API, _trigger_slack_dump could see the empty
URL and fall through to create a second page. Poll for the URL to
appear before proceeding, avoiding orphaned Notion pages.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/zceqUSIMWMxN-_wwsojxMiKzfo3l8cdsSTarMkl9IOM
@rgibert rgibert marked this pull request as ready for review June 1, 2026 14:36
@rgibert rgibert requested a review from a team as a code owner June 1, 2026 14:36
@rgibert rgibert self-assigned this Jun 1, 2026
Comment thread src/firetower/slack_app/handlers/dumpslack.py Outdated
Comment thread src/firetower/incidents/tests/test_hooks.py
…ing loop

When _create_postmortem_doc fails and cleans up the placeholder row,
refresh_from_db() raises DoesNotExist. When the process crashes without
cleanup, the empty placeholder blocks all retries indefinitely.

Now catch DoesNotExist during polling and re-acquire the row on either
deletion or timeout, taking ownership of orphaned placeholders instead
of telling the user to retry manually.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/gaiK6_e-BFE9nt5GSu3LgLCnqJ_a_AIUTpycqehKAUs
Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment thread src/firetower/slack_app/handlers/dumpslack.py
…rning empty

When dumpslack loses the re-lock race (another process saved a Notion
URL first), use the winner's page URL and continue with apply_template
to populate it with Slack history. Previously the loser returned
immediately, leaving the winning page empty.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/cl-NMNVbu5jWyVJ1ATQwcvv_niIuI5oowy27M4fC3KY
Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment thread src/firetower/incidents/hooks.py
Comment thread src/firetower/incidents/hooks.py Outdated
Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment thread src/firetower/slack_app/tests/handlers/test_dumpslack.py
Comment thread src/firetower/incidents/hooks.py Outdated
"Race condition: concurrent call already created postmortem doc for %s",
incident.incident_number,
)
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race loser drops orphan Notion page

Medium Severity

When _create_postmortem_doc re-locks and finds link.url already set, it logs and returns without using the stored URL. If dumpslack (or another racer) saved first, this path leaves the Notion page it just created unused.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 84673d7. Configure here.

…ing tests

_trigger_slack_dump left an empty-URL ExternalLink placeholder when the
Notion API call failed, causing a 15-second polling delay on retry and
blocking _create_postmortem_doc from creating the page. Add the same
cleanup that _create_postmortem_doc already performs.

Also add tests for the pre-existing URL branch (dumpslack finds an
already-populated Notion URL and calls apply_template with
update_slack=True) and for the placeholder cleanup on failure.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/3OqlwFB6o1EuMYtSikhAnJGwfLF5tR9hl2uTJbVDUTQ
Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment thread src/firetower/slack_app/handlers/dumpslack.py Outdated
Comment thread src/firetower/incidents/tests/test_hooks.py
Comment thread src/firetower/incidents/hooks.py Outdated
_create_postmortem_doc now renders and sends the postmortem template
markdown after page creation, matching the troubleshooting doc pattern.
Previously the template was never applied because dumpslack set
update_slack=True when finding an existing URL, which skipped the
template in apply_template.

Also moves archive_page calls outside transaction blocks to avoid
holding SELECT FOR UPDATE locks during external API calls, and removes
the notion_page_created guard on add_bookmark in dumpslack so the
bookmark is retried if it failed during initial creation.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/NM6HrlV9VN1emYV0tYeQpLcIBCEpOKuxvRwAjLyU-CQ

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7de74bd. Configure here.

Comment thread src/firetower/incidents/hooks.py Outdated
Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment thread src/firetower/slack_app/handlers/dumpslack.py
page_id is guaranteed non-None at this point since all None paths
return early, but mypy cannot narrow across the branching logic.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/2SzUTaoLPySZkmO8sOLmQWGNoEW2pY0lIjleNhsJ7NQ
Comment thread src/firetower/incidents/tests/test_hooks.py Outdated
Move template application from _create_postmortem_doc to apply_template
using a blocks-based idempotency check: the template is only sent when
the page has no existing blocks. This ensures the template is applied
exactly once by dumpslack (after the Linear issue exists) rather than
too early during incident creation.

Also archive orphaned Notion pages in dumpslack exception handler, and
deduplicate bookmarks by checking bookmarks_list before adding.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/wh_HvrZWRy7fnDnoSmn1C_mLmzeK2uRVjwn4hnUFysY
Comment thread src/firetower/incidents/hooks.py
Comment thread src/firetower/integrations/services/notion.py
Fix union-attr mypy errors in notion.py and dumpslack.py. Archive orphaned
Notion pages on unexpected DB errors in _create_postmortem_doc. Treat
transient API errors in apply_template as "has content" to prevent duplicate
template application.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/6wZMYD_w5Q44B88Gyky9rTHQSH61cCc7pC8_AKSNs7k
Comment thread src/firetower/integrations/services/notion.py Outdated
Comment thread src/firetower/incidents/tests/test_hooks.py
apply_template silently set has_content=True on Notion API failure,
permanently skipping template application. Re-raise instead so callers
can retry.

Fix test_db_error_archives_orphan_notion_page which deleted non-existent
ExternalLink records (no-op) instead of simulating a DB error. Use
patch.object on ExternalLink.save to raise during URL update, properly
exercising the outer exception handler's cleanup path.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/iyF3yLBeORDyqOkG5o2PqyXy3DuvoRTZuVdblGqXkCU
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant