
feat(taskbroker): propagate TaskError from worker and log failure context on broker #607

Open
s-starostin wants to merge 2 commits into getsentry:main from s-starostin:feat/taskerror-propagation

Conversation

@s-starostin commented Apr 25, 2026

Summary

Propagate task execution exception context from the Python worker client to taskbroker and emit structured broker-side failure logs.

This PR does two things:

  1. extends the Python taskbroker client with error_hook support and propagates TaskError in SetTaskStatusRequest
  2. updates taskbroker server logging to emit structured failure context when TaskError is present

Dependency

Depends on

Downstream PR

Consumed by:

Changes

Python client

  • add an error_hook extension point to TaskbrokerApp
  • call error_hook.on_exception(task_meta, exc) in the worker exception path
  • attach the returned TaskError | None to ProcessingResult
  • send the error in SetTaskStatusRequest (a sketch of this surface follows the list)
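A minimal sketch of what this client surface could look like. Only the on_exception(task_meta, exc) call and the TaskError | None return are taken from this PR's description; the dataclass fields mirror the log shape below, and everything else (names, types) is an assumption, not the actual protobuf message:

    from dataclasses import dataclass
    from typing import Protocol


    @dataclass
    class TaskError:
        # Hypothetical field names; the real proto message may differ.
        exception_type: str
        exception_message: str


    class ErrorHook(Protocol):
        def on_exception(self, task_meta: dict, exc: BaseException) -> TaskError | None:
            """Map a worker-side exception to an optional TaskError envelope.
            Returning None means no error is attached to ProcessingResult."""
            ...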

Broker

  • read request.error in set_task_status
  • log structured failure context after status persistence succeeds
  • log for:
    • Failure always
    • Retry only when an error envelope is present
  • keep old-worker compatibility when error is absent (the decision logic is sketched below)
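The broker itself is Rust (src/grpc/server.rs); purely to illustrate the decision above, here is the same logic restated as a minimal Python sketch. The function name and string statuses are assumptions; the real broker matches on the proto status enum:

    def should_log_failure(status: str, error) -> bool:
        # error is the optional TaskError envelope from the worker (or None).
        # Failure is always logged; Retry only when the worker attached an
        # error; other statuses (e.g. Complete) emit no failure log.
        if status == "Failure":
            return True
        if status == "Retry":
            return error is not None
        return False

Old workers never populate request.error, so they fall through the error is not None check and behave exactly as before.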

Why

This lets taskbroker logs reflect what the worker actually observed when a task failed, instead of forcing operators to dig through worker logs alone.

Logging behavior

Expected broker log shape:

    task reported failure task_id=... taskname=... namespace=... status=Failure attempts=... exception_type="..." exception_message="..."

Compatibility

  • old workers remain supported
  • requests without error still work
  • no schema changes to inflight storage
  • no DLQ format changes

Tests

  • verify Failure with an error logs structured context
  • verify Retry with an error logs structured context
  • verify Retry without an error does not emit the failure log
  • verify Complete does not emit the failure log
  • verify the old-worker path without an error still works (see the pytest-style sketch below)
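For instance, the Retry and Complete cases could look like this against the should_log_failure sketch above (pytest-style; the helper is hypothetical and would live broker-side in Rust in the real tests):

    # Assumes should_log_failure from the broker sketch above is in scope.
    def test_retry_with_error_logs_structured_context():
        assert should_log_failure("Retry", {"exception_type": "ValueError"})

    def test_retry_without_error_is_quiet():
        assert not should_log_failure("Retry", None)

    def test_complete_is_quiet():
        assert not should_log_failure("Complete", None)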

Follow-up

getsentry/sentry will consume the new error_hook surface and provide the Sentry-specific hook implementation.
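For illustration only, the downstream hook could take roughly this shape. Every name here beyond error_hook.on_exception is an assumption (sentry_sdk.capture_exception is the standard SDK call; TaskError reuses the client sketch above):

    import sentry_sdk


    class SentryErrorHook:
        # Hypothetical implementation: report the exception to Sentry first,
        # then hand the broker a TaskError built from the exception.
        def on_exception(self, task_meta, exc):
            sentry_sdk.capture_exception(exc)
            return TaskError(
                exception_type=type(exc).__name__,
                exception_message=str(exc),
            )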

@s-starostin requested a review from a team as a code owner April 25, 2026 17:52
    try:
        with timeout_alarm(inflight.activation.processing_deadline_duration, handle_alarm):
-           _execute_activation(task_func, inflight.activation, app.context_hooks)
+           _execute_activation(task_func, inflight.activation, app.context_hooks, app.error_hook)
Bug: The function _execute_activation is called with an extra argument, app.error_hook, which will cause a TypeError at runtime.
Severity: CRITICAL

Suggested Fix

Remove the fourth argument, app.error_hook, from the call to _execute_activation on line 242. The error hook is already correctly invoked in the surrounding exception handlers.
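In other words, the call site would revert to the three-argument form from before the diff:

    # Per the suggested fix: drop the extra argument; the hook is invoked
    # in the surrounding exception handlers instead.
    _execute_activation(task_func, inflight.activation, app.context_hooks)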

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: clients/python/src/taskbroker_client/worker/workerchild.py#L242

Potential issue: The function `_execute_activation` is called on line 242 with four
positional arguments: `task_func`, `inflight.activation`, `app.context_hooks`, and
`app.error_hook`. However, the function's definition only accepts a maximum of three
arguments. The fourth argument, `app.error_hook`, has no corresponding parameter. Since
the function does not use `*args` or `**kwargs` to accept arbitrary arguments, this
mismatch will cause a `TypeError` every time a task is processed, effectively preventing
any task execution.


Member
This is a valid bug.

Author
Good catch, thanks - this is a valid bug.

I had a mismatch between the _execute_activation(...) call site and the function signature in workerchild.py. I've just pushed the fix.


@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit ca68733.

Comment thread clients/python/src/taskbroker_client/worker/workerchild.py Outdated
Comment thread src/grpc/server.rs Outdated

        _execute_activation(task_func, inflight.activation, app.context_hooks, app.error_hook)
        next_state = TASK_ACTIVATION_STATUS_COMPLETE
    except ProcessingDeadlineExceeded as err:
        if app.error_hook is not None:
Member
Why does this happen first? Can we make it happen after all the other logging, so we can be sure that a bug in a hook doesn't break our reporting?

Author
Agreed.

I changed this so the existing reporting path runs first, and the hook is now invoked later via _get_task_error_from_hook(...) as best-effort only. A hook bug should no longer interfere with retry / logging / Sentry reporting; it only affects the optional TaskError payload.
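A minimal sketch of that best-effort pattern, assuming only the helper name from the comment above; the body and logging are illustrative:

    import logging

    logger = logging.getLogger(__name__)


    def _get_task_error_from_hook(error_hook, task_meta, exc):
        # Runs after the existing retry / logging / Sentry reporting paths.
        # A broken hook can only cost the optional TaskError payload.
        if error_hook is None:
            return None
        try:
            return error_hook.on_exception(task_meta, exc)
        except Exception:
            logger.warning("error_hook raised; dropping TaskError payload", exc_info=True)
            return None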

