Skip to content

fix: handle ValueError and RecursionError in _safe_json_serialize#5414

Open
voidborne-d wants to merge 2 commits intogoogle:mainfrom
voidborne-d:fix/safe-json-serialize-exception-handling
Open

fix: handle ValueError and RecursionError in _safe_json_serialize#5414
voidborne-d wants to merge 2 commits intogoogle:mainfrom
voidborne-d:fix/safe-json-serialize-exception-handling

Conversation

@voidborne-d
Copy link
Copy Markdown
Contributor

Summary

Both _safe_json_serialize implementations only catch TypeError and OverflowError from json.dumps, but json.dumps can also raise:

  • ValueError: circular references (e.g. obj.ref = obj)
  • RecursionError: deeply nested structures exceeding Python's recursion limit

Impact

telemetry/tracing.py (critical): _safe_json_serialize is called inside a finally block during trace_tool_call. An unhandled exception here can suppress the original tool execution result — telemetry failures should never affect runtime correctness.

models/lite_llm.py: An unhandled exception during message serialization causes hard LLM request construction failures where a graceful fallback is expected.

Root Cause

The except clause catches (TypeError, OverflowError) but not ValueError or RecursionError. Both are documented json.dumps failure modes:

  • ValueError for circular references (per Python docs)
  • RecursionError for structures exceeding sys.getrecursionlimit()

Note: in tracing.py, the default=lambda o: ... callback handles TypeError for individual non-serializable objects, but circular references and deep recursion bypass the default callback entirely.

Fix

  • Add ValueError and RecursionError to the exception tuple in both implementations.
  • In lite_llm.py, wrap the str() fallback in a secondary try/except RecursionError since str()__repr__() can also fail on deeply recursive objects.

Reproduction

class Node:
    def __init__(self):
        self.ref = self

obj = Node()
# Before fix: raises ValueError
# After fix: returns fallback string
_safe_json_serialize(obj)

Tests

8 regression tests added across both modules:

  • test_safe_json_serialize.py (telemetry): circular ref, deep nesting, normal dict, non-serializable object
  • test_litellm_safe_serialize.py (models): circular ref, deep nesting, normal dict, non-serializable object

All 8 tests pass.

Fixes #5411
Fixes #5412

@adk-bot adk-bot added models [Component] Issues related to model support tracing [Component] This issue is related to OpenTelemetry tracing labels Apr 20, 2026
@voidborne-d voidborne-d force-pushed the fix/safe-json-serialize-exception-handling branch from 17dc740 to 87499ee Compare April 20, 2026 22:06
@rohityan rohityan self-assigned this Apr 20, 2026
@rohityan rohityan added request clarification [Status] The maintainer need clarification or more information from the author and removed models [Component] Issues related to model support labels Apr 20, 2026
@rohityan
Copy link
Copy Markdown
Collaborator

Hi @voidborne-d , Thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix formatting errors by running autoformat.sh

@voidborne-d voidborne-d force-pushed the fix/safe-json-serialize-exception-handling branch from 87499ee to ddff034 Compare April 21, 2026 03:08
@voidborne-d
Copy link
Copy Markdown
Contributor Author

Hi @rohityan, thanks for the review! I've run autoformat.sh and pushed the formatting fix in ddff034. The changes were isort reordering + pyink reformatting across lite_llm.py and both test files. All 8 tests pass.

@voidborne-d voidborne-d force-pushed the fix/safe-json-serialize-exception-handling branch 2 times, most recently from 55145ae to 5b20085 Compare April 21, 2026 18:07
voidborne-d and others added 2 commits April 25, 2026 23:31
Both _safe_json_serialize implementations only catch TypeError and
OverflowError from json.dumps, but json.dumps can also raise:

- ValueError: circular references (e.g. object referencing itself)
- RecursionError: deeply nested structures exceeding recursion limit

In telemetry/tracing.py, this is especially critical because
_safe_json_serialize is called inside a finally block during
trace_tool_call — an unhandled exception here can suppress the
original tool execution result.

In models/lite_llm.py, an unhandled exception during message
serialization causes hard LLM request construction failures where
a graceful fallback is expected.

Fix:
- Add ValueError and RecursionError to the exception tuple in both
  implementations.
- In lite_llm.py, wrap the str() fallback in a secondary try/except
  RecursionError since str() can also fail on deeply recursive objects.

8 regression tests added across both modules.

Fixes google#5411
Fixes google#5412
@voidborne-d voidborne-d force-pushed the fix/safe-json-serialize-exception-handling branch from 67cee5d to c231cf8 Compare April 25, 2026 15:31
@voidborne-d
Copy link
Copy Markdown
Contributor Author

Rebased onto current main (c231cf8). Upstream now catches (TypeError, ValueError, OverflowError) in both call sites since this PR was opened, so the residual delta is two narrow additions:

  1. RecursionError in the exception tuple — both lite_llm.py:560 and tracing.py:131. Upstream's tuple doesn't include it; deeply self-referential objects (e.g. circular agent state) still crash _safe_json_serialize today.
  2. Nested str(obj) guard in lite_llm.py:562-567str() itself can hit RecursionError when an object has a recursive __repr__, so the fallback path needs its own try.

Tests in tests/unittests/{models,telemetry}/test_*_safe_serialize.py cover the RecursionError path specifically (which is what's not covered upstream). Repro and motivation in the original PR body still hold.

Mergeable now (was dirty pre-rebase). Happy to drop the nested str(obj) guard if it reads as over-engineered for the actual reproductions you're seeing — the tuple addition is the load-bearing piece.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

request clarification [Status] The maintainer need clarification or more information from the author tracing [Component] This issue is related to OpenTelemetry tracing

Projects

None yet

3 participants