Add support for 'orjson' and introduce a 'ResponseDecoder' API for deserialization#1385
Open
sirosen wants to merge 2 commits intoglobus:mainfrom
Open
Add support for 'orjson' and introduce a 'ResponseDecoder' API for deserialization#1385sirosen wants to merge 2 commits intoglobus:mainfrom
sirosen wants to merge 2 commits intoglobus:mainfrom
Conversation
cfb2376 to
66a8dde
Compare
In order to support `orjson` for serialization, add a new `OrjsonRequestEncoder`, which mirrors the `JSONRequestEncoder` type. The transport can select which encoder to use based on an init-time flag. This setting highlights the awkwardness of the class-level (read: global) mapping of strings to encoders. Therefore, it is now copied into an instance attribute, which is then post-hoc modified if `use_orjson=True`. On the decoding side, the story is more complex, as the objects doing decoding are various: responses, errors, and retry hooks. Only one of those three (retry hooks) operates at the abstraction layer of the transport. Therefore, the changes to support a defined decoder type, and provide it in these contexts, are as follows: - RetryContext objects now include the response decoder - response objects use their client object's decoder (specifically, `self.client.transport.decoder`) - API errors have their own decoder setting, which can be _injected_ via a contextvar API, but which is not publicly controllable via init -- this avoids compatibility issues around changing the init signature of our error types Selection of orjson is provided via an init arg to the transport and via an env var, `GLOBUS_SDK_USE_ORJSON=1`. If the setting is enabled but `orjson` is not installed, instantiating encoders and decoders (of the `Orjson*` types) will emit errors, to provide an early-error experience. Because API errors now need access to `globus_sdk.transport`, these import paths are more heavily implicated in other parts of the SDK. This breaks tests which help enforce deferred imports of `requests`, due to its very slow import time. To rectify, a number of `requests` imports, specifically, are now `TYPE_CHECKING` flagged and deferred. Tests are extended to cover `orjson` testing, including several new tests, an `orjson` dependency group, new frozen requirements, tox config, and CI config. Because responses read the client's decoder, a large number of test tweaks are needed to provide a "better" client mock which satisfies this requirement. In CI and the default tox env list, we only test `orjson` on a small selection of Python versions, to moderate the expansion of our test matrix.
1. Add a contextvar to the transport module, which tracks the "current or active transport" 2. Add a private helper for setting that value 3. Add a *public* method which gets that value; LookupError if none 4. Add a private method which gets the current decoder out of the current transport, with a "safe" failover -- this is purely an internal convenience This replaces GlobusAPIError decoder injection, responses picking the decoder off of their client object, and retry contexts carrying the decoder explicitly.
66a8dde to
9a0c311
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Goal
Why?
Provide support for
orjsonas an alternative way of encoding and decoding HTTP body data.This should be opt in because it could be breaking for someone, but in a future major version it should become the default.
It should be possible to set this passively to change "the whole SDK" via an env var, but for application authors, there should be explicit
True/Falseprogrammatic control somewhere.Extensibility (e.g., to other JSON implementations or other formats) is a non-goal, but the design should not be hostile to future needs.
What Changed
Add a new
OrjsonRequestEncoder, which mirrors theJSONRequestEncodertype.Add a new
decodersmodule, containingResponseDecoderandOrjsonResponseDecoder.Both provide a method,
get_body_json(response)for decoding a response body as JSON.The transport can select which encoder and decoder to use based on an init-time flag,
use_orjson=Trueor an env var,GLOBUS_SDK_USE_ORJSON=1.Setting these values when
orjsonis not installed raises aRuntimeErroron transport init.The objects doing decoding are various: responses, errors, and retry hooks.
Each of these must be customized in some way to do decoding with a configured/desired decoder.
To support this, the "currently active" transport is now available through a contextvar and a narrow getter interface,
RequestsTransport.get_current_transport().Tests are extended to cover
orjsontesting, including several new tests, anorjsondependency group, new frozen requirements, tox config, and CI config.Because responses read the client's decoder, a large number of test tweaks are needed to provide a "better" client mock which satisfies this requirement.
In CI and the default tox env list, we only test
orjsonon a small selection of Python versions, to moderate the expansion of our test matrix.Tradeoffs & Open Questions
There aren't perfect solutions in the current state of the SDK when accounting for backwards compatibility.
requests.Responsein public APIsAnywhere that the SDK is passing a
requests.Responsethrough a public interface1, we have lost control of that interface.We would now like to be passing a pair
(response, decoder), but changing these interfaces would be breaking.Making changes in a future major version to always pass around our own response objects is one option to solve this.
All three of the contexts in which the decoder is passed (retry, response, api errors) are tricky.
Only one of those three (retry hooks) operates at the abstraction layer of the transport, and even that has open questions.
In all of these cases, the "get current transport" contextvar is used to sidestep our inability to fully control these APIs as we would like.
(A previous draft updated the
RetryContext, so that one has a better solution than the others.)RequestsTransport.encodersCurrently, encoding behaviors are defined at the class level, via a mapping
RequestsTransport.encoders.Users can access, customize, and replace the JSON encoder today by reading or writing
RequestsTransport.encoders["json"].Unfortunately, a class var acts as both the class-level and instance-level control, since instances all reference that same dict.
Any instance-level customization will be paradigm breaking here, since class-level customization will stop being effective.
In reality, SDK users are probably not touching this at all.
GitHub code searches in-org show no results, and our own usages are some of the most elaborate.
Even my own work where I have enabled
orjsondid not use this path -- not based on foresight, but because it was simpler to encode things in custom client methods.I have chosen a moderate path, by which we
RequestsTransport.encoders(users are now being told, in the docs, not to modify or use it)encoder_map, an instance attributeorjsonuse is enabled,encoder_mapis modified during initjsonencoder (as someone might do today)This is actually mildly breaking because transports will no longer transparently pick up on modifications to the class var, even with
orjsondisabled.I think that tradeoff is worth it, but we could modify this to be even more considerate.
requestsimport deferral and import pathsBecause API errors now need access to
globus_sdk.transport, these import paths are more heavily implicated in other parts of the SDK.This breaks tests which help enforce deferred imports of
requests, due to its very slow import time.To rectify, a number of
requestsimports, specifically, are nowTYPE_CHECKINGflagged and deferred.In general, the whole SDK is trying to lazily load
requests, so this is all directionally aligned.It does, however, highlight the new dependency (
globus_sdk.exc->globus_sdk.transport).Our imports are getting a little bit tangled -- lots of parts of the SDK pull in
excforGlobusSDKUsageError, butexc.apineedstransport.We should think about rearranging to clean this up. Many of the deferred import arrangements could be seen as punting on this issue.
Diff Info
GitHub's diff looks massive because it's counting all those lines of lockfile/frozen-deps content. True diff is much more modest, but still nontrivial:
Changelog
Added
The SDK now supports use of
orjsonas an alternative JSON encoder and decoder.When
GLOBUS_SDK_USE_ORJSON=1is set, request sending and response decoding will useorjson.Use of
orjsonis optional, but if the variable is set andorjsonis not installed, errors will be emitted.The setting can also be configured on transport objects with the init option,
use_orjson=True.In a future major version of the SDK, use of
orjsonwill default to true when it is available.RequestsTransportobjects are now visible viaRequestsTransport.get_current_transport(), a staticmethod, while the transport is sending a request or being used to handle a response.This method raises a
LookupErrorif there is no currently active transport.Deprecated
RequestsTransportclass supports configuration of request encoding via a class-variable mapping,encoders. This limits the ability of the SDK to apply per-object customizations, as in the case oforjsonsupport. The class variableencodersis deprecated, and users should leverage the newencoder_mapinstance variable instead.Footnotes
Some of these interfaces were originally conceived as public and some were not. So that also adds complexity. ↩