503: concurrent consumer connections by PaulMartinsen · Pull Request #518 · IHE/DEV.SDPi

PaulMartinsen · 2026-03-21T23:22:59Z

📑 Description

Not all providers can support many concurrent consumer connections, while some consumers and/or providers may benefit from multiple concurrent connections to mitigate risks. This PR outlines an approach, leveraging the HTTP 429 status code, to support interoperability in both scenarios.

☑ Mandatory Tasks

The following aspects have been respected by the pull request assignee and at least one reviewer:

Changelog update (necessity checked and entry added or not added respectively)
Pull Request Assignee
Reviewer

…'t add any value. Tweaked wording of introduction to improve clarity and avoid "ensure". Removed repetition of motivation in R0201; it's already in the intro and makes more sense there. Clarified operation => SDC service operation (language from annex B of 20701). Added change log entry.

…umer-connections

alex-pe1 · 2026-05-26T12:40:01Z

+** <<term_transport_address>> and/or,
+** `[source endpoint]` message property (see <<ref_oasis_ws_addressing_2006>>) and/or,
+** client credentials used for TLS authentication.
+* Securing the connection may be the most expensive part of an SDC service operation, so a <<vol1_spec_sdpi_p_actor_somds_provider>> may 


It might be unreliable to identify individual consumers using these methods.

Yes. And annoying too since we run multiple instances of a test consumer on one computer for load testing.

Is there a better way?

alex-pe1 · 2026-05-26T12:44:33Z

+** client credentials used for TLS authentication.
+* Securing the connection may be the most expensive part of an SDC service operation, so a <<vol1_spec_sdpi_p_actor_somds_provider>> may 
+drop concurrent connections from a <<vol1_spec_sdpi_p_actor_somds_consumer>> that exceed its limits, or respond with 
+HTTP status code `429 (Too Many Requests)` (after securing the connection) to signal the <<vol1_spec_sdpi_p_actor_somds_consumer>> 


Returning an HTTP status code requires a successful TLS handshake. If the handshake itself is costly for a particular provider, this might not be a suitable approach.

This also gives 429 special semantics (so a provider cannot use it 'normally' in its HTTP server component). Using a custom code or a SOAP fault should be considered instead.

Yes, returning the status code may be too costly, but at least it gives the consumer something it can work with. I don't think we can specify a blanket approach here. A busy provider might drop connections without any response (this happens anyway on the accept socket when the backlog is full); a less busy one might complete the handshake and provide an HTTP response. I think a robust consumer would handle these faults in a way that might be application dependent (e.g., retry, user intervention, etc.).

Hmm. The semantics here seem consistent with the Too Many Requests status code to me. This chapter is really just describing some scenarios where the consumer might see a Too Many Requests response, with suggestions on how to deal with it.

There seem to be advantages to using standard codes where possible: the normal response to "too many requests" seems to fit what's outlined.

Do you have some scenarios where the consumer would respond differently to a fault that generates too many requests?

alex-pe1 · 2026-05-26T13:13:32Z

@@ -0,0 +1,137 @@
+[#vol2_ch_a_mdpws_concurrent_connections]


General questions:
Should a provider actively distribute connections between consumers? This would at least require some justification for why it is preferable to plain 'first come, first served'.

Should a provider signal limitations beforehand, so that a consumer could adapt its behaviour from the beginning? The approach here is that the consumer gets an error and only then can retry in another way. This also requires a rejection that can return a value; simply not accepting a connection does not fit this pattern.
Limits could be communicated with, e.g., Discovery scopes.

Limits can change over time (e.g. a provider temporarily uses a limited resource for non-SDC purposes). Does this have to be considered?

It might be good to distribute connections. We discussed a couple of options for this during PAT-22:

QoS packets: individual packets in the HTTP request can be tagged with a priority to facilitate routing. It doesn't seem to be a good option to me, but it is the one mentioned in §10.3.2 of 20701, so I suppose it has that going for it.

HTTP Priority headers from RFC 9218. This is a request-based approach that makes a lot of sense to me. A consumer could put a higher priority on remote operation requests than on renewing subscriptions, for example, which could allow the provider to make meaningful choices.

Providers could also make choices based on the SOAP operation. For example, a connection that was just renewing subscriptions could be closed aggressively because the TLS handshake time (probably) isn't that critical for a renewal. Or we know the typical start-up behaviour is WS-transfer/get, Subscribe, GetMdib, so a simple provider might detect that sequence and close when it's done. These all leverage behaviours well defined in the HTTP standards though.
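As a rough illustration of the RFC 9218 option, a consumer could attach a `Priority` header to each request. The sketch below only serialises the field value (`u` is urgency from 0 to 7, lower meaning more urgent, default 3; `i` marks an incremental response). The request kinds and the urgency assigned to each are hypothetical and are not defined by SDC, SDPi, or this PR.

```python
from typing import Dict

# Hypothetical mapping from request kind to RFC 9218 urgency.
# 0 is the most urgent, 7 the least; these values are illustrative assumptions.
URGENCY_BY_REQUEST_KIND: Dict[str, int] = {
    "remote_operation": 1,
    "get_mdib": 3,
    "renew_subscription": 5,
}


def priority_header(urgency: int = 3, incremental: bool = False) -> str:
    """Serialise an RFC 9218 Priority field value such as 'u=1' or 'u=5, i'."""
    if not 0 <= urgency <= 7:
        raise ValueError("RFC 9218 urgency must be an integer from 0 to 7")
    value = f"u={urgency}"
    if incremental:
        value += ", i"
    return value


def headers_for(request_kind: str) -> Dict[str, str]:
    """Build the extra HTTP headers for a request; unknown kinds use the default urgency."""
    urgency = URGENCY_BY_REQUEST_KIND.get(request_kind, 3)
    return {"Priority": priority_header(urgency)}
```

With this mapping, `headers_for("remote_operation")` yields `{"Priority": "u=1"}`, which a provider that honours the header could serve ahead of a subscription renewal sent with `u=5`.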

We discussed signalling limitations during PAT-22, in particular putting something in the ws-transfer/get response that could help the consumer plan its approach. E.g.:

I think the conclusion was that all consumers have to handle HTTP faults, like 429, anyway. So maybe HTTP status codes are sufficient, or at least a place to start, with some suggested approaches as in this PR.

I think you have a good use case in splitting connections for remote operations; it seems doable with just status codes but easier with a priori info. Perhaps that should be a separate issue?

Limits can certainly be dynamic and depend on the provider's load. I think this is part of the reason to try relying on HTTP status codes anyway. That is, even if a provider declares it can accept 3 concurrent connections from a single consumer, it might not be able to do that with 5 consumers all at the same time. So we ultimately always have to handle the HTTP status codes at some point.
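To make the "3 per consumer, 5 consumers" point concrete, here is a minimal provider-side sketch, assuming a provider that tracks open connections per consumer and in total. The class name, the limits, and the consumer identifiers are all hypothetical; a refusal here would map to either a dropped connection or a `429` response, as discussed above.

```python
from collections import Counter


class ConnectionLimiter:
    """Tracks open connections per consumer and overall (illustrative helper only)."""

    def __init__(self, per_consumer_limit: int, total_limit: int) -> None:
        self.per_consumer_limit = per_consumer_limit
        self.total_limit = total_limit
        self._open: Counter = Counter()

    def try_open(self, consumer_id: str) -> bool:
        """Return True if the connection is admitted, False if a limit is reached."""
        if sum(self._open.values()) >= self.total_limit:
            return False  # provider-wide capacity exhausted
        if self._open[consumer_id] >= self.per_consumer_limit:
            return False  # this consumer already holds its share
        self._open[consumer_id] += 1
        return True

    def close(self, consumer_id: str) -> None:
        """Release one connection previously admitted for this consumer."""
        if self._open[consumer_id] > 0:
            self._open[consumer_id] -= 1


# Five consumers each try to open three connections against a provider
# that advertises 3 per consumer but can only hold 10 in total.
limiter = ConnectionLimiter(per_consumer_limit=3, total_limit=10)
results = [limiter.try_open(f"consumer-{c}") for c in range(5) for _ in range(3)]
admitted = results.count(True)
refused = results.count(False)
```

In this run 10 of the 15 attempts are admitted and 5 are refused, even though no consumer exceeded the declared per-consumer limit, which is why a consumer still has to handle the fault path.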

Thought a bit more about all of this... 😀

Maybe the topic should be addressed first more fundamentally (before addressing capability and priority management, and mitigations). Participants have technical limitations themselves and may require certain capabilities from other participants. To have a connected system of medical devices the capabilities have to match.
This is probably most interesting for the customers that are building/integrating SDC systems.

From this perspective it would be interesting to:

classify/describe capability limits

How to document them in the IfU? (Participants should state their own overall limits and, e.g., consumers should state what share of capabilities they need from each provider.)

Make limits/capabilities public in network (Metadata, new discovery scopes, ...)?

Should devices with a certain MDS type have minimum limits? This would allow connection strategies for different device types so that, e.g., it is safe from the consumer's point of view to use one subscription for everything.

Should consumers of devices with a certain MDS type not assume capabilities above certain limits?

I would guess that consumers will generally be hesitant to change their connection behaviour based on errors or status codes returned from some invocation (because they do what they do because they deem it necessary, e.g. for risk reasons). On the other hand, I think this could be possible based on device type: remote controlling an SpO2 sensor has different risks and technical requirements than a ventilator. The device type is also already known from Discovery.

Thanks @alex-pe1; it's great to have another perspective on this!

I guess the first point is these things would be out of scope for this PR which probably needs to focus on what to do about concurrent connections that the provider can't satisfy. Which might still occur if the consumer ignores any additional information provided.

But the second point is I agree. Two capability use cases come to mind:

A consumer needs to know if it should send all requests over a single TCP connection or may open 2 or many connections to address risks from sending all communications over a single channel. The exact risks are a little vague to me. The only one I can think of is completing remote operations over a second channel, but I'm sure there are more.

How will a provider ensure that delivering history reports doesn't delay other reports, such as operation invoked responses?

@d-gregorczyk will probably argue this needs to be use case driven though & I don't have a lot of experience on the consumer side.

Do you have other use cases for your consumer where you are relying on features that aren't promised by the SDC standards? I guess it's hard to know until we hit issues like this?

How would you characterize the requirements around concurrent connections, what information do you need to select a strategy, and what are the consequences of different strategies? For example, would you still:

accept metrics from a device that can only do one concurrent connection,

set patient/ location contexts,

invoke a remote operation to set a metric and/ or operating mode,

invoke remote operations that require watchdog messages (e.g., keep the drill going),

participate in a distributed alarm system?

The last one is interesting because a provider might inspect the consumer's client certificate and allow multiple concurrent connections for DAS participants/operations. This might influence the information you need to select a connection strategy.

This might be something that could go into the external control pkp if the risks are mainly around remote operations.

Finally, I think connectivity information probably belongs in the ws-transfer/get response because it would more easily support extensions and nuance than including in discovery. Or is the ws-transfer/get too late?

Some other "limitations" that consumers might need to know about:

maximum number of pm:Validator, pm:Identification instances a provider will store (typically somewhat less than ∞ allowed by 11073).

maximum length of various pm:LocationDetail and pm:PatientDemographicsCoreData fields; last discussion left this to the provider's discretion, which might be fine.

maximum number of ProposedContextStates that can be included in a SetContextState operation.

maximum number of context states that a provider will store.

There are potentially unlimited containers everywhere in the MDIB. But at least the whole MDIB should fit into 4MB.

True. But providers have control over the size of most of them. It is the user supplied data that I worry about. Some enterprising consumer that wants to put a photo of the patient in a patient context extension.

The current thinking seems to be that providers can choose what they want to store, so that's okay for providers. But I was wondering more whether consumers need to know this?

alex-pe1 · 2026-05-26T13:20:16Z

+the risks of opening multiple, concurrent connections to a <<vol1_spec_sdpi_p_actor_somds_provider>>'s 
+hosted services, including the potential impact on both the <<vol1_spec_sdpi_p_actor_somds_provider>>'s 
+and the <<vol1_spec_sdpi_p_actor_somds_consumer>>'s ability to deliver their respective 
+<<acronym_sfc,system function contribution>>.


Generally, the behaviour of the consumer should not have an impact on the provider's functionality; only the consumer may find that it cannot do what it wants to do on the provider.
I think this is too general.

Any suggestions on how we could fix this? I see it as:

partly addressing TR1134 from 10700: "CLINICAL FUNCTIONs of an SDC BASE PARTICIPANT except for network-related functionality SHALL be designed in a way that they are not impaired by the IT NETWORK communication, including but not limited to..."

drawing attention to the issue from the consumer's side. The note offers one solution.

alex-pe1 · 2026-05-26T13:21:50Z

+====
+A consumer may choose to limit the number and/or frequency of concurrent connections it 
+opens to a <<vol1_spec_sdpi_p_actor_somds_provider>>'s hosted services when it receives a
+response with HTTP status code `429` (Too Many Requests). 


Again, it looks like 429 gets very special semantics here.

I'm unclear on that. It seems the normal response to a 429 code would be to make fewer requests or wait a bit and try again (particularly when the `Retry-After` header is present).

The HTTP standard does have a little bit to say on concurrency: https://www.rfc-editor.org/rfc/rfc9112.html#name-concurrency. It's pretty vague though.

And this seems to me to fit the official description of the 429 status code:
https://datatracker.ietf.org/doc/html/rfc6585#autoid-4
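For reference, the "wait a bit and try again" behaviour can be sketched as below. This is a minimal illustration, not something the PR prescribes: the function name, the `base` and `cap` defaults, and the fallback to exponential backoff when `Retry-After` is absent or unparseable are all assumptions. `Retry-After` may carry either a number of seconds or an HTTP-date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(retry_after: Optional[str], attempt: int,
                now: Optional[datetime] = None,
                base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    if retry_after is not None:
        text = retry_after.strip()
        if text.isdigit():
            # delay-seconds form, e.g. "Retry-After: 120"
            return float(text)
        try:
            # HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
            when = parsedate_to_datetime(text)
        except (TypeError, ValueError):
            when = None  # unparseable value: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())
    # No usable hint from the provider: capped exponential backoff.
    return min(cap, base * (2 ** attempt))
```

A real consumer would probably add random jitter to the backoff branch so that several consumers refused at the same moment do not all retry together; that is left out here to keep the sketch deterministic.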

I agree that it fits the 429 description, but how do we get from the general description to the recommendation here? I think the normal thing a client in the real world would do is wait some time and then try the same thing again. (And this measure is not recommended here, so is it unrecommended? 🙂)

Well, this is really intended as recommendations and guidance, not requirements. Consumers are free to keep trying over and over again but that might not get them anywhere if they didn't follow the suggestions.

Not sure that's helped with a custom status code either. Presumably still a 4xx grouping and a consumer that hadn't read the recommendations can't do much more with an unknown client response than try again achieving the same outcome as the 429 code with more work and less interoperability. :(

alex-pe1 · 2026-05-26T13:28:40Z

+* implement backoff and retry logic.
+
+A <<vol1_spec_sdpi_p_actor_somds_consumer>> receiving a `429` response should not interpret this as a operation failure, but rather as an indication that the provider is under load and needs the consumer to adjust its behaviour. 
+The consumer may use information in the `Retry-After` header, if present, to determine when to retry the request, but could also consider implementing an exponential backoff strategy to avoid overwhelming the provider with retries. 


These measures may not resolve the situation... So how helpful is this approach within a system of connected medical devices? Maybe this requires hospital IT to reconfigure something.

Yep. Failure is always an option.

I think the main utility is to bring to light that providers might not support concurrent connections from a single consumer and to offer some ideas on how consumers could deal with this. The alternative could be mandating at least X, or no more than 1, connections from any consumer. But it seems there would be more challenges with that approach (even ignoring resource constraints). And I'm not sure it makes sense to pull all participants down to the lowest common denominator, since there are good reasons for consumers to use multiple connections in many cases.

Co-authored-by: Alexander Pentzel <89145096+alex-pe1@users.noreply.github.com>

PaulMartinsen · 2026-05-27T04:23:40Z

Thanks for all your comments @alex-pe1. I think I responded to them all. I'd be happy to do a Teams video discussion if that would be helpful. 7pm here is 9pm in Germany right now so any time before your lunchtime could work for me.

I see this PR as fleshing out expectations for a fault condition, which might be a starting point for something more advanced. But even with more complex solutions we can still have faults.

First go at addressing concurrent connection issue.

dec2f6e

PaulMartinsen requested a review from d-gregorczyk March 21, 2026 23:22

PaulMartinsen self-assigned this Mar 21, 2026

PaulMartinsen linked an issue Mar 21, 2026 that may be closed by this pull request

Concurrent consumer connections #503

Open

1 task

github-project-automation Bot added this to Gemini SDPi Releases Mar 21, 2026

d-gregorczyk reviewed Mar 25, 2026

Comment thread asciidoc/volume2/concurrent-connections/tf2-ch-a-mdpws-concurrent-connections.adoc

PaulMartinsen mentioned this pull request Apr 2, 2026

Add details for identifying consumers by their certificates. #525

Open

PaulMartinsen added 2 commits April 2, 2026 20:58

Merge remote-tracking branch 'origin/master' into 503-concurrent-cons…

9e562e8

…umer-connections

PaulMartinsen requested a review from d-gregorczyk April 10, 2026 07:57

alex-pe1 reviewed May 22, 2026

Comment thread asciidoc/volume2/concurrent-connections/tf2-ch-a-mdpws-concurrent-connections.adoc Outdated

alex-pe1 reviewed May 26, 2026

PaulMartinsen and others added 2 commits May 27, 2026 14:42

fix participant typo

f2c2d48

Co-authored-by: Alexander Pentzel <89145096+alex-pe1@users.noreply.github.com>

Improved clarity of heading and added example to intro.

a6b5c15
