What is the problem this feature will solve?
TLSWrap in src/crypto/crypto_tls.cc deliberately defers write completion to a later event-loop turn in cases where the underlying stream finishes synchronously. This adds systematic latency and extra queue depth on high-throughput, full-duplex TLS connections.
Concrete cases in the current implementation:
- EncOut() after a synchronous underlying write: when underlying_stream()->Write() completes synchronously (the result is not flagged async), completion is simulated via env()->SetImmediate(...) before OnStreamAfterWrite() runs (~L725–733).
- Empty DoWrite() driving the stream: zero-length writes, used only to tick the stream machinery, also defer their completion via SetImmediate (~L1054–1058).
- in_dowrite_ + pending cleartext: when EncOut() runs inside DoWrite() and encrypted output is not yet flushed, InvokeQueued(0) is deferred to the next tick (~L689–701). The code comments note uncertainty about correctness vs. blocking data flow.
For tunnel/VPN/proxy workloads (and any latency-sensitive full-duplex TLS), each extra tick compounds:
- Delayed 'drain' / write() callbacks → send-side stalls
- Head-of-line blocking between read and write directions
- Higher event-loop utilization under sustained MTU-sized traffic
This cost is separate from OpenSSL's cryptographic overhead: it is a deferral that Node.js imposes on writes that have already completed synchronously at the TCP layer.
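To make the compounding effect concrete, here is a toy model, not the real TLSWrap code. It assumes a back-pressured sender that issues each write from the completion callback of the previous one. `ToyLoop` and `simulateWrites` are hypothetical names; the scheduler only mimics the rule that work queued with setImmediate during one turn runs in the next.

```javascript
'use strict';

// Toy scheduler (hypothetical): each drained batch models one event-loop turn.
class ToyLoop {
  constructor() {
    this.immediates = [];
    this.turns = 0;
  }

  setImmediate(fn) {
    this.immediates.push(fn);
  }

  // Run turns until no deferred work remains; work queued during a turn
  // only runs in the following turn.
  drain() {
    while (this.immediates.length > 0) {
      const batch = this.immediates;
      this.immediates = [];
      this.turns++;
      for (const fn of batch) fn();
    }
    return this.turns;
  }
}

// Models `count` (>= 1) chained writes. The underlying write is treated as
// finishing synchronously in both modes; only completion delivery differs.
function simulateWrites(count, deferCompletion) {
  const loop = new ToyLoop();
  let completed = 0;

  function write() {
    const done = () => {
      completed++;
      if (completed < count) write();
    };
    if (deferCompletion) loop.setImmediate(done);
    else done();
  }

  write();
  const turns = loop.drain();
  return { completed, turns };
}

console.log(simulateWrites(100, true));  // { completed: 100, turns: 100 }
console.log(simulateWrites(100, false)); // { completed: 100, turns: 0 }
```

In this model the deferred path costs one turn per write, while the inline path costs none. The inline path also recurses (write → done → write), which hints at why real inline completion would need a reentrancy or depth guard.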
What is the feature you are proposing to solve the problem?
Improve TLSWrap so write completion and queued callbacks run as soon as it is safe, without defaulting to SetImmediate on every sync underlying write.
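As a rough illustration of "as soon as it is safe", the gate could look something like the JavaScript sketch below. This is only a model of the decision, not a proposal for the actual C++ change: `makeCompleter`, `duringWrite`, and `complete` are hypothetical names, `lowLatency` is the placeholder option name from this proposal, and the `inWrite` flag merely stands in for a reentrancy indicator such as in_dowrite_.

```javascript
'use strict';

// Hypothetical gate: deliver a write completion inline only when the caller
// opted in and we are not already inside a write (reentrancy guard).
function makeCompleter(options = {}) {
  const lowLatency = options.lowLatency === true;
  let inWrite = false; // stand-in for a flag like in_dowrite_

  return {
    // Runs `body` with the reentrancy flag set, as a DoWrite()-style call would.
    duringWrite(body) {
      const previous = inWrite;
      inWrite = true;
      try {
        body();
      } finally {
        inWrite = previous;
      }
    },

    // Returns which path was taken so callers and tests can observe it.
    complete(done) {
      if (lowLatency && !inWrite) {
        done();
        return 'inline';
      }
      setImmediate(done); // existing behavior: defer to a later turn
      return 'deferred';
    },
  };
}

const completer = makeCompleter({ lowLatency: true });
console.log(completer.complete(() => {})); // inline
completer.duringWrite(() => {
  console.log(completer.complete(() => {})); // deferred
});
```

The default (no option) always takes the deferred path, which mirrors the intent that existing behavior stays unchanged unless the user opts in.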
Proposed changes:
- Audit and narrow SetImmediate usage in EncOut(), empty DoWrite(), and the in_dowrite_ branch — identify cases where OnStreamAfterWrite() / InvokeQueued() can run inline without violating existing TLS/stream invariants.
- Add an opt-in TLSSocket option, e.g. lowLatency: true (name bikesheddable), that enables synchronous completion on the paths the audit and the test suite show to be safe. Default behavior stays unchanged for backward compatibility until confidence is high.
- Strengthen TLSWrap::Cycle() — when both directions have work pending (ClearIn → ClearOut → EncOut), drain in one pass where cycle_depth_ allows, reducing reliance on a subsequent tick to flush state left in pending_cleartext_input_ or enc_out_.
- Add regression tests — bidirectional pummel/benchmark comparing event-loop turns per megabyte transferred with and without the fast path; ensure existing test-tls-* write/drain/destroy cases still pass.
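For the benchmark metric, one possible way to approximate "event-loop turns per megabyte" from JavaScript is sketched below. It assumes that a self-rescheduling setImmediate callback fires about once per loop iteration and that a megabyte means 1024 * 1024 bytes. `createTurnCounter` and `turnsPerMegabyte` are hypothetical helper names, not existing Node.js APIs.

```javascript
'use strict';

// Hypothetical helper: counts loop iterations by rescheduling itself with
// setImmediate, which runs at most once per event-loop iteration.
function createTurnCounter() {
  let turns = 0;
  let handle = null;

  const tick = () => {
    turns++;
    handle = setImmediate(tick);
  };

  return {
    start() {
      if (handle === null) handle = setImmediate(tick);
    },
    // Stops counting and returns the number of turns observed so far.
    stop() {
      if (handle !== null) {
        clearImmediate(handle);
        handle = null;
      }
      return turns;
    },
  };
}

// Normalizes a turn count by the amount of data transferred (MiB assumed).
function turnsPerMegabyte(turns, bytes) {
  if (!(bytes > 0)) throw new RangeError('bytes must be a positive number');
  return turns / (bytes / (1024 * 1024));
}

console.log(turnsPerMegabyte(2048, 2 * 1024 * 1024)); // 1024
```

A benchmark would call start() before the transfer, stop() once both directions have finished, and compare the normalized value with and without the fast path. Note that the counter keeps the loop spinning while active, so it is better suited to relative comparisons than to absolute latency figures.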
Success criteria:
- Measurably fewer SetImmediate / next-tick deferrals per SSL_write → TCP flush cycle under full-duplex load.
- No regressions in write ordering, destroy() during write, or double-callback cases covered by existing parallel TLS tests.
What alternatives have you considered?
- Do nothing; require native TLS bridges for tunnel workloads — Valid for extreme cases (see separate tls.createBridge proposal), but this latency tax affects all TLSSocket users doing full-duplex I/O, not only TUN/VPN apps.
- Userland workarounds (setImmediate batching, socket.cork) — Cannot bypass TLSWrap internals; users cannot force synchronous WriteWrap::Done() from JavaScript.
- Disable Nagle / tune TCP only — Reduces kernel buffering but does not address Node deferring completion after sync writes.
- Always run WriteWrap::Done() synchronously — Likely breaks edge cases the SetImmediate workaround was added for; needs careful gating behind an option or proven safe paths only.
- Move TLS to worker threads — Larger architectural change; orthogonal to fixing unnecessary main-thread deferral in the existing TLSWrap state machine.