Skip to content

feat: vendor russh-sftp with serde_bytes perf fix#188

Merged
inureyes merged 2 commits intolablup:mainfrom
Yaminyam:feat/vendor-russh-sftp
Apr 29, 2026
Merged

feat: vendor russh-sftp with serde_bytes perf fix#188
inureyes merged 2 commits intolablup:mainfrom
Yaminyam:feat/vendor-russh-sftp

Conversation

@Yaminyam
Copy link
Copy Markdown
Member

@Yaminyam Yaminyam commented Apr 21, 2026

Summary

  • Vendor russh-sftp into crates/bssh-russh-sftp following the same pattern as the existing crates/bssh-russh (sync-upstream.sh, create-patch.sh, and patches/).
  • Apply serde_bytes to protocol::Write::data and protocol::Data::data so SFTP binary payloads are serialized/deserialized through the bulk byte-buffer path instead of serde's byte-by-byte VecVisitor path.
  • Switch the top-level russh-sftp dependency to the vendored package. Existing use russh_sftp::... imports continue to work through Cargo's package = "bssh-russh-sftp" alias.

Motivation

perf record on the server side of a 1 GiB SFTP upload against unmodified bssh-server shows:

42.11%  serde_core::de::impls::<impl Deserialize for Vec<T>>::deserialize::VecVisitor::visit_seq
 3.59%  aws_lc_0_39_1_CRYPTO_poly1305_update
 3.14%  aws_lc_0_39_1_ChaCha20_ctr32_avx2

Crypto is not the bottleneck. The Write and Data SFTP packets carry data: Vec<u8>, and the default derived serde path reads those payloads one byte at a time. For large uploads that means the server spends substantial CPU in generic serde dispatch rather than packet I/O.

The vendored crate keeps the SFTP wire format unchanged: u32 length + bytes. The change only routes byte vectors through serde's bytes API so the existing packet framing can be handled in bulk.

Review Follow-up

After reviewing this PR for correctness, security, and performance, the follow-up commit adds these hardening changes:

  • Implement deserialize_byte_buf directly with visit_byte_buf(self.input.try_get_bytes()?), avoiding an extra visit_bytes copy on the optimized path.
  • Reject oversized byte buffers and packet lengths with checked u32 conversions instead of truncating usize with as u32.
  • Add wire-format round-trip tests for SSH_FXP_WRITE and SSH_FXP_DATA to verify the serde_bytes path still emits u32 length + bytes payloads.
  • Keep the vendored patch file in sync with the actual local modifications and add a README for the new crate.
  • Narrow the lockfile delta to the intended dependency replacement: russh-sftp -> bssh-russh-sftp plus serde_bytes.
  • Fix an unrelated clippy failure in channel_manager.rs by replacing redundant match guards with field patterns.
  • Make macOS Keychain-backed tests skip cleanly when the local Keychain is locked or user authorization is unavailable, while still validating Keychain behavior when access is available.

Measured Impact

1 GiB SFTP upload, OpenSSH client -> bssh-server on a CPU-bound host (Xeon Silver 4214, 5 runs each):

Binary Avg throughput
upstream russh-sftp 2.1.1 74.8 MiB/s
bssh-russh-sftp (this PR) 96.4 MiB/s

OpenSSH sftp-server on the same host measures about 101 MiB/s, so the gap shrinks from roughly 26% to roughly 5%.

On a host that is already near network-bound (AMD EPYC 7742, 1 Gbps internal), both unpatched and patched runs sit around 95 MiB/s. The fix is expectedly invisible when CPU is not the limiter.

Why Vendor, Not a Git Dependency

Upstream russh-sftp has had no commits since the v2.1.1 bump on 2025-04-18. Two related performance issues are unresolved or unresolved in root cause: #55, #70. Mirroring the bssh-russh vendoring pattern keeps the forks consistent and lets bssh ship the fix independently.

If upstream merges an equivalent fix, this crate can be deleted in one follow-up change.

Test Plan

  • cargo test -p bssh-russh-sftp --locked
  • cargo test --lib --locked
  • cargo test --tests --locked -- --skip integration_test
  • cargo fmt --check
  • cargo clippy --workspace --locked -- -D warnings
  • End-to-end: sftp put 1 GiB file against patched bssh-server on CPU-bound and network-bound hosts; wire format compatible with stock OpenSSH client

Add `crates/bssh-russh-sftp`, a temporary fork of upstream `russh-sftp`
following the same pattern as the existing `crates/bssh-russh`.

The only functional change versus upstream v2.1.1 is a
`#[serde(with = "serde_bytes")]` annotation on `protocol::Write::data`
and `protocol::Data::data`, plus a wire-compatible `serialize_bytes`
implementation in `ser.rs`. Without it, `#[derive(Deserialize)]` for
`Vec<u8>` dispatches to `deserialize_seq` and parses the SFTP payload
one byte at a time — `perf` shows ~42% of server CPU in
`VecVisitor::visit_seq` during 1 GiB uploads. The annotation routes
through the existing bulk `try_get_bytes` path in the crate's own
Deserializer, which is already implemented.

Measured impact on a CPU-bound host (Xeon Silver 4214) with an OpenSSH
client performing a 1 GiB SFTP upload:

  - upstream russh-sftp 2.1.1: 74.8 MiB/s
  - this fork:                 96.4 MiB/s  (+29%)

OpenSSH `sftp-server` on the same host measures ~101 MiB/s, so the gap
narrows from ~26% to ~5%.

Upstream russh-sftp has had no commits since its v2.1.1 bump
(2025-04-18) and two download-perf issues (lablup#55 closed without root
cause, lablup#70 open) sit unanswered, so the fix lives here until upstream
activity resumes. `sync-upstream.sh` and `create-patch.sh` mirror the
`bssh-russh` tooling so future syncs are mechanical.

The top-level dependency is switched from
`russh-sftp = "2.1.1"` to
`russh-sftp = { package = "bssh-russh-sftp", version = "2.1.1", path = "crates/bssh-russh-sftp" }`
so every `use russh_sftp::...` import in bssh continues to work
unchanged.
@Yaminyam Yaminyam force-pushed the feat/vendor-russh-sftp branch from f054871 to 7d7dcbf Compare April 29, 2026 07:24
@inureyes inureyes self-requested a review April 29, 2026 10:07
@inureyes inureyes merged commit dd6e7e3 into lablup:main Apr 29, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants