Implement VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET & starting with one virtio-net queue // main#2
Draft
mikroskeem wants to merge 10 commits into
Draft
Implement VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET & starting with one virtio-net queue // main#2mikroskeem wants to merge 10 commits into
mikroskeem wants to merge 10 commits into
Conversation
Wraps the IFF_ATTACH_QUEUE / IFF_DETACH_QUEUE TUNSETQUEUE ioctl. Detaching a tap fd from a multi-queue tun removes it from tun_select_queue()'s steering set, so inbound packets stop being delivered to that fd. Needed to honour VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET and to disable queue pairs the guest never activated; both are no-ops on the kernel side until this ioctl is invoked. Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Previously this command was acknowledged with VIRTIO_NET_OK but had no effect: the kernel's multi-queue tun kept steering RX across every bound tap fd, leaving packets queued where no userspace worker was reading. Per virtio 1.0+ §5.1.6.5.5 the device must act on the requested pair count, not just acknowledge it. Wire the handler through Tap::set_queue() so the kernel attachment matches the requested pair count. State lives on Net as a shared Arc<AtomicU16> so the tracker survives reset/re-activate cycles and never re-issues TUNSETQUEUE on a queue already in the target state (which the kernel would reject with EINVAL). The tracker is updated incrementally inside the apply loop so a partial failure leaves it in sync with actual kernel state. The handler also requires VIRTIO_NET_F_MQ to have been negotiated and rejects requests above the device's tap count. vhost-user owns no local taps; the backend handles queue selection via its own protocol, so we pass no tracker and the command is acknowledged without local effect. Forwarding to the vhost-user backend is left as a follow-up. Fixes: cloud-hypervisor#8306 Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Per virtio 1.0+ §5.1.6.5.5 ("Device operation in multiqueue mode"),
multiqueue is disabled by default: after reset/initialization the
device must only queue packets on receiveq1, and the driver enables
additional pairs by sending VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET. We were
instead activating every configured queue pair at device activation.
Drive the kernel-side multi-queue attachment down to one pair on each
device activation; the guest grows it back via the control queue
after negotiating VIRTIO_NET_F_MQ.
Without this, the kernel's tun_select_queue() steers RX across every
bound tap fd at all times, even when the guest never activated those
pairs -- so the share hashed to queues with no userspace worker is
silently dropped at the tap. The same trap affects single-queue
drivers that never negotiate F_MQ, plus the post-reset path where a
previous driver may have grown the count above what the new driver
will use.
Reuses align_kernel_queue_pairs() so activate and the control queue
share the delta-and-tracker logic; this also keeps the in-memory
tracker in sync with kernel state across activate/reset/re-activate
cycles in a single VM lifetime.
Fixes: cloud-hypervisor#8307
Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Abstract "apply VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET to the underlying transport" behind an MqBackend trait so the same CtrlQueue handler covers both kernel taps (TUNSETQUEUE) and vhost-user backends (VHOST_USER_SET_VRING_ENABLE). Without this, vhost-user devices acknowledged the command with no local effect -- a spec violation per virtio 1.0+ §5.1.6.5.5 -- and the backend never saw the guest's requested pair count. net_util grows TapMqBackend wrapping the existing tap path, plus a small MqBackendError enum so backends outside this crate (vhost-user) can plug in without dragging dependencies into net_util. virtio-devices/vhost_user adds VhostUserMqBackend, which loops over the device's data vrings and toggles each via the now-public VhostUserHandle::set_vring_enable. The control vring (when present) sits past the data range and is not touched. A CtrlQueue constructed with no backend now rejects MQ_VQ_PAIRS_SET with VIRTIO_NET_ERR instead of silently acking; both real call sites provide a backend, so this only affects degenerate/test configs. Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Per virtio 1.0+ §5.1.6.5.5 ("Device operation in multiqueue mode"),
multiqueue is disabled by default: after reset/initialization the
device must only queue packets on receiveq1/transmitq1, and the driver
enables additional pairs by sending VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET.
vu_common.activate() enables every configured data vring on the
backend unconditionally, leaving the backend in a multiqueue-active
state the spec forbids. Mirror the local-tap behavior added in the
preceding net.rs change: right after vu_common.activate(), drive
the backend down to one active pair via VhostUserMqBackend, which
issues VHOST_USER_SET_VRING_ENABLE per data vring. The control vring
(when present) sits past the data range and is left enabled. The
guest grows the active count back through the now-wired ctrl-queue
handler after negotiating VIRTIO_NET_F_MQ.
Signed-off-by: Mark Vainomaa <mark@zentria.ee>
The vhost-user net control-queue thread now forwards VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET to the backend by issuing VHOST_USER_SET_VRING_ENABLE on the existing vhost-user UNIX socket (and reading the ack back when REPLY_ACK is negotiated). The thread's seccomp filter previously had no per-thread rules, so the first such forward would trip the filter and tear the device down. Allow sendmsg/recvmsg on this thread, matching the rules already granted to the analogous vhost-user data thread and to the Vmm thread (which handles activate-time vring toggling). Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Cover the three behaviors that the four-rev stack relies on but had
no unit coverage:
- CtrlQueue.apply_active_queue_pairs forwards the requested count
to the configured MqBackend
- backend failures surface as MqBackendError back to the caller
(which the MQ handler translates to VIRTIO_NET_ERR)
- constructing a CtrlQueue with no backend rejects the command
rather than silently acking, so misconfigured callers fail loudly
A tiny in-module MockBackend records the last requested pair count
and optionally returns an error.
Signed-off-by: Mark Vainomaa <mark@zentria.ee>
The activate-time alignment landed earlier always drove the active pair count to one, but that's only spec-correct after a virtio reset. Snapshot restore is not a reset: the guest resumes with the same driver state it had pre-snapshot and expects its previously negotiated pair count to still apply. Forcing the count back to one would silently break multi-queue guests after migration/restore. Save the live count in NetState (gated behind serde(default) so old snapshots still deserialize) and stash it as restore_active_pairs on the Net struct when reconstructing from state. The first activate after restore drives the kernel to that count instead of the cold- boot default of one; .take() ensures subsequent activate cycles in the restored VM (driven by guest resets) fall back to the default, since by then the guest goes through the normal F_MQ negotiation again. Vhost-user snapshot/restore would need a similar fix at the backend boundary, but that requires backend-side cooperation that is out of scope here. Signed-off-by: Mark Vainomaa <mark@zentria.ee>
The preceding snapshot/restore change defaulted unwrap_or(1) when NetState::active_pairs was absent, which conflates two distinct states: "cold boot" (use spec default of 1) and "restoring from a snapshot that pre-dates the field" (the previous CH version left every tap attached, so falling back to 1 silently drops multi-queue pairs across the upgrade). Replace the field with an ActivationTarget enum so the three cases are explicit: - ColdBoot -> 1 active pair (spec) - LegacyRestore -> taps.len() (mimic pre-Rev-G CH behavior) - RestoreWith(n) -> n (new snapshots with recorded count) The target is consumed on the first activate and reset to ColdBoot, so subsequent reset/re-activate cycles inside the restored VM go through the spec-correct single-pair path regardless of where the restore started. Signed-off-by: Mark Vainomaa <mark@zentria.ee>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.