Skip to content

Implement VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET & starting with one virtio-net queue // main#2

Draft
mikroskeem wants to merge 10 commits into
mainfrom
push-puvspzxnyloy
Draft

Implement VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET & starting with one virtio-net queue // main#2
mikroskeem wants to merge 10 commits into
mainfrom
push-puvspzxnyloy

Conversation

@mikroskeem
Copy link
Copy Markdown
Member

No description provided.

mikroskeem added 10 commits May 31, 2026 21:33
Wraps the IFF_ATTACH_QUEUE / IFF_DETACH_QUEUE TUNSETQUEUE ioctl. Detaching a
tap fd from a multi-queue tun removes it from tun_select_queue()'s steering
set, so inbound packets stop being delivered to that fd. Needed to honour
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET and to disable queue pairs the guest never
activated; both are no-ops on the kernel side until this ioctl is invoked.

Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Previously this command was acknowledged with VIRTIO_NET_OK but had
no effect: the kernel's multi-queue tun kept steering RX across every
bound tap fd, leaving packets queued where no userspace worker was
reading. Per virtio 1.0+ §5.1.6.5.5 the device must act on the
requested pair count, not just acknowledge it.

Wire the handler through Tap::set_queue() so the kernel attachment
matches the requested pair count. State lives on Net as a shared
Arc<AtomicU16> so the tracker survives reset/re-activate cycles and
never re-issues TUNSETQUEUE on a queue already in the target state
(which the kernel would reject with EINVAL). The tracker is updated
incrementally inside the apply loop so a partial failure leaves it
in sync with actual kernel state.

The handler also requires VIRTIO_NET_F_MQ to have been negotiated
and rejects requests above the device's tap count.

vhost-user owns no local taps; the backend handles queue selection
via its own protocol, so we pass no tracker and the command is
acknowledged without local effect. Forwarding to the vhost-user
backend is left as a follow-up.

Fixes: cloud-hypervisor#8306
Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Per virtio 1.0+ §5.1.6.5.5 ("Device operation in multiqueue mode"),
multiqueue is disabled by default: after reset/initialization the
device must only queue packets on receiveq1, and the driver enables
additional pairs by sending VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET. We were
instead activating every configured queue pair at device activation.

Drive the kernel-side multi-queue attachment down to one pair on each
device activation; the guest grows it back via the control queue
after negotiating VIRTIO_NET_F_MQ.

Without this, the kernel's tun_select_queue() steers RX across every
bound tap fd at all times, even when the guest never activated those
pairs -- so the share hashed to queues with no userspace worker is
silently dropped at the tap. The same trap affects single-queue
drivers that never negotiate F_MQ, plus the post-reset path where a
previous driver may have grown the count above what the new driver
will use.

Reuses align_kernel_queue_pairs() so activate and the control queue
share the delta-and-tracker logic; this also keeps the in-memory
tracker in sync with kernel state across activate/reset/re-activate
cycles in a single VM lifetime.

Fixes: cloud-hypervisor#8307
Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Abstract "apply VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET to the underlying
transport" behind an MqBackend trait so the same CtrlQueue handler
covers both kernel taps (TUNSETQUEUE) and vhost-user backends
(VHOST_USER_SET_VRING_ENABLE). Without this, vhost-user devices
acknowledged the command with no local effect -- a spec violation
per virtio 1.0+ §5.1.6.5.5 -- and the backend never saw the
guest's requested pair count.

net_util grows TapMqBackend wrapping the existing tap path, plus a
small MqBackendError enum so backends outside this crate (vhost-user)
can plug in without dragging dependencies into net_util.

virtio-devices/vhost_user adds VhostUserMqBackend, which loops over
the device's data vrings and toggles each via the now-public
VhostUserHandle::set_vring_enable. The control vring (when present)
sits past the data range and is not touched.

A CtrlQueue constructed with no backend now rejects MQ_VQ_PAIRS_SET
with VIRTIO_NET_ERR instead of silently acking; both real call sites
provide a backend, so this only affects degenerate/test configs.

Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Per virtio 1.0+ §5.1.6.5.5 ("Device operation in multiqueue mode"),
multiqueue is disabled by default: after reset/initialization the
device must only queue packets on receiveq1/transmitq1, and the driver
enables additional pairs by sending VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET.

vu_common.activate() enables every configured data vring on the
backend unconditionally, leaving the backend in a multiqueue-active
state the spec forbids. Mirror the local-tap behavior added in the
preceding net.rs change: right after vu_common.activate(), drive
the backend down to one active pair via VhostUserMqBackend, which
issues VHOST_USER_SET_VRING_ENABLE per data vring. The control vring
(when present) sits past the data range and is left enabled. The
guest grows the active count back through the now-wired ctrl-queue
handler after negotiating VIRTIO_NET_F_MQ.

Signed-off-by: Mark Vainomaa <mark@zentria.ee>
The vhost-user net control-queue thread now forwards
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET to the backend by issuing
VHOST_USER_SET_VRING_ENABLE on the existing vhost-user UNIX socket
(and reading the ack back when REPLY_ACK is negotiated). The thread's
seccomp filter previously had no per-thread rules, so the first such
forward would trip the filter and tear the device down.

Allow sendmsg/recvmsg on this thread, matching the rules already
granted to the analogous vhost-user data thread and to the Vmm
thread (which handles activate-time vring toggling).

Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Cover the three behaviors that the four-rev stack relies on but had
no unit coverage:

  - CtrlQueue.apply_active_queue_pairs forwards the requested count
    to the configured MqBackend
  - backend failures surface as MqBackendError back to the caller
    (which the MQ handler translates to VIRTIO_NET_ERR)
  - constructing a CtrlQueue with no backend rejects the command
    rather than silently acking, so misconfigured callers fail loudly

A tiny in-module MockBackend records the last requested pair count
and optionally returns an error.

Signed-off-by: Mark Vainomaa <mark@zentria.ee>
The activate-time alignment landed earlier always drove the active
pair count to one, but that's only spec-correct after a virtio reset.
Snapshot restore is not a reset: the guest resumes with the same
driver state it had pre-snapshot and expects its previously
negotiated pair count to still apply. Forcing the count back to one
would silently break multi-queue guests after migration/restore.

Save the live count in NetState (gated behind serde(default) so old
snapshots still deserialize) and stash it as restore_active_pairs on
the Net struct when reconstructing from state. The first activate
after restore drives the kernel to that count instead of the cold-
boot default of one; .take() ensures subsequent activate cycles in
the restored VM (driven by guest resets) fall back to the default,
since by then the guest goes through the normal F_MQ negotiation
again.

Vhost-user snapshot/restore would need a similar fix at the backend
boundary, but that requires backend-side cooperation that is out of
scope here.

Signed-off-by: Mark Vainomaa <mark@zentria.ee>
The preceding snapshot/restore change defaulted unwrap_or(1) when
NetState::active_pairs was absent, which conflates two distinct
states: "cold boot" (use spec default of 1) and "restoring from a
snapshot that pre-dates the field" (the previous CH version left
every tap attached, so falling back to 1 silently drops multi-queue
pairs across the upgrade).

Replace the field with an ActivationTarget enum so the three cases
are explicit:

  - ColdBoot       -> 1 active pair (spec)
  - LegacyRestore  -> taps.len() (mimic pre-Rev-G CH behavior)
  - RestoreWith(n) -> n (new snapshots with recorded count)

The target is consumed on the first activate and reset to ColdBoot,
so subsequent reset/re-activate cycles inside the restored VM go
through the spec-correct single-pair path regardless of where the
restore started.

Signed-off-by: Mark Vainomaa <mark@zentria.ee>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant