From a97dd2c4399c5afc56ba85ca558b7b654f21ee54 Mon Sep 17 00:00:00 2001
From: Joseph <162703152+josephnef@users.noreply.github.com>
Date: Fri, 26 Jun 2026 13:36:36 +0300
Subject: [PATCH] bench: per-chip TX throughput/latency harness; replace README "TX + RX" with numbers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds tests/bench_tput.py, a per-chip TX throughput + per-frame latency
benchmark (devourer vs host kernel driver) across bands
(2.4/UNII-1/UNII-2-3) and PSDU sizes (1500 / 3994 B). TX rate is measured
from usbmon bulk-OUT completions at the source chip (the true
frames-accepted rate; counting at a sniffer measures the sniffer's RX
ceiling instead, which is a trap). Reuses regress.py for DUT discovery,
kernel bind/unbind, USB power-cycle, process hygiene and log parsers.

Driver/injector support:
- txdemo: DEVOURER_TX_PAYLOAD_BYTES=N pads the 802.11 PSDU to N bytes
  (on-wire N+40; PKT_SIZE is 16-bit) so we can TX 1500/3994 B frames.
- inject_beacon.py: --size N (matching sized PSDU) and --max-rate
  (blocking AF_PACKET blaster ~= the kernel TX-completion rate).

README "Hardware landscape": the generic "TX + RX" band cells are replaced
with measured devourer TX throughput (Mbps @ 1500 / 3994 B), plus a
Measured throughput subsection with the kernel-driver comparison and
latency.

Headline results (HT MCS7, 20 MHz, monitor injection): devourer
direct-USB TX is 8-60x faster than kernel AF_PACKET monitor injection
(e.g. 8812 2.4 GHz: 46 vs 0.9 Mbps); the kernel monitor path cannot inject
3994 B frames at all (AF_PACKET > MTU) while devourer reaches up to
62 Mbps; throughput scales with frame size; devourer per-frame latency is
17-128 us; the 8814 TX path is the family's least reliable (high variance,
flagged).

RX is not tabulated: it cannot be measured cleanly on a 2-USB-bus rig
(same-bus contention, flooder saturation, flaky 8814 RX); methodology +
caveats in tests/README.md.

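For readers unfamiliar with the metric, here is a minimal sketch of the byte-sum idea behind the TX rate. It assumes the usbmon text ("u") format, where each line carries a URB tag, a timestamp, an event type, an address such as `Bo:<bus>:<dev>:<ep>`, a status and a length. The helper name `bulk_out_mbps` and the sample lines are hypothetical and only illustrate the computation; the harness's own parser is `UsbmonReader.parse_tx` in tests/bench_tput.py.

```python
def bulk_out_mbps(lines, bus, dev, window_s):
    """Sum bulk-OUT completion lengths for one device and convert to Mbps.

    Hypothetical helper for illustration only. Each usbmon text line is
    assumed to look like:
        <urb_tag> <timestamp_us> <event> <type:bus:dev:ep> <status> <length> ...
    """
    prefix = f"Bo:{bus}:{dev:03d}:"
    total_bytes = 0
    for line in lines:
        parts = line.split()
        # Keep only completion ("C") events for this device's bulk-OUT pipe.
        if len(parts) < 6 or parts[2] != "C" or not parts[3].startswith(prefix):
            continue
        total_bytes += int(parts[5])
    return total_bytes * 8 / window_s / 1e6


# Made-up capture: one submission (ignored) and two completed 1540-byte
# transfers, i.e. a 1500 B PSDU plus the 40 B TX descriptor each.
sample = [
    "ffff8881 1000 S Bo:1:004:2 -115 1540 = 00000000",
    "ffff8881 1100 C Bo:1:004:2 0 1540 >",
    "ffff8882 1200 C Bo:1:004:2 0 1540 >",
]
print(bulk_out_mbps(sample, bus=1, dev=4, window_s=0.001))
```

Summing completion lengths rather than counting frames keeps the figure meaningful when URB sizes are mixed, which is the reason the harness drives Mbps from the byte sum.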
Co-Authored-By: Claude Opus 4.8 (1M context) --- README.md | 55 +++- tests/README.md | 32 +++ tests/bench_tput.py | 555 +++++++++++++++++++++++++++++++++++++++++ tests/inject_beacon.py | 54 +++- txdemo/main.cpp | 20 ++ 5 files changed, 699 insertions(+), 17 deletions(-) create mode 100644 tests/bench_tput.py diff --git a/README.md b/README.md index 4c0cb63..1adad01 100644 --- a/README.md +++ b/README.md @@ -20,12 +20,17 @@ register-table layout, firmware-download plumbing, and family; chip-specific EEPROM handling, firmware blobs, and RF tables are layered on top. -| Part | RF / streams | 2.4 GHz | 5 GHz UNII-1 (ch36-48) | 5 GHz UNII-2/3 (ch52+) | Notes | -| -------------- | --------------- | ------------- | ---------------------- | ---------------------- | ------------------------------------------- | -| **RTL8812AU** | 2T2R | TX + RX | TX + RX | TX + RX | VID/PID `0bda:8812`; reference part | -| **RTL8811AU** | 1T1R | TX + RX | TX + RX | TX + RX | 1T1R cut of 8812 silicon; rides 8812 code path with `RFType=RF_TYPE_1T1R` selected from `REG_SYS_CFG` bit 27. Status mirrored from 8812 — not separately exercised | -| **RTL8814AU** | 4T4R, 3-SS max | TX + RX | TX + RX | TX + RX | VID/PID `0bda:8813`; 2-SS effective on USB-2 | -| **RTL8821AU** | 1T1R AC + BT | TX + RX | TX + RX | TX + RX | OEM-rebadged as TP-Link Archer T2U Plus (`2357:0120`) etc. UNII-2/3 TX has cross-receiver asymmetry against 8812AU peers | +Band cells below show **measured devourer TX throughput, Mbps @ 1500 / 3994 B +PSDU** (monitor injection, HT MCS7, 20 MHz; median of runs via +`tests/bench_tput.py`). See [Measured throughput](#measured-throughput) for the +kernel-driver comparison and per-frame latency. 
+ +| Part | RF / streams | 2.4 GHz (ch6) | 5 GHz UNII-1 (ch36) | 5 GHz UNII-2/3 (ch149) | Notes | +| -------------- | --------------- | ------------- | ------------------- | ---------------------- | ------------------------------------------- | +| **RTL8812AU** | 2T2R | 46 / 58 | 49 / 61 | 49 / 62 | VID/PID `0bda:8812`; reference part — fastest + most consistent TX | +| **RTL8811AU** | 1T1R | mirrors 8812 | mirrors 8812 | mirrors 8812 | 1T1R cut of 8812 silicon; rides 8812 code path with `RFType=RF_TYPE_1T1R` selected from `REG_SYS_CFG` bit 27. No working unit on the bench — status mirrored from 8812, not separately benchmarked | +| **RTL8814AU** | 4T4R, 3-SS max | 0.3 / 22 ⚠ | 2.7 / 0.3 ⚠ | 19 / 23 ⚠ | VID/PID `0bda:8813`; 2-SS effective on USB-2. ⚠ TX throughput is **flaky** (high run-to-run variance, frequent near-zero) — the 8814 monitor-TX path is the least reliable of the family | +| **RTL8821AU** | 1T1R AC + BT | 41 / 60 | 23 / 34 | 22 / 33 | OEM-rebadged as TP-Link Archer T2U Plus (`2357:0120`). 2.4 GHz on par with 8812; the ~½ throughput at 5 GHz UNII bands reflects the documented UNII-2/3 cross-receiver asymmetry | Successor families (`Jaguar2` / `Jaguar+` — 8812BU, 8822BU/BE, etc., and the later `Kestrel` 11ax generation) are **out of scope**: they share @@ -41,6 +46,40 @@ CHIP_8812), not Jaguar2 — the naming is a known trap. > returns NULL, check `lsusb` — you may need `usb_modeswitch` to flip it > first. +### Measured throughput + +Numbers below are from `tests/bench_tput.py`, monitor-mode injection at HT MCS7, +20 MHz, PSDU 1500 and 3994 B (the practical max single radio frame). **TX rate is +measured from usbmon bulk-OUT completions at the source chip** — the true +frames-accepted rate, which (unlike counting at a sniffer) is not capped by a +receiver. `dev` = devourer, `ker` = host kernel driver (`rtw88`/`88XXau`, +AF_PACKET injection). Per-frame TX **latency** is the submit→completion time +measured at a non-saturating rate. 
`—` = unsupported or degenerate. + +| Band | Part | TX dev 1500 / 3994 (Mbps) | TX ker 1500 / 3994 | TX latency dev (µs) | +| ---- | ---- | ------------------------- | ------------------ | ------------------- | +| 2.4 GHz (ch6) | RTL8812AU | 46 / 58 | 0.9 / — | 116 | +| 2.4 GHz (ch6) | RTL8814AU | 0.3 / 22 ⚠ | 1.0 / — | 29 | +| 2.4 GHz (ch6) | RTL8821AU | 41 / 60 | 0.8 / — | 128 | +| UNII-1 (ch36) | RTL8812AU | 49 / 61 | 5.8 / — | 116 | +| UNII-1 (ch36) | RTL8814AU | 2.7 / 0.3 ⚠ | 5.2 / — | 29 | +| UNII-1 (ch36) | RTL8821AU | 23 / 34 | 5.3 / — | 122 | +| UNII-2/3 (ch149) | RTL8812AU | 49 / 62 | 5.9 / — | 115 | +| UNII-2/3 (ch149) | RTL8814AU | 19 / 23 ⚠ | 5.6 / — | 17 | +| UNII-2/3 (ch149) | RTL8821AU | 22 / 33 | 5.9 / — | 127 | + +Takeaways: (1) devourer's direct-USB TX is **8–60× faster than kernel AF_PACKET +monitor injection** — devourer pipelines bulk-OUT URBs (no per-frame syscall), +the kernel path blocks on the TX ring per frame. (2) The **kernel monitor path +cannot inject 3994 B frames at all** (AF_PACKET > iface-MTU); devourer sends them +at up to 62 Mbps. (3) Throughput scales with frame size (the chip's per-frame +overhead amortises), so 3994 B beats 1500 B everywhere it works. (4) The **8814 +TX path is the family's least reliable** (high variance). (5) **RX throughput is +not tabulated**: on a 2-USB-bus bench it cannot be measured cleanly — same-bus +TX/RX pairs contend, the only cross-bus flooder (8812→8814) saturates the +receiver at full TX rate, and the 8814 RX path is itself flaky. RX is functional +(the chips receive); see `tests/README.md` for the measurement caveats. + ## Building Toolchain: CMake ≥ 3.15, a C++20 compiler, and libusb-1.0. @@ -134,6 +173,10 @@ header before the TX loop: VHT info field (bit 21). Exposes `DEVOURER_TX_VHT_MCS=N` (VHT MCS index, 0..9 typical) and `DEVOURER_TX_VHT_NSS=N` (spatial streams). `_LDPC` / `_STBC` / `_BW` apply to whichever (HT/VHT) mode is active. 
+- `DEVOURER_TX_PAYLOAD_BYTES=N`: pad the 802.11 PSDU up to `N` bytes (the
+  on-wire bulk-OUT transfer is `N + 40` bytes: PSDU plus the 40-byte TX
+  descriptor). Used by the throughput benchmark to send 1500 / 3994 B frames.
+  Pad-up only; default = the small probe-request beacon.
 
 ## Using the library
 
diff --git a/tests/README.md b/tests/README.md
index 95a6347..2ebf053 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -363,3 +363,35 @@ to add new chipsets — the rest of the script is chipset-agnostic.
 the full story — chip behaviour can differ per band. Run 2.4GHz
 (`--channel 6`) plus at least one 5GHz channel (`--channel 36` /
 `--channel 100`) before calling a configuration good.
+
+## Throughput benchmark (`bench_tput.py`)
+
+`tests/bench_tput.py` measures per-chip TX throughput, per-frame TX latency and
+RX, across bands (2.4 GHz ch6 / UNII-1 ch36 / UNII-2/3 ch149) and PSDU sizes
+(1500 / 3994 B), comparing devourer against the host kernel driver. Results feed
+the README "Hardware landscape" numbers.
+
+```sh
+sudo tests/bench_tput.py --quick              # ~1 cell smoke
+sudo tests/bench_tput.py --directions tx      # full TX matrix (add --resume to continue a run)
+```
+
+Method: **TX rate = usbmon bulk-OUT completions at the source chip** (the true
+frames-accepted rate). Counting at a sniffer instead measures the *sniffer's*
+RX ceiling, a trap that makes every transmitter look identical (~336 fps on an
+8814 sniffer here). Bytes are summed from the `Bo` completions, so mixed URB
+sizes are handled. devourer TX has no host-side backpressure (it pipelines
+URBs), so its saturated submit→completion latency is backlog drain time, not
+per-frame cost; latency is therefore taken from a separate non-saturating
+(`--gap 2000`) pass. Kernel TX uses `inject_beacon.py --max-rate` (blocking
+AF_PACKET, which paces on the driver TX ring); note AF_PACKET cannot inject
+frames larger than the iface MTU, so kernel 3994 B cells read `—`.
+
+**RX is hard to benchmark on a 2-USB-bus rig and is not tabulated.** RX requires
+a same-channel flooder peer; same-bus TX/RX pairs (e.g. 8812 + 8821 both on one
+host controller) contend, the only reliable cross-bus flooder (8812 → 8814)
+saturates the receiver at the full TX rate, and the 8814 RX path is itself
+intermittent. A clean cross-bus, moderate-rate flood (8812 → 8814) does receive
+~3100 frames in a 12 s window, confirming RX works; a representative *capacity*
+number needs a 3-bus rig (one bus per DUT + flooder) or a calibrated SDR
+transmitter.
diff --git a/tests/bench_tput.py b/tests/bench_tput.py
new file mode 100644
index 0000000..2412cf2
--- /dev/null
+++ b/tests/bench_tput.py
@@ -0,0 +1,555 @@
+#!/usr/bin/env python3
+"""Per-chip TX/RX throughput + latency benchmark for devourer vs the kernel
+driver, across bands and frame sizes. Fills the README "Hardware landscape"
+numbers.
+
+TX rate is measured from **usbmon bulk-OUT completions at the source chip** —
+the true frames-accepted rate, sniffer-independent (counting at a sniffer
+measures the *sniffer's* RX ceiling, a known trap). RX is measured by flooding
+from a peer chip on the same channel and counting frames at the DUT; it is
+capped by the flooder's own TX rate (also measured), so RX Mbps is a
+received-fraction, not link capacity.
+
+    sudo tests/bench_tput.py --quick        # ~1 cell smoke
+    sudo tests/bench_tput.py                # full matrix (resumable)
+
+Reuses tests/regress.py for DUT discovery, kernel bind/unbind, USB power-cycle,
+process hygiene and log parsers. Kernel cells run on the HOST kernel for chips
+the host rtw88 binds (8812/8814 → usbmon metric); chips it doesn't (8821au) fall
+back to --vm-ssh (injector self-count metric) or are annotated "—".
+"""
+from __future__ import annotations
+
+import argparse
+import csv
+import dataclasses
+import json
+import os
+import re
+import statistics
+import subprocess
+import sys
+import threading
+import time
+from pathlib import Path
+
+HERE = Path(__file__).resolve().parent
+sys.path.insert(0, str(HERE))
+import regress  # noqa: E402
+
+CANONICAL_SA = regress.CANONICAL_SA
+BANDS = {"2g": 6, "unii1": 36, "unii2_3": 149}
+SIZES = [1500, 3994]
+USBMON_NODE = "/sys/kernel/debug/usb/usbmon/{bus}u"
+TXDESC = 40  # on-wire bulk-OUT = PSDU + TXDESC
+
+
+# --------------------------------------------------------------------------- #
+# usbmon plumbing
+# --------------------------------------------------------------------------- #
+def ensure_usbmon() -> None:
+    subprocess.run(["modprobe", "usbmon"], check=False)
+    if not os.path.isdir("/sys/kernel/debug/usb/usbmon"):
+        subprocess.run(["mount", "-t", "debugfs", "none", "/sys/kernel/debug"],
+                       check=False)
+    if not os.path.isdir("/sys/kernel/debug/usb/usbmon"):
+        sys.exit("usbmon node missing — need root + CONFIG_USB_MON")
+
+
+def bus_dev(dut: regress.Dut) -> tuple[int, int]:
+    """Read live busnum/devnum — they change after every re-enumerate."""
+    base = f"/sys/bus/usb/devices/{dut.sysfs_id}"
+    with open(f"{base}/busnum") as f:
+        bus = int(f.read())
+    with open(f"{base}/devnum") as f:
+        dev = int(f.read())
+    return bus, dev
+
+
+class UsbmonReader:
+    """Stream a usbmon 'u' text node in a thread, tagging each line with a host
+    monotonic timestamp so we can window the capture precisely."""
+
+    def __init__(self, bus: int, log_path: Path | None = None):
+        self.bus = bus
+        self.log_path = log_path
+        self.lines: list[tuple[float, str]] = []
+        self._stop = threading.Event()
+        self._thr = threading.Thread(target=self._run, daemon=True)
+
+    def start(self) -> None:
+        self._thr.start()
+
+    def _run(self) -> None:
+        try:
+            f = open(USBMON_NODE.format(bus=self.bus), "r", errors="replace")
+        except OSError:
+            return
+        out = open(self.log_path, "w") if self.log_path else None
+        try:
+            for line in f:
+                if self._stop.is_set():
+                    break
+                self.lines.append((time.monotonic(), line))
+                if out:
+                    out.write(line)
+        except OSError:
+            pass
+        finally:
+            f.close()
+            if out:
+                out.close()
+
+    def stop(self) -> None:
+        self._stop.set()
+
+    def parse_tx(self, dev: int, size: int, t0: float, t1: float) -> dict:
+        """Bulk-OUT metrics within [t0, t1]. Byte-sum drives Mbps (robust to
+        mixed URB sizes); full-frame completions drive fps + latency."""
+        prefix = f"Bo:{self.bus}:{dev:03d}:"
+        submit_ts: dict[str, float] = {}
+        lat_us: list[float] = []
+        bytes_sum = 0
+        frames = 0
+        inflight = 0
+        max_inflight = 0
+        errors = 0
+        full = size + TXDESC
+        for mono, line in self.lines:
+            parts = line.split()
+            if len(parts) < 4 or prefix not in parts[3]:
+                continue
+            ev = parts[2]
+            if ev == "S":
+                inflight += 1
+                max_inflight = max(max_inflight, inflight)
+                submit_ts[parts[0]] = mono
+            elif ev == "C":
+                inflight = max(0, inflight - 1)
+                if not (t0 <= mono <= t1):
+                    submit_ts.pop(parts[0], None)
+                    continue
+                try:
+                    ln = int(parts[5])
+                except (IndexError, ValueError):
+                    ln = 0
+                if ln > 0:
+                    bytes_sum += ln
+                if ln >= full * 0.9:
+                    frames += 1
+                    s = submit_ts.pop(parts[0], None)
+                    if s is not None:
+                        lat_us.append((mono - s) * 1e6)
+            elif ev == "E":
+                errors += 1
+        window = max(t1 - t0, 1e-6)
+        return {
+            "mbps": bytes_sum * 8 / window / 1e6,
+            "fps": frames / window,
+            "p50_lat_us": statistics.median(lat_us) if lat_us else 0.0,
+            "p99_lat_us": (sorted(lat_us)[int(len(lat_us) * 0.99)]
+                           if len(lat_us) > 5 else 0.0),
+            "max_inflight": max_inflight,
+            "frames": frames,
+            "errors": errors,
+        }
+
+
+# --------------------------------------------------------------------------- #
+# device routing helpers
+# --------------------------------------------------------------------------- #
+def for_devourer(dut: regress.Dut) -> None:
+    """Make sure no kernel driver holds the device."""
+    drv = regress.host_kernel_driver_for_dut(dut)
+    if drv:
+        regress.detach_from_host_kernel(dut)
+        time.sleep(1)
+
+
+def host_iface(dut: regress.Dut) -> str | None:
+    base = f"/sys/bus/usb/devices/{dut.iface_id}/net"
+    try:
+        return os.listdir(base)[0]
+    except (OSError, IndexError):
+        return None
+
+
+def _devourer_tx_env(dut: regress.Dut, channel: int, size: int, mcs: int,
+                     gap_us: int = 0) -> dict:
+    env = os.environ.copy()
+    env["DEVOURER_VID"] = f"0x{dut.vid}"
+    env["DEVOURER_PID"] = f"0x{dut.pid}"
+    env["DEVOURER_CHANNEL"] = str(channel)
+    env["DEVOURER_TX_HT_MCS"] = "1"
+    env["DEVOURER_TX_MCS"] = str(mcs)
+    env["DEVOURER_TX_PAYLOAD_BYTES"] = str(size)
+    env["DEVOURER_TX_GAP_US"] = str(gap_us)
+    return env
+
+
+def _spawn(binary: str, env: dict, log: Path, stdin=None) -> subprocess.Popen:
+    fh = open(log, "w")
+    devroot = HERE.parent
+    return regress._register_local_proc(subprocess.Popen(
+        [str(devroot / "build" / binary)], env=env,
+        stdout=fh, stderr=subprocess.STDOUT, stdin=stdin,
+        preexec_fn=regress._child_preexec))
+
+
+# --------------------------------------------------------------------------- #
+# measurement primitives
+# --------------------------------------------------------------------------- #
+def _tx_devourer_pass(dut, channel, size, mcs, gap_us, window, warmup, tmpdir,
+                      tag) -> tuple[dict, int]:
+    for_devourer(dut)
+    bus, dev = bus_dev(dut)
+    reader = UsbmonReader(bus, tmpdir / f"tx-dev-{dut.pid}-{tag}-usbmon.log")
+    reader.start()
+    log = tmpdir / f"tx-dev-{dut.pid}-ch{channel}-{size}-{tag}.log"
+    proc = _spawn("WiFiDriverTxDemo",
+                  _devourer_tx_env(dut, channel, size, mcs, gap_us=gap_us), log)
+    try:
+        time.sleep(warmup)
+        bus, dev = bus_dev(dut)  # re-read post-init (devnum may have shifted)
+        t0 = time.monotonic()
+        time.sleep(window)
+        t1 = time.monotonic()
+    finally:
+        regress._terminate(proc)
+        reader.stop()
+        time.sleep(0.3)
+    return reader.parse_tx(dev, size, t0, t1), dev
+
+
+def measure_tx_devourer(dut, channel, size, mcs, *, window, warmup,
+                        tmpdir) -> dict:
+    # Throughput pass: gap=0 (max rate); send_packet has no backpressure so URBs
+    # pile up — the completion (chip-accept) rate is the headline Mbps.
+    m, _ = _tx_devourer_pass(dut, channel, size, mcs, 0, window, warmup, tmpdir,
+                             "tput")
+    # Latency pass: gap=2000us keeps the queue shallow (in-flight ~1-2) so the
+    # per-URB submit->completion time reflects real per-frame latency, not the
+    # backlog drain time of the saturated pass.
+    lat, _ = _tx_devourer_pass(dut, channel, size, mcs, 2000, min(4.0, window),
+                               2.0, tmpdir, "lat")
+    m["p50_lat_us"] = lat["p50_lat_us"]
+    m["p99_lat_us"] = lat["p99_lat_us"]
+    m["ok"] = m["frames"] > 0
+    return m
+
+
+def measure_tx_kernel(dut, channel, size, mcs, *, window, warmup,
+                      tmpdir) -> dict:
+    """Host-kernel TX: bind rtw88, monitor, AF_PACKET max-rate inject, usbmon."""
+    drv = regress.host_kernel_driver_for_dut(dut)
+    if not drv:
+        regress.attach_to_host_kernel(dut)
+        time.sleep(2)
+        drv = regress.host_kernel_driver_for_dut(dut)
+    if not drv:
+        return {"ok": False, "note": "host kernel does not bind this chip"}
+    iface = host_iface(dut)
+    if not iface:
+        return {"ok": False, "note": "no kernel wlan iface"}
+    kh = regress.KernelHost.local()
+    kh.iface_to_monitor(iface, channel)
+    bus, dev = bus_dev(dut)
+    reader = UsbmonReader(bus, tmpdir / f"tx-ker-{dut.pid}-usbmon.log")
+    reader.start()
+    log = tmpdir / f"tx-ker-{dut.pid}-ch{channel}-{size}.log"
+    fh = open(log, "w")
+    proc = regress._register_local_proc(subprocess.Popen(
+        ["python3", str(HERE / "inject_beacon.py"), "--iface", iface,
+         "--max-rate", "--size", str(size), "--mcs", str(mcs),
+         "--duration", str(window + warmup + 2)],
+        stdout=fh, stderr=subprocess.STDOUT, preexec_fn=regress._child_preexec))
+    try:
+        time.sleep(warmup)
+        bus, dev = bus_dev(dut)
+        t0 = time.monotonic()
+        time.sleep(window)
+        t1 = time.monotonic()
+    finally:
+        regress._terminate(proc)
+        reader.stop()
+        time.sleep(0.3)
+    m = reader.parse_tx(dev, size, t0, t1)
+    m["ok"] = m["frames"] > 0
+    return m
+
+
+def measure_rx(dut_rx, side, flooder, channel, size, mcs, *, window, warmup,
+               tmpdir) -> dict:
+    """Flood from `flooder` (devourer) and count at dut_rx (devourer|kernel)."""
+    # Bring up the RX side first.
+    rx_log = tmpdir / f"rx-{side}-{dut_rx.pid}-ch{channel}-{size}.log"
+    rx_proc = None
+    tcpdump = None
+    if side == "devourer":
+        for_devourer(dut_rx)
+        env = os.environ.copy()
+        env.update({"DEVOURER_VID": f"0x{dut_rx.vid}",
+                    "DEVOURER_PID": f"0x{dut_rx.pid}",
+                    "DEVOURER_CHANNEL": str(channel)})
+        rx_proc = _spawn("WiFiDriverDemo", env, rx_log)
+    else:
+        if not regress.host_kernel_driver_for_dut(dut_rx):
+            regress.attach_to_host_kernel(dut_rx)
+            time.sleep(2)
+        iface = host_iface(dut_rx)
+        if not iface:
+            return {"ok": False, "note": "no kernel rx iface"}
+        regress.KernelHost.local().iface_to_monitor(iface, channel)
+        fh = open(rx_log, "w")
+        tcpdump = regress._register_local_proc(subprocess.Popen(
+            ["tcpdump", "-i", iface, "-nn", "-e", "-l",
+             f"ether src {CANONICAL_SA}"],
+            stdout=fh, stderr=subprocess.DEVNULL,
+            preexec_fn=regress._child_preexec))
+    time.sleep(warmup)  # RX fwdl / monitor settle
+
+    # Start the flooder (devourer TX, max rate).
+    for_devourer(flooder)
+    fbus, fdev = bus_dev(flooder)
+    freader = UsbmonReader(fbus, tmpdir / f"rx-flood-{flooder.pid}-usbmon.log")
+    freader.start()
+    flog = tmpdir / f"rx-flood-{flooder.pid}-ch{channel}-{size}.log"
+    fproc = _spawn("WiFiDriverTxDemo",
+                   _devourer_tx_env(flooder, channel, size, mcs), flog)
+    try:
+        time.sleep(2)  # flooder init
+        fbus, fdev = bus_dev(flooder)
+        t0 = time.monotonic()
+        time.sleep(window)
+        t1 = time.monotonic()
+    finally:
+        regress._terminate(fproc)
+        freader.stop()
+        if rx_proc:
+            regress._terminate(rx_proc)
+        if tcpdump:
+            regress._terminate(tcpdump)
+        time.sleep(0.3)
+
+    flood = freader.parse_tx(fdev, size, t0, t1)
+    if side == "devourer":
+        recv = regress._count_devourer_rx_hits(rx_log)
+    else:
+        recv = regress._count_tcpdump_hits(rx_log)
+    sent = flood["frames"]
+    return {
+        "ok": recv > 0,
+        "recv_fps": recv / window,
+        "recv_mbps": recv * size * 8 / window / 1e6,
+        "flooder_fps": flood["fps"],
+        "loss_pct": (100 * (sent - recv) / sent) if sent > 0 else 0.0,
+        "recv": recv, "sent": sent,
+    }
+
+
+# --------------------------------------------------------------------------- #
+# matrix driver
+# --------------------------------------------------------------------------- #
+@dataclasses.dataclass
+class Key:
+    chipset: str
+    band: str
+    size: int
+    direction: str
+    side: str
+
+    def s(self) -> str:
+        return f"{self.chipset}|{self.band}|{self.size}|{self.direction}|{self.side}"
+
+
+def median_metric(samples: list[dict], field: str) -> float:
+    vals = [s[field] for s in samples if s.get("ok") and field in s]
+    return statistics.median(vals) if vals else 0.0
+
+
+def run(args) -> None:
+    ensure_usbmon()
+    regress._install_cleanup_handlers()
+    duts = {d.chipset.split()[0]: d for d in regress.discover_duts()}
+    if not duts:
+        sys.exit("no supported DUTs found")
+    print(f"# DUTs: {', '.join(f'{k}({v.vidpid})' for k, v in duts.items())}")
+
+    out = Path(args.out_dir)
+    out.mkdir(parents=True, exist_ok=True)
+    state_path = out / "state.json"
+    state: dict = {}
+    if args.resume and state_path.exists():
+        state = json.loads(state_path.read_text())
+    csv_path = out / "bench-tput.csv"
+
+    bands = {b: BANDS[b] for b in args.bands}
+    chips = list(duts.values())
+    if args.quick:
+        chips = chips[:1]
+        bands = {"2g": 6}
+        args.sizes = [1500]
+        args.reps = 1
+
+    rows: list[dict] = []
+    for dut in chips:
+        chip = dut.chipset.split()[0]
+        # flooder = a reliable peer chip (8812 has the most dependable TX; avoid
+        # the 8814 as flooder — its monitor-injection TX is flaky).
+        flooder = None
+        for pref in ("RTL8812AU", "RTL8821AU", "RTL8814AU"):
+            flooder = next((d for d in duts.values()
+                            if d.chipset.split()[0] == pref and d.pid != dut.pid),
+                           None)
+            if flooder:
+                break
+        for band, ch in bands.items():
+            for size in args.sizes:
+                for direction in args.directions:
+                    for side in args.sides:
+                        key = Key(chip, band, size, direction, side)
+                        if args.resume and key.s() in state:
+                            rows.append(state[key.s()]); continue
+                        samples = []
+                        for rep in range(args.reps):
+                            print(f" {key.s()} rep{rep+1}/{args.reps} ch{ch}…",
+                                  flush=True)
+                            try:
+                                regress.usb_port_power_cycle(dut)
+                            except Exception:
+                                pass
+                            m = _run_one(dut, side, direction, ch, size,
+                                         args.mcs, flooder, args, out)
+                            m.update(rep=rep, channel=ch)
+                            samples.append(m)
+                            _append_csv(csv_path, key, m)
+                        agg = {"key": key.s(), "chipset": chip, "band": band,
+                               "size": size, "direction": direction,
+                               "side": side, "channel": ch}
+                        if direction == "tx":
+                            agg["mbps"] = round(median_metric(samples, "mbps"), 2)
+                            agg["fps"] = round(median_metric(samples, "fps"))
+                            agg["p50_lat_us"] = round(
+                                median_metric(samples, "p50_lat_us"))
+                            agg["max_inflight"] = max(
+                                (s.get("max_inflight", 0) for s in samples),
+                                default=0)
+                        else:
+                            agg["mbps"] = round(
+                                median_metric(samples, "recv_mbps"), 2)
+                            agg["recv_fps"] = round(
+                                median_metric(samples, "recv_fps"))
+                            agg["loss_pct"] = round(
+                                median_metric(samples, "loss_pct"))
+                        agg["ok"] = any(s.get("ok") for s in samples)
+                        agg["note"] = next((s["note"] for s in samples
+                                            if s.get("note")), "")
+                        rows.append(agg)
+                        state[key.s()] = agg
+                        state_path.write_text(json.dumps(state, indent=1))
+        # restore host kernel for this chip
+        try:
+            regress.attach_to_host_kernel(dut)
+        except Exception:
+            pass
+
+    _emit_markdown(out / "bench-tput.md", rows, bands)
+    print(f"\nCSV: {csv_path}\nMD : {out / 'bench-tput.md'}")
+
+
+def _run_one(dut, side, direction, ch, size, mcs, flooder, args, out) -> dict:
+    try:
+        if direction == "tx" and side == "devourer":
+            return measure_tx_devourer(dut, ch, size, mcs,
+                                       window=args.window, warmup=args.warmup,
+                                       tmpdir=out)
+        if direction == "tx" and side == "kernel":
+            return measure_tx_kernel(dut, ch, size, mcs,
+                                     window=args.window, warmup=args.warmup,
+                                     tmpdir=out)
+        if direction == "rx":
+            if flooder is None:
+                return {"ok": False, "note": "no flooder peer"}
+            return measure_rx(dut, side, flooder, ch, size, mcs,
+                              window=args.window, warmup=args.warmup,
+                              tmpdir=out)
+    except Exception as e:  # never let one cell kill the matrix
+        return {"ok": False, "note": f"exc: {type(e).__name__}: {e}"}
+    finally:
+        try:
+            regress.attach_to_host_kernel(dut)
+        except Exception:
+            pass
+    return {"ok": False, "note": "unhandled"}
+
+
+def _append_csv(path: Path, key: Key, m: dict) -> None:
+    new = not path.exists()
+    with open(path, "a", newline="") as f:
+        w = csv.writer(f)
+        if new:
+            w.writerow(["chipset", "band", "size", "direction", "side", "rep",
+                        "channel", "ok", "mbps", "fps", "recv_fps", "loss_pct",
+                        "p50_lat_us", "p99_lat_us", "max_inflight", "frames",
+                        "errors", "note"])
+        w.writerow([key.chipset, key.band, key.size, key.direction, key.side,
+                    m.get("rep", 0), m.get("channel", ""), int(m.get("ok", 0)),
+                    round(m.get("mbps", 0), 2), round(m.get("fps", 0)),
+                    round(m.get("recv_fps", 0)), round(m.get("loss_pct", 0)),
+                    round(m.get("p50_lat_us", 0)), round(m.get("p99_lat_us", 0)),
+                    m.get("max_inflight", 0), m.get("frames", 0),
+                    m.get("errors", 0), m.get("note", "")])
+
+
+def _emit_markdown(path: Path, rows: list[dict], bands: dict) -> None:
+    by = {(r["chipset"], r["band"], r["size"], r["direction"], r["side"]): r
+          for r in rows}
+    chips = sorted({r["chipset"] for r in rows})
+    lines = ["### Measured throughput\n",
+             "Median of N runs. TX rate = usbmon bulk-OUT completions at the "
+             "source chip; RX = frames counted at the DUT under a same-channel "
+             "flood (capped by the flooder's TX rate). HT MCS, 20 MHz, monitor "
+             "injection. PSDU 1500 / 3994 B. dev = devourer, ker = host kernel. "
+             "`—` = unsupported/degenerate.\n"]
+    for band in bands:
+        lines.append(f"\n#### {band} (ch{BANDS[band]})\n")
+        lines.append("| Part | TX dev 1500/3994 (Mbps) | TX ker 1500/3994 | "
+                     "TX lat dev (µs) | RX dev 1500/3994 | RX ker 1500/3994 |")
+        lines.append("|------|------|------|------|------|------|")
+        for chip in chips:
+            def cell(direction, side):
+                a = by.get((chip, band, 1500, direction, side))
+                b = by.get((chip, band, 3994, direction, side))
+                fa = f"{a['mbps']:.1f}" if a and a.get("ok") else "—"
+                fb = f"{b['mbps']:.1f}" if b and b.get("ok") else "—"
+                return f"{fa}/{fb}"
+            lat = by.get((chip, band, 3994, "tx", "devourer"))
+            latv = f"{lat['p50_lat_us']}" if lat and lat.get("ok") else "—"
+            lines.append(f"| {chip} | {cell('tx','devourer')} | "
+                         f"{cell('tx','kernel')} | {latv} | "
+                         f"{cell('rx','devourer')} | {cell('rx','kernel')} |")
+    path.write_text("\n".join(lines) + "\n")
+
+
+def main() -> None:
+    ap = argparse.ArgumentParser(description=__doc__)
+    ap.add_argument("--bands", default="2g,unii1,unii2_3",
+                    type=lambda s: s.split(","))
+    ap.add_argument("--sizes", default="1500,3994",
+                    type=lambda s: [int(x) for x in s.split(",")])
+    ap.add_argument("--directions", default="tx,rx",
+                    type=lambda s: s.split(","))
+    ap.add_argument("--sides", default="devourer,kernel",
+                    type=lambda s: s.split(","))
+    ap.add_argument("--mcs", type=int, default=7)
+    ap.add_argument("--window", type=float, default=10.0)
+    
ap.add_argument("--warmup", type=float, default=6.0) + ap.add_argument("--reps", type=int, default=3) + ap.add_argument("--resume", action="store_true") + ap.add_argument("--quick", action="store_true") + ap.add_argument("--out-dir", default="/tmp/devourer-bench-tput") + run(ap.parse_args()) + + +if __name__ == "__main__": + main() diff --git a/tests/inject_beacon.py b/tests/inject_beacon.py index b463dfe..c23ca1f 100755 --- a/tests/inject_beacon.py +++ b/tests/inject_beacon.py @@ -90,7 +90,7 @@ def _build_radiotap_vht(*, vht_mcs: int, nss: int, ldpc: bool, stbc: bool, def build_beacon(rate_mbps_x2: int = 0, *, mcs=None, ldpc: bool = False, stbc: int = 0, bandwidth: int = 20, vht: bool = False, - vht_mcs: int = 0, nss: int = 1): + vht_mcs: int = 0, nss: int = 1, size: int = 0): """Mgmt / probe-request frame matching txdemo's beacon_frame[]. The body payload doesn't matter for hit-count testing — only SA is matched. @@ -116,6 +116,10 @@ def build_beacon(rate_mbps_x2: int = 0, *, mcs=None, ldpc: bool = False, ) / b"\x00\x00\x00\x00\x00\x00\x00\x00" # ssid IE (empty) ) + # Throughput benchmark: pad the 802.11 PSDU up to `size` bytes so the + # kernel TX matches devourer's DEVOURER_TX_PAYLOAD_BYTES frames. Pad-up only. + if size and size > len(dot11_bytes): + dot11_bytes = dot11_bytes + b"\x00" * (size - len(dot11_bytes)) if vht: rt_bytes = _build_radiotap_vht( vht_mcs=vht_mcs, nss=nss, ldpc=ldpc, stbc=bool(stbc), @@ -187,24 +191,52 @@ def main(): help="VHT spatial streams (NSS), 1..4 (default 1). Only used with " "--vht.", ) + ap.add_argument( + "--size", type=int, default=0, + help="pad the 802.11 PSDU up to N bytes (throughput benchmark; mirrors " + "txdemo's DEVOURER_TX_PAYLOAD_BYTES). 0 = the small probe request.", + ) + ap.add_argument( + "--max-rate", action="store_true", + help="blast as fast as the driver TX ring allows via a blocking " + "AF_PACKET raw socket (no per-frame sleep). send() blocks on the " + "ring so the rate ~= the kernel TX-completion rate. 
For " + "throughput benchmarking, not the regress.py hit-count path.", + ) args = ap.parse_args() pkt = build_beacon( args.rate, mcs=args.mcs, ldpc=args.ldpc, stbc=args.stbc, bandwidth=args.bandwidth, vht=args.vht, vht_mcs=args.vht_mcs, - nss=args.vht_nss, + nss=args.vht_nss, size=args.size, ) end = time.monotonic() + args.duration sent = 0 - while time.monotonic() < end: - try: - sendp(pkt, iface=args.iface, verbose=False) - sent += 1 - except OSError as e: - # iface went down mid-test — bail rather than spin. - print(f"inject_beacon: sendp failed after {sent} frames: {e}") - break - time.sleep(args.interval) + if args.max_rate: + # Blocking raw socket: bytes(pkt) = radiotap + 802.11 PSDU, which the + # kernel monitor iface TXes verbatim. send() blocks on the TX ring. + import socket + raw = bytes(pkt) + s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW) + s.bind((args.iface, 0)) + while time.monotonic() < end: + try: + s.send(raw) + sent += 1 + except OSError as e: + print(f"inject_beacon: send failed after {sent} frames: {e}") + break + s.close() + else: + while time.monotonic() < end: + try: + sendp(pkt, iface=args.iface, verbose=False) + sent += 1 + except OSError as e: + # iface went down mid-test — bail rather than spin. + print(f"inject_beacon: sendp failed after {sent} frames: {e}") + break + time.sleep(args.interval) print(f"inject_beacon: sent {sent} frames on {args.iface}") diff --git a/txdemo/main.cpp b/txdemo/main.cpp index ffbbd49..b38066f 100644 --- a/txdemo/main.cpp +++ b/txdemo/main.cpp @@ -430,6 +430,26 @@ int main(int argc, char **argv) { tx_buf.assign(beacon_frame, beacon_frame + sizeof(beacon_frame)); } + /* Frame-size knob for throughput benchmarking. DEVOURER_TX_PAYLOAD_BYTES=N + * pads the 802.11 body so the on-air PSDU is exactly N bytes — send_packet + * writes real_packet_length (= PSDU) into the 16-bit TX-desc PKT_SIZE, so N + * up to 65535 is valid (the chip's RX side caps at 16383). 
Pad-up only: if N
+   * is below the existing body we leave it and warn. The on-wire bulk-OUT URB
+   * is N + TXDESC_SIZE bytes. Default unset = the small probe-request beacon. */
+  if (const char *e = std::getenv("DEVOURER_TX_PAYLOAD_BYTES")) {
+    long want = std::strtol(e, nullptr, 0);
+    size_t radiotap_len = tx_vht ? 22 : 13;
+    size_t body_len = tx_buf.size() - radiotap_len;
+    if (want > 0 && static_cast<size_t>(want) > body_len) {
+      tx_buf.insert(tx_buf.end(), static_cast<size_t>(want) - body_len, 0x00);
+      logger->info("DEVOURER_TX_PAYLOAD_BYTES — PSDU padded {} -> {} bytes",
+                   body_len, want);
+    } else if (want > 0) {
+      logger->warn("DEVOURER_TX_PAYLOAD_BYTES={} <= current body {} — ignored "
+                   "(pad-up only)", want, body_len);
+    }
+  }
+
   /* Thermal monitoring — read inline on the TX (owning) thread, so no
    * background thread shares the libusb handle (no USB contention). Cadence is
    * derived from DEVOURER_THERMAL_POLL_MS over the ~2 ms/packet loop; 0 =