pl: replace LLVM JIT with a tree-walking interpreter #883

Open
cao1629 wants to merge 2 commits into oceanbase:master from cao1629:pr-06-11-interpreter

cao1629 (Contributor) commented Jun 11, 2026

Performance comparison for PL execution: tree-walking interpreter vs. LLVM JIT.

Part 1 — realistic PL (executes SQL): interpreter ≈ JIT

Fixture: a 1000-row account table (+ a log table for the write paths).

CREATE TABLE pbench_acct (id INT PRIMARY KEY, balance BIGINT, status INT, region INT);  -- seeded 1000 rows
CREATE TABLE pbench_log  (id INT PRIMARY KEY AUTO_INCREMENT, acct_id INT, delta INT);

The benchmark uses ten stored procedures meant to represent real production workloads.

1. p_get_balance

CREATE PROCEDURE p_get_balance(IN acct INT, OUT bal BIGINT)
BEGIN
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET bal = -1;
  SET bal = -1;
  SELECT balance INTO bal FROM pbench_acct WHERE id = acct;
END;

interp 349 µs · JIT 332 µs → JIT 5% faster.

2. p_sum_region

CREATE PROCEDURE p_sum_region(IN reg INT, OUT total BIGINT)
BEGIN
  SELECT COALESCE(SUM(balance),0) INTO total FROM pbench_acct WHERE region = reg;
END;

interp 983 µs · JIT 977 µs → ≈ parity.

3. p_count_active

CREATE PROCEDURE p_count_active(OUT cnt INT)
BEGIN
  SELECT COUNT(*) INTO cnt FROM pbench_acct WHERE status = 1;
END;

interp 951 µs · JIT 954 µs → ≈ parity.

4. p_cursor_sum

CREATE PROCEDURE p_cursor_sum(OUT total BIGINT)
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE b BIGINT;
  DECLARE cur CURSOR FOR SELECT balance FROM pbench_acct;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  SET total = 0;
  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO b;
    IF done = 1 THEN LEAVE read_loop; END IF;
    SET total = total + b;
  END LOOP;
  CLOSE cur;
END;

interp 3.74 ms · JIT 3.96 ms → interp 6% faster.

5. p_apply_interest

CREATE PROCEDURE p_apply_interest(IN bps INT)
BEGIN
  UPDATE pbench_acct SET balance = balance + balance * bps DIV 10000;
END;

interp 9.85 ms · JIT 9.50 ms → JIT 4% faster.

6. p_transfer

CREATE PROCEDURE p_transfer(IN src INT, IN dst INT, IN amt INT, OUT ok INT)
BEGIN
  DECLARE sb BIGINT DEFAULT 0;
  DECLARE EXIT HANDLER FOR SQLEXCEPTION SET ok = 0;
  SET ok = 1;
  SELECT balance INTO sb FROM pbench_acct WHERE id = src;
  IF sb < amt THEN
    SET ok = 0;
  ELSE
    UPDATE pbench_acct SET balance = balance - amt WHERE id = src;
    UPDATE pbench_acct SET balance = balance + amt WHERE id = dst;
  END IF;
END;

interp 1.25 ms · JIT 1.30 ms → interp 3% faster.

7. p_insert_log

CREATE PROCEDURE p_insert_log(IN acct INT, IN d INT)
BEGIN
  INSERT INTO pbench_log(acct_id, delta) VALUES (acct, d);
END;

interp 650 µs · JIT 634 µs → JIT 2% faster.

8. p_batch_log

CREATE PROCEDURE p_batch_log(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    INSERT INTO pbench_log(acct_id, delta) VALUES (i MOD 1000, i);
    SET i = i + 1;
  END WHILE;
END;

interp 263 ms · JIT 265 ms → ≈ parity.

9. p_upsert

CREATE PROCEDURE p_upsert(IN acct INT, IN delta INT)
BEGIN
  INSERT INTO pbench_acct(id, balance, status, region) VALUES (acct, delta, 1, acct MOD 10)
    ON DUPLICATE KEY UPDATE balance = balance + delta;
END;

interp 999 µs · JIT 914 µs → JIT 9% faster.

10. p_classify

CREATE PROCEDURE p_classify()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE aid INT;
  DECLARE bal BIGINT;
  DECLARE cur CURSOR FOR SELECT id, balance FROM pbench_acct;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  OPEN cur;
  cl: LOOP
    FETCH cur INTO aid, bal;
    IF done = 1 THEN LEAVE cl; END IF;
    UPDATE pbench_acct SET status = CASE WHEN bal >= 50000 THEN 2 WHEN bal >= 1000 THEN 1 ELSE 0 END WHERE id = aid;
  END LOOP;
  CLOSE cur;
END;

interp 331 ms · JIT 330 ms → ≈ parity.

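Why interpreter ≈ JIT here: every one of these routines spends almost all of its running time in SQL execution, which is the same work in both modes, so the PL dispatch cost that the JIT removes is only a small share of each call.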
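The same reasoning can be put as an Amdahl-style estimate. This is a minimal sketch under stated assumptions: `pl_fraction` (the share of a call spent in PL dispatch) and `pl_speedup` (how much the JIT accelerates only that share) are hypothetical inputs, not values measured in this PR.

```cpp
#include <cassert>

// Amdahl-style estimate (illustrative only, not measured in this PR).
// pl_fraction: assumed share of a call's time spent in PL dispatch (0..1).
// pl_speedup:  assumed factor by which the JIT speeds up only that share.
// Returns the resulting speedup of the whole call.
double overall_speedup(double pl_fraction, double pl_speedup) {
    assert(pl_fraction >= 0.0 && pl_fraction <= 1.0);
    assert(pl_speedup > 0.0);
    return 1.0 / ((1.0 - pl_fraction) + pl_fraction / pl_speedup);
}
```

For instance, if PL dispatch were 2% of a call and the JIT made that part 2.5× faster, `overall_speedup(0.02, 2.5)` comes out at about 1.01, a gain of roughly 1% that is easily lost in measurement noise.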

Part 2 — three extreme pure-PL cases: JIT wins only here

No SQL here, so the per-iteration AST walk is the whole runtime: the interpreter dispatches every node, while the JIT runs precompiled machine code. The more control flow each routine packs into an iteration, the wider the gap.

chain_if100, n = 45000

CREATE PROCEDURE chain_if100(IN n INT, OUT s BIGINT)
BEGIN
  DECLARE i INT DEFAULT 0;
  SET s = 0;
  WHILE i < n DO
    IF FALSE THEN SET s = s + 1;
    ELSEIF FALSE THEN SET s = s + 1;        -- × 99 arms, all false, no ELSE
    END IF;
    SET i = i + 1;
  END WHILE;
END;

interp 25.72 µs/iter · JIT 17.50 µs/iter → JIT ~1.47× faster. The 100 IF/ELSEIF arms form a deeply nested AST that the interpreter walks down arm by arm; the JIT compiled the chain to a straight compare-and-jump run.
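The arm-by-arm walk can be sketched as follows. This is a simplified, hypothetical model: `IfNode`, `make_false_chain`, and `exec_if` are illustrative names and are not the ObPLStmt classes or the ObPLInterpreter code from this PR. Each ELSEIF is modeled as a node hanging off the previous arm's else branch, which is one plausible shape for the "deeply nested AST" described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical node for one IF/ELSEIF arm (not an actual ObPLStmt type).
struct IfNode {
    bool cond;                            // stands in for evaluating the condition
    int delta;                            // stands in for the body: SET s = s + delta
    std::unique_ptr<IfNode> else_branch;  // next ELSEIF arm, or null (no ELSE)
};

// Builds a chain of `arms` nodes whose conditions are all false,
// mirroring the shape of chain_if100.
std::unique_ptr<IfNode> make_false_chain(std::size_t arms) {
    std::unique_ptr<IfNode> head;
    for (std::size_t i = 0; i < arms; ++i) {
        auto node = std::make_unique<IfNode>();
        node->cond = false;
        node->delta = 1;
        node->else_branch = std::move(head);  // previous head becomes the else arm
        head = std::move(node);
    }
    assert(arms == 0 || head != nullptr);
    return head;
}

// Tree-walking execution: test one arm, then descend into its else branch.
// Returns how many arms were visited, to make the dispatch cost visible.
std::size_t exec_if(const IfNode *node, long long &s) {
    std::size_t visited = 0;
    while (node != nullptr) {
        ++visited;
        if (node->cond) {
            s += node->delta;
            break;  // at most one arm of the chain runs
        }
        node = node->else_branch.get();
    }
    return visited;
}
```

With `make_false_chain(100)`, every call to `exec_if` visits all 100 arms and leaves `s` unchanged, which is the per-iteration work the benchmark is designed to expose.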

iter_block50, n = 370000

CREATE PROCEDURE iter_block50(IN n INT, OUT s BIGINT)
BEGIN
  DECLARE i INT DEFAULT 0;
  SET s = 0;
  lp: WHILE i < n DO
    BEGIN
     BEGIN                                  -- 50 literal nested BEGIN…END levels
      ...
       SET i = i + 1; ITERATE lp;           -- innermost: jump back out through all 50
      ...
     END;
    END;
  END WHILE;
END;

interp 3.28 µs/iter · JIT 1.32 µs/iter → JIT ~2.48× faster. Each pass descends 50 nested blocks and ITERATE unwinds back out through all of them; the JIT flattened the nest to direct jumps.
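One common way for a tree-walking interpreter to implement this unwinding is to return a control signal from each block. The sketch below assumes that design; `Flow`, `Counters`, `exec_nested_block`, and `run_loop` are hypothetical names, and the PR does not state that ObPLInterpreter works exactly this way.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical control signal returned by each executed block.
enum class Flow { Normal, Iterate };

// Counts block entries and early exits so the per-pass cost can be observed.
struct Counters {
    std::size_t entered = 0;
    std::size_t unwound = 0;
};

// Enters `depth` nested blocks; the innermost one runs
// SET i = i + 1; ITERATE lp; and every enclosing block passes the signal up.
Flow exec_nested_block(std::size_t depth, long long &i, Counters &c) {
    assert(depth >= 1);
    ++c.entered;
    Flow flow;
    if (depth == 1) {
        i += 1;                // SET i = i + 1
        flow = Flow::Iterate;  // ITERATE lp
    } else {
        flow = exec_nested_block(depth - 1, i, c);
    }
    if (flow == Flow::Iterate) {
        ++c.unwound;  // this block is left early because of ITERATE
    }
    return flow;
}

// lp: WHILE i < n DO ... END WHILE, with `depth` nested blocks as the body.
Counters run_loop(long long n, std::size_t depth) {
    Counters c;
    long long i = 0;
    while (i < n) {
        // An Iterate signal reaching the loop simply starts the next pass.
        (void)exec_nested_block(depth, i, c);
    }
    return c;
}
```

In this model, `run_loop(3, 50)` enters 150 blocks and unwinds through 150 blocks, 50 of each per pass, whereas flattened machine code needs only a jump.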

flat_block100, n = 540000

CREATE PROCEDURE flat_block100(IN n INT, OUT s BIGINT)
BEGIN
  DECLARE i INT DEFAULT 0;
  SET s = 0;
  WHILE i < n DO
    BEGIN END;                              -- × 100, flat empty blocks
    SET i = i + 1;
  END WHILE;
END;

interp 2.75 µs/iter · JIT 1.33 µs/iter → JIT ~2.07× faster. The interpreter enters and exits all 100 empty blocks every iteration; the JIT compiled them away entirely.

Conclusion

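In realistic production use, JIT-compiled and interpreted PL perform about the same: across the ten SQL-executing procedures above, the two stay within roughly 9% of each other, and neither is consistently ahead. The JIT pulls ahead clearly (about 1.5× to 2.5× in the cases above) only in extreme cases with no SQL execution and deeply nested control flow, which rarely occur in practice.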

Execute PL by walking the resolved ObPLStmt tree (ObPLInterpreter)
instead of JIT-compiling routines with LLVM, and remove the ORC-JIT
code generator and the objit module.

The interpreter dispatches blocks, DECLARE ... DEFAULT, assignment
(including SET @user_var / @@sys_var and obj-access targets such as
a trigger's NEW.col), IF/ELSEIF, CASE, WHILE, LOOP, REPEAT, LEAVE,
ITERATE, DO, embedded SQL, RETURN (with deep-copied results), cursors
(DECLARE/OPEN/FETCH/CLOSE), exception handling (DECLARE HANDLER,
SIGNAL, completion conditions), PRAGMA INTERFACE routines, and nested
CALL with OUT/INOUT copy-back. Loops poll for KILL and query/transaction
timeout at the same cadence the JIT used. It passes the full PL
mysqltest suite.
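
The loop polling described above can be sketched as follows. Everything here is an assumption made for illustration: `Session`, `Status`, `run_while`, and the interval `kCheckInterval` are hypothetical, the actual cadence is not stated in this description, and the timeout checks are omitted to keep the sketch short.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical session state; a KILL issued elsewhere would set `killed`.
struct Session {
    std::atomic<bool> killed{false};
};

enum class Status { Ok, Killed };

// Hypothetical cadence: check once every kCheckInterval iterations.
constexpr std::uint64_t kCheckInterval = 1024;

// Runs a loop of n iterations, polling the kill flag at a fixed cadence
// instead of on every pass, so the check stays cheap.
Status run_while(Session &session, std::uint64_t n, std::uint64_t &iterations_done) {
    iterations_done = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        if (i % kCheckInterval == 0 &&
            session.killed.load(std::memory_order_relaxed)) {
            return Status::Killed;  // stop before running this iteration's body
        }
        ++iterations_done;  // stands in for executing the loop body
    }
    assert(iterations_done == n);
    return Status::Ok;
}
```

Checking at an interval trades a bounded delay in noticing KILL (at most `kCheckInterval` iterations in this sketch) for lower per-iteration overhead.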

cao1629 force-pushed the pr-06-11-interpreter branch from d81791f to 037f733 on June 12, 2026 02:27