pl: replace LLVM JIT with a tree-walking interpreter by cao1629 · Pull Request #883 · oceanbase/seekdb

cao1629 · 2026-06-11T11:37:59Z

PL performance comparison: tree-walking interpreter vs. LLVM JIT.

Part 1 — realistic PL (executes SQL): interpreter ≈ JIT

Fixture: a 1000-row account table (+ a log table for the write paths).

CREATE TABLE pbench_acct (id INT PRIMARY KEY, balance BIGINT, status INT, region INT);  -- seeded 1000 rows
CREATE TABLE pbench_log  (id INT PRIMARY KEY AUTO_INCREMENT, acct_id INT, delta INT);

The ten stored procedures below are meant to be representative of real production workloads.

1. p_get_balance

CREATE PROCEDURE p_get_balance(IN acct INT, OUT bal BIGINT)
BEGIN
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET bal = -1;
  SET bal = -1;
  SELECT balance INTO bal FROM pbench_acct WHERE id = acct;
END;

interp 349 µs · JIT 332 µs → JIT 5% faster.

2. p_sum_region

CREATE PROCEDURE p_sum_region(IN reg INT, OUT total BIGINT)
BEGIN
  SELECT COALESCE(SUM(balance),0) INTO total FROM pbench_acct WHERE region = reg;
END;

interp 983 µs · JIT 977 µs → ≈ parity.

3. p_count_active

CREATE PROCEDURE p_count_active(OUT cnt INT)
BEGIN
  SELECT COUNT(*) INTO cnt FROM pbench_acct WHERE status = 1;
END;

interp 951 µs · JIT 954 µs → ≈ parity.

4. p_cursor_sum

CREATE PROCEDURE p_cursor_sum(OUT total BIGINT)
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE b BIGINT;
  DECLARE cur CURSOR FOR SELECT balance FROM pbench_acct;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  SET total = 0;
  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO b;
    IF done = 1 THEN LEAVE read_loop; END IF;
    SET total = total + b;
  END LOOP;
  CLOSE cur;
END;

interp 3.74 ms · JIT 3.96 ms → interp 6% faster.

5. p_apply_interest

CREATE PROCEDURE p_apply_interest(IN bps INT)
BEGIN
  UPDATE pbench_acct SET balance = balance + balance * bps DIV 10000;
END;

interp 9.85 ms · JIT 9.50 ms → JIT 4% faster.

6. p_transfer

CREATE PROCEDURE p_transfer(IN src INT, IN dst INT, IN amt INT, OUT ok INT)
BEGIN
  DECLARE sb BIGINT DEFAULT 0;
  DECLARE EXIT HANDLER FOR SQLEXCEPTION SET ok = 0;
  SET ok = 1;
  SELECT balance INTO sb FROM pbench_acct WHERE id = src;
  IF sb < amt THEN
    SET ok = 0;
  ELSE
    UPDATE pbench_acct SET balance = balance - amt WHERE id = src;
    UPDATE pbench_acct SET balance = balance + amt WHERE id = dst;
  END IF;
END;

interp 1.25 ms · JIT 1.30 ms → interp 3% faster.

7. p_insert_log

CREATE PROCEDURE p_insert_log(IN acct INT, IN d INT)
BEGIN
  INSERT INTO pbench_log(acct_id, delta) VALUES (acct, d);
END;

interp 650 µs · JIT 634 µs → JIT 2% faster.

8. p_batch_log

CREATE PROCEDURE p_batch_log(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    INSERT INTO pbench_log(acct_id, delta) VALUES (i MOD 1000, i);
    SET i = i + 1;
  END WHILE;
END;

interp 263 ms · JIT 265 ms → ≈ parity.

9. p_upsert

CREATE PROCEDURE p_upsert(IN acct INT, IN delta INT)
BEGIN
  INSERT INTO pbench_acct(id, balance, status, region) VALUES (acct, delta, 1, acct MOD 10)
    ON DUPLICATE KEY UPDATE balance = balance + delta;
END;

interp 999 µs · JIT 914 µs → JIT 9% faster.

10. p_classify

CREATE PROCEDURE p_classify()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE aid INT;
  DECLARE bal BIGINT;
  DECLARE cur CURSOR FOR SELECT id, balance FROM pbench_acct;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  OPEN cur;
  cl: LOOP
    FETCH cur INTO aid, bal;
    IF done = 1 THEN LEAVE cl; END IF;
    UPDATE pbench_acct SET status = CASE WHEN bal >= 50000 THEN 2 WHEN bal >= 1000 THEN 1 ELSE 0 END WHERE id = aid;
  END LOOP;
  CLOSE cur;
END;

interp 331 ms · JIT 330 ms → ≈ parity.

Why interpreter ≈ JIT here: every one of these routines spends almost all of its running time in SQL execution, so how the PL statements themselves are executed has little effect on the total.
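The same point can be made with an Amdahl's-law estimate. The sketch below is illustrative only: `overall_speedup` and the example inputs are assumptions chosen for the arithmetic, not measurements from this PR.

```python
def overall_speedup(pl_fraction: float, pl_speedup: float) -> float:
    """Amdahl's law: whole-call speedup when only the PL-dispatch share gets faster.

    pl_fraction -- share of call time spent in PL dispatch (hypothetical input)
    pl_speedup  -- how much faster the JIT runs that share (hypothetical input)
    """
    if not 0.0 <= pl_fraction <= 1.0:
        raise ValueError("pl_fraction must be in [0, 1]")
    if pl_speedup <= 0.0:
        raise ValueError("pl_speedup must be positive")
    return 1.0 / ((1.0 - pl_fraction) + pl_fraction / pl_speedup)


if __name__ == "__main__":
    # Hypothetical: PL dispatch is 2% of the call and the JIT runs it 2x faster.
    print(f"{overall_speedup(0.02, 2.0):.4f}")
```

With those hypothetical inputs the whole call gets only about 1% faster (a factor of 1/0.99), which is the same order as the differences measured in Part 1.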

Part 2 — three extreme pure-PL cases: JIT wins only here

These procedures run no SQL, so the per-iteration AST walk is the whole runtime: the interpreter dispatches every node, while the JIT runs precompiled machine code. The more control flow each case packs into an iteration, the wider the gap.
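To make "dispatches every node" concrete, here is a toy tree-walking interpreter. It is a minimal sketch, not the ObPLInterpreter from this PR: the node classes, `Interp`, and `count_to` are all hypothetical, and expressions are plain Python callables to keep it short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Env = Dict[str, int]
Expr = Callable[[Env], int]   # expressions are kept as plain callables for brevity


@dataclass
class Assign:
    name: str
    expr: Expr


@dataclass
class If:
    arms: List[Tuple[Expr, list]]   # (condition, body) pairs: IF / ELSEIF ...


@dataclass
class While:
    cond: Expr
    body: list


class Interp:
    def __init__(self) -> None:
        self.dispatches = 0         # statement nodes visited so far

    def run(self, stmts: list, env: Env) -> None:
        for stmt in stmts:
            self.exec(stmt, env)

    def exec(self, stmt, env: Env) -> None:
        self.dispatches += 1        # paid on every visit, every iteration
        if isinstance(stmt, Assign):
            env[stmt.name] = stmt.expr(env)
        elif isinstance(stmt, If):
            for cond, body in stmt.arms:   # arms are tested one by one
                if cond(env):
                    self.run(body, env)
                    break
        elif isinstance(stmt, While):
            while stmt.cond(env):
                self.run(stmt.body, env)
        else:
            raise TypeError(f"unknown statement: {stmt!r}")


def count_to(n: int) -> Tuple[Env, int]:
    """Run a loop shaped like the benchmarks: an all-false IF chain plus i = i + 1."""
    false_arm = (lambda env: False, [Assign("s", lambda env: env["s"] + 1)])
    program = [
        Assign("i", lambda env: 0),
        Assign("s", lambda env: 0),
        While(lambda env: env["i"] < n, [
            If([false_arm] * 3),
            Assign("i", lambda env: env["i"] + 1),
        ]),
    ]
    interp = Interp()
    env: Env = {}
    interp.run(program, env)
    return env, interp.dispatches
```

`count_to(10)` visits 23 statement nodes: 3 outside the loop plus 2 per iteration. Compiled code pays no such per-node dispatch, which is the only cost these pure-PL cases measure.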

chain_if100, n = 45000

CREATE PROCEDURE chain_if100(IN n INT, OUT s BIGINT)
BEGIN
  DECLARE i INT DEFAULT 0;
  SET s = 0;
  WHILE i < n DO
    IF FALSE THEN SET s = s + 1;
    ELSEIF FALSE THEN SET s = s + 1;        -- × 99 arms, all false, no ELSE
    END IF;
    SET i = i + 1;
  END WHILE;
END;

interp 25.72 µs/iter · JIT 17.50 µs/iter → JIT ~1.47×. The 100 IF/ELSEIF arms form a deeply nested AST that the interpreter walks down arm by arm; the JIT compiled the chain to a straight compare-and-jump run.

iter_block50, n = 370000

CREATE PROCEDURE iter_block50(IN n INT, OUT s BIGINT)
BEGIN
  DECLARE i INT DEFAULT 0;
  SET s = 0;
  lp: WHILE i < n DO
    BEGIN
     BEGIN                                  -- 50 literal nested BEGIN…END levels
      ...
       SET i = i + 1; ITERATE lp;           -- innermost: jump back out through all 50
      ...
     END;
    END;
  END WHILE;
END;

interp 3.28 µs/iter · JIT 1.32 µs/iter → JIT ~2.48×. Each pass descends 50 nested blocks and ITERATE unwinds back out through all of them; the JIT flattened the nest to direct jumps.
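One common way for a tree-walker to implement ITERATE is to return a signal that every enclosing block must notice and pass upward. The sketch below illustrates that technique under stated assumptions; it is not taken from this PR, and `Walker`, the node classes, and `nested_iterate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    body: list


@dataclass
class Iterate:
    label: str


@dataclass
class Incr:
    name: str


@dataclass
class LabeledWhile:
    label: str
    name: str       # loop runs while env[name] < limit
    limit: int
    body: list


class Walker:
    def __init__(self) -> None:
        self.unwind_steps = 0   # blocks that had to pass an ITERATE signal upward

    def run(self, stmts: list, env: dict) -> Optional[str]:
        """Execute statements; return a pending ITERATE label, or None."""
        for stmt in stmts:
            signal = self.exec(stmt, env)
            if signal is not None:
                return signal   # stop this list and hand the signal upward
        return None

    def exec(self, stmt, env: dict) -> Optional[str]:
        if isinstance(stmt, Incr):
            env[stmt.name] += 1
            return None
        if isinstance(stmt, Iterate):
            return stmt.label
        if isinstance(stmt, Block):
            signal = self.run(stmt.body, env)
            if signal is not None:
                self.unwind_steps += 1
            return signal
        if isinstance(stmt, LabeledWhile):
            while env[stmt.name] < stmt.limit:
                signal = self.run(stmt.body, env)
                if signal is not None and signal != stmt.label:
                    return signal   # aimed at an outer loop
            return None
        raise TypeError(f"unknown statement: {stmt!r}")


def nested_iterate(depth: int, n: int) -> int:
    """Build `depth` nested blocks around `i += 1; ITERATE lp` and run n passes."""
    body: list = [Incr("i"), Iterate("lp")]
    for _ in range(depth):
        body = [Block(body)]
    walker = Walker()
    env = {"i": 0}
    walker.run([LabeledWhile("lp", "i", n, body)], env)
    return walker.unwind_steps
```

In this toy model `nested_iterate(50, 4)` performs 200 unwind steps, 50 per pass, whereas a compiled loop can branch straight back to the loop head.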

flat_block100, n = 540000

CREATE PROCEDURE flat_block100(IN n INT, OUT s BIGINT)
BEGIN
  DECLARE i INT DEFAULT 0;
  SET s = 0;
  WHILE i < n DO
    BEGIN END;                              -- × 100, flat empty blocks
    SET i = i + 1;
  END WHILE;
END;

interp 2.75 µs/iter · JIT 1.33 µs/iter → JIT ~2.07×. The interpreter enters and exits all 100 empty blocks every iteration; the JIT compiled them away entirely.
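The three speedup ratios can be rechecked from the per-iteration figures quoted above. The script below also derives total loop time, assuming each µs/iter figure is total time divided by n; that reading is an assumption, as the PR does not state how the figures were obtained.

```python
# (name, iterations, interpreter µs/iter, JIT µs/iter), copied from the results above
CASES = [
    ("chain_if100", 45_000, 25.72, 17.50),
    ("iter_block50", 370_000, 3.28, 1.32),
    ("flat_block100", 540_000, 2.75, 1.33),
]


def summarize(cases):
    """Return (name, JIT speedup, interpreter seconds, JIT seconds) per case."""
    rows = []
    for name, n, interp_us, jit_us in cases:
        rows.append((
            name,
            interp_us / jit_us,       # how many times faster the JIT is
            n * interp_us / 1e6,      # total interpreter time in seconds
            n * jit_us / 1e6,         # total JIT time in seconds
        ))
    return rows


if __name__ == "__main__":
    for name, speedup, interp_s, jit_s in summarize(CASES):
        print(f"{name:14s} JIT {speedup:.2f}x  interp {interp_s:.2f} s  JIT {jit_s:.2f} s")
```

The computed ratios round to 1.47×, 2.48×, and 2.07×, matching the figures reported for the three cases.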

Conclusion

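In realistic production use, JIT-compiled and interpreted PL perform about the same: across the ten procedures in Part 1, the difference never exceeds 9% in either direction. The JIT pulls ahead, by roughly 1.5× to 2.5× in Part 2, only in extreme cases that execute no SQL and consist almost entirely of dense or deeply nested control flow, which rarely occur in practice.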

Execute PL by walking the resolved ObPLStmt tree (ObPLInterpreter) instead of JIT-compiling routines with LLVM, and remove the ORC-JIT code generator and the objit module. The interpreter dispatches blocks, DECLARE ... DEFAULT, assignment (including SET @user_var / @@sys_var and obj-access targets such as a trigger's NEW.col), IF/ELSEIF, CASE, WHILE, LOOP, REPEAT, LEAVE, ITERATE, DO, embedded SQL, RETURN (with deep-copied results), cursors (DECLARE/OPEN/FETCH/CLOSE), exception handling (DECLARE HANDLER, SIGNAL, completion conditions), PRAGMA INTERFACE routines, and nested CALL with OUT/INOUT copy-back. Loops poll for KILL and query/transaction timeout at the same cadence the JIT used. It passes the full PL mysqltest suite.
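The loop polling mentioned above can be pictured as a check made once every fixed number of iterations. This is a minimal sketch under stated assumptions: `run_loop`, `Interrupted`, and the `check_every` value are hypothetical, and the actual cadence used by the interpreter and the former JIT is not given in this description.

```python
import time
from typing import Callable


class Interrupted(Exception):
    """Raised when the session was killed or the deadline has passed."""


def run_loop(cond: Callable[[], bool], body: Callable[[], None], *,
             is_killed: Callable[[], bool], deadline: float,
             check_every: int = 1024,   # hypothetical cadence, not the real one
             now: Callable[[], float] = time.monotonic) -> int:
    """Run `body` while `cond()` holds, polling for KILL/timeout every N passes."""
    iterations = 0
    while cond():
        if iterations % check_every == 0:
            if is_killed():
                raise Interrupted("killed")
            if now() >= deadline:
                raise Interrupted("timeout")
        body()
        iterations += 1
    return iterations
```

Polling on a fixed interval keeps the check off the hot path of most iterations, at the cost of reacting to KILL or a timeout up to `check_every` iterations late.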

cao1629 force-pushed the pr-06-11-interpreter branch from d81791f to 037f733 on June 12, 2026, 02:27

Merge remote-tracking branch 'upstream/master' into pr-06-11-interpreter

e737755

cao1629 wants to merge 2 commits into oceanbase:master from cao1629:pr-06-11-interpreter
