const steven = {
  role: "AI Engineer · CTO @ INTU",
  focus: ["LLM agents", "agentic issue-fix pipelines", "RAG",
          "multimodal doc + image ingestion", "evals", "observability", "fully agentic systems"],
  stack: ["React", "Node", "Python", "Rust", "Postgres", "GraphQL"],
  web3: ["MPC", "DKG", "EVM", "Solana", "Solidity"],
  shipping: "production agent systems",
};

I build agent systems that survive contact with production: tool-using LLMs wired through MCP, grounded by RAG, gated by eval harnesses, and instrumented end-to-end so failures are observable instead of mysterious. Before AI, I spent four years deep in Web3, leading an MPC wallet-infrastructure team across cryptography, smart contracts, and Rust.

ideas ──▶ evals ──▶ guardrails ──▶ build ──▶ review ──▶ production
  ▲                                                          │
  └─────────── observe · measure · iterate · harden ─────────┘

Design intent before code: I write the evals and guardrails before a line ships, then let observability close the loop, so every production failure feeds the next iteration instead of disappearing.
AI / ML
Languages & Core
Platforms
Web3
INTU: Web3 onboarding via MPC
CTO · Lead Engineer
Open-source NPM package orchestrating distributed key generation (DKG) and multi-party computation, removing seed phrases from the onboarding flow. Cross-chain transaction flows across EVM networks, bridged to Solana: sending a Solana tx authorized by an EVM signature. Self-hosted The Graph indexers for chains without hosted support.

Autonomous coding agent
An agent that triages open GitHub issues, reproduces the bug, drafts a fix, and opens a PR, closing the loop from issue to reviewable change. Proof: medplum/medplum#9293, an upstream OSS fix landed fully agentically (working branch).

Quant Research · WIP
Backtest harness and execution research for systematic futures strategies, applying the same eval + observability discipline I use on AI agents to strategy selection, slippage modeling, and live risk.

Selected Project · 2026
LLM agent layered onto an open-source EHR that reads patient charts and relays clinical context on demand. The lab-report ingestion pipeline produces summaries with source-page citations, so clinicians can verify any agent-surfaced claim: a RAG pattern tuned for high-stakes clinical use.

🍻 Happy Hour Friends: Crowdsourced happy hour finder
Live · 2026
Fully agent-operated site: every update, whether parsed automatically from the web or submitted by users, passes strict agentic moderation gates (classify → verify, versioned prompts, audited apply path) before going live. The test: can my agent safeguards run the site without my intervention? The product itself is dead simple: venues and deals in one sortable, filterable view, kept current by crowdsourcing.

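The classify → verify gate with an audited apply path can be sketched roughly as follows. This is a minimal illustration only: every name here (`classify`, `verify`, `moderate`, `PROMPT_VERSION`, the venue data) is hypothetical, and the classifier is a rule-based stand-in for what would be an LLM call in a real system.

```javascript
// Hypothetical sketch: all names and rules are illustrative, not the site's real code.
const PROMPT_VERSION = "moderation-v1"; // assumed label for the versioned prompt

// Gate 1: classify the update. A real system would call an LLM here;
// this stand-in uses simple rules so the example runs offline.
function classify(update) {
  if (!update.venue || !update.deal) return "reject";
  return /https?:\/\//i.test(update.deal) ? "suspicious" : "ok";
}

// Gate 2: verify the update against a trusted source (stubbed as a Set of known venues).
function verify(update, knownVenues) {
  return knownVenues.has(update.venue);
}

// Audited apply path: every decision is logged with the prompt version,
// and only updates that pass both gates reach the live store.
function moderate(update, knownVenues, liveStore, auditLog) {
  const label = classify(update);
  const verified = label === "ok" && verify(update, knownVenues);
  const decision = verified ? "applied" : "held";
  auditLog.push({ promptVersion: PROMPT_VERSION, update, label, verified, decision });
  if (verified) liveStore.push(update);
  return decision;
}

const knownVenues = new Set(["The Tap Room"]);
const liveStore = [];
const auditLog = [];
moderate({ venue: "The Tap Room", deal: "$5 drafts 4-6pm" }, knownVenues, liveStore, auditLog);
moderate({ venue: "Unknown Bar", deal: "Free drinks" }, knownVenues, liveStore, auditLog);
console.log(liveStore.length, auditLog.length); // 1 2
```

Logging held updates alongside applied ones is the point of the audit trail in this sketch: a rejected submission stays inspectable, and the recorded prompt version shows which gate configuration made the call.
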
🍽️ GURUPass / Pass Rewards: Restaurant AI Agents
Lead AI & Blockchain Engineer
Tool-using LLM agents handling order intake and menu Q&A, wired through MCP with structured-output validation. A curated eval set plus an offline regression harness catches failures before deploy; production traces drive failure-mode analysis. A personalization layer surfaces targeted coupons from purchase history.

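One way to picture structured-output validation paired with an offline regression harness is the sketch below. It rests on stated assumptions: the order schema, the `fakeAgent` stub (standing in for a real model call), and the three eval cases are all invented for illustration and are not taken from the production system.

```javascript
// Hypothetical sketch: the schema, agent stub, and eval cases are assumptions.

// Structured-output validation: an order needs a non-empty item name
// and a positive integer quantity.
function validateOrder(output) {
  return (
    typeof output === "object" && output !== null &&
    typeof output.item === "string" && output.item.length > 0 &&
    Number.isInteger(output.quantity) && output.quantity > 0
  );
}

// Stand-in for the agent. A real harness would call the model and parse its JSON.
function fakeAgent(message) {
  const match = message.match(/(\d+)\s+(.+)/);
  return match ? { item: match[2], quantity: Number(match[1]) } : { item: "", quantity: 0 };
}

// Curated eval set: each case pairs an input with the expected structured output.
const evalSet = [
  { input: "2 margherita pizzas", expected: { item: "margherita pizzas", quantity: 2 } },
  { input: "1 tiramisu", expected: { item: "tiramisu", quantity: 1 } },
  { input: "hello", expected: null }, // not an order: the output should fail validation
];

// Run every case and collect failures so a deploy script can block on them.
function runEvals(agent, cases) {
  const failures = [];
  for (const { input, expected } of cases) {
    const output = agent(input);
    const valid = validateOrder(output);
    const pass = expected === null
      ? !valid
      : valid && output.item === expected.item && output.quantity === expected.quantity;
    if (!pass) failures.push({ input, output });
  }
  return { total: cases.length, failures };
}

const report = runEvals(fakeAgent, evalSet);
console.log(`${report.total - report.failures.length}/${report.total} passed`);
```

In a pipeline like this, a deploy step would typically exit non-zero when `report.failures` is non-empty, which is what lets the harness catch regressions before release.
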
I burn a lot of tokens, on purpose. But spending them to look busy is waste.
The craft is signal per token: tight context, sharp evals, and failure modes that are observable instead of mysterious.
Every system above was designed, built, and shipped on a ~$100/month plan.





