Jacob Ortiz agentjakey

Jacob Ortiz

Physics-trained ML researcher & AI engineer working toward technical AI safety.
Mechanistic interpretability · Chain-of-thought faithfulness · Model evaluation · AI observability

About

I graduated from UC San Diego with a BS in Physics, and I work at the intersection of physics, machine learning, and AI safety.

My path into safety ran through high-energy particle physics ML: as an ML Research Assistant in the UCSD Duarte Lab, I co-authored a NeurIPS 2025 ML4PS paper on why Particle Transformer attention becomes sparse (arXiv:2512.00210). That work made me ask a broader safety question - when a model exposes something interpretable-looking (attention, reasoning traces, citations), how do we know it's actually causally connected to the computation driving behavior?

The problem I most want to work on: whether chain-of-thought reasoning and model explanations are faithful enough to support oversight. In parallel, I design and deploy production AI systems - RAG pipelines, evaluation harnesses, and full-stack ML tooling - with a focus on grounding, observability, and systems people can actually inspect.

Currently: extending ThoughtTrace into a multi-model CoT faithfulness benchmark · ML Engineering intern @ Experian · writing Latent Space weekly

AI Safety & Interpretability

Empirical safety work combining interpretability, evaluation engineering, and inspectable tooling.

Chain-of-Thought Faithfulness & Monitoring

Project	What it does
ThoughtTrace	Activation-level CoT faithfulness auditing. Residual-stream mean ablation + causal contribution scoring on Qwen2.5-7B. Key finding: output-faithful cases showed ~2× the activation shift of unfaithful ones - hidden influence invisible to output-level monitoring.
cot-faithfulness (live)	Interactive education site on when reasoning traces do and don't reflect real computation.
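ThoughtTrace's exact scoring is not spelled out here, but the general idea of mean ablation with causal contribution scoring can be shown on a toy. The sketch below assumes a hypothetical linear readout that maps the summed per-step activations to an answer logit. Each reasoning step's activation is replaced by the mean over a reference set, and the resulting drop in the logit is taken as that step's contribution. `readout`, `mean_vector`, and `contribution_scores` are illustrative names, not ThoughtTrace's code. A real run would hook a residual-stream layer of a model like Qwen2.5-7B instead of using a list of vectors.

```python
from typing import List

Vector = List[float]


def readout(residual: List[Vector], w: Vector) -> float:
    """Hypothetical answer logit: w dotted with the summed per-step activations."""
    summed = [sum(col) for col in zip(*residual)]
    return sum(wi * xi for wi, xi in zip(w, summed))


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a reference set of activations."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def contribution_scores(residual: List[Vector], reference: List[Vector],
                        w: Vector) -> List[float]:
    """Mean-ablate one reasoning step at a time and record the logit drop."""
    baseline = readout(residual, w)
    mu = mean_vector(reference)
    scores = []
    for i in range(len(residual)):
        # Swap step i's activation for the reference mean, keep the rest intact.
        ablated = residual[:i] + [mu] + residual[i + 1:]
        scores.append(baseline - readout(ablated, w))
    return scores


if __name__ == "__main__":
    steps = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]  # toy activation per CoT step
    reference = [[0.0, 0.0], [1.0, 1.0]]          # toy activations from other prompts
    print(contribution_scores(steps, reference, w=[1.0, 2.0]))  # [-0.5, 2.5, 0.0]
```

Because the toy readout is linear, each score reduces to w·(x_i - mu). Ablation is still the more useful framing: the same loop works unchanged when the readout is a nonlinear network, where no closed form exists.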

Mechanistic Interpretability & Probing

Project	What it does
SignBridge	Probes LLaVA-1.5-7B decoder layers for ASL hand-shape structure; logistic probes hit 81.7% at layer 16 (vs 5% chance), surfacing latent structure absent from generated text.
EmbeddingDrift	Concept-representation drift across Llama variants. Instruction tuning produces ~7.5× more embedding displacement than 4-bit quantization; demographic concepts over-represented among top drifters.
mechinterp-explore	Canonical GPT-2 Small circuit analysis with TransformerLens: induction heads, logit lens, activation patching on IOI, direct logit attribution.
neural-polysemanticity (live)	Interactive lab on how single neurons encode multiple concepts and why it complicates auditing.
RepOverLab (live)	Visualizes how embedding-based safety classifiers inherit ambiguity from representation geometry.
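SignBridge's result comes from a linear probe. A probe is a small classifier trained on frozen hidden states, so high accuracy indicates that the information is linearly decodable from that layer. The sketch below swaps in synthetic 2-D "activations" and a binary label for real LLaVA hidden states and the multiclass hand-shape label. `train_probe` and `accuracy` are hypothetical helpers, not the project's code.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_probe(xs: Sequence[Sequence[float]], ys: Sequence[int],
                lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Full-batch gradient descent on mean binary cross-entropy (logistic probe)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                gw[j] += err * xj / n
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def accuracy(w: Sequence[float], b: float,
             xs: Sequence[Sequence[float]], ys: Sequence[int]) -> float:
    """Fraction of points whose predicted class (logit > 0) matches the label."""
    preds = [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


if __name__ == "__main__":
    # Synthetic stand-in for frozen layer activations; the label is linearly encoded.
    xs = [[2.0, 1.0], [1.5, -1.0], [3.0, 0.5], [-2.0, 0.3], [-1.0, -1.0], [-2.5, 1.0]]
    ys = [1, 1, 1, 0, 0, 0]
    w, b = train_probe(xs, ys)
    print(accuracy(w, b, xs, ys), "vs chance", 1 / len(set(ys)))
```

In practice you would fit one probe per layer on held-out splits and compare each probe's accuracy against chance. That per-layer comparison is how a peak such as "layer 16" gets identified.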

Evaluation, Monitoring & Failure Modes

Project	What it does
VeritasLens	Claim-level hallucination detection: QLoRA-tuned Gemma, three-tier evidence retrieval, deterministic reliability scoring, causal-mediation token attribution. Held-out accuracy 50% → 90%.
PromptSurgeon	800-trial controlled study of prompt strategies in agentic systems, with power analysis, bootstrap CIs, and cost measurement.
DeceptionScope	Model-organisms-of-misalignment tooling for studying deceptive behavior.
AlignmentLens (live)	Live reward-hacking demo - watch it happen, then probe why.
FailModeAtlas (live)	Interactive map of 24 AI failure modes across 6 conceptual families.
recursive-rd-atlas (live)	Interactive essay on recursive AI R&D safety concerns and oversight failure modes.
epistemic-atlas	Human-AI workflow for building trustworthy claim graphs from messy disputes.
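PromptSurgeon's description mentions power analysis and bootstrap CIs. Below is a minimal sketch of both, assuming each trial is a binary success/failure and two strategies are compared by success rate. `n_per_arm` uses the textbook normal-approximation sample-size formula for two independent proportions, and `bootstrap_diff_ci` is a percentile bootstrap over each arm. Both names are hypothetical, and neither function is presented as the project's implementation.

```python
import math
import random
from statistics import NormalDist
from typing import Sequence, Tuple


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Trials per strategy to detect p1 vs p2 with a two-sided test (normal approx.)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p1 - p2) ** 2)


def bootstrap_diff_ci(a: Sequence[int], b: Sequence[int], n_boot: int = 2000,
                      alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for mean(a) - mean(b), resampling each arm separately."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lo = diffs[round(alpha / 2 * n_boot)]
    hi = diffs[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    print(n_per_arm(0.5, 0.6))  # 385 trials per strategy
    strategy_a = [1] * 62 + [0] * 38  # toy outcomes, not real study data
    strategy_b = [1] * 50 + [0] * 50
    print(bootstrap_diff_ci(strategy_a, strategy_b))
```

Under this approximation, detecting a shift from 50% to 60% success at alpha = 0.05 with 80% power takes 385 trials per strategy. Small effect sizes quickly dominate a trial budget.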

AI / ML Engineering

Production systems and infrastructure, with an emphasis on grounding, observability, and reliability.

Project	What it does
RAG-Snowflake-Policy-Assistant	Enterprise RAG on Snowflake Cortex built around zero-hallucination constraints: PDF ingestion, recursive chunking, semantic retrieval, source-cited generation, and a Streamlit UI.
Trace-Forge	Framework-agnostic observability for multi-step LLM pipelines: nested trace capture, token/cost attribution, waterfall UI, replay, OpenTelemetry-style export.
credit-risk-threshold-lab	Binary credit-risk classification (logistic regression vs XGBoost) framed around the business decision, not just accuracy.
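The nested trace capture and token/cost attribution described for Trace-Forge can be sketched with a context-manager tracer. Each `with` block opens a span, and spans nest according to the call structure. Token totals roll up from children to parents. `Tracer`, `Span`, `render`, and the flat `PRICE_PER_1K_TOKENS` rate are hypothetical, and this is not Trace-Forge's API. It also does not use the real OpenTelemetry SDK; it only mirrors the parent/child span idea.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List

PRICE_PER_1K_TOKENS = 0.002  # hypothetical flat price, for illustration only


@dataclass
class Span:
    name: str
    tokens: int = 0
    start: float = 0.0
    end: float = 0.0
    children: List["Span"] = field(default_factory=list)

    def total_tokens(self) -> int:
        """Tokens recorded on this span plus every nested span."""
        return self.tokens + sum(c.total_tokens() for c in self.children)

    def total_cost(self) -> float:
        """Cost attributed to this subtree at the hypothetical flat rate."""
        return self.total_tokens() / 1000 * PRICE_PER_1K_TOKENS


class Tracer:
    """Collects spans into trees; nesting follows the `with` blocks."""

    def __init__(self) -> None:
        self.roots: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        s = Span(name=name, start=time.perf_counter())
        # Attach to the enclosing span if one is open, otherwise start a new root.
        parent_list = self._stack[-1].children if self._stack else self.roots
        parent_list.append(s)
        self._stack.append(s)
        try:
            yield s
        finally:
            s.end = time.perf_counter()
            self._stack.pop()


def render(s: Span, depth: int = 0) -> List[str]:
    """Indented text tree (a stand-in for a waterfall view) with tokens and wall time."""
    ms = (s.end - s.start) * 1000
    lines = [f"{'  ' * depth}{s.name}: {s.total_tokens()} tok, {ms:.1f} ms"]
    for c in s.children:
        lines.extend(render(c, depth + 1))
    return lines


if __name__ == "__main__":
    tracer = Tracer()
    with tracer.span("pipeline"):
        with tracer.span("retrieve") as r:
            r.tokens = 120  # illustrative numbers
        with tracer.span("generate") as g:
            g.tokens = 380
    print("\n".join(render(tracer.roots[0])))
```

Keeping the open-span stack inside the tracer means pipeline code never passes parent IDs around. The cost of that design is that a plain list stack is not safe across threads or async tasks, where something like `contextvars` would be needed.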

Additional production work (not all public): RAG and automation systems at American Refrigeration - internal knowledge retrieval, role-gated workflow platforms, and AI evaluations adopted by leadership.

Physics & Scientific ML

Project	What it does
SAL-T4HEP	Efficient transformer architecture for particle identification - code accompanying the NeurIPS 2025 ML4PS paper (arXiv:2512.00210).
miet-clifford	Measurement-induced entanglement phase transitions in 1D random Clifford circuits: stabilizer tableau simulation, GF(2) entropy, finite-size scaling.
Phys_139_project	Custom lightweight CNN for medical image classification: 5× fewer parameters than EfficientNetV2-S, with higher accuracy and AUC.
Arduino-FM-Radio-Transmitter	SI4713-based FM transmitter with auto band scanning, live tuning, and switchable audio input.
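The "GF(2) entropy" in miet-clifford relies on the fact that entanglement in a stabilizer state reduces to linear algebra over GF(2). Take a pure n-qubit stabilizer state with n independent generators. A standard identity gives the entanglement entropy of a region A, in bits, as S_A = rank_GF(2)(M_A) - |A|. Here M_A holds the generators' X/Z bits restricted to the qubits in A. The sketch assumes a simple (x_bits, z_bits) encoding per generator and drops phases, which do not affect entropy. It illustrates the identity; it is not the repository's tableau code.

```python
from typing import List, Sequence, Tuple

# One stabilizer generator as (x_bits, z_bits), one bit per qubit; phases omitted.
Generator = Tuple[Sequence[int], Sequence[int]]


def gf2_rank(rows: List[int]) -> int:
    """Rank over GF(2) of bit-vector rows stored as ints, by XOR elimination."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit; clear it from every remaining row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def entanglement_entropy(gens: Sequence[Generator], region: Sequence[int]) -> int:
    """S_A in bits for a pure stabilizer state given its n independent generators."""
    rows = []
    for x, z in gens:
        mask = 0
        for k, q in enumerate(region):  # keep only the X/Z bits on qubits in A
            mask |= x[q] << (2 * k)
            mask |= z[q] << (2 * k + 1)
        rows.append(mask)
    return gf2_rank(rows) - len(region)


if __name__ == "__main__":
    bell = [([1, 1], [0, 0]), ([0, 0], [1, 1])]  # stabilizers XX, ZZ
    ghz3 = [([1, 1, 1], [0, 0, 0]),              # XXX
            ([0, 0, 0], [1, 1, 0]),              # ZZI
            ([0, 0, 0], [0, 1, 1])]              # IZZ
    print(entanglement_entropy(bell, [0]))     # 1
    print(entanglement_entropy(ghz3, [0, 1]))  # 1
```

In a random Clifford circuit, the same function would be applied to the evolving tableau after each layer of gates and measurements. Averaging S_A over circuits and system sizes gives the curves used for finite-size scaling.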

Writing & Community

Latent Space - weekly Substack making technical AI safety ideas accessible without oversimplifying.
UCSD ML / AI / AI Safety Learning Group - organizer, 50+ students.
BlueDot Impact - Technical AI Safety certificate.

Tech Stack

Research / Interpretability

Languages

ML Infra & Tooling

Apps, APIs & Deployment

Curiosity and connection, in this universe we all share.
