Physics-trained ML researcher & AI engineer working toward technical AI safety.
Mechanistic interpretability · Chain-of-thought faithfulness · Model evaluation · AI observability
I'm a UC San Diego Physics B.S. graduate working at the intersection of physics, machine learning, and AI safety.
My path into safety ran through high-energy particle physics ML. As an ML Research Assistant in the UCSD Duarte Lab, I co-authored a NeurIPS 2025 ML4PS paper on why Particle Transformer attention becomes sparse (arXiv:2512.00210). That work raised a broader safety question: when a model exposes something interpretable-looking (attention, reasoning traces, citations), how do we know it is actually causally connected to the computation driving its behavior?
The problem I most want to work on is whether chain-of-thought reasoning and model explanations are faithful enough to support oversight. In parallel, I design and deploy production AI systems (RAG pipelines, evaluation harnesses, and full-stack ML tooling) with a focus on grounding, observability, and systems people can actually inspect.
Currently: extending ThoughtTrace into a multi-model CoT faithfulness benchmark · ML Engineering intern @ Experian · writing Latent Space weekly
Empirical safety work combining interpretability, evaluation engineering, and inspectable tooling.
| Project | What it does |
|---|---|
| ThoughtTrace | Activation-level CoT faithfulness auditing: residual-stream mean ablation plus causal contribution scoring on Qwen2.5-7B. Key finding: output-faithful cases showed ~2× the activation shift of unfaithful ones, evidence of hidden influence invisible to output-level monitoring. |
| cot-faithfulness (live) | Interactive education site on when reasoning traces do and don't reflect real computation. |
| Project | What it does |
|---|---|
| SignBridge | Probes LLaVA-1.5-7B decoder layers for ASL hand-shape structure; logistic-regression probes reach 81.7% accuracy at layer 16 (vs. 5% chance), surfacing latent structure absent from the generated text. |
| EmbeddingDrift | Concept-representation drift across Llama variants. Instruction tuning produces ~7.5× more embedding displacement than 4-bit quantization; demographic concepts over-represented among top drifters. |
| mechinterp-explore | Canonical GPT-2 Small circuit analysis with TransformerLens: induction heads, logit lens, activation patching on IOI, direct logit attribution. |
| neural-polysemanticity (live) | Interactive lab on how single neurons encode multiple concepts and why it complicates auditing. |
| RepOverLab (live) | Visualizes how embedding-based safety classifiers inherit ambiguity from representation geometry. |
| Project | What it does |
|---|---|
| VeritasLens | Claim-level hallucination detection: QLoRA-tuned Gemma, three-tier evidence retrieval, deterministic reliability scoring, causal-mediation token attribution. Held-out accuracy 50% → 90%. |
| PromptSurgeon | 800-trial controlled study of prompt strategies in agentic systems, with power analysis, bootstrap CIs, and cost measurement. |
| DeceptionScope | Model-organisms-of-misalignment tooling for studying deceptive behavior. |
| AlignmentLens (live) | Live reward-hacking demo: watch it happen, then probe why. |
| FailModeAtlas (live) | Interactive map of 24 AI failure modes across 6 conceptual families. |
| recursive-rd-atlas (live) | Interactive essay on recursive AI R&D safety concerns and oversight failure modes. |
| epistemic-atlas | Human-AI workflow for building trustworthy claim graphs from messy disputes. |
Production systems and infrastructure, with an emphasis on grounding, observability, and reliability.
| Project | What it does |
|---|---|
| RAG-Snowflake-Policy-Assistant | Enterprise RAG on Snowflake Cortex: PDF ingestion, recursive chunking, semantic retrieval, source-cited generation, and a Streamlit UI, all built around zero-hallucination constraints. |
| Trace-Forge | Framework-agnostic observability for multi-step LLM pipelines: nested trace capture, token/cost attribution, waterfall UI, replay, OpenTelemetry-style export. |
| credit-risk-threshold-lab | Binary credit-risk classification (logistic regression vs XGBoost) framed around the business decision, not just accuracy. |
Additional production work (not all public): RAG and automation systems at American Refrigeration, including internal knowledge retrieval, role-gated workflow platforms, and AI evaluations adopted by leadership.
| Project | What it does |
|---|---|
| SAL-T4HEP | Efficient transformer architecture for particle identification; code accompanying the NeurIPS 2025 ML4PS paper (arXiv:2512.00210). |
| miet-clifford | Measurement-induced entanglement phase transitions in 1D random Clifford circuits: stabilizer tableau simulation, GF(2) entropy, finite-size scaling. |
| Phys_139_project | Custom lightweight CNN for medical image classification, with 5× fewer parameters than EfficientNetV2-S and higher accuracy and AUC. |
| Arduino-FM-Radio-Transmitter | SI4713-based FM transmitter with auto band scanning, live tuning, and switchable audio input. |
- Latent Space: weekly Substack making technical AI safety ideas accessible without oversimplifying.
- UCSD ML / AI / AI Safety Learning Group: organizer, 50+ students.
- BlueDot Impact: Technical AI Safety certificate.
Curiosity and connection, in this universe we all share.

