agentjakey/README.md

Jacob Ortiz

Physics-trained ML researcher & AI engineer working toward technical AI safety.
Mechanistic interpretability · Chain-of-thought faithfulness · Model evaluation · AI observability

LinkedIn arXiv Substack


About

I'm a UC San Diego Physics BS graduate working at the intersection of physics, machine learning, and AI safety.

My path into safety ran through ML for high-energy particle physics: as an ML Research Assistant in the UCSD Duarte Lab, I co-authored a NeurIPS 2025 ML4PS paper on why Particle Transformer attention becomes sparse (arXiv:2512.00210). That work made me ask a broader safety question: when a model exposes something interpretable-looking (attention, reasoning traces, citations), how do we know it's actually causally connected to the computation driving behavior?

The problem I most want to work on: whether chain-of-thought reasoning and model explanations are faithful enough to support oversight. In parallel, I design and deploy production AI systems - RAG pipelines, evaluation harnesses, and full-stack ML tooling - with a focus on grounding, observability, and systems people can actually inspect.

Currently: extending ThoughtTrace into a multi-model CoT faithfulness benchmark · ML Engineering intern @ Experian · writing Latent Space weekly


AI Safety & Interpretability

Empirical safety work combining interpretability, evaluation engineering, and inspectable tooling.

Chain-of-Thought Faithfulness & Monitoring

| Project | What it does |
| --- | --- |
| ThoughtTrace | Activation-level CoT faithfulness auditing. Residual-stream mean ablation + causal contribution scoring on Qwen2.5-7B. Key finding: output-faithful cases showed ~2× the activation shift of unfaithful ones - hidden influence invisible to output-level monitoring. |
| cot-faithfulness (live) | Interactive education site on when reasoning traces do and don't reflect real computation. |
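
As a rough illustration of what mean ablation measures, here is a toy sketch in plain Python. It is not ThoughtTrace's code: the linear `readout`, the `mean_ablate` and `ablation_shift` helpers, and all numbers are hypothetical stand-ins. The idea is to overwrite one activation coordinate with its dataset mean and see how far the output moves.

```python
from statistics import mean

def readout(activation, weights):
    """Toy stand-in for the rest of the model: a linear readout (assumption)."""
    return sum(a * w for a, w in zip(activation, weights))

def mean_ablate(activations, index):
    """Return copies of the activations with one coordinate set to its dataset mean."""
    mu = mean(act[index] for act in activations)
    return [act[:index] + [mu] + act[index + 1:] for act in activations]

def ablation_shift(activations, weights, index):
    """Average absolute change in the readout caused by mean-ablating one coordinate."""
    ablated = mean_ablate(activations, index)
    return mean(
        abs(readout(a, weights) - readout(b, weights))
        for a, b in zip(activations, ablated)
    )

# Hypothetical activations (3 examples, 2 coordinates) and readout weights.
acts = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
w = [2.0, 1.0]
shift_0 = ablation_shift(acts, w, 0)  # coordinate 0 varies, so ablating it moves the output
shift_1 = ablation_shift(acts, w, 1)  # coordinate 1 is constant, so ablating it changes nothing
```

A coordinate with a large shift carries information the output depends on. In a real model the readout would be the downstream forward pass rather than a dot product.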

Mechanistic Interpretability & Probing

| Project | What it does |
| --- | --- |
| SignBridge | Probes LLaVA-1.5-7B decoder layers for ASL hand-shape structure; logistic probes hit 81.7% at layer 16 (vs 5% chance), surfacing latent structure absent from generated text. |
| EmbeddingDrift | Concept-representation drift across Llama variants. Instruction tuning produces ~7.5× more embedding displacement than 4-bit quantization; demographic concepts over-represented among top drifters. |
| mechinterp-explore | Canonical GPT-2 Small circuit analysis with TransformerLens: induction heads, logit lens, activation patching on IOI, direct logit attribution. |
| neural-polysemanticity (live) | Interactive lab on how single neurons encode multiple concepts and why it complicates auditing. |
| RepOverLab (live) | Visualizes how embedding-based safety classifiers inherit ambiguity from representation geometry. |

Evaluation, Monitoring & Failure Modes

| Project | What it does |
| --- | --- |
| VeritasLens | Claim-level hallucination detection: QLoRA-tuned Gemma, three-tier evidence retrieval, deterministic reliability scoring, causal-mediation token attribution. Held-out accuracy 50% → 90%. |
| PromptSurgeon | 800-trial controlled study of prompt strategies in agentic systems, with power analysis, bootstrap CIs, and cost measurement. |
| DeceptionScope | Model-organisms-of-misalignment tooling for studying deceptive behavior. |
| AlignmentLens (live) | Live reward-hacking demo - watch it happen, then probe why. |
| FailModeAtlas (live) | Interactive map of 24 AI failure modes across 6 conceptual families. |
| recursive-rd-atlas (live) | Interactive essay on recursive AI R&D safety concerns and oversight failure modes. |
| epistemic-atlas | Human-AI workflow for building trustworthy claim graphs from messy disputes. |
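
For readers unfamiliar with bootstrap confidence intervals, the following is a minimal standard-library sketch of the percentile bootstrap for a success rate. It is an illustration only, not PromptSurgeon's analysis code: the `bootstrap_ci` name, its parameters, and the outcome data are hypothetical.

```python
import random
from statistics import mean

def bootstrap_ci(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        mean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical per-trial success flags (1 = task solved) for one prompt strategy.
outcomes = [1] * 70 + [0] * 30
low, high = bootstrap_ci(outcomes)
```

Comparing two strategies would typically bootstrap the difference in means rather than each mean separately, so that the interval answers the comparison directly.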

AI / ML Engineering

Production systems and infrastructure, with an emphasis on grounding, observability, and reliability.

| Project | What it does |
| --- | --- |
| RAG-Snowflake-Policy-Assistant | Enterprise RAG on Snowflake Cortex: PDF ingestion, recursive chunking, semantic retrieval, source-cited generation, Streamlit UI built around zero-hallucination constraints. |
| Trace-Forge | Framework-agnostic observability for multi-step LLM pipelines: nested trace capture, token/cost attribution, waterfall UI, replay, OpenTelemetry-style export. |
| credit-risk-threshold-lab | Binary credit-risk classification (logistic regression vs XGBoost) framed around the business decision, not just accuracy. |
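
To make "recursive chunking" concrete, here is a small sketch of the general technique in plain Python: split on the coarsest separator first and recurse with finer ones only when a piece is still too long. It is not the repository's implementation and does not use any Snowflake Cortex API. The `recursive_chunk` name, the separator order, and the size limit are assumptions.

```python
def recursive_chunk(text, max_chars=200, separators=("\n\n", "\n", ". ", " ")):
    """Split `text` into chunks of at most `max_chars`, preferring coarse separators."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # Piece is still too long: recurse with finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, rest))
    if current:
        chunks.append(current)
    return chunks

# Hypothetical policy text; note the separator is dropped at chunk boundaries.
doc = (
    "Policy A applies to all staff.\n\n"
    "Policy B covers cold storage. It requires gloves. It requires goggles."
)
chunks = recursive_chunk(doc, max_chars=40)
```

Splitting on paragraph breaks before sentences keeps related text together, which tends to help retrieval return passages that can be cited as a unit.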

Additional production work (not all public): RAG and automation systems at American Refrigeration - internal knowledge retrieval, role-gated workflow platforms, and AI evaluations adopted by leadership.


Physics & Scientific ML

| Project | What it does |
| --- | --- |
| SAL-T4HEP | Efficient transformer architecture for particle identification - code accompanying the NeurIPS 2025 ML4PS paper (arXiv:2512.00210). |
| miet-clifford | Measurement-induced entanglement phase transitions in 1D random Clifford circuits: stabilizer tableau simulation, GF(2) entropy, finite-size scaling. |
| Phys_139_project | Custom lightweight CNN for medical image classification - 5× fewer params than EfficientNetV2-S with higher accuracy and AUC. |
| Arduino-FM-Radio-Transmitter | SI4713-based FM transmitter with auto band scanning, live tuning, and switchable audio input. |
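
As background on the "GF(2) entropy" step, the sketch below computes the entanglement entropy of a pure stabilizer state from its generators using the commonly used relation S_A = rank(generators restricted to A) - |A|, with the rank taken over GF(2). It is an illustrative sketch, not code from miet-clifford: the function names and the (x_bits, z_bits) data layout are assumptions, and Pauli signs are ignored because they do not affect the entropy.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are packed into Python ints."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # eliminate the leading bit (addition mod 2)
    return len(pivots)

def entanglement_entropy(stabilizers, region):
    """Entropy in bits of `region` for a pure stabilizer state.

    `stabilizers` is a list of (x_bits, z_bits) pairs, one per generator,
    each a list of 0/1 flags per qubit (hypothetical layout).
    """
    rows = []
    for x_bits, z_bits in stabilizers:
        row = 0
        for q in region:
            # Keep only the X and Z bits of qubits inside the region.
            row = (row << 2) | (x_bits[q] << 1) | z_bits[q]
        rows.append(row)
    return gf2_rank(rows) - len(region)

# Bell pair: generators XX and ZZ.
bell = [([1, 1], [0, 0]), ([0, 0], [1, 1])]
# Product state |00>: generators ZI and IZ.
product = [([0, 0], [1, 0]), ([0, 0], [0, 1])]
s_bell = entanglement_entropy(bell, region=[0])
s_product = entanglement_entropy(product, region=[0])
```

The Bell pair gives one bit of entropy for a single qubit and the product state gives zero, which matches the expected values for those two states.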

Writing & Community

  • Latent Space - weekly Substack making technical AI safety ideas accessible without oversimplifying.
  • UCSD ML / AI / AI Safety Learning Group - organizer, 50+ students.
  • BlueDot Impact - Technical AI Safety certificate.

Tech Stack

Research / Interpretability: PyTorch, TransformerLens, HuggingFace, TensorFlow, Keras, NumPy, scikit-learn, Inspect

Languages: Python, TypeScript, JavaScript, C, SQL, R, LaTeX

ML Infra & Tooling: CUDA, Docker, Kubernetes, Unsloth, Ollama, Snowflake, Azure, Git

Apps, APIs & Deployment: FastAPI, Streamlit, Gradio, React, Next.js, Supabase, Anthropic, OpenAI

Curiosity and connection, in this universe we all share.

Pinned

  1. Trace-Forge (Python) - Framework-agnostic observability for multi-step LLM pipelines.
  2. PromptSurgeon (Jupyter Notebook) - Empirical study of prompt engineering strategies in agentic LLM systems, measuring performance and cost across 800 controlled trials.
  3. RAG-Snowflake-Policy-Assistant (Python) - Snowflake Cortex and Streamlit RAG assistant for grounded Q&A over internal policy and safety documents.
  4. EmbeddingDrift (Jupyter Notebook) - Framework for measuring and visualizing how concept representations drift across LLM versions, instruction tuning, and quantization.
  5. SignBridge (Jupyter Notebook) - Mechanistic interpretability project probing where ASL hand-shape information appears inside a vision-language model's decoder layers.
  6. VeritasLens (Jupyter Notebook) - Claim-level hallucination detection system that retrieves evidence, assigns verdicts, and proposes corrected text for AI-generated claims.