sleeper-agents

Here are 4 public repositories matching this topic...

Agent orchestration & security template featuring MCP tool building, agent2agent workflows, mechanistic interpretability on sleeper agents, and agent integration via CLI wrappers

  • Updated Jul 5, 2026
  • Rust

The Sentinel Engine: A High-Fidelity Forensic Auditor for LLMs. Resolves the Observability Trilemma via Differential Precision Probing (DPP) and ISNI. Discovered the "Deception Horizon" and Magnitude Collapse phenomenon in Llama-3-8B. Achieves 1.00 AUC at 100x lower cost.

  • Updated Feb 2, 2026
  • Jupyter Notebook

AI-safety research: inducing CoT unfaithfulness in DeepSeek-R1-Distill-Qwen-7B via Synthetic Document Finetuning. SDF installed a conditional 'sleeper' backdoor (accuracy 69%→18%, hint-following 16%→31% when triggered) but not covert behavioral transfer.

  • Updated Jun 13, 2026
  • Jupyter Notebook
