Rust rewrite foundation #8

Merged
JamesKane merged 191 commits into main from rust-rewrite-foundation
Jun 18, 2026

@JamesKane

Owner

Rewrite it in Rust to eliminate the JVM dependency

  1. Use PostgreSQL's document store capabilities to reduce the number of tables in the schema.
  2. Redesign most of the AT Protocol integration.
  3. Fix many bugs in the Curation system.
  4. Add IBD matching support.

JamesKane and others added 30 commits June 1, 2026 07:09
Foundation milestone for the Play/Scala 3 -> Rust rewrite (plan: clean-slate,
full parity, Axum + SQLx + Askama, JSONB document columns, noodles for genomics,
Apple `container` PostGIS for Docker-less local testing).

- 8-crate Cargo workspace (du-domain/db/bio/atproto/external/web/jobs/migrate);
  compiles and tests green. du-web boots Axum with the /health endpoint.
- du-domain: typed IDs, Postgres-mirroring enums, and the variant JSONB contract
  (Coordinates/Aliases/Annotations) with round-trip tests.
- Redesigned schema (migrations 0001-0009) across 10 namespaces, verified
  applying to live PostGIS. De-sprawl: 3 biosample tables -> 1 unified
  core.biosample; ~7 deprecated child tables folded into JSONB on parents;
  metadata DB collapsed into `fed`; scattered at_uri/at_cid -> one `atproto`
  JSONB column; GIN/GiST/expression indexes on queried JSONB paths.
- du-db: PgPool + run_migrations; live-DB integration test (gated on
  DATABASE_URL) covering all schemas + JSONB variant round-trip. build.rs
  watches migrations/ so sqlx::migrate! re-embeds on change.
- scripts/test-db.sh: Apple `container` PostGIS harness (IP-aware, since Apple
  containers have no localhost port forwarding), native DATABASE_URL fallback.
- Multi-stage Dockerfile (slim runtime, single binary, no JRE), compose.yaml,
  .env.example.

Coexists with the Scala app under rust/ during the transition.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Data-access layer for the public read surface, with a reusable mapping pattern:
JSONB columns decode into du-domain payload structs via sqlx Json<T>; Postgres
enums are read as ::text and parsed through serde (parse_pg_enum / pg_enum_label),
keeping du-domain free of any sqlx dependency.

- du-domain: Haplogroup, Publication, Biosample read-side types.
- du-db modules:
  * variant   - get_by_id, paginated search (canonical name + common_names/rs_ids
                JSONB alias arrays)
  * haplogroup - get_by_id/by_name, children, roots (current edges, valid_until IS NULL)
  * publication - get_by_id, paginated search (title/journal/doi, newest-first)
  * biosample - get_by_guid, find_by_alias_or_accession
  * pagination - generic Page<T>
- tests/queries.rs: seeds sentinel rows, exercises every module against live
  PostGIS, cleans up. Full workspace 7/7 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ree)

First user-visible slice through Axum + Askama + HTMX, server-rendered with
HATEOAS fragment navigation, verified end-to-end against the live PostGIS:

- du-web restructured: AppState (PgPool), AppError -> HTTP mapping, explicit
  Askama render helper, routes/ module. main() connects+migrates when
  DATABASE_URL is set, else serves /health only.
- Variant browser: /variants page lazy-loads /variants/list fragment (search by
  name + common_names/rs_ids JSONB aliases, paginated); rows load
  /variants/detail/:id panel rendering multi-build coordinates.
- Tree: /ytree & /mtree pages lazy-load /{y,m}tree/fragment; clicking a node
  swaps #tree-container and pushes the URL (hx-get + href + hx-push-url) so
  browser back/forward walks the tree. Graceful empty states + 404s.
- du-domain: Display/label() for DnaType/MutationType/NamingStatus (templates).
- du-db: re-export PgPool so du-web needn't depend on sqlx; migrations test uses
  a sentinel variant name to avoid colliding with real data.
- Askama templates (base/index/variants/tree) on Bootstrap 5 + htmx 2.

Workspace 7/7 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation)

Bring the read-only slice up to plan §4:

Vendored assets: bootstrap 5.3.3 (css + bundle js) and htmx 2.0.4 under
crates/du-web/assets/vendor, served via tower-http ServeDir at /assets
(DU_ASSETS_DIR env, compile-time crate-dir fallback). Custom main.css replaces
inline styles. CDN links removed; Dockerfile copies assets + sets DU_ASSETS_DIR.

i18n (en/es/fr): Play-style key=value catalogs embedded via include_str; Lang+T
translator with en/key fallback; Locale extractor resolves lang from the `lang`
cookie then Accept-Language (default en); navbar language switcher with
percent-encoded `next`; GET /language/:lang sets the cookie + redirects with an
open-redirect guard. Templates fully localized. Unit tests assert the fallback
chain and that es/fr cover every English key.
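
The fallback chain described above (requested language, then `en`, then the key itself) can be sketched as follows. This is a minimal illustration rather than the code in du-web: the `Translator` struct, its field layout, and `parse_catalog` are hypothetical names, and the catalog parser assumes `#` starts a comment line.

```rust
use std::collections::HashMap;

/// Parse a Play-style `key=value` catalog, skipping blank lines and
/// (by assumption) lines that start with `#`.
fn parse_catalog(src: &str) -> HashMap<String, String> {
    src.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| l.split_once('='))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Hypothetical translator: one parsed catalog per language tag.
struct Translator {
    catalogs: HashMap<String, HashMap<String, String>>,
}

impl Translator {
    /// Look up `key` in `lang`, then in `en`, then fall back to the key itself.
    fn t<'a>(&'a self, lang: &str, key: &'a str) -> &'a str {
        [lang, "en"]
            .iter()
            .filter_map(|l| self.catalogs.get(*l))
            .find_map(|c| c.get(key))
            .map(String::as_str)
            .unwrap_or(key)
    }
}
```

With an `es` catalog that lacks a key, `t("es", key)` returns the English string; a key missing from every catalog is returned unchanged, which keeps untranslated keys visible in the rendered page.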

HX-Request unification: HxRequest extractor (htmx + history-restore + target).
Tree collapses to one handler per lineage (/ytree, /mtree) — full page embeds the
current level inline; an HTMX swap of #tree-container returns just the fragment
with a server-driven HX-Push-Url; history-restore and boosted navigations get the
full page (target-aware negotiation). HxHeaders builder for HX-* response headers.
Variant browser embeds its first results page inline (no load round-trip).
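
A rough sketch of the target-aware negotiation described above, assuming the extractor exposes the four htmx request headers as plain fields. The field names, the `Render` enum, and `negotiate` are illustrative assumptions; only the header names (`HX-Request`, `HX-Boosted`, `HX-History-Restore-Request`, `HX-Target`) come from htmx itself.

```rust
/// Assumed shape of the extracted htmx request headers.
#[derive(Debug, Default, Clone, PartialEq)]
struct HxRequest {
    /// `HX-Request: true`
    is_htmx: bool,
    /// `HX-Boosted: true`
    boosted: bool,
    /// `HX-History-Restore-Request: true`
    history_restore: bool,
    /// `HX-Target`: id of the element being swapped, if any.
    target: Option<String>,
}

#[derive(Debug, PartialEq)]
enum Render {
    FullPage,
    Fragment,
}

/// Return a fragment only for a plain htmx swap aimed at `fragment_target`;
/// boosted navigations and history restores get the full page.
fn negotiate(req: &HxRequest, fragment_target: &str) -> Render {
    let targeted = req.target.as_deref() == Some(fragment_target);
    if req.is_htmx && !req.boosted && !req.history_restore && targeted {
        Render::Fragment
    } else {
        Render::FullPage
    }
}
```

Under these assumptions a swap of `tree-container` yields `Render::Fragment`, while the same request flagged as a history restore yields `Render::FullPage`.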

Verified live against the container PostGIS; workspace 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Broaden the public surface, reusing the publication/biosample query modules and
the established HTMX two-panel + i18n patterns.

- du-db: biosample::for_publication — paginated biosamples linked to a
  publication (join pubs.publication_biosample), returning Page<Biosample>.
- du-domain: Display/label() for BiosampleSource.
- du-web references routes: /references (page, first list embedded inline),
  /references/list (search + pagination fragment), /references/:id/biosamples
  (per-publication report fragment, paginated). Clicking a publication loads its
  samples into #reference-detail. 404 on missing publication.
- i18n: references/* keys added across en/es/fr; nav "References" link.
- Templates references/{page,list,biosamples}.html.

Verified live against the container PostGIS (list, search, es localization,
ancient/external sample reports with DOI link). Workspace 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the geographic map and exercises the PostGIS spatial path end-to-end
(de-risks plan §11's PostGIS-in-Rust concern).

- du-db: biosample::geo_points — ST_X/ST_Y over the donor geocoord
  (geometry Point, 4326), joined to non-deleted biosamples.
- du-domain: biosample::GeoPoint (serde).
- du-web maps routes: /biosamples/map (page) and /biosamples/geo-data
  (GeoJSON FeatureCollection, [lon,lat] order). map_page is a full-page load
  (nav link hx-boost=false) so Leaflet initializes; base.html gains a head
  block for per-page assets.
- Vendored Leaflet 1.9.4 (css + js); assets/map.js plots circle markers (no
  marker-image assets) with popups and fits bounds. OSM tiles at runtime.
- i18n map.* keys + nav "Map" across en/es/fr.

Verified live: GeoJSON has 3 features with correct [lon,lat] coords and
accession/source props; assets served; es localization. Workspace 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the public read surface and exercises the coverage-JSONB aggregation
path (the meanDepth expression index from migration 0004).

- du-domain: coverage::CoverageBenchmark.
- du-db: coverage::benchmarks — aggregates genomics.alignment_metadata.coverage
  by lab + test type (avg meanDepth, avg percent_coverage_at_10x), joined through
  sequence_file -> sequence_library -> lab/test_type, with the test type's
  expected_min_depth for comparison.
- du-web: /coverage-benchmarks page; observed-vs-expected indicator.
- i18n coverage.* keys + nav "Coverage" across en/es/fr.

Verified live: Dante WGS 30x aggregates two libraries to 31.5x, Nebula WGS 100x
to 102.5x, both flagged meeting expected depth; es localized. Workspace 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The authenticated/write half. Local credential login with signed-cookie
sessions and role-gated curator tools; AT Protocol federation remains future
work in du-atproto.

Auth:
- du-db auth: credential lookup (ident.user_login_info) + role loading.
- du-web auth: argon2 hashing with bcrypt-verify fallback for legacy hashes;
  Session in a signed cookie (APP_SECRET-derived key); MaybeUser/Curator
  extractors; /login (+ error re-render) and /logout; CookieManagerLayer.
- `decodingus hash-password <pw>` dev subcommand for seeding.
- Navbar reflects login state (login/logout/curator); user threaded through
  all full-page templates.

Curator (TreeCurator/Admin gated):
- du-db haplogroup: list_paginated, create, update, delete, has_current_edges.
- Dashboard + two-panel haplogroups (search + lineage filter, detail panel,
  create/edit forms, delete). Mutations return the panel plus HX-Trigger
  (hg-changed) so the list reloads — the server-driven write-flow using
  HxHeaders. Delete is blocked (with an inline warning) when tree edges exist.
- i18n auth.*/curator.*/hg.* + nav across en/es/fr.

Verified live: unauth redirect, bad/good login, signed session, dashboard,
create/edit/delete with HX-Trigger, blocked delete. Workspace 11/11 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two more curator surfaces on the established two-panel HTMX write-flow.

Variants:
- du-db variant: create/update/delete + is_referenced (guards delete when the
  variant defines a current haplogroup association).
- Curator variant CRUD editing scalar fields + alias lists (common_names/rs_ids
  as comma-separated, stored into the aliases JSONB); coordinates preserved.

Genome regions:
- du-domain GenomeRegion; du-db genome_region list/get/create/update/delete.
- Curator region CRUD with JSON textareas for the coordinates/properties JSONB
  documents, parse-validated: invalid JSON re-renders the form with an error and
  no HX-Trigger (so the list does not reload on failure).

- Mutations fire distinct HX-Trigger events (variant-changed / region-changed);
  dashboard links to all three tools; i18n var.*/region.* across en/es/fr (the
  catalog-coverage test enforces es/fr parity).

Verified live (as the seeded curator): variant create stores aliases JSONB +
trigger + delete; region create stores coordinates JSONB, invalid-JSON error
path, list + delete. Workspace 11/11 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ication

One-time ETL binary. Production source is a self-managed Postgres on EC2 —
--legacy takes that DSN (typically sslmode=require via an opened security group or SSH tunnel),
--target the new DB. rustls TLS supports remote/SSL connections.

Design: preserve legacy primary keys (OVERRIDING SYSTEM VALUE) and sample_guid
UUIDs so all foreign keys carry over 1:1 with no id-remapping. Transformers run
in FK order, upsert idempotently (re-runnable/resumable), then identity sequences
are advanced and a reconciliation pass compares legacy vs new counts.

Transformers: specimen_donor; unified biosample (legacy biosample +
citizen_biosample + pgp_biosample -> core.biosample, deriving source, folding
at_uri/at_cid into the atproto JSONB and PGP fields into source_attrs); variant
(enum normalization, JSONB passthrough); haplogroup (+relationship +variant,
Y/MT -> Y_DNA/MT_DNA); genomic_study; publication (+links, resolving legacy
integer biosample ids to sample_guid across both link tables).

CLI: --legacy/--target/--verify. Reconcile prints a per-aggregate count table.

Verified without prod access: scripts/mock-legacy.sql seeds a legacy-shaped
subset; the ETL into a fresh target reconciles all 9 aggregates, spot-checks
confirm unification/JSONB/enum/geocoord/FK preservation, and a re-run is
idempotent. NOTE: transformer SELECTs encode the reconstructed legacy layout —
validate against the live EC2 schema before the production run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Status-oriented README for the Rust port: rationale, stack, workspace layout,
schema redesign summary, what's implemented (public surface + auth/curator +
ETL), getting started with Apple `container`, testing, ETL usage (EC2 source),
deploy, and a roadmap checklist.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…missions/OAuth

Federation direction updated after reviewing current atproto specs: the custom
"private firehose" is dropped in favor of the protocol's permissions/OAuth +
notify-then-fetch model (private data deliberately bypasses the firehose;
consumers fetch records from the PDS over scoped OAuth). Group-private data spec
is still maturing upstream.

This lands the foundation needed under any model:
- did: DID + AT-URI parsing; did:key <-> Ed25519 pubkey (multibase + multicodec).
- signature: verify_did_key — Ed25519 verification against a self-certifying
  did:key (no network needed); tested with sign/verify/tamper/wrong-key.
- resolve: DID-document parsing (PDS endpoint, handle, signing did:key) + a
  Resolver for handle->DID (well-known) and DID->doc (PLC directory / did:web);
  parsing unit-tested via fixture, HTTP fetch isolated.

README roadmap updated for the pivot. Workspace 17/17 tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The federation wiring pieces (consumer side of permissions/OAuth). Live handshake
needs the Edge team's PDS; everything up to the network exchange is implemented
and unit-tested.

du-atproto oauth module (unit-tested, offline):
- PKCE S256 (verified vs RFC 7636 vector), ES256 JOSE sign + JWK + RFC 7638
  thumbprint, DPoP proof JWTs, private_key_jwt client assertion.
- ClientMetadata (confidential web: private_key_jwt + ES256 + DPoP) and
  AuthServerMetadata + protected-resource discovery; PAR/authorize/token builders.

du-web wiring:
- OauthClient from env (OAUTH_BASE_URL/SCOPE/EC_KEY; disabled when unset).
- Serves /oauth/client-metadata.json and /oauth/jwks.json (public key only).
- /login/atproto: resolve handle->DID->PDS->authserver, PAR (DPoP, nonce retry),
  redirect; /oauth/callback: token exchange -> upsert user by DID -> session.
- du-db: upsert_user_by_did (find-or-create + atproto login_info).
- AppError::Upstream (502) for federation failures.

Verified live: metadata + JWKS serve correctly (no private material leaks);
/login with a bogus handle fails gracefully (502). Full flow pending Edge PDS.

docs/atproto-oauth-findings.md enumerates the integration points to settle with
the Edge team (client registration, hosting, scopes/permission sets, key
lifecycle, DPoP nonce, identity resolution, notify-fetch). Workspace 22/22 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
du-bio (pure Rust, replacing htsjdk):
- callable: BED interval merge + callable-loci summary (total bp, region count
  per contig) for mutation-rate / branch-age inputs.
- liftover: UCSC chain-file parse + cross-build position mapping (gaps -> None,
  reverse-strand targets handled) — the algorithm htsjdk LiftOver provided.
- vcf: line-oriented variant reader (CHROM/POS/ID/REF/ALT) for the de-identified
  variant-ingest path; binary formats (BAM/CRAM) + full-spec VCF use noodles when
  the ingestion jobs need them.
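
To make the `callable` item concrete, here is a minimal sketch of a BED interval merge followed by a callable-loci summary. It is not the du-bio implementation: the `Interval` and `ContigSummary` types and both function names are hypothetical, intervals are assumed to be well-formed half-open BED ranges (`start <= end`), and abutting intervals are merged as a design assumption.

```rust
use std::collections::BTreeMap;

/// Half-open BED interval [start, end), 0-based.
#[derive(Debug, Clone, PartialEq)]
struct Interval {
    contig: String,
    start: u64,
    end: u64,
}

/// Per-contig totals after merging.
#[derive(Debug, Default, PartialEq)]
struct ContigSummary {
    regions: usize,
    bases: u64,
}

/// Sort by (contig, start), then fold overlapping or abutting intervals together.
fn merge(mut intervals: Vec<Interval>) -> Vec<Interval> {
    intervals.sort_by(|a, b| (&a.contig, a.start).cmp(&(&b.contig, b.start)));
    let mut merged: Vec<Interval> = Vec::new();
    for iv in intervals {
        if let Some(last) = merged.last_mut() {
            if last.contig == iv.contig && iv.start <= last.end {
                last.end = last.end.max(iv.end);
                continue;
            }
        }
        merged.push(iv);
    }
    merged
}

/// Total callable bases plus region count and bases per contig.
fn summarize(merged: &[Interval]) -> (u64, BTreeMap<String, ContigSummary>) {
    let mut per_contig: BTreeMap<String, ContigSummary> = BTreeMap::new();
    let mut total = 0u64;
    for iv in merged {
        let len = iv.end - iv.start;
        let entry = per_contig.entry(iv.contig.clone()).or_default();
        entry.regions += 1;
        entry.bases += len;
        total += len;
    }
    (total, per_contig)
}
```

Merging before summarizing matters because overlapping BED records would otherwise count the same bases twice in the total.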

du-jobs:
- tokio scheduler harness: named jobs (period + async closure), per-job interval
  loops with error isolation + run-on-start, graceful shutdown on Ctrl-C.
- main registers a DB heartbeat (verified live: variants=4 publications=2);
  real jobs (publication update/discovery, YBrowse ingest, variant export, match
  discovery) wired as du-external/ingestion land.

Tested: callable merge/summary, liftover positions/gaps/reverse-strand, VCF
parse, scheduler run-once + paused-time periodic run. README roadmap updated.
Workspace 29/29 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eply

Navigator/Edge team replied (DUNavigator/documents/atmosphere/12-OAuth-Edge-Reply.md
+ 08/11). Integrating their feedback:

- du-atproto: add public-client request builders par_form_public / token_form_public
  (PKCE-only, no client_assertion) so the Navigator desktop app reuses the same
  PKCE/DPoP/resolution primitives over a public/native client. Tested.
- docs/atproto-edge-reply.md: our point-by-point reply (public-client done, will
  host Navigator client-metadata, AppView read scope = none for now, DPoP nonce,
  shared-crate extraction + haploid-caller decisions pending).
- README/framing correction: the standard relay/Jetstream ingest STAYS (reads are
  out of the OAuth permission spec); only the custom REST/Kafka relay is dropped.
  AppView re-scoped off the network mirror to two flows: (a) variant catalog via
  direct proposal submission, (b) on-demand coverage aggregation from public
  summary records. Roadmap updated; shared-crate extraction tracked.
- Stop tracking rust/.DS_Store; gitignore it.

Workspace 30/30 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the decodingus-side call on the Navigator asks:
- Shared crates (du-domain/du-atproto/du-bio) -> a dedicated `decodingus-shared`
  git repo; both repos git-dep on it. Extraction is a coordinated next step.
- Haploid variant caller stays Navigator-only; du-bio remains I/O + liftover +
  callable.

Updated the edge-reply doc and README roadmap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
YBrowse publishes on GRCh38; ingestion now lifts each position to the other
tracked builds so core.variant.coordinates carries all builds.

- du-domain: NewVariant (coordinate-bearing, no DB id) for the ingest path.
- du-bio::ybrowse: from_grch38_vcf — parse GRCh38 VCF records, lift each to the
  given target builds (GRCh37, hs1) via chain files, emit NewVariant with
  multi-build coordinates; first VCF ID = canonical name, rest = aliases; tracks
  unmapped lifts. Handles VCF 1-based <-> chain 0-based conversion.
- du-db: variant::upsert_by_name (ON CONFLICT canonical_name) — updates
  coordinates/aliases, preserves curator-owned naming_status.
- du-jobs: env-gated ybrowse-variant-ingest job (YBROWSE_VCF + chain paths)
  wiring du-bio parse/lift -> du-db upsert.
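
The two conventions mentioned above (VCF 1-based positions versus chain 0-based blocks, and first VCF ID as canonical name) can be sketched as below. This is a simplified illustration: `Block`, `lift_1based`, and `split_ids` are hypothetical names, only forward-strand ungapped blocks are modeled, and real chain parsing is omitted.

```rust
/// One ungapped aligned block from a chain: 0-based, half-open, forward strand.
struct Block {
    src_start: u64,
    dst_start: u64,
    len: u64,
}

/// Lift a 1-based VCF position through the blocks.
/// Returns `None` when the position falls in a gap (unmapped).
fn lift_1based(blocks: &[Block], pos: u64) -> Option<u64> {
    let p0 = pos.checked_sub(1)?; // VCF 1-based -> chain 0-based
    blocks
        .iter()
        .find(|b| p0 >= b.src_start && p0 - b.src_start < b.len)
        .map(|b| b.dst_start + (p0 - b.src_start) + 1) // back to 1-based
}

/// Split a VCF ID field (`;`-separated, `.` meaning missing) into the
/// canonical name (first ID) and the remaining aliases.
fn split_ids(id_field: &str) -> Option<(&str, Vec<&str>)> {
    let mut ids = id_field.split(';').filter(|s| !s.is_empty() && *s != ".");
    let canonical = ids.next()?;
    Some((canonical, ids.collect()))
}
```

For illustration only, a made-up block with `src_start = 2_000_000`, `dst_start = 3_000_000`, and `len = 500_000` maps position 2200001 to 3200001, the same offset as the verification run mentioned in this commit; the actual chain blocks are not given in the source.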

Verified end-to-end via the jobs binary: GRCh38 chrY:2200001 lifted to GRCh37
chrY:3200001, upserted with both builds + alias. Unit-tested lift offset, gap
(unmapped), and multi-build coords. Workspace 31/31 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per scope clarification: raw-read processing (BAM/CRAM) and variant calling are
out of scope for decodingus (the AppView). Navigator (edge) does local calling
from raw reads; the AppView only ingests/aggregates the resulting summaries and
variant proposals.

- Remove the unused `noodles` workspace dependency (no crate used it; htslib/
  noodles aren't needed without BAM/CRAM).
- Reframe du-bio as coordinate math + text parsing (VCF ingest, BED callable
  loci, UCSC-chain liftover, YBrowse) — not file I/O / htsjdk replacement.
- Update README (stack table, crate map, roadmap) and the plan §6 accordingly.

No code change; workspace 31/31 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the agreed shared-crate decision: the pure/IO-light crates now live in a
sibling `decodingus-shared` repo (its own cargo workspace; builds+tests
standalone) so the DecodingUs AppView and Navigator depend on one copy.

- Moved du-domain, du-atproto, du-bio out of rust/crates.
- Workspace members reduced to the AppView-specific crates (du-db, du-external,
  du-web, du-jobs, du-migrate); the three shared crates are pulled via path deps
  to ../../decodingus-shared/crates/* (git-dep form commented for post-push).
- README crate map + roadmap and the Dockerfile note updated for the split.

Verified: decodingus builds+tests against the sibling crates (9 tests here +
22 in decodingus-shared = same 31 total). NB: the Docker build needs the path
deps switched to git deps once decodingus-shared is pushed (sibling path deps are
not in the rust/ build context).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Navigator submits variant/branch proposals to the AppView; curators review and
name them in the web UI (the agreed catalog role — AppView keeps review+naming).
The legacy manual sample-submission APIs are intentionally not ported (curators
work in Navigator now).

- du-db::proposal: submit (pool by name+parent across submitters into
  tree.proposed_branch + evidence; distinct-submitter consensus via
  discovery_sample_guids; confidence scales with evidence), list/get, review
  (APPROVE/REJECT/DEFER -> status + tree.curator_action).
- du-web curation: POST /api/v1/curation/proposals (machine intake, X-API-Key gate
  for now -> OAuth bearer once the handshake is live) + the /curator/proposals
  review queue (two-panel HTMX, status filter, detail + review form, HX-Trigger
  refresh, gated to Curator). i18n prop.* + dashboard link.

Verified live: 3 submissions pool to one proposal (evidence_count=3, submitters=2
after dedup, parent resolved); wrong API key -> 403; curator approve -> ACCEPTED +
curator_action recorded + HX-Trigger; review form gone once decided. Workspace 9/9
(decodingus) + 22 (shared) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Approving a proposal records the decision; promotion turns it into real catalog
entries.

- du-db::proposal::promote (one transaction): requires status ACCEPTED + a parent;
  creates the tree.haplogroup branch (name = proposed_name, lineage from parent,
  source 'discovery'), a current relationship edge under the parent, and
  core.variant get-or-create + tree.haplogroup_variant links for each defining
  variant in the evidence (UNNAMED -> NAMED on promote, GRCh38 coord from pos);
  sets status PROMOTED + records a PROMOTE curator_action. DbError::Conflict for
  precondition/uniqueness failures (wrong status, name taken, no parent).
- du-web: POST /curator/proposals/:id/promote (Curator), "Promote to catalog"
  button shown on ACCEPTED proposals; conflicts surface as a 422 message.
  i18n prop.promote (en/es/fr).

Verified live: promote-before-approve -> 422; approve+promote -> new Y_DNA branch
R-FT900 under R with the FT900 variant (NAMED, GRCh38 pos) linked, status PROMOTED,
APPROVE+PROMOTE actions logged; branch shows under R in the unified /ytree.
Workspace 9/9 (decodingus) + 22 (shared) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- du-external: OpenAlex client (work-by-DOI enrichment + search-based discovery,
  with abstract reconstruction from the inverted index) and ENA portal client
  (study lookup). JSON->domain parsing is pure and unit-tested with fixtures; the
  HTTP fetch is a thin reqwest wrapper.
- du-db::publication: dois() work-list, update_openalex() (COALESCE — nulls don't
  wipe), enabled_search_configs(), upsert_candidate() (ON CONFLICT openalex_id,
  preserves curator status).
- du-jobs: publication-update (enrich every DOI) + publication-discovery (run
  search configs -> candidates), rate-limited ~6.7 req/s, registered only when
  OPENALEX_MAILTO is set (polite pool).
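
OpenAlex returns abstracts as an inverted index (each word mapped to the positions where it occurs), so the reconstruction step amounts to placing words back in position order. A minimal sketch, assuming the index has already been deserialized into a `HashMap<String, Vec<usize>>`; the function name is hypothetical and this is not the du-external code.

```rust
use std::collections::{BTreeMap, HashMap};

/// Rebuild the abstract text from a word -> positions index.
/// Returns `None` for an empty index. Assumes each position holds one word.
fn reconstruct_abstract(index: &HashMap<String, Vec<usize>>) -> Option<String> {
    // BTreeMap keeps positions sorted, so iteration yields words in reading order.
    let mut by_pos: BTreeMap<usize, &str> = BTreeMap::new();
    for (word, positions) in index {
        for &p in positions {
            by_pos.insert(p, word.as_str());
        }
    }
    if by_pos.is_empty() {
        return None;
    }
    Some(by_pos.values().copied().collect::<Vec<_>>().join(" "))
}
```

Keeping this as a pure function over the parsed index is what lets it be unit-tested with fixtures, separate from the HTTP fetch.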

Verified live against OpenAlex: enriching a real DOI populated cited_by_count
(7082), open_access_status (hybrid), openalex_id, and a reconstructed abstract;
fake DOIs correctly 404 -> missing. Workspace 13/13 (decodingus) + 22 (shared) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Transactional email + secret retrieval, with the AWS SDK behind an optional
`aws` feature so the default build (and CI) stays lean.

- email::Mailer — Logging (default; logs instead of sending) and Ses (feature
  `aws`, Amazon SES v2). Plain-text send.
- secrets::{SecretSource, CachedSecrets} — Env source (SECRET_<NAME>, default) or
  AWS Secrets Manager (feature `aws`), wrapped in a 1-hour TTL cache (mirrors the
  legacy CachedSecretsManagerService).
- ExternalError::Aws.
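
A minimal sketch of the TTL-cache wrapper over a secret source, using only the standard library. The trait and struct names follow the commit text, but the method signatures, the `get_at` helper (which takes the current time so expiry can be tested without sleeping), and the exact `SECRET_<NAME>` lookup are assumptions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Assumed minimal interface: fetch a secret by name, if present.
trait SecretSource {
    fn fetch(&self, name: &str) -> Option<String>;
}

/// Env-backed source reading `SECRET_<NAME>` (name used verbatim, by assumption).
struct EnvSource;

impl SecretSource for EnvSource {
    fn fetch(&self, name: &str) -> Option<String> {
        std::env::var(format!("SECRET_{}", name)).ok()
    }
}

/// Caches fetched secrets and refetches once an entry is `ttl` old.
struct CachedSecrets<S: SecretSource> {
    source: S,
    ttl: Duration,
    entries: HashMap<String, (String, Instant)>,
}

impl<S: SecretSource> CachedSecrets<S> {
    fn new(source: S, ttl: Duration) -> Self {
        Self { source, ttl, entries: HashMap::new() }
    }

    /// Lookup with an explicit clock reading.
    fn get_at(&mut self, name: &str, now: Instant) -> Option<String> {
        if let Some((value, fetched)) = self.entries.get(name) {
            if now.duration_since(*fetched) < self.ttl {
                return Some(value.clone());
            }
        }
        // Missing or expired: go back to the source and refresh the entry.
        let value = self.source.fetch(name)?;
        self.entries.insert(name.to_string(), (value.clone(), now));
        Some(value)
    }

    fn get(&mut self, name: &str) -> Option<String> {
        self.get_at(name, Instant::now())
    }
}
```

A one-hour cache as described would be built with `CachedSecrets::new(EnvSource, Duration::from_secs(3600))`. Missing secrets are not cached in this sketch, so a secret added later is picked up on the next lookup.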

Default build: logging mailer + env secrets + TTL cache, unit-tested.
`--features aws` compiles against aws-config 1.8 / sdk-sesv2 1.121 /
secretsmanager 1.106. Consumers wire later. Workspace 17/17 (decodingus) +
22 (shared) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…chema

Reworked the du-migrate transformers and reconciliation to match the real
production schema (db.schema) rather than the assumed shapes:

- variant: positional public.variant + variant_alias -> core.variant with
  JSONB coordinates ({build:{contig,position,ref,alt}}) and assembled
  aliases (common_names/rs_ids/sources); canonical_name/naming_status derived.
- biosample: unify public.biosample + citizen_biosample + pgp_biosample into
  core.biosample; fold biosample_original_haplogroup +
  citizen_biosample_original_haplogroup into original_haplogroups JSONB;
  citizen at_uri/at_cid -> atproto JSONB; source_platform/y/mt -> source_attrs.
- haplogroup: age bounds (formed/tmrca lower/upper) + age_estimate_source +
  description -> provenance JSONB (nulls stripped); cast valid_from/until from
  TIMESTAMP to timestamptz so rows decode (caught only with data, not by the
  schema-only pass).
- genomic_studies: version VARCHAR -> TEXT column; details TEXT -> JSONB;
  publication_ena_study -> pubs.publication_study.
- publication_biosample: collapse both std + citizen link tables onto sample_guid.

The ETL binary now applies target migrations itself (idempotent) before
transforming. reconcile.rs uses the real qualified table names.

Validated: schema-only against db.schema (0 column errors) and end-to-end
against a rewritten current-schema mock with seed data (all 10 aggregates
reconcile; JSONB/enum/link shapes spot-checked).

Note: the 35MB dump.sql predates several legacy migrations (citizen_biosample
at_uri rename, tree schema, *_result columns) so it can't validate the
current-schema ETL end-to-end; a current-schema dump or a read-only live-EC2
rehearsal is needed for full real-data validation before cutover.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Migrate the identity/auth group from the production auth + public.users +
curator schemas into the redesigned ident schema:

- users (public.users -> ident.users), RBAC (roles/permissions/role_permissions/
  user_roles), AT Protocol identity (user_login_info/user_oauth2_info/
  user_pds_info), cookie_consents, and the atproto metadata caches.
- curator.audit_log -> new ident.audit_log (migration 0010); entity_id widened
  int -> bigint, old/new snapshots kept as JSONB.

Details:
- UUID PKs carry over 1:1 (no OVERRIDING SYSTEM VALUE; no sequence fixup).
- Pre-seeded base roles (Admin/Curator/TreeCurator) are relocated onto the
  legacy role UUIDs via ON CONFLICT (name) DO UPDATE SET id=... so user_roles
  and role_permissions FKs resolve to the migrated rows.
- password_hash left NULL: production auth is AT Protocol OAuth-only (there is
  no legacy password table).
- All legacy auth timestamps are `timestamp without time zone`; cast to
  timestamptz in the SELECTs so they decode into DateTime<Utc>.
- Dropped-in-redesign columns (authz client_id_metadata_document_supported,
  client_metadata client_uri) are simply not selected.

ETL binary runs the ident group first (users before any FK). reconcile.rs adds
11 ident checks (roles may legitimately show target>=legacy from base seeds).
mock-legacy.sql extended with the auth/curator schema + seed data; full run
reconciles all 21 aggregates and RBAC resolves end-to-end
(user -> role -> permission), audit JSONB round-trips, OAuth/PDS chain intact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Migrate the sequencing/coverage/pangenome group into the redesigned genomics
schema. This completes the production ETL surface (ibd/fed/social/billing are
not yet in production).

Tables: genbank_contig, sequencing_lab, sequencer_instrument,
test_type_definition, pangenome_graph/path, canonical_pangenome_variant,
sequence_library, sequence_file, alignment_metadata,
pangenome_alignment_metadata, reported_variant_pangenome, genotype_data.

De-sprawl / transforms (validated against db.schema):
- alignment_metadata: fold the 1:1 alignment_coverage child + inline Picard
  metrics (mean_coverage/pct_10x/...) + analysis provenance into one coverage
  JSONB. meanDepth/percent_coverage_at_10x (the keys du-db/coverage.rs and the
  public coverage page aggregate on) are always populated when a source exists;
  COALESCE prefers the samtools-style depth child over the Picard inline value.
- pangenome_alignment_metadata: same fold into metadata JSONB (+ path/node/
  region ids); reported_variant_pangenome: provenance/status/positions folded
  into haplotype_information JSONB.
- sequence_library: legacy lab name resolved to the migrated lab_id; at_uri/cid
  -> atproto JSONB; run_date timestamp -> date.
- sequence_file checksums/http_locations/atp_location JSONB already in the
  redesigned shape, carried verbatim.
- sequencer_instrument: redesign makes instrument_id UNIQUE -> DISTINCT ON dedup
  (drops the per-lab tie); genotype_data: skip soft-deleted rows, fold
  chip_version/build_version/source_file_hash/atproto into metrics JSONB.
- All legacy timestamps cast ::timestamptz. PKs preserved via OVERRIDING SYSTEM
  VALUE; sequences fixed up post-load.

Skipped (no production source; Navigator populates going forward):
instrument_observation, instrument_association_proposal,
coverage_expectation_profile, biosample_callable_loci. pangenome_node is dropped
(folded into node-id arrays).

reconcile.rs adds 13 genomics checks (sequencer_instrument compares
count(DISTINCT instrument_id); genotype_data filters deleted). mock-legacy.sql
extended with the full genomics schema + seed data; ETL reconciles all 34
aggregates and the public coverage benchmark query resolves end-to-end against
the migrated JSONB.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Build the read-only /api/v1 surface (the Tapir replacement). 13 endpoints,
clean DTOs decoupled from the internal domain types, described with utoipa;
OpenAPI 3 document at /api/openapi.json and Swagger UI at /api.

Endpoints:
- GET /api/v1/y-tree, /api/v1/mt-tree (?rootHaplogroup=) — nested tree
- GET /api/v1/coverage/benchmarks
- GET /api/v1/references/details (paginated), /{publicationId}/biosamples
- GET /api/v1/biosample/studies
- GET /api/v1/variants (paginated), /{variantId},
      /api/v1/haplogroups/{name}/variants
- GET /api/v1/variants/export (live CSV), /export/metadata
- GET /api/v1/genome-regions (builds), /{build}

du-db additions backing the new endpoints:
- haplogroup::subtree — recursive CTE over current edges, assembled into a
  nested tree in-process (with a depth guard against cyclic tree-merge data)
- variant::for_haplogroup_name, variant::export_all, variant::count
- genome_region::distinct_builds, genome_region::for_build (jsonb_exists)
- new du-db::study module: studies with linked samples (study -> publication
  -> biosample, aggregated as JSONB)

Notes:
- DTOs surface JSONB (coordinates/source_attrs/provenance) as untyped objects
  and the hot alias fields (common_names/rs_ids) as typed arrays.
- utoipa kept out of the shared du-domain crate (Navigator/edge consumers);
  DTOs + From impls live in du-web.
- HaplogroupNodeDto.children uses #[schema(no_recursion)] to stop utoipa's
  schema walk from overflowing on the self-reference.
- The /manage/*, PDS, and IBD API groups are intentionally omitted (curator/
  federation surfaces tied to subsystems not yet built).
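
For the `haplogroup::subtree` item listed under the du-db additions, the in-process assembly step might look roughly like this: flat (parent, child) rows, as a recursive CTE would return them, are grouped and then nested with a depth cap. `Node`, `assemble`, `build`, and the `max_depth` parameter are hypothetical; the source does not state the real limit or data shapes.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Node {
    name: String,
    children: Vec<Node>,
}

/// Build a nested tree under `root` from flat (parent, child) edge rows.
fn assemble(root: &str, edges: &[(String, String)], max_depth: usize) -> Node {
    let mut children_of: HashMap<&str, Vec<&str>> = HashMap::new();
    for (parent, child) in edges {
        children_of.entry(parent.as_str()).or_default().push(child.as_str());
    }
    build(root, &children_of, 0, max_depth)
}

fn build(
    name: &str,
    children_of: &HashMap<&str, Vec<&str>>,
    depth: usize,
    max_depth: usize,
) -> Node {
    let children = if depth >= max_depth {
        // Depth guard: stop descending so cyclic edge data cannot recurse forever.
        Vec::new()
    } else {
        children_of
            .get(name)
            .into_iter()
            .flatten()
            .map(|c| build(c, children_of, depth + 1, max_depth))
            .collect()
    };
    Node { name: name.to_string(), children }
}
```

The guard truncates rather than reports an error, which is one possible choice; a cycle in the edges simply produces a tree cut off at `max_depth` levels below the root.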

Smoke-tested live against the ETL mock DB: all 13 endpoints return correct
JSON/CSV, 404s resolve, openapi.json lists 13 paths + 12 schemas, Swagger UI
serves 200.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Build the change-set versioning subsystem (the TreeVersioning half; the merge
algorithm that produces change-sets lands separately).

du-db::change_set — lifecycle + apply engine:
- Lifecycle: DRAFT -> READY_FOR_REVIEW -> UNDER_REVIEW -> APPLIED, DISCARDED
  from any non-applied state. start_review/discard/review_change/approve_all
  gate on status.
- apply(): in one transaction, writes APPROVED tree_changes to the production
  tree via the temporal edge model and marks the set APPLIED. Per type:
    CREATE   -> insert node (+ edge under an existing parent, + variant links)
    UPDATE   -> COALESCE-update node metadata
    DELETE   -> expire node (valid_until) + close current edges/variant links
    REPARENT -> close current parent edge, open a new one
    VARIANT_EDIT -> add (insert) / remove (close) current variant links
- diff(): ADDED/REMOVED/MODIFIED/REPARENTED entries + summary from the set's
  non-rejected changes. list/get/comments/add_change round out the module.
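
The status gating above amounts to a small state machine. A minimal sketch follows, assuming DISCARDED and APPLIED are both terminal; the `Action` enum (including `SubmitForReview`) and the `transition` function are hypothetical names, not the du-db::change_set API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Draft,
    ReadyForReview,
    UnderReview,
    Applied,
    Discarded,
}

/// Hypothetical action names used only for this illustration.
#[derive(Debug, Clone, Copy)]
enum Action {
    SubmitForReview,
    StartReview,
    Apply,
    Discard,
}

/// Returns the next status, or a conflict message when the action is not
/// allowed from the current status.
fn transition(from: Status, action: Action) -> Result<Status, String> {
    match (from, action) {
        (Status::Draft, Action::SubmitForReview) => Ok(Status::ReadyForReview),
        (Status::ReadyForReview, Action::StartReview) => Ok(Status::UnderReview),
        (Status::UnderReview, Action::Apply) => Ok(Status::Applied),
        // Assumption: discard is allowed from every live (non-terminal) state.
        (Status::Draft | Status::ReadyForReview | Status::UnderReview, Action::Discard) => {
            Ok(Status::Discarded)
        }
        (from, action) => Err(format!("conflict: {action:?} not allowed from {from:?}")),
    }
}

fn main() {
    println!("{:?}", transition(Status::Draft, Action::SubmitForReview));
    println!("{:?}", transition(Status::Applied, Action::Discard));
}
```

Returning an error for disallowed transitions is what makes re-apply fail cleanly rather than writing to the tree twice.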

Temporal correctness: the node itself is temporal (valid_from/valid_until), so
DELETE expires it and the tree-navigation queries (roots/children/subtree) now
exclude expired nodes — a deleted node vanishes instead of resurfacing as a
stray root.

du-web: curator-gated JSON management API at /api/v1/manage/change-sets/* (list,
create, detail, add-change, start-review, apply, discard, comments, approve-all,
per-change review, diff). Gated by the session Curator extractor (legacy used an
X-API-Key); not part of the public OpenAPI doc. DbError::Conflict now maps to
422 instead of 500.

Tested: an integration test drives the full lifecycle on a live DB — seeds
ROOT->{A,C,D}, builds a set (create B under ROOT w/ variant, reparent A under C,
add a variant to C, update A, delete D, plus one rejected change), applies it,
and asserts the temporal tree: B created, A moved off ROOT and under C, D gone
from navigation, variant links current, the rejected change absent, exactly one
current parent edge for A, diff counts, and re-apply rejected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Identify-Match-Graft re-implementation landed in decodingus-shared
(du-domain::merge) with curated fixtures. Remaining: materialize a MergePlan
into a change-set + WIP staging, the WIP apply path, and merge/preview endpoints.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… endpoints

Connect the pure du-domain merge algorithm to the versioning engine so a merge
produces a reviewable, applicable change set.

- du-db::merge::materialize: turns a MergePlan into a READY_FOR_REVIEW change
  set. Each MergeOp becomes a tree_change; new-node placement uses the
  placeholder mechanism (a CREATE carries its negative placeholder; attaching
  ops carry *_placeholder refs). Variant *names* from the plan are resolved to
  core.variant ids (get-or-create as UNNAMED). MatchMetadata is informational
  and omitted from the set.
- change_set apply: now threads a placeholder->real-id map through the apply
  transaction, so CREATE/REPARENT can reference nodes created earlier in the
  same set (new-under-new chains, contraction-under-new-intermediate). An
  unresolved placeholder (its CREATE was rejected) fails the apply with a clear
  conflict instead of corrupting the tree.
- haplogroup::existing_tree: loads the current production tree (current nodes/
  edges/variant links) as a nested du_domain::merge::ExistingNode forest — the
  algorithm's "existing tree" input.
- du-web: curator-gated POST /api/v1/manage/haplogroups/merge (run + materialize)
  and /merge/preview (dry-run: return plan + ambiguities, no writes).
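
The placeholder threading can be sketched as follows. This is an illustration under stated assumptions, not the apply engine itself: negative ids stand for placeholders, real ids are non-negative, and returning `Err` stands in for rolling back the transaction. `Change`, `resolve`, and `apply` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced change types: only what the placeholder logic needs.
enum Change {
    Create { placeholder: i64, parent: i64 },
    Reparent { node: i64, new_parent: i64 },
}

/// Real ids pass through; negative placeholders must already be in the map.
fn resolve(id: i64, map: &HashMap<i64, i64>) -> Result<i64, String> {
    if id >= 0 {
        return Ok(id);
    }
    map.get(&id)
        .copied()
        .ok_or_else(|| format!("conflict: unresolved placeholder {id}"))
}

/// Applies changes in order and returns the (parent, child) edges opened.
/// `first_new_id` simulates the ids the database would assign.
fn apply(changes: &[Change], first_new_id: i64) -> Result<Vec<(i64, i64)>, String> {
    let mut map: HashMap<i64, i64> = HashMap::new();
    let mut next_id = first_new_id;
    let mut edges = Vec::new();
    for change in changes {
        match change {
            Change::Create { placeholder, parent } => {
                let parent = resolve(*parent, &map)?;
                let real = next_id;
                next_id += 1;
                map.insert(*placeholder, real);
                edges.push((parent, real));
            }
            Change::Reparent { node, new_parent } => {
                edges.push((resolve(*new_parent, &map)?, resolve(*node, &map)?));
            }
        }
    }
    Ok(edges)
}

fn main() {
    // New-under-new chain, then an existing node (2) moved under the new node.
    let changes = [
        Change::Create { placeholder: -1, parent: 1 },
        Change::Create { placeholder: -2, parent: -1 },
        Change::Reparent { node: 2, new_parent: -1 },
    ];
    println!("{:?}", apply(&changes, 100));
}
```

Because the map is only filled by CREATE changes that actually ran, a rejected CREATE leaves its placeholder unresolved and any dependent change fails instead of attaching to the wrong node.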

End-to-end test (existing_tree -> merge -> materialize -> review -> apply):
- new-subtree chain: R extended by R1b -> L21 via a placeholder chain; both
  created under the right parents with their variants linked.
- node contraction: existing coarse RC(M343,L23,L51) split by source R1b(M343);
  new R1b inserted between R and RC, RC reparented under it, M343 downflowed off
  RC (RC keeps L23/L51), R1b carries M343.

This completes tree versioning end-to-end (the WIP shadow tables remain for a
future richer curator-staging UI; merge output uses the simpler placeholder
path through the tested apply engine).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JamesKane and others added 29 commits June 12, 2026 17:18
The roadmap mixes AppView and Navigator concerns. Clarify: the AppView only
cares that Y/mt calls are reliable enough to build the shared genealogy
components — reliability = coverage conformance (done) + the cross-technology
consensus (fed.haplogroup_reconciliation, the remaining AppView piece). The
per-test-type taxonomy/tracking, chip parsing, marker-coverage and accuracy
machinery are Navigator's (the Edge tracks by test); IBD is the D1/D3 track.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
fed.haplogroup_reconciliation (a donor's call reconciled across all its
sequencing technologies: consensus_haplogroup + confidence + snp_concordance +
run_count) was mirrored but read by nothing, and the report's "FedConsensus"
was actually a single fed.biosample call, not the consensus.

- biosample::report_by_guid reads the reconciliation via the citizen's repo
  DID (reconciliation.did = core.biosample.atproto->>'repo_did' + dna_type,
  best by run_count/time_us) — no schema change. Call precedence becomes
  Reconciled -> FedConsensus -> Original; HaplogroupCall + HaplogroupPathwayDto
  gain confidence/run_count/snp_concordance/compatibility_level + a Reconciled
  origin. The report card shows "consensus . N runs . confidence . concordance"
  (i18n en/es/fr).

Test: a Y reconciliation outranks the single fed.biosample call and carries
its reliability; mt with no reconciliation falls back.
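
The precedence rule reduces to an ordered fallback. A minimal sketch, assuming each source is loaded as an optional call; the struct layout and `pick_call` are hypothetical, reduced versions of what the report code carries.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Origin {
    Reconciled,
    FedConsensus,
    Original,
}

/// Hypothetical, reduced call type; reliability fields are only populated
/// for reconciled calls in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct HaplogroupCall {
    name: String,
    origin: Origin,
    confidence: Option<f64>,
    run_count: Option<u32>,
}

/// Reconciled -> FedConsensus -> Original: the first available source wins.
fn pick_call(
    reconciled: Option<HaplogroupCall>,
    fed: Option<HaplogroupCall>,
    original: Option<HaplogroupCall>,
) -> Option<HaplogroupCall> {
    reconciled.or(fed).or(original)
}

fn main() {
    let fed = HaplogroupCall {
        name: "R-M269".into(),
        origin: Origin::FedConsensus,
        confidence: None,
        run_count: None,
    };
    let reconciled = HaplogroupCall {
        name: "R-L21".into(),
        origin: Origin::Reconciled,
        confidence: Some(0.97),
        run_count: Some(3),
    };
    println!("{:?}", pick_call(Some(reconciled), Some(fed), None));
}
```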

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In the AppView the cross-technology consensus drives tree evolution, never
individual runs. The discovery engine now factors each contributor's
fed.haplogroup_reconciliation reliability (via the citizen's repo DID).

- Exclude: a contributor whose consensus confidence is below
  min_consensus_confidence (0.5) or whose reconciliation is INCOMPATIBLE is
  dropped from pooling. Un-reconciled samples are kept (un-gated), treated as
  neutral reliability (1.0) so the unknown isn't penalized.
- Down-weight: a w_reliability term (cluster mean consensus reliability) folds
  into the proposal confidence blend, so branches built on shaky calls score
  lower and need more support to reach READY_FOR_REVIEW.
- mig 0031 merges min_consensus_confidence + w_reliability (weights
  renormalized to 0.35/0.2/0.25/0.2) into the 0029 discovery_config seed.

Closes the AppView's multi-test-type ask (coverage conformance + consensus).
Test: low-confidence contributor excluded (count drops); a modest-but-kept
cohort is pulled below READY.
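
The exclude and down-weight rules can be sketched as below. It assumes each contributor carries an optional (confidence, compatibility) pair from its reconciliation; `Compat::Compatible`, `Contributor`, `reliability`, and `cluster_reliability` are hypothetical names, and how the mean folds into the weighted blend is left out.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Compat {
    Compatible, // hypothetical name for any non-INCOMPATIBLE level
    Incompatible,
}

struct Contributor {
    /// None = the sample has no reconciliation row (un-reconciled).
    reconciliation: Option<(f64, Compat)>,
}

const MIN_CONSENSUS_CONFIDENCE: f64 = 0.5;

/// Some(reliability) if the contributor is pooled, None if excluded.
fn reliability(c: &Contributor) -> Option<f64> {
    match c.reconciliation {
        None => Some(1.0), // un-reconciled: kept, neutral reliability
        Some((_, Compat::Incompatible)) => None,
        Some((conf, _)) if conf < MIN_CONSENSUS_CONFIDENCE => None,
        Some((conf, _)) => Some(conf),
    }
}

/// (kept contributor count, mean reliability) for a cluster, or None when
/// every contributor was excluded.
fn cluster_reliability(cluster: &[Contributor]) -> Option<(usize, f64)> {
    let kept: Vec<f64> = cluster.iter().filter_map(reliability).collect();
    if kept.is_empty() {
        return None;
    }
    Some((kept.len(), kept.iter().sum::<f64>() / kept.len() as f64))
}

fn main() {
    let cluster = [
        Contributor { reconciliation: None },
        Contributor { reconciliation: Some((0.9, Compat::Compatible)) },
        Contributor { reconciliation: Some((0.3, Compat::Compatible)) },
        Contributor { reconciliation: Some((0.95, Compat::Incompatible)) },
    ];
    if let Some((n, mean)) = cluster_reliability(&cluster) {
        println!("kept {n} contributors, mean reliability {mean:.2}");
    }
}
```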

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold scope-control into the candidate-mining design so the AppView never
materializes an N×N pair list and never hands a Navigator client "everyone".

- §3.0 (new): block, don't pair. Ancestry blocking (super-population +
  PCA-coordinate grid/LSH on fed.population_breakdown) before scoring; match-
  graph 2-hop expansion as the steady state; bounded top-K per sample.
- §3b: the overlap gated_pairs set is the ancestry block (super-pop + PCA cell
  + haplogroup bucket), scored within-block + persisted incrementally — never
  the full N².
- §3c: matches-of-matches is the primary generator (Leeds/AutoCluster shared-
  match principle), with the endogamy caveat (cap + down-weight).
- §3d (new): cold start = query-vs-panel (RaPID-Query-class, Edge-side), not
  panel-vs-panel; the AppView supplies the panel subset.
- §3e (new): research backing + the detection-vs-selection split (PBWT/RaPID
  is the Edge's job; the AppView holds no genotypes).
- §13: flag the D1-independent first slice (block + graph expansion -> ranked
  match_suggestion is buildable now, ahead of D1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t slice)

The D1-independent first slice of D3. The AppView coordinates IBD by proposing
introduction candidates from anonymized fed.* aggregates — no genotypes, no
exchange channel. The load-bearing rule (D3 §3.0): never materialize N×N, never
hand a client "everyone".

du_db::ibd::recompute_suggestions (advisory-locked, declarative — mirrors the
sequencer/discovery engines). Three signals -> ibd.match_suggestion:
- population overlap (Σ min over fed.population_breakdown.components), computed
  ONLY within ancestry blocks (dominant super-population × a z-scored PCA grid
  cell, scale-free); caches into ibd.population_overlap_score.
- shared terminal Y/mt consensus_haplogroup (fed.haplogroup_reconciliation via
  the repo_did bridge), score = inverse cohort frequency (rarer = deeper).
- shared-match: 2-hop over ibd_discovery_index (in-common-with/Leeds signal;
  dormant until the graph has edges).
Combine per pair (weighted), rank per target, cap top-K (the no-N×N guarantee),
write both directions; declarative recompute preserves DISMISSED/CONVERTED.
suggestions_for reader. Job: run-once ibd-discovery-recompute + daily.
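
Two pieces of the engine are easy to illustrate in isolation: the Σ min population overlap and the top-K cap. A minimal sketch, assuming components are fractions keyed by population label and candidates are (sample id, combined score) pairs; `population_overlap` and `top_k` are hypothetical names, and blocking plus signal weighting are omitted.

```rust
use std::collections::HashMap;

/// Σ min over shared components: for each population both samples carry,
/// add the smaller of the two fractions.
fn population_overlap(a: &HashMap<&str, f64>, b: &HashMap<&str, f64>) -> f64 {
    a.iter()
        .filter_map(|(pop, fa)| b.get(pop).map(|fb| fa.min(*fb)))
        .sum()
}

/// Keep only the K best-scoring candidates for one target, highest first.
fn top_k(mut scored: Vec<(u32, f64)>, k: usize) -> Vec<(u32, f64)> {
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored.truncate(k);
    scored
}

fn main() {
    let a = HashMap::from([("EUR", 0.7), ("AFR", 0.3)]);
    let b = HashMap::from([("EUR", 0.5), ("EAS", 0.5)]);
    println!("overlap = {}", population_overlap(&a, &b));
    println!("{:?}", top_k(vec![(1, 0.2), (2, 0.9), (3, 0.5)], 2));
}
```

Capping per target bounds the output at K rows per sample regardless of block size, which is what keeps the suggestion table from growing quadratically.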

Engine-only — no public API (candidate pairs gate on the D1 consent flow).
Test: a cross-continental pair is blocked; within-block overlap + shared
haplogroup are suggested; idempotent; dismissed pair not re-suggested.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The AppView side of D1, the shared substrate gating the Match (D3) and
Platform (D2/D4/D5) tracks. The AppView is a PII-free broker: it records
consent, mirrors published X25519 keys, gates dual-consent into a session,
and blind-relays opaque ciphertext — it never sees plaintext or session keys,
so it needs no X25519/AEAD crypto (that's Navigator's du-exchange crate).

Auth is Ed25519 DID signatures, not OAuth/cookie: every Edge submission signs
a canonical message (du_db::exchange::messages, a cross-repo contract) with its
DID identity key; the broker verifies via du_atproto::verify_did_key (did:key
direct, did:plc/web resolved). So D1 does not wait on the OAuth joint test.

- mig 0032: exchange.* schema (request/consent/session/relay_envelope/
  publickey); the unused ibd.match_request/match_consent (mig 0007) fold in
  and are dropped (the candidate engine's match_suggestion/ibd_discovery_index
  remain).
- du_db::exchange: publish/fetch key, create_request, record_consent with the
  dual-consent gate (both affirmative -> CONSENTED + ESTABLISHING session,
  7-day TTL; false -> DECLINED), pending_for (exchange-ready), blind relay
  post/pull/ack (participant-gated, recipient-only ack, delete-on-ack), expire.
- du-web /api/v1/exchange/* (signature-verified, not in public OpenAPI):
  key, request, consent, pending (signed poll, replay window), relay post
  (signed over the blob SHA-256, 1 MiB cap), relay/pull, relay/ack.
- du-jobs exchange-expire run-once + hourly (TTL sweep).

Tests: dual-consent -> session, decline, idempotency, relay round-trip,
participant gate, TTL cascade; du-web did:key-signed request verifies and a
tampered signature -> 403.
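
The dual-consent gate can be sketched as a pure function over the two parties' answers. This is an illustration only: `Request`, `Party`, `Outcome`, and `record_consent` here are hypothetical, timestamps are plain seconds, and the first-answer-wins handling of retries is an assumption about how idempotency might work.

```rust
/// 7-day session TTL, in seconds.
const SESSION_TTL_SECS: u64 = 7 * 24 * 60 * 60;

#[derive(Debug, PartialEq)]
enum Outcome {
    Pending,
    Declined,
    /// Both parties consented; an ESTABLISHING session would be opened.
    Consented { session_expires_at: u64 },
}

#[derive(Default)]
struct Request {
    initiator: Option<bool>,
    recipient: Option<bool>,
}

enum Party {
    Initiator,
    Recipient,
}

fn record_consent(req: &mut Request, party: Party, affirmative: bool, now: u64) -> Outcome {
    let slot = match party {
        Party::Initiator => &mut req.initiator,
        Party::Recipient => &mut req.recipient,
    };
    // Assumption: the first recorded answer wins, so a retry cannot flip it.
    if slot.is_none() {
        *slot = Some(affirmative);
    }
    match (req.initiator, req.recipient) {
        (Some(false), _) | (_, Some(false)) => Outcome::Declined,
        (Some(true), Some(true)) => Outcome::Consented {
            session_expires_at: now + SESSION_TTL_SECS,
        },
        _ => Outcome::Pending,
    }
}

fn main() {
    let mut req = Request::default();
    println!("{:?}", record_consent(&mut req, Party::Initiator, true, 0));
    println!("{:?}", record_consent(&mut req, Party::Recipient, true, 100));
}
```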

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nodes

The AppView side of D2: a vendor-neutral pseudonymous "person" node co-admins
attach project memberships + merge-links to, without the AppView ever learning
a name, kit number, or hash of one. Identity resolution stays Edge-to-Edge over
D1 (id-list exchange) / D3 (genetic). The doc's AppView-side kit# hashing is
rejected as brute-forceable; this stores no identifiers at all.

- mig 0033 research.*: research_subject (UUID + custody_did + retired_into
  tombstone), subject_membership -> social.group_project, subject_link audit,
  sparse subject_biosample. Every column pseudonymous (UUID/DID).
- du_db::research: register_in_project (mint or attach an id-exchange-agreed
  id, idempotent membership); merge_subjects = TOMBSTONE not delete (record
  the link, repoint retire's memberships/biosamples to keep, set retired_into
  so a local holder of the retired pseudonym still resolves it); set_custody
  (member-claim flip); link_biosample; authz readers (project_owner,
  is_steward_of, is_project_participant). Canonical signed messages.
- crate::sig::verify_signed extracted from routes/exchange.rs (shared D1/D2
  Ed25519 DID-signature auth; did:key direct, did:plc/web resolved).
- du-web /api/v1/research/* (not in public OpenAPI): subject (register), merge,
  custody, subjects (signed poll). Each signature-authenticated AND authorized
  from existing data: register -> project owner; merge -> steward of both;
  custody -> subject's steward; read -> project participant. Extends to
  project-admins under D5 (no member table needed now).

Tests: du-db register/tombstone-merge/custody/authz + self-merge reject;
du-web owner-gated signed register (200), non-owner valid sig -> 403, tampered
sig -> 403.
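
The tombstone merge can be sketched in memory as below. It is a simplified illustration, assuming integer ids in place of UUIDs and a flat membership list; `Store`, `merge_subjects`, and `resolve` are hypothetical stand-ins for the du_db::research functions, and the subject_link audit record is omitted.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Store {
    /// retired subject id -> surviving subject id (the tombstone pointer).
    retired_into: HashMap<u32, u32>,
    /// (subject id, project id) pairs.
    memberships: Vec<(u32, u32)>,
}

impl Store {
    /// Merge `retire` into `keep`: repoint memberships, then tombstone.
    fn merge_subjects(&mut self, keep: u32, retire: u32) -> Result<(), String> {
        if keep == retire {
            return Err("conflict: cannot merge a subject into itself".into());
        }
        for m in &mut self.memberships {
            if m.0 == retire {
                m.0 = keep;
            }
        }
        // Repointing can create duplicates when both were in the same project.
        self.memberships.sort();
        self.memberships.dedup();
        self.retired_into.insert(retire, keep);
        Ok(())
    }

    /// Follow tombstones so a holder of a retired id still finds the survivor.
    fn resolve(&self, id: u32) -> u32 {
        let mut cur = id;
        let mut hops = 0;
        while let Some(&next) = self.retired_into.get(&cur) {
            if hops >= self.retired_into.len() {
                break; // defensive guard against a pointer cycle
            }
            cur = next;
            hops += 1;
        }
        cur
    }
}

fn main() {
    let mut store = Store::default();
    store.memberships = vec![(1, 10), (2, 10), (2, 11)];
    store.merge_subjects(1, 2).unwrap();
    println!("2 resolves to {}, memberships {:?}", store.resolve(2), store.memberships);
}
```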

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The group-project ACL: the collaborator team (DIDs + roles) is the consent/
scope boundary that gates the stack. Reuses the existing social.group_project
(mig 0009) as the project (no duplicate research.project); owner_did is the
founding ADMIN.

- mig 0034: research.project_member (project_id -> social.group_project,
  member_did, role, permissions[], appointed_by, joined_at, left_at).
- du_db::research ACL: Role {ADMIN/CO_ADMIN/MODERATOR/CURATOR} + Capability +
  Role::allows (the D5 §4 map); role_of (owner_did => ADMIN, else live
  project_member), is_team_member, can, add_member/revoke_member(left_at)/
  members_of; canonical signed messages.
- Wired in: D2 register is ManageSubjects-gated (ADMIN/CO_ADMIN; owner still
  passes), subjects read is team-gated; D1 project:<id>-scoped request +
  consent require the actor be a live team member (exchange::request_meta +
  project_scope_id). Non-project scopes unaffected.
- Team endpoints /api/v1/research/project/{member, member/revoke, members} —
  signed (crate::sig) + ManageRoles-gated (members list team-gated).

Tests: du-db role_of/add/revoke/capability-map; du-web admin-gated add-member
(non-admin -> 403) + project-scoped exchange request requires team membership.
Forward-only capabilities (WriteAssertions/ResolveDispute/PromoteToCatalog) are
defined for the cross-repo contract, enforced when D4 lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t_view

The last Platform-track piece. Co-admin research is modeled as attributed,
append-only, scoped assertions over a pseudonymous research_subject (D2), not
direct row mutation — gated by the D5 ACL (WriteAssertions/ResolveDispute).

- mig 0035: research.assertion + research.subject_current_view (PK
  subject_id,predicate,scope — per-project isolation; no PII column by design).
- du_db::research: Predicate enum + PII classifier (MDKA_IS/IDENTITY rejected →
  R3 P2P only, no AppView table; NOTE PII-by-default unless pii_cleared; scan_pii
  scrubber), record/retract_assertion + refold (SETTLED|DISPUTED, never
  auto-collapsed; assign-and-prune), accept_same_person (drives D2
  merge_subjects method=ASSERTION), messages::{assert,retract,resolve}.
- du-web: /api/v1/research/{assertion,assertion/retract,assertion/resolve,
  current-view} — signed (crate::sig) + role-gated.
- Tests: du-db assertion_store_fold_and_rails; du-web assertion_endpoints_gated.

Deferred (Navigator/later): R3 PII over D1 + assertion_local; R1 lexicon +
du-jobs Jetstream ingest (no publisher yet); tree.change_set promotion.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The entry point of the federated IBD flow. The candidate engine already
writes ibd.match_suggestion from fed.*; this serves it to clients — and
needed NO new auth foundation: the existing Ed25519 signed-poll pattern
(verify_signed + messages::poll + 300s window) plus the
core.biosample.atproto->>'repo_did' bridge the engine itself uses give an
owner-DID-scoped read with no DPoP/OAuth and no unauthenticated stopgap.

- du_db::ibd: messages::{poll,introduce}; suggestions_for_did (owner-scoped
  via the repo_did bridge); is_suggested_to_did (introduce authz);
  owner_did_of_sample (server-side counterpart resolution).
- routes/ibd.rs (signed, personal scope — not project-scoped):
  GET /api/v1/ibd/suggestions → pseudonymous rows (suggested_sample_guid +
  non-PII {signals}; never a counterpart DID);
  POST /api/v1/ibd/introduce → resolves counterpart server-side, calls
  exchange::create_request (purpose=IBD_AUTOSOMAL, idempotent request_uri),
  returns only {request_uri, PENDING}. Caller learns the counterpart DID
  only after mutual consent via exchange::pending_for.
- Tests: du-db suggestions_scoped_by_owner_did; du-web
  suggestions_scoped_and_introduce_hides_counterpart.

No migration (reuses ibd.match_suggestion, core.biosample, exchange.*).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e IBD loop

The introduce→consent loop had no way for a recipient to discover a request
addressed to them (/pending only shows established sessions). This adds the
missing counterpart-discovery path, unblocking the Navigator round-trip.

- du_db::exchange::incoming_for(did) → PENDING requests addressed to `did`
  that it hasn't acted on. Symmetric-blind: returns request_uri + purpose +
  created_at, deliberately NO initiator_did — the recipient consents blind;
  both sides reveal identity only on mutual consent (via pending_for).
- GET /api/v1/exchange/incoming (signed poll, reuses exchange-poll message).
- IBD introduce now mints an opaque request_uri = urn:ibd:<sha256(did:guid)>
  (was urn:ibd:<did>:<guid>) so the handle can't leak the initiator DID to
  the recipient. Still deterministic ⇒ idempotent per caller+candidate.
- Tests: du-db incoming_surfaces_pending_to_recipient_only (recipient sees it,
  initiator doesn't, clears once acted on); du-web ibd test extended — the
  counterpart discovers the introduced request via /incoming, blind to the
  initiator, and the handle embeds no DID.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ndation)

Fixes the gap that made signed Edge calls unusable by real clients: the
did:plc DID-doc #atproto signing key is PDS-custodied, so a desktop client
can't sign with it and can't add its own verificationMethod. verify_signed
only worked for did:key.

Clients now publish their Ed25519 device PUBLIC key as a
com.decodingus.atmosphere.deviceKey record in their own repo (repo-write =
proof of control over repo_did); the AppView ingests it like any fed.*
record and verifies signatures against it. Per-call auth stays OAuth-free.

- mig 0036: fed.device_key (PK (did,rkey) ⇒ N devices/DID; public_key as a
  did:key string; PII-free).
- du_db::fed::device_key: upsert (time_us-ordered) + keys_for; NS_DEVICE_KEY
  in INGEST_COLLECTIONS + table_for (so fed::delete = revocation).
- du-jobs jetstream: ingest arm for the deviceKey collection.
- du-web::sig::verify_signed(pool, ...): did:key self-certifies; did:plc/web
  match any registered device key (none ⇒ 403 bootstrap); DID-doc resolution
  dropped (no per-call network). All 18 signed call sites thread &st.pool.
- Tests: du-db device_key (lookup/ordering/revocation); du-web sig inline
  (did:plc gated on registration, did:key still self-certifies).

Cross-repo contract: NSID com.decodingus.atmosphere.deviceKey, record field
publicKey = did:key string. Revoke by deleting the record.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Attaches paper biosamples as leaves under the tree node their published
haplogroup call resolves to, with a cumulative per-node sample count and a
click-through leaf list. D2C (source='CITIZEN') excluded.

- mig 0037: tree.haplogroup_sample (PK (sample_guid, dna_type) ⇒ Y now +
  mt later; haplogroup_id NULL when UNPLACED).
- du_db::tree_sample: recompute_placements(dna) — advisory-locked
  declarative engine that resolves core.biosample.original_haplogroups
  calls via haplogroup::resolve_name_or_variant (name→alias→defining-SNP→
  normalize), assigns PLACED/UNPLACED, prunes, bumps tree_revision;
  counts_by_node + samples_under (recursive-CTE, at-or-below + citation).
  biosample::pick_original_call made pub(crate) for reuse.
- du-web/api.rs: HaplogroupNodeDto.sample_count (cumulative, rolled up in
  build_level); GET /api/v1/{y,mt}-tree/node/{name}/samples leaf list.
- du-jobs: run-once tree-samples-recompute + daily (Y).
- Tests: du-db tree_sample (resolution paths, D2C excluded, UNPLACED,
  cumulative, prune); du-web tree_carries_sample_count_and_leaf_list.

Follow-up (deferred): HTML cladogram per-node count + sidebar; mt recompute
once the mt tree lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The JSON API already served YFull-style sample placement; this adds the
presentation layer to the server-rendered SVG cladogram + SNP sidebar.

- du_db::tree_sample::cumulative_counts(dna): recursive CTE rolling each
  placed sample up to all ancestors → full-tree at-or-below counts, so a
  depth-bounded window node still counts hidden descendants.
- tree_layout::{InNode,LaidNode}.sample_count threaded through
  routes/tree.rs build_root/to_innode; svg.html shows "· N samples" on each
  node (conditional, on the variants line that opens the sidebar).
- snp_sidebar handler + template list the node's placed leaves via
  samples_under (label + source badge + citation; capped 50 + "+N more").
- i18n keys tree.samples / tree.samples.title / tree.samples.more (en/es/fr).
- Tests: du-db cumulative_counts assertion; du-web
  cladogram_shows_sample_count_and_sidebar_leaves.
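
The roll-up that the recursive CTE performs can be mirrored in-process as a walk from each placed node to the root. A minimal sketch, assuming integer node ids, a child-to-parent map, and per-node direct counts; this `cumulative_counts` signature is hypothetical and differs from the du-db function, which takes a DNA type and queries the database.

```rust
use std::collections::HashMap;

/// For every node, the number of samples placed at or below it.
fn cumulative_counts(
    parent_of: &HashMap<u32, u32>,
    direct: &HashMap<u32, u64>,
) -> HashMap<u32, u64> {
    let mut total: HashMap<u32, u64> = HashMap::new();
    for (&node, &n) in direct {
        let mut cur = Some(node);
        let mut hops = 0;
        while let Some(id) = cur {
            *total.entry(id).or_insert(0) += n;
            hops += 1;
            if hops > parent_of.len() {
                break; // defensive guard against cyclic parent data
            }
            cur = parent_of.get(&id).copied();
        }
    }
    total
}

fn main() {
    // Tree: 1 is the root; 2 and 4 are its children; 3 is a child of 2.
    let parent_of = HashMap::from([(2, 1), (3, 2), (4, 1)]);
    let direct = HashMap::from([(3, 2), (2, 1), (4, 1)]);
    let totals = cumulative_counts(&parent_of, &direct);
    println!("root total = {}", totals[&1]);
}
```

Computing totals over the whole tree, not just the rendered window, is what lets a depth-bounded node still report descendants that are not drawn.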

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Preseeds genomics.sequencing_lab + sequencer_instrument so the lab lookup
(/api/v1/sequencer/lab) resolves the common D2C instruments from launch.
Derived from instrument_centers.tsv: rows with n_crams > 2, assigned to the
max-frequency lab (NO_CSV placeholder + the blank-instrument row dropped).

- 5 labs (canonical full names, is_d2c): Family Tree DNA, Dante Labs,
  Nebula Genomics, Full Genomes Corporation, YSEQ.
- 36 instruments (FTDNA 6, Dante 6, YSEQ 11, FGC 11, Nebula 2); model_name =
  the export platform, manufacturer derived (Illumina/MGI/NULL).
- Idempotent: labs ON CONFLICT (name) DO NOTHING (coexists with any
  consensus/ETL lab), instruments ON CONFLICT (instrument_id) DO UPDATE.
  No schema/code change — lookup_lab reads the seeded tie.
- Test seed_preloads_ydna_warehouse_labs; prior lookup test made seed-aware.

Verified on the dev DB (which had 0 labs — legacy public.sequencing_lab is
empty, so no collision).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Follow-ups that complete two recent features, plus housekeeping.

IBD:
- engine records the shared haplogroup's DNA arm (hgDnaType) in suggestion
  metadata; introduce routes the exchange purpose from the dominant signal
  (HAPLOGROUP → IBD_Y/IBD_MT, else IBD_AUTOSOMAL) via introduction_purpose,
  which also replaces the bare is_suggested_to_did authz and is returned in
  the response.
- introduce marks the suggestion CONVERTED (drops from the active list,
  still idempotently re-introducible).
- new POST /api/v1/ibd/dismiss (signed ibd-dismiss) → dismiss_suggestion
  sets ACTIVE→DISMISSED; the engine already preserves DISMISSED.
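
The purpose routing can be sketched as a single match. The enum shapes below are hypothetical, and falling back to autosomal when a haplogroup signal has no recorded DNA arm is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Signal {
    Population,
    Haplogroup,
    SharedMatch,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DnaType {
    Y,
    Mt,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Purpose {
    IbdAutosomal,
    IbdY,
    IbdMt,
}

/// HAPLOGROUP routes by the recorded DNA arm; everything else is autosomal.
fn introduction_purpose(dominant: Signal, hg_dna_type: Option<DnaType>) -> Purpose {
    match (dominant, hg_dna_type) {
        (Signal::Haplogroup, Some(DnaType::Y)) => Purpose::IbdY,
        (Signal::Haplogroup, Some(DnaType::Mt)) => Purpose::IbdMt,
        _ => Purpose::IbdAutosomal,
    }
}

fn main() {
    println!("{:?}", introduction_purpose(Signal::Haplogroup, Some(DnaType::Y)));
    println!("{:?}", introduction_purpose(Signal::Population, None));
    println!("{:?}", introduction_purpose(Signal::SharedMatch, None));
}
```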

Tree sample leaves:
- status='CURATED' for manual placements, which recompute_placements
  preserves (skips re-resolution + prune); counts/samples_under treat
  PLACED+CURATED as placed.
- Curator-gated GET /manage/tree-sample/unplaced (triage queue) +
  POST /manage/tree-sample/place (pin a sample under a chosen node).

Housekeeping:
- .DS_Store added to root .gitignore.
- fix sequencer_endpoints_route_and_404: the 0038 seed makes
  lab-instruments non-empty, a latent break that the seed work's
  du-db-only test run missed.

Tests: du-db introduction_purpose_and_dismiss_convert_lifecycle, tree_sample
CURATED-survives-recompute + triage; du-web introduce-purpose + dismiss +
curator-gating. Full suites green (du-web 27/27).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the per-node "· N samples" count with actual sample leaves: each
placed (non-D2C) sample hangs off its haplogroup node as its own terminal
tip — a small distinct marker + label — like YFull.

- du_db::tree_sample::direct_labels: placed-sample labels per node id
  (bounded to the rendered window).
- tree_layout: LaidTip + Laid.tips; tips laid out as full node-slot leaves
  (spaced like any other leaf, so labels never collide in either
  orientation); a node centers over its children AND its tips; tip
  connectors share the same vertical bus as the node's child connectors.
- routes/tree.rs: thread per-node sample labels into the layout, capped at
  8 tips/node with a "+N" overflow tip that opens the sidebar.
- svg.html: render tips (green dot + monospace label; grey "+N" overflow);
  drop the count text from the node box.

The JSON tree API keeps its own sample_count (unchanged). Test asserts the
SVG renders sample tips.
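
The per-node tip cap might look like the sketch below. It assumes the overflow tip is added on top of the 8 labelled tips (the commit does not say whether it counts toward the cap); `tips_for_node` and `MAX_TIPS` are hypothetical names.

```rust
const MAX_TIPS: usize = 8;

/// Labels to draw as tips for one node: at most MAX_TIPS sample labels,
/// followed by a "+N" overflow tip when more samples exist.
fn tips_for_node(labels: &[&str]) -> Vec<String> {
    let mut tips: Vec<String> = labels.iter().take(MAX_TIPS).map(|s| s.to_string()).collect();
    if labels.len() > MAX_TIPS {
        tips.push(format!("+{}", labels.len() - MAX_TIPS));
    }
    tips
}

fn main() {
    let labels: Vec<String> = (1..=11).map(|i| format!("S{i}")).collect();
    let refs: Vec<&str> = labels.iter().map(String::as_str).collect();
    println!("{:?}", tips_for_node(&refs));
}
```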

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the ISOGG-import tree foundation with the de-novo IQ-TREE/ASR
tree built from genotypes (~/Genomics/ytree). The de-novo tree is the
foundation — nothing grafts onto it.

- du_db::denovo: foundation loader. Matches each branch SNP to
  core.variant by hs1 coordinate (reuse YBrowse catalog name, else mint
  chrY:<pos><anc>><der>); names nodes label → catalog-SNP → NodeN;
  inserts nodes/edges/defining links with an "unresolved" provenance
  block for collapsed-branch SNPs lifted to the nearest survivor.
- Phase 3 leaves: places doc.tips[] via tree.haplogroup_sample,
  get-or-create core.biosample by accession (deduped across Y/mt); PRJEB*
  → public EXTERNAL, own WGS229 → private.
- Phase 4 curation: tree.denovo_conflict (mig 0039) + read-only Curator
  queue at /curator/denovo-conflicts (page+HTMX, lineage filter); dc.*
  i18n in en/es/fr; dashboard card.
- haplogroup: reset_tree (greenfield clear) + dna-scoped clear_dna so Y
  and mt coexist; recompute_backbone seeds on macro-clade isogg labels.
  reconcile_tilde_twins now folds only empty-stub paragroup twins.
- decodingus-tree-init --denovo-y/--denovo-mt <json> --apply.

Tests: denovo.rs (catalog reuse/mint, naming, edges, unresolved block,
tip placement, conflicts, Y+mt coexistence, dna-scoped clear),
tilde_twins.rs.
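
The two naming fallbacks in the loader can be sketched as plain functions. This assumes the catalog lookup has already happened and yields an optional name; `variant_name` and `node_name` are hypothetical helper names, not the du_db::denovo API.

```rust
/// Reuse the catalog (YBrowse) name when the coordinate matched, otherwise
/// mint a positional name of the form chrY:<pos><anc>><der>.
fn variant_name(catalog: Option<&str>, pos: u64, anc: char, der: char) -> String {
    catalog
        .map(str::to_string)
        .unwrap_or_else(|| format!("chrY:{pos}{anc}>{der}"))
}

/// Node naming precedence: label -> catalog SNP -> NodeN.
fn node_name(label: Option<&str>, catalog_snp: Option<&str>, index: u32) -> String {
    label
        .or(catalog_snp)
        .map(str::to_string)
        .unwrap_or_else(|| format!("Node{index}"))
}

fn main() {
    println!("{}", variant_name(None, 2_887_824, 'A', 'G'));
    println!("{}", node_name(None, Some("M343"), 82));
    println!("{}", node_name(None, None, 82));
}
```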

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Builder added an `n_mut ≥ 1` clause to the publication keep rule:
survive iff (UFBoot ≥ 95 AND ≥1 defining mutation) OR keepset. This
drops the zero-mutation mt placeholder nodes (Node82/110/…) that UFBoot
over-supported; named children reattach to the parent as polytomies, so
no tips are lost and every named clade survives. chrM: 2,015 → 1,765
nodes / 3,344 tips. No loader/exporter change — survival is read from
the publication treefile, so re-export + reload suffices.
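
The keep rule is a one-line predicate. A minimal sketch, with `Branch` and `survives` as hypothetical names; the real rule lives in the external tree builder, not in this repository.

```rust
use std::collections::HashSet;

struct Branch<'a> {
    name: &'a str,
    ufboot: f64,
    n_mut: u32,
}

/// Survive iff (UFBoot >= 95 AND at least one defining mutation) OR the
/// branch is in the keepset.
fn survives(b: &Branch, keepset: &HashSet<&str>) -> bool {
    (b.ufboot >= 95.0 && b.n_mut >= 1) || keepset.contains(b.name)
}

fn main() {
    let keepset: HashSet<&str> = HashSet::from(["H2a"]);
    let placeholder = Branch { name: "Node82", ufboot: 99.0, n_mut: 0 };
    let named = Branch { name: "H2a", ufboot: 60.0, n_mut: 0 };
    println!("Node82 survives: {}", survives(&placeholder, &keepset));
    println!("H2a survives: {}", survives(&named, &keepset));
}
```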

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Close the IBD coordination loop and activate the dormant shared-match
signal. After two consented Edges finish their encrypted comparison,
each posts a signed, PII-free attestation of the outcome; once both
agree, the match is confirmed and becomes discoverable.

- mig 0040: adapt ibd.ibd_pds_attestation to DID auth — attesting_did,
  exchange_request_uri, per-party reported_total_cm/segments; drop the
  legacy attesting_pds_guid NOT NULL; idempotency unique index.
- du_db::ibd::record_attestation + messages::attest + AttestationOutcome.
  Privacy rails: the attester must be a party to a CONSENTED IBD exchange
  and own its side of the pair (the other party owns the counterpart) —
  so every match-graph edge traces to a real dual-consent, no forged
  edges. Consensus: both non-dispute reports within max(10cM,20%) →
  CONFIRMED + is_publicly_discoverable; DISPUTE/REVOCATION → DISPUTED.
  Signal 3 (shared-match) now reads only publicly-discoverable edges.
- POST /api/v1/ibd/attest (signed; 403 on reject).
- depth_score: weight the haplogroup signal by the shared clade's tree
  depth (rarity × d/(d+half), half=8) via a recursive-CTE name→depth
  walk — sharing a deep terminal ≫ sharing a macro-clade. Enabled by the
  de-novo tree.

Tests: ibd_attestation.rs (consensus + shared-match activation,
rejection rails, depth ordering); routes/ibd.rs attest endpoint
(signed / bad-sig 403 / non-consented 403).
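
The consensus tolerance and the depth weighting are both small formulas. A minimal sketch, assuming the 20% is taken relative to the larger of the two reported totals (the commit does not say which base is used); `reports_agree` and this `depth_score` signature are hypothetical.

```rust
/// Two reported totals agree when they differ by at most max(10 cM, 20%).
/// Assumption: the 20% is relative to the larger report.
fn reports_agree(a_cm: f64, b_cm: f64) -> bool {
    let tolerance = f64::max(10.0, 0.20 * a_cm.max(b_cm));
    (a_cm - b_cm).abs() <= tolerance
}

/// rarity × d/(d+half) with half = 8: the score reaches half the rarity at
/// depth 8 and approaches the full rarity for very deep clades.
fn depth_score(rarity: f64, depth: u32) -> f64 {
    const HALF: f64 = 8.0;
    let d = depth as f64;
    rarity * d / (d + HALF)
}

fn main() {
    println!("100 vs 115 cM agree: {}", reports_agree(100.0, 115.0));
    println!("20 vs 35 cM agree: {}", reports_agree(20.0, 35.0));
    println!("depth 8 score: {}", depth_score(1.0, 8));
}
```

The saturating d/(d+half) form rewards depth without letting very deep terminals dominate the combined score.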

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Rust rewrite under rust/ is the application; retire the Scala 3 / Play
codebase it replaces. Removes the app source, build, deploy, and CI:

- app/ conf/ test/ project/ build.sbt public/ — Scala source, Play config
  (routes, evolutions, i18n), tests, sbt build, static assets
- Dockerfile docker-compose.yml docker-compose.prod.yml docker/ .dockerignore
  — sbt-stage image + SLICK/Play compose (the decodingus-db service, not the
  du-pg dev container the Rust app uses)
- .github/workflows/ci.yml — sbt/JDK21 CI
- PROJECT_ANALYSIS.md .env.example — Scala-era docs/config

Kept: rust/ (the app), documents/ (design docs), scripts/ (deploy-generic
maintenance page), README, LICENSE, CODE_OF_CONDUCT. Removed: 735 files, ~81.7k lines.
Rust workspace builds unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the Scala-coexistence framing (the legacy codebase is gone — the
Rust app is now the platform), describe the trees as de-novo IQ-TREE/ASR
phylogenies (not the retired ISOGG/FTDNA graft), remove the deleted app/
and Docker references, and update the repository layout.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The de-novo loader is the sole tree-building path now, so remove the
subsystem orphaned by that pivot — reachable only through dead
tree-init CLI flags, never the running app:

- du-db/src/snp_graft.rs (1032 lines, the ISOGG/prod SNP-graft engine)
- the ISOGG/graft half of tree_init.rs (763 -> 90 lines: just the
  --denovo-{y,mt} loader remains)
- reconcile_tilde_twins + reset_tree from haplogroup.rs (de-novo uses
  the per-lineage clear_dna)
- the graft-fed curator-review vertical: du-db/src/wip.rs,
  du-web routes/reviews.rs + templates, change_set::apply_wip_resolutions
  (snp_graft::stage_review was its only producer; de-novo curation goes
  through /curator/denovo-conflicts) + reviews i18n in all three locales

Kept live: du_db::merge::materialize + du_domain::merge + change_set
(the /versioning + /change-sets tree-merge routes) — proven intact by
merge_e2e. Net change: roughly -2.9k lines. Build, clippy, the change_set,
merge_e2e, denovo, and migrations tests, and 28 du-web tests are all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The 1717-line public-API god-file mixed every wire DTO, the tree
assembly/cache logic, and 43 handlers. Decompose by concern:

- api/dto.rs  — all response/request DTOs + query-param structs
- api/tree.rs — haplotree assembly, ETag/conditional-GET cache, and the
  /api/v1/{y,mt}-tree[...] handlers
- api/mod.rs  — router, the single utoipa OpenAPI doc, and the remaining
  (non-tree) handlers

The central ApiDoc still references every handler/DTO, so mod.rs
re-imports the submodules (use dto::*, use tree::*). Behavior-preserving:
all 28 du-web tests pass, no new clippy warnings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the retired Scala sbt CI. One ubuntu-latest job, working in
rust/, with a PostGIS service (postgis/postgis:16-3.4) + DATABASE_URL so
the DB-backed tests run for real:

  cargo build --workspace --locked
  cargo clippy --workspace --locked -- -D warnings   # lib + bins
  cargo test  --workspace --locked

Runs on push to main/rust-rewrite-foundation and all PRs; caches via
Swatinem/rust-cache; shared crates are public https git deps (no secrets).
No fmt gate (the codebase uses intentional hand-formatting).

To make the strict clippy gate green, fix the only two warnings:
a redundant `&mut **tx` deref (merge.rs) and a complex map type factored
into VariantKey/ResolvedVariant aliases (denovo.rs).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the early stub (which listed vars the code never reads, e.g.
OPENALEX_BASE_URL / RECAPTCHA_ENABLED) with the actual environment the
app reads, grouped by concern and annotated with defaults + required vs
optional: core (DATABASE_URL, APP_SECRET, PORT, RUST_LOG, DU_BASE_URL,
DU_ASSETS_DIR), AT Proto OAuth, curator/forms, du-jobs (Jetstream /
YBrowse / yregions), and external APIs. Only DATABASE_URL is required;
the rest degrade gracefully.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The first simplify pass deleted the ISOGG-import tree-init CLI flags but
left the du-db functions they backed with no remaining production caller
(test-only or zero callers). Remove them — all obsolete with the de-novo
tree pivot:

- haplogroup.rs: scrub_recurrent_links, label_recurrence_transitions,
  rename_to_snp_shorthand, set_aliases + the YCC-rename helper cluster and
  its inline test module (1744 -> 1029 lines)
- variant.rs: set_coordinates_bulk, set_aliases_bulk,
  resolve_isogg_recurrence (763 -> 557 lines)
- 4 dead test files: scrub_recurrent, recurrence_label, rename_shorthand,
  resolve_recurrence

Kept variant::{upsert_by_name, delete_by_evidence_source}: they also have no
callers, but they belong to the YBrowse-ingest API and are not ISOGG orphans.
Net change: roughly -1.4k lines. Build, strict clippy (-D warnings), and all
du-db tests are green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@JamesKane JamesKane merged commit 338994a into main Jun 18, 2026
2 checks passed