Rust rewrite foundation #8
Merged
Foundation milestone for the Play/Scala 3 -> Rust rewrite (plan: clean-slate, full parity, Axum + SQLx + Askama, JSONB document columns, noodles for genomics, Apple `container` PostGIS for Docker-less local testing).
- 8-crate Cargo workspace (du-domain/db/bio/atproto/external/web/jobs/migrate); compiles and tests green. du-web boots Axum with the /health endpoint.
- du-domain: typed IDs, Postgres-mirroring enums, and the variant JSONB contract (Coordinates/Aliases/Annotations) with round-trip tests.
- Redesigned schema (migrations 0001-0009) across 10 namespaces, verified applying to live PostGIS. De-sprawl: 3 biosample tables -> 1 unified core.biosample; ~7 deprecated child tables folded into JSONB on parents; metadata DB collapsed into `fed`; scattered at_uri/at_cid -> one `atproto` JSONB column; GIN/GiST/expression indexes on queried JSONB paths.
- du-db: PgPool + run_migrations; live-DB integration test (gated on DATABASE_URL) covering all schemas + JSONB variant round-trip. build.rs watches migrations/ so sqlx::migrate! re-embeds on change.
- scripts/test-db.sh: Apple `container` PostGIS harness (IP-aware, since Apple containers have no localhost port forwarding), native DATABASE_URL fallback.
- Multi-stage Dockerfile (slim runtime, single binary, no JRE), compose.yaml, .env.example.
Coexists with the Scala app under rust/ during the transition.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Data-access layer for the public read surface, with a reusable mapping pattern:
JSONB columns decode into du-domain payload structs via sqlx Json<T>; Postgres
enums are read as ::text and parsed through serde (parse_pg_enum / pg_enum_label),
keeping du-domain free of any sqlx dependency.
- du-domain: Haplogroup, Publication, Biosample read-side types.
- du-db modules:
* variant - get_by_id, paginated search (canonical name + common_names/rs_ids
JSONB alias arrays)
* haplogroup - get_by_id/by_name, children, roots (current edges, valid_until IS NULL)
* publication - get_by_id, paginated search (title/journal/doi, newest-first)
* biosample - get_by_guid, find_by_alias_or_accession
* pagination - generic Page<T>
- tests/queries.rs: seeds sentinel rows, exercises every module against live
PostGIS, cleans up. Full workspace 7/7 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
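A minimal sketch of what a generic `Page<T>` like the one listed above might look like. The field names, the 1-based page convention, and the `page_offset` helper are assumptions for illustration; the commit only names the type.

```rust
/// One page of results plus the bookkeeping a pager needs.
/// Field names are hypothetical; only the name `Page<T>` comes from the commit.
#[derive(Debug, Clone, PartialEq)]
pub struct Page<T> {
    pub items: Vec<T>,
    pub page: u32,      // 1-based page number (assumption)
    pub page_size: u32, // rows requested per page
    pub total: u64,     // total rows matching the query
}

impl<T> Page<T> {
    pub fn new(items: Vec<T>, page: u32, page_size: u32, total: u64) -> Self {
        // Clamp to sane minimums so the arithmetic below never divides by zero.
        Self { items, page: page.max(1), page_size: page_size.max(1), total }
    }

    /// Ceiling division: 5 rows at 2 per page is 3 pages.
    pub fn total_pages(&self) -> u64 {
        let size = u64::from(self.page_size);
        (self.total + size - 1) / size
    }

    pub fn has_next(&self) -> bool {
        u64::from(self.page) < self.total_pages()
    }

    /// Convert the row type (e.g. DB row -> DTO) while keeping the page metadata.
    pub fn map<U>(self, f: impl FnMut(T) -> U) -> Page<U> {
        Page {
            items: self.items.into_iter().map(f).collect(),
            page: self.page,
            page_size: self.page_size,
            total: self.total,
        }
    }
}

/// SQL OFFSET for a 1-based page number.
pub fn page_offset(page: u32, page_size: u32) -> u64 {
    u64::from(page.max(1) - 1) * u64::from(page_size)
}
```

Keeping the type free of any database dependency lets the same page shape be returned by every query module and mapped into templates or DTOs at the edge.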
…ree)
First user-visible slice through Axum + Askama + HTMX, server-rendered with
HATEOAS fragment navigation, verified end-to-end against the live PostGIS:
- du-web restructured: AppState (PgPool), AppError -> HTTP mapping, explicit
Askama render helper, routes/ module. main() connects+migrates when
DATABASE_URL is set, else serves /health only.
- Variant browser: /variants page lazy-loads /variants/list fragment (search by
name + common_names/rs_ids JSONB aliases, paginated); rows load
/variants/detail/:id panel rendering multi-build coordinates.
- Tree: /ytree & /mtree pages lazy-load /{y,m}tree/fragment; clicking a node
swaps #tree-container and pushes the URL (hx-get + href + hx-push-url) so
browser back/forward walks the tree. Graceful empty states + 404s.
- du-domain: Display/label() for DnaType/MutationType/NamingStatus (templates).
- du-db: re-export PgPool so du-web needn't depend on sqlx; migrations test uses
a sentinel variant name to avoid colliding with real data.
- Askama templates (base/index/variants/tree) on Bootstrap 5 + htmx 2.
Workspace 7/7 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation)
Bring the read-only slice up to plan §4:
- Vendored assets: bootstrap 5.3.3 (css + bundle js) and htmx 2.0.4 under crates/du-web/assets/vendor, served via tower-http ServeDir at /assets (DU_ASSETS_DIR env, compile-time crate-dir fallback). Custom main.css replaces inline styles. CDN links removed; Dockerfile copies assets + sets DU_ASSETS_DIR.
- i18n (en/es/fr): Play-style key=value catalogs embedded via include_str; Lang+T translator with en/key fallback; Locale extractor resolves lang from the `lang` cookie then Accept-Language (default en); navbar language switcher with percent-encoded `next`; GET /language/:lang sets the cookie + redirects with an open-redirect guard. Templates fully localized. Unit tests assert the fallback chain and that es/fr cover every English key.
- HX-Request unification: HxRequest extractor (htmx + history-restore + target). Tree collapses to one handler per lineage (/ytree, /mtree): the full page embeds the current level inline; an HTMX swap of #tree-container returns just the fragment with a server-driven HX-Push-Url; history-restore and boosted navigations get the full page (target-aware negotiation). HxHeaders builder for HX-* response headers. Variant browser embeds its first results page inline (no load round-trip).
Verified live against the container PostGIS; workspace 9/9 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
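The i18n fallback chain described above (requested language, then English, then the raw key) can be sketched as follows. This is an illustration under assumptions, not the du-web code: `parse_catalog`, `Translator`, and `missing_keys` are hypothetical names, and `#` comment handling is assumed from the Play message-file convention.

```rust
use std::collections::HashMap;

/// Parse a Play-style `key=value` catalog; blank lines and `#` comments are skipped.
fn parse_catalog(src: &str) -> HashMap<String, String> {
    src.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| l.split_once('='))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Hypothetical translator holding one parsed catalog per language code.
struct Translator {
    catalogs: HashMap<String, HashMap<String, String>>,
}

impl Translator {
    fn new(sources: &[(&str, &str)]) -> Self {
        let catalogs = sources
            .iter()
            .map(|(lang, src)| (lang.to_string(), parse_catalog(src)))
            .collect();
        Self { catalogs }
    }

    /// Look up `key` in `lang`, fall back to `en`, then to the key itself.
    fn t(&self, lang: &str, key: &str) -> String {
        self.catalogs
            .get(lang)
            .and_then(|c| c.get(key))
            .or_else(|| self.catalogs.get("en").and_then(|c| c.get(key)))
            .cloned()
            .unwrap_or_else(|| key.to_string())
    }

    /// English keys that `lang` does not define (what a coverage test would assert is empty).
    fn missing_keys(&self, lang: &str) -> Vec<String> {
        let empty = HashMap::new();
        let en = self.catalogs.get("en").unwrap_or(&empty);
        let other = self.catalogs.get(lang).unwrap_or(&empty);
        let mut missing: Vec<String> =
            en.keys().filter(|k| !other.contains_key(*k)).cloned().collect();
        missing.sort();
        missing
    }
}
```

In the real crate the catalog sources would come from `include_str!`, so the translations ship inside the binary; here they are passed in as string slices to keep the sketch self-contained.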
Broaden the public surface, reusing the publication/biosample query modules and
the established HTMX two-panel + i18n patterns.
- du-db: biosample::for_publication — paginated biosamples linked to a
publication (join pubs.publication_biosample), returning Page<Biosample>.
- du-domain: Display/label() for BiosampleSource.
- du-web references routes: /references (page, first list embedded inline),
/references/list (search + pagination fragment), /references/:id/biosamples
(per-publication report fragment, paginated). Clicking a publication loads its
samples into #reference-detail. 404 on missing publication.
- i18n: references/* keys added across en/es/fr; nav "References" link.
- Templates references/{page,list,biosamples}.html.
Verified live against the container PostGIS (list, search, es localization,
ancient/external sample reports with DOI link). Workspace 9/9 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the geographic map and exercises the PostGIS spatial path end-to-end (de-risks plan §11's PostGIS-in-Rust concern).
- du-db: biosample::geo_points - ST_X/ST_Y over the donor geocoord (geometry Point, 4326), joined to non-deleted biosamples.
- du-domain: biosample::GeoPoint (serde).
- du-web maps routes: /biosamples/map (page) and /biosamples/geo-data (GeoJSON FeatureCollection, [lon,lat] order). map_page is a full-page load (nav link hx-boost=false) so Leaflet initializes; base.html gains a head block for per-page assets.
- Vendored Leaflet 1.9.4 (css + js); assets/map.js plots circle markers (no marker-image assets) with popups and fits bounds. OSM tiles at runtime.
- i18n map.* keys + nav "Map" across en/es/fr.
Verified live: GeoJSON has 3 features with correct [lon,lat] coords and accession/source props; assets served; es localization. Workspace 9/9 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the public read surface and exercises the coverage-JSONB aggregation path (the meanDepth expression index from migration 0004).
- du-domain: coverage::CoverageBenchmark.
- du-db: coverage::benchmarks - aggregates genomics.alignment_metadata.coverage by lab + test type (avg meanDepth, avg percent_coverage_at_10x), joined through sequence_file -> sequence_library -> lab/test_type, with the test type's expected_min_depth for comparison.
- du-web: /coverage-benchmarks page; observed-vs-expected indicator.
- i18n coverage.* keys + nav "Coverage" across en/es/fr.
Verified live: Dante WGS 30x aggregates two libraries to 31.5x, Nebula WGS 100x to 102.5x, both flagged meeting expected depth; es localized. Workspace 9/9 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The authenticated/write half. Local credential login with signed-cookie sessions and role-gated curator tools; AT Protocol federation remains future work in du-atproto.
Auth:
- du-db auth: credential lookup (ident.user_login_info) + role loading.
- du-web auth: argon2 hashing with bcrypt-verify fallback for legacy hashes; Session in a signed cookie (APP_SECRET-derived key); MaybeUser/Curator extractors; /login (+ error re-render) and /logout; CookieManagerLayer.
- `decodingus hash-password <pw>` dev subcommand for seeding.
- Navbar reflects login state (login/logout/curator); user threaded through all full-page templates.
Curator (TreeCurator/Admin gated):
- du-db haplogroup: list_paginated, create, update, delete, has_current_edges.
- Dashboard + two-panel haplogroups (search + lineage filter, detail panel, create/edit forms, delete). Mutations return the panel plus HX-Trigger (hg-changed) so the list reloads: the server-driven write-flow using HxHeaders. Delete is blocked (with an inline warning) when tree edges exist.
- i18n auth.*/curator.*/hg.* + nav across en/es/fr.
Verified live: unauth redirect, bad/good login, signed session, dashboard, create/edit/delete with HX-Trigger, blocked delete. Workspace 11/11 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two more curator surfaces on the established two-panel HTMX write-flow.
Variants:
- du-db variant: create/update/delete + is_referenced (guards delete when the variant defines a current haplogroup association).
- Curator variant CRUD editing scalar fields + alias lists (common_names/rs_ids as comma-separated, stored into the aliases JSONB); coordinates preserved.
Genome regions:
- du-domain GenomeRegion; du-db genome_region list/get/create/update/delete.
- Curator region CRUD with JSON textareas for the coordinates/properties JSONB documents, parse-validated: invalid JSON re-renders the form with an error and no HX-Trigger (so the list does not reload on failure).
- Mutations fire distinct HX-Trigger events (variant-changed / region-changed); dashboard links to all three tools; i18n var.*/region.* across en/es/fr (the catalog-coverage test enforces es/fr parity).
Verified live (as the seeded curator): variant create stores aliases JSONB + trigger + delete; region create stores coordinates JSONB, invalid-JSON error path, list + delete. Workspace 11/11 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ication
One-time ETL binary. Production source is a self-managed Postgres on EC2: --legacy takes that DSN (typically sslmode=require via opened SG / SSH tunnel), --target the new DB. rustls TLS supports remote/SSL connections.
Design: preserve legacy primary keys (OVERRIDING SYSTEM VALUE) and sample_guid UUIDs so all foreign keys carry over 1:1 with no id-remapping. Transformers run in FK order, upsert idempotently (re-runnable/resumable), then identity sequences are advanced and a reconciliation pass compares legacy vs new counts.
Transformers: specimen_donor; unified biosample (legacy biosample + citizen_biosample + pgp_biosample -> core.biosample, deriving source, folding at_uri/at_cid into the atproto JSONB and PGP fields into source_attrs); variant (enum normalization, JSONB passthrough); haplogroup (+relationship +variant, Y/MT -> Y_DNA/MT_DNA); genomic_study; publication (+links, resolving legacy integer biosample ids to sample_guid across both link tables).
CLI: --legacy/--target/--verify. Reconcile prints a per-aggregate count table.
Verified without prod access: scripts/mock-legacy.sql seeds a legacy-shaped subset; the ETL into a fresh target reconciles all 9 aggregates, spot-checks confirm unification/JSONB/enum/geocoord/FK preservation, and a re-run is idempotent.
NOTE: transformer SELECTs encode the reconstructed legacy layout; validate against the live EC2 schema before the production run.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Status-oriented README for the Rust port: rationale, stack, workspace layout, schema redesign summary, what's implemented (public surface + auth/curator + ETL), getting started with Apple `container`, testing, ETL usage (EC2 source), deploy, and a roadmap checklist. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…missions/OAuth
Federation direction updated after reviewing current atproto specs: the custom "private firehose" is dropped in favor of the protocol's permissions/OAuth + notify-then-fetch model (private data deliberately bypasses the firehose; consumers fetch records from the PDS over scoped OAuth). Group-private data spec is still maturing upstream.
This lands the foundation needed under any model:
- did: DID + AT-URI parsing; did:key <-> Ed25519 pubkey (multibase + multicodec).
- signature: verify_did_key - Ed25519 verification against a self-certifying did:key (no network needed); tested with sign/verify/tamper/wrong-key.
- resolve: DID-document parsing (PDS endpoint, handle, signing did:key) + a Resolver for handle->DID (well-known) and DID->doc (PLC directory / did:web); parsing unit-tested via fixture, HTTP fetch isolated.
README roadmap updated for the pivot. Workspace 17/17 tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The federation wiring pieces (consumer side of permissions/OAuth). Live handshake needs the Edge team's PDS; everything up to the network exchange is implemented and unit-tested.
du-atproto oauth module (unit-tested, offline):
- PKCE S256 (verified vs RFC 7636 vector), ES256 JOSE sign + JWK + RFC 7638 thumbprint, DPoP proof JWTs, private_key_jwt client assertion.
- ClientMetadata (confidential web: private_key_jwt + ES256 + DPoP) and AuthServerMetadata + protected-resource discovery; PAR/authorize/token builders.
du-web wiring:
- OauthClient from env (OAUTH_BASE_URL/SCOPE/EC_KEY; disabled when unset).
- Serves /oauth/client-metadata.json and /oauth/jwks.json (public key only).
- /login/atproto: resolve handle->DID->PDS->authserver, PAR (DPoP, nonce retry), redirect; /oauth/callback: token exchange -> upsert user by DID -> session.
- du-db: upsert_user_by_did (find-or-create + atproto login_info).
- AppError::Upstream (502) for federation failures.
Verified live: metadata + JWKS serve correctly (no private material leaks); /login with a bogus handle fails gracefully (502). Full flow pending Edge PDS.
docs/atproto-oauth-findings.md enumerates the integration points to settle with the Edge team (client registration, hosting, scopes/permission sets, key lifecycle, DPoP nonce, identity resolution, notify-fetch).
Workspace 22/22 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
du-bio (pure Rust, replacing htsjdk):
- callable: BED interval merge + callable-loci summary (total bp, region count per contig) for mutation-rate / branch-age inputs.
- liftover: UCSC chain-file parse + cross-build position mapping (gaps -> None, reverse-strand targets handled), the algorithm htsjdk LiftOver provided.
- vcf: line-oriented variant reader (CHROM/POS/ID/REF/ALT) for the de-identified variant-ingest path; binary formats (BAM/CRAM) + full-spec VCF use noodles when the ingestion jobs need them.
du-jobs:
- tokio scheduler harness: named jobs (period + async closure), per-job interval loops with error isolation + run-on-start, graceful shutdown on Ctrl-C.
- main registers a DB heartbeat (verified live: variants=4 publications=2); real jobs (publication update/discovery, YBrowse ingest, variant export, match discovery) wired as du-external/ingestion land.
Tested: callable merge/summary, liftover positions/gaps/reverse-strand, VCF parse, scheduler run-once + paused-time periodic run.
README roadmap updated. Workspace 29/29 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
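The BED interval merge and callable-loci summary could be sketched roughly as below. This is not the du-bio implementation: `Interval`, `merge`, `Summary`, and `summarize` are hypothetical names, and merging bookended intervals (where one starts exactly where the previous ends) is an assumption.

```rust
use std::collections::BTreeMap;

/// A BED interval: 0-based start, half-open end.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Interval {
    pub contig: String,
    pub start: u64,
    pub end: u64,
}

/// Sort by (contig, start) and fold overlapping or bookended intervals together.
pub fn merge(mut intervals: Vec<Interval>) -> Vec<Interval> {
    intervals.sort_by(|a, b| a.contig.cmp(&b.contig).then(a.start.cmp(&b.start)));
    let mut merged: Vec<Interval> = Vec::new();
    for iv in intervals {
        if let Some(last) = merged.last_mut() {
            if last.contig == iv.contig && iv.start <= last.end {
                // Extend the open interval instead of starting a new one.
                last.end = last.end.max(iv.end);
                continue;
            }
        }
        merged.push(iv);
    }
    merged
}

/// Callable-loci summary: total base pairs and region count per contig.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Summary {
    pub total_bp: u64,
    pub regions_per_contig: BTreeMap<String, usize>,
}

/// Expects already-merged input, otherwise overlapping bases are counted twice.
pub fn summarize(merged: &[Interval]) -> Summary {
    let mut summary = Summary::default();
    for iv in merged {
        summary.total_bp += iv.end.saturating_sub(iv.start);
        *summary.regions_per_contig.entry(iv.contig.clone()).or_insert(0) += 1;
    }
    summary
}
```

Because BED ends are exclusive, an interval's length is simply `end - start`, which is why the summary needs no off-by-one correction.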
…eply
Navigator/Edge team replied (DUNavigator/documents/atmosphere/12-OAuth-Edge-Reply.md + 08/11). Integrating their feedback:
- du-atproto: add public-client request builders par_form_public / token_form_public (PKCE-only, no client_assertion) so the Navigator desktop app reuses the same PKCE/DPoP/resolution primitives over a public/native client. Tested.
- docs/atproto-edge-reply.md: our point-by-point reply (public-client done, will host Navigator client-metadata, AppView read scope = none for now, DPoP nonce, shared-crate extraction + haploid-caller decisions pending).
- README/framing correction: the standard relay/Jetstream ingest STAYS (reads are out of the OAuth permission spec); only the custom REST/Kafka relay is dropped. AppView re-scoped off the network mirror to two flows: (a) variant catalog via direct proposal submission, (b) on-demand coverage aggregation from public summary records. Roadmap updated; shared-crate extraction tracked.
- Stop tracking rust/.DS_Store; gitignore it.
Workspace 30/30 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the decodingus-side call on the Navigator asks:
- Shared crates (du-domain/du-atproto/du-bio) -> a dedicated `decodingus-shared` git repo; both repos git-dep on it. Extraction is a coordinated next step.
- Haploid variant caller stays Navigator-only; du-bio remains I/O + liftover + callable.
Updated the edge-reply doc and README roadmap.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
YBrowse publishes on GRCh38; ingestion now lifts each position to the other tracked builds so core.variant.coordinates carries all builds.
- du-domain: NewVariant (coordinate-bearing, no DB id) for the ingest path.
- du-bio::ybrowse: from_grch38_vcf - parse GRCh38 VCF records, lift each to the given target builds (GRCh37, hs1) via chain files, emit NewVariant with multi-build coordinates; first VCF ID = canonical name, rest = aliases; tracks unmapped lifts. Handles VCF 1-based <-> chain 0-based conversion.
- du-db: variant::upsert_by_name (ON CONFLICT canonical_name) - updates coordinates/aliases, preserves curator-owned naming_status.
- du-jobs: env-gated ybrowse-variant-ingest job (YBROWSE_VCF + chain paths) wiring du-bio parse/lift -> du-db upsert.
Verified end-to-end via the jobs binary: GRCh38 chrY:2200001 lifted to GRCh37 chrY:3200001, upserted with both builds + alias. Unit-tested lift offset, gap (unmapped), and multi-build coords. Workspace 31/31 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
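The 1-based VCF to 0-based chain conversion and the "first ID is canonical, the rest are aliases" rule can be illustrated with a forward-strand-only sketch. `Block`, `lift0`, `lift_vcf_pos`, and `split_ids` are hypothetical names, and the block coordinates used in any example are mock values, not a real GRCh38 -> GRCh37 chain.

```rust
/// One ungapped alignment block from a chain file, in 0-based coordinates.
/// Forward strand only; reverse-strand targets are out of scope for this sketch.
#[derive(Debug, Clone, Copy)]
pub struct Block {
    pub src_start: u64,
    pub dst_start: u64,
    pub size: u64,
}

/// Lift a 0-based source position; positions that fall in a gap map to None.
pub fn lift0(blocks: &[Block], pos0: u64) -> Option<u64> {
    blocks
        .iter()
        .find(|b| pos0 >= b.src_start && pos0 < b.src_start + b.size)
        .map(|b| b.dst_start + (pos0 - b.src_start))
}

/// Lift a 1-based VCF POS: shift to 0-based, lift, shift back.
pub fn lift_vcf_pos(blocks: &[Block], pos1: u64) -> Option<u64> {
    let pos0 = pos1.checked_sub(1)?; // POS 0 is not a valid 1-based coordinate
    lift0(blocks, pos0).map(|p| p + 1)
}

/// Split a VCF ID field (`;`-separated, `.` meaning missing) into
/// (canonical name, aliases): the first ID is canonical, the rest are aliases.
pub fn split_ids(id_field: &str) -> Option<(String, Vec<String>)> {
    let mut ids = id_field
        .split(';')
        .map(str::trim)
        .filter(|s| !s.is_empty() && *s != ".");
    let canonical = ids.next()?.to_string();
    Some((canonical, ids.map(str::to_string).collect()))
}
```

Doing the shift in exactly one place (`lift_vcf_pos`) keeps the block arithmetic purely 0-based, which is the usual way to avoid off-by-one errors at the format boundary.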
Per scope clarification: raw-read processing (BAM/CRAM) and variant calling are out of scope for decodingus (the AppView). Navigator (edge) does local calling from raw reads; the AppView only ingests/aggregates the resulting summaries and variant proposals.
- Remove the unused `noodles` workspace dependency (no crate used it; htslib/noodles aren't needed without BAM/CRAM).
- Reframe du-bio as coordinate math + text parsing (VCF ingest, BED callable loci, UCSC-chain liftover, YBrowse), not file I/O / htsjdk replacement.
- Update README (stack table, crate map, roadmap) and the plan §6 accordingly.
No code change; workspace 31/31 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the agreed shared-crate decision: the pure/IO-light crates now live in a sibling `decodingus-shared` repo (its own cargo workspace; builds+tests standalone) so the DecodingUs AppView and Navigator depend on one copy.
- Moved du-domain, du-atproto, du-bio out of rust/crates.
- Workspace members reduced to the AppView-specific crates (du-db, du-external, du-web, du-jobs, du-migrate); the three shared crates are pulled via path deps to ../../decodingus-shared/crates/* (git-dep form commented for post-push).
- README crate map + roadmap and the Dockerfile note updated for the split.
Verified: decodingus builds+tests against the sibling crates (9 tests here + 22 in decodingus-shared = same 31 total).
NB: the Docker build needs the path deps switched to git deps once decodingus-shared is pushed (sibling path deps are not in the rust/ build context).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Navigator submits variant/branch proposals to the AppView; curators review and name them in the web UI (the agreed catalog role: AppView keeps review+naming). The legacy manual sample-submission APIs are intentionally not ported (curators work in Navigator now).
- du-db::proposal: submit (pool by name+parent across submitters into tree.proposed_branch + evidence; distinct-submitter consensus via discovery_sample_guids; confidence scales with evidence), list/get, review (APPROVE/REJECT/DEFER -> status + tree.curator_action).
- du-web curation: POST /api/v1/curation/proposals (machine intake, X-API-Key gate for now -> OAuth bearer once the handshake is live) + the /curator/proposals review queue (two-panel HTMX, status filter, detail + review form, HX-Trigger refresh, gated to Curator). i18n prop.* + dashboard link.
Verified live: 3 submissions pool to one proposal (evidence_count=3, submitters=2 after dedup, parent resolved); wrong API key -> 403; curator approve -> ACCEPTED + curator_action recorded + HX-Trigger; review form gone once decided. Workspace 9/9 (decodingus) + 22 (shared) green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
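The pooling behaviour (submissions grouped by name + parent, evidence counted per submission, submitters deduplicated) can be sketched in memory as below. The real logic lives in SQL against tree.proposed_branch; `Submission`, `Pooled`, and `pool` are hypothetical names, and the confidence calculation is left out because the commit does not state its formula.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One incoming proposal; field names are illustrative only.
pub struct Submission {
    pub name: String,
    pub parent: String,
    pub submitter: String,
}

/// Aggregate for one (name, parent) pair.
#[derive(Debug, Default)]
pub struct Pooled {
    pub evidence_count: usize,          // every submission counts as evidence
    pub submitters: BTreeSet<String>,   // distinct submitters only
}

/// Pool submissions by (proposed name, parent) across submitters.
pub fn pool(subs: &[Submission]) -> BTreeMap<(String, String), Pooled> {
    let mut out: BTreeMap<(String, String), Pooled> = BTreeMap::new();
    for s in subs {
        let entry = out.entry((s.name.clone(), s.parent.clone())).or_default();
        entry.evidence_count += 1;
        entry.submitters.insert(s.submitter.clone());
    }
    out
}
```

Keeping evidence and distinct submitters as separate counts is what lets three submissions from two people read as "evidence_count=3, submitters=2".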
Approving a proposal records the decision; promotion turns it into real catalog entries.
- du-db::proposal::promote (one transaction): requires status ACCEPTED + a parent; creates the tree.haplogroup branch (name = proposed_name, lineage from parent, source 'discovery'), a current relationship edge under the parent, and core.variant get-or-create + tree.haplogroup_variant links for each defining variant in the evidence (UNNAMED -> NAMED on promote, GRCh38 coord from pos); sets status PROMOTED + records a PROMOTE curator_action. DbError::Conflict for precondition/uniqueness failures (wrong status, name taken, no parent).
- du-web: POST /curator/proposals/:id/promote (Curator), "Promote to catalog" button shown on ACCEPTED proposals; conflicts surface as a 422 message. i18n prop.promote (en/es/fr).
Verified live: promote-before-approve -> 422; approve+promote -> new Y_DNA branch R-FT900 under R with the FT900 variant (NAMED, GRCh38 pos) linked, status PROMOTED, APPROVE+PROMOTE actions logged; branch shows under R in the unified /ytree. Workspace 9/9 (decodingus) + 22 (shared) green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- du-external: OpenAlex client (work-by-DOI enrichment + search-based discovery, with abstract reconstruction from the inverted index) and ENA portal client (study lookup). JSON->domain parsing is pure and unit-tested with fixtures; the HTTP fetch is a thin reqwest wrapper.
- du-db::publication: dois() work-list, update_openalex() (COALESCE: nulls don't wipe), enabled_search_configs(), upsert_candidate() (ON CONFLICT openalex_id, preserves curator status).
- du-jobs: publication-update (enrich every DOI) + publication-discovery (run search configs -> candidates), rate-limited ~6.7 req/s, registered only when OPENALEX_MAILTO is set (polite pool).
Verified live against OpenAlex: enriching a real DOI populated cited_by_count (7082), open_access_status (hybrid), openalex_id, and a reconstructed abstract; fake DOIs correctly 404 -> missing. Workspace 13/13 (decodingus) + 22 (shared) green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
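OpenAlex ships abstracts as an inverted index (each word mapped to the positions where it occurs), so reconstruction means inverting that map back into word order. A minimal sketch, with `reconstruct_abstract` as a hypothetical name and the JSON already decoded into a `HashMap`:

```rust
use std::collections::HashMap;

/// Rebuild running text from a word -> positions inverted index.
pub fn reconstruct_abstract(index: &HashMap<String, Vec<usize>>) -> String {
    // Flatten to (position, word) pairs, one per occurrence.
    let mut placed: Vec<(usize, &str)> = index
        .iter()
        .flat_map(|(word, positions)| positions.iter().map(move |&p| (p, word.as_str())))
        .collect();
    // Sorting by position restores the original word order.
    placed.sort_by_key(|&(p, _)| p);
    placed
        .into_iter()
        .map(|(_, word)| word)
        .collect::<Vec<_>>()
        .join(" ")
}
```

Sorting the flattened pairs rather than pre-allocating by maximum position keeps the function tolerant of sparse or missing positions in the index.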
Transactional email + secret retrieval, with the AWS SDK behind an optional
`aws` feature so the default build (and CI) stays lean.
- email::Mailer — Logging (default; logs instead of sending) and Ses (feature
`aws`, Amazon SES v2). Plain-text send.
- secrets::{SecretSource, CachedSecrets} — Env source (SECRET_<NAME>, default) or
AWS Secrets Manager (feature `aws`), wrapped in a 1-hour TTL cache (mirrors the
legacy CachedSecretsManagerService).
- ExternalError::Aws.
Default build: logging mailer + env secrets + TTL cache, unit-tested.
`--features aws` compiles against aws-config 1.8 / sdk-sesv2 1.121 /
secretsmanager 1.106. Consumers wire later. Workspace 17/17 (decodingus) +
22 (shared) green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
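A TTL cache wrapped around a pluggable secret source might be shaped like the sketch below. `SecretSource` and `CachedSecrets` are named in the commit, but the trait method, the struct fields, and the upper-casing of the name into `SECRET_<NAME>` are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Where secrets come from (env vars by default, AWS behind a feature flag).
/// The `fetch` signature is a guess, not the du-external API.
pub trait SecretSource {
    fn fetch(&self, name: &str) -> Option<String>;
}

/// Env-backed source: `api_key` is read from `SECRET_API_KEY` (assumed mapping).
pub struct EnvSource;

impl SecretSource for EnvSource {
    fn fetch(&self, name: &str) -> Option<String> {
        std::env::var(format!("SECRET_{}", name.to_uppercase())).ok()
    }
}

/// Caches fetched values and refetches once an entry is older than `ttl`.
pub struct CachedSecrets<S> {
    source: S,
    ttl: Duration,
    cache: HashMap<String, (Instant, String)>,
}

impl<S: SecretSource> CachedSecrets<S> {
    pub fn new(source: S, ttl: Duration) -> Self {
        Self { source, ttl, cache: HashMap::new() }
    }

    pub fn get(&mut self, name: &str) -> Option<String> {
        if let Some((fetched_at, value)) = self.cache.get(name) {
            if fetched_at.elapsed() < self.ttl {
                return Some(value.clone()); // still fresh: no source call
            }
        }
        // Missing or expired: go back to the source and restamp the entry.
        let value = self.source.fetch(name)?;
        self.cache.insert(name.to_string(), (Instant::now(), value.clone()));
        Some(value)
    }
}
```

With the 1-hour TTL from the commit this would be constructed as `CachedSecrets::new(EnvSource, Duration::from_secs(3600))`. A failed fetch is not cached here, so a secret that appears later is picked up on the next call.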
…chema
Reworked the du-migrate transformers and reconciliation to match the real
production schema (db.schema) rather than the assumed shapes:
- variant: positional public.variant + variant_alias -> core.variant with
JSONB coordinates ({build:{contig,position,ref,alt}}) and assembled
aliases (common_names/rs_ids/sources); canonical_name/naming_status derived.
- biosample: unify public.biosample + citizen_biosample + pgp_biosample into
core.biosample; fold biosample_original_haplogroup +
citizen_biosample_original_haplogroup into original_haplogroups JSONB;
citizen at_uri/at_cid -> atproto JSONB; source_platform/y/mt -> source_attrs.
- haplogroup: age bounds (formed/tmrca lower/upper) + age_estimate_source +
description -> provenance JSONB (nulls stripped); cast valid_from/until from
TIMESTAMP to timestamptz so rows decode (caught only with data, not by the
schema-only pass).
- genomic_studies: version VARCHAR -> TEXT column; details TEXT -> JSONB;
publication_ena_study -> pubs.publication_study.
- publication_biosample: collapse both std + citizen link tables onto sample_guid.
The ETL binary now applies target migrations itself (idempotent) before
transforming. reconcile.rs uses the real qualified table names.
Validated: schema-only against db.schema (0 column errors) and end-to-end
against a rewritten current-schema mock with seed data (all 10 aggregates
reconcile; JSONB/enum/link shapes spot-checked).
Note: the 35MB dump.sql predates several legacy migrations (citizen_biosample
at_uri rename, tree schema, *_result columns) so it can't validate the
current-schema ETL end-to-end; a current-schema dump or a read-only live-EC2
rehearsal is needed for full real-data validation before cutover.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Migrate the identity/auth group from the production auth + public.users + curator schemas into the redesigned ident schema:
- users (public.users -> ident.users), RBAC (roles/permissions/role_permissions/user_roles), AT Protocol identity (user_login_info/user_oauth2_info/user_pds_info), cookie_consents, and the atproto metadata caches.
- curator.audit_log -> new ident.audit_log (migration 0010); entity_id widened int -> bigint, old/new snapshots kept as JSONB.
Details:
- UUID PKs carry over 1:1 (no OVERRIDING SYSTEM VALUE; no sequence fixup).
- Pre-seeded base roles (Admin/Curator/TreeCurator) are relocated onto the legacy role UUIDs via ON CONFLICT (name) DO UPDATE SET id=... so user_roles and role_permissions FKs resolve to the migrated rows.
- password_hash left NULL: production auth is AT Protocol OAuth-only (there is no legacy password table).
- All legacy auth timestamps are `timestamp without time zone`; cast to timestamptz in the SELECTs so they decode into DateTime<Utc>.
- Dropped-in-redesign columns (authz client_id_metadata_document_supported, client_metadata client_uri) are simply not selected.
ETL binary runs the ident group first (users before any FK). reconcile.rs adds 11 ident checks (roles may legitimately show target>=legacy from base seeds). mock-legacy.sql extended with the auth/curator schema + seed data; full run reconciles all 21 aggregates and RBAC resolves end-to-end (user -> role -> permission), audit JSONB round-trips, OAuth/PDS chain intact.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Migrate the sequencing/coverage/pangenome group into the redesigned genomics schema. This completes the production ETL surface (ibd/fed/social/billing are not yet in production).
Tables: genbank_contig, sequencing_lab, sequencer_instrument, test_type_definition, pangenome_graph/path, canonical_pangenome_variant, sequence_library, sequence_file, alignment_metadata, pangenome_alignment_metadata, reported_variant_pangenome, genotype_data.
De-sprawl / transforms (validated against db.schema):
- alignment_metadata: fold the 1:1 alignment_coverage child + inline Picard metrics (mean_coverage/pct_10x/...) + analysis provenance into one coverage JSONB. meanDepth/percent_coverage_at_10x (the keys du-db/coverage.rs and the public coverage page aggregate on) are always populated when a source exists; COALESCE prefers the samtools-style depth child over the Picard inline value.
- pangenome_alignment_metadata: same fold into metadata JSONB (+ path/node/region ids); reported_variant_pangenome: provenance/status/positions folded into haplotype_information JSONB.
- sequence_library: legacy lab name resolved to the migrated lab_id; at_uri/cid -> atproto JSONB; run_date timestamp -> date.
- sequence_file checksums/http_locations/atp_location JSONB already in the redesigned shape, carried verbatim.
- sequencer_instrument: redesign makes instrument_id UNIQUE -> DISTINCT ON dedup (drops the per-lab tie); genotype_data: skip soft-deleted rows, fold chip_version/build_version/source_file_hash/atproto into metrics JSONB.
- All legacy timestamps cast ::timestamptz. PKs preserved via OVERRIDING SYSTEM VALUE; sequences fixed up post-load.
Skipped (no production source; Navigator populates going forward): instrument_observation, instrument_association_proposal, coverage_expectation_profile, biosample_callable_loci. pangenome_node is dropped (folded into node-id arrays).
reconcile.rs adds 13 genomics checks (sequencer_instrument compares count(DISTINCT instrument_id); genotype_data filters deleted). mock-legacy.sql extended with the full genomics schema + seed data; ETL reconciles all 34 aggregates and the public coverage benchmark query resolves end-to-end against the migrated JSONB.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Build the read-only /api/v1 surface (the Tapir replacement). 13 endpoints,
clean DTOs decoupled from the internal domain types, described with utoipa;
OpenAPI 3 document at /api/openapi.json and Swagger UI at /api.
Endpoints:
- GET /api/v1/y-tree, /api/v1/mt-tree (?rootHaplogroup=) — nested tree
- GET /api/v1/coverage/benchmarks
- GET /api/v1/references/details (paginated), /{publicationId}/biosamples
- GET /api/v1/biosample/studies
- GET /api/v1/variants (paginated), /{variantId},
/api/v1/haplogroups/{name}/variants
- GET /api/v1/variants/export (live CSV), /export/metadata
- GET /api/v1/genome-regions (builds), /{build}
du-db additions backing the new endpoints:
- haplogroup::subtree — recursive CTE over current edges, assembled into a
nested tree in-process (with a depth guard against cyclic tree-merge data)
- variant::for_haplogroup_name, variant::export_all, variant::count
- genome_region::distinct_builds, genome_region::for_build (jsonb_exists)
- new du-db::study module: studies with linked samples (study -> publication
-> biosample, aggregated as JSONB)
Notes:
- DTOs surface JSONB (coordinates/source_attrs/provenance) as untyped objects
and the hot alias fields (common_names/rs_ids) as typed arrays.
- utoipa kept out of the shared du-domain crate (Navigator/edge consumers);
DTOs + From impls live in du-web.
- HaplogroupNodeDto.children uses #[schema(no_recursion)] to stop utoipa's
schema walk from overflowing on the self-reference.
- The /manage/*, PDS, and IBD API groups are intentionally omitted (curator/
federation surfaces tied to subsystems not yet built).
Smoke-tested live against the ETL mock DB: all 13 endpoints return correct
JSON/CSV, 404s resolve, openapi.json lists 13 paths + 12 schemas, Swagger UI
serves 200.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
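The in-process assembly step of `haplogroup::subtree` (flat rows from the recursive CTE turned into a nested tree, with a depth guard) could look roughly like this. `Row`, `Node`, `MAX_DEPTH`, and `build_tree` are hypothetical, and the limit of 64 is an arbitrary value chosen for the sketch.

```rust
use std::collections::HashMap;

/// One flat row as a recursive CTE might return it.
#[derive(Debug, Clone)]
pub struct Row {
    pub id: i64,
    pub parent_id: Option<i64>,
    pub name: String,
}

/// Nested node, the shape a tree DTO would serialize from.
#[derive(Debug, PartialEq)]
pub struct Node {
    pub name: String,
    pub children: Vec<Node>,
}

/// Assumed limit; it only has to be deeper than any real lineage.
const MAX_DEPTH: usize = 64;

pub fn build_tree(rows: &[Row], root_id: i64) -> Option<Node> {
    let by_id: HashMap<i64, &Row> = rows.iter().map(|r| (r.id, r)).collect();
    // parent id -> child ids, preserving row order.
    let mut kids: HashMap<i64, Vec<i64>> = HashMap::new();
    for r in rows {
        if let Some(p) = r.parent_id {
            kids.entry(p).or_default().push(r.id);
        }
    }

    fn build(
        id: i64,
        depth: usize,
        by_id: &HashMap<i64, &Row>,
        kids: &HashMap<i64, Vec<i64>>,
    ) -> Option<Node> {
        // Depth guard: cyclic edge data stops here instead of recursing forever.
        if depth > MAX_DEPTH {
            return None;
        }
        let row = by_id.get(&id)?;
        let children: Vec<Node> = kids
            .get(&id)
            .map(|c| c.iter().filter_map(|&k| build(k, depth + 1, by_id, kids)).collect())
            .unwrap_or_default();
        Some(Node { name: row.name.clone(), children })
    }

    build(root_id, 0, &by_id, &kids)
}
```

A depth cap is a blunt guard: it bounds the recursion without detecting which edge closes the cycle, so a visited-set would be the alternative if the offending edge needs to be reported.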
Build the change-set versioning subsystem (the TreeVersioning half; the merge
algorithm that produces change-sets lands separately).
du-db::change_set — lifecycle + apply engine:
- Lifecycle: DRAFT -> READY_FOR_REVIEW -> UNDER_REVIEW -> APPLIED, DISCARDED
from any non-applied state. start_review/discard/review_change/approve_all
gate on status.
- apply(): in one transaction, writes APPROVED tree_changes to the production
tree via the temporal edge model and marks the set APPLIED. Per type:
CREATE -> insert node (+ edge under an existing parent, + variant links)
UPDATE -> COALESCE-update node metadata
DELETE -> expire node (valid_until) + close current edges/variant links
REPARENT -> close current parent edge, open a new one
VARIANT_EDIT -> add (insert) / remove (close) current variant links
- diff(): ADDED/REMOVED/MODIFIED/REPARENTED entries + summary from the set's
non-rejected changes. list/get/comments/add_change round out the module.
Temporal correctness: the node itself is temporal (valid_from/valid_until), so
DELETE expires it and the tree-navigation queries (roots/children/subtree) now
exclude expired nodes — a deleted node vanishes instead of resurfacing as a
stray root.
du-web: curator-gated JSON management API at /api/v1/manage/change-sets/* (list,
create, detail, add-change, start-review, apply, discard, comments, approve-all,
per-change review, diff). Gated by the session Curator extractor (legacy used an
X-API-Key); not part of the public OpenAPI doc. DbError::Conflict now maps to
422 instead of 500.
Tested: an integration test drives the full lifecycle on a live DB — seeds
ROOT->{A,C,D}, builds a set (create B under ROOT w/ variant, reparent A under C,
add a variant to C, update A, delete D, plus one rejected change), applies it,
and asserts the temporal tree: B created, A moved off ROOT and under C, D gone
from navigation, variant links current, the rejected change absent, exactly one
current parent edge for A, diff counts, and re-apply rejected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
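The status gating described above can be read as a small state machine. The sketch below is an illustration under assumptions: the `Action` names (in particular `SubmitForReview`) and the `Conflict` type are hypothetical, and treating a second discard of an already DISCARDED set as a conflict is a choice made for the example, not something the commit states.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Draft,
    ReadyForReview,
    UnderReview,
    Applied,
    Discarded,
}

/// Hypothetical action names standing in for the module's lifecycle calls.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    SubmitForReview,
    StartReview,
    Apply,
    Discard,
}

/// Returned when an action is not allowed from the current status.
#[derive(Debug, PartialEq, Eq)]
pub struct Conflict {
    pub from: Status,
    pub action: Action,
}

/// DRAFT -> READY_FOR_REVIEW -> UNDER_REVIEW -> APPLIED, with DISCARDED
/// reachable from any state that has not been applied.
pub fn transition(from: Status, action: Action) -> Result<Status, Conflict> {
    use Status::*;
    match (from, action) {
        (Draft, Action::SubmitForReview) => Ok(ReadyForReview),
        (ReadyForReview, Action::StartReview) => Ok(UnderReview),
        (UnderReview, Action::Apply) => Ok(Applied),
        (Draft | ReadyForReview | UnderReview, Action::Discard) => Ok(Discarded),
        // Everything else, including re-applying an APPLIED set, is a conflict.
        _ => Err(Conflict { from, action }),
    }
}
```

Keeping the transition table in one pure function makes the "re-apply rejected" assertion from the integration test a one-line check, independent of the database.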
The Identify-Match-Graft re-implementation landed in decodingus-shared (du-domain::merge) with curated fixtures. Remaining: materialize a MergePlan into a change-set + WIP staging, the WIP apply path, and merge/preview endpoints. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… endpoints
Connect the pure du-domain merge algorithm to the versioning engine so a merge produces a reviewable, applicable change set.
- du-db::merge::materialize: turns a MergePlan into a READY_FOR_REVIEW change set. Each MergeOp becomes a tree_change; new-node placement uses the placeholder mechanism (a CREATE carries its negative placeholder; attaching ops carry *_placeholder refs). Variant *names* from the plan are resolved to core.variant ids (get-or-create as UNNAMED). MatchMetadata is informational and omitted from the set.
- change_set apply: now threads a placeholder->real-id map through the apply transaction, so CREATE/REPARENT can reference nodes created earlier in the same set (new-under-new chains, contraction-under-new-intermediate). An unresolved placeholder (its CREATE was rejected) fails the apply with a clear conflict instead of corrupting the tree.
- haplogroup::existing_tree: loads the current production tree (current nodes/edges/variant links) as a nested du_domain::merge::ExistingNode forest, which is the algorithm's "existing tree" input.
- du-web: curator-gated POST /api/v1/manage/haplogroups/merge (run + materialize) and /merge/preview (dry-run: return plan + ambiguities, no writes).
End-to-end test (existing_tree -> merge -> materialize -> review -> apply):
- new-subtree chain: R extended by R1b -> L21 via a placeholder chain; both created under the right parents with their variants linked.
- node contraction: existing coarse RC(M343,L23,L51) split by source R1b(M343); new R1b inserted between R and RC, RC reparented under it, M343 downflowed off RC (RC keeps L23/L51), R1b carries M343.
This completes tree versioning end-to-end (the WIP shadow tables remain for a future richer curator-staging UI; merge output uses the simpler placeholder path through the tested apply engine).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
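The placeholder threading can be illustrated with an in-memory stand-in for the apply transaction. Everything here is a hypothetical sketch: `NodeRef`, `Change`, `Tree`, and `apply` are invented for the example, and the "transaction" is simulated by mutating a clone that is committed only on success.

```rust
use std::collections::HashMap;

/// A reference to either a production node or a node created earlier in the set.
pub enum NodeRef {
    Existing(i64),
    Placeholder(i64), // negative id carried by the CREATE that introduces it
}

pub enum Change {
    Create { placeholder: i64, name: String, parent: NodeRef },
    Reparent { node: NodeRef, new_parent: NodeRef },
}

#[derive(Debug, Clone, Default)]
pub struct Tree {
    pub next_id: i64,
    pub names: HashMap<i64, String>,
    pub parent_of: HashMap<i64, i64>,
}

fn resolve(r: &NodeRef, real_ids: &HashMap<i64, i64>) -> Result<i64, String> {
    match r {
        NodeRef::Existing(id) => Ok(*id),
        NodeRef::Placeholder(p) => real_ids
            .get(p)
            .copied()
            .ok_or_else(|| format!("conflict: unresolved placeholder {p}")),
    }
}

/// Apply changes in order, threading placeholder -> real id through the run.
/// On any error the original tree is left untouched.
pub fn apply(tree: &mut Tree, changes: &[Change]) -> Result<HashMap<i64, i64>, String> {
    let mut staged = tree.clone();
    let mut real_ids = HashMap::new();
    for change in changes {
        match change {
            Change::Create { placeholder, name, parent } => {
                let parent_id = resolve(parent, &real_ids)?;
                let id = staged.next_id;
                staged.next_id += 1;
                staged.names.insert(id, name.clone());
                staged.parent_of.insert(id, parent_id);
                real_ids.insert(*placeholder, id);
            }
            Change::Reparent { node, new_parent } => {
                let node_id = resolve(node, &real_ids)?;
                let parent_id = resolve(new_parent, &real_ids)?;
                staged.parent_of.insert(node_id, parent_id);
            }
        }
    }
    *tree = staged; // commit
    Ok(real_ids)
}
```

The node-contraction case from the end-to-end test maps onto this directly: a `Create` for R1b under R, followed by a `Reparent` of the existing RC onto R1b's placeholder.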
The roadmap mixes AppView and Navigator concerns. Clarify: the AppView only cares that Y/mt calls are reliable enough to build the shared genealogy components — reliability = coverage conformance (done) + the cross-technology consensus (fed.haplogroup_reconciliation, the remaining AppView piece). The per-test-type taxonomy/tracking, chip parsing, marker-coverage and accuracy machinery are Navigator's (the Edge tracks by test); IBD is the D1/D3 track. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
fed.haplogroup_reconciliation (a donor's call reconciled across all its sequencing technologies: consensus_haplogroup + confidence + snp_concordance + run_count) was mirrored but read by nothing, and the report's "FedConsensus" was actually a single fed.biosample call, not the consensus. - biosample::report_by_guid reads the reconciliation via the citizen's repo DID (reconciliation.did = core.biosample.atproto->>'repo_did' + dna_type, best by run_count/time_us) — no schema change. Call precedence becomes Reconciled -> FedConsensus -> Original; HaplogroupCall + HaplogroupPathwayDto gain confidence/run_count/snp_concordance/compatibility_level + a Reconciled origin. The report card shows "consensus . N runs . confidence . concordance" (i18n en/es/fr). Test: a Y reconciliation outranks the single fed.biosample call and carries its reliability; mt with no reconciliation falls back. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In the AppView the cross-technology consensus drives tree evolution, never individual runs. The discovery engine now factors each contributor's fed.haplogroup_reconciliation reliability (via the citizen's repo DID).
- Exclude: a contributor whose consensus confidence is below min_consensus_confidence (0.5) or whose reconciliation is INCOMPATIBLE is dropped from pooling. Un-reconciled samples are kept (un-gated), treated as neutral reliability (1.0) so the unknown isn't penalized.
- Down-weight: a w_reliability term (cluster mean consensus reliability) folds into the proposal confidence blend, so branches built on shaky calls score lower and need more support to reach READY_FOR_REVIEW.
- mig 0031 merges min_consensus_confidence + w_reliability (weights renormalized to 0.35/0.2/0.25/0.2) into the 0029 discovery_config seed.
Closes the AppView's multi-test-type ask (coverage conformance + consensus).
Test: low-confidence contributor excluded (count drops); a modest-but-kept cohort is pulled below READY.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
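The exclude and down-weight rules can be sketched as two small functions. This is an illustration under assumptions: the `Reconciliation` enum and function names are hypothetical, and the commit does not say which of the four renormalized weights belongs to which term, so `blend` takes terms and weights positionally and the caller decides the pairing.

```rust
/// A contributor's reconciliation as the gate sees it (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Reconciliation {
    Unreconciled,
    Reconciled { confidence: f64 },
    Incompatible,
}

/// From the commit: contributors below this consensus confidence are dropped.
pub const MIN_CONSENSUS_CONFIDENCE: f64 = 0.5;

/// Some(reliability) if the contributor is kept for pooling, None if excluded.
pub fn reliability(r: Reconciliation) -> Option<f64> {
    match r {
        // Unknown is not penalized: neutral reliability.
        Reconciliation::Unreconciled => Some(1.0),
        Reconciliation::Incompatible => None,
        Reconciliation::Reconciled { confidence }
            if confidence < MIN_CONSENSUS_CONFIDENCE => None,
        Reconciliation::Reconciled { confidence } => Some(confidence),
    }
}

/// Number of kept contributors and their mean reliability (0.0 if none kept).
pub fn pool(contributors: &[Reconciliation]) -> (usize, f64) {
    let kept: Vec<f64> = contributors.iter().filter_map(|r| reliability(*r)).collect();
    if kept.is_empty() {
        return (0, 0.0);
    }
    let mean = kept.iter().sum::<f64>() / kept.len() as f64;
    (kept.len(), mean)
}

/// Weighted blend of four confidence terms; weights are assumed to sum to 1.
pub fn blend(terms: [f64; 4], weights: [f64; 4]) -> f64 {
    terms.iter().zip(weights.iter()).map(|(t, w)| t * w).sum()
}
```

Because the reliability term enters the blend linearly, a cluster with mean reliability below 1.0 scores strictly lower than the same cluster built on fully reliable calls, which is the "needs more support" effect the commit describes.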
Fold scope-control into the candidate-mining design so the AppView never materializes an N×N pair list and never hands a Navigator client "everyone".
- §3.0 (new): block, don't pair. Ancestry blocking (super-population + PCA-coordinate grid/LSH on fed.population_breakdown) before scoring; match-graph 2-hop expansion as the steady state; bounded top-K per sample.
- §3b: the overlap gated_pairs set is the ancestry block (super-pop + PCA cell + haplogroup bucket), scored within-block + persisted incrementally, never the full N².
- §3c: matches-of-matches is the primary generator (Leeds/AutoCluster shared-match principle), with the endogamy caveat (cap + down-weight).
- §3d (new): cold start = query-vs-panel (RaPID-Query-class, Edge-side), not panel-vs-panel; the AppView supplies the panel subset.
- §3e (new): research backing + the detection-vs-selection split (PBWT/RaPID is the Edge's job; the AppView holds no genotypes).
- §13: flag the D1-independent first slice (block + graph expansion -> ranked match_suggestion is buildable now, ahead of D1).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t slice)
The D1-independent first slice of D3. The AppView coordinates IBD by proposing introduction candidates from anonymized fed.* aggregates: no genotypes, no exchange channel. The load-bearing rule (D3 §3.0): never materialize N×N, never hand a client "everyone".
du_db::ibd::recompute_suggestions (advisory-locked, declarative; mirrors the sequencer/discovery engines). Three signals -> ibd.match_suggestion:
- population overlap (Σ min over fed.population_breakdown.components), computed ONLY within ancestry blocks (dominant super-population × a z-scored PCA grid cell, scale-free); caches into ibd.population_overlap_score.
- shared terminal Y/mt consensus_haplogroup (fed.haplogroup_reconciliation via the repo_did bridge), score = inverse cohort frequency (rarer = deeper).
- shared-match: 2-hop over ibd_discovery_index (in-common-with/Leeds signal; dormant until the graph has edges).
Combine per pair (weighted), rank per target, cap top-K (the no-N×N guarantee), write both directions; declarative recompute preserves DISMISSED/CONVERTED. suggestions_for reader.
Job: run-once ibd-discovery-recompute + daily. Engine-only, no public API (candidate pairs gate on the D1 consent flow).
Test: a cross-continental pair is blocked; within-block overlap + shared haplogroup are suggested; idempotent; dismissed pair not re-suggested.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
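Two of the pieces above are plain arithmetic and can be sketched without a database: the population-overlap score (the sum, over shared components, of the smaller fraction) and the per-target top-K cap. The function names and the `HashMap<String, f64>` representation of a population breakdown are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Overlap of two population breakdowns: sum over components of min(a, b).
/// Components missing from either side contribute 0.
pub fn population_overlap(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    a.iter()
        .map(|(component, fa)| b.get(component).map_or(0.0, |fb| fa.min(*fb)))
        .sum()
}

/// Keep only the k highest-scoring candidates for one target.
/// This cap is what bounds the output to at most k rows per sample.
pub fn top_k(mut candidates: Vec<(String, f64)>, k: usize) -> Vec<(String, f64)> {
    candidates.sort_by(|x, y| y.1.total_cmp(&x.1)); // descending by score
    candidates.truncate(k);
    candidates
}
```

For two breakdowns whose fractions each sum to 1, the overlap lies between 0 (disjoint ancestry) and 1 (identical breakdowns), so it can be combined with the other signals without rescaling.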
The AppView side of D1, the shared substrate gating the Match (D3) and Platform (D2/D4/D5) tracks. The AppView is a PII-free broker: it records consent, mirrors published X25519 keys, gates dual-consent into a session, and blind-relays opaque ciphertext. It never sees plaintext or session keys, so it needs no X25519/AEAD crypto (that's Navigator's du-exchange crate).
Auth is Ed25519 DID signatures, not OAuth/cookie: every Edge submission signs a canonical message (du_db::exchange::messages, a cross-repo contract) with its DID identity key; the broker verifies via du_atproto::verify_did_key (did:key direct, did:plc/web resolved). So D1 does not wait on the OAuth joint test.
- mig 0032: exchange.* schema (request/consent/session/relay_envelope/publickey); the unused ibd.match_request/match_consent (mig 0007) fold in and are dropped (the candidate engine's match_suggestion/ibd_discovery_index remain).
- du_db::exchange: publish/fetch key, create_request, record_consent with the dual-consent gate (both affirmative -> CONSENTED + ESTABLISHING session, 7-day TTL; false -> DECLINED), pending_for (exchange-ready), blind relay post/pull/ack (participant-gated, recipient-only ack, delete-on-ack), expire.
- du-web /api/v1/exchange/* (signature-verified, not in public OpenAPI): key, request, consent, pending (signed poll, replay window), relay post (signed over the blob SHA-256, 1 MiB cap), relay/pull, relay/ack.
- du-jobs exchange-expire run-once + hourly (TTL sweep).
Tests: dual-consent -> session, decline, idempotency, relay round-trip, participant gate, TTL cascade; du-web did:key-signed request verifies and a tampered signature -> 403.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
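The dual-consent gate can be sketched as a small in-memory model. The types and method names below are hypothetical, timestamps are plain Unix seconds, and treating a request as frozen once it leaves PENDING is an assumption made for the example.

```rust
/// 7-day session TTL, as stated in the commit.
pub const SESSION_TTL_SECS: u64 = 7 * 24 * 60 * 60;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestStatus {
    Pending,
    Consented,
    Declined,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Party {
    Initiator,
    Recipient,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Session {
    pub state: &'static str,
    pub expires_at: u64,
}

#[derive(Debug)]
pub struct Request {
    pub initiator_consent: Option<bool>,
    pub recipient_consent: Option<bool>,
    pub status: RequestStatus,
    pub session: Option<Session>,
}

impl Request {
    pub fn new() -> Self {
        Request {
            initiator_consent: None,
            recipient_consent: None,
            status: RequestStatus::Pending,
            session: None,
        }
    }

    /// Record one party's answer. A single `false` declines the request;
    /// only two affirmatives open an ESTABLISHING session.
    pub fn record_consent(&mut self, party: Party, consent: bool, now: u64) -> RequestStatus {
        if self.status != RequestStatus::Pending {
            return self.status; // assumed: terminal states ignore further input
        }
        match party {
            Party::Initiator => self.initiator_consent = Some(consent),
            Party::Recipient => self.recipient_consent = Some(consent),
        }
        if !consent {
            self.status = RequestStatus::Declined;
        } else if self.initiator_consent == Some(true) && self.recipient_consent == Some(true) {
            self.status = RequestStatus::Consented;
            self.session = Some(Session {
                state: "ESTABLISHING",
                expires_at: now + SESSION_TTL_SECS,
            });
        }
        self.status
    }
}
```

Repeating the same affirmative answer from one party leaves the request PENDING, which mirrors the idempotency case listed in the tests.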
…nodes
The AppView side of D2: a vendor-neutral pseudonymous "person" node co-admins attach project memberships + merge-links to, without the AppView ever learning a name, kit number, or hash of one. Identity resolution stays Edge-to-Edge over D1 (id-list exchange) / D3 (genetic). The doc's AppView-side kit# hashing is rejected as brute-forceable; this stores no identifiers at all.
- mig 0033 research.*: research_subject (UUID + custody_did + retired_into tombstone), subject_membership -> social.group_project, subject_link audit, sparse subject_biosample. Every column pseudonymous (UUID/DID).
- du_db::research: register_in_project (mint or attach an id-exchange-agreed id, idempotent membership); merge_subjects = TOMBSTONE not delete (record the link, repoint retire's memberships/biosamples to keep, set retired_into so a local holder of the retired pseudonym still resolves it); set_custody (member-claim flip); link_biosample; authz readers (project_owner, is_steward_of, is_project_participant). Canonical signed messages.
- crate::sig::verify_signed extracted from routes/exchange.rs (shared D1/D2 Ed25519 DID-signature auth; did:key direct, did:plc/web resolved).
- du-web /api/v1/research/* (not in public OpenAPI): subject (register), merge, custody, subjects (signed poll). Each signature-authenticated AND authorized from existing data: register -> project owner; merge -> steward of both; custody -> subject's steward; read -> project participant. Extends to project-admins under D5 (no member table needed now).
Tests: du-db register/tombstone-merge/custody/authz + self-merge reject; du-web owner-gated signed register (200), non-owner valid sig -> 403, tampered sig -> 403.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The group-project ACL: the collaborator team (DIDs + roles) is the consent/
scope boundary that gates the stack. Reuses the existing social.group_project
(mig 0009) as the project (no duplicate research.project); owner_did is the
founding ADMIN.
- mig 0034: research.project_member (project_id -> social.group_project,
member_did, role, permissions[], appointed_by, joined_at, left_at).
- du_db::research ACL: Role {ADMIN/CO_ADMIN/MODERATOR/CURATOR} + Capability +
Role::allows (the D5 §4 map); role_of (owner_did => ADMIN, else live
project_member), is_team_member, can, add_member/revoke_member(left_at)/
members_of; canonical signed messages.
- Wired in: D2 register is ManageSubjects-gated (ADMIN/CO_ADMIN; owner still
passes), subjects read is team-gated; D1 project:<id>-scoped request +
consent require the actor be a live team member (exchange::request_meta +
project_scope_id). Non-project scopes unaffected.
- Team endpoints /api/v1/research/project/{member, member/revoke, members} —
signed (crate::sig) + ManageRoles-gated (members list team-gated).
Tests: du-db role_of/add/revoke/capability-map; du-web admin-gated add-member
(non-admin -> 403) + project-scoped exchange request requires team membership.
Forward-only capabilities (WriteAssertions/ResolveDispute/PromoteToCatalog) are
defined for the cross-repo contract, enforced when D4 lands.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
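The role resolution rule (owner_did is ADMIN, otherwise the live project_member row decides) can be sketched in memory. The struct shapes are hypothetical, and only the ManageSubjects mapping (ADMIN/CO_ADMIN) is taken from the commit; the rest of the D5 §4 capability map is not reproduced here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    CoAdmin,
    Moderator,
    Curator,
}

/// One research.project_member row, reduced to what role_of needs.
pub struct Member {
    pub did: String,
    pub role: Role,
    pub left_at: Option<u64>, // Some(_) means the membership was revoked
}

pub struct Project {
    pub owner_did: String,
    pub members: Vec<Member>,
}

/// The owner is always ADMIN; anyone else needs a live (not revoked) row.
pub fn role_of(project: &Project, did: &str) -> Option<Role> {
    if project.owner_did == did {
        return Some(Role::Admin);
    }
    project
        .members
        .iter()
        .find(|m| m.did == did && m.left_at.is_none())
        .map(|m| m.role)
}

/// ManageSubjects is granted to ADMIN and CO_ADMIN (per the commit).
pub fn can_manage_subjects(role: Role) -> bool {
    matches!(role, Role::Admin | Role::CoAdmin)
}
```

Because revocation only sets `left_at`, a revoked member resolves to no role at all rather than to a lower one, so every gate downstream fails closed.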
…t_view
The last Platform-track piece. Co-admin research is modeled as attributed,
append-only, scoped assertions over a pseudonymous research_subject (D2), not
direct row mutation — gated by the D5 ACL (WriteAssertions/ResolveDispute).
- mig 0035: research.assertion + research.subject_current_view (PK
subject_id,predicate,scope — per-project isolation; no PII column by design).
- du_db::research: Predicate enum + PII classifier (MDKA_IS/IDENTITY rejected →
R3 P2P only, no AppView table; NOTE PII-by-default unless pii_cleared; scan_pii
scrubber), record/retract_assertion + refold (SETTLED|DISPUTED, never
auto-collapsed; assign-and-prune), accept_same_person (drives D2
merge_subjects method=ASSERTION), messages::{assert,retract,resolve}.
- du-web: /api/v1/research/{assertion,assertion/retract,assertion/resolve,
current-view} — signed (crate::sig) + role-gated.
- Tests: du-db assertion_store_fold_and_rails; du-web assertion_endpoints_gated.
Deferred (Navigator/later): R3 PII over D1 + assertion_local; R1 lexicon +
du-jobs Jetstream ingest (no publisher yet); tree.change_set promotion.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The entry point of the federated IBD flow. The candidate engine already
writes ibd.match_suggestion from fed.*; this serves it to clients — and
needed NO new auth foundation: the existing Ed25519 signed-poll pattern
(verify_signed + messages::poll + 300s window) plus the
core.biosample.atproto->>'repo_did' bridge the engine itself uses give an
owner-DID-scoped read with no DPoP/OAuth and no unauthenticated stopgap.
- du_db::ibd: messages::{poll,introduce}; suggestions_for_did (owner-scoped
via the repo_did bridge); is_suggested_to_did (introduce authz);
owner_did_of_sample (server-side counterpart resolution).
- routes/ibd.rs (signed, personal scope — not project-scoped):
GET /api/v1/ibd/suggestions → pseudonymous rows (suggested_sample_guid +
non-PII {signals}; never a counterpart DID);
POST /api/v1/ibd/introduce → resolves counterpart server-side, calls
exchange::create_request (purpose=IBD_AUTOSOMAL, idempotent request_uri),
returns only {request_uri, PENDING}. Caller learns the counterpart DID
only after mutual consent via exchange::pending_for.
- Tests: du-db suggestions_scoped_by_owner_did; du-web
suggestions_scoped_and_introduce_hides_counterpart.
No migration (reuses ibd.match_suggestion, core.biosample, exchange.*).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e IBD loop
The introduce→consent loop had no way for a recipient to discover a request addressed to them (/pending only shows established sessions). This adds the missing counterpart-discovery path, unblocking the Navigator round-trip.
- du_db::exchange::incoming_for(did) → PENDING requests addressed to `did` that it hasn't acted on. Symmetric-blind: returns request_uri + purpose + created_at, deliberately NO initiator_did. The recipient consents blind; both sides reveal identity only on mutual consent (via pending_for).
- GET /api/v1/exchange/incoming (signed poll, reuses exchange-poll message).
- IBD introduce now mints an opaque request_uri = urn:ibd:<sha256(did:guid)> (was urn:ibd:<did>:<guid>) so the handle can't leak the initiator DID to the recipient. Still deterministic ⇒ idempotent per caller+candidate.
- Tests: du-db incoming_surfaces_pending_to_recipient_only (recipient sees it, initiator doesn't, clears once acted on); du-web ibd test extended: the counterpart discovers the introduced request via /incoming, blind to the initiator, and the handle embeds no DID.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ndation)
Fixes the gap that made signed Edge calls unusable by real clients: the did:plc DID-doc #atproto signing key is PDS-custodied, so a desktop client can't sign with it and can't add its own verificationMethod. verify_signed only worked for did:key.
Clients now publish their Ed25519 device PUBLIC key as a com.decodingus.atmosphere.deviceKey record in their own repo (repo-write = proof of control over repo_did); the AppView ingests it like any fed.* record and verifies signatures against it. Per-call auth stays OAuth-free.
- mig 0036: fed.device_key (PK (did,rkey) ⇒ N devices/DID; public_key as a did:key string; PII-free).
- du_db::fed::device_key: upsert (time_us-ordered) + keys_for; NS_DEVICE_KEY in INGEST_COLLECTIONS + table_for (so fed::delete = revocation).
- du-jobs jetstream: ingest arm for the deviceKey collection.
- du-web::sig::verify_signed(pool, ...): did:key self-certifies; did:plc/web match any registered device key (none ⇒ 403 bootstrap); DID-doc resolution dropped (no per-call network). All 18 signed call sites thread &st.pool.
- Tests: du-db device_key (lookup/ordering/revocation); du-web sig inline (did:plc gated on registration, did:key still self-certifies).
Cross-repo contract: NSID com.decodingus.atmosphere.deviceKey, record field publicKey = did:key string. Revoke by deleting the record.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Attaches paper biosamples as leaves under the tree node their published
haplogroup call resolves to, with a cumulative per-node sample count and a
click-through leaf list. D2C (source='CITIZEN') excluded.
- mig 0037: tree.haplogroup_sample (PK (sample_guid, dna_type) ⇒ Y now +
mt later; haplogroup_id NULL when UNPLACED).
- du_db::tree_sample: recompute_placements(dna) — advisory-locked
declarative engine that resolves core.biosample.original_haplogroups
calls via haplogroup::resolve_name_or_variant (name→alias→defining-SNP→
normalize), assigns PLACED/UNPLACED, prunes, bumps tree_revision;
counts_by_node + samples_under (recursive-CTE, at-or-below + citation).
biosample::pick_original_call made pub(crate) for reuse.
- du-web/api.rs: HaplogroupNodeDto.sample_count (cumulative, rolled up in
build_level); GET /api/v1/{y,mt}-tree/node/{name}/samples leaf list.
- du-jobs: run-once tree-samples-recompute + daily (Y).
- Tests: du-db tree_sample (resolution paths, D2C excluded, UNPLACED,
cumulative, prune); du-web tree_carries_sample_count_and_leaf_list.
Follow-up (deferred): HTML cladogram per-node count + sidebar; mt recompute
once the mt tree lands.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The JSON API already served YFull-style sample placement; this adds the
presentation layer to the server-rendered SVG cladogram + SNP sidebar.
- du_db::tree_sample::cumulative_counts(dna): recursive CTE rolling each
placed sample up to all ancestors → full-tree at-or-below counts, so a
depth-bounded window node still counts hidden descendants.
- tree_layout::{InNode,LaidNode}.sample_count threaded through
routes/tree.rs build_root/to_innode; svg.html shows "· N samples" on each
node (conditional, on the variants line that opens the sidebar).
- snp_sidebar handler + template list the node's placed leaves via
samples_under (label + source badge + citation; capped 50 + "+N more").
- i18n keys tree.samples / tree.samples.title / tree.samples.more (en/es/fr).
- Tests: du-db cumulative_counts assertion; du-web
cladogram_shows_sample_count_and_sidebar_leaves.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
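The roll-up that cumulative_counts performs with a recursive CTE can be shown in plain Rust: each node's directly placed samples are added to the node itself and to every ancestor. The function signature, the map-based inputs, and the `MAX_HOPS` guard are assumptions made for this sketch.

```rust
use std::collections::HashMap;

/// Assumed safety bound on the ancestor walk, in case of cyclic edge data.
pub const MAX_HOPS: usize = 1_000;

/// `parent_of` maps child id -> parent id; `direct` maps node id -> number of
/// samples placed exactly at that node. Returns at-or-below counts per node.
pub fn cumulative_counts(
    parent_of: &HashMap<i64, i64>,
    direct: &HashMap<i64, u32>,
) -> HashMap<i64, u32> {
    let mut totals: HashMap<i64, u32> = HashMap::new();
    for (&node, &n) in direct {
        let mut cur = Some(node);
        let mut hops = 0;
        // Walk from the placement node up to the root, crediting each ancestor.
        while let Some(id) = cur {
            *totals.entry(id).or_insert(0) += n;
            hops += 1;
            if hops > MAX_HOPS {
                break;
            }
            cur = parent_of.get(&id).copied();
        }
    }
    totals
}
```

Computing the totals over the full tree, rather than over the rendered window, is what lets a depth-bounded window node still count descendants that are not drawn.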
Preseeds genomics.sequencing_lab + sequencer_instrument so the lab lookup (/api/v1/sequencer/lab) resolves the common D2C instruments from launch. Derived from instrument_centers.tsv: rows with n_crams > 2, assigned to the max-frequency lab (NO_CSV placeholder + the blank-instrument row dropped).
- 5 labs (canonical full names, is_d2c): Family Tree DNA, Dante Labs, Nebula Genomics, Full Genomes Corporation, YSEQ.
- 36 instruments (FTDNA 6, Dante 6, YSEQ 11, FGC 11, Nebula 2); model_name = the export platform, manufacturer derived (Illumina/MGI/NULL).
- Idempotent: labs ON CONFLICT (name) DO NOTHING (coexists with any consensus/ETL lab), instruments ON CONFLICT (instrument_id) DO UPDATE.
No schema/code change: lookup_lab reads the seeded tie.
- Test seed_preloads_ydna_warehouse_labs; prior lookup test made seed-aware.
Verified on the dev DB (which had 0 labs; legacy public.sequencing_lab is empty, so no collision).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Follow-ups that complete two recent features, plus housekeeping.
IBD:
- engine records the shared haplogroup's DNA arm (hgDnaType) in suggestion metadata; introduce routes the exchange purpose from the dominant signal (HAPLOGROUP → IBD_Y/IBD_MT, else IBD_AUTOSOMAL) via introduction_purpose, which also replaces the bare is_suggested_to_did authz and is returned in the response.
- introduce marks the suggestion CONVERTED (drops from the active list, still idempotently re-introducible).
- new POST /api/v1/ibd/dismiss (signed ibd-dismiss) → dismiss_suggestion sets ACTIVE→DISMISSED; the engine already preserves DISMISSED.
Tree sample leaves:
- status='CURATED' for manual placements, which recompute_placements preserves (skips re-resolution + prune); counts/samples_under treat PLACED+CURATED as placed.
- Curator-gated GET /manage/tree-sample/unplaced (triage queue) + POST /manage/tree-sample/place (pin a sample under a chosen node).
Housekeeping:
- .DS_Store added to root .gitignore.
- fix sequencer_endpoints_route_and_404: the 0038 seed makes lab-instruments non-empty (latent break the seed arc's du-db-only run missed).
Tests: du-db introduction_purpose_and_dismiss_convert_lifecycle, tree_sample CURATED-survives-recompute + triage; du-web introduce-purpose + dismiss + curator-gating. Full suites green (du-web 27/27).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the per-node "· N samples" count with actual sample leaves: each placed (non-D2C) sample hangs off its haplogroup node as its own terminal tip (a small distinct marker + label), like YFull.
- du_db::tree_sample::direct_labels: placed-sample labels per node id (bounded to the rendered window).
- tree_layout: LaidTip + Laid.tips; tips laid out as full node-slot leaves (spaced like any other leaf, so labels never collide in either orientation); a node centers over its children AND its tips; tip connectors share the same vertical bus as the node's child connectors.
- routes/tree.rs: thread per-node sample labels into the layout, capped at 8 tips/node with a "+N" overflow tip that opens the sidebar.
- svg.html: render tips (green dot + monospace label; grey "+N" overflow); drop the count text from the node box.
The JSON tree API keeps its own sample_count (unchanged). Test asserts the SVG renders sample tips.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
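The tip cap can be sketched as a pure function over a node's sample labels. The `Tip` enum and function name are hypothetical, and the sketch assumes the "+N" overflow tip is drawn in addition to the 8 sample tips rather than counted among them; the commit does not say which.

```rust
/// Per-node cap on rendered sample tips, as stated in the commit.
pub const MAX_TIPS: usize = 8;

#[derive(Debug, PartialEq, Eq)]
pub enum Tip {
    /// A placed sample, rendered with its label.
    Sample(String),
    /// A "+N" tip standing in for N samples that were not drawn.
    Overflow(usize),
}

/// Turn a node's sample labels into at most MAX_TIPS sample tips plus,
/// if anything was cut, one overflow tip carrying the hidden count.
pub fn tips_for_node(labels: &[String]) -> Vec<Tip> {
    let mut tips: Vec<Tip> = labels
        .iter()
        .take(MAX_TIPS)
        .cloned()
        .map(Tip::Sample)
        .collect();
    if labels.len() > MAX_TIPS {
        tips.push(Tip::Overflow(labels.len() - MAX_TIPS));
    }
    tips
}
```

A node with 11 placed samples would therefore yield 8 labelled tips followed by a "+3" overflow tip.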
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the ISOGG-import tree foundation with the de-novo IQ-TREE/ASR tree built from genotypes (~/Genomics/ytree). The de-novo tree is the foundation; nothing grafts onto it.
- du_db::denovo: foundation loader. Matches each branch SNP to core.variant by hs1 coordinate (reuse YBrowse catalog name, else mint chrY:<pos><anc>><der>); names nodes label → catalog-SNP → NodeN; inserts nodes/edges/defining links with an "unresolved" provenance block for collapsed-branch SNPs lifted to the nearest survivor.
- Phase 3 leaves: places doc.tips[] via tree.haplogroup_sample, get-or-create core.biosample by accession (deduped across Y/mt); PRJEB* → public EXTERNAL, own WGS229 → private.
- Phase 4 curation: tree.denovo_conflict (mig 0039) + read-only Curator queue at /curator/denovo-conflicts (page+HTMX, lineage filter); dc.* i18n in en/es/fr; dashboard card.
- haplogroup: reset_tree (greenfield clear) + dna-scoped clear_dna so Y and mt coexist; recompute_backbone seeds on macro-clade isogg labels. reconcile_tilde_twins now folds only empty-stub paragroup twins.
- decodingus-tree-init --denovo-y/--denovo-mt <json> --apply.
Tests: denovo.rs (catalog reuse/mint, naming, edges, unresolved block, tip placement, conflicts, Y+mt coexistence, dna-scoped clear), tilde_twins.rs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Builder added an `n_mut ≥ 1` clause to the publication keep rule: survive iff (UFBoot ≥ 95 AND ≥1 defining mutation) OR keepset. This drops the zero-mutation mt placeholder nodes (Node82/110/…) that UFBoot over-supported; named children reattach to the parent as polytomies, so no tips are lost and every named clade survives. chrM: 2,015 → 1,765 nodes / 3,344 tips. No loader/exporter change — survival is read from the publication treefile, so re-export + reload suffices. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
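The keep rule stated above is a single boolean expression; a minimal sketch, with a hypothetical function name and parameter types, looks like this:

```rust
/// Publication keep rule: a node survives iff it is well supported AND has at
/// least one defining mutation, or it is explicitly listed in the keepset.
pub fn survives(ufboot: f64, n_mut: u32, in_keepset: bool) -> bool {
    (ufboot >= 95.0 && n_mut >= 1) || in_keepset
}
```

A zero-mutation placeholder with high bootstrap support now fails the first clause, so it is dropped unless the keepset names it.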
Close the IBD coordination loop and activate the dormant shared-match signal. After two consented Edges finish their encrypted comparison, each posts a signed, PII-free attestation of the outcome; once both agree, the match is confirmed and becomes discoverable.
- mig 0040: adapt ibd.ibd_pds_attestation to DID auth: attesting_did, exchange_request_uri, per-party reported_total_cm/segments; drop the legacy attesting_pds_guid NOT NULL; idempotency unique index.
- du_db::ibd::record_attestation + messages::attest + AttestationOutcome. Privacy rails: the attester must be a party to a CONSENTED IBD exchange and own its side of the pair (the other party owns the counterpart), so every match-graph edge traces to a real dual-consent, no forged edges. Consensus: both non-dispute reports within max(10cM,20%) → CONFIRMED + is_publicly_discoverable; DISPUTE/REVOCATION → DISPUTED. Signal 3 (shared-match) now reads only publicly-discoverable edges.
- POST /api/v1/ibd/attest (signed; 403 on reject).
- depth_score: weight the haplogroup signal by the shared clade's tree depth (rarity × d/(d+half), half=8) via a recursive-CTE name→depth walk, since sharing a deep terminal ≫ sharing a macro-clade. Enabled by the de-novo tree.
Tests: ibd_attestation.rs (consensus + shared-match activation, rejection rails, depth ordering); routes/ibd.rs attest endpoint (signed / bad-sig 403 / non-consented 403).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
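The two formulas in this commit are simple enough to check by hand. The sketch below is an illustration under one explicit assumption: the commit does not say what the 20% is measured against, so the example takes it relative to the larger of the two reported totals. Function names are hypothetical.

```rust
/// Two reported totals (in cM) agree if they differ by at most
/// max(10 cM, 20%). ASSUMPTION: the 20% is of the larger reported value.
pub fn reports_agree(a_cm: f64, b_cm: f64) -> bool {
    let tolerance = f64::max(10.0, 0.20 * a_cm.max(b_cm));
    (a_cm - b_cm).abs() <= tolerance
}

/// Haplogroup signal weighted by tree depth: rarity × d / (d + half), half = 8.
/// At depth 8 the rarity is halved; it approaches full weight as depth grows.
pub fn depth_score(rarity: f64, depth: u32) -> f64 {
    const HALF: f64 = 8.0;
    let d = f64::from(depth);
    rarity * d / (d + HALF)
}
```

For small totals the absolute 10 cM floor dominates (30 cM vs 45 cM does not agree), while for large totals the relative term does (100 cM vs 115 cM agrees under the assumed reading). The depth weight is 0 at the root, 0.5 at depth 8, and 0.75 at depth 24.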
The Rust rewrite under rust/ is the application; retire the Scala 3 / Play codebase it replaces. Removes the app source, build, deploy, and CI:
- app/ conf/ test/ project/ build.sbt public/: Scala source, Play config (routes, evolutions, i18n), tests, sbt build, static assets
- Dockerfile docker-compose.yml docker-compose.prod.yml docker/ .dockerignore: sbt-stage image + SLICK/Play compose (the decodingus-db service, not the du-pg dev container the Rust app uses)
- .github/workflows/ci.yml: sbt/JDK21 CI
- PROJECT_ANALYSIS.md .env.example: Scala-era docs/config
Kept: rust/ (the app), documents/ (design docs), scripts/ (deploy-generic maintenance page), README, LICENSE, CODE_OF_CONDUCT.
735 files, ~81.7k lines. Rust workspace builds unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the Scala-coexistence framing (the legacy codebase is gone — the Rust app is now the platform), describe the trees as de-novo IQ-TREE/ASR phylogenies (not the retired ISOGG/FTDNA graft), remove the deleted app/ and Docker references, and update the repository layout. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The de-novo loader is the sole tree-building path now, so remove the
subsystem orphaned by that pivot — reachable only through dead
tree-init CLI flags, never the running app:
- du-db/src/snp_graft.rs (1032 lines, the ISOGG/prod SNP-graft engine)
- the ISOGG/graft half of tree_init.rs (763 -> 90 lines: just the
--denovo-{y,mt} loader remains)
- reconcile_tilde_twins + reset_tree from haplogroup.rs (de-novo uses
the per-lineage clear_dna)
- the graft-fed curator-review vertical: du-db/src/wip.rs,
du-web routes/reviews.rs + templates, change_set::apply_wip_resolutions
(snp_graft::stage_review was its only producer; de-novo curation goes
through /curator/denovo-conflicts) + reviews i18n in all three locales
Kept live: du_db::merge::materialize + du_domain::merge + change_set
(the /versioning + /change-sets tree-merge routes) — proven intact by
merge_e2e. ~-2.9k lines. Build + clippy + change_set/merge_e2e/denovo/
migrations + 28 du-web tests all green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The 1717-line public-API god-file mixed every wire DTO, the tree
assembly/cache logic, and 43 handlers. Decompose by concern:
- api/dto.rs — all response/request DTOs + query-param structs
- api/tree.rs — haplotree assembly, ETag/conditional-GET cache, and the
/api/v1/{y,mt}-tree[...] handlers
- api/mod.rs — router, the single utoipa OpenAPI doc, and the remaining
(non-tree) handlers
The central ApiDoc still references every handler/DTO, so mod.rs
re-imports the submodules (use dto::*, use tree::*). Behavior-preserving:
all 28 du-web tests pass, no new clippy warnings.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the retired Scala sbt CI. One ubuntu-latest job, working in rust/, with a PostGIS service (postgis/postgis:16-3.4) + DATABASE_URL so the DB-backed tests run for real:
  cargo build --workspace --locked
  cargo clippy --workspace --locked -- -D warnings   # lib + bins
  cargo test --workspace --locked
Runs on push to main/rust-rewrite-foundation and all PRs; caches via Swatinem/rust-cache; shared crates are public https git deps (no secrets). No fmt gate (the codebase uses intentional hand-formatting).
To make the strict clippy gate green, fix the only two warnings: a redundant `&mut **tx` deref (merge.rs) and a complex map type factored into VariantKey/ResolvedVariant aliases (denovo.rs).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the early stub (which listed vars the code never reads, e.g. OPENALEX_BASE_URL / RECAPTCHA_ENABLED) with the actual environment the app reads, grouped by concern and annotated with defaults + required vs optional: core (DATABASE_URL, APP_SECRET, PORT, RUST_LOG, DU_BASE_URL, DU_ASSETS_DIR), AT Proto OAuth, curator/forms, du-jobs (Jetstream / YBrowse / yregions), and external APIs. Only DATABASE_URL is required; the rest degrade gracefully. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The first simplify pass deleted the ISOGG-import tree-init CLI flags but
left the du-db functions they backed with no remaining production caller
(test-only or zero callers). Remove them — all obsolete with the de-novo
tree pivot:
- haplogroup.rs: scrub_recurrent_links, label_recurrence_transitions,
rename_to_snp_shorthand, set_aliases + the YCC-rename helper cluster and
its inline test module (1744 -> 1029 lines)
- variant.rs: set_coordinates_bulk, set_aliases_bulk,
resolve_isogg_recurrence (763 -> 557 lines)
- 4 dead test files: scrub_recurrent, recurrence_label, rename_shorthand,
resolve_recurrence
Kept variant::{upsert_by_name, delete_by_evidence_source}: 0-caller too,
but YBrowse-ingest API, not ISOGG orphans. ~-1.4k lines. Build + strict
clippy (-D warnings) + all du-db tests green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rewrite it in Rust to eliminate the JVM dependency