243 commits
7797172
refactor(worker): restructure monolithic jobs.py into modular archite…
bencap Jan 7, 2026
340b055
feat: Add comprehensive job traceability system database schema
bencap Jan 7, 2026
fd35ac4
fix(logging): simplify context saving logic to overwrite existing map…
bencap Jan 7, 2026
7ca0c9f
tests: add TransactionSpy class for mocking database transaction meth…
bencap Jan 12, 2026
314a469
feat: add BaseManager class with transaction handling and rollback fe…
bencap Jan 12, 2026
4d6b7ad
feat: Job manager class, supporting utilities, and unit tests
bencap Jan 12, 2026
4372a31
feat: Pipeline manager class, supporting utilities, and unit tests
bencap Jan 14, 2026
c6f72bb
feat: add function to check if job dependencies are reachable
bencap Jan 16, 2026
d77cf68
feat: add markers for test categorization in pytest
bencap Jan 16, 2026
7548bbf
fix: mock job manager returning in fixture rather than yielding
bencap Jan 17, 2026
cd2fab5
fix: enhance error logging for job and pipeline state transitions
bencap Jan 17, 2026
7ee3ce1
fix: re-order imports in job manager test file
bencap Jan 17, 2026
7ec5c40
fix: use conftest_optional import structure in worker test module
bencap Jan 17, 2026
749c512
feat: Add decorators for job and pipeline management
bencap Jan 20, 2026
d28279d
feat: use context for logging in job manager
bencap Jan 20, 2026
0fba014
feat: decorator for job run record guarantees
bencap Jan 21, 2026
603da5b
feat: add test mode support to job and pipeline decorators
bencap Jan 21, 2026
eb6aa64
fix: simplify exc handling in job management decorator
bencap Jan 21, 2026
9a9f77f
feat: allow pipelines to be started by decorated jobs
bencap Jan 22, 2026
a8655ab
tests: unit tests for worker manager utilities
bencap Jan 22, 2026
b9c2ad7
feat: add network test marker and control socket access in pytest
bencap Jan 22, 2026
2e7da03
Refactor test setup by replacing `setup_worker_db` with `with_populat…
bencap Jan 22, 2026
a884b60
wip: refactor jobs to use job management system
bencap Jan 22, 2026
5b227d0
refactor: reduce mocking of database across worker tests
bencap Jan 23, 2026
089e18f
refactor: simplify job definition in job management tests
bencap Jan 23, 2026
08d0c06
refactor: simplify job definition in job management tests
bencap Jan 23, 2026
ba2ff23
refactor: centralize decorator test mode flag fixture
bencap Jan 23, 2026
e24b1dd
feat: enhance pipeline start logic with controllable coordination
bencap Jan 24, 2026
a048272
feat: logic fixups and comprehensive test cases for variant processin…
bencap Jan 24, 2026
bea7c54
feat: add start_pipeline job and related tests for pipeline management
bencap Jan 24, 2026
f33d4e6
feat: gnomAD managed job tests and enhancements
bencap Jan 25, 2026
b671207
feat: uniprot managed job tests and enhancements
bencap Jan 27, 2026
ca61ceb
feat: clingen managed job enhancements
bencap Jan 28, 2026
8131ea8
fixup(variant creation)
bencap Jan 28, 2026
a235a4e
feat: implement job and pipeline factories with definitions and tests
bencap Jan 28, 2026
3d26a7c
feat: integrate PipelineFactory for variant creation and update proce…
bencap Jan 28, 2026
b6e0c83
feat: add context manager for database session management
bencap Jan 28, 2026
92b8c57
feat: use session context manager in worker decorators rather than in…
bencap Jan 28, 2026
344b50f
refactor: streamline context handling in job and pipeline decorators
bencap Jan 28, 2026
36b3915
feat: add new job definitions for score set annotation pipeline
bencap Jan 29, 2026
eca6747
feat: implement AnnotationStatusManager for managing variant annotati…
bencap Jan 29, 2026
fa4c663
feat: add annotation status tracking to jobs
bencap Jan 29, 2026
2aeda22
feat: streamline job results and exception handling in tests
bencap Jan 29, 2026
54043c3
feat: less prescriptive status messages in complete job functions
bencap Jan 29, 2026
ad25a5f
fix: ensure exception info is always present for failed jobs in job m…
bencap Jan 29, 2026
1273b74
fix: move Athena engine fixture to optional conftest for core depende…
bencap Jan 29, 2026
c250dc9
feat: add standalone context creation for worker lifecycle management
bencap Jan 29, 2026
e4c8d7b
feat: add asyncclick dependency and update environment script to use it
bencap Jan 29, 2026
942d2ce
feat: add standalone job definitions and update lifecycle context for…
bencap Jan 29, 2026
072d569
feat: refactor populate_mapped_variant_data to use async and job subm…
bencap Jan 29, 2026
e50a34b
chore: test cleanup
bencap Jan 29, 2026
8efce81
fix: remove ga4gh packages from server group
bencap Jan 23, 2026
7b403ad
docs: minimal developer docs via copilot for worker jobs
bencap Jan 29, 2026
aeb5c08
fix: mypy typing
bencap Jan 29, 2026
20a4e24
fix: test attempting to connect via socket to athena
bencap Jan 29, 2026
29f9c35
feat: add Slack error notifications to job/pipeline decorators
bencap Jan 29, 2026
642a64b
fix: update TODO comments for clarity and specificity in UniProt and …
bencap Jan 29, 2026
9e10bc5
feat: make Redis client optional in managers and add error handling f…
bencap Jan 29, 2026
c3e90db
feat: implement create_job_dependency method in JobFactory with valid…
bencap Jan 29, 2026
1fb23ad
feat: refactor UniProt ID mapping script to use async commands and jo…
bencap Jan 29, 2026
1870eeb
feat: refactor link_gnomad_variants script to use async commands and …
bencap Jan 30, 2026
135f278
feat: refactor clingen_car_submission script to use async commands an…
bencap Jan 30, 2026
d153744
feat: refactor clingen_ldh_submission script to streamline job submis…
bencap Jan 30, 2026
5ee162b
feat: clinvar clinical control refresh job + script
bencap Jan 30, 2026
06f77e7
feat: update annotation type handling to use enum directly and switch…
bencap Feb 4, 2026
bba9e3b
feat: add functions to retrieve associated ClinVar Allele IDs and enh…
bencap Feb 4, 2026
3097942
refactor: remove redundant fixture for setting up sample variants in …
bencap Feb 4, 2026
d37e7e6
chore: add TODO for caching ClinVar control data to improve performance
bencap Feb 4, 2026
d915035
feat: add multiple refresh job definitions for ClinVar controls with …
bencap Feb 4, 2026
33be31f
feat: enhance test workflow to run fast tests on pull requests and fu…
bencap Feb 4, 2026
7614c36
chore: remove deprecated pkg_resources and replace w stdlib. Bump pan…
bencap Feb 13, 2026
93e8519
chore: lock deps
bencap Feb 17, 2026
1198954
feat: add Redis caching for ClinGen API requests to reduce redundant …
bencap Feb 17, 2026
1fb9fdd
feat: add commit option to job progress and status update methods for…
bencap Feb 17, 2026
ae09840
feat: implement stalled job cleanup with unified retry handling
bencap Feb 17, 2026
f120ed5
fix: correct type annotations in cleanup.py
bencap Feb 17, 2026
b556a90
wip: standardize job result contracts
bencap Mar 2, 2026
1b2500b
ai: update instruction files for testing guidance
bencap Mar 12, 2026
4cf2877
WIP vep, hgvs, vt automation
sallybg Apr 7, 2026
4265868
draft: moved lib files and outlined vep changes
sallybg Apr 15, 2026
e3b3751
Update annotation worker job function names
sallybg Apr 15, 2026
56f66af
refactor(worker): restructure monolithic jobs.py into modular archite…
bencap Jan 7, 2026
22b7853
feat: Add comprehensive job traceability system database schema
bencap Jan 7, 2026
5de8fb4
fix(logging): simplify context saving logic to overwrite existing map…
bencap Jan 7, 2026
ad2e7fb
tests: add TransactionSpy class for mocking database transaction meth…
bencap Jan 12, 2026
9a3171e
feat: add BaseManager class with transaction handling and rollback fe…
bencap Jan 12, 2026
2e05a7e
feat: Job manager class, supporting utilities, and unit tests
bencap Jan 12, 2026
dc72637
feat: Pipeline manager class, supporting utilities, and unit tests
bencap Jan 14, 2026
d2c53fc
feat: add function to check if job dependencies are reachable
bencap Jan 16, 2026
bdb7964
feat: add markers for test categorization in pytest
bencap Jan 16, 2026
39d89c9
fix: mock job manager returning in fixture rather than yielding
bencap Jan 17, 2026
a1b254b
fix: enhance error logging for job and pipeline state transitions
bencap Jan 17, 2026
8c79577
fix: re-order imports in job manager test file
bencap Jan 17, 2026
411dc52
fix: use conftest_optional import structure in worker test module
bencap Jan 17, 2026
7631090
feat: Add decorators for job and pipeline management
bencap Jan 20, 2026
213f966
feat: use context for logging in job manager
bencap Jan 20, 2026
abfa82d
feat: decorator for job run record guarantees
bencap Jan 21, 2026
4379467
feat: add test mode support to job and pipeline decorators
bencap Jan 21, 2026
a98d7e7
fix: simplify exc handling in job management decorator
bencap Jan 21, 2026
45e166a
feat: allow pipelines to be started by decorated jobs
bencap Jan 22, 2026
4e9b22b
tests: unit tests for worker manager utilities
bencap Jan 22, 2026
0b78253
feat: add network test marker and control socket access in pytest
bencap Jan 22, 2026
79c0df4
Refactor test setup by replacing `setup_worker_db` with `with_populat…
bencap Jan 22, 2026
4509899
wip: refactor jobs to use job management system
bencap Jan 22, 2026
53f6722
refactor: reduce mocking of database across worker tests
bencap Jan 23, 2026
4919dca
refactor: simplify job definition in job management tests
bencap Jan 23, 2026
5ebe2a5
refactor: simplify job definition in job management tests
bencap Jan 23, 2026
2cabfb5
refactor: centralize decorator test mode flag fixture
bencap Jan 23, 2026
bfb0f7a
feat: enhance pipeline start logic with controllable coordination
bencap Jan 24, 2026
cb9e164
feat: logic fixups and comprehensive test cases for variant processin…
bencap Jan 24, 2026
dbe770f
feat: add start_pipeline job and related tests for pipeline management
bencap Jan 24, 2026
65f11bc
feat: gnomAD managed job tests and enhancements
bencap Jan 25, 2026
65c8c36
feat: uniprot managed job tests and enhancements
bencap Jan 27, 2026
0130963
feat: clingen managed job enhancements
bencap Jan 28, 2026
2c6b6c9
fixup(variant creation)
bencap Jan 28, 2026
9b66f51
feat: implement job and pipeline factories with definitions and tests
bencap Jan 28, 2026
38e028b
feat: integrate PipelineFactory for variant creation and update proce…
bencap Jan 28, 2026
e866136
feat: add context manager for database session management
bencap Jan 28, 2026
7764008
feat: use session context manager in worker decorators rather than in…
bencap Jan 28, 2026
3569ae6
refactor: streamline context handling in job and pipeline decorators
bencap Jan 28, 2026
b5691b6
feat: add new job definitions for score set annotation pipeline
bencap Jan 29, 2026
8dc3051
feat: implement AnnotationStatusManager for managing variant annotati…
bencap Jan 29, 2026
48c4928
feat: add annotation status tracking to jobs
bencap Jan 29, 2026
d5d9339
feat: streamline job results and exception handling in tests
bencap Jan 29, 2026
08b97fe
feat: less prescriptive status messages in complete job functions
bencap Jan 29, 2026
8a34bfc
fix: ensure exception info is always present for failed jobs in job m…
bencap Jan 29, 2026
c3b5c0a
fix: move Athena engine fixture to optional conftest for core depende…
bencap Jan 29, 2026
9614184
feat: add standalone context creation for worker lifecycle management
bencap Jan 29, 2026
a75295d
feat: add asyncclick dependency and update environment script to use it
bencap Jan 29, 2026
3d32baf
feat: add standalone job definitions and update lifecycle context for…
bencap Jan 29, 2026
4c6e61a
feat: refactor populate_mapped_variant_data to use async and job subm…
bencap Jan 29, 2026
2d64a8d
chore: test cleanup
bencap Jan 29, 2026
c44726b
docs: minimal developer docs via copilot for worker jobs
bencap Jan 29, 2026
5fc19a4
fix: mypy typing
bencap Jan 29, 2026
722ca72
fix: test attempting to connect via socket to athena
bencap Jan 29, 2026
5ab1215
feat: add Slack error notifications to job/pipeline decorators
bencap Jan 29, 2026
947e78c
fix: update TODO comments for clarity and specificity in UniProt and …
bencap Jan 29, 2026
ed48980
feat: make Redis client optional in managers and add error handling f…
bencap Jan 29, 2026
0e916ac
feat: implement create_job_dependency method in JobFactory with valid…
bencap Jan 29, 2026
fe9742c
feat: refactor UniProt ID mapping script to use async commands and jo…
bencap Jan 29, 2026
adce263
feat: refactor link_gnomad_variants script to use async commands and …
bencap Jan 30, 2026
24efdeb
feat: refactor clingen_car_submission script to use async commands an…
bencap Jan 30, 2026
4861214
feat: refactor clingen_ldh_submission script to streamline job submis…
bencap Jan 30, 2026
6442a42
feat: clinvar clinical control refresh job + script
bencap Jan 30, 2026
29adafc
feat: update annotation type handling to use enum directly and switch…
bencap Feb 4, 2026
fcbcf32
feat: add functions to retrieve associated ClinVar Allele IDs and enh…
bencap Feb 4, 2026
7c9f11f
refactor: remove redundant fixture for setting up sample variants in …
bencap Feb 4, 2026
050838e
chore: add TODO for caching ClinVar control data to improve performance
bencap Feb 4, 2026
ecaf1f0
feat: add multiple refresh job definitions for ClinVar controls with …
bencap Feb 4, 2026
a5c6437
feat: enhance test workflow to run fast tests on pull requests and fu…
bencap Feb 4, 2026
bddba7a
feat: add Redis caching for ClinGen API requests to reduce redundant …
bencap Feb 17, 2026
4ea63d5
feat: add commit option to job progress and status update methods for…
bencap Feb 17, 2026
c34741c
feat: implement stalled job cleanup with unified retry handling
bencap Feb 17, 2026
7fbcbbe
fix: correct type annotations in cleanup.py
bencap Feb 17, 2026
6ec194f
wip: standardize job result contracts
bencap Mar 2, 2026
dd150b1
ai: update instruction files for testing guidance
bencap Mar 12, 2026
5c388e1
feat(mapping): populate hgvs_assay_level when creating mapped variants
bencap Apr 15, 2026
12ba7e1
fix: update down_revision to correct previous migration reference
bencap Apr 16, 2026
dafc4b0
fix(types): update JobExecutionOutcome home to avoid circular imports
bencap Apr 16, 2026
bccaff7
build(deps): upgrade pytest-postgresql from ~5.0.0 to ~7.0.0
bencap Apr 16, 2026
1f5516e
fix(mapping): update VRS mapping version key to dcd_mapping_version
bencap Apr 16, 2026
9e3b8c1
feat(annotation): add replace_all_versions flag to add_annotation
bencap Apr 16, 2026
1942e94
fix(logging): change log level from info to debug for added annotation
bencap Apr 16, 2026
b05d8ed
fix(mapping): improve error handling and logging for variant mapping …
bencap Apr 16, 2026
d57deec
fix(mapping): correct version retrieval in add_annotation for mapping…
bencap Apr 16, 2026
840e2ea
Add standalone job definitions for new post-mapping jobs
sallybg Apr 15, 2026
8c4ab61
Add annotation types for new post-mapping annotations
sallybg Apr 15, 2026
d93c20e
Add worker job definitions
sallybg Apr 15, 2026
ee13b12
fix(cache): update default Redis host to 'redis' for better compatibi…
bencap Apr 16, 2026
64976a6
refactor(clingen): streamline API calls and enhance caching for allel…
bencap Apr 16, 2026
2642388
refactor(hgvs): rewrite HGVS population as worker job with tests
bencap Apr 16, 2026
30365e3
feat(worker): add variant translation worker job for PA<->CA allele r…
bencap Apr 17, 2026
c0793d3
refactor(annotations): rename success_data to annotation_metadata
bencap Apr 17, 2026
0b4d242
feat(job): add SYSTEM_MAINTENANCE job type to JobType enum
bencap Apr 17, 2026
ddf34c5
feat(decorator): add job_id validation to with_guaranteed_job_run_record
bencap Apr 17, 2026
3277246
feat(jobs): rename cleanup_stalled_jobs cron job and add standalone j…
bencap Apr 17, 2026
2177183
feat(pipeline): add map_annotate_score_set pipeline with variant mapp…
bencap Apr 17, 2026
8da20d2
feat(clingen): enhance CAR submission handling with error logging and…
bencap Apr 17, 2026
bece3e1
feat(decorators): implement task-local session management in ensure_s…
bencap Apr 17, 2026
afbebf1
build(dependencies): pin setuptools version to avoid compatibility is…
bencap Apr 17, 2026
13c5e56
fix(pipeline): commit status changes to prevent deadlocks during job …
bencap Apr 17, 2026
d77be8d
feat(job-management): add cancellation check for jobs in terminal sta…
bencap Apr 17, 2026
2efeafd
Refactor job and pipeline management documentation
bencap Apr 17, 2026
f1fdfdf
Add worker job definitions
sallybg Apr 15, 2026
b44f3eb
Add annotation types for new post-mapping annotations
sallybg Apr 15, 2026
ab669ab
Add standalone job definitions for new post-mapping jobs
sallybg Apr 15, 2026
d222eff
feat(clinvar): enhance NCBI session management with retry strategy an…
bencap Apr 18, 2026
924f31d
feat(clingen): implement cache pre-warming job to optimize downstream…
bencap Apr 19, 2026
0dec09f
feat(clinvar): consolidate ClinVar refresh job to process all archiva…
bencap Apr 19, 2026
332c189
feat(definitions): update job dependencies to use warm_clingen_cache …
bencap Apr 19, 2026
5cabeba
feat(job_manager): ensure logging context is a instance level job var…
bencap Apr 19, 2026
b3a48bf
feat(clinvar): refactor ClinVar data fetching and parsing
bencap Apr 19, 2026
0c5678d
feat(definitions): remove redundant job definition for VEP population
bencap Apr 19, 2026
44f63b0
feat(variant_annotation_status): drop redundant indexes to optimize w…
bencap Apr 19, 2026
9e6dfb0
feat(annotation_status_manager): implement batched writes and auto-fl…
bencap Apr 19, 2026
e32e116
feat(variant_annotation_status): simplify primary key to use only 'id…
bencap Apr 19, 2026
f2f6344
feat(annotation_status_manager): add methods for retrieving annotatio…
bencap Apr 19, 2026
eb1418b
feat(annotation_failure_category): add failure category enum and upda…
bencap Apr 20, 2026
169a254
feat(run_jobs): add scripts for running standalone jobs and pipelines…
bencap Apr 20, 2026
e821cab
feat(worker): add failure categorization, Slack safety, stale job
bencap Apr 20, 2026
acec94a
feat(logging): enhance logging for allele processing in jobs
bencap Apr 20, 2026
3202cf5
feat(variant_processing): refine error handling in variant creation a…
bencap Apr 20, 2026
1fb75ef
feat(lifecycle): set maximum workers for process pool in startup hook
bencap Apr 20, 2026
d71a3b3
feat(cleanup): adjust timeout thresholds for stalled jobs and improve…
bencap Apr 20, 2026
7b3ae08
feat(best_practices): add idempotency contract guidelines for job fun…
bencap Apr 20, 2026
680e9b4
Refactor tests to remove progress update assertions
bencap Apr 20, 2026
be21683
fix(worker): add TODO#715 for migrating to an async pg driver
bencap Apr 20, 2026
77f3589
feat(score sets): handle errors during score set validation enqueue
bencap Apr 20, 2026
6661030
feat(job runs): add admin endpoints for job run monitoring and implem…
bencap Apr 20, 2026
aadf490
fix(tests): update Redis endpoint in cache backend configuration test
bencap Apr 20, 2026
2981f46
refactor(tests): rename tests to use fetch_clinvar_variant_data and u…
bencap Apr 20, 2026
5dc75d8
refactor(job runs): remove priority column and related constraints fr…
bencap Apr 20, 2026
708056f
feat: add admin-only observability endpoints for job runs and pipelines
bencap Apr 20, 2026
b46a90b
fix: update version to 2026.1.2 for release
bencap Apr 20, 2026
e571be8
fix(tests): ensure slack_sdk is imported for Slack notification tests
bencap Apr 20, 2026
eaabb1c
fix(mypy): fix all mypy type errors
bencap Apr 20, 2026
4c00ad4
refactor(cleanup): enhance handling of stalled pipeline jobs based on…
bencap Apr 21, 2026
b1f2c6f
refactor: remove deprecated script for mapping UniProt IDs from metadata
bencap Apr 22, 2026
fa5634b
fix(worker): clear stale ARQ keys in prepare_retry to unblock re-enqu…
bencap Apr 22, 2026
0104f02
perf(clingen-cache): warm cache with bounded concurrency instead of s…
bencap Apr 24, 2026
f5dc0f8
fix(worker): use per-attempt arq job ids to make retries safe
bencap Apr 24, 2026
8f561f9
fix(mapping): remove duplicate slack error notifications
bencap Apr 24, 2026
39c2b16
fix(cleanup): use ARQ Redis presence check for stalled QUEUED jobs
bencap Apr 24, 2026
81c6224
feat(cleanup): add handling for stuck pipelines without active jobs
bencap Apr 25, 2026
f7487d7
feat(annotations): add external_service_rejected failure category and…
bencap Apr 27, 2026
bbca6bf
fix(clingen): return succeeded on partial CAR rejection, failed only …
bencap Apr 27, 2026
07d42d7
feat(worker): alert via slack when leaf annotation jobs fail for all …
bencap Apr 27, 2026
4916d4e
Refactor VEP job to handle batching outside of data-fetching function…
sallybg Apr 29, 2026
7d76cca
Remove unused code
sallybg Apr 29, 2026
b602395
Merge branch 'feature/sally/561/vep-hgvs-vt-automation-with-newest-ch…
bencap Apr 29, 2026
481583f
fix(annotation-type): Remove duplicate annotation type
bencap Apr 29, 2026
97579c4
fix(hgvs): Remove unnecessary lib stub for HGVS manipulation
bencap Apr 30, 2026
7cc6f73
feat(vep): refactor variant recoder and consequence functions to supp…
bencap Apr 30, 2026
98a9322
feat(vep): update populate_vep_for_score_set to handle async conseque…
bencap Apr 30, 2026
0428ce3
feat(vep): make VEP and recoder batch sizes configurable in populate_…
bencap Apr 30, 2026
18ca195
fix(vep): update recoder batch size from 100 to 25 for improved perfo…
bencap Apr 30, 2026
522f522
fix(uniprot): treat per-gene mapping failures as DATA_ERROR instead o…
bencap May 1, 2026
2217be4
feat(worker): add structured Slack alerts for job failures and refact…
bencap May 1, 2026
5fd063b
refactor(worker): enforce flush-before-return and fix terminal progre…
bencap May 1, 2026
f13d386
feat(worker): suppress slack alerts on retried failures; add retry co…
bencap May 1, 2026
9137f1e
refactor(worker): inline should_retry() calls; drop pre-computed will…
bencap May 1, 2026
8d7a9f8
perf(clinvar): reduce peak memory usage when parsing ClinVar TSV files
bencap May 1, 2026
55f4fdd
perf(vep): run variant recoder batches concurrently with semaphore
bencap May 2, 2026
78 changes: 56 additions & 22 deletions .github/instructions/api.instructions.md
@@ -109,32 +109,66 @@ responses=shared_responses # Defines 4xx/5xx response schemas

## Worker Integration

### Job Pipeline
Many operations chain through multiple worker jobs:
1. `create_variants_for_score_set` — Parse uploaded CSV, create variant records
2. `map_variants_for_score_set` — Map variants via DCD Mapping / VRS
3. `submit_score_set_mappings_to_*` — Submit to ClinGen services
### Pipeline System

Most write operations trigger a multi-step pipeline via the worker:

```python
from mavedb.lib.workflow.pipeline_factory import PipelineFactory

# In a router endpoint:
pipeline, entrypoint_job_run = PipelineFactory.create_pipeline(
    db=db,
    name="validate_map_annotate_score_set",
    pipeline_params={
        "score_set_id": score_set.id,
        "updater_id": user_data.user.id,
        "correlation_id": logging_context().get("correlation_id"),
    },
)
db.commit()

await worker.enqueue_job("start_pipeline", entrypoint_job_run.id)
```

This creates a `Pipeline` with multiple `JobRun` records and `JobDependency` records, then enqueues the pipeline's `start_pipeline` entrypoint in ARQ. The worker coordinates the rest — each job runs after its dependencies complete.
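
The ordering rule above (a job runs only after its dependencies complete) can be sketched with the standard library. This is a minimal illustration, not the worker's actual coordination code: `run_order` is a hypothetical helper, and the dependency edges between the three job names are assumed for the example rather than taken from `definitions.py`.

```python
from graphlib import TopologicalSorter


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return job names so that every job appears after the jobs it depends on.

    `dependencies` maps a job name to the set of job names it waits for.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Assumed edges, for illustration only.
dependencies = {
    "create_variants_for_score_set": set(),
    "map_variants_for_score_set": {"create_variants_for_score_set"},
    "submit_score_set_mappings_to_car": {"map_variants_for_score_set"},
}
order = run_order(dependencies)
```

In the real system the dependency graph lives in `JobDependency` rows and jobs are enqueued as their predecessors finish; the sketch only shows the ordering constraint.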

### Job Function Signature

All job functions follow this signature (the decorator injects `job_manager`):

### Job Patterns
```python
async def create_variants_for_score_set(ctx: dict, score_set_id: int, correlation_id: str):
    logging_context = setup_job_state(ctx, correlation_id)
    db = ctx["db"]

    try:
        # ... processing ...
        pass
    except Exception as e:
        send_slack_error(e, logging_context)
        raise
@with_pipeline_management
async def create_variants_for_score_set(
    ctx: dict, job_id: int, job_manager: JobManager
) -> JobExecutionOutcome:
    job = job_manager.get_job()
    validate_job_params(["score_set_id", "correlation_id", "updater_id"], job)
    # ... business logic using job_manager.db ...
    return JobExecutionOutcome.succeeded(data={"variants_created": count})
```

### Backoff and Retry
Use `enqueue_job_with_backoff()` for jobs that may need retries (e.g., external service calls).
Callers pass only `ctx` and `job_id` when enqueueing. The decorator creates the `JobManager` from the `job_id`.
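
A rough sketch of the injection pattern, assuming nothing about the real decorator beyond what is stated above: `SketchJobManager` and `with_sketch_management` are hypothetical stand-ins for `JobManager` and `@with_pipeline_management`, and they omit all lifecycle, session, and error handling.

```python
import asyncio
import functools
from dataclasses import dataclass


@dataclass
class SketchJobManager:
    """Hypothetical stand-in for JobManager; it only remembers the job id."""

    job_id: int


def with_sketch_management(func):
    """Build a manager from job_id and pass it as the third argument."""

    @functools.wraps(func)
    async def wrapper(ctx: dict, job_id: int):
        manager = SketchJobManager(job_id=job_id)
        return await func(ctx, job_id, manager)

    return wrapper


@with_sketch_management
async def example_job(ctx: dict, job_id: int, job_manager: SketchJobManager) -> dict:
    # The job body sees the injected manager; the caller never constructs it.
    return {"job_id": job_manager.job_id}


# Callers pass only ctx and job_id.
result = asyncio.run(example_job({}, 42))
```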

### Correlation IDs

Correlation IDs flow from the API request through the pipeline to each job:

## Correlation IDs
Every request gets a correlation ID via starlette-context middleware. Pass it to worker jobs for end-to-end request tracing:
```python
from mavedb.lib.logging.context import save_to_logging_context
correlation_id = save_to_logging_context({"score_set_urn": urn})
# In the router — capture correlation ID from starlette-context
from mavedb.lib.logging.context import save_to_logging_context, logging_context

save_to_logging_context({"score_set_urn": urn})
correlation_id = logging_context().get("correlation_id")

# Pass to pipeline via pipeline_params
pipeline, entrypoint = PipelineFactory.create_pipeline(
    db=db,
    name="validate_map_annotate_score_set",
    pipeline_params={"correlation_id": correlation_id, ...},
)
```

Each job retrieves the correlation ID from its `job_params` and uses `job_manager.save_to_context()` for structured logging.
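
As a generic illustration of carrying a correlation ID from job parameters into log records, the sketch below uses only the standard `logging` module. It does not use `job_manager.save_to_context()`; the `logger_for_job` helper and the logger name are assumptions made for the example.

```python
import logging

logger = logging.getLogger("worker.sketch")


def logger_for_job(job_params: dict) -> logging.LoggerAdapter:
    """Return an adapter that attaches the job's correlation ID as log `extra`."""
    return logging.LoggerAdapter(
        logger, {"correlation_id": job_params.get("correlation_id")}
    )


adapter = logger_for_job({"score_set_id": 1, "correlation_id": "abc-123"})

# process() is the hook LoggerAdapter runs on every call; it shows what gets attached.
message, kwargs = adapter.process("mapping started", {})
```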

For detailed worker conventions, see `.github/instructions/worker.instructions.md` and `src/mavedb/worker/README.md`.
122 changes: 118 additions & 4 deletions .github/instructions/copilot-instructions.md
@@ -38,9 +38,17 @@ src/mavedb/
├── models/ # SQLAlchemy ORM models
├── view_models/ # Pydantic request/response models
├── routers/ # API endpoint handlers
├── worker/ # ARQ background jobs
│ ├── jobs.py # Job implementations
│ └── settings.py # Worker config, function registry, cron jobs
├── worker/ # ARQ background worker system
│ ├── jobs/ # Job function implementations (by category)
│ │ ├── registry.py # Central registry of all jobs, cron definitions
│ │ ├── variant_processing/ # Variant creation and mapping
│ │ ├── external_services/ # ClinGen, ClinVar, gnomAD, UniProt
│ │ ├── pipeline_management/ # Pipeline entrypoint (start_pipeline)
│ │ └── system/ # Cron maintenance (cleanup stalled jobs)
│ ├── lib/ # Infrastructure layer
│ │ ├── decorators/ # @with_pipeline_management, @with_job_management
│ │ └── managers/ # JobManager, PipelineManager state management
│ └── settings/ # ARQ worker config, lifecycle hooks
├── lib/ # Shared utilities
│ ├── authentication.py # ORCID JWT + API key auth
│ ├── authorization.py # Permission checks
@@ -94,7 +102,7 @@ Do not comment obvious operations, variable assignments, or code that is self-ex
- **Structured logging**: Use `logger` with `extra=logging_context()` for correlation IDs via starlette-context
- **HTTP exceptions**: FastAPI `HTTPException` with appropriate status codes
- **Domain exceptions**: `src/mavedb/lib/exceptions.py` — `MixedTargetError`, `NonexistentOrcidError`, etc.
- **Worker errors**: `send_slack_error()` + full logging context
- **Worker errors**: `send_slack_job_error()` or `send_slack_job_failure()` + full logging context
- **Validation errors**: Two distinct classes exist:
- `src/mavedb/lib/validation/exceptions.py` — validation package exceptions
- `src/mavedb/lib/exceptions.py` — legacy `ValidationError` (Django-style, used in some older code)
@@ -140,3 +148,109 @@ poetry run python -m mavedb.scripts.<script_name>
- [server_main.py](src/mavedb/server_main.py) — App setup and dependency injection
- [authentication.py](src/mavedb/lib/authentication.py) — Auth patterns
- [conftest.py](tests/conftest.py) — Test fixtures and database setup

### Naming Conventions
- **Variables & functions**: `snake_case` (e.g., `score_set_id`, `create_variants_for_score_set`)
- **Classes**: `PascalCase` (e.g., `ScoreSet`, `UserData`, `ProcessingState`)
- **Constants**: `UPPER_SNAKE_CASE` (e.g., `MAPPING_QUEUE_NAME`, `DEFAULT_LDH_SUBMISSION_BATCH_SIZE`)
- **Enum values**: `snake_case` (e.g., `ProcessingState.success`, `MappingState.incomplete`)
- **Database tables**: `snake_case` with descriptive association table names (e.g., `scoreset_contributors`, `experiment_set_doi_identifiers`)
- **API endpoints**: kebab-case paths (e.g., `/score-sets`, `/experiment-sets`)

### Documentation Conventions
*For general Python documentation standards, see `.github/instructions/python.instructions.md`. The following are MaveDB-specific additions:*

- **Algorithm explanations**: Include comments explaining complex logic, especially URN generation and bioinformatics operations
- **Design decisions**: Comment on why certain architectural choices were made
- **External dependencies**: Explain purpose of external bioinformatics libraries (HGVS, SeqRepo, etc.)
- **Bioinformatics context**: Document biological reasoning behind genomic data processing patterns

### Commenting Guidelines
**Core Principle: Write self-explanatory code. Comment only to explain WHY, not WHAT.**

**✅ WRITE Comments For:**
- **Complex bioinformatics algorithms**: Variant mapping algorithms, external service interactions
- **Business logic**: Why specific validation rules exist, regulatory requirements
- **External API constraints**: Rate limits, data format requirements
- **Non-obvious calculations**: Score normalization, statistical methods
- **Configuration values**: Why specific timeouts, batch sizes, or thresholds were chosen

**❌ AVOID Comments For:**
- **Obvious operations**: Variable assignments, simple loops, basic conditionals
- **Redundant descriptions**: Comments that repeat what the code clearly shows
- **Outdated information**: Comments that don't match current implementation

### Error Handling Conventions
- **Structured logging**: Always use `logger` with `extra=logging_context()` for correlation IDs
- **HTTP exceptions**: Use FastAPI `HTTPException` with appropriate status codes and descriptive messages
- **Custom exceptions**: Define domain-specific exceptions in `src/mavedb/lib/exceptions.py`
- **Worker job errors**: Send Slack notifications via `send_slack_job_error()` or `send_slack_job_failure()` and log with full context
- **Validation errors**: Use Pydantic validators and raise `ValueError` with clear messages

### Code Style and Organization Conventions
*For general Python style conventions, see `.github/instructions/python.instructions.md`. The following are MaveDB-specific patterns:*

- **Async patterns**: Use `async def` for I/O operations, regular functions for CPU-bound work
- **Database operations**: Use SQLAlchemy 2.0 style with `session.scalars(select(...)).one()`
- **Pydantic models**: Separate request/response models with clear inheritance hierarchies
- **Bioinformatics data flow**: Structure code to clearly show genomic data transformations

### Testing Conventions
*For testing philosophy, mocking boundaries, and conventions see `.github/instructions/testing.instructions.md`. For general Python testing standards, see `.github/instructions/python.instructions.md`. The following are MaveDB-specific patterns:*

- **Test function naming**: Use descriptive names that reflect bioinformatics operations (e.g., `test_cannot_publish_score_set_without_variants`)
- **Fixtures**: Use `conftest.py` for shared fixtures, especially database and worker setup
- **Mocking**: Mock only at system boundaries (external services, Redis/ARQ, Slack). Do not mock internal helpers or `update_progress`
- **Constants**: Define test data including genomic sequences and variants in `tests/helpers/constants.py`
- **Integration testing**: Test full bioinformatics workflows including external service interactions

## Codebase Conventions

### URN Validation
- Use regex patterns from `src/mavedb/lib/validation/urn_re.py`
- Validate URNs in Pydantic models with `@field_validator`
- URN generation logic in `src/mavedb/lib/urns.py` and `temp_urns.py`

### Worker Jobs (ARQ/Redis)
- **Two-layer architecture**: Infrastructure (decorators + managers) handles lifecycle/state; business layer (jobs/) implements domain logic
- **Job registry**: All jobs registered in `src/mavedb/worker/jobs/registry.py` — `BACKGROUND_FUNCTIONS`, `BACKGROUND_CRONJOBS`, `STANDALONE_JOB_DEFINITIONS`
- **Job function signature**: `async def job_name(ctx: dict, job_id: int, job_manager: JobManager) -> JobExecutionOutcome` — `job_manager` is injected by the decorator, not passed by callers
- **Decorators**: `@with_pipeline_management` (most jobs), `@with_job_management` (standalone), `@with_guaranteed_job_run_record` (cron/auto-created JobRun)
- **Pipeline system**: `PipelineFactory.create_pipeline()` creates Pipeline + JobRun + JobDependency records from definitions in `src/mavedb/lib/workflow/definitions.py`
- **Session management**: Task-local DB sessions via `ContextVar` prevent concurrent ARQ jobs from sharing sessions
- **Commit discipline**: Decorators commit lifecycle state changes; `update_progress()` commits as a checkpoint; job code should NOT commit
- **Key job types**:
- `create_variants_for_score_set` - Parse uploaded CSV, create variant records
- `map_variants_for_score_set` - Map variants via DCD Mapping / VRS
- `submit_score_set_mappings_to_car/ldh` - Submit to ClinGen services
- `cleanup_stalled_jobs` - Cron job for recovering stuck jobs
- **Enqueueing pipelines**: Routers call `PipelineFactory.create_pipeline()` then `ArqRedis.enqueue_job("start_pipeline", ...)` with the pipeline's entrypoint JobRun ID
- **Detailed documentation**: See `src/mavedb/worker/README.md` and `.github/instructions/worker.instructions.md`

### View Models (Pydantic)
- **Base model** (`src/mavedb/view_models/base/base.py`) converts empty strings to None and uses camelCase aliases
- **Inheritance patterns**: `Base` → `Create` → `Modify` → `Saved` model hierarchy
- **Field validation**: Use `@field_validator` for single fields, `@model_validator(mode="after")` for cross-field validation
- **URN validation**: Validate URNs with regex patterns from `urn_re.py` in field validators
- **Transform functions**: Use functions in `validation/transform.py` for complex data transformations
- **Separate models**: Request (`Create`, `Modify`) vs response (`Saved`) models with different field requirements

### External Integrations
- **HGVS/SeqRepo** for genomic sequence operations
- **DCD Mapping** for variant mapping and VRS transformation
- **CDOT** for transcript/genomic coordinate conversion
- **GA4GH VRS** for variant representation standardization
- **ClinGen services** for allele registry and linked data hub submissions

## Key Files to Reference
- `src/mavedb/models/score_set.py` - Primary data model patterns
- `src/mavedb/routers/score_sets.py` - Complex router with worker integration
- `src/mavedb/worker/jobs/registry.py` - Job registration and available functions
- `src/mavedb/worker/jobs/variant_processing/creation.py` - Reference pipeline job implementation
- `src/mavedb/lib/workflow/definitions.py` - Pipeline and job definitions
- `src/mavedb/view_models/score_set.py` - Pydantic model hierarchy examples
- `src/mavedb/server_main.py` - Application setup and dependency injection
- `src/mavedb/data_providers/services.py` - External service integration patterns
- `src/mavedb/lib/authentication.py` - Authentication and authorization patterns
- `tests/conftest.py` - Test fixtures and database setup
- `docker-compose-dev.yml` - Service architecture and dependencies