
feat: implement GET /run/{id} endpoints (#37)#319

Open
saathviksheerla wants to merge 3 commits into openml:main from saathviksheerla:feat/get-run-endpoint

Conversation

@saathviksheerla
Contributor

@saathviksheerla saathviksheerla commented Apr 21, 2026

Resolves #37.
This PR introduces the GET and POST /run/{id} endpoints for the OpenML Python API.

Overview:

  • Database Concurrency: Leverages asyncio.gather to run queries against the expdb and userdb tables concurrently, significantly reducing database I/O latency.
  • Modernized Schemas: Replicated standard PHP mappings into Python Pydantic models.
  • Bloat Reduction: Specifically addressed the evaluation engine formatting to only return aggregate scores, avoiding the bloat of repeating evaluation outputs for each fold.
  • Testing: Added comprehensive unit tests and automated PHP DeepDiff comparisons covering run IDs 24 through 34 (tests/routers/openml/runs_get_test.py), enforcing complete structural parity.
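The concurrency approach described in the first bullet can be sketched roughly as follows. The fetcher names and the sleep-based placeholders are illustrative only, not the PR's actual database helpers:

```python
import asyncio

async def fetch_uploader_name(run_id: int) -> str:
    # Placeholder for a userdb query; the sleep stands in for I/O latency.
    await asyncio.sleep(0.05)
    return "Jane Doe"

async def fetch_tags(run_id: int) -> list[str]:
    # Placeholder for an expdb query.
    await asyncio.sleep(0.05)
    return ["study_1", "weka"]

async def load_run(run_id: int) -> dict:
    # Both queries are awaited concurrently, so total latency is roughly
    # the max of the individual calls rather than their sum.
    uploader_name, tags = await asyncio.gather(
        fetch_uploader_name(run_id),
        fetch_tags(run_id),
    )
    return {"run_id": run_id, "uploader_name": uploader_name, "tag": tags}

result = asyncio.run(load_run(24))
print(result["uploader_name"])  # → Jane Doe
```

The same pattern extends to the full fan-out the PR performs (tags, input data, output files, evaluations, and so on) by adding more awaitables to the gather call.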

Checklist

Always:

  • I have performed a self-review of my own pull request to ensure it contains all relevant information, and the proposed changes are minimal but sufficient to accomplish their task.

Required for code changes:

  • Tests pass locally
  • I have commented my code in hard-to-understand areas, and provided or updated docstrings as needed
  • I have added tests that cover the changes (only required if not already under coverage)

If applicable:

  • I have made corresponding changes to the documentation pages (/docs)

@coderabbitai
Contributor

coderabbitai Bot commented Apr 21, 2026

Walkthrough

Adds GET and POST endpoints at /run/{run_id} returning full run metadata. Introduces new DB helpers in src/database/runs.py to fetch runs, uploader name, tags, input datasets, output files, evaluations (filtered by configured evaluation_engine_ids), task type, and task evaluation measure. Adds src/config.load_run_configuration and a [run] section with evaluation_engine_ids in config.toml. Extends src/schemas/runs.py with Pydantic models for run, parameters, datasets, files, and evaluations. Updates flows query to include implementation.fullName. Adds comprehensive API and DB tests.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 79.63%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check — ✅ Passed: the title clearly and concisely describes the main change, implementing the GET and POST /run/{id} endpoints, which is the primary objective of this pull request.
  • Description check — ✅ Passed: the description is well-related to the changeset, providing comprehensive details about the implementation approach, design decisions, and test coverage for the new endpoints.
  • Linked Issues check — ✅ Passed: the PR fully implements the GET /run/{id} endpoint as requested in issue #37, with comprehensive database queries, schema definitions, endpoint handler, and test coverage.
  • Out of Scope Changes check — ✅ Passed: all changes are directly related to implementing the GET /run/{id} endpoints: database helpers, schema models, endpoint handler, configuration for evaluation filtering, and comprehensive tests.


@codecov

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 98.72611% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.37%. Comparing base (621e1c3) to head (4461c63).

  • tests/routers/openml/runs_get_test.py — patch 98.18% (2 missing, 2 partial lines) ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #319      +/-   ##
==========================================
- Coverage   93.53%   89.37%   -4.16%     
==========================================
  Files          67       68       +1     
  Lines        3109     3418     +309     
  Branches      221      232      +11     
==========================================
+ Hits         2908     3055     +147     
- Misses        143      312     +169     
+ Partials       58       51       -7     

@saathviksheerla saathviksheerla marked this pull request as ready for review April 21, 2026 13:03
Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 4 issues, and left some high level feedback:

  • The concurrent asyncio.gather calls for multiple queries all reuse the same expdb/userdb AsyncConnection; many async DB drivers do not permit overlapping operations on a single connection, so consider either using a pool (distinct connections) or running these queries sequentially if the driver cannot safely multiplex.
  • In get_uploader_name, CONCAT(first_name, ' ', last_name) will produce awkward values when either component is NULL; using CONCAT_WS(' ', first_name, last_name) (or equivalent COALESCE logic) would give cleaner uploader_display values and avoid trailing/leading spaces or "None"-like artifacts.
## Individual Comments

### Comment 1
<location path="src/routers/openml/runs.py" line_range="144-146" />
<code_context>
+            name=row.name,
+            # Whole-number floats (e.g. counts) are converted to int to match PHP's
+            # integer representation. e.g. 253.0 → 253, 0.0 → 0.
+            value=int(row.value)
+            if isinstance(row.value, float) and row.value.is_integer()
+            else row.value,
+            array_data=row.array_data,
+        )
</code_context>
<issue_to_address>
**suggestion:** Normalizing integer-like metric values only for floats may miss Decimal or string-encoded numerics.

This handles only `float` values; if the driver returns `Decimal` (or string numerics), whole-number values won’t be coerced to `int` and will differ from the PHP representation. Consider centralising this in a helper that casts any numeric with a zero fractional part to `int` (and maybe handles `Decimal` explicitly) if parity with PHP is required.
</issue_to_address>
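One possible shape for such a centralised helper (a sketch, not the PR's code) that covers both float and Decimal; string-encoded numerics are deliberately left untouched here and would need an explicit parse step if the driver can return them:

```python
from decimal import Decimal

def normalize_numeric(value: object) -> object:
    # Coerce whole-number floats and Decimals to int, matching PHP's
    # integer rendering (e.g. 253.0 -> 253); everything else passes through.
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, Decimal) and value == value.to_integral_value():
        return int(value)
    return value

print(normalize_numeric(253.0))           # → 253
print(normalize_numeric(Decimal("0.0")))  # → 0
print(normalize_numeric(0.95))            # → 0.95
```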

### Comment 2
<location path="tests/routers/openml/runs_get_test.py" line_range="57-66" />
<code_context>
+    assert get_resp.json() == post_resp.json()
+
+
+async def test_get_run_top_level_shape(py_api: httpx.AsyncClient) -> None:
+    """Response contains all expected top-level keys."""
+    response = await py_api.get(f"/run/{_RUN_ID}")
+    run = response.json()
+    expected_keys = {
+        "run_id",
+        "uploader",
+        "uploader_name",
+        "task_id",
+        "task_type",
+        "flow_id",
+        "flow_name",
+        "setup_id",
+        "setup_string",
+        "parameter_setting",
+        "error",
+        "tag",
+        "input_data",
+        "output_data",
+    }
+    assert expected_keys <= run.keys(), f"Missing keys: {expected_keys - run.keys()}"
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add an explicit test that `task_evaluation_measure` is omitted from the JSON when null/empty.

The code relies on `task_evaluation_measure` being normalized to `None` and excluded via `response_model_exclude_none=True`, but we only assert that expected keys are present. Please add a test (e.g., using task 115) that explicitly checks `"task_evaluation_measure" not in response.json()` when no measure is configured, to lock in this behavior and guard against future schema or exclusion changes.

Suggested implementation:

```python
    """GET and POST /run/{id} return identical JSON bodies."""
    get_resp, post_resp = await asyncio.gather(
        py_api.get(f"/run/{_RUN_ID}"),
        py_api.post(f"/run/{_RUN_ID}"),
    )
    assert get_resp.status_code == HTTPStatus.OK
    assert post_resp.status_code == HTTPStatus.OK
    assert get_resp.json() == post_resp.json()


async def test_task_evaluation_measure_omitted_when_null(py_api: httpx.AsyncClient) -> None:
    """task_evaluation_measure is not present in JSON when no measure is configured for the task."""
    response = await py_api.get(f"/run/{_RUN_ID_NO_EVAL_MEASURE}")
    run = response.json()

    # The field should be omitted entirely when normalized to None and excluded via response_model_exclude_none=True
    assert "task_evaluation_measure" not in run


import asyncio

```

1. Define `_RUN_ID_NO_EVAL_MEASURE` in this test module (or reuse an existing constant) and set it to the ID of a run whose task has no evaluation measure configured, for example a run on task 115 as noted in your comment.
2. Ensure the fixture or test data you use for `_RUN_ID_NO_EVAL_MEASURE` reflects a task with `task_evaluation_measure` normalized to `None`, so the test correctly locks in the exclusion behavior.
</issue_to_address>

### Comment 3
<location path="src/routers/openml/runs.py" line_range="68" />
<code_context>
+    response_model_exclude_none=True,
+)
+@router.get("/{run_id}", response_model_exclude_none=True)
+async def get_run(
+    run_id: int,
+    expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the async context loading and result-mapping logic from `get_run` into small helper functions to make the endpoint body more linear and readable.

You can reduce the cognitive load of `get_run` by extracting the “context loading” and the “mapping” into focused helpers. This keeps behaviour identical but removes the big `gather` tuple and long inline mapping blocks.

For example, pull the `asyncio.gather` and associated types into a small helper:

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from sqlalchemy import Row
    from sqlalchemy.ext.asyncio import AsyncConnection

@dataclass
class RunContext:
    uploader_name: str | None
    tags: list[str]
    input_data_rows: list[Row]
    output_file_rows: list[Row]
    evaluation_rows: list[Row]
    task_type: str | None
    task_evaluation_measure: str | None
    setup: Row | None
    parameter_rows: list[Row]

async def _load_run_context(
    run: "Row",
    run_id: int,
    expdb: AsyncConnection,
    userdb: AsyncConnection,
    engine_ids: list[int],
) -> RunContext:
    (
        uploader_name,
        tags,
        input_data_rows,
        output_file_rows,
        evaluation_rows,
        task_type,
        task_evaluation_measure,
        setup,
        parameter_rows,
    ) = await asyncio.gather(
        database.runs.get_uploader_name(run.uploader, userdb),
        database.runs.get_tags(run_id, expdb),
        database.runs.get_input_data(run_id, expdb),
        database.runs.get_output_files(run_id, expdb),
        database.runs.get_evaluations(run_id, expdb, evaluation_engine_ids=engine_ids),
        database.runs.get_task_type(run.task_id, expdb),
        database.runs.get_task_evaluation_measure(run.task_id, expdb),
        database.setups.get(run.setup, expdb),
        database.setups.get_parameters(run.setup, expdb),
    )
    return RunContext(
        uploader_name=uploader_name,
        tags=tags,
        input_data_rows=input_data_rows,
        output_file_rows=output_file_rows,
        evaluation_rows=evaluation_rows,
        task_type=task_type,
        task_evaluation_measure=task_evaluation_measure,
        setup=setup,
        parameter_rows=parameter_rows,
    )
```

Then the endpoint body can become more linear and self-describing:

```python
async def get_run(...):
    run = await database.runs.get(run_id, expdb)
    if run is None:
        raise RunNotFoundError(f"Run {run_id} not found.", code=236)

    engine_ids: list[int] = config.load_run_configuration().get(
        "evaluation_engine_ids", [1]
    )
    ctx = await _load_run_context(run, run_id, expdb, userdb, engine_ids)

    flow = await database.flows.get(ctx.setup.implementation_id, expdb) if ctx.setup else None

    parameter_settings = _build_parameter_settings(ctx.parameter_rows)
    input_datasets = _build_input_datasets(ctx.input_data_rows)
    output_files = _build_output_files(ctx.output_file_rows)
    evaluations = _build_evaluations(ctx.evaluation_rows)

    normalised_measure = ctx.task_evaluation_measure or None
    error_messages = [run.error_message] if run.error_message else []

    return Run(
        run_id=run_id,
        uploader=run.uploader,
        uploader_name=ctx.uploader_name,
        task_id=run.task_id,
        task_type=ctx.task_type,
        task_evaluation_measure=normalised_measure,
        flow_id=ctx.setup.implementation_id if ctx.setup else 0,
        flow_name=flow.full_name if flow else None,
        setup_id=run.setup,
        setup_string=ctx.setup.setup_string if ctx.setup else None,
        parameter_setting=parameter_settings,
        error_message=error_messages,
        tag=ctx.tags,
        input_data={"dataset": input_datasets},
        output_data={"file": output_files, "evaluation": evaluations},
    )
```

The mapping helpers stay small and local, but hide the detail from the main endpoint:

```python
def _build_parameter_settings(parameter_rows: list["Row"]) -> list[ParameterSetting]:
    return [
        ParameterSetting(
            name=p["name"],
            value=p["value"],
            component=p["flow_id"],
        )
        for p in parameter_rows
    ]

def _build_input_datasets(rows: list["Row"]) -> list[InputDataset]:
    return [InputDataset(did=row.did, name=row.name, url=row.url) for row in rows]

def _build_output_files(rows: list["Row"]) -> list[OutputFile]:
    return [OutputFile(file_id=row.file_id, name=row.field) for row in rows]

def _build_evaluations(rows: list["Row"]) -> list[EvaluationScore]:
    def _normalise_value(v: object) -> object:
        return int(v) if isinstance(v, float) and v.is_integer() else v

    return [
        EvaluationScore(
            name=row.name,
            value=_normalise_value(row.value),
            array_data=row.array_data,
        )
        for row in rows
    ]
```

These extractions keep all current behaviour (including the integer conversion, error message list, and envelope shapes) but make `get_run` much easier to read and change later.
</issue_to_address>

### Comment 4
<location path="src/schemas/runs.py" line_range="108" />
<code_context>
+    # At the Python level we keep the name error_message for clarity.
+    error_message: list[str] = Field(serialization_alias="error")  # [] when NULL in DB
+    tag: list[str]
+    input_data: dict[str, list[InputDataset]]  # {"dataset": [...]}
+    output_data: dict[
+        str,
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the dict-typed input_data/output_data fields and union with small wrapper models and, optionally, simplifying the error_message aliasing to make the schema easier to understand and use.

You can reduce the mental overhead around `input_data` / `output_data` and the union by introducing small wrapper models while preserving the exact JSON shape.

### 1. Replace `dict[str, ...]` with explicit wrapper models

Current:

```python
class Run(BaseModel):
    ...
    input_data: dict[str, list[InputDataset]]  # {"dataset": [...]}
    output_data: dict[
        str,
        list[OutputFile | EvaluationScore],
    ]  # {"file": [...], "evaluation": [...]}
```

Suggested:

```python
class InputData(BaseModel):
    dataset: list[InputDataset]


class OutputData(BaseModel):
    file: list[OutputFile]
    evaluation: list[EvaluationScore]


class Run(BaseModel):
    ...
    input_data: InputData
    output_data: OutputData
```

Pydantic will still serialize to:

```json
{
  "input_data": {
    "dataset": [ ... ]
  },
  "output_data": {
    "file": [ ... ],
    "evaluation": [ ... ]
  }
}
```

This removes:

- The need to remember magic keys (`"dataset"`, `"file"`, `"evaluation"`).
- The `OutputFile | EvaluationScore` union, since each field has a single concrete type.

Callers now get proper IDE/type support (`run.output_data.file`, `run.output_data.evaluation`) and clearer validation/error messages.

### 2. Optional: simplify the `error_message` aliasing

If you don’t need to *accept* `"error"` on input (only *emit* it on output), you can avoid `populate_by_name=True` + `serialization_alias` and let the response layer opt into alias usage:

```python
class Run(BaseModel):
    ...
    error_message: list[str] = Field(alias="error")
```

Then, when returning responses:

```python
return Run(...).model_dump(by_alias=True)
```

This:

- Keeps the Python attribute as `error_message`.
- Uses `"error"` in the JSON.
- Avoids model-level `populate_by_name=True` unless you rely on it elsewhere.

If you do need both `"error"` and `error_message` as input, keeping `serialization_alias` is fine, but you could still isolate that pattern in a tiny boundary schema (e.g. `RunResponse`) that just wraps `Run` and handles aliases.
</issue_to_address>


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
tests/routers/openml/runs_get_test.py (1)

322-346: Consider asserting the PHP error envelope shape before indexing.

Line 343 indexes php_response.json()["error"]["code"] unconditionally in the non-OK branch. If PHP ever returns a different body (timeout, HTML, alternate error shape), the test fails with an opaque KeyError/TypeError rather than a clear parity diff. A small guard or .get(...) chain would make regressions easier to triage.
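A defensive pattern along those lines might look as follows (a sketch; the envelope shape `{"error": {"code": ...}}` is taken from the comment above, and the helper name is illustrative):

```python
def extract_php_error_code(body: object) -> object:
    # Validate the envelope step by step so a mismatch fails with a clear
    # assertion message instead of an opaque KeyError/TypeError.
    assert isinstance(body, dict), f"Expected JSON object, got: {body!r}"
    error = body.get("error")
    assert isinstance(error, dict), f"Missing 'error' envelope in: {body!r}"
    code = error.get("code")
    assert code is not None, f"Missing 'code' in error envelope: {error!r}"
    return code

print(extract_php_error_code({"error": {"code": "236", "message": "..."}}))  # → 236
```

In the test, the non-OK branch would call this on `php_response.json()` before comparing codes, so a malformed PHP body surfaces as a readable parity failure.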

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/runs_get_test.py` around lines 322 - 346, update the non-OK branch of test_get_run_equal to validate the PHP error envelope before indexing php_response.json()["error"]["code"]; e.g., assert that php_response.json() is a dict containing an "error" mapping with a "code" key (or use safe .get chaining with explicit assertions) so the lookup cannot raise KeyError/TypeError and failures produce a clear assertion message comparing php_code and py_code (refer to php_response, py_response, and test_get_run_equal to locate the change).
src/database/runs.py (1)

138-158: Use bindparam(..., expanding=True) for consistency with the rest of the codebase.

The codebase already uses SQLAlchemy's bindparam(expanding=True) successfully in src/routers/openml/tasks.py and datasets.py with aiomysql (verified by passing tests like test_list_tasks_api_happy_path and test_list_tasks_filter_task_id). Adopting this pattern here eliminates manual string interpolation and the security noqa comment while maintaining compatibility.

♻️ Suggested refactor
-    if not evaluation_engine_ids:
-        return []
-
-    # Build :eid_0, :eid_1, ... placeholders — one per engine ID.
-    eid_params = {f"eid_{i}": eid for i, eid in enumerate(evaluation_engine_ids)}
-    placeholders = ", ".join(f":eid_{i}" for i in range(len(evaluation_engine_ids)))
-
-    query = text(
-        f"""
-        SELECT `m`.`name`, `e`.`value`, `e`.`array_data`
-        FROM `evaluation` `e`
-        JOIN `math_function` `m` ON `e`.`function_id` = `m`.`id`
-        WHERE `e`.`source` = :run_id
-          AND `e`.`evaluation_engine_id` IN ({placeholders})
-        """,  # noqa: S608  # placeholders are trusted integer params, not user input
-    )
-    rows = await expdb.execute(
-        query,
-        parameters={"run_id": run_id, **eid_params},
-    )
+    if not evaluation_engine_ids:
+        return []
+
+    query = text(
+        """
+        SELECT `m`.`name`, `e`.`value`, `e`.`array_data`
+        FROM `evaluation` `e`
+        JOIN `math_function` `m` ON `e`.`function_id` = `m`.`id`
+        WHERE `e`.`source` = :run_id
+          AND `e`.`evaluation_engine_id` IN :engine_ids
+        """,
+    ).bindparams(bindparam("engine_ids", expanding=True))
+    rows = await expdb.execute(
+        query,
+        parameters={"run_id": run_id, "engine_ids": evaluation_engine_ids},
+    )

(Requires from sqlalchemy import bindparam.)
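The suggested pattern can be sketched against an in-memory SQLite database (an illustration only — the PR runs this query against MySQL through aiomysql, and the table layout here is simplified):

```python
from sqlalchemy import bindparam, create_engine, text

# In-memory SQLite stand-in for the expdb `evaluation` table; expanding
# bind parameters behave identically across backends.
engine = create_engine("sqlite://")
with engine.connect() as conn:
    conn.execute(text("CREATE TABLE evaluation (source INTEGER, engine_id INTEGER, value REAL)"))
    conn.execute(text("INSERT INTO evaluation VALUES (24, 1, 0.9), (24, 2, 0.8), (24, 99, 0.1)"))

    # One named parameter expands into as many placeholders as the list needs,
    # so no manual f-string interpolation (and no S608 noqa) is required.
    query = text(
        "SELECT value FROM evaluation WHERE source = :run_id AND engine_id IN :engine_ids"
    ).bindparams(bindparam("engine_ids", expanding=True))
    rows = conn.execute(query, {"run_id": 24, "engine_ids": [1, 2]}).fetchall()

matched = sorted(row[0] for row in rows)  # engine 99 is filtered out
```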

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/database/runs.py` around lines 138 - 158, Replace the manual per-ID
placeholders (eid_params/placeholders) with SQLAlchemy's expanding bind param:
import bindparam and change the WHERE clause to use IN (:eids) (or similar name)
combined with bindparam("eids", expanding=True), then pass the list
evaluation_engine_ids as the "eids" parameter in the expdb.execute call
(alongside run_id); remove the handcrafted eid_params, the placeholders string,
and the S608 noqa comment so the query uses bindparam(expanding=True)
consistently with the rest of the codebase (refer to symbols
evaluation_engine_ids, eid_params, placeholders, query, and expdb.execute).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/database/runs.py`:
- Around line 50-61: The SELECT using CONCAT in the query executed via
userdb.execute can return NULL if either first_name or last_name is NULL; change
the SQL to use CONCAT_WS(' ', `first_name`, `last_name`) (or wrap columns with
COALESCE) so a single populated name still yields a non-NULL `name` for the
uploader_id; update the query string in the function that builds the text(...)
passed to userdb.execute (the block referencing parameters={"uploader_id":
uploader_id}) accordingly.
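The NULL-propagation difference can be mirrored in plain Python (an illustration of the MySQL semantics, not code from the PR): CONCAT returns NULL if any argument is NULL, while CONCAT_WS skips NULL arguments.

```python
def concat(*parts):
    """MySQL CONCAT semantics: NULL (None) in any argument yields NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def concat_ws(sep, *parts):
    """MySQL CONCAT_WS semantics: NULL arguments are skipped, not propagated."""
    return sep.join(p for p in parts if p is not None)

# A user row with only a first name still yields a usable uploader name:
old_name = concat("Cynthia", " ", None)       # old query -> NULL
new_name = concat_ws(" ", "Cynthia", None)    # fixed query -> "Cynthia"
```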

In `@src/routers/openml/runs.py`:
- Around line 161-178: The Run constructor currently forces
flow_id=setup.implementation_id if setup else 0 which emits a synthetic 0;
change it to pass None when setup is missing (i.e.
flow_id=setup.implementation_id if setup else None and setup_id=run.setup if
run.setup is not None else None) so the schema-typed int | None can omit fields
with response_model_exclude_none=True; additionally, avoid unnecessary DB calls
by short-circuiting the gather that calls database.setups.* when run.setup is
None (skip the two database.setups.* calls in the concurrent gather when
run.setup is falsy).

In `@src/schemas/runs.py`:
- Around line 99-102: Change the schema fields flow_id and setup_id in the Runs
schema to be Optional[int] (int | None) instead of plain int so callers can pass
None instead of a magic 0; then update the OpenML runs router to stop using the
sentinel 0 and pass None when missing (replace flow_id=setup.implementation_id
if setup else 0 with flow_id=setup.implementation_id if setup else None, and
ensure setup_id is passed as None when run.setup is null). Also scan usages of
Runs/flow_id/setup_id to remove any logic that treats 0 as "missing" and rely on
None checks instead.
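The effect of `int | None` fields combined with exclude-none serialization can be sketched without FastAPI (the field names follow the PR; the serializer below is a stand-in for FastAPI's `response_model_exclude_none=True` behavior):

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class Run:
    run_id: int
    flow_id: Optional[int] = None   # was int, with a sentinel 0
    setup_id: Optional[int] = None

def serialize_exclude_none(run: Run) -> dict:
    # Stand-in for FastAPI's response_model_exclude_none=True: None fields
    # are omitted from the JSON body entirely.
    return {k: v for k, v in asdict(run).items() if v is not None}

# A run with no setup simply omits the fields instead of emitting flow_id=0.
without_setup = serialize_exclude_none(Run(run_id=24))
with_setup = serialize_exclude_none(Run(run_id=24, flow_id=7, setup_id=3))
```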

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3b7d0bda-8760-40a3-972a-b4add43d2eef

📥 Commits

Reviewing files that changed from the base of the PR and between 621e1c3 and 1685dec.

📒 Files selected for processing (7)
  • src/config.py
  • src/config.toml
  • src/database/flows.py
  • src/database/runs.py
  • src/routers/openml/runs.py
  • src/schemas/runs.py
  • tests/routers/openml/runs_get_test.py

Comment thread src/database/runs.py
Comment thread src/routers/openml/runs.py
Comment thread src/schemas/runs.py Outdated
@saathviksheerla saathviksheerla marked this pull request as draft April 21, 2026 13:16
@saathviksheerla saathviksheerla marked this pull request as ready for review April 22, 2026 05:57

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • In database/runs.get_evaluations you use bindparam when building the query, but it is not imported in this module, so add the appropriate from sqlalchemy import bindparam import to avoid runtime errors.
  • load_run_configuration re-reads the config file on every call, unlike the other cached config loaders; consider decorating it with @functools.cache for consistency and to avoid repeated disk I/O.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `database/runs.get_evaluations` you use `bindparam` when building the query, but it is not imported in this module, so add the appropriate `from sqlalchemy import bindparam` import to avoid runtime errors.
- `load_run_configuration` re-reads the config file on every call, unlike the other cached config loaders; consider decorating it with `@functools.cache` for consistency and to avoid repeated disk I/O.

## Individual Comments

### Comment 1
<location path="tests/routers/openml/runs_get_test.py" line_range="196-201" />
<code_context>
+    assert str(error.get("code")) == _RUN_NOT_FOUND_CODE
+
+
+async def test_task_evaluation_measure_omitted_when_null(py_api: httpx.AsyncClient) -> None:
+    """task_evaluation_measure is not present in JSON when no measure is configured."""
+    # Run 24 is known to not have a task evaluation measure (verified in db test)
+    response = await py_api.get(f"/run/{_RUN_ID}")
+    run = response.json()
+    assert "task_evaluation_measure" not in run
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a complementary test for runs where task_evaluation_measure is present and propagated.

To exercise both branches of `get_task_evaluation_measure` and the response serialization, please also add a test using a run with an evaluation measure configured in `task_inputs`, asserting that `task_evaluation_measure` is present in the JSON and matches the DB value.

Suggested implementation:

```python
async def test_get_run_not_found(py_api: httpx.AsyncClient) -> None:
    """Non-existent run returns 404 with error code 236 (PHP compat)."""
    response = await py_api.get(f"/run/{_MISSING_RUN_ID}")
    assert response.status_code == HTTPStatus.NOT_FOUND
    error = response.json()
    # Verify PHP-compat error code
    assert str(error.get("code")) == _RUN_NOT_FOUND_CODE


async def test_task_evaluation_measure_omitted_when_null(
    py_api: httpx.AsyncClient,
) -> None:
    """task_evaluation_measure is not present in JSON when no measure is configured."""
    # This run is known to not have a task evaluation measure (verified in db test)
    response = await py_api.get(f"/run/{_RUN_ID_WITHOUT_MEASURE}")
    run = response.json()
    assert "task_evaluation_measure" not in run


async def test_task_evaluation_measure_present_when_configured(
    py_api: httpx.AsyncClient,
) -> None:
    """task_evaluation_measure is present and matches DB when a measure is configured."""
    # This run is known to have a task evaluation measure configured in task_inputs
    response = await py_api.get(f"/run/{_RUN_ID_WITH_MEASURE}")
    assert response.status_code == HTTPStatus.OK

    run = response.json()
    assert "task_evaluation_measure" in run
    assert run["task_evaluation_measure"] == _EXPECTED_TASK_EVALUATION_MEASURE


import asyncio
from http import HTTPStatus
from typing import Any

import deepdiff
import httpx
import pytest
from sqlalchemy.ext.asyncio import AsyncConnection

import database.runs

```

To fully wire this up you will also need to:
1. Define `_RUN_ID_WITHOUT_MEASURE` (replacing the previous `_RUN_ID` used for the "omitted" test if necessary) to be a run id without a configured task evaluation measure, consistent with your test DB seed (comment previously mentioned “Run 24”).
2. Define `_RUN_ID_WITH_MEASURE` as a run id that has a `task_evaluation_measure` configured in `task_inputs` in your test database.
3. Define `_EXPECTED_TASK_EVALUATION_MEASURE` to match the expected serialized JSON value of the measure from the DB for `_RUN_ID_WITH_MEASURE` (e.g. a string like `"predictive_accuracy"` or a dict, depending on how it is stored/serialized).
4. Ensure these constants are placed alongside your other module-level test constants (e.g. near `_MISSING_RUN_ID`, `_RUN_NOT_FOUND_CODE`, etc.) so they can be reused across tests.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/runs.py`:
- Around line 100-109: The asyncio.gather is launching multiple database.* calls
that all share the same AsyncConnection object (expdb), which is unsafe; change
the code so calls that use the shared expdb run sequentially or each acquires
its own connection from the engine. Concretely, replace the concurrent gather of
database.runs.get_uploader_name, get_tags, get_input_data, get_output_files,
get_evaluations (with evaluation_engine_ids=engine_ids), get_task_type,
get_task_evaluation_measure, database.setups.get, and
database.setups.get_parameters — all currently passed the shared expdb — with
either (a) awaiting them one-by-one using await database.runs.get_tags(...,
expdb) etc., or (b) modify the call sites to obtain a fresh
connection/transaction for each concurrent task (e.g., acquire a new async
connection from the engine and pass that connection instead of expdb) so no two
coroutines share the same AsyncConnection.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4bc0f947-3947-4e10-bc3d-a819bccf4079

📥 Commits

Reviewing files that changed from the base of the PR and between 1685dec and 4ca818c.

📒 Files selected for processing (4)
  • src/database/runs.py
  • src/routers/openml/runs.py
  • src/schemas/runs.py
  • tests/routers/openml/runs_get_test.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/database/runs.py
  • tests/routers/openml/runs_get_test.py

Comment on lines +100 to +109
await asyncio.gather(
database.runs.get_uploader_name(run.uploader, userdb),
database.runs.get_tags(run_id, expdb),
database.runs.get_input_data(run_id, expdb),
database.runs.get_output_files(run_id, expdb),
database.runs.get_evaluations(run_id, expdb, evaluation_engine_ids=engine_ids),
database.runs.get_task_type(run.task_id, expdb),
database.runs.get_task_evaluation_measure(run.task_id, expdb),
database.setups.get(run.setup, expdb),
database.setups.get_parameters(run.setup, expdb),

@coderabbitai coderabbitai Bot Apr 22, 2026


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

fd -t f "runs.py" | head -20

Repository: openml/server-api

Length of output: 128


🏁 Script executed:

wc -l src/routers/openml/runs.py

Repository: openml/server-api

Length of output: 91


🏁 Script executed:

sed -n '90,120p' src/routers/openml/runs.py

Repository: openml/server-api

Length of output: 1304


🏁 Script executed:

# Check the full function context and initialization
sed -n '70,170p' src/routers/openml/runs.py

Repository: openml/server-api

Length of output: 3129


🌐 Web query:

SQLAlchemy AsyncConnection concurrent tasks safety async/await

💡 Result:

No, a single AsyncConnection instance is not safe to share across concurrent asyncio tasks. Like AsyncSession, AsyncConnection is stateful (it holds a specific database connection and transaction state from the pool) and is designed for use within a single task/coroutine. Sharing it across concurrent tasks (e.g., via asyncio.gather) leads to undefined behavior, errors from the concurrent-access detection added in SQLAlchemy 2.0+, and race conditions.

The AsyncEngine, however, is safe to share across concurrent tasks. It manages an asyncio-compatible connection pool (using asyncio.Queue) that allows multiple tasks to concurrently acquire their own AsyncConnection instances via await engine.connect(). Each task should create and use its own AsyncConnection within a context manager:

async with engine.connect() as conn:
    result = await conn.execute(stmt)

This pattern supports high concurrency without blocking the event loop, as the pool handles connection checkout/checkin asynchronously. The official docs emphasize a separate AsyncSession per task, and the same principle applies to core AsyncConnection usage. GitHub discussions confirm sessions/connections cannot be shared concurrently and recommend per-task creation. A sync Connection is thread-safe when using Engine.pool, but async follows the task-local pattern. Example for concurrent tasks:

async def task(engine):
    async with engine.connect() as conn:
        await conn.execute(...)

async def main():
    await asyncio.gather(*(task(engine) for _ in range(10)))


🏁 Script executed:

# Check if AsyncSession is used anywhere in the codebase
rg "AsyncSession" src/ --type py

Repository: openml/server-api

Length of output: 43


🏁 Script executed:

# Check database module to understand connection/session setup
fd -t f "database" -type d | head -5

Repository: openml/server-api

Length of output: 231


🏁 Script executed:

# Look at database/__init__.py or similar for connection patterns
ls -la src/database/ | head -20

Repository: openml/server-api

Length of output: 828


🏁 Script executed:

# Search for other asyncio.gather patterns in the codebase
rg "asyncio\.gather" src/ -A 5 -B 2

Repository: openml/server-api

Length of output: 6116


🏁 Script executed:

# Check how AsyncConnection is created/provided
rg "AsyncConnection" src/ --type py -B 2 -A 2

Repository: openml/server-api

Length of output: 33637


🏁 Script executed:

# Check the database module initialization
head -50 src/database/__init__.py

Repository: openml/server-api

Length of output: 109


🏁 Script executed:

# Look at how expdb is obtained in the router
grep -n "expdb" src/routers/openml/runs.py | head -20

Repository: openml/server-api

Length of output: 1193


Don't run concurrent queries on the same AsyncConnection.

Eight of the nine asyncio.gather tasks share expdb, causing overlapping statements on one SQLAlchemy connection. AsyncConnection is stateful and designed for sequential use only. Sharing it across concurrent tasks causes undefined behavior and race conditions. Per SQLAlchemy docs: https://docs.sqlalchemy.org/en/20/orm/session_basics.html#is-the-session-thread-safe-is-asyncsession-safe-to-share-in-concurrent-tasks

Use sequential queries on the shared connection, or acquire separate connections per concurrent task from the engine.

Sequential execution for shared connection
-        await asyncio.gather(
-            database.runs.get_uploader_name(run.uploader, userdb),
-            database.runs.get_tags(run_id, expdb),
-            database.runs.get_input_data(run_id, expdb),
-            database.runs.get_output_files(run_id, expdb),
-            database.runs.get_evaluations(run_id, expdb, evaluation_engine_ids=engine_ids),
-            database.runs.get_task_type(run.task_id, expdb),
-            database.runs.get_task_evaluation_measure(run.task_id, expdb),
-            database.setups.get(run.setup, expdb),
-            database.setups.get_parameters(run.setup, expdb),
-        ),
+        (
+            await database.runs.get_uploader_name(run.uploader, userdb),
+            await database.runs.get_tags(run_id, expdb),
+            await database.runs.get_input_data(run_id, expdb),
+            await database.runs.get_output_files(run_id, expdb),
+            await database.runs.get_evaluations(run_id, expdb, evaluation_engine_ids=engine_ids),
+            await database.runs.get_task_type(run.task_id, expdb),
+            await database.runs.get_task_evaluation_measure(run.task_id, expdb),
+            await database.setups.get(run.setup, expdb),
+            await database.setups.get_parameters(run.setup, expdb),
+        ),
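The hazard can be demonstrated without a real database: a toy connection that flags overlapping use behaves like a stateful AsyncConnection (this only illustrates the failure mode; SQLAlchemy's and the drivers' internals differ):

```python
import asyncio

class ToyConnection:
    """Stateful connection: raises if two coroutines execute on it at once."""

    def __init__(self) -> None:
        self._busy = False

    async def execute(self, query: str) -> str:
        if self._busy:
            raise RuntimeError("another operation is in progress")
        self._busy = True
        await asyncio.sleep(0.01)  # simulate the network round-trip
        self._busy = False
        return f"rows for {query}"

async def main() -> str:
    conn = ToyConnection()
    # Sequential awaits on one connection are fine:
    for q in ("tags", "files"):
        await conn.execute(q)
    # Concurrent use of the same connection trips the guard:
    try:
        await asyncio.gather(conn.execute("tags"), conn.execute("files"))
    except RuntimeError as err:
        return str(err)
    return "no error"

outcome = asyncio.run(main())
```

The error string deliberately mirrors the `InterfaceError: another operation is in progress` that async drivers can raise when a connection is shared.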
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/runs.py` around lines 100 - 109, The asyncio.gather is
launching multiple database.* calls that all share the same AsyncConnection
object (expdb), which is unsafe; change the code so calls that use the shared
expdb run sequentially or each acquires its own connection from the engine.
Concretely, replace the concurrent gather of database.runs.get_uploader_name,
get_tags, get_input_data, get_output_files, get_evaluations (with
evaluation_engine_ids=engine_ids), get_task_type, get_task_evaluation_measure,
database.setups.get, and database.setups.get_parameters — all currently passed
the shared expdb — with either (a) awaiting them one-by-one using await
database.runs.get_tags(..., expdb) etc., or (b) modify the call sites to obtain
a fresh connection/transaction for each concurrent task (e.g., acquire a new
async connection from the engine and pass that connection instead of expdb) so
no two coroutines share the same AsyncConnection.

Contributor Author

In theory this is a valid point, but since all of the operations here are reads, I feel no such issue arises for our use case, and the pattern is consistent with the codebase (see https://github.com/openml/server-api/blob/main/src/routers/openml/datasets.py for reference).

Another point to note: we could acquire separate connections from the pool (as sourcery-ai suggested), but I feel that is too much overhead and complexity for our use case.


@saathviksheerla, thank you for the context and the pointer to datasets.py.

A few clarifications to sharpen the discussion:

  1. The SQLAlchemy warning covers AsyncConnection too, not just AsyncSession. The asyncpg database connection itself, acquired from SQLAlchemy's pool, is not concurrency safe — using it in two concurrent awaitables produces an error. The same principle extends to AsyncConnection since it wraps the underlying DBAPI connection.

  2. Read vs. write does not make concurrent use safe. SQLAlchemy 2.0 introduced proactive detection of concurrent methods invoked on a single Session/AsyncSession — these typically occur when a single instance is shared among concurrent tasks (e.g., asyncio.gather()). Without this warning system, the pattern still produces invalid state within the objects, including driver-level errors on the database connections themselves. The state machine in the connection object does not distinguish reads from writes.

  3. The practical risk depends on the driver. aiomysql (which this project likely uses for MySQL) may be more lenient in practice than asyncpg, which is why you may not be seeing failures locally. That said, this is relying on undocumented driver tolerance, not a supported guarantee.

  4. Codebase consistency is a fair point. If datasets.py uses the same asyncio.gather pattern with a shared AsyncConnection and it has been stable in production, then changing only this endpoint in isolation would be inconsistent. If this is an accepted trade-off across the codebase, that's a maintainer call to make uniformly.

Given that this is a consistent established pattern in the codebase and not an isolated deviation, I'll defer to your judgment here. You may want to track this as a future codebase-wide improvement if you ever see InterfaceError: another operation is in progress in production logs.


🧠 Learnings used
Learnt from: saathviksheerla
Repo: openml/server-api PR: 267
File: src/routers/openml/runs.py:0-0
Timestamp: 2026-03-08T07:45:07.969Z
Learning: In the openml/server-api Python codebase, non-boolean ENUM values stored as strings coming from the database should be treated as their literal string values, not coerced via truthiness checks. Ensure that fields like row.selected (with MySQL ENUM('true','false')) are passed through as the exact string (e.g., 'true' or 'false') and validated by Pydantic using Literal['true', 'false'] at the schema boundary. Do not apply a ternary like 'true' if row.selected else 'false'. Review places where the value is transformed before response: rely on the string value and let Pydantic enforce the literal type, and adjust any serialization if necessary to avoid implicit boolean conversion.

Learnt from: saathviksheerla
Repo: openml/server-api PR: 267
File: src/routers/openml/runs.py:45-51
Timestamp: 2026-03-08T07:59:51.515Z
Learning: In Python code using SQLAlchemy, VARCHAR/text columns are returned as Python strings. Avoid wrapping such values with str(...) when constructing objects (e.g., TraceIteration). Treat string-type columns as strings and pass them through directly unless a transformation is actually required. This guidance applies to similar string columns across Python files in the repository (and can be considered for enum-like values that are exposed as strings).

Learnt from: saathviksheerla
Repo: openml/server-api PR: 277
File: src/routers/openml/tasks.py:218-233
Timestamp: 2026-03-17T12:52:03.020Z
Learning: In the openml/server-api repo, the GET+POST list endpoints reuse a single handler with both router.post and router.get and declare all filter parameters with Body(...). Do not flag Body(...) parameters on GET handlers in this repository. This pattern applies to files under src/routers/openml/ (e.g., list tasks in tasks.py and list datasets in datasets.py).

@saathviksheerla saathviksheerla marked this pull request as draft April 22, 2026 06:26
@saathviksheerla saathviksheerla marked this pull request as ready for review April 22, 2026 10:19

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 2 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="tests/routers/openml/runs_get_test.py" line_range="82-91" />
<code_context>
+    assert expected_keys <= run.keys(), f"Missing keys: {expected_keys - run.keys()}"
+
+
+async def test_get_run_known_values(py_api: httpx.AsyncClient) -> None:
+    """Run 24 returns the exact values confirmed against the DB."""
+    response = await py_api.get(f"/run/{_RUN_ID}")
+    assert response.status_code == HTTPStatus.OK
+    run = response.json()
+
+    # Core identifiers
+    assert run["run_id"] == _RUN_ID
+    assert run["uploader"] == _RUN_UPLOADER_ID
+    assert run["uploader_name"] == "Cynthia Glover"
+    assert run["task_id"] == _RUN_TASK_ID
+    assert run["task_type"] == "Supervised Classification"
+    assert run["flow_id"] == _RUN_FLOW_ID
+    assert run["setup_id"] == _RUN_SETUP_ID
+    assert "Python_3.10.5" in run["setup_string"]
+
+    # Tags
+    assert "openml-python" in run["tag"]
+
+    # Error — NULL in DB -> empty list
+    assert run["error"] == []
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case for non-empty `error_message` to verify correct serialization to `error` list

Right now you only cover the case where `error_message` is NULL and serialized as an empty `error` list. Please also add a test where `error_message` is non-null and assert that the response exposes it as `error: [<message>]`, either by using a known failing run in the test DB or by patching `database.runs.get` to return a row with `error_message="Some error"`. This will help catch regressions in how `error` is serialized.

Suggested implementation:

```python
from sqlalchemy.ext.asyncio import AsyncConnection


async def test_get_run_non_empty_error(py_api: httpx.AsyncClient) -> None:
    """A run with a non-null error_message is serialized as a single-item error list."""
    response = await py_api.get(f"/run/{_RUN_WITH_ERROR_ID}")
    assert response.status_code == HTTPStatus.OK
    run = response.json()

    assert run["error"] == [_RUN_ERROR_MESSAGE]

```

To make this test work, you also need to:
1. Define `_RUN_WITH_ERROR_ID` and `_RUN_ERROR_MESSAGE` near the other `_RUN_*` constants in this test module (e.g. alongside `_RUN_ID`, `_RUN_TASK_ID`, etc.), using a run in your test DB that has a non-null `error_message`. For example:
   - `_RUN_WITH_ERROR_ID = 42`
   - `_RUN_ERROR_MESSAGE = "Some error from the backend"`
2. Ensure that, in your test database/fixtures, the run with ID `_RUN_WITH_ERROR_ID` has `error_message = _RUN_ERROR_MESSAGE` so that the API serializes it as `error: [_RUN_ERROR_MESSAGE]`.
3. If your API normalizes/changes error messages (e.g. prefixes, trimming), adjust `_RUN_ERROR_MESSAGE` and/or the assertion accordingly (for example, use `in` or `startswith` instead of exact equality).
</issue_to_address>

### Comment 2
<location path="tests/routers/openml/runs_get_test.py" line_range="413-422" />
<code_context>
+    assert not differences, f"Differences for run {run_id}: {differences}"
+
+
+def test_build_evaluations_coverage() -> None:
+    """Ensure _build_evaluations string-normalization branches are covered."""
+
+    class MockRow:
+        def __init__(self, name: str, value: object, array_data: str | None = None) -> None:
+            self.name = name
+            self.value = value
+            self.array_data = array_data
+
+    rows = [
+        MockRow("float_val", 1.0),
+        MockRow("str_float", "2.0"),
+        MockRow("str_text", "not_a_number"),
+        MockRow("unhandled_type", ["list"]),
+    ]
+    evals = _build_evaluations(rows)
+
+    values = {e.name: e.value for e in evals}
+    expected_one = 1
+    expected_two = 2
+    assert values["float_val"] == expected_one
+    assert values["str_float"] == expected_two
+    assert values["str_text"] is None
+    assert values["unhandled_type"] is None
</code_context>
<issue_to_address>
**suggestion (testing):** Extend `_build_evaluations` coverage to include non-integer float strings

This test currently covers integer-like floats/strings, non-numeric strings, and an unsupported type. Please also add a fractional numeric string (e.g. `MockRow("str_float_fraction", "1.5")`) and/or a non-integer float (e.g. `1.5`) and assert that `_normalise_value` preserves these as floats (e.g. `values["str_float_fraction"] == 1.5`). This will verify that non-integer numeric values are not coerced to ints or nulled.
</issue_to_address>


Comment on lines +82 to +91
async def test_get_run_known_values(py_api: httpx.AsyncClient) -> None:
    """Run 24 returns the exact values confirmed against the DB."""
    response = await py_api.get(f"/run/{_RUN_ID}")
    assert response.status_code == HTTPStatus.OK
    run = response.json()

    # Core identifiers
    assert run["run_id"] == _RUN_ID
    assert run["uploader"] == _RUN_UPLOADER_ID
    assert run["uploader_name"] == "Cynthia Glover"
suggestion (testing): Add a test case for non-empty error_message to verify correct serialization to error list

Right now you only cover the case where error_message is NULL and serialized as an empty error list. Please also add a test where error_message is non-null and assert that the response exposes it as error: [<message>], either by using a known failing run in the test DB or by patching database.runs.get to return a row with error_message="Some error". This will help catch regressions in how error is serialized.

Suggested implementation:

from sqlalchemy.ext.asyncio import AsyncConnection


async def test_get_run_non_empty_error(py_api: httpx.AsyncClient) -> None:
    """A run with a non-null error_message is serialized as a single-item error list."""
    response = await py_api.get(f"/run/{_RUN_WITH_ERROR_ID}")
    assert response.status_code == HTTPStatus.OK
    run = response.json()

    assert run["error"] == [_RUN_ERROR_MESSAGE]

To make this test work, you also need to:

  1. Define _RUN_WITH_ERROR_ID and _RUN_ERROR_MESSAGE near the other _RUN_* constants in this test module (e.g. alongside _RUN_ID, _RUN_TASK_ID, etc.), using a run in your test DB that has a non-null error_message. For example:
    • _RUN_WITH_ERROR_ID = 42
    • _RUN_ERROR_MESSAGE = "Some error from the backend"
  2. Ensure that, in your test database/fixtures, the run with ID _RUN_WITH_ERROR_ID has error_message = _RUN_ERROR_MESSAGE so that the API serializes it as error: [_RUN_ERROR_MESSAGE].
  3. If your API normalizes/changes error messages (e.g. prefixes, trimming), adjust _RUN_ERROR_MESSAGE and/or the assertion accordingly (for example, use in or startswith instead of exact equality).

Comment on lines +413 to +422
def test_build_evaluations_coverage() -> None:
    """Ensure _build_evaluations string-normalization branches are covered."""

    class MockRow:
        def __init__(self, name: str, value: object, array_data: str | None = None) -> None:
            self.name = name
            self.value = value
            self.array_data = array_data

    rows = [

suggestion (testing): Extend _build_evaluations coverage to include non-integer float strings

This test currently covers integer-like floats/strings, non-numeric strings, and an unsupported type. Please also add a fractional numeric string (e.g. MockRow("str_float_fraction", "1.5")) and/or a non-integer float (e.g. 1.5) and assert that _normalise_value preserves these as floats (e.g. values["str_float_fraction"] == 1.5). This will verify that non-integer numeric values are not coerced to ints or nulled.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/routers/openml/runs_get_test.py (3)

297-306: Use the existing _RUN_TASK_ID constant.

Both DB tests hardcode task id 115, which is already defined as _RUN_TASK_ID at line 23. Using the constant keeps the fixture data in one place.

🧹 Suggested cleanup
 async def test_db_get_task_type(expdb_test: AsyncConnection) -> None:
     """database.runs.get_task_type returns 'Supervised Classification' for task 115."""
-    task_type = await database.runs.get_task_type(115, expdb_test)
+    task_type = await database.runs.get_task_type(_RUN_TASK_ID, expdb_test)
     assert task_type == "Supervised Classification"


 async def test_db_get_task_evaluation_measure_missing(expdb_test: AsyncConnection) -> None:
     """get_task_evaluation_measure returns None (not '') when absent."""
-    measure = await database.runs.get_task_evaluation_measure(115, expdb_test)
+    measure = await database.runs.get_task_evaluation_measure(_RUN_TASK_ID, expdb_test)
     assert measure is None

Similar cleanup applies to test_db_get_uploader_name (line 311: 1159 vs _RUN_UPLOADER_ID).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/runs_get_test.py` around lines 297 - 306, Tests hardcode
task IDs instead of using the existing _RUN_TASK_ID constant; update the two
tests test_db_get_task_type and test_db_get_task_evaluation_measure_missing to
replace the literal 115 with the shared constant _RUN_TASK_ID, and similarly
replace the hardcoded uploader id in test_db_get_uploader_name with
_RUN_UPLOADER_ID so database.runs.get_task_type,
database.runs.get_task_evaluation_measure and the uploader-name test use the
canonical fixtures.

315-318: Reusing _MISSING_RUN_ID as a user ID is semantically misleading.

_MISSING_RUN_ID represents a non-existent run, not a non-existent user. The value happens to work because no user with id 999_999_999 exists either, but a reader is momentarily tripped up. Consider introducing a dedicated constant, e.g. _MISSING_USER_ID = 999_999_999, or inlining the literal with a comment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/runs_get_test.py` around lines 315 - 318, The test
test_db_get_uploader_name_missing is using _MISSING_RUN_ID (which denotes a
nonexistent run) as a user id, which is semantically confusing; update the test
to use a dedicated missing-user identifier by adding a new constant
_MISSING_USER_ID = 999_999_999 (or replace the call with the literal 999_999_999
plus a comment) and call database.runs.get_uploader_name(_MISSING_USER_ID,
user_test) so the intent is clear; ensure the new symbol is defined near other
test constants and referenced in test_db_get_uploader_name_missing and any
related tests.

380-398: Comment vs. filter logic disagree — please clarify/fix.

The comment states PHP's aggregate row has repeat="0" and fold="0", but the filter at line 391-393 drops every entry that has a repeat or fold key. If the aggregate truly carries those keys, this filter removes it too and leaves an empty list, which should fail DeepDiff against the Python response (which keeps the aggregate). Since the tests reportedly pass, the aggregate likely lacks repeat/fold entirely — in which case the comment is misleading.

Please reconcile the comment with actual PHP output (e.g., capture one sample response and describe what distinguishes per-fold vs. aggregate entries). Also consider replacing the unreachable elif branch at line 394-398 with a simpler invariant check; the inline rationale there is self-contradictory ("PHP returns a list anyway if duplicates exist. If they don't, it's a dict.") and hard to follow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/runs_get_test.py` around lines 380 - 398, Update the
misleading comment and fix the evaluation-filter logic around
php_json["run"]["output_data"]["evaluation"]: clarify whether PHP's aggregate
row includes repeat/fold (edit the comment to match actual sample output), and
change the branch handling so that if evaluation is a list you remove only
per-fold entries (e.g., those with repeat/fold present and not equal to "0"),
and if evaluation is a dict then either wrap it as the single aggregate element
(when repeat/fold are absent or equal to "0") or set it to [] when it represents
a non-aggregate entry; replace the confusing elif block with this explicit check
and conversion to ensure the aggregate is preserved and the behavior matches the
comment.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fb7e0c97-fbbe-4d78-b109-8761ecbb61aa

📥 Commits

Reviewing files that changed from the base of the PR and between 4ca818c and 4461c63.

📒 Files selected for processing (2)
  • src/config.py
  • tests/routers/openml/runs_get_test.py
✅ Files skipped from review due to trivial changes (1)
  • src/config.py

Comment on lines +291 to +294
async def test_db_get_evaluations_empty_engine_list(expdb_test: AsyncConnection) -> None:
    """get_evaluations with no engine IDs returns an empty list (not an error)."""
    rows = await database.runs.get_evaluations(_RUN_ID, expdb_test, evaluation_engine_ids=[])
    assert isinstance(rows, list)

⚠️ Potential issue | 🟡 Minor

Assertion should verify the list is empty, not just that it's a list.

The docstring says "returns an empty list (not an error)", but the current assertion only checks isinstance(rows, list). The non-empty-list case already passes in the previous test, so explicitly assert emptiness here:

🔧 Suggested fix
 async def test_db_get_evaluations_empty_engine_list(expdb_test: AsyncConnection) -> None:
     """get_evaluations with no engine IDs returns an empty list (not an error)."""
     rows = await database.runs.get_evaluations(_RUN_ID, expdb_test, evaluation_engine_ids=[])
-    assert isinstance(rows, list)
+    assert rows == []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/runs_get_test.py` around lines 291 - 294, The test
test_db_get_evaluations_empty_engine_list currently only asserts type(list) but
should assert that get_evaluations returns an empty list; update the assertion
after calling database.runs.get_evaluations(_RUN_ID, expdb_test,
evaluation_engine_ids=[]) to assert rows == [] (or len(rows) == 0) so the test
verifies emptiness rather than just type.

Development

Successfully merging this pull request may close these issues.

GET /run/{id}

1 participant