feat: implement GET /run/{id} endpoints (#37) #319
saathviksheerla wants to merge 3 commits into openml:main from
Conversation
Walkthrough: Adds GET and POST endpoints at /run/{run_id} returning full run metadata. Introduces new DB helpers in src/database/runs.py to fetch runs, uploader name, tags, input datasets, output files, evaluations (filtered by configured evaluation_engine_ids), task type, and task evaluation measure. Adds src/config.load_run_configuration and a
🚥 Pre-merge checks | ✅ 4 | ❌ 1 — ❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #319 +/- ##
==========================================
- Coverage 93.53% 89.37% -4.16%
==========================================
Files 67 68 +1
Lines 3109 3418 +309
Branches 221 232 +11
==========================================
+ Hits 2908 3055 +147
- Misses 143 312 +169
+ Partials 58 51 -7
☔ View full report in Codecov by Sentry.
Hey - I've found 4 issues, and left some high level feedback:
- The concurrent `asyncio.gather` calls for multiple queries all reuse the same `expdb`/`userdb` `AsyncConnection`; many async DB drivers do not permit overlapping operations on a single connection, so consider either using a pool (distinct connections) or running these queries sequentially if the driver cannot safely multiplex.
- In `get_uploader_name`, `CONCAT(first_name, ' ', last_name)` will produce awkward values when either component is NULL; using `CONCAT_WS(' ', first_name, last_name)` (or equivalent COALESCE logic) would give cleaner uploader_display values and avoid trailing/leading spaces or "None"-like artifacts.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The concurrent `asyncio.gather` calls for multiple queries all reuse the same `expdb`/`userdb` `AsyncConnection`; many async DB drivers do not permit overlapping operations on a single connection, so consider either using a pool (distinct connections) or running these queries sequentially if the driver cannot safely multiplex.
- In `get_uploader_name`, `CONCAT(first_name, ' ', last_name)` will produce awkward values when either component is NULL; using `CONCAT_WS(' ', first_name, last_name)` (or equivalent COALESCE logic) would give cleaner uploader_display values and avoid trailing/leading spaces or "None"-like artifacts.
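The NULL-propagation issue described above can be reproduced with a stdlib-only sketch — sqlite3 stands in for MySQL here, and the table layout is invented for illustration (SQLite has no `CONCAT_WS`, so COALESCE-based logic emulates it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO users VALUES ('Ada', NULL)")

# CONCAT-style concatenation: a single NULL poisons the whole expression.
(concat_style,) = conn.execute(
    "SELECT first_name || ' ' || last_name FROM users"
).fetchone()

# CONCAT_WS-style emulation: COALESCE skips NULLs, TRIM drops the stray space.
(ws_style,) = conn.execute(
    "SELECT TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, '')) FROM users"
).fetchone()

print(concat_style)  # None
print(ws_style)      # Ada
```

The same shape applies in MySQL: `CONCAT` returns NULL if any argument is NULL, while `CONCAT_WS` ignores NULL arguments entirely.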
## Individual Comments
### Comment 1
<location path="src/routers/openml/runs.py" line_range="144-146" />
<code_context>
+ name=row.name,
+ # Whole-number floats (e.g. counts) are converted to int to match PHP's
+ # integer representation. e.g. 253.0 → 253, 0.0 → 0.
+ value=int(row.value)
+ if isinstance(row.value, float) and row.value.is_integer()
+ else row.value,
+ array_data=row.array_data,
+ )
</code_context>
<issue_to_address>
**suggestion:** Normalizing integer-like metric values only for floats may miss Decimal or string-encoded numerics.
This handles only `float` values; if the driver returns `Decimal` (or string numerics), whole-number values won’t be coerced to `int` and will differ from the PHP representation. Consider centralising this in a helper that casts any numeric with a zero fractional part to `int` (and maybe handles `Decimal` explicitly) if parity with PHP is required.
</issue_to_address>
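A sketch of the centralised helper this comment suggests — the function name and the exact coercion rules (which string forms to accept, how to treat non-numeric strings) are assumptions, not the PR's code:

```python
from decimal import Decimal, InvalidOperation


def normalize_metric_value(value: object) -> object:
    """Coerce any whole-number numeric (float, Decimal, numeric string) to int.

    Non-numeric and fractional values are returned unchanged, mirroring the
    PHP representation parity the review asks for.
    """
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, Decimal) and value == value.to_integral_value():
        return int(value)
    if isinstance(value, str):
        try:
            as_decimal = Decimal(value)
        except InvalidOperation:
            return value  # not numeric at all: leave untouched
        if as_decimal == as_decimal.to_integral_value():
            return int(as_decimal)
    return value


print(normalize_metric_value(253.0))           # 253
print(normalize_metric_value(Decimal("0.0")))  # 0
print(normalize_metric_value(0.847))           # 0.847
```

Routing every driver value through one helper like this keeps the int-coercion behaviour in a single place regardless of which numeric type the DB driver happens to return.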
### Comment 2
<location path="tests/routers/openml/runs_get_test.py" line_range="57-66" />
<code_context>
+ assert get_resp.json() == post_resp.json()
+
+
+async def test_get_run_top_level_shape(py_api: httpx.AsyncClient) -> None:
+ """Response contains all expected top-level keys."""
+ response = await py_api.get(f"/run/{_RUN_ID}")
+ run = response.json()
+ expected_keys = {
+ "run_id",
+ "uploader",
+ "uploader_name",
+ "task_id",
+ "task_type",
+ "flow_id",
+ "flow_name",
+ "setup_id",
+ "setup_string",
+ "parameter_setting",
+ "error",
+ "tag",
+ "input_data",
+ "output_data",
+ }
+ assert expected_keys <= run.keys(), f"Missing keys: {expected_keys - run.keys()}"
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add an explicit test that `task_evaluation_measure` is omitted from the JSON when null/empty.
The code relies on `task_evaluation_measure` being normalized to `None` and excluded via `response_model_exclude_none=True`, but we only assert that expected keys are present. Please add a test (e.g., using task 115) that explicitly checks `"task_evaluation_measure" not in response.json()` when no measure is configured, to lock in this behavior and guard against future schema or exclusion changes.
Suggested implementation:
```python
"""GET and POST /run/{id} return identical JSON bodies."""
get_resp, post_resp = await asyncio.gather(
py_api.get(f"/run/{_RUN_ID}"),
py_api.post(f"/run/{_RUN_ID}"),
)
assert get_resp.status_code == HTTPStatus.OK
assert post_resp.status_code == HTTPStatus.OK
assert get_resp.json() == post_resp.json()
async def test_task_evaluation_measure_omitted_when_null(py_api: httpx.AsyncClient) -> None:
"""task_evaluation_measure is not present in JSON when no measure is configured for the task."""
response = await py_api.get(f"/run/{_RUN_ID_NO_EVAL_MEASURE}")
run = response.json()
# The field should be omitted entirely when normalized to None and excluded via response_model_exclude_none=True
assert "task_evaluation_measure" not in run
import asyncio
```
1. Define `_RUN_ID_NO_EVAL_MEASURE` in this test module (or reuse an existing constant) and set it to the ID of a run whose task has no evaluation measure configured, for example a run on task 115 as noted in your comment.
2. Ensure the fixture or test data you use for `_RUN_ID_NO_EVAL_MEASURE` reflects a task with `task_evaluation_measure` normalized to `None`, so the test correctly locks in the exclusion behavior.
</issue_to_address>
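A minimal stdlib sketch of the exclusion semantics such a test would lock in — the dict stands in for the pydantic model, and the field values are illustrative: FastAPI's `response_model_exclude_none=True` drops every field whose value is None, so a `task_evaluation_measure` normalized to None never reaches the JSON.

```python
# Illustrative payload; in the real endpoint this is the Run model instance.
run_payload = {
    "run_id": 24,
    "task_id": 115,
    "task_evaluation_measure": None,  # normalized: no measure configured
}

# What response_model_exclude_none=True effectively does at serialization time.
serialized = {k: v for k, v in run_payload.items() if v is not None}

print(serialized)  # {'run_id': 24, 'task_id': 115}
assert "task_evaluation_measure" not in serialized
```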
### Comment 3
<location path="src/routers/openml/runs.py" line_range="68" />
<code_context>
+ response_model_exclude_none=True,
+)
+@router.get("/{run_id}", response_model_exclude_none=True)
+async def get_run(
+ run_id: int,
+ expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the async context loading and result-mapping logic from `get_run` into small helper functions to make the endpoint body more linear and readable.
You can reduce the cognitive load of `get_run` by extracting the “context loading” and the “mapping” into focused helpers. This keeps behaviour identical but removes the big `gather` tuple and long inline mapping blocks.
For example, pull the `asyncio.gather` and associated types into a small helper:
```python
from dataclasses import dataclass
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from sqlalchemy import Row
from sqlalchemy.ext.asyncio import AsyncConnection
@dataclass
class RunContext:
uploader_name: str | None
tags: list[str]
input_data_rows: list[Row]
output_file_rows: list[Row]
evaluation_rows: list[Row]
task_type: str | None
task_evaluation_measure: str | None
setup: Row | None
parameter_rows: list[Row]
async def _load_run_context(
run: "Row",
run_id: int,
expdb: AsyncConnection,
userdb: AsyncConnection,
engine_ids: list[int],
) -> RunContext:
(
uploader_name,
tags,
input_data_rows,
output_file_rows,
evaluation_rows,
task_type,
task_evaluation_measure,
setup,
parameter_rows,
) = await asyncio.gather(
database.runs.get_uploader_name(run.uploader, userdb),
database.runs.get_tags(run_id, expdb),
database.runs.get_input_data(run_id, expdb),
database.runs.get_output_files(run_id, expdb),
database.runs.get_evaluations(run_id, expdb, evaluation_engine_ids=engine_ids),
database.runs.get_task_type(run.task_id, expdb),
database.runs.get_task_evaluation_measure(run.task_id, expdb),
database.setups.get(run.setup, expdb),
database.setups.get_parameters(run.setup, expdb),
)
return RunContext(
uploader_name=uploader_name,
tags=tags,
input_data_rows=input_data_rows,
output_file_rows=output_file_rows,
evaluation_rows=evaluation_rows,
task_type=task_type,
task_evaluation_measure=task_evaluation_measure,
setup=setup,
parameter_rows=parameter_rows,
)
```
Then the endpoint body can become more linear and self-describing:
```python
async def get_run(...):
run = await database.runs.get(run_id, expdb)
if run is None:
raise RunNotFoundError(f"Run {run_id} not found.", code=236)
engine_ids: list[int] = config.load_run_configuration().get(
"evaluation_engine_ids", [1]
)
ctx = await _load_run_context(run, run_id, expdb, userdb, engine_ids)
flow = await database.flows.get(ctx.setup.implementation_id, expdb) if ctx.setup else None
parameter_settings = _build_parameter_settings(ctx.parameter_rows)
input_datasets = _build_input_datasets(ctx.input_data_rows)
output_files = _build_output_files(ctx.output_file_rows)
evaluations = _build_evaluations(ctx.evaluation_rows)
normalised_measure = ctx.task_evaluation_measure or None
error_messages = [run.error_message] if run.error_message else []
return Run(
run_id=run_id,
uploader=run.uploader,
uploader_name=ctx.uploader_name,
task_id=run.task_id,
task_type=ctx.task_type,
task_evaluation_measure=normalised_measure,
flow_id=ctx.setup.implementation_id if ctx.setup else 0,
flow_name=flow.full_name if flow else None,
setup_id=run.setup,
setup_string=ctx.setup.setup_string if ctx.setup else None,
parameter_setting=parameter_settings,
error_message=error_messages,
tag=ctx.tags,
input_data={"dataset": input_datasets},
output_data={"file": output_files, "evaluation": evaluations},
)
```
The mapping helpers stay small and local, but hide the detail from the main endpoint:
```python
def _build_parameter_settings(parameter_rows: list["Row"]) -> list[ParameterSetting]:
return [
ParameterSetting(
name=p["name"],
value=p["value"],
component=p["flow_id"],
)
for p in parameter_rows
]
def _build_input_datasets(rows: list["Row"]) -> list[InputDataset]:
return [InputDataset(did=row.did, name=row.name, url=row.url) for row in rows]
def _build_output_files(rows: list["Row"]) -> list[OutputFile]:
return [OutputFile(file_id=row.file_id, name=row.field) for row in rows]
def _build_evaluations(rows: list["Row"]) -> list[EvaluationScore]:
def _normalise_value(v: object) -> object:
return int(v) if isinstance(v, float) and v.is_integer() else v
return [
EvaluationScore(
name=row.name,
value=_normalise_value(row.value),
array_data=row.array_data,
)
for row in rows
]
```
These extractions keep all current behaviour (including the integer conversion, error message list, and envelope shapes) but make `get_run` much easier to read and change later.
</issue_to_address>
### Comment 4
<location path="src/schemas/runs.py" line_range="108" />
<code_context>
+ # At the Python level we keep the name error_message for clarity.
+ error_message: list[str] = Field(serialization_alias="error") # [] when NULL in DB
+ tag: list[str]
+ input_data: dict[str, list[InputDataset]] # {"dataset": [...]}
+ output_data: dict[
+ str,
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the dict-typed input_data/output_data fields and union with small wrapper models and, optionally, simplifying the error_message aliasing to make the schema easier to understand and use.
You can reduce the mental overhead around `input_data` / `output_data` and the union by introducing small wrapper models while preserving the exact JSON shape.
### 1. Replace `dict[str, ...]` with explicit wrapper models
Current:
```python
class Run(BaseModel):
...
input_data: dict[str, list[InputDataset]] # {"dataset": [...]}
output_data: dict[
str,
list[OutputFile | EvaluationScore],
] # {"file": [...], "evaluation": [...]}
```
Suggested:
```python
class InputData(BaseModel):
dataset: list[InputDataset]
class OutputData(BaseModel):
file: list[OutputFile]
evaluation: list[EvaluationScore]
class Run(BaseModel):
...
input_data: InputData
output_data: OutputData
```
Pydantic will still serialize to:
```json
{
"input_data": {
"dataset": [ ... ]
},
"output_data": {
"file": [ ... ],
"evaluation": [ ... ]
}
}
```
This removes:
- The need to remember magic keys (`"dataset"`, `"file"`, `"evaluation"`).
- The `OutputFile | EvaluationScore` union, since each field has a single concrete type.
Callers now get proper IDE/type support (`run.output_data.file`, `run.output_data.evaluation`) and clearer validation/error messages.
### 2. Optional: simplify the `error_message` aliasing
If you don’t need to *accept* `"error"` on input (only *emit* it on output), you can avoid `populate_by_name=True` + `serialization_alias` and let the response layer opt into alias usage:
```python
class Run(BaseModel):
...
error_message: list[str] = Field(alias="error")
```
Then, when returning responses:
```python
return Run(...).model_dump(by_alias=True)
```
This:
- Keeps the Python attribute as `error_message`.
- Uses `"error"` in the JSON.
- Avoids model-level `populate_by_name=True` unless you rely on it elsewhere.
If you do need both `"error"` and `error_message` as input, keeping `serialization_alias` is fine, but you could still isolate that pattern in a tiny boundary schema (e.g. `RunResponse`) that just wraps `Run` and handles aliases.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Actionable comments posted: 3
🧹 Nitpick comments (2)
tests/routers/openml/runs_get_test.py (1)
322-346: Consider asserting the PHP error envelope shape before indexing. Line 343 indexes `php_response.json()["error"]["code"]` unconditionally in the non-OK branch. If PHP ever returns a different body (timeout, HTML, alternate error shape), the test fails with an opaque `KeyError`/`TypeError` rather than a clear parity diff. A small guard or `.get(...)` chain would make regressions easier to triage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/runs_get_test.py` around lines 322 - 346, In test_get_run_equal update the non-OK branch in test_get_run_equal to first validate the PHP error envelope before indexing php_response.json()["error"]["code"]; e.g., assert that php_response.json() is a dict and contains "error" mapping with a "code" key (or use safe .get chaining and explicit assertions) so that accessing php_response.json()["error"]["code"] cannot raise KeyError/TypeError and failures produce a clear assertion message comparing php_code and py_code (refer to php_response, py_response, and test_get_run_equal to locate the change).
src/database/runs.py (1)
138-158: Use `bindparam(..., expanding=True)` for consistency with the rest of the codebase. The codebase already uses SQLAlchemy's `bindparam(expanding=True)` successfully in `src/routers/openml/tasks.py` and `datasets.py` with aiomysql (verified by passing tests like `test_list_tasks_api_happy_path` and `test_list_tasks_filter_task_id`). Adopting this pattern here eliminates manual string interpolation and the security noqa comment while maintaining compatibility.
♻️ Suggested refactor
```diff
-    if not evaluation_engine_ids:
-        return []
-
-    # Build :eid_0, :eid_1, ... placeholders — one per engine ID.
-    eid_params = {f"eid_{i}": eid for i, eid in enumerate(evaluation_engine_ids)}
-    placeholders = ", ".join(f":eid_{i}" for i in range(len(evaluation_engine_ids)))
-
-    query = text(
-        f"""
-        SELECT `m`.`name`, `e`.`value`, `e`.`array_data`
-        FROM `evaluation` `e`
-        JOIN `math_function` `m` ON `e`.`function_id` = `m`.`id`
-        WHERE `e`.`source` = :run_id
-          AND `e`.`evaluation_engine_id` IN ({placeholders})
-        """,  # noqa: S608  # placeholders are trusted integer params, not user input
-    )
-    rows = await expdb.execute(
-        query,
-        parameters={"run_id": run_id, **eid_params},
-    )
+    if not evaluation_engine_ids:
+        return []
+
+    query = text(
+        """
+        SELECT `m`.`name`, `e`.`value`, `e`.`array_data`
+        FROM `evaluation` `e`
+        JOIN `math_function` `m` ON `e`.`function_id` = `m`.`id`
+        WHERE `e`.`source` = :run_id
+          AND `e`.`evaluation_engine_id` IN :engine_ids
+        """,
+    ).bindparams(bindparam("engine_ids", expanding=True))
+    rows = await expdb.execute(
+        query,
+        parameters={"run_id": run_id, "engine_ids": evaluation_engine_ids},
+    )
```
from sqlalchemy import bindparam.)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database/runs.py` around lines 138 - 158, Replace the manual per-ID placeholders (eid_params/placeholders) with SQLAlchemy's expanding bind param: import bindparam and change the WHERE clause to use IN (:eids) (or similar name) combined with bindparam("eids", expanding=True), then pass the list evaluation_engine_ids as the "eids" parameter in the expdb.execute call (alongside run_id); remove the handcrafted eid_params, the placeholders string, and the S608 noqa comment so the query uses bindparam(expanding=True) consistently with the rest of the codebase (refer to symbols evaluation_engine_ids, eid_params, placeholders, query, and expdb.execute).
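For contrast, here is a self-contained sketch of the manual-placeholder strategy discussed above, using stdlib sqlite3 rather than the PR's SQLAlchemy/MySQL stack — the table layout and values are invented for illustration. The key point is that only the placeholders are string-built; every value is still bound as a parameter:

```python
import sqlite3

engine_ids = [1, 3]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evaluation (source INTEGER, engine_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO evaluation VALUES (?, ?, ?)",
    [(24, 1, 0.9), (24, 2, 0.8), (24, 3, 0.7)],
)

# One "?" per engine ID; safe because the placeholders are generated by us
# and the values themselves are bound, never interpolated into the SQL text.
placeholders = ", ".join("?" for _ in engine_ids)
query = f"SELECT value FROM evaluation WHERE source = ? AND engine_id IN ({placeholders})"
rows = conn.execute(query, (24, *engine_ids)).fetchall()

print(rows)  # [(0.9,), (0.7,)]
```

SQLAlchemy's `bindparam(..., expanding=True)` generates exactly this placeholder expansion for you at execution time, which is why the refactor can drop the manual string building and the `noqa: S608` comment.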
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/database/runs.py`:
- Around line 50-61: The SELECT using CONCAT in the query executed via
userdb.execute can return NULL if either first_name or last_name is NULL; change
the SQL to use CONCAT_WS(' ', `first_name`, `last_name`) (or wrap columns with
COALESCE) so a single populated name still yields a non-NULL `name` for the
uploader_id; update the query string in the function that builds the text(...)
passed to userdb.execute (the block referencing parameters={"uploader_id":
uploader_id}) accordingly.
In `@src/routers/openml/runs.py`:
- Around line 161-178: The Run constructor currently forces
flow_id=setup.implementation_id if setup else 0 which emits a synthetic 0;
change it to pass None when setup is missing (i.e.
flow_id=setup.implementation_id if setup else None and setup_id=run.setup if
run.setup is not None else None) so the schema-typed int | None can omit fields
with response_model_exclude_none=True; additionally, avoid unnecessary DB calls
by short-circuiting the gather that calls database.setups.* when run.setup is
None (skip the two database.setups.* calls in the concurrent gather when
run.setup is falsy).
In `@src/schemas/runs.py`:
- Around line 99-102: Change the schema fields flow_id and setup_id in the Runs
schema to be Optional[int] (int | None) instead of plain int so callers can pass
None instead of a magic 0; then update the OpenML runs router to stop using the
sentinel 0 and pass None when missing (replace flow_id=setup.implementation_id
if setup else 0 with flow_id=setup.implementation_id if setup else None, and
ensure setup_id is passed as None when run.setup is null). Also scan usages of
Runs/flow_id/setup_id to remove any logic that treats 0 as "missing" and rely on
None checks instead.
---
Nitpick comments:
In `@src/database/runs.py`:
- Around line 138-158: Replace the manual per-ID placeholders
(eid_params/placeholders) with SQLAlchemy's expanding bind param: import
bindparam and change the WHERE clause to use IN (:eids) (or similar name)
combined with bindparam("eids", expanding=True), then pass the list
evaluation_engine_ids as the "eids" parameter in the expdb.execute call
(alongside run_id); remove the handcrafted eid_params, the placeholders string,
and the S608 noqa comment so the query uses bindparam(expanding=True)
consistently with the rest of the codebase (refer to symbols
evaluation_engine_ids, eid_params, placeholders, query, and expdb.execute).
In `@tests/routers/openml/runs_get_test.py`:
- Around line 322-346: In test_get_run_equal update the non-OK branch in
test_get_run_equal to first validate the PHP error envelope before indexing
php_response.json()["error"]["code"]; e.g., assert that php_response.json() is a
dict and contains "error" mapping with a "code" key (or use safe .get chaining
and explicit assertions) so that accessing php_response.json()["error"]["code"]
cannot raise KeyError/TypeError and failures produce a clear assertion message
comparing php_code and py_code (refer to php_response, py_response, and
test_get_run_equal to locate the change).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3b7d0bda-8760-40a3-972a-b4add43d2eef
📒 Files selected for processing (7)
src/config.py, src/config.toml, src/database/flows.py, src/database/runs.py, src/routers/openml/runs.py, src/schemas/runs.py, tests/routers/openml/runs_get_test.py
Hey - I've found 1 issue, and left some high level feedback:
- In `database/runs.get_evaluations` you use `bindparam` when building the query, but it is not imported in this module, so add the appropriate `from sqlalchemy import bindparam` import to avoid runtime errors.
- `load_run_configuration` re-reads the config file on every call, unlike the other cached config loaders; consider decorating it with `@functools.cache` for consistency and to avoid repeated disk I/O.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `database/runs.get_evaluations` you use `bindparam` when building the query, but it is not imported in this module, so add the appropriate `from sqlalchemy import bindparam` import to avoid runtime errors.
- `load_run_configuration` re-reads the config file on every call, unlike the other cached config loaders; consider decorating it with `@functools.cache` for consistency and to avoid repeated disk I/O.
## Individual Comments
### Comment 1
<location path="tests/routers/openml/runs_get_test.py" line_range="196-201" />
<code_context>
+ assert str(error.get("code")) == _RUN_NOT_FOUND_CODE
+
+
+async def test_task_evaluation_measure_omitted_when_null(py_api: httpx.AsyncClient) -> None:
+ """task_evaluation_measure is not present in JSON when no measure is configured."""
+ # Run 24 is known to not have a task evaluation measure (verified in db test)
+ response = await py_api.get(f"/run/{_RUN_ID}")
+ run = response.json()
+ assert "task_evaluation_measure" not in run
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a complementary test for runs where task_evaluation_measure is present and propagated.
To exercise both branches of `get_task_evaluation_measure` and the response serialization, please also add a test using a run with an evaluation measure configured in `task_inputs`, asserting that `task_evaluation_measure` is present in the JSON and matches the DB value.
Suggested implementation:
```python
async def test_get_run_not_found(py_api: httpx.AsyncClient) -> None:
    """Non-existent run returns 404 with error code 236 (PHP compat)."""
    response = await py_api.get(f"/run/{_MISSING_RUN_ID}")
    assert response.status_code == HTTPStatus.NOT_FOUND
    error = response.json()
    # Verify PHP-compat error code
    assert str(error.get("code")) == _RUN_NOT_FOUND_CODE


async def test_task_evaluation_measure_omitted_when_null(
    py_api: httpx.AsyncClient,
) -> None:
    """task_evaluation_measure is not present in JSON when no measure is configured."""
    # This run is known to not have a task evaluation measure (verified in db test)
    response = await py_api.get(f"/run/{_RUN_ID_WITHOUT_MEASURE}")
    run = response.json()
    assert "task_evaluation_measure" not in run


async def test_task_evaluation_measure_present_when_configured(
    py_api: httpx.AsyncClient,
) -> None:
    """task_evaluation_measure is present and matches DB when a measure is configured."""
    # This run is known to have a task evaluation measure configured in task_inputs
    response = await py_api.get(f"/run/{_RUN_ID_WITH_MEASURE}")
    assert response.status_code == HTTPStatus.OK
    run = response.json()
    assert "task_evaluation_measure" in run
    assert run["task_evaluation_measure"] == _EXPECTED_TASK_EVALUATION_MEASURE


import asyncio
from http import HTTPStatus
from typing import Any

import deepdiff
import httpx
import pytest
from sqlalchemy.ext.asyncio import AsyncConnection

import database.runs
```
To fully wire this up you will also need to:
1. Define `_RUN_ID_WITHOUT_MEASURE` (replacing the previous `_RUN_ID` used for the "omitted" test if necessary) to be a run id without a configured task evaluation measure, consistent with your test DB seed (comment previously mentioned “Run 24”).
2. Define `_RUN_ID_WITH_MEASURE` as a run id that has a `task_evaluation_measure` configured in `task_inputs` in your test database.
3. Define `_EXPECTED_TASK_EVALUATION_MEASURE` to match the expected serialized JSON value of the measure from the DB for `_RUN_ID_WITH_MEASURE` (e.g. a string like `"predictive_accuracy"` or a dict, depending on how it is stored/serialized).
4. Ensure these constants are placed alongside your other module-level test constants (e.g. near `_MISSING_RUN_ID`, `_RUN_NOT_FOUND_CODE`, etc.) so they can be reused across tests.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
```python
async def test_task_evaluation_measure_omitted_when_null(py_api: httpx.AsyncClient) -> None:
    """task_evaluation_measure is not present in JSON when no measure is configured."""
    # Run 24 is known to not have a task evaluation measure (verified in db test)
    response = await py_api.get(f"/run/{_RUN_ID}")
    run = response.json()
    assert "task_evaluation_measure" not in run
```
suggestion (testing): Add a complementary test for runs where task_evaluation_measure is present and propagated.
To exercise both branches of get_task_evaluation_measure and the response serialization, please also add a test using a run with an evaluation measure configured in task_inputs, asserting that task_evaluation_measure is present in the JSON and matches the DB value.
Suggested implementation:
```python
async def test_get_run_not_found(py_api: httpx.AsyncClient) -> None:
    """Non-existent run returns 404 with error code 236 (PHP compat)."""
    response = await py_api.get(f"/run/{_MISSING_RUN_ID}")
    assert response.status_code == HTTPStatus.NOT_FOUND
    error = response.json()
    # Verify PHP-compat error code
    assert str(error.get("code")) == _RUN_NOT_FOUND_CODE


async def test_task_evaluation_measure_omitted_when_null(
    py_api: httpx.AsyncClient,
) -> None:
    """task_evaluation_measure is not present in JSON when no measure is configured."""
    # This run is known to not have a task evaluation measure (verified in db test)
    response = await py_api.get(f"/run/{_RUN_ID_WITHOUT_MEASURE}")
    run = response.json()
    assert "task_evaluation_measure" not in run


async def test_task_evaluation_measure_present_when_configured(
    py_api: httpx.AsyncClient,
) -> None:
    """task_evaluation_measure is present and matches DB when a measure is configured."""
    # This run is known to have a task evaluation measure configured in task_inputs
    response = await py_api.get(f"/run/{_RUN_ID_WITH_MEASURE}")
    assert response.status_code == HTTPStatus.OK
    run = response.json()
    assert "task_evaluation_measure" in run
    assert run["task_evaluation_measure"] == _EXPECTED_TASK_EVALUATION_MEASURE
```

```python
import asyncio
from http import HTTPStatus
from typing import Any

import deepdiff
import httpx
import pytest
from sqlalchemy.ext.asyncio import AsyncConnection

import database.runs
```

To fully wire this up you will also need to:
1. Define `_RUN_ID_WITHOUT_MEASURE` (replacing the previous `_RUN_ID` used for the "omitted" test if necessary) to be a run id without a configured task evaluation measure, consistent with your test DB seed (comment previously mentioned "Run 24").
2. Define `_RUN_ID_WITH_MEASURE` as a run id that has a `task_evaluation_measure` configured in `task_inputs` in your test database.
3. Define `_EXPECTED_TASK_EVALUATION_MEASURE` to match the expected serialized JSON value of the measure from the DB for `_RUN_ID_WITH_MEASURE` (e.g. a string like `"predictive_accuracy"` or a dict, depending on how it is stored/serialized).
4. Ensure these constants are placed alongside your other module-level test constants (e.g. near `_MISSING_RUN_ID`, `_RUN_NOT_FOUND_CODE`, etc.) so they can be reused across tests.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/openml/runs.py`:
- Around line 100-109: The asyncio.gather is launching multiple database.* calls
that all share the same AsyncConnection object (expdb), which is unsafe; change
the code so calls that use the shared expdb run sequentially or each acquires
its own connection from the engine. Concretely, replace the concurrent gather of
database.runs.get_uploader_name, get_tags, get_input_data, get_output_files,
get_evaluations (with evaluation_engine_ids=engine_ids), get_task_type,
get_task_evaluation_measure, database.setups.get, and
database.setups.get_parameters — all currently passed the shared expdb — with
either (a) awaiting them one-by-one using await database.runs.get_tags(...,
expdb) etc., or (b) modify the call sites to obtain a fresh
connection/transaction for each concurrent task (e.g., acquire a new async
connection from the engine and pass that connection instead of expdb) so no two
coroutines share the same AsyncConnection.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4bc0f947-3947-4e10-bc3d-a819bccf4079
📒 Files selected for processing (4)
- src/database/runs.py
- src/routers/openml/runs.py
- src/schemas/runs.py
- tests/routers/openml/runs_get_test.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/database/runs.py
- tests/routers/openml/runs_get_test.py
```python
await asyncio.gather(
    database.runs.get_uploader_name(run.uploader, userdb),
    database.runs.get_tags(run_id, expdb),
    database.runs.get_input_data(run_id, expdb),
    database.runs.get_output_files(run_id, expdb),
    database.runs.get_evaluations(run_id, expdb, evaluation_engine_ids=engine_ids),
    database.runs.get_task_type(run.task_id, expdb),
    database.runs.get_task_evaluation_measure(run.task_id, expdb),
    database.setups.get(run.setup, expdb),
    database.setups.get_parameters(run.setup, expdb),
```
🧩 Analysis chain
🏁 Script executed:
```shell
fd -t f "runs.py" | head -20
```
Repository: openml/server-api
Length of output: 128
🏁 Script executed:
```shell
wc -l src/routers/openml/runs.py
```
Repository: openml/server-api
Length of output: 91
🏁 Script executed:
```shell
sed -n '90,120p' src/routers/openml/runs.py
```
Repository: openml/server-api
Length of output: 1304
🏁 Script executed:
```shell
# Check the full function context and initialization
sed -n '70,170p' src/routers/openml/runs.py
```
Repository: openml/server-api
Length of output: 3129
🌐 Web query:
SQLAlchemy AsyncConnection concurrent tasks safety async/await
💡 Result:
No, a single `AsyncConnection` instance is not safe to share across concurrent asyncio tasks. Like `AsyncSession`, `AsyncConnection` is stateful (e.g., it holds a specific database connection and transaction state from the pool) and is designed for use within a single task/coroutine. Sharing it across concurrent tasks (e.g., via `asyncio.gather`) leads to undefined behavior, potential errors detecting concurrent access (especially in SQLAlchemy 2.0+), and race conditions.

The `AsyncEngine`, however, is safe to share across concurrent tasks. It manages an asyncio-compatible connection pool (using `asyncio.Queue`) that allows multiple tasks to concurrently acquire their own `AsyncConnection` instances via `await engine.connect()`. Each task should create and use its own `AsyncConnection` within a context manager:

```python
async with engine.connect() as conn:
    result = await conn.execute(stmt)
```

This pattern supports high concurrency without blocking the event loop, as the pool handles connection checkout/checkin asynchronously. Official docs emphasize a separate `AsyncSession` per task, but the same principle applies to core `AsyncConnection` usage. GitHub discussions confirm sessions/connections cannot be shared concurrently and recommend per-task creation. A sync `Connection` is thread-safe when using `Engine.pool`, but async code follows the task-local pattern. Example for concurrent tasks:

```python
async def task(engine):
    async with engine.connect() as conn:
        await conn.execute(...)

async def main():
    await asyncio.gather(*(task(engine) for _ in range(10)))
```
Citations:
- 1: https://www.sqlalchemy.org/docs/21/orm/extensions/asyncio.html
- 2: https://www.sqlalchemy.org/docs/07/orm/extensions/asyncio.html
- 3: http://docs.sqlalchemy.org/en/latest/orm/extensions/asyncio.html
- 4: https://docs.sqlalchemy.org/en/latest/orm/extensions/asyncio.html
- 5: how to use multiple async sessions with gather (answer: use TaskGroup, not gather) sqlalchemy/sqlalchemy#9312
- 6: Handling Concurrency while sharing the session sqlalchemy/sqlalchemy#8554
- 7: https://stackoverflow.com/questions/42160820/is-it-thread-safe-to-use-sqlalchemy-with-engine-connections-instead-of-sessions
🏁 Script executed:
```shell
# Check if AsyncSession is used anywhere in the codebase
rg "AsyncSession" src/ --type py
```
Repository: openml/server-api
Length of output: 43
🏁 Script executed:
```shell
# Check database module to understand connection/session setup
fd -t f "database" -type d | head -5
```
Repository: openml/server-api
Length of output: 231
🏁 Script executed:
```shell
# Look at database/__init__.py or similar for connection patterns
ls -la src/database/ | head -20
```
Repository: openml/server-api
Length of output: 828
🏁 Script executed:
```shell
# Search for other asyncio.gather patterns in the codebase
rg "asyncio\.gather" src/ -A 5 -B 2
```
Repository: openml/server-api
Length of output: 6116
🏁 Script executed:
```shell
# Check how AsyncConnection is created/provided
rg "AsyncConnection" src/ --type py -B 2 -A 2
```
Repository: openml/server-api
Length of output: 33637
🏁 Script executed:
```shell
# Check the database module initialization
head -50 src/database/__init__.py
```
Repository: openml/server-api
Length of output: 109
🏁 Script executed:
```shell
# Look at how expdb is obtained in the router
grep -n "expdb" src/routers/openml/runs.py | head -20
```
Repository: openml/server-api
Length of output: 1193
Don't run concurrent queries on the same AsyncConnection.
Eight of the nine asyncio.gather tasks share expdb, causing overlapping statements on one SQLAlchemy connection. AsyncConnection is stateful and designed for sequential use only. Sharing it across concurrent tasks causes undefined behavior and race conditions. Per SQLAlchemy docs: https://docs.sqlalchemy.org/en/20/orm/session_basics.html#is-the-session-thread-safe-is-asyncsession-safe-to-share-in-concurrent-tasks
Use sequential queries on the shared connection, or acquire separate connections per concurrent task from the engine.
Sequential execution for shared connection

```diff
-    await asyncio.gather(
-        database.runs.get_uploader_name(run.uploader, userdb),
-        database.runs.get_tags(run_id, expdb),
-        database.runs.get_input_data(run_id, expdb),
-        database.runs.get_output_files(run_id, expdb),
-        database.runs.get_evaluations(run_id, expdb, evaluation_engine_ids=engine_ids),
-        database.runs.get_task_type(run.task_id, expdb),
-        database.runs.get_task_evaluation_measure(run.task_id, expdb),
-        database.setups.get(run.setup, expdb),
-        database.setups.get_parameters(run.setup, expdb),
-    ),
+    (
+        await database.runs.get_uploader_name(run.uploader, userdb),
+        await database.runs.get_tags(run_id, expdb),
+        await database.runs.get_input_data(run_id, expdb),
+        await database.runs.get_output_files(run_id, expdb),
+        await database.runs.get_evaluations(run_id, expdb, evaluation_engine_ids=engine_ids),
+        await database.runs.get_task_type(run.task_id, expdb),
+        await database.runs.get_task_evaluation_measure(run.task_id, expdb),
+        await database.setups.get(run.setup, expdb),
+        await database.setups.get_parameters(run.setup, expdb),
+    ),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routers/openml/runs.py` around lines 100 - 109, The asyncio.gather is
launching multiple database.* calls that all share the same AsyncConnection
object (expdb), which is unsafe; change the code so calls that use the shared
expdb run sequentially or each acquires its own connection from the engine.
Concretely, replace the concurrent gather of database.runs.get_uploader_name,
get_tags, get_input_data, get_output_files, get_evaluations (with
evaluation_engine_ids=engine_ids), get_task_type, get_task_evaluation_measure,
database.setups.get, and database.setups.get_parameters — all currently passed
the shared expdb — with either (a) awaiting them one-by-one using await
database.runs.get_tags(..., expdb) etc., or (b) modify the call sites to obtain
a fresh connection/transaction for each concurrent task (e.g., acquire a new
async connection from the engine and pass that connection instead of expdb) so
no two coroutines share the same AsyncConnection.
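For context, the failure mode and the per-task fix can be sketched without SQLAlchemy at all, using a toy connection that, like many async drivers, rejects overlapping operations. `FakeConnection` is illustrative only, not a real driver API:

```python
import asyncio


class FakeConnection:
    """Mimics an async driver connection that forbids overlapping operations."""

    def __init__(self) -> None:
        self._busy = False

    async def execute(self, query: str) -> str:
        if self._busy:
            raise RuntimeError("another operation is in progress")
        self._busy = True
        try:
            await asyncio.sleep(0.01)  # simulate a network round-trip
            return f"result of {query}"
        finally:
            self._busy = False


async def main() -> None:
    shared = FakeConnection()
    try:
        # One connection shared by gather(): the second coroutine starts
        # while the first is still awaiting, and the "driver" rejects it.
        await asyncio.gather(shared.execute("SELECT 1"), shared.execute("SELECT 2"))
    except RuntimeError as exc:
        print(f"shared connection failed: {exc}")

    # One connection per task: both queries run concurrently and safely.
    results = await asyncio.gather(
        FakeConnection().execute("SELECT 1"),
        FakeConnection().execute("SELECT 2"),
    )
    print(results)


if __name__ == "__main__":
    asyncio.run(main())
```

Running this prints the shared-connection failure first, then both results from the connection-per-task variant, which mirrors the two options in the review prompt.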
In theory this is a valid point, but in our use case all operations are reads, so I feel no such issue arises, and it is consistent with the codebase (see https://github.com/openml/server-api/blob/main/src/routers/openml/datasets.py for reference).
Another point to note: we could use a connection pool (sourcery-ai suggested this), but I feel it is too much overhead and complexity for our use case.
@saathviksheerla, thank you for the context and the pointer to datasets.py.
A few clarifications to sharpen the discussion:
- The SQLAlchemy warning covers `AsyncConnection` too, not just `AsyncSession`. The asyncpg database connection itself, acquired from SQLAlchemy's pool, is not concurrency safe — using it in two concurrent awaitables produces an error. The same principle extends to `AsyncConnection` since it wraps the underlying DBAPI connection.
- Read vs. write does not make concurrent use safe. SQLAlchemy 2.0 introduced proactive detection of concurrent methods invoked on a single `Session`/`AsyncSession` — these typically occur when a single instance is shared among concurrent tasks (e.g., `asyncio.gather()`). Without this warning system, the pattern still produces invalid state within the objects, including driver-level errors on the database connections themselves. The state machine in the connection object does not distinguish reads from writes.
- The practical risk depends on the driver. `aiomysql` (which this project likely uses for MySQL) may be more lenient in practice than `asyncpg`, which is why you may not be seeing failures locally. That said, this relies on undocumented driver tolerance, not a supported guarantee.
- Codebase consistency is a fair point. If `datasets.py` uses the same `asyncio.gather` pattern with a shared `AsyncConnection` and it has been stable in production, then changing only this endpoint in isolation would be inconsistent. If this is an accepted trade-off across the codebase, that's a maintainer call to make uniformly.
Given that this is a consistent established pattern in the codebase and not an isolated deviation, I'll defer to your judgment here. You may want to track this as a future codebase-wide improvement if you ever see InterfaceError: another operation is in progress in production logs.
🧠 Learnings used
Learnt from: saathviksheerla
Repo: openml/server-api PR: 267
File: src/routers/openml/runs.py:0-0
Timestamp: 2026-03-08T07:45:07.969Z
Learning: In the openml/server-api Python codebase, non-boolean ENUM values stored as strings coming from the database should be treated as their literal string values, not coerced via truthiness checks. Ensure that fields like row.selected (with MySQL ENUM('true','false')) are passed through as the exact string (e.g., 'true' or 'false') and validated by Pydantic using Literal['true', 'false'] at the schema boundary. Do not apply a ternary like 'true' if row.selected else 'false'. Review places where the value is transformed before response: rely on the string value and let Pydantic enforce the literal type, and adjust any serialization if necessary to avoid implicit boolean conversion.
Learnt from: saathviksheerla
Repo: openml/server-api PR: 267
File: src/routers/openml/runs.py:45-51
Timestamp: 2026-03-08T07:59:51.515Z
Learning: In Python code using SQLAlchemy, VARCHAR/text columns are returned as Python strings. Avoid wrapping such values with str(...) when constructing objects (e.g., TraceIteration). Treat string-type columns as strings and pass them through directly unless a transformation is actually required. This guidance applies to similar string columns across Python files in the repository (and can be considered for enum-like values that are exposed as strings).
Learnt from: saathviksheerla
Repo: openml/server-api PR: 277
File: src/routers/openml/tasks.py:218-233
Timestamp: 2026-03-17T12:52:03.020Z
Learning: In the openml/server-api repo, the GET+POST list endpoints reuse a single handler with both router.post and router.get and declare all filter parameters with Body(...). Do not flag Body(...) parameters on GET handlers in this repository. This pattern applies to files under src/routers/openml/ (e.g., list tasks in tasks.py and list datasets in datasets.py).
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="tests/routers/openml/runs_get_test.py" line_range="82-91" />
<code_context>
+ assert expected_keys <= run.keys(), f"Missing keys: {expected_keys - run.keys()}"
+
+
+async def test_get_run_known_values(py_api: httpx.AsyncClient) -> None:
+ """Run 24 returns the exact values confirmed against the DB."""
+ response = await py_api.get(f"/run/{_RUN_ID}")
+ assert response.status_code == HTTPStatus.OK
+ run = response.json()
+
+ # Core identifiers
+ assert run["run_id"] == _RUN_ID
+ assert run["uploader"] == _RUN_UPLOADER_ID
+ assert run["uploader_name"] == "Cynthia Glover"
+ assert run["task_id"] == _RUN_TASK_ID
+ assert run["task_type"] == "Supervised Classification"
+ assert run["flow_id"] == _RUN_FLOW_ID
+ assert run["setup_id"] == _RUN_SETUP_ID
+ assert "Python_3.10.5" in run["setup_string"]
+
+ # Tags
+ assert "openml-python" in run["tag"]
+
+ # Error — NULL in DB -> empty list
+ assert run["error"] == []
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case for non-empty `error_message` to verify correct serialization to `error` list
Right now you only cover the case where `error_message` is NULL and serialized as an empty `error` list. Please also add a test where `error_message` is non-null and assert that the response exposes it as `error: [<message>]`, either by using a known failing run in the test DB or by patching `database.runs.get` to return a row with `error_message="Some error"`. This will help catch regressions in how `error` is serialized.
Suggested implementation:
```python
from sqlalchemy.ext.asyncio import AsyncConnection


async def test_get_run_non_empty_error(py_api: httpx.AsyncClient) -> None:
    """A run with a non-null error_message is serialized as a single-item error list."""
    response = await py_api.get(f"/run/{_RUN_WITH_ERROR_ID}")
    assert response.status_code == HTTPStatus.OK
    run = response.json()
    assert run["error"] == [_RUN_ERROR_MESSAGE]
```
To make this test work, you also need to:
1. Define `_RUN_WITH_ERROR_ID` and `_RUN_ERROR_MESSAGE` near the other `_RUN_*` constants in this test module (e.g. alongside `_RUN_ID`, `_RUN_TASK_ID`, etc.), using a run in your test DB that has a non-null `error_message`. For example:
- `_RUN_WITH_ERROR_ID = 42`
- `_RUN_ERROR_MESSAGE = "Some error from the backend"`
2. Ensure that, in your test database/fixtures, the run with ID `_RUN_WITH_ERROR_ID` has `error_message = _RUN_ERROR_MESSAGE` so that the API serializes it as `error: [_RUN_ERROR_MESSAGE]`.
3. If your API normalizes/changes error messages (e.g. prefixes, trimming), adjust `_RUN_ERROR_MESSAGE` and/or the assertion accordingly (for example, use `in` or `startswith` instead of exact equality).
</issue_to_address>
### Comment 2
<location path="tests/routers/openml/runs_get_test.py" line_range="413-422" />
<code_context>
+ assert not differences, f"Differences for run {run_id}: {differences}"
+
+
+def test_build_evaluations_coverage() -> None:
+ """Ensure _build_evaluations string-normalization branches are covered."""
+
+ class MockRow:
+ def __init__(self, name: str, value: object, array_data: str | None = None) -> None:
+ self.name = name
+ self.value = value
+ self.array_data = array_data
+
+ rows = [
+ MockRow("float_val", 1.0),
+ MockRow("str_float", "2.0"),
+ MockRow("str_text", "not_a_number"),
+ MockRow("unhandled_type", ["list"]),
+ ]
+ evals = _build_evaluations(rows)
+
+ values = {e.name: e.value for e in evals}
+ expected_one = 1
+ expected_two = 2
+ assert values["float_val"] == expected_one
+ assert values["str_float"] == expected_two
+ assert values["str_text"] is None
+ assert values["unhandled_type"] is None
</code_context>
<issue_to_address>
**suggestion (testing):** Extend `_build_evaluations` coverage to include non-integer float strings
This test currently covers integer-like floats/strings, non-numeric strings, and an unsupported type. Please also add a fractional numeric string (e.g. `MockRow("str_float_fraction", "1.5")`) and/or a non-integer float (e.g. `1.5`) and assert that `_normalise_value` preserves these as floats (e.g. `values["str_float_fraction"] == 1.5`). This will verify that non-integer numeric values are not coerced to ints or nulled.
</issue_to_address>
```python
async def test_get_run_known_values(py_api: httpx.AsyncClient) -> None:
    """Run 24 returns the exact values confirmed against the DB."""
    response = await py_api.get(f"/run/{_RUN_ID}")
    assert response.status_code == HTTPStatus.OK
    run = response.json()

    # Core identifiers
    assert run["run_id"] == _RUN_ID
    assert run["uploader"] == _RUN_UPLOADER_ID
    assert run["uploader_name"] == "Cynthia Glover"
```
suggestion (testing): Add a test case for non-empty error_message to verify correct serialization to error list
Right now you only cover the case where error_message is NULL and serialized as an empty error list. Please also add a test where error_message is non-null and assert that the response exposes it as error: [<message>], either by using a known failing run in the test DB or by patching database.runs.get to return a row with error_message="Some error". This will help catch regressions in how error is serialized.
Suggested implementation:
```python
from sqlalchemy.ext.asyncio import AsyncConnection


async def test_get_run_non_empty_error(py_api: httpx.AsyncClient) -> None:
    """A run with a non-null error_message is serialized as a single-item error list."""
    response = await py_api.get(f"/run/{_RUN_WITH_ERROR_ID}")
    assert response.status_code == HTTPStatus.OK
    run = response.json()
    assert run["error"] == [_RUN_ERROR_MESSAGE]
```

To make this test work, you also need to:
1. Define `_RUN_WITH_ERROR_ID` and `_RUN_ERROR_MESSAGE` near the other `_RUN_*` constants in this test module (e.g. alongside `_RUN_ID`, `_RUN_TASK_ID`, etc.), using a run in your test DB that has a non-null `error_message`. For example:
   - `_RUN_WITH_ERROR_ID = 42`
   - `_RUN_ERROR_MESSAGE = "Some error from the backend"`
2. Ensure that, in your test database/fixtures, the run with ID `_RUN_WITH_ERROR_ID` has `error_message = _RUN_ERROR_MESSAGE` so that the API serializes it as `error: [_RUN_ERROR_MESSAGE]`.
3. If your API normalizes/changes error messages (e.g. prefixes, trimming), adjust `_RUN_ERROR_MESSAGE` and/or the assertion accordingly (for example, use `in` or `startswith` instead of exact equality).
```python
def test_build_evaluations_coverage() -> None:
    """Ensure _build_evaluations string-normalization branches are covered."""

    class MockRow:
        def __init__(self, name: str, value: object, array_data: str | None = None) -> None:
            self.name = name
            self.value = value
            self.array_data = array_data

    rows = [
```
suggestion (testing): Extend _build_evaluations coverage to include non-integer float strings
This test currently covers integer-like floats/strings, non-numeric strings, and an unsupported type. Please also add a fractional numeric string (e.g. MockRow("str_float_fraction", "1.5")) and/or a non-integer float (e.g. 1.5) and assert that _normalise_value preserves these as floats (e.g. values["str_float_fraction"] == 1.5). This will verify that non-integer numeric values are not coerced to ints or nulled.
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/routers/openml/runs_get_test.py (3)
297-306: Use the existing `_RUN_TASK_ID` constant. Both DB tests hardcode task id
115, which is already defined as `_RUN_TASK_ID` at line 23. Using the constant keeps the fixture data in one place.
🧹 Suggested cleanup
```diff
 async def test_db_get_task_type(expdb_test: AsyncConnection) -> None:
     """database.runs.get_task_type returns 'Supervised Classification' for task 115."""
-    task_type = await database.runs.get_task_type(115, expdb_test)
+    task_type = await database.runs.get_task_type(_RUN_TASK_ID, expdb_test)
     assert task_type == "Supervised Classification"


 async def test_db_get_task_evaluation_measure_missing(expdb_test: AsyncConnection) -> None:
     """get_task_evaluation_measure returns None (not '') when absent."""
-    measure = await database.runs.get_task_evaluation_measure(115, expdb_test)
+    measure = await database.runs.get_task_evaluation_measure(_RUN_TASK_ID, expdb_test)
     assert measure is None
```

Similar cleanup applies to
`test_db_get_uploader_name` (line 311: `1159` vs `_RUN_UPLOADER_ID`).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/runs_get_test.py` around lines 297 - 306, Tests hardcode task IDs instead of using the existing _RUN_TASK_ID constant; update the two tests test_db_get_task_type and test_db_get_task_evaluation_measure_missing to replace the literal 115 with the shared constant _RUN_TASK_ID, and similarly replace the hardcoded uploader id in test_db_get_uploader_name with _RUN_UPLOADER_ID so database.runs.get_task_type, database.runs.get_task_evaluation_measure and the uploader-name test use the canonical fixtures.
315-318: Reusing `_MISSING_RUN_ID` as a user ID is semantically misleading.
`_MISSING_RUN_ID` represents a non-existent run, not a user. The value happens to work because no user with id 999_999_999 exists either, but a reader is momentarily tripped up. Consider introducing a dedicated constant, e.g. `_MISSING_USER_ID = 999_999_999`, or inline the literal with a comment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/runs_get_test.py` around lines 315 - 318, The test test_db_get_uploader_name_missing is using _MISSING_RUN_ID (which denotes a nonexistent run) as a user id, which is semantically confusing; update the test to use a dedicated missing-user identifier by adding a new constant _MISSING_USER_ID = 999_999_999 (or replace the call with the literal 999_999_999 plus a comment) and call database.runs.get_uploader_name(_MISSING_USER_ID, user_test) so the intent is clear; ensure the new symbol is defined near other test constants and referenced in test_db_get_uploader_name_missing and any related tests.
380-398: Comment vs. filter logic disagree — please clarify/fix. The comment states PHP's aggregate row has
`repeat="0"` and `fold="0"`, but the filter at lines 391-393 drops every entry that has a `repeat` or `fold` key. If the aggregate truly carries those keys, this filter removes it too and leaves an empty list, which should fail `DeepDiff` against the Python response (which keeps the aggregate). Since the tests reportedly pass, the aggregate likely lacks `repeat`/`fold` entirely — in which case the comment is misleading. Please reconcile the comment with actual PHP output (e.g., capture one sample response and describe what distinguishes per-fold vs. aggregate entries). Also consider replacing the unreachable
`elif` branch at lines 394-398 with a simpler invariant check; the inline rationale there is self-contradictory ("PHP returns a list anyway if duplicates exist. If they don't, it's a dict.") and hard to follow.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/routers/openml/runs_get_test.py` around lines 380 - 398, Update the misleading comment and fix the evaluation-filter logic around php_json["run"]["output_data"]["evaluation"]: clarify whether PHP's aggregate row includes repeat/fold (edit the comment to match actual sample output), and change the branch handling so that if evaluation is a list you remove only per-fold entries (e.g., those with repeat/fold present and not equal to "0"), and if evaluation is a dict then either wrap it as the single aggregate element (when repeat/fold are absent or equal to "0") or set it to [] when it represents a non-aggregate entry; replace the confusing elif block with this explicit check and conversion to ensure the aggregate is preserved and the behavior matches the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/routers/openml/runs_get_test.py`:
- Around line 291-294: The test test_db_get_evaluations_empty_engine_list
currently only asserts type(list) but should assert that get_evaluations returns
an empty list; update the assertion after calling
database.runs.get_evaluations(_RUN_ID, expdb_test, evaluation_engine_ids=[]) to
assert rows == [] (or len(rows) == 0) so the test verifies emptiness rather than
just type.
---
Nitpick comments:
In `@tests/routers/openml/runs_get_test.py`:
- Around line 297-306: Tests hardcode task IDs instead of using the existing
_RUN_TASK_ID constant; update the two tests test_db_get_task_type and
test_db_get_task_evaluation_measure_missing to replace the literal 115 with the
shared constant _RUN_TASK_ID, and similarly replace the hardcoded uploader id in
test_db_get_uploader_name with _RUN_UPLOADER_ID so database.runs.get_task_type,
database.runs.get_task_evaluation_measure and the uploader-name test use the
canonical fixtures.
- Around line 315-318: The test test_db_get_uploader_name_missing is using
_MISSING_RUN_ID (which denotes a nonexistent run) as a user id, which is
semantically confusing; update the test to use a dedicated missing-user
identifier by adding a new constant _MISSING_USER_ID = 999_999_999 (or replace
the call with the literal 999_999_999 plus a comment) and call
database.runs.get_uploader_name(_MISSING_USER_ID, user_test) so the intent is
clear; ensure the new symbol is defined near other test constants and referenced
in test_db_get_uploader_name_missing and any related tests.
- Around line 380-398: Update the misleading comment and fix the
evaluation-filter logic around php_json["run"]["output_data"]["evaluation"]:
clarify whether PHP's aggregate row includes repeat/fold (edit the comment to
match actual sample output), and change the branch handling so that if
evaluation is a list you remove only per-fold entries (e.g., those with
repeat/fold present and not equal to "0"), and if evaluation is a dict then
either wrap it as the single aggregate element (when repeat/fold are absent or
equal to "0") or set it to [] when it represents a non-aggregate entry; replace
the confusing elif block with this explicit check and conversion to ensure the
aggregate is preserved and the behavior matches the comment.
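The explicit check-and-convert logic the prompt above describes could be sketched as follows. The helper name and the exact payload shape are hypothetical; the key names (`repeat`, `fold`) and the "not equal to `"0"`" rule follow the review discussion, and the real fix would need to be validated against a captured PHP sample.

```python
def keep_aggregate_evaluations(evaluation):
    """Normalize PHP's evaluation field to a list holding only aggregate entries."""

    def is_per_fold(entry: dict) -> bool:
        # Per-fold rows carry a repeat/fold key whose value is not "0";
        # aggregate rows either omit the keys or set them to "0".
        return any(entry.get(key) not in (None, "0") for key in ("repeat", "fold"))

    if isinstance(evaluation, list):
        # Drop only the per-fold rows, preserving the aggregate.
        return [entry for entry in evaluation if not is_per_fold(entry)]
    if isinstance(evaluation, dict):
        # A lone dict is either the aggregate (wrap it) or a stray per-fold row.
        return [] if is_per_fold(evaluation) else [evaluation]
    raise TypeError(f"unexpected evaluation payload: {type(evaluation)!r}")
```

This replaces the unreachable `elif` with an invariant that is checked rather than assumed, so a surprising PHP payload fails loudly instead of silently emptying the list.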
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fb7e0c97-fbbe-4d78-b109-8761ecbb61aa
📒 Files selected for processing (2)
- src/config.py
- tests/routers/openml/runs_get_test.py
✅ Files skipped from review due to trivial changes (1)
- src/config.py
```python
async def test_db_get_evaluations_empty_engine_list(expdb_test: AsyncConnection) -> None:
    """get_evaluations with no engine IDs returns an empty list (not an error)."""
    rows = await database.runs.get_evaluations(_RUN_ID, expdb_test, evaluation_engine_ids=[])
    assert isinstance(rows, list)
```
Assertion should verify the list is empty, not just that it's a list.
The docstring says "returns an empty list (not an error)", but the current assertion only checks isinstance(rows, list). The non-empty-list case already passes in the previous test, so explicitly assert emptiness here:
🔧 Suggested fix
async def test_db_get_evaluations_empty_engine_list(expdb_test: AsyncConnection) -> None:
"""get_evaluations with no engine IDs returns an empty list (not an error)."""
rows = await database.runs.get_evaluations(_RUN_ID, expdb_test, evaluation_engine_ids=[])
- assert isinstance(rows, list)
+ assert rows == []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/routers/openml/runs_get_test.py` around lines 291 - 294, The test
test_db_get_evaluations_empty_engine_list currently only asserts type(list) but
should assert that get_evaluations returns an empty list; update the assertion
after calling database.runs.get_evaluations(_RUN_ID, expdb_test,
evaluation_engine_ids=[]) to assert rows == [] (or len(rows) == 0) so the test
verifies emptiness rather than just type.
Resolves #37.

This PR introduces the `GET` and `POST /run/{id}` endpoints for the OpenML Python API.

Overview:

- Uses `asyncio.gather` to concurrently query across `expdb` and `userdb` tables, significantly reducing database I/O latency.
- Adds a test suite (`tests/routers/openml/runs_get_test.py`) enforcing complete structural parity.

Checklist
Always:
Required for code changes:
If applicable: (`/docs`)
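The `asyncio.gather` fan-out described in the overview can be sketched as below. The `fetch` helper is a hypothetical stand-in for the query functions in `src/database/runs.py`, and, per the review feedback, each concurrent query should run on its own connection if the driver does not allow overlapping statements on a shared one.

```python
import asyncio

# Stand-in for one awaited database lookup; real code would execute SQL
# on its own connection rather than sleeping.
async def fetch(label: str) -> tuple[str, str]:
    await asyncio.sleep(0.01)  # simulates query latency
    return label, f"{label}-rows"

async def load_run_metadata(run_id: int) -> dict[str, str]:
    # Fan out the independent lookups concurrently instead of awaiting
    # them one after another; total latency is roughly the slowest query.
    results = await asyncio.gather(
        fetch("run"),
        fetch("tags"),
        fetch("evaluations"),
    )
    return dict(results)

print(asyncio.run(load_run_metadata(1)))
# → {'run': 'run-rows', 'tags': 'tags-rows', 'evaluations': 'evaluations-rows'}
```

Sequentially, three 10 ms queries cost about 30 ms; gathered, they cost about 10 ms, which is the latency win the PR claims.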