Skip to content

feat(cli): per-resource error isolation + non-zero exit on failure for resubmit/submit#299

Merged
jqnatividad merged 3 commits into
mainfrom
resubmit-cli-robustness
May 17, 2026
Merged

feat(cli): per-resource error isolation + non-zero exit on failure for resubmit/submit#299
jqnatividad merged 3 commits into
mainfrom
resubmit-cli-robustness

Conversation

@jqnatividad
Copy link
Copy Markdown
Collaborator

Summary

Before this, ckan datapusher_plus resubmit was a 16-line loop with three pretty serious operator-experience holes:

  • No try/except — a single resource raising during datapusher_submit (e.g. a transient network blip) killed the whole batch and lost every resource queued after it.
  • No end-of-run summary — operators had to grep the stream of OK / Fail lines to count failures.
  • Unconditional exit 0 — a fully-failed batch was indistinguishable from a fully-succeeded one from a shell-exit / CI / cron perspective. Impossible to alert on.

After this PR:

  • Per-resource calls are wrapped; each lands in exactly one of ok / fail (returned falsy) / error (raised).
  • A summary block at the end prints counts plus the list of failed / errored resource ids (with the exception message for the errored ones).
  • resubmit / submit raise click.exceptions.Exit(code=1) when anything didn't succeed → CI gets the failure signal.
  • New shared --continue-on-error/--stop-on-error flag on both commands. Default is the legacy "drain the list" behaviour; --stop-on-error is for CI users who want to bail at the first failure. In submit, --stop-on-error also stops processing further packages.

What's intentionally out of scope

  • Retry / backoff inside the CLI — the Prefect pipeline already has _retry_if_transient for in-flow retries; the CLI's job is to surface outcomes, not to re-run the pipeline.
  • --retry-failed state file — adds friction; can be a follow-up if anyone actually wants it.

Note: this is unrelated to PR #253's "made DP+ resubmit more robust" commit — that one was about job-level rollback in the now-deleted jobs.py, which is closer to issue #265 (data-dictionary stash) territory. The CLI-level work here is its own thing.

Test plan

  • 11 new unit tests in tests/test_cli_resubmit.py:
    • _submit bucket bookkeeping for OK / Fail / Error mixes
    • Exception isolation (one resource raising doesn't stop the rest by default)
    • --stop-on-error semantics (break on first Fail, break on first Error)
    • Summary rendering (lists failed + errored ids, reports Skipped only when stopped early)
    • CLI exit-code wiring through click.testing.CliRunner (exit 0 vs exit 1 vs --stop-on-error early termination)
  • Full unit suite: 127/127 passing (was 116; +11 new). No regressions.
  • Integration CI on this PR.
  • Manual: ckan datapusher_plus resubmit -y on a small dataset with one failing resource — verify Fail: 1 summary + exit 1.
  • Manual: ckan datapusher_plus resubmit -y --stop-on-error — verify it stops at the first failure.
  • Manual: ckan datapusher_plus submit <good-package> -y — verify exit 0.

CLI surface (new)

ckan datapusher_plus resubmit [-y] [--continue-on-error | --stop-on-error]
ckan datapusher_plus submit [<package>] [-y] [--continue-on-error | --stop-on-error]

🤖 Generated with Claude Code

jqnatividad and others added 2 commits May 16, 2026 13:35
…r resubmit/submit

Before this, ``ckan datapusher_plus resubmit`` was a 16-line loop with:

- no try/except — a single resource raising during ``datapusher_submit``
  killed the whole batch and lost all later resources;
- no end-of-run summary — operators had to grep the stream of "OK" /
  "Fail" lines to count failures;
- unconditional exit 0 — a fully-failed batch was indistinguishable
  from a fully-succeeded one from a shell-exit / CI / cron perspective.

``_submit`` now isolates per-resource exceptions, classifies each
resource into one of ``ok`` / ``fail`` / ``error``, prints a summary
block at the end, and returns a bool the commands use to raise
``click.exceptions.Exit(code=1)`` when anything didn't succeed.

A shared ``--continue-on-error/--stop-on-error`` flag is wired into
both ``resubmit`` and ``submit``. Default is the legacy "drain the
list" behaviour; ``--stop-on-error`` is for CI users who want to bail
at the first failure (the ``submit`` command also stops processing
further packages in that case).

Out of scope (intentional):

- No retry/backoff inside the CLI — the Prefect pipeline already
  has ``_retry_if_transient`` for the in-flow retries; the CLI's job
  is to surface outcomes, not to re-run the pipeline.
- No ``--retry-failed`` state file — adds friction; can be a follow-up
  if it's actually wanted.

Verify

- 11 new unit tests in ``tests/test_cli_resubmit.py``: bucket
  bookkeeping, exception isolation, ``--stop-on-error`` semantics,
  summary rendering, and CLI exit codes via Click's CliRunner.
- Full unit suite: 127/127 (was 116; +11 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Convert the three "Manual" checkboxes from PR #299 into automated
``INTEGRATION=1`` tests so future PRs verify the resubmit / submit
CLI end-to-end and not just via CliRunner.

These add the coverage CliRunner can't reach: that
``ckan datapusher_plus resubmit`` actually registers (entry_points
wiring), finds real datastore-resident resources via
``datastore_backend.get_all_resources_ids_in_datastore``, and
propagates its exit code out of the CKAN bootstrap into the shell.

Three tests, one per plan item:

1. ``test_resubmit_exits_one_when_resource_fails`` — seeds a
   happy-path CSV → resource → datastore, then patches the
   resource URL to a 404 so the next ingest fails. Calls
   ``ckan datapusher_plus resubmit -y`` and asserts exit 1 +
   summary mentions the failing resource id.
2. ``test_resubmit_stop_on_error_breaks_early`` — same setup as
   (1), invokes with ``--stop-on-error`` and asserts exit 1.
   Doesn't pin the ordering (datastore listing order isn't
   deterministic) but the unit suite already nails that down.
3. ``test_submit_good_package_exits_zero`` — happy-path
   ``ckan datapusher_plus submit <package> -y``; asserts exit 0
   and ``Fail: 0`` / ``Error: 0`` in the summary.

Invokes the CLI via ``docker compose exec ckan`` so the tests run
on the host while the CKAN bootstrap lives inside the integration
stack. Skips cleanly when ``docker`` or the compose file isn't
available, so they don't break collection on machines without the
stack set up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the ckan datapusher_plus resubmit / submit CLI operator experience by isolating per-resource failures, printing an end-of-run summary, and returning a non-zero exit code when any resource doesn’t succeed (with an optional --stop-on-error mode for CI).

Changes:

  • Add shared --continue-on-error/--stop-on-error flag and make resubmit / submit exit with code 1 when any resource fails/errors.
  • Refactor _submit() to bucket outcomes into OK/Fail/Error, continue by default, and print a consolidated summary.
  • Add unit tests for _submit() bookkeeping + CLI exit codes, plus integration tests that validate real container CLI exit-code propagation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
ckanext/datapusher_plus/cli.py Adds stop/continue behavior, per-resource error isolation, summary output, and non-zero exit status on any non-OK outcome.
tests/test_cli_resubmit.py New unit tests validating _submit() bucketing, stop-on-error semantics, summary rendering, and Click exit codes.
tests/integration/test_cli_resubmit.py New integration tests running the real ckan CLI in Docker to assert registration + exit-code propagation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ckanext/datapusher_plus/cli.py
Comment thread ckanext/datapusher_plus/cli.py Outdated
Comment thread tests/integration/test_cli_resubmit.py Outdated
Comment thread tests/test_cli_resubmit.py Outdated
Four small follow-ups from Copilot's inline review:

* cli.py: add ``else: Context = dict`` fallback under the
  ``tk.check_ckan_version("2.10")`` guard so ``cast(Context, ...)`` in
  ``submit`` / ``_submit`` doesn't NameError on older CKANs. ``cast``
  is a runtime no-op (returns its second arg unchanged), so ``dict``
  is a safe stand-in while keeping the 2.10+ type-checker signal.

* cli.py: drop "datastore" from the ``_submit`` progress header.
  ``_submit`` is shared between ``resubmit`` (datastore-resident
  resources) and ``submit`` (package resources, possibly pre-datastore),
  so "Submitting N resource(s)" matches both call sites.

* tests/test_cli_resubmit.py: reword the module docstring. The
  pre-refactor ``_submit`` didn't "swallow every exception" — it had
  no try/except, so the first raising resource crashed the whole
  batch (Click surfaced the traceback). When nothing raised it still
  always exited 0; that part was right.

* tests/integration/test_cli_resubmit.py: make
  ``_docker_compose_available()`` actually probe the Compose v2 plugin
  via ``docker compose version`` (instead of just checking for the
  ``docker`` binary). Systems with only Compose v1 / ``docker-compose``
  now skip cleanly instead of failing inside ``_run_ckan_cli``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit 9da7aeb into main May 17, 2026
1 check passed
@jqnatividad jqnatividad deleted the resubmit-cli-robustness branch May 17, 2026 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants