Skip to content

feat: Lakeflow Jobs Plugin#265

Open
atilafassina wants to merge 2 commits intomainfrom
ekniazev/jobs-plugin-core
Open

feat: Lakeflow Jobs Plugin#265
atilafassina wants to merge 2 commits intomainfrom
ekniazev/jobs-plugin-core

Conversation

@atilafassina
Copy link
Copy Markdown
Contributor

@atilafassina atilafassina commented Apr 10, 2026

Important

to maintain contribution history this PR should be rebased when merged
(don't merge it if it has more than 2 commits)

Summary

Resource-scoped jobs plugin following the files plugin pattern. Jobs are configured as named resources discovered from environment variables at startup.

Design

  • Resource-scoped: Only configured jobs are accessible — this is not an open SDK wrapper
  • Env-var discovery: Jobs are discovered from DATABRICKS_JOB_<KEY> env vars (e.g. DATABRICKS_JOB_ETL=123)
  • Single-job shorthand: DATABRICKS_JOB_ID maps to the "default" key
  • Manifest declares job resources with CAN_MANAGE_RUN permission
  • Works with databricks apps init --features jobs

API

// Trigger a configured job
const { run_id } = await appkit.jobs("etl").runNow();

// Trigger and wait for completion
const run = await appkit.jobs("etl").runNowAndWait();

// OBO access
await appkit.jobs("etl").asUser(req).runNow();

// List recent runs
const runs = await appkit.jobs("etl").listRuns({ limit: 10 });

// Single-job shorthand
await appkit.jobs("default").runNow();

Files changed

  • plugins/jobs/manifest.json — declares job resource with CAN_MANAGE_RUN permission
  • plugins/jobs/types.tsJobAPI, JobHandle, JobsExport, IJobsConfig types
  • plugins/jobs/plugin.tsJobsPlugin with discoverJobs(), getResourceRequirements(), resource-scoped createJobAPI()
  • plugins/jobs/index.ts — barrel exports
  • connectors/jobs/client.tslistRuns now respects limit parameter
  • plugins/jobs/tests/plugin.test.ts — 32 tests covering discovery, resource requirements, exports, OBO, multi-job, and auto-fill

Documentation safety checklist

  • Examples use least-privilege permissions
  • Sensitive values are obfuscated
  • No insecure patterns introduced

Reopened from #223 — moved head branch from fork to upstream to fix CI secrets access.

demo

lakeflow-jobs.mp4

Copilot AI review requested due to automatic review settings April 10, 2026 20:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new resource-scoped “jobs” plugin to @databricks/appkit, following the existing “files” plugin pattern: jobs are discovered from environment variables at startup and exposed via a keyed accessor API, with HTTP routes for triggering and monitoring runs.

Changes:

  • Introduces plugins/jobs (manifest, defaults, params mapping, types, plugin implementation, and extensive tests).
  • Adds connectors/jobs with telemetry + cancellation support and updates exports/docs to surface the new plugin and types.
  • Updates templates and generated API docs/sidebars to include the new Jobs plugin.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
template/appkit.plugins.json Adds “jobs” to plugin template metadata + resource description.
pnpm-lock.yaml Locks new dependency (zod@4.3.6) and related lockfile updates.
packages/appkit/src/plugins/jobs/types.ts Public types for Jobs plugin API/config (JobAPI/JobHandle/IJobsConfig).
packages/appkit/src/plugins/jobs/plugin.ts Core JobsPlugin: env discovery, dynamic resource requirements, API methods, HTTP routes.
packages/appkit/src/plugins/jobs/params.ts TaskType-based param mapping into SDK request fields.
packages/appkit/src/plugins/jobs/defaults.ts Execution defaults for read/write/stream operations.
packages/appkit/src/plugins/jobs/manifest.json Plugin manifest + config schema + baseline resource definition.
packages/appkit/src/plugins/jobs/index.ts Barrel exports for Jobs plugin/types.
packages/appkit/src/plugins/jobs/tests/plugin.test.ts Comprehensive unit tests for discovery, API, routes, and validation.
packages/appkit/src/plugins/index.ts Exposes jobs from the plugins barrel.
packages/appkit/src/index.ts Exposes jobs and related public types/configs from package root.
packages/appkit/src/connectors/jobs/client.ts JobsConnector SDK wrapper with telemetry instrumentation + limit handling.
packages/appkit/src/connectors/jobs/types.ts Connector config type (timeout/telemetry).
packages/appkit/src/connectors/jobs/index.ts Connector barrel exports.
packages/appkit/src/connectors/index.ts Exposes jobs connector from connectors barrel.
packages/appkit/package.json Adds zod dependency for runtime param schemas + JSON schema generation.
docs/docs/api/appkit/typedoc-sidebar.ts Adds new Jobs docs entries to sidebar.
docs/docs/api/appkit/*.md Adds generated API docs pages for Jobs plugin types.
docs/docs/api/appkit/index.md Adds Jobs-related types to API index list.
docs/docs/api/appkit/Interface.BasePluginConfig.md Updates “Extended by” list to include IJobsConfig.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/appkit/src/plugins/jobs/plugin.ts Outdated
Comment thread packages/appkit/src/plugins/jobs/plugin.ts Outdated
Comment thread packages/appkit/src/plugins/jobs/plugin.ts Outdated
Comment thread packages/appkit/src/plugins/jobs/plugin.ts Outdated
Comment thread packages/appkit/src/plugins/jobs/plugin.ts Outdated
Comment thread packages/appkit/src/plugins/jobs/types.ts
Comment thread packages/appkit/src/connectors/jobs/client.ts
@atilafassina atilafassina changed the title feat: add resource-scoped jobs plugin for Databricks Lakeflow Jobs feat: Lakeflow Jobs Plugin Apr 10, 2026
@pkosiec
Copy link
Copy Markdown
Member

pkosiec commented Apr 13, 2026

How it is different from @keugenek PR here: #221?

If you based on top of Evgenii's work, maybe it's worth to recognize his contribution? What I'm thinking is either:

  • have 2 commits on this PR: Evgenii's base one and yours (with all changes, PR review fixes etc.), and temporarily enable the "Rebase & Merge" option to have 2 commits on main
  • or, merge Evgenii's PR as is, right before this PR (so we'd need to wait for this PR to be approved first)

WDYT?

@atilafassina
Copy link
Copy Markdown
Contributor Author

atilafassina commented Apr 14, 2026

@pkosiec

How it is different from @keugenek PR here: #221?
If you based on top of Evgenii's work, maybe it's worth to recognize his contribution? What I'm thinking is either:

  • have 2 commits on this PR: Evgenii's base one and yours (with all changes, PR review fixes etc.), and temporarily enable the "Rebase & Merge" option to have 2 commits on main
  • or, merge Evgenii's PR as is, right before this PR (so we'd need to wait for this PR to be approved first)

WDYT?

I got stuck trying to update his PR because it was opened from a fork, it's noted in the PR description:

Reopened from #223 — moved head branch from fork to upstream to fix CI secrets access.

So, I couldn't get a clean CI to merge it, and if I merged to main, it would trigger a release.
You'll notice that in the commit history the very first commit is his, that's his work that I iterated on top of.

It's also marked as a comment on the original PR

#223 (comment)

@pkosiec
Copy link
Copy Markdown
Member

pkosiec commented Apr 14, 2026

So, I couldn't get a clean CI to merge it, and if I merged to main, it would trigger a release.

It wouldn't trigger a release currently (finalizing releases is manual) so there shouldn't be a problem with that 👍 but I'd wait until your PR (on top of this one) is ready to merge, and I'd merge both one after another. What do you think?

@atilafassina atilafassina force-pushed the ekniazev/jobs-plugin-core branch from 493146a to 791b146 Compare April 14, 2026 15:17
@atilafassina atilafassina requested a review from pkosiec April 14, 2026 15:19
@atilafassina atilafassina force-pushed the ekniazev/jobs-plugin-core branch 4 times, most recently from 040f7bd to d8133b9 Compare April 14, 2026 19:10
Copy link
Copy Markdown
Collaborator

@MarioCadenas MarioCadenas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments, but I'm actually wondering if this should just be a "jobs" plugin or if we should extend this and make it support both jobs and pipelines tbh 😅

Comment thread packages/appkit/src/plugins/jobs/plugin.ts
Comment thread packages/appkit/src/plugins/jobs/plugin.ts
Comment thread packages/appkit/src/plugins/jobs/plugin.ts Outdated
Comment thread packages/appkit/src/plugins/jobs/plugin.ts
Comment thread packages/appkit/src/plugins/jobs/plugin.ts Outdated
Comment thread packages/appkit/package.json Outdated
@MarioCadenas
Copy link
Copy Markdown
Collaborator

a few more things I noticed going through the core package:


1. listRuns cache key probably never hits cache

in plugin.ts, listRuns uses options ?? {} in the cache key:

self._readSettings(["jobs:listRuns", jobKey, options ?? {}])

when options is undefined (which is common), this creates a new {} every time — so if the cache compares by reference, every call is a miss and the 60s TTL does nothing. the other read methods get this right with primitives (["jobs:getRun", jobKey, runId]). can we normalize this to something like ["jobs:listRuns", jobKey, options?.limit ?? "default"]?


2. mapParams silently coerces non-primitives to garbage

in params.ts, the notebook/python_wheel/sql cases do String(v) on every value:

notebook_params: Object.fromEntries(
  Object.entries(params).map(([k, v]) => [k, String(v)]),
),

if someone passes an object or array through (e.g. with a loose zod schema like z.record(z.unknown())), String(v) produces "[object Object]" or "1,2,3" — the job would run with garbage params and no error. can we at least add a guard that rejects non-primitive values here, or coerce more defensively?


3. polling in runAndWait — no backoff or jitter

I know I already asked if we can simplify runAndWait, but specifically the poll interval is fixed at 5s forever. with the default 10min timeout that's 120 API calls per waiter, and if you have multiple SSE clients polling the same run they all hit the Jobs API at the same cadence with no jitter. should we add some basic exponential backoff (e.g. 2s → 5s → 10s → cap at 15s) and a small random jitter so concurrent pollers don't align?

Copy link
Copy Markdown
Member

@pkosiec pkosiec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments 👍 Nice work!

Comment thread .claude/skills/install-appkit-tarball.md Outdated
Comment thread .claude/skills/install-appkit-tarball.md Outdated
Comment thread packages/appkit/src/plugins/jobs/index.ts
@atilafassina atilafassina force-pushed the ekniazev/jobs-plugin-core branch 4 times, most recently from c379239 to b6cd0cf Compare April 21, 2026 12:52
Jobs are configured as named resources (DATABRICKS_JOB_<KEY> env vars)
and discovered at startup, following the files plugin pattern.

API is scoped to configured jobs:
  appkit.jobs('etl').runNow()
  appkit.jobs('etl').runNowAndWait()
  appkit.jobs('etl').lastRun()
  appkit.jobs('etl').listRuns()
  appkit.jobs('etl').asUser(req).runNow()

Single-job shorthand via DATABRICKS_JOB_ID env var.
Supports OBO access via asUser(req).

Co-authored-by: Isaac
Signed-off-by: Evgenii Kniazev <evgenii.kniazev@databricks.com>
@atilafassina atilafassina force-pushed the ekniazev/jobs-plugin-core branch from 65bcd39 to e5e2427 Compare April 21, 2026 13:32
@atilafassina
Copy link
Copy Markdown
Contributor Author

I added some comments, but I'm actually wondering if this should just be a "jobs" plugin or if we should extend this and make it support both jobs and pipelines tbh 😅

Adding this here just for historical reference.
We synced on this and decided to ship them as separate plugins initially and observe usage first.

@atilafassina atilafassina force-pushed the ekniazev/jobs-plugin-core branch 2 times, most recently from 96c9014 to 7672564 Compare April 22, 2026 14:25
@@ -1,5 +1,6 @@
export * from "./files";
export * from "./genie";
export * from "./jobs";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL if that makes sense 🙏

Code Review Results

Scope: merge-base 422afb3..working tree (42 files, ~5200 lines changed, ~1900 non-test/non-generated)
Intent: Add a Lakeflow Jobs plugin following the resource-scoped pattern with env-var discovery, HTTP endpoints, SSE streaming, Zod parameter validation, OBO support, and a StreamManager abort-on-disconnect fix.
Mode: interactive

Reviewers: correctness, testing, maintainability, project-standards, agent-native-reviewer, learnings-researcher, security, api-contract, reliability, adversarial, kieran-typescript, previous-comments

  • security -- new HTTP endpoints accept user-provided jobKey, params, runId; OBO handling
  • api-contract -- 5 new REST endpoints, new exported TypeScript types
  • reliability -- polling loops, abort signals, timeouts, StreamManager abort fix
  • adversarial -- 1900+ non-test changed lines, external API integration, data mutations
  • kieran-typescript -- substantial TypeScript with async generators, closures, generics
  • previous-comments -- PR has 20 comments and 7 reviews

P1 -- High

# File Issue Reviewer Confidence Route
1 plugins/jobs/tests/plugin.test.ts No test coverage for POST /:jobKey/run?stream=true — entire SSE streaming code path untested testing 0.95 safe_auto -> review-fixer
2 plugins/jobs/plugin.ts:653 GET /runs/:runId returns raw result.data while POST /run, GET /runs, GET /status all use envelope wrappers — inconsistent API surface api-contract 0.85 gated_auto -> human

P2 -- Moderate

# File Issue Reviewer Confidence Route
3 plugins/jobs/plugin.ts:191 Job ID 0 treated as missing: if (!id) falsy check rejects id=0 — should use id === undefined correctness 0.95 safe_auto -> review-fixer
4 plugins/jobs/plugin.ts:520 Double Zod validation: _parseRunParams validates then returns raw params, _validateAndMap re-validates — Zod transforms run twice, non-idempotent transforms produce wrong output adversarial, kieran-typescript 0.88 manual -> downstream-resolver
5 docs/docs/plugins/jobs.md:170 GET /runs/:runId response shape undocumented — other endpoints document returns explicitly api-contract 0.90 safe_auto -> review-fixer
6 plugins/jobs/tests/plugin.test.ts Missing tests for SKIPPED and INTERNAL_ERROR terminal states in runAndWait testing 0.85 safe_auto -> review-fixer
7 plugins/jobs/tests/plugin.test.ts Missing test for runAndWait polling timeout branch (line 337–341) testing 0.80 safe_auto -> review-fixer
8 plugins/jobs/plugin.ts:360 Undefined life_cycle_state causes polling to continue silently until timeout — no log/warning for unrecognized states adversarial, kieran-typescript 0.82 advisory -> human
9 template/appkit.plugins.json:93 Template resource description too verbose per reviewer feedback — pkosiec requested shortening and removing env var details previous-comments 0.75 safe_auto -> review-fixer

P3 -- Low

# File Issue Reviewer Confidence Route
10 plugins/jobs/plugin.ts:110 Case sensitivity mismatch: env discovery lowercases keys but HTTP routing is case-sensitive — /ETL/run returns 404 for job discovered as etl adversarial 0.75 advisory -> human
11 plugins/jobs/plugin.ts:754 return resolveJob as JobsExport uses type cast instead of typed assignment — weakens type checking kieran-typescript 0.73 advisory -> human
12 plugins/jobs/plugin.ts:639 Number.parseInt silently truncates "42abc" to 42 — no strict numeric validation on runId adversarial 0.68 advisory -> human
13 plugins/jobs/plugin.ts:332 Dual timeout: polling loop timeout (waitTimeout) and execute interceptor timeout (JOBS_STREAM_DEFAULTS) are independent, could confuse debugging reliability 0.68 advisory -> human
14 jobs.route.tsx / JobsPage.tsx Duplicated frontend code (~287 lines each) between dev-playground route and template page maintainability 0.82 advisory -> human

Pre-existing Issues

# File Issue Reviewer
1 plugin/interceptors/retry.ts:68 sleep() uses plain setTimeout without abort signal awareness — retry backoff blocks abort propagation reliability

Agent-Native Gaps

  • No getRunOutput HTTP route: JobAPI.getRunOutput() exists programmatically but has no HTTP endpoint — agents using HTTP can't fetch run output
  • Generic 400 error messages: Parameter validation failures return "Invalid job parameters" without Zod error details — agents can't self-correct based on the error
  • No job config discovery endpoint: Available job keys, parameter schemas, and task types are only exposed to the frontend via clientConfig() — no HTTP route for agents to discover what jobs are available

Coverage

  • Suppressed: 3 findings below 0.60 confidence
  • Residual risks: No rate limiting on job endpoints; TOCTOU window between verifyRunScope and actual operation; prototype pollution not explicitly guarded in unvalidated params
  • Testing gaps: No integration test for SSE streaming with client disconnect; no test for Zod transforms with side effects; no multi-client SSE reconnection test
  • Failed reviewers: 0 of 12

Verdict: Ready with fixes

Reasoning: Well-designed plugin that follows established patterns. The security surface is clean (input sanitization, resource scoping, error message redaction). The main concerns are: (1) missing streaming test coverage for a critical code path, (2) a minor API consistency issue with unwrapped GET response, and (3) the double Zod validation pattern that could break with transform schemas. No critical bugs or security vulnerabilities found.

Fix order: #1 streaming test gap → #3 job ID falsy check → #9 template description → #5/#6/#7 remaining test gaps → #2 response envelope (if desired, this is a design decision)

analytics(),
genie(),
files(),
jobs(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did apps init --template (from your PR and did immediate deploy.
Firstly, it failed because it couldn't install npm packages 🤔 not sure why? I deleted the package-lock and the deploy succeeded, but I cannot use my app:


[appkit:plugin] Plugin execution failed {   error: ApiError: Provided OAuth token does not have required scopes: jobs [ReqId: 716c9e46-6921-4438-b856-2366ffa51ae2]       at parseErrorFromResponse (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/apierr.js:100:12)       at fn (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/api-client.js:193:69)       at process.processTicksAndRejections (node:internal/process/task_queues:105:5)       at async retry (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/retries/retries.js:95:22)       at async ApiClient.request (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/api-client.js:168:26)       at async JobsService._listRuns (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/apis/jobs/api.js:606:17)       at async JobsService.listRuns (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/apis/jobs/api.js:620:30)       at async file:///app/python/source_code/node_modules/@databricks/appkit/dist/connectors/jobs/client.js:60:21       at async telemetry.startActiveSpan.name (file:///app/python/source_code/node_modules/@databricks/appkit/dist/connectors/jobs/client.js:83:20)       at async file:///app/python/source_code/node_modules/@databricks/appkit/dist/plugin/interceptors/telemetry.js:29:20 {     errorCode: 403,     statusCode: 403,     response: {       error_code: 403,       message: 'Provided OAuth token does not have required scopes: jobs [ReqId: 716c9e46-6921-4438-b856-2366ffa51ae2]'     },     details: [],     errorInfoType: 'type.googleapis.com/google.rpc.ErrorInfo'   },   plugin: 'jobs' }
--
Apr 23, 2026, 03:28:09 PM GMT+2 | APP | ...77d18cf | [appkit:plugin] Plugin execution failed {   error: ApiError: Provided OAuth token does not have required scopes: jobs [ReqId: 05fc24d4-686e-462f-8469-adc1c2b0da93]       at parseErrorFromResponse (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/apierr.js:100:12)       at fn (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/api-client.js:193:69)       at process.processTicksAndRejections (node:internal/process/task_queues:105:5)       at async retry (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/retries/retries.js:95:22)       at async ApiClient.request (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/api-client.js:168:26)       at async JobsService._runNow (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/apis/jobs/api.js:822:17)       at async JobsService.runNow (/app/python/source_code/node_modules/@databricks/appkit/node_modules/@databricks/sdk-experimental/dist/apis/jobs/api.js:836:32)       at async telemetry.startActiveSpan.name (file:///app/python/source_code/node_modules/@databricks/appkit/dist/connectors/jobs/client.js:83:20)       at async file:///app/python/source_code/node_modules/@databricks/appkit/dist/plugin/interceptors/telemetry.js:29:20       at async TimeoutInterceptor.intercept (file:///app/python/source_code/node_modules/@databricks/appkit/dist/plugin/interceptors/timeout.js:17:11) {     errorCode: 403,     statusCode: 403,     response: {       error_code: 403,       message: 'Provided OAuth token does not have required scopes: jobs [ReqId: 05fc24d4-686e-462f-8469-adc1c2b0da93]'     },     details: [],     errorInfoType: 'type.googleapis.com/google.rpc.ErrorInfo'   },   plugin: 'jobs' }


Image

Do you know why? Can you reproduce it? Here's my job: https://dogfood.staging.databricks.com/apps-v2/app/pkjobs2/overview?o=6051921418418893

Image

HTTP and SDK surface:
- injectRoutes: POST /:jobKey/run (JSON or SSE via ?stream=true),
  GET /:jobKey/runs, GET /:jobKey/runs/:runId, GET /:jobKey/status,
  DELETE /:jobKey/runs/:runId
- runAndWait streams JobRunStatus updates with abortable polling,
  terminal-state detection, and exponential backoff with jitter
- _parseRunParams eagerly validates request bodies so streaming
  requests get a clean 400 instead of a generic SSE error
- Guard against raw params on jobs with neither a schema nor a taskType
- Execution interceptors wired via _readSettings / _writeSettings;
  stream defaults for SSE
- Tests covering discovery, validation, routes, error propagation, OBO

Cleanup and polish:
- Mark JobsPlugin export @internal (test-only access)
- Shorten manifest job ID field description; env var is already declared
  via the resource field's env key
- Remove install-appkit-tarball skill superseded upstream by
  install-appkit-artifact

Playground + template:
- Buffer incomplete SSE lines across reads so data: events split across
  chunks are not dropped
- Cap stream log at 500 entries to prevent unbounded growth
- Use counter-based keys on each log entry to avoid React key collisions

Co-authored-by: Isaac
Signed-off-by: Atila Fassina <atila@fassina.eu>
@atilafassina atilafassina force-pushed the ekniazev/jobs-plugin-core branch from 7672564 to 2956f21 Compare April 23, 2026 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants