1278 recursive databricks and spark checks and custom queries #1311
Draft
rob-h-w wants to merge 4 commits into
To the Contribution > Troubleshooting section, because I ran into that issue after checking out.
This is the beginning of an attempt to support ODCS' nested constraint and quality definition capabilities in `test` subcommands, where the underlying technology supports it. See [the relevant issue](datacontract#1278).

Add recursive traversal of nested struct and array-of-struct fields when generating ibis quality checks, scoped to verified backends only.

Check generation (create_checks.py):
- Add _iter_property_paths() to recursively yield (model, field_path, prop, is_nested) tuples for nested struct fields and array item models
- Struct recursion enabled for dataframe and databricks; array recursion for dataframe only
- Nested SQL quality checks emit MetricType.UNSUPPORTED with a warning preset on all other backends
- Use get_server_type() instead of server.type so imported DCS contracts with type="custom" are resolved correctly

Check execution (ibis_check_execute.py):
- Add _resolve_expr() / _resolve_nested_expr() for dotted-path ibis expressions
- Add _resolve_dtype() / _field_present() for nested schema introspection
- Update _run_present() to reuse the already-resolved model schema rather than re-fetching the table, fixing a case-sensitivity failure on Oracle
- Update _run_type(), _run_freshness(), _run_duplicate(), _missing_expr(), _valid_expr(), _invalid_expr(), _samples_for() to accept resolved expressions instead of bare column names

Spark temp view materialisation (kafka.py, connect.py):
- Add add_spark_nested_views() to create {model}__{field} Spark temp views for nested struct fields and exploded array-of-struct items
- Call add_spark_nested_views_for_contract() in the dataframe and Databricks-via-Spark connection paths before creating the ibis pyspark backend

Tests:
- tests/fixtures/dataframe/datacontract_nested.yaml: nested struct + array fixture
- tests/test_create_checks_nested.py: unit tests for recursive generation and backend gating
- tests/test_ibis_check_execute.py: regression tests for Oracle-style presence check without extra table lookup
- tests/test_test_dataframe.py: Spark integration pass/fail for nested struct, nested SQL quality, and array-item checks
- tests/test_test_databricks.py: unit test confirming nested struct SQL enabled and array recursion suppressed for Databricks

Created with Claude Sonnet 4.6.
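For readers following the check-generation change, here is a minimal sketch of what a recursive traversal yielding (model, field_path, prop, is_nested) tuples could look like. It is not the code in create_checks.py: the plain-dict schema shape, the function name `iter_property_paths`, and the `recurse_arrays` flag are assumptions made for illustration. Only the tuple shape and the `{model}__{field}` naming of array item models come from the commit message.

```python
from typing import Any, Iterator

# Hypothetical schema shape: each property is a plain dict. The real
# implementation works on the contract's own model classes.
Prop = dict[str, Any]


def iter_property_paths(
    model: str,
    properties: list[Prop],
    prefix: str = "",
    nested: bool = False,
    recurse_arrays: bool = True,
) -> Iterator[tuple[str, str, Prop, bool]]:
    """Yield (model, field_path, prop, is_nested) for every property."""
    for prop in properties:
        path = f"{prefix}.{prop['name']}" if prefix else prop["name"]
        yield model, path, prop, nested
        if prop.get("type") == "object":
            # Struct: stay in the same model and extend the dotted path.
            yield from iter_property_paths(
                model, prop.get("properties", []), path, True, recurse_arrays
            )
        elif prop.get("type") == "array" and recurse_arrays:
            items = prop.get("items") or {}
            if items.get("type") == "object":
                # Array of structs: switch to a derived item model and
                # restart the path, e.g. orders + items -> orders__items.
                item_model = f"{model}__{path.replace('.', '__')}"
                yield from iter_property_paths(
                    item_model, items.get("properties", []), "", True, recurse_arrays
                )


orders = [
    {"name": "id", "type": "string"},
    {"name": "address", "type": "object",
     "properties": [{"name": "city", "type": "string"}]},
    {"name": "items", "type": "array",
     "items": {"type": "object",
               "properties": [{"name": "sku", "type": "string"}]}},
]

paths = [(m, p, n) for m, p, _, n in iter_property_paths("orders", orders)]
```

Passing `recurse_arrays=False` in this sketch mirrors the backend gating described above, where array recursion is only enabled for backends that have been verified.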
…atacontract#1278)

Implement support for recursive nested struct and array checks on Databricks, with zero-permission requirements (SELECT-only). This enables data contract validation on read-only SQL warehouses without requiring CREATE VOLUME or CREATE TABLE permissions.

Changes:
- New module `databricks_nested_models.py`: CTE-based virtual model generation for array item checks. Uses `LATERAL VIEW OUTER explode_outer()` to expose nested array elements as queryable tables without creating real volumes.
- Modified `_connect_databricks()` in `connect.py`: Introduced `_NoVolumeBackend` subclass that overrides `_post_connect()` with a no-op, bypassing ibis' default `CREATE VOLUME IF NOT EXISTS` call. Connection succeeds on read-only warehouses.
- Updated `connect_ibis()` Databricks branch: Builds and attaches virtual model CTE queries to the backend connection for downstream table resolution.
- Enabled array recursion for Databricks: Added "databricks" to `_SUPPORTED_NESTED_ARRAY_SERVER_TYPES` in `create_checks.py`, matching feature parity with Dataframe backend.
- Enhanced `_resolve_table()` in `ibis_check_execute.py`: Falls back to virtual model CTE queries before attempting list_tables(), allowing nested array models (e.g., `orders__items`) to resolve via pre-built WITH clauses.
- Test updates: Rewrote Databricks auth tests to patch the correct backend method, added `test_no_create_volume_on_connect` to verify volume creation is skipped, flipped nested array expectations to enable checks on array items.
- New test file `test_connect_databricks_virtual_models.py`: Unit tests for CTE query generation and schema filtering logic.

Result: 52 real-world data contract checks now pass against Databricks without any CREATE/WRITE operations. Recursive struct checks (dotted paths) and recursive array item checks (CTE virtual models) both fully supported.
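To make the CTE approach concrete, the following is a rough sketch of how a SELECT-only virtual model query for an array-of-struct field might be built. It is not taken from `databricks_nested_models.py`: the function name `build_array_item_query`, the aliases `exploded` and `item`, and the identifier check are all assumptions for illustration. The `LATERAL VIEW OUTER explode_outer()` construct and the `orders__items` naming are the parts that come from the commit message.

```python
import re

# Conservative identifier check so the sketch never interpolates
# arbitrary text into SQL (an assumption, not the PR's validation).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_array_item_query(table: str, array_field: str) -> tuple[str, str]:
    """Return (virtual_model_name, sql) exposing array items as rows."""
    for name in (table, array_field):
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    virtual_name = f"{table}__{array_field}"
    sql = (
        f"WITH `{virtual_name}` AS (\n"
        f"  SELECT item.*\n"
        f"  FROM `{table}`\n"
        # One output row per array element; OUTER keeps rows whose
        # array is empty or NULL, with NULL item fields.
        f"  LATERAL VIEW OUTER explode_outer(`{array_field}`) exploded AS item\n"
        f")\n"
        f"SELECT * FROM `{virtual_name}`"
    )
    return virtual_name, sql


name, query = build_array_item_query("orders", "items")
```

Because the whole virtual model lives in a WITH clause, running such a query needs only SELECT on the source table, which is the property the read-only warehouse case depends on.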
Per the PR template.
Supports recursive Databricks and Spark checks and custom queries. Replaces the ibis implementation that requires write permissions on connection, until a new release of ibis stops doing that.
- (`uv run pytest`)
- (`uv run ruff check --fix && uv run ruff format`)
- I don't see a `README.md` update that's in scope.