Clarify or fix composite filter boolean semantics #23

@jpetey75

Problem

The SDK currently exposes filter composition with &, |, and repeated .filter() calls, but the behavior for mixed boolean logic does not preserve the semantics users would reasonably expect from the Python expression.

This came up while reviewing #22. That PR correctly enables multiple rules on the same field for flat filters, but in doing so it also makes the broader filter API look more trustworthy for range and cohort workflows than it currently is. The remaining issue is that mixed AND/OR expressions can silently serialize to a different expression from the one a data scientist wrote.

Repro cases

Repeated .filter() calls after an OR composite

The query docs say multiple .filter() calls are combined with AND logic:

query = (
    model.query()
    .filter((model.dimensions.status == "active") | (model.dimensions.status == "pending"))
    .filter(model.dimensions.status != "deleted")
)

Expected logical shape:

(status = active OR status = pending) AND status != deleted

Current behavior preserves the existing OR aggregation when appending a single filter, so the new rule joins the OR group instead of being AND-ed with it. This produces the equivalent of:

status = active OR status = pending OR status != deleted

The relevant path is Query.filter() when self._filters already exists and the new filter is a DimensionFilter.
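
The difference between the two shapes can be illustrated with a minimal sketch. It does not use the SDK's real classes: `Rule`, `Group`, `append_buggy`, `append_with_and`, and `render` are all hypothetical names made up for this illustration, and the actual `Query.filter()` internals may look different.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a single-field filter such as DimensionFilter."""
    field: str
    op: str
    value: str


@dataclass(frozen=True)
class Group:
    """Hypothetical stand-in for a composite filter with one aggregation."""
    aggregation: str  # "and" or "or"
    children: tuple


Node = Union[Rule, Group]


def append_buggy(existing: Group, new: Rule) -> Group:
    # Mirrors the reported behavior: the new rule inherits the existing aggregation.
    return Group(existing.aggregation, existing.children + (new,))


def append_with_and(existing: Node, new: Node) -> Group:
    # Documented chaining semantics: the result is always (existing AND new).
    if isinstance(existing, Group) and existing.aggregation == "and":
        # Extending an AND group is safe because AND is associative.
        return Group("and", existing.children + (new,))
    # Anything else (a single rule or an OR group) is kept intact as one operand.
    return Group("and", (existing, new))


def render(node: Node) -> str:
    """Render a filter tree as a parenthesized boolean expression."""
    if isinstance(node, Rule):
        return f"{node.field} {node.op} {node.value}"
    joiner = f" {node.aggregation.upper()} "
    return "(" + joiner.join(render(child) for child in node.children) + ")"


active_or_pending = Group(
    "or",
    (Rule("status", "=", "active"), Rule("status", "=", "pending")),
)
not_deleted = Rule("status", "!=", "deleted")

print(render(append_buggy(active_or_pending, not_deleted)))
# (status = active OR status = pending OR status != deleted)
print(render(append_with_and(active_or_pending, not_deleted)))
# ((status = active OR status = pending) AND status != deleted)
```

Wrapping the existing OR group as a single operand of a new AND group is only expressible if the serialized payload can nest groups, which ties this case to the nested-expression case below.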

Nested boolean expressions flatten precedence

The docs currently show a complex filter example like:

f = (
    (model.dimensions.country == "USA") &
    ((model.dimensions.amount > 1000) | (model.dimensions.priority == "high"))
)

Expected logical shape:

country = USA AND (amount > 1000 OR priority = high)

Current behavior flattens all three rules into a single group with one aggregation, so the nested precedence is not represented in the serialized payload.
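
To make the loss of structure concrete, here is a rough sketch that contrasts a nesting-preserving serializer with a flattening one. Everything in it is an assumption for illustration: the `Expr`, `Rule`, and `Composite` classes, the operator names such as `equals` and `greaterThan`, and the `{"and": [...]}` / `{"or": [...]}` payload keys are hypothetical and are not taken from the SDK or confirmed against the Lightdash API.

```python
from dataclasses import dataclass


class Expr:
    """Hypothetical base class giving filters & and | composition."""

    def __and__(self, other):
        return Composite("and", (self, other))

    def __or__(self, other):
        return Composite("or", (self, other))


@dataclass(frozen=True)
class Rule(Expr):
    field: str
    operator: str
    value: object

    def to_dict(self):
        return {"field": self.field, "operator": self.operator, "value": self.value}


@dataclass(frozen=True)
class Composite(Expr):
    aggregation: str  # "and" or "or"
    children: tuple

    def to_dict(self):
        # Each composite keeps its own aggregation key, so nesting survives.
        return {self.aggregation: [child.to_dict() for child in self.children]}


def flatten(node):
    """Lossy: collect every leaf rule and discard the grouping."""
    if isinstance(node, Rule):
        return [node.to_dict()]
    return [leaf for child in node.children for leaf in flatten(child)]


f = Rule("country", "equals", "USA") & (
    Rule("amount", "greaterThan", 1000) | Rule("priority", "equals", "high")
)

nested = f.to_dict()
print(list(nested))               # ['and']
print(list(nested["and"][1]))     # ['or']
print(len(flatten(f)))            # 3
```

In the nested form the OR group remains a single operand of the outer AND. In the flattened form only three sibling rules remain, and nothing records which of them were OR-ed together.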

Why this matters

For exploratory analysis and notebook workflows, users often write cohort filters, date windows, and exclusion rules in Python expressions. If the SDK silently changes OR/AND semantics, a query can return a materially different population without an obvious failure. That is especially risky for data scientists using this SDK to build analysis datasets.

Possible resolutions

  1. Implement nested composite serialization if the Lightdash API supports nested boolean filter groups.
  2. If the API only supports flat and/or groups, reject mixed nested expressions early with a clear error.
  3. Update docs to avoid advertising unsupported complex combinations until the behavior is implemented.
  4. Fix repeated .filter() calls so they are actually AND-ed, or document the exact current behavior if preserving aggregation is intentional.
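
For option 2, an early validation step could look roughly like the sketch below. It is not based on the SDK's code: the tree representation (leaf strings and `("and" | "or", [children])` tuples), the `ensure_flat` helper, and the `UnsupportedFilterError` class are hypothetical and only show the shape of the check.

```python
class UnsupportedFilterError(ValueError):
    """Hypothetical error for expressions a flat payload cannot represent."""


def ensure_flat(node):
    """Return (aggregation, leaves), or raise if the tree mixes AND with OR.

    A node is either a leaf string or a tuple ("and" | "or", [children]).
    """
    if isinstance(node, str):
        # A lone rule is trivially flat; treat it as a one-element AND group.
        return "and", [node]

    aggregation, children = node
    leaves = []
    for child in children:
        if isinstance(child, str):
            leaves.append(child)
            continue
        child_aggregation, child_leaves = ensure_flat(child)
        if child_aggregation != aggregation:
            raise UnsupportedFilterError(
                f"cannot nest a {child_aggregation.upper()} group inside "
                f"an {aggregation.upper()} group in a flat filter payload"
            )
        # Same aggregation on both levels: merging does not change the meaning.
        leaves.extend(child_leaves)
    return aggregation, leaves


print(ensure_flat(("and", ["a", ("and", ["b", "c"])])))
# ('and', ['a', 'b', 'c'])

try:
    ensure_flat(("and", ["country = USA", ("or", ["amount > 1000", "priority = high"])]))
except UnsupportedFilterError as exc:
    print(exc)
# cannot nest a OR group inside an AND group in a flat filter payload
```

Raising at composition or serialization time turns a silent change of population into an explicit failure, which is the main point of this option. The check is deliberately conservative: it also rejects a nested single-rule group with a different aggregation, even though that case is logically harmless.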

Acceptance criteria

  • Mixed AND/OR filter expressions either serialize with correct precedence or raise an explicit unsupported-operation error.
  • Multiple .filter() calls always behave as documented, especially when the first filter is an OR composite.
  • Tests cover:
    • query.filter(a | b).filter(c) style chaining through Query.filter().
    • a & (b | c) nested expression behavior.
    • The documented complex combination example.
  • docs/SDK_GUIDE.md matches the actual supported filter semantics.
