Skip to content

[fix](fe) Disable row-store lazy fetch for shared base columns#62864

Open
englefly wants to merge 1 commit intoapache:masterfrom
englefly:variant-row-store
Open

[fix](fe) Disable row-store lazy fetch for shared base columns#62864
englefly wants to merge 1 commit intoapache:masterfrom
englefly:variant-row-store

Conversation

@englefly
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Prevent topn lazy materialization from using row-store fetch when multiple lazy slots share the same base column, such as variant root/subcolumn accesses. This avoids wrong results in the non-two-phase row-store path.

Release note

None

Check List (For Author)

  • Test: No need to test (waiting for manual confirmation)
  • Behavior changed: Yes (lazy materialization now falls back from row-store fetch to column fetch when lazy slots share a base column)
  • Does this need documentation: No

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Prevent topn lazy materialization from using row-store fetch when multiple lazy slots share the same base column, such as variant root/subcolumn accesses. This avoids wrong results in the non-two-phase row-store path.

### Release note

None

### Check List (For Author)

- Test: No need to test (waiting for manual confirmation)
- Behavior changed: Yes (lazy materialization now falls back from row-store fetch to column fetch when lazy slots share a base column)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly
Copy link
Copy Markdown
Contributor Author

/review

@englefly
Copy link
Copy Markdown
Contributor Author

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found 2 blocking issues.

  1. PhysicalPlanTranslator.canUseRowStoreForLazySlots() still allows row-store fetch for a single lazy variant subcolumn. BE row-store decoding keys only by col_unique_id and ignores column_paths(), so a plan that lazily materializes only kv[ssl] still returns the variant root instead of the subcolumn.
  2. The legacy defer-materialize / two-phase-read path is still unfixed. Changing enableTwoPhaseReadOpt to false only hides the problem for the default session; if a session explicitly enables it and reaches DeferMaterializeTopNResult, OlapTable.generateTwoPhaseReadOption() still turns on row-store fetch without any slot-level safeguard.

Critical checkpoints:

  • Goal / correctness: Partially achieved. The duplicate-base-column case in PhysicalLazyMaterialize is covered, but the row-store correctness hole is not closed end to end.
  • Scope / minimality: The translator change is small, but the session-variable default flip broadens behavior without fixing the legacy path.
  • Concurrency: No new concurrency or locking issues found.
  • Lifecycle / static init: No special lifecycle issues found.
  • Config behavior: Existing config enable_two_phase_read_opt changes default behavior, but the underlying legacy path remains reachable when explicitly enabled.
  • Parallel paths: Not complete. The same safeguard is still missing from the legacy DeferMaterializeTopNResult / generateTwoPhaseReadOption() path.
  • Tests: Added helper-level unit coverage only. Missing coverage for an actual lazy variant subcolumn plan and for the legacy path.
  • Test results / persistence / writes / observability / FE-BE variable passing: No additional issues found here.

User focus: no additional review focus was provided.

List<Boolean> rowStoreFlags = new ArrayList<>();
for (Relation relation : materialize.getRelations()) {
rowStoreFlags.add(shouldUseRowStore(relation));
rowStoreFlags.add(shouldUseRowStore(relation, materialize.getLazySlots(relation)));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not sufficient for variant subcolumns. LogicalOlapScan builds kv[ssl] as a SlotReference with the root column unique id plus a non-empty subpath, so canUseRowStoreForLazySlots() still returns true when there is only one such lazy slot. But the BE row-store path (RowIdStorageReader::read_batch_doris_format_row() -> JsonbSerializeUtil::jsonb_to_block()) decodes only by col_unique_id and never consults column_paths(), so it materializes the whole variant root instead of the requested subcolumn. We need to disable row-store whenever any lazy slot has a subpath, not only when multiple slots share the same base column.

// 2. spawn fetch RPC to other nodes to get related data by sorted rowids
@VarAttrDef.VarAttr(name = ENABLE_TWO_PHASE_READ_OPT, fuzzy = true)
public boolean enableTwoPhaseReadOpt = true;
public boolean enableTwoPhaseReadOpt = false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default here only hides the bug for the default session; it does not fix the remaining legacy path. If a session explicitly enables enable_two_phase_read_opt and reaches DeferMaterializeTopNResult, visitPhysicalDeferMaterializeResultSink() still uses OlapTable.generateTwoPhaseReadOption(), which unconditionally enables row-store fetch for row-store tables. The same wrong-result case is still reachable there, so this needs the same slot-level guard in the legacy path instead of only a default flip.

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 66.67% (8/12) 🎉
Increment coverage report
Complete coverage report

return false;
}
Optional<Column> originalColumn = ((SlotReference) lazySlot).getOriginalColumn();
if (!originalColumn.isPresent() || !originalColumnUniqueIds.add(originalColumn.get().getUniqueId())) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's !originalColumnUniqueIds.add(originalColumn.get().getUniqueId()) mean? maybe need add comment to explain it

// 2. spawn fetch RPC to other nodes to get related data by sorted rowids
@VarAttrDef.VarAttr(name = ENABLE_TWO_PHASE_READ_OPT, fuzzy = true)
public boolean enableTwoPhaseReadOpt = true;
public boolean enableTwoPhaseReadOpt = false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wahy change the default value?

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 58.33% (7/12) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants