HIVE-29413: Avoid code duplication by updating getPartCols method for iceberg tables #6413
ramitg254 wants to merge 7 commits into apache:master
Conversation
@ramitg254 please take a look: 9e7535c. I would suggest following a similar approach.
But here we are creating a separate method getEffectivePartCols() and leaving getPartCols() as it is. As per our discussion on that closed PR, we shouldn't do that; we should only go ahead with updating getPartCols().
Where did I say that? The ask was to keep the original method unchanged. Same here.
Oh, I got confused due to this comment: #6337 (comment), in which getSupportedPartCols() was just a separate method, similar to getEffectivePartCols().
I am fine with that earlier approach as well, but recently I saw this one: https://issues.apache.org/jira/browse/HIVE-29525. So I thought we should have a unified getPartCols() and getCols() that give results similar to native Hive tables, as a first step towards solving this; the plan logic can be taken care of later, when that ticket is addressed. Please share your thoughts on this idea.
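To make the unification idea concrete, here is a minimal standalone sketch of what a single getPartCols() entry point might look like. The class, field, and method names below are hypothetical stand-ins, not the actual Hive API: `handlerPartCols` stands in for what the storage handler would return for a non-native (e.g. Iceberg) table, and `nativePartCols` for the Thrift table's partition keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a unified getPartCols(): for tables with non-native
// partition support (e.g. Iceberg) the partition keys come from the storage
// handler; for native tables they come from the Thrift table object.
class TableSketch {
  private final List<String> nativePartCols;   // stands in for tTable.getPartitionKeys()
  private final List<String> handlerPartCols;  // stands in for storageHandler.getPartitionKeys(this)
  private final boolean nonNativePartitionSupport;

  TableSketch(List<String> nativePartCols, List<String> handlerPartCols,
              boolean nonNativePartitionSupport) {
    this.nativePartCols = nativePartCols;
    this.handlerPartCols = handlerPartCols;
    this.nonNativePartitionSupport = nonNativePartitionSupport;
  }

  // Single entry point: callers no longer need to know whether the table is native.
  List<String> getPartCols() {
    if (nonNativePartitionSupport) {
      return handlerPartCols == null ? Collections.emptyList() : handlerPartCols;
    }
    return nativePartCols == null ? new ArrayList<>() : nativePartCols;
  }
}
```

The point of the sketch is only the dispatch: every call site keeps calling one method, and the native/non-native branching lives in exactly one place.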
-    List<String> partialPvals = MetaStoreUtils.getPvals(tbl.getPartCols(), partialPartSpec);
+    List<String> partialPvals = MetaStoreUtils.getPvals(tbl.getEffectivePartCols(), partialPartSpec);
     if (tbl.getDataLocation() != null) {
       Path partPath = new Path(tbl.getDataLocation(),
-          Warehouse.makePartName(tbl.getPartCols(),
+          Warehouse.makePartName(tbl.getEffectivePartCols(),
     ArrayList<ColumnInfo> partitionColumns = new ArrayList<ColumnInfo>();
-    for (FieldSchema part_col : viewTable.getPartCols()) {
-      colName = part_col.getName();
+    for (FieldSchema partCol : viewTable.getEffectivePartCols()) {
+      colName = partCol.getName();
-    List<String> pvals = new ArrayList<String>();
-    for (FieldSchema field : tbl.getPartCols()) {
+    List<String> pvals = new ArrayList<>();
+    for (FieldSchema field : tbl.getEffectivePartCols()) {
Do we have tests for that? Non-native tables use DummyPartition, don't they?
-    List<String> pvals = new ArrayList<String>();
-    for (FieldSchema field : table.getPartCols()) {
+    List<String> pvals = new ArrayList<>();
+    for (FieldSchema field : table.getEffectivePartCols()) {
  /**
   * These fields are all cached fields. The information comes from tTable.
   */
  private List<FieldSchema> cachedPartCols;
- maybe rename it to simply partitionCols, since it's not actually a cache?
- can we reuse tTable? t.setPartitionKeys?
- Yes, it can be renamed to partitionCols. It was added because, for Iceberg tables, getStorageHandler().getPartitionKeys() calls convertToIceberg, so too many metastore calls were made for a given running query; those calls sometimes led to timed-out exceptions, and other exceptions due to an outdated conf. It was added to avoid that, so it is not really a cache.
- I think we shouldn't setPartitionKeys on tTable for non-native tables, as partition evolution and other features are supported.
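The pattern described above, computing the partition keys once per Table object instead of hitting the metastore on every call, can be sketched roughly as follows. The names and the returned column list are simplified illustrations, not the real Hive code; the counter only exists to make the "computed at most once" behavior observable.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of memoizing an expensive partition-key lookup. In the
// real code the expensive call is storageHandler.getPartitionKeys(), which
// may convert the table to an Iceberg table and talk to the metastore.
class PartColsMemo {
  static final AtomicInteger expensiveCalls = new AtomicInteger();
  private List<String> partitionCols;  // computed at most once per instance

  private List<String> loadPartitionKeys() {
    expensiveCalls.incrementAndGet();  // stands in for a metastore round trip
    return List.of("event_date", "region");
  }

  List<String> getEffectivePartCols() {
    if (partitionCols == null) {
      partitionCols = loadPartitionKeys();
    }
    return partitionCols;
  }
}
```

Repeated calls to getEffectivePartCols() perform the expensive lookup only once per instance, which is the behavior the "not really a cache" field provides; a fresh Table object recomputes, so partition evolution is still picked up.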
    return cachedPartCols;
  }

  private boolean isTableTypeSet() {
-    f_list.addAll(getCols());
-    f_list.addAll(getPartCols());
-    return f_list;
+    ArrayList<FieldSchema> allCols = new ArrayList<>(getCols());
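For reference, the refactor in the hunk above amounts to concatenating the data columns and the partition columns into one list (presumably what getAllCols() returns). A standalone illustration, with made-up column names and String in place of FieldSchema:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration of getAllCols(): data columns followed by
// partition columns, in that order. Column names are made up.
class AllColsDemo {
  static List<String> getCols()     { return List.of("id", "name"); }
  static List<String> getPartCols() { return List.of("event_date"); }

  static List<String> getAllCols() {
    ArrayList<String> allCols = new ArrayList<>(getCols());
    allCols.addAll(getPartCols());
    return allCols;
  }
}
```

Having one getAllCols() lets call sites that previously built this list by hand reuse a single implementation, which is the deduplication this PR is after.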
     return hasNonNativePartitionSupport() ? getStorageHandler().isPartitioned(this) :
-        CollectionUtils.isNotEmpty(getPartCols());
+        CollectionUtils.isNotEmpty(getEffectivePartCols());
       org.apache.hadoop.hive.metastore.api.Partition tp) {

-    List<FieldSchema> fsl = getPartCols();
+    List<FieldSchema> fsl = getEffectivePartCols();
Do we need to change this? Tests? Does it duplicate IcebergTableUtil.getPartitionSpec?
     Table tab = cppCtx.getParseContext().getViewProjectToTableSchema().get(op);
-    List<FieldSchema> fullFieldList = new ArrayList<FieldSchema>(tab.getCols());
-    fullFieldList.addAll(tab.getPartCols());
+    List<FieldSchema> fullFieldList = new ArrayList<>(tab.getAllCols());
No need to wrap it in yet another list.
   private static List<PrimitiveTypeInfo> extractPartColTypes(Table tab) {
-    List<FieldSchema> pCols = tab.getPartCols();
+    List<FieldSchema> pCols = tab.getEffectivePartCols();
       usePartitionColumns(properties, partColNames);
     } else {
-      List<FieldSchema> partCols = table.getPartCols();
+      List<FieldSchema> partCols = table.getEffectivePartCols();
     }
     queryStr.append(',');
-    appendCols(targetTable.getPartCols(), alias, null, FieldSchema::getName);
+    appendCols(targetTable.getEffectivePartCols(), alias, null, FieldSchema::getName);
I don't think we need this; it might duplicate the columns.
   public void appendAcidSelectColumns(Operation operation) {
     queryStr.append("ROW__ID,");
-    for (FieldSchema fieldSchema : targetTable.getPartCols()) {
+    for (FieldSchema fieldSchema : targetTable.getEffectivePartCols()) {
It's definitely not needed for native tables.
   @Override
   public List<String> getDeleteValues(Operation operation) {
-    List<String> deleteValues = new ArrayList<>(1 + targetTable.getPartCols().size());
+    List<String> deleteValues = new ArrayList<>(1 + targetTable.getEffectivePartCols().size());
     //insert into newTableName select * from ts <where partition spec>
     StringBuilder rewrittenQueryStr = generateExportQuery(
-        newTable.getPartCols(), tokRefOrNameExportTable, (ASTNode) tokRefOrNameExportTable.parent, newTableName);
+        newTable.getEffectivePartCols(),
This is ACID; we don't need to touch it.
I did this because of #6413 (comment); if you think it can break things, I will switch it back to the old one.
     this.specType = SpecType.STATIC_PARTITION;
     this.partitions = partitions;
-    List<FieldSchema> partCols = this.tableHandle.getPartCols();
+    List<FieldSchema> partCols = this.tableHandle.getEffectivePartCols();
     if (isPartitionStats) {
       if (partTransformSpec == null) {
-        for (FieldSchema fs : tbl.getPartCols()) {
+        for (FieldSchema fs : tbl.getEffectivePartCols()) {
I don't think it's needed; partition columns are already part of the column list. Tests?
     {
       // check partitioning column order and types
-      List<FieldSchema> existingTablePartCols = table.getPartCols();
+      List<FieldSchema> existingTablePartCols = table.getEffectivePartCols();
     this.onClause = onClause;
     allTargetTableColumns.addAll(targetTable.getCols());
-    allTargetTableColumns.addAll(targetTable.getPartCols());
+    allTargetTableColumns.addAll(targetTable.getEffectivePartCols());
I don't think we need to change this; also, we can simplify it to allTargetTableColumns.addAll(targetTable.getAllCols()).
   private static int calculatePartPrefix(Table tbl, Set<String> partSpecKeys) {
     int partPrefixToDrop = 0;
-    for (FieldSchema fs : tbl.getPartCols()) {
+    for (FieldSchema fs : tbl.getEffectivePartCols()) {
Any tests covering this for Iceberg?
I am not aware of any; I did this because of #6413 (comment).
     } else {
       // partition spec is not specified but column schema can have partitions specified
-      for (FieldSchema f : targetTable.getPartCols()) {
+      for (FieldSchema f : targetTable.getEffectivePartCols()) {
Do we really need this? Tests?
     List<String> cols = new ArrayList<String>();
     if (qbp.getAnalyzeRewrite() != null) {
-      List<FieldSchema> partitionCols = tab.getPartCols();
+      List<FieldSchema> partitionCols = tab.getEffectivePartCols();
We don't even enter here; see the if above: !tab.hasNonNativePartitionSupport().
       }
     } else {
-      partColSchema.addAll(tbl.getPartCols());
+      partColSchema.addAll(tbl.getEffectivePartCols());
so many
@deniskuzZ I was replacing getPartCols() with getEffectivePartCols() in most places, as we should eventually move to this generic common method.
@ramitg254 I like the idea of having a single
Since you've already identified them, why not apply the
I was planning to, but updating getCols() alone will cause test failures for every q file that has a describe command for Iceberg tables. Query plans will also be affected, since the stats logic currently takes getCols() into account, and with around 90+ occurrences of it in the code it would lead to breakage as well. So I thought it would be better to take care of it as a separate change.
I guess that was the main intent: to integrate Iceberg partition handling into the existing code with minimal workarounds/code duplication. Maybe I'm missing something, but unfortunately I don't see much value in the current state of this PR, sorry. Let's see what Krisztian thinks about it.
What changes were proposed in this pull request?
Added getEffectivePartCols() and used it in most places possible to avoid code duplication.
Why are the changes needed?
getPartCols() does not have support for iceberg tables.
Does this PR introduce any user-facing change?
No
How was this patch tested?
CI tests and a local build.