HIVE-29413: Avoid code duplication by updating getPartCols method for iceberg tables #6413

Open
ramitg254 wants to merge 7 commits into apache:master from ramitg254:HIVE-29413

@ramitg254
Contributor

@ramitg254 ramitg254 commented Apr 7, 2026

What changes were proposed in this pull request?

Added getEffectivePartCols() and switched callers to it wherever possible to avoid code duplication.

Why are the changes needed?

getPartCols() does not support Iceberg tables.
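
For readers unfamiliar with the idea, the split between the two accessors can be sketched roughly as below. This is only an illustration under stated assumptions: `Table` and `StorageHandler` here are simplified, hypothetical stand-ins (column names as plain strings), not the real Hive classes, and the actual implementation in this PR may differ.

```java
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real Hive classes.
public final class EffectivePartColsSketch {

  /** Stand-in for a storage handler that knows its own partition layout. */
  public interface StorageHandler {
    List<String> getPartitionKeys();
  }

  public static final class Table {
    private final List<String> metastorePartCols; // partition keys kept in the metastore
    private final StorageHandler storageHandler;  // null for native tables (assumption)

    public Table(List<String> metastorePartCols, StorageHandler storageHandler) {
      this.metastorePartCols = metastorePartCols;
      this.storageHandler = storageHandler;
    }

    /** Native view: only the partition keys stored in the metastore. */
    public List<String> getPartCols() {
      return metastorePartCols;
    }

    /** Unified view: ask the storage handler when there is one, else fall back. */
    public List<String> getEffectivePartCols() {
      return storageHandler != null ? storageHandler.getPartitionKeys() : getPartCols();
    }
  }

  public static void main(String[] args) {
    Table nativeTable = new Table(List.of("ds"), null);
    Table icebergLike = new Table(List.of(), () -> List.of("region", "day"));
    System.out.println(nativeTable.getEffectivePartCols());
    System.out.println(icebergLike.getEffectivePartCols());
  }
}
```

In this sketch the Iceberg-like table has no metastore partition keys, so only the unified accessor reports its partition columns.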

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI tests and a local build.

@deniskuzZ
Member

@ramitg254 please take a look at 9e7535c. I would suggest following a similar approach.

@ramitg254
Contributor Author

ramitg254 commented Apr 10, 2026

9e7535c

But there we create a separate method, getEffectivePartCols(), and leave getPartCols() as it is. As per our discussion on that closed PR, we shouldn't do that and should only update getPartCols().

@deniskuzZ
Member

deniskuzZ commented Apr 10, 2026

> 9e7535c
>
> But there we create a separate method, getEffectivePartCols(), and leave getPartCols() as it is. As per our discussion on that closed PR, we shouldn't do that and should only update getPartCols().

Where did I say that? The ask was to keep the original method unchanged. Same here.

@ramitg254
Contributor Author

ramitg254 commented Apr 10, 2026

Oh, I got confused by this comment: #6337 (comment), in which getSupportedPartCols() was just a separate method, similar to getEffectivePartCols().

@ramitg254
Contributor Author

ramitg254 commented Apr 10, 2026

I am fine with that earlier approach as well, but I recently saw https://issues.apache.org/jira/browse/HIVE-29525, so I thought that, as a first step towards solving it, we should have unified getPartCols() and getCols() methods that give the same kind of results for Iceberg tables as for native Hive tables. The plan logic can then be taken care of later, when that ticket is addressed.
So I was first focusing on making getPartCols() unified for Iceberg tables as well.

Please share your thoughts on this idea.

}

- List<String> partialPvals = MetaStoreUtils.getPvals(tbl.getPartCols(), partialPartSpec);
+ List<String> partialPvals = MetaStoreUtils.getPvals(tbl.getEffectivePartCols(), partialPartSpec);
Member

same

if (tbl.getDataLocation() != null) {
Path partPath = new Path(tbl.getDataLocation(),
- Warehouse.makePartName(tbl.getPartCols(),
+ Warehouse.makePartName(tbl.getEffectivePartCols(),
Member

same

ArrayList<ColumnInfo> partitionColumns = new ArrayList<ColumnInfo>();
- for (FieldSchema part_col : viewTable.getPartCols()) {
- colName = part_col.getName();
+ for (FieldSchema partCol : viewTable.getEffectivePartCols()) {
Member

Why is it needed here?

- List<String> pvals = new ArrayList<String>();
- for (FieldSchema field : tbl.getPartCols()) {
+ List<String> pvals = new ArrayList<>();
+ for (FieldSchema field : tbl.getEffectivePartCols()) {
Member

Do we have tests for that? Non-native tables use DummyPartition, don't they?

- List<String> pvals = new ArrayList<String>();
- for (FieldSchema field : table.getPartCols()) {
+ List<String> pvals = new ArrayList<>();
+ for (FieldSchema field : table.getEffectivePartCols()) {
Member

same

/**
* These fields are all cached fields. The information comes from tTable.
*/
private List<FieldSchema> cachedPartCols;
Member

@deniskuzZ deniskuzZ Apr 24, 2026

  1. maybe rename to simply partitionCols since it's not actually a cache?
  2. can we reuse ttable? t.setPartitionKeys?

Contributor Author

  1. Yes, it can be renamed to partitionCols. It was added because, for Iceberg tables, getStorageHandler.getPartitionKeys() calls convertToIceberg, so too many metastore calls were made for a single running query. Those calls sometimes led to a timeout exception, and to other exceptions caused by outdated conf.
    It was added to avoid that, so it is not really a cache.

  2. I think we shouldn't call setPartitionKeys on ttable for non-native tables, as partition evolution and other features are supported.
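
The pattern described in point 1, resolving the partition columns once per table object and reusing the result, could look roughly like the sketch below. `PartColsHolder` and its `loader` are hypothetical names introduced for illustration; the loader stands in for the expensive storage-handler call, and nothing here is taken from the actual patch.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; PartColsHolder and its loader are illustrative names, not Hive APIs.
public final class PartColsHolder {
  private final Supplier<List<String>> loader; // stands in for the expensive lookup
  private List<String> partitionCols;          // resolved lazily, then reused

  public PartColsHolder(Supplier<List<String>> loader) {
    this.loader = loader;
  }

  /** Resolves the partition columns on first use and reuses the result afterwards. */
  public synchronized List<String> getPartitionCols() {
    if (partitionCols == null) {
      partitionCols = List.copyOf(loader.get());
    }
    return partitionCols;
  }

  public static void main(String[] args) {
    PartColsHolder holder = new PartColsHolder(() -> List.of("region", "day"));
    System.out.println(holder.getPartitionCols()); // loader runs on this call
    System.out.println(holder.getPartitionCols()); // stored value is reused
  }
}
```

The sketch has no invalidation, so it would only be appropriate if the partition layout cannot change during the lifetime of the holder, which is an assumption and not something the thread confirms.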

return cachedPartCols;
}

private boolean isTableTypeSet() {
Member

why do we need this?

- f_list.addAll(getCols());
- f_list.addAll(getPartCols());
- return f_list;
+ ArrayList<FieldSchema> allCols = new ArrayList<>(getCols());
Member

Use List: code to the interface.
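
As a small illustration of the suggestion, the local variable can be declared as `List` while still being backed by an `ArrayList`. `AllColsSketch` and `allCols` are hypothetical names, and columns are modelled as plain strings rather than `FieldSchema`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; column names are plain strings for brevity.
public final class AllColsSketch {

  /** Returns data columns followed by partition columns, typed as List rather than ArrayList. */
  public static List<String> allCols(List<String> cols, List<String> partCols) {
    List<String> all = new ArrayList<>(cols); // declared against the interface
    all.addAll(partCols);
    return all;
  }

  public static void main(String[] args) {
    System.out.println(allCols(List.of("id", "amount"), List.of("ds")));
  }
}
```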

- return hasNonNativePartitionSupport() ? getStorageHandler().isPartitioned(this) :
- CollectionUtils.isNotEmpty(getPartCols());
+ return hasNonNativePartitionSupport() ? getStorageHandler().isPartitioned(this) :
+ CollectionUtils.isNotEmpty(getEffectivePartCols());
Member

keep getPartCols() here

org.apache.hadoop.hive.metastore.api.Partition tp) {

- List<FieldSchema> fsl = getPartCols();
+ List<FieldSchema> fsl = getEffectivePartCols();
Member

@deniskuzZ deniskuzZ Apr 24, 2026

Do we need to change this here? Are there tests? Does it duplicate IcebergTableUtil.getPartitionSpec?

Table tab = cppCtx.getParseContext().getViewProjectToTableSchema().get(op);
- List<FieldSchema> fullFieldList = new ArrayList<FieldSchema>(tab.getCols());
- fullFieldList.addAll(tab.getPartCols());
+ List<FieldSchema> fullFieldList = new ArrayList<>(tab.getAllCols());
Member

no need to wrap in yet another list


private static List<PrimitiveTypeInfo> extractPartColTypes(Table tab) {
- List<FieldSchema> pCols = tab.getPartCols();
+ List<FieldSchema> pCols = tab.getEffectivePartCols();
Member

is that needed? test?

usePartitionColumns(properties, partColNames);
} else {
- List<FieldSchema> partCols = table.getPartCols();
+ List<FieldSchema> partCols = table.getEffectivePartCols();
Member

is that needed? test?

}
queryStr.append(',');
- appendCols(targetTable.getPartCols(), alias, null, FieldSchema::getName);
+ appendCols(targetTable.getEffectivePartCols(), alias, null, FieldSchema::getName);
Member

i don't think we need this, it might duplicate the columns
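
The duplication concern can be illustrated with a toy example. It assumes, as stated elsewhere in this thread, that for Iceberg tables the regular column list already contains the partition columns. `DuplicateColsSketch` and its methods are hypothetical and operate on plain column names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration; not code from the patch.
public final class DuplicateColsSketch {

  /** Naive concatenation: repeats any column present in both lists. */
  public static List<String> concat(List<String> cols, List<String> partCols) {
    List<String> out = new ArrayList<>(cols);
    out.addAll(partCols);
    return out;
  }

  /** Order-preserving union: each column name appears once. */
  public static List<String> union(List<String> cols, List<String> partCols) {
    LinkedHashSet<String> seen = new LinkedHashSet<>(cols);
    seen.addAll(partCols);
    return new ArrayList<>(seen);
  }

  public static void main(String[] args) {
    List<String> cols = List.of("id", "region");  // already includes the partition column
    List<String> partCols = List.of("region");
    System.out.println(concat(cols, partCols));   // "region" appears twice
    System.out.println(union(cols, partCols));    // "region" appears once
  }
}
```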

public void appendAcidSelectColumns(Operation operation) {
queryStr.append("ROW__ID,");
- for (FieldSchema fieldSchema : targetTable.getPartCols()) {
+ for (FieldSchema fieldSchema : targetTable.getEffectivePartCols()) {
Member

it's definitely not needed in native

@Override
public List<String> getDeleteValues(Operation operation) {
- List<String> deleteValues = new ArrayList<>(1 + targetTable.getPartCols().size());
+ List<String> deleteValues = new ArrayList<>(1 + targetTable.getEffectivePartCols().size());
Member

same

//insert into newTableName select * from ts <where partition spec>
StringBuilder rewrittenQueryStr = generateExportQuery(
- newTable.getPartCols(), tokRefOrNameExportTable, (ASTNode) tokRefOrNameExportTable.parent, newTableName);
+ newTable.getEffectivePartCols(),
Member

this is acid, we don't need to touch it

Contributor Author

I did this because of #6413 (comment). If you think it can break things, I will switch it back to the old one.

this.specType = SpecType.STATIC_PARTITION;
this.partitions = partitions;
- List<FieldSchema> partCols = this.tableHandle.getPartCols();
+ List<FieldSchema> partCols = this.tableHandle.getEffectivePartCols();
Member

is that needed? test?

if (isPartitionStats) {
if (partTransformSpec == null) {
- for (FieldSchema fs : tbl.getPartCols()) {
+ for (FieldSchema fs : tbl.getEffectivePartCols()) {
Member

i don't think it's needed - part columns are already part of col list. tests?

{
// check partitioning column order and types
- List<FieldSchema> existingTablePartCols = table.getPartCols();
+ List<FieldSchema> existingTablePartCols = table.getEffectivePartCols();
Member

do we have import? test?

this.onClause = onClause;
allTargetTableColumns.addAll(targetTable.getCols());
- allTargetTableColumns.addAll(targetTable.getPartCols());
+ allTargetTableColumns.addAll(targetTable.getEffectivePartCols());
Member

@deniskuzZ deniskuzZ Apr 24, 2026

I don't think we need to change this. Also, we can simplify it to allTargetTableColumns.addAll(targetTable.getAllCols()).

private static int calculatePartPrefix(Table tbl, Set<String> partSpecKeys) {
int partPrefixToDrop = 0;
- for (FieldSchema fs : tbl.getPartCols()) {
+ for (FieldSchema fs : tbl.getEffectivePartCols()) {
Member

any tests covering this for iceberg?

Contributor Author

@ramitg254 ramitg254 Apr 25, 2026

I am not aware of any. I did this because of #6413 (comment).

} else {
// partition spec is not specified but column schema can have partitions specified
- for(FieldSchema f : targetTable.getPartCols()) {
+ for(FieldSchema f : targetTable.getEffectivePartCols()) {
Member

do we really need this? tests?

List<String> cols = new ArrayList<String>();
if (qbp.getAnalyzeRewrite() != null) {
- List<FieldSchema> partitionCols = tab.getPartCols();
+ List<FieldSchema> partitionCols = tab.getEffectivePartCols();
Member

We don't even enter here; see the if above: !tab.hasNonNativePartitionSupport()

}
} else {
- partColSchema.addAll(tbl.getPartCols());
+ partColSchema.addAll(tbl.getEffectivePartCols());
Member

is this needed? tests?

@deniskuzZ
Member

deniskuzZ commented Apr 24, 2026

So many getPartCols to getEffectivePartCols changes make me wonder if we even need getEffectivePartCols. Maybe we just need to drop the partitionCols list from getCols()?
cc @kasakrisz

@ramitg254
Contributor Author

ramitg254 commented Apr 25, 2026

@deniskuzZ I was replacing getPartCols() with getEffectivePartCols() in most places because we should eventually move to this generic common method.
The only places where I left getPartCols() are those where the logic breaks for Iceberg tables because getCols() returns the partition columns as well.
Updating getCols() will cause many changes, so we should handle that in a separate ticket, where it will be easy to replace the remaining getPartCols() usages.
So for now I am switching to the newer method wherever it does not break any test. Later, once the getCols() logic is updated and getPartCols() is no longer needed, getEffectivePartCols() can be renamed to getPartCols() and everything will come down to a single method.

@deniskuzZ
Member

> @deniskuzZ I was replacing getPartCols() with getEffectivePartCols() in most places because we should eventually move to this generic common method. The only places where I left getPartCols() are those where the logic breaks for Iceberg tables because getCols() returns the partition columns as well. Updating getCols() will cause many changes, so we should handle that in a separate ticket, where it will be easy to replace the remaining getPartCols() usages. So for now I am switching to the newer method wherever it does not break any test. Later, once the getCols() logic is updated and getPartCols() is no longer needed, getEffectivePartCols() can be renamed to getPartCols() and everything will come down to a single method.

@ramitg254 I like the idea of having a single getPartCols() method.

> The only places where I left getPartCols() are those where the logic breaks for Iceberg tables because getCols() returns the partition columns as well.

Since you've already identified them, why not apply the getCols() patch by stripping partition columns in the same PR and reuse getPartCols() everywhere?
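
The suggestion above, having the regular column accessor exclude partition columns so that a single partition accessor suffices, might be sketched as follows. This is a hypothetical helper on plain column names, not the proposed Hive change; case-insensitive matching of column names is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of the patch under review.
public final class StripPartColsSketch {

  /** Returns allCols without any column whose name matches a partition column. */
  public static List<String> dataCols(List<String> allCols, List<String> partCols) {
    Set<String> partNames = new HashSet<>();
    for (String p : partCols) {
      partNames.add(p.toLowerCase(Locale.ROOT)); // assumption: names compare case-insensitively
    }
    List<String> out = new ArrayList<>();
    for (String c : allCols) {
      if (!partNames.contains(c.toLowerCase(Locale.ROOT))) {
        out.add(c);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(dataCols(List.of("id", "region", "amount"), List.of("region")));
  }
}
```

With such a split, concatenating the data columns and the partition columns would no longer repeat any column.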

@ramitg254
Contributor Author

ramitg254 commented Apr 25, 2026

I was planning to, but updating getCols() alone will cause test failures in every q file that has a describe command for Iceberg tables. Query plans will also be affected, since the stats logic currently takes getCols() into account, and there are 90+ occurrences of it in the code, so it will lead to breakage as well. I thought it would be better to handle it as a separate change.

@deniskuzZ
Member

> I was planning to, but updating getCols() alone will cause test failures in every q file that has a describe command for Iceberg tables. Query plans will also be affected, since the stats logic currently takes getCols() into account, and there are 90+ occurrences of it in the code, so it will lead to breakage as well. I thought it would be better to handle it as a separate change.

I guess that was the main intent — to integrate Iceberg partition handling into the existing code with minimal workarounds/code duplication.

Maybe I’m missing something, but, unfortunately, I don’t see much value in the current state of PR, sorry.
It doesn’t seem to enable any missing partition optimizations (there are no q-test changes), including the one mentioned above in HIVE-29525, and instead appears to be more of a partial refactor.

Let’s see what Krisztian thinks about it.


3 participants