Single-Pass Extraction for Archives with TAR Wrapper (tar.xz etc) #206

Open
softworkz wants to merge 2 commits into ip7z:main from softworkz:chained_tar_extract

@softworkz

This seems to be one of the most requested features over decades, and I had finally come to a point where the lack of this capability was so annoying and impacting my productivity that I needed a solution.

Motivation

When extracting those kinds of files (.tar.gz, .tar.xz, .tgz etc.) with 7-zip, it's always a two-step process:

  • extract the .tar from the compressed archive
  • extract the contents of the tar file to a subfolder

This has a variety of disadvantages:

  • You need to do three things (extract, navigate to the extracted folder, extract the tar) instead of just one
  • It's even worse when you need to extract multiple archives next to each other (e.g. to compare contents)
    For example, when you need to extract 5 archives
    • With normal archives, it is a single operation: select all 5, right-click the selection and choose "Extract to .*
    • For tar.* files, you need to navigate into each folder and extract again, so it's 1 + 5 * 2 = 11
      => 11 steps vs. 1 step is a huge difference
  • It wastes disk space
    Obviously, you get the .tar plus the extracted tar contents, so this takes up almost double the disk space
    Unless you clean up the tars manually, but again, that's not something I want to be required to do
  • It creates undesirable folder structures on disk
    Always having an additional folder in the path makes it harder to keep an overview, and you have to deal with longer paths

Attempted Solutions

Console Piping via STDIN/STDOUT

This is what has been suggested, and it's the only way to get a single-pass extraction for those compound archives.
I have a hot-key configured anyway, for archive extraction in Windows Explorer with 7z, so I tried that out.

It's been easy to get it set up, but from the very first test, it became clear that this is not a viable option at all. I pressed the hot-key and... - nothing happened. I had no idea whether the extraction had started or whether it's still running or whether it's already completed.

=> No visual feedback at all! Not being able to see or know what has happened is a hard show-stopper IMO

Use Piping with the 7z GUI executable (7zG.exe)

The visual feedback (when using from within Windows Explorer for example) is provided by 7zG.exe. It's largely the same functionality as 7z.exe with the addition of GUI feedback for operations (like showing progress, asking about whether to overwrite files, etc.). That's what the Win Explorer integration is calling.

I tried to pipe 7z => 7zG - but it didn't work. That's because Windows applications need to use subsystem=console in order to receive data via STDIN, but 7zG is compiled with subsystem=windows.
So I thought I'd be clever and switched 7zG to subsystem=console. It also needed a few code changes, and then I got it working.
It was better than the method above, but still not the result I wanted to achieve. It has the following shortcomings:

  • Now, a console window was always popping up for every 7z extraction (from Win Explorer)
    (that's due to the subsystem change to console)
    => Ugly and annoying
  • No progress indication
    When piping from one process into another, the receiver gets a continuous stream without knowing the length of the input. Hence, it cannot know or calculate its progress, and you get a progress dialog showing 100% from start to end
  • Single-File Limitation
    Another major drawback with piping: you can only handle a single file at a time
    (you could spawn multiple processes of course, but then, for 5 archives to extract, you would get 10 processes and 5 different progress dialogs being shown)

Proper Solution: Internal Piping

At this point, it had become clear that the only proper solution is to handle this internally, so I threw away all the above and started fresh. Looking at the code - much to my surprise - I realized that most of the required infrastructure is already in place, making it easier to implement than expected. This PR is the result:

Chained TAR Extraction

This implements single-pass extraction for archives like .tar.gz, .tar.xz, .tgz etc. For command-line use, it adds a new switch -sce.
It unconditionally enables single-pass extraction for Windows Explorer integration (context menus). It's still possible to choose "Open" instead and extract the tar only.
Progress reporting is tied to the outer stream in this case, because the size of the inner stream is not always known up-front.
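
To illustrate the progress point above, here is a minimal sketch (not the code from this PR) of deriving a percentage from the outer stream alone. The class `OuterProgress` and its members are hypothetical names made up for this example; the only assumption is that the size of the compressed file on disk is known, while the size of the inner tar stream is not.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only: progress is computed from the
// number of bytes consumed from the outer (compressed) file, because that
// file's size is known up-front while the inner stream's size may not be.
class OuterProgress {
public:
    explicit OuterProgress(std::uint64_t totalOuterBytes)
        : total_(totalOuterBytes), consumed_(0) {}

    // Called whenever the outer decoder has read byteCount more bytes.
    void OnOuterRead(std::uint64_t byteCount) {
        consumed_ += byteCount;
        if (consumed_ > total_) consumed_ = total_;  // clamp, never exceed 100%
    }

    // Percentage of the outer file consumed so far (0..100).
    // The multiplication is safe for files far below 2^57 bytes.
    int Percent() const {
        if (total_ == 0) return 100;  // empty input: nothing left to do
        return static_cast<int>(consumed_ * 100 / total_);
    }

private:
    std::uint64_t total_;
    std::uint64_t consumed_;
};
```

With a 200-byte outer file, reporting 50 consumed bytes would yield 25%. This contrasts with the process-to-process piping case, where the receiver has no total to divide by.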


@gyurix gyurix left a comment

Useful feature, but implementation duplicates tar-wrapper knowledge in Explorer (IsTarCompoundExtension) and in core extraction flow, so support will drift from actual codec support over time.

Main concern: tar-specific binder/thread logic now lives inside generic Extract.cpp with very little proof around fallback/error paths (wrong inner format, partial outer stream, wildcard/subset extract, data-after-end). That is high-risk code for a convenience feature.

Merge readiness: 5/10. Before merge: move wrapper detection closer to codec logic, add focused regression coverage for fallback/error paths, and split UI/context-menu churn from extraction-core changes if possible.

@softworkz
Author

@gyurix - Thanks a lot for your review!

You're bringing up some good points, most of them clear; there's only one I'd like to check back about:

The general idea is to have the whole extraction fail when the inner extraction fails, without any fallback (like extracting just the .tar file). I think this is the better way, because when you specify the new -sce flag or you are using the Explorer context menu (which was already showing the extraction folder name), you are expecting, and also relying on, getting the inner contents extracted. When this is not possible, the extraction should error instead of falling back to extracting the intermediary archive.

Do you think it should be otherwise?

@softworkz softworkz force-pushed the chained_tar_extract branch from eb45494 to cbbcbb2 Compare April 17, 2026 05:26
@softworkz
Author

PR Updated

  • I have moved ChainedExtraction into its own files (ChainedExtract.cpp/.h)
  • ChainedExtraction::TryChainedExtract() is generalized now, no longer specific to tar
  • Added CArcInfoEx::GetWrappedExt() to find a defined additional extension (AddExt)
  • ChainedExtraction no longer uses hard-coded extensions but compares the extension of DefaultName with the result from GetWrappedExt()
  • Regarding the mentioned failure cases, these are properly handled
    • wrong inner format - like file.tar.gz with the .tar containing garbage or anything non-tar
      => inner extraction fails, error propagates to outer extraction: whole operation fails
    • partial outer stream
      => if the inner extraction has already succeeded, it still checks the outer OperationResult, and in this case, it returns E_FAIL with a message like "Outer stream: Unexpected end of data"
      => if the outer stream fails earlier, its HRESULT is returned directly.
    • wildcard/subset extract
      => this is working properly, selecting files from the inner archive
    • data-after-end
      => If the outer codec reports kDataAfterEnd, SetChainedStreamError catches it and returns E_FAIL with message: "Outer stream: There are some data after the end of the payload data".
  • For the context menu, I see no reasonable way other than to keep the extensions hardcoded, because the extension doesn't have access to the codecs and I think that Explorer shell extensions shouldn't do any executions that could potentially delay the display of the context menu
  • I've also moved the Explorer extension update into a separate commit
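
The failure handling listed above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `OuterResult`, `ChainedOutcome` and `CombineResults` are hypothetical names, a plain bool stands in for the HRESULT, and the text "Inner extraction failed" is a placeholder. The two "Outer stream: ..." messages are the ones quoted in the list. The case where the outer stream fails early and its HRESULT is returned directly is not modeled here.

```cpp
#include <string>

// Hypothetical stand-in for the outer codec's operation result.
enum class OuterResult { kOK, kUnexpectedEnd, kDataAfterEnd };

// Hypothetical result type: ok == false corresponds to returning E_FAIL.
struct ChainedOutcome {
    bool ok;
    std::string message;
};

// Combine the inner extraction status with the outer codec's result.
// There is deliberately no fallback to extracting only the intermediary .tar.
ChainedOutcome CombineResults(bool innerOk, OuterResult outer) {
    if (!innerOk) {
        // Wrong inner format etc.: the whole operation fails.
        return {false, "Inner extraction failed"};  // placeholder message
    }
    switch (outer) {
        case OuterResult::kUnexpectedEnd:
            // Inner extraction succeeded, but the outer stream was truncated.
            return {false, "Outer stream: Unexpected end of data"};
        case OuterResult::kDataAfterEnd:
            return {false,
                    "Outer stream: There are some data after the end of the "
                    "payload data"};
        case OuterResult::kOK:
            break;
    }
    return {true, ""};
}
```

The point of the sketch is the ordering: an inner failure always wins, and a successful inner extraction is still not reported as success until the outer result has been checked.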

This implements single-pass extraction for archives like .tar.gz, .tar.xz, .tgz etc.
and adds a new switch "-sce" for command-line use
This commit unconditionally enables chained extraction for explorer integration (context menus).
It's still possible to choose "Open" instead and extract the tar only.
@softworkz softworkz force-pushed the chained_tar_extract branch from cbbcbb2 to 6128789 Compare April 17, 2026 05:52