Single-Pass Extraction for Archives with TAR Wrapper (tar.xz etc)#206
softworkz wants to merge 2 commits into
gyurix left a comment:
Useful feature, but the implementation duplicates tar-wrapper knowledge in Explorer (IsTarCompoundExtension) and in the core extraction flow, so support will drift from actual codec support over time.
Main concern: tar-specific binder/thread logic now lives inside the generic Extract.cpp with very little proof around fallback/error paths (wrong inner format, partial outer stream, wildcard/subset extract, data-after-end). That is high-risk code for a convenience feature.
Merge readiness: 5/10. Before merge: move wrapper detection closer to codec logic, add focused regression coverage for fallback/error paths, and split UI/context-menu churn from extraction-core changes if possible.
@gyurix - Thanks a lot for your review! You're bringing up some good points. Most of them are clear; there's only one I'd like to check back with you about: the general idea is to have the whole extraction fail if the inner extraction fails, without any fallback (like extracting just the .tar file). I think this is the better way, because when you specify the new Do you think it should be otherwise?
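With a fail-hard policy, the "wrong inner format" case from the review can be caught as soon as the first decoded block arrives. The sketch below is not taken from the PR, and 7-Zip's own TAR handler parses headers on its own; it only illustrates one such early check. `LooksLikeTarHeader` is a hypothetical name. It relies on the POSIX ustar header layout: a 512-byte block whose checksum field (offset 148, 8 bytes, octal ASCII) equals the byte sum of the header with that field counted as spaces.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical early sanity check (not from the PR): does the first
// 512-byte block of the decoded inner stream carry a valid TAR header
// checksum? Layout per POSIX ustar: checksum field at offset 148,
// 8 bytes, stored as octal ASCII digits.
bool LooksLikeTarHeader(const std::uint8_t* block, std::size_t size) {
    constexpr std::size_t kBlockSize = 512;
    constexpr std::size_t kChkOffset = 148;
    constexpr std::size_t kChkLength = 8;
    if (size < kBlockSize) return false;

    // Expected checksum: unsigned sum of all header bytes, with the
    // checksum field itself counted as 8 ASCII spaces.
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < kBlockSize; ++i) {
        const bool inChk = i >= kChkOffset && i < kChkOffset + kChkLength;
        sum += inChk ? 0x20u : block[i];
    }

    // Stored checksum: optional leading spaces, then octal digits
    // (the trailing NUL/space terminator is not enforced here).
    const std::size_t end = kChkOffset + kChkLength;
    std::size_t i = kChkOffset;
    while (i < end && block[i] == ' ') ++i;
    std::uint32_t stored = 0;
    std::size_t digits = 0;
    for (; i < end && block[i] >= '0' && block[i] <= '7'; ++i, ++digits)
        stored = stored * 8 + static_cast<std::uint32_t>(block[i] - '0');

    // An all-zero block (end-of-archive marker) has no digits and fails here.
    return digits > 0 && stored == sum;
}
```

If this returns false for the first block, a chained extraction following the no-fallback idea would abort instead of silently writing out the intermediate .tar. Pre-POSIX v7 headers have no "ustar" magic, which is why the sketch checks the checksum rather than the magic string.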
eb45494 to cbbcbb2
PR Updated
This implements single-pass extraction for archives like .tar.gz, .tar.xz, .tgz etc., and adds a new switch "-sce" for command-line use.
This commit unconditionally enables chained extraction for explorer integration (context menus). It's still possible to choose "Open" instead and extract the tar only.
cbbcbb2 to 6128789
This seems to be one of the most requested features for decades, and I had finally reached a point where the lack of this capability was so annoying and so harmful to my productivity that I needed a solution.
Motivation
When extracting those kinds of files (.tar.gz, .tar.xz, .tgz etc.) with 7-zip, it's always a two-step process:
This has a variety of disadvantages:
For example, when you need to extract 5 archives
=> 11 steps vs. 1 step is a huge difference
Obviously, you get the .tar plus the extracted tar contents, so this takes up nearly twice the disk space
Unless you clean up the tars manually - but again, that's not something I want to be required to do
Always having an additional folder in the path makes it harder to keep an overview, and you have to deal with longer paths
Attempted Solutions
Console Piping via STDIN/STDOUT
This is the approach that is usually suggested, and it's the only way to get single-pass extraction for these compound archives.
I already have a hotkey configured for archive extraction with 7z in Windows Explorer, so I tried it out there.
It was easy to set up, but the very first test made it clear that this is not a viable option at all. I pressed the hotkey and... nothing happened. I had no idea whether the extraction had started, was still running, or had already completed.
=> No visual feedback at all! Not being able to see or know what has happened is a hard show-stopper IMO
Use Piping with the 7z GUI Executable (7zG.exe)
The visual feedback (when used from within Windows Explorer, for example) is provided by 7zG.exe. It offers largely the same functionality as 7z.exe, with the addition of GUI feedback for operations (like showing progress, asking whether to overwrite files, etc.). That's what the Windows Explorer integration calls.
I tried to pipe 7z => 7zG, but it didn't work. That's because Windows applications need to use subsystem=console in order to receive data via STDIN, but 7zG is compiled with subsystem=windows.
So I tried to be clever and switched 7zG to subsystem=console. This also required a few code changes, and then I got it working.
It was better than the method above, but still not the result I wanted to achieve. It had the following shortcomings:
(that's due to the subsystem change to console)
=> Ugly and annoying
When piping from one process into another, the receiver gets a continuous stream without knowing the length of the input. Hence, it cannot calculate its progress, and you get a progress dialog showing 100% from start to end.
Another major drawback with piping: you can only handle a single file at a time
(you could spawn multiple processes, of course, but then for 5 archives you would get 10 processes and 5 different progress dialogs)
Proper Solution: Internal Piping
At this point, it had become clear that the only proper solution is to handle this internally, so I threw away all the above and started fresh. Looking at the code - much to my surprise - I realized that most of the required infrastructure is already in place, making it easier to implement than expected. This PR is the result:
Chained TAR Extraction
This implements single-pass extraction for archives like .tar.gz, .tar.xz, .tgz etc. For command-line use, it adds a new switch -sce.
It unconditionally enables single-pass extraction for Windows Explorer integration (context menus). It's still possible to choose "Open" instead and extract the tar only.
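Internal piping replaces the OS pipe with an in-process one: one thread decodes the outer stream (gz, xz, ...) and hands the bytes to a second thread that reads them as TAR, so no intermediate .tar ever reaches the disk. The sketch below illustrates that producer/consumer shape under stated assumptions. `BlockingPipe` and `PumpThroughPipe` are hypothetical names, and this is not the binder class the PR reuses from 7-Zip's existing infrastructure. `CloseReader` models the fail-hard path, where an inner failure unblocks and stops the outer decoder.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical bounded in-memory pipe (NOT 7-Zip's actual binder class).
// The outer decoder thread writes decoded bytes; the inner TAR reader
// thread reads them. Requires capacity > 0.
class BlockingPipe {
public:
    explicit BlockingPipe(std::size_t capacity) : capacity_(capacity) {}

    // Writer side: blocks while the buffer is full. Returns false if the
    // reader gave up (e.g. the inner data is not a valid TAR).
    bool Write(const std::uint8_t* data, std::size_t size) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (size > 0) {
            notFull_.wait(lock, [&] { return readerClosed_ || buffer_.size() < capacity_; });
            if (readerClosed_) return false;
            const std::size_t n = std::min(size, capacity_ - buffer_.size());
            buffer_.insert(buffer_.end(), data, data + n);
            data += n;
            size -= n;
            notEmpty_.notify_one();
        }
        return true;
    }

    // Writer side: signals the end of the outer stream.
    void CloseWriter() {
        std::lock_guard<std::mutex> lock(mutex_);
        writerClosed_ = true;
        notEmpty_.notify_all();
    }

    // Reader side: blocks until data is available. Returns 0 only at the
    // end of the stream (size must be > 0).
    std::size_t Read(std::uint8_t* out, std::size_t size) {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [&] { return writerClosed_ || !buffer_.empty(); });
        const std::size_t n = std::min(size, buffer_.size());
        const auto end = buffer_.begin() + static_cast<std::ptrdiff_t>(n);
        std::copy(buffer_.begin(), end, out);
        buffer_.erase(buffer_.begin(), end);
        notFull_.notify_one();
        return n;
    }

    // Reader side: aborts the pipe and unblocks a waiting writer.
    void CloseReader() {
        std::lock_guard<std::mutex> lock(mutex_);
        readerClosed_ = true;
        notFull_.notify_all();
    }

private:
    const std::size_t capacity_;
    std::deque<std::uint8_t> buffer_;
    std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::condition_variable notFull_;
    bool writerClosed_ = false;
    bool readerClosed_ = false;
};

// Hypothetical demo: a second thread plays the outer decoder and pushes
// `input` through the pipe in small chunks; the calling thread plays the
// inner TAR reader and collects everything it receives.
std::vector<std::uint8_t> PumpThroughPipe(const std::vector<std::uint8_t>& input,
                                          std::size_t capacity) {
    BlockingPipe pipe(capacity);
    std::thread decoder([&] {
        const std::size_t chunk = 3;
        for (std::size_t pos = 0; pos < input.size(); pos += chunk) {
            const std::size_t n = std::min(chunk, input.size() - pos);
            if (!pipe.Write(input.data() + pos, n)) break;
        }
        pipe.CloseWriter();
    });
    std::vector<std::uint8_t> received;
    std::uint8_t buf[4];
    std::size_t n;
    while ((n = pipe.Read(buf, sizeof buf)) > 0)
        received.insert(received.end(), buf, buf + n);
    decoder.join();
    return received;
}
```

The bounded buffer keeps memory use constant regardless of archive size: the decoder simply waits whenever the TAR reader falls behind.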
Progress reporting is tied to the outer stream in this case, because the size of the inner stream is not always known up front.
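To illustrate why the outer stream is the safer basis: the compressed file's size is known from the file system, while the inner size may not be. For example, gzip's ISIZE trailer field only holds the uncompressed size modulo 2^32. The helper below is a hypothetical sketch (`OuterProgressPercent` is not from the PR). It computes progress purely from bytes consumed of the outer archive and reports "unknown" when even the outer size is missing, which is the situation of a reader fed through an STDIN pipe.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not from the PR): percent complete, derived only
// from the OUTER archive stream. The inner TAR size is deliberately
// ignored because it is not always known up front. Returns std::nullopt
// when the outer size is unknown (or zero), e.g. when reading from STDIN.
std::optional<unsigned> OuterProgressPercent(
        std::uint64_t outerBytesConsumed,
        std::optional<std::uint64_t> outerTotalSize) {
    if (!outerTotalSize || *outerTotalSize == 0) return std::nullopt;
    if (outerBytesConsumed >= *outerTotalSize) return 100u;
    // Here outerBytesConsumed < outerTotalSize, so the multiplication
    // cannot overflow for any file smaller than 2^64 / 100 bytes.
    return static_cast<unsigned>(outerBytesConsumed * 100 / *outerTotalSize);
}
```

Since the decoder reads the outer file sequentially, the consumed byte count grows steadily even while the TAR reader is busy writing a large entry, so the dialog keeps moving without knowing the .tar size.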