This is a simple CLI utility that converts tar files to zip files. Uncompressed zip files are better suited to random access than tar archives: a zip file has a central directory that locates each member, whereas a tar archive has to be scanned sequentially, and a compressed tar archive has to be decompressed along the way. This makes uncompressed zip files very useful on distributed filesystems where file I/O is usually the limiting factor, as is common on HPC systems.
It aims for two main features:
- Environment agnostic. For a given tar file the created zip file should be identical even when run on different systems.
- Streaming. The unpacked archive should not need to be stored in full as an intermediate step (either on disk or in memory).
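
To make the two goals above concrete, here is a minimal sketch of how they could be combined using only the standard library. It is not the tool's actual implementation: the function name `tar_to_zip`, the fixed timestamp, and the fixed permission bits are assumptions chosen for illustration.

```python
import shutil
import tarfile
import zipfile

# Assumption of this sketch: every entry gets the same timestamp and mode,
# so the output does not depend on tar mtimes or on the local system.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def tar_to_zip(tar_stream, zip_stream):
    """Copy regular files from a tar stream into an uncompressed zip.

    Members are copied one at a time, so the unpacked archive is never
    held in full on disk or in memory.
    """
    with tarfile.open(fileobj=tar_stream, mode="r|*") as tar, \
         zipfile.ZipFile(zip_stream, mode="w",
                         compression=zipfile.ZIP_STORED) as zf:
        for member in tar:
            if not member.isfile():
                continue  # this sketch silently skips non-regular members
            info = zipfile.ZipInfo(member.name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_STORED
            info.create_system = 3            # always record "Unix"
            info.external_attr = 0o644 << 16  # fixed permission bits
            info.file_size = member.size      # lets zipfile pick ZIP64 if needed
            src = tar.extractfile(member)
            with zf.open(info, mode="w") as dst:
                shutil.copyfileobj(src, dst)
```

With this sketch, opening an input tar in binary read mode and an output file in binary write mode and passing both to `tar_to_zip` performs the conversion in a single pass over the input.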
There are some additional features from using Python standard libraries:
- Automatic detection of tar compression
- Python as the only hard dependency
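
As an illustration of the automatic detection, the standard `tarfile` module accepts a `*` compression wildcard in its open mode. The helper name `list_members` below is hypothetical and only demonstrates the behaviour; it is not taken from this project.

```python
import tarfile


def list_members(fileobj):
    """Return member names from a tar archive opened from a seekable file object.

    The "r:*" mode asks tarfile to detect the compression (none, gzip,
    bz2 or xz) itself, so the caller does not have to specify it.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return tar.getnames()
```

For non-seekable input such as a pipe, `tarfile` offers the stream variant `"r|*"`, which detects the compression in the same way.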
There are a number of alternative solutions, but they typically only fulfill one of the two main features.
- Bash:
    tar -xf <input tar> && tar -tf <input tar> | zip -X -D -0 -@ <output zip>
- https://github.com/JULIELab/tar2zip
- https://github.com/takanoriyanagitani/rs-tars2zips

Installation:
- Get the source (git clone, or download the archive and unpack it)
    pip install .
    tar2zip --help

Limitations:
- Only deals with regular files in the tar archive. Empty directories are ignored and symlinks throw an error.
- Duplicate files in a tar archive throw an error.
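
The member handling described in the last two points can be sketched as a small filter over the tar members. This is an illustration only: the name `regular_members`, the use of `ValueError`, and the choice to reject every non-regular, non-directory member (not just symlinks) are assumptions of the sketch, not documented behaviour of the tool.

```python
import tarfile


def regular_members(tar):
    """Yield regular-file members of an open tarfile.TarFile.

    Directories are skipped; symlinks and other special members raise,
    as does a name that appears more than once in the archive.
    """
    seen = set()
    for member in tar:
        if member.isdir():
            continue  # directories are ignored
        if not member.isfile():
            # Assumption: hard links, devices and FIFOs are treated like symlinks.
            raise ValueError(f"unsupported member type: {member.name}")
        if member.name in seen:
            raise ValueError(f"duplicate member: {member.name}")
        seen.add(member.name)
        yield member
```

Because the function is a generator, the checks run while the archive is being read, which keeps them compatible with a single streaming pass.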