How to make AppleArchive + ZLIB compatible with non-Apple systems?

I very much love the performance of AppleArchive and how approachable it is, and believe it to be one of the most underrated frameworks in the SDK. In a fairly typical scenario, I need to compress files and submit them to a back end, where the server handling the files is not an Apple platform. Obviously, individual files compressed with AA will not be compatible with other systems out of the box, but AA does offer compatible compression algorithms.

ZLIB is recommended for cases where cross-platform compatibility is necessary. As I understand it, AA adds additional headers to files in order to preserve file attributes, ownership, and other metadata. Following the steps outlined in the docs, I've written code to compress single files, and I can compress and decompress using AA without issue.

To create a proof of concept, I've written some code in Python using its zlib module. To get at the compressed data, it's necessary to handle the AA header fields, which occupy the first bytes of a compressed file.

The AA documentation states that ZLIB level 5 compression is used, and that it comes in the form of raw DEFLATE data prefixed with two header bytes. In this case, those bytes are 78 5e (rendered as x^ in an ASCII dump) and begin at the 28th byte. My hope was that seeking to the start of the compressed data, then passing what remains to a decompressor object initialized with the correct WBITS, would work.
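Here's roughly what that proof of concept does (the file path is a placeholder, and the 28-byte offset is simply what I observe in my own files):

```python
import zlib

AA_HEADER_SIZE = 28  # assumption: the 78 5e bytes start here in my test files

with open("sample.aar", "rb") as f:  # placeholder path
    data = f.read()

# Skip the AppleArchive header fields and hand the rest to zlib.
# wbits=15 (zlib.MAX_WBITS) tells zlib to expect the two-byte 78 5e header.
d = zlib.decompressobj(wbits=zlib.MAX_WBITS)
output = d.decompress(data[AA_HEADER_SIZE:])
print(len(output), d.eof, len(d.unused_data))
```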

It works fantastically for files 1 MB or less in size; larger files only decompress their first megabyte. The decompressor object reaches EOF, and I've tried various ways of seeking to and concatenating the remaining blocks, to no avail.
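For example, one of those attempts assumed the blocks were simply laid end to end, each with its own 78 5e header, and restarted a decompressor at every EOF. It only ever recovers the first block, because whatever follows the first stream evidently isn't bare zlib data:

```python
import zlib

def decompress_concatenated(stream: bytes) -> bytes:
    """Naive attempt: treat the payload as back-to-back zlib streams."""
    out = bytearray()
    rest = stream
    while rest:
        d = zlib.decompressobj(wbits=zlib.MAX_WBITS)
        try:
            out += d.decompress(rest)
        except zlib.error:
            # The next bytes don't parse as a zlib stream; presumably
            # AA's own framing sits between the compressed blocks.
            break
        if not d.eof:
            break  # ran out of input mid-stream
        rest = d.unused_data  # bytes left over after the stream ended
    return bytes(out)
```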

Using the older Compression framework and the method specified here, with the same algorithm, yields different results: I can decompress files of any size using Python's zlib module. My assumption is that AppleArchive is doing something different in order to support its multithreading capabilities, perhaps even asymmetric encoding where the blocks are not written in order.
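On the Python side, that case is a one-liner. As far as I can tell, Compression's COMPRESSION_ZLIB output is raw DEFLATE (RFC 1951) with no zlib header or checksum, so negative wbits handles it (the path is again a placeholder):

```python
import zlib

with open("sample.compressed", "rb") as f:  # placeholder path
    raw = f.read()

# Compression framework's COMPRESSION_ZLIB emits raw DEFLATE (RFC 1951),
# with no zlib header or trailer, so negative wbits selects raw mode.
plain = zlib.decompress(raw, wbits=-zlib.MAX_WBITS)
```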

Is there a solution to this problem? If not, why would one ever use ZLIB over the much more efficient LZFSE? I could use the older Compression API, but it compresses synchronously and is significantly slower, and performance is critical in the application I'm adding this feature to.

Replies

Apple Archive is an archive format, akin to .zip or .tar. Using it with a non-Apple back-end service is going to be hard because there are no common third-party libraries for that format (IIRC the format is not documented for third-party use).

Within the archive there are streams of compressed data. The format supports various compression algorithms, and at least some of them are standardised.

Apple Archive doesn’t let you get at those streams of data, and that’s because it’s not working at that level. Indeed, Apple Archive uses Compression framework for that sort of thing.

You’re right that Compression framework is single threaded. That’s because an individual data stream compression operation is fundamentally serialised. If you want to compress in parallel, you need to split the input up into chunks, compress each chunk individually, and then join those chunks somehow. That’s what Apple Archive is doing, using its own archive format.
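To make that concrete, here's a rough sketch of the split / compress / join idea in Python. The length-prefixed framing is invented purely for illustration; it is not the Apple Archive format:

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk; an arbitrary choice for this sketch

def parallel_compress(data: bytes) -> bytes:
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    # zlib releases the GIL while it runs, so threads genuinely overlap here.
    with ThreadPoolExecutor() as pool:
        blocks = list(pool.map(lambda c: zlib.compress(c, 5), chunks))
    # Join the chunks "somehow": here, each block gets a 4-byte length prefix.
    out = bytearray()
    for block in blocks:
        out += struct.pack("<I", len(block)) + block
    return bytes(out)

def serial_decompress(blob: bytes) -> bytes:
    out = bytearray()
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        out += zlib.decompress(blob[offset:offset + length])
        offset += length
    return bytes(out)
```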

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

  • @eskimo Thanks for the reply! I understand this much, and I guess what I'm asking is whether it's possible to work with these data streams. If that's not feasible, why is ZLIB an option with AA at all? The use case would seem to be exclusively about compatibility, and LZFSE achieves an equivalent ratio while being much quicker. I could certainly create my own async streams with Compression, as you mentioned, but I'm attempting to exhaust this route first.

  • I suppose one could use ZLIB in a different context, compressing strings or otherwise working with something other than whole archive containers. Oh well, I know this falls into the "unsupported" bucket of dumb questions, but since AA is part of Accelerate I've wondered whether there's more magic happening than simply using GCD and Compression to split up stream work items. At the end of the day I have a thousand other priorities, but my curiosity will remain.
