Example: I have a book which I wanna archive. Would sending a zip of the pages take less storage than sending the, let’s say, 10 individual pages separately?

  • molave@reddthat.com · +7 · 1 day ago (edited)

    I don’t know the details, but in principle, the zip compression process tries to identify the textual commonalities between the pages. The more commonalities the 10 pages have, the smaller the zip file will be.

    If each page is textually very different (for example, Page 1 is “AB”, Page 2 is “CD”, etc.), it’s possible that the zip file will be larger, because it will contain the full contents of each page plus the zip file’s own metadata.

    Anyone more knowledgeable can correct me on this.
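
    A quick way to see this in practice (a rough sketch using Python’s zlib, the same DEFLATE compression zip uses; the page text is made up): ten pages that share most of their text shrink dramatically, while the same amount of unrelated random bytes barely shrinks at all.

```python
import os
import zlib

# Ten "pages" that share most of their text...
similar_pages = b"".join(
    b"Chapter text repeats common words and phrases. " * 20 for _ in range(10)
)
# ...versus the same amount of unrelated (random) bytes.
random_pages = os.urandom(len(similar_pages))

print(len(similar_pages), len(zlib.compress(similar_pages)))  # shrinks a lot
print(len(random_pages), len(zlib.compress(random_pages)))    # barely shrinks
```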

  • Björn@swg-empire.de · +6 · 1 day ago

    This depends on the file format used for the pages. If it’s plain text, zipping them will greatly decrease the file size. If you just scanned the pages and have them as JPG, PNG, or PDF, zipping will not greatly decrease the file size; it might still shrink a little, or even grow a little.

  • blackbrook@mander.xyz · +24/−2 · 1 day ago

    Wouldn’t trying it out and seeing how much it saved be about the same amount of work as typing in this question?

    • GreyEyedGhost@piefed.ca · +1 · 2 hours ago

      A single data point rarely answers the question unless you’re looking for absolutes. “Will zipping 10 files individually be smaller than zipping them into a single file?” Sure, easy enough to try once. Now, what kind of data are we talking about? How big, and how random, is the data in those files? Does it get better with more files, or is there a sweet spot where it’s better, but worse if you use too few files, or too many? I don’t think you could test all those scenarios very quickly, and they all fall under the original question. OTOH, someone who has studied the subject could probably give you an answer in just a few minutes. Or they could have tried a web search and found the answer, which pretty much comes down to, “It depends on which compression system you use.”

  • ℕ𝕖𝕞𝕠@slrpnk.net · +40/−2 · 1 day ago

    The larger the file, the more patterns there are to compress. Plus, some of the dictionary overhead would be duplicated across multiple files.

    The single file should be, at worst, no larger than the sum of the smaller files, and potentially much smaller.
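
    A rough illustration of this (Python’s zlib, the same DEFLATE family zip uses; the page text is made up): compressing ten similar pages as one combined stream beats compressing each page separately, since shared patterns are stored once and the per-stream overhead isn’t repeated.

```python
import zlib

# Ten made-up "pages" that differ only slightly.
pages = [
    (f"Page {i}: the quick brown fox jumps over the lazy dog. " * 50).encode()
    for i in range(10)
]

separately = sum(len(zlib.compress(p)) for p in pages)  # each page on its own
together = len(zlib.compress(b"".join(pages)))          # one combined stream

print(separately, together)  # the combined stream comes out smaller
```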

  • grue@lemmy.world · +3 · 1 day ago

    Taking less storage is almost the entire point of a zip file. It only takes more space than the original files in pathological cases (e.g. when you’re trying to compress already-compressed data, like a video file).

  • over_clox@lemmy.world · +5 · 1 day ago

    If the compressed files are of the same/similar format, more compression is possible as the algorithm can detect more related patterns to compress.

    But if you toss in a variety of file formats, compression will tend to suffer more.

    Sometimes the easiest way is just to try it and see; different formats lend themselves to better or worse compression.

    The files that compress worst are the ones that are already compressed themselves.

  • Scott@lem.free.as · +9 · 1 day ago (edited)

    I’m not quite sure what you’re asking.

    ZIP is, by default, a compression tool. It takes multiple files, creates an index of the files within, and compresses each file’s contents (with DEFLATE, by default). The index is overhead that exists for each ZIP file.

    Sending multiple files uncompressed, or sending multiple ZIP files (one for each file), will almost certainly be less efficient.
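
    For example, with Python’s standard zipfile module (the file names and page text here are made up), an archive of ten text pages comes out far smaller than the pages themselves:

```python
import io
import zipfile

# Ten made-up pages of plain text.
pages = [(f"Page {i}: some book text, repeated. " * 100).encode() for i in range(10)]

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i, page in enumerate(pages):
        zf.writestr(f"page{i}.txt", page)  # each entry is deflated and indexed

original = sum(len(p) for p in pages)
print(original, len(buf.getvalue()))  # the archive is much smaller
```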

    • Jimmycrackcrack@lemmy.ml · +1 · 34 minutes ago

      It’s an interesting question, though: if the files being zipped are already at or near maximum compression, I’d assume the zip of all those files would actually be, in some extremely tiny, negligible way, slightly larger than those files on their own.

    • Meow-Misfit@lemmy.world (OP) · +1 · 1 day ago

      Example: Book A. I have all 10 of its pages as a JPG each.

      Let’s say the size of all these 10 pages together is 300MB (not tech savvy, don’t know if this is realistic).

      If I put them in a zip, will the size be smaller? Like, reduced to 250MB or something?

      • klangcola@reddthat.com · +2 · 17 hours ago

        For JPGs, no, they will not get smaller; they may even get a smidge bigger if you zip them. Usually not enough to make a practical difference.

        Zip does generic lossless compression, meaning the archive can be extracted into a bit-perfect copy of the original. Very simplified, it works by finding repeating patterns, replacing each long pattern with a short key, and storing an index that restores the original patterns on extraction.

        JPGs use lossy compression, meaning some detail is lost and can never be reproduced. JPG is highly optimized to drop only the details that don’t matter much for human perception of the image.

        Since a JPG is already compressed, there won’t be any repeating patterns (duplicate information) left for the zip algorithm to find.
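
        You can simulate this without a real JPG (a sketch: random bytes stand in for already-compressed image data, since both look patternless to zip’s DEFLATE). The archive comes out a smidge bigger than the input:

```python
import io
import os
import zipfile

fake_jpg = os.urandom(300_000)  # stands in for an already-compressed image

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("page1.jpg", fake_jpg)

print(len(fake_jpg), len(buf.getvalue()))  # the zip is slightly larger
```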

      • slazer2au@lemmy.world · +5 · 1 day ago

        For images it may help, but images are already compressed, so there may not be a large saving in zipping them.

        An alternative option would be to convert them to a more storage-efficient format, like WebP.