Since (and indeed before) the days when the Apple II and Commodore 64 sat on our desks and we saved software to blank cassettes, computer users the world over have faced a universal conundrum: reducing file size without sacrificing quality or quantity.
The computing industry has a funny way of outdoing itself over and over again, both in the resources it makes available and in users’ ability to use them up quickly and hunger for more: smaller, quicker or just plain better.
So as Macs have grown with us, we’ve reached (and outgrown) their limitations. The floppy disk has truly gone by the wayside (prematurely, in some users’ opinions), and the Zip disk, once the envy of big-file owners everywhere, is looking as dated as the Y2K bug now that DVD-writable drives are becoming standard.
But as we continue to demand the most out of our transport media, and more so as the Internet continues to make file transfer an increasingly important part of life, there’ll always be a need for one of the oldest technologies around — file compression.
From the shareware font you download off the web to the new Adobe or Microsoft software you install, file compression is everywhere. The proof of its reach into a wired society is the range of standards the biggest players support. Aladdin Systems’ Stuffit Expander supports 22 formats across four platforms, and between it and its PC equivalent, Winzip, managing data compression is so seamless the two platforms might as well be identical.
But compression is a lot more than stuffing or zipping files so they don’t take all night to upload.
It’s the cornerstone of many of the file formats we take for granted, from JPG and GIF to MPG, MP3 and Quicktime (originally intended as a compression standard, Apple’s multi-talented media player has since evolved into a multimedia file format).
And as the Mac moves into a Unix-driven, Unix-inspired world with OS X, built on a system architecture one Apple spokesperson described as ‘not just an update, but a completely new step’, will more formats enter the fray?
The Standards
There is a sea of freeware, shareware and full-service compression management programs on the market for every platform imaginable. Aladdin more or less has the Mac compression market stitched up, and its Stuffit Expander, along with other free or popular utilities like Compact Pro, Diskdoubler, Zipit, Openup, SmartZip, MacRAR, GIF Blast, SunTar and LHA Expander, can decompress all the major formats.
Virtually the same situation exists on Windows, Linux and Unix. Winzip (by virtue of being a Windows utility) is the world’s most widely used file expander, while hundreds of freebies proliferate. For Unix, the old standard, compress, has been overshadowed by gzip, whose main advantages are (as stated on its homepage) ‘much better compression and freedom from patented algorithms’.
Mac OS X
So how will we be affected by such a radical shift in system architecture, powered by completely different underlying technologies? In the words of an Apple Australia technician: ‘The user experience won’t change too much.’
Whether we realise it yet or not, part of Apple’s strategy in giving us a totally new system was to prepare the Mac user for an increasingly connected and cross-platform world. As we swap more and more files between web servers and users, it will become increasingly important that platform-specific data is either made unnecessary or packaged with the file intended for transport.
The computer at the receiving end of a file doesn’t want strange codes like Mac resource forks, just what it needs to display the file. Part of Apple’s rationale (dubbed ‘revolutionary’ by one source) was to do away with resource forks and use ‘package files’ that contain the same information but stay with the file no matter where it goes. In theory, a file should no longer arrive at a computer across the Internet without any clue as to what it is or how to display it.
But how will all this affect file compression? Much of what you download will still arrive as .sit files. Some files, particularly software installers, will be disk images (.dmg files). You may occasionally come across a .gz or a .tar, and then you’re in Unix territory.
Unix Compression
.tar is the name given to a Unix format originally used to write archives to data tapes, hence the name, short for Tape Archive. It’s an archiving standard rather than a compression one, a bit like a Mac .sea without the compression. If you receive a .tar file, you’re simply getting a group of files bundled together. Typically a .tar will then be compressed in turn, creating a .tar.gz file.
.gz refers to gzip, the standard that has all but blown the previous Unix mainstay, compress, out of the water. Its popularity has led to a Mac version (MacGzip) to go alongside the standard Unix decompression utility, gunzip. MacGzip isn’t a native Mac OS X application, but because a PowerPC version exists, it will run in Classic mode. So dealing with groovyfont.gzip or groovyfont.gz should be as easy as dealing with groovyfont.sit.
Even if you come across an old-school Unix compress (.Z) file, Stuffit Expander, already available for Mac OS X, recognises all the major Unix formats anyway.
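To see the two-step .tar.gz idea in action, here is a minimal sketch using Python’s standard tarfile module (an assumption of ours; the article doesn’t name any tool). It bundles a couple of made-up files into a plain .tar and into a .tar.gz, so you can see that the archive step alone doesn’t shrink anything while the gzip step does. The file names and contents are purely illustrative.

```python
import io
import tarfile

def make_archive(files: dict[str, bytes], compress: bool) -> bytes:
    """Bundle several files into one tar archive; optionally gzip it (.tar.gz)."""
    buffer = io.BytesIO()
    mode = "w:gz" if compress else "w"  # "w" = plain .tar, "w:gz" = .tar.gz
    with tarfile.open(fileobj=buffer, mode=mode) as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

def list_members(blob: bytes) -> list[str]:
    """Open a .tar or .tar.gz ("r:*" detects any compression) and list its files."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as archive:
        return archive.getnames()

# Hypothetical sample files, just for illustration
files = {"readme.txt": b"Read me first\n" * 50, "groovyfont.txt": b"AaBbCc\n" * 200}
plain = make_archive(files, compress=False)
packed = make_archive(files, compress=True)
print(list_members(plain))      # the same group of files either way
print(len(plain), len(packed))  # the .tar.gz is much smaller than the bare .tar
```

The plain .tar is actually bigger than the files it contains, because tar pads every entry into fixed-size blocks. That padding is exactly the kind of redundancy gzip then squeezes out.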
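For the curious, Python’s standard gzip module reads and writes the same .gz format the gzip and gunzip command-line tools use. This short sketch, with made-up stand-in data for a font file, shows a round trip: compress, decompress, and check that every byte comes back.

```python
import gzip

original = b"groovyfont outline data " * 1000  # stand-in for a real font file
compressed = gzip.compress(original)           # what ends up in groovyfont.gz
restored = gzip.decompress(compressed)         # what gunzip hands back

assert restored == original  # lossless: every byte is recovered
print(f"{len(original)} bytes -> {len(compressed)} bytes")
```

Highly repetitive sample data like this compresses far better than a typical real file would, so treat the ratio it prints as a best case.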
The Future
Accepted standards, a stronger operating system built from some of the richest components of any personal computer so far, and software developers like Aladdin who stay well ahead of the game all add up to one thing. As we connect more and more to swap files, access web pages and deal with more platforms on more computers than ever before, for work or play, file compression technology should keep up without a hitch.
And we might decide those floppy drives weren’t so necessary after all!
What Compression Is
Imagine a printed, English-language version of War & Peace. With only 26 letters to choose from, plus spaces, punctuation and common words like ‘the’, ‘and’ and ‘of’, you’d expect plenty of repeated patterns. If you replaced each repeated word or string with a single short code, you’d save quite a lot of space.
File compression software works the same way, looking for repeated patterns in the data (and keep in mind that a large image file contains an enormous number of data bits, even compared to War & Peace). These repeated patterns are called ‘redundancy’.
After taking out the redundancy and replacing it with much shorter codes, the software keeps a ‘dictionary’ (usually built with an algorithm based on the work Lempel and Ziv published in the late 1970s) that records what each piece of shortened code should expand to at the decompression stage.
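The War & Peace idea can be sketched in a few lines of Python. This is a toy of our own devising, not how real compressors work internally: it swaps the most frequent words for short codes like ‘#0’ and assumes the text is single-spaced and never already contains words of that ‘#n’ form.

```python
from collections import Counter

def build_dictionary(text: str, size: int = 3) -> dict[str, str]:
    """Map the `size` most frequent words to short codes like '#0', '#1'."""
    counts = Counter(text.split())
    return {word: f"#{i}" for i, (word, _) in enumerate(counts.most_common(size))}

def shrink(text: str, table: dict[str, str]) -> str:
    # Assumes single spaces between words and no existing '#n' words in the text
    return " ".join(table.get(word, word) for word in text.split())

def expand(text: str, table: dict[str, str]) -> str:
    reverse = {code: word for word, code in table.items()}
    return " ".join(reverse.get(word, word) for word in text.split())

text = "the cat and the dog and the bird sat on the mat and the rug"
table = build_dictionary(text)
short = shrink(text, table)
print(short)
print(expand(short, table) == text)
```

Note that a real scheme would also have to store `table` alongside the shrunken text, which eats into the savings on short inputs.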
In theory — after stripping all the redundancy from a file and still including the dictionary of algorithmic code — the file size can be reduced drastically with little or no loss of data or quality.
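One well-known member of the Lempel-Ziv family is LZW (Welch’s 1984 refinement, used in GIF and in the old Unix compress). We picked it for this sketch because it is short; it is not necessarily what any particular Mac utility uses. A neat twist: the decompressor rebuilds the same dictionary as it goes, so LZW doesn’t have to store the dictionary in the file at all.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Emit dictionary codes for the longest byte strings already seen."""
    table = {bytes([i]): i for i in range(256)}  # start with every single byte
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate               # keep extending a known pattern
        else:
            codes.append(table[current])      # emit the code for the known prefix
            table[candidate] = len(table)     # learn the new, longer pattern
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly while expanding the codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:  # the code refers to the entry being built in this very step
            entry = previous + previous[:1]
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)

codes = lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")
print(len(codes), "codes for 24 bytes")
print(lzw_decompress(codes))
```

Real implementations pack these codes into 9 to 12 bits each (or more) rather than keeping them as Python integers, and cap or reset the dictionary when it fills up.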
What File Compression Isn’t
If you download a Mac file from the web, it’ll often have the file extension .hqx or .bin. Both are Macintosh encoding standards. .hqx (BinHex) converts binary data into ASCII text (the major Unix equivalent is uuencode, or UU; on the PC it’s Base64), while .bin (MacBinary) bundles a file’s data and resource forks into a single file that non-Mac systems can store safely.
The one most people have never heard of but everyone uses is MIME (Multipurpose Internet Mail Extensions), the lynchpin of email formatting that lets any email client decode the message it receives.
Encoded files are usually larger than the original, which raises the question: why bother? The answer is that every operating system can read ASCII text (a technology that goes back further than most of us can remember), so no matter where you’re sending a file, the transmission won’t corrupt binary data like software, graphics, video or sound.
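The size penalty is easy to measure. This sketch uses Python’s standard base64 module (Base64 being the encoding MIME relies on; BinHex itself has no maintained Python module, so we substitute Base64 here). Every 3 bytes of binary become 4 ASCII characters, roughly a one-third increase.

```python
import base64
import os

binary = os.urandom(3000)            # stand-in for a graphic or sound file
encoded = base64.b64encode(binary)   # plain ASCII, safe for text-only channels

print(len(binary), len(encoded))     # 3000 4000: every 3 bytes become 4 characters
assert all(c < 128 for c in encoded)         # nothing but 7-bit ASCII
assert base64.b64decode(encoded) == binary   # decoding restores the exact bytes
```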
You’ll sometimes download or receive a file called *.hqx.sit or *.bin.sit. It’s simply been encoded for safe transport and then compressed for faster transport (the decompression algorithm goes to work on the ASCII text just as it normally would on binary data).
When that happens, your decompression program will appear to run twice: the first pass decompresses the file, and the second decodes the ASCII text back to its binary state.
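The ‘runs twice’ effect is just the two steps undone in reverse order. In this sketch Base64 stands in for BinHex (.hqx) and Python’s zlib stands in for StuffIt (.sit); both are assumptions for illustration, since the real formats differ in detail.

```python
import base64
import zlib

# Stand-ins (assumptions): Base64 plays BinHex (.hqx), zlib plays StuffIt (.sit).

def pack(binary: bytes) -> bytes:
    """Encode for safe transport, then compress for faster transport."""
    encoded = base64.b64encode(binary)   # step 1: binary -> ASCII text
    return zlib.compress(encoded)        # step 2: squeeze the ASCII text

def unpack(blob: bytes) -> bytes:
    """Undo the two steps in reverse order: the 'program runs twice' effect."""
    encoded = zlib.decompress(blob)      # pass 1: decompress, leaving ASCII text
    return base64.b64decode(encoded)     # pass 2: decode ASCII back to binary

data = bytes(range(256)) * 40            # 10,240 bytes of sample "binary" data
blob = pack(data)
assert unpack(blob) == data
print(len(data), len(base64.b64encode(data)), len(blob))
```

Compressing after encoding claws back much of the size the encoding added, which is why the combination is worth the extra step.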
Putting Con into Compression
Using technical know-how usually associated with virus writers and hackers, some operators promise ‘incredible’ and ‘impossible’ compression ratios of down to 1%, but be wary. Research at the time of writing didn’t turn up any Mac versions, but several mailing-list FAQs reported scam compression programs.
As a computer science research website points out, there is a mathematical limit to how much smaller a compressed file can be than the original. For ‘lossless general compression the horizon seems to be approximately a 2:1 ratio’. The ratio can be much higher for lossy compression, but nowhere near what scam software claims.
So beware of utilities that claim unheard-of compression ratios. Usually they simply move most of a file’s data to an unused, hidden cluster on your disk, and usually take no precaution against that cluster being overwritten. In effect, all you’ve done is move the bulk of your data somewhere you can’t see it, where it will probably be overwritten before too long.
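Why can’t a clever program beat the limit? A standard counting (‘pigeonhole’) argument, which is our illustration rather than the cited website’s, shows it: a lossless compressor must give every input its own unique output, and there are simply fewer short files than long ones. The hypothetical function below computes the best possible share of inputs that any lossless scheme could squeeze down to a given size.

```python
from fractions import Fraction

def max_share_compressible(n_bits: int, max_out_bits: int) -> Fraction:
    """Upper bound on the share of all n-bit inputs that any lossless scheme
    can squeeze into outputs of at most max_out_bits bits."""
    inputs = 2 ** n_bits
    # Bit strings of length 0, 1, ..., max_out_bits: 2**(max_out_bits + 1) - 1 of them
    outputs = 2 ** (max_out_bits + 1) - 1
    # Lossless means each input needs its own output, so at most `outputs` inputs fit
    return Fraction(min(inputs, outputs), inputs)

# Even "shrink by one bit" can't work for every 8-bit input
print(max_share_compressible(8, 7))            # 255/256
# A 1,000-byte file (8,000 bits) squeezed to 1% (80 bits): a vanishingly small share
share = max_share_compressible(8000, 80)
print(share < Fraction(1, 10 ** 2000))         # True
```

This bound says nothing about typical files, which is where the empirical 2:1 figure comes in, but it does prove that no program can shrink every file to 1% and get it back intact.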
It’s Your Loss
Lossy
Images, sound or video carry a lot of redundant information. Rather than use compression algorithms to build a library of repeated patterns, sometimes your compression software will use creative judgement and simply discard some of it.
Some multimedia formats (such as MPG, JPG and MP3) use their compression behaviour to trim off detail they know you won’t miss, because you simply won’t see or hear it in your media player.
The result is that you have lost some data during compression, hence the term ‘lossy’ compression.
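A simplified way to picture ‘letting it go’ is quantisation: rounding values to a coarser scale so there are fewer distinct values to store. Real formats quantise transformed values (JPEG, for instance, quantises DCT coefficients) rather than raw samples, so treat this sketch, with its hypothetical sample values and step size, as an illustration of the trade-off only.

```python
def quantize(samples: list[int], step: int) -> list[int]:
    """Lossy step: round each sample to the nearest multiple of `step`."""
    return [round(s / step) for s in samples]  # fewer distinct values -> fewer bits

def dequantize(levels: list[int], step: int) -> list[int]:
    """Playback step: scale back up. The rounding error can't be undone."""
    return [level * step for level in levels]

samples = [1000, 1003, 998, 1510, 1497]   # hypothetical audio sample values
step = 16
restored = dequantize(quantize(samples, step), step)
print(restored)                            # close to, but not equal to, the original
print(max(abs(a - b) for a, b in zip(samples, restored)))  # never more than step / 2
```

A bigger `step` means smaller files and more audible or visible error, which is essentially the quality slider in a JPEG or MP3 encoder.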
Lossless
Lossless compression is used for data that has to be recreated with 100% accuracy at the decompression stage (such as installer software or text files). As the name implies, no data is lost.
Unless you want to get very technical, don’t worry about using the wrong type: each piece of compression software is built for a particular kind of media, so it already knows which approach to take.
What Compression Format is That?
If you’re confused about the difference between compression utilities and compression formats, that’s because there often isn’t one.
Compression itself only works one way, by using algorithms to identify and remove redundancy. But some of the biggest names (and professional bodies) in the business make compression part of our lives in very different ways:
GIF
Graphics Interchange Format. Invented by Compuserve, the first major online service provider, and now the industry standard for non-photographic images on the web. Exploits the enormous data gap between what a screen needs and what print needs (by limiting each image to a palette of 256 colours or fewer, for example), and its built-in compression results in ridiculously small file sizes.
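GIF stores each pixel as an index into a colour table of at most 256 entries, then compresses those indices with LZW. This sketch shows only the palette step, with a hypothetical two-colour logo as input: each 3-byte RGB pixel becomes a 1-byte index.

```python
RGB = tuple[int, int, int]

def to_palette(pixels: list[RGB]) -> tuple[list[RGB], list[int]]:
    """Replace each 3-byte RGB pixel with a 1-byte index into a shared colour table."""
    palette: list[RGB] = []
    lookup: dict[RGB, int] = {}
    indices: list[int] = []
    for colour in pixels:
        if colour not in lookup:
            if len(palette) == 256:
                raise ValueError("more than 256 colours: reduce the colours first")
            lookup[colour] = len(palette)
            palette.append(colour)
        indices.append(lookup[colour])
    return palette, indices

# Hypothetical 100 x 100 logo: white background with some navy lettering
pixels = [(255, 255, 255)] * 9000 + [(0, 0, 128)] * 1000
palette, indices = to_palette(pixels)
raw_size = 3 * len(pixels)                     # 30,000 bytes as full RGB
indexed_size = 3 * len(palette) + len(indices) # 6-byte table + 10,000 indices
print(raw_size, indexed_size)
```

Long runs of the same index, like that white background, are then exactly what the LZW stage compresses so well. Photographs with thousands of colours don’t fit this model, which is why they go to JPEG instead.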
JPEG
Named after the Joint Photographic Experts Group who conceived the standard, JPG is used for compressing colour and greyscale photographic images. It’s used extensively on the web and is the technology behind a million joke emails. Interestingly, JPEG was designed with the knowledge that the human eye can’t perceive changes in colour as accurately as changes in brightness.
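That brightness-versus-colour insight shows up in how JPEG files typically handle colour. Images are converted from RGB into one brightness channel (Y) and two colour channels (Cb, Cr), and many encoders then keep only a quarter of the colour samples. The sketch below uses the JFIF conversion formulas; the 2x2 averaging function is our own simplified version of that subsampling and assumes even image dimensions.

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """JFIF (BT.601) conversion: Y carries brightness, Cb and Cr carry colour."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_2x2(channel: list[list[float]]) -> list[list[float]]:
    """Average each 2x2 block, keeping 1/4 of the colour samples.
    Assumes an even number of rows and columns."""
    return [
        [
            (channel[i][j] + channel[i][j + 1] + channel[i + 1][j] + channel[i + 1][j + 1]) / 4
            for j in range(0, len(channel[0]) - 1, 2)
        ]
        for i in range(0, len(channel) - 1, 2)
    ]

print(rgb_to_ycbcr(255, 255, 255))        # white: full brightness, neutral colour (~128)
print(subsample_2x2([[1, 2], [3, 4]]))    # [[2.5]]
```

The brightness channel keeps full resolution, so the picture still looks sharp even though three-quarters of the colour detail has gone.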
MPEG
Short for Moving Picture Experts Group, MPEG is the compression and presentation format for video files, leading the race ahead of .avi (and, through MP3, ahead of .wav for audio). It’s lossy, but much smaller for similar quality, thanks to what one technical website called ‘very sophisticated compression techniques’.
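Much of MPEG’s saving comes from the fact that consecutive video frames are nearly identical. Real MPEG encoders use motion-compensated prediction plus lossy transform coding; this deliberately simplified, lossless sketch (hypothetical `encode` and `decode` functions of our own) just stores the first frame whole and then only the pixels that change from frame to frame.

```python
Frame = list[int]  # one frame as a flat list of pixel values

def encode(frames: list[Frame]) -> tuple[Frame, list[list[tuple[int, int]]]]:
    """Keep the first frame whole, then record only (position, new value) changes."""
    key = frames[0]
    deltas = []
    for prev, cur in zip(frames, frames[1:]):
        deltas.append([(i, v) for i, (p, v) in enumerate(zip(prev, cur)) if p != v])
    return key, deltas

def decode(key: Frame, deltas: list[list[tuple[int, int]]]) -> list[Frame]:
    """Rebuild every frame by applying each set of changes to the previous frame."""
    frames = [list(key)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, v in changes:
            frame[i] = v
        frames.append(frame)
    return frames

# Hypothetical clip: a dark 100-pixel frame with one bright "ball" moving along
frames = [[0] * 100 for _ in range(5)]
for k in range(5):
    frames[k][k] = 255
key, deltas = encode(frames)
print([len(d) for d in deltas])   # only 2 changed pixels per frame, not 100
print(decode(key, deltas) == frames)
```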
DivX
A technology that compresses digital video for fast download over cable or ADSL with little visible loss of picture quality. Currently enjoying a rash of popularity among cable Internet users swapping full-length films.