I did some very extensive testing and reviews back in the '90s when Stacker and its ilk were on the market. Microsoft bought out much of that technology, and it's now part of the operating system. You can select a folder, a file, or a whole drive and elect to "compress" the data.
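If you'd rather script that than click through Explorer, here's a minimal sketch in Python that shells out to the compact.exe utility built into Windows; the data path is just a placeholder you'd swap for your own Notes data directory.

    # Sketch: enable NTFS compression on all NSF files under a folder by
    # calling Windows' built-in compact.exe. The path below is a placeholder.
    import subprocess

    data_dir = r"D:\Lotus\Domino\Data"   # substitute your own Notes data directory

    # /c = compress, /s:<dir> = include subdirectories, *.nsf = Notes databases only
    subprocess.run(["compact.exe", "/c", f"/s:{data_dir}", "*.nsf"], check=True)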
For the sake of this discussion, let's assume a compressed file will be 50% smaller than a normal one. That's a lower compression ratio than you'd get from zipping an NSF file, but on-disk compression isn't quite as effective because of the way it works.
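If you'd rather not take the 50% figure on faith, you can check what you're actually getting on one of your own databases. Here's a small Windows-only sketch that compares the logical file size with the space actually allocated on disk via the Win32 GetCompressedFileSizeW call; the path is a placeholder and error handling is omitted.

    # Sketch: measure the real NTFS compression ratio of a single NSF file.
    import ctypes
    import ctypes.wintypes as wt
    import os

    kernel32 = ctypes.windll.kernel32
    kernel32.GetCompressedFileSizeW.restype = wt.DWORD
    kernel32.GetCompressedFileSizeW.argtypes = [wt.LPCWSTR, ctypes.POINTER(wt.DWORD)]

    def on_disk_size(path):
        # Bytes actually allocated on disk; reflects NTFS compression.
        high = wt.DWORD(0)
        low = kernel32.GetCompressedFileSizeW(path, ctypes.byref(high))
        return (high.value << 32) + low

    db = r"D:\Lotus\Domino\Data\mail\bigdb.nsf"   # placeholder path
    print(f"on disk: {on_disk_size(db) / os.path.getsize(db):.0%} of logical size")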
Taking just disk I/O into account, that means 50% less data traveling to and from the mechanical media itself -- the slowest part of the transfer. In a perfect world, that means you double the speed. In practice, however, there is overhead. You have to assign processor time to the compression, and you have the overhead of the compression code itself and the in-memory copying of data that must take place.
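A back-of-envelope model of that trade-off looks like this. The throughput numbers are illustrative assumptions, not measurements, and the model pessimistically treats the CPU and disk work as strictly serial, which understates the gain on a real system where they overlap.

    # Illustrative model of compressed vs. uncompressed transfer time.
    DISK_MB_PER_S = 60.0    # assumed raw sequential disk throughput
    CPU_MB_PER_S = 400.0    # assumed (de)compression throughput of the CPU
    RATIO = 0.5             # compressed size / original size (the 50% assumption)

    def transfer_seconds(megabytes, compressed):
        if not compressed:
            return megabytes / DISK_MB_PER_S
        disk_time = (megabytes * RATIO) / DISK_MB_PER_S   # half the bytes hit the platters
        cpu_time = megabytes / CPU_MB_PER_S               # but every byte gets (de)compressed
        return disk_time + cpu_time                       # worst case: no CPU/disk overlap

    print(transfer_seconds(100, compressed=False))   # ~1.67 s for 100 MB
    print(transfer_seconds(100, compressed=True))    # ~1.08 s even with the CPU overhead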
Twelve years or so ago, when I did those tests, I determined that on a 25MHz 386 processor with an IDE hard disk operating at 33MHz, you crossed the threshold where the processor time lost was less than the disk read/write time gained. In other words, compression was faster.
There are a lot of different options for drives today, but a typical drive transfers about 7-10 times faster than one from 12 years ago. Processors, however, tend to run 20 or more times faster (it's not just about clock speed, but leave it at that), and even more on a multiprocessor or Hyper-Threading machine. Add to that the fact that processor time is the one resource on most servers that isn't being tapped out. Most of the time we have 20-75% of processor cycles free.
My purely subjective testing thus far definitely supports my hypothesis that the increased processor use is well worth it if you can cut disk usage in half. I'm seeing end-user-noticeable performance increases using compressed NSF files on both servers and workstations. I have not done any objective, empirical testing recently, however.
In the early days of Stacker, my big worry was reliability, but the technology proved out and has been extremely reliable. Yes, it's true that a sector failure will wipe out twice as much data this way. Has it been an issue for you? Today's file systems are all virtualized anyway, so it's not like you're getting in the way of hardware calls to the drive on interrupt 0x13 like you were back then.
Comment Entry
I'll shortly be posting an update to my RAM drive blog explaining that in a production system I just can't produce any measurable benefits, although they're clearly visible on an old disk subsystem.
I'll enjoy testing the compressed file system this week.
I think some full compact tasks against a large DB, with and without compression, will be a good indicator of your theory.
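A rough harness for that comparison might look like the sketch below. It assumes the standalone Domino compact task (ncompact.exe on Windows) is on the PATH, that the databases are offline, and that the two placeholder paths point at copies of the same large NSF, one on an NTFS-compressed folder and one on a normal folder.

    # Sketch: time an offline compact of the same database on compressed
    # and uncompressed folders. Tool name and paths are assumptions.
    import subprocess
    import time

    def timed_compact(db_path):
        start = time.perf_counter()
        subprocess.run(["ncompact.exe", db_path], check=True)
        return time.perf_counter() - start

    print("compressed folder:  ", timed_compact(r"D:\compressed\bigdb.nsf"))
    print("uncompressed folder:", timed_compact(r"D:\plain\bigdb.nsf"))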