I did some very extensive testing and reviews back in the '90s when Stacker and its ilk were on the market. Microsoft bought out much of that technology, and it's now part of the operating system. You can select a folder, a file, or a whole drive and elect to "compress" the data.
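If you'd rather script that than click through Explorer, here's a minimal sketch in Python that shells out to the compact.exe utility built into Windows; the data path is just a placeholder you'd swap for your own Notes data directory.

    # Sketch: enable NTFS compression on all NSF files under a folder by
    # calling Windows' built-in compact.exe. The path below is a placeholder.
    import subprocess

    data_dir = r"D:\Lotus\Domino\Data"   # substitute your own Notes data directory

    # /c = compress, /s:<dir> = include subdirectories, *.nsf = Notes databases only
    subprocess.run(["compact.exe", "/c", f"/s:{data_dir}", "*.nsf"], check=True)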
For the sake of this discussion, let's assume a compressed file will be 50% smaller than a normal one. That's a lower compression ratio than you'd get from zipping an NSF file, but on-disk compression isn't quite as effective because of the way it works.
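If you'd rather not take the 50% figure on faith, you can check what you're actually getting on one of your own databases. Here's a small Windows-only sketch that compares the logical file size with the space actually allocated on disk via the Win32 GetCompressedFileSizeW call; the path is a placeholder and error handling is omitted.

    # Sketch: measure the real NTFS compression ratio of a single NSF file.
    import ctypes
    import ctypes.wintypes as wt
    import os

    kernel32 = ctypes.windll.kernel32
    kernel32.GetCompressedFileSizeW.restype = wt.DWORD
    kernel32.GetCompressedFileSizeW.argtypes = [wt.LPCWSTR, ctypes.POINTER(wt.DWORD)]

    def on_disk_size(path):
        # Bytes actually allocated on disk; reflects NTFS compression.
        high = wt.DWORD(0)
        low = kernel32.GetCompressedFileSizeW(path, ctypes.byref(high))
        return (high.value << 32) + low

    db = r"D:\Lotus\Domino\Data\mail\bigdb.nsf"   # placeholder path
    print(f"on disk: {on_disk_size(db) / os.path.getsize(db):.0%} of logical size")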
Taking just disk I/O into account, that means 50% less data traveling to and from the mechanical media itself -- the slowest part of the transfer. In a perfect world, that means you double the speed. In practice, however, there is overhead. You have to assign processor time to the compression, and you have the overhead of the compression code itself and the in-memory copying of data that must take place.
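A back-of-envelope model of that trade-off looks like this. The throughput numbers are illustrative assumptions, not measurements, and the model pessimistically treats the CPU and disk work as strictly serial, which understates the gain on a real system where they overlap.

    # Illustrative model of compressed vs. uncompressed transfer time.
    DISK_MB_PER_S = 60.0    # assumed raw sequential disk throughput
    CPU_MB_PER_S = 400.0    # assumed (de)compression throughput of the CPU
    RATIO = 0.5             # compressed size / original size (the 50% assumption)

    def transfer_seconds(megabytes, compressed):
        if not compressed:
            return megabytes / DISK_MB_PER_S
        disk_time = (megabytes * RATIO) / DISK_MB_PER_S   # half the bytes hit the platters
        cpu_time = megabytes / CPU_MB_PER_S               # but every byte gets (de)compressed
        return disk_time + cpu_time                       # worst case: no CPU/disk overlap

    print(transfer_seconds(100, compressed=False))   # ~1.67 s for 100 MB
    print(transfer_seconds(100, compressed=True))    # ~1.08 s even with the CPU overhead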
Twelve years or so ago, when I did those tests, I determined that on a 25MHz 386 processor with an IDE hard disk operating at 33MHz, you crossed the threshold where the processor time lost was less than the disk read/write time gained. In other words, compression was faster.
There are a lot of different options for drives today, but a typical drive transfers about 7-10 times faster than one from 12 years ago. Processors, however, tend to run 20 or more times faster (it's not just about clock speed, but leave it at that), and even more on a multiprocessor or Hyper-Threading machine. Add to that the fact that processor time is the one resource on most servers that isn't being tapped out. Most of the time we have 20-75% of processor cycles free.
My purely subjective testing thus far definitely supports my hypothesis that the increased processor use is well worth it if you can cut disk usage in half. I'm seeing end-user-noticeable performance increases using compressed NSF files on both servers and workstations. I have not done any objective, empirical testing recently, however.
In the early days of Stacker, my big worry was reliability, but the technology proved out and has been extremely reliable. Yes, it's true that a sector failure will wipe out twice as much data this way. Has it been an issue for you? Today's file systems are all virtualized anyway, so it's not like you're getting in the way of hardware calls to the drive on interrupt 0x13 like you were back then.
Comment Entry
I'll shortly be posting an update to my RAM drive blog explaining that in a production system I just can't produce any measurable benefits, although they're clearly visible on an old disk subsystem.
I'll enjoy testing the compressed file system this week.
I think some full compact tasks against a large DB, with and without compression, will be a good indicator of your theory.
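A rough harness for that comparison might look like the sketch below. It assumes the standalone Domino compact task (ncompact.exe on Windows) is on the PATH, that the databases are offline, and that the two placeholder paths point at copies of the same large NSF, one on an NTFS-compressed folder and one on a normal folder.

    # Sketch: time an offline compact of the same database on compressed
    # and uncompressed folders. Tool name and paths are assumptions.
    import subprocess
    import time

    def timed_compact(db_path):
        start = time.perf_counter()
        subprocess.run(["ncompact.exe", db_path], check=True)
        return time.perf_counter() - start

    print("compressed folder:  ", timed_compact(r"D:\compressed\bigdb.nsf"))
    print("uncompressed folder:", timed_compact(r"D:\plain\bigdb.nsf"))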