Data compression

This post examines the compression of a database named Bankware (BW) at a client site. The database size was 300+ GB.

The tests for BW were done on the sk-bankware server aimed for the ART (Accounting, Reporting and Treasury) applications in the Skopje regional office of the client. The sk-bankware server owns the following HW&SW:

  • CPU: NUMA 2 x 12 = 24 cores

  • RAM: 32 GB (28 GB dedicated to the SQL Server instance)

  • Storage: RAID5 disks 2.5 TB

  • SQL Server: Microsoft SQL Server Enterprise Edition (64-bit)

  • OS: Windows server 2008 R2 Standard edition

Short introduction to compression

Data compression in SQL Server can be of two types: PAGE and ROW. The PAGE type compression is more common for the OLAP-like systems and the ROW type is more common for the OLTP-like systems. As the PAGE type is stronger than ROW, it requires the CPU to do more compressing and decompressing of the data, and it’s one reason why PAGE is usually used in systems where there is no much data update (update, delete, insert), or it is less frequently done.

Data compression is recommended by Microsoft for both OLTP and OLAP systems. In the following white paper they show the benefits of the compression as an advanced feature. Then, SAP by default use ROW type of compression for all their applications (blog white paper 1, white paper 2). Another satisfying story of this feature is the following of NetApp.

Conclusion of the Microsoft’s white paper for the compression is that it’s safe and useful feature for all systems in general.

Advantages and disadvantages

In order to check and ensure that the recommendations for applying data compression on systems is working well for Bankware, some tests were done on it.

There are advantages and disadvantages of applying compression on BW. If the advantages are high and disadvantages are low or negligible, then the tradeoff is warrantable and the feature is useful for the system where it’s applied. If the disadvantages’ impact is significant on the system, then the feature should not be applied to the system.

Advantages identified:
  • Significantly reduced storage for the data

  • Significantly reduced number of data pages in the memory buffer of the SQL Server instance

  • Significantly reduced I/O transfer

  • Faster queries

Disadvantages identified:
  • CHECKDB is expected to last more

  • REBUILD indexes operation is expected to last more

  • Some other maintenance operations (update stats, reorganize, …) are expected to last more

Results and tradeoff of the compression

The PAGE type compression was applied on Bankware. Sometimes the PAGE and ROW types of compression don’t differ much in size. However PAGE is more recommended for OLAP-like systems. The decision for the PAGE type was chosen additionally on the columns definitions analysis. The overview showed that the char-type columns with their definitions dominate in terms of the storage required for storing the data. 

Data storage, I/O performance counters and Queries execution

Data storage. The gain of the applied compression is shown on Figure 1.

Figure 1. Data storage for compressed and uncompressed Bankware

The storage gain by applying compression usually flows between 30-90%. An OLAP system can be compressed up to 90%, whereas the OLTP systems are usually compressed within the range of 30-50%. A compression rate of 60% for our Bankware is very satisfying.

I/O performance counters. The I/O benefit is significant. A number of 21 I/O performance counters were monitored for a simulating multi-session environment during an intensive 30 minute time interval. Their values are showed in the table below.

CounterCompressed valueUncompressed valueComment
valuevalue
Avg. Disk Bytes/Read133,510.00190,967.00Smaller amount of bytes read in the compressed database
Avg. Disk Bytes/Transfer134,259.00191,457.00Smaller amount of transfered bytes in the compressed database
Avg. Disk Queue Length56.44168.89Significantly smaller disk queue length in the compressed database
Avg. Disk Read Queue Length56.36168.68Significantly smaller read disk queue length in the compressed database
Avg. Disk sec/Read0.060.14Significantly smaller number or disk reads per second
Avg. Disk sec/Transfer0.060.14Significantly smaller number or disk transfers per second
Avg. Disk Write Queue Length0.080.21Significantly Smaller write queue length
Buffer cache hit ratio99.6499.6A little better buffer cache hit ratio in the compressed database
Current Disk Queue Length31.6863.16Smaller current disk queue length
Disk Bytes/sec69,021,971.00174,491,018.00Significant smaller amount of disk bytes per second
Disk Read Bytes/sec67,028,918.00171,626,244.00Significant smaller amount of read disk bytes per second
Disk Reads/sec302.8721.51Significant smaller number of disk reads per second
Disk Transfer/sec336.4768.81Significant smaller number of disk transfers per second
Disk Writes Bytes/sec1,993,053.002,864,774.00Smaller number of disks writes per second
Disk Writes/sec33.647.3Smaller writes per second
Free Pages2,357,687.00902,590.84Significantly higher number of free pages
Lazy Writes/sec0.320.79Smaller number of lazy writes
Page IO latch waits (avg) (ms)35.989.96Significantly smller IO latch waits
Page life expectancy716818.75A bit smaller page life expectancy (min is 300) at the compressed database
Processor time (%)50.6858.57Smaller Processor time

Table: I/O Performance counters

Results are improving.

The queries that execute behind the following reports were run on a compressed and uncompressed version of the Bankware database with equal overloading conditions:

  • Reporting->Reports->Current Internal Balance

  • Reporting->Reports->Statements

  • Reporting->Entry analysis->Ledger entries

  • Reporting->Entry analysis->Subledger entries

  • Reporting->Reports->Generate report

The number of queries that took part in the test was around 170. Aggregated results for the runs of the two groups of queries are shown in the next Figure 2.

Figure 2. Execution on queries on a compressed and uncompressed Bankware database

The gain for the queries execution speed is 23%. This is a good improvement too.

CHECKDB

CHECKDB is a maintenance command that is used at Bankware on a weekly basis. The results were expected to be lower for the compressed database because of the compressed data; and in that case more operations are done in the CPU, like decompression, check and compression operations. When the CHECKDB is done on uncompressed only the check operation is done by the CPU. Results are shown in the next Figure 3.

Figure 3. Results for the CHECKDB on compressed and uncompressed BW database

Whilst the CHECKDB for uncompressed database finished in 1995 seconds (33 min 15 sec), the same operation for the compressed database lasted for 2476 seconds (41 min 16 sec). The time duration for this operation was about 20% more on the compressed database.

REBUILD

Test for the indexes maintenance was done too. Less page splits will occur in the new index structure on the compressed database but more CPU time will be spent for the compression and decompression during the update-type operations. Results are shown in Figure 4.

Figure 4. Results for the REBUILD operation for compressed and uncompressed BW database

The rebuild time for the indexes does not differ much, i.e. the difference is about 300 seconds or 5 minutes, which compared to 1 hour and 30 minutes (approx) is practically not a difference.

Final words

A conclusion is that compression as an advanced feature in SQL Server is fitting very good on the Bankware database and therefore it should be applied. The tradeoff between the advantages and disadvantages is positive for the advantages. Disadvantages are very small so that they are almost not to be noticed.

Tweet about this on TwitterShare on LinkedInShare on FacebookShare on Google+

Leave a Reply

Your email address will not be published. Required fields are marked *

*