As the data grows in relational databases (new business requirements, managing audit data etc), the cost associated with the disk systems to store that data and the resources required to manage them becomes vital. Oracle understands this requirement and it is continuously implementing and improving on the compression techniques and algorithms while simultaneously meeting the application scalability and performance requirements. In this article, I will be discussing the concept behind the Oracle Compression from database perspective and its advantages and disadvantages. The various compression features might differ with different releases but the underlying concept remains the same. I will also demonstrate the actual compression happening at the database level with suitable examples.
Earlier RDBMS versions (6,7) were using a dictionary based approach based on static, table-wide dictionary. So, a table will have an associated look up global dictionary table which holds the lookup entries and the compressed data. The advantage of this approach was higher compression factor as compared to the latest block level compression. Also the global dictionary table will take less space as compared to the block level approach (explained next). But with the advent of new partition techniques and features, the block level approach holds greater merits.
Latest Compressing Techniques (specifically in reference to Oracle 10GR2):
The latest compression technique used by Oracle is a unique compression algorithm specially designed for RDMBS systems. The main features of this approach are:
Block Level compression:
The compression window for data compression is a database block. This means the data is compressed block wide instead of table. Every compressed block is self contained and all the information needed to uncompress the data is available within the block.
The block contains a dictionary called as “symbol table” within each compressed block. This symbol table will have the common data and the links to associate this data with the actual rows.
If a sequence of columns are having common data, then instead of saving each individual column as separate entry, the combination of the columns will be saved in the symbol table as a whole. This approach is particularly useful in case of materialized views using grouping sets and cubes (where the data is ordered).
Cross Column Compression:
As explain above, the symbol table holds the common data. This is true for the column values for different columns. So, if column values from the different columns are different, they all will share the same symbol table entry.
Advantage of Block Level Compression:
Disk Space saving:
This is the obvious reason for implementing data compression. Although the space saving comes at a cost of decreased query performance (time required to decompress the data), when used along with other features, this can be minimized.
Coexisting compressed and non-compressed data:
The same table can have both compressed and non-compressed data. This is possible because the compression is at block level, one block might hold compressed data and the other block might hold normal uncompressed data. This is particularly useful with partitioning where you can compress the old partitions while keep the current partition as non-compressed.
Less Buffer Size required:
The data from a compressed block is read and cached in its compressed format and it is decompressed only at data access time. As the data is cached in compressed form, more data can be hold into the same
amount of buffer cache.
How to use compression:
Compression can be used for tables, partitions and materialized view (Index compression is also possible, but that is out of scope of this article). The keyword “COMPRESS” is used with the table’s meta-data to compress the data. Please note that the compression works only in the following cases:
CREATE TABLE AS SELECT
Following example shows the compression using CTAS.
CREATE TABLE tbl_nocompress AS SELECT mod(rownum, 10) || 'ANY ADDITIONAL STRING TO MAKE THE COLUMN LENGTH LONGER, COMPRESSION WORKS BETTER' col1 FROM all_objects; CREATE TABLE tbl_compress COMPRESS AS SELECT mod(rownum, 10) || 'ANY ADDITIONAL STRING TO MAKE THE COLUMN LENGTH LONGER, COMPRESSION WORKS BETTER' col1 SELECT segment_name, segment_type, blocks FROM dba_segments WHERE segment_name IN ('TBL_COMPRESS', 'TBL_NOCOMPRESS'); -- SEGMENT_NAME | SEGMENT_TYPE | BLOCKS -- TBL_COMPRESS | TABLE | 148 -- TBL_NOCOMPRESS | TABLE | 1208
Two tables are created having 10 distinct values, one is compressed and other without compression. The blocks used for non-compressed table are almost 10 times that of compressed one. Please note that this is an indicative example and actual compression ratio will depend on database settings (for ex: db block size).
INSERT /*+ APPEND */
Now, lets insert the same data, with and without using “APPEND” hint:
INSERT INTO tbl_compress SELECT mod(rownum, 10) || 'ANY ADDITIONAL STRING TO MAKE THE COLUMN LENGTH LONGER, COMPRESSION WORKS BETTER' col1 FROM all_objects; SELECT segment_name, segment_type, blocks FROM dba_segments WHERE segment_name IN ('TBL_COMPRESS', 'TBL_NOCOMPRESS'); -- SEGMENT_NAME | SEGMENT_TYPE | BLOCKS -- TBL_COMPRESS | TABLE | 1224 -- TBL_NOCOMPRESS | TABLE | 1208
Although the tbl_compress is created with “COMPRESS” parameter (check the create script), normal DMLs are done without compression. If you use insert/update/delete, the modified data is NOT COMPRESSED. If you start with a large compressed table and update each row, each row will be decompressed and stored decompressed. This verifies that same table can hold both compressed and non-compressed data. Now, recreate the table and insert the entries in append mode:
DROP TABLE tbl_compress PURGE; CREATE TABLE tbl_compress COMPRESS AS SELECT mod(rownum, 10) || 'ANY ADDITIONAL STRING TO MAKE THE COLUMN LENGTH LONGER, COMPRESSION WORKS BETTER' col1 FROM all_objects; INSERT /*+ APPEND*/INTO tbl_compress SELECT mod(rownum, 10) || 'ANY ADDITIONAL STRING TO MAKE THE COLUMN LENGTH LONGER, COMPRESSION WORKS BETTER' col1 FROM all_objects; -- SEGMENT_NAME | SEGMENT_TYPE | BLOCKS -- TBL_COMPRESS | TABLE | 292 -- TBL_NOCOMPRESS | TABLE | 1208
This time we inserted the data in append mode, and the new data is compressed (block size approx. doubled instead of 10 times). Please note that although we are inserting the same data as it is already existing in the tables, new blocks are getting allocated to save it. The compression would have been much higher, if the table level compression were used instead of block level compression.
DIRECT PATH LOADS (SQLLRD)
while using DIRECT=TRUE with sqlldr
ALTER TABLE <TABLE_NAME> MOVE
Now lets try to compress the tbl_nocompress with ALTER command.
ALTER TABLE tbl_nocompress MOVE COMPRESS; SELECT segment_name, segment_type, blocks FROM dba_segments WHERE segment_name IN ('TBL_COMPRESS', 'TBL_NOCOMPRESS'); -- SEGMENT_NAME | SEGMENT_TYPE | BLOCKS -- TBL_COMPRESS | TABLE | 292 -- TBL_NOCOMPRESS | TABLE | 148
This is an easy way of compressing the table in case you were doing normal DMLs on a table with “compressed” attribute set (Remember that this will make the indexes on the table invalid because of rowid changes).
Achieving higher Compression Ratio:
Read Only/mostly Data => Compression is generally used for “read only” or “read mostly” data like archived data or old partitioned data. The current running partition can be kept uncompressed while the old partitions can be compressed.
Large DB Block Size => The larger the size of block size, the more rows it finds for compression in the same block and thus the chances of achieving higher compression are more.
Ordered data => With the right sorting order, more rows can be find with common data and thus higher compression can be achieved. As mentioned before, this can be particularly useful with materialized views used for aggregating data.
Long Fields with less Cardinality => Longer fields with less cardinality are ideal for compression as the space saving is larger.
Adding / Removing Columns from compressed tables:
Earlier it was not possible to add or remove columns from a compressed table. In Oracle 10G, you can add the column but can’t drop it (tested on 10.2.0.4). One way is to set the columns as unused:
ALTER TABLE tbl_compress ADD col2 NUMBER; ALTER TABLE tbl_compress DROP COLUMN col2; -- ORA-39726: unsupported add/drop column operation on compressed tables ALTER TABLE tbl_compress SET UNUSED COLUMN col2;
Please note that once you set the column as unused, there is no easy way to get it back. There is no command like “ALTER TABLE SET USED COLUMN
“It is sort of “not necessary” of course in this case – in fact, I’d say dropping a column from a compressed table probably should be done via ‘alter table t move’ anyway (in order to, well, compress the data on the block again…)”
This should have given a broad idea about the concept. In oracle 11G, oracle has added more sophisticated compression techniques like “OLTP Compression”, “RMAN Compression”, “Network Compression” but that is out of scope of this article. Also, we are not discussing the performance penalty we pay for compression and the trade-off between space/performance. There are in-depth white papers available on web for consultation.