historical partitioning is the solution for many problems
volkmar.buehringer, January 17, 2003 - 12:05 pm UTC
this feature fits nicely with historical partitioning
- from time to time all old data is packed into a historical partition with compress turned on
it is important to mention that with the right sort order
you can improve the compression ratio, and the more
old data with the same base key you have, the more you can compress
( sorting can also improve the clustering factor of
your indexes )
so tables would consist of a recent partition
with a large pctfree for updates
and a compressed historical partition ( with no updates )
indexes are local, rebuilt with pctfree 0 on the old partition
( there are no changes there )
while the indexes on the recent partition are left as they are
this setup not only saves a lot of disk, it speeds up
queries of old data ( think of the clustering factor for indexes ),
gives better cache utilisation ...
and also speeds up insertion of new data, because the
indexes of the recent partition are smaller and so better
cached
from time to time you have to rebuild this setup to
move data from the recent partition into the historical partition
compared to the usual, mostly useless reorganizations, this one
really helps
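A minimal sketch of such a layout (all table, column, partition and index names here are made up purely for illustration):

-- recent partition: uncompressed, with room for updates
-- historical partition: compressed, effectively read-only
create table orders
( order_id   number,
  order_date date,
  payload    varchar2(100)
)
pctfree 20
partition by range (order_date)
( partition p_hist   values less than (to_date('2003-01-01','yyyy-mm-dd')) compress,
  partition p_recent values less than (maxvalue)
);

create index orders_idx on orders(order_date) local;

-- periodically, once aged data has been rolled into the historical
-- partition, repack it and rebuild just that index partition with pctfree 0
alter table orders move partition p_hist compress;
alter index orders_idx rebuild partition p_hist pctfree 0;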
Resource Tradeoff.
Jer Smith, January 17, 2003 - 12:45 pm UTC
Whether it's compression on the filesystem level for your applications, compression in network protocols, or compression on your database tables, you're making a trade-off.
The way I always looked at it is that compression offloads additional work from the disk onto the CPU (and maybe RAM). In the example, you're reading a whole lot fewer blocks off disk when you're reading the compressed table--to make up for it, at some point, those need to be decompressed, which takes CPU work.
If you've got a slow disk, but plenty of CPU to spare, you should benefit by compression. If you've got a lightning fast 15K RPM disk array and a 486, you'd be a fool to compress. (well, you are already, but...)
What I find interesting is that the evidence from the examples here doesn't illustrate my theory. Each of the compressed queries shows _less_ CPU time for the compressed tables. I'd guess that either the CPU time due to decompression isn't included in the stats, or the CPU used by reading the disk is greater than that used for the decompression (which would depend on your system, I would guess).
Any thoughts?
Also another question related to this: you mentioned the compressed table being nicer on your buffer cache--so it's stored in there still compressed? Does that mean we have to go through decompressing the data every time?
January 17, 2003 - 2:06 pm UTC
the cpu time to do the compression is there -- it is all there.
We have a "low bandwidth" compression algorithm. A simple symbol table lookup. It is not binary compression ala 'zip' or 'gzip' (which is hugely costly). It is a symbol table lookup which is very fast.
You are seeing the side effect of not having to latch, relatch, get, reget the blocks over and over again.
The block is stored "compressed" on disk and in the cache.
One of the reasons I did things like:
select * from big_table_compressed
and
select * from big_table_compressed where id = :x
was to drive home that this stuff doesn't carry a 200 or 300% overhead -- no way. We decompressed each block (more than once) for the select * from big_table_compressed -- we decompressed each block many times for the keyed read....
In depth analysis - Excellent!
Paul D. Phillips, January 17, 2003 - 1:20 pm UTC
I hope I didn't come across as being critical; in my testing (which, of course, I have since purged) the run times were very different. I believed it would be encountered by others as well.
I will need to rerun my tests, obviously. I'll try to share what I find with you. For me, though, it will be difficult to set the tests up again, since I used a much larger amount of data. Regardless, I appreciate your "rapid reaction". I was very disappointed in the results. Now it seems the COMPRESS answer may be the one I was originally hoping for - smaller and just as efficient.
Why is it optional
Frank, January 18, 2003 - 3:25 am UTC
Not having read the original article, I do not see any cons in your tests here.
Why did Oracle not make it the default setting ?
January 18, 2003 - 9:03 am UTC
it only works on BULK operations
o insert /*+ append */
o create table as select
o direct path loads
o moves
it is for read mostly/read only data.
it will impact the speed of the load to a degree.
it would change the default -- something which we are always hesitant to do -- and with this one, since it is specifically and only a data warehouse feature, I don't see it becoming the default.
The article is online at:
https://asktom.oracle.com/Misc/oramag/on-the-explicit-size-and-complex.html
A reader, June 17, 2003 - 11:01 am UTC
Hi Tom,
Is table compression good for data warehouses only? Can we use it in OLTP applications?
Our DBA is telling us an "update" is very costly for a compressed table. The reason given is that it has to read the data, uncompress it, update it, then compress it again and save it to disk.
Could you please throw some light on this (updates)?
Thanks
June 17, 2003 - 12:50 pm UTC
DBA is right but for all of the wrong reasons.
Table compression works only for BULK direct path type operations:
alter table move
create table as select
insert /*+ APPEND */
direct path sqlldr
if you use insert/update/delete -- the modified data IS NOT COMPRESSED. If you start with a large compressed table and update each row -- each row will be decompressed and stored decompressed (that is the "costly" part)
Now, if you have say an AUDIT TRAIL in this OLTP database that is partitioned by say MONTH and you must keep say 3 years online you can:
a) alter the table to be compress -- NO IMPACT on any inserted data, no overhead here.
b) at the end of the month, create next months partition -- just like you do now
c) alter table move partition -- move last months partition, this'll COMPRESS it as well (big bulky moves)
so, you can use this in OLTP systems -- you just apply it to READ MOSTLY or READ ONLY data. Using partitioning and other techniques makes this very feasible.
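As a rough sketch of those three steps (table and partition names are hypothetical, and the exact syntax may vary by release):

-- a) mark the table/partitions compressed -- existing rows and ordinary
--    (conventional path) inserts are not touched by this
alter table audit_trail modify default attributes compress;

-- b) at the end of the month, add next month's partition as usual
alter table audit_trail add partition aud_2003_08
      values less than (to_date('2003-09-01','yyyy-mm-dd'));

-- c) move last month's partition -- the big bulky move compresses it
alter table audit_trail move partition aud_2003_07 compress;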
Dave, June 17, 2003 - 3:07 pm UTC
Let me mention one of the "cons" of compressed tables-- you can neither add nor delete columns from a compressed table.
A reader, June 18, 2003 - 9:41 am UTC
Why is it so Tom?
June 19, 2003 - 7:33 am UTC
if you are talking about the followup right above -- it is currently a limitation of a compressed table. "it just is", the block format is very different.
It is trivial to "drop" a column (at least set unused). You would simply use a view (rename T to T_TBL, create view T as select a,b,d,e from t_tbl -- column C is now "gone")
Is there a way to estimate the space saving from compression without actually compressing the table?
Ajeet, November 05, 2003 - 5:21 pm UTC
Hi Tom,
I have read almost all I could find on compression, including your book. Very nice feature. I have a big warehouse with large tables and I am compressing them one by one. I want to collect data on the space saved. Is there a way I can tell the difference in space usage between a compressed and a noncompressed table without actually compressing the tables? The reason I ask: my test environment has a lot less data than production, so I would have to copy production to test, then compress, and then measure the space savings... or query performance.
I know this question may not be logical, but I thought you might tell me a way, as you always do.
Thanks
Ajeet
November 06, 2003 - 7:33 am UTC
you have to compress the table to measure the effect.
drawbacks
A reader, November 06, 2003 - 8:17 am UTC
Re: A Reader
Dushan, November 07, 2003 - 3:58 am UTC
Table Compression -- in regards to Jan/Feb 2003 article
Marianne Carr, March 22, 2004 - 6:40 pm UTC
excellent tests on compression feature. I've had a really hard time finding out much information in terms of real life experiences.
Table Compression
Vivek Sharma, September 10, 2004 - 6:16 am UTC
Dear Tom,
I was just going thru the white paper on "Table Compression in Oracle 9i Release 2 : A Performance Analysis" written by Meikel Poess.
On page 10 of the document, the author discusses the best sorting order and how to decide the sort order to get the maximum benefit from compression. The author says that "Generally, a long column yields a larger increase in compression compared to a short column." This is self-explanatory and true. But I could not understand the next statement: "As for the cardinality, it turns out that sorting on a very low cardinality column such as GENDER or MARITAL STATUS is less effective than sorting on medium cardinality columns."
Can you please explain this with a small example (as you normally do)? The way you explain with an example clarifies all our doubts, which deserves appreciation.
Thanks and Regards
Vivek
September 10, 2004 - 9:18 am UTC
how about you actually point us to the paper in question so we can read it in context. (and did you consider asking the author themselves to comment on their writing?)
Response to Vivek
Peter, September 10, 2004 - 9:52 am UTC
I may be talking complete rubbish here (waiting to be corrected) but...
As I understand it, table compression looks at recurrent data patterns in a block - this is unlike index compression that goes for recurrent VALUES in a column.
In a data warehouse (I don't do OLTP) every row will (should!) be different in a table - there will always be at least one dimensional value that is different. If you order a compression by just sex (only 2, perhaps 3, values) it would be likely that most of the surrounding rows would not have much in common, so they would not compress well. If you used a medium cardinality column (say item sold) there would be a lot more scope for similar data patterns being close together.
In simple tests on our DW compression of a partition gave us a 52% reduction of size, but by ordering it we had a massive 75% reduction in size! and with no noticeable change in query performance.
September 10, 2004 - 10:17 am UTC
yes, but -- why look at cardinality in this case?
If goal = maximum compression then I'd be looking for the longest columns that have the fewest values. Order by that and you'll achieve maximum compression.
Single byte fields such as gender, a single digit or something won't compress very much (a byte is a byte after all). You need longer, multi-byte pieces of data to compress well (you factor out repeating stuff and the longer it is, the better the space savings)
Ah-ha
Peter, September 10, 2004 - 10:29 am UTC
Thanks...
I should have stopped after my first paragraph! - I generalized that which should not be generalized!
In our DWH we have some dimensions that should (in a purist world) be attributes! By ordering by one of those dimensions (in our specific case) we bring in masses of repeating data patterns!
Think I'll stop commenting for a while!
Cardinality
A reader, September 10, 2004 - 10:37 am UTC
"yes, but -- why look at cardinality in this case?"
...
"If goal = maximum compression then I'd be looking for the longest columns that have the fewest values. Order by that and you'll achieve maximum compression"
Isn't cardinality the same thing as "number of distinct values in a column"? So I don't understand your comment above about "why look at cardinality".
September 10, 2004 - 10:42 am UTC
if gender was a char(100) field -- and had two values, the rule of thumb (ROT) proposed above would be 100% false. It would compress like crazy.
I have a feeling that this is a case of false causality -- the observation:
"ordering by low cardinality columns isn't good for maximum compression"
the supposed conclusion:
"it is low cardinality columns"
my conclusion:
"really low cardinality columns are very short columns as well -- one byte, it doesn't compress because of the length of the data -- not because of the cardinality!"
You are looking for LONG things that have the FEWEST values -- you need TWO factors. It is not cardinality. It is not length. It is a balance between the two.
awesome explanation!
Pravesh Karthik from Chennai, September 10, 2004 - 1:47 pm UTC
Compression and Index
Kishor Bhalwankar, September 15, 2004 - 8:24 am UTC
Suppose I have a table with 40 partitions.
The size of the table is around 800 GB.
Now I am compressing the first 10 partitions.
I have 1 local and 3 global indexes on this table.
Will all indexes become invalid?
If yes, is there any workaround?
I don't want to rebuild or recreate the indexes
(something like the "update global indexes" clause used when truncating a partition).
The table is huge. Is there any overhead of compression on the CPU side? The data in the first 10 partitions is historical and is frequently accessed (SELECT only, no DML).
September 15, 2004 - 9:59 am UTC
the global indexes will be invalid in their entirety (the rowids change).
the local index PARTITIONS will be (not the entire index).
In 9i, you can do the alter move with "maintain global indexes", in 10g you can maintain global and local.
in order to compress, you HAVE to rebuild.
compression is hugely expensive during the compress phase, nominal during the uncompress stage.
Compression ratio to expect?
A reader, October 24, 2004 - 12:11 pm UTC
I have a table like this
create table t
(
acct char(6) not null,
sec char(6) not null,
the_type char(1) not null check (the_type in ('a','b','c','d','e','f')),
the_date date not null,
num1 number not null,
num2 number not null,
num3 number not null
);
1. The table has about 100 million rows
2. The acct and sec columns always have 6-byte data, so varchar/char doesn't really matter (for storage at least)
3. The the_date contains dates from 1996 to current with a fairly average distribution
4. acct has 30K distinct values
5. sec has 100K distinct values
The table is about 3GB in size
I did
create table t_comp
compress
as
select * from t
order by the_date,the_type,acct,sec;
It took nearly 4 hours to do this and t_comp turned out to be 1.5GB in size!
Shouldn't I get a better compression ratio since there is so much repetitive data? What is my optimal order by to get maximum compression?
Thanks
October 24, 2004 - 2:02 pm UTC
well, the_type isn't going to compress much at all (it is already pretty tiny).
and the_type could have caused acct,sec to repeat "not as much as it could have" (what if all of the the_types='a' varied widely for acct,sec)
probably best to have ordered by the BIGGEST columns that repeated often. the_date,acct,sec perhaps.
and remember -- small data is going to compress "less good" than big data. char(6)'s are small.
what you might do is play with a subset of the data (a 1% random sample) and see what you see, should only take a couple of seconds for each CTAS, for example:
ops$tkyte@ORA9IR2> create table t
2 (
3 acct char(6) not null,
4 sec char(6) not null,
5 the_type char(1) not null check (the_type in ('a','b','c','d','e','f')),
6 the_date date not null,
7 num1 number not null,
8 num2 number not null,
9 num3 number not null
10 );
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert /*+ APPEND */ into t
2 select to_char( mod(rownum,30000), 'fm000000' ),
3 to_char( mod(rownum,100000),'fm000000' ),
4 chr( ascii('a')+mod(rownum,6) ),
5 to_date( '01-jan-1996', 'dd-mon-yyyy' ) + mod(rownum, 3220),
6 rownum, rownum, rownum
7 from big_table.big_table ;
1000000 rows created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> commit;
Commit complete.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select column_name, data_length, num_distinct
2 from user_tab_columns
3 where table_name = 'T';
COLUMN_NAME DATA_LENGTH NUM_DISTINCT
------------------------------ ----------- ------------
ACCT 6 30000
SEC 6 100000
THE_TYPE 1 6
THE_DATE 7 3220
NUM1 22 1000000
NUM2 22 1000000
NUM3 22 1000000
7 rows selected.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t1;
Table dropped.
ops$tkyte@ORA9IR2> create table t1 compress
2 as
3 select * from t order by the_date, acct, sec, the_type;
Table created.
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T1', method_opt=>'for all columns size repeat' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t2;
Table dropped.
ops$tkyte@ORA9IR2> create table t2 compress
2 as
3 select * from t order by the_date, the_type, acct, sec;
Table created.
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T2', method_opt=>'for all columns size repeat' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t3;
Table dropped.
ops$tkyte@ORA9IR2> create table t3 compress
2 as
3 select * from t;
Table created.
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T3', method_opt=>'for all columns size repeat' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> select table_name, blocks from user_tables where table_name in ( 'T', 'T1', 'T2', 'T3' )
2 union all
3 select name, lf_blks+br_blks from index_stats;
TABLE_NAME BLOCKS
------------------------------ ----------
T 6094
T1 4156
T2 4146
T3 5049
But again, with such small columns, you might not see "overly dramatic results". To me, a 50% compression ratio was pretty darn good.
Table compression in 10g - adding columns
AliGee, November 09, 2004 - 7:32 am UTC
In 9iR2, there was a restriction adding columns to a compressed table. Has this changed in 10g (10.1.0.3)?
I'm worried that if I have a table 3Tb in size uncompressed, and I use compressed partitions to bring it down to a more reasonable size (e.g. 1Tb), if I need to add a column to it I will need to find 3Tb+ of temporary space to add the column, and then recompress the partitions.
If this hasn't changed, any ideas to work around the issue?
November 09, 2004 - 9:05 am UTC
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t compress
2 as
3 select * from all_objects;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter table t add x number;
alter table t add x number
*
ERROR at line 1:
ORA-22856: cannot add columns to object tables
that was then... this is now
ops$tkyte@ORA10G> drop table t;
Table dropped.
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> create table t compress
2 as
3 select * from all_objects;
Table created.
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> alter table t add x number;
Table altered.
Excellent.
AliGee, November 09, 2004 - 10:08 am UTC
Thanks Tom. Great news.
Table Compression does not work Over Database Link
Reader, November 24, 2004 - 1:12 am UTC
Hi Tom,
I created a compressed table in Database A. I then created a database link in Database B to Database A. When I insert data from Database B into the table in Database A using this database link, why doesn't compression happen?
Is this feature not supported?
Thanks
November 24, 2004 - 7:16 am UTC
table compression, as documented, only compresses during large bulk (direct path) operations such as:
o create table as select
o insert /*+ append */
o alter table move
a normal insert, update or delete is never done "compressed", slow by slow processing does not lend itself to compression.
Table Compression does not work Over Database Link
Reader, November 24, 2004 - 7:50 am UTC
Hi Tom,
I was using the statement :-
insert /*+ append */ into tableA@dblink select * from tableB;
where tableB contains somewhere around 900000 records.
The resulting table size was around 40MB. But when I executed the same statement as
insert /*+ append */ into tableA select * from tableB; (i.e. on the local database)
The table size comes to 19MB.
I observed the same when going Solaris to Solaris, Solaris to Windows, and Windows to Linux. The Oracle versions were:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for Solaris: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE 9.2.0.3.0 Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production
Thanks
November 24, 2004 - 8:03 am UTC
direct path over dblink = not direct path, conventional path.
the insert /*+ APPEND */ into t@db
was not any different than insert into t@db
Table Compression does not work Over Database Link
Reader, November 24, 2004 - 9:04 am UTC
Hi Tom,
Thanks for the reply. Any workarounds apart from loading it into a temp table in the remote DB and then from this temp table loading it to the compressed table?
Thanks once again
November 24, 2004 - 9:05 am UTC
pull it instead of pushing it, do it from the site that wants the data, have it PULL it.
(how would putting it into a "temp" table help?)
Table Compression does not work Over Database Link
Reader, November 25, 2004 - 12:13 am UTC
Hi Tom,
Thanks... I was thinking about PUSHING it to a normal table, and then at the end of the day INSERTing it into the compressed table by running a script from the local DB, and then truncating the normal table.
November 25, 2004 - 9:31 am UTC
well, if you are going to run a script on the local db, why not just PULL the data in that script instead of move from stage table to real table?
I mean, if you are going local, just do it there. Why:
a) push data from T1 -> T2@remote
b) load T2_real from T2
when you can just
a) pull data from T1@remote -> T2_real
Table Compression does not work Over Database Link
Reader, November 26, 2004 - 12:46 am UTC
Hi Tom,
The Scenario is like this :-
1. Every 3 hours a file containing 1.5 million odd records is ftp'ed from Server 1 and uploaded into a table in Database A (Server 2).
2. Computations/calculations are done based on these records; after that, the records have to be moved to Server 3 (Database B), where they are kept for reporting purposes, and truncated from Database A (disk space constraints).
3. Now I have 3 options :
option a: ftp the same file every 3 hours directly from server 1 to server 3 and upload it, but I want to make sure that this file is uploaded to Database B only after calculations are done in Database A.
option b: After finishing the calculations in Database A, run a script to login to Database B from Server 2 , and then PULL the data from Database A.
option c: the process which i am doing now (which needs to change).
thanks
November 26, 2004 - 9:24 am UTC
option b seems to be the easiest? one additional "connect u/p@remote" in your script and a single change from
insert into t@remote select * from local;
to
insert into t select * from local@remote;
Table Compression does not work Over Database Link
Reader, November 27, 2004 - 12:08 am UTC
Yeah, I thought the same and implemented it...
Thanks.
insert /*+ append */
A reader, January 07, 2005 - 1:59 pm UTC
After trial and error, suppose I do
create table t as
select ...
order by c1,c2,c3;
This gives me the best compression.
Now, if I do
insert /*+ append */ into t
select ...
order by c1,c2,c3;
1. the new rows are inserted compressed, right?
2. what if I take out the order by in the insert above? Are they still stored compressed?
Thanks
January 08, 2005 - 4:00 pm UTC
1) only if you actually created the table compressed, but assuming that, yes.
2) yes, they are still stored compressed ASSUMING the table is compressed, they are just physically stored in a different sort order.
You may use order by with compressed tables in order to achieve MAX compression if that is your goal. Order by some columns that
a) are "wide"
b) have few distinct values or have some large values that repeat a lot.
A reader, January 12, 2005 - 11:28 am UTC
Let's say my goal is max compression. I determine the optimal ORDER BY to achieve this. I create the table as
create table t COMPRESS as
select ...
order by <optimal order of columns>;
Now are you saying that regardless of whether subsequent
INSERT /*+ APPEND */ statements use that same order by or not, I will still achieve the same compression? Or am I missing something?
Thanks
January 12, 2005 - 1:22 pm UTC
you will achieve optimal compression on the first load (the CTAS) for the loaded data.
You will achieve optimal compression of each "batch" as well.
but the entire set will not be as compressed as it could be after a couple of batched loads -- since the ENTIRE compressed set of data is not ordered -- just each batch.
So, probably as close as you'll get without rebuilding the entire thing.
A reader, January 13, 2005 - 9:52 am UTC
"but the entire set will not be as compressed as it could be after a couple of batched loads -- since the ENTIRE compressed set of data is not ordered -- just each batch"
I don't understand. Suppose my CTAS used "order by c1,c2,c3". Each subsequent batch also uses insert /*+ append */ ... order by c1,c2,c3. So the data would (should?) end up in the same blocks, since Oracle knows the table is compressed. That's why I was expecting the dictionary to track the column order, similar to *_tab_ind_columns.
So, if I get optimal compression using CTAS, and even if I do all subsequent loads as direct-path loads, the table would still "slowly fall apart", i.e. slowly get less and less compressed? That's not so good?
You are basically saying that the compressed table feature is only good for "compress once and don't touch it"? That seems to limit the usability of this great feature?
Thanks
January 13, 2005 - 10:53 am UTC
say you have
c1 c2 c3
---- ---- -----
1 1 1
1 1 1
1 1 1
1 1 1
2 2 2
2 2 2
....
2 2 2
2 2 2
2 2 2
to begin with -- after the CTAS. The 1's and 2's (pretend they are big strings instead of 1's and 2's) are as compressed as they can be. Say on a single block.
Now you insert /*+ APPEND */ -- that *ADDS* to the 'end' of the table. It'll insert say:
c1 c2 c3
---- ---- -----
1 1 1
1 1 1
1 1 1
1 1 1
...
1 1 1
3 3 3
3 3 3
3 3 3
....
3 3 3
this time. This data is ADDED to the end of the table.
Each LOAD is compressed "to the max", however -- for this table, it would compress a little more if all of the 1's were together -- but they cannot be together since one was loaded yesterday and one was loaded today and insert /*+ append */ by its very definition never ever uses an existing block to store data (it always grabs space from above the HWM and writes directly to blocks).
I do not see this as a limit of anything, just a natural and somewhat obvious side effect of incrementally loading.
If your goal is "absolutely use the fewest bits on disk when you are done", you would have to re-order the entire set each time (which probably makes that trade-off cost prohibitive); you would live (happily) with very good compression rather than rebuilding the entire thing each time.
Thanks
A reader, January 13, 2005 - 12:19 pm UTC
Ah, that explains it nicely. I guess I was getting mixed up between how tables are compressed and how indexes are compressed
So, in your example above, if there were an index on (c1,c2,c3), that *would* be compressed after all the batch loads just as if it were created after all the batch loads, right? Because even the newly added rows would go to the same blocks as the earlier rows and thus be subject to the repeated key compression?
Thanks
January 13, 2005 - 1:00 pm UTC
index would be -- because rows have a "place" in an index.
table would not be -- because rows in a heap table are just put whereever they fall.
Costing compressed tables
A reader, January 18, 2005 - 3:29 pm UTC
How does the CBO cost accessing a compressed table as compared to costing a regular table?
Given an uncompressed heap table segment of x MB and a compressed heap table segment of x MB, everything else remaining the same, does the CBO cost the compressed table access (say FTS) higher because it has to perform additional (de)compression steps?
Thanks
January 18, 2005 - 3:46 pm UTC
the decompression is nominal (compressing = very cpu intensive, 'decompressing' is not).
It costs them the same, you can see this by simply "alter table t compress" -- now none of the data is compressed, but the database doesn't know that. the full scan cost (cpu and otherwise in 10g) is the same.
and if you alter table t move after that, the cost should decrease for a full scan, since there is less to scan.
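To make that distinction concrete, a small sketch (the table name is arbitrary):

-- only changes the attribute; existing blocks are untouched, so the
-- optimizer still sees the same number of blocks below the HWM
alter table t compress;

-- actually rebuilds (and compresses) the segment; the block count --
-- and hence the full scan cost -- goes down
alter table t move;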
A reader, March 16, 2005 - 9:32 am UTC
Partition compression
Arun Gupta, April 22, 2005 - 9:39 am UTC
Tom,
I am using the method described in your book Effective Oracle by Design, Chapter 7, Compress Auditing or Transaction History.
I have created a partitioned table T_LOGS based on month of year. The data is loaded daily from log files using external tables. Each partition has it's own tablespace and datafile. We do not want to delete any partition, just archive it.
Each month, typically 4GB of uncompressed data is loaded into that month's partition. After compression, the partition takes only like 1.2GB of space. The problem is that the table temp is also created in the same tablespace as that of partition. I need to create a tablespace of nearly 5.5GB so it can hold uncompressed data+compressed data. Once the partition is exchanged with temp, the tablespace is only 1.2GB filled and 4.3GB space is empty.
Is there a way this empty space can be avoided? The entire code is in package and scheduled through dbms_job so everything is in the database.
Thanks
April 22, 2005 - 10:47 am UTC
use two tablespaces? one where you load the data in uncompressed, and the other where you store the compressed version of it...
in fact, that would let you make it read only... and have to back it up once and nevermore.
Partition compression
Arun Gupta, April 22, 2005 - 3:36 pm UTC
Forgot to mention that I tried two tablespaces also, one for the temp table and the other for the partition. On exchange partition, the entire compressed data stays in the tablespace of the temp table and the partition tablespace remains empty.
Here are the statements I use:
create table temp tablespace users_data compress as select * from t_logs partition(logs_apr_05) order by logs_user;
alter table t_logs exchange partition logs_apr_05 with table temp without validation;
drop table temp;
Thanks...
April 22, 2005 - 3:49 pm UTC
when you do an exchange "no data moves"
data stays where it should be, where it was.
My point is -- keep the UNCOMPRESSED data in tablespace "1" and the compressed data in tablespace "2"
After the exchange, drop tablespace "1". minimal storage needed and in use at that point.
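A sketch of that flow, reusing the names from the example above and assuming the uncompressed April data was loaded into its own tablespace (here called ts_apr_raw) while the compressed copy is built in a second tablespace that will be kept (ts_apr_comp):

-- build the compressed, ordered copy in the tablespace to be kept
create table temp tablespace ts_apr_comp compress
as select * from t_logs partition (logs_apr_05) order by logs_user;

-- the exchange moves no data: the compressed segment becomes the
-- partition, the old uncompressed segment becomes table TEMP
alter table t_logs exchange partition logs_apr_05
      with table temp without validation;

-- the uncompressed data now lives only in TEMP / ts_apr_raw -- drop both
drop table temp;
drop tablespace ts_apr_raw including contents and datafiles;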
Updates on compressed data
Kristian, May 31, 2005 - 3:14 am UTC
Tom, I would like to continue on a question that you answered a while ago in the upper parts of this thread.
<quote>If you start with a large compressed table and update each row -- each row will be decompressed and stored decompressed</quote>
I'm very interested to know if this is true for ALL update operations, or is it only true when you update the parts of the row that are stored in the symbol table? In my case I have a table that is a good candidate for compression. It's partitioned by month, and I would compress each partition once it is no longer the current month. But I will still have many, but "small", updates against those old partitions that have been compressed. Only one column will be updated, a char(1) from 'N' to 'Y', and this column is probably not a member of the columns listed in the symbol lookup table in the block header. (I'm not sure, but I would guess that one-byte columns are not such good candidates for compression.) Now, what will happen when I update that column on a row that is compressed? Will the whole row be stored decompressed after the update? My tests show that the rowid is still the same, and to me that could indicate that the row is still stored compressed, since I have pctfree 0 on the blocks and there shouldn't be any space left for storing decompressed data in the same block.
By the way, from this document,
http://www.oracle.com/technology/products/bi/pdf/o9ir2_compression_twp.pdf, do you know what the author is trying to point at with this quote?
<quote>UPDATE operations are 10-20% slower for compressed tables on average, mainly due to some complex optimizations that have been implemented for uncompressed tables, and not yet implemented for compressed tables. These may be implemented in a future release of Oracle.</quote>
Thanks for your wonderful work in the Oracle community.
/Kristian
May 31, 2005 - 7:56 am UTC
Rowids are assigned to a row when it is inserted and do not change UNLESS:
a) row movement was enabled on the table AND you did one of the following:
1) updated the partition key of a partitioned table and caused the row
to move from partition A to B
2) flashed back the table in 10g
3) shrink the table in 10g
b) you have an IOT, but there the rowid is really a UROWID and they are not
compressed in the way a table is so it doesn't count.
But, the row is migrated upon update -- indicating that the update decompressed the row -- moved it off of the original block and is now just pointing to it. We can observe this:
ops$tkyte@ORA10G> create table t
2 COMPRESS
3 as
4 select decode( mod(object_id,2), 0, 'Y', 'N' ) flag, a.*
5 from all_objects a
6 /
Table created.
ops$tkyte@ORA10G> alter table t add constraint t_pk primary key(object_id);
Table altered.
ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA10G> @mystat continued
ops$tkyte@ORA10G> set echo off
NAME VALUE
------------------------------ ----------
table fetch continued row 19
ops$tkyte@ORA10G> select object_name, flag from t where object_id = 42;
OBJECT_NAME F
------------------------------ -
I_FILE2 Y
ops$tkyte@ORA10G> @mystat2
ops$tkyte@ORA10G> set echo off
NAME V DIFF
------------------------------ ---------- ----------
table fetch continued row 19 0
In the processing of that query, we did not perform any table fetch continued row operations. So the rowid in the index (yes, it used the index, you can show that as well) pointed to the block the row exists on. Now we update it:
ops$tkyte@ORA10G> update t set flag = 'N' where object_id = 42;
1 row updated.
ops$tkyte@ORA10G> commit;
Commit complete.
ops$tkyte@ORA10G> @mystat continued
ops$tkyte@ORA10G> set echo off
NAME VALUE
------------------------------ ----------
table fetch continued row 19
ops$tkyte@ORA10G> select object_name, flag from t where object_id = 42;
OBJECT_NAME F
------------------------------ -
I_FILE2 N
ops$tkyte@ORA10G> @mystat2
ops$tkyte@ORA10G> set echo off
NAME V DIFF
------------------------------ ---------- ----------
table fetch continued row 20 1
ops$tkyte@ORA10G>
and that time we did a table fetch continued row. What that shows is that we read the index, got the rowid, went to the block where the row was originally, and when we got there, just found a pointer to where the row actually is now.
That the row didn't fit on the block anymore indicates it is now stored elsewhere, decompressed
I'd have to assume that the last comment was about the fact that every update will result in row migration (as demonstrated) and the blocks won't be reorganized and such as they would be for noncompressed tables. Normally, an update will try to coalesce the block before actually migrating the row and migrating the row involves finding free space and so on -- extra code path.
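For reference, the @mystat / @mystat2 helper scripts are not shown in this thread; a rough approximation of what they do (not necessarily the exact scripts used) is:

-- mystat.sql &1 : remember the current value of a session statistic
column value new_value old_value
column name  new_value stat_name
select b.name, a.value
  from v$mystat a, v$statname b
 where a.statistic# = b.statistic#
   and lower(b.name) like '%' || lower('&1') || '%';

-- mystat2.sql : show the statistic again plus the delta since mystat ran
select '&stat_name' name, a.value v, a.value - &old_value diff
  from v$mystat a, v$statname b
 where a.statistic# = b.statistic#
   and b.name = '&stat_name';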
Partition compression
NR, June 28, 2005 - 2:16 pm UTC
I have 90 days (rolling window) of daily partitions (around 28 GB daily);
every day we add new partitions (new tablespace, new data and index files).
After adding compression (at the partition level) to an existing
uncompressed partitioned table, I regained 50% of the storage
allocated for the first partition. Also, when I compress subsequent
partitions, I do see number of blocks reduced significantly.
How can I get the storage initially allocated to the uncompressed
partition at the time of partition creation back into the free pool so that
it can be used for incoming data?
I see only one way I can do that
1. Create a compressed table from uncompressed partition as I did for the first one.
2. Disable bitmap indexes and then exchange partition with the compressed table.
3. Rebuild bitmap indexes and reanalyze the partition.
Is there any other way it can be done, beside allocating small extent size ( e.g. 2 MB)?
June 28, 2005 - 2:40 pm UTC
if you use autoallocate tablespaces (LMTs) we'll perform extent trimming so when you go to compress a partition via the alter table t move partition compress -- it'll trim the last extent to a nice size.
Table copy of compressed table to compressed table slow ?
Jay, June 28, 2005 - 6:55 pm UTC
Here is the observation (on 9.2.0.4 EE):
create table big_table_compr
compress
as
select * from all_objects
union all
select * from all_objects
union all
select * from all_objects
union all
select * from all_objects
union all
select * from all_objects
;
SQL> create table big_table_compr2
2 compress
3 as
4 select * from big_table_compr;
Table created.
Elapsed: 00:00:04.03
SQL> create table big_table_uncompr
2 as
3 select * from big_table_compr;
Table created.
Elapsed: 00:00:00.09
Why does the creation of the compressed table big_table_compr2 take a lot longer than that of big_table_uncompr (both are created from the same compressed table)?
I thought that creating a compressed table from another compressed table with no conditions should be fast, a block to block copy, but I guess Oracle is doing: Read - Decompress - Compress - Write.
If Oracle did indeed do a block to block copy without decompression and compression for creating big_table_compr2, it should be faster than creating big_table_uncompr (which is Read - Decompress - Write).
June 28, 2005 - 8:36 pm UTC
compressing data takes longer than not compressing data. you'll never do a "block to block copy" with a create table as select -- it simply runs a query and flows the data into a create table. It is decompressed and re-compressed.
Good for all the information
A user, August 05, 2005 - 11:23 am UTC
Tom:
I need your help here. We are trying to implement partitioning and compression at the same time on one of our large tables.
We need to move the data from the old table to the new table; we found the syntax for that:
CREATE TABLE TEST1.FACETS_CLAIMS_COM
(
CLAIM_ID VARCHAR2(12 BYTE) NOT NULL,
CLAIM_SEQUENCE_NO NUMBER(9) NOT NULL,
AMOUNT_MEDICARE_APPROVED ......
.......
COMPRESS
PARTITION BY RANGE(PAID_DATE)
(
PARTITION claims_1stQtr_00 VALUES LESS THAN(TO_DATE('04/01/2000','MM/DD/YYYY')),
PARTITION claims_2ndQtr_00 VALUES LESS THAN(TO_DATE('07/01/2000','MM/DD/YYYY')),
however I do want to use ORDER BY to get the best compression ratio. Do you know where we can add that ORDER BY in the creation script?
Or what do you think is the best way to create the new partitioned, compressed table (with an order by) from the old BIG table with the cheapest move cost?
Thanks a lot
August 05, 2005 - 1:57 pm UTC
offline (reads supported, but no writes): create the new table, insert /*+ append */ into new_table select * from old_table order by whatever you want, index it and so on, then drop old_table and rename the new table.
How to get the num_distinct and avg length of a very large table
Ajeet, August 26, 2005 - 9:03 am UTC
Tom -
In Effective Oracle by Design you have demonstrated very well that if we choose the columns with the most repeated values (that is, the minimum number of distinct values) which are also among the wider columns of the table, and then load the data into the compressed table using those columns in the order by clause of the load statement (insert /*+ append */ into t_compressed select * from t
order by c1,c2), then we save the maximum space.
that is very true as well.
The problem is: if I have a table which is very large (one of my tables has 80 GB of data), how would I know which are the good columns?
Because in this case dbms_stats will take a lot of time and resources as well.
thanks
August 26, 2005 - 9:05 am UTC
do you have someone in your place of work that knows the data? Domain knowledge is another very viable way.
And -- estimate can be used to sample a small bit of the overall table, you need not compute.
index compression
John, August 27, 2005 - 3:12 am UTC
what are the pros and cons of using the compress option for an index? Is there any performance impact from a compressed index while reading, or from maintaining the index during DML? Thanks.
August 27, 2005 - 5:26 am UTC
POSSIBLE CON: if you had a couple of hot leaf blocks before, you might have hotter ones now as they contain 2x, 3x, etc the amount of data.
POSSIBLE CON: they are more complex, hence might take more cpu
POSSIBLE PRO: they are smaller and could incur less LIO, hence might take LESS cpu
(eg: the pro/con might well offset each other)
real short list:
pros:
o smaller, less physical IO and logical IO are likely outcomes
o less disk space
o more room in buffer cache for other things
cons:
o smaller, hence possibly more hot in a transactional system
ORDER BY
Rennie, September 07, 2005 - 12:58 pm UTC
Tom,
This is on the examples given in your book 'Effective Oracle by design' page 440, where you have given your reasoning for selecting the column order for ORDER BY while creating the temp table.
Our process of compressing old partitions is very similar but without an ORDER BY clause. We have an utility which automates the process.
My question here is to identify the column to order by in a dynamic fashion. It is a generic utility.
I have pasted the query below. If I were to sort my results on any one column, would this be my best guess? In the couple of tests I did (on test data) my query came out right.
SELECT 'ORDER BY '||column_name order_by FROM
(SELECT column_name, num_distinct, MAX(avg_col_len) OVER () mx_d, pct, num_nulls, avg_col_len FROM
(SELECT column_name, num_distinct, ROUND((num_distinct/utp.num_rows) * 100,2) pct, num_nulls, avg_col_len
FROM
USER_PART_COL_STATISTICS upcs,
USER_TAB_PARTITIONS utp
WHERE upcs.PARTITION_NAME = :p_partition_name
AND upcs.table_name = :p_table_name
AND utp.partition_name = upcs.partition_name
AND utp.table_name = upcs.table_name
)
WHERE pct <= 20
AND num_nulls = 0)
WHERE MX_D = AVG_COL_LEN
AND ROWNUM < 2
/
Wanted to know if this would be at least 85% accurate? Please let me know your views on this query.
Thanks,
September 07, 2005 - 1:58 pm UTC
I'd want the column that is very wide, with lots of duplicates.
You could come up with a ranking function based on the width and the "density" of a column (number distinct vs number rows)
I don't see an order by on your query? you are getting some random row from that result set.
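One possible ranking query along those lines -- purely illustrative, using the same views as above -- weights each column by its average length times its average number of repetitions:

select upcs.column_name,
       upcs.avg_col_len * (utp.num_rows / nullif(upcs.num_distinct,0)) score
  from user_part_col_statistics upcs, user_tab_partitions utp
 where upcs.table_name     = :p_table_name
   and upcs.partition_name = :p_partition_name
   and utp.table_name      = upcs.table_name
   and utp.partition_name  = upcs.partition_name
 order by score desc;

The column with the highest score would be the first candidate to order by; it still needs to be validated by actually compressing a sample of the data.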
Rennie, September 07, 2005 - 10:03 pm UTC
Tom,
I calculate a percentage as follows: (num_distinct/num_rows) * 100, and then among all those columns whose percentage is less than 20%, I get the column which is the widest.
20% is the cutoff I assumed based on 2 or 3 random tests.
... and that's where I wanted your opinion.
Many thanks..
September 08, 2005 - 8:36 am UTC
Ahh, I see, you used analytics to "sort"
Updating a compressed table
VA, September 19, 2005 - 10:56 am UTC
I have a compressed table with a size in dba_segments of 1.5GB. The table is always appended to using the /*+ append */ hint and a optimal ORDER BY to optimally compress the new incrementally loaded data. So far so good.
I needed to update a column for some millions of rows. Nothing major, just update some_num to some_num+0.1.
When I fired off the UPDATE statement, I saw that the size of the segment in dba_segments started to grow rapidly. It went up to 2.5GB and the UPDATE statement was not even finished. So, I did a Ctrl-C on the UPDATE statement and the update started rolling back.
But I see that my segment size is still 2.5GB :-(
So, looks like the UPDATE and its subsequent abort, somehow bumped up the HWM from 1.5GB to 2.5GB. Since my incremental loads always use the APPEND hint, the 1GB space in there will always be wasted and be scanned unnecessarily during a FTS.
Any ideas on how to handle this? Specifically,
1. Why did updating that 1 column start to balloon up the segment size?
2. How can I get my 1GB of unused space back?
Thanks
September 19, 2005 - 11:57 am UTC
that decompressed the entire table.
rollbacks are "logical", not physical - they do not put the table back the way it was physically -- just logically.
1) because updates always decompress, they have to.
2) alter table t move;
it'll recompress the data as well.
alter table move
VA, September 19, 2005 - 1:24 pm UTC
But this won't let me modify the data. I would like to CTAS my new table with my modifications in the SELECT query.
Question: Do I need to specify the ORDER BY in my 'create table t compress as select ...' to get the same compression ratio as I did the first time I created the table? Would my botched UPDATE have re-arranged the data in the table so that it no longer optimally compressed?
Thanks
September 19, 2005 - 1:55 pm UTC
you asked how to get back the space.
answer: alter table move.
If you want to update this, and have it be compressed -- create table as select, drop and rename.
Given what you've done to the table - the update and rollback - the rows might have migrated (probably not). You can verify just by selecting the columns you wanted to order by from the existing table - and seeing if they still look "sorted"
or analyze the table and verify chain_cnt = 0 (rows did not move).
A full scan will read them in the order they exist.
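For example, a quick sanity check (ANALYZE is used here because dbms_stats does not populate chain_cnt):

analyze table t compute statistics;

select chain_cnt
  from user_tables
 where table_name = 'T';
-- chain_cnt = 0 means no rows were chained/migrated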
compression with dbms_redef - some questions..
Ajeet, September 24, 2005 - 7:41 am UTC
Tom,
In your "effective design by oracle" book- you have suggested to use an appropriate column to order by while loading the data in a compressed table - i have asked some question about it before in this thread.
Now I am trying to
-partition a table wich is not partitioned using dbms_redef
and then I want to compress the data using dbms_redef (in fact I am thinking to do these together -that is partition the non-partitioned table and compress the partitions ).
but now is there a way that i can ensure the order by (assuming I know a column which is appropraite for the order by while loading the data in the compressed table).
thanks
September 24, 2005 - 8:43 am UTC
dbms_redef doesn't use direct path operations, hence, the data would not be compressed.
You would use alter table T move partition P to compress the partition - OR, if you wanted to achieve maximum compression - you would
a) create table new_table compress as select * from t partition(p) order by ...
b) alter table exchange partition
LONG RAW
masum, September 27, 2005 - 6:44 am UTC
Hi Tom
I have one table with one LONG RAW column. I want to compress this table but I cannot?
Please help me.
SQL> desc oa_letter_text
Name Null? Type
------------------------------- -------- ----
LETTER_SEQ NOT NULL NUMBER(20)
LETTER_TEXT NOT NULL LONG RAW
TEXT_LEN NUMBER(8,3)
September 27, 2005 - 11:34 am UTC
you cannot - it is not compressible. Even if you make this a clob, it would only be able to compress the number columns (not the letter_text; lobs are not compressible).
There is nothing that can be compressed here
I want to compress LONG RAW
masum, October 01, 2005 - 6:38 am UTC
hi tom
thanks, but I want to compress the LONG RAW. Do you know how?
October 01, 2005 - 9:04 pm UTC
you won't do it in the database - in 10g there is a utl package to compress, but it is for lobs.
you would have to insert compressed long raw data.
compress long raw in 9i
masum, October 02, 2005 - 1:21 am UTC
I insert data into the LONG RAW with compress, but it's not enough.
Can you help me?
October 02, 2005 - 10:43 am UTC
longs and long raws and clobs and blobs are not compressed in compressed tables.
I already answered:
"you would have to insert compressed long raw data. "
that is, YOU would compress data, then insert it. Upon retrieval you would uncompress it.
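For reference, the 10g package referred to above is most likely UTL_COMPRESS; it operates on RAW/BLOB data (not on LONG RAW directly). A rough sketch of compress-before-insert and uncompress-after-fetch, assuming a hypothetical table my_docs(id number, doc raw(2000)):

declare
   l_raw  raw(2000) := utl_raw.cast_to_raw( rpad('repetitive data ', 500, 'x') );
   l_comp raw(2000);
begin
   -- compress in the application layer before storing
   l_comp := utl_compress.lz_compress( l_raw );
   insert into my_docs ( id, doc ) values ( 1, l_comp );

   -- fetch and uncompress on the way out
   select doc into l_comp from my_docs where id = 1;
   l_raw := utl_compress.lz_uncompress( l_comp );
end;
/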
Formula?
Yuan, October 27, 2005 - 2:36 pm UTC
In deciding which are good sort columns to achieve optimum compression, do the 2 factors (length & cardinality) play an equal role? If not, is there a formula you could use to determine some sort of score?
As an example, I have found 3 possible columns to order by on my table of 2884 rows:
Column A B C
Average Length 10 6 18.66
Distinct Count 869 983 1468
I'm leaning towards A or C but not sure how to decide between them.
October 27, 2005 - 2:55 pm UTC
No, you could probably come up with one (I don't have one)
problem is....
say table has 1,000,000 rows.
say C has 1468 values....
1467 of them are unique (and are short)
1 of them repeats the other almost 1,000,000 times (and is long)
very very very compressable.
now, change the assumptions a bit - you could change your mind.
David Aldridge http://oraclesponge.blogspot.com, October 27, 2005 - 3:45 pm UTC
>> say C has 1468 values....
1467 of them are unique (and are short)
1 of them repeats the other almost 1,000,000 times (and is long)
very very very compressable. <<
I suppose you might get close to an estimation of the saving due to ordering by "col1" and "col2" by using something similar to:
select count(*) dist_vals,sum(bytes) saving
from
(
select (count(*)-1)*vsize(col1||col2) bytes
from my_table
group by col1||col2
)
It accounts for both value length and repetitions, but would be crude at best. You really can't go wrong by just compressing the data and finding out.
another question
Ajeet, December 02, 2005 - 5:17 am UTC
Tom,
I am compressing a table using alter table move, and this table is being populated every 5 minutes by an application. Is it possible that the compression operation will lock the table and so the data load will fail/hang?
I tried to replicate it, and it seems like compression does allow inserts on the table during the compression.
Is my understanding correct?
thanks
Ajeet
December 02, 2005 - 10:57 am UTC
alter table move will absolutely lock the table. alter table move is an offline operation (except for IOTs perhaps)
your data load program, why doesn't it load the data compressed from the get go? seems like you are trying to fix a data loader problem (in a bad way)
it is an existing table
Ajeet, December 03, 2005 - 7:03 am UTC
Tom,
no - I'm not trying to fix a data loader issue as such.
This table has been in production for the last 3 years and now we want to apply compression to it (current table size is approx. 1200 GB). As it is in production, I want to know a way to minimize the downtime.
Seems like I need to take a different approach.
thanks
December 03, 2005 - 10:30 am UTC
You will want to implement partitioning with this, just compressing the table isn't going to be useful long term as the newly inserted rows will be "not compressed"
are you ready to partition it? then we can pretty much do this online (assuming the old partitions that are already loaded won't be modified and if they are heavily modified then compressing this segment would be a bad idea all around)
yes - this table is a partitioned table (range+hash)
Ajeet, December 04, 2005 - 2:39 am UTC
Tom,
this table is a partitioned table.
I didn't understand how we can do this online if it is partitioned. Please let me know.
thanks
Ajeet
December 04, 2005 - 6:30 am UTC
did you see the parenthetical part, you did not respond to that:
(assuming
the old partitions that are already loaded won't be modified and if they are
heavily modified then compressing this segment would be a bad idea all around)
can you address that? are there partitions that are LOADED but not MODIFIED? (if not, I question the usefulness of compressing)
yes data in old partitions are never modified
Ajeet, December 04, 2005 - 10:18 am UTC
Tom,
Yes - this table is partitioned by day and the data in old partitions is never modified - the partitions can be dropped or truncated (as this table is supposed to keep a rolling 3 months of data). That is why we chose to compress this table after complete testing in QA, but this online requirement was not mentioned before.
please advice.
thanks
Ajeet
December 04, 2005 - 11:42 am UTC
don't compress the partition being modified.
Just compress the partitions that are FINISHED with all modifications.
alter that table, move each PARTITION that is "done" and compress it. It'll be an "offline operation" but only for that partition. do NOT compress the entire table, just do the old partitions.
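A rough sketch of such a maintenance loop for a plain range-partitioned table (the table, index and partition names are hypothetical; a composite-partitioned table would move subpartitions instead, and any global indexes would still need attention):

begin
    for p in ( select partition_name
                 from user_tab_partitions
                where table_name  = 'BIG_TABLE'
                  and compression = 'DISABLED'
                  and partition_name <> 'P_CURRENT' )
    loop
        -- repack and compress the finished partition
        execute immediate
          'alter table big_table move partition ' || p.partition_name || ' compress';
        -- the matching local index partition is unusable after the move
        execute immediate
          'alter index big_table_idx rebuild partition ' || p.partition_name;
    end loop;
end;
/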
it worked
Ajeet, December 05, 2005 - 4:24 am UTC
Thanks - it worked. I tested it, and if I exclude the current partition, the online data load (which touches the current partition only) works fine, and the compression script (which compresses the older partitions) works fine too.
thanks for help.
Ajeet
Is there a work around on adding a column to a compressed table in 9iR2?
A Reader, February 23, 2006 - 5:01 pm UTC
Tom,
Our customer needs to add a new column to a compressed table. We cannot do it. Can we find a way to do that without "decompressing" it in 9i? The customer says it's an emergency.
Thank you in advance.
February 23, 2006 - 8:10 pm UTC
there is not, not in 9i. It will have to be decompressed in order to add it.
It can be done online with dbms_redefinition.
A reader, March 14, 2006 - 4:24 am UTC
You mentioned above that in 10g there is a utl package to compress LOBs. What's the name of this package and do you recommend using it to save space for LOBs (BLOBs) that are very large in size?
Unusable Indexes
Vikas Khanna, May 17, 2006 - 7:33 am UTC
Hi tom,
We are using the compress option on a huge partitioned table, about 10G. After we compress the old partitions, the indexes on that table become UNUSABLE.
The number of rows in the table is around 400 million. We have a column which has a NULL value for 350 million rows, and the index takes around 10 mins to rebuild in parallel (4 CPUs).
Since B*Tree indexes don't store NULLs, does the rebuild scan the whole table to build the index?
Is there any other way to do this faster?
Thanks
May 17, 2006 - 8:01 am UTC
It has nothing to do with NULLS, it has everything to do with the fact that you changed the rowid of every single row - and the existing index data is therefore useless during the rebuild.
You have to rebuild the index from the source data - so we can get the new rowids picked up.
Still doubts
Vikas, May 17, 2006 - 12:23 pm UTC
Hi Tom,
Understood that during compression the rowids associated with the rows get changed, and hence the index needs to be rebuilt.
Can we somehow rebuild the index on just the wanted rows, where a particular column value is not null, so the operation becomes fast by indexing only 50 million rows instead of 400 million (and growing continuously)?
Appreciate an answer!
May 18, 2006 - 9:53 am UTC
in order to find the 50 million out of the 400 million, you sort of have to look at the 400 million.
How to determine when to recompress a table?
Richard Smith, May 17, 2006 - 10:53 pm UTC
We have many compressed tables in our data warehouse. Occasionally we have to update compressed data which leaves some uncompressed data scattered in various partitions in the updated tables.
Over time this causes more tablespace usage and lost buffer cache capacity due to uncompressed data blocks.
Is there a way to determine if a data segment is stored in compressed or "normal" format?
May 18, 2006 - 10:52 am UTC
not without dumping blocks, nothing that I know of.
a segment contains blocks that are compressed and uncompressed. a segment itself would not necessarily be "compressed or not" (it can be, but not in the case you describe - in your case, the segment is both compressed and uncompressed)
statistical techniques could help
Phil Miesle, May 19, 2006 - 7:23 am UTC
While there may not be a way to run a query to determine through a dictionary query the "percent of decompressed blocks" you could probably do some statistical inference.
Take a partition that you believe to be representative.
Do a CTAS into a compressed table and analyze the table. Note down the Rows Per Block statistic.
Now uncompress all the data in the new table, analyze it, and note the Rows Per Block statistic.
Finally, look at the Rows Per Block statistic on your 'real' partition (being sure to have fresh statistics).
You now have a rough gauge as to how "decompressed" your real partition is. For example:
RpB, 100% decompressed: 25
RpB, 0% decompressed: 70
RpB, X% decompressed: 36 <- your 'real' table/partition
A little algebra, letting X be the percent decompressed (RpB = a * X + b):
70 = a * 0 + b
25 = a * 100 + b
36 = a * X + b
b = 70 (line 1)
a = (25 - 70)/100 = -0.45 (line 2)
X = (36 - 70)/(-0.45) = 75.6% decompressed (line 3)
Now, this model makes one key assumption:
--> All records decompress at the same rate (Y = a * X + b)
And you *need* to validate that the model actually holds for your data set! It is unlikely to hold when comparing two records individually... but it probably holds statistically for a large data set.
You can test for this by decompressing known, fixed percentages of your 100% compressed table, measuring the Rows Per Block, and comparing against the model. If the model holds true (allowing for some statistical error) this technique may work for you.
To confirm this, you should then test a few other partitions in a similar fashion to check that the model holds for the entire table. A data warehouse is probably partitioned by date, so compare with a similar partition from the previous year, and compare across a single year at the 'peaks' and 'troughs' of your business activity.
You might need to build a more complex model, but you should be able to use the data in your system to build a statistical model that would accurately reflect your data.
Good luck!
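As a minimal sketch of that interpolation - the table and column names below are hypothetical, and you would populate it from your own CTAS experiments and from fresh statistics (e.g. num_rows/blocks in USER_TAB_STATISTICS):
create table rpb_samples
( partition_name   varchar2(30),
  rpb_compressed   number,   -- RpB of the fully compressed copy (70 above)
  rpb_decompressed number,   -- RpB of the fully decompressed copy (25 above)
  rpb_actual       number    -- RpB of the real partition, fresh stats (36 above)
);
-- linear interpolation: estimated percentage of blocks now decompressed
select partition_name,
       round( 100 * (rpb_compressed - rpb_actual)
                  / (rpb_compressed - rpb_decompressed), 1 ) pct_decompressed
  from rpb_samples;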
How can I tell what segments are compressed?
Robert, May 30, 2006 - 1:57 pm UTC
Tom,
I see that all_part_tables and all_tables have compression columns which indicate whether compression is enabled or not. How can I tell if a table, table partition or table subpartition is actually compressed, so I don't unnecessarily compress it again?
May 30, 2006 - 2:03 pm UTC
you cannot, since they may all contain a mixture of compressed and uncompressed data.
1/2 the blocks could be "compressed", 1/2 the blocks could be uncompressed.
Suggestion - use a tablespace naming technique to do this. If you load data compressed (direct path insert, direct path load, create table as select) - load into a tablespace named "something_comp"
If you use alter/move to compress data - move the data into an "_comp" tablespace.
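A hedged illustration of that naming convention - all object names here are invented, and it assumes a tablespace called HIST_COMP already exists:
-- compressed, direct-path created data goes straight into the *_COMP tablespace
create table sales_2005 compress tablespace hist_comp
as select * from sales where sale_year = 2005;
-- data compressed via alter ... move lands there too
alter table sales move partition p_2005 compress tablespace hist_comp;
-- later, "is this segment compressed?" becomes a simple dictionary lookup
select segment_name, segment_type, tablespace_name
  from user_segments
 where tablespace_name like '%\_COMP' escape '\';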
Compressed partitions
NR, June 07, 2006 - 1:39 pm UTC
Tom,
we have 2 partitioned fact tables in our data warehouse that we created as compressed (we don't modify or update the data once it is in the warehouse).
So all the partitions of both fact tables are compressed as well. When I query the dba views, I see compression enabled. We are using SQL*Loader direct path load to load data into the partitions, as that is the suggested approach for loading into compressed partitions. But I am not seeing any compression after we finish loading and analyzing the partitions.
But when we explicitly run the following statement, then we do see compression for those partitions:
alter table <table_name> move partition <partition_name> compress;
Also we are creating a single partition per tablespace(LMT, autoallocate) with a datafile maxsize unlimited.
e.g. Tablespace T1 contains P1 and datafile D1.
Now the question: do we have to explicitly run the above statement to get compression?
Or are we missing a specific parameter in the sqlldr direct load? Here is what we are using:
silent=all
discardmax=10000
direct=true
parallel=true
skip_unusable_indexes=true
skip_index_maintenance=true
Thanks in advance for your valuable suggestions.
June 07, 2006 - 3:34 pm UTC
look at your log file from sqlldr, does it confirm that you are doing direct pathing?
Compressed Partition
NR, June 08, 2006 - 9:04 am UTC
Yes, I confirmed from the sqlldr log file that we are using a direct path load with the parallel option. But here is something from the ctl file:
APPEND INTO TABLE <table_name>
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
As you see, we are using the APPEND option here, and I have read somewhere - either in one of your books or some Oracle book - that using the APPEND option will add the data uncompressed. Is this true?
Once again thanks for your valuable suggestions.
Regards
NR
June 08, 2006 - 10:07 am UTC
Ok, parallel direct path load, each load process will get its OWN extent (and no two processes will use the same extent). Existing free space is never used.
So say you do parallel 20
And you have a uniform extent size of 100mb
And each process loads 101mb of data.
You'll end up with 40 extents after the load: 20 full ones, plus 20 that hold just 1mb of data with 99mb free.
An alter ... move afterwards will release that 20 x 99mb of free space. I think that is what is happening to you. Here is an example that direct path appends data and then alter moves it - showing no difference before and after - followed by a conventional path load, which shows a big before/after difference.
So, tell us about the tablespace you are loading into (you might consider SYSTEM allocated extent sizes - they can be trimmed, uniform ones CANNOT be trimmed).
ops$tkyte@ORA10GR2> create table big_table
2 COMPRESS
3 as
4 select *
5 from big_table.big_table
6 where rownum <= 1000000
7 /
Table created.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> REM !sqlldr_exp / big_table > big_table.ctl
ops$tkyte@ORA10GR2> !head big_table.ctl
LOAD DATA
INFILE *
INTO TABLE big_table
APPEND
FIELDS TERMINATED BY '|'
(
id
,owner
,object_name
,subobject_name
ops$tkyte@ORA10GR2> !sqlldr / big_table direct=y
SQL*Loader: Release 10.2.0.1.0 - Production on Thu Jun 8 09:51:20 2006
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Load completed - logical record count 1000000.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> exec show_space('BIG_TABLE');
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 9,798
Total Blocks............................ 10,240
Total Bytes............................. 83,886,080
Total MBytes............................ 80
Unused Blocks........................... 299
Unused Bytes............................ 2,449,408
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 22,025
Last Used Block......................... 725
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR2> alter table big_table move;
Table altered.
ops$tkyte@ORA10GR2> exec show_space('BIG_TABLE');
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 9,798
Total Blocks............................ 10,240
Total Bytes............................. 83,886,080
Total MBytes............................ 80
Unused Blocks........................... 299
Unused Bytes............................ 2,449,408
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 6,153
Last Used Block......................... 725
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR2> !sqlldr / big_table > /dev/null
ops$tkyte@ORA10GR2> exec show_space('BIG_TABLE');
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 147
Full Blocks ..................... 25,758
Total Blocks............................ 26,624
Total Bytes............................. 218,103,808
Total MBytes............................ 208
Unused Blocks........................... 512
Unused Bytes............................ 4,194,304
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 21,641
Last Used Block......................... 512
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR2> alter table big_table move;
Table altered.
ops$tkyte@ORA10GR2> exec show_space('BIG_TABLE');
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 14,723
Total Blocks............................ 14,976
Total Bytes............................. 122,683,392
Total MBytes............................ 117
Unused Blocks........................... 91
Unused Bytes............................ 745,472
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 41,481
Last Used Block......................... 549
PL/SQL procedure successfully completed.
Workaround to add cols to a 9i compressed table?
Charlie B., June 08, 2006 - 4:08 pm UTC
Great thread!
I used to be able to add a column to a table, anywhere within the table, with older releases of Oracle (7.x). I'd export the table, drop and recreate it with the new column but the same table name, and then import it. Import was smart enough to leave the new column null as long as we hadn't defined it as NOT NULL. (This was probably with DIRECT=N.)
In a 9i environment, could we do something similar with a compressed table to add a column? Is this a workaround to the 9i restriction regarding adding a column to a compressed table?
June 08, 2006 - 7:57 pm UTC
I never would have done that (a view can be used to put a new column "anywhere"); at MOST I would have used "create table as select" -
which is likely what you'll have to do in 9i: copy the data with create table as select, which can both compress it and add the column (even a "not null" one).
You have the same storage requirements - less even, since the newly created table will be compressed whereas the export would likely not be.
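A short sketch of that 9i workaround - the table and column names are hypothetical:
-- copy the data compressed, adding the new column as part of the copy
create table t_new compress
as
select t.*, 'N' new_col      -- the new column, filled in during the copy
  from t;
alter table t_new modify new_col not null;   -- add the NOT NULL constraint if required
-- then swap the names and recreate indexes, grants, constraints and synonyms
drop table t;
rename t_new to t;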
Workaround to add cols - follow-up
Charlie B., June 13, 2006 - 8:36 am UTC
Tom -
Thanks again! I agree, I saw no need to put cols in the middle of the table, but got overridden politically. That happens to all of us sometimes.
I'll keep the CTAS option in mind - already experimenting with that, very good results so far - and if I get a chance I may see if I can export/import to add a column at the end of a table. In my current environment that's usually all we need to do.
I'm all about options: if I can either CTAS or export/import to add a column, then I can choose whichever is more suitable to the current need.
use compression for not read-only tables
Jan, September 01, 2008 - 8:57 am UTC
If there are no real cons to compression, except that only bulk inserts are compressed, would it be useful to compress tables with daily inserts/updates?
Our servers are practically unused outside business hours, so we have the time to recompress them overnight every few days.
September 01, 2008 - 1:16 pm UTC
only you can answer that.
Would you benefit from compression day to day? Would it do anything for you? And the rather large cost of compressing the entire table every day would have to be considered.
(and in 11g, conventional path operations are compressible as well - you would not need to recompress the entire table every day; the inserts would be compressed as they were executed)
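For reference, a hedged sketch of the 11g option alluded to above (the table definition is invented; the clause is COMPRESS FOR ALL OPERATIONS in 11gR1 and was renamed COMPRESS FOR OLTP in 11gR2):
create table orders_hist
( order_id   number,
  status     varchar2(10),
  created_on date
)
compress for all operations;   -- conventional inserts/updates are compressed too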
Compression difference
goiyala3, April 19, 2010 - 1:03 am UTC
Tom
I am using Oracle 10.1.0.2 and I see a huge difference between CTAS and bulk insert:
SQL> create table tt as select * from dba_objects;
Table created.
SQL> create table ttc compress as select * from dba_objects;
Table created.
SQL> create table tti compress as select * from dba_objects where 1=2;
Table created.
SQL> insert /*+ append */ into tti select * from dba_objects;
49570 rows created.
SQL> commit;
Commit complete.
SQL> select segment_name,sum(bytes) from dba_extents where segment_name in ('TT','TTC','TTI') group by segment_name;
SEGMENT_NAME SUM(BYTES)
------------------------------ ----------
TT 6291456
TTC 2097152
TTI 5242880
SQL> alter table TTI move;
Table altered.
SQL>
SQL> select segment_name,sum(bytes) from dba_extents where segment_name in ('TT','TTC','TTI') group by segment_name;
SEGMENT_NAME SUM(BYTES)
------------------------------ ----------
TT 6291456
TTC 2097152
TTI 2097152
SQL>
Why does APPEND give only a 15% gain while CTAS gives a 70% gain?
Please give your valuable comments.
April 19, 2010 - 8:52 am UTC
ops$tkyte%ORA10GR1> create table tt as select * from dba_objects;
Table created.
ops$tkyte%ORA10GR1> create table ttc compress as select * from dba_objects;
Table created.
ops$tkyte%ORA10GR1> create table tti compress as select * from dba_objects where 1=2;
Table created.
ops$tkyte%ORA10GR1> insert /*+ append */ into tti select * from dba_objects;
48686 rows created.
ops$tkyte%ORA10GR1> select count(*) from tti;
select count(*) from tti
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
ops$tkyte%ORA10GR1> commit;
Commit complete.
ops$tkyte%ORA10GR1> select segment_name,sum(bytes) from dba_extents where segment_name in ('TT','TTC','TTI') group by segment_name;
SEGMENT_NAME SUM(BYTES)
------------------------------ ----------
TT 6291456
TTC 2097152
TTI 2097152
ops$tkyte%ORA10GR1> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - Prod
PL/SQL Release 10.1.0.4.0 - Production
CORE 10.1.0.4.0 Production
TNS for Linux: Version 10.1.0.4.0 - Production
NLSRTL Version 10.1.0.4.0 - Production
something must be preventing your append from working - do a select count(*) from tti AFTER you insert and BEFORE you commit and make sure you see:
select count(*) from tti
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
if you do not, you are not getting a direct path load.
I did this in 10.1.0.4 - but I would expect no difference in 10.1
Make sure table has no triggers, no referential integrity.
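Two quick dictionary checks for that (a hedged sketch - TTI is the table from the example above); enabled triggers or foreign key constraints silently turn INSERT /*+ APPEND */ into a conventional insert:
select trigger_name from user_triggers where table_name = 'TTI';
select constraint_name, constraint_type
  from user_constraints
 where table_name = 'TTI'
   and constraint_type = 'R';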
Also, measure NOT the size of the segment - which includes UNUSED allocated space as well as USED allocated space, do this instead:
ops$tkyte%ORA10GR1> create or replace procedure show_space
2 ( p_segname in varchar2,
3 p_owner in varchar2 default user,
4 p_type in varchar2 default 'TABLE',
5 p_partition in varchar2 default NULL )
6 -- this procedure uses authid current user so it can query DBA_*
7 -- views using privileges from a ROLE and so it can be installed
8 -- once per database, instead of once per user that wanted to use it
9 authid current_user
10 as
11 l_free_blks number;
12 l_total_blocks number;
13 l_total_bytes number;
14 l_unused_blocks number;
15 l_unused_bytes number;
16 l_LastUsedExtFileId number;
17 l_LastUsedExtBlockId number;
18 l_LAST_USED_BLOCK number;
19 l_segment_space_mgmt varchar2(255);
20 l_unformatted_blocks number;
21 l_unformatted_bytes number;
22 l_fs1_blocks number; l_fs1_bytes number;
23 l_fs2_blocks number; l_fs2_bytes number;
24 l_fs3_blocks number; l_fs3_bytes number;
25 l_fs4_blocks number; l_fs4_bytes number;
26 l_full_blocks number; l_full_bytes number;
27
28 -- inline procedure to print out numbers nicely formatted
29 -- with a simple label
30 procedure p( p_label in varchar2, p_num in number )
31 is
32 begin
33 dbms_output.put_line( rpad(p_label,40,'.') ||
34 to_char(p_num,'999,999,999,999') );
35 end;
36 begin
37 -- this query is executed dynamically in order to allow this procedure
38 -- to be created by a user who has access to DBA_SEGMENTS/TABLESPACES
39 -- via a role as is customary.
40 -- NOTE: at runtime, the invoker MUST have access to these two
41 -- views!
42 -- this query determines if the object is a ASSM object or not
43 begin
44 execute immediate
45 'select ts.segment_space_management
46 from dba_segments seg, dba_tablespaces ts
47 where seg.segment_name = :p_segname
48 and (:p_partition is null or
49 seg.partition_name = :p_partition)
50 and seg.owner = :p_owner
51 and seg.tablespace_name = ts.tablespace_name'
52 into l_segment_space_mgmt
53 using p_segname, p_partition, p_partition, p_owner;
54 exception
55 when too_many_rows then
56 dbms_output.put_line
57 ( 'This must be a partitioned table, use p_partition => ');
58 return;
59 end;
60
61
62 -- if the object is in an ASSM tablespace, we must use this API
63 -- call to get space information, else we use the FREE_BLOCKS
64 -- API for the user managed segments
65 if l_segment_space_mgmt = 'AUTO'
66 then
67 dbms_space.space_usage
68 ( p_owner, p_segname, p_type, l_unformatted_blocks,
69 l_unformatted_bytes, l_fs1_blocks, l_fs1_bytes,
70 l_fs2_blocks, l_fs2_bytes, l_fs3_blocks, l_fs3_bytes,
71 l_fs4_blocks, l_fs4_bytes, l_full_blocks, l_full_bytes, p_partition);
72
73 p( 'Unformatted Blocks ', l_unformatted_blocks );
74 p( 'FS1 Blocks (0-25) ', l_fs1_blocks );
75 p( 'FS2 Blocks (25-50) ', l_fs2_blocks );
76 p( 'FS3 Blocks (50-75) ', l_fs3_blocks );
77 p( 'FS4 Blocks (75-100)', l_fs4_blocks );
78 p( 'Full Blocks ', l_full_blocks );
79 else
80 dbms_space.free_blocks(
81 segment_owner => p_owner,
82 segment_name => p_segname,
83 segment_type => p_type,
84 freelist_group_id => 0,
85 free_blks => l_free_blks);
86
87 p( 'Free Blocks', l_free_blks );
88 end if;
89
90 -- and then the unused space API call to get the rest of the
91 -- information
92 dbms_space.unused_space
93 ( segment_owner => p_owner,
94 segment_name => p_segname,
95 segment_type => p_type,
96 partition_name => p_partition,
97 total_blocks => l_total_blocks,
98 total_bytes => l_total_bytes,
99 unused_blocks => l_unused_blocks,
100 unused_bytes => l_unused_bytes,
101 LAST_USED_EXTENT_FILE_ID => l_LastUsedExtFileId,
102 LAST_USED_EXTENT_BLOCK_ID => l_LastUsedExtBlockId,
103 LAST_USED_BLOCK => l_LAST_USED_BLOCK );
104
105 p( 'Total Blocks', l_total_blocks );
106 p( 'Total Bytes', l_total_bytes );
107 p( 'Total MBytes', trunc(l_total_bytes/1024/1024) );
108 p( 'Unused Blocks', l_unused_blocks );
109 p( 'Unused Bytes', l_unused_bytes );
110 p( 'Last Used Ext FileId', l_LastUsedExtFileId );
111 p( 'Last Used Ext BlockId', l_LastUsedExtBlockId );
112 p( 'Last Used Block', l_LAST_USED_BLOCK );
113 end;
114 /
Procedure created.
ops$tkyte%ORA10GR1> set define on
ops$tkyte%ORA10GR1>
ops$tkyte%ORA10GR1> exec show_space( 'TTI' );
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 210
Total Blocks............................ 256
Total Bytes............................. 2,097,152
Total MBytes............................ 2
Unused Blocks........................... 34
Unused Bytes............................ 278,528
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 8,713
Last Used Block......................... 94
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR1> exec show_space( 'TTC' );
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 210
Total Blocks............................ 256
Total Bytes............................. 2,097,152
Total MBytes............................ 2
Unused Blocks........................... 34
Unused Bytes............................ 278,528
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 8,457
Last Used Block......................... 94
PL/SQL procedure successfully completed.
compress ratio
goiyala3, April 21, 2010 - 8:03 am UTC
SQL> create table tt as select * from dba_objects;
Table created.
SQL> create table ttc compress as select * from dba_objects;
Table created.
SQL> create table tti compress as select * from dba_objects where 1=2;
Table created.
SQL> insert /*+ append */ into tti select * from dba_objects;
24332 rows created.
SQL> select count(*) from tti;
select count(*) from tti
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
SQL> commit;
Commit complete.
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi
PL/SQL Release 10.2.0.3.0 - Production
CORE 10.2.0.3.0 Production
TNS for Solaris: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production
SQL> select segment_name,sum(bytes) from dba_extents where segment_name in
2 ('TT','TTC','TTI') group by segment_name;
SEGMENT_NAME SUM(BYTES)
-------------------- ----------
TTC 1081344
TT 3538944
TTI 3571712
Is there any other parameter to be set?
April 21, 2010 - 9:05 am UTC
I don't have 10.2.0.3 to play with - but I cannot reproduce in any release I do have.
again, please - use what I asked you to use. dba_extents shows what is allocated - NOT what is used.
Compression issue in Bulk insert
A reader, April 22, 2010 - 7:55 am UTC
Tom
Following are the actual values from running the show_space script:
FOR tt
=======
Free Blocks............................. 0
Total Blocks............................ 768
Total Bytes............................. 6,291,456
Total MBytes............................ 6
Unused Blocks........................... 85
Unused Bytes............................ 696,320
Last Used Ext FileId.................... 1
Last Used Ext BlockId................... 63,113
Last Used Block......................... 43
FOR ttc
=======
Free Blocks............................. 0
Total Blocks............................ 256
Total Bytes............................. 2,097,152
Total MBytes............................ 2
Unused Blocks........................... 23
Unused Bytes............................ 188,416
Last Used Ext FileId.................... 1
Last Used Ext BlockId................... 64,265
Last Used Block......................... 105
FOR TTI
=======
Free Blocks............................. 0
Total Blocks............................ 640
Total Bytes............................. 5,242,880
Total MBytes............................ 5
Unused Blocks........................... 26
Unused Bytes............................ 212,992
Last Used Ext FileId.................... 1
Last Used Ext BlockId................... 65,033
Last Used Block......................... 102
Tom, I also checked with my friend and he confirmed it: after doing the bulk insert, you have to move the table to get the actual compression.
April 22, 2010 - 8:34 am UTC
it is entirely curious to me that these numbers are not anywhere NEAR the other numbers you reported.
SQL> select segment_name,sum(bytes) from dba_extents where segment_name in
2 ('TT','TTC','TTI') group by segment_name;
SEGMENT_NAME SUM(BYTES)
-------------------- ----------
TTC 1081344
TT 3538944
TTI 3571712
please do explain how that math works?
I would like to see your work - from start to finish - and I would like to see consistent numbers.
I think you are doing something *wrong* here, it quite simply does not work the way you are showing it.
(and I just realized that your query against dba_extents is completely wrong - where is the "owner =" predicate? Use USER_SEGMENTS or write a correct query, please)
I want you to run this script:
and anyone else out there that has 10.2.0.3 or 10.2.anything - feel free to do the same and post as well - I'd like to see what others see.
create table tt as select * from dba_objects;
create table ttc compress as select * from dba_objects;
create table tti compress as select * from dba_objects where 1=2;
insert /*+ append */ into tti select * from dba_objects;
select count(*) from tti;
commit;
select * from v$version;
select segment_name,sum(bytes) from user_extents where segment_name in
('TT','TTC','TTI') group by segment_name;
exec show_space( 'TT' )
exec show_space( 'TTC' )
exec show_space( 'TTI' )
and post the results just like this:
ops$tkyte%ORA10GR2> drop table tt purge;
Table dropped.
ops$tkyte%ORA10GR2> drop table tti purge;
Table dropped.
ops$tkyte%ORA10GR2> drop table ttc purge;
Table dropped.
ops$tkyte%ORA10GR2> create table tt tablespace users as select * from dba_objects;
Table created.
ops$tkyte%ORA10GR2> create table ttc compress tablespace users as select * from dba_objects;
Table created.
ops$tkyte%ORA10GR2> create table tti compress tablespace users as select * from dba_objects where 1=2;
Table created.
ops$tkyte%ORA10GR2> insert /*+ append */ into tti select * from dba_objects;
50477 rows created.
ops$tkyte%ORA10GR2> select count(*) from tti;
select count(*) from tti
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
ops$tkyte%ORA10GR2> commit;
Commit complete.
ops$tkyte%ORA10GR2> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
ops$tkyte%ORA10GR2> select segment_name,sum(bytes) from user_extents where segment_name in
2 ('TT','TTC','TTI') group by segment_name;
SEGMENT_NAME SUM(BYTES)
------------------------------ ----------
TTC 2097152
TTI 2097152
TT 6291456
ops$tkyte%ORA10GR2> exec show_space( 'TT' )
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 693
Total Blocks............................ 768
Total Bytes............................. 6,291,456
Total MBytes............................ 6
Unused Blocks........................... 55
Unused Bytes............................ 450,560
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 1,673
Last Used Block......................... 73
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> exec show_space( 'TTC' )
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 205
Total Blocks............................ 256
Total Bytes............................. 2,097,152
Total MBytes............................ 2
Unused Blocks........................... 39
Unused Bytes............................ 319,488
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 2,057
Last Used Block......................... 89
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> exec show_space( 'TTI' )
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 205
Total Blocks............................ 256
Total Bytes............................. 2,097,152
Total MBytes............................ 2
Unused Blocks........................... 39
Unused Bytes............................ 319,488
Last Used Ext FileId.................... 4
Last Used Ext BlockId................... 2,185
Last Used Block......................... 89
PL/SQL procedure successfully completed.
what I see in 10.2.0.4.0
Duke Ganote, April 22, 2010 - 3:25 pm UTC
I used your SHOW_SPACE proc from here:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:5350053031470#1752788300346387205 and got:
SQL> create table tt as select * from dba_objects;
Table created.
SQL> create table ttc compress as select * from dba_objects;
Table created.
SQL> create table tti compress as select * from dba_objects where 1=2;
Table created.
SQL> insert /*+ append */ into tti select * from dba_objects;
85218 rows created.
SQL> select count(*) from tti;
select count(*) from tti
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel
SQL> commit;
Commit complete.
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production
SQL> select segment_name,sum(bytes) from user_extents where segment_name in
2 ('TT','TTC','TTI') group by segment_name;
SEGMENT_NA SUM(BYTES)
---------- ------------
TTC 3,407,872
TT 10,223,616
TTI 3,407,872
SQL> exec show_space( 'TT' );
PL/SQL procedure successfully completed.
SQL> exec show_space( 'TTC' );
PL/SQL procedure successfully completed.
SQL> exec show_space( 'TTI' );
PL/SQL procedure successfully completed.
SQL> set server output on;
SP2-0158: unknown SET option "server"
SQL> set serveroutput on;
SQL> exec show_space( 'TT' );
Free Blocks............................. 0
Total Blocks............................ 624
Total Bytes............................. 10,223,616
Total MBytes............................ 9
Unused Blocks........................... 1
Unused Bytes............................ 16,384
Last Used Ext FileId.................... 762
Last Used Ext BlockId................... 134,858
Last Used Block......................... 7
PL/SQL procedure successfully completed.
SQL> exec show_space( 'TTC' );
Free Blocks............................. 0
Total Blocks............................ 208
Total Bytes............................. 3,407,872
Total MBytes............................ 3
Unused Blocks........................... 1
Unused Bytes............................ 16,384
Last Used Ext FileId.................... 592
Last Used Ext BlockId................... 55,546
Last Used Block......................... 7
PL/SQL procedure successfully completed.
SQL> exec show_space( 'TTI' );
Free Blocks............................. 0
Total Blocks............................ 208
Total Bytes............................. 3,407,872
Total MBytes............................ 3
Unused Blocks........................... 1
Unused Bytes............................ 16,384
Last Used Ext FileId.................... 592
Last Used Ext BlockId................... 55,490
Last Used Block......................... 7
PL/SQL procedure successfully completed.
April 22, 2010 - 4:36 pm UTC
you see what I see - exactly.
Table Compression
goiyala3, May 03, 2010 - 7:06 am UTC
Tom,
Surprisingly, I get the same difference in space utilisation on Solaris 5.9 with Oracle 10.2.0.4, but on Windows XP with Oracle 10.2.0.4 there is no difference.
May 06, 2010 - 11:44 am UTC
and yet, you persist in not posting the exact output as requested :(
And "i got the same differences" - given that were are two sets of differences we've been talking about - I don't see how anyone could figure out if you are saying:
works on solaris but not windows
or
works on windows but not solaris...
How about this - you just post a straight cut and paste of what I've asked for from the system that DOES NOT WORK, nothing else - nothing more, nothing less...
performance using Compression
maverick, June 10, 2010 - 2:14 pm UTC
Oracle 10g Rel 2:
I have created two tables, one compressed TEST_FACT_COMP
[CTAS order by "least distinct values" column to "most distinct values" columns]
and another one TEST_FACT uncompressed [CTAS order by 1,2,3 dimension values]
Gathered Statistics on both the tables using
dbms_stats.gather_schema_stats('[SCHEMA_NAME]');
And Issued the following query on both the tables:
select dim_1_id,dim_2_id,count(distinct cust_id)
from TEST_FACT_COMP
group by dim_1_id,dim_2_id;
select dim_1_id,dim_2_id,count(distinct cust_id)
from TEST_FACT
group by dim_1_id,dim_2_id;
in that order..
Here is the autotrace output for those:
1) COMPRESSED Table TEST_FACT_COMP
Execution Plan
----------------------------------------------------------
Plan hash value: 3736740383
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 52609 | 667K| | 19995 (13) | 00:04:00 |
| 1 | SORT GROUP BY | | 52609 | 667K| 190M| 19995 (13)| 00:04:00 |
| 2 | TABLE ACCESS FULL| TEST_FACT_COMP | 8244K| 102M| | 4047 (11)| 00:00:49 |
-----------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
69 recursive calls
10 db block gets
16325 consistent gets
11661 physical reads
0 redo size
24921 bytes sent via SQL*Net to client
986 bytes received via SQL*Net from client
109 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
1616 rows processed
2) Table TEST_FACT
Plan hash value: 3736740383
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 52881 | 671K| | 28383 (10)| 00:05:41 |
| 1 | SORT GROUP BY | | 52881 | 671K| 190M| 28383 (10)| 00:05:41 |
| 2 | TABLE ACCESS FULL| TEST_FACT | 8240K| 102M| | 12442 (5)| 00:02:30 |
----------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
50 recursive calls
4 db block gets
54285 consistent gets
60499 physical reads
0 redo size
24921 bytes sent via SQL*Net to client
987 bytes received via SQL*Net from client
109 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
1616 rows processed
I ran the following query for these two tables
select segment_name,trunc(bytes / 1024 / 1024 / 1024,2) gigabytes, blocks, extents
from user_segments
where segment_type='TABLE'
and segment_name in ('TEST_FACT','TEST_FACT_COMP');
here is the output:
SEGMENT_NAME GIGABYTES BLOCKS EXTENTS
TEST_FACT_COMP 0.12 16512 129
TEST_FACT 0.41 54656 427
Questions:
1) Even though the physical reads and CPU cost of the two tables differ a lot [the COMPRESSED table is better], the elapsed time [which was about 2:30] is the same for both queries. [Not sure why - in fact the table that is not compressed fetches results a few seconds faster.]
2) Does the TIME column in the plan show real execution time? Because it is not even close to the real time it took to fetch the results :-)
I also see that the compressed table did more recursive calls than the regular table... not sure what that means.
Can you throw some light on this?
June 11, 2010 - 7:04 am UTC
what is 2:30 sec.?
use tkprof, do not use wall clocks.
show your work, I see no timing information whatsoever
If you mean the 00:02:30 from the explain plan, that is a GUESS, not what actually happens - it is purely a guess.
Overall - which performs better and takes less storage? Use a tool like SQL_TRACE and TKPROF so you are not timing things like "how long does it take sqlplus to draw on the screen". The tkprof report will show you the true elapsed time and cpu time of your queries.
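A minimal sketch of that approach (the trace file name below is made up; dbms_monitor is the 10g way, while alter session set sql_trace works in earlier releases too):
alter session set sql_trace = true;
-- or: exec dbms_monitor.session_trace_enable( waits => true, binds => false )
select dim_1_id, dim_2_id, count(distinct cust_id)
  from test_fact_comp
 group by dim_1_id, dim_2_id;
alter session set sql_trace = false;
-- then, at the OS prompt, format the trace file found in user_dump_dest:
-- tkprof orcl_ora_12345.trc compressed_run.txt sys=no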
I'll run with TKPROF..
A reader, June 11, 2010 - 8:13 am UTC
Thanks Tom. I'll check with TKPROF and update my results. But do you think TKPROF should show better performance for the compressed table than the uncompressed one?
My bottom line is execution time - how long one has to wait for the results, that's the deal - and I'm guessing both take the same time [real clock time 00:02:36], so I'm not sure how TKPROF will differ. But I'll give it a try.
June 22, 2010 - 6:31 am UTC
tkprof will show you what time was spent in the database, using a timer in sqlplus shows you lots of other stuff - you just want time in database to compare.
If tkprof shows "database faster with option X", but the wall clock says "response time faster with option Y", then we have to start looking outside the database for the reason. And the reason is likely "if you run it again, option X will sometimes be faster"
and I'll reiterate, 2:36 is not "seconds", it is minutes and seconds apparently. And furthermore, the TIME column in an explain plan is a total guess, 100% a guess, a thought of what might happen. It is not reality, it is not real, it is an estimate, it is not meaningful to you, it does not say "this will take exactly N units of time", it says "we believe this might take about this long, we could be really wrong though"
compressing "old" partitions
Mikhail P., August 27, 2010 - 12:26 pm UTC
Hi Tom,
Suppose there is a partitioned table.
And we periodically split old data into separate partitions by making
ALTER TABLE T SPLIT PARTITION ...
and make those newly created backlog partitions COMPRESSed.
But is there a way to specify an order in which rows should be moved to a new partition when making a split?
I mean ordering them by a field that is known to be relatively big and to have a lot of repeated values within the partition (I hope to improve the compression ratio by doing this)...
Or maybe it is possible to re-order the rows somehow after the partition is created and then do an "alter table move partition ... compress"?
--
Thanks & regards,
Mike
September 07, 2010 - 7:52 am UTC
you could use create table as select to order AND compress the data, then do a partition exchange to swap the newly compressed table in for the old uncompressed partition, and then drop the left-over uncompressed table.
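A hedged sketch of that technique - all names are invented, the copy must have exactly the same column list as the partitioned table, and INCLUDING INDEXES assumes you have built matching local index structures on the copy:
-- 1) build an ordered, compressed copy of the old partition's data
create table t_2009_comp compress
as
select *
  from t partition (p_2009)
 order by big_repetitive_col;
-- 2) swap it in for the uncompressed partition
alter table t exchange partition p_2009 with table t_2009_comp
  including indexes without validation;
-- 3) drop the left-over table, which now holds the old uncompressed data
drop table t_2009_comp;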
Very good paper about Oracle compression
Marcus Rangel, July 11, 2011 - 8:31 am UTC