Hash Partitioning
rajeev thakur, July 22, 2016 - 1:53 pm UTC
Hi,
I have reviewed the answer,as i can see some more information required from my side regarding data load in source table.Below are my updates :
I have simple insert statements which is used to load records in source table "test", This table also having the same datatype as my target table i.e. varchar2(20) and having no space(as i have already verified insert statements).
After that ,i am using above CTAS script to load the data into target table test1 using filtering condition like rownum<50000(these 50k records are from PART_1 of source table "test").Once it gets loaded to target table "test1" all records are getting stored into one partition only(PART_1 of target ).
Kindly help me to understand what oracle internally does with these records.
Rajeev
July 22, 2016 - 3:51 pm UTC
Can you show the complete script, including the data in the source table?
As I showed, I'm not able to reproduce the problem of everything going into one partition.
Hash Partitioning
rajeev thakur, July 25, 2016 - 10:31 am UTC
Hi,
Please find the below source table scripts,Insert Methods and Target Table script:
--Source Table "TEST"--
CREATE TABLE VRI.TEST
(
X VARCHAR2(20 BYTE)
)
NOCOMPRESS
TABLESPACE VRI
PCTUSED 0
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
BUFFER_POOL DEFAULT
)
NOLOGGING
PARTITION BY HASH (X)
(
PARTITION PART_1
TABLESPACE VRI,
PARTITION PART_2
TABLESPACE VRI,
PARTITION PART_3
TABLESPACE VRI,
PARTITION PART_4
TABLESPACE VRI,
PARTITION PART_5
TABLESPACE VRI,
PARTITION PART_6
TABLESPACE VRI,
PARTITION PART_7
TABLESPACE VRI,
PARTITION PART_8
TABLESPACE VRI
)
NOCACHE
NOPARALLEL
MONITORING;
Insert records using below sample script
BEGIN
FOR i IN 1 .. 500000
LOOP
INSERT INTO test
VALUES (i);
END LOOP;
COMMIT;
END;
Records Count:
select count(1) from test partition(part_1);
62771
--Target Table "TEST1"--
CREATE TABLE test1
NOLOGGING
PARTITION BY HASH (X)
(PARTITION PART_1
TABLESPACE VRI,
PARTITION PART_2
TABLESPACE VRI,
PARTITION PART_3
TABLESPACE VRI,
PARTITION PART_4
TABLESPACE VRI,
PARTITION PART_5
TABLESPACE VRI,
PARTITION PART_6
TABLESPACE VRI,
PARTITION PART_7
TABLESPACE VRI,
PARTITION PART_8
TABLESPACE VRI) AS (SELECT CAST ( ds.X AS VARCHAR2(20))AS X
FROM test ds where rownum<=50000);
select count(1) from test1 partition(part_1);
50000
My question is: Why all 50000 records get stored inside PART_1 of target table "test1" instead of spreading across multiple partitions
Regards,
Rajeev
July 26, 2016 - 8:51 pm UTC
Aha!
The source table is partitioned too! You've only loaded 50k of the 500k rows from this. So Oracle Database has only accessed the first partition of the source!
Thus you're only inserting values which map to the first partition in the new table. Which is why you're seeing this.
hash Partitioning
rajeev, July 27, 2016 - 5:51 am UTC
Hi ,
I understood while i was doing analysis on these data .
My question is , does oracle internally store any key along with each data set which further allow records to go into single partition only?
Eventhough i simply extract the records from source table and making an insert scripts,then also when i further try to load it to target table,data restore its identity and go to the first partition only.
Kindly help me to understand a little more of this.
Regards,
Rajeev
July 27, 2016 - 5:25 pm UTC
The hash algorithm is the same for each table. If you have the same number of partitions in two tables, all values loaded from partition 1 in table 1 will go in partition 1 in table 2.
You've only loaded values from partition 1. So they're all going to be in partition 1 in table 2.
See Rajesh's example below.
To Rajeev
Rajeshwaran, Jeyabal, July 27, 2016 - 1:00 pm UTC
rajesh@ORA11G> create table t1
2 partition by hash(object_id)
3 ( partition p1,
4 partition p2,
5 partition p3,
6 partition p4 ) as
7 select * from all_objects;
Table created.
rajesh@ORA11G>
rajesh@ORA11G> create table t2
2 partition by hash(object_id)
3 ( partition p1,
4 partition p2,
5 partition p3,
6 partition p4 ) as
7 select * from all_objects
8 where 1 = 0 ;
Table created.
rajesh@ORA11G> exec dbms_stats.gather_table_stats(user,'T1');
PL/SQL procedure successfully completed.
rajesh@ORA11G> select partition_name,num_rows
2 from user_tab_partitions
3 where table_name ='T1';
PARTITION_NAME NUM_ROWS
------------------------------ ----------
P1 53033
P2 53240
P3 53418
P4 52965
4 rows selected.
rajesh@ORA11G> explain plan for
2 insert into t2 select * from t1
3 where rownum <=10000;
Explained.
rajesh@ORA11G> @xplan
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
Plan hash value: 1516257301
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 10000 | 1113K| 987 (1)| 00:00:12 | | |
| 1 | LOAD TABLE CONVENTIONAL | T2 | | | | | | |
|* 2 | COUNT STOPKEY | | | | | | | |
| 3 | PARTITION HASH ALL | | 212K| 23M| 987 (1)| 00:00:12 | 1 | 4 |
| 4 | TABLE ACCESS FULL | T1 | 212K| 23M| 987 (1)| 00:00:12 | 1 | 4 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<=10000)
16 rows selected.
rajesh@ORA11G> insert into t2 select * from t1
2 where rownum <=10000;
10000 rows created.
rajesh@ORA11G> commit;
Commit complete.
rajesh@ORA11G> exec dbms_stats.gather_table_stats(user,'T2');
PL/SQL procedure successfully completed.
rajesh@ORA11G> select partition_name,num_rows
2 from user_tab_partitions
3 where table_name ='T2';
PARTITION_NAME NUM_ROWS
------------------------------ ----------
P1 10000
P2 0
P3 0
P4 0
4 rows selected.
rajesh@ORA11G>
When the table is partition by HASH(OBJECT_ID) PARTITIONS 4, the database does something internally like this (but not officially documented).
HASH(object_id,4) (in case of 8 partitions, it would be like this HASH(object_id,8) ) - if the result of the HASH function is one then load data into First partition, if two then into P2, three then P3, four then P4.
So when you do insert like this
insert into t2 select * from t1 where rownum <=10000;
At runtime, the optimizer understand that both T1 and T2 as equi-partitioned on the same key column (by looking at the metadata) and since the we need only 10K records from T1 it (luckily)
a) go into first partition, get (some random) 10K records.
b) apply hash(object_id,4) on the 10K records.
c) since all the records from first partition, hashes to the same HASH value, so it got entered into first partition of T2 (since T1 and T2 both are equi-partitioned).
The same hold for this
rajesh@ORA11G> insert into t2 select * from t1 partition(p3);
53418 rows created.
rajesh@ORA11G> commit;
Commit complete.
rajesh@ORA11G> exec dbms_stats.gather_table_stats(user,'T2');
PL/SQL procedure successfully completed.
rajesh@ORA11G> select partition_name,num_rows
2 from user_tab_partitions
3 where table_name ='T2';
PARTITION_NAME NUM_ROWS
------------------------------ ----------
P1 10000
P2 0
P3 53418
P4 0
4 rows selected.
rajesh@ORA11G>
All rows from partition T1(P3) hashes to the same outcome that maps to P3 of T2.
Hash Partitioning
rajeev thakur, July 28, 2016 - 10:29 am UTC
Hi,
Thanks for the information ,Its very useful.But it increases some more curiosity about hash algorithm.
Here i am providing one more scenario like this:
data flow : source("test")->test2(non-partition table)->target("test1") as below
--Source Table "TEST"--
CREATE TABLE VRI.TEST
(
X VARCHAR2(20 BYTE)
)
NOCOMPRESS
TABLESPACE VRI
PCTUSED 0
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
BUFFER_POOL DEFAULT
)
NOLOGGING
PARTITION BY HASH (X)
(
PARTITION PART_1
TABLESPACE VRI,
PARTITION PART_2
TABLESPACE VRI,
PARTITION PART_3
TABLESPACE VRI,
PARTITION PART_4
TABLESPACE VRI,
PARTITION PART_5
TABLESPACE VRI,
PARTITION PART_6
TABLESPACE VRI,
PARTITION PART_7
TABLESPACE VRI,
PARTITION PART_8
TABLESPACE VRI
)
NOCACHE
NOPARALLEL
MONITORING;
Insert records using below sample script
BEGIN
FOR i IN 1 .. 500000
LOOP
INSERT INTO test
VALUES (i);
END LOOP;
COMMIT;
END;
select COUNT(1) from test;
500000
---Data load from source table "test" to non-partition table "test2"
CREATE TABLE test2 AS SELECT * FROM test;
select COUNT(1) from test2;
500000
--Target Table "TEST1" which takes the data from a non-partition table "test2"--
CREATE TABLE test1
NOLOGGING
PARTITION BY HASH (X)
(PARTITION PART_1
TABLESPACE VRI,
PARTITION PART_2
TABLESPACE VRI,
PARTITION PART_3
TABLESPACE VRI,
PARTITION PART_4
TABLESPACE VRI,
PARTITION PART_5
TABLESPACE VRI,
PARTITION PART_6
TABLESPACE VRI,
PARTITION PART_7
TABLESPACE VRI,
PARTITION PART_8
TABLESPACE VRI) AS (SELECT CAST ( ds.X AS VARCHAR2(20))AS X
FROM test2 ds where rownum<=50000);
select count(1) from test1 partition(part_1);
50000
In this case also, database store the records in PART_1 of target table "test1", even though it receive data from a non-partitioning table "test2"
Kindly let me know ,How this made possible this time ?
July 28, 2016 - 1:04 pm UTC
I believe Rajesh has the answer for you below. But just to be sure:
If you join TEST and TEST1, are all the rows in partition 1 in both tables?
Hash Partitioning
Rajeshwaran, Jeyabal, July 28, 2016 - 12:43 pm UTC
Please use the code button.
TEST - is a Hash partitioned Table having 500K row spread against eight partitions with each partition having 62K rows approx
TEST2 - is a non-partitioned table having 500K rows.
TEST3 - when created, we request only 50K rows from TEST2, but luckily it go some set of 50K rows from TEST2 where all hashes to first partition, so all rows went into first Hash partition.
demo@ORA11G> CREATE TABLE TEST ( X VARCHAR2(20 BYTE) )
2 PARTITION BY HASH (X) (
3 PARTITION PART_1 ,
4 PARTITION PART_2 ,
5 PARTITION PART_3 ,
6 PARTITION PART_4 ,
7 PARTITION PART_5 ,
8 PARTITION PART_6 ,
9 PARTITION PART_7 ,
10 PARTITION PART_8 ) ;
Table created.
demo@ORA11G>
demo@ORA11G> BEGIN
2 FOR i IN 1 .. 500000
3 LOOP
4 INSERT INTO test
5 VALUES (i);
6 END LOOP;
7 commit;
8 end;
9 /
PL/SQL procedure successfully completed.
demo@ORA11G> select count(*) from test;
COUNT(*)
----------
500000
1 row selected.
demo@ORA11G> CREATE TABLE test2 AS SELECT * FROM test;
Table created.
demo@ORA11G> select COUNT(1) from test2;
COUNT(1)
----------
500000
1 row selected.
demo@ORA11G> exec dbms_stats.gather_table_stats(user,'test');
PL/SQL procedure successfully completed.
demo@ORA11G> exec dbms_stats.gather_table_stats(user,'test2');
PL/SQL procedure successfully completed.
demo@ORA11G> select table_name,partition_name,num_rows
2 from user_tab_statistics
3 where table_name in ('TEST','TEST2')
4 order by 1,2 nulls first;
TABLE_NAME PARTITION_NAME NUM_ROWS
------------------------------ ------------------------------ ----------
TEST 500000
TEST PART_1 62771
TEST PART_2 62497
TEST PART_3 62938
TEST PART_4 62728
TEST PART_5 62127
TEST PART_6 62549
TEST PART_7 61991
TEST PART_8 62399
TEST2 500000
10 rows selected.
demo@ORA11G> CREATE TABLE test1
2 NOLOGGING
3 PARTITION BY HASH (X)
4 (PARTITION PART_1 ,
5 PARTITION PART_2 ,
6 PARTITION PART_3 ,
7 PARTITION PART_4 ,
8 PARTITION PART_5 ,
9 PARTITION PART_6 ,
10 PARTITION PART_7 ,
11 PARTITION PART_8 ) AS (SELECT CAST ( ds.X AS VARCHAR2(20))AS X
12 FROM test2 ds where rownum<=50000);
Table created.
demo@ORA11G> exec dbms_stats.gather_table_stats(user,'test1');
PL/SQL procedure successfully completed.
demo@ORA11G> select table_name,partition_name,num_rows
2 from user_Tab_partitions
3 where table_name ='TEST1'
4 order by 1,2 nulls first;
TABLE_NAME PARTITION_NAME NUM_ROWS
------------------------------ ------------------------------ ----------
TEST1 PART_1 50000
TEST1 PART_2 0
TEST1 PART_3 0
TEST1 PART_4 0
TEST1 PART_5 0
TEST1 PART_6 0
TEST1 PART_7 0
TEST1 PART_8 0
8 rows selected.
demo@ORA11G>
July 28, 2016 - 1:18 pm UTC
Nice demo.
Hash Partitioning
rajeev thakur, July 28, 2016 - 2:01 pm UTC
Hi,
Thanks for the updates.
Still i am curious to know that once data loaded into non -partitioning table i.e "TEST2" it should actually loose its identity of partitioning. Then how it will behave the same(means store in PART_1 of target) when loading into target table "TEST3"( Why it does not spread across all 8 partition of target table?).
My Assumption:
Is there any KEY-Value pair which oracle internally store with each data which stick to the data even after it gets loaded to non-partition table .
Yes, When i join tables "TEST" and "TEST1" .
Source table "test" has data spread across all 8 partition.
In target table which receive 50k data have all in PART_1 of the table.
Best Regards,
Rajeev
July 28, 2016 - 9:03 pm UTC
When you loaded the data from your partitioned table to the non-partitioned table, Oracle happened to load all the values from partition 1 first. So these rows were "at the start" in the non-partitioned table.
When you copied from the non-partitioned table to the second partitioned table, the database again read these rows at the start of the table first.
So the only values you've read into your target table must go into partition 1.
Try this:
Load your data from partitioned table to the non-partitioned table. Then select 50k rows from the non-partitioned table.
Are they are all in partition 1 in the original?
If they are, then they're all going to be in partition 1 in the new table (assuming the same hash partition numbers).
If you need or want to avoid this, order the rows. You should find those 50k rows are spread over the partitions. So they'll load "across" the partitions in the new table.
Hash Partitioning
A reader, July 29, 2016 - 7:42 am UTC
when loading into target table "TEST3"( Why it does not spread across all 8 partition of target table?). Yes we do, when requested explicitly, by default we are being lucky enough to scan a subset of data that holds nicely in first partition.
demo@ORA11G> CREATE TABLE TEST ( X VARCHAR2(20 BYTE) )
2 PARTITION BY HASH (X) (
3 PARTITION PART_1 ,
4 PARTITION PART_2 ,
5 PARTITION PART_3 ,
6 PARTITION PART_4 ,
7 PARTITION PART_5 ,
8 PARTITION PART_6 ,
9 PARTITION PART_7 ,
10 PARTITION PART_8 ) ;
Table created.
demo@ORA11G> BEGIN
2 FOR i IN 1 .. 500000
3 LOOP
4 INSERT INTO test
5 VALUES (i);
6 END LOOP;
7 commit;
8 end;
9 /
PL/SQL procedure successfully completed.
demo@ORA11G> select count(*) from test;
COUNT(*)
----------
500000
1 row selected.
demo@ORA11G> CREATE TABLE test2 AS SELECT * FROM test;
Table created.
demo@ORA11G> select COUNT(1) from test2;
COUNT(1)
----------
500000
1 row selected.
demo@ORA11G> exec dbms_stats.gather_table_stats(user,'test');
PL/SQL procedure successfully completed.
demo@ORA11G> exec dbms_stats.gather_table_stats(user,'test2');
PL/SQL procedure successfully completed.
demo@ORA11G> select table_name,partition_name,num_rows
2 from user_tab_statistics
3 where table_name in ('TEST','TEST2')
4 order by 1,2 nulls first;
TABLE_NAME PARTITION_NAME NUM_ROWS
------------------------------ ------------------------------ ----------
TEST 500000
TEST PART_1 62771
TEST PART_2 62497
TEST PART_3 62938
TEST PART_4 62728
TEST PART_5 62127
TEST PART_6 62549
TEST PART_7 61991
TEST PART_8 62399
TEST2 500000
10 rows selected.
demo@ORA11G> CREATE TABLE test1
2 NOLOGGING
3 PARTITION BY HASH (X)
4 (PARTITION PART_1 ,
5 PARTITION PART_2 ,
6 PARTITION PART_3 ,
7 PARTITION PART_4 ,
8 PARTITION PART_5 ,
9 PARTITION PART_6 ,
10 PARTITION PART_7 ,
11 PARTITION PART_8 ) AS (SELECT CAST ( ds.X AS VARCHAR2(20))AS X
12 FROM test2 ds where rownum<=50000);
Table created.
demo@ORA11G> exec dbms_stats.gather_table_stats(user,'test1');
PL/SQL procedure successfully completed.
demo@ORA11G> select table_name,partition_name,num_rows
2 from user_Tab_partitions
3 where table_name ='TEST1'
4 order by 1,2 nulls first;
TABLE_NAME PARTITION_NAME NUM_ROWS
------------------------------ ------------------------------ ----------
TEST1 PART_1 50000
TEST1 PART_2 0
TEST1 PART_3 0
TEST1 PART_4 0
TEST1 PART_5 0
TEST1 PART_6 0
TEST1 PART_7 0
TEST1 PART_8 0
8 rows selected.
When ordered explicitly, it goes like this(spread across all the available partitions)
demo@ORA11G> drop table test1 purge;
Table dropped.
demo@ORA11G>
demo@ORA11G> CREATE TABLE test1
2 NOLOGGING
3 PARTITION BY HASH (X)
4 (PARTITION PART_1 ,
5 PARTITION PART_2 ,
6 PARTITION PART_3 ,
7 PARTITION PART_4 ,
8 PARTITION PART_5 ,
9 PARTITION PART_6 ,
10 PARTITION PART_7 ,
11 PARTITION PART_8 ) AS
12 select *
13 from (
14 select cast(ds.x as varchar2(20)) as x
15 from test2 ds
16 order by dbms_random.random
17 )
18 where rownum<=50000;
Table created.
demo@ORA11G> exec dbms_stats.gather_table_stats(user,'test1');
PL/SQL procedure successfully completed.
demo@ORA11G> select table_name,partition_name,num_rows
2 from user_Tab_partitions
3 where table_name ='TEST1'
4 order by 1,2 nulls first;
TABLE_NAME PARTITION_NAME NUM_ROWS
------------------------------ ------------------------------ ----------
TEST1 PART_1 6295
TEST1 PART_2 6210
TEST1 PART_3 6188
TEST1 PART_4 6364
TEST1 PART_5 6210
TEST1 PART_6 6290
TEST1 PART_7 6157
TEST1 PART_8 6286
8 rows selected.
demo@ORA11G>
Hash Partitioning
rajeev thakur, July 29, 2016 - 8:19 am UTC
Hi ,
Many Thanks for the valuable information. This is really useful.
Regards,
Rajeev