Skip to Main Content

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Arghya.

Asked: January 23, 2020 - 1:30 pm UTC

Answered by: Chris Saxon - Last updated: January 27, 2020 - 1:44 pm UTC

Category: SQL - Version: 18c

Viewed 1000+ times

You Asked

Hi ask Tom team,
Hope you people are doing great in 2020.


I am working in migration team ,our goal is to take csv file from client and ultimately transform data as per our production table structure .


Requirement :-
Recenty we have received one file of 260 million data and 300 columns .
Source table structure like:-

Table_name st
(
Id,
A1,
A2,
.
.
Till A32,
B1,
B2,
.
.till B32,
.
.
Till H32
);


My out put table structure

Table_table ot
(Id,
A,(a1..a32 unpivoted into it)
B,(b1..b32 unpivoted into it)
C,(same as upper)
D,(same as upper)
E,(same as upper)
F,(same as upper)
G,(same as upper)
H(same as upper)
)


my approach :-

1)by looping it takes 3 days to complete

2) I have tried to unpivoted the whole set of data with nologging and direct path and compelled to stop after 8 hours as still it was going on .

3)dbms_parallel_execute.create_chunks_by_rowid (chunk size =500000 and parallel=10) but after completion of 2 hours still it was showing as processing of 20 chunks.

4)I was thinking of partion the table and execute it through calling jobs and direct path load .but I have no clue at all ,as table is required to partition vertically for column and horizontally for rows to achieve this.

So kindly suggest me the best approach of solving this kind of huge data with this kind of table structure as first as possible .

and we said...

I'm not clear exactly what you're doing. So the below is educated guesswork:

1) Looping will definitely take forever!

2) What exactly did you do?

3) I'm guessing you're reading straight from the CSV using an external table. If so each chunk will still read the whole file! So this may be making the process slower than reading the file in serial.

4) I don't know enough about what you're doing to recommend a partitioning strategy.

Assuming you are reading the CSV using external tables, here's something you could try:

Split the file up into N smaller files. Then either create an external table/file. Or a partitioned external table, with each partition pointing to one file. Then you can parallelize the process (using dbms_parallel or manually).

To give more meaningful help, we really need to see what your process is doing. Trace your process like so:

exec DBMS_monitor.session_trace_enable ( null, null, true, true );
***your code here***
exec DBMS_monitor.session_trace_disable;


Then get the trace file, format it, and share the results with us. Read more about this at:

https://blogs.oracle.com/sql/how-to-create-an-execution-plan#tkprof

Also include a cut-down worked example (with create tables, inserts, & selects) of what it is your code does.

and you rated our response

  (7 ratings)

Reviews

Need answers

January 23, 2020 - 5:18 pm UTC

Reviewer: Alok Verma from India

Hello sir,
Table have a record like this
A&B
C&D
J&K
How to fetch that records without using like or any type of set define off or set escape.
Please suggest me how to fetch that records.
Chris Saxon

Followup  

January 23, 2020 - 5:27 pm UTC

Need details.

What exactly are you doing? And how does this relate to the original question?

January 23, 2020 - 5:42 pm UTC

Reviewer: A reader

Hi team ,
Thank you for your valuable time ,

Actually I have succefully loaded 260 million data from csv to my source table name ST by using SQL loader .
source table is ST and table structure is already mentioned upper and now my out put table is OT structure mentioned

I will be greatful if I am able to share the trace file .Actually I have already talked with my dba to share the trace file but ,right now as from my end I have no such permission of getting trace file output .once it will be available I will share with sample data .

1) looping is discared as mentioned



4) I was thinking about partitioning as partitioning deal with faster execution ,but I have no clue of how I will partition the source or Target or both to migrate data from surce table St(structure mentioned) to out put table OT( structure given) .









Chris Saxon

Followup  

January 23, 2020 - 5:59 pm UTC

Give us an example of what you're trying to do with working SQL scripts:

- create table
- insert into
- selects

Follow up for raised question

January 23, 2020 - 6:56 pm UTC

Reviewer: A reader

m triyng to unpivot multiple columns in my dataset. Here's what my data look like.

CREATE TABLE St (idnum NUMBER,f1 NUMBER(10,5),f2 NUMBER(10,5),
,e1 NUMBER(10,5),e2 NUMBER(10,5)
,h1 NUMBER(10,5),h2 NUMBER(10,5));

INSERT INTO ST(IDNUM,F1,F2,E1,E2,H1,H2)
VALUES (1,'10.2004','5.009','9.008','8.003','.99383','1.43243');
INSERT INTO ST (IDNUM,F1,F2,E1,E2,H1,H2
VALUES (2,'4.2004','6.009','4.7008','4.60333','1.993','3.3243');
INSERT INTO ST (IDNUM,F1,F2,E1,E2,H1,H2)
VALUES (3,'10.2040','52.6009','9.5008','8.003','.99383','1.43243');
INSERT INTO ST(IDNUM,F1,F2,E1,E2,H1,H2)
VALUES (4,'9.20704','45.009','29.008','5.003','3.9583','1.243');

COMMIT;

select * from ST;

IDNUM F1 F2 E1 E2 H1 H2

1 10.2004 5.009 9.008 8.003 0.99383 1.43243
2 4.2004 6.009 4.7008 4.60333 1.993 3.3243
3 10.204 52.6009 9.5008 8.003 0.99383 1.43243
4 9.20704 45.009 29.008 5.003 3.9583 1.24

I'm unpivoting like so...

Insert into ot
select *
from (select IDNUM,F1,F2,E1,E2,H1,H2,

from T5)
UnPivot((F,E,H) for sk in ((F1,E1,H1) as 1
(F2,E2,H2) as 2
)
order by IDNUM,SK;

IDNUM SK F E H
----- -- ------- ------- -------
1 1 10.2004 9.008 .99383
1 2 5.009 8.003 1.43243

2 1 4.2004 4.7008 1.993
2 2 6.009 4.60333 3.3243

3 1 10.204 9.5008 .99383
3 2 52.6009 8.003 1.43243

4 1 9.20704 29.008 3.9583
4 2 45.009 5.003 1.243

This is just like a cut down example .
But actually in St table almost 300 columns
And structure
Id,
E1,E2...upto .E32 column
F1,f2 ....up to F32 column
G1,G2....up to G32 column
.
..
M1,m2 uptoM32 column

And my output table OT is

As my upper example
Id,E,F,G,H....G

Chris Saxon

Followup  

January 27, 2020 - 1:42 pm UTC

Thanks for sharing the example.

Have you tried running separate SQL statements for each column group you want to turn into rows?

For example:

select *
from (select ... from st)
unpivot (
  (F,E,H) for sk in ( (F1,E1,H1) as 1)
);

select *
from (select IDNUM,F1,F2,E1,E2,H1,H2 from st)
UnPivot (
  (F,E,H) for sk in ( (F2,E2,H2) as 2 )
);


This is easy to parallelise - just submit a separate job for each.

Also: do you really need that order by on the final query?

If the final result set contains billions of rows, sorting that is going to take ... quite some time!

If none of the above helps, can you get the plan and query stats from a tkprof formatted trace file and post that here too?

January 23, 2020 - 7:29 pm UTC

Reviewer: A reader

Modifying inserting into soure table should have been st ...query is fetching 260 million data in real scenario..

Insert /*+append*/into ot
select *
from (select IDNUM,F1,F2,E1,E2,H1,H2,

from st/*wrongly written as t5*/)
UnPivot((F,E,H) for sk in ((F1,E1,H1) as 1
(F2,E2,H2) as 2
)
order by IDNUM,SK;

Need answer

January 23, 2020 - 7:51 pm UTC

Reviewer: Alok Verma from India

Hello Sir,

Thank you sir for your valuable response.
Sir
I have a table in this table two columns are there col_a and col_b
Col_a have a full name and col_b have a full name code.
Records like this

Col_a col_b
Xyz. X&Z
ABC. A&B
LMN. L&M

Select * from table where col_b='X&Z'
Then it's open some pop-up.
I don't want to open this pop-up.
And also I don't want to use like operator and also don't use set define off or set escape without using that things how I get out put.

How about a parallel dml

January 26, 2020 - 7:19 am UTC

Reviewer: Rajeshwaran, Jeyabal

It is really worth to have a look at parallel dml for large data sets.

How about turning on Parallel dml before the insert like this?

alter session force parallel dml parallel 8;

Insert into ot 
select *
from (select IDNUM,F1,F2,E1,E2,H1,H2 from T5)
UnPivot((F,E,H) for sk in ((F1,E1,H1) as 1 
(F2,E2,H2) as 2
)
order by IDNUM,SK;

commit;

Chris Saxon

Followup  

January 27, 2020 - 1:44 pm UTC

That is another option they could try.

Inserting of 260 billion data

January 27, 2020 - 4:52 pm UTC

Reviewer: A reader

Hi Chris ,

Thanks for your review ,

I have tried with parallal dml but it also not going to help ,
Like :-
Alter session enable parallel dml

And then insert into /*+parallel(8)*/ ot

But I will definitely accept your proposal regarding inserting into output table column wise, for each column I can call a job and execute insert operation parallay. In case of 32 column then 32 jobs will be execute parallaly and definitely follow up with you and if possible with tkprof result .

And regarding sorting :-I have not required any sorted format data .

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.