Skip to Main Content
  • Questions
  • How to split the big file contains personal data

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Sankar.

Asked: August 29, 2016 - 1:40 pm UTC

Last updated: September 01, 2016 - 3:21 am UTC

Version: 11g

Viewed 1000+ times

You Asked

Hi Tom,

Good evening.

Can yo please guide me how we can do below requirement.

Input will take huge data file it's having personal any data it's not like fixed sized or delimited. It's like any format.
this data have to split
1. Based on the lines based
2. Based on the data which is coming in the file like in the file if the debit cards related information is there then we have to split accordingly
a>Rupay
b>Platinm
c>Titanim..etc for one type of cards one file have to split.

I hope understand my query.

Can you please give overall idea how i can proceed.

Thanks.

Regards
Sankar

and Connor said...

If there is no fixed size, or is not delimited, then you'll need parse it in some way.

What I've done in the past for situations like this, is

a) access the file as an external table
b) use SQL to get access to things like delimiters and special characters
c) use PLSQL to perform the more complex parsing patterns

So (in pseudo-code) you end up with something like:

for i in ( 
  select 
    the_line,
    instr(the_line,',',1,1) comma1,
    instr(the_line,',',1,2) comma2,
    instr(the_line,',',1,3) comma3,
    instr(the_line,'$') first_money_sign,
    case when the_line like '%CREDET%' then 'Y' end contains_credit_card
    ...
    ...
 from my_external_table
) 
loop
   ...
   ... more complex logic requiring if-then, or more parsing etc
   ... but it can use some of the metadata (comma1 etc) extracted
   .... the cursor SQL
end loop;


and if the data has to end up in a table, this plsql can be a pipelined function, so the process becomes

insert into target_table
select * from table(my_plsql_function);


Hope this helps.

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Sankar s, August 31, 2016 - 2:42 pm UTC

Here when ever going to create external table it's required to know how many columns data is there at file. accordingly only able to create external table,without an idea how will prepare it. Can you please elaborate.

Thanks
Chris Saxon
September 01, 2016 - 3:21 am UTC

"Here when ever going to create external table it's required to know how many columns data is there at file."

That is not true. An external table can be *anything*, for example, if I wanted to read my database alert log, I could have:

CREATE TABLE alert_log
 ( 
 text varchar2(4000)
 ) 
 ORGANIZATION external 
 ( 
 TYPE oracle_loader 
 DEFAULT DIRECTORY datadir 
 ACCESS PARAMETERS 
 ( 
 RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8ISO8859P1 
 READSIZE 1048576 
 FIELDS LDRTRIM 
 MISSING FIELD VALUES ARE NULL 
 REJECT ROWS WITH ALL NULL FIELDS 
 (  
 text (1:4000)  CHAR(4000) 
 ) 
 ) 
 location 
 ( 
 'alert_mydb.log' 
 ) 
 )REJECT LIMIT UNLIMITED 


Now, if I wanted to dig around *inside* that "text" column to find certain attribute, I could do that with with PLSQL. But since I'm using an external table, I can short cut some of that parsing work with the external table itself. For example, typical alert log entries are:

Fri Nov 13 12:17:33 2015
Database mounted in Exclusive Mode
Lost write protection disabled
Create Relation IPS_PACKAGE_UNPACK_HISTORY
Completed: ALTER DATABASE   MOUNT
Fri Nov 13 12:18:11 2015



So why not use my external table to detect date rows as well, eg

CREATE TABLE alert_log
 ( 
 text varchar2(4000),
 prefix varchar2(3)
 ) 
 ORGANIZATION external 
 ( 
 TYPE oracle_loader 
 DEFAULT DIRECTORY datadir 
 ACCESS PARAMETERS 
 ( 
 RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8ISO8859P1 
 READSIZE 1048576 
 FIELDS LDRTRIM 
 MISSING FIELD VALUES ARE NULL 
 REJECT ROWS WITH ALL NULL FIELDS 
 (  
 text (1:4000)  CHAR(4000) ,
 prefix (1:3)  char(3)
 ) 
 ) 
 location 
 ( 
 'alert_mydb.log' 
 ) 
 )REJECT LIMIT UNLIMITED 


So now I can do:

select
  case when prefix in ('Mon','Tue',...) then 'Y' end is_a_date_row,
  text
from alert_log


which will help me when parsing etc.