Skip to Main Content
  • Questions
  • Load CSV into table and split column into different rows

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, SONAL.

Asked: July 22, 2017 - 10:22 pm UTC

Last updated: July 31, 2017 - 3:07 am UTC

Version: 12c

Viewed 10K+ times! This question is

You Asked

I have to load a csv file data into a table in oracle. One of the columns of csv file has data more than 4000 characters which has many values. These values can be distinguished from one another by keyword "http" or "https". I need to split them into different rows in oracle table.

Example of one row in csv file:
abc,123, http://test1 https://test2 http://test3 https://test4

NOTE: Last column value in csv can have more than 4000 characters

When i load this file into a table, it should store like below:

col1 col2 col3
abc 123 http://test1
abc 123 https://test2
abc 123 http://test3
abc 123 https://test4


and Connor said...

External tables are perfect for this. Here's my flat file

col1a,col2a,http://link10 http://link20 http://link30 http://link40
col1b,col2b,http://link11 http://link21 http://link31 http://link41
col1c,col2c,http://link12 http://link22 http://link32 http://link42
col1d,col2d,http://link13 http://link23 http://link33 http://link43


and here's how we use an external table to expose this to us via SQL, after which, a simple lateral will do the trick

SQL> CREATE TABLE links (
  2    col1 VARCHAR2(20),
  3    col2 varchar2(20),
  4    col3 varchar2(32767)
  5  )
  6  ORGANIZATION EXTERNAL (
  7    TYPE ORACLE_LOADER
  8    DEFAULT DIRECTORY temp
  9    ACCESS PARAMETERS (
 10      RECORDS DELIMITED BY NEWLINE
 11      FIELDS TERMINATED BY ','
 12      MISSING FIELD VALUES ARE NULL
 13      (
 14    col1 char(20),
 15    col2 char(20),
 16    col3 char(32767)
 17      )
 18    )
 19    LOCATION ('links.csv')
 20  )
 21  REJECT LIMIT UNLIMITED;

Table created.

SQL>
SQL> col col3 format a70
SQL> select * from links;

COL1                 COL2                 COL3
-------------------- -------------------- ----------------------------------------------------------
col1a                col2a                http://link10 http://link20 http://link30 http://link40
col1b                col2b                http://link11 http://link21 http://link31 http://link41
col1c                col2c                http://link12 http://link22 http://link32 http://link42
col1d                col2d                http://link13 http://link23 http://link33 http://link43


5 rows selected.

SQL>
SQL> select col1,
  2         col2,
  3       substr( ' '||col3||' ' ,
  4                instr( ' '||col3||' ' , ' ' ,1,x)+ length(' ') ,
  5                instr( ' '||col3||' ' , ' ' ,1,x+1) -
  6                instr( ' '||col3||' ' , ' ' ,1,x) -length(' ')) link
  7     from links,
  8        lateral( select level x
  9                from dual
 10                connect by level <= (length(col3) - length(replace(col3,' ')))/length(' ') +1)
 11   /

COL1                 COL2                 LINK
-------------------- -------------------- ------------------------------
col1a                col2a                http://link10
col1a                col2a                http://link20
col1a                col2a                http://link30
col1a                col2a                http://link40
col1b                col2b                http://link11
col1b                col2b                http://link21
col1b                col2b                http://link31
col1b                col2b                http://link41
col1c                col2c                http://link12
col1c                col2c                http://link22
col1c                col2c                http://link32
col1c                col2c                http://link42
col1d                col2d                http://link13
col1d                col2d                http://link23
col1d                col2d                http://link33
col1d                col2d                http://link43

Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, July 24, 2017 - 8:23 pm UTC

Thank you. The second trick works to split column into rows. To load data into database from csv, I am trying to use sqlloader.
In csv file, if col3 has multiple http URLS then these values are in different lines enclosed in double quotes. If there is only value then it's not enclosed in double quote

This is an example of how csv looks like:

col1a,col2a," http://link10
http://link20
http://link30
http://link40"
col1b,col2b, http://link10
col1c,col2c," http://link10
http://link20"

When I am running control file, it's storing only first line of col3. Rest are ignore.
Table result
col1a col2a http://link10
Col1b col2b http://link10
Col1c col2c http://link10

Control file used:
options ( skip=1 )
LOAD DATA
INFILE 'Test.csv'
INTO TABLE Test
REPLACE
FIELDS TERMINATED BY ','
(Col1,Col2,Col3)

Please advise
Connor McDonald
July 25, 2017 - 7:25 am UTC

And I said - dont use sqlldr.

Use an external table.

Much easier.

File with multiline value and commas with in same value

A reader, July 24, 2017 - 9:08 pm UTC

I tried doing it by adding "str x'0D'" in INFILE line and it seems to be working but another problem to it is data in this column can have commas.

Example of file
col1a, col1b,
" http://link10,123
http://link20
http://link30,sasa"

col2a,col2bccx,
" http://link10,1dsds23
http://link90,fdff,
http://link4353/dfsf/fdfs"

Multi line I hope can be addressed by below command. How do we address comma with in a cell value?

INFILE 'Test.csv' "str x'0D'"

A reader, July 25, 2017 - 5:35 pm UTC

I was successfully able to load data using sql loader by making some changes in control file.

Now, I need help with only lateral part.
Column3 in table can have multiple http or https URLs. They need to split into different rows. Any text starting with http or https should go into next row along the values of other columns.Earlier code you shared was I believe assuming there will be a space between each url but that's not the case here. There is no space between urls in 3rd column. Splitted rows value should either be stored in same column or a different column in the same table.

My table data looks like:
col1 col2 col3
1a 2a http://link10/ttt/http://link20/ccc/http://link30/ffs

1b 2b https://link50/fdfdsd/https://ieuiwew
1c 2c http:/link90/ds/dsds

Expected result:
Connor McDonald
July 26, 2017 - 3:47 am UTC

If you have it in a table, the initial answer work will just as well - just substitute your table in for the external table.

A reader, July 26, 2017 - 3:58 am UTC

I tried that with initial code but it is not working as this lateral code is looking for space between each http URL and moving it to different row where as my table column has no space between http urls. It needs to split different http or https URL into different rows.

Below are some examples of col3 value
Http://link10/ash/http://link20/dffgj/

Or

HTTPS://link30/diff/https://link40/df11/

Or

HTTPS://link50/aff

Or

Http://link60/sddff


Connor McDonald
July 26, 2017 - 5:12 am UTC

which simply means that instr(lower(str),'http') is a logical delimiter, plus a little adjustment here and there for https.


A reader, July 27, 2017 - 10:28 am UTC

If we take http as a delimiter then it cuts off the word http where as we need http also to be present.

Looking for code to split row into columns with a text as delimiter

A reader, July 27, 2017 - 10:58 pm UTC

Please advise. We all are still stuck here. We are unable to split row into columns without losing http or HTTPS word. Substr code with instr is omitting http word where as we need it to present. Is there some regex which will help in this case.
Connor McDonald
July 31, 2017 - 3:07 am UTC

SQL> with
  2    http_text as
  3      ( select 'http://www.oracle.com http://cloud.oracle.com http://asktom.oracle.com' h from dual )
  4  select
  5    substr(h,1,instr(h,'http',1,2)-2),
  6    substr(h,instr(h,'http',1,2),instr(h,'http',1,3)-instr(h,'http',1,2)-1),
  7    substr(h,instr(h,'http',1,3))
  8  from http_text;

SUBSTR(H,1,INSTR(H,'H SUBSTR(H,INSTR(H,'HTTP' SUBSTR(H,INSTR(H,'HTTP',
--------------------- ----------------------- ------------------------
http://www.oracle.com http://cloud.oracle.com http://asktom.oracle.com



Adding support for https is a *little* harder, but we're here to help you not do your job for you

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.