Use case We have a program to import data into one of the tables of our application based on a plain text data file received from the customer's bank. The text file could use different formats depending upon the bank and each line in the text file could represent different types of data depending upon the first character of the line.
Currently we use SQL Loader for loading the data from the text file to the table.
This approach has a drawback that we need to add a new control file to the application whenever the file format changes.
To explain things using a simpler EMP example:
Say I have the definition of EMP table as
Create table EMP
(
employee_id number(8),
first_name varchar2(15),
last_name varchar2(15),
status varchar2(30),
record_type varchar2(1)
)
and the data file to load the data into this table is as below:
120001Mary
220003Francis
320002John Jacob
The Control File to Load this data looks like this
LOAD DATA
APPEND
-- Type 1 - Records with only first Name
INTO TABLE EMP
WHEN RECORD_TYPE = '1'
(
STATUS CONSTANT 'NEW_RECORD',
RECORD_TYPE POSITION(01:01) CHAR,
EMPLOYEE_ID POSITION(02:06) CHAR,
FIRST_NAME POSITION(07:21) CHAR
)
-- Type 2 - Records with only last Name
INTO TABLE EMP
WHEN RECORD_TYPE = '2'
(
STATUS CONSTANT 'NEW_RECORD',
RECORD_TYPE POSITION(01:01) CHAR,
EMPLOYEE_ID POSITION(02:06) CHAR,
LAST_NAME POSITION(07:21) CHAR
)
-- Type 3 - Records with both first name and last name
INTO TABLE EMP
WHEN RECORD_TYPE = '3'
(
STATUS CONSTANT 'NEW_RECORD',
RECORD_TYPE POSITION(01:01) CHAR,
EMPLOYEE_ID POSITION(02:06) CHAR,
FIRST_NAME POSITION(07:21) CHAR,
LAST_NAME POSITION(22:36) CHAR
)
Requirement :
We are investigating the possibility of converting the input text file into a JSON file based on a set up that can be given to the user so that the dependency on the control file can be eliminated.
So, in the above example, our set up table where the user can setup their new format will look like this:
Create table EMP_FORMATS
(
FORMAT_ID NUMBER(8),
FORMAT_NAME VARCHAR2(60)
);
Create table EMP_RECORD_FORMATS
(
RECORD_TYPE_ID NUMBER(8),
FORMAT_ID NUMBER(8),
RECORD_TYPE VARCHAR2(1),
DESCRIPTION VARCHAR2(100)
);
Create table EMP_FIELD_FORMATS
(
FIELD_FORMAT_ID NUMBER(8),
RECORD_TYPE_ID NUMBER(8),
FIELD_START_POSITION NUMBER,
FIELD_END_POSITION NUMBER,
FIELD_NAME VARCHAR2(20)
);
And the set up corresponding to the above control file would be :
insert into emp_formats values (30001,'Simple Format');
insert into emp_record_formats values (40001,30001,'1','Records with only First Name');
insert into emp_record_formats values (40002,30001,'2','Records with only Last Name');
insert into emp_record_formats values (40003,30001,'3','Records with both First and Last Names');
insert into emp_field_formats values (50001,40001,1,1,'RECORD_TYPE');
insert into emp_field_formats values (50002,40001,2,6,'EMPLOYEE_ID');
insert into emp_field_formats values (50003,40001,7,21,'FIRST_NAME');
insert into emp_field_formats values (50004,40002,1,1,'RECORD_TYPE');
insert into emp_field_formats values (50005,40002,2,6,'EMPLOYEE_ID');
insert into emp_field_formats values (50006,40002,7,21,'LAST_NAME');
insert into emp_field_formats values (50007,40003,1,1,'RECORD_TYPE');
insert into emp_field_formats values (50008,40003,2,6,'EMPLOYEE_ID');
insert into emp_field_formats values (50009,40003,7,21,'FIRST_NAME');
insert into emp_field_formats values (50010,40003,22,36,'LAST_NAME');
select rf.record_type, ff.field_start_position, ff.field_end_position,ff.field_name
from emp_formats f,
emp_record_formats rf,
emp_field_formats ff
where f.format_name = 'Simple Format'
and f.format_id = rf.format_id
and ff.record_type_id = rf.record_type_id
order by record_type, field_start_position;<u>RECORD_TYPE</u> <u>FIELD_START_POSITION</u> <u>FIELD_END_POSITION</u> <u>FIELD_NAME</u>
1 1 1 RECORD_TYPE
1 2 6 EMPLOYEE_ID
1 7 21 FIRST_NAME
2 1 1 RECORD_TYPE
2 2 6 EMPLOYEE_ID
2 7 21 LAST_NAME
3 1 1 RECORD_TYPE
3 2 6 EMPLOYEE_ID
3 7 21 FIRST_NAME
3 22 36 LAST_NAME
Please guide on 1. How can I look at the formats available in the tables and transform an input text file into JSON ? (We are first copying the input text file into a BLOB column in a database table before it is processed using a control file)
2. Once such a JSON is created, what is the best way to insert data into the EMP table?
Any sample code is appreciated
Thanks
1. You can use JSON_objectagg to combine attribute/value pairs into one document. You can define the attribute names dynamically by selecting them from a column.
So you could do something like this:
with rws as (
select '120001Mary' rec from dual union all
select '220003Francis' rec from dual union all
select '320002John Jacob' rec from dual
), rec_formats as (
select rf.record_type, ff.field_start_position, ff.field_end_position,ff.field_name
from emp_formats f,
emp_record_formats rf,
emp_field_formats ff
where f.format_name = 'Simple Format'
and f.format_id = rf.format_id
and ff.record_type_id = rf.record_type_id
)
select record_type,
json_objectagg (
field_name value
trim (
substr (
rec,
field_start_position,
field_end_position - field_start_position + 1
)
)
absent on null
) jdoc
from rws
join rec_formats
on record_type = substr ( rec, 1, 1 )
group by record_type;
R JDOC
- ------------------------------------------------------------------------------
1 {"RECORD_TYPE":"1","FIRST_NAME":"Mary","EMPLOYEE_ID":"20001"}
2 {"RECORD_TYPE":"2","LAST_NAME":"Francis","EMPLOYEE_ID":"20003"}
3 {"RECORD_TYPE":"3","LAST_NAME":"Jacob","FIRST_NAME":"John","EMPLOYEE_ID":"20002"} If you create external table(s), you could even create the JSON documents directly from the files rather than loading them in first.
2. You can use JSON_table to convert JSON documents to relational rows and columns. So you can insert the result of this:
insert into employees ( EMPLOYEE_ID, FIRST_NAME, LAST_NAME )
with rws as (
select '120001Mary' rec from dual union all
select '220003Francis' rec from dual union all
select '320002John Jacob' rec from dual
), rec_formats as (
select rf.record_type, ff.field_start_position, ff.field_end_position,ff.field_name
from emp_formats f,
emp_record_formats rf,
emp_field_formats ff
where f.format_name = 'Simple Format'
and f.format_id = rf.format_id
and ff.record_type_id = rf.record_type_id
), jdoc as (
select record_type,
json_objectagg (
field_name value
trim (
substr (
rec,
field_start_position,
field_end_position - field_start_position + 1
)
)
absent on null
) jdoc
from rws
join rec_formats
on record_type = substr ( rec, 1, 1 )
group by record_type
)
select j.*
from jdoc, json_table (
jdoc
columns (
EMPLOYEE_ID, FIRST_NAME, LAST_NAME
)
) j;That said I'm not sure what you're hoping to gain by doing this JSON conversion. At some point you need to know what the attributes & columns are so you can load them into the target table.
You can save yourself a conversion to & from JSON with something like:
with rws as (
select '120001Mary' rec from dual union all
select '220003Francis' rec from dual union all
select '320002John Jacob' rec from dual
), rec_formats as (
select rf.record_type, ff.field_start_position, ff.field_end_position,ff.field_name
from emp_formats f,
emp_record_formats rf,
emp_field_formats ff
where f.format_name = 'Simple Format'
and f.format_id = rf.format_id
and ff.record_type_id = rf.record_type_id
), vals as (
select record_type, field_name,
trim (
substr (
rec,
field_start_position,
field_end_position - field_start_position + 1
)
) val
from rws
join rec_formats
on record_type = substr ( rec, 1, 1 )
)
select * from vals
pivot (
max ( val ) for field_name
in (
'EMPLOYEE_ID' employee_id,
'FIRST_NAME' first_name,
'LAST_NAME' last_name
)
);
R EMPLOYEE_ID FIRST_NAME LAST_NAME
- -------------------------- -------------------------- --------------------------
1 20001 Mary
2 20003 Francis
3 20002 John Jacob