Converting a plain text data file into JSON using the formats defined in the database and inserting to multiple columns in a table

Question and Answer

Connor McDonald

Thanks for the question, Rahna.

Asked: March 04, 2022 - 6:48 am UTC

Last updated: March 14, 2022 - 3:43 am UTC

Version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0

You Asked

Use case

We have a program that imports data into one of our application's tables from a plain text data file received from the customer's bank. The text file can use different formats depending on the bank, and each line in the file can represent a different type of data depending on its first character.

Currently we use SQL*Loader to load the data from the text file into the table.

The drawback of this approach is that we need to add a new control file to the application whenever the file format changes.

To explain things using a simpler EMP example:

Say I have the EMP table defined as

Create table EMP
(
   employee_id  number(8),
   first_name   varchar2(15),
   last_name    varchar2(15),
   status       varchar2(30),
   record_type  varchar2(1)
);


and the data file to load into this table is as below:

120001Mary
220003Francis
320002John           Jacob



The control file to load this data looks like this:


LOAD DATA
APPEND

-- Type 1 - Records with only first name
INTO TABLE EMP
WHEN RECORD_TYPE = '1'
(
  STATUS        CONSTANT 'NEW_RECORD',
  RECORD_TYPE   POSITION(01:01) CHAR,
  EMPLOYEE_ID   POSITION(02:06) CHAR,
  FIRST_NAME    POSITION(07:21) CHAR
)

-- Type 2 - Records with only last name
INTO TABLE EMP
WHEN RECORD_TYPE = '2'
(
  STATUS        CONSTANT 'NEW_RECORD',
  RECORD_TYPE   POSITION(01:01) CHAR,
  EMPLOYEE_ID   POSITION(02:06) CHAR,
  LAST_NAME     POSITION(07:21) CHAR
)

-- Type 3 - Records with both first and last names
INTO TABLE EMP
WHEN RECORD_TYPE = '3'
(
  STATUS        CONSTANT 'NEW_RECORD',
  RECORD_TYPE   POSITION(01:01) CHAR,
  EMPLOYEE_ID   POSITION(02:06) CHAR,
  FIRST_NAME    POSITION(07:21) CHAR,
  LAST_NAME     POSITION(22:36) CHAR
)



Requirement:

We are investigating converting the input text file into JSON, driven by a setup the user can maintain, so that the dependency on the control file is eliminated.

So, in the above example, the setup tables where the user can define a new format would look like this:

Create table EMP_FORMATS
(
   FORMAT_ID   NUMBER(8),
   FORMAT_NAME  VARCHAR2(60)
);

Create table EMP_RECORD_FORMATS
(
   RECORD_TYPE_ID   NUMBER(8),
   FORMAT_ID   NUMBER(8),
   RECORD_TYPE      VARCHAR2(1),
   DESCRIPTION      VARCHAR2(100)
);

Create table EMP_FIELD_FORMATS
(
   FIELD_FORMAT_ID         NUMBER(8),
   RECORD_TYPE_ID          NUMBER(8),
   FIELD_START_POSITION    NUMBER,
   FIELD_END_POSITION      NUMBER,
   FIELD_NAME              VARCHAR2(20)
);


And the setup corresponding to the above control file would be:


insert into emp_formats values (30001,'Simple Format');

insert into emp_record_formats values (40001,30001,'1','Records with only First Name');
insert into emp_record_formats values (40002,30001,'2','Records with only Last Name');
insert into emp_record_formats values (40003,30001,'3','Records with both First and Last Names');

insert into emp_field_formats values (50001,40001,1,1,'RECORD_TYPE');
insert into emp_field_formats values (50002,40001,2,6,'EMPLOYEE_ID');
insert into emp_field_formats values (50003,40001,7,21,'FIRST_NAME');

insert into emp_field_formats values (50004,40002,1,1,'RECORD_TYPE');
insert into emp_field_formats values (50005,40002,2,6,'EMPLOYEE_ID');
insert into emp_field_formats values (50006,40002,7,21,'LAST_NAME');

insert into emp_field_formats values (50007,40003,1,1,'RECORD_TYPE');
insert into emp_field_formats values (50008,40003,2,6,'EMPLOYEE_ID');
insert into emp_field_formats values (50009,40003,7,21,'FIRST_NAME');
insert into emp_field_formats values (50010,40003,22,36,'LAST_NAME');

select rf.record_type, ff.field_start_position, ff.field_end_position,ff.field_name
from emp_formats f,
    emp_record_formats rf,
    emp_field_formats ff
where f.format_name = 'Simple Format'
and f.format_id = rf.format_id
and ff.record_type_id = rf.record_type_id
order by record_type, field_start_position;

RECORD_TYPE   FIELD_START_POSITION   FIELD_END_POSITION   FIELD_NAME
-----------   --------------------   ------------------   -----------
1             1                      1                    RECORD_TYPE
1             2                      6                    EMPLOYEE_ID
1             7                      21                   FIRST_NAME
2             1                      1                    RECORD_TYPE
2             2                      6                    EMPLOYEE_ID
2             7                      21                   LAST_NAME
3             1                      1                    RECORD_TYPE
3             2                      6                    EMPLOYEE_ID
3             7                      21                   FIRST_NAME
3             22                     36                   LAST_NAME


Please guide us on:

1. How can I read the formats defined in these tables and transform an input text file into JSON? (We currently copy the input text file into a BLOB column in a database table before it is processed using the control file.)

2. Once such a JSON document is created, what is the best way to insert the data into the EMP table?

Any sample code is appreciated.

Thanks

and Chris said...

1. You can use JSON_objectagg to combine attribute/value pairs into one document. You can define the attribute names dynamically by selecting them from a column.

So you could do something like this:

with rws as (
  select '120001Mary' rec from dual union all
  select '220003Francis' rec from dual union all
  select '320002John           Jacob' rec from dual
), rec_formats as (
  select rf.record_type, ff.field_start_position, ff.field_end_position,ff.field_name
  from   emp_formats f,
         emp_record_formats rf,
         emp_field_formats ff
  where f.format_name = 'Simple Format'
  and f.format_id = rf.format_id
  and ff.record_type_id = rf.record_type_id
)
  select record_type,
         json_objectagg ( 
           field_name value 
             trim ( 
               substr ( 
                 rec, 
                 field_start_position, 
                 field_end_position - field_start_position + 1 
               ) 
             ) 
             absent on null
         ) jdoc
  from   rws
  join   rec_formats 
  on     record_type = substr ( rec, 1, 1 )
  group  by record_type;
  
R JDOC
- ------------------------------------------------------------------------------
1 {"RECORD_TYPE":"1","FIRST_NAME":"Mary","EMPLOYEE_ID":"20001"}  
2 {"RECORD_TYPE":"2","LAST_NAME":"Francis","EMPLOYEE_ID":"20003"}
3 {"RECORD_TYPE":"3","LAST_NAME":"Jacob","FIRST_NAME":"John","EMPLOYEE_ID":"20002"}   

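The rws CTE above just mocks up your three sample lines. Since you said the file is first staged in a table, one option is to split the staged document into lines and use that in place of rws. This is only a sketch - it assumes a hypothetical staging table stage_files with the text available in a CLOB column file_content (for a BLOB you would convert it first, e.g. with DBMS_LOB.CONVERTTOCLOB), processing one file at a time:

-- hypothetical staging table: stage_files ( file_id number, file_content clob )
with f as (
  select file_content
  from   stage_files
  where  file_id = :file_id
), rws as (
  -- one row per line; strip any trailing carriage return
  select rtrim (
           dbms_lob.substr (
             regexp_substr ( file_content, '[^' || chr(10) || ']+', 1, level ),
             4000, 1
           ),
           chr(13)
         ) rec
  from   f
  connect by level <= regexp_count ( file_content, chr(10) ) + 1
)
select rec
from   rws
where  rec is not null;

The rest of the query stays the same - just swap this in for the mocked-up rws.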

If you create external table(s), you could even create the JSON documents directly from the files rather than loading them in first.
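For example, a minimal sketch of such an external table - the directory object data_dir and file name emp_data.txt are placeholders, not anything from your setup:

-- with no FIELDS clause, the single column REC picks up each line
-- (the default field length of 255 and default comma delimiter are fine for
--  this fixed-width sample data; adjust the access parameters for real files)
create table emp_data_ext (
  rec varchar2(4000)
)
organization external (
  type oracle_loader
  default directory data_dir
  access parameters (
    records delimited by newline
    nobadfile
    nologfile
  )
  location ( 'emp_data.txt' )
)
reject limit unlimited;

You could then use select rec from emp_data_ext in place of the rws CTE.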

2. You can use JSON_table to convert JSON documents to relational rows and columns. So you can insert the result of this:

insert into emp ( EMPLOYEE_ID, FIRST_NAME, LAST_NAME )
with rws as (
  select '120001Mary' rec from dual union all
  select '220003Francis' rec from dual union all
  select '320002John           Jacob' rec from dual
), rec_formats as (
  select rf.record_type, ff.field_start_position, ff.field_end_position,ff.field_name
  from   emp_formats f,
         emp_record_formats rf,
         emp_field_formats ff
  where f.format_name = 'Simple Format'
  and f.format_id = rf.format_id
  and ff.record_type_id = rf.record_type_id
), jdoc as (
  select record_type,
         json_objectagg ( 
           field_name value 
             trim ( 
               substr ( 
                 rec, 
                 field_start_position, 
                 field_end_position - field_start_position + 1 
               ) 
             ) 
             absent on null
         ) jdoc
  from   rws
  join   rec_formats 
  on     record_type = substr ( rec, 1, 1 )
  group  by record_type
)
  select j.* 
  from   jdoc, json_table (
    jdoc
    columns ( 
      EMPLOYEE_ID, FIRST_NAME, LAST_NAME
    )
  ) j;


That said, I'm not sure what you're hoping to gain by doing this JSON conversion. At some point you need to know what the attributes and columns are so you can load them into the target table.

You can save yourself a conversion to & from JSON with something like:

with rws as (
  select '120001Mary' rec from dual union all
  select '220003Francis' rec from dual union all
  select '320002John           Jacob' rec from dual
), rec_formats as (
  select rf.record_type, ff.field_start_position, ff.field_end_position,ff.field_name
  from   emp_formats f,
         emp_record_formats rf,
         emp_field_formats ff
  where f.format_name = 'Simple Format'
  and f.format_id = rf.format_id
  and ff.record_type_id = rf.record_type_id
), vals as (
  select record_type, field_name,
         trim ( 
           substr ( 
             rec, 
             field_start_position, 
             field_end_position - field_start_position + 1 
           ) 
         ) val
  from   rws
  join   rec_formats 
  on     record_type = substr ( rec, 1, 1 )
)
  select * from vals
  pivot ( 
    max ( val ) for field_name
    in ( 
      'EMPLOYEE_ID' employee_id, 
      'FIRST_NAME' first_name, 
      'LAST_NAME' last_name
    )
  );
  
R EMPLOYEE_ID                FIRST_NAME                 LAST_NAME                 
- -------------------------- -------------------------- --------------------------
1 20001                      Mary                                                 
2 20003                                                 Francis                   
3 20002                      John                       Jacob

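To complete the load this way, here is a sketch of wrapping the pivot in an insert against your EMP table, also supplying the STATUS constant and record type that the control file populated:

insert into emp ( employee_id, first_name, last_name, status, record_type )
with rws as (
  select '120001Mary' rec from dual union all
  select '220003Francis' rec from dual union all
  select '320002John           Jacob' rec from dual
), rec_formats as (
  select rf.record_type, ff.field_start_position, ff.field_end_position, ff.field_name
  from   emp_formats f,
         emp_record_formats rf,
         emp_field_formats ff
  where  f.format_name = 'Simple Format'
  and    f.format_id = rf.format_id
  and    ff.record_type_id = rf.record_type_id
), vals as (
  select record_type, field_name,
         trim (
           substr (
             rec,
             field_start_position,
             field_end_position - field_start_position + 1
           )
         ) val
  from   rws
  join   rec_formats
  on     record_type = substr ( rec, 1, 1 )
)
select employee_id, first_name, last_name,
       'NEW_RECORD' status,   -- same constant the control file supplied
       record_type
from   vals
pivot (
  max ( val ) for field_name
  in (
    'EMPLOYEE_ID' employee_id,
    'FIRST_NAME'  first_name,
    'LAST_NAME'   last_name
  )
);

As before, replace the mocked-up rws with however you read the real file.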

Comments

Thank you Christ

Rahna Kader, March 09, 2022 - 4:04 pm UTC

Really appreciate the quick response Christ. Will try this out.
Chris Saxon
March 11, 2022 - 3:53 pm UTC

You're welcome, though you've elevated me somewhat there - I'm just Chris ;)

Rahna Kader, March 11, 2022 - 4:12 pm UTC

Sorry, I didn't notice :-)
Connor McDonald
March 14, 2022 - 3:43 am UTC

Truly a blessed response :-)
