Skip to Main Content
  • Questions
  • Database Design for Invoices Table with Many Null Columns

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Geraldo.

Asked: December 17, 2018 - 3:18 pm UTC

Last updated: December 20, 2018 - 12:26 pm UTC

Version: 18.3.0

Viewed 1000+ times

You Asked

Hello, Ask Tom Team.

I have a 70 columns table storing valid invoices. There are cases where the invoices are rejected because business rules. When the invoices are rejected just a few columns are filled (up to 5), then all 65 remaining columns are null. Later, all the invoices (valid and rejected) are going to be retrieved by users.

Should I ...

1. store rejected invoices in this table and have the select in one table

or

2. create a new one just to store the 5 needed columns, avoid nulls and use a join to retrieve the data?

Thanks in advanced.

Regards,

and Chris said...

Assuming that the rejected invoices are documents that you need to include in audits, financial reports, etc. you should have a single invoices table.

But if rejected invoices will never have a value for the 60+ columns set for accepted invoices, then I'd split the table in two. And put all the accepted only columns in this.

If you want a simple way to get all the invoices, with accepted values where appropriate, create a view. This outer joins the two tables.

For example:

create table invoices (
  invoice_id int not null primary key,
  customer_id int not null,
  ...
);

create table invoice_details (
  invoice_id int not null primary key
    references invoices ( invoice_id ),
  ...
);

create or replace view all_invoices as 
  select i.*, d.*
  from   invoices i
  join   invoice_details d
  on     i.invoice_id = d.invoice_id;


If you want to ensure that only accepted invoices have entries in the child table (over setting the child columns to not null), you could:

- Add a status column to invoices
- A unique constraint over ( invoice_id, status )
- Copy status to the child
- Add a check constraint to child.status = 'ACCEPTED'
- A foreign key on ( invoice_id, status )

e.g.:

create table invoices (
  invoice_id int not null primary key,
  customer_id int not null,
  status     varchar2(10) not null,
  unique ( invoice_id, status ) 
);

create table invoice_details (
  invoice_id int not null primary key
    references invoices ( invoice_id ),
  status     varchar2(10) not null
    check ( status = 'ACCEPTED' ),
  constraint id_accepted_invoices_fk 
    foreign key ( invoice_id, status )
    references invoices ( invoice_id, status )
);

Rating

  (10 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Response

Geraldo Peralta, December 17, 2018 - 7:51 pm UTC

Thanks for the response.

Ok.

1. So, basically, accepted invoices in one table, rejected in another one. If we want to view both invoices, we use a view joining the two tables?

2. In your example, invoice_details table is the one with rejected invoices? I did not get this part well.

Thanks in advanced.
Chris Saxon
December 18, 2018 - 12:32 pm UTC

No! All invoices in one table! The second table contains the columns that are only relevant to accepted invoices.

Response

A reader, December 18, 2018 - 12:56 pm UTC

Hmm. Ok. I get it.

Notes:

1. Developers tell me that rejected invoices does not need to be related with any other table. They just are stored to view those few columns.

2. The accepted invoices table has many children, including a detail table.

Knowing this:

1. what would I gain joining the tables?
2. what is the best final approach?


Chris Saxon
December 18, 2018 - 2:23 pm UTC

The real question is: what is the business meaning of a rejected invoice?

Is this a real invoice which doesn't meet certain requirements? Or is there some form of staging process, which only "real" invoices pass?

Ask someone in the finance department, not a developer!

As you stated users need to see both accepted and rejected, I suspect these are all "real" invoices. Creating separate tables to store different states of the same entity is a bad idea.

I also suspect that some of the other tables you mention will be children of both rejected and accepted invoices. Having two parents for one table is a Very Bad Idea.

So I propose you have a single table to store all invoices. And a separate table to store the mandatory-if-accepted, invalid-if-rejected attributes.

1. The gain by placing the columns accepted invoices is in data quality. If you make one large table for everything, it's tricky to validate that the accepted columns are only set if it's an accepted invoice.

And storing accepted and rejected invoices in separate tables will cause endless headaches if:

- you have a table which is the child of both
- you need to assign gap-free invoice numbers across accepted and rejected invoices
- you do any other operations which apply to invoices of every type

2. Best is subjective. My suggestions are based on past experiences working on invoice processing applications. You may have completely different business requirements. You need to validate against what your business needs are.

Response

A reader, December 18, 2018 - 2:57 pm UTC

Thanks for the response, Chris.

I asked the business analyst and he told me that those records would be used for:

a. Auditing.
b. Know the number of invoices received in a month.
c. Users can see their rejected invoices.

Knowing this:
1. still the same approach?
2. Regardless this conversation, is bad to have so many null columns in the db? -just for info-

Thanks in advanced.


Chris Saxon
December 18, 2018 - 6:01 pm UTC

Then yes, I'd definitely go with the approach I suggested.

The real problem is if you just have one big, fat table, then people can start entering values in the "accepted" columns for rejected invoices. Or not supplying mandatory info for accepted invoices.

You can get around this by adding check constraints. But this is a faff. Particularly with 60+ columns. And someone at some point will forget to update these constraints when adding a column.

With a separate table for the accepted columns, you can set these not null as needed. Avoiding the issues above.

Response

A reader, December 18, 2018 - 6:30 pm UTC

Excellent.

The accepted invoices will be so far more than the rejected ones and it is already created. I would use your approach buy I need to understand something. In your scripts above:

1. Do invoices tables store all invoices (ACCEPTED and REJECTED), then if a REJECTED one comes, it will be store the needed columns in invoice_details?

e.g.:

create table invoices (
invoice_id int not null primary key,
customer_id int not null,
status varchar2(10) not null,
unique ( invoice_id, status )
); -- Let's say is already created.

create table rejected_invoices (
invoice_id int not null primary key
references invoices (invoice_id),
status varchar2(10) not null
check (status = 'REJECTED'),
constraint id_accepted_invoices_fk
foreign key (invoice_id, status)
references invoices (invoice_id, status)
);

2. Is this code good?

Thanks in advanced.
Chris Saxon
December 19, 2018 - 3:27 pm UTC

As the comment below says, the tables should be the other way around.

The header table stores all the rejected and accepted invoices. The detail table only stores information about accepted invoices.

teeter-totter

Racer I., December 19, 2018 - 12:12 pm UTC

Hi,

@OP : nope the other way round but maybe you can salvage the existing table by renaming it to invoice_detail and removing the 5 mandatory columns :

rename table invoice to invoice_detail;
alter table invoice_detail set unused (c1, c2, c3, c4, c5);
alter table invoice_detail drop unused columns;

Then create invoice with the 5 mandatory columns and establish the fk constraint from invoice_detail to invoice.
To reiterate Chris' point :
- a row for all invoices in table invoice
- only rows for accepted invoices in invoice_detail

In case not all the 60 columns are filled for all accepted invoices (in my experience the rarely are) you can also look at an attribute table :

invoice_attribute (id_invoice, id_attributetype, value);

that way only existing field values will use space. Liberally define a default value (the most often occuring one) for each attribute and not store it at all. For example store 'N' for attribute "is_processed" und not store 'Y' as the majority should be processed if you keep a long history. The other way round if not.

regards,
Chris Saxon
December 19, 2018 - 3:29 pm UTC

Not convinced on the attribute table, but I see what you're getting at.

Response

A reader, December 19, 2018 - 1:42 pm UTC

Ok.

I am getting your point. Just tell me if the following notes (I did not mention in first place) do not change your advice:

1. There is a invoice table (let's say a header) with 60+ columns (unique properties of the invoice).

2. There is a invoice_detail table (detail table for all invoices).

3. There are many children tables derived from the invoice and invoice_detail tables.

4. At the beginning, rejected invoices were not be stored. Now, they want to store rejected invoices (with no more of ten columns).



Chris Saxon
December 19, 2018 - 3:35 pm UTC

1. The invoice table has the handful of that apply to both accepted and rejected invoices. This is the header. All invoices go in this.

2. This is the table with 60+ columns. Only accepted invoices go in this.

3. OK

4. You can switch the tables over as Racer described above. But you need to track down all the SQL which accesses the current table and check it still works. The longer your current process has been in place, the harder this will be.

You could swap out the table for a view which hides the join. You may need to build instead of insert trigger on it though to ensure all current DML works.

In which case to minimize risk it may be better to remove any not null constraints on the "accepted only" columns. And stick everything in one table.

Null columns

Jeff Richter, December 19, 2018 - 3:48 pm UTC

Null columns where there are no following columns containing Non-null values require no storage space. So strategically ordering your columns in a situation like this can make a huge difference in how much space your tables will consume.
Chris Saxon
December 19, 2018 - 4:19 pm UTC

True, though you're only talking 1 byte/null column that has a non-null after it. So you're only going to save huge amounts of space in extreme cases.

Response

Geraldo Peralta, December 19, 2018 - 5:16 pm UTC

Hmmmm. Maybe I am not explaining you very well.

The app has a header table called headers (with 60+ columns) that is parent of a detail table called product_services_details (with 20 columns and many children around it).

create table headers (
column 1 not null primary key,
...
column 65
)

create table product_services_details (
column 1 not null primary key,
...
column 20
)
Until this point, the system is working and rejected invoices are not in the database.

Requirement: we want rejected invoices to be stored.

Question: how do we put rejected invoices in this design, knowing that rejected invoices do not go to product_services_details table and that its few columns (up to 10 maybe) are in headers table? I'm confused when you say invoice_detail because we already have one table with that info (product_services_details).

1. Can you tell me how would it be?

2. About the null columns where there are no following columns containing Non-null, does it means when I create the table I have to alternate the order of null and non-null columns? Like:

create table some_table (
column1 int not null primary key,
column2 varchar2(10),
column3 int not null,
column4 int,
column4 varchar(10) not null,
column5 int,
column6 varchar(10) not null,
column7 varchar(10),
column8 int not null,
column9 int,
column10 int,
column11 int,
column12 int,
...
column 65)

Sorry for asking so much...

Regards,
Chris Saxon
December 20, 2018 - 12:25 pm UTC

I think you're getting bogged down in your current naming scheme.

My proposal is your headers table now becomes two tables:

(invoice)_headers - this stores only the columns relevant to accepted & rejected invoices - ~10 columns
(invoice)_header_details - this holds the columns only relevant to accepted invoices - the other ~50 columns currently on headers.

The table product_services_details remains as-is. You could make this a child of header_details if you want to ensure this only stores rows linked to accepted invoices.

But if your database has been around for a while, this table split adds risk. You can minimize short-term risk by making any columns mandatory for accepted-only invoices nullable. Then stick rejected invoices in headers too. Doing this brings longer-term risks to data quality.

About the null columns where there are no following columns containing Non-null

Null columns at the end of the row (i.e. those with no non-null columns after them in that row) consume no space. So placing all the mandatory columns first in the table, like so:

create table some_table (
column1 int not null primary key,
nn_column2 varchar2(10) not null,
nn_column3 int not null,
nn_column4 int not null,
nn_column4 varchar(10) not null,
nullable_column1 int,
nullable_column2 int,
nullable_column3 int,
nullable_column4 int,
...
nullable_column65)


optimizes your storage. But usually you're talking about small gains. And adding a new mandatory column to the end of the table undoes your optimization.

So this is final-leg tuning. I wouldn't bother re-ordering an existing table's columns to do this except in extreme circumstances.

the name game

Racer I., December 20, 2018 - 6:53 am UTC

Hi,

@Geraldo : well the tables are already named differently :

header : all invoices (only the 5 mandatory columns)

invoice_details : 65 columns with infos for accepted invoices (one row per accepted invoice in header, no row per rejected invoice, i.e. 1:0-1)

product_services_details : positions within an invoice (several rows per invoice in header, no rows for rejected invoices, i.e. 1:0-X *)

You only specified that you store invoices so we went with invoice as the name of the header table and invoice_details for the 65 columns. You could now call this new table "header_details" or maybe "header_attributes" (without making it an 1:X attribute table) to distinguish it from your existing positions table (product_services_details).

rename table header to header_details;
alter table header_details set unused (c1, c2, c3, c4, c5);
alter table header_details drop unused columns;
create table header (only the 5 mandatory columns)
create fk from header_details to header
re-create fk from product_services_details to header

*) are you even sure of this? If you need to store/display rejected invoices maybe their positions are of interest too, if you have them?

regards,

Racer I., December 20, 2018 - 7:12 am UTC

Hi,

regarding nulls : it used to be more interesting when storage space (disk and sga) and io-throughput was more precious. More marginal now, but the ordering could be automated by analyzing the table and recreating it (CTAS) using

order by user_tab_columns.num_nulls

The result would still only be probability based.

A 1:X attribute table for sparsely filled attributes with good omitted defaults could beat this. It's quite flexible too. But more complex to handle in code.

regards,
Chris Saxon
December 20, 2018 - 12:26 pm UTC

Indeed.

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.