Benchmark summary processing technique
Question and Answer

Chris Saxon

Thanks for the question, Sumeet.

Asked: January 14, 2020 - 8:44 pm UTC

Last updated: January 17, 2020 - 8:48 am UTC

Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production


You Asked

I have a package that is responsible for processing data from detail tables. All the query logic and other parameters used to process the data are stored in a table, which acts as a configuration table (a sample configuration table is provided at LiveSQL).

When the package executes, it will:
1. Extract the SQL string and other config parameters from the configuration table
2. Modify parameters in the SQL string
3. Run the SQL using EXECUTE IMMEDIATE
4. Log the execution and any exceptions
5. Loop back to step 1 using the next record in the config table
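In outline, the loop might look like this (a minimal sketch; table and column names such as process_config and run_log are illustrative, not taken from the actual LiveSQL example):

```sql
-- Hypothetical sketch of the config-driven loop described above.
begin
  for cfg in ( select config_id, sql_text from process_config order by config_id ) loop
    begin
      -- Steps 2 & 3: substitute parameters into the stored statement, then run it
      execute immediate replace ( cfg.sql_text, ':run_date', to_char ( sysdate, 'YYYYMMDD' ) );
      -- Step 4: log success
      insert into run_log ( config_id, status ) values ( cfg.config_id, 'OK' );
    exception
      when others then
        insert into run_log ( config_id, status, message )
        values ( cfg.config_id, 'ERROR', sqlerrm );
    end;
    commit;
  end loop;
end;
/
```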

The reason this method of storing SQL logic and other parameters inside a table was adopted was to gain easy control over incremental loading and data re-processing when data warehouse tables are out of sync with production tables.

My questions are:
1. Is this configuration-table method of data processing conventional and acceptable to use in production?
2. What are some pros and cons?
3. What would be some suggested, industry-standard methods for summary data processing? (Can I get a reference or link that demonstrates some data processing techniques?)



with LiveSQL Test Case:

and Chris said...

While I'm generally a fan of data-driven applications, for me this is a step too far. It makes it way too easy to carry out one of the most common attacks:

SQL Injection

You're executing SQL saved in a table. So anyone with write access to the table can enter their own statements!

This is bad. Really BAD.

Even if you put checks in to reduce this risk, I'm still not a fan.

I see you've got DELETEs in your example. How will you ensure these delete the correct rows?

It's too easy to accidentally or maliciously enter the wrong where clause and wipe out lots of data.
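For example, with write access to the config table, an attack needs nothing more exotic than an INSERT (a sketch, assuming a hypothetical config table named process_config with a sql_text column):

```sql
-- A single row is enough: the processing package will run this
-- verbatim on its next pass. Table and column names are illustrative.
insert into process_config ( config_id, sql_text )
values ( 999, 'delete from orders' );
commit;
```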

So I would avoid this approach.

What would be some suggestions and industry standard methods for summary data processing?

I'm not sure what you're asking here. But if you can write a SQL query that generates the summary you want, Oracle Database has a great way to optimize these:

Materialized Views

These store the result of your query. If you aggregate millions of rows down to a few (say, week/month/year totals), they can make the process much faster.
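A minimal sketch (assuming a hypothetical sales detail table with sale_date and amount columns):

```sql
-- Precompute monthly totals once; queries can then read the MV
-- (or be rewritten to use it) instead of scanning the detail rows.
create materialized view sales_monthly_mv
  build immediate
  refresh complete on demand
  enable query rewrite
as
  select trunc ( sale_date, 'MM' ) as sale_month,
         sum ( amount )            as total_amount,
         count (*)                 as sale_count
  from   sales
  group  by trunc ( sale_date, 'MM' );
```

With ENABLE QUERY REWRITE, the optimizer can transparently answer matching aggregate queries against the detail table from the materialized view instead.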



Comments

Another point of view

Manjunath, January 15, 2020 - 2:36 pm UTC

I agree that, generally, dynamic SQL is a poor design choice. But I beg to differ in this case.
It is very common practice in the industry to make changes to this configuration table in a lower environment, like SIT, have it reviewed by another dev, and then have the testing team test it. Once the testing team reports that everything is okay, deploy to the next environment, like UAT, and test again before moving to production.
So, in this case:
1. A configuration table entry will never be promoted to the next environment if there are issues with the where clause.
2. Any entry into that table undergoes a thorough review.
3. The table is not exposed outside of DB developers (via the web, for example). So the risk of SQL injection is low, unless a developer is involved, and even then there are processes in place to catch them.

In this specific setting, I think this design makes sense. It also saves the business a lot of money, makes development easier, and improves quality.
Chris Saxon
January 15, 2020 - 4:28 pm UTC

Leaving databases exposed to the internet with no authentication is "common practice":

https://www.securityweek.com/thousands-mongodb-databases-found-exposed-internet

That doesn't make it "good"!

Even with good review processes, it's easy for someone to sneak in, change some SQL in the table, download/delete some data, then switch everything back to cover their tracks.

The risks are way higher with this method compared to static SQL in your code.

If you have review processes in place, why not put the SQL in code and review that?

It also saves the business a lot of money, makes development easier, and improves quality.

Can you explain exactly where these savings will come from? And how this leads to better quality than reviewing SQL in your code?

Manjunath, January 15, 2020 - 5:33 pm UTC

Can you explain exactly where these savings will come from? And how this leads to better quality than reviewing SQL in your code?

In some specific projects where I worked, there was a generic "framework" written this way. The framework provided "unload data from table" functionality to downstream consumer applications, warehouses, data marts, etc., and "execute procedure" functionality to load data from external tables.
The framework provided:
1. Error handling
2. Header and footer verification
3. Copying files from the source server and sending them to the target server
4. Email on error
5. Logging
6. File writing, etc.

and many more useful functions. So, as a dev, I only needed to create views and add each view name and other configs to the table, like the target location on the destination server. Then the common scheduler would kick off, run all the views in the config table, write the results to files, and send them to the destination, handling any errors.
Similarly, to load a file into a table, I just had to create a procedure that repoints an external table at the file and loads it.
As a dev, I didn't have to write or reuse the code, so my workload was lower. Testers didn't have to test all the functionality provided by the framework (which would otherwise have to be tested even if I reused the code). For the organization, it meant less work and more savings. And for the architect, every batch job followed the same pattern, so there was no technical debt and delivery quality improved.
This was a win-win for everyone, with very little security risk.
Chris Saxon
January 16, 2020 - 2:19 pm UTC

I see what you're getting at. I still think you're underestimating the risks.

This site lists known SQL injection breaches:

https://codecurmudgeon.com/wp/sql-injection-hall-of-shame/

I count 37 for last year and 3 for 2020 already! There are almost certainly many more that haven't been publicly disclosed (yet).

which would otherwise have to be tested even if I reused the code

All your logging, error handling, etc. should be standard (whether from a third-party framework or homegrown). Testers only need to explicitly check this stuff if you've changed it.

The table is only used within the company's private network

Sumeet Chand, January 15, 2020 - 7:47 pm UTC

I would agree that this method has security concerns. However, this table is used solely within the private network and is accessible only through DBA schemas. What would your viewpoint be now, knowing that the DBAs are the only security threat? Is this design still good?

I understand materialized views can be used, letting Oracle manage the summaries. However, would that not affect the performance of the detail tables if, let's say, a detail table loads 80 million records per day?
Chris Saxon
January 16, 2020 - 2:20 pm UTC

That doesn't make you safe. Internal staff are often involved in data breaches.

According to Verizon's 2019 data breach report, one-third of breaches involved internal actors:

https://enterprise.verizon.com/resources/executivebriefs/2019-dbir-executive-brief.pdf
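On the follow-up about detail-table performance: a fast (incremental) refresh processes only changed rows, so the detail table pays the cost of maintaining a materialized view log rather than a full re-aggregation. A sketch, again assuming a hypothetical sales detail table (the exact MV log columns required depend on the DML mix):

```sql
-- The MV log captures changes to the detail table; a fast refresh
-- then applies only those deltas to the summary.
create materialized view log on sales
  with rowid ( sale_date, amount )
  including new values;

-- Fast-refreshable aggregate MVs need count(*) and a count()
-- alongside each sum() so deltas can be applied incrementally.
create materialized view sales_monthly_fast_mv
  refresh fast on demand
as
  select trunc ( sale_date, 'MM' ) as sale_month,
         sum ( amount )            as total_amount,
         count ( amount )          as cnt_amount,
         count (*)                 as sale_count
  from   sales
  group  by trunc ( sale_date, 'MM' );
```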

Good architecture and functional coding

Mark Wooldridge, January 16, 2020 - 4:48 pm UTC

This seems like more work than just writing parameterized procedures in PL/SQL. You have to type the code anyway, so why put it in a table when you can put it in a package, compile it, and have the syntax verified?
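That alternative might look like this (a sketch; procedure and table names are illustrative):

```sql
-- Static SQL in a stored procedure: compiled, syntax-checked at
-- creation time, and injection-proof, because p_run_date is a bind
-- variable rather than text pasted into a statement.
create or replace procedure load_daily_summary ( p_run_date date ) as
begin
  delete from daily_summary
  where  summary_date = p_run_date;

  insert into daily_summary ( summary_date, total_amount )
  select p_run_date, sum ( amount )
  from   sales
  where  sale_date >= p_run_date
  and    sale_date <  p_run_date + 1;
end load_daily_summary;
/
```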
Chris Saxon
January 17, 2020 - 8:48 am UTC

Yep, that's my view too!
