A lightning-fast response!!
Radek, August 17, 2016 - 4:00 pm UTC
Hi Chris,
Thanks & that was a fast reply!
I used the employee and other tables to keep the question simple; here is what my actual query looks like:
select /*some columns*/ from
(select a,b,c,d,e,f from p1,p2,p3 where /*conditions*/
union
select e,f,g,h,i,j from q1,q2,q3 where /*conditions*/) t1,
(select a,b,c,d,e,f from m1,m2,m3,m4 where /*conditions*/) t2,
(select a,b,c,d,e,f from n1,n2,n3 where /*conditions*/) t3
where t1.a = t2.a and t1.b = t3.b and t2.c = t3.c
t2 and t3 are ~2,000 records,
t1 has ~70 million rows,
and I have made sure that all the tables are correctly joined.
This query is used to fetch data from the past 20 years (and new data arrives every day, so the volume keeps growing).
I have checked the query plan and there is no problem with it; it is simply the huge amount of data that is causing this problem.
Say the DBA of Stack Overflow had the odd idea of checking all the questions posted in the last 5 years. How could they fetch and view all those questions quickly?
August 18, 2016 - 2:33 am UTC
For any SQL, there is a "perfect" execution, namely, the minimum amount of work that has to be done to get all the results. An "inefficient" query is one that does more work than that theoretical minimum.
The closer you can get the actual work done to the minimal work required, the better your query is.
This all sounds self-evident, but often people overlook that the minimum amount of work *might* actually be a lot. If I want to aggregate (say) the length of each comment posted on stackoverflow over past years, then I must
a) visit in some way every single comment, OR
b) change the physical implementation to mitigate that (eg store the length of each comment in a separate table as they are created).
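To make that concrete, here is a rough sketch (the COMMENTS table and its columns are made up for illustration):

-- (a) accept the minimum work: every comment has to be visited
select trunc(created,'MM')  mth,
       sum(length(body))    total_len
from   comments
group  by trunc(created,'MM');

-- (b) change the minimum work: keep the aggregate pre-computed,
--     so answering the question no longer touches every comment
create materialized view comment_len_mv
  refresh complete on demand
as
select trunc(created,'MM')  mth,
       sum(length(body))    total_len
from   comments
group  by trunc(created,'MM');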
It is *always* this choice: accept the *current* minimal amount of work required and see how to do it efficiently, OR change the minimal amount of work required.
So, returning to your original premise: if you've got 70 million records to wade through, the task is:
a) find a way of scanning those 70 million rows *faster* (eg parallel, better execution plan), OR
b) change the implementation to scan less data (eg materialized view, indexing, compression etc).
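For example (BIG_TABLE, TXN_DATE and the degree of parallelism below are made up for illustration):

-- (a) do the same scan, but faster, eg with parallel execution
select /*+ parallel(8) */ count(*)
from   big_table
where  txn_date >= add_months(sysdate, -12*5);

-- (b) reduce the data that has to be scanned, eg range-partition
--     the table by date so a query for recent years reads only
--     the partitions it needs
create table big_table_part
partition by range (txn_date)
interval (numtoyminterval(1,'YEAR'))
( partition p0 values less than (date '2000-01-01') )
as select * from big_table;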
Hope this helps.
Connor