An algorithm that is so amazing and felt like a hack in Data Virtualization.
A query in the clinical trials data of Astra Zeneca was taking around 2 hours to execute and a lot of caching errors.
It was called the clinical trials hub and I was tasked to optimize it. Semi-Join, Multi Table cache refresh along with drop and recreate indexes, the execution time of the refresh was 2 Minutes.
20 seconds - Execution of Query.
The rest of time - Insertion.
I was amazed because it had been long time since I got my head to work again but something kept running in my mind.
What if I would like to build something superior than the semi-join?
I would have to keep thinking for ever.
When there are two tables that are to be joined from Two Different Databases (Different DB Servers), SEMI-JOIN will convert the parent query in such a way the IN CLAUSE is re-written and the FETCH happens from a Single DataBase Server. I do not need to explain more as a person familiar with SQL will understand how this JOIN Algorithm works.