An algorithm that is so amazing and felt like a hack in Data Virtualization.
A query in the clinical trials data of Astra Zeneca was taking around 2 hours to execute and a lot of caching errors.
It was called the clinical trials hub and I was tasked to optimize it. Semi-Join, Multi Table cache refresh along with drop and recreate indexes, the execution time of the refresh was 2 Minutes.
20 seconds - Execution of Query.
The rest of time - Insertion.
I was amazed because it had been long time since I got my head to work again but something kept running in my mind.
What if I would like to build something superior than the semi-join?
I would have to keep thinking for ever.