Friday, January 3, 2025

OpenAI of Databases: Memory is cheap, Processing is Costly...

           As organizations gear up to implement Data Governance, Master Data Management and Meta Data Management Databases should AI themselves. Memory is cheap so think of every SQL Query being tracked in a central repository and so the unique row identifier of the SQL Query (DML and DQL) is stored in-memory. Irrespective of the number of joins, number of where conditions, analytic functions and so on.

Kind of Confusing ain't it. Let me explain. Think of this SQL Query. 

SELECT department_id, last_name, salary, RANK() OVER (PARTITION BY department_id ORDER BY salary) RANK FROM employees WHERE department_id = 60 ORDER BY RANK, last_name; 

              Obviously traditional databases would suggest to bitmap index deptartment_id column looking at cardinality for faster access. Indexing is nothing but storing the exact location of the row in memory (Data Object, Data Block, Data File and so on). Online Indexes are rebuilt online while the DDL is performed without the need for a DBA to rebuild indexes. 

        Now think of this. If the above SQL's result set originating rowid's (Employees Table Department No 60 as well as RANK function) is pre-populated in memory all the SQL Engine has to do is deal with rowid's for future instances of the same SQL Query. 

             In case of the above query when there is an update of an employee's department number all the SQL Engine has to do is update the SQL's result set rowid's (DML of rowid's for all SQL Query References that are stored in memory). Future instances of the same SQL Query will be rowid's processsing from memory. This will pave way for Real-Time ETL as well. Most importantly, OLTP to OLAP and OLTP DML resulting in rowid updates in memory that makes OLAP Querying Real-Time at the instant updates that flow from the OLTP System.

            Think of Mergers, Acquisitions and the amount of efforts spent in integration. Also think of Hyperlocal delivery where one warehouse can willfully work with multiple partners.


Until next time OpenAI of Databases: Memory is cheap, Processing is Costly...

Sunday, September 15, 2024

Query Clause Indexing

Are the World Governments (Politicians) not spending money and borrowing money because of Global Indian Computing Standards?

Imagine being able to index any CLAUSE of the SQL Query...

I am not able to patent this as I have no Time and Money and on top of that no reply from Indian Patent Officers. 

The Query Clause Indexing can UNITE the WORLD for a Cause. 

Real-Time ETL

Blockbuster of Sorts in the making by INDIA...

FIRST MOVERS ADVANTAGE TOWARDS THE KOLAGATLA'S CAN CONTINUE TO SELL AND BUY IN THE USA DOLLARS PER SE...

Tuesday, May 23, 2023

SEMI-JOIN A Beautifully Architected Algorithm by TIBCO.

 An algorithm that is so amazing and felt like a hack in Data Virtualization.


A query in the clinical trials data of Astra Zeneca was taking around 2 hours to execute and a lot of caching errors.

It was called the clinical trials hub and I was tasked to optimize it. Semi-Join, Multi Table cache refresh along with drop and recreate indexes, the execution time of the refresh was 2 Minutes.

20 seconds - Execution of Query.
The rest of time - Insertion.

I was amazed because it had been long time since I got my head to work again but something kept running in my mind.

What if I would like to build something superior than the semi-join?

I would have to keep thinking for ever.

When there are two tables that are to be joined from Two Different Databases (Different DB Servers), SEMI-JOIN will convert the parent query in such a way the IN CLAUSE is re-written and the FETCH happens from a Single DataBase Server. I do not need to explain more as a person familiar with SQL will understand how this JOIN Algorithm works. 

Wednesday, May 18, 2016

Connecting to HIVE (v0.9) from CIS v6.2.6

Hi,

On popular demand I am writing this post on how to connect to HIVE from Cisco Information Server.

We are going to see how to connect to HIVE v0.9 pre-configured on the cloudera vm (v4.21). I know it is an old version but it does the job. I assume you have gone through the articles in this blog and you have an initial understanding of the CIS workflow (Introspect, Model, Publish).

Also, creation of a database/table and insertion of data into the HIVE tables is not a part of this post.

I have created a database called "realty" and also created a table called "listings". Loaded data into the listings table with 6 records. Issued the select * from listings table from the HIVE prompt and could retrieve back the results. Screenshot below:



Now the next part is to start the HIVE thrift server by issuing the following command from the shell prompt:


NOTE: After the Hive Thrift Server is started you should not close the shell window on which the Thrift Server was started. When you interact with the Hive server from CIS you will get messages from the prompt about the ongoing events (Clients to Hive Thrift Server).

Now we can move to CIS to configure the data source and test the querying.

Create a datasource in CIS, selecting the (Hive 0.9.0) as the datasource adapter:



Click Next and provide the server IP Address of the Cloudera VM and give the port as 10000. Change the database name to your HIVE Metastore database (realty in my case) and hive Create and Introspect.



We can see the listings table and can add it to the CIS workspace.



Now the final step is to query the listings table in CIS:
You can do a right click on listings and hit show contents:
it will display the 6 rows in the table as below:



Now this table can be queried and views/web services can be created on top of this introspected listings table.

NOTE: If you get this error message while selecting the HIVE datasource in CIS: Time to approach the administration group of the CIS server.


The admin will configure the HIVE jdbc adapter by copying the adapter files from hadoop to the CIS server.