During this post we will discuss how we can compare two tables belongs to two different databases on a snowflake account. In current scenario we are getting data from the Oracle database to snowflake. As per the business planning Oracle DB is going to be decommissioned at later stage. Source team will feed the files directly to the external staging area. Currently we are using Matillion Job which connects to the Oracle and extract the data and ingesting into the Snowflake. Down the line process Oracle and Matillion job will no longer exist and will get the data from Feed files only.
And, before the decommission Business wants to ensure that current process i.e., Matillion job and new process (Feed files) are possessing the same logic. The Feed files generating by the source team should have all the data and there should not be any discrepancy among data. Hence for the data validation business has suggested the following Data Comparison approach.
Approach:
- Run the regular Matillion JOB in our ongoing DB to load CUSTOMER table.
- Create a new DB and create CUSTOMER_COPY table with DDL of CUSTOMER table.
- Business will place file on external staging area and run the COPY command to load data into new CUSTOMER_COPY Table.
- Compare the data for every column in both tables against the Primary key i.e., Invoice Number.
- Save the discrepancy record in CUSTOMER_MISMATCH table along with discrepancy column details
To achieve the same, I have written the following SQL query. The query takes the data from both tables and perform the Join on Invoice number. In case of there is any mismatch among the columns we are storing that particular record in our Customer Mismatch logging table. Also it can be the possible that multiple columns have data discrepancy for same record . As part of logging we are also concatenating all these columns in our error logging table.
We have used the below query to find any discrepancy in data.
Once we execute the Query it inserts the mismatch records in CUSTOMER_MISMATCH table.
Data Comparison with detail information: