Snowflake : Data Refresh Approach to Pre-Production Environment

Read Time:4 Minute, 6 Second

Before migrating to the production and performing efficient pre-production deployment activities. It Requires an environment for testing that exactly resembles a production environment. It seeks to mirror an actual production environment as closely as possible. This is where the development team can test, verify, and do things that will not break the live environment. Below are the implement strategy for data refresh by setting up of Non-Prod Snowflake Data warehouse environments from PRODUCTION.

Snowflake offers two mechanisms for seeding a preproduction environment with production data.
1. Snowflake Secure Data Sharing is use when the environments are on separate Snowflake accounts.
  1. Share Database is read only so it suits to requirement where we want the DB or tables for viewing the result, not perform any DML.
2. Zero-Copy Cloning use when the environments are on the same account.
  1. Full control over DB/tables and can perform DML on data.

Figure

Method to Seed Environment — Courtesy: Snowflake

2. Data Replication: Another feature enables replicating databases between Snowflake accounts (within the same organization) and keeping the database objects and stored data synchronized. Database replication is supported across regions and across cloud platforms Replicated database can be refreshed periodically with a snapshot of the primary database, replicating all data as well as DDL operations on database objects (i.e. schemas, tables, views, etc.).

Few challenges with this approach:

Few challenges with this approach:

The Snowflake accounts are not link by default and need to be link by creating service requests with the Snowflake support team.
Replicated databases cannot be use as it is always in READ mode, hence it is required to create CLONE REPLICATED database in environment.

3. Snowflake’s fast cloning to support an agile, light-weight DevOps process that does not require a lot of DBA management.

Example : Starting out, let’s assume you have one database in Snowflake (i.e. production by default) and it has a schema named PUBLIC. In this database you have two tables (A and B) and they gets populate daily by a load process.

Now you find you need to make changes and additions and realize you should not do those in production so now you need a development database. How do you do that in Snowflake?

You could do it the old-fashioned way and just create a new database then copy the existing tables and data over from production (using a “CREATE TABLE AS SELECT …” or CTAS operation). If they are large tables that would take some time and cost extra as it also requires more space.

However, using Snowflake’s fast clone feature you can do it way faster and not incur the cost for extra space. This gives our customers the ability to almost instantly make the data available to use for multiple user groups, without the additional cost (or time) of actually replicating the data.

Approach

4. Bottom-Up Approach: In the bottom approach, the sources feeding Production data warehouse should also feed data into acceptance or Development environment. Acceptance/Development data warehouse will not have all data available from Production in this approach. This approach is advisable for faster testing and small data warehouses.

5. No Loads Running: It is recommended to ensure the replication/Cloning done at a time when there are no data load activities going on in the Production database.

6. List development/TESTING environments needed for the migration.

7. List databases and data objects that need to be migrate.

8. List Snowflake accounts that exist or need to be created. Depends on the Business need if we want two physical separations for PROD and DEV or accommodated to one account.

9. List security roles, users and permissions. The SECURITYADMIN role should always be use when creating or modifying USER and ROLE objects.

10. Reduce Schema change frequency with Variant Datatype: New features commonly require schema changes. But schema changes can be expensive operationally as they require developers to coordinate their code changes with updates to the database schema. Using VARIANT columns in this way significantly reduces the DevOps operational burden for app developers.

setup

11. Set up Version Control: Set up Version Control with Compatible Database change management tools support Snowflake. These include Schema change, Flyway etc.

In addition, A great article has been posted by Ramesh on Data refresh setup. To get more details ,Please click here.

I have taken the data refresh reference and compiled the above points to get understanding at one place.

Conclusion:

IF there is requirement of physical separation of Production data from Development environments, it is must to have separate Snowflake accounts and use replication to copy data from Production to Dev and CREATE Clone in DEV for replicated data
Moreover, In case of single snowflake account, DATA CLONE feature can be use to create Development databases next to PRODUCTION database in same snowflake account.