Read_Sql() and Fetch_Pandas(): During this post we will discuss different ways to get data from Snowflake to Python. Also we will try to compare the performance in terms of time to fetch data from Snowflake. For this discussion we have considered below three functions and measure the time of each module in querying data from snowflake table. Also please note, we have done the testing on snowflake table holding approx. 6M records.
- read_sql()
- fetch_pandas_all()
- fetch_pandas_batches()
- read_sql is a built-in function in the Pandas package that returns a data frame corresponding to the result set in the query string.
Snowflake introduced a much faster method for SELECT operation, fetch_pandas_all(), and fetch_pandas_batches() which leverages Arrow. With adoption of Apache Arrow, can fetch result sets much faster while conserving memory and CPU resources. with Pandas Dataframe, the performance is even better with the introduction of our new Python APIs, which download result sets directly into a Pandas Dataframe
Note: To load data from fetch_pandas_all(), need to have snowflake-connector-python[pandas] install.
2. fetch_pandas_all(): This method fetches all the rows in a cursor and loads them into a Pandas Dataframe.
3. fetch_pandas_batches():Finally, This method fetches a subset of the rows in a cursor and delivers them to a Pandas DataFrame.
Note: However, Though the timing and behavior can be different but tries to share my observations in conjunction with these three functions.