During this post we will discuss how to identify the long running queries in your database. Let’s say in your production you have come across the situation where queries are eating up Compute resources. There are the queries which are being executed for more than 5 minutes in your production. We need to verify these queries and cancel or abort them if required. so often a slow query is the symptom of a mistake in the way the query is written.
Sometimes knowingly or unknowingly, developer has issued the wrong query by mistake. Query runs by developer forming the cartesian product however, it is the result of an erroneous join condition. An extreme case is if the join condition is missing completely. In that case, the number of rows produced will be equal to the product of the number of rows on both sides.. This results into the Long running queries. When Snowflake warehouse cannot fit an operation in memory, it starts spilling (storing) data first to the local disk of a warehouse node, If the local disk space is not suffice, the spill data is then saved to remote disks. These operations are slower than memory access and can slow down query execution a lot.
We will use the QUERY_HISTORY functions to view the query history information. The QUERY_HISTORY family of table functions can be used to query Snowflake query history along various dimensions:
select * from table(information_schema.query_history()) order by start_time;
Say we have below two tables and user has executed the below query:
INVOICE : 30 Million
CAMP : 30 Million
select * from INVOICE ,CAMP ;
Here we can see there is join missing on these tables and the query has been running since last 10 mins.
Following query has been use to identify long runner. Her we are having benchmark of 5 minutes. The Query which is older than 5 minutes will be track and will fetch the query id.
user_name, warehouse_name,start_time, datediff(minutes, start_time, current_timestamp) as total_time
from table(snowflake.information_schema.query_history(end_time_range_start=>dateadd(minutes, -5, current_timestamp())))
(EXECUTION_STATUS <> 'SUCCESS' and EXECUTION_STATUS not like 'FAILED%')
and start_time < dateadd(second, -300, current_timestamp())
order by start_time;
Later on we can cancel this query using SYSTEM$CANCEL_QUERY.