QUALIFY Clause: ROW_NUMBER is an analytic function. It assigns a unique number to each row to which it is apply (either each row in the partition or each row returned by the query), in the order sequence of rows specified in the order_by_clause , beginning with 1.
The order_by_clause is required. It specifies the order of rows in each partition or in the whole result set. The query_partition_clause is optional. It distributes the rows into partitions to which the function is applied. If you omit the query_partition_clause, the function will treat the whole result set as a single partition.
Today one of my colleague asked me to find the duplicate records from the table using Analytical function ROW_NUMBER(). At first glance it seems very straightforward and we can use the ROW_NUMBER inside the SELECT to identify he Duplicate records. But running the below query failed and returned the highlighted error.
Basically, when you use the aliasing technique, the Snowflake compiler will just copy paste the aliased expression wherever referenced. But window functions can only appear in SELECT or ORDER BY clauses, so they will error out if used anywhere else.
Though we could able to resolve this via Common Table Expression i.e. CTE.
Snowflake provides QUALIFY clause that filters the results of window functions.
In the execution order of a query, QUALIFY is therefore evaluated after window functions are computed. Typically, a SELECT statement’s clauses are evaluated in the order shown below:
- Group by
- Order by
The QUALIFY clause requires at least one window function to be specified in at least one of the following clauses of the SELECT statement:
- The SELECT column list.
- The filter predicate of the QUALIFY clause.
But is it more performant?
As such there is no Snowflake documentation on performance improvement from using Qualify. But we can say it is just a syntax change. Qualify evaluates after a WINDOW function, which means the entire table with the WINDOW function has to be create before the Qualify is perform. So basically, it’s the same thing but using one less query