In this post, we are going to discuss Python User-Defined Functions (UDFs) within Snowflake. While we don’t have a specific use case in mind, this exploration is driven by the sheer potential these UDFs offer. Especially if we talk about the seamless integration of Python within Snowsight, this integration simplifies the coding experience. It helps in allowing user to dynamically switch between SQL, scripting, JavaScript, and Python, without needing physical installations on our local machines. The flexibility offered by Snowflake’s Python UDFs opens up a world of possibilities, making complex tasks more accessible and enhancing the overall user experience.
In this discussion, our focus lies in creating a User-Defined Function (UDF) designed to calculate the word count within a provided text. The UDF will receive an input string, tally the words within the text, and furnish an array detailing the word counts. To enhance the practicality, we will extend this functionality to work with Snowflake table columns as inputs, generating an array that represents the word counts within the specified column.
Furthermore, we will delve into an additional use case. Here, the UDF will not only take the input text but also an array of specific words. It will then search for occurrences of these words within the input text, providing an output indicating whether the words were found within the respective table column or not.
Technical Implementation:
Use Case1: Word count. Below Python UDF will count the number of words to the input string passed to the UDF.
Call the UDF:
SELECT CC_CALL_CENTER_SK,CC_REC_START_DATE,CC_REC_END_DATE,CC_NAME,CC_MKT_DESC AS INPUT,
COUNT_WORDS(CC_MKT_DESC) as word_counts from SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.call_center;
Output:
Use Case2: Python UDF to search the array of keywords in Input String.
First we will create a table to hold the keywords which will be Search against the Input String
CREATE or replace TABLE search_keyword_tbl (search_text VARCHAR(50));
INSERT INTO search_keyword_tbl (search_text)
VALUES ('Shared'), ('largely'), ('Rich'), ('Reduced'), ('Blue');
Call the UDF:
SELECT
CC_CALL_CENTER_SK,CC_MKT_DESC AS INPUT,
(SELECT ARRAY_AGG(search_text) from search_keyword_tbl) as Keywords_to_find,
WORD_CNT_SELCETIVE_FIND(CC_MKT_DESC, (SELECT ARRAY_AGG(search_text) from search_keyword_tbl))
from SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.call_center;
Output: