3 0
Read Time:2 Minute, 31 Second

XML Load: Traditional database architectures were designed to store and process data in strictly relational rows and columns. With structured data, every column is assigned a specific datatype (e.g., date, string, number, timestamp, etc.). But a significant share of data today is machine-generated and delivered in semi structured data formats such as JSON, Avro, and XML.

Storing the semi-structured data in a relational database  require to compromise the flexibility or performance. However, Snowflake starts by making it possible to flexibly store semi-structured records inside a relational table in native form. Therefore, This is accomplish through a custom datatype (Snowflake’s VARIANT datatype) that allows schema-less storage of hierarchical data, including JSON, Avro, XML and Parquet. This makes it possible to load semi-structured data directly into Snowflake without pre-processing, losing information, or defining a schema. A variant column can store an entire semi-structured document, in its native shape, with zero transformation required. A simple dot notation syntax allows SQL access to any value, anywhere in the document.

During this post we will talk about the XML data processing inside the snowflake.

XML Operators and Functions:

XML Operators and Functions:

$ : The dollar sign operator returns the contents, as a VARIANT, of the value it operates on. In other words, The contents of the VARIANT field.

@: The name of the “root” element of the XML contained in the VARIANT field.

XMLGET : Extracts an XML element object (often referred to as simply a “tag”) from a content of outer XML element object by the name of the tag and its instance number (counting from 0):

  • If any argument of XMLGET is NULL, the result is NULL.
  • If the tag instance is not found, the result is similarly NULL

Sample XML:

XML File

Loading XML into Snowflake:

Load XML

Devise a query to extract the Root element as well as root content from the XML file.

Will use :

@ : to extract root element

$ : to extract root content

We will also use LATERAL FLATTEN takes an XML input and explode the list of sub-elements into its own (denormalized) row.


select
val:"@", --RETURNS ROOT ELEMENT
val:"$", --RETURNS CONTENT OF ROOT ELEMENT
xml_doc_val.index,
xml_doc_val.value,
XMLGET( xml_doc_val.value, 'jurisdiction_name'):"$" as "Jurisdiction_Name",
XMLGET( xml_doc_val.value, 'count_participants'):"$" as "Count_Participants",
XMLGET( xml_doc_val.value, 'count_female' ):"$"as "Count_Female",
COALESCE( XMLGET( xml_doc_val.VALUE, 'count_american_indian' ):"$"::STRING,
XMLGET( xml_doc_val.VALUE, 'count_asian_non_hispanic' ):"$"::STRING ) AS count_american_or_asian
from XML_TABLE,
LATERAL FLATTEN(to_array(XML_TABLE.val:"$" )) xml_doc, --n this query,
LATERAL FLATTEN(to_array(xml_doc.value:"$" )) xml_doc_val;

XML Output

 

To get more details on XML attributes, refer the following blog:

https://snowflakecommunity.force.com/s/article/introduction-to-loading-and-parsing-xml-data-using-sql

Average Rating

5 Star
0%
4 Star
0%
3 Star
0%
2 Star
0%
1 Star
0%

Leave a Reply

Your email address will not be published. Required fields are marked *