
Snowflake most attractive features is its native support for semi-structured data.
Therefore, with Snowflake no need to distinguish to implement multiple or separate systems to process structured and semi-structured data. However, Process for Loading the CSV or JSON file is identical and smooth. STRIP_OUTER_ARRAY is one of the key File format property will be discussing in this post.
Most of the issues appear while loading JSON is either we are not aware of the JSON file structure, or not leveraging the parameters providing by Snowflake to ingest the Semi-Structured data.
STRIP_OUTER_ARRAY:
Today we are going to talk about one of the Snowflake important property i.e. STRIP_OUTER_ARRAY, Removes the outer set of square brackets [ ] when loading the data, separating the initial array into multiple lines. If we did not strip the outer array, our entire dataset would be loaded into a single row in the destination table.
Enough theory, Now Get your hands dirty with below scenario:
- Case 1: File doesn’t have an outer array. It has a single outer object containing a property with an inner array.
- Case 2: File has an outer array. STRIP_OUTER_ARRAY =TRUE set in FILE_FORMAT.
- Case 3:File has an outer array. STRIP_OUTER_ARRAY =TRUE removed from FILE_FORMAT.
- Case 4:Recompose the JSON file after reading line by line. SELECT $1 <<tag>> FROM @STAGE/<<File>>.json
- Case 5: Big File say 1 GB : STRIP_OUTER_ARRAY =TRUE set, still getting “Error parsing JSON: document is too large.
Lets get discuss each Case:
Case 1: No outer Array in File. It has a single outer object containing a property with an inner array:
## File Format
create or replace file format demo_db.public.json_format
type = 'json'
STRIP_OUTER_ARRAY = TRUE;
## Stage Create
create or replace stage demo_db.public.ext_json_stage
URL = 's3://testsnowflakepro/snowflake/json/json1/'
STORAGE_INTEGRATION = s3_int
file_format = demo_db.public.json_format;
create or replace table client (json_data variant not null);
copy into client
 from @ext_json_stage/Client_Json.json
{
	"Client": [
		{
			"ClientNo": 1,
			"ClientName": "Alpha",
			"ClientBusiness": [
				{
					"BusinessNo": 1,
					"IndustryCode": "12345"
				},
				{
					"BusinessNo": 2,
					"IndustryCode": "23456"
				}
			]
		},
		{
			"ClientNo": 2,
			"ClientName": "Bravo",
			"ClientBusiness": [
				{
					"BusinessNo": 1,
					"IndustryCode": "34567"
				},
				{
					"BusinessNo": 2,
					"IndustryCode": "45678"
				}
			]
		}
	]
}
Even the file have two separate records all the records got loaded into the table as one record, even the STRIP_OUTER_ARRAY = TRUE was set because File does not have outer Array.
Case 2: File has an outer array. STRIP_OUTER_ARRAY =TRUE set in FILE_FORMAT: File contains two different records and because of the the STRIP_OUTER_ARRAY = TRUE was set, Record were loaded into the table as two separate rows.
[
    {
        "timestamp": 1565001986254,
        "book": [
            {
                "symbol": "XBTUSD",
                "id": 8798792450,
                "side": "Sell",
                "size": 21580,
                "price": 12075.5
            },
            {
                "symbol": "XBTUSD",
                "id": 8798792500,
                "side": "Sell",
                "size": 827344,
                "price": 12075
            },
            {
                "symbol": "XBTUSD",
                "id": 8798792550,
                "side": "Sell",
                "size": 1000228,
                "price": 12074.5
            },
            {
                "symbol": "XBTUSD",
                "id": 8798792600,
                "side": "Sell",
                "size": 36873,
                "price": 12074
            },
            {
                "symbol": "XBTUSD",
                "id": 8798895100,
                "side": "Buy",
                "size": 33760,
                "price": 11049
            },
            {
                "symbol": "XBTUSD",
                "id": 8798895200,
                "side": "Buy",
                "size": 3069,
                "price": 11048
            },
            {
                "symbol": "XBTUSD",
                "id": 8798895250,
                "side": "Buy",
                "size": 2000,
                "price": 11047.5
            },
            {
                "symbol": "XBTUSD",
                "id": 8798895300,
                "side": "Buy",
                "size": 64454,
                "price": 11047
            },
            {
                "symbol": "XBTUSD",
                "id": 8798895350,
                "side": "Buy",
                "size": 18854,
                "price": 11046.5
            },
            {
                "symbol": "XBTUSD",
                "id": 8798895400,
                "side": "Buy",
                "size": 689,
                "price": 11046
            },
            {
                "symbol": "XBTUSD",
                "id": 8798895450,
                "side": "Buy",
                "size": 6550,
                "price": 11045.5
            },
            {
                "symbol": "XBTUSD",
                "id": 8798895500,
                "side": "Buy",
                "size": 20650,
                "price": 11045
            },
            {
                "symbol": "XBTUSD",
                "id": 8798895550,
                "side": "Buy",
                "size": 17118,
                "price": 11044.5
            }
        ]
    },
    {
        "timestamp": 1565001986275,
        "book": [
            {
                "symbol": "XBTUSD",
                "id": 8798792450,
                "side": "Sell",
                "size": 21580,
                "price": 12075.5
            },
            {
                "symbol": "XBTUSD",
                "id": 8798792500,
                "side": "Sell",
                "size": 827344,
                "price": 12075
            },
            {
                "symbol": "XBTUSD",
                "id": 8798792550,
                "side": "Sell",
                "size": 1000228,
                "price": 12074.5
            },
            {
                "symbol": "XBTUSD",
                "id": 8798842150,
                "side": "Buy",
                "size": 736050,
                "price": 11578.5
            },
            {
                "symbol": "XBTUSD",
                "id": 8798842200,
                "side": "Buy",
                "size": 58449,
                "price": 11578
            },
            {
                "symbol": "XBTUSD",
                "id": 8798842250,
                "side": "Buy",
                "size": 2254,
                "price": 11577.5
            },
            {
                "symbol": "XBTUSD",
                "id": 8798842300,
                "side": "Buy",
                "size": 55252,
                "price": 11577
            },
            {
                "symbol": "XBTUSD",
                "id": 8798842350,
                "side": "Buy",
                "size": 2564,
                "price": 11576.5
            },
            {
                "symbol": "XBTUSD",
                "id": 8798842400,
                "side": "Buy",
                "size": 172251,
                "price": 11576
            }
        ]
    }
]create or replace table client1 (json_data variant not null);
copy into client1
 from @ext_json_stage/Book.json

Please follow the next series for Rest of cases mentioned above in this blog, Click here.
 
                                                                             
                                                                             
                                                                             
                                                                             
                                                                             
                                 
                                 
                                 
                                 
                                 
                                 
                                