PATTERN with REGEX : During this post we will discuss how we can use Regular expression in COPY command. As per the current scenario we are getting the multiple feeds files from customer in a bucket. These Feed files are varying according to their source system. Please note, all heterogenous source system place these files in single bucket.
As per the ask we need to select the respective source files (based on the nature of source system) from the bucket. So we need to use PATTERN with REGEX in COPY command to fetch only relevant feed files.
Consider the below screen shot of the bucket where Feed files has been copied by source in a bucket.
As we can see, there are different types of files and each file has to be load in respective tables.
Sales channel Files starting with Numeric i.e. 101.102 should load in Customer Invoice table.
list @DEMO_DB.public.ext_poc_stage pattern='.*[0-9]+.*[a-z]+.*[_].*.csv'
Flight Delta File i.e. part1 and part2 should be picked during the Load, Flight_Table file needs to left out.
list @DEMO_DB.public.ext_poc_stage pattern='.*Flights_Table.*[.].*.csv'
Employee delta File i.e 1,2 should be picked and load into Employee Delta table:
list @DEMO_DB.public.ext_poc_stage pattern='.*emp_load.*[0-9].*.csv'
Employee Historical File to be pick and should left the Delta Employee feed file:
list @DEMO_DB.public.ext_poc_stage pattern='.*emp_load.csv'
Load all the Persons file i.e. cust,invc,src and avoid Delta file i.e. load1,load2
list @DEMO_DB.public.ext_poc_stage pattern='.*person_load.*[^0-9].*.csv'
Load Person delta file only..
list @DEMO_DB.public.ext_poc_stage pattern='.*person_load.*[0-9].*.csv'
Once you are done with listing the files according to the requirement, its time to load the data into table.