Returns a subset of rows sampled randomly from the specified table. The following sampling methods are supported:
- Sample a fraction of a table, with a specified probability for including a given row. Therefore, the number of rows returned depends on the size of the table and the requested probability.
- However, A seed can be specified to make the sampling deterministic.
- Sample a fixed, specified number of rows. The exact number of specified rows is returned unless the table contains fewer rows.
SAMPLE and TABLESAMPLE are synonymous and can be used interchangeably.
Sample method is optional. If no method is specified, In conclusion the default is BERNOULLI.
SELECT ...
FROM ...
{ SAMPLE | TABLESAMPLE } [ samplingMethod ] ( { <probability> | <num> ROWS } ) [ { REPEATABLE | SEED } ( <seed> ) ]
[ ... ]
samplingMethod ::= { { BERNOULLI | ROW } |
{ SYSTEM | BLOCK } }
Where:
samplingMethod ::= { { BERNOULLI | ROW } | { SYSTEM | BLOCK } }
BERNOULLI | ROW
orSYSTEM | BLOCK
Specifies the method to use:
BERNOULLI
(orROW
): Includes each row with aprobability
ofp/100
. Similar to flipping a weighted coin for each row. The resulting data size is approxmilatelyof p/100 * number of Rows.SYSTEM
(orBLOCK
): Includes each block of rows with aprobability
ofp/100
. Similar to flipping a weighted coin for each block of rows. This method does not support fixed-size .
REPEATABLE | SEED ( seed )
Specifies a seed value to make this deterministic.
Which to use: Use SYSTEM/BLOCK in case of higher volume of data and BERNOULI/ROW in case of smaller tables:
CREATE OR REPLACE TRANSIENT TABLE SUPPLIER_BLOCK_ALG
AS
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF10000.SUPPLIER sample system (3) seed (82);
CREATE OR REPLACE TRANSIENT TABLE SUPPLIER_ROW_ALG
AS
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF10000.SUPPLIER sample row (3) seed (82);
Note: Here (3) is the percentage not the number of rows:
Performance Considerations
SYSTEM | BLOCK
sampling is often faster thanBERNOULLI | ROW
sampling.- Sampling without a
seed
is often faster than sampling with aseed
. - Fixed-size sampling can be slower than equivalent fraction-based because fixed-size prevents some query optimization.
Fixed-size Row Sample:
Return a fixed-size sample of 10 rows in which each row has a max(1, 10/n) probability of being included in the sample, where n is the number of rows in the table:
select * from testtable sample (10 rows);