Snowflake COPY INTO: Efficient Data Loading
The COPY INTO command in Snowflake loads data efficiently from staged files into an existing table. It supports various sources, including named internal stages, external stages that reference locations like Amazon S3, Google Cloud Storage, or Microsoft Azure, and external locations referenced directly by URL.
Syntax
Here's a breakdown of the syntax and functionality:
COPY INTO <table_name>
FROM { internalStage | externalStage | externalLocation }
[ FILES = ( '<file_name>', ... ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' |
                    TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ VALIDATION_MODE = RETURN_<n>_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS ]
Key Options:
- FROM: Specifies the source of the files, which can be an internal stage, external stage, or direct external location.
- FILE_FORMAT: Defines the format of the incoming data, which can be set by referencing a predefined file format or directly in the command.
- FILES and PATTERN: List specific files explicitly, or use a regular expression to select the files to load (see the sketch below).
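For instance, PATTERN and VALIDATION_MODE can be combined to dry-run a load against every matching file before committing any rows. A minimal sketch, assuming a hypothetical stage @my_stage containing gzipped CSV files under a sales/ prefix:

-- Preview parsing errors across all matching files
-- without loading any data (VALIDATION_MODE skips the actual insert)
COPY INTO my_table
FROM @my_stage
PATTERN = 'sales/.*[.]csv[.]gz'
FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1)
VALIDATION_MODE = RETURN_ERRORS;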
When to Use COPY INTO
COPY INTO is optimal for batch data loading when:
- Cost Efficiency: It's more cost-effective for scheduled large batch processing, especially if you can fully utilize compute resources.
- Ease of Use: Requires only basic SQL knowledge, integrating seamlessly with batch and ETL workflows.
- Control: Offers fine-grained control over data ingestion parameters and error handling (see the sketch after this list).
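As an illustration of that control, copy options such as ON_ERROR and PURGE determine how a batch behaves when individual rows fail and what happens to files after loading. A minimal sketch, reusing the hypothetical my_table and @my_stage:

COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1)
ON_ERROR = 'SKIP_FILE'   -- skip any file containing bad rows instead of aborting the load
PURGE = TRUE;            -- remove successfully loaded files from the stage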
Alternatives to Consider
For lower latency or frequent small file loads:
- Snowpipe: Ideal for continuous, event-driven loading with minimal latency; it wraps a COPY INTO statement, as sketched below.
- Snowpipe Streaming: Best for real-time data needs, providing the lowest latency by ingesting data as it arrives.
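Snowpipe reuses the same COPY INTO statement inside a pipe definition, so the batch syntax above transfers directly. A minimal sketch, assuming a hypothetical external stage @my_ext_stage with cloud event notifications configured:

CREATE PIPE my_pipe
  AUTO_INGEST = TRUE  -- load automatically when the storage event notification fires
AS
COPY INTO my_table
FROM @my_ext_stage
FILE_FORMAT = (TYPE = JSON);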
In essence, use COPY INTO for regular, large batch loads for cost-effectiveness and ease of use.
Turn to Snowpipe or Snowpipe Streaming for real-time or continuous data needs where immediate data freshness is critical. Each option serves different needs, so the choice depends on your specific requirements for data freshness, technical resources, and budget.
Examples
Loading Basic CSV Files
To load data from a staged file into a specified table, you might use:
COPY INTO my_table
FROM @my_stage
FILES = ('data1.csv', 'data2.csv')
FILE_FORMAT = (TYPE = CSV, FIELD_DELIMITER = ',', SKIP_HEADER = 1);
This command loads data1.csv and data2.csv from the stage @my_stage into my_table, assuming the data is formatted as CSV with a header row.
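After a batch load like this, you can verify which files were ingested and whether any errors occurred. One way is the INFORMATION_SCHEMA.COPY_HISTORY table function (a sketch; MY_TABLE is the hypothetical table from above):

-- Review the files loaded into my_table over the last hour
SELECT file_name, status, row_count, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'MY_TABLE',
    START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())));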
Data Load with Transformation
To transform data during the load, select and reshape the staged data in a subquery:
COPY INTO my_table(col1, col2)
FROM (SELECT $1, $2 FROM @my_stage/file.csv)
FILE_FORMAT = (TYPE = CSV);
This command selects the first two columns from the staged CSV file and loads them into my_table.
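The subquery can also reorder columns and apply expressions. A hypothetical variant, assuming the staged file's first column holds a numeric string and its second a name:

-- Cast the first column to a number and uppercase the second during the load
COPY INTO my_table(col1, col2)
FROM (SELECT $1::NUMBER, UPPER($2) FROM @my_stage/file.csv)
FILE_FORMAT = (TYPE = CSV);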