Snowflake COPY INTO: Efficient Data Loading

The COPY INTO command in Snowflake loads data efficiently from staged files into an existing table. The command supports several sources: named internal stages, external stages that reference locations such as Amazon S3, Google Cloud Storage, or Microsoft Azure, and external locations referenced directly by URL.

Syntax

Here's a breakdown of the syntax and functionality:


COPY INTO <table_name>
FROM { internalStage | externalStage | externalLocation }
[ FILES = ( '<file_name>', ... ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' |
                    TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ VALIDATION_MODE = RETURN_<n>_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS ]
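
For instance, here is a minimal sketch of loading directly from an external location; the bucket path and credentials are placeholders, and in practice a storage integration or external stage is usually preferred over inline credentials:

COPY INTO my_table
FROM 's3://my-bucket/raw/'
CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>')  -- placeholder keys; a storage integration avoids embedding secrets
FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1);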

Key Options:

  • FROM: Specifies the source of the files, which can be an internal stage, external stage, or direct external location.
  • FILE_FORMAT: Defines the format of the incoming data, which can be set by referencing a predefined file format or directly in the command.
  • FILES and PATTERN: List specific files explicitly or use a regular expression to select the files to load (see the example below).
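
As an example, a hedged sketch combining PATTERN with a named file format; the stage path, format name, and pattern are assumptions:

COPY INTO my_table
FROM @my_stage/loads/
PATTERN = '.*sales_.*[.]csv'                    -- regex matched against staged file paths
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');  -- assumes this named file format already exists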

When to Use COPY INTO

COPY INTO is a strong fit for batch data loading, for several reasons:

  • Cost Efficiency: It's more cost-effective for scheduled large batch processing, especially if you can fully utilize compute resources.
  • Ease of Use: Requires only basic SQL knowledge, integrating seamlessly with batch and ETL workflows.
  • Control: Offers detailed control over data ingestion parameters and error handling (see the sketch below).
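
For illustration, a minimal sketch of two of those controls, reusing the my_table and my_stage names from the examples later in this article:

-- Dry run: report parsing errors without loading any rows.
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1)
VALIDATION_MODE = RETURN_ERRORS;

-- Actual load: keep going past bad rows instead of aborting the statement.
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1)
ON_ERROR = CONTINUE;    -- other values include SKIP_FILE and ABORT_STATEMENT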

Alternatives to Consider

For lower latency or frequent small-file loads, consider Snowpipe, which continuously ingests files as they arrive in a stage, or Snowpipe Streaming, which writes rows directly without staging files.

In essence, use COPY INTO for regular, large batch loads where cost-effectiveness and ease of use matter most. Turn to Snowpipe or Snowpipe Streaming for real-time or continuous needs where immediate data freshness is critical. Each option serves different needs, so the choice depends on your requirements for data freshness, technical resources, and budget.
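
For comparison, a hedged Snowpipe sketch; the pipe, stage, and table names are placeholders, and AUTO_INGEST additionally requires cloud event notifications to be configured:

CREATE PIPE my_pipe
  AUTO_INGEST = TRUE                 -- load files automatically as event notifications arrive
AS
  COPY INTO my_table
  FROM @my_ext_stage
  FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1);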

Examples

Loading Basic CSV Files

To load data from a staged file into a specified table, you might use:


COPY INTO my_table
FROM @my_stage
FILES = ('data1.csv', 'data2.csv')
FILE_FORMAT = (TYPE = CSV, FIELD_DELIMITER = ',', SKIP_HEADER = 1);

This command loads data1.csv and data2.csv from the stage @my_stage into my_table, assuming the data is formatted as CSV and skipping the header row in each file.
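
To confirm what was loaded, one option (a sketch; the table name and one-hour window are assumptions) is the COPY_HISTORY table function:

SELECT file_name, status, row_count, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'MY_TABLE',
    START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())
));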

Data Load with Transformation

For transforming data during the load, you could use a query to select and transform data before it is loaded:


COPY INTO my_table(col1, col2)
FROM (SELECT $1, $2 FROM @my_stage/file.csv)
FILE_FORMAT = (TYPE = CSV);

This command selects the first two columns from the staged CSV file and loads them into my_table.
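
Building on the same idea, a hedged sketch (the cast, the source_file column, and its presence in my_table are assumptions) that also converts types and records the originating file via the METADATA$FILENAME pseudocolumn:

COPY INTO my_table (col1, col2, source_file)
FROM (
    SELECT $1::NUMBER,           -- cast the first staged column to a number
           $2,                   -- load the second column as-is
           METADATA$FILENAME     -- record which staged file each row came from
    FROM @my_stage/file.csv
)
FILE_FORMAT = (TYPE = CSV);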
