In the previous post, we created few tables in Redshift and in this post we will see how to load data present in S3 into these tables. To load data into Redshift, the most preferred method is COPY command and we will use same in this post. COPY command loads data in parallel leveraging the MPP core structure of Redshift. The default file delimiter is "|" (pipe) and I have used same for data file in this example. So let's see the COPY command in action below:
COPY H_PART FROM 's3://<bucketname>/<bucketfolder>' credentials 'aws_iam_role=arn:aws:iam::<account-id>:role/<rolewiths3access>';
Following parameters are required in above statement:
1) bucketname : S3 Bucket Name
2) bucketfolder : S3 Bucket Folder in which file exists
3) account-id: Account ID used for creating role
4) rolewiths3access : Create a role with S3 access
Replace above 4 parameters in your Copy statement and run the command from any SQL Editor.
You can also give Access & Secret key for authorisation in COPY command in place of ROLE.
copy h_customer FROM 's3://<bucketname>/<bucketfolder>' access_key_id '<access-key>' secret_access_key '<secret-key>'
However the preferred method is to use IAM Role for copying data into Redshift table.
After running the command I was able to see data in my Redshift tables.
select count(*) from h_customer; --100000 select count(*) from h_lineitem; --100000 select count(*) from h_nation; --25 select count(*) from h_order;--100000 select count(*) from h_part; --100000 select count(*) from h_partsupp; --100000 select count(*) from h_supplier; --100000
I have covered very basic manner of loading data file from S3 into Redshift table in this post. In the later posts, we will see more options like manifest file, fixed width or delimited file, using compressed files etc which are available to load data into Redshift tables.