AWS S3 Loader Using AWS Lambda
The S3 loader uses AWS Lambda functions and Amazon Simple Queue Service (SQS) to divide the Cost and Usage Report (CUR) loading process into smaller chunks that can be completed more efficiently. The loading process includes:
- Unzip files. The S3 loader invokes an AWS Lambda function to unzip the CUR files and write the unzipped contents to the S3 reporting bucket.
- Poll queue. The SQS queue monitors the unzipping process. When it reaches either of the following criteria, the process moves to the next step:
- Process report files and generate S3 manifest. The S3 loader invokes two AWS Lambda functions: one for data in the main CUR table and one for data in the tags table. These AWS Lambda functions spawn a series of other AWS Lambda functions (50+ in total). The functions process the data in small, manageable chunks that can be processed simultaneously. Then, the S3 manifests save the results to the S3 bucket.
- Load data from S3. The final step uses a database trigger to load the data from S3 into two tables in the database.
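The chunk-and-manifest idea in the steps above can be sketched as follows. This is a minimal illustration, not Kion's actual Lambda code: the helper names, the bucket name, and the `processed/part-NNNN.csv` key layout are all hypothetical. The manifest shape (an `entries` list of `url` and `mandatory` pairs) follows the format that Aurora MySQL accepts for `LOAD DATA FROM S3 MANIFEST`.

```python
import json


def chunk(rows, size):
    """Yield consecutive slices of `rows`, each at most `size` items long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_manifest(bucket, chunk_keys):
    """Return manifest JSON text listing every chunk object to load.

    Each entry is marked mandatory so the load fails if a chunk is missing.
    """
    entries = [
        {"url": f"s3://{bucket}/{key}", "mandatory": True}
        for key in chunk_keys
    ]
    return json.dumps({"entries": entries}, indent=2)


if __name__ == "__main__":
    # Hypothetical data: five report lines split into chunks of two.
    rows = [f"line-{i}" for i in range(5)]
    keys = [
        f"processed/part-{n:04d}.csv"  # hypothetical key layout
        for n, _ in enumerate(chunk(rows, 2))
    ]
    print(build_manifest("example-report-bucket", keys))
```

Because each chunk is an independent object, separate workers can write them in parallel, and the manifest is the single list the database reads once all of them exist.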
This guide walks through how to configure Kion and AWS to use this loading process.
Using AWS Lambda functions and the SQS queue can incur charges in your AWS account, but these costs should be negligible. The process runs only when a new CUR is available, which is usually every 12 hours.
Configuration
Perform the following manual configuration in AWS to start using the S3 loader:
In the AWS Account Where Kion is Installed
- Run the s3loader_rds_cft.json AWS CloudFormation template, which is available here.
- What it does:
- Creates an IAM role giving read/write access to the CUR S3 bucket.
- Creates an RDS cluster parameter group that allows the cluster to load data from S3 and use the new role. It covers the following: LOAD DATA S3 MANIFEST, aurora_load_from_s3_role, and aws_default_s3_role.
- Inputs include:
- ARN of S3 bucket where the payer account stores CURs.
- Parameter group family. This needs to match the DB engine version. See the AWS docs for more information.
- Then complete these steps manually:
- Attach the new IAM role to the RDS cluster.
- Attach the new cluster parameter group to the RDS cluster.
- Restart the cluster.
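The attach-and-restart steps above can also be scripted. The sketch below assumes an Aurora cluster and a boto3 RDS client passed in as `rds`. The function name and its arguments are hypothetical, and the identifiers must come from your own environment and from the outputs of the CloudFormation stack. It uses the boto3 RDS calls `add_role_to_db_cluster`, `modify_db_cluster`, `describe_db_clusters`, and `reboot_db_instance`. A production script would likely also wait for the cluster to return to the available state between calls, which is omitted here.

```python
def attach_loader_settings(rds, cluster_id, role_arn, parameter_group):
    """Attach the IAM role and parameter group, then reboot each instance.

    `rds` is expected to behave like boto3.client("rds").
    Returns the identifiers of the instances that were rebooted.
    """
    # Step 1: attach the IAM role created by the template to the cluster.
    rds.add_role_to_db_cluster(DBClusterIdentifier=cluster_id, RoleArn=role_arn)

    # Step 2: switch the cluster to the new cluster parameter group.
    rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        DBClusterParameterGroupName=parameter_group,
        ApplyImmediately=True,
    )

    # Step 3: restart the cluster by rebooting every member instance.
    cluster = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    instance_ids = [m["DBInstanceIdentifier"] for m in cluster["DBClusterMembers"]]
    for instance_id in instance_ids:
        rds.reboot_db_instance(DBInstanceIdentifier=instance_id)
    return instance_ids


# Example call (requires boto3 and AWS credentials; all values are placeholders):
#   import boto3
#   attach_loader_settings(
#       boto3.client("rds"),
#       "my-kion-cluster",
#       "arn:aws:iam::111122223333:role/my-s3-loader-role",
#       "my-s3-loader-parameter-group",
#   )
```

Passing the client in as an argument keeps the function easy to dry-run against a stub before pointing it at a real cluster.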
In the Payer Account
- Run the s3loader_payer_cft.json AWS CloudFormation template, which is available here.
- What it does:
- Creates the SQS queue.
- Creates an IAM role for the AWS Lambda functions.
- Creates three AWS Lambda functions, which:
- Unzip report files.
- Generate CUR data for the database.
- Generate CUR tag data for the database.
- Creates a policy for the cloudtamer-service-role, which:
- Allows read/delete access on the SQS queue.
- Allows invoke on AWS Lambda.
- Allows read/delete access on the S3 report bucket.
- Inputs include:
- ARN of S3 bucket where the payer account stores CURs.
- IAM role name for the cloudtamer-service-role.
- Then complete this step manually:
- Modify the policy on the S3 report bucket by adding the following statement. This allows RDS to pull directly from S3. Replace the values between and including the <>s with the appropriate ARNs. Multiple role ARNs can be added if you are using multiple accounts with the same payer.

{
  "Effect": "Allow",
  "Principal": {
    "AWS": "<ARN of IAM role created on Kion account>"
  },
  "Action": [
    "s3:GetObject",
    "s3:GetObjectVersion"
  ],
  "Resource": "<ARN of bucket>/*"
}
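If you manage the bucket policy programmatically, the statement above can be generated rather than hand-edited. The following is a minimal sketch with hypothetical helper names and placeholder ARNs. It builds the same statement for one or several role ARNs and appends it to a copy of an existing policy document, leaving the original untouched.

```python
import json


def rds_read_statement(bucket_arn, role_arns):
    """Build the statement that lets the given IAM roles read report objects."""
    # IAM accepts either a single ARN string or a list of ARNs as the principal.
    principal = role_arns[0] if len(role_arns) == 1 else list(role_arns)
    return {
        "Effect": "Allow",
        "Principal": {"AWS": principal},
        "Action": ["s3:GetObject", "s3:GetObjectVersion"],
        "Resource": f"{bucket_arn}/*",
    }


def add_statement(policy, statement):
    """Return a copy of `policy` with `statement` appended to its Statement list."""
    merged = dict(policy)
    merged["Statement"] = list(policy.get("Statement", [])) + [statement]
    return merged


if __name__ == "__main__":
    # Placeholder values; substitute the ARNs from your own accounts.
    existing = {"Version": "2012-10-17", "Statement": []}
    statement = rds_read_statement(
        "arn:aws:s3:::example-report-bucket",
        ["arn:aws:iam::111122223333:role/example-s3-loader-role"],
    )
    print(json.dumps(add_statement(existing, statement), indent=2))
```

The printed document can then be pasted into the bucket policy editor in the S3 console.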
In the Kion Database
Run the following query in the Kion database to activate the new S3 loader, then restart the database.
UPDATE cloudtamer_config SET ct_value = 'v2' WHERE ct_key = 's3_loader_version';