AWS Lambda Tutorial - how to create a daily scheduled task

Azimuth

2019-02-15

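In this tutorial, I’m going to talk about how to create an AWS Lambda function triggered by scheduled events.
For example, I want to schedule a daily job that pulls data from an API and saves it to an S3 bucket. First I need to create a Lambda function and choose “CloudWatch Events” from the trigger list. When configuring the trigger, create a new rule of type “Schedule expression” and enter something like rate(1 day), or cron(0 12 * * ? *) to run every day at 12:00 UTC.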
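The console steps above can also be scripted. The following is a minimal sketch using boto3, assuming the function already exists and your credentials are allowed to call CloudWatch Events and Lambda; the helper names and the function/rule names in the usage line are hypothetical. It creates a daily rule, lets CloudWatch Events invoke the function, and registers the function as the rule's target.

```python
def daily_cron(hour_utc: int, minute: int = 0) -> str:
    """Build a CloudWatch Events cron expression that fires once a day (UTC)."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Fields: minutes hours day-of-month month day-of-week year.
    # Day-of-month and day-of-week cannot both be "*", so day-of-week is "?".
    return f"cron({minute} {hour_utc} * * ? *)"


def schedule_lambda_daily(function_name: str, rule_name: str, hour_utc: int = 12) -> str:
    """Create a daily rule and point it at an existing Lambda function.

    Running this twice raises a conflict from add_permission, because the
    statement ID is already present on the function's resource policy.
    """
    import boto3  # imported here so daily_cron() works without boto3 installed

    events = boto3.client("events")
    lambda_client = boto3.client("lambda")

    # 1. Create (or update) the scheduled rule.
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=daily_cron(hour_utc),
        State="ENABLED",
    )["RuleArn"]

    # 2. Allow CloudWatch Events to invoke the function.
    function_arn = lambda_client.get_function(FunctionName=function_name)[
        "Configuration"
    ]["FunctionArn"]
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Make the function the rule's target.
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}])
    return rule_arn
```

Usage would look like schedule_lambda_daily("daily-export", "daily-export-schedule"). Passing "rate(1 day)" as the ScheduleExpression instead is equally valid if the exact time of day doesn't matter.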

Then click on the function itself in the center of the designer and scroll down to the code editor. You can choose to upload a zip file that contains your code, but for simple functions you can also just edit the code inline.
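If you go the zip route, the handler file has to sit at the root of the archive rather than inside a folder. A minimal sketch with the standard library's zipfile module, assuming a single handler file such as lambda_function.py; both helper names are hypothetical, and upload_code relies on boto3's update_function_code.

```python
import io
import zipfile
from pathlib import Path


def build_deployment_zip(handler_file: str) -> bytes:
    """Return an in-memory zip with the handler module at the archive root."""
    path = Path(handler_file)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname=path.name drops any directories, so the file lands at the root
        archive.write(path, arcname=path.name)
    return buffer.getvalue()


def upload_code(function_name: str, zip_bytes: bytes) -> None:
    """Replace the function's code with the given zip (requires boto3)."""
    import boto3

    boto3.client("lambda").update_function_code(
        FunctionName=function_name, ZipFile=zip_bytes
    )
```

Building the archive in memory avoids leaving a stray zip on disk; writing the bytes to a file and uploading it through the console works just as well.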

Choose whatever runtime you want. In my case, I’m using Python 3.7.

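You can rename the function from lambda_handler to whatever you like, but make sure to update the “Handler” setting to match. The Handler value has the form <file name>.<function name>, so with the default file lambda_function.py and my function get_export, it becomes lambda_function.get_export.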
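If you manage the function from code rather than the console, the same Handler change can be made with boto3's update_function_configuration. A minimal sketch; the helper names and the function name you pass in are hypothetical.

```python
def handler_setting(module_file: str, function_name: str) -> str:
    """Build the Lambda "Handler" value: <module without .py>.<function>."""
    module = module_file[:-3] if module_file.endswith(".py") else module_file
    return f"{module}.{function_name}"


def set_handler(lambda_name: str, module_file: str, function_name: str) -> None:
    """Point an existing Lambda function at a renamed handler (requires boto3)."""
    import boto3

    boto3.client("lambda").update_function_configuration(
        FunctionName=lambda_name,
        Handler=handler_setting(module_file, function_name),
    )
```

For the example in this post, handler_setting("lambda_function.py", "get_export") gives "lambda_function.get_export".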

Then write your function however you like. For example, the following code downloads a gzipped file from a URL, decompresses it, and saves the result to an S3 bucket.

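from io import BytesIO
import gzip
from datetime import datetime
import urllib.request

import boto3

# URL of the gzipped export to download (placeholder: replace with your own)
FILE_URL = "https://example.com/export.tsv.gz"


def get_export(event, context):

    # download the gzipped file into memory
    with urllib.request.urlopen(FILE_URL) as response:
        fileobj = BytesIO(response.read())

    # decompress it in memory, without touching the filesystem
    with gzip.GzipFile(mode="rb", fileobj=fileobj) as f:
        data = f.read()

    # upload to s3 (credentials come from the function's execution role)
    s3 = boto3.resource("s3")
    bucket_name = "unity-analytics-library"
    # Lambda's clock runs in UTC, so this is the UTC date
    today = datetime.now().strftime("%Y-%m-%d")
    s3_path = "unity-export-{0}.tsv".format(today)
    s3.Bucket(bucket_name).put_object(Key=s3_path, Body=data)

    return {
        'statusCode': 200,
        's3_file': s3_path
    }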
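To check the decompression and file-naming logic locally before deploying, one option is to pull them into small pure helpers. This is a sketch with hypothetical helper names (gunzip_bytes, export_key) that mirror what get_export does; the sample data is made up.

```python
import gzip
from datetime import date
from io import BytesIO


def gunzip_bytes(raw: bytes) -> bytes:
    """Decompress gzip data held in memory, as the handler does with BytesIO + GzipFile."""
    with gzip.GzipFile(mode="rb", fileobj=BytesIO(raw)) as f:
        return f.read()


def export_key(day: date) -> str:
    """S3 key for a given day's export, matching the handler's naming scheme."""
    return "unity-export-{0}.tsv".format(day.strftime("%Y-%m-%d"))


if __name__ == "__main__":
    # Round-trip a tiny made-up TSV through gzip to exercise the helper.
    sample = gzip.compress(b"id\tvalue\n1\t42\n")
    assert gunzip_bytes(sample) == b"id\tvalue\n1\t42\n"
    print(export_key(date(2019, 2, 15)))  # unity-export-2019-02-15.tsv
```

Keeping the network and S3 calls in the handler and the data handling in plain functions means the latter can be tested without AWS credentials.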

Just a few things to note here:

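- The Lambda runtime does not include the requests library. At the time of writing you could use from botocore.vendored import requests instead, but that vendored copy has since been deprecated and removed from botocore, so on current runtimes use urllib.request from the standard library or bundle requests in your deployment zip.
- Lambda's filesystem is read-only except for /tmp (512 MB by default). If you are only manipulating file objects, like me, in-memory buffers such as BytesIO or StringIO are often simpler than writing to disk.
- When calling other AWS services, for example S3 in this example, you don't need to put credentials in your code: boto3 automatically uses the temporary credentials of the function's IAM execution role. You do need to make sure that role has sufficient permissions for whatever you need. In my case, I made a bucket policy granting this IAM role write permission. (If the bucket and role are in the same account, attaching an IAM policy that allows s3:PutObject to the role also works.)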
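If you do need a real file on disk, for example for a library that only accepts file paths, /tmp is the place to write it. A minimal sketch with hypothetical helper names; the upload step uses boto3's S3 upload_file.

```python
import os


def save_to_tmp(data: bytes, filename: str, directory: str = "/tmp") -> str:
    """Write bytes into Lambda's writable scratch directory and return the path."""
    path = os.path.join(directory, filename)
    with open(path, "wb") as f:
        f.write(data)
    return path


def upload_from_tmp(path: str, bucket_name: str, key: str) -> None:
    """Upload a local file to S3 (needs boto3 and s3:PutObject on the bucket)."""
    import boto3

    boto3.client("s3").upload_file(path, bucket_name, key)
```

Files in /tmp can survive between invocations that reuse the same execution environment, so it is safer to overwrite or delete them rather than assume the directory starts empty.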

The bucket policy would look something like this (replace the placeholders with your account ID, role name, and bucket name):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3Access",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account-id>:role/<role-name>"
            },
            "Action": "s3:PutObject",
            "Resource": [
                "arn:aws:s3:::<bucket-name>/*"
            ]
        }
    ]
}
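Rather than editing the JSON by hand, the policy can also be generated and applied with boto3's put_bucket_policy. A sketch, assuming a role ARN and bucket name you supply (the helper names are hypothetical), restricted to s3:PutObject on objects in the bucket, which is all the handler above needs. Note that put_bucket_policy replaces any existing bucket policy, so merge statements first if the bucket already has one.

```python
import json


def put_object_policy(role_arn: str, bucket_name: str) -> dict:
    """Bucket policy letting one IAM role write objects into the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowLambdaRoleWrite",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


def apply_policy(role_arn: str, bucket_name: str) -> None:
    """Attach the policy to the bucket, overwriting any existing one (requires boto3)."""
    import boto3

    boto3.client("s3").put_bucket_policy(
        Bucket=bucket_name,
        Policy=json.dumps(put_object_policy(role_arn, bucket_name)),
    )
```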