Automation testing for AWS Lambda functions in Python

I have an AWS Lambda function that writes S3 file metadata to DynamoDB for every object created in an S3 bucket; for this I have an event trigger on the S3 bucket. I'm planning to automate testing of this using Python. Can anyone help out with how I can automate testing of this Lambda function, using the unittest package, to cover the following:
Verify that the DynamoDB table exists
Validate that the S3 bucket configured for the event trigger exists
Verify that the file count in the S3 bucket matches the record count in the DynamoDB table

This can be done using moto and unittest. What moto does is add a stateful mock for AWS - your code can keep calling boto3 as normal, but the calls won't actually be made to AWS. Instead, moto builds up state in memory.
For example, you could:
Activate the mock for DynamoDB
Create a DynamoDB table
Add items to the table
Retrieve items from the table and verify they exist
If you're building functionality for both DynamoDB and S3, you'd leverage both the mock_s3 and mock_dynamodb2 methods from moto.
I wrote up a tutorial on how to do this (it uses pytest instead of unittest but that should be a minor difference). Check it out: joshuaballoch.github.io/testing-lambda-functions/
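For instance, a minimal sketch of those steps with unittest might look like the following. The bucket name, table name, key schema, and the lambda_function module are assumptions you would adapt to your project, and the mock_s3/mock_dynamodb2 decorators belong to older moto releases (newer versions expose mock_aws instead):
# test_metadata_lambda.py
import os
import unittest

import boto3
from moto import mock_dynamodb2, mock_s3

# moto never talks to real AWS, but boto3 still expects credentials to exist
os.environ.setdefault("AWS_ACCESS_KEY_ID", "testing")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "testing")
os.environ.setdefault("AWS_DEFAULT_REGION", "us-east-1")


def create_fixtures():
    # Create the bucket and table inside moto's in-memory AWS
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="my-test-bucket")
    ddb = boto3.client("dynamodb", region_name="us-east-1")
    ddb.create_table(
        TableName="s3-metadata",
        KeySchema=[{"AttributeName": "object_key", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "object_key", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",
    )
    return s3, ddb


class TestMetadataLambda(unittest.TestCase):
    @mock_s3
    @mock_dynamodb2
    def test_table_and_bucket_exist(self):
        s3, ddb = create_fixtures()
        self.assertIn("s3-metadata", ddb.list_tables()["TableNames"])
        names = [b["Name"] for b in s3.list_buckets()["Buckets"]]
        self.assertIn("my-test-bucket", names)

    @mock_s3
    @mock_dynamodb2
    def test_record_count_matches_object_count(self):
        s3, ddb = create_fixtures()
        s3.put_object(Bucket="my-test-bucket", Key="file1.txt", Body=b"hello")
        # Import the handler after the mocks are active, in case it creates
        # boto3 clients at module level, then feed it an S3-shaped event.
        import lambda_function
        event = {"Records": [{"s3": {
            "bucket": {"name": "my-test-bucket"},
            "object": {"key": "file1.txt"},
        }}]}
        lambda_function.lambda_handler(event, None)
        objects = s3.list_objects_v2(Bucket="my-test-bucket")["KeyCount"]
        records = ddb.scan(TableName="s3-metadata")["Count"]
        self.assertEqual(objects, records)


if __name__ == "__main__":
    unittest.main()
Each test runs against moto's in-memory state only, so it covers the three checks above without touching real AWS.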

Related

Conditional writes to DynamoDB when executing an AWS Glue script without Boto?

I've written an AWS Glue ETL job script in Python, and I'm looking for the proper way to perform conditional writes to the DynamoDB table I'm using as the target.
# Write to DynamoDB
glueContext.write_dynamic_frame_from_options(
    frame=SelectFromCollection_node1665510217343,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": args["OUTPUT_TABLE_NAME"]
    }
)
My script writes to DynamoDB with write_dynamic_frame_from_options. The AWS Glue connection parameter docs make no mention of being able to customize the write behavior through the connection options.
Is there a clean way to write conditionally without using boto?
You cannot do conditional updates with the EMR DynamoDB connector that Glue uses; it does a complete overwrite of the data. For conditional writes you would have to use boto3 yourself and distribute the work across the Spark executors with foreachPartition.
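A hedged sketch of that approach is below. The target table name "my-target-table", its partition key "id", and the DynamicFrame variable from the snippet above are assumptions; the table resource is created inside the function so each Spark executor builds its own client:
import boto3
from botocore.exceptions import ClientError


def write_partition(rows):
    # Each executor creates its own DynamoDB resource (clients are not picklable)
    table = boto3.resource("dynamodb").Table("my-target-table")
    for row in rows:
        try:
            table.put_item(
                Item=row.asDict(),  # numeric values may need converting to Decimal
                # Only write if no item with this key exists yet
                ConditionExpression="attribute_not_exists(id)",
            )
        except ClientError as e:
            # A failed condition just means the item already existed
            if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise


# Convert the DynamicFrame to a DataFrame and write partition by partition
SelectFromCollection_node1665510217343.toDF().rdd.foreachPartition(write_partition)
Note that this bypasses the Glue connector entirely, so you lose its built-in throughput handling and may want to add retries or batching.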

How can I automatically delete AWS S3 files using Python?

I want to delete some files from S3 after a certain time. I need to set a time limit for each object, not for the whole bucket. Is that possible?
I am using boto3 to upload the file into S3.
import os

import boto3
from boto3.s3.transfer import S3Transfer

region = "us-east-2"
bucket = os.environ["S3_BUCKET_NAME"]
credentials = {
    'aws_access_key_id': os.environ["AWS_ACCESS_KEY"],
    'aws_secret_access_key': os.environ["AWS_ACCESS_SECRET_KEY"]
}
client = boto3.client('s3', **credentials)
transfer = S3Transfer(client)
transfer.upload_file(file_name, bucket, folder + file_name,
                     extra_args={'ACL': 'public-read'})
Above is the code I used to upload the object.
You have many options here. Some ideas:
You can automatically delete files after a given time period by using Amazon S3 Object Lifecycle Management. See: How Do I Create a Lifecycle Policy for an S3 Bucket?
If your requirements are more detailed (eg different files after different time periods), you could add a tag to each object specifying when, or after how many days, it should be deleted. Then, you could define an Amazon CloudWatch Events rule to trigger an AWS Lambda function at regular intervals (eg once a day or once an hour). The Lambda function would look at the tags on the objects, determine whether they should be deleted, and delete the desired objects; a sketch of such a function follows after the next option. You will find examples of this on the Internet, often called a Stopinator.
If you have an Amazon EC2 instance that is running all the time for other work, then you could simply create a cron job or Scheduled Task to run a similar program (without using AWS Lambda).
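For the second option, a minimal sketch of such a "Stopinator" Lambda function is below. The bucket name and the delete-after tag (holding an ISO date) are assumptions; you would schedule the handler with a CloudWatch Events / EventBridge rule:
import datetime

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"  # hypothetical bucket name


def lambda_handler(event, context):
    today = datetime.date.today()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            tags = s3.get_object_tagging(Bucket=BUCKET, Key=obj["Key"])["TagSet"]
            tag_map = {t["Key"]: t["Value"] for t in tags}
            expiry = tag_map.get("delete-after")
            # Delete the object once its tagged expiry date has passed
            if expiry and datetime.date.fromisoformat(expiry) <= today:
                s3.delete_object(Bucket=BUCKET, Key=obj["Key"])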

Using AWS Lambda to run a Python script, how can I save data?

I have a script that gathers data from an API, and running this manually on my local machine I can save the data to a CSV or SQLite .db file.
If I put this on AWS Lambda, how can I store and retrieve data?
TL;DR
You can save data in an instance of a Lambda function, but you don't really want to use it as permanent storage. Instead, use a cloud service that specializes in storing data; which one depends on your use case.
Some background info
When using Lambda you have to think of it as an ephemeral instance in which you only have access to the /tmp directory and can save up to 512 MB (see the Lambda limits). The data stored in the /tmp directory may only be available during the execution of the function, and there is no guarantee that anything you save there will be available in future executions.
Considerations
That is why you should consider using other cloud services to store data, e.g. Simple Storage Service (S3) for storing files, RDS for relational databases, or DynamoDB as a NoSQL database solution.
There are many other options and it will all depend on the use case.
Working solution
With Python, it is very simple to store files in S3 using boto3. The code below uses the requests library to make a GET request to google.com and saves the output to an S3 bucket. As an additional step, it also creates a signed URL that you can use to download the file.
# lambda_function.py
import os

import boto3
from botocore.client import Config
import requests

s3 = boto3.resource('s3')
client = boto3.client('s3', config=Config(signature_version='s3v4'))

# This environment variable is set via the serverless.yml configuration
bucket = os.environ['FILES_BUCKET']


def lambda_handler(event, context):
    # Make the API call
    response = requests.get('https://google.com')

    # Get the data you care about and transform it to the desired format
    body = response.text

    # Save it to local storage
    tmp_file_path = "/tmp/website.html"
    with open(tmp_file_path, "w") as file:
        file.write(body)
    s3.Bucket(bucket).upload_file(tmp_file_path, 'website.html')

    # OPTIONAL: Generate a signed URL to download the file
    url = client.generate_presigned_url(
        ClientMethod='get_object',
        Params={
            'Bucket': bucket,
            'Key': 'website.html'
        },
        ExpiresIn=604800  # 7 days
    )
    return url
Deployment
To deploy the Lambda function I highly recommend using a deployment tool like Serverless or LambdaSharp. The following serverless.yml file for the Serverless Framework packages and deploys the code; it also creates the S3 bucket and sets the proper permissions to put objects and generate the signed URL:
# serverless.yml
service: s3upload

provider:
  name: aws
  runtime: python3.7
  versionFunctions: false
  memorySize: 128
  timeout: 30
  # you can add statements to the Lambda function's IAM Role here
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - s3:PutObject
        - s3:GetObject
      Resource:
        - Fn::Join: ["/", [Fn::GetAtt: [FilesBucket, Arn], "*"]]
        - Fn::GetAtt: [FilesBucket, Arn]

# Package information
package:
  artifact: package.zip

functions:
  s3upload-function:
    handler: lambda_function.lambda_handler
    environment:
      FILES_BUCKET:
        Ref: FilesBucket
    events:
      # THIS LAMBDA FUNCTION WILL BE TRIGGERED EVERY 10 MINUTES
      # CHECK OUT THE SERVERLESS DOCS FOR ALTERNATIVE WAYS TO
      # TRIGGER THE FUNCTION
      - schedule:
          rate: rate(10 minutes)

# you can add CloudFormation resource templates here
resources:
  Resources:
    FilesBucket:
      Type: AWS::S3::Bucket
      Properties:
        PublicAccessBlockConfiguration:
          BlockPublicAcls: true
          BlockPublicPolicy: true
          IgnorePublicAcls: true
          RestrictPublicBuckets: true
Now package and deploy
#!/usr/bin/env bash
# deploy.sh
mkdir package
pip install -r requirements.txt --target=./package
cp lambda_function.py package/
(cd package; zip -r ../package.zip .)
serverless deploy --verbose
Conclusion
When you run Lambda functions, you must think of them as stateless. If you want to save the state of your application, it is better to use other cloud services that are well suited to your use case. For storing CSVs, S3 is an ideal solution, as it is a highly available storage system that is very easy to get started with from Python.
With AWS Lambda you can use a database like DynamoDB, which is a NoSQL database, and from there you can produce a CSV file for download. Lambda-to-DynamoDB integration is straightforward: Lambda is serverless and DynamoDB is a managed NoSQL database. You can save data into DynamoDB, or use RDS (MySQL) and many other services, but DynamoDB is often the simplest fit.
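As a rough illustration of that idea, the sketch below scans a hypothetical table named api-data (whose items are assumed to share the same attributes) and turns it into CSV text that you could return or upload to S3:
import csv
import io

import boto3

table = boto3.resource("dynamodb").Table("api-data")


def lambda_handler(event, context):
    # Page through the whole table
    items = []
    response = table.scan()
    items.extend(response["Items"])
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])

    if not items:
        return ""
    # Build CSV text in memory from the items' attributes
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(items[0].keys()))
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()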
It really depends on what you want to do with the information afterwards.
If you want to keep it in a file, then simply copy it to Amazon S3. It can store as much data as you like.
If you intend to query the information, you might choose to put it into a database instead. There are a number of different database options available, depending on your needs.

Auto-populate timestamp in DynamoDB in Python using boto3

Is it possible to have DynamoDB conditionally save a timestamp if the item is created?
It looks like the AWS Java SDK provides this functionality via the #DynamoDBAutoGeneratedTimestamp annotation.
You could write/use a DynamoDB trigger - an AWS Lambda function - to do this for you (a sketch follows after the quoted FAQ entry below):
https://aws.amazon.com/dynamodb/faqs/
Q: How do DynamoDB Triggers work?
The custom logic for a DynamoDB trigger is stored in an AWS Lambda function as code. To create a trigger for a given table, you can associate an AWS Lambda function with the stream (via DynamoDB Streams) on a DynamoDB table. When the table is updated, the updates are published to DynamoDB Streams. In turn, AWS Lambda reads the updates from the associated stream and executes the code in the function.
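A minimal sketch of that trigger approach is below, assuming a table named my-table with a string partition key id and a createdAt attribute (all hypothetical names). The Lambda function is attached to the table's stream and stamps newly inserted items exactly once:
import datetime

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("my-table")


def lambda_handler(event, context):
    for record in event.get("Records", []):
        if record["eventName"] != "INSERT":
            continue
        keys = record["dynamodb"]["Keys"]
        try:
            table.update_item(
                Key={"id": keys["id"]["S"]},
                UpdateExpression="SET createdAt = :ts",
                # Set the timestamp only once so this update cannot loop
                ConditionExpression="attribute_not_exists(createdAt)",
                ExpressionAttributeValues={
                    ":ts": datetime.datetime.utcnow().isoformat()
                },
            )
        except ClientError as e:
            if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise
Alternatively, if every write goes through your own boto3 code, it is simpler to add the timestamp in the put_item call itself.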

How to mock a unit test for a delete bucket operation using boto3

I'm using Python 2.7 and boto3 to interact with S3 buckets. So far so good!
What I'm trying to achieve now is a unit test for the delete bucket operation, but using mocked data, i.e., with no real interaction with the S3 storage.
For other unit tests throughout the project, I've used patches and boto3's Stubber successfully, but for some reason, I'm unable to find a way to use the same techniques to mock interactions using the S3 resource and the Bucket sub-resource.
This is the snippet of code I want to unit test:
def delete_bucket(self, bucket_name):
    resource = boto3.resource('s3')
    bucket = resource.Bucket(bucket_name)
    bucket.objects.all().delete()
    return bucket.delete()
Thanks!
You can use unittest.mock (on Python 2.7 the same API is available from the external mock package). With this, boto3.resource can be patched and the return values of the mocked Bucket can be set:
from unittest.mock import patch

# Patch boto3.resource so no real call reaches S3; the mocked Bucket's
# delete() then returns whatever value you configure
with patch('boto3.resource') as boto_resource_patch:
    bucket_mock = boto_resource_patch.return_value.Bucket.return_value
    bucket_mock.delete.return_value = 'Return value'
    # Perform any action
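For completeness, a self-contained sketch of a whole test is below; delete_bucket is repeated here as a plain function mirroring the snippet above so the example runs on its own:
import unittest
from unittest.mock import patch

import boto3


def delete_bucket(bucket_name):
    resource = boto3.resource('s3')
    bucket = resource.Bucket(bucket_name)
    bucket.objects.all().delete()
    return bucket.delete()


class TestDeleteBucket(unittest.TestCase):
    @patch('boto3.resource')
    def test_delete_bucket(self, resource_patch):
        bucket_mock = resource_patch.return_value.Bucket.return_value
        bucket_mock.delete.return_value = 'Return value'

        result = delete_bucket('my-bucket')

        # Both the bulk object deletion and the bucket deletion were invoked,
        # and the configured return value came back untouched
        bucket_mock.objects.all.return_value.delete.assert_called_once()
        bucket_mock.delete.assert_called_once()
        self.assertEqual(result, 'Return value')


if __name__ == '__main__':
    unittest.main()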
