I have an S3 bucket named 'files'. Every day a new file arrives there. Example:
/files/data-01-23-2017--11-33am.txt
/files/data-01-24-2017--10-28am.txt
How would I make a Lambda function and set a trigger so that a shell script is executed on EC2 when a new file arrives?
An example of a new file is:
/files/data-01-25-2017--11-43am.txt
The command I want to execute on EC2 (with the name of the file that just arrived as a parameter) is:
python /home/ec2-user/jobs/run_job.py data-01-25-2017--11-43am.txt
Amazon S3 can be configured to trigger an AWS Lambda function when a new object is created. However, Lambda functions do not have access to your Amazon EC2 instances, so running a script on an EC2 instance this way is not an appropriate architecture.
Some alternative options (these are separate options, not multiple steps):
Instead of running a command on an Amazon EC2 instance, put your code in the Lambda function itself (no EC2 instance required). (Best option! A sketch follows this list.)
Configure Amazon S3 to push a message into an Amazon SQS queue. Have your code on the EC2 instance regularly poll the queue. When it receives a message, process the object in S3.
Configure Amazon S3 to send a message to an Amazon SNS topic. Subscribe an endpoint of your application (effectively an API) to the SNS topic, so that it receives a message when a new object has been created.
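For the first option, a minimal handler could look like the sketch below. It assumes the logic in run_job.py can be ported into the function; process_file is a hypothetical stand-in for that logic.
# Minimal sketch of option 1 (logic from run_job.py assumed to be portable into Lambda)
import boto3
from urllib.parse import unquote_plus

s3 = boto3.client('s3')

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])  # e.g. data-01-25-2017--11-43am.txt
        local_path = '/tmp/' + key.split('/')[-1]          # Lambda can only write under /tmp
        s3.download_file(bucket, key, local_path)
        process_file(local_path)                           # hypothetical port of run_job.py's logic

def process_file(path):
    # Placeholder for whatever run_job.py currently does with the file.
    pass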
Related
I have a Python script that I want to run and text me a notification if a certain condition is met. I'm using Twilio, so I have a Twilio API token and I want to keep it secret. I have it successfully running locally, and now I'm working on getting it running on an EC2 instance.
Regarding AWS steps, I've created an IAM user with permissions, launched the EC2 instance (and saved the ssh keys), and created some parameters in the AWS SSM Parameter store. Then I ssh'd into the instance and installed boto3. When I try to use boto3 to grab a parameter, I'm unable to locate the credentials:
# test.py
import boto3
ssm = boto3.client('ssm', region_name='us-west-1')
secret = ssm.get_parameter(Name='/test/cli-parameter')
print(secret)
# running the file in the console
>> python test.py
...
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
I'm pretty sure this means it can't find the credentials that were created when I ran aws configure and it created the .aws/credentials file. I believe the reason is that I ran aws configure on my local machine rather than while ssh'd into the instance. I did this to keep my AWS ID and secret key off of my EC2 instance, because I thought I was supposed to keep them private and not put tokens/keys on my EC2 instance. I think I could solve the issue by running aws configure while ssh'd into my instance, but I want to understand what happens if there's a .aws/credentials file on my actual EC2 instance, and whether or not this is dangerous. I'm just not sure how this is all supposed to be structured, or what a safe/correct way of running my script and accessing secret variables is.
Any insight at all is helpful!
I suspect the answer you're looking for looks something like:
Create an IAM policy which allows access to the SSM parameter (why not use Secrets Manager?).
Attach that IAM policy to a role.
Attach the role to your EC2 instance (instance profile)
boto3 will then automatically pick up temporary credentials from the instance metadata service whenever it needs to talk to the Parameter Store.
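Once the instance profile is attached, the original script works unchanged with no credentials file on the instance. A minimal sketch, with WithDecryption added on the assumption that the parameter may be a SecureString:
# No keys anywhere: boto3 falls back to the instance profile credentials
# served by the EC2 instance metadata service.
import boto3

ssm = boto3.client('ssm', region_name='us-west-1')

# WithDecryption only matters if the parameter is a SecureString; it is ignored otherwise.
response = ssm.get_parameter(Name='/test/cli-parameter', WithDecryption=True)
print(response['Parameter']['Value'])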
I want to delete some files from S3 after a certain time. I need to set a time limit for each object, not for the bucket. Is that possible?
I am using boto3 to upload the file into S3.
import os

import boto3
from boto3.s3.transfer import S3Transfer

region = "us-east-2"
bucket = os.environ["S3_BUCKET_NAME"]
credentials = {
    'aws_access_key_id': os.environ["AWS_ACCESS_KEY"],
    'aws_secret_access_key': os.environ["AWS_ACCESS_SECRET_KEY"]
}
client = boto3.client('s3', region_name=region, **credentials)
transfer = S3Transfer(client)
transfer.upload_file(file_name, bucket, folder + file_name,
                     extra_args={'ACL': 'public-read'})
Above is the code I used to upload the object.
You have many options here. Some ideas:
You can automatically delete files after a given time period by using Amazon S3 Object Lifecycle Management. See: How Do I Create a Lifecycle Policy for an S3 Bucket?
If your requirements are more detailed (eg different files after different time periods), you could add a tag to each object specifying when you'd like the object deleted, or after how many days it should be deleted. Then, you could define an Amazon CloudWatch Events rule to trigger an AWS Lambda function at regular intervals (eg once a day or once an hour). The Lambda function would look at the tags on objects, determine whether they should be deleted, and delete the desired objects. You will find examples of this on the Internet, often called a Stopinator. A rough sketch of this approach follows the last option below.
If you have an Amazon EC2 instance that is running all the time for other work, then you could simply create a cron job or Scheduled Task to run a similar program (without using AWS Lambda).
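Here is a rough sketch of the tag-based approach from the second option, assuming each object carries a 'delete-after' tag holding an ISO date; the tag name and bucket name are assumptions, not fixed conventions:
# Scheduled Lambda that deletes objects whose 'delete-after' tag date has passed.
# Assumes the tag value is an ISO date such as 2019-06-01.
import datetime
import boto3

BUCKET = 'files'      # assumed bucket name
s3 = boto3.client('s3')

def lambda_handler(event, context):
    today = datetime.date.today()
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get('Contents', []):
            tags = s3.get_object_tagging(Bucket=BUCKET, Key=obj['Key'])['TagSet']
            tag_map = {t['Key']: t['Value'] for t in tags}
            if 'delete-after' in tag_map:
                if datetime.date.fromisoformat(tag_map['delete-after']) <= today:
                    s3.delete_object(Bucket=BUCKET, Key=obj['Key'])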
I created an endpoint in AWS SageMaker and it works well. I created a Lambda function (Python 3.6) that takes files from S3, invokes the endpoint and then puts the output in a file in S3.
I wonder if I can create the endpoint on every event (a file uploaded to an S3 bucket) and then delete the endpoint.
Yes, you can. Use an S3 event notification for object-created events to invoke a Lambda function that creates the SageMaker endpoint.
This example shows how to make an object-created event trigger a Lambda function:
https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
You can use the Python SDK (boto3) to create the SageMaker endpoint:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint
But creating the endpoint can be slow, so you may need to wait for it to be in service before invoking it.
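A rough sketch of the create / wait / invoke / delete cycle, assuming an endpoint configuration already exists; the endpoint and config names below are placeholders:
# Sketch of creating an endpoint on demand, waiting until it is usable,
# invoking it, then deleting it. Names and payload are placeholders.
import boto3

sm = boto3.client('sagemaker')
runtime = boto3.client('sagemaker-runtime')

endpoint_name = 'my-temporary-endpoint'                      # placeholder
sm.create_endpoint(EndpointName=endpoint_name,
                   EndpointConfigName='my-endpoint-config')  # assumed to already exist

# Creation typically takes several minutes; the waiter polls until the endpoint is InService.
sm.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)

response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType='text/csv',
                                   Body=b'1.0,2.0,3.0')      # example payload
print(response['Body'].read())

sm.delete_endpoint(EndpointName=endpoint_name)
Note that waiting for the endpoint inside a single Lambda invocation can run up against the Lambda timeout, so the create and invoke steps may need to be split.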
I am looking for a way to set up a Lambda trigger, where if any files are uploaded into the S3 bucket, the trigger will push/copy the file(s) to the SFTP server.
You can follow these steps:
Log in to the AWS console,
go to the Lambda service,
and create a new Lambda function (choose a proper runtime).
Then you have to set a trigger (in the left pane). If you want to trigger your Lambda from an S3 bucket, click on S3.
Then you have to configure your trigger (choose your bucket and trigger action).
After you complete all of these steps, it's time to write the handler code.
Do not forget that every trigger has to be in the same region as the Lambda function.
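A rough sketch of such a handler using the paramiko library (which would have to be bundled with the deployment package); the SFTP host, credentials and remote path below are placeholders:
# Sketch of a handler that copies each newly created object to an SFTP server.
# paramiko is not in the Lambda runtime and must be packaged with the function.
import os
import boto3
import paramiko

s3 = boto3.client('s3')

SFTP_HOST = 'sftp.example.com'               # placeholder
SFTP_USER = 'user'                           # placeholder
SFTP_PASS = os.environ.get('SFTP_PASSWORD')  # better kept out of the code

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        local_path = '/tmp/' + key.split('/')[-1]   # Lambda can only write under /tmp
        s3.download_file(bucket, key, local_path)

        transport = paramiko.Transport((SFTP_HOST, 22))
        transport.connect(username=SFTP_USER, password=SFTP_PASS)
        sftp = paramiko.SFTPClient.from_transport(transport)
        try:
            sftp.put(local_path, '/upload/' + os.path.basename(local_path))
        finally:
            sftp.close()
            transport.close()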
I have a Lambda function which SSHes into an EC2 instance and runs some commands. This Lambda function is triggered from an SNS topic, and the SNS topic is integrated with a CloudWatch alarm. I am using Python 2.7 in the Lambda function and followed this thread https://aws.amazon.com/blogs/compute/scheduling-ssh-jobs-using-aws-lambda/. Is it possible to get the public IP address of the EC2 instance that actually triggered the alarm?
It depends on the CloudWatch Alarm you are using to trigger the SNS publish.
My suggestion is to print out the entire event dictionary in your function and check whether there is any mention of the EC2 instance ID.
In the case of a CloudWatch EC2 alarm (e.g. CPU usage) you'll find the instance ID in the metric Dimensions.
# Python example (inside the Lambda handler, where `event` is the SNS event)
import json

message = json.loads(event['Records'][0]['Sns']['Message'])
# Each dimension is a dict like {'name': 'InstanceId', 'value': 'i-...'}
instance_id = message['Trigger']['Dimensions'][0]['value']
If you have the instance ID you can easily retrieve the instance public IP using boto3 as follows:
# Python example
import boto3

instance_id = 'xxxx'  # This is the instance ID from the event
ec2 = boto3.client('ec2')
response = ec2.describe_instances(InstanceIds=[instance_id])
public_ip = response['Reservations'][0]['Instances'][0]['PublicIpAddress']
Finally, as you are performing SSH from a Lambda function to your EC2 instance, keep in mind that Lambda functions outside a VPC get a dynamic public IP, so it is impossible to restrict SSH in your EC2 instance's security group to a fixed source. Leaving SSH open to the entire world is not good practice from a security perspective.
I suggest running both the EC2 instance and the Lambda function in a VPC, restricting SSH access to your EC2 instances to the Lambda function's security group only. In that case you'll need to retrieve the private IP address rather than the public one to SSH into your instance (the Python logic is the same as above; the only difference is that you use 'PrivateIpAddress' instead of 'PublicIpAddress'). This is far more secure than going over the public internet.
I hope this helps.
G