I need to run some Python code on the AWS platform periodically (probably once a day). The job is to connect to S3, download some files from a bucket, do some calculations, and upload the results back to S3. The program runs for about an hour, so I cannot use a Lambda function, which has a maximum execution time of 900 s (15 minutes).
I am considering using EC2 for this task. I plan to set up the Python code to run at startup, so it executes as soon as the EC2 instance is powered on; the script also shuts down the instance once the task is complete. The periodic restart of this EC2 instance will be handled by a Lambda function.
Though this is not the best approach, I want to know about any alternatives within the AWS platform (services other than EC2) that would be a better fit for this job.
If you are looking for solutions other than Lambda and EC2 (which, depending on the scenario, can fit), you could use ECS (Fargate).
It's a great choice for microservices or small tasks. You build a Docker image with your code (Python, Node, etc.), tag it, and push the image to AWS ECR. Then you create a cluster for it and schedule the task with CloudWatch, or you can invoke a task directly using the CLI or another AWS resource (see the sketch after the points below).
You don't have time limitations like Lambda.
You also don't have to set up the instance, because your dependencies are managed by the Dockerfile.
And, if needed, you can take advantage of the EBS volume attached to ECS (20-30 GB root) and increase it from there, with the possibility of using EFS for tasks as well.
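For illustration, here is a minimal sketch of invoking a Fargate task from Python with boto3. The cluster name, task definition, and subnet ID are placeholders, not values from the question:

```python
import boto3

ecs = boto3.client("ecs")

# Run a one-off Fargate task; an EventBridge/CloudWatch scheduled rule
# can invoke the same task definition on a daily schedule instead.
response = ecs.run_task(
    cluster="my-cluster",                 # hypothetical cluster name
    launchType="FARGATE",
    taskDefinition="my-task:1",           # hypothetical task definition
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],  # placeholder subnet ID
            "assignPublicIp": "ENABLED",
        }
    },
)
print(response["tasks"][0]["taskArn"])
```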
I could point to other solutions, but they are far too complex for the task you are planning, and the goal is always to use the right service for the job.
Hopefully this helps!
Using EC2 or Fargate may be significant overkill. Creating a simple AWS Glue job triggered by a Lambda function (running once per day) is most likely your easiest route. It can pull from S3, open the selected files (if required), do some calculations on their contents, and push the results back to S3, using Python and the AWS boto3 library (plus other standard Python file-reading libraries if necessary).
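As a rough sketch, the body of such a Glue job could be plain boto3 code like the following; the bucket and key names are made up for illustration:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-data-bucket"  # hypothetical bucket name

# Pull the input file from S3.
obj = s3.get_object(Bucket=BUCKET, Key="input/data.csv")
rows = obj["Body"].read().decode("utf-8").splitlines()

# Do some calculations on the file contents -- here, summing the second column.
total = sum(float(line.split(",")[1]) for line in rows[1:])

# Push the result back to S3.
s3.put_object(Bucket=BUCKET, Key="output/result.txt",
              Body=str(total).encode("utf-8"))
```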
See this SO question for an example and solution.
Good luck!
I've created a Python script that grabs information from an API and sends it in an email. I'd like to automate this process so it runs on a daily basis, say at 9 AM.
The servers must be asleep whenever this automation is not running.
What is the easiest way to achieve this?
Note: I'm on the free tier of AWS.
Cloud9 is an IDE that lets you write, run, and debug your code with just a browser.
"It preconfigures the development environment with all the SDKs, libraries, and plug-ins needed for serverless development. Cloud9 also provides an environment for locally testing and debugging AWS Lambda functions. This allows you to iterate on your code directly, saving you time and improving the quality of your code."
Okay, for the requirement you have posted:
There are two ways of achieving this.
On a local system, use the cron job scheduler daemon to run the script. Here is a tutorial for cron.
The same thing can also be achieved with a Lambda function. Lambda only runs when it is triggered, consuming compute resources only for the time it is invoked, so your servers are asleep the rest of the time (technically, you are not provisioning any server for Lambda at all).
Convert your script into a function for Lambda, and then use the EventBridge service, where you can specify a cron expression to run your script every day at 9 AM. I wrote an article on the same topic; it may help.
Note: for the email service you can use SES (https://aws.amazon.com/ses/); my article uses SES. A minimal sketch of such a function follows.
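As a sketch only: the API URL and e-mail addresses below are placeholders, and the sender must be a verified SES identity. The function would be triggered by an EventBridge rule with a cron expression such as cron(0 9 * * ? *) for 9 AM UTC daily:

```python
import json
import urllib.request

import boto3

ses = boto3.client("ses")

API_URL = "https://api.example.com/data"  # hypothetical API endpoint
SENDER = "me@example.com"                 # must be verified in SES
RECIPIENT = "you@example.com"             # placeholder recipient

def lambda_handler(event, context):
    # Grab the information from the API.
    with urllib.request.urlopen(API_URL) as resp:
        data = json.loads(resp.read())

    # Send it in an email via SES.
    ses.send_email(
        Source=SENDER,
        Destination={"ToAddresses": [RECIPIENT]},
        Message={
            "Subject": {"Data": "Daily API report"},
            "Body": {"Text": {"Data": json.dumps(data, indent=2)}},
        },
    )
```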
To schedule events, you'd need a Lambda function with CloudWatch Events, as follows: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html
Cloud9 is an IDE.
I have a Python ETL project that I want to run on a schedule. I am a bit lost on where to start. I have seen some tutorials using Heroku and AWS Lambda, but these were all single script files. My main script references multiple packages that are all in the same project directory. Am I able to deploy the entire project and have it run the main script on a schedule? If so, what tools/services should I be looking at?
See Lambda Scheduled Events. You can create a Lambda function and direct AWS Lambda to execute it on a regular schedule. You can specify a fixed rate (for example, execute a Lambda function every hour or every 15 minutes), or you can specify a cron expression.
Be aware of the Lambda package size limits.
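To make the multi-package point concrete: if you zip the whole project directory and upload it as the deployment package, the handler at the zip root can import sibling packages as usual. A sketch, where the package names extract/transform/load are hypothetical stand-ins for your own directories:

```python
# lambda_function.py at the root of the deployment .zip
from extract import pull_data       # hypothetical sibling package
from transform import clean         # hypothetical sibling package
from load import write_results      # hypothetical sibling package

def lambda_handler(event, context):
    raw = pull_data()
    write_results(clean(raw))
    return {"status": "ok"}
```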
I have some Python scripts that require a powerful machine. Instead of starting instances manually on GCP or AWS and then making sure all the Python libraries are installed, can I do it through Python, for example, so that the instance is on only for the time needed to run the script?
If you're on AWS, you could just create Lambda functions for your scripts and put them on a timer, or use CloudWatch to trigger them.
In both AWS and Google Cloud, you can do just about anything via a programming language, including Python.
Last year, AWS announced hibernation for EC2 instances, which lets you pause and resume them. This feature allows you to set up and configure an EC2 instance and, when you are finished with your data processing, put the instance to sleep. You then pay only for storage and IP address costs.
New – Hibernate Your EC2 Instances
Google has also announced alpha support for suspending Compute Engine instances, but this feature is not generally available today; you must apply to use it.
Another option supported by both AWS and Google today is instance templates. These allow you to create a template with all the options you want, such as installing packages on startup. You can then launch a new custom instance from the console, the CLI, or your favorite programming language, and stop or terminate it when your task is complete (see the sketch below).
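On AWS, for example, launching from a pre-built launch template and terminating afterwards might look roughly like this in boto3; the template name and the waiting logic are illustrative:

```python
import boto3

ec2 = boto3.client("ec2")

# Launch one instance from a pre-configured launch template.
resp = ec2.run_instances(
    LaunchTemplate={"LaunchTemplateName": "my-processing-template"},  # hypothetical
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]

# ... run your task, then clean up when it is done:
ec2.terminate_instances(InstanceIds=[instance_id])
```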
Of course, there is also the standard method: launch an instance, configure it as required, and then stop it. When you need processing power, start the instance, process your data, and stop it again. The difference between this method and hibernating an instance is that the total time to start is faster with resume. It's sort of like your laptop: close the lid and it goes to sleep; open the lid and you have almost instant-on.
If you are fortunate enough to have a running Kubernetes cluster, you can do everything with a container and launch the container via the CLI. The container will automatically stop once it finishes its task.
You can have a script that invokes the AWS CLI to start an instance, connect to it and run the script via SSH, and then terminate the instance. See the AWS CLI documentation here: https://docs.aws.amazon.com/cli/latest/userguide/cli-ec2-launch.html. A boto3 sketch of the same idea follows.
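Here is a hedged boto3 equivalent that uses a user-data script so the instance runs the job and terminates itself; the AMI ID and script location are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# The user-data script runs at boot; `shutdown` terminates the instance
# because of InstanceInitiatedShutdownBehavior="terminate" below.
user_data = """#!/bin/bash
aws s3 cp s3://my-bucket/job.py /tmp/job.py   # hypothetical script location
python3 /tmp/job.py
shutdown -h now
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t3.medium",
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
    InstanceInitiatedShutdownBehavior="terminate",
)
```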
An alternative to #bspeagle's suggestion of AWS Lambda is GCP's Cloud Functions.
I have a Python script that collects data and stores it in a DynamoDB table.
I would like to figure out a way to get this script to run daily, via AWS or a dedicated Windows server.
I am new to AWS and this type of thing. I am open to all solutions available for something like this.
Any ideas?
You can use AWS Lambda to run your script. To make the script run periodically, you can use a CloudWatch Events rule.
Find out how to schedule AWS Lambda functions using CloudWatch Events here.
Find schedule expressions using rate or cron here.
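For a rough idea, the Lambda side could be as small as this, triggered by a rule with a schedule expression such as rate(1 day); the table name and the collected data are placeholders:

```python
from datetime import datetime, timezone

import boto3

table = boto3.resource("dynamodb").Table("my-results-table")  # hypothetical table

def lambda_handler(event, context):
    # Collect the data (replace with your real collection logic).
    data = {"value": 42}

    # Store it in DynamoDB under a timestamp key.
    table.put_item(Item={"id": datetime.now(timezone.utc).isoformat(), **data})
```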
Running scripts at a set time is easily done on Linux with cron jobs.
This answer explains it: How to cron job setup in Amazon ec2.
In short, it is no different from running crons on other Linux instances.
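For example, a crontab entry (added with crontab -e; the script path is a placeholder) that runs a Python script every day at 09:00 looks like:

```
# minute hour day-of-month month day-of-week  command
0 9 * * * /usr/bin/python3 /home/ec2-user/myscript.py
```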
I have 3 Python scripts that work perfectly individually on my EC2 machine. Now I want all 3 of them to run one after the other automatically using AWS Step Functions. How can I possibly do that? I have done my part of the research, going through almost all of the official AWS documentation, but couldn't find anything that could help me out.
The scripts must be adapted to work with AWS Step Functions, as described in the FAQ:
Q: How does AWS Step Functions work with Amazon EC2 and other compute resources?

All work in your state machine is done by tasks. A task may be an Activity, which can consist of any code in any language. Activities can be hosted on Amazon EC2, Amazon ECS, mobile devices—basically any computer that can communicate with the AWS Step Functions API. Activities long-poll Step Functions using API calls to request work, receive input data, do the work, and return a result.
An example in Ruby is on this page: Example Activity Worker in Ruby.
For Python, you can try boto3.
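Here is a minimal Python activity-worker sketch with boto3, mirroring the long-poll pattern described in the FAQ; the activity ARN is a placeholder:

```python
import boto3

sfn = boto3.client("stepfunctions")
ACTIVITY_ARN = "arn:aws:states:us-east-1:123456789012:activity:my-activity"  # placeholder

while True:
    # Long-poll Step Functions for work (blocks for up to ~60 seconds).
    task = sfn.get_activity_task(activityArn=ACTIVITY_ARN, workerName="ec2-worker")
    token = task.get("taskToken")
    if not token:
        continue  # poll timed out with no work; ask again

    try:
        # Do the actual work here, e.g. invoke one of your three scripts.
        output = '{"result": "done"}'
        sfn.send_task_success(taskToken=token, output=output)
    except Exception as exc:
        sfn.send_task_failure(taskToken=token, error="WorkerError", cause=str(exc))
```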