How to host/build a CLI-like Python script on AWS? - python

This is a very inexperienced question. I've only ever deployed/hosted web applications so I don't understand what the system would have to look like if I want a CLI-like program hosted for anyone else to run.
It is a Python script and I want it on AWS (I will probably use a Docker container, ECS, and Terraform, so I suppose this is mainly a question about how to build the image).
The script takes flags/commands, prints to the terminal for a few minutes while it runs, and then stops once finished. How do I host/build this so that anyone can access it through their shell/terminal? Is some sort of server, akin to an HTTP server, required? There is no front end for it. And ideally many people should be able to run it at once, at any time.
EDIT: correction, there is no web GUI frontend... I add this to clear up my loose use of these terms. Is this, in principle, an API?

I would encourage you to look into AWS Lambda for something like this. AWS Lambda allows you to run code (including Python) without having to think about how the servers are configured. One of the main benefits is that you only pay for the time the code is actually executing, rather than paying for a virtual machine sitting idle.
As you mention, if you want to move towards an 'API' design where other users can call your service, you can use Amazon API Gateway in front of the Lambda code to handle this. There are really good examples of these design patterns here: https://serverlessland.com/patterns
Something like this one: https://serverlessland.com/patterns/apigw-lambda-cdk sounds like what you are looking for, but as mentioned by SecurityObscurity, you will need to think about authentication and authorisation. This could be handled by AWS Cognito or IAM authorisation depending on your use case.
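For concreteness, here is a minimal sketch of what the Lambda side could look like behind an API Gateway proxy integration. The my_tool.run entry point is a hypothetical stand-in for your script's existing logic, and the query-string parameters play the role of your CLI flags.

```python
import json

def lambda_handler(event, context):
    # With API Gateway's proxy integration, the caller's query-string
    # parameters arrive here; they stand in for the script's CLI flags.
    flags = event.get("queryStringParameters") or {}

    # Call the existing script logic instead of printing to a terminal.
    # output = my_tool.run(**flags)  # hypothetical entry point
    output = f"ran with flags: {flags}"

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"output": output}),
    }
```

Each concurrent caller gets their own Lambda invocation, which also covers the "many people can run this at once" requirement.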

Related

Run & scale simple python scripts on Google Cloud Platform

I have a simple Python script, and I would like to run thousands of instances of it on GCP at the same time. The script is triggered by the $Universe scheduler, with something like "python main.py --date '2022_01'".
What architecture and technologies should I use to achieve this?
PS: I cannot drop $Universe, but I'm open to suggestions for other technologies.
My solution:
I already have a $Universe server running all the time.
Create a Pub/Sub topic.
Create a permanent Compute Engine instance that listens to Pub/Sub all the time.
$Universe sends thousands of events to Pub/Sub.
The Compute Engine instance triggers the creation of a Python Docker image on another Compute Engine instance.
Scale the creation of the Docker images (I don't know how to do this).
Is this a good architecture?
How can I scale this kind of process?
Thank you :)
It can be very difficult to discuss architecture and design questions, as they are usually heavily dependent on the context, scope, functional and non-functional requirements, cost, available skills and knowledge, and so on...
Personally, I would prefer to stay with an entirely serverless approach if possible.
For example, use Cloud Scheduler (serverless cron jobs) to send messages to a Pub/Sub topic, with a Cloud Function (or something else) on the other side that is triggered by each message.
Whether it should be a Cloud Function or something else, and what exactly it should do and how, depends on your case.
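To make that concrete, here is a minimal sketch of a Pub/Sub-triggered Cloud Function (1st gen Python runtime). The main call is a hypothetical stand-in for the logic in your main.py, and the payload shape is an assumption.

```python
import base64
import json

def handle_message(event, context):
    # Pub/Sub delivers the message payload base64-encoded in event["data"].
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    date = payload.get("date")  # e.g. {"date": "2022_01"}

    # Run the existing script logic in-process instead of shelling out:
    # main(date=date)  # hypothetical entry point of main.py
    print(f"processing date={date}")
```

Each message spawns (or reuses) a function instance, so thousands of near-simultaneous messages give you the fan-out for free, up to the configured concurrency limits.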
As I understand it, you will have a lot of simultaneous calls to custom Python code, triggered by an orchestrator ($Universe), and you want to run it on GCP.
Like al-dann, I would go with a serverless approach in order to reduce cost.
As I also understand it, Pub/Sub does not seem necessary: you could easily trigger the function with a plain HTTP call and avoid Pub/Sub entirely.
Pub/Sub is only needed for a delivery guarantee (at-least-once processing), but you can get the same behaviour if $Universe validates the HTTP response for every call (check the response code and body, and retry if they don't match the expected result).
If you want exactly-once processing, you will need more tooling; you are getting close to event streaming (which could be a good fit, as I understand the use case). In that case, staying fully on GCP, I would go with Pub/Sub and Dataflow, which can guarantee exactly-once processing, or with Kafka plus Kafka Streams or Flink.
If at-least-once processing is fine for you, I would go with the HTTP version, which I think will be simpler to maintain. You have 3 serverless options for that case:
App Engine standard: scales to 0 and you pay for CPU usage; it can be more affordable than the function below if the requests are constrained to short periods (a few hours per day, since the same hardware processes many requests).
Cloud Functions: you pay per request (plus CPU, memory, network, ...) and don't have to think about anything other than the code, but the code runs on a proprietary platform.
Cloud Run: my preferred option, since the pricing is the same as Cloud Functions but you gain portability: the application is a plain Docker image that you can move easily (to Kubernetes, Compute Engine, ...) and you can change the execution engine depending on cost (if the load changes between the study and the real world).
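As an illustration of the Cloud Run option, here is a minimal sketch of an HTTP wrapper around the script. Flask and the PORT environment variable are the usual Cloud Run conventions; the run() call is a hypothetical stand-in for your main.py logic.

```python
import os
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def handle():
    date = request.args.get("date", "")
    # result = run(date=date)  # hypothetical: your main.py entry point
    return {"status": "done", "date": date}, 200

if __name__ == "__main__":
    # Cloud Run tells the container which port to listen on via $PORT.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

Package that with a small Dockerfile and the same image runs unchanged on Kubernetes or Compute Engine, which is exactly the portability argument above.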

Azure infrastructure for a Python script triggered by an HTTP request

I'm a bit lost in the jungle of documentation, offers, and services. I'm wondering what the infrastructure should look like, and it would be very helpful to get a nudge in the right direction.
We have a Python script with PyTorch that runs a prediction. The script has to be triggered by an HTTP request. Preferably, the samples to predict on should also come from the same requester. It has to return the prediction as fast as possible.
What is the best / easiest / fastest way of doing this?
We have the script lying in a Container Registry for now. Can we use it? Azure Kubernetes Service? Azure Container Instances (is this fast enough)?
And about the trigger, should we use Azure function, or logic app?
Thank you!
Azure Functions V2 has just launched a private preview for writing Functions using Python. You can find some instructions for how to play around with it here. This would probably be one of the most simple ways to execute this script with an HTTP request. Note that since it is in private preview, I would hesitate to recommend using it in a production scenario.
Another caveat to note with Azure Functions is that there will be a cold start whenever a new instance of your function application is created. This should be on the order of ~2-4 seconds, and should only happen on the first request after the application has not seen much traffic for a while, or when a new instance is created to scale your application up for more traffic. You can avoid the cold start by running your function on a dedicated App Service Plan, but at that point you lose a lot of the benefits of Azure Functions.
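For reference, a minimal HTTP-triggered function in the Azure Functions Python programming model looks roughly like the sketch below. The predict() call is a hypothetical stand-in for the PyTorch inference code; loading the model at module level (outside the handler) means warm instances reuse it instead of reloading it on every request.

```python
import json
import azure.functions as func

# Hypothetical: load the PyTorch model once at import time so warm
# instances reuse it instead of reloading on every request.
# model = load_model("model.pt")

def main(req: func.HttpRequest) -> func.HttpResponse:
    try:
        sample = req.get_json()  # the requester posts the sample to score
    except ValueError:
        return func.HttpResponse("Expected a JSON body", status_code=400)

    # prediction = predict(model, sample)  # hypothetical inference call
    prediction = {"label": "placeholder"}

    return func.HttpResponse(
        json.dumps(prediction),
        mimetype="application/json",
        status_code=200,
    )
```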

Google App Engine - run task on publish

I have been looking for a solution for my app that does not seem to be directly discussed anywhere. My goal is to publish an app and have it automatically reach out to a server I am working with. This just needs to be a simple POST. I have everything working fine and am currently solving this problem with a cron job, but it is not quite sufficient: I would like the job to execute automatically once the app has been published, not after a minute (or whatever interval it may be set to).
In concept, I am trying to have my app register itself with my server, and to do this I'd like it to run once on publish and never be run again.
Is there a solution to this problem? I have looked at Task Queues and am unsure if it is what I am looking for.
Any help will be greatly appreciated.
Thank you.
Personally, this makes more sense to me as a responsibility of your deploy process, rather than of the app itself. If you have your own deploy script, add the POST request there (after a successful deploy). If you use Google's command-line tools, you could wrap them in a script. If you use a 3rd-party tool for something like continuous integration, it probably has deploy hooks you could use for this purpose.
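As a sketch of that idea, a deploy wrapper could be as simple as the following; the registration URL is a hypothetical placeholder, and the gcloud invocation assumes App Engine's standard tooling.

```python
import subprocess
import urllib.request

# Deploy first; check=True aborts the script if the deploy fails,
# so a broken release is never registered.
subprocess.run(["gcloud", "app", "deploy", "--quiet"], check=True)

# Then fire the one-time registration POST at your server.
req = urllib.request.Request(
    "https://example.com/register",  # hypothetical endpoint
    data=b"{}",
    headers={"Content-Type": "application/json"},
    method="POST",
)
urllib.request.urlopen(req)
```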
The main question will be how to ensure it only runs once for a particular version.
Here is an outline of how you might approach it.
Create a HasRun model, which you use to store the version of the deployed app; the entity's existence indicates whether the one-time code has been run.
Make sure you increment your version whenever you deploy new code.
In your warmup handler or appengine_config.py, grab the deployed version,
then, in a transaction, try to fetch the HasRun entity by key (version number).
If you get the entity, don't run the one-time code.
If you cannot find it, create it and run the one-time code, either in a task (make sure the process is idempotent, as tasks can be retried) or in the warmup/front-facing request.
You will probably want to wrap all of that in a memcache CAS operation to provide a lock of sorts, to prevent other instances from trying to do the same thing.
Alternatively, if you want to use the task queue, consider naming the task after the version number; you can only submit a task with a particular name once.
It still needs to be idempotent (again, the task could be retried), but there will only ever be one task scheduled for that version - at least for a few weeks.
Or a combination/variation of all of the above.
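A sketch of the HasRun idea, assuming the legacy App Engine Python runtime and its ndb client; do_one_time_setup is the hypothetical one-time registration POST.

```python
import os
from google.appengine.ext import ndb

class HasRun(ndb.Model):
    pass  # the entity's existence is the only state we need

@ndb.transactional
def claim_version(version):
    # Keyed on the deployed version, so only one caller can create it.
    key = ndb.Key(HasRun, version)
    if key.get() is not None:
        return False  # already claimed: one-time code already ran
    HasRun(key=key).put()
    return True

def run_once():
    # CURRENT_VERSION_ID identifies the deployed version on legacy GAE.
    version = os.environ.get("CURRENT_VERSION_ID", "unknown")
    if claim_version(version):
        do_one_time_setup()  # hypothetical, idempotent registration POST
```

Because the get-then-put happens inside a transaction keyed on the version, two instances warming up at the same time cannot both claim it.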

Amazon EC2 Windows Ubuntu

I am new to AWS EC2, so I am making this post to ask a few questions.
1) Right now, I am considering running some scripts on the server. I usually use two tools. One is a piece of software that can only be used on Windows. The other is just Python. Should I open two instances, one for Windows and one for Ubuntu? Or just one Windows instance with Git Bash installed? I want this to be cost- and performance-efficient.
2) I am not going to use the scripts very often (usually 2-3 hours per day or 10-12 hours per week). Is it easy to schedule those jobs automatically across the instances? I mean, can an instance automatically turn off and restart at the appropriate times?
3) Some of the scripts involve web scraping. I am also wondering whether it is OK to switch IP addresses every time I run the script. This is mainly for the Python script.
Thanks.
1) Well, of course, the fewer instances you have, the less you will pay. Python can run on Windows; I just don't know how tricky it would be to make it work in your case. It all depends on what you are running and what your management requirements are. Scripting languages like Python were originally designed for Unix environments, so people usually run them on those kinds of systems, and running them on Windows may be a little unpleasant. Anyway, I don't think you should ask someone else; you should figure out for yourself what suits you best.
2) AWS doesn't have a scheduler for EC2 (stopping, starting, etc., at given dates/times/recurrences). It's something that I miss on it too. So, to achieve something like this, you have a few options.
Turn your temporary instance into an auto-scaling group of 1 instance, with scheduled policies to scale it in to zero instances and back out to 1 instance when you want. The problem with this approach: if you can't be sure how long your job will take to complete, you have a problem, of course, because those scheduled actions are based on fixed dates/times. One solution would be for the temporary instance itself to change the auto-scaling group configuration to zero instances via the API when it has finished. (In that case, you would only have a scheduled scale-out policy to launch the instance, leaving its termination to be done 'manually', via auto-scaling group configuration handling from inside the temporary instance.) But be aware that auto-scaling is very tricky for beginners, and you should go through the documentation before using it. (For example, each time your instances scale in and out, they are terminated, not just stopped, and you lose all data on them.)
Alternatively, skip the auto-scaling group, keep a regular instance, and schedule all those actions from outside it via the API - for example, from your Windows (master) instance. In this case, the master would start the temporary instance via the API, which would run its jobs and then turn itself off when finished. Otherwise, the master instance would have to keep polling the temporary one somehow to know when the jobs are done and it can be shut down from outside.
There are probably more complicated ways of doing this (Elastic Beanstalk crons, maybe).
I think, in this case, the simpler, the better. So I would stick with option 2). You will only need to figure out how to install and use the AWS CLI on Windows and how to manage IAM credentials and permissions so your CLI has enough access to do what it needs.
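The same calls are available from Python via boto3 (the AWS SDK) if you prefer that to the raw CLI. A sketch of option 2 from the master instance, with a placeholder instance ID and region:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"  # placeholder temporary instance

# Start the worker; it runs its jobs and shuts itself down when done.
ec2.start_instances(InstanceIds=[instance_id])

# Poll until the instance has stopped, i.e. the jobs have finished.
waiter = ec2.get_waiter("instance_stopped")
waiter.wait(InstanceIds=[instance_id])
print("jobs finished, temporary instance stopped")
```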
3) If you don't assign an Elastic IP to your instance, you will get a different IP each time you stop and start it, so this is the default behaviour and exactly what you want. With auto-scaling it is the only behaviour; you can't even assign a fixed IP to instances in a group.
I hope I could help you a little bit.

Examples for writing a daemon or a service in Linux

I have been looking at daemons for Linux, such as httpd, and have also looked at some code that can be used as a skeleton. I have done a fair amount of research and now I want to practice writing one. However, I'm not sure what I could use a daemon for. Any good examples/ideas that I could try to implement?
I was thinking of using a daemon along with libnotify on Ubuntu to have pop-up notifications of select tweets.
Is this a bad example for implementing a daemon?
Will you even need a daemon for this?
Can this be implemented as a service rather than a daemon?
First: PEP 3143 tries to enumerate all of the fiddly details you have to get right to write a daemon in Python. And it specifies a library that takes care of those details for you.
The PEP was deferred—at least in part because the community felt it was more a responsibility of POSIX or some Linux standards group or something to first define exactly what is essential to being a daemon, before Python could have its own position on how to implement one. But it's still a great guide. However, the reference implementation of that proposed library still lives on, as python-daemon, which you can install from PyPI.
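As a taste of what that library buys you, here is a minimal sketch; poll_tweets is a hypothetical stand-in for the tweet-watching loop.

```python
import time
import daemon

def poll_tweets():
    while True:
        # fetch selected tweets and fire libnotify pop-ups here
        time.sleep(60)

# DaemonContext handles the forking, session leadership, working
# directory, umask, and file-descriptor details PEP 3143 enumerates.
with daemon.DaemonContext():
    poll_tweets()
```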
Meanwhile, the really interesting question for this project isn't so much service vs. daemon, as root vs. user. Do you want a single process that keeps track of all users' twitter accounts, and sends notifications to anyone who's logged in? Just a per-user process? Or maybe both, a single process watching all the tweets, then sending notifications via user processes?
Of course you don't really need a daemon or service for this. For example, it could be a GUI app whose main window is a configuration dialog, which keeps running (maybe with a traybar thingy) even when you close the config dialog, and it would work just as well. The question isn't whether you need a daemon, but whether it's more appropriate. Which really is a design choice.
