I occasionally have really high-CPU intensive tasks. They are launched into a separate high-intensity queue, that is consumed by a really large machine (lots of CPUs, lots of RAM). However, this machine only has to run about one hour per day.
I would like automate deployment of this image on AWS, to be triggered by outstanding messages in the high-intensity queue, and then safely stopped once it is not busy. Something along the lines of:
Some agent (presumably my own software running on my monitor server) checks the queue size, determines there are x > x_threshold new jobs to be done (e.g. I want to trigger if there are 5 outstanding "big" jobs")
A specific AWS instance is started, registers itself with the broker (RabbitMQ) and consumes the jobs
Once the worker has been idle for some t > t_idle (say, longer than 10 minutes), the machine is shut down.
Are there any tools that can I use for this, to ease the automation process, or am I going to have to bootstrap everything myself?
You can public a custom metric to AWS CloudWatch, then set up an autoscale trigger and scaling policy based on your custom metrics. Autoscale can start the instance for you and will kill it based on your policy. You'll have to include the appropriate user data in the launch configuration to bootstrap your host. Just like userdata for any EC2 instance, it could be a bash script or ansible playbook or whatever your config management tool of choice is.
Maybe overkill for your scenario, but as a starting point you may want to check out AWS OpsWorks.
http://aws.amazon.com/opsworks/
http://aws.amazon.com/opsworks/faqs/
if that is indeed a bit higher level than you need, you could use aws cloudformation - perhaps a bit 'closer to the metal' for what you want.
http://aws.amazon.com/cloudformation/
Related
I have been deploying apps to Kubernetes for the last 2 years. And in my org, all our apps(especially stateless) are running in Kubernetes. I still have a fundamental question, just because very recently we found some issues with respect to our few python apps.
Initially when we deployed, our python apps(Written in Flask and Django), we ran it using python app.py. It's known that, because of GIL, python really doesn't have support for system threads, and it will only serve one request at a time, but in case the one request is CPU heavy, it will not be able to process further requests. This is causing sometimes the health API to not work. We have observed that, at this moment, if there is a single request which is not IO and doing some operation, we will hold the CPU and cannot process another request in parallel. And since it's only doing fewer operations, we have observed there is no increase in the CPU utilization also. This has an impact on how HorizontalPodAutoscaler works, its unable to scale the pods.
Because of this, we started using uWSGI in our pods. So basically uWSGI can run multiple pods under the hood and handle multiple requests in parallel, and automatically spin new processes on demand. But here comes another problem, that we have seen, uwsgi is lacking speed in auto-scaling the process tocorrected serve the request and its causing HTTP 503 errors, Because of this we are unable to serve our few APIs in 100% availability.
At the same time our all other apps, written in nodejs, java and golang, is giving 100% availability.
I am looking at what is the best way by which I can run a python app in 100%(99.99) availability in Kubernetes, with the following
Having health API and liveness API served by the app
An app running in Kubernetes
If possible without uwsgi(Single process per pod is the fundamental docker concept)
If with uwsgi, are there any specific config we can apply for k8s env
We use Twisted's WSGI server with 30 threads and it's been solid for our Django application. Keeps to a single process per pod model which more closely matches Kubernetes' expectations, as you mentioned. Yes, the GIL means only one of those 30 threads can be running Python code at time, but as with most webapps, most of those threads are blocked on I/O (usually waiting for a response from the database) the vast majority of the time. Then run multiple replicas on top of that both for redundancy and to give you true concurrency at whatever level you need (we usually use 4-8 depending on the site traffic, some big ones are up to 16).
I have exactly the same problem with a python deployment running the Flask application. Most api calls are handled in a matter of seconds, but there are some cpu intensive requests that acquire GIL for 2 minutes.... The pod keep accepting requests, ignores the configured timeouts, ignores a closed connection by the user; then after 1 minute of liveness probes failing, the pod is restarted by kubelet.
So 1 fat request can dramatically drop the availability.
I see two different solutions:
have a separate deployment that will host only long running api calls; configure ingress to route requests between these two deployments;
using multiprocessing handle liveness/readyness probes in a main process, every other request must be handled in the child process;
There are pros and cons for each solution, maybe I will need a combination of both. Also if I need a steady flow of prometheus metrics, I might need to create a proxy server on the application layer (1 more container on the same pod). Also need to configure ingress to have a single upstream connection to python pods, so that long running request will be queued, whereas short ones will be processed concurrently (yep, python, concurrency, good joke). Not sure tho it will scale well with HPA.
So yeah, running production ready python rest api server on kubernetes is not a piece of cake. Go and java have a much better ecosystem for microservice applications.
PS
here is a good article that shows that there is no need to run your app in kubernetes with WSGI
https://techblog.appnexus.com/beyond-hello-world-modern-asynchronous-python-in-kubernetes-f2c4ecd4a38d
PPS
Im considering to use prometheus exporter for flask. Looks better than running a python client in a separate thread;
https://github.com/rycus86/prometheus_flask_exporter
I have a Python program that I am running as a Job on a Kubernetes cluster every 2 hours. I also have a webserver that starts the job whenever user clicks a button on a page.
I need to ensure that at most only one instance of the Job is running on the cluster at any given time.
Given that I am using Kubernetes to run the job and connecting to Postgresql from within the job, the solution should somehow leverage these two. I though a bit about it and came with the following ideas:
Find a setting in Kubernetes that would set this limit, attempts to start second instance would then fail. I was unable to find this setting.
Create a shared lock, or mutex. Disadvantage is that if job crashes, I may not unlock before quitting.
Kubernetes is running etcd, maybe I can use that
Create a 'lock' table in Postgresql, when new instance connects, it checks if it is the only one running. Use transactions somehow so that one wins and proceeds, while others quit. I have not yet thought this out, but is should work.
Query kubernetes API for a label I use on the job, see if there are some instances. This may not be atomic, so more than one instance may slip through.
What are the usual solutions to this problem given the platform choice I made? What should I do, so that I don't reinvent the wheel and have something reliable?
A completely different approach would be to run a (web) server that executes the job functionality. At a high level, the idea is that the webserver can contact this new job server to execute functionality. In addition, this new job server will have an internal cron to trigger the same functionality every 2 hours.
There could be 2 approaches to implementing this:
You can put the checking mechanism inside the jobserver code to ensure that even if 2 API calls happen simultaneously to the job server, only one executes, while the other waits. You could use the language platform's locking features to achieve this, or use a message queue.
You can put the checking mechanism outside the jobserver code (in the database) to ensure that only one API call executes. Similar to what you suggested. If you use a postgres transaction, you don't have to worry about your job crashing and the value of the lock remaining set.
The pros/cons of both approaches are straightforward. The major difference in my mind between 1 & 2, is that if you update the job server code, then you might have a situation where 2 job servers might be running at the same time. This would destroy the isolation property you want. Hence, database might work better, or be more idiomatic in the k8s sense (all servers are stateless so all the k8s goodies work; put any shared state in a database that can handle concurrency).
Addressing your ideas, here are my thoughts:
Find a setting in k8s that will limit this: k8s will not start things with the same name (in the metadata of the spec). But anything else goes for a job, and k8s will start another job.
a) etcd3 supports distributed locking primitives. However, I've never used this and I don't really know what to watch out for.
b) postgres lock value should work. Even in case of a job crash, you don't have to worry about the value of the lock remaining set.
Querying k8s API server for things that should be atomic is not a good idea like you said. I've used a system that reacts to k8s events (like an annotation change on an object spec), but I've had bugs where my 'operator' suddenly stops getting k8s events and needs to be restarted, or again, if I want to push an update to the event-handler server, then there might be 2 event handlers that exist at the same time.
I would recommend sticking with what you are best familiar with. In my case that would be implementing a job-server like k8s deployment that runs as a server and listens to events/API calls.
I have a Celery Task-Manager to crunch some numbers for company analytics.
The Task-Manager and workers are hosted on an Amazon EC2 Linux Server.
I need to set up the system such if we send too many tasks to celery Amazon automatically sets up a new EC2 instance to run more workers and balances the load across these workers.
The services that I'm aware exist are the Amazon Autoscale and Amazon Load balancing services which seem like exactly what I want to use however, I'm not sure what the best way to configure the Celery is.
I think that I ought to have a celery "master" which is collecting all the tasks and a number of celery workers which execute them. As the number of tasks increases I want to add more workers. The way the autoscale works (by taking an AMI of the celery server) I think that I'm currently cloning the Master as well as the workers which seems like not what I want to do.
How do I organise this to achieve my end goal which is flexible autoscaling task management using Celery to manage the tasks and Amazon Web Service to host the computing.
As much detail as possible in any answers (or links to tutorials!) would be greatly appreciated as most tutorials or advice seems to assume large quantities of knowledge which I don't currently have!
You do not need a master-worker architecture to get this to work. If I understand your question correctly, you want to be able to scale based on queue size. I would say it will be easier if you have the following steps
Setup elasticache/sqs for the broker (since you're in aws)
For custom scaling - A periodic task which checks queue sizes using something like this OR add amazon autoscaling to just add/remove machines when CPU usage is high (assuming that that is a good enough indication of load). Also, start workers with --autoscale so that the CPU usage gets reflected correctly.
(ubuntu 12.04). I envision some sort of Queue to put thousands of tasks, and have the ec2 isntances plow through it in parallel (100 ec2 instances) where each instance handles one task from the queue.
Also, each ec2 instance to use the image I provide, which will have the binaries and software installed on it for use.
Essentially what I am trying to do is, run 100 processing (a python function using packages that depend on binaries installed on that image) in parallel on Amazon's EC2 for an hour or less, shut them all off, and repeat this process whenever it is needed.
Is this doable? I am using Python Boto to do this.
This is doable. You should look into using SQS. Jobs are placed on a queue and the worker instances pop jobs off the queue and perform the appropriate work. As a job is completed, the worker deletes the job from the queue so no job is run more than once.
You can configure your instances using user-data at boot time or you can bake AMIs with all of your software pre-installed. I recommend Packer for baking AMIs as it works really well and is very scriptable so your AMIs can be rebuilt consistently as things need to be changed.
For turning on and off lots of instances, look into using AutoScaling. Simply set the group's desired capacity to the number of worker instances you want running and it will take care of the rest.
This sounds like it might be easier to with EMR.
You mentioned in comments you are doing computer vision. You can make your job hadoop friendly by preparing a file where each line a base64 encoding of the image file.
You can prepare a simple bootstrap script to make sure each node of the cluster has your software installed. Hadoop streaming will allow you to use your image processing code as is for the job (instead of rewriting in java).
When your job is over, the cluster instances will be shut down. You can also specify your output be streamed directly to an S3 bucket, its all baked in. EMR is also cheap, 100 m1.medium EC2 instances running for an hour will only cost you around 2 dollars according to the most recent pricing: http://aws.amazon.com/elasticmapreduce/pricing/
I want to write a long running process (linux daemon) that serves two purposes:
responds to REST web requests
executes jobs which can be scheduled
I originally had it working as a simple program that would run through runs and do the updates which I then cron’d, but now I have the added REST requirement, and would also like to change the frequency of some jobs, but not others (let’s say all jobs have different frequencies).
I have 0 experience writing long running processes, especially ones that do things on their own, rather than responding to requests.
My basic plan is to run the REST part in a separate thread/process, and figured I’d run the jobs part separately.
I’m wondering if there exists any patterns, specifically python, (I’ve looked and haven’t really found any examples of what I want to do) or if anyone has any suggestions on where to begin with transitioning my project to meet these new requirements.
I’ve seen a few projects that touch on scheduling, but I’m really looking for real world user experience / suggestions here. What works / doesn’t work for you?
If the REST server and the scheduled jobs have nothing in common, do two separate implementations, the REST server and the jobs stuff, and run them as separate processes.
As mentioned previously, look into existing schedulers for the jobs stuff. I don't know if Twisted would be an alternative, but you might want to check this platform.
If, OTOH, the REST interface invokes the same functionality as the scheduled jobs do, you should try to look at them as two interfaces to the same functionality, e.g. like this:
Write the actual jobs as programs the REST server can fork and run.
Have a separate scheduler that handles the timing of the jobs.
If a job is due to run, let the scheduler issue a corresponding REST request to the local server.
This way the scheduler only handles job descriptions, but has no own knowledge how they are implemented.
It's a common trait for long-running, high-availability processes to have an additional "supervisor" process that just checks the necessary demons are up and running, and restarts them as necessary.
One option is to simply choose a lightweight WSGI server from this list:
http://wsgi.org/wsgi/Servers
and let it do the work of a long-running process that serves requests. (I would recommend Spawning.) Your code can concentrate on the REST API and handling requests through the well defined WSGI interface, and scheduling jobs.
There are at least a couple of scheduling libraries you could use, but I don't know much about them:
http://sourceforge.net/projects/pycron/
http://code.google.com/p/scheduler-py/
Here's what we did.
Wrote a simple, pure-wsgi web application to respond to REST requests.
Start jobs
Report status of jobs
Extended the built-in wsgiref server to use the select module to check for incoming requests.
Activity on the socket is ordinary REST request, we let the wsgiref handle this.
It will -- eventually -- call our WSGI applications to respond to status and
submit requests.
Timeout means that we have to do two things:
Check all children that are running to see if they're done. Update their status, etc.
Check a crontab-like schedule to see if there's any scheduled work to do. This is a SQLite database that this server maintains.
I usually use cron for scheduling. As for REST you can use one of the many, many web frameworks out there. But just running SimpleHTTPServer should be enough.
You can schedule the REST service startup with cron #reboot
#reboot (cd /path/to/my/app && nohup python myserver.py&)
The usual design pattern for a scheduler would be:
Maintain a list of scheduled jobs, sorted by next-run-time (as Date-Time value);
When woken up, compare the first job in the list with the current time. If it's due or overdue, remove it from the list and run it. Continue working your way through the list this way until the first job is not due yet, then go to sleep for (next_job_due_date - current_time);
When a job finishes running, re-schedule it if appropriate;
After adding a job to the schedule, wake up the scheduler process.
Tweak as appropriate for your situation (eg. sometimes you might want to re-schedule jobs to run again at the point that they start running rather than finish).