Pyramid (waitress) behind IIS randomly stops working when using APScheduler - python

I'm running a Pyramid application on Windows using Waitress as the app server and IIS as the Web Server (proxy). When I run the application, it works for a (seemingly) random amount of time before it just stops. It can go for days, even weeks at a time and then just stop, leaving IIS to throw a 502 error. When it stops, there's no way to restart it short of restarting Windows.
It's a small application which uses APScheduler to hit a couple of APIs to sync inventory between eBay/Amazon. I'm not entirely sure what is causing this problem, as there is no error shown in the logs. I had an older version of the application running (without APScheduler) and I didn't have this issue, so I'm assuming it has to do with APScheduler.
Has anybody else experienced this?

First let me say that I have not used APScheduler myself yet, and have next to no knowledge about running Python servers (or anything else really) on Windows and IIS.
So I can only guess here, but it seems clear that your problem is related to APScheduler in some way. I could imagine that something goes wrong in a thread that APScheduler uses for your background tasks, and that the hanging thread brings down your whole application, for example by holding the GIL (Python's global interpreter lock). This could happen, for example, when your thread(s) run into some kind of race condition.
Maybe a run starts before the previous iteration has finished processing, or you build up a really big backlog that causes problems when processing finally starts.
Anyway, I think task queues are much better suited for background processing in web applications, because they run separately and out of the context of your web server.
You can schedule a task as soon as it's triggered by some user action, and it will be processed as soon as a worker is available, and not be deferred until a certain point in time.
I would recommend giving Celery a try, but there are other solutions available as well, many based on Redis.
Celery is really powerful, and has advanced features like periodic tasks and crontab style schedules - so you can probably use it for what you are doing with APScheduler now.
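For illustration, a periodic task with a crontab-style schedule might look roughly like this. This is a minimal sketch assuming a recent Celery (4+) with a Redis broker running locally; the task name sync_inventory and the 15-minute interval are invented, not taken from the question.

# tasks.py (illustrative name)
from celery import Celery
from celery.schedules import crontab

app = Celery("tasks", broker="redis://localhost:6379/0")

app.conf.beat_schedule = {
    "sync-inventory-every-15-minutes": {
        "task": "tasks.sync_inventory",  # module.function, assuming this file is tasks.py
        "schedule": crontab(minute="*/15"),
    },
}

@app.task
def sync_inventory():
    # hit the eBay/Amazon APIs and reconcile stock levels here
    ...

# start a worker with the embedded beat scheduler (fine for a single box):
#   celery -A tasks worker --beat --loglevel=info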
This looked very helpful for setting up Celery under Windows: http://mrtn.me/blog/2012/07/04/django-on-windows-run-celery-as-a-windows-service/
This may also prove helpful:
How to create Celery Windows Service?
Note: I haven't tried any of these myself, since I use Linux if I have a choice.
It's probably also possible to get APScheduler to work correctly, but I think it will be much easier to use Celery, because if a problem occurs in a worker you will be able to debug it much more easily than a problem occurring in a background thread. Celery can also be configured to automatically send you an email when an error occurs.

Related

Advice on running flask app ONLY locally forever

I want to create a web form that stays on forever on a single computer. Users can come to the computer, fill out the form and submit it. After submitting, it will record the responses in an Excel file and send emails. The next user can then come and fill out a new form automatically. I was planning on using Flask for this task since it is simple to create, but since I am not doing this on some production server, I will just have it running locally in development on the single computer.
I have never seen anyone do something like this with Flask, so I was wondering if my idea is possible or if I should avoid it. I am also new to web development, so I was wondering what problems there could be with keeping a Flask application running 24/7 on a local development computer.
Thanks
There is nothing wrong with doing this in principle; however, it is likely not the best solution in terms of time-to-reward payoff.
First, to answer your question: this could easily be done, even by a beginner with minimal Python and HTML experience, in a few hours. Your app could crash in the background for many reasons (running out of disk space, memory errors, etc.), but most likely you will be fine.
As for actually building it, it is all possible: there are libraries you can use to write the results to an Excel file, or you can simply append to a CSV (which is what I would recommend). Creating and sending an email is similarly straightforward, but again, doing it without Python would be much easier.
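To illustrate the CSV route, here is a minimal sketch of a Flask view that appends each submission to a file (the field names and file path are made up for the example):

import csv
from pathlib import Path

from flask import Flask, request

app = Flask(__name__)
RESPONSES = Path("responses.csv")  # illustrative path

@app.route("/submit", methods=["POST"])
def submit():
    # append one row per submission; write a header if the file is new
    is_new = not RESPONSES.exists()
    with RESPONSES.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["name", "email", "answer"])
        writer.writerow([request.form.get("name"),
                         request.form.get("email"),
                         request.form.get("answer")])
    return "Thanks!"

Sending the notification email could then be done from the same view with smtplib or a library of your choice.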
If you are not set on Flask/Python, you could check out Google Forms, but if you are set on Python, or want to use it as a learning experience, it can definitely be done.
Your idea is possible and while there are many ways to do this kind of thing, what you are suggesting is not necessarily to be avoided.
All apps that run on a computer over a long period of time start a process and keep it going until closed. That is essentially what you are doing.
Having done this myself (and still currently doing it) at my business, I can say that it works great.
The only caveat is that to ensure it will always be available, you need to have the process monitored by some tool that restarts it if it ever stops, which can happen for a variety of reasons.
On Linux, supervisor is a great tool for doing that. On Windows you could register it as a service. But you could also just create an easy way to restart it, and make it easy for the user to do so if it is down when they need it.
Yes, this could be done. It's very similar to the applications that run on the servers in data centers.
To keep the application running forever, or to restart it after your system starts, you'll need a service manager similar to systemd on Unix. On Windows you could use NSSM (the Non-Sucking Service Manager) or Service Control to monitor your application and restart it if it crashes. It will also have to be enabled to run on startup.
Other than this, you could use Waitress to serve your Flask application. Waitress is a WSGI web server that lets you easily configure the number of threads so you can serve multiple users at the same time.
In a production environment, it's always recommended to serve the app through a production-grade WSGI server such as Gunicorn or Waitress, rather than the built-in development server.
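A minimal sketch of serving a Flask app with Waitress (assuming the Flask object is named app and lives in app.py; those names are illustrative, not from the question):

from waitress import serve

from app import app  # the Flask application object (illustrative name)

if __name__ == "__main__":
    # a handful of threads lets Waitress handle several users at once
    serve(app, host="127.0.0.1", port=8080, threads=4)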

what is a robust way to execute long-running tasks/batches under Django?

I have a Django app that is intended to be run on Virtualbox VMs on LANs. The basic user will be a savvy IT end-user, not a sysadmin.
Part of that app's job is to connect to external databases on the LAN, run some python batches against those databases and save the results in its local db. The user can then explore the systems using Django pages.
Run time for the batches isn't all that long, but it runs to minutes, potentially tens of minutes, not seconds. Run frequency is infrequent at best; I think you could go for days without needing a refresh.
This is not celery's normal use case of long tasks which will eventually push the results back into the web UI via ajax and/or polling. It is more similar to a dev's occasional use of the django-admin commands, but this time intended for an end user.
The user should be able to initiate a run of one or several of those batches when they want in order to refresh the calculations of a given external database (the target db is a parameter to the batch).
Until the batches are done for a given db, the app really isn't useable. You can access its pages, but many functions won't be available.
It is very important, from a support point of view, that the batches remain easily runnable at all times. Dropping down to the VM's shell over SSH would probably require frequent handholding, which wouldn't be good; it is best if they can be launched from the Django web pages.
What I currently have:
Each batch is in its own script.
I can run each one on the command line (via if __name__ == "__main__":).
The batches are also hooked up as celery tasks and work fine that way (a rough sketch of this shape follows the list).
Given the way I have written them, it would be relatively easy for me to allow running them from subprocess calls in Python. I haven't really looked into it, but I suppose I could make them into django-admin commands as well.
The batches already have their own rudimentary status checks. For example, they can look at the calculated data and tell whether they have been run and display that in Django pages without needing to look at celery task status backends.
The batches themselves are relatively robust and I can make them more so. This is about their launch mechanism.
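For readers unfamiliar with this pattern, one of these batches might look roughly like the following. This is only a sketch of the shape described above, not the asker's actual code; the module, function and task names are invented.

# sync_batch.py (illustrative name)
from celery import shared_task

def run_batch(target_db):
    # connect to the external database named by target_db, run the
    # calculations and save the results into the local Django db
    ...

@shared_task
def run_batch_task(target_db):
    # the same code, reusable as a Celery task
    return run_batch(target_db)

if __name__ == "__main__":
    import sys
    run_batch(sys.argv[1])  # e.g. python sync_batch.py some_target_db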
What's not so great:
In my Mac dev environment I find the celery/celerycam/rabbitmq stack to be somewhat unstable. It seems as if sometimes the rabbitmq daemon balloons in CPU/RAM use and then needs to be terminated. That mightily confuses the celery processes and I find I have to kill -9 various tasks and relaunch them manually. Sometimes celery still works but celerycam doesn't, so there are no task updates. Some of these issues may be OS X specific or may be due to the DEBUG flag being switched on for now, which celery warns about.
So then I need to run the batches on the command line, which is what I was trying to avoid, until the whole celery stack has been reset.
This might be acceptable on a normal website, with an admin watching over it. But I can't have that happen on a remote VM to which only the user has access.
Given that these are somewhat fire-and-forget batches, I am wondering if celery isn't overkill at this point.
Some options I have thought about:
writing a cleanup shell/Python script to restart rabbitmq/celery/celerycam and generally make them more robust, i.e. whatever is required to make celery & co. more stable. I've already used psutil to figure out whether the rabbit/celery processes are running and display their status in Django.
Running the batches via subprocess instead and avoiding celery (see the sketch after this list). What about django-admin commands here? Does that make a difference? They would still need to be run from the web pages.
an alternative task/process manager to celery with less capability but also less moving parts?
not using subprocess but relying on the Python multiprocessing module instead? To be honest, I have no idea how that compares to launching via subprocess.
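To make the subprocess option above concrete, a fire-and-forget launch from a Django view could look roughly like this. It is only a sketch of the idea; the script path and view name are placeholders.

import subprocess
import sys

from django.http import JsonResponse

def launch_batch(request, target_db):
    # start the batch script detached from the request/response cycle;
    # the batches' own status checks (mentioned above) report progress later
    subprocess.Popen([sys.executable, "/path/to/sync_batch.py", target_db])
    return JsonResponse({"started": True})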
environment:
nginx, wsgi, ubuntu on virtualbox, chef to build VMs.
I'm not sure why your celery configuration makes it unstable, but it sounds like celery is still the best fit for your problem. I'm using Redis as the queue system and, from my own experience, it works better than RabbitMQ. Maybe you can try it and see if it improves things.
Otherwise, just use cron as a driver to run periodic tasks. You can let it run your script periodically and update the database; your UI component can then poll the database with no conflict.

Long running tasks in Pyramid web app

I need to run some tasks in the background of a web app (checking the code out, etc.) without blocking the views.
The twist in typical Queue/Celery scenario is that I have to ensure that the tasks will complete, surviving even web app crash or restart until those tasks complete, whatever their final result.
I was thinking about recording parameters for multiprocessing.Pool in a database and starting all the incomplete tasks at webapp restart. It's doable, but I'm wondering if there's a simpler or more cost-effective approach?
UPDATE: Why not Celery itself? Well, I used Celery in some projects and it's really a great solution, but for this task it's on the big side: it requires a separate server, communication, etc., while all I need is spawning a few processes/threads, doing some work in them (git clone ..., svn co ...) and checking whether they succeeded or failed. Another issue is that I need the solution to be as small as possible since I have to make it follow elaborate corporate guidelines, procedures, etc., and the human administrative and bureaucratic overhead I'd have to go through to get Celery onboard is something I'd prefer to avoid if I can.
I would suggest you use Celery.
Celery does not require its own server, you can have a worker running on the same machine. You can also have a "poor man's queue" using an SQL database instead of a "real" queue/messaging server such as RabbitMQ - this setup would look very much like what you're describing, only with a separate process doing the long-running tasks.
The problem with starting long-running tasks from the webserver process is that in the production environment the web "workers" are normally managed by the webserver - multiple workers can be spawned or killed at any time. The viability of your approach would highly depend on the web server you're using and its configuration. Also, with multiple workers each trying to do a task you may have some concurrency issues.
Apart from Celery, another option is to look at UWSGI's spooler subsystem, especially if you're already using UWSGI.
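For reference, the spooler approach looks roughly like this. It is only a sketch: it assumes uWSGI is started with a spooler directory (e.g. --spooler ./spool), and uwsgidecorators is only importable when running under uWSGI; check the uWSGI documentation for the exact semantics.

# tasks.py (illustrative name), imported by the uWSGI workers
from uwsgidecorators import spool

@spool
def checkout_code(arguments):
    # runs in a uWSGI spooler process, outside the web workers;
    # arguments is a dict of whatever was passed to .spool()
    # (keys/values may arrive as bytes depending on the Python/uWSGI version)
    print(arguments)
    # ... run "git clone", "svn co", etc. here and record success or failure ...

# enqueue from a Pyramid view:
#   checkout_code.spool(repo="https://example.com/repo.git")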

Background Worker with Flask

I have a webapp that's built on python/Flask and it has a corresponding background job that runs continuously, periodically polling for data for each registered user.
I would like this background job to start when the system starts and keep running until the system shuts down. Instead of setting up /etc/rc.d scripts, I just had the Flask app spawn a new process (using the multiprocessing module) when the app starts up.
So with this setup, I only have to deploy the Flask app and that will get the background worker running as well.
What are the downsides of this? Is this a complete and utter hack that is fragile in some way or a nice way to set up a webapp with corresponding background task?
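(For context, the arrangement described above is roughly the following; this is only a sketch, and the polling function and interval are invented for illustration.)

import time
from multiprocessing import Process

from flask import Flask

app = Flask(__name__)

def poll_users():
    # continuously poll data for each registered user
    while True:
        time.sleep(60)  # placeholder for the real polling work

@app.route("/")
def index():
    return "hello"

if __name__ == "__main__":
    # spawn the background worker alongside the web app;
    # daemon=True means it dies when the Flask process exits
    Process(target=poll_users, daemon=True).start()
    app.run()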
The downside of your approach is that there are many ways it could fail, especially around stopping and restarting your Flask application.
You will have to deal with graceful shutdown to give your worker a chance to finish its current task.
Sometimes your worker won't stop in time and might linger while you start another one when you restart your Flask application.
Here are some approaches I would suggest, depending on your constraints:
script + crontab
You only have to write a script that does whatever task you want, and cron will take care of running it for you every few minutes. Advantages: cron will run it for you periodically and will start when the system starts. Disadvantages: if the task takes too long, you might have multiple instances of your script running at the same time. You can find some solutions for this problem here.
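One common way to avoid overlapping runs is to have the script take an exclusive lock before doing any work. A rough Unix-only sketch (the lock file path is arbitrary):

import fcntl
import sys

def main():
    # do the periodic work here
    ...

if __name__ == "__main__":
    lock = open("/tmp/mytask.lock", "w")
    try:
        # fail immediately if a previous run still holds the lock
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(0)
    main()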
supervisord
supervisord is a neat way to deal with different daemons. You can set it to run your app, your background script or both, and have them start with the server. The only downside is that you have to install supervisord and make sure its daemon is running when the server starts.
uwsgi
uwsgi is a very common way of deploying Flask applications. It has a few features that might come in handy for managing background workers.
Celery
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well. I think this is the best solution for scheduling background tasks for a flask application or any other python based application. But using it comes with some extra bulk. You will be introducing at least the following processes:
- a broker (rabbitmq or redis)
- a worker
- a scheduler
You can also get supervisord to manage all of the processes above and get them to start when the server starts.
Conclusion
In your quest to reduce the number of processes, I would highly suggest the crontab-based solution, as it can get you a long way. But please make sure your background script leaves an execution trace or logs of some sort.

Scheduling a regular event: Cron/Cron alternatives (including Celery)

Something I've had interest in is regularly running a certain set of actions at regular time intervals. Obviously, this is a task for cron, right?
Unfortunately, the Internet seems to be in a bit of disagreement there.
Let me elaborate a little about my setup. First, my development environment is in Windows, while my production environment is hosted on Webfaction (Linux). There is no real cron on Windows, right? Also, I use Django! And what's suggested for Django?
Celery of course! Unfortunately, setting up Celery has been more or less a literal nightmare for me - please see Error message 'No handlers could be found for logger “multiprocessing”' using Celery. And this is only ONE of the problems I've had with Celery. Others include a socket error which, as far as I can tell, I'm the only one ever to have run into.
Don't get me wrong, Celery seems REALLY cool. Unfortunately, there seems to be a lack of support, and some odd limitations built into its preferred backend, RabbitMQ. Unfortunately, no matter how cool a program is, if it doesn't work, well, it doesn't work!
That's where I hope all of you can come in. I'd like to know about cron or a cron-equivalent, which can be set up similarly (preferably identically) in both a Windows and a Linux environment.
(I've been struggling with Celery for about two weeks now and unfortunately I think it's time to toss in the towel and give up on it, at least for now.)
I had the same problem, and held off trying to solve it with celery (too complicated) or cron (external to application) and ended up finding Advanced Python Scheduler. Only just started using it but it seems reasonably mature and stable, has decent documentation and will take a number of scheduling formats (e.g. cron style).
From the documentation, here is how to run a function at a specific interval:

from apscheduler.scheduler import Scheduler  # APScheduler 2.x import path

sched = Scheduler()
sched.start()

def hello_world():
    print("hello world")

# run hello_world every 10 seconds
sched.add_interval_job(hello_world, seconds=10)
This is non-blocking, and I run something pretty identical by simply importing the module from my urls.py. Hope this helps.
A simple, non-Celery way to approach things would be to create custom django-admin commands to perform your asynchronous or scheduled tasks.
Then, on Windows, you use the at command to schedule these tasks. On Linux, you use cron.
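A custom management command is just a small module inside your app; a minimal sketch (the app and command names are placeholders):

# yourapp/management/commands/mytask.py
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Runs the scheduled task"

    def handle(self, *args, **options):
        # put the periodic or asynchronous work here
        self.stdout.write("task finished")

# schedule it with cron on Linux, e.g.
#   */15 * * * * /path/to/python /path/to/manage.py mytask
# or with at / Task Scheduler on Windows.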
I'd also strongly recommend ditching Windows if you can for a development environment. Your life will be so much better on Linux or even Mac OSX. Re-purpose a spare or old machine with Ubuntu for example, or run Ubuntu in a VM on your Windows box.
https://github.com/andybak/django-cron
It is triggered by a single cron task, but all the scheduling and configuration is done in Python.
Django Chronograph is a great alternative. You only need to set up one cron job and then do everything in the Django admin. You can schedule tasks/commands from Django management.
