I need to run a Python script (which listens to Twitter) that will call various methods on my Django app when it gets tweets matching a particular hashtag.
At the moment, I just run the script by hand on the command line but I'd like it to run inside django if possible so that I can control it from there and so it doesn't have to perform HTTP POSTs when it gets new data.
I've looked at Celery (briefly), but to me it seems to be aimed at performing small tasks at regular intervals.
Is there a way to use celery (or anything else) to be able to control this long-running "listen to Twitter" script that I've got?
You should use Supervisord to run your Django application and your script. Making the script part of the Django project will let you use Django signals: you can define a custom signal that is emitted every time your Twitter logic finishes doing what it is supposed to. Note that signals are blocking; if you want them to be asynchronous, use Celery with Django.
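As a rough illustration of the custom-signal idea, here is a minimal sketch; the signal name tweet_received and the receiver handle_tweet are assumptions, not anything from the question:

    # signals.py -- sketch of a custom Django signal fired by the Twitter listener
    import django.dispatch

    # Emitted by the listener script every time a tweet matches the hashtag.
    tweet_received = django.dispatch.Signal()

    # A receiver somewhere in the Django project reacts to the tweet.
    def handle_tweet(sender, tweet=None, **kwargs):
        pass  # call the relevant methods on your app here

    tweet_received.connect(handle_tweet)

    # In the listener script, fire the signal when a matching tweet arrives:
    # tweet_received.send(sender="twitter_listener", tweet=tweet_data)

Because plain signals run their receivers synchronously, the listener blocks until handle_tweet returns, which is why the answer points to Celery if that work is slow.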
An alternative would be to run your Django application and the Twitter script via Supervisord, then expose a REST API on the Django application and have the script send an HTTP POST to it whenever new data arrives. You can use TastyPie for that.
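A rough sketch of the TastyPie side, where the Tweet model and resource name are assumptions:

    # api.py -- hypothetical TastyPie resource the script could POST new tweets to
    from tastypie.authorization import Authorization
    from tastypie.resources import ModelResource

    from myapp.models import Tweet  # assumed model

    class TweetResource(ModelResource):
        class Meta:
            queryset = Tweet.objects.all()
            resource_name = 'tweet'
            allowed_methods = ['get', 'post']
            # Authorization() allows writes; restrict this in a real deployment.
            authorization = Authorization()

The resource would then be registered in urls.py via tastypie.api.Api, and the script would POST JSON to the resulting endpoint whenever it sees a matching tweet.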
I am using the Django framework to write a middleware service. Basically, every week it calls an external API to receive an XML file, does some data processing, and sends the results to RabbitMQ.
I am new to Django. I wrote some Python files to implement the logic, but these files do not execute when running the server. I am wondering how to run this code periodically when it is not inside views.
Thank you.
I am hoping to gain a basic understanding of scheduled task processes and why things like Celery are recommended for Flask.
My situation is a web-based tool which generates spreadsheets based on user input. I save those spreadsheets to a temp directory, and when the user clicks the "download" button, I use Flask's "send_from_directory" function to serve the file as an attachment. I need a background service to run every 15 minutes or so to clear the temp directory of all files older than 15 minutes.
My initial plan was a basic Python script running in a while True loop, but I did some research to find what people normally do, and everything recommends Celery or other task managers. I looked into Celery and found that I would also need to learn about Redis, and apparently host Redis in a Unix environment. This is a lot of trouble for a script that just deletes files every 15 minutes.
I'm developing my Flask app locally on Windows with the built-in development server and deploying to a virtual machine on the company intranet with IIS. I'm learning as I go, so please explain why this much machinery is needed to regularly call a script that simply deletes things. It seems like a vast overcomplication, but as I said, I'm trying to learn as I go, so I want to do/learn it correctly.
Thanks!
You wouldn't use Celery or Redis for this. A cron job would be perfectly appropriate.
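As an illustration, a cleanup script plus a crontab entry could look roughly like this; the temp path and the 15-minute threshold are assumptions:

    # cleanup_temp.py -- delete files older than 15 minutes from the temp directory
    import os
    import time

    TEMP_DIR = "/path/to/temp"    # assumed location of the spreadsheet temp dir
    MAX_AGE_SECONDS = 15 * 60

    now = time.time()
    for name in os.listdir(TEMP_DIR):
        path = os.path.join(TEMP_DIR, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > MAX_AGE_SECONDS:
            os.remove(path)

    # crontab entry to run it every 15 minutes:
    #   */15 * * * * /usr/bin/python /path/to/cleanup_temp.py

On the Windows/IIS deployment described above, the Task Scheduler plays the same role as cron.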
Celery is for jobs that need to be run asynchronously but in response to events in the main server processes. For example, if a sign up form requires sending an email notification, that would be scheduled and run via Celery so as not to block the main web response.
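A minimal sketch of that pattern with Celery, where the broker URL and task body are assumptions:

    # tasks.py -- hypothetical async-email task
    from celery import Celery

    # Redis as the broker is an assumption; RabbitMQ is another common choice.
    app = Celery("tasks", broker="redis://localhost:6379/0")

    @app.task
    def send_signup_email(address):
        # The real email-sending code would go here; it runs in a worker
        # process, so the web request that scheduled it is not blocked.
        print("sending welcome email to", address)

    # In the web view, schedule the task instead of doing the work inline:
    #   send_signup_email.delay(user_email)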
I am working on a web application that uses a permanent object MyService. Using a web interface I dynamically update its state and monitor its behavior. Now I would like to periodically call one of its methods. I was thinking of using Celery's PeriodicTask but ran into some scope issues. It seems I need to run three different processes:
    python manage.py runserver
    python manage.py celery worker
    python manage.py celerybeat
The problem is that even if I ensure that MyService is a singleton that can be safely used by more than one thread, Celery creates its own fresh copy of the object. Is there a way I could share this object between the Django server and the Celery main process? I tried to find a way to start Celery from within a Django script, but so far without success. I would appreciate any help.
If you need to share something between multiple processes, or maybe even multiple machines (e.g. your workers could run on a separate machine), the best (and probably easiest) way to share information is an external service.
In the simplest case you could use Django's DB, but if that turns out not to be suitable, for example because you have a heavy write load, you can use something like Redis or Memcached (which you can also talk to via Django's caching API). These can handle a big write load, and you can use Redis as a queue for Celery as well.
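For example, sharing state through Django's cache API could look roughly like this, assuming a Redis or Memcached backend is configured in CACHES and using a made-up key name:

    # Both the web process and the Celery worker talk to the same cache backend.
    from django.core.cache import cache

    # In the web process: publish MyService's current state.
    cache.set("myservice_state", {"mode": "active", "interval": 30}, timeout=None)

    # In the Celery task: read the shared state back.
    state = cache.get("myservice_state")
    if state is not None:
        pass  # act on the shared state instead of on a local copy of MyService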
Ubuntu 11.10, Python 2.6. Background: I have an existing Python app that is using Twisted to sit in a loop and wait for RESTful commands to come in. So the app starts up, kicks off threads that do various things, and main sets up callbacks for Twisted, then calls Twisted.reactor.run(), which blocks forever. When a request comes in, the appropriate handler is called, stuff happens, a reply is sent back.
My job is now to remove Twisted because management has decided they don't like it. We're moving to Apache as our web server.
Using the documentation, I have successfully installed and configured Apache 2.0 to serve web pages. I also installed mod_wsgi, and was able to configure it and Apache to execute arbitrary Python code when a request comes in. So I'm good on that side.
What I'm missing is how to connect my Python application to the Apache/mod_wsgi bits, since the application needs to be persistent and always running. It was suggested that I open a pipe between my wsgi script and my main application, and serialize the requests that way. But it seems like this is something that should already be out there, I just don't know enough to know what to search for.
Any pushes in the right direction are greatly appreciated.
Further edit for clarity: I'm not making a webserver. The application in question is a host app that is running on a virtual machine. It happens to be controlled by a RESTful interface via HTTP. So all it needs to do is be able to listen for incoming commands and reply to them.
mod_wsgi may not be the proper tool for this job, which is fine, I just don't know what is.
Does the daemon mode of mod_wsgi offer enough persistence in your case? Or if you want to run the main process separately from Apache, how about mod_fastcgi? Maybe running Apache as a reverse proxy could be an option too.
It was suggested that I open a pipe between my wsgi script and my main application, and serialize the requests that way.
That's what multiprocessing queues are for.
http://docs.python.org/library/multiprocessing.html
http://docs.python.org/library/multiprocessing.html#pipes-and-queues
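A minimal sketch of that idea; the handler and worker names are illustrative and not tied to mod_wsgi specifics:

    # One long-running worker process consumes requests that short-lived
    # handlers put on a shared queue.
    import multiprocessing

    def worker(queue):
        while True:
            request = queue.get()          # blocks until something arrives
            if request is None:            # sentinel that tells the worker to stop
                break
            print("handling", request)     # the "real work" would happen here

    if __name__ == "__main__":
        queue = multiprocessing.Queue()
        proc = multiprocessing.Process(target=worker, args=(queue,))
        proc.start()

        # Elsewhere (e.g. in the WSGI handler) requests are serialized onto the queue:
        queue.put({"command": "status"})   # hypothetical command payload

        queue.put(None)
        proc.join()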
You'll be even happier if you start using Celery.
Celery will allow you to "remove Twisted because management has decided they don't like it."
However, switching to Celery means that things like "So the app starts up, kicks off threads that do various things, and main sets up callbacks for Twisted, then calls Twisted.reactor.run(), which blocks forever" all have to be completely rethought. Instead of one main polling loop, you now have multiple, independent processes that are coordinated by Celery.
What you'll find is that all the housekeeping in your application (all the coordination among threads, all the callbacks) will go away. You'll be left with a few Python scripts that do the "real work" and Celery to manage the distributed task queue.
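The end state might look roughly like this, where the task name and broker URL are assumptions:

    # tasks.py -- sketch of the "real work" recast as an independent Celery task
    from celery import Celery

    app = Celery("vmhost", broker="amqp://localhost")   # assumed RabbitMQ broker

    @app.task
    def handle_command(command, payload):
        # What used to be a Twisted callback becomes a task a worker picks up.
        return {"status": "ok", "command": command}

    # A thin web layer (e.g. a WSGI view under Apache) just schedules the work:
    #   result = handle_command.delay("restart", {"vm": "guest-01"})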
Is it possible to run a script each time the dev server starts? And also on each deploy to Google App Engine?
I want the application to fill the database based on what some methods return.
Is there any way to do this?
..fredrik
I use App Engine Python with the Django helper. As far as I know you cannot hook anything on the deploy, but you could put a call in the main function of main.py that checks whether you need to do your setup. This is how the helper initializes itself on the first request. I haven't looked at webapp in a while, but I assume main.py acts in a similar fashion for that framework.
Be aware that main is run on the first request, not when you first deploy. It will also happen if appengine starts up a new instance to handle load, or if all instances were stopped because of inactivity. So make sure you check to see if you need to do your initialization and then only do it if needed.
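A rough sketch of that guarded initialization in main.py; needs_setup and do_setup are hypothetical placeholders for your own checks and methods:

    # main.py -- run setup code on the first request each instance serves
    _initialized = False

    def needs_setup():
        # Hypothetical check, e.g. look for existing seed data in the datastore.
        return True

    def do_setup():
        # Hypothetical: fill the database from whatever your methods return.
        pass

    def ensure_initialized():
        global _initialized
        if not _initialized:
            if needs_setup():
                do_setup()
            _initialized = True

    def main():
        ensure_initialized()
        # ...then hand off to the helper / webapp framework's usual request handling.

    if __name__ == "__main__":
        main()

Keep in mind that the module-level flag only lives as long as the instance, which matches the point above about new instances re-running main.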
You can do this by writing a script in your favorite scripting language that performs the actions that you desire and then runs the dev server or runs appcfg.py update.
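A rough sketch of such a wrapper in Python; the fill_database function and the paths are assumptions:

    # run_or_deploy.py -- hypothetical wrapper that seeds data, then starts the
    # dev server or deploys with appcfg.py
    import subprocess
    import sys

    def fill_database():
        # Hypothetical: call your methods and store whatever they return.
        pass

    if __name__ == "__main__":
        fill_database()
        if len(sys.argv) > 1 and sys.argv[1] == "deploy":
            subprocess.check_call(["appcfg.py", "update", "."])
        else:
            subprocess.check_call(["dev_appserver.py", "."])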
Try making a wrapper around the server runner and around the script that runs deployment. That way you will be able to run custom code whenever you need to.
Warmup Requests in combination with min_idle_instances will probably work for your deploy use case.