I have a project using Celery and originally implementing one unique queue, which could cause some trouble.
So I want to implement several queues (which is done and works), but I would like to set different soft time limitd per queue. Actually the only things I found is time_limit as global setting for Celery, or setting it every time I decorate a task. First is a too generic solution, the second is not enough.
Thanks
During you queue definition you can set a time to live x-message-ttl on it.
Queue('test_queue', Exchange('default'), routing_key='test_queue', queue_arguments={'x-message-ttl': 86400000})
Related
What I want is deceptively simple: user presses a single button in ForceScheduler and multiple build requests go to a dozen of workers simultaneously. Right now users repeatedly run the same scheduler multiple times with different worker each time and this is repetitive.
I've tried setting multiple = True, on WorkerChoiceParameter in the most straightforward manner, but this didn't help: only the first worker gets the build request. Also, I've seen that inside Worker class there's a field that hints that it can be done: endpoints = [WorkerEndpoint, WorkersEndpoint]. Looking at WorkersEndpoint (note the "s") I've noted that it tries to get a list of workers, so in theory this should be possible? But I cannot figure out how to tell buildbot to use this second endpoint or maybe I misunderstood what this code does.
I've started a new Python 3 project in which my goal is to download tweets and analyze them. As I'll be downloading tweets from different subjects, I want to have a pool of workers that must download from Twitter status with the given keywords and store them in a database. I name this workers fetchers.
Other kind of worker is the analyzers whose function is to analyze tweets contents and extract information from them, storing the result in a database also. As I'll be analyzing a lot of tweets, would be a good idea to have a pool of this kind of workers too.
I've been thinking in using RabbitMQ and Celery for this but I have some questions:
General question: Is really a good approach to solve this problem?
I need at least one fetcher worker per downloading task and this could be running for a whole year (actually is a 15 minutes cycle that repeats and last for a year). Is it appropriate to define an "infinite" task?
I've been trying Celery and I used delay to launch some example tasks. The think is that I don't want to call ready() method constantly to check if the task is completed. Is it possible to define a callback? I'm not talking about a celery task callback, just a function defined by myself. I've been searching for this and I don't find anything.
I want to have a single RabbitMQ + Celery server with workers in different networks. Is it possible to define remote workers?
Yeah, it looks like a good approach to me.
There is no such thing as infinite task. You might reschedule a task it to run once in a while. Celery has periodic tasks, so you can schedule a task so that it runs at particular times. You don't necessarily need celery for this. You can also use a cron job if you want.
You can call a function once a task is successfully completed.
from celery.signals import task_success
#task_success(sender='task_i_am_waiting_to_complete')
def call_me_when_my_task_is_done():
pass
Yes, you can have remote workes on different networks.
I want to set multiple alarm in python. What is the recommended way of setting it up? My use-case is that I've threshold time for N variables. When the current time reaches the threshold value, I want all the variables with that threshold values.
Here's my apprach:-
threshold_time_list = [get list all times from the DB]
current_time = datetime.now()
[i for i in threshold_time_list if i==current_time]
But this is very inefficient way of doing it since I might have 250+ variables like a/b/c.
And also I have to check this condition every second(cronjob). Is there a better way of doing it?
I found on SO, this can be done using threading and making the thread to go to sleep for threshod - current_time. But running 250 threads parallely is again an issue, since I've been facing an issue in my production where Django gets hanged (dont know why) and I need to restart the server to make it work again. We're asssuming that Django might get out of threads for processing, hence making 250 more threads is cumbersome.
Also if someone knows , why does Django gets hang in b/w the running live product it will be beneficial.
Can this alarm question be done in celery?
Use the sched module. This lets you create any number of scheduled tasks, then run them when they kick off, one at a time.
I often have models that are a local copy of some remote resource, which needs to be periodically kept in sync.
Task(
url="/keep_in_sync",
params={'entity_id':entity_id},
name="sync-%s" % entity_id,
countdown=3600
).add()
Inside keep_in_sync any changes are saved to the model and a new task is scheduled to happen again later.
Now, while superficially this seems like a nice solution, in practice you might become worried if all the necessary tasks have really been added or not. Maybe you have entities representing the level of food pellets inside your hamster cages so that an automated email can be sent to your housekeeper to feed them. But then a few weeks later when you come back from your holiday, you find several of your hamsters starving.
It then starts seeming like a good idea to make a script that goes through each entity and makes sure that the proper task really is in the queue for it. But neither Task nor Queue classes have any method for checking if a task exists or not.
Can you save the hamsters and come up with a nicer way to make sure that a method really for sure is being periodically called for each entity?
Update
It seems that if you want to be really sure that tasks are scheduled, you need to keep track of your own tasks as Nick Johnson suggests. Not ready to let go of the convenient task queue, so for the time being will just tolerate the uncertainty of being unable to check if tasks are really scheduled or not.
Instead of enqueueing a task per entity, handle multiple entities in a single task. This can be triggered by a daily cron job, for instance, which fans out to multiple tasks. As well as ensuring you execute your code for each entity, you can also take advantage of asynchronous URLFetch to synchronize with the external resource more efficiently, and batch puts and gets from the datastore to make the updates more efficient.
You'll get an exception (TaskAlreadyExistsError) if there already such task in queue (same url and same params). So, don't worry, just all of them into queue, and remember to catch exceptions.
You can find full list of exceptions here: http://code.google.com/intl/en/appengine/docs/python/taskqueue/exceptions.html
I'm using deferred to put tasks in the default queue in a AppEngine app similar to this approach.
Im naming the task with a timestamp that changes every 5 second. During that time a lot of calls are made to the queue with the same name resulting in a TaskAlreadyExistsError which is fine. The problem is when I check the quotas the "Task Queue API Calls" are increasing for every call made, not only those who actually are put in the queue.
I mean if you look at the quota: Task Queue API Calls: 34,017 of 100,000 and compare to the actual queue calls: /_ah/queue/deferred - 2.49K
Here is the code that handles the queue:
try:
deferred.defer(function_call, params, _name=task_name, _countdown=int(interval/2))
except (taskqueue.TaskAlreadyExistsError, taskqueue.TombstonedTaskError):
pass
I suppose that is the way it works. Is there a good way to solve the problem with the quota? Can I use memcache to store the task_name and check if the task has been added besides the above try/catch? Or is there a way to check if the task already exists without using Task Queue Api Calls?
Thanks for pointing out that this is the case, because I didn't realise, but the same problem must be affecting me.
As far as I can see yeah, throwing something into memcache comprised of the taskname should work fine, and then if you want to reduce those hits on memcache you can store the flag locally also within the instance.
The "good way" to solve the quota problem is eliminating destined-to-fail calls to Task Queue API.
_name you are using changes every 5 seconds which might not be a bottleneck if you increase the execution rate of your Task Queue. But you also add Tasks using _countdown.