I'm using django-rq in my project.
What I want to achieve:
I have a first view that loads a template where an image is acquired from a webcam and saved on my PC. Then the view calls a second view, where an asynchronous task to process the image is enqueued using rq. Finally, after a 20-second delay, a third view is called. In this latter view I'd like to retrieve the result of the asynchronous task.
The problem: the job object is correctly created, but the queue is always empty, so I cannot use queue.fetch_job(job_id). Reading here I managed to find the job in the FinishedJobRegistry, but I cannot access it, since the registry is not iterable.
from django_rq import job
import django_rq
from rq import Queue
from redis import Redis
from rq.registry import FinishedJobRegistry
redis_conn = Redis()
q = Queue('default', connection=redis_conn)
last_job_id = ''
def wait(request):  # second view, starts the job
    template = loader.get_template('pepper/wait.html')
    job = q.enqueue(processImage)
    print(q.is_empty())  # this is always True!
    last_job_id = job.id  # this is the expected job id
    return HttpResponse(template.render({}, request))

def ceremony(request):  # third view, retrieves the result
    template = loader.get_template('pepper/ceremony.html')
    print(q.is_empty())  # True
    registry = FinishedJobRegistry('default', connection=redis_conn)
    finished_job_ids = registry.get_job_ids()  # here I have the correct id (last_job_id)
    return HttpResponse(template.render({}, request))
The question: how can I retrieve the result of the asynchronous job from the finished job registry? Or, better, how can I correctly enqueue the job?
I have found another way to do it: I'm simply using a global list of jobs that I'm modifying in the views (sketched below). Anyway, I'd like to know the right way to do this...
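For reference, the workaround looks roughly like this (a sketch only: a module-level list shared between the views, which only works within a single process):

jobs = []  # module-level list shared between the views (workaround sketch)

def wait(request):  # second view, starts the job
    template = loader.get_template('pepper/wait.html')
    job = q.enqueue(processImage)
    jobs.append(job)  # remember the Job object instead of relying on the queue
    return HttpResponse(template.render({}, request))

def ceremony(request):  # third view, retrieves the result
    template = loader.get_template('pepper/ceremony.html')
    result = jobs[-1].result if jobs else None  # None until the worker has finished
    return HttpResponse(template.render({'result': result}, request))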
I have a script that continually runs, processing data that it gets from an external device. The core logic follows something like:
from external_module import process_data, get_data, load_interesting_things

class MyService:
    def __init__(self):
        self.interesting_items = load_interesting_things()
        self.run()

    def run(self):
        try:
            while True:
                data = get_data()
                for item in self.interesting_items:
                    item.add_datapoint(process_data(data, item))
        except KeyboardInterrupt:
            pass
I would like to add the ability to request information for the various interesting things via a RESTful API.
Is there a way I can add something like a Flask web service to the program, such that the web service can pull a statistic from the interesting_items list and return it? For example, something along the lines of:
@app.route("/item/<int:idx>/average")
def average(idx: int):
    avg = interesting_items[idx].getAverage()
    return jsonify({"average": avg})
Assume the necessary idx bounds checking and any appropriate locking are implemented.
It does not have to be Flask, but it should be lightweight. I want to avoid using a database. I would prefer a web service, but if that is not possible without completely restructuring the code base, I can use a socket instead, though that is less preferable.
The server would run on a local network only, usually handling a single user; occasionally it may have a few.
I needed to move the run() method out of the __init__() method, so that I could have a global reference to the service, and start the run method in a separate thread. Something along the lines of:
import threading

import flask
from flask import jsonify

service = MyService()
service_thread = threading.Thread(target=service.run, daemon=True)
service_thread.start()

app = flask.Flask("appname")
...
@app.route("/item/<int:idx>/average")
def average(idx: int):
    avg = service.interesting_items[idx].getAverage()
    return jsonify({"average": avg})
...
app.run()
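For completeness, a minimal sketch of the reworked MyService is below. It assumes the item API from the question; the lock is one way to provide the "appropriate locking" the question mentions, and the Flask handler would take the same lock around getAverage():

import threading

from external_module import process_data, get_data, load_interesting_things

class MyService:
    def __init__(self):
        self.interesting_items = load_interesting_things()
        self.lock = threading.Lock()  # guards items shared with the Flask thread

    def run(self):  # no longer called from __init__
        try:
            while True:
                data = get_data()
                with self.lock:
                    for item in self.interesting_items:
                        item.add_datapoint(process_data(data, item))
        except KeyboardInterrupt:
            pass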
Currently I am running a Celery group task, and I want to call the upload_local_directory_into_S3(output_file) method after all the tasks have completed. I tried the approach below, but it does not wait for the job to complete.
tasks = [make_tmp_files.s(page.object_list, path + str(uuid.uuid4()) + '.csv') for page in paginator]
job = group(tasks)
job.apply_async()
job.get()
output_file = 'final.zip'
upload_local_directory_into_S3(output_file)
make_tmp_files is the Celery task method.
A result "backend" is also defined on the Celery object.
Please comment if more information is needed.
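For context, the setup implied above is roughly the following; the broker/backend URLs and the task body are assumptions for illustration only:

# celery_app.py (assumed setup, for illustration)
from celery import Celery

app = Celery('proj',
             broker='redis://localhost:6379/0',   # assumption
             backend='redis://localhost:6379/1')  # a result backend is configured, as stated above

@app.task
def make_tmp_files(object_list, file_path):
    # write the given objects to a temporary CSV file (body omitted in the question)
    ...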
You either chain your group with the final upload task, or you use a chord to accomplish the same. Do not be alarmed if you see that Celery automatically converts a group+task chain into a chord.
Within Celery there is a difference between a group of tasks and a group of task results. If you were to change your code to the following, it should work:
tasks = [make_tmp_files.s(page.object_list, path + str(uuid.uuid4()) + '.csv') for page in paginator]
job = group(tasks)
job_results = job.apply_async()
job_results.get()
output_file = 'final.zip'
upload_local_directory_into_S3(output_file)
job.apply_async() returns a GroupResult, i.e. a collection of AsyncResult objects. As a user, you need to wait on the results, not on the task signatures themselves.
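For the chord alternative mentioned at the top of this answer, a rough sketch might look like the following; it assumes upload_local_directory_into_S3 is wrapped in a task (hypothetically named upload_results) that accepts the list of group results as its first argument:

from celery import chord

header = [make_tmp_files.s(page.object_list, path + str(uuid.uuid4()) + '.csv')
          for page in paginator]
# The body task runs only after every header task has finished; Celery prepends
# the list of header results to the body signature's arguments.
result = chord(header)(upload_results.s('final.zip'))
result.get()  # waits for the body task to complete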
Reference: https://docs.celeryproject.org/en/stable/userguide/canvas.html#groups
Hopefully that helps!
I'm working with the huey task queue (https://github.com/coleifer/huey) in Flask. I'm trying to run a task and get a task id back from my initial function:
@main.route('/renew', methods=['GET', 'POST'])
def renew():
    print(request.form)
    user = request.form.get('user')
    pw = request.form.get('pw')
    res = renewer(user, pw)
    res(blocking=True)  # Block for up to 5 seconds
    print(res)
    return res.id
After running this, I plug the returned id (which is the same as the result in the screenshot) into:
@main.route('/get_result_by_id', methods=['GET', 'POST'])
def get_result_by_id():
    print(request.form)
    id = request.form.get('id')
    from ..tasking.tasks import my_huey
    res = my_huey.result(id)
    if res is None:
        res = 'no value'
    return res
However, I'm getting 'no value'.
How can I access the value in the data store?
When you do res(blocking=True) in renew(), you fetch the result from the result store and effectively remove it. When you then try to fetch the result again using the id, it just returns nothing.
You have two options to solve this (both are sketched below):
Either use res(blocking=True, preserve=True) to preserve the result in the result store, so you can still fetch it with your second call.
Or use a storage with expiring results, like RedisExpireStorage. When configuring this storage while setting up the huey instance, you can specify how long your results should be stored. This gives you a window of time to make the second call with the task/result id.
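A short sketch of both options (the RedisExpireHuey constructor and its expire_time argument are assumptions based on the storage mentioned above; check your huey version):

# Option 1: keep the result in the store after the blocking read in renew()
res = renewer(user, pw)
value = res(blocking=True, preserve=True)  # result stays in the store
# ...so a later my_huey.result(id) in get_result_by_id can still find it

# Option 2: use a storage with expiring results instead of read-once results
from huey import RedisExpireHuey  # assumption: available in recent huey versions
my_huey = RedisExpireHuey('appname', expire_time=600)  # keep results for ~10 minutes (assumed kwargs)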
Is there a way I can remove a specific set of tasks from Celery? Maybe using a wildcard? Something like:
app.control.delete("foobar-only-*")
I know I can delete all tasks with
from proj.celery import app
app.control.purge()
which comes from here, but that's not very helpful, as it doesn't seem I can tweak that code to do what I want.
Answering my own question. This is an extract from the code with which I achieved my goal:
def stop_crawler(crawler_name):
    crawler = Crawler.objects.get(name=crawler_name)
    if crawler is None:
        logger.error(f"Can't find a crawler named {crawler_name}")
        return

    i = app.control.inspect()
    queue_name = f"collect_urls_{crawler_name}"

    # Iterate over all workers, and the queues of each worker, and stop workers
    # from consuming from the queue that belongs to the crawler we're stopping
    for worker_name, worker_queues in i.active_queues().items():
        for queue in worker_queues:
            if queue["name"] == queue_name:
                app.control.cancel_consumer(queue_name, reply=True)

    # Iterate over the different types of tasks and stop the ones that belong
    # to the crawler we're stopping
    for queue in [i.active, i.scheduled, i.reserved]:
        for worker_name, worker_tasks in queue().items():
            for task in worker_tasks:
                args = ast.literal_eval(task["args"])
                if "collect_urls" in task["name"] and args[0] == crawler_name:
                    app.control.revoke(task["id"], terminate=True)
I just found the Queue module, which is helping me adapt the pyftpdlib module. I'm running a very strict FTP server, and my goal is to restrict the filenames available for upload. This is to prevent people from uploading whatever they want (it's actually the backend of an upload client, not a complete FTP server).
I have this in the ftpserver Authorizer:
def fetch_worlds(queue, username):
    while queue.empty():
        worlds = models.World.objects.filter(is_synced=True, user__username=username)
        print worlds
        queue.put(worlds, timeout=1)

class FTPAuthorizer(ftpserver.DummyAuthorizer):
    def __init__(self):
        self.q = Queue.Queue()
        self.t = None  # Thread
        self.world_item = None

    def has_perm(self, username, perm, path=None):
        print "Checking permission\n"
        if perm not in ['r', 'w']:
            return False
        # Check world name
        self.t = threading.Thread(target=fetch_worlds, args=(self.q, username))
        self.t.daemon = True
        self.t.start()
        self.world_item = self.q.get()
        print "WORLDITEM: %s" % self.world_item
        if path is not None:
            path = os.path.basename(path)
            for world in self.world_item:
                test = "{0}_{1}.zip".format(username, world.name)
                if path == test:
                    print "Match on %s" % test
                    return True
        return False
My issue is: after the server starts, the first time I STOR a file, it does an initial db call and gets all the worlds properly. But when I then add another world (for example, set is_synced=True on one), it still returns the old data from self.q.get(). has_perm() is called every time a file is uploaded, and it needs to return live data (to check if a file is allowed).
For example, brand new server:
STOR file.zip, self.q.get() returns <World1, World2>
Update the database via other methods etc
STOR file2.zip, inside fetch_worlds, print worlds returns <World1, World2, World3> but self.q.get() returns <World1, World2>
I just found the Queue module, and it seemed like it would be helpful, but I can't get the implementation right.
(also couldn't add tag pyftpdlib)
I think this is what could be happening here:
When has_perm is called, you create a thread that will query a database (?) to add elements to the queue.
After calling start, the database call takes some time.
Meanwhile, in your main thread, you enter q.get, which blocks.
The db call finishes and the result is added to the queue,
and it is immediately removed from the queue again by the blocking q.get.
The queue is now empty, so your thread enters the while loop again, executes the same query, and puts the result onto the queue.
The next call to q.get will return that instead of what it expects.
You see, you could have a race condition here; that is already apparent from the fact that you add to the queue in a loop, while there is no loop when pulling from it.
You also assume that the element you get from the queue is the result of what you put onto it before, but that doesn't have to be true. If you call has_perm twice, that results in two calls to fetch_worlds, with the possibility that the queue.empty() check fails for one of them, so only one result is put onto the queue. Now you have two threads waiting on q.get, but only one of them will get a result, while the other waits until one becomes ready...
has_perm looks like it should be a blocking call anyway.
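Following that reasoning, a minimal sketch of has_perm that just queries the database directly on each call (dropping the thread and queue; it reuses the query from the question):

def has_perm(self, username, perm, path=None):
    if perm not in ['r', 'w']:
        return False
    # Query fresh data on every upload instead of going through a thread + queue
    worlds = models.World.objects.filter(is_synced=True, user__username=username)
    if path is not None:
        path = os.path.basename(path)
        for world in worlds:
            if path == "{0}_{1}.zip".format(username, world.name):
                return True
    return False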