Store HttpResponse in dictionary - python

I'm having trouble with long-running calculations in Django. I am not able to install Celery because of my company's idiocy, so I have to "reinvent the wheel". I am trying to do all calculations in a TaskQueue class, which stores them in a dictionary called "results". I am also trying to make a "Please Wait" page, which asks this TaskQueue whether the task with the provided key is ready.
And the problem is that the results somehow disappear.
I have a view with long calculations:
def some_view(request):
    ...
    uuid = task_queue.add_task(method_name, params)  # method_name(params) returns HttpResponse
    return redirect('/please_wait/?uuid={0}'.format(uuid))
And the please_wait view:
def please_wait(request):
    uuid = request.GET.get('uuid', '0')
    ready = task_queue.task_ready(uuid)
    if ready:
        return task_queue.task_result(uuid)
    elif ready is None:
        return render_to_response('admin/please_wait.html', {'not_found': True})
    else:
        return render_to_response('admin/please_wait.html', {'not_found': False})
And the last piece of code, my TaskQueue:
from multiprocessing.pool import ThreadPool
from threading import Lock  # or multiprocessing.Lock, depending on the original imports
import uuid

class TaskQueue:
    def __init__(self):
        self.pool = ThreadPool()
        self.results = {}
        self.lock = Lock()

    def add_task(self, method, params):
        self.lock.acquire()
        new_uuid = self.generate_new_uuid()
        while new_uuid in self.results:
            new_uuid = self.generate_new_uuid()
        self.results[new_uuid] = self.pool.apply_async(func=method, args=params)
        self.lock.release()
        return new_uuid

    def generate_new_uuid(self):
        return uuid.uuid1().hex[0:8]

    def task_ready(self, task_id):
        if task_id in self.results:
            return self.results[task_id].ready()
        else:
            return None

    def task_result(self, task_id):
        if self.task_ready(task_id):
            return self.results[task_id].get()
        else:
            return None

task_queue = TaskQueue()  # module-level instance shared by the views
After adding a task I can log its result by uuid for a few seconds, and then it says the task isn't ready. Here is my log (I am logging task_queue.results):
[INFO] 2013-10-01 16:04:52,782 logger: {'ade5d154': <multiprocessing.pool.ApplyResult object at 0x1989906c>}
[INFO] 2013-10-01 16:05:05,740 logger: {}
Help me, please! Why does the result disappear?
UPD: #freakish helped me find out some new information. The result doesn't disappear forever; it only disappears sometimes when I repeat my attempts to log it.
[INFO] 2013-10-01 16:52:41,743 logger: {}
[INFO] 2013-10-01 16:52:45,775 logger: {}
[INFO] 2013-10-01 16:52:48,855 logger: {'ade5d154': <multiprocessing.pool.ApplyResult object at 0x1989906c>}

OK, so we've established that you are running 4 Django processes. In that case your queue won't be shared between them. There are two possible solutions AFAIK:
Use a shared queueing server. You can write your own (see for example this entry) but using a proper one (like Celery) will be a lot easier (if you can't convince your employer to install it, then quit the job ;)).
Use a database to store the results and let each server process do the calculations (via processes or threads). It does not have to be a full database server; you can use sqlite3, for example. This is a more secure and reliable way, but less efficient. I also think it is easier than a queueing mechanism. You simply create a table with columns: id, state, result. When you create a job you insert an entry with state=processing; when you finish the job you update the entry with state=done and result=result (for example as a JSON string). This is easy and reliable (you actually don't need a queue here at all; the order of jobs doesn't matter unless I'm missing something). A sketch follows below.
Of course you won't be able to use the .ready() function with this approach (you store the results inside these storages instead) unless you pickle the results, but that is unnecessary overhead.
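A minimal sketch of the database approach, assuming sqlite3 and a caller-supplied calculation function (the table layout matches the description above; the helper names add_task/task_state and the file path are illustrative, not from the original code):

import json
import sqlite3
import uuid
from threading import Thread

DB_PATH = 'tasks.sqlite3'  # illustrative path

def _init_db():
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute('CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, state TEXT, result TEXT)')

def add_task(func, *args):
    """Insert a 'processing' row, run the job in a background thread, store the result as JSON."""
    _init_db()
    job_id = uuid.uuid1().hex[:8]
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute('INSERT INTO jobs VALUES (?, ?, ?)', (job_id, 'processing', None))

    def worker():
        result = func(*args)
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute('UPDATE jobs SET state = ?, result = ? WHERE id = ?',
                         ('done', json.dumps(result), job_id))

    Thread(target=worker).start()
    return job_id

def task_state(job_id):
    """Return (state, result); state is None if the job id is unknown."""
    with sqlite3.connect(DB_PATH) as conn:
        row = conn.execute('SELECT state, result FROM jobs WHERE id = ?', (job_id,)).fetchone()
    if row is None:
        return None, None
    state, result = row
    return state, json.loads(result) if result else None

The please_wait view could then call task_state(uuid) instead of asking an in-process dictionary, so it gives the same answer no matter which Django process handles the request.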

Related

Python 3.7+: Wait until result is produced - api/task system

In the following code, an API gives a task to a task broker, which puts it in a queue, where it is picked up by a worker. The worker then executes the task and notifies the task broker (using a Redis message channel) that it is done, after which the task broker removes it from its queue. This works.
What I'd like is for the task broker to then be able to return the result of the task to the API. But I'm unsure how to do so since it is asynchronous code and I'm having difficulty figuring it out. Can you help?
Simplified, the code is roughly as follows (incomplete).
The API code:
@router.post('', response_model=BaseDocument)
async def post_document(document: BaseDocument):
    """Create the document with a specific type and an optional name given in the payload"""
    task = DocumentTask({ <SNIP>
    })
    task_broker.give_task(task)
    result = await task_broker.get_task_result(task)
    return result
The task broker code: the first part gives the task, the second part removes the task, and the final part is what I assume should be a blocking call on the status of the removed task.
def give_task(self, task_obj):
    self.add_task_to_queue(task_obj)
    <SNIP>
    self.message_channel.publish(task_obj)

# ...

def remove_task_from_queue(self, task):
    id_task_to_remove = task.id
    for i in range(len(task_queue)):
        if task_queue[i]["id"] == id_task_to_remove:
            removed_task = task_queue.pop(i)
            logger.debug(
                f"[TaskBroker] Task with id '{id_task_to_remove}' successfully removed!"
            )
            removed_task["status"] = "DONE"
            return

# ...

async def get_task_result(self, task):
    return task.result
My intuition is to implement get_task_result in a way that blocks on task.result until it is modified, and to modify it in remove_task_from_queue when the task is removed from the queue (and thus done).
Any idea how to do this asynchronously?
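One way to express that intuition, sketched under the assumption that the broker runs in the same asyncio event loop as the API (mark_task_done is a hypothetical helper standing in for the "task done" message handler, not code from the original project):

import asyncio

class TaskBroker:
    def __init__(self):
        self._futures = {}  # task id -> asyncio.Future holding the eventual result

    def give_task(self, task_obj):
        # Create a Future the API handler can await; then enqueue/publish as before.
        self._futures[task_obj.id] = asyncio.get_event_loop().create_future()
        # ... add_task_to_queue(task_obj) and message_channel.publish(task_obj) ...

    def mark_task_done(self, task_id, result):
        # Call this from the "task done" message handler, next to remove_task_from_queue().
        future = self._futures.get(task_id)
        if future is not None and not future.done():
            future.set_result(result)

    async def get_task_result(self, task_obj, timeout=30):
        # Suspends the caller until mark_task_done() fires, or raises asyncio.TimeoutError.
        try:
            return await asyncio.wait_for(self._futures[task_obj.id], timeout)
        finally:
            self._futures.pop(task_obj.id, None)

Nothing polls here: the API handler simply resumes once set_result() is called on the Future it is awaiting.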

Celery: how to get queue size in a reliable and testable way

I'm losing my mind trying to find a reliable and testable way to get the number of tasks contained in a given Celery queue.
I've already read these two related discussions:
Django Celery get task count
Note: I'm not using Django nor any other Python web framework.
Retrieve list of tasks in a queue in Celery
But I have not been able to solve my issue using the methods described in those threads.
I'm using Redis as backend, but I would like to have a backend independent and flexible solution, especially for tests.
This is my current situation: I've defined an EnhancedCelery class which inherits from Celery and adds a couple of methods, specifically get_queue_size() is the one I'm trying to properly implement/test.
The following is the code in my test case:
from kombu import Queue

celery_test_app = EnhancedCelery(__name__)

# this is needed to avoid exception for ping command
# which is automatically triggered by the worker once started
celery_test_app.loader.import_module('celery.contrib.testing.tasks')

# in memory backend
celery_test_app.conf.broker_url = 'memory://'
celery_test_app.conf.result_backend = 'cache+memory://'

# We have to setup queues manually,
# since it seems that auto queue creation doesn't work in tests :(
celery_test_app.conf.task_create_missing_queues = False
celery_test_app.conf.task_default_queue = 'default'
celery_test_app.conf.task_queues = (
    Queue('default', routing_key='task.#'),
    Queue('queue_1', routing_key='q1'),
    Queue('queue_2', routing_key='q2'),
    Queue('queue_3', routing_key='q3'),
)
celery_test_app.conf.task_default_exchange = 'tasks'
celery_test_app.conf.task_default_exchange_type = 'topic'
celery_test_app.conf.task_default_routing_key = 'task.default'
celery_test_app.conf.task_routes = {
    'sample_task': {
        'queue': 'default',
        'routing_key': 'task.default',
    },
    'sample_task_in_queue_1': {
        'queue': 'queue_1',
        'routing_key': 'q1',
    },
    'sample_task_in_queue_2': {
        'queue': 'queue_2',
        'routing_key': 'q2',
    },
    'sample_task_in_queue_3': {
        'queue': 'queue_3',
        'routing_key': 'q3',
    },
}
@celery_test_app.task()
def sample_task():
    return 'sample_task_result'

@celery_test_app.task(queue='queue_1')
def sample_task_in_queue_1():
    return 'sample_task_in_queue_1_result'

@celery_test_app.task(queue='queue_2')
def sample_task_in_queue_2():
    return 'sample_task_in_queue_2_result'

@celery_test_app.task(queue='queue_3')
def sample_task_in_queue_3():
    return 'sample_task_in_queue_3_result'
from unittest import TestCase  # or your test framework's TestCase

from celery.contrib.testing.worker import start_worker

class EnhancedCeleryTest(TestCase):
    def test_get_queue_size_returns_expected_value(self):
        def add_task(task):
            task.apply_async()

        with start_worker(celery_test_app):
            for _ in range(7):
                add_task(sample_task_in_queue_1)
            for _ in range(4):
                add_task(sample_task_in_queue_2)
            for _ in range(2):
                add_task(sample_task_in_queue_3)

            self.assertEqual(celery_test_app.get_queue_size('queue_1'), 7)
            self.assertEqual(celery_test_app.get_queue_size('queue_2'), 4)
            self.assertEqual(celery_test_app.get_queue_size('queue_3'), 2)
Here are my attempts to implement get_queue_size():
This always returns zero (jobs == 0):
def get_queue_size(self, queue_name: str) -> Optional[int]:
    with self.connection_or_acquire() as connection:
        channel = connection.default_channel
        try:
            name, jobs, consumers = channel.queue_declare(queue=queue_name, passive=True)
            return jobs
        except (ChannelError, NotFound):
            pass
This also always returns zero:
def get_queue_size(self, queue_name: str) -> Optional[int]:
    inspection = self.control.inspect()
    return inspection.active()  # zero!
    # or:
    return inspection.scheduled()  # zero!
    # or:
    return inspection.reserved()  # zero!
This works by returning the expected number for each queue, but only in the test environment, because the channel.queues property does not exist when using the redis backend:
def get_queue_size(self, queue_name: str) -> Optional[int]:
    with self.connection_or_acquire() as connection:
        channel = connection.default_channel
        if hasattr(channel, 'queues'):
            queue = channel.queues.get(queue_name)
            if queue is not None:
                return queue.unfinished_tasks
None of the solutions you mentioned are entirely correct, in my humble opinion. As you already mentioned, this is backend-specific, so you would have to wrap handlers for all backends supported by Celery to provide backend-agnostic queue inspection. In the Redis case you have to connect to Redis directly and LLEN the queue you want to inspect. In the case of RabbitMQ you find this information in a completely different way. Same story with SQS...
This has all been discussed in the Retrieve list of tasks in a queue in Celery thread...
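For the Redis case, a minimal sketch of that direct approach (redis_queue_size is a hypothetical helper; it assumes the default Kombu Redis transport, where each queue is stored as a Redis list keyed by the queue name):

import redis

def redis_queue_size(queue_name: str, url: str = 'redis://localhost:6379/0') -> int:
    """Number of messages currently waiting in a Celery queue on the Redis broker."""
    client = redis.Redis.from_url(url)
    # With the default transport options, the queue is a plain Redis list whose
    # key is the queue name, so LLEN returns the number of pending messages.
    return client.llen(queue_name)

Note that this only counts messages still sitting in the broker; tasks already prefetched or reserved by workers are not included.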
Finally, there is a reason why Celery does not provide this functionality out of the box - the information is, I believe, useless. By the time you get what is in the queue it may already be empty!
If you want to monitor what is going on with your queues I suggest another approach: write your own real-time monitor. The example just captures task-failed events, but you should be able to modify it easily to capture all the events you care about, and to gather data about those tasks (queue, time, host it was executed on, etc.).
For an example of how this is done in a more serious project, you can see how it's implemented in Flower (a real-time monitor for Celery) here; they have different Broker class implementations for Redis and RabbitMQ.
Another way is to use Celery's task events: count how many tasks were sent and how many succeeded/failed (a sketch follows below).
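A sketch of that last idea, following the real-time monitor pattern from the Celery documentation (run_queue_monitor is a hypothetical helper; it requires task_send_sent_event=True and workers started with events enabled, and the per-queue attribution relies on the 'queue' field of the task-sent event):

from collections import Counter

def run_queue_monitor(app):
    """Count sent vs. finished tasks per queue from Celery's event stream."""
    sent = Counter()
    finished = Counter()
    queue_of_task = {}  # task uuid -> queue name, taken from the task-sent event

    def on_sent(event):
        queue = event.get('queue') or 'default'
        queue_of_task[event['uuid']] = queue
        sent[queue] += 1

    def on_finished(event):
        queue = queue_of_task.pop(event['uuid'], 'default')
        finished[queue] += 1
        print({q: sent[q] - finished[q] for q in sent})  # rough backlog per queue

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-sent': on_sent,
            'task-succeeded': on_finished,
            'task-failed': on_finished,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)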

Python multithreading is mixing the data of different requests in Django

I am using Python multithreading to run a task that takes about 2 to 3 minutes, and I have made one API endpoint in a Django project.
Here is my code:
from threading import Thread

def myendpoint(request):
    print("hello")
    lis = [ *args ]
    obj = Model.objects.get(name="jax")
    T1 = MyThreadClass(lis, obj)
    T1.daemon = True  # must be set before start()
    T1.start()
    return HttpResponse("successful", status=200)

class MyThreadClass(Thread):
    def __init__(self, lis, obj):
        Thread.__init__(self)
        self.lis = lis
        self.obj = obj

    def run(self):
        for i in self.lis:
            res = Func1(i)
        self.obj.someattribute = res
        self.obj.save()

def Func1(i):
    '''Some big codes'''
    context = func2(*args)
    return context

def func2(*args):
    ''' some codes '''
    return res
With this multithreading I get a quick response from the Django server when calling the endpoint, because the big task is handed off to another thread and the endpoint's thread finishes at its return statement without keeping track of the spawned thread.
This works correctly if I hit the URL once, but if I hit the URL a second time as soon as the first execution starts, I can see the second request on the console but I never get any response from it.
And if I hit the same URL from two different clients at the same time, the data of the two requests gets mixed up and I see some records from one client's request in the other client's data.
I am testing on my local Django runserver.
So please help. I know about Celery, so don't recommend Celery; just tell me why this is happening or whether it can be fixed. My task is not long enough to justify Celery, and I want to achieve this with multithreading.
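For reference, a minimal sketch of the fire-and-forget pattern described above, with everything the thread needs passed in as arguments so that no state is shared between requests (long_running_job and the request parameter name are illustrative, not from the original project):

from threading import Thread
from django.http import HttpResponse

def long_running_job(items, obj_pk):
    # Re-fetch the object inside the thread so each request works on its own copy.
    obj = Model.objects.get(pk=obj_pk)
    for item in items:
        obj.someattribute = Func1(item)
    obj.save()

def myendpoint(request):
    items = request.GET.getlist('item')  # per-request data, never module-level state
    obj = Model.objects.get(name="jax")
    t = Thread(target=long_running_job, args=(items, obj.pk), daemon=True)
    t.start()  # respond immediately; the job keeps running in the background thread
    return HttpResponse("successful", status=200)

Whether this fixes the mixing depends on where the shared state actually lives, but keeping all per-request data local to the request (rather than on module-level or shared objects) is the first thing to check.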

Celery AsyncResult - Not working

I am new to celery but failing at what should be simple:
Backend and broker are both configured for RabbitMQ
Task as follows:
@app.task
def add(x, y):
    return x + y
Test Code:
File 1:
from tasks import add
from celery import uuid
task_id = uuid()
result = add.delay(7, 2)
task_id = result.task_id
print task_id
# output =
05f3f783-a538-45ed-89e3-c836a2623e8a
print result.get()
# output =
9
File 2:
from tasks import add
from celery.result import AsyncResult

res = AsyncResult('05f3f783-a538-45ed-89e3-c836a2623e8a')
print res.state
# output = pending
print ('Result = %s' % res.get())
My understanding is that file 2 should retrieve the state SUCCESS and the result 9.
I have installed Flower, and it reports SUCCESS and 9 for the result.
Help. This is driving me nuts.
Thank you
Maybe you should read the FineManual and think twice?
RPC Result Backend (RabbitMQ/QPid)
The RPC result backend (rpc://) is special as it doesn't actually store the states, but rather sends them as messages. This is an important difference as it means that a result can only be retrieved once, and only by the client that initiated the task. Two different processes can't wait for the same result.
(...)
The messages are transient (non-persistent) by default, so the results will disappear if the broker restarts. You can configure the result backend to send persistent messages using the result_persistent setting.
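Concretely, if a second process needs to fetch the result by task id, one option (a configuration sketch, not the asker's setup; the broker and backend URLs are placeholders) is to point the result backend at storage that actually keeps task states, such as a database or Redis, instead of rpc://:

from celery import Celery

# Keep RabbitMQ as the broker, but store results somewhere every process can read them.
app = Celery(
    'tasks',
    broker='amqp://guest@localhost//',        # placeholder broker URL
    backend='db+sqlite:///results.sqlite3',   # or e.g. 'redis://localhost:6379/0'
)

@app.task
def add(x, y):
    return x + y

With such a backend, the AsyncResult lookup in File 2 can be performed from a different process than the one that called add.delay().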

Failed ndb transaction attempt not rolling back all changes?

I have some trouble understanding a sequence of events causing a bug in my application, which I can only see intermittently in the app deployed on GAE and never when running with the local devserver.py.
All the related code snippets below (trimmed for an MCVE; hopefully I didn't lose anything significant) are executed while handling the same task queue request.
The entry point:
def job_completed_task(self, _):
    # running outside transaction as query is made
    if not self.all_context_jobs_completed(self.context.db_key, self):
        # this will transactionally enqueue another task
        self.trigger_job_mark_completed_transaction()
    else:
        # this is transactional
        self.context.jobs_completed(self)
The corresponding self.context.jobs_completed(self) is:
@ndb.transactional(xg=True)
def jobs_completed(self, job):
    if self.status == QAStrings.status_done:
        logging.debug('%s jobs_completed %s NOP' % (self.lid, job.job_id))
        return
    # some logic computing step_completed here
    if step_completed:
        self.status = QAStrings.status_done  # includes self.db_data.put()
        # this will transactionally enqueue another task
        job.trigger_job_mark_completed_transaction()
The self.status setter, hacked to obtain a traceback for debugging this scenario:
@status.setter
def status(self, new_status):
    assert ndb.in_transaction()
    status = getattr(self, self.attr_status)
    if status != new_status:
        traceback.print_stack()
        logging.info('%s status change %s -> %s' % (self.name, status, new_status))
        setattr(self, self.attr_status, new_status)
The job.trigger_job_mark_completed_transaction() eventually enqueues a new task like this:
task = taskqueue.add(queue_name=self.task_queue_name, url=url, params=params,
transactional=ndb.in_transaction(), countdown=delay)
The GAE log for the occurrence is split into screenshots, as it doesn't fit into a single screen.
My expectation from the jobs_completed transaction is to either see the ... jobs_completed ... NOP debug message and no task enqueued or to at least see the status change running -> done info message and a task enqueued by job.trigger_job_mark_completed_transaction().
What I'm actually seeing is both messages and no task enqueued.
The logs appear to indicate that the transaction is attempted twice:
1st time it finds the status not done, so it executes the logic, sets the status to done (and displays the traceback and the info msg) and should transactionally enqueue the new task - but it doesn't
2nd time it finds the status done and just prints the debug message
My question is - if the 1st transaction attempt fails shouldn't the status change be rolled back as well? What am I missing?
I found a workaround: specifying no retries for the jobs_completed() transaction:
@ndb.transactional(xg=True, retries=0)
def jobs_completed(self, job):
This prevents the automatic repeated execution, instead causing an exception:
TransactionFailedError(The transaction could not be committed. Please
try again.)
This is acceptable, as I already have a back-off/retry safety net in place for the entire job_completed_task(). Things are OK now.
As for why the rollback didn't happen, the only thing that crosses my mind is that somehow the entity was read (and cached in my object attribute) prior to entering the transaction, thus not being considered part of the (same) transaction. But I couldn't find a code path that would do that, so it's just speculation.
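If that speculation is right, one mitigation (a hypothetical sketch; db_key/db_data follow the naming used in the snippets above, but the exact attributes are assumptions) is to re-read the entity by key inside the transactional function rather than relying on an instance loaded earlier:

from google.appengine.ext import ndb

@ndb.transactional(xg=True, retries=0)
def jobs_completed(self, job):
    # Read the entity inside the transaction so the read/write pair belongs to it;
    # a Python object loaded (and mutated) before the transaction starts is not
    # reverted by the datastore rollback.
    self.db_data = self.db_key.get()
    if self.db_data.status == QAStrings.status_done:
        return
    # ... compute step_completed, set the status to done, and
    # ... call job.trigger_job_mark_completed_transaction() as before ...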
