Celery wait for a task from another request - python

Let's say there is a long task that takes about a minute. When a user makes a request to /get-info, they wait for the response and eventually get the result. I'm using delay() and wait(), and everything works. Now, if another 5 users make the same /get-info request while the task is running, I want them to 'connect' to the same task and get the result once it finishes. I'm saving the task id in Redis, but so far I'm having 2 problems:
If I use AsyncResult() and wait(), the second request hangs.
If I use AsyncResult() and state, the first request hangs. How can I implement this?
@main.route('/get-info', methods=['POST'])
def get_info():
    if redis.exists('getInfoTaskId'):
        task_id = redis.get('getInfoTaskId')
        task = add_together.AsyncResult(task_id)
        result = task.wait()
        # result = task.state  - if I use this line instead of wait(), the first request hangs
    else:
        task = add_together.delay(23, 42)
        redis.set('getInfoTaskId', task.id, ex=600)
        result = task.wait()
        redis.delete('getInfoTaskId')
    return f"task result is {result}"
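One possible approach (a sketch, not from the original post) is to avoid blocking in wait() for the requests that merely attach to an existing task, and instead poll the AsyncResult until it is ready. This assumes a Celery result backend is configured and reuses the add_together task and redis client from the question:

import time

@main.route('/get-info', methods=['POST'])
def get_info():
    task_id = redis.get('getInfoTaskId')   # may need .decode() depending on the redis client
    if task_id:
        # Another request already started the task; attach to it.
        task = add_together.AsyncResult(task_id)
    else:
        task = add_together.delay(23, 42)
        redis.set('getInfoTaskId', task.id, ex=600)

    # Poll instead of blocking in wait(), so any number of requests
    # can watch the same task without stepping on each other.
    while not task.ready():
        time.sleep(0.5)

    redis.delete('getInfoTaskId')
    return f"task result is {task.result}"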

Related

Python: Flask simple task queue without external libraries not working

I'm trying to do a simple task queue with Flask and without any DB.
In the most simple version, I have two endpoints. Submit job and Check status.
Submit job will add a request to the queue, and check status will return the status of a job id (queued, running, failed, finished).
The workflow is as follows:
user submits a job
job is added to queue
user will check status of the job every 5 seconds
every status check will trigger a function that checks whether the number of running jobs is smaller than the maximum number of jobs (from config). If it is smaller, it will spawn another thread with the job at the top of the queue.
This is the simplified code:
import uuid
from flask import Flask, request

app = Flask(__name__)

def finish_job(job_id):
    finished.append(job_id)
    last = running.pop(job_id)
    last.close()

def remove_finished():
    for j in list(running.keys()):
        if not running[j].is_alive():
            finish_job(j)

def start_jobs():
    while len(running) < config.threads and len(queue_list) > 0:
        print('running now', len(running))
        next_job = queue.pop()
        queue_list.remove(next_job[0])
        start_job(*next_job)

@app.route("/Simulation", methods=['POST'])
@authenticate
def submit_job():
    # create id
    job_id = str(uuid.uuid4())
    job_data = request.data.decode('utf-8')
    queue.append((job_id, job_data))
    queue_list.add(job_id)
    return 'QUEUED', 200

@app.route("/Simulation/<uuid:job_id>", methods=['GET'])
@authenticate
def check_status(job_id: uuid):
    job_id = str(job_id)
    remove_finished()
    start_jobs()
    if job_id in running:
        r = 'RUNNING'
    elif job_id in queue_list:
        r = 'QUEUED'
    elif job_id in finished:
        r = 'COMPLETED'
    else:
        r = 'FAILED'
    return status_response(r), 200

running = {}
finished = []
queue = []
queue_list = set()

app.run()
Now, the problem is that if multiple users submit a check-status request at the same time and there is only one free slot for running a task, both requests will spawn a job.
Is there some way to force Flask to only run one instance of a function at a time?
Thank you
After much searching, I have finally found an answer for this.
As of Flask 1.0, the builtin WSGI server runs threaded by default.
So I just needed to add a parameter to disable threading:
app.run(threaded=False)
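If you would rather keep the threaded server, another possible direction (not from the original answer, just a sketch) is to serialize the scheduling step with a lock, so two concurrent check-status requests cannot both see the same free slot. Here schedule_safely is a hypothetical helper that check_status would call instead of calling remove_finished() and start_jobs() directly:

import threading

scheduler_lock = threading.Lock()

def schedule_safely():
    # Only one request at a time may reap finished jobs and start new ones,
    # so the "free slot" check and the job start happen atomically.
    with scheduler_lock:
        remove_finished()
        start_jobs()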

Python 3.7+: Wait until result is produced - api/task system

In the following code, an API gives a task to a task broker, which puts it in a queue, where it is picked up by a worker. The worker then executes the task and notifies the task broker (using a Redis message channel) that it is done, after which the task broker removes the task from its queue. This works.
What I'd like is that the task broker is then able to return the result of the task to the API. But I'm unsure on how to do so since it is asynchronous code and I'm having difficulty figuring it out. Can you help?
Simplified, the code is roughly as follows (it is incomplete).
The API code:
@router.post('', response_model=BaseDocument)
async def post_document(document: BaseDocument):
    """Create the document with a specific type and an optional name given in the payload"""
    task = DocumentTask({ <SNIP>
    })
    task_broker.give_task(task)
    result = await task_broker.get_task_result(task)
    return result
The task broker code: the first part gives the task, the second part removes the task, and the final part is what I assume should be a blocking call on the status of the removed task.
def give_task(self, task_obj):
    self.add_task_to_queue(task_obj)
    <SNIP>
    self.message_channel.publish(task_obj)

# ...

def remove_task_from_queue(self, task):
    id_task_to_remove = task.id
    for i in range(len(task_queue)):
        if task_queue[i]["id"] == id_task_to_remove:
            removed_task = task_queue.pop(i)
            logger.debug(
                f"[TaskBroker] Task with id '{id_task_to_remove}' successfully removed!"
            )
            removed_task["status"] = "DONE"
            return

# ...

async def get_task_result(self, task):
    return task.result
My intuition is to implement get_task_result so that it blocks on task.result until it is modified, and to modify it in remove_task_from_queue when the task is removed from the queue (and thus done).
Any idea how to do this asynchronously?
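One way to get that blocking behaviour (a sketch, assuming the task broker and the API handlers share the same asyncio event loop and that a done_event attribute and result parameter can be added) is to attach an asyncio.Event to each task and await it in get_task_result:

import asyncio

def give_task(self, task_obj):
    task_obj.done_event = asyncio.Event()   # hypothetical attribute added per task
    self.add_task_to_queue(task_obj)
    self.message_channel.publish(task_obj)

def remove_task_from_queue(self, task, result=None):
    # ... existing removal logic from the question goes here ...
    task.result = result
    task.status = "DONE"
    task.done_event.set()                   # wakes every coroutine awaiting this task

async def get_task_result(self, task):
    await task.done_event.wait()            # suspends only this coroutine, not the loop
    return task.result

If the Redis message handler runs in a different thread than the event loop, the set() call would instead need to be handed over with loop.call_soon_threadsafe(task.done_event.set).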

Python multithreading is mixing the data of different requests in Django

I am using Python multithreading to run a task that takes about 2 to 3 minutes, and I have made one API endpoint for it in a Django project.
Here is my code:
from threading import Thread
from django.http import HttpResponse

def myendpoint(request):
    print("hello")
    lis = [ *args ]
    obj = Model.objects.get(name="jax")
    T1 = MyThreadClass(lis, obj)
    T1.daemon = True   # must be set before start()
    T1.start()
    return HttpResponse("successful", status=200)

class MyThreadClass(Thread):
    def __init__(self, lis, obj):
        Thread.__init__(self)
        self.lis = lis
        self.obj = obj

    def run(self):
        for i in self.lis:
            res = Func1(i)
            self.obj.someattribute = res
            self.obj.save()

def Func1(i):
    '''Some big codes'''
    context = func2(*args)
    return context

def func2(*args):
    '''some codes'''
    return res
With this multithreading I get a quick response from the Django server when calling the endpoint: the big task is thrown onto another thread, and the endpoint's own thread finishes at its return statement without keeping track of the spawned thread.
This works correctly if I hit the URL once, but if I hit the URL a second time as soon as the first execution starts, I can see the second request on the console yet I don't get any response from it.
And if I hit the same URL from 2 different clients at the same time, the individual data gets mixed up and I see some records from one client's request in the other client's data.
I am testing this against my local Django runserver.
Please help. I know about Celery, so don't recommend Celery; my task is not long enough to justify it. Just tell me why this is happening and whether it can be fixed. I want to achieve it with multithreading.
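One possible direction (just a sketch with assumed names, not a confirmed fix) is to set the daemon flag before start(), pass a primary key instead of a shared model instance, re-fetch the row inside the thread, and serialize the writes, so two requests no longer share and overwrite one in-memory object:

from threading import Thread, Lock

save_lock = Lock()

class MyThreadClass(Thread):
    def __init__(self, lis, obj_id):
        Thread.__init__(self)
        self.daemon = True              # set before start()
        self.lis = lis
        self.obj_id = obj_id            # pass a primary key, not a shared instance

    def run(self):
        # Each thread loads its own copy of the row.
        obj = Model.objects.get(pk=self.obj_id)
        for i in self.lis:
            res = Func1(i)
            with save_lock:             # avoid two threads saving the same row at once
                obj.someattribute = res
                obj.save(update_fields=["someattribute"])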

Are my Python threads tripping over each other with requests?

I have a program that needs to access data from an API. I need to get a list from it and then, for every item in that list, request more data from the API. The list comes in batches of 50, and there are around 600 items in total. I thought I could do this using requests and threading. Here is what it looks like:
I have essentially a helper method to call the API:
import requests

def call_api_method(method, token, params={}):
    params_to_send = params.copy()
    params_to_send['auth'] = token
    response = requests.get('{0}/rest{1}'.format(DOMAIN, method), params=params_to_send)
    return response.json()
I then have a recursive threading function to get all info. I thought I could use threads to go ahead and request the next batch of info while making threads to request the info per item:
import threading

def import_item_info(auth_token, start=None):
    if start is None:
        start = 0
    threads = []
    result = call_api_method('get_list', auth_token, {'start': start})
    # the call returns 'next', which is the index of the next batch
    if result['next']:
        thread = threading.Thread(target=import_item_info, args=(auth_token, result['next']))
        thread.start()
        threads.append(thread)
    for list_item in result['result']:
        thread = threading.Thread(target=get_item_info, args=(auth_token, list_item['ID']))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
This is get_item_info, which makes a call to the api using the id of the item to get the specific details about the item:
def get_item_info(auth_token, item_id):
    item = call_api_method('get_item', auth_token, {'id': item_id})
    print(item['key'])
I've abstracted a lot of the info, but essentially what is going on is that sometimes the requests.get returns something slightly garbled and I get a JSONDecodeError: Expecting value: line 1 column 1 (char 0).
I highly suspect that this is a threading problem because the first request goes through just fine. I can't seem to find what I'm doing wrong.
Okay. Sorry... I thought I checked but apparently I hit the query limit and that's why it is starting to do weird things.
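For reference, one way to stay under an API query limit with this kind of fan-out (a sketch, not part of the original code) is to cap the number of concurrent calls with a semaphore; the limit of 5 below is just an assumed value:

import threading

api_slots = threading.BoundedSemaphore(5)   # assumed concurrency cap

def get_item_info(auth_token, item_id):
    with api_slots:                         # at most 5 API calls in flight at once
        item = call_api_method('get_item', auth_token, {'id': item_id})
    print(item['key'])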

In Celery, how do I run a periodic task based on task completion

So in my Django project I currently have a Celery beat schedule for a task that runs periodically on a timer.
My task requests a URL about 250 times, and the URL responds with JSON. Since requests to that URL are rate limited, the whole task can take anywhere from 5 to 10 minutes depending on how successful the requests are.
Instead of running this task periodically on a timer, how can I run it based on the last task's completion?
For example: if the last task was completed 10 seconds ago, run this task again.
tasks.py
@app.task()
def run_db():
    allPlayers = Player.objects.all()
    for player in allPlayers:
        a = get_json(player.name)
        if a is None:
            pass
        else:
            player.mmr = a['rnk_amm_team_rating']
            player.save()
            print(player.mmr)
            time.sleep(2)
settings.py
CELERYBEAT_SCHEDULE = {
    'add-every-10-seconds': {
        'task': 'ladder.tasks.run_db',
        'schedule': timedelta(seconds=10),
    }
}
I think the only way to achieve this is to call the task again at the end of the current execution of the task.
Something like:
@app.task()
def rundb():
    allPlayers = Player.objects.all()
    for player in allPlayers:
        a = get_json(player.name)
        if a is None:
            pass
        else:
            player.mmr = a['rnk_amm_team_rating']
            player.save()
            print(player.mmr)
            time.sleep(2)
    rundb.apply_async(countdown=10)  # Call the same task again in 10 seconds
Keep in mind that this should be monitored. If the worker crashes and you're not using acks_late, or for some other reason, the worker may not be able to reschedule the task.
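A small variation on that idea (not from the original answer, just a sketch) is to bind the task so it can reschedule itself via self, which keeps working even if the task is renamed:

@app.task(bind=True)
def run_db(self):
    allPlayers = Player.objects.all()
    for player in allPlayers:
        a = get_json(player.name)
        if a is not None:
            player.mmr = a['rnk_amm_team_rating']
            player.save()
            time.sleep(2)
    self.apply_async(countdown=10)   # re-queue this same task 10 seconds after it finishes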
Instead of calling the task 250 times, you can make a single task which iterates 250 times. Since you want the next run to start after the completion of the previous one, you don't need to call all the tasks asynchronously. This is what I suggest:
@app.task()
def run_db():
    allPlayers = Player.objects.all()
    for run in range(totalRuns):  # totalRuns = 250 in your case
        for player in allPlayers:
            a = get_json(player.name)
            if a is None:
                pass
            else:
                player.mmr = a['rnk_amm_team_rating']
                player.save()
                print(player.mmr)
                time.sleep(2)
        time.sleep(10)
