I'm trying to build a simple task queue with Flask, without any DB.
In the simplest version, I have two endpoints: submit job and check status.
Submit job adds the request to the queue, and check status returns the status of a job id (queued, running, failed, finished).
The workflow is as follows:
user submits a job
the job is added to the queue
the user checks the status of the job every 5 seconds
every status check triggers a function that checks whether the number of running jobs is smaller than the maximum number of jobs (from config). If it is, it spawns another thread with the job on top of the queue.
This is the simplified code:
import uuid

from flask import Flask, request

app = Flask(__name__)

# global state shared by all requests
running = {}        # job_id -> running job handle
finished = []       # ids of completed jobs
queue = []          # pending (job_id, job_data) tuples
queue_list = set()  # ids of queued jobs, for fast membership checks

# config, authenticate, start_job and status_response are defined elsewhere
# in the real app and omitted from this simplified version

def finish_job(job_id):
    finished.append(job_id)
    last = running.pop(job_id)
    last.close()

def remove_finished():
    for j in list(running.keys()):
        if not running[j].is_alive():
            finish_job(j)

def start_jobs():
    while len(running) < config.threads and len(queue_list) > 0:
        print('running now', len(running))
        next_job = queue.pop()
        queue_list.remove(next_job[0])
        start_job(*next_job)

@app.route("/Simulation", methods=['POST'])
@authenticate
def submit_job():
    # create id
    job_id = str(uuid.uuid4())
    job_data = request.data.decode('utf-8')
    queue.append((job_id, job_data))
    queue_list.add(job_id)
    return 'QUEUED', 200

@app.route("/Simulation/<uuid:job_id>", methods=['GET'])
@authenticate
def check_status(job_id: uuid.UUID):
    job_id = str(job_id)
    remove_finished()
    start_jobs()
    if job_id in running:
        r = 'RUNNING'
    elif job_id in queue_list:
        r = 'QUEUED'
    elif job_id in finished:
        r = 'COMPLETED'
    else:
        r = 'FAILED'
    return status_response(r), 200

app.run()
Now, the problem is that if multiple users submit a check status request at the same time and there is only one free slot for running a task, both requests will spawn a job.
Is there some way to force Flask to run only one instance of a function at a time?
Thank you
After much searching, I have finally found an answer for this.
As of Flask 1.0, the built-in WSGI server runs threaded by default.
So I just needed to add a parameter to disable threading:
app.run(threaded=False)
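If you would rather keep the server threaded, another option (not from the original answer, just a minimal sketch assuming the globals and helpers from the question above) is to serialize only the scheduling step with a threading.Lock, so two concurrent status checks cannot both grab the same free slot:

import threading

scheduler_lock = threading.Lock()  # module-level lock shared by all request threads

def schedule_safely():
    # only one request thread may run the scheduling step at a time;
    # concurrent status checks block here briefly instead of double-starting a job
    with scheduler_lock:
        remove_finished()
        start_jobs()

check_status() would then call schedule_safely() instead of calling remove_finished() and start_jobs() directly.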
So I'm creating this application and part of it is a web page where a trading algorithm is testing itself using live data. All of that is working, but the issue is that if I leave (exit) the web page, it stops. I was wondering how I can keep it running in the background indefinitely, as I want the algorithm to keep doing its thing.
This is the route that I would like to run in the background.
import json
import time

from flask import Response

# lo is my own module that provides the Options class (its import is omitted here)

@app.route('/live-data-source')
def live_data_source():
    def get_live_data():
        live_options = lo.Options()
        while True:
            live_options.run()
            live_options.update_strategy()
            trades = live_options.get_all_option_trades()
            trades = trades[0]
            json_data = json.dumps({'data': trades})
            yield f"data:{json_data}\n\n"
            time.sleep(5)
    return Response(get_live_data(), mimetype='text/event-stream')
I've looked into multi-threading, but I'm not sure it's the right tool for the job. I am still kind of new to Flask, hence the poor question. If you need more info, please comment.
You can do it the following way; below is a 100% working example. Note that in production you should use Celery for such tasks, or write your own daemon app (a separate process) and feed it tasks from the HTTP server via a message queue (e.g. RabbitMQ) or a shared database.
If you have any questions about the code below, feel free to ask; it was quite a good exercise for me:
from flask import Flask, current_app
import threading
from threading import Thread, Event
import time
from random import randint

app = Flask(__name__)

# use the dict to store events that stop the other threads
# one event per thread!
app.config["ThreadWorkerActive"] = dict()

def do_work(e: Event):
    """Function for another thread to do some work."""
    while True:
        if e.is_set():
            break  # can be stopped from another thread
        print(f"{threading.current_thread().name} working now ...")
        time.sleep(2)
    print(f"{threading.current_thread().name} was stopped ...")

@app.route("/long_thread", methods=["GET"])
def long_thread_task():
    """Start a new worker thread."""
    th_name = f"Th-{randint(100000, 999999)}"  # not really unique actually
    stop_event = Event()  # is used to stop the other thread
    th = Thread(target=do_work, args=(stop_event,), name=th_name, daemon=True)
    th.start()
    current_app.config["ThreadWorkerActive"][th_name] = stop_event
    return f"{th_name} was created!"

@app.route("/stop_thread/<th_id>", methods=["GET"])
def stop_thread_task(th_id):
    th_name = f"Th-{th_id}"
    if th_name in current_app.config["ThreadWorkerActive"].keys():
        e = current_app.config["ThreadWorkerActive"].get(th_name)
        if e:
            e.set()
            current_app.config["ThreadWorkerActive"].pop(th_name)
            return f"Th-{th_id} was asked to stop"
        else:
            return "Sorry, something went wrong..."
    else:
        return f"Th-{th_id} not found"

@app.route("/", methods=["GET"])
def index_route():
    text = ("/long_thread - create another thread. "
            "/stop_thread/th_id - stop the thread with a certain id. "
            f"Available Threads: {'; '.join(current_app.config['ThreadWorkerActive'].keys())}")
    return text

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=9999)
Let's say there is a long task that takes 1 minute. When a user makes a request to /get-info and waits for the response, it should return the result. I'm using delay(), wait(), and everything works. Now, if another 5 users make the same /get-info request, I want them to 'connect' to the same task and get the result once the task is finished. I'm trying to save the task id in Redis, but so far I have 2 problems:
If I use AsyncResult() and wait(), the second request hangs.
If I use AsyncResult() and state, the first request hangs. How can I implement this?
@main.route('/get-info', methods=['POST'])
def get_info():
    if redis.exists('getInfoTaskId'):
        task_id = redis.get('getInfoTaskId')
        task = add_together.AsyncResult(task_id)
        result = task.wait()
        # result = task.state  # if I uncomment this and comment the line above, the first request hangs
    else:
        task = add_together.delay(23, 42)
        redis.set('getInfoTaskId', task.id, ex=600)
        result = task.wait()
        redis.delete('getInfoTaskId')
    return f"task result is {result}"
The main intent of this question is to learn how to refresh a cache from the DB (which is populated by another team, not under our control) in a Django REST service, which is then used to serve requests received on a REST endpoint.
Currently I am using the following approach, but my concern is: since Python (CPython, with the GIL) does not run threads in parallel, can we have the following kind of code in a REST service, where one thread populates the cache every 30 minutes and the main thread serves requests on the REST endpoint? Here is sample code, only for illustration.
# mainproject/__init__.py
from threading import Thread, Event

globaldict = {}  # cache

class MyThread(Thread):
    def __init__(self, event):
        Thread.__init__(self)
        self.stopped = event

    def run(self):
        while not self.stopped.wait(1800):
            refreshcachefromdb()  # takes around 5-6 mins to refresh the cache (global data structure) from the DB

refreshcachefromdb()  # called explicitly to populate the cache initially
stop_flag = Event()   # set this event to stop the refresh thread
thread = MyThread(stop_flag)
thread.start()  # start the thread that will refresh the cache every 30 mins

# views.py
from django.http import JsonResponse
from rest_framework.decorators import api_view

import mainproject

@api_view(['GET'])
def get_data(request):
    str_param = request.GET.get('paramid')
    if str_param:
        try:
            paramids = [int(x) for x in str_param.split(",")]
        except ValueError:
            return JsonResponse({'Error': 'This REST endpoint only accepts comma-separated integers'}, status=422)
        # use the global cache to look up the records
        output_dct_lst = [mainproject.globaldict[paramid] for paramid in paramids if paramid in mainproject.globaldict]
        if not output_dct_lst:
            return JsonResponse({'Error': 'Data not available'}, status=422)
        else:
            return JsonResponse(output_dct_lst, status=200, safe=False)
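On the thread-safety side, one extra precaution worth noting (a sketch with hypothetical fetch_rows_from_db() and row_to_dict() helpers, not part of the code above): since a refresh takes 5-6 minutes, building the new cache in a local dict and rebinding the module-level name in one step means readers always see either the old cache or the new one, never a half-populated dict:

# inside mainproject/__init__.py (sketch)
def refreshcachefromdb():
    new_cache = {}
    for row in fetch_rows_from_db():               # hypothetical helper reading the other team's tables
        new_cache[row.paramid] = row_to_dict(row)  # hypothetical conversion to what the view serves
    global globaldict
    globaldict = new_cache  # rebinding the name is a single step; readers never see a partial dict

Because the view reads mainproject.globaldict on every request, it automatically picks up the swapped-in dict.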
In a Flask app, I need to execute another task's checkJob function (which checks the job status and emails the user) after executing return render_template(page). The user will see the confirm page, but there should still be a background job running to check the job status.
I tried to use Celery (https://blog.miguelgrinberg.com/post/using-celery-with-flask) for the background job, and it does not work. Anything after return render_template(page) is not executed.
Here's the code fragment:
#app.route("/myprocess", methods=['POST'])
def myprocess():
//.... do work
#r = checkJob()
return render_template('confirm.html')
r = checkJob()
#celery.task()
def checkJob():
bb=1
while bb == 1:
print "checkJob"
time.sleep(10)
As suggested in the comments, you should use apply_async().
#app.route("/myprocess", methods=['POST'])
def myprocess():
#.... do work
r = checkJob.apply_async()
return render_template('confirm.html')
Note that, as in the example, you do not want to invoke checkJob() yourself; you call apply_async() on the task object checkJob instead.
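For reference, checkJob.delay() is shorthand for checkJob.apply_async() with no extra options; apply_async additionally accepts scheduling options such as countdown (the 30-second value below is only an illustration):

r = checkJob.delay()                    # equivalent to checkJob.apply_async()
r = checkJob.apply_async(countdown=30)  # schedule the task to run 30 seconds from now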
So, in my Django project I currently have a Celery beat schedule for a task that runs periodically on a timer.
My task requests a URL about 250 times, which responds with JSON, and since requests to that URL are rate-limited, the whole task can take anywhere from 5 to 10 minutes depending on how successful the requests are.
Instead of running this task periodically on a timer, how can I run it based on the last task's completion?
For example: if the last task was completed 10 seconds ago, run this task again.
tasks.py

@app.task()
def run_db():
    allPlayers = Player.objects.all()
    for player in allPlayers:
        a = get_json(player.name)
        if a is None:
            pass
        else:
            player.mmr = a['rnk_amm_team_rating']
            player.save()
            print(player.mmr)
            time.sleep(2)

settings.py

CELERYBEAT_SCHEDULE = {
    'add-every-10-seconds': {
        'task': 'ladder.tasks.run_db',
        'schedule': timedelta(seconds=10),
    }
}
I think the only way to achieve this is to call the task again at the end of the current execution of the task.
Something like:
@app.task()
def rundb():
    allPlayers = Player.objects.all()
    for player in allPlayers:
        a = get_json(player.name)
        if a is None:
            pass
        else:
            player.mmr = a['rnk_amm_team_rating']
            player.save()
            print(player.mmr)
            time.sleep(2)
    rundb.apply_async(countdown=10)  # call the same task again in 10 seconds
Keep in mind that this should be monitored. If the worker crashes and you're not using acks_late, or for some other reason, the worker may not be able to reschedule the task.
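A slightly more robust variant of the same idea (a sketch, using Celery's bind=True so the task references itself rather than the module-level name, with a hypothetical update_all_players() helper standing in for the player-update loop shown above):

@app.task(bind=True)
def rundb(self):
    update_all_players()            # hypothetical helper wrapping the loop over Player.objects.all()
    self.apply_async(countdown=10)  # reschedule this same task 10 seconds after the current run finishes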
Instead of calling the task 250 times, you can make a single task that iterates 250 times; since you want each run to start after the completion of the previous one, you don't need to call all the tasks asynchronously. This is what I suggest:
@app.task()
def run_db():
    allPlayers = Player.objects.all()
    for run in range(totalRuns):  # totalRuns = 250 in your case
        for player in allPlayers:
            a = get_json(player.name)
            if a is None:
                pass
            else:
                player.mmr = a['rnk_amm_team_rating']
                player.save()
                print(player.mmr)
                time.sleep(2)
        time.sleep(10)