I'm working on writing a Python web app with Flask, hosted on Azure. I need to do some background work. When I test the app locally, everything works great, but as soon as I push the update to Azure, it stops functioning. Right now I have a multiprocessing.Process set up, and based on the log files, Azure isn't starting the second process. Here are the relevant parts of my code:
# task queue and comm pipes
import logging
from multiprocessing import Process, Queue, Pipe

tasks = Queue()
parent_pipe, child_pipe = Pipe()

def handle_queue_execution(tasks, pipe):
    logging.info("starting task queue handler")
    while True:
        if pipe.recv():
            logging.debug("preparing to get task from queue")
            task = tasks.get()
            args = tasks.get()
            logging.debug("executing task %s(%s)", get_fn_name(task), clean_args(args))
            task(args)
            logging.debug("task %s(%s) executed successfully", get_fn_name(task), clean_args(args))

queue_handler = Process(target=handle_queue_execution, args=(tasks, child_pipe,))
queue_handler.daemon = True

if __name__ == '__main__':
    queue_handler.start()
There are a few semi-related questions I have on this:
1) Why won't Azure start another process?
You'll note that the handle_queue_execution function begins with a logger call. That message doesn't appear in the log file when hosted on Azure, nor do the queued tasks appear to execute. Again, both aspects of this work as expected when running on localhost.
2) Is there a better way?
I'm fairly new to both Python and Azure, so if there's a better way to do this type of task handling, I'm open to hearing about it. I've looked into using something like Celery, but I can't figure out how to set it up, and I'd prefer to make my own implementation while I'm learning these new skills.
Thanks very much.
Python has several other ways to run work in the background. Threading would most likely be the easiest here.
# task queue and comm pipes
import logging
import threading
from multiprocessing import Queue, Pipe

tasks = Queue()
parent_pipe, child_pipe = Pipe()

def handle_queue_execution(tasks, pipe):
    logging.info("starting task queue handler")
    while True:
        if pipe.recv():
            logging.debug("preparing to get task from queue")
            task = tasks.get()
            args = tasks.get()
            logging.debug("executing task %s(%s)", get_fn_name(task), clean_args(args))
            task(args)
            logging.debug("task %s(%s) executed successfully", get_fn_name(task), clean_args(args))

T1 = threading.Thread(target=handle_queue_execution, args=(tasks, child_pipe,))

if __name__ == '__main__':
    T1.start()
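For completeness, here's a rough sketch of how a Flask route could feed this handler; the /enqueue route, the example_task function, and the Flask app object are hypothetical, but tasks and parent_pipe are the objects defined above.

# Hypothetical sketch: a Flask route that enqueues work for the handler above.
from flask import Flask

app = Flask(__name__)  # in your app this object already exists

def example_task(args):
    # placeholder for real work
    logging.info("running example_task with %s", args)

@app.route('/enqueue')
def enqueue():
    tasks.put(example_task)        # the handler does two get() calls:
    tasks.put({"key": "value"})    # first the callable, then its arguments
    parent_pipe.send(True)         # wake up handle_queue_execution
    return "task queued"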
Related
We have an aiohttp server serving requests at:
from aiohttp import web

app = web.Application()
app.add_routes(
    [
        web.post("/submit_job", submit_job),
        web.get("/get_job/{job_name}", get_job),
    ]
)

web.run_app(
    app, host="127.0.0.1", port=s.kworkers_port, access_log=logger, keepalive_timeout=5,
    reuse_address=True, reuse_port=True)
where /submit_job sends a long-running asyncio.Task to the currently running event loop:
async def coro():
    # Construct a ProcessPoolExecutor object per function run to make sure the resources
    # are cleaned up right after the function runs to completion.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Keep a reference to the result to prevent the `run_in_executor` function from
        # disappearing midway through running.
        result = await asyncio.get_running_loop().run_in_executor(
            executor, functools.partial(worker_func, **worker_func_kwargs))
        print(f"Got result from running {worker_func.__name__}({worker_func_kwargs}): {result}")

task = asyncio.create_task(coro())
self.background_tasks.add(task)
# To prevent keeping references to finished tasks forever, make each task remove its own
# reference from the set after completion.
task.add_done_callback(self.background_tasks.discard)
where worker_func is a blocking CPU-intensive function.
After a /submit_job call, a separate process polls on /get_job/{job_name} to retrieve the status of the task.
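For context, here is a minimal sketch of what the two handlers described above might look like; the jobs dict, the request/response shapes, and the handler bodies are assumptions, not the original code.

# Sketch of the handlers described above; the `jobs` dict, request shape,
# and status values are assumptions, not the original code.
import asyncio

jobs = {}  # job_name -> asyncio.Task

async def submit_job(request):
    payload = await request.json()
    job_name = payload["job_name"]        # assumed request shape
    task = asyncio.create_task(coro())    # the coroutine shown above
    jobs[job_name] = task
    return web.json_response({"submitted": job_name})

async def get_job(request):
    job_name = request.match_info["job_name"]
    task = jobs.get(job_name)
    if task is None:
        return web.json_response({"status": "unknown"}, status=404)
    return web.json_response({"status": "done" if task.done() else "running"})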
This setup works only when there is no load on the system. As soon as there is any load, no matter how light, all /get_job/{job_name} requests hang.
What's wrong with aiohttp+asyncio in this code?
Question
I've read a bit about accessing status from a Celery worker in a Flask application, like in this tutorial, but can you go the other way? Can you send an interrupt to, or get introspection into, a Celery worker after it's been started?
I've read a bit about signals, but either don't understand them yet or it's not what I'm looking for. Possibly both.
Background
I'm using Celery to kick off a long-running loop that subscribes to an MQTT topic, and I'd like to be able to shut down that process/subscription from another endpoint in my Flask app. What's the best way to do this? Or any way at all?
Example Code
from flask import Flask
from celery import Celery
import time

app = Flask(__name__)
app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'
app.config['CELERY_RESULT_BACKEND'] = 'redis://localhost:6379/0'

celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)

@celery.task(bind=True)
def test_loop(self):
    i = 0
    running = True
    while running:
        i = i + 1
        print("loop running %d" % i)
        time.sleep(1)

@app.route('/')
def index():
    return 'index page'

@app.route('/start')
def start():
    global task
    task = test_loop.delay()
    return "started loop"

@app.route('/stop')
def stop():
    global task  ### What I'm having trouble with
    task.running = False  ### How can I interrupt/introspect into the task?
    return "stopped loop"
TL;DR
Is there a way to send an interrupt or get introspection into a Celery worker after it's been started? How can I stop a long-running loop started in a Celery Worker from Flask?
My personal advice would be to stay away from tasks that run forever.
If you absolutely must abort a task then you can use revoke.
http://docs.celeryproject.org/en/latest/userguide/workers.html#revoke-revoking-tasks
@app.route('/stop')
def stop():
    global task
    task.revoke(terminate=True, signal='SIGKILL')
    return "stopped loop"
Celery may be overkill for your use case, but I'm not totally sure what your end goal is, so I can't really offer any alternatives.
I have a multi-process Tornado web server, and I want to create another process that will do some things in the background.
I have a server with the following code:
start_background_process()
app = Application([<someurls>])
server = HTTPServer(app)
server.bind(8888)
server.start(4) # Forks multiple sub-processes
IOLoop.current().start()
def start_background_process():
    process = multiprocessing.Process(target=somefunc)
    process.start()
and everything is working great.
However, when I try to close the server (by Ctrl-C or by sending a signal), I get:
AssertionError: can only join a child process
I understand the cause of this problem:
when I create a process with multiprocessing, a call to the process's join method
is registered in atexit. Because Tornado does a simple fork, all its children also try to call the join method of the process I created, and they can't, since that process is their sibling and not their child.
So how can I start a process properly in Tornado?
"HTTPTserver start" uses os.fork to fork the 4 sub-processes as it can be seen in its source code.
If you want your method to be executed by all the 4 sub-processes, you have to call it after the processes have been forked.
With that in mind, your code can be changed to look as below:
import multiprocessing

import tornado.web
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop

# A simple external handler as an example for completeness
from handlers.index import IndexHandler

def method_on_sub_process():
    print("Executing in sub-process")

def start_background_process():
    process = multiprocessing.Process(target=method_on_sub_process)
    process.start()

def main():
    app = tornado.web.Application([(r"/", IndexHandler)])
    server = HTTPServer(app)
    server.bind(8888)
    server.start(4)
    start_background_process()
    IOLoop.current().start()

if __name__ == "__main__":
    main()
Furthermore, to keep the behavior of your program clean during a keyboard interruption, surround the instantiation of the server with a try...except block, as below:
def main():
    try:
        app = tornado.web.Application([(r"/", IndexHandler)])
        server = HTTPServer(app)
        server.bind(8888)
        server.start(4)
        start_background_process()
        IOLoop.current().start()
    except KeyboardInterrupt:
        IOLoop.instance().stop()
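If you also want the background process itself cleaned up on shutdown, one option (a sketch, not part of the original answer) is to keep a reference to it so the worker that created it can terminate and join it; this stays consistent with the "can only join a child process" rule, since only the creating process touches its own child:

# Sketch: let each worker clean up the background process it created.
background_process = None

def start_background_process():
    global background_process
    background_process = multiprocessing.Process(target=method_on_sub_process)
    background_process.start()

def main():
    try:
        app = tornado.web.Application([(r"/", IndexHandler)])
        server = HTTPServer(app)
        server.bind(8888)
        server.start(4)
        start_background_process()
        IOLoop.current().start()
    except KeyboardInterrupt:
        IOLoop.instance().stop()
        if background_process is not None:
            background_process.terminate()  # stop the child this worker created
            background_process.join()       # only the parent joins its own child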
I've built a RESTful service in Python using CherryPy, which is multi-threaded by default, so two different HTTP sessions don't block each other.
For a given endpoint of my API I need a way to start a long (non-blocking) background task which I can stop at any time. Currently I'm using a new thread to run the task, which lets the user send other requests to the server without waiting for the long task to complete. Unfortunately I also need a way to stop the background task at any time, and it seems I can't stop the new thread from the main thread (am I correct?).
@cp.expose
@cp.tools.json_in()
@cp.tools.json_out()
class LongTaskEndpoint(object):
    def GET(self):
        thread = Thread(target=longRunningTask, args=())
        thread.start()
        return {"message": "Long task started"}
I've tried a multiprocessing Process instead of a thread, but this seems to block the main thread (the client can't get any response from the server until the background task is completed):
@cp.expose
@cp.tools.json_in()
@cp.tools.json_out()
class LongTaskEndpoint(object):
    def GET(self):
        process = multiprocessing.Process(target=longRunningTask, args=())
        process.start()
        return {"message": "Long task started"}
How can I start a long background task which does not block the main thread (for each HTTP session) and which the server can stop at any moment?
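One common pattern (a sketch, not from the original post; longRunningTask's body and the stop endpoint are assumptions) is cooperative cancellation: the worker thread periodically checks a threading.Event, and a second endpoint sets it:

# Sketch of cooperative cancellation with threading.Event; the task body and
# endpoint layout are assumptions, not the original code.
import threading

stop_event = threading.Event()

def longRunningTask():
    while not stop_event.is_set():
        do_one_chunk_of_work()   # hypothetical unit of work
        stop_event.wait(1)       # also acts as a sleep between chunks

@cp.expose
@cp.tools.json_out()
class StopEndpoint(object):
    def GET(self):
        stop_event.set()
        return {"message": "Stop requested"}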
import threading
import Queue
import urllib2
import time

class ThreadURL(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            host = self.queue.get()
            sock = urllib2.urlopen(host)
            data = sock.read()
            self.queue.task_done()

hosts = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.facebook.com', 'http://stackoverflow.com']
start = time.time()

def main():
    queue = Queue.Queue()

    for i in range(len(hosts)):
        t = ThreadURL(queue)
        t.start()

    for host in hosts:
        queue.put(host)

    queue.join()

if __name__ == '__main__':
    main()
    print 'Elapsed time: {0}'.format(time.time() - start)
I've been trying to get my head around how to use threading, and after a few tutorials I've come up with the above.
What it's supposed to do is:
Initialise the queue
Create my Thread pool and then queue up the list of hosts
My ThreadURL class should then begin work once a host is in the queue and read the website data
The program should finish
What I want to know first off is, am I doing this correctly? Is this the best way to handle threads?
Secondly, my program fails to exit. It prints out the Elapsed time line and then hangs there. I have to kill my terminal for it to go away. I'm assuming this is due to my incorrect use of queue.join()?
Your code looks fine and is quite clean.
The reason your application still "hangs" is that the worker threads are still running, waiting for the main application to put something in the queue, even though your main thread is finished.
The simplest way to fix this is to mark the threads as daemons, by doing t.daemon = True before your call to start. This way, the threads will not block the program stopping.
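Concretely, that change would look like this in main()'s thread-creation loop:

# Mark worker threads as daemons so they don't keep the process alive at exit.
for i in range(len(hosts)):
    t = ThreadURL(queue)
    t.daemon = True  # must be set before start()
    t.start()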
Looks fine. Yann is right about the daemon suggestion; that will fix your hang. My only question is: why use the queue at all? You're not doing any cross-thread communication, so it seems like you could just pass the host as an argument to ThreadURL's __init__() and drop the queue.
Nothing wrong with it, just wondering.
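That queue-free alternative might look roughly like this (a sketch, not tested against the original code):

# Sketch: pass the host directly instead of going through a queue.
class ThreadURL(threading.Thread):
    def __init__(self, host):
        threading.Thread.__init__(self)
        self.host = host

    def run(self):
        sock = urllib2.urlopen(self.host)
        data = sock.read()

def main():
    threads = [ThreadURL(host) for host in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for all downloads to finish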
One thing: in the thread's run function, inside the while True loop, if an exception happens then task_done() may never be called even though get() has already been called. As a result, queue.join() may never return.
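A defensive version of run() (a sketch) pairs every get() with a task_done() in a finally block so join() can always complete:

# Sketch: make sure task_done() is called even if the work raises.
def run(self):
    while True:
        host = self.queue.get()
        try:
            sock = urllib2.urlopen(host)
            data = sock.read()
        finally:
            self.queue.task_done()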