Python CherryPy: how to start a long background (stoppable) task

I've built a RESTful service in Python using CherryPy, which is multi-threaded by default, so two different HTTP sessions don't block each other.
For a given endpoint of my API I need a way to start a long, non-blocking background task which I can stop at any time. At the moment I start a new thread to run the task, which lets the user send other requests to the server without waiting for the long task to complete. Unfortunately I also need a way to stop the background task at any time, and it seems I can't stop the new thread from the main thread (am I correct?).
from threading import Thread
import cherrypy as cp  # assumption: cherrypy is imported under the alias "cp"

@cp.expose
@cp.tools.json_in()
@cp.tools.json_out()
class LongTaskEndpoint(object):
    def GET(self):
        thread = Thread(target=longRunningTask, args=())
        thread.start()
        return {"message": "Long task started"}
I've tried multiprocessing.Process instead of a thread, but this seems to block the main thread (the client can't get any response from the server until the background task is completed):
@cp.expose
@cp.tools.json_in()
@cp.tools.json_out()
class LongTaskEndpoint(object):
    def GET(self):
        process = multiprocessing.Process(target=longRunningTask, args=())
        process.start()
        return {"message": "Long task started"}
How can I start a long background task which does not block the main thread (for each HTTP session) and which the server can stop at any moment?
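One way to get a stoppable task, shown here only as a minimal sketch (not from the original post), is to hand the worker thread a threading.Event and expose a second HTTP verb that sets it. This assumes the long-running work can be rewritten to check the flag between chunks, and that cherrypy is imported as cp:
import time
from threading import Thread, Event

import cherrypy as cp  # assumption: cherrypy imported under the alias "cp"

def longRunningTask(stop_event):
    # Hypothetical cooperative worker: it checks the stop flag between chunks of work.
    while not stop_event.is_set():
        time.sleep(1)  # ... one chunk of the real work goes here ...

@cp.expose
@cp.tools.json_out()
class LongTaskEndpoint(object):
    def __init__(self):
        self.stop_event = None  # this sketch tracks one task at a time

    def GET(self):
        self.stop_event = Event()
        Thread(target=longRunningTask, args=(self.stop_event,), daemon=True).start()
        return {"message": "Long task started"}

    def DELETE(self):
        # Ask the running task to stop at its next checkpoint.
        if self.stop_event is not None:
            self.stop_event.set()
        return {"message": "Stop requested"}
A thread cannot be killed from outside in Python, so the stop has to be cooperative: the worker polls the event and exits on its own. multiprocessing.Process, by contrast, does have a terminate() method if the work cannot be made cooperative.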

Related

aiohttp and asyncio concurrency

We have an aiohttp server serving requests at:
from aiohttp import web

app = web.Application()
app.add_routes(
    [
        web.post("/submit_job", submit_job),
        web.get("/get_job/{job_name}", get_job),
    ]
)
web.run_app(
    app, host="127.0.0.1", port=s.kworkers_port, access_log=logger, keepalive_timeout=5,
    reuse_address=True, reuse_port=True)
where /submit_job sends a long-running asyncio.Task to the current running event loop:
async def coro():
    # Construct a ProcessPoolExecutor object per function run to make sure the resources are cleaned up
    # right after the function runs to completion.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Keep a reference to the result to prevent the `run_in_executor` function from
        # disappearing midway through running.
        result = await asyncio.get_running_loop().run_in_executor(
            executor, functools.partial(worker_func, **worker_func_kwargs))
        print(f"Got result from running {worker_func.__name__}({worker_func_kwargs}): {result}")

task = asyncio.create_task(coro())
self.background_tasks.add(task)
# To prevent keeping references to finished tasks forever, make each task remove its own reference
# from the set after completion.
task.add_done_callback(self.background_tasks.discard)
where worker_func is a blocking CPU-intensive function.
After a /submit_job call, a separate process polls on /get_job/{job_name} to retrieve the status of the task.
This setup works only when there is no load on the system. As soon as some sort of load is incurred, no matter how light, all /get_job/{job_name} requests hang.
What's wrong with aiohttp+asyncio in this code?
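For comparison only (this is not the original code and not necessarily the fix), here is a sketch of the same coroutine using a single ProcessPoolExecutor created once at startup and shared by all jobs; worker_func, worker_func_kwargs and the task bookkeeping are carried over from the question:
import asyncio
import concurrent.futures
import functools

# Assumption: one pool built once at startup and reused for every job,
# rather than constructing a ProcessPoolExecutor inside each coro() call.
executor = concurrent.futures.ProcessPoolExecutor()

async def coro():
    result = await asyncio.get_running_loop().run_in_executor(
        executor, functools.partial(worker_func, **worker_func_kwargs))
    print(f"Got result from running {worker_func.__name__}({worker_func_kwargs}): {result}")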

Multithreading/Multiprocessing in Python 3.4 web app for Azure

I'm working on writing a Python web app with Flask, using Azure to host it. I need to do some background work. When I test the app locally, everything works great. However, as soon as I push the update to Azure, it stops functioning. Right now I have a multiprocessing.Process set up, and based on the log files, Azure isn't starting another process. Here are the relevant parts of my code:
from multiprocessing import Process, Queue, Pipe

# task queue and comm pipes
tasks = Queue()
parent_pipe, child_pipe = Pipe()

def handle_queue_execution(tasks, pipe):
    logging.info("starting task queue handler")
    while True:
        if pipe.recv():
            logging.debug("preparing to get task from queue")
            task = tasks.get()
            args = tasks.get()
            logging.debug("executing task %s(%s)", get_fn_name(task), clean_args(args))
            task(args)
            logging.debug("task %s(%s) executed successfully", get_fn_name(task), clean_args(args))

queue_handler = Process(target=handle_queue_execution, args=(tasks, child_pipe,))
queue_handler.daemon = True

if __name__ == '__main__':
    queue_handler.start()
There are a few semi-related questions I have on this:
1) Why won't Azure start another process?
You'll note that the handle_queue_execution function begins with a logger call. That message doesn't appear in the log file when hosted on Azure, nor do the queued tasks appear to execute. Again, both aspects of this work as expected when running on localhost.
2) Is there a better way?
I'm fairly new to both Python and Azure, so if there's a better way to do this type of task handling, I'm open to hearing about it. I've looked into using something like Celery, but I can't figure out how to set it up, and I'd prefer to make my own implementation as I'm learning these new skills.
Thanks very much.
Python has multiple other ways to run work in the background; threading would most likely be the easiest here.
# task queue and comm pipes
import threading

tasks = Queue()
parent_pipe, child_pipe = Pipe()

def handle_queue_execution(tasks, pipe):
    logging.info("starting task queue handler")
    while True:
        if pipe.recv():
            logging.debug("preparing to get task from queue")
            task = tasks.get()
            args = tasks.get()
            logging.debug("executing task %s(%s)", get_fn_name(task), clean_args(args))
            task(args)
            logging.debug("task %s(%s) executed successfully", get_fn_name(task), clean_args(args))

T1 = threading.Thread(target=handle_queue_execution, args=(tasks, child_pipe,))

if __name__ == '__main__':
    T1.start()
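For completeness, the handler above only drains the queue when something arrives on the pipe, so the web-request side (not shown in the question) would enqueue the work and then poke the pipe. A hedged sketch, with enqueue_task as a hypothetical helper name:
def enqueue_task(task, args):
    # Hand the work item to the background handler, then wake it up via the pipe.
    tasks.put(task)
    tasks.put(args)
    parent_pipe.send(True)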

RuntimeError: working outside of request context

I am trying to create a 'keepalive' websocket thread that sends an emit to the browser every 10 seconds once someone connects to the page, but I'm getting a "RuntimeError: working outside of request context" error and am not sure how to get around it.
Any ideas on how to make this work?
And how would I kill this thread once a 'disconnect' is sent?
Thanks!
@socketio.on('connect', namespace='/endpoint')
def test_connect():
    emit('my response', {'data': '<br>Client thinks i\'m connected'})

    def background_thread():
        """Example of how to send server generated events to clients."""
        count = 0
        while True:
            time.sleep(10)
            count += 1
            emit('my response', {'data': 'websocket is keeping alive'}, namespace='/endpoint')

    global thread
    if thread is None:
        thread = Thread(target=background_thread)
        thread.start()
You wrote your background thread in a way that requires it to know who the client is, since you are sending a direct message to it. For that reason the background thread needs to have access to the request context. In Flask you can install a copy of the current request context in a thread using the copy_current_request_context decorator:
@copy_current_request_context
def background_thread():
    """Example of how to send server generated events to clients."""
    count = 0
    while True:
        time.sleep(10)
        count += 1
        emit('my response', {'data': 'websocket is keeping alive'}, namespace='/endpoint')
Couple of notes:
It is not necessary to set the namespace when you are sending back to the client, by default the emit call will be on the same namespace used by the client. The namespace needs to be specified when you broadcast or send messages outside of a request context.
Keep in mind your design will require a separate thread for each client that connects. It would be more efficient to have a single background thread that broadcasts to all clients. See the example application that I have on the Github repository for an example: https://github.com/miguelgrinberg/Flask-SocketIO/tree/master/example
To stop the thread when the client disconnects you can use any multi-threading mechanism to let the thread know it needs to exit. This can be, for example, a global variable (or a threading event) that you set in the disconnect handler, as in the sketch after these notes. A not-so-great alternative that is easy to implement is to wait for the emit to raise an exception once the client has gone away and use that to exit the thread.
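A rough sketch of the event-based exit, assuming a single client as in the question (a multi-client app would need one event per client); the names stop_event and test_disconnect are illustrative, not from the original code:
from threading import Event

stop_event = Event()

@copy_current_request_context
def background_thread():
    """Send a keepalive message until the client disconnects."""
    while not stop_event.wait(10):  # wait() returns True once the event has been set
        emit('my response', {'data': 'websocket is keeping alive'})

@socketio.on('disconnect', namespace='/endpoint')
def test_disconnect():
    # Tell the keepalive loop to exit instead of trying to kill the thread.
    stop_event.set()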

Consuming rabbitmq queue from inside python threads

This is a long one.
I have a list of usernames and passwords. For each one I want to log in to the account and do some things. I want to use several machines to do this faster. The way I was thinking of doing this is to have a main machine whose only job is to run a cron job that periodically checks whether the RabbitMQ queue is empty. If it is, it reads the list of usernames and passwords from a file and sends them to the RabbitMQ queue. Then I'd have a bunch of machines subscribed to that queue, whose job is to receive a user/pass, do stuff with it, acknowledge it, and move on to the next one, until the queue is empty and the main machine fills it up again. So far I think I have everything down.
Now comes my problem. I have checked that the things to be done with each user/pass aren't very intensive, so each machine could handle three of them simultaneously using Python's threading. In fact, for a single machine I have implemented this by loading the user/passes into a Python Queue() and then having three threads consume that Queue(). Now I want to do something similar, but instead of consuming from a Python Queue(), each thread of each machine should consume from a RabbitMQ queue. This is where I'm stuck. To run tests I started with RabbitMQ's tutorial.
send.py:
import pika, sys

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='hello')
message = ' '.join(sys.argv[1:])
channel.basic_publish(exchange='',
                      routing_key='hello',
                      body=message)
connection.close()
worker.py:
import time, pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='hello')

def callback(ch, method, properties, body):
    print ' [x] received %r' % (body,)
    time.sleep( body.count('.') )
    ch.basic_ack(delivery_tag = method.delivery_tag)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(callback, queue='hello', no_ack=False)
channel.start_consuming()
With the above you can run two copies of worker.py, which will subscribe to the RabbitMQ queue and consume as expected.
My threading code without RabbitMQ looks something like this:
runit.py:
import threading, Queue

class Threaded_do_stuff(threading.Thread):
    def __init__(self, user_queue):
        threading.Thread.__init__(self)
        self.user_queue = user_queue

    def run(self):
        while True:
            login = self.user_queue.get()
            do_stuff(user=login[0], password=login[1])  # "pass" is a reserved word, so "password" here
            self.user_queue.task_done()

user_queue = Queue.Queue()
for i in range(3):
    td = Threaded_do_stuff(user_queue)
    td.setDaemon(True)
    td.start()

## fill up the queue
for user in list_users:
    user_queue.put(user)

## go!
user_queue.join()
This also works as expected: you fill up the queue and have 3 threads subscribe to it. Now what I want to do is something like runit.py but instead of using a python Queue(), using something like worker.py where the queue is actually a rabbitmq queue.
Here's something which I tried and didn't work (and I don't understand why)
rabbitmq_runit.py:
import time, threading, pika

class Threaded_worker(threading.Thread):
    def callback(self, ch, method, properties, body):
        print ' [x] received %r' % (body,)
        time.sleep( body.count('.') )
        ch.basic_ack(delivery_tag = method.delivery_tag)

    def __init__(self):
        threading.Thread.__init__(self)
        self.connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue='hello')
        self.channel.basic_qos(prefetch_count=1)
        self.channel.basic_consume(self.callback, queue='hello')

    def run(self):
        print 'start consuming'
        self.channel.start_consuming()

for _ in range(3):
    print 'launch thread'
    td = Threaded_worker()
    td.setDaemon(True)
    td.start()
I would expect this to launch three threads, each of which is blocked by .start_consuming(), which just stays there waiting for the RabbitMQ queue to send them something. Instead, the program starts, does a few prints, and exits. The pattern of the output is weird too:
launch thread
launch thread
start consuming
launch thread
start consuming
In particular notice there is one "start consuming" missing.
What's going on?
EDIT: One answer I found to a similar question is here
Consuming a rabbitmq message queue with multiple threads (Python Kombu)
and the answer is to "use celery", whatever that means. I don't buy it, I shouldn't need anything remotely as sophisticated as celery. In particular, I'm not trying to set up an RPC and I don't need to read replies from the do_stuff routines.
EDIT 2: The print pattern that I expected would be the following. I do
python send.py first message......
python send.py second message.
python send.py third message.
python send.py fourth message.
and the print pattern would be
launch thread
start consuming
[x] received 'first message......'
launch thread
start consuming
[x] received 'second message.'
launch thread
start consuming
[x] received 'third message.'
[x] received 'fourth message.'
The problem is that you're making the thread daemonic:
td = Threaded_worker()
td.setDaemon(True) # Shouldn't do that.
td.start()
Daemonic threads will be terminated as soon as the main thread exits:
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left. The initial value is inherited from the creating thread. The
flag can be set through the daemon property.
Leave out setDaemon(True) and you should see it behave the way you expect.
Also, the pika FAQ has a note about how to use it with threads:
Pika does not have any notion of threading in the code. If you want to
use Pika with threading, make sure you have a Pika connection per
thread, created in that thread. It is not safe to share one Pika
connection across threads.
This suggests you should move everything you're doing in __init__() into run(), so that the connection is created in the same thread you're actually consuming from the queue in.
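In other words, something along these lines (a sketch of the restructuring the answer describes, keeping the question's Python 2 / older-pika style):
import threading, pika

class Threaded_worker(threading.Thread):
    def callback(self, ch, method, properties, body):
        print ' [x] received %r' % (body,)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    def run(self):
        # The connection is created inside run(), i.e. in the thread that consumes from it.
        connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        channel = connection.channel()
        channel.queue_declare(queue='hello')
        channel.basic_qos(prefetch_count=1)
        channel.basic_consume(self.callback, queue='hello')
        print 'start consuming'
        channel.start_consuming()

for _ in range(3):
    td = Threaded_worker()
    td.start()  # no setDaemon(True), so the main thread's exit won't kill the consumers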

Dbus/GLib Main Loop, Background Thread

I'm starting out with DBus and event-driven programming in general. The service that I'm trying to create really consists of three parts, but two are really "server" things:
1) The actual DBus server talks to a remote website over HTTPS, manages sessions, and conveys info to the clients.
2) The other part of the service calls a keepalive page every 2 minutes to keep the session active on the external website.
3) The clients make calls to the service to retrieve info from the service.
I found some simple example programs and I'm trying to adapt them to prototype #1 and #2. Rather than building separate programs for both, I thought that I could run them in a single, two-threaded process.
The problem I'm seeing is that I call time.sleep(X) in my keepalive thread. The thread goes to sleep, but never wakes up. I think the GIL isn't being released by the GLib main loop.
Here's my thread code:
class Keepalive(threading.Thread):
    def __init__(self, interval=60):
        super(Keepalive, self).__init__()
        self.interval = interval
        bus = dbus.SessionBus()
        self.remote = bus.get_object("com.example.SampleService", "/SomeObject")

    def run(self):
        while True:
            print('sleep %i' % self.interval)
            time.sleep(self.interval)
            print('sleep done')
            reply_status = self.remote.keepalive()
            if reply_status:
                print('Keepalive: Success')
            else:
                print('Keepalive: Failure')
From the print statements, I know that the sleep starts, but I never see "sleep done."
Here is the main code:
if __name__ == '__main__':
    try:
        dbus.mainloop.glib.DBusGMainLoop(set_as_default=True)
        session_bus = dbus.SessionBus()
        name = dbus.service.BusName("com.example.SampleService", session_bus)
        object = SomeObject(session_bus, '/SomeObject')

        mainloop = gobject.MainLoop()

        ka = Keepalive(15)
        ka.start()

        print('Begin main loop')
        mainloop.run()
    except Exception as e:
        print(e)
    finally:
        ka.join()
Some other observations:
I see the "begin main loop" message, so I know it's getting control. Then, I see "sleep %i," and after that, nothing.
If I ^C, then I see "sleep done." After ~20 seconds, I get an exception from self.run() that the remote application didn't respond:
DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
What's the best way to run my keep alive code within the server?
Thanks,
You have to explicitly enable multithreading when using gobject by calling gobject.threads_init(). See the PyGTK FAQ for background info.
Besides that, for the purpose you're describing, a GLib timeout seems to be a better fit. Use it as follows:
# Enable timer
self.timer = gobject.timeout_add(time_in_ms, self.remote.keepalive)
# Disable timer
gobject.source_remove(self.timer)
This calls the keepalive function every time_in_ms milliseconds. Further details, again, can be found in the PyGTK reference.
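One caveat worth adding (a general GLib behaviour, not something stated in the original answer): a timeout callback is rescheduled only while it returns a true value, so if self.remote.keepalive() can return False the timer would silently stop after that call. A small wrapper, assumed to live in the same method scope as the snippet above, keeps it running regardless:
def keepalive_tick():
    reply_status = self.remote.keepalive()
    print('Keepalive: %s' % ('Success' if reply_status else 'Failure'))
    return True  # returning True keeps the timeout scheduled

# Enable timer
self.timer = gobject.timeout_add(time_in_ms, keepalive_tick)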
