The request handlers are as follows:
class TestHandler(tornado.web.RequestHandler):  # localhost:8888/test
    @tornado.web.asynchronous
    def get(self):
        t = threading.Thread(target=self.newThread)
        t.start()

    def newThread(self):
        print "new thread called, sleeping"
        time.sleep(10)
        self.write("Awake after 10 seconds!")
        self.finish()

class IndexHandler(tornado.web.RequestHandler):  # localhost:8888/
    def get(self):
        self.write("It is not blocked!")
        self.finish()
When I GET localhost:8888/test, the page loads for 10 seconds and then shows "Awake after 10 seconds!". While it is loading, if I open localhost:8888/index in a new browser tab, the index page is not blocked and loads instantly. This matches my expectation.
However, while /test is loading, if I open another /test in a new browser tab, it is blocked. The second /test only starts processing after the first has finished.
What mistakes have I made here?
What you are seeing is actually a browser limitation, not an issue with your code. I added some extra logging to your TestHandler to make this clear:
class TestHandler(tornado.web.RequestHandler):  # localhost:8888/test
    @tornado.web.asynchronous
    def get(self):
        print "Thread starting %s" % time.time()
        t = threading.Thread(target=self.newThread)
        t.start()

    def newThread(self):
        print "new thread called, sleeping %s" % time.time()
        time.sleep(10)
        self.write("Awake after 10 seconds! %s" % time.time())
        self.finish()
If I open two curl sessions to localhost/test simultaneously, I get this on the server side:
Thread starting 1402236952.17
new thread called, sleeping 1402236952.17
Thread starting 1402236953.21
new thread called, sleeping 1402236953.21
And this on the client side:
Awake after 10 seconds! 1402236962.18
Awake after 10 seconds! 1402236963.22
Which is exactly what you expect. However in Chromium, I get the same behavior as you. I think that Chromium (perhaps all browsers) will only allow one connection at a time to be opened to the same URL. I confirmed this by making IndexHandler run the same code as TestHandler, except with slightly different log messages. Here's the output when opening two browser windows, one to /test, and one to /index:
index Thread starting 1402237590.03
index new thread called, sleeping 1402237590.03
Thread starting 1402237592.19
new thread called, sleeping 1402237592.19
As you can see both ran concurrently without issue.
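To take the browser out of the picture without reaching for curl, the same check can be scripted. The following is only a sketch under stated assumptions: it uses the standard library's threaded HTTP server as a stand-in for the Tornado app (SlowHandler, fetch and time_concurrent_gets are hypothetical names, and the delay is shortened to 0.5 seconds), and it times several simultaneous GETs of the same URL.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Stand-in for TestHandler: every GET sleeps before answering."""
    delay = 0.5  # seconds; the question uses 10

    def do_GET(self):
        time.sleep(self.delay)
        body = b"Awake!"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(port, path, results, index):
    # http.client talks to the server directly, with no browser in between
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    conn.request("GET", path)
    results[index] = conn.getresponse().read()
    conn.close()


def time_concurrent_gets(n=2, path="/test"):
    """Return (elapsed_seconds, bodies) for n simultaneous GETs of the same URL."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = [None] * n
    clients = [threading.Thread(target=fetch, args=(port, path, results, i))
               for i in range(n)]
    start = time.time()
    for client in clients:
        client.start()
    for client in clients:
        client.join()
    elapsed = time.time() - start
    server.shutdown()
    server.server_close()
    return elapsed, results
```

If the server handles the requests concurrently, the elapsed time stays close to one delay rather than n times the delay. To aim the same client code at the real Tornado app, point fetch at port 8888 instead of starting the stand-in server.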
I think you picked the "wrong" test for checking parallel GET requests, because it relies on a blocking function, time.sleep(), whose behavior doesn't occur when you simply render an HTML page.
What happens is that def get() (which handles all GET requests) is blocked while time.sleep() runs: it cannot process any new GET requests and puts them in some kind of "queue".
So if you really want to test with a sleep, use Tornado's non-blocking function tornado.gen.sleep().
Example:
from tornado import gen

@gen.coroutine
def get(self):
    yield self.time_wait()

@gen.coroutine
def time_wait(self):
    yield gen.sleep(15)
    self.write("done")
Open multiple tabs in your browser and you'll see that all requests are processed as they arrive, without "queueing" the new requests that come in.
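The effect of a non-blocking sleep can also be seen without a web server. This is a rough sketch that swaps in the standard library's asyncio for Tornado (handler, serve_two and run_demo are made-up names): two "requests" each wait for the same delay, and because the wait hands control back to the event loop, both finish after roughly one delay rather than two.

```python
import asyncio
import time


async def handler(name, delay):
    # await hands control back to the event loop while this "request" waits
    await asyncio.sleep(delay)
    return name


async def serve_two(delay):
    start = time.monotonic()
    results = await asyncio.gather(handler("first", delay),
                                   handler("second", delay))
    return results, time.monotonic() - start


def run_demo(delay=0.3):
    """Return (results, elapsed_seconds) for two overlapping waits."""
    return asyncio.run(serve_two(delay))
```

Replacing asyncio.sleep(delay) with time.sleep(delay) inside handler would stall the whole loop, and the elapsed time would grow to about twice the delay.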
Related
I am somewhat new to Python. I have looked around but cannot find an answer that fits exactly what I am looking for.
I have a function that makes an HTTP call using the requests package. I'd like to print a '.' to the screen (or any char), say every 10 seconds, while the HTTP request executes, and stop printing when it finishes. So something like:
def make_call():
    rsp = requests.post(url, data=file)
    # while requests.post is executing: print('.')
Of course the above code is just pseudo code but hopefully illustrates what I am hoping to accomplish.
Every function call from the requests module is blocking, so your program waits until the function returns a value. The simplest solution is to use the built-in threading library, which was already suggested. This module gives you code "parallelism"*. In your example you need one thread for the request, which will be blocked until the request finishes, and another for printing.
If you want to learn more about more advanced solutions see this answer https://stackoverflow.com/a/14246030/17726897
Here's how you can achieve your desired functionality using the threading module
import threading
from time import sleep

import requests

def print_function(stop_event):
    # print a dot every 10 seconds until asked to stop
    while not stop_event.is_set():
        print(".")
        sleep(10)

should_stop = threading.Event()
thread = threading.Thread(target=print_function, args=[should_stop])
thread.start()  # start the printing before the request
rsp = requests.post(url, data=file)  # start the request, which blocks the main thread but not the printing thread
should_stop.set()  # request finished, signal the printing thread to stop
thread.join()  # wait for the thread to stop
# handle response
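* "Parallelism" is in quotes because of CPython's Global Interpreter Lock (GIL): Python code from different threads isn't executed at the same time. A thread that is blocked on I/O, such as the HTTP request here, releases the GIL, so the printing thread still gets to run.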
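One thing to be aware of with a sleep(10) inside the printing loop is that thread.join() may have to wait out the remainder of that sleep after the request has already returned. A possible variation, sketched here with a hypothetical helper named call_with_dots, waits on the event itself, so the printing thread wakes as soon as it is told to stop. Note that this version prints its first dot after one interval rather than immediately.

```python
import threading
import time


def call_with_dots(blocking_call, interval=10.0, write=print):
    """Run blocking_call() while a helper thread emits a dot every `interval` seconds."""
    stop_event = threading.Event()

    def dots():
        # wait() returns False on timeout and True as soon as the event is set
        while not stop_event.wait(interval):
            write(".")

    printer = threading.Thread(target=dots)
    printer.start()
    try:
        return blocking_call()
    finally:
        stop_event.set()  # wakes the printer immediately
        printer.join()
```

With requests this might be called as call_with_dots(lambda: requests.post(url, data=file)); the write parameter only exists to make the output easy to redirect.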
I don't quite understand what you are looking for, but if you want two things processed at the same time you can use the threading module.
Example:
import threading
import requests
from time import sleep

def make_Request():
    while True:
        req = requests.post(url, data=file)
        sleep(10)

makeRequestThread = threading.Thread(target=make_Request)
makeRequestThread.start()
while True:
    print("I used multi threading to do two tasks at a same time")
    sleep(10)
Or you can use the very simple schedule module to schedule your tasks in an easy way.
docs: https://schedule.readthedocs.io/en/stable/#when-not-to-use-schedule
import threading
import requests
from time import sleep

#### Function print and stop when answer comes ###
def Print_Every_10_seconds(stop_event):
    while not stop_event.is_set():
        print(".")
        sleep(10)

### Separate flow of execution ###
Stop = threading.Event()
thread = threading.Thread(target=Print_Every_10_seconds, args=[Stop])
### Before the request, the thread starts printing ###
thread.start()
### Blocking of the main thread (the print thread continues) ###
Block_thread_1 = requests.post(url, data=file)
### Thread stops ###
Stop.set()
thread.join()
The below code also solves the problem asked. It will print "POST Data.." and additional trailing '.' every second until the HTTP POST returns.
import concurrent.futures as fp
import logging
from time import sleep

import requests

with fp.ThreadPoolExecutor(max_workers=1) as executor:
    post = executor.submit(requests.post, url, data=fileobj, timeout=20)
    logging.StreamHandler.terminator = ''
    logging.info("POST Data..")
    while not post.done():  # running() is False until the worker picks the task up
        print('.', end='', flush=True)
        sleep(1)
    print('')
    logging.StreamHandler.terminator = '\n'
    http_response = post.result()
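The polling loop can also be folded into a small reusable helper. The sketch below is one way to do it, not the answer's own code: wait_with_progress is a hypothetical name, and it relies on concurrent.futures.wait with a timeout, so it returns as soon as the call finishes instead of sleeping out the rest of a tick.

```python
import concurrent.futures
import time


def wait_with_progress(fn, *args, tick=1.0, write=None, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread, reporting once per `tick` seconds."""
    if write is None:
        write = lambda s: print(s, end="", flush=True)
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(fn, *args, **kwargs)
        while True:
            # Blocks for at most `tick` seconds; returns early once fn has finished
            done, _ = concurrent.futures.wait([future], timeout=tick)
            if done:
                return future.result()  # re-raises any exception from fn
            write(".")
```

For the HTTP case this could be used as http_response = wait_with_progress(requests.post, url, data=fileobj, timeout=20).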
In my Django application, I want to do some work in the background when a certain view is requested. To that end, I created a multiprocessing.dummy.Pool of workers, and whenever that URL is called, I start a new task on it. The task to be executed in the background may need to do some retries with a certain timeout between them.
Since this whole thing is executed, so to speak, not on a UI thread, I thought I'd use sleep for the timeouts. When I unit-test this arrangement, everything works fine. When it runs in Django, however, the thread gets to the sleep statement and never wakes up. When I restart the Django app, the thread gets past the sleep statement and is then immediately killed by the restart. I know I could schedule retries using Timers, but I wanted a simpler solution.
Here's a simplified version of my code:
from multiprocessing.dummy import Pool

POOL = Pool(settings.POOL_WORKERS)

def background_task(arg):
    refresh = True
    try:
        for i in range(settings.GET_RETRY_LIMIT):
            status, result = (arg, refresh=refresh)
            refresh = False
            if status is Statuses.OK:
                return result
            if i < settings.GET_RETRY_LIMIT - 1:
                sleep(settings.GET_SLEEP_TIME)
    except Exception as e:
        logging.error(e)
    return []

def do_background_work(arg):
    POOL.apply_async(
        background_task,
        (arg,)  # args must be a tuple; (arg) without the comma is just arg
    )

def my_view(request):
    arg = get_arg_from_request(request)
    do_background_work(arg)
    return Response("Ok")
UPD: By the way, turns out that the workers are most probably killed by Harakiri
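For reference, the Timer-based alternative mentioned in the question might look roughly like the following. This is a sketch under assumptions: retry_with_timers, task and on_done are hypothetical names, task() is assumed to return an (ok, result) pair, and exceptions are not handled. Whether it fares any better under the server's worker management is not something the sketch can show.

```python
import threading


def retry_with_timers(task, retries, delay, on_done):
    """Call task() up to `retries` times, `delay` seconds apart, without sleeping.

    task() must return (ok, result). on_done receives the result, or [] if
    every attempt failed, mirroring the `return []` fallback in the question.
    """
    def attempt(remaining):
        ok, result = task()
        if ok:
            on_done(result)
        elif remaining > 1:
            # Schedule the next attempt instead of blocking a worker in sleep()
            timer = threading.Timer(delay, attempt, args=(remaining - 1,))
            timer.daemon = True
            timer.start()
        else:
            on_done([])

    attempt(retries)
```

The first attempt runs in the calling thread; each later attempt runs in its own short-lived Timer thread, so no pool worker is held while waiting.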
I wrote a server following this tutorial:
https://twistedmatrix.com/documents/14.0.0/web/howto/web-in-60/asynchronous-deferred.html
But it seems to be good only for delaying a response, not for actually processing 2 or more requests concurrently. My full code is:
from twisted.internet.task import deferLater
from twisted.web.resource import Resource
from twisted.web.server import Site, NOT_DONE_YET
from twisted.internet import reactor, threads
from time import sleep

class DelayedResource(Resource):
    def _delayedRender(self, request):
        print 'Sorry to keep you waiting.'
        request.write("<html><body>Sorry to keep you waiting.</body></html>")
        request.finish()

    def make_delay(self, request):
        print 'Sleeping'
        sleep(5)
        return request

    def render_GET(self, request):
        d = threads.deferToThread(self.make_delay, request)
        d.addCallback(self._delayedRender)
        return NOT_DONE_YET

def main():
    root = Resource()
    root.putChild("social", DelayedResource())
    factory = Site(root)
    reactor.listenTCP(8880, factory)
    print 'started httpserver...'
    reactor.run()

if __name__ == '__main__':
    main()
But when I send 2 requests, the console output looks like:
Sleeping
Sorry to keep you waiting.
Sleeping
Sorry to keep you waiting.
But if it was concurrent it should be like:
Sleeping
Sleeping
Sorry to keep you waiting.
Sorry to keep you waiting.
So the question is: how do I make Twisted not wait until one response is finished before processing the next?
Also, in real life make_delay is a large function with heavy logic. Basically I spawn a lot of threads, make requests to other URLs, and collect the results into the response, so it can take some time and is not easy to port.
Twisted processes everything in one event loop. If something blocks the execution, it also blocks Twisted. So you have to prevent blocking calls.
In your case you have time.sleep(5). It is blocking. You found the better way to do it in Twisted already: deferLater(). It returns a Deferred that will continue execution after the given time and release the event loop so other things can be done meanwhile. In general, all things that return a Deferred are good.
If you have to do heavy work that for some reason cannot be deferred, you should use deferToThread() to execute this work in a thread. See https://twistedmatrix.com/documents/15.5.0/core/howto/threading.html for details.
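The offloading idea is not specific to Twisted. As an illustration only, here is the same pattern with the standard library's asyncio swapped in for Twisted: loop.run_in_executor plays the role of deferToThread, and the names render and two_requests are made up for the sketch. Two overlapping requests log both "Sleeping" lines before either reply, which is the interleaving the question expected.

```python
import asyncio
import time


def make_delay(request_id, seconds):
    # Blocking work; stands in for the question's make_delay
    time.sleep(seconds)
    return request_id


async def render(request_id, seconds, log):
    log.append("Sleeping")
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool; the loop stays free meanwhile
    result = await loop.run_in_executor(None, make_delay, request_id, seconds)
    log.append("Sorry to keep you waiting.")
    return result


async def two_requests(seconds=0.2):
    """Handle two overlapping requests and return the order of the log lines."""
    log = []
    await asyncio.gather(render(1, seconds, log), render(2, seconds, log))
    return log
```

Calling asyncio.run(two_requests()) returns the log lines in the order they were written.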
You can use greenlets in your code (like threads).
You need to install the geventreactor - https://gist.github.com/yann2192/3394661
And use reactor.deferToGreenlet()
Also, your long-running calculation code needs to call gevent.sleep() to switch context to another greenlet.
msecs = 5 * 1000
timeout = 100
for _ in xrange(0, msecs, timeout):
    sleep(timeout / 1000.0)  # time.sleep() takes seconds, not milliseconds
    gevent.sleep()  # yield to other greenlets
This is a long one.
I have a list of usernames and passwords. For each one I want to log in to the account and do some things. I want to use several machines to do this faster. The way I was thinking of doing this is to have a main machine whose job is just having a cron which from time to time checks if the rabbitmq queue is empty. If it is, read the list of usernames and passwords from a file and send it to the rabbitmq queue. Then have a bunch of machines which are subscribed to that queue whose job is receiving a user/pass, doing stuff with it, acknowledging it, and moving on to the next one, until the queue is empty and the main machine fills it up again. So far I think I have everything down.
Now comes my problem. I have checked that the things to be done with each user/passes aren't so intensive and so I could have each machine doing three of them simultaneously using python's threading. In fact for a single machine I have implemented this where I load the user/passes into a python Queue() and then have three threads consume that Queue(). Now I want to do something similar, but instead of consuming from a python Queue(), each thread of each machine should consume from a rabbitmq queue. This is where I'm stuck. To run tests I started by using rabbitmq's tutorial.
send.py:
import pika, sys
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='hello')
message = ' '.join(sys.argv[1:])
channel.basic_publish(exchange='',
routing_key='hello',
body=message)
connection.close()
worker.py
import time, pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='hello')

def callback(ch, method, properties, body):
    print ' [x] received %r' % (body,)
    time.sleep( body.count('.') )
    ch.basic_ack(delivery_tag = method.delivery_tag)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(callback, queue='hello', no_ack=False)
channel.start_consuming()
For the above you can run two worker.py which will subscribe to the rabbitmq queue and consume as expected.
My threading without rabbitmq is something like this:
runit.py
import threading
import Queue

class Threaded_do_stuff(threading.Thread):
    def __init__(self, user_queue):
        threading.Thread.__init__(self)
        self.user_queue = user_queue

    def run(self):
        while True:
            login = self.user_queue.get()
            # "pass" is a reserved word and cannot be used as a keyword argument name
            do_stuff(user=login[0], password=login[1])
            self.user_queue.task_done()

user_queue = Queue.Queue()
for i in range(3):
    td = Threaded_do_stuff(user_queue)
    td.setDaemon(True)
    td.start()

## fill up the queue
for user in list_users:
    user_queue.put(user)

## go!
user_queue.join()
This also works as expected: you fill up the queue and have 3 threads subscribe to it. Now what I want to do is something like runit.py but instead of using a python Queue(), using something like worker.py where the queue is actually a rabbitmq queue.
Here's something which I tried and didn't work (and I don't understand why)
rabbitmq_runit.py
import time, threading, pika

class Threaded_worker(threading.Thread):
    def callback(self, ch, method, properties, body):
        print ' [x] received %r' % (body,)
        time.sleep( body.count('.') )
        ch.basic_ack(delivery_tag = method.delivery_tag)

    def __init__(self):
        threading.Thread.__init__(self)
        self.connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue='hello')
        self.channel.basic_qos(prefetch_count=1)
        self.channel.basic_consume(self.callback, queue='hello')

    def run(self):
        print 'start consuming'
        self.channel.start_consuming()

for _ in range(3):
    print 'launch thread'
    td = Threaded_worker()
    td.setDaemon(True)
    td.start()
I would expect this to launch three threads, each of which is blocked by .start_consuming(), which just stays there waiting for the rabbitmq queue to send it something. Instead, this program starts, does some prints, and exits. The pattern of the output is weird too:
launch thread
launch thread
start consuming
launch thread
start consuming
In particular notice there is one "start consuming" missing.
What's going on?
EDIT: One answer I found to a similar question is here
Consuming a rabbitmq message queue with multiple threads (Python Kombu)
and the answer is to "use celery", whatever that means. I don't buy it, I shouldn't need anything remotely as sophisticated as celery. In particular, I'm not trying to set up an RPC and I don't need to read replies from the do_stuff routines.
EDIT 2: The print pattern that I expected would be the following. I do
python send.py first message......
python send.py second message.
python send.py third message.
python send.py fourth message.
and the print pattern would be
launch thread
start consuming
[x] received 'first message......'
launch thread
start consuming
[x] received 'second message.'
launch thread
start consuming
[x] received 'third message.'
[x] received 'fourth message.'
The problem is that you're making the thread daemonic:
td = Threaded_worker()
td.setDaemon(True) # Shouldn't do that.
td.start()
Daemonic threads will be terminated as soon as the main thread exits:
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left. The initial value is inherited from the creating thread. The
flag can be set through the daemon property.
Leave out setDaemon(True) and you should see it behave the way you expect.
Also, the pika FAQ has a note about how to use it with threads:
Pika does not have any notion of threading in the code. If you want to
use Pika with threading, make sure you have a Pika connection per
thread, created in that thread. It is not safe to share one Pika
connection across threads.
This suggests you should move everything you're doing in __init__() into run(), so that the connection is created in the same thread you're actually consuming from the queue in.
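The shape of that change can be sketched without a broker. In the outline below, Worker, connection_factory and handle are hypothetical names, not part of pika: the thread is left non-daemonic, and the connection is created inside run(), so it is opened by the same thread that uses it.

```python
import threading


class Worker(threading.Thread):
    def __init__(self, connection_factory, handle):
        super().__init__()  # non-daemon: the program stays alive while it runs
        self.connection_factory = connection_factory
        self.handle = handle

    def run(self):
        # Created here, so the connection belongs to the consuming thread
        connection = self.connection_factory()
        self.handle(connection)
```

With pika, connection_factory could be lambda: pika.BlockingConnection(pika.ConnectionParameters('localhost')), and handle would do what the question's __init__ and run do now: open the channel, declare the queue, register the callback, and call start_consuming().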
I'm starting out with DBus and event driven programming in general. The service that I'm trying to create really consists of three parts but two are really "server" things.
1) The actual DBus server talks to a remote website over HTTPS, manages sessions, and conveys info to the clients.
2) The other part of the service calls a keep alive page every 2 minutes to keep the session active on the external website
3) The clients make calls to the service to retrieve info from the service.
I found some simple example programs and am trying to adapt them to prototype #1 and #2. Rather than building separate programs for both, I thought that I could run them in a single, two-threaded process.
The problem that I'm seeing is that I call time.sleep(X) in my keep alive thread. The thread goes to sleep, but won't ever wake up. I think that the GIL isn't released by the GLib main loop.
Here's my thread code:
class Keepalive(threading.Thread):
    def __init__(self, interval=60):
        super(Keepalive, self).__init__()
        self.interval = interval
        bus = dbus.SessionBus()
        self.remote = bus.get_object("com.example.SampleService", "/SomeObject")

    def run(self):
        while True:
            print('sleep %i' % self.interval)
            time.sleep(self.interval)
            print('sleep done')
            reply_status = self.remote.keepalive()
            if reply_status:
                print('Keepalive: Success')
            else:
                print('Keepalive: Failure')
From the print statements, I know that the sleep starts, but I never see "sleep done."
Here is the main code:
if __name__ == '__main__':
    try:
        dbus.mainloop.glib.DBusGMainLoop(set_as_default=True)
        session_bus = dbus.SessionBus()
        name = dbus.service.BusName("com.example.SampleService", session_bus)
        object = SomeObject(session_bus, '/SomeObject')
        mainloop = gobject.MainLoop()
        ka = Keepalive(15)
        ka.start()
        print('Begin main loop')
        mainloop.run()
    except Exception as e:
        print(e)
    finally:
        ka.join()
Some other observations:
I see the "begin main loop" message, so I know it's getting control. Then, I see "sleep %i," and after that, nothing.
If I ^C, then I see "sleep done." After ~20 seconds, I get an exception from self.run() that the remote application didn't respond:
DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
What's the best way to run my keep alive code within the server?
Thanks,
You have to explicitly enable multithreading when using gobject by calling gobject.threads_init(). See the PyGTK FAQ for background info.
Next to that, for the purpose you're describing, timeouts seem to be a better fit. Use as follows:
# Enable timer
self.timer = gobject.timeout_add(time_in_ms, self.remote.keepalive)
# Disable timer
gobject.source_remove(self.timer)
This calls the keepalive function every time_in_ms milliseconds, for as long as the callback returns a true value. Further details, again, can be found in the PyGTK reference.
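One detail worth guarding against: a GLib timeout source is removed once its callback returns a false value, so a keepalive() that reports failure with a falsy status would silently stop the timer. A small wrapper can keep it running. This is a sketch only; repeating is a hypothetical helper, and the log messages are placeholders.

```python
import logging


def repeating(callback):
    """Wrap callback so a GLib timeout keeps firing whatever the callback returns."""
    def tick():
        try:
            if not callback():
                logging.warning("Keepalive: Failure")
        except Exception:
            logging.exception("Keepalive raised")
        return True  # a falsy return value would remove the timeout source
    return tick
```

It would be registered as self.timer = gobject.timeout_add(time_in_ms, repeating(self.remote.keepalive)).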