This is a long one.
I have a list of usernames and passwords. For each one I want to log in to the account and do some things. I want to use several machines to do this faster. The way I was thinking of doing this is to have a main machine whose only job is to run a cron job which from time to time checks whether the rabbitmq queue is empty. If it is, it reads the list of usernames and passwords from a file and sends them to the rabbitmq queue. Then I have a bunch of machines subscribed to that queue whose job is to receive a user/pass, do stuff with it, acknowledge it, and move on to the next one, until the queue is empty, at which point the main machine fills it up again. So far I think I have everything down.
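For reference, the refill job on the main machine could be a short script along these lines (a minimal sketch; the 'logins' queue name and logins.txt file are made-up placeholders, and the declare-ok reply's message_count is used to test for emptiness):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
# declaring is idempotent; the declare-ok reply carries the current queue depth
declare_ok = channel.queue_declare(queue='logins')

if declare_ok.method.message_count == 0:
    # queue has been drained: refill it, one message per user/pass line
    with open('logins.txt') as f:
        for line in f:
            channel.basic_publish(exchange='',
                                  routing_key='logins',
                                  body=line.strip())
connection.close()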
Now comes my problem. I have checked that the work to be done for each user/pass isn't very intensive, so each machine could handle three of them simultaneously using python's threading. In fact, for a single machine I have implemented this: I load the user/passes into a python Queue() and then have three threads consume that Queue(). Now I want to do something similar, but instead of consuming from a python Queue(), each thread of each machine should consume from a rabbitmq queue. This is where I'm stuck. To run tests I started from rabbitmq's tutorial.
send.py:
import pika, sys

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='hello')

message = ' '.join(sys.argv[1:])
channel.basic_publish(exchange='',
                      routing_key='hello',
                      body=message)
connection.close()
worker.py:

import time, pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='hello')

def callback(ch, method, properties, body):
    print ' [x] received %r' % (body,)
    time.sleep(body.count('.'))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(callback, queue='hello', no_ack=False)
channel.start_consuming()
For the above you can run two copies of worker.py, which will both subscribe to the rabbitmq queue and consume as expected.
My threading without rabbitmq is something like this:
runit.py:

import threading, Queue

# do_stuff() and list_users are defined elsewhere
class Threaded_do_stuff(threading.Thread):
    def __init__(self, user_queue):
        threading.Thread.__init__(self)
        self.user_queue = user_queue

    def run(self):
        while True:
            login = self.user_queue.get()
            # 'pass' is a reserved word in Python, hence 'passwd' here
            do_stuff(user=login[0], passwd=login[1])
            self.user_queue.task_done()

user_queue = Queue.Queue()
for i in range(3):
    td = Threaded_do_stuff(user_queue)
    td.setDaemon(True)
    td.start()

## fill up the queue
for user in list_users:
    user_queue.put(user)

## go!
user_queue.join()
This also works as expected: you fill up the queue and have 3 threads subscribe to it. Now what I want to do is something like runit.py, but instead of using a python Queue(), use something like worker.py where the queue is actually a rabbitmq queue.
Here's something I tried which didn't work (and I don't understand why):
rabbitmq_runit.py:

import time, threading, pika

class Threaded_worker(threading.Thread):
    def callback(self, ch, method, properties, body):
        print ' [x] received %r' % (body,)
        time.sleep(body.count('.'))
        ch.basic_ack(delivery_tag=method.delivery_tag)

    def __init__(self):
        threading.Thread.__init__(self)
        self.connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue='hello')
        self.channel.basic_qos(prefetch_count=1)
        self.channel.basic_consume(self.callback, queue='hello')

    def run(self):
        print 'start consuming'
        self.channel.start_consuming()

for _ in range(3):
    print 'launch thread'
    td = Threaded_worker()
    td.setDaemon(True)
    td.start()
I would expect this to launch three threads, each of which is blocked by .start_consuming(), which just stays there waiting for the rabbitmq queue to send them something. Instead, the program starts, does some prints, and exits. The pattern of the output is weird too:
launch thread
launch thread
start consuming
launch thread
start consuming
In particular, notice that one "start consuming" is missing.
What's going on?
EDIT: One answer I found to a similar question is here
Consuming a rabbitmq message queue with multiple threads (Python Kombu)
and the answer is to "use celery", whatever that means. I don't buy it; I shouldn't need anything remotely as sophisticated as celery. In particular, I'm not trying to set up RPC, and I don't need to read replies from the do_stuff routines.
EDIT 2: The print pattern that I expected is the following. I run
python send.py first message......
python send.py second message.
python send.py third message.
python send.py fourth message.
and the print pattern would be
launch thread
start consuming
[x] received 'first message......'
launch thread
start consuming
[x] received 'second message.'
launch thread
start consuming
[x] received 'third message.'
[x] received 'fourth message.'
The problem is that you're making the thread daemonic:
td = Threaded_worker()
td.setDaemon(True) # Shouldn't do that.
td.start()
Daemonic threads will be terminated as soon as the main thread exits:
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left. The initial value is inherited from the creating thread. The
flag can be set through the daemon property.
Leave out setDaemon(True) and you should see it behave the way you expect.
Also, the pika FAQ has a note about how to use it with threads:
Pika does not have any notion of threading in the code. If you want to
use Pika with threading, make sure you have a Pika connection per
thread, created in that thread. It is not safe to share one Pika
connection across threads.
This suggests you should move everything you're doing in __init__() into run(), so that the connection is created in the same thread you're actually consuming from the queue in.
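For example, here is a minimal sketch of the worker rewritten that way (same queue and callback as above; the connection is created inside run(), and the daemon flag is left out):

import time, threading, pika

class Threaded_worker(threading.Thread):
    def run(self):
        # create the connection in the thread that uses it,
        # as the pika FAQ recommends
        connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        channel = connection.channel()
        channel.queue_declare(queue='hello')
        channel.basic_qos(prefetch_count=1)
        channel.basic_consume(self.callback, queue='hello')
        print 'start consuming'
        channel.start_consuming()

    def callback(self, ch, method, properties, body):
        print ' [x] received %r' % (body,)
        time.sleep(body.count('.'))
        ch.basic_ack(delivery_tag=method.delivery_tag)

for _ in range(3):
    print 'launch thread'
    Threaded_worker().start()  # non-daemonic, so the program keeps running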
Related
I am developing a server (daemon).
The server has one "worker thread". The worker thread runs a queue of commands. When the queue is empty, the worker thread is paused (but does not exit, because it should preserve certain state in memory). To keep exactly one copy of the state in memory, I need exactly one worker thread (not several, and not zero) running at all times.
Requests are added to the end of this queue when a client connects to a Unix socket and sends a command.
After the command is issued, it is added to the queue of commands of the worker thread. Once it is added to the queue, the server replies with something like "OK". There should not be a long pause between the server receiving a command and its "OK" reply. However, running the commands in the queue may take some time.
The main "work" of the worker thread is split into small (taking relatively little time) chunks. Between chunks, the worker thread inspects ("eats" and empties) the queue and continues to work based on the data extracted from the queue.
How to implement this server/daemon in Python?
This is sample code with internet sockets, easily replaced with unix domain sockets. It takes whatever you write to the socket, passes it as a "command" to the worker, and responds OK as soon as it has queued the command. The single worker simulates a lengthy task split into one-second chunks, and checks the queue for new commands between chunks. You can queue as many tasks as you want: you receive OK immediately, and the worker prints the queued commands as it gets to them.
import Queue, threading, socket
from time import sleep

class worker(threading.Thread):
    def __init__(self, q):
        super(worker, self).__init__()
        self.qu = q

    def run(self):
        while True:
            # block until a task arrives
            new_task = self.qu.get(True)
            print new_task
            i = 0
            while i < 10:
                print "working ..."
                sleep(1)
                i += 1
                try:
                    # between chunks, peek at the queue without blocking
                    another_task = self.qu.get(False)
                    print another_task
                except Queue.Empty:
                    pass

task_queue = Queue.Queue()
w = worker(task_queue)
w.daemon = True
w.start()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('localhost', 4200))
sock.listen(1)
try:
    while True:
        conn, addr = sock.accept()
        data = conn.recv(32)
        task_queue.put(data)
        conn.sendall("OK")
        conn.close()
except:
    sock.close()
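To try it out, you can send a command from another shell; a minimal client sketch (port 4200 matches the server above, the command string is arbitrary):

import socket

# send one command to the server and read its reply
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('localhost', 4200))
s.sendall("some command")
print s.recv(32)  # prints "OK" immediately, even while the worker is busy
s.close()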
I have this piece of code; basically it runs channel.start_consuming().
I want it to stop after a while.
I think that channel.stop_consuming() is the right method:
    def stop_consuming(self, consumer_tag=None):
        """ Cancels all consumers, signalling the `start_consuming` loop to
        exit.
But it doesn't work: start_consuming() never ends (execution doesn't exit from this call, "end" is never printed).
import unittest
import pika
import threading
import time

_url = "amqp://user:password@xxx.rabbitserver.com/aaa"

class Consumer_test(unittest.TestCase):

    def test_startConsuming(self):
        def callback(channel, method, properties, body):
            print("callback")
            print(body)

        def connectionTimeoutCallback():
            print("connectionTimeoutCallback")

        def _closeChannel(channel_):
            print("_closeChannel")
            time.sleep(1)
            print("close")
            if channel_.is_open:
                channel_.stop_consuming()
                print("stop_consuming")
            else:
                print("channel is closed")
            #channel_.close()

        params = pika.URLParameters(_url)
        params.socket_timeout = 5
        connection = pika.BlockingConnection(params)
        #connection.add_timeout(2, connectionTimeoutCallback)
        channel = connection.channel()
        channel.basic_consume(callback,
                              queue='test',
                              no_ack=True)

        t = threading.Thread(target=_closeChannel, args=[channel])
        t.start()

        print("start_consuming")
        channel.start_consuming()  # start consuming (loop never ends)
        connection.close()
        print("end")
connection.add_timeout solves my problem, and maybe calling basic_cancel would too, but I want to use the right method.
Thanks
Note:
I can't respond or add a comment to this (pika, stop_consuming does not work) due to my low reputation points.
Note 2:
I think I'm not sharing the channel or connection across threads (pika doesn't support this), because I use "channel_" passed as a parameter and not the "channel" instance of the class. (Am I wrong?)
I was having the same problem: pika is not thread-safe, i.e. connections and channels can't be safely shared across threads.
So I used a separate connection to send a shutdown message, then stopped consuming on the original channel from the callback function.
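A minimal sketch of that pattern (the 'shutdown' sentinel body and the 'test' queue name are made up for illustration; the publishing side uses its own connection, created and used in its own thread):

import threading, time, pika

def callback(ch, method, properties, body):
    if body == 'shutdown':
        # sentinel received: leave the start_consuming() loop
        ch.stop_consuming()
    else:
        print(body)

def send_shutdown():
    # a *separate* connection just for the shutdown message
    time.sleep(1)
    conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    conn.channel().basic_publish(exchange='', routing_key='test',
                                 body='shutdown')
    conn.close()

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='test')
channel.basic_consume(callback, queue='test', no_ack=True)

threading.Thread(target=send_shutdown).start()
channel.start_consuming()  # returns once the sentinel is consumed
connection.close()
print("end")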
I want to consume a queue (RabbitMQ) synchronously with blocking.
Note: below is full code ready to be run.
The system is set up using RabbitMQ as its queuing system, but asynchronous consumption is not needed in one of our modules.
I've tried using basic_get on top of a BlockingConnection, which doesn't block (returns (None, None, None) immediately):
# declare queue
get_connection().channel().queue_declare(TEST_QUEUE)

def blocking_get_1():
    channel = get_connection().channel()
    # get from an empty queue (prints immediately)
    print channel.basic_get(TEST_QUEUE)
I've also tried using the consume generator, which fails with "Connection Closed" after a long time of not consuming.
def blocking_get_2():
    channel = get_connection().channel()
    # put messages in TEST_QUEUE
    for i in range(4):
        channel.basic_publish(
            '',
            TEST_QUEUE,
            'body %d' % i
        )
    consume_generator = channel.consume(TEST_QUEUE)
    print next(consume_generator)
    time.sleep(14400)
    print next(consume_generator)
Is there a way to use RabbitMQ through the pika client the way I would use a Queue.Queue in python, or anything similar?
My option at the moment is to busy-wait (using basic_get), but I'd rather use the existing system without busy-waiting, if possible.
Full code:
#!/usr/bin/env python
import pika
import time

TEST_QUEUE = 'test'

def get_connection():
    # define connection
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(
            host=YOUR_IP,
            port=YOUR_PORT,
            credentials=pika.PlainCredentials(
                username=YOUR_USER,
                password=YOUR_PASSWORD,
            )
        )
    )
    return connection

# declare queue
get_connection().channel().queue_declare(TEST_QUEUE)

def blocking_get_1():
    channel = get_connection().channel()
    # get from an empty queue (prints immediately)
    print channel.basic_get(TEST_QUEUE)

def blocking_get_2():
    channel = get_connection().channel()
    # put messages in TEST_QUEUE
    for i in range(4):
        channel.basic_publish(
            '',
            TEST_QUEUE,
            'body %d' % i
        )
    consume_generator = channel.consume(TEST_QUEUE)
    print next(consume_generator)
    time.sleep(14400)
    print next(consume_generator)

print "blocking_get_1"
blocking_get_1()

print "blocking_get_2"
blocking_get_2()

get_connection().channel().queue_delete(TEST_QUEUE)
A common problem with Pika is that it currently does not handle incoming events in the background. This basically means that in many scenarios you will need to call connection.process_data_events() periodically to ensure that it does not miss heartbeats.
This also means that if you sleep for an extended period of time, pika will not handle incoming data and will eventually die, as it is not responding to heartbeats. An option here is to disable heartbeats.
I usually solve this by having a thread in the background check for new events, as seen in this example.
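As a rough illustration of that idea with plain pika (a sketch, assuming a BlockingConnection as in the question): instead of one long sleep, break the wait into short steps and call process_data_events() between them so heartbeats keep being serviced:

import time

def keep_alive_sleep(connection, duration, interval=10):
    # sleep in short steps, servicing the connection between steps
    deadline = time.time() + duration
    while time.time() < deadline:
        connection.process_data_events()
        time.sleep(interval)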
If you want to block completely I would do something like this (based on my own library AMQPStorm).
while True:
    message = channel.basic.get(queue='simple_queue', no_ack=False)
    if message:
        print("Message:", message.body)
        message.ack()
    else:
        print("Channel Empty.")
        sleep(1)
This is based on the example found here.
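For comparison, the equivalent busy-wait loop with plain pika's basic_get (a sketch reusing get_connection() and TEST_QUEUE from the question; this is essentially the polling option mentioned above):

import time

channel = get_connection().channel()
while True:
    method, properties, body = channel.basic_get(TEST_QUEUE)
    if method:
        print "Message:", body
        channel.basic_ack(method.delivery_tag)
    else:
        print "Channel Empty."
        time.sleep(1)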
I'm starting out with DBus and event-driven programming in general. The service that I'm trying to create really consists of three parts, but two are really "server" things.
1) The actual DBus server talks to a remote website over HTTPS, manages sessions, and conveys info to the clients.
2) The other part of the service calls a keep alive page every 2 minutes to keep the session active on the external website
3) The clients make calls to the service to retrieve info from the service.
I found some simple example programs and I'm trying to adapt them to prototype #1 and #2. Rather than building separate programs for both, I thought I could run them in a single, two-threaded process.
The problem that I'm seeing is that I call time.sleep(X) in my keep alive thread. The thread goes to sleep, but won't ever wake up. I think that the GIL isn't released by the GLib main loop.
Here's my thread code:
class Keepalive(threading.Thread):
    def __init__(self, interval=60):
        super(Keepalive, self).__init__()
        self.interval = interval
        bus = dbus.SessionBus()
        self.remote = bus.get_object("com.example.SampleService", "/SomeObject")

    def run(self):
        while True:
            print('sleep %i' % self.interval)
            time.sleep(self.interval)
            print('sleep done')
            reply_status = self.remote.keepalive()
            if reply_status:
                print('Keepalive: Success')
            else:
                print('Keepalive: Failure')
From the print statements, I know that the sleep starts, but I never see "sleep done."
Here is the main code:
if __name__ == '__main__':
    try:
        dbus.mainloop.glib.DBusGMainLoop(set_as_default=True)
        session_bus = dbus.SessionBus()
        name = dbus.service.BusName("com.example.SampleService", session_bus)
        object = SomeObject(session_bus, '/SomeObject')

        mainloop = gobject.MainLoop()

        ka = Keepalive(15)
        ka.start()

        print('Begin main loop')
        mainloop.run()
    except Exception as e:
        print(e)
    finally:
        ka.join()
Some other observations:
I see the "begin main loop" message, so I know it's getting control. Then, I see "sleep %i," and after that, nothing.
If I ^C, then I see "sleep done." After ~20 seconds, I get an exception from self.run() that the remote application didn't respond:
DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
What's the best way to run my keep alive code within the server?
Thanks,
You have to explicitly enable multithreading when using gobject by calling gobject.threads_init(). See the PyGTK FAQ for background info.
Besides that, for the purpose you're describing, timeouts seem to be a better fit. Use them as follows:
# Enable timer
self.timer = gobject.timeout_add(time_in_ms, self.remote.keepalive)
# Disable timer
gobject.source_remove(self.timer)
This calls the keepalive function every time_in_ms milliseconds; note that the callback must keep returning True, because returning a falsy value removes the timeout. Further details, again, can be found in the PyGTK reference.
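Since self.remote.keepalive() presumably returns the reply status rather than True, a small wrapper makes the repeat behavior explicit (a sketch using the names from the question):

def _do_keepalive(self):
    # call the remote keepalive and log the result
    if self.remote.keepalive():
        print('Keepalive: Success')
    else:
        print('Keepalive: Failure')
    return True  # True means "call me again in time_in_ms"

# enable the timer with the wrapper instead of the raw proxy method
self.timer = gobject.timeout_add(time_in_ms, self._do_keepalive)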
import threading
import Queue
import urllib2
import time

class ThreadURL(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            host = self.queue.get()
            sock = urllib2.urlopen(host)
            data = sock.read()
            self.queue.task_done()

hosts = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.facebook.com', 'http://stackoverflow.com']
start = time.time()

def main():
    queue = Queue.Queue()
    for i in range(len(hosts)):
        t = ThreadURL(queue)
        t.start()
    for host in hosts:
        queue.put(host)
    queue.join()

if __name__ == '__main__':
    main()
    print 'Elapsed time: {0}'.format(time.time() - start)
I've been trying to get my head around how to perform Threading and after a few tutorials, I've come up with the above.
What it's supposed to do is:
Initialise the queue
Create my Thread pool and then queue up the list of hosts
My ThreadURL class should then begin work once a host is in the queue and read the website data
The program should finish
What I want to know first off is, am I doing this correctly? Is this the best way to handle threads?
Secondly, my program fails to exit. It prints out the Elapsed time line and then hangs there. I have to kill my terminal for it to go away. I'm assuming this is due to my incorrect use of queue.join()?
Your code looks fine and is quite clean.
The reason your application still "hangs" is that the worker threads are still running, waiting for the main application to put something in the queue, even though your main thread is finished.
The simplest way to fix this is to mark the threads as daemons, by doing t.daemon = True before your call to start. This way, the threads will not block the program stopping.
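Concretely, that is a one-line change in main() (a sketch of the loop from the question):

def main():
    queue = Queue.Queue()
    for i in range(len(hosts)):
        t = ThreadURL(queue)
        t.daemon = True  # let the program exit once the main thread is done
        t.start()
    for host in hosts:
        queue.put(host)
    queue.join()  # returns once every host has been marked task_done()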
Looks fine. Yann is right about the daemon suggestion; that will fix your hang. My only question is: why use the queue at all? You're not doing any cross-thread communication, so it seems like you could just pass the host info as an arg to ThreadURL's __init__() and drop the queue.
Nothing wrong with it, just wondering.
One thing: in the thread's run function, inside the while True loop, if some exception happens, task_done() may not be called even though get() has already been called. Thus queue.join() may never return.
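A guarded version of run() (a sketch) keeps every get() paired with a task_done():

def run(self):
    while True:
        host = self.queue.get()
        try:
            sock = urllib2.urlopen(host)
            data = sock.read()
        except urllib2.URLError:
            pass  # this sketch swallows fetch errors; log them in real code
        finally:
            # always mark the item done, or queue.join() may hang forever
            self.queue.task_done()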