I have the following code that launches multiple Python processes that continually poll an SQS queue.
The processes are launched with
num_processes = range(1, 9)
for p_num in num_processes:
    p = multiprocessing.Process(
        target=sqs_polling, args=(queue_name, p_num,))
    p.start()
and the actual polling function is
def sqs_polling(queue_name, process_id):
    sqs = boto3.resource('sqs', region_name='us-east-1')
    queue = sqs.get_queue_by_name(QueueName=queue_name)
    no_messages = False
    # poll sqs forever
    while 1:
        # polling delay so aws does not throttle us
        sleep(2.0)
        # sleep longer if there were no messages on the queue the last time it was polled
        if no_messages:
            sleep(900.0)
        message_batch = queue.receive_messages(MaxNumberOfMessages=10, WaitTimeSeconds=20)
        if len(message_batch) == 0:
            no_messages = True
        else:
            no_messages = False
        # process messages
        for message in message_batch:
            do_something(message)
            message.delete()
This seems to work for a few hours, but eventually it seems as though SQS throttles the processes and no messages can be read even though they exist on the queue. To help reduce this I added a 2-second delay between queue reads, and a 15-minute sleep whenever a poll returns no messages. In spite of this I still get throttled. Can anyone explain why throttling is still occurring here? Another possibility might be that the connection to the queue goes stale, but I think that is unlikely.
The question is a bit outdated, but I just released multi_sqs_listener, which provides a high-level, multi-threaded way to listen to multiple SQS queues from Python code.
import time
from multi_sqs_listener import QueueConfig, EventBus, MultiSQSListener

class MyListener(MultiSQSListener):
    def low_priority_job(self, message):
        print('Starting low priority, long job: {}'.format(message))
        time.sleep(5)
        print('Ended low priority job: {}'.format(message))

    def high_priority_job(self, message):
        print('Starting high priority, quick job: {}'.format(message))
        time.sleep(.2)
        print('Ended high priority job: {}'.format(message))

    def handle_message(self, queue, bus, priority, message):
        if bus == 'high-priority-bus':
            self.high_priority_job(message.body)
        else:
            self.low_priority_job(message.body)

low_priority_bus = EventBus('low-priority-bus', priority=1)
high_priority_bus = EventBus('high-priority-bus', priority=5)
EventBus.register_buses([low_priority_bus, high_priority_bus])
low_priority_queue = QueueConfig('low-priority-queue', low_priority_bus)
high_priority_queue = QueueConfig('high-priority-queue', high_priority_bus)
my_listener = MyListener([low_priority_queue, high_priority_queue])
my_listener.listen()
Related
Scenario:
1. A sensor continuously sends data at an interval of 100 milliseconds (the interval needs to be configurable).
2. Thread 1 continuously reads the data from the sensor and writes it to a common queue.
3. This process runs continuously until a keyboard interrupt happens.
4. Thread 2 locks the queue (it may momentarily block Thread 1).
5. Thread 2 reads all the data from the queue into a temporary structure.
6. Thread 2 releases the queue.
7. Thread 2 processes the data in the temporary structure. This is a computational task; while it is being performed, Thread 1 should keep on filling the queue with sensor data.
I have read about threading and the GIL, so in step 7 there must be no loss of the data sent by the sensor while thread 2 performs its computational process(). How can this be implemented in Python?
What I started with is
from threading import Thread
from queue import Queue

q = Queue(maxsize=10)

def fun1():
    fun2Thread = Thread(target=fun2)
    fun2Thread.start()
    while True:
        try:
            q.put(1)
        except KeyboardInterrupt:
            print("Key Interrupt")
            fun2Thread.join()

def fun2():
    print(q.get())

def read():
    fun1Thread = Thread(target=fun1)
    fun1Thread.start()
    fun1Thread.join()

read()
The issue I'm facing is that the terminal is stuck after printing 1. Can someone please guide me on how to implement this scenario?
Here's an example that may help.
We have a main program (driver), a client and a server. The main program manages queue construction and the starting and ending of the subprocesses.
The client sends a range of values via a queue to the server. When the range is exhausted, it tells the server to terminate. There's a delay (sleep) in enqueueing the data for demonstration purposes.
Try running it once without any interrupt and note how everything terminates nicely. Then run again and interrupt (Ctrl-C) and again note a clean termination.
from multiprocessing import Queue, Process
from signal import signal, SIGINT, SIG_IGN
from time import sleep

def client(q, default):
    signal(SIGINT, default)
    try:
        for i in range(10):
            sleep(0.5)
            q.put(i)
    except KeyboardInterrupt:
        pass
    finally:
        q.put(-1)

def server(q):
    while (v := q.get()) != -1:
        print(v)

def main():
    q = Queue()
    default = signal(SIGINT, SIG_IGN)
    (server_p := Process(target=server, args=(q,))).start()
    (client_p := Process(target=client, args=(q, default))).start()
    client_p.join()
    server_p.join()

if __name__ == '__main__':
    main()
EDIT:
Edited to ensure that the server process continues to drain the queue if the client is terminated due to a KeyboardInterrupt (SIGINT)
I implemented an asynchronous pull subscriber using Python. This is the basic code:
import time

from google.cloud import pubsub_v1

def receive_messages(project, subscription_name):
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        project, subscription_name)

    def callback(message):
        print("A")
        time.sleep(2)
        print('Received message: {}'.format(message))
        message.ack()
        print("B")

    subscriber.subscribe(subscription_path, callback=callback)
    print('Listening for messages on {}'.format(subscription_path))
    while True:
        time.sleep(60)
I need the output to be printed like
A
message
B
A
message
B
that is, the callbacks need to run sequentially, or at least messages should be received with a given number of threads. I can't find a way to limit the number of threads, and my program gives a segmentation fault because too many threads are created.
How can I control the number of threads used to receive messages?
The problem can be solved using a Policy:
import threading

from google.cloud import pubsub_v1
from concurrent import futures

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project, subscription_name)

def callback(message):
    print(str(message.data) + " " + str(threading.current_thread()))
    message.ack()

flow_control = pubsub_v1.types.FlowControl(max_messages=10)
executor = futures.ThreadPoolExecutor(max_workers=5)
policy = pubsub_v1.subscriber.policy.thread.Policy(subscriber, subscription_path, executor=executor, flow_control=flow_control)
policy.open(callback)
We can set the maximum thread count using max_workers, and flow control settings can be configured as well.
If you need your processing callbacks to run sequentially, you would be better off using a message-passing model than modifying the subscriber internals. If you push the received messages to an explicit queue.Queue, you can ensure that only one worker is pulling off this queue and only one message is being processed at a time. Note, however, that while this provides a ‘one at a time’ guarantee for processing if there is only one subscribing job, it does not provide any ordering guarantees. Messages may still be processed in an arbitrary order relative to the order in which they were published.
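As a rough illustration of that message-passing model, here is a minimal sketch; the subscription path and the process() function are placeholders assumed for the example, not part of the original answer:

import queue
import threading

from google.cloud import pubsub_v1

# Placeholder subscription path; replace with your own project/subscription.
subscription_path = 'projects/my-project/subscriptions/my-subscription'
work_queue = queue.Queue()

def process(message):
    # Hypothetical sequential processing step.
    print('Processing: {}'.format(message.data))

def callback(message):
    # The subscriber's thread pool only enqueues here; the actual work is
    # done by the single worker thread below, one message at a time.
    work_queue.put(message)

def worker():
    while True:
        message = work_queue.get()   # blocks until a message is available
        try:
            process(message)
            message.ack()            # ack only after processing succeeds
        finally:
            work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

subscriber = pubsub_v1.SubscriberClient()
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull_future.result()       # block the main thread while messages flow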
If someone is looking for a newer version:
from concurrent import futures
from google.cloud import pubsub_v1

executor = futures.ThreadPoolExecutor(max_workers=1)
scheduler = pubsub_v1.subscriber.scheduler.ThreadScheduler(executor)

with pubsub_v1.SubscriberClient() as subscriber:
    # subscription_name and callback are assumed to be defined as in the question
    streaming_pull_future = subscriber.subscribe(subscription_name, callback, scheduler=scheduler, await_callbacks_on_shutdown=True)
    timeout = 5 * 60  # seconds
    try:
        streaming_pull_future.result(timeout=timeout)
    except Exception:
        streaming_pull_future.cancel()  # Trigger the shutdown.
        streaming_pull_future.result()  # Block until the shutdown is complete.
I am developing a server (daemon).
The server has one "worker thread". The worker thread runs a queue of commands. When the queue is empty, the worker thread is paused (but does not exit, because it should preserve certain state in memory). To keep exactly one copy of the state in memory, I need exactly one worker thread (not several and not zero) running at all times.
Requests are added to the end of this queue when a client connects to a Unix socket and sends a command.
After the command is issued, it is added to the queue of commands of the worker thread. As soon as it is added to the queue, the server replies something like "OK". There should not be a long pause between the server receiving a command and its "OK" reply. However, running the commands in the queue may take some time.
The main "work" of the worker thread is split into small (taking relatively little time) chunks. Between chunks, the worker thread inspects ("eats" and empties) the queue and continues to work based on the data extracted from the queue.
How to implement this server/daemon in Python?
This is sample code with internet sockets, easily replaced with Unix domain sockets. It takes whatever you write to the socket, passes it as a "command" to the worker, and responds OK as soon as it has queued the command. The single worker simulates a lengthy task with a ten-second loop of sleep(1) calls. You can queue as many tasks as you want; you receive OK immediately, and every ten seconds your worker prints a command from the queue.
import Queue, threading, socket
from time import sleep

class worker(threading.Thread):
    def __init__(self, q):
        super(worker, self).__init__()
        self.qu = q

    def run(self):
        while True:
            new_task = self.qu.get(True)
            print new_task
            i = 0
            while i < 10:
                print "working ..."
                sleep(1)
                i += 1
                try:
                    another_task = self.qu.get(False)
                    print another_task
                except Queue.Empty:
                    pass

task_queue = Queue.Queue()
w = worker(task_queue)
w.daemon = True
w.start()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('localhost', 4200))
sock.listen(1)
try:
    while True:
        conn, addr = sock.accept()
        data = conn.recv(32)
        task_queue.put(data)
        conn.sendall("OK")
        conn.close()
except:
    sock.close()
I have a piece of multi-threaded code: 3 threads that poll data from SQS and add it to a Python queue, and 5 threads that take the messages from the Python queue, process them, and send them to a back-end system.
Here is the code:
python_queue = Queue.Queue()

class GetDataFromSQS(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, python_queue):
        threading.Thread.__init__(self)
        self.python_queue = python_queue

    def run(self):
        while True:
            time.sleep(0.5)  # sleep briefly before querying again
            try:
                msgs = sqs_queue.get_messages(10)
                if msgs == None:
                    print "sqs is empty now!"
                for msg in msgs:
                    #place each message block from sqs into python queue for processing
                    self.python_queue.put(msg)
                    print "Adding a new message to Queue. Queue size is now %d" % self.python_queue.qsize()
                    #delete from sqs
                    sqs_queue.delete_message(msg)
            except Exception as e:
                print "Exception in GetDataFromSQS :: " + str(e)

class ProcessSQSMsgs(threading.Thread):
    def __init__(self, python_queue):
        threading.Thread.__init__(self)
        self.python_queue = python_queue
        self.pool_manager = PoolManager(num_pools=6)

    def run(self):
        while True:
            #grabs the message to be parsed from sqs queue
            python_queue_msg = self.python_queue.get()
            try:
                processMsgAndSendToBackend(python_queue_msg, self.pool_manager)
            except Exception as e:
                print "Error parsing:: " + str(e)
            finally:
                self.python_queue.task_done()

def processMsgAndSendToBackend(msg, pool_manager):
    if msg != "":
        ###### All the code related to processing the msg
        for individualValue in processedMsg:
            try:
                response = pool_manager.urlopen('POST', backend_endpoint, body=individualValue)
                if response == None:
                    print "Error"
                else:
                    response.release_conn()
            except Exception as e:
                print "Exception! Post data to backend: " + str(e)

def startMyPython():
    #spawn a pool of threads, and pass them queue instance
    for i in range(3):
        sqsThread = GetDataFromSQS(python_queue)
        sqsThread.start()
    for j in range(5):
        parseThread = ProcessSQSMsgs(python_queue)
        #parseThread.setDaemon(True)
        parseThread.start()
    #wait on the queue until everything has been processed
    python_queue.join()
    # python_queue.close() -- should i do this?

startMyPython()
The problem:
3 Python workers die randomly (monitored using top -p -H), roughly once every few days, and everything is fine again if I kill the process and restart the script. I suspect the workers that vanish are the 3 GetDataFromSQS threads. And because GetDataFromSQS dies, the other 5 workers, although still running, always sleep because there is no data in the Python queue. I am not sure what I am doing wrong here, as I am pretty new to Python and followed this tutorial for the queuing logic and threads: http://www.ibm.com/developerworks/aix/library/au-threadingpython/
Thanks in advance for your help. I hope I have explained my problem clearly.
The thread hanging turned out to be related to getting a handle to the SQS queue. I used IAM for managing credentials and the boto SDK for connecting to SQS.
The root cause was that the boto package reads the auth metadata from AWS, and this was failing once in a while.
The fix is to edit the boto config, increasing the number of attempts made for the auth call to AWS:
[Boto]
metadata_service_num_attempts = 5
( https://groups.google.com/forum/#!topic/boto-users/1yX24WG3g1E )
I was reading an article on Python multi-threading using queues and have a basic question.
Based on the print statements, 5 threads are started as expected. So how does the queue work?
1. The thread is started initially; when the queue is populated with an item, does it get restarted and start processing that item?
2. If we use the queue system and the threads process the items in the queue one by one, how is there an improvement in performance? Isn't it similar to serial processing, i.e. one by one?
import Queue
import threading
import urllib2
import datetime
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        print 'threads are created'
        self.queue = queue

    def run(self):
        while True:
            #grabs host from queue
            print 'thread startting to run'
            now = datetime.datetime.now()
            host = self.queue.get()
            #grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print 'host=%s ,threadname=%s' % (host, self.getName())
            print url.read(20)
            #signals to queue job is done
            self.queue.task_done()

start = time.time()

if __name__ == '__main__':
    #spawn a pool of threads, and pass them queue instance
    print 'program start'
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()
    #populate queue with data
    for host in hosts:
        queue.put(host)
    #wait on the queue until everything has been processed
    queue.join()

print "Elapsed Time: %s" % (time.time() - start)
A queue is similar to a list container, but with internal locking to make it a thread-safe way to communicate data.
What happens when you start all of your threads is that they all block on the self.queue.get() call, waiting to pull an item from the queue. When an item is put into the queue from your main thread, one of the threads will become unblocked and receive the item. It can then continue to process it until it finishes and returns to a blocking state.
All of your threads can run concurrently because they all are able to receive items from the queue. This is where you see the improvement in performance: while urlopen and read take time in one thread and it is waiting on I/O, another thread can do work. The queue object's job is simply to manage the locking and hand items off to the callers.
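To make that blocking behaviour concrete, here is a minimal, self-contained sketch (written with the Python 3 queue module rather than the Python 2 Queue used in the question): three consumers block on get(), and each put() from the main thread hands an item to exactly one of them.

import threading
import queue

q = queue.Queue()

def consumer(name):
    item = q.get()            # blocks here until an item is available
    print('{} got {}'.format(name, item))
    q.task_done()

# Start three consumers; each one blocks immediately on q.get().
for n in range(3):
    threading.Thread(target=consumer, args=('thread-{}'.format(n),)).start()

# Each item put on the queue is received by exactly one blocked consumer.
for item in ('a', 'b', 'c'):
    q.put(item)

q.join()                      # returns once task_done() has been called for every item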