I am implementing a Python script that needs to keep sending 1500+ packets in parallel, with each round of sends completing in less than 5 seconds.
In a nutshell what I need is:
def send_pkts(ip):
    # craft packet
    while True:
        # send packet
        time.sleep(randint(0, 3))

for x in list[:1500]:
    send_pkts(x)
    time.sleep(randint(1, 5))
I have tried the simple single-threaded, multithreading, multiprocessing and multiprocessing+multithreading forms and had the following issues:
Simple single-threaded:
The "for delay" seems to compromise the "5 seconds" dependency.
Multithreading:
I think I could not accomplish what I desire due to Python GIL limitations.
Multiprocessing:
That was the best approach that seemed to work. However, due to the excessive number of processes, the VM where I am running the script freezes (unsurprisingly, with 1500 processes running), so it becomes impractical.
Multiprocessing+Multithreading:
In this approach I created fewer processes, each of them spawning several threads (let's say 10 processes with 150 threads each). The VM does not freeze as fast as in approach number 3, however the most "concurrent packet sending" I could reach was ~800. GIL limitations? VM limitations?
In this attempt I also tried using a process pool, but the results were similar.
Is there a better approach I could use to accomplish this task?
[1] EDIT 1:
def send_pkt(x):
    # craft pkt
    while True:
        # send pkt
        gevent.sleep(0)

gevent.joinall([gevent.spawn(send_pkt, x) for x in list[:1500]])
[2] EDIT 2 (gevent monkey-patching):
from gevent import monkey; monkey.patch_all()
jobs = [gevent.spawn(send_pkt, x) for x in list[:1500]]
gevent.wait(jobs)
#for send_pkt(x) check [1]
However I got the following error: "ValueError: filedescriptor out of range in select()". So I checked my system ulimit (Soft and Hard both are maximum: 65536).
Afterwards, I found it has to do with a select() limitation on Linux (1024 fds maximum). Please check: http://man7.org/linux/man-pages/man2/select.2.html (BUGS section). To overcome that I should use poll() (http://man7.org/linux/man-pages/man2/poll.2.html) instead, but with poll() I run into the same limitation, as polling is a "blocking approach".
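I also noticed that the standard library's selectors module prefers an epoll/poll-backed selector where available, which is not subject to select()'s 1024-descriptor cap. A quick check of what the platform provides:

import selectors

# On Linux this picks an epoll-backed selector, which does not have
# select()'s FD_SETSIZE (1024) limit.
sel = selectors.DefaultSelector()
print(type(sel))   # e.g. <class 'selectors.EpollSelector'>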
Regards,
When using parallelism in Python a good approach is to use either ThreadPoolExecutor or ProcessPoolExecutor from
https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures
These work well in my experience.
Here is an example of ThreadPoolExecutor that can be adapted for your use:
import concurrent.futures
import time

IPs = ['168.212.226.204',
       '168.212.226.204',
       '168.212.226.204',
       '168.212.226.204',
       '168.212.226.204']

def send_pkt(x):
    status = 'Failed'
    while True:
        # send pkt
        time.sleep(10)
        status = 'Successful'
        break
    return status

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_ip = {executor.submit(send_pkt, ip): ip for ip in IPs}
    for future in concurrent.futures.as_completed(future_to_ip):
        ip = future_to_ip[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (ip, exc))
        else:
            print('%r sent %s' % (ip, data))
Your result in option 3: "due to excessive quantity of process the VM where I am running the script freezes (of course, 1500 process running)" could bear further investigation. I believe it may be underdetermined from the information gathered so far whether this is better characterized as a shortcoming of the multiprocessing approach, or a limitation of the VM.
One fairly simple and straightforward approach would be to run a scaling experiment: rather than having all sends happen from individual processes or all from the same one, try intermediate values. Time how long it takes when the workload is split between two processes, then 4, 8, and so on.
While doing that it may also be a good idea to run a tool like xperf on Windows or oprofile on Linux to record whether these different degrees of parallelism lead to different kinds of bottlenecks, for example thrashing the CPU cache, running the VM out of memory, or who knows what else. The easiest way to tell is to try it.
Based on prior experience with these types of problems and general rules of thumb, I would expect the best performance to come when the number of multiprocessing processes is less than or equal to the number of available CPU cores (either on the VM itself or on the hypervisor). That is however assuming that the problem is CPU bound; it's possible performance would still be higher with more than #cpu processes if something blocks during packet sending that would allow better use of CPU time if interleaved with other blocking operations. Again though, we don't know until some profiling and/or scaling experiments are done.
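As a rough sketch of such a scaling experiment (send_pkts and the target list below are placeholders for your own crafting/sending code, not something from the question), you could time the same workload across a range of pool sizes:

import time
from multiprocessing import Pool

def send_pkts(ip):
    pass  # stand-in for your craft + send logic

targets = ["192.0.2.%d" % (i % 254 + 1) for i in range(1500)]  # placeholder IPs

if __name__ == "__main__":
    for workers in (1, 2, 4, 8, 16, 32):
        start = time.time()
        with Pool(workers) as pool:
            pool.map(send_pkts, targets)   # one full sweep of the workload
        print("%2d workers: %.2f s" % (workers, time.time() - start))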
You are correct that Python (because of the GIL) effectively executes one thread at a time, however your desired task (sending network packets) is an I/O-bound operation and therefore a good candidate for multi-threading. Your main thread is not busy while the packets are transmitting, as long as you write your code with async in mind.
Take a look at the python docs on async tcp networking - https://docs.python.org/3/library/asyncio-protocol.html#tcp-echo-client.
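As a rough sketch of that approach (the port, payload, and address list below are placeholders, not from the question), the whole batch can be dispatched as coroutines on a single thread:

import asyncio
import os

async def send_pkt(ip, payload, port=9999):
    # open a TCP connection, push the payload, then close; errors are
    # swallowed here only to keep the sketch short
    try:
        reader, writer = await asyncio.open_connection(ip, port)
        writer.write(payload)
        await writer.drain()
        writer.close()
        await writer.wait_closed()
    except OSError:
        pass

async def main(ips):
    payload = os.urandom(1024)  # stand-in for "craft packet"
    await asyncio.gather(*(send_pkt(ip, payload) for ip in ips))

if __name__ == "__main__":
    asyncio.run(main(["192.0.2.%d" % (i % 254 + 1) for i in range(1500)]))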
If the bottleneck is http based ("sending packets") then the GIL actually shouldn't be too much of a problem.
If there is computation happening within python as well, then the GIL may get in the way and, as you say, process-based parallelism would be preferred.
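If that turns out to be the case, a hedged sketch of splitting the work that way (craft_packet, send_packet, the port, and the target list are all illustrative placeholders) could look like this, with processes for the CPU-bound crafting and threads for the I/O-bound sending:

import concurrent.futures
import socket

def craft_packet(ip):
    # placeholder for the CPU-heavy crafting work
    return ip.encode() + b"\x00" * 64

def send_packet(ip, payload, port=9999):
    # I/O-bound part: fine to run in threads despite the GIL
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.sendto(payload, (ip, port))
    except OSError:
        pass

if __name__ == "__main__":
    targets = ["192.0.2.%d" % (i % 254 + 1) for i in range(1500)]
    with concurrent.futures.ProcessPoolExecutor() as procs, \
         concurrent.futures.ThreadPoolExecutor(max_workers=100) as threads:
        payloads = procs.map(craft_packet, targets)  # CPU-bound work in processes
        futures = [threads.submit(send_packet, ip, p)
                   for ip, p in zip(targets, payloads)]
        concurrent.futures.wait(futures)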
You do not need one process per task! This seems to be the oversight in your thinking. With python's Pool class, you can easily create a set of workers which will receive tasks from a queue.
import multiprocessing

def send_pkts(ip):
    ...

number_of_workers = 8

with multiprocessing.Pool(number_of_workers) as pool:
    pool.map(send_pkts, list[:1500])
You are now running number_of_workers + 1 processes (the workers + the original process) and the N workers are running the send_pkts function concurrently.
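A hedged usage sketch of re-running that pool on the 1-5 second cadence from the question (the worker count and target list are placeholders):

import random
import time
import multiprocessing

def send_pkts(ip):
    pass  # craft + send, as in the question

if __name__ == '__main__':
    targets = ["192.0.2.%d" % (i % 254 + 1) for i in range(1500)]  # placeholder IPs
    with multiprocessing.Pool(8) as pool:
        while True:
            pool.map(send_pkts, targets)       # one full sweep over all targets
            time.sleep(random.randint(1, 5))   # pacing from the question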
The main issue keeping you from achieving your desired performance is the send_pkts() method. It doesn't just send the packet, it also crafts the packet:
def send_pkts(ip):
    # craft packet
    while True:
        # send packet
        time.sleep(randint(0, 3))
While sending a packet is almost certainly an I/O bound task, crafting a packet is almost certainly a CPU bound task. This method needs to be split into two tasks:
craft a packet
send a packet
I've written a basic socket server and a client app that crafts and sends packets to the server. The idea is to have a separate process which crafts the packets and puts them into a queue. There is a pool of threads that share the queue with the packet crafting process. These threads pull packets off of the queue and send them to the server. They also stick the server's responses into another shared queue but that's just for my own testing and not relevant to what you're trying to do. The threads exit when they get a None (poison pill) from the queue.
server.py:
import argparse
import socketserver
import time

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, help="bind to host")
    parser.add_argument("--port", type=int, help="bind to port")
    parser.add_argument("--packet-size", type=int, help="size of packets")
    args = parser.parse_args()
    HOST, PORT = args.host, args.port

    class MyTCPHandler(socketserver.BaseRequestHandler):
        def handle(self):
            time.sleep(1.5)
            data = self.request.recv(args.packet_size)
            self.request.sendall(data.upper())

    with socketserver.ThreadingTCPServer((HOST, PORT), MyTCPHandler) as server:
        server.serve_forever()
client.py:
import argparse
import logging
import multiprocessing as mp
import os
import queue as q
import socket
import time
from threading import Thread


def get_logger():
    logger = logging.getLogger("threading_example")
    logger.setLevel(logging.INFO)
    fh = logging.FileHandler("client.log")
    fmt = '%(asctime)s - %(threadName)s - %(levelname)s - %(message)s'
    formatter = logging.Formatter(fmt)
    fh.setFormatter(formatter)
    logger.addHandler(fh)
    return logger


class PacketMaker(mp.Process):
    def __init__(self, result_queue, max_packets, packet_size, num_poison_pills, logger):
        mp.Process.__init__(self)
        self.result_queue = result_queue
        self.max_packets = max_packets
        self.packet_size = packet_size
        self.num_poison_pills = num_poison_pills
        self.num_packets_made = 0
        self.logger = logger

    def run(self):
        while True:
            if self.num_packets_made >= self.max_packets:
                for _ in range(self.num_poison_pills):
                    self.result_queue.put(None, timeout=1)
                self.logger.debug('PacketMaker exiting')
                return
            self.result_queue.put(os.urandom(self.packet_size), timeout=1)
            self.num_packets_made += 1


class PacketSender(Thread):
    def __init__(self, task_queue, result_queue, addr, packet_size, logger):
        Thread.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue
        self.server_addr = addr
        self.packet_size = packet_size
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.connect(addr)
        self.logger = logger

    def run(self):
        while True:
            packet = self.task_queue.get(timeout=1)
            if packet is None:
                self.logger.debug("PacketSender exiting")
                return
            try:
                self.sock.sendall(packet)
                response = self.sock.recv(self.packet_size)
            except socket.error:
                self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                self.sock.connect(self.server_addr)
                self.sock.sendall(packet)
                response = self.sock.recv(self.packet_size)
            self.result_queue.put(response)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--num-packets', type=int, help='number of packets to send')
    parser.add_argument('--packet-size', type=int, help='packet size in bytes')
    parser.add_argument('--num-threads', type=int, help='number of threads sending packets')
    parser.add_argument('--host', type=str, help='name of host packets will be sent to')
    parser.add_argument('--port', type=int, help='port number of host packets will be sent to')
    args = parser.parse_args()

    logger = get_logger()
    logger.info(f"starting script with args {args}")

    packets_to_send = mp.Queue(args.num_packets + args.num_threads)
    packets_received = q.Queue(args.num_packets)
    producers = [PacketMaker(packets_to_send, args.num_packets, args.packet_size, args.num_threads, logger)]
    senders = [PacketSender(packets_to_send, packets_received, (args.host, args.port), args.packet_size, logger)
               for _ in range(args.num_threads)]

    start_time = time.time()
    logger.info("starting workers")
    for worker in senders + producers:
        worker.start()
    for worker in senders:
        worker.join()
    logger.info("workers finished")
    end_time = time.time()
    print(f"{packets_received.qsize()} packets received in {end_time - start_time} seconds")
run.sh:
#!/usr/bin/env bash

for i in "$@"
do
case $i in
    -s=*|--packet-size=*)
    packet_size="${i#*=}"
    shift
    ;;
    -n=*|--num-packets=*)
    num_packets="${i#*=}"
    shift
    ;;
    -t=*|--num-threads=*)
    num_threads="${i#*=}"
    shift
    ;;
    -h=*|--host=*)
    host="${i#*=}"
    shift
    ;;
    -p=*|--port=*)
    port="${i#*=}"
    shift
    ;;
    *)
    ;;
esac
done

python3 server.py --host="${host}" \
                  --port="${port}" \
                  --packet-size="${packet_size}" &
server_pid=$!

python3 client.py --packet-size="${packet_size}" \
                  --num-packets="${num_packets}" \
                  --num-threads="${num_threads}" \
                  --host="${host}" \
                  --port="${port}"

kill "${server_pid}"
$ ./run.sh -s=1024 -n=1500 -t=300 -h=localhost -p=9999
1500 packets received in 4.70330023765564 seconds
$ ./run.sh -s=1024 -n=1500 -t=1500 -h=localhost -p=9999
1500 packets received in 1.5025699138641357 seconds
This result may be verified by changing the log level in client.py to DEBUG. Note that the script does take much longer than 4.7 seconds to complete. There is quite a lot of teardown required when using 300 threads, but the log makes it clear that the threads are done processing at 4.7 seconds.
Take all performance results with a grain of salt. I have no clue what system you're running this on. I will provide my relevant system stats:
2x Xeon X5550 @ 2.67GHz
24GB DDR3 @ 1333MHz
Debian 10
Python 3.7.3
I'll address the issues with your attempts:
Simple single-threaded: This is all but guaranteed to take at least 1.5 x num_packets seconds due to the randint(0, 3) delay
Multithreading: The GIL is the likely bottleneck here, but it's likely because of the craft packet part rather than send packet
Multiprocessing: Each process requires at least one file descriptor so you're probably exceeding the user or system limit, but this could work if you change the appropriate settings
Multiprocessing+multithreading: This fails for the same reason as #2, crafting the packet is probably CPU bound
The rule of thumb is: I/O bound - use threads, CPU bound - use processes
Thanks to the guys in the previous answer, I can now build a multi-process TCP server, with each process running an asynchronous server and all of them binding to one port.
(Could not use os.fork() to bind several processes to one socket server when using asyncio)
Theoretically, this model should achieve its best performance when every process handles incoming messages equally. The benefits might be lower latency or higher TPS? I'm not sure.
Here's the problem. I created a four-process server and gathered statistics on how many TCP requests each process accepts (using a client in a while loop that continuously makes new connection requests). The result looks like {p1: 20000 times, p2: 16000 times, p3: 13000 times, p4: 10000 times}. Probably not a good result.
I'm trying to figure out whether a lock would help (let the process that gets the lock accept the request, instead of letting the processes compete to accept requests directly). But it turns out that this way only the parent process can get the lock, while the others can't at all.
I'm still trying to figure out a solution and need your help.
Here's a naive sample server (a preforked model where processes competitively accept requests directly):
# sample_server.py
import asyncio
import os
from socket import *


def create_server():
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind(('', 25000))
    sock.listen()
    sock.setblocking(False)
    return sock


async def start_serving(loop, server):
    while True:
        client, addr = await loop.sock_accept(server)
        loop.create_task(handler(loop, client))


async def handler(loop, client):
    with client:
        while True:
            data = await loop.sock_recv(client, 64)
            if not data:
                break
            print(f"Incoming message {data} at pid {pid}")
            await loop.sock_sendall(client, data)


server = create_server()

for i in range(4 - 1):
    pid = os.fork()
    if pid <= 0:
        break

pid = os.getpid()
loop = asyncio.get_event_loop()
loop.create_task(start_serving(loop, server))
loop.run_forever()
Then we may redirect its output into a file like this:
python3 sample_server.py > sample_server.output
Next, we can roughly process that data:

import re
from collections import Counter

with open('./sample_server.output', 'r') as f:
    cont = f.read()

pat = re.compile(r'\d{4}')
res = pat.findall(cont)
print(Counter(res))
We get output like this (where the key is the pid and the value is how many echoes that process handled):
Counter({'3788': 23136, '3789': 18866, '3791': 18263, '3790': 10817})
Quite unequal.
Things got even worse when I introduced a multiprocessing Lock like this:
from multiprocessing import Lock

l = Lock()

async def start_serving(loop, server):
    while True:
        with l:
            client, addr = await loop.sock_accept(server)
            loop.create_task(handler(loop, client))
↑ Then the only process that can accept requests is the parent process, while the child processes are totally blocked. It seems that if you acquire a lock before the process blocks, it will always behave like this. The interpreter is just faithfully doing what we told it to do.
In summary, here are my two questions:
1. Is there any way to make this preforked asynchronous server load-balance?
2. Is there any way to introduce a lock that helps solve this problem?
Thanks!
PS: if anyone could tell me how to use uvloop to drive the event loop under PyPy, big thanks!
I'm working on a relatively simple Python / ZeroMQ based work distribution system, using REQ/ROUTER sockets. The system is distributed and worker nodes are geographically distributed on different continents.
The ROUTER, responsible for distributing work, .bind()-s a ROUTER socket. Workers .connect() to it over TCP using a REQ socket.
In the process of setting up a new worker node, I've noticed that while smaller messages (up to 1kB) do the trip with no issues, replies of ~2kB and up, sent by the ROUTER-end are never received by the worker into their REQ-socket - when I call recv(), the socket just hangs.
The worker code runs inside Docker containers, and I was able to work around the issue when running the same image with --net=host - it seems to not happen if Docker is using the host network.
I'm wondering if this is something in the network stack configuration on the host machine or in Docker, or maybe something that can be prevented in my code?
Here is a simplified version of my code that reproduces this issue:
Worker
import sys
import zmq
import logging
import time

READY = 'R'


def worker(connect_to):
    ctx = zmq.Context()
    socket = ctx.socket(zmq.REQ)
    socket.connect(connect_to)
    log = logging.getLogger(__name__)
    while True:
        socket.send_string(READY)
        log.debug("Send READY message, waiting for reply")
        message = socket.recv()
        log.debug("Got reply of %d bytes", len(message))
        time.sleep(5)


if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    worker(sys.argv[1])
Router
import sys
import zmq
import logging

REPLY_SIZE = 1024 * 8


def router(bind_to):
    ctx = zmq.Context()
    socket = ctx.socket(zmq.ROUTER)
    socket.bind(bind_to)
    poller = zmq.Poller()
    poller.register(socket, zmq.POLLIN)
    log = logging.getLogger(__name__)
    while True:
        socks = dict(poller.poll(5000))
        if socks.get(socket) == zmq.POLLIN:
            message = socket.recv_multipart()
            log.debug("Received message of %d parts", len(message))
            identity, _ = message[:2]
            res = handle_message(message[2:])
            log.debug("Sending %d bytes back in response on socket", len(res))
            socket.send_multipart([identity, '', res])


def handle_message(parts):
    log = logging.getLogger(__name__)
    log.debug("Got message: %s", parts)
    return 'A' * REPLY_SIZE


if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    router(sys.argv[1])
FWIW I was able to reproduce this on Ubuntu 16.04 (both router and worker) with Docker 17.09.0-ce, libzmq 4.1.5 and PyZMQ 15.4.0.
No, sir, the socket does not hang at all:
Why?
The issue is that you have instructed the Socket() instance to enter an infinitely blocking state by calling the .recv() method without specifying the zmq.NOBLOCK flag (the ZMQ_DONTWAIT flag in the original ZeroMQ API).
This is what moves the code into infinite blocking: there seem to be other issues that prevent the Docker container from delivering the first message to the Worker's Docker-embedded ZeroMQ Context() I/O engine and on to the REQ access point. Remember that the REQ archetype uses a strict two-step finite-state automaton, strictly alternating .send() -> .recv() -> .send() -> ... ad infinitum.
Reversing this cause and effect is wrong and misleading: from the observed behaviour alone, the issue "the socket just hangs" cannot be distinguished from the issue "Docker does not deliver a single message" (which is what would allow .recv() to return).
Next steps:
Use .poll() on the REQ side to check, without blocking, whether a message has already arrived at the Worker (see the sketch below).
If nothing has arrived, focus on Docker networking first; after that you may benefit from tuning the ZeroMQ Context() I/O-engine performance and link-level configuration options.
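A rough sketch of that non-blocking probe (the endpoint below is a placeholder, not from the question):

import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.REQ)
socket.connect("tcp://router-host:5555")    # placeholder endpoint

socket.send_string("R")
poller = zmq.Poller()
poller.register(socket, zmq.POLLIN)

if poller.poll(timeout=5000):               # wait up to 5 seconds
    message = socket.recv(zmq.NOBLOCK)      # safe: poll said data is waiting
    print("got %d bytes" % len(message))
else:
    print("nothing arrived - investigate the Docker networking first")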
I'm trying to implement a tcp 'echo server'.
Simple stuff:
Client sends a message to the server.
Server receives the message
Server converts message to uppercase
Server sends modified message to client
Client prints the response.
It worked well, so I decided to parallelize the server; make it so that it could handle multiple clients at time.
Since most Python interpreters have a GIL, multithreading won't cut it.
I had to use multiprocessing... And boy, this is where things went downhill.
I'm using Windows 10 x64 and the WinPython suite with Python 3.5.2 x64.
My idea is to create a socket, initialize it (bind and listen), create subprocesses and pass the socket to the children.
But for the love of me... I can't make this work, my subprocesses die almost instantly.
Initially I had some issues 'pickling' the socket...
So I googled a bit and thought this was the issue.
So I tried passing my socket through a multiprocessing queue, through a pipe, and my last attempt was 'forkpickling' it and passing it as a bytes object during process creation.
Nothing works.
Can someone please shed some light here?
Tell me whats wrong?
Maybe the whole idea (sharing sockets) is bad... And if so, PLEASE tell me how I can achieve my initial objective: enabling my server to ACTUALLY handle multiple clients at once (on Windows) (don't tell me about threading, we all know python's threading won't cut it ¬¬)
It is also worth noting that no files are created by the debug function.
No process lived long enough to run it, I believe.
The typical output of my server code is (only difference between runs is the process numbers):
Server is running...
Degree of parallelism: 4
Socket created.
Socket bount to: ('', 0)
Process 3604 is alive: True
Process 5188 is alive: True
Process 6800 is alive: True
Process 2844 is alive: True
Press ctrl+c to kill all processes.
Process 3604 is alive: False
Process 3604 exit code: 1
Process 5188 is alive: False
Process 5188 exit code: 1
Process 6800 is alive: False
Process 6800 exit code: 1
Process 2844 is alive: False
Process 2844 exit code: 1
The children died...
Why god?
WHYYyyyyy!!?!?!?
The server code:
# Imports
import socket
import packet
import sys
import os
from time import sleep
import multiprocessing as mp
import pickle
import io

# Constants
DEGREE_OF_PARALLELISM = 4
DEFAULT_HOST = ""
DEFAULT_PORT = 0


def _parse_cmd_line_args():
    arguments = sys.argv
    if len(arguments) == 1:
        return DEFAULT_HOST, DEFAULT_PORT
    else:
        raise NotImplementedError()


def debug(data):
    pid = os.getpid()
    with open('C:\\Users\\Trauer\\Desktop\\debug\\' + str(pid) + '.txt', mode='a',
              encoding='utf8') as file:
        file.write(str(data) + '\n')


def handle_connection(client):
    client_data = client.recv(packet.MAX_PACKET_SIZE_BYTES)
    debug('received data from client: ' + str(len(client_data)))
    response = client_data.upper()
    client.send(response)
    debug('sent data from client: ' + str(response))


def listen(picklez):
    debug('started listen function')
    pid = os.getpid()
    server_socket = pickle.loads(picklez)
    debug('acquired socket')
    while True:
        debug('Sub process {0} is waiting for connection...'.format(str(pid)))
        client, address = server_socket.accept()
        debug('Sub process {0} accepted connection {1}'.format(str(pid),
                                                               str(client)))
        handle_connection(client)
        client.close()
        debug('Sub process {0} finished handling connection {1}'.
              format(str(pid), str(client)))


if __name__ == "__main__":
    # Since most python interpreters have a GIL, multithreading won't cut
    # it... Oughta bust out some process, yo!
    host_port = _parse_cmd_line_args()
    print('Server is running...')
    print('Degree of parallelism: ' + str(DEGREE_OF_PARALLELISM))
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print('Socket created.')
    server_socket.bind(host_port)
    server_socket.listen(DEGREE_OF_PARALLELISM)
    print('Socket bount to: ' + str(host_port))

    buffer = io.BytesIO()
    mp.reduction.ForkingPickler(buffer).dump(server_socket)
    picklez = buffer.getvalue()

    children = []
    for i in range(DEGREE_OF_PARALLELISM):
        child_process = mp.Process(target=listen, args=(picklez,))
        child_process.daemon = True
        child_process.start()
        children.append(child_process)
        while not child_process.pid:
            sleep(.25)
        print('Process {0} is alive: {1}'.format(str(child_process.pid),
                                                 str(child_process.is_alive())))
    print()

    kids_are_alive = True
    while kids_are_alive:
        print('Press ctrl+c to kill all processes.\n')
        sleep(1)
        exit_codes = []
        for child_process in children:
            print('Process {0} is alive: {1}'.format(str(child_process.pid),
                                                     str(child_process.is_alive())))
            print('Process {0} exit code: {1}'.format(str(child_process.pid),
                                                      str(child_process.exitcode)))
            exit_codes.append(child_process.exitcode)
        if all(exit_codes):
            # Why do they die so young? :(
            print('The children died...')
            print('Why god?')
            print('WHYYyyyyy!!?!?!?')
            kids_are_alive = False
edit: fixed the signature of "listen". My processes still die instantly.
edit2: User cmidi pointed out that this code does work on Linux; so my question is: how can I make this work on Windows?
You can directly pass a socket to a child process. multiprocessing registers a reduction for this, for which the Windows implementation uses the following DupSocket class from multiprocessing.resource_sharer:
class DupSocket(object):
    '''Picklable wrapper for a socket.'''
    def __init__(self, sock):
        new_sock = sock.dup()

        def send(conn, pid):
            share = new_sock.share(pid)
            conn.send_bytes(share)
        self._id = _resource_sharer.register(send, new_sock.close)

    def detach(self):
        '''Get the socket. This should only be called once.'''
        with _resource_sharer.get_connection(self._id) as conn:
            share = conn.recv_bytes()
            return socket.fromshare(share)
This calls the Windows socket share method, which returns the protocol info buffer from calling WSADuplicateSocket. It registers with the resource sharer to send this buffer over a connection to the child process. The child in turn calls detach, which receives the protocol info buffer and reconstructs the socket via socket.fromshare.
It's not directly related to your problem, but I recommend that you redesign the server to instead call accept in the main process, which is the way this is normally done (e.g. in Python's socketserver.ForkingTCPServer module). Pass the resulting (conn, address) tuple to the first available worker over a multiprocessing.Queue, which is shared by all of the workers in the process pool. Or consider using a multiprocessing.Pool with apply_async.
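A minimal sketch of that redesign (assumptions: port 50007 and the trivial echo handling are placeholders; the point is only that accept() stays in the main process and each (conn, address) tuple is handed to a worker over a multiprocessing.Queue, which pickles the socket using the reduction described above):

import socket
import multiprocessing as mp

def worker(conn_queue):
    while True:
        item = conn_queue.get()
        if item is None:                      # poison pill -> shut down
            break
        conn, address = item
        data = conn.recv(4096)
        conn.sendall(data.upper())
        conn.close()

if __name__ == '__main__':
    conn_queue = mp.Queue()
    workers = [mp.Process(target=worker, args=(conn_queue,)) for _ in range(4)]
    for w in workers:
        w.start()

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('', 50007))                  # placeholder port
    server.listen(4)
    try:
        while True:
            conn_queue.put(server.accept())   # (conn, address) is pickled to a worker
    except KeyboardInterrupt:
        for _ in workers:
            conn_queue.put(None)
        for w in workers:
            w.join()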
def listen(), the target/start for your child processes, does not take any argument, but you are providing the serialized socket as an argument (args=(picklez,)) to the child process. This causes an exception in the child process, which exits immediately:
TypeError: listen() takes no arguments (1 given)
Changing the signature to def listen(picklez) should solve the problem, as it provides one argument to the target of your child processes.
I have 3 programs written in Python which need to be connected. Two programs, X and Y, gather some information and send it to program Z. Program Z analyzes the data and sends decisions back to programs X and Y. The number of programs similar to X and Y will grow in the future. Initially I used a named pipe to allow communication from X and Y to Z, but as you can see, I need a bidirectional relationship. My boss told me to use ZeroMQ. I have just found a pattern for my use case, which is called Asynchronous Client/Server. Please see the code from the ZMQ book (http://zguide.zeromq.org/py:all) below.
The problem is that my boss does not want to use any threads, forks, etc. I moved the client and server tasks to separate programs, but I am not sure what to do with the ServerWorker class. Can this be used somehow without threads? Also, I am wondering how to establish the optimal number of workers.
import zmq
import sys
import threading
import time
from random import randint, random

__author__ = "Felipe Cruz <felipecruz@loogica.net>"
__license__ = "MIT/X11"


def tprint(msg):
    """like print, but won't get newlines confused with multiple threads"""
    sys.stdout.write(msg + '\n')
    sys.stdout.flush()


class ClientTask(threading.Thread):
    """ClientTask"""
    def __init__(self, id):
        self.id = id
        threading.Thread.__init__(self)

    def run(self):
        context = zmq.Context()
        socket = context.socket(zmq.DEALER)
        identity = u'worker-%d' % self.id
        socket.identity = identity.encode('ascii')
        socket.connect('tcp://localhost:5570')
        print('Client %s started' % (identity))
        poll = zmq.Poller()
        poll.register(socket, zmq.POLLIN)
        reqs = 0
        while True:
            reqs = reqs + 1
            print('Req #%d sent..' % (reqs))
            socket.send_string(u'request #%d' % (reqs))
            for i in range(5):
                sockets = dict(poll.poll(1000))
                if socket in sockets:
                    msg = socket.recv()
                    tprint('Client %s received: %s' % (identity, msg))

        socket.close()
        context.term()


class ServerTask(threading.Thread):
    """ServerTask"""
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        context = zmq.Context()
        frontend = context.socket(zmq.ROUTER)
        frontend.bind('tcp://*:5570')

        backend = context.socket(zmq.DEALER)
        backend.bind('inproc://backend')

        workers = []
        for i in range(5):
            worker = ServerWorker(context)
            worker.start()
            workers.append(worker)

        poll = zmq.Poller()
        poll.register(frontend, zmq.POLLIN)
        poll.register(backend, zmq.POLLIN)

        while True:
            sockets = dict(poll.poll())
            if frontend in sockets:
                ident, msg = frontend.recv_multipart()
                tprint('Server received %s id %s' % (msg, ident))
                backend.send_multipart([ident, msg])
            if backend in sockets:
                ident, msg = backend.recv_multipart()
                tprint('Sending to frontend %s id %s' % (msg, ident))
                frontend.send_multipart([ident, msg])

        frontend.close()
        backend.close()
        context.term()


class ServerWorker(threading.Thread):
    """ServerWorker"""
    def __init__(self, context):
        threading.Thread.__init__(self)
        self.context = context

    def run(self):
        worker = self.context.socket(zmq.DEALER)
        worker.connect('inproc://backend')
        tprint('Worker started')
        while True:
            ident, msg = worker.recv_multipart()
            tprint('Worker received %s from %s' % (msg, ident))
            replies = randint(0, 4)
            for i in range(replies):
                time.sleep(1. / (randint(1, 10)))
                worker.send_multipart([ident, msg])

        worker.close()


def main():
    """main function"""
    server = ServerTask()
    server.start()
    for i in range(3):
        client = ClientTask(i)
        client.start()

    server.join()


if __name__ == "__main__":
    main()
So, you grabbed the code from here: Asynchronous Client/Server Pattern
Pay close attention to the images that show you the model this code is targeted to. In particular, look at "Figure 38 - Detail of Asynchronous Server". The ServerWorker class is spinning up 5 "Worker" nodes. In the code, those nodes are threads, but you could make them completely separate programs. In that case, your server program (probably) wouldn't be responsible for spinning them up, they'd spin up separately and just communicate to your server that they are ready to receive work.
You'll see this often in ZMQ examples, a multi-node topology mimicked in threads in a single executable. It's just to make reading the whole thing easy, it's not always intended to be used that way.
For your particular case, it could make sense to have the workers be threads or to break them out into separate programs... but if it's a business requirement from your boss, then just break them out into separate programs.
Of course, to answer your second question, there's no way to know how many workers would be optimal without understanding the workload they'll be performing and how quickly they'll need to respond... your goal is to have a worker complete the work faster than new work arrives. There's a fair chance, in many cases, that this can be accomplished with a single worker. If so, you can have your server itself be the worker and just skip the entire "worker tier" of the architecture. You should start there, for the sake of simplicity, and just do some load testing to see if it will actually cope with your workload effectively. If not, get a sense of how long it takes to complete a task and how quickly tasks are coming in. Let's say a worker can complete a task in 15 seconds. That's 4 tasks a minute. If tasks are coming in at 5 tasks a minute, you need 2 workers, and you'll have a little headroom to grow. If things are wildly variable, then you'll have to make a decision about resources vs. reliability.
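The sizing arithmetic from that example, written out (all the rates are the hypothetical ones above):

import math

task_duration_s = 15                  # a worker finishes one task in 15 s
service_rate = 60 / task_duration_s   # 4 tasks per minute per worker
arrival_rate = 5                      # tasks arriving per minute

workers_needed = math.ceil(arrival_rate / service_rate)
print(workers_needed)                 # -> 2, matching the estimate above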
Before you get too much farther down the trail, make sure you read Chapter 4, Reliable Request/Reply Patterns, it will provide some insight for handling exceptions, and might give you a better pattern to follow.
I am using the latest pika library(0.9.9+) for rabbitmq. My usage for rabbitmq and pika is as follows :
I have long-running tasks (about 5 minutes) as workers. These tasks take their requests from rabbitmq. The requests come very infrequently, i.e. there is a long idle time between requests.
The problem I was facing previously is related to idle connections (connection closures due to idleness). So, I have enabled heartbeats in pika.
Now the selection of the heartbeat interval is a problem. Pika seems to be a single-threaded library where heartbeat reception and acknowledgement happen in between requests.
So, if the heartbeat interval is set to less than the time the callback function spends on its long-running computation, the server does not receive any heartbeat acknowledgements and closes the connection.
So, I assume the minimum heartbeat interval should be the maximum computation time of the callback function in a blocking connection.
What would be a good heartbeat value for Amazon EC2 to prevent it from closing idle connections?
Also, some suggest using rabbitmq keepalive (or libkeepalive) to maintain TCP connections. I think managing heartbeats at the TCP layer is much better because the application need not manage them. Is this true? Is keepalive a good method compared to RMQ heartbeats?
I have seen that some suggest using multiple threads and a queue for long-running tasks. But is this the only option for long-running tasks? It is quite disappointing that another queue must be used for this scenario.
Thank you in advance. I think I have detailed the problem. Let me know if I can provide more details.
If you're not tied to using pika, this thread helped me achieve what you're trying to do using kombu:
#!/usr/bin/env python
import time, logging, weakref, eventlet

from kombu import Connection, Exchange, Queue
from kombu.utils.debug import setup_logging
from kombu.common import eventloop
from eventlet import spawn_after

eventlet.monkey_patch()

log_format = ('%(levelname) -10s %(asctime)s %(name) -30s %(funcName) '
              '-35s %(lineno) -5d: %(message)s')
logging.basicConfig(level=logging.INFO, format=log_format)
logger = logging.getLogger('job_worker')
logger.setLevel(logging.INFO)


def long_running_function(body):
    time.sleep(300)


def job_worker(body, message):
    long_running_function(body)
    message.ack()


def monitor_heartbeats(connection, rate=2):
    """Function to send heartbeat checks to RabbitMQ. This keeps the
    connection alive over long-running processes."""
    if not connection.heartbeat:
        logger.info("No heartbeat set for connection: %s" % connection.heartbeat)
        return

    interval = connection.heartbeat
    cref = weakref.ref(connection)
    logger.info("Starting heartbeat monitor.")

    def heartbeat_check():
        conn = cref()
        if conn is not None and conn.connected:
            conn.heartbeat_check(rate=rate)
            logger.info("Ran heartbeat check.")
            spawn_after(interval, heartbeat_check)

    return spawn_after(interval, heartbeat_check)


def main():
    setup_logging(loglevel='INFO')

    # process for heartbeat monitor
    p = None

    try:
        with Connection('amqp://guest:guest@localhost:5672//', heartbeat=300) as conn:
            conn.ensure_connection()
            monitor_heartbeats(conn)
            queue = Queue('job_queue',
                          Exchange('job_queue', type='direct'),
                          routing_key='job_queue')

            logger.info("Starting worker.")

            with conn.Consumer(queue, callbacks=[job_worker]) as consumer:
                consumer.qos(prefetch_count=1)
                for _ in eventloop(conn, timeout=1, ignore_timeouts=True):
                    pass
    except KeyboardInterrupt:
        logger.info("Worker was shut down.")


if __name__ == "__main__":
    main()
I stripped out my domain specific code but essentially this is the framework I use.