ZMQ PUB Send file - python

I'm trying (PY)ZMQ for the first time, and wonder if it's possible to send a complete FILE (binary) using PUB/SUB? I need to send database updates to many subscribers. I see examples of short messages but not files. Is it possible?
publisher:
import zmq
import time
import os
import sys

while True:
    print 'loop'
    msg = 'C:\TEMP\personnel.db'

    # Prepare context & publisher
    context = zmq.Context()
    publisher = context.socket(zmq.PUB)
    publisher.bind("tcp://*:2002")
    time.sleep(1)

    curFile = 'C:/TEMP/personnel.db'
    size = os.stat(curFile).st_size
    print 'File size:',size

    target = open(curFile, 'rb')
    file = target.read(size)
    if file:
        publisher.send(file)

    publisher.close()
    context.term()
    target.close()
    time.sleep(10)
subscriber:
'''always listening'''
import zmq
import os
import time
import sys

while True:
    path = 'C:/TEST'
    filename = 'personnel.db'
    destfile = path + '/' + filename

    if os.path.isfile(destfile):
        os.remove(destfile)
        time.sleep(2)

    context = zmq.Context()
    subscriber = context.socket(zmq.SUB)
    subscriber.connect("tcp://127.0.0.1:2002")
    subscriber.setsockopt(zmq.SUBSCRIBE,'')

    msg = subscriber.recv(313344)
    if msg:
        f = open(destfile, 'wb')
        print 'open'
        f.write(msg)
        print 'close\n'
        f.close()

    time.sleep(5)

You should be able to distribute files to many subscribers using zmq and the PUB/SUB pattern.
Your code is almost there: it will probably work in most situations, but it could be improved a bit.
Things to be aware of
Messages are living in memory
The message must fit into memory when it is published (it lives in the PUB socket) and stays there until every currently subscribed consumer has read it or disconnected.
The message must also fit into memory when it is received. With reasonably sized files (like your 313 kB) this should work unless you are really short on RAM.
Slow consumer issue
In case you have multiple consumers and one of them reads much slower than the others, its messages queue up in the publisher's memory, and once the socket's high-water mark is reached, further messages for that subscriber are dropped. The zmq documentation explains this problem and proposes some methods to handle it (e.g. suicide of the slow subscriber).
However, in most situations, you will not encounter this problem.
Start your consumer first not to miss a message
zmq messaging is extremely fast. There is no problem if you start your consumer earlier than the publisher; zmq makes this scenario easy, and the consumer will connect automatically once the publisher appears.
However, your publisher should give consumers time to connect before it starts publishing. Your code sleeps for 1 second before sending the message, which should be sufficient.
Comments to your code
Do you really have to sleep after os.remove? Probably not.
subscriber.recv: there is no need to know the message size in advance. A zmq message carries its own length, so recv() always returns the complete message. Note also that the first positional argument of pyzmq's recv is flags, not a byte count, so call it without the number.
Send large files in chunks
zmq provides a feature called multipart messages, but according to the docs, the whole message (all parts) has to fit in memory before being sent out, so this is not the trick to use.
On the other hand, you can create an "application-level multipart protocol": send messages with a structure like (hasNextPart, chunkData). This way you send messages of a well-controlled size, and only the last one has hasNextPart == False.
The consumer then reads the parts and writes them to disk until the last message arrives, the one saying that no further part follows.
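The (hasNextPart, chunkData) idea can be sketched roughly as follows. This is only an illustration, not part of zmq: the one-byte flag layout, CHUNK_SIZE, iter_frames and receive_file are all made-up names and choices. The helpers take plain file objects and a recv callable, so they work with any transport.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune it for your files and network


def iter_frames(fileobj, chunk_size=CHUNK_SIZE):
    """Yield frames laid out as <1-byte hasNextPart flag><chunk data>."""
    chunk = fileobj.read(chunk_size)
    while True:
        # Read one chunk ahead so we know whether the current one is the last.
        next_chunk = fileobj.read(chunk_size)
        has_next = bool(next_chunk)
        yield (b"\x01" if has_next else b"\x00") + chunk
        if not has_next:
            break
        chunk = next_chunk


def receive_file(recv, out):
    """Call recv() repeatedly and write chunk data to out until the flag is 0."""
    while True:
        frame = recv()
        out.write(frame[1:])
        if frame[:1] == b"\x00":
            break
```

On the publisher side this would be used as `for frame in iter_frames(target): publisher.send(frame)`, and on the subscriber side as `receive_file(subscriber.recv, f)`. A subscriber that connects in the middle of a file would only see the tail of it, so a real protocol would probably also carry a file id and a chunk sequence number.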

Related

Sending object metadata causes FPS drops in the stream

I want to send some object metadata (class_id, confidence value, etc.) to another PC when an object is detected, but it causes FPS drops and the stream freezes. Which parallel programming technique should I use to solve it? Can you give me an example?
Checking whether the detected object is in class_dict:
if obj_meta.class_id in class_dict:
    send_one(obj_meta.class_id)
I am using this function to send the class_id message.
from __future__ import print_function
import can

def send_one(class_id):
    bus = can.interface.Bus()
    bus = can.interface.Bus(bustype='socketcan', channel='vcan0', bitrate=250000)
    msg = can.Message(arbitration_id=0xc0ffee,
                      data=[class_id],
                      is_extended_id=True)
    try:
        bus.send(msg)
        print("Message sent on {}".format(bus.channel_info))
    except can.CanError:
        print("Message NOT sent")
I am not sure what your use case is, but I would recommend having a look at msgbroker (a DS plugin) for message passing between applications.
A little bit more code would help, but I'm assuming you are (were?) doing the check inside a GStreamer buffer probe. A buffer probe blocks the buffer from moving downstream, so no new buffers arrive until the probe returns, and any slow work done in the probe, such as send_one, stalls the stream.
A: using an external service: use the msgbroker element to produce messages and inject them into another service (e.g. RabbitMQ, Kafka). See reference implementation here. Then use a service-specific consumer to process the data (and call your send_one).
B: from Python: extract the metadata as quickly as possible in the probe, and then process it outside the probe.
from queue import Queue, Empty
from threading import Thread

q = Queue()
...
# in buffer probe:
if obj_meta.class_id in class_dict:
    q.put(obj_meta.class_id)
...
def consume():
    while True:
        try:
            data = q.get(block=True, timeout=1)
        except Empty:
            continue  # nothing queued yet; wait again
        send_one(data)  # assumed: the slow send from the question, now outside the probe
...
consumer = Thread(target=consume)
consumer.start()
You could improve on this, e.g. by reading in batches, running multiple consumer threads, etc.
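Reading in batches could look roughly like the sketch below. It is a minimal illustration, not DeepStream API: the STOP sentinel, max_batch and handle_batch are hypothetical names. In the probe scenario handle_batch would open the CAN bus once and send every id in the batch.

```python
from queue import Queue, Empty
from threading import Thread

STOP = object()  # hypothetical sentinel that tells the consumer to exit


def consume(q, handle_batch, max_batch=32, timeout=1.0):
    """Drain q in batches and pass each batch to handle_batch until STOP arrives."""
    while True:
        try:
            item = q.get(timeout=timeout)  # block until at least one item exists
        except Empty:
            continue
        batch = []
        stop = False
        while True:
            if item is STOP:
                stop = True
                break
            batch.append(item)
            if len(batch) >= max_batch:
                break
            try:
                item = q.get_nowait()  # grab whatever else is already queued
            except Empty:
                break
        if batch:
            handle_batch(batch)
        if stop:
            return
```

The probe keeps calling `q.put(obj_meta.class_id)` as before; putting STOP on the queue at shutdown lets the thread finish instead of looping forever.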

Python Twisted multithreaded TCP proxy

I am trying to write a TCP proxy using Python's Twisted framework. I started with Twisted's port forward example and it seems to do the job in a standard scenario. The problem is that I have a rather peculiar scenario. What we need to do is process each TCP data packet and look for a certain pattern.
In case the pattern matches we need to run a certain process. This process takes anywhere between 30 and 40 seconds (I know it's not a good design, but currently that's how things stand). The trouble is that once this process starts, all other packets are held up until it completes. So if there are 100 live connections and even 1 of them calls the process, the remaining 99 connections are stuck.
Is there a standard 'twisted' way wherein each connection/session is handled in a separate thread, so that the 'blocking process' does not interfere with the other live connections?
Example Code:
from time import sleep

from twisted.internet import reactor
from twisted.protocols import portforward
from twisted.internet import threads

def processingOperation(data):
    # doing the processing operation here
    sleep(30)
    return data

def server_dataReceived(self, data):
    if data.find("pattern we need to test") != -1:
        data = processingOperation(data)
    portforward.Proxy.dataReceived(self, data)

portforward.ProxyServer.dataReceived = server_dataReceived

def client_dataReceived(self, data):
    portforward.Proxy.dataReceived(self, data)

portforward.ProxyClient.dataReceived = client_dataReceived

reactor.listenTCP(8383, portforward.ProxyFactory('xxx.yyy.uuu.iii', 80))
reactor.run()
Of course there is. You defer the processing to a thread. For example:
def render_POST(self, request):
    # some code you may have to run before processing
    d = threads.deferToThread(method_that_does_the_processing, request)
    return ''
There is a catch: this returns before the processing is done, and the client gets the answer back immediately. So you might want to return 202/Accepted instead of 200/OK (or my dummy '').
If you need to return after the processing is complete, you can use an inline callback (http://twistedmatrix.com/documents/10.2.0/api/twisted.internet.defer.inlineCallbacks.html).
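Applied to the port-forwarding proxy from the question, deferToThread might be used roughly like this. Treat it as a sketch under assumptions: install_threaded_handler and the delay parameter are made-up names, and pausing the transport while the job runs is one possible way to keep packets of the same connection in order.

```python
import time

PATTERN = b"pattern we need to test"


def processing_operation(data, delay=30):
    """Stand-in for the slow, blocking job from the question."""
    time.sleep(delay)
    return data


def install_threaded_handler(pattern=PATTERN):
    """Patch ProxyServer so the slow job runs in the reactor's thread pool."""
    # Imported here so the helper above stays usable without Twisted installed.
    from twisted.internet import threads
    from twisted.protocols import portforward

    def server_data_received(self, data):
        if pattern not in data:
            portforward.Proxy.dataReceived(self, data)
            return
        # Stop reading from this connection only, so later packets cannot
        # overtake the one being processed. Other connections keep flowing.
        self.transport.pauseProducing()

        def forward(result):
            portforward.Proxy.dataReceived(self, result)
            self.transport.resumeProducing()

        d = threads.deferToThread(processing_operation, data)
        d.addCallback(forward)
        # If the job fails, drop the connection rather than leave it paused.
        d.addErrback(lambda failure: self.transport.loseConnection())

    portforward.ProxyServer.dataReceived = server_data_received
```

Call `install_threaded_handler()` before `reactor.listenTCP(...)`. The reactor thread is never blocked, so the other 99 connections keep being served while one job runs.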

Python: How to trigger multiple process at same instant

I am trying to run a process that does an HTTP POST, which in turn sends an alert to a server (the time taken to send an alert is in nanoseconds). I am trying to test the capacity of the server to handle alerts at millisecond intervals. According to the given standard, the server should handle 6000 alerts/second.
I created a piece of code using the multiprocessing module which sends 6000 alerts, but I am using a for loop, and executing the loop takes more than a second. Hence the 6000 processes are not all triggered at the SAME INSTANT.
Is there a way to trigger multiple (N) processes at the same instant?
This is my code: flowtesting.py which is a library. And this is followed by my script after '####'
import json
import httplib2

class flowTesting():
    def __init__(self, companyId, deviceIp):
        self.companyId = companyId
        self.deviceIp = deviceIp

    def generate_savedSearchName(self, randNum):
        self.randMsgId = randNum
        self.savedSearchName = "TEST %s risk31 more than 3" % self.randMsgId

    def def_request_body_dict(self):
        self.reqBody_dict = {
            "Header": {"agid": "Agent1",
                       "mid": self.randMsgId,
                       "ts": 1253125001
                       },
            "mp": {
                "host": self.deviceIp,
                "index": self.companyId,
                "savedSearchName": self.savedSearchName,
            }
        }
        self.req_body = json.dumps(self.reqBody_dict)

    def get_default_hdrs(self):
        self.hdrs = {'Content-type': 'application/json',
                     'Accept-Language': 'en-US,en;q=0.8'}

    def send_request(self, sIp, method="POST"):
        self.sIp = sIp
        self.url = "http://%s:8080/agent/splunk/messages" % self.sIp
        http_cli = httplib2.Http(timeout=180, disable_ssl_certificate_validation=True)
        rsp, rsp_body = http_cli.request(uri=self.url, method=method, headers=self.hdrs, body=self.req_body)
        print "rsp: %s and rsp_body: %s" % (rsp, rsp_body)
# My testScript
from flowTesting import flowTesting
import random
import multiprocessing

deviceIp = "10.31.421.35"
companyId = "CPY0000909"
noMsgToBeSent = 1000
sIp = "10.31.44.235"
uniq_msg_id_list = random.sample(xrange(1,10000), noMsgToBeSent)

def runner(companyId, deviceIp, uniq_msg_id):
    proc = flowTesting(companyId, deviceIp)
    proc.generate_savedSearchName(uniq_msg_id)
    proc.def_request_body_dict()
    proc.get_default_hdrs()
    proc.send_request(sIp)

process_list = []
for uniq_msg_id in uniq_msg_id_list:
    savedSearchName = "TEST-1000 %s risk31 more than 3" % uniq_msg_id
    process = multiprocessing.Process(target=runner, args=(companyId,deviceIp,uniq_msg_id,))
    process.start()
    process.join()
    process_list.append(process)
print "Process list: %s" % process_list
print "Unique Message Id: %s" % uniq_msg_id_list
Making them all happen in the same instant is obviously impossible—unless you have a 6000-core machine and an OS kernel whose scheduler is able to handle them all perfectly (which you don't), you can't get 6000 pieces of code running at once.
And, even if you did, what they're all trying to do is to send a message on a socket. Even if your kernel was that insanely parallel, unless you have 6000 separate NICs, they're going to end up serialized in the NIC buffer. That's the way IP works: one packet after another. And of course there are all the routers on the path, the server's NIC, the server's OS, etc. And even if IP doesn't get in the way, bytes take time to transfer over a cable. So the only way to do this at the same instant, even in theory, would be to have 6000 NICs on each side and wire them up directly to each other with identical fiber.
However, you don't really need them in the same instant, just closer to each other than they are. You didn't show us your code, but presumably you're just starting 6000 Processes that all immediately try to send a message. That means you're including the process startup time—which can be pretty slow (especially on Windows)—in the skew time.
You can reduce that by using threads instead of processes. That may seem counterintuitive, but Python is pretty good at handling I/O-bound threads, and every modern OS is very good at starting new threads.
But really, what you need is a Barrier on your threads or processes, to let all of them complete all the setup work (including process startup) before any of them try to do any work.
It still probably won't be tight enough, but it will be a lot tighter than you probably have right now.
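A barrier on threads could be sketched like this. It assumes Python 3.2 or later, where threading.Barrier exists (the question's code is Python 2); run_with_barrier and the work callable are illustrative names, and work stands in for the actual HTTP POST.

```python
import threading
import time


def run_with_barrier(n_workers, work):
    """Start n_workers threads, release them together, return their start times."""
    barrier = threading.Barrier(n_workers)
    start_times = [None] * n_workers

    def worker(index):
        # ... per-worker setup (build the request, open the connection) goes here ...
        barrier.wait()  # blocks until all n_workers threads have arrived
        start_times[index] = time.perf_counter()
        work(index)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return start_times
```

The spread `max(start_times) - min(start_times)` gives a rough idea of how tight the release actually was on a given machine.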
The next limit you're going to face is context-switching time. Modern OSs are pretty good at scheduling, but not 6000-simultaneous-tasks good. So really, you want to reduce this to N processes, each one just spamming 6000/N connections sequentially as fast as possible. That will get them into the kernel/NIC much faster than trying to do 6000 at once and making the OS do the serialization for you. (In fact, on some platforms, depending on your hardware, you might actually be better off with one process doing 6000 in a row than N doing 6000/N. Test it both ways.)
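Splitting the work into N sequential slices might look like the following sketch. It uses a thread pool rather than processes (a deliberate swap, in line with the earlier remark that threads start faster), and split_evenly, send_all and the send callable are hypothetical names; send stands in for one HTTP POST.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, n_workers):
    """Split items into n_workers contiguous slices whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_workers)
    slices, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def send_all(message_ids, send, n_workers=4):
    """Run send(message_id) for every id, one sequential slice per worker."""
    def run_slice(ids):
        for message_id in ids:
            send(message_id)
        return len(ids)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(run_slice, split_evenly(message_ids, n_workers)))
```

Trying `n_workers=1` against a few larger values is a cheap way to test both arrangements mentioned above.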
There's still some overhead for the socket library itself. To get around that, you want to pre-craft all of the IP packets, then create a single raw socket and spam those packets. Send the first packet from each connection, then the second packet from each connection, etc.
You need to use an inter-process synchronization primitive. On Linux you would use a Sys-V semaphore, on Windows you would use a Win32 event.
Your 6000 processes would wait on this semaphore/event, and from a different process you would trigger it, thus releasing all your 6000 processes from their waiting state to a ready state, and then the OS would start executing them as quickly as possible.

Multiple ipc publishers and one subscriber using python-zmq

I'm wondering if it is possible to set up multiple ipc publishers for one subscriber using zmq ipc...
In the abstract I have only one publisher like this, but I need to run multiple instances of it, each collecting a different type of data but publishing in the same format every time.
context = zmq.Context()
publisher = context.socket(zmq.PUB)
publisher.connect("ipc://VCserver")
myjson = json.dumps(worker.data)
publisher.send(myjson)
My subscriber:
context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.bind("ipc://VCserver")
subscriber.setsockopt(zmq.SUBSCRIBE, '')
while True:
    response = subscriber.recv()
    if response:
        data = json.loads(response)
        check_and_store(data)
My subscriber checks the same parameters in the data every time and stores them in a db. I do not know if this is possible, as this mode of communication uses a shared file, and maybe I should think in terms of publisher-subscriber pairs for every instance...
EDITED: Every publisher will send approx. 2 kB, 100 times/sec
You can definitely have multiple publishers, the only restriction is that you cannot have multiple binders on one IPC socket - each successive bind simply clobbers previous ones (as opposed to TCP, where you will get EADDRINUSE if you try to bind to an already-in-use interface). Your case should work fine.

Sending data through a socket from another thread does not work in Python

This is my 'game server'. It's nothing serious, I thought this was a nice way of learning a few things about python and sockets.
First the Server class initializes the server.
Then, when someone connects, we create a client thread. In this thread we continually listen on our socket.
Once a certain command comes in (I12345001001, for example) it spawns another thread.
The purpose of this last thread is to send updates to the client.
But even though I see the server is performing this code, the data isn't actually being sent.
Could anyone tell where it's going wrong?
It's like I have to receive something before I'm able to send. So I guess somewhere I'm missing something
#!/usr/bin/env python

"""
An echo server that uses threads to handle multiple clients at a time.
Entering any line of input at the terminal will exit the server.
"""

import select
import socket
import sys
import threading
import time
import Queue

globuser = {}
queue = Queue.Queue()

class Server:
    def __init__(self):
        self.host = ''
        self.port = 2000
        self.backlog = 5
        self.size = 1024
        self.server = None
        self.threads = []

    def open_socket(self):
        try:
            self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.server.bind((self.host,self.port))
            self.server.listen(5)
        except socket.error, (value,message):
            if self.server:
                self.server.close()
            print "Could not open socket: " + message
            sys.exit(1)

    def run(self):
        self.open_socket()
        input = [self.server,sys.stdin]
        running = 1
        while running:
            inputready,outputready,exceptready = select.select(input,[],[])

            for s in inputready:

                if s == self.server:
                    # handle the server socket
                    c = Client(self.server.accept(), queue)
                    c.start()
                    self.threads.append(c)

                elif s == sys.stdin:
                    # handle standard input
                    junk = sys.stdin.readline()
                    running = 0

        # close all threads
        self.server.close()
        for c in self.threads:
            c.join()

class Client(threading.Thread):
    initialized=0

    def __init__(self,(client,address), queue):
        threading.Thread.__init__(self)
        self.client = client
        self.address = address
        self.size = 1024
        self.queue = queue
        print 'Client thread created!'

    def run(self):
        running = 10
        isdata2=0
        receivedonce=0

        while running > 0:

            if receivedonce == 0:
                print 'Wait for initialisation message'
                data = self.client.recv(self.size)
                receivedonce = 1

            if self.queue.empty():
                print 'Queue is empty'
            else:
                print 'Queue has information'
                data2 = self.queue.get(1, 1)
                isdata2 = 1
                if data2 == 'Exit':
                    running = 0
                    print 'Client is being closed'
                    self.client.close()

            if data:
                print 'Data received through socket! First char: "' + data[0] + '"'
                if data[0] == 'I':
                    print 'Initializing user'
                    user = {'uid': data[1:6] ,'x': data[6:9], 'y': data[9:12]}
                    globuser[user['uid']] = user
                    print globuser
                    initialized=1
                    self.client.send('Beginning - Initialized'+';')
                    m=updateClient(user['uid'], queue)
                    m.start()
                else:
                    print 'Reset receivedonce'
                    receivedonce = 0

                print 'Sending client data'
                self.client.send('Feedback: ' +data+';')
                print 'Client Data sent: ' + data

            data=None

            if isdata2 == 1:
                print 'Data2 received: ' + data2
                self.client.sendall(data2)
                self.queue.task_done()
                isdata2 = 0

            time.sleep(1)
            running = running - 1

        print 'Client has stopped'

class updateClient(threading.Thread):

    def __init__(self,uid, queue):
        threading.Thread.__init__(self)
        self.uid = uid
        self.queue = queue
        global globuser
        print 'updateClient thread started!'

    def run(self):
        running = 20
        test=0
        while running > 0:
            test = test + 1
            self.queue.put('Test Queue Data #' + str(test))
            running = running - 1
            time.sleep(1)

        print 'Updateclient has stopped'

if __name__ == "__main__":
    s = Server()
    s.run()
I don't understand your logic -- in particular, why you deliberately set up two threads writing at the same time on the same socket (which they both call self.client), without any synchronization or coordination, an arrangement that seems guaranteed to cause problems.
Anyway, a definite bug in your code is your use of the send method -- you appear to believe that it guarantees to send all of its argument string, but that's very definitely not the case, see the docs:
"Returns the number of bytes sent. Applications are responsible for checking that all data has been sent; if only some of the data was transmitted, the application needs to attempt delivery of the remaining data."
sendall is the method that you probably want:
"Unlike send(), this method continues to send data from string until either all data has been sent or an error occurs."
Other problems include the fact that updateClient is apparently designed to never terminate (differently from the other two thread classes -- when those terminate, updateClient instances won't, and they'll just keep running and keep the process alive), redundant global statements (innocuous, just confusing), some threads trying to read a dict (via the iteritems method) while other threads are changing it, again without any locking or coordination, etc, etc -- I'm sure there may be even more bugs or problems, but, after spotting several, one's eyes tend to start to glaze over;-).
You have three major problems. The first problem is likely the answer to your question.
Blocking (Question Problem)
The socket.recv is blocking. This means that execution is halted and the thread goes to sleep until it can read data from the socket. So your third update thread just fills the queue up but it only gets emptied when you get a message. The queue is also emptied by one message at a time.
This is likely why it will not send data unless you send it data.
Message Protocol On Stream Protocol
You are trying to use the socket stream like a message stream. What I mean is you have:
self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
The SOCK_STREAM part says it is a byte stream, not a message-oriented socket such as SOCK_DGRAM. TCP has no notion of message boundaries, so you have to build messages yourself, for example by prefixing each one with its length:
data = struct.pack('!I', len(msg)) + msg  # 4-byte length prefix, network byte order
sock.sendall(data)  # sock is the connected socket object
The receiving end then looks for the length field and reads the data into a buffer. Once enough data is in the buffer, it can pull out the entire message.
Your current setup is working because your messages are small enough to all be placed into the same packet and also placed into the socket buffer together. However, once you start sending large data over multiple calls with socket.send or socket.sendall you are going to start having multiple messages and partial messages being read unless you implement a message protocol on top of the socket byte stream.
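Both ends of such a length-prefixed protocol could be sketched as below. The helper names send_msg, recv_exact and recv_msg are made up for illustration, and the 4-byte network-order header is just one reasonable choice that both sides must agree on.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def send_msg(sock, payload):
    """Send one message: length header followed by the payload bytes."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Read one complete message, however the stream happened to be split."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

With this in place, several send_msg calls in a row arrive as separate messages on the other side, even if TCP delivers them in one packet or splits one across several.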
Threads
Threads can be easier to use when starting out, but they come with a lot of problems and can degrade performance if used incorrectly, especially in Python. I love threads, so do not get me wrong. Python has the GIL (global interpreter lock), so you get bad performance from threads that are CPU bound. Your code is mostly I/O bound at the moment, but in the future it may become CPU bound. You also have to worry about locking when you use threads. A thread can be a quick fix but may not be the best fix. There are circumstances where threading is quite simply the easiest way to break up some time-consuming process, so do not discard threads as evil or terrible. In Python they are considered bad mainly because of the GIL, and in every language because of concurrency issues, so most people recommend using multiple processes or asynchronous code in Python. Whether to use a thread is a complex question: it depends on the language (the way your code is run), the system (single or multiple processors), contention (trying to share a resource with locking), and other factors. Generally, though, asynchronous code is faster because it does the same work with less overhead, especially if you are not CPU bound.
The solution is the usage of the select module in Python, or something similar. It will tell you when a socket has data to be read, and you can set a timeout parameter.
You can gain more performance by doing asynchronous work (non-blocking sockets). To make a socket non-blocking you simply call socket.settimeout(0). However, if you then poll it in a loop you will constantly eat CPU spinning while waiting for data. The select module and friends prevent that spinning.
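A minimal sketch of waiting with select and a timeout is shown below; recv_if_ready is an illustrative helper name, not something from the standard library.

```python
import select
import socket


def recv_if_ready(sock, timeout, bufsize=1024):
    """Return data if sock becomes readable within timeout seconds, else None."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out; the caller is free to do other work
    return sock.recv(bufsize)  # returns b"" if the peer closed the connection
```

The thread sleeps inside select instead of spinning, and it regains control after at most timeout seconds even when the client sends nothing.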
Generally, for performance you want to do as much as possible asynchronously (in the same thread), and then expand with more threads that also do as much as possible asynchronously. However, as previously noted, Python is an exception to this rule because of the GIL (global interpreter lock), which from what I have read can actually degrade performance. If you are interested, you should try writing a test case and find out!
You should also check out the thread locking primitives in the threading module. They are Lock, RLock, and Condition. They can help multiple threads share data without problems.
lock = threading.Lock()

def myfunction(arg):
    with lock:
        arg.do_something()
Some Python objects are thread safe and others are not.
Sending Updates Asynchronously (Improvement)
Instead of using a third thread only to send updates, you could use the client thread to send them by comparing the current time with the last time an update was sent. This would remove the need for a Queue and a Thread. To do this you must convert your client code into asynchronous code and put a timeout on your select, so that you can check the current time at intervals to see if an update is needed.
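The timer-in-the-select-loop idea might be sketched as follows. It is an illustration only: serve_client, make_update and interval are made-up names, and max_updates exists just so the example terminates.

```python
import select
import socket
import time


def serve_client(sock, make_update, interval=1.0, max_updates=3):
    """Answer incoming data and send make_update() every interval seconds."""
    next_update = time.monotonic() + interval
    sent = 0
    while sent < max_updates:
        # Wait for data, but never longer than the time left until the next update.
        timeout = max(0.0, next_update - time.monotonic())
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            data = sock.recv(1024)
            if not data:
                break  # peer closed the connection
            sock.sendall(b"Feedback: " + data + b";")
        if time.monotonic() >= next_update:
            sock.sendall(make_update())
            sent += 1
            next_update += interval
    return sent
```

Only one thread ever writes to the socket here, so the unsynchronized concurrent sends from the original design disappear along with the queue.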
Summary
I would recommend you rewrite your code using asynchronous socket code. I would also recommend that you use a single thread for all clients and the server. This will improve performance and decrease latency. It would make debugging easier because you would have no possible concurrency issues like you have with threads. Also, fix your message protocol before it fails.
