Which protocol should I use for pyzmq?


I am working on a project that uses a client-server model in Python. I set up a server to monitor requests and send back data. pyzmq supports tcp, udp, pgm, epgm, inproc and ipc. I have been using tcp for interprocess communication, but I have no idea what I should use for sending a request over the internet to a server. I simply need something to put in:
socket.bind(BIND_ADDRESS)
DIAGRAM: Client Communicating over internet to server running a program

Any particular reason you're not using ipc or inproc for interprocess communication?
Other than that, generally, you can consider tcp the universal communicator; it's not always the best choice, but no matter what (so long as you actually have an IP address) it will work.
Here's what you need to know when making a choice between transports:
PGM/EPGM are multicast transports - the idea is that you send one message and it gets delivered as a single message until the last possible moment where it will be broken up into multiple messages, one for each receiver. Unless you absolutely know you need this, you don't need this.
IPC/Inproc are for interprocess communication... if you're communicating between different threads in the same process, or different processes on the same logical host, then these might be appropriate. You get the benefit of a little less overhead. If you might ever add new logical hosts, this is probably not appropriate.
Russle Borogove enumerates the difference between TCP and UDP well. Typically you'll want to use TCP. Use UDP only if raw speed matters more than reliability.
It was always my understanding that UDP wasn't supported by ZMQ, so if it's there it's probably added by the pyzmq binding.
Also, I took a look at your diagram - you probably want the server ZMQ socket to bind and the client ZMQ socket to connect... there are some reasons why you might reverse this, but as a general rule the server is considered the "reliable" peer, and the client is the "transient" peer, and you want the "reliable" peer to bind, the "transient" peer to connect.
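The bind/connect rule above can be sketched with a REQ/REP pair over tcp. This is only an illustration under some assumptions: the function names are made up for the example, the server binds to loopback on a random free port so it can run on one machine, and a server reachable over the internet would more likely bind to something like "tcp://*:5555" (the port number is hypothetical).

```python
import threading
import zmq


def run_server(ctx, ready, port_holder):
    # The long-lived ("reliable") peer binds.
    sock = ctx.socket(zmq.REP)
    # Loopback plus a random free port keeps the demo self-contained;
    # an internet-facing server would bind to e.g. "tcp://*:5555" (hypothetical port).
    port_holder.append(sock.bind_to_random_port("tcp://127.0.0.1"))
    ready.set()
    request = sock.recv()
    sock.send(b"echo: " + request)
    sock.close()


def request_once(ctx, address, payload):
    # The short-lived ("transient") peer connects.
    sock = ctx.socket(zmq.REQ)
    sock.connect(address)
    sock.send(payload)
    reply = sock.recv()
    sock.close()
    return reply


def demo():
    ctx = zmq.Context()
    ready = threading.Event()
    port_holder = []
    server = threading.Thread(target=run_server, args=(ctx, ready, port_holder))
    server.start()
    ready.wait()  # wait until the server has bound and published its port
    reply = request_once(ctx, "tcp://127.0.0.1:%d" % port_holder[0], b"hello")
    server.join()
    ctx.term()
    return reply
```

Calling demo() runs one request/reply round trip; only the address string changes when the server moves to another host.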

Over the internet, TCP or UDP are the usual choices. I don't know if pyzmq has its own delivery guarantees on top of the transport protocol. If it doesn't, TCP will guarantee in-order delivery of all messages, while UDP may drop messages if the network is congested.
If you don't know what you want, TCP is the simplest and safest choice.

Related

Is it possible to start a TCP socket connection without a handshake?

I am making a SSL server, and I don't use python's library as I want to make some unorthodox changes to the process. Because of that, I cannot simply start a TCP connection since I need to transfer the encryption details with the handshake, which I can't do over sockets. So I am using scapy to make the handshake itself, but after that I would like to continue working with a TCP socket without going through the process of the handshake again. Is that possible?

If I understand your question correctly, you exchanged a few segments using scapy and now want to manufacture a normal full-blown socket out of them.
This is not easily possible: for all practical purposes your TCP is oblivious to whatever you sent in your packets and it doesn't keep any state for this TCP connection: all the state is in your application.
That said, there is a thing called TCP_REPAIR in Linux that lets you put a socket in a given state.
When this option is used, a socket is switched into a special mode, in
which any action performed on it does not result in anything defined
by an appropriate protocol actions, but rather directly puts the
socket into a state, in which the socket is expected to be at the end
of the successfully finished operation.
If you set sequence numbers correctly, the socket should "just work".
One also needs to restore the TCP sequence numbers. To do so, the
TCP_REPAIR_QUEUE and TCP_QUEUE_SEQ options were introduced.
Of course all this is specific to a modern Linux; other operating systems may or may not have similar mechanisms.

Twisted RPC message aggregation

I'm working with a python application that makes remote procedure calls, using Twisted Perspective broker's callRemote, on a TCP connection. From a system call trace, it appears that multiple remote procedure calls from the sender could be aggregated together into a single sendto() call on the socket. The same behavior was observed with the receiver's response as well. I would've thought that as long as the socket was write-able and if there was some data to send, Perspective broker would send it out on the socket. But it does not appear to be the case.
Does Twisted's Perspective Broker aggregate multiple RPC messages together for a specific reason before they are sent on the socket? In other words, does Twisted do something similar to Nagle's algorithm in TCP?
If the above is true, is there an option to turn off this behavior?

Twisted performs write buffering in the underlying twisted.internet.abstract.FileDescriptor object. You can try changing the twisted.internet.abstract.FileDescriptor.SEND_LIMIT attribute to something smaller to force it to write to the socket more frequently.
See the Twisted bug 4089 for discussion about the SEND_LIMIT and bufferSize attributes.

Can pyzmq pub/sub sockets be used bidirectionally?

I'm using a pyzmq pub/sub socket for a server to advertise notifications to client subscribers. It works nicely but I have a question:
Is there any way to use the same socket to send information back to the server? Or do I need a separate socket for that?
Use case: I just want to allow the server to see who's actively subscribing to notifications, so I was hoping I could allow clients to send back periodic "heartbeat" messages. I have a use case where if no clients are listening, I want the server to spawn one. (This is a multiprocess system that uses localhost only.)

You need a separate socket. From the ZMQ guide (http://zguide.zeromq.org/page:all#Pros-and-Cons-of-Pub-Sub):
Killing back-chatter is essential to real scalability. With pub-sub, it's how the pattern can map cleanly to the PGM multicast protocol, which is handled by the network switch. In other words, subscribers don't connect to the publisher at all, they connect to a multicast group on the switch, to which the publisher sends its messages.
In order for this to work, the PUB socket will not send back data to the subscribers (at least not in a way visible to the user). The heartbeating problem was discussed in-depth in the guide: http://zguide.zeromq.org/page:all#The-Asynchronous-Client-Server-Pattern
Also, check out the 7/MDP and 18/MDP protocols (http://rfc.zeromq.org/spec:7 -- this is also discussed in the guide) if you want to keep track of clients.
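One way to read "you need a separate socket" is to keep PUB/SUB for notifications and add a PUSH/PULL pair for heartbeats. The sketch below is an assumption about how that could look, not something taken from the guide: the function name, the client id b"client-1" and the one-second wait are all made up, and both sides run in one process purely so the example is self-contained.

```python
import zmq


def heartbeat_demo():
    ctx = zmq.Context()

    # Server side: PUB carries notifications, PULL collects heartbeats.
    pub = ctx.socket(zmq.PUB)
    pub_port = pub.bind_to_random_port("tcp://127.0.0.1")
    heartbeats = ctx.socket(zmq.PULL)
    hb_port = heartbeats.bind_to_random_port("tcp://127.0.0.1")

    # Client side: SUB receives notifications, PUSH sends heartbeats back.
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sub.connect("tcp://127.0.0.1:%d" % pub_port)
    push = ctx.socket(zmq.PUSH)
    push.connect("tcp://127.0.0.1:%d" % hb_port)

    # The client announces itself (hypothetical client id).
    push.send(b"client-1")

    # The server waits up to one second for a heartbeat.
    alive = set()
    if heartbeats.poll(timeout=1000):
        alive.add(heartbeats.recv())

    for sock in (pub, heartbeats, sub, push):
        sock.close(linger=0)
    ctx.term()
    return alive
```

If the returned set is empty after the wait, the server can treat that as "no clients are listening" and spawn one.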

Multi-threaded UDP server with Python

I want to create a simple video streaming (actually, image streaming) server that can manage different protocols (TCP Push/Pull, UDP Push/Pull/Multicast).
I managed to get TCP Push/Pull working with the SocketServer.TCPServer class and ThreadingMixIn for processing each connected client in a different thread.
But now that I'm working on the UDP protocol, I just realized that ThreadingMixIn creates a thread per call of handle() per client query (as there's no such thing as a "connection" in UDP).
The problem is I need to process a sequence of queries by the same client, for all the clients. How could I manage that?
The only way I can see to handle that is to keep a list of (client address, processing thread) pairs and send each query to the matching thread (or create a new one if the client hasn't sent any query yet). Is there an easier way to do that?
Thanks !
P.S : I can't use any external or too "high-level" library for this as it's a school subject meant to understand how sockets work.

Take a look at Twisted. This will remove the need to do any thread dispatch from your application. You still have to match up packets to a particular session in order to handle them, but this isn't difficult (use a port per client and dispatch based on the port, or require packets in a session to always come from the same address and use the peer address, or use one of the existing protocols that solves this problem such as SIP).
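The peer-address dispatch mentioned above can also be done with the standard library alone, which fits the no-external-library constraint in the question. The following is a rough sketch, assuming one worker thread and one queue per client address; the class name, the reply format and the fixed datagram count that ends the demo are all hypothetical.

```python
import queue
import socket
import threading


class UdpSessionServer:
    """Route each datagram to a worker thread keyed by the sender's address."""

    def __init__(self, host="127.0.0.1", port=0):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind((host, port))
        self.address = self.sock.getsockname()
        self.sessions = {}  # client address -> queue of pending datagrams
        self.workers = []

    def _worker(self, client, inbox):
        # Per-client state lives here; this demo only counts datagrams.
        count = 0
        while True:
            data = inbox.get()
            if data is None:  # sentinel: shut down
                break
            count += 1
            self.sock.sendto(b"%d:%s" % (count, data), client)

    def serve(self, max_datagrams):
        for _ in range(max_datagrams):
            data, client = self.sock.recvfrom(65535)
            inbox = self.sessions.get(client)
            if inbox is None:
                # First datagram from this address: start its worker.
                inbox = self.sessions[client] = queue.Queue()
                worker = threading.Thread(target=self._worker, args=(client, inbox))
                worker.start()
                self.workers.append(worker)
            inbox.put(data)
        for inbox in self.sessions.values():
            inbox.put(None)
        for worker in self.workers:
            worker.join()
        self.sock.close()


def demo():
    server = UdpSessionServer()
    thread = threading.Thread(target=server.serve, args=(4,))
    thread.start()
    clients = {name: socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for name in ("a", "b")}
    replies = {name: [] for name in clients}
    for payload in (b"x", b"y"):
        for name, sock in clients.items():
            sock.settimeout(2.0)
            sock.sendto(payload, server.address)
            replies[name].append(sock.recvfrom(65535)[0])
    thread.join()
    for sock in clients.values():
        sock.close()
    return replies
```

Each client sees its own counter advance independently, which shows that the sequence of queries from one address is handled by the same worker. A real server would also need to expire idle sessions, which this sketch leaves out.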

Python Socket programming (TCP vs. UDP)

I'm planning to design a server that receives data from multiple clients. The server doesn't need to send anything back to the client, though a STATUS_OK reply would be nice but is not necessary.
I know the basics of the Python socket module and the Twisted framework, but my question is: should I use UDP or TCP? The clients don't need to stay connected at all.
I hope you guys understand my question, thank you for your wonderful help here

You should always use TCP until you have a performance problem that you know can be mitigated with UDP. TCP is easier to understand when it fails.

Can you afford to lose messages? If yes, use UDP. Otherwise use TCP. It's what they're designed for.

I would use TCP in your situation, but it's hard to tell what the specifics of your needs are. TCP is in most cases a better protocol because it's much more reliable. Data is very rarely lost in TCP, although this reliability does slow it down a bit. Since you're not sending anything back to the client, the fact that TCP is a streaming protocol shouldn't really matter too much.
So I'd just go with TCP.

For how long will one client be connected to the server? How many concurrent connections are you planning to handle? If there will be very short bursts of data for a lot of clients, then you should go with UDP. But chances are, TCP will do just fine initially.
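For the TCP route most answers suggest, a receiver can be sketched with the standard socketserver module. This is a minimal illustration under assumptions that are not in the question: messages are newline-terminated, every line is acknowledged with STATUS_OK, and the class names and payload are made up.

```python
import socket
import socketserver
import threading


class LineHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One handler thread per connection; read until the client disconnects.
        for line in self.rfile:
            with self.server.lock:
                self.server.received.append(line.rstrip(b"\n"))
            self.wfile.write(b"STATUS_OK\n")  # optional acknowledgement


class ReceiverServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True

    def __init__(self, address):
        super().__init__(address, LineHandler)
        self.received = []  # data collected from all clients
        self.lock = threading.Lock()


def demo():
    server = ReceiverServer(("127.0.0.1", 0))  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    with socket.create_connection(server.server_address) as conn:
        conn.sendall(b"reading 1\n")
        with conn.makefile("rb") as reader:
            ack = reader.readline()
    server.shutdown()
    server.server_close()
    return ack, server.received
```

Clients can connect, send a few lines and disconnect again, so nothing here requires them to stay connected.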
