Twisted Python: multicast server not working as expected

Twisted Python: multicast server not working as expected - python

I am experimenting with twisted python's multicast protocol. This is a simple example:
I created two servers, listening on 224.0.0.1 and 224.0.0.2 like below:
from twisted.internet.protocol import DatagramProtocol
from twisted.internet import reactor
from twisted.application.internet import MulticastServer
class MulticastServerUDP(DatagramProtocol):
def __init__ (self, group, name):
self.group = group
self.name = name
def startProtocol(self):
print '%s Started Listening' % self.group
# Join a specific multicast group, which is the IP we will respond to
self.transport.joinGroup(self.group)
def datagramReceived(self, datagram, address):
print "%s Received:"%self.name + repr(datagram) + repr(address)
reactor.listenMulticast(10222, MulticastServerUDP('224.0.0.1', 'SERVER1'), listenMultiple = True)
reactor.listenMulticast(10222, MulticastServerUDP('224.0.0.1', 'SERVER2'), listenMultiple = True)
reactor.run()
Then I run this code to send "HELLO":
import socket
MCAST_GRP = '224.0.0.1'
MCAST_PORT = 10222
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
sock.sendto("HELLO", (MCAST_GRP, MCAST_PORT))
The results were quite confusing. There are several cases:
-When I set all group IP and MCAST_GRP to 224.0.0.1, both servers received the message (expected)
-When I set both servers' group IP to 224.0.0.1 and MCAST_GRP in the sending script to 224.0.0.2 (or something different from 224.0.0.1), both servers did not receive the message (expected)
-When I set one server's group IP to 224.0.0.1 and the other 224.0.0.2, strange things happen. When I set MCAST_GRP to 224.0.0.1 or 224.0.0.2, I expected only ONE of the two servers to receive the message. The result was that BOTH servers received the message. I am not sure what is going on. Can someone explain this?
Note: I am running these on the same machine.
SL

It's a little tricky, indeed.
You must write it this way:
reactor.listenMulticast(
10222,
MulticastServerUDP('224.0.0.1', 'SERVER1'),
listenMultiple=True,
interface='224.0.0.1'
)
reactor.listenMulticast(
10222,
MulticastServerUDP('224.0.0.2', 'SERVER2'),
listenMultiple=True,
interface='224.0.0.2'
)
I had the same problem before. I had to look at the source to find it out.
But I solved it, because of my background in network programming in C.

Multicast is wacky and platform (Linux, Windows, OS X, etc) implementations of multicast are even wackier.
Twisted is just reflecting the platform's multicast behavior here, so this is only vaguely a Twisted-related question. Really, it's a multicast question and a platform question.
Here's a slightly educated guess as to what's going on.
Multicast works by having hosts subscribe to addresses. When a program running on the host joins a group (eg 224.0.0.1), the host makes a note of this locally and does some network operations (IGMP) to tell nearby hosts (probably via a router, but I'm fuzzy on the details of this part) that it is now interested in messages for that group.
In the ideal universe of the creators of multicast, that subscription propagates all the way through the internet. This is necessary so that whenever anyone anywhere on the internet sends a message to that group, whichever routers get their hands on it can deliver it to all the hosts that have subscribed to the group. This is supposed to be more efficient than broadcast because only hosts that have subscribed need the message delivered to them. Since routers are tracking subscriptions, they can skip sending the traffic down links that have no subscribed hosts.
In the real universe, multicast subscriptions usually don't get propagated very far (eg, they reach the first router, probably the one running your house LAN, and stop there).
So it may seem like all that information about the ideal universe is irrelevant to this scenario. However! My suspicion is that most of the people implementing multicast thought really, really hard about that first part and were pretty tired by the time they were done implementing it.
Once a message for a multicast group actually gets to a host, the host needs to deliver it to the programs that are actually interested in it. Here, I suspect, implementers were too tired to do the right thing. Instead, they did a variety of lazy, easy things (depending on your platform). For example, some of them just visited every single open socket on the system that was subscribed to a multicast group and delivered the message to them.
On other platforms, you'll sometimes find that a single multicast message is delivered to a single listening multicast socket more than once. And of course there's the popular issue of multicast messages never being delivered at all.
Enjoy your wacky times in multicast land!

Related

How does a Python listening socket get setup?

When you setup a simple TCP listening socket using the Python 'socket' module, what are the different steps involved doing?
The code I'm talking about looks like this:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('localhost', 50000))
s.listen(1)
conn, addr = s.accept()
The s = ... seems pretty straightforward - you are expressing your intent to create an ipv4 TCP socket, without having done anything yet.
What I'm curious about is this:
What does it mean to bind to a socket, without listening?
How does limiting the number of unaccepted connections using listen(n) work?
If you have listen(1), you're in the middle of dealing with the first connection you accepted, and a second client tries to connect, is the second client waiting for the SYN-ACK? Or does the 3 way handshake happen, and he's waiting for actual data?
What happens if a third client tries to connect - does he immediately get a TCP RST?
Does setting the number of unaccepted connections here set some option in the kernel to indicate how many connections it should accept? Or is this all handled in Python?
How can you be listening without accepting? What does it mean to accept a connection?
Every article I've come across seems to just assume these steps make sense to everyone, without explaining what exactly it is that each one does. They just use generic terms like
listen() starts listening for connections
bind() binds to a socket
accept() just accepts the connection
Defining a word by using that word in the definition is kind of a dumb way to explain something.

it's basically a 1-to-1 from the POSIX c calls and as such I'm including links to the man pages, so that you can read their explanation and corresponding c code:
socket creates a communication endpoint by means of a file-descriptor in the namespace of the address-family you specified but assigns neither address nor port.
bind assigns an address and port to said socket, a port which may be chosen randomly if you request a port for which you do not have the privilige. (like < 1024 for non-root user)
listen makes the specific socket and hence address and port a passive one, meaning that it will accept incoming connections with the accept call. To handle multiple connections one after the other, you get to specify a backlog containing them, connections that arrive while you're handling one get appended. Once the backlog is full, the system will respond as such to those systems with an approach that makes them reconnect by withholding SYN, withholding ACK response etc..
As usual you can find someone explaining the previous to you a lot better.
accept then creates a new non-listening socket associated with a new file descriptor that you then use for communication with said connecting party.
accept also works as a director for your flow of execution, effectively blocking further progress until a connection is actually available in the queue for it to take, like a spinlock. The only way around that is to declare the socket non-blocking in which case it would return immediately with an error.

How do I connect two machines (with two different public IPs) via sockets using Python? Is it possible?

I'm new to the world of networking and wanted to clarify some of my thoughts regarding a problem I am facing at the moment. I saw this post which makes me think what I'm doing may be impossible, but I thought it would be worth a shot to ask on here and see what more qualified people think about it.
I am a TA for an intro computer science course, and I am writing a final project for students to complete at the end of the semester. Essentially, the project would be to fill in the holes in the implementation of a messaging client. I have set it up so each client would run two threads (one to listen for incoming messages, and one to wait for input to send messages to the other client). I have gotten this to work successfully on localhost with communication between two different port numbers, and am trying to find a way to have this work over the network so the two clients do not necessarily have to be on the same machine.
After struggling through a few methods, I came up with this solution: I would host a server on Heroku that would keep track of the clients' IPs and port numbers, and use a rest API so that one client could easily get the IP and port of the other client they are trying to communicate with. I have tested this, and the API seems to work. Thus, a client can create a socket endpoint and send it to this server to be entered into its database, and when the communication is terminated, it is removed from the database (this JSON would store a username as the primary key and internally manage an IP and port number) as the connection is now closed.
So, what I have is each client with an IP and port number knowing the IP and port number it is trying to communicate with. My last struggle is to actually form the connection. I understand there is a distinction between localhost (127.0.0.1) and the public IP for an internet endpoint. Upon searching, I found a way to find the public IP for the current user to share with the database, but I cannot bind to it. Whenever I try to, I get sockets error code 13: permission denied. I would imagine that if I tried connecting to the public IP of the other machine, I would get a similar error (but I cannot test the client until I can get a server running!).
I read online that some router work would be needed to actually form this connection between two machines. I guess I'm struggling to understand the practicality of socket programming if such a simple operation (connecting two socket endpoints on two different computers) requires so much tweaking. Is there something I am missing?
For reference, here is a general outline of my code thus far. The server:
# Server thread
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((LOCAL_IP, AVAILABLE_PORT))
s.listen(1)
# In my code, there is a quitting mechanism which closes s as well.
while True:
client_socket, addr = s.accept()
data = client_socket.recv(1024)
print "Received: " + data
client_socket.close()
...and the client:
# Client thread
# It is an infinite loop so I am always waiting for another potential message to send
while True:
x = raw_input()
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((OTHER_MACHINE_LOCAL_IP, OTHER_PORT))
sock.sendall(x)
sock.close()
Right now, I cannot make progress with these permission denied errors. Is there any way I could get this to work? Understand that, being that this would be for around 250ish students to use who are all intro CS students, I would want to avoid having to instruct them to do anything with their routers.
If there is another method to do this which would make this easier that I am missing, I would also love to hear any suggestions :) Thanks in advance!

Which protocol should I use for pyzmq?

I am working on a project where I have a client server model in python. I set up a server to monitor requests and send back data. PYZMQ supports: tcp, udp, pgm, epgm, inproc and ipc. I have been using tcp for interprocess communication, but have no idea what i should use for sending a request over the internet to a server. I simply need something to put in:
socket.bind(BIND_ADDRESS)
DIAGRAM: Client Communicating over internet to server running a program

Any particular reason you're not using ipc or inproc for interprocess communication?
Other than that, generally, you can consider tcp the universal communicator; it's not always the best choice, but no matter what (so long as you actually have an IP address) it will work.
Here's what you need to know when making a choice between transports:
PGM/EPGM are multicast transports - the idea is that you send one message and it gets delivered as a single message until the last possible moment where it will be broken up into multiple messages, one for each receiver. Unless you absolutely know you need this, you don't need this.
IPC/Inproc are for interprocess communication... if you're communicating between different threads in the same process, or different processes on the same logical host, then these might be appropriate. You get the benefit of a little less overhead. If you might ever add new logical hosts, this is probably not appropriate.
Russle Borogove enumerates the difference between TCP and UDP well. Typically you'll want to use TCP. Only if absolute speed is more important than reliability then you'll use UDP.
It was always my understanding that UDP wasn't supported by ZMQ, so if it's there it's probably added by the pyzmq binding.
Also, I took a look at your diagram - you probably want the server ZMQ socket to bind and the client ZMQ socket to connect... there are some reasons why you might reverse this, but as a general rule the server is considered the "reliable" peer, and the client is the "transient" peer, and you want the "reliable" peer to bind, the "transient" peer to connect.

Over the internet, TCP or UDP are the usual choices. I don't know if pyzmq has its own delivery guarantees on top of the transport protocol. If it doesn't, TCP will guarantee in-order delivery of all messages, while UDP may drop messages if the network is congested.
If you don't know what you want, TCP is the simplest and safest choice.

Is it possible to have a socket listening to two UDP IPs, one that is 127.0.0.1 (same machine) and a different computer at the same time?

An eye-tracking application I use utilizes UDP to send packets of data. I made a python socket on the same computer to listen and dump the data into a .txt file. I already have this much working.
A separate application also written in python (what the eye-tracked subject is seeing) is running on a separate computer. Because the eye-tracking application is continuous and sends unnecessary data, so far I've had to manually parse out the instances when the subject is looking at desired stimuli. I did this based on a manually synchronized start of both the stimuli and eye-tracking applications and then digging through the log file.
What I want to do is have the second computer act as a second UDP client, sending a packet of data to the socket on the eye-tracking computer everytime the subject is looking at stimuli (where a marker is inserted into the .txt file previously mentioned). Is it possible to have a socket listening to two IP addresses at one time?
Here's my socket script:
#GT Pocket client program
import datetime
import socket
now = datetime.datetime.now()
filename = 'C:\gazelog_' + now.strftime("%Y_%m_%d_%H_%M") + '.txt'
UDP_IP = '127.0.0.1' # The remote host (in this case our local computer)
UDP_PORT = 6666 # The same port as used by the GT server by default
sock = socket.socket(socket.AF_INET, #internet
socket.SOCK_DGRAM) #UDP
sock.bind( (UDP_IP, UDP_PORT) )
while True:
data, addr = sock.recvfrom( 1024) #I assume buffer size is 1024 bytes.
print "Received Message:", data
with open(filename, "a") as myfile:
myfile.write(str(data + "\n"))
sock.close()
myfile.close()
EDIT:
#abarnert I was able to bind to the host address on the Ethernet interface and send a message from computer B to computer A, but computer A was no long able to receive packets from itself. When I specified UDP_IP = '0.0.0.0' computer B was no longer able to send data across the Ethernet. When I specified UDP_IP = '' I received the `error: [Errno 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
This have to do with the script I used on the Computer B to send the data:
import socket
UDP_IP = "169.254.35.231" # this was the host address I was able to send through.
UDP_PORT = 6666
MESSAGE = "Start"
print ("UDP target IP:"), UDP_IP
print ("UDP target port:"), UDP_PORT
print ("message:"), MESSAGE
sock = socket.socket(socket.AF_INET,
socket.SOCK_DGRAM)
sock.sendto(MESSAGE, (UDP_IP, UDP_PORT) )
I didn't know where (or if at all) I needed to specify INADDR_ANY, so I didn't. But I did try once where import socket.INADDR_ANY but got ImportError: No module named INADDR_ANY
Seems like a simple issue based on your response, so I'm not sure where I'm messing up.
EDIT2: I just reread your answer again and understand why socket.INADDR_ANY doesn't work. Please disregard that part of my previous edit
EDIT3: Okay so the reason that I wasn't picking up data when specifying the host IP was that the application I was collecting data from on Computer A was still specified to send to 127.0.0.1. So I figured it out. I am still curious why 0.0.0.0 didn't work though!

No. A socket can only be bound to a single address at a time.*
If there happens to be a single address that handles both things you want, you can use a single socket to listen to it. In this case, the INADDR_ANY host (0.0.0.0) may be exactly what you're looking for—that will handle any (IPv4) connections on all interfaces, both loopback and otherwise. And even if there is no pre-existing address that does what you want, you may be able to set one up via, e.g., an ipfilter-type interface.
But otherwise, you have to create two sockets. Which means you need to either multiplex with something like select, or create two threads.
In your case, you want to specify a host that can listen to both the local machine, and another machine on the same Ethernet network. You could get your host address on the Ethernet interface and bind that. (Your machine can talk to itself on any of its interfaces.) Usually, getting your address on "whatever interface is the default" works for this too—you'll see code that binds to socket.gethostname() in some places, like the Python Socket Programming HOWTO. But binding to INADDR_ANY is a lot simpler. Unless you want to make sure that machines on certain interfaces can't reach you (which is usually only a problem if you're, e.g., building a server intended to live on a firewall's DMZ), you'll usually want to use INADDR_ANY.
Finally, how do you bind to INADDR_ANY? The short answer is: just use UDP_IP = '', or UDP_IP = '0.0.0.0' if you want to be more explicit. Anyone who understands sockets, even if they don't know any Python, will understand what '0.0.0.0' means in server code.(You may wonder why Python doesn't have a constant for this in the socket module, especially when even lower-level languages like C do. The answer is that it does, but it's not really usable.**)
* Note that being bound to a single address doesn't mean you can only receive packets from a single address; it means you can receive packets from all networks where that single address is reachable. For example, if your machine has a LAN connection, where your address is 10.0.0.100, and a WAN connection, where your address is 8.9.10.11, if you bind 10.0.0.100, you can receive packets from other LAN clients like 10.0.0.201 and 10.0.0.202. But you can't receive packets from WAN clients like 9.10.11.12 as 10.0.0.100.
** In the low-level sockets API, dotted-string addresses like '0.0.0.0' are converted to 32-bit integers like 0. Python sometimes represents those integers as ints, and sometimes as 4-byte buffers like b'\0\0\0\0'. Depending on your platform and version, the socket.INADDR_ANY constant can be either 0 or b'\0\0\0\0'. The bind method will not take 0, and may not take b'\0\0\0\0'. And you can't convert to '0.0.0.0' without first checking which form you have, then calling the right functions on it. This is ugly. That's why it's easier to just use '0.0.0.0'.

I believe you can bind a raw socket to an entire interface, but you appear to be using two different interfaces.
It's probably best to use two sockets with select().

How to get Multicast Group Address from datagramReceived in Twisted?

This is the following code example from Twisted for dealing with receiving of multicasts. I currently am listening to many groups with the same client and I want to be able to print out which group a certain datagram packet came from. I would think that this can be received from the address parameter of datagramReceived; however, this only gives me a tuple containing the local ip and port that a group is bound to, but not the address of the group itself.
Question: How can I print the multicast address from where a datagram originated from within the Twisted protocol/API?
from twisted.internet.protocol import DatagramProtocol
from twisted.internet import reactor
class MulticastPingClient(DatagramProtocol):
def startProtocol(self):
# Join the multicast address, so we can receive replies:
self.transport.joinGroup("228.0.0.5")
self.transport.joinGroup("229.0.2.11")
self.transport.joinGroup("221.3.3.3")
# Send to 228.0.0.5:8005 - all listeners on the multicast address
# (including us) will receive this message.
self.transport.write('Client: Ping', ("228.0.0.5", 8005))
def datagramReceived(self, datagram, address):
print "Datagram %s received from %s" % (repr(datagram), repr(address))
reactor.listenMulticast(8005, MulticastPingClient(), listenMultiple=True)
reactor.run()

Unfortunately the socket APIs Twisted wraps (via Python's standard library) do not provide this information. I would recommend having a separate DatagramProtocol for each multicast group, and listening on different ports for each. Although someone could still send a UDP datagram directly to to that port and you wouldn't be able to distinguish it from multicast.
I have a vague memory indicating that the recvmsg() API does provide the necessary information, though I lack the time to verify this. Twisted 12.1 has the beginning of a recvmsg() wrapper, so this may be a possible way to add this functionality to Twisted (or your code) with a bit more work.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.