How to reassemble TCP packets in Python? Is there any existing tools for this?
Thanks! :-)
To do perform TCP reassembly you'll need to use something like pynids http://jon.oberheide.org/pynids/.
You can also build your own using pylibpcap, dpkt or scapy.
TCP reassembly is very tricky with a LOT of edge cases. I wouldn't recommend doing it yourself if you need a robust solution.
Yes... the TCP protocol guarantees that the application layer will only see packets assembled and in order. Now if you are talking about building some low level interface parsing the IP packet itself, you can take a stab at it with RAW sockets which should give you access to IP header information. Here's an example:
import socket
# the public network interface
HOST = socket.gethostbyname(socket.gethostname())
# create a raw socket and bind it to the public interface
s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_IP)
s.bind((HOST, 0))
# Include IP headers
s.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)
# receive all packages
s.ioctl(socket.SIO_RCVALL, socket.RCVALL_ON)
# receive a package
print s.recvfrom(65565)
# disabled promiscuous mode
s.ioctl(socket.SIO_RCVALL, socket.RCVALL_OFF)
lifted shamelessly from the python socket module documentation:
http://docs.python.org/library/socket.html
Related
I'm having a strange problem with python udp sockets. When I send data over them, they open an udp port on 0.0.0.0 and I cannot find out why they do, what they do listen to and how to deactivate that behaviour. Our system administrators don't like ports to be listened on 0.0.0.0 (reasonably).
Minimum example:
import socket
fam = socket.AF_INET
family, _, _, _, addr = socket.getaddrinfo('localhost', 9999, fam, socket.SOCK_DGRAM)[0]
sock = socket.socket(family, socket.SOCK_DGRAM)
sock.sendto('foobar'.encode('ascii'), addr)
Right after the last method call, the python program listens to:
udp 0 0 0.0.0.0:41972 0.0.0.0:* 1000 308716 17777/python3
And this seems to stay until the python executable stops. So my question is, does anyone here have the same problem and how can I avoid it?
Thank you very much!
Which port do you expect your packets to come from?
Sockets have ports at both ends. Packets come from a port and go to a port. You didn't pick a port (with bind), so the operating system chose one for you.
It's not a problem, it's how sockets work.
0.0.0.0 means "any IP address" by the way.
just to elaborate on my comment, here's how I'd put the socket calls together:
import socket
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
sock.bind(('127.0.0.1', 0)) # optional
sock.connect(('localhost', 9999))
sock.send(b'foobar')
notes:
connecting a UDP socket should cause it to be bound to something more appropriate than 0.0.0.0, and hence why I put an optional comment
getaddrinfo doesn't help you much, hence I'm just passing names and letting Python resolve them internally
using a name for the connect call and dotted-quad notation for bind looks a little strange. I'd suggest using just one format for consistency, or just using connect and not using bind
getaddrinfo is really useful when you want to be able to transparently handle IPv6 along with IPv4 addresses, especially for hosts that resolve to multiple addresses. see the Happy Eyeballs algorithm for an example
So, I've found a workaround, even if I think it's not the cleanest way to do that, I couldn't find another one.
I used the method bind(addr: tuple) on the udp socket to make it listen to only a specific IP. In my case this looks like:
import socket
fam = socket.AF_INET
family, _, _, _, addr = socket.getaddrinfo('localhost', 9999, fam, socket.SOCK_DGRAM)[0]
sock = socket.socket(family, socket.SOCK_DGRAM)
sock.bind(('127.0.0.1', 0)) # This is the new line where I bind only to 127.0.0.1
sock.sendto('foobar'.encode('ascii'), addr)
Thanks to #user253751 for leading me in the right direction.
I have a client which is creating packets and sending packets to a destination in a network which has been created using mininet. Now I am writing a python program at the destination to count the number of packets which has arrived. Now I know for sure that the packets are arriving at the destination (used tcpdump to verify it)
How do I go about it?
I thought of using this -
s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
print s.recvfrom(5001)
But this seems to be slow. Is there any other alternative?
You want socket.IPPROTO_UDP for UDP packets, but otherwise, that's basically what you must do. No matter what other things you try, it's going to have to do those things.
Oh, and you'll want to do a socket.bind(('',PORT)) to bind it to the port you want it to listen on.
Yes, I did already try to find information on this.
The Python socket documentation has this list of what I believe are protocols:
SO_*
socket.SOMAXCONN
MSG_*
SOL_*
IPPROTO_*
IPPORT_*
INADDR_*
IP_*
IPV6_*
EAI_*
AI_*
NI_*
TCP_*
What exactly do they do? Let's say I used
s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_IP)
What does this do? I understand it's a raw socket, but does the IPPROTO_IP mean I have to construct everything? (i.e. the IP header down to the TCP to the data?)
The Python documentation says I can find information on the above in the Unix documentation on sockets, but I couldn't find the document. Anyone know where it is?
There are a lot of Linux manual pages describing socket:
SOCKET
IP
TCP
UDP
UNIX
In general, we use these arguments for socket:
Address family: AF_INET for internet domain address family, AF_UNIX for UNIX domain address family.
Socket type: SOCK_STREAM for TCP, SOCK_DGRAM for UDP. Of course you can use SOCK_RAW for directly access IP protocol.
Protocol: when using TCP or UDP, leave it to 0 is just fine; when using RAW, you can specify protocol to 0, IPPROTO_TCP for TCP sockets, IPPROTO_UDP for UDP sockets.
And, SO_ means "socket option", SOL_ means "socket option level", which are used to set socket options through setsockopt (also mentioned in SOCKET).
In fact, you can find more pages at the bottom of these pages in the SEE ALSO section. Note that the page of 2 or 3 is a concrete system call or library function, pages of 7 is what you need.
I have a python scripts that captures the packets on the ethernet using dpkt, but how do i differentiate between which packets are tcp and which ones are for udp.
Eventually i would like to have a list of packets for each tcp connection that was established during the time interval.
my code is:
import dpkt
import pcapy
cap=pcap.open_live('eth0',100000,1,0)
(header,payload)=cap.next()
while header:
eth=dpkt.ethernet.Ethernet(str(payload))
ip=eth.data
tcp=ip.data
# i need to know whether it is a tcp or a udp packet here!!!
(header,payload)=cap.next()
IP header contains field protocol. dpkt should allow you to obtain this value and using it you can guess what is on top of IP. Here is a list of valid protocols numbers http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xml.
UDP is equal to 17 while TCP is 6.
Edit:
I have checked this issue and as I mentioned dpkg provide p properties to access protocol field of IP. So you can check agains it. But it also automatically parse packet and set data property to instance of class that represent upper protocol like UDP or TCP. So you can check type of data property and you recognize this protocol.
from dpkt.ip import IP, IP_PROTO_UDP
from dpkt.udp import UDP
ip = IP('E\x00\x00"\x00\x00\x00\x00#\x11r\xc0\x01\x02\x03\x04\x01\x02\x03\x04\x00o\x00\xde\x00\x0e\xbf5foobar')
#if ip.p == IP_PROTO_UDP: # checking for protocol field in ip header
if type(ip.data) == UDP : # checking of type of data that was recognized by dpkg
udp = ip.data
print udp.sport
else:
print "Not UDP"
A python script that captures the packets on the ethernet adapter eth0 using dpkt, and differentiates between TCP and UDP packets of the IP.
import dpkt
import pcapy
cap=pcapy.open_live('eth0',100000,1,0)
(header,payload)=cap.next()
while header:
eth=dpkt.ethernet.Ethernet(str(payload))
# Check whether IP packets: to consider only IP packets
if eth.type!=dpkt.ethernet.ETH_TYPE_IP:
continue
# Skip if it is not an IP packet
ip=eth.data
if ip.p==dpkt.ip.IP_PROTO_TCP: # Check for TCP packets
TCP=ip.data
# ADD TCP packets Analysis code here
elif ip.p==dpkt.ip.IP_PROTO_UDP: # Check for UDP packets
UDP=ip.data
# UDP packets Analysis code here
(header,payload)=cap.next()
I have some code that listens for "announcements" via UDP multicast. I can get the IP address of the sender, but what I really need is the MAC address of the sender (since the IP address can and will change).
Is there an easy way to do this in Python?
A code snippet is included for reference, but likely unnecessary.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
# Allow multiple sockets to use the same PORT number
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Bind to the port that we know will receive multicast data
sock.bind((self.interface, MCAST_PORT))
# Tell API we are a multicast socket
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 255)
# Tell API we want to add ourselves to a multicast group
# The address for the multicast group is the third param
status = sock.setsockopt(socket.IPPROTO_IP,
socket.IP_ADD_MEMBERSHIP,
socket.inet_aton(MCAST_ADDR) + socket.inet_aton(self.interface));
data, addr = sock.recvfrom(1024)
...
You cannot, in general, get the mac address. You might succeed using ARP on a LAN, but across the Internet it's not possible.
Consider the case where the packet you receive has the IP address of the sender's NATting router. The packet may have traversed any number of intermediate machines along the way, each of which have mac addresses, too. Whose responsibility should it be to support the kind of lookup you're after? For all the machines along the way, the sender's mac address is completely useless, so why bother supporting that kind of lookup?
And, btw, changing the mac address is trivial on many network cards, so using it as some kind of unique ID is not a wise idea.
To do this, you need to capture the raw Ethernet frame, not just the UDP packet.
import socket
ETH_P_ALL=3
sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
sock.bind((interface_name, 0))
data = sock.recv(2000)
dmac = data[:6]
smac = data[6:12]
udp_offset = 14
ethertype = data[12:14]
if ethertype == [0x81, 0x00]: # 802.1Q VLAN-tagged
udp_offset += 4
udp_pkt = data[udp_offset:]
Some notes:
An Ethertype value of 0x88a8 would indicate 802.1ad QinQ or "stacked VLAN" and would need to be handled appropriately to correctly find the UDP data.
On most Linux systems you need to be root (or have the CAP_NET_RAW capability) to do this. Not sure what you need on Windows, but assume something similar.
In practice this will be something of a firehose as it will receive all UDP packets. You can either parse the UDP header yourself to narrow it down to the ones you are interested in, or (on Linux / BSD etc) investigate Berkeley Packet Filter to get the kernel to do it for you. The latter is much more CPU efficient but is rather a pain to implement.
Answers suggesting ARP or similar might do what you want but they might not. They will tell you what association your OS has between IP addresses and MACs; this won't help for layer 2 protocols, multicast, broadcast etc or if something is lying in response to ARP requests.
The protocol you need is ARP. Check this question/answer for details
I'm not sure that it is possible to get the sender's MAC address since the MAC address is a link level address and not a network level address like IP. The MAC address will change at each hop in the network as the packet containing the UDP message is routed from the sender to the receiver.
I don't know how to do it in python but it is possible to get MAC address. For example by using tcpdump I put all packets into file:
sudo tcpdump -i enp0s31f6 -w file_name port 6665
then in python read it with:
packetlist = rdpcap("./file_name")
for pkt in packetlist:
print pkt.src, pkt.load
you can see the mac address
edit:
I found one way to do this:
sniff all packages with scapy with the help of function sniff, then filter the packages to get only what you need. There you can use mac address
for example from my project:
sniff(prn=self._pkt_callback, store=0)
def _pkt_callback(self, pkt):
if not self.sniffer_on:
return
if Ether not in pkt or pkt[Ether].type != 0x800:
return
if pkt[IP].proto != 17: # 17 means UDP package
return
if pkt[UDP].dport != 6665:
return
print pkt.src, pkt.load #src is mac address