Capture TCP-Packets with Python


I am trying to capture an HTTP download with Python using dpkt and pcap. The code looks like this:
...
pc = pcap.pcap(iface)
for ts, pkt in pc:
    handle_packet(pkt)

def handle_packet(pkt):
    eth = dpkt.ethernet.Ethernet(pkt)
    # Ignore non-IP and non-TCP packets
    if eth.type != dpkt.ethernet.ETH_TYPE_IP:
        return
    ip = eth.data
    if ip.p != dpkt.ip.IP_PROTO_TCP:
        return
    tcp = ip.data
    data = tcp.data
    # current connection
    c = (ip.src, ip.dst, tcp.sport, tcp.dport)
    # Handle only new HTTP-responses and TCP-packets
    # of existing connections.
    if c in conn:
        handle_tcp_packet(c, tcp)
    elif data[:4] == 'HTTP':
        handle_http_response(c, tcp)
...
In handle_http_response() and handle_tcp_packet() I read the payload of each TCP packet (tcp.data) and write it to a file. However, I noticed that I often get several packets with the same TCP sequence number (tcp.seq) on the same connection, and they seem to contain the same data. It also seems that not all packets are captured: if I sum up the packet sizes, the result is lower than the Content-Length given in the HTTP header. In Wireshark, however, I can see all packets.
Does anyone have an idea why I get those duplicate packets, and how I can capture every packet belonging to the HTTP response?
EDIT:
Here you can find the complete code: pastebin.com.
When run, it prints something like this to stdout:
Waiting for HTTP-Audio-responses ...
...
New TCP-Packet, len=1440, tcp-payload=5107680, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=1440, tcp-payload=5109120, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=1440, tcp-payload=5110560, con-len=5197150 , dups=57 , dup-bytes=82080
----------> FIN <----------
New TCP-Packet, len=1937, tcp-payload=5112497, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=0, tcp-payload=5112497, con-len=5197150 , dups=57 , dup-bytes=82080
As you can see, the TCP payload plus the duplicate bytes received (5112497+82080=5194577) is lower than the file size of the download (5197150). You can also see that I receive 57 duplicate packets (same SEQ and same TCP data), and that packets still arrive after the packet with the FIN flag.
So does anyone have an idea how I can capture all packets belonging to the connection? Wireshark sees all packets, and I think it uses libpcap too.
I don't even know whether I am doing something wrong or whether the pcap library is.
EDIT2:
OK, it seems that my code is correct: I saved the captured packets in Wireshark and used the capture file in my code (pcap.pcap('/home/path/filename') instead of pcap.pcap('eth0')). My code read all packets perfectly (in multiple tests)! Since Wireshark uses libpcap too (as far as I know), I think the problem is the pypcap library, which does not give me all packets.
Any idea how to test that?
I already compiled pypcap myself (trunk), but that didn't change anything.
EDIT3:
OK, I changed my code to work with pcapy instead of pypcap and have the same problem:
When reading the packets from a previously captured file (created with Wireshark), everything is fine, but when I capture the packets directly from eth0, I miss some packets.
Interesting: when running both programs (the one using pypcap and the one using pcapy) in parallel, they capture different packets, e.g. one program receives one packet more than the other.
But I still have no idea why.
I thought Wireshark uses the same base library (libpcap).
Please help :)

Here's a couple of things to watch out for:
Make sure you have a big snaplen. For pcapy you can set it in open_live (second parameter).
Make sure you handle fragmented packets. This is not done automatically, so you need to check the details.
Check the capture statistics. Unfortunately I don't think they are exposed in the pcapy interface, but it's possible that you are not handling all packets; if you are too late, you will not know that you missed something (although you can get the same information by tracking the length / position of the TCP stream). libpcap itself does expose those statistics, so you might be able to add a function for it.
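Tracking the position of the TCP stream, as suggested in the last point, could look roughly like the sketch below. It is not the asker's code: the StreamTracker class and its method names are hypothetical, it follows only one direction of one connection, and it ignores 32-bit sequence-number wraparound. Retransmitted segments (same SEQ, same data) are counted as duplicates instead of being written twice, and a non-empty pending buffer signals that a segment was missed.

```python
class StreamTracker:
    """Reassemble one direction of a TCP stream from (seq, payload) pairs.

    Hypothetical helper for illustration; ignores sequence wraparound.
    """

    def __init__(self, initial_seq):
        self.next_seq = initial_seq  # sequence number of the next expected byte
        self.pending = {}            # out-of-order segments, keyed by seq
        self.data = bytearray()      # in-order payload collected so far
        self.dup_bytes = 0           # bytes seen more than once

    def add(self, seq, payload):
        if not payload:
            return
        end = seq + len(payload)
        if end <= self.next_seq:
            # Whole segment was already received: a retransmission.
            self.dup_bytes += len(payload)
            return
        if seq < self.next_seq:
            # Partial overlap: keep only the part that is new.
            overlap = self.next_seq - seq
            self.dup_bytes += overlap
            payload = payload[overlap:]
            seq = self.next_seq
        if seq > self.next_seq:
            # Arrived early: hold it until the gap before it is filled.
            self.pending.setdefault(seq, payload)
            return
        self.data += payload
        self.next_seq = seq + len(payload)
        # Drain buffered segments that have become contiguous.
        while self.pending:
            first = min(self.pending)
            if first > self.next_seq:
                break
            self.add(first, self.pending.pop(first))

    def missing(self):
        """True while buffered segments are still waiting on a gap."""
        return bool(self.pending)


tracker = StreamTracker(initial_seq=1000)
tracker.add(1000, b"HTTP")
tracker.add(1008, b"late")   # arrives before the segment at 1004
tracker.add(1000, b"HTTP")   # retransmission of the first segment
tracker.add(1004, b"/1.1")   # fills the gap
print(bytes(tracker.data), tracker.dup_bytes, tracker.missing())
```

If missing() is still true when the connection closes, the capture itself lost packets, which is the case to compare against the libpcap drop statistics.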

Set the snaplen to 65535. Apparently this is the default for Wireshark:
http://www.wireshark.org/docs/wsug_html_chunked/ChCustCommandLine.html

Related

Raw load found, how to access?

To start off, I have read through other answers about Raw layers in scapy on here, but none have been useful. Maybe I am just doing something wrong, and that's what has brought me here today.
So, for starters, I have a pcap file which started out corrupted, with some retransmissions; I believe I have put it back together correctly.
It contains Radiotap header, IEEE 802.11 (dot11), logical-link control, IPv4, UDP, and DNS.
To my understanding, the UDP packets being transmitted hold this raw data; however, due to some recent quirks, maybe the raw data is in Radiotap/raw.
Using scapy, I'm iterating through the packets, and when a packet with the Raw layer is found, I use scapy's .show() function to view it.
As a result, I can see that there is a raw load available:
###[ Raw ]###
\load \
|###[ Raw ]###
| load = '#\x00\x00\x00\xff\xff\xff\xff\xff\xff\x10h?'
So, I suppose my question is: how can I capture this payload and find out what it is? To my knowledge the load is supposed to be an image file, but I have trouble believing that, so I assume I have misstepped somewhere.
Here is the code I'm using to achieve the above result:
from scapy.all import *
from scapy.utils import *
pack = rdpcap('/home/username/Downloads/new.pcap')
for packet in pack:
    if packet.getlayer(Raw):
        print('[+] Found Raw' + '\n')
        l = packet.getlayer(Raw)
        rawr = Raw(l)
        rawr.show()
Any help, or insight for further reading, would be appreciated. I am new to scapy and no expert in packet dissection.
*Side note: previously I had tried (using separate code and a server) to replay the packets and send them to myself, to no avail. However, I feel that's due to my lack of knowledge about receiving UDP packets.
UPDATES - I have now tested my pcap file with a scapy reassembler and confirmed that I have no fragmented packets, or anything of the sort, so I assume all should go smoothly...
Upon opening my pcap in Wireshark, I can see that there are retransmissions, but I'm not sure how much that will affect my goals, since no fragmentation occurred.
Also, I have tried getlayer(Raw).load; if I print it I get some gibberish on the screen. I'm assuming it's the data of my would-be image, but I now need to get it into a usable format.

You can do:
data = packet[Raw].load

You should be able to access the field in this way:
l = packet.getlayer(Raw).load

Using Scapy's interactive shell I was successful doing this:
pcap = rdpcap('sniffed_packets.pcap')
s = pcap.sessions()
for key, value in s.items():
    # Looking for telnet sessions
    if ':23' in key:
        for v in value:
            try:
                v.getlayer(Raw).load
            except AttributeError:
                pass

If you are trying to get only the load part of the packet, you can try:
import sys
from scapy.all import IP, TCP

def handle_pkt(pkt):
    if TCP in pkt and pkt[TCP].dport == 5201:
        #print("got a packet")
        print(pkt[IP])
        load_part = pkt[IP].load
        print("Load#", load_part)
        pkt.show2()
        sys.stdout.flush()

Python sniffer using pypcap and dpkt on OS X

I'm trying to sniff packets with Python (using pypcap and dpkt).
I tried the following:
import dpkt, pcap
pc = pcap.pcap() # construct pcap object
pc.setfilter('src host X.X.X.X or dst host X.X.X.X')
for timestamp, packet in pc:
    print(dpkt.ethernet.Ethernet(packet))
But nothing happens when I launch the script... Did I miss something?
Using Python 2.7
On OS X Yosemite (10.10)

The question is old, but this is for new people who might hit it. The GitHub 'chains' project uses both pypcap and dpkt for exactly this kind of thing (Disclaimer: I'm involved with all three projects): https://github.com/SuperCowPowers/chains
chains/sources/packet_streamer.py (code showing use of pypcap for 'sniffing')
chains/links/packet_meta.py (code showing use of dpkt for packet parsing)
For those who just want to use pypcap/dpkt, here's a working code snippet:
import pcap
import dpkt
sniffer = pcap.pcap(name=None, promisc=True, immediate=True)
for timestamp, raw_buf in sniffer:
    output = {}
    # Unpack the Ethernet frame (mac src/dst, ethertype)
    eth = dpkt.ethernet.Ethernet(raw_buf)
    output['eth'] = {'src': eth.src, 'dst': eth.dst, 'type': eth.type}
    # Is this an IP packet?
    if not isinstance(eth.data, dpkt.ip.IP):
        print('Non IP Packet type not supported %s\n' % eth.data.__class__.__name__)
        continue
    # Grab ip packet
    packet = eth.data
    # Pull out fragment information
    df = bool(packet.off & dpkt.ip.IP_DF)
    mf = bool(packet.off & dpkt.ip.IP_MF)
    offset = packet.off & dpkt.ip.IP_OFFMASK
    # Pulling out src, dst, length, fragment info, TTL, checksum and Protocol
    output['ip'] = {'src': packet.src, 'dst': packet.dst, 'p': packet.p,
                    'len': packet.len, 'ttl': packet.ttl,
                    'df': df, 'mf': mf, 'offset': offset,
                    'checksum': packet.sum}
    print(output)
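To show what the fragment fields in this snippet encode, here is a small standard-library sketch that decodes the same 16-bit field straight from the raw IPv4 header. The mask values are the standard IPv4 ones; the fragment_info function name and the demo header are made up for illustration. Note that the offset is stored in units of 8 bytes.

```python
import struct

IP_DF = 0x4000       # "don't fragment" flag
IP_MF = 0x2000       # "more fragments" flag
IP_OFFMASK = 0x1FFF  # 13-bit fragment offset, in 8-byte units


def fragment_info(ip_header):
    """Decode the flags/fragment-offset field (bytes 6-7) of an IPv4 header."""
    (off,) = struct.unpack_from("!H", ip_header, 6)
    return {
        "df": bool(off & IP_DF),
        "mf": bool(off & IP_MF),
        "offset_bytes": (off & IP_OFFMASK) * 8,
    }


# Demo header: all zero except the fragment field (MF set, offset 185 units).
demo_header = bytes(6) + struct.pack("!H", IP_MF | 185) + bytes(12)
info = fragment_info(demo_header)
print(info)
```

A packet with mf set, or with a non-zero offset, is only one piece of a larger datagram and has to be reassembled before its payload is parsed.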

You should check out Scapy. It's a powerful networking tool that can be used interactively as well. It's written in Python, so you can use it in your scripts too.
In Scapy it's as easy as this (and you can easily add filters as well):
sniff(iface='eth0')

If you didn't place the path to a file in pcap.pcap(), there's no pcap for it to parse.
I ran your script with a glob over a pcap directory I have and replaced the IP with one in my network, and it seemed to work.
Are you sure you installed pypcap and dpkt?
Here's exactly what I did with your script.
import dpkt, pcap, glob
for i in glob.glob("/pcap/*.pcap"):
    pc = pcap.pcap(i)
    pc.setfilter('src host 192.168.1.140 or dst host 192.168.1.140')
    for timestamp, packet in pc:
        print(dpkt.ethernet.Ethernet(packet))
It printed a lot of stuff.

Nothing jumps out in the code, so I'm wondering if it is just the network.
Can you double-check the IP addresses, and maybe also run tcpdump as a sanity check to make sure you can see traffic?
For tcpdump, something like this:
$ sudo tcpdump -i en1 "src host 10.0.0.2 or dst host 10.0.0.2"

Whole packet length Scapy

I am capturing WiFi traffic with tcpdump using the parameter -s 100 (which means I am only capturing the headers of the packets).
When I load the .pcap file and process it with Scapy I do:
pkts = rdpcap(pcapfile)
totalbytes = 0
for pkt in pkts:
    totalbytes += len(pkt)
However, as I am truncating the capture, len(pkt) will not give me the whole packet length (frame length); it gives me the captured packet length. How can I get the real packet length?
Extra: as I have done on some occasions before, I opened the pcap file in Wireshark and searched for the hex values of interest. In this case frame.len shows the value I am looking for, but I can't work out how Wireshark obtains the real packet length without the whole packet having been captured.

The rdpcap function uses the PcapReader class for reading packets. Unfortunately this class discards the information you are looking for in the read_packet method, even though it is to be found in the pcap file. So you have to use the RawPcapReader directly.
totalbytes = 0
for pkt, (sec, usec, wirelen) in RawPcapReader(pcapfile):
    totalbytes += wirelen
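The wire length is available because every record in a classic pcap file stores two lengths: the number of bytes actually saved and the original length on the wire. The following sketch reads them with the standard library only. It assumes the classic pcap format (not pcapng); the iter_lengths name and the in-memory demo capture are made up for illustration.

```python
import io
import struct


def iter_lengths(fh):
    """Yield (captured_len, wire_len) for each record of a classic pcap file."""
    magic = fh.read(24)[:4]  # 24-byte global header, magic number first
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # file written in little-endian byte order
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # file written in big-endian byte order
    else:
        raise ValueError("not a classic pcap file")
    while True:
        record = fh.read(16)  # ts_sec, ts_usec, incl_len, orig_len
        if len(record) < 16:
            break
        _sec, _usec, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        fh.read(incl_len)  # skip the (possibly truncated) packet bytes
        yield incl_len, orig_len


# Demo: a capture taken with snaplen 100 holding a 1514-byte and a 60-byte frame.
demo = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 100, 1)
demo += struct.pack("<IIII", 0, 0, 100, 1514) + bytes(100)
demo += struct.pack("<IIII", 0, 0, 60, 60) + bytes(60)

lengths = list(iter_lengths(io.BytesIO(demo)))
totalbytes = sum(wire for _captured, wire in lengths)
print(lengths, totalbytes)
```

For a real capture, open the file with open(pcapfile, 'rb') and pass the file object in place of the BytesIO demo.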

With modern Scapy versions, the proper answer would be to use pkt.wirelen. This attribute only exists on packets read from a pcap.

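If for some reason you don't want to use RawPcapReader, you can use the len attribute for IPv4 packets. It gives the total length of the IP datagram, so it does not include any link-layer headers in front of it.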
real_length = pkt[IP].len
truncated_length = len(pkt)
Strangely, the IPv6 layer in Scapy doesn't have the same attribute, but it does have an attribute called plen which is the length of the payload:
payload_length = pkt[IPv6].plen
real_length = payload_length + 40
truncated_length = len(pkt)

Accessing 802.11 Wireless Management Frames from Python

From Python on Linux I would like to sniff 802.11 management 'probe-request' frames. This is possible from Scapy like so:
# -*- coding: utf-8 -*-
import re
from scapy.all import *

def proc(p):
    if p.haslayer(Dot11ProbeReq):
        mac = re.sub(':', '', p.addr2)
        ssid = p[Dot11Elt].info
        ssid = ssid.decode('utf-8', 'ignore')
        if ssid == "":
            ssid = "<BROADCAST>"
        print("%s:%s" % (mac, ssid))

sniff(iface="mon0", prn=proc)
Or from tshark like so:
tshark -n -i mon0 subtype probereq -R 'wlan.fc.type_subtype eq 4' -T fields -e wlan.sa -e wlan_mgt.ssid
We could redirect the output from tshark, and slurp it up with some Python (not pretty, but it works).
However, both of these options have GPL licensing, which makes potential commercial projects tricky. I'm therefore trying to figure out a 'lower level' solution in Python for this specific problem. From Google I've managed to work out two potential directions to try:
Pcap libraries: There seem to be three pcap libraries available for Python: pylibpcap, pypcap, and pcapy. I'm not too sure how to approach incorporating the above functionality into these. Any sample code or solutions would be great.
Raw sockets: PF_PACKET:
"Packet sockets are used to receive or send raw packets at the device driver (OSI Layer 2) level. They allow the user to implement protocol modules in user space on top of the physical layer."
This sounds like it could be another option, bypassing pcap altogether. I've heard comments that this may even be a better approach, removing the overhead of pcap libraries. I'm not sure where to start tackling this, though.
Any help in solving this would be greatly appreciated.

I've managed to work this out. Here's the process I went through:
Capture some 802.11 management 'probe-request' frames:
tshark -n -i mon0 subtype probereq -c 5 -w probe.pcap
Understand RadioTap
Reading the RadioTap documentation, I realised that a RadioTap header starts with the following fields (multi-byte fields are little-endian):
it_version (1 byte) - major version of the radiotap header in use. Currently, this is always 0
it_pad (1 byte) - currently unused
it_len (2 bytes) - entire length of the radiotap data, including the radiotap header
it_present (4 bytes) - bitmask of the radiotap data fields that follow the radiotap header
Therefore the it_len allows us to locate the beginning of the 802.11 frame that follows the radiotap data.
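As a minimal sketch of that step, the fixed part of the header can be unpacked with the standard library. It assumes the standard radiotap layout (a one-byte version, a one-byte pad, a 16-bit little-endian length and a 32-bit present mask); the split_radiotap name and the demo frame are made up for illustration.

```python
import struct


def split_radiotap(frame):
    """Split a captured frame into (radiotap data, 802.11 frame) using it_len."""
    version, _pad, it_len, _present = struct.unpack_from("<BBHI", frame, 0)
    if version != 0:
        raise ValueError("unexpected radiotap version %d" % version)
    if it_len < 8 or it_len > len(frame):
        raise ValueError("implausible radiotap length %d" % it_len)
    return frame[:it_len], frame[it_len:]


# Demo: a bare 8-byte radiotap header followed by the first two bytes of a
# probe request (frame control 0x40 0x00: type 0, subtype 4).
demo_frame = struct.pack("<BBHI", 0, 0, 8, 0) + b"\x40\x00"
radiotap, dot11 = split_radiotap(demo_frame)
print(len(radiotap), dot11)
```

Reading it_len as a two-byte value matters once the radiotap data grows past 255 bytes; reading only the byte at offset 2 happens to work for shorter headers.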
Coding solution in Python
I chose to use pylibpcap from three pcap library options I found in my previous post, and discovered the dpkt module for parsing 802.11 frames. Documentation was very thin, so by playing in the Python interpreter I managed to work out the following code to extract MAC, probe SSID, and signal strength from our capture file:
import binascii
import struct
import dpkt

f = open('probe.pcap', 'rb')
pc = dpkt.pcap.Reader(f)
if pc.datalink() == 127:  # Check if RadioTap
    for timestamp, rawdata in pc:
        tap = dpkt.radiotap.Radiotap(rawdata)
        signal_ssi = -(256 - tap.ant_sig.db)  # Calculate signal strength
        # it_len (bytes 2-3, little-endian) is the entire length of the radiotap data, including the radiotap header.
        t_len = struct.unpack('<H', rawdata[2:4])[0]
        wlan = dpkt.ieee80211.IEEE80211(rawdata[t_len:])
        if wlan.type == 0 and wlan.subtype == 4:  # Indicates a probe request
            ssid = wlan.ies[0].info
            mac = binascii.hexlify(wlan.mgmt.src)
            print("%s, %s (%d dBm)" % (mac, ssid, signal_ssi))

Writing raw IP data to an interface (linux)

I have a file which contains raw IP packets in binary form. The data in the file contains a full IP header, TCP/UDP header, and data. I would like to use any language (preferably Python) to read this file and dump the data onto the line.
In Linux I know you can write to some devices directly (echo "DATA" > /dev/device_handle). Would using Python to do an open on /dev/eth1 achieve the same effect (i.e. could I do echo "DATA" > /dev/eth1)?

Something like:
#!/usr/bin/env python
import socket
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
s.bind(("ethX", 0))
blocksize = 100
with open('filename.txt', 'rb') as fh:
    while True:
        block = fh.read(blocksize)
        if not block:  # EOF
            break
        s.send(block)
Should work, haven't tested it however.
ethX needs to be changed to your interface (e.g. eth1, eth2, wlan1, etc.)
You may want to play around with blocksize. 100 bytes at a time should be fine, you may consider going up but I'd stay below the 1500 byte Ethernet PDU.
It's possible you'll need root/sudoer permissions for this. I've needed them before when reading from a raw socket, never tried simply writing to one.
This is provided that you literally have the packet (and only the packet) dumped to file. Not in any sort of encoding (e.g. hex) either. If a byte is 0x30 it should be '0' in your text file, not "0x30", "30" or anything like that. If this is not the case you'll need to replace the while loop with some processing, but the send is still the same.
Since I just read that you're trying to send IP packets -- In this case, it's also likely that you need to build the entire packet at once, and then push that to the socket. The simple while loop won't be sufficient.

No; there is no /dev/eth1 device node -- network devices are in a different namespace from character/block devices like terminals and hard drives. You must create an AF_PACKET socket to send raw IP packets.
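Since each packet has to be handed to the socket as a whole, the file first needs to be cut at packet boundaries rather than into fixed-size blocks. A minimal sketch, assuming the file holds nothing but back-to-back IPv4 packets: the total-length field in each header (bytes 2-3, network byte order) says where the next packet starts. The function names and the demo data are made up for illustration, and the send step is only outlined in a comment because it needs root and was not tested here.

```python
import struct


def split_ipv4_packets(blob):
    """Split concatenated raw IPv4 packets using each header's total-length field."""
    packets = []
    pos = 0
    while pos + 20 <= len(blob):
        version = blob[pos] >> 4
        (total_len,) = struct.unpack_from("!H", blob, pos + 2)
        if version != 4 or total_len < 20 or pos + total_len > len(blob):
            raise ValueError("malformed packet at offset %d" % pos)
        packets.append(blob[pos:pos + total_len])
        pos += total_len
    if pos != len(blob):
        raise ValueError("trailing bytes after last packet")
    return packets


def fake_packet(payload):
    """Build a demo IPv4 packet: only version, IHL and total length are filled in."""
    return bytes([0x45, 0]) + struct.pack("!H", 20 + len(payload)) + bytes(16) + payload


blob = fake_packet(b"abcd") + fake_packet(b"")
sizes = [len(p) for p in split_ipv4_packets(blob)]
print(sizes)

# Sending (assumption, untested): on Linux a raw IPv4 socket accepts packets
# that already carry their IP header, with the destination read from bytes 16-19:
#   s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_RAW)
#   s.sendto(pkt, (socket.inet_ntoa(pkt[16:20]), 0))
```

With an AF_PACKET socket opened as SOCK_RAW, each packet would additionally need an Ethernet header in front of it, since the file contains only the IP layer and above.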
