I have written a socket packet sniffer in Python using this code.
import socket, struct
# Setup socket object
s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_UDP)
s.bind((socket.gethostbyname(socket.gethostname()), 0))
s.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)
s.ioctl(socket.SIO_RCVALL, socket.RCVALL_ON)
Id = 0
while (True):
data = s.recvfrom(65565)
packet = data[0]
address = data[1]
header = struct.unpack("!BBHHHBBHBBBBBBBB", packet[:20])
if (header[6] == 6):
protocol = "TCP"
elif (header[6] == 17):
protocol = "UDP"
print("Number: ", Id, end="\t")
print("Protocol: ", protocol, end="\t")
print("Address: ", address, end="")
print("Header: ", header)
#print("Data: ", data)
print()
Id += 1
I know I can find the receivers IP address at index 0 of the data variable (data[0]), but where in the packet would i find the senders IP address and port number?
What do all the integers in the header tuple mean? I know header[6] is the protocol TCP/UDP, but what about the rest?
Header: (69, 0, 61, 1541, 0, 128, 17, 0, 192, 168, 56, 1, 255, 255, 255, 255)
The IP header has a well-defined structure and an RFC. You decided to unpack it with bytes and shorts. Generally speaking, you want to match the size of each field with the correct data type. Taking a look at the proper header size, you can see the individual breakdown:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification |Flags| Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time to Live | Protocol | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
This can be done programmatically as:
struct.unpack('!BBHHHBBHII')
# alternatively:
struct.unpack('!BBHHHBBH4s4s')
I prefer the latter because you can fairly easily convert it to an IP address:
Example Ethernet header:
0000 3c 4a 92 1f 04 00 74 c6 3b 8d 82 69 08 00 45 00 <J....t.;..i..E.
0010 00 3c 0a 24 40 00 40 06 23 76 0a 14 01 0d 01 01 .<.$#.#.#v......
0020 01 01 e7 7a 04 d2 a2 5e 0c d2 00 00 00 00 a0 02 ...z...^........
0030 72 10 0d 51 00 00 02 04 05 b4 04 02 08 0a 9c d4 r..Q............
0040 c0 c0 00 00 00 00 01 03 03 07 ..........
the 20-byte IPv4 header exists at packet[14:34]. Unpacking it with the above format yields this:
>>> header = struct.unpack('!BBHHHBBH4s4s', packet[14:34])
>>> header
(69, 0, 60, 2596, 16384, 64, 6, 9078, b'\n\x14\x01\r', b'\x01\x01\x01\x01')
Indices 8 and 9 are the source and destination IP's respectively. Because they are bytes, we can convert them to int and then str to get the IP in string format:
# Note: Python 3 only
>>> ip_src = '.'.join(map(str, header[8])) # 10.20.1.13
>>> ip_dst = '.'.join(map(str, header[9])) # 1.1.1.1
Related
I want to connect to a detector and receive image data. After working with Wireshark software, I realized that I should use protocol UDP. I wrote the following code in Python and got the data.
The received data sample is as follows:
import socket
UDP_IP = "1.1.1.1" #IP
UDP_PORT = 123 #port
sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
sock.bind((UDP_IP,UDP_PORT))
while True:
data, addr = sock.recvfrom(2048)
print("recived message: %s" % data)
Output:
data1 =b'\x83\xa4A\xd2\x01\x00\x00\x00\x12\x05\xf3\x04.\x04\xb7\x04#\x05Y\x04\x9f\x05\xf4\x046\x05\x89\x04\x1b\x08\x9b\x05^\x05\xb7\x04G\x05\x04\x04\xc7\x05\\\x05\xc8\x04$\x06:\x05\xf1\x05\x90\x06\x1f\x05w\x05\x8d\x05\x15\x05P\x06\x19\x05\x8a\x06e\x06\xfd\x05w\x06f\x05\x1f\x05!\x06\xed\x04\xb8\x06\xeb\x05\xd9\x05\x13\x06{\x05\x01\x07i\x07\xed\x07f\x08\x9c\x08\x80\x08\\\x08A\x08w\x08y\x089\x08<\x08\x11\x08m\x08\x1f\x08M\x07$\x08\xc5\x07\xe5\x08\xd9\x08q\x07\x7f\x08^\x08i\x08S\x08\xa7\x07.\t\x15\x08\x1d\x08\xae\x08\x8e\x07E\x08\xdc\x07\xf3\x061\t\xcd\x07\xe8\x08Y\x08\x00\x08\xb5\x08\xa3\x08\t\tt\x08q\x08w\t\r\t\xbd\x08p\x08\xfb\x08\xc5\x08_\x08\xe4\x07\n\x08/\x08\xf4\x07\xdc\x07#\x08\x99\x08\xf3\x07\xc5\x07Y\t\r\x08\xdd\x08\x91\x08\x91\x07\x91\x07?\x08\xd8\x08J\x08_\x08E\x08\xf9\x07\x1f\x08L\x08\xff\x07\xd7\x07M\x08<\x08\xdf\x08\x89\x075\tw\x08\x89\x08\x15\t......................
But I don't know how to decode the data. Wireshark itself does this. But I don't know how to decode with Python.
7c 10 c9 20 e3 31 da 55 aa 55 aa 01 08 00 45 00 05 dc 00 00 00 00 80 11 a9 97 c0 a8 05 15 c0 a8 05 14 15 be 15 be 05 c8 00 00
I use the following code, but it returns meaningless data.
data.decode("utf-16")
My first question is, how can I get from the primary data to the data that Wireshark decodes with Python?
And my second question is: How should I get from this data to the data I want, which are approximately numbers between 1000 and 10000? A number for each pixel, which is the sequence of received data, in the row and column of forming an image.
I have some text file in following format (network traffic collected by tcpdump):
1505372009.023944 00:1e:4c:72:b8:ae > 00:23:f8:93:c1:af, ethertype IPv4 (0x0800), length 97: (tos 0x0, ttl 64, id 5134, offset 0, flags [DF], proto TCP (6), length 83)
192.168.1.53.36062 > 74.125.143.139.443: Flags [P.], cksum 0x67fd (correct), seq 1255996541:1255996572, ack 1577943820, win 384, options [nop,nop,TS val 356377 ecr 746170020], length 31
0x0000: 0023 f893 c1af 001e 4c72 b8ae 0800 4500 .#......Lr....E.
0x0010: 0053 140e 4000 4006 8ab1 c0a8 0135 4a7d .S..#.#......5J}
0x0020: 8f8b 8cde 01bb 4adc fc7d 5e0d 830c 8018 ......J..}^.....
0x0030: 0180 67fd 0000 0101 080a 0005 7019 2c79 ..g.........p.,y
0x0040: a6a4 1503 0300 1a00 0000 0000 0000 04d1 ................
0x0050: c300 9119 6946 698c 67ac 47a9 368a 1748 ....iFi.g.G.6..H
0x0060: 1c .
and want to change it to:
1505372009.023944
000000: 00 23 f8 93 c1 af 00 1e 4c 72 b8 ae 08 00 45 00 .#......Lr....E.
000010: 00 53 14 0e 40 00 40 06 8a b1 c0 a8 01 35 4a 7d .S..#.#......5J}
000020: 8f 8b 8c de 01 bb 4a dc fc 7d 5e 0d 83 0c 80 18 ......J..}^.....
000030: 01 80 67 fd 00 00 01 01 08 0a 00 05 70 19 2c 79 ..g.........p.,y
000040: a6 a4 15 03 03 00 1a 00 00 00 00 00 00 00 04 d1 ................
000050: c3 00 91 19 69 46 69 8c 67 ac 47 a9 36 8a 17 48 ....iFi.g.G.6..H
000060: 1c .
Here is what I have done:
import re
regexp_time =re.compile("\d\d\d\d\d\d\d\d\d\d.\d\d\d\d\d\d+")
regexp_hex = re.compile("(\t0x\d+:\s+)([0-9a-f ]+)+ ")
with open ('../Traffic/traffic1.txt') as input,open ('../Traffic/txt2.txt','w') as output:
for line in input:
if regexp_time.match(line):
output.write ("%s\n" % (line.split()[0]))
elif regexp_hex.match(line):
words = re.split(r'\s{2,}', line)
bytes=""
for byte in words[1].split():
if len(byte) == 4:
bytes += "%s%s %s%s "%(byte[0],byte[1],byte[2],byte[3])
elif len(byte) == 2:
bytes += "%s%s "%(byte[0],byte[1])
output.write ("%s %s %s \n" % (words[0].replace("0x","00"),"{:<47}".format (bytes),words[2].replace("\n","")))
input.close()
output.close()
Could some one help me in speed up?
Edit
Here is the new version of code depends on #Austin answer, It really speed up the code.
with open ('../Traffic/traffic1.txt') as input,open ('../Traffic/txt1.txt','w') as output:
for line in input:
if line[0].isdigit():
output.write (line[:16])
output.write ('\n')
elif line.startswith("\t0x"):#(Since there is line which is not hex and not start with timestamp I should check this as well)
offset = line[:10] # " 0x0000: "
words = line[10:51] # "0023 f893 c1af 001e 4c72 b8ae 0800 4500 "
chars = line[51:] # " .#......Lr....E."
line = [offset.replace('x', '0', 1)]
for a,b,c,d,space in zip (words[0::5],words[1::5],words[2::5],words[3::5],words[4::5]):
line.append(a)
line.append(b)
line.append(space)
line.append(c)
line.append(d)
line.append(space)
line.append (chars)
output.write (''.join (line))
input.close()
output.close()
Here is the result:
1505372009.02394
000000: 00 23 f8 93 c1 af 00 1e 4c 72 b8 ae 08 00 45 00 .#......Lr....E.
000010: 00 53 14 0e 40 00 40 06 8a b1 c0 a8 01 35 4a 7d .S..#.#......5J}
000020: 8f 8b 8c de 01 bb 4a dc fc 7d 5e 0d 83 0c 80 18 ......J..}^.....
000030: 01 80 67 fd 00 00 01 01 08 0a 00 05 70 19 2c 79 ..g.........p.,y
000040: a6 a4 15 03 03 00 1a 00 00 00 00 00 00 00 04 d1 ................
000050: c3 00 91 19 69 46 69 8c 67 ac 47 a9 36 8a 17 48 ....iFi.g.G.6..H
000060: 1c .
You haven't specified anything else about your file format, including what if any lines appear between blocks of packet data. So I'm going to assume that you just have paragraphs like the one you show, jammed together.
The best way to speed up something like this is to reduce the extra operations. You have a bunch! For example:
You use a regex to match the "start" line.
You use a split to extract the timestamp from the start line.
You use a %-format operator to write the timestamp out.
You use a different regex to match a "hex" line.
You use more than one split to parse the hex line.
You use various formatting operators to output the hex line.
If you're going to use regular expression matching, then I think you should just do one match. Create an alternate pattern (like a|b) that describes both lines. Use match.lastgroup or .lastindex to decide what got matched.
But your lines are so different that I don't think a regex is needed. Basically, you can decide what sort of line you have by looking at the very first character:
if line[0].isdigit():
# This is a timestamp line
else:
# This is a hex line
For timestamp processing, all you want to do is print out the 17 characters at the start of the line: 11 digits, a dot, and 6 more digits. So do that:
if line[0].isdigit():
output.write(line[:17], '\n')
For hex line processing, you want to make two kinds of changes: you want to replace the 'x' in the hex offset with a zero. That's easy:
hexline = line.replace('x', '0', 1) # Note: 1 replacement only!
Then, you want to insert spaces between the groups of 4 hex digits, and pad the short lines so the character display appears in the same column.
This is a place where regular expression replacement might help you. There's a limited number of occurrences, but it may be that the overhead of the Cpython interpreter costs more than the setup and teardown for a regex replacement. You probably should do some profiling on this.
That said, you can split the line into three parts. It's important to capture the trailing space on the middle part, though:
offset = line[:13] # " 0x0000: "
words = line[13:53] # "0023 f893 c1af 001e 4c72 b8ae 0800 4500 "
chars = line[53:] # " .#......Lr....E."
You already know how to replace the 'x' in the offset, and there's nothing to be done to the chars portion of the line. So we'll leave those alone. The remaining task is to spread out the characters in the
words string. You can do that in various ways, but it seems easy to process the characters in chunks of 5 (4 hex digits plus a trailing space).
We can do this because we captured the trailing space on the words part. If not, you might have to use itertools.zip_longest(..., fill_value=''), but it's probably easier just to grab one more character.
With that done, you can do:
for a,b,c,d,space in zip(words[0::5], words[1::5], words[2::5], words[3::5], words[4::5]):
output.write(a, b, space, c, d, space)
Alternatively, instead of making all those calls you could accumulate the characters in a buffer and then write the buffer one time. Something like:
line = [offset]
for ...:
line.extend(a, b, space, c, d, space)
line.append(chars)
line.append('\n')
output.write(''.join(line))
That's fairly straightforward, but like I said, it may not perform quite as well as a regular-expression replacement. That would be due to the regex code running as "C" rather than python bytecode. So you should compare it against a pattern replacement like:
words = re.sub(r'(..)(..) ', '\1 \2 ', words)
Note that I didn't require hex digits, in order to cause any trailing "padding" spaces on the last line of a paragraph to expand in proportion.
Again, please check the performance against the zip version above!
Consider the below string which will be given as the input to a function.
01 02 01 0D A1 D6 72 02 00 01 00 00 00 00 53 73 F2
The highlighted part is the address I need.
If the preceding byte is 1 then I have to take only 6 octet and assign it to a variable.
If it is more than 1 the I should read 6 * Num(preceding value) and assign 6 octets for each variable.
Currently I am assigning it statically.
def main(line_input):
Device = ' '.join(line_input[9:3:-1])
Length = line_input[2]
var1 = line_input[3]
main("01 02 02 0D A1 D6 72 02 00 01 00 00 00 00 53 73 F2")
Can this be done?
Here I think this does it, let me know if there is anything that needs changing:
import string
def address_extract(line_input):
line_input = string.split(line_input, ' ')
length = 6 * int(line_input[2])
device_list = []
for x in range(3, 3+length, 6):
if x+6 > len(line_input):
print "Length multiplier too long for input string"
else:
device_list.append(' '.join(line_input[x:x+6]))
return device_list
print address_extract("01 02 02 0D A1 D6 72 02 00 01 00 00 00 00 53 73 F2")
#output = ['0D A1 D6 72 02 00', '01 00 00 00 00 53']
Here is some code that I hope will help you. I tried to add many comments to explain what is happening
import binascii
import struct
#note python 3 behaves differently and won't work with this code (personnaly I find it easyer for strings convertion to bytes)
def main(line_input):
formated_line = line_input.split(" ") #I start by cutting the input on each space character
print formated_line #the output is a list. Each element is composed of 2 chars
formated_line = [binascii.unhexlify(xx) for xx in formated_line] #create a list composed of unhelified bytes of each elements of the original list
print formated_line #the output is a list of bytes char
#can be done in one step but I try to be clearer as you are nee to python (moereover this is easyer in python-3.x)
formated_line = map(ord, formated_line) #convert to a list of int (this is not needed in python 3)
print formated_line
Length = formated_line[2] #this is an int
unformated_var1 = formated_line[3:3+(6*length)] #keep only interesting data
#now you can format your address
main("01 02 02 0D A1 D6 72 02 00 01 00 00 00 00 53 73 F2")
#if the input comes from a machine and not a human, they could exchange 17bytes instead of (17x3)characters
#main("\x01\x02\x02\x0D\xA1\xD6\x72\x02\x00\x01\x00\x00\x00\x00\x53\x73\xF2")
#then the parsing could be done with struct.unpack
I am writing a program to parse a WAV file header and print the information to the screen. Before writing the program i am doing some research
hexdump -n 48 sound_file_8000hz.wav
00000000 52 49 46 46 bc af 01 00 57 41 56 45 66 6d 74 20 |RIFF....WAVEfmt |
00000010 10 00 00 00 01 00 01 00 >40 1f 00 00< 40 1f 00 00 |........#...#...|
00000020 01 00 08 00 64 61 74 61 98 af 01 00 81 80 81 80 |....data........|
hexdump -n 48 sound_file_44100hz.wav
00000000 52 49 46 46 c4 ea 1a 00 57 41 56 45 66 6d 74 20 |RIFF....WAVEfmt |
00000010 10 00 00 00 01 00 02 00 >44 ac 00 00< 10 b1 02 00 |........D.......|
00000020 04 00 10 00 64 61 74 61 a0 ea 1a 00 00 00 00 00 |....data........|
The part between > and < in both files are the sample rate.
How does "40 1f 00 00" translate to 8000Hz and "44 ac 00 00" to 44100Hz? Information like number of channels and audio format can be read directly from the dump. I found a Python
script called WavHeader that parses the sample rate correctly in both files. This is the core of the script:
bufHeader = fileIn.read(38)
# Verify that the correct identifiers are present
if (bufHeader[0:4] != "RIFF") or \
(bufHeader[12:16] != "fmt "):
logging.debug("Input file not a standard WAV file")
return
# endif
stHeaderFields = {'ChunkSize' : 0, 'Format' : '',
'Subchunk1Size' : 0, 'AudioFormat' : 0,
'NumChannels' : 0, 'SampleRate' : 0,
'ByteRate' : 0, 'BlockAlign' : 0,
'BitsPerSample' : 0, 'Filename': ''}
# Parse fields
stHeaderFields['ChunkSize'] = struct.unpack('<L', bufHeader[4:8])[0]
stHeaderFields['Format'] = bufHeader[8:12]
stHeaderFields['Subchunk1Size'] = struct.unpack('<L', bufHeader[16:20])[0]
stHeaderFields['AudioFormat'] = struct.unpack('<H', bufHeader[20:22])[0]
stHeaderFields['NumChannels'] = struct.unpack('<H', bufHeader[22:24])[0]
stHeaderFields['SampleRate'] = struct.unpack('<L', bufHeader[24:28])[0]
stHeaderFields['ByteRate'] = struct.unpack('<L', bufHeader[28:32])[0]
stHeaderFields['BlockAlign'] = struct.unpack('<H', bufHeader[32:34])[0]
stHeaderFields['BitsPerSample'] = struct.unpack('<H', bufHeader[34:36])[0]
I do not understand how this can extract the corret sample rates, when i cannot using hexdump?
I am using information about the WAV file format from this page:
https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
The "40 1F 00 00" bytes equate to an integer whose hexadecimal value is 00001F40 (remember that the integers are stored in a WAVE file in the little endian format). A value of 00001F40 in hexadecimal equates to a decimal value of 8000.
Similarly, the "44 AC 00 00" bytes equate to an integer whose hexadecimal value is 0000AC44. A value of 0000AC44 in hexadecimal equates to a decimal value of 44100.
They're little-endian.
>>> 0x00001f40
8000
>>> 0x0000ac44
44100
A urllib2 request receives binary response as below:
00 00 00 01 00 04 41 4D 54 44 00 00 00 00 02 41
97 33 33 41 99 5C 29 41 90 3D 71 41 91 D7 0A 47
0F C6 14 00 00 01 16 6A E0 68 80 41 93 B4 05 41
97 1E B8 41 90 7A E1 41 96 8F 57 46 E6 2E 80 00
00 01 16 7A 53 7C 80 FF FF
Its structure is:
DATA, TYPE, DESCRIPTION
00 00 00 01, 4 bytes, Symbol Count =1
00 04, 2 bytes, Symbol Length = 4
41 4D 54 44, 6 bytes, Symbol = AMTD
00, 1 byte, Error code = 0 (OK)
00 00 00 02, 4 bytes, Bar Count = 2
FIRST BAR
41 97 33 33, 4 bytes, Close = 18.90
41 99 5C 29, 4 bytes, High = 19.17
41 90 3D 71, 4 bytes, Low = 18.03
41 91 D7 0A, 4 bytes, Open = 18.23
47 0F C6 14, 4 bytes, Volume = 3,680,608
00 00 01 16 6A E0 68 80, 8 bytes, Timestamp = November 23,2007
SECOND BAR
41 93 B4 05, 4 bytes, Close = 18.4629
41 97 1E B8, 4 bytes, High = 18.89
41 90 7A E1, 4 bytes, Low = 18.06
41 96 8F 57, 4 bytes, Open = 18.82
46 E6 2E 80, 4 bytes, Volume = 2,946,325
00 00 01 16 7A 53 7C 80, 8 bytes, Timestamp = November 26,2007
TERMINATOR
FF FF, 2 bytes,
How to read binary data like this?
Thanks in advance.
Update:
I tried struct module on first 6 bytes with following code:
struct.unpack('ih', response.read(6))
(16777216, 1024)
But it should output (1, 4). I take a look at the manual but have no clue what was wrong.
So here's my best shot at interpreting the data you're giving...:
import datetime
import struct
class Printable(object):
specials = ()
def __str__(self):
resultlines = []
for pair in self.__dict__.items():
if pair[0] in self.specials: continue
resultlines.append('%10s %s' % pair)
return '\n'.join(resultlines)
head_fmt = '>IH6sBH'
head_struct = struct.Struct(head_fmt)
class Header(Printable):
specials = ('bars',)
def __init__(self, symbol_count, symbol_length,
symbol, error_code, bar_count):
self.__dict__.update(locals())
self.bars = []
del self.self
bar_fmt = '>5fQ'
bar_struct = struct.Struct(bar_fmt)
class Bar(Printable):
specials = ('header',)
def __init__(self, header, close, high, low,
open, volume, timestamp):
self.__dict__.update(locals())
self.header.bars.append(self)
del self.self
self.timestamp /= 1000.0
self.timestamp = datetime.date.fromtimestamp(self.timestamp)
def showdata(data):
terminator = '\xff' * 2
assert data[-2:] == terminator
head_data = head_struct.unpack(data[:head_struct.size])
try:
assert head_data[4] * bar_struct.size + head_struct.size == \
len(data) - len(terminator)
except AssertionError:
print 'data length is %d' % len(data)
print 'head struct size is %d' % head_struct.size
print 'bar struct size is %d' % bar_struct.size
print 'number of bars is %d' % head_data[4]
print 'head data:', head_data
print 'terminator:', terminator
print 'so, something is wrong, since',
print head_data[4] * bar_struct.size + head_struct.size, '!=',
print len(data) - len(terminator)
raise
head = Header(*head_data)
for i in range(head.bar_count):
bar_substr = data[head_struct.size + i * bar_struct.size:
head_struct.size + (i+1) * bar_struct.size]
bar_data = bar_struct.unpack(bar_substr)
Bar(head, *bar_data)
assert len(head.bars) == head.bar_count
print head
for i, x in enumerate(head.bars):
print 'Bar #%s' % i
print x
datas = '''
00 00 00 01 00 04 41 4D 54 44 00 00 00 00 02 41
97 33 33 41 99 5C 29 41 90 3D 71 41 91 D7 0A 47
0F C6 14 00 00 01 16 6A E0 68 80 41 93 B4 05 41
97 1E B8 41 90 7A E1 41 96 8F 57 46 E6 2E 80 00
00 01 16 7A 53 7C 80 FF FF
'''
data = ''.join(chr(int(x, 16)) for x in datas.split())
showdata(data)
this emits:
symbol_count 1
bar_count 2
symbol AMTD
error_code 0
symbol_length 4
Bar #0
volume 36806.078125
timestamp 2007-11-22
high 19.1700000763
low 18.0300006866
close 18.8999996185
open 18.2299995422
Bar #1
volume 29463.25
timestamp 2007-11-25
high 18.8899993896
low 18.0599994659
close 18.4629001617
open 18.8199901581
...which seems to be pretty close to what you want, net of some output formatting details. Hope this helps!-)
>>> data
'\x00\x00\x00\x01\x00\x04AMTD\x00\x00\x00\x00\x02A\x9733A\x99\\)A\x90=qA\x91\xd7\nG\x0f\xc6\x14\x00\x00\x01\x16j\xe0h\x80A\x93\xb4\x05A\x97\x1e\xb8A\x90z\xe1A\x96\x8fWF\xe6.\x80\x00\x00\x01\x16zS|\x80\xff\xff'
>>> from struct import unpack, calcsize
>>> scount, slength = unpack("!IH", data[:6])
>>> assert scount == 1
>>> symbol, error_code = unpack("!%dsb" % slength, data[6:6+slength+1])
>>> assert error_code == 0
>>> symbol
'AMTD'
>>> bar_count = unpack("!I", data[6+slength+1:6+slength+1+4])
>>> bar_count
(2,)
>>> bar_format = "!5fQ"
>>> from collections import namedtuple
>>> Bar = namedtuple("Bar", "Close High Low Open Volume Timestamp")
>>> b = Bar(*unpack(bar_format, data[6+slength+1+4:6+slength+1+4+calcsize(bar_format)]))
>>> b
Bar(Close=18.899999618530273, High=19.170000076293945, Low=18.030000686645508, Open=18.229999542236328, Volume=36806.078125, Timestamp=1195794000000L)
>>> import time
>>> time.ctime(b.Timestamp//1000)
'Fri Nov 23 08:00:00 2007'
>>> int(b.Volume*100 + 0.5)
3680608
>>> struct.unpack('ih', response.read(6))
(16777216, 1024)
You are unpacking big-endian data on a little-endian machine. Try this instead:
>>> struct.unpack('!IH', response.read(6))
(1L, 4)
This tells unpack to consider the data in network-order (big-endian). Also, the values of counts and lengths can not be negative, so you should should use the unsigned variants in your format string.
Take a look at the struct.unpack in the struct module.
Use pack/unpack functions from "struct" package. More info here http://docs.python.org/library/struct.html
Bye!
As it was already mentioned, struct is the module you need to use.
Please read its documentation to learn about byte ordering, etc.
In your example you need to do the following (as your data is big-endian and unsigned):
>>> import struct
>>> x = '\x00\x00\x00\x01\x00\x04'
>>> struct.unpack('>IH', x)
(1, 4)