Reading UDP Packets - python

I am having some trouble dissecting a UDP packet. I am receiving the packets and storing the data and sender-address in variables 'data' and 'addr' with:
data,addr = UDPSock.recvfrom(buf)
This parses the data as a string, that I am now unable to turn into bytes. I know the structure of the datagram packet which is a total of 28 bytes, and that the data I am trying to get out is in bytes 17:28.
I have tried doing this:
mybytes = data[16:19]
print struct.unpack('>I', mybytes)
--> struct.error: unpack str size does not match format
And this:
response = (0, 0, data[16], data[17], 6)
bytes = array('B', response[:-1])
print struct.unpack('>I', bytes)
--> TypeError: Type not compatible with array type
And this:
print "\nData byte 17:", str.encode(data[17])
--> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
More specifically I want to parse what I think is an unsigned int. And now I am not sure what to try next. I am completely new to sockets and byte-conversions in Python, so any advice would be helpful :)
Thanks,
Thomas

An unsigned int32 is 4 bytes long, so you have to feed 4 bytes into struct.unpack.
Replace
mybytes = data[16:19]
with
mybytes = data[16:20]
(the right-hand index of a slice is the first byte not included, just as range(16,19) = [16,17,18]) and you should be good to go.
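To make the off-by-one concrete, here is a minimal Python 3 sketch. The 28-byte datagram is made up for illustration (the real packet layout is known only to the asker), and read_u32 is a hypothetical helper name. It uses struct.unpack_from, which takes an offset, so the slice length cannot be miscounted.

```python
import struct

def read_u32(data, offset):
    """Read a big-endian unsigned 32-bit integer starting at `offset`."""
    return struct.unpack_from('>I', data, offset)[0]

# Hypothetical 28-byte datagram: 16 header bytes, then three 32-bit fields.
datagram = bytes(16) + struct.pack('>III', 1234, 5, 6)

value = read_u32(datagram, 16)                  # reads bytes 16..19
same = struct.unpack('>I', datagram[16:20])[0]  # equivalent slice form
print(value, same)
```

In Python 3, recvfrom already returns a bytes object, so no string-to-bytes conversion is needed before unpacking.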

Related

How to convert a byte array to string?

I just finished creating a Huffman compression algorithm. I converted my compressed text from a string to a byte array with bytearray(). I'm now attempting to decompress it. My only concern is that I cannot convert my byte array back into a string. Is there any built-in function I could use to convert my byte array (held in a variable) back into a string? If not, is there a better way to convert my compressed string to something else? I attempted to use byte_array.decode() and I get this:
print("Index: ", Index) # The Index
# Substituting text to our compressed index
for x in range(len(TextTest)):
    TextTest[x]=Index[TextTest[x]]
NewText=''.join(TextTest)
# print(NewText)
# NewText=int(NewText)
byte_array = bytearray() # Converts the compressed string text to bytes
for i in range(0, len(NewText), 8):
    byte_array.append(int(NewText[i:i + 8], 2))
NewSize = ("Compressed file Size:",sys.getsizeof(byte_array),'bytes')
print(byte_array)
print(NewSize)
x=bytes(byte_array)
x.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 0: invalid start byte
You can use .decode('ascii') (leave empty for utf-8).
>>> print(bytearray("abcd", 'utf-8').decode())
abcd
Source : Convert bytes to a string?
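Note that .decode() only succeeds when the bytes are valid text in the chosen codec. Packed Huffman bits are arbitrary bytes, which is why 0x88 fails as UTF-8 (and would fail as ASCII too). For decompression, what is usually needed is the bit string back rather than text. A minimal sketch, assuming the bit string was packed 8 bits at a time as in the question; pack_bits and unpack_bits are hypothetical names, and passing the original bit count is an added assumption so that a short final chunk can be restored.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte."""
    packed = bytearray()
    for i in range(0, len(bits), 8):
        packed.append(int(bits[i:i + 8], 2))
    return bytes(packed)

def unpack_bits(packed, bit_count):
    """Rebuild the original bit string; bit_count is its original length."""
    full, rest = divmod(bit_count, 8)
    chunks = [format(b, '08b') for b in packed[:full]]
    if rest:
        # The last byte held only `rest` bits, so pad it to that width.
        chunks.append(format(packed[full], '0{}b'.format(rest)))
    return ''.join(chunks)

bits = '1000100001011'   # 13 bits: one full byte plus 5 leftover bits
packed = pack_bits(bits)
restored = unpack_bits(packed, len(bits))
print(packed, restored == bits)
```

Without the stored bit count, leading zeros in the final chunk would be lost, so the count has to travel alongside the compressed bytes.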

Python - decoding bytes in struct

I am building a parser, and I'm kind of new to this.
I have a problem decoding specific bytes: they always return the same int (and they shouldn't), so I must be doing it wrong.
byte = ser.read(1)
byte += ser.read(ser.inWaiting())
a = 0
for i in byte:
    if i == 0x04:
        value = struct.unpack("<h", bytes([i, a]))[0]
        print (value)
I receive bytes like this:
b'\xaa\x04\x80\x02\xff\xfb\x83\xaa\xaa\x04\x80\
And I need to decode packet 0x04. I am using Python 3.6
Try something like :
value = int.from_bytes(byte, byteorder='little')
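In the loop from the question, bytes([i, a]) is always b'\x04\x00' because a never changes, so the unpacked value is always 4. A minimal sketch of reading the bytes that follow the marker instead. It assumes that the two bytes after each 0x04 form a little-endian signed 16-bit value; this is a guess at the packet layout, which the question does not specify. values_after_marker is a hypothetical name.

```python
import struct

def values_after_marker(buf, marker=0x04):
    """Return the '<h' value that follows each marker byte in buf."""
    values = []
    i = 0
    while i < len(buf):
        if buf[i] == marker and i + 3 <= len(buf):
            values.append(struct.unpack_from('<h', buf, i + 1)[0])
            i += 3   # skip the marker and its two payload bytes
        else:
            i += 1
    return values

sample = b'\xaa\x04\x80\x02\xff\xfb\x83\xaa'
found = values_after_marker(sample)
# int.from_bytes gives the same result for a single two-byte field.
single = int.from_bytes(sample[2:4], byteorder='little', signed=True)
print(found, single)
```

If the payload after 0x04 has a different width or signedness, only the format string and the skip distance need to change.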

Python String Prefix by 4 Byte Length

I'm trying to write a server in Python to communicate with a pre-existing client whose message packets are ASCII strings, but prepended by four-byte unsigned integer values representative of the length of the remaining string.
I've done a receiver, but I'm sure there's a more Pythonic way. Until I find it, I haven't done the sender. I can easily calculate the message length, convert it to bytes and transmit the message. The bit I'm struggling with is creating an integer which is an array of four bytes.
Let me clarify: If my string is 260 characters in length, I wish to prepend a big-endian four byte integer representation of 260. So, I don't want the ASCII string "0260" in front of the string, rather, I want four (non-ASCII) bytes representative of 0x00000104.
My code to receive the length prepended string from the client looks like this:
sizeBytes = 4 # size of the integer representing the string length
# receive big-endian 4 byte integer from client
data = conn.recv(sizeBytes)
if not data:
    break
dLen = 0
for i in range(sizeBytes):
    dLen = dLen + pow(256,i) * data[sizeBytes-i-1] # each byte is one base-256 digit
data = str(conn.recv(dLen),'UTF-8')
I could simply do the reverse. I'm new to Python and feel that what I've done is probably longhand!!
1) Is there a better way of receiving and decoding the length?
2) What's the "sister" method to encode the length for transmission?
Thanks.
The struct module is helpful here
for writing:
import struct
msg = 'some message containing 260 ascii characters'
length = len(msg)
encoded_length = struct.pack('>I', length)
For a length of 260, encoded_length will be a 4-byte string with the value '\x00\x00\x01\x04' (a bytes object, b'\x00\x00\x01\x04', in Python 3).
for reading:
length = struct.unpack('>I', received_msg[:4])[0]
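Putting both directions together, here is a minimal Python 3 sketch of length-prefixed framing that works on an in-memory buffer rather than a live socket. frame and unframe are hypothetical names. Since recv may return fewer bytes than requested, unframe reports an incomplete message instead of assuming the whole payload has arrived.

```python
import struct

HEADER = struct.Struct('>I')   # 4-byte big-endian unsigned length

def frame(text):
    """Prefix an ASCII message with its length as a 4-byte integer."""
    payload = text.encode('ascii')
    return HEADER.pack(len(payload)) + payload

def unframe(buffer):
    """Return (message, remaining_bytes), or (None, buffer) if incomplete."""
    if len(buffer) < HEADER.size:
        return None, buffer
    (length,) = HEADER.unpack_from(buffer)
    end = HEADER.size + length
    if len(buffer) < end:
        return None, buffer
    return buffer[HEADER.size:end].decode('ascii'), buffer[end:]

wire = frame('x' * 260) + frame('hi')
first, rest = unframe(wire)
second, rest = unframe(rest)
print(wire[:4], len(first), second)
```

A receiver would append each recv result to the buffer and call unframe until it returns None.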
An example using asyncio:
import asyncio
import struct
def send_message(writer, message):
    data = message.encode()
    size = struct.pack('>L', len(data))
    writer.write(size + data)
async def receive_message(reader):
    data = await reader.readexactly(4)
    size = struct.unpack('>L', data)[0]
    data = await reader.readexactly(size)
    return data.decode()

Conversion between ISO-8859-2 and UTF-8 in Python

I'm wondering how I can convert ISO-8859-2 (Latin-2) characters (I mean integer or hex values that represent ISO-8859-2 encoded characters) to UTF-8 characters.
What I need to do with my project in python:
Receive hex values from serial port, which are characters encoded in ISO-8859-2.
Decode them, this is - get "standard" python unicode strings from them.
Prepare and write xml file.
Using Python 3.4.3
txt_str = "ąęłóźć"
txt_str.decode('ISO-8859-2')
Traceback (most recent call last): File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
The main problem is still to prepare valid input for the "decode" method (it works in Python 2.7.10, and that's the one I'm using in this project). How do I prepare a valid string from decimal values, which are Latin-2 code numbers?
Note that it would be uber complicated to receive utf-8 characters from serial port, thanks to devices I'm using and communication protocol limitations.
Sample data, on request:
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069
This is some sample data. ISO-8859-2 pushed into uint32, 4 chars per int.
bit of code that manages unboxing:
l = l[7:].replace(",", "").replace(".", "").replace("\n","").replace("\r","") # crop string from uart, only data left
vl = [l[0:2], l[2:4], l[4:6], l[6:8]] # list of bytes
vl = vl[::-1] # reverse them - now in actual order
To get integer value out of hex string I can simply use:
int_vals = [int(hs, 16) for hs in vl]
Your example doesn't work because you've tried to use a str to hold bytes. In Python 3 you must use byte strings.
In reality, if you're using PySerial then you'll be reading byte strings anyway, which you can convert as required:
with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
    s = ser.read(10)
    # Py3: s == bytes
    # Py2.x: s == str
    my_unicode_string = s.decode('iso-8859-2')
If your iso-8859-2 data is actually sent as an ASCII hex representation of the bytes, then you have to apply an extra layer of decoding:
import codecs
with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
    hex_repr = ser.read(10)
    # Py3: hex_repr == bytes
    # Py2.x: hex_repr == str
    # Decodes hex representation to bytes
    # Eg. b"A3" = b'\xa3'
    hex_decoded = codecs.decode(hex_repr, "hex")
    my_unicode_string = hex_decoded.decode('iso-8859-2')
Now you can pass my_unicode_string to your favourite XML library.
Interesting sample data. Ideally your sample data should be a direct print of the raw data received from PySerial. If you actually are receiving the raw bytes as 8-digit hexadecimal values, then:
#!python3
from binascii import unhexlify
data = b''.join(unhexlify(x)[::-1] for x in b'''\
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069'''.splitlines())
print(data.decode('iso-8859-2'))
Output:
W chuj bardzo długa nazwa jakiejś zapyziałej pipidówy, brudnej ulicyumer najgorszej rudery we wsi
Google Translate of Polish to English:
The dick very long name some zapyziałej Small Town , dirty ulicyumer worst hovel in the village
This topic is closed. Working code that handles what needs to be done:
x=177
x.to_bytes(1, byteorder='big').decode("ISO-8859-2")
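The same idea extends from a single code number to the whole stream. A minimal Python 3 sketch, assuming each 8-digit hex word holds four Latin-2 characters in reversed byte order as in the sample data, and that trailing NUL bytes are padding (an assumption based on the last sample word). decode_words is a hypothetical name.

```python
def decode_words(hex_words, encoding='iso-8859-2'):
    """Turn 8-digit hex words (4 reversed bytes each) into a str."""
    raw = b''.join(bytes.fromhex(word)[::-1] for word in hex_words)
    return raw.rstrip(b'\x00').decode(encoding)

text = decode_words(['68632057', 'B364206F', '00000069'])
single = bytes([177]).decode('iso-8859-2')   # one code number to one character

# Re-encoding the str gives the UTF-8 bytes to write to the XML file.
print(text.encode('utf-8'))
print(single.encode('utf-8'))
```

Once the data is a str, the XML library decides the output encoding; UTF-8 is only produced at write time.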

Python UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 74: invalid start byte

I have some data in the hbase stored as bytes and strings combined delimited by \x00 padding.
So the row in my hbase looks like:-
00:00:00:00:00:00\x00\x80\x00\x00\x00U\xEF\xA0\xB00\x002\x0040.0.2.1\x00
There is value corresponding to this row (key) which is 100.
Row description:-
00:00:00:00:00:00 - This is mac address and is a string
\x80\x00\x00\x00U\xEF\xA0\xB00 - This is the time which is saved as bytes
2 - this is customer id number stored as string
40.0.2.1 - this is store ID stored as string
I have used the starbase module to connect Python to its Stargate server.
Here is my code snippet to connection to starbase and to the hbase table, and try fetching out the value of that row:-
from starbase import Connection
import starbase
C = Connection(host='10.10.5.2', port='60010')
get_table = C.table('dummy_table')
mac_address = "00:00:00:00:00:00"
time_start = "\x80\x00\x00\x00U\xEF\xA0\xB00"
cus_id = "2"
store_id = "40.0.2.1"
create_query = "%s\x00%s\x00%s\x00%s\x00" % (mac_address,time_start,cus_id,store_id)
fetch_result = get_table.fetch(create_query)
print fetch_result
Expected output is:-
100
You don't have to worry about the starbase connection and its methods. They work just fine when everything is a string, but now that the time is stored as bytes, I get this error:-
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 74: invalid start byte
Just in case you need to see the output of create_query when I print it:-
00:00:1E:00:C8:36▒U▒v▒130.0.2.6
I would highly appreciate some help. Thanks
Try this:
time_start = "\\x80\\x00\\x00\\x00U\\xEF\\xA0\\xB00"
\x is the escape sequence for a hex byte value, so the line
create_query = "%s\x00%s\x00%s\x00%s\x00" % (mac_address,time_start,cus_id,store_id)
was inserting the raw bytes of time_start into the string. Since the byte \x80 is not valid UTF-8 on its own, decoding the query threw the error.
My guess would be that your database doesn't support storing bytes in these fields; perhaps you must store strings.
One approach would be to convert your bytes into base64 strings before storing them in the database. For example:
>>> from base64 import b64encode, b64decode
>>> b64encode("\x80\x00\x00\x00U\xEF\xA0\xB00")
'gAAAAFXvoLAw'
>>> b64decode(_)
'\x80\x00\x00\x00U\xef\xa0\xb00'
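The session above is Python 2, where str holds bytes. A minimal Python 3 sketch of the same round trip, in which b64encode takes and returns bytes, so the result is decoded to ASCII to get a plain string that is safe to embed in a row key. Whether the HBase table should store base64 keys is the answerer's suggestion, not something the question confirms.

```python
from base64 import b64encode, b64decode

time_start = b"\x80\x00\x00\x00U\xEF\xA0\xB00"   # raw bytes, not text

as_text = b64encode(time_start).decode('ascii')  # str, ASCII only
round_trip = b64decode(as_text)                  # back to the raw bytes
print(as_text, round_trip == time_start)
```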
