Python message partition error - python

I'm receiving the following message through TCP:
{"message": "Start", "client": "134.106.74.21", "type": 1009}<EOM>
but when I try to partition it with
msg.partition( "<EOM>" )
I get the following tuple:
('{\x00\x00\x00"\x00\x00\x00m\x00\x00\x00e\x00\x00\x00s\x00\x00\x00s\x00\x00\x00a\x00
\x00\x00g\x00\x00\x00e\x00\x00\x00"\x00\x00\x00:\x00\x00\x00 \x00\x00\x00"\x00\x00\x00#
\x00\x00\x00B\x00\x00\x00E\x00\x00\x00G\x00\x00\x00I\x00\x00\x00N\x00\x00\x00;\x00\x00
\x00A\x00\x00\x00l\x00\x00\x00l\x00\x00\x00;\x00\x00\x000\x00\x00\x00;\x00\x00\x001\x00\x00
\x00;\x00\x00\x000\x00\x00\x00;\x00\x00\x001\x00\x00\x003\x00\x00\x004\x00\x00\x00.\x00\x00
\x001\x00\x00\x000\x00\x00\x006\x00\x00\x00.\x00\x00\x007\x00\x00\x004\x00\x00\x00.\x00\x00
\x001\x00\x00\x002\x00\x00\x005\x00\x00\x00:\x00\x00\x003\x00\x00\x000\x00\x00\x000\x00\x00
\x000\x00\x00\x000\x00\x00\x00;\x00\x00\x00#\x00\x00\x00E\x00\x00\x00N\x00\x00\x00D\x00\x00
\x00"\x00\x00\x00,\x00\x00\x00 \x00\x00\x00"\x00\x00\x00c\x00\x00\x00l\x00\x00\x00i\x00\x00
\x00e\x00\x00\x00n\x00\x00\x00t\x00\x00\x00"\x00\x00\x00:\x00\x00\x00 \x00\x00\x00"\x00
\x00\x001\x00\x00\x003\x00\x00\x004\x00\x00\x00.\x00\x00\x001\x00\x00\x000\x00\x00\x006
\x00\x00\x00.\x00\x00\x007\x00\x00\x004\x00\x00\x00.\x00\x00\x001\x00\x00\x002\x00\x00
\x005\x00\x00\x00"\x00\x00\x00,\x00\x00\x00 \x00\x00\x00"\x00\x00\x00t\x00\x00\x00y\x00
\x00\x00p\x00\x00\x00e\x00\x00\x00"\x00\x00\x00:\x00\x00\x00 \x00\x00\x002\x00\x00\x000
\x00\x00\x000\x00\x00\x005\x00\x00\x00}\x00\x00\x00<\x00\x00\x00E\x00\x00\x00O\x00\x00\x00M
\x00\x00\x00>\x00\x00\x00{"message": "Start", "client": "134.106.74.21", "type": 1009}',
'', '')
Updated
try:
    # Check whether there are messages; if not, an exception is thrown, otherwise continue
    ans = self.request.recv( 20480 )
    if( ans ):
        recv = self.getMessage( recv + ans )
    else:
        # Master client disconnected
        break
except:
    ...
def getMessage( self, msg ):
    print( "masg:" + msg );
    aSplit = msg.partition( "<EOM>" )
    while( aSplit[ 1 ] == "<EOM>" ):
        self.recvMessageHandler( json.loads( aSplit[ 0 ] ) )
        # Get the next message, if any (partition returns a 3-tuple, so the tail is index 2)
        msg = aSplit[ 2 ]
        aSplit = msg.partition( "<EOM>" )
    return msg;
The problem occurs when I concatenate the two strings:
recv + ans

If you print msg.encode("hex") then you will likely see that this is exactly what is in the string.
In any case, you may have noticed that every 4th byte of the result is one of the characters that you expected. This suggests that you have a UCS4 Unicode string that you are not handling properly.
Did you receive UCS4-encoded bytes? If so, you should decode them into a unicode string, for example with ans.decode("utf-32-le"), before concatenating and partitioning. But if you are receiving UCS4-encoded bytes and you have any influence over the sender, you really should get things changed to transmit and receive UTF-8 encoded strings, since that is more common over network connections.
Are you sure that the 5 literal bytes < E O M > are indeed the delimiter that you need to use for partitioning? Or is it supposed to be the single-byte ASCII code named EOM? Or is it a UCS4-encoded u"<EOM>"?
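A minimal sketch of the decoding idea, written for Python 3 and assuming the whole stream really is little-endian UCS4 (UTF-32-LE) text in the ASCII range with a UCS4-encoded "<EOM>" delimiter. The function name split_messages is made up for illustration and is not part of the original code:

```python
def split_messages(buffer, delimiter="<EOM>", encoding="utf-32-le"):
    """Split a raw byte buffer into complete messages plus leftover bytes.

    Assumption: both the payload and the delimiter use the same encoding.
    """
    raw_delimiter = delimiter.encode(encoding)
    messages = []
    while True:
        head, sep, tail = buffer.partition(raw_delimiter)
        if not sep:
            # No complete message left; keep the remainder for the next recv()
            break
        messages.append(head.decode(encoding))
        buffer = tail
    return messages, buffer


raw = '{"type": 1009}<EOM>{"ty'.encode("utf-32-le")
complete, leftover = split_messages(raw)
print(complete)
print(leftover.decode("utf-32-le"))
```

The leftover bytes are kept undecoded on purpose, because a recv() call may end in the middle of a 4-byte character.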

Related

Hex to string, the python way, in powershell

A bit of a weird question perhaps, but I'm trying to replicate a python example where they are creating a HMAC SHA256 hash from a series of parameters.
I've run into a problem where I'm supposed to translate an api key in hex to ascii and use it as secret, but I just can't get the output to be the same as python.
>>> import hmac
>>> import hashlib
>>> apiKey = "76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b"
>>> apiKey.decode('hex')
'v\xb0,\x0cEC\xa8^EU$fiL\xf6w\x93x3\xc9\xcc\xe8\x7f\nb\x82H\xaf-,I['
If I've understood the material online this is supposed to represent the hex string in ascii characters.
Now to the powershell script:
$apikey = '76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b';
$hexstring = ""
for($i=0; $i -lt $apikey.Length;$i=$i+2){
    $hexelement = [string]$apikey[$i] + [string]$apikey[$i+1]
    $hexstring += [CHAR][BYTE][CONVERT]::toint16($hexelement,16)
}
That outputs the following:
v°,♀EC¨^EU$fiLöw?x3ÉÌè⌂
b?H¯-,I[
They are almost the same, but not quite and using them as secret in the HMAC generates different results. Any ideas?
Stating the obvious: The key in this example is not the real key.
Update:
They look more or less the same, but the encoding of the output is different. I also verified the hex to ASCII in multiple online functions and the powershell version seems right.
Does anyone have an idea how to compare the two different outputs?
Update 2:
I converted each character to an integer, and both Python and PowerShell generate the same numbers, so the content should be the same.
Attaching the scripts
Powershell:
Function generateToken {
    Param($apikey, $url, $httpMethod, $queryparameters=$false, $postData=$false)
    #$timestamp = [int]((Get-Date -UFormat %s).Replace(",", "."))
    $timestamp = "1446128942"
    $datastring = $httpMethod + $url
    if($queryparameters){ $datastring += $queryparameters }
    $datastring += $timestamp
    if($postData){ $datastring += $postData }
    $hmacsha = New-Object System.Security.Cryptography.HMACSHA256
    $apiAscii = HexToString -hexstring $apiKey
    $hmacsha.key = [Text.Encoding]::ASCII.GetBytes($apiAscii)
    $signature = $hmacsha.ComputeHash([Text.Encoding]::ASCII.GetBytes($datastring))
    $signature
}
Function HexToString {
    Param($hexstring)
    $asciistring = ""
    for($i=0; $i -lt $hexstring.Length;$i=$i+2){
        $hexelement = [string]$hexstring[$i] + [string]$hexstring[$i+1]
        $asciistring += [CHAR][BYTE][CONVERT]::toint16($hexelement,16)
    }
    $asciistring
}
Function TokenToHex {
    Param([array]$Token)
    $hexhash = ""
    Foreach($element in $Token){
        $hexhash += '{0:x}' -f $element
    }
    $hexhash
}
$apiEndpoint = "http://test.control.llnw.com/traffic-reporting-api/v1"
#what you see in Control on Edit My Profile page#
$apikey = '76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b';
$queryParameters = "shortname=bulkget&service=http&reportDuration=day&startDate=2012-01-01"
$postData = "{param1: 123, param2: 456}"
$token = generateToken -uri $apiEndpoint -httpMethod "GET" -queryparameters $queryParameters, postData=postData, -apiKey $apiKey
TokenToHex -Token $token
Python:
import hashlib
import hmac
import time
try: import simplejson as json
except ImportError: import json
class HMACSample:
    def generateSecurityToken(self, url, httpMethod, apiKey, queryParameters=None, postData=None):
        #timestamp = str(int(round(time.time()*1000)))
        timestamp = "1446128942"
        datastring = httpMethod + url
        if queryParameters != None : datastring += queryParameters
        datastring += timestamp
        if postData != None : datastring += postData
        token = hmac.new(apiKey.decode('hex'), msg=datastring, digestmod=hashlib.sha256).hexdigest()
        return token
if __name__ == '__main__':
    apiEndpoint = "http://test.control.llnw.com/traffic-reporting-api/v1"
    #what you see in Control on Edit My Profile page#
    apiKey = "76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b";
    queryParameters = "shortname=bulkget&service=http&reportDuration=day&startDate=2012-01-01"
    postData = "{param1: 123, param2: 456}"
    tool = HMACSample()
    hmac = tool.generateSecurityToken(url=apiEndpoint, httpMethod="GET", queryParameters=queryParameters, postData=postData, apiKey=apiKey)
    print json.dumps(hmac, indent=4)
apiKey with "test" instead of the converted hex to ASCII string outputs the same value which made me suspect that the conversion was the problem. Now I'm not sure what to believe anymore.
/Patrik
ASCII encoding supports only code points in the range 0–127. Any character outside this range is encoded as byte 63, which corresponds to ?, as you can see if you decode the byte array back to a string. So your code ruins the key by applying ASCII encoding to it. And if what you want is a byte array, why go Hex String -> ASCII String -> Byte Array instead of just Hex String -> Byte Array?
Here is PowerShell code that generates the same result as your Python code:
function GenerateToken {
    param($apikey, $url, $httpMethod, $queryparameters, $postData)
    $datastring = -join @(
        $httpMethod
        $url
        $queryparameters
        #[DateTimeOffset]::Now.ToUnixTimeSeconds()
        1446128942
        $postData
    )
    $hmacsha = New-Object System.Security.Cryptography.HMACSHA256
    $hmacsha.Key = @($apikey -split '(?<=\G..)(?=.)'|ForEach-Object {[byte]::Parse($_,'HexNumber')})
    [BitConverter]::ToString($hmacsha.ComputeHash([Text.Encoding]::UTF8.GetBytes($datastring))).Replace('-','').ToLower()
}
$apiEndpoint = "http://test.control.llnw.com/traffic-reporting-api/v1"
#what you see in Control on Edit My Profile page#
$apikey = '76b02c0c4543a85e45552466694cf677937833c9cce87f0a628248af2d2c495b';
$queryParameters = "shortname=bulkget&service=http&reportDuration=day&startDate=2012-01-01"
$postData = "{param1: 123, param2: 456}"
GenerateToken -url $apiEndpoint -httpMethod "GET" -queryparameters $queryParameters -postData $postData -apiKey $apiKey
I also fixed some other errors in your PowerShell code, in particular the arguments to the GenerateToken function call. I also changed ASCII to UTF8 for the $datastring encoding. UTF8 yields exactly the same bytes if all characters are in the ASCII range, so it does not matter in your case. But if you want to use characters outside the ASCII range in $datastring, then you should choose the same encoding as you use in Python, or you will not get the same results.
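The effect of the ASCII detour can be reproduced in a few lines of Python 3. This is only an illustrative sketch: the helper names generate_token and lossy_ascii_key are invented here, and the shortened key is a made-up prefix of the example key, not a real secret.

```python
import hashlib
import hmac


def generate_token(api_key_hex, data):
    """HMAC-SHA256 with the key taken straight from hex to bytes."""
    key = bytes.fromhex(api_key_hex)
    return hmac.new(key, data.encode("utf-8"), hashlib.sha256).hexdigest()


def lossy_ascii_key(api_key_hex):
    """Mimic the hex -> string -> ASCII bytes detour from the question."""
    text = "".join(chr(b) for b in bytes.fromhex(api_key_hex))
    # Every character above 127 is replaced by '?' (byte 63)
    return text.encode("ascii", errors="replace")


short_key = "76b02c0c"
print(bytes.fromhex(short_key))    # the intended key bytes
print(lossy_ascii_key(short_key))  # 0xb0 has been replaced by '?'
print(generate_token(short_key, "GET" + "1446128942"))
```

Because the two keys differ in every byte above 127, the resulting HMAC values differ as well, which matches the symptom described in the question.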

unpack requires a string argument of length 24

I am not sure what I am doing wrong here but I am trying to open a file, trace1.flow, read the header information then throw the source IP and destination IP into dictionaries. This is done in Python running on a Fedora VM. I am getting the following error:
(secs, nsecs, booted, exporter, mySourceIP, myDestinationIP) = struct.unpack('IIIIII',myBuf)
struct.error: unpack requires a string argument of length 24
Here is my code:
import struct
import socket
#Dictionaries
uniqSource = {}
uniqDestination = {}
def int2quad(i):
    z = struct.pack('!I', i)
    return socket.inet_ntoa(z)
myFile = open('trace1.flow')
myBuf = myFile.read(8)
(magic, endian, version, headerLen) = struct.unpack('HBBI', myBuf)
print "Magic: ", hex(magic), "Endian: ", endian, "Version: ", version, "Header Length: ", headerLen
myFile.read(headerLen - 8)
try:
    while(True):
        myBuf = myFile.read(24)
        (secs, nsecs, booted, exporter, mySourceIP, myDestinationIP) = struct.unpack('IIIIII',myBuf)
        mySourceIP = int2quad(mySourceIP)
        myDestinationIP = int2quad(myDestinationIP)
        if mySourceIP not in uniqSource:
            uniqSource[mySourceIP] = 1
        else:
            uniqSource[mySourceIP] += 1
        if myDestinationIP not in uniqDestination:
            uniqDestination[myDestinationIP] = 1
        else:
            uniqDestination[myDestinationIP] += 1
        myFile.read(40)
except EOFError:
    print "END OF FILE"
You seem to assume that file.read will raise EOFError on end of file, but this error is only raised by input() and raw_input(). file.read will simply return a string that's shorter than requested (possibly empty).
So you need to check the length after reading:
myBuf = myFile.read(24)
if len(myBuf) < 24:
    break
Perhaps you have reached end-of-file. Check the length of myBuf:
len(myBuf)
It's probably less than 24 chars long. Also, you don't need those extra parentheses, and you can specify repeated types with a count prefix such as '6I', like this:
secs, nsecs, booted, exporter, mySourceIP, myDestinationIP = struct.unpack('6I',myBuf)
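Both suggestions can be combined into one read loop. The following is a sketch for Python 3 that reads from any binary stream; the function name count_addresses and the use of collections.Counter are choices made here for illustration, and the 40-byte skip is copied from the question.

```python
import io
import socket
import struct
from collections import Counter

RECORD = struct.Struct("6I")  # same layout as 'IIIIII' in the question
SKIP = 40                     # bytes skipped after each record, as in the question


def count_addresses(stream):
    """Count source and destination IPs until the stream runs out of data."""
    sources, destinations = Counter(), Counter()
    while True:
        buf = stream.read(RECORD.size)
        if len(buf) < RECORD.size:
            # read() signals end of file with a short (possibly empty) result
            break
        secs, nsecs, booted, exporter, src, dst = RECORD.unpack(buf)
        sources[socket.inet_ntoa(struct.pack("!I", src))] += 1
        destinations[socket.inet_ntoa(struct.pack("!I", dst))] += 1
        stream.read(SKIP)
    return sources, destinations


# Two complete records followed by a truncated one
record = RECORD.pack(1, 2, 3, 4, 0x0A000001, 0x0A000002) + b"\x00" * SKIP
sources, destinations = count_addresses(io.BytesIO(record * 2 + b"\x01\x02\x03"))
print(sources, destinations)
```

With a real file, open it in binary mode (open('trace1.flow', 'rb')) so that read() returns bytes.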

Python reading until null character from Telnet

I am telnetting to my server, which answers with messages. At the end of each message a hex 00 (null character) is appended, which cannot be read. I have searched extensively but can't seem to make it work. A simple example:
from telnetlib import Telnet
connection = Telnet('localhost', 5001)
connection.write('aa\n')
connection.read_eager()
This returns an output:
'Fail - Command aa not found.\n\r'
whereas it should be something like:
'Fail - Command aa not found.\n\r\0'
Is there any way to get this end of string character? Can I get bytes as an output if the character is missed on purpose?
The 00 character is there:
I ran into this same problem when trying to get data from an RS232-TCP/IP converter using telnet: telnetlib would suppress every 0x00 in the message. As Fredrik Johansson's answer explains, this is simply how telnetlib is implemented.
One solution is to replace the process_rawq() method of telnetlib's Telnet class with a version that doesn't discard the null characters:
import telnetlib
from telnetlib import IAC, DO, DONT, WILL, WONT, SB, SE, NOOPT
def _process_rawq(self):
    """Modified implementation of this function, needed because telnetlib suppresses 0x00 and \021 in the data it reads
    """
    buf = ['', '']
    try:
        while self.rawq:
            c = self.rawq_getchar()
            if not self.iacseq:
                # if c == theNULL:
                #     continue
                # if c == "\021":
                #     continue
                if c != IAC:
                    buf[self.sb] = buf[self.sb] + c
                    continue
                else:
                    self.iacseq += c
            elif len(self.iacseq) == 1:
                # 'IAC: IAC CMD [OPTION only for WILL/WONT/DO/DONT]'
                if c in (DO, DONT, WILL, WONT):
                    self.iacseq += c
                    continue
                self.iacseq = ''
                if c == IAC:
                    buf[self.sb] = buf[self.sb] + c
                else:
                    if c == SB: # SB ... SE start.
                        self.sb = 1
                        self.sbdataq = ''
                    elif c == SE:
                        self.sb = 0
                        self.sbdataq = self.sbdataq + buf[1]
                        buf[1] = ''
                    if self.option_callback:
                        # Callback is supposed to look into
                        # the sbdataq
                        self.option_callback(self.sock, c, NOOPT)
                    else:
                        # We can't offer automatic processing of
                        # suboptions. Alas, we should not get any
                        # unless we did a WILL/DO before.
                        self.msg('IAC %d not recognized' % ord(c))
            elif len(self.iacseq) == 2:
                cmd = self.iacseq[1]
                self.iacseq = ''
                opt = c
                if cmd in (DO, DONT):
                    self.msg('IAC %s %d',
                        cmd == DO and 'DO' or 'DONT', ord(opt))
                    if self.option_callback:
                        self.option_callback(self.sock, cmd, opt)
                    else:
                        self.sock.sendall(IAC + WONT + opt)
                elif cmd in (WILL, WONT):
                    self.msg('IAC %s %d',
                        cmd == WILL and 'WILL' or 'WONT', ord(opt))
                    if self.option_callback:
                        self.option_callback(self.sock, cmd, opt)
                    else:
                        self.sock.sendall(IAC + DONT + opt)
    except EOFError: # raised by self.rawq_getchar()
        self.iacseq = '' # Reset on EOF
        self.sb = 0
        pass
    self.cookedq = self.cookedq + buf[0]
    self.sbdataq = self.sbdataq + buf[1]
then override the Telnet class' method:
telnetlib.Telnet.process_rawq = _process_rawq
This solved the problem for me.
This code (http://www.opensource.apple.com/source/python/python-3/python/Lib/telnetlib.py) seems to just ignore null characters. Is that really correct behavior?
def process_rawq(self):
    """Transfer from raw queue to cooked queue.
    Set self.eof when connection is closed. Don't block unless in
    the midst of an IAC sequence.
    """
    buf = ''
    try:
        while self.rawq:
            c = self.rawq_getchar()
            if c == theNULL:
                continue
            :
            :
process_rawq is then in turn called by e.g. read_until
def read_until(self, match, timeout=None):
    """Read until a given string is encountered or until timeout.
    When no match is found, return whatever is available instead,
    possibly the empty string. Raise EOFError if the connection
    is closed and no cooked data is available.
    """
    n = len(match)
    self.process_rawq()
    :
    :
I also want to receive the null character. In my particular case it marks the end of a multiline message.
So the answer seems to be that this is expected behavior as the library code is written.
FWIW https://support.microsoft.com/en-us/kb/231866 states:
Communication is established using TCP/IP and is based on a Network
Virtual Terminal (NVT). On the client, the Telnet program is
responsible for translating incoming NVT codes to codes understood by
the client's display device as well as for translating
client-generated keyboard codes into outgoing NVT codes.
The NVT uses 7-bit codes for characters. The display device, referred
to as a printer in the RFC, is only required to display the standard
printing ASCII characters represented by 7-bit codes and to recognize
and process certain control codes. The 7-bit characters are
transmitted as 8-bit bytes with the most significant bit set to zero.
An end-of-line is transmitted as a carriage return (CR) followed by a
line feed (LF). If you want to transmit an actual carriage return,
this is transmitted as a carriage return followed by a NUL (all bits
zero) character.
and
Name  Code  Decimal Value  Function
NULL  NUL   0              No operation
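If the server does not actually depend on Telnet option negotiation, an alternative to patching telnetlib is to read from a plain socket, which never filters NUL bytes. This is a swapped-in technique rather than something from the answers above; the sketch below is for Python 3, and the helper name recv_message is hypothetical.

```python
import socket


def recv_message(sock, pending=b"", terminator=b"\x00", bufsize=4096):
    """Read until the terminator arrives; return (message, leftover bytes)."""
    while terminator not in pending:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise EOFError("connection closed before the terminator arrived")
        pending += chunk
    message, _, rest = pending.partition(terminator)
    # Keep the terminator so the caller sees exactly what was sent
    return message + terminator, rest


# Local demonstration with a connected socket pair instead of a real server
server_side, client_side = socket.socketpair()
server_side.sendall(b"Fail - Command aa not found.\n\r\x00")
message, leftover = recv_message(client_side)
print(repr(message))
server_side.close()
client_side.close()
```

Against the server from the question, the socket would come from socket.create_connection(('localhost', 5001)), and the leftover bytes would be passed back in as pending on the next call.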

python construct for protocol parsing

I am trying to mix up the power of twisted Protocol with the ductility of construct, the declarative binary data parser.
So far, my MessageReceiver protocol accumulates the data coming from the tcp channel in the following way:
def rawDataReceived(self, data):
    '''
    This method buffers the data coming from the TCP channel in the following way:
    - Initially, discard the stream until a reserved character is detected
    - add data to the buffer up to the expected message length unless the reserved character is met again. In that case discard the message and start again
    - if the expected message length is reached, attempt to parse the message and clear the buffer
    '''
    if self._buffer:
        index = data.find(self.reserved_character)
        if index > -1:
            if len(self._buffer) + index >= self._fixed_size:
                self.on_message(self._buffer + data[:data.index(self._reserved_character)])
            self._buffer = b''
            data = data[data.index(self.reserved_character):]
            [self.on_message(chunks[:self._fixed_size]) for chunks in [self.reserved_character + msg for msg in data.split(self._reserved_character) if msg]]
        elif len(self._buffer) + len(data) < self._expected_size:
            self._buffer = self._buffer + data
        else:
            self._buffer = b''
    else:
        try:
            data = data[data.index(self._reserved_character):]
            [self.on_message(chunks[:self._fixed_size]) for chunks in [self._reserved_character + msg for msg in data.split(self._reserved_character) if msg]]
        except Exception, exc:
            log.msg("Warning: Maybe there is no delimiter {delim} for the new message. Error: {err}".format(delim=self._reserved_character, err=str(exc)))
Now I need to evolve the protocol to account for the fact that a message may or may not carry optional fields (so there is no longer a fixed message length). I modeled (a meaningful part of) the message parser with construct in the following way:
def on_message(self, msg):
    return Struct(HEADER,
        Bytes(HEADER_RAW, 3),
        BitStruct(OPTIONAL_HEADER_STRUCT,
            Nibble(APPLICATION_SELECTOR),
            Flag(OPTIONAL_HEADER_FLAG),
            Padding(3)
        ),
        If(lambda ctx: ctx.optional_header_struct[OPTIONAL_HEADER_FLAG],
            Embed(Struct(None,
                Byte(BATTERY_CHARGE),
                Bytes(OPTIONAL_HEADER, 3)
                )
            )
        )
    ).parse(msg)
So right now I need to change the buffering logic to pass the right chunk size to the Struct. I would like to avoid sizing up the data to be passed to the Struct in the rawDataReceived method, considering that the rules for what counts as a candidate message are already known to the construct object.
Is there any way to push the buffering logic to the construct object?
Edit
I was able to partially achieve the goal of pushing the buffering logic inside, simply by making use of Macros and Adapters:
MY_PROTOCOL = Struct("whatever",
    Anchor("begin"),
    RepeatUntil(lambda obj, ctx:obj==RESERVED_CHAR, Field("garbage", 1)),
    NoneOf(Embed(HEADER_SECTION), [RESERVED_CHAR]),
    Anchor("end"),
    Value("size", lambda ctx:ctx.end - ctx.begin)
)
This greatly simplifies the caller code (which is no longer in rawDataReceived thanks to Glyph's suggestion):
def dataReceived(self, data):
    log.msg('Received data: {}'.format(bytes_to_hex(data)))
    self._buffer += data
    try:
        container = MY_PROTOCOL.parse(self._buffer)
        self._buffer = self._buffer[container.size:]
        d, self.d = self.d, self._create_new_transmission_deferred()
        d.callback(container)
    except ValidationError, err:
        self._cb_error("A validation error occurred. Discarding the rest of the message. {}".format(err))
        self._buffer = b''
    except FieldError, err: #Incomplete message. We simply keep on buffering and retry
        if len(self._buffer) >= MyMessageReceiver.MAX_GARBAGE_SIZE:
            self._cb_error("Buffer overflown. No delimiter found in the stream")
Unfortunately this solution covers the requirements only partially since I could not find a way to get construct to tell me the index of the stream that produced the error and therefore I am obliged to drop the entire buffer, which is not ideal.
To get the stream position at which an error occurs, you'll need to use Anchor and write your own version of NoneOf. Assuming HEADER_SECTION is another Construct, replace the NoneOf like so:
SpecialNoneOf(Struct('example', Anchor('position'), HEADER_SECTION), [RESERVED_CHAR])
SpecialNoneOf needs to subclass Adapter and combine __init__ and _validate from NoneOf with _encode and _decode from Validator. In _decode, replace
raise ValidationError("invalid object", obj)
with
raise ValidationError("invalid object", "{} at {}".format(obj.header_section, obj.position))
Replace header_section with the name of the HEADER_SECTION Construct. You will have to change the structure of the resulting container or figure out a different way to use Embed to make this method work.
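Independently of construct, the underlying idea (let the parser decide how many bytes a message needs, and report how many it consumed) can be sketched in plain Python 3. The layout below mirrors the Struct from the question: 3 raw header bytes, one byte holding a 4-bit application selector, a 1-bit flag and 3 padding bits, then an optional 1-byte battery charge and 3-byte optional header. The names parse_message and drain are invented for this sketch, and the reserved-character resynchronisation is deliberately left out.

```python
def parse_message(buffer):
    """Return (message, consumed), or None if more bytes are needed."""
    if len(buffer) < 4:
        return None
    bits = buffer[3]
    message = {
        "header_raw": buffer[:3],
        "application_selector": bits >> 4,        # high nibble
        "optional_header_flag": bool(bits & 0x08),  # bit after the nibble
    }
    consumed = 4
    if message["optional_header_flag"]:
        if len(buffer) < 8:
            return None
        message["battery_charge"] = buffer[4]
        message["optional_header"] = buffer[5:8]
        consumed = 8
    return message, consumed


def drain(buffer):
    """Parse as many complete messages as possible; return them and the rest."""
    messages = []
    while True:
        result = parse_message(buffer)
        if result is None:
            return messages, buffer
        message, consumed = result
        messages.append(message)
        buffer = buffer[consumed:]


plain = b"ABC" + bytes([0x50])                     # selector 5, no optional part
with_optional = b"XYZ" + bytes([0x28, 99]) + b"opt"  # selector 2, flag set
messages, rest = drain(plain + with_optional + b"QQ")
print(messages, rest)
```

In dataReceived, the returned remainder would simply become the new self._buffer, so only bytes that were actually parsed are dropped.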

Split array in values in Python by length in bytes

I have a sensor system. The sensors receive commands from me, do something and then send a response to me.
The response is like this:
seq. number | net_id | opcode_group | opcode | payloadlength | val
where I have these values delimited by a space character.
Now I want to take the last value, named val. This part holds all the information I need to process the response from the sensors.
For example, I have this response for the command that wants to know the IEEE MAC address of the sensor:
In this case val is all the fields after Length in the response. There is no separator between them; I just have a sort of string.
All I have to do is split this array/string of numbers, knowing only the length of every field. For example, the status is 1 byte, the MAC address is 8 bytes, and so on...
My code is this:
if response.error:
    ret['error'] = 'Error while retrieving unregistered sensors'
else:
    for line in response.body.split("\n"):
        if line != "":
            value = int(line.split(" ")[6])
            ret['response'] = value
            self.write(tornado.escape.json_encode(ret))
            self.finish()
if command == 'IDENTIFY':
    status = value.split(" ")[0]
    IEEEAddrRemoteDev = value.split(" ")[1]
    NWKAddrRemoteDev = value.split(" ")[2]
    NumOfAssociatedDevice = value.split(" ")[3]
    StartIndex = value.split(" ")[4]
    ListOfShortAddress = value.split(" ")[5]
    if status == 0x00:
        ret['success'] = "The %s command has been succesfully sent! \
        IEEE address: %s" % (command.upper(), IEEEAddrRemoteDev)
        self.write(tornado.escape.json_encode(ret))
    elif status == 0x80:
        ret['success'] = "Invalid Request Type"
        self.write(tornado.escape.json_encode(ret))
    elif status == 0x81:
        ret['success'] = "Device Not Found"
        self.write(tornado.escape.json_encode(ret))
In the first part I take the 6th value from the entire response and put it in the variable value. After this I want to split this variable into its components.
For example, in status = value.split(" ")[0], how should I split it?
Thank you very much for the help!
What is the exact format of the val field (i.e. the contents of value in your code)? Is it a string? A sequence of bytes?
If it's a string, you could use:
status = value[0]
IEEEAddrRemoteDev = value[1:9]
NWKAddrRemoteDev = value[9:11]
NumOfAssociatedDevice = value[11:12]
StartIndex = value[12:13]
# assuming each character of value is one raw byte
ListOfShortAddress = value[13:13 + 2*ord(NumOfAssociatedDevice)]
If it's a sequence of bytes, then you could use struct.unpack(); see the documentation of the struct module.
The problem is that you're treating this as a Python string, but it's just going to be a bunch of bits.
You need to use struct.unpack to split these up.
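As a sketch of the struct approach for Python 3, assuming val arrives as raw bytes and that multi-byte fields are little-endian (the question states neither, so check the sensor documentation). The field sizes follow the question: 1-byte status, 8-byte IEEE address, 2-byte network address, 1-byte device count, 1-byte start index, then one 2-byte short address per associated device. The function name parse_identify is made up for illustration.

```python
import struct

# '<' = little-endian without padding (an assumption); B=1 byte, 8s=8 bytes, H=2 bytes
FIXED_PART = struct.Struct("<B8sHBB")


def parse_identify(val):
    """Split the raw val bytes of an IDENTIFY response into named fields."""
    status, ieee, nwk, count, start = FIXED_PART.unpack_from(val, 0)
    # The list length depends on the device count read from the fixed part
    short_addresses = struct.unpack_from("<%dH" % count, val, FIXED_PART.size)
    return {
        "status": status,
        "ieee_addr": ieee,
        "nwk_addr": nwk,
        "num_associated": count,
        "start_index": start,
        "short_addresses": short_addresses,
    }


# Made-up sample payload: status 0, address bytes 1..8, two associated devices
sample = (bytes([0x00]) + bytes(range(1, 9)) + struct.pack("<H", 0x1234)
          + bytes([2, 0]) + struct.pack("<2H", 0xAAAA, 0xBBBB))
fields = parse_identify(sample)
print(fields)
```

The status can then be compared directly with 0x00, 0x80 and 0x81 as integers, as the question's code attempts to do.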
