How to understand bytes output in python?

How to understand bytes output in python? - python

I am using struct for creating byte like objects out of arrays. Here is my code:
import numpy as np
import struct
a = 1
a = np.array(a,dtype=np.int32)
format_charecters = f'<1I'
bytes_ = struct.pack(format_charecters,*a.flatten())
bytes_
The code outputs:
b'\x01\x00\x00\x00'
This makes sense to me as I am using < little-endian byte-ordering and referring the following table 1 should correspond to \x01 where x represents hexadecimal.
Now when I replace 1 with 10 I get a surprising result:
b'\n\x00\x00\x00'
I was not expecting this... I thought the output will be:
b'\x0a\x00\x00\x00'
Also for some random value a = 1324233699 I get:
b'\xe33\xeeN'
Using an online decimal-hex converter I get:
4EEE33E3
How to interpret the results of my code?

The link deadshot gave perfectly explained my question. I am adding a screenshot of the table of escape characters in python so that other people can find it even if the 'tutorialspoint' site goes down.

Related

Base64 implementation is not giving the desired result

I've followed the following site instructions in order to implement base64 encoding.
Here's my code:
from bitarray import bitarray
from bitarray.util import ba2int
base64mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcd\
efghijklmnopqrstuvwxyz0123456789+/"
testString1 = "49276d206b696c6c696e6720796f757220627261696e206c\
696b65206120706f69736f6e6f7573206d757368726f6f6d"
stringBits = bitarray()
# We create an unitialized bitarray
stringBits.frombytes(testString1.encode('utf-8'))
# We convert the string to bits
b64stringList = [] # To store the corresponding b64 chars
for sequence in range(0, len(stringBits), 6):
# We scan the bitarray 6 bits at a time
b64stringList += base64mapping[(ba2int(stringBits[sequence:sequence+6]))]
# We store the corresponding b64 char into b64stringlist
print(''.join(b64stringList)) # Lets see the result
The testString1 was taken from the cryptopals set 1 - challenge 1 which tells me the result should be:
SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
However, my implementation gives another result:
NDkyNzZkMjA2YjY5NmM2YzY5NmU2NzIwNzk2Zjc1NzIyMDYyNzI2MTY5NmUyMDZjNjk2YjY1MjA2MTIwNzA2ZjY5NzM2ZjZlNmY3NTczMjA2ZDc1NzM2ODcyNmY2ZjZk
I checked against Wikipedia's example and I get the same output sans the padding at the end (TODO).
What I am doing wrong or missing here?

According to the site, the task is to convert the hex to base64.
Your encoding, seems to be working correctly as the library base64 outputs the same as your result when running it directly over the hex value
t = b'49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d'
base64.b64encode(t)
b'NDkyNzZkMjA2YjY5NmM2YzY5NmU2NzIwNzk2Zjc1NzIyMDYyNzI2MTY5NmUyMDZjNjk2YjY1MjA2MTIwNzA2ZjY5NzM2ZjZlNmY3NTczMjA2ZDc1NzM2ODcyNmY2ZjZk'
Using binascii we can convert the hex value your originally have to readable text by using unhexlify()
binascii.unhexlify(t)
#b"I'm killing your brain like a poisonous mushroom"
Given this information it would be best to take the approach that what you're storing is not a string but a hex representation of the string itself.
Due to this we will need to convert your hex to the readable string before encoding it.
When we pass the resulting string to base64 your output matches the expected output in the link.
base64.b64encode(binascii.unhexlify(t))
#b'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t'

Why python has different types of bytes

I have two variables, one is b_d, the other is b_test_d.
When I type b_d in the console, it shows:
b'\\\x8f\xc2\xf5(\\\xf3?Nb\x10X9\xb4\x07#\x00\x00\x00\x00\x00\x00\xf0?'
when I type b_test_d in the console, it shows:
b'[-2.1997713216,-1.4249271187,-1.1076795391,1.5224958034,-0.1709796203,0.3663875698,0.14846441,-0.7415930061,-1.7602231949,0.126605689,0.6010934792,-0.466415358,1.5675525816,1.00836295,1.4332792992,0.6113384254,-1.8008540571,-0.9443408896,1.0943670356,-1.0114642686,1.443892627,-0.2709427287,0.2990462512,0.4650133591,0.2560791327,0.2257600462,-2.4077429827,-0.0509983213,1.0062187148,0.4315075795,-0.6116110033,0.3495131413,-0.3249903375,0.3962305931,-0.1985757285,1.165792433,-1.1171953063,-0.1732557874,-0.3791600654,-0.2860519953,0.7872658859,0.217728374,-0.4715179983,-0.4539613811,-0.396353657,1.2326862425,-1.3548659354,1.6476230786,0.6312713442,-0.735444661,-0.6853447369,-0.8480631975,0.9538606574,0.6653542368,-0.2833696021,0.7281604648,-0.2843872095,0.1461980484,-2.3511731773,-0.3118047948,-1.6938613893,-0.0359659687,-0.5162134311,-2.2026641552,-0.7294895084,0.7493073213,0.1034096968,0.6439803068,-0.2596155272,0.5851323455,1.0173285542,-0.7370464113,1.0442954406,-0.5363832595,0.0117795359,0.2225617514,0.067571974,-0.9154681906,-0.293808596,1.3717113798,0.4919516922,-0.3254944005,1.6203744532,-0.1810222279,-0.6111596457,1.344064259,-0.4596893179,-0.2356197144,0.4529942046,1.6244603294,0.1849995925,0.6223061217,-0.0340662398,0.8365900535,-0.6804201929,0.0149665385,0.4132453788,0.7971962667,-1.9391525531,0.1440486871,-0.7103617816,0.9026539637,0.6665798363,-1.5885073458,1.4084493329,-1.397040825,1.6215697667,1.7057148522,0.3802647045,-0.4239271483,1.4773614536,1.6841461329,0.1166845529,-0.3268795898,-0.9612751672,0.4062399443,0.357209662,-0.2977362702,-0.3988147401,-0.1174652196,0.3350589818,-1.8800423584,0.0124169787,1.0015110265,0.789541751,-0.2710408983,1.4987300181,-1.1726824468,-0.355322591,0.6567978423,0.8319110558,0.8258835069,-1.1567887763,1.9568551122,1.5148655075,1.0589021915,-0.4388232953,-0.7451680183,-2.1897621693,0.4502135234,-1.9583089063,0.1358789518,-1.7585860897,0.452259777,0.7406800349,-1.3578980418,1.108740204,-1.1986272667,-1.0273598206,-1.8165822264,1.0853600894,-0.273943514,0.8589890805,1.3639094329,-0.6121993589,-0.0587067992,0.0798457584,1.0992814648,-1.0455733611,1.4780003064,0.5047157705,0.1565451605,0.9656886956,-0.5998330255,0.4846727299,0.8790524818,1.0288893846,-2.0842447397,0.4074607421,2.1523241756,-1.1268047125,-0.6016001524,-1.3302141561,1.1869516954,1.0988060125,0.7405900405,1.1813110811,0.8685330644,2.0927140519,-1.7171952009,0.9231993147,0.320874115,0.7465845079,-0.1034484959,-0.4776822499,0.436218328,-0.4083564542,0.4835567895,1.0733230373,-0.858658902,-0.4493571034,0.4506418221,1.6696649735,-0.9189799982,-1.1690356499,-1.0689397924,0.3174297583,1.0403701444,0.5440082812,-0.1128248996]'
Both of them are bytes type, but I can use numpy.frombuffer to read the b_d, but not the b_test_d. And they look very different. Why do I have these two types of bytes?
Thank you.

[A]nyone can point out how to use Json marshall to convert the byte to the same type of bytes as the first one?
This isn't the right question, but I think I know what you're asking. You say you're getting the 2nd array via JSON marshalling, but that it's also not under your control:
it was obtained by json marshal (convert a received float array to byte array, and then convert the result to base64 string, which is done by someone else)
That's fine though, you just have to do a few steps of processing to get to a state equivalent to the first set of bytes.
First, some context to what's going on. You've already seen that numpy can understand your first set of bytes.
>>> numpy.frombuffer(data)
[1.21 2.963 1. ]
Based on its output, it looks like numpy is interpreting your data as 3 doubles, with 8 bytes each (24 bytes total)...
>>> data = b'\\\x8f\xc2\xf5(\\\xf3?Nb\x10X9\xb4\x07#\x00\x00\x00\x00\x00\x00\xf0?'
>>> len(data)
24
...which the struct module can also interpret.
# Separate into 3 doubles
x, y, z = data[:8], data[8:16], data[16:]
print([struct.unpack('d', i) for i in (x, y, z)])
[(1.21,), (2.963,), (1.0,)
There's actually (at least) 2 ways you can get a numpy array out of this.
Short way
1. Convert to string
# Original JSON data (snipped)
junk = b'[-2.1997713216,-1.4249271187,-1.1076795391,...]'
# Decode from bytes to a string (defaults to utf-8), then
# trim off the brackets (first and last characters in the string)
as_str = junk.decode()[1:-1]
2. Use numpy.fromstring
numpy.fromstring(as_str, dtype=float, sep=',')
# Produces:
array([-2.19977132, -1.42492712, -1.10767954, 1.5224958 , -0.17097962,
0.36638757, 0.14846441, -0.74159301, -1.76022319, 0.12660569,
0.60109348, -0.46641536, 1.56755258, 1.00836295, 1.4332793 ,
0.61133843, -1.80085406, -0.94434089, 1.09436704, -1.01146427,
1.44389263, -0.27094273, 0.29904625, 0.46501336, 0.25607913,
0.22576005, -2.40774298, -0.05099832, 1.00621871, 0.43150758,
... ])
Long way
Note: I found the fromstring method after writing this part up, figured I'd leave it here to at least help explain the byte differences.
1. Convert the JSON data into an array of numeric values.
# Original JSON data (snipped)
junk = b'[-2.1997713216,-1.4249271187,-1.1076795391,...]'
# Decode from bytes to a string - defaults to utf-8
junk = junk.decode()
# Trim off the brackets - First and last characters in the string
junk = junk[1:-1]
# Separate into values
junk = junk.split(',')
# Convert to numerical values
doubles = [float(val) for val in junk]
# Or, as a one-liner
doubles = [float(val) for val in junk.decode()[1:-1].split(',')]
# "doubles" currently holds:
[-2.1997713216,
-1.4249271187,
-1.1076795391,
1.5224958034,
...]
2. Use struct to get byte-representations for the doubles
import struct
as_bytes = [struct.pack('d', val) for val in doubles]
# "as_bytes" currently holds:
[b'\x08\x9b\xe7\xb4!\x99\x01\xc0',
b'\x0b\x00\xe0`\x80\xcc\xf6\xbf',
b'+ ..\x0e\xb9\xf1\xbf',
b'hg>\x8f$\\\xf8?',
...]
3. Join all the double values (as bytes) into a single byte-string, then submit to numpy
new_data = b''.join(as_bytes)
numpy.frombuffer(new_data)
# Produces:
array([-2.19977132, -1.42492712, -1.10767954, 1.5224958 , -0.17097962,
0.36638757, 0.14846441, -0.74159301, -1.76022319, 0.12660569,
0.60109348, -0.46641536, 1.56755258, 1.00836295, 1.4332793 ,
0.61133843, -1.80085406, -0.94434089, 1.09436704, -1.01146427,
1.44389263, -0.27094273, 0.29904625, 0.46501336, 0.25607913,
0.22576005, -2.40774298, -0.05099832, 1.00621871, 0.43150758,
... ])

A bytes object can be in any format. It is "just a bunch of bytes" without context. For display Python will represent byte values <128 as their ASCII value, and use hex escape codes (\x##) for others.
The first looks like IEEE 754 double precision floating point. numpy or struct can read it. The second one is in JSON format. Use the json module to read it:
import numpy as np
import json
import struct
b1 = b'\\\x8f\xc2\xf5(\\\xf3?Nb\x10X9\xb4\x07#\x00\x00\x00\x00\x00\x00\xf0?'
b2 = b'[-2.1997713216,-1.4249271187,-1.1076795391,1.5224958034]'
j = json.loads(b2)
n = np.frombuffer(b1)
s = struct.unpack('3d',b1)
print(j,n,s,sep='\n')
# To convert b2 into a b1 format
b = struct.pack('4d',*j)
print(b)
Output:
[-2.1997713216, -1.4249271187, -1.1076795391, 1.5224958034]
[1.21 2.963 1. ]
(1.21, 2.963, 1.0)
b'\x08\x9b\xe7\xb4!\x99\x01\xc0\x0b\x00\xe0`\x80\xcc\xf6\xbf+ ..\x0e\xb9\xf1\xbfhg>\x8f$\\\xf8?'

Why can't Python struct module pack (or unpack) multi bytes with little endian

I'm dealing with some multi bytes issues. For example, I have a variable a = b'\x00\x01\x02\x03', it is a bytes object rather than int. I'd like to struct.pack it to form a package with little endian, but <4s didn't work. In fact, <4s and >4s get the same results. What to do if I'd like the result to be b'\x03\x02\x01\x00.
I know I could use struct.pack('<L', struct.unpack('>L', a)), but is it the only and correct way to deal with multi bytes objects?
Example:
import struct
import secrets
mhdr = b'\x20'
joineui = b'\x00\x01\x02\x03\x04\x05\x06\x07'
deveui = b'\x08\x09\x10\x11\x12\x13\x14\x15'
devnonce = secrets.token_bytes(2)
joinreq = struct.pack(
'<s8s8s2s',
mhdr,
joineui,
deveui,
devnonce,
)
# The expected joinreq should be b'\x20\x07\x06\x05\x04\x03\x02\x01\x00\x15\x14\x13\x12\x11\x10\x09\x08...'

It seems to me you do not want to have 4 single chars, but instead 1 integer.
So instead of '4s' you should try using 'i' or 'I' (whether it is signed or unsigned).
Your example should look like
import struct
import secrets
mhdr = b'\x20'
joineui = b'\x00\x01\x02\x03\x04\x05\x06\x07'
deveui = b'\x08\x09\x10\x11\x12\x13\x14\x15'
devnonce = secrets.token_bytes(2)
joinreq = struct.pack(
'<BQQH', #use small letters if the values are signed instead of unsigned
mhdr,
joineui,
deveui,
devnonce,
)
"Q" stands for long long unsigned (8byte). If you want to use float instead you can use d for double float precision (8byte).
You can see the meaning of all letters in the documentation of struct.

Read a binary file using unpack in Python compared with a IDL method

I have a IDL procedure reading a binary file and I try to translate it into a Python routine.
The IDL code look like :
a = uint(0)
b = float(0)
c = float(0)
d = float(0)
e = float(0)
x=dblarr(nptx)
y=dblarr(npty)
z=dblarr(nptz)
openr,11,name_file_data,/f77_unformatted
readu,11,a
readu,11,b,c,d,e
readu,11,x
readu,11,y
readu,11,z
it works perfectly. So I'm writing the same thing in python but I can't find the same results (even the value of 'a' is different). Here is my code :
x=np.zeros(nptx,float)
y=np.zeros(npty,float)
z=np.zeros(nptz,float)
with open(name_file_data, "rb") as fb:
a, = struct.unpack("I", fb.read(4))
b,c,d,e = struct.unpack("ffff", fb.read(16))
x[:] = struct.unpack(str(nptx)+"d", fb.read(nptx*8))[:]
y[:] = struct.unpack(str(npty)+"d", fb.read(npty*8))[:]
z[:] = struct.unpack(str(nptz)+"d", fb.read(nptz*8))[:]
Hope it will help anyone to answer me.
Update : As suggested in the answers, I'm now trying the module "FortranFile", but I'm not sure I understood everything about its use.
from scipy.io import FortranFile
f=FortranFile(name_file_data, 'r')
a=f.read_record('H')
b=f.read_record('f','f','f','f')
However, instead of having an integer for 'a', I got : array([0, 0], dtype=uint16).
And I had this following error for 'b': Size obtained (1107201884) is not a multiple of the dtypes given (16)

According to a table of IDL data types, UINT(0) creates a 16 bit integer (i.e. two bytes). In the Python struct module, the I format character denotes a 4 byte integer, and H denotes an unsigned 16 bit integer.
Try changing the line that unpacks a to
a, = struct.unpack("H", fb.read(2))
Unfortunately, this probably won't fix the problem. You use the option /f77_unformatted with openr, which means the file contains more than just the raw bytes of the variables. (See the documentation of the OPENR command for more information about /f77_unformatted.)
You could try to use scipy.io.FortranFile to read the file, but there are no gaurantees that it will work. The binary layout of an unformatted Fortran file is compiler dependent.

Python 2.7.6 Optimizing code for packing big endian bytes into a string

import struct
varA['Z']['value'] = 8700
varA['Y']['value'] = 8800
varA['X']['value'] = 8900
varA['W']['value'] = 8800
varA['V']['value'] = 8700
varB = ""
varC = ""
for name in 'Z Y X W V'.split(' '):
varB = C[name]['value']
varC += str(struct.pack('>h',varB))
print varC[:-1] + '\n'
What i need is a string of bytes,
where each number is a signed int16 big-endian byte(s).
this code here works for what im trying to do, but i know
theres a far more elegant solution.
I wouldn't spend any time on optimizing the varA as its
only there to set up the code and won't be used in my project.
Also the print is also there to set up the problem, im actually
sending the bytes as a socket.
Initially I had this in an array first few times, but when I converted the
array to a bytearray, I kept running into having 0x00 mixed in.
Same with struct, as you can see in my solution removing the
0x00 at the end.

Here's a simpler way to do it. It's unclear from the question whether this is exactly the result you desire.
values = [varA[name]['value'] for name in 'ZYXWV']
varC = struct.pack('>'+str(len(values))+'h', *values)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to understand bytes output in python? - python

The link deadshot gave perfectly explained my question. I am adding a screenshot of the table of escape characters in python so that other people can find it even if the 'tutorialspoint' site goes down.

Related

Base64 implementation is not giving the desired result

Why python has different types of bytes

Why can't Python struct module pack (or unpack) multi bytes with little endian

Read a binary file using unpack in Python compared with a IDL method

Python 2.7.6 Optimizing code for packing big endian bytes into a string

Categories

Resources