I have two variables, one is b_d, the other is b_test_d.
When I type b_d in the console, it shows:
b'\\\x8f\xc2\xf5(\\\xf3?Nb\x10X9\xb4\x07#\x00\x00\x00\x00\x00\x00\xf0?'
when I type b_test_d in the console, it shows:
b'[-2.1997713216,-1.4249271187,-1.1076795391,1.5224958034,-0.1709796203,0.3663875698,0.14846441,-0.7415930061,-1.7602231949,0.126605689,0.6010934792,-0.466415358,1.5675525816,1.00836295,1.4332792992,0.6113384254,-1.8008540571,-0.9443408896,1.0943670356,-1.0114642686,1.443892627,-0.2709427287,0.2990462512,0.4650133591,0.2560791327,0.2257600462,-2.4077429827,-0.0509983213,1.0062187148,0.4315075795,-0.6116110033,0.3495131413,-0.3249903375,0.3962305931,-0.1985757285,1.165792433,-1.1171953063,-0.1732557874,-0.3791600654,-0.2860519953,0.7872658859,0.217728374,-0.4715179983,-0.4539613811,-0.396353657,1.2326862425,-1.3548659354,1.6476230786,0.6312713442,-0.735444661,-0.6853447369,-0.8480631975,0.9538606574,0.6653542368,-0.2833696021,0.7281604648,-0.2843872095,0.1461980484,-2.3511731773,-0.3118047948,-1.6938613893,-0.0359659687,-0.5162134311,-2.2026641552,-0.7294895084,0.7493073213,0.1034096968,0.6439803068,-0.2596155272,0.5851323455,1.0173285542,-0.7370464113,1.0442954406,-0.5363832595,0.0117795359,0.2225617514,0.067571974,-0.9154681906,-0.293808596,1.3717113798,0.4919516922,-0.3254944005,1.6203744532,-0.1810222279,-0.6111596457,1.344064259,-0.4596893179,-0.2356197144,0.4529942046,1.6244603294,0.1849995925,0.6223061217,-0.0340662398,0.8365900535,-0.6804201929,0.0149665385,0.4132453788,0.7971962667,-1.9391525531,0.1440486871,-0.7103617816,0.9026539637,0.6665798363,-1.5885073458,1.4084493329,-1.397040825,1.6215697667,1.7057148522,0.3802647045,-0.4239271483,1.4773614536,1.6841461329,0.1166845529,-0.3268795898,-0.9612751672,0.4062399443,0.357209662,-0.2977362702,-0.3988147401,-0.1174652196,0.3350589818,-1.8800423584,0.0124169787,1.0015110265,0.789541751,-0.2710408983,1.4987300181,-1.1726824468,-0.355322591,0.6567978423,0.8319110558,0.8258835069,-1.1567887763,1.9568551122,1.5148655075,1.0589021915,-0.4388232953,-0.7451680183,-2.1897621693,0.4502135234,-1.9583089063,0.1358789518,-1.7585860897,0.452259777,0.7406800349,-1.3578980418,1.108740204,-1.1986272667,-1.0273598206,-1.8165822264,1.0853600894,-0.273943514,0.8589890805,1.3639094329,-0.6121993589,-0.0587067992,0.0798457584,1.0992814648,-1.0455733611,1.4780003064,0.5047157705,0.1565451605,0.9656886956,-0.5998330255,0.4846727299,0.8790524818,1.0288893846,-2.0842447397,0.4074607421,2.1523241756,-1.1268047125,-0.6016001524,-1.3302141561,1.1869516954,1.0988060125,0.7405900405,1.1813110811,0.8685330644,2.0927140519,-1.7171952009,0.9231993147,0.320874115,0.7465845079,-0.1034484959,-0.4776822499,0.436218328,-0.4083564542,0.4835567895,1.0733230373,-0.858658902,-0.4493571034,0.4506418221,1.6696649735,-0.9189799982,-1.1690356499,-1.0689397924,0.3174297583,1.0403701444,0.5440082812,-0.1128248996]'
Both of them are bytes type, but I can use numpy.frombuffer to read the b_d, but not the b_test_d. And they look very different. Why do I have these two types of bytes?
Thank you.
[A]nyone can point out how to use Json marshall to convert the byte to the same type of bytes as the first one?
This isn't the right question, but I think I know what you're asking. You say you're getting the 2nd array via JSON marshalling, but that it's also not under your control:
it was obtained by json marshal (convert a received float array to byte array, and then convert the result to base64 string, which is done by someone else)
That's fine though, you just have to do a few steps of processing to get to a state equivalent to the first set of bytes.
First, some context to what's going on. You've already seen that numpy can understand your first set of bytes.
>>> numpy.frombuffer(data)
[1.21 2.963 1. ]
Based on its output, it looks like numpy is interpreting your data as 3 doubles, with 8 bytes each (24 bytes total)...
>>> data = b'\\\x8f\xc2\xf5(\\\xf3?Nb\x10X9\xb4\x07#\x00\x00\x00\x00\x00\x00\xf0?'
>>> len(data)
24
...which the struct module can also interpret.
# Separate into 3 doubles
x, y, z = data[:8], data[8:16], data[16:]
print([struct.unpack('d', i) for i in (x, y, z)])
[(1.21,), (2.963,), (1.0,)
There's actually (at least) 2 ways you can get a numpy array out of this.
Short way
1. Convert to string
# Original JSON data (snipped)
junk = b'[-2.1997713216,-1.4249271187,-1.1076795391,...]'
# Decode from bytes to a string (defaults to utf-8), then
# trim off the brackets (first and last characters in the string)
as_str = junk.decode()[1:-1]
2. Use numpy.fromstring
numpy.fromstring(as_str, dtype=float, sep=',')
# Produces:
array([-2.19977132, -1.42492712, -1.10767954, 1.5224958 , -0.17097962,
0.36638757, 0.14846441, -0.74159301, -1.76022319, 0.12660569,
0.60109348, -0.46641536, 1.56755258, 1.00836295, 1.4332793 ,
0.61133843, -1.80085406, -0.94434089, 1.09436704, -1.01146427,
1.44389263, -0.27094273, 0.29904625, 0.46501336, 0.25607913,
0.22576005, -2.40774298, -0.05099832, 1.00621871, 0.43150758,
... ])
Long way
Note: I found the fromstring method after writing this part up, figured I'd leave it here to at least help explain the byte differences.
1. Convert the JSON data into an array of numeric values.
# Original JSON data (snipped)
junk = b'[-2.1997713216,-1.4249271187,-1.1076795391,...]'
# Decode from bytes to a string - defaults to utf-8
junk = junk.decode()
# Trim off the brackets - First and last characters in the string
junk = junk[1:-1]
# Separate into values
junk = junk.split(',')
# Convert to numerical values
doubles = [float(val) for val in junk]
# Or, as a one-liner
doubles = [float(val) for val in junk.decode()[1:-1].split(',')]
# "doubles" currently holds:
[-2.1997713216,
-1.4249271187,
-1.1076795391,
1.5224958034,
...]
2. Use struct to get byte-representations for the doubles
import struct
as_bytes = [struct.pack('d', val) for val in doubles]
# "as_bytes" currently holds:
[b'\x08\x9b\xe7\xb4!\x99\x01\xc0',
b'\x0b\x00\xe0`\x80\xcc\xf6\xbf',
b'+ ..\x0e\xb9\xf1\xbf',
b'hg>\x8f$\\\xf8?',
...]
3. Join all the double values (as bytes) into a single byte-string, then submit to numpy
new_data = b''.join(as_bytes)
numpy.frombuffer(new_data)
# Produces:
array([-2.19977132, -1.42492712, -1.10767954, 1.5224958 , -0.17097962,
0.36638757, 0.14846441, -0.74159301, -1.76022319, 0.12660569,
0.60109348, -0.46641536, 1.56755258, 1.00836295, 1.4332793 ,
0.61133843, -1.80085406, -0.94434089, 1.09436704, -1.01146427,
1.44389263, -0.27094273, 0.29904625, 0.46501336, 0.25607913,
0.22576005, -2.40774298, -0.05099832, 1.00621871, 0.43150758,
... ])
A bytes object can be in any format. It is "just a bunch of bytes" without context. For display Python will represent byte values <128 as their ASCII value, and use hex escape codes (\x##) for others.
The first looks like IEEE 754 double precision floating point. numpy or struct can read it. The second one is in JSON format. Use the json module to read it:
import numpy as np
import json
import struct
b1 = b'\\\x8f\xc2\xf5(\\\xf3?Nb\x10X9\xb4\x07#\x00\x00\x00\x00\x00\x00\xf0?'
b2 = b'[-2.1997713216,-1.4249271187,-1.1076795391,1.5224958034]'
j = json.loads(b2)
n = np.frombuffer(b1)
s = struct.unpack('3d',b1)
print(j,n,s,sep='\n')
# To convert b2 into a b1 format
b = struct.pack('4d',*j)
print(b)
Output:
[-2.1997713216, -1.4249271187, -1.1076795391, 1.5224958034]
[1.21 2.963 1. ]
(1.21, 2.963, 1.0)
b'\x08\x9b\xe7\xb4!\x99\x01\xc0\x0b\x00\xe0`\x80\xcc\xf6\xbf+ ..\x0e\xb9\xf1\xbfhg>\x8f$\\\xf8?'
I have a IDL procedure reading a binary file and I try to translate it into a Python routine.
The IDL code look like :
a = uint(0)
b = float(0)
c = float(0)
d = float(0)
e = float(0)
x=dblarr(nptx)
y=dblarr(npty)
z=dblarr(nptz)
openr,11,name_file_data,/f77_unformatted
readu,11,a
readu,11,b,c,d,e
readu,11,x
readu,11,y
readu,11,z
it works perfectly. So I'm writing the same thing in python but I can't find the same results (even the value of 'a' is different). Here is my code :
x=np.zeros(nptx,float)
y=np.zeros(npty,float)
z=np.zeros(nptz,float)
with open(name_file_data, "rb") as fb:
a, = struct.unpack("I", fb.read(4))
b,c,d,e = struct.unpack("ffff", fb.read(16))
x[:] = struct.unpack(str(nptx)+"d", fb.read(nptx*8))[:]
y[:] = struct.unpack(str(npty)+"d", fb.read(npty*8))[:]
z[:] = struct.unpack(str(nptz)+"d", fb.read(nptz*8))[:]
Hope it will help anyone to answer me.
Update : As suggested in the answers, I'm now trying the module "FortranFile", but I'm not sure I understood everything about its use.
from scipy.io import FortranFile
f=FortranFile(name_file_data, 'r')
a=f.read_record('H')
b=f.read_record('f','f','f','f')
However, instead of having an integer for 'a', I got : array([0, 0], dtype=uint16).
And I had this following error for 'b': Size obtained (1107201884) is not a multiple of the dtypes given (16)
According to a table of IDL data types, UINT(0) creates a 16 bit integer (i.e. two bytes). In the Python struct module, the I format character denotes a 4 byte integer, and H denotes an unsigned 16 bit integer.
Try changing the line that unpacks a to
a, = struct.unpack("H", fb.read(2))
Unfortunately, this probably won't fix the problem. You use the option /f77_unformatted with openr, which means the file contains more than just the raw bytes of the variables. (See the documentation of the OPENR command for more information about /f77_unformatted.)
You could try to use scipy.io.FortranFile to read the file, but there are no gaurantees that it will work. The binary layout of an unformatted Fortran file is compiler dependent.