I am having trouble reading the binary file. I have a NumPy array as,
data = array([[ 0. , 0. , 7.821725 ],
[ 0.05050505, 0. , 7.6358337 ],
[ 0.1010101 , 0. , 7.453858 ],
...,
[ 4.8989897 , 5. , 16.63227 ],
[ 4.949495 , 5. , 16.88153 ],
[ 5. , 5. , 17.130795 ]], dtype=float32)
I wrote this array to a file in binary format.
file = open('model_binary', 'wb')
data.tofile(file)
Now, I am unable to get back the data from the saved binary file. I tried using numpy.fromfile() but it didn't work out for me.
file = open('model_binary', 'rb')
data = np.fromfile(file)
When I printed the data I got [0.00000000e+00 2.19335211e-13 8.33400000e+04 ... 2.04800049e+03 2.04800050e+03 5.25260241e+07] which is absolutely not what I want.
I ran the following code to check what was in the file,
for line in file:
print(line)
break
I got the output as b'\x00\x00\x00\x00\......\c1\x07#\x00\x00\x00\x00S\xc5{#j\xfd\n' which I suppose is in binary format.
I would like to get the array back from the binary file as it was saved. Any help will be appreciated.
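As Kevin noted, adding the dtype is required: np.fromfile defaults to float64, so the float32 bytes in your file are misinterpreted. You will also need to reshape, because fromfile returns a flat array (you have 3 columns in your example). So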
file = open('model_binary', 'rb')
data = np.fromfile(file, dtype=np.float32).reshape((-1,3))
should work for you.
As an aside, np.save also writes a binary format (.npy), and it records the dtype and shape in the file, so np.load restores the array without these issues.
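A minimal round-trip sketch of both approaches, assuming a small stand-in array (a few rows taken from the question's data) and a temporary directory; the file names are placeholders.

```python
import os
import tempfile

import numpy as np

# Stand-in for the question's array: float32, 3 columns.
data = np.array([[0.0, 0.0, 7.821725],
                 [0.05050505, 0.0, 7.6358337],
                 [0.1010101, 0.0, 7.453858],
                 [5.0, 5.0, 17.130795]], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    # Raw dump: tofile writes only the bytes, no dtype or shape.
    raw_path = os.path.join(tmp, "model_binary")
    data.tofile(raw_path)

    # Reading without a dtype assumes float64, which gives garbage values.
    wrong = np.fromfile(raw_path)

    # Supplying the dtype and the column count recovers the array.
    restored = np.fromfile(raw_path, dtype=np.float32).reshape(-1, 3)

    # np.save / np.load keep dtype and shape in the file header.
    npy_path = os.path.join(tmp, "model.npy")
    np.save(npy_path, data)
    loaded = np.load(npy_path)

print(wrong.shape, restored.shape, loaded.dtype)
```

With raw files you have to remember the dtype and shape yourself; the .npy route avoids that bookkeeping.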
I have a block of string as below. How do I read this into a numpy array?
5.780326E+03 7.261185E+03 7.749190E+03 8.488770E+03 5.406134E+03 2.828410E+03 9.620957E+02 1.0000000E+00
3.097372E+03 3.885160E+03 5.432678E+03 8.060628E+03 2.768457E+03 6.574258E+03 7.268591E+02 2.0000000E+00
2.061429E+03 4.665282E+03 8.214119E+03 3.579380E+03 8.542057E+03 2.089062E+03 8.829263E+02 3.0000000E+00
3.572444E+03 9.920473E+03 3.573251E+03 6.423813E+03 2.469338E+03 4.652253E+03 8.211962E+02 4.0000000E+00
7.460966E+03 7.691966E+03 7.501826E+03 3.414511E+03 8.590221E+03 6.737868E+03 8.586273E+02 5.0000000E+00
3.250046E+03 9.611985E+03 9.195165E+03 1.064800E+03 7.944535E+03 2.685740E+03 8.212849E+02 6.0000000E+00
8.069926E+03 9.208576E+03 4.267749E+03 2.491888E+03 9.036555E+03 5.001732E+03 7.202407E+02 7.0000000E+00
5.691460E+03 3.868344E+03 3.103342E+03 6.567618E+03 7.274860E+03 8.393253E+03 5.628069E+02 8.0000000E+00
2.887292E+03 9.081563E+02 6.955551E+03 6.763133E+03 2.146178E+03 2.033861E+03 9.725472E+02 9.0000000E+00
6.127778E+03 8.065057E+02 7.474341E+03 4.185868E+03 4.516230E+03 8.714840E+03 8.254562E+02 1.0000000E+01
1.594643E+03 6.060956E+03 2.137153E+03 3.505950E+03 7.714227E+03 6.249693E+03 5.724376E+02 1.1000000E+01
5.039059E+03 3.138161E+03 5.570104E+03 4.594189E+03 7.889644E+03 1.891062E+03 7.085753E+02 1.2000000E+01
3.263593E+03 6.085087E+03 7.136061E+03 9.895028E+03 6.139666E+03 6.670919E+03 5.018248E+02 1.3000000E+01
9.954830E+03 6.777074E+03 3.013747E+03 3.638458E+03 4.357685E+03 1.876539E+03 5.969378E+02 1.4000000E+01
9.920853E+03 3.414156E+03 5.534430E+03 2.011815E+03 7.791122E+03 3.893439E+03 5.229754E+02 1.5000000E+01
5.447470E+03 7.184321E+03 1.382575E+03 9.134295E+03 7.883753E+02 9.160537E+03 7.521197E+02 1.6000000E+01
3.344917E+03 8.151884E+03 3.596052E+03 3.953284E+03 7.456115E+03 7.749632E+03 9.773521E+02 1.7000000E+01
6.310496E+03 1.472792E+03 1.812452E+03 9.535100E+03 1.581263E+03 3.649150E+03 6.562440E+02 1.8000000E+01
I am trying to use numpy native methods so as to speed up the data reading. I am trying to read in a couple of GBs of data from a custom file format. I am able to seek and reach the area where a block of text as shown above will appear. Doing regular python string operations on this is always possible; however, I wanted to know if there are any native numpy methods to read a fixed-width format.
I tried using np.frombuffer with dtype=float, which did not work. It does read if I use dtype='S15'; however, the result shows up as bytes and not numbers.
In [294]: txt = """5.780326E+03 7.261185E+03 7.749190E+03 8.488770E+03 5.406134E+03 2
...: .828410E+03 9.620957E+02 1.0000000E+00
...: 3.097372E+03 3.885160E+03 5.432678E+03 8.060628E+03 2.768457E+03 6.57425
...: 8E+03 7.268591E+02 2.0000000E+00
...: 2.061429E+03 4.665282E+03 8.214119E+03 3.579380E+03 8.542057E+03 2.08906
...: 2E+03 8.829263E+02 3.0000000E+00
...: """
With this copy-n-paste I'm assuming your block is a multiline string.
Treating it like a csv file.
In [296]: np.loadtxt(txt.splitlines())
Out[296]:
array([[5.780326e+03, 7.261185e+03, 7.749190e+03, 8.488770e+03,
5.406134e+03, 2.828410e+03, 9.620957e+02, 1.000000e+00],
[3.097372e+03, 3.885160e+03, 5.432678e+03, 8.060628e+03,
2.768457e+03, 6.574258e+03, 7.268591e+02, 2.000000e+00],
[2.061429e+03, 4.665282e+03, 8.214119e+03, 3.579380e+03,
8.542057e+03, 2.089062e+03, 8.829263e+02, 3.000000e+00]])
There's a lot going on under the covers, so this isn't particularly fast. pandas has a faster csv reader.
fromstring works, but returns 1d. You can reshape the result
In [299]: np.fromstring(txt, sep=' ')
Out[299]:
array([5.780326e+03, 7.261185e+03, 7.749190e+03, 8.488770e+03,
5.406134e+03, 2.828410e+03, 9.620957e+02, 1.000000e+00,
3.097372e+03, 3.885160e+03, 5.432678e+03, 8.060628e+03,
2.768457e+03, 6.574258e+03, 7.268591e+02, 2.000000e+00,
2.061429e+03, 4.665282e+03, 8.214119e+03, 3.579380e+03,
8.542057e+03, 2.089062e+03, 8.829263e+02, 3.000000e+00])
This is a string, not a buffer, so frombuffer is wrong.
This list comprehension works:
np.array([row.strip().split(' ') for row in txt.strip().splitlines()], float)
I had to add strip to clear out excess blanks that produced empty lists or strings.
At least with this small sample, the list comprehension isn't that much slower than the fromstring, and still a lot better than the more general loadtxt.
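As a sketch of the "parse flat, then reshape" idea, the snippet below uses a two-row excerpt of the sample and assumes the column count (8) is known in advance. It uses a plain str.split rather than np.fromstring to produce the flat array, and compares the result with np.loadtxt.

```python
import io

import numpy as np

# Two rows copied from the sample block in the question.
txt = """5.780326E+03 7.261185E+03 7.749190E+03 8.488770E+03 5.406134E+03 2.828410E+03 9.620957E+02 1.0000000E+00
3.097372E+03 3.885160E+03 5.432678E+03 8.060628E+03 2.768457E+03 6.574258E+03 7.268591E+02 2.0000000E+00
"""

n_cols = 8  # assumed known from the file format

# str.split() with no argument splits on any run of whitespace,
# including newlines, so the result is one flat list of tokens.
flat = np.array(txt.split(), dtype=float)

# -1 lets NumPy infer the number of rows from the total size.
table = flat.reshape(-1, n_cols)

# The more general (and slower) reader gives the same 2-D result.
via_loadtxt = np.loadtxt(io.StringIO(txt))

print(table.shape)
```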
You could use several string operations to convert the data into strings that are convertible to float, such as:
import numpy as np
with open('data.txt', 'r') as f:
    data = f.readlines()
result = []
for line in data:
    splitted_data = line.split(' ')
    splitted_data = [item for item in splitted_data if item]
    splitted_data = [item.replace('E+', 'e') for item in splitted_data]
    result.append(splitted_data)
result = np.array(result, dtype = 'float64')
Where data.txt is the data you pasted in your question.
I just did a regular python split and assigned the dtype to np.float32
>>> y=np.array(x.split(), dtype=np.float32)
>>> y
array([ 5.78032617e+03, 7.26118506e+03, 7.74918994e+03,
8.48876953e+03, 5.40613379e+03, 2.82840991e+03,
9.62095703e+02, 1.00000000e+00, 3.09737207e+03,
3.88515991e+03, 5.43267822e+03, 8.06062793e+03,
2.76845703e+03, 6.57425781e+03, 7.26859070e+02,
2.00000000e+00, 2.06142896e+03, 4.66528223e+03,
8.21411914e+03, 3.57937988e+03, 8.54205664e+03,
2.08906201e+03, 8.82926270e+02, 3.00000000e+00], dtype=float32)
P.S. I copied a chunk of your sample data and assigned it to variable “x”
Ok, this doesn’t rely on any blank spaces or use split(), except for splitting the lines, and it maintains the shape of the array, but it still uses plain Python rather than NumPy-only operations.
>>> n=15
>>> x=' 5.780326E+03 7.261185E+03 7.749190E+03 8.488770E+03 5.406134E+03 2.828410E+03 9.620957E+02 1.0000000E+00\n 3.097372E+03 3.885160E+03 5.432678E+03 8.060628E+03 2.768457E+03 6.574258E+03 7.268591E+02 2.0000000E+00\n 2.061429E+03 4.665282E+03 8.214119E+03 3.579380E+03 8.542057E+03 2.089062E+03 8.829263E+02 3.0000000E+00\n 3.572444E+03 9.920473E+03 3.573251E+03 6.423813E+03 2.469338E+03 4.652253E+03 8.211962E+02 4.0000000E+00\n 7.460966E+03 7.691966E+03 7.501826E+03 3.414511E+03 8.590221E+03 6.737868E+03 8.586273E+02 5.0000000E+00\n 3.250046E+03 9.611985E+03 9.195165E+03 1.064800E+03 7.944535E+03 2.685740E+03 8.212849E+02 6.0000000E+00\n 8.069926E+03 9.208576E+03 4.267749E+03 2.491888E+03 9.036555E+03 5.001732E+03 7.202407E+02 7.0000000E+00\n 5.691460E+03 3.868344E+03 3.103342E+03 6.567618E+03 7.274860E+03 8.393253E+03 5.628069E+02 8.0000000E+00\n 2.887292E+03 9.081563E+02 6.955551E+03 6.763133E+03 2.146178E+03 2.033861E+03 9.725472E+02 9.0000000E+00\n 6.127778E+03 8.065057E+02 7.474341E+03 4.185868E+03 4.516230E+03 8.714840E+03 8.254562E+02 1.0000000E+01\n 1.594643E+03 6.060956E+03 2.137153E+03 3.505950E+03 7.714227E+03 6.249693E+03 5.724376E+02 1.1000000E+01\n 5.039059E+03 3.138161E+03 5.570104E+03 4.594189E+03 7.889644E+03 1.891062E+03 7.085753E+02 1.2000000E+01\n 3.263593E+03 6.085087E+03 7.136061E+03 9.895028E+03 6.139666E+03 6.670919E+03 5.018248E+02 1.3000000E+01\n 9.954830E+03 6.777074E+03 3.013747E+03 3.638458E+03 4.357685E+03 1.876539E+03 5.969378E+02 1.4000000E+01\n 9.920853E+03 3.414156E+03 5.534430E+03 2.011815E+03 7.791122E+03 3.893439E+03 5.229754E+02 1.5000000E+01\n 5.447470E+03 7.184321E+03 1.382575E+03 9.134295E+03 7.883753E+02 9.160537E+03 7.521197E+02 1.6000000E+01\n 3.344917E+03 8.151884E+03 3.596052E+03 3.953284E+03 7.456115E+03 7.749632E+03 9.773521E+02 1.7000000E+01\n 6.310496E+03 1.472792E+03 1.812452E+03 9.535100E+03 1.581263E+03 3.649150E+03 6.562440E+02 1.8000000E+01'
>>> s=np.array([[y[i:i+n] for i in range(0, len(y) - n + 1, n)] for y in x.splitlines()], dtype=np.float32)
>>> s
array([[ 5.78032617e+03, 7.26118506e+03, 7.74918994e+03,
8.48876953e+03, 5.40613379e+03, 2.82840991e+03,
9.62095703e+02, 1.00000000e+00],
[ 3.09737207e+03, 3.88515991e+03, 5.43267822e+03,
8.06062793e+03, 2.76845703e+03, 6.57425781e+03,
7.26859070e+02, 2.00000000e+00],
[ 2.06142896e+03, 4.66528223e+03, 8.21411914e+03,
3.57937988e+03, 8.54205664e+03, 2.08906201e+03,
8.82926270e+02, 3.00000000e+00],
[ 3.57244409e+03, 9.92047266e+03, 3.57325098e+03,
6.42381299e+03, 2.46933789e+03, 4.65225293e+03,
8.21196228e+02, 4.00000000e+00],
[ 7.46096582e+03, 7.69196582e+03, 7.50182617e+03,
3.41451099e+03, 8.59022070e+03, 6.73786816e+03,
8.58627319e+02, 5.00000000e+00],
[ 3.25004590e+03, 9.61198535e+03, 9.19516504e+03,
1.06480005e+03, 7.94453516e+03, 2.68573999e+03,
8.21284912e+02, 6.00000000e+00],
[ 8.06992578e+03, 9.20857617e+03, 4.26774902e+03,
2.49188794e+03, 9.03655469e+03, 5.00173193e+03,
7.20240723e+02, 7.00000000e+00],
[ 5.69145996e+03, 3.86834399e+03, 3.10334204e+03,
6.56761816e+03, 7.27485986e+03, 8.39325293e+03,
5.62806885e+02, 8.00000000e+00],
[ 2.88729199e+03, 9.08156311e+02, 6.95555078e+03,
6.76313281e+03, 2.14617798e+03, 2.03386096e+03,
9.72547180e+02, 9.00000000e+00],
[ 6.12777783e+03, 8.06505676e+02, 7.47434082e+03,
4.18586816e+03, 4.51622998e+03, 8.71483984e+03,
8.25456177e+02, 1.00000000e+01],
[ 1.59464294e+03, 6.06095605e+03, 2.13715308e+03,
3.50594995e+03, 7.71422705e+03, 6.24969287e+03,
5.72437622e+02, 1.10000000e+01],
[ 5.03905908e+03, 3.13816089e+03, 5.57010400e+03,
4.59418896e+03, 7.88964404e+03, 1.89106201e+03,
7.08575317e+02, 1.20000000e+01],
[ 3.26359302e+03, 6.08508691e+03, 7.13606104e+03,
9.89502832e+03, 6.13966602e+03, 6.67091895e+03,
5.01824799e+02, 1.30000000e+01],
[ 9.95483008e+03, 6.77707422e+03, 3.01374707e+03,
3.63845801e+03, 4.35768506e+03, 1.87653894e+03,
5.96937805e+02, 1.40000000e+01],
[ 9.92085254e+03, 3.41415601e+03, 5.53443018e+03,
2.01181494e+03, 7.79112207e+03, 3.89343896e+03,
5.22975403e+02, 1.50000000e+01],
[ 5.44747021e+03, 7.18432080e+03, 1.38257495e+03,
9.13429492e+03, 7.88375305e+02, 9.16053711e+03,
7.52119690e+02, 1.60000000e+01],
[ 3.34491699e+03, 8.15188379e+03, 3.59605200e+03,
3.95328394e+03, 7.45611523e+03, 7.74963184e+03,
9.77352112e+02, 1.70000000e+01],
[ 6.31049609e+03, 1.47279199e+03, 1.81245203e+03,
9.53509961e+03, 1.58126294e+03, 3.64914990e+03,
6.56244019e+02, 1.80000000e+01]], dtype=float32)
Thanks to @hpaulj's comments, here's the answer I ended up with.
data = np.genfromtxt(f, delimiter=[15]*8, max_rows=18)
More explanation
Since I am reading this from a custom file format, I will post how I'm doing the whole thing as well.
I do some initial processing of the file to identify the positions where the block of text is residing and end up with an array of 'locations' where I can seek to start the reading process and then I use the above method to read the 'block' of text.
data = np.array([])
r = 18 # rows per block
c = 8 # columns per block
w = 15 # width of a column
with open('mycustomfile.xyz') as f:
    for location in locations:
        f.seek(location)
        data = np.append(data, np.genfromtxt(f, delimiter=[w]*c, max_rows=r))
data = data.reshape((r*len(locations),c))
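The seek-and-read loop can be tried without the custom file. The sketch below is an assumption-laden stand-in: it builds a fake in-memory file (io.StringIO) with a text header before each fixed-width block, uses a hypothetical helper make_block to generate the blocks, and uses 3 rows per block instead of 18 for brevity. It also collects the blocks in a list and stacks them once, rather than calling np.append in the loop.

```python
import io

import numpy as np

r, c, w = 3, 8, 15  # rows per block, columns per block, width of a column


def make_block(start):
    """Hypothetical helper: return (values, text) for one fixed-width block."""
    values = (np.arange(r * c, dtype=float).reshape(r, c) + start) * 1.5
    buf = io.StringIO()
    # delimiter="" so every field is exactly w characters wide.
    np.savetxt(buf, values, fmt=f"%{w}.6E", delimiter="")
    return values, buf.getvalue()


# Build the fake file and remember the offset where each block starts.
parts, locations, expected = [], [], []
offset = 0
for start in (0, 100):
    header = f"HEADER before the block starting at {start}\n"
    values, text = make_block(start)
    parts += [header, text]
    offset += len(header)
    locations.append(offset)
    offset += len(text)
    expected.append(values)

f = io.StringIO("".join(parts))

# Seek to each block and read it as fixed-width fields.
blocks = []
for location in locations:
    f.seek(location)
    blocks.append(np.genfromtxt(f, delimiter=[w] * c, max_rows=r))

data = np.vstack(blocks)  # shape (r * len(locations), c)
print(data.shape)
```

Stacking once at the end avoids the repeated copying that np.append does on every iteration, which may matter for files with many blocks.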
If you want an array with dtype=float you have to convert your string to float beforehand.
import numpy as np
string_list = ["1", "0.1", "1.345e003"]
array = np.array([float(string) for string in string_list])
array.dtype
I want to save an array of floats to a file with numpy.savetxt().
The floats have to be decimals (so non-exponential!)
The code is as follows:
wav = 1./w
wav = np.array(sorted(wav))
efittet = permfitfuncLD(params1[0], params1[1:],1./wav).real
eefittet = permfitfuncLD(params1[0], params1[1:],1./wav).imag
etoprint = [float(s) for s in ["%.5f" % i for i in efittet]]
eetoprint = [float(s) for s in ["%.5f" % i for i in eefittet]]
wavtoprint = [float(s) for s in ["%.5f" % i for i in wav]]
print "--------------------------------------------------------------------"
print "etoprint"
print etoprint
print "--------------------------------------------------------------------"
print "eetoprint"
print eetoprint
print "--------------------------------------------------------------------"
print "wavtoprint"
print wavtoprint
print "--------------------------------------------------------------------"
arr = np.array([wavtoprint,etoprint,eetoprint]).transpose()
print "txt to safe should contain: ", arr
w and params are some parameters for my function permfitfuncLD() which itself is complex.
The output is:
--------------------------------------------------------------------
etoprint
[0.50391, 0.44551, 0.37837, 0.29117, 0.19734, 0.09081, -0.03854, -0.17022, -0.325, -0.47742, -0.63656, -0.8125, -0.9728, -1.13589, -1.26684, -1.36734, -1.42736, -1.42551, -1.3545, -1.23074, -1.07199, -0.90215, -0.79518, -0.78347, -0.899, -1.14185, -1.51655, -1.87591, -2.07595, -1.89561, -1.53701, -1.56583, -2.20596, -3.38828, -4.77624, -6.39743, -8.43319, -10.62524, -13.43619, -16.55797, -20.35308, -25.49629, -31.56653, -40.21232, -51.0312, -66.09163, -90.35934, -125.69555, -188.95742]
--------------------------------------------------------------------
eetoprint
[3.51955, 3.48008, 3.44968, 3.42508, 3.41183, 3.40951, 3.4217, 3.44898, 3.49899, 3.5671, 3.6595, 3.79075, 3.94391, 4.14716, 4.3697, 4.62296, 4.92274, 5.20694, 5.49578, 5.71423, 5.85755, 5.90724, 5.8566, 5.72561, 5.58159, 5.4753, 5.48067, 5.66307, 6.04179, 6.30754, 6.06509, 5.20939, 4.3096, 3.51667, 2.98072, 2.59107, 2.28845, 2.09188, 1.9525, 1.88784, 1.89048, 1.98989, 2.20993, 2.67266, 3.45841, 4.87942, 7.80959, 13.04591, 25.27873]
--------------------------------------------------------------------
wavtoprint
[0.18787, 0.19165, 0.19527, 0.19935, 0.20327, 0.20735, 0.21196, 0.2164, 0.22142, 0.22627, 0.23134, 0.23709, 0.24265, 0.24899, 0.25514, 0.2616, 0.26897, 0.27616, 0.2844, 0.29244, 0.30096, 0.31077, 0.3204, 0.33154, 0.34253, 0.35428, 0.36794, 0.38153, 0.39742, 0.41332, 0.43054, 0.4509, 0.47147, 0.49599, 0.52099, 0.54866, 0.58214, 0.6169, 0.65956, 0.70453, 0.75608, 0.82117, 0.89206, 0.9841, 1.08769, 1.21565, 1.39322, 1.61035, 1.93745]
--------------------------------------------------------------------
txt to safe should contain: [[ 1.87870000e-01 5.03910000e-01 3.51955000e+00]
[ 1.91650000e-01 4.45510000e-01 3.48008000e+00]
[ 1.95270000e-01 3.78370000e-01 3.44968000e+00]
[ 1.99350000e-01 2.91170000e-01 3.42508000e+00]
[ 2.03270000e-01 1.97340000e-01 3.41183000e+00]
[ 2.07350000e-01 9.08100000e-02 3.40951000e+00]
[ 2.11960000e-01 -3.85400000e-02 3.42170000e+00]
[ 2.16400000e-01 -1.70220000e-01 3.44898000e+00]
[ 2.21420000e-01 -3.25000000e-01 3.49899000e+00]
[ 2.26270000e-01 -4.77420000e-01 3.56710000e+00]
[ 2.31340000e-01 -6.36560000e-01 3.65950000e+00]
[ 2.37090000e-01 -8.12500000e-01 3.79075000e+00]
[ 2.42650000e-01 -9.72800000e-01 3.94391000e+00]
[ 2.48990000e-01 -1.13589000e+00 4.14716000e+00]
[ 2.55140000e-01 -1.26684000e+00 4.36970000e+00]
[ 2.61600000e-01 -1.36734000e+00 4.62296000e+00]
[ 2.68970000e-01 -1.42736000e+00 4.92274000e+00]
[ 2.76160000e-01 -1.42551000e+00 5.20694000e+00]
[ 2.84400000e-01 -1.35450000e+00 5.49578000e+00]
[ 2.92440000e-01 -1.23074000e+00 5.71423000e+00]
[ 3.00960000e-01 -1.07199000e+00 5.85755000e+00]
[ 3.10770000e-01 -9.02150000e-01 5.90724000e+00]
[ 3.20400000e-01 -7.95180000e-01 5.85660000e+00]
[ 3.31540000e-01 -7.83470000e-01 5.72561000e+00]
[ 3.42530000e-01 -8.99000000e-01 5.58159000e+00]
[ 3.54280000e-01 -1.14185000e+00 5.47530000e+00]
[ 3.67940000e-01 -1.51655000e+00 5.48067000e+00]
[ 3.81530000e-01 -1.87591000e+00 5.66307000e+00]
[ 3.97420000e-01 -2.07595000e+00 6.04179000e+00]
[ 4.13320000e-01 -1.89561000e+00 6.30754000e+00]
[ 4.30540000e-01 -1.53701000e+00 6.06509000e+00]
[ 4.50900000e-01 -1.56583000e+00 5.20939000e+00]
[ 4.71470000e-01 -2.20596000e+00 4.30960000e+00]
[ 4.95990000e-01 -3.38828000e+00 3.51667000e+00]
[ 5.20990000e-01 -4.77624000e+00 2.98072000e+00]
[ 5.48660000e-01 -6.39743000e+00 2.59107000e+00]
[ 5.82140000e-01 -8.43319000e+00 2.28845000e+00]
[ 6.16900000e-01 -1.06252400e+01 2.09188000e+00]
[ 6.59560000e-01 -1.34361900e+01 1.95250000e+00]
[ 7.04530000e-01 -1.65579700e+01 1.88784000e+00]
[ 7.56080000e-01 -2.03530800e+01 1.89048000e+00]
[ 8.21170000e-01 -2.54962900e+01 1.98989000e+00]
[ 8.92060000e-01 -3.15665300e+01 2.20993000e+00]
[ 9.84100000e-01 -4.02123200e+01 2.67266000e+00]
[ 1.08769000e+00 -5.10312000e+01 3.45841000e+00]
[ 1.21565000e+00 -6.60916300e+01 4.87942000e+00]
[ 1.39322000e+00 -9.03593400e+01 7.80959000e+00]
[ 1.61035000e+00 -1.25695550e+02 1.30459100e+01]
[ 1.93745000e+00 -1.88957420e+02 2.52787300e+01]]
My problem is that the values have to be floats (i.e. the first one should be 0.18787).
I don't understand why the numbers in the array arr are in exponential form.
I really appreciate your help!
These are floats; exponential notation is just another way of displaying them. NumPy switches to exponential format when printing an array whose values span a wide range of magnitudes. If you need another format for storing the data in a text file, use numpy.savetxt with its fmt parameter, or do it yourself using the Python formatting functions.
However, you should be aware that you're going to lose precision compared to the scientific format. The reason is that the scientific format can represent very small numbers with the same number of digits as very large numbers. You cannot do that with the decimal representation.
If you're interested in printing arrays of floats in NumPy, you can:
Use the fmt parameter of the np.savetxt function. For example, if you want to print 3 columns of floats as "%.5f", separated by a space, you'd use fmt="%.5f %.5f %.5f".
If you want to change the way NumPy arrays are printed on screen, you can play with the np.set_printoptions function, using for example np.set_printoptions(precision=5, suppress=True); suppress=True asks for fixed-point rather than exponential notation.
There's one last trick to transform an array of floats into an array of strings: just use a string dtype like dtype="|S10" (e.g., wav.astype("|S10")). That'll require some fiddling to find the proper size of string you want ("|S10" means "up to 10 characters"), as anything larger gets truncated. I wouldn't advise it in real life, but as an exercise it's harmless.
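The first two options can be sketched as follows, assuming a two-row stand-in for arr (the first and last rows from the question) and writing to an in-memory buffer instead of a real file.

```python
import io

import numpy as np

# First and last rows of the question's array, as a stand-in.
arr = np.array([[0.18787, 0.50391, 3.51955],
                [1.93745, -188.95742, 25.27873]])

# Option 1: control the on-disk text format with fmt.
# A single "%.5f" is applied to every column; the default delimiter is a space.
buf = io.StringIO()
np.savetxt(buf, arr, fmt="%.5f")
text = buf.getvalue()
print(text)

# Option 2: control only the on-screen display.
# np.printoptions is the context-manager form of np.set_printoptions.
with np.printoptions(precision=5, suppress=True):
    shown = str(arr)
print(shown)
```

Neither option changes the values stored in arr; the first affects only the file contents and the second only how the array is displayed.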