I am having trouble reading a binary file back into NumPy. I have a NumPy array:
data = array([[ 0. , 0. , 7.821725 ],
[ 0.05050505, 0. , 7.6358337 ],
[ 0.1010101 , 0. , 7.453858 ],
...,
[ 4.8989897 , 5. , 16.63227 ],
[ 4.949495 , 5. , 16.88153 ],
[ 5. , 5. , 17.130795 ]], dtype=float32)
I wrote this array to a file in binary format.
file = open('model_binary', 'wb')
data.tofile(file)
Now, I am unable to get back the data from the saved binary file. I tried using numpy.fromfile() but it didn't work out for me.
file = open('model_binary', 'rb')
data = np.fromfile(file)
When I printed the data I got [0.00000000e+00 2.19335211e-13 8.33400000e+04 ... 2.04800049e+03 2.04800050e+03 5.25260241e+07] which is absolutely not what I want.
I ran the following code to check what was in the file,
for line in file:
    print(line)
    break
I got the output as b'\x00\x00\x00\x00\......\c1\x07#\x00\x00\x00\x00S\xc5{#j\xfd\n' which I suppose is in binary format.
I would like to get the array back from the binary file as it was saved. Any help will be appreciated.
As Kevin noted, specifying the dtype is required. You might also need to reshape, since you have 3 columns in your example. So
file = open('model_binary', 'rb')
data = np.fromfile(file, dtype=np.float32).reshape((-1, 3))
should work for you.
As an aside, np.save also writes a binary file (.npy), and because it stores the dtype and shape alongside the data, np.load avoids these issues entirely.
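A minimal round-trip sketch of that np.save / np.load approach (the file name here is just an example):

import numpy as np

data = np.random.rand(5, 3).astype(np.float32)

# The .npy format stores the dtype and shape in a header,
# so nothing needs to be specified when loading.
np.save('model_binary.npy', data)
restored = np.load('model_binary.npy')

assert restored.dtype == np.float32 and restored.shape == (5, 3)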
I'm running the k-means algorithm 3 times and storing the final centers in an array
center_array = []
backXnorm = Xnorm
for i in range(1, 3):
    X = dataML
    X = X[np.random.default_rng(seed=i).permutation(X.columns.values)]
    print(X.head())
    Xnorm = mms.fit_transform(X)
    km = KMeans(n_clusters=4, n_init=10, max_iter=30, random_state=42)
    y_kmeans = km.fit_predict(Xnorm)
    center_array.append(km.cluster_centers_)
The values appear to be duplicated; it seems that the entire set of centers is added again in each iteration.
Below is the output I'm getting:
[array([[ 0.91902229, 0.99146416, 0.11154588, -0.41348193, -0.45307083,
0.18579957, 0.20004497, -0.91902229, -0.17537297, -0.99146416,
-0.4091783 , -0.12493111],
[-0.17637011, -0.02577591, -0.48222273, 1.39450598, 1.50699298,
-0.14651225, -0.12975152, 0.17637011, 0.65213679, 0.02577591,
1.37195399, 0.44572744],
[ 0.91902229, -1.00860933, 0.11367937, -0.40910528, -0.45108061,
0.19771608, 0.23722015, -0.91902229, -0.18480587, 1.00860933,
-0.40459059, -0.13536744],
[-1.08811289, -0.0290917 , 0.19925625, -0.46264585, -0.48998741,
-0.14748408, -0.1943812 , 1.08811289, -0.23289607, 0.0290917 ,
-0.45219009, -0.14996175]]), array([[-0.17537297, 0.18579957, -0.91902229, -0.99146416, 0.99146416,
-0.41348193, -0.45307083, -0.4091783 , -0.12493111, 0.11154588,
0.91902229, 0.20004497],
[ 0.65213679, -0.14651225, 0.17637011, 0.02577591, -0.02577591,
1.39450598, 1.50699298, 1.37195399, 0.44572744, -0.48222273,
-0.17637011, -0.12975152],
[-0.18480587, 0.19771608, -0.91902229, 1.00860933, -1.00860933,
-0.40910528, -0.45108061, -0.40459059, -0.13536744, 0.11367937,
0.91902229, 0.23722015],
[-0.23289607, -0.14748408, 1.08811289, 0.0290917 , -0.0290917 ,
-0.46264585, -0.48998741, -0.45219009, -0.14996175, 0.19925625,
-1.08811289, -0.1943812 ]])]
I was expecting the final array to be something like this
[[ 0.91902229, 0.99146416, 0.11154588, -0.41348193, -0.45307083,
0.18579957, 0.20004497, -0.91902229, -0.17537297, -0.99146416,
-0.4091783 , -0.12493111],
[-0.17637011, -0.02577591, -0.48222273, 1.39450598, 1.50699298,
-0.14651225, -0.12975152, 0.17637011, 0.65213679, 0.02577591,
1.37195399, 0.44572744],
[ 0.91902229, -1.00860933, 0.11367937, -0.40910528, -0.45108061,
0.19771608, 0.23722015, -0.91902229, -0.18480587, 1.00860933,
-0.40459059, -0.13536744],
[-1.08811289, -0.0290917 , 0.19925625, -0.46264585, -0.48998741,
-0.14748408, -0.1943812 , 1.08811289, -0.23289607, 0.0290917 ,
-0.45219009, -0.14996175]]
Am I using append wrong? Should I use another type of structure to store the final center values?
K-means is not sensitive to column order. That is why you get the same centers in both runs, just with their columns shuffled to match the shuffled column order.
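A minimal sketch of one way to make the runs directly comparable (assuming, as in the question, that dataML is a pandas DataFrame, mms is a MinMaxScaler, and KMeans comes from scikit-learn): record each run's shuffled column order and permute the centers back to the original order before appending.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

mms = MinMaxScaler()
original_cols = list(dataML.columns)   # dataML: the DataFrame from the question
center_array = []

for i in range(1, 3):
    rng = np.random.default_rng(seed=i)
    shuffled_cols = list(rng.permutation(original_cols))

    Xnorm = mms.fit_transform(dataML[shuffled_cols])
    km = KMeans(n_clusters=4, n_init=10, max_iter=30, random_state=42)
    km.fit(Xnorm)

    # Permute the center columns back to the original column order so the
    # centers from different runs line up feature by feature.
    order = [shuffled_cols.index(c) for c in original_cols]
    center_array.append(km.cluster_centers_[:, order])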
OK, this question probably has a very simple answer, but I've been searching for quite a while with no luck...
I want to get the dot product of 2 complex numbers, treated as 2-D vectors in the complex plane. However, np.dot and np.vdot both give a result different from what I want.
Example of what I WANT to do:
a = 1+1j
b = 1-1j
dot(a,b) == 0
What I actually get:
np.dot(a,b) == 2+0j
np.vdot(a,b) == 0-2j
np.conj(a)*b == 0-2j
I am able to get what I want using this rather clumsy expression (edit for readability):
a.real*b.real + a.imag*b.imag
But I am very surprised not to find a nice ufunc to do this. Does it not exist? I was not expecting to have to write my own ufunc to vectorize such a common operation.
Part of my concern here is that my expression seems to do a lot of extra work extracting the real/imaginary parts when they should already be in adjacent memory locations (given that a and b are already stored in a complex dtype like complex64). This has the potential to cause a pretty severe slowdown.
EDIT:
Using Numba I ended up defining a ufunc:
from numba import vectorize

@vectorize
def cdot(a, b):
    return a.real*b.real + a.imag*b.imag
This allowed me to correlate complex data properly. Here's a correlation image for the guys who helped me!
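A quick usage check of that ufunc (a hypothetical example, not from the original post; the definition is repeated so the snippet runs on its own): it broadcasts over complex arrays and returns the real 2-D dot product for each pair.

import numpy as np
from numba import vectorize

@vectorize
def cdot(a, b):
    return a.real * b.real + a.imag * b.imag

a = np.array([1 + 1j, 2 + 3j])
b = np.array([1 - 1j, 2 - 3j])

# Elementwise: 1*1 + 1*(-1) = 0 and 2*2 + 3*(-3) = -5
print(cdot(a, b))   # [ 0. -5.]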
For arrays and NumPy complex scalars (but not plain Python complex numbers) you can view-cast to float. For example:
a = np.exp(1j*np.arange(4))
b = np.exp(-1j*np.arange(4))
a
# array([ 1. +0.j , 0.54030231+0.84147098j,
# -0.41614684+0.90929743j, -0.9899925 +0.14112001j])
b
# array([ 1. -0.j , 0.54030231-0.84147098j,
# -0.41614684-0.90929743j, -0.9899925 -0.14112001j])
ar = a[...,None].view(float)
br = b[...,None].view(float)
ar
# array([[ 1. , 0. ],
# [ 0.54030231, 0.84147098],
# [-0.41614684, 0.90929743],
# [-0.9899925 , 0.14112001]])
br
# array([[ 1. , -0. ],
# [ 0.54030231, -0.84147098],
# [-0.41614684, -0.90929743],
# [-0.9899925 , -0.14112001]])
Now, for example, all pairwise dot products:
np.inner(ar,br)
# array([[ 1. , 0.54030231, -0.41614684, -0.9899925 ],
# [ 0.54030231, -0.41614684, -0.9899925 , -0.65364362],
# [-0.41614684, -0.9899925 , -0.65364362, 0.28366219],
# [-0.9899925 , -0.65364362, 0.28366219, 0.96017029]])
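As a side note (a common idiom, not part of the answer above): if you only need the elementwise real dot products rather than all pairwise combinations, the same float views work with einsum, and the result matches Re(a · conj(b)).

import numpy as np

a = np.exp(1j * np.arange(4))
b = np.exp(-1j * np.arange(4))

# View each complex array as pairs of (real, imag) floats, as above.
ar = a[..., None].view(float)
br = b[..., None].view(float)

# One real dot product per pair of complex numbers.
elementwise = np.einsum('ij,ij->i', ar, br)

assert np.allclose(elementwise, (a * b.conj()).real)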
I found an interesting problem. I am trying to save a huge list of NumPy arrays of different lengths to a file so I can reuse it later on. While I managed to save the list, I am struggling to read it back. Neither NumPy nor Python seems to be able to convert the string back into the initial list. Any tips on how to do that?
I have already tried:
list(), np.array() and np.fromstring()
The list looks like this, except that it continues for about 100,000 lines:
[[array([[[ 0.481903 , 0.15787785, 0.05661286],
[-0.08817253, -0.14168766, -0.13894859],
[-0.27888685, -0.11231906, 0.26054043]],
[[ 0.0913363 , 0.09927119, 0.42296773],
[ 0.45385012, 0.0164008 , 0.823071 ],
[-0.7438939 , -0.72650474, -0.4468163 ]],
[[-0.34211668, -0.00215243, 0.26458675],
[-0.23189187, 0.9370323 , -0.6188508 ],
[-0.85894495, 0.43526295, 0.17926843]]], dtype=float32), array([[[ 0.78955674, -0.6114772 , -0.18336566],
[-0.12059411, -1.0608526 , -0.47686368],
[-0.00781631, -0.36990076, 0.23920381]],
[[-0.2827969 , -0.5920803 , 1.1788696 ],
[-0.02591886, -0.24817304, -0.17913376],
[-0.7543818 , -0.00784254, -0.38197488]],
[[-0.566821 , -0.35077536, 0.32748973],
[ 0.26770943, 0.04574856, -0.7584006 ],
[ 1.1999835 , -0.42707324, -0.2599928 ]]], dtype=float32), array([[[ 0.501889 , 0.11805235, -0.28508088],
[-0.18496978, -1.2954917 , 0.39576113],
[ 0.03896124, -0.80981237, 0.8888588 ]],
[[ 0.28127173, -0.04418045, 0.74862033],
[ 0.5746676 , -1.0427617 , -0.00984947],
[ 1.357876 , 0.49865335, -0.5559544 ]],
[[-0.2253674 , 0.01848532, 0.16229743],
[ 0.02945629, -0.3473735 , -0.16368015],
[-0.21004315, 0.75182045, -0.14023288]]], dtype=float32),
list() and np.array() both run, but they produce results totally different from what the list initially was.
EDIT: Answer found thanks to @hpaulj.
The list was converted to an ndarray and then saved as a .npy binary file using np.save.
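A minimal sketch of that approach (file name and array shapes are made up here): wrap the list in an object-dtype ndarray, save it with np.save, and pass allow_pickle=True when loading, since object arrays are pickled inside the .npy file. If all the arrays actually share the same shape, np.stack plus a plain np.save is simpler.

import numpy as np

arrays = [np.random.rand(3, 3, 3).astype(np.float32) for _ in range(5)]

# Build an object array explicitly so NumPy does not try to stack the elements.
obj = np.empty(len(arrays), dtype=object)
obj[:] = arrays

np.save('weights.npy', obj)

# Object arrays are stored via pickle, so loading needs allow_pickle=True.
restored = list(np.load('weights.npy', allow_pickle=True))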
I have a matrix of float numbers which I'm trying to write to a CSV file; however, the CSV writer writes it in scientific notation, and I would like to keep the numbers in their original notation. I tried adding "%.2f" %, but that resulted in the following error:
"TypeError: only length-1 arrays can be converted to Python scalars"
for item in string_color_Values:
    avg_color_values = [float(item) for item in string_color_Values]
array1t = np.array(avg_color_values)
np.savetxt("test.csv", array1t, delimiter=",")
original:
[ 0.5258 1. ]
[ 0.528 1. ]
[ 0.5486 1. ]
[ 0.5545 1. ]
[ 0.732 1. ]
[ 0.7872 1. ]
[ 1. 1. ]]
The CSV I obtain:
1.2270000000035e-01,1.0000000000e+00
2.7639999999790e-01,1.0000000000e+00
etc..
You need to pass fmt="%g" in the call to np.savetxt. That will write the numbers without trailing zeroes. Alternatively, if you want "cleanly" formatted numbers you could use e.g. fmt="%.4f" (or any other format that takes your fancy).
import numpy as np
avg_color_Values = np.random.random((10,3))
array1t = np.array(avg_color_Values)
np.savetxt("test.csv", array1t, delimiter=",", fmt = "%g")
np.savetxt defaults to the format %.18e, which allows float64 values to be written without loss of precision.
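np.savetxt also accepts a sequence of format strings, one per column, which may be handy here since the second column in the question is always 1; a small sketch with made-up values:

import numpy as np

values = np.array([[0.5258, 1.0],
                   [0.5280, 1.0],
                   [0.7872, 1.0]])

# Four decimal places for the first column, integer for the second.
np.savetxt("test.csv", values, delimiter=",", fmt=["%.4f", "%d"])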
afile is a given file, and z is the degree of the polynomial. I've been breaking my head over this for a while; it's frustrating how I'm basically given zero instructions on how to proceed.
This is what I thought it should be like:
import numpy as np

def newfile(afile, z):
    x, y = np.loadtxt(afile)
    d = np.polyfit(x, y, z)
    return d
I have attempted to do it as
data = np.loadtxt(afile)
x = data[0:]
Printing data gives me this format:
[[ 2. 888.8425]
[ 6. 888.975 ]
[ 14. 888.1026]
[ 17. 888.2071]
[ 23. 886.0479]
[ 26. 883.3316]
[ 48. 877.04 ]
[ 99. 854.3665]]
Printing x in this case just gives me the whole array (I'm thinking the issue lies in the lack of commas). I'd want x to be an array of just the left column.
I suppose you are getting an error when unpacking in this statement:
x,y = np.loadtxt(afile)
You should replace it with this:
x, y = zip(*np.loadtxt(afile))
The rest should work.
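Alternatively (a common NumPy idiom, not part of the answer above): np.loadtxt can transpose the result for you with unpack=True so the two columns unpack directly, or you can slice the columns from the loaded 2-D array.

import numpy as np

def newfile(afile, z):
    # unpack=True returns the columns (here x and y) instead of the rows.
    x, y = np.loadtxt(afile, unpack=True)
    return np.polyfit(x, y, z)

# Equivalent column slicing:
# data = np.loadtxt(afile)
# x, y = data[:, 0], data[:, 1]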