convert binary string to numpy array - python

Assume I have the string:
my_data = '\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@'
Where I got it is irrelevant, but for the sake of having something concrete, assume I read it from a binary file.
I know my string is the binary representation of 4 (4-byte) floats. I would like to get those floats as a numpy array. I could do:
import struct
import numpy as np
tple = struct.unpack( '4f', my_data )
my_array = np.array( tple, dtype=np.float32 )
But it seems silly to create an intermediate tuple. Is there a way to do this operation without creating an intermediate tuple?
EDIT
I would also like to be able to construct the array in such a way that I can specify the endianness of the string.

>>> np.frombuffer(b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@', dtype='<f4') # or dtype=np.dtype('<f4'), or np.float32 on a little-endian system (which most computers are these days)
array([ 1.,  2.,  3.,  4.], dtype=float32)
Or, if you want big-endian:
>>> np.frombuffer(b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@', dtype='>f4') # or dtype=np.dtype('>f4'), or np.float32 on a big-endian system
array([  4.60060299e-41,   8.96831017e-44,   2.30485571e-41,
         4.60074312e-41], dtype=float32)
The b isn't necessary prior to Python 3, of course.
In fact, if you actually are using a binary file to load the data from, you could even skip the using-a-string step and load the data directly from the file with numpy.fromfile().
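If the data really does live in a binary file, a minimal sketch of that route might look like this (the filename floats.bin is made up for illustration):

import numpy as np

# Write the same four little-endian floats to a file, then read them back
# directly with np.fromfile, without holding the raw bytes in a Python string.
with open('floats.bin', 'wb') as f:
    f.write(b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@')

my_array = np.fromfile('floats.bin', dtype='<f4')
print(my_array)  # [1. 2. 3. 4.]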
Also, dtype reference, just in case: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html

np.fromstring() is deprecated. Use np.frombuffer() instead.
import numpy as np
my_data = b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@'
# np.fromstring is deprecated
# data = np.fromstring(my_data, np.float32)
data = np.frombuffer(my_data, np.float32)
print(data)
[1. 2. 3. 4.]
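One caveat worth knowing: np.frombuffer shares memory with the buffer rather than copying it, so the returned array is read-only when it is built from an immutable bytes object. A small sketch of the workaround if you need to modify the values:

import numpy as np

my_data = b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@'
data = np.frombuffer(my_data, dtype=np.float32).copy()  # copy() gives a writable array
data[0] = 10.0
print(data)  # [10.  2.  3.  4.]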

Related

Multiplying ndarray with scalar: TypeError: invalid type promotion

I'm trying to multiply every column in an ndarray by a scalar. When I try to do this, I get the error TypeError: invalid type promotion.
I've tried using array.astype(float), but this gives all NaNs.
array = np.genfromtxt("file.csv", dtype=float, delimiter='\t', names=True)
newarray = array*4.0
file.csv has a number of column headers. For example:
array['col_a'] = [5.0, 6.0]
After multiplying by the scalar, I want:
newarray['col_a'] to be [20.0, 24.0]
I'm honestly amazed that this has never come up in my own code, but it turns out that Numpy structured arrays (i.e. arrays with field names) don't support the standard arithmetic operators +, -, *, or / (see footnote *).
Thus, your only option is to work with a non-structured version of your array. @hpaulj's comment points out ways to do so (this old answer contains a thorough exploration of how to get addition to work with structured arrays). Either index a single field (the result of which behaves like a standard array) and multiply that:
import numpy as np
from io import StringIO
csv = '''col_a\tcol_b\tcol_c
5.0\t19.6\t22.8
6.0\t42.42\t39.208
'''
arr = np.genfromtxt(StringIO(csv), dtype=np.float64, delimiter='\t', names=True)
xcol_a = arr['col_a']*4
print(xcol_a)
# output: [20. 24.]
or omit the names=True kwarg when you generate your array (which makes np.genfromtxt return a standard array instead of a structured one):
arrstd = np.genfromtxt(StringIO(csv), dtype=np.float64, delimiter='\t', skip_header=True)
print(arrstd*4)
# output: [[ 20.     78.4    91.2  ]
#          [ 24.    169.68  156.832]]
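If you need the whole table as one plain 2-D array while still keeping names=True for column access, one possible sketch (assuming NumPy 1.16 or newer, which provides numpy.lib.recfunctions.structured_to_unstructured) is:

import numpy as np
from numpy.lib import recfunctions as rfn

# A small structured array standing in for the genfromtxt result above.
arr = np.array([(5.0, 19.6, 22.8), (6.0, 42.42, 39.208)],
               dtype=[('col_a', 'f8'), ('col_b', 'f8'), ('col_c', 'f8')])

plain = rfn.structured_to_unstructured(arr)  # ordinary (2, 3) float array
print(plain * 4)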
*: Technically, it appears that many of Numpy's built-in ufuncs are not supported when working with structured arrays. At least some of the comparison functions/operators (<, >, and ==) are supported.

What is the expected effect of where in numpy's negative?

As far as I understand the documentation of numpy's negative function, its where option allows you to leave some array components unnegated:
>>> import numpy as np
>>> np.negative(np.array([[1.,2.],[3.,4.],[5.,6.]]), where=[True, False])
array([[-1.,  2.],
       [-3.,  4.],
       [-5.,  6.]])
However, when I try it, it seems that those values are (almost) zeroed instead:
>>> import numpy as np
>>> np.negative(np.array([[1.,2.],[3.,4.],[5.,6.]]), where=[True, False])
array([[ -1.00000000e+000,   6.92885436e-310],
       [ -3.00000000e+000,   6.92885377e-310],
       [ -5.00000000e+000,   6.92885375e-310]])
So how should I see the where option?
The documentation describes where like this:
Values of True indicate to calculate the ufunc at that position, values of False indicate to leave the value in the output alone.
Let's try an example using the out parameter:
x = np.ones(3)
np.negative(np.array([4.,5.,6.]), where=np.array([False,True,False]), out=x)
This sets x to [1., -5., 1.], and returns the same.
This makes some amount of sense once you realize that "leave the value in the output alone" literally means the output value is "don't care", rather than "same as the input" (the latter interpretation was how I read it the first time, too).
The problem comes in when you specify where but not out. Apparently the "ufunc machinery" (which is not visible in the implementation of np.negative()) creates an empty output array, meaning the values are indeterminate. So the locations at which where is False will have uninitialized values, which could be anything.
This seems pretty wrong to me, but there was a NumPy issue filed about it last year, and closed. It seems unlikely to change, so you'll have to work around it (e.g. by creating the output array yourself using zeros).
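For example, a minimal sketch of that workaround, seeding the output with a copy of the input so the skipped positions keep their original values:

import numpy as np

x = np.array([[1., 2.], [3., 4.], [5., 6.]])
out = x.copy()                        # pre-fill the output with the input values
np.negative(x, where=[True, False], out=out)
print(out)
# [[-1.  2.]
#  [-3.  4.]
#  [-5.  6.]]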

String arrays in float arrays without format change

Is there a way to include strings in an array of floats without changing the format of the whole array, i.e. without all the floats being converted to strings, while the string element is still kept as a string?
eg.
import numpy as np
a = np.array([ 'hi' , 1. , 2. , 3. ])
Ideally I would like the format to remain the same as how it looks when input as 'a' above.
This gives:
array(['hi', '1.0', '2.0', '3.0'], dtype='|S3')
And then how would one save such an array as a text file?
I'm guessing your problem is this: you want to dump out the array np.array([ 'hi' , 1. , 2. , 3. ]) using np.savetxt() but are getting this error:
TypeError: Mismatch between array dtype ('|S3') and format specifier ('%.18e')
If this is the case, you just need to set the fmt kwarg in np.savetxt. Instead of the default %.18e, which is for formatting floating point data, you can use %s, which formats things as a string, even if the original value in the array was numerical.
So this will work:
import numpy as np
a = np.array([ 'hi' , 1. , 2. , 3. ])
np.savetxt("test.out",a,fmt="%s")
Note that you can just do this with the original list - numpy will convert it to an array for you. So for example you can do:
np.savetxt("test.out",[ 'hi' , 1. , 2. , 3. ],fmt="%s")
and it should work fine too.
For the first part of the question, this is not really what numpy arrays are intended for. If you are trying to put different data types into the same array, then you probably want a different data structure. A vanilla python list would do it, but depending on your situation, a dict is probably what you're looking for.
Edit: Based on the comment threads & the specific question, it looks like this is an attempt to make a header on a data file. This can be done directly through
np.savetxt("a.txt",a,header="title goes here")
This can be read directly with np.loadtxt() because by default the header is prepended with #, and by default np.loadtxt() ignores lines that start with #.
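A quick round-trip sketch of that header approach, using a purely numeric array since the header line carries the text (the filename is arbitrary):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
np.savetxt("a.txt", a, header="title goes here")  # written to the file as "# title goes here"
b = np.loadtxt("a.txt")                           # lines starting with '#' are skipped by default
print(b)  # [1. 2. 3.]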
Use pickle:
import pickle
a = ['abc',3,4,5,6,7.0]
pickle.dump( a, open( "save.p", "wb" ))
b = pickle.load( open( "save.p", "rb" ) )
print(b)
Output:
['abc', 3, 4, 5, 6, 7.0]

Testing the equality of two numpy 2d arrays

I have been trying to copy the individual elements from one 2D array to another. My code is as follows:
tp_matrix = np.array(tp_matrix)
my_array = np.empty(shape = (tp_matrix.shape))
for x in range(tp_matrix.shape[0]):
    for y in range(tp_matrix.shape[1]):
        my_array[x][y] = tp_matrix[x][y]
if(np.array_equal(my_array, tp_matrix)):
    print('Equal')
else:
    print('Not equal')
However the two arrays are not equal for some reason. What is the problem here and what can I do to solve it?
I cannot use numpy's copy function, as I want to later modify some of the elements of my_array while the other values stay the same as those of tp_matrix.
Edit: On running the code I get the following message:
FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
Does this mean there is something wrong with the dataset (tp_matrix)?
Edit 2: I have tried the allclose and isclose functions but I get this error:
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The data is stored as floats. Also it is a bit large (399 x 5825).
Edit 3: Solved. I had to reinstall python.
Use np.allclose to test the (almost) equality of float arrays, because of the way float numbers are represented in a computer.
For more details, you could read for instance "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
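For instance, exact comparison can fail even when two arrays are mathematically "the same", while np.allclose still reports equality within tolerance (a small illustrative sketch):

import numpy as np

a = np.array([0.1, 0.2]) + np.array([0.2, 0.1])
b = np.array([0.3, 0.3])

print(np.array_equal(a, b))  # False: 0.1 + 0.2 is not exactly 0.3 in binary floating point
print(np.allclose(a, b))     # True: equal within the default tolerances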
I tried to mimic what you are experiencing and did the following:
one = np.array([1,2,3])
two = np.array([one,one,one])
three = np.empty(shape=(two.shape))
for x in range(two.shape[0]):
    for y in range(two.shape[1]):
        three[x][y] = two[x][y]
Printing the contents of 'two' and 'three' gives the following result
print(three)
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])
print(two)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
Although for this small example numpy returns True if I test equality using np.array_equal, it is possible that rounding errors cause the test to be False in your case.
A workaround for this could be the following test:
sum(sum(two==three)) == two.shape[0]*three.shape[1]
Although there are probably more efficient ways.
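For what it's worth, the boolean comparison array can also be reduced directly, which is usually clearer than summing it (a small sketch with arrays like those above):

import numpy as np

two = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
three = two.astype(float)

print((two == three).all())     # True if every element matches exactly
print(np.allclose(two, three))  # True if every element matches within tolerance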

Typecasting a numpy matrix into a string in python

I have an m*n numpy matrix of float type. I am going to use the Counter function (from collections) to derive counts of certain combinations of matrix elements. On experimenting, I found that Counter() requires string-valued entries to iterate over, whereas my numpy matrix is by default of float type.
Using dtype while declaring the numpy matrix of zeros doesn't help.
So I thought of converting each element of the numpy matrix into a string, but it's not working. How do I do it?
xx = np.matrix([[1.2,3.4],[5.4,6.7],[9.8, 5.2]])
zz = np.matrix([[str(ele) for ele in a] for a in np.array(xx)])
Result:
>>> xx
matrix([[ 1.2,  3.4],
        [ 5.4,  6.7],
        [ 9.8,  5.2]])
>>> zz
matrix([['1.2', '3.4'],
        ['5.4', '6.7'],
        ['9.8', '5.2']],
       dtype='|S3')
It is unclear exactly what you are trying to do, but you might be better suited using np.histogram (or possibly np.bincount) to derive counts based on a numpy array.
But if you must:
In [45]: a = np.random.normal(size=(3,3))
In [46]: a
Out[46]:
array([[ 0.64552723, -0.4329958 , -1.84342512],
       [ 0.83197804, -0.03053034,  0.22560254],
       [ 0.61356459, -1.60778048, -1.51859134]])
In [47]: a.astype('|S8')
Out[47]:
array([['0.645527', '-0.43299', '-1.84342'],
       ['0.831978', '-0.03053', '0.225602'],
       ['0.613564', '-1.60778', '-1.51859']],
      dtype='|S8')
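Alternatively, if the goal is simply to count repeated rows or value combinations, note that Counter only needs hashable keys, not strings, so one possible sketch is to map each row to a tuple (the example matrix here is made up):

from collections import Counter
import numpy as np

xx = np.array([[1.2, 3.4], [5.4, 6.7], [1.2, 3.4]])
row_counts = Counter(map(tuple, xx))  # each row becomes a hashable tuple key
print(row_counts[(1.2, 3.4)])         # 2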
