String arrays in float arrays without format change - python

Is there a way to include strings in an array of floats without every float being converted to a string, i.e. so the floats stay floats while the string element is still kept as a string?
eg.
import numpy as np
a = np.array([ 'hi' , 1. , 2. , 3. ])
This gives:
array(['hi', '1.0', '2.0', '3.0'], dtype='|S3')
Ideally I would like the format to remain the same as how it looks when input as a above.
And then how would one save such an array as a text file?
Many thanks,
J

I'm guessing your problem is this: you want to dump out the array np.array([ 'hi' , 1. , 2. , 3. ]) using np.savetxt() but are getting this error:
TypeError: Mismatch between array dtype ('|S3') and format specifier ('%.18e')
If this is the case, you just need to set the fmt kwarg in np.savetxt. Instead of the default %.18e, which is for formatting floating point data, you can use %s, which formats things as a string, even if the original value in the array was numerical.
So this will work:
import numpy as np
a = np.array([ 'hi' , 1. , 2. , 3. ])
np.savetxt("test.out",a,fmt="%s")
Note that you can just do this with the original list - numpy will convert it to an array for you. So for example you can do:
np.savetxt("test.out",[ 'hi' , 1. , 2. , 3. ],fmt="%s")
and it should work fine too.
For the first part of the question, this is not really what numpy arrays are intended for. If you are trying to put different data types into the same array, then you probably want a different data structure. A vanilla python list would do it, but depending on your situation, a dict is probably what you're looking for.
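For instance, a minimal sketch (the names here are illustrative, not from the question) that keeps the string label separate from the numeric data:

```python
import numpy as np

# Keep the string label and the numeric data as separate dict values,
# so the floats stay floats and the string stays a string.
record = {"label": "hi", "values": np.array([1.0, 2.0, 3.0])}

print(record["label"])         # the string, untouched
print(record["values"].dtype)  # float64 -- no conversion to strings
```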
Edit: Based on the comment threads & the specific question, it looks like this is an attempt to make a header on a data file. This can be done directly through
np.savetxt("a.txt",a,header="title goes here")
This can be read directly with np.loadtxt() because by default the header is prepended with #, and by default np.loadtxt() ignores lines that start with #.
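A quick round-trip sketch of that (the file name is arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
# The header line is written as "# title goes here" by default.
np.savetxt("a.txt", a, header="title goes here")

# np.loadtxt skips lines starting with '#' by default, so the
# header is ignored and only the numeric data comes back.
b = np.loadtxt("a.txt")
print(b)   # [1. 2. 3.]
```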

Use pickle:
import pickle
a = ['abc', 3, 4, 5, 6, 7.0]
with open("save.p", "wb") as f:
    pickle.dump(a, f)
with open("save.p", "rb") as f:
    b = pickle.load(f)
print(b)
Output:
['abc', 3, 4, 5, 6, 7.0]
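An alternative, if you prefer to stay within numpy: an object-dtype array can hold mixed types, and np.save/np.load will pickle it under the hood (note that recent numpy versions require allow_pickle=True on load):

```python
import numpy as np

# dtype=object keeps each element's original Python type.
a = np.array(['abc', 3, 4, 5, 6, 7.0], dtype=object)
np.save("save.npy", a)                      # pickles the object array internally
b = np.load("save.npy", allow_pickle=True)  # must opt in to unpickling
print(b.tolist())   # ['abc', 3, 4, 5, 6, 7.0]
```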

Related

How to round a specific selection of numbers in numpy array?

From the random numpy array, I want to round only the numbers at the indices listed in padInputs. The following code is something that I am trying but doesn't work. What would be a workaround?
padInputs = [0, 2, 7, 8]
random = np.random.rand(13)
for padInput in padInputs:
    np.around(random[padInput])
For example,
Input
[0.87720789, 0.88194004, 0.06039337, 0.13874861, 0.85552875]
Output
[0.87720789, 1, 0, 0.13874861, 0.85552875]
Try this way:
random[padInputs] = np.around(random[padInputs])
Note that this will round without decimals, you can pass it as an argument to round in the following way:
random[padInputs] = np.around(random[padInputs], decimals=2)
The problem in your code is that you have to assign the result back to the array, since np.around does not modify the array in place. Like:
for padInput in padInputs:
    random[padInput] = np.around(random[padInput])
random
array([1. , 0.53206402, 1. , 0.18129529, 0.71238687,
0.92995779, 0.21934659, 0. , 1. , 0.26042076,
0.76826639, 0.82750894, 0.35687544])
but it can be replaced by a single line, as #Bruno shows in his answer.
The following one-line piece of code can replace your for loop and does exactly what you want
np.put(random, padInputs, np.around(random[padInputs]))
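A small, self-contained demonstration of in-place rounding at selected indices (the values here are illustrative, not the random data from the question):

```python
import numpy as np

padInputs = [0, 2]
random = np.array([0.877, 0.881, 0.060, 0.138])

# Fancy-index assignment rounds only the selected positions in place;
# the other positions are left untouched.
random[padInputs] = np.around(random[padInputs])
print(random)   # [1.    0.881 0.    0.138]
```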

Multiplying ndarray with scalar: TypeError: invalid type promotion

I'm trying to multiply every column in an ndarray by a scalar. When I try to do this, I get the error TypeError: invalid type promotion.
I've tried using array.astype(float), but this gives all NaNs.
array = np.genfromtxt("file.csv", dtype=float, delimiter='\t', names=True)
newarray = array*4.0
file.csv has a number of column headers. For example:
array['col_a'] = [5.0, 6.0]
After multiplying by the scalar, I want:
newarray['col_a'] to be [20.0, 24.0]
I'm honestly amazed that this has never come up in my own code, but it turns out that Numpy structured arrays (i.e. arrays with field names) don't support the standard arithmetic operators +, -, *, or / (see footnote *).
Thus, your only option is to work with a non-structured version of your array. #hpaulj's comment points out the ways you can do so (this old answer contains a thorough exploration of exactly how you can get addition to work with structured arrays). Either index a single field (the result of which behaves like a standard array) and multiply that:
import numpy as np
from io import StringIO
csv = '''col_a\tcol_b\tcol_c
5.0\t19.6\t22.8
6.0\t42.42\t39.208
'''
arr = np.genfromtxt(StringIO(csv), dtype=np.float64, delimiter='\t', names=True)
xcol_a = arr['col_a']*4
print(xcol_a)
# output: [20. 24.]
or omit the names=True kwarg when you generate your array (which makes np.genfromtxt return a standard array instead of a structured one):
arrstd = np.genfromtxt(StringIO(csv), dtype=np.float64, delimiter='\t', skip_header=True)
print(arrstd*4)
# output: [[ 20. 78.4 91.2 ]
# [ 24. 169.68 156.832]]
*: Technically, it appears that many of Numpy's built-in ufuncs are not supported when working with structured arrays, though at least some of the comparison operators (<, >, and ==) are.
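A third option, assuming numpy >= 1.16 (where numpy.lib.recfunctions gained this helper): convert the structured array to a plain 2-D array and scale that:

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# A small structured array standing in for the genfromtxt result.
arr = np.array([(5.0, 19.6), (6.0, 42.42)],
               dtype=[('col_a', 'f8'), ('col_b', 'f8')])

# View the homogeneous structured data as a plain (2, 2) float array,
# which supports ordinary arithmetic.
plain = structured_to_unstructured(arr)
print(plain * 4.0)   # [[ 20.    78.4 ] [ 24.   169.68]]
```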

Python: Converting string to floats, reading floats into 2D array, if/then, reordering of rows?

Let me start by saying that I know nothing about Python, but I am trying to learn (mostly through struggling, it seems). I've looked around this site and tried to cobble together code to do what I need, but I keep running into problems. First, I need to convert a file of 2 columns and 512 rows of strings to floats and put them in a 512x2 array. Then I check the first column (all rows) for negative values; if a value is negative, I add 512 to it. Finally I need to reorder the rows in numerical order and write/save the new array.
On to my first problem, converting to floats and putting the floats into an array. I have this code, which I made from others' questions:
with open("binfixtest.composite") as f:
    f_values = map(lambda l: l.strip().split(' '), f)
    print f_values
    newarray = [map(float, v) for v in f_values]
Original format of file:
-91. 0.444253325
-90. 0.883581936
-89. -0.0912338793
New format of f_values:
[['-91. 0.444253325'], ['-90. 0.883581936'], ['-89. -0.0912338793']]
I'm getting the error:
Traceback (most recent call last):
File "./binfix.py", line 10, in <module>
newarray = [map(float, v) for v in f_values]
ValueError: invalid literal for float(): -91. 0.444253325
which I can't seem to fix. If I don't convert to float, when I try to add 512.0 to negative rows it gives me the error TypeError: cannot concatenate 'str' and 'float' objects
Any help is most definitely appreciated as I am completely clueless here.
If you anticipate having to do tasks like this now and then, I have some suggestions.
Something that will make your life a lot easier is to start learning to use numpy arrays instead of trying to use your own arrays (made up of lists of lists).
For this problem, you can use numpy like this:
>>> import numpy as np
>>> data = np.loadtxt('binfixtest.composite')
>>> data
array([[-91. , 0.44425332],
[-90. , 0.88358194],
[-89. , -0.09123388]])
That's it. Done. Your data is now in a numpy array full of floats.
This works because by default, the numpy.loadtxt method reads line breaks as row delimiters, whitespace (including spaces and tabs) as column delimiters, and numbers as floats. There are also a lot of other options for customizing how numpy reads your file if you need them.
Viewing your numpy array
To access row zero, do this:
>>> data[0]
array([-91. , 0.44425332])
To access a value at address 0,0, do this:
>>> data[0,0]
-91.0
To access column zero, do this (the first colon means "all of the rows"):
>>> data[:,0]
array([-91., -90., -89.])
To access a row/column range, do this:
>>> data[1:, :2]
array([[-90. , 0.88358194],
[-89. , -0.09123388]])
The above means "all of the rows starting at position 1, and all of the columns up to but not including position 2". You can also do things like 1:3, which selects a total of two rows or columns (3-1=2) starting at position 1.
Changing your numpy array
To change just a single value, do this:
>>> data[0,0] = 1
>>> data[0,0]
1.0
Note that the value we changed at 0,0 has been stored as a float, even though we assigned an int. This is because a numpy array has ONE data type, and anything you put into the array will be converted to that data type if possible:
>>> data.dtype
dtype('float64')
If you want to add 512 to a value at a specific address in the array (starting again from the original data, where data[0,0] is -91.0), you can do this:
>>> data[0,0] = data[0,0] + 512
>>> data[0,0]
421.0
If you want to add 512 to the entire first column (again starting from the original data), you can do this:
>>> data[:,0] = data[:,0] + 512
>>> data
array([[ 4.21000000e+02, 4.44253325e-01],
[ 4.22000000e+02, 8.83581936e-01],
[ 4.23000000e+02, -9.12338793e-02]])
Useful manipulations for your numpy array
If you want to do a comparison on an array (or a part of one), do that like this (it will return a new array):
>>> data<0
array([[ True, False],
[ True, False],
[ True, True]], dtype=bool)
One way to get only the values in the array that are less than zero is the following (there are other ways):
>>> data*(data<0)
array([[-91. , 0. ],
[-90. , 0. ],
[-89. , -0.09123388]])
This works because in numpy, True values act like 1, and False values act like 0.
And finally, if you want to add 512 to the entire first column only if the value is negative, you can put all of those together and do this:
>>> data[:,0] = data[:,0] + 512*(data[:,0]<0)
>>> data
array([[ 4.21000000e+02, 4.44253325e-01],
[ 4.22000000e+02, 8.83581936e-01],
[ 4.23000000e+02, -9.12338793e-02]])
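An equivalent, arguably clearer way to express "add 512 only where negative" uses np.where(condition, value_if_true, value_if_false):

```python
import numpy as np

data = np.array([[-91.0, 0.444253325],
                 [-90.0, 0.883581936],
                 [-89.0, -0.0912338793]])

# Add 512 to column 0 only where that column is negative;
# non-negative entries are left as they are.
data[:, 0] = np.where(data[:, 0] < 0, data[:, 0] + 512, data[:, 0])
print(data[:, 0])   # [421. 422. 423.]
```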
Save your array to a new file
If you wish to save the array to a new file, you can use the numpy.savetxt method:
>>> np.savetxt('output.txt', data, fmt = '%.8f', delimiter = ' ', newline = '\n')
The fmt = '%.8f' argument specifies how the float values should be printed (in this case, it will print with 8 decimal places). Consult this part of the docs for more information.
First Part:
#njzk2 is exactly right. Simply removing the literal space argument, i.e. changing l.strip().split(' ') to l.strip().split(), will correct the error, and you will see the following output for f_values:
[['-91.', '0.444253325'], ['-90.', '0.883581936'], ['-89.', '-0.0912338793']]
And the output for newarray shows float values rather than strings:
[[-91.0, 0.444253325], [-90.0, 0.883581936], [-89.0, -0.0912338793]]
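For reference, here is a Python 3 version of the corrected snippet (the original question uses Python 2's print statement; StringIO stands in for the file here):

```python
from io import StringIO

# Stand-in for open("binfixtest.composite"); real code would open the file.
f = StringIO("-91. 0.444253325\n-90. 0.883581936\n-89. -0.0912338793\n")

# split() with no argument splits on any run of whitespace.
f_values = [line.strip().split() for line in f]
newarray = [[float(v) for v in row] for row in f_values]
print(newarray)
```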
Second Part:
For the second part of the question "if negative, add 512", a simple loop would be clear and simple, and I'm a big believer in clear, readable code.
For example the following is simple and straightforward:
for items in newarray:
if items[0] < 0:
items[0] += 512.00
When we print newarray after the loop, we see the following:
[[421.0, 0.444253325], [422.0, 0.883581936], [423.0, -0.0912338793]]

Python eval function with numpy arrays via string input with dictionaries

I am implementing code in Python with the variables stored in numpy vectors. I need to perform simple operations like (vec1+vec2^2)/vec3, applied elementwise (the analog of MATLAB's elementwise .* operation).
In my code, all of the vectors are stored in a dictionary:
var = {'a':np.array([1,2,2]),'b':np.array([2,1,3]),'c':np.array([3])}
The 3rd vector is just 1 number which means that I want to multiply this number by each element in other arrays like 3*[1,2,3]. And at the same time I have formula which is provided as a string:
formula = '2*a*(b/c)**2'
I am replacing the formula using Regexp:
formula_for_dict_variables = re.sub(r'([A-Za-z][A-Za-z0-9]*)', r'%(\1)s', formula)
which produces result:
2*%(a)s*(%(b)s/%(c)s)**2
and substitute the dictionary variables:
eval(formula%var)
In the case where I have just pure numbers (not numpy arrays) everything works, but when I put numpy arrays in the dict I receive an error.
Could you give an example how can I solve this problem or maybe suggest some different approach. Given that vectors are stored in dictionary and formula is a string input.
I also can store variables in any other container. The problem is that I don't know the name of variables and formula before the execution of code (they are provided by user).
Also I think iterating through each element of the vectors will probably be slow, given that Python for loops are slow.
Using numexpr, you could do this:
In [143]: import numexpr as ne
In [146]: ne.evaluate('2*a*(b/c)**2', local_dict=var)
Out[146]: array([ 0.88888889, 0.44444444, 4. ])
Pass the dictionary to python eval function:
>>> var = {'a':np.array([1,2,2]),'b':np.array([2,1,3]),'c':np.array([3])}
>>> formula = '2*a*(b/c)**2'
>>> eval(formula, var)
array([ 0.8889, 0.4444, 4. ])
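One caveat worth adding: eval on user-supplied strings can execute arbitrary code. A common mitigation (which reduces, but does not eliminate, the risk) is to pass an empty __builtins__ so that only the names in your dict are visible inside the formula:

```python
import numpy as np

var = {'a': np.array([1, 2, 2]), 'b': np.array([2, 1, 3]), 'c': np.array([3])}
formula = '2*a*(b/c)**2'

# An empty __builtins__ blocks names like open or __import__ in the formula;
# only the variables in var can be referenced.
result = eval(formula, {'__builtins__': {}}, var)
print(result)
```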

convert binary string to numpy array

Assume I have the string:
my_data = '\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@'
Where I got it is irrelevant, but for the sake of having something concrete, assume I read it from a binary file.
I know my string is the binary representation of 4 (4-byte) floats. I would like to get those floats as a numpy array. I could do:
import struct
import numpy as np
tple = struct.unpack( '4f', my_data )
my_array = np.array( tple, dtype=np.float32 )
But it seems silly to create an intermediate tuple. Is there a way to do this operation without creating an intermediate tuple?
EDIT
I would also like to be able to construct the array in such a way that I can specify the endianness of the string.
>>> np.frombuffer(b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@', dtype='<f4') # or dtype=np.dtype('<f4'), or np.float32 on a little-endian system (which most computers are these days)
array([ 1., 2., 3., 4.], dtype=float32)
Or, if you want big-endian:
>>> np.frombuffer(b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@', dtype='>f4') # or dtype=np.dtype('>f4'), or np.float32 on a big-endian system
array([ 4.60060299e-41, 8.96831017e-44, 2.30485571e-41,
4.60074312e-41], dtype=float32)
The b isn't necessary prior to Python 3, of course.
In fact, if you actually are using a binary file to load the data from, you could even skip the using-a-string step and load the data directly from the file with numpy.fromfile().
Also, dtype reference, just in case: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
np.fromstring() is deprecated. Use np.frombuffer() instead.
import numpy as np
my_data = b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@'
# np.fromstring is deprecated
# data = np.fromstring(my_data, np.float32)
data = np.frombuffer(my_data, np.float32)
print(data)
[1. 2. 3. 4.]
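A round-trip sketch showing how the byte-order character in the dtype string pairs up on both sides (serializing and parsing must agree on endianness):

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)

# Serialize as big-endian 4-byte floats, then parse with the matching dtype.
raw = values.astype('>f4').tobytes()
back = np.frombuffer(raw, dtype='>f4')
print(back)   # [1. 2. 3. 4.]
```

Reading the same bytes back with the wrong byte order ('<f4' here) would produce the garbage values shown in the big-endian example above.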
