I have a question regarding the addition of numpy arrays.
Let's assume I have defined a function
def foo(a, b):
    return a + b
that takes two arrays of the same shape and simply returns their sum.
Now I have to deal with the case where some of the entries may be None. I would like to treat those entries as if they were float(0), so that
[1.0,None,2.0] + [1.0,2.0,2.0]
would add up to
[2.0,2.0,4.0]
Can you provide me with an already-implemented solution?
TIA
I suggest numpy.nan_to_num:
>>> np.nan_to_num(np.array([1.0,None,2.0], dtype=float))
array([ 1., 0., 2.])
Then,
>>> def foo(a,b):
... return np.nan_to_num(a) + np.nan_to_num(b)
...
>>> foo(np.array([1.0,None,2.0], dtype=float), np.array([1.0,2.0,2.0], dtype=float))
array([ 2., 0., 4.])
Usually, the answer to this is to use an array of floats, rather than an array of arbitrary objects, and then use np.nan instead of None. NaN has well-defined semantics for arithmetic. (Also, using an array of floats instead of objects will make your code significantly more time and space efficient.)
Notice that you don't have to manually convert None to np.nan if you build the array with an explicit dtype of float or np.float64. Both of these are equivalent:
>>> a = np.array([1.0,np.nan,2.0])
>>> a = np.array([1.0,None,2.0],dtype=float)
Which means that if, for some reason, you really needed arrays of arbitrary objects with actual None in them, you could do that, and then convert it to an array of floats on the fly to get the benefits of NaN:
>>> a.astype(float) + b.astype(float)
At any rate, in this case, just using NaN isn't sufficient:
>>> a = np.array([1.0,np.nan,2.0])
>>> b = np.array([1.0,2.0,2.0])
>>> a + b
array([ 2., nan, 4.])
That's because the semantics of NaN dictate that the result of any arithmetic operation involving NaN is NaN.
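For instance:
>>> np.nan + 1.0
nan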
But you want to treat it as 0, which fortunately makes the problem easy to solve. The simplest way is the function nan_to_num, which by default replaces NaN with 0:
>>> np.nan_to_num(a)
array([1., 0., 2.])
>>> np.nan_to_num(a) + np.nan_to_num(b)
array([2., 2., 4.])
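If you would rather make the masking explicit, or use a different fill value per array, the same thing can be spelled out with np.isnan and np.where; a sketch with the same a and b as above:
>>> np.where(np.isnan(a), 0.0, a) + np.where(np.isnan(b), 0.0, b)
array([2., 2., 4.])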
You can use column_stack to concatenate both arrays along the second axis, then use np.nansum() to sum the items over that axis.
In [15]: a = np.array([1.0,None,2.0], dtype=float)
# Using dtype here is necessary to convert None to np.nan
In [16]: b = np.array([1.0,2.0,2.0])
In [17]: np.nansum(np.column_stack((a, b)), 1)
Out[17]: array([2., 2., 4.])
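The same pattern extends to more than two arrays; a quick sketch with a hypothetical third array c:
In [18]: c = np.array([None, 1.0, 1.0], dtype=float)
In [19]: np.nansum(np.column_stack((a, b, c)), 1)
Out[19]: array([2., 3., 5.])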
I am currently trying to implement a datatype that stores floats in a numpy array. However, trying to build an array whose elements are sequences of various lengths breaks the code: one would be assigning a sequence to an array element, which is not possible.
One can bypass this by using the data type object instead of float. Why is that? And how could one resolve this problem using floats, without creating a sequence?
Example code that does not work:
from numpy import *
foo = dtype(float32, [])
x = array([[2., 3.], [3.]], dtype=foo)
Example code that does work:
from numpy import *
foo = dtype(float32, [])
x = array([[2., 3.], [3., 2.]], dtype=foo)
Example code that does work, which I try to replicate for float:
from numpy import *
foo = dtype(object, [])
x = array([[2., 3.], [3.]], dtype=foo)
The object dtype in Numpy simply creates an array of pointers to Python objects. This means you lose the performance advantage you usually get from Numpy, but it's still sometimes useful to do this.
Your last example creates a one-dimensional Numpy array of length two, so that's two pointers to Python objects. Both of these objects happen to be lists, and Python lists have arbitrary, dynamic length.
I don't know what you were trying to achieve with this, but note that
>>> np.dtype(np.float32, []) == np.float32
True
Arrays require the same number of elements for each row. So, if you feed a list of lists into numpy and all sublists have the same number of elements, it'll happily convert it to an array. This is why your second example works.
If the sublists are not the same length, then each sublist is treated as a single object and you end up with a 1D array of objects. This is why your third example works. Your first example doesn't work because you try to cast a sequence of objects to floats, which isn't possible.
In short, you can't create an array of floats if your sublists are of different lengths. At best, you can create an array of 1D arrays, since arrays are themselves Python objects.
>>> x = np.array(list(map(np.array, [[2., 3.], [3.]])), dtype=object)  # newer numpy requires explicit dtype=object for ragged input
>>> x
array([array([ 2., 3.]), array([ 3.])], dtype=object)
>>> x[0]
array([ 2., 3.])
>>> x[0][1]
3.0
>>> # but you can't do this
>>> x[0,1]
Traceback (most recent call last):
File "<pyshell#18>", line 1, in <module>
x[0,1]
IndexError: too many indices for array
If you're bent on creating a float 2D array, you have to extend all your sublists to the same size with None, which will be converted to np.nan.
>>> lists = [[2., 3.], [3.]]
>>> max_len = max(map(len, lists))
>>> for i, sublist in enumerate(lists):
...     sublist = sublist + [None] * (max_len - len(sublist))
...     lists[i] = sublist
...
>>> np.array(lists, dtype=np.float32)
array([[ 2., 3.],
[ 3., nan]], dtype=float32)
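As a side note, the padding loop can also be written with itertools.zip_longest, which fills the shorter rows with None for you; a minimal sketch:
>>> from itertools import zip_longest
>>> lists = [[2., 3.], [3.]]
>>> padded = list(zip(*zip_longest(*lists)))  # pad the columns with None, then transpose back to rows
>>> np.array(padded, dtype=np.float32)
array([[ 2.,  3.],
       [ 3., nan]], dtype=float32)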
I have a list of np.complex128 numbers, but for all the numbers the imaginary part is equal to zero. How can I extract the real part of each number (which is pretty much the only part of the number)?
As a side note, I want to do scipy integration, but I am not sure whether its integration methods can handle y samples with dtype np.complex128.
If your list is a NumPy array, you can simply refer to its real attribute:
In [59]: a = np.array([1+0j, 2+0j, -1+0j])
In [60]: a
Out[60]: array([ 1.+0.j, 2.+0.j, -1.+0.j])
In [61]: a.real
Out[61]: array([ 1., 2., -1.])
If your list is a Python list, perhaps the following list comprehension would be the simplest way to get the real parts you want:
In [64]: l
Out[64]: [(1+0j), (2+0j), (-1+0j)]
In [67]: [c.real for c in l]
Out[67]: [1.0, 2.0, -1.0]
You would need to do this conversion if you want to integrate a function returning a np.complex128 with quad: scipy.integrate.quad expects the function to return some kind of float.
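For example, a minimal sketch (the integrand f here is made up for illustration): since quad wants a real-valued function, hand it the real part.
import numpy as np
from scipy.integrate import quad

def f(x):
    return (x + 0j) ** 2  # complex-valued, but the imaginary part is zero

# quad expects a real-valued integrand, so integrate the real part
result, error = quad(lambda x: f(x).real, 0.0, 1.0)
print(result)  # approximately 1/3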
I am still having trouble adjusting to 'more pythonic ways' of writing code sometimes ... right now I am iterating over some values (x). I have many arrays, and I always compare the first value of all the arrays, then the second value, and so on. In short: I want the mean value of the entries in the arrays, by position in the array.
mean_x = []
sum_mean_x = []
for i in range(0, int_points):
    for j in range(0, len(x)):
        mean_x.append(x[j][i])
    sum_mean_x.append(sum(mean_x) / len(x))
    mean_x = []
I am pretty sure that can be done more beautifully. I know I could change the second-to-last line to something like sum_mean_x.append(mean_x.mean), but I guess I am missing some serious magic this way.
Use the numpy package for numeric processing. Suppose you have the following three lists in plain Python:
a1 = [1., 4., 6.]
a2 = [3., 7., 3.]
a3 = [2., 0., -1.]
And you want to get the mean value for each position. Arrange the vectors in a single array:
import numpy as np
a = np.array([a1, a2, a3])
Then you can get the per-column mean like this:
>>> a.mean(axis=0)
array([ 2. , 3.66666667, 2.66666667])
It sounds like what you're trying to do is treat your list of lists as a 2D array where each list is a row, and then average each column.
The obvious way to do this is to use NumPy, make it an actual 2D array, and just call mean by columns. See simleo's answer, which is better than what I was going to add here. :)
But if you want to stick with lists of lists, going by column effectively means transposing, and that means zip:
>>> from statistics import mean
>>> arrs = [[1., 2., 3.], [0., 0., 0.], [2., 4., 6.]]
>>> column_means = [mean(col) for col in zip(*arrs)]
>>> column_means
[1.0, 2.0, 3.0]
That statistics.mean is only in the stdlib in 3.4+, but it's based on stats on PyPI, and if your Python is too old even for that, you can write it on your own. Getting the error handling right on the edge cases is tricky, so you probably want to look at the code from statistics, but if you're only dealing with values near 1, you can just do it the obvious way:
def mean(iterable):
    total, length = 0.0, 0
    for value in iterable:
        total += value
        length += 1
    return total / length
ar1 = [1,2,3,4,5,6]
ar2 = [3,5,7,2,5,7]
means = [(i + j) / 2.0 for i, j in zip(ar1, ar2)]
print(means)
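This prints [2.0, 3.5, 5.0, 3.0, 5.0, 6.5], the element-wise means of the two lists.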
You mean something like
import numpy as np
ar1 = [1,2,3,4,5,6]
ar2 = [3,5,7,2,5,7]
mean_list = []
for i, j in zip(ar1, ar2):
    mean_list.append(np.array([i, j]).mean())
print(mean_list)
[2.0, 3.5, 5.0, 3.0, 5.0, 6.5]
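For what it's worth, numpy can produce the same result without the explicit loop, since np.mean accepts a list of lists and an axis (the exact spacing of the repr varies between numpy versions):
>>> np.mean([ar1, ar2], axis=0)
array([2. , 3.5, 5. , 3. , 5. , 6.5])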
I have a rather big M×N numpy array of ints, and I want to end up with an array of M strings, each the concatenation of the N corresponding ints.
I tried using a view but this is probably not the way to go with numpy.
Hope this is what you want:
numpy.apply_along_axis(numpy.array_str, 1, array)
Look at the documentation of apply_along_axis
http://docs.scipy.org/doc/numpy/reference/generated/numpy.apply_along_axis.html
and array_str
http://docs.scipy.org/doc/numpy/reference/generated/numpy.array_str.html
for deeper understanding
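A quick sketch of what that looks like on a small array (the exact string dtype in the output may differ across numpy versions):
>>> import numpy as np
>>> X = np.arange(9).reshape(3, 3)
>>> np.apply_along_axis(np.array_str, 1, X)
array(['[0 1 2]', '[3 4 5]', '[6 7 8]'], dtype='<U7')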
If you concatenate ints into strings, the result will no longer be a numpy array of numbers; you will end up with a list of strings.
This is an example that concatenates 'a' to each entry of a numpy zeros array:
>>> np.zeros(10)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> [str(i)+'a' for i in np.zeros(10)]
['0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a']
Without an example, your request is unclear, but here's one way of understanding it:
In [13]: X=np.arange(12).reshape(3,4)
In [14]: np.array([''.join([str(i) for i in x]) for x in X])
Out[14]:
array(['0123', '4567', '891011'],
dtype='|S6')
I have a 3x4 array; I convert each element to a string using str(i), and use join to concatenate the strings of a row into one longer string.
That's not a very satisfying answer, especially when joining '9' to '10'. Of course it could be refined by elaborating on the int-to-string formatting (i.e. fixed width), maybe adding delimiters in the join, etc.:
In [21]: np.array([','.join(['{:*^8}'.format(i) for i in x]) for x in X])
Out[21]:
array(['***0****,***1****,***2****,***3****',
'***4****,***5****,***6****,***7****',
'***8****,***9****,***10***,***11***'],
dtype='|S35')
A view would only work if what you want is some sort of string to bytes representation, or str to char.
2.765334406984874427e+00
3.309563282821381680e+00
The file looks like the above: 2 rows, 1 column.
numpy.loadtxt() returns
[ 2.76533441 3.30956328]
Please don't tell me to use array.transpose() in this case; I need a real solution. Thank you in advance!
You can always use the reshape command. A single-column text file loads as a 1D array, which is neither a row nor a column vector, although numpy prints it like a row.
>>> a
array([ 2.76533441, 3.30956328])
>>> a[:,None]
array([[ 2.76533441],
[ 3.30956328]])
>>> b=np.arange(5)[:,None]
>>> b
array([[0],
[1],
[2],
[3],
[4]])
>>> np.savetxt('something.txt', b)
>>> np.loadtxt('something.txt')
array([ 0., 1., 2., 3., 4.])
>>> np.loadtxt('something.txt').reshape(-1, 1)  # another way of doing it
array([[ 0.],
[ 1.],
[ 2.],
[ 3.],
[ 4.]])
You can check this using the number of dimensions.
data = np.loadtxt('data.txt')
if data.ndim == 1: data = data[:, None]
Or
np.loadtxt('something.txt', ndmin=2)  # always gives at least a 2D array
Although it's worth pointing out that if you have a single column of data, numpy will always load it as a 1D array. This is more a feature of numpy arrays than a bug, I believe.
If you like, you can use np.matrix to read from a string. Let test.txt contain the content above. Here's a function for your needs:
import numpy as np
def my_loadtxt(filename):
    # join the lines with ';' so np.matrix parses them as matrix rows, then convert to an ndarray
    return np.array(np.matrix(open(filename).read().strip().replace('\n', ';')))

a = my_loadtxt('test.txt')
print(a)
It gives column vectors if the input is a column vector. For the row vectors, it gives row vectors.
You might want to use the csv module:
import csv
import numpy as np
reader = csv.reader(open('file.txt'))
l = list(reader)
a = np.array(l)
print(a.shape)
# (2, 1)
This way, you will get the correct array dimensions irrespective of the number of rows / columns present in the file.
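One caveat: csv.reader yields strings, so the array above has a string dtype. If you need numbers, cast it explicitly; a sketch:
import csv
import numpy as np

with open('file.txt') as f:
    rows = list(csv.reader(f))

a = np.array(rows, dtype=float)  # cast the string cells to floats
print(a.shape)  # (2, 1) for the two-line, one-column file above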
I've written a wrapper for loadtxt to do this. It is similar to the answer from @petrichor, but I think matrix can't have a string data format (probably understandably), so that method doesn't seem to work if you're loading strings (such as column headings).
def my_loadtxt(filename, skiprows=0, usecols=None, dtype=None):
    d = np.loadtxt(filename, skiprows=skiprows, usecols=usecols, dtype=dtype, unpack=True)
    if len(d.shape) == 0:
        d = d.reshape((1, 1))
    elif len(d.shape) == 1:
        d = d.reshape((d.shape[0], 1))
    return d
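Usage is then like loadtxt, but the result is always two-dimensional; for instance, with the two-row, one-column file from the question (the file name here is hypothetical):
>>> d = my_loadtxt('data.txt')
>>> d.shape
(2, 1)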