Say I have two objects X, Y of shapes (k, 1, n) and (k, m, n). I know that numpy will automatically extend/repeat X along its length-1 dimension when I do operations such as X + Y. Does this magic work for all mathematical operations that are supported/included in numpy?
For example, can I do scipy.special.binom(X,Y) and get the expected result? I have tried some of the special functions, and I don't receive an error. Does not receiving an error allow me to conclude that the broadcasting was done correctly?
numpy does apply broadcasting for all operators, e.g. *, +, -, /, etc. It also applies it, where possible, to ufuncs; that's part of the ufunc definition.
scipy.special.binom is, according to its docs, a ufunc. It is compiled, so I can't look at the code to verify this, but I can do a few simple tests:
In [379]: special.binom([1,2,3],[[1],[2]])
Out[379]:
array([[ 1., 2., 3.],
[ 0., 1., 3.]])
In [380]: special.binom([1,2,3,4],[[1],[2]])
Out[380]:
array([[ 1., 2., 3., 4.],
[ 0., 1., 3., 6.]])
In [385]: special.binom(np.arange(6).reshape(3,2,1),np.arange(6).reshape(3,1,2)).shape
Out[385]: (3, 2, 2)
The (2,3) and (2,4) output shapes match the broadcasted inputs. That's consistent with broadcasting.
np.dot is an example of a numpy function where broadcasting does not apply.
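For contrast, a minimal sketch (with shapes invented for illustration) of a ufunc broadcasting where np.dot raises:
import numpy as np

X = np.ones((3, 1, 4))
Y = np.ones((3, 2, 4))

# ufuncs broadcast the length-1 dimension: (3,1,4) + (3,2,4) -> (3,2,4)
print(np.add(X, Y).shape)  # (3, 2, 4)

# np.dot applies its own contraction rule instead of broadcasting,
# so these shapes are rejected outright
try:
    np.dot(X, Y)
except ValueError as e:
    print(e)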
numpy.vectorize takes a function f:a->b and turns it into g:a[]->b[].
This works fine when a and b are scalars, but I can't think of a reason why it wouldn't work with b as an ndarray or list, i.e. f:a->b[] and g:a[]->b[][]
For example:
import numpy as np
def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)
g = np.vectorize(f, otypes=[np.ndarray])
a = np.arange(4)
print(g(a))
This yields:
array([[ 0. 0. 0. 0. 0.],
[ 1. 1. 1. 1. 1.],
[ 2. 2. 2. 2. 2.],
[ 3. 3. 3. 3. 3.]], dtype=object)
Ok, so that gives the right values, but the wrong dtype. And even worse:
g(a).shape
yields:
(4,)
So this array is pretty much useless. I know I can convert it by doing:
np.array(list(map(list, g(a))), dtype=np.float32)
to give me what I want:
array([[ 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2.],
[ 3., 3., 3., 3., 3.]], dtype=float32)
but that is neither efficient nor pythonic. Can any of you guys find a cleaner way to do this?
np.vectorize is just a convenience function. It doesn't actually make code run any faster. If it isn't convenient to use np.vectorize, simply write your own function that works as you wish.
The purpose of np.vectorize is to transform functions which are not numpy-aware (e.g. take floats as input and return floats as output) into functions that can operate on (and return) numpy arrays.
Your function f is already numpy-aware -- it uses a numpy array in its definition and returns a numpy array. So np.vectorize is not a good fit for your use case.
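For contrast, here is a minimal sketch of the kind of function np.vectorize is meant for (math.erf accepts only Python scalars, so the wrapper is what lets it map over an array):
import math
import numpy as np

# math.erf is not numpy-aware; vectorize loops it over the input array
erf = np.vectorize(math.erf)
print(erf(np.linspace(0.0, 1.0, 5)))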
The solution therefore is just to roll your own function f that works the way you desire.
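For example, a minimal hand-rolled version built on broadcasting (assuming the length-5 constant vector from the question):
import numpy as np

def f(x):
    x = np.asarray(x)
    # a trailing axis makes broadcasting append the length-5 dimension
    return x[..., np.newaxis] * np.ones(5, dtype=np.float32)

print(f(np.arange(4)).shape)  # (4, 5)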
A new parameter, signature, added in numpy 1.12.0, does exactly what you want.
def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)
g = np.vectorize(f, signature='()->(n)')
Then g(np.arange(4)).shape will give (4L, 5L).
Here the signature of f is specified. The (n) is the shape of the return value, and () is the shape of the parameter, which is scalar. The parameters can be arrays too. For more complex signatures, see the Generalized Universal Function API.
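As a sketch of an array-valued parameter (the rms function here is made up for illustration):
import numpy as np

def rms(vec):
    return np.sqrt(np.mean(vec ** 2))

# '(n)->()' maps each length-n row to a scalar
g = np.vectorize(rms, signature='(n)->()')
print(g(np.arange(12.0).reshape(3, 4)).shape)  # (3,)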
import numpy as np

def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)

g = np.vectorize(f, otypes=[np.ndarray])
a = np.arange(4)
b = g(a)
b = np.array(b.tolist())
print(b)  # b.shape = (4, 5)

c = np.ones((2, 3, 4))
d = g(c)
d = np.array(d.tolist())
print(d)  # d.shape = (2, 3, 4, 5)
This should fix the problem, and it will work regardless of the size of your input. "map" only works for one-dimensional inputs. Using .tolist() and creating a new ndarray solves the problem more completely and nicely (I believe). Hope this helps.
You want to vectorize the function
import numpy as np
def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)
Assuming that you want to get single np.float32 arrays as the result, you have to specify this as the otype. In your question, however, you specified otypes=[np.ndarray], which means you want every element to be an np.ndarray. Thus, you correctly get a result of dtype=object.
The correct call would be
np.vectorize(f, signature='()->(n)', otypes=[np.float32])
For such a simple function, however, it is better to leverage numpy's ufuncs; np.vectorize just loops over the input. So in your case, just rewrite your function as
def f(x):
    return np.multiply.outer(x, np.array([1,1,1,1,1], dtype=np.float32))
This is faster and produces less obscure errors. (Note, however, that the result's dtype depends on x: if you pass a complex or quad-precision number, the result will be complex or quad precision as well.)
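A quick check of the shapes (inputs assumed from the question):
print(f(np.arange(4)).shape)     # (4, 5)
print(f(np.ones((2, 3))).shape)  # (2, 3, 5) -- outer handles array input too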
I've written a function that seems to fit your need.
def amap(func, *args):
    '''array version of built-in map

    amap(function, sequence[, sequence, ...]) -> array

    Examples
    --------
    >>> amap(lambda x: x**2, 1)
    array(1)
    >>> amap(lambda x: x**2, [1, 2])
    array([1, 4])
    >>> amap(lambda x,y: y**2 + x**2, 1, [1, 2])
    array([2, 5])
    >>> amap(lambda x: (x, x), 1)
    array([1, 1])
    >>> amap(lambda x,y: [x**2, y**2], [1,2], [3,4])
    array([[1, 9], [4, 16]])
    '''
    args = np.broadcast(None, *args)
    res = np.array([func(*arg[1:]) for arg in args])
    shape = args.shape + res.shape[1:]
    return res.reshape(shape)
Let's try
def f(x):
    return x * np.array([1,1,1,1,1], dtype=np.float32)

amap(f, np.arange(4))
Outputs
array([[ 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2.],
[ 3., 3., 3., 3., 3.]], dtype=float32)
You may also wrap it with lambda or partial for convenience
g = lambda x:amap(f, x)
g(np.arange(4))
Note the docstring of vectorize says
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.
Thus we would expect amap here to have performance similar to vectorize. I didn't check it; any performance tests are welcome.
If performance is really important, you should consider something else, e.g. direct array calculation with reshape and broadcasting, to avoid looping in pure Python (both vectorize and amap are the latter case).
The best way to solve this would be to use a 2-D NumPy array (in this case a column array) as an input to the original function, which will then generate a 2-D output with the results I believe you were expecting.
Here is what it might look like in code:
import numpy as np
def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)
a = np.arange(4).reshape((4, 1))
b = f(a)
# b is a 2-D array with shape (4, 5)
print(b)
This is a much simpler and less error-prone way to complete the operation. Rather than trying to transform the function with numpy.vectorize, this method relies on NumPy's natural ability to broadcast arrays. The trick is to line the dimensions up for broadcasting: each pair of dimensions must either match or have length 1, so the (4, 1) input broadcasts against the length-5 constant vector to give (4, 5).
I'd like to assign multiple values to a tensor, but it seems that it's not supported, at least not in the way that is possible using numpy.
a = np.zeros((4, 4))
v = np.array([0, 2, 3, 1])
r = np.arange(4)
a[r, v] = 1
>>> a
array([[1., 0., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.],
[0., 1., 0., 0.]])
The above works, but the tensorflow equivalent doesn't:
import tensorflow as tf
a = tf.zeros((4, 4))
v = tf.Variable([0, 2, 3, 1])
r = tf.range(4)
a[r, v].assign(1)
TypeError: Only integers, slices, ellipsis, tf.newaxis and scalar tensors are valid indices, got <tf.Tensor: shape=(4,), dtype=int32, numpy=array([0, 1, 2, 3])>
How could this be achieved? Are loops the only option? In my case the resulting array is indeed only slices of an identity matrix rearranged, so maybe that could be taken advantage of somehow.
Your example, which updates a zero tensor at some indices to a certain value, is most of the time achieved through tf.scatter_nd:
idx = tf.stack([r,v],axis=-1)
tf.scatter_nd(idx, updates=tf.ones(4), shape=(4,4))
For more complex cases, you can look at the following functions (a minimal sketch of the update variant follows the list):
tf.tensor_scatter_nd_add: Adds sparse updates to an existing tensor according to indices.
tf.tensor_scatter_nd_sub: Subtracts sparse updates from an existing tensor according to indices.
tf.tensor_scatter_nd_max: Copies element-wise maximum values from one tensor to another, according to indices.
tf.tensor_scatter_nd_min: Copies element-wise minimum values from one tensor to another, according to indices.
tf.tensor_scatter_nd_update: Scatter updates into an existing tensor according to indices.
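For instance, here is a minimal sketch of tf.tensor_scatter_nd_update, reusing idx from above; and since the question notes the result is just rearranged rows of an identity matrix, tf.one_hot gives the same matrix directly:
import tensorflow as tf

r = tf.range(4)
v = tf.constant([0, 2, 3, 1])
idx = tf.stack([r, v], axis=-1)

# out-of-place update of an existing tensor at the given indices
base = tf.zeros((4, 4))
updated = tf.tensor_scatter_nd_update(base, idx, tf.ones(4))

# equivalent here, since each row is a one-hot vector
same = tf.one_hot(v, depth=4)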
You can read more in the guide: Introduction to tensor slicing
I'm trying to use some sklearn estimators for classification on the coefficients of a fast Fourier transform (technically a discrete Fourier transform). I obtain a numpy array X_c as the output of np.fft.fft(X), and I want to transform it into a real numpy array X_r, with each (complex) column of the original X_c turned into two (real/float) columns in X_r, i.e. the shape goes from (r, c) to (r, 2c). So I use .view(np.float64), and it works at first.
The problem is that if I first decide to keep only some coefficients of the original complex array with X_c2 = X_c[:, range(3)] and then do the same thing as before, instead of having the number of columns doubled, I get the number of rows doubled (the imaginary part of each element is put in a new row below the original).
I really don't understand why this happens.
To make myself clearer, here is a toy example:
import numpy as np
# I create a complex array
X_c = np.arange(8, dtype = np.complex128).reshape(2, 4)
print(X_c.shape) # -> (2, 4)
# I use .view to transform it into something real and it works
# the way I want it.
X_r = X_c.view(np.float64)
print(X_r.shape) # -> (2, 8)
# Now I subset the array.
indices_coef = range(3)
X_c2 = X_c[:, indices_coef]
print(X_c2.shape) # -> (2, 3)
X_r2 = X_c2.view(np.float64)
# In the next line I obtain (4, 3), when I was expecting (2, 6)...
print(X_r2.shape) # -> (4, 3)
Does anyone see a reason for this difference of behavior?
I get a warning:
In [5]: X_c2 = X_c[:,range(3)]
In [6]: X_c2
Out[6]:
array([[ 0.+0.j, 1.+0.j, 2.+0.j],
[ 4.+0.j, 5.+0.j, 6.+0.j]])
In [7]: X_c2.view(np.float64)
/usr/local/bin/ipython3:1: DeprecationWarning: Changing the shape of non-C contiguous array by
descriptor assignment is deprecated. To maintain
the Fortran contiguity of a multidimensional Fortran
array, use 'a.T.view(...).T' instead
#!/usr/bin/python3
Out[7]:
array([[ 0., 1., 2.],
[ 0., 0., 0.],
[ 4., 5., 6.],
[ 0., 0., 0.]])
In [12]: X_c2.strides
Out[12]: (16, 32)
In [13]: X_c2.flags
Out[13]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
So this copy (or is it a view?) is in Fortran order. The recommended X_c2.T.view(float).T produces the same 4x3 array, without the warning.
As your first view shows, a complex array has the same data layout as twice the number of floats.
I've seen funny shape behavior when trying to view a structured array. I'm wondering whether the complex dtype is behaving much like a dtype('f8,f8') array.
If I change your X_c2 so it is a copy, I get the expected behavior
In [19]: X_c3 = X_c[:,range(3)].copy()
In [20]: X_c3.flags
Out[20]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In [21]: X_c3.strides
Out[21]: (48, 16)
In [22]: X_c3.view(float)
Out[22]:
array([[ 0., 0., 1., 0., 2., 0.],
[ 4., 0., 5., 0., 6., 0.]])
That's reassuring. But I'm puzzled as to why the [:, range(3)] indexing creates an F-order view. That should be advanced indexing.
And indeed, a true slice does not allow this view:
In [28]: X_c[:,:3].view(np.float64)
---------------------------------------------------------------------------
ValueError: new type not compatible with array.
So the range indexing has created some sort of hybrid object.
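As a practical workaround (my suggestion, assuming a copy is acceptable): force the subset to C order before viewing, and the expected (2, 6) shape comes back:
X_r2 = np.ascontiguousarray(X_c2).view(np.float64)
print(X_r2.shape)  # (2, 6)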
I'm trying to store a list of differently shaped arrays as a dtype=object array using np.save (I'm aware I could just pickle the list, but I'm really curious how to do this).
If I do this:
import numpy as np
np.save('test.npy', [np.zeros((2, 2)), np.zeros((3,3))])
it works.
But this:
np.save('test.npy', [np.zeros((2, 2)), np.zeros((2,3))])
Gives me an error:
ValueError: could not broadcast input array from shape (2,2) into shape (2)
I guess np.save converts the list into an array first, so I tried:
x=np.array([np.zeros((2, 2)), np.zeros((3,3))])
y=np.array([np.zeros((2, 2)), np.zeros((2,3))])
Which has the same effect (the first one works, the second one doesn't).
The resulting x behaves as expected:
>>> x.shape
(2,)
>>> x.dtype
dtype('O')
>>> x[0].shape
(2, 2)
>>> x[0].dtype
dtype('float64')
I also tried to force the 'object' dtype:
np.array([np.zeros((2, 2)), np.zeros((2,3))], dtype=object)
Without success. It seems numpy tries to broadcast the arrays with an equal first dimension into the new array and realizes too late that their shapes are different. Oddly, it seems to have worked at one point - so I'm really curious what the difference is, and how to do this properly.
EDIT:
I figured out the case it worked before: The only difference seems to be that the numpy arrays in the list have another data type.
It works with dtype('<f8'), but it doesn't with dtype('float64'); I'm not even sure what the difference is.
EDIT 2:
I found a very non-pythonic way to solve my issue, I add it here, maybe it helps to understand what I wanted to do:
# a plain list here; np.array(...) on these two arrays raises the ValueError above
array_list = [np.zeros((2, 2)), np.zeros((2, 3))]
save_array = np.empty((len(array_list),), dtype=object)
for idx, arr in enumerate(array_list):
    save_array[idx] = arr
np.save('test.npy', save_array)
One of the first things that np.save does is
arr = np.asanyarray(arr)
So yes it is trying to turn your list into an array.
Constructing an object array from arbitrarily sized arrays or lists is tricky. np.array(...) tries to create as high-dimensional an array as it can, even attempting to concatenate the inputs if possible. The surest way is to do what you did - make the empty array and fill it.
A slightly more compact way of constructing the object array:
In [21]: alist = [np.zeros((2, 2)), np.zeros((2,3))]
In [22]: arr = np.empty(len(alist), dtype=object)
In [23]: arr[:] = alist
In [24]: arr
Out[24]:
array([array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0., 0.],
[ 0., 0., 0.]])], dtype=object)
Here are 3 scenarios:
Arrays that match in shape combine into a 3d array:
In [27]: np.array([np.zeros((2, 2)), np.zeros((2,2))])
Out[27]:
array([[[ 0., 0.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 0.]]])
In [28]: _.shape
Out[28]: (2, 2, 2)
Arrays that don't match on the first dimension create an object array:
In [29]: np.array([np.zeros((2, 2)), np.zeros((3,2))])
Out[29]:
array([array([[ 0., 0.],
[ 0., 0.]]),
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])], dtype=object)
In [30]: _.shape
Out[30]: (2,)
And an awkward intermediate case (which may even be described as a bug): the first dimensions match, but the second ones don't:
In [31]: np.array([np.zeros((2, 2)), np.zeros((2,3))])
...
ValueError: could not broadcast input array from shape (2,2) into shape (2)
It's as though it initialized a (2,2,2) array, and then found that the (2,3) wouldn't fit. The current logic doesn't allow it to back up and create the object array as it did in the previous scenario.
If you wanted to put the two (2,2) arrays in an object array, you'd have to use the create-and-fill logic.
I have a numpy array (e.g., a = np.array([8., 2.])), and another array which stores the indices I would like to get from the former array (e.g., b = np.array([0., 1., 1., 0., 0.])).
What I would like to do is to create another array from these 2 arrays, in this case, it should be: array([ 8., 2., 2., 8., 8.])
Of course, I can always use a for loop to achieve this goal:
c = np.zeros(5)
for i in range(5):
    c[i] = a[int(b[i])]  # int() because b holds floats
I wonder if there is a more elegant method to create this array. Something like c = a[b[0:5]] (well, this apparently doesn't work)
Only integer arrays can be used for indexing, and you've created b as a float64 array. You can get what you're looking for if you explicitly convert to integer:
bi = np.array(b, dtype=int)
c = a[bi[0:5]]
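Equivalently, b.astype(int) does the conversion in one step (though it's better still to create b with an integer dtype in the first place):
c = a[b.astype(int)]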