Simple case-
I have two arrays:
x1 = np.arange(1,10) and x2 = np.array([0,0,4,0,0,5,0,0,0])
I would like to merge or combine these two arrays so that the 0s in x2 are replaced with the corresponding values in x1 and the non-zero elements of x2 remain. numpy.union1d seems to do this union, but I don't want the result sorted.
Actual case-
I would then like to perform this on multi-dimensional arrays, e.g. x.shape = (xx, yy, zz). Both arrays will have the same shape (x.shape == y.shape).
Is this possible, or should I try something with masked arrays (numpy.ma)?
---------------------------Example-----------------------------
k_angle = khan(_angle)
e_angle = emss(_angle)
_angle.shape = (3647, 16)
e_angle.shape = (2394, 3647, 16)
k_angle.shape = (2394, 3647, 16)
_angle contains values from 0 to 180 degrees. If the angle is < 5, only the khan function should be used; for anything else, the emss function.
khan returns 0 for any angle larger than 5, while emss works for all values.
Attempt 1: I tried splitting the angle values, but recombining them proved tricky:
khan = bm.Khans_beam_model(freq=f, theta=None)
emss = bm.emss_beam_model(f=f)
test = np.array([[0,1,2], [3,4,5], [6,7,8], [9,10,11]])
gt_idx = test > 5
le_idx = test <= 5
# then update the array
test[gt_idx] = khan(test[gt_idx])
test[le_idx] = emss(test[le_idx])
But this raises an error: TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions
khan and emss are lambda functions.
So I thought it would be easier to evaluate khan and emss on the full array and merge the results afterwards.
I reduced the problem to the simple case above to make the question easier to follow.
The np.where(boolean_mask, value_if_true, value_otherwise) function should be sufficient as long as x1 and x2 are the same shape.
Here, you could use np.where(x2, x2, x1) where the condition is simply x2, which means that truthy values (non-zero) will be preserved and falsy values will be replaced by the corresponding values in x1. In general, any boolean mask will work as a condition, and it is better to be explicit here: np.where(x2 == 0, x1, x2).
1D
In [1]: import numpy as np
In [2]: x1 = np.arange(1, 10)
In [3]: x2 = np.array([0,0,4,0,0,5,0,0,0])
In [4]: np.where(x2 == 0, x1, x2)
Out[4]: array([1, 2, 4, 4, 5, 5, 7, 8, 9])
2D
In [5]: x1 = x1.reshape(3, 3)
In [6]: x2 = x2.reshape(3, 3)
In [7]: x1
Out[7]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [8]: x2
Out[8]:
array([[0, 0, 4],
[0, 0, 5],
[0, 0, 0]])
In [9]: np.where(x2 == 0, x1, x2)
Out[9]:
array([[1, 2, 4],
[4, 5, 5],
[7, 8, 9]])
3D
In [10]: x1 = np.random.randint(1, 9, (2, 3, 3))
In [11]: x2 = np.random.choice((0, 0, 0, 0, 0, 0, 0, 0, 99), (2, 3, 3))
In [12]: x1
Out[12]:
array([[[3, 7, 4],
[1, 4, 3],
[7, 4, 3]],
[[5, 7, 1],
[5, 7, 6],
[1, 8, 8]]])
In [13]: x2
Out[13]:
array([[[ 0, 99, 99],
[ 0, 99, 0],
[ 0, 99, 0]],
[[99, 0, 0],
[ 0, 0, 99],
[ 0, 99, 0]]])
In [14]: np.where(x2 == 0, x1, x2)
Out[14]:
array([[[ 3, 99, 99],
[ 1, 99, 3],
[ 7, 99, 3]],
[[99, 7, 1],
[ 5, 7, 99],
[ 1, 99, 8]]])
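Applied to the angle example in the question, the same call works even though _angle has fewer dimensions than k_angle and e_angle, because np.where broadcasts its three arguments by aligning trailing axes. A minimal sketch, assuming hypothetical stand-ins for khan and emss (the real bm beam models are not available here) and a 4-element frequency axis in place of the 2394 one:

```python
import numpy as np

# Hypothetical stand-in for the 2394 frequencies in the question
freqs = np.array([1.0, 2.0, 3.0, 4.0])

def khan(theta):
    # Hypothetical stand-in for the Khan model: output shape is
    # (n_freq,) + theta.shape, and it is 0 wherever theta >= 5 degrees
    return np.where(theta < 5, 10.0 * freqs[:, None, None], 0.0)

def emss(theta):
    # Hypothetical stand-in for the EMSS model, valid for every angle
    return freqs[:, None, None] + np.zeros_like(theta)

# Stand-in for _angle, shape (2, 3) instead of (3647, 16)
angle = np.array([[0.0, 2.5, 10.0],
                  [4.9, 5.0, 170.0]])

k_angle = khan(angle)   # shape (4, 2, 3)
e_angle = emss(angle)   # shape (4, 2, 3)

# The (2, 3) mask broadcasts against the (4, 2, 3) arrays, so every
# frequency slice takes khan below 5 degrees and emss elsewhere
combined = np.where(angle < 5, k_angle, e_angle)
print(combined[1])
```

No boolean-index assignment is involved, so the TypeError from the first attempt does not arise.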
I have a first ndarray, foo, in which I want to select several elements.
foo = array([[0, 10, 30], [20, 40, 60], [30, 50, 70]])
To be precise, I have another ndarray, bar, in which I store the row I want from each column of my first ndarray.
bar = array([[1, 2, 0], [0, 0, 1]])
What I want as the result is:
array([[20, 50, 30], [0, 10, 60]])
Is there a vectorized way to do it?
When I try foo[bar], it adds a dimension to the array (the result has shape (2, 3, 3)), which is not what I'm looking for.
In [17]: foo[bar, np.arange(3)]
Out[17]:
array([[20, 50, 30],
[ 0, 10, 60]])
The 1-dimensional array np.arange(3) is broadcast to the same shape as bar,
so that it is equivalent to
In [35]: X, Y = np.broadcast_arrays(bar, np.arange(3)); Y
Out[35]:
array([[0, 1, 2],
[0, 1, 2]])
X is the same as bar since broadcasting does not change the shape of bar.
Then NumPy integer array indexing rules say that the (i,j) element of foo[X, Y] equals
foo[X, Y][i, j] = foo[X[i,j], Y[i,j]]
So for example,
foo[bar, np.arange(3)][0, 1] = foo[ bar[0,1], Y[0,1] ]
= foo[2, 1]
= 50
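The indexing rule can be checked directly by spelling it out with a plain Python loop and comparing against the vectorized result. A small sketch using the foo and bar arrays from the question (the loop is only for illustration; it is much slower than fancy indexing):

```python
import numpy as np

foo = np.array([[0, 10, 30], [20, 40, 60], [30, 50, 70]])
bar = np.array([[1, 2, 0], [0, 0, 1]])

# Broadcast the two index arrays explicitly, as in the In [35] session
X, Y = np.broadcast_arrays(bar, np.arange(foo.shape[1]))

# Spell out the integer-array indexing rule element by element
expected = np.empty(X.shape, dtype=foo.dtype)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        expected[i, j] = foo[X[i, j], Y[i, j]]

# Vectorized version from the answer
result = foo[bar, np.arange(foo.shape[1])]
print(np.array_equal(result, expected))
```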
You also need to specify the column that goes with each row index.
Try this:
import numpy as np
foo = np.array([[0, 10, 30], [20, 40, 60], [30, 50, 70]])
bar = np.array([[1, 2, 0], [0, 0, 1]])
# one column index per column of foo (foo.shape[1]; len(foo) counts rows)
foo[bar, range(foo.shape[1])]
Output:
array([[20, 50, 30],
[ 0, 10, 60]])
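Since each entry of bar is a row index for the column it sits in, np.take_along_axis expresses the same selection without building the column index by hand (it requires NumPy 1.15 or later). A sketch, offered as an alternative rather than as either answerer's approach:

```python
import numpy as np

foo = np.array([[0, 10, 30], [20, 40, 60], [30, 50, 70]])
bar = np.array([[1, 2, 0], [0, 0, 1]])

# Indices are taken along axis 0, so out[i, j] == foo[bar[i, j], j]
out = np.take_along_axis(foo, bar, axis=0)
print(out)
```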
I frequently want to pixel bin/pixel bucket a numpy array, meaning, replace groups of N consecutive pixels with a single pixel which is the sum of the N replaced pixels. For example, start with the values:
x = np.array([1, 3, 7, 3, 2, 9])
with a bucket size of 2, this transforms into:
bucket(x, bucket_size=2)
= [1+3, 7+3, 2+9]
= [4, 10, 11]
As far as I know, there's no numpy function that specifically does this (please correct me if I'm wrong!), so I frequently roll my own. For 1d numpy arrays, this isn't bad:
import numpy as np
def bucket(x, bucket_size):
    return x.reshape(x.size // bucket_size, bucket_size).sum(axis=1)
bucket_me = np.array([3, 4, 5, 5, 1, 3, 2, 3])
print(bucket(bucket_me, bucket_size=2)) #[ 7 10 4 5]
...however, I get confused easily for the multidimensional case, and I end up rolling my own buggy, half-assed solution to this "easy" problem over and over again. I'd love it if we could establish a nice N-dimensional reference implementation.
Preferably the function call would allow different bin sizes along different axes (perhaps something like bucket(x, bucket_size=(2, 2, 3)))
Preferably the solution would be reasonably efficient (reshape and sum are fairly quick in numpy)
Bonus points for handling edge effects when the array doesn't divide nicely into an integer number of buckets.
Bonus points for allowing the user to choose the initial bin edge offset.
As suggested by Divakar, here's my desired behavior in a sample 2-D case:
x = np.array([[1, 2, 3, 4],
[2, 3, 7, 9],
[8, 9, 1, 0],
[0, 0, 3, 4]])
bucket(x, bucket_size=(2, 2))
= [[1 + 2 + 2 + 3, 3 + 4 + 7 + 9],
[8 + 9 + 0 + 0, 1 + 0 + 3 + 4]]
= [[8, 23],
[17, 8]]
...hopefully I did my arithmetic correctly ;)
I think you can do most of the fiddly work with skimage's view_as_blocks. This function is implemented using as_strided so it is very efficient (it just changes the stride information to reshape the array). Because it's written in Python/NumPy, you can always copy the code if you don't have skimage installed.
After applying that function, you just need to sum the N trailing axes of the reshaped array (where N is the length of the bucket_size tuple). Here's a new bucket() function:
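import numpy as np
from skimage.util import view_as_blocks

def bucket(x, bucket_size):
    # view_as_blocks needs every dimension of x to be divisible by bucket_size
    blocks = view_as_blocks(x, bucket_size)
    # the last len(bucket_size) axes index positions within each block
    tup = tuple(range(-len(bucket_size), 0))
    return blocks.sum(axis=tup)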
Then for example:
>>> x = np.array([1, 3, 7, 3, 2, 9])
>>> bucket(x, bucket_size=(2,))
array([ 4, 10, 11])
>>> x = np.array([[1, 2, 3, 4],
[2, 3, 7, 9],
[8, 9, 1, 0],
[0, 0, 3, 4]])
>>> bucket(x, bucket_size=(2, 2))
array([[ 8, 23],
[17, 8]])
>>> y = np.arange(6*6*6).reshape(6,6,6)
>>> bucket(y, bucket_size=(2, 2, 3))
array([[[ 264, 300],
[ 408, 444],
[ 552, 588]],
[[1128, 1164],
[1272, 1308],
[1416, 1452]],
[[1992, 2028],
[2136, 2172],
[2280, 2316]]])
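If skimage is not available, the same block sums can be obtained with reshape alone: split each axis of length n into (n // b, b) and sum over the within-bucket axes. The sketch below also trims trailing elements that do not fill a whole bucket and accepts an offset for the first bin edge, addressing the two bonus points; bucket_reshape and its offset parameter are hypothetical names, not an existing NumPy or skimage API.

```python
import numpy as np

def bucket_reshape(x, bucket_size, offset=None):
    """Sum non-overlapping blocks of shape `bucket_size` using only reshape.

    Hypothetical helper: `offset` skips that many leading elements per axis,
    and trailing elements that do not fill a whole bucket are dropped.
    """
    x = np.asarray(x)
    if offset is None:
        offset = (0,) * x.ndim
    # Trim the leading offset and any trailing partial bucket on each axis
    slices = tuple(
        slice(o, o + ((n - o) // b) * b)
        for n, b, o in zip(x.shape, bucket_size, offset)
    )
    trimmed = x[slices]
    # Interleave (number_of_buckets, bucket_size) pairs for every axis
    new_shape = []
    for n, b in zip(trimmed.shape, bucket_size):
        new_shape.extend((n // b, b))
    # The odd axes (1, 3, 5, ...) index positions within each bucket
    within = tuple(range(1, 2 * x.ndim, 2))
    return trimmed.reshape(new_shape).sum(axis=within)

y = np.arange(6 * 6 * 6).reshape(6, 6, 6)
print(bucket_reshape(y, (2, 2, 3)).shape)
print(bucket_reshape(np.arange(7), (2,), offset=(1,)))
```

Trimming is only one way to handle edge effects; padding with zeros before reshaping would instead keep partial buckets.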
Natively, with as_strided:
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])

def bucket(x, bucket_size):
    x = np.ascontiguousarray(x)
    oldshape = np.array(x.shape)
    # (number of buckets along each axis, bucket size along each axis)
    newshape = np.concatenate((oldshape // bucket_size, bucket_size))
    oldstrides = np.array(x.strides)
    # jump a whole bucket along the first ndim axes, one element along the rest
    newstrides = np.concatenate((oldstrides * bucket_size, oldstrides))
    axis = tuple(range(x.ndim, 2 * x.ndim))
    return as_strided(x, newshape, newstrides).sum(axis)
If a bucket size does not evenly divide the corresponding dimension of x, the remaining elements are dropped.
Verification:
In [9]: bucket(x,(2,2))
Out[9]:
array([[ 8, 23],
[17, 8]])
To use different bin sizes along each axis for ndarray cases, you can iteratively apply np.add.reduceat along each axis, like so -
import numpy as np

def bucket(x, bin_size):
    # bin_size[i] lists the successive bin sizes along axis i
    ndims = x.ndim
    out = x.copy()
    for i in range(ndims):
        # start index of each bin along axis i
        idx = np.append(0, np.cumsum(bin_size[i][:-1]))
        out = np.add.reduceat(out, idx, axis=i)
    return out
Sample run -
In [126]: x
Out[126]:
array([[165, 107, 133, 82, 199],
[ 35, 138, 91, 100, 207],
[ 75, 99, 40, 240, 208],
[166, 171, 78, 7, 141]])
In [127]: bucket(x, bin_size = [[2, 2],[3, 2]])
Out[127]:
array([[669, 588],
[629, 596]])
# [2, 2] are the bin sizes along axis=0
# [3, 2] are the bin sizes along axis=1
# array([[165, 107, 133, | 82, 199],
# [ 35, 138, 91, | 100, 207],
# -------------------------------------
# [ 75, 99, 40, | 240, 208],
# [166, 171, 78, | 7, 141]])
In [128]: x[:2,:3].sum()
Out[128]: 669
In [129]: x[:2,3:].sum()
Out[129]: 588
In [130]: x[2:,:3].sum()
Out[130]: 629
In [131]: x[2:,3:].sum()
Out[131]: 596
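One detail worth knowing about np.add.reduceat: each bin runs from its start index up to the next start index, and the last bin runs to the end of the axis, so the last entry of each bin_size list is never used directly. A 1-D sketch (the data and the helper name bin_sums are illustrative, not from the answer):

```python
import numpy as np

x = np.array([1, 3, 7, 3, 2, 9, 5])

def bin_sums(x, bin_sizes):
    # Start index of every bin; the last size only affects nothing after it
    starts = np.append(0, np.cumsum(bin_sizes[:-1]))
    return np.add.reduceat(x, starts)

print(bin_sums(x, [2, 3, 2]))  # bins x[0:2], x[2:5], x[5:] -> sums 4, 12, 14
print(bin_sums(x, [2, 3, 1]))  # same starts, so the same sums
```

If a trailing remainder should be dropped rather than folded into the last bin, slice the axis to sum(bin_sizes) before reducing.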
I am trying to improve my understanding of numpy functions. I understand the behaviour of numpy.dot. I'd like to understand the behaviour of numpy.outer in terms of numpy.dot.
Based on this Wikipedia article https://en.wikipedia.org/wiki/Outer_product I'd expect array_equal to return True in the following code. However, it does not.
X = np.matrix([
[1,5],
[5,9],
[4,1]
])
r1 = np.outer(X,X)
r2 = np.dot(X, X.T)
np.array_equal(r1, r2)
How can I assign r2 so that np.array_equal returns True? Also, why does numpy's implementation of np.outer not match the definition of the outer product on Wikipedia?
Using numpy 1.9.2
In [303]: X=np.array([[1,5],[5,9],[4,1]])
In [304]: X
Out[304]:
array([[1, 5],
[5, 9],
[4, 1]])
In [305]: np.inner(X,X)
Out[305]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
In [306]: np.dot(X,X.T)
Out[306]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
The Wikipedia outer product article mostly talks about vectors, i.e. 1d arrays. Your X is 2d.
In [310]: x=np.arange(3)
In [311]: np.outer(x,x)
Out[311]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
In [312]: np.inner(x,x)
Out[312]: 5
In [313]: np.dot(x,x) # same as inner
Out[313]: 5
In [314]: x[:,None]*x[None,:] # same as outer
Out[314]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
Notice that the Wiki outer product does not involve summation. Inner does; in this example, 5 is the sum of the 3 diagonal values of the outer product.
dot also involves summation: all the products, followed by summation along a specific axis.
Some of the wiki outer equations use explicit indices. The einsum function can implement these calculations.
In [325]: np.einsum('ij,kj->ik',X,X)
Out[325]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
In [326]: np.einsum('ij,jk->ik',X,X.T)
Out[326]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
In [327]: np.einsum('i,j->ij',x,x)
Out[327]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
In [328]: np.einsum('i,i->',x,x)
Out[328]: 5
As mentioned in the comment, np.outer uses ravel:
return a.ravel()[:, newaxis]*b.ravel()[newaxis,:]
This is the same broadcasted multiplication that I demonstrated earlier for x.
numpy.outer is meant for 1-d vectors, not matrices (other inputs are flattened first). But for the case of 1-d vectors, there is a relation to dot.
If
import numpy as np
A = np.array([1.0,2.0,3.0])
then this
np.matrix(A).T.dot(np.matrix(A))
should be the same as this
np.outer(A,A)
Another (clunky) version similar to a[:,None] * a[None,:]
a.reshape(a.size, 1) * a.reshape(1, a.size)
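To answer the r2 part of the question directly: because np.outer flattens its inputs, np.outer(X, X) for the 3x2 X equals the dot product of the flattened X as a column (6x1) with the flattened X as a row (1x6). A sketch using plain arrays rather than np.matrix:

```python
import numpy as np

X = np.array([[1, 5],
              [5, 9],
              [4, 1]])

r1 = np.outer(X, X)            # X is flattened to length 6 first

col = X.reshape(-1, 1)         # shape (6, 1)
row = X.reshape(1, -1)         # shape (1, 6)
r2 = np.dot(col, row)          # (6, 1) dot (1, 6) -> (6, 6); the summed axis has length 1

r3 = np.einsum('i,j->ij', X.ravel(), X.ravel())

print(np.array_equal(r1, r2), np.array_equal(r1, r3))
```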
Suppose I have a 2-D numpy array whose rows have length 4:
In [41]: arr
Out[41]:
array([[ 1, 15, 0, 0],
[ 30, 10, 0, 0],
[ 30, 20, 0, 0],
...,
[104, 139, 146, 75],
[ 9, 11, 146, 74],
[ 9, 138, 146, 75]], dtype=uint8)
I want to know:
Does arr contain the row [1, 2, 3, 4]?
If so, at what index is [1, 2, 3, 4] in arr?
I want to find this out as fast as possible.
Suppose arr contains 8550420 elements. I've checked several methods with timeit:
Just checking, without getting the index: any(all([1, 2, 3, 4] == elt) for elt in arr). It took 15.5 s on average over 10 runs on my machine.
for-based solution:
for i, e in enumerate(arr):
    if list(e) == [1, 2, 3, 4]:
        break
It took about 5.7 s on average.
Are there faster solutions, for example NumPy-based ones?
This is Jaime's idea, I just love it:
import numpy as np
def asvoid(arr):
    """View the array as dtype np.void (bytes)
    This collapses ND-arrays to 1D-arrays, so you can perform 1D operations on them.
    https://stackoverflow.com/a/16216866/190597 (Jaime)"""
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

def find_index(arr, x):
    arr_as1d = asvoid(arr)
    x = asvoid(x)
    return np.nonzero(arr_as1d == x)[0]
arr = np.array([[ 1, 15, 0, 0],
[ 30, 10, 0, 0],
[ 30, 20, 0, 0],
[1, 2, 3, 4],
[104, 139, 146, 75],
[ 9, 11, 146, 74],
[ 9, 138, 146, 75]], dtype='uint8')
arr = np.tile(arr,(1221488,1))
x = np.array([1,2,3,4], dtype='uint8')
print(find_index(arr, x))
yields
[ 3 10 17 ..., 8550398 8550405 8550412]
The idea is to view each row of the array as a string. For example,
In [15]: x
Out[15]:
array([^A^B^C^D],
dtype='|V4')
The strings look like garbage, but they are really just the underlying data in each row viewed as bytes. You can then compare arr_as1d == x to find which rows equal x.
There is another way to do it:
def find_index2(arr, x):
    return np.where((arr == x).all(axis=1))[0]
but it turns out to be slower:
In [34]: %timeit find_index(arr, x)
1 loops, best of 3: 209 ms per loop
In [35]: %timeit find_index2(arr, x)
1 loops, best of 3: 370 ms per loop
If you perform the search more than once and you don't mind using extra memory, you can create a set from your array (I'm using a list here, but the code is almost the same):
>>> elem = [1, 2, 3, 4]
>>> elements = [[ 1, 15, 0, 0], [ 30, 10, 0, 0], [1, 2, 3, 4]]
>>> index = set([tuple(x) for x in elements])
>>> tuple(elem) in index
True
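A set only answers whether the row is present. If the index is also needed, a dict from each row's raw bytes to its first index gives both answers with one lookup per query. A sketch, assuming every query is converted to the same dtype as arr (here uint8), since tobytes() compares raw bytes; the helper name lookup is hypothetical:

```python
import numpy as np

arr = np.array([[ 1, 15,   0,  0],
                [30, 10,   0,  0],
                [ 1,  2,   3,  4],
                [ 9, 11, 146, 74],
                [ 1,  2,   3,  4]], dtype=np.uint8)

# Build once: raw bytes of each row -> first index where that row appears
first_index = {}
for i, row in enumerate(arr):
    first_index.setdefault(row.tobytes(), i)

def lookup(query):
    # Convert to arr's dtype, otherwise the bytes would not match
    key = np.asarray(query, dtype=arr.dtype).tobytes()
    return first_index.get(key)   # None when the row is absent

print(lookup([1, 2, 3, 4]))
print(lookup([1, 2, 3, 5]))
```

Building the dict is a one-off O(n) pass, so this only pays off when many queries are run against the same array.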