Vectorization - Adding numpy arrays without loops?

So I have the following numpy arrays:
c = array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])
X = array([[10, 15, 20,  5],
           [ 1,  2,  6, 23]])
y = array([1, 1])
I am trying to add each 1x4 row of the X array to one of the columns of c. The y array specifies which column. The above example means that we are adding both rows of X to column 1 of c. That is, we should expect the result:
c = array([[ 1,  2+10+1,  3],     =  array([[ 1, 13,  3],
           [ 4,  5+15+2,  6],                [ 4, 22,  6],
           [ 7,  8+20+6,  9],                [ 7, 34,  9],
           [10, 11+5+23, 12]])               [10, 39, 12]])
Does anyone know how I can do this without any loops? I tried c[:,y] += X, but it seems that this only adds the second row of X to column 1 of c, once. Note that y does not necessarily have to be [1, 1]; it can also be [0, 1], in which case we would add the first row of X to column 0 of c and the second row of X to column 1 of c.

My first thought when I saw your desired calculation was simply to sum the 2 rows of X and add that to the 2nd column of c:
In [636]: c = array([[ 1,  2,  3],
                     [ 4,  5,  6],
                     [ 7,  8,  9],
                     [10, 11, 12]])
In [637]: c[:,1] += X.sum(axis=0)
In [638]: c
Out[638]:
array([[ 1, 13,  3],
       [ 4, 22,  6],
       [ 7, 34,  9],
       [10, 39, 12]])
But if we want to work from a general index like y, we need a special unbuffered operation, at least when there are duplicates in y:
In [639]: c = array([[ 1,  2,  3],
                     [ 4,  5,  6],
                     [ 7,  8,  9],
                     [10, 11, 12]])
In [641]: np.add.at(c, (slice(None), y), X.T)
In [642]: c
Out[642]:
array([[ 1, 13,  3],
       [ 4, 22,  6],
       [ 7, 34,  9],
       [10, 39, 12]])
You need to look up .at in the numpy docs. In IPython, np.add.at? shows me documentation that includes:
Performs unbuffered in place operation on operand 'a' for elements
specified by 'indices'. For addition ufunc, this method is equivalent to
a[indices] += b, except that results are accumulated for elements that
are indexed more than once. For example, a[[0,0]] += 1 will only
increment the first element once because of buffering, whereas
add.at(a, [0,0], 1) will increment the first element twice.
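To see the buffering difference the docstring describes, here is a minimal sketch:
import numpy as np
a = np.zeros(3)
a[[0, 0]] += 1           # buffered fancy indexing: index 0 is incremented only once
print(a)                 # [1. 0. 0.]
b = np.zeros(3)
np.add.at(b, [0, 0], 1)  # unbuffered: both occurrences of index 0 accumulate
print(b)                 # [2. 0. 0.]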
With a different y it still works:
In [645]: np.add.at(c, (slice(None), [0, 2]), X.T)
In [646]: c
Out[646]:
array([[11,  2,  4],
       [19,  5,  8],
       [27,  8, 15],
       [15, 11, 35]])
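If this operation comes up repeatedly, it can be wrapped in a small helper (add_rows_to_columns is a hypothetical name; like the calls above, it modifies c in place):
import numpy as np
def add_rows_to_columns(c, X, y):
    # add row X[i] into column c[:, y[i]] for every i,
    # accumulating correctly when y contains duplicates
    np.add.at(c, (slice(None), y), X.T)
    return c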

Firstly, your code seems to work in general if you transpose X. For example:
c = array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])
X = array([[10, 15, 20,  5],
           [ 1,  2,  6, 23]]).transpose()
y = array([1, 2])
c[:,y] += X
print(c)
#OUTPUT:
#[[ 1 12  4]
# [ 4 20  8]
# [ 7 28 15]
# [10 16 35]]
However, it doesn't work when there are any duplicate columns in y, as in your specific example. I believe this is because c[:, [1,1]] generates an array with two columns, each a copy of the slice c[:, 1]. Both of these slices point to the same part of c, so when the addition happens, both are read, the corresponding part of X is added to each, and then they are written back to the same location, meaning the last one written back gives the final value. I don't believe numpy will let you vectorize an operation like this with plain fancy indexing, because it fundamentally requires editing one column at a time, saving back its value, and then editing it again later.
You might have to settle for no duplicates, or otherwise implement something like an accumulator.
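A minimal sketch of the read-modify-write behavior described above, with values chosen for illustration:
import numpy as np
a = np.zeros(3)
a[[0, 0]] += np.array([1.0, 2.0])
print(a)  # [2. 0. 0.]: the two updates to index 0 are not accumulated;
          # only the last write-back survives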

This is the solution I came up with:
def my_func(c, X, y):
    cc = np.zeros((len(y), c.shape[0], c.shape[1]))
    cc[range(len(y)), :, y] = X
    return c + np.sum(cc, 0)
The following interactive session demonstrates how it works:
>>> my_func(c, X, y)
array([[  1.,  13.,   3.],
       [  4.,  22.,   6.],
       [  7.,  34.,   9.],
       [ 10.,  39.,  12.]])
>>> y2 = np.array([0, 2])
>>> my_func(c, X, y2)
array([[ 11.,   2.,   4.],
       [ 19.,   5.,   8.],
       [ 27.,   8.,  15.],
       [ 15.,  11.,  35.]])

Related

fastest way to reshape 2D numpy array (gray image) into a 3D stacked array

I have a 2D image with shape (M, N) and would like to reshape it into (M//m * N//n, m, n); that is, to stack small patches of the image into a 3D array.
Currently, I use two for-loops to achieve that:
import numpy as np
a = np.arange(16).reshape(4,4)
b = np.zeros((4,2,2))
# a = array([[ 0,  1,  2,  3],
#            [ 4,  5,  6,  7],
#            [ 8,  9, 10, 11],
#            [12, 13, 14, 15]])
row_step = 2
col_step = 2
row_indices = np.arange(4, step=2)
col_indices = np.arange(4, step=2)
for i, row_idx in enumerate(row_indices):
    for j, col_idx in enumerate(col_indices):
        b[i*2+j, :, :] = a[row_idx:row_idx+row_step, col_idx:col_idx+col_step]
# b = array([[[ 0.,  1.],
#             [ 4.,  5.]],
#
#            [[ 2.,  3.],
#             [ 6.,  7.]],
#
#            [[ 8.,  9.],
#             [12., 13.]],
#
#            [[10., 11.],
#             [14., 15.]]])
Is there any other faster way to do this? Thanks a lot!
Use skimage.util.view_as_blocks:
from skimage.util.shape import view_as_blocks
out = view_as_blocks(a, (2,2))#.copy()
Output:
array([[[[ 0,  1],
         [ 4,  5]],
        [[ 2,  3],
         [ 6,  7]]],
       [[[ 8,  9],
         [12, 13]],
        [[10, 11],
         [14, 15]]]])
NB. Be aware that you have a view of the original object. If you want a copy, use view_as_blocks(a, (2,2)).copy()
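Note that view_as_blocks returns a 4-D array, so to get the (M//m * N//n, m, n) shape asked for you would still reshape it, e.g. view_as_blocks(a, (2,2)).reshape(-1, 2, 2). A pure-NumPy alternative, sketched here assuming M and N are exact multiples of m and n:
import numpy as np
a = np.arange(16).reshape(4, 4)
m = n = 2
M, N = a.shape
# split each axis into (block index, offset within block), bring the two
# within-block axes next to each other, then merge the two block axes
out = a.reshape(M // m, m, N // n, n).swapaxes(1, 2).reshape(-1, m, n)
print(out.shape)  # (4, 2, 2)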

Convert 2-d arrays to 1-d corresponding arrays in Python

When I make a 2-D grid from two 1-D arrays in Python, I usually use numpy.meshgrid, as below:
x = np.arange(10)
y = np.arange(9)
xy = np.meshgrid(x,y)
However, my question is about the reverse of this process. I have three 2-D arrays: two give latitude and longitude, and the third the corresponding altitude. Is there a way to convert these 2-D grids into 1-D arrays that correspond to each other in Python?
Sample arrays are shown below:
x=
[[-104.09417725 -104.08866882 -104.0831604 ..., -103.8795166 -103.87399292
-103.86849976]
[-104.09458923 -104.08908081 -104.08358765 ..., -103.87991333
-103.87438965 -103.86889648]
[-104.09500122 -104.08950806 -104.08401489 ..., -103.88031006
-103.87481689 -103.86932373]
...,
[-104.11058044 -104.10507202 -104.09954834 ..., -103.89535522
-103.88983154 -103.88430786]
[-104.11100769 -104.10548401 -104.09997559 ..., -103.89575195
-103.89022827 -103.88470459]
[-104.11141968 -104.10591125 -104.10038757 ..., -103.89614868 -103.890625
-103.88513184]]
y=
[[ 40.81712341 40.81744385 40.81776428 ..., 40.82929611 40.82960129
40.82990646]
[ 40.82128525 40.8216095 40.82191849 ..., 40.83345795 40.83376694
40.83407593]
[ 40.8254509 40.82577515 40.82608795 ..., 40.83763123 40.83792877
40.83824539]
...,
[ 40.97956848 40.9798851 40.98020172 ..., 40.99177551 40.99207687
40.99238968]
[ 40.98373413 40.98405457 40.98437119 ..., 40.99593735 40.99624252
40.99655533]
[ 40.98789597 40.9882164 40.98854065 ..., 41.00011063 41.00041199
41.00072479]]
z=
[[ 1605.58544922 1615.62341309 1624.33911133 ..., 1479.11254883
1478.328125 1476.13378906]
[ 1618.58520508 1632.38305664 1645.36132812 ..., 1485.84899902
1483.43847656 1481.36865234]
[ 1640.63037109 1656.0925293 1667.14697266 ..., 1492.79797363
1488.65686035 1487.40478516]
...,
[ 1599.78015137 1602.82250977 1610.40197754 ..., 1595.12097168
1594.40551758 1597.87902832]
[ 1597.80883789 1601.17883301 1607.41320801 ..., 1595.7421875
1594.26452637 1597.90893555]
[ 1596.03857422 1600.5690918 1606.30712891 ..., 1598.56982422
1594.90454102 1594.07763672]]
Any help or ideas would be really appreciated.
The expected arrays would be something like:
x =
[-104.09417725 -104.08866882 -104.0831604 ..., -103.8795166 -103.87399292
-103.86849976]
y =
[ 40.82128525 40.8216095 40.82191849 ..., 40.83345795 40.83376694
40.83407593]
z =
[ 1618.58520508 1632.38305664 1645.36132812 ..., 1485.84899902
1483.43847656 1481.36865234]
I think this may work. In the resulting array, each row is an xyz set taken from the 3 initial arrays:
In [105]:
arr1 = np.random.random((2,3))
arr2 = np.random.random((2,3))
arr3 = np.random.random((2,3))
In [106]:
arr1
Out[106]:
array([[ 0.95919623,  0.76646714,  0.07782125],
       [ 0.82285529,  0.80274853,  0.28257592]])
In [107]:
arr2
Out[107]:
array([[ 0.0575891 ,  0.13063203,  0.11439967],
       [ 0.83353859,  0.72917084,  0.14294741]])
In [108]:
arr3
Out[108]:
array([[ 0.75823658,  0.09216087,  0.80364941],
       [ 0.50705487,  0.24498723,  0.3719806 ]])
In [109]:
np.dstack((arr1, arr2, arr3)).reshape((-1,3))
Out[109]:
array([[ 0.95919623,  0.0575891 ,  0.75823658],
       [ 0.76646714,  0.13063203,  0.09216087],
       [ 0.07782125,  0.11439967,  0.80364941],
       [ 0.82285529,  0.83353859,  0.50705487],
       [ 0.80274853,  0.72917084,  0.24498723],
       [ 0.28257592,  0.14294741,  0.3719806 ]])
meshgrid produces a 2d array for each input
In [235]: xx,yy=np.meshgrid([1,2,3],[4,5,6])
one has identical rows
In [236]: xx
Out[236]:
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
another has identical columns
In [237]: yy
Out[237]:
array([[4, 4, 4],
       [5, 5, 5],
       [6, 6, 6]])
Recovering the originals is just a matter of selecting a row or a column
In [238]: xx[0,:]
Out[238]: array([1, 2, 3])
In [239]: yy[:,0]
Out[239]: array([4, 5, 6])
Your x has similar, but not identical, rows. So you could pick one row and ignore the others, or you could average them:
In [240]: xx.mean(axis=0)
Out[240]: array([ 1.,  2.,  3.])
Or you could flatten the arrays, keeping all values
In [241]: xx.flatten()
Out[241]: array([1, 2, 3, 1, 2, 3, 1, 2, 3])
In [242]: xx.T.flatten()
Out[242]: array([1, 1, 1, 2, 2, 2, 3, 3, 3])
The similarity pattern in y is less obvious. And z? meshgrid with 3 inputs would produce three 3-D arrays.
Or you could join all 3 into a 3d array
In [252]: np.dstack([xx, yy, xx+10])
Out[252]:
array([[[ 1,  4, 11],
        [ 2,  4, 12],
        [ 3,  4, 13]],
       [[ 1,  5, 11],
        [ 2,  5, 12],
        [ 3,  5, 13]],
       [[ 1,  6, 11],
        [ 2,  6, 12],
        [ 3,  6, 13]]])
and turn that back into a 3-column array:
In [253]: np.dstack([xx, yy, xx+10]).reshape(-1, 3)
Out[253]:
array([[ 1,  4, 11],
       [ 2,  4, 12],
       [ 3,  4, 13],
       [ 1,  5, 11],
       [ 2,  5, 12],
       [ 3,  5, 13],
       [ 1,  6, 11],
       [ 2,  6, 12],
       [ 3,  6, 13]])
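Applied to the latitude/longitude/altitude grids from the question (a sketch assuming x, y, z are the 2-D arrays shown above), this keeps each triple aligned:
import numpy as np
# one (longitude, latitude, altitude) triple per row
xyz = np.dstack((x, y, z)).reshape(-1, 3)
lon, lat, alt = xyz[:, 0], xyz[:, 1], xyz[:, 2]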

transform the upper/lower triangular part of a symmetric matrix (2D array) into a 1D array and return it to the 2D format

In this question it is explained how to access the lower and upper triangular parts of a given matrix, say:
m = np.matrix([[11, 12, 13],
               [21, 22, 23],
               [31, 32, 33]])
Here I need to transform the matrix into a 1D array, which can be done with:
indices = np.triu_indices_from(m)
a = np.asarray( m[indices] )[-1]
#array([11, 12, 13, 22, 23, 33])
After doing a lot of calculations with a, changing its values, it will be used to fill a symmetric 2D array:
new = np.zeros(m.shape)
for i, j in enumerate(zip(*indices)):
    new[j] = a[i]
    new[j[1], j[0]] = a[i]
Returning:
array([[ 11.,  12.,  13.],
       [ 12.,  22.,  23.],
       [ 13.,  23.,  33.]])
Is there a better way to accomplish this? More specifically, can the Python loop to rebuild the 2D array be avoided?
The fastest and smartest way to put back a vector into a 2D symmetric array is to do this:
Case 1: No offset (k=0) i.e. upper triangle part includes the diagonal
import numpy as np
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
#array([[1, 2, 3],
#       [4, 5, 6],
#       [7, 8, 9]])
#get the upper triangular part of this matrix
v = X[np.triu_indices(X.shape[0], k=0)]
print(v)
# [1 2 3 5 6 9]
# put it back into a 2D symmetric array
size_X = 3
X = np.zeros((size_X, size_X))
X[np.triu_indices(X.shape[0], k=0)] = v
X = X + X.T - np.diag(np.diag(X))
#array([[1., 2., 3.],
#       [2., 5., 6.],
#       [3., 6., 9.]])
The above will work fine even if instead of numpy.array you use numpy.matrix.
Case 2: With offset (k=1) i.e. upper triangle part does NOT include the diagonal
import numpy as np
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
#array([[1, 2, 3],
#       [4, 5, 6],
#       [7, 8, 9]])
#get the upper triangular part of this matrix
v = X[np.triu_indices(X.shape[0], k=1)] # offset
print(v)
# [2 3 6]
# put it back into a 2D symmetric array
size_X = 3
X = np.zeros((size_X, size_X))
X[np.triu_indices(X.shape[0], k=1)] = v
X = X + X.T
#array([[0., 2., 3.],
#       [2., 0., 6.],
#       [3., 6., 0.]])
Do you just want to form a symmetric array? You can skip the diagonal indices completely.
m = np.array(m)
inds = np.triu_indices_from(m, k=1)
m[(inds[1], inds[0])] = m[inds]
m
array([[11, 12, 13],
       [12, 22, 23],
       [13, 23, 33]])
Creating a symmetric array from a:
new = np.zeros((3,3))
vals = np.array([11, 12, 13, 22, 23, 33])
inds = np.triu_indices_from(new)
new[inds] = vals
new[(inds[1], inds[0])] = vals
new
array([[ 11.,  12.,  13.],
       [ 12.,  22.,  23.],
       [ 13.,  23.,  33.]])
You can use Array Creation Routines such as numpy.triu, numpy.tril, and numpy.diag to create a symmetric matrix from a triangular one. Here's a simple 3x3 example.
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
a_triu = np.triu(a, k=0)
array([[1, 2, 3],
       [0, 5, 6],
       [0, 0, 9]])
a_tril = np.tril(a, k=0)
array([[1, 0, 0],
       [4, 5, 0],
       [7, 8, 9]])
a_diag = np.diag(np.diag(a))
array([[1, 0, 0],
       [0, 5, 0],
       [0, 0, 9]])
Add the transpose and subtract the diagonal:
a_sym_triu = a_triu + a_triu.T - a_diag
array([[1, 2, 3],
       [2, 5, 6],
       [3, 6, 9]])
a_sym_tril = a_tril + a_tril.T - a_diag
array([[1, 4, 7],
       [4, 5, 8],
       [7, 8, 9]])
Faster than the accepted solution for large matrices:
import numpy as np
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
values = X[np.triu_indices(X.shape[0], k = 0)]
X2 = np.zeros_like(X)
triu_idx = np.triu_indices_from(X2)
X2[triu_idx], X2[triu_idx[::-1]] = values, values

Fastest way to check if two arrays have equivalent rows

I am trying to figure out a better way to check if two 2D arrays contain the same rows. Take the following case for a short example:
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> b
array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])
In this case b = a[::-1]. To check that the two arrays contain the same rows:
>>> a = a[np.lexsort((a[:,0], a[:,1], a[:,2]))]
>>> b = b[np.lexsort((b[:,0], b[:,1], b[:,2]))]
>>> np.all(a - b == 0)
True
This is great and fairly fast. However, the issue comes about when rows are "close":
array([[-1.57839867  2.355354   -1.4225235 ],
       [-0.94728367  0.         -1.4225235 ],
       [-1.57839867 -2.355354   -1.4225215 ]])   <--- note: ends in 215, not 235
array([[-1.57839867 -2.355354   -1.4225225 ],
       [-1.57839867  2.355354   -1.4225225 ],
       [-0.94728367  0.         -1.4225225 ]])
Within a tolerance of 1E-5 these two arrays are equal by row, but the lexsort will tell you otherwise. This can be solved by a different sort order, but I would like a more general solution.
I was toying with the idea of:
a = a.reshape(-1, 1, 3)
>>> a - b
array([[[-6, -6, -6],
        [-3, -3, -3],
        [ 0,  0,  0]],
       [[-3, -3, -3],
        [ 0,  0,  0],
        [ 3,  3,  3]],
       [[ 0,  0,  0],
        [ 3,  3,  3],
        [ 6,  6,  6]]])
>>> np.all(np.around(a - b, 5) == 0, axis=2)
array([[False, False,  True],
       [False,  True, False],
       [ True, False, False]], dtype=bool)
>>> np.all(np.any(np.all(np.around(a - b, 5) == 0, axis=2), axis=1))
True
This doesn't tell you if the arrays are equal by row, just whether all points in b are close to some value in a. The number of rows can be several hundred, and I need to do this quite a bit. Any ideas?
Your last code doesn't do what you think it is doing. What it tells you is whether every row in b is close to a row in a. If you change the axis you use for the outer calls to np.any and np.all, you could check whether every row in a is close to some row in b. If both every row in b is close to a row in a, and every row in a is close to a row in b, then the sets are equal. Probably not very computationally efficient, but probably very fast in numpy for moderately sized arrays:
def same_rows(a, b, tol=5):
    rows_close = np.all(np.round(a - b[:, None], tol) == 0, axis=-1)
    return (np.all(np.any(rows_close, axis=-1), axis=-1) and
            np.all(np.any(rows_close, axis=0), axis=0))
>>> rows, cols = 5, 3
>>> a = np.arange(rows * cols).reshape(rows, cols)
>>> b = np.arange(rows)
>>> np.random.shuffle(b)
>>> b = a[b]
>>> a
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
>>> b
array([[ 9, 10, 11],
       [ 3,  4,  5],
       [ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])
>>> same_rows(a, b)
True
>>> b[0] = b[1]
>>> b
array([[ 3,  4,  5],
       [ 3,  4,  5],
       [ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])
>>> same_rows(a, b) # not all rows in a are close to a row in b
False
And for not-too-big arrays, performance is reasonable, even though it has to build an array of shape (rows, rows, cols):
In [2]: rows, cols = 1000, 10
In [3]: a = np.arange(rows * cols).reshape(rows, cols)
In [4]: b = np.arange(rows)
In [5]: np.random.shuffle(b)
In [6]: b = a[b]
In [7]: %timeit same_rows(a, b)
10 loops, best of 3: 103 ms per loop
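A variant of the same idea using np.isclose with an absolute tolerance instead of rounding, sketched here with the same O(rows**2 * cols) memory footprint as same_rows above:
import numpy as np
def same_rows_isclose(a, b, atol=1e-5):
    # close[i, j] is True when row b[i] matches row a[j] within the tolerance
    close = np.isclose(a[None, :, :], b[:, None, :], atol=atol).all(axis=-1)
    # every row of b matches some row of a, and vice versa
    return bool(close.any(axis=1).all() and close.any(axis=0).all())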

How to delete columns in numpy.array

I would like to delete selected columns in a numpy.array. This is what I do:
In [397]: a = array([[ NaN,  2.,  3., NaN],
   .....:            [  1.,  2.,  3.,  9.]])
In [398]: print a
[[ NaN   2.   3.  NaN]
 [  1.   2.   3.   9.]]
In [399]: z = any(isnan(a), axis=0)
In [400]: print z
[ True False False  True]
In [401]: delete(a, z, axis=1)
Out[401]:
array([[  3.,  NaN],
       [  3.,   9.]])
In this example my goal is to delete all the columns that contain NaNs. I expect the last command to result in:
array([[2., 3.],
       [2., 3.]])
How can I do that?
Given its name, I think the standard way should be delete:
import numpy as np
A = np.delete(A, 1, 0) # delete second row of A
B = np.delete(B, 2, 0) # delete third row of B
C = np.delete(C, 1, 1) # delete second column of C
According to numpy's documentation page, the parameters for numpy.delete are as follows:
numpy.delete(arr, obj, axis=None)
arr refers to the input array,
obj refers to which sub-arrays to remove (e.g. column/row numbers or a slice of the array), and
axis refers to either a column-wise (axis = 1) or row-wise (axis = 0) delete operation.
Example from the numpy documentation:
>>> a = numpy.array([[ 0,  1,  2,  3],
                     [ 4,  5,  6,  7],
                     [ 8,  9, 10, 11],
                     [12, 13, 14, 15]])
>>> numpy.delete(a, numpy.s_[1:3], axis=0) # remove rows 1 and 2
array([[ 0,  1,  2,  3],
       [12, 13, 14, 15]])
>>> numpy.delete(a, numpy.s_[1:3], axis=1) # remove columns 1 and 2
array([[ 0,  3],
       [ 4,  7],
       [ 8, 11],
       [12, 15]])
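Applied to the question's goal of dropping NaN columns: the original delete(a, z, axis=1) went wrong because older NumPy treated the boolean z as the integer indices 0 and 1 (newer versions treat a boolean obj as a mask); converting the mask to integer indices works everywhere. A sketch:
import numpy as np
a = np.array([[np.nan, 2., 3., np.nan],
              [1.,     2., 3., 9.]])
# np.flatnonzero turns the boolean column mask into integer indices
cols_with_nan = np.flatnonzero(np.isnan(a).any(axis=0))
print(np.delete(a, cols_with_nan, axis=1))
# [[2. 3.]
#  [2. 3.]]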
Another way is to use masked arrays:
import numpy as np
a = np.array([[ np.nan, 2., 3., np.nan], [ 1., 2., 3., 9]])
print(a)
# [[ NaN 2. 3. NaN]
# [ 1. 2. 3. 9.]]
The np.ma.masked_invalid method returns a masked array with nans and infs masked out:
print(np.ma.masked_invalid(a))
# [[-- 2.0 3.0 --]
#  [1.0 2.0 3.0 9.0]]
The np.ma.compress_cols method returns a 2-D array with any column containing a masked value suppressed:
a = np.ma.compress_cols(np.ma.masked_invalid(a))
print(a)
# [[ 2.  3.]
#  [ 2.  3.]]
See manipulating-a-maskedarray.
This creates another array without those columns:
b = a.compress(logical_not(z), axis=1)
From the NumPy documentation:
np.delete(arr, obj, axis=None)
Return a new array with sub-arrays along an axis deleted.
>>> arr
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
>>> np.delete(arr, 1, 0)
array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12]])
>>> np.delete(arr, np.s_[::2], 1)
array([[ 2,  4],
       [ 6,  8],
       [10, 12]])
>>> np.delete(arr, [1,3,5], None)
array([ 1,  3,  5,  7,  8,  9, 10, 11, 12])
In your situation, you can extract the desired data with:
a[:, ~z]
"~z" is the logical negation of the boolean array "z" (the "-" operator, which once did the same thing, is no longer supported on boolean arrays). This is the same as:
a[:, logical_not(z)]
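Putting this together for the example in the question, a minimal sketch:
import numpy as np
a = np.array([[np.nan, 2., 3., np.nan],
              [1.,     2., 3., 9.]])
z = np.isnan(a).any(axis=0)  # True for every column that contains a NaN
print(a[:, ~z])
# [[2. 3.]
#  [2. 3.]]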
>>> A = array([[ 1,  2,  3,  4],
               [ 5,  6,  7,  8],
               [ 9, 10, 11, 12]])
>>> A = A.transpose()
>>> A = A[1:].transpose()
Removing matrix columns that contain NaN. This is a lengthy answer, but hopefully easy to follow.
import math
from numpy import column_stack, vstack

def column_to_vector(matrix, i):
    return [row[i] for row in matrix]

def remove_NaN_columns(matrix):
    columns = matrix.shape[1]
    result = []
    for column in range(columns):
        vector = column_to_vector(matrix, column)
        skip_column = False
        for value in vector:
            if math.isnan(value):
                skip_column = True
        if not skip_column:
            result.append(vector)
    return column_stack(result)

### test it
A = vstack(([float('NaN'), 2., 3., float('NaN')], [1., 2., 3., 9.]))
print("A shape", A.shape, "\n", A)
B = remove_NaN_columns(A)
print("B shape", B.shape, "\n", B)
A shape (2, 4)
[[ nan   2.   3.  nan]
 [  1.   2.   3.   9.]]
B shape (2, 2)
[[ 2.  3.]
 [ 2.  3.]]
