NumPy: how to delete rows common to two matrices

The problem is very simple: I have two 2D np.array objects, and I want a third array containing only the rows of the first that do not also appear in the second.
For example:
X = np.array([[0,1],[1,2],[4,5],[5,6],[8,9],[9,10]])
Y = np.array([[5,6],[9,10]])
Z = function(X,Y)
Z = array([[0, 1],
[1, 2],
[4, 5],
[8, 9]])
I tried np.delete(X,Y,axis=0) but it doesn't work...

# "row not in Y" compares element-wise, so test for a full-row match instead
Z = np.vstack([row for row in X if not (row == Y).all(axis=1).any()])

The numpy_indexed package (disclaimer: I am its author) extends the standard numpy array set operations to multi-dimensional use cases such as these, with good efficiency:
import numpy_indexed as npi
Z = npi.difference(X, Y)

Here's a views-based approach -
# Based on http://stackoverflow.com/a/41417343/3293881 by @Eric
def setdiff2d(a, b):
    # check that casting to void will create equal size elements
    assert a.shape[1:] == b.shape[1:]
    assert a.dtype == b.dtype
    # compute dtypes
    void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[1:])))
    orig_dt = np.dtype((a.dtype, a.shape[1:]))
    # convert to 1d void arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    a_void = a.reshape(a.shape[0], -1).view(void_dt)
    b_void = b.reshape(b.shape[0], -1).view(void_dt)
    # Keep the unique rows of a that are not in b, viewed back as the original dtype
    return np.setdiff1d(a_void, b_void).view(orig_dt)
Sample run -
In [81]: X
Out[81]:
array([[ 0, 1],
[ 1, 2],
[ 4, 5],
[ 5, 6],
[ 8, 9],
[ 9, 10]])
In [82]: Y
Out[82]:
array([[ 5, 6],
[ 9, 10]])
In [83]: setdiff2d(X,Y)
Out[83]:
array([[0, 1],
[1, 2],
[4, 5],
[8, 9]])
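
For small inputs, the same row-wise difference can also be sketched with plain broadcasting, without views or extra packages. The helper name rows_not_in is hypothetical (it does not come from the answers above), and the approach builds a boolean matrix of shape (len(x), len(y)), so memory grows with the product of the row counts. Unlike setdiff2d, it keeps the original row order and does not drop duplicate rows.

```python
import numpy as np

def rows_not_in(x, y):
    """Return the rows of x that do not appear as rows of y (hypothetical helper)."""
    # eq[i, j] is True when row i of x equals row j of y in every column
    eq = (x[:, None, :] == y[None, :, :]).all(axis=2)
    # keep the rows of x that match no row of y
    return x[~eq.any(axis=1)]

X = np.array([[0, 1], [1, 2], [4, 5], [5, 6], [8, 9], [9, 10]])
Y = np.array([[5, 6], [9, 10]])
Z = rows_not_in(X, Y)
print(Z)
```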

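# X + Y broadcasts (or fails) instead of concatenating, so take a set difference of row tuples
Z = np.array(sorted(set(map(tuple, X)) - set(map(tuple, Y))))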

Related

element wise multiplication of a vector and a matrix with numpy

Given python code with numpy:
import numpy as np
a = np.arange(6).reshape(3, 2) # a = [[0, 1], [2, 3], [4, 5]]; a.shape = (3, 2)
b = np.arange(3) + 1 # b = [1, 2, 3] ; b.shape = (3,)
How can I multiply each value in b with each corresponding row ('vector') in a? So here, I want the result as:
result = [[0, 1], [4, 6], [12, 15]] # result.shape = (3, 2)
I can do this with a loop, but I am wondering about a vectorized approach. I found an Octave solution here. Apart from this, I didn't find anything else. Any pointers for this?
Thank you in advance.

Probably the simplest is to do the following.
import numpy as np
a = np.arange(6).reshape(3, 2) # a = [[0, 1], [2, 3], [4, 5]]; a.shape = (3, 2)
b = np.arange(3) + 1
ans = np.diag(b) @ a
Here's a method that exploits numpy multiplication broadcasting:
ans = (b*a.T).T
These two solutions basically take the same approach
ans = np.tile(b,(2,1)).T*a
ans = np.vstack([b for _ in range(a.shape[1])]).T*a

In [123]: a = np.arange(6).reshape(3, 2)  # a = [[0, 1], [2, 3], [4, 5]]; a.shape = (3, 2)
     ...: b = np.arange(3) + 1  # b = [1, 2, 3]; b.shape = (3,)
In [124]: a
Out[124]:
array([[0, 1],
[2, 3],
[4, 5]])
A (3,1) will multiply a (3,2) via broadcasting:
In [125]: a*b[:,None]
Out[125]:
array([[ 0, 1],
[ 4, 6],
[12, 15]])
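
As a quick check, the approaches from the answers above can be compared side by side. This is only a sketch using the question's a and b; the np.einsum line is an extra variant not mentioned in the answers, and the variable names are made up for illustration.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]
b = np.arange(3) + 1             # [1, 2, 3]

expected = np.array([[0, 1], [4, 6], [12, 15]])

via_broadcast = a * b[:, None]             # (3, 2) * (3, 1): b becomes a column
via_transpose = (b * a.T).T                # (3,) * (2, 3), then transpose back
via_diag = np.diag(b) @ a                  # (3, 3) @ (3, 2): builds a dense diagonal matrix
via_einsum = np.einsum('i,ij->ij', b, a)   # scale row i of a by b[i]

for result in (via_broadcast, via_transpose, via_diag, via_einsum):
    print(np.array_equal(result, expected))
```

The broadcasting form avoids the n-by-n temporary that np.diag creates, which tends to matter once b gets long.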

Numpy: Applying a formula with elements of each row of an array

I have a multidimensional numpy array called k. Each row holds the variables x, y and z, and I have the formula 4z / (x - y).
How can I get a numpy array where every row (whatever the number of columns; this is just an example) has been processed by this formula?
My desired output is something like this:
[12 12 4]
or
[[12][12][4]]

You could use apply_along_axis.
import numpy as np
k = [[4, 2, 6], [5, 2, 9], [10, 3, 7]]
k = np.array(k)
def function(m):
    x = m[0]
    y = m[1]
    z = m[2]
    return ((4*z)/(x-y))
result = np.apply_along_axis(function, 1, k)
print(result)

Since these are numpy arrays, you can use array operations to solve all of these together without needing loops:
import numpy as np
k = [[4, 2, 6], [5, 2, 9], [10, 3, 7]]
k = np.array(k)
t = k.transpose()
x, y, z = t
print((4*z)/(x-y))
Output:
[12. 12. 4.]
Putting that in a function:
def function(m):
    x, y, z = m.transpose()
    return (4*z)/(x-y)
And if you want it as 3 single-item arrays in an array, put this reshape as the last part of the function:
a = (4*z)/(x-y)
print(a.reshape(3, 1))
Output:
[[12.]
[12.]
[ 4.]]

you can try this:
import numpy as np
# [(x1, y1, z1), (x2, y2, z2) ...)]
k = [[4, 2, 6], [5, 2, 9], [10, 3, 7]]
k = np.asarray(k)
x = k[:, 0]
y = k[:, 1]
z = k[:, 2]
out = np.divide(4*z, x-y)
# out = (4*z) / (x-y)
print(out)
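
One thing the formula does not cover is a row where x equals y, which would divide by zero. A minimal sketch of one way to guard against that, assuming NaN is an acceptable placeholder for such rows, uses the out and where arguments of np.divide. The third row of k below is made-up data chosen to trigger that case.

```python
import numpy as np

# last row has x == y (hypothetical data, not from the question)
k = np.array([[4, 2, 6], [5, 2, 9], [3, 3, 7]])
x, y, z = k.T

num = 4 * z
den = x - y

# start from NaN and only divide where the denominator is non-zero
out = np.full(len(k), np.nan)
np.divide(num, den, out=out, where=den != 0)
print(out)
```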

How to combine a vector with a matrix in numpy [duplicate]

Suppose that I have defined a 2x2 matrix using numpy:
array([[1, 2],
[2, 3]])
Now the other 2x1 matrix:
array([[3],
[4]])
How can I concatenate these two matrices by column, so that the result becomes:
array([[1, 2, 3],
[2, 3, 4]])
And how can I also delete a specified column, so that it becomes
array([[1],
[2]])

There is a numpy.concatenate method
import numpy as np
np.concatenate( [ np.array( [ [1,2], [2,3] ] ), np.array( [ [3],[4] ] ) ] , axis = 1)
or simply use hstack or vstack
np.hstack( [ np.array( [ [1,2], [2,3] ] ), np.array( [ [3],[4] ] ) ] )
These can also be used to remove a column (by concatenating two subarrays), and the same idea extends to removing several columns.
To remove the i-th column, take the subarray before that column and the subarray after it, and concatenate them. For example, to remove the second column (index 1):
a = np.array( [ [1,2,3], [2,3,4] ] )
a1= a[:,:1]
a2= a[:,2:]
np.hstack([a1,a2])
so in general
def remove_column( a, i ):
    return np.hstack( [a[:,:i], a[:,(i+1):] ] )
and then
>>> remove_column(a, 1)
array([[1, 3],
[2, 4]])
>>> remove_column(a, 0)
array([[2, 3],
[3, 4]])
Actually, as pointed out in the comments, numpy implements its own delete function:
np.delete(a, 1, 1)
deletes the second column,
and deleting multiple columns can be performed using
np.delete(a, [column1, column2, ..., columnK], 1)
The third argument is the axis specifier: 0 means rows, 1 means columns, and None flattens the whole array before deleting.
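
To make the axis argument concrete, here is a small sketch on the 2x3 array from this answer. The variable names are only illustrative. Note that np.delete returns a new array and leaves a unchanged.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

no_second_col = np.delete(a, 1, axis=1)      # drop column index 1
only_middle = np.delete(a, [0, 2], axis=1)   # drop columns 0 and 2 in one call
no_first_row = np.delete(a, 0, axis=0)       # drop row index 0
flat = np.delete(a, 1, axis=None)            # flatten first, then drop element index 1

print(no_second_col)
print(only_middle)
print(no_first_row)
print(flat)
```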

You can use numpy.hstack:
>>> import numpy as np
>>> a = np.array([[1,2], [2,3]])
>>> b = np.array([[3], [4]])
>>> np.hstack((a,b))
array([[1, 2, 3],
[2, 3, 4]])
Removing is even easier, just slice:
>>> c = a[:,:1]
>>> c
array([[1],
[2]])

In [3]: x = np.array([[1, 2], [2, 3]])
In [4]: y = np.array([[3], [4]])
In [9]: z = np.hstack([x, y])
In [10]: z
Out[10]:
array([[1, 2, 3],
[2, 3, 4]])
In [11]: z[:,:1]
Out[11]:
array([[1],
[2]])
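
The answers above assume the vector is already a 2x1 column. If it is a plain 1-D array of shape (2,) instead, np.hstack((a, v)) fails because the inputs have different numbers of dimensions. A minimal sketch of two ways around that (the name v is an assumption, not from the question):

```python
import numpy as np

a = np.array([[1, 2], [2, 3]])
v = np.array([3, 4])                    # 1-D vector, shape (2,)

stacked = np.column_stack((a, v))       # column_stack treats a 1-D input as a column
same = np.hstack((a, v[:, None]))       # or reshape v to (2, 1) explicitly

print(stacked)
print(same)
```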

Cycling Slicing in Python

I came up with this question while trying to apply a Caesar cipher to a matrix with a different shift value for each row, i.e. given a matrix X
array([[1, 0, 8],
[5, 1, 4],
[2, 1, 1]])
with shift values of S = array([0, 1, 1]), the output needs to be
array([[1, 0, 8],
[1, 4, 5],
[1, 1, 2]])
This is easy to implement by the following code:
Y = []
for i in range(X.shape[0]):
    if (S[i] > 0):
        Y.append( X[i,S[i]::].tolist() + X[i,:S[i]:].tolist() )
    else:
        Y.append(X[i,:].tolist())
Y = np.array(Y)
This is a left-cycle-shift. I wonder how to do this in a more efficient way using numpy arrays?
Update: This example applies the shift to the columns of a matrix. Suppose that we have a 3D array
array([[[8, 1, 8],
[8, 6, 2],
[5, 3, 7]],
[[4, 1, 0],
[5, 9, 5],
[5, 1, 7]],
[[9, 8, 6],
[5, 1, 0],
[5, 5, 4]]])
Then, the cyclic right shift of S = array([0, 0, 1]) over the columns leads to
array([[[8, 1, 7],
[8, 6, 8],
[5, 3, 2]],
[[4, 1, 7],
[5, 9, 0],
[5, 1, 5]],
[[9, 8, 4],
[5, 1, 6],
[5, 5, 0]]])

Approach #1 : Use modulus to implement the cyclic pattern and get the new column indices and then simply use advanced-indexing to extract the elements, giving us a vectorized solution, like so -
def cyclic_slice(X, S):
    m,n = X.shape
    idx = np.mod(np.arange(n) + S[:,None],n)
    return X[np.arange(m)[:,None], idx]
Approach #2 : We can also leverage the power of strides for further speedup. The idea would be to concatenate the sliced off portion from the start and append it at the end, then create sliding windows of lengths same as the number of cols and finally index into the appropriate window numbers to get the same rolled over effect. The implementation would be like so -
def cyclic_slice_strided(X, S):
    X2 = np.column_stack((X,X[:,:-1]))
    s0,s1 = X2.strides
    strided = np.lib.stride_tricks.as_strided
    m,n1 = X.shape
    n2 = X2.shape[1]
    X2_3D = strided(X2, shape=(m,n2-n1+1,n1), strides=(s0,s1,s1))
    return X2_3D[np.arange(len(S)),S]
Sample run -
In [34]: X
Out[34]:
array([[1, 0, 8],
[5, 1, 4],
[2, 1, 1]])
In [35]: S
Out[35]: array([0, 1, 1])
In [36]: cyclic_slice(X, S)
Out[36]:
array([[1, 0, 8],
[1, 4, 5],
[1, 1, 2]])
Runtime test -
In [75]: X = np.random.rand(10000,100)
...: S = np.random.randint(0,100,(10000))
# @Moses Koledoye's soln
In [76]: %%timeit
    ...: Y = []
    ...: for i, x in zip(S, X):
    ...:     Y.append(np.roll(x, -i))
10 loops, best of 3: 108 ms per loop
In [77]: %timeit cyclic_slice(X, S)
100 loops, best of 3: 14.1 ms per loop
In [78]: %timeit cyclic_slice_strided(X, S)
100 loops, best of 3: 4.3 ms per loop
Adaptation for 3D case
Adapting approach #1 for the 3D case, we would have -
shift = 'left'
axis = 1 # axis along which S is to be used (axis=1 for rows)
n = X.shape[axis]
if shift == 'left':
    Sa = S
else:
    Sa = -S
# For rows
idx = np.mod(np.arange(n)[:,None] + Sa,n)
out = X[:,idx, np.arange(len(S))]
# For columns
idx = np.mod(Sa[:,None] + np.arange(n),n)
out = X[:,np.arange(len(S))[:,None], idx]
# For axis=0
idx = np.mod(np.arange(n)[:,None] + Sa,n)
out = X[idx, np.arange(len(S))]
There could be a way to have a generic solution for a generic axis, but I will keep it to this point.
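
As a worked instance of the 3D case, the sketch below reproduces the expected output from the question's update (a cyclic right shift with S = [0, 0, 1] applied down each column). It uses np.take_along_axis rather than the hand-written advanced indexing above, which is a swapped-in technique and not part of the original answer.

```python
import numpy as np

X = np.array([[[8, 1, 8], [8, 6, 2], [5, 3, 7]],
              [[4, 1, 0], [5, 9, 5], [5, 1, 7]],
              [[9, 8, 6], [5, 1, 0], [5, 5, 4]]])
S = np.array([0, 0, 1])

n = X.shape[1]
# right shift by S[j] down column j: out[b, i, j] = X[b, (i - S[j]) % n, j]
idx = (np.arange(n)[:, None] - S) % n                  # shape (n, len(S))
# the same index pattern is reused for every 2D slice along axis 0
out = np.take_along_axis(X, np.broadcast_to(idx, X.shape), axis=1)
print(out)
```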

You could shift each row using np.roll and use the new rows to build the output array:
Y = []
for i, x in zip(S, X):
    Y.append(np.roll(x, -i))
print(np.array(Y))
array([[1, 0, 8],
[1, 4, 5],
[1, 1, 2]])
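
To see that the loop and an index-based version agree, here is a small sketch on the question's 2D example. The np.take_along_axis call is one possible way to apply the modulus-based indices; it is an assumption of this sketch, not something the answers use.

```python
import numpy as np

X = np.array([[1, 0, 8], [5, 1, 4], [2, 1, 1]])
S = np.array([0, 1, 1])

# loop version: left-rotate row i by S[i]
looped = np.array([np.roll(x, -s) for s, x in zip(S, X)])

# index version: column j of output row i comes from column (j + S[i]) % n
n = X.shape[1]
idx = (np.arange(n) + S[:, None]) % n
indexed = np.take_along_axis(X, idx, axis=1)

print(np.array_equal(looped, indexed))
```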

numpy: how to get a max from an argmax result

I have a numpy array of arbitrary shape, e.g.:
a = array([[[ 1, 2],
[ 3, 4],
[ 8, 6]],
[[ 7, 8],
[ 9, 8],
[ 3, 12]]])
a.shape = (2, 3, 2)
and a result of argmax over the last axis:
np.argmax(a, axis=-1) = array([[1, 1, 0],
[1, 0, 1]])
I'd like to get max:
np.max(a, axis=-1) = array([[ 2, 4, 8],
[ 8, 9, 12]])
But without recalculating everything. I've tried:
a[np.arange(len(a)), np.argmax(a, axis=-1)]
But got:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,3)
How to do it? Similar question for 2-d: numpy 2d array max/argmax

You can use advanced indexing -
In [17]: a
Out[17]:
array([[[ 1, 2],
[ 3, 4],
[ 8, 6]],
[[ 7, 8],
[ 9, 8],
[ 3, 12]]])
In [18]: idx = a.argmax(axis=-1)
In [19]: m,n = a.shape[:2]
In [20]: a[np.arange(m)[:,None],np.arange(n),idx]
Out[20]:
array([[ 2, 4, 8],
[ 8, 9, 12]])
For a generic ndarray case of any number of dimensions, as stated in the comments by @hpaulj, we could use np.ix_, like so -
shp = np.array(a.shape)
dim_idx = list(np.ix_(*[np.arange(i) for i in shp[:-1]]))
dim_idx.append(idx)
# index with a tuple: a plain list is treated as a single index array by current NumPy
out = a[tuple(dim_idx)]

For ndarray with arbitrary shape, you can flatten the argmax indices, then recover the correct shape, as so:
idx = np.argmax(a, axis=-1)
flat_idx = np.arange(a.size, step=a.shape[-1]) + idx.ravel()
maximum = a.ravel()[flat_idx].reshape(*a.shape[:-1])

For arbitrary-shape arrays, the following should work :)
a = np.arange(5 * 4 * 3).reshape((5,4,3))
# for last axis
argmax = a.argmax(axis=-1)
a[tuple(np.indices(a.shape[:-1])) + (argmax,)]
# for other axis (eg. axis=1)
argmax = a.argmax(axis=1)
idx = list(np.indices(a.shape[:1]+a.shape[2:]))
idx[1:1] = [argmax]
a[tuple(idx)]
or
a = np.arange(5 * 4 * 3).reshape((5,4,3))
argmax = a.argmax(axis=0)
np.choose(argmax, np.moveaxis(a, 0, 0))
argmax = a.argmax(axis=1)
np.choose(argmax, np.moveaxis(a, 1, 0))
argmax = a.argmax(axis=2)
np.choose(argmax, np.moveaxis(a, 2, 0))
argmax = a.argmax(axis=-1)
np.choose(argmax, np.moveaxis(a, -1, 0))
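
In NumPy versions that provide np.take_along_axis, the same lookup can be sketched more directly. The snippet below is a minimal example on the question's array, not a replacement for the answers above; the names picked and maximum are illustrative.

```python
import numpy as np

a = np.array([[[1, 2], [3, 4], [8, 6]],
              [[7, 8], [9, 8], [3, 12]]])

axis = -1
idx = np.argmax(a, axis=axis)

# take_along_axis needs indices with the same number of dimensions as a,
# so put back the reduced axis, gather, then drop it again
picked = np.take_along_axis(a, np.expand_dims(idx, axis=axis), axis=axis)
maximum = np.squeeze(picked, axis=axis)
print(maximum)
```

Changing axis to 0 or 1 should work the same way, since expand_dims and squeeze are applied on the same axis as the argmax.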
