Eliminating array rows that do not meet a matching criterion - python

Consider an array, M, made up of pairs of elements. (I've used spaces to emphasize that we will be dealing with element PAIRS.) The actual arrays will have a large number of rows, and 4, 6, 8, or 10 columns.
import numpy as np
M = np.array([[1,3, 2,1, 4,2, 3,3],
              [3,5, 6,9, 5,1, 3,4],
              [1,3, 2,4, 3,4, 7,2],
              [4,5, 1,2, 2,1, 2,3],
              [6,4, 4,1, 6,1, 4,7],
              [6,7, 7,6, 9,7, 6,2],
              [5,3, 1,5, 3,3, 3,3]])
PROBLEM: I want to eliminate rows from M having an element pair that has no common elements with any of the other pairs in that row.
In array M, the 2nd row and the 4th row should be eliminated. Here's why:
2nd row: the pair (6,9) has no common element with (3,5), (5,1), or (3,4)
4th row: the pair (4,5) has no common element with (1,2), (2,1), or (2,3)
I'm sure there's a nice broadcasting solution, but I can't see it.
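For clarity, here's a plain Python sketch of what I mean (the helper name keep_rows_with_shared_pairs is just illustrative); I'm hoping for something vectorized instead:
import numpy as np

def keep_rows_with_shared_pairs(M):
    # keep a row only if every pair shares at least one element
    # with at least one other pair in that row
    keep = []
    for row in M:
        pairs = row.reshape(-1, 2)
        keep.append(all(
            any(set(p) & set(q) for j, q in enumerate(pairs) if j != i)
            for i, p in enumerate(pairs)))
    return M[np.array(keep)]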

Here is a broadcasting solution; hopefully the comments make it self-explanatory:
# group each row into pairs: shape (n_rows, n_pairs, 2)
a = M.reshape(M.shape[0], -1, 2)
# mask out comparisons of a pair with itself
mask = ~np.eye(a.shape[1], dtype=bool)[..., None]
# a pair is valid if it shares an element with some other pair in the row,
# checking both same-position and swapped-position matches
is_valid = (((a[..., None, :] == a[:, None, ...]) & mask).any(axis=(-1, -2))
            | ((a[..., None, :] == a[:, None, :, ::-1]) & mask).any(axis=(-1, -2))
            ).all(-1)
M[is_valid]
Output:
array([[1, 3, 2, 1, 4, 2, 3, 3],
[1, 3, 2, 4, 3, 4, 7, 2],
[6, 4, 4, 1, 6, 1, 4, 7],
[6, 7, 7, 6, 9, 7, 6, 2],
[5, 3, 1, 5, 3, 3, 3, 3]])

Another way of solving this would be the following -
M = np.array([[1,3, 2,1, 4,2, 3,3],
              [3,5, 6,9, 5,1, 3,4],
              [1,3, 2,4, 3,4, 7,2],
              [4,5, 1,2, 2,1, 2,3],
              [6,4, 4,1, 6,1, 4,7],
              [6,7, 7,6, 9,7, 6,2],
              [5,3, 1,5, 3,3, 3,3]])
# group each row into pairs: shape (n_rows, n_pairs, 2)
MM = M.reshape(M.shape[0], -1, 2)
# (n_rows, n_pairs, n_pairs): True where pair i shares any element with pair j
matches_M = np.any(MM[:, :, None, :, None] == MM[:, None, :, None, :], axis=(-1, -2))
# ignore a pair matching itself
mask = ~np.eye(MM.shape[1], dtype=bool)[None, :]
# keep rows where every pair matches at least one other pair
is_valid = np.all(np.any(matches_M & mask, axis=-1), axis=-1)
M[is_valid]
array([[1, 3, 2, 1, 4, 2, 3, 3],
[1, 3, 2, 4, 3, 4, 7, 2],
[6, 4, 4, 1, 6, 1, 4, 7],
[6, 7, 7, 6, 9, 7, 6, 2],
[5, 3, 1, 5, 3, 3, 3, 3]])

Related

Creating shifted Hankel matrix

Say I have some time-series data in the form of a simple array.
X1 = np.array([1, 2, 3, 4])
The Hankel matrix can be obtained by using scipy.linalg.hankel, which would look something like this:
hankel(X1)
array([[1, 2, 3, 4],
[2, 3, 4, 0],
[3, 4, 0, 0],
[4, 0, 0, 0]])
Now assume I had a larger array in the form of
X2 = np.array([1, 2, 3, 4, 5, 6, 7])
What I want to do is fill in the zeros in this matrix with the values that come next in the series (continuing along each row). Taking the same Hankel matrix as before, built from the first four values of the array X2, I'd like to see the following output:
hankel(X2[:4])
array([[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6],
[4, 5, 6, 7]])
How would I do this? I'd ideally like to use this for larger data.
Appreciate any tips or pointers given. Thanks!
If you build a matrix of the appropriate index values, you can use integer array indexing directly into your dataset.
To create the index matrix, you can simply use the upper-left quadrant of a double-sized Hankel array. There are likely simpler ways to create the index matrix, but this does the trick.
>>> X = np.array([9, 8, 7, 6, 5, 4, 3])
>>> N = 4 # the size of the "window"
>>> indices = scipy.linalg.hankel(np.arange(N*2))[:N, :N]
>>> indices
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]])
>>> X[indices]
array([[9, 8, 7, 6],
[8, 7, 6, 5],
[7, 6, 5, 4],
[6, 5, 4, 3]])
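Applied to the X2 from the question (the data needs at least 2*N - 1 elements for these indices to be valid), the same index matrix gives the desired output:
>>> X2 = np.array([1, 2, 3, 4, 5, 6, 7])
>>> X2[indices]
array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6],
       [4, 5, 6, 7]])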

numpy.roll horizontally on a 2D ndarray with different values

Doing np.roll(a, 1, axis = 1) on:
a = np.array([
    [6, 3, 9, 2, 3],
    [1, 7, 8, 1, 2],
    [5, 4, 2, 2, 4],
    [3, 9, 7, 6, 5],
])
results in the correct:
array([
[3, 6, 3, 9, 2],
[2, 1, 7, 8, 1],
[4, 5, 4, 2, 2],
[5, 3, 9, 7, 6]
])
The documentation says:
If a tuple, then axis must be a tuple of the same size, and each of the given axes is shifted by the corresponding number.
Now I'd like to roll the rows of a by different amounts, e.g. [1,2,1,3], meaning the first row is rolled by 1, the second by 2, the third by 1, and the fourth by 3. But np.roll(a, [1,2,1,3], axis=(1,1,1,1)) doesn't seem to do it. What would be the correct interpretation of the sentence in the docs?
By specifying a tuple in np.roll you can roll an array along several axes at once. For example, np.roll(a, (3,2), axis=(0,1)) shifts every element of a by 3 places along axis 0 and by 2 places along axis 1. np.roll has no option to roll each row by a different amount, but you can achieve it, for example, as follows:
import numpy as np
a = np.array([
    [6, 3, 9, 2, 3],
    [1, 7, 8, 1, 2],
    [5, 4, 2, 2, 4],
    [3, 9, 7, 6, 5],
])
shifts = np.c_[[1, 2, 1, 3]]   # per-row shift amounts as a column vector, shape (4, 1)
# for each row, compute the source column of every output position, then gather
a[np.c_[:a.shape[0]], (np.r_[:a.shape[1]] - shifts) % a.shape[1]]
It gives:
array([[3, 6, 3, 9, 2],
[1, 2, 1, 7, 8],
[4, 5, 4, 2, 2],
[7, 6, 5, 3, 9]])
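If you prefer an explicit helper over open-coded fancy indexing, the same gather can also be written with np.take_along_axis (available since NumPy 1.15); this is just another way to spell the identical index computation:
col = (np.r_[:a.shape[1]] - shifts) % a.shape[1]   # per-row source columns, shape (4, 5)
np.take_along_axis(a, col, axis=1)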

swap two elements in 2d array

I have an array of the shape (10296, 6). I want to swap the last two elements of each subarray.
a = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], ...]
So that the 5 and 6 of each subarray are swapped, giving:
a = [[1, 2, 3, 4, 6, 5], [1, 2, 3, 4, 6, 5], ...]
Try advanced (integer array) indexing in NumPy:
import numpy as np
a = np.array([[1, 2, 3, 4, 5, 6],
              [1, 2, 3, 4, 5, 6]])
a[:, [4, 5]] = a[:, [5, 4]]
a
array([[1, 2, 3, 4, 6, 5],
       [1, 2, 3, 4, 6, 5]])
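If the number of columns may vary, the same trick works with negative column indices, so it always targets the last two columns regardless of shape:
a[:, [-2, -1]] = a[:, [-1, -2]]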

Problem involving 'alphabetization' of sets of row elements

Consider a variable setSize (it can take value 2 or 3), and a numpy array v.
The number of columns in v is divisible by setSize. Here's a small sample:
import numpy as np
setSize = 2
# the array spaces are shown to emphasize that the rows
# are made up of sets having, in this case, 2 elements each.
v = np.array([[2,5, 3,5, 1,8],
              [4,6, 2,7, 5,9],
              [1,8, 2,3, 1,4],
              [2,8, 1,4, 3,5],
              [5,7, 2,3, 7,8],
              [1,2, 4,6, 3,5],
              [3,5, 2,8, 1,4]])
PROBLEM: For the rows that have all elements unique, I need to ALPHABETIZE the sets.
For example: set 1,4 would precede set 3,5, which would precede set 5,1.
As a final step, I need to eliminate any duplicated rows that may result.
In the example above, the array rows with indices 1, 3, 5, and 6 have all-unique elements,
so these rows must be alphabetized. The other rows are not changed.
Further, the rows v[3] and v[6], after alphabetization, are now identical. One of them may be dropped.
The final output looks like:
v = [[2,5, 3,5, 1,8],
[2,7, 4,6, 5,9],
[1,8, 2,3, 1,4],
[1,4, 2,8, 3,5],
[5,7, 2,3, 7,8],
[1,2, 3,5, 4,6]]
I can identify the rows having unique elements with code like the one below, but I'm stuck on the alphabetization code.
s = np.sort(v,axis=1)
v[(s[:,:-1] != s[:,1:]).all(1)]
Assuming you have unsuitable rows dropped with:
s = np.sort(v, axis=1)
idx = (s[:,:-1] != s[:,1:]).all(1)
w = v[idx]
Then you can get orders of each row with np.lexsort on a reshaped array:
w = w.reshape(-1,3,2)                     # (rows, 3 sets, 2 elements each)
s = np.lexsort((w[:,:,1], w[:,:,0]))      # per-row order of the sets; w[:,:,0] is the primary key
Then you can apply fancy indexing and reshape it back:
rows, orders = np.repeat(np.arange(len(s)), 3), s.flatten()
v[idx] = w[rows, orders].reshape((-1,6))
If you need to drop duplicated rows, you can do it like so:
u, idx = np.unique(v, return_index=True, axis=0)
output = v[np.sort(idx)]
Sample run:
>>> s
array([[1, 0, 2],
[1, 0, 2],
[0, 2, 1],
[2, 1, 0]], dtype=int64)
>>> rows
array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])
>>> orders
array([1, 0, 2, 1, 0, 2, 0, 2, 1, 2, 1, 0], dtype=int64)
>>> v[idx]
array([[2, 7, 4, 6, 5, 9],
[1, 4, 2, 8, 3, 5],
[1, 2, 3, 5, 4, 6],
[1, 4, 2, 8, 3, 5]])
>>> v
array([[2, 5, 3, 5, 1, 8],
[2, 7, 4, 6, 5, 9],
[1, 8, 2, 3, 1, 4],
[1, 4, 2, 8, 3, 5],
[5, 7, 2, 3, 7, 8],
[1, 2, 3, 5, 4, 6],
[1, 4, 2, 8, 3, 5]])
>>> output
array([[2, 5, 3, 5, 1, 8],
[2, 7, 4, 6, 5, 9],
[1, 8, 2, 3, 1, 4],
[1, 4, 2, 8, 3, 5],
[5, 7, 2, 3, 7, 8],
[1, 2, 3, 5, 4, 6]])
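Putting the steps together, here is a sketch of the whole thing as one function (the name alphabetize_sets and the setSize generalization are illustrative additions, not part of the answer above):
import numpy as np

def alphabetize_sets(v, setSize=2):
    # rows whose elements are all unique (same test as in the question)
    s = np.sort(v, axis=1)
    idx = (s[:, :-1] != s[:, 1:]).all(1)
    w = v[idx].reshape(idx.sum(), -1, setSize)          # (rows, n_sets, setSize)
    # sort the sets within each row; the first element is the primary key
    order = np.lexsort(tuple(w[:, :, k] for k in range(setSize - 1, -1, -1)))
    rows = np.repeat(np.arange(len(w)), w.shape[1])
    out = v.copy()
    out[idx] = w[rows, order.ravel()].reshape(len(w), -1)
    # drop duplicate rows while preserving the original row order
    _, keep = np.unique(out, return_index=True, axis=0)
    return out[np.sort(keep)]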

Get index of largest element for each submatrix in a Numpy 2D array

I have a 2D Numpy ndarray, x, that I need to split in square subregions of size s. For each subregion, I want to get the greatest element (which I do), and its position within that subregion (which I can't figure out).
Here is a minimal example:
>>> x = np.random.randint(0, 10, (6,8))
>>> x
array([[9, 4, 8, 9, 5, 7, 3, 3],
[3, 1, 8, 0, 7, 7, 5, 1],
[7, 7, 3, 6, 0, 2, 1, 0],
[7, 3, 9, 8, 1, 6, 7, 7],
[1, 6, 0, 7, 5, 1, 2, 0],
[8, 7, 9, 5, 8, 3, 6, 0]])
>>> h, w = x.shape
>>> s = 2
>>> f = x.reshape(h//s, s, w//s, s)
>>> mx = np.max(f, axis=(1, 3))
>>> mx
array([[9, 9, 7, 5],
[7, 9, 6, 7],
[8, 9, 8, 6]])
For example, the 8 in the lower left corner of mx is the greatest element from subregion [[1,6], [8, 7]] in the lower left corner of x.
What I want is to get an array similar to mx, that keeps the indices of the largest elements, like this:
[[0, 1, 1, 2],
[0, 2, 3, 2],
[2, 2, 2, 2]]
where, for example, the 2 in the lower left corner is the index of 8 in the linear representation of [[1, 6], [8, 7]].
I could do it by calling np.argmax(f[i, :, j, :]) and iterating over i and j (see the sketch below), but that is far too slow for large amounts of computation. To give you an idea, I'm trying to use (only) Numpy for max pooling. Basically, I'm asking whether there is a faster alternative to what I'm using.
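Spelled out, the loop I have in mind looks something like this (just a sketch; it produces the desired index array, but slowly):
out = np.empty((h//s, w//s), dtype=int)
for i in range(h//s):
    for j in range(w//s):
        out[i, j] = np.argmax(f[i, :, j, :])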
Here's one approach -
# Get shape of output array
m,n = np.array(x.shape)//s
# Reshape and permute axes to bring the block as rows
x1 = x.reshape(h//s, s, w//s, s).swapaxes(1,2).reshape(-1,s**2)
# Use argmax along each row and reshape to output shape
out = x1.argmax(1).reshape(m,n)
Sample input, output -
In [362]: x
Out[362]:
array([[9, 4, 8, 9, 5, 7, 3, 3],
[3, 1, 8, 0, 7, 7, 5, 1],
[7, 7, 3, 6, 0, 2, 1, 0],
[7, 3, 9, 8, 1, 6, 7, 7],
[1, 6, 0, 7, 5, 1, 2, 0],
[8, 7, 9, 5, 8, 3, 6, 0]])
In [363]: out
Out[363]:
array([[0, 1, 1, 2],
[0, 2, 3, 2],
[2, 2, 2, 2]])
Alternatively, to simplify things, we could use scikit-image, which does the heavy work of reshaping and permuting axes for us -
In [372]: from skimage.util import view_as_blocks as viewB
In [373]: viewB(x, (s,s)).reshape(-1,s**2).argmax(1).reshape(m,n)
Out[373]:
array([[0, 1, 1, 2],
[0, 2, 3, 2],
[2, 2, 2, 2]])
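If you also need the (row, column) position of the maximum inside each s x s block, or its location in the original array, you can unravel the flat index; a small sketch building on out from above (the names r, c, R, C are arbitrary):
r, c = np.unravel_index(out, (s, s))   # position of the max inside each block
R = r + s * np.arange(m)[:, None]      # row in the original array x
C = c + s * np.arange(n)[None, :]      # column in the original array x
x[R, C]                                # equals mx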
