Python generating repmat using each column individually - python

I have an array of shape 3x3 which looks something like:
import numpy as np
A = np.array(([1,2,3],[11,12,5],[4,9,1]))
>>> A
array([[ 1,  2,  3],
       [11, 12,  5],
       [ 4,  9,  1]])
I want to repmat each column 3 times so that I can achieve the following:
B
array([[ 1,  1,  1,  2,  2,  2,  3,  3,  3],
       [11, 11, 11, 12, 12, 12,  5,  5,  5],
       [ 4,  4,  4,  9,  9,  9,  1,  1,  1]])
I can loop over each column and repmat it, but I am looking for a smarter way to do it, as my real-life array has size 5000x300.

This is the job of numpy.repeat. Quoting an example from the docs:
>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, 3, axis=1)
array([[1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4]])
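Applied to the array A from the question, the same call produces B directly (a minimal sketch):
import numpy as np

A = np.array([[1, 2, 3],
              [11, 12, 5],
              [4, 9, 1]])

# repeat every column 3 times along axis 1
B = np.repeat(A, 3, axis=1)
# B is the 3x9 array shown above; the call is identical for a 5000x300 array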

Related

Problem involving 'alphabetization' of sets of row elements

Consider a variable setSize (it can take value 2 or 3), and a numpy array v.
The number of columns in v is divisible by setSize. Here's a small sample:
import numpy as np
setSize = 2
# the array spaces are shown to emphasize that the rows
# are made up of sets having, in this case, 2 elements each.
v = np.array([[2,5, 3,5, 1,8],
              [4,6, 2,7, 5,9],
              [1,8, 2,3, 1,4],
              [2,8, 1,4, 3,5],
              [5,7, 2,3, 7,8],
              [1,2, 4,6, 3,5],
              [3,5, 2,8, 1,4]])
PROBLEM: For the rows that have all elements unique, I need to ALPHABETIZE the sets.
For example: set 1,4 would precede set 3,5, which would precede set 5,1.
As a final step, I need to eliminate any duplicated rows that may result.
In the example above, the array rows having indices 1, 3, 5, and 6 have unique elements,
so these rows must be alphabetized. The other rows are not changed.
Further, the rows v[3] and v[6], after alphabetization, are identical, so one of them may be dropped.
The final output looks like:
v = [[2,5, 3,5, 1,8],
     [2,7, 4,6, 5,9],
     [1,8, 2,3, 1,4],
     [1,4, 2,8, 3,5],
     [5,7, 2,3, 7,8],
     [1,2, 3,5, 4,6]]
I can identify the rows having unique elements with the code below, but I am stuck on the alphabetization step.
s = np.sort(v,axis=1)
v[(s[:,:-1] != s[:,1:]).all(1)]
Assuming you have the unsuitable rows filtered out with:
s = np.sort(v, axis=1)
idx = (s[:,:-1] != s[:,1:]).all(1)
w = v[idx]
Then you can get the sort order of the sets in each row with np.lexsort on a reshaped array:
w = w.reshape(-1,3,2)
s = np.lexsort((w[:,:,1], w[:,:,0]))
Then you can apply fancy indexing and reshape it back:
rows, orders = np.repeat(np.arange(len(s)), 3), s.flatten()
v[idx] = w[rows, orders].reshape((-1,6))
If you need to drop duplicated rows, you can do it like so:
u, idx = np.unique(v, return_index=True, axis=0)
output = v[np.sort(idx)]
Sample run:
>>> s
array([[1, 0, 2],
       [1, 0, 2],
       [0, 2, 1],
       [2, 1, 0]], dtype=int64)
>>> rows
array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])
>>> orders
array([1, 0, 2, 1, 0, 2, 0, 2, 1, 2, 1, 0], dtype=int64)
>>> v[idx]
array([[2, 7, 4, 6, 5, 9],
       [1, 4, 2, 8, 3, 5],
       [1, 2, 3, 5, 4, 6],
       [1, 4, 2, 8, 3, 5]])
>>> v
array([[2, 5, 3, 5, 1, 8],
       [2, 7, 4, 6, 5, 9],
       [1, 8, 2, 3, 1, 4],
       [1, 4, 2, 8, 3, 5],
       [5, 7, 2, 3, 7, 8],
       [1, 2, 3, 5, 4, 6],
       [1, 4, 2, 8, 3, 5]])
>>> output
array([[2, 5, 3, 5, 1, 8],
       [2, 7, 4, 6, 5, 9],
       [1, 8, 2, 3, 1, 4],
       [1, 4, 2, 8, 3, 5],
       [5, 7, 2, 3, 7, 8],
       [1, 2, 3, 5, 4, 6]])
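Putting the pieces together, here is a self-contained sketch of the whole procedure on the sample data, assuming setSize is 2 as in the question (names like nSets are just for illustration):
import numpy as np

setSize = 2
v = np.array([[2,5, 3,5, 1,8],
              [4,6, 2,7, 5,9],
              [1,8, 2,3, 1,4],
              [2,8, 1,4, 3,5],
              [5,7, 2,3, 7,8],
              [1,2, 4,6, 3,5],
              [3,5, 2,8, 1,4]])
nSets = v.shape[1] // setSize

# 1. rows whose elements are all unique
srt = np.sort(v, axis=1)
idx = (srt[:, :-1] != srt[:, 1:]).all(1)

# 2. sort the sets of those rows lexicographically (first element, then second)
w = v[idx].reshape(-1, nSets, setSize)
order = np.lexsort((w[:, :, 1], w[:, :, 0]))
rows = np.repeat(np.arange(len(order)), nSets)
v[idx] = w[rows, order.ravel()].reshape(-1, nSets * setSize)

# 3. drop duplicate rows while preserving the original row order
_, first = np.unique(v, return_index=True, axis=0)
output = v[np.sort(first)]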

How to get the index of a particular row using column values in numpy?

So if I have the following array arr:
>>> arr
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
Now if I want to acquire the first row, I would do something like this:
>>> arr[0]
array([0, 1, 2, 3, 4])
However, when I use np.where to locate a particular row, e.g.:
>>> np.where(arr == [0,1,2,3,4])
I get this output!
(array([0, 0, 0, 0, 0, 3, 3, 3, 3, 3], dtype=int64),
 array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4], dtype=int64))
However, this is not what I am after. I would like to get the row indices instead, e.g.:
(array([0, 3], dtype=int64),)
Is there a way to achieve that? Any advice is very much appreciated!
I think you want to check whether entire rows are equal to a given array, in which case you need all:
np.where((arr == [0,1,2,3,4]).all(1))
# (array([0, 3]),)
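For reference, a minimal end-to-end sketch using the arr from the question:
import numpy as np

arr = np.array([[ 0,  1,  2,  3,  4],
                [ 5,  6,  7,  8,  9],
                [10, 11, 12, 13, 14],
                [ 0,  1,  2,  3,  4],
                [ 5,  6,  7,  8,  9],
                [10, 11, 12, 13, 14]])

# compare element-wise, then require every column in a row to match
matches = (arr == [0, 1, 2, 3, 4]).all(axis=1)
row_indices = np.where(matches)[0]   # array([0, 3])
# np.flatnonzero(matches) is an equivalent one-liner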

Fastest numpy way to remove a list of cells from a 2d array

I have a very large 2D numpy array of m x n elements. For each row, I need to remove exactly one element. So, for example, from a 4x6 matrix I might need to delete [0, 1], [1, 4], [2, 3], and [3, 3] - I have this set of coordinates stored in a list. In the end, the matrix will shrink in width by 1.
Is there a standard way to do this using a mask? Ideally, I need this to be as performant as possible.
Here is a method that uses ravel_multi_index() to compute one-dimensional (flat) indices, delete() to remove those elements, and a reshape back to a two-dimensional array:
import numpy as np
n = 12
a = np.repeat(np.arange(10)[None, :], n, axis=0)
index = np.random.randint(0, 10, n)
ravel_index = np.ravel_multi_index((np.arange(n), index), a.shape)
np.delete(a, ravel_index).reshape(n, -1)
the index:
array([4, 6, 9, 0, 3, 5, 3, 8, 9, 8, 4, 4])
the result:
array([[0, 1, 2, 3, 4, 5, 6, 7, 9],
       [1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 9],
       [1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 9],
       [0, 1, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8],
       [0, 1, 2, 4, 5, 6, 7, 8, 9]])
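Since the question mentions a mask: here is a sketch of an equivalent boolean-mask approach, reusing a, n, and index from the snippet above; whether it outperforms np.delete is worth benchmarking on your actual data:
# build a boolean mask that is False exactly at the cells to drop
mask = np.ones(a.shape, dtype=bool)
mask[np.arange(n), index] = False
result = a[mask].reshape(n, -1)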

Get index of largest element for each submatrix in a Numpy 2D array

I have a 2D Numpy ndarray, x, that I need to split in square subregions of size s. For each subregion, I want to get the greatest element (which I do), and its position within that subregion (which I can't figure out).
Here is a minimal example:
>>> x = np.random.randint(0, 10, (6,8))
>>> x
array([[9, 4, 8, 9, 5, 7, 3, 3],
       [3, 1, 8, 0, 7, 7, 5, 1],
       [7, 7, 3, 6, 0, 2, 1, 0],
       [7, 3, 9, 8, 1, 6, 7, 7],
       [1, 6, 0, 7, 5, 1, 2, 0],
       [8, 7, 9, 5, 8, 3, 6, 0]])
>>> h, w = x.shape
>>> s = 2
>>> f = x.reshape(h//s, s, w//s, s)
>>> mx = np.max(f, axis=(1, 3))
>>> mx
array([[9, 9, 7, 5],
       [7, 9, 6, 7],
       [8, 9, 8, 6]])
For example, the 8 in the lower left corner of mx is the greatest element from subregion [[1,6], [8, 7]] in the lower left corner of x.
What I want is to get an array similar to mx, that keeps the indices of the largest elements, like this:
[[0, 1, 1, 2],
 [0, 2, 3, 2],
 [2, 2, 2, 2]]
where, for example, the 2 in the lower left corner is the index of 8 in the linear representation of [[1, 6], [8, 7]].
I could do it like this: np.argmax(f[i, :, j, :]), iterating over i and j, but the speed difference versus a vectorized approach is enormous for large amounts of computation. To give you an idea, I'm trying to use (only) Numpy for max pooling. Basically, I'm asking whether there is a faster alternative to what I'm using.
Here's one approach -
# Get shape of output array
m,n = np.array(x.shape)//s
# Reshape and permute axes to bring the block as rows
x1 = x.reshape(h//s, s, w//s, s).swapaxes(1,2).reshape(-1,s**2)
# Use argmax along each row and reshape to output shape
out = x1.argmax(1).reshape(m,n)
Sample input, output -
In [362]: x
Out[362]:
array([[9, 4, 8, 9, 5, 7, 3, 3],
       [3, 1, 8, 0, 7, 7, 5, 1],
       [7, 7, 3, 6, 0, 2, 1, 0],
       [7, 3, 9, 8, 1, 6, 7, 7],
       [1, 6, 0, 7, 5, 1, 2, 0],
       [8, 7, 9, 5, 8, 3, 6, 0]])
In [363]: out
Out[363]:
array([[0, 1, 1, 2],
       [0, 2, 3, 2],
       [2, 2, 2, 2]])
Alternatively, to simplify things, we could use scikit-image, which does the heavy work of reshaping and permuting axes for us -
In [372]: from skimage.util import view_as_blocks as viewB
In [373]: viewB(x, (s,s)).reshape(-1,s**2).argmax(1).reshape(m,n)
Out[373]:
array([[0, 1, 1, 2],
       [0, 2, 3, 2],
       [2, 2, 2, 2]])
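If (row, col) positions inside each block are preferred over the flat index, np.unravel_index can convert the result; a small sketch reusing out and s from above:
# convert flat within-block indices to (row, col) pairs, each of shape (m, n)
r, c = np.unravel_index(out, (s, s))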

Rescale a numpy array

I have a 2D numpy array that represents a monochrome image from a CCD that has been binned 3x3 (that is, each value in the array represents 9 pixels (3x3) on the physical CCD).
I want to rescale it to match the original CCD layout (so I can easily overlay it with a non-binned image from the same CCD).
I saw Resampling a numpy array representing an image, but that doesn't seem to do what I want.
Suppose I have an array g:
import numpy as np
import scipy.ndimage
g = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])
When I try to scale it by a factor of 2:
o = scipy.ndimage.zoom(g, 2, order=0)
I get exactly what I expect - each value now becomes a 2x2 block of identical values:
array([[0, 0, 1, 1, 2, 2],
       [0, 0, 1, 1, 2, 2],
       [3, 3, 4, 4, 5, 5],
       [3, 3, 4, 4, 5, 5],
       [6, 6, 7, 7, 8, 8],
       [6, 6, 7, 7, 8, 8]])
But when I try to scale by a factor of 3:
o = scipy.ndimage.zoom(g, 3, order=0)
I get this:
array([[0, 0, 1, 1, 1, 1, 2, 2, 2],
       [0, 0, 1, 1, 1, 1, 2, 2, 2],
       [3, 3, 4, 4, 4, 4, 5, 5, 5],
       [3, 3, 4, 4, 4, 4, 5, 5, 5],
       [3, 3, 4, 4, 4, 4, 5, 5, 5],
       [3, 3, 4, 4, 4, 4, 5, 5, 5],
       [6, 6, 7, 7, 7, 7, 8, 8, 8],
       [6, 6, 7, 7, 7, 7, 8, 8, 8],
       [6, 6, 7, 7, 7, 7, 8, 8, 8]])
I wanted each value in the original array to become a set of 3x3 values...that's not what I get.
How can I do it? (And why do I get this unintuitive result?)
You can use np.kron:
In [16]: g
Out[16]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [17]: np.kron(g, np.ones((3,3), dtype=int))
Out[17]:
array([[0, 0, 0, 1, 1, 1, 2, 2, 2],
       [0, 0, 0, 1, 1, 1, 2, 2, 2],
       [0, 0, 0, 1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4, 5, 5, 5],
       [3, 3, 3, 4, 4, 4, 5, 5, 5],
       [3, 3, 3, 4, 4, 4, 5, 5, 5],
       [6, 6, 6, 7, 7, 7, 8, 8, 8],
       [6, 6, 6, 7, 7, 7, 8, 8, 8],
       [6, 6, 6, 7, 7, 7, 8, 8, 8]])
The output of zoom(g, 3, order=0) is a bit surprising. Consider the first row: [0, 0, 1, 1, 1, 1, 2, 2, 2]. Why are there four 1s?
With order=0, zoom (in effect) computes the sample coordinates np.linspace(0, 2, 9), which look like
In [80]: np.linspace(0, 2, 9)
Out[80]: array([ 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])
and then rounds the values. If you use np.round(), you get:
In [71]: np.round(np.linspace(0, 2, 9)).astype(int)
Out[71]: array([0, 0, 0, 1, 1, 1, 2, 2, 2])
Note that np.round(0.5) gives 0, but np.round(1.5) gives 2: np.round() uses the "round half to even" tie-breaking rule, which would give three of each value here. Apparently the rounding done in the zoom code uses the "round half up" rule instead: it rounds 0.5 to 1 and 1.5 to 2, as Python 2's built-in round (which rounds halves away from zero) does:
In [81]: [int(round(x)) for x in np.linspace(0, 2, 9)]  # Python 2 rounding
Out[81]: [0, 0, 1, 1, 1, 1, 2, 2, 2]
and that's why there are four 1s in there.
And why do I get this unintuitive result?
Because zoom is a spline interpolation function. By default it fits a cubic spline through the sample points, and the values in between are read off the spline at the appropriate locations.
If you want nearest, linear or quadratic interpolation instead of cubic, you can use the order=0 or order=1 or order=2 argument. But if you don't want interpolation at all—which you don't—don't use an interpolation function. This is like asking why using [int(i*2.3) for i in range(10)] to get even numbers from 0 to 20 gives you some odd numbers. It's not a function to get even numbers from 0 to 20, so it doesn't do that, but it does exactly what you asked it to.
How can I do it?
Again, if you want non-interpolated scaling, don't use an interpolation function. The simplest way is probably to use np.kron to Kronecker-multiply your array with np.ones((scale, scale)).
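As a small sketch of that last point (scale is just an illustrative name here), with an equivalent np.repeat construction shown for comparison:
import numpy as np

g = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])
scale = 3

# Kronecker product with a block of ones: each value becomes a scale x scale block
up1 = np.kron(g, np.ones((scale, scale), dtype=g.dtype))

# equivalent result using np.repeat along both axes
up2 = np.repeat(np.repeat(g, scale, axis=0), scale, axis=1)

assert (up1 == up2).all()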
