Slice multidimensional numpy array from max in a given axis - python

I have a 3-dimensional array a of shape (n, m, l). I extract one column j from its last axis and compute the index of the maximum along the first axis as follows:
sub = a[:, :, j]                    # shape (n, m)
wheremax = np.argmax(sub, axis=0)   # shape (m,)
Now I'd like to slice the original array a to pull out all the information at the index where column j is maximal. I.e. I'd like a numpythonic way, using array broadcasting or numpy functions, to do the following:
new_arr = np.zeros((m, l))
for i, idx in enumerate(wheremax):
    new_arr[i, :] = a[idx, i, :]
a = new_arr
Is there one?

As hpaulj mentioned in the comments, using a[wheremax, np.arange(m)] did the trick.
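A minimal self-contained sketch of that indexing trick (the array sizes here are made up for illustration):
import numpy as np

n, m, l = 4, 3, 5                 # assumed example sizes
a = np.random.rand(n, m, l)
j = 0                             # column of the last axis to maximize over

wheremax = np.argmax(a[:, :, j], axis=0)   # shape (m,)
new_arr = a[wheremax, np.arange(m)]        # shape (m, l)

# same result as the explicit loop from the question
expected = np.array([a[idx, i, :] for i, idx in enumerate(wheremax)])
assert np.allclose(new_arr, expected)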


NumPy Array Conversion

I'm new to Python and I'm trying to convert an (m, n, 1) multidimensional array to (m, n) in a fast way; how can I go about it?
Also, given an (m, n, k) array, how can I split it into k (m, n) arrays (each of the k slices belonging to a different array)?
To reshape array a you can use a.reshape(m,n).
To split array a along the depth dimension, you can use numpy.dsplit(a, a.shape[2]).
https://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.dsplit.html#numpy.dsplit
To reshape a NumPy Array arr with shape (m, n, 1) to the shape (m, n) simply use:
arr = arr.reshape(m, n)
You can get a list of (m, n)-shaped arrays out of an (m, n, k)-shaped array arr_k by:
array_list = [arr_k[:, :, i] for i in range(arr_k.shape[2])]
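A short sketch combining both suggestions (sizes are arbitrary); note that numpy.dsplit keeps the trailing axis, so its pieces have shape (m, n, 1) until you squeeze or index them:
import numpy as np

m, n, k = 2, 3, 4
a = np.random.rand(m, n, 1)
b = np.random.rand(m, n, k)

flat = a.reshape(m, n)                      # (m, n, 1) -> (m, n)
pieces = np.dsplit(b, b.shape[2])           # k arrays of shape (m, n, 1)
pieces_2d = [p[:, :, 0] for p in pieces]    # squeeze the trailing axis
assert pieces_2d[0].shape == (m, n)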

Permute rows in "slices" of 3d array to match each other

I have a series of 2d arrays where the rows are points in some space. Many similar points occur across all arrays but in different row order. I want to sort the rows so they have the most similar order. Also the points are too different for clustering with K-means or DBSCAN. The problem can also be cast like this. If I stack the arrays into a 3d array, how do I permute the rows to minimize the average standard deviation (SD) along the 2nd axis? What's a good sorting algorithm for this problem?
I've tried the following approaches.
1. Create a reference 2d array and sort the rows in each array to minimize the mean Euclidean distance to that reference. I'm afraid this gives biased results.
2. Sort rows in the arrays pairwise, then pairs of pair-medians, then pairs of those, etc. This doesn't really work and I'm not sure why.
3. A third approach could be brute-force optimization, but I try to avoid that since I have multiple sets of arrays to run the procedure on.
This is my code for the 2nd approach (Python):
import numpy as np

def reorder_to(A, B):
    """Reorder rows in A to best match rows in B.

    Input
    -----
    A : N x M numpy.array
    B : N x M numpy.array

    Output
    ------
    perm_order : permutation order
    """
    if A.shape != B.shape:
        print("A and B must have the same shape")
        return None
    N = A.shape[0]
    # Create a matrix of distances between rows in A and rows in B
    distance_matrix = np.ones((N, N)) * np.inf
    for i, a in enumerate(A):
        for ii, b in enumerate(B):
            ba = b - a
            distance_matrix[i, ii] = np.sqrt(np.dot(ba, ba))
    # Choose permutation order by smallest distances first
    perm_order = [[] for _ in range(N)]
    for _ in range(N):
        ind = np.argmin(distance_matrix)
        i, ii = ind // N, ind % N   # row and column of the smallest remaining distance
        perm_order[ii] = i
        distance_matrix[i, :] = np.inf
        distance_matrix[:, ii] = np.inf
    return perm_order
def permute_tensor_rows(A):
    """Permute 1d rows in 3d array along the 0th axis to minimize average SD along 2nd axis.

    Input
    -----
    A : numpy.3darray
        Each "slice" in the 2nd direction is an independent array whose rows can be
        permuted to decrease the average SD in the 2nd direction.

    Output
    ------
    A : numpy.3darray
        A with sorted rows in each "slice".
    """
    step = 2
    while step <= A.shape[2]:
        for k in range(0, A.shape[2], step):
            # If last, reorder to previous
            if k + step > A.shape[2]:
                A_kk = A[:, :, k:(k+step)]
                # A_k still refers to the first half of the previous block
                kk_order = reorder_to(np.median(A_kk, axis=2), np.median(A_k, axis=2))
                A[:, :, k:(k+step)] = A[kk_order, :, k:(k+step)]
                continue
            k_0, k_1 = k, k + step//2
            kk_0, kk_1 = k + step//2, k + step
            A_k = A[:, :, k_0:k_1]
            A_kk = A[:, :, kk_0:kk_1]
            order = reorder_to(np.median(A_k, axis=2), np.median(A_kk, axis=2))
            A[:, :, k_0:k_1] = A[order, :, k_0:k_1]
        print("Step:", step, "\t ... Average SD:", np.mean(np.std(A, axis=2)))
        step *= 2
    return A
Sorry I should have looked at your code sample; that was very informative.
Seems like this here gives an out-of-the-box solution to your problem:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html#scipy.optimize.linear_sum_assignment
Only really feasible for a few hundred points at most though, in my experience.
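For reference, a minimal sketch of how linear_sum_assignment could replace the greedy matching in reorder_to (assuming SciPy is available; the cdist cost matrix and the helper name reorder_to_lsa are just for illustration):
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def reorder_to_lsa(A, B):
    """Return perm_order such that A[perm_order] optimally matches B row-for-row."""
    cost = cdist(A, B)                    # cost[i, ii] = Euclidean distance between A[i] and B[ii]
    row_ind, col_ind = linear_sum_assignment(cost)
    perm_order = np.empty(A.shape[0], dtype=int)
    perm_order[col_ind] = row_ind         # row of A assigned to each row of B
    return perm_order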

removing entries from a numpy array

I have a multidimensional numpy array with the shape (4, 2000). Each column of the array is a 4-element vector whose first two entries represent a 2D position.
Now, I have an image mask with the same shape as the image, which is binary and tells me which pixels are valid or invalid. An entry of 0 in the mask marks an invalid pixel.
What I would like to do is filter my first array based on this mask, i.e. remove the entries whose position elements correspond to invalid pixels in the image. This can be done by looking up the corresponding entries in the mask and marking for deletion those columns that correspond to a 0 entry in the mask.
So, something like:
import numpy as np

# Let mask be a 2D array of 0s and 1s
array = np.random.rand(4, 2000)
for i in range(2000):
    current = array[:, i]
    if mask[current[0], current[1]] <= 0:
        pass  # Somehow remove this entry from my array.
If possible, I would like to do this without looping as I have in my incomplete code.
You could select the x and y coordinates from array like this:
xarr, yarr = array[0, :], array[1, :]
Then form a boolean array of shape (2000,) which is True wherever the mask is 1:
idx = mask[xarr, yarr].astype(bool)
mask[xarr, yarr] is using so-called "integer array indexing".
All it means here is that the ith element of idx equals mask[xarr[i], yarr[i]].
Then select those columns from array:
result = array[:, idx]
import numpy as np

mask = np.random.randint(2, size=(500, 500))
array = np.random.randint(500, size=(4, 2000))

xarr, yarr = array[0, :], array[1, :]
idx = mask[xarr, yarr].astype(bool)
result = array[:, idx]

cols = []
for i in range(2000):
    current = array[:, i]
    if mask[current[0], current[1]] > 0:
        cols.append(i)
expected = array[:, cols]

assert np.allclose(result, expected)
I'm not sure if I'm reading the question right. Let's try again!
You have an array with 2 dimensions and you want to remove all columns that have masked data. Again, apologies if I've read this wrong.
import numpy.ma as ma
a = ma.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], mask=[[0, 0, 0, 1, 0], [0, 0, 1, 0, 0]])
a[:, ~a.mask.any(0)]  # this is where the action happens
The a.mask.any(0) identifies, as a Boolean array, all columns that contain masked data. It's inverted (the '~' sign) because we want the unmasked columns, and that Boolean array is then used to drop all masked columns via indexing.
This gives me an array:
[[1 2 5],[6 7 10]]
In other words, all columns containing masked data anywhere have been removed. Hope I got it right this time.

How to generate multi-dimensional 2D numpy index using a sub-index for one dimension

I want to use numpy.ix_ to generate a multi-dimensional index for a 2D space of values. However, I need to use a subindex to look up the indices for one dimension. For example,
assert subindex.shape == (ny, nx)
data = np.random.random(size=(ny, nx))

# Generator returning the index tuples
def get_idx(ny, nx, subindex):
    for y in range(ny):
        for x in range(nx):
            yi = y                  # This is easy
            xi = subindex[y, x]     # Get the second index value from the subindex
            yield (yi, xi)

# Generator returning the data values
def get_data_vals(ny, nx, data, subindex):
    for y in range(ny):
        for x in range(nx):
            yi = y                  # This is easy
            xi = subindex[y, x]     # Get the second index value from the subindex
            yield data[y, subindex[y, x]]
So instead of the for loops above, I'd like to use a multi-dimensional index to index data. Using numpy.ix_, I guess I would have something like:
idx = numpy.ix_([np.arange(ny), ?])
data[idx]
but I don't know what the second dimension argument should be. I'm guessing it should be something involving numpy.choose?
What you actually seem to want is:
y_idx = np.arange(ny)[:,np.newaxis]
data[y_idx, subindex]
BTW, you could achieve the same thing with y_idx = np.arange(ny).reshape((-1, 1)).
Let's look at a small example:
import numpy as np
ny, nx = 3, 5
data = np.random.rand(ny, nx)
subindex = np.random.randint(nx, size=(ny, nx))
Now
np.arange(ny)
# array([0, 1, 2])
are just the indices for the "y-axis", the first dimension of data. And
y_idx = np.arange(ny)[:,np.newaxis]
# array([[0],
# [1],
# [2]])
adds a new axis to this array (after the existing axis) and effectively transposes it. When you now use this array in an indexing expression together with the subindex array, the former gets broadcasted to the shape of the latter. So y_idx becomes effectively:
# array([[0, 0, 0, 0, 0],
# [1, 1, 1, 1, 1],
# [2, 2, 2, 2, 2]])
And now for each pair of y_idx and subindex you look up an element in the data array.
You can find out more about this in the NumPy documentation on "fancy indexing" (advanced integer indexing).
It sounds like you need to do two things:
Find all indices into the data array and
Translate the column indices according to some other array, subindex.
The code below therefore generates the indices for all array positions (using np.indices) and flattens them into two coordinate arrays, i and j, giving the row and column of every position in the array. For each coordinate pair (i, j), we then translate the column coordinate j using the subindex array provided, and use that translated value as the new column index.
With numpy, it is not necessary to do that in a for-loop--we can simply pass in all the indices at once:
i, j = np.indices(data.shape).reshape(2, -1)
data[i, subindex[i, j]]
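A quick sanity check that both answers pick out the same values (example sizes assumed):
import numpy as np

ny, nx = 3, 5
data = np.random.rand(ny, nx)
subindex = np.random.randint(nx, size=(ny, nx))

i, j = np.indices(data.shape).reshape(2, -1)
flat = data[i, subindex[i, j]]                            # 1-D result, length ny*nx
broadcast = data[np.arange(ny)[:, np.newaxis], subindex]  # shape (ny, nx)
assert np.allclose(flat.reshape(ny, nx), broadcast)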

Multiply a 1d array x 2d array python

I have a 2d array and a 1d array, and I need to multiply each column of the 2d array elementwise by the corresponding element of the 1d array. It's basically a matrix multiplication, but numpy won't allow matrix multiplication here because of the 1d array; matrices are inherently 2d in numpy. How can I get around this problem? This is an example of what I want:
FrMtx = np.zeros(shape=(24, 24))                 # 2d array
elem = np.zeros(24, dtype=float)                 # 1d array
Result = np.zeros(shape=(24, 24), dtype=float)   # 2d array to store results
for i in range(24):
    for j in range(24):
        Result[i][j] = FrMtx[i][j] * elem[j]
Numerous efforts have given me errors such as arrays used as indices must be of integer or boolean type
Due to the NumPy broadcasting rules, a simple
Result = FrMtx * elem
will give the desired result.
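For example, a quick check against the explicit loop from the question (with random values in place of zeros):
import numpy as np

FrMtx = np.random.rand(24, 24)
elem = np.random.rand(24)

Result = FrMtx * elem     # elem is broadcast across the rows of FrMtx
expected = np.array([[FrMtx[i, j] * elem[j] for j in range(24)] for i in range(24)])
assert np.allclose(Result, expected)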
You should be able to just multiply your arrays together, but it's not immediately obvious which 'direction' the arrays will be multiplied in, since the matrix is square. To be more explicit about which axes are being multiplied, I find it helpful to always multiply arrays that have the same number of dimensions.
For example, to multiply the columns:
mtx = np.zeros(shape=(5,7))
col = np.zeros(shape=(5,))
result = mtx * col.reshape((5, 1))
By reshaping col to (5,1), we guarantee that axis 0 of mtx is multiplied against axis 0 of col. To multiply rows:
mtx = np.zeros(shape=(5,7))
row = np.zeros(shape=(7,))
result = mtx * row.reshape((1, 7))
This guarantees that axis 1 in mtx is multiplied by axis 0 in row.
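A small concrete check of the two broadcasts (values chosen arbitrarily):
import numpy as np

mtx = np.arange(35, dtype=float).reshape(5, 7)
col = np.arange(1, 6, dtype=float)      # shape (5,)
row = np.arange(1, 8, dtype=float)      # shape (7,)

by_col = mtx * col.reshape((5, 1))      # each column of mtx is multiplied elementwise by col (mtx[i, j] * col[i])
by_row = mtx * row.reshape((1, 7))      # each row of mtx is multiplied elementwise by row (mtx[i, j] * row[j])

assert np.allclose(by_col[2], mtx[2] * col[2])
assert np.allclose(by_row[:, 3], mtx[:, 3] * row[3])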
