How to numpy-ify a two dimensional conditional lookup? - python

I'm trying to vectorize or otherwise make faster (likely using numpy) a lookup/matching for loop. I've looked into np.vectorize, numpy indexing and np.where, but can't find the right implementation/combination to fit my needs.
Code in Question:
Sx = np.zeros((Np+1, 2*N+1))
rows, cols = prepped_array.shape[0], prepped_array.shape[1]
for ind1 in range(rows):
    for ind2 in range(cols):
        if prepped_array[ind1][ind2][0] != -1:
            Sx[ind1, ind2] = M[prepped_array[ind1][ind2][0], prepped_array[ind1][ind2][1]]
prepped_array is a lookup table (initialized to all [-1, -1]); every entry other than [-1, -1] holds the row and column in M of the value that should be written to the corresponding position in Sx.
M is transformed input that we want to map into the Sx array.
Any ideas/pointers? Thanks!

You can use a boolean mask to index Sx and prepped_array, and then use two index arrays, derived from prepped_array, to index into M. The code may be clearer than that sentence:
mask = prepped_array[:, :, 0] != -1
Sx[mask] = M[tuple(prepped_array[mask].T)]
Let's look at the involved steps:
mask = prepped_array[:, :, 0] != -1 creates a 2D boolean array indicating where the condition is met.
prepped_array[mask] creates a 2D array in which the entries of the former 3rd dimension now lie along the 2nd dimension; the first dimension has one row for each True entry in mask.
tuple(prepped_array[mask].T) creates two 1D arrays which can be used to further index into other arrays: the first array denotes row indices and the second array denotes column indices.
So Sx[mask] = M[tuple(prepped_array[mask].T)] maps the indices contained in prepped_array to the array M using the previous two 1D index arrays.
Sx[mask] finally references those elements in Sx for which the condition on prepped_array[:, :, 0] is met.
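The steps above can be checked against the original loop on a small example. This is only a sketch: the array sizes and the three filled-in lookup entries are made up for illustration and do not come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: M is the source array, prepped_array the lookup table.
M = rng.random((6, 7))
prepped_array = np.full((4, 5, 2), -1)
prepped_array[0, 1] = [2, 3]
prepped_array[2, 2] = [0, 6]
prepped_array[3, 4] = [5, 0]

# Loop version, as in the question.
Sx_loop = np.zeros(prepped_array.shape[:2])
for i in range(prepped_array.shape[0]):
    for j in range(prepped_array.shape[1]):
        if prepped_array[i, j, 0] != -1:
            Sx_loop[i, j] = M[prepped_array[i, j, 0], prepped_array[i, j, 1]]

# Vectorized version, as in the answer.
mask = prepped_array[:, :, 0] != -1   # 2D boolean array
pairs = prepped_array[mask]           # one (row, col) pair per True entry
Sx_vec = np.zeros(prepped_array.shape[:2])
Sx_vec[mask] = M[tuple(pairs.T)]      # two 1D index arrays into M
```

Both versions should fill the same three positions of Sx and leave the rest at zero.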

Related

Selecting from 5D numpy array with a corresponding 3D array containing indices of the 4th dimension

I have a 5D numpy array containing values, and would like to obtain a subarray with one less dimension, where the values have been selected based on a 3D array that contains indices of the fourth dimension of the first array. E.g., I have the following arrays:
values = np.random.randn(3,4,5,10,2)
indices = np.random.randint(0,values.shape[3],size=values.shape[:3])
I found one solution, but find it rather complicated, and would prefer a one-liner:
x = np.arange(values.shape[0])
y = np.arange(values.shape[1])
z = np.arange(values.shape[2])
result = values[x[:,None,None],y[None,:,None],z[None, None,:],indices,:]
Is there any better solution to get this array?
You can try the following:
indices = indices[..., None, None]
result = np.take_along_axis(values, indices, axis=3).squeeze(axis=3)
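To see that the two approaches agree, the following sketch builds both results from the same inputs and compares them. It uses a seeded generator instead of np.random.randn only to make the run reproducible; the names explicit and taken are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal((3, 4, 5, 10, 2))
indices = rng.integers(0, values.shape[3], size=values.shape[:3])

# Explicit index arrays, as in the question.
x = np.arange(values.shape[0])
y = np.arange(values.shape[1])
z = np.arange(values.shape[2])
explicit = values[x[:, None, None], y[None, :, None], z[None, None, :], indices, :]

# take_along_axis, as in the answer; indices needs the same number of
# dimensions as values, hence the two trailing length-1 axes.
taken = np.take_along_axis(values, indices[..., None, None], axis=3).squeeze(axis=3)
```

Both results should have shape (3, 4, 5, 2).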

add column Numpy array python

I am very new to python and am very familiar with R, but my question is very simple using Numpy Arrays:
Observe:
I have one array X of dimension (100,2) of floating point type and I want to add a 3rd column, preferably into a new Numpy array of dimension (100,3) such that the 3rd column = col(1)^2 for every row in array of X.
My understanding is Numpy arrays are generally of fixed dimension so I'm OK with creating a new array of dim 100x3, I just don't know how to do so using Numpy arrays.
Thanks!
One way to do this is by creating a new array and then concatenating it. For instance, say that M is currently your array.
You can compute col(1)^2 as C = M[:,0] ** 2 (which I'm interpreting as column 1 squared, not column 1 to the power of the values in column two). C will now be an array with shape (100, ), so we can reshape it using C = np.expand_dims(C, 1), which creates a new axis of length 1, so our new column now has shape (100, 1). This is important because both of our arrays must have the same number of dimensions when we concatenate them.
The last step here is to concatenate them using np.concatenate. In total, our result looks like this
C = M[:, 0] ** 2
C = np.expand_dims(C, 1)
M = np.concatenate([M, C], axis=1) # third column will now be col(1) ^ 2
If you're the kind of person who likes to do things in one line, you have:
M = np.concatenate([M, np.expand_dims(M[:, 0] ** 2, 1)], axis=1)
That being said, I would recommend looking at Pandas, it supports these actions more naturally, in my opinion. In Pandas, it would be
M["your_col_3_name"] = M["your_col_1_name"] ** 2
where M is a pandas dataframe.
Append with axis=1 should work.
a = np.zeros((5,2))
b = np.ones((5,1))
print(np.append(a,b,axis=1))
This should return:
[[0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]]
# generate an array with shape (100,2), fill with 2.
a = np.full((100,2),2)
# calculate the square of the first column; this will be a 1-d array.
squared=a[:,0]**2
# concatenate the 1-d array to a;
# first convert it to a 2-d array with shape (100,1) using reshape(-1,1)
c = np.concatenate((a,squared.reshape(-1,1)),axis=1)
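As a further option not mentioned in the answers above, np.column_stack treats 1-D inputs as columns, so the explicit reshape can be skipped. A minimal sketch, assuming X is a (100, 2) float array like the one in the question (the random data is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 2))  # stand-in for the (100, 2) array from the question

# The 1-D array X[:, 0] ** 2 is appended as a third column.
X3 = np.column_stack((X, X[:, 0] ** 2))
```

X is left unchanged; X3 is a new (100, 3) array.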

Randomly keeping a single element different from zero along one axis of a numpy array

I need an efficient way to create a numpy array of shape (x,y,3) where only one random element out of the 3 for each tuple (x,y) has a value randomly selected from [-1,0,1]
np.random.randint(-1, 2, (x,y,3))
does the work only for the second half of my requirements.
I could use a nested loop to iterate over each (x, y) and multiply its value by a random mask, but it would not be efficient at all.
Here is the loop implementation:
a=np.random.randint(-1, 2, (x,y,3))
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        mask = np.array(np.random.permutation([0,1,0]))
        a[i][j] = a[i][j] * mask
Rather than generating a whole bunch of extra numbers and turning most of them off, I'd approach this from the point of view of only generating the numbers you need. You want to assign to a random index between 0 and 2 for each x-y pair. So generate a random index, and the random values, and assign:
indices = np.random.randint(3, size=(x, y))
values = np.random.randint(-1, 2, size=(x, y))
result = np.zeros((x, y, 3), dtype=int)
result[(*np.ogrid[:x, :y], indices)] = values
The indexing expression is an advanced index because indices is an array of integers. Using ... or :, : for the first two indices won't do what you want in that case. Instead, np.ogrid generates ranges of the correct shape to force the elements of indices to correspond to the correct x-y coordinates.
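A small self-contained run can confirm the two properties the question asks for. This is a sketch with illustrative sizes (x, y = 4, 5) and a seeded generator; picked and nonzero_per_pair are names introduced here for the checks.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = 4, 5  # illustrative sizes, not from the question

indices = rng.integers(3, size=(x, y))     # which of the 3 slots to fill
values = rng.integers(-1, 2, size=(x, y))  # value from {-1, 0, 1} for that slot

result = np.zeros((x, y, 3), dtype=int)
result[(*np.ogrid[:x, :y], indices)] = values

# Read back the chosen slot of every (x, y) pair.
picked = np.take_along_axis(result, indices[..., None], axis=2)[..., 0]
# Count non-zero entries along the last axis.
nonzero_per_pair = np.count_nonzero(result, axis=2)
```

Because the drawn value may itself be 0, each (x, y) pair ends up with at most one non-zero entry rather than exactly one.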

2d array compare to 1d array returns 2d array

I am trying to compare a 1D array element-wise to a 2D array and return, in 2D form, the elements of the 2D array that fulfil the condition, without using a for loop. Preferably using numpy or a quicker method.
a = range(1,10)
Tna = np.random.choice(a, size=[250,10,1000], replace=True)
sum_Ta = np.sum(Tna, axis = 1)
percent = np.percentile(sum_Ta, 5, axis =0)
Now I would like to get a 2D array that contains the elements of sum_Ta that are smaller than percent, so that the 250 elements in each column of sum_Ta are compared with the one matching element of percent, for each of the 1000 columns. Originally I tried ES = sum_Ta[sum_Ta < percent[None, :]], but it only gives me a 1D array, not a 2D array.
I assume you mean that an element should be included if it is less than the percentile associated with its column.
Try the following:
mask = sum_Ta < (percent * np.ones((250,1)))
ES = np.zeros((250, 1000))
ES[mask] = sum_Ta[mask]
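Since percent has shape (1000,) and sum_Ta has shape (250, 1000), NumPy broadcasting already compares each column with its own percentile, so the multiplication by np.ones is optional. The sketch below shows that shortcut together with np.where, which is a swapped-in alternative to the mask assignment above; it uses a seeded generator for reproducibility, and ES_ref is a name introduced here for the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
Tna = rng.choice(np.arange(1, 10), size=(250, 10, 1000), replace=True)
sum_Ta = Tna.sum(axis=1)                    # shape (250, 1000)
percent = np.percentile(sum_Ta, 5, axis=0)  # shape (1000,)

# Broadcasting lines percent up with the columns of sum_Ta.
mask = sum_Ta < percent
# Keep the 2D shape: elements failing the condition become 0.
ES = np.where(mask, sum_Ta, 0)

# Reference: the mask-assignment version from the answer.
mask_ref = sum_Ta < (percent * np.ones((250, 1)))
ES_ref = np.zeros((250, 1000))
ES_ref[mask_ref] = sum_Ta[mask_ref]
```

Both variants keep the (250, 1000) shape and place 0 wherever the condition fails.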

removing entries from a numpy array

I have a multidimensional numpy array with the shape (4, 2000). Each column in the array is a 4D element where the first two elements represent 2D positions.
Now, I have an image mask with the same shape as an image which is binary and tells me which pixels are valid or invalid. An entry of 0 in the mask highlights pixels that are invalid.
Now, what I would like to do is filter my first array based on this mask, i.e. remove entries where the position elements in my first array correspond to invalid pixels in the image. This can be done by looking up the corresponding entries in the mask and marking for deletion those columns that correspond to a 0 entry in the mask.
So, something like:
import numpy as np
# Let mask be a 2D array of 0 and 1s
array = np.random.rand(4, 2000)
for i in range(2000):
    current = array[:, i]
    if mask[current[0], current[1]] <= 0:
        # Somehow remove this entry from my array.
If possible, I would like to do this without looping as I have in my incomplete code.
You could select the x and y coordinates from array like this:
xarr, yarr = array[0, :], array[1, :]
Then form a boolean array of shape (2000,) which is True wherever the mask is 1:
idx = mask[xarr, yarr].astype(bool)
mask[xarr, yarr] is using so-called "integer array indexing".
All it means here is that the ith element of idx equals mask[xarr[i], yarr[i]].
Then select those columns from array:
result = array[:, idx]
import numpy as np
mask = np.random.randint(2, size=(500,500))
array = np.random.randint(500, size=(4, 2000))
xarr, yarr = array[0, :], array[1, :]
idx = mask[xarr, yarr].astype(bool)
result = array[:, idx]
cols = []
for i in range(2000):
    current = array[:, i]
    if mask[current[0], current[1]] > 0:
        cols.append(i)
expected = array[:, cols]
assert np.allclose(result, expected)
I'm not sure if I'm reading the question right. Let's try again!
You have an array with 2 dimensions and you want to remove all columns that have masked data. Again, apologies if I've read this wrong.
import numpy.ma as ma
a = ma.array([[1,2,3,4,5],[6,7,8,9,10]], mask=[[0,0,0,1,0],[0,0,1,0,0]])
a[:, ~a.mask.any(0)] # this is where the action happens
a.mask.any(0) produces a Boolean array that flags every column containing a masked value. It's negated (the '~' operator; current NumPy versions reject '-' on Boolean arrays) because we want the inverse, and that array is then used to index away the masked columns.
This gives me an array:
[[1 2 5],[6 7 10]]
In other words, every column with masked data anywhere has been removed from the array. Hope I got it right this time.
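For reference, here is a self-contained version of the masked-array approach. It is a sketch that inverts the Boolean column flags with ~, which is how Boolean arrays are negated in current NumPy; the names masked_cols and kept are introduced here for illustration.

```python
import numpy.ma as ma

a = ma.array(
    [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
    mask=[[0, 0, 0, 1, 0], [0, 0, 1, 0, 0]],
)

# True for every column that contains at least one masked entry.
masked_cols = a.mask.any(axis=0)
# Keep only the columns without any masked entry.
kept = a[:, ~masked_cols]
```

Here the third and fourth columns each contain a masked entry, so only the first, second, and fifth columns remain.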
