2D array, inequalities and where function - python

If given an array like
a = array([[2,4,9,8,473],[54,7,24,19,20]])
then how can I get the indices of the elements whose values lie between x and y?
Currently I've got:
where(5 > a > 10)
It will, however, give an output if I write, for example:
where(a > 5)
But the where function doesn't accept the chained comparison. Once it works, it should output two one-dimensional arrays; is there a way to easily stack them?

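You can use the element-wise logical operators & (and) and | (or) to chain conditions together. Wrap each comparison in parentheses, because & and | bind more tightly than the comparison operators. For your case, you can do: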
np.where((a > 5) & (a < 10))
# (array([0, 0, 1]), array([2, 3, 1]))
# here np.where gives a tuple, the first element of which gives the row index, while the
# second element gives the corresponding column index
If you want the indices to be an array where each row represents an element, you can stack them:
np.stack(np.where((a > 5) & (a < 10)), axis=-1)
# array([[0, 2],
#        [0, 3],
#        [1, 1]])
Or, as @Divakar commented, use np.argwhere((a > 5) & (a < 10)).
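As a quick check of the points above, here is a minimal sketch that compares the stacked np.where result with np.argwhere and shows why the chained comparison fails. The bounds x and y are taken as 5 and 10 to match the answer; the variable names are only illustrative.

```python
import numpy as np

a = np.array([[2, 4, 9, 8, 473], [54, 7, 24, 19, 20]])
x, y = 5, 10  # assumed bounds, matching the answer above

# Element-wise mask: True where x < a < y
mask = (a > x) & (a < y)

# Tuple of (row indices, column indices)
rows, cols = np.where(mask)

# Two equivalent ways to get one (row, col) pair per matching element
stacked = np.stack((rows, cols), axis=-1)
pairs = np.argwhere(mask)

# A chained comparison expands to `(x < a) and (a < y)`, which asks for the
# truth value of a whole array and therefore raises ValueError.
try:
    np.where(x < a < y)
    chained_ok = True
except ValueError:
    chained_ok = False

print(pairs)
print(chained_ok)
```

The values themselves can be read back with a[mask] or a[rows, cols].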

You have two indices to specify: one selects the inner array (the row), and the other selects the element within that array (the column).

Related

How to extract numpy array stored in tuple?

Let's consider very easy example:
import numpy as np
a = np.array([0, 1, 2])
print(np.where(a < -1))
(array([], dtype=int64),)
print(np.where(a < 2))
(array([0, 1]),)
I'm wondering if it's possible to extract the length of those arrays, i.e. I want to know that the first array is empty and the second is not. Usually this can easily be done with the len function, but here the numpy array is stored in a tuple. Do you know how it can be done?
Just use this:
import numpy as np
a = np.array([0, 1, 2])
x = np.where(a < 2)[0]
print(len(x))
Outputs 2
To find the number of values in the array satisfying the predicate, you can skip np.where and use np.count_nonzero instead:
a = np.array([0, 1, 2])
print(np.count_nonzero(a < -1))
>>> 0
print(np.count_nonzero(a < 2))
>>> 2
If you need to know whether there are any values in a that satisfy the predicate, but not how many there are, a cleaner way of doing so is with np.any:
a = np.array([0, 1, 2])
print(np.any(a < -1))
>>> False
print(np.any(a < 2))
>>> True
np.where takes up to three arguments: condition, x, and y, where the last two are array-like and optional (they must be given together). When they are provided, the function returns an array with elements taken from x where condition is True and from y otherwise. When only condition is provided, it acts like np.asarray(condition).nonzero() and returns a tuple, as in your case. For more details, see the Notes section of the np.where documentation.
Alternatively, because you only need the number of elements for which condition is True, you can simply use np.sum(condition):
a = np.array([0, 1, 2])
print(np.sum(a < -1))
>>> 0
print(np.sum(a < 2))
>>> 2
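To make the two call forms of np.where concrete, here is a small sketch using the same array. The fill value -1 is an arbitrary choice for illustration.

```python
import numpy as np

a = np.array([0, 1, 2])
cond = a < 2

# One-argument form: a tuple with one index array per dimension
indices = np.where(cond)
same = np.asarray(cond).nonzero()

# Three-argument form: take from `a` where cond is True, else use -1
picked = np.where(cond, a, -1)

print(indices)          # tuple holding a single index array
print(len(indices[0]))  # number of matches
print(picked)
```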

Select all rows from Numpy array where each column satisfies some condition

I have an array x of the form,
x = [[1,2,3,...,7,8,9],
[1,2,3,...,7,9,8],
...,
[9,8,7,...,3,1,2],
[9,8,7,...,3,2,1]]
I also have an array of non-allowed numbers for each column. I want to select all of the rows which only have allowed characters in each column. For instance, I might have that I want only rows which do not have any of [1,2,3] in the first column; I can do this by,
x[~np.in1d(x[:,0], [1,2,3])]
And I can do this for any single column. But I'm looking to do this for all columns at once, selecting only the rows for which every element is an allowed number for its column. I can't seem to get x.any or x.all to do this well. How should I go about this?
EDIT: To clarify, the non-allowed numbers are different for each column. In actuality, I will have some array y,
y = [[1,4,...,7,8],
[2,5,...,9,4],
[3,6,...,8,6]]
Where I want rows from x for which column 1 cannot be in [1,2,3], column 2 cannot be in [4,5,6], and so on.
You can broadcast the comparison, then all to check:
x[(x != y[:,None,:]).all(axis=(0,-1))]
Break down:
# compare each element of `x` to each element of `y`
# mask.shape == (y.shape[0], x.shape[0], x.shape[1])
mask = (x != y[:,None,:])
# `all(0)` checks that each element of `x` differs from every element in the same column of `y`
# `all(-1)` checks along the rows of `x`
mask = mask.all(axis=(0,-1))
# slice
x[mask]
For example, consider:
x = np.array([[1, 2],
[9, 8],
[5, 6],
[7, 8]])
y = np.array([[1, 4],
[2, 5],
[3, 7]])
Then mask = (x != y[:,None,:]).all(axis=(0,-1)) gives
array([False, True, True, True])
It's recommended to use np.isin rather than np.in1d these days (np.in1d is deprecated as of NumPy 2.0). This lets you (a) compare the entire array all at once, and (b) invert the mask more efficiently.
x[np.isin(x, [1, 2, 3], invert=True).all(1)]
np.isin preserves the shape of x, so you can then use .all across the columns. It also has an invert argument which allows you to do the equivalent of ~isin(x, [1, 2, 3]), but more efficiently.
This solution vectorizes a computation similar to the one in the other answer (although it's still a linear search) and avoids creating the large temporary array. Note that, as written, it applies the same list of non-allowed values to every column.
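When the non-allowed values differ per column, as in the question's EDIT, one possible way to combine the two answers is to call np.isin once per column. This is only a sketch under that assumption, reusing the small x and y from the broadcasting example; it loops over columns in Python, so it suits arrays with few columns.

```python
import numpy as np

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])
# Column j of y lists the values not allowed in column j of x
y = np.array([[1, 4],
              [2, 5],
              [3, 7]])

# One boolean column per column of x: True where the value is allowed
allowed = np.column_stack([
    np.isin(x[:, j], y[:, j], invert=True) for j in range(x.shape[1])
])

# Keep rows in which every column holds an allowed value
row_mask = allowed.all(axis=1)
selected = x[row_mask]

# Cross-check against the broadcasting approach
broadcast_mask = (x != y[:, None, :]).all(axis=(0, -1))

print(row_mask)
print(selected)
```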

Loop over clump_masked indices

I have an array y_filtered that contains some masked values. I want to replace these values by some value I calculate based on their neighbouring values. I can get the indices of the masked values by using masked_slices = ma.clump_masked(y_filtered). This returns a list of slices, e.g. [slice(194, 196, None)].
I can easily get the values from my masked array by using y_filtered[masked_slices], and even loop over them. However, I need to access the index of each value as well, so I can calculate its new value based on its neighbours. Enumerate (logically) returns 0, 1, etc. instead of the indices I need.
Here's the solution I came up with.
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]
for sl in masked_slices:
    for i, y_i in y_enum[sl]:
        # simplified example calculation
        y_filtered[i] = np.average(y_filtered[i-2:i+2])
It is a very ugly method, in my opinion, and I think there has to be a better way to do this. Any suggestions?
Thanks!
EDIT:
I figured out a better way to achieve what I think you want to do. This code takes every window of 5 elements and computes its (masked) average, then uses those values to fill the gaps in the original array. If an index does not have any unmasked value close enough, it is simply left masked:
import numpy as np
from numpy.lib.stride_tricks import as_strided
SMOOTH_MARGIN = 2
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
                mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
print(x)
# [1 -- 3 4 -- -- -- -- 10]
pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
                  constant_values=True)
k = 2 * SMOOTH_MARGIN + 1
isize = x.dtype.itemsize
msize = x.mask.dtype.itemsize
x_pad = np.ma.array(
    data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
    mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
fill_mask = ~x_avg.mask & x.mask
result = x.copy()
result[fill_mask] = x_avg[fill_mask]
print(result)
# [1 2 3 4 3 4 10 10 10]
(note all the values are integers here because x was originally of integer type)
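As a possible alternative to hand-computing strides, the same windows can be built with sliding_window_view (available in NumPy 1.20 and later). The sketch below is not the code above rewritten line for line: it assumes the same x and SMOOTH_MARGIN, but computes the masked average from plain sums and counts instead of a masked 2D array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SMOOTH_MARGIN = 2
k = 2 * SMOOTH_MARGIN + 1
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
                mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])

masked = np.ma.getmaskarray(x)

# Pad the data with zeros and mark the padding as invalid
data = np.pad(x.data, SMOOTH_MARGIN, mode='constant')
valid = np.pad(~masked, SMOOTH_MARGIN, mode='constant', constant_values=False)

# One row per element of x, each row holding that element's window
win_data = sliding_window_view(data, k)
win_valid = sliding_window_view(valid, k)

# Sum and count only the unmasked values in each window
counts = win_valid.sum(axis=1)
sums = (win_data * win_valid).sum(axis=1)

# Fill only masked positions that have at least one unmasked neighbour
fill = masked & (counts > 0)
result = x.copy()
result[fill] = (sums[fill] / counts[fill]).astype(x.dtype)

print(result)
```

sliding_window_view checks the window shape for you, whereas as_strided trusts whatever strides it is given.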
The originally posted code has a few errors. First, it both reads and writes values of y_filtered inside the loop, so the results for later indices are affected by earlier iterations; this can be fixed by reading from a copy of the original y_filtered. Second, [i-2:i+2] should probably be [max(i-2, 0):i+3], so that the window is symmetric and never starts before index zero.
You could do this:
from itertools import chain
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])
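The copy-based fix mentioned above could look roughly like this. It is a sketch on made-up sample data; the name source is hypothetical, and the window average is still the simplified example calculation from the question.

```python
import numpy as np
import numpy.ma as ma

# Made-up sample data with one clump of masked values
y_filtered = ma.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                      mask=[0, 0, 1, 1, 0, 0])

# Read from an untouched copy so earlier fills do not affect later ones
source = y_filtered.copy()

for s in ma.clump_masked(y_filtered):
    for idx in range(s.start, s.stop):
        window = source[max(idx - 2, 0):idx + 3]
        # mean() of a masked array ignores the masked entries
        y_filtered[idx] = window.mean()

print(y_filtered)
```

If a window contains no unmasked values at all, its mean is itself masked, so that position stays masked.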

How do I remove the first and last rows and columns from a 2D numpy array?

I'd like to know how to remove the first and last rows and columns from a 2D array in numpy. For example, say we have a (N+1) x (N+1) matrix called H then in MATLAB/Octave, the code I'd use would be:
Hsub = H(2:N,2:N);
What's the equivalent code in Numpy? I thought that np.reshape might do what I want but I'm not sure how to get it to remove just the target rows as I think if I reshape to a (N-1) x (N-1) matrix, it'll remove the last two rows and columns.
How about this?
Hsub = H[1:-1, 1:-1]
The 1:-1 range means that we access elements starting at the second element (index 1) and go up to, but not including, the last element, as indicated by the -1. We do this for both dimensions independently, so the result is the intersection of the two selections, which amounts to chopping off the first row, first column, last row and last column.
Remember, the ending index is exclusive, so if we did 0:3 for example, we only get the first three elements of a dimension, not four.
Also, negative indices mean that we access the array from the end. -1 is the last value to access in a particular dimension, but because of the exclusivity, we are getting up to the second last element, not the last element. Essentially, this is the same as doing:
Hsub = H[1:H.shape[0]-1, 1:H.shape[1]-1]
... but using negative indices is much more elegant. You also don't have to use the number of rows and columns to extract what you need, so the syntax works regardless of the array's size. However, you need to make sure that the matrix is at least 3 x 3; otherwise the result is an empty array (NumPy slicing does not raise an error here).
Small bonus
In MATLAB / Octave, you can achieve the same thing without using the dimensions by:
Hsub = H(2:end-1, 2:end-1);
The end keyword, when used in indexing, refers to the last element of a particular dimension.
Example use
Here's an example (using IPython):
In [1]: import numpy as np
In [2]: H = np.meshgrid(np.arange(5), np.arange(5))[0]
In [3]: H
Out[3]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
In [4]: Hsub = H[1:-1,1:-1]
In [5]: Hsub
Out[5]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
As you can see, the first row, first column, last row and last column have been removed from the source matrix H and the remainder has been placed in the output matrix Hsub.
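A short sketch to check the slicing behaviour on other inputs. The arrays here are illustrative only; note that basic slicing returns a view, and slicing a matrix smaller than 3 x 3 this way yields an empty array.

```python
import numpy as np

H = np.arange(25).reshape(5, 5)

# Negative-index form and explicit-shape form select the same block
inner = H[1:-1, 1:-1]
explicit = H[1:H.shape[0] - 1, 1:H.shape[1] - 1]

# Basic slicing returns a view; copy it if H must stay untouched
is_view = np.shares_memory(inner, H)
Hsub = inner.copy()

# A 2 x 2 matrix has no interior, so the result is empty
small = np.ones((2, 2))[1:-1, 1:-1]

print(inner)
print(small.shape)
```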

filtering elements of matrix by row in python scipy/numpy

How can I filter elements of an NxM matrix in scipy/numpy in Python by some condition on the rows?
For example, just as you can do where(my_matrix != 3), which treats the matrix element-wise, I want to do this by row, so that I can write things like where(my_matrix != some_other_row) to select all rows that are not equal to some_other_row. How can this be done?
Assume you have a matrix
a = numpy.array([[0, 1, 2],
[3, 4, 5],
[0, 1, 2]])
and you want to get the indices of the rows that are not equal to
row = numpy.array([0, 1, 2])
You can get these indices by
indices, = (a != row).any(1).nonzero()
a != row compares each row of a to row element-wise (via broadcasting), returning a Boolean array of the same shape as a. Then, we use any() along axis 1, i.e. across the columns of each row, to find rows in which any element differs from the corresponding element in row. Last, nonzero() gives us the indices of those rows.
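Putting those steps together on the matrix above, and also using the same Boolean mask to pull out the rows themselves. The names differs, different_rows and matching_rows are only illustrative.

```python
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5],
              [0, 1, 2]])
row = np.array([0, 1, 2])

# True for rows that differ from `row` in at least one position
differs = (a != row).any(axis=1)

# nonzero() returns a 1-tuple for a 1D input, hence the trailing comma
indices, = differs.nonzero()

# The mask can also index the rows directly
different_rows = a[differs]
matching_rows = a[~differs]

print(indices)
print(different_rows)
```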
