Let's consider a very simple example:
import numpy as np
a = np.array([0, 1, 2])
print(np.where(a < -1))
(array([], dtype=int64),)
print(np.where(a < 2))
(array([0, 1]),)
I'm wondering if it's possible to extract the length of those arrays, i.e. I want to know that the first array is empty and the second is not. Usually this can easily be done with the len function, but here the numpy array is stored inside a tuple. Do you know how it can be done?
Just use this:
import numpy as np
a = np.array([0, 1, 2])
x = np.where(a < 2)[0]
print(len(x))
Outputs 2
To find the number of values in the array satisfying the predicate, you can skip np.where and use np.count_nonzero instead:
a = np.array([0, 1, 2])
print(np.count_nonzero(a < -1))
>>> 0
print(np.count_nonzero(a < 2))
>>> 2
If you need to know whether there are any values in a that satisfy the predicate, but not how many there are, a cleaner way of doing so is with np.any:
a = np.array([0, 1, 2])
print(np.any(a < -1))
>>> False
print(np.any(a < 2))
>>> True
np.where takes 3 arguments: condition, x, y, where the last two are arrays and are optional. When they are provided, the function returns elements from x at indices where condition is True, and from y otherwise. When only condition is provided, it acts like np.asarray(condition).nonzero() and returns a tuple, as in your case. For more details see the Notes section of the np.where documentation.
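For illustration, here is a small sketch of the difference between the two calling forms (the variable names are just for the example):
import numpy as np
a = np.array([0, 1, 2])
# one argument: returns a tuple of index arrays
print(np.where(a < 2))          # (array([0, 1]),)
# three arguments: element-wise selection from x or y
print(np.where(a < 2, a, -1))   # [ 0  1 -1]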
Alternatively, because you only need the number of elements where the condition is True, you can simply use np.sum(condition):
a = np.array([0, 1, 2])
print(np.sum(a < -1))
>>> 0
print(np.sum(a < 2))
>>> 2
I have an array y_filtered that contains some masked values. I want to replace these values by some value I calculate based on their neighbouring values. I can get the indices of the masked values by using masked_slices = ma.clump_masked(y_filtered). This returns a list of slices, e.g. [slice(194, 196, None)].
I can easily get the values from my masked array by using y_filtered[masked_slices], and even loop over them. However, I need access to the index of each value as well, so I can calculate its new value based on its neighbours. enumerate (logically) returns 0, 1, etc. instead of the original indices I need.
Here's the solution I came up with.
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]
for sl in masked_slices:
    for i, y_i in y_enum[sl]:
        # simplified example calculation
        y_filtered[i] = np.average(y_filtered[i-2:i+2])
This is a very ugly method IMO, and I think there has to be a better way to do it. Any suggestions?
Thanks!
EDIT:
I figured out a better way to achieve what I think you want to do. This code picks every window of 5 elements and computes its (masked) average, then uses those values to fill the gaps in the original array. If some index does not have any unmasked value close enough, it is simply left masked:
import numpy as np
from numpy.lib.stride_tricks import as_strided
SMOOTH_MARGIN = 2
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
                mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
print(x)
# [1 -- 3 4 -- -- -- -- 10]
pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
                  constant_values=True)
k = 2 * SMOOTH_MARGIN + 1
isize = x.dtype.itemsize
msize = x.mask.dtype.itemsize
x_pad = np.ma.array(
    data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
    mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
fill_mask = ~x_avg.mask & x.mask
result = x.copy()
result[fill_mask] = x_avg[fill_mask]
print(result)
# [1 2 3 4 3 4 10 10 10]
(note all the values are integers here because x was originally of integer type)
The originally posted code has a couple of errors. Firstly, it both reads and writes values of y_filtered inside the loop, so the results at later indices are affected by previous iterations; this could be fixed by reading from a copy of the original y_filtered. Secondly, [i-2:i+2] should probably be [max(i-2, 0):i+3], so the window is always symmetric and never starts before index zero.
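For illustration, here is a minimal sketch of the original loop with those two fixes applied (the name y_source is just for the sketch, and np.ma.average is used so masked neighbours are ignored), assuming y_filtered and masked_slices are defined as in the question:
# read from a copy so later iterations are not affected by earlier writes
y_source = y_filtered.copy()
for sl in masked_slices:
    for i in range(sl.start, sl.stop):
        # symmetric window that never starts before index zero
        y_filtered[i] = np.ma.average(y_source[max(i - 2, 0):i + 3])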
You could do this:
from itertools import chain
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
    y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])
How can I perform a sum over just a list of indices of a numpy array? For example, if I have an array a = [1,2,3,4] and a list of indices to sum, indices = [0, 2], I want a fast operation that gives me the answer 4, because the sum of the values at index 0 and index 2 of a is 4.
You can use sum directly after indexing with indices:
a = np.array([1,2,3,4])
indices = [0, 2]
a[indices].sum()
The accepted a[indices].sum() approach copies data and creates a new array, which might cause problems if the array is large. np.sum actually has a where argument to mask out elements; you can just do
np.sum(a, where=[True, False, True, False])
which doesn't copy any data.
The mask array can be obtained by:
mask = np.full(4, False)
mask[np.array([0,2])] = True
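Putting the two pieces together, a small end-to-end sketch of this approach:
a = np.array([1, 2, 3, 4])
mask = np.full(a.shape, False)
mask[np.array([0, 2])] = True
print(np.sum(a, where=mask))   # 4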
Try:
>>> a = [1,2,3,4]
>>> indices = [0, 2]
>>> sum(a[i] for i in indices)
4
Faster
If you have a lot of numbers and you want high speed, then you need to use numpy:
>>> import numpy as np
>>> a = np.array([1,2,3,4])
>>> a[indices]
array([1, 3])
>>> np.sum(a[indices])
4
If given an array like
a = array([[2,4,9,8,473],[54,7,24,19,20]])
then how can I find the indices of the elements whose values are between x and y?
Currently I've got:
where(5 > a > 10)
It will, however, give an output if I say, for example:
where(a > 5)
but the where function doesn't accept the chained comparison, and once it does, it will output two one-dimensional arrays; is there a way to easily stack them?
You can use the logical operators & (and) and | (or) to chain different conditions together, so for your case you can do:
np.where((a > 5) & (a < 10))
# (array([0, 0, 1]), array([2, 3, 1]))
# here np.where gives a tuple, the first element of which gives the row index, while the
# second element gives the corresponding column index
If you want the indices to be an array where each row represents an element, you can stack them:
np.stack(np.where((a > 5) & (a < 10)), axis=-1)
# array([[0, 2],
# [0, 3],
# [1, 1]])
Or, as @Divakar commented, use np.argwhere((a > 5) & (a < 10)).
You have two indices to specify: one for which inner array you are referencing, and the other for which element of that array you are referring to.
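For example, here is a small sketch using those two index arrays together to read back the matching elements, based on the array from the question:
import numpy as np
a = np.array([[2, 4, 9, 8, 473], [54, 7, 24, 19, 20]])
rows, cols = np.where((a > 5) & (a < 10))
for r, c in zip(rows, cols):
    print(a[r, c])   # prints 9, 8, 7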
Is there a numpy method which is equivalent to the builtin pop for python lists?
Popping obviously doesn't work on numpy arrays, and I want to avoid a list conversion.
There is no pop method for NumPy arrays, but you could just use basic slicing (which would be efficient since it returns a view, not a copy):
In [104]: y = np.arange(5); y
Out[104]: array([0, 1, 2, 3, 4])
In [106]: last, y = y[-1], y[:-1]
In [107]: last, y
Out[107]: (4, array([0, 1, 2, 3]))
If there were a pop method it would return the last value in y and modify y.
Above,
last, y = y[-1], y[:-1]
assigns the last value to the variable last and rebinds y to a view of all but the last element.
Here is one example using numpy.delete():
import numpy as np
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(arr)
# array([[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12]])
arr = np.delete(arr, 1, 0)
print(arr)
# array([[ 1, 2, 3, 4],
# [ 9, 10, 11, 12]])
Pop doesn't exist for NumPy arrays, but you can use NumPy indexing in combination with array restructuring, for example hstack/vstack or numpy.delete(), to emulate popping.
Here are some example functions I can think of (which apparently don't work when the index is -1, but you can fix this with a simple conditional):
import numpy as np

def poprow(my_array, pr):
    """ row popping in numpy arrays
    Input: my_array - NumPy array, pr: row index to pop out
    Output: [new_array, popped_row] """
    i = pr
    pop = my_array[i]
    new_array = np.vstack((my_array[:i], my_array[i+1:]))
    return [new_array, pop]

def popcol(my_array, pc):
    """ column popping in numpy arrays
    Input: my_array - NumPy array, pc: column index to pop out
    Output: [new_array, popped_col] """
    i = pc
    pop = my_array[:, i]
    new_array = np.hstack((my_array[:, :i], my_array[:, i+1:]))
    return [new_array, pop]
This returns the array without the popped row/column, as well as the popped row/column separately:
>>> A = np.array([[1,2,3],[4,5,6]])
>>> [A,poparow] = poprow(A,0)
>>> poparow
array([1, 2, 3])
>>> A = np.array([[1,2,3],[4,5,6]])
>>> [A,popacol] = popcol(A,2)
>>> popacol
array([3, 6])
There isn't any pop() method for numpy arrays, unlike Python lists. Here are some alternatives you can try out:
Using Basic Slicing
>>> x = np.array([1,2,3,4,5])
>>> x = x[:-1]; x
array([1, 2, 3, 4])
Or, By Using delete()
Syntax - np.delete(arr, obj, axis=None)
arr: Input array
obj: Row or column number to delete
axis: Axis to delete
>>> x = np.array([1,2,3,4,5])
>>> x = np.delete(x, len(x)-1, 0); x
array([1, 2, 3, 4])
The important thing is that neither approach modifies the original array in place; each produces a new array with one element removed.
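A quick check of that (a small sketch; the names are just for the example):
>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.delete(x, -1)
>>> x
array([1, 2, 3, 4, 5])
>>> y
array([1, 2, 3, 4])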
If you don't mind the superficial implementation of a single method to complete the process, the following code will do what you want.
import numpy as np
a = np.arange(0, 3)
i = 0
selected, others = a[i], np.delete(a, i)
print(selected)
print(others)
# result:
# 0
# [1 2]
The most 'elegant' solution for retrieving and removing a random item in Numpy is this:
import numpy as np
import random
arr = np.array([1, 3, 5, 2, 8, 7])
element = random.choice(arr)
elementIndex = np.where(arr == element)[0][0]
arr = np.delete(arr, elementIndex)
For curious coders:
The np.where() method returns a tuple of index arrays, one per dimension of the input. For a 2-D array the first gives the row indices of the matching elements and the second gives the column indices, which is useful when searching for elements in a 2D array. In our 1-D case there is only one index array, and its first element is the one we are interested in.
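A small sketch of that return shape (the values are just for the example):
arr = np.array([1, 3, 5, 2, 8, 7])
print(np.where(arr == 5))        # (array([2]),)  -- a tuple with one index array for 1-D input
print(np.where(arr == 5)[0][0])  # 2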
To add to this, if you want to implement pop for a row or column of a numpy 2D array, you could do it like this:
col = arr[:, -1]             # gets the last column
arr = np.delete(arr, -1, 1)  # removes the last column (np.delete returns a new array)
and for a row:
row = arr[-1, :]             # gets the last row
arr = np.delete(arr, -1, 0)  # removes the last row
unutbu had a simple answer for this, but pop() can also take an index as a parameter. This is how you replicate it with numpy:
pop_index = 4
pop = y[pop_index]
y = np.concatenate([y[:pop_index],y[pop_index+1:]])
OK, since I didn't see a good answer that RETURNS the 1st element and REMOVES it from the original array, I wrote a simple (if kludgy) function using a global variable for a 1-D array (modification required for multidims):
import numpy as np

tmp_array_for_popfunc = my_1d_array  # replace my_1d_array with your own 1-D array

def array_pop():
    global tmp_array_for_popfunc
    r = tmp_array_for_popfunc[0]
    tmp_array_for_popfunc = np.delete(tmp_array_for_popfunc, 0)
    return r
Check it by using:
print(len(tmp_array_for_popfunc)) # confirm initial size of tmp_array_for_popfunc
print(array_pop()) #prints return value at tmp_array_for_popfunc[0]
print(len(tmp_array_for_popfunc)) # now size is 1 smaller
I made a function as follows, doing almost the same thing. This function takes two arguments, np_array and index, and returns the value at the given index together with a new array that has that element removed (np.delete does not modify the array in place).
def np_pop(np_array, index=-1):
    '''
    Pop the "index" from np_array: return the value at that index and a new
    array with that element removed. Default value for index is the last element.
    '''
    # add this to make sure 'numpy' is imported
    import numpy as np
    # read the value of the given array at the given index
    value = np_array[index]
    # np.delete returns a new array; it does not modify np_array in place
    new_array = np.delete(np_array, index, 0)
    # return the value and the reduced array
    return value, new_array
Remember, you can add a check to make sure the given index exists in the array and return -1 if anything goes wrong; a possible guarded variant is sketched after the usage example below.
Now you can use it like this:
import numpy as np
i = 2 # let's assume we want to pop index number 2
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]) # assume 'y' is our numpy array
popped_val, y = np_pop(y, i)  # value at the popped index; y now has that element removed
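And here is one hypothetical way the bounds check mentioned above could look (np_pop_checked is just an illustrative name, not part of the original answer):
def np_pop_checked(np_array, index=-1):
    # return -1 and the unchanged array if the index is out of range
    if index >= len(np_array) or index < -len(np_array):
        return -1, np_array
    return np_array[index], np.delete(np_array, index, 0)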
But I don't have the index values; I just have ones at those same indices in a different array. For example, I have
a = array([3,4,5,6])
b = array([0,1,0,1])
Is there some NumPy method than can quickly look at both of these and extract all values from a whose indices match the indices of all 1's in b? I want it to result in:
array([4,6])
It is probably worth mentioning that my a array is multidimensional, while my b array will always have values of either 0 or 1. I tried using NumPy's logical_and function, though this raises a ValueError because a and b have different dimensions:
a = numpy.array([[3,2], [4,5], [6,1]])
b = numpy.array([0, 1, 0])
print(numpy.logical_and(a, b))
ValueError: operands could not be broadcast together with shapes (3,2) (3,)
Though this method does seem to work if a is flat. Either way, the return value of numpy.logical_and() is a boolean array, which I do not want. Is there another way? Again, for the second example above, the desired return would be
array([[4,5]])
Obviously I could write a simple loop to accomplish this; I'm just looking for something a bit more concise.
Edit:
This will introduce more constraints: I should also mention that each element of the multidimensional array a may be of arbitrary length, not necessarily matching that of its neighbours.
You can simply use boolean array indexing.
b == 1
will give you a boolean array:
>>> from numpy import array
>>> a = array([3,4,5,6])
>>> b = array([0,1,0,1])
>>> b==1
array([False, True, False, True], dtype=bool)
which you can pass as an index to a.
>>> a[b==1]
array([4, 6])
Demo for your second example:
>>> a = array([[3,2], [4,5], [6,1]])
>>> b = array([0, 1, 0])
>>> a[b==1]
array([[4, 5]])
You could use compress:
>>> a = np.array([3,4,5,6])
>>> b = np.array([0,1,0,1])
>>> a.compress(b)
array([4, 6])
You can provide an axis argument for multi-dimensional cases:
>>> a2 = np.array([[3,2], [4,5], [6,1]])
>>> b2 = np.array([0, 1, 0])
>>> a2.compress(b2, axis=0)
array([[4, 5]])
This method will work even if the axis of a you're indexing against has a different length from b.
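For example, a small sketch illustrating that behaviour (the arrays are just for demonstration):
>>> a3 = np.array([10, 20, 30, 40, 50])
>>> b3 = np.array([0, 1, 1])   # shorter than a3
>>> a3.compress(b3)
array([20, 30])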