Effective way to find and delete certain elements from a numpy array


I have a numpy array with some positive numbers and some -1 elements. I want to find the elements with value -1, delete them, and store their indices.
One way of doing it is iterating through the array and checking whether each value is -1. Is this the only way? If not, how efficient is it? Isn't there a more efficient Python tool for this?

With numpy.argwhere() and numpy.delete() routines:
import numpy as np
arr = np.array([1, 2, 3, -1, 4, -1, 5, 10, -1, 14])
indices = np.argwhere(arr == -1).flatten()
new_arr = np.delete(arr, indices)
print(new_arr) # [ 1 2 3 4 5 10 14]
print(indices.tolist()) # [3, 5, 8]
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.argwhere.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html

import numpy as np
yourarray = np.array([4, 5, 6, 7, -1, 2, 3, -1, 9, -1])  # example data
rangenumpyarray = np.arange(len(yourarray))  # positions 0..n-1, used as an index column
arra = np.hstack((rangenumpyarray.reshape(-1, 1), yourarray.reshape(-1, 1)))  # two columns: index, value
arra[arra[:, 1] == -1][:, 0]  # boolean indexing: indices of the rows whose value is -1

Use a combination of np.flatnonzero and simple boolean indexing.
import numpy as np
x = np.array([ 0, 0, -1, 0, 0, -1, 0, -2, 0, 0])
m = x != -1 # generate a mask
idx = np.flatnonzero(~m)
x = x[m]
idx
array([2, 5])
x
array([ 0, 0, 0, 0, 0, -2, 0, 0])
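The mask-based answer above can be wrapped in a small helper. This is only a sketch: the function name drop_value and its value parameter are hypothetical, chosen for illustration, and it assumes a 1-D input.

```python
import numpy as np

def drop_value(arr, value=-1):
    """Return (filtered array, indices where arr == value)."""
    arr = np.asarray(arr)
    keep = arr != value                  # True where the element stays
    removed_idx = np.flatnonzero(~keep)  # positions of the dropped elements
    return arr[keep], removed_idx

x = np.array([0, 0, -1, 0, 0, -1, 0, -2, 0, 0])
filtered, removed_idx = drop_value(x)
print(filtered)     # [ 0  0  0  0  0 -2  0  0]
print(removed_idx)  # [2 5]
```

Both results come from a single comparison pass over the array, so no Python-level loop is involved.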

Related

Replacing the values of a numpy array of zeros using a array of indexes

I'm working with numpy and have a problem with indexing. I have a numpy array of zeros and a 2D array of indexes, and I need to use these indexes to set the corresponding values of the array of zeros to 1. I tried something, but it's not working. Here is what I tried:
import numpy as np
idx = np.array([[0, 3, 4],
[1, 3, 5],
[0, 4, 5]]) #Array of index
zeros = np.zeros(6) #Array of zeros [0, 0, 0, 0, 0, 0]
repeat = np.tile(zeros, (idx.shape[0], 1)) #This repeats the array of zeros to match the number of rows of the index array
res = []
for i, j in zip(repeat, idx):
    res.append(i[j] = 1) #Here I try to replace the matching index by the value of 1
output = np.array(res)
but I get the syntax error
expression cannot contain assignment, perhaps you meant "=="?
my desired output should be
output = [[1, 0, 0, 1, 1, 0],
[0, 1, 0, 1, 0, 1],
[1, 0, 0, 0, 1, 1]]
This is just an example; the idx array can be bigger. I think the problem is the indexing, and I believe there is a much simpler way of doing this without repeating the array of zeros and using the zip function, but I can't figure it out. Any help would be appreciated, thank you!
EDIT: When I change the = to == I get a boolean array, which I don't need, so I don't know what's happening there either.

You can use np.put_along_axis to assign values into the array repeat based on indices in idx. This is more efficient than a loop (and easier).
import numpy as np
idx = np.array([[0, 3, 4],
[1, 3, 5],
[0, 4, 5]]) #Array of index
zeros = np.zeros(6).astype(int) #Array of zeros [0, 0, 0, 0, 0, 0]
repeat = np.tile(zeros, (idx.shape[0], 1))
np.put_along_axis(repeat, idx, 1, 1)
repeat will then be:
array([[1, 0, 0, 1, 1, 0],
[0, 1, 0, 1, 0, 1],
[1, 0, 0, 0, 1, 1]])
FWIW, you can also make the array of zeros directly by passing in the shape:
np.zeros([idx.shape[0], 6])
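As an alternative to np.put_along_axis, plain integer (advanced) indexing with a broadcast row index should give the same result. This is a sketch, not the answerer's method; the names n_cols, out and rows are introduced here for illustration, and the output width of 6 is taken from the question.

```python
import numpy as np

idx = np.array([[0, 3, 4],
                [1, 3, 5],
                [0, 4, 5]])
n_cols = 6  # assumed width of the output; must be greater than idx.max()

out = np.zeros((idx.shape[0], n_cols), dtype=int)
rows = np.arange(idx.shape[0])[:, None]  # shape (3, 1), broadcasts against idx
out[rows, idx] = 1                       # sets out[r, idx[r, k]] = 1 for every r, k
print(out)
```

The row index has to be a column vector so that each row of idx is paired with its own row number.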

Creating 2D numpy array of start and end indices of "streaks" in another array.

Say I have a 1D numpy array of numbers myArray = ([1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0 ,1, 2, 1, 1, 1]).
I want to create a 2D numpy array that describes the first (column 1) and last (column 2) indices of any "streak" of consecutive 1's that is longer than 2.
So for the example above, the 2D array should look like this:
indicesArray =
([5, 8],
[13, 15])
Since there are at least 3 consecutive ones in the 5th, 6th, 7th, 8th places and in the 13th, 14th, 15th places.
Any help would be appreciated.

Approach #1
Here's one approach inspired by this post -
def start_stop(a, trigger_val, len_thresh=2):
    # "Enclose" mask with sentinels to catch shifts later on
    mask = np.r_[False,np.equal(a, trigger_val),False]
    # Get the shifting indices
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    # Get lengths
    lens = idx[1::2] - idx[::2]
    return idx.reshape(-1,2)[lens>len_thresh]-[0,1]
Sample run -
In [47]: myArray
Out[47]: array([1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 1, 2, 1, 1, 1])
In [48]: start_stop(myArray, trigger_val=1, len_thresh=2)
Out[48]:
array([[ 5, 8],
[13, 15]])
Approach #2
Another with binary_erosion -
from scipy.ndimage import binary_erosion
mask = binary_erosion(myArray==1,structure=np.ones((3)))
idx = np.flatnonzero(mask[1:] != mask[:-1])
out = idx.reshape(-1,2)+[0,1]
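For cross-checking either approach on small inputs, a plain-Python reference using itertools.groupby can help. This is a different, slower technique than the vectorized answers above; the function name streaks and the min_len parameter are hypothetical.

```python
from itertools import groupby

def streaks(values, trigger_val=1, min_len=3):
    """Return [start, end] (inclusive) for runs of trigger_val with length >= min_len."""
    out = []
    pos = 0  # index of the first element of the current run
    for val, group in groupby(values):
        n = len(list(group))
        if val == trigger_val and n >= min_len:
            out.append([pos, pos + n - 1])
        pos += n
    return out

my_array = [1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 1, 2, 1, 1, 1]
print(streaks(my_array))  # [[5, 8], [13, 15]]
```

Note that min_len=3 here corresponds to len_thresh=2 in start_stop, which keeps runs strictly longer than the threshold.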

How to randomly select one nonzero element per row from a sparse matrix without a for loop in python

I have a large sparse matrix whose each row contains multiple nonzero elements, for example
a = np.array([[1, 1,0,0,0,0], [2,0, 1,0,2,0], [3,0,4,0,0, 3]])
I want to be able to randomly select one nonzero element per row without a for loop. Any good suggestions? As output, I am more interested in the chosen elements' indices than in their values.

With a numpy array such as:
arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])
you can do arr != 0, which gives a True / False array marking the values that pass the condition, in our case the values that are not equal (!=) to 0. So:
array([ True, True, True, False, True, False, False, True], dtype=bool)
from here, we can 'index' arr with this boolean array by doing arr[arr != 0] which gives us:
array([5, 2, 6, 2, 6])
So now that we have a way of removing the zeros from a numpy array, we can do a simple list comprehension on each row in your a array. For each row, we remove the zeros and then perform a random.choice on the array. Like so:
np.array([np.random.choice(r[r!=0]) for r in a])
which gives you back an array of length 3 containing random non-zero items from each row in a. :)
Hope this helps!
Update
If you want the indexes of the random non-zero numbers in the array, you can use .nonzero().
So if we have this array:
arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])
we can do:
arr.nonzero()
which gives a tuple of the indexes of non-zero elements:
(array([0, 1, 2, 4, 7]),)
so as with before, we can use this and np.random.choice() in a list-comprehension to produce random indexes:
a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])
np.array([np.random.choice(r.nonzero()[0]) for r in a])
which returns an array of the form [x, y, z] where x, y and z are random indexes of non-zero elements from their corresponding rows.
E.g. one result could be:
array([1, 4, 2])
And if you want it to also return the rows, you could just add in a numpy.arange() call on the length of a to get an array of row numbers:
([np.arange(len(a))], np.array([np.random.choice(r.nonzero()[0]) for r in a]))
so an example random output could be:
([array([0, 1, 2])], array([1, 2, 5]))
for a as:
array([[1, 1, 0, 0, 0, 0],
[2, 0, 1, 0, 2, 0],
[3, 0, 4, 0, 0, 3]])
Hope this does what you want now :)
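The list comprehension above still loops over rows in Python. A loop-free sketch, assuming the data fits in memory as a dense array and every row has at least one nonzero entry, is to draw one random score per entry, push the scores of zeros below any possible score, and take the argmax of each row. This is not the answerer's method; the names rng, scores, cols and rows are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the run reproducible
a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])

scores = rng.random(a.shape)   # one uniform score in [0, 1) per entry
scores[a == 0] = -1.0          # zeros can never win the argmax
cols = scores.argmax(axis=1)   # column index of the chosen nonzero entry per row
rows = np.arange(a.shape[0])
print(cols, a[rows, cols])     # chosen column indices and their values
```

Because the scores are independent and identically distributed, each nonzero entry in a row is equally likely to hold the maximum, so the choice is uniform over that row's nonzero entries.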

Calculate the sum of every 5 elements in a python array

I have a python array in which I want to calculate the sum of every 5 elements. In my case I have the array c with ten elements. (In reality it has a lot more elements.)
c = [1, 0, 0, 0, 0, 2, 0, 0, 0, 0]
So finally I would like to have a new array (c_new) which should show the sum of the first 5 elements, and the second 5 elements
So the result should be that one
1+0+0+0+0 = 1
2+0+0+0+0 = 2
c_new = [1, 2]
Thank you for your help
Markus

You can use np.add.reduceat by passing indices where you want to split and sum:
import numpy as np
c = [1, 0, 0, 0, 0, 2, 0, 0, 0, 0]
np.add.reduceat(c, np.arange(0, len(c), 5))
# array([1, 2])

Here's one way of doing it -
c = [1, 0, 0, 0, 0, 2, 0, 0, 0, 0]
print([sum(c[i:i+5]) for i in range(0, len(c), 5)])
Result -
[1, 2]

If five divides the length of your vector and it is contiguous then
np.reshape(c, (-1, 5)).sum(axis=-1)
It also works if it is non contiguous, but then it is typically less efficient.
Benchmark:
import numpy as np
from timeit import timeit
def aredat():
    return np.add.reduceat(c, np.arange(0, len(c), 5))
def reshp():
    return np.reshape(c, (-1, 5)).sum(axis=-1)
c = np.random.random(10_000_000)
timeit(aredat, number=100)
3.8516048429883085
timeit(reshp, number=100)
3.09542763303034
So where possible, reshaping seems a bit faster; reduceat has the advantage of gracefully handling non-multiple-of-five vectors.
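To illustrate the point about lengths that are not a multiple of five, here is a small sketch on a 12-element array. The zero-padding step for the reshape variant is an addition for illustration, not part of the answer above; the names k, pad, sums_reduceat and sums_reshape are hypothetical.

```python
import numpy as np

c = np.arange(1, 13)  # 12 elements, not a multiple of 5
k = 5

# reduceat sums c[0:5], c[5:10] and the short tail c[10:12]
sums_reduceat = np.add.reduceat(c, np.arange(0, len(c), k))

# reshape needs zero-padding up to the next multiple of k first
pad = (-len(c)) % k
sums_reshape = np.pad(c, (0, pad)).reshape(-1, k).sum(axis=1)

print(sums_reduceat)  # [15 40 23]
print(sums_reshape)   # [15 40 23]
```

Padding with zeros is safe for sums, but it would need a different fill value for other reductions such as a product or a minimum.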

Why not use this? (It assumes c is a 2-D numpy array whose number of rows is a multiple of 5.)
np.array([np.sum(i, axis = 0) for i in c.reshape(c.shape[0]//5,5,c.shape[1])])

There are various ways to achieve that. Below are two options using numpy built-in methods.
Option 1
numpy.sum and numpy.ndarray.reshape as follows
c_sum = np.sum(np.array(c).reshape(-1, 5), axis=1)
[Out]: array([1, 2])
Option 2
Using numpy.vectorize, a custom lambda function, and numpy.arange as follows
c_sum = np.vectorize(lambda x: sum(c[x:x+5]))(np.arange(0, len(c), 5))
[Out]: array([1, 2])

Insert sections of zeros into numpy array using zip and np.insert

I cut the zeros out of a numpy array, do some processing, and want to insert them back in for visualization purposes. I have the indices of the zero sections and tried to insert the zeros back in with numpy.insert and zip, but the index runs out of bounds, even though I start at the lower end. Example:
import numpy as np
a = np.array([1, 2, 4, 0, 0, 0, 3, 6, 2, 0, 0, 1, 3, 0, 0, 0, 5])
a = a[a != 0] # cut zeros out
zero_start = [3, 9, 13]
zero_end = [5, 10, 15]
# Now insert the zeros back in using the former indices
for ev in zip(zero_start, zero_end):
    a = np.insert(a, ev[0], np.zeros(ev[1]-ev[0]))
>>> IndexError: index 13 is out of bounds for axis 0 with size 12
It seems like the array size is not being refreshed inside the loop. Any suggestions or other (more pythonic) approaches to solve this problem?

Approach #1: Using indexing -
# Get all zero indices
idx = np.concatenate([range(i,j+1) for i,j in zip(zero_start,zero_end)])
# Setup output array of zeros
N = len(idx) + len(a)
out = np.zeros(N,dtype=a.dtype)
# Get mask of non-zero places and assign values from a into those
out[~np.isin(np.arange(N),idx)] = a
We can also generate the actual indices where a had non-zeros originally and then assign. Thus, the last step of masking could be replaced with something like this -
out[np.setdiff1d(np.arange(N),idx)] = a
Approach #2: Using np.insert given zero_start and zero_end as arrays -
insert_start = np.r_[zero_start[0], zero_start[1:] - zero_end[:-1]-1].cumsum()
out = np.insert(a, np.repeat(insert_start, zero_end - zero_start + 1), 0)
Sample run -
In [755]: a = np.array([1, 2, 4, 0, 0, 0, 3, 6, 2, 0, 0, 1, 3, 0, 0, 0, 5])
...: a = a[a != 0] # cut zeros out
...: zero_start = np.array([3, 9, 13])
...: zero_end = np.array([5, 10, 15])
...:
In [756]: s0 = np.r_[zero_start[0], zero_start[1:] - zero_end[:-1]-1].cumsum()
In [757]: np.insert(a, np.repeat(s0, zero_end - zero_start + 1), 0)
Out[757]: array([1, 2, 4, 0, 0, 0, 3, 6, 2, 0, 0, 1, 3, 0, 0, 0, 5])
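For comparison, the loop from the question also appears to work once the section length accounts for the inclusive end index. This is a sketch under two assumptions: the sections are sorted in ascending order, and zero_start / zero_end refer to positions in the original array, as in the question. The variable original is added here only to check the result.

```python
import numpy as np

a = np.array([1, 2, 4, 0, 0, 0, 3, 6, 2, 0, 0, 1, 3, 0, 0, 0, 5])
original = a.copy()
a = a[a != 0]  # cut zeros out

zero_start = [3, 9, 13]
zero_end = [5, 10, 15]  # inclusive end indices

for start, end in zip(zero_start, zero_end):
    # end is inclusive, so each section holds end - start + 1 zeros
    a = np.insert(a, start, np.zeros(end - start + 1, dtype=a.dtype))

print(a)  # [1 2 4 0 0 0 3 6 2 0 0 1 3 0 0 0 5]
```

With ev[1]-ev[0] each section is one zero short, so the array is only 12 long by the time index 13 is needed, which matches the IndexError in the question. Each np.insert call copies the array, so the vectorized approaches above are preferable when there are many sections.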
