How to delete rows of numpy array by multiple row indices?

How to delete rows of numpy array by multiple row indices? - python

I have two lists of indices (idx[0] and idx[1]), and I should delete the corresponding rows from numpy array y_test.
y_test
12 11 10
1 2 2
3 2 3
4 1 2
13 1 10
idx[0] = [0,2]
idx[1] = [1,3]
I tried to delete the rows as follows (using ~). But it didn't work:
result = y_test[(~idx[0]+~idx[1]+~idx[2])]
Expected result:
result =
13 1 10

Instead of removing elements, just make a new array with the desired ones. This will keep any future indexing from getting jumbled up and maintain the old array.
import numpy as np
y_test = np.asarray([[12, 11, 10], [1, 2, 2], [3, 2, 3], [4, 1, 2], [13, 1, 10]])
idx = [[0, 2], [1, 3]]
# flatten list of lists
idx_flat = [i for j in idx for i in j]
# assign values that are NOT in your idx list to a new array
result = [row for num, row in enumerate(y_test) if num not in idx_flat]
# cast this however you want it, right now 'result' is a list of np.arrays
print result
[array([13, 1, 10])]
For an understanding of the flatten step using list comprehensions check this out

You can use numpy.delete which deletes the subarrays along the axis.
np.delete(y_test, idx, axis=0)
Make sure that idx.dtype is an integer type and use numpy.astype if not.
Your approach did not work because idx is not a boolean index array but holds the indices. So ~ which is binary negation will produce ~[0, 2] = [-1, -3] (where both should be numpy arrays).
I would definitely recommend reading up on the difference between index arrays and boolean index arrays. For boolean index arrays I would suggest using numpy.logical_not and numpy.logical_or.
+ concatenates Python lists but is the standard plus for numpy arrays.

Since you are using NumPy I'd suggest masking in this way.
Setup:
import numpy as np
y_test = np.array([[12,11,10],
[1,2,2],
[3,2,3],
[4,1,2],
[13,1,10]])
idx = np.array([[0,2], [1,3]])
Generate the mask:
Generate a mask of ones then set to zero elements at index in idx:
mask = np.ones(len(y_test), dtype = int).reshape(5,1)
mask[idx.flatten()] = 0
Finally apply the mask:
y_test[~np.all(y_test * mask == 0, axis=1)]
#=> [[13 1 10]]
y_test has not been modified.

Related

Numpy: Find indexes of rows in another array

I have two arrays containing lists of 3d coordinates.
I am trying to get a list of the indexes where each element in my array is found in another array.
a = np.array([[0.4,0.6,0.8],
[0.4, 1.0, 1.2],
[0.6,1.0,1.4],
[0.6,1.2,1.6]])
b = np.array([[0.4, 1.0, 1.2],
[0.4,0.6,0.8],
[0.6,1.2,1.6],
[0.6,1.0,1.4],
[0.6,1.0,1.4]])
idx = [np.where(np.all(a==i,axis=1)) for i in b]
# idx = [1, 0, 3, 2, 2]
Is there a way to achieve this using numpy methods as my arrays for a and b are large ~100k elements each.
Thanks in advance.

You can use b[:,None] for the comparison. It is a special kind of slicing that allows you to compare both arrays row-wise, since the shape of b is now (5,1,3), instead of (5,3).
idx = np.where( (a==b[:,None]).all(-1) )[1]
#output of idx: [1 0 3 2 2]

Faster way to set new df value using np.array index values in dataframe

I need to set the value of a new pandas df column based on the NumPy array index, also stored in the df. This works, but it is pretty slow with a large df. Any tips on how to speed things up?
a=np.random.random((5,5))
df=pd.DataFrame(np.array([[1,1],[3,3],[2,2],[3,2]]),columns=['i','j'])
df['ij']=df.apply(lambda x: (int(x['i']-1),int(x['j']-1)),axis=1)
for idx,r in df.iterrows():
df.loc[idx,'new']=a[r['ij']]

With NumPy indexing:
inds = df[["i", "j"]].to_numpy() - 1
df["new"] = a[inds[:, 0], inds[:, 1]]
where we index into a along rows with numbers in inds' first column and columns with its second column.
to get
>>> a
array([[0.27494719, 0.17706064, 0.71306907, 0.94776026, 0.04024955],
[0.56557293, 0.63732559, 0.12254121, 0.53177861, 0.48435987],
[0.33299644, 0.43459935, 0.57227818, 0.96142159, 0.79794503],
[0.80112425, 0.52816002, 0.01885327, 0.39880301, 0.51974912],
[0.60377461, 0.24419486, 0.88203753, 0.87263663, 0.49345361]])
>>> inds
array([[0, 0],
[2, 2],
[1, 1],
[2, 1]])
>>> df
i j new
0 1 1 0.274947
1 3 3 0.572278
2 2 2 0.637326
3 3 2 0.434599
for the ij column, you can do df["ij"] = inds.tolist().

Coming from the numpy side you could reshape the indices such that they match the a shape, using ravel_multi_index:
df["new"] = np.take(a, np.ravel_multi_index([df.i -1, df.j - 1], a.shape))

iterating a filtered Numpy array whilst maintaining index information

I am attempting to pass filtered values from a Numpy array into a function.
I need to pass values only above a certain value, and their index position with the Numpy array.
I am attempting to avoid iterating over the entire array within python by using Numpys own filtering systems, the arrays i am dealing with have 20k of values in them with potentially only very few being relevant.
import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = np.nonzero(somearray > 4)
for i in arrayindex:
somefunction(arrayindex[0], somearray[arrayindex[0]])
This threw up errors of logic not being able to handle multiple values,
this led me to testing it through print statement to see what was going on.
for cell in arrayindex:
print(f"index {cell}")
print(f"data {somearray[cell]}")
I expected an output of
index 4
data 5
index 5
data 6
But instead i get
index [4 5]
data [5 6]
I have looked through different methods to iterate through numpy arrays such and neditor, but none seem to still allow me to do the filtering of values outside of the for loop.
Is there a solution to my quandary?
Oh, i am aware that is is generally frowned upon to loop through a numpy array, however the function that i am passing these values to are complex, triggering certain events and involving data to be uploaded to a data base dependent on the data location within the array.
Thanks.

import numpy as np
somearray = np.array([1,2,3,4,5,6])
arrayindex = [idx for idx, val in enumerate(somearray) if val > 4]
for i in range(0, len(arrayindex)):
somefunction(arrayindex[i], somearray[arrayindex[i]])
for i in range(0, len(arrayindex)):
print("index", arrayindex[i])
print("data", somearray[arrayindex[i]])

You need to have a clear idea of what nonzero produces, and pay attention to the difference between indexing with a list(s) and with a tuple.
===
In [110]: somearray = np.array([1,2,3,4,5,6])
...: arrayindex = np.nonzero(somearray > 4)
nonzero produces a tuple of arrays, one per dimension (this becomes more obvious with 2d arrays):
In [111]: arrayindex
Out[111]: (array([4, 5]),)
It can be used directly as an index:
In [113]: somearray[arrayindex]
Out[113]: array([5, 6])
In this 1d case you could take the array out of the tuple, and iterate on it:
In [114]: for i in arrayindex[0]:print(i, somearray[i])
4 5
5 6
argwhere does a 'transpose', which could also be used for iteration
In [115]: idxs = np.argwhere(somearray>4)
In [116]: idxs
Out[116]:
array([[4],
[5]])
In [117]: for i in idxs: print(i,somearray[i])
[4] [5]
[5] [6]
idxs is (2,1) shape, so i is (1,) shape array, resulting in the brackets in the display. Occasionally it's useful, but nonzero is used more (often by it's other name, np.where).
2d
argwhere has a 2d example:
In [119]: x=np.arange(6).reshape(2,3)
In [120]: np.argwhere(x>1)
Out[120]:
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])
In [121]: np.nonzero(x>1)
Out[121]: (array([0, 1, 1, 1]), array([2, 0, 1, 2]))
In [122]: x[np.nonzero(x>1)]
Out[122]: array([2, 3, 4, 5])
While nonzero can be used to index the array, argwhere elements can't.
In [123]: for ij in np.argwhere(x>1):
...: print(ij,x[ij])
...:
...
IndexError: index 2 is out of bounds for axis 0 with size 2
Problem is that ij is a list, which is used to index on dimension. numpy distinguishes between lists and tuples when indexing. (Earlier versions fudged the difference, but current versions are taking a more rigorous approach.)
So we need to change the list into a tuple. One way is to unpack it:
In [124]: for i,j in np.argwhere(x>1):
...: print(i,j,x[i,j])
...:
...:
0 2 2
1 0 3
1 1 4
1 2 5
I could have used: print(ij,x[tuple(ij)]) in [123].
I should have used unpacking the [117] iteration:
In [125]: for i, in idxs: print(i,somearray[i])
4 5
5 6
or somearray[tuple(i)]

Operations on 'N' dimensional numpy arrays

I am attempting to generalize some Python code to operate on arrays of arbitrary dimension. The operations are applied to each vector in the array. So for a 1D array, there is simply one operation, for a 2-D array it would be both row and column-wise (linearly, so order does not matter). For example, a 1D array (a) is simple:
b = operation(a)
where 'operation' is expecting a 1D array. For a 2D array, the operation might proceed as
for ii in range(0,a.shape[0]):
b[ii,:] = operation(a[ii,:])
for jj in range(0,b.shape[1]):
c[:,ii] = operation(b[:,ii])
I would like to make this general where I do not need to know the dimension of the array beforehand, and not have a large set of if/elif statements for each possible dimension.
Solutions that are general for 1 or 2 dimensions are ok, though a completely general solution would be preferred. In reality, I do not imagine needing this for any dimension higher than 2, but if I can see a general example I will learn something!
Extra information:
I have a matlab code that uses cells to do something similar, but I do not fully understand how it works. In this example, each vector is rearranged (basically the same function as fftshift in numpy.fft). Not sure if this helps, but it operates on an array of arbitrary dimension.
function aout=foldfft(ain)
nd = ndims(ain);
for k = 1:nd
nx = size(ain,k);
kx = floor(nx/2);
idx{k} = [kx:nx 1:kx-1];
end
aout = ain(idx{:});

In Octave, your MATLAB code does:
octave:19> size(ain)
ans =
2 3 4
octave:20> idx
idx =
{
[1,1] =
1 2
[1,2] =
1 2 3
[1,3] =
2 3 4 1
}
and then it uses the idx cell array to index ain. With these dimensions it 'rolls' the size 4 dimension.
For 5 and 6 the index lists would be:
2 3 4 5 1
3 4 5 6 1 2
The equivalent in numpy is:
In [161]: ain=np.arange(2*3*4).reshape(2,3,4)
In [162]: idx=np.ix_([0,1],[0,1,2],[1,2,3,0])
In [163]: idx
Out[163]:
(array([[[0]],
[[1]]]), array([[[0],
[1],
[2]]]), array([[[1, 2, 3, 0]]]))
In [164]: ain[idx]
Out[164]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
Besides the 0 based indexing, I used np.ix_ to reshape the indexes. MATLAB and numpy use different syntax to index blocks of values.
The next step is to construct [0,1],[0,1,2],[1,2,3,0] with code, a straight forward translation.
I can use np.r_ as a short cut for turning 2 slices into an index array:
In [201]: idx=[]
In [202]: for nx in ain.shape:
kx = int(np.floor(nx/2.))
kx = kx-1;
idx.append(np.r_[kx:nx, 0:kx])
.....:
In [203]: idx
Out[203]: [array([0, 1]), array([0, 1, 2]), array([1, 2, 3, 0])]
and pass this through np.ix_ to make the appropriate index tuple:
In [204]: ain[np.ix_(*idx)]
Out[204]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
In this case, where 2 dimensions don't roll anything, slice(None) could replace those:
In [210]: idx=(slice(None),slice(None),[1,2,3,0])
In [211]: ain[idx]
======================
np.roll does:
indexes = concatenate((arange(n - shift, n), arange(n - shift)))
res = a.take(indexes, axis)
np.apply_along_axis is another function that constructs an index array (and turns it into a tuple for indexing).

If you are looking for a programmatic way to index the k-th dimension an n-dimensional array, then numpy.take might help you.
An implementation of foldfft is given below as an example:
In[1]:
import numpy as np
def foldfft(ain):
result = ain
nd = len(ain.shape)
for k in range(nd):
nx = ain.shape[k]
kx = (nx+1)//2
shifted_index = list(range(kx,nx)) + list(range(kx))
result = np.take(result, shifted_index, k)
return result
a = np.indices([3,3])
print("Shape of a = ", a.shape)
print("\nStarting array:\n\n", a)
print("\nFolded array:\n\n", foldfft(a))
Out[1]:
Shape of a = (2, 3, 3)
Starting array:
[[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 1 2]
[0 1 2]
[0 1 2]]]
Folded array:
[[[2 0 1]
[2 0 1]
[2 0 1]]
[[2 2 2]
[0 0 0]
[1 1 1]]]

You could use numpy.ndarray.flat, which allows you to linearly iterate over a n dimensional numpy array. Your code should then look something like this:
b = np.asarray(x)
for i in range(len(x.flat)):
b.flat[i] = operation(x.flat[i])

The folks above provided multiple appropriate solutions. For completeness, here is my final solution. In this toy example for the case of 3 dimensions, the function 'ops' replaces the first and last element of a vector with 1.
import numpy as np
def ops(s):
s[0]=1
s[-1]=1
return s
a = np.random.rand(4,4,3)
print '------'
print 'Array a'
print a
print '------'
for ii in np.arange(a.ndim):
a = np.apply_along_axis(ops,ii,a)
print '------'
print ' Axis',str(ii)
print a
print '------'
print ' '
The resulting 3D array has a 1 in every element on the 'border' with the numbers in the middle of the array unchanged. This is of course a toy example; however ops could be any arbitrary function that operates on a 1D vector.
Flattening the vector will also work; I chose not to pursue that simply because the book-keeping is more difficult and apply_along_axis is the simplest approach.
apply_along_axis reference page

Numpy extract submatrix

I'm pretty new in numpy and I am having a hard time understanding how to extract from a np.array a sub matrix with defined columns and rows:
Y = np.arange(16).reshape(4,4)
If I want to extract columns/rows 0 and 3, I should have:
[[0 3]
[12 15]]
I tried all the reshape functions...but cannot figure out how to do this. Any ideas?

Give np.ix_ a try:
Y[np.ix_([0,3],[0,3])]
This returns your desired result:
In [25]: Y = np.arange(16).reshape(4,4)
In [26]: Y[np.ix_([0,3],[0,3])]
Out[26]:
array([[ 0, 3],
[12, 15]])

One solution is to index the rows/columns by slicing/striding. Here's an example where you are extracting every third column/row from the first to last columns (i.e. the first and fourth columns)
In [1]: import numpy as np
In [2]: Y = np.arange(16).reshape(4, 4)
In [3]: Y[0:4:3, 0:4:3]
Out[1]: array([[ 0, 3],
[12, 15]])
This gives you the output you were looking for.
For more info, check out this page on indexing in NumPy.

print y[0:4:3,0:4:3]
is the shortest and most appropriate fix .

First of all, your Y only has 4 col and rows, so there is no col4 or row4, at most col3 or row3.
To get 0, 3 cols: Y[[0,3],:]
To get 0, 3 rows: Y[:,[0,3]]
So to get the array you request: Y[[0,3],:][:,[0,3]]
Note that if you just Y[[0,3],[0,3]] it is equivalent to [Y[0,0], Y[3,3]] and the result will be of two elements: array([ 0, 15])

You can also do this using:
Y[[[0],[3]],[0,3]]
which is equivalent to doing this using indexing arrays:
idx = np.array((0,3)).reshape(2,1)
Y[idx,idx.T]
To make the broadcasting work as desired, you need the non-singleton dimension of your indexing array to be aligned with the axis you're indexing into, e.g. for an n x m 2D subarray:
Y[<n x 1 array>,<1 x m array>]
This doesn't create an intermediate array, unlike CT Zhu's answer, which creates the intermediate array Y[(0,3),:], then indexes into it.

This can also be done by slicing: Y[[0,3],:][:,[0,3]]. More elegantly, it is possible to slice arrays (or even reorder them) by given sets of indices for rows, columns, pages, et cetera:
r=np.array([0,3])
c=np.array([0,3])
print(Y[r,:][:,c]) #>>[[ 0 3][12 15]]
for reordering try this:
r=np.array([0,3])
c=np.array([3,0])
print(Y[r,:][:,c])#>>[[ 3 0][15 12]]

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to delete rows of numpy array by multiple row indices? - python

Related

Numpy: Find indexes of rows in another array

Faster way to set new df value using np.array index values in dataframe

iterating a filtered Numpy array whilst maintaining index information

Operations on 'N' dimensional numpy arrays

Numpy extract submatrix

Categories

Resources