Delete rows at select indexes from a numpy array - python

My dataset has close to 200 rows, but as a minimal working example, let's assume the following array:
arr = np.array([[1,2,3,4], [5,6,7,8],
[9,10,11,12], [13,14,15,16],
[17,18,19,20], [21,22,23,24]])
I can take a random sampling of 3 of the rows as follows:
indexes = np.random.choice(np.arange(arr.shape[0]), int(arr.shape[0]/2), replace=False)
Using these indexes, I can select my test cases as follows:
testing = arr[indexes]
I want to delete the rows at these indexes so that I can use the remaining rows as my training set.
From the post here, it seems that training = np.delete(arr, indexes) ought to do it, but I get a 1-d array instead.
I also tried the suggestion here, using training = arr[indexes.astype(np.bool)], but it did not give a clean separation: the row [5,6,7,8] ends up in both the training and testing sets.
training = arr[indexes.astype(np.bool)]
testing
Out[101]:
array([[13, 14, 15, 16],
[ 5, 6, 7, 8],
[17, 18, 19, 20]])
training
Out[102]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
Any idea what I am doing wrong? Thanks.

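To delete the rows at the given indexes, pass axis=0. Without an axis argument, np.delete works on the flattened array, which is why you got a 1-d result: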
arr = np.delete(arr, indexes, axis=0)
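A small sketch of the difference the axis argument makes, using the values from the question. The seeded generator `rng` and the name `flattened` are illustrative assumptions, not part of the original post.

```python
import numpy as np

arr = np.arange(1, 25).reshape(6, 4)   # same values as the array in the question

# Assumption: a seeded Generator, used only to make the example reproducible
rng = np.random.default_rng(0)
indexes = rng.choice(arr.shape[0], size=arr.shape[0] // 2, replace=False)

testing = arr[indexes]                       # rows picked for testing
training = np.delete(arr, indexes, axis=0)   # remaining rows, still 2-d

# Without axis, np.delete removes elements from the flattened array
flattened = np.delete(arr, indexes)

print(testing.shape, training.shape, flattened.shape)
```

Here training keeps its two dimensions, while flattened is a 1-d array of 21 elements, which matches the symptom described in the question.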

One approach would be to get the remaining row indices with np.setdiff1d and then use those row indices to get the desired output -
out = arr[np.setdiff1d(np.arange(arr.shape[0]), indexes)]
Or use np.in1d to leverage boolean indexing -
out = arr[~np.in1d(np.arange(arr.shape[0]), indexes)]
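For comparison, here is a sketch of building the boolean row mask by hand. It also hints at why indexes.astype(bool) did not work in the question: that call only converts each index value to True or False, it does not produce one flag per row. The sample `indexes` below is hypothetical, and np.isin is used as the element-wise membership test.

```python
import numpy as np

arr = np.arange(1, 25).reshape(6, 4)
indexes = np.array([3, 1, 4])   # hypothetical sample of row indices

# One flag per row: True = keep for training, False = held out for testing
mask = np.ones(arr.shape[0], dtype=bool)
mask[indexes] = False

training = arr[mask]
testing = arr[indexes]

# The same mask, built with a membership test instead of assignment
mask_isin = ~np.isin(np.arange(arr.shape[0]), indexes)

# Casting the indices themselves gives three True values, not a row mask
wrong_mask = indexes.astype(bool)
```

Keeping the mask around is convenient when the same split has to be applied to a separate label array.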

Related

How to square a row in NumPy to go from a 2-d array to a 3-d one where each row was squared?

I am trying to figure out a way to get the rows of a 2-d matrix squared.
The behaviour I would like to have is something like this:
in[1] import numpy as np
in[2] a = np.array([[1,2,3],
[4,5,6]])
in[3] some_function(a) # for each row: row = row.reshape(-1,1); row @ row.T
out[1] array([[[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9]],
[[16, 20, 24],
[20, 25, 30],
[24, 30, 36]]])
I need this to make a softmax derivative for auto diff in a manual implementation of a feed-forward neural network.
The same derivative would look like this for a point:
in[4] def softmax_derivative(x):
in[5]     s = x.reshape(-1,1)
in[6]     return np.diagflat(s) - np.dot(s,s.T)
Instead of np.diagflat I am using:
in[7] matrix = np.array([[1,2,3],
[4,5,6]])
in[8] matrix.shape
out[2] (2,3)
in[9] Id = np.eye(matrix.shape[-1])
in[10] (matrix[...,np.newaxis] * Id).shape
out[3] (2,3,3)
The reason I want a 3-d array of the squared rows is to subtract it from the 3-d array of the diagonal rows which I get in the same way as in the above example.
While I know that I can get the same multiplication result from
in[11] def get_squared_rows(matrix):
in[12]     s = matrix.reshape(-1,1)
in[13]     return s @ s.T
I do not know how to get it into the correct shape quickly. The per-row outer products I want are the blocks on the diagonal of that large matrix, so I would have to extract those blocks and then stack them into a 3-d array of shape (n_samples,row,row) to match the diagonal 3-d array above. I do not know how to do that any faster than a simple loop over the rows of the input matrix.
Use broadcasting:
>>> a[:, None, :] * a[:, :, None]
array([[[ 1, 2, 3],
[ 2, 4, 6],
[ 3, 6, 9]],
[[16, 20, 24],
[20, 25, 30],
[24, 30, 36]]])
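To connect the broadcasting trick back to the softmax derivative in the question, here is a minimal sketch of a batched Jacobian. It assumes the input `s` already holds softmax outputs, one sample per row; the function names `softmax` and `softmax_jacobian_batch` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax, shifted by the row maximum for numerical stability
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def softmax_jacobian_batch(s):
    """s: (n_samples, n_classes) softmax outputs.
    Returns an array of shape (n_samples, n_classes, n_classes)."""
    diag = s[:, :, None] * np.eye(s.shape[1])   # diag(s_i) for every row
    outer = s[:, :, None] * s[:, None, :]       # outer product of every row with itself
    return diag - outer

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
s = softmax(x)
jac = softmax_jacobian_batch(s)
print(jac.shape)
```

Each slice jac[i] should agree with the per-sample formula np.diagflat(s[i]) - np.outer(s[i], s[i]), with no Python loop over the rows.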

how to delete rows and columns in numpy python?

I am having trouble creating a function which takes a matrix M as an input and deletes BOTH rows and columns containing the number 0 and giving an output containing the remaining numbers. Any help is much appreciated as I have my programming exam coming up soon.
By "deleting both rows and columns" this is what I mean:
import numpy as np
x = np.array([[1,2,3,4,5],
[6,0,8,9,10],
[11,12,13,14,15],
[16,0,0,19,20]])
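idxs_array = list(np.where(x == 0))
# Deduplicate and sort the indices so the running offset below stays valid
idxs_array = [sorted(set(idxs.tolist())) for idxs in idxs_array]
for axis, idxs in enumerate(idxs_array):
    sub_factor = 0  # rows/columns already removed along this axis
    for idx in idxs:
        x = np.delete(x, idx - sub_factor, axis)
        sub_factor += 1
print(x)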
# x = [[ 1, 4, 5],
# [11, 14, 15]]
1. Locate zero elements
First, we need to locate the zero elements in the matrix, which is easy with np.where().
Called with a condition only, np.where returns the row and column indices of the elements that satisfy it.
row_idx, col_idx = np.where(arr == 0)
2. Remove corresponding rows/columns
The easiest way to remove the corresponding rows and columns is boolean indexing.
That is, you mark each row (or column) you want to keep with True and every other one with False.
print(np.arange(4)[[True, False, True, False]])
# array([0, 2])
3. Put two things together
Here is a minimal example.
arr = np.array([[ 1, 2, 3, 4, 5],
[ 6, 0, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 0, 0, 19, 20]])
row_idx, col_idx = np.where(arr == 0)
rm_row_idx = set(row_idx.tolist())
rm_col_idx = set(col_idx.tolist())
row_mask = [i not in rm_row_idx for i in range(arr.shape[0])]
col_mask = [i not in rm_col_idx for i in range(arr.shape[1])]
arr = arr[row_mask, :]
arr = arr[:, col_mask]
print(arr)
# Shall be:
# array([[ 1, 4, 5],
# [11, 14, 15]])
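The same result can be sketched without Python-level loops by reducing the zero test along each axis. This is an alternative to the set-based masks above, not a correction of them; the names `is_zero`, `keep_rows`, `keep_cols` and `result` are illustrative.

```python
import numpy as np

arr = np.array([[ 1,  2,  3,  4,  5],
                [ 6,  0,  8,  9, 10],
                [11, 12, 13, 14, 15],
                [16,  0,  0, 19, 20]])

is_zero = arr == 0
keep_rows = ~is_zero.any(axis=1)   # True for rows that contain no zero
keep_cols = ~is_zero.any(axis=0)   # True for columns that contain no zero

# np.ix_ combines the two 1-d masks into a row/column cross-product selection
result = arr[np.ix_(keep_rows, keep_cols)]
print(result)
```

For this input the result has rows 0 and 2 and columns 0, 3 and 4 of the original, that is [[1, 4, 5], [11, 14, 15]].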

numpy python - slicing rows and columns at the same time

I have a NumPy matrix of shape 130 x 13. Say I want to select the rows that meet a condition and a subset of the columns:
trainx[trainy==label,[0,6]]
The above code does not work and throws an error - IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (43,) (2,).
However, if I do it in two steps, first subsetting the rows and then the columns, it works. Is this a quirk, or is it how NumPy is meant to work?
temp1 = trainx[trainy==label,:]
temp1 = temp1[:,[0,6]]
You can simply chain the indexing:
trainx[trainy==label][:, [0,6]]
Runnable example:
arr = np.random.rand(130,13)
arr[arr[:,0]>0.5][:, [0,6]]
In [154]: x = np.arange(24).reshape(6,4)
In [155]: mask = np.array([1,0,1,0,1,0],bool)
With your two step approach:
In [156]: x[mask] # x[mask, :]
Out[156]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11],
[16, 17, 18, 19]])
In [157]: x[mask][:,[1,3]]
Out[157]:
array([[ 1, 3],
[ 9, 11],
[17, 19]])
Or the two indices could be combined with ix_:
In [158]: np.ix_(mask, [1,3])
Out[158]:
(array([[0],
[2],
[4]]), array([[1, 3]]))
In [159]: x[np.ix_(mask, [1,3])]
Out[159]:
array([[ 1, 3],
[ 9, 11],
[17, 19]])
Note that the first array in Out[158] is np.nonzero(mask)[0][:,None], the nonzero indices in column vector form. That (3,1) indexing array can broadcast with the (2,) column array to select a (3,2) array of elements. Or in your example a (43,2) array.
The boolean mask cannot be turned into a (6,1) array and used to mask x; that would only work if it was turned into a (6,4) mask, matching the shape of x.
So either use the 2 step indexing, or use ix_.
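As a rough check on an array shaped like the one in the question, the sketch below compares the chained indexing, np.ix_, and the explicit column-vector form described above. The random data, the seed and the value of `label` are assumptions made only for the example.

```python
import numpy as np

# Assumption: random stand-ins for trainx (130 x 13) and trainy (130 labels)
rng = np.random.default_rng(0)
trainx = rng.random((130, 13))
trainy = rng.integers(0, 3, size=130)
label = 1

row_mask = trainy == label
cols = [0, 6]

# Two-step (chained) indexing: rows first, then columns
chained = trainx[row_mask][:, cols]

# np.ix_ builds broadcastable row and column index arrays from the mask and list
via_ix = trainx[np.ix_(row_mask, cols)]

# The same thing by hand: (k, 1) row indices broadcast against (2,) column indices
rows = np.nonzero(row_mask)[0]
via_broadcast = trainx[rows[:, None], cols]

print(chained.shape)
```

All three give the same (k, 2) array, where k is the number of rows matching the label. The chained form makes an intermediate copy of the selected rows, which is usually negligible at this size.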

How to Recurrently Transpose a Series/List/Array

I have an array/list/pandas Series:
np.arange(15)
Out[11]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
What I want is:
[[0,1,2,3,4,5],
[1,2,3,4,5,6],
[2,3,4,5,6,7],
...
[10,11,12,13,14]]
That is, recurrently transpose this column into a 5-column matrix.
The reason is that I am doing feature engineering on a column of temperature data. I want to use the last 5 values as features and the next one as the target.
What is the most efficient way to do that? My data is large.
If the array looks like this:
arr = np.array([1,2,3,4,5,6,7,8,....])
you could try:
recurr_transpose = np.array([arr[i:i+5] for i in range(len(arr)-4)])
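Since the question stresses that the data is large, a strided view may be worth considering instead of a list comprehension. The sketch below assumes NumPy 1.20 or newer, where sliding_window_view is available, and reads the question as wanting 5 feature values followed by 1 target value per row; the names `n_features`, `windows`, `X` and `y` are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(15)
n_features = 5

# Each window holds n_features inputs plus the value that follows them.
# The result is a read-only view on arr, so no data is copied here.
windows = sliding_window_view(arr, n_features + 1)

X = windows[:, :n_features]   # last 5 values as features
y = windows[:, n_features]    # the next value as target

print(X.shape, y.shape)
```

Because X and y are views, copy them (for example with X.copy()) before modifying them in place.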

getting multiple array after performing subtraction operation within array elements

import numpy as np
m = []
k = []
a = np.array([[1,2,3,4,5,6],[50,51,52,40,20,30],[60,71,82,90,45,35]])
for i in range(len(a)):
    m.append(a[i, -1:])
    for j in range(len(a[i])-1):
        n = abs(m[i] - a[i,j])
        k.append(n)
    k.append(m[i])
print(k)
Expected Output in k:
[5,4,3,2,1,6],[20,21,22,10,10,30],[25,36,47,55,10,35]
which is also a numpy array.
But the output that I am getting is
[array([5]), array([4]), array([3]), array([2]), array([1]), array([6]), array([20]), array([21]), array([22]), array([10]), array([10]), array([30]), array([25]), array([36]), array([47]), array([55]), array([10]), array([35])]
How can I solve this situation?
You want the absolute difference between each row's last element and every other element in that row, with the last element itself left unchanged. A vectorized approach does all the subtractions at once: subtract the last column from the remaining columns, then use column_stack to append the unchanged last column. Note that the last column must be reshaped into a column vector, of shape (3, 1) here, so that broadcasting can subtract it from the 2-d array; indexing with None adds that axis.
In [71]: np.column_stack((abs(a[:, :-1] - a[:, None, -1]), a[:,-1]))
Out[71]:
array([[ 5, 4, 3, 2, 1, 6],
[20, 21, 22, 10, 10, 30],
[25, 36, 47, 55, 10, 35]])
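An equivalent sketch that spells out the shapes step by step, in case the one-liner is hard to read. It subtracts the last column from the whole array and then restores that column; the names `last` and `out` are illustrative.

```python
import numpy as np

a = np.array([[ 1,  2,  3,  4,  5,  6],
              [50, 51, 52, 40, 20, 30],
              [60, 71, 82, 90, 45, 35]])

last = a[:, -1:]           # shape (3, 1): slicing keeps the column axis
out = np.abs(a - last)     # broadcasts (3, 6) against (3, 1); last column becomes 0
out[:, -1] = a[:, -1]      # put the original last column back

print(out)
```

The result is a single 2-d array matching the expected output in the question, rather than a list of one-element arrays.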
