How to delete particular array in 2 dimensional NumPy array by value? - python

Let the 2-dimensional array is as below:
In [1]: a = [[1, 2], [3, 4], [5, 6], [1, 2], [7, 8]]
a = np.array(a)
a, type(a)
Out [1]: (array([[1, 2],
[3, 4],
[5, 6],
[1, 2],
[7, 8]]),
numpy.ndarray)
I have tried to do this procedure:
In [2]: a = a[a != [1, 2])
a = np.reshape(a, (int(a.size/2), 2) # I have to do this since on the first line in In [2] change the dimension to 1 [3, 4, 5, 6, 7, 8] (the initial array is 2-dimensional array)
a
Out[2]: array([[3, 4],
[5, 6],
[7, 8]])
My question is, is there any function in NumPy that can directly do that?
Updated Question
Here's the semi-full source code that I've been working on:
from sklearn import datasets
data = datasets.load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = pd.DataFrame(data.target)
bucket = df[df['Target'] == 0]
bucket = bucket.iloc[:,[0,1]].values
lp, rp = leftestRightest(bucket)
bucket = np.array([x for x in bucket if list(x) != lp])
bucket = np.array([x for x in bucket if list(x) != rp])
Notes:
leftestRightest(arg) is a function that returns 2 one-dimensional NumPy arrays of size 2 (which are lp and rp). For instances, lp = [1, 3], rp = [2, 4] and the parameter is 2-dimensional NumPy array

There should be a more delicate approach, but here what I have come up with:
np.array([x for x in a if list(x) != [1,2]])
Output
[[3, 4], [5, 6], [7, 8]]
Note that I wouldn't recommend working with list comprehensions in the large array since it would be highly time-consuming.

You're approach is correct, but the mask needs to be single-dimensional:
a[(a != [1, 2]).all(-1)]
Output:
array([[3, 4],
[5, 6],
[7, 8]])
Alternatively, you can collect the elements and infer the dimension with -1:
a[a != [1, 2]].reshape(-1, 2)

the boolean condition creates a 2D array of True/False. You have to apply and operation across the columns to make sure the match is not a partial match. Consider a row [5,2] in your above array, the script you wrote will add 5 and ignore 2 in the resultant 1D array. It can be done as follows:
a[np.all(a != [1, 2],axis=1)]

Related

Combinig two numpy arrays of different dimensions

I want to add one numpy array two another so it will look like this:
a = [3, 4]
b = [[6, 5], [2, 1]]
output:
[[3, 4], [[6, 5], [2, 1]]]
It should look like the output above and not like [[3,4],[6,5],[2,1]].
How do I do that with numpy arrays?
Work with pure python lists, and not numpy arrays.
It doesn't make sense to have a numpy array holding two list objects. There's literally no gain in doing that.
If you directly instantiate an array like so:
np.array([[3, 4], [[6, 5], [2, 1]]])
You get
array([[3, 4],
[list([6, 5]), list([2, 1])]], dtype=object)
which is an array with dtype=object. Most of numpy's power is lost in this case. For more information on examples and why, take a look at this thread.
If you work with pure python lists, then you can easily achieve what you want:
>>> a + b
[[3, 4], [[6, 5], [2, 1]]]
Numpy as built-in stack commands that are (in my opinion) slightly easier to use:
>>> a = np.array([3, 4])
>>> b = np.array([[6, 5], [2, 1]])
>>> np.row_stack([b, a])
array([[3, 4],
[6, 5],
[2, 1]])
There's also a column stack.
Ref: https://numpy.org/doc/stable/reference/generated/numpy.ma.row_stack.html
You can't stack arrays of different shapes in one array the way you want (or you have to fill in gaps with NaNs or zeroes), so if you want to iterate over them, consider using list.
a = np.array([3, 4])
b = np.array([[6, 5], [2, 1]])
c = [a, b]
for arr in c:
...
If you still want a numpy array, you can try this:
>>> a = np.array([3, 4])
>>> b = np.array([[6, 5], [2, 1]])
>>> a.resize(b.shape,refcheck=False)
>>> c = np.array([a, b])
array([[[3, 4],
[0, 0]],
[[6, 5],
[2, 1]]])

Comparing three numpy arrays so wherever one is not a given value, the other are not that given value either

Is there an effective way in which to compare all three numpy arrays at once?
For example, if the given value to check is 5, wherever the value is not 5, it should be not 5 for all three arrays.
The only way I've thought of how to do this would be checking that occurrences that arr1 != 5 & arr2 == 5 is 0. However this only checks one direction between the two arrays, and then I need to also incorporate arr3. This seems inefficient and might end up with some logical hole.
This should pass:
arr1 = numpy.array([[1, 7, 3],
[4, 5, 6],
[4, 5, 2]])
arr2 = numpy.array([[1, 2, 3],
[4, 5, 6],
[8, 5, 6]])
arr3 = numpy.array([[1, 1, 3],
[4, 5, 6],
[9, 5, 6]])
However this should fail due to arr2 having a 3 where other arrays have 5s
arr1 = numpy.array([[1, 2, 3],
[8, 5, 6],
[4, 5, 6]])
arr2 = numpy.array([[1, 2, 3],
[2, 3, 1],
[2, 5, 6]])
arr3 = numpy.array([[1, 2, 3],
[4, 5, 6],
[4, 5, 3]])
There is a general solution (regardless number of arrays). And it's quite educational:
import numpy as np #a recommended way of import
arr = np.array([arr1, arr2, arr3])
is_valid = np.all(arr==5, axis=0) == np.any(arr==5, axis=0) #introduce axis
out = np.all(is_valid)
#True for the first case, False for the second one
Is this a valid solution?
numpy.logical_and(((arr1==5)==(arr2==5)).all(), ((arr2==5)==(arr3==5)).all())
You could AND all comparisons to 5 and compare to any one of the comparisons:
A = (arr1==5)
(A==(A&(arr2==5)&(arr3==5))).all()
Output: True for the first example, False for the second
NB. This works for any number of arrays

Accessing all elements of a numpy array

I have a 2d numpy array of integers, and now I want to change all the elements greater than 5 to 5.
For example,
[[2, 6],
[7, 3]]
to
[[2, 5],
[5, 3]]
Now, my current approach is to access all the elements using two for loops and then checking if each element is greater then 5, like this:
h, w = arr.shape[:2]
for x in range(h):
for y in range(w):
if arr[x,y] > 5:
arr[x,y] = 5
Is there any more pythonic approach for doing this?
Use np.clip() It can clip both on lower and upper value. We just pass None as lower value, to clip only on upper one.
>>> import numpy as np
>>> arr = np.array([[2, 6], [7, 3]])
>>> arr
array([[2, 6],
[7, 3]])
>>> np.clip(arr, None, 5)
array([[2, 5],
[5, 3]])
>>>

Can numpy strides stride only within subarrays?

I have a really big numpy array(145000 rows * 550 cols). And I wanted to create rolling slices within subarrays. I tried to implement it with a function. The function lagged_vals behaves as expected but np.lib.stride_tricks does not behave the way I want it to -
def lagged_vals(series,l):
# Garbage implementation but still right
return np.concatenate([[x[i:i+l] for i in range(x.shape[0]) if i+l <= x.shape[0]] for x in series]
,axis = 0)
# Sample 2D numpy array
something = np.array([[1,2,2,3],[2,2,3,3]])
lagged_vals(something,2) # Works as expected
# array([[1, 2],
# [2, 2],
# [2, 3],
# [2, 2],
# [2, 3],
# [3, 3]])
np.lib.stride_tricks.as_strided(something,
(something.shape[0]*something.shape[1],2),
(8,8))
# array([[1, 2],
# [2, 2],
# [2, 3],
# [3, 2], <--- across subarray stride, which I do not want
# [2, 2],
# [2, 3],
# [3, 3])
How do I remove that particular row in the np.lib.stride_tricks implementation? And how can I scale this cross array stride removal for a big numpy array ?
Sure, that's possible with np.lib.stride_tricks.as_strided. Here's one way -
from numpy.lib.stride_tricks import as_strided
L = 2 # window length
shp = a.shape
strd = a.strides
out_shp = shp[0],shp[1]-L+1,L
out_strd = strd + (strd[1],)
out = as_strided(a, out_shp, out_strd).reshape(-1,L)
Sample input, output -
In [177]: a
Out[177]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
In [178]: out
Out[178]:
array([[0, 1],
[1, 2],
[2, 3],
[4, 5],
[5, 6],
[6, 7]])
Note that the last step of reshaping forces it to make a copy there. But that's can't be avoided if we need the final output to be a 2D. If we are okay with a 3D output, skip that reshape and thus achieve a view, as shown with the sample case -
In [181]: np.shares_memory(a, out)
Out[181]: False
In [182]: as_strided(a, out_shp, out_strd)
Out[182]:
array([[[0, 1],
[1, 2],
[2, 3]],
[[4, 5],
[5, 6],
[6, 7]]])
In [183]: np.shares_memory(a, as_strided(a, out_shp, out_strd) )
Out[183]: True

Search Numpy array with multiple values

I have numpy 2d array having duplicate values.
I am searching the array like this.
In [104]: import numpy as np
In [105]: array = np.array
In [106]: a = array([[1, 2, 3],
...: [1, 2, 3],
...: [2, 5, 6],
...: [3, 8, 9],
...: [4, 8, 9],
...: [4, 2, 3],
...: [5, 2, 3])
In [107]: num_list = [1, 4, 5]
In [108]: for i in num_list :
...: print(a[np.where(a[:,0] == num_list)])
...:
[[1 2 3]
[1 2 3]]
[[4 8 9]
[4 2 3]]
[[5 2 3]]
The input is list having number similar to column 0 values.
The end result I want is the resulting rows in any format like array, list or tuple for example
array([[1, 2, 3],
[1, 2, 3],
[4, 8, 9],
[4, 2, 3],
[5, 2, 3]])
My code works fine but doesn't seem pythonic. Is there any better searching strategy with multiple values?
like a[np.where(a[:,0] == l)] where only one time lookup is done to get all the values.
my real array is large
Approach #1 : Using np.in1d -
a[np.in1d(a[:,0], num_list)]
Approach #2 : Using np.searchsorted -
num_arr = np.sort(num_list) # Sort num_list and get as array
# Get indices of occurrences of first column in num_list
idx = np.searchsorted(num_arr, a[:,0])
# Take care of out of bounds cases
idx[idx==len(num_arr)] = 0
out = a[a[:,0] == num_arr[idx]]
You can do
a[numpy.in1d(a[:, 0], num_list), :]

Categories

Resources