Is there an existing function in numpy that takes 2 numpy arrays (x,y) and returns a boolean matrix for each i,j (x[i]>y[j])
For example, let x = [3, 4 ,5] and y = [1, 2, 3] and I want
res = [ [True, True, False],
[True, True, True],
[True, True, True] ]
You don't need a function here, just array broadcasting can work if you shape your arrays properly. I think you want this approach, which makes x a column vector and y a row vector:
x = np.array([3,4,5])
y = np.array([1,2,3])
res = x[:,None] > y[None,:]
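To see why this works, here is a minimal sketch that checks the shapes involved. It also tries np.greater.outer, which is not part of the answer above but should give the same matrix in a single call. The name res_outer is just illustrative.

```python
import numpy as np

x = np.array([3, 4, 5])
y = np.array([1, 2, 3])

# x[:, None] has shape (3, 1) and y[None, :] has shape (1, 3);
# broadcasting stretches both to (3, 3) before comparing.
res = x[:, None] > y[None, :]

# Alternative (an addition, not from the answer above):
# every binary ufunc has an .outer method that applies it to all pairs.
res_outer = np.greater.outer(x, y)

print(res)
print(np.array_equal(res, res_outer))
```

Both forms build the full (len(x), len(y)) matrix in memory, so for very large inputs the result size is worth keeping in mind.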
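Using numpy, you can convert your x and y lists to arrays with x = np.array([3,4,5]) and y = np.array([1,2,3]). Note that print(x > y) on its own only compares element by element (x[i] > y[i]) and returns a 1-D array, not the matrix over all i, j that the question asks for; for that you need the broadcasting shown above.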
I have two 2d arrays, one containing float values, one containing bool. I want to create an array containing the mean values of the first matrix for each column considering only the values corresponding to False in the second matrix.
For example:
A = [[1 3 5]
[2 4 6]
[3 1 0]]
B = [[True False False]
[False False False]
[True True False]]
result = [2, 3.5, 3.67]
Where B is False, keep the value of A, make it NaN otherwise and then use the nanmean function which ignores NaN's for operations.
np.nanmean(np.where(~B, A, np.nan), axis=0)
>>> array([2. , 3.5 , 3.66666667])
Alternatively, use numpy.mean with its where argument to specify which elements to include in the mean (the argument is available in NumPy 1.20 and later).
np.mean(A, where = ~B, axis = 0)
>>> [2. 3.5 3.66666667]
A = [[1, 3, 5],
[2, 4, 6],
[3, 1, 0]]
B = [[True, False, False],
[False, False, False],
[True, True, False]]
sums = [0]*len(A[0])
amounts = [0]*len(A[0])
for i in range(0, len(A)):
    for j in range(0, len(A[0])):
        sums[j] = sums[j] + (A[i][j] if not B[i][j] else 0)
        amounts[j] = amounts[j] + (1 if not B[i][j] else 0)
result = [sums[i]/amounts[i] for i in range(0, len(sums))]
print(result)
There may be some fancy numpy trick for this, but I think using a list comprehension to construct a new array is the most straightforward.
result = np.array([a_col[~b_col].mean() for a_col, b_col in zip(A.T,B.T)])
To make this easier to follow, here is what that line does when expanded out:
result=[]
# iterate over the columns of A (A.shape[1]), not over its rows
for i in range(A.shape[1]):
    new_col = A[:,i][~B[:,i]]
    result.append(new_col.mean())
You could also use a masked array:
import numpy as np
result = np.ma.array(A, mask=B).mean(axis=0).filled(fill_value=0)
# Output:
# array([2. , 3.5 , 3.66666667])
which has the advantage of being able to supply a fill_value for when every element in some column in B is True.
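As a rough illustration of that last point, the sketch below uses a hypothetical mask whose first column is entirely True. The data are made up for the example and are not from the question.

```python
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [2.0, 4.0, 6.0],
              [3.0, 1.0, 0.0]])

# Hypothetical mask: every element of the first column is excluded.
B = np.array([[True, False, False],
              [True, False, False],
              [True, True, False]])

# The mean of a fully masked column is itself masked ...
masked_mean = np.ma.array(A, mask=B).mean(axis=0)

# ... and filled() replaces masked entries with the chosen value.
result = masked_mean.filled(fill_value=0)
print(result)
```

With np.nanmean the same column would come back as NaN together with a RuntimeWarning, so the masked version is convenient when a specific placeholder value is wanted.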
I would like to check if a numpy tuple is present in a numpy array of tuples.
When I run the following code:
import numpy as np
myarray=np.array([[0,1],[0,2],[0,3],[4,4]])
test1=np.array([0,3])
test2=np.array([4,0])
myarraylst=myarray.tolist()
test1lst=test1.tolist()
test2lst=test2.tolist()
print(test1lst in myarraylst)
print(test2lst in myarraylst)
I get "True" for the first test and "False" for the second test as it should be.
Is there a way to do this without converting the numpy arrays to python lists ?
Many Thanks !
For lists, in tests for the identity/equality of the sublists. For arrays, in (or np.isin) evaluates all the way down to the individual numeric elements rather than comparing whole rows.
In [181]: myarray = np.array([[0, 1], [0, 2], [0, 3], [4, 4]])
...: test1 = np.array([0, 3])
...: test2 = np.array([4, 0])
But we can do an elementwise test:
In [183]: test1 == myarray
Out[183]:
array([[ True, False],
[ True, False],
[ True, True],
[False, False]])
Here one array is (4,2) and the other (2,) shape, which broadcast together just fine. For other cases we may need to tweak dimensions.
You just want the rows where both elements match:
In [184]: (test1 == myarray).all(axis=1)
Out[184]: array([False, False, True, False])
In [185]: (test2 == myarray).all(axis=1)
Out[185]: array([False, False, False, False])
and reduce those arrays to one value with any:
In [187]: (test1 == myarray).all(axis=1).any()
Out[187]: True
In [188]: (test2 == myarray).all(axis=1).any()
Out[188]: False
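The steps above can be wrapped in a small helper. This is only a sketch: contains_row is a made-up name, and it assumes arr is 2-D and row has the same length as the rows of arr. The second part shows one way to "tweak dimensions" when testing several candidate rows at once.

```python
import numpy as np

def contains_row(arr, row):
    """Return True if `row` equals at least one row of the 2-D array `arr`."""
    return bool((arr == row).all(axis=1).any())

myarray = np.array([[0, 1], [0, 2], [0, 3], [4, 4]])
print(contains_row(myarray, np.array([0, 3])))
print(contains_row(myarray, np.array([4, 0])))

# Several candidates at once: shapes (k, 1, 2) and (4, 2) broadcast to (k, 4, 2).
tests = np.array([[0, 3], [4, 0]])
found = (tests[:, None, :] == myarray).all(axis=2).any(axis=1)
print(found)
```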
I need to find the positions at which two numpy arrays hold elements of different data types.
Array 1 could be: numpy.array(['1', 3, 9, None])
Array 2 could be: numpy.array([5, 4, 3, 2])
If they were all of the same type, I could compute array 1 - array 2 to get the numerical differences. That is not possible in the scenario above. So, as part of my data quality check, I would like to explicitly flag the indexes of array 1 whose elements are of a different type than those of array 2. What would be the most pythonic way to do so?
A testing function:
def foo(a,b):
    try:
        a-b
        return True
    except TypeError:
        return False
the two sample arrays:
In [22]: array1 = np.array(['1',3, 9, None])
...: array2 = np.array([5,4,3,2])
test the 2 arrays - ones that work:
In [23]: [i for i,(a,b) in enumerate(zip(array1,array2)) if foo(a,b)]
Out[23]: [1, 2]
ones that don't:
In [24]: [i for i,(a,b) in enumerate(zip(array1,array2)) if not foo(a,b)]
Out[24]: [0, 3]
another way to use foo, getting a boolean array:
In [26]: f = np.frompyfunc(foo, 2, 1)
In [27]: f(array1,array2)
Out[27]: array([False, True, True, False], dtype=object)
actually it is still object dtype, but the values are boolean.
In [28]: f(array1,array2).astype(bool)
Out[28]: array([False, True, True, False])
and the problem items in array1:
In [29]: array1[~_]
Out[29]: array(['1', None], dtype=object)
The test function could be more elaborate.
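As one possible example of a more elaborate test, the sketch below checks the type of each element instead of trying the subtraction. The helper is_number and the choice to exclude booleans are assumptions made for illustration, not part of the answer above.

```python
import numbers
import numpy as np

def is_number(value):
    # bool is a subclass of int, so exclude it explicitly (an assumption
    # about what should count as "numeric" for the data quality check).
    return isinstance(value, numbers.Number) and not isinstance(value, bool)

# dtype=object keeps the original Python objects instead of converting them
array1 = np.array(['1', 3, 9, None], dtype=object)

ok = np.array([is_number(v) for v in array1])
bad_positions = np.flatnonzero(~ok)

print(ok)
print(bad_positions)
```

This only inspects array1; if array2 could also hold mixed types, the same check would have to be applied to it as well.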
I have two multidimensional boolean arrays with a different number of rows. I want to quickly find the indexes of the True values in the rows that the two arrays have in common. I wrote the following code, but it is too slow.
Is there a faster way to do this?
a=np.random.choice(a=[False, True], size=(100,100))
b=np.random.choice(a=[False, True], size=(1000,100))
for i in a:
    for j in b:
        if np.array_equal(i, j):
            print(np.where(i))
Let's start with an edited version of the question's code that makes sense and usually prints something:
a = np.random.choice(a=[False, True], size=(2, 2))
b = np.random.choice(a=[False, True], size=(4, 2))
print(f"a: \n {a}")
print(f"b: \n {b}")
matches = []
for i, x in enumerate(a):
    for j, y in enumerate(b):
        if np.array_equal(x, y):
            matches.append((i, j))
And here is the solution using scipy.spatial.distance.cdist, which compares every row in a against every row in b, using the Hamming distance for Boolean vector comparison (a distance of 0 means the two rows are identical):
import numpy as np
from scipy.spatial.distance import cdist
d = cdist(a, b, metric='hamming')
cdist_matches = np.where(d == 0)
matches_values = [(a[i], b[j]) for (i, j) in matches]
cdist_values = a[cdist_matches[0]], b[cdist_matches[1]]
print(f"matches_inds = \n{matches}")
print(f"matches = \n{matches_values}")
print(f"cdist_inds = \n{cdist_matches}")
print(f"cdist_matches =\n {cdist_values}")
out:
a:
[[ True False]
[False False]]
b:
[[ True True]
[ True False]
[False False]
[False True]]
matches_inds =
[(0, 1), (1, 2)]
matches =
[(array([ True, False]), array([ True, False])), (array([False, False]), array([False, False]))]
cdist_inds =
(array([0, 1], dtype=int64), array([1, 2], dtype=int64))
cdist_matches =
(array([[ True, False],
[False, False]]), array([[ True, False],
[False, False]]))
See this for a pure numpy implementation if you don't want to import scipy
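A pure numpy version might look something like the sketch below. It is not the implementation linked above, just one way to do it with broadcasting. The planted match (b[7] = a[3]) and the seed are there only so the example prints something.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.choice([False, True], size=(100, 100))
b = rng.choice([False, True], size=(1000, 100))
b[7] = a[3]  # plant one common row for illustration (hypothetical)

# Shapes (100, 1, 100) and (1, 1000, 100) broadcast to (100, 1000, 100);
# .all(axis=2) is True where an entire row of a equals an entire row of b.
eq = (a[:, None, :] == b[None, :, :]).all(axis=2)

rows_a, rows_b = np.nonzero(eq)
for i, j in zip(rows_a, rows_b):
    # indexes of the True values in the common row
    print(i, j, np.flatnonzero(a[i]))
```

The intermediate comparison array holds len(a) * len(b) * n_columns booleans, so this trades memory for speed; for much larger inputs it would probably need to be done in chunks.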
Each row of a can be compared to each row of b by making the shape of a broadcastable to the shape of b with np.newaxis and np.tile, and then requiring every element along the last axis to match:
import numpy as np
a=np.random.choice(a=[True, False], size=(2,5))
b=np.random.choice(a=[True, False], size=(10,5))
# shape (2, 10, 5): each row of a repeated once per row of b
broadcastable_a = np.tile(a[:, np.newaxis, :], (1, b.shape[0], 1))
# shape (2, 10): True where a whole row of a equals a whole row of b
a_equal_b = np.equal(b, broadcastable_a).all(axis=2)
# one (row of a, row of b) index pair per matching pair of rows
indexes = np.argwhere(a_equal_b)
I have an array: [[True], [False], [True]]. If I would want this array to filter my existing array, e.g [[1,2],[3,4],[5,6]] should get filtered to [[1,2],[5,6]], what is the correct way to do this?
A simple a[b] indexing gives the error: boolean index did not match indexed array along dimension 1; dimension is 2 but corresponding boolean dimension is 1
The solution is to get the array [[True], [False], [True]] into the shape [True, False, True], so that it works for indexing the rows of the other array. As Divakar said, ravel does this; in general it flattens any array to a 1D array. Another option is squeeze, which removes the dimensions of size 1 but leaves the other dimensions as they were.
Use .ravel...
From the documentation, ravel will:
Return a contiguous flattened array.
So if we have your b array:
b = np.array([[True], [False], [True]])
we can take the boolean values out of their sub-arrays with:
b.ravel()
which gives:
array([ True, False, True], dtype=bool)
So then, we can simply use b.ravel() as a mask for a and it will work as you want:
a = np.array([[1,2], [3,4], [5,6]])
b = np.array([[True], [False], [True]])
c = a[b.ravel()]
which gives c as:
array([[1, 2],
[5, 6]])
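For comparison, here is a short sketch of the squeeze option mentioned earlier, together with plain column slicing. The names c_squeeze and c_slice are illustrative, and both lines assume b has shape (n, 1).

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[True], [False], [True]])

# squeeze(axis=1) drops only the length-1 second axis: shape (3, 1) -> (3,)
c_squeeze = a[b.squeeze(axis=1)]

# Taking the single column directly gives the same 1-D mask
c_slice = a[b[:, 0]]

print(c_squeeze)
print(c_slice)
```

Passing axis=1 explicitly makes squeeze raise an error if that axis is not of length 1, which can catch an unexpected mask shape early.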