I have generated a matrix of pairwise distances between list items, but something went wrong and it is not symmetric.
Suppose the matrix looks like this:
array = np.array([
[0, 3, 4],
[3, 0, 2],
[1, 2, 0]
])
How can I locate the actual asymmetries? In this case, the indices of 4 and 1.
I have confirmed the asymmetry by trying to condense the matrix with scipy's squareform function, and then by using:
def check_symmetric(a, rtol=1e-05, atol=1e-08):
    return np.allclose(a, a.T, rtol=rtol, atol=atol)
Quite late, but here is an alternative, the numpy way:
import numpy as np
m = np.array([[0, 3, 4 ],
[ 3, 0, 2 ],
[ 1, 2, 0 ]])
def check_symmetric(a):
    # True where a[i, j] and a[j, i] agree; play around with your tolerances here...
    boolmatrix = np.isclose(a, a.T)
    # Indices of the entries that differ from their mirror image
    output = np.argwhere(~boolmatrix)
    return output
output:
check_symmetric(m)
>>> array([[0, 2],
[2, 0]])
You can simply use the negation of np.isclose():
mask = ~np.isclose(array, array.T)
mask
# array([[False, False, True],
# [False, False, False],
# [ True, False, False]])
Use that value as an index to get the values:
array[mask]
# array([4, 1])
And use np.where() if you want the indices instead:
np.where(mask)
# (array([0, 2]), array([2, 0]))
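Since every asymmetry shows up twice in the mask (once at [i, j] and once at [j, i]), it can be handy to report each offending pair only once, together with both values. Here is a minimal sketch of that idea; the function name `asymmetric_pairs` is made up for illustration, and the default tolerances are simply those of np.isclose:

```python
import numpy as np

def asymmetric_pairs(a, rtol=1e-05, atol=1e-08):
    """Return (i, j, a[i, j], a[j, i]) for every i < j where the two entries differ."""
    a = np.asarray(a)
    # Indices of the strict upper triangle, so each pair is visited exactly once.
    rows, cols = np.triu_indices(a.shape[0], k=1)
    upper = a[rows, cols]
    lower = a[cols, rows]
    bad = ~np.isclose(upper, lower, rtol=rtol, atol=atol)
    return [(int(i), int(j), u.item(), l.item())
            for i, j, u, l in zip(rows[bad], cols[bad], upper[bad], lower[bad])]

array = np.array([[0, 3, 4],
                  [3, 0, 2],
                  [1, 2, 0]])
pairs = asymmetric_pairs(array)
print(pairs)  # [(0, 2, 4, 1)]
```

Each tuple tells you where to look and which two values disagree, which is usually what you need when debugging the code that built the matrix.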
The following is quick to write but slow to run; if the goal is just to debug, it will probably do.
a # nearly symmetric array.
Out:
array([[8, 1, 6, 5, 3],
[1, 9, 4, 4, 4],
[6, 4, 3, 7, 1],
[5, 4, 7, 5, 2],
[3, 4, 1, 3, 7]])
Define a function to find and print the differences.
ERROR_LIMIT = 0.00001
def find_asymmetries( a ):
    """ Prints the row and column indices with the difference
    where abs(a[r,c] - a[c,r]) > ERROR_LIMIT """
    res = a-a.T
    for r, row in enumerate(res):
        for c, cell in enumerate(row):
            if abs(cell) > ERROR_LIMIT : print( r, c, cell )
find_asymmetries( a )
3 4 -1
4 3 1
This version halves the volume of results.
def find_asymmetries( a ):
    res = a-a.T
    for r, row in enumerate(res):
        for c, cell in enumerate(row):
            if c == r: break # Stop column search once c == r
            if abs(cell) > ERROR_LIMIT : print( r, c, cell )
find_asymmetries( a )
4 3 1 # Row number always greater than column number
For example, let's consider the following numpy array:
[1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
Also, let's suppose that the threshold is equal to 3.
That is to say that we are looking for sequences of at least two consecutive values that are all above the threshold.
The output would be the indices of those values, which in our case is:
[[3, 4, 5], [8, 9]]
If the output array was flattened that would work as well!
[3, 4, 5, 8, 9]
Output Explanation
In our initial array we can see that for index = 1 we have the value 5, which is greater than the threshold, but is not part of a sequence (of at least two values) where every value is greater than the threshold. That's why this index would not make it to our output.
On the other hand, for indices [3, 4, 5] we have a sequence of (at least two) neighboring values [5, 4, 6] where each and every of them are above the threshold and that's the reason that their indices are included in the final output!
My Code so far
I have approached the issue with something like this:
(arr > 3).nonzero()
The above command gathers the indices of all the items that are above the threshold. However, I cannot determine whether they are consecutive or not. I have thought of applying a diff to the outcome of the above snippet and then maybe locating the ones (a difference of one means the indices directly follow each other). That would give us:
np.diff((arr > 3).nonzero())
But I'd still be missing something here.
If you convolve a boolean array with a window full of 1 of size win_size ([1] * win_size), then you will obtain an array where there is the value win_size where the condition held for win_size items:
import numpy as np
def groups(arr, *, threshold, win_size, merge_contiguous=False, flat=False):
    # For each window, count how many items are above the threshold
    conv = np.convolve((arr > threshold).astype(int), [1] * win_size, mode="valid")
    # A window qualifies when all win_size items satisfy the condition
    indexes_start = np.where(conv == win_size)[0]
    indexes = [np.arange(index, index + win_size) for index in indexes_start]
    if flat or merge_contiguous:
        indexes = np.unique(indexes)
    if merge_contiguous:
        # Split wherever two neighbouring indices are not adjacent
        indexes = np.split(indexes, np.where(np.diff(indexes) != 1)[0] + 1)
    return indexes
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
threshold = 3
win_size = 2
print(groups(arr, threshold=threshold, win_size=win_size))
print(groups(arr, threshold=threshold, win_size=win_size, merge_contiguous=True))
print(groups(arr, threshold=threshold, win_size=win_size, flat=True))
[array([3, 4]), array([4, 5]), array([8, 9])]
[array([3, 4, 5]), array([8, 9])]
[3 4 5 8 9]
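To make the convolution step less magical, here is a small sketch that prints the intermediate arrays for the example, assuming a window of size 2 and a strict "greater than 3" condition. The variable names are only illustrative:

```python
import numpy as np

arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])

# 1 where the value is above the threshold, 0 elsewhere
mask = (arr > 3).astype(int)
# Each entry is the number of hits inside one window of two neighbours
window_sums = np.convolve(mask, np.ones(2, dtype=int), mode="valid")
# A window is fully above the threshold when its sum equals the window size
starts = np.flatnonzero(window_sums == 2)

print(mask)         # [0 1 0 1 1 1 0 0 1 1]
print(window_sums)  # [1 1 1 2 2 1 0 1 2]
print(starts)       # [3 4 8]
```

The windows starting at 3, 4 and 8 are exactly the pairs [3, 4], [4, 5] and [8, 9] from the first line of output above.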
You can do what you want using simple numpy operations
import numpy as np
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
arr_padded = np.concatenate(([0], arr, [0]))
a = np.where(arr_padded > 3, 1, 0)
da = np.diff(a)
idx_start = (da == 1).nonzero()[0]
idx_stop = (da == -1).nonzero()[0]
valid = (idx_stop - idx_start >= 2).nonzero()[0]
result = [list(range(idx_start[i], idx_stop[i])) for i in valid]
print(result)
Explanation
Array a is a padded binary version of the original array, with 1s where the original elements are greater than three. da contains 1s where "islands" of 1s begin in a, and -1s where the "islands" end in a. Due to the padding, there is guaranteed to be an equal number of 1s and -1s in da. Extracting their indices, we can calculate the length of the islands. Valid index pairs are those whose respective "islands" have length >= 2. Then, it's just a matter of generating all numbers between the index bounds of the valid "islands".
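The same "islands" idea can be wrapped in a reusable function. This is only a sketch: the name `runs_above` and the `min_len` parameter are my own, and it pads the boolean mask rather than the data, on the assumption that you may want thresholds below zero (padding the data with 0 would then count as "above").

```python
import numpy as np

def runs_above(arr, threshold, min_len=2):
    """Return lists of consecutive indices whose values all exceed threshold."""
    arr = np.asarray(arr)
    # Pad the 0/1 mask so every island has both a rising and a falling edge.
    above = np.concatenate(([0], (arr > threshold).astype(int), [0]))
    edges = np.diff(above)
    starts = np.flatnonzero(edges == 1)   # first index of each island
    stops = np.flatnonzero(edges == -1)   # one past the last index of each island
    return [list(range(s, e))
            for s, e in zip(starts.tolist(), stops.tolist())
            if e - s >= min_len]

runs = runs_above([1, 5, 0, 5, 4, 6, 1, -1, 5, 10], threshold=3)
print(runs)  # [[3, 4, 5], [8, 9]]
```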
I follow your original idea. You are almost done.
I use another diff2 to pick the index of the first value in a sequence. See comments in code for details.
import numpy as np
arr = np.array([ 1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
threshold = 3
all_idx = (arr > threshold).nonzero()[0]
# array([1, 3, 4, 5, 8, 9])
result = np.empty(0, dtype=int)
if all_idx.size > 1:
    diff1 = np.zeros_like(all_idx)
    diff1[1:] = np.diff(all_idx)
    # array([0, 2, 1, 1, 3, 1])
    diff1[0] = diff1[1]
    # array([2, 2, 1, 1, 3, 1])
    # **Positions with a value 1 in diff1 should be reserved.**
    # But we also want the position before each 1. Create another diff2
    diff2 = np.zeros_like(all_idx)
    diff2[:-1] = np.diff(diff1)
    # array([ 0, -1,  0,  2, -2,  0])
    # diff1 + diff2 is the gap to the next index, so
    # **positions where diff1 + diff2 == 1 should be reserved as well.**
    # (Testing only diff2 < 0 would also keep isolated indices, e.g. 5 in [1, 5, 8].)
    result = all_idx[(diff1 == 1) | (diff1 + diff2 == 1)]
print(result)
# array([3, 4, 5, 8, 9])
I'll try something different using window views. I'm not sure this works all the time, so counterexamples are welcome. It has the advantage of not requiring Python loops.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as window
def consec_thresh(arr, thresh):
    win = window(np.argwhere(arr > thresh), (2, 1))
    return np.unique(win[np.diff(win, axis=2).ravel() == 1, :,:].ravel())
How does it work?
So we start with the array and gather the indices where the threshold is met:
In [180]: np.argwhere(arr > 3)
Out[180]:
array([[1],
[3],
[4],
[5],
[8],
[9]])
Then we build a sliding window that makes up pairs of values along the column (which is the reason for the (2, 1) shape of the window).
In [181]: window(np.argwhere(arr > 3), (2, 1))
Out[181]:
array([[[[1],
[3]]],
[[[3],
[4]]],
[[[4],
[5]]],
[[[5],
[8]]],
[[[8],
[9]]]])
Now we want to take the difference inside each pair, if it's one then the indices are consecutive.
In [182]: np.diff(window(np.argwhere(arr > 3), (2, 1)), axis=2)
Out[182]:
array([[[[2]]],
[[[1]]],
[[[1]]],
[[[3]]],
[[[1]]]])
We can plug those values back in the windows we created above,
In [185]: window(np.argwhere(arr > 3), (2, 1))[np.diff(window(np.argwhere(arr > 3), (2, 1)), axis=2).ravel() == 1, :, :]
Out[185]:
array([[[[3],
[4]]],
[[[4],
[5]]],
[[[8],
[9]]]])
Then we can ravel (flatten without copying when possible). We have to get rid of the repeated indices created by windowing, so I call np.unique on the raveled result and get:
array([3, 4, 5, 8, 9])
The iterative code below solves this in O(n) time:
arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
threshold = 3
sequence = 2
output = []
temp_arr = []
for i in range(len(arr)):
    if arr[i] > threshold:
        temp_arr.append(i)
    else:
        if len(temp_arr) >= sequence:
            output.append(temp_arr)
        temp_arr = []
# Flush the final run, applying the same minimum-length check
if len(temp_arr) >= sequence:
    output.append(temp_arr)
print(output)
# Output
# [[3, 4, 5], [8, 9]]
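If you prefer the standard library to manual bookkeeping, the same single pass can be sketched with itertools.groupby, which groups consecutive items that share a key. The function name `runs_above_groupby` and the `min_len` parameter are just illustrative choices:

```python
from itertools import groupby

def runs_above_groupby(values, threshold, min_len=2):
    """Return runs of consecutive indices whose values all exceed threshold."""
    runs = []
    # groupby yields one group per stretch of indices with the same above/below flag.
    for is_above, group in groupby(range(len(values)),
                                   key=lambda i: values[i] > threshold):
        indices = list(group)
        if is_above and len(indices) >= min_len:
            runs.append(indices)
    return runs

arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
output = runs_above_groupby(arr, threshold=3)
print(output)  # [[3, 4, 5], [8, 9]]
```

It is still O(n), and a trailing run shorter than `min_len` is dropped by the same length check as every other run.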
I would suggest using a for loop with two indices. One starts at i=0 and the other at j=1, both stepping forward by 1.
You can then check whether the values at both indices are greater than the threshold. If so,
add the indices to a list and keep moving forward with j until the value at j is no longer greater than the threshold.
values = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
res=[]
threshold= 3
i=0
j=0
for _ in values:
    j=i+1
    lista=[]
    try:
        print(f"i: {i} j:{j}")
        # check if condition is met
        if(values[i] > threshold and values[j] > threshold):
            lista.append(i)
            # add sequence
            while values[j] > threshold:
                lista.append(j)
                print(f"j while: {j}")
                j+=1
                if(j>=len(values)):
                    break
            res.append(lista)
        i=j
        if(j>=len(values)):
            break
    except IndexError:
        # j ran past the end of the list
        print("ex")
print(res)  # [[3, 4, 5], [8, 9]]
This works, but it needs refactoring.
Let's try the following code:
# Simple is better than complex
# Complex is better than complicated
arr = [1, 5, 0, 5, 4, 6, 1, -1, 5, 10]
arr_3=[i if arr[i]>3 else 'a' for i in range(len(arr))]
arr_4=''.join(str(x) for x in arr_3)
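# Split on the placeholder to get the runs of consecutive indices
# (this string trick only works while every index is a single digit)
arr_5 = arr_4.split('a')
i=0
while i<len(arr_5):
    # Drop empty pieces and runs of length 1
    if len(arr_5[i]) <=1:
        del arr_5[i]
    else:
        i+=1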
arr_6=[list(map(lambda x: int(x), list(x))) for x in arr_5]
print(arr_6)
Outputs:
[[3, 4, 5], [8, 9]]
Here is a solution that makes use of pandas Series:
import numpy as np
import pandas as pd
arr = np.array([1, 5, 0, 5, 4, 6, 1, -1, 5, 10])
thresh = 3
win_size = 2
s = pd.Series(arr)
# locating groups of values where there are at least (win_size) consecutive values above the threshold
groups = s.groupby(s.le(thresh).cumsum().loc[s.gt(thresh)]).transform('count').ge(win_size)
0 False
1 False
2 False
3 True
4 True
5 True
6 False
7 False
8 True
9 True
dtype: bool
We can now easily take their indices in a 1D array:
np.flatnonzero(groups)
# array([3, 4, 5, 8, 9], dtype=int64)
OR multiple lists:
[np.arange(index.start, index.stop) for index in np.ma.clump_unmasked(np.ma.masked_not_equal(groups.values, value=True))]
# [array([3, 4, 5], dtype=int64), array([8, 9], dtype=int64)]
Is there any efficient numpy way to do the following:
Assume I have some matrix M of size R X C. Now assume I have another matrix
E which is of shape R X a (where a is just some constant a < C), which contains row indices of
M (and -1 for padding, i.e., every element of E is in {-1, 0, .., R-1}). For example,
M=array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
E = array([[ 0, 1],
[ 2, -1],
[-1, 0]])
Now, given those matrices, I want to generate a third matrix P, where the i'th row of P will
contain the sum of the following rows of M : E[i,:]. In the example, P will be,
P[0,:] = M[0,:] + M[1,:]
P[1,:] = M[2,:]
P[2,:] = M[0,:]
Yes, doing it with a loop is pretty straightforward and easy. I was wondering if there is
any fancy numpy way to make it more efficient (assuming that I want to do it with large matrices,
e.g., 200 X 200).
Thanks!
One way would be to index into the original array and sum, then subtract the contributions of the last row of M that the -1 indices picked up -
out = M[E].sum(1) - M[-1]*(E==-1).sum(1)[:,None]
Another way would be to pad a row of zeros at the end of M, so that those -1 indices would index into the zeros and hence have no effect on the final sum after indexing -
M1 = np.vstack((M, np.zeros((1,M.shape[1]), dtype=M.dtype)))
out = M1[E].sum(1)
If there is at most one -1 per row in E, we can optimize further -
out = M[E].sum(1)
m = (E==-1).any(1)
out[m] -= M[-1]
Another based on tensor-multiplication -
np.einsum('ij,kli->kj',M, (E[...,None]==np.arange(M.shape[0])))
You could index M with E, and np.sum where the actual indices in E are greater or equal to 0. For that we have the where parameter:
np.sum(M[E], where=(E>=0)[...,None], axis=1)
array([[5, 7, 9],
[7, 8, 9],
[1, 2, 3]])
Where we have that:
M[E]
array([[[1, 2, 3],
[4, 5, 6]],
[[7, 8, 9],
[7, 8, 9]],
[[7, 8, 9],
[1, 2, 3]]])
is summed along axis 1, but only where the following mask is True:
(E>=0)[...,None]
array([[[ True],
[ True]],
[[ True],
[False]],
[[False],
[ True]]])
Probably not the fastest but maybe educational: The operation you are describing can be thought of as matrix multiplication with a certain adjacency matrix:
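import numpy as np
from scipy import sparse
# construct adjacency matrix: row i gets a 1 in column j for every valid E[i, k] == j
indices = E[E!=-1]
indptr = np.concatenate([[0],np.count_nonzero(E!=-1,axis=1).cumsum()])
data = np.ones_like(indices)  # one entry per stored index
aux = sparse.csr_matrix((data,indices,indptr), shape=(E.shape[0], M.shape[0]))
# multiply
aux @ M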
# array([[5, 7, 9],
# [7, 8, 9],
# [1, 2, 3]], dtype=int64)
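Because the example M is square, a mix-up between rows and columns can go unnoticed. A quick way to sanity-check any of the vectorized variants is to compare them against the plain loop on a non-square M. The sketch below does that for the zero-padding and the `where=` approaches; the helper names `sum_rows_loop` and `sum_rows_padded` and the 3 x 4 test matrix are made up for illustration:

```python
import numpy as np

def sum_rows_loop(M, E):
    """Reference implementation: add up the rows of M listed in each row of E."""
    out = np.zeros((E.shape[0], M.shape[1]), dtype=M.dtype)
    for i, row in enumerate(E):
        for idx in row:
            if idx >= 0:          # -1 is padding and contributes nothing
                out[i] += M[idx]
    return out

def sum_rows_padded(M, E):
    """Vectorized: -1 indexes an appended row of zeros, so padding adds nothing."""
    M1 = np.vstack((M, np.zeros((1, M.shape[1]), dtype=M.dtype)))
    return M1[E].sum(axis=1)

M = np.arange(1, 13).reshape(3, 4)   # deliberately not square
E = np.array([[0, 1],
              [2, -1],
              [-1, 0]])

expected = sum_rows_loop(M, E)
padded = sum_rows_padded(M, E)
masked = np.sum(M[E], where=(E >= 0)[..., None], axis=1)
print(np.array_equal(padded, expected), np.array_equal(masked, expected))  # True True
```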
I have the following matrix:
import numpy as np
A:
matrix([[ 1, 2, 3, 4],
[ 3, 4, 10, 8]])
How do I apply the following restriction: if any number in a column of the matrix A is less than or equal to (<=) K (3), then change the last number of that column to the minimum of the last entry of the column and 5? So basically, my matrix should transform to this:
A:
matrix([[ 1, 2, 3, 4],
[ 3, 4, 5, 8]])
I tried this function:
A[-1][np.any(A <= 3, axis=0)] = np.maximum(A[-1], 5)
But I have the following error:
TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions
You should be using np.minimum here. Create a mask, and index, setting values accordingly.
B = np.array(A)
m = (B <= 3).any(0)
A[-1, m] = np.minimum(A[-1, m], 5)
A
matrix([[1, 2, 3, 4],
[3, 4, 5, 8]])
Here is one way:
A[-1][np.logical_and(A[-1] > 5, np.any(A <= 3, axis=0))] = 5
# matrix([[1, 2, 3, 4],
# [3, 4, 5, 8]])
This takes advantage of the fact that you only need to change a number if it is greater than 5. Therefore, the minimum criterion is taken care of by the A[-1] > 5 condition.
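If you are free to work with a plain ndarray instead of np.matrix, the mask-and-minimum idea can be sketched as a small function. The name `cap_last_row` and its parameters `k` and `cap` are my own choices, and the function works on a copy rather than modifying the input:

```python
import numpy as np

def cap_last_row(A, k=3, cap=5):
    """Where a column contains a value <= k, limit its last entry to at most cap."""
    A = np.array(A)                      # copy as a plain 2D ndarray
    cols = (A <= k).any(axis=0)          # columns with at least one value <= k
    A[-1, cols] = np.minimum(A[-1, cols], cap)
    return A

result = cap_last_row([[1, 2, 3, 4],
                       [3, 4, 10, 8]])
print(result)
# [[1 2 3 4]
#  [3 4 5 8]]
```

With a plain ndarray, A[-1, cols] is one-dimensional, which avoids the 2-dimensional boolean indexing error from the question.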
I have an array in Python like so:
Example:
>>> scores = numpy.asarray([[8,5,6,2], [9,4,1,4], [2,5,3,8]])
>>> scores
array([[8, 5, 6, 2],
[9, 4, 1, 4],
[2, 5, 3, 8]])
I want to find all [row, col] indices in scores where the value is:
1) the minimum in its row
2) larger than a threshold
3) at most 0.8 times the next-larger value in the row (i.e. the second-smallest)
I would like to do it as efficiently as possible, preferably without any loops. I've been struggling with this for a while, so any help you can provide would be greatly appreciated!
It should go something along the lines of
In [1]: scores = np.array([[8,5,6,2], [9,4,1,4], [2,5,3,8]]); threshold = 1.1; scores
Out[1]:
array([[8, 5, 6, 2],
[9, 4, 1, 4],
[2, 5, 3, 8]])
In [2]: part = np.partition(scores, 2, axis=1); part
Out[2]:
array([[2, 5, 6, 8],
[1, 4, 4, 9],
[2, 3, 5, 8]])
In [3]: row_mask = (part[:,0] > threshold) & (part[:,0] <= 0.8 * part[:,1]); row_mask
Out[3]: array([ True, False, True], dtype=bool)
In [4]: rows = row_mask.nonzero()[0]; rows
Out[4]: array([0, 2])
In [5]: cols = np.argmin(scores[row_mask], axis=1); cols
Out[5]: array([3, 0])
At that moment if you're looking for actual coordinate pairs, you can just zip them:
In [6]: coords = list(zip(rows, cols)); coords
Out[6]: [(0, 3), (2, 0)]
Or if you're planning to look those elements up, you can use them as is:
In [7]: scores[rows, cols]
Out[7]: array([2, 2])
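For reuse, the steps of the session above can be collected into one function. This is a sketch under a few assumptions: the name `qualifying_minima` is made up, every row has at least two columns, and it uses kth=1 for np.partition, which is enough to guarantee that column 0 holds the row minimum and column 1 the second-smallest value (the remaining columns are in no particular order).

```python
import numpy as np

def qualifying_minima(scores, threshold, ratio=0.8):
    """Return [row, col] of each row minimum that is > threshold and
    <= ratio * (second-smallest value in that row)."""
    scores = np.asarray(scores)
    # After partitioning with kth=1, columns 0 and 1 are the two smallest values, in order.
    part = np.partition(scores, 1, axis=1)
    lowest, second = part[:, 0], part[:, 1]
    row_mask = (lowest > threshold) & (lowest <= ratio * second)
    rows = np.flatnonzero(row_mask)
    cols = np.argmin(scores[rows], axis=1)   # position of the minimum in each kept row
    return np.column_stack((rows, cols))

scores = np.array([[8, 5, 6, 2], [9, 4, 1, 4], [2, 5, 3, 8]])
coords = qualifying_minima(scores, threshold=1.1)
print(coords.tolist())  # [[0, 3], [2, 0]]
```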
I think that you're going to have a hard time doing this without any for loops (or at least without something that performs such a loop but disguises it as something else), since the operation depends only on the row and you want to do it for each row. It's not the most efficient approach (and which one is may depend on how frequently conditions 2 and 3 are true), but this will work:
import heapq
import numpy
threshold = 1.5
ratio = .8
scores = numpy.asarray([[8,5,6,2], [9,4,1,4], [2,5,3,8]])
found_points = []
for i,row in enumerate(scores):
    # the two smallest values of the row, in ascending order
    lowest,second_lowest = heapq.nsmallest(2,row)
    if lowest > threshold and lowest <= ratio*second_lowest:
        found_points.append([i,numpy.where(row == lowest)[0][0]])
You get (for the example):
found_points = [[0, 3], [2, 0]]
I have to analyze a square 2D numpy array LL for values which are symmetric (LL[i,j] == LL[j,i]) and not zero.
Is there a faster and more "array-like" way to do this without loops?
Is there an easy way to store the indices of the values for later use without creating an array and appending the tuple of indices in every loop iteration?
Here is my classical looping approach to store the indices:
IdxArray = np.empty((0, 2), dtype=int) # Array to store the indices
for i in range(len(LL)):
    for j in range(i+1,len(LL)):
        if LL[i,j] != 0.0:
            if LL[i,j] == LL[j,i]:
                IdxArray = np.vstack((IdxArray,[i,j]))
Later I use the indices:
for idx in IdxArray:
    P = LL[idx[0],idx[1]]*(TT[idx[0]]-TT[idx[1]])
    ...
>>> a = numpy.matrix('5 2; 5 4')
>>> b = numpy.matrix('1 2; 3 4')
>>> a.T == b.T
matrix([[False, False],
[ True, True]], dtype=bool)
>>> a == a.T
matrix([[ True, False],
[False, True]], dtype=bool)
>>> numpy.nonzero(a == a.T)
(matrix([[0, 1]]), matrix([[0, 1]]))
How about this:
a = np.array([[1,0,3,4],[0,5,4,6],[7,4,4,5],[3,4,5,6]])
np.fill_diagonal(a, 0) # changes original array, must be careful
overlap = (a == a.T) * a
indices = np.argwhere(overlap != 0)
Result:
>>> a
array([[0, 0, 3, 4],
[0, 0, 4, 6],
[7, 4, 0, 5],
[3, 4, 5, 0]])
>>> overlap
array([[0, 0, 0, 0],
[0, 0, 4, 0],
[0, 4, 0, 5],
[0, 0, 5, 0]])
>>> indices
array([[1, 2],
[2, 1],
[2, 3],
[3, 2]])
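If you would rather not modify the original array, and you only want each symmetric pair once (i < j, as in the loop from the question), a variation on the same idea is to build a boolean mask and keep its strict upper triangle. This is just a sketch; the function name `symmetric_nonzero_pairs` is made up:

```python
import numpy as np

def symmetric_nonzero_pairs(LL):
    """Return [i, j] with i < j where LL[i, j] == LL[j, i] and the value is non-zero."""
    LL = np.asarray(LL)
    match = (LL == LL.T) & (LL != 0)
    # The strict upper triangle drops the diagonal and the mirrored duplicates.
    return np.argwhere(np.triu(match, k=1))

a = np.array([[1, 0, 3, 4],
              [0, 5, 4, 6],
              [7, 4, 4, 5],
              [3, 4, 5, 6]])
pairs = symmetric_nonzero_pairs(a)
print(pairs)
# [[1 2]
#  [2 3]]
```

For later use, `i, j = pairs.T` gives two index arrays, so `a[i, j]` looks up all matching values at once without a Python loop.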