Python: find consecutive values in 3D numpy array without using groupby? - python

Say that you have the following 3D numpy array:
matrices=
numpy.array([[[1, 0, 0], #Level 0
[1, 1, 1],
[0, 1, 1]],
[[0, 1, 0], #Level 1
[1, 1, 0],
[0, 0, 0]],
[[0, 0, 1], #Level 2
[0, 1, 1],
[1, 0, 1]]])
And that you want to compute the number of times you get consecutive values of 1 for each cell. Let's say you want to count the number of occurrences of 2 and 3 consecutive values of 1 for each cell. The result should be something like this:
two_cons=([[0,0,0],
[1,1,0],
[0,0,0]])
three_cons=([[0,0,0],
[0,1,0],
[0,0,0]])
meaning that two cells have had at least 2 consecutive values of 1, and only one has had 3 consecutive values.
I know this could be done by using groupby, extracting the "vertical" series of values for each cell, and counting how many times you get n consecutive ones:
import numpy
two_cons=numpy.zeros((3,3))
for i in range(0,matrices.shape[1]): #Iterate through each row of a level
    for j in range(0,matrices.shape[2]): #Iterate through each column of a level
        vertical=matrices[:,i,j] #Extract the series of 0-1 for each cell of the matrix
        #Determine the occurrence of 2 consecutive values
        cons=numpy.concatenate([numpy.cumsum(c) if c[0] == 1 else c for c in numpy.split(vertical, 1 + numpy.where(numpy.diff(vertical))[0])])
        two_cons[i][j]=numpy.count_nonzero(cons==2)
In this example, you get that:
two_cons=
array([[ 0., 0., 0.],
[ 1., 1., 0.],
[ 0., 0., 0.]])
My question: how can I do this if I cannot access vertical? In my real case, the 3D numpy array is too large for me to extract vertical series across many levels, so I have to loop through each level at once, and kind of keep memory of what happened at the previous n levels. What do you suggest to do?

I haven't checked the code, but something like this should work... the idea is to scan the array along its first axis (the levels) and keep 2 helper matrices, one tracking the length of the current run of ones in each cell, and one tracking the longest run encountered so far.
import numpy as np
bests = np.zeros(matrices.shape[1:])
counter = np.zeros(matrices.shape[1:])
for depth in range(matrices.shape[0]):
    this_level = matrices[depth, :, :]
    # extend the run where this level holds a 1, reset it to 0 elsewhere
    counter = counter * this_level + this_level
    bests = np.maximum(bests, counter)
two_con = bests > 1
three_con = bests > 2

Related

calculate sum of Nth column of numpy array entry grouped by the indices in first two columns?

I would like to loop over the following check_matrix so that the code recognizes whether the first and second elements of each row are 1 and 1, 1 and 2, and so on. Then, for each class of pair (1,1 or 1,2 or 2,1 or 2,2), the code should store in a new matrix the sum of the last element (index 8) times exp(-i * dot(q, check_matrix[k][2:5] - check_matrix[k][5:8])), where i is the imaginary unit, k is the running index over the rows of check_matrix, and q is a vector defined below. There are 20 q vectors.
import numpy as np
q= []
for i in np.linspace(0, 10, 20):
    q.append(np.array((0, 0, i)))
q = np.array(q)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
[1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
[1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
[2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
[2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
[2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
This means that, in principle, I should end up with 20 matrices of shape 2x2, one for each q vector.
At the moment my code gives only one matrix, which appears to be the last one, even though I am appending to Matrices. My code looks like this:
for i in range(2):
    i = i+1
    for j in range(2):
        j= j +1
        j_list = []
        Matrices = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                j_list.append(check_matrix[k][8]*np.exp(-1J*np.dot(q,(np.subtract(check_matrix[k][2:5],check_matrix[k][5:8])))))
        j_11 = np.sum(j_list)
        I_matrix[i-1][j-1] = j_11
Matrices.append(I_matrix)
I_matrix is defined as below:
I_matrix= np.zeros((2,2),dtype=np.complex_)
At the moment I get the following output.
Matrices = [array([[-0.66071446-0.77603624j, -0.29038112+2.34855023j], [-0.31387562-0.08116629j, 4.2788 +0.j ]])]
However, I want one matrix for each q value, so there should be 20 matrices in total in this case. Each element of a 2x2 matrix should contain the sum over the rows belonging to the corresponding pair (1,1 or 1,2 or 2,1 or 2,2), laid out in the following manner:
array([[11., 12.],
[21., 22.]])
I would highly appreciate suggestions on how to correct it. Thanks in advance!
I am pretty sure you can solve this problem in an easier way and I am not 100% sure that I understood you correctly, but here is some code that does what I think you want. If you have a possibility to check if the results are valid, I would suggest you do so.
import numpy as np
n = 20
q = np.zeros((n, 3))
q[:, -1] = np.linspace(0, 10, n)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
                         [1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
                         [1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
                         [2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
                         [2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
                         [2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
check_matrix[:, :2] -= 1 # python indexing is zero based
matrices = np.zeros((n, 2, 2), dtype=np.complex128) # np.complex_ was removed in NumPy 2.0
for i in range(2):
    for j in range(2):
        k_list = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                k_list.append(check_matrix[k][8] *
                              np.exp(-1J * np.dot(q, check_matrix[k][2:5]
                                                  - check_matrix[k][5:8])))
        # each entry of k_list has one value per q vector, so sum over the rows only
        matrices[:, i, j] = np.sum(k_list, axis=0)
NOTE: I changed your indices to have consistent zero-based indexing.
Here is another approach where I replaced the k-loop with a vectorized version:
for i in range(2):
    for j in range(2):
        # boolean mask selecting the rows that belong to pair (i, j)
        k = np.logical_and(check_matrix[:, 0] == i, check_matrix[:, 1] == j)
        temp = np.dot(check_matrix[k, 2:5] - check_matrix[k, 5:8], q[:, :, np.newaxis])[..., 0]
        temp = check_matrix[k, 8:] * np.exp(-1J * temp)
        matrices[:, i, j] = np.sum(temp, axis=0)
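For comparison, here is a rough sketch that also removes the i and j loops by letting np.add.at accumulate all rows that share a pair. It is not part of the answer above; the names rows, cols, delta, phase, terms and acc are illustrative, and it starts from the original one-based check_matrix.

```python
import numpy as np

n = 20
q = np.zeros((n, 3))
q[:, -1] = np.linspace(0, 10, n)

check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
                         [1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
                         [1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
                         [2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
                         [2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
                         [2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])

# Zero-based group indices taken from the first two columns.
rows = check_matrix[:, 0].astype(int) - 1
cols = check_matrix[:, 1].astype(int) - 1

# Dot product of every q vector with every row's difference vector: shape (n, m).
delta = check_matrix[:, 2:5] - check_matrix[:, 5:8]
phase = q @ delta.T
terms = check_matrix[:, 8] * np.exp(-1j * phase)

# np.add.at accumulates correctly even when an (i, j) pair occurs several times.
acc = np.zeros((2, 2, n), dtype=complex)
np.add.at(acc, (rows, cols), terms.T)
matrices = np.moveaxis(acc, -1, 0)  # one 2x2 matrix per q vector

print(matrices.shape)
print(matrices[0])
```

The first q vector is all zeros, so matrices[0] reduces to the plain per-pair sums of the last column, which gives a quick sanity check.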
3 line solution
You asked for an efficient solution in your original title, so how about this 3-liner, which avoids nested loops and if statements and is thus hopefully faster?
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
grp=np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[np.sum(x) for x in grp]
output:
[-0.23872600000000002, 1.126557, 0.023742000000000003, 0.21394]
How does it work?
I combine the first two columns into a single index, treating each as "bits" (i.e. base 2)
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
(If you have indices that exceed 2, you can still use this technique, but you will need a different base to combine the columns. For example, if your indices go from 1 to 18, you would need to multiply column 0 by a number equal to or larger than 18 instead of 2.)
So the result of the first line is
array([0., 0., 1., 2., 2., 3.])
Note also that this assumes the data is ordered so that rows with the same combined index are contiguous (one column changes fastest). If this is not the case, you will need an extra step to sort the index and the original check_matrix. In your example the data is ordered.
The next step groups the data according to the index, and uses the solution posted here.
np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[array([-0.243293, 0.004567]), array([1.126557]), array([ 0.038934, -0.015192]), array([0.21394])]
i.e. it outputs the column at index 8 of check_matrix, split according to the grouping of fac.
The last line then simply sums those groups. Knowing how the first two columns were combined to give the single index allows you to map the result back. Or you could simply append fac to check_matrix as an extra column if you wanted.
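If the rows are not ordered, one possible way to get the same grouped sums without sorting by hand is np.unique with return_inverse combined with np.bincount. This is only a sketch of an alternative, not the approach described above; grouped_sums is a hypothetical helper name, and the rows are shuffled on purpose to show that order does not matter.

```python
import numpy as np

check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
                         [1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
                         [1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
                         [2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
                         [2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
                         [2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])

def grouped_sums(data):
    """Sum the column at index 8 per (column 0, column 1) pair, in any row order."""
    fac = 2 * (data[:, 0] - 1) + (data[:, 1] - 1)
    # inverse[k] is the position of fac[k] in the sorted unique keys.
    keys, inverse = np.unique(fac, return_inverse=True)
    sums = np.bincount(inverse.ravel(), weights=data[:, 8])
    return keys, sums

rng = np.random.default_rng(0)
shuffled = check_matrix[rng.permutation(len(check_matrix))]

keys, sums = grouped_sums(shuffled)
print(keys)
print(sums)
```

The keys array tells you which combined index each sum belongs to, so the mapping back to pairs is the same as for fac.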

How to find the nearest neighbor in numpy?

There are two arrays, u and v.
u.shape = (N,d)
v.shape = (q,d)
I need to find, for every row of v, the index of the nearest value in u along each of the d columns.
For example:
u = [[5,3],
[3,4],
[3,2],
[8,7]] , shape (4,2)
v = [[1,3],
[2,4]] , shape (2,2)
and I found that many people suggest doing this:
v = np.expand_dims(v, axis=1) # reshape to (2,1,2) for broadcasting
result = np.argmin(abs(v-u),axis=1) # (v-u).shape = (2,4,2)
Of course this finds the index of the nearest value. But when there are two equally near values, I need to take the index of the second one.
In that case:
v-u = [[[-4, 0],
[-2, -1],
[-2, 1],
[-7, -4]],
[[-3, 1],
[-1, 0],
[-1, 2],
[-6, -3]]]
Along axis=1, there are two -2 in (v-u)[0,:,0] and two -1 in (v-u)[1,:,0].
If we directly use:
result = np.argmin(abs(v-u),axis=1)
result will be:
array([[1, 0],
[1, 1]], dtype=int64)
It returns the indices corresponding to the first occurrence, but I need the second one, i.e.
array([[2, 0],
[2, 1]], dtype=int64)
Can anyone help? Thanks!
If there can be at most 2 minimal values, you can retrieve the index of the last minimum.
To do it:
1. Reverse abs(v-u) along axis 1.
2. Compute argmin, getting a "reversed_index" (actually the index in the reversed array).
3. Map back to "original" indices using the formula u.shape[0] - 1 - <reversed_index> (in your case of 4 rows, reversed index == 3 corresponds to original index == 0).
The whole code is:
u.shape[0] - 1 - np.argmin(abs(v-u)[:,::-1,:],axis=1)
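A minimal runnable sketch of the reversal trick on the arrays from the question; the names dist and last_min are illustrative, and v is broadcast with np.newaxis instead of being reshaped in place.

```python
import numpy as np

u = np.array([[5, 3],
              [3, 4],
              [3, 2],
              [8, 7]])
v = np.array([[1, 3],
              [2, 4]])

# Broadcast v against u: dist has shape (2, 4, 2).
dist = np.abs(v[:, np.newaxis, :] - u)

# argmin on the reversed axis finds the last minimum; map it back to
# an index into the original (non-reversed) axis.
last_min = u.shape[0] - 1 - np.argmin(dist[:, ::-1, :], axis=1)
print(last_min)
```

With at most two tied minima per column, the last minimum is also the second one, which is what the question asks for.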
Another option, when there can be more than 2 minimal values, is to write a specialized version of argmin for a 1-D input array, returning the index of the second minimal value if there is more than one:
def argmin2(arr):
    ind = arr.argpartition(1)[:2]
    return ind[0] if arr[ind[0]] < arr[ind[1]] else ind[1]
and then apply it to abs(v-u) along axis 1:
np.apply_along_axis(argmin2, 1, abs(v-u))
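One caveat: argpartition does not promise any particular order among tied elements, so with ties argmin2 returns some minimal position, not necessarily the second one in the original order. A slower but explicit variant is sketched below; the name argmin_second is hypothetical and this is an alternative formulation, not the code from the answer.

```python
import numpy as np

def argmin_second(arr):
    """Index of the second occurrence of the minimum, or of the only one if unique."""
    positions = np.flatnonzero(arr == arr.min())
    return positions[1] if positions.size > 1 else positions[0]

u = np.array([[5, 3],
              [3, 4],
              [3, 2],
              [8, 7]])
v = np.array([[1, 3],
              [2, 4]])

dist = np.abs(v[:, np.newaxis, :] - u)  # shape (2, 4, 2)
result = np.apply_along_axis(argmin_second, 1, dist)
print(result)
```

Since flatnonzero returns positions in increasing order, the choice of the second occurrence is deterministic even with three or more ties.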

Reduce sum with condition in tensorflow

I am given a 2D Tensor with stochastic rows. After applying tf.math.greater() and tf.cast(tf.int32) I am left with a Tensor of 0's and 1's. I now want to apply a reduce sum to that matrix, but with a condition: once at least one 1 has been summed and a 0 follows, I want to ignore all following 1's as well, meaning 1 0 1 should result in 1 instead of 2.
I have tried to solve the problem with tf.scan(), but I have not yet been able to come up with a function that handles leading 0's, because a row might look like: 0 0 0 1 0 1
One idea was to set the lower part of the matrix to one (because I know everything left of the diagonal will always be 0) and then have a function like tf.scan() run to filter out the spots (see code and error message below).
Let z be the matrix after tf.cast.
helper = tf.matrix_band_part(tf.ones_like(z), -1, 0)
z = tf.math.logical_or(tf.cast(z, tf.bool), tf.cast(helper,tf.bool))
z = tf.cast(z, tf.int32)
z = tf.scan(lambda a, x: x if a == 1 else 0 ,z)
Resulting in:
ValueError: Incompatible shape for value ([]), expected ([5])
IIUC, this is one way to do what you want without scanning or looping. It may be a bit convoluted, and is actually iterating the columns twice (one cumsum and one cumprod), but being vectorized operations I think it is probably faster. Code is TF 2.x but runs the same in TF 1.x (except for the last line obviously).
import tensorflow as tf
# Example data
a = tf.constant([[0, 0, 0, 0],
[1, 0, 0, 0],
[0, 1, 1, 0],
[0, 1, 0, 1],
[1, 1, 1, 0],
[1, 1, 0, 1],
[0, 1, 1, 1],
[1, 1, 1, 1]])
# Cumsum columns
c = tf.math.cumsum(a, axis=1)
# Find the points where we should not sum anymore (a zero that comes after at least one 1)
cutoff = tf.equal(a, 0) & tf.not_equal(c, 0)
# Make mask
mask = tf.math.cumprod(tf.dtypes.cast(~cutoff, tf.uint8), axis=1)
# Compute result
result = tf.reduce_max(c * tf.dtypes.cast(mask, c.dtype), axis=1)
print(result.numpy())
# [0 1 2 1 3 2 3 4]
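To make the masking logic easier to follow outside TensorFlow, here is the same cumsum and cumprod idea sketched in plain NumPy. The function name first_run_sum is hypothetical, and it assumes a 2D input that contains only 0s and 1s.

```python
import numpy as np

def first_run_sum(a):
    """Per row, count the ones in the first run of ones (leading zeros allowed)."""
    a = np.asarray(a)
    c = np.cumsum(a, axis=1)
    # A zero that appears after at least one 1 ends the run.
    cutoff = (a == 0) & (c != 0)
    # The cumulative product stays 1 until the first cutoff, then 0 forever.
    mask = np.cumprod((~cutoff).astype(int), axis=1)
    return (c * mask).max(axis=1)

a = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 1],
              [0, 1, 1, 1],
              [1, 1, 1, 1]])

result = first_run_sum(a)
print(result)
```

The maximum of the masked cumulative sum is the value the sum had reached just before the run was cut off.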

How to create in one line a null vector of size 10 but the fifth value being 1 using numpy

I am able to do it in two lines for the numpy module:
x=np.zeros(10)
x[4]=1
However, I was wondering if it's possible to combine the two into one line.
There are multiple ways to do this. For example, np.arange(10) == 4 gives you an array of all False values except for one True at position 4.
Under the covers, NumPy's bool values are just 0 and 1 as uint8 (just like Python's bool values are 0 and 1, although of a unique integral type), so you can just use it as-is in any expression:
>>> np.arange(10) == 4
array([False, False, False, False, True, False, False, False, False, False], dtype=bool)
>>> (np.arange(10) == 4) * 1
array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])
>>> (np.arange(10) == 4) + 23
array([23, 23, 23, 23, 24, 23, 23, 23, 23, 23])
… or view it as uint8 instead of bool:
>>> (np.arange(10) == 4).view(np.uint8)
array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0], dtype=uint8)
… or, if you want normal int values, you can convert it:
>>> (np.arange(10) == 4).astype(int)
array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])
And so on.
However, this seems a lot less readable, and it's also about 20% slower in a quick test, so unless you're doing this for code-golfing reasons, why?
x = numpy.array([0,0,0,0,1,0,0,0,0,0])
:P
b=np.array([int(x==4) for x in range(10)])
print(b)
[0 0 0 0 1 0 0 0 0 0]
Use this
np.where(np.arange(10)==4,1,0)
I don't think that it is possible to combine:
x=np.zeros(10) #1 create an array of values
x[4]=1 #2 assign at least one value of an array a different value
in one line.
Naive:
x = np.zeros(10)[4] = 1 # Fails
This fails because x ends up as 1: Python's chained assignment assigns the value 1 both to x and to element 4 of a temporary array of zeros, which is then discarded.
Therefore, we need to first create an array of zeros and then assign element 4 a value of 1, and these two steps cannot be done in one line with plain assignment.
If you need to do this to multiple elements:
my_vect = np.zeros(42)
# set index 1 and 3 to value 1
my_vect[np.array([1,3])] = 1
Create a null vector of size 10 in which the fifth value is 1.
import numpy as np
x = np.zeros(10, dtype=int)
if x[4] == 0: # the fifth value has index 4
    x[4] = 1
print(x)
array = numpy.eye( array_size )[ element_in_array_which_should_be_1 - 1 ]
So to create a null vector of size 10 but the fifth value being 1 in one line is
array = numpy.eye( 10 ) [ 5 - 1 ]
===> array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])
:)
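As a quick sanity check, the sketch below builds the vector with several of the one-liners suggested in this thread and compares each against the two-line version. The dictionary name candidates and the float casts are choices made here so that all results share a dtype.

```python
import numpy as np

# Reference: the original two-line version.
expected = np.zeros(10)
expected[4] = 1

# One-line alternatives mentioned above, cast to float for comparison.
candidates = {
    "comparison": (np.arange(10) == 4).astype(float),
    "where": np.where(np.arange(10) == 4, 1.0, 0.0),
    "eye": np.eye(10)[5 - 1],
}

all_equal = all(np.array_equal(expected, c) for c in candidates.values())
print(all_equal)
```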

Finding the row with the highest average in a numpy array

Given the following array:
complete_matrix = numpy.array([
[0, 1, 2, 4],
[1, 0, 3, 5],
[2, 3, 0, 6],
[4, 5, 6, 0]])
I would like to identify the row with the highest average, excluding the diagonal zeros.
So, in this case, I would be able to identify complete_matrix[3] as being the row with the highest average.
Note that the presence of the zeros doesn't affect which row has the highest mean because all rows have the same number of elements. Therefore, we just take the mean of each row, and then ask for the index of the largest element.
#Take the mean along the 1st index, ie collapse into a Nx1 array of means
means = np.mean(complete_matrix, 1)
#Now just get the index of the largest mean
idx = np.argmax(means)
idx is now the index of the row with the highest mean!
You don't need to worry about the 0s; they shouldn't affect how the averages compare, since there will presumably be one in each row. Hence, you can do something like this to get the index of the row with the highest average:
>>> import numpy as np
>>> complete_matrix = np.array([
... [0, 1, 2, 4],
... [1, 0, 3, 5],
... [2, 3, 0, 6],
... [4, 5, 6, 0]])
>>> np.argmax(np.mean(complete_matrix, axis=1))
3
Reference:
numpy.mean
numpy.argmax
As pointed out by a lot of people, the presence of zeros isn't an issue as long as you have the same number of zeros in each row. Just in case your intention was to ignore all the zeros, preventing them from participating in the average computation, you could use weights to suppress the contribution of the zeros. The following solution assigns a weight of 0 to zero entries and 1 otherwise (axis=1 averages over each row; the example matrix is symmetric, so axis=0 gives the same result here):
numpy.argmax(numpy.average(complete_matrix,axis=1, weights=complete_matrix!=0))
You can always create a weight matrix where the weight is 0 for diagonal entries, and 1 otherwise.
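A small sketch of that last suggestion, assuming a square matrix: the weight matrix is 1 everywhere except on the diagonal, so only the diagonal entries are excluded regardless of their value. The names weights, row_means and best_row are illustrative.

```python
import numpy as np

complete_matrix = np.array([[0, 1, 2, 4],
                            [1, 0, 3, 5],
                            [2, 3, 0, 6],
                            [4, 5, 6, 0]])

n = complete_matrix.shape[0]
# Weight 0 on the diagonal, 1 everywhere else.
weights = 1.0 - np.eye(n)

# Weighted average over each row, ignoring the diagonal entry.
row_means = np.average(complete_matrix, axis=1, weights=weights)
best_row = int(np.argmax(row_means))
print(row_means, best_row)
```

Unlike weights=complete_matrix!=0, this keeps legitimate off-diagonal zeros in the average.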
You will see that this answer actually would fit better to your other question that was marked as duplicated to this one (and don't know why because it is not the same question...)
The presence of zeros can indeed affect the columns' or rows' average, for instance:
a = np.array([[ 0, 1, 0.9, 1],
[0.9, 0, 1, 1],
[ 1, 1, 0, 0.5]])
Without eliminating the diagonals, it would tell that the column 3 has the highest average, but eliminating the diagonals the highest average belongs to column 1 and now column 3 has the least average of all columns!
You can correct the calculated mean using the lcm (least common multiple) of the number of rows with and without the diagonal element, making sure that the correction is not applied to a column that has no diagonal element:
correction = column_sum/lcm(len(column), len(column)-1)
new_mean = mean + correction
This works because sum/(n-1) = sum/n + sum/(n*(n-1)), and lcm(n, n-1) = n*(n-1) since consecutive integers share no common factor.
I copied the algorithm for lcm from this answer and proposed a solution for your case:
import numpy as np
def gcd(a, b):
    """Return greatest common divisor using Euclid's Algorithm."""
    while b:
        a, b = b, a % b
    return a
def lcm(a, b):
    """Return lowest common multiple."""
    return a * b // gcd(a, b)
def mymean(a):
    # tmp is 1 for columns that contain a diagonal element, 0 otherwise
    if len(a.diagonal()) < a.shape[1]:
        tmp = np.hstack((a.diagonal()*0+1,0))
    else:
        tmp = a.diagonal()*0+1
    return np.mean(a, axis=0) + np.sum(a,axis=0)*tmp/lcm(a.shape[0],a.shape[0]-1)
Testing with the a given above:
mymean(a)
#array([ 0.95 , 1. , 0.95 , 0.83333333])
With another example:
b = np.array([[ 0, 1, 0.9, 0],
[0.9, 0, 1, 1],
[ 1, 1, 0, 0.5],
[0.9, 0.2, 1, 0],
[ 1, 1, 0.7, 0.5]])
mymean(b)
#array([ 0.95, 0.8 , 0.9 , 0.5 ])
With the corrected average you just use np.argmax() to get the column index with the highest average. Similarly, np.argmin() to get the index of the column with the least average:
np.argmin(mymean(a))
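As a cross-check, the same column means can be obtained by masking the diagonal with a NumPy masked array. This is a sketch of an alternative technique, not the lcm-based correction above; mean_without_diagonal is a hypothetical helper name.

```python
import numpy as np

def mean_without_diagonal(a):
    """Column means of a 2D array, skipping the entries a[i, i]."""
    a = np.asarray(a, dtype=float)
    # True on the main diagonal, also for non-square arrays.
    mask = np.eye(*a.shape, dtype=bool)
    return np.ma.masked_array(a, mask=mask).mean(axis=0).filled(np.nan)

a = np.array([[0, 1, 0.9, 1],
              [0.9, 0, 1, 1],
              [1, 1, 0, 0.5]])
b = np.array([[0, 1, 0.9, 0],
              [0.9, 0, 1, 1],
              [1, 1, 0, 0.5],
              [0.9, 0.2, 1, 0],
              [1, 1, 0.7, 0.5]])

means_a = mean_without_diagonal(a)
means_b = mean_without_diagonal(b)
print(means_a, np.argmax(means_a))
print(means_b, np.argmax(means_b))
```

The masked approach also copes with arrays that have more than one extra column, a case the hstack in mymean does not cover.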
