find position of last element before specific value in python numpy - python

I have the following Python numpy array, arr:
([1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L])
I can find the occurrences of 1 like this:
np.where(arr.squeeze() == 1)[0]
How do I find the position of the last 1 before either a 0 or a 3?

Here's one approach using np.where and np.in1d -
# Get the indices of places with 0s or 3s and this
# decides the last index where we need to look for 1s later on
last_idx = np.where(np.in1d(arr,[0,3]))[0][-1]
# Get all indices of 1s within the range of last_idx and choose the last one
out = np.where(arr[:last_idx]==1)[0][-1]
Please note that when no indices are found, indexing with something like [0][-1] raises an error about the empty result, so those lines need to be wrapped in error-checking code.
Sample run -
In [118]: arr
Out[118]: array([1, 1, 3, 0, 3, 2, 0, 1, 2, 1, 0, 2, 2, 3, 2])
In [119]: last_idx = np.where(np.in1d(arr,[0,3]))[0][-1]
In [120]: np.where(arr[:last_idx]==1)[0][-1]
Out[120]: 9
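For completeness, here is a minimal sketch of that error checking (my addition, not part of the answer above), using the array from the question:
import numpy as np

arr = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 3, 3, 3, 2, 2, 2,
                2, 2, 2, 1, 1, 1, 1])

stops = np.where(np.in1d(arr, [0, 3]))[0]      # indices of 0s or 3s
if stops.size:
    ones = np.where(arr[:stops[-1]] == 1)[0]   # 1s before the last 0/3
    out = ones[-1] if ones.size else None
else:
    out = None
print(out)  # 6 for the array above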

You can use a rolling window and search that for the values you want:
import numpy as np
arr = np.array([1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L])
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
match_1_0 = np.all(rolling_window(arr, 2) == [1, 0], axis=1)
match_1_3 = np.all(rolling_window(arr, 2) == [1, 3], axis=1)
all_matches = np.logical_or(match_1_0, match_1_3)
print(np.flatnonzero(all_matches)[-1])
Depending on your arrays, this might be good enough performance-wise. That said, a less flexible (but simpler) solution might perform better, even though it is the kind of index loop you usually want to avoid with numpy:
def last_one_before_stop(arr):  # wrapped in a function (name is arbitrary) so the early `return` is valid
    for ix in xrange(len(arr) - 2, -1, -1):  # range in python3.x
        if arr[ix] == 1 and (arr[ix + 1] == 0 or arr[ix + 1] == 3):
            return ix
You might even be able to do something like the following, which is probably a bit more flexible than the hard-coded solution above and (I would guess) would still probably outperform the rolling-window solution:
def rfind(haystack, needle):
    len_needle = len(needle)
    for ix in xrange(len(haystack) - len_needle, -1, -1):  # range in python3.x
        if (haystack[ix:ix + len_needle] == needle).all():
            return ix
Here, you'd do something like:
max(rfind(arr, np.array([1, 0])), rfind(arr, np.array([1, 3])))
And of course, with all of these answers, I haven't actually handled the case where the thing you are searching for isn't present since you didn't specify what you would want for that case...
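If the missing case does matter, one option is to give rfind a sentinel return value; a small sketch (my addition, with a hypothetical not_found default):
def rfind(haystack, needle, not_found=-1):
    len_needle = len(needle)
    for ix in range(len(haystack) - len_needle, -1, -1):
        if (haystack[ix:ix + len_needle] == needle).all():
            return ix
    return not_found  # sentinel: the needle never occurs

# still -1 if neither pattern is present, so check the result before using it
last = max(rfind(arr, np.array([1, 0])), rfind(arr, np.array([1, 3])))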

Related

Generating a new array based on certain criterion in Python

I have an array X and a list T2. I want to create a new array Xnew such that elements of X are placed according to locations specified in T2. I present the current and expected outputs.
import numpy as np
X=np.array([4.15887486e+02, 3.52446375e+02, 2.81627790e+02, 1.33584716e+02,
6.32045703e+01, 2.07514659e+02, 1.00000000e-24])
T2=[0, 3, 5, 8, 9, 10, 11]
def make_array(indices, values):
    rtrn = np.zeros(np.max(indices) + 1, dtype=values.dtype)
    rtrn[indices] = values
    return
Xnew = np.array([make_array(Ti, Xi) for Ti, Xi in zip([T2], X)], dtype=object)
print("New X =",[Xnew])
The current output is
New X = [array([None], dtype=object)]
The expected output is
[array([[4.15887486e+02, 0.0, 0.0, 3.52446375e+02, 0.0,
2.81627790e+02, 0.0, 0.0, 1.33584716e+02,
6.32045703e+01, 2.07514659e+02, 1.00000000e-24]],
dtype=object)]
You basically have what you need, but you are calling your function in a very strange way. The function works with numpy arrays / lists as input; you don't need to pass in individual elements (and note that it has to return rtrn rather than a bare return):
X = np.arange(5)
ind = np.asarray([1, 4, 3, 2, 10])

def make_array(indices, values):
    rtrn = np.zeros(np.max(indices) + 1, dtype=values.dtype)
    rtrn[indices] = values
    return rtrn

make_array(ind, X)  # array([0, 0, 3, 2, 1, 0, 0, 0, 0, 0, 4])
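Applied to the data from the question, a sketch along these lines should reproduce the expected output (assuming the corrected make_array above):
X = np.array([4.15887486e+02, 3.52446375e+02, 2.81627790e+02, 1.33584716e+02,
              6.32045703e+01, 2.07514659e+02, 1.00000000e-24])
T2 = [0, 3, 5, 8, 9, 10, 11]

Xnew = make_array(np.asarray(T2), X)
# array([4.15887486e+02, 0.00000000e+00, 0.00000000e+00, 3.52446375e+02, ...])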

How to pick one of two arrays in an axis of multidimensional NumPy array with an 1D index array for that axis

I have an array with shape (n, 2, 3) as:
array = np.array([[[-0.903, -3.47, -0.946], [-0.883, -3.48, -0.947]],
[[-1.02, -3.45, -0.992], [-1.01, -3.46, -1]],
[[-1.02, -3.45, -0.992], [-0.998, -3.45, -1]],
[[-0.638, -3.5, -0.897], [-0.604, -3.51, -0.896]],
[[-0.596, -3.52, -0.896], [-0.604, -3.51, -0.896]]])
and an index array for the second axis, in which each value selects one of the two rows of that pair, e.g. for [[-0.903, -3.47, -0.946], [-0.883, -3.48, -0.947]], if the corresponding value in the index array is 1, then [-0.883, -3.48, -0.947] must be taken:
indices = np.array([0, 1, 0, 0, 1], dtype=np.int64)
the resulting array must be as below, with shape (n, 3):
[[-0.903, -3.47, -0.946],
 [-1.01, -3.46, -1],
 [-1.02, -3.45, -0.992],
 [-0.638, -3.5, -0.897],
 [-0.604, -3.51, -0.896]]
How could I do this along a specified dimension using just NumPy?
In NumPy you can combine index arrays along two dimensions. If you do arr[idx_x, idx_y], where idx_x and idx_y are 1-d arrays of the same length, you get the array of elements [arr[idx_x[0], idx_y[0]], arr[idx_x[1], idx_y[1]], arr[idx_x[2], idx_y[2]], ...].
In your example if you do:
indices = np.array([0, 1, 0, 0, 1], dtype=np.int64)
x_idxs = np.arange(len(indices), dtype=int)
print(array[x_idxs, indices])
This returns the result you want.
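If you would rather name the axis explicitly instead of building the companion index array, np.take_along_axis can also be used; a small sketch (my addition, assuming array and indices as defined in the question):
# indices must have the same number of dimensions as `array`, hence the two
# added singleton axes; the (n, 1, 3) result is then squeezed back to (n, 3)
picked = np.take_along_axis(array, indices[:, None, None], axis=1).squeeze(axis=1)
print(picked)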
Try with a for loop:
out = []
for i in range(len(indices)):
    out.append(list(array[i, indices[i]]))
print(out)
Output:
[[-0.903, -3.47, -0.946],
[-1.01, -3.46, -1.0],
[-1.02, -3.45, -0.992],
[-0.638, -3.5, -0.897],
[-0.604, -3.51, -0.896]]

Calculate the sum of the Nth column of a numpy array, grouped by the indices in the first two columns?

I would like to loop over the following check_matrix in such a way that the code recognizes whether the first and second elements are 1 and 1, 1 and 2, and so on. Then, for each separate class of pair, i.e. (1,1), (1,2) or (2,2), the code should store in new matrices the sum of the last element (which in this case has index 8) times exp(-i*q·(check_matrix[k][2:5] - check_matrix[k][5:8])), where i is the imaginary unit, k is the running index over check_matrix, and q is a vector defined below. There are 20 q vectors.
import numpy as np

q = []
for i in np.linspace(0, 10, 20):
    q.append(np.array((0, 0, i)))
q = np.array(q)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
[1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
[1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
[2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
[2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
[2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
This means that in principle I should end up with 20 matrices of shape 2x2, one for each q vector.
At the moment my code gives only a single matrix, which appears to be the last one, even though I am appending to Matrices. My code looks like this:
for i in range(2):
    i = i + 1
    for j in range(2):
        j = j + 1
        j_list = []
        Matrices = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                j_list.append(check_matrix[k][8] * np.exp(-1J * np.dot(q, (np.subtract(check_matrix[k][2:5], check_matrix[k][5:8])))))
                j_11 = np.sum(j_list)
                I_matrix[i - 1][j - 1] = j_11
                Matrices.append(I_matrix)
I_matrix is defined as below:
I_matrix= np.zeros((2,2),dtype=np.complex_)
At the moment I get the following output.
Matrices = [array([[-0.66071446-0.77603624j, -0.29038112+2.34855023j], [-0.31387562-0.08116629j, 4.2788 +0.j ]])]
But I want a matrix for each q value, i.e. 20 matrices in total in this case, where each element of a 2x2 matrix contains the sums belonging to the (1,1), (1,2), (2,1) and (2,2) pairs, arranged as
array([[11., 12.],
[21., 22.]])
I would highly appreciate your suggestions for correcting it. Thanks in advance!
I am pretty sure you can solve this problem in an easier way, and I am not 100% sure that I understood you correctly, but here is some code that does what I think you want. If you have a way to check whether the results are valid, I would suggest you do so.
import numpy as np

n = 20
q = np.zeros((n, 3))
q[:, -1] = np.linspace(0, 10, n)

check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
                         [1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
                         [1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
                         [2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
                         [2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
                         [2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
check_matrix[:, :2] -= 1  # python indexing is zero based

matrices = np.zeros((n, 2, 2), dtype=np.complex_)
for i in range(2):
    for j in range(2):
        k_list = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                k_list.append(check_matrix[k][8] *
                              np.exp(-1J * np.dot(q, check_matrix[k][2:5]
                                                     - check_matrix[k][5:8])))
        matrices[:, i, j] = np.sum(k_list, axis=0)
NOTE: I changed your indices to have consistent zero-based indexing.
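With this layout, matrices[m] is the 2x2 matrix that belongs to q[m]. A small usage sketch (my addition): for q[0] = [0, 0, 0] the exponential factor is 1, so the entries are simply the per-pair sums of the last column of check_matrix:
# roughly [[-0.2387+0j, 1.1266+0j], [0.0237+0j, 0.2139+0j]] for the data above
print(matrices[0])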
Here is another approach where I replaced the k-loop with a vectorized version:
for i in range(2):
    for j in range(2):
        k = np.logical_and(check_matrix[:, 0] == i, check_matrix[:, 1] == j)
        temp = np.dot(check_matrix[k, 2:5] - check_matrix[k, 5:8], q[:, :, np.newaxis])[..., 0]
        temp = check_matrix[k, 8:] * np.exp(-1J * temp)
        matrices[:, i, j] = np.sum(temp, axis=0)
3 line solution
You asked for an efficient solution in your original title, so how about this three-liner that avoids nested loops and if statements, and is thus hopefully faster?
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
grp=np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[np.sum(x) for x in grp]
output:
[-0.23872600000000002, 1.126557, 0.023742000000000003, 0.21394]
How does it work?
I combine the first two columns into a single index, treating each as "bits" (i.e. base 2)
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
( If you have indexes that exceed 2, you can still use this technique but you will need to use a different base to combine the columns. i.e. if your indices go from 1 to 18, you would need to multiply column 0 by a number equal to or larger than 18 instead of 2. )
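A sketch of that generalisation (my addition; the base of 18 is hypothetical and just needs to be at least as large as the range of the second column):
base = 18  # assuming both index columns run from 1 to 18
fac = base * (check_matrix[:, 0] - 1) + (check_matrix[:, 1] - 1)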
So the result of the first line is
array([0., 0., 1., 2., 2., 3.])
Note as well that this assumes the data is ordered, with one column changing fastest; if that is not the case you will need an extra step to sort the index and the original check_matrix. In your example the data is ordered.
The next step groups the data according to the index, and uses the solution posted here.
np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[array([-0.243293, 0.004567]), array([1.126557]), array([ 0.038934, -0.015192]), array([0.21394])]
i.e. it splits column 8 (the last column) of check_matrix according to the grouping given by fac
Then the last line simply sums those groups. Knowing how the first two columns were combined to give the single index allows you to map the result back. Or you could simply add it to check_matrix as an extra column if you wanted.
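For example, since the combined index here runs over the pairs (1,1), (1,2), (2,1), (2,2) in order, a reshape recovers the 2x2 layout (a sketch, my addition, assuming all four pairs occur in the data):
sums = np.array([np.sum(x) for x in grp])
print(sums.reshape(2, 2))
# [[-0.238726  1.126557]
#  [ 0.023742  0.21394 ]]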

R seems weird, Python works fine doing the same thing

The following code produces an unexpected result, which I find somewhat weird. First, define featfun():
featfun <- function(yi_1, yi, i) {
  all_fea <- list(c(1, 2, 2),
                  c(1, 2, 3),
                  c(1, 1, 2),
                  c(2, 1, 3),
                  c(2, 1, 2),
                  c(2, 2, 3),
                  c(1, 1),
                  c(2, 1),
                  c(2, 2),
                  c(1, 2),
                  c(1, 3),
                  c(2, 3))
  weights <- c(1, 1, 0.6, 1, 1, 0.2, 1, 0.5, 0.5, 0.8, 0.8, 0.5)
  idx1 <- 0; idx2 <- 0
  if (list(c(yi_1, yi, i)) %in% all_fea) {
    idx1 <- which(all_fea %in% list(c(yi_1, yi, i)))
  }
  if (list(c(yi, i)) %in% all_fea) {
    idx2 <- which(all_fea %in% list(c(yi, i)))
  }
  if (idx1 != 0 & idx2 != 0) {
    return(list(c(1, weights[idx1]), c(1, weights[idx2])))
  } else if (idx1 != 0 & idx2 == 0) {
    return(list(c(1, weights[idx1])))
  } else if (idx1 == 0 & idx2 != 0) {
    return(list(c(1, weights[idx2])))
  } else {
    return(NA)
  }
}
> featfun(1,1,2)
[[1]]
[1] 1.0 0.6
[[2]]
[1] 1.0 0.8
I combine the featfun() with for loops:
> for (k in seq(2,3)) {
+ cat("k=",k,"\n")
+ for (i in seq(1, 2)) {
+ cat("i=", i,"\n")
+ print(featfun(1, i, k))
+ }
+ }
k= 2
i= 1
[[1]]
[1] 1.0 0.6
i= 2
[[1]]
[1] 1 1
[[2]]
[1] 1.0 0.5
k= 3
i= 1
[[1]]
[1] 1.0 0.8
i= 2
[[1]]
[1] 1 1
As we can see, when k = 2 and i = 1, only the first element "[1] 1.0 0.6" is returned and the second element is missing; this is not the same as the result of featfun(1, 1, 2).
Furthermore, I rewrote the code in Python. The following is the Python version:
def featfun(yi_1, yi, i):
    all_fea = [
        [1, 2, 2],
        [1, 2, 3],
        [1, 1, 2],
        [2, 1, 3],
        [2, 1, 2],
        [2, 2, 3],
        [1, 1],
        [2, 1],
        [2, 2],
        [1, 2],
        [1, 3],
        [2, 3]]
    weights = [1, 1, 0.6, 1, 1, 0.2, 1, 0.5, 0.5, 0.8, 0.8, 0.5]
    idx1 = 999
    idx2 = 999
    if [yi_1, yi, i] in all_fea:
        idx1 = all_fea.index([yi_1, yi, i])
    if [yi, i] in all_fea:
        idx2 = all_fea.index([yi, i])
    if (idx1 != 999) & (idx2 != 999):
        return [[1, weights[idx1]], [1, weights[idx2]]]
    elif (idx1 != 999) & (idx2 == 999):
        return [1, weights[idx1]]
    elif (idx1 == 999) & (idx2 != 999):
        return [1, weights[idx2]]
    else:
        return None
featfun(1,1,2) returns [[1, 0.6], [1, 0.8]].
Then I combine the Python featfun with for loops again:
for k in [2, 3]:
    for i in [1, 2]:
        print(featfun(1, i, k))
The following are the printed results; they are correct and match the answer in the textbook.
[[1, 0.6], [1, 0.8]]
[[1, 1], [1, 0.5]]
[1, 0.8]
[[1, 1], [1, 0.5]]
What is happening with my R code? Or is something wrong in R?
I hope someone can help me! Thanks!
Okay, I'm not fully sure why this issue is coming up, but it's a numeric type issue. When you use seq(1, 2) or seq(2, 3) the values are integers, while the all_fea list holds doubles, and for some reason (this is unusual) the matching isn't working because of that. If you make the all_fea list items integers, then it works:
all_fea <- list(c(1L, 2L, 2L),
c(1L, 2L, 3L),
c(1L, 1L, 2L),
c(2L, 1L, 3L),
c(2L, 1L, 2L),
c(2L, 2L, 3L),
c( 1L, 1L),
c( 2L, 1L),
c( 2L, 2L),
c( 1L, 2L),
c( 1L, 3L),
c( 2L, 3L))
The above is the manual way. Alternatively, you could leave it as-is and add the line all_fea = lapply(all_fea, as.integer). Anyway, after that change your loop works as expected.

Making time bins bigger in a large dataset with python

I have a dataset with an array of time bins of size 1/4096 seconds against the number of photons in each time bin. Now I want to change the resolution by making the time bins a factor of 2 larger: summing pairs of bins and taking the mean, both for the times and for the photon counts. I tried a couple of things like:
tnew = []
for n in range(int((len(t))/2)):
    tnew[n] = (t[2*n]+t[2*n+1])/2
and:
for l in range(int((len(t))/2)):
    np.append(t, (np.sum(t[2*l:4096*(2*l+1)]))/2)
but I can't seem to make this work. I'm really new to Python.
If you want to take the means of adjacent elements in a NumPy array, you can do the following:
In [2]: a = np.arange(10)
In [3]: a
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [4]: (a[:-1:2] + a[1::2])/2.
Out[4]: array([ 0.5, 2.5, 4.5, 6.5, 8.5])
Here, a[:-1:2] is all the elements at even indexes and a[1::2] is all the elements at odd indexes.
In your case, since your array's length is a power of 2, you might choose to allow binning by m = 2, 4, 8, etc. by reshaping and taking the mean along the corresponding axis:
In [5]: n = 1024
In [6]: a = np.arange(n)
In [7]: m = 8
In [8]: b = a.reshape((a.shape[0] // m, m))
In [9]: b.mean(axis=1)
Out[9]:
array([ 3.5, 11.5, 19.5, 27.5, 35.5, 43.5, 51.5,
59.5, 67.5, 75.5, 83.5, 91.5, 99.5, 107.5,
...
])
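Putting this together for the question's setup, here is a sketch (my addition; the counts array and the example data are made up) that rebins both the times and the photon counts by a factor of 2:
import numpy as np

t = np.arange(4096) / 4096.0              # example time bins of width 1/4096 s
counts = np.random.poisson(5, t.size)     # example photon counts per bin

t_new = t.reshape(-1, 2).mean(axis=1)            # mean time of each pair of bins
counts_new = counts.reshape(-1, 2).mean(axis=1)  # or .sum(axis=1) to keep totals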
