Averaging indexes of peaks if they are close in Python

This might be a simple problem but I haven't come up with a solution.
Say I have an array such as np.array([0,1,0,1,0,0,0,1,0,1,0,0,1]) with peaks at indexes [1,3,7,9,12]. How can I replace those indexes with [2,8,12], that is, average the indexes of peaks that are close together, treating peaks as separate whenever the distance between them is greater than a threshold of 2?
Please note that the binary values of the array are just for illustration, the peak value can be any real number.

You could use Raymond Hettinger's cluster function:
from __future__ import division

def cluster(data, maxgap):
    """Arrange data into groups where successive elements
    differ by no more than *maxgap*

    >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
    [[1, 6, 9], [100, 102, 105, 109], [134, 139]]

    >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
    [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]
    """
    data.sort()
    groups = [[data[0]]]
    for item in data[1:]:
        val = abs(item - groups[-1][-1])
        if val <= maxgap:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups

peaks = [1, 3, 7, 9, 12]
print([sum(arr) / len(arr) for arr in cluster(peaks, maxgap=2)])
yields
[2.0, 8.0, 12.0]
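Alternatively, since the peak indexes are already sorted, the grouping can be sketched with numpy alone: split wherever the gap to the previous peak exceeds maxgap, then average each group. A minimal sketch of that idea:

import numpy as np

peaks = np.array([1, 3, 7, 9, 12])
maxgap = 2

# Split the sorted indexes at every gap larger than maxgap,
# then average each resulting group.
groups = np.split(peaks, np.where(np.diff(peaks) > maxgap)[0] + 1)
print([g.mean() for g in groups])   # [2.0, 8.0, 12.0]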

Related

Divide list into sublist following certain pattern

Given an example list a = [311, 7426, 3539, 2077, 13, 558, 288, 176, 6, 196, 91, 54, 5, 202, 116, 95] with n = 16 elements (it will be in general a list of an even number of elements).
I wish to create n/4 lists that would be:
list1 = [311, 13, 6, 5]
list2 = [7426, 558, 196, 202]
list3 = [3539, 288, 91, 116]
list4 = [2077, 176, 54, 95]
(The solution is not simply taking an element every n, such as a[i::3] in a for loop, because values get excluded as the sliding window moves along.)
Thanks for the tips!
UPDATE:
Thanks for the solutions, which work well for this particular example. I realized however that my problem is a bit more complex: the list a is generated dynamically and can grow or shrink. Say the list grows by another group, i.e. to 20 elements. The output should then be 5 lists, following the same concept. Example:
a = [311, 7426, 3539, 2077, 1 ,13, 558, 288, 176, 1, 6, 196, 91, 54, 1, 5, 202, 116, 95, 1]
Now the output should be:
list1 = [311, 13, 6, 5]
list2 = [7426, 558, 196, 202]
list3 = [3539, 288, 91, 116]
list4 = [2077, 176, 54, 95]
list5 = [1, 1, 1, 1]
And so on for whatever size of the list.
Thanks again!
I'm assuming the length of the list a is a multiple of 4. You can use numpy for your problem.
import numpy as np

a = [311, 7426, 3539, 2077, 1, 13, 558, 288, 176, 1, 6, 196, 91, 54, 1, 5, 202, 116, 95, 1]
desired_shape = (-1, len(a) // 4)  # 4 rows of n/4 items; transposing gives n/4 lists of 4
arr = np.array(a).reshape(desired_shape).transpose().tolist()
Output:
[[311, 13, 6, 5],
 [7426, 558, 196, 202],
 [3539, 288, 91, 116],
 [2077, 176, 54, 95],
 [1, 1, 1, 1]]
Unpack the list into variables or iterate over them as desired.
Consult numpy.transpose and numpy.reshape to understand their usage.
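If numpy is not available, the same reshape-and-transpose idea can be sketched in plain Python, reusing the 20-element list a from above (and assuming, as above, that len(a) is a multiple of 4):

step = len(a) // 4
chunks = [a[i:i + step] for i in range(0, len(a), step)]   # 4 contiguous chunks
out = [list(group) for group in zip(*chunks)]              # transpose into n/4 lists

With the 20-element list this also yields [[311, 13, 6, 5], ..., [1, 1, 1, 1]].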
One option: a nested list comprehension.
Split into n/4 chunks of 4 items; using a stride of len(a)//4 keeps it correct as the list grows:
out = [[a[i + (len(a) // 4) * j] for j in range(4)]
       for i in range(len(a) // 4)]
Output:
[[311, 13, 6, 5],
 [7426, 558, 196, 202],
 [3539, 288, 91, 116],
 [2077, 176, 54, 95],
 [1, 1, 1, 1]]
Or split into 4 chunks of n/4 items each, taking every 4th element (note this is a different grouping from the one requested):
out = [[a[i + 4 * j] for j in range(len(a) // 4)]
       for i in range(4)]
Output:
[[311, 1, 176, 91, 202],
 [7426, 13, 1, 54, 116],
 [3539, 558, 6, 1, 95],
 [2077, 288, 196, 5, 1]]
To split the result into separate lists:
list1, list2, list3, list4, list5 = out
although this is not easily done programmatically for a list whose size keeps changing, and using many separate variables is not recommended anyway.

Why is meshgrid changing (x, y, z) order to (y, x, z)?

I have 3 vectors:
u = np.array([0, 100, 200, 300]) #hundreds
v = np.array([0, 10, 20]) #tens
w = np.array([0, 1]) #units
Then I used np.meshgrid to compute the sum u[i] + v[j] + w[k]:
x, y, z = np.meshgrid(u, v, w)
func1 = x + y + z
So, when (i,j,k)=(3,2,1), func1[i, j, k] should return 321, but I only get 321 if I put func1[2, 3, 1].
Why is it asking me for vector v before u? Should I use numpy.ix_ instead?
From the meshgrid docs:
Notes
-----
This function supports both indexing conventions through the indexing
keyword argument. Giving the string 'ij' returns a meshgrid with
matrix indexing, while 'xy' returns a meshgrid with Cartesian indexing.
In the 2-D case with inputs of length M and N, the outputs are of shape
(N, M) for 'xy' indexing and (M, N) for 'ij' indexing. In the 3-D case
with inputs of length M, N and P, outputs are of shape (N, M, P) for
'xy' indexing and (M, N, P) for 'ij' indexing.
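In other words, with indexing='ij' the output axes follow the input order (u, v, w), which is exactly what the question expects. A quick check with the question's arrays (a sketch):

U, V, W = np.meshgrid(u, v, w, indexing='ij')
func1 = U + V + W
func1[3, 2, 1]   # -> 321, i.e. u[3] + v[2] + w[1]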
In [109]: U, V, W = np.meshgrid(u, v, w, sparse=True)
In [110]: U
Out[110]:
array([[[  0],     # U has shape (1, 4, 1)
        [100],
        [200],
        [300]]])
In [111]: U + V + W
Out[111]:
array([[[  0,   1],
        [100, 101],
        [200, 201],
        [300, 301]],

       [[ 10,  11],
        [110, 111],
        [210, 211],
        [310, 311]],

       [[ 20,  21],
        [120, 121],
        [220, 221],
        [320, 321]]])
The result is a (3, 4, 2) array; this is the Cartesian ('xy') case described in the notes.
With the documented indexing change:
In [113]: U,V,W = np.meshgrid(u,v,w, indexing='ij',sparse=True)
In [114]: U.shape
Out[114]: (4, 1, 1)
In [115]: (U+V+W).shape
Out[115]: (4, 3, 2)
This matches the np.ix_ result you wanted:
In [116]: U,V,W = np.ix_(u,v,w)
In [117]: (U+V+W).shape
Out[117]: (4, 3, 2)
You are welcome to use either. Or even np.ogrid as mentioned in the docs.
Or even the home-brewed broadcasting:
In [118]: (u[:,None,None]+v[:,None]+w).shape
Out[118]: (4, 3, 2)
Maybe the 2d layout clarifies the two coordinates:
In [119]: Out[111][:, :, 0]
Out[119]:
array([[  0, 100, 200, 300],   # u going across - the x-axis
       [ 10, 110, 210, 310],
       [ 20, 120, 220, 320]])
In [120]: (u[:, None, None] + v[:, None] + w)[:, :, 0]
Out[120]:
array([[  0,  10,  20],   # u going down - the rows
       [100, 110, 120],
       [200, 210, 220],
       [300, 310, 320]])
For your indexing method, you need axis 0 to index the hundreds, axis 1 the tens, and axis 2 the units.
You can just transpose to swap the axes to suit your indexing method:
u = np.array([0, 100, 200, 300])  # hundreds
v = np.array([0, 10, 20, 30])     # tens
w = np.array([0, 1, 2, 3])        # units

x, y, z = np.meshgrid(w, v, u)
func1 = x + y + z
func1 = func1.transpose(2, 0, 1)
func1
#            axis 2 is 1s
#         ------------------>
array([[[  0,   1,   2,   3],
        [ 10,  11,  12,  13],    #
        [ 20,  21,  22,  23],    # axis 1 is 10s
        [ 30,  31,  32,  33]],   #

       [[100, 101, 102, 103],
        [110, 111, 112, 113],
        [120, 121, 122, 123],
        [130, 131, 132, 133]],

       [[200, 201, 202, 203],    # successive blocks:
        [210, 211, 212, 213],    # axis 0 is 100s
        [220, 221, 222, 223],
        [230, 231, 232, 233]],

       [[300, 301, 302, 303],
        [310, 311, 312, 313],
        [320, 321, 322, 323],
        [330, 331, 332, 333]]])
Testing this by indexing:
>>> func1[2, 3, 1]
231
>>> func1[3, 2, 1]
321

Finding max and min indices in lists in Python

I have a list that looks like:
trial_lst = [0.5, 3, 6, 40, 90, 130.8, 129, 111, 8, 9, 0.01, 9, 40, 90, 130.1, 112, 108, 90, 77, 68, 0.9, 8, 40, 90, 92, 130.4]
The list represents a series of experiments, each with a minimum and a maximum index. For example, in the list above, the minimum and maximum would be as follows:
Experiment 1:
Min: 0.5
Max: 130.8
Experiment 2:
Min: 0.01
Max: 130.1
Experiment 3:
Min: 0.9
Max: 130.4
I obtained the values for each experiment above because I know that each
experiment starts at around zero (such as 0.4, 0.001, 0.009, etc.) and ends at around 130 (130, 131.2, 130.009, etc.). You can imagine a nozzle turning on and off. When it turns on, the pressure rises and as it's turned off, the pressure dips. I am trying to calculate the minimum and maximum values for each experiment.
What I've tried so far is iterating through the list to first mark each index as max, but I can't seem to get that right.
Here is my code. Any suggestions on how I can change it?
result = []
for idx, item in enumerate(trial_lst):
    if idx > 0:
        prev = trial_lst[idx - 1]
        curr = item
        if prev > curr:
            result.append((curr, "max"))
        else:
            result.append((curr, ""))
I am looking for a manual way to do this, no libraries.
Use the easiest way (sort your list or array first):
trial_lst = [0.5, 3, 6, 40, 90, 130.8, 129, 111, 8, 9, 0.01, 9, 40, 90, 130.1, 112, 108, 90, 77, 68, 0.9, 8, 40, 90, 92, 130.4]
trial_lst.sort(key=float)
for count, item in enumerate(trial_lst):
    counter = count + 1
    last_object = (counter, trial_lst[count], trial_lst[(len(trial_lst) - 1) - count])
    print(last_object)
You can easily get the index of the minimum value using the following:
my_list.index(min(my_list))
Here is an interactive demonstration which may help:
>>> trial_lst = [0.5, 3, 6, 40, 90, 130.8, 129, 111, 8, 9, 0.01, 9, 40, 90, 130.1, 112, 108, 90, 77, 68, 0.9, 8, 40, 90, 92, 130.4]
Use values below 1 to identify where one experiment ends and another begins
>>> indices = [x[0] for x in enumerate(map(lambda x:x<1, trial_lst)) if x[1]]
Break list into sublists at those values
>>> sublists = [trial_lst[i:j] for i,j in zip([0]+indices, indices+[None])[1:]]
Compute max/min for each sublist
>>> for i,l in enumerate(sublists):
... print "Experiment", i+1
... print "Min", min(l)
... print "Max", max(l)
... print
...
Experiment 1
Min 0.5
Max 130.8
Experiment 2
Min 0.01
Max 130.1
Experiment 3
Min 0.9
Max 130.4
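For reference, here is the same approach as a Python 3 sketch; the zip(...)[1:] indexing above works only in Python 2, where zip returns a list:

trial_lst = [0.5, 3, 6, 40, 90, 130.8, 129, 111, 8, 9, 0.01, 9, 40, 90,
             130.1, 112, 108, 90, 77, 68, 0.9, 8, 40, 90, 92, 130.4]
# Values below 1 mark the start of a new experiment.
starts = [i for i, v in enumerate(trial_lst) if v < 1]
sublists = [trial_lst[i:j] for i, j in zip(starts, starts[1:] + [None])]
for n, run in enumerate(sublists, 1):
    print("Experiment", n, "Min", min(run), "Max", max(run))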

Best way to vectorize operation having input and output history dependence?

My goal is to vectorize the following operation in numpy,
y[n] = c1*x[n] + c2*x[n-1] + c3*y[n-1]
If n is time, I essentially need the outputs depending on previous inputs as well as previous outputs. I'm given the values of x[-1] and y[-1]. Also, this is a generalized version of my actual problem where c1 = 1.001, c2 = -1 and c3 = 1.
I could figure out the procedure to add the first two terms, simply by adding c1*x and c2*np.concatenate(([x[-1]], x[:-1])) (prepending the given initial value x[-1]), but I can't seem to figure out the best way to deal with y[n-1].
One may use an IIR filter to do this. scipy.signal.lfilter is the correct choice in this case.
For my specific constants, the following code snippet would do -
from scipy import signal
inital = signal.lfiltic([1.001,-1], [1, -1], [y_0], [x_0])
output, _ = signal.lfilter([1.001,-1], [1, -1], input, zi=inital)
Here, signal.lfiltic is used to specify the initial conditions.
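To make that concrete, here is a self-contained sketch with made-up example values (x, x_prev, and y_prev are placeholders standing in for the actual input and the given x[-1] and y[-1]), cross-checked against the direct recurrence:

from scipy import signal
import numpy as np

c1, c2, c3 = 1.001, -1.0, 1.0           # the specific constants above
x = np.array([0.5, 1.0, 2.0, 3.0])      # hypothetical input samples
x_prev, y_prev = 0.0, 0.25              # the given x[-1] and y[-1]

# y[n] - c3*y[n-1] = c1*x[n] + c2*x[n-1]  ->  b = [c1, c2], a = [1, -c3]
b, a = [c1, c2], [1.0, -c3]
zi = signal.lfiltic(b, a, [y_prev], [x_prev])
y, _ = signal.lfilter(b, a, x, zi=zi)

# Cross-check against the direct recurrence.
yp, xp = y_prev, x_prev
for n, xn in enumerate(x):
    yp = c1 * xn + c2 * xp + c3 * yp
    xp = xn
    assert np.isclose(y[n], yp)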
Just by playing around with cumsum:
First a little function to produce your expression iteratively:
def foo1(x, C):
    x = x.copy()
    for i in range(1, x.shape[0] - 1):
        # each step uses the already-updated x[i-1], i.e. the previous output
        x[i] = np.dot(x[i-1:i+2], C)
    return x[1:-1]
Make a small test array (I first worked with np.arange(10))
In [227]: y=np.arange(1,11); np.random.shuffle(y)
# array([ 4, 9, 7, 8, 2, 6, 1, 5, 10, 3])
In [229]: foo1(y,[1,2,1])
Out[229]: array([ 29, 51, 69, 79, 92, 99, 119, 142])
In [230]: y[0] + np.cumsum(2*y[1:-1] + 1*y[2:])
Out[230]: array([ 29, 51, 69, 79, 92, 99, 119, 142], dtype=int32)
and with a different C:
In [231]: foo1(y,[1,3,2])
Out[231]: array([ 45, 82, 110, 128, 148, 161, 196, 232])
In [232]: y[0]+np.cumsum(3*y[1:-1]+2*y[2:])
Out[232]: array([ 45, 82, 110, 128, 148, 161, 196, 232], dtype=int32)
I first tried:
In [238]: x=np.arange(10)
In [239]: foo1(x,[1,2,1])
Out[239]: array([ 4, 11, 21, 34, 50, 69, 91, 116])
In [240]: np.cumsum(x[:-2]+2*x[1:-1]+x[2:])
Out[240]: array([ 4, 12, 24, 40, 60, 84, 112, 144], dtype=int32)
and then realized that the x[:-2] term wasn't needed:
In [241]: np.cumsum(2*x[1:-1]+x[2:])
Out[241]: array([ 4, 11, 21, 34, 50, 69, 91, 116], dtype=int32)
If I were back in school I probably would have discovered this sort of pattern with algebra rather than numpy trial and error. It works here because the first coefficient (the one multiplying the previous output) is 1, so the recurrence telescopes into a cumulative sum of terms that depend only on the original data. It may not be general enough, but hopefully it's a start.

New array of smaller size excluding one value from each column

In Python 2.7, using numpy or by any means, if I had an array of any size and wanted to exclude certain values and output the new array, how would I do that? Here is what I would like: starting from
[(1, 2, 3),
 (4, 5, 6),
 (7, 8, 9)]
exclude [4, 2, 9] to make the array
[(1, 5, 3),
 (7, 8, 6)]
I would always be excluding data the same length as the row length, and always only one entry per column. [(1, 5, 3)] would be another example of data I would want to exclude. So every time I loop the function it reduces the number of rows in the array by one. I would imagine I have to use a masked array, or convert my mask to a masked array and subtract the two, then maybe condense the output, but I have no idea how. Thanks for your time.
You can do it very efficiently if you transform your 2-D array into an unraveled 1-D array. Then you repeat the array of elements to be excluded, called e, in order to do an element-wise comparison:
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
e = [1, 5, 3]

ar = a.T.ravel()
er = np.repeat(e, a.shape[0])
ans = ar[er != ar].reshape(a.shape[1], a.shape[0] - 1).T
But it will only work if each element in e matches exactly one entry in the corresponding column of a.
EDIT:
as suggested by @Jaime, you can avoid the ravel() and get the same result directly:
ans = a.T[(a != e).T].reshape(a.shape[1], a.shape[0]-1).T
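For instance, applied to the question's example with e = [4, 2, 9] (a quick check, reusing a from above):

e = [4, 2, 9]
ans = a.T[(a != e).T].reshape(a.shape[1], a.shape[0] - 1).T
# ans -> array([[1, 5, 3],
#               [7, 8, 6]])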
To exclude vector e from matrix a:
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
e = [4, 2, 9]

print np.array([[i for i in a.transpose()[j] if i != e[j]]
                for j in range(len(e))]).transpose()
This would take some work to generalize, but here's something that can handle 2-D cases of the kind you describe. If passed unexpected input, this won't notice and will generate strange results, but it's at least a starting point:
import numpy

def columnwise_compress(a, values):
    a_shape = a.shape
    a_trans_flat = a.transpose().reshape(-1)
    compressed = a_trans_flat[~numpy.in1d(a_trans_flat, values)]
    return compressed.reshape(a_shape[:-1] + ((a_shape[0] - 1),)).transpose()
Tested:
>>> columnwise_compress(numpy.arange(9).reshape(3, 3) + 1, [4, 2, 9])
array([[1, 5, 3],
       [7, 8, 6]])
>>> columnwise_compress(numpy.arange(9).reshape(3, 3) + 1, [1, 5, 3])
array([[4, 2, 6],
       [7, 8, 9]])
The difficulty is that you're asking for "compression" of a kind that numpy.compress doesn't do (removing different values for each column or row) and you're asking for compression along columns instead of rows. Compressing along rows is easier because it moves along the natural order of the values in memory; you might consider working with transposed arrays for that reason. If you want to do that, things become a bit simpler:
>>> a = numpy.array([[1, 4, 7],
...                  [2, 5, 8],
...                  [3, 6, 9]])
>>> a[~numpy.in1d(a, [4, 2, 9]).reshape(3, 3)].reshape(3, 2)
array([[1, 7],
       [5, 8],
       [3, 6]])
You'll still need to handle shape parameters intelligently if you do it this way, but it will still be simpler. Also, this assumes there are no duplicates in the original array; if there are, this could generate wrong results. Saullo's excellent answer partially avoids the problem, but any value-based approach isn't guaranteed to work unless you're certain that there aren't duplicate values in the columns.
In the spirit of @SaulloCastro's answer, but handling multiple occurrences of items, you can remove the first occurrence in each column by doing the following:
def delete_skew_row(a, b):
    rows, cols = a.shape
    # first row index in each column where a matches b
    row_to_remove = np.argmax(a == b, axis=0)
    items_to_remove = np.ravel_multi_index((row_to_remove,
                                            np.arange(cols)),
                                           a.shape, order='F')
    ret = np.delete(a.T, items_to_remove)
    return np.ascontiguousarray(ret.reshape(cols, rows - 1).T)

rows, cols = 5, 10
a = np.random.randint(100, size=(rows, cols))
b = np.random.randint(rows, size=(cols,))
b = a[b, np.arange(cols)]
>>> a
array([[50, 46, 85, 82, 27, 41, 45, 27, 17, 26],
       [92, 35, 14, 34, 48, 27, 63, 58, 14, 18],
       [90, 91, 39, 19, 90, 29, 67, 52, 68, 69],
       [10, 99, 33, 58, 46, 71, 43, 23, 58, 49],
       [92, 81, 64, 77, 61, 99, 40, 49, 49, 87]])
>>> b
array([92, 81, 14, 82, 46, 29, 67, 58, 14, 69])
>>> delete_skew_row(a, b)
array([[50, 46, 85, 34, 27, 41, 45, 27, 17, 26],
       [90, 35, 39, 19, 48, 27, 63, 52, 68, 18],
       [10, 91, 33, 58, 90, 71, 43, 23, 58, 49],
       [92, 99, 64, 77, 61, 99, 40, 49, 49, 87]])
