I have a list that looks like:
trial_lst = [0.5, 3, 6, 40, 90, 130.8, 129, 111, 8, 9, 0.01, 9, 40, 90, 130.1, 112, 108, 90, 77, 68, 0.9, 8, 40, 90, 92, 130.4]
The list represents a series of experiments, each with a minimum and a maximum index. For example, in the list above, the minimum and maximum would be as follows:
Experiment 1:
Min: 0.5
Max: 130.8
Experiment 2:
Min: 0.01
Max: 130.1
Experiment 3:
Min: 0.9
Max: 130.4
I obtained the values for each experiment above because I know that each experiment starts at around zero (such as 0.4, 0.001, 0.009, etc.) and ends at around 130 (130, 131.2, 130.009, etc.). You can imagine a nozzle turning on and off. When it turns on, the pressure rises and as it's turned off, the pressure dips. I am trying to calculate the minimum and maximum values for each experiment.
What I've tried so far is iterating through the list to first mark each index as max, but I can't seem to get that right.
Here is my code. Any suggestions on how I can change it?
result = []
for idx, item in enumerate(trial_lst):
    if idx > 0:
        prev = trial_lst[idx-1]
        curr = item
        if prev > curr:
            result.append((curr, "max"))
        else:
            result.append((curr, ""))
I am looking for a manual way to do this, no libraries.
The easiest way is to sort your list or array first:
trial_lst = [0.5, 3, 6, 40, 90, 130.8, 129, 111, 8, 9, 0.01, 9, 40, 90, 130.1, 112, 108, 90, 77, 68, 0.9, 8, 40, 90, 92, 130.4]
trial_lst.sort(key=float)
for count, items in enumerate(trial_lst):
    counter = count + 1
    last_object = (counter, trial_lst[count], trial_lst[(len(trial_lst)-1) - count])
    print( last_object )
You can easily get the index of the minimum value using the following:
my_list.index(min(my_list))
Here is an interactive demonstration which may help:
>>> trial_lst = [0.5, 3, 6, 40, 90, 130.8, 129, 111, 8, 9, 0.01, 9, 40, 90, 130.1, 112, 108, 90, 77, 68, 0.9, 8, 40, 90, 92, 130.4]
Use values below 1 to identify where one experiment ends and another begins
>>> indices = [x[0] for x in enumerate(map(lambda x:x<1, trial_lst)) if x[1]]
Break list into sublists at those values
>>> sublists = [trial_lst[i:j] for i,j in list(zip([0]+indices, indices+[None]))[1:]]
Compute max/min for each sublist
>>> for i,l in enumerate(sublists):
...     print("Experiment", i+1)
...     print("Min", min(l))
...     print("Max", max(l))
...     print()
...
Experiment 1
Min 0.5
Max 130.8
Experiment 2
Min 0.01
Max 130.1
Experiment 3
Min 0.9
Max 130.4
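Since the question asks for a manual approach with no libraries, the same idea can also be written as a plain loop. This is only a sketch: the function name `split_experiments` and the `start_threshold` parameter are hypothetical, and it assumes (as the question states) that every experiment begins with a reading below 1.

```python
def split_experiments(readings, start_threshold=1.0):
    """Group readings into experiments, starting a new one at each near-zero value."""
    experiments = []
    for value in readings:
        # A reading below the threshold marks the start of a new experiment.
        # The `not experiments` check covers data that does not begin near zero.
        if value < start_threshold or not experiments:
            experiments.append([value])
        else:
            experiments[-1].append(value)
    return experiments


trial_lst = [0.5, 3, 6, 40, 90, 130.8, 129, 111, 8, 9, 0.01, 9, 40, 90, 130.1,
             112, 108, 90, 77, 68, 0.9, 8, 40, 90, 92, 130.4]

# One (min, max) pair per experiment.
summary = [(min(exp), max(exp)) for exp in split_experiments(trial_lst)]
for number, (lowest, highest) in enumerate(summary, start=1):
    print("Experiment", number, "Min:", lowest, "Max:", highest)
```

If the readings could dip below 1 in the middle of an experiment, the threshold test would need to be replaced with something more robust, such as detecting a drop followed by a rise.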
Please, I need more clarity on this; I really do not understand it well. Take this as an example:
import random
my_list = [9999, 45, 63, 19, 89, 5, 72]
cum_w = [1, 9, 10, 9, 2, 12, 7]
d_rand = random.choices(my_list, cum_weights=cum_w, k=7)
sum = 0
for idx, i in enumerate(cum_w):
    if idx == 0:
        for i in cum_w: sum += i
    print(f"cum_weight for {my_list[idx]}\t= {i/sum}\tRandom={random.choices(my_list, cum_weights=cum_w, k=7)}")
Below is the output
cum_weight for 9999 = 0.14 Random=[45, 45, 9999, 45, 45, 9999, 45]
cum_weight for 45 = 0.18 Random=[45, 45, 45, 45, 9999, 45, 45]
cum_weight for 63 = 0.2 Random=[45, 45, 45, 9999, 9999, 9999, 45]
cum_weight for 19 = 0.18 Random=[45, 45, 45, 45, 45, 45, 9999]
cum_weight for 89 = 0.04 Random=[9999, 45, 45, 45, 45, 9999, 45]
cum_weight for 5 = 0.24 Random=[45, 45, 45, 45, 45, 45, 45]
cum_weight for 72 = 0.14 Random=[45, 45, 9999, 45, 45, 45, 45]
The probability for weight 9 (cum_w[1] and cum_w[3]) is 0.18.
Why does 45 (weight 9) occur so often?
I've read the random.choices documentation, but it doesn't really make sense to me.
How does cum_weights work?
Please, I need an in-depth explanation of this.
You asked "Why does 45(9) occur so often?" and "How do the cum_weights work?" Addressing the second question will explain the first. Note that what follows is an implementation of one approach used for this kind of problem. I'm not claiming that this is Python's implementation; it is intended to illustrate the concepts involved.
Let's start by looking at how values can be generated if you use cumulative weights, i.e., a list where at each index the entry is the sum of all weights up to and including the current index.
import random

# Given cumulative weights, convert them to proportions, then generate U ~ Uniform(0,1)
# random values to use in a linear search to generate values in the correct proportions.
# This is based on the well-known probability result that P{a<=U<=b} = (b - a) for
# 0 <= a < b <= 1.
def gen_cumulative_weighted(values, c_weights): # values and c_weights must be lists of the same length
    # Convert cumulative weights to probabilities/proportions by dividing by the last value.
    # This yields a list of non-decreasing values between 0 and 1. Note that the last entry
    # is always 1, so a Uniform(0, 1) random number will *always* be less than or equal to
    # some entry in the list.
    p = [c_weights[i] / c_weights[-1] for i in range(len(c_weights))]
    while True:
        index = 0 # starting from the beginning of the list
        # The following three lines find the first index having the property u <= p[index].
        u = random.random()
        while u > p[index]:
            index += 1
        yield(values[index]) # yield the corresponding value.
As the comments point out, the cumulative weights are divided by the last (and largest) value to scale them to a set of values in the range (0, 1]. These can be thought of as the right-most endpoints of non-overlapping subranges, each of which has a length equal to the corresponding scaled weight. (Sketch it out on paper if this is unclear; you should see it pretty quickly.) A generated Uniform(0,1) value will fall in one of those subranges, and the probability it does so is equal to the length of the subrange according to a well-known result from probability.
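To make the subranges concrete, here is a small sketch that computes the endpoints for the raw weights from the question. It uses `itertools.accumulate` from the standard library to build the cumulative weights; the variable names are only illustrative.

```python
from itertools import accumulate

weights = [1, 9, 10, 9, 2, 12, 7]

# Running totals: each entry is the sum of all weights up to and including that index.
cumulative = list(accumulate(weights))
total = cumulative[-1]

# Scaling by the last value gives the right-hand endpoint of each subrange.
right_endpoints = [c / total for c in cumulative]
left_endpoints = [0.0] + right_endpoints[:-1]

# The length of each subrange equals the corresponding weight divided by the total.
lengths = [right - left for left, right in zip(left_endpoints, right_endpoints)]

print(cumulative)
print(right_endpoints)
```

Each subrange length matches `weight / total` up to floating-point rounding, which is why a uniform draw lands in subrange `i` with the intended probability.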
If we have the raw weights rather than the cumulative weights, all we have to do is convert them to cumulative and then pass the work off to the cumulative weighted version of the generator:
def gen_weighted(values, weights): # values and weights must be lists of the same length
    cumulative_w = [sum(weights[:i+1]) for i in range(len(weights))]
    return gen_cumulative_weighted(values, cumulative_w)
Now we're ready to use the generators:
my_values = [9999, 45, 63, 19, 89, 5, 72]
my_weights = [1, 9, 10, 9, 2, 12, 7]
good_gen = gen_weighted(my_values, my_weights)
print('Passing raw weights to the weighted implementation:')
print([next(good_gen) for _ in range(20)])
which will produce results such as:
Passing raw weights to the weighted implementation:
[63, 5, 63, 63, 72, 19, 63, 5, 45, 63, 72, 19, 5, 89, 72, 63, 63, 19, 89, 45]
Okay, so what happens if we pass raw weights to the cumulative weighted version of the algorithm? Your raw weights of [1, 9, 10, 9, 2, 12, 7] get scaled by dividing by the last value, and become [1/7, 9/7, 10/7, 9/7, 2/7, 12/7, 1]. When we generate u ~ Uniform(0, 1) and use it to search linearly through the scaled weights, it will yield index zero => 9999 with probability 1/7, and index one => 45 with probability 6/7! This happens because u is always ≤ 1, and therefore always less than 9/7. As a result, the linear search will never get past any scaled weight ≥ 1, which for your inputs means it can only generate the first two values and does so with the wrong weighting.
print('Passing raw weights to the cumulative weighted implementation:')
bad_gen = gen_cumulative_weighted(my_values, my_weights)
print([next(bad_gen) for _ in range(20)])
produces results such as:
Passing raw weights to the cumulative weighted implementation:
[45, 45, 45, 45, 45, 45, 45, 9999, 45, 9999, 45, 45, 45, 45, 45, 9999, 45, 9999, 45, 45]
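Applied back to `random.choices` itself, the practical fix is either to pass the raw weights through the `weights=` parameter or to convert them to running totals before using `cum_weights=`. The sketch below shows both calls; it makes no claim about how the standard library samples internally.

```python
import random
from itertools import accumulate

my_values = [9999, 45, 63, 19, 89, 5, 72]
raw_weights = [1, 9, 10, 9, 2, 12, 7]

# Option 1: hand the raw (relative) weights to the `weights` parameter.
sample_from_raw = random.choices(my_values, weights=raw_weights, k=20)

# Option 2: build non-decreasing running totals and pass them as `cum_weights`.
running_totals = list(accumulate(raw_weights))
sample_from_cumulative = random.choices(my_values, cum_weights=running_totals, k=20)

print(running_totals)
print(sample_from_raw)
print(sample_from_cumulative)
```

Both calls describe the same distribution, so over many draws each value should appear roughly in proportion to its raw weight.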
X = [23, 174, 3, 38, 22, 97, 11, 5, 36, 94, 25]
y = [8, 58, 2, 13, 8, 86, 5, 2, 23, 60, 20]
Using linear regression, I got a coefficient of 0.46 and a y-intercept of 4.
Now I need to find the optimum proportion of y and x.
I am not sure if linear regression can be of help. Is there an optimization process that can take all this into consideration, or does the coefficient itself give that value?
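For reference, the coefficient and intercept quoted above can be reproduced with a short ordinary least-squares calculation in plain Python. This is only a sketch of the standard closed-form fit; the helper name `least_squares_fit` is hypothetical, and it does not by itself say what the "optimum proportion" should be.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


X = [23, 174, 3, 38, 22, 97, 11, 5, 36, 94, 25]
y = [8, 58, 2, 13, 8, 86, 5, 2, 23, 60, 20]

slope, intercept = least_squares_fit(X, y)
print(round(slope, 2), round(intercept, 2))  # 0.46 4.0
```

The slope is the estimated change in y per unit of X, so it describes the average ratio of changes rather than a ratio of the raw values.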
This will make a dictionary out of your lists with the X values as keys and the y values as values. You can now access the dictionary with an X value and get the corresponding y value.
min_max_values = dict(zip(X, y))
for k, v in min_max_values.items():
    print("min: {min}, max: {max}".format(min=v, max=k))
Output:
min: 8, max: 23
min: 58, max: 174
...
I have a problem where I need to determine where a value lands between other values. This is an awfully long question, but it's a convoluted problem (at least to me).
The simplest presentation of the problem can be seen with the following data:
I have a value of 24.0. I need to determine where that value lands within six 'ranges'. The ranges are: 10, 20, 30, 40, 50, 60. I need to calculate where along the ranges, the value lands. I can see that it lands between 20 and 30. A simple if statement can find that for me.
My if statement for checking if the value is between 20 and 30 would be:
if value >=20 and value <=30:
Pretty simple stuff.
What I'm having trouble with is when I try to rank the output.
As an example, let's say that each range value is given an integer representation: 10=1, 20=2, 30=3, 40=4, 50=5, 60=6, 70=7. Additionally, let's say that if the value lies between two range values, it is assigned the rank output of the lower value. For example, my value of 24 is between 20 and 30, so it should be ranked as a "2".
This in and of itself is fairly straightforward with this example, but using real world data, I have ranges and values like the following:
Value = -13 with Ranges = 5,35,30,25,-25,-30,-35
Value = 50 with Ranges = 5,70,65,60,40,35,30
Value = 6 with Ranges = 1,40,35,30,5,3,0
Another wrinkle - the orders of the ranges matter. In the above, the first range number equates to a ranking of 1, the second to a ranking of 2, etc as I mentioned a few paragraphs above.
The negative numbers in the range values were causing trouble until I decided to use a percentile ranking, which gets rid of the negative values altogether. To do this, I am using an answer from "Map each list value to its corresponding percentile", like this:
y=[stats.percentileofscore(x, a, 'rank') for a in x]
where x is the ranges AND the value I'm checking. Running the value=6 values above through this results in y being:
x = [1, 40, 35, 30, 5, 3, 0, 6]
y=[stats.percentileofscore(x, a, 'rank') for a in x]
Looking at "y", we see it as:
[25.0, 100.0, 87.5, 75.0, 50.0, 37.5, 12.5, 62.5]
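For readers without SciPy, the same percentile list can be reproduced in plain Python when the values contain no ties. This is a sketch under that assumption, not a replacement for `stats.percentileofscore`; the helper name `percentile_ranks` is hypothetical.

```python
def percentile_ranks(values):
    """Percentile of each value within the list, assuming all values are distinct."""
    ordered = sorted(values)
    n = len(values)
    # Position in sorted order (1-based) divided by the list length, as a percentage.
    return [100.0 * (ordered.index(v) + 1) / n for v in values]


x = [1, 40, 35, 30, 5, 3, 0, 6]
y = percentile_ranks(x)
print(y)  # [25.0, 100.0, 87.5, 75.0, 50.0, 37.5, 12.5, 62.5]
```

With tied values, the result depends on how ties are counted, so the distinct-values assumption matters here.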
What I need to do now is compare that last value (62.5) with the other values to see what the final ranking will be (rankings of 1 through 7) according to the following ranking map:
1=25.0
2=100.0
3=87.5
4=75.0
5=50.0
6=37.5
7=12.5
If the value lies between two of the values, it should be assigned the lower rank. In this example, the 62.5 value would have a final ranking value of 4 because it sits between 75.0 (rank=4) and 50.0 (rank=5).
If I take 'y' and break it out and use those values in multiple if/else statements it works for some but not all (the -13 example does not work correctly).
My question is this:
How can I programmatically analyze any value/range set to find the final ranking without building an enormous if/elif structure? Here are a few sample sets. Rankings are in order of presentation below (first value in Ranges =1 , second = 2, etc etc)
Value = -13 with Ranges = 5, 35, 30, 25, -25, -30, -35 --> Rank = 4
Value = 50 with Ranges = 5, 70, 65, 60, 40, 35, 30 --> Rank = 4
Value = 6 with Ranges = 1, 40, 35, 30, 5, 3,0 --> Rank = 4
Value = 24 with Ranges = 10, 20, 30, 40, 50, 60, 70 --> Rank = 2
Value = 2.26 with Ranges = 0.1, 0.55, 0.65, 0.75, 1.75, 1.85, 1.95 --> Rank = 7
Value = 31 with Ranges = 10, 20, 30, 40, 60, 70, 80 --> Rank = 3
I may be missing something very easy within python to do this...but I've bumped my head on this wall for a few days with no progress.
Any help/pointers are appreciated.
def checker(term):
    return term if term >= 0 else abs(term)+1e10
l1, v1 = [5, 35, 30, 25, -25, -30, -35], -13 # Desired: 4
l2, v2 = [5, 70, 65, 60, 40, 35, 30], 50 # Desired: 4
l3, v3 = [1, 40, 35, 30, 5, 3, 0], 6 # Desired: 4
l4, v4 = [10, 20, 30, 40, 50, 60, 70], 24 # Desired: 2
l5, v5 = [0.1, 0.55, 0.65, 0.75, 1.75, 1.85, 1.95], 2.26 # Desired: 7
l6, v6 = [10, 20, 30, 40, 60, 70, 80], 31 # Desired: 3
Result:
>>> print(*(sorted(l_+[val], key=checker).index(val) for
... l_, val in zip((l1,l2,l3,l4,l5,l6),(v1,v2,v3,v4,v5,v6))), sep='\n')
4
4
4
2
7
3
Taking the first example of -13.
y = [5, 35, 30, 25, -25, -30, -35]
value_to_check = -13
max_rank = len(y) # Default value in case no range found (as per 2.26 value example)
for ii in range(len(y)-1,0,-1):
    if (y[ii] <= value_to_check <= y[ii-1]) or (y[ii] >= value_to_check >= y[ii-1]):
        max_rank = ii
        break
>>> max_rank
4
In function form:
def get_rank(y, value_to_check):
    max_rank = len(y) # Default value in case no range found (as per 2.26 value example)
    for ii in range(len(y)-1,0,-1):
        if (y[ii] <= value_to_check <= y[ii-1]) or (y[ii] >= value_to_check >= y[ii-1]):
            max_rank = ii
            break
    return max_rank
When you call:
>>> get_rank(y, value_to_check)
4
This correctly finds the answer for all your data:
def get_rank(l,n):
    mindiff = float('inf')
    minindex = -1
    for i in range(len(l) - 1):
        if l[i] <= n <= l[i + 1] or l[i + 1] <= n <= l[i]:
            diff = abs(l[i + 1] - l[i])
            if diff < mindiff:
                mindiff = diff
                minindex = i
    if minindex != -1:
        return minindex + 1
    if n > max(l):
        return len(l)
    return 1
>>> test()
[5, 35, 30, 25, -25, -30, -35] -13 Desired: 4 Actual: 4
[5, 70, 65, 60, 40, 35, 30] 50 Desired: 4 Actual: 4
[1, 40, 35, 30, 5, 3, 0] 6 Desired: 4 Actual: 4
[10, 20, 30, 40, 50, 60, 70] 24 Desired: 2 Actual: 2
[0.1, 0.55, 0.65, 0.75, 1.75, 1.85, 1.95] 2.26 Desired: 7 Actual: 7
[10, 20, 30, 40, 60, 70, 80] 31 Desired: 3 Actual: 3
For completeness, here is my test() function, but you only need get_rank for what you are doing:
>>> def test():
...     lists = [[[5, 35, 30, 25, -25, -30, -35],-13,4],[[5, 70, 65, 60, 40, 35, 30],50,4],[[1, 40, 35, 30, 5, 3,0],6,4],[[10, 20, 30, 40, 50, 60, 70],24,2],[[0.1, 0.55, 0.65, 0.75, 1.75, 1.85, 1.95],2.26,7],[[10, 20, 30, 40, 60, 70, 80],31,3]]
...     for l,n,desired in lists:
...         print(l,n,'Desired:',desired,'Actual:',get_rank(l,n))
This might be a simple problem but I haven't come up with a solution.
Say I have an array as np.array([0,1,0,1,0,0,0,1,0,1,0,0,1]) with peaks at indexes [1,3,7,9,12]. How can I replace the indexes with [2,8,12], that is, averaging indexes close in distance, if a threshold distance between peaks is set to be greater than 2 in this example?
Please note that the binary values of the array are just for illustration, the peak value can be any real number.
You could use Raymond Hettinger's cluster function:
from __future__ import division
def cluster(data, maxgap):
    """Arrange data into groups where successive elements
       differ by no more than *maxgap*
        >>> cluster([1, 6, 9, 100, 102, 105, 109, 134, 139], maxgap=10)
        [[1, 6, 9], [100, 102, 105, 109], [134, 139]]
        >>> cluster([1, 6, 9, 99, 100, 102, 105, 134, 139, 141], maxgap=10)
        [[1, 6, 9], [99, 100, 102, 105], [134, 139, 141]]
    """
    data.sort()
    groups = [[data[0]]]
    for item in data[1:]:
        val = abs(item - groups[-1][-1])
        if val <= maxgap:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups
peaks = [1,3,7,9,12]
print([sum(arr)/len(arr) for arr in cluster(peaks, maxgap=2)])
yields
[2.0, 8.0, 12.0]
I have a list:
d = [23, 67, 110, 25, 69, 24, 102, 109]
How can I group the nearest values using a dynamic gap and create tuples like this? What is the fastest method?
[(23, 24, 25), (67, 69), (102, 109, 110)]
Like this:
d = [23,67,110,25,69,24,102,109]
d.sort()
diff = [y - x for x, y in zip(*[iter(d)] * 2)]
avg = sum(diff) / len(diff)
m = [[d[0]]]
for x in d[1:]:
    if x - m[-1][0] < avg:
        m[-1].append(x)
    else:
        m.append([x])
print(m)
## [[23, 24, 25], [67, 69], [102, 109, 110]]
First we calculate an average difference between elements of the sorted list (taken in non-overlapping pairs), and then we start a new group whenever an element lies at least that far from the first element of the current group.
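A variation on the same idea, offered as a sketch rather than as the method above: measure the gap between each pair of neighbouring sorted values, and start a new group whenever a gap reaches the average gap. The function name `group_by_average_gap` is hypothetical, and the groups are returned as tuples to match the format requested in the question.

```python
def group_by_average_gap(values):
    """Group sorted values, splitting wherever a neighbouring gap reaches the mean gap."""
    ordered = sorted(values)
    if len(ordered) < 2:
        return [tuple(ordered)] if ordered else []
    # Gaps between each value and its immediate successor.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    average_gap = sum(gaps) / len(gaps)
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous < average_gap:
            groups[-1].append(current)
        else:
            groups.append([current])
    return [tuple(group) for group in groups]


d = [23, 67, 110, 25, 69, 24, 102, 109]
print(group_by_average_gap(d))  # [(23, 24, 25), (67, 69), (102, 109, 110)]
```

Comparing against the previous element rather than the first element of the group lets long chains of closely spaced values stay together, which may or may not be what you want for your data.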