Count elements if over a certain value - python

I have a list of float values. I want to iterate over the elements and count them if they are over a certain value, but only if they exceed the threshold value at least minimum_count times in a row. For example, given the following input:
list_of_values = [2.0, 2.0, 2.0, 2.0, 0, 0, 2.0, 2.0, 2.0, 0, 0]
treshold_value = 1.0
minimum_count = 4
the answer should be 4, since the treshold_value 1.0 is consecutively exceeded 4 times only at indexes 0-3. I now have the code below,
counter = 0
time_use = 0
for value in list_of_values:
    if value >= treshold_value:
        counter += 1
        if counter >= minimum_count:
            time_use += 1
    if value < treshold_value:
        counter = 0
print(time_use)
I know there should be some pythonic way to achieve this :)
Edit: The sum of all consecutive subsequence values over the threshold should be counted.

The following use of groupby with a conditional generator and max with appropriate key function should work:
from itertools import groupby
len(max((list(g) for k, g in groupby(list_of_values, key=lambda x: x > treshold_value) if k), key=len))
groupby groups an iterable into runs of consecutive items for which the key function returns the same value. It yields pairs of the key value and the corresponding sub-iterable.
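As a runnable sketch of the same idea, the snippet below wraps the expression in a function. The name longest_run and its parameters are made up for illustration, and default=0 is added so that an input with no value above the threshold returns 0 instead of raising ValueError. Note that this yields the length of the longest run above the threshold, which is not the same as the total described in the question's edit.

```python
from itertools import groupby

def longest_run(values, threshold):
    """Length of the longest run of consecutive values strictly above threshold."""
    run_lengths = (
        sum(1 for _ in group)  # count the items of this run without building a list
        for above, group in groupby(values, key=lambda x: x > threshold)
        if above
    )
    return max(run_lengths, default=0)

print(longest_run([2.0, 2.0, 2.0, 2.0, 0, 0, 2.0, 2.0, 2.0, 0, 0], 1.0))  # 4
print(longest_run([0, 0, 0], 1.0))  # 0
```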

You could use itertools.groupby() to help:
from itertools import groupby
def count_runs(list_of_values, threshold_value=1.0, minimum_count=4):
    count = 0
    for k, g in groupby(list_of_values, key=lambda x: x >= threshold_value):
        if k:
            g = list(g)
            if len(g) >= minimum_count:
                count += len(g)
    return count
>>> count_runs([2.0, 2.0, 2.0, 0.0, 0, 0, 2.0, 2.0, 2.0, 0, 0])
0
>>> count_runs([2.0, 2.0, 2.0, 2.0, 0, 0, 2.0, 2.0, 2.0, 0, 0])
4
>>> count_runs([2.0, 2.0, 2.0, 2.0, 0, 0, 3.0, 2.0, 2.0, 2.0, 10.0, 0, 0])
9
This will provide the count of the number of values that are above the threshold in groups of minimum_count or more. Note that it handles multiple groups that match the criteria.
For example the groupby() for the last example will return the following:
>>> list_of_values = [2.0, 2.0, 2.0, 2.0, 0, 0, 3.0, 2.0, 2.0, 2.0, 10.0, 0, 0]
>>> threshold_value = 1.0
>>> for k, g in groupby(list_of_values, key=lambda x: x >= threshold_value):
...     print(k, list(g))
...
True [2.0, 2.0, 2.0, 2.0]
False [0, 0]
True [3.0, 2.0, 2.0, 2.0, 10.0]
False [0, 0]
Any run of one or more values >= the threshold appears as a group with key True. Only groups whose length is >= the minimum count are considered further, and their lengths are added together.
This code can be written more succinctly, and far less readably, like this:
def count_runs(list_of_values, threshold_value=1.0, minimum_count=4):
    return sum(count for count in (len(list(g)) for k, g in groupby(list_of_values, key=lambda x: x >= threshold_value) if k) if count >= minimum_count)
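For comparison, the same total can be computed in a single pass without itertools, which is close to a corrected version of the loop in the question. This is only a sketch: the name count_runs_loop is hypothetical, and it assumes the same >= comparison and default arguments as count_runs.

```python
def count_runs_loop(values, threshold_value=1.0, minimum_count=4):
    """Total number of values that sit in runs of at least minimum_count
    consecutive values >= threshold_value."""
    total = 0
    run = 0  # length of the current run of values >= threshold_value
    for value in values:
        if value >= threshold_value:
            run += 1
        else:
            # The run just ended; keep it only if it was long enough
            if run >= minimum_count:
                total += run
            run = 0
    # A qualifying run may extend to the end of the list
    if run >= minimum_count:
        total += run
    return total

print(count_runs_loop([2.0, 2.0, 2.0, 2.0, 0, 0, 2.0, 2.0, 2.0, 0, 0]))  # 4
print(count_runs_loop([2.0, 2.0, 2.0, 2.0, 0, 0, 3.0, 2.0, 2.0, 2.0, 10.0, 0, 0]))  # 9
```

The check after the loop matters: without it, a long run at the very end of the list would never be added to the total.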

Just iterate over the list and build a dictionary whose keys are the float values and whose values are the number of times each one occurs, adding only floats that are greater than the threshold. Something like this:
d = {}
for f in list_of_values:
    if f > treshold_value:
        if d.get(f, False):
            d[f] += 1
        else:
            d[f] = 1
# Find the highest occurrence count among the stored values
max_count = 0
for k, v in d.items():
    if v > max_count:
        max_count = v
print(max_count)

It looks like you don't care about the order. In this case, groupby isn't correct because it only groups adjacent elements.
You could use a Counter and two list comprehensions to filter values:
list_of_values = [2.0, 2.0, 2.0, 2.0, 0, 0, 3.0, 2.0, 2.0, 2.0, 10.0, 0, 0]
threshold_value = 1.0
minimum_count = 4
from collections import Counter
counter = Counter([x for x in list_of_values if x > threshold_value])
print(counter)
# Counter({2.0: 7, 3.0: 1, 10.0: 1})
print([(x, count) for x, count in counter.items() if count > minimum_count])
# [(2.0, 7)]

Related

How to find part of series in some series

The question is simple.
Suppose we have a Series with these values:
srs = pd.Series([7.0, 2.0, 1.0, 2.0, 3.0, 5.0, 4.0])
How can I find the position (index) of the subseries 1.0, 2.0, 3.0?
Using a rolling window, we can find the first occurrence of a list a. The lambda puts a 'marker' (here 0; any non-NaN value will do) at the end (right border) of every window that matches. We then use first_valid_index to find the index of the first marker and correct it by the window size:
import numpy as np
a = [1.0, 2.0, 3.0]
srs.rolling(len(a)).apply(lambda x: 0 if (x == a).all() else np.nan).first_valid_index()-len(a)+1
Output:
2
The simplest solution might be to use a list comprehension:
a = srs.tolist() # [7.0, 2.0, 1.0, 2.0, 3.0, 5.0, 4.0]
b = [1.0, 2.0, 3.0]
[x for x in range(len(a)) if a[x:x+len(b)] == b]
# [2]
One naive way is to iterate over the series, take each slice of n elements, and compare it with the given list:
srs = pd.Series([7.0, 2.0, 1.0, 2.0, 3.0, 5.0, 4.0])
sub_list = [1.0, 2.0, 3.0]
n = len(sub_list)
index_matching = []
for i in range(srs.shape[0] - n + 1):
    sub_srs = srs.iloc[i: i+n]
    if (sub_srs == sub_list).all():
        index_matching.append(sub_srs.index)
print(index_matching)
# [RangeIndex(start=2, stop=5, step=1)]
Or in one line with a list comprehension:
out = [srs.iloc[i:i+n].index for i in range(srs.shape[0] - n + 1) if (srs.iloc[i: i+n] == sub_list).all()]
print(out)
# [RangeIndex(start=2, stop=5, step=1)]
If you want an explicit list:
real_values = [[i for i in idx] for idx in out]
print(real_values)
# [[2, 3, 4]]

Find the indices of first positive elements in list - python

I am trying to find the indices of the starting position of each sequence of positive values. My code only gets the positions of all the positive values. It looks like this:
index = []
for i, x in enumerate(lst):
    if x > 0:
        index.append(i)
print(index)
I expect the output of [-1.1, 2.0, 3.0, 4.0, 5.0, -2.0, -3.0, -4.0, 5.5, 6.6, 7.7, 8.8, 9.9] to be [1, 8]
I think it would be better to use a list comprehension:
index = [i for i, x in enumerate(lst) if x > 0]
Currently you are selecting all indexes where the number is positive. Instead, you want to collect the index only when the values switch from negative to positive.
Additionally, you can handle lists that contain only negative numbers, or lists that start with a positive number:
def get_pos_indexes(lst):
    index = []
    # Iterate over the list using indexes
    for i in range(len(lst)-1):
        # If the first element is positive, add 0 as an index
        if i == 0:
            if lst[i] > 0:
                index.append(0)
        # If successive values are negative and then positive, i.e. the sign switches, collect the positive index
        if lst[i] < 0 and lst[i+1] > 0:
            index.append(i+1)
    # If the index list is empty, only negative numbers were encountered, hence return [-1]
    if len(index) == 0:
        index = [-1]
    return index
print(get_pos_indexes([-1.1, 2.0, 3.0, 4.0, 5.0, -2.0, -3.0, -4.0, 5.5, 6.6, 7.7, 8.8, 9.9]))
print(get_pos_indexes([2.0, 3.0, 4.0, 5.0, -2.0, -3.0, -4.0, 5.5, 6.6, 7.7, 8.8, 9.9]))
print(get_pos_indexes([2.0,1.0,4.0,5.0]))
print(get_pos_indexes([-2.0,-1.0,-4.0,-5.0]))
The output will be
[1, 8]
[0, 7]
[0]
[-1]
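A more compact variant, offered as a sketch rather than a replacement: compare each element with its predecessor and record the index wherever a positive value follows a non-positive one or starts the list. The function name positive_run_starts is an assumption for illustration. Unlike get_pos_indexes, it returns an empty list when there are no positive values, it treats 0 as non-positive (so a run that follows a zero is also detected), and it handles a single-element list.

```python
def positive_run_starts(lst):
    """Indices at which a run of positive values begins."""
    starts = []
    previous_positive = False  # nothing precedes the first element
    for i, x in enumerate(lst):
        positive = x > 0
        # A run starts where a positive value follows a non-positive one
        if positive and not previous_positive:
            starts.append(i)
        previous_positive = positive
    return starts

print(positive_run_starts([-1.1, 2.0, 3.0, 4.0, 5.0, -2.0, -3.0, -4.0, 5.5, 6.6, 7.7, 8.8, 9.9]))  # [1, 8]
print(positive_run_starts([2.0, 1.0, 4.0, 5.0]))  # [0]
print(positive_run_starts([-2.0, -1.0, -4.0, -5.0]))  # []
```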

Python How to Decompress a dictionary

I have a dictionary with:
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
d = {'inds': inds, 'vals': vals}
print(d) will get me: {'inds': [0, 3, 7, 3, 3, 5, 1], 'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}
As you can see, the inds (keys) are not ordered, there are duplicates, and some are missing: the range is 0 to 7, but only the distinct integers 0, 1, 3, 5, 7 occur. I want to write a function that takes the dictionary (d) and decompresses it into a full vector as shown below. For any repeated index (3 in this case), I'd like to sum the corresponding values, and for the missing indices I want 0.0.
# ind: 0 1 2 3* 4 5 6 7
x == [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
I am trying to write a function that returns a final list, something like this:
def decompressor(d, n=None):
    final_list = []
    for i in final_list:
        final_list.append()
    return final_list
# final_list.index: 0 1 2 3* 4 5 6 7
# final_list = [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
Try this:
xyz = [0.0 for x in range(max(inds)+1)]
# Loop over every position in inds (not up to the largest index)
for i in range(len(inds)):
    if xyz[inds[i]] != 0.0:
        xyz[inds[i]] += vals[i]
    else:
        xyz[inds[i]] = vals[i]
Some things are still not clear to me, but supposing that the largest index of the result is the largest one found in your inds list, and that you want a list as the result, you can do something like this:
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
# Initialize a list of zeroes with length max index + 1
res = [float(0)] * (max(inds) + 1)
# [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Loop over indexes and values in pairs
for i, v in zip(inds, vals):
    # Add the value to the corresponding index
    res[i] += v
print(res)
# [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
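Wrapped up as the function the question sketches, this might look as follows. The meaning of n is not stated in the question; here it is assumed to be the desired length of the output, defaulting to max(inds) + 1. If n is smaller than that, the loop raises IndexError.

```python
def decompressor(d, n=None):
    """Expand {'inds': [...], 'vals': [...]} into a dense list of floats.

    Values sharing an index are summed; indices that never occur stay 0.0.
    Assumption: n is the output length (default: largest index + 1).
    """
    inds, vals = d['inds'], d['vals']
    if n is None:
        n = max(inds) + 1 if inds else 0
    final_list = [0.0] * n
    for i, v in zip(inds, vals):
        final_list[i] += v
    return final_list

d = {'inds': [0, 3, 7, 3, 3, 5, 1],
     'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}
print(decompressor(d))
# [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
print(decompressor(d, n=10))
# [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0, 0.0, 0.0]
```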
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
First you have to initialise the dictionary, with keys ranging from the min to the max value in the inds list:
max_id = max(inds)
min_id = min(inds)
my_dict = {}
i = min_id
while i <= max_id:
    my_dict[i] = 0.0
    i = i+1
# Add each value to the entry for its index
for i in range(len(inds)):
    my_dict[inds[i]] += vals[i]
# my_dict == {0: 1.0, 1: 7.0, 2: 0.0, 3: 11.0, 4: 0.0, 5: 6.0, 6: 0.0, 7: 3.0}

How do I create a vector which shows the sum of the individual values whenever there are repeated indices?

I have a dictionary ‘d’ which stores a list of indices (d['inds']) and a list of values (d['vals']). For example:
d['inds'] == [0, 3, 7, 3, 3, 5, 1]
d['vals'] == [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
In the above example, the index 3 appears three times. How do I create a vector which shows the sum of the individual values whenever there are repeated indices? In other words, the vector corresponding to this example of d would be:
# ind: 0 1 2 3* 4 5 6 7
x == [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
You can create a dictionary where the keys are the indices and the values are the total sum:
d = {
    'inds': [0, 3, 7, 3, 3, 5, 1],
    'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
}
result = {}
for i, v in zip(d['inds'], d['vals']):
    if i not in result:
        result[i] = 0
    result[i] += v
print(result)
Output
{0: 1.0, 1: 7.0, 3: 11.0, 5: 6.0, 7: 3.0}
If a list (vector) is required, it can be built in the following way:
result = [0]*(max(d['inds']) + 1)
for i, v in zip(d['inds'], d['vals']):
    result[i] += v
print(result)
Output
[1.0, 7.0, 0, 11.0, 0, 6.0, 0, 3.0]
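The two steps can also be combined. The following is a sketch, not part of the answer above, using collections.defaultdict so that missing keys start at 0.0; the names totals and vector are chosen for illustration. Because the defaults are floats, the gaps come out as 0.0 rather than 0, matching the vector shown in the question.

```python
from collections import defaultdict

d = {
    'inds': [0, 3, 7, 3, 3, 5, 1],
    'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
}

# Sum the values per index; a missing key starts at float() == 0.0
totals = defaultdict(float)
for i, v in zip(d['inds'], d['vals']):
    totals[i] += v

# Expand to a dense list; .get() avoids inserting new keys into totals
vector = [totals.get(i, 0.0) for i in range(max(totals) + 1)]
print(vector)  # [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
```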

Creating List From Dictionary that Specifies Position by Index and Skips Some Index Positions

I am working with the following dictionary:
d = {'inds':[0, 3, 7, 3, 3, 5, 1], 'vals':[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}
I want to create a new_list that takes the values in the list d['vals'] and places them in new_list at the corresponding index given in the list d['inds']. The ultimate result should be:
[1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
This is based on the following:
d['inds'] == [0, 3, 7, 3, 3, 5, 1]
d['vals'] == [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
For any index position not included in d['inds'] the corresponding value is 0.0.
For index positions that are repeated, the value for that position is the sum of the individual values. For example, index 3 appears 3 times above, so new_list[3] should == 11.0, which is the sum 2.0 + 4.0 + 5.0.
First, allocate a list of the appropriate length and full of zeroes:
result = [0] * (max(d['inds']) + 1)
Then loop over the indices and values and add them to the values in the list:
for ind, value in zip(d['inds'], d['vals']):
    result[ind] += value
Output:
>>> result
[1.0, 7.0, 0, 11.0, 0, 6.0, 0, 3.0]
After collaborating with a co-worker, who helped walk me through this, we arrived at the following, more flexible function (it allows for different lengths of the resulting list):
import numpy as np
d = {
    'inds': [0, 3, 7, 3, 3, 5, 1],
    'vals': list(range(1, 8))}
## In this example the list under the 'vals' key is built with range(),
## so its values are in numerical order.
def newlist(dictionary, length):
    ## length must be at least max(dictionary['inds']) + 1
    out = np.zeros(length)
    for i in range(len(dictionary['inds'])):
        out[dictionary['inds'][i]] += dictionary['vals'][i]
    return out
