I am looking to learn how to improve the performance of code over a large dataframe (10 million rows). My solution loops over multiple dates (2023-01-10, 2023-01-20, 2023-01-30) for different combinations of category_a and category_b.
The working approach is shown below: it iterates over the dates for each pairing of the two categories by first locating the subset for that pair. However, I would like to refactor it to see if there is a more efficient approach.
My input (df) looks like:
   date        category_a  category_b  outflow  open  inflow  max  close  buy  random_str
0  2023-01-10           4           1        1     0       0   10      0    0           a
1  2023-01-20           4           1        2     0       0   20    nan  nan           a
2  2023-01-30           4           1       10     0       0   20    nan  nan           a
3  2023-01-10           4           2        2     0       0   10      0    0           b
4  2023-01-20           4           2        2     0       0   20    nan  nan           b
5  2023-01-30           4           2        0     0       0   20    nan  nan           b
There are 2 pairs, (4, 1) and (4, 2), over the days, and my expected output (results) looks like this:
   date        category_a  category_b  outflow  open  inflow  max  close  buy  random_str
0  2023-01-10           4           1        1     0       0   10     -1   23           a
1  2023-01-20           4           1        2    -1      23   20     20   10           a
2  2023-01-30           4           1       10    20      10   20     20  nan           a
3  2023-01-10           4           2        2     0       0   10     -2   24           b
4  2023-01-20           4           2        2    -2      24   20     20    0           b
5  2023-01-30           4           2        0    20       0   20     20  nan           b
I have a working solution that uses pandas dataframes to take a subset and then loop over it, but I would like to see how I can improve its performance, perhaps using NumPy, Numba, pandas with multiprocessing, or Dask. Another great idea was to rewrite it in BigQuery SQL.
I am not sure what the best solution would be and I would appreciate any help in improving the performance.
Minimum working example
The code below generates the input dataframe.
import pandas as pd
import numpy as np
# prepare the input df
df = pd.DataFrame({
    'date'       : ['2023-01-10', '2023-01-20', '2023-01-30', '2023-01-10', '2023-01-20', '2023-01-30'],
    'category_a' : [4, 4, 4, 4, 4, 4],
    'category_b' : [1, 1, 1, 2, 2, 2],
    'outflow'    : [1.0, 2.0, 10.0, 2.0, 2.0, 0.0],
    'open'       : [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    'inflow'     : [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    'max'        : [10.0, 20.0, 20.0, 10.0, 20.0, 20.0],
    'close'      : [0.0, np.nan, np.nan, 0.0, np.nan, np.nan],
    'buy'        : [0.0, np.nan, np.nan, 0.0, np.nan, np.nan],
    'random_str' : ['a', 'a', 'a', 'b', 'b', 'b']
})
df['date'] = pd.to_datetime(df['date'])

# get the unique pairs of category_a and category_b as a list of records
unique_pairs = (
    df.groupby(['category_a', 'category_b'])
      .size()
      .reset_index()
      .rename(columns={0: 'count'})[['category_a', 'category_b']]
      .to_dict('records')
)
unique_dates = np.sort(df['date'].unique())
Using this input dataframe, the code below is what I am trying to optimize.
df = df.set_index('date')
day_0 = unique_dates[0]  # first date

# Using a dictionary comprehension to prepare the results container
list_of_numbers = list(range(len(unique_pairs)))
myset = {key: None for key in list_of_numbers}

for count_pair, value in enumerate(unique_pairs):
    # pair of category_a and category_b
    category_a = value['category_a']
    category_b = value['category_b']

    # subset the dataframe for the pair
    df_subset = df.loc[(df['category_a'] == category_a) & (df['category_b'] == category_b)]
    print(f"running for {category_a} and {category_b}")

    # day 0
    df_subset.loc[day_0, 'close'] = df_subset.loc[day_0, 'open'] + df_subset.loc[day_0, 'inflow'] - df_subset.loc[day_0, 'outflow']

    # loop over a single pair using the dates
    for count, date in enumerate(unique_dates[1:], start=1):
        previous_date = unique_dates[count - 1]
        df_subset.loc[date, 'open'] = df_subset.loc[previous_date, 'close']
        df_subset.loc[date, 'close'] = df_subset.loc[date, 'open'] + df_subset.loc[date, 'inflow'] - df_subset.loc[date, 'outflow']

        # if the closing value is below the max, buy the deficit for the next period
        if df_subset.loc[date, 'close'] < df_subset.loc[date, 'max']:
            df_subset.loc[previous_date, 'buy'] = df_subset.loc[date, 'max'] - df_subset.loc[date, 'close'] + df_subset.loc[date, 'inflow']
        elif df_subset.loc[date, 'close'] > df_subset.loc[date, 'max']:
            df_subset.loc[previous_date, 'buy'] = 0
        else:
            df_subset.loc[previous_date, 'buy'] = df_subset.loc[date, 'inflow']

        df_subset.loc[date, 'inflow'] = df_subset.loc[previous_date, 'buy']
        df_subset.loc[date, 'close'] = df_subset.loc[date, 'open'] + df_subset.loc[date, 'inflow'] - df_subset.loc[date, 'outflow']

    # store all the dataframes in a container myset
    myset[count_pair] = df_subset

# make myset into a dataframe
result = pd.concat(myset.values()).reset_index(drop=False)
result
We can then check that the result matches the expected output.
from pandas.testing import assert_frame_equal
expected = pd.DataFrame({
    'date'       : [pd.Timestamp('2023-01-10'), pd.Timestamp('2023-01-20'), pd.Timestamp('2023-01-30'),
                    pd.Timestamp('2023-01-10'), pd.Timestamp('2023-01-20'), pd.Timestamp('2023-01-30')],
    'category_a' : [4, 4, 4, 4, 4, 4],
    'category_b' : [1, 1, 1, 2, 2, 2],
    'outflow'    : [1, 2, 10, 2, 2, 0],
    'open'       : [0.0, -1.0, 20.0, 0.0, -2.0, 20.0],
    'inflow'     : [0.0, 23.0, 10.0, 0.0, 24.0, 0.0],
    'max'        : [10, 20, 20, 10, 20, 20],
    'close'      : [-1.0, 20.0, 20.0, -2.0, 20.0, 20.0],
    'buy'        : [23.0, 10.0, np.nan, 24.0, 0.0, np.nan],
    'random_str' : ['a', 'a', 'a', 'b', 'b', 'b']
})
# check that the result is the same as expected
assert_frame_equal(result, expected)
SQL to create the first table
The solution can also be in SQL; in that case you can use the following code to create the initial table.
I am busy trying to implement a solution in BigQuery SQL using a user-defined function to carry the same logic. That would be a nice approach to solving the problem too.
WITH data AS (
  SELECT DATE '2023-01-10' as date, 4 as category_a, 1 as category_b, 1 as outflow, 0 as open, 0 as inflow, 10 as max, 0 as close, 0 as buy, 'a' as random_str
  UNION ALL
  SELECT DATE '2023-01-20' as date, 4 as category_a, 1 as category_b, 2 as outflow, 0 as open, 0 as inflow, 20 as max, NULL as close, NULL as buy, 'a' as random_str
  UNION ALL
  SELECT DATE '2023-01-30' as date, 4 as category_a, 1 as category_b, 10 as outflow, 0 as open, 0 as inflow, 20 as max, NULL as close, NULL as buy, 'a' as random_str
  UNION ALL
  SELECT DATE '2023-01-10' as date, 4 as category_a, 2 as category_b, 2 as outflow, 0 as open, 0 as inflow, 10 as max, 0 as close, 0 as buy, 'b' as random_str
  UNION ALL
  SELECT DATE '2023-01-20' as date, 4 as category_a, 2 as category_b, 2 as outflow, 0 as open, 0 as inflow, 20 as max, NULL as close, NULL as buy, 'b' as random_str
  UNION ALL
  SELECT DATE '2023-01-30' as date, 4 as category_a, 2 as category_b, 0 as outflow, 0 as open, 0 as inflow, 20 as max, NULL as close, NULL as buy, 'b' as random_str
)
SELECT
  ROW_NUMBER() OVER (ORDER BY date) as row_index,
  date,
  category_a,
  category_b,
  outflow,
  open,
  inflow,
  max,
  close,
  buy,
  random_str
FROM data
Efficient algorithm
First of all, the complexity of the algorithm can be improved. Indeed, (df['category_a'] == category_a) & (df['category_b'] == category_b) traverses the whole dataframe, and this is done for each item of unique_pairs, so the running time is O(U R) where U = len(unique_pairs) and R = len(df).
An efficient solution is to perform a groupby, that is, to split the dataframe into U groups, each sharing the same pair of categories. This operation can be done in O(R) time. In practice, Pandas may implement it with a (comparison-based) sort running in O(R log R) time.
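As a rough sketch (reusing df and unique_pairs from the question; the subsets_slow / subsets_fast names are just illustrative), the difference looks like this:
# O(U * R): scans the whole dataframe once per pair
subsets_slow = {
    (p['category_a'], p['category_b']):
        df[(df['category_a'] == p['category_a']) & (df['category_b'] == p['category_b'])]
    for p in unique_pairs
}

# O(R) (or O(R log R) with a sort): splits the dataframe once, one group per pair
subsets_fast = dict(list(df.groupby(['category_a', 'category_b'])))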
Faster access & Conversion to Numpy
Moreover, accessing a dataframe item by item using loc is very slow. Pandas needs to find the column through an internal dictionary, locate the row from the provided date, extract the value at the ith row and jth column, and create and return a new object, not to mention the several checks performed along the way (e.g. type and bounds checks). On top of that, Pandas adds a significant overhead, partly because its code is typically interpreted by CPython.
A faster solution is to extract the columns ahead of time and to iterate over the rows using integer indices instead of labels (like dates). The catch is that the sorted order of the dates may not match the row order of each subset; I assume it does for your input dataframe, but if not, you can sort each precomputed group by date. I also assume every date is present in every subset (again, if that is not the case, you can fix up the result of the groupby). Each column can then be converted to a NumPy array so the inner loop gets faster. The result is pure-NumPy code that no longer uses Pandas in the hot loop. Computationally intensive NumPy code is great because it can often be heavily optimized, especially when the target arrays contain native numerical types.
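If those assumptions do not hold for your data, a minimal sketch of the per-group preprocessing described above (illustrative only, to be run on each df_subset before extracting the columns) could be:
# Illustrative only: make sure each per-pair subset is sorted by date and has a
# row for every date (missing rows become NaN; adapt the fill strategy to your data).
df_subset = df_subset.sort_index()
df_subset = df_subset.reindex(unique_dates)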
Here is the implementation so far:
df = df.set_index('date')
day_0 = unique_dates[0]  # first date

# Using a dictionary comprehension to prepare the results container
list_of_numbers = list(range(len(unique_pairs)))
myset = {key: None for key in list_of_numbers}

groups = dict(list(df.groupby(['category_a', 'category_b'])))

for count_pair, value in enumerate(unique_pairs):
    # pair of category_a and category_b
    category_a = value['category_a']
    category_b = value['category_b']

    # subset the dataframe for the pair
    df_subset = groups[(category_a, category_b)]

    # Extraction of the Pandas columns and conversion to Numpy ones
    col_open = df_subset['open'].to_numpy()
    col_close = df_subset['close'].to_numpy()
    col_inflow = df_subset['inflow'].to_numpy()
    col_outflow = df_subset['outflow'].to_numpy()
    col_max = df_subset['max'].to_numpy()
    col_buy = df_subset['buy'].to_numpy()

    # day 0
    col_close[0] = col_open[0] + col_inflow[0] - col_outflow[0]

    # loop over a single pair using the date index
    for i in range(1, len(unique_dates)):
        col_open[i] = col_close[i-1]
        col_close[i] = col_open[i] + col_inflow[i] - col_outflow[i]

        # if the closing value is below the max, buy the deficit for the next period
        if col_close[i] < col_max[i]:
            col_buy[i-1] = col_max[i] - col_close[i] + col_inflow[i]
        elif col_close[i] > col_max[i]:
            col_buy[i-1] = 0
        else:
            col_buy[i-1] = col_inflow[i]

        col_inflow[i] = col_buy[i-1]
        col_close[i] = col_open[i] + col_inflow[i] - col_outflow[i]

    # store all the dataframes in a container myset
    myset[count_pair] = df_subset

# make myset into a dataframe
result = pd.concat(myset.values()).reset_index(drop=False)
result
This code is not only faster, but also a bit easier to read.
Fast execution using Numba
At this point, the general solution would be to use vectorized functions, but that is really not easy to do efficiently here (if it is even possible) because of the loop-carried dependencies and the conditionals. A fast solution is to use a JIT compiler like Numba to generate a very fast implementation. Numba is designed to work efficiently on natively-typed NumPy arrays, so this is the perfect use case. Note that Numba needs the input parameters to have a well-defined (native) type. Providing the types manually causes Numba to compile the code eagerly (when the function is defined) instead of lazily (on the first call).
Here is the final resulting code:
import numba as nb
@nb.njit('(float64[:], float64[:], float64[:], int64[:], int64[:], float64[:], int64)')
def compute(col_open, col_close, col_inflow, col_outflow, col_max, col_buy, n):
    # Important checks to avoid out-of-bounds accesses that are not
    # performed by Numba for the sake of performance. If they do not
    # hold and are not checked, the function can simply crash.
    assert col_open.size == n and col_close.size == n
    assert col_inflow.size == n and col_outflow.size == n
    assert col_max.size == n and col_buy.size == n

    # day 0
    col_close[0] = col_open[0] + col_inflow[0] - col_outflow[0]

    # loop over a single pair using the date index
    for i in range(1, n):
        col_open[i] = col_close[i-1]
        col_close[i] = col_open[i] + col_inflow[i] - col_outflow[i]

        # if the closing value is below the max, buy the deficit for the next period
        if col_close[i] < col_max[i]:
            col_buy[i-1] = col_max[i] - col_close[i] + col_inflow[i]
        elif col_close[i] > col_max[i]:
            col_buy[i-1] = 0
        else:
            col_buy[i-1] = col_inflow[i]

        col_inflow[i] = col_buy[i-1]
        col_close[i] = col_open[i] + col_inflow[i] - col_outflow[i]
df = df.set_index('date')
day_0 = unique_dates[0]  # first date

# Using a dictionary comprehension to prepare the results container
list_of_numbers = list(range(len(unique_pairs)))
myset = {key: None for key in list_of_numbers}

groups = dict(list(df.groupby(['category_a', 'category_b'])))

for count_pair, value in enumerate(unique_pairs):
    # pair of category_a and category_b
    category_a = value['category_a']
    category_b = value['category_b']

    # subset the dataframe for the pair
    df_subset = groups[(category_a, category_b)]

    # Extraction of the Pandas columns and conversion to Numpy ones
    col_open = df_subset['open'].to_numpy()
    col_close = df_subset['close'].to_numpy()
    col_inflow = df_subset['inflow'].to_numpy()
    col_outflow = df_subset['outflow'].to_numpy()
    col_max = df_subset['max'].to_numpy()
    col_buy = df_subset['buy'].to_numpy()

    # Numba-accelerated computation
    compute(col_open, col_close, col_inflow, col_outflow, col_max, col_buy, len(unique_dates))

    # store all the dataframes in a container myset
    myset[count_pair] = df_subset

# make myset into a dataframe
result = pd.concat(myset.values()).reset_index(drop=False)
result
Feel free to change the types of the parameters if they do not match the real-world input data types (e.g. int32 vs int64, or float64 vs int64). Note that you can replace things like float64[:] with float64[::1] if you know the input arrays are contiguous, which is likely the case; this generates faster code.
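For instance, keeping the same dtypes as in the signature above, the contiguous variant would look like this sketch (the body is elided, so it only shows the signature syntax):
# Same function as above, but declaring contiguous (C-ordered) 1D arrays:
@nb.njit('(float64[::1], float64[::1], float64[::1], int64[::1], int64[::1], float64[::1], int64)')
def compute(col_open, col_close, col_inflow, col_outflow, col_max, col_buy, n):
    # ... same body as in the previous snippet ...
    pass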
Also note that myset can be a list, since count_pair is an increasing integer. That would be simpler and slightly faster, but the dictionary might still be useful in your real-world code.
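As a minimal sketch (reusing groups and unique_pairs from the snippets above, and eliding the per-pair computation), the list-based variant would look like:
results = []
for value in unique_pairs:
    df_subset = groups[(value['category_a'], value['category_b'])]
    # ... extract the columns and call compute() exactly as above ...
    results.append(df_subset)
result = pd.concat(results).reset_index(drop=False)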
Performance results
The Numba function call runs in about 1 µs on my machine, as opposed to 7.1 ms for the initial code. This means the hot part of the code is about 7100 times faster, just on this tiny example. That being said, Pandas takes some time to convert the columns to NumPy arrays, to create the groups and to merge the dataframes. The former takes a small constant time that is negligible for large arrays. The two latter operations take more time on bigger input dataframes and are actually the main bottleneck on my machine (each takes about 1 ms on the small example). Overall, the whole initial code takes 16.5 ms on my machine for the tiny example dataframe, while the new one takes 3.1 ms, i.e. 5.3 times faster just for this small input. On bigger input dataframes the speed-up should be significantly better. Finally, please note that df.groupby(['category_a', 'category_b']) was actually already precomputed, so I am not even sure we should include it in the benchmark ;) .
I have used sklearn to fit and predict a model, but I want to have the top 5 predictions (in terms of probabilities) per item.
So I used predict_proba, which gave me a list of lists like:
probabilities = [[0.8,0.15,0.5,0,0],[0.4,0.6,0,0,0],[0,0,0,0,1]]
What I want to do is loop over this list of lists to get an overview of each prediction made, along with its position in the list (which represents the class).
Using [i for i, j in enumerate(predicted_proba[0]) if j > 0] returns [0], [1], which is what I want for the complete list of lists (and, if possible, with the probability next to each index).
However, when I try to wrap a for-loop around the above code, it returns an IndexError.
Something like this:
probabilities = [[0.8, 0.15, 0.5, 0, 0], [0.4, 0.6, 0, 0, 0], [0, 0, 0, 0, 1]]
for i in range(len(probabilities)):
    print("Iteration_number:", i)
    for index, prob in enumerate(probabilities[i]):
        print("index", index, "=", prob)
Results in:
Iteration_number: 0
index 0 = 0.8
index 1 = 0.15
index 2 = 0.5
index 3 = 0
index 4 = 0
Iteration_number: 1
index 0 = 0.4
index 1 = 0.6
index 2 = 0
index 3 = 0
index 4 = 0
Iteration_number: 2
index 0 = 0
index 1 = 0
index 2 = 0
index 3 = 0
index 4 = 1
for i in predicted_proba:
    for index, value in enumerate(i):
        if value > 0:
            print(index)
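If you also want the probability next to each class index, and only the top 5 per item as asked, here is a minimal sketch in plain Python (the variable names are just illustrative):
probabilities = [[0.8, 0.15, 0.5, 0, 0], [0.4, 0.6, 0, 0, 0], [0, 0, 0, 0, 1]]
for row_number, row in enumerate(probabilities):
    # pair each class index with its probability, sort by probability, keep the top 5
    top5 = sorted(enumerate(row), key=lambda pair: pair[1], reverse=True)[:5]
    print("Iteration_number:", row_number, "->", top5)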
Hope this helps.
I'm pretty sure there's a really simple solution for this and I'm just not realising it. However...
I have a data frame of high-frequency data. Call this data frame A. I also have a separate list of far lower frequency demarcation points, call this B. I would like to append a column to A that would display 1 if A's timestamp column is between B[0] and B[1], 2 if it is between B[1] and B[2], and so on.
As said, it's probably incredibly trivial, and I'm just not realising it at this late an hour.
Here is a quick and dirty approach using a list comprehension.
>>> df = pd.DataFrame({'A': np.arange(1, 3, 0.2)})
>>> A = df.A.values.tolist()
A: [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8]
>>> B = np.arange(0, 3, 1).tolist()
B: [0, 1, 2]
>>> BA = [k for k in range(0, len(B)-1) for a in A if (B[k]<=a) & (B[k+1]>a) or (a>max(B))]
BA: [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Use searchsorted:
A['group'] = B['timestamp'].searchsorted(A['timestamp'])
For each value in A['timestamp'], an index value is returned. That index indicates where amongst the sorted values in B['timestamp'] that value from A would be inserted into B in order to maintain sorted order.
For example,
import numpy as np
import pandas as pd
np.random.seed(2016)
N = 10
A = pd.DataFrame({'timestamp':np.random.uniform(0, 1, size=N).cumsum()})
B = pd.DataFrame({'timestamp':np.random.uniform(0, 3, size=N).cumsum()})
# timestamp
# 0 1.739869
# 1 2.467790
# 2 2.863659
# 3 3.295505
# 4 5.106419
# 5 6.872791
# 6 7.080834
# 7 9.909320
# 8 11.027117
# 9 12.383085
A['group'] = B['timestamp'].searchsorted(A['timestamp'])
print(A)
yields
timestamp group
0 0.896705 0
1 1.626945 0
2 2.410220 1
3 3.151872 3
4 3.613962 4
5 4.256528 4
6 4.481392 4
7 5.189938 5
8 5.937064 5
9 6.562172 5
Thus, the timestamp 0.896705 is in group 0 because it comes before B['timestamp'][0] (i.e. 1.739869). The timestamp 2.410220 is in group 1 because it is larger than B['timestamp'][0] (i.e. 1.739869) but smaller than B['timestamp'][1] (i.e. 2.467790).
You should also decide what to do if a value in A['timestamp'] is exactly equal to one of the cutoff values in B['timestamp']. Use
B['timestamp'].searchsorted(A['timestamp'], side='left')
if you want a timestamp exactly equal to B['timestamp'][i] to be assigned to group i, and use
B['timestamp'].searchsorted(A['timestamp'], side='right')
if you want it assigned to group i+1. If you don't specify side, then side='left' is used by default.
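For example, here is a small made-up illustration (not your data) of the difference when a value in A exactly equals a cutoff in B:
import pandas as pd

B = pd.Series([1.0, 2.0, 3.0])
A = pd.Series([2.0])  # exactly equal to B[1]

print(B.searchsorted(A, side='left'))   # [1] -> the timestamp lands in group 1
print(B.searchsorted(A, side='right'))  # [2] -> the timestamp lands in group 2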