Given the following two lists:
dates = [1,2,3,4,5]
rates = [0.0154, 0.0169, 0.0179, 0.0187, 0.0194]
I would like to generate a list
df = []
of the same length as dates and rates (indices 0 to 4, i.e. 5 elements) in 'pure' Python (without NumPy) as an exercise.
df[i] would be equal to:
df[0] = 1 / (1 + rates[0])
df[1] = (1 - df[0] * rates[1]) / (1 + rates[1])
...
df[4] = (1 - (df[0] + df[1]..+df[3])*rates[4]) / (1 + rates[4])
I was trying:
df = []
df.append(1 + rates[0]) #create df[0]
for date in enumerate(dates, start = 1):
    running_sum_vec = 0
    for i in enumerate(rates, start = 1):
        running_sum_vec += df[i] * rates[i]
    df[i] = (1 - running_sum_vec) / (1+ rates[i])
return df
but I am getting a TypeError: list indices must be integers. Thank you.
The enumerate function yields two values on each iteration: the index and the value.
>>> x = ['a', 'b', 'a']
>>> for y_count, y in enumerate(x):
...     print('index: {}, value: {}'.format(y_count, y))
...
index: 0, value: a
index: 1, value: b
index: 2, value: a
It's because of for i in enumerate(rates, start = 1):. enumerate generates tuples of the index and the object in the list. You should do something like
for i, rate in enumerate(rates, start=1):
    running_sum_vec += df[i] * rate
You'll need to fix the other loop (for date in enumerate...) as well.
You also need to move df[i] = (1 - running_sum_vec) / (1+ rates[i]) back into the loop (currently it will only set the last value) (and change it to append since currently it will try to set at an index out of bounds).
Not sure if this is what you want:
df = []
running_sum = 0  # sum of the values already stored in df
for ind, val in enumerate(dates):
    df.append((1 - running_sum * rates[ind]) / (1 + rates[ind]))
    running_sum += df[ind]
enumerate returns both the index and the entry.
So, assuming the lists contain numbers, your code can be:
df = [1 / (1 + rates[0])]  # create df[0]
for i, rate in enumerate(rates[1:], start=1):
    # sum(df) is df[0] + ... + df[i-1], the values computed so far
    df.append((1 - sum(df) * rate) / (1 + rate))
Although I'm almost positive there's a way to do this with a list comprehension. I'll have to think about it for a bit.
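As a minimal sketch, the recurrence from the question can be wrapped in a function that keeps a running sum, so each new value costs one step. The name discount_factors is only illustrative and does not come from the question.

```python
def discount_factors(rates):
    """Return df where df[i] = (1 - sum(df[:i]) * rates[i]) / (1 + rates[i])."""
    factors = []
    running_sum = 0.0  # sum of the factors computed so far
    for rate in rates:
        factor = (1 - running_sum * rate) / (1 + rate)
        factors.append(factor)
        running_sum += factor
    return factors


rates = [0.0154, 0.0169, 0.0179, 0.0187, 0.0194]
df = discount_factors(rates)
print(df)
```

For the first element the running sum is 0, so the general formula reduces to 1 / (1 + rates[0]) and no special case is needed.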
data picture
Sorry for the inconvenience of posting the data as a picture!
I have this data and I am trying to calculate EMA_20 for each row based on the EMA_20 of the row before it.
Example: calculate EMA_20 at index 1003 based on EMA_20 at index 1004. I tried using vectorization to speed this up, but I don't know how to refer to a specific row by its index.
def vec_EMA(data ,indicator = 20):
    K = 2/(indicator + 1)
    if (data['index'].values[0] == len(data) - 1):
        return data["close"] * K + data["SMA_" + str(indicator)] * (1- K)
    return data["close"] * K + data["EMA_20"][data.index + 1] * (1- K)
new_data['EMA_20'] = vec_EMA(new_data)
The result is what the picture shows, but it is not what I am trying to compute.
The expected output is:
EMA_20 at index 1003 = data['close'] at index 1003 * K + EMA_20 at index 1004 * (1 - K), where K = 2/(20+1)
The result should be 47.13531746031746, not 39.158333.
Instead of trying to update the dataframe directly, it is easier to build a list inside the function and return it at the end:
def vec_EMA(df,indicator=20):
    EMA_20_list=[]
    K = 2/(indicator + 1)
    for index in df.index :
        if index==len(df)-1: #indexing is done in reverse order starting from 0
            value=df.loc[index,'close']*K + (1-K)*df.loc[index,"SMA_" +str(indicator)]
            EMA_20_list.append(value)
        else:
            value=df.loc[index,'close']*K+ (1-K)* EMA_20_list[-1]
            #EMA_20_list[-1] is the value computed for the row above
            EMA_20_list.append(value)
    return EMA_20_list
df['EMA_20']=vec_EMA(df)
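The same recurrence can be sketched without pandas, which makes it clearer why it cannot be computed in one vectorized step: each value depends on the one computed just before it. The function name recursive_ema and the seed argument (standing in for the SMA used on the first row) are illustrative assumptions.

```python
def recursive_ema(closes, seed, span=20):
    """EMA where each value is close * K + previous_ema * (1 - K).

    `seed` plays the role of the SMA used for the very first row.
    """
    k = 2 / (span + 1)
    emas = []
    previous = seed
    for close in closes:
        previous = close * k + previous * (1 - k)
        emas.append(previous)
    return emas


# Small check with span=3, so K = 0.5
print(recursive_ema([10.0, 20.0], seed=10.0, span=3))  # [10.0, 15.0]
```

The closes must be passed in the order in which the EMA should propagate (oldest row first).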
list = [[159.2213, 222.2223, 101.2122],
        [359.2222, 22.2210, 301.2144]]
if list[1][0] < list[0][0]:
    avg = (list[1][0] + list[0][0] - 200)/2
else:
    avg = (list[1][0] + list[0][0] + 200)/2
Hello! I want to do this for every column and output the results in another list.
Fix
You can loop over the column indices:
values = [[159.2213, 222.2223, 101.2122], [359.2222, 22.2210, 301.2144]]
avgs = []
for idx_col in range(len(values[0])):
    if values[1][idx_col] < values[0][idx_col]:
        avg = (values[1][idx_col] + values[0][idx_col] - 200) / 2
    else:
        avg = (values[1][idx_col] + values[0][idx_col] + 200) / 2
    avgs.append(avg)
Simplify
You can use zip to iterate over both rows at once, and simplify the if/else condition:
avgs = []
for first_row, second_row in zip(*values):
    factor = -1 if second_row < first_row else 1
    avgs.append((first_row + second_row + (200 * factor)) / 2)
Best with numpy
Easy syntax and best performance
import numpy as np
values = np.array(values)
res = values.sum(axis=0) / 2
res += np.where(values[1] < values[0], -100, 100)
A list comprehension would look like this:
avg = [(x + y + (200 if x <= y else -200)) / 2 for x, y in zip(*lst)]
Arguably easier if you use numpy:
import numpy as np
arr = np.array(lst)
# arr[1] - arr[0] keeps the result one-dimensional
avg = 0.5 * (arr.sum(axis=0) + np.copysign(200, arr[1] - arr[0]))
lis = [[159.2213, 222.2223, 101.2122],
       [359.2222, 22.2210, 301.2144]]
res = []
for i in range(len(lis[0])):
    if lis[1][i] < lis[0][i]:
        res.append((lis[1][i] + lis[0][i] - 200)/2)
    else:
        res.append((lis[1][i] + lis[0][i] + 200)/2)
This should work; however, numpy would be a better fit for this kind of problem.
You can do it like this:
list = [[159.2213, 222.2223, 101.2122],
        [359.2222, 22.2210, 301.2144]]
results = []
for x,y in zip(list[0],list[1]):
    if y < x:
        avg = (y + x - 200)/2
    else:
        avg = (y + x + 200)/2
    results.append(avg)
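The answers above can be collected into one small function. This is only a sketch: the name adjusted_averages and the offset parameter are illustrative, and it avoids calling a variable list so the built-in type is not shadowed.

```python
def adjusted_averages(first_row, second_row, offset=200):
    """Average each column pair; shift the sum by -offset when the
    second value is smaller than the first, and by +offset otherwise."""
    averages = []
    for first, second in zip(first_row, second_row):
        shift = -offset if second < first else offset
        averages.append((first + second + shift) / 2)
    return averages


values = [[159.2213, 222.2223, 101.2122],
          [359.2222, 22.2210, 301.2144]]
averages = adjusted_averages(values[0], values[1])
print([round(a, 5) for a in averages])
```

With the sample data the three results are approximately 359.22175, 22.22165 and 301.2133.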
I would like to convert my dataframe from one format of values (X:XX:XX:XX) to another (X.X seconds).
Here is what my dataframe looks like:
Start End
0 0:00:00:00
1 0:00:00:00 0:07:37:80
2 0:08:08:56 0:08:10:08
3 0:08:13:40
4 0:08:14:00 0:08:14:84
I would like to convert it to seconds, something like this:
Start End
0 0.0
1 0.0 457.80
2 488.56 490.08
3 493.40
4 494.0 494.84
To do that I did:
i = 0
j = 0
while j < 10:
    while i < 10:
        if data.iloc[i, j] != "":
            Value = (int(data.iloc[i, j][0]) * 3600) + (int(data.iloc[i, j][2:4]) *60) + int(data.iloc[i, j][5:7]) + (int(data.iloc[i, j][8: 10])/100)
            NewValue = data.iloc[:, j].replace([data.iloc[i, j]], Value)
            i += 1
        else:
            NewValue = data.iloc[:, j].replace([data.iloc[i, j]], "")
            i += 1
        data.update(NewValue)
    i = 0
    j += 1
But I failed to replace the values in my original dataframe permanently. When I do:
print(data)
I still get my old dataframe in the wrong format.
Could someone help me? I tried so hard!
Thank you so much!
You are using pandas.DataFrame.update that requires a pandas dataframe as an argument. See the Example part of the update function documentation to really understand what update does https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html
If I may suggest a more idiomatic solution; you can directly map a function to all values of a pandas Series
def parse_timestring(s):
    if s == "":
        return s
    else:
        # weird to use centiseconds and not milliseconds
        # l is a list with [hour, minute, second, cs]
        l = [int(nbr) for nbr in s.split(":")]
        return sum([a*b for a,b in zip(l, (3600, 60, 1, 0.01))])
df["Start"] = df["Start"].map(parse_timestring)
You can remove the if ... else ... from parse_timestring if you replace all empty string with nan values in your dataframe with df = df.replace("", numpy.nan) then use df["Start"] = df["Start"].map(parse_timestring, na_action='ignore')
see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html
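To make the arithmetic concrete, here is a small pandas-free sketch of the conversion, assuming the format is hours:minutes:seconds:hundredths. The function name timestring_to_seconds and the choice to return None for empty cells are illustrative assumptions.

```python
def timestring_to_seconds(text):
    """Convert 'H:MM:SS:CC' (CC = hundredths of a second) to seconds."""
    if not text:
        return None  # assumption: empty cells are mapped to None
    hours, minutes, seconds, centis = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds + centis / 100


for sample in ["0:00:00:00", "0:07:37:80", "0:08:10:08"]:
    print(sample, "->", timestring_to_seconds(sample))
```

For example, 0:07:37:80 is 7 * 60 + 37 + 80 / 100, which is about 457.8 seconds.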
The datetime library is made to deal with such data. You should also use the apply function of pandas to avoid iterating over the dataframe like that.
You can proceed as follows:
from datetime import timedelta
def to_seconds(value):
    if value == "":
        return value  # leave empty cells untouched
    # the fields are hours:minutes:seconds:centiseconds
    hours, minutes, seconds, centis = (int(part) for part in value.split(':'))
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds, milliseconds=10 * centis)
    return delta.total_seconds()
data['Start'] = data['Start'].apply(to_seconds)
data['End'] = data['End'].apply(to_seconds)
Thank you so much for your help.
Your method worked. I also found a method using loops:
To summarize, my general problem was that I had an ugly csv file that I wanted to transform into a csv usable for statistics, and I wanted to do that with Python.
my csv file was like:
MiceID = 1 Beginning End Type of behavior
0 0:00:00:00 Video start
1 0:00:01:36 grooming type 1
2 0:00:03:18 grooming type 2
3 0:00:06:73 0:00:08:16 grooming type 1
In my ugly csv file I only recorded the beginning of each behavior, without an end, when different types of behavior directly followed each other. I recorded the end of a behavior only when the mouse stopped grooming altogether, which allowed me to separate sequences of grooming. But this type of csv was not easy to use for statistics.
So I wanted to 1) convert all my values to seconds to get a correct format, 2) fill the gaps in the End column (a gap has to be filled with the following beginning value, since the end of a behavior within a sequence is the beginning of the next one), 3) create columns for the duration of each behavior, and finally 4) fill these new columns with the durations.
My question was about the first step, but I put the code for each step here separately:
step 1: convert the values to a usable format
import pandas as pd
import numpy as np
data = pd.read_csv("D:/Python/TestPythonTraitementDonnéesExcel/RawDataBatch2et3.csv", engine = "python")
data.replace(np.nan, "", inplace = True)
i = 0
j = 0
while j < len(data.columns):
    while i < len(data.index):
        if (":" in data.iloc[i, j]) == True:
            Value = str((int(data.iloc[i, j][0]) * 3600) + (int(data.iloc[i, j][2:4]) *60) + int(data.iloc[i, j][5:7]) + (int(data.iloc[i, j][8: 10])/100))
            data = data.replace([data.iloc[i, j]], Value)
            data.update(data)
            i += 1
        else:
            i += 1
    i = 0
    j += 1
print(data)
step 2: fill the gaps
i = 0
j = 2
while j < len(data.columns):
    while i < len(data.index) - 1:
        if data.iloc[i, j] == "":
            data.iloc[i, j] = data.iloc[i + 1, j - 1]
            data.update(data)
            i += 1
        elif np.all(data.iloc[i:len(data.index), j] == ""):
            break
        else:
            i += 1
    i = 0
    j += 4
print(data)
step 3: create a new column for each mouse:
j = 1
k = 0
while k < len(data.columns) - 1:
    k = (j * 4) + (j - 1)
    data.insert(k, "Duree{}".format(k), "")
    data.update(data)
    j += 1
print(data)
step 4: fill the new columns with the durations
j = 4
i = 0
while j < len(data.columns):
    while i < len(data.index):
        if data.iloc[i, j - 2] != "":
            data.iloc[i, j] = str(float(data.iloc[i, j - 2]) - float(data.iloc[i, j - 3]))
            data.update(data)
            i += 1
        else:
            break
    i = 0
    j += 5
print(data)
And of course, export my new usable dataframe
data.to_csv(r"D:/Python/TestPythonTraitementDonnéesExcel/FichierPropre.csv", index = False, header = True)
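Steps 2 and 4 can also be sketched on plain Python data, which may make the logic easier to follow than the index arithmetic. This is a simplified illustration for a single mouse, assuming the times are already in seconds and a missing end is stored as None. The function name fill_ends_and_durations is hypothetical.

```python
def fill_ends_and_durations(rows):
    """rows: list of (start, end) pairs in seconds; end may be None.

    A missing end is filled with the start of the following row, then
    the duration is computed as end - start.
    """
    completed = []
    for position, (start, end) in enumerate(rows):
        if end is None and position + 1 < len(rows):
            end = rows[position + 1][0]
        duration = None if end is None else end - start
        completed.append((start, end, duration))
    return completed


rows = [(1.36, None), (3.18, None), (6.73, 8.16)]
for start, end, duration in fill_ends_and_durations(rows):
    print(start, end, duration)
```

A last row with no end and no following row keeps None for both its end and its duration.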
Here are the transformations (pictures linked in the original post): before step 1, after step 1, after step 2, after step 3, after step 4.
I am trying to sum the values in the 'Callpayoff' list but am unable to do so. print(Callpayoff) returns a vertical list:
0
4.081687878300656
1.6000410648454846
0.5024316862043037
0
so I wonder if it's a special kind of sublist? Unfortunately, sum(Callpayoff) does not work. Any help would be greatly appreciated.
import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
def Generate_asset_price(S,v,r,dt):
    return (1 + r * dt + v * sqrt(dt) * np.random.normal(0,1))
def Call_Poff(S,T):
    return max(stream[-1] - S,0)
# initial values
S = 100
v = 0.2
r = 0.05
T = 1
N = 2 # number of steps
dt = 0.00396825
simulations = 5
for x in range(simulations):
    stream = [100]
    Callpayoffs = []
    t = 0
    for n in range(N):
        s = stream[t] * Generate_asset_price(S,v,r,dt)
        stream.append(s)
        t += 1
    Callpayoff = Call_Poff(S,T)
    print(Callpayoff)
    plt.plot(stream)
Right now you're not appending values to a list, you're just replacing the value of Callpayoff at each iteration and printing it. At each iteration, it's printed on a new line so it looks like a "vertical list".
What you need to do is use Callpayoffs.append(Call_Poff(S,T)) instead of Callpayoff = Call_Poff(S,T).
Now a new element will be added to Callpayoffs at every iteration of the for loop.
Then you can print the list with print(Callpayoffs) or the sum with print(sum(Callpayoffs))
All in all the for loop should look like this:
for x in range(simulations):
    stream = [100]
    Callpayoffs = []
    t = 0
    for n in range(N):
        s = stream[t] * Generate_asset_price(S,v,r,dt)
        stream.append(s)
        t += 1
        Callpayoffs.append(Call_Poff(S,T))
    print(Callpayoffs,"sum:",sum(Callpayoffs))
Output:
[2.125034975231003, 0] sum: 2.125034975231003
[0, 0] sum: 0
[0, 0] sum: 0
[0, 0] sum: 0
[3.2142923036024342, 4.1390018820809615] sum: 7.353294185683396
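If the goal is a single total across all simulations, one possible variant is to create the list once, before the simulation loop, and append one payoff per simulated path. The sketch below uses only the standard library (random.gauss instead of np.random.normal). The helper names and the fixed seed are assumptions made for illustration, not part of the original code.

```python
import math
import random


def simulate_final_price(start, volatility, rate, dt, steps, rng):
    """Walk the price forward `steps` times and return the last value."""
    price = start
    for _ in range(steps):
        price *= 1 + rate * dt + volatility * math.sqrt(dt) * rng.gauss(0, 1)
    return price


def call_payoff(final_price, strike):
    return max(final_price - strike, 0)


rng = random.Random(0)  # fixed seed, only so reruns are repeatable
payoffs = []            # created once, before the simulation loop
for _ in range(5):
    final_price = simulate_final_price(100, 0.2, 0.05, 0.00396825, 2, rng)
    payoffs.append(call_payoff(final_price, 100))

print(payoffs)
print("sum:", sum(payoffs))
```

Because payoffs is not reset inside the loop, it ends up with one entry per simulation and sum(payoffs) covers all of them.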
Given n = 4 and m = 3, I have to take n elements from the start of a list and n elements from its end. In the example below these two sublists are [17,12,10,2] and [2,11,20,8].
From these two sublists I have to pick the highest value, and then delete that element from the original list.
This step has to be performed m times, and the selected highest values are summed.
A = [17,12,10,2,7,2,11,20,8], n = 4, m = 3
Output: 20 + 17 + 12 = 49
I have written the following code. However, it performs poorly and times out on larger lists. Could you please help?
A = [17,12,10,2,7,2,11,20,8]
m = 3
n = 4
scoreSum = 0
count = 0
firstGrp = []
lastGrp = []
while(count<m):
    firstGrp = A[:n]
    lastGrp = A[-n:]
    maxScore = max(max(firstGrp), max(lastGrp))
    scoreSum = scoreSum + maxScore
    if(maxScore in firstGrp):
        A.remove(maxScore)
    else:
        # index of the last occurrence of maxScore in A
        ai = len(A) - 1 - A[::-1].index(maxScore)
        A.pop(ai)
    count = count + 1
    firstGrp.clear()
    lastGrp.clear()
print(scoreSum )
I would do it this way; you can generalize it later:
a = [17,12,10,2,7,2,11,20,8]
a.sort(reverse=True)
sums=0
for i in range(3):
    sums +=a[i]
print(sums)
If you are concerned about performance, you should use a specialized library such as numpy, which will be much faster!
A = [17,12,10,2,7,2,11,20,8]
n = 4
m = 3
score = 0
for _ in range(m):
    sublist = A[:n] + A[-n:]
    subidx = [x for x in range(n)] + [x for x in range(len(A) - n, len(A))]
    sub = zip(sublist, subidx)
    maxval = max(sub, key=lambda x: x[0])
    score += maxval[0]
    del A[maxval[1]]
print(score)
Your method uses a lot of max() calls. Combining the front and back slices reduces those searches to a single max() pass, and pairing each value with its index gives the position to delete from the list directly.
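The same idea can be sketched as a reusable function that searches candidate indices instead of building slices. The name windowed_max_sum is illustrative, and the sketch assumes m is not larger than the length of the list. Ties are resolved in favor of the smallest index.

```python
def windowed_max_sum(values, n, m):
    """Sum of m picks, each the largest value among the first n and
    last n elements; the picked element is removed after each pick."""
    items = list(values)  # work on a copy so the caller's list is untouched
    total = 0
    for _ in range(m):
        size = len(items)
        # candidate positions: first n and last n (merged if they overlap)
        candidates = set(range(min(n, size))) | set(range(max(size - n, 0), size))
        best = max(sorted(candidates), key=lambda i: items[i])
        total += items[best]
        del items[best]
    return total


print(windowed_max_sum([17, 12, 10, 2, 7, 2, 11, 20, 8], n=4, m=3))  # 49
```

Each pick still costs time proportional to n, plus the cost of deleting from the list, so this is a readability sketch rather than an asymptotic improvement.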