I am working on a project for my thesis, which has to do with the capitalization of Research & Development (R&D) expenses for a data set of companies that I have.
For those who are not familiar with financial terminology, I am trying to accumulate the values of each year's R&D expenses with the following ones by decaying its value (or "depreciating" it) every time period.
I was able to apply the following code to get the gist of the operation:
df['rd_capital'] = [
    (df['r&d_exp'].iloc[:i]
     * (1 - df['dep_rate'].iloc[:i] * np.arange(i)[::-1])).sum()
    for i in range(1, len(df) + 1)
]
However, there is a major flaw with this method, which is that it continues to take away the depreciation rate once the value has reached zero, therefore going into negative territory.
For example if we have Apple's R&D expenses for 5 years at a constant depreciation rate of 20%, the code above gives me the following result:
year r&d_exp dep_rate r&d_capital
0 1999 10 0.2 10
1 2000 8 0.2 16
2 2001 12 0.2 24.4
3 2002 7 0.2 25.4
4 2003 15 0.2 33
5 2004 8 0.2 30.6
6 2005 11 0.2 29.6
However, the value for the year 2005 is incorrect as it should be 31.6!
If it was not clear, r&d_capital is retrieved the following way:
2000 = 10*(1-0.2) + 8
2001 = 10*(1-0.4) + 8*(1-0.2) + 12
2002 = 10*(1-0.6) + 8*(1-0.4) + 12*(1-0.2) + 7
2003 = 10*(1-0.8) + 8*(1-0.6) + 12*(1-0.4) + 7*(1-0.2) + 15
the key problem comes here as the code above does the following:
2004 = 10*(1-1) + 8*(1-0.8) + 12*(1-0.6) + 7*(1-0.4) + 15*(1-0.2) + 8
2005 = 10*(1-1.2) + 8*(1-1) + 12*(1-0.8) + 7*(1-0.6) + 15*(1-0.4) + 8*(1-0.2) + 11
Instead it should discard the values once the value reaches zero, just like this:
2004 = 8*(1-0.8) + 12*(1-0.6) + 7*(1-0.4) + 15*(1-0.2) + 8
2005 = 12*(1-0.8) + 7*(1-0.6) + 15*(1-0.4) + 8*(1-0.2) + 11
Thank you in advance for any help that you will give, really appreciate it :)
A possible way would be to compute the residual part of each investment. The assumption is that there is a finite and known number of years after which any investment is fully depreciated. Here I will use 6 years (5 would be enough, but it demonstrates how to avoid negative depreciations):
# cumulated depreciation rates:
cum_rate = pd.DataFrame(index=df.index)
for i in range(2, 7):
    cum_rate['cum_rate' + str(i)] = df['dep_rate'].rolling(i).sum().shift(1 - i)
cum_rate['cum_rate1'] = df['dep_rate']
cum_rate[cum_rate > 1] = 1  # cap at 1 to avoid negative residual values

# residual values
resid = pd.DataFrame(index=df.index)
for i in range(1, 7):
    resid['r' + str(i)] = (df['r&d_exp'] * (1 - cum_rate['cum_rate' + str(i)])).shift(i)

# compute the capital
df['r&d_capital'] = resid.sum(axis=1) + df['r&d_exp']
It gives the expected result:
year r&d_exp dep_rate r&d_capital
0 1999 10 0.2 10.0
1 2000 8 0.2 16.0
2 2001 12 0.2 24.4
3 2002 7 0.2 25.4
4 2003 15 0.2 33.0
5 2004 8 0.2 30.6
6 2005 11 0.2 31.6
You have to keep track of the absolute depreciation and stop depreciating once the asset's value reaches zero. Look at the following code:
>>> exp = [10, 8, 12, 7, 15, 8, 11]
>>> dep = [0.2*x for x in exp]
>>> cap = [0]*7
>>> for i in range(7):
...     x = exp[:i+1]
...     for j in range(i):
...         x[j] -= (i-j)*dep[j]
...         x[j] = max(x[j], 0)
...     cap[i] = sum(x)
...
>>> cap
[10, 16.0, 24.4, 25.4, 33.0, 30.599999999999998, 31.6]
>>>
In the for loops I calculate for every year the remaining value of all assets (in variable x). When this reaches zero, I stop depreciating. That is what the statement x[j] = max(x[j], 0) does. The sum of the value of all assets is then stored in cap[i].
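For larger frames, the same clamped-depreciation logic can be vectorized with NumPy. This is only a sketch, under the assumption that each expense depreciates linearly at its own year's dep_rate and is floored at zero once fully depreciated (the sample data is the example from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'year': range(1999, 2006),
    'r&d_exp': [10, 8, 12, 7, 15, 8, 11],
    'dep_rate': [0.2] * 7,
})

exp = df['r&d_exp'].to_numpy(dtype=float)
rate = df['dep_rate'].to_numpy(dtype=float)
n = len(df)

# ages[i, j] = i - j: how many years expense j has depreciated by year i
ages = np.subtract.outer(np.arange(n), np.arange(n))
# residual value of each expense, floored at zero once fully depreciated
resid = exp * np.clip(1 - rate * ages, 0, None)
# sum only over past and current expenses (j <= i)
df['rd_capital'] = np.where(ages >= 0, resid, 0).sum(axis=1)
# df['rd_capital'] is now [10.0, 16.0, 24.4, 25.4, 33.0, 30.6, 31.6]
```

This avoids the Python-level double loop at the cost of an n-by-n intermediate array.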
I'm trying to calculate daily returns using the time weighted rate of return formula:
(Ending Value-(Beginning Value + Net Additions)) / (Beginning value + Net Additions)
My DF looks like:
Account # Date Balance Net Additions
1 9/1/2022 100 0
1 9/2/2022 115 10
1 9/3/2022 117 0
2 9/1/2022 50 0
2 9/2/2022 52 0
2 9/3/2022 40 -15
It should look like:
Account # Date Balance Net Additions Daily TWRR
1 9/1/2022 100 0
1 9/2/2022 115 10 0.04545
1 9/3/2022 117 0 0.01739
2 9/1/2022 50 0
2 9/2/2022 52 0 0.04
2 9/3/2022 40 -15 0.08108
After calculating the daily returns for each account, I want to link all the returns throughout the month to get the monthly return:
((1 + return) * (1 + return)) - 1
The final result should look like:
Account # Monthly Return
1 0.063636
2 0.12432
Through research (and trial and error), I was able to get the output I am looking for but as a new python user, I'm sure there is an easier/better way to accomplish this.
DF["Numerator"] = DF.groupby("Account #")["Balance"].diff() - DF["Net Additions"]
DF["Denominator"] = ((DF["Numerator"] + DF["Net Additions"] - DF["Balance"]) * -1) + DF["Net Additions"]
DF["Daily Returns"] = (DF["Numerator"] / DF["Denominator"]) + 1
DF = DF.groupby("Account #")["Daily Returns"].prod() - 1
Any help is appreciated!
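A possible condensed version, sketched under the assumption that the columns are named as in the sample above: the denominator (beginning value plus net additions) can be built directly from a grouped shift, so no intermediate Numerator/Denominator columns are needed.

```python
import pandas as pd

df = pd.DataFrame({
    'Account #': [1, 1, 1, 2, 2, 2],
    'Date': pd.to_datetime(['2022-09-01', '2022-09-02', '2022-09-03'] * 2),
    'Balance': [100, 115, 117, 50, 52, 40],
    'Net Additions': [0, 10, 0, 0, 0, -15],
})

# beginning value = previous balance within each account
base = df.groupby('Account #')['Balance'].shift() + df['Net Additions']
df['Daily TWRR'] = (df['Balance'] - base) / base

# geometrically link the daily returns into a monthly return per account
monthly = df.groupby('Account #')['Daily TWRR'].apply(lambda r: (1 + r).prod() - 1)
```

The first row of each account has no previous balance, so its daily return is NaN, which `prod()` skips when linking.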
import numpy as np
import matplotlib.pyplot as plt
Pa00 = 100.0 # systemic arterial pressure
Pic00 = 9.5 # intracranial pressure
Pc00 = 25.0 # capillary pressure
Pvs00 = 6.0 # dural sinus pressure
Ca00 = 0.15 # arterial compliance
Rvp = 0.068 # resistance of venous plexus
CBFTo = 10.5 + 2.0 # total CBF of supine state
CBFv = 2.0 # vertebral vein part CBF of supine state
CBFj = CBFTo - CBFv # jugular vein part CBF of supine state
CVP = 2.0 # central venous pressure
Q_normal = 12.5 # flow rate, ml/sec
unit_R = R_comp * 3.0 # arterial segment resistance
unit_C = 0.1 # arterial segment capacitance
Pvc = np.empty(22, dtype=float)
for i in range(1, 23):
    if i < 10:
        Pvc = Pa00 - i * unit_R * Q_normal
    elif i > 13 & i < 23:
        Pvc = CVP + i * R_comp * CBFj
    elif i == 10:
        Pvc = Pvc[8] - unit_R * Q_normal
        Pvc = Ca00 * (Pvc - Pic00)
    elif i == 11:
        Pvc = Ca00
    elif i == 12:
        Pvc = Pic00
    elif i == 13:
        Pvc = Pvs00
    print("{} {}".format(i, Pvc))
    plt.plot(i, Pvc)
plt.show()
After running the code above, I got the following printed output:
1 99.95142857142856
2 99.90285714285714
3 99.85428571428571
4 99.80571428571429
5 99.75714285714285
6 99.70857142857143
7 99.66
8 99.61142857142858
9 99.56285714285714
10 2.136
11 2.1496
12 9.5
13 6.0
14 2.1904000000000003
15 2.204
16 2.2176
17 2.2312000000000003
18 2.2448
19 2.2584000000000004
20 2.2720000000000002
21 2.2856000000000005
22 2.2992000000000004
but I cannot get the plot to show anything.
Is there a problem with the array or with the plot call?
I have been stuck on this problem for several days.
Could you please let me know how to solve it?
You are actually assigning new values to the Pvc variable instead of storing them in the array. Use Pvc[index] = ... to assign to a specific index.
Also, you need to record the succession of i values in an array or generate one (e.g. np.arange(22)); otherwise i will just contain its most recent value (an integer). In any case, you can get the same result by omitting the x-value array in the plot call (e.g. plt.plot(Pvc)).
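A minimal sketch of both fixes together; note the pressure formula here is a placeholder, not the model from the question:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 22
Pvc = np.empty(n, dtype=float)
for i in range(n):
    # store each value at its own index instead of overwriting the scalar
    Pvc[i] = 100.0 - 0.5 * i  # placeholder formula, not the real model

plt.plot(np.arange(n), Pvc)  # or simply plt.plot(Pvc)
plt.show()
```

Plotting the full array once, outside the loop, draws a connected line; calling plt.plot(i, Pvc) once per iteration only adds isolated single points, which are invisible without a marker style.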
Hello everyone, I have the following problem: I need to filter my data according to an equation.
Here is what I mean.
For example, I have this dataframe:
tonnage period_year
5 2,462,297.5 2014
13 2,274,912.9 2015
19 2,181,492.2 2015
20 2,173,654.8 2016
21 2,158,043.7 2016
... ... ...
92885 5.0 2016
92886 5.0 2016
92901 5.0 2016
94814 0.0 2016
94861 0.0 2013
and I have:
data[data.tonnage > 0.02e6]['tonnage'].sum()/data.tonnage.sum() * 100.0
97.08690080799717
data[data.tonnage > 5e6]['tonnage'].sum()/data.tonnage.sum() * 100.0
18.541547916532426
so I need to find the maximum x where
data[data.tonnage > x]['tonnage'].sum()/data.tonnage.sum() * 100.0
gives an answer greater than or equal to 40.
What's the best way to do it?
Try this:
# Your sample input
df = pd.DataFrame({
'tonnage': [100,100,100,200,5,5,5,5,5]
})
# Get the sum of each unique value in `tonnage`
t = df.groupby('tonnage')['tonnage'].sum().sort_index(ascending=False)
# Since your requirement is "> x", we have to subtract the current value from the cumsum
ratio = (t.cumsum() - t) / t.sum() * 100
# And voila!
x = ratio[ratio >= 40].index[0]
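Run end-to-end on the sample input above, the result can be sanity-checked by plugging x back into the original filter expression from the question:

```python
import pandas as pd

df = pd.DataFrame({'tonnage': [100, 100, 100, 200, 5, 5, 5, 5, 5]})

# sum of each unique tonnage value, largest value first
t = df.groupby('tonnage')['tonnage'].sum().sort_index(ascending=False)
# share of total tonnage carried by rows strictly greater than each value
ratio = (t.cumsum() - t) / t.sum() * 100
x = ratio[ratio >= 40].index[0]

# verify against the original expression from the question
share = df[df.tonnage > x]['tonnage'].sum() / df.tonnage.sum() * 100.0
```

Because ratio is indexed in descending tonnage order and decreases as the threshold grows, `index[0]` of the filtered series is the largest qualifying value.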
I would like to take a weighted average of "cycle" based on a "day" as window. The window is not always the same. How do I compute weighted average in pandas?
In [3]: data = {'cycle':[34.1, 41, 49.0, 53.9, 35.8, 49.3, 38.6, 51.2, 44.8],
'day':[6,6,6,13,13,20,20,20,20]}
In [4]: df = pd.DataFrame(data, index=np.arange(9), columns = ['cycle', 'day'])
In [5]: df
Out[5]:
cycle day
0 34.1 6
1 41.0 6
2 49.0 6
3 53.9 13
4 35.8 13
5 49.3 20
6 38.6 20
7 51.2 20
8 44.8 20
I would expect three values (if I have done this correctly):
34.1 * 1/3 + 41 * 1/3 + 49 * 1/3 = 41.36
cycle day
41.36 6
6.90 13
45.90 20
If I'm understanding correctly, I think you just want:
df.groupby(['day']).mean()
Group on day, and then apply a lambda function that calculates the sum of the group and divides it by the number of non-null values within the group.
>>> df.groupby('day').cycle.apply(lambda group: group.sum() / group.count())
day
6 41.366667
13 44.850000
20 45.975000
Name: cycle, dtype: float64
Although you say weighted average, I don't believe there are any weights involved. It appears to be a simple average of the cycle values for a particular day. In fact, a simple mean should suffice.
Also, I believe the value for day 13 should be calculated as 53.9 * 1/2 + 35.8 * 1/2 which yields 44.85. Same approach for day 20.
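Both points can be checked directly: a plain groupby mean on the sample data reproduces the expected numbers, including 44.85 for day 13.

```python
import pandas as pd

df = pd.DataFrame({
    'cycle': [34.1, 41.0, 49.0, 53.9, 35.8, 49.3, 38.6, 51.2, 44.8],
    'day':   [6, 6, 6, 13, 13, 20, 20, 20, 20],
})

# equal weights per row collapse to an ordinary mean per day
means = df.groupby('day')['cycle'].mean()
```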
I have a dataframe that has a year column ('Year') and a dollar value column. I want to group by the year, then for each row determine whether the row is above the group's median by 20% or below the group's median by 20%.
I tried the following:
def f(x):
    if x >= 1.2 * np.median(x):
        return 'H'
    elif x <= 0.8 * np.median(x):
        return 'L'

transformed = df.groupby('Year').transform(f)
But I get an error saying the truth value of an array is ambiguous. This makes me think python is treating x in both the left and right hand side of the equation as the array of values, when in other transformation functions it knows on the left hand side the x is the row element and on the right hand side, where x is wrapped in an aggregation, x is the array.
Any idea on how to do this?
I think what you want is something like this:
import numpy as np
import pandas as pd
from numpy.random import poisson, randint

n = 20
dr = randint(2000, 2014, size=n)
df = pd.DataFrame({'year': dr,
                   'dollar': np.hstack((poisson(1000, size=n // 2),
                                        poisson(100000, size=n // 2)))})

def med_replace(x):
    res = pd.Series(index=x.index, name='med_cmp', dtype=object)
    med = x.dollar.median()
    upper = 1.2 * med
    lower = 0.8 * med
    res[x.dollar >= upper] = 'H'
    res[x.dollar <= lower] = 'L'
    res[(x.dollar > lower) & (x.dollar < upper)] = 'N'
    return x.join(res)

df.groupby('year').apply(med_replace)
yielding:
dollar year med_cmp
0 1016 2004 N
1 956 2002 L
2 1044 2010 N
3 985 2008 L
4 1038 2001 L
5 997 2001 L
6 1015 2001 L
7 971 2012 L
8 1017 2013 N
9 1040 2010 N
10 99760 2001 H
11 99835 2001 H
12 100017 2012 H
13 99532 2001 H
14 100311 2011 N
15 100344 2002 H
16 100209 2007 N
17 99988 2008 H
18 100204 2007 N
19 100996 2005 N
A numpy ndarray is not a valid argument to bool unless its size is 0 or 1. This means that you cannot evaluate its "truthiness" in an if statement unless it has 0 or 1 elements. This is why you're getting the error you reported.
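The error can be reproduced in isolation, which makes the explanation concrete:

```python
import numpy as np

x = np.array([50, 150, 100])
msg = ''
try:
    if x >= 120:  # x >= 120 is a boolean *array*, not one boolean
        pass
except ValueError as e:
    msg = str(e)
# msg now reads "The truth value of an array with more than one element
# is ambiguous. Use a.any() or a.all()"
```

That is exactly what happens inside `f` above: `x` is the whole group passed by `transform`, so `x >= 1.2 * np.median(x)` is an array and the `if` cannot decide on a single truth value.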