Compute balance column python dataframe with initial static value - python

I am trying to compute a balance column in a pandas DataFrame, starting from a fixed initial value.
The logic:
start balance = 1000
current balance = previous current balance * (1 + df['return'])
My attempt:
df.at[1, 'current balance'] = 1000
df['current balance'] = df['current balance'].shift(1) * (1 + df['return'])
This does not produce the output I want.
Desired output dataframe:
return current balance
0.01 1010.00
0.03 1040.30
0.045 1087.11

This is a standard compound return, which cumprod computes directly:
initial_balance = 1000
df['current balance'] = (1 + df['return']).cumprod() * initial_balance
>>> df
return current balance
0 0.010 1010.0000
1 0.030 1040.3000
2 0.045 1087.1135
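To see why cumprod matches the recurrence from the question, a small self-contained check can compare it with an explicit loop. This is only a sketch: the sample returns are the three from the question, and names such as loop_balances are made up for illustration.

```python
import pandas as pd

initial_balance = 1000
df = pd.DataFrame({"return": [0.01, 0.03, 0.045]})

# Vectorized: running product of the growth factors, scaled by the start value
df["current balance"] = (1 + df["return"]).cumprod() * initial_balance

# Explicit loop implementing balance[t] = balance[t-1] * (1 + return[t])
loop_balances = []
balance = initial_balance
for r in df["return"]:
    balance *= 1 + r
    loop_balances.append(balance)

print(df.round(4))
print([round(b, 4) for b in loop_balances])
```

Both prints should show the same three balances, up to floating-point rounding.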

I would approach this by building the columns as lists first and creating the DataFrame from them, rather than changing values inside the df itself.
There's probably a more elegant solution, but this should work:
import pandas as pd

# The leading 0.00 return lines the list up with the starting balance
df_return = [0.00, 0.01, 0.03, 0.045]
df_current_balance = [1000]
# Calculate the current balance for each return value
for x in range(1, len(df_return)):
    df_current_balance.append(df_current_balance[x - 1] * (1 + df_return[x]))
# Drop the placeholder first entries so only the computed rows remain
output = {"return": df_return[1:], "current balance": df_current_balance[1:]}
df = pd.DataFrame(output)

Related

How to replace an entire column with a for loop function?

import re

input = '18.7m, 17.7m, 11.1m, 9.3m, 6.9m, 9.9m, 4.4m, 1.8m, 7.9m'
multipliers = {'k': 1e3,
               'm': 1e6,
               }
pattern = r'([0-9.]+)([km])'
for number, suffix in re.findall(pattern, input):
    number = float(number)
    print(number * multipliers[suffix])
for views in df_tiktok: #THIS IS WHERE I GET STUCK
I want to apply this for loop to a few columns containing values such as '18.7M', '17.7M', and '645.1k':
Views avg.  Likes avg  Comments avg.  Shares avg
3           18.7M      2.6M           54.7K
6           17.7M      2.3M           18K
FIRST ATTEMPT:
likes_avg = [[likes] for likes in tiktok_data]
if 'K' in likes :
    like=likes.strip('K')
    like=float(like)*1000
    likes_avg.append(round(like))
if 'M' in likes :
    like=likes.strip('M')
    like=float(like)*1000000
    likes_avg.append(round(like))
SECOND ATTEMPT:
input = '18.7m, 17.7m, 11.1m, 9.3m, 6.9m, 9.9m, 4.4m, 1.8m, 7.9m'
multipliers = {'k': 1e3,
               'm': 1e6,
               }
pattern = r'([0-9.]+)([km])'
for number, suffix in re.findall(pattern, input):
    number = float(number)
    print(number * multipliers[suffix])
You can simplify it a lot by creating a function:
def modify(input):
    # Leave values that are already numeric unchanged
    if isinstance(input, (int, float)):
        return input
    if 'K' in input:
        return float(input.replace('K', '')) * 1000
    elif 'M' in input:
        return float(input.replace('M', '')) * 1000000
    else:
        return input
Then apply this modify function to each column of your df:
for c in df.columns:
    df[c] = df[c].apply(modify)
This is the output I get after testing:
Views avg. Likes avg Comments avg. Shares avg
0 3 18700000.0 2600000.0 54700.0
1 6 17700000.0 2300000.0 18000.0
Or you can use applymap on the entire df (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):
df.applymap(modify)
Output:
Views avg. Likes avg Comments avg. Shares avg
0 3 18700000.0 2600000.0 54700.0
1 6 17700000.0 2300000.0 18000.0
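The question's regex idea and the answer's function can be combined into one helper. The following is a rough sketch in plain Python, not the answerer's code: the name parse_count, the case-insensitive matching, and the pass-through of unrecognized values are all assumptions made for illustration.

```python
import re

# Hypothetical helper; the multipliers mirror the ones in the question
MULTIPLIERS = {"k": 1e3, "m": 1e6}
PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([km])\s*$", re.IGNORECASE)

def parse_count(value):
    """Return a number for strings like '18.7M' or '645.1k'; pass other values through."""
    if not isinstance(value, str):
        return value          # already numeric (or missing): leave untouched
    match = PATTERN.match(value)
    if match is None:
        return value          # no recognized suffix: leave untouched
    number, suffix = match.groups()
    return float(number) * MULTIPLIERS[suffix.lower()]

samples = ["18.7M", "645.1k", "18K", 3, "n/a"]
converted = [parse_count(s) for s in samples]
print(converted)
```

On a DataFrame, such a helper could be applied column by column in the same way as modify, for example df[c].apply(parse_count).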

How can I implement the current algorithm into a for loop in increments?

I need the loop to follow the algorithm of the code below. Each iteration should take the previous day's count (starting at 2) and increase it by the average daily rate (.3).
startOrganism = 2
startOrganism = int(input("Starting number of organisms:"))
dailyInc = .3
dailyInc = float(input("Average daily increase (percent): "))
numberDaysMulti = 10
numberDaysMulti = int(input("Number of days to multiply: "))
for x in range(numberDaysMulti):
The output needs to look like this:
Day 1 2
Day 2 2.6
Day 3 3.38
etc.
Ok, I've assumed that the user will input the percent daily increase as 0.3 as opposed to 30:
startOrganism = int(input("Starting number of organisms:"))
dailyInc = float(input("Average daily increase (percent): "))
numberDaysMulti = int(input("Number of days to multiply: "))
percDailyInc = 1 + dailyInc
for day in range(numberDaysMulti):
    print(f'Day {day+1} {startOrganism:.2f}')
    startOrganism *= percDailyInc
Sample Output:
Day 1 2.00
Day 2 2.60
Day 3 3.38
Day 4 4.39
Day 5 5.71
Day 6 7.43
Day 7 9.65
Day 8 12.55
Day 9 16.31
Day 10 21.21
This sort of computation can be handled mathematically, using the following formula:
organisms = starting_organisms * ((1.0 + growth_rate) ** days)
Here growth_rate is the fractional increase per unit of time (in your case, per day). Then it becomes a matter of simply printing out the value for each day.
starting_organisms = int(input("Starting number of organisms: "))
growth_rate = float(input("Average daily increase (percent): ")) # 0.3 = 30%
period = int(input("Number of days to grow: "))
growth = [starting_organisms * (1.0 + growth_rate) ** day for day in range(period)]
# The list 'growth' has all our values, now we just print them out:
for day, population in enumerate(growth):
    print(f"Day {day} {population}")
(This uses a zero-index for the days. That is, 'day 0' will be your starting number. Indexing this way saves you off-by-one errors, but you can change the print statement to be {day + 1} to hide this data structuring from the user.)
There is an alternative method where you carry state in each loop, calculating the next day as you iterate. However, I'd recommend separating the concern of calculation and display - that instinct will serve you well going forward.
The growth = [... for day in range(period)] is a list comprehension, which works much like a one-line for-loop.
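The two approaches mentioned here, the closed-form formula and carrying state through the loop, can be compared side by side. This is a minimal sketch with fixed inputs instead of input() calls; the function names are invented for illustration.

```python
import math

def growth_closed_form(start, rate, days):
    """Population on each day from start * (1 + rate) ** day, with day 0 as the start."""
    return [start * (1.0 + rate) ** day for day in range(days)]

def growth_iterative(start, rate, days):
    """Same values, carrying the running population from one day to the next."""
    values = []
    population = float(start)
    for _ in range(days):
        values.append(population)
        population *= 1.0 + rate
    return values

closed = growth_closed_form(2, 0.3, 10)
stepped = growth_iterative(2, 0.3, 10)

# Both approaches agree up to floating-point rounding
assert all(math.isclose(a, b) for a, b in zip(closed, stepped))

# Display is kept separate from calculation; days are shown 1-based
for day, population in enumerate(closed, start=1):
    print(f"Day {day} {population:.2f}")
```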
startOrganism = 0
startOrganism = int(input("Starting number of organisms:"))
dailyInc = 0
dailyInc = float(input("Average daily increase (percent): "))
dayApproximate = 0
numberDaysMulti = 0
numberDaysMulti = int(input("Number of days to multiply: "))
for x in range(numberDaysMulti):
    startOrganism = dailyInc * startOrganism + startOrganism
    print(startOrganism)

Having a hard time grasping *= function

So if
balance = int(100)
balance *= 0.05
since balance is mutable, shouldn't that equal 105? Instead I just get 5.
And if I add another line of code, such as
balance = int(100)
balance *= 0.05
balance *= 0.05
the output is 0.25. Essentially, my variable is not carrying over, and I'm just taking 5% of the previous result.
If I write
balance = int(100)
balance *= 0.05 + balance
I get 10005.
I thought the += or *= operators could be used to take a variable, evaluate an expression, and then store the variable plus the result as the new value.
How do I do that for a multi-step equation?
balance = int(100)
balance *= 0.05
is the same as
balance = int(100)
balance = balance * 0.05
Wouldn't you say that that's 5, not 105?
A *= B is just a shorthand for A = A * B.
Your third example is the same as:
balance= int(100)
balance = balance * (0.05 + balance)
Again, you're getting what I would think you'd expect from this code.
BTW, you don't need the int(). 100 by itself is a literal value of type 'int'. So the most concise way to state your first code block is:
balance = 100 * .05
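To get 105 from a 5% increase, the multiplier has to include the original amount, that is 1 + 0.05. The sketch below shows this with *=, an equivalent form with +=, and a multi-step version. The variable names and the three-period loop are illustrative assumptions, not part of the question.

```python
rate = 0.05

# Multiply by (1 + rate): same as one_step = one_step * (1 + rate)
one_step = 100
one_step *= 1 + rate
print(round(one_step, 2))

# Equivalent: add the interest to the existing balance
added = 100
added += added * rate
print(round(added, 2))

# Multi-step: each pass compounds on the previous result
three_steps = 100
for _ in range(3):
    three_steps *= 1 + rate
print(round(three_steps, 4))
```

Note that the whole right-hand side is evaluated before the multiplication, so balance *= 1 + rate means balance = balance * (1 + rate).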
Sorry for saying this, but you first have to understand the basics of Python (or of any programming language).
'+' is the addition sign
'*' is the multiplication sign
A = 2 + 3
gives 5 as the answer, and
A = 2 * 3 gives 6 as the answer.
Secondly, '+=' and '*=' are shorthand for operations where the first operand is also the variable in which you want to store the result.
For example, given
A = 5
if you want to add 3 to this same A,
A = A + 3 can also be written as A += 3.
Similarly for multiplication, given
A = 100
A = A * 0.05 can also be written as A *= 0.05
and it will set A to 5.
So, good luck.

Adding simple moving average as an additional column to python DataFrame

I have sales data in sales_training.csv that looks like this -
time_period sales
1 127
2 253
3 123
4 253
5 157
6 105
7 244
8 157
9 130
10 221
11 132
12 265
I want to add a third column that contains the moving average. My code:
import pandas as pd
df = pd.read_csv("./Sales_training.csv", index_col="time_period")
periods = df.index.tolist()
period = int(input("Enter a period for the moving average :"))
sum1 = 0
for i in periods:
    if i < period:
        df['forecast'][i] = i
    else:
        for j in range(period):
            sum1 += df['sales'][i-j]
        df['forecast'][i] = sum1/period
        sum1 = 0
print(df)
df.to_csv("./forecast_mannual.csv")
This is giving KeyError: 'forecast' at the line df['forecast'][i] = i. What is the issue?
One simple solution: create the 'forecast' column before the loop, for example as a copy of 'sales'.
import pandas as pd
df = pd.read_csv("./Sales_training.csv", index_col="time_period")
periods = df.index.tolist()
period = int(input("Enter a period for the moving average :"))
sum1 = 0
# Added line: create the column first (as float, so the averages fit)
df['forecast'] = df['sales'].astype(float)
for i in periods:
    if i < period:
        # .loc avoids chained assignment, which may not update df
        df.loc[i, 'forecast'] = i
    else:
        for j in range(period):
            sum1 += df['sales'][i-j]
        df.loc[i, 'forecast'] = sum1/period
        sum1 = 0
print(df)
df.to_csv("./forecast_mannual.csv")
Your code raises a KeyError because of the way it references the 'forecast' column. The first time the loop runs, the 'forecast' column has not been created yet, so evaluating df['forecast'][i] on the first iteration fails.
Here, the task is to update values in a new, dynamically created column called 'forecast'. Therefore, instead of df['forecast'][i] you can write df.at[i, 'forecast'].
There is another issue in the code. When i is less than period, you assign i to forecast, which to my understanding is not correct. Nothing should be displayed in that case.
Here is my corrected version of the code:
import pandas as pd
df = pd.read_csv("./sales.csv", index_col="time_period")
periods = df.index.tolist()
period = int(input("Enter a period for the moving average :"))
sum1 = 0
for i in periods:
    print(i)
    if i < period:
        # Not enough history yet: leave the forecast blank
        df.at[i, 'forecast'] = ''
    else:
        for j in range(period):
            sum1 += df['sales'][i-j]
        df.at[i, 'forecast'] = sum1/period
        sum1 = 0
print(df)
df.to_csv("./forecast_mannual.csv")
Output when I entered period=2 to calculate moving average:
Hope this helps.
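As an alternative to the explicit loops, pandas has a built-in rolling window. The sketch below is not taken from either answer. It uses the sales figures from the question inline instead of reading the CSV, fixes period at 3 instead of calling input(), and leaves the first period - 1 rows as NaN.

```python
import pandas as pd

sales = [127, 253, 123, 253, 157, 105, 244, 157, 130, 221, 132, 265]
df = pd.DataFrame({"sales": sales}, index=pd.RangeIndex(1, 13, name="time_period"))

period = 3  # fixed here; the question reads this from input()

# Mean of the current row and the (period - 1) rows before it;
# rows without enough history are left as NaN
df["forecast"] = df["sales"].rolling(window=period).mean()
print(df)
```

For each row with enough history, this averages the same values as the inner loop over range(period) in the question.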

While loop overwrites field

I am trying to execute a loop that calculates a new field "DELTA", using values in the existing dataframe. My goal is to use the DELTA field to calculate the next row of the field "QUALITY", until the loop is complete.
import pandas as pd
import csv
import numpy as np
Input = pd.read_csv('C:/PyTemp/Input.csv')
Input = pd.DataFrame(Input)
print(Input)
QUALITY AGE
0 15 10
AGE = Input['AGE']
QUALITY = Input['QUALITY']
loopcount = 2
i = 1
while i < loopcount:
    Input['DELTA'] = QUALITY * .1
    Input2 = pd.DataFrame(Input)
    Input2['AGE'] = Input['AGE'] + 1
    Input2['DELTA'] = Input2['QUALITY'] * .1
    Input2['QUALITY'] = Input2['QUALITY'] + Input2['DELTA']
    Input = Input.append(Input2)
    i += 1
print (Input)
My result:
QUALITY AGE DELTA
0 16.5 11 1.5
0 16.5 11 1.5
This is what I am after:
QUALITY AGE DELTA
0 15 10 1.5
0 16.5 11 1.65
This behavior is due to this line in your loop:
Input2 = pd.DataFrame(Input)
You think that you are making a copy of Input, but instead you are making a view, so when you change the values in Input2 you also change Input. The DataFrame constructor has a copy keyword that, by default, does not copy when it is given an existing DataFrame. You can test this by looking at the underlying values:
Input2.values.base is Input.values.base
If this is True, you have a view. Change the line to
Input2 = pd.DataFrame(Input, copy=True)
The problem is that pandas stores a reference to the data here, not a copy. Every operation is therefore applied to Input itself, and Input2 simply reflects whatever is set in Input. See also the pandas.DataFrame reference.
import pandas as pd
import csv
import numpy as np
Input = pd.DataFrame({'AGE': 10, 'QUALITY':15}, index=[0])
print(Input)
AGE = Input['AGE']
QUALITY = Input['QUALITY']
loopcount = 2
i = 1
while i < loopcount:
    Input['DELTA'] = QUALITY * .1
    Input2 = pd.DataFrame(Input, copy=True) # Here is the change
    Input2['AGE'] = Input['AGE'] + 1
    Input2['DELTA'] = Input2['QUALITY'] * .1
    Input2['QUALITY'] = Input2['QUALITY'] + Input2['DELTA']
    Input = Input.append(Input2)
    i += 1
print (Input)
This outputs
AGE QUALITY DELTA
0 10 15.0 1.5
0 11 16.5 1.5
This is not exactly what you wanted, but I'm not sure what logic you intend, so I have not altered the other commands.
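One possible reading of the desired output is that each row's DELTA is 10% of that row's QUALITY, and the next row starts from QUALITY + DELTA with AGE increased by one. Under that assumption, the rows can be collected in a plain list and turned into a DataFrame once at the end. This also avoids DataFrame.append, which was removed in pandas 2.0. The name steps and the starting values are assumptions taken from the question's sample.

```python
import pandas as pd

# Starting row from the question; 'steps' is an assumed name for the row count
quality, age = 15.0, 10
steps = 2

rows = []
for _ in range(steps):
    delta = quality * 0.1        # DELTA is 10% of the current QUALITY
    rows.append({"QUALITY": quality, "AGE": age, "DELTA": delta})
    quality += delta             # the next row starts from the updated QUALITY
    age += 1

result = pd.DataFrame(rows)
print(result)
```

Unlike the question's sample, the resulting index runs 0, 1 rather than repeating 0.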
