I want to sum (and ideally take the mean of the sum of) several values of a column, starting at my index i:
investmentlength = list(range(1, 13))
returns = []
nextmonth = 0
counter = 0
for i in range(len(stocks2)):
    if stocks2['Startpoint'][i] == 1:
        nextmonth = nextmonth + stocks2['RET'][i+1] + stocks2['RET'][i+2] + stocks2['RET'][i+3] + ....
        counter += 1
Is there a way to give the beginning index, the end index, and possibly a step size, and then sum it all in one command instead of copying and pasting to death? I want to go through all the different investment lengths and put the average returns in the empty list; a sketch of one option follows the dataframe printout below.
SHRCD EXCHCD SICCD PRC VOL RET SHROUT \
DATE PERMNO
1970-08-31 10559.0 10.0 1.0 5311.0 35.000 1692.0 0.030657 12048.0
12626.0 10.0 1.0 5411.0 46.250 926.0 0.088235 6624.0
12749.0 11.0 1.0 5331.0 45.500 5632.0 0.126173 34685.0
13100.0 11.0 1.0 5311.0 22.000 1759.0 0.171242 15107.0
13653.0 10.0 1.0 5311.0 13.125 141.0 0.220930 1337.0
13936.0 11.0 1.0 2331.0 11.500 270.0 -0.053061 3942.0
14322.0 11.0 1.0 5311.0 64.750 6934.0 0.024409 154187.0
16969.0 10.0 1.0 5311.0 42.875 1069.0 0.186851 13828.0
17072.0 10.0 1.0 5311.0 14.750 777.0 0.026087 5415.0
17304.0 10.0 1.0 5311.0 24.875 1939.0 0.058511 8150.0
MV XRET IB ... PE2 \
DATE PERMNO ...
1970-08-31 10559.0 421680.000 0.025357 NaN ... 13.852692
12626.0 306360.000 0.082935 NaN ... 13.145312
12749.0 1578167.500 0.120873 NaN ... 25.970466
13100.0 332354.000 0.165942 NaN ... 9.990711
13653.0 17548.125 0.215630 NaN ... 6.273570
13936.0 45333.000 -0.058361 NaN ... 6.473123
14322.0 9983608.250 0.019109 NaN ... 22.204047
16969.0 592875.500 0.181551 NaN ... 11.948061
17072.0 79871.250 0.020787 NaN ... 8.845526
17304.0 202731.250 0.053211 NaN ... 8.641655
lagPE1 lagPE2 lagMV lagSEQ QUINTILE1 \
DATE PERMNO
1970-08-31 10559.0 13.852692 13.852692 412644.000 264.686 4
12626.0 13.145312 13.145312 281520.000 164.151 4
12749.0 25.970466 25.970466 1404742.500 367.519 5
13100.0 9.990711 9.990711 288921.375 414.820 3
13653.0 6.273570 6.273570 14372.750 24.958 1
13936.0 6.473123 6.473123 48289.500 76.986 1
14322.0 22.204047 22.204047 9790874.500 3439.802 5
16969.0 11.948061 11.948061 499536.500 NaN 4
17072.0 8.845526 8.845526 77840.625 NaN 3
17304.0 8.641655 8.641655 191525.000 307.721 3
QUINTILE2 avgvol avg Startpoint
DATE PERMNO
1970-08-31 10559.0 4 9229.057592 1697.2 0
12626.0 4 3654.367470 894.4 0
12749.0 5 188206.566860 5828.6 0
13100.0 3 94127.319048 3477.2 0
13653.0 1 816.393162 268.8 0
13936.0 1 71547.050633 553.2 0
14322.0 5 195702.521519 6308.8 0
16969.0 4 3670.297872 2002.0 0
17072.0 3 3774.083333 3867.8 0
17304.0 3 12622.112903 1679.4 0
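One hedged sketch of how this could work, assuming stocks2 is sorted chronologically and has the 'RET' and 'Startpoint' columns shown above: use positional slicing, so the start, the length, and even a step go straight into the slice, and let NumPy do the summing.

import numpy as np

# A sketch, not a verified answer: for each investment length n, sum the
# n returns following each start point, then average those sums.
ret = stocks2['RET'].to_numpy()
starts = np.flatnonzero(stocks2['Startpoint'].to_numpy() == 1)

investmentlength = range(1, 13)
returns = []
for n in investmentlength:
    # a step size could be added as a third slice argument if needed
    sums = [ret[i + 1 : i + 1 + n].sum() for i in starts if i + 1 + n <= len(ret)]
    returns.append(np.mean(sums))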
I have two dataframes that I want to multiply: multiple columns from dataframe 1 by one column in dataframe 2.
raw_material_LCI = dataframe1[["climate change","ozone depletion",
"ionising radiation, hh","photochemical ozone formation, hh",
"particulate matter","human toxicity, non-cancer",
"human toxicity, cancer","acidification",
"eutrophication, freshwater","eutrophication, marine",
"eutrophication, terrestrial","ecotoxicity, freshwater",
"land use", "resource use, fossils","resource use, minerals and metals",
"water scarcity"]] * dataframe2["mass_frac"]
The above code returns a dataframe where all the values are NaN. All of the named columns contain numeric values.
I decided to try multiplying dataframe1 by just a single scalar to see if it worked, e.g. the example below:
raw_material_LCI = dataframe1[["climate change","ozone depletion",
"ionising radiation, hh","photochemical ozone formation, hh",
"particulate matter","human toxicity, non-cancer",
"human toxicity, cancer","acidification",
"eutrophication, freshwater","eutrophication, marine",
"eutrophication, terrestrial","ecotoxicity, freshwater",
"land use", "resource use, fossils","resource use, minerals and metals",
"water scarcity"]] * 0.7
The example with the single value returns a dataframe with numbers, so it works. Does anyone know why the multiplication in the first instance does not work? I have looked at multiple articles on multiplying columns in different dataframes in Python and cannot find a solution.
You have to align both row and column indexes when you multiply two dataframes; when you multiply a DataFrame by a Series with *, the Series index is aligned against the DataFrame's column labels, so to multiply along the rows you need mul with axis=0:
>>> df
A B C D E
0 0.787081 0.350508 0.058542 0.492340 0.489379
1 0.512436 0.501375 0.108115 0.960808 0.841969
2 0.055247 0.305830 0.976043 0.016188 0.006424
3 0.303570 0.914876 0.157100 0.767454 0.340381
4 0.446077 0.595001 0.307799 0.115410 0.568281
5 0.226516 0.636902 0.086790 0.079260 0.402414
6 0.451920 0.526025 0.012470 0.931610 0.267155
7 0.472778 0.137005 0.227569 0.941355 0.584782
8 0.944396 0.769115 0.497214 0.531419 0.570797
9 0.788023 0.310288 0.336480 0.585466 0.432246
>>> sr
0 0.920878
1 0.445332
2 0.894407
3 0.613317
4 0.242270
5 0.299121
6 0.843052
7 0.279014
8 0.526778
9 0.249538
dtype: float64
So df * sr produces NaN values, because the Series index (0-9) is aligned against the column labels (A-E) and they do not overlap:
>>> df * sr
    A   B   C   D   E   0   1   2   3   4   5   6   7   8   9
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
...
9 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
but using mul along index axis works as expected:
>>> df.mul(sr, axis=0) # but not df.mul(sr) (same as df*sr)
A B C D E
0 0.724805 0.322775 0.053910 0.453385 0.450658
1 0.228204 0.223279 0.048147 0.427878 0.374956
2 0.049413 0.273536 0.872980 0.014479 0.005745
3 0.186185 0.561109 0.096352 0.470693 0.208762
4 0.108071 0.144151 0.074571 0.027961 0.137678
5 0.067756 0.190511 0.025961 0.023708 0.120371
6 0.380992 0.443466 0.010513 0.785396 0.225226
7 0.131912 0.038226 0.063495 0.262651 0.163162
8 0.497487 0.405153 0.261921 0.279940 0.300683
9 0.196642 0.077429 0.083965 0.146096 0.107862
If your Series and DataFrame do not have the same index, you get a partial result:
>>> df.mul(sr.iloc[:5], axis=0)
A B C D E
0 0.724805 0.322775 0.053910 0.453385 0.450658
1 0.228204 0.223279 0.048147 0.427878 0.374956
2 0.049413 0.273536 0.872980 0.014479 0.005745
3 0.186185 0.561109 0.096352 0.470693 0.208762
4 0.108071 0.144151 0.074571 0.027961 0.137678
5 NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN
>>> df.mul(sr.iloc[5:], axis=0)
A B C D E
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 0.067756 0.190511 0.025961 0.023708 0.120371
6 0.380992 0.443466 0.010513 0.785396 0.225226
7 0.131912 0.038226 0.063495 0.262651 0.163162
8 0.497487 0.405153 0.261921 0.279940 0.300683
9 0.196642 0.077429 0.083965 0.146096 0.107862
Take care that both objects share the same index.
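Applied to the question, and assuming dataframe1 and dataframe2 share the same row index, a sketch of the fix would be:

# Sketch: multiply the impact columns by mass_frac, aligning on the row
# index rather than on column labels (assumes both frames share an index).
impact_cols = ["climate change", "ozone depletion"]  # ... plus the other columns listed above
raw_material_LCI = dataframe1[impact_cols].mul(dataframe2["mass_frac"], axis=0)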
I want to create a dataframe from census data. I want to calculate the number of people that filed a tax return for each specific earnings group.
For now, I wrote this:
census_df = pd.read_csv('../zip code data/19zpallagi.csv')
sub_census_df = census_df[['zipcode', 'agi_stub', 'N02650', 'A02650', 'ELDERLY', 'A07180']].copy()
num_of_returns = ['Number_of_returns_1_25000', 'Number_of_returns_25000_50000', 'Number_of_returns_50000_75000',
'Number_of_returns_75000_100000', 'Number_of_returns_100000_200000', 'Number_of_returns_200000_more']
for i, column_name in zip(range(1, 7), num_of_returns):
sub_census_df[column_name] = sub_census_df[sub_census_df['agi_stub'] == i]['N02650']
I have 6 groups attached to each specific zip code. I want to get one row per zip code, with the number of returns for that zip code appearing just once in each column. I already tried changing NaNs to 0 and using groupby('zipcode').sum(), but for zip code 0 I get a sum around 50 million where it seems only around 800k should exist.
Here is the dataframe that I currently get:
zipcode agi_stub N02650 A02650 ELDERLY A07180 Number_of_returns_1_25000 Number_of_returns_25000_50000 Number_of_returns_50000_75000 Number_of_returns_75000_100000 Number_of_returns_100000_200000 Number_of_returns_200000_more Amount_1_25000 Amount_25000_50000 Amount_50000_75000 Amount_75000_100000 Amount_100000_200000 Amount_200000_more
0 0 1 778140.0 10311099.0 144610.0 2076.0 778140.0 NaN NaN NaN NaN NaN 10311099.0 NaN NaN NaN NaN NaN
1 0 2 525940.0 19145621.0 113810.0 17784.0 NaN 525940.0 NaN NaN NaN NaN NaN 19145621.0 NaN NaN NaN NaN
2 0 3 285700.0 17690402.0 82410.0 9521.0 NaN NaN 285700.0 NaN NaN NaN NaN NaN 17690402.0 NaN NaN NaN
3 0 4 179070.0 15670456.0 57970.0 8072.0 NaN NaN NaN 179070.0 NaN NaN NaN NaN NaN 15670456.0 NaN NaN
4 0 5 257010.0 35286228.0 85030.0 14872.0 NaN NaN NaN NaN 257010.0 NaN NaN NaN NaN NaN 35286228.0 NaN
And here is what I want to get:
zipcode Number_of_returns_1_25000 Number_of_returns_25000_50000 Number_of_returns_50000_75000 Number_of_returns_75000_100000 Number_of_returns_100000_200000 Number_of_returns_200000_more
0 0 778140.0 525940.0 285700.0 179070.0 257010.0 850.0
Here is one way to do it, using groupby and summing the desired columns:
num_of_returns = ['Number_of_returns_1_25000', 'Number_of_returns_25000_50000', 'Number_of_returns_50000_75000',
'Number_of_returns_75000_100000', 'Number_of_returns_100000_200000', 'Number_of_returns_200000_more']
df.groupby('zipcode', as_index=False)[num_of_returns].sum()
zipcode Number_of_returns_1_25000 Number_of_returns_25000_50000 Number_of_returns_50000_75000 Number_of_returns_75000_100000 Number_of_returns_100000_200000 Number_of_returns_200000_more
0 0 778140.0 525940.0 285700.0 179070.0 257010.0 0.0
This question needs more information to actually give a proper answer. For example, you leave out what certain columns in your dataframe mean:
- `N1: Number of returns`
- `agi_stub: Size of adjusted gross income`
According to the IRS, this has the following levels:
Size of adjusted gross income:
0 = No AGI Stub
1 = 'Under $1'
2 = '$1 under $10,000'
3 = '$10,000 under $25,000'
4 = '$25,000 under $50,000'
5 = '$50,000 under $75,000'
6 = '$75,000 under $100,000'
7 = '$100,000 under $200,000'
8 = '$200,000 under $500,000'
9 = '$500,000 under $1,000,000'
10 = '$1,000,000 or more'
I got the above from https://www.irs.gov/pub/irs-soi/16incmdocguide.doc
With this information, I think what you want to find is the number of
people who filed a tax return for each of the income levels of agi_stub.
If that is what you mean, this can be achieved by:
import pandas as pd
data = pd.read_csv("./data/19zpallagi.csv")
## select only the desired columns
data = data[['zipcode', 'agi_stub', 'N1']]
## solution to your problem?
df = data.pivot_table(
index='zipcode',
values='N1',
columns='agi_stub',
aggfunc=['sum']
)
## bit of cleaning up.
PREFIX = 'agi_stub_level_'
df.columns = [PREFIX + level for level in df.columns.get_level_values(1).astype(str)]
Here's the output.
In [77]: df
Out[77]:
agi_stub_level_1 agi_stub_level_2 ... agi_stub_level_5 agi_stub_level_6
zipcode ...
0 50061850.0 37566510.0 ... 21938920.0 8859370.0
1001 2550.0 2230.0 ... 1420.0 230.0
1002 2850.0 1830.0 ... 1840.0 990.0
1005 650.0 570.0 ... 450.0 60.0
1007 1980.0 1530.0 ... 1830.0 460.0
... ... ... ... ... ...
99827 470.0 360.0 ... 170.0 40.0
99833 550.0 380.0 ... 290.0 80.0
99835 1250.0 1130.0 ... 730.0 190.0
99901 1960.0 1520.0 ... 1030.0 290.0
99999 868450.0 644160.0 ... 319880.0 142960.0
[27595 rows x 6 columns]
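If you want the question's column names rather than the generic labels, a small rename sketch could follow (assuming the file's six agi_stub levels line up with the question's income brackets):

# Hypothetical mapping from the pivoted level names to the question's columns
level_names = {
    'agi_stub_level_1': 'Number_of_returns_1_25000',
    'agi_stub_level_2': 'Number_of_returns_25000_50000',
    'agi_stub_level_3': 'Number_of_returns_50000_75000',
    'agi_stub_level_4': 'Number_of_returns_75000_100000',
    'agi_stub_level_5': 'Number_of_returns_100000_200000',
    'agi_stub_level_6': 'Number_of_returns_200000_more',
}
df = df.rename(columns=level_names)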
I want to create a series of columns where each new column is the previous column times 1.5, rolling until year 2020. I tried to use previous and current but it didn't work as expected. How can I make it work?
df = pd.DataFrame({
'us2000':[5,3,6,9,2,4],
}); df
a = []
for i in range(1, 21):
a.append("us202" + str(i))
for previous, current in zip(a, a[1:]):
df[current] = df[previous] * 1.5
IIUC, you can fix your code with:
a = []
for i in range(0, 21):
a.append(f'us20{i:02}')
for previous, current in zip(a, a[1:]):
df[current] = df[previous] * 1.5
Another, vectorized approach with NumPy would be:
import numpy as np

df2 = pd.DataFrame(df['us2000'].to_numpy()[:, None] * 1.5**np.arange(21),
                   columns=[f'us20{i:02}' for i in range(21)])
output:
us2000 us2001 us2002 us2003 us2004 us2005 us2006 us2007 ...
0 5 7.5 11.25 16.875 25.3125 37.96875 56.953125 85.429688
1 3 4.5 6.75 10.125 15.1875 22.78125 34.171875 51.257812
2 6 9.0 13.50 20.250 30.3750 45.56250 68.343750 102.515625
3 9 13.5 20.25 30.375 45.5625 68.34375 102.515625 153.773438
4 2 3.0 4.50 6.750 10.1250 15.18750 22.781250 34.171875
5 4 6.0 9.00 13.500 20.2500 30.37500 45.562500 68.343750
Try:
for i in range(1, 21):
    df[f"us{2000 + i}"] = df[f"us{2000 + i - 1}"].mul(1.5)
>>> df
us2000 us2001 us2002 ... us2018 us2019 us2020
0 5 7.5 11.25 ... 7389.45940 11084.18910 16626.283650
1 3 4.5 6.75 ... 4433.67564 6650.51346 9975.770190
2 6 9.0 13.50 ... 8867.35128 13301.02692 19951.540380
3 9 13.5 20.25 ... 13301.02692 19951.54038 29927.310571
4 2 3.0 4.50 ... 2955.78376 4433.67564 6650.513460
5 4 6.0 9.00 ... 5911.56752 8867.35128 13301.026920
[6 rows x 21 columns]
pd.DataFrame(df.to_numpy() * [1.5**i for i in range(21)]) \
    .rename(columns=lambda x: str(x).rjust(2, '0')).add_prefix("us20")
Output:
us2000 us2001 us2002 ... us2018 us2019 us2020
0 5 7.5 11.25 ... 7389.45940 11084.18910 16626.283650
1 3 4.5 6.75 ... 4433.67564 6650.51346 9975.770190
2 6 9.0 13.50 ... 8867.35128 13301.02692 19951.540380
3 9 13.5 20.25 ... 13301.02692 19951.54038 29927.310571
4 2 3.0 4.50 ... 2955.78376 4433.67564 6650.513460
5 4 6.0 9.00 ... 5911.56752 8867.35128 13301.026920
[6 rows x 21 columns]
This is Pine Script code that I am trying to port to Python. What would be an optimized equivalent Python version?
Here kama[1] is the previous kama value; for the first calculation there is no previous value in the array, so what should be done about kama[1] in that case?
kama=nz(kama[1], close[1])+smooth*(close[1]-nz(kama[1], close[1]))
Pine Script info:
nz
Replaces NaN values with zeros (or given value) in a series.
nz(x, y) → integer
nz(sma(close, 100))
RETURNS
Two args version: returns x if it's a valid (not NaN) number, otherwise y
One arg version: returns x if it's a valid (not NaN) number, otherwise 0
ARGUMENTS
x (series) Series of values to process.
y (float) Value that will be inserted instead of all NaN values in x series.
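For reference, a rough Python sketch of nz, assuming pandas Series (or scalar) inputs:

import numpy as np
import pandas as pd

def nz(x, y=0):
    # Replace NaN values in x with y, like Pine Script's nz(x, y)
    if isinstance(x, pd.Series):
        return x.fillna(y)
    return y if np.isnan(x) else x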
Edit 1: something I tried, as below, that is not working:
stockdata['kama'] = stockdata['kama'](-1) if stockdata['kama'](-1) !=0 \
else stockdata['close'] + stockdata['smooth']*(stockdata['close'] - \
stockdata['kama'](-1) if stockdata['kama'](-1) !=0 else stockdata['close'])
Edit 2: the alternative I tried just to make sure at least one part works (nz(kama[1], close)), but that is also failing:
stockdata['kama'] = np.where(stockdata['kama'][-1] != 0, stockdata['kama'][-1], stockdata['close'])
I am completely stuck now; if this line of Pine Script code
kama=nz(kama[1], close)+smooth*(close-nz(kama[1], close))
is not converted to Python, my whole logic will go for a toss. Any working solutions are greatly appreciated.
Edit 3: the dataframe input for the series:
open high low close adjusted_close \
date
2002-07-01 5.2397 5.5409 5.2397 5.4127 0.0634
2002-07-02 5.5234 5.5370 5.4214 5.4438 0.0638
2002-07-03 5.5060 5.5458 5.3281 5.4661 0.0640
2002-07-04 5.5011 5.5720 5.4175 5.5283 0.0647
2002-07-05 5.5633 5.6566 5.4749 5.5905 0.0655
2002-07-08 5.5011 5.7187 5.5011 5.6255 0.0659
2002-07-09 5.5905 5.7586 5.5681 5.6167 0.0658
2002-07-10 5.4885 5.4885 5.1465 5.2222 0.0612
2002-07-11 4.9784 5.2135 4.9784 5.1863 0.0607
2002-07-12 5.5011 5.5011 5.2446 5.3194 0.0623
2002-07-15 5.3243 5.4797 5.1912 5.3330 0.0625
2002-07-16 5.1999 5.4389 5.1999 5.3155 0.0623
2002-07-17 4.7024 5.1377 4.6189 5.0445 0.0591
2002-07-18 4.8803 5.1465 4.8356 5.0804 0.0595
2002-07-19 5.0270 5.2038 5.0221 5.1513 0.0603
2002-07-22 5.0804 5.1465 4.9687 4.9735 0.0582
2002-07-23 4.8181 5.0843 4.8181 5.0619 0.0593
2002-07-24 5.0580 5.1290 4.9376 5.0619 0.0593
2002-07-25 5.0580 5.0580 4.7918 4.8492 0.0568
volume dividend_amount split_coefficient Om \
date
2002-07-01 21923 0.0 1.0 NaN
2002-07-02 61045 0.0 1.0 NaN
2002-07-03 34161 0.0 1.0 NaN
2002-07-04 27893 0.0 1.0 NaN
2002-07-05 58976 0.0 1.0 NaN
2002-07-08 48910 0.0 1.0 5.472433
2002-07-09 321846 0.0 1.0 5.530900
2002-07-10 138434 0.0 1.0 5.525083
2002-07-11 15027 0.0 1.0 5.437150
2002-07-12 24187 0.0 1.0 5.437150
2002-07-15 50330 0.0 1.0 5.397317
2002-07-16 24928 0.0 1.0 5.347117
2002-07-17 21357 0.0 1.0 5.199100
2002-07-18 27532 0.0 1.0 5.097733
2002-07-19 13380 0.0 1.0 5.105833
2002-07-22 21666 0.0 1.0 5.035717
2002-07-23 40161 0.0 1.0 4.951350
2002-07-24 34480 0.0 1.0 4.927700
2002-07-25 38185 0.0 1.0 4.986967
Hm Lm Cm vClose diff \
date
2002-07-01 NaN NaN NaN NaN 1669.8373
2002-07-02 NaN NaN NaN NaN 1669.8062
2002-07-03 NaN NaN NaN NaN 1669.7839
2002-07-04 NaN NaN NaN NaN 1669.7217
2002-07-05 NaN NaN NaN NaN 1669.6595
2002-07-08 5.595167 5.397117 5.511150 5.493967 1669.6245
2002-07-09 5.631450 5.451850 5.545150 5.539837 1669.6333
2002-07-10 5.623367 5.406033 5.508217 5.515675 1670.0278
2002-07-11 5.567983 5.347750 5.461583 5.453617 1670.0637
2002-07-12 5.556167 5.318933 5.426767 5.434754 1669.9306
2002-07-15 5.526683 5.271650 5.383850 5.394875 1669.9170
2002-07-16 5.480050 5.221450 5.332183 5.345200 1669.9345
2002-07-17 5.376567 5.063250 5.236817 5.218933 1670.2055
2002-07-18 5.319567 5.011433 5.213183 5.160479 1670.1696
2002-07-19 5.317950 5.018717 5.207350 5.162463 1670.0987
2002-07-22 5.258850 4.972733 5.149700 5.104250 1670.2765
2002-07-23 5.192950 4.910550 5.104517 5.039842 1670.1881
2002-07-24 5.141300 4.866833 5.062250 4.999521 1670.1881
2002-07-25 5.128017 4.895650 5.029700 5.010083 1670.4008
signal noise efratio smooth
date
2002-07-01 5.4127 1670.3373 0.003240 0.416113
2002-07-02 5.4438 1670.3062 0.003259 0.416113
2002-07-03 5.4661 1670.2839 0.003273 0.416114
2002-07-04 5.5283 1670.2217 0.003310 0.416115
2002-07-05 5.5905 1670.1595 0.003347 0.416116
2002-07-08 5.6255 1670.1245 0.003368 0.416116
2002-07-09 5.6167 1670.1333 0.003363 0.416116
2002-07-10 5.2222 1670.5278 0.003126 0.416110
2002-07-11 5.1863 1670.5637 0.003105 0.416109
2002-07-12 5.3194 1670.4306 0.003184 0.416111
2002-07-15 5.3330 1670.4170 0.003193 0.416111
2002-07-16 5.3155 1670.4345 0.003182 0.416111
2002-07-17 5.0445 1670.7055 0.003019 0.416107
2002-07-18 5.0804 1670.6696 0.003041 0.416107
2002-07-19 5.1513 1670.5987 0.003084 0.416109
2002-07-22 4.9735 1670.7765 0.002977 0.416106
2002-07-23 5.0619 1670.6881 0.003030 0.416107
2002-07-24 5.0619 1670.6881 0.003030 0.416107
2002-07-25 4.8492 1670.9008 0.002902 0.416104
What is expected for kama=nz(kama[1], close)+smooth*(close-nz(kama[1], close))? Something like:
stockdata['kama'] = nz(stockdata['kama'][-1], stockdata['close']) + stockdata['smooth'] * (stockdata['close'] - nz(stockdata['kama'][-1], stockdata['close']))
In this case, for the first iteration there will not be any previous kama value, which has to be taken care of. All the inputs are given in the dataframe format above.
You need to create the column kama first with the values of close:
import numpy as np

# seed kama with close so the first row has a valid value
stockdata['kama'] = stockdata['close']
previous_kama = stockdata['kama'].shift()
previous_close = stockdata['close'].shift()
# nz(kama[1], close[1]): use the previous kama, falling back to the previous close
value = np.where(previous_kama.notnull(), previous_kama, previous_close)
stockdata['kama'] = value + stockdata['smooth'] * (previous_close - value)
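Note that the Pine Script formula is recursive: each kama depends on the kama just computed, which a single shift only captures for one step. A minimal loop-based sketch of the recursion, using the edit-2 form of the formula and assuming the 'close' and 'smooth' columns shown in the question:

import numpy as np

# Recursive form of kama = nz(kama[1], close) + smooth*(close - nz(kama[1], close))
close = stockdata['close'].to_numpy()
smooth = stockdata['smooth'].to_numpy()
kama = np.empty_like(close, dtype=float)
kama[0] = close[0]                      # no previous kama: nz falls back to close
for t in range(1, len(close)):
    prev = kama[t - 1]
    kama[t] = prev + smooth[t] * (close[t] - prev)
stockdata['kama'] = kama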
I got the following pandas dataframe by using the command below. How do I get the percentage change for all the columns dynamically, for AAL, AAN, ... and 100 more?
price['AABA_PCT_CHG'] = price.AABA.pct_change()
AABA AAL AAN AABA_PCT_CHG
0 16.120001 9.635592 18.836105 NaN
1 16.400000 8.363149 23.105881 0.017370
2 16.680000 8.460282 24.892321 0.017073
3 17.700001 8.829385 28.275263 0.061151
4 16.549999 8.839100 27.705627 -0.064972
5 15.040000 8.654548 27.754738 -0.091239
Apply it on the whole dataframe, like:
In [424]: price.pct_change().add_suffix('_PCT_CHG')
Out[424]:
AABA_PCT_CHG AAL_PCT_CHG AAN_PCT_CHG
0 NaN NaN NaN
1 0.017370 -0.132057 0.226680
2 0.017073 0.011614 0.077315
3 0.061151 0.043628 0.135903
4 -0.064972 0.001100 -0.020146
5 -0.091239 -0.020879 0.001773
In [425]: price.join(price.pct_change().add_suffix('_PCT_CHG'))
Out[425]:
AABA AAL AAN AABA_PCT_CHG AAL_PCT_CHG AAN_PCT_CHG
0 16.120001 9.635592 18.836105 NaN NaN NaN
1 16.400000 8.363149 23.105881 0.017370 -0.132057 0.226680
2 16.680000 8.460282 24.892321 0.017073 0.011614 0.077315
3 17.700001 8.829385 28.275263 0.061151 0.043628 0.135903
4 16.549999 8.839100 27.705627 -0.064972 0.001100 -0.020146
5 15.040000 8.654548 27.754738 -0.091239 -0.020879 0.001773
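If the frame ever mixes in non-numeric columns, a hedged variant is to restrict pct_change to the numeric ones first:

# Sketch: compute percentage change only for the numeric columns
num = price.select_dtypes('number')
price = price.join(num.pct_change().add_suffix('_PCT_CHG'))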