how to subtract within pandas dataframe - python

I have a question on arithmetic within a dataframe. Please note that each of the columns below in my dataframe is based on the others, except for 'holdings'.
Here is a shortened version of my dataframe
holdings     cash    total
     0.0  10000.0  10000.0
     0.0  10000.0  10000.0
  1000.0   9000.0  10000.0
  1500.0  10000.0  11500.0
  2000.0  10000.0  12000.0
initial_cap = 10000.0
But here is my problem... the first time I have holdings, the cash is calculated correctly, where cash of 10000.0 - holdings of 1000.0 = 9000.0.
I need cash to remain at 9000.0 until my holdings go back to 0.0 again.
In other words, how would you calculate cash so that it remains at 9000.0 until holdings goes back to 0.0?
Here is how I want it to look:
holdings     cash    total
     0.0  10000.0  10000.0
     0.0  10000.0  10000.0
  1000.0   9000.0  10000.0
  1500.0   9000.0  10500.0
  2000.0   9000.0  11000.0
My calculation is: cash = initial_cap - holdings
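For what it's worth, here is a vectorized sketch of the desired behaviour, assuming the rule is: within each contiguous run of nonzero holdings, cash is pinned to initial_cap minus the *first* holdings value of that run, and while holdings is 0 cash equals initial_cap:

```python
import pandas as pd

df = pd.DataFrame({'holdings': [0.0, 0.0, 1000.0, 1500.0, 2000.0]})
initial_cap = 10000.0

zero = df['holdings'].eq(0)
# each zero row starts a new block; the nonzero rows that follow share its label
block = zero.cumsum()
# first nonzero holdings value within each block (0.0 for all-zero blocks)
first_nonzero = df.groupby(block)['holdings'].transform(
    lambda s: s[s.ne(0)].iloc[0] if s.ne(0).any() else 0.0)
df['cash'] = initial_cap - first_nonzero
df.loc[zero, 'cash'] = initial_cap   # while flat, cash is the initial capital
df['total'] = df['cash'] + df['holdings']
```

This reproduces the desired table above: cash stays at 9000.0 for the whole 1000/1500/2000 run.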

So let me try to rephrase: you start with initial capital 10 and a given sequence of holdings {0, 0, 1, 1.5, 2}, and you want to create a cash variable that is 10 whenever holdings is 0. As soon as holdings increases from 0 by some amount x, you want cash to be 10 - x until holdings equals 0 again.
If this is correct, this is what I would do (the logic of total and the rest is still unclear to me, but the above is what you added at the end, so I focus on it).
PS: providing code that creates your sample data is appreciated.
import pandas as pd

df = pd.DataFrame([0, 1, 2, 2, 0, 2, 3, 3], columns=['holdings'])
x = 10
# triggers are the rows where cash is supposed to be zero
triggers = df['holdings'] == 0
# inits are the rows where holdings change for the first time
inits = df.index[triggers].values + 1
df['cash'] = 0
for i in inits:
    # .loc avoids the chained-assignment pitfall of df['cash'][i:] = ...
    df.loc[i:, 'cash'] = x - df.loc[i, 'holdings']
df.loc[triggers, 'cash'] = 0
df
Out[339]:
holdings cash
0 0 0
1 1 9
2 2 9
3 2 9
4 0 0
5 2 8
6 3 8
7 3 8


Problem iteration columns and rows Dataframe

Here is my problem:
Let's say you have to buy and sell two objects, A and B, under the following conditions:
You buy object A or B if its price goes below 150 (<150), and you can buy fractions of an object (so decimals are allowed)
If the following day the object is still below 150, you just keep the object and do nothing
If the object is at or above 150 (>=150), you sell the object and take the profits
You start the game with $10000
Here is the DataFrame with all the prices
df = pd.DataFrame({'Date': ['2017-05-19', '2017-05-22', '2017-05-23', '2017-05-24',
                            '2017-05-25', '2017-05-26', '2017-05-29'],
                   'A': [153, 147, 149, 155, 145, 147, 155],
                   'B': [139, 152, 141, 141, 141, 152, 152]})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
The goal is to return a DataFrame with the number of objects A and B you hold and the amount of cash you have left.
If the conditions are met, the allocation for each object is half of your cash if you hold no object (weight = 1/2), and all of your remaining cash if you already hold one object (weight = 1).
Let's look at df first; I will also develop the new dataframe that I'm trying to create (let's call it df_end):
On 2017-05-19, object A is $153 and B is $139: you buy 35.97 of object B (=5000/139) as the price is <150 --> you have $5000 left in cash.
On 2017-05-22, object A is $147 and B is $152: you buy 34.01 of object A (=5000/147) as the price is <150, and you sell 35.97 of object B at $152 as it is >=150 --> you now have $5467.44 left in cash thanks to the sale of B.
On 2017-05-23, object A is $149 and B is $141: you keep your position in object A (34.01 objects) as it is still below 150, and you buy 38.77 of object B (=5467.44/141) as the price is <150 --> you now have $0 left in cash.
On 2017-05-24, object A is $155 and B is $141: you sell 34.01 of object A at $155 as it is above 150, and you keep 38.77 of object B as it is still below 150 --> you now have $5271.55 left in cash thanks to the sale of A.
On 2017-05-25, object A is $145 and B is $141: you buy 36.35 of object A (=5271.55/145) as it is below 150, and you keep 38.77 of object B as it is still below 150 --> you now have $0 in cash.
On 2017-05-26, object A is $147 and B is $152: you sell 38.77 of object B at $152 as it is above 150, and you keep 36.35 of object A as it is still below 150 --> you now have $5893.04 in cash thanks to the sale of B.
On 2017-05-29, object A is $155 and B is $152: you sell 36.35 of object A at $155 as it is above 150 and do nothing else, as B is not below 150 --> you now have $11527.29 in cash thanks to the sale of A.
Hence, the new dataframe df_end should look like this (this is the result I am looking for):
A B Cash
Date
2017-05-19 0 35.97 5000
2017-05-22 34.01 0 5467.64
2017-05-23 34.01 38.77 0
2017-05-24 0 38.77 5272.11
2017-05-25 36.35 38.77 0
2017-05-26 36.35 0 5893.04
2017-05-29 0 0 11527.29
My principal problem is that we have to iterate over both rows and columns, and that is the most difficult part.
I've been trying to find a solution for a week without any luck, which is why I've tried to explain it as clearly as possible.
So if somebody has an idea on this issue, you are very welcome.
Thank you so much.
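Before a full solution, note that the per-day rule for a single object can be isolated in a small function (a sketch; `step`, `price`, `held` and `cash` are hypothetical names, with the 150 threshold taken from the rules above):

```python
def step(price, held, cash):
    """One day's decision for one object.

    Returns (new_holding, cash_delta): buy with all allocated cash below 150,
    sell everything at or above 150, otherwise keep the position.
    """
    if price < 150:
        if held == 0 and cash > 0:
            return cash / price, -cash   # buy a fractional quantity
        return held, 0.0                 # still below 150: hold
    if held > 0:
        return 0.0, held * price         # at/above 150: sell and take profits
    return held, 0.0                     # nothing held, nothing to do

# 2017-05-19: B at 139 with a 5000 allocation -> buy ~35.97 objects
qty, delta = step(139, 0, 5000)
```

Iterating this rule over rows (days) and columns (objects), while carrying the cash between calls, reproduces the walkthrough above.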
You could try this:
import pandas as pd

df = pd.DataFrame({'Date': ['2017-05-19', '2017-05-22', '2017-05-23', '2017-05-24',
                            '2017-05-25', '2017-05-26', '2017-05-29'],
                   'A': [153, 147, 149, 155, 145, 147, 155],
                   'B': [139, 152, 141, 141, 141, 152, 152]})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
print(df)

# Values before iterating
EntryCash = 10000
newdata = []
holding = False

# First iteration (initial conditions)
firstrow = df.to_records()[0]
possibcash = EntryCash if holding else EntryCash / 2
prevroa = possibcash / firstrow[1] if firstrow[1] <= 150 else 0
prevrob = possibcash / firstrow[2] if firstrow[2] <= 150 else 0
holding = any(i != 0 for i in [prevroa, prevrob])
newdata.append([firstrow[0], prevroa, prevrob, possibcash])

# Remaining iterations
for row in df.to_records()[1:]:
    possibcash = possibcash if holding else possibcash / 2
    a = row[1]
    b = row[2]
    if a > 150:
        if prevroa > 0:
            possibcash += prevroa * a
            a = 0
        else:
            a = prevroa
    else:
        if prevroa == 0:
            a = possibcash / a
            possibcash = 0
        else:
            a = prevroa
    if b > 150:
        if prevrob > 0:
            possibcash += prevrob * b
            b = 0
        else:
            b = prevrob
    else:
        if prevrob == 0:
            b = possibcash / b
            possibcash = 0
        else:
            b = prevrob
    prevroa = a
    prevrob = b
    newdata.append([row[0], a, b, possibcash])
    holding = any(i != 0 for i in [a, b])

df_end = pd.DataFrame(newdata,
                      columns=[df.index.name] + list(df.columns) + ['Cash']).set_index('Date')
print(df_end)
Output:
df
A B
Date
2017-05-19 153 139
2017-05-22 147 152
2017-05-23 149 141
2017-05-24 155 141
2017-05-25 145 141
2017-05-26 147 152
2017-05-29 155 152
df_end
A B Cash
Date
2017-05-19 0.000000 35.971223 5000.000000
2017-05-22 34.013605 0.000000 5467.625899
2017-05-23 34.013605 38.777489 0.000000
2017-05-24 0.000000 38.777489 5272.108844
2017-05-25 36.359371 38.777489 0.000000
2017-05-26 36.359371 0.000000 5894.178274
2017-05-29 0.000000 0.000000 11529.880831
If you want it rounded to two decimals, you can add:
df_end=df_end.round(decimals=2)
df_end:
A B Cash
Date
2017-05-19 0.00 35.97 5000.00
2017-05-22 34.01 0.00 5467.63
2017-05-23 34.01 38.78 0.00
2017-05-24 0.00 38.78 5272.11
2017-05-25 36.36 38.78 0.00
2017-05-26 36.36 0.00 5894.18
2017-05-29 0.00 0.00 11529.88
Slight differences in the final values
It is slightly different from your desired output because sometimes you rounded values to two decimals and sometimes you didn't. For example:
In your second row you put:
#second row
2017-05-22 34.01 0 5467.64
That means you used the full value of object B from the first row, which is 35.971223, not 35.97:
35.97*152
Out[120]: 5467.44
35.971223*152
Out[121]: 5467.6258960000005 #---->closest to 5467.64
And at row 3, you again used the real value, not the rounded one:
#row 3
2017-05-24 0 38.77 5272.11
#Values
34.013605*155
Out[122]: 5272.108775
34.01*155
Out[123]: 5271.549999999999
And finally, in the last two rows you used the rounded values, I guess, because:
#last two rows
2017-05-26 36.35 0 5893.04
2017-05-29 0 0 11527.29
#cash values
#penultimate row, cash value
38.777489*152
Out[127]: 5894.178328
38.77*152
Out[128]: 5893.040000000001
#last row, cash value
5893.04+(155*36.35)
Out[125]: 11527.29 #----> your value exactly
5894.04+(155*36.359371)
Out[126]: 11529.742505

combine two formats together

I am formatting a dataframe: I need a thousands separator and two decimal places. The problem is that when I combine them, only the last one takes effect. I guess many people have the same confusion, as I have googled a lot and found nothing.
I tried to use .map(lambda x: ('%.2f') % x and format(x, ',')) to combine the two required formats, but only the last one takes effect:
DF_T_1_EQUITY_CHANGE_Summary_ADE['Sum of EQUITY_CHANGE'].map(lambda x:format(x,',') and ('%.2f')%x)
DF_T_1_EQUITY_CHANGE_Summary_ADE['Sum of EQUITY_CHANGE'].map(lambda x:('%.2f')%x and format(x,','))
the first result is:
0 -2905.22
1 -6574.62
2 -360.86
3 -3431.95
Name: Sum of EQUITY_CHANGE, dtype: object
the second result is:
0 -2,905.2200000000003
1 -6,574.62
2 -360.86
3 -3,431.9500000000003
Name: Sum of EQUITY_CHANGE, dtype: object
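As an aside, the reason only the last format "wins" is Python's `and`: when its left operand is truthy (a non-empty string here), `a and b` evaluates to `b`, so the first formatted string is simply discarded. A minimal illustration:

```python
x = -2905.2200000000003

first = '%.2f' % x        # '-2905.22'
second = format(x, ',')   # '-2,905.2200000000003'

# `and` returns the second operand because the first is a truthy string,
# so the lambda's result is only ever the last format applied.
result = first and second
print(result)
```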
I tried a new way, by using
DF_T_1_EQUITY_CHANGE_Summary_ADE.to_string(formatters={'style1': '${:,.2f}'.format})
the result is:
Row Labels Sum of EQUITY_CHANGE Sum of TRUE_PROFIT Sum of total_cost Sum of FOREX VOL Sum of BULLION VOL Oil Sum of CFD VOL Sum of BITCOIN VOL Sum of DEPOSIT Sum of WITHDRAW Sum of IN/OUT
0 ADE A BOOK USD -2,905.2200000000003 638.09 134.83 15.590000000000002 2.76 0.0 0.0 0 0.0 0.0 0.0
1 ADE B BOOK USD -6,574.62 -1,179.3299999999997 983.2099999999999 21.819999999999997 30.979999999999993 72.02 0.0 0 8,166.9 0.0 8,166.9
2 ADE A BOOK AUD -360.86 235.39 64.44 5.369999999999999 0.0 0.0 0.0 0 700.0 0.0 700.0
3 ADE B BOOK AUD -3,431.9500000000003 190.66 88.42999999999999 11.88 3.14 0.03 2.0 0 20,700.0 -30,000.0 -9,300.0
the result confuses me, as I set the .2f format but it has no effect.
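A likely cause here: the `formatters` dict passed to `to_string` is keyed by column name, and 'style1' matches no column, so the format is silently ignored. Keying it by the actual column should apply it (a sketch with made-up values taken from the output above):

```python
import pandas as pd

df = pd.DataFrame({'Sum of EQUITY_CHANGE': [-2905.2200000000003, -6574.62]})

# formatters must be keyed by the column they should format
out = df.to_string(formatters={'Sum of EQUITY_CHANGE': '${:,.2f}'.format})
print(out)
```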
Using the format-spec mini-language you can add commas and set two decimal places with f'{x:,.2f}'.
import pandas as pd

df = pd.DataFrame({'EQUITY_CHANGE': [-2905.219262257907,
                                     -6574.619531995241,
                                     -360.85959369471186,
                                     -3431.9499712161164]})
df.EQUITY_CHANGE.apply(lambda x: f'{x:,.2f}')
# returns:
0 -2,905.22
1 -6,574.62
2 -360.86
3 -3,431.95
Name: EQUITY_CHANGE, dtype: object
The map method is not in-place; it doesn't modify the Series but returns a new one.
So just assign the result of map back to the old column.
Here is the doc:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html
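For example (a sketch using a hypothetical column named after the question):

```python
import pandas as pd

df = pd.DataFrame({'Sum of EQUITY_CHANGE': [-2905.2200000000003, -6574.62]})

# map alone does not change df ...
df['Sum of EQUITY_CHANGE'].map(lambda x: f'{x:,.2f}')
# ... so assign the returned Series back to the column
df['Sum of EQUITY_CHANGE'] = df['Sum of EQUITY_CHANGE'].map(lambda x: f'{x:,.2f}')
```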

Filter one DataFrame by unique values in another DataFrame

I have 2 pandas DataFrames:
The first DataFrame contains all data imported from the initial review data, which consists of "prodcode", "sentiment", "summaryText", "reviewText", etc.:
DFF = DFF[['prodcode', 'summaryText', 'reviewText', 'overall', 'reviewerID', 'reviewerName', 'helpful','reviewTime', 'unixReviewTime', 'sentiment','textLength']]
which produces:
prodcode summaryText reviewText overall reviewerID ... helpful reviewTime unixReviewTime sentiment textLength
0 B00002243X Work Well - Should Have Bought Longer Ones I needed a set of jumper cables for my new car... 5.0 A3F73SC1LY51OO ... [4, 4] 08 17, 2011 1313539200 2 516
1 B00002243X Okay long cables These long cables work fine for my truck, but ... 4.0 A20S66SKYXULG2 ... [1, 1] 09 4, 2011 1315094400 2 265
2 B00002243X Looks and feels heavy Duty Can't comment much on these since they have no... 5.0 A2I8LFSN2IS5EO ... [0, 0] 07 25, 2013 1374710400 2 1142
3 B00002243X Excellent choice for Jumper Cables!!! I absolutley love Amazon!!! For the price of ... 5.0 A3GT2EWQSO45ZG ... [19, 19] 12 21, 2010 1292889600 2 4739
4 B00002243X Excellent, High Quality Starter Cables I purchased the 12' feet long cable set and th... 5.0 A3ESWJPAVRPWB4 ... [0, 0] 07 4, 2012 1341360000 2 415
The second DataFrame is a grouping by prodcode, giving the ratio of each sentiment score to all reviews made for that product.
df1 = (DFF.groupby(["prodcode", "sentiment"]).count()
          .join(DFF.groupby("prodcode").count(), "prodcode", rsuffix="_r"))[['reviewText', 'reviewText_r']]
df1['result'] = df1['reviewText'] / df1['reviewText_r']
df1 = df1.reset_index()
df1 = df1.pivot("prodcode", 'sentiment', 'result').fillna(0)
df1 = round(df1 * 100)
df1.astype('int')
sorted_df2 = df1.sort_values(['0', '1', '2'], ascending=False)
which produces the following DF:
sentiment 0 1 2
prodcode
B0024E6QOO 80.0 0.0 20.0
B000GPV2QA 67.0 17.0 17.0
B0067DNSUI 67.0 0.0 33.0
B00192JH4S 62.0 12.0 25.0
B0087FSA0C 60.0 20.0 20.0
B0002KM5L0 60.0 0.0 40.0
B000DZBP60 60.0 0.0 40.0
B000PJCBOE 60.0 0.0 40.0
B0033A5PPO 57.0 29.0 14.0
B003POL69C 57.0 14.0 29.0
B0002Z9L8K 56.0 31.0 12.0
What I am now trying to do is filter my first dataframe in two ways. First, by the results of the second dataframe: I want the first dataframe filtered to the prodcodes from the second dataframe where df1's sentiment-'0' column > 40. From that list, I then want to filter the first dataframe to the rows where its 'sentiment' equals 0.
At a high level, I am trying to obtain the prodcode, summaryText and reviewText in the first dataframe for products that had high ratios of low sentiment scores, and whose sentiment is 0.
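Here is a sketch of that two-step filter on a miniature version of the data (the names follow the question; df1's index is assumed to be prodcode with ratio columns '0', '1', '2', and the miniature values are made up):

```python
import pandas as pd

# hypothetical miniature versions of DFF and df1
DFF = pd.DataFrame({
    'prodcode':    ['P1', 'P1', 'P2', 'P2'],
    'sentiment':   [0, 2, 2, 2],
    'summaryText': ['bad', 'good', 'fine', 'great'],
    'reviewText':  ['t1', 't2', 't3', 't4'],
})
df1 = pd.DataFrame({'0': [50.0, 0.0], '2': [50.0, 100.0]},
                   index=pd.Index(['P1', 'P2'], name='prodcode'))

# step 1: prodcodes whose sentiment-0 ratio exceeds the threshold
wanted = df1.index[df1['0'] > 40]
# step 2: rows of DFF for those products whose own sentiment is 0
out = DFF[DFF['prodcode'].isin(wanted) & DFF['sentiment'].eq(0)]
print(out[['prodcode', 'summaryText', 'reviewText']])
```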
Something like this, assuming all the data you need is in df1 (whose index is prodcode) and no merges are needed:
m = list(DFF['prodcode'].loc[DFF['sentiment'] == 0])  # create a list matching your criteria
df1.loc[(df1['0'] > 40) & (df1.index.isin(m))]        # filter according to your conditions
I figured it out:
DF3 = pd.merge(DFF, df1, left_on='prodcode', right_on='prodcode')
print(DF3.loc[(DF3['0'] > 50.0) & (DF3['2'] < 50.0) & (DF3['sentiment'].isin(['0']))].sort_values('0', ascending=False))

Type specific output expected using pandas

I calculated the changbtwread columns for each type using the code below:
for v in df['Type'].unique():
    df[f'Changebetweenreadings_{v}'] = df.loc[df['Type'].eq(v), 'Last'].diff()
Given
Type Last changbtwread_ada changbtwread_btc changbtwread_eur
0 ada 3071.56 NaN NaN NaN
1 ada 3097.82 26.26 NaN NaN
2 btc 1000.00 NaN NaN NaN
3 ada 2000.00 -1097.82 NaN NaN
4 btc 3000.00 NaN 2000.0 NaN
5 eur 1000.00 NaN NaN NaN
6 eur 1500.00 NaN NaN 500.0
Now I need to calculate direction columns based on these changebtwread columns.
My output should look like:
Type change_dir_ada change_dir_btc change_dir_eur
ada Nut
ada Pos
btc Nut
ada Neg
btc Nut
eur
eur Pos
A quick fix I tried is this code:
df.loc[df.Changebetweenreadings_btceur > 0, 'ChangeDirection_btceur'] = 'Pos'
df.loc[df.Changebetweenreadings_btceur < 0, 'ChangeDirection_btceur'] = 'Neg'
df.loc[df.Changebetweenreadings_btceur == 0, 'ChangeDirection_btceur'] = 'Nut'
df.loc[df.Changebetweenreadings_adabtc > 0, 'ChangeDirection_adabtc'] = 'Pos'
df.loc[df.Changebetweenreadings_adabtc < 0, 'ChangeDirection_adabtc'] = 'Neg'
df.loc[df.Changebetweenreadings_adabtc == 0, 'ChangeDirection_adabtc'] = 'Nut'
But this is a lot of code, and I don't think it is a dynamic way of doing it.
I expect something like this:
for v in df['Type'].unique():
    df[f'Changebetweenreadings_{v}']  # --> do the direction calculation here
It doesn't work for these values:
change type dir_ada dir_btc
-3637.31 ada
-4E-08 ada Neg
-3637.31 ada Nut
3637.8 btc Nut
In place of Pos it gives a seemingly random mapping.
I believe you need:
import numpy as np

vals = ['Pos', 'Neg', 'Nut']
for v in df['Type'].unique():
    df[f'change_dir_{v}'] = df.loc[df['Type'].eq(v), 'Last'].diff()
    df[f'change_dir_{v}'] = np.select([df[f'change_dir_{v}'] > 0,
                                       df[f'change_dir_{v}'] < 0,
                                       df[f'change_dir_{v}'] == 0], vals, '')
print(df)
Type Last change_dir_ada change_dir_btc change_dir_eur
0 ada 3071.56
1 ada 3097.80 Pos
2 btc 1000.00
3 ada 2000.00 Neg
4 btc 3000.00 Pos
5 eur 1000.00
6 eur 1500.00 Pos

Back Filling Dataframe

I have a dataframe with 3 columns. Something like this:
Data Initial_Amount Current
31-01-2018
28-02-2018
31-03-2018
30-04-2018 100 100
31-05-2018 100 90
30-06-2018 100 80
I would like to populate the prior rows with the Initial_Amount, like this:
Data Initial_Amount Current
31-01-2018 100 100
28-02-2018 100 100
31-03-2018 100 100
30-04-2018 100 100
31-05-2018 100 90
30-06-2018 100 80
So:
Find the first non-empty row with Initial_Amount populated
Use that to backfill the Initial_Amount values back to the starting date
If it is the first row and Current is empty, then copy Initial_Amount; otherwise copy the prior balance
Regards,
Pandas fillna with fill method 'bfill' (which uses the next valid observation to fill each gap) should do what you're looking for:
In [13]: df.fillna(method='bfill')
Out[13]:
Data Initial_Amount Current
0 31-01-2018 100.0 100.0
1 28-02-2018 100.0 100.0
2 31-03-2018 100.0 100.0
3 30-04-2018 100.0 100.0
4 31-05-2018 100.0 90.0
5 30-06-2018 100.0 80.0
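For reference, a runnable sketch building the sample above; on recent pandas versions `DataFrame.bfill()` is the preferred spelling of `fillna(method='bfill')`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Data': ['31-01-2018', '28-02-2018', '31-03-2018',
                            '30-04-2018', '31-05-2018', '30-06-2018'],
                   'Initial_Amount': [np.nan, np.nan, np.nan, 100.0, 100.0, 100.0],
                   'Current': [np.nan, np.nan, np.nan, 100.0, 90.0, 80.0]})

# back-fill: each NaN takes the next valid value below it
filled = df.bfill()
print(filled)
```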
