I was wondering if there is a more efficient/cleaner way of doing the following. Say I have a dataframe that contains 2 columns: the percentage (based on the previous price) and the action, play/buy (1) or not play/sell (-1). It's basically about stocks.
For simplicity, consider the example df:
Percent Action
1.25 1
1.20 1
0.50 -1
0.75 1
I would like to generate the following. I only care about the final money amount, I am just showing this table for reference. Say we started with $100 and a state of not playing. Thus we should get the money amount of:
Playing Percent Action Money
No 1.25 1 $100
Yes 1.20 1 $120
Yes 0.50 -1 $60
No 0.75 1 $60
Yes ... ... ...
The amount didn't change in the first row since we weren't playing yet. Since the action is 1, we will play the next one. The percentage went up 20%, thus we get $120. The next action is still a 1, so we'll still be in for the next one. The percentage went down to 50%, so we end up with $60. The next action is -1, thus we will not play. The percentage went down to 75%, but since we weren't playing, our money stayed the same. And so on.
Currently, I have the code below. It works fine, but I'm just wondering if there is a more efficient way using numpy/pandas functions. Mine basically iterates through each row and calculates the value.
playing = False
money = 10000
for index, row in df.iterrows():
    ## UPDATE MONEY IF PLAYING
    if index > 0 and playing == True:
        money = float(format(money*row['Percent'],'.2f'))
    ## BUY/SELL
    if row['Action'] == 1:
        if playing == False:
            playing = True ## Buy, playing after this
    elif row['Action'] == -1:
        if playing == True:
            playing = False ## Sell, not playing after this
You could try this:
# decide whether to play based on action
df['Playing'] = df.Action.shift().eq(1)
# replace Percent for not playing row with 1 and then calculate the cumulative product
df['Money'] = '$' + df.Percent.where(df.Playing, 1).cumprod().mul(100).astype(str)
df
#Percent Action Playing Money
#0 1.25 1 False $100.0
#1 1.20 1 True $120.0
#2 0.50 -1 True $60.0
#3 0.75 1 False $60.0
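If only the final amount matters, the same idea can probably be collapsed into a single product. This is a minimal sketch, assuming the example df from the question and a $100 starting balance; the names `playing` and `final_money` are illustrative, and unlike the original loop it does not round to two decimals at every step.

```python
import pandas as pd

# The example data from the question
df = pd.DataFrame({"Percent": [1.25, 1.20, 0.50, 0.75],
                   "Action": [1, 1, -1, 1]})

# A row is "played" only if the previous row's action was a buy (1);
# the first row has no previous action, so it counts as not playing.
playing = df["Action"].shift().eq(1)

# Rows that are not played contribute a neutral factor of 1.
final_money = 100 * df["Percent"].where(playing, 1).prod()
print(round(final_money, 2))
```

Using `prod()` instead of `cumprod()` avoids building the intermediate Money column when it is not needed.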
Related
I'm trying to come up with a formula to calculate the average entry/position price to further update my stop loss and take profit.
For example, say we opened a BTC buy position with an amount of 1 when the price was 20000.
Later, when the price dropped to 19000, we made another buy with the same amount of 1, "averaging" the position to the middle, so we end up with a position at 19500 with an amount of 2.
Where I'm struggling is the case where we want to increase the order size at each price.
Say 1 at 20000, 1.5 at 19500, 2 at 19000 and so on.
Or make new buys of the same amount but with a shorter distance between them.
Initial buy at 20000, then 19000, then 19150.
Or combine these two variants.
I mainly use Python and Pandas. Maybe the latter has some built-in function which I'm not aware of. I checked the official Pandas docs, but found only the regular mean function.
Thanks to Yuri's suggestion to look into VWAP, I came up with the following code, which is more advanced and allows you to use different contract/volume sizes and increase/decrease "distance" between orders.
As an example, I used an average BTC price of 20000 and increased the step distance using a 1.1 multiplier, as well as the volume. This is in Binance futures terms, where the minimum purchase is 1 contract for $10.
The idea is to find the sweet spot for order distance, volume, stop loss and take profit while averaging down.
# initial entry price
initial_price = 20000
# bottom price
bottom_price = 0
# enter on every 5% price drop
step = int(initial_price*0.05)
# scales the step between orders: 1.1 to increase the distance, 0.9 to decrease it
step_multiplier = 1.1
# initial volume size in contracts
initial_volume = 1
# volume_multiplier can't be less than 1; fractional volumes are truncated to whole contracts
volume_multiplier = 1.1
# defining empty lists
prices = []
volumes = []
# checking if we are going to use the simple approach with a constant volume and no step or volume multiplier
if step_multiplier == 1 and volume_multiplier == 1:
    prices = list(range(initial_price, bottom_price, -step))
    # every order uses the same volume (otherwise volumes stays empty and the average below divides by zero)
    volumes = [initial_volume] * len(prices)
else:
    # defining current price and volume vars
    curr_price = initial_price
    curr_volume = initial_volume
    # checking if the current price is still greater than the defined bottom price
    while curr_price > bottom_price:
        # adding current price to the list
        prices.append(curr_price)
        # calculating next order price
        curr_price = curr_price-step*step_multiplier
        # adding current volume to the list (always, so prices and volumes stay the same length)
        volumes.append(int(curr_volume))
        # calculating next order volume
        curr_volume = curr_volume*volume_multiplier
print("Prices:")
for price in prices:
    print(price)
print("Volumes:")
for volume in volumes:
    print(volume)
print("Prices array length", len(prices))
print("Volumes array length", len(volumes))
# volume-weighted average: sum(price * volume) / sum(volume)
a = [item1 * item2 for item1, item2 in zip(prices, volumes)]
b = volumes
print("Average position price when price will reach",prices[-1], "is", sum(a)/sum(b))
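The average entry price is just a volume-weighted mean, so NumPy can compute it directly. A small sketch, using the fills mentioned in the question (1 at 20000, 1.5 at 19500, 2 at 19000); the variable names are illustrative.

```python
import numpy as np

# Fills taken from the question's example: 1 @ 20000, 1.5 @ 19500, 2 @ 19000
prices = np.array([20000.0, 19500.0, 19000.0])
volumes = np.array([1.0, 1.5, 2.0])

# Volume-weighted average entry price after all fills
avg_entry = np.average(prices, weights=volumes)

# Running average entry price after each successive fill
running_avg = np.cumsum(prices * volumes) / np.cumsum(volumes)

print(avg_entry)
print(running_avg)
```

The running version is the one that would be useful for moving a stop loss or take profit after each new fill.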
I am working with some stock data in pandas and would like to count the number of times (or cycles) where the price is within a certain entry and exit price point.
For example, I have set:
entry price = 136.3
exit price = 136.6
My entry price is $136.3 and my exit price is $136.6. I want to begin a cycle every time the value approaches $136.3 from below $136.3, and close that cycle when the price approaches $136.6 from below $136.6.
For example, using the dataframe in the screenshot below, we will have:
a cycle begins at Timestamp 1656682503 ($136.04) and ends at Timestamp 1656682802 ($137.15)
a cycle begins at Timestamp 1656682803 ($136.24) and ends at Timestamp 1656682804 ($136.65)
So, the cycle only begins if it crosses the entry price, and only ends if it crosses the exit price. The fluctuations in between are ignored since they never cross the entry and exit price points.
Basically, I want to be able to build a count to count the number of cycles. In this case, the count = 2.
I have thought of using .sum(), .cumsum() or .groupby(), but am honestly lost in the sauce. Any help would be appreciated.
A janky hack would be to write a function that sorts by timestamp, then iterates over the dataset, setting some variable 'switch' to indicate whether you're in or out of a cycle, and adding 0.5 counts to another variable each time you switch, or adding 1 count per 2 switches.
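def counter(input_df, entry, exit_price, start_in_cycle=False):
    # input_df / exit_price avoid shadowing the built-ins input() and exit()
    count = 0
    # True while a cycle is open
    cycle = start_in_cycle
    temp = input_df.sort_values('Timestamp', ignore_index=True)
    for x in temp['value']:
        if not cycle:
            # open a cycle once the price is at or below the entry price
            # (as in the question's example, where cycles begin at $136.04 and $136.24)
            if x <= entry:
                cycle = True  # assignment; the original '==' only compared
        elif x >= exit_price:
            # close the cycle once the price reaches the exit price
            cycle = False
            count += 1
    return count
count = counter(input_df, 136.3, 136.6, False)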
Give this a try. This one only counts when a cycle is exited, and would count the first exit if the timeseries starts mid-cycle. Not sure if cycles would flip on >= and <= entry and exit, but you can adjust based on need.
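The same count can also be sketched without an explicit Python loop. The version below assumes a cycle opens when the price is at or below the entry price and closes when it is at or above the exit price, which is one reading of the question's example; the function name `count_cycles` and the sample prices are illustrative (four of the prices come from the question, the two in between are made up).

```python
import numpy as np
import pandas as pd

def count_cycles(prices: pd.Series, entry: float, exit_price: float) -> int:
    """Count completed entry -> exit cycles (hypothetical helper)."""
    # 1.0 where a cycle may open, 0.0 where it closes, NaN in between
    state = pd.Series(np.nan, index=prices.index)
    state[prices <= entry] = 1.0
    state[prices >= exit_price] = 0.0
    # Carry the last known state through the in-between rows;
    # rows before the first signal count as "not in a cycle".
    state = state.ffill().fillna(0.0)
    # A completed cycle is a transition from in-cycle (1) to out (0)
    closes = (state.shift(fill_value=0.0) == 1.0) & (state == 0.0)
    return int(closes.sum())

# Prices assumed to be already sorted by timestamp
sample = pd.Series([136.04, 136.40, 136.20, 137.15, 136.24, 136.65])
n_cycles = count_cycles(sample, entry=136.3, exit_price=136.6)
print(n_cycles)
```

If the intended rule is a strict crossing instead, only the two comparison lines need to change.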
I have a dataframe similar to the one below:
Price return indicator
5 0.05 1
6 0.20 -1
5 -0.16 1
Where the indicator is based upon the forecasted return on the following day.
What I would like to achieve is a strategy where, when the indicator is positive 1, I buy the stock at the price on that date/row. Then, if the indicator is negative, we sell at that price. I would then like to create a new column which represents the value of the portfolio on each day. Assuming I have $1000 to invest, the value of the portfolio should equal the holdings plus the cash amount. I'm assuming that any fraction of a stock can be purchased.
I'm unsure where to start with this one. I tried calculating the buy-and-hold strategy using:
df['Holding'] = df['return'].add(1).cumprod() * 5000
This worked for a buy-and-hold strategy, but modifying it for the new strategy seems difficult.
I tried:
df['HOLDING'] = (df['return'].add(1).cumprod() * 5000 * df['Indicator'])
#to get the value of the buy or the sell
#then using
df['HOLDING'] = np.where(df['HOLDING'] >0, df['HOLDING'] , df['HON HOLDING 2']*-1)
#my logic was, if its positive its the value of the stock holding, and if its negative it is a cash inflow therefore I made it positive as it would be cash.
The issue is that my logic is massively flawed: if the holding is cash, the return shouldn't apply to it. Further, I don't think using cumprod is correct with this strategy.
Has anyone used this strategy before and can offer tips on how to make it work?
thank you
I'm not sure the returns and prices are in the correct place: they shouldn't really be in the same row if they represent the buying price (presumably yesterday's close) and the daily return (assuming the position was held for the whole day). But anyway...
import pandas as pd
# the data you provided
df = pd.read_csv("Data.csv", header=0)
# an initial starting row (explanation provided)
starting = pd.DataFrame({'Price': [0], 'return': [0], 'indicator': [0]})
# concatenate so starting is first row
df = pd.concat([starting, df]).reset_index(drop=True)
# setting holding to 0 at start (no shares), and cash at 1000 (therefore portfolio = 1000)
# (floats, so fractional shares and cash can be stored later without a dtype change)
df[["Holding", "Cash", "Portfolio"]] = [0.0, 1000.0, 1000.0]
# buy/sell is the difference (explanation provided)
df["BuySell"] = df["indicator"].diff()
# simulating every day
# (df.loc[i, col] is used instead of df[col].iloc[i] = ..., because chained assignment may not write back to df)
for i in range(1, len(df)):
    # buying
    if df.loc[i, "BuySell"] > 0:
        df.loc[i, "Holding"] += df.loc[i-1, "Cash"] / df.loc[i, "Price"]
        df.loc[i, "Cash"] = 0
    # selling
    elif df.loc[i, "BuySell"] < 0:
        df.loc[i, "Cash"] = df.loc[i-1, "Holding"] * df.loc[i, "Price"]
        df.loc[i, "Holding"] = 0
    # holding position
    else:
        df.loc[i, "Cash"] = df.loc[i-1, "Cash"]
        df.loc[i, "Holding"] = df.loc[i-1, "Holding"]
    # multiply holding by return (assuming all-in, so holding=0 not affected)
    df.loc[i, "Holding"] *= (1 + df.loc[i, "return"])
    df.loc[i, "Portfolio"] = df.loc[i, "Holding"] * df.loc[i, "Price"] + df.loc[i, "Cash"]
Explanations:
Starting row:
This is needed so that the loop can refer to the previous holdings and cash (it would be more of an inconvenience to add an if statement in the loop for i = 0).
Buy/Sell:
The difference is necessary here: if the position changes from buy to sell, the shares are sold (and vice versa). However, if the previous indicator is the same as the current row's, there is no change (diff = 0), and no shares are bought or sold.
Portfolio:
This is an "equivalent" amount (the amount you would hold if you converted all shares to cash at the time).
Holding:
This is the number of shares held.
NOTE: from what I understood of your question, this is an all-in strategy - there is no percentage in, which has made this strategy more simplistic, but easier to code.
Output:
#Out:
# Price return indicator Holding Cash Portfolio BuySell
#0 0 0.00 0 0.00 1000 1000.0 NaN
#1 5 0.05 1 210.00 0 1050.0 1.0
#2 6 0.20 -1 0.00 1260 1260.0 -2.0
#3 5 -0.16 1 211.68 0 1058.4 2.0
Hopefully this will give you a good starting point to create something more to your specification and more advanced, such as with multiple shares, or being a certain percentage exposed, etc.
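As one possible starting point for such extensions, the same all-in logic can be kept in plain Python variables and written back to the dataframe once at the end, which avoids per-cell writes. This is only a sketch under the same assumptions as the loop above; `simulate_all_in` and `start_cash` are hypothetical names, and the sample rows are the three from the question. It differs from the loop in one small way: buying adds to any existing holding and selling adds to any existing cash, rather than overwriting them.

```python
import pandas as pd

# The three sample rows from the question
df = pd.DataFrame({"Price": [5, 6, 5],
                   "return": [0.05, 0.20, -0.16],
                   "indicator": [1, -1, 1]})

def simulate_all_in(data, start_cash=1000.0):
    """Hypothetical helper: all-in/all-out simulation kept in plain variables."""
    holding, cash, prev_indicator = 0.0, start_cash, 0
    holdings, cashes, portfolios = [], [], []
    for price, ret, indicator in zip(data["Price"], data["return"], data["indicator"]):
        change = indicator - prev_indicator  # same role as the BuySell diff
        if change > 0:      # buy with all available cash
            holding += cash / price
            cash = 0.0
        elif change < 0:    # sell everything held
            cash += holding * price
            holding = 0.0
        # same return adjustment as in the loop above
        holding *= 1 + ret
        holdings.append(holding)
        cashes.append(cash)
        portfolios.append(holding * price + cash)
        prev_indicator = indicator
    return data.assign(Holding=holdings, Cash=cashes, Portfolio=portfolios)

result = simulate_all_in(df)
print(result)
```

Because the state lives in ordinary variables, no artificial starting row is needed.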
Dataframe
(Disregard the two index columns)
level_0 index Year Month Day Open High Low Close Volume Length Polarity Sentiment_Negative Sentiment_Neutral Sentiment_Positive Target_variable Predicted
0 0 0 2020 1 19 8941.45 9164.36 8620.08 8706.25 3.42173e+10 937.167 0.0884653 0 0 1 0 0
1 1 1 2020 1 18 8927.21 9012.2 8827.33 8942.81 3.23378e+10 1177.5 0.176394 0 0 1 1 1
2 2 2 2020 1 17 8725.21 8958.12 8677.32 8929.04 3.63721e+10 1580 0.216762 0 0 1 0 0
3 3 3 2020 1 16 8812.48 8846.46 8612.1 8723.79 3.1314e+10 1336.33 0.182707 0 0 1 0 0
Description
The value of the target_variable is 1 if today's closing price is greater than yesterday's closing price.
The value of the target_variable is 0 if today's closing price is less than yesterday's closing price.
The predicted value is the output of my classifier.
Problem
I need to run some code that tracks how much money is gained if I invest when the classifier tells me to invest
I have started to code this
credit = 10000
for index, row in df.iterrows():
    if row["Predicted"] == 1:
        #print(row["Percentage_diff"])
        credit = credit - 100
        credit = credit + (100 * row["Percentage_diff"])
print(credit)
The idea is that I start off with a balance of 10,000 and invest 100 every time the classifier signals to. The only problem is that I end up losing 8000 credits. Is the code correct and the classifier just very poor?
Or have I made an error in the code?
I am not a trading expert, so I assume that every day the classifier tells you to trade, you will buy with the opening price and sell with the close price.
You can start by calculating the percentage of profit or loss when the classifier tells you to trade. You can do that by subtracting the opening price from the closing price and dividing the result by the opening price.
df["perc_diff"] = (df["Close"] - df["Open"])/df["Open"]
Of course, this will be negative when the classifier is wrong. To compute the cumulative profits/losses, all you want to do is to iteratively add/subtract your profit/loss to your capital. This means at a day with a profit/loss percentage of r, if you invest x dollars, your new credit is (1+r)*x. So a simple for loop can do it like that:
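credit = 1 # your capital
for label, row in df.iterrows():
    # grow (or shrink) the whole capital by that day's return, but only on days the classifier says to trade
    credit = (1 + row["Predicted"] * row["perc_diff"]) * credit
print(credit)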
Edit to address your updated problem:
If you want to specify an amount to invest rather than all your capital, then you can use this:
credit = 1 # your capital
to_invest = 0.1 # money to invest
for label, row in df.iterrows():
    # update invest
    invest_update = (1 + row["Predicted"] * row["perc_diff"]) * to_invest
    credit = credit - to_invest + invest_update
print(credit)
The last two lines can be combined into one line:
credit = credit + row["Predicted"] * row["perc_diff"] * to_invest
I think the code is correct, and if you lose, then it is probably due to poor performance from your classifier, but this should be evident from your evaluation of the model (like accuracy and precision metrics). Also, if it is a classifier that is not made for time series (e.g. logistic regression), then it is very reasonable that it performs poorly.
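Both loops can also be written without iterating, which makes it easier to check the arithmetic. A minimal sketch, assuming the Open, Close and Predicted values from the four sample rows in the question; the names `strategy_return`, `credit_all_in` and `credit_fixed` are illustrative.

```python
import pandas as pd

# Open/Close/Predicted values copied from the four sample rows in the question
df = pd.DataFrame({
    "Open": [8941.45, 8927.21, 8725.21, 8812.48],
    "Close": [8706.25, 8942.81, 8929.04, 8723.79],
    "Predicted": [0, 1, 0, 0],
})
df["perc_diff"] = (df["Close"] - df["Open"]) / df["Open"]

# Return actually earned each day: zero on days the classifier says not to trade
strategy_return = df["Predicted"] * df["perc_diff"]

# Investing the whole capital every day: the daily returns compound
credit_all_in = 1 * (1 + strategy_return).prod()

# Investing a fixed stake every day: profits and losses simply add up
to_invest = 0.1
credit_fixed = 1 + (strategy_return * to_invest).sum()

print(credit_all_in, credit_fixed)
```

The fixed-stake version is a plain sum because the stake does not depend on the current balance.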
Solution
df["Percentage_diff"] = (df["Close"] - df["Open"])/df["Open"]
credit = 10000
for index, row in df.iterrows():
    if row["Predicted"] == 1:
        #print(row["Percentage_diff"])
        credit = credit - 100
        credit = credit + ((100 * row["Percentage_diff"]) + 100)
print(credit)
This was the solution thanks to Ahmed.
If I start with an original balance of 10000, then every time the classifier signals to invest, I invest 100 dollars at the open and withdraw at the close; this code calculates the resulting balance.
This is my first ever question on here, so please forgive me if I don't explain it clearly, or overexplain. The task is to turn a for loop that contained 2 if statements into a dataframe.apply call instead of the loop. I thought the way of doing it was to turn the if statements inside the for loop into a defined function, then call the function in the .apply line, but I can only get so far. I'm not even sure I am trying to tackle this the right way. I can provide the original for loop code if necessary. Thanks in advance.
The goal is to import a csv of stock prices, compare the prices in one column to a moving average, which needed to be created, and if > MA, buy, if < MA, sell. Keep track of all buys/sells and determine the overall wealth/return at the end. It worked as a for loop: for each x in prices, use the 2 ifs, and append prices to a list to determine ending wealth. I think I get to the point where I call the defined function in the .apply line, and it errors out. In my code below there may still be some unnecessary lingering code from the for loop usage, but it shouldn't interfere with the .apply attempt; it just makes for messy coding until I figure it out.
df2 = pd.read_csv("MSFT.csv", index_col=0, parse_dates=True).sort_index(axis=0 ,ascending=True) #could get yahoo to work but not quandl, so imported the csv file from class
buyPrice = 0
sellPrice = 0
maWealth = 1.0
cash = 1
stock = 0
sma = 200
ma = np.round(df2['AdjClose'].rolling(window=sma, center=False).mean(), 2) #to create the moving average to compare to
n_days = len(df2['AdjClose'])
closePrices = df2['AdjClose'] #to only work with one column from original csv import
buy_data = []
sell_data = []
trade_price = []
wealth = []
def myiffunc(adjclose):
    if closePrices > ma and cash == 1: # Buy if stock price > MA & if not bought yet
        buyPrice = closePrices[0+ 1]
        buy_data.append(buyPrice)
        trade_price.append(buyPrice)
        cash = 0
        stock = 1
    if closePrices < ma and stock == 1: # Sell if stock price < MA and if you have a stock to sell
        sellPrice = closePrices[0+ 1]
        sell_data.append(sellPrice)
        trade_price.append(sellPrice)
        cash = 1
        stock = 0
        wealth.append(1*(sellPrice / buyPrice))
closePrices.apply(myiffunc)
Checking the docs for apply, it seems like you need to use the axis=1 version to process one row at a time, and pass two columns: the moving average and the closing price.
Something like this:
df2 = ...
df2['MovingAverage'] = ...
have_shares = False
def my_func(row):
    global have_shares
    if not have_shares and row['AdjClose'] > row['MovingAverage']:
        # buy shares
        have_shares = True
    elif have_shares and row['AdjClose'] < row['MovingAverage']:
        # sell shares
        have_shares = False
However, it's worth pointing out that you can do the comparisons using numpy/pandas as well, just storing the results in another column:
df2['BuySignal'] = (df2.AdjClose > df2.MovingAverage)
df2['SellSignal'] = (df2.AdjClose < df2.MovingAverage)
Then you could .apply() a function that made use of the Buy/Sell signal columns.
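With the signal columns in place, it may even be possible to skip .apply() altogether. The sketch below is one way to do that, not the original loop's exact bookkeeping: it assumes you trade at the close of the day the signal fires and earn the next day's close-to-close return while holding. The prices and the 3-day window are made up for illustration, and `Position` and `Wealth` are hypothetical column names.

```python
import numpy as np
import pandas as pd

# Made-up prices; a real run would load the AdjClose column from the CSV
df2 = pd.DataFrame({"AdjClose": [10.0, 11.0, 12.0, 11.0, 9.0, 10.5, 12.0, 13.0]})
df2["MovingAverage"] = df2["AdjClose"].rolling(window=3).mean()
df2["BuySignal"] = df2.AdjClose > df2.MovingAverage
df2["SellSignal"] = df2.AdjClose < df2.MovingAverage

# 1 after a buy signal, 0 after a sell signal, previous state otherwise.
# Rows before the first signal (the moving-average warm-up) count as flat.
position = pd.Series(np.nan, index=df2.index)
position[df2.BuySignal] = 1.0
position[df2.SellSignal] = 0.0
df2["Position"] = position.ffill().fillna(0.0)

# The return earned on a given day depends on the position held at the
# previous close, hence the shift by one row.
daily_return = df2["AdjClose"].pct_change().fillna(0.0)
held_yesterday = df2["Position"].shift(fill_value=0.0)
df2["Wealth"] = (1.0 + daily_return * held_yesterday).cumprod()

print(df2[["AdjClose", "MovingAverage", "Position", "Wealth"]])
```

The shift is the important design choice: without it, the strategy would earn the return of the same day that produced the signal, which is look-ahead bias.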