I'm fetching some stock market data every second and need to print the volume for just one iteration (one second), then explicitly wait for 60 seconds, and repeat this
indefinitely.
For example, if I start at 9:00 AM, then using the first tick['exchange_timestamp'] as start_time, I'd like to iterate using Python's time or datetime/timedelta modules so that I get the volume from the stock data after an interval of 1 minute, i.e. at 9:01 AM, and so on indefinitely.
The code is like this currently:
def ROC(self, df, tick):
    global start_time
    global timespan
    start_time = tick['exchange_timestamp']
    prev_volume = tick['volume_traded']
    timespan = start_time + timedelta(seconds=60)
    for index, row in df.iterrows():
        if start_time < timespan:
            prev_volume = tick['volume_traded']
This should run for just one iteration, then wait for a minute, and keep repeating indefinitely for as long as the program runs.
Can anyone help me solve this using some loops over Python's time and datetime packages?
I'm running this code as a method inside a class, and I tried multithreading and multiprocessing, but neither helped in any way, so please try to solve this issue using only the time and datetime packages of Python.
Thanks. :)
I suggest using a generator function that implements your own custom iteration logic.
The example below illustrates iteration logic with a 60-second delay between iterations.
This way, you can add future iteration logic to the same generator while leaving the main loop simple. As enhancements of this kind, I added circular looping and a global stop flag to the example below.
Example:
import time
import pandas as pd

demo_cnt = 0

data = {
    "tickers": ['NASDAQ:GOOG', 'NASDAQ:AMZN', 'NASDAQ:MSFT', 'NASDAQ:AAPL', 'NASDAQ:META'],
    "values": [101.07, 112.62, 236.80, 141.62, 131.30]
}

# suggested generator function
def cyclic_dataframe_generator(df: pd.DataFrame):
    index = 0
    global cyclic_gen_stop
    cyclic_gen_stop = False
    while True:
        yield index, df.iloc[[index]]
        time.sleep(60)
        index += 1
        if index > df.shape[0] - 1:
            index = 0
        if cyclic_gen_stop:
            break

# load data into a DataFrame object
df = pd.DataFrame(data)

# loop with custom iteration logic
for index, df_row in cyclic_dataframe_generator(df):
    print(f'index={index}\n{df_row}\n')
    demo_cnt += 1
    if demo_cnt >= 10:
        cyclic_gen_stop = True
Output:
index=0
tickers values
0 NASDAQ:GOOG 101.07
index=1
tickers values
1 NASDAQ:AMZN 112.62
index=2
tickers values
2 NASDAQ:MSFT 236.8
index=3
tickers values
3 NASDAQ:AAPL 141.62
index=4
tickers values
4 NASDAQ:META 131.3
index=0
tickers values
0 NASDAQ:GOOG 101.07
index=1
tickers values
1 NASDAQ:AMZN 112.62
index=2
tickers values
2 NASDAQ:MSFT 236.8
index=3
tickers values
3 NASDAQ:AAPL 141.62
index=4
tickers values
4 NASDAQ:META 131.3
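A note on timing: time.sleep(60) measures the delay from the end of one iteration, so any work done inside the loop makes the ticks drift relative to the wall clock. If the iterations should fire on exact one-minute boundaries (9:00, 9:01, ...), the same generator pattern can schedule each wake-up with datetime instead. A minimal sketch, with the stop flag omitted for brevity:

import time
from datetime import datetime, timedelta

def clock_aligned_generator(df):
    # Each wake-up is scheduled relative to the start time, so
    # processing delays between iterations do not accumulate.
    index = 0
    next_tick = datetime.now()
    while True:
        yield index, df.iloc[[index]]
        index = (index + 1) % df.shape[0]
        next_tick += timedelta(seconds=60)
        remaining = (next_tick - datetime.now()).total_seconds()
        if remaining > 0:
            time.sleep(remaining)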
I'm creating stock trading signals based on price and indicator data. I'm wondering what the most efficient way is to add indicators to a frame of price data, with the catch that I want to fill NaN, null, or similar when the frame is not long enough to support the indicator.
Here's what I'm doing now:
watchlistname = 'Russell_3000'
universe_symbols = norgatedata.watchlist_symbols(watchlistname)
print(f'Looking at {len(universe_symbols)} symbols in recent_ipo')
# print(universe_symbols)

# signal code
count = 0
signal = []
for symbol in universe_symbols:
    try:
        count += 1
        if count % 300 == 0:
            print(f'We have looked at {count} symbols.')
        symbol_data = get_data(symbol)
        if count == 1:
            print(f'Looking at data on {symbol_data.iloc[-(test_days_back + 1), symbol_data.columns.get_loc("Date")]}')
        if test_days_back != 0:
            symbol_data = symbol_data[:-test_days_back]
        ATR_length = 10
        liquidity_ma_length = 10
        ROC_Length = 40
        HHV_Length = 10
        ROC = 'ROC_' + str(ROC_Length)
        ATR = 'ATRr_' + str(ATR_length)
        roc_short = symbol_data.ta.roc(length=ROC_Length)
        atr = symbol_data.ta.atr(length=ATR_length)
        symbol_data = pd.concat([symbol_data, roc_short, atr], axis=1)
        symbol_data['HHV'] = symbol_data['Close'].rolling(HHV_Length).max()
        symbol_data['C*V'] = symbol_data['Close'] * symbol_data['Volume']
        symbol_data['Liquidity'] = symbol_data['C*V'].rolling(liquidity_ma_length).min()
        symbol_data.insert(0, 'bar_num', range(0, 0 + len(symbol_data)))
        # print(symbol_data)
        symbol_data = symbol_data.drop(columns=['Turnover', 'Unadjusted Close', 'Dividend', 'C*V'])
        symbol_data = symbol_data.rename(columns={ROC: 'ROC', ATR: 'ATR', 'Open': 'open', 'High': 'high',
                                                  'Low': 'low', 'Close': 'close', 'Volume': 'volume',
                                                  'Liquidity': 'liquidity'})
        signal.append(symbol_data.iloc[-1])
        current_trade_date = symbol_data.iloc[-1, symbol_data.columns.get_loc('Date')]
        previous_trade_date = symbol_data.iloc[-2, symbol_data.columns.get_loc('Date')]
    except Exception as e:
        print(symbol, "+", e)

signal_data = pd.DataFrame(signal)
signal_data = signal_data.reset_index(drop=True)
This code does the following:
Creates a list of equities to loop through
Runs a for loop over each equity
Gets OHLC data for the current symbol
Adds indicators to it
Appends the last line of the OHLC frame to a list; this last line is the latest data and represents the current signals
Takes this list of 'last lines' to create a dataframe of all of the equities and the indicator values associated with them
I use a try/except block to handle the event that I pass a dataframe with, say, 15 rows of historical data. This short dataframe would raise an error, as the indicators cannot be calculated.
In the current iteration, if one indicator cannot be added (due to insufficient dataframe length or otherwise), we skip the entire symbol. I would like to change this to add all indicators that can be calculated and fill the rest with NaN, null, etc.
My leading idea is to create a function for each indicator and to provide a check on the dataframe length before calculating the values, or even a try/except block for each indicator. My concern is that this doesn't seem very scalable; the full system will calculate hundreds of indicators on each symbol.
Am I on the right track? Am I missing anything? This seems inefficient.
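One generic way to pursue the "function per indicator" idea without writing a separate try/except per indicator is a single wrapper that falls back to NaN whenever the frame is too short or the calculation raises. A minimal sketch (safe_indicator and its min_length parameter are hypothetical names, not part of pandas or pandas-ta):

import numpy as np
import pandas as pd

def safe_indicator(df, func, min_length, **kwargs):
    # Hypothetical helper: return the indicator Series, or a NaN Series
    # aligned to df's index when the frame is too short or the
    # calculation fails for any reason.
    if len(df) < min_length:
        return pd.Series(np.nan, index=df.index)
    try:
        return func(**kwargs)
    except Exception:
        return pd.Series(np.nan, index=df.index)

# Possible usage with the indicators from the loop above (assumed, not tested):
# roc_short = safe_indicator(symbol_data, symbol_data.ta.roc, ROC_Length, length=ROC_Length)
# atr = safe_indicator(symbol_data, symbol_data.ta.atr, ATR_length, length=ATR_length)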
Let's say I have a Pandas Dataframe of the price and stock history of a product at 10 different points in time:
df = pd.DataFrame(index=[np.arange(10)])
df['price'] = 10,10,11,15,20,10,10,11,15,20
df['stock'] = 30,20,13,8,4,30,20,13,8,4
df
price stock
0 10 30
1 10 20
2 11 13
3 15 8
4 20 4
5 10 30
6 10 20
7 11 13
8 15 8
9 20 4
How do I perform operations between specific rows that meet certain criteria?
In my example, row 0 and row 5 meet the criterion "stock over 25" and row 4 and row 9 meet the criterion "stock under 5".
I would like to calculate:
df['price'][4] - df['price'][0] and
df['price'][9] - df['price'][5]
but not
df['price'][9] - df['price'][0] or
df['price'][4] - df['price'][5].
In other words, I would like to calculate the price change between the most recent event where stock was under 5 vs the most recent event where stock was over 25; over the whole series.
Of course, I would like to do this over larger datasets where picking them manually is not good.
First, set up the data frame and add some calculations:
import pandas as pd
import numpy as np
df = pd.DataFrame(index=[np.arange(10)])
df['price'] = 10,10,11,15,20,10,10,11,15,20
df['stock'] = 30,20,13,8,4,30,20,13,8,4
df['stock_under_5'] = df['stock'] < 5
df['stock_over_25'] = df['stock'] > 25
df['cum_stock_under_5'] = df['stock_under_5'].cumsum()
df['change_stock_under_5'] = df['cum_stock_under_5'].diff()
df['change_stock_under_5'].iloc[0] = df['stock_under_5'].iloc[0]*1
df['next_row_change_stock_under_5'] = df['change_stock_under_5'].shift(-1)
df['cum_stock_over_25'] = df['stock_over_25'].cumsum()
df['change_stock_over_25'] = df['cum_stock_over_25'].diff()
df['change_stock_over_25'].iloc[0] = df['stock_over_25'].iloc[0]*1
df['next_row_change_stock_over_25'] = df['change_stock_over_25'].shift(-1)
df['row'] = np.arange(df.shape[0])
df['next_row'] = df['row'].shift(-1)
df['next_row_price'] = df['price'].shift(-1)
Next we find all windows where either the stock went over 25 or below 5 by grouping over the cumulative marker of those events.
changes = (
    df.groupby(['cum_stock_under_5', 'cum_stock_over_25'])
    .agg({'row': 'first', 'next_row': 'last', 'change_stock_under_5': 'max', 'change_stock_over_25': 'max',
          'next_row_change_stock_under_5': 'max', 'next_row_change_stock_over_25': 'max',
          'price': 'first', 'next_row_price': 'last'})
    .assign(price_change=lambda x: x['next_row_price'] - x['price'])
    .reset_index(drop=True)
)
For each window we find what happened at the beginning of the window: if change_stock_under_5 = 1, the window started with the stock going under 5; if change_stock_over_25 = 1, it started with the stock going over 25.
The same goes for the end of the window, using the columns next_row_change_stock_under_5 and next_row_change_stock_over_25.
Now, we can readily extract the stock price change in rows where the stock went from being over 25 to being under 5:
from_over_to_below = changes[(changes['change_stock_over_25']==1) & (changes['next_row_change_stock_under_5']==1)]
and the other way around:
from_below_to_over = changes[(changes['change_stock_under_5']==1) & (changes['next_row_change_stock_over_25']==1)]
You can for example calculate the average price change when the stock went from over 25 to below 5:
from_over_to_below.price_change.mean()
In order to give a better explanation, I will separate the approach into two different functions:
The first one handles the event detection; let's call it detect_event.
The second one calculates the price change between the current event and the previous one in the list generated by the first function. We will call it calculate_price_change.
Starting with the first function, it is key here to understand very well the goals we want to reach, or the constraints/conditions we want to satisfy.
I will leave two of the many potential options, given the various interpretations of the question:
A. The first follows my initial understanding of the question.
B. The second follows one of the interpretations one could take from Iyar Lin's comment (I can see more interpretations, but won't consider them in this answer, as the approach would be similar).
Within option A, we will create a function to detect where a stock is under 5 or over 25:
def detect_event(df):
    # Create a list of the indexes of the events where stock was under 5 or over 25
    events = []
    # Loop through the dataframe
    for i in range(len(df)):
        # If stock is under 5, add the index to the list
        if df['stock'][i] < 5:
            events.append(i)
        # If stock is over 25, add the index to the list
        elif df['stock'][i] > 25:
            events.append(i)
    # Return the list of indexes of the events where stock was under 5 or over 25
    return events
The comments make it self-explanatory, but, basically, this will return a list of indexes of the rows where stock is under 5 or over 25.
With OP's df this will return
events = detect_event(df)
[Out]:
[0, 4, 5, 9]
Within option B, assuming one wants to know the events where the stock went from under 5 to over 25 and vice versa, consecutively (there are more ways to interpret this), one can use the following function:
def detect_event(df):
    # Create a list of the indexes of the events that satisfy the conditions
    events = []
    for i, stock in enumerate(df['stock']):
        # If the index is 0, add the index of the first event to the list of events
        if i == 0:
            events.append(i)
        # If the index is not 0, check if the stock went from over 25 to under 5 or from under 5 to over 25
        else:
            # If the stock went from over 25 to under 5, add the index of the event to the list of events
            if stock < 5 and df['stock'][i-1] > 25:
                events.append(i)
            # If the stock went from under 5 to over 25, add the index of the event to the list of events
            elif stock > 25 and df['stock'][i-1] < 5:
                events.append(i)
    # Return the list of events
    return events
With OP's df this will return
events = detect_event(df)
[Out]:
[0, 5]
Note that 0 is the index of the first position, which we append by default.
As for the second function: once the conditions are well defined, meaning we know clearly what we want and have adapted the first function, detect_event, accordingly, we can detect the changes in the prices.
In order to detect the price change between the events that satisfy the conditions we defined previously, we will use a different function: calculate_price_change.
This function takes both the dataframe df and the list events generated by the first function, and returns a list with the price differences.
def calculate_price_change(df, events):
    # Create a list to store the price change between the most recent event where stock was under 5 vs the most recent event where stock was over 25
    price_change = []
    # Loop through the list of indexes of the events
    for i, event in enumerate(events):
        # If the index is 0, the price change is 0
        if i == 0:
            price_change.append(0)
        # If the index is not 0, calculate the price change between the current and past events
        else:
            price_change.append(df['price'][event] - df['price'][events[i-1]])
    return price_change
Now, if one calls this last function using df and the list created with the first function, detect_event, one gets the following:
price_change = calculate_price_change(df, events)
[Out]:
[0, 10, -10, 10]
Notes:
As it is, the question leaves room for multiple interpretations; that's why I initially flagged it as "Needs details or clarity". For the future, one might want to review How do I ask a good question? and its hyperlinks.
I understand that sometimes we won't be able to specify everything we want (as we might not even know it, for various reasons), so communication is key. Therefore, I appreciate Iyar Lin's time and contributions, as they helped improve this answer.
I'm training a binary classifier to predict whether a certain sequence of industrial log events ends up in an error or not.
For each error, I need to capture the events that happened in the hour before the error-event. I'm using a pandas DataFrame and converted the time with pd.to_datetime() so I ended up with a Year/Month/Day/Hour/Minute/Second column, which is not the index of the dataframe.
One thing I tried was pulling out the corresponding hours and minutes with the code below:
hours = data2.event_timestamp.apply(lambda x: x.hour)
minutes = data2.event_timestamp.apply(lambda x: x.minute)
I managed to loop over the dataset and capture a fixed number of events that happen before the error, disregarding time, with this code:
dataarray = []
for index, row in data2.iterrows():
    array = np.asarray(row)
    dataarray.append(array)

listwitheventswithnoerror = []
listwitheventswitherror = []

"""-----------------------------------------------------------------"""

for index, array in enumerate(dataarray):
    if index > 50:
        if array[1] == 0:  # 0 is for non-errors
            sample = dataarray[index-50:index]
            listwitheventswithnoerror.append(sample)

for index, array in enumerate(dataarray):
    if index > 50:
        if array[1] != 0:  # non-zero is for errors
            sample = dataarray[index-50:index]
            listwitheventswitherror.append(sample)
I can't seem to grasp how to change this code so that, instead of taking 50 events, it takes the events that happened in the hour before, based on the time column. Help would be much appreciated.
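Since the question hinges on slicing by time rather than by row count, one option is to build a boolean mask from the event_timestamp column for each error row. A minimal sketch, assuming the error flag lives in a column here called 'error' (the name is an assumption; in the arrays above it is position 1 of each row):

import pandas as pd

one_hour = pd.Timedelta(hours=1)
samples_with_error = []

for _, row in data2.iterrows():
    if row['error'] != 0:  # non-zero marks an error event (assumed column name)
        # all events in the hour before this error, excluding the error itself
        mask = ((data2['event_timestamp'] >= row['event_timestamp'] - one_hour)
                & (data2['event_timestamp'] < row['event_timestamp']))
        samples_with_error.append(data2.loc[mask])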
This is my first ever question on here, so please forgive me if I don't explain it clearly or overexplain. The task is to turn a for loop that contains two if statements into dataframe.apply instead of the loop. I thought the way to do it was to turn the if statements inside the for loop into a defined function and then call that function in the .apply line, but I can only get so far. I'm not even sure I'm tackling this the right way. I can provide the original for-loop code if necessary. Thanks in advance.
The goal is to import a CSV of stock prices, compare the prices in one column to a moving average (which needed to be created), and buy if the price is above the MA, sell if it is below. Then keep track of all buys/sells and determine overall wealth/return at the end. It worked as a for loop: for each x in prices, apply the two ifs and append prices to a list to determine ending wealth. I think I get to the point where I call the defined function in the .apply line, and it errors out. In my code below there may still be some unnecessary code lingering from the for-loop version, but it shouldn't interfere with the .apply attempt; it just makes for messy code until I figure this out.
df2 = pd.read_csv("MSFT.csv", index_col=0, parse_dates=True).sort_index(axis=0, ascending=True)  # could get yahoo to work but not quandl, so imported the csv file from class

buyPrice = 0
sellPrice = 0
maWealth = 1.0
cash = 1
stock = 0
sma = 200
ma = np.round(df2['AdjClose'].rolling(window=sma, center=False).mean(), 2)  # to create the moving average to compare to
n_days = len(df2['AdjClose'])
closePrices = df2['AdjClose']  # to only work with one column from original csv import

buy_data = []
sell_data = []
trade_price = []
wealth = []

def myiffunc(adjclose):
    if closePrices > ma and cash == 1:  # Buy if stock price > MA & if not bought yet
        buyPrice = closePrices[0 + 1]
        buy_data.append(buyPrice)
        trade_price.append(buyPrice)
        cash = 0
        stock = 1
    if closePrices < ma and stock == 1:  # Sell if stock price < MA and if you have a stock to sell
        sellPrice = closePrices[0 + 1]
        sell_data.append(sellPrice)
        trade_price.append(sellPrice)
        cash = 1
        stock = 0
        wealth.append(1 * (sellPrice / buyPrice))

closePrices.apply(myiffunc)
Checking the docs for apply, it seems like you need to use the axis=1 version to process one row at a time, so each row carries the two columns you need: the moving average and the closing price.
Something like this:
df2 = ...
df2['MovingAverage'] = ...

have_shares = False

def my_func(row):
    global have_shares
    if not have_shares and row['AdjClose'] > row['MovingAverage']:
        # buy shares
        have_shares = True
    elif have_shares and row['AdjClose'] < row['MovingAverage']:
        # sell shares
        have_shares = False

df2.apply(my_func, axis=1)
However, it's worth pointing out that you can do the comparisons using numpy/pandas as well, just storing the results in another column:
df2['BuySignal'] = (df2.AdjClose > df2.MovingAverage)
df2['SellSignal'] = (df2.AdjClose < df2.MovingAverage)
Then you could .apply() a function that made use of the Buy/Sell signal columns.
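For instance, a minimal sketch of that last suggestion, keeping the position state in a global as in the snippet above (trade_prices and act_on_signals are hypothetical names; df2, AdjClose, and MovingAverage are assumed to exist as before):

have_shares = False
trade_prices = []

def act_on_signals(row):
    # Buy on a BuySignal while flat, sell on a SellSignal while
    # holding; record each trade price as we go.
    global have_shares
    if not have_shares and row['BuySignal']:
        trade_prices.append(('buy', row['AdjClose']))
        have_shares = True
    elif have_shares and row['SellSignal']:
        trade_prices.append(('sell', row['AdjClose']))
        have_shares = False

df2.apply(act_on_signals, axis=1)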