Beginner Python Stock Screener - python

I am trying to build a simple stock screener. The screener should download the volume and average volume and give me all stocks that satisfy the condition volume > avg. volume.
The problem is that something goes wrong and I don't know how to fix the code. It's an overall problem in the code, and I hope I can get some help. Often I get a message that the data is empty, but I think something is wrong with the tables and the conditions...
The first for loop gives me this, which looks right, so I think there is no mistake in that part of the code:
Ticker volume avg. volume
aapl 31 20
ayx 20 32
nflx 25 28
The second for loop is supposed to check my condition (volume > avg. volume), but it gives me the following output:
ticker1 ... Stock
0 NaN ... ticker1
1 NaN ... Volume_1
2 NaN ... Average_Volume
[3 rows x 4 columns]
Process finished with exit code 0
Normally only Apple fulfills the condition, and the output should look like this:
Stock volume avg. volume
aapl 31 20
This is my code:
from yahoo_fin.stock_info import get_analysts_info, get_stats, get_live_price, get_quote_table
import pandas as pd
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

tickers = ['aapl', 'ayx', 'nflx']
exportList = pd.DataFrame(columns=['ticker1', 'Volume_1', 'Average_Volume'])

for ticker in tickers:
    df = get_stats(ticker)
    df['ticker'] = ticker
    df = df.pivot(index='ticker', columns='Attribute', values='Value')
    df['Volume_1'] = get_quote_table(ticker)['Volume']
    df['Average_Volume'] = get_quote_table(ticker)['Avg. Volume']
    df = df[['Volume_1', 'Average_Volume']]
    df = df.reset_index()
    df.columns = ('ticker', 'Volume_1', 'Average_Volume')

rs_df = pd.DataFrame(columns=['ticker1', 'Volume_1', 'Average_Volume'])
for stock in rs_df:
    Volume_1 = df["Volume_1"]
    Average_Volume = df["Average_Volume"]
    if float(Volume_1) < float(Average_Volume):
        exportList = exportList.append({'Stock': stock, "Volume_1": Volume_1, "Average_Volume": Average_Volume},
                                       ignore_index=True)

print('\n', exportList)
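No answer is included for this thread, so what follows is only a sketch under stated assumptions. A few things in the posted code are worth checking: `df` is overwritten on every pass of the first loop, so only the last ticker (nflx) survives it; `for stock in rs_df:` iterates over the column names of an empty DataFrame, which is why the Stock column ends up holding 'ticker1', 'Volume_1' and 'Average_Volume'; and the test uses `<` even though the goal is volume > avg. volume. (`DataFrame.append` was also removed in pandas 2.0.) The sketch below collects one row per ticker, builds the DataFrame once and filters with a vectorized comparison. The function name `screen_by_volume` and the row layout are assumptions for illustration, and it uses the sample numbers from the question instead of live yahoo_fin calls.

```python
import pandas as pd


def screen_by_volume(rows):
    """Keep only the stocks whose volume is above their average volume.

    `rows` is assumed to be a list of dicts with the keys 'Stock',
    'Volume' and 'Avg. Volume' (a hypothetical shape chosen for this sketch).
    """
    # Build the DataFrame once, after all rows have been collected.
    df = pd.DataFrame(rows, columns=['Stock', 'Volume', 'Avg. Volume'])
    # Make both columns numeric so the comparison is numeric, not textual.
    df[['Volume', 'Avg. Volume']] = df[['Volume', 'Avg. Volume']].astype(float)
    # Vectorized condition: volume > avg. volume.
    return df[df['Volume'] > df['Avg. Volume']].reset_index(drop=True)


# Sample numbers from the question; in the real script each dict could be
# filled from get_quote_table(ticker)['Volume'] and ['Avg. Volume'].
rows = [
    {'Stock': 'aapl', 'Volume': 31, 'Avg. Volume': 20},
    {'Stock': 'ayx', 'Volume': 20, 'Avg. Volume': 32},
    {'Stock': 'nflx', 'Volume': 25, 'Avg. Volume': 28},
]
print(screen_by_volume(rows))
```

With yahoo_fin, each dict would be built inside the ticker loop from `get_quote_table(ticker)`, reading the 'Volume' and 'Avg. Volume' keys the question already uses; check what types those values arrive as before relying on the `astype(float)` step.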

Related

Python-pandas : a strange filtering/remove error

I've been using pandas for a while, but I am having a really strange issue with simple filtering.
OS: Mac OS
IDE: VSCode
Pandas version: 1.4.2
I am fetching the latest data from crypto exchanges (via the ccxt API) and appending it to a DataFrame.
import time
import traceback

import ccxt
import pandas as pd

limit = 28
timeframe = '1h'
futures_exchange = ccxt.kucoinfutures({
    'apiKey': MY_API_KEY,
    'secret': MY_API_SECRET,
    'enableRateLimit': True,
    'password': MY_KUCOIN_PASS_PHRASE,
})
all_future_list = ['BTC/USDT:USDT', 'BTC/USD:BTC', 'ETH/USDT:USDT', 'BCH/USDT:USDT', 'BSV/USDT:USDT']
attempts = 0
while attempts <= 20:
    alldf = pd.DataFrame()
    try:
        for i in all_future_list:
            df = futures_exchange.fetchOHLCV(i, limit=limit, timeframe=timeframe)
            df = pd.DataFrame(df, columns=[['timestamp', 'open', 'high', 'low', 'close', 'volume']])
            df['ticker'] = i
            df['timestamp'] = df['timestamp'].astype('datetime64[ms]')
            df = df[['timestamp', 'close', 'volume', 'ticker']]
            df = df.tail(1)
            alldf = pd.concat([alldf, df], ignore_index=True)
            time.sleep(0.1)
    except:
        print(traceback.format_exc())
        print(' ___ Network Error, restart fetching data _____ ')
        attempts += 1
        time.sleep(8)
        continue
    break
So far so good. The DataFrame looks like this:
timestamp close volume ticker
0 2022-09-17 10:00:00 19836.00 789336.0 BTC/USDT:USDT
1 2022-09-17 10:00:00 1411.2 982840 ETH/USDT:USDT
2 2022-09-17 10:00:00 120.95 55564.0 BCH/USDT:USDT
However, if I then want to remove/filter a ticker from the DataFrame:
new_df = alldf[alldf['ticker']!= 'BTC/USDT:USDT']
print(new_df)
Erroneous output:
timestamp close volume ticker
0 NaT NaN NaN NaN
1 NaT NaN NaN ETH/USDT:USDT
2 NaT NaN NaN BCH/USDT:USDT
3 NaT NaN NaN BSV/USDT:USDT
This should be a simple filter, but I don't understand why the 'timestamp', 'close' and 'volume' columns are NaN.
If I write the DataFrame to a CSV file and read it back, I get what I want:
alldf.to_csv('alldf.csv')
alldf = pd.read_csv('alldf.csv',index_col=0)
new_df = alldf[alldf['ticker']!= 'BTC/USDT:USDT']
print(new_df)
Output:
timestamp close volume ticker
1 2022-09-17 10:00:00 1411.2 982840 ETH/USDT:USDT
0 2022-09-17 10:00:00 120.95 55564.0 BCH/USDT:USDT
2 2022-09-17 11:00:00 51.95 3628.0 BSV/USDT:USDT
However, I don't want to write the DataFrame to a CSV file and read it back just to filter out some tickers.
Can anyone help me? I'm not sure what's going on here.
I guess this happens because this piece of code
df = df[['timestamp', 'close', 'volume', 'ticker']]
df = df.tail(1)
subsets the existing DataFrame without actually copying the data. The target DataFrame therefore holds a reference to the original data, and when that data is changed, it breaks.
Try this:
df = df[['timestamp', 'close', 'volume', 'ticker']]
df = df.tail(1).copy()
By the way, as far as I know, it's more efficient to keep new records in a dictionary and convert them to a DataFrame at the end, rather than merging them one by one in a loop.
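The record-collecting suggestion can be sketched like this, using made-up candles instead of a live ccxt call (the `fake_candles` dict and its open/high/low placeholder values are assumptions; only the timestamp, close and volume numbers come from the question). The sketch also passes a flat list to `columns=`. In the question the list is nested (`columns=[[...]]`), and pandas turns a list of lists into a one-level MultiIndex; with such columns, `alldf['ticker']` returns a DataFrame instead of a Series, and indexing with a boolean DataFrame behaves like `DataFrame.where`, which leaves NaN in every other column. That is worth ruling out alongside the `copy()` suggestion; the first print below shows the column types.

```python
import pandas as pd

COLUMNS = ['timestamp', 'open', 'high', 'low', 'close', 'volume']

# Stand-in for futures_exchange.fetchOHLCV(...): one candle per ticker.
# open/high/low are placeholder numbers, not market data.
fake_candles = {
    'BTC/USDT:USDT': [[1663408800000, 1.0, 1.0, 1.0, 19836.00, 789336.0]],
    'ETH/USDT:USDT': [[1663408800000, 1.0, 1.0, 1.0, 1411.2, 982840.0]],
    'BCH/USDT:USDT': [[1663408800000, 1.0, 1.0, 1.0, 120.95, 55564.0]],
}

# Pitfall: columns=[COLUMNS] (a list inside a list) builds a one-level
# MultiIndex, so selecting one label returns a DataFrame, not a Series.
nested = pd.DataFrame(fake_candles['BTC/USDT:USDT'], columns=[COLUMNS])
nested['ticker'] = 'BTC/USDT:USDT'
print(type(nested.columns).__name__, type(nested['ticker']).__name__)

# Collect one plain dict per ticker, then build the DataFrame once.
records = []
for ticker, candles in fake_candles.items():
    last = dict(zip(COLUMNS, candles[-1]))  # latest candle only, like tail(1)
    records.append({
        'timestamp': pd.to_datetime(last['timestamp'], unit='ms'),
        'close': last['close'],
        'volume': last['volume'],
        'ticker': ticker,
    })
alldf = pd.DataFrame(records)

# With flat columns, alldf['ticker'] is a Series and the filter drops rows.
new_df = alldf[alldf['ticker'] != 'BTC/USDT:USDT']
print(new_df)
```

Building the frame once also avoids calling `pd.concat` on every pass of the loop.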

Change a MultiIndex in pandas

I would like to change my index from how it appears in Image 1 to how it appears in Image 2.
Here is the code to get Image 1:
from datetime import datetime

import pandas as pd
import pandas_datareader.data as web

stocks = pd.DataFrame()
Tickers = ['AAPL', 'TSLA', 'IBM', 'MSFT']
for tick in Tickers:
    df = web.DataReader(tick, "av-daily", start=datetime(2015, 1, 1), end=datetime.today(), api_key='')
    df['Stock'] = tick
    stocks = pd.concat([stocks, df])  # DataFrame.append was removed in pandas 2.0
stocks.index = pd.to_datetime(stocks.index)
stocks = stocks.set_index('Stock', append=True)
vol = stocks[['volume']]
weekly = vol.groupby([pd.Grouper(level=0, freq='W', label='left'), pd.Grouper(level='Stock')]).sum()
weekly.index.rename(['Date', 'Stock'], inplace=True)
weekly.unstack()
Image 1
Image 2
After you get the stocks DataFrame, do this:
weekly = stocks[["volume"]].unstack().resample("W").sum()
weekly.index = pd.MultiIndex.from_tuples([(dt.year, dt.week) for dt in weekly.index])
>>> weekly
volume
Stock AAPL IBM MSFT TSLA
2015 1 53204626 5525341 27913852 4764443
2 282868187 24440360 158596624 22622034
3 304226647 23272056 157088136 30799137
4 198737041 31230797 137352632 16215501
5 465842684 32927307 437786778 15720217
... ... ... ...
2021 23 327048055 22042806 107035149 105306562
24 456667151 23177438 128993727 107296122
25 354155878 17129373 117966870 153549954
26 321360130 29077036 104384023 103666230
27 213093382 12153414 54825591 42076410
To remove the top 'volume' level from the columns:
weekly.droplevel(level=0, axis=1)
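The same reshaping can be tried on a small, made-up frame (the dates, tickers and volumes below are assumptions, not the question's data). This sketch swaps `dt.week` for `DatetimeIndex.isocalendar()`, since the `week`/`weekofyear` accessors were deprecated and later removed from `DatetimeIndex` in pandas 2.0. ISO years can differ from calendar years around New Year, so `iso['year']` and `dt.year` may disagree for the first and last weeks of a year. It selects `stocks[['volume']]` (a one-column DataFrame), which is what keeps the 'volume' column level that `droplevel` then removes.

```python
import pandas as pd

# Made-up data: two tickers, 14 days (Mon 2015-01-05 to Sun 2015-01-18),
# volume 1 per day, indexed by (date, Stock) like the question's `stocks`.
dates = pd.date_range('2015-01-05', periods=14, freq='D')
idx = pd.MultiIndex.from_product([dates, ['AAPL', 'IBM']], names=[None, 'Stock'])
stocks = pd.DataFrame({'volume': 1}, index=idx)

# Move 'Stock' into the columns, then sum per week (weeks ending on Sunday).
weekly = stocks[['volume']].unstack().resample('W').sum()

# Replace each week-ending date with its (ISO year, ISO week) pair.
iso = weekly.index.isocalendar()
weekly.index = pd.MultiIndex.from_arrays([iso['year'], iso['week']], names=['Year', 'Week'])

# Drop the top 'volume' level so the columns are just the tickers.
weekly = weekly.droplevel(level=0, axis=1)
print(weekly)
```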

How to handle the string "NAN" (a stock ticker) in a DataFrame

I have found a ticker "NAN" (NAN: NUVEEN NEW YORK QUALITY MUNICIPAL INCOME FUND), but when I try to insert it into my DataFrame it gets converted to None. I even tried inserting it as str(ticker), with ticker set to 'NAN'. I am lost; how do I do that? Every stock ticker works except 'NAN'.
Exact code:
# from previous code execution
ticker = 'NAN'
cusip = '67066X107'
cusipdf['ticker'] = np.where(( cusipdf['cusip'] == cusip ), str(ticker), cusipdf['ticker'] )
I did the following and was able to replace the value REPLACE with NAN. See below:
import pandas as pd
import numpy as np

c = ['ticker', 'cusip', 'value']
d = [['AMZN', '51123X145', 123.4567],
     ['REPLACE', '62343X145', 223.1237],
     ['AAPL', '56789X225', 312.5767],
     ['GOOG', '42154X638', 331.8793]]
df = pd.DataFrame(data=d, columns=c)
print(df)
t = 'NAN'
df['ticker'] = np.where((df['cusip'] == '62343X145'), str(t), df['ticker'])
print(df)
ticker cusip value
0 AMZN 51123X145 123.4567
1 REPLACE 62343X145 223.1237
2 AAPL 56789X225 312.5767
3 GOOG 42154X638 331.8793
ticker cusip value
0 AMZN 51123X145 123.4567
1 NAN 62343X145 223.1237
2 AAPL 56789X225 312.5767
3 GOOG 42154X638 331.8793
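An alternative to `np.where` is boolean-mask assignment with `.loc`, which writes only to the matching rows. A minimal sketch on the same sample frame:

```python
import pandas as pd

# The sample frame from the answer above.
df = pd.DataFrame(
    [['AMZN', '51123X145', 123.4567],
     ['REPLACE', '62343X145', 223.1237],
     ['AAPL', '56789X225', 312.5767],
     ['GOOG', '42154X638', 331.8793]],
    columns=['ticker', 'cusip', 'value'],
)

# Boolean-mask assignment: only rows whose cusip matches are changed.
df.loc[df['cusip'] == '62343X145', 'ticker'] = 'NAN'

print(df)
# Count of missing tickers: the text 'NAN' is an ordinary string, not a missing value.
print(df['ticker'].isna().sum())
```

Either way, 'NAN' stays an ordinary string; `isna` only flags real missing values such as NaN or None, not the text 'NAN'.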

Python - NaN return (pandas - resample function)

I'm doing a finance study based on the YouTube video linked below, and I would like to understand why I get NaN instead of the expected values. What do I need to change in this script to get the expected result?
YouTube case: https://www.youtube.com/watch?v=UpbpvP0m5d8
import investpy as env
import numpy as np
import pandas as pd
lt = ['ABEV3','CEAB3','ENBR3','FLRY3','IRBR3','ITSA4','JHSF3','STBP3']
prices = pd.DataFrame()
for i in lt:
    df = env.get_stock_historical_data(stock=i, from_date='01/01/2020', to_date='29/05/2020', country='brazil')
    df['Ativo'] = i
    prices = pd.concat([prices, df], sort=True)
pivoted = prices.pivot(columns='Ativo', values='Close')
e_r = pivoted.resample('Y').last().pct_change().mean()
e_r
Return:
Ativo
ABEV3 NaN
CEAB3 NaN
ENBR3 NaN
FLRY3 NaN
IRBR3 NaN
ITSA4 NaN
JHSF3 NaN
STBP3 NaN
dtype: float64
You need to change 'from_date' so that the data span more than one year.
With your current dates, resample('Y').last() returns a single row (2020), and .pct_change() on a single row returns NaN because there is no previous row to compare against.
When I changed from_date to '01/01/2018'
import investpy as env
import numpy as np
import pandas as pd
lt = ['ABEV3','CEAB3','ENBR3','FLRY3','IRBR3','ITSA4','JHSF3','STBP3']
prices = pd.DataFrame()
for i in lt:
    df = env.get_stock_historical_data(stock=i, from_date='01/01/2018', to_date='29/05/2020', country='brazil')
    df['Ativo'] = i
    prices = pd.concat([prices, df], sort=True)
pivoted = prices.pivot(columns='Ativo', values='Close')
e_r = pivoted.resample('Y').last().pct_change().mean()
e_r
I get the following output:
Ativo
ABEV3 -0.043025
CEAB3 -0.464669
ENBR3 0.180655
FLRY3 0.191976
IRBR3 -0.175084
ITSA4 -0.035767
JHSF3 1.283291
STBP3 0.223627
dtype: float64
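The effect can be reproduced without investpy on made-up prices (the ticker 'XYZ3' and its prices are hypothetical). This sketch groups by calendar year instead of calling `resample('Y')`, because the yearly alias was renamed in recent pandas ('Y' is deprecated in favour of 'YE' since pandas 2.2); for data without gaps in the years, the last-close-per-year table is the same.

```python
import pandas as pd

# Made-up closing prices for one hypothetical ticker, 'XYZ3'.
close = pd.DataFrame(
    {'XYZ3': [10.0, 11.0, 12.0, 15.0, 13.5]},
    index=pd.to_datetime(['2018-03-01', '2018-12-28', '2019-12-30',
                          '2020-01-02', '2020-05-29']),
)


def mean_yearly_return(prices):
    # Last close of each calendar year (same idea as resample('Y').last()).
    yearly = prices.groupby(prices.index.year).last()
    # Change from the previous year; the first year has nothing to compare to.
    return yearly.pct_change().mean()


print(mean_yearly_return(close.loc['2020']))  # one year only -> NaN
print(mean_yearly_return(close))              # 2018 to 2020 -> a real average
```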

Updating pandas DataFrame by key

I have a dataframe of historical stock trades. The frame has columns like ['ticker', 'date', 'cusip', 'profit', 'security_type']. Initially:
trades['cusip'] = np.nan
trades['security_type'] = np.nan
I have historical config files that I can load into frames that have columns like ['ticker', 'cusip', 'date', 'name', 'security_type', 'primary_exchange'].
I would like to UPDATE the trades frame with the cusip and security_type from config, but only where the ticker and date match.
I thought I could do something like:
pd.merge(trades, config, on=['ticker', 'date'], how='left')
But that doesn't update the columns, it just adds the config columns to trades.
The following works, but I think there has to be a better way. If not, I will probably do it outside of pandas.
for date in trades['date'].unique():
    config = get_config_file_as_df(date)
    ## config['date'] == date
    for ticker in trades['ticker'][trades['date'] == date]:
        mask = (trades['ticker'] == ticker) & (trades['date'] == date)
        # .loc writes into trades directly; chained indexing such as
        # trades['cusip'][mask] = ... may silently modify a copy instead
        trades.loc[mask, 'cusip'] = config['cusip'][config['ticker'] == ticker].values[0]
        trades.loc[mask, 'security_type'] = config['security_type'][config['ticker'] == ticker].values[0]
Suppose you have this setup:
import pandas as pd
import numpy as np
import datetime as DT
nan = np.nan
trades = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date' : pd.date_range('1/1/2000', periods = 4),
                       'cusip' : [nan, nan, 100, nan]
                       })
trades = trades.set_index(['ticker', 'date'])
print(trades)
# cusip
# ticker date
# IBM 2000-01-01 NaN
# MSFT 2000-01-02 NaN
# GOOG 2000-01-03 100 # <-- We do not want to overwrite this
# AAPL 2000-01-04 NaN
config = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date' : pd.date_range('1/1/2000', periods = 4),
                       'cusip' : [1,2,3,nan]})
config = config.set_index(['ticker', 'date'])
# Let's permute the index to show `DataFrame.update` correctly matches rows based on the index, not on the order of the rows.
new_index = sorted(config.index)
config = config.reindex(new_index)
print(config)
# cusip
# ticker date
# AAPL 2000-01-04 NaN
# GOOG 2000-01-03 3
# IBM 2000-01-01 1
# MSFT 2000-01-02 2
Then you can update NaN values in trades with values from config using the DataFrame.update method. Note that DataFrame.update matches rows based on indices (which is why set_index was called above).
trades.update(config, join = 'left', overwrite = False)
print(trades)
# cusip
# ticker date
# IBM 2000-01-01 1
# MSFT 2000-01-02 2
# GOOG 2000-01-03 100 # If overwrite = True, then 100 is overwritten by 3.
# AAPL 2000-01-04 NaN
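Since the question first tried pd.merge, here is a merge-based sketch that fills only the missing cusip and security_type values, similar to `update(..., overwrite=False)`, and handles both columns in one pass. The data and the 'CS'/'ETF' labels are made up for illustration. One caveat: if config contains more than one row for the same (ticker, date), the left merge duplicates trades rows, so `config[keys + cols].drop_duplicates(keys)` may be needed first.

```python
import numpy as np
import pandas as pd

keys = ['ticker', 'date']
cols = ['cusip', 'security_type']

# Hypothetical data; 'CS' and 'ETF' are made-up security types.
trades = pd.DataFrame({
    'ticker': ['IBM', 'MSFT', 'GOOG', 'AAPL'],
    'date': pd.date_range('2000-01-01', periods=4),
    'profit': [1.0, -2.0, 3.0, 0.5],
    'cusip': [np.nan, np.nan, 100.0, np.nan],
    'security_type': [None, None, 'ETF', None],
})
config = pd.DataFrame({
    'ticker': ['AAPL', 'GOOG', 'IBM', 'MSFT'],
    'date': pd.to_datetime(['2000-01-04', '2000-01-03', '2000-01-01', '2000-01-02']),
    'cusip': [np.nan, 3.0, 1.0, 2.0],
    'security_type': [None, 'CS', 'CS', 'CS'],
})

# Left merge on the key columns; config's copies get a '_cfg' suffix.
merged = trades.merge(config[keys + cols], on=keys, how='left', suffixes=('', '_cfg'))

# Fill only the values that are missing in trades (like overwrite=False).
for col in cols:
    merged[col] = merged[col].fillna(merged[col + '_cfg'])

result = merged.drop(columns=[col + '_cfg' for col in cols])
print(result)
```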
