How to handle the string "NAN" (stock ticker) in a dataframe - python

I have found a ticker "NAN" (NAN: NUVEEN NEW YORK QUALITY MUNICIPAL INCOME FUND), but when I try to insert it into my data frame it gets converted to None. I even tried to insert it as str(ticker) with the value of ticker being 'NAN'. I am lost - how do I do this? Every stock ticker works except the ticker 'NAN'.
Exact code:
# from previous code execution
ticker = 'NAN'
cusip = '67066X107'
cusipdf['ticker'] = np.where(cusipdf['cusip'] == cusip, str(ticker), cusipdf['ticker'])

I did the following in a standalone example and I am able to replace the value REPLACE with NAN. See below.
import pandas as pd
import numpy as np

c = ['ticker', 'cusip', 'value']
d = [['AMZN', '51123X145', 123.4567],
     ['REPLACE', '62343X145', 223.1237],
     ['AAPL', '56789X225', 312.5767],
     ['GOOG', '42154X638', 331.8793]]
df = pd.DataFrame(data=d, columns=c)
print(df)
t = 'NAN'
df['ticker'] = np.where(df['cusip'] == '62343X145', str(t), df['ticker'])
print(df)
ticker cusip value
0 AMZN 51123X145 123.4567
1 REPLACE 62343X145 223.1237
2 AAPL 56789X225 312.5767
3 GOOG 42154X638 331.8793
ticker cusip value
0 AMZN 51123X145 123.4567
1 NAN 62343X145 223.1237
2 AAPL 56789X225 312.5767
3 GOOG 42154X638 331.8793
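Since the np.where demo above works, the string is most likely being lost earlier, when the frame is first built. If the data arrives via read_csv, its NA handling is one place a ticker can silently turn into a missing value; a minimal sketch (the CSV content here is made up for illustration):

```python
import pandas as pd
from io import StringIO

csv_data = "ticker,cusip\nNAN,67066X107\nAAPL,56789X225\n"

# If "NAN" is listed as an NA marker (explicitly, or by whatever
# produced the data), the ticker becomes a missing value on read.
df_na = pd.read_csv(StringIO(csv_data), na_values=['NAN'])
print(df_na['ticker'].isna().tolist())   # [True, False]

# keep_default_na=False (with no extra na_values) preserves "NAN"
# as an ordinary string.
df_kept = pd.read_csv(StringIO(csv_data), keep_default_na=False)
print(df_kept['ticker'].tolist())        # ['NAN', 'AAPL']
```

Checking the NA options of whichever reader builds cusipdf is therefore a good first step.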

Beginner Python Stock Screener

I'm trying to build a simple stock screener. The screener should download the volume and average volume and give me all stocks that satisfy the condition volume > avg. volume.
The problem is that something goes wrong and I don't know how to fix the code. It's an overall problem in the code and I hope I can get some help. Often I get the message that the data is empty, but I think there is something wrong with the tables and the conditions.
The first for loop gives me this, and this is good; I think there is no mistake in that part:
Ticker volume avg. volume
aapl 31 20
ayx 20 32
nflx 25 28
The second for loop is supposed to check my condition (volume > avg. volume), but it gives me the following output:
ticker1 ... Stock
0 NaN ... ticker1
1 NaN ... Volume_1
2 NaN ... Average_Volume
[3 rows x 4 columns]
Process finished with exit code 0
Normally only Apple fulfills the condition, and the output should look like:
Stock volume avg. volume
aapl 31 20
That's my code:
from yahoo_fin.stock_info import get_analysts_info, get_stats, get_live_price, get_quote_table
import pandas as pd
import ssl

ssl._create_default_https_context = ssl._create_unverified_context
tickers = ['aapl', 'ayx', 'nflx']
exportList = pd.DataFrame(columns=['ticker1', 'Volume_1', 'Average_Volume'])
for ticker in tickers:
    df = get_stats(ticker)
    df['ticker'] = ticker
    df = df.pivot(index='ticker', columns='Attribute', values='Value')
    df['Volume_1'] = get_quote_table(ticker)['Volume']
    df['Average_Volume'] = get_quote_table(ticker)['Avg. Volume']
    df = df[['Volume_1', 'Average_Volume']]
    df = df.reset_index()
    df.columns = ('ticker', 'Volume_1', 'Average_Volume')
rs_df = pd.DataFrame(columns=['ticker1', 'Volume_1', 'Average_Volume'])
for stock in rs_df:
    Volume_1 = df["Volume_1"]
    Average_Volume = df["Average_Volume"]
    if float(Volume_1) < float(Average_Volume):
        exportList = exportList.append({'Stock': stock, "Volume_1": Volume_1, "Average_Volume": Average_Volume},
                                       ignore_index=True)
print('\n', exportList)
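Leaving the yahoo_fin download aside, the filtering step itself does not need a second loop at all (note `for stock in rs_df` iterates over the empty frame's column names, which is where the odd output comes from). A sketch of the condition with boolean indexing, using the sample numbers from the question:

```python
import pandas as pd

# Sample data shaped like the first loop's output in the question.
df = pd.DataFrame({'ticker': ['aapl', 'ayx', 'nflx'],
                   'Volume_1': [31, 20, 25],
                   'Average_Volume': [20, 32, 28]})

# Boolean indexing keeps only the rows where volume exceeds
# the average volume - the screener condition.
exportList = df[df['Volume_1'] > df['Average_Volume']]
print(exportList)
#   ticker  Volume_1  Average_Volume
# 0   aapl        31              20
```

With real data, the same one-liner would replace the whole `for stock in rs_df` block.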

How to organize pandas so the first column is just dates which correspond with 4 countries with percentage data in their cells?

The data here is web-scraped from a website. The initial data in the variable 'r' has three columns: 'Country', 'Date', '% vs 2019 (Daily)'. From these three columns I was able to extract only the rows I wanted, with dates from "2021-01-01" to today. What I am trying to do (and have spent hours on) is organize the data so that there is one column with just the dates corresponding to the percentage data, then 4 other columns named for the countries Denmark, Finland, Norway and Sweden, with the cells under each country populated with the percentage data. I have tried using [], loc, iloc and various other combinations to filter the pandas dataframes to make this happen, but to no avail.
Here is the code I have so far:
import requests
import pandas as pd
import json
import math
import datetime
from jinja2 import Template, Environment
from datetime import date
r = requests.get('https://docs.google.com/spreadsheets/d/1GJ6CvZ_mgtjdrUyo3h2dU3YvWOahbYvPHpGLgovyhtI/gviz/tq?usp=sharing&tqx=reqId%3A0output=jspn')
data = r.content
data = json.loads(data.decode('utf-8').split("(", 1)[1].rsplit(")", 1)[0])
d = [[i['c'][0]['v'], i['c'][2]['f'], (i['c'][5]['v'])*100 ] for i in data['table']['rows']]
df = pd.DataFrame(d, columns=['Country', 'Date', '% vs 2019 (Daily)'])
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
# EXTRACTING BETWEEN TWO DATES
df['Date'] = pd.to_datetime(df['Date'])
startdate = datetime.datetime.strptime('2021-01-01', "%Y-%m-%d").date()
enddate = datetime.datetime.strptime('2021-02-02', "%Y-%m-%d").date()
pd.Timestamp('today').floor('D')
df = df[(df['Date'] > pd.Timestamp(startdate).floor('D')) & (df['Date'] <= pd.Timestamp(enddate).floor('D'))]
Den = df.loc[df['Country'] == 'Denmark']
Fin = df.loc[df['Country'] == 'Finland']
Swe = df.loc[df['Country'] == 'Sweden']
Nor = df.loc[df['Country'] == 'Norway']
Den_data = Den.loc[: , "% vs 2019 (Daily)"]
Den_date = Den.loc[: , "Date"]
Nor_data = Nor.loc[: , "% vs 2019 (Daily)"]
Swe_data = Swe.loc[: , "% vs 2019 (Daily)"]
Fin_data = Fin.loc[: , "% vs 2019 (Daily)"]
Fin_date = Fin.loc[: , "Date"]
Den_data = Den.loc[: , "% vs 2019 (Daily)"]
df2 = pd.DataFrame()
df2['DEN_DATE'] = Den_date
df2['DENMARK'] = Den_data
df3 = pd.DataFrame()
df3['FIN_DATE'] = Fin_date
df3['FINLAND'] = Fin_data
Want it to be organized like this so I can eventually export it to excel:
Date | Denmark | Finland| Norway | Sweden
2020-01-01 | 1234 | 4321 | 5432 | 6574
...
Any help is greatly appreciated.
Thank you
Use isin to filter only the countries you are interested in. Then use pivot to return a reshaped dataframe organized by given index and column values; in this case the index is the Date column, and the column values are the countries from the previous selection.
...
...
pd.Timestamp('today').floor('D')
df = df[(df['Date'] > pd.Timestamp(startdate).floor('D')) & (df['Date'] <= pd.Timestamp(enddate).floor('D'))]
countries_list=['Denmark', 'Finland', 'Norway', 'Sweden']
countries_selected = df[df.Country.isin(countries_list)]
result = countries_selected.pivot(index="Date", columns="Country")
print(result)
Output from result
% vs 2019 (Daily)
Country Denmark Finland Norway Sweden
Date
2021-01-02 -65.261383 -75.416667 -39.164087 -65.853659
2021-01-03 -60.405405 -77.408056 -31.763620 -66.385669
2021-01-04 -69.371429 -75.598086 -34.002770 -70.704467
2021-01-05 -73.690932 -79.251701 -33.815689 -73.450509
2021-01-06 -76.257310 -80.445151 -43.454791 -80.805484
...
...
2021-01-30 -83.931624 -75.545852 -63.751763 -76.260163
2021-01-31 -80.654339 -74.468085 -55.565777 -65.451895
2021-02-01 -81.494253 -72.419106 -49.610390 -75.473322
2021-02-02 -81.741233 -73.898305 -46.164021 -78.215223
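Since the end goal is an Excel export, the MultiIndex that pivot puts on the columns can be flattened first. A sketch with made-up numbers (droplevel and to_excel are standard pandas calls):

```python
import pandas as pd

# Toy frame shaped like countries_selected in the answer above.
df = pd.DataFrame({
    'Country': ['Denmark', 'Finland', 'Denmark', 'Finland'],
    'Date': pd.to_datetime(['2021-01-02', '2021-01-02',
                            '2021-01-03', '2021-01-03']),
    '% vs 2019 (Daily)': [-65.26, -75.42, -60.41, -77.41],
})

result = df.pivot(index='Date', columns='Country')
# pivot without `values` yields MultiIndex columns such as
# ('% vs 2019 (Daily)', 'Denmark'); drop the first level so the
# header row is just the country names.
result.columns = result.columns.droplevel(0)
print(result)
# result.to_excel('countries.xlsx')  # ready for Excel
```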

Python - NaN return (pandas - resample function)

I'm doing a finance study based on the YouTube link below, and I would like to understand why I get NaN returns instead of the expected calculation. What do I need to change in this script to reach the expected values?
YouTube case: https://www.youtube.com/watch?v=UpbpvP0m5d8
import investpy as env
import numpy as np
import pandas as pd

lt = ['ABEV3','CEAB3','ENBR3','FLRY3','IRBR3','ITSA4','JHSF3','STBP3']
prices = pd.DataFrame()
for i in lt:
    df = env.get_stock_historical_data(stock=i, from_date='01/01/2020', to_date='29/05/2020', country='brazil')
    df['Ativo'] = i
    prices = pd.concat([prices, df], sort=True)
pivoted = prices.pivot(columns='Ativo', values='Close')
e_r = pivoted.resample('Y').last().pct_change().mean()
e_r
Return:
Ativo
ABEV3 NaN
CEAB3 NaN
ENBR3 NaN
FLRY3 NaN
IRBR3 NaN
ITSA4 NaN
JHSF3 NaN
STBP3 NaN
dtype: float64
You need to change 'from_date' so that you have more than one year of data.
Your current script returns one row, and .pct_change() on one row of data returns NaN, because there is no previous row to compare against.
When I changed from_date to '01/01/2018':
import investpy as env
import numpy as np
import pandas as pd

lt = ['ABEV3','CEAB3','ENBR3','FLRY3','IRBR3','ITSA4','JHSF3','STBP3']
prices = pd.DataFrame()
for i in lt:
    df = env.get_stock_historical_data(stock=i, from_date='01/01/2018', to_date='29/05/2020', country='brazil')
    df['Ativo'] = i
    prices = pd.concat([prices, df], sort=True)
pivoted = prices.pivot(columns='Ativo', values='Close')
e_r = pivoted.resample('Y').last().pct_change().mean()
e_r
I get the following output:
Ativo
ABEV3 -0.043025
CEAB3 -0.464669
ENBR3 0.180655
FLRY3 0.191976
IRBR3 -0.175084
ITSA4 -0.035767
JHSF3 1.283291
STBP3 0.223627
dtype: float64
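The NaN-on-a-single-row behaviour is easy to reproduce in isolation, without any market data; a minimal sketch:

```python
import pandas as pd

# One annual observation: pct_change has no previous row to
# compare against, so the only row is NaN.
one_year = pd.Series([10.0], index=pd.to_datetime(['2020-12-31']))
print(one_year.pct_change())

# Two observations: the second row gets a real return
# (12/10 - 1 = 0.2); only the first row is NaN.
two_years = pd.Series([10.0, 12.0],
                      index=pd.to_datetime(['2019-12-31', '2020-12-31']))
print(two_years.pct_change())
```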

Rolling mean returns over DataFrame

I want to add columns to the following DataFrame containing, for each stock, 5-year (60-month) rolling returns. The following code is used to obtain the financial data over the period 1995 to 2010.
import quandl
import pandas as pd

quandl.ApiConfig.api_key = 'Enter Key'
stocks = ['MSFT', 'AAPL', 'WMT', 'GE', 'KO']
stockdata = quandl.get_table('WIKI/PRICES', ticker=stocks, paginate=True,
                             qopts={'columns': ['date', 'ticker', 'adj_close']},
                             date={'gte': '1995-1-1', 'lte': '2010-12-31'})
# Setting date as index with columns of tickers and adjusted closing price
df = stockdata.pivot(index='date', columns='ticker')
df.index = pd.to_datetime(df.index)
df.resample('1M').mean()
df = df.pct_change()
df.head()
Out[1]:
rets
ticker AAPL BA F GE JNJ KO
date
1995-01-03 NaN NaN NaN NaN NaN NaN
1995-01-04 0.026055 -0.002567 0.026911 0.000000 0.006972 -0.019369
1995-01-05 -0.012697 0.002573 -0.008735 0.002549 -0.002369 -0.004938
1995-01-06 0.080247 0.018824 0.000000 -0.004889 -0.006758 0.000000
1995-01-09 -0.019048 0.000000 0.017624 -0.009827 -0.011585 -0.014887
df.tail()
Out[2]:
rets
ticker AAPL BA F GE JNJ KO
date
2010-12-27 0.003337 -0.004765 0.005364 0.008315 -0.005141 -0.007777
2010-12-28 0.002433 0.001699 -0.008299 0.007147 0.001938 0.004457
2010-12-29 -0.000553 0.002929 0.000598 -0.002729 0.001289 0.001377
2010-12-30 -0.005011 -0.000615 -0.002987 -0.004379 -0.003058 0.000764
2010-12-31 -0.003399 0.003846 0.005992 0.005498 -0.001453 0.004122
Any assistance of how to do this would be awesome!
The problem is the multi-level index on the columns. We can start by selecting the 'rets' level of the columns, and after that the rolling mean works:
means = df['rets'].rolling(60).mean()
means.tail()
The error you are receiving is due to passing the entire dataframe into the rolling function, since your frame uses a multi-index on its columns. You can't pass a multi-index frame straight to a rolling function. You'll probably have to create a for loop and compute the values individually per ticker.
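The rolling-mean call itself can be sanity-checked on a toy returns frame (made-up numbers, and a 3-period window instead of 60 so the non-NaN rows are visible):

```python
import pandas as pd

rets = pd.DataFrame({'AAPL': [0.01, 0.02, 0.03, 0.04],
                     'MSFT': [0.00, 0.01, -0.01, 0.02]},
                    index=pd.to_datetime(['2000-01-31', '2000-02-29',
                                          '2000-03-31', '2000-04-30']))

# Rolling mean over a 3-period window, computed per column; the
# first two rows are NaN because the window is not yet full.
means = rets.rolling(3).mean()
print(means)
```

With 60-month windows on real data, the first 59 rows per ticker are NaN for the same reason.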

Updating pandas DataFrame by key

I have a dataframe of historical stock trades. The frame has columns like ['ticker', 'date', 'cusip', 'profit', 'security_type']. Initially:
trades['cusip'] = np.nan
trades['security_type'] = np.nan
I have historical config files that I can load into frames that have columns like ['ticker', 'cusip', 'date', 'name', 'security_type', 'primary_exchange'].
I would like to UPDATE the trades frame with the cusip and security_type from config, but only where the ticker and date match.
I thought I could do something like:
pd.merge(trades, config, on=['ticker', 'date'], how='left')
But that doesn't update the columns, it just adds the config columns to trades.
The following works, but I think there has to be a better way. If not, I will probably do it outside of pandas.
for date in trades['date'].unique():
    config = get_config_file_as_df(date)
    ## config['date'] == date
    for ticker in trades['ticker'][trades['date'] == date]:
        trades['cusip'][(trades['ticker'] == ticker)
                        & (trades['date'] == date)] \
            = config['cusip'][config['ticker'] == ticker].values[0]
        trades['security_type'][(trades['ticker'] == ticker)
                                & (trades['date'] == date)] \
            = config['security_type'][config['ticker'] == ticker].values[0]
Suppose you have this setup:
import pandas as pd
import numpy as np

nan = np.nan
trades = pd.DataFrame({'ticker': ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date': pd.date_range('1/1/2000', periods=4),
                       'cusip': [nan, nan, 100, nan]})
trades = trades.set_index(['ticker', 'date'])
print(trades)
# cusip
# ticker date
# IBM 2000-01-01 NaN
# MSFT 2000-01-02 NaN
# GOOG 2000-01-03 100 # <-- We do not want to overwrite this
# AAPL 2000-01-04 NaN
config = pd.DataFrame({'ticker': ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date': pd.date_range('1/1/2000', periods=4),
                       'cusip': [1, 2, 3, nan]})
config = config.set_index(['ticker', 'date'])

# Permute the index to show that `DataFrame.update` matches rows
# based on the index, not on the order of the rows.
new_index = sorted(config.index)
config = config.reindex(new_index)
print(config)
# cusip
# ticker date
# AAPL 2000-01-04 NaN
# GOOG 2000-01-03 3
# IBM 2000-01-01 1
# MSFT 2000-01-02 2
Then you can update NaN values in trades with values from config using the DataFrame.update method. Note that DataFrame.update matches rows based on indices (which is why set_index was called above).
trades.update(config, join = 'left', overwrite = False)
print(trades)
# cusip
# ticker date
# IBM 2000-01-01 1
# MSFT 2000-01-02 2
# GOOG 2000-01-03 100 # If overwrite = True, then 100 is overwritten by 3.
# AAPL 2000-01-04 NaN
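As an alternative sketch, combine_first also aligns on the index and fills only the missing values, which gives the same result here; unlike update it returns a new frame instead of mutating trades in place:

```python
import pandas as pd
import numpy as np

nan = np.nan
trades = pd.DataFrame({'ticker': ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date': pd.date_range('1/1/2000', periods=4),
                       'cusip': [nan, nan, 100, nan]}).set_index(['ticker', 'date'])
config = pd.DataFrame({'ticker': ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date': pd.date_range('1/1/2000', periods=4),
                       'cusip': [1, 2, 3, nan]}).set_index(['ticker', 'date'])

# combine_first fills NaN in trades from config, aligned on the
# (ticker, date) index; GOOG's existing 100 is left untouched.
filled = trades.combine_first(config)
print(filled)
```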
