Here is my early attempts in using Python. I am getting stock data from Yahoo but I can see that the ticker, date column headers are lower than the high low open close.
I am definitely missing something. What is it?
import pandas as pd
import numpy as np
import datetime
import pandas_datareader as pdr
py.init_notebook_mode(connected=True)
# we download the stock prices for each ticker and then we do a mapping between data and name of the ticker
def get(tickers, startdate, enddate):
def data(ticker):
return (pdr.get_data_yahoo(ticker, start=startdate, end=enddate))
datas = map (data, tickers)
return(pd.concat(datas, keys=tickers, names=['ticker', 'date']))
# Define the stocks to download. We'll download of Apple, Microsoft and the S&P500 index.
tickers = ['AAPL','IBM']
# We would like all available data from 01/01/2000 until 31/12/2018.
start_date = datetime.datetime(2016, 1, 1)
end_date = datetime.datetime(2019, 12, 31)
all_data = get(tickers, start_date, end_date)
Screenshot
This dataframe uses a hierarchical index. ticker and date aren't columns, but are both part of the index. This means the rows are grouped firstly by ticker and then by date.
For more information on hierarchical indexes check out the Pandas docs
Related
I'm trying to retrieve the historical data for the stock AAPL, but the below code has me specify between certain dates. How can I make it so that it automatically pulls the data for the past 5 years until current date instead?
import time
import datetime
import pandas as pd
ticker = 'AAPL'
period1 = int(time.mktime(datetime.datetime(2022, 1, 1, 23, 59).timetuple()))
period2 = int(time.mktime(datetime.datetime(2022, 2, 28, 23, 59).timetuple()))
interval = '1d' # 1wk, 1m
query_string = f'https://query1.finance.yahoo.com/v7/finance/download/{ticker}?period1={period1}&period2={period2}&interval={interval}&events=history&includeAdjustedClose=true'
df = pd.read_csv(query_string)
print(df)
df.to_csv('AAPL.csv')
You can use yfinance to retrieve the data,
install using pip:
pip install yfinance
Use this code to retrieve the past 5 years of historical data of 'AAPL',
import yfinance as yf
df = yf.download('AAPL', period='5y')
You can also use yf.Ticker to do it:
ticker = yf.Ticker('AAPL')
df = ticker.history(period="5y")
As of Dec 30, 2021 ----
I did figure this out. New to Python, so this is not optimized or the most elegant, but it does return just the day that ends any market week. Because of how I specify the start and end dates, the dataframe always starts with a Monday, and ends with the last market day. Basically, it looks at each date in consecutive rows, assigns the difference in days to a new column. Each row will return a -1, except for the last day of the market week. The very last row of all data also returns a "NaN", which I had to deal with. I then delete just the rows with -1 in the Days column. Thank you for the feedback....here is the rest of the code that does the work, which follows the code I previously supplied.
data['Date'] = pd.to_datetime(data['Date'])
data['Days_from_date'] = pd.DatetimeIndex(data['Date']).day
data['Days'] = data['Days_from_date'] - data['Days_from_date'].shift(-1)
data=data.replace(np.nan,-1)
data["Days"]=data["Days"].astype(int)
data = data[data['Days'] != -1]
data = data[data['Days'].ne(-1)]
This is the previous post.....
I currently have python code that gets historical market info for various ETF tickers over a set period of time (currently 50 days). I run this code through Power BI. When I get done testing, I will be getting approximately 40 weeks of data for 60-ish ETFs. Current code is copied below.
I would like to minimize the amount of data returned to just the CLOSE data generated on the last market day of each week. Usually this is Friday, but sometimes it can be Thursday, and I think possibly Wednesday.
I am coming up short on how to identify each week's last market day and then pulling in just that data into a dataframe. Alternatively, I suppose it could pull in all data, and then drop the unwanted rows - I'm not sure which would be a better solution, and, in any case, I can't figure out how to do it!
Current code here, using Python 3.10 and Visual Studio Code for testing....
import yfinance as yf
import pandas as pd
from datetime import date
from datetime import timedelta
enddate = date.today()
startdate = enddate - timedelta(days=50)
tickerStrings = ['VUG', 'VV', 'MGC', 'MGK', 'VOO', 'VXF', 'VBK', 'VB']
df_list = list()
for ticker in tickerStrings:
data = yf.download(ticker, start=startdate, group_by="Ticker")
data['Ticker'] = ticker
df_list.append(data)
data = pd.concat(df_list)
data = data.drop(columns=["Adj Close", "High", "Low", "Open", "Volume"])
data = data.reset_index()
As I commented, I think you can get the desired data by getting the week number from the date data, grouping it and getting the last row. For example, if Friday is a holiday, I considered Thursday to be the last data of the week number.
import yfinance as yf
import pandas as pd
from datetime import date
from datetime import timedelta
enddate = date.today()
startdate = enddate - timedelta(days=50)
tickerStrings = ['VUG', 'VV', 'MGC', 'MGK', 'VOO', 'VXF', 'VBK', 'VB']
df = pd.DataFrame()
for ticker in tickerStrings:
data = yf.download(ticker, start=startdate, progress=False)['Close'].to_frame('Close')
data['Ticker'] = ticker
df = df.append(data)
df.reset_index(inplace=True)
df['week_no'] = df['Date'].dt.isocalendar().week
data = df.groupby(['Ticker','week_no']).tail(1).sort_values('Date', ascending=True)
I want to calculate the number of business days between two dates and create a new pandas dataframe column with those days. I also have a holiday calendar and I want to exclude dates in the holiday calendar while making my calculation.
I looked around and I saw the numpy busday_count function as a useful tool for it. The function counts the number of business days between two dates and also allows you to include a holiday calendar.
I also looked around and I saw the holidays package which gives me the holiday dates for different countries. I thought it will be great to add this holiday calendar into the numpy function.
Then I proceeded as follows;
import pandas as pd
import numpy as np
import holidays
from datetime import datetime, timedelta, date
df = {'start' : ['2019-01-02', '2019-02-01'],
'end' : ['2020-01-04', '2020-03-05']
}
df = pd.DataFrame(df)
holidays_country = holidays.CountryHoliday('UnitedKingdom')
start_date = [d.date for d in df['start']]
end_date = [d.date for d in df['end']]
holidays_numpy = holidays_country[start_date:end_date]
df['business_days'] = np.busday_count(begindates = start_date,
enddates = end_date,
holidays=holidays_numpy)
When I run this code, it throws this error TypeError: Cannot convert type '<class 'list'>' to date
When I looked further, I noticed that the start_date and end_date are lists and that might be whey the error was occuring.
I then changed the holidays_numpy variable to holidays_numpy = holidays_country['2019-01-01':'2019-12-31'] and it worked.
However, since my dates are different for each row in my dataframe, is there a way to set the two arguments in my holiday_numpy variable to select corresponding values (just like the zip function) each from start_date and end_date?
I'm also open to alternative ways of solving this problem.
This should work:
import pandas as pd
import numpy as np
import holidays
df = {'start' : ['2019-01-02', '2019-02-01'],
'end' : ['2020-01-04', '2020-03-05']}
df = pd.DataFrame(df)
holidays_country = holidays.CountryHoliday('UK')
def f(x):
return np.busday_count(x[0],x[1],holidays=holidays_country[x[0]:x[1]])
df['business_days'] = df[['start','end']].apply(f,axis=1)
df.head()
Things used to work great until several days ago. Now when I run the following:
from pandas_datareader import data
symbol = 'AMZN'
data_source='google'
start_date = '2010-01-01'
end_date = '2016-01-01'
df = data.DataReader(symbol, data_source, start_date, end_date)
I get only the most recent data of ONE year shown below, as if the start_data and end_data did not seem to matter. Change them to different dates yielded the same results below. Does anyone know why?
Results:
df.head()
Open High Low Close Volume
Date
2016-09-21 129.13 130.00 128.39 129.94 14068336
2016-09-22 130.50 130.73 129.56 130.08 15538307
2016-09-23 127.56 128.60 127.30 127.96 28326266
2016-09-26 127.37 128.16 126.80 127.31 15064940
2016-09-27 127.61 129.01 127.43 128.69 15637111
Use fix-yahoo-finance and then use yahoo rather than Google as your source. It looks like Google has been locking down a lot of its data lately.
First you'll need to install fix-yahoo-finance. Just use pip install fix-yahoo-finance.
Then use get_data_yahoo:
from pandas_datareader import data
import fix_yahoo_finance as yf
yf.pdr_override()
symbol = 'AMZN'
data_source='google'
start_date = '2010-01-01'
end_date = '2016-01-01'
df = data.get_data_yahoo(symbol, start_date, end_date)
df.head()
Open High Low Close Adj Close Volume
Date
2010-01-04 136.25000 136.61000 133.14000 133.89999 133.89999 7599900
2010-01-05 133.42999 135.48000 131.81000 134.69000 134.69000 8851900
2010-01-06 134.60001 134.73000 131.64999 132.25000 132.25000 7178800
2010-01-07 132.01000 132.32001 128.80000 130.00000 130.00000 11030200
2010-01-08 130.56000 133.67999 129.03000 133.52000 133.52000 9830500
Just replace google with yahoo. There are problem with google source right now. https://github.com/pydata/pandas-datareader/issues/394
from pandas_datareader import data
symbol = 'AMZN'
data_source='yahoo'
start_date = '2010-01-01'
end_date = '2016-01-01'
df = data.DataReader(symbol, data_source, start_date, end_date)
Yahoo working as of January 01, 2020:
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2018, 2, 8)
df = web.DataReader('TSLA', 'yahoo', start, end)
print(df.head())
I want to download adjusted close prices and their corresponding dates from yahoo, but I can't seem to figure out how to get dates from pandas DataFrame.
I was reading an answer to this question
from pandas.io.data import DataReader
from datetime import datetime
goog = DataReader("GOOG", "yahoo", datetime(2000,1,1), datetime(2012,1,1))
print goog["Adj Close"]
and this part works fine; however, I need to extract the dates that correspond to the prices.
For example:
adj_close = np.array(goog["Adj Close"])
Gives me a 1-D array of adjusted closing prices, I am looking for 1-D array of dates, such that:
date = # what do I do?
adj_close[0] corresponds to date[0]
When I do:
>>> goog.keys()
Index([Open, High, Low, Close, Volume, Adj Close], dtype=object)
I see that none of the keys will give me anything similar to the date, but I think there has to be a way to create an array of dates. What am I missing?
You can get it by goog.index which is stored as a DateTimeIndex.
To get a series of date, you can do
goog.reset_index()['Date']
import numpy as np
import pandas as pd
from pandas.io.data import DataReader
symbols_list = ['GOOG','IBM']
d = {}
for ticker in symbols_list:
d[ticker] = DataReader(ticker, "yahoo", '2014-01-01')
pan = pd.Panel(d)
df_adj_close = pan.minor_xs('Adj Close') #also use 'Open','High','Low','Adj Close' and 'Volume'
#the dates of the adjusted closes from the dataframe containing adjusted closes on multiple stocks
df_adj_close.index
# create a dataframe that has data on only one stock symbol
df_individual = pan.get('GOOG')
# the dates from the dataframe of just 'GOOG' data
df_individual.index