I want to download adjusted close prices and their corresponding dates from Yahoo, but I can't figure out how to get the dates from a pandas DataFrame.
I was reading an answer to this question:
from pandas_datareader.data import DataReader  # pandas.io.data was moved to the pandas-datareader package
from datetime import datetime
goog = DataReader("GOOG", "yahoo", datetime(2000,1,1), datetime(2012,1,1))
print(goog["Adj Close"])
This part works fine; however, I need to extract the dates that correspond to the prices.
For example:
adj_close = np.array(goog["Adj Close"])
gives me a 1-D array of adjusted closing prices. I am looking for a 1-D array of dates such that:
date = # what do I do?
adj_close[0] corresponds to date[0]
When I do:
>>> goog.keys()
Index([Open, High, Low, Close, Volume, Adj Close], dtype=object)
I see that none of the keys will give me anything similar to the date, but I think there has to be a way to create an array of dates. What am I missing?
You can get the dates from goog.index, which is stored as a DatetimeIndex.
To get the dates as a Series, you can do:
goog.reset_index()['Date']
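A minimal sketch of both options, assuming a DataFrame shaped like the Yahoo download (a DatetimeIndex named "Date" plus an "Adj Close" column). The prices below are made up so the example runs offline; because the dates and prices come from the same rows, position i in one array always matches position i in the other.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the downloaded data: DatetimeIndex named "Date".
goog = pd.DataFrame(
    {"Adj Close": [10.0, 10.5, 10.2]},
    index=pd.DatetimeIndex(["2012-01-03", "2012-01-04", "2012-01-05"], name="Date"),
)

adj_close = goog["Adj Close"].to_numpy()  # 1-D array of prices
dates = goog.index.to_numpy()             # 1-D datetime64[ns] array, same row order

# Alternatively, turn the index into an ordinary "Date" column.
date_series = goog.reset_index()["Date"]

# adj_close[i] corresponds to dates[i]
for d, p in zip(dates, adj_close):
    print(np.datetime_as_string(d, unit="D"), p)
```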
import numpy as np
import pandas as pd
from pandas_datareader.data import DataReader  # formerly pandas.io.data
symbols_list = ['GOOG','IBM']
d = {}
for ticker in symbols_list:
    d[ticker] = DataReader(ticker, "yahoo", '2014-01-01')
pan = pd.Panel(d)  # note: pd.Panel was removed in pandas 1.0
df_adj_close = pan.minor_xs('Adj Close')  # also use 'Open', 'High', 'Low', 'Close' and 'Volume'
# the dates of the adjusted closes from the dataframe containing adjusted closes on multiple stocks
df_adj_close.index
# create a dataframe that has data on only one stock symbol
df_individual = pan.get('GOOG')
# the dates from the dataframe of just 'GOOG' data
df_individual.index
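Since pd.Panel no longer exists in current pandas, one possible substitute (a sketch, not the answer's original method) is to concatenate the per-ticker frames side by side and select the "Adj Close" field with xs. The two tiny frames below are made-up stand-ins for the DataReader results.

```python
import pandas as pd

def adj_close_table(frames):
    """Combine {ticker: OHLCV DataFrame} into one DataFrame of 'Adj Close' columns."""
    combined = pd.concat(frames, axis=1)  # columns become (ticker, field)
    return combined.xs("Adj Close", axis=1, level=1)

# Made-up stand-ins for the per-ticker downloads.
idx = pd.DatetimeIndex(["2014-01-02", "2014-01-03"], name="Date")
frames = {
    "GOOG": pd.DataFrame({"Close": [1.0, 2.0], "Adj Close": [1.0, 2.0]}, index=idx),
    "IBM": pd.DataFrame({"Close": [3.0, 4.0], "Adj Close": [2.9, 3.9]}, index=idx),
}

df_adj_close = adj_close_table(frames)
print(df_adj_close.index)    # dates of the combined adjusted closes
print(frames["GOOG"].index)  # dates for GOOG alone
```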
Related
I need to draw a bar chart using the data set below. The x-axis needs to be Territory, the y-axis needs to be the average production in each territory, and the hue needs to be the month from the Date column.
I'm not exactly sure what you are asking. When you say average production, do you want to calculate the average production for each Territory, or just display the value in the Production column? If you clarify, I can update my answer. In my example I just display the data from the Production column. First export your spreadsheet to CSV. Then you can do the following:
import calendar
import datetime
import pandas as pd
import plotly.express as px
df = pd.read_csv("data.csv")
def get_month_names(dataframe: pd.DataFrame):
    # Get all the dates
    dates = dataframe["Date"].to_list()
    # Convert each date string to a datetime object
    # I assume month/day/year; if it is day/month/year, swap %m and %d
    date_objs = [datetime.datetime.strptime(date, "%m/%d/%Y %H:%M:%S") for date in dates]
    # Get all the months
    months = [date.month for date in date_objs]
    # Get the names of the months
    month_names = [calendar.month_name[month] for month in months]
    return month_names
fig = px.bar(x=df["Territory"],
             y=df["Production"],
             color=get_month_names(df))
fig.show()
This produces the following chart (image not available).
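If the goal is the average production per territory (as the question asks), a sketch of the aggregation step with plain pandas is below. It keeps the answer's assumption that dates look like month/day/year H:M:S; the sample rows are made up.

```python
import pandas as pd

def average_production(df):
    """Mean Production per (Territory, month name)."""
    # Same assumption as the answer: dates are formatted month/day/year H:M:S.
    dates = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M:%S")
    return (
        df.assign(Month=dates.dt.month_name())
          .groupby(["Territory", "Month"], as_index=False)["Production"]
          .mean()
    )

# Made-up sample rows.
df = pd.DataFrame({
    "Territory": ["A", "A", "A", "B"],
    "Date": ["01/05/2021 00:00:00", "01/20/2021 00:00:00",
             "02/03/2021 00:00:00", "01/07/2021 00:00:00"],
    "Production": [10, 20, 30, 40],
})
avg = average_production(df)
print(avg)
```

The result can then be plotted with one bar per territory and one colour per month, e.g. `px.bar(avg, x="Territory", y="Production", color="Month", barmode="group")`.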
As of Dec 30, 2021:
I did figure this out. I'm new to Python, so this is neither optimized nor elegant, but it does return just the last day of each market week. Because of how I specify the start and end dates, the dataframe always starts on a Monday and ends on the last market day. Basically, the code compares the day of the month in each row with the one in the next row and stores the difference in a new Days column. Each row gets -1, except for the last day of the market week. The very last row of all the data gets NaN, which I had to deal with. I then delete just the rows with -1 in the Days column. Thank you for the feedback. Here is the rest of the code that does the work; it follows the code I previously supplied.
data['Date'] = pd.to_datetime(data['Date'])
data['Days_from_date'] = data['Date'].dt.day  # day of the month
# Difference between this row's day of the month and the next row's
data['Days'] = data['Days_from_date'] - data['Days_from_date'].shift(-1)
# The very last row has no next row (NaN); mark it -1 as well
data['Days'] = data['Days'].fillna(-1).astype(int)
# Keep only the rows whose difference is not -1
data = data[data['Days'].ne(-1)]
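One caveat with comparing day-of-month values: a holiday in the middle of the week gives a difference of -2, and a month that ends mid-week gives something like 31 - 1 = 30, so rows that are not week-ending survive the filter. A sketch of the same "compare each row with the next one" idea that compares ISO weeks instead, assuming `data` has the `Date` and `Ticker` columns produced by the code in the question (the sample dates are made up; Dec 24, 2021 is left out as a market holiday):

```python
import pandas as pd

def last_trading_day_rows(data):
    """Keep each ticker's last row in every ISO week (needs 'Date' and 'Ticker')."""
    data = data.sort_values(["Ticker", "Date"])
    iso = data["Date"].dt.isocalendar()
    week_key = (iso["year"] * 100 + iso["week"]).astype("int64")  # e.g. 202152
    next_week_key = week_key.groupby(data["Ticker"]).shift(-1)      # NaN at each ticker's end
    # A row ends its week when the next row of the same ticker is in another
    # week, or when there is no next row at all (NaN compares as "not equal").
    return data[week_key.ne(next_week_key)]

dates = pd.to_datetime([
    "2021-12-20", "2021-12-21", "2021-12-22", "2021-12-23",
    "2021-12-27", "2021-12-28", "2021-12-29", "2021-12-30", "2021-12-31",
    "2022-01-03",
])
data = pd.DataFrame({"Date": list(dates) * 2,
                     "Ticker": ["VUG"] * 10 + ["VV"] * 10,
                     "Close": range(20)})
out = last_trading_day_rows(data)
print(out)
```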
This is the previous post:
I currently have Python code that gets historical market data for various ETF tickers over a set period of time (currently 50 days). I run this code through Power BI. When I finish testing, I will be getting approximately 40 weeks of data for 60-ish ETFs. The current code is copied below.
I would like to reduce the data returned to just the Close price on the last market day of each week. Usually this is Friday, but sometimes it can be Thursday, and possibly even Wednesday.
I'm stuck on how to identify each week's last market day and then pull just that data into a dataframe. Alternatively, I suppose the code could pull in all the data and then drop the unwanted rows. I'm not sure which would be the better solution, and in any case, I can't figure out how to do it!
Here is the current code, using Python 3.10 and Visual Studio Code for testing:
import yfinance as yf
import pandas as pd
from datetime import date
from datetime import timedelta
enddate = date.today()
startdate = enddate - timedelta(days=50)
tickerStrings = ['VUG', 'VV', 'MGC', 'MGK', 'VOO', 'VXF', 'VBK', 'VB']
df_list = list()
for ticker in tickerStrings:
    data = yf.download(ticker, start=startdate, group_by="Ticker")
    data['Ticker'] = ticker
    df_list.append(data)
data = pd.concat(df_list)
data = data.drop(columns=["Adj Close", "High", "Low", "Open", "Volume"])
data = data.reset_index()
As I commented, I think you can get the desired data by deriving the week number from the Date column, grouping by ticker and week number, and taking the last row of each group. For example, if Friday is a holiday, Thursday becomes the last row of that week.
import yfinance as yf
import pandas as pd
from datetime import date
from datetime import timedelta
enddate = date.today()
startdate = enddate - timedelta(days=50)
tickerStrings = ['VUG', 'VV', 'MGC', 'MGK', 'VOO', 'VXF', 'VBK', 'VB']
frames = []
for ticker in tickerStrings:
    data = yf.download(ticker, start=startdate, progress=False)['Close'].to_frame('Close')
    data['Ticker'] = ticker
    frames.append(data)
# DataFrame.append was removed in pandas 2.0, so concatenate once instead
df = pd.concat(frames)
df.reset_index(inplace=True)
iso = df['Date'].dt.isocalendar()
df['year'] = iso.year  # include the ISO year so weeks from different years never merge
df['week_no'] = iso.week
data = df.groupby(['Ticker', 'year', 'week_no']).tail(1).sort_values('Date', ascending=True)
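`tail(1)` returns the last row in the frame's current order, which relies on each ticker's rows already being in date order. An order-independent variant (a sketch, assuming a unique row index such as the one `reset_index()` produces) picks the row with the latest Date in each group via `idxmax`; the shuffled sample rows below are made up.

```python
import pandas as pd

def last_row_per_week(df):
    """Row with the latest Date for each (Ticker, ISO year, ISO week), in any row order."""
    iso = df["Date"].dt.isocalendar()
    keys = [df["Ticker"], iso["year"], iso["week"]]
    last_idx = df.groupby(keys)["Date"].idxmax()  # row labels of each group's latest date
    return df.loc[last_idx].sort_values("Date")

# Made-up, deliberately shuffled rows; Friday Dec 24, 2021 is missing (holiday).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-12-23", "2021-12-20", "2021-12-31",
                            "2021-12-22", "2021-12-27"]),
    "Ticker": ["VUG"] * 5,
    "Close": [4, 1, 9, 3, 5],
})
print(last_row_per_week(df))
```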
I want to calculate the number of business days between two dates and create a new pandas dataframe column with those days. I also have a holiday calendar and I want to exclude dates in the holiday calendar while making my calculation.
I found NumPy's busday_count function, which counts the number of business days between two dates and also lets you pass a holiday calendar.
I also found the holidays package, which provides holiday dates for different countries, and I thought it would be great to pass its holiday calendar to the NumPy function.
Then I proceeded as follows:
import pandas as pd
import numpy as np
import holidays
from datetime import datetime, timedelta, date
df = {'start' : ['2019-01-02', '2019-02-01'],
'end' : ['2020-01-04', '2020-03-05']
}
df = pd.DataFrame(df)
holidays_country = holidays.CountryHoliday('UnitedKingdom')
start_date = [d.date for d in df['start']]
end_date = [d.date for d in df['end']]
holidays_numpy = holidays_country[start_date:end_date]
df['business_days'] = np.busday_count(begindates = start_date,
enddates = end_date,
holidays=holidays_numpy)
When I run this code, it throws this error: TypeError: Cannot convert type '<class 'list'>' to date
When I looked further, I noticed that start_date and end_date are lists, which might be why the error occurs.
I then changed the holidays_numpy variable to holidays_numpy = holidays_country['2019-01-01':'2019-12-31'] and it worked.
However, since my dates are different for each row in my dataframe, is there a way to set the two arguments in my holidays_numpy variable to the corresponding values from start_date and end_date, row by row (like the zip function)?
I'm also open to alternative ways of solving this problem.
This should work:
import pandas as pd
import numpy as np
import holidays
df = {'start' : ['2019-01-02', '2019-02-01'],
      'end' : ['2020-01-04', '2020-03-05']}
df = pd.DataFrame(df)
holidays_country = holidays.CountryHoliday('UK')
def f(x):
    # Count weekdays in [start, end), excluding UK holidays that fall in that range
    return np.busday_count(x['start'], x['end'], holidays=holidays_country[x['start']:x['end']])
df['business_days'] = df[['start','end']].apply(f, axis=1)
df.head()
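Since the asker was open to alternatives: np.busday_count only subtracts holidays that fall inside each [begin, end) range, so one holiday list covering all rows works, and the function can take whole arrays instead of being applied row by row. A sketch under that reading; `add_business_days` is a hypothetical helper name, and the holiday list is passed in so the example runs without the holidays package.

```python
import numpy as np
import pandas as pd

def add_business_days(df, holiday_dates):
    """Add a 'business_days' column: weekdays in [start, end) minus holidays."""
    begin = pd.to_datetime(df["start"]).to_numpy().astype("datetime64[D]")
    end = pd.to_datetime(df["end"]).to_numpy().astype("datetime64[D]")
    hol = np.array(holiday_dates, dtype="datetime64[D]")
    out = df.copy()
    # Vectorized over all rows; holidays outside a row's range are simply ignored.
    out["business_days"] = np.busday_count(begin, end, holidays=hol)
    return out

df = pd.DataFrame({"start": ["2019-01-02", "2019-02-01"],
                   "end": ["2020-01-04", "2020-03-05"]})
print(add_business_days(df, ["2019-12-25", "2019-12-26"]))
```

With the holidays package, a list for the whole span could be built with something like `list(holidays.country_holidays("GB", years=range(2019, 2021)).keys())`.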
Here are my early attempts at using Python. I am getting stock data from Yahoo, but I can see that the ticker and date column headers are on a lower row than the High, Low, Open and Close headers.
I am definitely missing something. What is it?
import pandas as pd
import numpy as np
import datetime
import pandas_datareader as pdr
import plotly.offline as py  # needed for init_notebook_mode below
py.init_notebook_mode(connected=True)
# we download the stock prices for each ticker and then we do a mapping between data and name of the ticker
def get(tickers, startdate, enddate):
    def data(ticker):
        return pdr.get_data_yahoo(ticker, start=startdate, end=enddate)
    datas = map(data, tickers)
    return pd.concat(datas, keys=tickers, names=['ticker', 'date'])
# Define the stocks to download. We'll download Apple and IBM.
tickers = ['AAPL','IBM']
# We would like all available data from 01/01/2016 until 31/12/2019.
start_date = datetime.datetime(2016, 1, 1)
end_date = datetime.datetime(2019, 12, 31)
all_data = get(tickers, start_date, end_date)
This dataframe uses a hierarchical index (MultiIndex): ticker and date aren't columns, but are both levels of the index. This means the rows are grouped first by ticker and then by date.
For more information on hierarchical indexes, see the pandas documentation on MultiIndex / advanced indexing.
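A small sketch of what that means in practice, using made-up prices in place of the Yahoo download: the same `pd.concat(..., keys=..., names=...)` call as in the question builds the two-level index, `.loc` selects one ticker, and `reset_index()` turns both levels into ordinary columns if a flat table is preferred.

```python
import pandas as pd

# Made-up stand-ins for the per-ticker downloads.
idx = pd.DatetimeIndex(["2016-01-04", "2016-01-05"])
frames = [pd.DataFrame({"Close": [1.0, 2.0]}, index=idx),
          pd.DataFrame({"Close": [3.0, 4.0]}, index=idx)]

all_data = pd.concat(frames, keys=["AAPL", "IBM"], names=["ticker", "date"])
print(all_data.index.names)    # the two index levels: ticker, date

aapl = all_data.loc["AAPL"]    # one ticker; its index is just the dates
flat = all_data.reset_index()  # 'ticker' and 'date' become regular columns
print(flat)
```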
I am using the yfinance library to import data for a given stock. See the code below:
import yfinance as yf
from datetime import datetime as dt
import pandas as pd
# Naming Constants
stock = "AAPL"
start_date = "2014-01-01"
end_date = "2018-01-01"
# Importing all the data into a dataFrame
stock_data = yf.download(stock, start=start_date, end=end_date)
When I call print(stock_data.index) I have the following:
DatetimeIndex(['2014-01-02', '2014-01-03', '2014-01-06', '2014-01-07', '2014-01-08', '2014-01-09', '2014-01-10', '2014-01-13', '2014-01-14', '2014-01-15',
...
'2017-12-15', '2017-12-18', '2017-12-19', '2017-12-20', '2017-12-21', '2017-12-22', '2017-12-26', '2017-12-27', '2017-12-28', '2017-12-29'],
dtype='datetime64[ns]', name='Date', length=1007, freq=None)
I wish to switch the frequency argument from None to daily since every Date refers to a trading day.
When I say stock_data.index.freq = 'B' I get the following error:
ValueError: Inferred frequency None from passed values does not conform to passed frequency B
If I instead use stock_data = stock_data.asfreq('B'), the frequency changes, but rows that were not there originally are added and filled with NaN values.
In other words, what is the offset alias used for trading days?
You can find the list of aliases in the pandas documentation here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
The error with stock_data.index.freq = 'B' indicates that your time series does not conform to business-day frequency: 'B' only skips weekends, while trading days also skip exchange holidays, so the inferred frequency is None.
With
stock_data = stock_data.asfreq('B')
you are re-indexing your time series to business-day frequency: the missing timestamps are added, and the missing stock data values are set to NaN. Now you need to decide how to replace them, so have a look here: pandas.DataFrame.asfreq. You could replace all NaNs with a fixed value like -999, but in general what you want to do with stock data is take the last valid value at a given point in time, which means forward filling the gaps:
stock_data = stock_data.asfreq('B', method='ffill')
It's always worth reading the docs.
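If the goal is a frequency that matches trading days exactly (rather than filling in holidays), one option is pandas' CustomBusinessDay with an explicit holiday list. This is a sketch under an assumption: the two holidays below are illustrative only, and real use needs the exchange's full holiday calendar (the USFederalHolidayCalendar that pandas ships does not match the NYSE schedule). The prices are made up.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# ASSUMPTION: hand-written holiday list for illustration, not a complete exchange calendar.
market_holidays = ["2017-12-25", "2018-01-01"]
trading_day = CustomBusinessDay(holidays=market_holidays)

# Synthetic "download": trading days only, with freq=None as in the question.
dates = pd.date_range("2017-12-20", "2018-01-05", freq=trading_day)
prices = pd.DataFrame({"Close": range(len(dates))},
                      index=pd.DatetimeIndex(dates.to_numpy(), name="Date"))

print(prices.index.freq)        # None
print(len(prices.asfreq("B")))  # 'B' re-inserts the two holidays as NaN rows

# With the custom offset every timestamp conforms, so no rows or NaNs are added.
conformed = prices.asfreq(trading_day)
print(conformed.index.freq)
```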