Things used to work great until several days ago. Now when I run the following:
from pandas_datareader import data
symbol = 'AMZN'
data_source='google'
start_date = '2010-01-01'
end_date = '2016-01-01'
df = data.DataReader(symbol, data_source, start_date, end_date)
I get only the most recent year of data, shown below, as if start_date and end_date did not matter. Changing them to different dates yields the same results. Does anyone know why?
Results:
df.head()
Open High Low Close Volume
Date
2016-09-21 129.13 130.00 128.39 129.94 14068336
2016-09-22 130.50 130.73 129.56 130.08 15538307
2016-09-23 127.56 128.60 127.30 127.96 28326266
2016-09-26 127.37 128.16 126.80 127.31 15064940
2016-09-27 127.61 129.01 127.43 128.69 15637111
Use fix-yahoo-finance and then use yahoo rather than Google as your source. It looks like Google has been locking down a lot of its data lately.
First you'll need to install fix-yahoo-finance. Just use pip install fix-yahoo-finance.
Then use get_data_yahoo:
from pandas_datareader import data
import fix_yahoo_finance as yf
yf.pdr_override()
symbol = 'AMZN'
start_date = '2010-01-01'
end_date = '2016-01-01'
df = data.get_data_yahoo(symbol, start_date, end_date)
df.head()
Open High Low Close Adj Close Volume
Date
2010-01-04 136.25000 136.61000 133.14000 133.89999 133.89999 7599900
2010-01-05 133.42999 135.48000 131.81000 134.69000 134.69000 8851900
2010-01-06 134.60001 134.73000 131.64999 132.25000 132.25000 7178800
2010-01-07 132.01000 132.32001 128.80000 130.00000 130.00000 11030200
2010-01-08 130.56000 133.67999 129.03000 133.52000 133.52000 9830500
Just replace google with yahoo. There are problems with the Google source right now; see https://github.com/pydata/pandas-datareader/issues/394
from pandas_datareader import data
symbol = 'AMZN'
data_source='yahoo'
start_date = '2010-01-01'
end_date = '2016-01-01'
df = data.DataReader(symbol, data_source, start_date, end_date)
Yahoo working as of January 01, 2020:
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2018, 2, 8)
df = web.DataReader('TSLA', 'yahoo', start, end)
print(df.head())
As of Dec 30, 2021 ----
I did figure this out. I'm new to Python, so this is not optimized or elegant, but it does return just the day that ends each market week. Because of how I specify the start and end dates, the dataframe always starts with a Monday and ends with the last market day. The code takes the day of the month from each date, subtracts the next row's day from it, and stores the difference in a new column. Each row returns -1, except for the last day of the market week. The very last row of all the data returns NaN, which I had to deal with. I then delete just the rows with -1 in the Days column. Thank you for the feedback. Here is the rest of the code that does the work; it follows the code I previously supplied.
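import numpy as np
data['Date'] = pd.to_datetime(data['Date'])
# day of the month for each row
data['Days_from_date'] = pd.DatetimeIndex(data['Date']).day
# difference between this row's day and the next row's day
data['Days'] = data['Days_from_date'] - data['Days_from_date'].shift(-1)
# the last row has no next row, so its NaN is replaced with -1
data=data.replace(np.nan,-1)
data["Days"]=data["Days"].astype(int)
# keep only the rows that are not followed by the next calendar day
data = data[data['Days'] != -1]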
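One thing worth checking with the day-of-month difference is a month boundary: Nov 30 followed by Dec 1 gives 29 rather than -1, so that row would be kept. Here is a minimal sketch of a variant of the same row-by-row idea that compares the calendar week of each row with the next row instead. It assumes a single ticker sorted by date; the synthetic dates and the names week, is_week_end and week_ends are mine, not from the original code.

```python
import pandas as pd

# Synthetic trading days (weekdays only) spanning a month boundary.
data = pd.DataFrame({"Date": pd.bdate_range("2021-11-29", "2021-12-10")})
data["Close"] = range(len(data))  # placeholder prices, not real data

# Calendar week (Monday to Sunday) that each row belongs to.
week = data["Date"].dt.to_period("W")

# A row is the last market day of its week when the next row falls in a
# different week. The final row is compared against a missing value,
# which counts as different, so it is kept.
is_week_end = week != week.shift(-1)
week_ends = data[is_week_end]

print(week_ends)
```

With several tickers stacked in one frame, the comparison would need to be done per ticker (for example inside a groupby).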
This is the previous post.....
I currently have python code that gets historical market info for various ETF tickers over a set period of time (currently 50 days). I run this code through Power BI. When I get done testing, I will be getting approximately 40 weeks of data for 60-ish ETFs. Current code is copied below.
I would like to minimize the amount of data returned to just the CLOSE data generated on the last market day of each week. Usually this is Friday, but sometimes it can be Thursday, and I think possibly Wednesday.
I am coming up short on how to identify each week's last market day and then pulling in just that data into a dataframe. Alternatively, I suppose it could pull in all data, and then drop the unwanted rows - I'm not sure which would be a better solution, and, in any case, I can't figure out how to do it!
Current code here, using Python 3.10 and Visual Studio Code for testing....
import yfinance as yf
import pandas as pd
from datetime import date
from datetime import timedelta
enddate = date.today()
startdate = enddate - timedelta(days=50)
tickerStrings = ['VUG', 'VV', 'MGC', 'MGK', 'VOO', 'VXF', 'VBK', 'VB']
df_list = list()
for ticker in tickerStrings:
    data = yf.download(ticker, start=startdate, group_by="Ticker")
    data['Ticker'] = ticker
    df_list.append(data)
data = pd.concat(df_list)
data = data.drop(columns=["Adj Close", "High", "Low", "Open", "Volume"])
data = data.reset_index()
As I commented, I think you can get the desired data by deriving the week number from the date, grouping by it, and taking the last row of each group. For example, if Friday is a holiday, Thursday is treated as the last data point for that week number.
import yfinance as yf
import pandas as pd
from datetime import date
from datetime import timedelta
enddate = date.today()
startdate = enddate - timedelta(days=50)
tickerStrings = ['VUG', 'VV', 'MGC', 'MGK', 'VOO', 'VXF', 'VBK', 'VB']
frames = []
for ticker in tickerStrings:
    data = yf.download(ticker, start=startdate, progress=False)['Close'].to_frame('Close')
    data['Ticker'] = ticker
    frames.append(data)
# DataFrame.append was removed in pandas 2.0, so build the frame with pd.concat
df = pd.concat(frames)
df.reset_index(inplace=True)
df['week_no'] = df['Date'].dt.isocalendar().week
data = df.groupby(['Ticker','week_no']).tail(1).sort_values('Date', ascending=True)
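Since the question mentions eventually pulling about 40 weeks of data, the window may cross a year boundary, where the same week number can appear in two different years. Below is a small sketch of the same groupby idea that also groups on the ISO year. The ticker symbols, the synthetic dates and the iso_year column are assumptions of mine for illustration; no data is downloaded.

```python
import pandas as pd

# Synthetic closes for two made-up tickers around a year boundary.
dates = pd.bdate_range("2021-12-20", "2022-01-07")
frames = []
for ticker in ["AAA", "BBB"]:  # hypothetical symbols
    frame = pd.DataFrame({"Date": dates, "Close": range(len(dates))})
    frame["Ticker"] = ticker
    frames.append(frame)
df = pd.concat(frames, ignore_index=True)

# ISO year and ISO week number of every row.
iso = df["Date"].dt.isocalendar()
df["iso_year"] = iso["year"]
df["week_no"] = iso["week"]

# Last available row per ticker and per (year, week) pair.
weekly = (
    df.sort_values(["Ticker", "Date"])
      .groupby(["Ticker", "iso_year", "week_no"])
      .tail(1)
)

print(weekly)
```

Sorting by ticker and date before calling tail(1) makes sure the last row of each group is really the latest date.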
I use this code to get the BTC price, but the returned data starts one day before the start date I selected.
INPUT:
tickers=['BTC-USD'] # Name of asset
tarih="02-06-2021"
tarih2="05-06-2021"
start=dt.datetime.strptime(tarih, '%d-%m-%Y')
end=dt.datetime.strptime(tarih2, '%d-%m-%Y')
returns=pd.DataFrame()
liste=[]
for ticker in tickers:
    data=web.DataReader(ticker,'yahoo',start,end)
    data=pd.DataFrame(data)
    data[ticker]=data['Adj Close'] #can work with change percentage in order to get more accurate data
    if returns.empty:
        returns=data[[ticker]]
    else:
        returns = returns.join(data[[ticker]],how='outer')#add right column
for dt in daterange(start, end):
    dates=dt.strftime("%d-%m-%Y")
    with open("fng_value.txt", "r") as filestream:
        for line in filestream:
            date = line.split(",")[0]
            if dates == date:
                fng_value=line.split(",")[1]
                liste.append(fng_value)
print(returns.head(25))
OUTPUT:
BTC-USD
Date
2021-06-01 37575.179688
2021-06-02 39208.765625
2021-06-03 36894.406250
2021-06-04 35551.957031
2021-06-05 35862.378906
DataReader accepts a start parameter as a string, date, or datetime. Apparently, a start date such as 2021-06-02 can sometimes return data beginning on the previous day, 2021-06-01. If the result is not what you expect, try passing a timezone-aware datetime with an hour late in the day as a workaround.
See if this works:
import pandas_datareader.data as web
import pandas as pd
from pytz import timezone
from datetime import datetime, date
tarih = "02-06-2021"
tarih2 = "05-06-2021"
# start/end can be date, datetime, or string
#start = date(2021, 6, 2)
#end = date(2021, 6, 5)
#start = 'JUN-02-2021'
#end = 'JUN-05-2021'
start = datetime.strptime(tarih, '%d-%m-%Y').replace(hour=23, tzinfo=timezone('EST'))
end = datetime.strptime(tarih2, '%d-%m-%Y').replace(tzinfo=timezone('EST'))
tickers=['BTC-USD'] # Name of asset
for ticker in tickers:
    data = web.DataReader(ticker, 'yahoo', start, end)
    data = pd.DataFrame(data)
    print(data)
This returns the data from 6/2 to 6/5.
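If you would rather not depend on how the data source interprets the start date, another option is to trim the frame after it has been fetched. This is only a sketch: it builds a small frame by hand with the values shown in the question instead of calling DataReader, and the name trimmed is mine.

```python
import pandas as pd

# Hand-built frame mirroring the output in the question,
# including the unwanted leading day (2021-06-01).
idx = pd.date_range("2021-06-01", "2021-06-05", name="Date")
returns = pd.DataFrame(
    {"BTC-USD": [37575.179688, 39208.765625, 36894.406250,
                 35551.957031, 35862.378906]},
    index=idx,
)

start = pd.Timestamp("2021-06-02")
end = pd.Timestamp("2021-06-05")

# Label-based slicing on a sorted DatetimeIndex includes both endpoints.
trimmed = returns.loc[start:end]

print(trimmed)
```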
Sorry if this is obvious. I'm using yfinance to create a stock analysis program, but I can't get any data for this month. It's the start of the month (today is August 3rd), and my program can't fetch data after July 31st.
Here's my program recording a 5 day window:
from pandas_datareader import data as pdr
import yfinance as yf
import datetime
from dateutil.relativedelta import *
import calendar
yf.pdr_override()
today =datetime.date.today()
yesterday = today-datetime.timedelta(5)
a= pdr.get_data_yahoo('AAPL', start=yesterday,end=today)
print(a)
and the output is
Open High Low Close Adj Close Volume
Date
2020-07-31 411.540009 425.660004 403.299988 425.040009 425.040009 93584200
Specify the correct date range
Since today is a Monday, the data for today was probably not available yet.
from datetime import date, timedelta
import yfinance as yf
from pandas_datareader import data as pdr
start = date(2020, 7, 1)
end = date(2020, 7, 31)
a = yf.download('AAPL', start=start, end=end)
# also works, but you don't need both yf and pdr
# (pdr is already the pandas_datareader.data module, so call get_data_yahoo on it directly)
a = pdr.get_data_yahoo('AAPL', start=start, end=end)
# display(a.head())
Open High Low Close Adj Close Volume
Date
2020-07-01 365.119995 367.359985 363.910004 364.109985 364.109985 27684300
2020-07-02 367.850006 370.470001 363.640015 364.109985 364.109985 28510400
2020-07-06 370.000000 375.779999 369.869995 373.850006 373.850006 29663900
2020-07-07 375.410004 378.619995 372.230011 372.690002 372.690002 28106100
2020-07-08 376.720001 381.500000 376.359985 381.369995 381.369995 29273000
With your date ranges
today = date.today()
yesterday = today - timedelta(5)
a = pdr.get_data_yahoo('AAPL', start=yesterday, end=today)
High Low Open Close Volume Adj Close
Date
2020-07-29 380.920013 374.850006 375.000000 380.160004 22582300 380.160004
2020-07-30 385.190002 375.070007 376.750000 384.760010 39532500 384.760010
2020-07-31 425.660004 403.299988 411.540009 425.040009 93584200 425.040009
2020-08-03 446.545685 431.579987 432.799988 435.750000 76237006 435.750000
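To see why a 5-day window returns fewer than five rows, you can count the weekdays it contains. This sketch fixes "today" to the Monday from the question so the result is reproducible; it uses pandas' bdate_range, which only skips weekends and knows nothing about exchange holidays.

```python
from datetime import date, timedelta

import pandas as pd

today = date(2020, 8, 3)      # the Monday from the question, hard-coded here
start = today - timedelta(5)  # 2020-07-29

# Weekdays between start and today, both inclusive (holidays not considered).
weekdays = pd.bdate_range(pd.Timestamp(start), pd.Timestamp(today))

print(weekdays)
```

The window covers a Saturday and a Sunday, so at most four trading days can come back.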
Here is one of my early attempts at using Python. I am getting stock data from Yahoo, but the ticker and date column headers appear on a lower row than High, Low, Open and Close.
I am definitely missing something. What is it?
import pandas as pd
import numpy as np
import datetime
import pandas_datareader as pdr
py.init_notebook_mode(connected=True)
# Download the stock prices for each ticker, then map each frame to its ticker name
def get(tickers, startdate, enddate):
    def data(ticker):
        return pdr.get_data_yahoo(ticker, start=startdate, end=enddate)
    datas = map(data, tickers)
    return pd.concat(datas, keys=tickers, names=['ticker', 'date'])
# Define the stocks to download: Apple and IBM.
tickers = ['AAPL','IBM']
# We would like all available data from 01/01/2016 until 31/12/2019.
start_date = datetime.datetime(2016, 1, 1)
end_date = datetime.datetime(2019, 12, 31)
all_data = get(tickers, start_date, end_date)
This dataframe uses a hierarchical index. ticker and date aren't columns, but are both part of the index. This means the rows are grouped firstly by ticker and then by date.
For more information on hierarchical indexes check out the Pandas docs
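As a rough illustration with made-up numbers (no download involved), this sketch builds a frame with the same two index levels, then shows two ways of working with it: selecting one ticker from the index, and turning both levels into ordinary columns. The names aapl, ibm, flat and ibm_rows are mine.

```python
import pandas as pd

# Two tiny per-ticker frames with placeholder prices.
dates = pd.date_range("2016-01-04", periods=2, name="date")
aapl = pd.DataFrame({"Close": [1.0, 2.0]}, index=dates)
ibm = pd.DataFrame({"Close": [3.0, 4.0]}, index=dates)

# Same concat call shape as in the question: keys become the outer index level.
all_data = pd.concat([aapl, ibm], keys=["AAPL", "IBM"], names=["ticker", "date"])

# ticker and date are index levels, not columns.
print(list(all_data.index.names))

# Select every row of one ticker through the outer level.
ibm_rows = all_data.loc["IBM"]

# Or flatten the index so that ticker and date become regular columns.
flat = all_data.reset_index()
print(flat)
```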
I am trying to get the Adj Close prices from Yahoo Finance into a DataFrame. I have all the stocks I want but I am not able to sort on date.
stocks = ['ORCL', 'TSLA', 'IBM','YELP', 'MSFT']
ls_key = 'Adj Close'
start = datetime(2014,1,1)
end = datetime(2014,3,28)
f = web.DataReader(stocks, 'yahoo',start,end)
cleanData = f.ix[ls_key]
dataFrame = pd.DataFrame(cleanData)
print dataFrame[:5]
I get the following result, which is almost perfect.
IBM MSFT ORCL TSLA YELP
Date
2014-01-02 184.52 36.88 37.61 150.10 67.92
2014-01-03 185.62 36.64 37.51 149.56 67.66
2014-01-06 184.99 35.86 37.36 147.00 71.72
2014-01-07 188.68 36.14 37.74 149.36 72.66
2014-01-08 186.95 35.49 37.61 151.28 78.42
However, Date is not an item, so when I run:
print dataFrame['Date']
I get the error:
KeyError: u'no item named Date'
I hope someone can help me add the Date.
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2013, 1, 1)
end = datetime.datetime(2016, 1, 27)
df = web.DataReader("GOOGL", 'yahoo', start, end)
dates =[]
for x in range(len(df)):
    newdate = str(df.index[x])
    newdate = newdate[0:10]  # keep only the YYYY-MM-DD part
    dates.append(newdate)
df['dates'] = dates
print(df.head())
print(df.tail())
Date is in the index values.
To get it into a column, just use:
dataFrame.reset_index(inplace=True,drop=False)
Then you can use
dataFrame['Date']
because "Date" will now be one of the keys in your columns of the dataframe.
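Here is a small self-contained sketch of that, using a hand-built frame with the first two rows from the question rather than a live download. The name with_column is mine, and it uses the non-inplace form of reset_index so the original frame stays available for comparison.

```python
import pandas as pd

# Hand-built frame shaped like the one in the question: Date is the index.
idx = pd.DatetimeIndex(["2014-01-02", "2014-01-03"], name="Date")
dataFrame = pd.DataFrame(
    {"IBM": [184.52, 185.62], "MSFT": [36.88, 36.64]},
    index=idx,
)

# Date lives in the index, so it is not among the columns yet.
print("Date" in dataFrame.columns)

# reset_index moves the index into a regular column named Date.
with_column = dataFrame.reset_index()
print(with_column["Date"])
```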
Use dataFrame.index to access the dates directly, or, to add an explicit column, use dataFrame["Date"] = dataFrame.index
stocks = ['ORCL', 'TSLA', 'IBM','YELP', 'MSFT']
ls_key = 'Adj Close'
start = datetime(2014,1,1)
end = datetime(2014,3,28)
f = web.DataReader(stocks, 'yahoo',start,end)
cleanData = f.ix[ls_key]
dataFrame = pd.DataFrame(cleanData)
dataFrame["Date"] = dataFrame.index
print dataFrame["Date"] ## or print dataFrame.index
This should do it.
import pandas as pd
from pandas.io.data import DataReader
symbols_list = ['ORCL', 'TSLA', 'IBM','YELP', 'MSFT']
d = {}
for ticker in symbols_list:
    d[ticker] = DataReader(ticker, "yahoo", '2014-12-01')
pan = pd.Panel(d)
df1 = pan.minor_xs('Adj Close')
print(df1)
#df_percent_chg = df1.pct_change()
The sub-package pandas.io.data has been removed from recent versions of pandas and is now available as a separate package, pandas-datareader.
Use git to install the package.
In a Linux terminal:
git clone https://github.com/pydata/pandas-datareader.git
cd pandas-datareader
python setup.py install
Now you can use import pandas_datareader in your Python script for remote data access.
For more information, see the latest pandas-datareader documentation.
f is a Panel
You can get a DataFrame and reset index (Date) using:
f.loc['Adj Close',:,:].reset_index()
but I'm not sure reset_index() is very useful as you can get Date using
f.loc['Adj Close',:,:].index
You might have a look at
http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing
about indexing
print(dataFrame.index[0])
2014-01-02 00:00:00
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2016, 1, 1)
web.DataReader('GOOGL', 'yahoo', start, end)