Pandas yahoo finance DataReader - python

I am trying to get the Adj Close prices from Yahoo Finance into a DataFrame. I have all the stocks I want but I am not able to sort on date.
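from datetime import datetime
import pandas as pd
import pandas_datareader.data as web

stocks = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
ls_key = 'Adj Close'
start = datetime(2014, 1, 1)
end = datetime(2014, 3, 28)
f = web.DataReader(stocks, 'yahoo', start, end)
cleanData = f[ls_key]  # .ix has been removed from pandas; plain [] selects 'Adj Close'
dataFrame = pd.DataFrame(cleanData)
print(dataFrame[:5])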
I get the following result, which is almost perfect.
               IBM   MSFT   ORCL    TSLA   YELP
Date
2014-01-02  184.52  36.88  37.61  150.10  67.92
2014-01-03  185.62  36.64  37.51  149.56  67.66
2014-01-06  184.99  35.86  37.36  147.00  71.72
2014-01-07  188.68  36.14  37.74  149.36  72.66
2014-01-08  186.95  35.49  37.61  151.28  78.42
However, Date is not a column, so when I run:
print(dataFrame['Date'])
I get the error:
KeyError: u'no item named Date'
I hope someone can help me add the Date.

import pandas_datareader.data as web
import datetime

start = datetime.datetime(2013, 1, 1)
end = datetime.datetime(2016, 1, 27)
df = web.DataReader("GOOGL", 'yahoo', start, end)

# Copy the date part of each index timestamp into a new 'dates' column
dates = []
for x in range(len(df)):
    newdate = str(df.index[x])
    newdate = newdate[0:10]  # keep only 'YYYY-MM-DD'
    dates.append(newdate)
df['dates'] = dates

print(df.head())
print(df.tail())
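For reference, the loop can be replaced by one vectorized call: DatetimeIndex.strftime formats every timestamp in the index at once. A minimal sketch, with a small hand-built frame standing in for the GOOGL download (the Close values are placeholders, not real prices):

```python
import pandas as pd

# Hand-built stand-in for the DataReader result (placeholder values)
df = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0]},
    index=pd.DatetimeIndex(["2013-01-02", "2013-01-03", "2013-01-04"], name="Date"),
)

# Format every index timestamp as 'YYYY-MM-DD' in a single call, no loop needed
df["dates"] = df.index.strftime("%Y-%m-%d")
print(df)
```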

Date is the index, not a column.
To turn it into a column, use:
dataFrame.reset_index(inplace=True, drop=False)
Then you can use
dataFrame['Date']
because "Date" is now one of the DataFrame's columns.
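A small sketch of what reset_index does, using a hand-built frame shaped like the question's output (made-up values). The new column takes its name from the index name, so this relies on the index being named Date, as the printed output suggests; once Date is a column you can also sort on it:

```python
import pandas as pd

# Hand-built 'Adj Close' table with the dates in an index named 'Date' (placeholder values)
dataFrame = pd.DataFrame(
    {"IBM": [1.0, 2.0], "MSFT": [3.0, 4.0]},
    index=pd.DatetimeIndex(["2014-01-02", "2014-01-03"], name="Date"),
)

# Move the index into an ordinary 'Date' column; a default 0..n-1 index replaces it
dataFrame.reset_index(inplace=True, drop=False)

print(dataFrame["Date"])                              # no KeyError now
print(dataFrame.sort_values("Date", ascending=False))  # sorting on the new column
```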

Use dataFrame.index to access the dates directly, or add an explicit column with dataFrame["Date"] = dataFrame.index:
from datetime import datetime
import pandas as pd
import pandas_datareader.data as web

stocks = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
ls_key = 'Adj Close'
start = datetime(2014, 1, 1)
end = datetime(2014, 3, 28)
f = web.DataReader(stocks, 'yahoo', start, end)
cleanData = f[ls_key]
dataFrame = pd.DataFrame(cleanData)
dataFrame["Date"] = dataFrame.index  # copy the index into a regular column
print(dataFrame["Date"])  # or print(dataFrame.index)
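A sketch of both options on a tiny hand-built frame (values are placeholders). Here the dates stay in the index and are also copied into a Date column, so sorting by the index keeps the two in step:

```python
import pandas as pd

# Hand-built frame with unsorted dates in the index (placeholder values)
dataFrame = pd.DataFrame(
    {"IBM": [1.0, 2.0, 3.0]},
    index=pd.DatetimeIndex(["2014-01-03", "2014-01-02", "2014-01-06"], name="Date"),
)

# Option 1: read the dates straight from the index
print(dataFrame.index)

# Option 2: copy them into an explicit column; the index is kept as well
dataFrame["Date"] = dataFrame.index

# Sorting by the index reorders the rows, and the 'Date' column moves with them
by_index = dataFrame.sort_index()
print(by_index["Date"])
```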

This should do it.
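import pandas as pd
from pandas_datareader.data import DataReader  # pandas.io.data now lives in pandas-datareader

symbols_list = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
d = {}
for ticker in symbols_list:
    d[ticker] = DataReader(ticker, "yahoo", '2014-12-01')

# pd.Panel was removed in pandas 0.25; build one 'Adj Close' column per ticker instead
df1 = pd.DataFrame({ticker: frame['Adj Close'] for ticker, frame in d.items()})
print(df1)
#df_percent_chg = df1.pct_change()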
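pd.Panel was removed in pandas 0.25, so minor_xs is no longer available. One way to build the same ticker-per-column table is pd.concat over a dict of Series, which uses the dict keys as column names. A sketch with hand-built frames in place of the downloads (values made up), including the pct_change step from the commented-out line:

```python
import pandas as pd

# Hand-built per-ticker frames standing in for the DataReader downloads (made-up values)
dates = pd.date_range("2014-12-01", periods=3, freq="D", name="Date")
d = {
    "ORCL": pd.DataFrame({"Adj Close": [10.0, 11.0, 12.1]}, index=dates),
    "IBM": pd.DataFrame({"Adj Close": [20.0, 19.0, 19.0]}, index=dates),
}

# One column per ticker, aligned on the shared date index
df1 = pd.concat({ticker: frame["Adj Close"] for ticker, frame in d.items()}, axis=1)

# Day-over-day percentage change for each ticker (the first row has no previous day: NaN)
df_percent_chg = df1.pct_change()
print(df_percent_chg)
```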

The pandas.io.data sub-package has been removed from pandas; it is now available separately as pandas-datareader.
You can install it from its Git repository.
In a Linux terminal:
git clone https://github.com/pydata/pandas-datareader.git
cd pandas-datareader
python setup.py install
(It is also published on PyPI, so pip install pandas-datareader works too.)
Now you can use import pandas_datareader in your Python script for remote data access.
For more information, see the latest pandas-datareader documentation.

f is a Panel (pandas' 3-D container, since removed in pandas 0.25).
You can get a DataFrame and reset the index (Date) using:
f.loc['Adj Close',:,:].reset_index()
but I'm not sure reset_index() is very useful, since you can get Date using
f.loc['Adj Close',:,:].index
For more about indexing, have a look at
http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing
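Panel no longer exists in current pandas. As far as I know, newer pandas-datareader versions return a DataFrame with two-level columns (attribute, symbol) when given a list of symbols; assuming that layout, selecting the first level gives the Adj Close table directly. The frame below is built by hand to mimic that shape; the level names Attributes and Symbols and all values are assumptions for illustration:

```python
import pandas as pd

# Hand-built frame mimicking a multi-symbol download with (attribute, symbol) columns
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Volume"], ["IBM", "MSFT"]], names=["Attributes", "Symbols"]
)
f = pd.DataFrame(
    [[1.0, 2.0, 100, 200], [1.5, 2.5, 110, 210]],
    index=pd.DatetimeIndex(["2014-01-02", "2014-01-03"], name="Date"),
    columns=columns,
)

adj = f["Adj Close"]        # first column level -> one column per symbol
print(adj.index)            # the dates are still in the index
print(adj.reset_index())    # or turn them into a 'Date' column
```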

The dates are stored in the index, for example:
print(dataFrame.index[0])
2014-01-02 00:00:00

import pandas_datareader.data as web
import datetime
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2016, 1, 1)
web.DataReader('GOOGL', 'yahoo', start, end)

Related

Why does the filtered date range differ from my start date?

I use this code to get the BTC value, but the data starts one day before the date I selected.
INPUT:
import datetime as dt
import pandas as pd
import pandas_datareader.data as web

tickers = ['BTC-USD']  # Name of asset
tarih = "02-06-2021"
tarih2 = "05-06-2021"
start = dt.datetime.strptime(tarih, '%d-%m-%Y')
end = dt.datetime.strptime(tarih2, '%d-%m-%Y')
returns = pd.DataFrame()
liste = []
for ticker in tickers:
    data = web.DataReader(ticker, 'yahoo', start, end)
    data = pd.DataFrame(data)
    data[ticker] = data['Adj Close']  # can work with change percentage in order to get more accurate data
    if returns.empty:
        returns = data[[ticker]]
    else:
        returns = returns.join(data[[ticker]], how='outer')  # add right column
# daterange() is a helper defined elsewhere in the script (not shown)
for day in daterange(start, end):  # 'day', so the loop variable doesn't shadow the dt module
    dates = day.strftime("%d-%m-%Y")
    with open("fng_value.txt", "r") as filestream:
        for line in filestream:
            date = line.split(",")[0]
            if dates == date:
                fng_value = line.split(",")[1]
                liste.append(fng_value)
print(returns.head(25))
OUTPUT:
                 BTC-USD
Date
2021-06-01  37575.179688
2021-06-02  39208.765625
2021-06-03  36894.406250
2021-06-04  35551.957031
2021-06-05  35862.378906
DataReader accepts start as a string, date, or datetime. Apparently, a start date such as 2021-06-02 sometimes returns data starting from the previous day, 2021-06-01. If it doesn't return what you expect, try working around it with a timezone-aware datetime set late in the day.
See if this works:
import pandas_datareader.data as web
import pandas as pd
from pytz import timezone
from datetime import datetime, date

tarih = "02-06-2021"
tarih2 = "05-06-2021"

# start/end can be date, datetime, or string
#start = date(2021, 6, 2)
#end = date(2021, 6, 5)
#start = 'JUN-02-2021'
#end = 'JUN-05-2021'
start = datetime.strptime(tarih, '%d-%m-%Y').replace(hour=23, tzinfo=timezone('EST'))
end = datetime.strptime(tarih2, '%d-%m-%Y').replace(tzinfo=timezone('EST'))

tickers = ['BTC-USD']  # Name of asset
for ticker in tickers:
    data = web.DataReader(ticker, 'yahoo', start, end)
    data = pd.DataFrame(data)
    print(data)
This returns the data from 6/2 to 6/5.
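A different workaround, not taken from the answer above: keep the request as is and trim whatever comes back to the requested window with label slicing, which on a DatetimeIndex includes both endpoints. A sketch with a hand-built frame (made-up values) that, like the question's output, starts a day early:

```python
import pandas as pd

# Hand-built stand-in for a download that came back starting one day early (made-up values)
data = pd.DataFrame(
    {"Adj Close": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.date_range("2021-06-01", "2021-06-05", freq="D", name="Date"),
)

# Keep only the requested window; both endpoints are included
requested = data.loc["2021-06-02":"2021-06-05"]
print(requested)
```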

How to resample datetime data to get only the specific hours of the day?

I have minute-by-minute stock data for all of 2020, but I only want the data from 9:30 a.m. to 4:00 p.m. The data currently includes after-hours prices as well, which I would like to filter out. The code for loading the data is:
import pandas as pd
#get stock prices
d = pd.read_csv(r"C:\Users\B1880\Downloads\AMD_stock_data\AMD_2020_2020.txt")
d.columns = ['Dates', 'Open', 'High', 'Low', 'Close', 'Volume']
d.index.name = 'Dates'
The URL for the data is: https://docs.google.com/spreadsheets/d/1uxVjEJkEmDZwu44pNxsg5ZBonqbTFak8HoESbxo0AM0/edit#gid=1360727590
Thanks!
You can convert the "Dates" column to datetime and then filter by time of day (amd_df here is the DataFrame called d in the question):
>>> import datetime
>>> import pandas as pd
>>> amd_df["Dates"] = pd.to_datetime(amd_df["Dates"])
>>> times = amd_df["Dates"].dt.time
>>> amd_df = amd_df[(times >= datetime.time(hour=9, minute=30)) & (times <= datetime.time(hour=16, minute=0))]
You can also filter with between_time, which needs the timestamps in a DatetimeIndex:
df2 = df.set_index('Dates')
df2.index = pd.to_datetime(df2.index)
df2 = df2.between_time('09:30', '16:00')
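A self-contained sketch of the between_time approach on made-up bars, one every 30 minutes from 09:00 to 16:30. By default both endpoints are kept, so 09:30 and 16:00 survive while 09:00 and 16:30 are dropped:

```python
import pandas as pd

# Made-up bars: one row every 30 minutes from 09:00 to 16:30 (16 rows)
df = pd.DataFrame(
    {"Dates": pd.date_range("2020-01-02 09:00", "2020-01-02 16:30", freq="30min"),
     "Close": range(16)}
)

# between_time works on a DatetimeIndex, so move 'Dates' into the index first
df2 = df.set_index("Dates").between_time("09:30", "16:00")
print(df2.index.min(), df2.index.max(), len(df2))
```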

Why are the ticker and date headers different?

Here are my early attempts at using Python. I am getting stock data from Yahoo, but I can see that the ticker and date column headers sit lower than the High, Low, Open, and Close headers.
I am definitely missing something. What is it?
import datetime

import numpy as np
import pandas as pd
import pandas_datareader as pdr
import plotly.offline as py  # assumed: 'py' refers to plotly.offline

py.init_notebook_mode(connected=True)

# Download the stock prices for each ticker, then concatenate them,
# labelling each block of rows with the name of its ticker
def get(tickers, startdate, enddate):
    def data(ticker):
        return pdr.get_data_yahoo(ticker, start=startdate, end=enddate)
    datas = map(data, tickers)
    return pd.concat(datas, keys=tickers, names=['ticker', 'date'])

# Define the stocks to download: Apple and IBM.
tickers = ['AAPL', 'IBM']

# All available data from 01/01/2016 until 31/12/2019.
start_date = datetime.datetime(2016, 1, 1)
end_date = datetime.datetime(2019, 12, 31)
all_data = get(tickers, start_date, end_date)
This DataFrame uses a hierarchical index (MultiIndex). ticker and date aren't columns; they are both levels of the index, which is why pandas prints their names on a separate, lower header row. The rows are grouped first by ticker and then by date.
For more information on hierarchical indexes, check out the pandas docs.
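A sketch reproducing the question's pd.concat call on two hand-built frames (placeholder values) instead of downloads, showing the two index levels, selecting one ticker, and flattening the index back into ordinary columns with reset_index:

```python
import pandas as pd

# Hand-built per-ticker frames standing in for get_data_yahoo results (placeholder values)
idx = pd.DatetimeIndex(["2016-01-04", "2016-01-05"], name="Date")
frames = [
    pd.DataFrame({"Close": [1.0, 2.0]}, index=idx),
    pd.DataFrame({"Close": [3.0, 4.0]}, index=idx),
]

# Same call as in the question: keys add an outer 'ticker' level to the row index
all_data = pd.concat(frames, keys=["AAPL", "IBM"], names=["ticker", "date"])
print(list(all_data.index.names))

print(all_data.loc["IBM"])       # rows for a single ticker, indexed by date
flat = all_data.reset_index()    # ticker and date become ordinary columns
print(flat.columns.tolist())
```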

KeyError 'Date' when using pandas-datareader

I am trying to get the value of Bitcoin from yahoo finance using pandas data reader, and then save this data to a csv file. Where is the error here, and how do I fix it?
import datetime as dt
import pandas as pd
import pandas_datareader.data as web
start = dt.datetime(2017, 1, 1)
end = dt.datetime(2019, 11, 30)
df = web.DataReader('BTC', 'yahoo', start, end)
df.to_csv('BTC.csv')
print(df.head())
This was coded in Spyder with Python 3.7, if that is relevant.
This should work. Use the 'BTC-USD' ticker symbol:
import pandas as pd
import pandas_datareader.data as web
import datetime as dt
start = dt.datetime(2017, 1, 1)
end = dt.datetime(2019, 11, 30)
df = web.DataReader('BTC-USD', 'yahoo', start, end)
df.to_csv('BTC.csv')
print(df.head())
or
df = web.get_data_yahoo('BTC-USD', start, end)
I got the same KeyError: 'Date' when using pandas-datareader and traced it to two mistakes in my script; fixing them resolved the issue:
The ticker symbol was wrong, for example 'APPL' instead of 'AAPL'.
There was no data for the date range I was requesting.
Hope this helps!
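Since a wrong symbol or an empty date range can break a whole loop of downloads, one option is to catch the failure per ticker. A minimal sketch: download_all and fake_fetch are hypothetical helpers, and fake_fetch only imitates the KeyError: 'Date' reported in this thread; in real use you would pass something like a lambda around web.DataReader:

```python
import pandas as pd


def download_all(tickers, fetch):
    """Hypothetical helper: fetch each ticker, collecting failures instead of stopping.

    `fetch` is any callable that takes a ticker and returns a DataFrame.
    """
    results, failed = {}, []
    for ticker in tickers:
        try:
            frame = fetch(ticker)
        except KeyError:       # e.g. KeyError: 'Date' for a bad symbol or empty range
            failed.append(ticker)
            continue
        if frame.empty:        # request succeeded but returned no rows
            failed.append(ticker)
        else:
            results[ticker] = frame
    return results, failed


def fake_fetch(ticker):
    """Hypothetical stand-in for a real download; 'APPL' imitates a mistyped symbol."""
    if ticker == "APPL":
        raise KeyError("Date")
    return pd.DataFrame({"Close": [1.0]})


results, failed = download_all(["AAPL", "APPL"], fake_fetch)
print(list(results), failed)
```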

Issues downloading stock data from Google Finance using pandas-datareader

Things used to work great until several days ago. Now when I run the following:
from pandas_datareader import data
symbol = 'AMZN'
data_source='google'
start_date = '2010-01-01'
end_date = '2016-01-01'
df = data.DataReader(symbol, data_source, start_date, end_date)
I get only the most recent year of data, shown below, as if start_date and end_date did not matter. Changing them to different dates yields the same results. Does anyone know why?
Results:
df.head()
              Open    High     Low   Close    Volume
Date
2016-09-21  129.13  130.00  128.39  129.94  14068336
2016-09-22  130.50  130.73  129.56  130.08  15538307
2016-09-23  127.56  128.60  127.30  127.96  28326266
2016-09-26  127.37  128.16  126.80  127.31  15064940
2016-09-27  127.61  129.01  127.43  128.69  15637111
Use fix-yahoo-finance and then use yahoo rather than Google as your source. It looks like Google has been locking down a lot of its data lately.
First you'll need to install fix-yahoo-finance. Just use pip install fix-yahoo-finance.
Then use get_data_yahoo:
from pandas_datareader import data
import fix_yahoo_finance as yf

yf.pdr_override()

symbol = 'AMZN'
start_date = '2010-01-01'
end_date = '2016-01-01'
df = data.get_data_yahoo(symbol, start_date, end_date)
df.head()
                 Open       High        Low      Close  Adj Close    Volume
Date
2010-01-04  136.25000  136.61000  133.14000  133.89999  133.89999   7599900
2010-01-05  133.42999  135.48000  131.81000  134.69000  134.69000   8851900
2010-01-06  134.60001  134.73000  131.64999  132.25000  132.25000   7178800
2010-01-07  132.01000  132.32001  128.80000  130.00000  130.00000  11030200
2010-01-08  130.56000  133.67999  129.03000  133.52000  133.52000   9830500
Just replace google with yahoo; there are problems with the Google source right now: https://github.com/pydata/pandas-datareader/issues/394
from pandas_datareader import data
symbol = 'AMZN'
data_source='yahoo'
start_date = '2010-01-01'
end_date = '2016-01-01'
df = data.DataReader(symbol, data_source, start_date, end_date)
Yahoo was working as of January 1, 2020:
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2018, 2, 8)
df = web.DataReader('TSLA', 'yahoo', start, end)
print(df.head())
