Python: JSON Data to Pandas, Nest List Comprehension

I'm using YahooFinancials to get the stock price and volume for a list of several companies. I can extract the prices and volumes into separate dataframes, but I would like to get both price and volume into the same dataframe without having to merge them afterwards. I believe I need a nested list comprehension, but I'm not sure how to write it.
My code is as follows:
import pandas as pd
from pandas.io.json import json_normalize
import numpy as np
from yahoofinancials import YahooFinancials
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import date, timedelta
import warnings
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.style.use('seaborn')
start = date(2007,1,1)
end = date(2020,6,4)
today = date.today()
tomorrow = str(end + timedelta(days=1))
portfolio = ['AMZN', 'GOOGL', 'MSFT']
yahoo_financials = YahooFinancials(portfolio)
data = yahoo_financials.get_historical_price_data(start_date=str(start), end_date=str(today), time_interval='daily')
prices = pd.DataFrame({a: {x['formatted_date']: x['adjclose'] for x in data[a]['prices']} for a in portfolio})
volume = pd.DataFrame({a: {x['formatted_date']: x['volume'] for x in data[a]['prices']} for a in portfolio})
Ideally, the output looks something like this:
date       AMZNPrice    AMZNVolume  GOOGLPrice   GOOGLVolume  MSFTPrice   MSFTVolume
6/9/2020   2600.860107  5176000     1452.079956  1681200      189.800003  29783900
6/10/2020  2647.449951  4946000     1464.699951  1588100      196.839996  43872300
6/11/2020  2557.959961  5800100     1401.900024  2357200      186.270004  52854700
6/12/2020  2545.02002   5429600     1412.920044  1832900      187.740005  43345700
6/15/2020  2572.679932  3865100     1420.73999   1523400      188.940002  32712500

Try this:
data = yahoo_financials.get_historical_price_data(start_date=str(start), end_date=str(today), time_interval='daily')
dfs = []
for s in portfolio:
    df = pd.json_normalize(data[s]['prices'])
    df['stock'] = s
    df = df[['stock', 'formatted_date', 'adjclose', 'volume']]
    dfs.append(df)
df = pd.concat(dfs)
df = pd.pivot(df, index='formatted_date', columns='stock', values=['adjclose', 'volume'])
df.columns = ['_'.join(col) for col in df.columns.values]
print(df)
Output:
                adjclose_AMZN  adjclose_GOOGL  adjclose_MSFT  volume_AMZN  volume_GOOGL  volume_MSFT
formatted_date
2007-01-03          38.700001      234.029022      22.123693   12405100.0    15397500.0   76935100.0
2007-01-04          38.900002      241.871872      22.086641    6318400.0    15759400.0   45774500.0
2007-01-05          38.369999      243.838837      21.960684    6619700.0    13730400.0   44607200.0
2007-01-08          37.500000      242.032028      22.175547    6783000.0     9499200.0   50220200.0
2007-01-09          37.779999      242.992996      22.197784    5703000.0    10752000.0   44636600.0
...                       ...             ...            ...          ...           ...          ...
2020-06-09        2600.860107     1452.079956     189.800003    5176000.0     1681200.0   29783900.0
2020-06-10        2647.449951     1464.699951     196.839996    4946000.0     1588100.0   43872300.0
2020-06-11        2557.959961     1401.900024     186.270004    5800100.0     2357200.0   52854700.0
2020-06-12        2545.020020     1412.920044     187.740005    5429600.0     1832900.0   43345700.0
2020-06-15        2572.679932     1420.739990     188.940002    3865100.0     1523400.0   32712500.0
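If you would rather stay with the comprehension approach from the question, here is a minimal sketch of the nested version. It runs on a small hand-written dict that only assumes the response shape used in the question (`data[ticker]['prices']` is a list of dicts with `formatted_date`, `adjclose` and `volume`); the `fields` mapping and the `combined` name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample shaped like the YahooFinancials response in the question
data = {
    "AMZN": {"prices": [
        {"formatted_date": "2020-06-09", "adjclose": 2600.860107, "volume": 5176000},
        {"formatted_date": "2020-06-10", "adjclose": 2647.449951, "volume": 4946000},
    ]},
    "MSFT": {"prices": [
        {"formatted_date": "2020-06-09", "adjclose": 189.800003, "volume": 29783900},
        {"formatted_date": "2020-06-10", "adjclose": 196.839996, "volume": 43872300},
    ]},
}
portfolio = ["AMZN", "MSFT"]

# Maps the field name in the response to the column suffix we want
fields = {"adjclose": "Price", "volume": "Volume"}

# Outer loops: ticker, then field. Inner comprehension: date -> value.
combined = pd.DataFrame({
    f"{ticker}{label}": {row["formatted_date"]: row[field]
                         for row in data[ticker]["prices"]}
    for ticker in portfolio
    for field, label in fields.items()
})
combined.index.name = "date"
print(combined)
```

The two `for` clauses in the outer comprehension produce one column per (ticker, field) pair, so no merge or pivot is needed afterwards.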

Related

How can I sort timestamp from following data dictionary?

Code:
import pandas as pd
from pycoingecko import CoinGeckoAPI
c=CoinGeckoAPI()
bdata=c.get_coin_market_chart_by_id(id='bitcoin',vs_currency='usd',days=30)
data_=pd.DataFrame(bdata)
print(data_)
data=pd.to_datetime(data_[prices],unit='ms')
print(data)
Requirement:
I need the output to have four columns: Timestamp, Prices, Market_caps, Total_volume.
I also want to convert the timestamps with to_datetime.
The code above just pulls the bitcoin data from pycoingecko.

You can convert this into a dataframe format like this:
import pandas as pd
from pycoingecko import CoinGeckoAPI
c=CoinGeckoAPI()
bdata=c.get_coin_market_chart_by_id(id='bitcoin',vs_currency='usd',days=30)
prices = pd.DataFrame(bdata['prices'], columns=['TimeStamp', 'Price']).set_index('TimeStamp')
market_caps = pd.DataFrame(bdata['market_caps'], columns=['TimeStamp', 'Market Cap']).set_index('TimeStamp')
total_volumes = pd.DataFrame(bdata['total_volumes'], columns=['TimeStamp', 'Total Volumes']).set_index('TimeStamp')
# combine the separate dataframes
df_market = pd.concat([prices, market_caps, total_volumes], axis=1)
# convert the index to a datetime dtype
df_market.index = pd.to_datetime(df_market.index, unit='ms')
Code adapted from this answer.

With minimal changes to your code, you can extract the timestamp column and convert it to datetime as follows. You can then merge the new column back into your dataframe:
import pandas as pd
from pycoingecko import CoinGeckoAPI
c=CoinGeckoAPI()
bdata=c.get_coin_market_chart_by_id(id='bitcoin',vs_currency='usd',days=30)
data_=pd.DataFrame(bdata)
print(data_)
#data=pd.to_datetime(data_["prices"],unit='ms')
df = pd.DataFrame([pd.Series(x) for x in data_["prices"]])
df.columns = ["timestamp","data"]
df=pd.to_datetime(df["timestamp"],unit='ms')
print(df)
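For the merge step, something along these lines might work. This is only a sketch on a hand-written payload: it assumes each key holds a list of [timestamp_ms, value] pairs and that the three lists share the same timestamps in the same order. The numbers are invented, not real market data.

```python
import pandas as pd

# Hypothetical payload with the same shape as the API response:
# each key maps to a list of [timestamp_ms, value] pairs.
bdata = {
    "prices": [[1600000000000, 10300.0], [1600003600000, 10350.5]],
    "market_caps": [[1600000000000, 1.90e11], [1600003600000, 1.91e11]],
    "total_volumes": [[1600000000000, 2.5e10], [1600003600000, 2.6e10]],
}

# Assumes the three lists are aligned row by row on the same timestamps
df = pd.DataFrame({
    "Timestamp": [ts for ts, _ in bdata["prices"]],
    "Prices": [value for _, value in bdata["prices"]],
    "Market_caps": [value for _, value in bdata["market_caps"]],
    "Total_volume": [value for _, value in bdata["total_volumes"]],
})

# Convert epoch milliseconds to datetime, then sort chronologically
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="ms")
df = df.sort_values("Timestamp").reset_index(drop=True)
print(df)
```

If the lists might not be aligned, building three frames indexed by timestamp and concatenating them with axis=1 is the safer route.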

How to filter columns in a MultiIndex dataframe (pandas)

I have the below dataframe:
WBA ... HD
Open High Low Close ... l-pc h-l h-pc l-pc
Datetime ...
2020-06-08 09:30:00-04:00 45.490002 46.090000 45.490002 46.049999 ... NaN 2.100006 NaN NaN
2020-06-08 09:35:00-04:00 46.070000 46.330002 46.040001 46.330002 ... 0.009998 1.119904 0.402496 0.717407
2020-06-08 09:40:00-04:00 46.330002 46.660000 46.240002 46.610001 ... 0.090000 0.874893 0.359894 0.514999
2020-06-08 09:45:00-04:00 46.624100 46.950001 46.624100 46.880001 ... 0.014099 0.639999 0.349991 0.290009
2020-06-08 09:50:00-04:00 46.880001 46.990002 46.820000 46.919998 ... 0.060001 0.490005 0.169998 0.320007
This dataframe was obtained using the code below:
import yfinance as yf
import pandas as pd
import datetime as dt
end=dt.datetime.today()
start=end-dt.timedelta(59)
tickers=['WBA', 'HD']
ohlcv={}
df=pd.DataFrame
df = yf.download(tickers,group_by=tickers,start=start,end=end,interval='5m')
for i in tickers:
    df[i,"h-l"]=abs(df[i]['High']-df[i]['Low'])
    df[i,'h-pc']=abs(df[i]["High"]-df[i]['Adj Close'].shift(1))
    df[i,'l-pc']=abs(df[i]["Low"]-df[i]['Adj Close'].shift(1))
I am trying to apply these calculations to all the tickers in the "tickers" list:
df['tr']=dff[['h-l','h-pc','l-pc']].max(axis=1)
df['atr']=df['tr'].rolling(window=n, min_periods=n).mean()
For each ticker I need to find the "tr" and then use it to compute the "atr", but I am not able to get the "tr".

Be systematic about accessing columns through tuples and it all just works.
import yfinance as yf
import pandas as pd
import datetime as dt
end=dt.datetime.today()
start=end-dt.timedelta(59)
tickers=['WBA', 'HD']
ohlcv={}
# df = yf.download(tickers,group_by=tickers,start=start,end=end,interval='5m')
dfc = df.copy()
for t in tickers:
    dfc[(t,"h-l")] = abs(dfc.loc[:,(t,'High')] - dfc.loc[:,(t,'Low')])
    dfc[(t,"h-pc")] = abs(dfc.loc[:,(t,'High')] - dfc.loc[:,(t,'Adj Close')].shift(1))
    dfc[(t,"l-pc")] = abs(dfc.loc[:,(t,'Low')] - dfc.loc[:,(t,'Adj Close')].shift(1))
# access all the new columns through tuples e.g ("WBA","h-l") ...
dfc["tr"] = dfc[[(t, c) for t in tickers for c in ['h-l','h-pc','l-pc']]].max(axis=1)
n=5
dfc["atr"] = dfc['tr'].rolling(window=n, min_periods=n).mean()
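Note that the single "tr" column above is the maximum across all tickers. If you want one "tr" and one "atr" per ticker instead, a sketch along these lines may help. It uses a small hand-made frame in place of the yfinance download, and n = 2 is an assumed window, since the question never defines n.

```python
import pandas as pd

# Hypothetical two-level frame standing in for the yfinance download
idx = pd.date_range("2020-06-08 09:30", periods=4, freq="5min")
cols = pd.MultiIndex.from_product([["WBA", "HD"], ["High", "Low", "Adj Close"]])
values = [
    [46.0, 45.5, 46.0, 254.0, 252.0, 253.5],
    [46.5, 46.0, 46.25, 254.5, 253.0, 253.0],
    [46.75, 46.25, 46.5, 254.0, 253.0, 253.5],
    [47.0, 46.5, 47.0, 254.0, 253.5, 254.0],
]
df = pd.DataFrame(values, index=idx, columns=cols)

n = 2  # assumed window length; not given in the question
for t in ["WBA", "HD"]:
    prev_close = df[(t, "Adj Close")].shift(1)
    # The three candidate ranges for this ticker, side by side
    ranges = pd.concat(
        [
            (df[(t, "High")] - df[(t, "Low")]).abs(),
            (df[(t, "High")] - prev_close).abs(),
            (df[(t, "Low")] - prev_close).abs(),
        ],
        axis=1,
    )
    # True range is the row-wise maximum; ATR is its rolling mean
    df[(t, "tr")] = ranges.max(axis=1)
    df[(t, "atr")] = df[(t, "tr")].rolling(window=n, min_periods=n).mean()

print(df[[("HD", "tr"), ("HD", "atr")]])
```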

How to add a column in multilevel Dataframe using pandas and yfinance?

I have the below code:
import yfinance as yf
import pandas as pd
import datetime as dt
end=dt.datetime.today()
start=end-dt.timedelta(59)
tickers=['WBA', 'HD']
ohlcv={}
df=pd.DataFrame
df = yf.download(tickers,group_by=tickers,start=start,end=end,interval='5m')
df['h-l']=abs(df.High-df.Low)
df['h-pc']=abs (df.High-df['Adj Close'].shift(1))
df['l-pc']=abs(df.Low-df['Adj Close'].shift(1))
df['tr']=df[['h-l','h-pc','l-pc']].max(axis=1)
df['atr']=df['tr'].rolling(window=n, min_periods=n).mean()
When I try to run it, I get the following error:
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'High'
I tried using this code:
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
but the resulting report has calculation errors because there is no separation between the tickers.
What I actually need is, for each ticker in the tickers list, a column called "h-l" that subtracts the low of that row from the high of that row, and so on.

Option 1: Multi-Level Column Names
Multi-level columns are accessed by passing a tuple:
df[('WBA', 'High')]
Package versions used:
print(pd.__version__) is at least '1.0.5'
print(yf.__version__) is '0.1.54'
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
end = datetime.today()
start = end - timedelta(59)
tickers = ['WBA', 'HD']
df = yf.download(tickers,group_by=tickers,start=start,end=end,interval='5m')
# iterate over level 0 ticker names
for ticker in tickers:
    df[(ticker, 'h-l')] = abs(df[(ticker, 'High')] - df[(ticker, 'Low')])
    df[(ticker, 'h-pc')] = abs(df[(ticker, 'High')] - df[(ticker, 'Adj Close')].shift(1))
    df[(ticker, 'l-pc')] = abs(df[(ticker, 'Low')] - df[(ticker, 'Adj Close')].shift(1))
    df[(ticker, 'tr')] = df[[(ticker, 'h-l'), (ticker, 'h-pc'), (ticker, 'l-pc')]].max(axis=1)
    # df[(ticker, 'atr')] = df[(ticker, 'tr')].rolling(window=n, min_periods=n).mean() # not included because n is not defined
# sort the columns
df = df.reindex(sorted(df.columns), axis=1)
# display(df.head())
HD WBA
Adj Close Close High Low Open Volume h-l h-pc l-pc tr Adj Close Close High Low Open Volume h-l h-pc l-pc tr
Datetime
2020-06-08 09:30:00-04:00 253.937500 253.937500 253.960007 252.360001 252.490005 210260.0 1.600006 NaN NaN 1.600006 46.049999 46.049999 46.070000 45.490002 45.490002 239860.0 0.579998 NaN NaN 0.579998
2020-06-08 09:35:00-04:00 253.470001 253.470001 254.339996 253.220093 253.990005 95906.0 1.119904 0.402496 0.717407 1.119904 46.330002 46.330002 46.330002 46.040001 46.070000 104259.0 0.290001 0.280003 0.009998 0.290001
2020-06-08 09:40:00-04:00 253.580002 253.580002 253.829895 252.955002 253.429993 55868.0 0.874893 0.359894 0.514999 0.874893 46.610001 46.610001 46.660000 46.240002 46.330002 113174.0 0.419998 0.329998 0.090000 0.419998
2020-06-08 09:45:00-04:00 253.740005 253.740005 253.929993 253.289993 253.529999 61892.0 0.639999 0.349991 0.290009 0.639999 46.880001 46.880001 46.950001 46.624100 46.624100 121388.0 0.325901 0.340000 0.014099 0.340000
2020-06-08 09:50:00-04:00 253.703400 253.703400 253.910004 253.419998 253.740005 60809.0 0.490005 0.169998 0.320007 0.490005 46.919998 46.919998 46.990002 46.820000 46.880001 154239.0 0.170002 0.110001 0.060001 0.170002
Option 2: Single-Level Column Names
As demonstrated in How to deal with multi-level column names downloaded with yfinance?, it's easier to deal with single-level column names.
With the tickers in a column instead of multi-level column headers, use pandas.DataFrame.groupby on the Ticker column.
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
end = datetime.today()
start = end - timedelta(59)
tickers = ['WBA', 'HD']
df = yf.download(tickers, group_by='Ticker', start=start, end=end, interval='5m')
# create single level column names
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
# function with calculations
def my_calculations(df):
    df['h-l']=abs(df.High-df.Low)
    df['h-pc']=abs(df.High-df['Adj Close'].shift(1))
    df['l-pc']=abs(df.Low-df['Adj Close'].shift(1))
    df['tr']=df[['h-l','h-pc','l-pc']].max(axis=1)
    # df['atr']=df['tr'].rolling(window=n, min_periods=n).mean() # n is not defined in the question
    return df
# apply the function
df_updated = df.reset_index().groupby('Ticker').apply(my_calculations).sort_values(['Ticker', 'Date'])
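As a possible variant of Option 2, the same columns can be computed without apply by shifting the close within each ticker group. The sketch below runs on a small hand-written long-format frame (the values are made up), so treat it as an illustration rather than a drop-in replacement.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Date, Ticker)
long_df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-08 09:30", "2020-06-08 09:35"] * 2),
    "Ticker": ["HD", "HD", "WBA", "WBA"],
    "High": [254.0, 254.5, 46.0, 46.5],
    "Low": [252.0, 253.0, 45.5, 46.0],
    "Adj Close": [253.5, 253.0, 46.0, 46.25],
})
long_df = long_df.sort_values(["Ticker", "Date"])

# Shift within each ticker so the previous close never leaks across tickers
prev_close = long_df.groupby("Ticker")["Adj Close"].shift(1)

long_df["h-l"] = (long_df["High"] - long_df["Low"]).abs()
long_df["h-pc"] = (long_df["High"] - prev_close).abs()
long_df["l-pc"] = (long_df["Low"] - prev_close).abs()
long_df["tr"] = long_df[["h-l", "h-pc", "l-pc"]].max(axis=1)
print(long_df)
```

The first row of every ticker has NaN in "h-pc" and "l-pc", which shows that the shift restarts at each group boundary.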

Here are some columns I've created. Finding the percent change from the previous day, finding the range, and the percent range.
df['% Change'] = (df['Adj Close'] / df['Adj Close'].shift(1))-1
df['Range'] = df['High'] - df['Low']
df['% Range'] = df['Range'] / df['Open']

How to properly download stocks data in python

I have been trying to extract stock prices using pandas_datareader.data, but I keep receiving an error message.
I have checked other threads relating to this problem, and I have tried installing the data reader with conda install DataReader and with pip install DataReader.
import pandas as pd
import datetime
from pandas import Series,DataFrame
import pandas_datareader.data as web
pandas_datareader.__version__
'0.6.0'
start=datetime.datetime(2009,1,1)
end=datetime.datetime(2019,1,1)
df=web.DataReader( 'AT&T Inc T',start,end)
df.head()
My expected result should be a data frame with all the features and rows of the stock.
Below is the error message I got:
Please, how do I fix this problem?
Thanks.
<ipython-input-45-d75bedd6b2dd> in <module>
1 start=datetime.datetime(2009,1,1)
2 end=datetime.datetime(2019,1,1)
----> 3 df=web.DataReader( 'AT&T Inc T',start,end)
4 df.head()
~\Anaconda3\lib\site-packages\pandas_datareader\data.py in DataReader(name,
data_source, start, end, retry_count, pause, session, access_key)
456 else:
457 msg = "data_source=%r is not implemented" % data_source
--> 458 raise NotImplementedError(msg)
459
460
NotImplementedError: data_source=datetime.datetime(2009, 1, 1, 0, 0) is not implemented

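The call in the question passes start as the second positional argument, which DataReader interprets as data_source (hence the error message). Use the ticker symbol 'T' and name the data source explicitly. The following worked: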
import pandas as pd
import datetime
from pandas import Series,DataFrame
import pandas_datareader
import pandas_datareader.data as web
pandas_datareader.__version__
start=datetime.datetime(2009,1,1)
end=datetime.datetime(2019,1,1)
df=web.DataReader( 'T', "yahoo", start,end)
print(df.head())
The output is as follows:
High Low ... Volume Adj Close
Date ...
2009-01-02 29.459999 28.430000 ... 21879800.0 16.438549
2009-01-05 28.889999 28.059999 ... 32414700.0 15.885386
2009-01-06 28.700001 28.000000 ... 28746100.0 15.812749
2009-01-07 27.650000 27.000000 ... 30532700.0 15.427205
2009-01-08 27.350000 26.820000 ... 21431200.0 15.410195
[5 rows x 6 columns]

This is how I would do it.
import pandas as pd
import numpy as np
from pandas_datareader import data as wb
import datetime as dt
start = '2019-6-20'
end = '2019-7-20'
tickers = ['CSCO', 'AXP', 'HD', 'PG']
thelen = len(tickers)
price_data = []
for ticker in tickers:
    prices = wb.DataReader(ticker, start = start, end = end, data_source='yahoo')[['Adj Close']]
    price_data.append(prices.assign(ticker=ticker)[['ticker', 'Adj Close']])
df = pd.concat(price_data)
df.dtypes
df.head()
df.shape
pd.set_option('display.max_columns', 500)
df = df.reset_index()
df = df.set_index('Date')
table = df.pivot(columns='ticker')
# By specifying col[1] in below list comprehension
# You can select the stock names under multi-level column
table.columns = [col[1] for col in table.columns]
table.head()
Result:
AXP CSCO HD PG
Date
2019-06-20 124.530563 57.049965 211.250000 111.021019
2019-06-21 124.341156 56.672348 209.389999 110.484497
2019-06-24 123.752991 56.821407 205.500000 111.607231
2019-06-25 122.776054 55.728306 204.740005 111.001152
2019-06-26 123.204704 56.245041 206.419998 109.023956
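An alternative that skips the pivot step is to concatenate one "Adj Close" series per ticker along the columns. A rough sketch, using two hand-built frames in place of the DataReader results (the `frames` dict is hypothetical; the numbers are copied from the result above):

```python
import pandas as pd

dates = pd.to_datetime(["2019-06-20", "2019-06-21"])

# Hypothetical per-ticker frames standing in for the DataReader results
frames = {
    "CSCO": pd.DataFrame({"Adj Close": [57.049965, 56.672348]}, index=dates),
    "AXP": pd.DataFrame({"Adj Close": [124.530563, 124.341156]}, index=dates),
}

# With a dict and axis=1, the dict keys become the column names
table = pd.concat({t: f["Adj Close"] for t, f in frames.items()}, axis=1)
table.index.name = "Date"
print(table)
```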

Pandas Yahoo Stock API

I am new to Pandas (and Python) and am trying to work with the Yahoo API for stock prices.
I need to get the data, loop through it, and grab the dates and values.
Here is the code:
df = pd.get_data_yahoo( symbols = 'AAPL',
start = datetime( 2011, 1, 1 ),
end = datetime( 2012, 1, 1 ),
interval = 'm' )
results are:
df
Open High Low Close Volume
Date
2011-01-03 325.640015 348.600006 324.840027 339.320007 140234700
2011-02-01 341.299988 364.899994 337.720001 353.210022 127618700
2011-03-01 355.470001 361.669983 326.259979 348.510010 125874700
I can get the dates, but not the month or day value, because the date is the index(?)
How best to loop through the data for this information? This is about processing the data and not sorting or searching it.

If you need to iterate over the rows in your dataframe, and do some processing, then pandas.DataFrame.apply() works great.
Code:
Some mock processing code...
def process_data(row):
    # the index becomes the name when converted to a series (row)
    print(row.name.month, row.Close)
Test Code:
import datetime as dt
from pandas_datareader import data
df = data.get_data_yahoo(
'AAPL',
start=dt.datetime(2011, 1, 1),
end=dt.datetime(2011, 5, 1),
interval='m')
print(df)
# process each row
df.apply(process_data, axis=1)
Results:
Open High Low Close Volume \
Date
2011-01-03 325.640015 348.600006 324.840027 339.320007 140234700
2011-02-01 341.299988 364.899994 337.720001 353.210022 127618700
2011-03-01 355.470001 361.669983 326.259979 348.510010 125874700
2011-04-01 351.110016 355.130005 320.160004 350.130005 128252100
Adj Close
Date
2011-01-03 43.962147
2011-02-01 45.761730
2011-03-01 45.152802
2011-04-01 45.362682
1 339.320007
2 353.210022
3 348.51001
4 350.130005
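If all you need is the month and day alongside the values, the DatetimeIndex exposes them directly, so apply is optional. A small sketch on a hand-built frame (the Close values are copied from the results above; the `summary` name is just for illustration):

```python
import pandas as pd

# Hand-built stand-in for the downloaded frame, indexed by date
df = pd.DataFrame(
    {"Close": [339.320007, 353.210022, 348.510010]},
    index=pd.to_datetime(["2011-01-03", "2011-02-01", "2011-03-01"]),
)
df.index.name = "Date"

# Plain loop: each index entry is a Timestamp with .month and .day
for ts, close in df["Close"].items():
    print(ts.month, ts.day, close)

# Vectorized: pull month and day out of the index in one go
summary = pd.DataFrame({
    "month": df.index.month,
    "day": df.index.day,
    "close": df["Close"].to_numpy(),
})
print(summary)
```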

Here is what made my life groovy when working with the data from Yahoo.
First was getting the date from the dataframe index:
df = df.assign( date = df.index.date )
Here are a few others I found helpful when dealing with the data:
df [ 'diff' ] = df [ 'Close' ].diff( )
df [ 'pct_chg' ] = df [ 'Close' ].pct_change()
df [ 'hl' ] = df [ 'High' ] - df [ 'Low' ]
Pandas is amazing stuff.

I believe this should work for you.
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2013, 1, 1)
end = datetime.datetime(2016, 1, 27)
df = web.DataReader("GOOGL", 'yahoo', start, end)
dates =[]
for x in range(len(df)):
    newdate = str(df.index[x])
    newdate = newdate[0:10]
    dates.append(newdate)
df['dates'] = dates
print(df.head())
print(df.tail())
Also, take a look at the link below for more helpful hints of how to do these kinds of things.
https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#yahoo-finance
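The loop above can probably be replaced by a single strftime call on the index. A minimal sketch with a made-up two-row frame (the Close values are placeholders):

```python
import pandas as pd

# Placeholder frame with a DatetimeIndex, standing in for the downloaded data
df = pd.DataFrame(
    {"Close": [1.0, 2.0]},
    index=pd.to_datetime(["2013-01-02", "2016-01-27"]),
)

# Format every index entry as YYYY-MM-DD without an explicit loop
df["dates"] = df.index.strftime("%Y-%m-%d")
print(df)
```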

from pandas_datareader import data as pdr
from datetime import date
import yfinance as yf
yf.pdr_override()
import pandas as pd
import requests
import json
from os import listdir
from os.path import isfile, join
# Tickers List
tickers_list = ['AAPL', 'GOOGL','FB', 'WB' , 'MO']
today = date.today()
# We can get data by our choice by giving days bracket
start_date= "2010-01-01"
files=[]
def getData(ticker):
    print(ticker)
    data = pdr.get_data_yahoo(ticker, start=start_date, end=today)
    dataname = ticker+'_'+str(today)
    files.append(dataname)
    SaveData(data, dataname)
# Create a ./data folder beforehand; the CSV files are saved into it.
def SaveData(df, filename):
    df.to_csv('./data/'+filename+'.csv')
for tik in tickers_list:
    getData(tik)
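Since to_csv fails when the target folder does not exist, one option is to create it on the fly. Below is a sketch of a hypothetical save_data helper (not part of the code above) that does this; it writes into a temporary directory here so it can be run safely.

```python
from pathlib import Path
import tempfile

import pandas as pd


def save_data(df, filename, folder):
    """Write df to <folder>/<filename>.csv, creating the folder if needed."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{filename}.csv"
    df.to_csv(path)
    return path


# Demo in a throwaway directory; the frame and file name are placeholders
with tempfile.TemporaryDirectory() as tmp:
    sample = pd.DataFrame({"Close": [1.0, 2.0]})
    out = save_data(sample, "AAPL_2020-01-01", Path(tmp) / "data")
    saved_ok = out.exists()
    round_trip = pd.read_csv(out, index_col=0)

print(saved_ok)
```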
