Plot three lines - one line, per symbol, per date - python

Plot three lines - one line, per symbol, per date
import pandas as pd
import matplotlib.pyplot as plt
           symbol   price  interest
Date
2016-04-22   AAPL  445.50      0.00
2016-04-22   GOOG  367.02     21.52
2016-04-22   MSFT  248.94      3.44
2016-04-15   AAPL  425.51      0.00
2016-04-15   GOOG  338.57     13.06
2016-04-15   MSFT  226.66      1.15
Currently I split the dataframe into three different frames:
df1 = df[df.symbol == 'AAPL']
df2 = df[df.symbol == 'GOOG']
df3 = df[df.symbol == 'MSFT']
Then I plot them:
plt.plot(df1.index, df1.price.values,
         df2.index, df2.price.values,
         df3.index, df3.price.values)
Is it possible to plot the prices of these three symbols straight from the dataframe?

try this:
ax = df[df.symbol=='AAPL'].plot()
df[df.symbol=='GOOG'].plot(ax=ax)
df[df.symbol=='MSFT'].plot(ax=ax)
plt.show()

import numpy as np
import pandas as pd

# Create sample data.
np.random.seed(0)
df = (pd.DataFrame(np.random.randn(100, 3), columns=list('ABC'),
                   index=pd.date_range('2016-1-1', periods=100))
      .cumsum()
      .reset_index()
      .rename(columns={'index': 'date'}))
df = pd.melt(df, id_vars='date', value_vars=['A', 'B', 'C'], value_name='price', var_name='symbol')
df['interest'] = 100
>>> df.head()
date symbol price interest
0 2016-01-01 A 1.764052 100
1 2016-01-02 A 4.004946 100
2 2016-01-03 A 4.955034 100
3 2016-01-04 A 5.365632 100
4 2016-01-05 A 6.126670 100
# Generate plot.
plot_df = (df.loc[df.symbol.isin(['A', 'B', 'C']), ['date', 'symbol', 'price']]
           .set_index(['symbol', 'date'])
           .unstack('symbol'))
plot_df.columns = plot_df.columns.droplevel()
>>> plot_df.plot()
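A shorter route to the same wide layout, as a minimal sketch: it rebuilds the six rows from the question (with Date as the index, which is an assumption about how the original frame was loaded) and uses DataFrame.pivot so that each symbol becomes its own column.

```python
import pandas as pd

# Rebuild the question's sample frame; Date is assumed to be the index.
df = pd.DataFrame(
    {
        "symbol": ["AAPL", "GOOG", "MSFT", "AAPL", "GOOG", "MSFT"],
        "price": [445.50, 367.02, 248.94, 425.51, 338.57, 226.66],
        "interest": [0.00, 21.52, 3.44, 0.00, 13.06, 1.15],
    },
    index=pd.to_datetime(["2016-04-22"] * 3 + ["2016-04-15"] * 3),
)
df.index.name = "Date"

# One column per symbol, one row per date, prices as values.
wide = df.pivot(columns="symbol", values="price").sort_index()
print(wide)
```

Calling wide.plot() on the result should then draw one line per symbol with the dates on the x-axis, without splitting the frame by hand.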

Related

How to append to another df from inside a for loop

How can you append to an existing df from inside a for loop? For example:
import pandas as pd
from pandas_datareader import data as web
stocks = ['amc', 'aapl']
colnames = ['Datetime', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'Name']
df1 = pd.DataFrame(data=None, columns=colnames)
for stock in stocks:
    df = web.DataReader(stock, 'yahoo')
    df['Name'] = stock
What should I do next so that df is appended to df1?
You could try pandas.concat()
df1 = pd.DataFrame(data=None, columns=colnames)
for stock in stocks:
    df = web.DataReader(stock, 'yahoo')
    df['Name'] = stock
    df1 = pd.concat([df1, df], ignore_index=True)
Instead of concatenating inside every iteration, you could also append each dataframe to a list and concatenate once at the end:
dfs = []
for stock in stocks:
    df = web.DataReader(stock, 'yahoo')
    df['Name'] = stock
    dfs.append(df)
df_ = pd.concat(dfs, ignore_index=True)
print(df_)
High Low Open Close Volume Adj Close Name
0 32.049999 31.549999 31.900000 31.549999 1867000.0 24.759203 amc
1 31.799999 30.879999 31.750000 31.000000 1225900.0 24.327585 amc
2 31.000000 30.350000 30.950001 30.799999 932100.0 24.170631 amc
3 30.900000 30.250000 30.700001 30.350000 1099000.0 23.817492 amc
4 30.700001 30.100000 30.549999 30.650000 782500.0 24.052916 amc
... ... ... ... ... ... ... ...
2515 179.009995 176.339996 176.690002 178.960007 100589400.0 178.960007 aapl
2516 179.610001 176.699997 178.550003 177.770004 92633200.0 177.770004 aapl
2517 178.029999 174.399994 177.839996 174.610001 103049300.0 174.610001 aapl
2518 174.880005 171.940002 174.029999 174.309998 78699800.0 174.309998 aapl
2519 174.880005 171.940002 174.029999 174.309998 78751328.0 174.309998 aapl
[2520 rows x 7 columns]
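Note that ignore_index=True replaces the Date index with 0..N-1, which is why the output above has no dates. A minimal sketch of one way to keep them, using a hypothetical fetch_prices helper as a stand-in for web.DataReader (the helper and its numbers are invented so the example runs offline):

```python
import pandas as pd


def fetch_prices(symbol):
    """Hypothetical stand-in for web.DataReader; returns a tiny made-up frame."""
    dates = pd.date_range("2022-03-30", periods=3, name="Date")
    base = {"amc": 25.0, "aapl": 175.0}[symbol]
    return pd.DataFrame({"Close": [base, base + 1, base + 2]}, index=dates)


frames = []
for symbol in ["amc", "aapl"]:
    frame = fetch_prices(symbol)
    frame["Name"] = symbol
    frames.append(frame)

# Concatenate once, then turn the Date index into a regular column
# so that it survives alongside the Name column.
combined = pd.concat(frames).reset_index()
print(combined)
```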
What you're trying to do won't quite work, since the data retrieved by DataReader has several columns and you need that data for several stocks. However, each of those columns is a time series.
So what you probably want is something that looks like this:
Stock amc
Field High Low Open ...
2022-03-30 29.230000 25.350000 ...
2022-03-31 25.920000 23.260000 ...
2022-04-01 25.280001 22.340000 ...
2022-04-01 25.280001 22.340000 ...
...
You could then use df[('amc', 'Low')] to get a time series for that stock, or df[('amc', 'Low')]['2022-04-01'][0] to get the 'Low' value for 'amc' on April 1st.
This gets you exactly that:
import pandas as pd
from pandas_datareader import data as web
stocks = ['amc', 'aapl']
df = pd.DataFrame()
for stock_name in stocks:
    stock_df = web.DataReader(stock_name, data_source='yahoo')
    for col in stock_df:
        df[(stock_name, col)] = stock_df[col]
df.columns = pd.MultiIndex.from_tuples(df.columns, names=['Stock', 'Field'])
print(f'\nall data:\n{"-"*40}\n', df)
print(f'\none series:\n{"-"*40}\n', df[('aapl', 'Volume')])
print(f'\nsingle value:\n{"-"*40}\n', df[('amc', 'Low')]['2022-04-01'][0])
The solution uses a MultiIndex to achieve what you need. It first loads all the data as retrieved from the API into columns labeled with tuples of stock name and field, and it then converts that into a proper MultiIndex after loading completes.
Output:
all data:
----------------------------------------
Stock amc ... aapl
Field High Low ... Volume Adj Close
Date ...
2017-04-04 32.049999 31.549999 ... 79565600.0 34.171505
2017-04-05 31.799999 30.879999 ... 110871600.0 33.994480
2017-04-06 31.000000 30.350000 ... 84596000.0 33.909496
2017-04-07 30.900000 30.250000 ... 66688800.0 33.833969
2017-04-10 30.700001 30.100000 ... 75733600.0 33.793839
... ... ... ... ... ...
2022-03-29 34.330002 26.410000 ... 100589400.0 178.960007
2022-03-30 29.230000 25.350000 ... 92633200.0 177.770004
2022-03-31 25.920000 23.260000 ... 103049300.0 174.610001
2022-04-01 25.280001 22.340000 ... 78699800.0 174.309998
2022-04-01 25.280001 22.340000 ... 78751328.0 174.309998
[1260 rows x 12 columns]
one series:
----------------------------------------
Date
2017-04-04 79565600.0
2017-04-05 110871600.0
2017-04-06 84596000.0
2017-04-07 66688800.0
2017-04-10 75733600.0
...
2022-03-29 100589400.0
2022-03-30 92633200.0
2022-03-31 103049300.0
2022-04-01 78699800.0
2022-04-01 78751328.0
Name: (aapl, Volume), Length: 1260, dtype: float64
single value:
----------------------------------------
22.34000015258789
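The same (Stock, Field) column layout can also be built in one step by passing a dict of frames to pd.concat with axis=1. This is only a sketch: the two small frames below use illustrative numbers in place of the DataReader results.

```python
import pandas as pd

dates = pd.date_range("2022-03-30", periods=3, name="Date")

# Illustrative values standing in for the downloaded data.
frames = {
    "amc": pd.DataFrame(
        {"High": [29.23, 25.92, 25.28], "Low": [25.35, 23.26, 22.34]}, index=dates
    ),
    "aapl": pd.DataFrame(
        {"High": [179.61, 178.03, 174.88], "Low": [176.70, 174.40, 171.94]}, index=dates
    ),
}

# The dict keys become the outer column level, the original columns the inner one.
wide = pd.concat(frames, axis=1, names=["Stock", "Field"])

low_series = wide[("amc", "Low")]
single_value = wide.loc[pd.Timestamp("2022-04-01"), ("amc", "Low")]
print(wide)
print(single_value)
```

Using .loc with a (row, column) pair reads a single cell directly, which avoids the chained [...][...][0] lookup.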

How to make a pandas dataframe for-loop (for a stock market API)

I'm trying to get a stock data metric from an API into a Pandas Dataframe (Debt/Equity ratio for a company).
I've been successful in getting the data for a single company, but would like to do it with several companies at a time.
The code I used for a single company is:
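import json
from urllib.request import urlopen
import pandas as pd
from pandas import json_normalize

# Variables
ticker = "AAPL"
FMP_API = "<api_key_here>"
data = "balance-sheet-statement"

def get_jsonparsed_data(url):
    # Fetch the URL and parse the JSON response body
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)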
# Download info from API
url = "https://financialmodelingprep.com/api/v3/"+data+"/"+ticker+"?limit=120&apikey="+FMP_API
results = get_jsonparsed_data(url)
df = json_normalize(results)
# Calculate Debt/Equity Ratio
df[ticker] = df.totalLiabilities / df.totalStockholdersEquity
df = df[["date", ticker]].round(2)
# Convert the date column from string to datetime,
# make it the index and name the axis 'date',
# then drop the now-redundant date column.
datetime_series = pd.to_datetime(df['date'])
datetime_index = pd.DatetimeIndex(datetime_series.values)
df = df.set_index(datetime_index).rename_axis('date', axis=1)
df.drop('date',axis=1,inplace=True)
df.head()
The result I get is:
date AAPL
2020-09-26 3.96
2019-09-28 2.74
2018-09-29 2.41
2017-09-30 1.80
2016-09-24 1.51
date
AAPL float64
dtype: object
What I would like to get is:
ticker = ["AAPL", "FB", "GOOG", "AMZN"]
date AAPL FB GOOG AMZN
2020-09-26 3.96 0.24 0.44 2.44
2019-09-28 2.74 0.32 0.37 2.63
2018-09-29 2.41 0.16 0.31 2.73
2017-09-30 1.80 0.14 0.29 3.74
2016-09-24 1.51 0.10 0.20 3.32
date
AAPL float64
FB float64
GOOG float64
AMZN float64
dtype: object
I tried using a for loop, but I keep overwriting the same dataframe and only get the values for the last ticker in the list.
Use pandas merge to combine the per-ticker dataframes on their shared date index.
You can also simplify the code where you set the date as the index:
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
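A minimal sketch of the merge suggestion, assuming two per-ticker frames that already share the same date index. The AAPL and FB values are the ones shown in the question; the frame names are made up for the example.

```python
import pandas as pd

dates = pd.to_datetime(["2020-09-26", "2019-09-28"])

# Two small per-ticker frames with a common date index.
aapl = pd.DataFrame({"AAPL": [3.96, 2.74]}, index=dates)
fb = pd.DataFrame({"FB": [0.24, 0.32]}, index=dates)

# Join on the index of both frames instead of on a column.
merged = aapl.merge(fb, left_index=True, right_index=True)
merged.index.name = "date"
print(merged)
```

With more than two tickers, merging pairwise gets verbose; collecting the columns in a list and concatenating once is usually simpler.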
Found the answer.
The idea is to create an empty list and fill it in a for loop.
Each iteration gets the data for one ticker, and at the end we build a dataframe by concatenating the list.
Here is the code:
FMP_API = "<api_code_here>"
data = "balance-sheet-statement"
tickers = ['AAPL', 'MSFT']
df_list = []
for ticker in tickers:
    url = "https://financialmodelingprep.com/api/v3/" + data + "/" + ticker + "?limit=120&apikey=" + FMP_API
    df = pd.read_json(url)
    df.set_index('date', inplace=True)
    df[ticker] = df.totalLiabilities / df.totalStockholdersEquity
    df_list.append(df[ticker])
df_final = pd.concat(df_list, axis=1)
print(df_final)
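To try the pattern without an API key, here is a sketch that replaces the download with hypothetical in-memory records. The field names match the ones used above; the numbers and the differing report dates are invented for illustration.

```python
import pandas as pd

# Hypothetical records shaped like the balance-sheet fields used above.
payloads = {
    "AAPL": [
        {"date": "2020-09-26", "totalLiabilities": 396.0, "totalStockholdersEquity": 100.0},
        {"date": "2019-09-28", "totalLiabilities": 274.0, "totalStockholdersEquity": 100.0},
    ],
    "MSFT": [
        {"date": "2020-06-30", "totalLiabilities": 150.0, "totalStockholdersEquity": 100.0},
        {"date": "2019-06-30", "totalLiabilities": 180.0, "totalStockholdersEquity": 100.0},
    ],
}

ratios = []
for ticker, records in payloads.items():
    frame = pd.DataFrame(records)
    frame["date"] = pd.to_datetime(frame["date"])
    frame = frame.set_index("date")
    ratio = (frame["totalLiabilities"] / frame["totalStockholdersEquity"]).round(2)
    ratios.append(ratio.rename(ticker))  # name each series after its ticker

# Align on date; dates missing for a ticker become NaN.
df_final = pd.concat(ratios, axis=1).sort_index(ascending=False)
print(df_final)
```

If the companies report on different dates, as assumed here, the combined frame has NaN wherever a ticker has no value for a given date.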

Rolling mean returns over DataFrame

I want to add columns to the following DataFrame holding the 5-year (60-month) rolling returns of each stock. The following code obtains the financial data for the period 1995 to 2010.
quandl.ApiConfig.api_key = 'Enter Key'
stocks = ['MSFT', 'AAPL', 'WMT', 'GE', 'KO']
stockdata = quandl.get_table('WIKI/PRICES', ticker = stocks, paginate=True,
                             qopts = { 'columns': ['date', 'ticker', 'adj_close'] },
                             date = { 'gte': '1995-1-1', 'lte': '2010-12-31' })
# Setting date as index with columns of tickers and adjusted closing price
df = stockdata.pivot(index = 'date',columns='ticker')
df.index = pd.to_datetime(df.index)
df.resample('1M').mean()
df = df.pct_change()
df.head()
Out[1]:
rets
ticker AAPL BA F GE JNJ KO
date
1995-01-03 NaN NaN NaN NaN NaN NaN
1995-01-04 0.026055 -0.002567 0.026911 0.000000 0.006972 -0.019369
1995-01-05 -0.012697 0.002573 -0.008735 0.002549 -0.002369 -0.004938
1995-01-06 0.080247 0.018824 0.000000 -0.004889 -0.006758 0.000000
1995-01-09 -0.019048 0.000000 0.017624 -0.009827 -0.011585 -0.014887
df.tail()
Out[2]:
rets
ticker AAPL BA F GE JNJ KO
date
2010-12-27 0.003337 -0.004765 0.005364 0.008315 -0.005141 -0.007777
2010-12-28 0.002433 0.001699 -0.008299 0.007147 0.001938 0.004457
2010-12-29 -0.000553 0.002929 0.000598 -0.002729 0.001289 0.001377
2010-12-30 -0.005011 -0.000615 -0.002987 -0.004379 -0.003058 0.000764
2010-12-31 -0.003399 0.003846 0.005992 0.005498 -0.001453 0.004122
Any assistance with how to do this would be awesome!
The problem is in the multi-level index in the columns. We can start by selecting the second level index, and after that the rolling mean works:
means = df['rets'].rolling(60).mean()
means.tail()
The error you are receiving is due to passing the entire dataframe into the rolling function while your frame uses a MultiIndex. You can't pass a MultiIndex frame to a rolling function, since rolling only accepts NumPy arrays of one column. You'll probably have to create a for loop and return the values individually per ticker.
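Note that in the question df.resample('1M').mean() is never assigned, so the returns shown are still daily and rolling(60) would span 60 trading days rather than 60 months. A rough sketch of a monthly version on synthetic prices follows; the data, the month-end grouping via to_period, and the 3-month window (chosen to keep the example small, use 60 for five years) are all assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices with the same two-level column layout as the pivot result.
dates = pd.date_range("1995-01-01", "1995-12-31", freq="D")
prices = pd.DataFrame(
    {
        ("adj_close", "AAPL"): np.linspace(1.0, 2.0, len(dates)),
        ("adj_close", "KO"): np.linspace(10.0, 12.0, len(dates)),
    },
    index=dates,
)

# Keep the last price of each calendar month, then compute monthly returns.
monthly = prices.groupby(prices.index.to_period("M")).last()
monthly_returns = monthly.pct_change()

# Select the top column level, then take the rolling mean per ticker.
rolling_mean = monthly_returns["adj_close"].rolling(window=3).mean()
print(rolling_mean)
```

The first rows are NaN because the first monthly return is undefined and the window needs three valid returns before it produces a value.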

Pandas/NumPy -- Plotting Dates as X axis

My goal is simply to plot this data as a graph, with dates on the x-axis and price on the y-axis. As I understand it, the dtype of the NumPy record array for the date field is datetime64[D], which means it is a 64-bit np.datetime64 in 'day' units. While this format is more portable, Matplotlib cannot plot it natively yet. The data can be plotted by changing the dates to datetime.date instances instead, which is achieved by converting to an object array; I did that below via astype('O'). But I am still getting this error:
view limit minimum -36838.00750000001 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-DateTime value to an axis that has DateTime units
code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(r'avocado.csv')
df2 = df[['Date','AveragePrice','region']]
df2 = (df2.loc[df2['region'] == 'Albany'])
df2['Date'] = pd.to_datetime(df2['Date'])
df2['Date'] = df2.Date.astype('O')
plt.style.use('ggplot')
ax = df2[['Date','AveragePrice']].plot(kind='line', title ="Price Change",figsize=(15,10),legend=True, fontsize=12)
ax.set_xlabel("Period",fontsize=12)
ax.set_ylabel("Price",fontsize=12)
plt.show()
df.head(3)
Unnamed: 0 Date AveragePrice Total Volume 4046 4225 4770 Total Bags Small Bags Large Bags XLarge Bags type year region
0 0 2015-12-27 1.33 64236.62 1036.74 54454.85 48.16 8696.87 8603.62 93.25 0.0 conventional 2015 Albany
1 1 2015-12-20 1.35 54876.98 674.28 44638.81 58.33 9505.56 9408.07 97.49 0.0 conventional 2015 Albany
2 2 2015-12-13 0.93 118220.22 794.70 109149.67 130.50 8145.35 8042.21 103.14 0.0 conventional 2015 Albany
df2 = df[['Date', 'AveragePrice', 'region']]
df2 = (df2.loc[df2['region'] == 'Albany'])
df2['Date'] = pd.to_datetime(df2['Date'])
df2 = df2[['Date', 'AveragePrice']]
df2 = df2.sort_values(['Date'])
df2 = df2.set_index('Date')
print(df2)
ax = df2.plot(kind='line', title="Price Change")
ax.set_xlabel("Period", fontsize=12)
ax.set_ylabel("Price", fontsize=12)
plt.show()
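The key change in the code above is that Date becomes a sorted datetime index, so pandas uses it for the x-axis. A small sketch of just that preparation step, using the three Albany rows shown in the question plus one made-up row for another region instead of the CSV file:

```python
import pandas as pd

# Three rows from the question's df.head(3); the Atlanta row is invented.
df = pd.DataFrame(
    {
        "Date": ["2015-12-27", "2015-12-20", "2015-12-13", "2015-12-27"],
        "AveragePrice": [1.33, 1.35, 0.93, 1.10],
        "region": ["Albany", "Albany", "Albany", "Atlanta"],
    }
)

# .copy() avoids assigning into a slice of the original frame.
albany = df.loc[df["region"] == "Albany", ["Date", "AveragePrice"]].copy()
albany["Date"] = pd.to_datetime(albany["Date"])

# A datetime index in ascending order is what the plot needs.
prices = albany.set_index("Date")["AveragePrice"].sort_index()
print(prices)
```

Calling prices.plot(title="Price Change") on this series should then label the x-axis with dates.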

Updating pandas DataFrame by key

I have a dataframe of historical stock trades. The frame has columns like ['ticker', 'date', 'cusip', 'profit', 'security_type']. Initially:
trades['cusip'] = np.nan
trades['security_type'] = np.nan
I have historical config files that I can load into frames that have columns like ['ticker', 'cusip', 'date', 'name', 'security_type', 'primary_exchange'].
I would like to UPDATE the trades frame with the cusip and security_type from config, but only where the ticker and date match.
I thought I could do something like:
pd.merge(trades, config, on=['ticker', 'date'], how='left')
But that doesn't update the columns, it just adds the config columns to trades.
The following works, but I think there has to be a better way. If not, I will probably do it outside of pandas.
for date in trades['date'].unique():
    config = get_config_file_as_df(date)
    ## config['date'] == date
    for ticker in trades['ticker'][trades['date'] == date]:
        trades['cusip'][
            (trades['ticker'] == ticker)
            & (trades['date'] == date)
        ] \
            = config['cusip'][config['ticker'] == ticker].values[0]
        trades['security_type'][
            (trades['ticker'] == ticker)
            & (trades['date'] == date)
        ] \
            = config['security_type'][config['ticker'] == ticker].values[0]
Suppose you have this setup:
import pandas as pd
import numpy as np
import datetime as DT
nan = np.nan
trades = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date' : pd.date_range('1/1/2000', periods = 4),
                       'cusip' : [nan, nan, 100, nan]
                       })
trades = trades.set_index(['ticker', 'date'])
print(trades)
# cusip
# ticker date
# IBM 2000-01-01 NaN
# MSFT 2000-01-02 NaN
# GOOG 2000-01-03 100 # <-- We do not want to overwrite this
# AAPL 2000-01-04 NaN
config = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date' : pd.date_range('1/1/2000', periods = 4),
                       'cusip' : [1,2,3,nan]})
config = config.set_index(['ticker', 'date'])
# Let's permute the index to show `DataFrame.update` correctly matches rows based on the index, not on the order of the rows.
new_index = sorted(config.index)
config = config.reindex(new_index)
print(config)
# cusip
# ticker date
# AAPL 2000-01-04 NaN
# GOOG 2000-01-03 3
# IBM 2000-01-01 1
# MSFT 2000-01-02 2
Then you can update NaN values in trades with values from config using the DataFrame.update method. Note that DataFrame.update matches rows based on indices (which is why set_index was called above).
trades.update(config, join = 'left', overwrite = False)
print(trades)
# cusip
# ticker date
# IBM 2000-01-01 1
# MSFT 2000-01-02 2
# GOOG 2000-01-03 100 # If overwrite = True, then 100 is overwritten by 3.
# AAPL 2000-01-04 NaN
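If you would rather keep ticker and date as ordinary columns, the merge from the question can also be made to work: merge with a suffix for the config columns, fill the gaps, and drop the helper column. A minimal sketch on the same sample data (the _cfg suffix is an arbitrary choice for this example):

```python
import numpy as np
import pandas as pd

trades = pd.DataFrame(
    {
        "ticker": ["IBM", "MSFT", "GOOG", "AAPL"],
        "date": pd.date_range("1/1/2000", periods=4),
        "cusip": [np.nan, np.nan, 100, np.nan],
    }
)
config = pd.DataFrame(
    {
        "ticker": ["IBM", "MSFT", "GOOG", "AAPL"],
        "date": pd.date_range("1/1/2000", periods=4),
        "cusip": [1, 2, 3, np.nan],
    }
)

# A left merge keeps every trade; config's cusip arrives as cusip_cfg.
merged = trades.merge(
    config[["ticker", "date", "cusip"]],
    on=["ticker", "date"],
    how="left",
    suffixes=("", "_cfg"),
)

# Fill only the missing cusips, so existing values such as 100 are kept.
merged["cusip"] = merged["cusip"].fillna(merged["cusip_cfg"])
merged = merged.drop(columns="cusip_cfg")
print(merged)
```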
