multi-index dataframe causes wide separation between plotted data - python

I have the following plot (image not included):
My pandas dataset uses a MultiIndex (image not included).
Below is my code:
ax = plt.gca()
df['adjClose'].plot(ax=ax, figsize=(12,4), rot=9, grid=True, label='price', color='orange')
df['ma5'].plot(ax=ax, label='ma5', color='yellow')
df['ma100'].plot(ax=ax, label='ma100', color='green')
# df.plot.scatter(x=df.index, y='buy')
x = pd.to_datetime(df.unstack(level=0).index, format='%Y/%m/%d')
# plt.scatter(x, df['buy'].values)
ax.scatter(x, y=df['buy'].values, label='buy', marker='^', color='red')
ax.scatter(x, y=df['sell'].values, label='sell', marker='v', color='green')
plt.show()
Data from .csv
symbol,date,close,high,low,open,volume,adjClose,adjHigh,adjLow,adjOpen,adjVolume,divCash,splitFactor,ma5,ma100,buy,sell
601398,2020-01-01 00:00:00+00:00,5.88,5.88,5.88,5.88,0,5.2991971571,5.2991971571,5.2991971571,5.2991971571,0,0.0,1.0,,,,
601398,2020-01-02 00:00:00+00:00,5.97,6.03,5.91,5.92,234949400,5.3803073177,5.4343807581,5.3262338773,5.3352461174,234949400,0.0,1.0,,,,
601398,2020-01-03 00:00:00+00:00,5.99,6.02,5.96,5.97,152213050,5.3983317978,5.425368518,5.3712950777,5.3803073177,152213050,0.0,1.0,,,,
601398,2020-01-06 00:00:00+00:00,5.97,6.05,5.95,5.96,226509710,5.3803073177,5.4524052382,5.3622828376,5.3712950777,226509710,0.0,1.0,,,,
The data above is how the dataframe looks after I save it to CSV, but after reloading it, the original structure is lost.

The issue, as can be seen in the plot, is that the first 3 lines are plotted against the dataframe index, which presents as tuples. The scatter plots are plotted against the datetime values in x, which are not values on the x-axis of ax, so they're plotted far to the right.
- The axis ticks are a bunch of stacked tuples.
Don't convert the dataframe to a multi-index. If you're doing something, which creates the multi-index, then do df.reset_index(level=x, inplace=True) where x represents the level where 'symbol' is in the multi-index.
After removing 'symbol' from the index, convert 'date' to a datetime dtype with df.index = pd.to_datetime(df.index).date
Presumably, there's more than one unique 'symbol' in the dataframe, so a separate plot should be drawn for each.
Tested in pandas 1.3.1, python 3.8, and matplotlib 3.4.2
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# load the data from the csv
df = pd.read_csv('file.csv')
# convert date to a datetime format and extract only the date component
df.date = pd.to_datetime(df.date).dt.date
# set date as the index
df.set_index('date', inplace=True)
# this is what the dataframe should look like before plotting
symbol close high low open volume adjClose adjHigh adjLow adjOpen adjVolume divCash splitFactor ma5 ma100 buy sell
date
2020-01-01 601398 5.88 5.88 5.88 5.88 0 5.30 5.30 5.30 5.30 0 0.0 1.0 NaN NaN NaN NaN
2020-01-02 601398 5.97 6.03 5.91 5.92 234949400 5.38 5.43 5.33 5.34 234949400 0.0 1.0 NaN NaN NaN NaN
2020-01-03 601398 5.99 6.02 5.96 5.97 152213050 5.40 5.43 5.37 5.38 152213050 0.0 1.0 NaN NaN NaN NaN
2020-01-06 601398 5.97 6.05 5.95 5.96 226509710 5.38 5.45 5.36 5.37 226509710 0.0 1.0 NaN NaN NaN NaN
# extract the unique symbols
symbols = df.symbol.unique()
# get the number of unique symbols
sym_len = len(symbols)
# create a number of subplots based on the number of unique symbols in df
fig, axes = plt.subplots(nrows=sym_len, ncols=1, figsize=(12, 4*sym_len))
# if there's only 1 symbol, axes won't be iterable, so we put it in a list
if type(axes) != np.ndarray:
    axes = [axes]
# iterate through each symbol and plot the relevant data to an axes
for ax, sym in zip(axes, symbols):
    # select the data for the relevant symbol
    data = df[df.symbol.eq(sym)]
    # plot data
    data[['adjClose', 'ma5', 'ma100']].plot(ax=ax, title=f'Data for Symbol: {sym}', ylabel='Value')
    ax.scatter(data.index, y=data['buy'], label='buy', marker='^', color='red')
    ax.scatter(data.index, y=data['sell'], label='sell', marker='v', color='green')
    ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
fig.tight_layout()
In the resulting figure (not included), data.high and data.low were plotted for the scatter plots, since data.buy and data.sell are np.nan in the test data.
df can be conveniently created with:
sample = {'symbol': [601398, 601398, 601398, 601398], 'date': ['2020-01-01 00:00:00+00:00', '2020-01-02 00:00:00+00:00', '2020-01-03 00:00:00+00:00', '2020-01-06 00:00:00+00:00'], 'close': [5.88, 5.97, 5.99, 5.97], 'high': [5.88, 6.03, 6.02, 6.05], 'low': [5.88, 5.91, 5.96, 5.95], 'open': [5.88, 5.92, 5.97, 5.96], 'volume': [0, 234949400, 152213050, 226509710], 'adjClose': [5.2991971571, 5.3803073177, 5.3983317978, 5.3803073177], 'adjHigh': [5.2991971571, 5.4343807581, 5.425368518, 5.4524052382], 'adjLow': [5.2991971571, 5.3262338773, 5.3712950777, 5.3622828376], 'adjOpen': [5.2991971571, 5.3352461174, 5.3803073177, 5.3712950777], 'adjVolume': [0, 234949400, 152213050, 226509710], 'divCash': [0.0, 0.0, 0.0, 0.0], 'splitFactor': [1.0, 1.0, 1.0, 1.0], 'ma5': [np.nan, np.nan, np.nan, np.nan], 'ma100': [np.nan, np.nan, np.nan, np.nan], 'buy': [np.nan, np.nan, np.nan, np.nan], 'sell': [np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(sample)
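As an alternative to the type check on axes in the code above, plt.subplots accepts squeeze=False, which always returns a 2-D array of axes, even for a single subplot. A minimal sketch, where sym_len = 1 is an assumed value standing in for a dataframe with one unique symbol:

```python
import matplotlib.pyplot as plt

# assumed value: pretend the dataframe has exactly one unique symbol
sym_len = 1

# squeeze=False always returns a 2-D array of Axes with shape (nrows, ncols)
fig, axes = plt.subplots(nrows=sym_len, ncols=1, figsize=(12, 4 * sym_len), squeeze=False)

# flatten to 1-D so it can be zipped with the symbols, as in the loop above
axes = axes.ravel()
```

With this, the same for loop works whether there is one symbol or many, without wrapping axes in a list.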

I just found another way to solve my problem:
df = df.unstack(level=0)
This is tested and works for me.
I think the following is similar, based on Trenton's last advice:
df.reset_index(level=0, inplace=True)
df.index = df.index.date
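The effect of these two lines can be checked on a small frame. This is only a sketch: the two-row sample and the level names 'symbol' and 'date' are assumptions modelled on the CSV in the question, not the asker's actual dataframe.

```python
import pandas as pd

# hypothetical two-row sample with a (symbol, date) MultiIndex
idx = pd.MultiIndex.from_arrays(
    [[601398, 601398], pd.to_datetime(['2020-01-01', '2020-01-02'], utc=True)],
    names=['symbol', 'date'],
)
df = pd.DataFrame({'adjClose': [5.2992, 5.3803]}, index=idx)

# move 'symbol' out of the index and into a regular column
flat = df.reset_index(level='symbol')

# keep only the calendar date, so lines and scatter points share one plain date axis
flat.index = flat.index.date
```

After this, flat.index holds plain datetime.date values instead of (symbol, date) tuples, so it can be passed as x to ax.scatter alongside the line plots.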

Related

Plot Price as Horizontal Line for Non Zero Volume Values

My Code:
import matplotlib.pyplot as plt
plt.style.use('seaborn-ticks')
import pandas as pd
import numpy as np
path = 'C:\\File\\Data.txt'
df = pd.read_csv(path, sep=",")
df.columns = ['Date','Time','Price','volume']
df = df[df.Date == '08/02/2019'].reset_index(drop=True)
df['Volume'] = np.where((df.volume/1000) < 60, 0, (df.volume/1000))
df.plot('Time','Price')
dff = df[df.Volume > 60].reset_index(drop=True)
dff = dff[['Date','Time','Price','Volume']]
print(dff)
plt.subplots_adjust(left=0.05, bottom=0.05, right=0.95, top=0.95, wspace=None, hspace=None)
plt.show()
My plot output is as below (image not included):
The output of the dff dataframe is as below:
Date Time Price Volume
0 08/02/2019 13:39:43 685.35 97.0
1 08/02/2019 13:39:57 688.80 68.0
2 08/02/2019 13:43:50 683.00 68.0
3 08/02/2019 13:43:51 681.65 92.0
4 08/02/2019 13:49:42 689.95 70.0
5 08/02/2019 13:52:00 695.20 64.0
6 08/02/2019 14:56:42 686.25 68.0
7 08/02/2019 15:03:15 685.35 63.0
8 08/02/2019 15:03:31 683.15 69.0
9 08/02/2019 15:08:08 684.00 61.0
I want to plot the prices in this table as vertical lines, as in the image below (not included). Any help is appreciated.
Based on your image, I think you mean horizontal lines. Either way it's pretty simple, Pyplot has hlines/vlines builtins. In your case, try something like
plt.hlines(dff['Price'], '08/02/2019', '09/02/2019')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
path = 'File.txt'
df = pd.read_csv(path, sep=",")
df.columns = ['Date','Time','Price','volume']
df = df[df.Date == '05/02/2019'].reset_index(drop=True)
df['Volume'] = np.where((df.volume/7500) < 39, 0, (df.volume/7500))
df["Time"] = pd.to_datetime(df['Time'])
df.plot(x="Time",y='Price', rot=0)
plt.title("Date: " + str(df['Date'].iloc[0]))
dff = df[df.Volume > 39].reset_index(drop=True)
dff = dff[['Date','Time','Price','Volume']]
print(dff)
# map each row position to its values (named rows to avoid shadowing the builtin dict)
rows = dff.to_dict('index')
for x in range(0, len(rows)):
    plt.axhline(y=rows[x]['Price'], linewidth=2, color='blue')
plt.subplots_adjust(left=0.05, bottom=0.06, right=0.95, top=0.96, wspace=None, hspace=None)
plt.show()
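The hlines suggestion above can also be written with datetime limits, drawing all lines in one call. This is a sketch under assumptions: the four-row sample is hypothetical (shaped like the question's data, with volume already scaled), and the lines are made to span the plotted time range.

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical sample in the shape of the question's data
df = pd.DataFrame({
    'Time': pd.to_datetime(['2019-02-08 13:39:43', '2019-02-08 13:43:50',
                            '2019-02-08 13:49:42', '2019-02-08 14:56:42']),
    'Price': [685.35, 683.00, 689.95, 686.25],
    'Volume': [97.0, 10.0, 70.0, 68.0],
})

# keep only the rows whose volume exceeds the threshold
dff = df[df.Volume > 60]

fig, ax = plt.subplots()
ax.plot(df['Time'], df['Price'])

# one horizontal line per selected price, spanning the plotted time range
lines = ax.hlines(y=dff['Price'], xmin=df['Time'].min(), xmax=df['Time'].max(),
                  colors='blue', linewidth=2)
```

Call plt.show() afterwards to display the figure. Unlike axhline, hlines takes explicit start and end points, so the lines can be limited to part of the axis if needed.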

Pandas/NumPy -- Plotting Dates as X axis

My goal is just to plot this simple data as a graph, with dates on the x-axis and price on the y-axis. As I understand it, the dtype of the NumPy record array for the date field is datetime64[D], which means it is a 64-bit np.datetime64 in 'day' units. While this format is more portable, Matplotlib cannot plot this format natively yet. We can plot this data by changing the dates to datetime.date instances instead, which can be achieved by converting to an object array, which I did below via astype('O'). But I am still getting this error:
view limit minimum -36838.00750000001 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-DateTime value to an axis that has DateTime units
code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(r'avocado.csv')
df2 = df[['Date','AveragePrice','region']]
df2 = (df2.loc[df2['region'] == 'Albany'])
df2['Date'] = pd.to_datetime(df2['Date'])
df2['Date'] = df2.Date.astype('O')
plt.style.use('ggplot')
ax = df2[['Date','AveragePrice']].plot(kind='line', title ="Price Change",figsize=(15,10),legend=True, fontsize=12)
ax.set_xlabel("Period",fontsize=12)
ax.set_ylabel("Price",fontsize=12)
plt.show()
df.head(3)
Unnamed: 0 Date AveragePrice Total Volume 4046 4225 4770 Total Bags Small Bags Large Bags XLarge Bags type year region
0 0 2015-12-27 1.33 64236.62 1036.74 54454.85 48.16 8696.87 8603.62 93.25 0.0 conventional 2015 Albany
1 1 2015-12-20 1.35 54876.98 674.28 44638.81 58.33 9505.56 9408.07 97.49 0.0 conventional 2015 Albany
2 2 2015-12-13 0.93 118220.22 794.70 109149.67 130.50 8145.35 8042.21 103.14 0.0 conventional 2015 Albany
Keep 'Date' as a datetime dtype (skip the astype('O') conversion), sort by it, and set it as the index, so that pandas uses it for the x-axis:
df2 = df[['Date', 'AveragePrice', 'region']]
df2 = (df2.loc[df2['region'] == 'Albany'])
df2['Date'] = pd.to_datetime(df2['Date'])
df2 = df2[['Date', 'AveragePrice']]
df2 = df2.sort_values(['Date'])
df2 = df2.set_index('Date')
print(df2)
ax = df2.plot(kind='line', title="Price Change")
ax.set_xlabel("Period", fontsize=12)
ax.set_ylabel("Price", fontsize=12)
plt.show()

Pandas dataframe groupby plot

I have a dataframe which is structured as:
Date ticker adj_close
0 2016-11-21 AAPL 111.730
1 2016-11-22 AAPL 111.800
2 2016-11-23 AAPL 111.230
3 2016-11-25 AAPL 111.790
4 2016-11-28 AAPL 111.570
...
8 2016-11-21 ACN 119.680
9 2016-11-22 ACN 119.480
10 2016-11-23 ACN 119.820
11 2016-11-25 ACN 120.740
...
How can I plot based on the ticker the adj_close versus Date?
For a simple plot, you can use:
df.plot(x='Date',y='adj_close')
Or you can set the index to be Date beforehand, then it's easy to plot the column you want:
df.set_index('Date', inplace=True)
df['adj_close'].plot()
If you want a chart with one series per ticker on it, you need to group by ticker first:
df.set_index('Date', inplace=True)
df.groupby('ticker')['adj_close'].plot(legend=True)
If you want a chart with individual subplots:
grouped = df.groupby('ticker')
ncols=2
nrows = int(np.ceil(grouped.ngroups/ncols))
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(12,4), sharey=True)
for (key, ax) in zip(grouped.groups.keys(), axes.flatten()):
    grouped.get_group(key).plot(ax=ax)
ax.legend()
plt.show()
Similar to Julien's answer above, I had success with the following:
fig, ax = plt.subplots(figsize=(10,4))
# group by a single column name so that key is the ticker string, not a 1-tuple
for key, grp in df.groupby('ticker'):
    ax.plot(grp['Date'], grp['adj_close'], label=key)
ax.legend()
plt.show()
This solution might be more relevant if you want more control in Matplotlib.
Solution inspired by: https://stackoverflow.com/a/52526454/10521959
The question is How can I plot based on the ticker the adj_close versus Date?
This can be accomplished by reshaping the dataframe to a wide format with .pivot or .groupby, or by plotting the existing long form dataframe directly with seaborn.
In the following sample data, the 'Date' column has a datetime64[ns] Dtype.
Convert the Dtype with pandas.to_datetime if needed.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
Imports and Sample Data
import pandas as pd
import pandas_datareader as web # for sample data; this can be installed with conda if using Anaconda, otherwise pip
import seaborn as sns
import matplotlib.pyplot as plt
# sample stock data, where .iloc[:, [5, 6]] selects only the 'Adj Close' and 'ticker' columns
tickers = ['aapl', 'acn']
df = pd.concat((web.DataReader(ticker, data_source='yahoo', start='2020-01-01', end='2022-06-21')
                .assign(ticker=ticker) for ticker in tickers)).iloc[:, [5, 6]]
# display(df.head())
Date Adj Close ticker
0 2020-01-02 73.785904 aapl
1 2020-01-03 73.068573 aapl
2 2020-01-06 73.650795 aapl
3 2020-01-07 73.304420 aapl
4 2020-01-08 74.483604 aapl
# display(df.tail())
Date Adj Close ticker
1239 2022-06-14 275.119995 acn
1240 2022-06-15 281.190002 acn
1241 2022-06-16 270.899994 acn
1242 2022-06-17 275.380005 acn
1243 2022-06-21 282.730011 acn
pandas.DataFrame.pivot & pandas.DataFrame.plot
pandas plots with matplotlib as the default backend.
Reshaping the dataframe with pandas.DataFrame.pivot converts from long to wide form, and puts the dataframe into the correct format to plot.
.pivot does not aggregate data, so if there is more than 1 observation per index, per ticker, then use .pivot_table
Adding subplots=True will produce a figure with two subplots.
# reshape the long form data into a wide form
dfp = df.pivot(index='Date', columns='ticker', values='Adj Close')
# display(dfp.head())
ticker aapl acn
Date
2020-01-02 73.785904 203.171112
2020-01-03 73.068573 202.832764
2020-01-06 73.650795 201.508224
2020-01-07 73.304420 197.157654
2020-01-08 74.483604 197.544434
# plot
ax = dfp.plot(figsize=(11, 6))
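The note above about .pivot_table can be illustrated with a small frame. This is a sketch with hypothetical prices, in which aapl deliberately has two observations on the same date:

```python
import pandas as pd

# hypothetical long-form data; aapl appears twice on 2020-01-02
df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-02', '2020-01-02', '2020-01-02',
                            '2020-01-03', '2020-01-03']),
    'ticker': ['aapl', 'aapl', 'acn', 'aapl', 'acn'],
    'Adj Close': [73.0, 75.0, 203.0, 73.5, 202.5],
})

# .pivot cannot reshape when a (Date, ticker) pair is duplicated
try:
    df.pivot(index='Date', columns='ticker', values='Adj Close')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# .pivot_table aggregates the duplicates instead; here they are averaged
dfp = df.pivot_table(index='Date', columns='ticker', values='Adj Close', aggfunc='mean')
```

The resulting dfp has one column per ticker and can be plotted with dfp.plot() in the same way as the .pivot result.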
Use seaborn, which accepts long form data, so reshaping the dataframe to a wide form isn't necessary.
seaborn is a high-level api for matplotlib
sns.lineplot: axes-level plot
fig, ax = plt.subplots(figsize=(11, 6))
sns.lineplot(data=df, x='Date', y='Adj Close', hue='ticker', ax=ax)
sns.relplot: figure-level plot
Adding row='ticker', or col='ticker', will generate a figure with two subplots.
g = sns.relplot(kind='line', data=df, x='Date', y='Adj Close', hue='ticker', aspect=1.75)

How to make time column to be x-axis of a plot?

I am reading an excel file using pandas and trying to read one of its sheets to plot some results. Below is my code;
df = pd.read_excel('Base_Case-Position.xlsx', sheet_name='Sheet1', header=0, usecols="A:E")
print(df)
Time WB NB WBO SBO
0 09:00:00 0.242661 0.839820 0.449634 0.484678
1 10:00:00 0.809247 1.545173 1.129107 1.147414
2 11:00:00 1.519679 2.051029 1.766170 1.699770
3 12:00:00 1.748682 2.291056 2.018005 1.879778
4 13:00:00 1.790151 2.384782 2.123876 1.913225
5 14:00:00 1.966337 2.612614 2.344493 2.094139
6 15:00:00 2.295261 3.030992 2.686752 2.503890
7 16:00:00 2.412628 3.232904 2.772683 2.737191
8 17:00:00 2.476746 3.354741 2.781410 2.923059
I would like to plot columns WB, NB, WBO and SBO against Time and would like to add Time column as xtick.
Below is the way I am doing it at the moment, but it does not look right to me, as I am setting the labels manually. I have created a list for the xticks. Is there a better way of doing this?
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(9)
plt.scatter(a,df.WB,color='r',alpha=1)
plt.plot(a,df.NB, color='g',alpha=1)
plt.plot(a,df.WBO, color='k',alpha=1)
plt.plot(a,df.SBO,color='m',alpha=1)
plt.xlabel('Time (hr)')
plt.ylabel('Energy Consumption (kWh)')
labels = ['9:00', '10:00', '11:00', '12:00','13:00', '14:00', '15:00', '16:00','17:00']
plt.xticks(a,labels,rotation='horizontal')
plt.grid(True)
plt.tick_params(which='major', length=4)
plt.tick_params(which='minor', length=4)
plt.show()
Well, this would be the simplest way to do it:
df.set_index('Time').plot()
plt.xticks(map(str, df['Time']), rotation=50)
plt.show()
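A variation on this idea is to turn the Time values into strings first, so that pandas plots against row positions and each position can be labelled explicitly. This is a sketch, assuming a hypothetical three-row subset of the question's data in which Time is already text; values read from Excel as time objects would be converted by the same astype(str) call.

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical three-row subset of the question's data
df = pd.DataFrame({
    'Time': ['09:00:00', '10:00:00', '11:00:00'],
    'WB': [0.242661, 0.809247, 1.519679],
    'NB': [0.839820, 1.545173, 2.051029],
})

# string labels make pandas plot the lines against positions 0..n-1
labels = df['Time'].astype(str)
ax = df.assign(Time=labels).set_index('Time').plot()

# put one tick at every row position and label it with the time
ax.set_xticks(range(len(df)))
ax.set_xticklabels(labels.tolist(), rotation=50)
ax.set_xlabel('Time (hr)')
```

Call plt.show() to display the result. This avoids maintaining a separate hand-written list of tick labels.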

Plot three lines - one line, per symbol, per date

import pandas as pd
import matplotlib.pyplot as plt
symbol price interest
Date
2016-04-22 AAPL 445.50 0.00
2016-04-22 GOOG 367.02 21.52
2016-04-22 MSFT 248.94 3.44
2016-04-15 AAPL 425.51 0.00
2016-04-15 GOOG 338.57 13.06
2016-04-15 MSFT 226.66 1.15
Currently I split the dataframe into three different frames:
df1 = df[df.symbol == 'AAPL']
df2 = df[df.symbol == 'GOOG']
df3 = df[df.symbol == 'MSFT']
Then I plot them:
plt.plot(df1.index, df1.price.values,
df2.index, df2.price.values,
df3.index, df3.price.values)
Is it possible to plot these three symbols prices straight from the dataframe?
try this:
ax = df[df.symbol=='AAPL'].plot()
df[df.symbol=='GOOG'].plot(ax=ax)
df[df.symbol=='MSFT'].plot(ax=ax)
plt.show()
import numpy as np
# Create sample data.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(100, 3), columns=list('ABC'), index=pd.date_range('2016-1-1', periods=100)).cumsum().reset_index().rename(columns={'index': 'date'})
df = pd.melt(df, id_vars='date', value_vars=['A', 'B', 'C'], value_name='price', var_name='symbol')
df['interest'] = 100
>>> df.head()
date symbol price interest
0 2016-01-01 A 1.764052 100
1 2016-01-02 A 4.004946 100
2 2016-01-03 A 4.955034 100
3 2016-01-04 A 5.365632 100
4 2016-01-05 A 6.126670 100
# Generate plot.
plot_df = (df.loc[df.symbol.isin(['A', 'B', 'C']), ['date', 'symbol', 'price']]
           .set_index(['symbol', 'date'])
           .unstack('symbol'))
plot_df.columns = plot_df.columns.droplevel()
>>> plot_df.plot()
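The same wide frame can also be produced with DataFrame.pivot, which skips the droplevel step. The following sketch uses a small, hypothetical five-day sample (not the 100-day data above) to compare the two routes:

```python
import numpy as np
import pandas as pd

# hypothetical long-form sample: 3 symbols over 5 days
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.standard_normal((5, 3)).cumsum(axis=0), columns=list('ABC'),
                    index=pd.date_range('2016-01-01', periods=5, name='date'))
df = wide.reset_index().melt(id_vars='date', var_name='symbol', value_name='price')

# route 1: set a (symbol, date) index on the price column, then unstack the symbols
via_unstack = df.set_index(['symbol', 'date'])['price'].unstack('symbol')

# route 2: pivot directly from the long form
via_pivot = df.pivot(index='date', columns='symbol', values='price')
```

Both frames have one column per symbol indexed by date, so calling .plot() on either draws one line per symbol on a shared date axis.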
