I am working with pandas data that has too many time points (x ticks). I have found several ways to reduce them, but my problem is that I'm using two different data sets with different time points. How can I reduce the x ticks and also align the time points of the two data sets?
Yes, I know about sharex.
My plot-generating code is as follows:
fig, (ax1, ax2,ax3,ax4) = plt.subplots(4, 1,figsize=(10,7), sharex="all")
fig.subplots_adjust(bottom=0.2)
ax1.plot( df.time, df['B'], color='k')
ax1.plot( df.time, df['Bx'], color='r')
ax1.plot( df.time, df['By'], color='b')
ax1.plot( df.time, df['Bz'], color='g')
ax1.xaxis.grid(True,alpha=0.3)
ax1.set_ylabel('Bx,By,Bz,B[nT]')
ax2.plot(df1.time, df1['v_total'],color='k')
ax2.plot(df1.time, df1['Vx'],color='r')
ax2.plot(df1.time, df1['Vy'],color='b')
ax2.plot(df1.time, df1['Vz'],color='g')
ax2.xaxis.grid(True,alpha=0.3)
ax2.set_ylabel('Vx,Vy,Vz,V[km/s]')
ax3.plot(df1.time, df1['n'],color='k')
ax3.xaxis.grid(True,alpha=0.3)
ax3.set_ylabel('Np[1/cm^3]')
ax4.plot(df1.time, df1['T'],color='k')
ax4.xaxis.grid(True,alpha=0.3)
ax4.set_ylabel('T[k]')
#loc = mdates.MinuteLocator([0,30])
#ax2.xaxis.set_major_locator(loc)
#ax2.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
#ax3.xaxis.set_major_locator(loc)
#ax3.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
#ax4.xaxis.set_major_locator(loc)
#ax4.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
#ax2 = plt.gca()
#ax2.xaxis.set_major_locator(mdates.MinuteLocator(interval=10))
#ax2.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
#ax3 = plt.gca()
#ax3.xaxis.set_major_locator(mdates.MinuteLocator(interval=10))
#ax3.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
#ax4 = plt.gca()
#ax4.xaxis.set_major_locator(mdates.MinuteLocator(interval=10))
#ax4.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
fig.suptitle('Shock format')
plt.savefig('plot.png')
plt.savefig('plot1.pdf')
plt.show()
Here df is the data set with many points. I want to reduce the x ticks/time points of df and align df1 to df. The commented-out lines are my attempt, but they take too long and give me this warning: "Locator attempting to generate 359569 ticks ([-113.5, ..., 2383.5]), which exceeds Locator.MAXTICKS (1000)."
The graph output is as follows:
This is what my goal should look like:
I created sample data myself, since providing the real data seemed difficult.
The main point is the byminute parameter, which lists the minutes of each hour that may carry a tick; by default it is every minute, range(60). The interval parameter specifies at what interval those minute marks are actually displayed. For 15-minute marks I used np.arange(0, 60, 15) together with interval=2, which results in ticks at minutes 00 and 30.
import pandas as pd
import numpy as np
date_rng = pd.date_range('2022-05-31 00:00:00', freq='1s', periods=43200)
df = pd.DataFrame({'datetime': pd.to_datetime(date_rng), 'value':np.random.randn(43200).cumsum()})
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
fig, (ax1,ax2,ax3,ax4) = plt.subplots(4, 1,figsize=(12,6), sharex="all")
fig.subplots_adjust(bottom=0.2)
ax1.plot(df.datetime, df['value'], color='b')
ax2.plot(df.datetime, df['value'], color='g')
ax3.plot(df.datetime, df['value'], color='r')
ax4.plot(df.datetime, df['value'], color='k')
minutes = mdates.MinuteLocator(byminute=np.arange(0,60,15),interval=2)
minutes_fmt = mdates.DateFormatter('%d %H:%M')
ax4.xaxis.set_major_locator(minutes)
ax4.xaxis.set_major_formatter(minutes_fmt)
ax4.tick_params(axis='x', labelrotation=45)
plt.show()
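If a fixed MinuteLocator still produces too many ticks for a long time range, one alternative is to let Matplotlib pick the tick spacing. The following is only a sketch under assumptions: the frames `fast` and `slow` and their column names are hypothetical stand-ins for two data sets sampled at different rates, and the cap of 8 ticks is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Two hypothetical series on different time grids (1 second and 1 minute).
fast = pd.DataFrame({"time": pd.date_range("2022-05-31", freq="1s", periods=43200)})
fast["value"] = np.random.randn(len(fast)).cumsum()
slow = pd.DataFrame({"time": pd.date_range("2022-05-31", freq="1min", periods=720)})
slow["value"] = np.random.randn(len(slow)).cumsum()

# sharex=True puts both panels on one common time axis.
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 6), sharex=True)
ax1.plot(fast["time"], fast["value"], color="b")
ax2.plot(slow["time"], slow["value"], color="k")

# AutoDateLocator chooses the spacing itself; maxticks caps how many ticks it aims for.
locator = mdates.AutoDateLocator(maxticks=8)
ax2.xaxis.set_major_locator(locator)
ax2.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

fig.canvas.draw()
n_ticks = len(ax2.get_xticks())
```

Because both series are plotted against real datetime values, the shared axis lines them up by time even though their sampling rates differ.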
This question is from this tutorial found here:
I want my plot to look like the one below, but with time series data, and with the zoomed data coming from a different source rather than from x_lim/y_lim limits on the same data.
In the plot above, the top panel would show intraday data from one source and the bottom panel would show daily data for some stock. Because the two come from different sources, I cannot zoom by setting limits. I will be using the Yahoo data reader for the daily data and yfinance for the intraday data.
The code:
import pandas as pd
from pandas_datareader import data as web
from matplotlib.patches import ConnectionPatch
df = web.DataReader('goog', 'yahoo')
df.Close = pd.to_numeric(df['Close'], errors='coerce')
fig = plt.figure(figsize=(6, 5))
plt.subplots_adjust(bottom = 0., left = 0, top = 1., right = 1)
sub1 = fig.add_subplot(2,2,1)
sub1 = df.Close.plot()
sub2 = fig.add_subplot(2,1,2) # two rows, one column, second cell
df.Close.pct_change().plot(ax =sub2)
sub2.plot(theta, y, color = 'orange')
con1 = ConnectionPatch(xyA=(df[1:2].index, df[2:3].Close), coordsA=sub1.transData,
xyB=(df[4:5].index, df[5:6].Close), coordsB=sub2.transData, color = 'green')
fig.add_artist(con1)
I am having trouble with the xy coordinates. With the code above I am getting:
TypeError: Cannot cast array data from dtype('O') to dtype('float64')
according to the rule 'safe'
xyA=(df[1:2].index, df[2:3].Close)
What I did here is use the date df[1:2].index as my x value and the price df[2:3].Close as my y value.
Is converting the df to an array and then plotting my only option here? If there is any other way to get the ConnectionPatch to work, please advise.
df.dtypes
High float64
Low float64
Open float64
Close float64
Volume int64
Adj Close float64
dtype: object
Matplotlib plots dates by converting them to floats counting days, starting with 0 on 1970-1-1 (the default epoch since Matplotlib 3.3), i.e. the POSIX timestamp zero. It differs from the POSIX timestamp in resolution: "1" is a day instead of a second.
There are three ways to compute that number:
either use matplotlib.dates.date2num,
or use .toordinal(), which gives you the right resolution, and subtract the offset corresponding to 1970-1-1,
or get the POSIX timestamp and divide by the number of seconds in a day:
df['Close'] = pd.to_numeric(df['Close'], errors='coerce')
df['Change'] = df['Close'].pct_change()
con1 = ConnectionPatch(xyA=(df.index[0].toordinal() - pd.Timestamp(0).toordinal(), df['Close'].iloc[0]), coordsA=sub1.transData,
xyB=(df.index[1].toordinal() - pd.Timestamp(0).toordinal(), df['Change'].iloc[1]), coordsB=sub2.transData, color='green')
fig.add_artist(con1)
con2 = ConnectionPatch(xyA=(df.index[-1].timestamp() / 86_400, df['Close'].iloc[-1]), coordsA=sub1.transData,
xyB=(df.index[-1].timestamp() / 86_400, df['Change'].iloc[-1]), coordsB=sub2.transData, color='green')
fig.add_artist(con2)
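As a quick sanity check that the three conversions agree, here is a small sketch. It assumes Matplotlib 3.3 or later with the default epoch of 1970-01-01, and the timestamp `ts` is an arbitrary example value, not taken from the question's data.

```python
import pandas as pd
from matplotlib.dates import date2num

# Arbitrary midnight timestamp, used only for illustration.
ts = pd.Timestamp("2021-03-15")

# 1. Matplotlib's own converter (given a plain datetime here).
via_date2num = float(date2num(ts.to_pydatetime()))

# 2. Proleptic Gregorian ordinal minus the ordinal of 1970-01-01.
via_ordinal = ts.toordinal() - pd.Timestamp(0).toordinal()

# 3. POSIX seconds divided by the number of seconds in a day.
via_posix = ts.timestamp() / 86_400

print(via_date2num, via_ordinal, via_posix)
```

Note that .toordinal() only counts whole days, so for timestamps with a time-of-day component only date2num and the POSIX division keep the fractional part.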
You also need to make sure that you're using values that are in range for the targeted axes. In your example you use Close values on sub2, which contains pct_change'd values.
Of course, if you want the lines to start at the bottom corners of the box, as in your example, it's easier to express those coordinates using the axes transform instead of the data transform:
from matplotlib.dates import date2num
con1 = ConnectionPatch(xyA=(0, 0), coordsA=sub1.transAxes,
xyB=(date2num(df.index[1]), df['Change'].iloc[1]), coordsB=sub2.transData, color='green')
fig.add_artist(con1)
con2 = ConnectionPatch(xyA=(1, 0), coordsA=sub1.transAxes,
xyB=(date2num(df.index[-1]), df['Change'].iloc[-1]), coordsB=sub2.transData, color='green')
fig.add_artist(con2)
To plot your candlesticks, I’d recommend using the mplfinance (previously matplotlib.finance) package:
import mplfinance as mpf
sub3 = fig.add_subplot(2, 2, 2)
mpf.plot(df.iloc[30:70], type='candle', ax=sub3)
Putting all this together in a single script, it could look like this:
import pandas as pd, mplfinance as mpf, matplotlib.pyplot as plt
from pandas_datareader import data as web
from matplotlib.patches import ConnectionPatch
from matplotlib.dates import date2num, ConciseDateFormatter, AutoDateLocator
from matplotlib.ticker import PercentFormatter
# Get / compute data
df = web.DataReader('goog', 'yahoo')
df['Close'] = pd.to_numeric(df['Close'], errors='coerce')
df['Change'] = df['Close'].pct_change()
# Pick zoom range
zoom_start = df.index[30]
zoom_end = df.index[30 + 8 * 5] # 8 weeks ~ 2 months
# Create figures / axes
fig = plt.figure(figsize=(18, 12))
top_left = fig.add_subplot(2, 2, 1)
top_right = fig.add_subplot(2, 2, 2)
bottom = fig.add_subplot(2, 1, 2)
fig.subplots_adjust(hspace=.35)
# Plot all 3 data
df['Close'].plot(ax=bottom, linewidth=1, rot=0, title='Daily closing value', color='purple')
bottom.set_ylim(0)
df.loc[zoom_start:zoom_end, 'Change'].plot(ax=top_left, linewidth=1, rot=0, title='Daily Change, zoomed')
top_left.yaxis.set_major_formatter(PercentFormatter(xmax=1))  # pct_change() returns fractions, so 1.0 means 100%
# Here instead of df.loc[...] use your intra-day data
mpf.plot(df.loc[zoom_start:zoom_end], type='candle', ax=top_right, xrotation=0, show_nontrading=True)
top_right.set_title('Last day OHLC')
# Put ConciseDateFormatters on all x-axes for fancy date display
for ax in fig.axes:
    locator = AutoDateLocator()
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(ConciseDateFormatter(locator))
# Add the connection patches
fig.add_artist(ConnectionPatch(
    xyA=(0, 0), coordsA=top_left.transAxes,
    xyB=(date2num(zoom_start), df.loc[zoom_start, 'Close']), coordsB=bottom.transData,
    color='green'
))
fig.add_artist(ConnectionPatch(
    xyA=(1, 0), coordsA=top_left.transAxes,
    xyB=(date2num(zoom_end), df.loc[zoom_end, 'Close']), coordsB=bottom.transData,
    color='green'
))
plt.show()
I'd like to create a stacked barplot of asset weights representing a financial portfolio over time. I tried several approaches and got the most pleasing results with matplotlib's stackplot function. However, I am not able to display negative asset weights in my stackplot, so the resulting figures are wrong. I am using Python (3.8.3) and Matplotlib (3.3.2).
The following displays the head of the asset weights dataframe to plot:
w_minvar1nc.head()
SMACAP GROWTH MOMTUM MINVOL QUALITY
Date
2015-02-20 0.012942 0.584273 -0.114441 0.387773 0.129454
2015-02-23 0.013129 0.584528 -0.115836 0.386448 0.131732
2015-02-24 0.013487 0.584404 -0.116585 0.386364 0.132330
2015-02-25 0.015145 0.572256 -0.117796 0.387583 0.142811
2015-02-26 0.015113 0.567198 -0.114580 0.387807 0.144462
The following displays a simple code snippet of my current approach to the stackplot:
# initialize stackplot
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
# create and format stackplot
ax.stackplot(w_minvar1nc.index, w_minvar1nc.SMACAP, w_minvar1nc.GROWTH, w_minvar1nc.MOMTUM, w_minvar1nc.MINVOL, w_minvar1nc.QUALITY)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
# save stackplot
fig.savefig(fname=(plotpath + "test.png"))
plt.clf()
plt.close()
And here comes the corresponding stackplot itself in which you can see that the negative asset weights don't show up:
Does anyone know how to deal with that problem? Any ideas would be much appreciated.
PS: Of course I've already tried other approaches such as stacking the data manually and then create a regular barplot etc. And in this case the positive and negative asset weights are actually displayed correctly, but this approach also leads to even bigger problems regarding the formatting of the x-axis because of the daily data.
If the columns are separated into positive and negative weights, you can plot them separately:
from matplotlib import pyplot as plt
import pandas as pd
#fake data
import numpy as np
np.random.seed(123)
n = 100
df = pd.DataFrame({"Dates": pd.date_range("20180101", periods=n, freq="10d"),
"A": 0.2 + np.random.random(n)/10,
"B": -np.random.random(n)/10,
"C": -0.1-np.random.random(n)/10,
"D": 0.3+ np.random.random(n)/10})
df.set_index("Dates", inplace=True)
df["E"] = 1 - df.A - df.D - df.B - df.C
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
ax.stackplot(df.index, df.A, df.D, df.E)
ax.stackplot(df.index, df.B, df.C)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
plt.show()
Sample output:
Here is the solution to the problem, with huge credit to @Mr. T:
# split data into negative and positive values
w_minvar1nc_pos = w_minvar1nc[w_minvar1nc >= 0].fillna(0)
w_minvar1nc_neg = w_minvar1nc[w_minvar1nc < 0].fillna(0)
# initialize stackplot
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
# create and format stackplot
ax.stackplot(w_minvar1nc_pos.index, w_minvar1nc_pos.SMACAP, w_minvar1nc_pos.GROWTH, w_minvar1nc_pos.MOMTUM, w_minvar1nc_pos.MINVOL, w_minvar1nc_pos.QUALITY)
ax.stackplot(w_minvar1nc_neg.index, w_minvar1nc_neg.SMACAP, w_minvar1nc_neg.GROWTH, w_minvar1nc_neg.MOMTUM, w_minvar1nc_neg.MINVOL, w_minvar1nc_neg.QUALITY)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
# save stackplot
fig.savefig(fname=(plotpath + "test.png"))
plt.clf()
plt.close()
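The same positive/negative split can also be written with DataFrame.clip, which may read a little more directly than masking and filling. This is only a sketch: the frame `weights` is a small hypothetical stand-in for w_minvar1nc, and the explicit color list is an assumption added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weights with mixed signs, standing in for w_minvar1nc.
weights = pd.DataFrame(
    {"GROWTH": [0.58, 0.57], "MOMTUM": [-0.11, -0.12], "MINVOL": [0.53, 0.55]},
    index=pd.to_datetime(["2015-02-20", "2015-02-23"]),
)

# clip(lower=0) zeroes out the negatives; clip(upper=0) zeroes out the positives.
positive = weights.clip(lower=0)
negative = weights.clip(upper=0)

colors = ["C0", "C1", "C2"]  # one color per asset, reused for both stacks
fig, ax = plt.subplots()
# stackplot accepts a 2D array with one row per series, hence the transpose.
pos_polys = ax.stackplot(positive.index, positive.to_numpy().T,
                         labels=list(weights.columns), colors=colors)
neg_polys = ax.stackplot(negative.index, negative.to_numpy().T, colors=colors)
ax.legend(loc="upper left")
```

Passing the same colors to both calls keeps each asset's color identical above and below zero, which matters when an asset's weight changes sign over time.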
I want the x-axis of my third plot to show datetimes like my other two plots (see linked figure). I used a similar approach for each graph, but resampled the third data set to plot precipitation as a bar graph for every hour in my time period. When I originally tried to format the dates for the third plot as I did for the previous two, the x-axis labels either disappeared or the data did not plot correctly. In the linked figure below, the data itself is displayed the way I intended.
Three subplots of rainfall
My timeseries data appears like this, where I'm only concerned about 'Reading' and 'Value':
Reading,Receive,Value,Unit,Quality
2018-04-07 13:09:28,2018-04-07 13:09:35,0.00,in,A
2018-04-07 06:01:25,2018-04-07 06:01:35,0.04,in,A
2018-04-07 04:38:15,2018-04-07 04:38:35,0.04,in,A
Here is how I achieved the correct scheme in the second plot:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.patches as patches
import datetime as dt
#read data from csv
data2 = pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Accumulation_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data2.set_index('Reading',inplace=True)
#plot data
ax2 = plt.subplot(3, 1, 2)
data2.plot(ax=ax2)
#set ticks every 12 hours
ax2.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0,24,12)))
plt.xticks(rotation=0, ha='center')
#format date
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%b %d\n%H:%M:%S'))
ax2.legend().set_visible(False)
ax2.set_title('Accumulated Rainfall\nApril 5-7, 2018')
ax2.set_xlabel('')
ax2.set_ylabel('Inches Since Oct 1 2017')
ax2.set_ylim(17.5, 22)
arrow_date2 = mdates.datestr2num('04/07/2018 04:30:00')
start_date2 = mdates.datestr2num('04/07/2018 03:00:00')
end_date2 = mdates.datestr2num('04/07/2018 06:00:00')
text_date2 = mdates.datestr2num('04/07/2018 03:00:00')
ax2.axvspan(start_date2, end_date2, 0.86, 0.97, color='green', alpha=0.35)
ax2.annotate("Approximate time of\nSlope Failure", xy=(arrow_date2, 21.5), xycoords='data', xytext=(text_date2, 19), textcoords='data', arrowprops=dict(arrowstyle="->", connectionstyle="arc3"))
My code so far for the third subplot:
#read data from csv
data =pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Increment_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data.set_index('Reading',inplace=True)
resamp = data.resample('1H').sum().reset_index()
#plot data
ax3 = plt.subplot(3, 1, 3)
resamp.plot(kind='bar',ax=ax3, x='Reading', y='Value', width=0.9)
#set ticks every other hour
plt.xticks(ha='center')
for label in ax3.xaxis.get_ticklabels()[::2]:
    label.set_visible(False)
ax3.legend().set_visible(False)
ax3.set_title('Rainfall in Hours\nApril 6-7, 2018')
ax3.set_xlabel('')
ax3.set_ylabel('Precipitation Increment (in)')
plt.show()
How do I fix my code to make the axis labels plot in the way I want them to plot?
My code was wrong, obviously. When I resampled the data, I reset the index. This created a new index column that was messing with my desired x values ('Reading'). Additionally, I shouldn't have been plotting 'x' in resamp.plot. This solution helped: Plotting with Pandas. Here is the corrected code:
#read data from csv
data = pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Increment_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data.set_index('Reading',inplace=True)
resamp = data.resample('1H').sum() # changed here
#plot data
ax3 = plt.subplot(3, 1, 3)
resamp.plot(ax=ax3, y='Value', kind='bar', width=0.9) # changed here
# 'ts' is used as the loop variable so it does not shadow the datetime module imported as 'dt'
ax3.set_xticklabels([ts.strftime('%b %d\n%H:%M:%S') for ts in resamp.index])
plt.xticks(rotation=0, ha='center')
for i, tick in enumerate(ax3.xaxis.get_major_ticks()):
    if (i % (4) != 0): # 4 hours
        tick.set_visible(False)
ax3.legend().set_visible(False)
ax3.set_title('Rainfall in Hours\nApril 6-7, 2018')
ax3.set_xlabel('')
ax3.set_ylabel('Precipitation Increment (in)')
ax3.set_ylim(0.00, 0.40)
plt.show()
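The reason date locators such as HourLocator do not work on this subplot is that a pandas bar plot places its bars at integer positions 0, 1, 2, ... and only uses the index values as text labels. The sketch below illustrates this with made-up hourly data (the series `hourly` is hypothetical, not the rainfall file) and labels every fourth bar by setting ticks at those integer positions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical hourly totals standing in for the resampled rainfall data.
idx = pd.date_range("2018-04-06", periods=24, freq="h")
hourly = pd.Series(np.random.rand(24) * 0.4, index=idx, name="Value")

fig, ax = plt.subplots()
hourly.plot(kind="bar", ax=ax, width=0.9)

# The bars sit at integer positions, not at date values.
bar_positions = list(ax.get_xticks())

# Keep one tick every 4 bars and build its label from the matching timestamp.
step = 4
ax.set_xticks(list(range(0, len(hourly), step)))
ax.set_xticklabels(
    [t.strftime("%b %d\n%H:%M") for t in hourly.index[::step]],
    rotation=0, ha="center",
)
```

Setting the tick positions directly avoids hiding individual tick objects afterwards, at the cost of computing the labels by hand.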
I have a dataframe with the data below.
ex_dict = {'revenue': [613663, 1693667, 2145183, 2045065, 2036406,
1708862, 1068232, 1196899, 2185852, 2165778, 2144738, 2030337,
1784067],
'abs_percent_diff': [0.22279211315310588, 0.13248909660765254,
0.12044821447874667, 0.09438674840975962, 0.1193588387687364,
0.062100921139322744, 0.05875297161175445, 0.06240362963749895,
0.05085338590212515, 0.034877614941165744, 0.012263947005671703,
0.029227374323993634, 0.023411816504907524],
'ds': [dt.date(2017,1,1), dt.date(2017,1,2), dt.date(2017,1,3),
dt.date(2017,1,4), dt.date(2017,1,5), dt.date(2017,1,6),
dt.date(2017,1,7), dt.date(2017,1,8), dt.date(2017,1,9),
dt.date(2017,1,10), dt.date(2017,1,11), dt.date(2017,1,12),
dt.date(2017,1,13)],
'yhat_normal': [501853.9074623253, 1952329.3521464923, 1914575.7673396615,
1868685.8215084015, 1819261.1068672044, 1608945.031482406,
1008953.0123101478, 1126595.36037955, 2302965.598289115,
2244044.9351591542, 2171367.536396199, 2091465.0313570146,
1826836.562382966]}
df_vis=pd.DataFrame.from_dict(ex_dict)
I want to graph yhat_normal and revenue on the same y-axis and abs_percent_diff on a y-axis with a different scale.
df_vis = df_vis.set_index('ds')
df_vis[['revenue', 'yhat_normal']].plot(figsize=(20, 12))
I can easily graph revenue and yhat_normal with the code above, but I am struggling to get abs_percent_diff on a different y-axis scale. I tried converting my columns to numpy arrays and doing the following, but it looks terrible.
npdate = df_vis.as_matrix(columns= ['ds'])
nppredictions = df_vis.as_matrix(columns= ['yhat_normal'])
npactuals = df_vis.as_matrix(columns= ['revenue'])
npmape = df_vis.as_matrix(columns=['abs_percent_diff'])
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
fig.set_size_inches(20,10)
ax1.plot_date(npdate, nppredictions, ls= '-', color= 'b')
ax1.plot_date(npdate, npactuals, ls='-', color='g')
ax2.plot_date(npdate, npmape, 'r-')
ax1.set_xlabel('X data')
ax1.set_ylabel('Y1 data', color='g')
ax2.set_ylabel('Y2 data', color='b')
plt.show()
This is what I want, where the red line is abs_percent_diff. Obviously, I drew the line by hand, so it is not accurate.
I'm not sure if I understood the problem correctly, but it seems you simply want to draw one of the dataframe columns at the bottom of the plot area.
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
ex_dict = {'revenue': [613663, 1693667, 2145183, 2045065, 2036406,
1708862, 1068232, 1196899, 2185852, 2165778, 2144738, 2030337,
1784067],
'abs_percent_diff': [0.22279211315310588, 0.13248909660765254,
0.12044821447874667, 0.09438674840975962, 0.1193588387687364,
0.062100921139322744, 0.05875297161175445, 0.06240362963749895,
0.05085338590212515, 0.034877614941165744, 0.012263947005671703,
0.029227374323993634, 0.023411816504907524],
'ds': [dt.date(2017,1,1), dt.date(2017,1,2), dt.date(2017,1,3),
dt.date(2017,1,4), dt.date(2017,1,5), dt.date(2017,1,6),
dt.date(2017,1,7), dt.date(2017,1,8), dt.date(2017,1,9),
dt.date(2017,1,10), dt.date(2017,1,11), dt.date(2017,1,12),
dt.date(2017,1,13)],
'yhat_normal': [501853.9074623253, 1952329.3521464923, 1914575.7673396615,
1868685.8215084015, 1819261.1068672044, 1608945.031482406,
1008953.0123101478, 1126595.36037955, 2302965.598289115,
2244044.9351591542, 2171367.536396199, 2091465.0313570146,
1826836.562382966]}
df_vis=pd.DataFrame.from_dict(ex_dict)
df_vis = df_vis.set_index('ds')
ax = df_vis[['revenue','yhat_normal']].plot(figsize=(13, 8))
ax2 = df_vis['abs_percent_diff'].plot(secondary_y=True, ax=ax)
ax2.set_ylim(0,1)
plt.show()
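For comparison, roughly the same figure can be built with Matplotlib's twinx directly, which is what the question attempted. This is a sketch only: `df_demo` holds the first three rows of the question's data, rounded, and the merged legend is an addition for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of the question's data, rounded for brevity.
df_demo = pd.DataFrame(
    {
        "revenue": [613663, 1693667, 2145183],
        "yhat_normal": [501853.9, 1952329.4, 1914575.8],
        "abs_percent_diff": [0.2228, 0.1325, 0.1204],
    },
    index=pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03"]),
)

fig, ax_left = plt.subplots(figsize=(13, 8))
ax_left.plot(df_demo.index, df_demo["revenue"], label="revenue")
ax_left.plot(df_demo.index, df_demo["yhat_normal"], label="yhat_normal")

# twinx() creates a second y-axis that shares the same x-axis.
ax_right = ax_left.twinx()
ax_right.plot(df_demo.index, df_demo["abs_percent_diff"], color="r",
              label="abs_percent_diff")
ax_right.set_ylim(0, 1)  # a wide range keeps the red line near the bottom

# Each axes keeps its own legend entries, so merge them into one legend.
handles_l, labels_l = ax_left.get_legend_handles_labels()
handles_r, labels_r = ax_right.get_legend_handles_labels()
ax_left.legend(handles_l + handles_r, labels_l + labels_r, loc="upper left")
```

The pandas secondary_y shortcut is shorter; the explicit version is mainly useful when the two axes need separate styling.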