How to reduce x ticks in matplotlib? - python

I have been working with pandas data that has too many time points (x ticks). I have found several ways to reduce them, but my problem is that I'm using two different data sets with different time points. So how can I both reduce the x ticks and align the time points of the two data sets?
Yes, I know about sharex.
My plot-generating code is the following:
fig, (ax1, ax2,ax3,ax4) = plt.subplots(4, 1,figsize=(10,7), sharex="all")
fig.subplots_adjust(bottom=0.2)
ax1.plot( df.time, df['B'], color='k')
ax1.plot( df.time, df['Bx'], color='r')
ax1.plot( df.time, df['By'], color='b')
ax1.plot( df.time, df['Bz'], color='g')
ax1.xaxis.grid(True,alpha=0.3)
ax1.set_ylabel('Bx,By,Bz,B[nT]')
ax2.plot(df1.time, df1['v_total'],color='k')
ax2.plot(df1.time, df1['Vx'],color='r')
ax2.plot(df1.time, df1['Vy'],color='b')
ax2.plot(df1.time, df1['Vz'],color='g')
ax2.xaxis.grid(True,alpha=0.3)
ax2.set_ylabel('Vx,Vy,Vz,V[km/s]')
ax3.plot(df1.time, df1['n'],color='k')
ax3.xaxis.grid(True,alpha=0.3)
ax3.set_ylabel('Np[1/cm^3]')
ax4.plot(df1.time, df1['T'],color='k')
ax4.xaxis.grid(True,alpha=0.3)
ax4.set_ylabel('T[k]')
#loc = mdates.MinuteLocator([0,30])
#ax2.xaxis.set_major_locator(loc)
#ax2.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
#ax3.xaxis.set_major_locator(loc)
#ax3.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
#ax4.xaxis.set_major_locator(loc)
#ax4.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
#ax2 = plt.gca()
#ax2.xaxis.set_major_locator(mdates.MinuteLocator(interval=10))
#ax2.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
#ax3 = plt.gca()
#ax3.xaxis.set_major_locator(mdates.MinuteLocator(interval=10))
#ax3.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
#ax4 = plt.gca()
#ax4.xaxis.set_major_locator(mdates.MinuteLocator(interval=10))
#ax4.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
fig.suptitle('Shock format')
plt.savefig('plot.png')
plt.savefig('plot1.pdf')
plt.show()
Here df is the one with many points. I want to reduce the x ticks/time points of df and also align df1 to df. The commented-out lines are my attempt, but it takes too long and gives me this warning: "Locator attempting to generate 359569 ticks ([-113.5, ..., 2383.5]), which exceeds Locator.MAXTICKS (1000)."
The graph output is the following:
And this is what my goal looks like:

I have created sample data as I saw fit, since the original data seems difficult to provide.
The main point is that byminute specifies the minutes of the hour at which ticks may be placed; its default is every minute, range(60). The interval specifies at what interval of those minute increments a tick is displayed. So for 15-minute increments I used np.arange(0,60,15) together with an interval of 2. The result is ticks at minutes 00 and 30.
import pandas as pd
import numpy as np
date_rng = pd.date_range('2022-05-31 00:00:00', freq='1s', periods=43200)
df = pd.DataFrame({'datetime': pd.to_datetime(date_rng), 'value':np.random.randn(43200).cumsum()})
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
fig, (ax1,ax2,ax3,ax4) = plt.subplots(4, 1,figsize=(12,6), sharex="all")
fig.subplots_adjust(bottom=0.2)
ax1.plot(df.datetime, df['value'], color='b')
ax2.plot(df.datetime, df['value'], color='g')
ax3.plot(df.datetime, df['value'], color='r')
ax4.plot(df.datetime, df['value'], color='k')
minutes = mdates.MinuteLocator(byminute=np.arange(0,60,15),interval=2)
minutes_fmt = mdates.DateFormatter('%d %H:%M')
ax4.xaxis.set_major_locator(minutes)
ax4.xaxis.set_major_formatter(minutes_fmt)
ax4.tick_params(axis='x', labelrotation=45)
plt.show()
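The tick range in the question's warning hints that the time values were not being treated as dates, although the question does not show the dtype of the time columns, so that is an assumption. A minimal sketch under that assumption: convert both time columns with pd.to_datetime so the two data sets share one date axis, then ask for ticks only at minutes 00 and 30. The two frames below are hypothetical stand-ins for df and df1.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins: a dense frame (1 s cadence) and a sparse one (1 min cadence),
# both with the time column stored as strings
dense = pd.date_range("2022-05-31 00:00:00", periods=7200, freq="s")
sparse = pd.date_range("2022-05-31 00:00:00", periods=120, freq="min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"time": dense.strftime("%Y-%m-%d %H:%M:%S"),
                   "B": rng.standard_normal(len(dense)).cumsum()})
df1 = pd.DataFrame({"time": sparse.strftime("%Y-%m-%d %H:%M:%S"),
                    "n": rng.standard_normal(len(sparse)).cumsum()})

# Real datetimes let Matplotlib place both series on the same date axis
df["time"] = pd.to_datetime(df["time"])
df1["time"] = pd.to_datetime(df1["time"])

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 5), sharex="all")
ax1.plot(df["time"], df["B"], color="k")
ax2.plot(df1["time"], df1["n"], color="k")

# The x-axes are shared, so one locator and formatter serve both panels
ax2.xaxis.set_major_locator(mdates.MinuteLocator(byminute=[0, 30]))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.tick_params(axis="x", labelrotation=45)

tick_values = ax2.get_xticks()  # tick positions in Matplotlib date units (days)
# call plt.show() or fig.savefig(...) here to view the figure
```

Passing byminute=[0, 30] states the wanted minutes directly, which avoids relying on how interval interacts with the start of the plotted range.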

Related

custom xlabel ticks in Seaborn heatmaps

I have plotted a heatmap, which is displayed below. The x-axis shows the time of day and the y-axis shows the date. I want to show an x label at every hour instead of the random x labels it displays here.
I tried the following code, but the resulting heatmap draws all the x labels on top of each other:
t = pd.date_range(start='00:00:00', end='23:59:59', freq='60T').time
df = pd.DataFrame(index=t)
df.reset_index(inplace=True)
df['index'] = df['index'].astype('str')
sns_hm = sns.heatmap(data=mat, cbar=True, lw=0,cmap=colormap,xticklabels=df['index'])
The following code assumes mat is a dataframe with one row per day and one column per timestamp, so the same timestamps need to recur on each of the days.
After drawing the heatmap, the left and right limits of the x-axis are retrieved. Supposing these go from 0 to 24 hours, the range can be subdivided into 25 positions, one for each hour boundary.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from pandas.tseries.offsets import DateOffset
from matplotlib.colors import ListedColormap, to_hex
# first, create some test data
df = pd.DataFrame()
df["date"] = pd.date_range('20220304', periods=19000, freq=DateOffset(seconds=54))
df["val"] = (((np.random.rand(len(df)) ** 100).cumsum() / 2).astype(int) % 2) * 100
df['day'] = df['date'].dt.strftime('%d-%m-%Y')
df['time'] = df['date'].dt.strftime('%H:%M:%S')
mat = df.pivot(index='day', columns='time', values='val')
colors = list(plt.cm.Greens(np.linspace(0.2, 0.9, 10)))
ax = sns.heatmap(mat, cmap=colors, cbar_kws={'ticks': range(0, 101, 10)})
xmin, xmax = ax.get_xlim()
tick_pos = np.linspace(xmin, xmax, 25)
tick_labels = [f'{h:02d}:00:00' for h in range(len(tick_pos))]
ax.set_xticks(tick_pos)
ax.set_xticklabels(tick_labels, rotation=90)
ax.set(xlabel='', ylabel='')
plt.tight_layout()
plt.show()
The left plot shows the default tick labels, the right plot the customized labels.
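If the columns do not span exactly 0 to 24 hours, the tick positions could instead be looked up from the column labels. This is a sketch of that alternative, not part of the answer above. It assumes sorted 'HH:MM:SS' string labels and relies on heatmap column i occupying the interval [i, i+1] on the x-axis; time_labels is a hypothetical stand-in for mat.columns.

```python
import numpy as np
import pandas as pd

# Hypothetical column labels, mirroring the test data above: one 'HH:MM:SS' string
# every 54 seconds, which tiles a day exactly (1600 * 54 s = 86400 s)
time_labels = pd.date_range("2022-03-04", periods=1600, freq="54s").strftime("%H:%M:%S")
time_labels = np.array(time_labels, dtype=str)

hour_labels = [f"{h:02d}:00:00" for h in range(24)]

# 'HH:MM:SS' strings sort lexicographically in time order, so searchsorted returns
# the index of the first column at or after each full hour
tick_pos = np.searchsorted(time_labels, hour_labels)
```

With the heatmap axes from the answer, these would be applied as ax.set_xticks(tick_pos) followed by ax.set_xticklabels(hour_labels, rotation=90), which puts each label at the left edge of the first column of that hour.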

How to display negative values in matplotlib's stackplot?

I'd like to create a stacked barplot of asset weights representing a financial portfolio over time. I tried several approaches, but got the most pleasing results with matplotlib's stackplot function. However, I am not able to display negative asset weights in my stackplot, so the resulting figures are wrong. I am using Python (3.8.3) and Matplotlib (3.3.2).
The following displays the head of the asset weights dataframe to plot:
w_minvar1nc.head()
              SMACAP    GROWTH    MOMTUM    MINVOL   QUALITY
Date
2015-02-20  0.012942  0.584273 -0.114441  0.387773  0.129454
2015-02-23  0.013129  0.584528 -0.115836  0.386448  0.131732
2015-02-24  0.013487  0.584404 -0.116585  0.386364  0.132330
2015-02-25  0.015145  0.572256 -0.117796  0.387583  0.142811
2015-02-26  0.015113  0.567198 -0.114580  0.387807  0.144462
The following displays a simple code snippet of my current approach to the stackplot:
# initialize stackplot
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
# create and format stackplot
ax.stackplot(w_minvar1nc.index, w_minvar1nc.SMACAP, w_minvar1nc.GROWTH, w_minvar1nc.MOMTUM, w_minvar1nc.MINVOL, w_minvar1nc.QUALITY)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
# save stackplot
fig.savefig(fname=(plotpath + "test.png"))
plt.clf()
plt.close()
And here comes the corresponding stackplot itself in which you can see that the negative asset weights don't show up:
Does anyone know how to deal with that problem? Any ideas would be much appreciated.
PS: Of course I've already tried other approaches such as stacking the data manually and then create a regular barplot etc. And in this case the positive and negative asset weights are actually displayed correctly, but this approach also leads to even bigger problems regarding the formatting of the x-axis because of the daily data.
If the columns are separated into positive and negative weights, you can plot them separately:
from matplotlib import pyplot as plt
import pandas as pd
#fake data
import numpy as np
np.random.seed(123)
n = 100
df = pd.DataFrame({"Dates": pd.date_range("20180101", periods=n, freq="10d"),
"A": 0.2 + np.random.random(n)/10,
"B": -np.random.random(n)/10,
"C": -0.1-np.random.random(n)/10,
"D": 0.3+ np.random.random(n)/10})
df.set_index("Dates", inplace=True)
df["E"] = 1 - df.A - df.D - df.B - df.C
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
ax.stackplot(df.index, df.A, df.D, df.E)
ax.stackplot(df.index, df.B, df.C)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
plt.show()
Sample output:
Here is the solution to the problem, with huge credit to @Mr. T:
# split data into negative and positive values
w_minvar1nc_pos = w_minvar1nc[w_minvar1nc >= 0].fillna(0)
w_minvar1nc_neg = w_minvar1nc[w_minvar1nc < 0].fillna(0)
# initialize stackplot
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
# create and format stackplot
ax.stackplot(w_minvar1nc_pos.index, w_minvar1nc_pos.SMACAP, w_minvar1nc_pos.GROWTH, w_minvar1nc_pos.MOMTUM, w_minvar1nc_pos.MINVOL, w_minvar1nc_pos.QUALITY)
ax.stackplot(w_minvar1nc_neg.index, w_minvar1nc_neg.SMACAP, w_minvar1nc_neg.GROWTH, w_minvar1nc_neg.MOMTUM, w_minvar1nc_neg.MINVOL, w_minvar1nc_neg.QUALITY)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
# save stackplot
fig.savefig(fname=(plotpath + "test.png"))
plt.clf()
plt.close()
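The same split can also be written with DataFrame.clip, shown here as a small variation rather than the method used above. The sketch uses the first rows of the weights table from the question and checks that the positive and negative halves add back up to the original. It also passes explicit colors (the "C0" to "C4" cycle names are my choice) so that each asset keeps one color in both stacks.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First rows of the asset-weight table shown in the question
w = pd.DataFrame(
    {"SMACAP": [0.012942, 0.013129, 0.013487],
     "GROWTH": [0.584273, 0.584528, 0.584404],
     "MOMTUM": [-0.114441, -0.115836, -0.116585],
     "MINVOL": [0.387773, 0.386448, 0.386364],
     "QUALITY": [0.129454, 0.131732, 0.132330]},
    index=pd.to_datetime(["2015-02-20", "2015-02-23", "2015-02-24"]),
)

pos = w.clip(lower=0)  # negative weights become 0
neg = w.clip(upper=0)  # positive weights become 0
recombined = pos + neg  # equals the original weights

# One fixed color per asset, reused for the upward and the downward stack
colors = [f"C{i}" for i in range(w.shape[1])]

fig, ax = plt.subplots(facecolor="#F0F0F0")
ax.stackplot(w.index, pos.to_numpy().T, colors=colors, labels=list(w.columns))
ax.stackplot(w.index, neg.to_numpy().T, colors=colors)
ax.set_ylim(bottom=-0.5, top=1.5)
ax.legend(loc="upper left")
```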

date and graph alignment - Economic analysis

I am running a fundamental economic analysis, and when I get to visualising and charting I am not able to align the dates with the graph.
I want the most recent date entry to show on the right and the rest of the dates to show every two years.
I have tried literally everything and can't find the solution.
Here is my code:
%matplotlib inline
import pandas as pd
from matplotlib import pyplot
import matplotlib.dates as mdates
df = pd.read_csv('https://fred.stlouisfed.org/graph/fredgraph.csvbgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=off&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=NAEXKP01EZQ657S&scale=left&cosd=1995-04-01&coed=2020-04-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Quarterly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2020-09-21&revision_date=2020-09-21&nd=1995-04-01')
df = df.set_index('DATE')
df['12MonthAvg'] = df.rolling(window=12).mean().dropna(how='all')
df['9MonthAvg'] = df['12MonthAvg'].rolling(window=12).mean().dropna(how='all')
df['Spread'] = df['12MonthAvg'] - df['9MonthAvg']
pyplot.style.use("seaborn")
pyplot.subplots(figsize=(10, 5), dpi=85)
df['Spread'].plot().set_title('EUROPE: GDP Q Growth Rate (12M/12M Avg Spread)', fontsize=16)
df['Spread'].plot().axhline(0, linestyle='-', color='r',alpha=1, linewidth=2, marker='')
df['Spread'].plot().spines['left'].set_position(('outward', 10))
df['Spread'].plot().spines['bottom'].set_position(('outward', 10))
df['Spread'].plot().spines['right'].set_visible(False)
df['Spread'].plot().spines['top'].set_visible(False)
df['Spread'].plot().yaxis.set_ticks_position('left')
df['Spread'].plot().xaxis.set_ticks_position('bottom')
df['Spread'].plot().text(0.50, 0.02, "Crossing red line downwards / Crossing red line Upwards",
transform=pyplot.gca().transAxes, fontsize=14, ha='center', color='blue')
df['Spread'].plot().fmt_xdata = mdates.DateFormatter('%Y-%m-%d')
print(df['Spread'].tail(3))
pyplot.autoscale()
pyplot.show()
And the output:
This is the raw data:
There are a couple of corrections to make to your code.
In your URL, insert "?" after fredgraph.csv. It starts the so-called query string,
where bgcolor is the first parameter.
Read your DataFrame with additional parameters:
df = pd.read_csv('...', parse_dates=[0], index_col=[0])
The aim is to:
read the DATE column as datetime,
set it as the index.
Create additional columns as:
df['12MonthAvg'] = df.NAEXKP01EZQ657S.rolling(window=12).mean()
df['9MonthAvg'] = df.NAEXKP01EZQ657S.rolling(window=9).mean()
df['Spread'] = df['12MonthAvg'] - df['9MonthAvg']
Corrections:
9MonthAvg should (I think) be computed from the source column with window=9,
not from 12MonthAvg,
dropna is not needed here, as you create a whole column anyway.
Now is the place to call dropna() on the Spread column and save the result in
a dedicated variable:
spread = df['Spread'].dropna()
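To see why dropna() belongs on the final column only, here is a small sketch with made-up values (the series s is a hypothetical stand-in for the NAEXKP01EZQ657S column). A rolling mean over a window of n rows is NaN for the first n - 1 rows, so the spread is only defined once the longer window is full.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the source column: 15 quarterly values 1.0 .. 15.0
s = pd.Series(np.arange(1.0, 16.0),
              index=pd.date_range("1995-04-01", periods=15, freq="QS"))

avg12 = s.rolling(window=12).mean()  # NaN for the first 11 rows
avg9 = s.rolling(window=9).mean()    # NaN for the first 8 rows
spread = (avg12 - avg9).dropna()     # keeps only rows where both averages exist

print(spread)
```

With 15 rows and a 12-row window, four rows survive, starting at the twelfth date.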
Draw your figure the following way:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
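# the "seaborn" style was renamed "seaborn-v0_8" in Matplotlib 3.6; use "seaborn" on older versions
plt.style.use("seaborn-v0_8")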
fig, ax = plt.subplots(figsize=(10, 5), dpi=85)
plt.plot_date(spread.index, spread, fmt='-')
ax.set_title('EUROPE: GDP Q Growth Rate (12M/12M Avg Spread)', fontsize=16)
ax.axhline(0, linestyle='-', color='r',alpha=1, linewidth=2, marker='')
ax.spines['left'].set_position(('outward', 10))
ax.spines['bottom'].set_position(('outward', 10))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.text(0.50, 0.02, "Crossing red line downwards / Crossing red line Upwards",
transform=ax.transAxes, fontsize=14, ha='center', color='blue')
ax.xaxis.set_major_formatter(mdates.DateFormatter(fmt='%Y-%m-%d'))
plt.show()
Corrections:
plt.subplots returns fig and ax, so I saved them (actually, only ax
is needed).
When one axis contains dates, plot_date is a convenient choice (note that it is deprecated as of Matplotlib 3.9, where plot handles dates directly).
I changed the way DateFormatter is set.
Using the above code I got the following picture:
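For the specific request in the question (a label on the most recent date, and the others every two years), one possible approach is to choose the tick dates by stepping backwards from the last entry. This is a sketch under the assumption of quarterly data, where two years are eight rows. The series below is synthetic and only stands in for the Spread column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical quarterly series standing in for the Spread column
dates = pd.date_range("1995-04-01", "2020-04-01", freq="QS")
spread = pd.Series(np.sin(np.linspace(0, 12, len(dates))), index=dates)

# Step back from the last entry in strides of 8 quarters (two years),
# then restore chronological order
tick_dates = spread.index[::-8][::-1]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(spread.index, spread.to_numpy())
ax.set_xticks(mdates.date2num(tick_dates.to_pydatetime()))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.set_xlim(spread.index[0], spread.index[-1])  # last date sits on the right edge
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```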

How do I index or plot datetimes after resampling so they display on a bar plot axis correctly?

I want to display my third plot x-axis data in the datetime like my other two plots (see linked figure). I have used similar approaches to each graph, but resampled the third dataset to plot precipitation in a bar graph for every hour in my time period. When I originally attempted to format the date for the third plot as I did in the previous two, the x-axis labels either disappeared or the data doesn't plot correctly. In the link below, the data is displayed the way I intended.
Three subplots of rainfall
My timeseries data appears like this, where I'm only concerned about 'Reading' and 'Value':
Reading,Receive,Value,Unit,Quality
2018-04-07 13:09:28,2018-04-07 13:09:35,0.00,in,A
2018-04-07 06:01:25,2018-04-07 06:01:35,0.04,in,A
2018-04-07 04:38:15,2018-04-07 04:38:35,0.04,in,A
Here is how I achieved the correct scheme in the second plot:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.patches as patches
import matplotlib.dates as mdates
import datetime as dt
#read data from csv
data2 = pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Accumulation_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data2.set_index('Reading',inplace=True)
#plot data
ax2 = plt.subplot(3, 1, 2)
data2.plot(ax=ax2)
#set ticks every 12 hours
ax2.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0,24,12)))
plt.xticks(rotation=0, ha='center')
#format date
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%b %d\n%H:%M:%S'))
ax2.legend().set_visible(False)
ax2.set_title('Accumulated Rainfall\nApril 5-7, 2018')
ax2.set_xlabel('')
ax2.set_ylabel('Inches Since Oct 1 2017')
ax2.set_ylim(17.5, 22)
arrow_date2 = mdates.datestr2num('04/07/2018 04:30:00')
start_date2 = mdates.datestr2num('04/07/2018 03:00:00')
end_date2 = mdates.datestr2num('04/07/2018 06:00:00')
text_date2 = mdates.datestr2num('04/07/2018 03:00:00')
ax2.axvspan(start_date2, end_date2, 0.86, 0.97, color='green', alpha=0.35)
ax2.annotate("Approximate time of\nSlope Failure", xy=(arrow_date2, 21.5), xycoords='data', xytext=(text_date2, 19), textcoords='data', arrowprops=dict(arrowstyle="->", connectionstyle="arc3"))
My code so far for the third subplot:
#read data from csv
data =pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Increment_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data.set_index('Reading',inplace=True)
resamp = data.resample('1H').sum().reset_index()
#plot data
ax3 = plt.subplot(3, 1, 3)
resamp.plot(kind='bar',ax=ax3, x='Reading', y='Value', width=0.9)
#set ticks every other hour
plt.xticks(ha='center')
for label in ax3.xaxis.get_ticklabels()[::2]:
    label.set_visible(False)
ax3.legend().set_visible(False)
ax3.set_title('Rainfall in Hours\nApril 6-7, 2018')
ax3.set_xlabel('')
ax3.set_ylabel('Precipitation Increment (in)')
plt.show()
How do I fix my code to make the axis labels plot in the way I want them to plot?
My code was wrong, obviously. When I resampled the data, I reset the index. This created a new index column that was messing with my desired x values ('Reading'). Additionally, I shouldn't have been plotting 'x' in resamp.plot. This solution helped: Plotting with Pandas. Here is the corrected code:
#read data from csv
data = pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Increment_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data.set_index('Reading',inplace=True)
resamp = data.resample('1H').sum() # changed here
#plot data
ax3 = plt.subplot(3, 1, 3)
resamp.plot(ax=ax3, y='Value', kind='bar', width=0.9) # changed here
ax3.set_xticklabels([dt.strftime('%b %d\n%H:%M:%S') for dt in resamp.index])
plt.xticks(rotation=0, ha='center')
for i, tick in enumerate(ax3.xaxis.get_major_ticks()):
    if (i % (4) != 0): # 4 hours
        tick.set_visible(False)
ax3.legend().set_visible(False)
ax3.set_title('Rainfall in Hours\nApril 6-7, 2018')
ax3.set_xlabel('')
ax3.set_ylabel('Precipitation Increment (in)')
ax3.set_ylim(0.00, 0.40)
plt.show()
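A pandas bar plot places its bars at integer positions 0, 1, 2, and so on, which is why the date locators used for the other subplots do not carry over. As an alternative sketch (not the fix used above), the bars can be drawn with Matplotlib's own ax.bar on a real date axis, after which HourLocator and DateFormatter work as in the second plot. The data here are just the three sample rows from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The three sample rows from the question
data = pd.DataFrame(
    {"Value": [0.00, 0.04, 0.04]},
    index=pd.to_datetime(["2018-04-07 13:09:28",
                          "2018-04-07 06:01:25",
                          "2018-04-07 04:38:15"]),
).sort_index()
resamp = data.resample("h").sum()  # hourly totals; the datetime index is kept

fig, ax = plt.subplots()
# On a date axis the bar width is measured in days, so 0.9 / 24 is 90% of an hour
ax.bar(resamp.index, resamp["Value"], width=0.9 / 24, align="edge")
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 4)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d\n%H:%M"))
ax.set_ylabel("Precipitation Increment (in)")
```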

Matplotlib Different Scaled Y-Axes

I have a dataframe with the data below.
ex_dict = {'revenue': [613663, 1693667, 2145183, 2045065, 2036406,
1708862, 1068232, 1196899, 2185852, 2165778, 2144738, 2030337,
1784067],
'abs_percent_diff': [0.22279211315310588, 0.13248909660765254,
0.12044821447874667, 0.09438674840975962, 0.1193588387687364,
0.062100921139322744, 0.05875297161175445, 0.06240362963749895,
0.05085338590212515, 0.034877614941165744, 0.012263947005671703,
0.029227374323993634, 0.023411816504907524],
'ds': [dt.date(2017,1,1), dt.date(2017,1,2), dt.date(2017,1,3),
dt.date(2017,1,4), dt.date(2017,1,5), dt.date(2017,1,6),
dt.date(2017,1,7), dt.date(2017,1,8), dt.date(2017,1,9),
dt.date(2017,1,10), dt.date(2017,1,11), dt.date(2017,1,12),
dt.date(2017,1,13)],
'yhat_normal': [501853.9074623253, 1952329.3521464923, 1914575.7673396615,
1868685.8215084015, 1819261.1068672044, 1608945.031482406,
1008953.0123101478, 1126595.36037955, 2302965.598289115,
2244044.9351591542, 2171367.536396199, 2091465.0313570146,
1826836.562382966]}
df_vis=pd.DataFrame.from_dict(ex_dict)
I want to graph yhat_normal and revenue on the same y-axis and abs_percent_diff on a y-axis with a different scale.
df_vis = df_vis.set_index('ds')
df_vis[['revenue', 'yhat_normal']].plot(figsize=(20, 12))
I can easily graph revenue and yhat_normal with the code above, but I am struggling to get abs_percent_diff on a different y-axis scale. I tried converting my columns to numpy arrays and doing this, but it looks terrible.
# as_matrix was removed in pandas 1.0; to_numpy is its replacement
npdate = df_vis.index.to_numpy()  # 'ds' is the index after set_index
nppredictions = df_vis['yhat_normal'].to_numpy()
npactuals = df_vis['revenue'].to_numpy()
npmape = df_vis['abs_percent_diff'].to_numpy()
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
fig.set_size_inches(20,10)
ax1.plot_date(npdate, nppredictions, ls= '-', color= 'b')
ax1.plot_date(npdate, npactuals, ls='-', color='g')
ax2.plot_date(npdate, npmape, 'r-')
ax1.set_xlabel('X data')
ax1.set_ylabel('Y1 data', color='g')
ax2.set_ylabel('Y2 data', color='b')
plt.show()
This is what I want. Where the red line is the abs_percent_diff. Obviously, I drew the line by hand so it is not accurate.
I'm not sure if I understood the problem correctly, but it seems you simply want to draw one of the dataframe columns at the bottom of the plot area.
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
ex_dict = {'revenue': [613663, 1693667, 2145183, 2045065, 2036406,
1708862, 1068232, 1196899, 2185852, 2165778, 2144738, 2030337,
1784067],
'abs_percent_diff': [0.22279211315310588, 0.13248909660765254,
0.12044821447874667, 0.09438674840975962, 0.1193588387687364,
0.062100921139322744, 0.05875297161175445, 0.06240362963749895,
0.05085338590212515, 0.034877614941165744, 0.012263947005671703,
0.029227374323993634, 0.023411816504907524],
'ds': [dt.date(2017,1,1), dt.date(2017,1,2), dt.date(2017,1,3),
dt.date(2017,1,4), dt.date(2017,1,5), dt.date(2017,1,6),
dt.date(2017,1,7), dt.date(2017,1,8), dt.date(2017,1,9),
dt.date(2017,1,10), dt.date(2017,1,11), dt.date(2017,1,12),
dt.date(2017,1,13)],
'yhat_normal': [501853.9074623253, 1952329.3521464923, 1914575.7673396615,
1868685.8215084015, 1819261.1068672044, 1608945.031482406,
1008953.0123101478, 1126595.36037955, 2302965.598289115,
2244044.9351591542, 2171367.536396199, 2091465.0313570146,
1826836.562382966]}
df_vis=pd.DataFrame.from_dict(ex_dict)
df_vis = df_vis.set_index('ds')
ax = df_vis[['revenue','yhat_normal']].plot(figsize=(13, 8))
ax2 = df_vis['abs_percent_diff'].plot(secondary_y=True, ax=ax)
ax2.set_ylim(0,1)
plt.show()
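The same figure can also be built with plain Matplotlib and twinx, which is what the attempt in the question was reaching for. This is a sketch using the first five rows of the question's data, rounded for brevity. The axis labels and legend placement are my own choices.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the data from the question (values rounded for brevity)
df_vis = pd.DataFrame(
    {"revenue": [613663, 1693667, 2145183, 2045065, 2036406],
     "yhat_normal": [501853.9, 1952329.4, 1914575.8, 1868685.8, 1819261.1],
     "abs_percent_diff": [0.2228, 0.1325, 0.1204, 0.0944, 0.1194]},
    index=[dt.date(2017, 1, d) for d in range(1, 6)],
)

fig, ax1 = plt.subplots(figsize=(13, 8))
ax1.plot(df_vis.index, df_vis["revenue"], color="g", label="revenue")
ax1.plot(df_vis.index, df_vis["yhat_normal"], color="b", label="yhat_normal")
ax1.set_ylabel("revenue / yhat_normal")

ax2 = ax1.twinx()  # second y-axis that shares the x-axis with ax1
ax2.plot(df_vis.index, df_vis["abs_percent_diff"], color="r", label="abs_percent_diff")
ax2.set_ylabel("abs_percent_diff", color="r")
ax2.set_ylim(0, 1)  # a range much wider than the data keeps the red line near the bottom

# Combine the lines of both axes into a single legend
lines = ax1.get_lines() + ax2.get_lines()
ax1.legend(lines, [line.get_label() for line in lines], loc="upper left")
```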
