I have a dataframe with a DatetimeIndex like so:
In [3]: index = pd.date_range('September 1 2014', 'September 1 2015', freq='M')
...: index
Out[3]:
DatetimeIndex(['2014-09-30', '2014-10-31', '2014-11-30', '2014-12-31',
               '2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30',
               '2015-05-31', '2015-06-30', '2015-07-31', '2015-08-31'],
              dtype='datetime64[ns]', freq='M')
Plotting without changing the x tick labels or explicit date formatting yields an x-axis from 0-12.
My figure contains 13 subplots in one column. I'm trying to set the x-axis on the last plot using AutoDateLocator() at the end of the code after all the subplots are plotted:
fig.axes[-1].xaxis.set_major_locator(mdates.AutoDateLocator())
Which returns the following error:
ValueError: ordinal must be >= 1
I tried converting the dataframe index with date2num as suggested here, but it yielded the same result:
df.index = mdates.date2num(df.index.to_pydatetime())
I tried consulting the documentation but I couldn't dig up any other way to make matplotlib recognize the x-axis as dates.
Here is the complete code:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
index = pd.date_range('September 1 2014', 'September 1 2015', freq='M')
data = np.random.random([12, 13])
df = DataFrame(data=data, index=index)
# Draw figure
fig = plt.figure(figsize=(19,20), dpi=72)
fig.suptitle('Chart', fontsize=40, color='#333333')
# Draw charts
for i in range(0, len(df)):
    ax = plt.subplot(len(df),1, i+1)
    # Set ticks and labels
    ax.tick_params(direction='in', length=45, pad=60,colors='0.75')
    ax.yaxis.set_tick_params(size=0)
    ax.set_yticklabels([])
    ax.set_ylim([0, 2])
    plt.ylabel(df.columns[i], rotation=0, labelpad=60, fontsize=20, color='#404040')
    # Remove spines
    sns.despine(ax=ax, bottom=True)
    # Draw baseline, data, and threshold lines
    # Threshold
    ax.axhline(1.6, color='#a0db8e', linestyle='--', label='Threshold')
    # Draw baseline
    ax.axhline(1, color='0.55',label='Enterprise')
    # Plot data
    ax.plot(df.iloc[:, i], color='#14509f',label='Data')
# Subplot spacing
fig.subplots_adjust(hspace=1)
# Hide tick labels and first/last tick lines
plt.setp([a.get_xticklabels() for a in fig.axes[:-1]], visible=False)
plt.setp([a.get_xticklines()[-2:] for a in fig.axes],visible=False)
# Date in x-axis
fig.axes[-1].xaxis.set_major_locator(mdates.AutoDateLocator())
fig.axes[-1].xaxis.set_major_formatter(mdates.DateFormatter('%Y.%m.%d'))
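A minimal sketch of one way to get date ticks on the last subplot, assuming the dates are passed to `ax.plot` explicitly as x values and the subplots share one x-axis. It uses three columns instead of thirteen to stay short, and `pd.offsets.MonthEnd()` in place of `freq='M'`; these are illustrative choices, and whether this removes the `ValueError` above may depend on the installed pandas and Matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Month-end dates matching the index shown in the question
index = pd.date_range("2014-09-30", periods=12, freq=pd.offsets.MonthEnd())
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((12, 3)), index=index)

# One subplot per column, all sharing the same x-axis
fig, axes = plt.subplots(nrows=df.shape[1], ncols=1, sharex=True, figsize=(8, 6))
for ax, col in zip(axes, df.columns):
    # Pass the dates explicitly so the axis is set up with date units
    ax.plot(df.index.to_pydatetime(), df[col].to_numpy(), color="#14509f")
    ax.set_ylim(0, 2)

# Date locator and formatter on the last (bottom) subplot
last_ax = axes[-1]
last_ax.xaxis.set_major_locator(mdates.AutoDateLocator())
last_ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y.%m.%d"))
xmin, xmax = last_ax.get_xlim()
```

With `sharex=True`, only the bottom subplot shows x tick labels, so the separate `plt.setp(...)` call for hiding labels may not be needed.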
I'm attempting to plot a pandas stacked bar plot with the x axis showing Months on the major ticks, or years on Jan 1, ideally with small ticks identifying the weeks but with no label.
I have a dataset with a datetime index that was then grouped by week and then I plot that dataset. If I don't attempt to control the settings the dates show up but are vertical and don't fit. So I used the set formatter to fix that but then the axes changed to 1970 as if following an index number instead of date. If I replace the pandas plotting with a regular bar chart, the "ConciseDateFormatter" works as desired/expected. But I wanted to use stacked with pandas as creating a regular stacked bar chart is a pain. I don't understand why I can't control pandas axes like I can a regular plot.
One thing I notice is that the index is shown as an object. If I convert it to to_datetime() it then adds 00:00 for times that I don't want on the axes or my data.
My data is a simple set of weekly random data:
date A B C D
3/20/2022 1.540765154 0.504616419 1.543679189 2.952934623
3/27/2022 1.781135128 4.594966635 4.799026389 3.499803401
4/3/2022 0.254059207 0.69835265 0.323039575 1.628138491
4/10/2022 3.112760301 0.287056897 4.372938373 0.130817579
4/17/2022 0.497273044 0.913246096 1.296612207 1.250610278
4/24/2022 1.370087689 3.124985109 4.322253295 4.49571603
5/1/2022 3.952629538 3.976896924 1.679311114 1.265443147
5/8/2022 3.470328161 1.266161308 3.990502436 1.364929959
5/15/2022 2.296588269 4.639761391 0.04685036 1.438471692
5/22/2022 3.443458637 2.66592719 0.968656871 2.349325343
5/29/2022 1.820278464 4.794211675 2.435710815 2.156110694
6/5/2022 4.328825266 0.049132356 1.842839099 3.665701299
6/12/2022 0.184631564 0.412976815 4.787477069 4.80052839
6/19/2022 4.846734385 3.471474741 1.808871854 2.440013553
6/26/2022 1.612870444 0.70191857 3.55713114 1.438699834
7/3/2022 2.896859156 4.025996887 0.209608767 4.174881655
Code:
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
maxval = 200
values = ['A','B','C','D']
cum = [v + '_CUM' for v in values]
df = pd.read_csv('test_data.csv', index_col='date', parse_dates=True,
infer_datetime_format=True)
#df.index = pd.to_datetime(df.index.date).strftime("%b %d")
df = df.join(df.cumsum(), rsuffix="_CUM")
df = df.join(df[cum]/maxval * 100, rsuffix="_LIFE")
fig, axs = plt.subplots(nrows=2, ncols=1, sharex=False, squeeze=False,
facecolor='white')
axs = axs.flatten()
ax = axs[0]
df[values].plot.bar(ax=ax, grid=True, stacked=True, legend=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(
    mdates.ConciseDateFormatter(ax.xaxis.get_major_locator()))
# ax.xaxis.set_tick_params(rotation = 0)
plt.show(block=False)
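pandas bar plots place the bars at integer positions 0, 1, 2, ... and use the index only for the labels, which is why a date formatter applied afterwards shows dates around 1970. A minimal sketch of the plain Matplotlib route mentioned above, stacking the bars by hand with a running `bottom`. The random data (instead of `test_data.csv`), the 5-day bar width, and the Sunday minor ticks are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Weekly (Sunday) dates like the sample data; values are random stand-ins
dates = pd.date_range("2022-03-20", periods=16, freq="7D")
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 5, size=(16, 4)), index=dates, columns=list("ABCD"))

fig, ax = plt.subplots()
bottom = np.zeros(len(df))
for col in df.columns:
    values = df[col].to_numpy()
    # Real dates on x; width is given in days; each series sits on the previous ones
    ax.bar(df.index.to_pydatetime(), values, width=5, bottom=bottom, label=col)
    bottom = bottom + values

# Month labels on major ticks, unlabeled weekly minor ticks
locator = mdates.MonthLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.SU))
ax.legend()
xmin, xmax = ax.get_xlim()
```

Because the x values are real dates here, the locator and formatter operate on date numbers rather than on bar positions.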
I have plotted a heatmap, which is displayed below. The x-axis shows the time of day and the y-axis shows the date. I want to show an x-axis label at every hour instead of the arbitrary labels displayed here.
I tried the following code, but in the resulting heatmap all the x labels overlap:
t = pd.date_range(start='00:00:00', end='23:59:59', freq='60T').time
df = pd.DataFrame(index=t)
df.reset_index(inplace=True)
df['index'] = df['index'].astype('str')
sns_hm = sns.heatmap(data=mat, cbar=True, lw=0,cmap=colormap,xticklabels=df['index'])
The following code supposes mat is a dataframe with columns for some timestamps for each of a number of days. Each of the days, the same timestamps need to appear again.
After drawing the heatmap, the left and right limits of the x-axis are retrieved. Supposing these span 0 to 24 hours, the range can be subdivided into 25 tick positions, one for each hour boundary from 00:00 to 24:00.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from pandas.tseries.offsets import DateOffset
from matplotlib.colors import ListedColormap, to_hex
# first, create some test data
df = pd.DataFrame()
df["date"] = pd.date_range('20220304', periods=19000, freq=DateOffset(seconds=54))
df["val"] = (((np.random.rand(len(df)) ** 100).cumsum() / 2).astype(int) % 2) * 100
df['day'] = df['date'].dt.strftime('%d-%m-%Y')
df['time'] = df['date'].dt.strftime('%H:%M:%S')
mat = df.pivot(index='day', columns='time', values='val')
colors = list(plt.cm.Greens(np.linspace(0.2, 0.9, 10)))
ax = sns.heatmap(mat, cmap=colors, cbar_kws={'ticks': range(0, 101, 10)})
xmin, xmax = ax.get_xlim()
tick_pos = np.linspace(xmin, xmax, 25)
tick_labels = [f'{h:02d}:00:00' for h in range(len(tick_pos))]
ax.set_xticks(tick_pos)
ax.set_xticklabels(tick_labels, rotation=90)
ax.set(xlabel='', ylabel='')
plt.tight_layout()
plt.show()
The left plot shows the default tick labels, the right plot the customized labels.
Hi, I am trying to change the x-axis into a month axis but can't seem to do it.
I can't find a way to insert the datetime axis, and when I do I run into errors.
These are my datetime variables. It's simple: I just want to create a weekly counter that starts in 2020 and runs for 1 year:
date = pd.date_range(start='2020-01-01', periods=52, freq='w')
This is my plot code:
cumm19=fcst2019.cumsum()
cumm20=fcst2020.cumsum()
cumm20w=fcst2020w.cumsum()
plt.axhline(y=cumm19['no._of_cases'][51], color='r', label= '2019 numbers (15,910)')
plt.plot(cumm20[:21], label ='2020 current')
plt.plot(cumm20[20:], label ='2020 predictions')
plt.plot(cumm20w[20:], label ='2020 worse case predictions')
plt.xlabel('Months')
plt.ylabel('Cumulative no. of cases')
plt.legend()
plt.show()
How do I combine them?
If you set the datetime variable as the index of the dataframes, the x-axis ticks automatically get date labels with an appropriate format:
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
import matplotlib.pyplot as plt # v 3.3.4
import matplotlib.dates as mdates
# Create sample datasets
date = pd.date_range(start='2020-01-01', periods=52, freq='w')
fcst2020 = pd.DataFrame(dict(fcst=np.repeat([500, 750], [21, 31])), index=date)
fcst2020w = pd.DataFrame(dict(fcst=np.repeat([500, 600], [21, 31])), index=date)
cumm20 = fcst2020.cumsum()
cumm20w = fcst2020w.cumsum()
# Plot data
plt.plot(cumm20[:21], label ='2020 current')
plt.plot(cumm20[20:], label ='2020 predictions')
plt.plot(cumm20w[20:], label ='2020 worse case predictions')
plt.axhline(y=15910, color='r', label= '2019 numbers (15,910)', zorder=0)
plt.xlabel('Months')
plt.ylabel('Cumulative no. of cases')
plt.legend()
plt.show()
You can customize the format of the tick labels by using the matplotlib dates DateFormatter:
# Edit x-axis tick labels
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m'))
plt.show()
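If month names at month boundaries are wanted rather than month numbers, a locator can be combined with the formatter. A minimal sketch, assuming a single illustrative series `cases` (not in the original) on the same weekly Sunday dates, written here as `freq="7D"` starting from 2020-01-05:

```python
import datetime as dt
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 52 weekly dates; hypothetical weekly counts accumulated over the year
date = pd.date_range(start="2020-01-05", periods=52, freq="7D")
cases = pd.Series(np.repeat([500, 750], [21, 31]), index=date).cumsum()

fig, ax = plt.subplots()
ax.plot(cases.index.to_pydatetime(), cases.to_numpy(), label="2020")

# One major tick on the first of each month, labeled with the month abbreviation
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.set_xlabel("Months")
ax.set_ylabel("Cumulative no. of cases")
ax.legend()

# What the formatter produces for a tick on 1 March 2020
sample_label = ax.xaxis.get_major_formatter()(mdates.date2num(dt.datetime(2020, 3, 1)))
```

The abbreviation produced by `%b` follows the active locale, so the labels may differ on systems not using an English or default locale.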
I want to display my third plot x-axis data in the datetime like my other two plots (see linked figure). I have used similar approaches to each graph, but resampled the third dataset to plot precipitation in a bar graph for every hour in my time period. When I originally attempted to format the date for the third plot as I did in the previous two, the x-axis labels either disappeared or the data doesn't plot correctly. In the link below, the data is displayed the way I intended.
Three subplots of rainfall
My timeseries data appears like this, where I'm only concerned about 'Reading' and 'Value':
Reading,Receive,Value,Unit,Quality
2018-04-07 13:09:28,2018-04-07 13:09:35,0.00,in,A
2018-04-07 06:01:25,2018-04-07 06:01:35,0.04,in,A
2018-04-07 04:38:15,2018-04-07 04:38:35,0.04,in,A
Here is how I achieved the correct scheme in the second plot:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.patches as patches
import datetime as dt
#read data from csv
data2 = pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Accumulation_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data2.set_index('Reading',inplace=True)
#plot data
ax2 = plt.subplot(3, 1, 2)
data2.plot(ax=ax2)
#set ticks every 12 hours
ax2.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0,24,12)))
plt.xticks(rotation=0, ha='center')
#format date
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%b %d\n%H:%M:%S'))
ax2.legend().set_visible(False)
ax2.set_title('Accumulated Rainfall\nApril 5-7, 2018')
ax2.set_xlabel('')
ax2.set_ylabel('Inches Since Oct 1 2017')
ax2.set_ylim(17.5, 22)
arrow_date2 = mdates.datestr2num('04/07/2018 04:30:00')
start_date2 = mdates.datestr2num('04/07/2018 03:00:00')
end_date2 = mdates.datestr2num('04/07/2018 06:00:00')
text_date2 = mdates.datestr2num('04/07/2018 03:00:00')
ax2.axvspan(start_date2, end_date2, 0.86, 0.97, color='green', alpha=0.35)
ax2.annotate("Approximate time of\nSlope Failure", xy=(arrow_date2, 21.5), xycoords='data', xytext=(text_date2, 19), textcoords='data', arrowprops=dict(arrowstyle="->", connectionstyle="arc3"))
My code so far for the third subplot:
#read data from csv
data =pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Increment_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data.set_index('Reading',inplace=True)
resamp = data.resample('1H').sum().reset_index()
#plot data
ax3 = plt.subplot(3, 1, 3)
resamp.plot(kind='bar',ax=ax3, x='Reading', y='Value', width=0.9)
#set ticks every other hour
plt.xticks(ha='center')
for label in ax3.xaxis.get_ticklabels()[::2]:
    label.set_visible(False)
ax3.legend().set_visible(False)
ax3.set_title('Rainfall in Hours\nApril 6-7, 2018')
ax3.set_xlabel('')
ax3.set_ylabel('Precipitation Increment (in)')
plt.show()
How do I fix my code to make the axis labels plot in the way I want them to plot?
My code was wrong, obviously. When I resampled the data, I reset the index. This created a new index column that was messing with my desired x values ('Reading'). Additionally, I shouldn't have been plotting 'x' in resamp.plot. This solution helped: Plotting with Pandas. Here is the corrected code:
#read data from csv
data = pd.read_csv('Arroyo_Corte_Madera_del_Presidio_38021_Precipitation_Increment_0.txt', usecols=['Reading','Value'], parse_dates=['Reading'])
#set date as index
data.set_index('Reading',inplace=True)
resamp = data.resample('1H').sum() # changed here
#plot data
ax3 = plt.subplot(3, 1, 3)
resamp.plot(ax=ax3, y='Value', kind='bar', width=0.9) # changed here
ax3.set_xticklabels([dt.strftime('%b %d\n%H:%M:%S') for dt in resamp.index])
plt.xticks(rotation=0, ha='center')
for i, tick in enumerate(ax3.xaxis.get_major_ticks()):
    if (i % (4) != 0): # 4 hours
        tick.set_visible(False)
ax3.legend().set_visible(False)
ax3.set_title('Rainfall in Hours\nApril 6-7, 2018')
ax3.set_xlabel('')
ax3.set_ylabel('Precipitation Increment (in)')
ax3.set_ylim(0.00, 0.40)
plt.show()
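The effect of `reset_index()` described above can be checked without plotting. A minimal sketch using only the three sample rows from the question, read from an in-memory string instead of the original file; `"60min"` is used as the hourly rule, and the names `hourly` and `flattened` are illustrative.

```python
import io
import pandas as pd

# The three sample rows from the question, kept in memory
csv_text = """Reading,Receive,Value,Unit,Quality
2018-04-07 13:09:28,2018-04-07 13:09:35,0.00,in,A
2018-04-07 06:01:25,2018-04-07 06:01:35,0.04,in,A
2018-04-07 04:38:15,2018-04-07 04:38:35,0.04,in,A
"""
data = pd.read_csv(io.StringIO(csv_text), usecols=["Reading", "Value"],
                   parse_dates=["Reading"])
data = data.set_index("Reading").sort_index()

# Hourly totals: the result keeps a DatetimeIndex
hourly = data.resample("60min").sum()

# reset_index() moves the timestamps into a column and leaves a 0..n-1 index
flattened = hourly.reset_index()

# Tick labels built from the DatetimeIndex, as in the corrected code
labels = [ts.strftime("%b %d\n%H:%M") for ts in hourly.index]
```

Here `hourly.index` holds the timestamps, while `flattened.index` is a plain integer range, which is what the bar plot was picking up.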
I'm trying to build matplotlib charts whose x-axis is a DatetimeIndex from a pandas dataframe. Trying to mimic some examples from matplotlib, I've been unsuccessful. The x-axis ticks and labels never appear.
I thought maybe matplotlib wasn't properly digesting the pandas index, so I converted it to ordinal with the matplotlib date2num helper function, but that gave the same result.
# https://matplotlib.org/api/dates_api.html
# https://matplotlib.org/examples/api/date_demo.html
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
import matplotlib.dates as mpd
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
yearsFmt = mdates.DateFormatter('%Y')
majorLocator = years
majorFormatter = yearsFmt #FormatStrFormatter('%d')
minorLocator = months
y1 = np.arange(100)*0.14+1
y2 = -(np.arange(100)*0.04)+12
"""neither of these indices works"""
x = pd.date_range(start='4/1/2012', periods=len(y1))
#x = map(mpd.date2num, pd.date_range(start='4/1/2012', periods=len(y1)))
fig, ax = plt.subplots()
ax.plot(x,y1)
ax.plot(x,y2)
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
ax.xaxis.set_minor_locator(months)
datemin = x[0]
datemax = x[-1]
ax.set_xlim(datemin, datemax)
fig.autofmt_xdate()
plt.show()
The problem is the following. pd.date_range(start='4/1/2012', periods=len(y1)) creates dates from the first of April 2012 to the 9th of July 2012.
Now you set the major locator to be a YearLocator. This means that you want a tick for each year on the axis; by default it places one on January 1 of each year. However, all dates are within the same year, 2012, and the plotted range (April to July) contains no January 1. So there is no major tick to be shown within the plot range.
The suggestion would be to use a MonthLocator instead, so that the first of each month is ticked. Also, it would make sense to use a formatter that actually shows the months, e.g. '%b %Y'. If you want, you may use a DayLocator for the minor ticks to show small tick marks for each day.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax.xaxis.set_minor_locator(mdates.DayLocator())
Complete example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
y1 = np.arange(100)*0.14+1
y2 = -(np.arange(100)*0.04)+12
x = pd.date_range(start='4/1/2012', periods=len(y1))
fig, ax = plt.subplots()
ax.plot(x,y1)
ax.plot(x,y2)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax.xaxis.set_minor_locator(mdates.DayLocator())
fig.autofmt_xdate()
plt.show()
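The difference between the two locators can also be checked numerically. A minimal sketch that counts the major ticks falling inside the visible x-range for each locator; `ticks_in_view` is a hypothetical helper written for this illustration, not part of Matplotlib, and the default axis margins are assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def ticks_in_view(ax):
    """Return the major tick positions that fall inside the current x-limits."""
    lo, hi = ax.get_xlim()
    return [t for t in ax.xaxis.get_majorticklocs() if lo <= t <= hi]


# 100 daily dates starting 1 April 2012, as in the question
x = pd.date_range(start="2012-04-01", periods=100)
fig, ax = plt.subplots()
ax.plot(x.to_pydatetime(), np.arange(100) * 0.14 + 1)

# Yearly ticks: none of them lands between April and July 2012
ax.xaxis.set_major_locator(mdates.YearLocator())
year_ticks = ticks_in_view(ax)

# Monthly ticks: the first of each month inside the range
ax.xaxis.set_major_locator(mdates.MonthLocator())
month_ticks = ticks_in_view(ax)
month_labels = [mdates.num2date(t).strftime("%b %Y") for t in month_ticks]
```

With the default margins, `year_ticks` is empty and `month_labels` lists the four month starts from April to July 2012.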
You could use pd.DataFrame.plot to handle most of that:
df = pd.DataFrame(dict(
    y1=y1, y2=y2
), index=x)
df.plot()
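By default pandas applies its own tick handling to time series line plots, which can interfere with Matplotlib date locators set afterwards. pandas documents an `x_compat` plotting option that turns this adjustment off. A minimal sketch, assuming the month locator and formatter from the answer above are still wanted:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

y1 = np.arange(100) * 0.14 + 1
y2 = -(np.arange(100) * 0.04) + 12
x = pd.date_range(start="2012-04-01", periods=len(y1))
df = pd.DataFrame({"y1": y1, "y2": y2}, index=x)

# x_compat=True asks pandas to leave the date axis to Matplotlib
ax = df.plot(x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
xmin, xmax = ax.get_xlim()
```

Without the locator and formatter lines, `df.plot()` alone already produces readable date labels; `x_compat=True` matters mainly when Matplotlib's date tools are applied on top.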