Setting minute minor ticks for 1-second sampled data raises: OverflowError: int too big to convert
Consider this dataframe with a sample interval of 1 second that spans about 30 minutes:
import matplotlib.pyplot as plt
from matplotlib.dates import MinuteLocator
import numpy as np
import pandas as pd
ndex = pd.date_range('2021-08-01 07:07:07', '2021-08-01 07:41:12', freq='1S', name='Time')
df = pd.DataFrame(data=np.random.randint(1, 100, len(ndex)), index=ndex, columns=['A'])
And now we plot it:
fig, ax = plt.subplots()
df.plot(color='red', marker='x', lw=0, ms=0.2, ax=ax)
Which creates a plot without any complaints:
Now I'd like to have minor ticks at every minute.
I've tried this:
ax.xaxis.set_minor_locator(MinuteLocator())
But that fails with OverflowError: int too big to convert
pandas.DataFrame.plot uses matplotlib as the default plotting backend, but it encodes date ticks as unix timestamps, which results in OverflowError: int too big to convert.
The default here is kind='line', but marker='x', lw=0, ms=0.2 are used in the OP to make a hacky scatter plot.
pandas.DataFrame.plot.scatter will work correctly.
Using matplotlib.pyplot.scatter will work as expected.
matplotlib: Date tick labels
Matplotlib date plotting is done by converting date instances into days since an epoch (by default 1970-01-01T00:00:00)
seaborn.scatterplot will also work:
sns.scatterplot(x=df.index, y=df.A, color='red', marker='x', ax=ax)
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2
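For the original goal (minor ticks at every minute), one option is to plot through matplotlib directly so the x-axis stays in matplotlib date units. This is only a minimal sketch, assuming matplotlib 3.3 or later; the Agg backend, the fixed random seed and the figure size are my own choices to make it reproducible without a display, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of data as in the question: 1-second samples over roughly 34 minutes
index = pd.date_range("2021-08-01 07:07:07", "2021-08-01 07:41:12",
                      freq=pd.Timedelta(seconds=1), name="Time")
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.integers(1, 100, len(index))}, index=index)

fig, ax = plt.subplots(figsize=(12, 4))
# Plot with matplotlib itself (markers only), so the axis uses matplotlib date units
ax.plot(df.index, df["A"], linestyle="none", marker="x", markersize=2, color="red")

# With date units on the axis, the minute locator can be used for minor ticks
ax.xaxis.set_minor_locator(mdates.MinuteLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

fig.canvas.draw()
minor_ticks = ax.xaxis.get_minorticklocs()
print(len(minor_ticks))  # number of minor ticks that do not coincide with a major tick
```

Minor ticks that fall exactly on a major tick are dropped by matplotlib by default, so the count is a bit lower than the number of whole minutes in view.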
matplotlib.pyplot.scatter
The extra formatting has the effect of removing the day of the month ('01') that would otherwise precede the time in the tick labels (e.g. '%d %H:%M').
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(25, 6))
ax.scatter(x=df.index, y=df.A, color='red', marker='x')
hourlocator = mdates.HourLocator(interval=1) # adds some extra formatting, but not required
majorFmt = mdates.DateFormatter('%H:%M') # adds some extra formatting, but not required
ax.xaxis.set_major_locator(mdates.MinuteLocator())
ax.xaxis.set_major_formatter(majorFmt) # adds some extra formatting, but not required
_ = plt.xticks(rotation=90)
pandas.DataFrame.plot.scatter
Also pandas.DataFrame.plot with kind='scatter'
ax = df.reset_index().plot(kind='scatter', x='Time', y='A', color='red', marker='x', figsize=(25, 6), rot=90)
# reset the index so Time will be a column to assign to x
ax = df.reset_index().plot.scatter(x='Time', y='A', color='red', marker='x', figsize=(25, 6), rot=90)
ax.xaxis.set_major_locator(mdates.MinuteLocator())
Note the difference in the xticks produced by the two methods
pandas.DataFrame.plot xticks
ax = df.plot(color='red', marker='x', lw=0, ms=0.2, figsize=(25, 6))
# extract the xticks to see the format
ticks = ax.get_xticks()
print(ticks)
[out]:
array([1627801627, 1627803672], dtype=int64)
# convert the column to unix format to compare
(df.index - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')
[out]:
Int64Index([1627801627, 1627801628, 1627801629, 1627801630, 1627801631,
1627801632, 1627801633, 1627801634, 1627801635, 1627801636,
...
1627803663, 1627803664, 1627803665, 1627803666, 1627803667,
1627803668, 1627803669, 1627803670, 1627803671, 1627803672],
dtype='int64', name='Time', length=2046)
matplotlib.pyplot.scatter xticks
fig, ax = plt.subplots(figsize=(25, 6))
ax.scatter(x=df.index, y=df.A, color='red', marker='x')
ticks2 = ax.get_xticks()
print(ticks2)
[out]:
array([18840.29861111, 18840.30208333, 18840.30555556, 18840.30902778,
18840.3125 , 18840.31597222, 18840.31944444])
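To make the two tick formats above easier to compare, here is a small sketch that converts the first sample both ways. It assumes matplotlib 3.3 or later, where the default epoch is 1970-01-01 (older versions use a different epoch and give different numbers).

```python
import datetime as dt
import matplotlib.dates as mdates

# First sample of the question's index, as a naive datetime (treated as UTC)
first_sample = dt.datetime(2021, 8, 1, 7, 7, 7)

# Unix format: seconds since 1970-01-01 (the unit seen in the pandas ticks above)
unix_seconds = (first_sample - dt.datetime(1970, 1, 1)).total_seconds()

# Matplotlib date format: days since the epoch (the unit seen in the scatter ticks above)
days_since_epoch = mdates.date2num(first_sample)

print(unix_seconds)
print(days_since_epoch)

# One of the scatter ticks shown above, converted back to a datetime
print(mdates.num2date(18840.3125))
```

The two numbers describe the same instant, one in seconds and one in days, which is why a locator that expects days misbehaves on an axis holding seconds.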
I'm having serious trouble modifying how and which x-axis labels are presented in my plot.
I have a datetime index and want to reduce the number of xticks being shown and remove the year from them. Should be simple, right?! But, for some reason, the plot disappears after I set the major formatter and locator. Here is a working example:
import datetime
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
teste = pd.DataFrame(index=pd.date_range('2019-01-01','2019-12-31',freq='2D'),columns=['A','B','C'])
teste['A']=.4
teste['B']=.5
teste['C']=.1
for col in teste.columns:
    variation = np.random.rand(len(teste))
    teste[col]+=variation/10.0
teste['total']=teste.sum(axis=1)
for col in teste.columns:
    teste[col]/=teste['total']
ax = plt.figure(figsize=(24,10)).add_axes([0,0,1,1])
teste.drop('total',axis=1).plot(kind='bar',stacked=True,ax=ax,width=1,colormap='coolwarm')
ax.tick_params(labelsize=14)
ax.set_xlabel('')
ax.set_title('Teste',fontsize=28)
ax.set_ylabel('Share (%)',fontsize=22)
ax.tick_params(axis='both',labelsize=20)
ax.legend(bbox_to_anchor=(1.05, 1),fontsize=22, loc='upper left', borderaxespad=0.)
As you can see, the xticks are unreadable. But when i try to format:
ax = plt.figure(figsize=(24,10)).add_axes([0,0,1,1])
teste.drop('total',axis=1).plot(kind='bar',stacked=True,ax=ax,width=1,colormap='RdBu')
ax.xaxis_date()
ax.xaxis.set_major_locator(mdates.DayLocator(interval=10))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m"))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%d/%m"))
ax.set_xlim(teste.index[0],teste.index[-1])
ax.margins(0)
ax.tick_params(labelsize=14)
ax.set_xlabel('')
ax.set_title('Teste',fontsize=28)
ax.set_ylabel('Share (%)',fontsize=22)
ax.tick_params(axis='both',labelsize=20)
ax.legend(bbox_to_anchor=(1.05, 1),fontsize=22, loc='upper left', borderaxespad=0.)
The plot vanishes. What am I doing wrong? I've tried everything. plt.MaxNLocator(N=10) also doesn't work. It spreads the first N points all over the axis, completely disregarding where they actually should be.
Any help would be greatly appreciated.
Thanks in advance,
Edit: @Trenton McKinney:
Removing ax.set_xlim(teste.index[0],teste.index[-1]) makes the plot appear but without the xticks.
I used the method shown on the Matplotlib website: Stacked Bar Graph
With a bar plot, every bar has a location [0, ..., n]
ind selects the locs to label
dates are the names of the selected ticks
ax = plt.figure(figsize=(24,10)).add_axes([0,0,1,1])
teste.drop('total',axis=1).plot(kind='bar',stacked=True,ax=ax,width=1,colormap='RdBu')
# locations of tick marks to label
ind = np.arange(0, len(teste.index)+1, 10)
# label for ticks
dates = teste.index.date[0::10] # %y-%m-%d format
# dates = teste.index.strftime('%d/%m')[0::10] # %d/%m format
# set the xticks
plt.xticks(ind, dates)
# only used to show locs and labels if you're having trouble
# locs, labels = plt.xticks()
# label_t = [x.get_text() for x in labels]
# formatting
ax.margins(0)
ax.tick_params(labelsize=14)
ax.set_xlabel('')
ax.set_title('Teste',fontsize=28)
ax.set_ylabel('Share (%)',fontsize=22)
ax.tick_params(axis='both',labelsize=20)
ax.legend(bbox_to_anchor=(1.05, 1),fontsize=22, loc='upper left', borderaxespad=0.)
plt.show()
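The point about bar locations can be checked on a smaller frame. The sketch below uses a hypothetical 40-row frame named demo (not the teste data) and the Agg backend so it runs without a display. It builds the tick positions and the labels from the same step, so the two always have the same length.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical small frame with a date index, one row every 2 days
index = pd.date_range("2019-01-01", periods=40, freq="2D")
demo = pd.DataFrame({"A": np.full(len(index), 0.6),
                     "B": np.full(len(index), 0.4)}, index=index)

fig, ax = plt.subplots()
demo.plot(kind="bar", stacked=True, width=1, ax=ax)

# pandas places one tick per bar at integer positions, not at date values
bar_positions = ax.get_xticks()

# Label every 10th bar; positions and labels are sliced with the same step
step = 10
tick_positions = np.arange(0, len(demo), step)
tick_labels = list(demo.index.strftime("%d/%m")[::step])
ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels, rotation=45, ha="right")
print(tick_labels)
```

Because the bars sit at integer positions, date locators such as DayLocator have nothing sensible to work with on this axis, which would explain the vanishing ticks in the question.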
Optionally
fig, ax = plt.subplots(figsize=(20, 8))
p1 = ax.bar(teste.index, teste.A)
p2 = ax.bar(teste.index, teste.B, bottom=teste.A)
p3 = ax.bar(teste.index, teste.C, bottom=teste.A+teste.B)
ax.xaxis_date()
ax.xaxis.set_major_locator(mdates.DayLocator(interval=10))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m"))
ax.set_xlim(teste.index[0],teste.index[-1])
plt.xticks(rotation=45, ha='right') # or (rotation=90, ha='center')
plt.show()
I have a log which describes my home ADSL speeds.
Log entries are in the following format, where the fields are datetime;level;downspeed;upspeed;loss;testhost:
2020-01-06 18:09:45;INFO;211.5;29.1;0;host:spd-pub-rm-01-01.fastwebnet.it
2020-01-06 18:14:39;WARNING;209.9;28.1;0;host:spd-pub-rm-01-01.fastwebnet.it
2020-01-08 10:51:27;INFO;211.6;29.4;0;host:spd-pub-rm-01-01.fastwebnet.it
(for a full sample file -> https://www.dropbox.com/s/tfmj9ozxe5millx/test.log?dl=0 for you to download for the code below)
I wish to plot a matplotlib figure with the download speeds on the left axis, the upload speeds (which span a smaller and lower range of values) on the right axis, and the shortened datetimes under the x tick marks, possibly at a 45-degree angle.
"""Plots the adsl-log generated log."""
import matplotlib.pyplot as plt
# import matplotlib.dates as mdates
import pandas as pd
# set field delimiter and set column names which will also cause reading from row 1
data = pd.read_csv("test.log", sep=';', names=[
'datetime', 'severity', 'down', 'up', 'loss', 'server'])
# we need to filter out ERROR records (with 0 speeds)
indexNames = data[data['severity'] == 'ERROR'].index
data.drop(indexNames, inplace=True)
# convert datetime pandas object to datetime64
data['datetime'] = pd.to_datetime(data['datetime'])
# use a dataframe with just the data I need; cleaner
speeds_df = data[['datetime', 'down', 'up']]
speeds_df.info() # this shows datetime column is really a datetime64 value now
# now let's plot
fig, ax = plt.subplots()
y1 = speeds_df.plot(ax=ax, x='datetime', y='down', grid=True, label="DL", legend=True, linewidth=2,ylim=(100,225))
y2 = speeds_df.plot(ax=ax, x='datetime', y='up', secondary_y=True, label="UL", legend=True, linewidth=2, ylim=(100,225))
plt.show()
I am now obtaining the plot I need but would appreciate some clarification about the roles of the ax, y1 and y2 axes in the above code.
First, assigning y1 and y2 objects is unnecessary as you will never use them later on. Also, legend=True is the default.
Per matplotlib.pyplot.subplots docs, the return of ax is:
ax : axes.Axes object or array of Axes objects
Per pandas.DataFrame.plot, the ax argument:
ax : matplotlib axes object, default None
Therefore, you are first initializing an array of axes objects (defaulting to one item, nrows=1 and ncols=1), and then assigning it/them according to the pandas plots. Now, normally, you would be overwriting the assignment of ax with ax=ax, but since you employ a secondary y-axis, the plots overlay each other:
# INITIALIZE FIG DIMENSION AND AXES OBJECTS
fig, axs = plt.subplots(figsize=(8,4))
# ASSIGN AXES OBJECTS ACCORDINGLY
speeds_df.plot(ax=axs, x='datetime', y='down', grid=True, label="DL", linewidth=2, ylim=(100,225))
speeds_df.plot(ax=axs, x='datetime', y='up', secondary_y=True, label="UL", linewidth=2, ylim=(100,225))
plt.show()
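On the roles of ax, y1 and y2: a quick way to see what each name refers to is to compare the returned objects. This is a sketch on a hypothetical three-row stand-in for speeds_df (values taken from the sample log lines), with the Agg backend assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for speeds_df, built from the three sample log lines
speeds_df = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-01-06 18:09:45",
                                "2020-01-06 18:14:39",
                                "2020-01-08 10:51:27"]),
    "down": [211.5, 209.9, 211.6],
    "up": [29.1, 28.1, 29.4],
})

fig, ax = plt.subplots()
y1 = speeds_df.plot(ax=ax, x="datetime", y="down", label="DL")
y2 = speeds_df.plot(ax=ax, x="datetime", y="up", secondary_y=True, label="UL")

print(y1 is ax)           # the first call hands back the axes it was given
print(y2 is ax)           # the secondary_y call hands back a different axes
print(y2 is ax.right_ax)  # that axes is the right-hand twin pandas attached to ax
```

So y1 is just another name for ax, while y2 is the right-hand axes, which is the one to use for the upload label and limits.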
To illustrate how axes objects can be extended, see below with multiple (non-overlaid) plots.
Example of multiple subplots using nrows=2:
# INITIALIZE FIG DIMENSION AND AXES OBJECTS
fig, axs = plt.subplots(nrows=2, figsize=(8,4))
# ASSIGN AXES OBJECTS WITH INDEXING AND NO Y LIMITS
speeds_df.plot(ax=axs[0], x='datetime', y='down', grid=True, label="DL", linewidth=2)
plt.subplots_adjust(hspace = 1)
speeds_df.plot(ax=axs[1], x='datetime', y='up', label="UL", linewidth=2)
plt.show()
Example of multiple plots using ncols=2:
# INITIALIZE FIG DIMENSION AND AXES OBJECTS
fig, axs = plt.subplots(ncols=2, figsize=(12,4))
# ASSIGN AXES OBJECTS WITH INDEXING AND NO Y LIMITS
speeds_df.plot(ax=axs[0], x='datetime', y='down', grid=True, label="DL", linewidth=2)
speeds_df.plot(ax=axs[1], x='datetime', y='up', label="UL", linewidth=2)
plt.show()
You can even use subplots=True after setting date/time field as index:
# INITIALIZE FIG DIMENSION AND AXES OBJECTS
fig, axs = plt.subplots(figsize=(8,4))
# ASSIGN AXES OBJECT PLOTTING ALL COLUMNS
speeds_df.set_index('datetime').plot(ax=axs, subplots=True, grid=True, label="DL", linewidth=2)
plt.show()
So thanks to @Parfait I hope I understood things correctly. Here is the working code:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
###### Prepare the data to plot
# set field delimiter and set column names which will also cause reading from row 1
data = pd.read_csv('test.log', sep=';', names=[
'datetime', 'severity', 'down', 'up', 'loss', 'server'])
# we need to filter out ERROR records (with 0 speeds)
indexNames = data[data['severity'] == 'ERROR'].index
data.drop(indexNames, inplace=True)
# convert datetime pandas object to datetime64
data['datetime'] = pd.to_datetime(data['datetime'])
# use a dataframe with just the data I need; cleaner
speeds_df = data[['datetime', 'down', 'up']]
# now plot the graph
fig, ax = plt.subplots()
color = 'tab:green'
ax.set_xlabel('thislabeldoesnotworkbutcolordoes', color=color)
ax.tick_params(axis='x', labelcolor=color)
color = 'tab:red'
speeds_df.plot(ax=ax, x='datetime', y='down', label="DL", legend=True, linewidth=2, color=color)
ax.set_ylabel('DL', color=color)
ax.tick_params(axis='y', labelcolor=color)
color = 'tab:blue'
ax2 = speeds_df.plot(ax=ax, x='datetime', y='up', secondary_y=True, label="UL", legend=True, linewidth=2, color=color)
ax2.set_ylabel('UL', color=color)
ax2.tick_params(axis='y', labelcolor=color)
# using ylim in the plot command params does not work the same
# cannot show a grid since the two scales are different
ax.set_ylim(10, 225)
ax2.set_ylim(15, 50)
plt.show()
Which gives:
What I still don't get is:
a) why the x-axis label only seems to honour the color but not the string value :(
b) why the ylim=(n,m) parameter in the df plot does not work well and I have to use the ax.set_ylim calls instead
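A possible explanation for (a), offered as an assumption rather than something confirmed in this thread: the pandas plot call labels the x-axis itself, so label text set before plotting gets replaced while the color set on the label survives. Setting the text after the pandas calls sidesteps that. The sketch uses a hypothetical stand-in for speeds_df and a made-up label string.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for speeds_df, built from the three sample log lines
speeds_df = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-01-06 18:09:45",
                                "2020-01-06 18:14:39",
                                "2020-01-08 10:51:27"]),
    "down": [211.5, 209.9, 211.6],
})

fig, ax = plt.subplots()
ax.set_xlabel("set before plotting", color="tab:green")
speeds_df.plot(ax=ax, x="datetime", y="down", label="DL")
print(ax.get_xlabel())  # shows whatever label is in place after the pandas call

# Setting the text again after plotting; the color given earlier is kept
ax.set_xlabel("Time of measurement")  # hypothetical label text
print(ax.get_xlabel())
```

The same ordering idea matches what the working code above already does for (b): the limits are applied with ax.set_ylim and ax2.set_ylim after both plot calls.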
I have a 2x2 graph with date in x-axis in both graphs. I have used datetime.strptime to bring a string into type = datetime.datetime object format.
However I am planning to have some 12 subplots and doing this the following way seems messy.
Is there a better 'pythonic' way?
This is what I have:
xx = plt.subplot(2,1,1)
xx.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%y %H:%M'))
plt.grid(True)
plt.ylabel('paramA',fontsize=8, color = "blue")
plt.tick_params(axis='both', which='major', labelsize=8)
plt.plot(date_list, myarray[:,0], '-b', label='paramA')
plt.setp(plt.xticks()[1], rotation=30, ha='right') # ha is the same as horizontalalignment
xx = plt.subplot(2,1,2)
xx.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%y %H:%M'))
plt.grid(True)
plt.ylabel('paramB', fontsize=8, color = "blue") # paramB: amount of virtual mem
plt.tick_params(axis='both', which='major', labelsize=8)
plt.plot(date_list, myarray[:,1], '-y', label='paramB')
plt.setp(plt.xticks()[1], rotation=30, ha='right') # ha is the same as horizontalalignment
PS: Initially I tried defining the plot as follows. This however did not work:
fig, axs = plt.subplots(2,1,figsize=(15,15))
plt.title('My graph')
for ax in enumerate(axs):
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%y %H:%M:%S'))
You failed to provide any data or a Minimal, Complete, and Verifiable example. Nevertheless, something like this should work. You can extend it to your real case by using desired number of rows and columns in the first command.
fig, axes = plt.subplots(nrows=2, ncols=3)
labels = ['paramA', 'paramB', 'paramC', 'paramD', 'paramE', 'paramF']
for i, ax in enumerate(axes.flatten()):
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%y %H:%M'))
    ax.grid(True)
    ax.set_ylabel(labels[i], fontsize=8, color="blue")
    ax.tick_params(axis='both', which='major', labelsize=8)
    ax.plot(date_list, myarray[:,i], '-b', label=labels[i])
    plt.setp(ax.get_xticklabels(), rotation=30, ha='right') # ha is the same as horizontalalignment
EDIT:
Change your code to
fig, axs = plt.subplots(2,1,figsize=(15,15))
plt.title('My graph')
for ax in axs:
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%y %H:%M:%S'))
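For the 12-subplot case mentioned in the question, the same loop idea can be sketched end to end. Since no data was posted, date_list and myarray below are hypothetical stand-ins (48 hourly timestamps and random values), and the 4x3 layout is just one way to arrange 12 plots.

```python
import datetime as dt
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the question's date_list and myarray
date_list = [dt.datetime(2021, 1, 1) + dt.timedelta(hours=h) for h in range(48)]
myarray = np.random.default_rng(0).random((len(date_list), 12))
labels = [f"param{chr(ord('A') + i)}" for i in range(12)]  # paramA ... paramL

fig, axes = plt.subplots(nrows=4, ncols=3, figsize=(15, 12), sharex=True)

# axes is a 2-D array; axes.flat walks over it one Axes at a time
for ax, label, column in zip(axes.flat, labels, myarray.T):
    ax.plot(date_list, column, "-b", label=label)
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%y %H:%M"))
    ax.set_ylabel(label, fontsize=8, color="blue")
    ax.tick_params(axis="both", which="major", labelsize=8)
    ax.tick_params(axis="x", labelrotation=30)  # rotate this subplot's own date labels
    ax.grid(True)

fig.suptitle("My graph")
```

Everything is done through the ax object inside the loop, so nothing depends on which subplot pyplot currently considers active.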
I am trying to figure out how to change the rotation of the dates on the x-axis of my chart. See the image below. I have examples of how to do it, but they don't match well as I have twin y-axes. Can you help me change the rotation of the dates?
Here is my code:
fig, ax1 = plt.subplots(figsize=(8,6))
t = df['date']
s1 = df['msft']
ax1.plot(t, s1, 'b-')
ax1.set_xlabel('Dates')
ax1.legend(loc=0)
ax1.grid()
# Make the y-axis label, ticks and tick labels match the line color.
ax1.set_ylabel('Price', color='b')
ax1.tick_params('y', colors='b')
ax2 = ax1.twinx()
s2 = df['amzn']
ax2.plot(t, s2, 'r-')
ax2.set_ylabel('amzn', color='r')
ax2.tick_params('y', colors='r')
ax2.legend(loc=0)
fig.tight_layout()
plt.show()
I added ax1.set_xticklabels(t, rotation=45); this line got the dates to be at a 45-degree angle.
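An alternative that leaves the tick label text alone and only changes the angle is tick_params with labelrotation. This is a sketch on a hypothetical stand-in for df (30 daily rows with made-up prices), using the Agg backend so it runs without a display. With twinx, the date labels belong to ax1, so that is the axes to rotate.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df with made-up prices
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=30, freq="D"),
    "msft": range(150, 180),
    "amzn": range(1800, 1830),
})

fig, ax1 = plt.subplots(figsize=(8, 6))
ax1.plot(df["date"], df["msft"], "b-", label="msft")
ax1.set_xlabel("Dates")
ax1.set_ylabel("Price", color="b")

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.plot(df["date"], df["amzn"], "r-", label="amzn")
ax2.set_ylabel("amzn", color="r")

# Rotate only the angle of the date labels on the shared x-axis
ax1.tick_params(axis="x", labelrotation=45)

fig.tight_layout()
fig.canvas.draw()
rotations = [label.get_rotation() for label in ax1.get_xticklabels()]
print(rotations)
```

Unlike set_xticklabels(t, ...), this does not replace the label text, so the labels stay tied to wherever the date locator puts the ticks.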