Matplotlib dates.DateFormatter forcing the display of nanoseconds - python

I have an issue with matplotlib.dates.DateFormatter:
I want to display timestamps as dates. That is usually simple with strftime, but when I pre-format the values that way I lose the dynamic position readout on my graph, so I used md.DateFormatter('%H:%M:%S.%f') to get the X values in date format while keeping the dynamic index.
The problem is that my dates have too many digits. I don't want the trailing digits (what I call the nanoseconds), but I don't know how to remove them. I searched Stack Overflow for a solution, but applying date[:-3] won't work because I have datetime values, not strings.
Do you have a solution? It may be trivial, but I can't find one right now.
Thanks in advance.
NB: What I call the dynamic index is the readout at the bottom of the graph window that shows the exact X and Y values under your pointer.
Here is an applicable example :
df =
                    timestamp val
0  2022-03-13 03:19:59.999070  X1
1  2022-03-13 03:20:00.004070  X2
2  2022-03-13 03:20:00.009070  X3
3  2022-03-13 03:20:00.014070  X4
And I try to plot this with :
ax=plt.gca()
xfmt = md.DateFormatter('%H:%M:%S.%f')
ax.xaxis.set_major_formatter(xfmt)
plt.plot(df.timestamp, df.val, linestyle="-", marker = ".")
plt.setp(ax.get_xticklabels(), rotation=40)
plt.show()
In conclusion, what I want is to remove the trailing 070 from the labels on the graph, but if I truncate the data beforehand, DateFormatter just replaces those digits with 000, which is equally useless.

If you want to change both the tick labels and the format of the number shown on the interactive status bar, you could define your own function to deliver your desired format, then use a FuncFormatter to display those values on your plot.
For example:
import matplotlib.pyplot as plt
import matplotlib.dates as md
import pandas as pd
# dummy data
ts = pd.date_range("2022-03-13 03:19:59.999070",
"2022-03-13 03:20:00.014070", periods=4)
df = pd.DataFrame({'timestamp': ts, 'val':[0, 1, 2, 3]})
fig, ax = plt.subplots()
# define our own function to drop the last three characters
xfmt = lambda x, pos: md.DateFormatter('%H:%M:%S.%f')(x)[:-3]
# use that function as the major formatter, using FuncFormatter
ax.xaxis.set_major_formatter(plt.FuncFormatter(xfmt))
plt.setp(ax.get_xticklabels(), rotation=40)
ax.plot(df.timestamp, df.val, linestyle="-", marker = ".")
plt.tight_layout()
plt.show()
Note the matching tick format and status bar format.
If, however, you do not want to change the tick labels, but only the value on the status bar, you can do that by reassigning the ax.format_coord function, using a similar idea to the function defined above, but also adding the y value for display.
For example:
import matplotlib.pyplot as plt
import matplotlib.dates as md
import pandas as pd
# dummy data
ts = pd.date_range("2022-03-13 03:19:59.999070",
"2022-03-13 03:20:00.014070", periods=4)
df = pd.DataFrame({'timestamp': ts, 'val':[0, 1, 2, 3]})
fig, ax = plt.subplots()
xfmt = md.DateFormatter('%H:%M:%S.%f')
xfmt2 = lambda x, y: "x={}, y={:g}".format(xfmt(x)[:-3], y)
# use original formatter here with microseconds
ax.xaxis.set_major_formatter(xfmt)
# and the millisecond function here
ax.format_coord = xfmt2
plt.setp(ax.get_xticklabels(), rotation=40)
ax.plot(df.timestamp, df.val, linestyle="-", marker = ".")
plt.tight_layout()
plt.show()
Note the difference between the status bar and the tick formats here.
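The string slicing used in both examples can be checked on its own, without a plot. The following is a minimal sketch using only the standard library; the helper names format_ms and format_ms_rounded are hypothetical and not part of matplotlib. It relies on %f always producing six digits, so dropping the last three characters leaves milliseconds.

```python
from datetime import datetime, timedelta


def format_ms(dt: datetime) -> str:
    """Format as HH:MM:SS.mmm by truncating microseconds to milliseconds."""
    # %f always yields six digits, so removing the last three keeps milliseconds.
    return dt.strftime("%H:%M:%S.%f")[:-3]


def format_ms_rounded(dt: datetime) -> str:
    """Variant that rounds to the nearest millisecond instead of truncating."""
    # Adding 500 microseconds before truncating rounds half up;
    # datetime arithmetic carries into seconds and minutes when needed.
    return format_ms(dt + timedelta(microseconds=500))


samples = [
    datetime(2022, 3, 13, 3, 19, 59, 999070),
    datetime(2022, 3, 13, 3, 20, 0, 4070),
]
for sample in samples:
    print(format_ms(sample), format_ms_rounded(sample))
```

To use such a helper in a plot, the float passed to the formatter would first have to be converted to a datetime, for example with matplotlib.dates.num2date(x) inside the function given to FuncFormatter. Whether truncation or rounding is preferable depends on how the readout is meant to be interpreted.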

Related

Measurement length for X and Y-axis

I wonder if it's possible to change the measurement milestones for graphs created by pandas. In my code the X-axis stands for time and is measured by month, but the measurement milestones are all over the place.
In the image below, the milestones for the X-axis are 2012M01, 2012M06, 2012M11, 2013M04 and 2013M09.
Is there any way I can choose how long the distance should be between every milestone? For example, to make it so it shows every year or every half year?
This is the code I used for the function making the graph:
def graph(dataframe):
    graph = dataframe[["Profit"]].plot()
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    plt.grid(True)
    plt.show()
The actual dataframe is just an excel-file with a bunch of months and monetary values in it.
I think the most straightforward approach is to use matplotlib.dates to format the axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def graph(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m') #see https://strftime.org/
    major = mdates.MonthLocator([1,7]) #label only Jan and Jul
    graph = dataframe[["Profit"]].plot(ax=ax) #link plot to the existing axes
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    graph.xaxis.set_major_locator(major) #set major locator tick on x-axis
    graph.xaxis.set_major_formatter(xfmt) #format xtick label
    plt.grid(True)
    plt.show()
But a key point is that your dates need to be Python's built-in datetime.date (not datetime.datetime); credit for this goes to another SO answer. If your dates are str or a different type of datetime, you will need to convert them first, and there are many resources on SO and elsewhere for doing this:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
dr = [pd.to_datetime(date).date() for date in dr] #explicitly converting to datetime with .date()
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
type(df.index[0])
Out[0]:
datetime.date
Calling graph(df) using the example above gets this plot:
Just to expand on this, here's what happens when the index is pandas.Timestamp instead of datetime.date:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in df.index] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
graph(df)
Out[0]:
The x-axis is improperly formatted:
However, if you are willing to just create the plot directly through matplotlib, rather than pandas (pandas is using matplotlib anyway), this can handle more types of dates:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in df.index] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
def graph_2(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m')
    major = mdates.MonthLocator([1,7])
    ax.plot(dataframe.index,dataframe['Profit'], label='Profit')
    ax.set_title('Statistics')
    ax.set_ylabel('Thousand $')
    ax.set_xlabel('Time')
    ax.xaxis.set_major_locator(major)
    ax.xaxis.set_major_formatter(xfmt)
    ax.legend() #legend needs to be added
    plt.grid(True)
    plt.show()
graph_2(df)
type(df.index[0])
Out[0]:
pandas._libs.tslibs.timestamps.Timestamp
And here is the working graph:
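To see which tick labels the MonthLocator([1,7]) and '%YM%m' combination is expected to produce for the example index, here is a minimal standard-library sketch. The function month_ticks is a hypothetical stand-in that emulates the selection (first day of each listed month, which matches the locator's default bymonthday=1); it does not call matplotlib.

```python
from datetime import date


def month_ticks(start: date, end: date, bymonth):
    """Return the first day of each listed month between start and end, inclusive."""
    wanted = set(bymonth)
    ticks = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        tick = date(year, month, 1)
        if month in wanted and start <= tick <= end:
            ticks.append(tick)
        # Advance one calendar month, rolling over into the next year.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return ticks


ticks = month_ticks(date(2012, 1, 1), date(2014, 1, 1), [1, 7])
# 'M' is a literal character in this format string, giving labels like 2012M01.
labels = [t.strftime("%YM%m") for t in ticks]
print(labels)
```

If the labels on the real plot differ from this list, the axis is probably not being interpreted as dates, which is the situation described for the pandas-plotted Timestamp index above.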

Plot xtick label with half hour frequency

I want the X labels to look like this:
00:00 00:30 01:00 01:30 02:00 ...... 23:30
My code:
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
import random
data = [random.random() for i in range(48)]
times = pd.date_range('16-09-2017', periods=48, freq='30MIN')
fig, ax = plt.subplots(1)
fig.autofmt_xdate()
plt.plot(times, data)
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
But my X labels look like this:
What's the problem?
I have 48 values, each representing one half hour of a day.
You can use the MinuteLocator and explicitly set it for every 0 and 30 minutes.
minlocator = mdates.MinuteLocator(byminute=[0,30])
ax.xaxis.set_major_locator(minlocator)
And to clean it up - remove extraneous tick marks and fill out the empty space.
xticks = ax.get_xticks()
ax.set_xticks(xticks[2:-2]);
hh = pd.Timedelta('30min')
ax.set_xlim(times[0] - hh, times[-1] + hh)
Edit:
As my answer was already accepted but didn't work correctly, I have added simplified solutions for matplotlib and pandas.
The key is to set the x-ticks parameter correctly.
In your case it could look like this:
data = [random.random() for i in range(48)]
times = pd.date_range('16-09-2017', periods=48, freq='30MIN')
In both cases, you want to use only hours and minutes:
hour_minutes = times.strftime('%H:%M')
1. Matplotlib solution
plt.figure(figsize=(12,5))
plt.plot(range(len(data)),data)
# .plot(times, data)
plt.xticks(range(len(hour_minutes)), hour_minutes, size='small',
rotation=45, horizontalalignment='center')
plt.show()
2. Pandas solution
# create dataframe from arrays (not necessary, but nice)
df = pd.DataFrame({'values': data,
'hour_minutes': hour_minutes})
# specify size of plot
value_plot = df.plot(figsize=(12,5), title='Value by Half-hours')
# first set number of ticks
value_plot.set_xticks(df.index)
# and label them after
value_plot.set_xticklabels(df.hour_minutes, rotation=45, size='small')
# get the plot figure and save it
fig = value_plot.get_figure()
fig.savefig('value_plot.png')
But I also like the alternative method that is proposed here :)
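Both solutions above depend on a list of 48 'HH:MM' strings, one per half hour. As a quick sanity check, the same labels can be built with the standard library alone. This is only a sketch of the label generation, assuming the same start date as the example; it does not draw anything.

```python
from datetime import datetime, timedelta

start = datetime(2017, 9, 16)

# One timestamp per half hour across a single day: 48 values in total.
times = [start + timedelta(minutes=30 * i) for i in range(48)]

# Keep only hours and minutes for the tick labels.
labels = [t.strftime("%H:%M") for t in times]

# If 48 labels are too crowded, every second one gives hourly labels.
hourly_labels = labels[::2]

print(labels[:4], labels[-1])
```

The resulting list can be passed as the labels argument to plt.xticks or set_xticklabels in the same way as hour_minutes above.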

pandas .plot() x-axis tick frequency -- how can I show more ticks?

I am plotting time series using pandas .plot() and want to see every month shown as an x-tick.
Here is the dataset structure
Here is the result of the .plot()
I was trying to use examples from other posts and matplotlib documentation and do something like
ax.xaxis.set_major_locator(
dates.MonthLocator(revenue_pivot.index, bymonthday=1,interval=1))
But that removed all the ticks :(
I also tried to pass xticks = df.index, but it has not changed anything.
What would be the right way to show more ticks on the x-axis?
No need to pass any args to MonthLocator. Make sure to use x_compat in the df.plot() call per @Rotkiv's answer.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.DataFrame(np.random.rand(100,2), index=pd.date_range('1-1-2018', periods=100))
ax = df.plot(x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
plt.show()
formatted x-axis with set_major_locator
unformatted x-axis
You could also format the x-axis ticks and labels of a pandas DateTimeIndex "manually" using the attributes of a pandas Timestamp object.
I found that much easier than using locators from matplotlib.dates which work on other datetime formats than pandas (if I am not mistaken) and thus sometimes show an odd behaviour if dates are not converted accordingly.
Here's a generic example that shows the first day of each month as a label based on attributes of pandas Timestamp objects:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# data
dim = 8760
idx = pd.date_range('1/1/2000 00:00:00', freq='h', periods=dim)
df = pd.DataFrame(np.random.randn(dim, 2), index=idx)
# select tick positions based on timestamp attribute logic. see:
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html
positions = [p for p in df.index
if p.hour == 0
and p.is_month_start
and p.month in range(1, 13, 1)]
# for date formatting, see:
# https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
labels = [l.strftime('%m-%d') for l in positions]
# plot with adjusted labels
ax = df.plot(kind='line', grid=True)
ax.set_xlabel('Time (h)')
ax.set_ylabel('Foo (Bar)')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
plt.show()
yields:
Hope this helps!
The right way to do that is described here:
Using the x_compat parameter, it is possible to suppress automatic tick resolution adjustment:
df.A.plot(x_compat=True)
If you want to just show more ticks, you can also dive deep into the structure of pd.plotting._converter:
dai = ax.xaxis.minor.formatter.plot_obj.date_axis_info
dai['fmt'][dai['fmt'] == b''] = b'%b'
After plotting, the formatter is a TimeSeries_DateFormatter and _set_default_format has been called, so self.plot_obj.date_axis_info is not None. You can now manipulate the structured array .date_axis_info to be to your liking, namely contain less b'' and more b'%b'
Remove tick labels:
ax = df.plot(x='date', y=['count'])
every_nth = 10
for n, label in enumerate(ax.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)
Lower every_nth to include more labels, raise to keep fewer.
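The effect of every_nth can be previewed on a plain list before touching a plot. The sketch below uses a hypothetical helper, thin_labels, that blanks out every label whose position is not a multiple of every_nth, mirroring the modulo test in the loop above.

```python
def thin_labels(labels, every_nth):
    """Keep every n-th label (positions 0, n, 2n, ...) and blank the rest."""
    return [label if n % every_nth == 0 else "" for n, label in enumerate(labels)]


# 25 made-up labels standing in for tick label text.
example = [f"t{n}" for n in range(25)]
thinned = thin_labels(example, every_nth=10)
kept = [label for label in thinned if label]
print(kept)
```

With 25 labels and every_nth = 10, only positions 0, 10 and 20 remain visible.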

matplotlib how to specify time locator's start-ticking timestamp?

What I want is quite straightforward: I just want the locator ticks to start at a specified timestamp:
pseudo code: locator.set_start_ticking_at( datetime_dummy )
I have had no luck finding anything so far.
Here is the portion of the code for this question:
axes[0].set_xlim(datetime_dummy) # datetime_dummy = '2015-12-25 05:34:00'
import matplotlib.dates as matdates
seclocator = matdates.SecondLocator(interval=20)
minlocator = matdates.MinuteLocator(interval=1)
hourlocator = matdates.HourLocator(interval=12)
seclocator.MAXTICKS = 40000
minlocator.MAXTICKS = 40000
hourlocator.MAXTICKS = 40000
majorFmt = matdates.DateFormatter('%Y-%m-%d, %H:%M:%S')
minorFmt = matdates.DateFormatter('%H:%M:%S')
axes[0].xaxis.set_major_locator(minlocator)
axes[0].xaxis.set_major_formatter(majorFmt)
plt.setp(axes[0].xaxis.get_majorticklabels(), rotation=90 )
axes[0].xaxis.set_minor_locator(seclocator)
axes[0].xaxis.set_minor_formatter(minorFmt)
plt.setp(axes[0].xaxis.get_minorticklabels(), rotation=90 )
# other codes
# save fig as a picture
The x axis ticks of above code will get me:
How do I tell the minor locator to align with the major locator?
How do I tell the locators which timestamp to start ticking at?
what I have tried:
set_xlim doesn't do the trick
seclocator.tick_values(datetime_dummy, datetime_dummy1) doesn't do anything
Instead of using the interval keyword parameter, use bysecond and byminute to specify exactly which seconds and minutes you wish to mark. The bysecond and byminute parameters are used to construct a dateutil rrule. The rrule generates datetimes which match certain specified patterns (or, one might say, "rules").
For example, bysecond=[20, 40] limits the datetimes to those whose seconds equal 20 or 40. Thus, below, the minor tick marks only appear for datetimes whose seconds equal 20 or 40.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as matdates
N = 100
fig, ax = plt.subplots()
x = np.arange(N).astype('<i8').view('M8[s]').tolist()
y = (np.random.random(N)-0.5).cumsum()
ax.plot(x, y)
seclocator = matdates.SecondLocator(bysecond=[20, 40])
minlocator = matdates.MinuteLocator(byminute=range(60)) # range(60) is the default
seclocator.MAXTICKS = 40000
minlocator.MAXTICKS = 40000
majorFmt = matdates.DateFormatter('%Y-%m-%d, %H:%M:%S')
minorFmt = matdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_locator(minlocator)
ax.xaxis.set_major_formatter(majorFmt)
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90)
ax.xaxis.set_minor_locator(seclocator)
ax.xaxis.set_minor_formatter(minorFmt)
plt.setp(ax.xaxis.get_minorticklabels(), rotation=90)
plt.subplots_adjust(bottom=0.5)
plt.show()
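The reason bysecond and byminute give aligned ticks is that they select clock positions rather than offsets from the start of the axis. The following standard-library sketch emulates that selection; ticks_by_second is a hypothetical helper, not the dateutil or matplotlib implementation, and the start time is borrowed from the question only for illustration.

```python
from datetime import datetime, timedelta


def ticks_by_second(start, end, bysecond):
    """Return each whole-second datetime in [start, end] whose second is listed."""
    wanted = set(bysecond)
    current = start.replace(microsecond=0)
    if current < start:
        # Skip ahead so the first candidate is not before the start.
        current += timedelta(seconds=1)
    ticks = []
    while current <= end:
        if current.second in wanted:
            ticks.append(current)
        current += timedelta(seconds=1)
    return ticks


start = datetime(2015, 12, 25, 5, 34, 7)  # an arbitrary, unaligned start
end = start + timedelta(minutes=2)

minor = ticks_by_second(start, end, [20, 40])  # like bysecond=[20, 40]
major = ticks_by_second(start, end, [0])       # whole minutes, like MinuteLocator
print([t.strftime("%H:%M:%S") for t in minor])
print([t.strftime("%H:%M:%S") for t in major])
```

Even though the range starts at second 7, the minor ticks land on :20 and :40 and the major ticks on :00, so the two sets stay on the same clock grid regardless of where the data begins.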
@unutbu: Many thanks: I've been looking everywhere for the answer to a related problem!
@eliu: I've adapted unutbu's excellent answer to demonstrate how you can define lists (to create different 'dateutil' rules) which give you complete control over which x-ticks are displayed. Try un-commenting each example below in turn and play around with the values to see the effect. Hope this helps.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
idx = pd.date_range('2017-01-01 05:03', '2017-01-01 18:03', freq = 'min')
df = pd.Series(np.random.randn(len(idx)), index = idx)
fig, ax = plt.subplots()
# Choose which major hour ticks are displayed by creating a 'dateutil' rule e.g.:
# Only use the hours in an explicit list:
# hourlocator = mdates.HourLocator(byhour=[6,12,8])
# Use the hours in a range defined by: Start, Stop, Step:
# hourlocator = mdates.HourLocator(byhour=range(8,15,2))
# Use every 3rd hour (left active so that hourlocator is defined and the script runs as-is):
hourlocator = mdates.HourLocator(interval = 3)
# Set the format of the major x-ticks:
majorFmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_locator(hourlocator)
ax.xaxis.set_major_formatter(majorFmt)
# ... and likewise set minor locators and minor formatters for the minor x-ticks if needed
ax.plot(df.index, df.values, color = 'black', linewidth = 0.4)
fig.autofmt_xdate() # optional: makes 30 deg tilt on tick labels
plt.show()

Format datetime labels to include weekday name for pandas plot

I would like to add the corresponding weekday names (Mon, Tues, etc.) to the xlabels for a pandas timeseries plot.
import pandas as pd
import numpy as np
import pylab as p
import datetime
dates = pd.date_range(datetime.datetime.today().date(), periods=10, freq='D')
data = pd.DataFrame(np.arange(10),index=dates,columns=['A'])
a = data['A'].plot()
p.tight_layout()
p.show()
I have tried adjusting the formatting using:
from matplotlib.dates import DateFormatter
formatter = DateFormatter('%a %d-%m-%Y')
a.xaxis.set_major_formatter(formatter)
But this does not work, leading to incorrect day and year.
It seems there should be a very simple solution, but I cannot find it.
Here's what I thought would work but didn't:
from matplotlib.ticker import FuncFormatter
from matplotlib import pyplot as plt
ax = data.A.plot()
ax.xaxis.set_major_formatter(FuncFormatter(lambda d, _: d.strftime('%a')))
or
ax = plt.subplot()
ax.plot(data.index, data.A)
ax.xaxis.set_major_formatter(FuncFormatter(lambda d, _: d.strftime('%a')))
These both go wrong in different ways. It seems the formatter inputs turn out to be floats rather than dates in both cases. In the first the function only gets applied to the first and last ticks. You can see this by passing
ax.xaxis.set_major_formatter(FuncFormatter(lambda d, _: d))
Here's a solution which is pretty flexible:
ax = plt.subplot()
ax.plot(data.index, data.A)
ticks = ax.set_xticklabels([d.strftime('%a') for d in data.index])
You can swap the list comprehension in the last line for whatever you like.
EDIT:
I think I've figured out what these numbers representing the xticks mean.
In [37]:
ax = plt.subplot()
ax.plot(data.index, data.A)
print(ax.get_xticks())
[ 735824. 735825. 735826. 735827. 735828. 735829. 735830. 735831.
735832. 735833.]
These seem to represent the number of days since the start of 1 AD. According to http://www.epochconverter.com/epoch/seconds-days-since-year-0.php:
"There are 736189 days between 0000-00-00 and today (Aug 14, 2015)."
Which is exactly 735824 (the first tick) + 365. So far so bad.
You could (I won't bother) write a function to convert this number and others like it into dates. Another approach would be:
def get_day(tick):
    date = dates[0] + datetime.timedelta(tick - ticks[0])
    return date.strftime('%a')
ax = plt.subplot()
ax.plot(data.index, data.A)
ticks = ax.get_xticks()
ax.xaxis.set_major_formatter(FuncFormatter(lambda tick, _: get_day(tick)))
Again, you can sub the date format you want into get_day. Not sure if this will solve the panning/zooming problem but at least it gives a way of setting the tick labels using a function.
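The conversion function that the answer mentions but does not write can be sketched with the standard library, under one loud assumption: the tick values count days with 0001-01-01 as day 1, which is what the numbers printed above suggest and what older matplotlib versions used by default. The name old_epoch_tick_to_date is hypothetical.

```python
from datetime import date


def old_epoch_tick_to_date(tick: float) -> date:
    """Convert a day-count tick to a date.

    Assumption: day 1 is 0001-01-01, as the tick values shown above suggest.
    """
    # Python's proleptic Gregorian ordinal uses the same convention.
    return date.fromordinal(int(tick))


first_tick = old_epoch_tick_to_date(735824.0)
print(first_tick.isoformat(), first_tick.strftime("%a"))
```

The first tick maps to 2015-08-14, the date quoted in the answer. Since matplotlib 3.3 the default epoch is 1970-01-01, so on current versions this arithmetic would not apply directly; matplotlib.dates.num2date is the more robust route because it follows whatever epoch is configured. Axes drawn by pandas without x_compat=True may use yet another numbering.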
