I have the following persistent problem:
The following code should draw a straight line:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
d = pd.date_range(start="1/1/2012", end="2/1/2012", freq="B")
v = np.linspace(1,10,len(d))
plt.plot_date(d,v,"-")
But all I get is a jagged line because "plot_date" somehow fills up the dates in "d" with the weekends.
Is there a way to force matplotlib to take my dates (only business days) as is, without filling them up with weekend dates?
>>>d
DatetimeIndex(['2012-01-02', '2012-01-03', '2012-01-04', '2012-01-05',
'2012-01-06', '2012-01-09', '2012-01-10', '2012-01-11',
'2012-01-12', '2012-01-13', '2012-01-16', '2012-01-17',
'2012-01-18', '2012-01-19', '2012-01-20', '2012-01-23',
'2012-01-24', '2012-01-25', '2012-01-26', '2012-01-27',
'2012-01-30', '2012-01-31', '2012-02-01'],
dtype='datetime64[ns]', freq='B')
plot_date uses a trick: it converts each date to a number of days since a fixed epoch (counted from year 1 in older Matplotlib versions, from 1970-01-01 by default since Matplotlib 3.3), plots those numbers, and then converts the tick positions back to dates in order to draw nice tick labels. So with plot_date every calendar day counts as 1, business day or not.
You can plot your data against a uniform range of numbers instead, but if you want dates as tick labels you need to build them yourself.
d = pd.date_range(start="1/1/2012", end="2/1/2012", freq="B")
v = np.linspace(1,10,len(d))
plt.plot(range(d.size), v)
xticks = plt.xticks()[0]
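# step from the first date in whole business days (adding a bare int to a Timestamp raises in current pandas)
xticklabels = [(d[0] + int(x) * d.freq).strftime('%Y-%m-%d') for x in xticks]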
plt.xticks(xticks, xticklabels)
plt.autoscale(True, axis='x', tight=True)
But be aware that the labels can be misleading. The segment between 2012-01-02 and 2012-01-09 represents five days, not seven.
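A variation on the same idea, as a minimal sketch: rather than computing the labels once, a FuncFormatter can look up the date for whatever integer position Matplotlib picks, so the labels stay correct after zooming. The helper name format_business_day and the five-day tick spacing are illustrative assumptions, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
import pandas as pd

d = pd.date_range(start="2012-01-01", end="2012-02-01", freq="B")
v = np.linspace(1, 10, len(d))

def format_business_day(x, pos=None):
    """Map an integer x position back to the business day stored at that index."""
    i = int(round(x))
    if 0 <= i < len(d):
        return d[i].strftime("%Y-%m-%d")
    return ""  # positions outside the data get no label

fig, ax = plt.subplots()
ax.plot(np.arange(len(d)), v)  # evenly spaced positions, so the line stays straight
ax.xaxis.set_major_locator(mticker.MultipleLocator(5))  # one tick every five business days
ax.xaxis.set_major_formatter(mticker.FuncFormatter(format_business_day))
fig.autofmt_xdate()
```

The same caveat applies: equal distances on the axis are equal numbers of business days, not calendar days.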
Related
I wonder if it's possible to change the tick marks (the "measurement milestones") on graphs created by pandas. In my code the X-axis stands for time and is measured by month, but the milestones are all over the place.
In the image below, the milestones for the X-axis are 2012M01, 2012M06, 2012M11, 2013M04 and 2013M09.
Is there any way I can choose how long the distance should be between every milestone? For example, to make it so it shows every year or every half year?
This is the code I used for the function making the graph:
def graph(dataframe):
    graph = dataframe[["Profit"]].plot()
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    plt.grid(True)
    plt.show()
The actual dataframe is just an excel-file with a bunch of months and monetary values in it.
I think the most straightforward way is to use matplotlib.dates to format the axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def graph(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m') #see https://strftime.org/
    major = mdates.MonthLocator([1,7]) #label only Jan and Jul
    graph = dataframe[["Profit"]].plot(ax=ax) #link plot to the existing axes
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    graph.xaxis.set_major_locator(major) #set major locator tick on x-axis
    graph.xaxis.set_major_formatter(xfmt) #format xtick label
    plt.grid(True)
    plt.show()
But a key point is that your dates need to be Python's built-in datetime.date (not datetime.datetime); thanks to this answer for the pointer. If your dates are str or a different datetime type, you will need to convert them first; there are many resources on SO and elsewhere for doing this:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
dr = [pd.to_datetime(date).date() for date in dr] #explicitly converting to datetime.date with .date()
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
type(df.index[0])
Out[0]:
datetime.date
Calling graph(df) using the example above gets this plot:
Just to expand on this, here's what happens when the index is pandas.Timestamp instead of datetime.date:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in dr] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
graph(df)
Out[0]:
The x-axis is improperly formatted:
However, if you are willing to just create the plot directly through matplotlib, rather than pandas (pandas is using matplotlib anyway), this can handle more types of dates:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in dr] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
def graph_2(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m')
    major = mdates.MonthLocator([1,7])
    ax.plot(dataframe.index,dataframe['Profit'], label='Profit')
    ax.set_title('Statistics')
    ax.set_ylabel('Thousand $')
    ax.set_xlabel('Time')
    ax.xaxis.set_major_locator(major)
    ax.xaxis.set_major_formatter(xfmt)
    ax.legend() #legend needs to be added
    plt.grid(True)
    plt.show()
graph_2(df)
type(df.index[0])
Out[0]:
pandas._libs.tslibs.timestamps.Timestamp
And here is the working graph:
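To check which months a locator will actually label without looking at the picture, the tick positions can be read back from the axes. This is only a sketch under the same assumptions as the example above (a monthly index from 2012-01 to 2014-01 and random Profit values); the minor locator and the tick_months variable are additions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dr = pd.date_range("2012-01-01", "2014-01-01", freq="MS")
df = pd.DataFrame({"Profit": np.random.rand(len(dr))}, index=dr)

fig, ax = plt.subplots()
ax.plot(df.index, df["Profit"], label="Profit")
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=[1, 7]))  # half-yearly labels
ax.xaxis.set_minor_locator(mdates.MonthLocator())                # unlabeled tick every month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%YM%m"))

# Convert the major tick positions back to dates and collect their months
tick_months = sorted({mdates.num2date(t).month for t in ax.get_xticks()})
print(tick_months)
```

For yearly labels, mdates.YearLocator() can be used in place of the major MonthLocator.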
I am plotting values from a dataframe where time is the x-axis. The time is formatted as 00:00 to 23:45. I only want to display the specific times 00:00, 06:00, 12:00, 18:00 on the x-axis of my plot. How can this be done? I have posted two figures, the first shows the format of my dataframe after setting the index to time. And the second shows my figure. Thank you for your help!
monday.set_index("Time", drop=True, inplace=True)
monday_figure = monday.plot(kind='line', legend = False,
title = 'Monday Average Power consumption')
monday_figure.xaxis.set_major_locator(plt.MaxNLocator(8))
Edit: Adding data as text:
Time,DayOfWeek,kW
00:00:00,Monday,5.8825
00:15:00,Monday,6.0425
00:30:00,Monday,6.0025
00:45:00,Monday,5.7475
01:00:00,Monday,6.11
01:15:00,Monday,5.8025
01:30:00,Monday,5.6375
01:45:00,Monday,5.85
02:00:00,Monday,5.7250000000000005
02:15:00,Monday,5.66
02:30:00,Monday,6.0025
02:45:00,Monday,5.71
03:00:00,Monday,5.7425
03:15:00,Monday,5.6925
03:30:00,Monday,5.9475
03:45:00,Monday,6.380000000000001
04:00:00,Monday,5.65
04:15:00,Monday,5.8725
04:30:00,Monday,5.865
04:45:00,Monday,5.71
05:00:00,Monday,5.6925
05:15:00,Monday,5.9975000000000005
05:30:00,Monday,5.905000000000001
05:45:00,Monday,5.93
06:00:00,Monday,5.6025
06:15:00,Monday,6.685
06:30:00,Monday,7.955
06:45:00,Monday,8.9225
07:00:00,Monday,10.135
07:15:00,Monday,12.9475
07:30:00,Monday,14.327499999999999
07:45:00,Monday,14.407499999999999
08:00:00,Monday,15.355
08:15:00,Monday,16.2175
08:30:00,Monday,18.355
08:45:00,Monday,18.902499999999996
09:00:00,Monday,19.0175
09:15:00,Monday,20.0025
09:30:00,Monday,20.355
09:45:00,Monday,20.3175
10:00:00,Monday,20.8025
10:15:00,Monday,20.765
10:30:00,Monday,21.07
10:45:00,Monday,19.9825
11:00:00,Monday,20.94
11:15:00,Monday,22.1325
11:30:00,Monday,20.6275
11:45:00,Monday,21.4475
12:00:00,Monday,22.092499999999998
The image above is produced using the code from the comment below.
Make sure you have a datetime index using pd.to_datetime when plotting timeseries.
I then used matplotlib.dates (imported as mdates) to place the desired ticks and format them in the plot. I don't know if it can be done from pandas with df.plot.
See matplotlib date tick labels. You can customize the HourLocator or use a different locator to suit your needs. Minor ticks are created the same way with ax.xaxis.set_minor_locator. Hope it helps.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Using your dataframe
df = pd.read_clipboard(sep=',')
# Make sure you have a datetime index
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time')
fig, ax = plt.subplots(1,1)
ax.plot(df['kW'])
# Use mdates to detect hours
locator = mdates.HourLocator(byhour=[0,6,12,18])
ax.xaxis.set_major_locator(locator)
# Format x ticks
formatter = mdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(formatter)
# rotates and right aligns the x labels, and moves the bottom of the axes up to make room for them
fig.autofmt_xdate()
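Since read_clipboard only works with the data on the clipboard, here is a self-contained sketch of the same approach. The date 2021-01-04 and the sine-shaped kW values are made-up stand-ins for the Monday data in the question, not real measurements.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in: one day of 15-minute readings on an arbitrary date
times = pd.date_range("2021-01-04 00:00", "2021-01-04 23:45", freq="15min")
hours_of_day = times.hour.to_numpy() + times.minute.to_numpy() / 60
kw = 12 + 8 * np.sin((hours_of_day - 6) / 24 * 2 * np.pi)
df = pd.DataFrame({"kW": kw}, index=times)

fig, ax = plt.subplots()
ax.plot(df.index, df["kW"])
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[0, 6, 12, 18]))  # only these hours get ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Read the tick positions back to confirm which hours are labeled
tick_hours = {mdates.num2date(t).hour for t in ax.get_xticks()}
print(sorted(tick_hours))
```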
If I run the following, it appears to work as expected, but the y-axis is limited to the earliest and latest times in the data. I want it to show midnight to midnight. I thought I could do that with the code that's commented out. But when I uncomment it, I get the correct y-axis, yet nothing plots. Where am I going wrong?
from datetime import datetime
import matplotlib.pyplot as plt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
x = []
y = []
for i in range(0, len(data)):
    t = datetime.strptime(data[i], '%Y-%m-%d %H:%M:%S')
    x.append(t.strftime('%Y-%m-%d')) # X-axis = date
    y.append(t.strftime('%H:%M:%S')) # Y-axis = time
plt.plot(x, y, '.')
# begin = datetime.strptime('00:00:00', '%H:%M:%S').strftime('%H:%M:%S')
# end = datetime.strptime('23:59:59', '%H:%M:%S').strftime('%H:%M:%S')
# plt.ylim(begin, end)
plt.show()
Edit: I also noticed that the x-axis isn't right either. The data skips Jan 2, but I want that on the axis so the data is to scale.
This is a dramatically simplified version of code dealing with over a year's worth of data with over 2,500 entries.
If Pandas is available to you, consider this approach:
import pandas as pd
data = pd.to_datetime(data, yearfirst=True)
plt.plot(data.date, data.time)
_=plt.ylim(["00:00:00", "23:59:59"])
Update per comments
X-axis date formatting can be adjusted using the Locator and Formatter methods of the matplotlib.dates module. Locator finds the tick positions, and Formatter specifies how you want the labels to appear.
Sometimes Matplotlib/Pandas just gets it right, other times you need to call out exactly what you want using these extra methods. In this case, I'm not sure why those numbers are showing up, but this code will remove them.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time)
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
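An alternative that does not depend on how time objects are converted: in the question, strftime turns both axes into strings, which Matplotlib treats as categories, so neither axis is to scale. The sketch below keeps both axes as real datetimes by moving every time of day onto one reference date. The reference date 2000-01-01 and the three-hour tick spacing are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44',
        '2018-01-03 15:30:27', '2018-01-04 11:55:09']
stamps = pd.to_datetime(data)

x = stamps.normalize()                     # the date part, at midnight
ref = pd.Timestamp("2000-01-01")           # arbitrary reference day for the time axis
y = ref + (stamps - stamps.normalize())    # time of day, shifted onto the reference day

fig, ax = plt.subplots()
ax.plot(x, y, ".")
ax.set_ylim(ref, ref + pd.Timedelta(days=1))   # midnight to midnight

ax.xaxis.set_major_locator(mdates.DayLocator())            # one tick per calendar day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
ax.yaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 3)))
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
```

Because the x-axis is a true date axis, Jan 2 gets its own tick even though it has no data.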
I'm basically trying to plot a graph where the x axis represents the month of the year. The data is stored in a numpy.array with dimensions k x months. Here is a minimal example (my data is not this crazy):
import numpy
import matplotlib
import matplotlib.pyplot as plt
cmap = plt.get_cmap('Set3')
data = numpy.random.rand(18,12)
# one color per data row
colors = [cmap(i) for i in numpy.linspace(0, 1, data.shape[0])]
y = range(data.shape[1])
plt.figure(figsize=(15, 7), dpi=200)
for i in range(data.shape[0]):
    plt.plot(y, data[i,:], color=colors[i], linewidth=5)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.xticks(numpy.arange(0, 12, 1))
plt.xlabel('Hour of the Day')
plt.ylabel('Number of Complaints')
plt.title('Number of Complaints per Hour in 2015')
I'd like to have the xticks as strings instead of numbers. I'm wondering if I have to create a list of strings, manually, or if there is another way to translate the numbers to months. I have to do the same for weekdays, for example.
I've been looking to these examples:
http://matplotlib.org/examples/pylab_examples/finance_demo.html
http://matplotlib.org/examples/pylab_examples/date_demo2.html
But I'm not using datetime.
Although this answer works well, in this case you can avoid defining your own FuncFormatter by using the predefined date formatters from matplotlib.dates rather than matplotlib.ticker:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
# Define time range with 12 different months:
# `MS` stands for month start frequency
x_data = pd.date_range('2018-01-01', periods=12, freq='MS')
# Check what these dates look like:
print(x_data)
y_data = np.random.rand(12)
fig, ax = plt.subplots()
ax.plot(x_data, y_data)
# Make ticks on occurrences of each month:
ax.xaxis.set_major_locator(mdates.MonthLocator())
# Get only the month to show in the x-axis:
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
# '%b' means month as locale’s abbreviated name
plt.show()
Obtaining:
DatetimeIndex(['2018-01-01', '2018-02-01', '2018-03-01', '2018-04-01',
'2018-05-01', '2018-06-01', '2018-07-01', '2018-08-01',
'2018-09-01', '2018-10-01', '2018-11-01', '2018-12-01'],
dtype='datetime64[ns]', freq='MS')
An alternative is the plot_date method, which you might want to use instead of the more general plot method if your independent variable is datetime-like:
import datetime
import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(24)
#a list of times: 00:00:00 to 23:00:00
times = [datetime.datetime.strptime(str(i), '%H') for i in range(24)]
#fmt is the marker format string: 'H' draws hexagon markers
#the tick labels are chosen by matplotlib's automatic date formatter
plt.plot_date(times, data, fmt='H')
plt.setp(plt.gca().xaxis.get_majorticklabels(),
         'rotation', 90)
The benefit is that you can now easily control the density of the xticks. If we want a tick every hour, we insert these lines after plot_date:
##import it if not already imported
#import matplotlib.dates as mdates
plt.gca().xaxis.set_major_locator(mdates.HourLocator())
You can still use formatters to format your results in the way you want. For example, to have month names printed, let us first define a function taking an integer to a month abbreviation:
def getMonthName(month_number):
    testdate=datetime.date(2010,int(month_number),1)
    return testdate.strftime('%b')
Here, I have created an arbitrary date with the correct month and returned the name of that month. Check the datetime documentation for available format codes if needed. Whether that is always easier than just writing out a list by hand is another question. Now let us plot some monthly test data:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
x_data=np.arange(1,12.5,1)
y_data=x_data**2 # Just some arbitrary data
plt.plot(x_data,y_data)
plt.gca().xaxis.set_major_locator(mtick.FixedLocator(x_data)) # Set tick locations
plt.gca().xaxis.set_major_formatter(mtick.FuncFormatter(lambda x,p:getMonthName(x)))
plt.show()
The message here is that you can use matplotlib.ticker.FuncFormatter to use any function to obtain a tick label. The function takes two arguments (value and position) and returns a string.
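If the data is not datetime-based at all, as in the question, the labels can also be set directly. The sketch below takes the names from the standard library's calendar module instead of typing them by hand, and shows the matching call for weekdays; the variable names are illustrative. Note that calendar.month_abbr and calendar.day_abbr follow the current locale.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(18, 12)   # k series x 12 months, as in the question
months = np.arange(12)

fig, ax = plt.subplots()
for row in data:
    ax.plot(months, row)

# month_abbr[0] is an empty string, so skip it
month_names = list(calendar.month_abbr)[1:]
ax.set_xticks(months)
ax.set_xticklabels(month_names)

# The same idea for weekdays (Monday first)
weekday_names = list(calendar.day_abbr)
```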
Here's my plot, which is generated using the following code:
bins = np.linspace(0,24,25)
plt.hist(hours,bins, edgecolor='black', linewidth = 1.2, color = 'red')
I would like the x axis to show 24 entries, from 12:00AM to 11:00 PM ideally rotated left 90 degrees.
I see two paths: convert the actual data to time values so the histogram reads in time values or simply add a custom x axis with 12:00AM, 1:00 AM, etc. What's the easiest / cleanest approach here? I'm not familiar with how to do either. For reference, "hours" is a int64 array.
Here's a working example:
import numpy as np
import matplotlib.pyplot as plt
bins = np.arange(0,25)
hours = np.random.rand(50)*25
fig, ax = plt.subplots()
labels = []
for i in bins:
    if i<12:
        labels.append("{}:00AM".format(i))
    elif i == 12:
        labels.append("12:00PM")
    else:
        labels.append("{}:00PM".format(i-12))
ax.hist(hours, bins)
ax.set_xticks(bins + 0.5) # 0.5 is half of the "1" auto width
ax.set_xticklabels(labels, rotation='vertical')
fig.subplots_adjust(bottom = 0.2) # makes space for the vertical labels
plt.show()
which gives:
I've changed linspace to arange because it returns integers.
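The labels can also be produced with strftime, which gives 12:00 AM for hour 0 as asked in the question. This is a sketch under assumptions: hour_label is a hypothetical helper, the hours array is random stand-in data, and the AM/PM text depends on the locale.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def hour_label(hour):
    """Format an integer hour (0-23) on a 12-hour clock, e.g. 0 -> '12:00 AM'."""
    return datetime.time(hour=hour).strftime("%I:%M %p")

rng = np.random.default_rng(0)
hours = rng.integers(0, 24, size=200)   # synthetic stand-in for the int64 "hours" array

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(hours, bins=np.arange(25), edgecolor="black")
ax.set_xticks(np.arange(24) + 0.5)      # centre of each one-hour bin
ax.set_xticklabels([hour_label(h) for h in range(24)], rotation=90)
fig.subplots_adjust(bottom=0.25)        # room for the rotated labels
```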
To get a nice time format on the x-axis, the idea could be to calculate the histogram in terms of numbers which can be interpreted as datetimes.
If you only have times, the actual date does not matter much, so dividing the data by 24 gives fractions of a day. Since matplotlib interprets numbers as days since an epoch (0001-01-01 UTC, plus 1, in older versions), one then needs to add some whole number >=2 so as not to run into trouble with negative dates.
Then the usual matplotlib.dates locators and formatters can be used to get nice tick labels. "%I:%M %p" gives the time on a 12-hour clock with an AM/PM suffix.
import numpy as np; np.random.seed(3)
import matplotlib.pyplot as plt
import matplotlib.dates
data = np.random.normal(12,7, size=200)
data = data[(data >=0) & (data <24)]
f = lambda x: 2+x/24.
bins=np.arange(25)
plt.hist(f(data), bins=f(bins))
plt.gca().xaxis.set_major_locator(matplotlib.dates.HourLocator())
plt.gca().xaxis.set_major_formatter(matplotlib.dates.DateFormatter("%I:%M %p"))
plt.setp(plt.gca().get_xticklabels(),rotation=90)
plt.tight_layout()
plt.show()
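(With the year-1 epoch this is hence a histogram of datetimes on the 2nd of January 0001; with the 1970-01-01 default epoch of Matplotlib 3.3 and later, the same numbers fall on 1970-01-03.)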
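The mapping can be checked without drawing anything. In this small sketch, hour_to_datenum is a hypothetical name for the lambda f above; the date part of the result depends on the Matplotlib epoch in use, but the time of day does not.

```python
import matplotlib.dates as mdates

def hour_to_datenum(hour, day_offset=2):
    """Map an hour of the day (0-24) to a Matplotlib date number, like f above."""
    return day_offset + hour / 24.0

# The date part depends on the epoch; the time of day is what the formatter shows
six_am = mdates.num2date(hour_to_datenum(6))
label = mdates.DateFormatter("%I:%M %p")(hour_to_datenum(18))
print(six_am.time(), label)
```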