Is it possible to create a calendar heatmap without using pandas?
If so, can someone post a simple example?
I have dates like Aug-16 and a count value like 16 and I thought this would be a quick and easy way to show intensity of counts between days for a long period of time.
Thank you
It's certainly possible, but you'll need to jump through a few hoops.
First off, I'm going to assume you mean a calendar display that looks like a calendar, as opposed to a more linear format (a linear formatted "heatmap" is much easier than this).
The key is reshaping your arbitrary-length 1D series into an Nx7 2D array where each row is a week and columns are days. That's easy enough, but you also need to properly label months and days, which can get a touch verbose.
Here's an example. It makes no attempt to handle data that crosses a year boundary (e.g. Dec 2014 to Jan 2015), but hopefully it gets you started:
import datetime as dt
import matplotlib.pyplot as plt
import numpy as np

def main():
    dates, data = generate_data()
    fig, ax = plt.subplots(figsize=(6, 10))
    calendar_heatmap(ax, dates, data)
    plt.show()

def generate_data():
    num = 100
    data = np.random.randint(0, 20, num)
    start = dt.datetime(2015, 3, 13)
    dates = [start + dt.timedelta(days=i) for i in range(num)]
    return dates, data

def calendar_array(dates, data):
    # isocalendar() returns (year, week, weekday); keep week and weekday.
    i, j = zip(*[d.isocalendar()[1:] for d in dates])
    i = np.array(i) - min(i)
    j = np.array(j) - 1
    ni = max(i) + 1

    calendar = np.nan * np.zeros((ni, 7))
    calendar[i, j] = data
    return i, j, calendar

def calendar_heatmap(ax, dates, data):
    i, j, calendar = calendar_array(dates, data)
    im = ax.imshow(calendar, interpolation='none', cmap='summer')
    label_days(ax, dates, i, j, calendar)
    label_months(ax, dates, i, j, calendar)
    ax.figure.colorbar(im)

def label_days(ax, dates, i, j, calendar):
    ni, nj = calendar.shape
    day_of_month = np.nan * np.zeros((ni, 7))
    day_of_month[i, j] = [d.day for d in dates]

    for (i, j), day in np.ndenumerate(day_of_month):
        if np.isfinite(day):
            ax.text(j, i, int(day), ha='center', va='center')

    ax.set(xticks=np.arange(7),
           xticklabels=['M', 'T', 'W', 'R', 'F', 'S', 'S'])
    ax.xaxis.tick_top()

def label_months(ax, dates, i, j, calendar):
    month_labels = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul',
                             'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
    months = np.array([d.month for d in dates])
    uniq_months = sorted(set(months))
    yticks = [i[months == m].mean() for m in uniq_months]
    labels = [month_labels[m - 1] for m in uniq_months]
    ax.set(yticks=yticks)
    ax.set_yticklabels(labels, rotation=90)

main()
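The year-boundary limitation mentioned above comes from ISO week numbers restarting each January. One possible way around it, sketched here under the assumption that the inputs are `datetime.datetime` objects, is to count weeks from the Monday of the first week instead. The function name `calendar_array_across_years` is made up for this illustration and is not part of the answer's code.

```python
import datetime as dt
import numpy as np

def calendar_array_across_years(dates, data):
    """Hypothetical variant of calendar_array that counts weeks from the first date."""
    days = [d.date() for d in dates]  # assumes datetime.datetime inputs
    first = min(days)
    # Monday of the week that contains the earliest date.
    origin = first - dt.timedelta(days=first.weekday())
    offsets = np.array([(d - origin).days for d in days])
    i, j = offsets // 7, offsets % 7  # week row, weekday column (Monday=0)
    calendar = np.full((i.max() + 1, 7), np.nan)
    calendar[i, j] = data
    return i, j, calendar

# Two weeks of data spanning Dec 2014 to Jan 2015.
dates = [dt.datetime(2014, 12, 25) + dt.timedelta(days=k) for k in range(14)]
i, j, calendar = calendar_array_across_years(dates, np.arange(14))
print(calendar.shape)
```

The month labels would need a similar change, since grouping on `d.month` alone would merge the same month from different years; grouping on `(d.year, d.month)` is one option.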
Edit: I now see the question asks for a plot without pandas. Even so, this question is a first page Google result for "python calendar heatmap", so I will leave this here. I recommend using pandas anyway. You probably already have it as a dependency of another package, and pandas has by far the best APIs for working with datetime data (pandas.Timestamp and pandas.DatetimeIndex).
The only Python package that I can find for these plots is calmap which is unmaintained and incompatible with recent matplotlib. So I decided to write my own. It produces plots like the following:
Here is the code. The input is a series with a datetime index giving the values for the heatmap:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

DAYS = ['Sun.', 'Mon.', 'Tues.', 'Wed.', 'Thurs.', 'Fri.', 'Sat.']
MONTHS = ['Jan.', 'Feb.', 'Mar.', 'Apr.', 'May', 'June', 'July', 'Aug.', 'Sept.', 'Oct.', 'Nov.', 'Dec.']

def date_heatmap(series, start=None, end=None, mean=False, ax=None, **kwargs):
    '''Plot a calendar heatmap given a datetime series.

    Arguments:
        series (pd.Series):
            A series of numeric values with a datetime index. Values occurring
            on the same day are combined by sum.
        start (Any):
            The first day to be considered in the plot. The value can be
            anything accepted by :func:`pandas.to_datetime`. The default is the
            earliest date in the data.
        end (Any):
            The last day to be considered in the plot. The value can be
            anything accepted by :func:`pandas.to_datetime`. The default is the
            latest date in the data.
        mean (bool):
            Combine values occurring on the same day by mean instead of sum.
        ax (matplotlib.axes.Axes or None):
            The axes on which to draw the heatmap. The default is the current
            axes in the :mod:`~matplotlib.pyplot` API.
        **kwargs:
            Forwarded to :meth:`~matplotlib.axes.Axes.pcolormesh` for drawing
            the heatmap.

    Returns:
        matplotlib.axes.Axes:
            The axes on which the heatmap was drawn. This is set as the current
            axes in the `~matplotlib.pyplot` API.
    '''
    # Combine values occurring on the same day.
    dates = series.index.floor('D')
    group = series.groupby(dates)
    series = group.mean() if mean else group.sum()

    # Parse start/end, defaulting to the min/max of the index.
    start = pd.to_datetime(start or series.index.min())
    end = pd.to_datetime(end or series.index.max())

    # We use [start, end) as a half-open interval below.
    end += np.timedelta64(1, 'D')

    # Get the previous/following Sunday to start/end.
    # Pandas and numpy day-of-week conventions are Monday=0 and Sunday=6.
    start_sun = start - np.timedelta64((start.dayofweek + 1) % 7, 'D')
    end_sun = end + np.timedelta64(7 - end.dayofweek - 1, 'D')

    # Create the heatmap and track ticks.
    num_weeks = (end_sun - start_sun).days // 7
    heatmap = np.zeros((7, num_weeks))
    ticks = {}  # week number -> month name
    for week in range(num_weeks):
        for day in range(7):
            date = start_sun + np.timedelta64(7 * week + day, 'D')
            if date.day == 1:
                ticks[week] = MONTHS[date.month - 1]
            if date.dayofyear == 1:
                ticks[week] += f'\n{date.year}'
            if start <= date < end:
                heatmap[day, week] = series.get(date, 0)

    # Get the coordinates, offset by 0.5 to align the ticks.
    y = np.arange(8) - 0.5
    x = np.arange(num_weeks + 1) - 0.5

    # Plot the heatmap. Prefer pcolormesh over imshow so that the figure can be
    # vectorized when saved to a compatible format. We must invert the axis for
    # pcolormesh, but not for imshow, so that it reads top-bottom, left-right.
    ax = ax or plt.gca()
    mesh = ax.pcolormesh(x, y, heatmap, **kwargs)
    ax.invert_yaxis()

    # Set the ticks.
    ax.set_xticks(list(ticks.keys()))
    ax.set_xticklabels(list(ticks.values()))
    ax.set_yticks(np.arange(7))
    ax.set_yticklabels(DAYS)

    # Set the current image and axes in the pyplot API.
    plt.sca(ax)
    plt.sci(mesh)

    return ax
def date_heatmap_demo():
    '''An example for `date_heatmap`.

    Most of the sizes here are chosen arbitrarily to look nice with 1yr of
    data. You may need to fiddle with the numbers to look right on other data.
    '''
    # Get some data, a series of values with datetime index.
    data = np.random.randint(5, size=365)
    data = pd.Series(data)
    data.index = pd.date_range(start='2017-01-01', end='2017-12-31', freq='1D')

    # Create the figure. For the aspect ratio, one year is 7 days by 53 weeks.
    # We widen it further to account for the tick labels and color bar.
    figsize = plt.figaspect(7 / 56)
    fig = plt.figure(figsize=figsize)

    # Plot the heatmap with a color bar.
    ax = date_heatmap(data, edgecolor='black')
    plt.colorbar(ticks=range(5), pad=0.02)

    # Use a discrete color map with 5 colors (the data ranges from 0 to 4).
    # Extending the color limits by 0.5 aligns the ticks in the color bar.
    # plt.get_cmap is used because mpl.cm.get_cmap was removed in recent
    # matplotlib releases.
    cmap = plt.get_cmap('Blues', 5)
    plt.set_cmap(cmap)
    plt.clim(-0.5, 4.5)

    # Force the cells to be square. If this is set, the size of the color bar
    # may look weird compared to the size of the heatmap. That can be corrected
    # by the aspect ratio of the figure or scale of the color bar.
    ax.set_aspect('equal')

    # Save to a file. For embedding in a LaTeX doc, consider the PDF backend.
    # http://sbillaudelle.de/2015/02/23/seamlessly-embedding-matplotlib-output-into-latex.html
    fig.savefig('heatmap.pdf', bbox_inches='tight')

    # The figure must be explicitly closed if it was not shown.
    plt.close(fig)
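To make the first two steps of `date_heatmap` more concrete, here is a small standalone sketch of the per-day aggregation and the "previous Sunday" arithmetic. The timestamps and values are invented for illustration, and `pd.Timedelta` is used in place of `np.timedelta64` purely for readability.

```python
import pandas as pd

# Made-up event counts with irregular timestamps (two on the same day).
stamps = pd.to_datetime(['2017-01-01 09:00', '2017-01-01 17:30', '2017-01-03 12:00'])
events = pd.Series([2, 3, 5], index=stamps)

# Same idea as date_heatmap: floor each timestamp to its day, then sum per day.
daily = events.groupby(events.index.floor('D')).sum()

# Step back to the Sunday on or before the first day (Monday=0 ... Sunday=6).
start = daily.index.min()
start_sun = start - pd.Timedelta(days=(start.dayofweek + 1) % 7)

print(daily)
print(start_sun)
```

Days with no events are simply absent from `daily`, which is why the function reads values with `series.get(date, 0)`.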
Disclaimer: This is a plug for my own package. Though I am a couple of years late to help OP, I hope that someone else will find it useful.
I did some digging around on a related issue. I ended up writing a new package exactly for this purpose when I couldn't find any other package that met all my requirements.
The package is still unpolished and its documentation is still sparse, but I published it on PyPI anyway to make it available for others. Any feedback is appreciated, either here or on my GitHub.
july
The package is called july and can be installed with pip:
$ pip install july
Here are some use cases straight from the README:
Import packages and generate data
import numpy as np
import july
from july.utils import date_range
dates = date_range("2020-01-01", "2020-12-31")
data = np.random.randint(0, 14, len(dates))
GitHub Activity like plot:
july.heatmap(dates, data, title='Github Activity', cmap="github")
Daily heatmap for continuous data (with colourbar):
july.heatmap(
    osl_df.date,  # Here, osl_df is a pandas data frame.
    osl_df.temp,
    cmap="golden",
    colorbar=True,
    title="Average temperatures: Oslo , Norway"
)
Outline each month with month_grid=True
july.heatmap(dates=dates,
             data=data,
             cmap="Pastel1",
             month_grid=True,
             horizontal=True,
             value_label=False,
             date_label=False,
             weekday_label=True,
             month_label=True,
             year_label=True,
             colorbar=False,
             fontfamily="monospace",
             fontsize=12,
             title=None,
             titlesize="large",
             dpi=100)
Finally, you can also create month or calendar plots:
# july.month_plot(dates, data, month=5) # This will plot only May.
july.calendar_plot(dates, data)
Similar packages:
calplot by Tom Kwok.
GitHub: Link
Install: pip install calplot
Actively maintained and better documentation than july.
Pandas centric, takes in a pandas series with dates and values.
Very good option if you are only looking for the heatmap functionality and don't need month_plot or calendar_plot.
calmap by Martijn Vermaat.
GitHub: Link
Install: pip install calmap
The package that calplot sprung out from.
Seems to be no longer actively maintained.
I was looking to create a calendar heatmap where each month is displayed separately. I also needed to annotate each day with the day number (day_of_month) and its value label.
I've been inspired by the answers posted here and also the following sites:
Here, although in R
Heatmap using pcolormesh
However I didn't seem to find something exactly as I was looking for, so I've decided to post my solution here to perhaps save others wanting the same kind of plot some time.
My example uses a bit of Pandas simply to generate some dummy data, so you can easily plug your own data source instead. Other than that it's just matplotlib.
Output from the code is given below. For my needs I also wanted to highlight days where the data was 0 (see 1st January).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

# Settings
years = [2018]  # [2018, 2019, 2020]
weeks = [1, 2, 3, 4, 5, 6]
days = ['M', 'T', 'W', 'T', 'F', 'S', 'S']
month_names = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August',
               'September', 'October', 'November', 'December']

def generate_data():
    idx = pd.date_range('2018-01-01', periods=365, freq='D')
    return pd.Series(range(len(idx)), index=idx)

def split_months(df, year):
    """
    Take a df, slice by year, and produce a list of months,
    where each month is a 2D array in the shape of the calendar
    :param df: dataframe or series
    :return: matrix for daily values and numerals
    """
    df = df[df.index.year == year]

    # Empty matrices
    a = np.empty((6, 7))
    a[:] = np.nan

    day_nums = {m: np.copy(a) for m in range(1, 13)}  # matrix for day numbers
    day_vals = {m: np.copy(a) for m in range(1, 13)}  # matrix for day values

    # Logic to shape datetimes to matrices in calendar layout
    # (Series.iteritems() was removed in pandas 2.0; items() is the replacement)
    for d in df.items():  # use iterrows if you have a DataFrame
        day = d[0].day
        month = d[0].month
        col = d[0].dayofweek

        if d[0].is_month_start:
            row = 0

        day_nums[month][row, col] = day  # day number (1-31)
        day_vals[month][row, col] = d[1]  # day value (the heatmap data)

        if col == 6:
            row += 1

    return day_nums, day_vals
def create_year_calendar(day_nums, day_vals):
    fig, ax = plt.subplots(3, 4, figsize=(14.85, 10.5))

    for i, axs in enumerate(ax.flat):
        axs.imshow(day_vals[i+1], cmap='viridis', vmin=1, vmax=365)  # heatmap
        axs.set_title(month_names[i])

        # Labels
        axs.set_xticks(np.arange(len(days)))
        axs.set_xticklabels(days, fontsize=10, fontweight='bold', color='#555555')
        axs.set_yticklabels([])

        # Tick marks
        axs.tick_params(axis=u'both', which=u'both', length=0)  # remove tick marks
        axs.xaxis.tick_top()

        # Modify tick locations for proper grid placement
        axs.set_xticks(np.arange(-.5, 6, 1), minor=True)
        axs.set_yticks(np.arange(-.5, 5, 1), minor=True)
        axs.grid(which='minor', color='w', linestyle='-', linewidth=2.1)

        # Despine
        for edge in ['left', 'right', 'bottom', 'top']:
            axs.spines[edge].set_color('#FFFFFF')

        # Annotate
        for w in range(len(weeks)):
            for d in range(len(days)):
                day_val = day_vals[i+1][w, d]
                day_num = day_nums[i+1][w, d]

                # Value label
                axs.text(d, w+0.3, f"{day_val:0.0f}",
                         ha="center", va="center",
                         fontsize=7, color="w", alpha=0.8)

                # If value is 0, draw a grey patch
                if day_val == 0:
                    patch_coords = ((d - 0.5, w - 0.5),
                                    (d - 0.5, w + 0.5),
                                    (d + 0.5, w + 0.5),
                                    (d + 0.5, w - 0.5))

                    square = Polygon(patch_coords, fc='#DDDDDD')
                    axs.add_artist(square)

                # If day number is a valid calendar day, add an annotation
                if not np.isnan(day_num):
                    axs.text(d+0.45, w-0.31, f"{day_num:0.0f}",
                             ha="right", va="center",
                             fontsize=6, color="#003333", alpha=0.8)  # day

                    # Aesthetic background for calendar day number
                    patch_coords = ((d-0.1, w-0.5),
                                    (d+0.5, w-0.5),
                                    (d+0.5, w+0.1))

                    triangle = Polygon(patch_coords, fc='w', alpha=0.7)
                    axs.add_artist(triangle)

    # Final adjustments
    fig.suptitle('Calendar', fontsize=16)
    plt.subplots_adjust(left=0.04, right=0.96, top=0.88, bottom=0.04)

    # Save to file
    plt.savefig('calendar_example.pdf')

for year in years:
    df = generate_data()
    day_nums, day_vals = split_months(df, year)
    create_year_calendar(day_nums, day_vals)
There is probably a lot of room for optimisation, but this gets what I need done.
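As one possible simplification of the row/column bookkeeping in `split_months`, the standard library's `calendar.monthcalendar` already returns a month laid out week by week (Monday first, with 0 for padding days). The sketch below is only an illustration of that idea; `month_matrix` and its `values` dict argument are hypothetical names, not part of the code above.

```python
import calendar
import numpy as np

def month_matrix(year, month, values):
    """Hypothetical helper. values maps day-of-month -> number.

    Returns (day_nums, day_vals), both 6x7 arrays padded with NaN.
    """
    day_nums = np.full((6, 7), np.nan)
    day_vals = np.full((6, 7), np.nan)
    # One list per week, Monday first; days outside the month are 0.
    for row, week in enumerate(calendar.monthcalendar(year, month)):
        for col, day in enumerate(week):
            if day:
                day_nums[row, col] = day
                day_vals[row, col] = values.get(day, np.nan)
    return day_nums, day_vals

# January 2018 starts on a Monday, so day 1 lands in the top-left cell.
nums, vals = month_matrix(2018, 1, {1: 0, 2: 7})
print(nums)
```

The two returned arrays have the same shape and meaning as one entry of `day_nums` and `day_vals`, so they could be passed to the same plotting loop.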
Below is code that can be used to generate a calendar map showing a daily profile of a value.
"""
Created on Tue Sep 4 11:17:25 2018
#author: woldekidank
"""
import numpy as np
from datetime import date
import datetime
import matplotlib.pyplot as plt
import random

D = date(2016,1,1)
Dord = date.toordinal(D)
Dweekday = date.weekday(D)  # Monday=0 ... Sunday=6
Dsnday = Dord - (Dweekday + 1) % 7  # ordinal of the Sunday on or before D
square = np.array([[0, 0],[ 0, 1], [1, 1], [1, 0], [0, 0]])  # x and y to draw a square
row = 1
count = 0
while row != 0:
    for column in range(1,7+1):  # one week per row
        # Random 24-hour profile for this day
        prof = np.ones([24, 1])
        hourly = np.zeros([24, 1])
        for i in range(1,24+1):
            prof[i-1, 0] = prof[i-1, 0] * random.uniform(0, 1)
            hourly[i-1, 0] = i / 24
        plt.title('Temperature Profile')
        plt.plot(square[:, 0] + column - 1, square[:, 1] - row + 1,color='r')  # go right each column, go down each row
        if date.fromordinal(Dsnday).month == D.month:
            if count == 0:
                plt.plot(hourly, prof)
            else:
                plt.plot(hourly + min(square[:, 0] + column - 1), prof + min(square[:, 1] - row + 1))
            plt.text(column - 0.5, 1.8 - row, datetime.datetime.strptime(str(date.fromordinal(Dsnday)),'%Y-%m-%d').strftime('%a'))
            plt.text(column - 0.5, 1.5 - row, date.fromordinal(Dsnday).day)
        Dsnday = Dsnday + 1
        count = count + 1
    if date.fromordinal(Dsnday).month == D.month:
        row = row + 1  # new row
    else:
        row = 0  # stop the while loop
Below is the output from this code
Related
I am using a pandas Series and matplotlib to plot a graph like this:
I build it using
ax.plot(df)
where df is a Series whose index is 'Sep', 'Oct', 'Nov', ...
and whose values are numbers.
What I want is to fill the background, but only after the 'Mar' index (for example).
Currently the only thing I have been able to do is create a second Series with only 'Mar' to 'Jul' as its index and a constant value to mark the "area". Ideally, I would fill the background with a colour to clearly separate the two parts of the graph (before March the data is realised; after it, the values are predictions from my algorithm).
For example, if I use:
ax.set_facecolor('silver')
I get this result:
The shading should appear only after 'Mar', but I can't find a way to restrict the background colour to part of the axes.
As mentioned, you can use axvspan() to add a suitable background colour.
For example:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import date2num
import random
from datetime import datetime, timedelta

# Create some random data
data = []
x = datetime(2021, 1, 1)
y = 0
day = timedelta(days=1)

for _ in range(100):
    x += day
    y += random.random()
    data.append([x, y])

df = pd.DataFrame(data, columns=['Date', 'Value'])
df.plot('Date', 'Value')
ax = plt.gca()

# Shade from 1 March to the last data point
march = date2num(datetime(2021, 3, 1))
latest = date2num(data[-1][0])
ax.axvspan(xmin=march, xmax=latest, facecolor='silver')
plt.show()
Giving you:
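The question's series is indexed by month-name strings rather than real dates. When strings are plotted, matplotlib places the categories at positions 0, 1, 2, ..., so a similar sketch can pass the position of 'Mar' to axvspan(). The month list and values below are invented to mirror the question, and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Invented data shaped like the question's series.
months = ['Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul']
s = pd.Series(range(len(months)), index=months)

fig, ax = plt.subplots()
ax.plot(months, s.values)

# String categories sit at x = 0, 1, 2, ..., so shade from the position of 'Mar'.
start = months.index('Mar')
span = ax.axvspan(start, len(months) - 1, facecolor='silver', alpha=0.5)
fig.savefig('shaded_forecast.png')
```

Here the realised data stays on the default background and only the region from 'Mar' onward is shaded.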
Below I have my code to plot my graph.
#can change the 'iloc[x:y]' component to plot sections of chart
#ax = df['Data'].iloc[300:].plot(color = 'black', title = 'Past vs. Expected Future Path')
ax = df.plot('Date','Data',color = 'black', title = 'Past vs. Expected Future Path')
df.loc[df.index >= idx, 'up2SD'].plot(color = 'r', ax = ax)
df.loc[df.index >= idx, 'down2SD'].plot(color = 'r', ax = ax)
df.loc[df.index >= idx, 'Data'].plot(color = 'b', ax = ax)
plt.show()
#resize the plot
plt.rcParams["figure.figsize"] = [10,6]
plt.show()
Lines 2 (commented out) and 3 both plot all of the lines together as seen. However, I want to have dates on the x-axis and also be able to plot sections of the graph (defined by the x-axis, i.e. date1 to date2).
Using line 3 I can plot with dates on the x-axis, but adding ".iloc[300:]" as in line 2 does not work: the 3 coloured lines disconnect from the main line, as seen below:
ax = df.iloc[300:].plot('Date','Data',color = 'black', title = 'Past vs. Expected Future Path')
Using line 2, I can change the range of the x-axis, but it doesn't show dates on the x-axis.
Does anyone have any advice on how to both have dates and be able to change the x-axis range?
For this to work as desired, you need to set the 'date' column as index of the dataframe. Otherwise, df.plot has no way to know what needs to be used as x-axis. With the date set as index, pandas accepts expressions such as df.loc[df.index >= '20180101', 'data2'] to select a time range and a specific column.
Here is some example code to demonstrate the concept.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
dates = pd.date_range('20160101', '20191231', freq='D')
data1 = np.random.normal(-0.5, 0.2, len(dates))
data2 = np.random.normal(-0.7, 0.2, len(dates))
df = pd.DataFrame({'date': dates, 'data1':data1, 'data2':data2})
df.set_index('date', inplace=True)
df['data1'].iloc[300:].plot(color='crimson')
df.loc[df.index >= '20180101', 'data2'].plot(color='dodgerblue')
plt.tight_layout()
plt.show()
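Since the question asks for sections defined by two dates (date1 to date2), it may help to note that a DatetimeIndex also accepts label slices with both ends. A minimal sketch, using made-up data and an arbitrary date window:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2016-01-01', '2019-12-31', freq='D')
df = pd.DataFrame({'data1': np.arange(len(dates))}, index=dates)

# Label-based slicing on a DatetimeIndex includes both end points.
window = df.loc['2018-01-01':'2018-06-30', 'data1']
print(window.index[0], window.index[-1], len(window))
```

Calling `window.plot()` would then draw only that date range, with dates on the x-axis.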
I have a pandas dataframe with 5 years of daily time series data. I want to make a monthly plot from the whole dataset that shows the variation (std or something else) within the monthly data. I tried to create a figure similar to this one but did not find a way to do it:
For example, I have pseudo daily precipitation data:
date = pd.to_datetime("1st of Dec, 1999")
dates = date+pd.to_timedelta(np.arange(1900), 'D')
ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum()
df = pd.DataFrame({'pre':ppt},index=dates)
Manually I can do it like:
one = df['pre']['1999-12-01':'2000-11-29'].values
two = df['pre']['2000-12-01':'2001-11-30'].values
three = df['pre']['2001-12-01':'2002-11-30'].values
four = df['pre']['2002-12-01':'2003-11-30'].values
five = df['pre']['2003-12-01':'2004-11-29'].values
df = pd.DataFrame({'2000':one,'2001':two,'2002':three,'2003':four,'2004':five})
std = df.std(axis=1)
lw = df.mean(axis=1)-std
up = df.mean(axis=1)+std
plt.fill_between(np.arange(365), up, lw, alpha=.4)
I am looking for a more Pythonic way to do this instead of doing it manually!
Any help will be highly appreciated.
If I'm understanding you correctly, you'd like to plot your daily observations against a monthly periodic mean +/- 1 standard deviation. That's what you get in my screenshot below. Never mind the lackluster design and color choice; we'll get to that if this is something you can use. Please notice that I've replaced your ppt = np.random.rand(1900) with ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum() just to make the data look a bit more like your screenshot.
Here I've aggregated the daily data by month, and retrieved mean and standard deviation for each month. Then I've merged that data with the original dataframe so that you're able to plot both the source and the grouped data like this:
# imports
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
import numpy as np

# Data that matches your setup, but with a random
# seed to make it reproducible
np.random.seed(42)
date = pd.to_datetime("1st of Dec, 1999")
dates = date+pd.to_timedelta(np.arange(1900), 'D')
#ppt = np.random.rand(1900)
ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum()
df = pd.DataFrame({'ppt':ppt},index=dates)

# A subset
df = df.tail(200)

# Add a yearmonth column
df['YearMonth'] = df.index.map(lambda x: 100*x.year + x.month)

# Create aggregated dataframe
df2 = df.groupby('YearMonth').agg(['mean', 'std']).reset_index()
df2.columns = ['YearMonth', 'mean', 'std']

# Merge original data and aggregated data
df3 = pd.merge(df,df2,how='left',on=['YearMonth'])
df3 = df3.set_index(df.index)
df3 = df3[['ppt', 'mean', 'std']]

# Function to make your plot
def monthplot():
    # Create the figure once (a second plt.subplots call would leave an empty figure behind)
    fig, ax = plt.subplots(1)
    ax.set_facecolor('white')

    # Define upper and lower bounds for shaded variation
    lower_bound = df3['mean'] + df3['std']*-1
    upper_bound = df3['mean'] + df3['std']

    # Source data and mean
    ax.plot(df3.index,df3['mean'], lw=0.5, color = 'red')
    ax.plot(df3.index, df3['ppt'], lw=0.1, color = 'blue')

    # Variation and shaded area
    ax.fill_between(df3.index, lower_bound, upper_bound, facecolor='grey', alpha=0.5)

    fig = ax.get_figure()

    # Assign months to X axis
    locator = mdates.MonthLocator()  # every month
    # Specify the format - %b gives us Jan, Feb...
    fmt = mdates.DateFormatter('%b')

    X = plt.gca().xaxis
    X.set_major_locator(locator)
    X.set_major_formatter(fmt)

    fig.show()

monthplot()
Check out this post for more on axis formatting and this post on how to add a YearMonth column.
In your example, you have a few mistakes, but I think it isn't important.
Do you want all years to be on the same graphic (like in your example)? If you do, this may help you:
df['month'] = df.index.strftime("%m-%d")
df['year'] = df.index.year
df.set_index(['month']).drop(columns=['year']).plot()
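For the across-years variability the question computes by hand, one possible shortcut is to group on the day of the year and aggregate mean and standard deviation in one step. This is a sketch under the assumption that pooling by `dayofyear` is acceptable; note that it groups by calendar year position (1 to 366) rather than by the December-to-November windows used in the question.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.date_range('1999-12-01', periods=1900, freq='D')
df = pd.DataFrame({'pre': np.random.normal(size=1900).cumsum()}, index=dates)

# One row per day of the year, pooling that day across all years.
stats = df['pre'].groupby(df.index.dayofyear).agg(['mean', 'std'])
lower = stats['mean'] - stats['std']
upper = stats['mean'] + stats['std']
print(stats.head())
```

The band can then be drawn with `plt.fill_between(stats.index, lower, upper, alpha=.4)`, as in the manual version.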
This is the first time I am working with matplotlib, and the task might be really trivial, but it is turning out to be hard for me.
I have the following data: ticket numbers, the dates when they were resolved, and the dates when they were supposed to be resolved.
What I want to do is draw a plot with tickets on the x axis and dates on the y axis. Then for every ticket I need two bars: the first with height equal to the date it was resolved, and the second with height equal to the date it was supposed to be resolved.
What I have right now:
a list of all tickets
tickets = []
a list with all dates (both resolved and expected)
all_dates = []
lists with sets of (ticket, datetime) for expected time and resolved time:
tickets_estimation = []
tickets_real = []
the code I am at right now:
plt.xticks(arange(len(tickets)), tickets, rotation=90)
plt.yticks(arange(len(all_dates)), all_dates)
plt.show()
which shows me following plot:
So how can I do the rest? Please note that I need to map ticket numbers on the X axis to dates on the Y axis.
OK, here is a simplified version of where I am stuck:
I cannot figure out how to draw even a single bar whose X position is a ticket and whose height on the Y axis is the date of its resolution.
For example:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from numpy import arange
date = ['3 Jan 2013', '4 Jan 2013', '5 Jan 2013']
tickets = ['ENV-666', 'ENV-999', 'ENV-1000']
# Convert to matplotlib's internal date format.
y = mdates.datestr2num(date)
x = arange(len(tickets))
fig, ax = plt.subplots()
ax.plot(x,y)
ax.yaxis_date()
# Optional. Just rotates x-ticklabels in this case.
fig.autofmt_xdate()
plt.show()
This works fine; it shows a plot with a line. But if I change
ax.plot(x,y) to ax.bar(x,y), I receive an error:
ValueError: ordinal must be >= 1
Bar is your friend: https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.bar.html
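Then just use your imagination to plot it as you like. For example, a black line for the due date, red boxes for overdue and green boxes for on time.
import numpy as np
import matplotlib.pyplot as plt

supp = np.array([3,5,7])  # day each ticket was supposed to be resolved
res = np.array([1,7,7])   # day each ticket was actually resolved
H = (res-supp).astype(float)  # float, so the 0.01 below is not truncated to 0
H[H==0] = 0.01
ind = np.arange(len(supp))
plt.bar(ind[H>0], H[H>0], bottom=supp[H>0], color='r');
plt.bar(ind[H<0], H[H<0], bottom=supp[H<0], color='g');
plt.bar(ind, 0.1, bottom=supp, color='k');
plt.show()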
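The same `bottom=` idea can be applied to the date axis from the question. The ValueError is plausibly caused by bars being drawn up from a baseline of 0, which was not a valid date in matplotlib's date numbering at the time; starting the bars at a real date avoids that. The following is a sketch only: the baseline of one day before the earliest date is an arbitrary choice, and the output file name is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it to use plt.show()
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
from datetime import datetime

tickets = ['ENV-666', 'ENV-999', 'ENV-1000']
resolved = [datetime(2013, 1, 3), datetime(2013, 1, 4), datetime(2013, 1, 5)]

y = mdates.date2num(resolved)   # dates as matplotlib day numbers
x = np.arange(len(tickets))
base = y.min() - 1              # arbitrary baseline: one day before the earliest date

fig, ax = plt.subplots()
# Bar height is the distance from the baseline, so each bar top sits on its date.
bars = ax.bar(x, y - base, bottom=base)
ax.yaxis_date()
ax.set_xticks(x)
ax.set_xticklabels(tickets, rotation=90)
fig.savefig('ticket_bars.png')
```

A second call to `ax.bar` with a small x offset and a narrower `width` could add the expected-resolution dates next to each ticket.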
The problem is to plot a straight line with uneven distribution of dates. Using the series values data fixes the curviness problem, but loses the timeline (the dates). Is there a way to fix this?
Edit: Why aren't the dates mapped straight to ticks on x axis:
0 -> 2017-02-17,
1 -> 2017-02-20,
... ?
Now there seems to be 12 ticks for the orange line but only 8 datapoints.
import pandas as pd
import matplotlib.pyplot as plt

def straight_line(index):
    y = [3 + 2*x for x in range(len(index))]
    zserie = pd.Series(y, index=index)
    return zserie

if __name__ == '__main__':
    start = '2017-02-10'
    end = '2017-02-17'
    index = pd.date_range(start,end)
    index1 = pd.DatetimeIndex(['2017-02-17', '2017-02-20', '2017-02-21', '2017-02-22',
                               '2017-02-23', '2017-02-24', '2017-02-27', '2017-02-28',],
                              dtype='datetime64[ns]', name='pvm', freq=None)
    plt.figure(1, figsize=(8, 4))
    zs = straight_line(index)
    zs.plot()
    zs = straight_line(index1)
    zs.plot()
    plt.figure(2, figsize=(8, 4))
    zs = straight_line(index1)
    plt.plot(zs.values)
The graph is correctly treating the dates as a continuous variable. The days of index1 should be plotted at x coordinates of 17, 20, 21, 22, 23, 24, 27, and 28. So, the graph with the orange line is correct.
The problem is with the way you calculate the y-values in the straight_line() function. You are treating the dates as if they were just categorical values and ignoring the gaps between them. A linear regression calculation won't do this; it will treat the dates as continuous values.
To get a straight line in your example code, you should convert the values in index1 from absolute dates to relative differences by using td = (index - index[0]) (which returns a pandas TimedeltaIndex) and then use the days from td as the x-values of your calculation. I've shown how you can do this in the reg_line() function below:
import pandas as pd
import matplotlib.pyplot as plt

def reg_line(index):
    td = (index - index[0]).days  # array containing the number of days since the first day
    y = 3 + 2*td
    zserie = pd.Series(y, index=index)
    return zserie

if __name__ == '__main__':
    start = '2017-02-10'
    end = '2017-02-17'
    index = pd.date_range(start,end)
    index1 = pd.DatetimeIndex(['2017-02-17', '2017-02-20', '2017-02-21', '2017-02-22',
                               '2017-02-23', '2017-02-24', '2017-02-27', '2017-02-28',],
                              dtype='datetime64[ns]', name='pvm', freq=None)
    plt.figure(1, figsize=(8, 4))
    zs = reg_line(index)
    zs.plot(style=['o-'])
    zs = reg_line(index1)
    zs.plot(style=['o-'])
Which produces the following figure:
NOTE: I've added points to the graph to make it clear which values are being drawn on the figure. As you can see, the orange line is straight even though there are no values for some of the days within the range.
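As a quick numerical check of the same idea (a sketch that reuses the dates from the example, without any plotting), the slope between consecutive points should be constant once y is computed from elapsed days:

```python
import numpy as np
import pandas as pd

index1 = pd.DatetimeIndex(['2017-02-17', '2017-02-20', '2017-02-21', '2017-02-22',
                           '2017-02-23', '2017-02-24', '2017-02-27', '2017-02-28'])

# Days elapsed since the first date; the weekend gaps show up as jumps of 3.
td = np.asarray((index1 - index1[0]).days)
y = 3 + 2 * td

# Rise over run between consecutive points is the same everywhere.
slopes = np.diff(y) / np.diff(td)
print(td, slopes)
```

With the original straight_line(), which uses the position in the index instead of elapsed days, the same check would give different slopes across the weekend gaps.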