Matplotlib - highlighting weekends on x axis? - python

I've a time series (typically energy usage) recorded over a range of days. Since usage tends to be different over the weekend I want to highlight the weekends.
I've done what seems sensible:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import random
# Create dummy data.
start = datetime.datetime(2022, 10, 22, 0, 0)
finish = datetime.datetime(2022, 11, 7, 0, 0)
def randomWalk():
    i = 0
    while True:
        i = i + random.random() - 0.5
        yield i
walk = randomWalk()  # one generator, so each value continues from the previous one
s = pd.Series({i: next(walk) for i in pd.date_range(start, finish, freq='h')})
# Plot it.
plt.figure(figsize=[12, 8])
s.plot()
# Color the labels according to the day of week.
for label, day in zip(plt.gca().xaxis.get_ticklabels(which='minor'),
                      pd.date_range(start, finish, freq='d')):
    label.set_color('red' if day.weekday() > 4 else 'black')
But what I get is wrong. Two weekends appear one off, and the third doesn't show at all.
I've explored the 'label' objects, but their X coordinate is just an integer, and doesn't seem meaningful. Using DateFormatter just gives nonsense.
How would be best to fix this, please?

OK - since matplotlib only provides the information we need to the Tick Label Formatter functions, that's what we have to use:
import numpy as np
minorLabels = plt.gca().xaxis.get_ticklabels(which='minor')
majorLabels = plt.gca().xaxis.get_ticklabels(which='major')
def MinorFormatter(dateInMinutes, index):
    # Formatter: first param is value (date in minutes, would you believe), second is which item in order.
    day = pd.to_datetime(np.datetime64(int(dateInMinutes), 'm'))
    minorLabels[index].set_color('red' if day.weekday() == 6 else 'black')  # Sunday
    return day.day
def MajorFormatter(dateInMinutes, index):
    day = pd.to_datetime(np.datetime64(int(dateInMinutes), 'm'))
    majorLabels[index].set_color('red' if day.weekday() == 6 else 'black')  # Sunday
    return "" if (index == 0 or index == len(majorLabels) - 1) else day.strftime("%d\n%b\n%Y")
plt.gca().xaxis.set_minor_formatter(MinorFormatter)
plt.gca().xaxis.set_major_formatter(MajorFormatter)
Pretty clunky, but it works. Could be fragile, though - anyone got a better answer?
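A different route, sketched here under a few assumptions, is to leave the tick labels alone and shade the weekend days themselves with ax.axvspan, plotting through matplotlib directly so that the x axis holds real dates. The helper names weekend_spans and plot_with_weekends are made up for this illustration, and the shading colour and alpha are arbitrary choices.

```python
import datetime as dt


def weekend_spans(start, finish):
    """Return (begin, end) datetime pairs for each Saturday/Sunday between start and finish."""
    spans = []
    day = dt.datetime.combine(start.date(), dt.time())  # midnight of the first day
    while day < finish:
        if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            begin = max(day, start)
            end = min(day + dt.timedelta(days=1), finish)
            spans.append((begin, end))
        day += dt.timedelta(days=1)
    return spans


def plot_with_weekends(times, values):
    """Plot values against a list of datetimes and shade the weekends.

    Assumes matplotlib is installed; it is imported here so that
    weekend_spans itself needs only the standard library.
    """
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(12, 8))
    ax.plot(times, values)
    for begin, end in weekend_spans(min(times), max(times)):
        ax.axvspan(begin, end, color="red", alpha=0.15)
    return fig, ax
```

With the Series from the question, something like plot_with_weekends(list(s.index), list(s.values)) followed by plt.show() should shade each Saturday and Sunday, independent of where the ticks happen to fall.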

Matplotlib is aimed at scientific plotting, and although this kind of styling is technically possible, I find it hard and not worth the effort.
Consider using Plotly instead of Matplotlib, as below:
# Run "pip install plotly" in a terminal first
import plotly.express as px
# read plotly express provided sample dataframe
df = px.data.tips()
# create plotly figure with color_discrete_map property specifying color per day
fig = px.bar(df, x="day", y="total_bill", color='day',
             color_discrete_map={"Sat": "orange", "Sun": "orange", "Thur": "blue", "Fri": "blue"}
             )
# send to browser
fig.show()
This takes far fewer lines. The one requirement is that your data is in a pandas DataFrame with named columns (rather than a Series), so that you can pass the column names to plotly.express.bar or plotly.express.scatter.

Related

How do I show text from a third dataframe column when hovering over a line chart made from 2 other columns?

So I have a dataframe with 3 columns: date, price, text
import pandas as pd
from datetime import datetime
import random
datelist = pd.date_range(datetime.today(), periods=5).tolist()
prices = [random.randint(50, 60) for _ in range(5)]
text = ['AAA', 'BBB', 'CCC', 'DDD', 'EEE']
df = pd.DataFrame({'dates': datelist, 'price': prices, 'text': text})
                       dates  price text
0 2022-11-23 14:11:51.142574     51  AAA
1 2022-11-24 14:11:51.142574     57  BBB
2 2022-11-25 14:11:51.142574     52  CCC
3 2022-11-26 14:11:51.142574     51  DDD
4 2022-11-27 14:11:51.142574     59  EEE
I want to plot date and price on a line chart, but when I hover over the line I want it to show the text from the row corresponding to that date.
E.g., when I hover over the point corresponding to 2022-11-27, I want the text to show 'EEE'.
I've tried a few things in matplotlib etc., but I can only get data from the x and y axes to show; I can't figure out how to show data from a different column.
You could use Plotly.
import plotly.graph_objects as go
fig = go.Figure(data=go.Scatter(x=df['dates'], y=df['price'], mode='lines+markers', text=df['text']))
fig.show()
You should be aware that cursor and dataframe indexing will probably work well with points on a scatter plot, but a line plot is a little trickier to handle.
With a line plot, matplotlib draws a line between each pair of data points (basically, linear interpolation), so some specific logic is needed to:
specify the intended behavior
implement the corresponding mouseover behavior when the cursor lands "between" 2 data points.
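One possible choice for the "between two points" case, offered only as a sketch, is to snap the cursor to the nearest data point and show that row's text. The function name nearest_index is hypothetical, and the code assumes the x values are sorted and already expressed as numbers in data coordinates.

```python
from bisect import bisect_left


def nearest_index(xs, x):
    """Return the index of the value in the sorted list xs that is closest to x."""
    if not xs:
        raise ValueError("xs must not be empty")
    pos = bisect_left(xs, x)
    if pos == 0:
        return 0
    if pos == len(xs):
        return len(xs) - 1
    # x lies between xs[pos - 1] and xs[pos]; pick the closer one (ties go left)
    return pos - 1 if x - xs[pos - 1] <= xs[pos] - x else pos


xs = [0.0, 1.0, 2.0, 4.0]
print(nearest_index(xs, 2.9))  # 2
```

The returned index could then be used to look up the matching row, for example df.iloc[idx]['text'], inside whatever hover callback you end up using.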
The libraries and links below may provide tools to handle both scatter plots and line plots, but I am not expert enough to point you to the specific part in either the SO link or the mplcursors link.
(Besides, the exact intended behavior was not clearly stated in your initial question; consider editing to clarify.)
So, as an alternative to DankyKang's answer, have a look at this SO question and its answers, which cover a wide range of possibilities for mouseover: How to add hovering annotations to a plot
A library worth noting is this one: https://mplcursors.readthedocs.io/en/stable/
Quoting:
mplcursors provides interactive data selection cursors for Matplotlib. It is inspired from mpldatacursor, with a much simplified API.
mplcursors requires Python 3, and Matplotlib≥3.1.
Specifically this example based on dataframes: https://mplcursors.readthedocs.io/en/stable/examples/dataframe.html
Quoting:
DataFrames can be used similarly to any other kind of input. Here, we generate a scatter plot using two columns and label the points using all columns.
This example also applies a shadow effect to the hover panel.
Here is a copy of that code example, in case the links alone are not considered a complete answer:
from matplotlib import pyplot as plt
from matplotlib.patheffects import withSimplePatchShadow
import mplcursors
from pandas import DataFrame
df = DataFrame(
    dict(
        Suburb=["Ames", "Somerset", "Sawyer"],
        Area=[1023, 2093, 723],
        SalePrice=[507500, 647000, 546999],
    )
)
df.plot.scatter(x="Area", y="SalePrice", s=100)
def show_hover_panel(get_text_func=None):
    cursor = mplcursors.cursor(
        hover=2,  # Transient
        annotation_kwargs=dict(
            bbox=dict(
                boxstyle="square,pad=0.5",
                facecolor="white",
                edgecolor="#ddd",
                linewidth=0.5,
                path_effects=[withSimplePatchShadow(offset=(1.5, -1.5))],
            ),
            linespacing=1.5,
            arrowprops=None,
        ),
        highlight=True,
        highlight_kwargs=dict(linewidth=2),
    )
    if get_text_func:
        cursor.connect(
            event="add",
            func=lambda sel: sel.annotation.set_text(get_text_func(sel.index)),
        )
    return cursor
def on_add(index):
    item = df.iloc[index]
    parts = [
        f"Suburb: {item.Suburb}",
        f"Area: {item.Area:,.0f}m²",
        f"Sale price: ${item.SalePrice:,.0f}",
    ]
    return "\n".join(parts)
show_hover_panel(on_add)
plt.show()

Matplotlib & Pandas DateTime Compatibility

Problem: I am trying to make a very simple bar chart in Matplotlib of a Pandas DataFrame. The DateTime index is causing confusion, however: Matplotlib does not appear to understand the Pandas DateTime, and is labeling the years incorrectly. How can I fix this?
Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Make date time series
index_dates = pd.date_range('2018-01-01', '2021-01-01')
# Make data frame with some random data, using the date time index
df = pd.DataFrame(index=index_dates,
                  data=np.random.rand(len(index_dates)),
                  columns=['Data'])
# Make a bar chart in Matplotlib
fig, ax = plt.subplots(figsize=(12,8))
df.plot.bar(ax=ax)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
Instead of showing up as 2018-2021, however, the years show up as 1970 - 1973.
I've already looked at the answers here, here, and documentation here. I know the date timeindex is in fact a datetime index because when I call df.info() it shows it as a datetime index, and when I call index_dates[0].year it returns 2018. How can I fix this? Thank you!
The problem is with mixing df.plot.bar and matplotlib here.
df.plot.bar sets tick locations starting from 0 (and assigns labels), while matplotlib.dates expects the locations to be the number of days since 1970-01-01 (more info here).
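To see where 1970 to 1973 comes from, here is a small standard-library sketch that reads bar positions 0, 1, 2, ... as days since 1970-01-01, the way the date formatter does. The function name position_as_date is just for illustration.

```python
import datetime as dt

EPOCH = dt.date(1970, 1, 1)


def position_as_date(pos):
    """Interpret an integer bar position as a number of days since the epoch."""
    return EPOCH + dt.timedelta(days=pos)


# One bar per day from 2018-01-01 to 2021-01-01 inclusive
n_bars = (dt.date(2021, 1, 1) - dt.date(2018, 1, 1)).days + 1

first = position_as_date(0)
last = position_as_date(n_bars - 1)
print(first.year, last.year)  # 1970 1973
```

So the three years of daily bars sit at positions 0 through 1096, which the date formatter labels as 1970 through 1973.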
If you do it with matplotlib directly, it shows labels correctly:
# Make a bar chart in Matplotlib
fig, ax = plt.subplots(figsize=(12,8))
ax.bar(x=df.index, height=df['Data'])
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))

Plotting a times series using matplotlib with 24 hours on the y-axis

If I run the following, it appears to work as expected, but the y-axis is limited to the earliest and latest times in the data. I want it to show midnight to midnight. I thought I could do that with the code that's commented out. But when I uncomment it, I get the correct y-axis, yet nothing plots. Where am I going wrong?
from datetime import datetime
import matplotlib.pyplot as plt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
x = []
y = []
for i in range(0, len(data)):
    t = datetime.strptime(data[i], '%Y-%m-%d %H:%M:%S')
    x.append(t.strftime('%Y-%m-%d'))  # X-axis = date
    y.append(t.strftime('%H:%M:%S'))  # Y-axis = time
plt.plot(x, y, '.')
# begin = datetime.strptime('00:00:00', '%H:%M:%S').strftime('%H:%M:%S')
# end = datetime.strptime('23:59:59', '%H:%M:%S').strftime('%H:%M:%S')
# plt.ylim(begin, end)
plt.show()
Edit: I also noticed that the x-axis isn't right either. The data skips Jan 2, but I want that on the axis so the data is to scale.
This is a dramatically simplified version of code dealing with over a year's worth of data with over 2,500 entries.
If Pandas is available to you, consider this approach:
import pandas as pd
data = pd.to_datetime(data, yearfirst=True)
plt.plot(data.date, data.time)
_=plt.ylim(["00:00:00", "23:59:59"])
Update per comments
X-axis date formatting can be adjusted using the Locator and Formatter classes of the matplotlib.dates module. A Locator finds the tick positions, and a Formatter specifies how you want the labels to appear.
Sometimes Matplotlib/Pandas just gets it right, other times you need to call out exactly what you want using these extra methods. In this case, I'm not sure why those numbers are showing up, but this code will remove them.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time)
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
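If you would rather not depend on how strings or time objects get converted, another option, sketched here as an assumption rather than the approach above, is to plot the time of day as plain seconds since midnight and only format the tick labels as HH:MM:SS. The names seconds_since_midnight, format_hms and plot_time_of_day are invented for this example, and the 3-hour tick spacing is an arbitrary choice.

```python
from datetime import datetime


def seconds_since_midnight(t):
    """Return the time-of-day part of a datetime as a number of seconds."""
    return t.hour * 3600 + t.minute * 60 + t.second


def format_hms(seconds, pos=None):
    """Tick formatter: turn seconds since midnight into an HH:MM:SS label."""
    seconds = int(seconds)
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def plot_time_of_day(stamps):
    """Plot date on x and time of day on y, with y fixed to a full 24 hours.

    Assumes matplotlib is installed; it is imported here so the two
    helpers above need only the standard library.
    """
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter, MultipleLocator

    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    fig, ax = plt.subplots()
    # Real date objects on x keep missing days (such as Jan 2) on the axis
    ax.plot([t.date() for t in times], [seconds_since_midnight(t) for t in times], ".")
    ax.set_ylim(0, 24 * 3600)
    ax.yaxis.set_major_locator(MultipleLocator(3 * 3600))  # a tick every 3 hours
    ax.yaxis.set_major_formatter(FuncFormatter(format_hms))
    return fig, ax
```

Calling plot_time_of_day(data) with the list of strings from the question and then plt.show() should give a midnight-to-midnight y axis with numeric spacing on both axes.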

Problems assigning color to bars in Pandas v0.20 and matplotlib

I have been struggling for a while with the definition of colors in a bar plot using Pandas and Matplotlib. Let us imagine that we have the following dataframe:
import pandas as pd
pers1 = ["Jesús","lord",2]
pers2 = ["Mateo","apostel",1]
pers3 = ["Lucas","apostel",1]
dfnames = pd.DataFrame(
    [pers1, pers2, pers3],
    columns=["name", "type", "importance"]
)
Now, I want to create a bar plot with the importance as the numerical value, the names of the people as ticks and use the type column to assign colors. I have read other questions (for example: Define bar chart colors for Pandas/Matplotlib with defined column) but it doesn't work...
So, first I have to define colors and assign them to different values:
colors = {'apostel':'blue','lord':'green'}
And finally use the .plot() function:
dfnames.plot(
    x="name",
    y="importance",
    kind="bar",
    color=dfnames['type'].map(colors)
)
Good. The only problem is that all bars are green:
Why?? I don't know... I am testing it in Spyder and Jupyter... Any help? Thanks!
As per this GH16822, this is a regression bug introduced in version 0.20.3, wherein only the first colour was picked from the list of colours passed. This was not an issue with prior versions.
The reason, according to one of the contributors was this -
The problem seems to be in _get_colors. I think that BarPlot should
define a _get_colors that does something like
def _get_colors(self, num_colors=None, color_kwds='color'):
    color = self.kwds.get('color')
    if color is None:
        return super()._get_colors(self, num_colors=num_colors, color_kwds=color_kwds)
    else:
        num_colors = len(self.data)  # maybe? may not work for some cases
        return _get_standard_colors(color=kwds.get('color'), num_colors=num_colors)
There's a couple of options for you -
The most obvious choice would be to update to the latest version of pandas (currently v0.22)
If you need a workaround, there's one (also mentioned in the issue tracker) whereby you wrap the arguments within an extra tuple -
dfnames.plot(x="name",
             y="importance",
             kind="bar",
             color=[tuple(dfnames['type'].map(colors))])
Though, in the interest of progress, I'd recommend updating your pandas.
I found another solution to your problem, and it works!
I used the matplotlib library directly instead of the plot method of the data frame.
Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
# In a Jupyter notebook, also run this magic on its own line:
# %matplotlib inline
pers1 = ["Jesús","lord",2]
pers2 = ["Mateo","apostel",1]
pers3 = ["Lucas","apostel",1]
dfnames = pd.DataFrame([pers1,pers2, pers3], columns=["name","type","importance"])
fig, ax = plt.subplots()
bars = ax.bar(dfnames.name, dfnames.importance)
colors = {'apostel':'blue','lord':'green'}
for index, bar in enumerate(bars):
    color = colors.get(dfnames.loc[index, 'type'], 'b')  # look up the color for this row's type
    bar.set_facecolor(color)  # pass the full color name, not just its first letter
plt.show()
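A shorter variant of the same idea, given as a sketch, is to build the list of colors first and hand it to ax.bar through its color argument, which accepts one color per bar. The helper names bar_colors and plot_colored_bars, and the gray fallback for unknown types, are assumptions made for this example.

```python
def bar_colors(types, palette, default="gray"):
    """Map each category in types to a color, falling back to default."""
    return [palette.get(t, default) for t in types]


def plot_colored_bars(names, heights, types, palette):
    """Draw one bar per name, colored by its type.

    Assumes matplotlib is installed; it is imported here so that
    bar_colors itself needs only the standard library.
    """
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(names, heights, color=bar_colors(types, palette))
    return fig, ax


colors = {"apostel": "blue", "lord": "green"}
print(bar_colors(["lord", "apostel", "apostel"], colors))  # ['green', 'blue', 'blue']
```

With the dataframe from the question, this would be called as plot_colored_bars(dfnames["name"], dfnames["importance"], dfnames["type"], colors), followed by plt.show().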

How to format x-axis time-series tick marks with missing dates

How can I format the x-axis so that the spacing between periods is "to scale". As in, the distance between 10yr and 30yr should be much larger than the distance between 1yr and 2yr.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import Quandl as ql
yield_ = ql.get("USTREASURY/YIELD")
today = yield_.iloc[-1,:]
month_ago = yield_.iloc[-1000,:]
df = pd.concat([today, month_ago], axis=1)
df.columns = ['today', 'month_ago']
df.plot(style={'today': 'ro-', 'month_ago': 'bx--'},title='Treasury Yield Curve, %');
plt.show()
I want my chart to look like this...
I think doing this while staying purely within pandas might be tricky. You first need to create a new matplotlib figure and axes. The following might not work exactly, but it should give you a good idea.
df['years'] = [1/12., 0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df.plot(x='years', y='today', ax=ax, kind='scatter')
df.plot(x='years', y='month_ago', ax=ax, kind='scatter')
plt.show()
If you want your axis labels to look like the ones in your chart, you'll also need to set the lower and upper limits of the axis so they look good, and then (before calling plt.show()) do something like:
ax.set_xticks(df['years'])
ax.set_xticklabels(list(df.index))
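Rather than typing the list of years by hand, the positions could be derived from the maturity labels. This is only a sketch: it assumes the index labels look like "3 MO" or "10 YR", which I have not checked against the actual Quandl output, and maturity_in_years is a made-up helper name.

```python
def maturity_in_years(label):
    """Convert a label such as '3 MO' or '10 YR' into a number of years."""
    amount, unit = label.split()
    if unit.upper() == "MO":
        return int(amount) / 12
    if unit.upper() == "YR":
        return float(amount)
    raise ValueError(f"unrecognised maturity label: {label!r}")


# Assumed label format; adjust to whatever the real index contains
labels = ["1 MO", "3 MO", "6 MO", "1 YR", "2 YR", "3 YR",
          "5 YR", "7 YR", "10 YR", "20 YR", "30 YR"]
years = [maturity_in_years(label) for label in labels]
print(years[1], years[-1])  # 0.25 30.0
```

If the labels do have that shape, df['years'] = [maturity_in_years(label) for label in df.index] would stay in step with the data even when the set of maturities changes.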
