custom xlabel ticks in Seaborn heatmaps - python

I have plotted a heatmap, shown below. The x-axis shows the time of day and the y-axis shows the date. I want a tick label at every hour on the x-axis instead of the arbitrary labels displayed here.
I tried the following code, but the resulting heatmap draws all the x labels on top of each other:
t = pd.date_range(start='00:00:00', end='23:59:59', freq='60T').time
df = pd.DataFrame(index=t)
df.reset_index(inplace=True)
df['index'] = df['index'].astype('str')
sns_hm = sns.heatmap(data=mat, cbar=True, lw=0,cmap=colormap,xticklabels=df['index'])

The following code supposes mat is a dataframe with one column per timestamp and one row per day. The same timestamps must recur on every day.
After drawing the heatmap, the left and right limits of the x-axis are retrieved. Supposing these limits correspond to 0 and 24 hours, the range can be subdivided into 25 positions, one for each full hour (00:00 through 24:00).
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from pandas.tseries.offsets import DateOffset
from matplotlib.colors import ListedColormap, to_hex
# first, create some test data
df = pd.DataFrame()
df["date"] = pd.date_range('20220304', periods=19000, freq=DateOffset(seconds=54))
df["val"] = (((np.random.rand(len(df)) ** 100).cumsum() / 2).astype(int) % 2) * 100
df['day'] = df['date'].dt.strftime('%d-%m-%Y')
df['time'] = df['date'].dt.strftime('%H:%M:%S')
mat = df.pivot(index='day', columns='time', values='val')
colors = list(plt.cm.Greens(np.linspace(0.2, 0.9, 10)))
ax = sns.heatmap(mat, cmap=colors, cbar_kws={'ticks': range(0, 101, 10)})
xmin, xmax = ax.get_xlim()
tick_pos = np.linspace(xmin, xmax, 25)
tick_labels = [f'{h:02d}:00:00' for h in range(len(tick_pos))]
ax.set_xticks(tick_pos)
ax.set_xticklabels(tick_labels, rotation=90)
ax.set(xlabel='', ylabel='')
plt.tight_layout()
plt.show()
The left plot shows the default tick labels, the right plot the customized labels.
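If the columns do not cover the day evenly, an alternative is to look up the first column at or after each full hour instead of dividing the axis into equal parts. The following is a minimal sketch under the assumption (taken from the test data above) that there is one column every 54 seconds starting at midnight; the variable names are illustrative only.

```python
import numpy as np

# Assumed layout: one heatmap column every 54 seconds, starting at midnight.
col_seconds = np.arange(0, 24 * 3600, 54)      # 1600 columns per day

# Full hours 00:00 .. 23:00 expressed in seconds since midnight.
hour_marks = np.arange(24) * 3600

# Index of the first column at or after each full hour.
col_idx = np.searchsorted(col_seconds, hour_marks)

# seaborn draws column i between x = i and x = i + 1, so its centre is i + 0.5.
tick_pos = col_idx + 0.5
tick_labels = [f"{h:02d}:00" for h in range(24)]
```

These arrays can then be passed to ax.set_xticks(tick_pos) and ax.set_xticklabels(tick_labels, rotation=90) in the same way as in the code above.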

Related

Setting the axes date formatting of a pandas stacked bar subplot

I'm attempting to plot a pandas stacked bar plot with the x axis showing Months on the major ticks, or years on Jan 1, ideally with small ticks identifying the weeks but with no label.
I have a dataset with a datetime index that was then grouped by week and then I plot that dataset. If I don't attempt to control the settings the dates show up but are vertical and don't fit. So I used the set formatter to fix that but then the axes changed to 1970 as if following an index number instead of date. If I replace the pandas plotting with a regular bar chart, the "ConciseDateFormatter" works as desired/expected. But I wanted to use stacked with pandas as creating a regular stacked bar chart is a pain. I don't understand why I can't control pandas axes like I can a regular plot.
One thing I notice is that the index is shown as an object. If I convert it to to_datetime() it then adds 00:00 for times that I don't want on the axes or my data.
My data is a simple set of weekly random data:
date A B C D
3/20/2022 1.540765154 0.504616419 1.543679189 2.952934623
3/27/2022 1.781135128 4.594966635 4.799026389 3.499803401
4/3/2022 0.254059207 0.69835265 0.323039575 1.628138491
4/10/2022 3.112760301 0.287056897 4.372938373 0.130817579
4/17/2022 0.497273044 0.913246096 1.296612207 1.250610278
4/24/2022 1.370087689 3.124985109 4.322253295 4.49571603
5/1/2022 3.952629538 3.976896924 1.679311114 1.265443147
5/8/2022 3.470328161 1.266161308 3.990502436 1.364929959
5/15/2022 2.296588269 4.639761391 0.04685036 1.438471692
5/22/2022 3.443458637 2.66592719 0.968656871 2.349325343
5/29/2022 1.820278464 4.794211675 2.435710815 2.156110694
6/5/2022 4.328825266 0.049132356 1.842839099 3.665701299
6/12/2022 0.184631564 0.412976815 4.787477069 4.80052839
6/19/2022 4.846734385 3.471474741 1.808871854 2.440013553
6/26/2022 1.612870444 0.70191857 3.55713114 1.438699834
7/3/2022 2.896859156 4.025996887 0.209608767 4.174881655
Code:
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
maxval = 200
values = ['A','B','C','D']
cum = [v + '_CUM' for v in values]
df = pd.read_csv('test_data.csv', index_col='date', parse_dates=True,
                 infer_datetime_format=True)
#df.index = pd.to_datetime(df.index.date).strftime("%b %d")
df = df.join(df.cumsum(), rsuffix="_CUM")
df = df.join(df[cum]/maxval * 100, rsuffix="_LIFE")
fig, axs = plt.subplots(nrows=2, ncols=1, sharex=False, squeeze=False,
                        facecolor='white')
axs = axs.flatten()
ax = axs[0]
df[values].plot.bar(ax=ax, grid=True, stacked=True, legend=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(
    mdates.ConciseDateFormatter(ax.xaxis.get_major_locator()))
# ax.xaxis.set_tick_params(rotation = 0)
plt.show(block=False)
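A possible explanation, offered as a hedged note rather than as part of the original post: pandas bar plots place the bars at integer positions 0, 1, 2, ... and use the dates only as text labels, so a date locator reads those integers as days since the epoch, which lands in 1970. One workaround is to compute the tick positions and labels by hand. The sketch below assumes a weekly index like the one in the question and labels only the first bar of each month; all names are illustrative.

```python
import pandas as pd

# Hypothetical weekly index matching the dates listed in the question.
idx = pd.date_range("2022-03-20", periods=16, freq="7D")

# Bars sit at positions 0, 1, 2, ...; keep the first bar of each month.
months = idx.month
tick_pos = [i for i in range(len(idx))
            if i == 0 or months[i] != months[i - 1]]

# Abbreviated month names for the selected bars (locale-dependent).
tick_labels = [idx[i].strftime("%b") for i in tick_pos]
```

With these lists, ax.set_xticks(tick_pos) followed by ax.set_xticklabels(tick_labels, rotation=0) would replace the vertical per-bar date labels on the pandas bar axes.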

Plotting in a zooming in matplotlib subplot

This question is from this tutorial found here:
I want my plot to look like the one below, but with time series data, and with the zoomed data coming from a different source rather than from x_lim, y_lim limits on the same data.
So in the plot above I would like intraday data from a different source, and the plot below would be daily data for some stock. Because the two have different sources I cannot use a limit to zoom. For this I will be using the Yahoo datareader for daily data and yfinance for intraday data.
The code:
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data as web
from matplotlib.patches import ConnectionPatch
df = web.DataReader('goog', 'yahoo')
df.Close = pd.to_numeric(df['Close'], errors='coerce')
fig = plt.figure(figsize=(6, 5))
plt.subplots_adjust(bottom = 0., left = 0, top = 1., right = 1)
sub1 = fig.add_subplot(2,2,1)
sub1 = df.Close.plot()
sub2 = fig.add_subplot(2,1,2) # two rows, one column, second cell
df.Close.pct_change().plot(ax =sub2)
sub2.plot(theta, y, color = 'orange')
con1 = ConnectionPatch(xyA=(df[1:2].index, df[2:3].Close), coordsA=sub1.transData,
xyB=(df[4:5].index, df[5:6].Close), coordsB=sub2.transData, color = 'green')
fig.add_artist(con1)
I am having trouble with the xy coordinates. With the code above I am getting:
TypeError: Cannot cast array data from dtype('O') to dtype('float64')
according to the rule 'safe'
xyA=(df[1:2].index, df[2:3].Close)
What I did here is use the date df[1:2].index as my x value and the price df[2:3].Close as my y value.
Is converting the df to an array and then plotting my only option here? If there is any other way to get the ConnectionPatch to work, please advise.
df.dtypes
High float64
Low float64
Open float64
Close float64
Volume int64
Adj Close float64
dtype: object
Matplotlib plots dates by converting them to floats that count days, starting with 0 on 1970-01-01 (the default epoch since Matplotlib 3.3), i.e. POSIX timestamp zero. The value differs from a POSIX timestamp only in resolution: “1” is one day instead of one second.
There are 3 ways to compute that number:
either use matplotlib.dates.date2num,
or use .toordinal(), which has the right resolution, and subtract the offset corresponding to 1970-01-01,
or get the POSIX timestamp and divide by the number of seconds in a day:
df['Close'] = pd.to_numeric(df['Close'], errors='coerce')
df['Change'] = df['Close'].pct_change()
con1 = ConnectionPatch(xyA=(df.index[0].toordinal() - pd.Timestamp(0).toordinal(), df['Close'].iloc[0]), coordsA=sub1.transData,
xyB=(df.index[1].toordinal() - pd.Timestamp(0).toordinal(), df['Change'].iloc[1]), coordsB=sub2.transData, color='green')
fig.add_artist(con1)
con2 = ConnectionPatch(xyA=(df.index[-1].timestamp() / 86_400, df['Close'].iloc[-1]), coordsA=sub1.transData,
xyB=(df.index[-1].timestamp() / 86_400, df['Change'].iloc[-1]), coordsB=sub2.transData, color='green')
fig.add_artist(con2)
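As a quick check that the ordinal and timestamp conversions agree, here is a small sketch using only the standard library. The date is an arbitrary example, not taken from the question, and it is made timezone-aware because a naive datetime.timestamp() would use local time.

```python
from datetime import datetime, timezone

# Arbitrary example date, pinned to UTC so the result is machine-independent.
d = datetime(2022, 3, 4, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Way 2: proleptic Gregorian ordinal minus the ordinal of the epoch.
days_from_ordinal = d.toordinal() - epoch.toordinal()

# Way 3: POSIX timestamp (seconds) divided by the seconds in one day.
days_from_timestamp = d.timestamp() / 86_400
```

With Matplotlib's default epoch, date2num for the same instant should give the same number of days; that statement assumes the epoch has not been changed with matplotlib.dates.set_epoch.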
You also need to make sure that you’re using values that are in range for the targeted axes, in your example you use Close values on sub2 which contains pct_change’d values.
Of course if you want the bottom of the boxes as in your example it’s easier to express the coordinates using the axes transform instead of the data transform:
from matplotlib.dates import date2num
con1 = ConnectionPatch(xyA=(0, 0), coordsA=sub1.transAxes,
xyB=(date2num(df.index[1]), df['Change'].iloc[1]), coordsB=sub2.transData, color='green')
fig.add_artist(con1)
con2 = ConnectionPatch(xyA=(1, 0), coordsA=sub1.transAxes,
xyB=(date2num(df.index[-1]), df['Change'].iloc[-1]), coordsB=sub2.transData, color='green')
fig.add_artist(con2)
To plot your candlesticks, I’d recommend using the mplfinance (previously matplotlib.finance) package:
import mplfinance as mpf
sub3 = fig.add_subplot(2, 2, 2)
mpf.plot(df.iloc[30:70], type='candle', ax=sub3)
Putting all this together in a single script, it could look like this:
import pandas as pd, mplfinance as mpf, matplotlib.pyplot as plt
from pandas_datareader import data as web
from matplotlib.patches import ConnectionPatch
from matplotlib.dates import date2num, ConciseDateFormatter, AutoDateLocator
from matplotlib.ticker import PercentFormatter
# Get / compute data
df = web.DataReader('goog', 'yahoo')
df['Close'] = pd.to_numeric(df['Close'], errors='coerce')
df['Change'] = df['Close'].pct_change()
# Pick zoom range
zoom_start = df.index[30]
zoom_end = df.index[30 + 8 * 5] # 8 weeks ~ 2 months
# Create figures / axes
fig = plt.figure(figsize=(18, 12))
top_left = fig.add_subplot(2, 2, 1)
top_right = fig.add_subplot(2, 2, 2)
bottom = fig.add_subplot(2, 1, 2)
fig.subplots_adjust(hspace=.35)
# Plot all 3 data
df['Close'].plot(ax=bottom, linewidth=1, rot=0, title='Daily closing value', color='purple')
bottom.set_ylim(0)
df.loc[zoom_start:zoom_end, 'Change'].plot(ax=top_left, linewidth=1, rot=0, title='Daily Change, zoomed')
top_left.yaxis.set_major_formatter(PercentFormatter())
# Here instead of df.loc[...] use your intra-day data
mpf.plot(df.loc[zoom_start:zoom_end], type='candle', ax=top_right, xrotation=0, show_nontrading=True)
top_right.set_title('Last day OHLC')
# Put ConciseDateFormatters on all x-axes for fancy date display
for ax in fig.axes:
    locator = AutoDateLocator()
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(ConciseDateFormatter(locator))
# Add the connection patches
fig.add_artist(ConnectionPatch(
    xyA=(0, 0), coordsA=top_left.transAxes,
    xyB=(date2num(zoom_start), df.loc[zoom_start, 'Close']), coordsB=bottom.transData,
    color='green'
))
fig.add_artist(ConnectionPatch(
    xyA=(1, 0), coordsA=top_left.transAxes,
    xyB=(date2num(zoom_end), df.loc[zoom_end, 'Close']), coordsB=bottom.transData,
    color='green'
))
plt.show()

Formatting time series axis in Seaborn

I'm learning Seaborn and trying to figure out how I can format an X axis for dates over a yearly period, so that it is readable. Let's assume we have a dataframe which holds weather measurements for each day of an entire year (365 rows).
sns.scatterplot(x = df_weather["DATE"], y = df_weather["MAX_TEMPERATURE_C"], color = 'red')
sns.scatterplot(x = df_weather["DATE"], y = df_weather["MIN_TEMPERATURE_C"], color = 'blue')
plt.show()
How can I ensure that the X axis labels are readable? Ideally, one label per month would be fine.
Thanks!
I'm not sure what your DATE column looks like, but try something like the following. First generate some data; here the date is a string, which I guess is similar to yours:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# dates as 'YYYY-MM-DD' strings
DATE = pd.date_range('2020-01-01', periods=365, freq='D').strftime('%Y-%m-%d')
MIN = np.random.uniform(low=10, high=25, size=len(DATE))
MAX = MIN + np.random.uniform(low=5, high=10, size=len(DATE))
df = pd.DataFrame({'DATE':DATE,'MIN':MIN,'MAX':MAX})
Plot like you did using sns:
fig, ax = plt.subplots(figsize = (10,4))
ax = sns.scatterplot(x = "DATE", y = "MAX",data=df, color = 'red')
ax = sns.scatterplot(x = "DATE", y = "MIN",data=df, color = 'blue')
Now take the start of each month to define the ticks:
mths = pd.date_range('2020-01-01', periods=12, freq='MS')
ax.set_xticks(mths.strftime('%Y-%m-%d'))
ax.set(xticklabels=mths.strftime('%b'))
plt.show()
And it should look ok:
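An alternative sketch, not part of the original answer: if the DATE column is converted to real datetimes (for example with pd.to_datetime), Matplotlib's date locators can place one tick per month without building the tick strings by hand. The data below is synthetic and the variable names are illustrative; a plain ax.scatter call stands in for sns.scatterplot so the example only needs Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily data with a true datetime x axis (not strings).
dates = pd.date_range("2020-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
low = rng.uniform(10, 25, size=len(dates))

fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(dates, low, color="blue", s=8)

# One major tick at the start of each month, labelled with the month name.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
```

The same two set_major_* calls should also work on the axes returned by sns.scatterplot, provided the x values passed to seaborn are datetimes rather than strings.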

Python Matplotlib - Smooth plot line for x-axis with date values

I'm trying to smooth a graph line, but since the x-axis values are dates I'm having great trouble doing this. Say we have a dataframe as follows:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
startDate = '2015-05-15'
endDate = '2015-12-5'
index = pd.date_range(startDate, endDate)
data = np.random.normal(0, 1, size=len(index))
cols = ['value']
df = pd.DataFrame(data, index=index, columns=cols)
Then we plot the data
fig, axs = plt.subplots(1,1, figsize=(18,5))
x = df.index
y = df.value
axs.plot(x, y)
fig.show()
we get
Now, to smooth this line there are already some useful Stack Overflow questions, such as:
Generating smooth line graph using matplotlib,
Plot smooth line with PyPlot
Creating numpy linspace out of datetime
But I just can't seem to get any code working for my example. Any suggestions?
You can use the interpolation functionality that ships with pandas. Because your dataframe already has a value for every index entry, you can reindex it onto a denser index (hourly here), which fills every previously non-existent entry with NaN. Then, after choosing one of the many interpolation methods available, interpolate and plot your data:
index_hourly = pd.date_range(startDate, endDate, freq='1H')
df_smooth = df.reindex(index=index_hourly).interpolate('cubic')
df_smooth = df_smooth.rename(columns={'value':'smooth'})
df_smooth.plot(ax=axs, alpha=0.7)
df.plot(ax=axs, alpha=0.7)
fig.show()
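To make the reindex-then-interpolate step concrete, here is a tiny sketch on three made-up daily values. It uses the "time" interpolation method only because that method needs no SciPy, whereas 'cubic' does; the numbers and names are illustrative, not from the question.

```python
import pandas as pd

# Three made-up daily values.
daily = pd.Series(
    [0.0, 2.0, 1.0],
    index=pd.date_range("2015-05-15", periods=3, freq="D"),
)

# Denser hourly index spanning the same period.
hourly_index = pd.date_range(daily.index[0], daily.index[-1],
                             freq=pd.Timedelta(hours=1))

# Reindexing keeps the original points and inserts NaN everywhere else.
dense = daily.reindex(hourly_index)
n_missing = int(dense.isna().sum())

# Fill the gaps; "time" interpolates linearly in elapsed time.
smooth = dense.interpolate("time")
noon_value = smooth.loc[pd.Timestamp("2015-05-15 12:00")]
```

Swapping "time" for 'cubic' gives the curved result used in the answer, at the cost of requiring SciPy to be installed.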
There is one workaround: create two plots, 1) the non-smoothed/non-interpolated data with date labels and 2) the smoothed data without date labels.
Plot 1) using the argument linestyle=" " and convert the dates plotted on the x-axis to strings.
Plot 2) using the argument linestyle="-", interpolating the x-axis and y-axis with np.linspace and make_interp_spline respectively.
The following applies this workaround to your code.
# your initial code
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.interpolate import make_interp_spline
%matplotlib inline
startDate = "2015-05-15"
endDate = "2015-07-5" #reduced the end date so smoothness is clearly seen
index = pd.date_range(startDate, endDate)
data = np.random.normal(0, 1, size=len(index))
cols = ["value"]
df = pd.DataFrame(data, index=index, columns=cols)
fig, axs = plt.subplots(1, 1, figsize=(40, 12))
x = df.index
y = df.value
# workaround: build a linspace over the integer positions of the x axis
# (0 .. len - 1, so the spline is never evaluated outside the data range,
# which is what produced large values at the end)
x_new = np.linspace(0, len(df.index) - 1, 300)
a_BSpline = make_interp_spline(
    np.arange(len(df.index)),
    df.value,
    k=5,
)
y_new = a_BSpline(x_new)
# plot this new curve with linestyle = "-"
axs.plot(
    x_new,
    y_new,
    "-",
    label="interpolated"
)
# to get the date on x axis we will keep our previous plot but linestyle will be None so it won't be visible
x = list(x.astype(str))
axs.plot(x, y, linestyle=" ", alpha=0.75, label="initial")
xt = [x[i] for i in range(0,len(x),5)]
plt.xticks(xt,rotation="vertical")
plt.legend()
fig.show()
Resulting Plot
Overlapped plot to see the smoothing.
Depending on what exactly you mean by "smoothing," the easiest way can be the use of savgol_filter or something similar. Unlike with interpolated splines, this method means that the smoothed line does not pass through the measured points, effectively filtering out higher-frequency noise.
from scipy.signal import savgol_filter
...
windowSize = 21
polyOrder = 1
smoothed = savgol_filter(values, windowSize, polyOrder)
axes.plot(datetimes, smoothed, color=chart.color)
The higher the polynomial order value, the closer the smoothed line is to the raw data.
Here is an example.
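A self-contained sketch of that effect, using synthetic data that is not from the original answer: a noisy sine wave is smoothed with the same window and two different polynomial orders, and the mean squared distance to the raw data is compared. The signal, seed and variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy signal (illustrative only).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
noisy = np.sin(t) + rng.normal(0, 0.3, t.size)

# Same window, low versus high polynomial order.
smooth_lo = savgol_filter(noisy, window_length=21, polyorder=1)
smooth_hi = savgol_filter(noisy, window_length=21, polyorder=5)

# Mean squared distance between each smoothed curve and the raw data.
err_lo = np.mean((smooth_lo - noisy) ** 2)
err_hi = np.mean((smooth_hi - noisy) ** 2)
```

On this data the higher-order fit ends up closer to the raw samples (err_hi is smaller than err_lo), which matches the statement above; plotting both against t with axs.plot would show the order-1 curve as the visibly smoother one.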

Matplotlib: cbar.set_xticklabels has no effects

I've assigned the 365 days of a year to several clusters and I'm now trying to plot them on a heatmap.
My code works fine except that cbar.set_ticks(some_range) has no effect: the tick labels on my colorbar have the right text but the wrong position.
Here is an MCVE:
from datetime import date
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
import matplotlib
import seaborn as sns
#create some random data
n_cluster = 4
index = pd.date_range('01/01/2016', end='31/12/2016', freq='1D')
df = pd.DataFrame(np.random.randint(0, n_cluster, len(index)),
                  index=index, columns=['cluster'])
pivot = df.pivot_table('cluster',
                       columns=[lambda x: x.weekofyear],
                       index=[lambda x: x.dayofweek])
#yticklabels of the heatmap
days = [date(2018, 1, d).strftime('%a')[:3] for d in range(1, 8)]
#get a discrete cmap
cmap = plt.cm.get_cmap('RdBu', n_cluster)
fig = plt.figure(figsize=(10,3))
gs = matplotlib.gridspec.GridSpec(1, 2, width_ratios=[50,1])
ax = plt.subplot(gs[0])
cbar = plt.subplot(gs[1])
sns.heatmap(pivot, square=True, cmap=cmap,
            yticklabels=days, ax=ax, cbar_ax=cbar)
#There is something wrong here
cbar.set_yticks([i + 1/(2.0*n_cluster) for i in np.arange(0, 1, 1.0/n_cluster)])
#This one is ok
cbar.set_yticklabels(range(0, n_cluster))
Thanks for your help
As a workaround, the following adds the correct labels in the correct place,
cbar.yaxis.set_ticks([0.125, 0.375, 0.625, 0.875])
which looks like,
EDIT:
Or the more general suggestion of mfitzp,
cbar.yaxis.set_ticks([i + 1/(2.0*n_cluster)
for i in np.arange(0, 1, 1.0/n_cluster)])
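The hard-coded values and the general expression above are the same thing: the centre of each of n equal-width colour bands. A small sketch of that arithmetic follows; the helper name band_centres is hypothetical and not part of Matplotlib or seaborn.

```python
import numpy as np

def band_centres(vmin, vmax, n_bands):
    """Centre of each of n_bands equal-width colour bands on [vmin, vmax]."""
    width = (vmax - vmin) / n_bands
    return vmin + (np.arange(n_bands) + 0.5) * width

# Four bands on a colorbar axis running from 0 to 1.
ticks = band_centres(0.0, 1.0, 4)
```

For n_cluster = 4 on a 0 to 1 axis this reproduces 0.125, 0.375, 0.625 and 0.875. If the colorbar axis is in data units instead, passing the data limits as vmin and vmax would be the analogous call; that is an assumption about the setup, not something the original answer states.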
