I would like to make a heatmap from a pandas DataFrame (or Series) with a DatetimeIndex, so that hours are on the x-axis and days are on the y-axis, with both sets of tick labels displayed in datetime format.
If I do the following:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.randint(10, size=4*24*200))
df.index = pd.date_range(start='2019-02-01 11:30:00', periods=200*24*4, freq='15min')
df['minute'] = df.index.hour*60 + df.index.minute
df['dayofyear'] = df.index.month + df.index.dayofyear
df = df.pivot(index='dayofyear', columns='minute', values=df.columns[0])
sns.heatmap(df)
The index obviously loses the DateTime format:
What I instead want is something like this (which I achieved with a complicated, not generalizable function that apparently doesn't even work properly):
Does someone know a neat way to create this kind of heatmap with python?
EDIT:
The function I created:
def plot_heatmap(df_in, plot_column=0, figsize=(20,12), vmin=None, vmax=None, cmap='jet', xlabel='hour (UTC)', ylabel='day', rotation=0, freq='5s'):
    '''
    Plots heatmap with date labels
    df_in: pandas DataFrame or pandas Series
    plot_column: column to plot if DataFrame has multiple columns
    ...
    '''
    # convert to DataFrame in case a Series is passed:
    try:
        df_in = df_in.to_frame()
    except AttributeError:
        pass
    # make a copy in order not to overwrite the input (in case input is an object attribute)
    df = df_in.copy()
    # pad missing dates:
    idx = pd.date_range(df_in.index[0], df_in.index[-1], freq=freq)
    df = df.reindex(idx, fill_value=np.nan)
    df['hour'] = df.index.hour*3600 + df.index.minute*60 + df.index.second
    df['dayofyear'] = df.index.month + df.index.dayofyear
    # Create mesh for heatmap plotting:
    pivot = df.pivot(index='dayofyear', columns='hour', values=df.columns[plot_column])
    # plot
    plt.figure(figsize=figsize)
    sns.heatmap(pivot, cmap=cmap)
    # set xticks
    plt.xticks(np.linspace(0,pivot.shape[1],25), labels=range(25))
    plt.xlabel(xlabel)
    # set yticks
    ylabels = []
    ypositions = []
    day0 = df['dayofyear'].unique().min()
    for day in df['dayofyear'].unique():
        day_delta = day-day0
        # create pandas Timestamp
        temp_tick = df.index[0] + pd.Timedelta('%sD' %day_delta)
        # check whether tick shall be shown or not
        if temp_tick.day==1 or temp_tick.day==15:
            temp_tick_nice = '%s-%s-%s' %(temp_tick.year, temp_tick.month, temp_tick.day)
            ylabels.append(temp_tick_nice)
            ypositions.append(day_delta)
    plt.yticks(ticks=ypositions, labels=ylabels, rotation=0)
    plt.ylabel(ylabel)
The date format goes away because you did:
df['dayofyear'] = df.index.month + df.index.dayofyear
Here, both series are integers, so df['dayofyear'] is also integer-typed.
Instead, do:
df['dayofyear'] = df.index.date
Then you get as output:
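A minimal sketch of the corrected pivot, assuming the same kind of random 15-minute data as in the question. The column names value, minute and date are chosen here for illustration only:

```python
import numpy as np
import pandas as pd

# Random 15-minute data, as in the question
idx = pd.date_range("2019-02-01 11:30:00", periods=4 * 24 * 200, freq="15min")
df = pd.DataFrame({"value": np.random.randint(10, size=len(idx))}, index=idx)

df["minute"] = df.index.hour * 60 + df.index.minute  # minutes since midnight
df["date"] = df.index.date                           # datetime.date objects, not integers

# One row per calendar day, one column per 15-minute slot
table = df.pivot(index="date", columns="minute", values="value")
print(table.index[:3])
```

Because the row labels are now dates rather than integers, sns.heatmap(table) should show dates on the y-axis.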
The best solution I have found so far, which also works when the frequency of the DatetimeIndex is below 1 min, is the following:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
freq = '30s'
df = pd.DataFrame(np.random.randint(10, size=4*24*200*20))
df.index = pd.date_range(start='2019-02-01 11:30:00', periods=200*24*4*20, freq=freq)
df['hour'] = df.index.strftime('%H:%M:%S')
df['dayofyear'] = df.index.date
df = df.pivot(index='dayofyear', columns='hour', values=df.columns[0])
df.columns = pd.DatetimeIndex(df.columns).strftime('%H:%M')
df.index = pd.DatetimeIndex(df.index).strftime('%m/%Y')
xticks_spacing = int(pd.Timedelta('2h')/pd.Timedelta(freq))
ax = sns.heatmap(df, xticklabels=xticks_spacing, yticklabels=30)
plt.yticks(rotation=0)
Which produces this result:
The only remaining flaw is that the month ticks are not placed precisely at the month boundaries with this method.
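One possible way around the imprecise month ticks is to skip seaborn's categorical axis and draw the grid with matplotlib's pcolormesh on a real date axis, so that a MonthLocator can place the ticks exactly on the first of each month. This is only a sketch under assumptions: it swaps in pcolormesh for sns.heatmap, uses 15-minute random data, and the names table and mesh are illustrative.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = pd.date_range("2019-02-01 11:30:00", periods=4 * 24 * 200, freq="15min")
values = np.random.randint(10, size=len(idx))

# Rows: calendar day (as Timestamps); columns: time of day in fractional hours
midnight = idx.normalize()
table = pd.DataFrame({
    "day": midnight,
    "hour": (idx - midnight) / pd.Timedelta(hours=1),
    "value": values,
}).pivot(index="day", columns="hour", values="value")

fig, ax = plt.subplots(figsize=(10, 6))
mesh = ax.pcolormesh(
    table.columns.to_numpy(),       # x: hours since midnight
    mdates.date2num(table.index),   # y: matplotlib date numbers
    table.to_numpy(),               # missing slots stay NaN and are left blank
    shading="nearest",
)
# Ticks exactly at the start of each month, labelled as MM/YYYY
ax.yaxis.set_major_locator(mdates.MonthLocator())
ax.yaxis.set_major_formatter(mdates.DateFormatter("%m/%Y"))
ax.invert_yaxis()                   # earliest day at the top, like sns.heatmap
ax.set_xticks(np.arange(0, 25, 2))
ax.set_xlabel("hour")
fig.colorbar(mesh, ax=ax)
```

Calling plt.show() afterwards displays the figure. The trade-off is that seaborn conveniences such as annot are not available with this approach.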
Related
Data I'm working with: https://drive.google.com/file/d/1xb7icmocz-SD2Rkq4ykTZowxW0uFFhBl/view?usp=sharing
Hey everyone,
I am a bit stuck with editing a plot.
Basically, I would like my x value to display the months in the year, but it doesn't seem to work because of the data type (?). Do you have any idea how I could get my plot to have months in the x axis?
If you need more context about the data, please let me know!!!
Thank you!
Here's my code for the plot and the initial data modifications:
import matplotlib.pyplot as plt
import mplleaflet
import pandas as pd
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
import numpy as np
df = pd.read_csv("data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv")
df['degrees']=df['Data_Value']/10
df['Date'] = pd.to_datetime(df['Date'])
df2 = df[df['Date']<'2015-01-01']
df3 = df[df['Date']>='2015-01-01']
max_temp = df2.groupby([(df2.Date.dt.month),(df2.Date.dt.day)])['degrees'].max()
min_temp = df2.groupby([(df2.Date.dt.month),(df2.Date.dt.day)])['degrees'].min()
max_temp2 = df3.groupby([(df3.Date.dt.month),(df3.Date.dt.day)])['degrees'].max()
min_temp2 = df3.groupby([(df3.Date.dt.month),(df3.Date.dt.day)])['degrees'].min()
max_temp.plot(x ='Date', y='degrees', kind = 'line')
min_temp.plot(x ='Date',y='degrees', kind= 'line')
plt.fill_between(range(len(min_temp)),min_temp, max_temp, color='C0', alpha=0.2)
ax = plt.gca()
ax.set(xlabel="Date",
ylabel="Temperature",
title="Extreme Weather in 2015")
plt.legend()
plt.tight_layout()
x = plt.gca().xaxis
for item in x.get_ticklabels():
    item.set_rotation(45)
plt.show()
Plot I'm getting:
Option 1 (Most Similar Approach)
Change the index based on month abbreviations using Index.map and calendar
This is just for df2:
import calendar
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("...")
df['degrees'] = df['Data_Value'] / 10
df['Date'] = pd.to_datetime(df['Date'])
df2 = df[df['Date'] < '2015-01-01']
max_temp = df2.groupby([df2.Date.dt.month, df2.Date.dt.day])['degrees'].max()
min_temp = df2.groupby([df2.Date.dt.month, df2.Date.dt.day])['degrees'].min()
# Update the index to be the desired display format for x-axis
max_temp.index = max_temp.index.map(lambda x: f'{calendar.month_abbr[x[0]]}')
min_temp.index = min_temp.index.map(lambda x: f'{calendar.month_abbr[x[0]]}')
max_temp.plot(x='Date', y='degrees', kind='line')
min_temp.plot(x='Date', y='degrees', kind='line')
plt.fill_between(range(len(min_temp)), min_temp, max_temp,
color='C0', alpha=0.2)
ax = plt.gca()
ax.set(xlabel="Date", ylabel="Temperature", title="Extreme Weather 2005-2014")
x = plt.gca().xaxis
for item in x.get_ticklabels():
    item.set_rotation(45)
plt.margins(x=0)
plt.legend()
plt.tight_layout()
plt.show()
As an aside: the title "Extreme Weather in 2015" is incorrect because this data includes all years before 2015. It should be "Extreme Weather 2005-2014".
The year range can be checked with min and max as well:
print(df2.Date.dt.year.min(), '-', df2.Date.dt.year.max())
# 2005 - 2014
The title could be programmatically generated with:
title=f"Extreme Weather {df2.Date.dt.year.min()}-{df2.Date.dt.year.max()}"
Option 2 (Simplifying groupby step)
Simplify the code using groupby aggregate to create a single DataFrame then convert the index in the same way as above:
import calendar
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("...")
df['degrees'] = df['Data_Value'] / 10
df['Date'] = pd.to_datetime(df['Date'])
df2 = df[df['Date'] < '2015-01-01']
# Get Max and Min Degrees in Single Groupby
df2_temp = (
df2.groupby([df2.Date.dt.month, df2.Date.dt.day])['degrees']
.agg(['max', 'min'])
)
# Convert Index to whatever display format is desired:
df2_temp.index = df2_temp.index.map(lambda x: f'{calendar.month_abbr[x[0]]}')
# Plot
ax = df2_temp.plot(
kind='line', rot=45,
xlabel="Date", ylabel="Temperature",
title=f"Extreme Weather {df2.Date.dt.year.min()}-{df2.Date.dt.year.max()}"
)
# Fill between
plt.fill_between(range(len(df2_temp)), df2_temp['min'], df2_temp['max'],
color='C0', alpha=0.2)
plt.margins(x=0)
plt.tight_layout()
plt.show()
Option 3 (Best overall functionality)
Convert the index to a datetime using pd.to_datetime. Choose any leap year to put all dates in the same year (it must be a leap year so that Feb-29 does not raise an error). Then call set_major_formatter with the format string %b to show the month abbreviation:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("...")
df['degrees'] = df['Data_Value'] / 10
df['Date'] = pd.to_datetime(df['Date'])
df2 = df[df['Date'] < '2015-01-01']
# Get Max and Min Degrees in Single Groupby
df2_temp = (
df2.groupby([df2.Date.dt.month, df2.Date.dt.day])['degrees']
.agg(['max', 'min'])
)
# Convert to DateTime of Same Year
# (Must be a leap year so Feb-29 doesn't raise an error)
df2_temp.index = pd.to_datetime(
'2000-' + df2_temp.index.map(lambda s: '-'.join(map(str, s)))
)
# Plot
ax = df2_temp.plot(
kind='line', rot=45,
xlabel="Date", ylabel="Temperature",
title=f"Extreme Weather {df2.Date.dt.year.min()}-{df2.Date.dt.year.max()}"
)
# Fill between
plt.fill_between(df2_temp.index, df2_temp['min'], df2_temp['max'],
color='C0', alpha=0.2)
# Set xaxis formatter to month abbr with the %b format string
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
plt.tight_layout()
plt.show()
The benefit of this approach is that the index is a datetime and therefore will format better than the string representations of options 1 and 2.
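Since the CSV is not available here, the leap-year trick from option 3 can be tried on synthetic data. This is a sketch only: the random temperatures and the names temps, by_day and parts are assumptions, and the index is rebuilt from year/month/day columns rather than from joined strings.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: one reading per day, 2005-2014 (an assumption)
rng = np.random.default_rng(0)
days = pd.date_range("2005-01-01", "2014-12-31", freq="D")
temps = pd.Series(rng.normal(10, 8, size=len(days)), index=days)

# Max/min per calendar day across all years
by_day = temps.groupby([days.month, days.day]).agg(["max", "min"])

# Rebuild a DatetimeIndex in the leap year 2000 so that Feb 29 is a valid date
parts = pd.DataFrame({
    "year": 2000,
    "month": by_day.index.get_level_values(0),
    "day": by_day.index.get_level_values(1),
})
by_day.index = pd.DatetimeIndex(pd.to_datetime(parts))

print(by_day.loc["2000-02-28":"2000-03-01"])
```

With a non-leap year in place of 2000, the Feb 29 row would not be a valid date and the conversion would fail.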
I have a pandas dataframe with 5 years of daily time series data. I want to make a monthly plot from the whole dataset, so that the plot shows the variation (std or something else) within the monthly data. I tried to create a similar figure but did not find a way to do it:
For example, I have pseudo daily precipitation data:
date = pd.to_datetime("1st of Dec, 1999")
dates = date+pd.to_timedelta(np.arange(1900), 'D')
ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum()
df = pd.DataFrame({'pre':ppt},index=dates)
Manually I can do it like:
one = df['pre']['1999-12-01':'2000-11-29'].values
two = df['pre']['2000-12-01':'2001-11-30'].values
three = df['pre']['2001-12-01':'2002-11-30'].values
four = df['pre']['2002-12-01':'2003-11-30'].values
five = df['pre']['2003-12-01':'2004-11-29'].values
df = pd.DataFrame({'2000':one,'2001':two,'2002':three,'2003':four,'2004':five})
std = df.std(axis=1)
lw = df.mean(axis=1)-std
up = df.mean(axis=1)+std
plt.fill_between(np.arange(365), up, lw, alpha=.4)
I am looking for a more pythonic way to do that instead of doing it manually!
Any help will be highly appreciated.
If I'm understanding you correctly you'd like to plot your daily observations against a monthly periodic mean +/- 1 standard deviation. And that's what you get in my screenshot below. Nevermind the lackluster design and color choice. We'll get to that if this is something you can use. And please notice that I've replaced your ppt = np.random.rand(1900) with ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum() just to make the data look a bit more like your screenshot.
Here I've aggregated the daily data by month, and retrieved mean and standard deviation for each month. Then I've merged that data with the original dataframe so that you're able to plot both the source and the grouped data like this:
# imports
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
import numpy as np
# Data that matches your setup, but with a random
# seed to make it reproducible
np.random.seed(42)
date = pd.to_datetime("1st of Dec, 1999")
dates = date+pd.to_timedelta(np.arange(1900), 'D')
#ppt = np.random.rand(1900)
ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum()
df = pd.DataFrame({'ppt':ppt},index=dates)
# A subset
df = df.tail(200)
# Add a yearmonth column
df['YearMonth'] = df.index.map(lambda x: 100*x.year + x.month)
# Create aggregated dataframe
df2 = df.groupby('YearMonth').agg(['mean', 'std']).reset_index()
df2.columns = ['YearMonth', 'mean', 'std']
# Merge original data and aggregated data
df3 = pd.merge(df,df2,how='left',on=['YearMonth'])
df3 = df3.set_index(df.index)
df3 = df3[['ppt', 'mean', 'std']]
# Function to make your plot
def monthplot():
    fig, ax = plt.subplots(1)
    ax.set_facecolor('white')
    # Define upper and lower bounds for shaded variation
    lower_bound = df3['mean'] + df3['std']*-1
    upper_bound = df3['mean'] + df3['std']
    # Source data and mean
    ax.plot(df3.index,df3['mean'], lw=0.5, color = 'red')
    ax.plot(df3.index, df3['ppt'], lw=0.1, color = 'blue')
    # Variation and shaded area
    ax.fill_between(df3.index, lower_bound, upper_bound, facecolor='grey', alpha=0.5)
    # Assign months to X axis
    locator = mdates.MonthLocator() # every month
    # Specify the format - %b gives us Jan, Feb...
    fmt = mdates.DateFormatter('%b')
    X = ax.xaxis
    X.set_major_locator(locator)
    X.set_major_formatter(fmt)
    plt.show()
monthplot()
Check out this post for more on axis formatting and this post on how to add a YearMonth column.
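As a shorter variant of the aggregate-and-merge step above, the monthly mean and standard deviation can be broadcast back onto the daily rows with groupby(...).transform. This is a sketch, not the answer's original method; grouping by a monthly PeriodIndex instead of a YearMonth integer is a substitution made here.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
dates = pd.date_range("1999-12-01", periods=1900, freq="D")
df = pd.DataFrame({"ppt": np.random.normal(size=1900).cumsum()}, index=dates)

# Group daily values by calendar month (year + month)
grouped = df["ppt"].groupby(df.index.to_period("M"))

# transform returns one value per original row, so no merge is needed
df["mean"] = grouped.transform("mean")
df["std"] = grouped.transform("std")

lower_bound = df["mean"] - df["std"]
upper_bound = df["mean"] + df["std"]
```

The resulting df has the same ppt, mean and std columns as df3 in the answer, so the plotting function can be pointed at it with only a name change.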
There are a few mistakes in your example, but I don't think they matter here.
Do you want all years to be on the same graphic (like in your example)? If you do, this may help you:
df['month'] = df.index.strftime("%m-%d")
df['year'] = df.index.year
df.set_index(['month']).drop(columns=['year']).plot()
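To get the spread across years without slicing each year by hand, one option is to group by calendar day and year and then unstack the years into columns. The sketch below assumes calendar years (January to December) rather than the December-to-November windows used in the question, and the names table, lower and upper are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.date_range("1999-12-01", periods=1900, freq="D")
df = pd.DataFrame({"pre": np.random.normal(size=1900).cumsum()}, index=dates)

# Rows: calendar day as "MM-DD"; columns: year
table = (
    df["pre"]
    .groupby([df.index.strftime("%m-%d"), df.index.year])
    .mean()
    .unstack()
)

# Mean and spread across the years for each calendar day (NaNs are skipped)
mean = table.mean(axis=1)
std = table.std(axis=1)
lower, upper = mean - std, mean + std
```

The band can then be drawn with plt.fill_between(range(len(mean)), lower, upper, alpha=.4), as in the manual version.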
I looked at the responses to this original question (see here), but they don't seem to solve my issue.
import pandas as pd
import pandas_datareader.data
import datetime
import matplotlib.pyplot as plt
df = pd.read_csv(mypath + filename, \
skiprows=4,index_col=0,usecols=['Day', 'Cushing OK Crude Oil Future Contract 1 Dollars per Barrel'], \
skipfooter=0,engine='python')
df.index = pd.to_datetime(df.index)
fig = plt.figure(figsize=plt.figaspect(0.25))
ax = fig.add_subplot(1,1,1)
ax.grid(axis='y',color='lightgrey', linestyle='--', linewidth=0.5)
ax.grid(axis='x',color='lightgrey', linestyle='none', linewidth=0.5)
df['Cushing OK Crude Oil Future Contract 1 Dollars per Barrel'].plot(ax=ax, grid=True,
    color='blue', fontsize=14, legend=False)
plt.show()
The graph turns out fine but I can't figure out a way to show only a certain date range. I have tried everything.
type(df) = pandas.core.frame.DataFrame
type(df.index) = pandas.core.indexes.datetimes.DatetimeIndex
Also, the format of the column 'Day' is YYYY-MM-DD.
Assuming you have a datetime index on your dataframe (it looks that way), you can slice using .loc like so:
%matplotlib inline
import pandas as pd
import numpy as np
data = pd.DataFrame({'values': np.random.rand(31)}, index = pd.date_range('2018-01-01', '2018-01-31'))
# Plot the entire dataframe.
data.plot()
# Plot a slice of the dataframe.
data.loc['2018-01-05':'2018-01-10', 'values'].plot(legend = False)
Gives:
The orange series is the slice.
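If the full series should stay on the axes and only the visible window should change, setting the x-limits is an alternative to slicing. A minimal sketch with the same toy data, using matplotlib's Axes.set_xlim directly (the date format on the ticks is an arbitrary choice):

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.DataFrame({'values': np.random.rand(31)},
                    index=pd.date_range('2018-01-01', '2018-01-31'))

fig, ax = plt.subplots()
ax.plot(data.index, data['values'])

# Restrict the visible date range without dropping any data
ax.set_xlim(pd.Timestamp('2018-01-05'), pd.Timestamp('2018-01-10'))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
fig.autofmt_xdate()
```

Unlike .loc slicing, the y-axis here is still scaled to the whole series, which may or may not be what is wanted.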
In pandas, my dataframe has the following structure:
raw_data = {'date': ['1975-07-03','1975-07-03','1975-07-04','1975-08-01'],
'time': [515,1014,1401,1201], 'value': [1,-1,2,11]}
df = pd.DataFrame(raw_data, columns = ['date', 'time', 'value'])
This question is similar to this one, but I cannot figure out how to modify it.
I need to plot the values in the column "value" versus the two columns "date" and "time". Note that here "time" really is hh:mm.
Edit
Since the year does not change on the x-axis I should have date and time in the format "Month-Day Hour:Minute"
IIUC:
(df.assign(date=pd.to_datetime(df['date'] + ' ' + df['time'].astype(str).str.zfill(4).str.replace(r'(\d{2})(\d{2})', r'\1:\2', regex=True)))
   .plot(x='date', y='value'))
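The same combination can be written without a regex by zero-padding the time and passing an explicit format to pd.to_datetime. A small sketch; the column names datetime and label are made up here:

```python
import pandas as pd

raw_data = {'date': ['1975-07-03', '1975-07-03', '1975-07-04', '1975-08-01'],
            'time': [515, 1014, 1401, 1201], 'value': [1, -1, 2, 11]}
df = pd.DataFrame(raw_data)

# 515 -> "0515", so that %H%M can parse it as 05:15
hhmm = df['time'].astype(str).str.zfill(4)
df['datetime'] = pd.to_datetime(df['date'] + ' ' + hhmm, format='%Y-%m-%d %H%M')

# "Month-Day Hour:Minute" strings, as requested in the edit to the question
df['label'] = df['datetime'].dt.strftime('%m-%d %H:%M')
print(df[['datetime', 'label', 'value']])
```

df.plot(x='datetime', y='value') then plots against the combined timestamp.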
Extending the other answer to include marking specific data points as ticklabels/ticks can be done by using date2num to convert the dates into their tick positions. There are probably better ways to manipulate the date formatting in matplotlib but this method will work.
EDIT: Ensure padding of hhmm if less than 4 characters, more idiomatic pandas
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
raw_data = {'date': ['1975-07-03','1975-07-03','1975-07-04','1975-08-01'],
'time': [415,1014,1401,1201], 'value': [1,-1,2,11]}
def fix_time_str(df):
    # zero-pad hhmm to 4 digits, then insert the colon: 415 -> '0415' -> '04:15'
    df['date'] = (df['date'] + ' ' +
                  df['time'].astype(str).str.zfill(4).str.replace(r'(\d{2})(\d{2})', r'\1:\2', regex=True))
    return df
df = (pd.DataFrame(raw_data, columns = ['date', 'time', 'value'])).pipe(fix_time_str).assign(date= lambda x: pd.to_datetime(x['date']))
fig, ax = plt.subplots(1,1, figsize = (8,5))
xtick_locs = mpl.dates.date2num(df['date'].tolist())
xtick_labels = df['date'].astype(str).tolist()
xtick_labels = ["{}-{}".format(*i.split('-')[1:])[:-3] for i in xtick_labels]
ax.plot(df['date'], df['value'])
ax.set_xticks(xtick_locs)
ax.set_xticklabels(xtick_labels)
ax.tick_params(axis='x', rotation=90)
The DataFrame has timestamped data and I want to visually compare the daily temporal evolution of the data. If I group by day and plot the graphs, they are obviously displaced horizontally in time due to differences in their dates.
I want to plot a date-agnostic graph of the day-wise trends on a time-only axis. To that end I have resorted to shifting the data back by an appropriate number of days, as demonstrated in the following code:
import pandas as pd
import datetime
import matplotlib.pyplot as plt
index1 = pd.date_range('20141201', freq='H', periods=2)
index2 = pd.date_range('20141210', freq='2H', periods=4)
index3 = pd.date_range('20141220', freq='3H', periods=5)
index = index1.append([index2, index3])
df = pd.DataFrame(list(range(1, len(index)+1)), index=index, columns=['a'])
gbyday = df.groupby(df.index.day)
first_day = gbyday.keys.min() # convert all data to this day
plt.figure()
ax = plt.gca()
for n,g in gbyday:
    g.shift(-(n-first_day+1), 'D').plot(ax=ax, style='o-', label=str(n))
plt.show()
resulting in the following plot
Question: Is this the pandas way of doing it? In other words how can I achieve this more elegantly?
You can select the hour attribute of the index after grouping like this:
In [36]: fig, ax = plt.subplots()
In [35]: for label, s in gbyday:
   ....:     ax.plot(s.index.hour, s, 'o-', label=label)
It might be a little too late for this answer, but in case anyone is still looking for it.
This solution works on different months (it was an issue if using the code from the original question) and keeps fractional hours.
import pandas as pd
import matplotlib.pyplot as plt
index0 = pd.date_range('20141101', freq='H', periods=2)
index1 = pd.date_range('20141201', freq='H', periods=2)
index2 = pd.date_range('20141210', freq='2H', periods=4)
index3 = pd.date_range('20141220', freq='3H', periods=5)
index = index1.append([index2, index3, index0])
df = pd.DataFrame(list(range(1, len(index)+1)), index=index, columns=['a'])
df['time_hours'] = (df.index - df.index.normalize()) / pd.Timedelta(hours=1)
fig, ax = plt.subplots()
for n,g in df.groupby(df.index.normalize()):
    ax.plot(g['time_hours'], g['a'], label=n, marker='o')
ax.legend(loc='best')
plt.show()