I have a CSV, generated with this script, which looks like this:
date,last_activity
2021-11-03 07:39:14,160
2021-11-03 07:39:44,1594
2021-11-03 07:57:15,4270
2021-11-03 07:57:45,23201
2021-11-03 07:58:15,7
2021-11-03 07:58:45,1015
2021-11-03 07:59:15,2
2021-11-03 07:59:45,3496
2021-11-03 08:28:16,6093
2021-11-03 08:28:46,5513
2021-11-03 08:31:46,16639
I would like to visualize those timestamps as an "activity bar", e.g. like this:
Hence:
The x-axis should show the time / the date
I want to be able to add a title over it
The red stripes indicate when there is a date timestamp.
To make it simpler, the last_activity could be ignored.
The simplest solution I can imagine would be to use one pixel per minute of the day. I can round 2021-11-03 07:39:14 to 2021-11-03 07:39 and just say "I've seen a timestamp for 7:39 -> color that pixel". However, I would only know how to do this directly with matplotlib (pixel-by-pixel). Is there a simpler way with Pandas?
EDIT: I just checked out plotly and it seems to be built well enough to handle your exact problem. The solution using this library completely knocks out my previous attempt with matplotlib in my opinion.
plotly supports using date (so no workaround using timestamps required), and has hover-annotation as well. Here is the code:
import pandas
import dateutil.parser  # import the submodule explicitly; "import dateutil" alone may not load it
import plotly.graph_objects as go
# Load and transform the data
filedata = pandas.read_csv("test.csv")
datelist = filedata["date"].to_list()
timestamplist = [dateutil.parser.parse(x) for x in datelist]
length = len(datelist)
# Create the figure
fig = go.Figure()
fig.add_trace(
go.Scatter(x=timestamplist, y=[0] * length, mode="markers", marker_size=20)
)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(
showgrid=False,
zeroline=True,
zerolinecolor="black",
zerolinewidth=3,
showticklabels=False,
)
fig.update_layout(height=200, plot_bgcolor="white", title="My Timeline Title")
fig.show()
And here is the result. Note that the x-axis has date markers as you wanted, and the annotation also appears when you hover the mouse pointer over a data point.
Previous/Old answer using matplotlib:
You can use matplotlib to plot the timeline. In order to place the marks correctly, we will need to convert it to timestamps.
However, you also want to view the date associated with each mark. To get around that, I suggest using the mplcursors library, which shows an annotation when you click on a data point.
You can probably use eventplot in matplotlib to plot this. Here's my rookie attempt:
import pandas
import dateutil.parser # For parsing date
import matplotlib.pyplot as plt # for plotting
import mplcursors # For clickable annotation
filedata = pandas.read_csv('test.csv') # Read database
datelist = filedata['date'].to_list()
start = dateutil.parser.parse(datelist[0]).timestamp() # Timestamp of the first date. PS: I am assuming that the data is sorted, and I'm taking the first element only.
# Normalize all timestamps by subtracting the first.
timestamplist = [round(dateutil.parser.parse(x).timestamp() - start) for x in datelist]
plt.title('My timeline plot') # Title of the plot
linept = plt.eventplot(timestamplist, orientation='horizontal') # Insert a line for every timestamp
x = mplcursors.cursor(linept) # Adds an annotation to every point
x.connect("add", lambda sel: sel.annotation.set_text(f'{datelist[timestamplist.index(round(sel.target[0]))]}')) # Display corresponding date
#plt.xticks(date(datelist))
plt.tick_params(labelleft=False, left=False) # Remove the Y axis on the left
plt.show() # Display plot
This gives such a plot:
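For what it's worth, the one-pixel-per-minute idea from the question can also be sketched with pandas and NumPy alone. This is only a rough sketch: the inline CSV is a shortened copy of the sample data, and `minutes_seen` is a made-up helper name, not part of any library.

```python
import io

import numpy as np
import pandas as pd

# Shortened copy of the sample data from the question.
csv_text = """date,last_activity
2021-11-03 07:39:14,160
2021-11-03 07:39:44,1594
2021-11-03 07:57:15,4270
2021-11-03 07:57:45,23201
2021-11-03 07:58:15,7
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])


def minutes_seen(dates):
    """Return a boolean array of length 1440: True where at least one
    timestamp falls into that minute of the day (the date part is ignored)."""
    minute_of_day = dates.dt.hour * 60 + dates.dt.minute
    seen = np.zeros(24 * 60, dtype=bool)
    seen[minute_of_day.to_numpy()] = True
    return seen


seen = minutes_seen(df["date"])
```

The array can then be drawn as a one-row image, e.g. with `plt.imshow(seen[np.newaxis, :].astype(int), aspect="auto", cmap="Reds")`, so that each column is one minute of the day.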
Related
So I have a dataframe with 3 columns: date, price, text
import pandas as pd
from datetime import datetime
import random
columns = ('dates','prices','text')
datelist = pd.date_range(datetime.today(), periods=5).tolist()
prices = []
for i in range(0, 5):
    prices.append(random.randint(50, 60))
text =['AAA','BBB','CCC','DDD','EEE']
df = pd.DataFrame({'dates': datelist, 'price':prices, 'text':text})
dates price text
0 2022-11-23 14:11:51.142574 51 AAA
1 2022-11-24 14:11:51.142574 57 BBB
2 2022-11-25 14:11:51.142574 52 CCC
3 2022-11-26 14:11:51.142574 51 DDD
4 2022-11-27 14:11:51.142574 59 EEE
I want to plot date and price on a line chart, but when I hover over the line I want it to show the text from the row corresponding to that date.
eg when I hover over the point corresponding to 2022-11-27 I want the text to show 'EEE'
I've tried a few things in matplotlib etc., but I can only get data from the x and y axes to show; I can't figure out how to show data from a different column.
You could use Plotly.
import plotly.graph_objects as go
fig = go.Figure(data=go.Scatter(x=df['dates'], y=df['price'], mode='lines+markers', text=df['text']))
fig.show()
Be aware that cursor and dataframe indexing will probably work well with points on a scatter plot, but a line plot is a little trickier to handle.
With a line plot, matplotlib draws a straight line between each pair of data points (basically linear interpolation), so some specific logic is needed to:
specify the intended behavior
implement the corresponding mouseover behavior when the cursor lands "between" 2 data points.
The lib/links below may provide tools to handle scatter plots and line plots, but I am not expert enough to point you to the specific part in either the SO link or the mplcursors link.
(besides, the exact intended behavior was not clearly stated in your initial question; consider editing/clarifying)
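To make the "between 2 data points" issue concrete, here is a rough sketch of one possible intended behavior: snap the cursor to the nearest data point and show that row's text. `nearest_row` is a hypothetical helper, the dataframe mimics the one in the question, and converting the cursor's x position into a timestamp is left out.

```python
import pandas as pd

# Small stand-in for the dataframe in the question.
df = pd.DataFrame({
    "dates": pd.date_range("2022-11-23", periods=5),
    "price": [51, 57, 52, 51, 59],
    "text": ["AAA", "BBB", "CCC", "DDD", "EEE"],
})


def nearest_row(frame, x_cursor):
    """Return the row whose date is closest to the cursor's x position."""
    deltas = (frame["dates"] - pd.Timestamp(x_cursor)).abs()
    return frame.loc[deltas.idxmin()]


# The cursor sits on the line segment between the 26th and the 27th.
row = nearest_row(df, "2022-11-26 20:00")
```

Other choices are just as valid (e.g. always taking the point to the left of the cursor); the point is that the behavior has to be decided explicitly.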
So, alternatively to DankyKang's answer, have a look at this SO question and answers that cover a large panel of possibilities for mouseover: How to add hovering annotations to a plot
A library worth noting is this one: https://mplcursors.readthedocs.io/en/stable/
Quoting:
mplcursors provides interactive data selection cursors for Matplotlib. It is inspired from mpldatacursor, with a much simplified API.
mplcursors requires Python 3, and Matplotlib≥3.1.
Specifically this example based on dataframes: https://mplcursors.readthedocs.io/en/stable/examples/dataframe.html
Quoting:
DataFrames can be used similarly to any other kind of input. Here, we generate a scatter plot using two columns and label the points using all columns.
This example also applies a shadow effect to the hover panel.
copy-pasta of code example, should this answer be considered not complete enough :
from matplotlib import pyplot as plt
from matplotlib.patheffects import withSimplePatchShadow
import mplcursors
from pandas import DataFrame
df = DataFrame(
dict(
Suburb=["Ames", "Somerset", "Sawyer"],
Area=[1023, 2093, 723],
SalePrice=[507500, 647000, 546999],
)
)
df.plot.scatter(x="Area", y="SalePrice", s=100)
def show_hover_panel(get_text_func=None):
    cursor = mplcursors.cursor(
        hover=2,  # Transient
        annotation_kwargs=dict(
            bbox=dict(
                boxstyle="square,pad=0.5",
                facecolor="white",
                edgecolor="#ddd",
                linewidth=0.5,
                path_effects=[withSimplePatchShadow(offset=(1.5, -1.5))],
            ),
            linespacing=1.5,
            arrowprops=None,
        ),
        highlight=True,
        highlight_kwargs=dict(linewidth=2),
    )
    if get_text_func:
        cursor.connect(
            event="add",
            func=lambda sel: sel.annotation.set_text(get_text_func(sel.index)),
        )
    return cursor
def on_add(index):
    item = df.iloc[index]
    parts = [
        f"Suburb: {item.Suburb}",
        f"Area: {item.Area:,.0f}m²",
        f"Sale price: ${item.SalePrice:,.0f}",
    ]
    return "\n".join(parts)
show_hover_panel(on_add)
plt.show()
I've a time series (typically energy usage) recorded over a range of days. Since usage tends to be different over the weekend I want to highlight the weekends.
I've done what seems sensible:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import random
#Create dummy data.
start=datetime.datetime(2022,10,22,0,0)
finish=datetime.datetime(2022,11,7,0,0)
def randomWalk():
    i=0
    while True:
        i=i+random.random()-0.5
        yield i
walk = randomWalk()  # create the generator once so that successive values accumulate
s = pd.Series({i: next(walk) for i in pd.date_range(start, finish,freq='h')})
# Plot it.
plt.figure(figsize=[12, 8]);
s.plot();
# Color the labels according to the day of week.
for label, day in zip(plt.gca().xaxis.get_ticklabels(which='minor'),
                      pd.date_range(start,finish,freq='d')):
    label.set_color('red' if day.weekday() > 4 else 'black')
But what I get is wrong. Two weekends appear one off, and the third doesn't show at all.
I've explored the 'label' objects, but their X coordinate is just an integer, and doesn't seem meaningful. Using DateFormatter just gives nonsense.
How would be best to fix this, please?
OK - since matplotlib only provides the information we need to the Tick Label Formatter functions, that's what we have to use:
import numpy as np
minorLabels=plt.gca().xaxis.get_ticklabels(which='minor')
majorLabels=plt.gca().xaxis.get_ticklabels(which='major')
def MinorFormatter(dateInMinutes, index):
    # Formatter: first param is value (date in minutes, would you believe), second is which item in order.
    day=pd.to_datetime(np.datetime64(int(dateInMinutes),'m'))
    minorLabels[index].set_color('red' if day.weekday()==6 else 'black') # Sunday
    return day.day
def MajorFormatter(dateInMinutes, index):
    day=pd.to_datetime(np.datetime64(int(dateInMinutes),'m'))
    majorLabels[index].set_color('red' if day.weekday()==6 else 'black') # Sunday
    return "" if (index==0 or index==len(majorLabels)-1) else day.strftime("%d\n%b\n%Y")
plt.gca().xaxis.set_minor_formatter(MinorFormatter)
plt.gca().xaxis.set_major_formatter(MajorFormatter)
Pretty clunky, but it works. Could be fragile, though - anyone got a better answer?
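One possibly less fragile alternative (a sketch of a different technique, not what the answer above does) is to leave the tick labels alone and compute the weekend intervals straight from the dates, then shade them. `weekend_spans` is a hypothetical helper; the dates are the ones from the dummy data.

```python
import pandas as pd


def weekend_spans(start, finish):
    """Return one (span_start, span_end) pair per Saturday or Sunday
    between start and finish (inclusive), each covering a full day."""
    days = pd.date_range(
        pd.Timestamp(start).normalize(),
        pd.Timestamp(finish).normalize(),
        freq="D",
    )
    weekend_days = days[days.weekday >= 5]  # 5 = Saturday, 6 = Sunday
    return [(day, day + pd.Timedelta(days=1)) for day in weekend_days]


spans = weekend_spans("2022-10-22", "2022-11-07")
```

Each pair could then be passed to matplotlib's `ax.axvspan(span_start, span_end, alpha=0.2)` on the axes holding the series, which does not depend on where the ticks happen to land.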
Matplotlib is meant for scientific use and although technically styling is possible, it's really hard and not worth the effort.
Consider using Plotly instead of Matplotlib as below:
#pip install plotly in terminal
import plotly.express as px
# read plotly express provided sample dataframe
df = px.data.tips()
# create plotly figure with color_discrete_map property specifying color per day
fig = px.bar(df, x="day", y="total_bill", color='day',
color_discrete_map={"Sat": "orange", "Sun": "orange", "Thur": "blue", "Fri": "blue"}
)
# send to browser
fig.show()
This solves your problem in far fewer lines. The only catch is that your data needs to be in a Pandas DataFrame with named columns, rather than a Series, so that you can pass the column names into plotly.express.bar or a scatter plot.
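Getting from a Series to such a DataFrame could look roughly like this. The column names `time`, `usage` and `day_type` are made up for illustration, and the data is a daily placeholder rather than the hourly random walk from the question.

```python
import numpy as np
import pandas as pd

# Placeholder series indexed by date (daily here for brevity).
index = pd.date_range("2022-10-22", "2022-11-07", freq="D")
s = pd.Series(np.arange(len(index), dtype=float), index=index)

# Name the values and the index, then turn the index into a regular column.
df = s.rename("usage").rename_axis("time").reset_index()

# Extra column that a plotting library can map to a colour.
df["day_type"] = np.where(df["time"].dt.weekday >= 5, "weekend", "weekday")
```

With that, something like `px.bar(df, x="time", y="usage", color="day_type")` should colour weekends differently from weekdays.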
I have a plotly graph of the EUR/JPY exchange rate across a few months in 15 minute time intervals, so as a result, there is no data from friday evenings to sunday evenings.
Here is a portion of the data, note the skip in the index (type: DatetimeIndex) over the weekend:
Plotting this data in plotly results in a gap over the missing dates. Using the dataframe above:
import plotly.graph_objs as go
candlesticks = go.Candlestick(x=data.index, open=data['Open'], high=data['High'],
low=data['Low'], close=data['Close'])
fig = go.Figure(layout=cf_layout)
fig.add_trace(trace=candlesticks)
fig.show()
Output:
As you can see, there are gaps where the missing dates are. One solution I've found online is to change the index to text using:
data.index = data.index.strftime("%d-%m-%Y %H:%M:%S")
and plotting it again, which admittedly does work, but has its own problem. The x-axis labels look atrocious:
I would like to produce a graph like the second plot, with no gaps, but with the x-axis displayed as it is in the first graph, or at least in a much more concise and responsive format, as close to the first graph as possible.
Thank you in advance for any help!
Even if some dates are missing in your dataset, plotly interprets your dates as date values, and shows even missing dates on your timeline. One solution is to grab the first and last dates, build a complete timeline, find out which dates are missing in your original dataset, and include those dates in:
fig.update_xaxes(rangebreaks=[dict(values=dt_breaks)])
This will turn this figure:
Into this:
Complete code:
import plotly.graph_objects as go
from datetime import datetime
import pandas as pd
import numpy as np
# sample data
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')
# remove some dates to build a similar case as in the question
df = df.drop(df.index[75:110])
df = df.drop(df.index[210:250])
df = df.drop(df.index[460:480])
# build complete timeline from start date to end date
dt_all = pd.date_range(start=df['Date'].iloc[0],end=df['Date'].iloc[-1])
# retrieve the dates that ARE in the original dataset
dt_obs = [d.strftime("%Y-%m-%d") for d in pd.to_datetime(df['Date'])]
# define dates with missing values
dt_breaks = [d for d in dt_all.strftime("%Y-%m-%d").tolist() if d not in dt_obs]
# make figure
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
open=df['AAPL.Open'], high=df['AAPL.High'],
low=df['AAPL.Low'], close=df['AAPL.Close'])
])
# hide dates with no values
fig.update_xaxes(rangebreaks=[dict(values=dt_breaks)])
fig.update_layout(yaxis_title='AAPL Stock')
fig.show()
Just in case someone here wants to remove the gaps for weekends and for hours outside trading hours:
as shown below, using rangebreaks is the way to do it.
fig = go.Figure(data=[go.Candlestick(x=df['date'], open=df['Open'], high=df['High'], low=df['Low'], close=df['Close'])])
fig.update_xaxes(
rangeslider_visible=True,
rangebreaks=[
# NOTE: Below values are bound (not single values), ie. hide x to y
dict(bounds=["sat", "mon"]), # hide weekends, eg. hide sat to before mon
dict(bounds=[16, 9.5], pattern="hour"), # hide hours outside of 9.30am-4pm
# dict(values=["2020-12-25", "2021-01-01"]) # hide holidays (Christmas and New Year's, etc)
]
)
fig.update_layout(
title='Stock Analysis',
yaxis_title=f'{symbol} Stock'
)
fig.show()
here's Plotly's doc.
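As a sanity check on those bounds, you could filter the data the same way in pandas and see which rows should stay visible. This is only a sketch on made-up 30-minute data; it mirrors the 9:30am-4pm and weekend rules above but does not touch Plotly at all.

```python
import pandas as pd

# Made-up 30-minute data from Friday 2021-01-08 to Monday 2021-01-11.
idx = pd.date_range("2021-01-08 08:00", "2021-01-11 17:00", freq="30min")
df = pd.DataFrame({"Close": range(len(idx))}, index=idx)

# Keep trading hours only (both end points are included by default) ...
in_hours = df.between_time("09:30", "16:00")
# ... and drop Saturdays and Sundays.
visible = in_hours[in_hours.index.weekday < 5]
```

If the chart still shows gaps after applying the rangebreaks, comparing it against such a filtered frame can help tell whether the bounds or the data are off.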
Thanks for the amazing sample! It works on daily data, but with intraday / 5min data the rangebreaks leave only one day on the chart:
# build complete timeline
dt_all = pd.date_range(start=df.index[0],end=df.index[-1], freq="5min")
# retrieve the dates that ARE in the original dataset
dt_obs = [d.strftime("%Y-%m-%d %H:%M:%S") for d in pd.to_datetime(df.index, format="%Y-%m-%d %H:%M:%S")]
# define dates with missing values
dt_breaks = [d for d in dt_all.strftime("%Y-%m-%d %H:%M:%S").tolist() if d not in dt_obs]
To fix the problem with intraday data, set the dvalue parameter of the rangebreak to the spacing of your data in milliseconds (it defaults to one day).
For example, 1 hour = 3.6e6 ms, so hourly data needs dvalue=3.6e6.
Documentation here : https://plotly.com/python/reference/layout/xaxis/
fig.update_xaxes(rangebreaks=[dict(values=dt_breaks, dvalue=3.6e6)])
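For the 5-minute case from the comment above, the same idea could be sketched like this. The timestamps are made up, and `DatetimeIndex.difference` is used here instead of the string comparison from the earlier answer.

```python
import pandas as pd

# Made-up 5-minute index with a gap between 10:10 and 10:30.
observed = pd.DatetimeIndex([
    "2021-03-01 10:00", "2021-03-01 10:05", "2021-03-01 10:10",
    "2021-03-01 10:30", "2021-03-01 10:35",
])
step = pd.Timedelta(minutes=5)

# Every slot a gap-free series would contain.
dt_all = pd.date_range(observed[0], observed[-1], freq=step)
# Slots that have no observation.
dt_breaks = dt_all.difference(observed)
# dvalue is given in milliseconds: 5 min * 60 s * 1000 ms.
dvalue_ms = step.total_seconds() * 1000
```

These would then go into `fig.update_xaxes(rangebreaks=[dict(values=dt_breaks.strftime("%Y-%m-%d %H:%M:%S").tolist(), dvalue=dvalue_ms)])`.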
I am trying to plot a graph with dates (pandas datetime) on the x axis. However, they are plotting in numerical format instead (showing up as exponents).
Example of dates:
0 2014-05-01
1 2014-05-02
2 2014-05-03
3 2014-05-04
4 2014-05-05
Name: date, dtype: datetime64[ns]
Code for plotly:
trace1 = go.Scatter(x = df_iso_h.date,
y=del18_f_hum,
mode = 'markers')
data = [trace1]
py.iplot(data)
My x-axis:
Not sure how to fix this??
You need to add a layout and specify the xaxis parameter in it. Such as here.
So try this:
# Create trace
trace1 = go.Scatter(x = df_iso_h.date,
y=del18_f_hum,
mode = 'markers')
# Add trace in data
data = [trace1]
# Create layout. With layout you can customize the plotly plot
layout = dict(title = 'Scatter',
# Tell plotly to treat the x-axis values as dates
xaxis = dict(type = 'date')
)
# Do not forget to add the layout to fig!
fig = dict(data=data, layout=layout)
# Plot scatter
py.iplot(fig, filename="scatterplot")
This should help you.
Update: Try converting the datetime column with strftime (the new column will have object dtype!):
df_iso_h["date"] = df_iso_h["date"].dt.strftime("%d-%m-%Y")
If that does not work, add this column in xaxis. Maybe plotly does not support the datetime format yyyy-mm-dd... Note that your x-axis will then look like 01-05-2014
Figured it out... Plotly does not take pandas datetime, so I had to convert my pandas datetime to python datetime.datetime or datetime.date.
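In case it helps anyone on an affected version, the conversion could look roughly like this (the sample dates are the ones from the question; `py_datetimes` and `py_dates` are just illustrative names):

```python
import datetime

import pandas as pd

dates = pd.Series(
    pd.to_datetime(["2014-05-01", "2014-05-02", "2014-05-03"]), name="date"
)

# Each element of a datetime64 Series is a pandas Timestamp;
# to_pydatetime() turns it into a plain datetime.datetime.
py_datetimes = [ts.to_pydatetime() for ts in dates]

# Or keep only the calendar date.
py_dates = [ts.date() for ts in dates]
```

Either list can be passed as `x` in place of the pandas column.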
It seems that this was a regression introduced in plotly.py Version 3.2.0 and has been fixed in Version 3.2.1
You can now simply pass the pandas datetime column to plotly and it will handle the proper conversion for you like in the past.
See https://github.com/plotly/plotly.py/issues/1160
I'm working on a project with loads of temperature data and I'm currently processing and plotting all of my data. However, I keep falling foul when I try to set x_lims on my plots between a time1 (9:00) and time2 (21:00)
Data background:
The sensor has collected data every second for two weeks and I've split the main data file into smaller daily files (e.g. dayX). Each day contains a timestamp (column = 'timeStamp') and a mean temperature (column = 'meanT').
The data for each day has been presliced just slightly over the window I want to plot (i.e. dayX contains data from 8:55:00 - 21:05:00). The dataset contains NaN values at some points as the sensors were not worn and data needed to be discarded.
Goal:
What I want to do is to be able to plot the dayX data between a set time interval (x_lim = 9:00 - 21:00). As I have many days of data, I eventually want to plot each day using the same x axis (I want them as separate figures however, not subplots), but each day has different gaps in the main data set, so I want to set constant x lims. As I have many different days of data, I'd rather not have to specify the date as well as the time.
Example data:
dayX =
timeStamp meanT
2018-05-10 08:55:00 NaN
. .
. .
. .
2018-05-10 18:20:00 32.4
. .
. .
. .
2018-05-10 21:05:00 32.0
What I've tried:
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import date2num, DateFormatter
dayX = pd.read_csv('path/to/file/dayX.csv')
dayX['timeStamp'] = pd.to_datetime(dayX['timeStamp'], format='%Y %m %d %H:%M:%S.%f')
fig, ax1 = plt.subplots(1,1)
ax1.plot(dayX['timeStamp'], dayX['meanT'])
ax1.xaxis.set_major_formatter(DateFormatter('%H:%M'))
ax1.set_xlim(pd.Timestamp('9:00'), pd.Timestamp('21:00'))
fig.autofmt_xdate()
plt.show()
Which gives:
If I remove the limit line however, the data plots okay, but the limits are automatically selected
# Get rid of this line:
ax1.set_xlim(pd.Timestamp('9:00'), pd.Timestamp('21:00'))
# Get this:
I'm really not sure why this is going wrong or what else I should be trying.
Your timeStamp column holds datetime objects, so all you have to do is pass full datetime objects (date and time) as the limits.
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.dates import date2num, DateFormatter
dayX = df
dayX['timeStamp'] = pd.to_datetime(dayX['timeStamp'], format='%Y-%m-%d %H:%M:%S')
fig, ax1 = plt.subplots(1,1)
ax1.plot(dayX['timeStamp'], dayX['meanT'])
ax1.xaxis.set_major_formatter(DateFormatter('%H:%M'))
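day = df['timeStamp'].min().normalize()  # midnight of the plotted day; replace(hour=9) alone would keep the 08:55 minutes
ax1.set_xlim(day + pd.Timedelta(hours=9), day + pd.Timedelta(hours=21))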
fig.autofmt_xdate()
plt.show()
Output:
You probably need to construct a full timestamp object, since pd.Timestamp('9:00') defaults to today's date, which has no data in your case. The following snippet should replace the ax1.set_xlim line in your code, and should also work for starting and ending multi-day time ranges on specific hours of your choosing.
min_h = 9 # hours
max_h = 21 # hours
start = dayX['timeStamp'].min()
end = dayX['timeStamp'].max()
xmin = pd.Timestamp(year=start.year, month=start.month, day=start.day, hour=min_h)
xmax = pd.Timestamp(year=end.year, month=end.month, day=end.day, hour=max_h)
ax1.set_xlim(xmin, xmax)
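Since you have many daily files, the same logic could be wrapped in a small function so you never have to type the date. This is a rough sketch: `day_window` is a hypothetical helper, and it assumes each file covers a single day.

```python
import pandas as pd


def day_window(timestamps, start_hour=9, end_hour=21):
    """Return (xmin, xmax) on the calendar day of the earliest timestamp."""
    day = timestamps.min().normalize()  # midnight of that day
    return day + pd.Timedelta(hours=start_hour), day + pd.Timedelta(hours=end_hour)


# A few rows shaped like the example data in the question.
stamps = pd.Series(pd.to_datetime([
    "2018-05-10 08:55:00", "2018-05-10 18:20:00", "2018-05-10 21:05:00",
]))
xmin, xmax = day_window(stamps)

# For comparison: a time-only string is filled in with the current date,
# which is why the original limits ended up far away from the data.
time_only = pd.Timestamp("9:00")
```

Calling `ax1.set_xlim(*day_window(dayX['timeStamp']))` for each day should then give every figure the same 9:00 to 21:00 window.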