I am trying to plot a graph with dates (pandas datetime) on the x axis. However, they are plotting in numerical format instead (showing up as exponents).
Example of dates:
0 2014-05-01
1 2014-05-02
2 2014-05-03
3 2014-05-04
4 2014-05-05
Name: date, dtype: datetime64[ns]
Code for plotly:
trace1 = go.Scatter(x = df_iso_h.date,
y=del18_f_hum,
mode = 'markers')
data = [trace1]
py.iplot(data)
My x-axis:
Not sure how to fix this??
You need to add a layout and specify the xaxis parameter in it.
So try this:
# Create trace
trace1 = go.Scatter(x = df_iso_h.date,
y=del18_f_hum,
mode = 'markers')
# Add trace in data
data = [trace1]
# Create layout. With a layout you can customize the plotly plot
layout = dict(title = 'Scatter',
              # Tell plotly to treat the x axis as a date axis
              xaxis = dict(type = 'date')
              )
# Do not forget to add the layout to fig!
fig = dict(data=data, layout=layout)
# Plot scatter (pass fig, not data, so the layout is used)
py.iplot(fig, filename="scatterplot")
This should help you.
Update: Try converting the datetime column with strftime (the new column will have object dtype!):
df_iso_h["date"] = df_iso_h["date"].dt.strftime("%d-%m-%Y")
If that does not work, use this column for the x-axis. Maybe plotly does not support the yyyy-mm-dd datetime format. Note that your x-axis labels will then look like 01-05-2014.
Figured it out... Plotly does not take pandas datetime, so I had to convert my pandas datetime to python datetime.datetime or datetime.date.
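A minimal sketch of that conversion, assuming a datetime64[ns] column like the one in the question. The frame and its values are made up for illustration; only the column name comes from the question.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for df_iso_h from the question.
df_iso_h = pd.DataFrame(
    {"date": pd.to_datetime(["2014-05-01", "2014-05-02", "2014-05-03"])}
)

# Convert each pandas Timestamp to a plain datetime.datetime ...
py_datetimes = [ts.to_pydatetime() for ts in df_iso_h["date"]]

# ... or to a datetime.date when the time of day does not matter.
py_dates = [ts.date() for ts in df_iso_h["date"]]

print(type(py_datetimes[0]), py_dates[0])
```

Either list can then be passed as x to go.Scatter in place of the pandas column.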
It seems that this was a regression introduced in plotly.py Version 3.2.0 and has been fixed in Version 3.2.1
You can now simply pass the pandas datetime column to plotly and it will handle the proper conversion for you like in the past.
See https://github.com/plotly/plotly.py/issues/1160
I have a CSV which was generated with this script which looks like this:
date,last_activity
2021-11-03 07:39:14,160
2021-11-03 07:39:44,1594
2021-11-03 07:57:15,4270
2021-11-03 07:57:45,23201
2021-11-03 07:58:15,7
2021-11-03 07:58:45,1015
2021-11-03 07:59:15,2
2021-11-03 07:59:45,3496
2021-11-03 08:28:16,6093
2021-11-03 08:28:46,5513
2021-11-03 08:31:46,16639
I would like to visualize those timestamps as an "activity bar", e.g. like this:
Hence:
The x-axis should show the time / the date
I want to be able to add a title over it
The red stripes indicate when there is a date timestamp.
To make it simpler, the last_activity could be ignored.
The simplest solution I can imagine would be to use one pixel per minute of the day. I can round 2021-11-03 07:39:14 to 2021-11-03 07:39 and just say "I've seen a timestamp for 7:39 -> color that pixel". However, I would only know how to do this directly with matplotlib (pixel-by-pixel). Is there a simpler way with Pandas?
EDIT: I just checked out plotly and it seems to be built well enough to handle your exact problem. The solution using this library completely knocks out my previous attempt with matplotlib in my opinion.
plotly supports using date (so no workaround using timestamps required), and has hover-annotation as well. Here is the code:
import pandas
import dateutil
import plotly.graph_objects as go
# Load and transform the data
filedata = pandas.read_csv("test.csv")
datelist = filedata["date"].to_list()
timestamplist = [dateutil.parser.parse(x) for x in datelist]
length = len(datelist)
# Create the figure
fig = go.Figure()
fig.add_trace(
go.Scatter(x=timestamplist, y=[0] * length, mode="markers", marker_size=20)
)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(
showgrid=False,
zeroline=True,
zerolinecolor="black",
zerolinewidth=3,
showticklabels=False,
)
fig.update_layout(height=200, plot_bgcolor="white", title="My Timeline Title")
fig.show()
And here is the result. Note that X-axis has date markers as you wanted, and the annotation also appears on hovering the mouse pointer over the data points.
Previous/Old answer using matplotlib:
You can use matplotlib to plot the timeline. In order to place the marks correctly, we will need to convert it to timestamps.
However, you also want to view the date associated with each mark. To get around that, I suggest the mplcursors library, which shows an annotation when you click on a data point.
You can probably use eventplot in matplotlib to plot this. Here's my rookie attempt:
import pandas
import dateutil.parser # For parsing date
import matplotlib.pyplot as plt # for plotting
import mplcursors # For clickable annotation
filedata = pandas.read_csv('test.csv') # Read database
datelist = filedata['date'].to_list()
start = dateutil.parser.parse(datelist[0]).timestamp() # Timestamp of the first date. PS: I am assuming that the data is sorted, and I'm taking the first element only.
# Normalize all timestamps by subtracting the first.
timestamplist = [round(dateutil.parser.parse(x).timestamp() - start) for x in datelist]
plt.title('My timeline plot') # Title of the plot
linept = plt.eventplot(timestamplist, orientation='horizontal') # Insert a line for every timestamp
x = mplcursors.cursor(linept) # Adds annotation to every point
x.connect("add", lambda sel: sel.annotation.set_text(f'{datelist[timestamplist.index(round(sel.target[0]))]}')) # Display corresponding date
#plt.xticks(date(datelist))
plt.tick_params(labelleft=False, left=False) # Remove the Y axis on the left
plt.show() # Display plot
This gives such a plot:
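For the "one slot per minute" idea from the question, the rounding step can be done in pandas directly. This is only a sketch under assumptions: the CSV text is a shortened copy of the sample data, and the names minutes, slots and active are made up here.

```python
import io
import pandas as pd

# Shortened copy of the sample CSV from the question.
csv_text = """date,last_activity
2021-11-03 07:39:14,160
2021-11-03 07:39:44,1594
2021-11-03 07:57:15,4270
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Round every timestamp down to its minute and drop repeats.
minutes = df["date"].dt.floor("min").drop_duplicates()

# One slot per minute of that day; True where a timestamp was seen.
day = minutes.iloc[0].normalize()
slots = pd.date_range(day, periods=24 * 60, freq="min")
active = pd.Series(slots.isin(minutes), index=slots)

print(active[active].index.tolist())
```

The boolean series active has one entry per minute, so it can be drawn as a strip with whichever plotting library you prefer.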
Problem: I am trying to make a very simple bar chart in Matplotlib of a Pandas DataFrame. The DateTime index is causing confusion, however: Matplotlib does not appear to understand the Pandas DateTime, and is labeling the years incorrectly. How can I fix this?
Code
# Make date time series
index_dates = pd.date_range('2018-01-01', '2021-01-01')
# Make data frame with some random data, using the date time index
df = pd.DataFrame(index=index_dates,
data = np.random.rand(len(index_dates)),
columns=['Data'])
# Make a bar chart in Matplotlib
fig, ax = plt.subplots(figsize=(12,8))
df.plot.bar(ax=ax)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
Instead of showing up as 2018-2021, however, the years show up as 1970 - 1973.
I've already looked at the answers here, here, and documentation here. I know the date timeindex is in fact a datetime index because when I call df.info() it shows it as a datetime index, and when I call index_dates[0].year it returns 2018. How can I fix this? Thank you!
The problem is with mixing df.plot.bar and matplotlib here.
df.plot.bar sets tick locations starting from 0 (and assigns labels), while matplotlib.dates expects the locations to be the number of days since 1970-01-01 (more info here).
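The arithmetic behind the 1970 - 1973 labels can be checked with the standard library alone. The sketch assumes the bars sit at integer positions 0 to N-1 and that those positions are read as days since 1970-01-01, as described above.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # assumed epoch: "days since 1970-01-01"

# The question's index has one row per day from 2018-01-01 to 2021-01-01.
n_bars = (date(2021, 1, 1) - date(2018, 1, 1)).days + 1

# Bar positions 0 .. n_bars - 1, misread as day counts since the epoch.
first_label = EPOCH + timedelta(days=0)
last_label = EPOCH + timedelta(days=n_bars - 1)

print(n_bars, first_label, last_label)  # 1097 1970-01-01 1973-01-01
```

So three years of daily bars span exactly 1970 to 1973 when their positions are interpreted as dates.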
If you do it with matplotlib directly, it shows labels correctly:
# Make a bar chart in Matplotlib
fig, ax = plt.subplots(figsize=(12,8))
plt.bar(x=df.index, height=df['Data'])
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
Output:
I'm using plotly to create a stacked bar chart, with each bar representing a quarter end date. The data is pulled into a dataframe via SQL and the dates are parsed in the read_sql statement.
When graphing, the dates on the x-axis are displayed as 10/01 instead of 9/30, 4/1 instead of 3/31, etc.
Any idea how I can just display the dates correctly?
Here's a sample
import plotly.express as px
fig = px.bar(df.groupby('dt_quarter').head(10), x='dt_quarter', y="amount", color="name", title="Stack Bar Test")
fig.update_layout(yaxis_title_text = 'Amount ($)',xaxis_title_text='Date', legend_title_text='Sector', legend_traceorder='reversed')
fig.show()
What I ended up doing was creating a new column in my dataframe that displays the date as a quarter label (the code below produces strings such as 2020Q4). I then used that as my x-axis and it works fine.
For creating the new column:
alldata['quarter'] = pd.PeriodIndex(alldata.dt_quarter, freq='Q').astype('str')
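A small sketch of what that line produces, using a made-up frame with three quarter-end dates. The quarter_label column is an assumption added here to show how a 'Q42020'-style label could be built with Period strftime.

```python
import pandas as pd

# Hypothetical data: three quarter-end dates.
alldata = pd.DataFrame(
    {"dt_quarter": pd.to_datetime(["2020-09-30", "2020-12-31", "2021-03-31"])}
)

periods = pd.PeriodIndex(alldata["dt_quarter"], freq="Q")

# Default string form of a quarterly period: year first, then quarter.
alldata["quarter"] = periods.astype(str)

# Custom label with the quarter first (%q is the quarter number).
alldata["quarter_label"] = periods.strftime("Q%q%Y")

print(alldata["quarter"].tolist())        # ['2020Q3', '2020Q4', '2021Q1']
print(alldata["quarter_label"].tolist())  # ['Q32020', 'Q42020', 'Q12021']
```

Because these labels are plain strings, plotly treats the axis as categorical and no longer shifts the tick to the next day.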
I have a plotly graph of the EUR/JPY exchange rate across a few months in 15-minute intervals, so there is no data from Friday evenings to Sunday evenings.
Here is a portion of the data, note the skip in the index (type: DatetimeIndex) over the weekend:
Plotting this data in plotly results in a gap over the missing dates. Using the dataframe above:
import plotly.graph_objs as go
candlesticks = go.Candlestick(x=data.index, open=data['Open'], high=data['High'],
low=data['Low'], close=data['Close'])
fig = go.Figure(layout=cf_layout)
fig.add_trace(trace=candlesticks)
fig.show()
Output:
As you can see, there are gaps where the missing dates are. One solution I've found online is to change the index to text using:
data.index = data.index.strftime("%d-%m-%Y %H:%M:%S")
and plotting it again, which admittedly does work, but has its own problem. The x-axis labels look atrocious:
I would like to produce a graph like the second plot, with no gaps, but with the x-axis displayed as in the first graph, or at least in a much more concise and responsive format, as close to the first graph as possible.
Thank you in advance for any help!
Even if some dates are missing in your dataset, plotly interprets your dates as date values, and shows even missing dates on your timeline. One solution is to grab the first and last dates, build a complete timeline, find out which dates are missing in your original dataset, and include those dates in:
fig.update_xaxes(rangebreaks=[dict(values=dt_breaks)])
This will turn this figure:
Into this:
Complete code:
import plotly.graph_objects as go
from datetime import datetime
import pandas as pd
import numpy as np
# sample data
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')
# remove some dates to build a similar case as in the question
df = df.drop(df.index[75:110])
df = df.drop(df.index[210:250])
df = df.drop(df.index[460:480])
# build complete timeline from start date to end date
dt_all = pd.date_range(start=df['Date'].iloc[0],end=df['Date'].iloc[-1])
# retrieve the dates that ARE in the original dataset
dt_obs = [d.strftime("%Y-%m-%d") for d in pd.to_datetime(df['Date'])]
# define dates with missing values
dt_breaks = [d for d in dt_all.strftime("%Y-%m-%d").tolist() if d not in dt_obs]
# make figure
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
open=df['AAPL.Open'], high=df['AAPL.High'],
low=df['AAPL.Low'], close=df['AAPL.Close'])
])
# hide dates with no values
fig.update_xaxes(rangebreaks=[dict(values=dt_breaks)])
fig.update_layout(yaxis_title='AAPL Stock')
fig.show()
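The missing-date step can also be written as a set difference on the DatetimeIndex, which avoids the string comparison. This is an alternative sketch with made-up dates, not the code used above.

```python
import pandas as pd

# Hypothetical observed dates with a two-day hole in the middle.
observed = pd.to_datetime(["2017-01-02", "2017-01-03", "2017-01-06"])

# Complete daily timeline between the first and last observation.
dt_all = pd.date_range(start=observed.min(), end=observed.max())

# Dates on the timeline that have no observation.
dt_breaks = dt_all.difference(observed).strftime("%Y-%m-%d").tolist()

print(dt_breaks)  # ['2017-01-04', '2017-01-05']
```

The resulting list can be passed to rangebreaks=[dict(values=dt_breaks)] in the same way.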
In case someone here wants to remove the gaps for weekends and for hours outside trading, using rangebreaks is the way to do it, as shown below.
fig = go.Figure(data=[go.Candlestick(x=df['date'], open=df['Open'], high=df['High'], low=df['Low'], close=df['Close'])])
fig.update_xaxes(
rangeslider_visible=True,
rangebreaks=[
# NOTE: Below values are bound (not single values), ie. hide x to y
dict(bounds=["sat", "mon"]), # hide weekends, eg. hide sat to before mon
dict(bounds=[16, 9.5], pattern="hour"), # hide hours outside of 9.30am-4pm
# dict(values=["2020-12-25", "2021-01-01"]) # hide holidays (Christmas and New Year's, etc)
]
)
fig.update_layout(
title='Stock Analysis',
yaxis_title=f'{symbol} Stock'
)
fig.show()
here's Plotly's doc.
Thanks for the amazing sample! It works on daily data, but with intraday / 5-minute data the rangebreaks leave only one day on the chart.
# build complete timeline
dt_all = pd.date_range(start=df.index[0],end=df.index[-1], freq="5min")
# retrieve the dates that ARE in the original dataset
dt_obs = [d.strftime("%Y-%m-%d %H:%M:%S") for d in pd.to_datetime(df.index, format="%Y-%m-%d %H:%M:%S")]
# define dates with missing values
dt_breaks = [d for d in dt_all.strftime("%Y-%m-%d %H:%M:%S").tolist() if d not in dt_obs]
To fix the problem with intraday data, you can use the dvalue parameter of rangebreaks with the right value in milliseconds.
For example, 1 hour = 3.6e6 ms, so for hourly data use dvalue with this value.
Documentation here : https://plotly.com/python/reference/layout/xaxis/
fig.update_xaxes(rangebreaks=[dict(values=dt_breaks, dvalue=3.6e6)])
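The millisecond value for other bar sizes can be computed instead of typed by hand. A minimal sketch, assuming dvalue should equal the spacing between bars; the helper name dvalue_ms is made up here.

```python
import pandas as pd

def dvalue_ms(bar_size: str) -> float:
    """Length of one bar in milliseconds (hypothetical helper)."""
    return pd.Timedelta(bar_size) / pd.Timedelta(milliseconds=1)

print(dvalue_ms("1h"))    # 3600000.0
print(dvalue_ms("5min"))  # 300000.0
```

For the 5-minute data in the question above, that would mean dvalue=300000 rather than 3.6e6.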
I am plotting a large dataset from a database using matplotlib, and I use mpld3 to pass the figure to the browser. On the x-axis there are dates. The issue is that while plotting without mpld3 works perfectly, the dates don't appear correctly when I use it.
Here is my code:
date1 = '2015-04-22 20:28:50'
date2 = '2015-04-23 19:42:09'
db = Base('monitor').open()
result_set = db.select(['MeanVoltage','time'],"time>=start and time<=stop", start=date1, stop=date2)
V = [float(record.MeanVoltage) for record in result_set if record != 0]
Date = [str(record.time) for record in result_set]
dates = [datetime.datetime.strptime(record, '%Y-%m-%d %H:%M:%S') for record in Date]
dates = matplotlib.dates.date2num(dates)
fig, ax = plt.subplots()
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S' ))
plt.gcf().autofmt_xdate()
ax.plot(dates,V)
#mpld3.fig_to_html(fig)
#mpld3.show(fig)
plt.show()
that shows the plot perfectly like this:
Now, if I comment out this line only:
plt.show()
and uncomment these two:
mpld3.fig_to_html(fig)
mpld3.show(fig)
the figure appears in the browser like this:
As you can see, the only issue is how the dates appear in the x-axis.
Is there any way to overcome it?
Before creating the HTML figure, add the following line to specify that it is a date axis:
ax.xaxis_date()
The answer above is correct.
If you are passing only dates, for example
df["Date"][0] = "2018-11-23"
then you can also pass them in matplotlib's native datetime format, as below, without converting to an ordinal value with date2num.
df["Date"] = [dt.datetime.strptime(d, '%Y-%m-%d') for d in df["Date"]]
ax.plot(df["Date"].tolist(), some_y_value_list)
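The same parsing can be done for the whole column at once with pd.to_datetime. A sketch with a made-up two-row frame; the resulting values are pandas Timestamps, which are datetime.datetime subclasses.

```python
import datetime as dt
import pandas as pd

# Hypothetical frame with date strings, as in the example above.
df = pd.DataFrame({"Date": ["2018-11-23", "2018-11-24"]})

# Parse the whole column in one call instead of a list comprehension.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

x_values = df["Date"].tolist()
print(x_values[0], isinstance(x_values[0], dt.datetime))
```

x_values can then be handed to ax.plot in place of the list built with strptime.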