I am plotting a large dataset from a database using matplotlib, and I use mpld3 to pass the figure to the browser. The x-axis holds dates. The issue is that plotting without mpld3 works perfectly, but when I use it, the dates don't appear correctly.
Here is my code:
date1 = '2015-04-22 20:28:50'
date2 = '2015-04-23 19:42:09'
db = Base('monitor').open()
result_set = db.select(['MeanVoltage','time'],"time>=start and time<=stop", start=date1, stop=date2)
V = [float(record.MeanVoltage) for record in result_set if record != 0]
Date = [str(record.time) for record in result_set]
dates = [datetime.datetime.strptime(record, '%Y-%m-%d %H:%M:%S') for record in Date]
dates = matplotlib.dates.date2num(dates)
fig, ax = plt.subplots()
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S' ))
plt.gcf().autofmt_xdate()
ax.plot(dates,V)
#mpld3.fig_to_html(fig)
#mpld3.show(fig)
plt.show()
that shows the plot perfectly like this:
Now, if I comment out this line only:
plt.show()
and uncomment these two:
mpld3.fig_to_html(fig)
mpld3.show(fig)
the figure appears in the browser like this:
As you can see, the only issue is how the dates appear in the x-axis.
Is there any way to overcome it?
Before creating the HTML figure, add the following line to specify that it is a date axis:
ax.xaxis_date()
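A minimal sketch of where that call fits, assuming made-up voltage readings in place of the database query and the headless Agg backend so it runs without a display. The names `times` and `values` are illustrative only; the resulting `fig` is what would be handed to `mpld3.fig_to_html(fig)` as in the question.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical sample data standing in for the database result set
start = datetime.datetime(2015, 4, 22, 20, 28, 50)
times = [start + datetime.timedelta(minutes=10 * i) for i in range(6)]
values = [230.1, 229.8, 230.4, 231.0, 230.7, 230.2]

fig, ax = plt.subplots()
ax.plot(mdates.date2num(times), values)  # x values are plain floats here

# Mark the x-axis as a date axis before the figure is exported
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S'))
fig.autofmt_xdate()
```

After `ax.xaxis_date()` the axis carries a date locator rather than a plain numeric one, which is the information an exporter needs to treat the floats as dates.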
The answer above is correct.
If you are exclusively passing through dates, for example
df["Date"][0] = "2018-11-23"
Then you can also pass the values through as datetime objects, which matplotlib handles natively, without converting them to ordinal values with date2num.
df["Date"] = [dt.datetime.strptime(d, '%Y-%m-%d') for d in df["Date"]]
ax.plot(df["Date"].tolist(), some_y_value_list)
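The parsing step on its own can be sketched with the standard library only. The list `raw_dates` is a hypothetical stand-in for the `Date` column; only the first value appears in the post above.

```python
import datetime as dt

# Hypothetical date strings in the same layout as the example value
raw_dates = ["2018-11-23", "2018-11-24", "2018-11-25"]

# Each string becomes a datetime.datetime at midnight of that day
parsed = [dt.datetime.strptime(d, "%Y-%m-%d") for d in raw_dates]
```

A list like `parsed` is the kind of object that can be passed straight to `ax.plot` as x values.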
I tried many things, but the X-axis on plotly, which contains times in the format of HH:MM:SS in a pandas dataframe doesn't want to change.
I tried the following:
Trying to change the datatype of the dataframe column to pydatetime
adding the pydates to a list and applying strftime with '%Y-%m-%d %H:%M:%S' (to convert it later) and '%H:%M:%S'
tried the basic stuff with tickformat="%H:%M" in fig.update_xaxes - Method
and tickformat='%H:%M' in fig.update_layout - Method (also with dictionary) for xaxis parameter
also tried this: fig.layout['xaxis_tickformat'] = "%H:%M"
also tried to apply strftime with "%H:%M" to the dates, but then the values seem to be aggregated (or only one value per minute is picked)
The results vary: sometimes all data points disappear (they end up in the left or right corner), and when they do not disappear, the x-axis shows values such as 15:23:01.
Below is a code snippet of my method:
pd.options.plotting.backend = "plotly"
dates = pd.to_datetime(dataframe.Time, format='%H:%M:%S')
dates = dates.dt.to_pydatetime()
datelist = []
for date in dates:
    date = datetime.datetime.strftime(date, '%Y-%m-%d %H:%M:%S')
    datelist.append(date)
# dates['Time'] = pd.Series(dates, dtype=object)
# dates = dates.apply(lambda x: x.strftime('%H:%M:%S'))
print(dataframe.Time)
print(dates)
df = dataframe
# also tried here with x = dataframe.Time with same results
fig = px.line(df, x=datelist, y=["Foo1", "Foo2"], title='Foo')
# changing X-axis ticks to 10 minutes
fig.update_xaxes(tickangle=45,
                 # type="date",
                 tickmode='array',
                 tickformat="%H:%M",
                 tickvals=df["Time"][0::600])
fig.update_layout(
    title="Foo", title_x=0.5,
    xaxis_title="Time",
    xaxis=dict(
        tickformat='%H:%M'
    ),
    # xaxis_tickformat = "%H:%M",
    yaxis_title="<i>g</i>",
    legend_title="Sensor",
    font=dict(
        family="Courier New, monospace",
        size=50,
        color="RebeccaPurple"
    )
)
# fig.layout['xaxis_tickformat'] = "%H:%M"
fig.show()
I hope you can help me as I followed the instructions on the plotly website and googled, but plotly stuff seems to be rare.
Thanks in advance; I can provide more information if needed.
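One detail that may be worth checking when debugging an axis like this: parsing a time-only string with '%H:%M:%S' yields a full datetime with a default date attached, so the x values and any tick values need to be in the same form to line up. Whether this explains the behaviour described above is an assumption, not something the post confirms. A standard-library sketch, using the sample time from the post:

```python
import datetime

# A time-only string, parsed with the format used in the question
parsed = datetime.datetime.strptime("15:23:01", "%H:%M:%S")

# strptime fills in the missing date fields with 1900-01-01
default_date = parsed.date()

# Formatting back with "%H:%M" drops the seconds
label = parsed.strftime("%H:%M")
```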
I am new to Python and learning data visualization using matplotlib.
I am trying to plot Date/Time vs Values using matplotlib from this CSV file:
https://drive.google.com/file/d/1ex2sElpsXhxfKXA4ZbFk30aBrmb6-Y3I/view?usp=sharing
Following is the code snippet which I have been playing around with:
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
plt.style.use('seaborn')
years = mdates.YearLocator()
months = mdates.MonthLocator()
days = mdates.DayLocator()
hours = mdates.HourLocator()
minutes = mdates.MinuteLocator()
years_fmt = mdates.DateFormatter('%H:%M')
data = pd.read_csv('datafile.csv')
data.sort_values('Date/Time', inplace=True)
fig, ax = plt.subplots()
ax.plot('Date/Time', 'Discharge', data=data)
# format the ticks
ax.xaxis.set_major_locator(minutes)
ax.xaxis.set_major_formatter(years_fmt)
ax.xaxis.set_minor_locator(hours)
datemin = min(data['Date/Time'])
datemax = max(data['Date/Time'])
ax.set_xlim(datemin, datemax)
ax.format_xdata = mdates.DateFormatter('%Y.%m.%d %H:%M')
ax.format_ydata = lambda x: '%1.2f' % x # format the price.
ax.grid(True)
fig.autofmt_xdate()
plt.show()
The code is plotting the graph but it is not labeling the X-Axis and also giving some unknown values (on mouse over) for x on the bottom right corner as shown in the below screenshot:
Screenshot of matplotlib figure window
Can someone please suggest what changes are needed to plot the x-axis dates and also make the correct values appear when I move the cursor over the graph?
Thanks
I haven't used matplotlib. Instead I used pandas plotting
import pandas as pd
data = pd.read_csv('datafile.csv')
data["Date/Time"] = pd.to_datetime(data["Date/Time"], format="%d.%m.%Y %H:%M")
data.sort_values('Date/Time', inplace=True)
ax = data.plot.line(x='Date/Time', y='Discharge')
Here, you need to convert the Date/Time column to the pandas datetime type before sorting and plotting it; otherwise the values are sorted as strings.
The main issue is that the date formats are mixed up: your data uses '%d.%m.%Y %H:%M', but you set '%Y.%m.%d %H:%M', which is why you saw 'rubbish' values in the x tick labels. In any case, you can cut the code down considerably if you convert your Date/Time column to timestamps, i.e.:
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
plt.style.use('seaborn')
data = pd.read_csv('datafile.csv')
data["Date/Time"] = pd.to_datetime(data["Date/Time"], format="%d.%m.%Y %H:%M")
data.sort_values('Date/Time', inplace=True)
fig, ax = plt.subplots()
ax.plot('Date/Time', 'Discharge', data=data)
ax.format_xdata = mdates.DateFormatter('%Y.%m.%d %H:%M')
ax.tick_params(axis='x', rotation=45)
ax.grid(True)
fig.autofmt_xdate()
plt.show()
Note that the format of labels in the plot will depend on the zoom level, so you will need to enlarge a portion of the graph to see hours and minutes in the tick labels, but the cursor locator on the bottom bar of the window should be always displaying the detailed timestamp under the cursor.
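The difference between the two format strings mentioned in this answer can be checked with the standard library alone. This is only a sketch: `sample` is a hypothetical value in the day.month.year layout, not a row from the linked CSV.

```python
import datetime

sample = "03.04.2019 10:30"  # hypothetical value, day first

# The day-first format parses the string as 3 April 2019
parsed = datetime.datetime.strptime(sample, "%d.%m.%Y %H:%M")

# The year-first format does not match this layout at all
try:
    datetime.datetime.strptime(sample, "%Y.%m.%d %H:%M")
    wrong_format_accepted = True
except ValueError:
    wrong_format_accepted = False
```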
I wonder if it's possible to change the measurement milestones for graphs created by pandas. In my code the X-axis stands for time and is measured by month, but the measurement milestones are all over the place.
In the image below, the milestones for the X-axis are 2012M01, 2012M06, 2012M11, 2013M04 and 2013M09.
Is there any way I can choose how long the distance should be between every milestone? For example, to make it so it shows every year or every half year?
This is the code I used for the function making the graph:
def graph(dataframe):
    graph = dataframe[["Profit"]].plot()
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    plt.grid(True)
    plt.show()
The actual dataframe is just an excel-file with a bunch of months and monetary values in it.
I think the most straight forward is to use matplotlib.dates to format the axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def graph(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m') #see https://strftime.org/
    major = mdates.MonthLocator([1,7]) #label only Jan and Jul
    graph = dataframe[["Profit"]].plot(ax=ax) #link plot to the existing axes
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    graph.xaxis.set_major_locator(major) #set major locator tick on x-axis
    graph.xaxis.set_major_formatter(xfmt) #format xtick label
    plt.grid(True)
    plt.show()
But a key point is that your dates need to be Python's built-in datetime.date (not datetime.datetime). If your dates are str or a different type of datetime, you will need to convert them first; there are many resources on SO and elsewhere for doing this. For example:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
dr = [pd.to_datetime(date).date() for date in dr] #explicitly converting to datetime.date with .date()
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
type(df.index[0])
Out[0]:
datetime.date
Calling graph(df) using the example above gets this plot:
Just to expand on this, here's what happens when the index is pandas.Timestamp instead of datetime.date:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in df.index] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
graph(df)
Out[0]:
The x-axis is improperly formatted:
However, if you are willing to just create the plot directly through matplotlib, rather than pandas (pandas is using matplotlib anyway), this can handle more types of dates:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in df.index] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
def graph_2(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m')
    major = mdates.MonthLocator([1,7])
    ax.plot(dataframe.index,dataframe['Profit'], label='Profit')
    ax.set_title('Statistics')
    ax.set_ylabel('Thousand $')
    ax.set_xlabel('Time')
    ax.xaxis.set_major_locator(major)
    ax.xaxis.set_major_formatter(xfmt)
    ax.legend() #legend needs to be added
    plt.grid(True)
    plt.show()
graph_2(df)
type(df.index[0])
Out[0]:
pandas._libs.tslibs.timestamps.Timestamp
And here is the working graph:
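The index conversion this answer relies on can be looked at in isolation. A small sketch, assuming the same monthly range as the examples; the names `timestamps` and `as_dates` are illustrative only.

```python
import datetime

import pandas as pd

# Month-start timestamps from January 2012 to January 2014 inclusive
timestamps = pd.date_range("2012-01-01", "2014-01-01", freq="MS")

# pandas.Timestamp -> built-in datetime.date, one element at a time
as_dates = [ts.date() for ts in timestamps]
```

The range holds 25 entries, which matches the 25 random values used for the Profit column.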
I am trying to plot a graph with dates (pandas datetime) on the x axis. However, they are plotting in numerical format instead (showing up as exponents).
Example of dates:
0 2014-05-01
1 2014-05-02
2 2014-05-03
3 2014-05-04
4 2014-05-05
Name: date, dtype: datetime64[ns]
Code for plotly:
trace1 = go.Scatter(x = df_iso_h.date,
y=del18_f_hum,
mode = 'markers')
data = [trace1]
py.iplot(data)
My x-axis:
Not sure how to fix this??
You need to add a layout and specify the xaxis parameter in it.
So try this:
# Create trace
trace1 = go.Scatter(x = df_iso_h.date,
y=del18_f_hum,
mode = 'markers')
# Add trace in data
data = [trace1]
# Create layout. With layout you can customize plotly plot
layout = dict(title = 'Scatter',
              # Tell plotly to treat the x-axis values as dates
              xaxis = dict(type = 'date')
              )
# Do not forget to add the layout to fig!
fig = dict(data=data, layout=layout)
# Plot scatter
py.iplot(fig, filename="scatterplot")
This should help you.
Update: Try converting the datetime column with strftime (the new column will have object dtype):
df_iso_h["date"] = df_iso_h["date"].dt.strftime("%d-%m-%Y")
If that does not work, use this column for the x-axis. Plotly may not support the yyyy-mm-dd datetime format. Note that your x-axis labels will then look like 01-05-2014.
Figured it out... Plotly does not take pandas datetime, so I had to convert my pandas datetime to python datetime.datetime or datetime.date.
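A sketch of that conversion, assuming a small Series shaped like the sample output in the question. The variable names are illustrative, and the Series is built by hand rather than taken from `df_iso_h`.

```python
import datetime

import pandas as pd

# Hypothetical column resembling the dates shown in the question
dates = pd.Series(pd.to_datetime(["2014-05-01", "2014-05-02", "2014-05-03"]),
                  name="date")

# Iterating a datetime64 Series yields pandas.Timestamp objects;
# each one can be converted to a plain Python type.
py_datetimes = [ts.to_pydatetime() for ts in dates]
py_dates = [ts.date() for ts in dates]
```

Either list could then be passed as the `x` argument of the trace in place of the pandas column.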
It seems that this was a regression introduced in plotly.py Version 3.2.0 and has been fixed in Version 3.2.1
You can now simply pass the pandas datetime column to plotly and it will handle the proper conversion for you like in the past.
See https://github.com/plotly/plotly.py/issues/1160
I have a large database containing about 1 million entries. In one column there are dates in this form: '%Y-%m-%d %H:%M:%S'. There is one entry every second.
I can select the period I want to plot from the database, e.g.
date1 = '2015-04-22 20:28:50'
date2 = '2015-04-23 21:42:09'
and the other column I want to plot in the Y axis.
As you can see in the specific example, from date1 to date2 it's about 86000 entries - or - points to plot.
Is there a way to plot efficiently these data using matplotlib, with the dates to show in the x axis?
Of course not all dates can be shown, but as the plotting period is dynamic (I insert into a web form the dates I want), is there a way to program it so that the plot will be the best possible every time?
So far, I can put all the dates in a list, and all the Y data in another list.
Below is my code so far, which plots the data but the X-axis labels are nothing near what I want.
from buzhug import Base
import datetime
import data_calculations as pd
import matplotlib.pyplot as plt
import matplotlib
import time
date1 = '2015-04-22 20:28:50'
date2 = '2015-04-24 19:42:09'
db = Base('monitor').open()
result_set = db.select(['MeanVoltage','time'],"time>=start and time<=stop", start=date1, stop=date2)
V = [float(record.MeanVoltage) for record in result_set]
Date = [str(record.time) for record in result_set]
dates = [datetime.datetime.strptime(record, '%Y-%m-%d %H:%M:%S') for record in Date]
dates = matplotlib.dates.date2num(dates)
fig, ax = plt.subplots()
ax.plot_date(dates, V)
plt.grid(True)
plt.show()
And the result is
Plot
Thank you in advance
Edit:
I have fixed the issue by adding these lines:
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S' ))
plt.gcf().autofmt_xdate()
However, now I want to pass the plot to a web server using the mpld3 plugin:
mpld3.plugins.get_plugins(fig)
mpld3.fig_to_html(fig)
mpld3.show()
While, without the plugin, the plot appears just fine, with the dates in the x axis, with the plugin it seems like it can't parse this line
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S' ))
into the html code and as a result the x axis label appears in unix time.
Does anyone know what's wrong with the plugin?
The problem is the large number of points (One every second is a bundle!). If you try to plot each point as a circle you will have these problems.
But it is easily solved by switching to a line graph:
ax.plot_date(dates, V, '-') # Where '-' means a line plot
For example:
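import numpy as np
import matplotlib.pyplot as plt
# some sample data
x = np.linspace(0.1, np.pi, 86000)
y = np.cos(x)**2 * np.log(x)
plt.plot(x, y, 'o')  # one marker per point
plt.plot(x, y, '-')  # a single line through the same points
plt.show()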