Plot huge amount of data with dates in x-axis - python

I have a large database containing about 1 million entries. In one column there are dates in this form: '%Y-%m-%d %H:%M:%S'. There is one entry every second.
I can select the period I want to plot from the database, e.g.
date1 = '2015-04-22 20:28:50'
date2 = '2015-04-23 21:42:09'
and the other column I want to plot in the Y axis.
In this specific example, the period from date1 to date2 covers about 86000 entries, i.e. points to plot.
Is there a way to plot these data efficiently using matplotlib, with the dates shown on the x-axis?
Of course not all dates can be shown, but since the plotting period is dynamic (I enter the dates I want in a web form), is there a way to program it so that the plot looks as good as possible every time?
So far, I can put all the dates in one list and all the Y data in another list.
Below is my code so far, which plots the data, but the x-axis labels are nowhere near what I want.
from buzhug import Base
import datetime
import data_calculations as pd
import matplotlib.pyplot as plt
import matplotlib
import time
date1 = '2015-04-22 20:28:50'
date2 = '2015-04-24 19:42:09'
db = Base('monitor').open()
result_set = db.select(['MeanVoltage','time'],"time>=start and time<=stop", start=date1, stop=date2)
V = [float(record.MeanVoltage) for record in result_set]
Date = [str(record.time) for record in result_set]
dates = [datetime.datetime.strptime(record, '%Y-%m-%d %H:%M:%S') for record in Date]
dates = matplotlib.dates.date2num(dates)
fig, ax = plt.subplots()
ax.plot_date(dates, V)
plt.grid(True)
plt.show()
And the result is
Plot
Thank you in advance
Edit:
I have fixed the issue by adding these lines:
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S' ))
plt.gcf().autofmt_xdate()
However, now I want to pass the plot to a web server using the mpld3 plugin:
mpld3.plugins.get_plugins(fig)
mpld3.fig_to_html(fig)
mpld3.show()
Without the plugin, the plot appears just fine, with the dates on the x-axis. With the plugin, it seems that it can't translate this line
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S' ))
into the HTML code, and as a result the x-axis labels appear in Unix time.
Does anyone know what's wrong with the plugin?

The problem is the large number of points (one every second is a lot!). If you try to plot each point as a circle, you will have these problems.
But it is easily solved by changing it to a line graph:
ax.plot_date(dates, V, '-') # Where '-' means a line plot
For example:
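import numpy as np
import matplotlib.pyplot as plt
# some sample data
x = np.linspace(0.1, np.pi, 86000)
y = np.cos(x)**2 * np.log(x)
plt.plot(x, y, 'o')  # one circle marker per point
plt.plot(x, y, '-')  # the same data as a single line
plt.show()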
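For the second part of the question (a plotting period that changes on every request), one option is to let matplotlib choose the tick positions and the label detail from the visible range. The following is a minimal sketch, not the original poster's code: the database query is replaced by synthetic one-per-second data, the names times and voltage are made up for illustration, and it assumes matplotlib 3.1 or later, where ConciseDateFormatter is available.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the database query: one sample per second.
start = datetime.datetime(2015, 4, 22, 20, 28, 50)
n_points = 86_000
times = [start + datetime.timedelta(seconds=i) for i in range(n_points)]
voltage = 230 + np.sin(np.linspace(0, 20 * np.pi, n_points))

fig, ax = plt.subplots()
ax.plot(times, voltage, "-")  # a line, not one marker per point

# AutoDateLocator picks tick positions from the current axis range, and
# ConciseDateFormatter adapts the label detail to the chosen tick spacing.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.grid(True)
fig.autofmt_xdate()

fig.canvas.draw()  # render once; call plt.show() instead in an interactive session
```

Because the locator and formatter look at the axis limits rather than at fixed dates, the same code should give sensible labels whether the selected period is a few minutes or several days.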

Related

Display only time on axis with matplotlib.plot_dates

So I've spent some time getting data plotted with time on the x-axis, and the way I've found to do that is to use plt.plot_date after converting the datetime objects to matplotlib date numbers with pltdates.date2num (pltdates being matplotlib.dates).
X_d = pltdates.date2num(X) # X is an array containing datetime objects
(...)
plt.plot_date(X_d, Y)
It works great, all my data is plotted properly.
Plot with dates appearing on x-axis
However, all the measurements I want to plot were made on the same day (17/12/2021); the only difference is the time.
As shown in the image, matplotlib still displays the day of the month (17th) although it is the same throughout the whole plot.
Does anyone have a clue how to keep only the time, while still using matplotlib.plot_date?
Use this example:
import matplotlib
import matplotlib.pyplot as plt
from datetime import datetime
origin = ['2020-02-05 04:11:55',
'2020-02-05 05:01:51',
'2020-02-05 07:44:49']
a = [datetime.strptime(d, '%Y-%m-%d %H:%M:%S') for d in origin]
b = [35.764299, 20.3008, 36.94704]  # numeric values; strings would be plotted as categories
x = matplotlib.dates.date2num(a)
formatter = matplotlib.dates.DateFormatter('%H:%M')
figure = plt.figure()
axes = figure.add_subplot(1, 1, 1)
axes.xaxis.set_major_formatter(formatter)
plt.setp(axes.get_xticklabels(), rotation=15)
axes.plot(x, b)
plt.show()

How to plot Date in X Axis, Time in Y axis with Pandas/Matplotlib and present time in HH:MM format as tick labels?

I have a date in one column and a time in another, which I retrieved from a database through pandas read_sql. The dataframe looks like the one below (there are 30-40 rows in my dataframe). I want to plot them as a time series graph. I would also like to be able to convert that to a histogram if needed.
COB CALV14
1 2019-10-04 07:04
2 2019-10-04 05:03
3 2019-10-03 16:03
4 2019-10-03 05:15
At first I got various errors, such as the field to plot not being numeric. After a lot of searching, the closest post I could find is: Matplotlib date on y axis
I followed it and got some result. However, the problems are:
I have to go through a number of steps (convert to str, then to a list, and then to matplotlib's date format) before I can plot them (please refer to the code I am using). There must be a smarter and more precise way to do this.
This does not show the times beside the axis exactly the way they appear in the data frame (e.g. it should show 07:04, 05:03, etc.).
I'm new to Python and will appreciate any help on this.
Code
ob_frame['COB'] = ob_frame.COB.astype(str)
ob_frame['CALV14'] = ob_frame.CALV14.astype(str)
date = ob_frame.COB.tolist()
time = ob_frame.CALV14.tolist()
y = mdates.datestr2num(date)
x = mdates.datestr2num(time)
fig, ax = plt.subplots(figsize=(9,9))
ax.plot(x, y)
ax.yaxis_date()
ax.xaxis_date()
fig.autofmt_xdate()
plt.show()
I found the answer. I did not need to convert the data retrieved from the DB to string type. The rest of the issue, I think, came from not using the right formatting for the tick labels. Here is the complete code, posted in case it helps anyone.
In this code I have swapped the X and Y axes, i.e. I plotted dates on the x-axis and times on the y-axis, as it looked better.
###### Import all the libraries and modules needed ######
import IN_OUT_SQL as IS ## IN_OUT_SQL.py is the file where the SQL is stored
import cx_Oracle as co
import numpy as np
import Credential as cd # Credential.py is the file where you store the DB credentials
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import dates as mdates
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline
###### Connect to DB, make the dataframe and prepare the x and y values to be plotted ######
def extract_data(query):
    '''
    This function takes the given query as input, connects to the database, executes the SQL and
    returns the result in a dataframe.
    '''
    cred = cd.POLN_CONSTR #POLN_CONSTR in the credential file stores the credential in '''USERNAME/PASSWORD#DB_NAME''' format
    conn = co.connect(cred)
    frame = pd.read_sql(query, con = conn)
    return frame
query = IS.OUT_SQL
ob_frame = extract_data(query)
ob_frame.dropna(inplace = True) # Drop the rows that contain a NaN value in any column
x = mdates.datestr2num(ob_frame['COB']) #COB is date in "01-MAR-2020" format- convert it to mdates type
y = mdates.datestr2num(ob_frame['CALV14']) #CALV14 is time in "21:04" format- convert it to mdates type
###### Make the Timeseries plot of delivery time in y axis vs delivery date in x axis ######
fig, ax = plt.subplots(figsize=(15,8))
ax.clear() # Clear the axes
ax.plot(x, y, 'o-', color = 'dodgerblue') #Plot the data; the colour is given once, via the color keyword
##The two lines below draw horizontal lines at the 07 AM and 05 AM positions
plt.axhline(y = mdates.date2num (pd.to_datetime('07:00')), color = 'red', linestyle = '--', linewidth = 0.75)
plt.axhline(y = mdates.date2num (pd.to_datetime('05:00')), color = 'green', linestyle = '--', linewidth = 0.75)
plt.xticks(x,rotation = 75)
ax.yaxis_date()
ax.xaxis_date()
#The 6 lines below set the format in which I want my x and y ticks and their labels to be displayed
yfmt = mdates.DateFormatter('%H:%M')
xfmt = mdates.DateFormatter('%d-%b-%y')
ax.yaxis.set_major_formatter(yfmt)
ax.xaxis.set_major_formatter(xfmt)
ax.yaxis.set_major_locator(mdates.HourLocator(interval=1)) # Every 1 Hour
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1)) # Every 1 Day
####### Name the x,y labels, titles and beautify the plot #######
plt.style.use('bmh')
plt.xlabel('\nCOB Dates')
plt.ylabel('Time of Delivery (GMT/BST as applicable)\n')
plt.title(" Data readiness time against COBs (Last 3 months)\n")
plt.rcParams["font.size"] = "12" #Change the font
# plt.rcParams["font.family"] = "Times New Roman" # Set the font type if needed
plt.tick_params(left = False, bottom = False, labelsize = 10) #Remove ticks, make tick labelsize 10
plt.box(False)
plt.show()

Plotting a times series using matplotlib with 24 hours on the y-axis

If I run the following, it appears to work as expected, but the y-axis is limited to the earliest and latest times in the data. I want it to show midnight to midnight. I thought I could do that with the code that's commented out. But when I uncomment it, I get the correct y-axis, yet nothing plots. Where am I going wrong?
from datetime import datetime
import matplotlib.pyplot as plt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
x = []
y = []
for i in range(0, len(data)):
    t = datetime.strptime(data[i], '%Y-%m-%d %H:%M:%S')
    x.append(t.strftime('%Y-%m-%d')) # X-axis = date
    y.append(t.strftime('%H:%M:%S')) # Y-axis = time
plt.plot(x, y, '.')
# begin = datetime.strptime('00:00:00', '%H:%M:%S').strftime('%H:%M:%S')
# end = datetime.strptime('23:59:59', '%H:%M:%S').strftime('%H:%M:%S')
# plt.ylim(begin, end)
plt.show()
Edit: I also noticed that the x-axis isn't right either. The data skips Jan 2, but I want that on the axis so the data is to scale.
This is a dramatically simplified version of code dealing with over a year's worth of data with over 2,500 entries.
If Pandas is available to you, consider this approach:
import pandas as pd
data = pd.to_datetime(data, yearfirst=True)
plt.plot(data.date, data.time)
_=plt.ylim(["00:00:00", "23:59:59"])
Update per comments
X-axis date formatting can be adjusted using the Locator and Formatter methods of the matplotlib.dates module. Locator finds the tick positions, and Formatter specifies how you want the labels to appear.
Sometimes Matplotlib/Pandas just gets it right, other times you need to call out exactly what you want using these extra methods. In this case, I'm not sure why those numbers are showing up, but this code will remove them.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time)
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
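An alternative that stays within matplotlib's own date handling, and so does not rely on pandas' unit converters (which may need to be registered explicitly depending on the pandas version), is to move every time of day onto one arbitrary reference day and format the y-axis with '%H:%M'. This is only a sketch under that assumption: the reference date REF and the names stamps, days and times are made up for illustration and are not part of the question or the answer above.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44',
        '2018-01-03 15:30:27', '2018-01-04 11:55:09']
stamps = [datetime.strptime(s, '%Y-%m-%d %H:%M:%S') for s in data]

# Arbitrary reference day (an assumption of this sketch): only the
# time-of-day part is ever shown on the y-axis.
REF = datetime(2000, 1, 1)
days = [datetime.combine(t.date(), datetime.min.time()) for t in stamps]   # x: date only
times = [datetime.combine(REF.date(), t.time()) for t in stamps]           # y: time only

fig, ax = plt.subplots()
ax.plot(days, times, '.')

# Midnight to midnight on the y-axis, labelled as HH:MM.
ax.set_ylim(REF, REF.replace(hour=23, minute=59, second=59))
ax.yaxis.set_major_locator(mdates.HourLocator(interval=2))
ax.yaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))

# One tick per calendar day on the x-axis, so days without data still appear.
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))

fig.canvas.draw()  # render once; call plt.show() instead in an interactive session
```

Since both axes hold real datetime values rather than strings, the spacing is to scale: Jan 2 gets its own tick even though it has no data, and the y-axis covers the full day regardless of the earliest and latest times in the data.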

mpld3 does not display dates on x axis correctly

I am plotting a large dataset from a database using matplotlib, and I use mpld3 to pass the figure to the browser. There are dates on the x-axis. The issue is that while plotting without mpld3 works perfectly, the dates don't appear correctly when I use it.
Here is my code:
date1 = '2015-04-22 20:28:50'
date2 = '2015-04-23 19:42:09'
db = Base('monitor').open()
result_set = db.select(['MeanVoltage','time'],"time>=start and time<=stop", start=date1, stop=date2)
V = [float(record.MeanVoltage) for record in result_set if record != 0]
Date = [str(record.time) for record in result_set]
dates = [datetime.datetime.strptime(record, '%Y-%m-%d %H:%M:%S') for record in Date]
dates = matplotlib.dates.date2num(dates)
fig, ax = plt.subplots()
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S' ))
plt.gcf().autofmt_xdate()
ax.plot(dates,V)
#mpld3.fig_to_html(fig)
#mpld3.show(fig)
plt.show()
That shows the plot perfectly.
Now, if I comment out this line only:
plt.show()
and uncomment these two:
mpld3.fig_to_html(fig)
mpld3.show(fig)
the figure appears in the browser like this:
As you can see, the only issue is how the dates appear in the x-axis.
Is there any way to overcome it?
Before creating the HTML figure, add the following line to specify that it is a date axis:
ax.xaxis_date()
The answer above is correct.
If you are passing only dates, for example
df["Date"][0] = "2018-11-23"
then you can also pass them in matplotlib's native datetime format, as below, without creating an ordinal value with date2num.
import datetime as dt
df["Date"] = [dt.datetime.strptime(d, '%Y-%m-%d') for d in df["Date"]]
ax.plot(df["Date"].tolist(), some_y_value_list)
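To see what ax.xaxis_date() changes on the matplotlib side, the sketch below plots date2num values and inspects the x-axis formatter before and after the call. It deliberately leaves mpld3 out, so it only illustrates the axis state that the first answer says mpld3 needs; the sample data (stamps, values) is made up for illustration.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Made-up sample data: one value per hour for a day.
stamps = [datetime.datetime(2015, 4, 22, 20, 0) + datetime.timedelta(hours=h)
          for h in range(24)]
nums = mdates.date2num(stamps)  # plain floats, as in the question
values = list(range(24))

fig, ax = plt.subplots()
ax.plot(nums, values)

# With plain floats the axis does not know it holds dates.
before = type(ax.xaxis.get_major_formatter()).__name__

# Mark the x-axis as a date axis; matplotlib installs date-aware
# default tick locators and formatters.
ax.xaxis_date()
after = type(ax.xaxis.get_major_formatter()).__name__

print(before, "->", after)
```

Printing the two names shows the formatter class changing from a plain numeric one to a date-aware one, which is the difference between an axis of raw numbers and an axis of dates.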

Showing entire X Axis Ticks in Graph

I'm trying to have the tick labels of my graph displayed in full, but I'm not getting the desired result, despite my efforts.
If I merely use autofmt_xdate(), the dates are shown correctly, but not for every data point plotted; however, if I force my x tick labels to be displayed by passing x (a list of datetime objects) to plt.xticks(), it only seems to display the year.
fig1 = plt.figure(1)
# x is a list of datetime objects
plt.title('Portfolio Instruments')
plt.subplot(111)
plt.plot(x, y)
plt.xticks(fontsize='small')
plt.yticks([i * 5 for i in range(0, 15)])
fig1.autofmt_xdate()
plt.show()
Graph passing x to plt.xticks():
Graph without passing x to plt.xticks()
Where's my mistake? I can't find it.
Question
How do I plot all of my data points of x and format it to show the entire datetime object I'm passing the graph using autofmt_xdate()?
I have a list of datetime objects which I want to pass as the x values of my plot.
Pass the dates you want ticks at to xticks, and then set the major formatter for the x-axis using plt.gca().xaxis.set_major_formatter.
You can then use the DateFormatter from matplotlib.dates, with a strftime format string, to get the format in your question:
import matplotlib.pyplot as plt
import matplotlib.dates as dates
fig1 = plt.figure(1)
# x is a list of datetime objects
plt.title('Portfolio Instruments')
plt.subplot(111)
plt.plot(x, y)
plt.xticks(x,fontsize='small')
plt.gca().xaxis.set_major_formatter(dates.DateFormatter('%b %d %Y'))
plt.yticks([i * 5 for i in range(0, 15)])
fig1.autofmt_xdate()
plt.show()
Note: I created the data for the above plot using the code below, so x is just a list of datetime objects for each weekday in a month (i.e. without weekends).
import numpy as np
from datetime import datetime,timedelta
start = datetime(2016, 1, 1)
end = datetime(2016, 2, 1)
delta = timedelta(days=1)
d = start
weekend = set([5, 6])
x = []
while d <= end:
    if d.weekday() not in weekend:
        x.append(d)
    d += delta
y = np.random.rand(len(x))*70
I'm pretty sure I had a similar problem, and the way I solved it was to use the following code:
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter

def formatFig():
    date_formatter = DateFormatter('%H:%M:%S') # change the format here to whatever you like
    plt.gcf().autofmt_xdate()
    ax = plt.gca()
    ax.xaxis.set_major_formatter(date_formatter)
    max_xticks = 10 # caps the number of x tick intervals; change this to suit the number of data points you have
    xloc = plt.MaxNLocator(max_xticks)
    ax.xaxis.set_major_locator(xloc)

def makeFig():
    plt.plot(xList, yList, color='blue') # xList and yList are your own data
    formatFig()

makeFig()
plt.show(block=True)
It is a pretty simple example, but you should be able to transfer the formatFig() part for use in your code.
