python matplotlib how to map bar with str to date - python

This is the first time I am working with matplotlib, and the task might be really trivial, but it turns out to be hard for me.
I have the following data: ticket numbers, the dates when they were resolved, and the dates when they were supposed to be resolved.
What I want to do is draw a plot with tickets on the x axis and dates on the y axis. Then for every ticket I need 2 bars: the first with height equal to the date it was resolved, and the other with height equal to the date it was supposed to be resolved.
What I have right now:
a list of all tickets
tickets = []
a list with all dates (both resolved and expected)
all_dates = []
lists with sets of (ticket, datetime) for expected time and resolved time:
tickets_estimation = []
tickets_real = []
the code I am at right now:
plt.xticks(arange(len(tickets)), tickets, rotation=90)
plt.yticks(arange(len(all_dates)), all_dates)
plt.show()
which shows me following plot:
So how can I do the rest? Please pay attention that I need to map tickets numbers at X axis to the dates on Y axis.
OK, here is a simplified version of where I am stuck:
I cannot figure out how to draw even a single bar so that its X value is a ticket and its Y value is the date of its resolution.
For example:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from numpy import arange
date = ['3 Jan 2013', '4 Jan 2013', '5 Jan 2013']
tickets = ['ENV-666', 'ENV-999', 'ENV-1000']
# Convert to matplotlib's internal date format.
y = mdates.datestr2num(date)
x = arange(len(tickets))
fig, ax = plt.subplots()
ax.plot(x,y)
ax.yaxis_date()
# Optional. Just rotates x-ticklabels in this case.
fig.autofmt_xdate()
plt.show()
This works fine; it shows a plot with a line. But if I change
ax.plot(x,y) to ax.bar(x,y), I receive an error:
ValueError: ordinal must be >= 1

Bar is your friend: https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.bar.html
Then just use your imagination to plot it as you like. For example, a black line for the due date, red boxes for overdue tickets, and green boxes for tickets resolved on time.
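import numpy as np
import matplotlib.pyplot as plt
supp = np.array([3, 5, 7], dtype=float)  # dates the tickets were supposed to be resolved (as numbers)
res = np.array([1, 7, 7], dtype=float)   # dates the tickets were actually resolved
H = res - supp
H[H == 0] = 0.01  # tiny non-zero height; only works because the arrays are float
ind = np.arange(len(supp))
plt.bar(ind[H>0], H[H>0], bottom=supp[H>0], color='r')
plt.bar(ind[H<0], H[H<0], bottom=supp[H<0], color='g')
plt.bar(ind, 0.1, bottom=supp, color='k')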
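For the original question (two bars per ticket, with real dates on the y axis), one possible sketch follows. It assumes the ValueError comes from bars starting at 0 by default, which is not a valid date in older matplotlib versions, so every bar gets an explicit bottom at a baseline date instead. The sample dates, the one-day baseline margin, and the variable names are made up for illustration.

```python
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

tickets = ['ENV-666', 'ENV-999', 'ENV-1000']
# Hypothetical sample data: expected vs. actual resolution dates.
expected = [datetime(2013, 1, 3), datetime(2013, 1, 4), datetime(2013, 1, 5)]
resolved = [datetime(2013, 1, 4), datetime(2013, 1, 4), datetime(2013, 1, 8)]

# Convert to matplotlib's internal date numbers.
exp_num = np.asarray(mdates.date2num(expected))
res_num = np.asarray(mdates.date2num(resolved))

# Start the bars one day before the earliest date instead of at 0.
baseline = min(exp_num.min(), res_num.min()) - 1

x = np.arange(len(tickets))
width = 0.4
fig, ax = plt.subplots()
ax.bar(x - width / 2, exp_num - baseline, width, bottom=baseline, label='expected')
ax.bar(x + width / 2, res_num - baseline, width, bottom=baseline, label='resolved')
ax.set_xticks(x)
ax.set_xticklabels(tickets, rotation=90)
ax.yaxis_date()  # interpret y values as dates
ax.yaxis.set_major_formatter(mdates.DateFormatter('%d %b %Y'))
ax.legend()
```

Call plt.show() afterwards to display the figure. The bar tops land on the actual dates because each height is the date number minus the baseline.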

Related

Measurement length for X and Y-axis

I wonder if it's possible to change the measurement milestones for graphs created by pandas. In my code the X-axis stands for time and is measured by month, but the measurement milestones are all over the place.
In the image below, the milestones for the X-axis are 2012M01, 2012M06, 2012M11, 2013M04 and 2013M09.
Is there any way I can choose how long the distance should be between every milestone? For example, to make it so it shows every year or every half year?
This is the code I used for the function making the graph:
def graph(dataframe):
    graph = dataframe[["Profit"]].plot()
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    plt.grid(True)
    plt.show()
The actual dataframe is just an excel-file with a bunch of months and monetary values in it.
I think the most straightforward way is to use matplotlib.dates to format the axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def graph(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m') #see https://strftime.org/
    major = mdates.MonthLocator([1,7]) #label only Jan and Jul
    graph = dataframe[["Profit"]].plot(ax=ax) #link plot to the existing axes
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    graph.xaxis.set_major_locator(major) #set major locator tick on x-axis
    graph.xaxis.set_major_formatter(xfmt) #format xtick label
    plt.grid(True)
    plt.show()
But a key point is that your dates need to be Python's built-in datetime.date (not datetime.datetime). If your dates are str or a different type of datetime, you will need to convert them first; there are many resources on SO and elsewhere for doing this:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
dr = [pd.to_datetime(date).date() for date in dr] #explicitly converting to datetime with .date()
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
type(df.index[0])
Out[0]:
datetime.date
Calling graph(df) using the example above gets this plot:
Just to expand on this, here's what happens when the index is pandas.Timestamp instead of datetime.date:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in dr] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
graph(df)
Out[0]:
The x-axis is improperly formatted:
However, if you are willing to just create the plot directly through matplotlib, rather than pandas (pandas is using matplotlib anyway), this can handle more types of dates:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in dr] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
def graph_2(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m')
    major = mdates.MonthLocator([1,7])
    ax.plot(dataframe.index,dataframe['Profit'], label='Profit')
    ax.set_title('Statistics')
    ax.set_ylabel('Thousand $')
    ax.set_xlabel('Time')
    ax.xaxis.set_major_locator(major)
    ax.xaxis.set_major_formatter(xfmt)
    ax.legend() #legend needs to be added
    plt.grid(True)
    plt.show()
graph_2(df)
type(df.index[0])
Out[0]:
pandas._libs.tslibs.timestamps.Timestamp
And here is the working graph:
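As a rough check of what the formatter and locator in this answer are meant to produce, the plain-Python sketch below mimics them without drawing anything: it keeps only the dates that fall in January or July and formats them with the same '%YM%m' pattern. The helper name major_tick_labels is hypothetical, and this is not how matplotlib computes ticks internally.

```python
from datetime import date


def major_tick_labels(dates, months=(1, 7), fmt='%YM%m'):
    """Return labels for the dates whose month is in `months` (illustration only)."""
    return [d.strftime(fmt) for d in dates if d.month in months]


# First day of every month from 2012-01 through 2014-01, like the example index.
index = [date(2012 + m // 12, m % 12 + 1, 1) for m in range(25)]
labels = major_tick_labels(index)
print(labels)
```

Changing months to (1,) would correspond to one label per year, which is the other spacing the question asks about.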

How to plot Date in X Axis, Time in Y axis with Pandas/Matplotlib and present time in HH:MM format as tick labels?

I have a date in one column and a time in another, which I retrieved from a database through pandas read_sql. The dataframe looks like the one below (there are 30-40 rows in my dataframe). I want to plot them as a time series graph. If I want, I should also be able to convert that to a histogram.
COB CALV14
1 2019-10-04 07:04
2 2019-10-04 05:03
3 2019-10-03 16:03
4 2019-10-03 05:15
At first I got various errors, such as there being no numeric field to plot. After searching a lot, the closest post I could find is: Matplotlib date on y axis
I followed it and got some results. However, the problems are:
I have to go through a number of steps (convert to str, then to a list, and then to matplotlib's datetime format) before I can plot them (please refer to the code I am using). There must be a smarter and more precise way to do this.
This does not show the times beside the axis exactly the way they appear in the data frame (e.g. it should show 07:03, 05:04, etc.).
I am new to Python and will appreciate any help on this.
Code
ob_frame['COB'] = ob_frame.COB.astype(str)
ob_frame['CALV14'] = ob_frame.CALV14.astype(str)
date = ob_frame.COB.tolist()
time = ob_frame.CALV14.tolist()
y = mdates.datestr2num(date)
x = mdates.datestr2num(time)
fig, ax = plt.subplots(figsize=(9,9))
ax.plot(x, y)
ax.yaxis_date()
ax.xaxis_date()
fig.autofmt_xdate()
plt.show()
I found the answer. I did not need to convert the data retrieved from the DB to string type. The rest of the issue came from not using the right formatting for the tick labels. Here is the complete code, posted in case it helps anyone.
In this code I have swapped the Y and X axes, i.e. I plotted dates on the x axis and times on the y axis, as it looked better.
###### Import all the libraries and modules needed ######
import IN_OUT_SQL as IS ## IN_OUT_SQL.py is the file where the SQL is stored
import cx_Oracle as co
import numpy as np
import Credential as cd # Credential.py is the file where you store the DB credentials
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import dates as mdates
%matplotlib inline
###### Connect to DB, make the dataframe and prepare the x and y values to be plotted ######
def extract_data(query):
    '''
    This function takes the given query as input, connects to the database, executes the SQL and
    returns the result in a dataframe.
    '''
    cred = cd.POLN_CONSTR #POLN_CONSTR in the credential file stores the credential in '''USERNAME/PASSWORD#DB_NAME''' format
    conn = co.connect(cred)
    frame = pd.read_sql(query, con = conn)
    return frame
query = IS.OUT_SQL
ob_frame = extract_data(query)
ob_frame.dropna(inplace = True) # Drop the rows with NaN values for all the columns
x = mdates.datestr2num(ob_frame['COB']) #COB is a date in "01-MAR-2020" format; convert it to mdates type
y = mdates.datestr2num(ob_frame['CALV14']) #CALV14 is a time in "21:04" format; convert it to mdates type
###### Make the Timeseries plot of delivery time in y axis vs delivery date in x axis ######
fig, ax = plt.subplots(figsize=(15,8))
ax.clear() # Clear the axes
ax.plot(x, y, 'bo-', color = 'dodgerblue') #Plot the data
##Below two lines are to draw a horizontal line for 05 AM and 07 AM position
plt.axhline(y = mdates.date2num (pd.to_datetime('07:00')), color = 'red', linestyle = '--', linewidth = 0.75)
plt.axhline(y = mdates.date2num (pd.to_datetime('05:00')), color = 'green', linestyle = '--', linewidth = 0.75)
plt.xticks(x, rotation = 75) # numeric rotation; recent matplotlib versions reject the string '75'
ax.yaxis_date()
ax.xaxis_date()
#Below 6 lines are about setting the format with which I want my x or y ticks and their labels to be displayed
yfmt = mdates.DateFormatter('%H:%M')
xfmt = mdates.DateFormatter('%d-%b-%y')
ax.yaxis.set_major_formatter(yfmt)
ax.xaxis.set_major_formatter(xfmt)
ax.yaxis.set_major_locator(mdates.HourLocator(interval=1)) # Every 1 Hour
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1)) # Every 1 Day
####### Name the x,y labels, titles and beautify the plot #######
plt.style.use('bmh')
plt.xlabel('\nCOB Dates')
plt.ylabel('Time of Delivery (GMT/BST as applicable)\n')
plt.title(" Data readiness time against COBs (Last 3 months)\n")
plt.rcParams["font.size"] = "12" #Change the font
# plt.rcParams["font.family"] = "Times New Roman" # Set the font type if needed
plt.tick_params(left = False, bottom = False, labelsize = 10) #Remove ticks, make tick labelsize 10
plt.box(False)
plt.show()
Output:
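A shorter route, sketched here under some assumptions, is to let pandas parse both columns and hand datetime values straight to matplotlib. Parsing "HH:MM" with an explicit format anchors every time on the same dummy day (1900-01-01), so only the time of day varies along the y axis. The inline sample frame stands in for the database result, and this variant was not part of the original answer.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the frame returned by read_sql (values taken from the question).
ob_frame = pd.DataFrame({
    'COB': ['2019-10-04', '2019-10-04', '2019-10-03', '2019-10-03'],
    'CALV14': ['07:04', '05:03', '16:03', '05:15'],
})

dates = pd.to_datetime(ob_frame['COB'])
# With an explicit format, every time is placed on 1900-01-01.
times = pd.to_datetime(ob_frame['CALV14'], format='%H:%M')

fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(dates.to_numpy(), times.to_numpy(), 'o')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b-%y'))
ax.yaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))  # HH:MM tick labels
fig.autofmt_xdate()
```

Call plt.show() to display it. Markers only are used because the sample has several times per date; with one row per date a line style such as 'o-' would work as well.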

Using matplotlib to plot time on my x axis, however it starts from 0, not the actual start time

I have a CSV file with time data as follows:
Time,Download,Upload
17:00,7.51,0.9
17:15,6.95,0.6
17:31,5.2,0.46
I import the csv into a pandas dataframe: df = pd.read_csv('speeds.csv', parse_dates=['Time'])
And then plot the graph like so:
fig, ax = plt.subplots(figsize=(20, 7))
df.plot(ax=ax)
majorFmt = mdates.DateFormatter('%H:%M:')
minorFmt = mdates.DateFormatter('%H:%M:')
hour_locator = mdates.HourLocator()
min_locator = mdates.MinuteLocator(byminute=[15, 30, 45])
ax.xaxis.set_major_locator(hour_locator)
ax.xaxis.set_major_formatter(majorFmt)
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90, fontsize=10)
ax.xaxis.set_minor_locator(min_locator)
ax.xaxis.set_minor_formatter(minorFmt)
plt.setp(ax.xaxis.get_minorticklabels(), rotation=90, fontsize=8)
However the final graph starts from 00:00 like so, although the CSV file starts at 17:00:
Why doesn't the graph start at 17:00 as well?
Another problem (while I'm here) is that the major labels don't line up with the major markers; they are shifted slightly to the left. How would I fix that?
First question - graph doesn't start at 17:00:
Your csv only gives times (no dates) and it rolls over midnight. Pandas implicitly adds the current date to all times, so times after midnight, which belong to the next day, get the same date as times before midnight. Therefore you'll have to adjust the date part:
days = 0
df['Datetime'] = df['Time']
for i in df.index:
    if i > 0 and df.at[i,'Time'] < df.at[i-1,'Time']:
        days += 1  # time went backwards, so we crossed midnight
    df.at[i,'Datetime'] = df.at[i,'Time'] + pd.DateOffset(days=days)
and then use the Datetime column on your x axis.
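The same rollover adjustment can also be written without an explicit loop. The sketch below is one way to do it, assuming the rows are in chronological order: a negative difference between consecutive times marks a midnight crossing, and the running count of crossings is the number of days to add. The four sample times and the fixed 2020-01-01 date are invented for the example.

```python
import pandas as pd

# Invented sample: all times carry the same date, as they would after parsing.
df = pd.DataFrame({'Time': pd.to_datetime([
    '2020-01-01 17:00', '2020-01-01 23:45',
    '2020-01-01 00:15', '2020-01-01 01:00',
])})

# True wherever the time jumps backwards, i.e. where midnight was crossed.
rolled = df['Time'].diff() < pd.Timedelta(0)

# Cumulative number of crossings = days to add to each row.
df['Datetime'] = df['Time'] + pd.to_timedelta(rolled.cumsum(), unit='D')
print(df)
```

The first row has no predecessor, so its difference is NaT and the comparison evaluates to False, which leaves it on day zero.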
Second question - shifted major markers:
Set the horizontal alignment:
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90, fontsize=10, horizontalalignment='center')

Using matplotlib to plot a distribution of time occurrences. I would like the x axis to have hours (12:00 PM) rather than integers (12)

Here's my plot, which is generated using the following code:
bins = np.linspace(0,24,25)
plt.hist(hours,bins, edgecolor='black', linewidth = 1.2, color = 'red')
I would like the x axis to show 24 entries, from 12:00AM to 11:00 PM ideally rotated left 90 degrees.
I see two paths: convert the actual data to time values so the histogram reads in time values or simply add a custom x axis with 12:00AM, 1:00 AM, etc. What's the easiest / cleanest approach here? I'm not familiar with how to do either. For reference, "hours" is a int64 array.
Here's a working example:
import numpy as np
import matplotlib.pyplot as plt
bins = np.arange(0,25)
hours = np.random.rand(50)*25
fig, ax = plt.subplots()
labels = []
for i in bins:
    if i<12:
        labels.append("{}:00AM".format(i))
    elif i == 12:
        labels.append("12:00PM")
    else:
        labels.append("{}:00PM".format(i-12))
ax.hist(hours, bins)
ax.set_xticks(bins + 0.5) # 0.5 is half of the "1" auto width
ax.set_xticklabels(labels, rotation='vertical')
fig.subplots_adjust(bottom = 0.2) # makes space for the vertical labels
plt.show()
which gives:
I've changed the linspace to arange as it returns integers
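The label loop above writes hour 0 as "0:00AM". If the labels should run from 12:00 AM to 11:00 PM as the question asks, a small helper along these lines could build them; hour_label is a hypothetical name and the exact spacing of the text is a matter of taste.

```python
def hour_label(hour):
    """Format an integer hour (0-23) on a 12-hour clock, e.g. 0 -> '12:00 AM'."""
    suffix = 'AM' if hour < 12 else 'PM'
    # hour % 12 is 0 for midnight and noon, which should both read as 12.
    return "{}:00 {}".format(hour % 12 or 12, suffix)


labels = [hour_label(h) for h in range(24)]
print(labels[0], labels[12], labels[23])
```

These 24 labels can be passed to ax.set_xticklabels together with 24 tick positions, for example bins[:-1] + 0.5 in the example above.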
To get a nice time format on the xaxis, the idea could be to calculate the histogram in terms of numbers which can be interpreted as datetimes.
In case you only have times, the actual date does not matter much. Dividing the data by 24 gives fractions of a day. Since matplotlib (before version 3.3, where the default epoch moved to 1970-01-01) interprets numbers as days since 0001-01-01 UTC, plus 1, one then needs to add some whole number >=2 to avoid running into trouble with negative dates.
Then the usual matplotlib.dates locators and formatters can be used to get nice tick labels. "%I:%M %p" gives the time on a 12-hour clock with an AM/PM suffix.
import numpy as np; np.random.seed(3)
import matplotlib.pyplot as plt
import matplotlib.dates
data = np.random.normal(12,7, size=200)
data = data[(data >=0) & (data <24)]
f = lambda x: 2+x/24.
bins=np.arange(25)
plt.hist(f(data), bins=f(bins))
plt.gca().xaxis.set_major_locator(matplotlib.dates.HourLocator())
plt.gca().xaxis.set_major_formatter(matplotlib.dates.DateFormatter("%I:%M %p"))
plt.setp(plt.gca().get_xticklabels(),rotation=90)
plt.tight_layout()
plt.show()
(This would hence be the histogram of datetimes of the 2nd of January 0001.)
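A variant that does not depend on which epoch matplotlib uses is to build real datetime objects on an arbitrary reference day and convert them with date2num. This is only a sketch: the reference day 2000-01-01 and the random sample data are assumptions made for the example.

```python
from datetime import datetime, timedelta

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
hours = rng.normal(12, 7, size=200)
hours = hours[(hours >= 0) & (hours < 24)]  # keep values inside one day

day = datetime(2000, 1, 1)  # arbitrary reference day (an assumption)
stamps = [day + timedelta(hours=float(h)) for h in hours]
edges = [day + timedelta(hours=h) for h in range(25)]  # 24 one-hour bins

fig, ax = plt.subplots()
counts, _, _ = ax.hist(mdates.date2num(stamps), bins=mdates.date2num(edges))
ax.xaxis.set_major_locator(mdates.HourLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%I:%M %p'))
ax.tick_params(axis='x', labelrotation=90)
```

Call plt.show() to display it. Because date2num does the conversion, the same code should behave the same way under either epoch.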

Plot huge amount of data with dates in x-axis

I have a large database containing about 1 million entries. In one column there are dates in this form: '%Y-%m-%d %H:%M:%S'. There is one entry every second.
I can select the period I want to plot from the database, e.g
date1 = '2015-04-22 20:28:50'
date2 = '2015-04-23 21:42:09'
and the other column I want to plot in the Y axis.
As you can see in this specific example, from date1 to date2 there are about 86000 entries, i.e. points to plot.
Is there a way to plot efficiently these data using matplotlib, with the dates to show in the x axis?
Of course not all dates can be shown, but as the plotting period is dynamic (I insert into a web form the dates I want), is there a way to program it so that the plot will be the best possible every time?
So far, I can put all the dates in a list, and all the Y data in another list.
Below is my code so far, which plots the data but the X-axis labels are nothing near what I want.
from buzhug import Base
import datetime
import data_calculations as pd
import matplotlib.pyplot as plt
import matplotlib
import time
date1 = '2015-04-22 20:28:50'
date2 = '2015-04-24 19:42:09'
db = Base('monitor').open()
result_set = db.select(['MeanVoltage','time'],"time>=start and time<=stop", start=date1, stop=date2)
V = [float(record.MeanVoltage) for record in result_set]
Date = [str(record.time) for record in result_set]
dates = [datetime.datetime.strptime(record, '%Y-%m-%d %H:%M:%S') for record in Date]
dates = matplotlib.dates.date2num(dates)
fig, ax = plt.subplots()
ax.plot_date(dates, V)
plt.grid(True)
plt.show()
And the result is
Thank you in advance
Edit:
I have fixed the issue by adding these lines:
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S' ))
plt.gcf().autofmt_xdate()
However, now I want to pass the plot to a web server using the mpld3 plugin:
mpld3.plugins.get_plugins(fig)
mpld3.fig_to_html(fig)
mpld3.show()
While, without the plugin, the plot appears just fine, with the dates in the x axis, with the plugin it seems like it can't parse this line
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y %H:%M:%S' ))
into the html code and as a result the x axis label appears in unix time.
Does anyone know what's wrong with the plugin?
The problem is the large number of points (one every second is a lot!). If you try to plot each point as a circle, you will have these problems.
But it is easily solved by changing it to a line graph:
ax.plot_date(dates, V, '-') # Where '-' means a line plot
For example:
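import numpy as np
import matplotlib.pyplot as plt
# some sample data
x = np.linspace(0.1, np.pi, 86000)
y = np.cos(x)**2 * np.log(x)
plt.plot(x, y, 'o')  # one marker per point: slow and cluttered
plt.plot(x, y, '-')  # a single line through the same points
plt.show()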
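For the part of the question about getting sensible date labels whatever period is selected, one option to consider is matplotlib's AutoDateLocator paired with ConciseDateFormatter (the latter needs matplotlib 3.1 or newer). The sketch below uses a made-up voltage signal with one sample per second; it does not address the mpld3 issue.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# 86000 timestamps, one per second, starting at date1 from the question.
start = np.datetime64('2015-04-22T20:28:50')
times = start + np.arange(86000) * np.timedelta64(1, 's')
voltage = 230 + np.sin(np.linspace(0, 20, times.size))  # made-up signal

fig, ax = plt.subplots()
ax.plot(times, voltage, '-')  # line plot, as recommended above

# Let matplotlib pick tick spacing and label detail from the visible range.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.grid(True)
```

Call plt.show() to display it. The locator re-evaluates the range on every draw, so zooming in or choosing a different period changes the tick spacing accordingly.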
