How to extract hour:minute from a datetime stamp in Python

I have a dataframe, df, as given below:
POA ... Inverter efficiency
2019-01-25 08:00:00 20.608713 ... 0.708626
2019-01-29 08:00:00 200.250137 ... 0.017787
2019-01-29 08:30:00 347.699615 ... 0.000000
2019-01-29 09:00:00 492.822662 ... 0.000000
2019-01-29 09:30:00 620.336243 ...
.
.
2019-03-07 13:00:00 1151.468384 ... 1.067493
2019-03-07 13:30:00 1119.876831 ... 2.311577
2019-03-07 14:00:00 1038.760864 ... 3.395081
I want to make a 24-hour plot covering all days. My code:
plot(df.index.hour,df['POA'])
Result is:
However, there is data at 08:30, 09:30, etc., but it is not reflected in the plot. In fact, these half-hour data points are merged with the 08:00, 09:00, etc. data. So my question is: how can I show the 08:30, 09:30, etc. data on the plot as well? (It looks like I have to extract both the hour and the minute from the same datetime.)
The accepted answer below gives the following plot, which is what I wanted. However, the x-axis ticks are crowded together and do not appear as in my first plot above. How can I correct the x-axis ticks in my second plot?

#rng = pd.date_range('1/5/2018 00:00', periods=5, freq='35T')
#df = pd.DataFrame({'POA':randint(1, 10, 5)}, index=rng)
labels = df.index.strftime('%H:%M')
x = np.arange(len(labels))
plt.plot(x, df['POA'])
plt.xticks(x, labels)
Steps:
labels = df.index.strftime('%H:%M') => Convert the datetime to "Hours:minutes" format to use as x labels
x = np.arange(len(labels)) => Create a dummy x axis for matplotlib
plt.plot(x, df['POA']) => Make the plot
plt.xticks(x, labels) => Replace the x labels with datetime
Assumption: the datetime index is sorted. If it is not, sort it before plotting (for example with df.sort_index()), otherwise the line will jump back and forth across the graph.
The x-axis labels can also include seconds, dates, etc. by passing the appropriate format string to df.index.strftime.
Solution that skips x-ticks to avoid crowded x labels:
#rng = pd.date_range('1/5/2018 00:00', periods=50, freq='35T')
#df = pd.DataFrame({'POA':randint(1, 10, 50)}, index=rng)
labels = df.index.strftime('%H:%M')
x = np.arange(len(labels))
fig, ax = plt.subplots()
plt.plot(x, df['POA'])
plt.xticks(x, labels)
skip_every_n = 10
for i, x_label in enumerate(ax.xaxis.get_ticklabels()):
    if i % skip_every_n != 0:
        x_label.set_visible(False)
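Another option, not part of the accepted answer, is to keep a numeric x-axis by converting each timestamp to a fractional hour (hour plus minute/60). Then 08:30 lands at 8.5, between the 08:00 and 09:00 ticks, and all days overlay on one 24-hour axis. A minimal sketch, assuming df has a DatetimeIndex; the sample data and the name hour_frac are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up half-hourly sample standing in for the real dataframe
rng = pd.date_range("2019-01-29 08:00", periods=13, freq=pd.Timedelta(minutes=30))
df = pd.DataFrame({"POA": np.linspace(200.0, 1100.0, len(rng))}, index=rng)

# Hour of day as a float, e.g. 08:30 -> 8.5
hour_frac = np.asarray(df.index.hour + df.index.minute / 60.0)

fig, ax = plt.subplots()
ax.plot(hour_frac, df["POA"], "o")  # markers only, so separate days are not joined by lines

# One tick per full hour keeps the axis readable
hours = list(range(8, 15))
ax.set_xticks(hours)
ax.set_xticklabels([f"{h:02d}:00" for h in hours])
```

Because the x values are real numbers, matplotlib spaces them correctly and the tick density can be chosen independently of the number of data points.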

Related

Converting datetime.time to int for use in regression model / initial scatter plot

I am looking to change a datetime.time column in a pandas dataframe to an int value that is not a Unix-like time, i.e. not counted from the 1970 epoch.
Example of df.head(3):
trans_date_trans_time  amt     category       city_pop  is_fraud  Time of Day
2019-01-01 00:00:18    4.97    misc_net       3495      0         00:00:18
2019-01-01 00:00:44    107.23  grocery_pos    149       0         00:00:44
2019-01-01 00:00:51    220.11  entertainment  4154      0         00:00:51
So I want the 'Time of Day' column to read as an integer that can run along an axis of a simple scatter plot against 'amt'.
When I try:
y = int(df_full_slim['Time of Day'])
plt.scatter(x, y)
plt.show()
or simply:
y = df_full_slim['Time of Day']
plt.scatter(x, y)
plt.show()
it doesn't work, as it can't plot a datetime.time type on a plt.
How do I get the time in a format that will run along an axis of a plot?
Thanks in advance..
You can plot without conversion by calling the .plot() method of the dataframe:
df_full_slim.plot(x='Time of Day', y='amt')
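If an integer column is still wanted, for the scatter plot or as a regression feature, one common choice is seconds since midnight. This is a sketch rather than part of the answer above; the column name tod_seconds and the helper seconds_since_midnight are hypothetical, and the three rows are copied from the question's df.head(3):

```python
import datetime as dt
import pandas as pd

# Three rows mirroring the question's sample
df_full_slim = pd.DataFrame({
    "amt": [4.97, 107.23, 220.11],
    "Time of Day": [dt.time(0, 0, 18), dt.time(0, 0, 44), dt.time(0, 0, 51)],
})

def seconds_since_midnight(t):
    """Convert a datetime.time to an integer number of seconds after 00:00:00."""
    return t.hour * 3600 + t.minute * 60 + t.second

# Integer feature that runs from 0 to 86399 over a day
df_full_slim["tod_seconds"] = df_full_slim["Time of Day"].map(seconds_since_midnight)
```

The new column can then be passed straight to plt.scatter(df_full_slim['tod_seconds'], df_full_slim['amt']).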

How can I plot different length pandas series with matplotlib?

I've got two pandas series, one with a 7 day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7 day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph is showing up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the series look like:
These are first 15 values of the 7 day rolling mean series, it has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!
The problem is that pandas bar plots are categorical (the bars sit at consecutive integer positions 0, 1, 2, ...). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical coordinates as well. This is not possible with pandas, but is the default behaviour with matplotlib.
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
import pandas as pd
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1)))+14, index=t1)
s1[:6] = np.nan
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2))*15+5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()
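The categorical positions described above can be checked directly. A small sketch with made-up monthly values (the name monthly is hypothetical): a pandas bar plot puts its ticks at 0 to 11 regardless of the dates in the index, which is why a daily line plotted in date coordinates does not line up with it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Twelve month-start dates with arbitrary values
idx = pd.to_datetime([f"2016-{m:02d}-01" for m in range(1, 13)])
monthly = pd.Series(np.linspace(0.1, 1.0, 12), index=idx)

ax = monthly.plot(kind="bar")

# Tick locations are plain integers, not dates
bar_positions = [int(t) for t in ax.get_xticks()]
print(bar_positions)
```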
The problem might be that the two series' indices are on very different scales. You can use ax.twiny to plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()
Output:

Arrange pandas DataFrame for color Plotting

I have a dataframe which looks like this (left column is the index):
YYYY-MO-DD HH-MI-SS_SSS ATMOSPHERIC PRESSURE (hPa) mean
2016-11-07 14:00:00 1014.028782
2016-11-07 15:00:00 1014.034111
.... ....
2016-11-30 09:00:00 1006.516436
2016-11-30 10:00:00 1006.216156
Now I want to plot a colormap with this data - so I want to create an X (horizontal axis) to be just the dates:
2016-11-07, 2016-11-08,...,2016-11-30
and the Y (Vertical axis) to be the time:
00:00:00, 01:00:00, 02:00:00, ..., 23:00:00
And finally the Z (color map) to be the pressure data for each date and time [f(x,y)].
How can I arrange the data for this kind of plotting ?
Thank you !
With test data prepared like so:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
samples = 24 * 365
index = pd.date_range('2017-01-01', freq='1H', periods=samples)
data = pd.DataFrame(np.random.rand(samples), index=index, columns=['data'])
I would do something like this:
data = data.reset_index()
data['date'] = data['index'].apply(lambda x: x.date())
data['time'] = data['index'].apply(lambda x: x.time())
pivoted = data.pivot(index='time', columns='date', values='data')
fig, ax = plt.subplots(1, 1)
ax.imshow(pivoted, origin='lower', cmap='viridis')
plt.show()
Which produces:
To improve the axis labeling, this is a start:
ax.set_yticklabels(['{:%H:%M:%S}'.format(t) for t in data['time'].unique()])
ax.set_xticklabels(['{:%Y-%m-%d}'.format(t) for t in data['date'].unique()])
but you'll need to choose how often a label appears by calling set_xticks() and set_yticks() with the positions you want labelled.
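One way to pick the label frequency is to set the tick positions and their labels together, taking every n-th column and row of the pivoted table. This is a sketch under assumptions: the step sizes, the 60-day random sample and the names step_x, step_y, x_labels and y_labels are chosen for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 60 days of hourly random test data
samples = 24 * 60
index = pd.date_range("2017-01-01", periods=samples, freq=pd.Timedelta(hours=1))
data = pd.DataFrame({"data": np.random.rand(samples)}, index=index)
data["date"] = data.index.date
data["time"] = data.index.time
pivoted = data.pivot(index="time", columns="date", values="data")

fig, ax = plt.subplots()
ax.imshow(pivoted.values, origin="lower", cmap="viridis", aspect="auto")

# Label every 10th day and every 6th hour
step_x, step_y = 10, 6
x_pos = list(range(0, pivoted.shape[1], step_x))
y_pos = list(range(0, pivoted.shape[0], step_y))
x_labels = [pivoted.columns[i].strftime("%Y-%m-%d") for i in x_pos]
y_labels = [pivoted.index[i].strftime("%H:%M") for i in y_pos]

# Positions and labels are set as a pair so they stay aligned
ax.set_xticks(x_pos)
ax.set_xticklabels(x_labels, rotation=45)
ax.set_yticks(y_pos)
ax.set_yticklabels(y_labels)
```

Setting the positions before the labels avoids the mismatch that occurs when only set_xticklabels() is called and matplotlib keeps its automatically chosen tick locations.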

Pandas: bar plot with multiIndex dataframe

I have a pandas DataFrame with a TIMESTAMP column (not the index), and the timestamp format is as follows:
2015-03-31 22:56:45.510
I also have columns called CLASS and AXLES. I would like to compute the count of records for each month separately for each unique value of AXLES (AXLES can take an integer value between 3-12).
I came up with a combination of resample and groupby:
resamp = dfWIM.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS
This seems to give me a multiIndex dataframe object, as shown below.
In [72]: resamp
Out [72]:
AXLES TIMESTAMP
3 2014-07-31 5517
2014-08-31 31553
2014-09-30 42816
2014-10-31 49308
2014-11-30 44168
2014-12-31 45518
2015-01-31 54782
2015-02-28 52166
2015-03-31 47929
4 2014-07-31 3147
2014-08-31 24810
2014-09-30 39075
2014-10-31 46857
2014-11-30 42651
2014-12-31 48282
2015-01-31 42708
2015-02-28 43904
2015-03-31 50033
From here, how can I access different components of this multiIndex object to create a bar plot for the following conditions?
show data when AXLES = 3
show x ticks in the Month - Year format (no days, hours, minutes etc.)
Thanks!
EDIT: Following code gives me the plot, but I could not change the xtick formatting to MM-YY.
resamp[3].plot(kind='bar')
EDIT 2 below is a code snippet that generates a small sample of the data similar to what I have:
dftest = {'TIMESTAMP':['2014-08-31','2014-09-30','2014-10-31'], 'AXLES':[3, 3, 3], 'CLASS':[5,6,7]}
dfTest = pd.DataFrame(dftest)
dfTest.TIMESTAMP = pd.to_datetime(pd.Series(dfTest.TIMESTAMP))
resamp = dfTest.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS
resamp[3].plot(kind='bar')
EDIT 3:
Here is the solution:
A. Plot the whole resampled dataframe (based on @Ako's suggestion):
df = resamp.unstack(0)
df.index = [ts.strftime('%b 20%y') for ts in df.index]
df.plot(kind='bar', rot=0)
B. Plot an individual index from the resampled dataframe (based on @Alexander's suggestion):
df = resamp[3]
df.index = [ts.strftime('%b 20%y') for ts in df.index]
df.plot(kind='bar', rot=0)
You could generate and set the labels explicitly using ax.xaxis.set_major_formatter with a ticker.FuncFormatter. This will allow you to keep your DataFrame's MultiIndex with timestamp values, while displaying the timestamps in the desired %m-%Y format:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as ticker
dftest = {'TIMESTAMP':['2014-08-31','2014-09-30','2014-10-31'], 'AXLES':[3, 3, 3], 'CLASS':[5,6,7]}
dfTest = pd.DataFrame(dftest)
dfTest.TIMESTAMP = pd.to_datetime(pd.Series(dfTest.TIMESTAMP))
resamp = dfTest.set_index('TIMESTAMP').groupby('AXLES').resample('M')['CLASS'].count()
ax = resamp[3].plot(kind='bar')
ticklabels = [timestamp.strftime('%m-%Y') for axle, timestamp in resamp.index]
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: ticklabels[int(x)]))
plt.gcf().autofmt_xdate()
plt.show()
yields
The following should work, but it is difficult to test without some data.
Start by resetting your index to get access to the TIMESTAMP column. Then use strftime to format it to your desired text representation (e.g. mm-yy). Finally, reset the index back to AXLES and TIMESTAMP.
df = resamp.reset_index()
df['TIMESTAMP'] = [ts.strftime('%m-%y') for ts in df.TIMESTAMP]
df.set_index(['AXLES', 'TIMESTAMP'], inplace=True)
df.xs(3, level=0).plot(kind='bar')
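Recent pandas releases no longer accept the how= argument to resample, so a call written as resample('M', how='count') raises an error there. A sketch of the same per-axle monthly count that avoids resample altogether by grouping on a monthly period; this is an alternative, not the method used in the answers above, and the five sample rows and the names month, counts and axle3 are made up for illustration:

```python
import pandas as pd

# Small sample in the shape described in the question
dfTest = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime(
        ["2014-08-05", "2014-08-31", "2014-09-30", "2014-09-12", "2014-10-31"]
    ),
    "AXLES": [3, 3, 3, 4, 3],
    "CLASS": [5, 6, 7, 8, 9],
})

# Map each timestamp to its calendar month
month = dfTest["TIMESTAMP"].dt.to_period("M")

# Count records per (AXLES, month); the result has a two-level index
counts = dfTest.groupby(["AXLES", month])["CLASS"].count()

# Select AXLES == 3 and turn the month level into MM-YYYY strings
axle3 = counts.xs(3, level="AXLES")
axle3.index = [p.strftime("%m-%Y") for p in axle3.index]
print(axle3)
```

Calling axle3.plot(kind='bar', rot=0) then gives one bar per month with month-year tick labels.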

Matplotlib: Making a line graph's datetime x axis labels look like Excel

I have a simple pandas DataFrame with yearly values that I am plotting as a line graph:
import matplotlib.pyplot as plt
import pandas as pd
>>>df
a b
2010-01-01 9.7 9.0
2011-01-01 8.8 14.2
2012-01-01 8.4 7.6
2013-01-01 9.6 8.4
2014-01-01 8.2 5.5
The expected format for the X axis is to use no margins for the labels:
fig = plt.figure(0)
ax = fig.add_subplot(1, 1, 1)
df.plot(ax = ax)
But I would like to force the values to plot in the middle of each year's range, as is done in Excel:
I have tried setting the x axis margins:
ax.margins(xmargin = 1)
But can see no difference.
If you just want to move the dates, you could try adding this line at the end:
ax.set_xlim(ax.get_xlim()[0] - 0.5, ax.get_xlim()[1] + 0.5)
If you need to format the dates as well you could either modify your index or make changes in the plotted ticks like so:
(presuming that your df.index is a DatetimeIndex)
ax.set_xticklabels(df.index.to_series().apply(lambda x: x.strftime('%d/%m/%Y')))
This will format the dates to look like your Excel example.
Or you could change your index to look like you want and then call .plot():
df.index = df.index.to_series().apply(lambda x: x.strftime('%d/%m/%Y'))
print(df.index.tolist())
['01/01/2010', '01/01/2011', '01/01/2012', '01/01/2013', '01/01/2014']
And if your index is not a datetime, you need to convert it first like this:
df.index = pd.to_datetime(df.index)
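The half-unit padding idea can also be sketched with plain matplotlib, plotting against integer positions so that each year occupies one unit and its point sits in the middle of that unit. This only approximates the Excel layout and is not taken from the answer above; the data is the question's table and the names x and year_labels are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [9.7, 8.8, 8.4, 9.6, 8.2], "b": [9.0, 14.2, 7.6, 8.4, 5.5]},
    index=pd.to_datetime(
        ["2010-01-01", "2011-01-01", "2012-01-01", "2013-01-01", "2014-01-01"]
    ),
)

# One integer position per year
x = np.arange(len(df))
year_labels = list(df.index.strftime("%Y"))

fig, ax = plt.subplots()
ax.plot(x, df["a"], label="a")
ax.plot(x, df["b"], label="b")

# Label each position with its year
ax.set_xticks(x)
ax.set_xticklabels(year_labels)

# Half a unit of padding on each side, so the first and last points
# are not drawn on the axes' edges
ax.set_xlim(-0.5, len(df) - 0.5)
ax.legend()
```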
