How to make time column to be x-axis of a plot? - python

I am reading an excel file using pandas and trying to read one of its sheets to plot some results. Below is my code;
df = pd.read_excel('Base_Case-Position.xlsx', sheet_name='Sheet1', header=0, usecols="A:E")
print(df)
Time WB NB WBO SBO
0 09:00:00 0.242661 0.839820 0.449634 0.484678
1 10:00:00 0.809247 1.545173 1.129107 1.147414
2 11:00:00 1.519679 2.051029 1.766170 1.699770
3 12:00:00 1.748682 2.291056 2.018005 1.879778
4 13:00:00 1.790151 2.384782 2.123876 1.913225
5 14:00:00 1.966337 2.612614 2.344493 2.094139
6 15:00:00 2.295261 3.030992 2.686752 2.503890
7 16:00:00 2.412628 3.232904 2.772683 2.737191
8 17:00:00 2.476746 3.354741 2.781410 2.923059
I would like to plot the columns WB, NB, WBO and SBO against Time, using the Time column for the x-tick labels.
Below is how I am doing it at the moment, but it does not look right to me because I am setting the labels manually from a list I created for the x-ticks. Is there a better way of doing this?
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
a = np.arange(9)
plt.scatter(a,df.WB,color='r',alpha=1)
plt.plot(a,df.NB, color='g',alpha=1)
plt.plot(a,df.WBO, color='k',alpha=1)
plt.plot(a,df.SBO,color='m',alpha=1)
plt.xlabel('Time (hr)')
plt.ylabel('Energy Consumption (kWh)')
labels = ['9:00', '10:00', '11:00', '12:00','13:00', '14:00', '15:00', '16:00','17:00']
plt.xticks(a,labels,rotation='horizontal')
plt.grid(True)
plt.tick_params(which='major', length=4)
plt.tick_params(which='minor', length=4)
plt.show()

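Well, this would be the simplest way to do it. Convert Time to strings first so that pandas plots the rows at positions 0..n-1, then pass those positions and the labels to plt.xticks (its first argument is the tick locations, not the labels):
df['Time'] = df['Time'].astype(str)
df.set_index('Time').plot()
plt.xticks(range(len(df)), df['Time'], rotation=50)
plt.show()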
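For a version that runs without the Excel file, here is a minimal sketch. The small DataFrame is a hypothetical stand-in built from the first rows printed in the question, and the Time column is assumed to hold strings; the Agg backend is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the first rows of the sheet shown in the question.
df = pd.DataFrame({
    "Time": ["09:00:00", "10:00:00", "11:00:00", "12:00:00"],
    "WB": [0.242661, 0.809247, 1.519679, 1.748682],
    "NB": [0.839820, 1.545173, 2.051029, 2.291056],
})

# With a string index, pandas draws the rows at integer positions 0..n-1.
ax = df.set_index("Time").plot()

# Put one tick on every row and label it with HH:MM taken from the column.
positions = list(range(len(df)))
ax.set_xticks(positions)
ax.set_xticklabels(df["Time"].str[:5].tolist(), rotation=50)
ax.set_xlabel("Time (hr)")
ax.set_ylabel("Energy Consumption (kWh)")

tick_text = [t.get_text() for t in ax.get_xticklabels()]
```

The labels come straight from the column, so nothing has to be typed by hand when the sheet gains or loses rows.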

Related

How can I plot different length pandas series with matplotlib?

I've got two pandas series, one with a 7-day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7-day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph shows up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the series look like:
These are the first 15 values of the 7-day rolling mean series, which has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!
The problem is that pandas bar plots are categorical (Bars are at subsequent integer positions). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical coordinates as well. This is not possible with pandas, but is the default behaviour with matplotlib.
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
import pandas as pd
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1)))+14, index=t1)
s1[:6] = np.nan
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2))*15+5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()
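To see the categorical behaviour described above, the following sketch inspects where pandas actually puts the bars. The monthly series is hypothetical (12 month-start dates with made-up values); the point is only that the bar centres come out as 0, 1, 2, ... rather than as dates, which is why a daily line cannot share that axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly series, similar in shape to the one in the question.
months = pd.date_range("2016-01-01", periods=12, freq="MS")
monthly = pd.Series(range(12), index=months, dtype=float)

fig, ax = plt.subplots()
monthly.plot(kind="bar", ax=ax)

# Centre of each bar along x: integer positions, not date values.
bar_centres = [round(p.get_x() + p.get_width() / 2, 6) for p in ax.patches]
print(bar_centres)
```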
The problem might be the two series' indices are of very different scales. You can use ax.twiny to plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()
Output:

How to extract hour:minute from a datetime stamp in Python

I have a dataframe df as given below:
POA ... Inverter efficiency
2019-01-25 08:00:00 20.608713 ... 0.708626
2019-01-29 08:00:00 200.250137 ... 0.017787
2019-01-29 08:30:00 347.699615 ... 0.000000
2019-01-29 09:00:00 492.822662 ... 0.000000
2019-01-29 09:30:00 620.336243 ...
.
.
2019-03-07 13:00:00 1151.468384 ... 1.067493
2019-03-07 13:30:00 1119.876831 ... 2.311577
2019-03-07 14:00:00 1038.760864 ... 3.395081
I want to draw a 24-hour plot covering all days. My code:
plot(df.index.hour,df['POA'])
Result is:
However, there is data at 08:30, 09:30, etc., and it is not reflected in the plot. In fact, these half-hour data points are merged with the 08:00, 09:00, etc. data. So my question is: how do I show the 08:30, 09:30, etc. data on the plot as well? (It looks like I have to extract both the hour and the minute from the same datetime.)
The answer below, which I accepted, gives the plot I wanted. But the x-axis ticks are crowded together and do not appear as they did in my first plot above. How do I correct the x-axis ticks in my second plot?
#rng = pd.date_range('1/5/2018 00:00', periods=5, freq='35T')
#df = pd.DataFrame({'POA':randint(1, 10, 5)}, index=rng)
labels = df.index.strftime('%H:%M')
x = np.arange(len(labels))
plt.plot(x, df['POA'])
plt.xticks(x, labels)
Steps:
labels = df.index.strftime('%H:%M') => Convert the datetime to "Hours:minutes" format to use as x labels
x = np.arange(len(labels)) => Create a dummy x axis for matplotlib
plt.plot(x, df['POA']) => Make the plot
plt.xticks(x, labels) => Replace the x labels with datetime
Assumption: The datetime index is sorted, if not the graph will be messed up. If the index is not in sorted order then sort it before plotting for correct results.
We can further enhance the x axis to include seconds, dates, etc by using the appropriate string formatter in df.index.strftime
Solution with skipping x-ticks to avoid clubbed x labels
#rng = pd.date_range('1/5/2018 00:00', periods=50, freq='35T')
#df = pd.DataFrame({'POA':randint(1, 10, 50)}, index=rng)
labels = df.index.strftime('%H:%M')
x = np.arange(len(labels))
fig, ax = plt.subplots()
plt.plot(x, df['POA'])
plt.xticks(x, labels)
skip_every_n = 10
for i, x_label in enumerate(ax.xaxis.get_ticklabels()):
    if i % skip_every_n != 0:
        x_label.set_visible(False)
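A variant of the same idea, sketched here under the assumption that only every n-th label is wanted: instead of creating all ticks and hiding most of them, slice both the positions and the labels before setting them. The data below is hypothetical (50 readings at 35-minute spacing, as in the commented-out setup lines), and `step` is an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 50 readings at 35-minute spacing.
rng = pd.date_range("2018-01-05 00:00", periods=50, freq="35min")
df = pd.DataFrame({"POA": np.arange(50)}, index=rng)

labels = df.index.strftime("%H:%M")
x = np.arange(len(labels))
step = 10  # keep one tick per 10 data points

fig, ax = plt.subplots()
ax.plot(x, df["POA"])
ax.set_xticks(x[::step])                   # tick positions 0, 10, 20, ...
ax.set_xticklabels(list(labels[::step]))   # matching HH:MM labels

shown = [t.get_text() for t in ax.get_xticklabels()]
```

This also removes the tick marks for the skipped points, not just their text, which may or may not be what you want.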

Arrange pandas DataFrame for color Plotting

I have a dataframe which looks like this (left column is the index):
YYYY-MO-DD HH-MI-SS_SSS ATMOSPHERIC PRESSURE (hPa) mean
2016-11-07 14:00:00 1014.028782
2016-11-07 15:00:00 1014.034111
.... ....
2016-11-30 09:00:00 1006.516436
2016-11-30 10:00:00 1006.216156
Now I want to plot a colormap with this data - so I want to create an X (horizontal axis) to be just the dates:
2016-11-07, 2016-11-08,...,2016-11-30
and the Y (Vertical axis) to be the time:
00:00:00, 01:00:00, 02:00:00, ..., 23:00:00
And finally the Z (color map) to be the pressure data for each date and time [f(x,y)].
How can I arrange the data for this kind of plotting ?
Thank you !
With test data prepared like so:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
samples = 24 * 365
index = pd.date_range('2017-01-01', freq='1H', periods=samples)
data = pd.DataFrame(np.random.rand(samples), index=index, columns=['data'])
I would do something like this:
data = data.reset_index()
data['date'] = data['index'].apply(lambda x: x.date())
data['time'] = data['index'].apply(lambda x: x.time())
pivoted = data.pivot(index='time', columns='date', values='data')
fig, ax = plt.subplots(1, 1)
ax.imshow(pivoted, origin='lower', cmap='viridis')
plt.show()
Which produces:
To improve the axis labeling, this is a start:
ax.set_yticklabels(['{:%H:%M:%S}'.format(t) for t in data['time'].unique()])
ax.set_xticklabels(['{:%Y-%m-%d}'.format(t) for t in data['date'].unique()])
but you'll need to figure out how to choose how often a label appears with set_xticks() and set_yticks()
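One possible way to choose how often a label appears, as a sketch rather than a definitive recipe: imshow draws column j at x=j and row i at y=i, so ticks can be set at every k-th position and labelled from the pivoted frame's own index and columns. The 28 days of random hourly data and the spacings (weekly on x, six-hourly on y) are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical hourly data covering 28 full days.
index = pd.date_range("2017-01-01", periods=24 * 28, freq=pd.Timedelta(hours=1))
data = pd.DataFrame({"data": np.random.rand(len(index))}, index=index)

# Rows = time of day, columns = calendar date.
data["date"] = data.index.date
data["time"] = data.index.time
pivoted = data.pivot(index="time", columns="date", values="data")

fig, ax = plt.subplots()
ax.imshow(pivoted.to_numpy(), origin="lower", aspect="auto", cmap="viridis")

# Tick positions are row/column numbers; labels are looked up from the frame.
x_pos = np.arange(0, pivoted.shape[1], 7)  # one label per week
y_pos = np.arange(0, pivoted.shape[0], 6)  # one label every 6 hours
ax.set_xticks(x_pos)
ax.set_xticklabels([pivoted.columns[j].strftime("%Y-%m-%d") for j in x_pos])
ax.set_yticks(y_pos)
ax.set_yticklabels([pivoted.index[i].strftime("%H:%M") for i in y_pos])

x_text = [t.get_text() for t in ax.get_xticklabels()]
y_text = [t.get_text() for t in ax.get_yticklabels()]
```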

Pandas dataframe groupby plot

I have a dataframe which is structured as:
Date ticker adj_close
0 2016-11-21 AAPL 111.730
1 2016-11-22 AAPL 111.800
2 2016-11-23 AAPL 111.230
3 2016-11-25 AAPL 111.790
4 2016-11-28 AAPL 111.570
...
8 2016-11-21 ACN 119.680
9 2016-11-22 ACN 119.480
10 2016-11-23 ACN 119.820
11 2016-11-25 ACN 120.740
...
How can I plot based on the ticker the adj_close versus Date?
Simple plot,
you can use:
df.plot(x='Date',y='adj_close')
Or you can set the index to be Date beforehand, then it's easy to plot the column you want:
df.set_index('Date', inplace=True)
df['adj_close'].plot()
If you want a chart with one series by ticker on it
You need to groupby before:
df.set_index('Date', inplace=True)
df.groupby('ticker')['adj_close'].plot(legend=True)
If you want a chart with individual subplots:
grouped = df.groupby('ticker')
ncols=2
nrows = int(np.ceil(grouped.ngroups/ncols))
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(12,4), sharey=True)
for (key, ax) in zip(grouped.groups.keys(), axes.flatten()):
    grouped.get_group(key).plot(ax=ax)
ax.legend()
plt.show()
Similar to Julien's answer above, I had success with the following:
fig, ax = plt.subplots(figsize=(10,4))
for key, grp in df.groupby('ticker'):
    ax.plot(grp['Date'], grp['adj_close'], label=key)
ax.legend()
plt.show()
This solution might be more relevant if you want more control in matplotlib.
Solution inspired by: https://stackoverflow.com/a/52526454/10521959
The question is How can I plot based on the ticker the adj_close versus Date?
This can be accomplished by reshaping the dataframe to a wide format with .pivot or .groupby, or by plotting the existing long form dataframe directly with seaborn.
In the following sample data, the 'Date' column has a datetime64[ns] Dtype.
Convert the Dtype with pandas.to_datetime if needed.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
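As a small illustration of the Dtype conversion mentioned above, the sketch below starts from a hypothetical long-form frame whose 'Date' column holds plain strings (values copied from the question) and converts it with pandas.to_datetime.

```python
import pandas as pd

# Hypothetical long-form frame where 'Date' was read in as plain strings.
df = pd.DataFrame({
    "Date": ["2016-11-21", "2016-11-22", "2016-11-21", "2016-11-22"],
    "ticker": ["AAPL", "AAPL", "ACN", "ACN"],
    "adj_close": [111.73, 111.80, 119.68, 119.48],
})

# Convert the strings to datetime64 so plotting treats the axis as dates.
df["Date"] = pd.to_datetime(df["Date"])
is_datetime = pd.api.types.is_datetime64_any_dtype(df["Date"])
print(df.dtypes)
```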
Imports and Sample Data
import pandas as pd
import pandas_datareader as web # for sample data; this can be installed with conda if using Anaconda, otherwise pip
import seaborn as sns
import matplotlib.pyplot as plt
# sample stock data, where .iloc[:, [5, 6]] selects only the 'Adj Close' and 'tkr' column
tickers = ['aapl', 'acn']
df = pd.concat((web.DataReader(ticker, data_source='yahoo', start='2020-01-01', end='2022-06-21')
.assign(ticker=ticker) for ticker in tickers)).iloc[:, [5, 6]]
# display(df.head())
Date Adj Close ticker
0 2020-01-02 73.785904 aapl
1 2020-01-03 73.068573 aapl
2 2020-01-06 73.650795 aapl
3 2020-01-07 73.304420 aapl
4 2020-01-08 74.483604 aapl
# display(df.tail())
Date Adj Close ticker
1239 2022-06-14 275.119995 acn
1240 2022-06-15 281.190002 acn
1241 2022-06-16 270.899994 acn
1242 2022-06-17 275.380005 acn
1243 2022-06-21 282.730011 acn
pandas.DataFrame.pivot & pandas.DataFrame.plot
pandas plots with matplotlib as the default backend.
Reshaping the dataframe with pandas.DataFrame.pivot converts from long to wide form, and puts the dataframe into the correct format to plot.
.pivot does not aggregate data, so if there is more than 1 observation per index, per ticker, then use .pivot_table
Adding subplots=True will produce a figure with two subplots.
# reshape the long form data into a wide form
dfp = df.pivot(index='Date', columns='ticker', values='Adj Close')
# display(dfp.head())
ticker aapl acn
Date
2020-01-02 73.785904 203.171112
2020-01-03 73.068573 202.832764
2020-01-06 73.650795 201.508224
2020-01-07 73.304420 197.157654
2020-01-08 74.483604 197.544434
# plot
ax = dfp.plot(figsize=(11, 6))
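The difference between .pivot and .pivot_table noted above can be sketched with a tiny hypothetical frame that contains two observations for the same Date and ticker. The values are made up; aggfunc="mean" is spelled out even though it is the default.

```python
import pandas as pd

# Hypothetical long-form data with a duplicated (Date, ticker) pair.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-02", "2020-01-02", "2020-01-03"]),
    "ticker": ["aapl", "aapl", "acn", "aapl"],
    "Adj Close": [70.0, 74.0, 200.0, 73.0],
})

try:
    df.pivot(index="Date", columns="ticker", values="Adj Close")
    pivot_failed = False
except ValueError:
    pivot_failed = True  # .pivot refuses duplicate index/column pairs

# .pivot_table aggregates the duplicates instead of failing.
dfp = df.pivot_table(index="Date", columns="ticker", values="Adj Close", aggfunc="mean")
print(dfp)
```

Combinations with no observation at all (here acn on 2020-01-03) come out as NaN in the wide frame.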
Use seaborn, which accepts long form data, so reshaping the dataframe to a wide form isn't necessary.
seaborn is a high-level api for matplotlib
sns.lineplot: axes-level plot
fig, ax = plt.subplots(figsize=(11, 6))
sns.lineplot(data=df, x='Date', y='Adj Close', hue='ticker', ax=ax)
sns.relplot: figure-level plot
Adding row='ticker', or col='ticker', will generate a figure with two subplots.
g = sns.relplot(kind='line', data=df, x='Date', y='Adj Close', hue='ticker', aspect=1.75)

Remove interpolation Time series plot for missing values

I'm trying to plot a time series data but I have some problems.
I'm using this code:
from matplotlib import pyplot as plt
plt.figure('Fig')
plt.plot(data.index,data.Colum,'g', linewidth=2.0,label='Data')
And I get this:
But I don't want the interpolation between missing values!
How can I achieve this?
Since you are using pandas you could do something like this:
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(1234)
idx = pd.date_range(end=datetime.today().date(), periods=10, freq='D')
# float dtype so that NaN can be assigned without changing the dtype
vals = pd.Series(np.random.randint(1, 10, size=idx.size), index=idx, dtype=float)
vals.iloc[4:8] = np.nan
print(vals)
Here is an example of a column from a DataFrame with DatetimeIndex
2016-03-29 4.0
2016-03-30 7.0
2016-03-31 6.0
2016-04-01 5.0
2016-04-02 NaN
2016-04-03 NaN
2016-04-04 NaN
2016-04-05 NaN
2016-04-06 9.0
2016-04-07 1.0
Freq: D, dtype: float64
To plot it without dates where data is NaN you could do something like this:
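fig, ax = plt.subplots()
clean = vals.dropna()
ax.plot(range(clean.size), clean)
# fix the tick positions first so that each label lands on its own point
ax.set_xticks(range(clean.size))
ax.set_xticklabels(clean.index.strftime('%Y-%m-%d'))
fig.autofmt_xdate()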
Which should produce a plot like this:
The trick here is to replace the dates with some range of values that do not trigger matplotlib's internal date processing when you call .plot method.
Later, when the plotting is done, replace the ticklabels with actual dates.
Optionally, call .autofmt_xdate() to make labels readable.
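A different technique from the one in the answer above, sketched under an assumption the question does not confirm: if the line is drawn across the gap because the missing days are absent from the index (rather than present as NaN), reindexing to a regular daily grid with Series.asfreq inserts NaN rows, and matplotlib does not connect points across NaN. The dates and values are hypothetical, loosely following the printed example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical series in which four days are simply absent from the index.
idx = pd.to_datetime(["2016-03-29", "2016-03-30", "2016-03-31", "2016-04-01",
                      "2016-04-06", "2016-04-07"])
vals = pd.Series([4.0, 7.0, 6.0, 5.0, 9.0, 1.0], index=idx)

# Reindex to a regular daily grid; the absent days become NaN rows.
regular = vals.asfreq("D")

fig, ax = plt.subplots()
line, = ax.plot(regular.index, regular.to_numpy(), "g", linewidth=2.0, label="Data")
fig.autofmt_xdate()

n_missing = int(regular.isna().sum())
```

Unlike the dropna approach, this keeps the true time spacing on the x-axis and leaves a visible gap where the data is missing.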
