Customizing x axis for time series based data using Matplotlib - python

I am new to Python programming, and to Matplotlib in particular. I am working on a data set for which I need the x axis labelled in the format YYYY-MM-DD HH:MM:SS. I have tried a few methods without success. My code is as follows:
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import dates as mpl_dates
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
Radio Network Availability Rate(%)
Time
2019-10-14 00:00:00 99.7144
2019-10-14 01:00:00 99.7144
2019-10-14 02:00:00 99.7144
2019-10-14 03:00:00 99.7144
2019-10-14 04:00:00 99.7144
... ...
2019-10-20 19:00:00 99.7403
2019-10-20 20:00:00 99.7403
2019-10-20 21:00:00 99.7404
2019-10-20 22:00:00 99.7403
2019-10-20 23:00:00 99.7403
fig, ax = plt.subplots(figsize=(8,6))
data['TPG_Radio Network Availability Rate(%)'].plot(style='r.-', title='TPG Network Availability')
plt.ylabel('Availability %')
plt.show()
I would need the output plot to be as below for the x-axis:

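Try adding the line below before plt.show(). The first argument must be a sequence of tick positions rather than a single number, and this form assumes the points are plotted against integer positions 0, 1, 2, ...:
plt.xticks(range(len(data.index)), data.index)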

This helped with what I was looking for:
import matplotlib.ticker as plticker  # needed for MultipleLocator below
avai = data['TPG_Radio Network Availability Rate(%)']
fig, ax = plt.subplots(figsize=(12,9), dpi=100)
plt.plot(avai, color='r')
plt.ylabel('Availability %')
plt.xlabel('Time')
plt.title('TPG Network Availability')
loc = plticker.MultipleLocator(base=4.0)  # one major tick every 4 x-axis units
ax.xaxis.set_major_locator(loc)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
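
For the exact YYYY-MM-DD HH:MM:SS tick labels asked for in the question, a minimal sketch along the following lines might work. It assumes the index is a DatetimeIndex (or has been converted with pd.to_datetime). The synthetic data, the helper name plot_availability, and the 12-hour tick spacing are made up for illustration and are not from the thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_availability(series, hours_between_ticks=12):
    """Plot a datetime-indexed series with 'YYYY-MM-DD HH:MM:SS' tick labels."""
    fig, ax = plt.subplots(figsize=(12, 9))
    # Plot with Matplotlib directly so the axis uses Matplotlib's own date units.
    ax.plot(series.index, series.to_numpy(), "r.-")
    # byhour pins the ticks to fixed clock hours (00:00 and 12:00 by default).
    ax.xaxis.set_major_locator(
        mdates.HourLocator(byhour=range(0, 24, hours_between_ticks))
    )
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M:%S"))
    ax.set_xlabel("Time")
    ax.set_ylabel("Availability %")
    ax.set_title("TPG Network Availability")
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    return fig, ax


# Synthetic stand-in for the hourly data shown in the question.
index = pd.date_range(
    "2019-10-14 00:00:00", "2019-10-20 23:00:00", freq=pd.Timedelta(hours=1)
)
rng = np.random.default_rng(0)
series = pd.Series(99.7 + 0.05 * rng.random(len(index)), index=index)

fig, ax = plot_availability(series)
```

Calling plt.show() (with an interactive backend) or fig.savefig(...) afterwards displays or stores the figure. A smaller hours_between_ticks gives denser labels.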

Related

Subplots with Time Series

I am struggling to make time series subplots look the way I want. Below is what I have tried and where each attempt falls short.
This is a sample of the data I am working with, currently in a pandas DataFrame.
Price_REG1 Price_REG2 Price_REG3 Price_REG4
date
2020-01-01 00:00:00 30.83 30.83 30.83 30.83
2020-01-01 01:00:00 28.78 28.78 28.78 28.78
2020-01-01 02:00:00 28.45 28.45 28.45 28.45
2020-01-01 03:00:00 27.90 27.90 27.90 27.90
2020-01-01 04:00:00 27.52 27.52 27.52 27.52
What I want to do is to plot subplots for these four columns, one with a normal plot and one with a histogram. My plot code goes like this:
df.plot(subplots=True, color=['grey', 'grey', 'grey', 'grey'],
        figsize=(6, 6), lw=0.8, xlabel='', legend=False)
plt.legend(["AA", "BBB", "AAA", "BBB"]);
My only problem here right now is that the legend only shows on the last plot for some reason.
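
A likely reason is that plt.legend acts only on the current Axes, which after df.plot(subplots=True) is the last subplot created. One possible workaround, sketched here with random placeholder data and made-up label names, is to keep the array of Axes that df.plot returns and call legend on each one:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data shaped like the sample in the question (values are random).
index = pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(30, 2, size=(len(index), 4)),
    index=index,
    columns=["Price_REG1", "Price_REG2", "Price_REG3", "Price_REG4"],
)
df.index.name = "date"

# With subplots=True, df.plot returns one Axes per column.
axes = df.plot(subplots=True, color="grey", figsize=(6, 6), lw=0.8, legend=False)

# Hypothetical legend labels, one per subplot.
labels = ["Region 1", "Region 2", "Region 3", "Region 4"]
for ax, label in zip(axes, labels):
    ax.set_xlabel("")
    ax.legend([label], loc="upper right")
```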
My first Hist code:
fig, ax = plt.subplots(2, 2, sharex='col', sharey='row')
m = 0
for i in range(2):
    for j in range(2):
        df.hist(column=df.columns[m], grid=False, color='grey',
                bins=150, ax=ax[i, j], figsize=(20, 20))
        m += 1
Here I would like to remove the titles and add legends, or change the titles to "Price Region 1" etc.
My second Hist code is this:
# df.plot returns an array of Axes here, not a (fig, ax) pair
axes = df.plot(kind='hist', bins=150, subplots=True, sharex='col', sharey='row',
               title=False, layout=(2, 2), legend=True)
Here I want to remove the y label and change the legends/add titles instead of legends.
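
One way to get full control over titles and labels is to draw the histograms with Matplotlib's own ax.hist, which adds neither a title nor a y label by default. The following is only a sketch: the data is random, and the "Price Region N" titles follow the wording used in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data with the same column names as in the question.
rng = np.random.default_rng(1)
df = pd.DataFrame(
    rng.normal(30, 2, size=(500, 4)),
    columns=["Price_REG1", "Price_REG2", "Price_REG3", "Price_REG4"],
)

titles = [f"Price Region {i}" for i in range(1, 5)]

fig, axes = plt.subplots(2, 2, sharex="col", sharey="row", figsize=(8, 8))
# axes.flat walks the 2x2 grid row by row, so no manual counter is needed.
for ax, column, title in zip(axes.flat, df.columns, titles):
    ax.hist(df[column], bins=150, color="grey")
    ax.set_title(title)  # custom title instead of the column name
    ax.set_ylabel("")    # keep the y axis unlabelled
fig.tight_layout()
```

To use a legend instead of a title, pass label=title to ax.hist and call ax.legend() in place of ax.set_title(title).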

Weird time series plot with Python when adding date to x-axis

I'm using Matplotlib's pyplot to plot a time series of about 15000 observations. When I plot it without passing any x-axis values:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(15,10)})
sns.set_palette("husl")
sns.set_style('whitegrid')
plt.figure(figsize=(20, 5), dpi=80)
plt.plot(df['INTC'])
plt.show()
I get this, which is the plot I expect
The problem is that when I add the date as the x-axis values:
plt.figure(figsize=(20, 5), dpi=80)
plt.plot(df['Date'],df['INTC'])
plt.show()
The same time series gets plotted in a weird manner:
The df looks like this:
index Date INTC
0 2022-02-04 09:30:00 47.77
1 2022-02-04 09:31:00 47.96
2 2022-02-04 09:32:00 47.81
3 2022-02-04 09:33:00 47.73
4 2022-02-04 09:34:00 47.57
...
Every observation has a time separation of 1 minute. What should I do to plot it properly including the date points in the x-axis? Thanks.
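
Without the figures it is hard to say what "weird" means here, but two common causes are a Date column stored as strings rather than datetimes, and long straight segments drawn across the hours when no observations exist (for example overnight). The sketch below is an assumption-laden illustration covering both: it parses the dates, plots against the row position so gaps take no space, and then labels a few ticks with the real timestamps. The helper name plot_without_gaps and the synthetic two-session data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_without_gaps(df, date_col="Date", value_col="INTC", n_ticks=6):
    """Plot values against row position and label ticks with the real dates."""
    dates = pd.to_datetime(df[date_col])  # make sure the dates are real datetimes
    positions = np.arange(len(df))
    fig, ax = plt.subplots(figsize=(20, 5))
    ax.plot(positions, df[value_col].to_numpy())
    # Choose a handful of evenly spaced rows and show their timestamps.
    tick_pos = np.linspace(0, len(df) - 1, n_ticks).astype(int)
    ax.set_xticks(tick_pos)
    ax.set_xticklabels(
        list(dates.iloc[tick_pos].dt.strftime("%Y-%m-%d %H:%M")),
        rotation=45,
        ha="right",
    )
    fig.tight_layout()
    return fig, ax


# Synthetic data: two sessions of 1-minute bars, with the dates stored as strings.
one_minute = pd.Timedelta(minutes=1)
session_1 = pd.date_range("2022-02-04 09:30", periods=391, freq=one_minute)
session_2 = pd.date_range("2022-02-07 09:30", periods=391, freq=one_minute)
stamps = session_1.append(session_2)
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "Date": stamps.strftime("%Y-%m-%d %H:%M:%S"),
    "INTC": 47.5 + np.cumsum(rng.normal(0, 0.02, len(stamps))),
})

fig, ax = plot_without_gaps(df)
```

If the gaps are not the issue and only the dtype is, converting with pd.to_datetime and calling plt.plot(dates, df['INTC']) may already be enough.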

Smart way of creating multiple graphs using matplotlib

I have an excel worksheet, let us say its name is 'ws_actual'. The data looks as below.
Project Name  Date Paid            Actuals Item Amount  Cumulative Sum
A             2016-04-10 00:00:00  124.2                124.2
A             2016-04-27 00:00:00  2727.5               2851.7
A             2016-05-11 00:00:00  2123.58              4975.28
A             2016-05-24 00:00:00  2500                 7475.28
A             2016-07-07 00:00:00  38374.6              45849.88
A             2016-08-12 00:00:00  2988.14              48838.02
A             2016-09-02 00:00:00  23068                71906.02
A             2016-10-31 00:00:00  570.78               72476.8
A             2016-11-09 00:00:00  10885.75             83362.55
A             2016-12-08 00:00:00  28302.95             111665.5
A             2017-01-19 00:00:00  4354.3               116019.8
A             2017-02-28 00:00:00  3469.77              119489.57
A             2017-03-29 00:00:00  267.75               119757.32
B             2015-04-27 00:00:00  2969.93              2969.93
B             2015-06-02 00:00:00  118.8                3088.73
B             2015-06-18 00:00:00  2640                 5728.73
B             2015-06-26 00:00:00  105.6                5834.33
B             2015-09-03 00:00:00  11879.7              17714.03
B             2015-10-22 00:00:00  5303.44              23017.47
B             2015-11-08 00:00:00  52000                75017.47
B             2015-11-25 00:00:00  2704.13              77721.6
B             2016-03-09 00:00:00  59752.85             137474.45
B             2016-03-13 00:00:00  512.73               137987.18
...
Let us say there are many more projects besides A and B, each with Date Paid and Amount information. I would like to create one plot per project, with 'Date Paid' on the x axis and 'Cumulative Sum' on the y axis. When I run the following code, however, it combines every project and plots every 'Cumulative Sum' on a single graph. I wonder if I need to split the table by project, save each part, and then load them one by one to plot the graphs. That is a lot of work, so I am wondering if there is a smarter way to do it. Please help me, genius.
import pandas as pd
import matplotlib.pyplot as plt
ws_actual = pd.read_excel(actual_file[0], sheet_name=0)
ax = ws_actual.plot(x='Date Paid', y='Cumulative Sum', color='g')
Right now you are connecting all of the points, regardless of group. A simple loop will work here: group the DataFrame and plot each group as a separate curve. If you have a lot of groups, you can define your own color cycle so that colors do not repeat.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8,8))
# one curve per project, all drawn on the same axes
for name, gp in ws_actual.groupby('Project Name'):
    gp.plot(x='Date Paid', y='Cumulative Sum', ax=ax, label=name)
plt.show()
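
The custom color cycle mentioned above could look roughly like the sketch below. It samples one color per group from a colormap and installs it with ax.set_prop_cycle. The helper name plot_by_project and the choice of the viridis colormap are assumptions for illustration; the six data rows are taken from the table in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_by_project(df, group_col="Project Name", x="Date Paid", y="Cumulative Sum"):
    """Draw one curve per group, each with its own color from a colormap."""
    groups = df.groupby(group_col)
    cmap = plt.get_cmap("viridis")
    # One evenly spaced color per group, so colors never repeat.
    colors = [mcolors.to_hex(cmap(v)) for v in np.linspace(0, 1, groups.ngroups)]
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.set_prop_cycle(color=colors)
    for name, group in groups:
        ax.plot(group[x], group[y], label=name)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(title=group_col)
    return fig, ax


# A few rows copied from the question's table.
ws_actual = pd.DataFrame({
    "Project Name": ["A", "A", "A", "B", "B", "B"],
    "Date Paid": pd.to_datetime([
        "2016-04-10", "2016-04-27", "2016-05-11",
        "2015-04-27", "2015-06-02", "2015-06-18",
    ]),
    "Cumulative Sum": [124.2, 2851.7, 4975.28, 2969.93, 3088.73, 5728.73],
})

fig, ax = plot_by_project(ws_actual)
```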
You could just iterate over the projects:
for proj in ws_actual['Project Name'].unique():
    # each .plot() call opens its own figure
    ws_actual[ws_actual['Project Name'] == proj].plot(x='Date Paid', y='Cumulative Sum', color='g')
plt.show()
Or check out seaborn for an easy way to make a facet grid, for which you can set a row variable. Something along the lines of:
import seaborn as sns
g = sns.FacetGrid(ws_actual, row="Project Name")
g = g.map(plt.scatter, "Date Paid", "Cumulative Sum", edgecolor="w")

Cannot plot predicted time series values using matplotlib

I am trying to plot my actual time series values and predicted values but it gives me this error:
ValueError: view limit minimum -36816.95989583333 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
I am using statsmodels to fit an arima model to the data.
This is a sample of my data:
datetime value
2017-01-01 00:00:00 10.18
2017-01-01 00:15:00 10.2
2017-01-01 00:30:00 10.32
2017-01-01 00:45:00 10.16
2017-01-01 01:00:00 9.93
2017-01-01 01:15:00 9.77
2017-01-01 01:30:00 9.47
2017-01-01 01:45:00 9.08
This is my code:
mod = sm.tsa.statespace.SARIMAX(
    subset,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False
)
results = mod.fit()
pred_uc = results.get_forecast(steps=500)
pred_ci = pred_uc.conf_int(alpha = 0.05)
# Plot
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(1, 1, 1)
ax.plot(subset,color = "blue")
ax.plot(pred_uc.predicted_mean, color="black", alpha=0.5, label='SARIMAX')
plt.show()
Any idea how to fix this?
The problem is most likely in how you provide the data.
The datetime values must be the index of your subset variable; with that in place, the following works.
I created the data as follows, right before the code you provided:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
subset = pd.Series(
    [10.18, 10.2, 10.32, 10.16, 9.93, 9.77, 9.47, 9.08],
    index=pd.date_range(
        start='2017-01-01T00:00:00.000',
        end='2017-01-01T01:45:00.000',
        freq='15min'  # same as '15T', which newer pandas versions deprecate
    )
)
And I got what I believe is your desired plot (it's cut off).
I used these versions of the libraries, in Python 3:
matplotlib.__version__
'3.1.2'
numpy.__version__
'1.17.4'
pandas.__version__
'0.25.3'
statsmodels.__version__
'0.12.0'
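
If the data starts out as a table with datetime and value columns, as in the question's sample, one way to get a series indexed by datetime might be the sketch below. The helper name to_time_series is hypothetical, and the 15-minute frequency is read off the sample rows rather than stated in the question.

```python
import pandas as pd

# The sample rows from the question, as they might look after loading.
raw = pd.DataFrame({
    "datetime": [
        "2017-01-01 00:00:00", "2017-01-01 00:15:00",
        "2017-01-01 00:30:00", "2017-01-01 00:45:00",
        "2017-01-01 01:00:00", "2017-01-01 01:15:00",
        "2017-01-01 01:30:00", "2017-01-01 01:45:00",
    ],
    "value": [10.18, 10.2, 10.32, 10.16, 9.93, 9.77, 9.47, 9.08],
})


def to_time_series(frame, time_col="datetime", value_col="value", freq="15min"):
    """Return value_col as a Series indexed by a DatetimeIndex with a set frequency."""
    series = frame.set_index(pd.to_datetime(frame[time_col]))[value_col]
    # asfreq attaches an explicit frequency; missing timestamps would become NaN.
    return series.sort_index().asfreq(freq)


subset = to_time_series(raw)
```

A series built this way carries both the timestamps and their spacing in its index, so the observed and forecast values passed to ax.plot should share the same datetime axis.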

Struggling with formatting xticks with DateTime from Pandas DataFrame

I'm currently struggling with my x-axis format: the dates don't start in 2017 but somewhere in the year 1138.
Have I done something wrong?
My tick labels should look like Thu 01.06.2017, Fri 02.06.2017, etc.
%pylab inline
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR, SA, SU, DayLocator, DateFormatter
Gesamt_Apr_Sept_2017 = pd.read_csv('Gesamtstromverbrauch_Met_01.04.2017-30.09.2017-1h.csv',sep=';',decimal = ",", thousands = '.', index_col=0, parse_dates=True, dayfirst=True)
Daten = Gesamt_Apr_Sept_2017['6/1/2017':'06/30/2017']
# Create the figure
fig, ax = plt.subplots(figsize=(15,6))
Daten['LKHF-Strom-Met - Gesamt (kWh)'].plot()
ax.xaxis.set_major_locator(DayLocator(interval=7))
ax.xaxis.set_major_formatter(DateFormatter('%a %d.%m.%Y'))
for tick in ax.get_xticklabels():
    tick.set_rotation(90)
diagram: https://imgur.com/a/d4KQQ
The Dataframe looks like:
Date
2017-04-01 01:00:00 1008.0
2017-04-01 02:00:00 996.0
2017-04-01 03:00:00 976.0
2017-04-01 04:00:00 984.0
2017-04-01 05:00:00 1024.0
dtype='datetime64[ns]', name='Date', length=720, freq=None)
Just replace Daten['LKHF-Strom-Met - Gesamt (kWh)'].plot() with Daten[['LKHF-Strom-Met - Gesamt (kWh)']].plot() to fix your error.
Short explanation:
df[col] returns a pandas Series and df[[col]] a pandas DataFrame, and for the DataFrame the plot function uses the index as the x-axis.
%pylab inline
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR, SA, SU, DayLocator, DateFormatter
Gesamt_Apr_Sept_2017 = pd.read_csv('Gesamtstromverbrauch_Met_01.04.2017-30.09.2017-1h.csv',sep=';',decimal = ",", thousands = '.', index_col=0, parse_dates=True, dayfirst=True)
Daten = Gesamt_Apr_Sept_2017['6/1/2017':'06/30/2017']
# Create the figure
fig, ax = plt.subplots(figsize=(15,6))
Daten[['LKHF-Strom-Met - Gesamt (kWh)']].plot()
ax.xaxis.set_major_locator(DayLocator(interval=7))
ax.xaxis.set_major_formatter(DateFormatter('%a %d.%m.%Y'))
for tick in ax.get_xticklabels():
    tick.set_rotation(90)
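
Another approach often suggested for this kind of mismatch is offered here as an assumption about the cause, not something confirmed in the thread. For regularly spaced time series, pandas may use its own date units on the x-axis, which Matplotlib's DayLocator and DateFormatter do not expect. Plotting with Matplotlib directly, or passing x_compat=True to the pandas plot call, keeps the axis in Matplotlib's date units. A minimal sketch with synthetic hourly data and a shortened, hypothetical column name:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic hourly data for June 2017 (values are random placeholders).
index = pd.date_range("2017-06-01 00:00", "2017-06-30 23:00", freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(3)
daten = pd.DataFrame({"Gesamt (kWh)": 1000 + rng.normal(0, 20, len(index))}, index=index)

fig, ax = plt.subplots(figsize=(15, 6))
# Plot through Matplotlib so the x-axis uses Matplotlib's date units.
ax.plot(daten.index, daten["Gesamt (kWh)"])
# Alternative that stays within pandas:
# daten["Gesamt (kWh)"].plot(ax=ax, x_compat=True)

ax.xaxis.set_major_locator(mdates.DayLocator(interval=7))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%a %d.%m.%Y"))
ax.tick_params(axis="x", labelrotation=90)
fig.tight_layout()
```

Both variants draw on the Axes created by plt.subplots, so the locator and formatter apply to the same axis that holds the data.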
