Weird time series plot with Python when adding date to x-axis - python

I'm using matplotlib.pyplot to plot a time series of about 15,000 observations. When I plot it without passing any x-axis data:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(15,10)})
sns.set_palette("husl")
sns.set_style('whitegrid')
plt.figure(figsize=(20, 5), dpi=80)
plt.plot(df['INTC'])
plt.show()
I get this, which is the plot I expect:
The problem is that when I pass the date column as the x-axis data:
plt.figure(figsize=(20, 5), dpi=80)
plt.plot(df['Date'],df['INTC'])
plt.show()
The same time series gets plotted in a weird manner:
The df looks like this:
index Date INTC
0 2022-02-04 09:30:00 47.77
1 2022-02-04 09:31:00 47.96
2 2022-02-04 09:32:00 47.81
3 2022-02-04 09:33:00 47.73
4 2022-02-04 09:34:00 47.57
...
The observations are one minute apart. What should I do to plot the series properly with the dates on the x-axis? Thanks.
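No answer is recorded for this question, so what follows is only a sketch under stated assumptions. Two common causes of this kind of plot are a Date column that still holds strings, and intraday data whose overnight and weekend gaps get joined by long straight lines. Assuming one of these applies, one option is to parse and sort the dates, plot against the row position, and label a few ticks with the matching timestamps. The synthetic df and the name tick_pos are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: two trading sessions of 1-minute bars.
day1 = pd.date_range("2022-02-04 09:30", periods=390, freq="min")
day2 = pd.date_range("2022-02-07 09:30", periods=390, freq="min")
dates = day1.append(day2)
df = pd.DataFrame({
    "Date": dates.strftime("%Y-%m-%d %H:%M:%S"),  # strings, as if read from a CSV
    "INTC": 47.0 + np.sin(np.arange(len(dates)) / 50.0),
})

# Make sure the column is a real datetime and the rows are in time order.
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date").reset_index(drop=True)

fig, ax = plt.subplots(figsize=(20, 5), dpi=80)
# Plot against the row position so the gap between sessions is not drawn.
ax.plot(df.index, df["INTC"])

# Label a handful of evenly spaced positions with the matching timestamps.
tick_pos = np.linspace(0, len(df) - 1, 8).astype(int)
ax.set_xticks(tick_pos)
labels = df["Date"].dt.strftime("%Y-%m-%d %H:%M").iloc[tick_pos].tolist()
ax.set_xticklabels(labels, rotation=45)
fig.tight_layout()
```

If the gaps should stay visible, plotting df['Date'] directly after the pd.to_datetime conversion is the simpler route.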

Related

How to plot time only of pandas datetime64[ns] attribute

I have a dataframe covering a long time range, with a datetime64[ns] column and an int value.
Data looks like this:
MIN_DEP DELAY
0 2018-01-01 05:09:00 0
1 2018-01-01 05:13:00 0
2 2018-01-01 05:39:00 0
3 2018-01-01 05:43:00 0
4 2018-01-01 06:12:00 34
... ... ...
77005 2020-09-30 23:42:00 0
77006 2020-09-30 23:43:00 0
77007 2020-09-30 23:43:00 43
77008 2020-10-01 00:18:00 0
77009 2020-10-01 00:59:00 0
[77010 rows x 2 columns]
MIN_DEP datetime64[ns]
DELAY int64
dtype: object
The goal is to plot all the data on a single 00:00 - 24:00 range on the x-axis, with no dates.
When I try to plot it, every tick label on the time axis reads 00:00. How can I fix this?
import matplotlib.dates as mdates
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'],pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
I tried converting the timestamps to dt.time first and plotting that:
pd_to_stat['time'] = pd.to_datetime(pd_to_stat['MIN_DEP'], format='%H:%M').dt.time
fig, ax = plt.subplots()
ax.plot(pd_to_stat['time'],pd_to_stat['DELAY'])
plt.show()
Matplotlib does not allow that:
TypeError: float() argument must be a string or a number, not 'datetime.time'
Based on your requirement, I assume you need neither the date nor the seconds field of the timestamp, so a little preprocessing is needed first.
Drop the seconds field using the code below (a Series needs the .dt accessor for datetime methods):
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.floor('min')
Then you can strip the date from the timestamp, keeping only an HH:MM string:
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.strftime("%H:%M")
Then you can plot your data in the usual manner (sort by the new column first, since matplotlib places string values in order of first appearance).
This seems to work now. I had not noticed that the plot was still being split up by date. As a workaround, I had to replace all the dates with the same date and then plot it, hiding the date with DateFormatter:
import matplotlib.dates as mdates
pd_to_stat['MIN_DEP'] = pd_to_stat['MIN_DEP'].map(lambda t: t.replace(year=2020, month=1, day=1))
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'],pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
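The same-date workaround can also be written without a Python-level lambda. A minimal sketch, assuming MIN_DEP is a datetime64 column: subtract each timestamp's midnight to get the time of day, then add one arbitrary anchor date. The small frame, the anchor name, and the MIN_DEP_TOD column are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample in the shape of the question's data.
pd_to_stat = pd.DataFrame({
    "MIN_DEP": pd.to_datetime([
        "2018-01-01 05:09:00", "2018-01-01 06:12:00",
        "2020-09-30 23:43:00", "2020-10-01 00:18:00",
    ]),
    "DELAY": [0, 34, 43, 0],
})

# Time of day as a timedelta, moved onto one arbitrary anchor date.
anchor = pd.Timestamp("2020-01-01")
time_of_day = pd_to_stat["MIN_DEP"] - pd_to_stat["MIN_DEP"].dt.normalize()
pd_to_stat["MIN_DEP_TOD"] = anchor + time_of_day

# Sort by time of day so a connected line would not zigzag.
plot_df = pd_to_stat.sort_values("MIN_DEP_TOD")

fig, ax = plt.subplots()
ax.plot(plot_df["MIN_DEP_TOD"], plot_df["DELAY"], ".")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))  # hide the anchor date
ax.set_xlim(anchor, anchor + pd.Timedelta(days=1))           # 00:00 to 24:00
```

Keeping the original MIN_DEP column intact means the real dates remain available for other analyses.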

Plot time series and cumulative sum on same graph Matplotlib

I have a pandas df similar to the following:
time price
00:00:00 2
00:10:00 6
00:20:00 3
01:25:00 16
02:25:00 7
etc...
I would like to plot on the same graph:
Time series as the variation of price as function of time
Bar graph as the number of observations taken in each hour. So for the example above, at 01:00:00 we have a bar of height 3 (from 00:00:00 to 01:00:00 we had 3 observations) and at 02:00:00 we have height 1.
I was able to transform the data into the bar graph data as a separate df using .groupby(pd.Grouper(key='time', freq='H')). Now I have two dfs and am trying to plot them together.
I have 100K+ data points, already cleaned of outliers.
Any pointers?
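No answer is recorded here, so this is only a sketch of one possible approach: draw the price line on one y-axis and the hourly counts as bars on a twin y-axis sharing the same x-axis. It assumes the time column can be parsed to datetimes, uses resample("60min") in place of the Grouper call from the question, and labels each bar by the hour in which its bin starts (the question labels bins by their end). The names ax_price and ax_count are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The five sample rows from the question; parsing adds a dummy date.
df = pd.DataFrame({
    "time": pd.to_datetime(
        ["00:00:00", "00:10:00", "00:20:00", "01:25:00", "02:25:00"],
        format="%H:%M:%S",
    ),
    "price": [2, 6, 3, 16, 7],
})

# Number of observations per hour, indexed by the start of each hour.
counts = df.set_index("time")["price"].resample("60min").count()

fig, ax_price = plt.subplots(figsize=(10, 4))
ax_count = ax_price.twinx()  # second y-axis on the same x-axis

# Date axes are measured in days, so one hour is 1/24 wide.
ax_count.bar(counts.index, counts.to_numpy(), width=1 / 24,
             align="edge", alpha=0.3, color="tab:gray")
ax_price.plot(df["time"], df["price"], color="tab:blue")

ax_price.set_ylabel("price")
ax_count.set_ylabel("observations per hour")
ax_price.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.tight_layout()
```

With 100K+ points the same structure applies; only the line becomes denser, while the bar data stays small because it is aggregated per hour.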

Customizing x axis for time series based data using Matplotlib

I am new to Python programming, particularly Matplotlib. I am working on a data set for which I need the x-axis tick labels in this format: YYYY-MM-DD HH:MM:SS. I have tried a few methods without success. My code is as follows:
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import dates as mpl_dates
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
Radio Network Availability Rate(%)
Time
2019-10-14 00:00:00 99.7144
2019-10-14 01:00:00 99.7144
2019-10-14 02:00:00 99.7144
2019-10-14 03:00:00 99.7144
2019-10-14 04:00:00 99.7144
... ...
2019-10-20 19:00:00 99.7403
2019-10-20 20:00:00 99.7403
2019-10-20 21:00:00 99.7404
2019-10-20 22:00:00 99.7403
2019-10-20 23:00:00 99.7403
fig, ax = plt.subplots(figsize=(8,6))
data['TPG_Radio Network Availability Rate(%)'].plot(style='r.-', title='TPG Network Availability')
plt.ylabel('Availability %')
plt.show()
I would need the output plot to be as below for the x-axis:
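Try adding the code below before plt.show(). Note that xticks expects a sequence of tick positions rather than a single length, and this assumes the data is plotted against integer positions 0..n-1:
plt.xticks(range(len(data.index)), data.index)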
This gave me what I was looking for:
avai = data['TPG_Radio Network Availability Rate(%)']
fig, ax = plt.subplots(figsize=(12,9), dpi=100)
plt.plot(avai, color='r')
plt.ylabel('Availability %')
plt.xlabel('Time')
plt.title('TPG Network Availability')
loc = ticker.MultipleLocator(base=4.0)  # matplotlib.ticker was imported as ticker above
ax.xaxis.set_major_locator(loc)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
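For the exact YYYY-MM-DD HH:MM:SS label format the question asks for, a date formatter is another option. The sketch below assumes the index is a DatetimeIndex and plots with matplotlib directly rather than through pandas' .plot. The hourly values are made up, and the 12-hour tick spacing is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up hourly data covering the same week as the question.
idx = pd.date_range("2019-10-14 00:00:00", "2019-10-20 23:00:00", freq="60min")
data = pd.DataFrame(
    {"TPG_Radio Network Availability Rate(%)":
         99.7 + 0.01 * np.sin(np.arange(len(idx)) / 10.0)},
    index=idx,
)
avai = data["TPG_Radio Network Availability Rate(%)"]

fig, ax = plt.subplots(figsize=(12, 9), dpi=100)
ax.plot(avai.index, avai.to_numpy(), "r.-")

# One tick every 12 hours, labelled with the full date and time.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=12))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M:%S"))
ax.tick_params(axis="x", labelrotation=90)

ax.set_xlabel("Time")
ax.set_ylabel("Availability %")
ax.set_title("TPG Network Availability")
fig.tight_layout()
```

Changing the interval argument of HourLocator controls how many labels appear without touching the label format.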

Smart way of creating multiple graphs using matplotlib

I have an excel worksheet, let us say its name is 'ws_actual'. The data looks as below.
Project Name Date Paid Actuals Item Amount Cumulative Sum
A 2016-04-10 00:00:00 124.2 124.2
A 2016-04-27 00:00:00 2727.5 2851.7
A 2016-05-11 00:00:00 2123.58 4975.28
A 2016-05-24 00:00:00 2500 7475.28
A 2016-07-07 00:00:00 38374.6 45849.88
A 2016-08-12 00:00:00 2988.14 48838.02
A 2016-09-02 00:00:00 23068 71906.02
A 2016-10-31 00:00:00 570.78 72476.8
A 2016-11-09 00:00:00 10885.75 83362.55
A 2016-12-08 00:00:00 28302.95 111665.5
A 2017-01-19 00:00:00 4354.3 116019.8
A 2017-02-28 00:00:00 3469.77 119489.57
A 2017-03-29 00:00:00 267.75 119757.32
B 2015-04-27 00:00:00 2969.93 2969.93
B 2015-06-02 00:00:00 118.8 3088.73
B 2015-06-18 00:00:00 2640 5728.73
B 2015-06-26 00:00:00 105.6 5834.33
B 2015-09-03 00:00:00 11879.7 17714.03
B 2015-10-22 00:00:00 5303.44 23017.47
B 2015-11-08 00:00:00 52000 75017.47
B 2015-11-25 00:00:00 2704.13 77721.6
B 2016-03-09 00:00:00 59752.85 137474.45
B 2016-03-13 00:00:00 512.73 137987.18
.
.
.
Suppose there are many more projects besides A and B, each with Date Paid and Amount information. I would like to create one plot per project, with 'Date Paid' on the x-axis and 'Cumulative Sum' on the y-axis. When I run the following code, it combines all the projects and plots every 'Cumulative Sum' on a single graph. I wonder whether I need to split the table by project, save each part, and then load them one by one to plot. That is a lot of work, so is there a smarter way? Please help me, genius.
import pandas as pd
import matplotlib.pyplot as plt
ws_actual = pd.read_excel(actual_file[0], sheet_name=0)
ax = ws_actual.plot(x='Date Paid', y='Cumulative Sum', color='g')
Right now you are connecting all of the points, regardless of group. A simple loop will work here allowing you to group the DataFrame and then plot each group as a separate curve. If you want you can define your own colorcycle if you have a lot of groups, so that colors do not repeat.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8,8))
# One curve per project, all drawn on the same axes
for name, gp in ws_actual.groupby('Project Name'):
    gp.plot(x='Date Paid', y='Cumulative Sum', ax=ax, label=name)
plt.show()
You could just iterate over the projects:
for proj in ws_actual['Project Name'].unique():
    # Each call draws one project in its own figure
    ws_actual[ws_actual['Project Name'] == proj].plot(x='Date Paid', y='Cumulative Sum', color='g')
plt.show()
Or check out seaborn for an easy way to make a facet grid, for which you can set a row variable. Something along the lines of:
import seaborn as sns
g = sns.FacetGrid(ws_actual, row="Project Name")
g = g.map(plt.scatter, "Date Paid", "Cumulative Sum", edgecolor="w")
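If one panel per project is wanted without seaborn, the groupby loop can feed a column of subplots instead of a single axes. This is a sketch under assumptions: the project column is called 'Project Name', the six rows are copied from the question's table, and the cumulative sum is recomputed per project here rather than read from the sheet.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question's table, used as sample data.
ws_actual = pd.DataFrame({
    "Project Name": ["A", "A", "A", "B", "B", "B"],
    "Date Paid": pd.to_datetime([
        "2016-04-10", "2016-04-27", "2016-05-11",
        "2015-04-27", "2015-06-02", "2015-06-18",
    ]),
    "Amount": [124.2, 2727.5, 2123.58, 2969.93, 118.8, 2640.0],
})
ws_actual = ws_actual.sort_values(["Project Name", "Date Paid"])
# Running total that restarts for every project.
ws_actual["Cumulative Sum"] = ws_actual.groupby("Project Name")["Amount"].cumsum()

groups = ws_actual.groupby("Project Name")
# One row of the figure per project; squeeze=False keeps axes two-dimensional.
fig, axes = plt.subplots(nrows=groups.ngroups, figsize=(8, 3 * groups.ngroups),
                         squeeze=False)
for ax, (name, gp) in zip(axes[:, 0], groups):
    ax.plot(gp["Date Paid"], gp["Cumulative Sum"], marker="o")
    ax.set_title(f"Project {name}")
    ax.set_ylabel("Cumulative Sum")
fig.autofmt_xdate()
fig.tight_layout()
```

Each panel keeps its own date range here; passing sharex=True to plt.subplots would align the panels on a common time axis instead.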

pandas/matplotlib datetime tick labels

I have a pandas dataframe with datetime values including microseconds:
column1 column2
time
1900-01-01 10:39:52.887916 19363.876 19362.7575
1900-01-01 10:39:53.257916 19363.876 19362.7575
1900-01-01 10:39:53.808007 19363.876 19362.7575
1900-01-01 10:39:53.827894 19363.876 19362.7575
1900-01-01 10:39:54.277931 19363.876 19362.7575
I plot the dataframe as follows:
def plot(df):
    ax = df.plot(y='column1', figsize=(20, 8))
    df.plot(y='column2', ax=ax)
    ax.get_yaxis().get_major_formatter().set_useOffset(False)
    mpl.pyplot.show()
Notice on the image below that the microseconds are displayed as %f rather than their actual value.
That is, instead of 10:39:52.887916 it displays 10:39:52.%f
How can I display the actual microseconds in the tick labels (even if it's only a few significant digits)?
You should be able to set the major ticks to the format you want, using set_major_formatter:
import matplotlib as mpl
import matplotlib.dates
import pandas as pd
df = pd.DataFrame({'column1': [1, 2, 3, 4],
                   'column2': [2, 3, 4, 5]},
                  index=pd.to_datetime([1e8, 2e8, 3e8, 4e8]))
def plot(df):
    ax = df.plot(y='column1', figsize=(20, 8))
    df.plot(y='column2', ax=ax)
    ax.get_yaxis().get_major_formatter().set_useOffset(False)
    # Show the time of day down to microseconds on the x-axis ticks
    ax.get_xaxis().set_major_formatter(matplotlib.dates.DateFormatter('%H:%M:%S.%f'))
    #mpl.pyplot.show()
    return ax
print(df)
column1 column2
1970-01-01 00:00:00.100000 1 2
1970-01-01 00:00:00.200000 2 3
1970-01-01 00:00:00.300000 3 4
1970-01-01 00:00:00.400000 4 5
If the problem does go away, then I think the format string is specified incorrectly somewhere in your code, namely as %%f instead of %f. %% is an escaped percent sign, so %%f produces the literal text '%f' instead of the microseconds.
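The difference between %f and %%f can be checked without any plotting, since DateFormatter takes strftime-style format strings. The sketch below uses only the standard library; the variable names are made up, and trimming to milliseconds is one possible way to show just a few significant digits.

```python
from datetime import datetime

# First timestamp from the question.
stamp = datetime(1900, 1, 1, 10, 39, 52, 887916)

# %f expands to the six-digit microsecond field.
with_micro = stamp.strftime("%H:%M:%S.%f")

# %% is an escaped percent sign, so %%f leaves a literal "%f" in the label.
escaped = stamp.strftime("%H:%M:%S.%%f")

# Keep only milliseconds by dropping the last three digits.
millis = with_micro[:-3]

print(with_micro)  # 10:39:52.887916
print(escaped)     # 10:39:52.%f
print(millis)      # 10:39:52.887
```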
