I am struggling a lot with making Time Series Subplots look the way I want. I will provide what I have tried and what shortcomings they have below.
This is a sample of the data I am working with, currently in a pandas DataFrame:
Price_REG1 Price_REG2 Price_REG3 Price_REG4
date
2020-01-01 00:00:00 30.83 30.83 30.83 30.83
2020-01-01 01:00:00 28.78 28.78 28.78 28.78
2020-01-01 02:00:00 28.45 28.45 28.45 28.45
2020-01-01 03:00:00 27.90 27.90 27.90 27.90
2020-01-01 04:00:00 27.52 27.52 27.52 27.52
What I want to do is to plot subplots for these four columns, one with a normal plot and one with a histogram. My plot code goes like this:
df.plot(subplots=True, color=['grey', 'grey', 'grey', 'grey'],
        figsize=(6, 6), lw=0.8, xlabel='', legend=False)
plt.legend(["AA", "BBB", "AAA", "BBB"]);
My only problem here right now is that the legend shows up only on the last plot for some reason.
My first Hist code:
fig, ax = plt.subplots(2, 2, sharex='col', sharey='row')
m = 0
for i in range(2):
    for j in range(2):
        df.hist(column=df.columns[m], grid=False, color='grey',
                bins=150, ax=ax[i, j], figsize=(20, 20))
        m += 1
Here I would like to remove titles and add legends, or change titles, "Price Region 1" etc.
My second Hist code is this:
fig, ax = df.plot(kind='hist', bins=150, subplots=True,sharex='col',sharey='row',
title=False,layout=(2, 2), legend=True)
Here I want to remove the y label and change the legends/add titles instead of legends.
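For what it is worth, here is a minimal sketch of one way to handle both points, not a definitive solution. plt.legend only acts on the current (last) axes, so the sketch loops over the axes that df.plot returns and adds a legend to each one. For the histograms it draws on a 2x2 grid directly and sets a title per panel. The random data and the "Price Region N" labels are placeholders I made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the price data (assumption, not the real data).
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=200, freq="h")
df = pd.DataFrame(rng.normal(30, 3, size=(200, 4)), index=idx,
                  columns=[f"Price_REG{i}" for i in range(1, 5)])
labels = [f"Price Region {i}" for i in range(1, 5)]  # hypothetical labels

# Line plots: df.plot(subplots=True) returns one Axes per column,
# so a legend can be attached to each of them individually.
line_axes = df.plot(subplots=True, color="grey", figsize=(6, 6),
                    lw=0.8, xlabel="", legend=False)
for ax, label in zip(line_axes, labels):
    ax.legend([label], loc="upper right")

# Histograms: draw each column on its own panel, with a custom title
# and no y label.
fig, hist_axes = plt.subplots(2, 2, sharex="col", sharey="row", figsize=(8, 8))
for ax, col, label in zip(hist_axes.flat, df.columns, labels):
    ax.hist(df[col], bins=150, color="grey")
    ax.set_title(label)
    ax.set_ylabel("")
```

If you prefer legends over titles on the histograms, ax.legend([label]) should work in place of ax.set_title(label).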
I'm using matplotlib pyplot to plot a time series of about 15000 observations. When I plot it without passing any x-axis values:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(15,10)})
sns.set_palette("husl")
sns.set_style('whitegrid')
plt.figure(figsize=(20, 5), dpi=80)
plt.plot(df['INTC'])
plt.show()
I get this, which is the plot I expect
The problem is that when I pass the dates as the x-axis values:
plt.figure(figsize=(20, 5), dpi=80)
plt.plot(df['Date'],df['INTC'])
plt.show()
The same time series gets plotted in a weird manner:
The df looks like this:
index Date INTC
0 2022-02-04 09:30:00 47.77
1 2022-02-04 09:31:00 47.96
2 2022-02-04 09:32:00 47.81
3 2022-02-04 09:33:00 47.73
4 2022-02-04 09:34:00 47.57
...
Every observation has a time separation of 1 minute. What should I do to plot it properly including the date points in the x-axis? Thanks.
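The post does not show the dtype of the Date column, so the following is only a guess at a fix. It assumes Date is stored as strings or is not sorted, either of which can make matplotlib draw the line oddly. The sketch converts the column to real datetimes, sorts it, and formats the tick labels. The five rows are copied from the sample above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data; Date is deliberately stored as strings (assumption).
df = pd.DataFrame({
    "Date": ["2022-02-04 09:30:00", "2022-02-04 09:31:00", "2022-02-04 09:32:00",
             "2022-02-04 09:33:00", "2022-02-04 09:34:00"],
    "INTC": [47.77, 47.96, 47.81, 47.73, 47.57],
})

# Convert to real datetimes and make sure the rows are in time order.
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date")

fig, ax = plt.subplots(figsize=(20, 5), dpi=80)
ax.plot(df["Date"], df["INTC"])
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Note that a true datetime axis keeps the gaps between sessions (nights, weekends), and a line plot joins the points on either side of each gap with a straight segment. If that is what looks weird, plotting against the row position as in your first snippet and only relabelling the ticks may suit you better.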
I have a dataframe that looks like this:
A ... B
datetime ...
2020-01-01 00:00:00 10.622 ... 30
2020-01-01 01:00:00 16.397 ... 30
2020-01-01 02:00:00 24.190 ... 30
2020-01-01 03:00:00 33.579 ... 30
2020-01-01 04:00:00 44.643 ... 30
... ... ...
2020-01-07 20:00:00 18.090 ... 30
2020-01-07 21:00:00 18.027 ... 30
When I use df.plot, all columns are plotted with default colors and solid lines. I would like to plot columns A to x in the same color but with different markers and columns x+1 to B in another color with the same markers.
Is this somehow possible without manually declaring it for every column separately?
Thanks!
You can create a custom cycler for this task:
from cycler import cycler
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
markernr = 3
colornr = 2
#test data generation
np.random.seed(123)
df = pd.DataFrame(np.random.random((10, 6)), columns=["A1", "A2", "A3", "B1", "B2", "B3"])
colors = ["tab:red", "tab:blue", "tab:orange", "brown", "lightblue", "yellow"]
markers = ["x", "o", "D", "+", "H"]
my_cycler = (cycler(color=colors[:colornr]) *
cycler(marker=markers[:markernr]))
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))
df.plot(ax=ax1, title="Standard cycler")
ax2.set_prop_cycle(my_cycler)
df.plot(ax=ax2, title="Customized cycler")
plt.show()
Sample output:
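To see which style each column ends up with, it can help to list the combinations the cycler produces. This is a small illustrative sketch using the same colors, markers and column names as the test data above. Multiplying two cyclers gives their product, with the left factor (color) changing slowest.

```python
from cycler import cycler

colors = ["tab:red", "tab:blue"]
markers = ["x", "o", "D"]
columns = ["A1", "A2", "A3", "B1", "B2", "B3"]

# Product of the two cyclers: every color is paired with every marker.
my_cycler = cycler(color=colors) * cycler(marker=markers)
styles = list(my_cycler)

# Columns are styled in order, so A1..A3 share the first color
# and B1..B3 share the second one.
for column, style in zip(columns, styles):
    print(column, style)
```

So the number of markers should match the number of columns per group, and the number of colors should match the number of groups.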
I have a dataframe covering a long time range, with a datetime64[ns] column and an int value.
Data looks like this:
MIN_DEP DELAY
0 2018-01-01 05:09:00 0
1 2018-01-01 05:13:00 0
2 2018-01-01 05:39:00 0
3 2018-01-01 05:43:00 0
4 2018-01-01 06:12:00 34
... ... ...
77005 2020-09-30 23:42:00 0
77006 2020-09-30 23:43:00 0
77007 2020-09-30 23:43:00 43
77008 2020-10-01 00:18:00 0
77009 2020-10-01 00:59:00 0
[77010 rows x 2 columns]
MIN_DEP datetime64[ns]
DELAY int64
dtype: object
The goal is to plot all the data over a single 00:00 - 24:00 range on the x-axis, with no dates anymore.
When I try to plot it, every tick on the time axis reads 00:00. How can I fix this?
import matplotlib.dates as mdates
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'],pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
I also tried converting the timestamps to dt.time first and plotting that:
pd_to_stat['time'] = pd.to_datetime(pd_to_stat['MIN_DEP'], format='%H:%M').dt.time
fig, ax = plt.subplots()
ax.plot(pd_to_stat['time'],pd_to_stat['DELAY'])
plt.show()
Matplotlib does not allow that:
TypeError: float() argument must be a string or a number, not 'datetime.time'
From your requirement, I guess you need neither the dates nor the seconds field in your timestamps, so a little preprocessing is needed first.
Remove the seconds field by flooring to the minute (this goes through the .dt accessor and keeps the values as timestamps, which the next step relies on):
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.floor('min')
Then you can remove the date from your timestamps in the following manner:
dataset['MIN_DEP'] = pd.Series([val.time() for val in dataset['MIN_DEP']], index=dataset.index)
Then you can plot your data in the usual manner.
This seems to work now. I had not noticed that the plot was still being split up by date. As a workaround, I had to replace all the dates with the same date and then plot it, hiding the date with DateFormatter:
import matplotlib.dates as mdates
pd_to_stat['MIN_DEP'] = pd_to_stat['MIN_DEP'].map(lambda t: t.replace(year=2020, month=1, day=1))
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'],pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
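The same "move everything onto one day" idea can also be written without a per-row lambda. This is just a sketch: to_common_day is a helper name I made up, and the three sample timestamps are taken from the data shown in the question.

```python
import pandas as pd


def to_common_day(timestamps, day="2020-01-01"):
    """Keep the time of day of each timestamp but replace its date with `day`."""
    # Subtracting midnight of the same day leaves a Timedelta since midnight.
    time_of_day = timestamps - timestamps.dt.normalize()
    return pd.Timestamp(day) + time_of_day


sample = pd.Series(pd.to_datetime([
    "2018-01-01 05:09:00",
    "2020-09-30 23:42:00",
    "2020-10-01 00:18:00",
]))
shifted = to_common_day(sample)
print(shifted)
```

The result is still datetime64[ns], so it can be passed to ax.plot and formatted with DateFormatter('%H:%M') exactly as in the code above. For a line plot you would probably also want to sort by the shifted column first (or use a scatter plot), otherwise the line jumps back and forth across the day.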
I am new to Python programming, particularly Matplotlib. I am currently working on a data set for which I need the x-axis labels in this format (YYYY-MM-DD HH:MM:SS). I have tried a few methods without success. My code is as follows:
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import dates as mpl_dates
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
Radio Network Availability Rate(%)
Time
2019-10-14 00:00:00 99.7144
2019-10-14 01:00:00 99.7144
2019-10-14 02:00:00 99.7144
2019-10-14 03:00:00 99.7144
2019-10-14 04:00:00 99.7144
... ...
2019-10-20 19:00:00 99.7403
2019-10-20 20:00:00 99.7403
2019-10-20 21:00:00 99.7404
2019-10-20 22:00:00 99.7403
2019-10-20 23:00:00 99.7403
fig, ax = plt.subplots(figsize=(8,6))
data['TPG_Radio Network Availability Rate(%)'].plot(style='r.-', title='TPG Network Availability')
plt.ylabel('Availability %')
plt.show()
I would need the output plot to be as below for the x-axis:
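Try adding the code below before plt.show(). The first argument of plt.xticks has to be a sequence of tick positions rather than a single number; this version places one tick per row and assumes the x positions run from 0 to len(data.index) - 1 (for example when the series is plotted with use_index=False):
plt.xticks(range(len(data.index)), data.index)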
This helped with what I was looking for:
import matplotlib.ticker as plticker
avai = data['TPG_Radio Network Availability Rate(%)']
fig, ax = plt.subplots(figsize=(12,9), dpi=100)
plt.plot(avai, color='r')
plt.ylabel('Availability %')
plt.xlabel('Time')
plt.title('TPG Network Availability')
loc = plticker.MultipleLocator(base=4.0)
ax.xaxis.set_major_locator(loc)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
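If the goal is specifically tick labels in the YYYY-MM-DD HH:MM:SS format, a date locator plus DateFormatter is another option. The following is a rough sketch on synthetic hourly data that I generated for illustration (the real values are not used), with a tick every 12 hours as an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic hourly series over the same week as in the question (assumption).
idx = pd.date_range("2019-10-14", "2019-10-20 23:00", freq="h")
avai = pd.Series(99.7 + 0.01 * np.sin(np.arange(len(idx)) / 12), index=idx)

fig, ax = plt.subplots(figsize=(12, 9))
ax.plot(avai.index, avai.values, "r.-")

# One tick at midnight and one at noon, labelled with the full timestamp.
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[0, 12]))
formatter = mdates.DateFormatter("%Y-%m-%d %H:%M:%S")
ax.xaxis.set_major_formatter(formatter)

ax.set_ylabel("Availability %")
ax.set_title("TPG Network Availability")
fig.autofmt_xdate(rotation=90)  # vertical labels so they do not overlap
fig.tight_layout()
```

Plotting with ax.plot (rather than Series.plot) keeps the axis in matplotlib's own date units, which is what the locator and formatter from matplotlib.dates expect.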
I have an excel worksheet, let us say its name is 'ws_actual'. The data looks as below.
Project Name Date Paid Actuals Item Amount Cumulative Sum
A 2016-04-10 00:00:00 124.2 124.2
A 2016-04-27 00:00:00 2727.5 2851.7
A 2016-05-11 00:00:00 2123.58 4975.28
A 2016-05-24 00:00:00 2500 7475.28
A 2016-07-07 00:00:00 38374.6 45849.88
A 2016-08-12 00:00:00 2988.14 48838.02
A 2016-09-02 00:00:00 23068 71906.02
A 2016-10-31 00:00:00 570.78 72476.8
A 2016-11-09 00:00:00 10885.75 83362.55
A 2016-12-08 00:00:00 28302.95 111665.5
A 2017-01-19 00:00:00 4354.3 116019.8
A 2017-02-28 00:00:00 3469.77 119489.57
A 2017-03-29 00:00:00 267.75 119757.32
B 2015-04-27 00:00:00 2969.93 2969.93
B 2015-06-02 00:00:00 118.8 3088.73
B 2015-06-18 00:00:00 2640 5728.73
B 2015-06-26 00:00:00 105.6 5834.33
B 2015-09-03 00:00:00 11879.7 17714.03
B 2015-10-22 00:00:00 5303.44 23017.47
B 2015-11-08 00:00:00 52000 75017.47
B 2015-11-25 00:00:00 2704.13 77721.6
B 2016-03-09 00:00:00 59752.85 137474.45
B 2016-03-13 00:00:00 512.73 137987.18
.
.
.
Let us say there are many more projects besides A and B, each with Date Paid and Amount information. I would like to create a plot per project where the x axis is 'Date Paid' and the y axis is 'Cumulative Sum', but when I run the following code, it combines all the projects and plots every 'Cumulative Sum' on one graph as a single line. I wonder if I need to split the table by project, save each part, and then load them one by one to plot the graphs. That is a lot of work, so I am wondering if there is a smarter way to do it. Please help me, genius.
import pandas as pd
import matplotlib.pyplot as plt
ws_actual = pd.read_excel(actual_file[0], sheet_name=0)
ax = ws_actual.plot(x='Date Paid', y='Cumulative Sum', color='g')
Right now you are connecting all of the points, regardless of group. A simple loop will work here allowing you to group the DataFrame and then plot each group as a separate curve. If you want you can define your own colorcycle if you have a lot of groups, so that colors do not repeat.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8,8))
for id, gp in ws_actual.groupby('Project Name'):
    gp.plot(x='Date Paid', y='Cumulative Sum', ax=ax, label=id)
plt.show()
You could just iterate the projects:
for proj in ws_actual['Project Name'].unique():
    # One separate figure per project, titled with the project name
    ws_actual[ws_actual['Project Name'] == proj].plot(
        x='Date Paid', y='Cumulative Sum', color='g', title=proj)
plt.show()
Or check out seaborn for an easy way to make a facet grid for which you can set a rows variable. Something along the lines of:
import seaborn as sns
g = sns.FacetGrid(ws_actual, row="Project Name")
g = g.map(plt.scatter, "Date Paid", "Cumulative Sum", edgecolor="w")