Smart way of creating multiple graphs using matplotlib - python

I have an excel worksheet, let us say its name is 'ws_actual'. The data looks as below.
Project Name  Date Paid  Actuals Item Amount  Cumulative Sum
A 2016-04-10 00:00:00 124.2 124.2
A 2016-04-27 00:00:00 2727.5 2851.7
A 2016-05-11 00:00:00 2123.58 4975.28
A 2016-05-24 00:00:00 2500 7475.28
A 2016-07-07 00:00:00 38374.6 45849.88
A 2016-08-12 00:00:00 2988.14 48838.02
A 2016-09-02 00:00:00 23068 71906.02
A 2016-10-31 00:00:00 570.78 72476.8
A 2016-11-09 00:00:00 10885.75 83362.55
A 2016-12-08 00:00:00 28302.95 111665.5
A 2017-01-19 00:00:00 4354.3 116019.8
A 2017-02-28 00:00:00 3469.77 119489.57
A 2017-03-29 00:00:00 267.75 119757.32
B 2015-04-27 00:00:00 2969.93 2969.93
B 2015-06-02 00:00:00 118.8 3088.73
B 2015-06-18 00:00:00 2640 5728.73
B 2015-06-26 00:00:00 105.6 5834.33
B 2015-09-03 00:00:00 11879.7 17714.03
B 2015-10-22 00:00:00 5303.44 23017.47
B 2015-11-08 00:00:00 52000 75017.47
B 2015-11-25 00:00:00 2704.13 77721.6
B 2016-03-09 00:00:00 59752.85 137474.45
B 2016-03-13 00:00:00 512.73 137987.18
.
.
.
Suppose there are many more projects besides A and B, each with 'Date Paid' and amount information. I would like to create one plot per project, with 'Date Paid' on the x axis and 'Cumulative Sum' on the y axis. With the following code, however, all projects are combined and every 'Cumulative Sum' is plotted as a single line on one graph. Do I need to split the table by project, save each part, and then load them one by one to plot the graphs? That is a lot of work, so I am wondering if there is a smarter way to do it. Please help me, genius.
import pandas as pd
import matplotlib.pyplot as plt
ws_actual = pd.read_excel(actual_file[0], sheet_name=0)
ax = ws_actual.plot(x='Date Paid', y='Cumulative Sum', color='g')

Right now you are connecting all of the points, regardless of group. A simple loop will work here: group the DataFrame, then plot each group as a separate curve on the same axes. If you have a lot of groups, you can define your own color cycle so that colors do not repeat.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8, 8))
# Plot each project as its own curve on the shared axes
for name, gp in ws_actual.groupby('Project Name'):
    gp.plot(x='Date Paid', y='Cumulative Sum', ax=ax, label=name)
plt.show()
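The custom color cycle mentioned above could be set up roughly as follows. This is a minimal sketch, not part of the original answer: the small `ws_actual` frame is made-up stand-in data, the `tab20` colormap is an arbitrary choice, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data with the same columns as the question (values are illustrative).
ws_actual = pd.DataFrame({
    "Project Name": ["A", "A", "B", "B", "C", "C"],
    "Date Paid": pd.to_datetime(["2016-04-10", "2016-04-27", "2015-04-27",
                                 "2015-06-02", "2016-01-05", "2016-02-01"]),
    "Cumulative Sum": [124.2, 2851.7, 2969.93, 3088.73, 10.0, 25.0],
})

groups = ws_actual.groupby("Project Name")
n = groups.ngroups

fig, ax = plt.subplots(figsize=(8, 8))

# Sample one color per project evenly from a colormap and install it
# as the color cycle of this axes.
cmap = plt.get_cmap("tab20")
colors = [cmap(i / max(n - 1, 1)) for i in range(n)]
ax.set_prop_cycle(color=colors)

for name, gp in groups:
    ax.plot(gp["Date Paid"], gp["Cumulative Sum"], label=name)

ax.legend()
```

With more projects than the colormap has distinct entries, colors will still repeat, so a continuous colormap may be a better fit in that case.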

You could just iterate the projects:
for proj in ws_actual['Project Name'].unique():
    # Each call without ax= opens a new figure, so there is one figure per project
    ws_actual[ws_actual['Project Name'] == proj].plot(x='Date Paid', y='Cumulative Sum', color='g')
plt.show()
Or check out seaborn for an easy way to make a facet grid, for which you can set a row variable. Something along the lines of:
import seaborn as sns
g = sns.FacetGrid(ws_actual, row="Project Name")
g = g.map(plt.scatter, "Date Paid", "Cumulative Sum", edgecolor="w")
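If one panel per project is wanted without seaborn, plain matplotlib subplots combined with groupby can do a similar job. The following is only a sketch under assumptions: the data frame is made-up stand-in data, and the figure size and titles are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data with the same columns as the question (values are illustrative).
ws_actual = pd.DataFrame({
    "Project Name": ["A", "A", "A", "B", "B"],
    "Date Paid": pd.to_datetime(["2016-04-10", "2016-04-27", "2016-05-11",
                                 "2015-04-27", "2015-06-02"]),
    "Cumulative Sum": [124.2, 2851.7, 4975.28, 2969.93, 3088.73],
})

groups = ws_actual.groupby("Project Name")

# One row of axes per project; squeeze=False keeps `axes` 2-D even for one group.
fig, axes = plt.subplots(nrows=groups.ngroups, figsize=(8, 3 * groups.ngroups),
                         squeeze=False)

for ax, (name, gp) in zip(axes.ravel(), groups):
    ax.plot(gp["Date Paid"], gp["Cumulative Sum"], color="g", marker="o")
    ax.set_title(f"Project {name}")
    ax.set_ylabel("Cumulative Sum")

fig.autofmt_xdate()
```

Each project keeps its own x range here; passing `sharex=True` to `plt.subplots` would align the panels on a common date axis instead.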

Related

Weird time series plot with Python when adding date to x-axis

I'm using matplotlib pyplot to plot a time series of about 15000 observations. When I plot it without passing any x-axis data:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(15,10)})
sns.set_palette("husl")
sns.set_style('whitegrid')
plt.figure(figsize=(20, 5), dpi=80)
plt.plot(df['INTC'])
plt.show()
I get this, which is the plot I expect
The problem is that when I add the dates as the x-axis data:
plt.figure(figsize=(20, 5), dpi=80)
plt.plot(df['Date'],df['INTC'])
plt.show()
The same time series gets plotted in a weird manner:
The df looks like this:
index Date INTC
0 2022-02-04 09:30:00 47.77
1 2022-02-04 09:31:00 47.96
2 2022-02-04 09:32:00 47.81
3 2022-02-04 09:33:00 47.73
4 2022-02-04 09:34:00 47.57
...
Every observation has a time separation of 1 minute. What should I do to plot it properly including the date points in the x-axis? Thanks.

ploting histogram with timedelta series

I have a pandas series of timedelta values. I want to plot these timedeltas as a bar chart whose y axis is marked only in hours rather than in some other format. Previously, when I tried a line plot in matplotlib, the axis showed numbers I could not make sense of. The following is a sample of my timedelta series:
date
2020-04-11 0 days 02:00:00
2020-04-12 0 days 03:00:00
2020-04-13 0 days 02:00:00
2020-04-14 0 days 03:00:00
2020-04-15 0 days 01:00:00
2020-04-16 0 days 03:00:00
Freq: D, dtype: timedelta64[ns]
When I try to plot it in matplotlib, the y axis values of the resulting plot look weird to me.
Please help me work out the plot so that the y-axis tick labels are in a format like 01:00, 02:00.
Eagerly waiting for help.
A possible way is to convert the deltas to seconds and define a FuncFormatter.
This is my test series and my final plot:
2020-04-11 02:00:00
2020-04-12 03:00:00
2020-04-13 05:00:00
dtype: timedelta64[ns]
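import datetime
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

# s is the timedelta Series shown above
def delta(x, pos):
    # x is a tick value in seconds; str() renders it as H:MM:SS
    out = str(datetime.timedelta(seconds=x))
    return out

fig = plt.figure()
ax = fig.gca()
form = matplotlib.ticker.FuncFormatter(delta)
ax.yaxis.set_major_formatter(form)
ax.plot(s.index, s/np.timedelta64(1,'s'))
ax.set_yticks(s/np.timedelta64(1,'s'))
ax.set_xticks(s.index)
plt.show()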
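Since the question asks for a bar chart with labels like 01:00 and 02:00, the same FuncFormatter idea can be adapted as sketched below. This is an illustration rather than the answerer's code: the helper name `hhmm`, the sample series, and the one-hour tick spacing are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter, MultipleLocator

# Sample timedelta series, modeled on the one in the question.
s = pd.Series(pd.to_timedelta(["02:00:00", "03:00:00", "02:00:00"]),
              index=pd.date_range("2020-04-11", periods=3, freq="D"))

def hhmm(seconds, pos=None):
    """Format a number of seconds as HH:MM (hypothetical helper)."""
    total_minutes = int(round(seconds / 60))
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}"

fig, ax = plt.subplots()
# Bars are drawn in seconds; only the tick labels are converted to HH:MM.
ax.bar(s.index, s.dt.total_seconds(), width=0.6)
ax.yaxis.set_major_locator(MultipleLocator(3600))  # one tick per hour
ax.yaxis.set_major_formatter(FuncFormatter(hhmm))
fig.autofmt_xdate()
```

Placing the ticks with `MultipleLocator(3600)` keeps them on whole hours, which avoids labels at odd values such as 01:23.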

How to plot time only of pandas datetime64[ns] attribute

I have a dataframe covering a long time range, with a datetime64[ns] column and an int value.
Data looks like this:
MIN_DEP DELAY
0 2018-01-01 05:09:00 0
1 2018-01-01 05:13:00 0
2 2018-01-01 05:39:00 0
3 2018-01-01 05:43:00 0
4 2018-01-01 06:12:00 34
... ... ...
77005 2020-09-30 23:42:00 0
77006 2020-09-30 23:43:00 0
77007 2020-09-30 23:43:00 43
77008 2020-10-01 00:18:00 0
77009 2020-10-01 00:59:00 0
[77010 rows x 2 columns]
MIN_DEP datetime64[ns]
DELAY int64
dtype: object
The target is to plot all the data within a single 00:00 - 24:00 range on the x-axis, with no dates anymore.
When I try to plot it, the time axis shows 00:00 at every tick. How can I fix this?
import matplotlib.dates as mdates
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'],pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
I tried converting the timestamps to dt.time first and plotting that:
pd_to_stat['time'] = pd.to_datetime(pd_to_stat['MIN_DEP'], format='%H:%M').dt.time
fig, ax = plt.subplots()
ax.plot(pd_to_stat['time'],pd_to_stat['DELAY'])
plt.show()
Plotting does not allow that:
TypeError: float() argument must be a string or a number, not 'datetime.time'
Judging by your requirement, you need neither the date nor the seconds field of your timestamp, so a little preprocessing is needed first.
To keep only hours and minutes, format the column through the .dt accessor (a Series has no strftime method of its own); the result is a column of 'HH:MM' strings:
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.strftime("%H:%M")
Alternatively, to drop the date but keep datetime.time objects, apply this to the original datetime column instead:
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.time
Note that matplotlib cannot plot datetime.time values directly (see the TypeError in the question), so the string column is the one to plot in the usual manner.
This seems to work now. I had not noticed that the plot was still split up by date. As a workaround I had to replace all the dates with the same date and plot it, hiding the date with DateFormatter:
import matplotlib.dates as mdates
pd_to_stat['MIN_DEP'] = pd_to_stat['MIN_DEP'].map(lambda t: t.replace(year=2020, month=1, day=1))
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'],pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
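An alternative to overwriting the date, sketched here under the assumption that a numeric x axis is acceptable, is to convert each timestamp to hours since midnight and format the ticks as HH:MM. The sample rows and the helper name `hour_label` are made up for illustration, and a scatter plot is used because a connected line would jump back and forth once all days are folded onto one axis.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# A few rows modeled on the question's data.
pd_to_stat = pd.DataFrame({
    "MIN_DEP": pd.to_datetime(["2018-01-01 05:09:00", "2018-01-01 06:12:00",
                               "2020-09-30 23:43:00", "2020-10-01 00:18:00"]),
    "DELAY": [0, 34, 43, 0],
})

# Time elapsed since midnight of the same day, expressed in hours.
t = pd_to_stat["MIN_DEP"]
hours = (t - t.dt.normalize()).dt.total_seconds() / 3600

def hour_label(h, pos=None):
    """Render a fractional hour as HH:MM (hypothetical helper)."""
    total_minutes = int(round(h * 60))
    hh, mm = divmod(total_minutes, 60)
    return f"{hh:02d}:{mm:02d}"

fig, ax = plt.subplots()
ax.scatter(hours, pd_to_stat["DELAY"], s=10)
ax.set_xlim(0, 24)
ax.set_xticks(range(0, 25, 3))
ax.xaxis.set_major_formatter(FuncFormatter(hour_label))
```

This leaves the original MIN_DEP column untouched, so the dates remain available for other analyses.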

how to plot only with the dates inside my df and not all the dates

I have the following df:
date values
2020-08-06 08:00:00 5
2020-08-06 09:00:00 10
2020-08-06 10:00:00 0
2020-08-17 08:00:00 8
2020-08-17 09:00:00 15
I want to plot this df so I do : df.set_index('date')['values'].plot(kind='line') but it shows all the dates between the 6th and the 17th.
How can I plot the graph only with the dates inside my df ?
I assume that the date column is of datetime type.
To draw only the dates present in the data, the index must be built on
the principle "day number within the list of unique dates, times 24, plus hour".
To suppress the default x tick labels, you also have to define
your own, e.g. every 8 h in each date to be drawn.
Start by converting your DataFrame as follows:
idx = df['date'].dt.normalize().unique()
dateMap = pd.Series(np.arange(idx.size) * 24, index=idx)
# normalize() keeps the keys as Timestamps, matching the index of dateMap
df.set_index(df.date.dt.normalize().map(dateMap) + df.date.dt.hour, inplace=True)
df.index.rename('HourNo', inplace=True); df
Now, for your data sample, it has the following content:
date values
HourNo
8 2020-08-06 08:00:00 5
9 2020-08-06 09:00:00 10
10 2020-08-06 10:00:00 0
32 2020-08-17 08:00:00 8
33 2020-08-17 09:00:00 15
Then generate your plot and x ticks positions and labels:
fig, ax = plt.subplots(tight_layout=True)
df.loc[:, 'values'].plot(style='o-', rot=30, ax=ax)
xLoc = np.arange(0, dateMap.index.size * 24, 8)
xLbl = pd.concat([pd.Series(d + pd.timedelta_range(start=0, freq='8H', periods=3))
                  for d in dateMap.index]).dt.strftime('%Y-%m-%d\n%H:%M')
plt.xticks(ticks=xLoc, labels=xLbl, ha='right')
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.set_title('Set the proper heading')
ax.grid()
plt.show()
I also added the grid.
The result is:
A final remark: avoid column names that are the same as existing
pandas methods or attributes (e.g. values).
Sometimes this causes "stupid" errors (you intend to refer to
a column, but you actually refer to a method or attribute).
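A simpler variant of the same idea, shown here only as a sketch and not as the answerer's method, is to plot against the row position and use the formatted dates purely as tick labels. It assumes that evenly spaced points are acceptable, which means the real gap between 2020-08-06 and 2020-08-17 is no longer visible on the axis.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-08-06 08:00", "2020-08-06 09:00",
                            "2020-08-06 10:00", "2020-08-17 08:00",
                            "2020-08-17 09:00"]),
    "values": [5, 10, 0, 8, 15],
})

# x positions are simply 0, 1, 2, ...; the dates only appear as labels.
pos = np.arange(len(df))
labels = df["date"].dt.strftime("%Y-%m-%d\n%H:%M")

fig, ax = plt.subplots(tight_layout=True)
ax.plot(pos, df["values"], "o-")  # bracket access avoids the DataFrame.values attribute
ax.set_xticks(pos)
ax.set_xticklabels(labels, rotation=30, ha="right")
ax.grid()
```

For long frames, labeling every row becomes crowded; slicing both `pos` and `labels` with a step (for example `[::10]`) before passing them to the tick calls is one way to thin them out.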

python plot several curves from dataframe

I am trying to plot several appliances' temperatures on a plot.
The data comes from the dataframe df below, and I first create the date column as the index.
df=df.set_index('Date')
Date                 Appliance    Value (degrees)
2016-07-05 03:00:00  Thermometer  22
2016-08-06 16:00:00  Thermometer  19
2016-12-07 21:00:00  Thermometer  25
2016-19-08 23:00:00  Thermostat   21
2016-25-09 06:00:00  Thermostat   20
2016-12-10 21:00:00  Thermometer  18
2016-10-11 21:00:00  Thermostat   21
2016-10-12 04:00:00  Thermometer  20
2017-01-01 07:00:00  Thermostat   19
2017-01-02 07:00:00  Thermometer  23
We want to be able to show 2 curves: one for the thermometer's temperatures and one for the thermostat's temperatures with 2 different colours, over time.
plt.plot(df.index, [df.value for i in range(len(appliance)]
ax = df.plot()
ax.set_xlim(pd.Timestamp('2016-07-05'), pd.Timestamp('2015-11-30'))
Is ggplot better for this?
I cannot manage to make this work.
There are of course several ways to plot the data.
So assume we have a dataframe like this
import pandas as pd
dates = ["2016-07-05 03:00:00", "2016-08-06 16:00:00", "2016-12-07 21:00:00",
"2016-19-08 23:00:00", "2016-25-09 06:00:00", "2016-12-10 21:00:00",
"2016-10-11 21:00:00", "2016-10-12 04:00:00", "2017-01-01 07:00:00",
"2017-01-02 07:00:00"]
app = ["Thermometer","Thermometer","Thermometer","Thermostat","Thermostat","Thermometer",
"Thermostat","Thermometer","Thermostat","Thermometer"]
values = [22,19,25,21,20,18,21,20,19,23]
df = pd.DataFrame({"Date" : dates, "Appliance" : app, "Value":values})
df.Date = pd.to_datetime(df['Date'], format='%Y-%d-%m %H:%M:%S')
df=df.set_index('Date')
Using matplotlib pyplot.plot()
import matplotlib.pyplot as plt
df1 = df[df["Appliance"] == "Thermostat"]
df2 = df[df["Appliance"] == "Thermometer"]
plt.plot(df1.index, df1["Value"].values, marker="o", label="Thermostat")
plt.plot(df2.index, df2["Value"].values, marker="o", label="Thermometer")
plt.gcf().autofmt_xdate()
plt.legend()
Using pandas DataFrame.plot()
df1 = df[df["Appliance"] == "Thermostat"]
df2 = df[df["Appliance"] == "Thermometer"]
ax = df1.plot(y="Value", label="Thermostat")
df2.plot(y="Value", ax=ax, label="Thermometer")
ax.legend()
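Both variants above filter the frame once per appliance by hand. With more than two appliances, a groupby loop scales better; the following is a minimal sketch under that assumption, using a shortened copy of the sample data and sorting the index as a precaution.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Shortened copy of the sample data; dates use the same year-day-month layout.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-07-05 03:00:00", "2016-08-06 16:00:00",
                            "2016-19-08 23:00:00", "2016-25-09 06:00:00"],
                           format="%Y-%d-%m %H:%M:%S"),
    "Appliance": ["Thermometer", "Thermometer", "Thermostat", "Thermostat"],
    "Value": [22, 19, 21, 20],
}).set_index("Date").sort_index()

fig, ax = plt.subplots()
# One curve per appliance; matplotlib assigns a different color to each.
for appliance, grp in df.groupby("Appliance"):
    ax.plot(grp.index, grp["Value"], marker="o", label=appliance)

ax.set_ylabel("Value (degrees)")
ax.legend()
fig.autofmt_xdate()
```

The loop needs no changes when new appliance names appear in the data, since the labels come from the group keys.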
