Datetime, Timedelta and separate lineplot, plot area - python

My df has a datetime64 index, and its columns are timedelta and float64. Below are 3 example rows of my df.
                          CTIL            downtime     ratio
quater
2015-04-01  4859 days 01:46:00  1699 days 17:20:00  0.349804
2015-07-01  4553 days 14:16:00  1862 days 03:27:00  0.408939
2015-10-01  5502 days 21:18:00  2442 days 20:15:00  0.443920
I would like to plot it on one chart. CTIL and downtime should be area plots and ratio should be a line chart.
Currently I have 2 separate plots:
df_quater[['CTIL', 'downtime']].plot()
df_quater['ratio'].plot()
Question 1:
How can I draw an area plot when the type of x is different from the type of y?
I tried this:
df_quater[['CTIL', 'downtime']].plot(kind='area')
It generates an error:
TypeError: Cannot cast ufunc greater_equal input (...) with casting rule 'same_kind'
Question 2:
Can my labels on the y-axis be in timedelta format? The current plot shows plain numbers.
Question 3:
Can I combine these 2 plots into one? The labels for CTIL and downtime should be on the left and the label for ratio should be on the right.
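One possible approach to all three questions, as a minimal sketch rather than a definitive answer: convert the timedelta columns to a numeric unit (fractional days here), draw the areas with matplotlib, format the left tick labels back into timedelta strings, and put ratio on a twin y-axis. The helper name format_days and the axis label text are assumptions made for illustration; the data is rebuilt from the three rows shown above.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Rebuild the three example rows from the question.
df_quater = pd.DataFrame(
    {
        "CTIL": pd.to_timedelta(
            ["4859 days 01:46:00", "4553 days 14:16:00", "5502 days 21:18:00"]
        ),
        "downtime": pd.to_timedelta(
            ["1699 days 17:20:00", "1862 days 03:27:00", "2442 days 20:15:00"]
        ),
        "ratio": [0.349804, 0.408939, 0.443920],
    },
    index=pd.DatetimeIndex(["2015-04-01", "2015-07-01", "2015-10-01"], name="quater"),
)

# Area plots need numeric y values, so express each timedelta as fractional days.
days = df_quater[["CTIL", "downtime"]].apply(
    lambda col: col.dt.total_seconds() / 86400
)


def format_days(value, pos=None):
    """Render a tick value given in days as a timedelta string (hypothetical helper)."""
    return str(pd.Timedelta(days=float(value)))


fig, ax = plt.subplots()
for name in days.columns:
    # Overlapping (unstacked) areas, each filled from zero up to the series.
    ax.fill_between(days.index, days[name], alpha=0.4, label=name)
ax.yaxis.set_major_formatter(FuncFormatter(format_days))
ax.set_ylabel("CTIL / downtime")
ax.legend(loc="upper left")

# A twin axis shares x but has its own y scale, labelled on the right.
ax_ratio = ax.twinx()
ax_ratio.plot(df_quater.index, df_quater["ratio"], color="black", label="ratio")
ax_ratio.set_ylabel("ratio")
# Call plt.show() to display the figure.
```

Plotting in days is only one choice of unit; hours or seconds work the same way as long as the formatter converts the tick value back with the matching unit.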

Related

How do I construct ticks and labels when plotting large temporal series with matplotlib where the dates are a str column?

I have a pandas data series to display that contains basically a temporal series where each observation is contained in df['date'] (x-axis) and df['value'] (y-axis):
df['date']
0 2004-01-02
1 2004-01-05
2 2004-01-06
3 2004-01-07
4 2004-01-08
...
4527 2021-12-27
4528 2021-12-28
4529 2021-12-29
4530 2021-12-30
4531 2021-12-31
Name: session, Length: 4532, dtype: object
Notice how the Series contains str types:
print(type(df['date'].values[0]))
<class 'str'>
The values in df['value'] are just integers.
If I plot the series with matplotlib using df['date'] directly, the chart is too dense and the xtick labels cannot be read:
ax.plot(df['date'], df['value']);
I want xticks at every month change (2004/01, 2004/02, ..., 2021/11, 2021/12) and labels only where the year changes (2004, 2005, ..., 2021). What is the best way, via numpy or pandas, to get the arrays that .set_xticks requires?
Just to further elaborate on tdy's comment (all credit on the answer to tdy), this is the code excerpt to implement to_datetime and use the locators:
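import matplotlib.dates as mdates
# parse the str dates first so matplotlib can place real date ticks
ax.plot(pd.to_datetime(df['date']), df['value'])
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))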
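For completeness, here is a self-contained sketch of the same idea that also adds the unlabelled monthly ticks the question asks for, using a minor locator. The data is a hypothetical stand-in (business days stored as strings), not the asker's actual series.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: dates stored as str, like df['date'] in the question.
dates = pd.bdate_range("2004-01-02", "2021-12-31").strftime("%Y-%m-%d")
df = pd.DataFrame({"date": dates, "value": np.arange(len(dates))})

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(pd.to_datetime(df["date"]), df["value"])

# Major ticks carry the year labels.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
# Minor ticks mark each month start and stay unlabelled by default.
ax.xaxis.set_minor_locator(mdates.MonthLocator())
# Call plt.show() to display the figure.
```

With locators there is no need to build tick arrays by hand; matplotlib computes the tick positions from the date range.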

Plot dataframe in Python

I'm new to Python. I hope you can help me.
I have a dataframe with two columns. The first column is called dates and the second column is filled with numbers. The dataframe has 351 rows.
dates numbers
01.03.2019 5
02.03.2019 8
...
20.02.2020 3
21.02.2020 2
I want the whole first column to be on the x axis. I tried to plot it like this:
graph = FinalDataframe.plot(figsize=(12, 8))
graph.legend(loc='upper center', bbox_to_anchor=(0.5, -0.075), ncol=4)
graph.set_xticklabels(FinalDataframe['dates'])
plt.show()
But the x axis shows only the first few values from the column instead of the whole column. Furthermore, they do not line up with the data from the second column.
Any suggestions?
Thank you in advance!
Your issue is that the x ticks are generated automatically and spaced out to be readable, but you then tell matplotlib to use all the labels, so only the first few labels end up on those few ticks. The simple fix is to tell it to use one tick per entry, but that’s going to make your x-axis unreadable:
graph.set_xticks(range(len(FinalDataframe['dates'])))
Now you could space them out manually:
graph.set_xticks(range(0, len(FinalDataframe['dates']), 61))
graph.set_xticklabels(FinalDataframe['dates'][::61])
However, the best way to plot dates on the x-axis is still to use pandas’ built-in date objects. We can do this with pd.to_datetime.
This also lets pandas know where to place points on the x-axis, once you specify that the x-axis should be the dates. That way, if dates are unsorted or missing, the gaps are handled properly and each point sits above the right date on the x-axis.
I’m first recreating a dataframe that looks like what you posted:
>>> df = pd.DataFrame({'dates': pd.date_range('20190301', '20200221', freq='D').strftime('%d.%m.%Y'), 'numbers': np.random.randint(0, 10, 358)})
>>> df
dates numbers
0 01.03.2019 2
1 02.03.2019 2
2 03.03.2019 5
3 04.03.2019 4
4 05.03.2019 3
.. ... ...
353 17.02.2020 2
354 18.02.2020 1
355 19.02.2020 2
356 20.02.2020 3
357 21.02.2020 1
(This should be the same as FinalDataframe, or if your dates are the index, then it’s the same as FinalDataframe.reset_index())
Now I’m converting the dates:
>>> df['dates'] = pd.to_datetime(df['dates'], format='%d.%m.%Y')
>>> df
dates numbers
0 2019-03-01 2
1 2019-03-02 2
2 2019-03-03 5
3 2019-03-04 4
4 2019-03-05 3
.. ... ...
353 2020-02-17 2
354 2020-02-18 1
355 2020-02-19 2
356 2020-02-20 3
357 2020-02-21 1
You can check your columns contain dates and not string representations of dates by checking their dtypes:
>>> df.dtypes
dates datetime64[ns]
numbers int64
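A side note on the conversion step above: strings such as 02.03.2019 are ambiguous between day-first and month-first, which is why passing an explicit format is the safer choice. A small sketch with three made-up sample strings:

```python
import pandas as pd

raw = pd.Series(["01.03.2019", "02.03.2019", "21.02.2020"])

# An explicit format removes any ambiguity about which field is the day.
parsed = pd.to_datetime(raw, format="%d.%m.%Y")

print(parsed.dt.month.tolist())  # [3, 3, 2]
print(parsed.dt.day.tolist())    # [1, 2, 21]
```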
Finally plotting:
>>> ax = df.plot(x='dates', y='numbers', figsize=(12, 8))
>>> ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.075), ncol=4)
<matplotlib.legend.Legend object at 0x7fc8c24fd4f0>
>>> plt.show()
Legends are taken care of automatically. This is what you get:

Iteratively plot data through datetime in pandas dataframe

I have a dataframe here that contains a value daily since 2000 (ignore the index).
Extent Date
6453 13.479 2001-01-01
6454 13.385 2001-01-02
6455 13.418 2001-01-03
6456 13.510 2001-01-04
6457 13.566 2001-01-05
I would like to make a plot where the x-axis is the day of the year, and the y-axis is the value. The plot would contain 20 different lines, with each line corresponding to the year of the data. Is there an intuitive way to do this using pandas, or is it easier to do with matplotlib?
Here is a quick paint sketch to illustrate.
One quick way is to plot x-axis as strings:
df['Date'] = pd.to_datetime(df['Date'])
(df.set_index([df.Date.dt.strftime('%m-%d'),
               df.Date.dt.year])
   .Extent.unstack()
   .plot()
)
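A variant of the same idea, sketched under assumptions: it uses the numeric day of the year instead of month-day strings, so the x-axis stays numeric. The data below is a hypothetical stand-in for the question's dataframe, and the column names doy and year are introduced only for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: one value per day, 2001-2003.
dates = pd.date_range("2001-01-01", "2003-12-31", freq="D")
df = pd.DataFrame({"Extent": np.linspace(10.0, 15.0, len(dates)), "Date": dates})

# One row per day of year, one column per calendar year.
wide = (
    df.assign(doy=df["Date"].dt.dayofyear, year=df["Date"].dt.year)
      .pivot(index="doy", columns="year", values="Extent")
)

ax = wide.plot()  # pandas draws one line per column, i.e. one per year
ax.set_xlabel("day of year")
ax.set_ylabel("Extent")
# Call plt.show() to display the figure.
```

One trade-off to be aware of: with day-of-year numbering, dates after February fall one position later in leap years, whereas the month-day string version keeps calendar dates aligned.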

How can I plot different length pandas series with matplotlib?

I've got two pandas series, one with a 7 day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7 day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph is showing up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the two series look like:
These are the first 15 values of the 7 day rolling mean series, which has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!
The problem is that pandas bar plots are categorical (Bars are at subsequent integer positions). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical coordinates as well. This is not possible with pandas, but is the default behaviour with matplotlib.
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
import pandas as pd
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1)))+14, index=t1)
s1[:6] = np.nan
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2))*15+5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()
The problem might be the two series' indices are of very different scales. You can use ax.twiny to plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()

pandas: How to format timestamp axis labels nicely in df.plot()?

I have a dataset that looks like this:
prod_code month items cost
0 040201060AAAIAI 2016-05-01 5 572.20
1 040201060AAAKAK 2016-05-01 164 14805.19
2 040201060AAALAL 2016-05-01 13465 14486.07
Doing df.dtypes shows that the month column is a datetime64[ns] type.
I am now trying to plot the cost per month for a particular product:
df[df.bnf_code=='040201060AAAIAI'][['month', 'cost']].plot()
plt.show()
This works, but the x-axis isn't a timestamp as I'd expect:
How can I format the x-axis labels nicely, with month and year labels?
Update: I also tried this, to get a bar chart, which does output timestamps on the x-axis, but in a very long unwieldy format:
df[df.bnf_code=='040201060AAAIAI'].plot.bar(x='month', y='cost', title='Spending on 040201060AAAIAI')
If you set the dates as index, the x-axis should be labelled properly:
df[df.bnf_code=='040201060AAAIAI'][['month', 'cost']].set_index('month').plot()
I have simply added set_index to your code.
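If explicit month and year labels are wanted, one option is to plot through matplotlib directly and set a date formatter. This is a sketch under assumptions: the rows below are hypothetical (only the first cost value comes from the question), and the '%b %Y' label format is just one choice.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows shaped like the question's data; most values are made up.
df = pd.DataFrame(
    {
        "bnf_code": ["040201060AAAIAI"] * 3 + ["040201060AAAKAK"],
        "month": pd.to_datetime(
            ["2016-05-01", "2016-06-01", "2016-07-01", "2016-05-01"]
        ),
        "cost": [572.20, 610.00, 590.50, 14805.19],
    }
)
product = df[df.bnf_code == "040201060AAAIAI"].set_index("month")

fig, ax = plt.subplots()
ax.plot(product.index, product["cost"], marker="o")
# One tick per month, labelled like "May 2016".
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
# Call plt.show() to display the figure.
```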
