pandas: How to format timestamp axis labels nicely in df.plot()? - python

I have a dataset that looks like this:
prod_code month items cost
0 040201060AAAIAI 2016-05-01 5 572.20
1 040201060AAAKAK 2016-05-01 164 14805.19
2 040201060AAALAL 2016-05-01 13465 14486.07
Doing df.dtypes shows that the month column is a datetime64[ns] type.
I am now trying to plot the cost per month for a particular product:
df[df.bnf_code=='040201060AAAIAI'][['month', 'cost']].plot()
plt.show()
This works, but the x-axis isn't a timestamp as I'd expect:
How can I format the x-axis labels nicely, with month and year labels?
Update: I also tried this, to get a bar chart, which does output timestamps on the x-axis, but in a very long unwieldy format:
df[df.bnf_code=='040201060AAAIAI'].plot.bar(x='month', y='cost', title='Spending on 040201060AAAIAI')

If you set the dates as index, the x-axis should be labelled properly:
df[df.bnf_code=='040201060AAAIAI'][['month', 'cost']].set_index('month').plot()
I have simply added set_index to your code.
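To get explicit month-and-year labels as the question asks, one option is to plot the indexed series through matplotlib directly and attach a date formatter. The following is only a minimal sketch: the sample rows are made up to match the shape of the question's frame, and the "%b %Y" label format is an assumption about what "nicely" means here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows shaped like the frame in the question.
df = pd.DataFrame({
    "bnf_code": ["040201060AAAIAI"] * 4,
    "month": pd.to_datetime(["2016-05-01", "2016-06-01", "2016-07-01", "2016-08-01"]),
    "cost": [572.20, 610.00, 598.40, 640.10],
})

# Same idea as the answer: make the dates the index of the plotted data.
cost = df[df.bnf_code == "040201060AAAIAI"].set_index("month")["cost"]

fig, ax = plt.subplots()
ax.plot(cost.index, cost.values)
ax.xaxis.set_major_locator(mdates.MonthLocator())            # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))  # month name plus year
fig.autofmt_xdate()                                          # rotate labels so they do not overlap
ax.set_title("Spending on 040201060AAAIAI")

fig.canvas.draw()  # forces the tick labels to be computed
labels = [t.get_text() for t in ax.get_xticklabels()]
```

Plotting with ax.plot rather than the pandas .plot() wrapper is a deliberate choice in this sketch, since pandas may install its own date converter on the axis, which does not always combine well with matplotlib.dates formatters.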

Related

Pandas resampling data with bigger interval than a whole index range

Situation
I have the following pandas time series data:
date        predicted1
2001-03-13  0.994756
2005-08-22  0.551661
2000-05-07  0.001396
I need to handle resampling into an interval longer than 5 years, e.g. 10 years:
sample = data.set_index(pd.DatetimeIndex(data['date'])).drop('date', axis=1)['predicted1']
sample.resample('10Y').sum()
I get the following:
date
2000-12-31    0.001396
2010-12-31    1.546418
So the resampling function groups the data for the first year separately from the other years.
Question
How can I group all the data into a single 10-year interval? I want to get something like this:
date
2000-12-31    1.5478132011506138
You can change the reference, closing and label in resample:
sample.resample('10Y', origin=sample.index.min(), closed='left', label='left').sum()
Output:
date
1999-12-31 1.547813
Freq: 10A-DEC, Name: predicted1, dtype: float64
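As an alternative that does not rely on resample frequency aliases at all (so it is a different technique from the answer above, not a restatement of it), the rows can be grouped by a decade label computed from the year. This is a minimal sketch using the three rows from the question. Note the assumption that calendar decades (2000-2009, 2010-2019, ...) are acceptable bins.

```python
import pandas as pd

# The three rows shown in the question.
data = pd.DataFrame({
    "date": ["2001-03-13", "2005-08-22", "2000-05-07"],
    "predicted1": [0.994756, 0.551661, 0.001396],
})
sample = data.set_index(pd.DatetimeIndex(data["date"]))["predicted1"]

# Label every row with the first year of its calendar decade (2000, 2010, ...).
decade = sample.index.year // 10 * 10

# One sum per decade; all three rows fall into the 2000s.
by_decade = sample.groupby(decade).sum()
print(by_decade)
```

If the 10-year bins should start at the earliest observation instead of at a calendar decade, the label could be computed as (sample.index.year - sample.index.year.min()) // 10 in the same way.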

How do I construct ticks and labels when plotting large temporal series with matplotlib where the dates are a str column?

I have a pandas DataFrame holding a time series, where each observation has a date in df['date'] (x-axis) and a value in df['value'] (y-axis):
df['date']
0 2004-01-02
1 2004-01-05
2 2004-01-06
3 2004-01-07
4 2004-01-08
...
4527 2021-12-27
4528 2021-12-28
4529 2021-12-29
4530 2021-12-30
4531 2021-12-31
Name: session, Length: 4532, dtype: object
Notice how the Series contains str types:
print(type(df['date'].values[0]))
<class 'str'>
The entries of df['value'] are just integers.
If I plot the series with matplotlib using df['date'], I get a chart so dense that the xtick labels cannot be read:
ax.plot(df['date'], df['value']);
I want to display xticks at every month change (2004/01, 2004/02, ..., 2021/11, 2021/12) and labels only when the year changes (2004, 2005, ..., 2021). What is the best way, via either numpy or pandas, to get the arrays that .set_xticks requires?
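To elaborate on tdy's comment (all credit for the answer goes to tdy), this is the code excerpt that applies to_datetime and uses the locators:
import matplotlib.dates as mdates
import pandas as pd
# ax is an existing matplotlib Axes; parsing the str column lets matplotlib treat x as dates
ax.plot(pd.to_datetime(df['date']), df['value'])
ax.xaxis.set_major_locator(mdates.YearLocator())  # one major tick per year
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))  # label each major tick with its year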
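The excerpt above covers the yearly labels. The monthly ticks the question also asks for can be added as unlabeled minor ticks. Below is a self-contained sketch of that combination; the two years of business-day data are made up, and using minor ticks for the months is an assumption about the desired look, not something stated in the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: business days stored as str, as in the question.
dates = pd.date_range("2004-01-02", "2005-12-30", freq="B").strftime("%Y-%m-%d")
df = pd.DataFrame({"date": dates, "value": range(len(dates))})

fig, ax = plt.subplots()
ax.plot(pd.to_datetime(df["date"]), df["value"])           # parse the strings into real dates

ax.xaxis.set_major_locator(mdates.YearLocator())           # major tick at each year change
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))   # labelled with the year
ax.xaxis.set_minor_locator(mdates.MonthLocator())          # minor tick at each month change, no label

fig.canvas.draw()  # forces the tick labels to be computed
year_labels = [t.get_text() for t in ax.get_xticklabels()]
```

With locators there is no need to build the tick arrays by hand for .set_xticks; matplotlib recomputes the tick positions whenever the view limits change.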

Iteratively plot data through datetime in pandas dataframe

I have a dataframe here that contains a value daily since 2000 (ignore the index).
Extent Date
6453 13.479 2001-01-01
6454 13.385 2001-01-02
6455 13.418 2001-01-03
6456 13.510 2001-01-04
6457 13.566 2001-01-05
I would like to make a plot where the x-axis is the day of the year, and the y-axis is the value. The plot would contain 20 different lines, with each line corresponding to the year of the data. Is there an intuitive way to do this using pandas, or is it easier to do with matplotlib?
Here is a quick paint sketch to illustrate.
One quick way is to plot the x-axis as month-day strings, with one column (and so one line) per year:
df['Date'] = pd.to_datetime(df['Date'])
(df.set_index([df.Date.dt.strftime('%m-%d'),
               df.Date.dt.year])
   .Extent.unstack()
   .plot()
)
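A variant of the same reshaping uses the numeric day of the year as the x-axis, which is what the question literally asks for. This is a minimal sketch on made-up data (two years of a synthetic seasonal signal). The helper column names doy and year are hypothetical and exist only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop for interactive use
import numpy as np
import pandas as pd

# Made-up daily data for two years, with dates stored as strings.
dates = pd.date_range("2001-01-01", "2002-12-31", freq="D")
df = pd.DataFrame({
    "Extent": 10 + 3 * np.cos(2 * np.pi * dates.dayofyear.to_numpy() / 365),
    "Date": dates.strftime("%Y-%m-%d"),
})

df["Date"] = pd.to_datetime(df["Date"])

# Rows become day-of-year, columns become years, cells hold Extent.
wide = (
    df.assign(doy=df["Date"].dt.dayofyear, year=df["Date"].dt.year)
      .pivot(index="doy", columns="year", values="Extent")
)

# DataFrame.plot draws one line per column, i.e. one line per year.
ax = wide.plot()
ax.set_xlabel("Day of year")
ax.set_ylabel("Extent")
```

Compared with the '%m-%d' strings, the numeric axis keeps the spacing between days uniform, at the cost of leap years having one extra row (day 366) that is empty for other years.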

Include 0 value days on barchart

I currently have a grouped dataframe of dates and values that I am creating a bar chart of:
date | value
--------|--------
7-9-19 | 250
7-14-19 | 400
7-20-19 | 500
7-20-19 | 300
7-21-19 | 200
7-30-19 | 142
When I plot the df, I get back a bar chart showing only the days that have a value. Is there a way to easily plot a bar chart with all the days of the month, without inserting dates and 0 values for all the missing days in the dataframe?
Edit: I left out that certain dates may have more than one entry, so adding the missing dates by re-indexing throws a duplicate axis error.
Solution: I ended up using just the day of the month to avoid dealing with the datetime objects, i.e. 7-9-19 => 9. After a helpful suggestion by Quang Hoang below, I realized I could do this a little more easily using just the day number:
ind = range(1,32)
df = df.reindex(ind, fill_value=0)
You could use reindex; remember to set date as the index first:
# convert to datetime (skip if the column already is datetime)
df['date'] = pd.to_datetime(df['date'], format='%m-%d-%y')
(df.set_index('date')
   .reindex(pd.date_range('2019-07-01', '2019-07-31', freq='D'))
   .plot.bar()
)
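Because the question's edit notes that some dates occur more than once, reindexing directly raises a duplicate axis error. One way around this, sketched here as an assumption about what the asker wants (the sum of the values for each day), is to aggregate per date before reindexing. The data are the six rows from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop for interactive use
import pandas as pd

# The rows from the question, including the duplicated 7-20-19.
df = pd.DataFrame({
    "date": ["7-9-19", "7-14-19", "7-20-19", "7-20-19", "7-21-19", "7-30-19"],
    "value": [250, 400, 500, 300, 200, 142],
})
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%y")

# Collapse duplicate dates first (assumption: values on the same day should be summed).
daily = df.groupby("date")["value"].sum()

# Now the index is unique, so reindexing over the whole month is safe.
full = daily.reindex(pd.date_range("2019-07-01", "2019-07-31", freq="D"), fill_value=0)

# Use the day number as the bar label to keep the x-axis short.
ax = full.rename(index=lambda ts: ts.day).plot.bar()
```

If the duplicates should be averaged or counted instead, only the aggregation after groupby needs to change.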

Datetime, Timedelta and separate lineplot, plot area

My df has a datetime64 index, and its columns are timedelta and float64. Below are 3 example rows of my df.
CTIL downtime ratio
quater
2015-04-01 4859 days 01:46:00 1699 days 17:20:00 0.349804
2015-07-01 4553 days 14:16:00 1862 days 03:27:00 0.408939
2015-10-01 5502 days 21:18:00 2442 days 20:15:00 0.443920
I would like to plot it all on one chart. CTIL and downtime should be area plots and ratio should be a line chart.
Currently I have 2 separate plots:
df_quater[['CTIL', 'downtime']].plot()
df_quater['ratio'].plot()
Question 1:
How can I draw an area plot when the type of x is different from the type of y?
I tried this:
df_quater[['CTIL', 'downtime']].plot(kind='area')
It raises an error:
TypeError: Cannot cast ufunc greater_equal input (...) with casting rule 'same_kind'
Question 2:
Can the labels on the y-axis be in timedelta format? The current plot shows plain numbers.
Question 3:
Can I combine these 2 plots into one? The label for CTIL and downtime should be on the left and the label for ratio should be on
