Matplotlib & Pandas DateTime Compatibility

Problem: I am trying to make a very simple bar chart in Matplotlib of a Pandas DataFrame. The DateTime index is causing confusion, however: Matplotlib does not appear to understand the Pandas DateTime, and is labeling the years incorrectly. How can I fix this?
Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Make date time series
index_dates = pd.date_range('2018-01-01', '2021-01-01')
# Make data frame with some random data, using the date time index
df = pd.DataFrame(index=index_dates,
                  data=np.random.rand(len(index_dates)),
                  columns=['Data'])
# Make a bar chart in Matplotlib
fig, ax = plt.subplots(figsize=(12,8))
df.plot.bar(ax=ax)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
Instead of showing up as 2018-2021, however, the years show up as 1970 - 1973.
I've already looked at several related answers and at the documentation. I know the index is in fact a DatetimeIndex because df.info() reports it as one, and index_dates[0].year returns 2018. How can I fix this? Thank you!

The problem is with mixing df.plot.bar and matplotlib here.
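df.plot.bar sets tick locations starting from 0 (and assigns labels), while matplotlib.dates expects the locations to be the number of days since 1970-01-01. The roughly 1,100 bar positions are therefore read as the first 1,100 days after 1970-01-01, which is why the labels run from 1970 to 1973.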
If you do it with matplotlib directly, it shows labels correctly:
# Make a bar chart in Matplotlib
fig, ax = plt.subplots(figsize=(12,8))
ax.bar(x=df.index, height=df['Data'])
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
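The mismatch between the two coordinate systems can be checked directly. The following is a minimal sketch, assuming a recent Matplotlib whose default date epoch is 1970-01-01 (older releases used a different epoch); the five-row frame and the names bar_positions and date_positions are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data, not the asker's)
index_dates = pd.date_range('2018-01-01', periods=5)
df = pd.DataFrame({'Data': np.arange(5.0)}, index=index_dates)

fig, ax = plt.subplots()
df.plot.bar(ax=ax)

# Where pandas actually put the bars: categorical positions 0, 1, 2, ...
bar_positions = ax.get_xticks()

# Where matplotlib.dates would expect these dates: days since the epoch
date_positions = mdates.date2num(df.index.to_pydatetime())

print(bar_positions)
print(date_positions)
```

The bars sit at 0 to 4, while the same dates expressed as Matplotlib date numbers are in the tens of thousands, so a date formatter applied to the bar axis interprets the positions as the first few days after the epoch.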

Related

Matplotlib - highlighting weekends on x axis?

I've a time series (typically energy usage) recorded over a range of days. Since usage tends to be different over the weekend I want to highlight the weekends.
I've done what seems sensible:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import random
# Create dummy data.
start = datetime.datetime(2022, 10, 22, 0, 0)
finish = datetime.datetime(2022, 11, 7, 0, 0)
def randomWalk():
    i = 0
    while True:
        i = i + random.random() - 0.5
        yield i
walk = randomWalk()  # create the generator once so successive values accumulate
s = pd.Series({i: next(walk) for i in pd.date_range(start, finish, freq='h')})
# Plot it.
plt.figure(figsize=[12, 8])
s.plot()
# Color the labels according to the day of week.
for label, day in zip(plt.gca().xaxis.get_ticklabels(which='minor'),
                      pd.date_range(start, finish, freq='D')):
    label.set_color('red' if day.weekday() > 4 else 'black')
But what I get is wrong. Two weekends appear one off, and the third doesn't show at all.
I've explored the 'label' objects, but their X coordinate is just an integer, and doesn't seem meaningful. Using DateFormatter just gives nonsense.
How would be best to fix this, please?

OK - since matplotlib only provides the information we need to the Tick Label Formatter functions, that's what we have to use:
import numpy as np
minorLabels = plt.gca().xaxis.get_ticklabels(which='minor')
majorLabels = plt.gca().xaxis.get_ticklabels(which='major')
def MinorFormatter(dateInMinutes, index):
    # Formatter: first param is the tick value (the date in minutes, would you believe),
    # second is the tick's position in order.
    day = pd.to_datetime(np.datetime64(int(dateInMinutes), 'm'))
    minorLabels[index].set_color('red' if day.weekday() == 6 else 'black')  # Sunday
    return day.day
def MajorFormatter(dateInMinutes, index):
    day = pd.to_datetime(np.datetime64(int(dateInMinutes), 'm'))
    majorLabels[index].set_color('red' if day.weekday() == 6 else 'black')  # Sunday
    return "" if (index == 0 or index == len(majorLabels) - 1) else day.strftime("%d\n%b\n%Y")
plt.gca().xaxis.set_minor_formatter(MinorFormatter)
plt.gca().xaxis.set_major_formatter(MajorFormatter)
Pretty clunky, but it works. Could be fragile, though - anyone got a better answer?
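One possible alternative, offered only as a sketch and not as what the answer above does: plot with plain Matplotlib so the x values are real dates, then shade each weekend with axvspan instead of recolouring tick labels. The cumulative-sum series and the name saturdays are illustrative assumptions, and the sketch assumes the data does not start on a Sunday:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Dummy hourly data over the same range as the question (values are made up)
idx = pd.date_range("2022-10-22", "2022-11-07", freq="h")
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(len(idx)).cumsum(), index=idx)

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(s.index, s.values)  # plain Matplotlib keeps the x axis in date units

# Every Saturday at midnight within the data range
days = pd.date_range(idx[0].normalize(), idx[-1].normalize(), freq="D")
saturdays = [d for d in days if d.weekday() == 5]

# Shade from Saturday 00:00 to Monday 00:00
for sat in saturdays:
    ax.axvspan(sat, sat + pd.Timedelta(days=2), color="red", alpha=0.15)
```

Shading the data region avoids depending on which tick labels happen to exist, which is the fragile part of matching labels to days by position.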

Matplotlib is meant for scientific use and although technically styling is possible, it's really hard and not worth the effort.
Consider using Plotly instead of Matplotlib as below:
#pip install plotly in terminal
import plotly.express as px
# read plotly express provided sample dataframe
df = px.data.tips()
# create plotly figure with color_discrete_map property specifying color per day
fig = px.bar(df, x="day", y="total_bill", color='day',
             color_discrete_map={"Sat": "orange", "Sun": "orange", "Thur": "blue", "Fri": "blue"}
             )
# send to browser
fig.show()
This solves your problem in far fewer lines. The only catch is that your data needs to be in a pandas DataFrame with named columns, rather than a Series, so that you can pass the column names to plotly.express.bar or to a scatter plot.

How to display only certain x axis values on plot

I am plotting values from a dataframe where time is the x-axis. The time is formatted as 00:00 to 23:45. I only want to display the specific times 00:00, 06:00, 12:00, 18:00 on the x-axis of my plot. How can this be done? I have posted two figures, the first shows the format of my dataframe after setting the index to time. And the second shows my figure. Thank you for your help!
monday.set_index("Time", drop=True, inplace=True)
monday_figure = monday.plot(kind='line', legend=False,
                            title='Monday Average Power consumption')
monday_figure.xaxis.set_major_locator(plt.MaxNLocator(8))
Edit: Adding data as text:
Time,DayOfWeek,kW
00:00:00,Monday,5.8825
00:15:00,Monday,6.0425
00:30:00,Monday,6.0025
00:45:00,Monday,5.7475
01:00:00,Monday,6.11
01:15:00,Monday,5.8025
01:30:00,Monday,5.6375
01:45:00,Monday,5.85
02:00:00,Monday,5.7250000000000005
02:15:00,Monday,5.66
02:30:00,Monday,6.0025
02:45:00,Monday,5.71
03:00:00,Monday,5.7425
03:15:00,Monday,5.6925
03:30:00,Monday,5.9475
03:45:00,Monday,6.380000000000001
04:00:00,Monday,5.65
04:15:00,Monday,5.8725
04:30:00,Monday,5.865
04:45:00,Monday,5.71
05:00:00,Monday,5.6925
05:15:00,Monday,5.9975000000000005
05:30:00,Monday,5.905000000000001
05:45:00,Monday,5.93
06:00:00,Monday,5.6025
06:15:00,Monday,6.685
06:30:00,Monday,7.955
06:45:00,Monday,8.9225
07:00:00,Monday,10.135
07:15:00,Monday,12.9475
07:30:00,Monday,14.327499999999999
07:45:00,Monday,14.407499999999999
08:00:00,Monday,15.355
08:15:00,Monday,16.2175
08:30:00,Monday,18.355
08:45:00,Monday,18.902499999999996
09:00:00,Monday,19.0175
09:15:00,Monday,20.0025
09:30:00,Monday,20.355
09:45:00,Monday,20.3175
10:00:00,Monday,20.8025
10:15:00,Monday,20.765
10:30:00,Monday,21.07
10:45:00,Monday,19.9825
11:00:00,Monday,20.94
11:15:00,Monday,22.1325
11:30:00,Monday,20.6275
11:45:00,Monday,21.4475
12:00:00,Monday,22.092499999999998
The image above is produced using the code from the comment below.

Make sure you have a datetime index, created with pd.to_datetime, when plotting time series.
I then used matplotlib.dates (imported as mdates) to pick the desired ticks and format them in the plot. I don't know if it can be done from pandas with df.plot.
See the Matplotlib date tick labels example. You can customize the HourLocator or use a different locator to suit your needs. Minor ticks are created the same way with ax.xaxis.set_minor_locator. Hope it helps.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Using your dataframe
df = pd.read_clipboard(sep=',')
# Make sure you have a datetime index
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time')
fig, ax = plt.subplots(1,1)
ax.plot(df['kW'])
# Use mdates to detect hours
locator = mdates.HourLocator(byhour=[0,6,12,18])
ax.xaxis.set_major_locator(locator)
# Format x ticks
formatter = mdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(formatter)
# rotates and right aligns the x labels, and moves the bottom of the axes up to make room for them
fig.autofmt_xdate()
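Because pd.read_clipboard depends on what is currently in the clipboard, here is a self-contained variant of the same idea for checking which ticks the locator produces. It is only a sketch: the 15-minute index on an arbitrary date and the linearly increasing kW values are stand-ins for the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the Monday profile: one day at 15-minute resolution
idx = pd.date_range("2021-01-04 00:00", "2021-01-04 23:45", freq="15min")
df = pd.DataFrame({"kW": np.linspace(5.0, 20.0, len(idx))}, index=idx)

fig, ax = plt.subplots()
ax.plot(df.index, df["kW"])

# Ticks only at 00:00, 06:00, 12:00 and 18:00
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[0, 6, 12, 18]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Tick labels are only populated once the figure has been drawn
fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```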

mpl_finance remove empty dates on Candlestick Python

I'm working with a DataFrame. My data is using for a Candlestick.
The problem is that I can't remove the weekend dates. My code shows this:
[image: candlestick chart with gaps at non-trading days]
And I'm looking for this:
[image: candlestick chart without the gaps]
Here is my code:
import matplotlib.ticker as ticker
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_finance import candlestick_ohlc
df = pd.read_csv('AAPL.csv')
df['Date'] = pd.to_datetime(df['Date'])
df["Date"] = df["Date"].apply(mdates.date2num)
dates = df['Date'].tolist()
ohlc = df[['Date', 'Open', 'High', 'Low','Close']]
f1, ax = plt.subplots(figsize = (12,6))
candlestick_ohlc(ax, ohlc.values, width=.5, colorup='green', colordown='red')
ax.xaxis.set_major_locator(ticker.MultipleLocator(1.0))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.setp(ax.get_xticklabels(), rotation=70, fontsize=7)
close = df['Close'].values
plt.plot(dates,close, marker='o')
plt.show()
Dataframe:
Date,Open,High,Low,Close,Adj Close,Volume
2019-02-04,167.410004,171.660004,167.279999,171.250000,170.518677,31495500
2019-02-05,172.860001,175.080002,172.350006,174.179993,173.436157,36101600
2019-02-06,174.649994,175.570007,172.850006,174.240005,173.495911,28239600
2019-02-07,172.399994,173.940002,170.339996,170.940002,170.210007,31741700
2019-02-08,168.990005,170.660004,168.419998,170.410004,170.410004,23820000
2019-02-11,171.050003,171.210007,169.250000,169.429993,169.429993,20993400
2019-02-12,170.100006,171.000000,169.699997,170.889999,170.889999,22283500
2019-02-13,171.389999,172.479996,169.919998,170.179993,170.179993,22490200
2019-02-14,169.710007,171.259995,169.380005,170.800003,170.800003,21835700
2019-02-15,171.250000,171.699997,169.750000,170.419998,170.419998,24626800
2019-02-19,169.710007,171.440002,169.490005,170.929993,170.929993,18972800
2019-02-20,171.190002,173.320007,170.990005,172.029999,172.029999,26114400
2019-02-21,171.800003,172.369995,170.300003,171.059998,171.059998,17249700
2019-02-22,171.580002,173.000000,171.380005,172.970001,172.970001,18913200
2019-02-25,174.160004,175.869995,173.949997,174.229996,174.229996,21873400
2019-02-26,173.710007,175.300003,173.169998,174.330002,174.330002,17070200
2019-02-27,173.210007,175.000000,172.729996,174.869995,174.869995,27835400
2019-02-28,174.320007,174.910004,172.919998,173.149994,173.149994,28215400

This is not a complete solution, but I can suggest something.
Just use
import mplfinance as mpf
mpf.plot(df, type='candle')
This automatically skips non-trading days in the plot, which made me a little happier, though I wasn't fully satisfied with it. Note that mpf.plot expects df to have a DatetimeIndex. I hope this helps.
See the basic usage section:
https://github.com/matplotlib/mplfinance#basic-usage

You can slice the non-business days out of the dataframe before processing.
See the question "Remove non-business days rows from pandas dataframe".

Do not use date/time as your index; use a candle number as the index instead.
Then your data becomes continuous and the time series has no interruptions.
So use the candle number as the index, and plot against it rather than against a date/time.
If you want to plot against a date/time, you need a column holding each candle's timestamp and plot with that, but then you will have the gaps again.
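A minimal sketch of that candle-number idea, keeping date labels without reintroducing the gaps: plot against integer positions and let a tick formatter look up the date for each position. The helper name format_date is hypothetical, and only a few closing prices from the question's data are used, as a plain line rather than candles:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import pandas as pd

# Four trading days around a weekend and a holiday (values from the question's table)
dates = pd.to_datetime(["2019-02-14", "2019-02-15", "2019-02-19", "2019-02-20"])
close = [170.800003, 170.419998, 170.929993, 172.029999]

positions = list(range(len(dates)))  # candle numbers 0, 1, 2, ...

def format_date(x, pos=None):
    """Map a candle number back to its date label (empty outside the data)."""
    i = int(round(x))
    if 0 <= i < len(dates):
        return dates[i].strftime("%Y-%m-%d")
    return ""

fig, ax = plt.subplots()
ax.plot(positions, close, marker="o")
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
fig.autofmt_xdate()
```

Because the x axis is just 0, 1, 2, 3, the jump from 2019-02-15 to 2019-02-19 takes up the same width as any other step.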

Try to filter your dataframe.
df = df[df.Open.notnull()]
Add this to your plot.
show_nontrading=False

Reading data from csv and create a graph

I have a csv file with data in the following format -
Issue_Type DateTime
Issue1 03/07/2011 11:20:44
Issue2 01/05/2011 12:30:34
Issue3 01/01/2011 09:44:21
... ...
I'm able to read this csv file, but what I'm unable to achieve is to plot a graph or rather trend based on the data.
For instance, I'm trying to plot a graph with the X-axis as DateTime (month only) and the Y-axis as the number of issues, so the line graph would show three lines, one per issue category, giving the monthly trend for each.
I don't really have any code for plotting the graph and hence can't share any; so far I'm only reading the csv file. I'm not sure how to proceed further to plot a graph.
PS: I'm not bent on using Python. Since I've parsed csv using Python before, I thought of using the language, but if there is an easier approach using some other language, I would be open to exploring that as well.

A way to do this is to use dataframes with pandas.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.read_csv(r"D:\Programmes Python\Data\Data_csv.txt",sep=";") #Reads the csv (raw string keeps the backslashes literal)
df.index = pd.to_datetime(df["DateTime"]) #Set the index of the dataframe to the DateTime column
del df["DateTime"] #The DateTime column is now useless
fig, ax = plt.subplots()
ax.plot(df.index,df["Issue_Type"])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m')) #This will only show the month number on the graph
This assumes that Issue1/2/3 are integers, I assumed they were as I didn't really understand what they were supposed to be.
Edit: This should do the trick then, it's not pretty and can probably be optimised, but it works well:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.read_csv(r"D:\Programmes Python\Data\Data_csv.txt", sep=";")
df.index = pd.to_datetime(df["DateTime"])
del df["DateTime"]
# Strip the "Issue" prefix and keep the trailing number
issue_numbers = []
for issue in df["Issue_Type"]:
    issue_numbers.append(int(issue[5:]))
df["Issue_number"] = issue_numbers
fig, ax = plt.subplots()
ax.plot(df.index, df["Issue_number"])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m'))
plt.show()

The first thing you need to do is to parse the datetime fields as dates/times. Try using dateutil.parser for that.
Next, you will need to count the number of issues of each type in each month. The naive way to do that would be to maintain a list of counters for each issue type, iterate through each row, see which month and which issue type it is, and then increment the appropriate counter.
When you have such a frequency count of issues, sorted by issue types, you can simply plot them against dates like this:
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# starting_year, ending_year and issues (one list of monthly counts per issue type)
# are assumed to come from the counting step described above
dates = []
for year in range(starting_year, ending_year):
    for month in range(1, 13):  # range() excludes its end, so 13 covers December
        dates.append(dt.datetime(year=year, month=month, day=1))
month_formatter = mdates.DateFormatter('%b') # Format dates to only show month names
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(dates, issues[0]) # To plot just issues of type 1
ax.plot(dates, issues[1]) # To plot just issues of type 2
ax.plot(dates, issues[2]) # To plot just issues of type 3
ax.xaxis.set_major_formatter(month_formatter) # Format X tick labels
plt.show()
plt.close()
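The counting step can also be left to pandas. This is a sketch under stated assumptions, not the method of either answer: the file is taken to be comma-separated, the dates are assumed to be month-first, and the fourth sample row is invented so that one month has more than one issue of the same type.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Inline sample in the question's layout; the last row is made up for illustration
csv_text = """Issue_Type,DateTime
Issue1,03/07/2011 11:20:44
Issue2,01/05/2011 12:30:34
Issue3,01/01/2011 09:44:21
Issue1,03/15/2011 08:00:00
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: month/day/year ordering
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%m/%d/%Y %H:%M:%S")

# Count issues per (month, issue type), then pivot issue types into columns
month = df["DateTime"].dt.to_period("M")
counts = df.groupby([month, "Issue_Type"]).size().unstack(fill_value=0)

# Plot one line per issue type, using month-start timestamps on the x axis
plot_data = counts.copy()
plot_data.index = plot_data.index.to_timestamp()
fig, ax = plt.subplots()
plot_data.plot(ax=ax, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Number of issues")
```

Each column of counts is one issue type and each row one month, which is the shape the question asks for: one line per category across months.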

honestly, I would just use R. check this link out on downloading / setting up R & RStudio.
data <- read.csv(file="c:/yourdatafile.csv", header=TRUE, sep=",")
attach(data)
data$Month <- format(as.Date(data$DateTime), "%m")
plot(DateTime, Issue_Type)

pandas .plot() x-axis tick frequency -- how can I show more ticks?

I am plotting time series using pandas .plot() and want to see every month shown as an x-tick.
Here is the dataset structure
Here is the result of the .plot()
I was trying to use examples from other posts and matplotlib documentation and do something like
ax.xaxis.set_major_locator(
    dates.MonthLocator(revenue_pivot.index, bymonthday=1, interval=1))
But that removed all the ticks :(
I also tried to pass xticks = df.index, but it has not changed anything.
What would be the right way to show more ticks on the x-axis?

No need to pass any args to MonthLocator. Make sure to use x_compat in the df.plot() call, per @Rotkiv's answer.
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
import matplotlib.dates as mdates
df = pd.DataFrame(np.random.rand(100,2), index=pd.date_range('1-1-2018', periods=100))
ax = df.plot(x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
plt.show()
[images: formatted x-axis with set_major_locator; unformatted x-axis]

You could also format the x-axis ticks and labels of a pandas DateTimeIndex "manually" using the attributes of a pandas Timestamp object.
I found that much easier than using locators from matplotlib.dates which work on other datetime formats than pandas (if I am not mistaken) and thus sometimes show an odd behaviour if dates are not converted accordingly.
Here's a generic example that shows the first day of each month as a label based on attributes of pandas Timestamp objects:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# data
dim = 8760
idx = pd.date_range('1/1/2000 00:00:00', freq='h', periods=dim)
df = pd.DataFrame(np.random.randn(dim, 2), index=idx)
# select tick positions based on timestamp attribute logic. see:
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html
positions = [p for p in df.index
             if p.hour == 0
             and p.is_month_start
             and p.month in range(1, 13, 1)]
# for date formatting, see:
# https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
labels = [l.strftime('%m-%d') for l in positions]
# plot with adjusted labels
ax = df.plot(kind='line', grid=True)
ax.set_xlabel('Time (h)')
ax.set_ylabel('Foo (Bar)')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
plt.show()
yields:
Hope this helps!

The right way to do that is described here:
"Using the x_compat parameter, it is possible to suppress automatic tick resolution adjustment."
df.A.plot(x_compat=True)
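A small sketch of how x_compat combines with a Matplotlib locator and formatter, assuming a 100-day daily index and a single column A (both made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = pd.date_range("2018-01-01", periods=100)
df = pd.DataFrame({"A": np.arange(100.0)}, index=idx)

fig, ax = plt.subplots()
# x_compat=True makes pandas skip its own tick handling for time series,
# so matplotlib.dates locators and formatters apply as usual
df["A"].plot(ax=ax, x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Tick labels are only populated once the figure has been drawn
fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```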

If you want to just show more ticks, you can also dive deep into the structure of pd.plotting._converter:
dai = ax.xaxis.minor.formatter.plot_obj.date_axis_info
dai['fmt'][dai['fmt'] == b''] = b'%b'
After plotting, the formatter is a TimeSeries_DateFormatter and _set_default_format has been called, so self.plot_obj.date_axis_info is not None. You can now manipulate the structured array .date_axis_info to be to your liking, namely contain less b'' and more b'%b'

Remove tick labels:
ax = df.plot(x='date', y=['count'])
every_nth = 10
for n, label in enumerate(ax.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)
Lower every_nth to include more labels, raise to keep fewer.
