Problem: I am trying to make a very simple bar chart in Matplotlib of a Pandas DataFrame. The DateTime index is causing confusion, however: Matplotlib does not appear to understand the Pandas DateTime, and is labeling the years incorrectly. How can I fix this?
Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Make date time series
index_dates = pd.date_range('2018-01-01', '2021-01-01')
# Make data frame with some random data, using the date time index
df = pd.DataFrame(index=index_dates,
                  data=np.random.rand(len(index_dates)),
                  columns=['Data'])
# Make a bar chart in Matplotlib
fig, ax = plt.subplots(figsize=(12,8))
df.plot.bar(ax=ax)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
Instead of showing up as 2018-2021, however, the years show up as 1970 - 1973.
I've already looked at the answers here, here, and documentation here. I know the date timeindex is in fact a datetime index because when I call df.info() it shows it as a datetime index, and when I call index_dates[0].year it returns 2018. How can I fix this? Thank you!
The problem is with mixing df.plot.bar and matplotlib here.
df.plot.bar sets tick locations starting from 0 (and assigns labels), while matplotlib.dates expects the locations to be the number of days since 1970-01-01 (more info here).
If you do it with matplotlib directly, it shows labels correctly:
# Make a bar chart in Matplotlib
fig, ax = plt.subplots(figsize=(12,8))
ax.bar(x=df.index, height=df['Data'])
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
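To see the mismatch concretely, here is a minimal sketch. It assumes Matplotlib 3.3 or later, where the default date epoch is 1970-01-01; the shortened date range and the names tick_positions and year_starts are my own, not from the question. It checks where df.plot.bar puts the ticks, what those positions mean as Matplotlib dates, and shows one possible way to keep df.plot.bar by labelling the year starts by bar position instead of using matplotlib.dates:

```python
import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Shortened range (an assumption for illustration) that still crosses a year boundary
index_dates = pd.date_range("2018-01-01", "2019-01-05")
df = pd.DataFrame({"Data": np.random.rand(len(index_dates))}, index=index_dates)

fig, ax = plt.subplots(figsize=(12, 8))
df.plot.bar(ax=ax)

# pandas places the bars (and their ticks) at integer positions 0, 1, 2, ...
tick_positions = ax.get_xticks()
print(tick_positions[:3])

# matplotlib.dates interprets position 0 as day 0 of its epoch
print(mdates.num2date(0).year)

# The real first date corresponds to a much larger date number
print(mdates.date2num(datetime.datetime(2018, 1, 1)))

# Workaround sketch: label by bar position rather than with matplotlib.dates
year_starts = np.flatnonzero(df.index.is_year_start)
ax.set_xticks(year_starts)
ax.set_xticklabels(list(df.index[year_starts].strftime("%Y")), rotation=0)
```

Call plt.show() afterwards to display the figure. The workaround trades the date locators for plain positional ticks, which is usually enough when only the year labels are needed.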
I want to reduce the number of x-axis labels. I'm plotting datetime information, and the labels take up so much space along the axis that I can't read them.
So I think I need some way to scale them.
dates = pd.read_csv("EURUSDtest.csv")
dates = dates["Date"]+" " + dates["Time"]
plt.title("EUR/USD")
plt.plot(dates, data_pred)
plt.xticks(rotation="vertical")
plt.tick_params(labelsize=10)
plt.plot(forecasting)
IIUC: You need to convert the dates column to pandas datetime type by calling pd.to_datetime.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# To reproduce the issue you have, let's create a date column stored as strings
df = pd.DataFrame({"Dates":pd.date_range(start='2018-1-1', end='2019-1-1', freq='15min').strftime("%m-%d-%Y %H:%M:%S")})
# Convert the date strings to datetime type, stating the format explicitly
df["Dates"] = pd.to_datetime(df["Dates"], format="%m-%d-%Y %H:%M:%S")
# Add column to assign some dummy values
df = df.assign(VAL=np.linspace(10, 110, len(df)))
# Plot the graph
# With real datetimes, Matplotlib now chooses the x-axis ticks and limits automatically
plt.title("eur/usd")
plt.plot(df["Dates"], df["VAL"])
plt.xticks(rotation="vertical")
plt.show()
However, if you need further control over xlim, you will need to go through the Matplotlib tutorials.
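As a starting point for that kind of control, here is a hedged sketch using Matplotlib's own date tools. The maxticks value and the March window are arbitrary choices of mine, not something from the question. It caps the number of date ticks with AutoDateLocator, shortens the labels with ConciseDateFormatter (available since Matplotlib 3.1), and restricts the x-axis with set_xlim:

```python
import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of dummy data as above, already stored as real datetimes
dates = pd.date_range(start="2018-01-01", end="2019-01-01", freq="15min")
df = pd.DataFrame({"Dates": dates, "VAL": np.linspace(10, 110, len(dates))})

fig, ax = plt.subplots()
ax.plot(df["Dates"], df["VAL"])

# Ask for at most about 8 major ticks, whatever the zoom level
locator = mdates.AutoDateLocator(maxticks=8)
ax.xaxis.set_major_locator(locator)
# Compact labels that avoid repeating the year or month on every tick
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

# Zoom in on one month (arbitrary window for illustration)
ax.set_xlim(datetime.datetime(2018, 3, 1), datetime.datetime(2018, 4, 1))
```

Call plt.show() to display the result.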
I'm working with a DataFrame. My data is used for a candlestick chart.
The problem is that I can't remove the weekend dates. My code produces a chart with gaps where the weekends fall (image not shown), and I'm looking for a chart without those gaps (image not shown).
Here is my code:
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_finance import candlestick_ohlc
df = pd.read_csv('AAPL.csv')
df['Date'] = pd.to_datetime(df['Date'])
df["Date"] = df["Date"].apply(mdates.date2num)
dates = df['Date'].tolist()
ohlc = df[['Date', 'Open', 'High', 'Low','Close']]
f1, ax = plt.subplots(figsize = (12,6))
candlestick_ohlc(ax, ohlc.values, width=.5, colorup='green', colordown='red')
ax.xaxis.set_major_locator(ticker.MultipleLocator(1.0))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.setp(ax.get_xticklabels(), rotation=70, fontsize=7)
close = df['Close'].values
plt.plot(dates,close, marker='o')
plt.show()
Dataframe:
Date,Open,High,Low,Close,Adj Close,Volume
2019-02-04,167.410004,171.660004,167.279999,171.250000,170.518677,31495500
2019-02-05,172.860001,175.080002,172.350006,174.179993,173.436157,36101600
2019-02-06,174.649994,175.570007,172.850006,174.240005,173.495911,28239600
2019-02-07,172.399994,173.940002,170.339996,170.940002,170.210007,31741700
2019-02-08,168.990005,170.660004,168.419998,170.410004,170.410004,23820000
2019-02-11,171.050003,171.210007,169.250000,169.429993,169.429993,20993400
2019-02-12,170.100006,171.000000,169.699997,170.889999,170.889999,22283500
2019-02-13,171.389999,172.479996,169.919998,170.179993,170.179993,22490200
2019-02-14,169.710007,171.259995,169.380005,170.800003,170.800003,21835700
2019-02-15,171.250000,171.699997,169.750000,170.419998,170.419998,24626800
2019-02-19,169.710007,171.440002,169.490005,170.929993,170.929993,18972800
2019-02-20,171.190002,173.320007,170.990005,172.029999,172.029999,26114400
2019-02-21,171.800003,172.369995,170.300003,171.059998,171.059998,17249700
2019-02-22,171.580002,173.000000,171.380005,172.970001,172.970001,18913200
2019-02-25,174.160004,175.869995,173.949997,174.229996,174.229996,21873400
2019-02-26,173.710007,175.300003,173.169998,174.330002,174.330002,17070200
2019-02-27,173.210007,175.000000,172.729996,174.869995,174.869995,27835400
2019-02-28,174.320007,174.910004,172.919998,173.149994,173.149994,28215400
This is not a complete solution, but I can suggest something.
Just use
import mplfinance as mpf
# mplfinance expects a DataFrame with a DatetimeIndex
df = pd.read_csv('AAPL.csv', index_col='Date', parse_dates=True)
mpf.plot(df, type='candle')
This skips non-trading days in the plot automatically. It made me a little happier, though I wasn't fully satisfied with it. I hope this helps.
See the basic usage section of the README:
https://github.com/matplotlib/mplfinance#basic-usage
You can filter the non-business days out of the DataFrame before processing it.
Please check this linked question: Remove non-business days rows from pandas dataframe
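A minimal sketch of that filtering, assuming the DataFrame has a DatetimeIndex. The two-week frame below is placeholder data of mine, not the AAPL file. It keeps only Monday to Friday via the dayofweek attribute:

```python
import pandas as pd

# Placeholder frame with one row per calendar day, weekends included
idx = pd.date_range("2019-02-04", "2019-02-17", freq="D")
df = pd.DataFrame({"Close": range(len(idx))}, index=idx)

# dayofweek is 0 for Monday through 6 for Sunday, so < 5 keeps Mon-Fri
weekdays_only = df[df.index.dayofweek < 5]
print(len(df), len(weekdays_only))
```

Note that this removes rows only; it does not handle market holidays (the data in the question skips 2019-02-18, for example), and a date-based x-axis will still leave empty space for the missing days.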
Do not use the date/time as your index; use a candle number instead.
Your data then becomes continuous, with no interruptions in the time series.
So use the candle number as the index, and plot against it rather than against the date/time.
If you want to plot against the date/time, you need a column holding each candle's timestamp and must plot against that column, but then you will have the gaps again.
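One way to sketch the candle-number idea while still showing dates on the axis is to plot against integer positions and translate those positions back to dates in a tick formatter. In the example below, the four rows are taken from the data in the question, while candle_number and label_for_position are names I made up for illustration. Only the closing-price line is drawn; the same integer positions could presumably be used as the first column passed to candlestick_ohlc:

```python
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter, MaxNLocator

# Four rows from the question's data, spanning a weekend and a holiday
csv_text = """Date,Close
2019-02-14,170.800003
2019-02-15,170.419998
2019-02-19,170.929993
2019-02-20,172.029999
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Evenly spaced positions: one per candle, regardless of calendar gaps
candle_number = np.arange(len(df))


def label_for_position(pos, _tick_index=None):
    """Return the date of the candle at an integer position, else ''."""
    i = int(round(pos))
    if abs(pos - i) > 1e-9 or not 0 <= i < len(df):
        return ""
    return df["Date"].iloc[i].strftime("%Y-%m-%d")


fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(candle_number, df["Close"], marker="o")
# Ticks only at whole candle numbers, labelled with the matching date
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
ax.xaxis.set_major_formatter(FuncFormatter(label_for_position))
ax.tick_params(axis="x", labelrotation=70, labelsize=7)
```

The trade-off is that the x-axis is no longer to scale in time: a three-day weekend takes up the same width as a single overnight gap.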
Try to filter your dataframe.
df = df[df.Open.notnull()]
Add this to your plot.
show_nontrading=False
If I run the following, it appears to work as expected, but the y-axis is limited to the earliest and latest times in the data. I want it to show midnight to midnight. I thought I could do that with the code that's commented out. But when I uncomment it, I get the correct y-axis, yet nothing plots. Where am I going wrong?
from datetime import datetime
import matplotlib.pyplot as plt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
x = []
y = []
for i in range(0, len(data)):
    t = datetime.strptime(data[i], '%Y-%m-%d %H:%M:%S')
    x.append(t.strftime('%Y-%m-%d')) # X-axis = date
    y.append(t.strftime('%H:%M:%S')) # Y-axis = time
plt.plot(x, y, '.')
# begin = datetime.strptime('00:00:00', '%H:%M:%S').strftime('%H:%M:%S')
# end = datetime.strptime('23:59:59', '%H:%M:%S').strftime('%H:%M:%S')
# plt.ylim(begin, end)
plt.show()
Edit: I also noticed that the x-axis isn't right either. The data skips Jan 2, but I want that on the axis so the data is to scale.
This is a dramatically simplified version of code dealing with over a year's worth of data with over 2,500 entries.
If Pandas is available to you, consider this approach:
import pandas as pd
data = pd.to_datetime(data, yearfirst=True)
plt.plot(data.date, data.time)
_=plt.ylim(["00:00:00", "23:59:59"])
Update per comments
X-axis date formatting can be adjusted using the Locator and Formatter classes of the matplotlib.dates module. A Locator finds the tick positions, and a Formatter specifies how you want the labels to appear.
Sometimes Matplotlib/Pandas just gets it right, other times you need to call out exactly what you want using these extra methods. In this case, I'm not sure why those numbers are showing up, but this code will remove them.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time)
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
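As an alternative that does not depend on how Matplotlib or pandas convert datetime.time objects, here is a hedged sketch that plots the time of day as a plain number of hours since midnight. This is a different technique from the answer above; the names stamps, days, and hours are my own. Because the y-axis is numeric, setting it to run from midnight to midnight is just set_ylim(0, 24), and because the x-axis holds real dates, the missing Jan 2 still gets its own place on the axis:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44',
        '2018-01-03 15:30:27', '2018-01-04 11:55:09']

stamps = pd.to_datetime(pd.Series(data))
days = stamps.dt.normalize()  # the date part, at midnight
hours = stamps.dt.hour + stamps.dt.minute / 60 + stamps.dt.second / 3600

fig, ax = plt.subplots()
ax.plot(days, hours, '.')

# Y-axis: midnight to midnight, labelled every three hours as HH:00
ax.set_ylim(0, 24)
ax.set_yticks(list(range(0, 25, 3)))
ax.yaxis.set_major_formatter(FuncFormatter(lambda h, _pos: f"{int(h):02d}:00"))

# X-axis: one tick per calendar day, including days without data
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
```

For a year's worth of data, DayLocator would produce far too many ticks; something like mdates.MonthLocator() is likely a better fit there.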
I am plotting time series using pandas .plot() and want to see every month shown as an x-tick.
Here is the dataset structure
Here is the result of the .plot()
I was trying to use examples from other posts and matplotlib documentation and do something like
ax.xaxis.set_major_locator(
dates.MonthLocator(revenue_pivot.index, bymonthday=1,interval=1))
But that removed all the ticks :(
I also tried to pass xticks = df.index, but it has not changed anything.
What would be the right way to show more ticks on the x-axis?
No need to pass any args to MonthLocator. Make sure to use x_compat in the df.plot() call per @Rotkiv's answer.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.DataFrame(np.random.rand(100,2), index=pd.date_range('1-1-2018', periods=100))
ax = df.plot(x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
plt.show()
[Figure: formatted x-axis with set_major_locator]
[Figure: unformatted x-axis]
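Building on the x_compat approach, the month ticks can also be given explicit labels and finer minor ticks. The following is only a sketch: the '%b %Y' label format and the weekly minor ticks are choices of mine, not part of the original answer.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 2),
                  index=pd.date_range("2018-01-01", periods=100))

# x_compat=True makes pandas use plain Matplotlib date numbers on the x-axis,
# so the matplotlib.dates locators and formatters apply directly
ax = df.plot(x_compat=True)

# One major tick on the first day of every month, labelled like "Feb 2018"
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Unlabelled minor ticks every Monday
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
```

Call plt.show() to display the figure.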
You could also format the x-axis ticks and labels of a pandas DateTimeIndex "manually" using the attributes of a pandas Timestamp object.
I found that much easier than using the locators from matplotlib.dates, which work on datetime formats other than pandas' (if I am not mistaken) and thus sometimes behave oddly if the dates are not converted accordingly.
Here's a generic example that shows the first day of each month as a label based on attributes of pandas Timestamp objects:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# data
dim = 8760
idx = pd.date_range('1/1/2000 00:00:00', freq='h', periods=dim)
df = pd.DataFrame(np.random.randn(dim, 2), index=idx)
# select tick positions based on timestamp attribute logic. see:
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html
positions = [p for p in df.index
             if p.hour == 0
             and p.is_month_start
             and p.month in range(1, 13, 1)]
# for date formatting, see:
# https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
labels = [l.strftime('%m-%d') for l in positions]
# plot with adjusted labels
ax = df.plot(kind='line', grid=True)
ax.set_xlabel('Time (h)')
ax.set_ylabel('Foo (Bar)')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
plt.show()
yields:
Hope this helps!
The right way to do that is described here:
Using the x_compat parameter, it is possible to suppress automatic tick resolution adjustment
df.A.plot(x_compat=True)
If you want to just show more ticks, you can also dive deep into the structure of pd.plotting._converter:
dai = ax.xaxis.minor.formatter.plot_obj.date_axis_info
dai['fmt'][dai['fmt'] == b''] = b'%b'
After plotting, the formatter is a TimeSeries_DateFormatter and _set_default_format has been called, so self.plot_obj.date_axis_info is not None. You can now manipulate the structured array .date_axis_info to your liking, namely so that it contains fewer b'' entries and more b'%b' entries.
Remove tick labels:
ax = df.plot(x='date', y=['count'])
every_nth = 10
for n, label in enumerate(ax.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)
Lower every_nth to include more labels, raise to keep fewer.