I am trying to plot time-series data by hour as shown in the image, but I keep getting this error: Locator attempting to generate 91897 ticks ([15191.0, ..., 19020.0]), which exceeds Locator.MAXTICKS (1000). I have tried all the solutions I could find on StackOverflow for similar problems but still could not get around it. Please help.
Link to image: https://drive.google.com/file/d/1b1PNCqVp7W65ciVPEWELiV2cTiXgBu2V/view?usp=sharing
Link to CSV:
https://drive.google.com/file/d/113kYjsqbyL5wx1j204yK6Wmop4wLsqMQ/view?usp=sharing
Attempted code:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
df = pd.read_csv('price.csv', delimiter=',')
plt.rcParams['font.size'] = 18
fig, (ax) = plt.subplots(ncols=1, nrows=1, figsize=(20,15))
# Plot the data
ax.plot(df.VOLUME, label = 'volume')
# Set the title
ax.set_title("AUDUSD Hourly Variation\n", fontsize=23)
# Set the Y-Axis label
ax.set_ylabel("\nVolume")
# Set the X-Axis label
ax.set_xlabel('\nHour')
ax.legend() # Plot the legend for axes
# apply locator and formatter to the ticks on the X axis
ax.xaxis.set_major_locator(md.HourLocator(interval = 1)) # X axis will be formatted in Hour
ax.xaxis.set_major_formatter(md.DateFormatter('%H')) # set the date format to the hour shortname
# Set the limits (range) of the X-Axis
ax.set_xlim([pd.to_datetime('2011.08.05', format = '%Y.%m.%d'),
pd.to_datetime('2022.01.28', format = '%Y.%m.%d')])
plt.tight_layout()
plt.show()
Thanks for your assistance.
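According to the error message, the x-axis spans the date numbers 15191.0 to 19020.0. Matplotlib date numbers count days, not hours, so the range covers 19020 - 15191 = 3829 days, or roughly 91,900 hours. An hourly locator would therefore need 91,897 ticks, while at most 1,000 (Locator.MAXTICKS) are allowed. Raising the HourLocator interval to 4 or 5 still yields far too many ticks. Either narrow the x-limits to the period you actually want to inspect, or use a coarser locator over the full range (with a matching DateFormatter such as '%Y-%m'), for example:
ax.xaxis.set_major_locator(md.MonthLocator(interval = 6))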
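The tick count in the error message can be reproduced with plain date arithmetic. The sketch below assumes the x-limits from the question and the MAXTICKS value of 1000 quoted in the error; the names span_days, span_hours and min_interval are only illustrative.

```python
import math
from datetime import datetime

MAXTICKS = 1000  # limit quoted in the error message (Locator.MAXTICKS)

# x-limits taken from the question's ax.set_xlim call
start = datetime(2011, 8, 5)
end = datetime(2022, 1, 28)

span_days = (end - start).days   # same as 19020 - 15191 in the error message
span_hours = span_days * 24
hourly_ticks = span_hours + 1    # one tick per hour, both ends included

# smallest hourly interval that would keep the tick count at or below MAXTICKS
min_interval = math.ceil(span_hours / (MAXTICKS - 1))

print(span_days, hourly_ticks, min_interval)
```

An interval that large is rarely a readable axis, which is why a month- or year-based locator tends to be the more practical choice for a range of this size.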
Perhaps . doesn't work well as a separator in dates (because it is also used as a decimal separator in numbers). Try:
ax.set_xlim([pd.to_datetime('2011-08-05', format = '%Y-%m-%d'),
pd.to_datetime('2022-01-28', format = '%Y-%m-%d')])
If that doesn't cause the error, then change your input data correspondingly.
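A quick way to test the separator theory is to parse the same date both ways. This is a minimal check, assuming only that the format string matches the text being parsed; the variable names are hypothetical.

```python
import pandas as pd

# Same calendar date written with dots and with dashes
dotted = pd.to_datetime("2011.08.05", format="%Y.%m.%d")
dashed = pd.to_datetime("2011-08-05", format="%Y-%m-%d")

# Both calls return a Timestamp; if they compare equal, the separator is not the cause
print(dotted, dashed, dotted == dashed)
```

If the two values are equal, the tick error has to come from the locator and the axis range rather than from date parsing.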
Related
To check that I am not re-inventing the wheel here: with mplfinance I would like to have ticks on the x-axis every 15 minutes, but only where data exists.
When plotting directly with mplfinance (without returnfig=True) the plot is fine except for the x-axis, whose tick values are not time-aligned: which times get used depends on the first element of the dataframe.
To put a grid line/tick at fixed intervals (every 15 minutes here), I have the code below, which works when all the expected timestamps are present in the index of my dataset:
start_date = pd.to_datetime('2021-12-21 04:00').tz_localize('America/New_York')
end_date = pd.to_datetime('2021-12-21 19:55').tz_localize('America/New_York')
df5trimmed = df5.truncate(before=start_date, after=end_date)
ticks = pd.date_range(start_date, end_date, freq='15T')
ticklocations = [ df5trimmed.index.get_loc(tick) for tick in ticks ]
ticklabels = [ tick.time().strftime('%H:%M') for tick in ticks ]
fig, axlist = mpf.plot(df5trimmed,style='yahoo', addplot=plotsToAdd, figsize=(48,24),
type='candlestick', volume=True, xrotation=0,
tight_layout=True, returnfig=True)
axlist[-2].xaxis.set_ticks(ticklocations)
axlist[-2].set_xticklabels(ticklabels)
However, it blows up, as expected, with a KeyError in the list comprehension [df5trimmed.index.get_loc(tick) for tick in ticks] when no row exists in the dataframe for a tick.
Notice the discontinuities in the data below: it blows up when trying to access the 17:00 key, which doesn't exist in my data:
Essentially I am looking to plot the lines aligned to n minutes (15 minutes in the example below), but only where a data point exists (where it doesn't, I am fine with the bars sitting right next to one another). In summary, during regular trading hours with liquidity (where there would be data points) there would be ticks at 08:15, 08:30, and so on.
Is there an argument in mplfinance that can do this?
What I am looking to achieve
The below is from tradingview, note the aligned time ticks every 15 minutes during regular trading hours and pretty much the entire plot.
Additional Info - source data and what is plotted
The below uses this csv data, and plots directly to mplfinance, you can see the time ticks are not aligned to the hour I get 04:00, 06:25, 08:10, 09:50 etc:
import mplfinance as mpf
import pandas as pd
import datetime
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
fig, axlist = mpf.plot(df5trimmed,style='yahoo', figsize=(48,24), type='candlestick', volume=True, xrotation=0, tight_layout=True, returnfig=True)
# Display:
mpf.show()
I used the following code to add the ticks, but only where they exist. I suspect there are much nicer ways of writing this, so I am open to alternative answers.
My next iteration will be to draw the labels with 0-degree rotation and to avoid clutter when one label would overlap another.
import mplfinance as mpf
import pandas as pd
import datetime
def getTimestampTickFrequency(df):
    # get the most common interval between consecutive rows, in minutes
    mode = int(df.index.to_series().diff().dt.total_seconds().div(60).mode()[0])
    if mode == 5:
        return 15, 3  # for 5-minute data, tick every 15 minutes (3 samples)
    elif mode == 15:
        return 60, 4  # for 15-minute data, tick every hour (4 samples)
    elif mode == 120:
        return 240, 2  # for 2-hour data, tick every 4 hours (2 samples)
    return mode, 1  # otherwise tick on every sample

def getTickLocationsAndLabels(df):
    tickLocations = []
    tickLabels = []
    tickFrequencyMinutes, samplesBetweenTicks = getTimestampTickFrequency(df)
    entireTickRange = pd.date_range(start=df.index[0], end=df.index[-1], freq=f'{tickFrequencyMinutes}min')
    prevTimestamp = df.index[0]
    # get indexes of data frame that match the ticks, if they exist
    for tick in entireTickRange:
        print(tick)
        try:
            found = df.index.get_loc(tick)
            currentTimestamp = df.index[found]
            timestampDifference = (currentTimestamp - prevTimestamp).total_seconds() / 60
            print(f'Delta last time stamp = {timestampDifference}')
            #if timestampDifference <= tickFrequencyMinutes:
            tickLocations.append(found)
            tickLabels.append(tick.time().strftime('%H:%M'))
            prevTimestamp = currentTimestamp
        except KeyError:
            pass  # no row for this tick, skip it
    return tickLocations, tickLabels
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
tickLocations, tickLabels = getTickLocationsAndLabels(df5trimmed)
fig, axlist = mpf.plot(df5trimmed,style='yahoo', figsize=(48,24), type='candlestick', volume=True, tight_layout=True, returnfig=True)
axlist[-2].xaxis.set_ticks(tickLocations)
axlist[-2].set_xticklabels(tickLabels)
# Display:
mpf.show()
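A more compact way to find "ticks that exist in the data" is a membership test on the index instead of a try/except loop. This is only a sketch: the helper name tick_locations_and_labels and the small sample index are hypothetical, and it assumes a sorted DatetimeIndex like the one read from the CSV.

```python
import numpy as np
import pandas as pd

def tick_locations_and_labels(index, freq="15min", fmt="%H:%M"):
    """Return integer positions and labels of the rows that fall on a freq grid."""
    # every grid point between the first and last timestamp
    candidates = pd.date_range(index.min().floor(freq), index.max(), freq=freq)
    # True for rows whose timestamp lies exactly on the grid
    mask = index.isin(candidates)
    locations = np.flatnonzero(mask).tolist()
    labels = index[mask].strftime(fmt).tolist()
    return locations, labels

# Hypothetical 5-minute data with gaps: 08:30 and 08:45 are missing
idx = pd.DatetimeIndex([
    "2021-12-21 08:00", "2021-12-21 08:05", "2021-12-21 08:15",
    "2021-12-21 08:20", "2021-12-21 08:50", "2021-12-21 09:00",
])
locations, labels = tick_locations_and_labels(idx)
print(locations, labels)
```

The returned lists can be passed to axlist[-2].xaxis.set_ticks and axlist[-2].set_xticklabels in the same way as tickLocations and tickLabels above.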
So what you need is to remove the gaps that missing bars leave in the plot. Quoting the question:
"Essentially I am looking to plot the lines aligned to n minutes (in the example below 15 minutes), but only if it exists and not otherwise"
Starting from the code you posted, but zooming in on the first 30 bars and adding the option show_nontrading=True:
import mplfinance as mpf
import pandas as pd
import datetime
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
fig, axlist = mpf.plot(df5trimmed[0:30],
                       style='yahoo',
                       figsize=(48,24),
                       type='candle',
                       volume=True,
                       xrotation=0,
                       tight_layout=True,
                       returnfig=True,
                       show_nontrading=True)
# Display:
mpf.show()
I get this, which is showing the gaps you mentioned.
But if I set the option show_nontrading=False, the plot changes to the one below, which removes the gaps corresponding to the missing bars.
Isn't this what you needed?
Please check if this solves your issue. I think it does.
Check the range between 6:00 AM and 7:00 AM. Only a few bars are plotted between 5:00 and 6:00, and the 6:30 bar is missing.
import mplfinance as mpf
import pandas as pd
import datetime
import matplotlib.pyplot as plt
%matplotlib inline
# DATA PREPARATION
df5 = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
start_date = pd.to_datetime('2021-12-21 1:00').tz_localize('America/New_York')
end_date = pd.to_datetime('2021-12-21 7:00').tz_localize('America/New_York')
df5trimmed = df5.truncate(before=start_date, after=end_date)
# PLOTTING
fig, axlist = mpf.plot(df5trimmed,
                       style='yahoo',
                       type='candlestick',
                       volume=True,
                       xrotation=0,
                       figsize=(20,10),
                       returnfig=True,
                       show_nontrading=False)
# x-tick labeling preparation
# the frequency can be adjusted but needs to be lower (coarser) than that of the original data
idx = pd.date_range(start_date, end_date, freq='30T', tz='America/New_York')
df_label_idx = pd.DataFrame(index=idx)
# this merge does the trick: the output is the intersection between the lower frequency and the
# higher freq time series. The inner option leaves in the output only those rows present in both TS
# dropping from the lower freq TS those missing periods in the higher freq TS.
df_label = pd.merge(df_label_idx, df5trimmed, how='inner', left_index=True, right_index=True ).tz_convert('America/New_York')
# Tick labels are generated based on df_label
tick_labels = list(df_label.index.strftime('%H:%M'))
ticklocations = [df5trimmed.index.get_loc(tick) for tick in df_label.index ]
axlist[-2].xaxis.set_ticks(ticklocations)
axlist[-2].set_xticklabels(tick_labels)
mpf.show()
print(tick_labels)
df_label
df5trimmed['2021-12-21 05:00:00-05:00':'2021-12-21 07:00:00-05:00']
I have the date in one column and the time in another, retrieved from a database through pandas read_sql. The dataframe looks like the one below (there are 30-40 rows in it). I want to plot them as a time-series graph, and I would like to be able to turn that into a histogram as well if needed.
COB CALV14
1 2019-10-04 07:04
2 2019-10-04 05:03
3 2019-10-03 16:03
4 2019-10-03 05:15
At first I got various errors, such as there being no numeric field to plot. After a lot of searching, the closest post I could find is: Matplotlib date on y axis
I followed it and got some result. However, the problems are:
I have to go through a number of steps (convert to str, then to list, and then to matplotlib datetime format) before I can plot them. (Please refer to the code I am using.) There must be a smarter and more precise way to do this.
It does not show the times beside the axis exactly as they appear in the dataframe (e.g. it should show 07:04, 05:03, etc.).
I am new to Python and will appreciate any help on this.
Code
ob_frame['COB'] = ob_frame.COB.astype(str)
ob_frame['CALV14'] = ob_frame.CALV14.astype(str)
date = ob_frame.COB.tolist()
time = ob_frame.CALV14.tolist()
y = mdates.datestr2num(date)
x = mdates.datestr2num(time)
fig, ax = plt.subplots(figsize=(9,9))
ax.plot(x, y)
ax.yaxis_date()
ax.xaxis_date()
fig.autofmt_xdate()
plt.show()
I found the answer. I did not need to convert the data retrieved from the DB to string type. The remaining problems turned out to come from not using the right formatting for the tick labels. Here is the complete code, posted in case it helps anyone.
In this code I have swapped the axes, i.e. I plotted dates on the x-axis and times on the y-axis, as it looked better.
###### Import all the libraries and modules needed ######
import IN_OUT_SQL as IS ## IN_OUT_SQL.py is the file where the SQL is stored
import cx_Oracle as co
import numpy as np
import Credential as cd # Credential.py is the file where you store the DB credentials
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import dates as mdates
%matplotlib inline
###### Connect to DB, make the dataframe and prepare the x and y values to be plotted ######
def extract_data(query):
    '''
    This function takes the given query as input, connects to the database, executes the SQL and
    returns the result in a dataframe.
    '''
    cred = cd.POLN_CONSTR # POLN_CONSTR in the credential file stores the credential in 'USERNAME/PASSWORD#DB_NAME' format
    conn = co.connect(cred)
    frame = pd.read_sql(query, con = conn)
    return frame
query = IS.OUT_SQL
ob_frame = extract_data(query)
ob_frame.dropna(inplace = True) # Drop the rows that contain a NaN value in any column
x = mdates.datestr2num(ob_frame['COB']) # COB is a date in "01-MAR-2020" format; convert it to a matplotlib date number
y = mdates.datestr2num(ob_frame['CALV14']) # CALV14 is a time in "21:04" format; convert it to a matplotlib date number
###### Make the Timeseries plot of delivery time in y axis vs delivery date in x axis ######
fig, ax = plt.subplots(figsize=(15,8))
ax.clear() # Clear the axes
ax.plot(x, y, 'o-', color = 'dodgerblue') # Plot the data (marker and line style in the format string, color via keyword)
## The two lines below draw horizontal reference lines at the 07:00 and 05:00 positions
plt.axhline(y = mdates.date2num (pd.to_datetime('07:00')), color = 'red', linestyle = '--', linewidth = 0.75)
plt.axhline(y = mdates.date2num (pd.to_datetime('05:00')), color = 'green', linestyle = '--', linewidth = 0.75)
plt.xticks(x, rotation = 75) # rotation must be a number, not a string
ax.yaxis_date()
ax.xaxis_date()
# The 6 lines below set the format in which the x and y ticks and their labels are displayed
yfmt = mdates.DateFormatter('%H:%M')
xfmt = mdates.DateFormatter('%d-%b-%y')
ax.yaxis.set_major_formatter(yfmt)
ax.xaxis.set_major_formatter(xfmt)
ax.yaxis.set_major_locator(mdates.HourLocator(interval=1)) # Every 1 Hour
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1)) # Every 1 Day
####### Name the x,y labels, titles and beautify the plot #######
plt.style.use('bmh')
plt.xlabel('\nCOB Dates')
plt.ylabel('Time of Delivery (GMT/BST as applicable)\n')
plt.title(" Data readiness time against COBs (Last 3 months)\n")
plt.rcParams["font.size"] = "12" #Change the font
# plt.rcParams["font.family"] = "Times New Roman" # Set the font type if needed
plt.tick_params(left = False, bottom = False, labelsize = 10) #Remove ticks, make tick labelsize 10
plt.box(False)
plt.show()
Output:
I am plotting values from a dataframe where time is the x-axis. The time is formatted as 00:00 to 23:45. I only want to display the specific times 00:00, 06:00, 12:00, 18:00 on the x-axis of my plot. How can this be done? I have posted two figures, the first shows the format of my dataframe after setting the index to time. And the second shows my figure. Thank you for your help!
monday.set_index("Time", drop=True, inplace=True)
monday_figure = monday.plot(kind='line', legend = False,
title = 'Monday Average Power consumption')
monday_figure.xaxis.set_major_locator(plt.MaxNLocator(8))
Edit: Adding data as text:
Time,DayOfWeek,kW
00:00:00,Monday,5.8825
00:15:00,Monday,6.0425
00:30:00,Monday,6.0025
00:45:00,Monday,5.7475
01:00:00,Monday,6.11
01:15:00,Monday,5.8025
01:30:00,Monday,5.6375
01:45:00,Monday,5.85
02:00:00,Monday,5.7250000000000005
02:15:00,Monday,5.66
02:30:00,Monday,6.0025
02:45:00,Monday,5.71
03:00:00,Monday,5.7425
03:15:00,Monday,5.6925
03:30:00,Monday,5.9475
03:45:00,Monday,6.380000000000001
04:00:00,Monday,5.65
04:15:00,Monday,5.8725
04:30:00,Monday,5.865
04:45:00,Monday,5.71
05:00:00,Monday,5.6925
05:15:00,Monday,5.9975000000000005
05:30:00,Monday,5.905000000000001
05:45:00,Monday,5.93
06:00:00,Monday,5.6025
06:15:00,Monday,6.685
06:30:00,Monday,7.955
06:45:00,Monday,8.9225
07:00:00,Monday,10.135
07:15:00,Monday,12.9475
07:30:00,Monday,14.327499999999999
07:45:00,Monday,14.407499999999999
08:00:00,Monday,15.355
08:15:00,Monday,16.2175
08:30:00,Monday,18.355
08:45:00,Monday,18.902499999999996
09:00:00,Monday,19.0175
09:15:00,Monday,20.0025
09:30:00,Monday,20.355
09:45:00,Monday,20.3175
10:00:00,Monday,20.8025
10:15:00,Monday,20.765
10:30:00,Monday,21.07
10:45:00,Monday,19.9825
11:00:00,Monday,20.94
11:15:00,Monday,22.1325
11:30:00,Monday,20.6275
11:45:00,Monday,21.4475
12:00:00,Monday,22.092499999999998
The image above is produced using the code from the comment below.
Make sure you have a datetime index using pd.to_datetime when plotting timeseries.
I then used matplotlib.dates (imported as mdates) to locate the desired ticks and format them in the plot. I don't know whether the same can be done from pandas with df.plot.
See matplotlib date tick labels. You can customize the HourLocator or use a different locator to suit your needs. Minor ticks are created the same way with ax.xaxis.set_minor_locator. Hope it helps.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Using your dataframe
df = pd.read_clipboard(sep=',')
# Make sure you have a datetime index
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time')
fig, ax = plt.subplots(1,1)
ax.plot(df['kW'])
# Use mdates to detect hours
locator = mdates.HourLocator(byhour=[0,6,12,18])
ax.xaxis.set_major_locator(locator)
# Format x ticks
formatter = mdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(formatter)
# rotates and right aligns the x labels, and moves the bottom of the axes up to make room for them
fig.autofmt_xdate()
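If the index is left as plain "HH:MM:SS" strings, as in the question, the ticks can also be placed by row position rather than by date. The sketch below is an assumption-laden alternative, not the approach from the answer above: the monday frame is rebuilt with made-up kW values, and wanted and positions are hypothetical names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the "monday" frame: 96 quarter-hour rows indexed by time strings
times = pd.date_range("2000-01-01 00:00", "2000-01-01 23:45", freq="15min").strftime("%H:%M:%S")
monday = pd.DataFrame({"kW": np.linspace(5.0, 20.0, len(times))}, index=times)

# Row positions of the times that should be labelled
wanted = ["00:00:00", "06:00:00", "12:00:00", "18:00:00"]
positions = [i for i, t in enumerate(monday.index) if t in wanted]

fig, ax = plt.subplots()
ax.plot(range(len(monday)), monday["kW"].to_numpy())
ax.set_xticks(positions)
ax.set_xticklabels([monday.index[i][:5] for i in positions])  # "HH:MM" labels
ax.set_title("Monday Average Power consumption")
```

This keeps the data untouched, at the cost of an x-axis that is only evenly spaced in time when the rows themselves are.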
If I run the following, it appears to work as expected, but the y-axis is limited to the earliest and latest times in the data. I want it to show midnight to midnight. I thought I could do that with the code that's commented out. But when I uncomment it, I get the correct y-axis, yet nothing plots. Where am I going wrong?
from datetime import datetime
import matplotlib.pyplot as plt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
x = []
y = []
for i in range(0, len(data)):
    t = datetime.strptime(data[i], '%Y-%m-%d %H:%M:%S')
    x.append(t.strftime('%Y-%m-%d')) # X-axis = date
    y.append(t.strftime('%H:%M:%S')) # Y-axis = time
plt.plot(x, y, '.')
# begin = datetime.strptime('00:00:00', '%H:%M:%S').strftime('%H:%M:%S')
# end = datetime.strptime('23:59:59', '%H:%M:%S').strftime('%H:%M:%S')
# plt.ylim(begin, end)
plt.show()
Edit: I also noticed that the x-axis isn't right either. The data skips Jan 2, but I want that on the axis so the data is to scale.
This is a dramatically simplified version of code dealing with over a year's worth of data with over 2,500 entries.
If Pandas is available to you, consider this approach:
import pandas as pd
data = pd.to_datetime(data, yearfirst=True)
plt.plot(data.date, data.time)
_=plt.ylim(["00:00:00", "23:59:59"])
Update per comments
X-axis date formatting can be adjusted using the Locator and Formatter methods of the matplotlib.dates module. Locator finds the tick positions, and Formatter specifies how you want the labels to appear.
Sometimes Matplotlib/Pandas just gets it right, other times you need to call out exactly what you want using these extra methods. In this case, I'm not sure why those numbers are showing up, but this code will remove them.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time)
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
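Another option that avoids plotting strings altogether is to plot real dates on the x-axis and seconds since midnight on the y-axis, then format the y ticks as clock times. This is a minimal sketch using the four sample timestamps from the question; the helper as_clock and the 3-hour tick spacing are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from datetime import datetime
from matplotlib.ticker import FuncFormatter, MultipleLocator

data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44',
        '2018-01-03 15:30:27', '2018-01-04 11:55:09']

stamps = [datetime.strptime(s, '%Y-%m-%d %H:%M:%S') for s in data]
days = [t.date() for t in stamps]                                    # real dates, so Jan 2 keeps its place
seconds = [t.hour * 3600 + t.minute * 60 + t.second for t in stamps]  # time of day as a number

def as_clock(value, pos=None):
    """Format seconds since midnight as HH:MM."""
    value = int(value)
    return f"{value // 3600:02d}:{value % 3600 // 60:02d}"

fig, ax = plt.subplots()
ax.plot(days, seconds, '.')
ax.set_ylim(0, 24 * 3600)                              # midnight to midnight
ax.yaxis.set_major_locator(MultipleLocator(3 * 3600))  # a tick every 3 hours
ax.yaxis.set_major_formatter(FuncFormatter(as_clock))
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
```

Because both axes are numeric under the hood, setting the limits does not hide the points the way string limits did in the original attempt.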
I am plotting time series using pandas .plot() and want to see every month shown as an x-tick.
Here is the dataset structure
Here is the result of the .plot()
I was trying to use examples from other posts and matplotlib documentation and do something like
ax.xaxis.set_major_locator(
dates.MonthLocator(revenue_pivot.index, bymonthday=1,interval=1))
But that removed all the ticks :(
I also tried to pass xticks = df.index, but it has not changed anything.
What would be the right way to show more ticks on the x-axis?
No need to pass any args to MonthLocator. Make sure to use x_compat in the df.plot() call per @Rotkiv's answer.
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
import matplotlib.dates as mdates
df = pd.DataFrame(np.random.rand(100,2), index=pd.date_range('1-1-2018', periods=100))
ax = df.plot(x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
plt.show()
formatted x-axis with set_major_locator
unformatted x-axis
You could also format the x-axis ticks and labels of a pandas DateTimeIndex "manually" using the attributes of a pandas Timestamp object.
I found that much easier than using locators from matplotlib.dates which work on other datetime formats than pandas (if I am not mistaken) and thus sometimes show an odd behaviour if dates are not converted accordingly.
Here's a generic example that shows the first day of each month as a label based on attributes of pandas Timestamp objects:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# data
dim = 8760
idx = pd.date_range('1/1/2000 00:00:00', freq='h', periods=dim)
df = pd.DataFrame(np.random.randn(dim, 2), index=idx)
# select tick positions based on timestamp attribute logic. see:
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html
positions = [p for p in df.index
if p.hour == 0
and p.is_month_start
and p.month in range(1, 13, 1)]
# for date formatting, see:
# https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
labels = [l.strftime('%m-%d') for l in positions]
# plot with adjusted labels
ax = df.plot(kind='line', grid=True)
ax.set_xlabel('Time (h)')
ax.set_ylabel('Foo (Bar)')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
plt.show()
yields:
Hope this helps!
The right way to do that is described here:
Using the x_compat parameter, it is possible to suppress automatic tick resolution adjustment
df.A.plot(x_compat=True)
If you want to just show more ticks, you can also dive deep into the structure of pd.plotting._converter:
dai = ax.xaxis.minor.formatter.plot_obj.date_axis_info
dai['fmt'][dai['fmt'] == b''] = b'%b'
After plotting, the formatter is a TimeSeries_DateFormatter and _set_default_format has been called, so self.plot_obj.date_axis_info is not None. You can now manipulate the structured array .date_axis_info to your liking, namely so that it contains fewer b'' entries and more b'%b' entries.
Remove tick labels:
ax = df.plot(x='date', y=['count'])
every_nth = 10
for n, label in enumerate(ax.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)
Lower every_nth to include more labels, raise to keep fewer.