I want the x-axis labels to look like this:
00:00 00:30 01:00 01:30 02:00 ...... 23:30
My code:
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
import random
data = [random.random() for i in range(48)]
times = pd.date_range('16-09-2017', periods=48, freq='30min')
fig, ax = plt.subplots(1)
fig.autofmt_xdate()
plt.plot(times, data)
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
But my x-axis labels look like this:
What's the problem?
I have 48 values; each one represents a half hour of a day.
You can use the MinuteLocator and explicitly set it for every 0 and 30 minutes.
minlocator = mdates.MinuteLocator(byminute=[0,30])
ax.xaxis.set_major_locator(minlocator)
To clean it up, remove the extraneous tick marks and fill out the empty space:
xticks = ax.get_xticks()
ax.set_xticks(xticks[2:-2]);
hh = pd.Timedelta('30min')
ax.set_xlim(times[0] - hh, times[-1] + hh)
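To check which ticks the locator actually produces, here is a minimal sketch that runs without a display. It assumes the same 48 half-hour samples as in the question; the Agg backend, the ISO date string, and the names values, tick_times and tick_minutes are my own choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

times = pd.date_range("2017-09-16", periods=48, freq="30min")
values = list(range(48))  # placeholder data, one value per half hour

fig, ax = plt.subplots()
ax.plot(times, values)
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=[0, 30]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Tick positions are Matplotlib date numbers; convert them back to datetimes.
tick_times = mdates.num2date(ax.get_xticks())
tick_minutes = sorted({t.minute for t in tick_times})
print(tick_minutes)
plt.close(fig)
```

Every tick lands on minute 0 or minute 30, which is what the labels in the question ask for.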
Edit:
My answer was accepted although it did not work correctly, so I have added simplified solutions for matplotlib and pandas.
The key is to set the x-ticks parameter correctly.
In your case it could look like this:
data = [random.random() for i in range(48)]
times = pd.date_range('16-09-2017', periods=48, freq='30min')
In both cases, you want to use only hours and minutes:
hour_minutes = times.strftime('%H:%M')
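As a quick sanity check of the labels, here is a small sketch that builds them on their own. It assumes an ISO date string and the '30min' alias instead of the original spelling; both are my substitutions.

```python
import pandas as pd

# 48 half-hour timestamps covering one day
times = pd.date_range("2017-09-16", periods=48, freq="30min")

# strftime on a DatetimeIndex returns an Index of strings
hour_minutes = times.strftime("%H:%M")

print(len(hour_minutes), hour_minutes[0], hour_minutes[-1])
```

The result has one label per sample, so it can be passed directly as the tick labels in both solutions below.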
1. Matplotlib solution
plt.figure(figsize=(12,5))
plt.plot(range(len(data)),data)
# .plot(times, data)
plt.xticks(range(len(hour_minutes)), hour_minutes, size='small',
rotation=45, horizontalalignment='center')
plt.show()
2. Pandas solution
# create dataframe from arrays (not necessary, but nice)
df = pd.DataFrame({'values': data,
'hour_minutes': hour_minutes})
# specify size of plot
value_plot = df.plot(figsize=(12,5), title='Value by Half-hours')
# first set number of ticks
value_plot.set_xticks(df.index)
# and label them after
value_plot.set_xticklabels(df.hour_minutes, rotation=45, size='small')
# get the plot figure and save it
fig = value_plot.get_figure()
fig.savefig('value_plot.png')
But I also like the alternative method that is proposed here :)
I have run into an issue with matplotlib.dates.DateFormatter:
I want to display timestamps in a date format, which is usually simple with strftime. But when I use strftime with matplotlib I lose the dynamic position readout on my graph, so I used md.DateFormatter('%H:%M:%S.%f') to show the X values as dates while keeping the dynamic index.
The problem is that my dates have too many digits: I don't want the digits below the millisecond, but I don't know how to remove them. I searched on StackOverflow for a solution, but slicing with date[:-3] won't work because I have datetime values, not strings...
Do you have a solution? It's maybe trivial but can't find any solution right now...
Thanks in advance.
NB : What I call the dynamic index is when you are on the graph and you can see the exact X and Y value of your pointer at the bottom
Here is an applicable example :
df =
timestamp val
0 2022-03-13 03:19:59.999070 X1
1 2022-03-13 03:20:00.004070 X2
2 2022-03-13 03:20:00.009070 X3
3 2022-03-13 03:20:00.014070 X4
And I try to plot this with :
ax=plt.gca()
xfmt = md.DateFormatter('%H:%M:%S.%f')
ax.xaxis.set_major_formatter(xfmt)
plt.plot(df.timestamp, df.val, linestyle="-", marker = ".")
plt.setp(ax.get_xticklabels(), rotation=40)
plt.show()
In conclusion, what I want is to remove the 070 in the graph, but if I remove it beforehand, DateFormatter replaces it with 000, which is just as useless.
If you want to change both the tick labels and the format of the number shown on the interactive status bar, you could define your own function to deliver your desired format, then use a FuncFormatter to display those values on your plot.
For example:
import matplotlib.pyplot as plt
import matplotlib.dates as md
import pandas as pd
# dummy data
ts = pd.date_range("2022-03-13 03:19:59.999070",
"2022-03-13 03:20:00.014070", periods=4)
df = pd.DataFrame({'timestamp': ts, 'val':[0, 1, 2, 3]})
fig, ax = plt.subplots()
# define our own function to drop the last three characters
xfmt = lambda x, pos: md.DateFormatter('%H:%M:%S.%f')(x)[:-3]
# use that function as the major formatter, using FuncFormatter
ax.xaxis.set_major_formatter(plt.FuncFormatter(xfmt))
plt.setp(ax.get_xticklabels(), rotation=40)
ax.plot(df.timestamp, df.val, linestyle="-", marker = ".")
plt.tight_layout()
plt.show()
Note the matching tick format and status bar format.
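The trimming step itself can be checked without a plot. This is a minimal sketch using plain datetime objects; the helper name to_millis and the two sample timestamps (taken from the question's data) are only for illustration.

```python
from datetime import datetime


def to_millis(ts: datetime) -> str:
    """Format as HH:MM:SS.mmm by cutting the last three digits of %f."""
    # %f always prints six digits, so dropping three leaves milliseconds.
    return ts.strftime("%H:%M:%S.%f")[:-3]


samples = [
    datetime(2022, 3, 13, 3, 19, 59, 999070),
    datetime(2022, 3, 13, 3, 20, 0, 4070),
]
labels = [to_millis(ts) for ts in samples]
print(labels)
```

Note that this truncates rather than rounds, which is the same behaviour as the [:-3] slice in the formatter above.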
If, however, you do not want to change the tick labels, but only the value on the status bar, you can do that by reassigning the ax.format_coord function, using a similar idea to the function defined above, but also adding the y value for display.
For example:
import matplotlib.pyplot as plt
import matplotlib.dates as md
import pandas as pd
# dummy data
ts = pd.date_range("2022-03-13 03:19:59.999070",
"2022-03-13 03:20:00.014070", periods=4)
df = pd.DataFrame({'timestamp': ts, 'val':[0, 1, 2, 3]})
fig, ax = plt.subplots()
xfmt = md.DateFormatter('%H:%M:%S.%f')
xfmt2 = lambda x, y: "x={}, y={:g}".format(xfmt(x)[:-3], y)
# use original formatter here with microseconds
ax.xaxis.set_major_formatter(xfmt)
# and the millisecond function here
ax.format_coord = xfmt2
plt.setp(ax.get_xticklabels(), rotation=40)
ax.plot(df.timestamp, df.val, linestyle="-", marker = ".")
plt.tight_layout()
plt.show()
Note the difference between the status bar and the tick formats here.
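The status-bar text can also be produced by calling ax.format_coord directly, which makes it easy to check the format without hovering over the figure. A minimal sketch, assuming a headless Agg backend and Matplotlib's default UTC timezone setting; the names full and status_text are mine.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as md
from datetime import datetime

fig, ax = plt.subplots()

# Full-precision formatter; the status bar cuts it down to milliseconds.
full = md.DateFormatter("%H:%M:%S.%f")
ax.format_coord = lambda x, y: "x={}, y={:g}".format(full(x)[:-3], y)

# format_coord receives x as a Matplotlib date number, so convert first.
x = md.date2num(datetime(2022, 3, 13, 3, 20, 0, 4070))
status_text = ax.format_coord(x, 1.5)
print(status_text)
plt.close(fig)
```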
To check that I am not reinventing the wheel: with mplfinance I would like ticks on the x-axis every 15 minutes, but only where data exists.
When plotting directly with mplfinance (without returnfig=True) the plot is fine except for the x-axis: the tick values are not time-aligned, and which times are used depends on the first element of the dataframe.
To get a grid line/tick at a fixed interval, I have the code below, which works when every expected timestamp is present in the index of my dataset:
start_date = pd.to_datetime('2021-12-21 04:00').tz_localize('America/New_York')
end_date = pd.to_datetime('2021-12-21 19:55').tz_localize('America/New_York')
df5trimmed = df5.truncate(before=start_date, after=end_date)
ticks = pd.date_range(start_date, end_date, freq='15min')
ticklocations = [ df5trimmed.index.get_loc(tick) for tick in ticks ]
ticklabels = [ tick.time().strftime('%H:%M') for tick in ticks ]
fig, axlist = mpf.plot(df5trimmed,style='yahoo', addplot=plotsToAdd, figsize=(48,24),
type='candlestick', volume=True, xrotation=0,
tight_layout=True, returnfig=True)
axlist[-2].xaxis.set_ticks(ticklocations)
axlist[-2].set_xticklabels(ticklabels)
However, as expected, it blows up with a KeyError in df5trimmed.index.get_loc(tick) when the dataframe has no data for that tick.
Notice the discontinuities in the data below; it blows up on the attempt to access the 17:00 key, which doesn't exist in my data:
Essentially I am looking to plot the lines aligned to n minutes (15 minutes in the example below), but only where data exists (where it doesn't, I am OK with the bars being right next to one another). In summary, during regular trading hours with liquidity (where there would be data points) there would be ticks at 08:15, 08:30, and so on.
Is there an argument in mplfinance that can do this?
What I am looking to achieve
The below is from tradingview, note the aligned time ticks every 15 minutes during regular trading hours and pretty much the entire plot.
Additional Info - source data and what is plotted
The code below uses this csv data and plots directly with mplfinance. You can see the time ticks are not aligned to the hour; I get 04:00, 06:25, 08:10, 09:50, etc.:
import mplfinance as mpf
import pandas as pd
import datetime
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
fig, axlist = mpf.plot(df5trimmed,style='yahoo', figsize=(48,24), type='candlestick', volume=True, xrotation=0, tight_layout=True, returnfig=True)
# Display:
mpf.show()
I used the following code to add the ticks, but only where data exists. I suspect there are much nicer ways of writing this, so I am open to alternative answers.
My next iteration will be to make the text with 0-degree rotation, and not cluttered should a label be overwritten by another.
import mplfinance as mpf
import pandas as pd
import datetime
def getTimestampTickFrequency(df):
    # get the most common interval between rows, in whole minutes
    minutes = df.index.to_series().diff().dt.total_seconds() // 60
    mode = int(minutes.mode()[0])
    if mode == 5:
        return 15, 3  # for 5-minute data, tick every 15 minutes
    elif mode == 15:
        return 60, 12  # for 15-minute data, tick every hour
    elif mode == 120:
        return 240, 48  # for 2-hour data, tick every 4 hours
    return mode, 1  # fallback: tick at the data's own interval

def getTickLocationsAndLabels(df):
    tickLocations = []
    tickLabels = []
    tickFrequencyMinutes, samplesBetweenTicks = getTimestampTickFrequency(df)
    entireTickRange = pd.date_range(start=df.index[0], end=df.index[-1], freq=f'{tickFrequencyMinutes}min')
    prevTimestamp = df.index[0]
    # get indexes of data frame that match the ticks, if they exist
    for tick in entireTickRange:
        print(tick)
        try:
            found = df.index.get_loc(tick)
            currentTimestamp = df.index[found]
            timestampDifference = (currentTimestamp - prevTimestamp).total_seconds() / 60
            print(f'Delta last time stamp = {timestampDifference}')
            #if timestampDifference <= tickFrequencyMinutes:
            tickLocations.append(found)
            tickLabels.append(tick.time().strftime('%H:%M'))
            prevTimestamp = currentTimestamp
        except KeyError:
            pass  # no bar at this tick, so skip it
    return tickLocations, tickLabels
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
tickLocations, tickLabels = getTickLocationsAndLabels(df5trimmed)
fig, axlist = mpf.plot(df5trimmed,style='yahoo', figsize=(48,24), type='candlestick', volume=True, tight_layout=True, returnfig=True)
axlist[-2].xaxis.set_ticks(tickLocations)
axlist[-2].set_xticklabels(tickLabels)
# Display:
mpf.show()
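One possibly shorter way to find the ticks that exist is Index.get_indexer, which returns -1 for missing keys instead of raising. This is only a sketch on made-up 5-minute bars with one bar removed; the names bars, positions and present are hypothetical and not part of mplfinance.

```python
import pandas as pd

# Hypothetical 5-minute bars from 04:00 to 05:00 with the 04:30 bar missing.
index = pd.date_range("2021-12-21 04:00", "2021-12-21 05:00", freq="5min")
index = index.drop([pd.Timestamp("2021-12-21 04:30")])
bars = pd.DataFrame({"Close": range(len(index))}, index=index)

# Candidate ticks every 15 minutes over the same span.
ticks = pd.date_range(index[0], index[-1], freq="15min")

# get_indexer gives the row position of each tick, or -1 if there is no bar.
positions = bars.index.get_indexer(ticks)
present = positions >= 0

tick_locations = positions[present].tolist()
tick_labels = ticks[present].strftime("%H:%M").tolist()
print(tick_locations, tick_labels)
```

The two lists can then be passed to xaxis.set_ticks and set_xticklabels in the same way as above.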
So what you need is to remove the gaps in the plot caused by missing bars.
"Essentially I am looking to plot the lines aligned to n minutes (in the example below 15 minutes), but only if it exists and not otherwise"
Starting from the code you posted, but zooming in on the first 30 bars and adding the option show_nontrading=True:
import mplfinance as mpf
import pandas as pd
import datetime
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
fig, axlist = mpf.plot(df5trimmed[0:30],
style='yahoo',
figsize=(48,24),
type='candle',
volume=True,
xrotation=0,
tight_layout=True,
returnfig=True,
show_nontrading=True)
# Display:
mpf.show()
I get this, which is showing the gaps you mentioned.
But if I set the option show_nontrading=False
This changes to the plot below, which removes the gaps corresponding to the missing bars.
Isn't this what you needed?
Please check whether this solves your issue. I think it does.
Check the range between 6:00 and 7:00 AM: only a few bars are plotted between 5:00 and 6:00, and 6:30 is missing.
import mplfinance as mpf
import pandas as pd
import datetime
import matplotlib.pyplot as plt
# %matplotlib inline  (uncomment when running in a Jupyter notebook)
# DATA PREPARATION
df5 = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
start_date = pd.to_datetime('2021-12-21 1:00').tz_localize('America/New_York')
end_date = pd.to_datetime('2021-12-21 7:00').tz_localize('America/New_York')
df5trimmed = df5.truncate(before=start_date, after=end_date)
# PLOTTING
fig, axlist= mpf.plot(df5trimmed,
style='yahoo',
type='candlestick',
volume=True,
xrotation=0,
figsize=(20,10),
returnfig=True,
show_nontrading=False)
# x-tick labeling preparation
# the tick frequency can be adjusted but needs to be lower than that of the original data
idx = pd.date_range(start_date, end_date, freq='30min', tz='America/New_York')
df_label_idx = pd.DataFrame(index=idx)
# this merge does the trick: the output is the intersection between the lower frequency and the
# higher freq time series. The inner option leaves in the output only those rows present in both TS
# dropping from the lower freq TS those missing periods in the higher freq TS.
df_label = pd.merge(df_label_idx, df5trimmed, how='inner', left_index=True, right_index=True ).tz_convert('America/New_York')
# Tick labels are generated based on df_label
tick_labels = list(df_label.index.strftime('%H:%M'))
ticklocations = [df5trimmed.index.get_loc(tick) for tick in df_label.index ]
axlist[-2].xaxis.set_ticks(ticklocations)
axlist[-2].set_xticklabels(tick_labels)
mpf.show()
print(tick_labels)
df_label
df5trimmed['2021-12-21 05:00:00-05:00':'2021-12-21 07:00:00-05:00']
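To see what the inner merge keeps, here is a small sketch on made-up data: 5-minute bars with the 06:30 bar missing, merged against a 30-minute label index. The frame names are hypothetical, and time zones are left out to keep it short.

```python
import pandas as pd

# Hypothetical 5-minute bars from 05:00 to 07:00 with the 06:30 bar missing.
data_index = pd.date_range("2021-12-21 05:00", "2021-12-21 07:00", freq="5min")
data_index = data_index.drop([pd.Timestamp("2021-12-21 06:30")])
bars = pd.DataFrame({"Close": range(len(data_index))}, index=data_index)

# Lower-frequency index holding the candidate tick times.
label_index = pd.date_range("2021-12-21 05:00", "2021-12-21 07:00", freq="30min")
labels_only = pd.DataFrame(index=label_index)

# how='inner' keeps only the timestamps present in both indexes.
merged = pd.merge(labels_only, bars, how="inner",
                  left_index=True, right_index=True)
merged_labels = merged.index.strftime("%H:%M").tolist()

# Index.intersection gives the same timestamps without building a frame.
intersection_labels = (
    label_index.intersection(bars.index).strftime("%H:%M").tolist()
)
print(merged_labels)
```

The 06:30 label drops out because there is no bar for it, while every other half hour survives.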
I wonder if it's possible to change the measurement milestones for graphs created by pandas. In my code the X-axis stands for time and is measured by month, but the measurement milestones are all over the place.
In the image below, the milestones for the X-axis are 2012M01, 2012M06, 2012M11, 2013M04 and 2013M09.
Is there any way I can choose how long the distance should be between every milestone? For example, to make it so it shows every year or every half year?
This is the code I used for the function making the graph:
def graph(dataframe):
    graph = dataframe[["Profit"]].plot()
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    plt.grid(True)
    plt.show()
The actual dataframe is just an excel-file with a bunch of months and monetary values in it.
I think the most straightforward way is to use matplotlib.dates to format the axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def graph(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m')  # see https://strftime.org/
    major = mdates.MonthLocator([1,7])  # label only Jan and Jul
    graph = dataframe[["Profit"]].plot(ax=ax)  # link plot to the existing axes
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    graph.xaxis.set_major_locator(major)  # set major locator tick on x-axis
    graph.xaxis.set_major_formatter(xfmt)  # format xtick label
    plt.grid(True)
    plt.show()
But a key point is you need to have your dates as Python's built-in datetime.date (not datetime.datetime); thanks to this answer. If your dates are str or a different type of datetime, you will need to convert, but there are many resources on SO and elsewhere for doing this like this or this:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
dr = [pd.to_datetime(date).date() for date in dr] #explicitly converting to datetime with .date()
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
type(df.index[0])
Out[0]:
datetime.date
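The same conversion can also be written with the .date attribute of a DatetimeIndex, which returns plain datetime.date objects. A minimal sketch, assuming the same monthly range as above; the names plain_dates and first are mine.

```python
import datetime

import numpy as np
import pandas as pd

dr = pd.date_range("2012-01-01", "2014-01-01", freq="MS")

# .date gives a NumPy object array of datetime.date values
plain_dates = dr.date

df = pd.DataFrame({"Profit": np.random.rand(len(plain_dates))},
                  index=plain_dates)
first = df.index[0]
print(type(first))
```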
Calling graph(df) using the example above gets this plot:
Just to expand on this, here's what happens when the index is pandas.Timestamp instead of datetime.date:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in dr] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
graph(df)
Out[0]:
The x-axis is improperly formatted:
However, if you are willing to just create the plot directly through matplotlib, rather than pandas (pandas is using matplotlib anyway), this can handle more types of dates:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in dr] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
def graph_2(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m')
    major = mdates.MonthLocator([1,7])
    ax.plot(dataframe.index,dataframe['Profit'], label='Profit')
    ax.set_title('Statistics')
    ax.set_ylabel('Thousand $')
    ax.set_xlabel('Time')
    ax.xaxis.set_major_locator(major)
    ax.xaxis.set_major_formatter(xfmt)
    ax.legend()  # legend needs to be added
    plt.grid(True)
    plt.show()
graph_2(df)
type(df.index[0])
Out[0]:
pandas._libs.tslibs.timestamps.Timestamp
And here is the working graph:
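If you want to confirm where MonthLocator([1,7]) puts the ticks without looking at the figure, here is a rough sketch that reads them back from the axes. It assumes a headless Agg backend and random data; tick_months is a name I made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

dr = pd.date_range("2012-01-01", "2014-01-01", freq="MS")

fig, ax = plt.subplots()
ax.plot(dr, np.random.rand(len(dr)), label="Profit")
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=[1, 7]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%YM%m"))

# Convert tick positions back to datetimes and collect their months.
tick_months = sorted({t.month for t in mdates.num2date(ax.get_xticks())})
print(tick_months)
plt.close(fig)
```

Only January and July appear, so the labels are spaced half a year apart.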
I have an x-axis in terms of days (366 days, with February taken as 29 days), but I want to convert it to months (Jan - Dec) instead. What should I do?
def plotGraph():
    line, point = getXY()
    plt.plot(line['xlMax'], c='orangered', alpha=0.5, label = 'Maximum Temperature (2005-14)')
    plt.plot(line['xlMin'], c='dodgerblue', alpha=0.5, label = 'Minimum Temperature (2005-14)')
    plt.scatter(point['xsMax'].index, point['xsMax'], s = 10, c = 'maroon', label = 'Record Break Maximum (2015)')
    plt.scatter(point['xsMin'].index, point['xsMin'], s = 10, c = 'midnightblue', label = 'Record Break Minimum (2015)')
    ax1 = plt.gca() # Primary axes
    ax1.fill_between(line['xlMax'].index , line['xlMax'], line['xlMin'], facecolor='lightgray', alpha=0.25)
    ax1.grid(True, alpha = 1)
    for spine in ax1.spines:
        ax1.spines[spine].set_visible(False)
    ax1.spines['bottom'].set_visible(True)
    ax1.spines['bottom'].set_alpha(0.3)
    # Removing Ticks
    ax1.tick_params(axis=u'both', which=u'both',length=0)
    plt.show()
I think the quickest change might be to just set new ticks and tick labels at the starts of months; I found the conversion from day-of-the-year to month here, the first table:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = range(1,367)
y = np.random.rand(len(range(1,367)))
ax.plot(x,y)
month_starts = [1,32,61,92,122,153,183,214,245,275,306,336]
month_names = ['Jan','Feb','Mar','Apr','May','Jun',
'Jul','Aug','Sep','Oct','Nov','Dec']
ax.set_xticks(month_starts)
ax.set_xticklabels(month_names)
Note I assumed your days were numbered 1 to 366; if they are 0 to 365 you may have to change the range.
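Rather than copying the month starts from a table, they can be computed with the standard library. This is a small sketch assuming a leap year (2016 is my choice) and days numbered from 1.

```python
import calendar
from datetime import date

year = 2016  # any leap year gives the 366-day numbering used above

# tm_yday is the 1-based day of the year for the first of each month
month_starts = [date(year, m, 1).timetuple().tm_yday for m in range(1, 13)]

# month_abbr is locale-dependent; in an English locale it gives Jan..Dec
month_names = [calendar.month_abbr[m] for m in range(1, 13)]

print(month_starts)
```

For a non-leap year, change the year and every start from March onward shifts down by one.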
But I think usually a better approach is to get your days into some sort of datetime; this is more flexible and usually pretty smart. If say, your days were not confined to one year, it would be more complicated to associate day numbers with months.
This example uses datetime instead of integers. The dates are plotted on the x-axis directly, and then the DateFormatter and MonthLocator from matplotlib.dates are used to format the axis appropriately:
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
start = dt.datetime(2016,1,1) #there has to be a year given, even if it isn't plotted
new_dates = [start + dt.timedelta(days=i) for i in range(366)]
fig, ax = plt.subplots()
x = new_dates
y = np.random.rand(len(range(1,367)))
xfmt = mdates.DateFormatter('%b')
months = mdates.MonthLocator()
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(xfmt)
ax.plot(x,y)
All I want is quite straightforward: I just want the locator ticks to start at a specified timestamp:
pseudocode: locator.set_start_ticking_at( datetime_dummy )
I have had no luck finding anything so far.
Here is the portion of the code for this question:
axes[0].set_xlim(datetime_dummy) # datetime_dummy = '2015-12-25 05:34:00'
import matplotlib.dates as matdates
seclocator = matdates.SecondLocator(interval=20)
minlocator = matdates.MinuteLocator(interval=1)
hourlocator = matdates.HourLocator(interval=12)
seclocator.MAXTICKS = 40000
minlocator.MAXTICKS = 40000
hourlocator.MAXTICKS = 40000
majorFmt = matdates.DateFormatter('%Y-%m-%d, %H:%M:%S')
minorFmt = matdates.DateFormatter('%H:%M:%S')
axes[0].xaxis.set_major_locator(minlocator)
axes[0].xaxis.set_major_formatter(majorFmt)
plt.setp(axes[0].xaxis.get_majorticklabels(), rotation=90 )
axes[0].xaxis.set_minor_locator(seclocator)
axes[0].xaxis.set_minor_formatter(minorFmt)
plt.setp(axes[0].xaxis.get_minorticklabels(), rotation=90 )
# other codes
# save fig as a picture
The x axis ticks of above code will get me:
How do I tell the minor locator to align with the major locator?
How do I tell the locators which timestamp to start ticking at?
what I have tried:
set_xlim doesn't do the trick
seclocator.tick_values(datetime_dummy, datetime_dummy1) doesn't do anything
Instead of using the interval keyword parameter, use bysecond and byminute to specify exactly which seconds and minutes you wish to mark. The bysecond and byminute parameters are used to construct a dateutil rrule. The rrule generates datetimes which match certain specified patterns (or, one might say, "rules").
For example, bysecond=[20, 40] limits the datetimes to those whose seconds equal 20 or 40. Thus, below, the minor tick marks only appear for datetimes whose seconds equal 20 or 40.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as matdates
N = 100
fig, ax = plt.subplots()
x = np.arange(N).astype('<i8').view('M8[s]').tolist()
y = (np.random.random(N)-0.5).cumsum()
ax.plot(x, y)
seclocator = matdates.SecondLocator(bysecond=[20, 40])
minlocator = matdates.MinuteLocator(byminute=range(60)) # range(60) is the default
seclocator.MAXTICKS = 40000
minlocator.MAXTICKS = 40000
majorFmt = matdates.DateFormatter('%Y-%m-%d, %H:%M:%S')
minorFmt = matdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_locator(minlocator)
ax.xaxis.set_major_formatter(majorFmt)
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90)
ax.xaxis.set_minor_locator(seclocator)
ax.xaxis.set_minor_formatter(minorFmt)
plt.setp(ax.xaxis.get_minorticklabels(), rotation=90)
plt.subplots_adjust(bottom=0.5)
plt.show()
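The rule behind bysecond can also be looked at directly with dateutil, independent of any plot. A minimal sketch: the two-minute window starting at the question's timestamp is my own choice, and this only illustrates the kind of rule the locator builds, not Matplotlib's internal call.

```python
from datetime import datetime

from dateutil.rrule import rrule, SECONDLY

# Generate every datetime in the window whose seconds equal 20 or 40.
rule = rrule(
    SECONDLY,
    bysecond=[20, 40],
    dtstart=datetime(2015, 12, 25, 5, 34, 0),
    until=datetime(2015, 12, 25, 5, 36, 0),
)
matches = [d.strftime("%H:%M:%S") for d in rule]
print(matches)
```

Because the rule is anchored to clock values rather than to an interval counted from the first data point, the ticks fall on the same seconds no matter where the axis starts.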
@unutbu: Many thanks: I've been looking everywhere for the answer to a related problem!
@eliu: I've adapted unutbu's excellent answer to demonstrate how you can define lists (to create different 'dateutil' rules) which give you complete control over which x-ticks are displayed. Try un-commenting each example below in turn and play around with the values to see the effect. Hope this helps.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
idx = pd.date_range('2017-01-01 05:03', '2017-01-01 18:03', freq = 'min')
df = pd.Series(np.random.randn(len(idx)), index = idx)
fig, ax = plt.subplots()
# Choose which major hour ticks are displayed by creating a 'dateutil' rule e.g.:
# Only use the hours in an explicit list (left active so the script runs as written):
hourlocator = mdates.HourLocator(byhour=[6,12,8])
# Use the hours in a range defined by: Start, Stop, Step:
# hourlocator = mdates.HourLocator(byhour=range(8,15,2))
# Use every 3rd hour:
# hourlocator = mdates.HourLocator(interval = 3)
# Set the format of the major x-ticks:
majorFmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_locator(hourlocator)
ax.xaxis.set_major_formatter(majorFmt)
# ... and likewise set minor locators and minor formatters for minor x-ticks if needed
ax.plot(df.index, df.values, color = 'black', linewidth = 0.4)
fig.autofmt_xdate() # optional: makes 30 deg tilt on tick labels
plt.show()
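To see which hours one of these rules selects, the tick positions can be read back from the axes. A rough sketch for the byhour=range(8,15,2) variant, assuming a headless Agg backend and random data; tick_hours is a name I made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

idx = pd.date_range("2017-01-01 05:03", "2017-01-01 18:03", freq="min")
series = pd.Series(np.random.randn(len(idx)), index=idx)

fig, ax = plt.subplots()
ax.plot(series.index, series.values, color="black", linewidth=0.4)
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(8, 15, 2)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Convert tick positions back to datetimes and collect their hours.
tick_hours = sorted({t.hour for t in mdates.num2date(ax.get_xticks())})
print(tick_hours)
plt.close(fig)
```

The ticks sit on whole hours even though the data starts at 05:03, which is the alignment the original question was after.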