Data - we import historical yields of the ten- and thirty-year Treasuries and calculate the spread (difference) between the two (this block of code is good; feel free to skip):
#Import statements
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
#Constants
start_date = "2018-01-01"
end_date = "2023-01-01"
#Pull in data
tenYear_master = yf.download('^TNX', start_date, end_date)
thirtyYear_master = yf.download('^TYX', start_date, end_date)
#Trim DataFrames to only include 'Adj Close columns'
tenYear = tenYear_master['Adj Close'].to_frame()
thirtyYear = thirtyYear_master['Adj Close'].to_frame()
#Rename columns
tenYear.rename(columns = {'Adj Close' : 'Adj Close - Ten Year'}, inplace= True)
thirtyYear.rename(columns = {'Adj Close' : 'Adj Close - Thirty Year'}, inplace= True)
#Join DataFrames
data = tenYear.join(thirtyYear)
#Add column for difference (spread)
data['Spread'] = data['Adj Close - Thirty Year'] - data['Adj Close - Ten Year']
data
This block is also good.
'''Plot data'''
#Delete top, left, and right borders from figure
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.left'] = False
plt.rcParams['axes.spines.right'] = False
fig, ax = plt.subplots(figsize = (15,10))
data.plot(ax = ax, secondary_y = ['Spread'], ylabel = 'Yield', legend = False);
'''Change left y-axis tick labels to percentage'''
left_yticks = ax.get_yticks().tolist()
ax.yaxis.set_major_locator(mticker.FixedLocator(left_yticks))
ax.set_yticklabels((("%.1f" % tick) + '%') for tick in left_yticks);
#Add legend
fig.legend(loc="upper center", ncol = 3, frameon = False)
fig.tight_layout()
plt.show()
I have questions concerning two features of the graph that I want to customize:
The x-axis currently has a tick and tick label for every year. How can I change this so that there is a tick and tick label for every 3 months in the form MMM-YY? (see picture below)
The spread was calculated as thirty year yield - ten year yield. Say I want to change the RIGHT y-axis tick labels so that their sign is flipped, but I want to leave both the original data and curves alone (for the sake of argument; bear with me, there is logic underlying this). In other words, the right y-axis tick labels currently go from -0.2 at the bottom to 0.8 at the top. How can I change them so that they go from 0.2 at the bottom to -0.8 at the top without changing anything about the data or curves? This is purely a cosmetic change of the right y-axis tick labels.
I tried doing the following:
'''Change right y-axis tick labels'''
right_yticks = (ax.right_ax).get_yticks().tolist()
#Loop through and multiply each right y-axis tick label by -1
for index, value in enumerate(right_yticks):
    right_yticks[index] = value*(-1)
(ax.right_ax).yaxis.set_major_locator(mticker.FixedLocator(right_yticks))
(ax.right_ax).set_yticklabels(right_yticks)
But I got this:
Note how the right y-axis is incomplete.
I'd appreciate any help. Thank you!
Let's create some data:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
days = np.array(["2022-01-01", "2022-07-01", "2023-02-15", "2023-11-15", "2024-03-03"],
dtype = "datetime64")
val = np.array([20, 20, -10, -10, 10])
For the dates on the x-axis, we import matplotlib.dates, which provides the month locator and the date formatter. The locator places a tick every 3 months, and the formatter controls how the labels are displayed (abbreviated month and two-digit year, e.g. Jan-22).
For the y-axis, you need to change the sign of the plotted data (hence the negative sign in ax2.plot()), but you want the curve to stay in the same position, so afterwards you need to invert the axis. As a result, the curves in both plots are identical, but the y-axis values have opposite signs and run in opposite directions.
fig, (ax1, ax2) = plt.subplots(figsize = (10,5), nrows = 2)
ax1.plot(days, val, marker = "x")
# set the locator to Jan, Apr, Jul, Oct
ax1.xaxis.set_major_locator(mdates.MonthLocator( bymonth = (1, 4, 7, 10) ))
# set the formatter to month-year; lowercase %y shows only two digits of the year
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%b-%y"))
# change the sign of the y data plotted
ax2.plot(days, -val, marker = "x")
#invert the y axis
ax2.invert_yaxis()
# set the locator to Jan, Apr, Jul, Oct
ax2.xaxis.set_major_locator(mdates.MonthLocator( bymonth = (1, 4, 7, 10) ))
# set the formatter to month-year; lowercase %y shows only two digits of the year
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%b-%y"))
plt.show()
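If the change really has to be cosmetic only (data and curves untouched, only the sign printed on the tick labels flipped), a tick formatter is an alternative to negating the data and inverting the axis. The following is a minimal sketch under stated assumptions: a `twinx()` axis stands in for the `ax.right_ax` that `DataFrame.plot(secondary_y=...)` creates, the spread values are made up, and `flip_sign` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np


def flip_sign(value, pos=None):
    """Format a tick value with its sign flipped; adding 0.0 avoids printing '-0.0'."""
    return f"{-value + 0.0:.1f}"


x = np.arange(5)
spread = np.array([-0.2, 0.1, 0.4, 0.8, 0.3])  # made-up values for illustration

fig, ax = plt.subplots()
right_ax = ax.twinx()  # stands in for ax.right_ax from DataFrame.plot
right_ax.plot(x, spread)

# Only the label text changes; tick positions and the curve stay where they are.
right_ax.yaxis.set_major_formatter(mticker.FuncFormatter(flip_sign))
# plt.show()  # uncomment to display the figure
```

The attempt in the question passes the negated values to `FixedLocator`, which moves the tick positions themselves; most of them then fall outside the axis limits, which would explain the incomplete right axis. A formatter leaves the positions alone and changes only the text.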
Related
Checking that I am not re-inventing the wheel here: with mplfinance I would like to have ticks on the x-axis every 15 minutes, but only where data exists.
Plotting directly with mplfinance (without returnfig=True), the plot is fine except for the x-axis, whose tick values are not aligned to round times; which times get used depends on the first element of the DataFrame.
To put a grid line/tick every 15 minutes, I have the code below, which works when every expected timestamp is present in the index of my dataset:
start_date = pd.to_datetime('2021-12-21 04:00').tz_localize('America/New_York')
end_date = pd.to_datetime('2021-12-21 19:55').tz_localize('America/New_York')
df5trimmed = df5.truncate(before=start_date, after=end_date)
ticks = pd.date_range(start_date, end_date, freq='15T')
ticklocations = [ df5trimmed.index.get_loc(tick) for tick in ticks ]
ticklabels = [ tick.time().strftime('%H:%M') for tick in ticks ]
fig, axlist = mpf.plot(df5trimmed,style='yahoo', addplot=plotsToAdd, figsize=(48,24),
type='candlestick', volume=True, xrotation=0,
tight_layout=True, returnfig=True)
axlist[-2].xaxis.set_ticks(ticklocations)
axlist[-2].set_xticklabels(ticklabels)
However, as expected, it blows up with a KeyError inside the list comprehension df5trimmed.index.get_loc(tick) for tick in ticks whenever the DataFrame has no row for a given tick.
Notice the discontinuities in the data below: it blows up when trying to access the 17:00 key, which doesn't exist in my data:
Essentially I am looking to plot the lines aligned to n minutes (in the example below 15 minutes), but only if it exists and not otherwise (if it doesn't exist, I am ok with the bars being right next to one another)... in summary, during regular trading hours with liquidity (where there would be data points), there would be ticks at 08:15, 08:30, etc.
Is there an argument in mplfinance that can do this?
What I am looking to achieve
The below is from tradingview, note the aligned time ticks every 15 minutes during regular trading hours and pretty much the entire plot.
Additional Info - source data and what is plotted
The below uses this csv data, and plots directly to mplfinance, you can see the time ticks are not aligned to the hour I get 04:00, 06:25, 08:10, 09:50 etc:
import mplfinance as mpf
import pandas as pd
import datetime
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
fig, axlist = mpf.plot(df5trimmed,style='yahoo', figsize=(48,24), type='candlestick', volume=True, xrotation=0, tight_layout=True, returnfig=True)
# Display:
mpf.show()
I used the following code to add the ticks, but only where they exist. I suspect there are much nicer ways of writing this, so I am open to alternative answers.
My next iteration will be to draw the labels with 0-degree rotation and to avoid clutter where one label would overwrite another.
import mplfinance as mpf
import pandas as pd
import datetime
def getTimestampTickFrequency(df):
    # get most common interval between bars, in minutes
    mode = int(df.index.to_series().diff().dt.total_seconds().div(60).mode()[0])
    if mode == 5:
        return 15, 3 # for 5 minutes, tick every 15 mins
    elif mode == 15:
        return 60, 12 # for 15 minute data, tick every hour
    elif mode == 120:
        return 240, 48 # for hourly data, tick every 2 hours
    # fallback: tick on every bar (the caller unpacks two values)
    return mode, 1
def getTickLocationsAndLabels(df):
    tickLocations = []
    tickLabels = []
    tickFrequencyMinutes, samplesBetweenTicks = getTimestampTickFrequency(df)
    entireTickRange = pd.date_range(start=df.index[0], end=df.index[-1], freq=f'{tickFrequencyMinutes}min')
    prevTimestamp = df.index[0]
    # get indexes of data frame that match the ticks, if they exist
    for tick in entireTickRange:
        print(tick)
        try:
            found = df.index.get_loc(tick)
            currentTimestamp = df.index[found]
            timestampDifference = (currentTimestamp - prevTimestamp).total_seconds() / 60
            print(f'Delta last time stamp = {timestampDifference}')
            #if timestampDifference <= tickFrequencyMinutes:
            tickLocations.append(found)
            tickLabels.append(tick.time().strftime('%H:%M'))
            prevTimestamp = currentTimestamp
        except KeyError:
            pass # ignore: no bar exists at this tick
    return tickLocations, tickLabels
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
tickLocations, tickLabels = getTickLocationsAndLabels(df5trimmed)
fig, axlist = mpf.plot(df5trimmed,style='yahoo', figsize=(48,24), type='candlestick', volume=True, tight_layout=True, returnfig=True)
axlist[-2].xaxis.set_ticks(tickLocations)
axlist[-2].set_xticklabels(tickLabels)
# Display:
mpf.show()
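A more compact way to keep only the ticks that exist is to intersect the aligned tick range with the DataFrame's index instead of catching KeyError in a loop. This is only a sketch: `existing_ticks` is a hypothetical helper, the timestamps are made up, and it assumes the index is unique and sorted (which `get_indexer` needs).

```python
import pandas as pd


def existing_ticks(index, freq="15min", fmt="%H:%M"):
    """Return (positions, labels) for aligned timestamps that are present in `index`."""
    wanted = pd.date_range(index[0].floor(freq), index[-1], freq=freq)
    present = wanted.intersection(index)    # drops aligned ticks that have no bar
    positions = index.get_indexer(present)  # integer row positions of the surviving ticks
    labels = present.strftime(fmt)
    return list(positions), list(labels)


# Made-up 5-minute bars with a gap: there is no bar at 17:00.
idx = pd.DatetimeIndex(
    ["2021-12-21 16:45", "2021-12-21 16:50", "2021-12-21 16:55",
     "2021-12-21 17:05", "2021-12-21 17:15", "2021-12-21 17:20"]
)
positions, labels = existing_ticks(idx)
print(positions, labels)
```

The returned positions and labels could then be passed to `xaxis.set_ticks(...)` and `set_xticklabels(...)` in the same way as `tickLocations` and `tickLabels` above.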
So, what you need is to remove the gaps in the plot caused by missing bars:
"Essentially I am looking to plot the lines aligned to n minutes (in
the example below 15 minutes), but only if it exists and not otherwise"
Taking the code you posted, but zooming in on the first 30 bars and adding the option show_nontrading=True:
import mplfinance as mpf
import pandas as pd
import datetime
df5trimmed = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
fig, axlist = mpf.plot(df5trimmed[0:30],
                       style='yahoo',
                       figsize=(48,24),
                       type='candle',
                       volume=True,
                       xrotation=0,
                       tight_layout=True,
                       returnfig=True,
                       show_nontrading=True)
# Display:
mpf.show()
I get this, which is showing the gaps you mentioned.
But if I set the option show_nontrading=False
This changes to the plot below, which removes the gaps corresponding to the missing bars.
Isn't this what you needed?
Please check if this solves your issue. I think it does.
Check the date range between 6:00 AM and 7:00 AM. Only a few bars are plotted between 5:00 and 6:00, and 6:30 is missing.
import mplfinance as mpf
import pandas as pd
import datetime
import matplotlib.pyplot as plt
%matplotlib inline
# DATA PREPARATION
df5 = pd.read_csv('https://pastebin.com/raw/SgpargBb', index_col=0, parse_dates=True)
start_date = pd.to_datetime('2021-12-21 1:00').tz_localize('America/New_York')
end_date = pd.to_datetime('2021-12-21 7:00').tz_localize('America/New_York')
df5trimmed = df5.truncate(before=start_date, after=end_date)
# PLOTTING
fig, axlist = mpf.plot(df5trimmed,
                       style='yahoo',
                       type='candlestick',
                       volume=True,
                       xrotation=0,
                       figsize=(20,10),
                       returnfig=True,
                       show_nontrading=False)
# x-tick labeling preparation
# the frequency can be adjusted, but the tick interval must be coarser than that of the original data
idx = pd.date_range(start_date, end_date, freq='30min', tz='America/New_York')
df_label_idx = pd.DataFrame(index=idx)
# this merge does the trick: the output is the intersection between the lower frequency and the
# higher freq time series. The inner option leaves in the output only those rows present in both TS
# dropping from the lower freq TS those missing periods in the higher freq TS.
df_label = pd.merge(df_label_idx, df5trimmed, how='inner', left_index=True, right_index=True ).tz_convert('America/New_York')
# Tick labels are generated based on df_label
tick_labels = list(df_label.index.strftime('%H:%M'))
ticklocations = [df5trimmed.index.get_loc(tick) for tick in df_label.index ]
axlist[-2].xaxis.set_ticks(ticklocations)
axlist[-2].set_xticklabels(tick_labels)
mpf.show()
print(tick_labels)
df_label
df5trimmed['2021-12-21 05:00:00-05:00':'2021-12-21 07:00:00-05:00']
I've been trying to plot my data on a line chart, and I expect it to show dates on the horizontal axis. I used index_col to set the index to the date column, but that returns an empty DataFrame. Can someone help, please?
data = pd.read_csv('good_btc_dataset.csv', warn_bad_lines= True,
index_col= ['date'])
data.dropna(inplace=True)
data.index = range(3169)
data.head()
I expect my chart to show dates on the horizontal axis but all it shows is numbers
thanks in advance
I recommend checking this script (it is copied from the documentation). I think you just need to adapt it to your own data.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
years_fmt = mdates.DateFormatter('%Y')
# Load a numpy structured array from yahoo csv data with fields date, open,
# close, volume, adj_close from the mpl-data/example directory. This array
# stores the date as an np.datetime64 with a day unit ('D') in the 'date'
# column.
with cbook.get_sample_data('goog.npz') as datafile:
    data = np.load(datafile)['price_data']
fig, ax = plt.subplots()
ax.plot('date', 'adj_close', data=data)
# format the ticks
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(years_fmt)
ax.xaxis.set_minor_locator(months)
# round to nearest years.
datemin = np.datetime64(data['date'][0], 'Y')
datemax = np.datetime64(data['date'][-1], 'Y') + np.timedelta64(1, 'Y')
ax.set_xlim(datemin, datemax)
# format the coords message box
ax.format_xdata = mdates.DateFormatter('%Y-%m-%d')
ax.format_ydata = lambda x: '$%1.2f' % x # format the price.
ax.grid(True)
# rotates and right aligns the x labels, and moves the bottom of the
# axes up to make room for them
fig.autofmt_xdate()
plt.show()
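Adapted to a CSV like the one in the question, the same idea might look as follows. This is a sketch, not the asker's actual data: the inline text stands in for good_btc_dataset.csv, the column names `date` and `close` are assumptions, and the prices are made up.

```python
import io

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for 'good_btc_dataset.csv'; column names and values are made up.
csv_text = """date,close
2019-01-01,100.0
2019-02-01,110.0
2019-03-01,105.0
2019-04-01,120.0
"""

# parse_dates turns the 'date' column into real timestamps before it becomes the index.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")
data = data.dropna()
# Keep the DatetimeIndex: replacing it with range(...) makes the x-axis show plain numbers.

fig, ax = plt.subplots()
ax.plot(data.index, data["close"])
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate()
# plt.show()  # uncomment to display the figure
```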
This code plots the data exactly as I want with the dates on the x-axis and the times on the y-axis. However I want the y-axis to show every hour on the hour (e.g., 00, 01, ... 23) and the x-axis to show the beginning of every month at an angle so there's no overlap (the actual data being used spans over a year) and only once, since this code repeats the same months. How is this accomplished?
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-02-04 11:55:09']
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time, '.')
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
plt.show()
UPDATE: This fixes the x axis.
# Monthly intervals on x axis
months = mdates.MonthLocator()
d_fmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(d_fmt)
However, this attempt to fix the y axis just makes it blank.
# Hourly intervals on y axis
hours = mdates.HourLocator()
t_fmt = mdates.DateFormatter('%H')
ax.yaxis.set_major_locator(hours)
ax.yaxis.set_major_formatter(t_fmt)
I'm reading these docs but not understanding my error: https://matplotlib.org/api/dates_api.html, https://matplotlib.org/api/ticker_api.html
Matplotlib cannot plot times without a corresponding date. This makes it necessary to add some arbitrary date (in the case below, I took the 1st of January 2018) to the times. One may use datetime.datetime.combine for that purpose.
timetodatetime = lambda x:dt.datetime.combine(dt.date(2018, 1, 1), x)
time = list(map(timetodatetime, data.time))
ax.plot(data.date, time, '.')
Then the code from the question using HourLocator() works fine. Finally, setting the limits on the axes also requires datetime objects:
ax.set_ylim([dt.datetime(2018,1,1,0), dt.datetime(2018,1,2,0)])
Complete example:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27',
'2018-02-04 11:55:09']
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
timetodatetime = lambda x:dt.datetime.combine(dt.date(2018, 1, 1), x)
time = list(map(timetodatetime, data.time))
ax.plot(data.date, time, '.')
# Monthly intervals on x axis
months = mdates.MonthLocator()
d_fmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(d_fmt)
## Hourly intervals on y axis
hours = mdates.HourLocator()
t_fmt = mdates.DateFormatter('%H')
ax.yaxis.set_major_locator(hours)
ax.yaxis.set_major_formatter(t_fmt)
ax.set_ylim([dt.datetime(2018,1,1,0), dt.datetime(2018,1,2,0)])
plt.show()
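The question also asks for the month labels to be drawn at an angle. One way to do that, sketched here with made-up dates spanning more than a year, is `tick_params` with `labelrotation`; `fig.autofmt_xdate()` is another option.

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Made-up sample: dates on x, times of day pinned to one arbitrary date on y.
dates = [dt.date(2018, 1, 1), dt.date(2018, 6, 3), dt.date(2019, 2, 4)]
times = [dt.datetime(2018, 1, 1, 9, 28), dt.datetime(2018, 1, 1, 13, 2),
         dt.datetime(2018, 1, 1, 11, 55)]

fig, ax = plt.subplots()
ax.plot(dates, times, ".")
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.yaxis.set_major_locator(mdates.HourLocator())
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H"))
ax.set_ylim(dt.datetime(2018, 1, 1, 0), dt.datetime(2018, 1, 2, 0))

# Tilt the month labels so that neighbouring labels do not collide.
ax.tick_params(axis="x", labelrotation=45)
# plt.show()  # uncomment to display the figure
```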
I have a DataFrame plot which looks something like this
The index on the plotted DataFrame is as follows
In[102]: res.index
Out[102]:
Index([00:00:00, 00:05:00, 00:10:00, 00:15:00, 00:20:00, 00:25:00, 00:30:00,
00:35:00, 00:40:00, 00:45:00,
...
23:10:00, 23:15:00, 23:20:00, 23:25:00, 23:30:00, 23:35:00, 23:40:00,
23:45:00, 23:50:00, 23:55:00],
dtype='object', name='time', length=288)
The times depicted on the x-axis are in Central Standard Time, and I would like to add a secondary x-axis to the top of the plot with the corresponding UTC times. I'm aware that this will make the numbers go 6:00 ... 22:00 ... 4:00 but that is fine. How can I do so?
My attempts
I know that a second x-axis can be created with the following, and I managed to procure what seems like the xticks from the first x-axis with the following - my next step was going to be to try to determine how to shift these times to UTC - which I am still unsure on ideas for.
ax2 = ax.twiny()
ax2.set_xlabel('UTC time')
local_ticks = ax.get_xticks().tolist()
However, upon plotting this rudimentary step, I found that get_xticks() yielded one more tick than the original x-axis shows, so they are not aligned.
Correspondingly, treating these as seconds and attempting to perform some further manipulations to go seconds->CST->UTC is still going to leave me with the ugly scaling "double white-line" at each tick. Ideally, I want the exact same ticks along the secondary x-axis, except with appropriately labelled UTC times.
Sample generation
If you want to generate a plot like the first one I provided, I threw this together.
import datetime
import numpy as np
import random
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
start_date = datetime.datetime(2016, 9, 5)
end_date = datetime.datetime.now()
dts = []
cur_date = start_date
while cur_date < end_date:
    dts.append((cur_date, np.random.uniform(low=0.0, high=12.0)))
    cur_date += datetime.timedelta(minutes=np.random.uniform(10, 20))
dts = pd.DataFrame(dts, columns=['timestamp', 'lag'])
dts.index = dts.timestamp
dts.drop('timestamp', axis=1, inplace=True)
cur_time = datetime.datetime(1, 1, 1, 0, 0)
aggs = []
while cur_time < datetime.datetime(1, 1, 1, 23, 59, 0):
    aggs.append((cur_time.time(),
                 dts.between_time(cur_time.time(),
                                  (cur_time + datetime.timedelta(minutes=5)).time(),
                                  inclusive='left')['lag'].mean()))  # pandas >= 1.4; older versions used include_end=False
    cur_time += datetime.timedelta(minutes=5)
res = pd.DataFrame(aggs, columns = ['time', 'lag'])
res.index = res.time
res.drop('time', axis=1, inplace=True)
ax = res.plot(figsize=(15, 10))
I got it to work by explicitly setting the x-axis tick labels rather than letting matplotlib decide. This can be done with:
ax.set_xticks([2*60*60*i for i in range(0,13)])
This has the added benefit that the listed times aren't random; you can set the increments to be one hour, 2 hours, ... In the above snippet, the 2* part indicates 2 hour increments.
Then, in the rest of the code you need to copy the axis over to ax2, and manually set the labels. Here's how I did that:
# Begin solution
ax.set_xticks([2*60*60*i for i in range(0,13)])
local_ticks = ax.get_xticks().tolist()
labels = [str(datetime.timedelta(seconds=(second+6*60*60))) for second in local_ticks]
ax2 = ax.twiny()
plt.sca(ax2)
ax2.set_xlabel('UTC time')
plt.xticks(local_ticks, labels)
Here's the final image:
Changing the '1 day' part in the above axis can be done manually with string parsing.
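As an alternative to string parsing, the shifted value can be wrapped around midnight before it is formatted. A minimal sketch, assuming the ticks are seconds since local midnight and a fixed UTC-6 offset for Central Standard Time; `utc_label` is a hypothetical helper name:

```python
def utc_label(seconds_local, offset_hours=6):
    """Turn seconds since local midnight into an 'HH:MM' label shifted by offset_hours."""
    total = (int(seconds_local) + offset_hours * 3600) % (24 * 3600)  # wrap past midnight
    hours, remainder = divmod(total, 3600)
    return f"{hours:02d}:{remainder // 60:02d}"


local_ticks = [2 * 60 * 60 * i for i in range(0, 13)]  # same two-hour ticks as above
labels = [utc_label(second) for second in local_ticks]
print(labels)
```

These labels can be passed to `plt.xticks(local_ticks, labels)` in place of the `timedelta` strings.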
Actually, the easiest fix is to simply add ax2.set_xticks(ax.get_xticks()). I think the problem is that you don't explicitly set the xticks for axis 1, but you do for axis 2; by the time the plot is created, the xticks for axis 1 have been recalculated while the xticks for axis 2 remain the same.
I would like to draw a rectangle to indicate a range within the x axis. I can use locators for setting ticks and labels, but I don't seem to succeed using them to draw the rectangle. How could I go about it?!
import datetime as DT
from matplotlib import pyplot as plt
import matplotlib.dates as dates
ddata = [DT.datetime.strptime('2010-02-05', "%Y-%m-%d"),
DT.datetime.strptime('2010-02-19', "%Y-%m-%d"),
DT.datetime.strptime('2010-03-05', "%Y-%m-%d"),
DT.datetime.strptime('2010-03-19', "%Y-%m-%d"),]
values = [123,678,987,345]
d1 = zip(ddata,values)
def nplot(data):
    x = [date for (date, value) in data]
    y = [value for (date, value) in data]
    # Set the stage
    fig, ax = plt.subplots()
    graph = fig.add_subplot(111)
    # Plot the data as a red line with round markers
    graph.plot(x,y,'r-o')
    days = dates.DayLocator(interval=7) # every week
    months = dates.MonthLocator() # every month
    # Create locators and ticks
    ax.xaxis.set_minor_locator(days)
    ax.xaxis.set_minor_formatter(dates.DateFormatter('%d'))
    ax.xaxis.set_major_locator(months)
    ax.xaxis.set_major_formatter(dates.DateFormatter('\n\n%b'))
    ax.xaxis.grid(True, which="major", linewidth=2)
    # Now how do I align a rectangle with specific dates?
    gca().add_patch(Rectangle((data[0][0], 1000),
                              data[2][0], 1000, facecolor='w', alpha=0.9)) # doesn't work
    plt.show()
nplot(d1)
With this I get the currently set minor ticks
locs = ax.xaxis.get_minorticklocs()
And with this I write the rectangle. Odd, the location of the left side is a 6-digit float, but the location for the right side is the number of days since the left side. No idea how that works, but it seems to...
gca().add_patch(Rectangle((locs[0], 0), 7, 1000, facecolor='w', alpha=0.9))
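The behaviour follows from two facts: `Rectangle` takes `(xy, width, height)`, so its second argument is a width rather than a right-hand coordinate, and Matplotlib stores dates as floating-point day counts, so a width of 7 means seven days. The sketch below builds the same kind of rectangle from two explicit dates with `date2num`; the gray colour is chosen only to make the patch visible.

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

start = dt.datetime(2010, 2, 5)
end = dt.datetime(2010, 2, 19)

fig, ax = plt.subplots()
ax.plot([start, end, dt.datetime(2010, 3, 5)], [123, 678, 987], "r-o")

left = mdates.date2num(start)        # float day count; its size depends on Matplotlib's date epoch
width = mdates.date2num(end) - left  # the width is measured in days as well
ax.add_patch(Rectangle((left, 0), width, 1000, facecolor="gray", alpha=0.3))
# plt.show()  # uncomment to display the figure
```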
And this is what I wanted to do from the beginning: to mark recurring ranges.
locs = ax.xaxis.get_minorticklocs()
loc_len = len(locs)
zloc = list(zip(locs, [7] * loc_len)) # Seven-day loops; list() so it can be sliced in Python 3
for i in zloc[::2]:
    gca().add_patch(Rectangle((i[0], 0), i[1], 1000, facecolor='w', alpha=0.9))
However, this won't work if I decide to box months, as each month has a different number of days. @Greg's suggestion of using fill_between is another option, but it sets its limits in relation to the data, not the scale (which is OK, I guess):
xloc = list(zip(x[:-1], x[1:])) # list() so it can be sliced in Python 3
for i in xloc[::2]:
    ax.fill_between(i, 0, 1200, facecolor='w', alpha=0.5)
ylim(0, 1200)
plt.show()
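For boxing whole months, whose lengths differ, one possibility is to compute the month boundaries as dates and shade between them with `axvspan`, which spans the full height of the axes regardless of the data. This is a sketch using a different technique from the ones above; `month_starts` is a hypothetical helper, and the data repeats the sample from the question.

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt


def month_starts(first, last):
    """First day of each month, from first's month through the month after last's."""
    starts = []
    year, month = first.year, first.month
    while True:
        starts.append(dt.datetime(year, month, 1))
        if (year, month) > (last.year, last.month):
            break
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return starts


x = [dt.datetime(2010, 2, 5), dt.datetime(2010, 2, 19),
     dt.datetime(2010, 3, 5), dt.datetime(2010, 3, 19)]
y = [123, 678, 987, 345]

fig, ax = plt.subplots()
ax.plot(x, y, "r-o")

bounds = month_starts(x[0], x[-1])
spans = list(zip(bounds[:-1], bounds[1:]))  # (start, end) pairs, one per month
for begin, end in spans[::2]:               # shade every other month
    ax.axvspan(mdates.date2num(begin), mdates.date2num(end), facecolor="gray", alpha=0.3)
# plt.show()  # uncomment to display the figure
```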