How can I adjust the bounds of the x tick values that are automatically chosen by matplotlib? - python

I have a graph that shows the closing price of a stock throughout a day at each five minute interval. The x axis shows the time and the range of x values is from 9:30 to 4:00 (16:00).
The problem is that the automatic bounds for the x axis go from 9:37 to 16:07 and I really just want it from 9:30 to 16:00.
The code I am currently running is this:
stk = yf.Ticker(ticker)
his = stk.history(interval="5m", start=start, end=end).values.tolist() #open - high - low - close - volume
x = []
y = []
count = 0
five_minutes = datetime.timedelta(minutes = 5)
for bar in his:
    x.append((start + five_minutes * count))#.strftime("%H:%M"))
    count = count + 1
    y.append(bar[3])
plt.clf()
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
plt.gca().xaxis.set_major_locator(mdates.MinuteLocator(interval=30))
plt.plot(x, y)
plt.gcf().autofmt_xdate()
plt.show()
And it produces this plot (currently a link because I am on a new user account):
I thought I was supposed to use the axis.set_data_interval function, so I called it with datetime objects representing 9:30 and 16:00 as the min and the max. This gave me the error:
TypeError: '<' not supported between instances of 'float' and 'datetime.datetime'
Is there another way for me to adjust the first xtick and still have the rest filled in automatically?

This problem can be fixed by adjusting the way you use the mdates tick locator. Here is an example based on the one shared by r-beginners to make it comparable. Note that I use the pandas plotting function for convenience. The x_compat=True argument is needed for it to work with mdates:
import pandas as pd # 1.1.3
import yfinance as yf # 0.1.54
import matplotlib.dates as mdates # 3.3.2
# Import data
ticker = 'AAPL'
stk = yf.Ticker(ticker)
his = stk.history(period='1D', interval='5m')
# Create pandas plot with appropriately formatted x-axis ticks
ax = his.plot(y='Close', x_compat=True, figsize=(10,5))
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=[0, 30]))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M', tz=his.index.tz))
ax.legend(frameon=False)
ax.figure.autofmt_xdate(rotation=0, ha='center')
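
To see the effect of byminute without downloading anything, here is a minimal sketch on made-up prices (the timestamps and values are assumptions for illustration, not real quotes). It also pins the axis limits with ax.set_xlim, which accepts datetime objects once dates have been plotted on the axis.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up session: one point every five minutes from 09:30 to 16:00.
start = dt.datetime(2021, 1, 4, 9, 30)
times = [start + dt.timedelta(minutes=5 * i) for i in range(79)]
prices = [100 + 0.1 * i for i in range(79)]  # placeholder values

fig, ax = plt.subplots()
ax.plot(times, prices)

# byminute pins ticks to :00 and :30 of every hour, regardless of where
# the data or the view limits happen to start.
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=[0, 30]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Limit the view to the trading session using datetime objects.
ax.set_xlim(times[0], times[-1])

fig.canvas.draw()
tick_labels = [mdates.num2date(t).strftime("%H:%M") for t in ax.get_xticks()]
print(tick_labels)
```

By contrast, MinuteLocator(interval=30) only fixes the spacing between ticks, not the minute they fall on, which is consistent with the 9:37 and 16:07 ticks described in the question.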

The sample data is Apple's stock price obtained from Yahoo Finance. The five-minute labels are a list of strings built with pd.date_range, which generates the times from the start to the end of the session at five-minute intervals.
The closing price is then plotted against the integer position of each five-minute bar, and slicing the positions and the labels with the same step sets the tick spacing to any interval you like.
import yfinance as yf
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
ticker = 'AAPL'
stk = yf.Ticker(ticker)
his = stk.history(period='1D',interval="5m")
his.reset_index(inplace=True)
time_rng = pd.date_range('09:30','15:55', freq='5min')
labels = ['{:02}:{:02}'.format(t.hour,t.minute) for t in time_rng]
fig, ax = plt.subplots()
x = np.arange(len(his))
y = his.Close
ax.plot(x,y)
ax.set_xticks(x[::3])
ax.set_xticklabels(labels[::3], rotation=45)
plt.show()
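
The label-building step can be checked on its own. The sketch below is only an illustration: the session date is an arbitrary assumption, and strftime is used in place of the manual format call.

```python
import pandas as pd

# Start times of the five-minute bars in one session (the date is arbitrary).
time_rng = pd.date_range("2021-01-04 09:30", "2021-01-04 15:55", freq="5min")
labels = time_rng.strftime("%H:%M").tolist()

# Slice positions and labels with the same step so they stay aligned:
# a step of 3 gives one tick every 15 minutes.
tick_positions = list(range(len(labels)))[::3]
tick_labels = labels[::3]

print(tick_labels[:3])
```

Using a step of 6 instead of 3 would give one tick every 30 minutes. Note that this approach assumes the data has no missing bars, since labels are matched to positions rather than to actual timestamps.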

Related

Measurement length for X and Y-axis

I wonder if it's possible to change the measurement milestones for graphs created by pandas. In my code the X-axis stands for time and is measured by month, but the measurement milestones are all over the place.
In the image below, the milestones for the X-axis are 2012M01, 2012M06, 2012M11, 2013M04 and 2013M09.
Is there any way I can choose how long the distance should be between every milestone? For example, to make it so it shows every year or every half year?
This is the code I used for the function making the graph:
def graph(dataframe):
    graph = dataframe[["Profit"]].plot()
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    plt.grid(True)
    plt.show()
The actual dataframe is just an excel-file with a bunch of months and monetary values in it.
I think the most straightforward approach is to use matplotlib.dates to format the axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def graph(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m') #see https://strftime.org/
    major = mdates.MonthLocator([1,7]) #label only Jan and Jul
    graph = dataframe[["Profit"]].plot(ax=ax) #link plot to the existing axes
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    graph.xaxis.set_major_locator(major) #set major locator tick on x-axis
    graph.xaxis.set_major_formatter(xfmt) #format xtick label
    plt.grid(True)
    plt.show()
But a key point is you need to have your dates as Python's built-in datetime.date (not datetime.datetime); thanks to this answer. If your dates are str or a different type of datetime, you will need to convert, but there are many resources on SO and elsewhere for doing this like this or this:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
dr = [pd.to_datetime(date).date() for date in dr] #explicitly converting to datetime with .date()
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
type(df.index[0])
Out[0]:
datetime.date
Calling graph(df) using the example above gets this plot:
Just to expand on this, here's what happens when the index is pandas.Timestamp instead of datetime.date:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in df.index] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
graph(df)
Out[0]:
The x-axis is improperly formatted:
However, if you are willing to just create the plot directly through matplotlib, rather than pandas (pandas is using matplotlib anyway), this can handle more types of dates:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in df.index] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
def graph_2(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m')
    major = mdates.MonthLocator([1,7])
    ax.plot(dataframe.index,dataframe['Profit'], label='Profit')
    ax.set_title('Statistics')
    ax.set_ylabel('Thousand $')
    ax.set_xlabel('Time')
    ax.xaxis.set_major_locator(major)
    ax.xaxis.set_major_formatter(xfmt)
    ax.legend() #legend needs to be added
    plt.grid(True)
    plt.show()
graph_2(df)
type(df.index[0])
Out[0]:
pandas._libs.tslibs.timestamps.Timestamp
And here is the working graph:
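
Another option that may be worth trying, borrowed from the x_compat=True argument mentioned in the first answer on this page, is to keep the pandas plot call and ask pandas to use matplotlib's ordinary date units. This is a sketch under that assumption, using random data, and has not been checked against the original Excel file:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Index of pandas Timestamps, no conversion to datetime.date.
dr = pd.date_range("2012-01-01", "2014-01-01", freq="MS")
df = pd.DataFrame(index=dr, data={"Profit": np.random.rand(len(dr))})

fig, ax = plt.subplots()
# x_compat=True makes pandas plot the index in matplotlib date units,
# so the mdates locator and formatter see values they understand.
df[["Profit"]].plot(ax=ax, x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=[1, 7]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%YM%m"))

fig.canvas.draw()
tick_labels = [mdates.num2date(t).strftime("%YM%m") for t in ax.get_xticks()]
print(tick_labels)
```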

Plot rectangles over datetime axis in matplotlib?

I am trying to manually create a candlestick chart with matplotlib using errorbar for the daily High and Low prices and Rectangle() for the Adjusted Close and Open prices. This question seemed to have all the prerequisites for accomplishing this.
I attempted to use the above very faithfully, but the issue of plotting something over an x-axis of datetime64[ns]'s gave me no end of errors, so I've additionally tried to incorporate the advice here on plotting over datetime.
This is my code so far, with apologies for the messiness:
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.collections import PatchCollection
from matplotlib.patches import Rectangle
def makeCandles(xdata,high,low,adj_close,adj_open,fc='r',ec='None',alpha=0.5):
    ## Converting datetimes to numerical format matplotlib can understand.
    dates = mdates.date2num(xdata)
    ## Creating default objects
    fig,ax = plt.subplots(1)
    ## Creating errorbar peaks based on high and low prices
    avg = (high + low) / 2
    err = [high - avg,low - avg]
    ax.errorbar(dates,err,fmt='None',ecolor='k')
    ## Create list for all the error patches
    errorboxes = []
    ## Loop over data points; create "body" of candlestick
    ## based on adjusted open and close prices
    errors=np.vstack((adj_close,adj_open))
    errors=errors.T
    for xc,yc,ye in zip(dates,avg,errors):
        rect = Rectangle((xc,yc-ye[0]),1,ye.sum())
        errorboxes.append(rect)
    ## Create patch collection with specified colour/alpha
    pc = PatchCollection(errorboxes,facecolor=fc,alpha=alpha,edgecolor=ec)
    ## Add collection to axes
    ax.add_collection(pc)
    plt.show()
With my data looking like
This is what I try to run, first getting a price table from quandl,
import quandl as qd
api = '1uRGReHyAEgwYbzkPyG3'
qd.ApiConfig.api_key = api
data = qd.get_table('WIKI/PRICES', qopts = { 'columns': ['ticker', 'date', 'high','low','adj_open','adj_close'] }, \
ticker = ['AMZN', 'XOM'], date = { 'gte': '2014-01-01', 'lte': '2016-12-31' })
data.reset_index(inplace=True,drop=True)
makeCandles(data['date'],data['high'],data['low'],data['adj_open'],data['adj_close'])
The code runs with no errors, but outputs an empty graph. So what I am asking for is advice on how to plot these rectangles over the datetime dates. For the width of the rectangles, I simply used a uniform 1 because I am not aware of a simple way to specify the width of a rectangle in datetime units.
Edit
This is the plot I am currently getting, having transformed my xdata into matplotlib mdates:
Before I transformed xdata via mdates, with just xdata as my x-axis everywhere, this was one of the errors I kept getting:
To get the plot you want, there are a couple of things that need to be considered. First, you're retrieving two stocks, AMZN and XOM; displaying both will make the chart look funny, because their prices are quite far apart. Second, candlestick charts in which you plot each day for several years will get very crowded. Finally, you need to format your ordinal dates back into dates on the x-axis.
As mentioned in the comments, you can use the pre-built matplotlib candlestick2_ohlc function (although deprecated) accessible through mpl_finance, install as shown in this answer. I opted for using solely the matplotlib barchart with built-in errorbars.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import quandl as qd
from matplotlib.dates import DateFormatter, WeekdayLocator, \
DayLocator, MONDAY
# get data
api = '1uRGReHyAEgwYbzkPyG3'
qd.ApiConfig.api_key = api
data = qd.get_table('WIKI/PRICES', qopts={'columns': ['ticker', 'date', 'high', 'low', 'open', 'close']},
ticker=['AMZN', 'XOM'], date={'gte': '2014-01-01', 'lte': '2014-03-10'})
data.reset_index(inplace=True, drop=True)
fig, ax = plt.subplots(figsize = (10, 5))
data['date'] = mdates.date2num(data['date'].dt.to_pydatetime()) #convert dates to ordinal
tickers = list(set(data['ticker'])) # unique list of stock names
for stock_ind in tickers:
    df = data[data['ticker'] == 'AMZN'] # select one, can do more in a for loop, but it will look funny
    inc = df.close > df.open
    dec = df.open > df.close
    ax.bar(df['date'][inc],
           df['open'][inc]-df['close'][inc],
           color='palegreen',
           bottom=df['close'][inc],
           # this yerr is confusing when independent error bars are drawn => (https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.errorbar)
           yerr = [df['open'][inc]-df['high'][inc], -df['open'][inc]+df['low'][inc]],
           error_kw=dict(ecolor='gray', lw=1))
    ax.bar(df['date'][dec],
           df['close'][dec]-df['open'][dec],
           color='salmon', bottom=df['open'][dec],
           yerr = [df['close'][dec]-df['high'][dec], -df['close'][dec]+df['low'][dec]],
           error_kw=dict(ecolor='gray', lw=1))
    ax.set_title(stock_ind)
#some tweaking, setting the dates
mondays = WeekdayLocator(MONDAY) # major ticks on the mondays
alldays = DayLocator() # minor ticks on the days
weekFormatter = DateFormatter('%b %d') # e.g., Jan 12
dayFormatter = DateFormatter('%d') # e.g., 12
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
ax.set_ylabel('monies ($)')
plt.show()
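
On the question of rectangle width: after date2num, one unit on the x-axis corresponds to one day, so a width can be derived from a timedelta. The helper name timedelta_to_width below is hypothetical and the numbers are made up; this is only a sketch of the unit conversion, not a full candlestick implementation.

```python
import datetime as dt

import matplotlib.dates as mdates
from matplotlib.patches import Rectangle


def timedelta_to_width(delta):
    """Convert a timedelta to a width in matplotlib date units (days)."""
    return delta / dt.timedelta(days=1)


day = dt.datetime(2014, 1, 2)
left = mdates.date2num(day)

# Consecutive days are exactly one unit apart on a date axis.
next_day = mdates.date2num(day + dt.timedelta(days=1))

# A candle body spanning 12 hours, starting at `left` (prices are made up).
width = timedelta_to_width(dt.timedelta(hours=12))
rect = Rectangle((left, 10.0), width, 2.5)
print(width, rect.get_width())
```

A width somewhat below 1 leaves a visible gap between candles on daily data.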

How do I display even intervals on both axes using matplotlib?

This code plots the data exactly as I want with the dates on the x-axis and the times on the y-axis. However I want the y-axis to show every hour on the hour (e.g., 00, 01, ... 23) and the x-axis to show the beginning of every month at an angle so there's no overlap (the actual data being used spans over a year) and only once, since this code repeats the same months. How is this accomplished?
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-02-04 11:55:09']
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time, '.')
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
plt.show()
UPDATE: This fixes the x axis.
# Monthly intervals on x axis
months = mdates.MonthLocator()
d_fmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(d_fmt)
However, this attempt to fix the y axis just makes it blank.
# Hourly intervals on y axis
hours = mdates.HourLocator()
t_fmt = mdates.DateFormatter('%H')
ax.yaxis.set_major_locator(hours)
ax.yaxis.set_major_formatter(t_fmt)
I'm reading these docs but not understanding my error: https://matplotlib.org/api/dates_api.html, https://matplotlib.org/api/ticker_api.html
Matplotlib cannot plot times without a corresponding date. This makes it necessary to add some arbitrary date (in the case below, 1 January 2018) to the times. One may use datetime.datetime.combine for that purpose (with the datetime module imported as dt):
timetodatetime = lambda x:dt.datetime.combine(dt.date(2018, 1, 1), x)
time = list(map(timetodatetime, data.time))
ax.plot(data.date, time, '.')
Then the code from the question using HourLocator() works fine. Finally, setting the limits on the axes also requires datetime objects:
ax.set_ylim([dt.datetime(2018,1,1,0), dt.datetime(2018,1,2,0)])
Complete example:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27',
'2018-02-04 11:55:09']
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
timetodatetime = lambda x:dt.datetime.combine(dt.date(2018, 1, 1), x)
time = list(map(timetodatetime, data.time))
ax.plot(data.date, time, '.')
# Monthly intervals on x axis
months = mdates.MonthLocator()
d_fmt = mdates.DateFormatter('%Y-%m')
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(d_fmt)
## Hourly intervals on y axis
hours = mdates.HourLocator()
t_fmt = mdates.DateFormatter('%H')
ax.yaxis.set_major_locator(hours)
ax.yaxis.set_major_formatter(t_fmt)
ax.set_ylim([dt.datetime(2018,1,1,0), dt.datetime(2018,1,2,0)])
plt.show()
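
The combine step can be tried in isolation, without pandas or a plot. This is a small sketch; the name anchor_day is an assumption introduced here for the arbitrary date.

```python
import datetime as dt

anchor_day = dt.date(2018, 1, 1)  # arbitrary date shared by all y values

stamps = [
    dt.datetime(2018, 1, 1, 9, 28, 52),
    dt.datetime(2018, 2, 4, 11, 55, 9),
]

# Keep only the time of day and attach it to the shared anchor date.
times_on_anchor = [dt.datetime.combine(anchor_day, s.time()) for s in stamps]

# Midnight-to-midnight limits on the same anchor date.
y_limits = (
    dt.datetime.combine(anchor_day, dt.time(0)),
    dt.datetime.combine(anchor_day + dt.timedelta(days=1), dt.time(0)),
)
print(times_on_anchor, y_limits)
```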

Plotting a times series using matplotlib with 24 hours on the y-axis

If I run the following, it appears to work as expected, but the y-axis is limited to the earliest and latest times in the data. I want it to show midnight to midnight. I thought I could do that with the code that's commented out. But when I uncomment it, I get the correct y-axis, yet nothing plots. Where am I going wrong?
from datetime import datetime
import matplotlib.pyplot as plt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
x = []
y = []
for i in range(0, len(data)):
    t = datetime.strptime(data[i], '%Y-%m-%d %H:%M:%S')
    x.append(t.strftime('%Y-%m-%d')) # X-axis = date
    y.append(t.strftime('%H:%M:%S')) # Y-axis = time
plt.plot(x, y, '.')
# begin = datetime.strptime('00:00:00', '%H:%M:%S').strftime('%H:%M:%S')
# end = datetime.strptime('23:59:59', '%H:%M:%S').strftime('%H:%M:%S')
# plt.ylim(begin, end)
plt.show()
Edit: I also noticed that the x-axis isn't right either. The data skips Jan 2, but I want that on the axis so the data is to scale.
This is a dramatically simplified version of code dealing with over a year's worth of data with over 2,500 entries.
If Pandas is available to you, consider this approach:
import pandas as pd
data = pd.to_datetime(data, yearfirst=True)
plt.plot(data.date, data.time)
_=plt.ylim(["00:00:00", "23:59:59"])
Update per comments
X-axis date formatting can be adjusted using the Locator and Formatter methods of the matplotlib.dates module. Locator finds the tick positions, and Formatter specifies how you want the labels to appear.
Sometimes Matplotlib/Pandas just gets it right, other times you need to call out exactly what you want using these extra methods. In this case, I'm not sure why those numbers are showing up, but this code will remove them.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time)
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
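
A different technique from the answer above, shown here only as a sketch: matplotlib treats strings as categorical labels rather than as points on a time scale, so one way around string axes is to plot real date objects on x and the time of day as a plain number of hours on y. The helper hours_since_midnight is a name introduced for this illustration.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44',
        '2018-01-03 15:30:27', '2018-01-04 11:55:09']


def hours_since_midnight(ts):
    """Time of day as a float number of hours."""
    return ts.hour + ts.minute / 60 + ts.second / 3600


parsed = [datetime.strptime(s, '%Y-%m-%d %H:%M:%S') for s in data]
x = [p.date() for p in parsed]                 # real dates, so Jan 2 keeps its place
y = [hours_since_midnight(p) for p in parsed]  # numeric, so limits behave normally

fig, ax = plt.subplots()
ax.plot(x, y, '.')
ax.set_ylim(0, 24)              # midnight to midnight
ax.set_yticks(range(0, 25, 2))  # a tick every two hours
fig.autofmt_xdate()
```

Because both axes are numeric underneath, the gap on Jan 2 is drawn to scale and the y limits do not hide the data.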

Matplotlib pyplot - tick control and showing date

My matplotlib pyplot has too many xticks - it is currently showing each year and month for a 15-year period, e.g. "2001-01", but I only want the x-axis to show the year (e.g. 2001).
The output will be a line graph where x-axis shows dates and the y-axis shows the sale and rent prices.
# Defining the variables
ts1 = prices['Month'] # eg. "2001-01" and so on
ts2 = prices['Sale']
ts3 = prices['Rent']
# Reading '2001-01' as year and month
ts1 = [dt.datetime.strptime(d,'%Y-%m').date() for d in ts1]
plt.figure(figsize=(13, 9))
# Below is where it goes wrong. I don't know how to set xticks to show each year.
plt.xticks(ts1, rotation='vertical')
plt.xlabel('Year')
plt.ylabel('Price')
plt.plot(ts1, ts2, 'r-', ts1, ts3, 'b.-')
plt.gcf().autofmt_xdate()
plt.show()
Try removing the plt.xticks call altogether. matplotlib will then use its default AutoDateLocator to find suitable tick locations.
Alternatively if the default includes some months which you don't want then you can use matplotlib.dates.YearLocator which will force the ticks to be years only.
You can set the locator as shown below in a quick example:
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
import numpy as np
import datetime as dt
x = [dt.datetime.utcnow() + dt.timedelta(days=i) for i in range(1000)]
y = range(len(x))
plt.plot(x, y)
locator = mdate.YearLocator()
plt.gca().xaxis.set_major_locator(locator)
plt.gcf().autofmt_xdate()
plt.show()
You can do this with plt.xticks.
As an example, here I have set the xticks frequency to display every three indices. In your case, you would probably want to do so every twelve indices.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(10)
y = np.random.randn(10)
plt.plot(x,y)
plt.xticks(np.arange(min(x), max(x)+1, 3))
plt.show()
In your case, since you are using dates, you can replace the argument of plt.xticks in the second-to-last line above with something like ts1[0::12], which selects every 12th element of ts1, or np.arange(0, len(ts1), 12), which gives the indices of the ticks you want to show.
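
As a quick check of the slicing idea, here is a sketch on three years of made-up monthly strings (the months list stands in for prices['Month'] and is an assumption). It relies on the data starting in January and having no missing months.

```python
import datetime as dt

# Stand-in for prices['Month']: "2001-01" through "2003-12".
months = [f"{year}-{month:02d}"
          for year in (2001, 2002, 2003)
          for month in range(1, 13)]
ts1 = [dt.datetime.strptime(m, "%Y-%m").date() for m in months]

# Every 12th date is the January of each year.
yearly_ticks = ts1[::12]
yearly_labels = [d.strftime("%Y") for d in yearly_ticks]
print(yearly_labels)
```

These can then be passed as plt.xticks(yearly_ticks, yearly_labels) so that only the year is shown at each tick.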
