Using the following example I was able to create a candle stick graph using matplotlib.
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, WeekdayLocator,\
DayLocator, MONDAY
from matplotlib.finance import quotes_historical_yahoo_ohlc, candlestick_ohlc
# (Year, month, day) tuples suffice as args for quotes_historical_yahoo
date1 = (2004, 2, 1)
date2 = (2004, 4, 12)
mondays = WeekdayLocator(MONDAY) # major ticks on the mondays
alldays = DayLocator() # minor ticks on the days
weekFormatter = DateFormatter('%b %d') # e.g., Jan 12
dayFormatter = DateFormatter('%d') # e.g., 12
quotes = quotes_historical_yahoo_ohlc('INTC', date1, date2)
if len(quotes) == 0:
    raise SystemExit
fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
#ax.xaxis.set_minor_formatter(dayFormatter)
#plot_day_summary(ax, quotes, ticksize=3)
candlestick_ohlc(ax, quotes, width=0.6)
ax.xaxis_date()
ax.autoscale_view()
plt.setp(plt.gca().get_xticklabels(), rotation=45, horizontalalignment='right')
plt.show()
According to plot.ly's API documentation, you can publish matplotlib figures online using plot.ly. I added the following to the above code:
import matplotlib.mlab as mlab
import plotly.plotly as py
py.sign_in('xxx', 'xxxx')
plot_url = py.plot_mpl(fig)
plot.ly produced the following graph. If you zoom in, you can see that the graph does not actually show the body of each candlestick, just the upper and lower shadows. Did I import the graph incorrectly, or does plot.ly not support candlestick graphs even when they are generated through matplotlib?
I found out that, as of 5/27/15, plot.ly does not support candlestick charts.
I actually made a request to the plot.ly devs to add candlestick chart types to the stock plot.ly package. I am very surprised it has yet to be included as a default "type".
For this use case, I was able to build my own OHLC candles by hacking together a pandas DataFrame with a resampled time index using 'ohlc' as the param. Only caveat is you will need a full history of all the asset's trades with timestamps for the DataFrame index to correctly build this type of chart:
newOHLC_dataframe = old_dataframe['Price'].astype(float).resample('15min', how='ohlc')
where the Price key is all your y values. This .resample() will build the candles and figure out max/min/first/last for the given time period all on its own. In my example above, it will give you a new DataFrame consisting of 15min candles. The .astype(float) may or may not be necessary depending on your underlying data.
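In recent pandas releases `.resample()` no longer accepts the `how=` argument, so the same idea is written by chaining `.ohlc()` onto the resampler. Here is a minimal sketch of that, assuming a DataFrame of trades indexed by timestamp; the `trades` frame and its prices are invented for illustration:

```python
import pandas as pd

# Hypothetical trade ticks: one price per timestamp (values invented for illustration).
trades = pd.DataFrame(
    {"Price": ["10.0", "10.5", "9.8", "10.2", "10.4", "10.1"]},
    index=pd.to_datetime([
        "2015-05-27 09:00", "2015-05-27 09:05", "2015-05-27 09:14",
        "2015-05-27 09:15", "2015-05-27 09:20", "2015-05-27 09:29",
    ]),
)

# Resample into 15-minute bins, then aggregate each bin to open/high/low/close.
candles = trades["Price"].astype(float).resample("15min").ohlc()
print(candles)
```

The result has one row per 15-minute bin with `open`, `high`, `low` and `close` columns, which is the shape most candlestick plotting helpers expect.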
I am new to Python and learning data visualization using matplotlib.
I am trying to plot Date/Time vs Values using matplotlib from this CSV file:
https://drive.google.com/file/d/1ex2sElpsXhxfKXA4ZbFk30aBrmb6-Y3I/view?usp=sharing
Following is the code snippet which I have been playing around with:
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
plt.style.use('seaborn')
years = mdates.YearLocator()
months = mdates.MonthLocator()
days = mdates.DayLocator()
hours = mdates.HourLocator()
minutes = mdates.MinuteLocator()
years_fmt = mdates.DateFormatter('%H:%M')
data = pd.read_csv('datafile.csv')
data.sort_values('Date/Time', inplace=True)
fig, ax = plt.subplots()
ax.plot('Date/Time', 'Discharge', data=data)
# format the ticks
ax.xaxis.set_major_locator(minutes)
ax.xaxis.set_major_formatter(years_fmt)
ax.xaxis.set_minor_locator(hours)
datemin = min(data['Date/Time'])
datemax = max(data['Date/Time'])
ax.set_xlim(datemin, datemax)
ax.format_xdata = mdates.DateFormatter('%Y.%m.%d %H:%M')
ax.format_ydata = lambda x: '%1.2f' % x # format the price.
ax.grid(True)
fig.autofmt_xdate()
plt.show()
The code is plotting the graph but it is not labeling the X-Axis and also giving some unknown values (on mouse over) for x on the bottom right corner as shown in the below screenshot:
Screenshot of matplotlib figure window
Can someone please suggest what changes are needed to plot the x-axis dates and also make the correct values appear when I move the cursor over the graph?
Thanks
I haven't used matplotlib. Instead I used pandas plotting
import pandas as pd
data = pd.read_csv('datafile.csv')
data["Date/Time"] = pd.to_datetime(data["Date/Time"], format="%d.%m.%Y %H:%M")
data.sort_values('Date/Time', inplace=True)  # sort after parsing so the order is chronological
ax = data.plot.line(x='Date/Time', y='Discharge')
Here, you need to convert the Date/Time to pandas datetime type.
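The order of conversion and sorting matters for day-first strings like these, because sorting the raw text compares characters rather than dates. A small sketch of the difference, using made-up rows in the same `%d.%m.%Y %H:%M` layout (the values are not from the original CSV):

```python
import pandas as pd

# Hypothetical rows in the day.month.year layout used by the CSV (values invented).
data = pd.DataFrame({
    "Date/Time": ["02.01.2020 00:00", "01.02.2020 00:00", "03.01.2020 12:30"],
    "Discharge": [1.0, 2.0, 3.0],
})

# Sorting the raw strings compares characters, so 01.02 (1 Feb) lands before 02.01 (2 Jan).
as_text = data.sort_values("Date/Time")["Date/Time"].tolist()

# Parse with an explicit format first, then sort chronologically.
data["Date/Time"] = pd.to_datetime(data["Date/Time"], format="%d.%m.%Y %H:%M")
data = data.sort_values("Date/Time").reset_index(drop=True)
print(data)
```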
The main issue is that the date formats are mixed up: your data uses '%d.%m.%Y %H:%M', but you set '%Y.%m.%d %H:%M', which is why you saw 'rubbish' values in the x tick labels. In any case, the code can be shortened considerably if you convert your Date/Time column to timestamps, i.e.:
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
plt.style.use('seaborn')
data = pd.read_csv('datafile.csv')
data["Date/Time"] = pd.to_datetime(data["Date/Time"], format="%d.%m.%Y %H:%M")
data.sort_values('Date/Time', inplace=True)
fig, ax = plt.subplots()
ax.plot('Date/Time', 'Discharge', data=data)
ax.format_xdata = mdates.DateFormatter('%Y.%m.%d %H:%M')
ax.tick_params(axis='x', rotation=45)
ax.grid(True)
fig.autofmt_xdate()
plt.show()
Note that the format of labels in the plot will depend on the zoom level, so you will need to enlarge a portion of the graph to see hours and minutes in the tick labels, but the cursor locator on the bottom bar of the window should be always displaying the detailed timestamp under the cursor.
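If you would rather have tick labels that adapt to the zoom level while the status-bar readout stays fully detailed, one option is to pair `AutoDateLocator` with `ConciseDateFormatter` and keep a fixed `DateFormatter` for `format_xdata`. This is only a sketch on invented 15-minute data; the `Agg` backend line is there so it runs without a display and can be dropped for interactive use:

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical readings every 15 minutes (values invented for illustration).
times = [datetime(2020, 3, 1) + timedelta(minutes=15 * i) for i in range(48)]
values = [i % 7 for i in range(48)]

fig, ax = plt.subplots()
ax.plot(times, values)

# Tick labels: the locator picks the spacing, the concise formatter picks the detail.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

# Status-bar readout: always show the full timestamp under the cursor.
ax.format_xdata = mdates.DateFormatter("%Y.%m.%d %H:%M")

# The same formatter can be called directly on a matplotlib date number.
readout = ax.format_xdata(mdates.date2num(datetime(2020, 3, 1, 6, 45)))
print(readout)
```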
I wonder if it's possible to change the measurement milestones for graphs created by pandas. In my code the X-axis stands for time and is measured by month, but the measurement milestones are all over the place.
In the image below, the milestones for the X-axis are 2012M01, 2012M06, 2012M11, 2013M04 and 2013M09.
Is there any way I can choose how long the distance should be between every milestone? For example, to make it so it shows every year or every half year?
This is the code I used for the function making the graph:
def graph(dataframe):
    graph = dataframe[["Profit"]].plot()
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    plt.grid(True)
    plt.show()
The actual dataframe is just an excel-file with a bunch of months and monetary values in it.
I think the most straightforward approach is to use matplotlib.dates to format the axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def graph(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m') #see https://strftime.org/
    major = mdates.MonthLocator([1,7]) #label only Jan and Jul
    graph = dataframe[["Profit"]].plot(ax=ax) #link plot to the existing axes
    graph.set_title('Statistics')
    graph.set_ylabel('Thousand $')
    graph.set_xlabel('Time')
    graph.xaxis.set_major_locator(major) #set major locator tick on x-axis
    graph.xaxis.set_major_formatter(xfmt) #format xtick label
    plt.grid(True)
    plt.show()
But a key point is you need to have your dates as Python's built-in datetime.date (not datetime.datetime); thanks to this answer. If your dates are str or a different type of datetime, you will need to convert, but there are many resources on SO and elsewhere for doing this like this or this:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
dr = [pd.to_datetime(date).date() for date in dr] #explicitly converting to datetime with .date()
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
type(df.index[0])
Out[0]:
datetime.date
Calling graph(df) using the example above gets this plot:
Just to expand on this, here's what happens when the index is pandas.Timestamp instead of datetime.date:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in dr] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
graph(df)
Out[0]:
The x-axis is improperly formatted:
However, if you are willing to just create the plot directly through matplotlib, rather than pandas (pandas is using matplotlib anyway), this can handle more types of dates:
In[0]:
dr = pd.date_range('01-01-2012', '01-01-2014', freq='1MS')
# dr = [pd.to_datetime(date).date() for date in dr] #skipping date conversion
df = pd.DataFrame(index=dr, data={'Profit':np.random.rand(25)})
def graph_2(dataframe):
    fig, ax = plt.subplots()
    xfmt = mdates.DateFormatter('%YM%m')
    major = mdates.MonthLocator([1,7])
    ax.plot(dataframe.index,dataframe['Profit'], label='Profit')
    ax.set_title('Statistics')
    ax.set_ylabel('Thousand $')
    ax.set_xlabel('Time')
    ax.xaxis.set_major_locator(major)
    ax.xaxis.set_major_formatter(xfmt)
    ax.legend() #legend needs to be added
    plt.grid(True)
    plt.show()
graph_2(df)
type(df.index[0])
Out[0]:
pandas._libs.tslibs.timestamps.Timestamp
And here is the working graph:
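A third possibility, if you want to keep calling `.plot()` on the DataFrame with a `Timestamp` index, is pandas' `x_compat=True` plotting option, which is documented as suppressing pandas' own tick handling for time series so that `matplotlib.dates` locators apply. The sketch below assumes that behaviour and uses invented `Profit` values:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Monthly Timestamp index with invented profit values.
dr = pd.date_range("2012-01-01", "2014-01-01", freq="MS")
df = pd.DataFrame(index=dr, data={"Profit": np.linspace(1.0, 2.0, len(dr))})

fig, ax = plt.subplots()
# x_compat=True asks pandas to skip its own period-based tick handling.
df[["Profit"]].plot(ax=ax, x_compat=True)

xfmt = mdates.DateFormatter("%YM%m")
ax.xaxis.set_major_locator(mdates.MonthLocator([1, 7]))  # ticks in Jan and Jul only
ax.xaxis.set_major_formatter(xfmt)

# Inspect the labels the locator and formatter would produce.
labels = [xfmt(loc) for loc in ax.xaxis.get_majorticklocs()]
print(labels)
```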
I am trying to manually create a candlestick chart with matplotlib using errorbar for the daily High and Low prices and Rectangle() for the Adjusted Close and Open prices. This question seemed to have all the prerequisites for accomplishing this.
I attempted to use the above very faithfully, but the issue of plotting something over an x-axis of datetime64[ns]'s gave me no end of errors, so I've additionally tried to incorporate the advice here on plotting over datetime.
This is my code so far, with apologies for the messiness:
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.collections import PatchCollection
from matplotlib.patches import Rectangle
def makeCandles(xdata,high,low,adj_close,adj_open,fc='r',ec='None',alpha=0.5):
    ## Converting datetimes to numerical format matplotlib can understand.
    dates = mdates.date2num(xdata)
    ## Creating default objects
    fig,ax = plt.subplots(1)
    ## Creating errorbar peaks based on high and low prices
    avg = (high + low) / 2
    err = [high - avg,low - avg]
    ax.errorbar(dates,err,fmt='None',ecolor='k')
    ## Create list for all the error patches
    errorboxes = []
    ## Loop over data points; create "body" of candlestick
    ## based on adjusted open and close prices
    errors=np.vstack((adj_close,adj_open))
    errors=errors.T
    for xc,yc,ye in zip(dates,avg,errors):
        rect = Rectangle((xc,yc-ye[0]),1,ye.sum())
        errorboxes.append(rect)
    ## Create patch collection with specified colour/alpha
    pc = PatchCollection(errorboxes,facecolor=fc,alpha=alpha,edgecolor=ec)
    ## Add collection to axes
    ax.add_collection(pc)
    plt.show()
With my data looking like
This is what I try to run, first getting a price table from quandl,
import quandl as qd
api = '1uRGReHyAEgwYbzkPyG3'
qd.ApiConfig.api_key = api
data = qd.get_table('WIKI/PRICES', qopts = { 'columns': ['ticker', 'date', 'high','low','adj_open','adj_close'] }, \
ticker = ['AMZN', 'XOM'], date = { 'gte': '2014-01-01', 'lte': '2016-12-31' })
data.reset_index(inplace=True,drop=True)
makeCandles(data['date'],data['high'],data['low'],data['adj_open'],data['adj_close'])
The code runs with no errors, but outputs an empty graph. So what I am asking for is advice on how to plot these rectangles over the datetime dates. For the width of the rectangles, I simply put a uniform "1" because I am not aware of a simple way to specify the datetime width of a rectangle.
Edit
This is the plot I am currently getting, having transformed my xdata into matplotlib mdates:
Before I transformed xdata via mdates, with just xdata as my x-axis everywhere, this was one of the errors I kept getting:
To get the plot you want, there are a couple of things that need to be considered. First, you're retrieving two stocks, AMZN and XOM; displaying both will make the chart look funny, because their prices are quite far apart. Second, candlestick charts in which you plot each day for several years will get very crowded. Finally, you need to format your ordinal dates back into dates on the x-axis.
As mentioned in the comments, you can use the pre-built matplotlib candlestick2_ohlc function (although deprecated) accessible through mpl_finance, install as shown in this answer. I opted for using solely the matplotlib barchart with built-in errorbars.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import quandl as qd
from matplotlib.dates import DateFormatter, WeekdayLocator, \
DayLocator, MONDAY
# get data
api = '1uRGReHyAEgwYbzkPyG3'
qd.ApiConfig.api_key = api
data = qd.get_table('WIKI/PRICES', qopts={'columns': ['ticker', 'date', 'high', 'low', 'open', 'close']},
ticker=['AMZN', 'XOM'], date={'gte': '2014-01-01', 'lte': '2014-03-10'})
data.reset_index(inplace=True, drop=True)
fig, ax = plt.subplots(figsize = (10, 5))
data['date'] = mdates.date2num(data['date'].dt.to_pydatetime()) #convert dates to ordinal
tickers = list(set(data['ticker'])) # unique list of stock names
stock_ind = 'AMZN' # select one; looping over all tickers works, but it will look funny
df = data[data['ticker'] == stock_ind]
inc = df.close > df.open
dec = df.open > df.close
# rising days: the bar runs from close (bottom) to open, so its end sits at the open price
ax.bar(df['date'][inc],
       df['open'][inc]-df['close'][inc],
       color='palegreen',
       bottom=df['close'][inc],
       # yerr is measured from the end of the bar and must be non-negative:
       # [distance down to the low, distance up to the high]
       # (https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.errorbar)
       yerr = [df['open'][inc]-df['low'][inc], df['high'][inc]-df['open'][inc]],
       error_kw=dict(ecolor='gray', lw=1))
# falling days: the bar runs from open (bottom) to close, so its end sits at the close price
ax.bar(df['date'][dec],
       df['close'][dec]-df['open'][dec],
       color='salmon', bottom=df['open'][dec],
       yerr = [df['close'][dec]-df['low'][dec], df['high'][dec]-df['close'][dec]],
       error_kw=dict(ecolor='gray', lw=1))
ax.set_title(stock_ind)
#some tweaking, setting the dates
mondays = WeekdayLocator(MONDAY) # major ticks on the mondays
alldays = DayLocator() # minor ticks on the days
weekFormatter = DateFormatter('%b %d') # e.g., Jan 12
dayFormatter = DateFormatter('%d') # e.g., 12
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
ax.set_ylabel('monies ($)')
plt.show()
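For readers without access to the quandl data, the same idea can be tried on a few hand-written rows. The sketch below is not the code above but a simplified variant: it draws each body with `ax.bar` and each wick with `ax.vlines`, and it shows that a bar width on a date axis is expressed in days. The OHLC numbers are invented for illustration:

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical OHLC rows (date, open, high, low, close); values invented.
rows = [
    (datetime(2014, 1, 6), 10.0, 11.0, 9.5, 10.8),
    (datetime(2014, 1, 7), 10.8, 11.2, 10.1, 10.3),
    (datetime(2014, 1, 8), 10.3, 10.9, 10.0, 10.7),
]

fig, ax = plt.subplots()
width = 0.6  # matplotlib date units are days, so 0.6 means 60% of one day
for day, open_, high, low, close in rows:
    x = mdates.date2num(day)
    color = "palegreen" if close >= open_ else "salmon"
    ax.vlines(x, low, high, color="gray", linewidth=1)  # wick from low to high
    ax.bar(x, abs(close - open_), width=width,
           bottom=min(open_, close), color=color)  # body between open and close

ax.xaxis_date()  # interpret x values as dates
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
fig.autofmt_xdate()
```

Because the x values are day counts, a `Rectangle` on the same axes would take its width in days as well, which answers the "datetime width" part of the question.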
If I run the following, it appears to work as expected, but the y-axis is limited to the earliest and latest times in the data. I want it to show midnight to midnight. I thought I could do that with the code that's commented out. But when I uncomment it, I get the correct y-axis, yet nothing plots. Where am I going wrong?
from datetime import datetime
import matplotlib.pyplot as plt
data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44', '2018-01-03 15:30:27', '2018-01-04 11:55:09']
x = []
y = []
for i in range(0, len(data)):
    t = datetime.strptime(data[i], '%Y-%m-%d %H:%M:%S')
    x.append(t.strftime('%Y-%m-%d')) # X-axis = date
    y.append(t.strftime('%H:%M:%S')) # Y-axis = time
plt.plot(x, y, '.')
# begin = datetime.strptime('00:00:00', '%H:%M:%S').strftime('%H:%M:%S')
# end = datetime.strptime('23:59:59', '%H:%M:%S').strftime('%H:%M:%S')
# plt.ylim(begin, end)
plt.show()
Edit: I also noticed that the x-axis isn't right either. The data skips Jan 2, but I want that on the axis so the data is to scale.
This is a dramatically simplified version of code dealing with over a year's worth of data with over 2,500 entries.
If Pandas is available to you, consider this approach:
import pandas as pd
data = pd.to_datetime(data, yearfirst=True)
plt.plot(data.date, data.time)
_=plt.ylim(["00:00:00", "23:59:59"])
Update per comments
X-axis date formatting can be adjusted using the Locator and Formatter methods of the matplotlib.dates module. Locator finds the tick positions, and Formatter specifies how you want the labels to appear.
Sometimes Matplotlib/Pandas just gets it right; other times you need to call out exactly what you want using these extra methods. In this case, I'm not sure why those numbers are showing up, but this code will remove them.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
f, ax = plt.subplots()
data = pd.to_datetime(data, yearfirst=True)
ax.plot(data.date, data.time)
ax.set_ylim(["00:00:00", "23:59:59"])
days = mdates.DayLocator()
d_fmt = mdates.DateFormatter('%m-%d')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(d_fmt)
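A pandas-free alternative is sketched below. Matplotlib treats strings as categories, which is why string limits did not behave like a time range in the question. Here the date is kept as a real `datetime` on the x-axis and the time of day is plotted as fractional hours, so `set_ylim(0, 24)` gives midnight to midnight. The 3-hour tick spacing and the label format are my own choices, not something from the question:

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator

data = ['2018-01-01 09:28:52', '2018-01-03 13:02:44',
        '2018-01-03 15:30:27', '2018-01-04 11:55:09']
stamps = [datetime.strptime(s, '%Y-%m-%d %H:%M:%S') for s in data]

# X: calendar day as a real datetime, so the gap on Jan 2 is kept to scale.
days = [s.replace(hour=0, minute=0, second=0) for s in stamps]
# Y: time of day as fractional hours, a plain number that ylim can bound.
hours = [s.hour + s.minute / 60 + s.second / 3600 for s in stamps]

fig, ax = plt.subplots()
ax.plot(days, hours, '.')
ax.set_ylim(0, 24)  # midnight to midnight
ax.yaxis.set_major_locator(MultipleLocator(3))  # a tick every 3 hours
ax.yaxis.set_major_formatter(FuncFormatter(lambda h, pos: f"{int(h):02d}:00"))
ax.xaxis.set_major_locator(mdates.DayLocator())  # one tick per day, including Jan 2
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
```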
I have the following persistent problem:
The following code should draw a straight line:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
d = pd.date_range(start="1/1/2012", end="2/1/2012", freq="B")
v = np.linspace(1,10,len(d))
plt.plot_date(d,v,"-")
But all I get is a jagged line because "plot_date" somehow pads the dates in "d" with the weekends.
Is there a way to force matplotlib to take my dates (only business days) as is without filling in the weekend dates?
>>>d
DatetimeIndex(['2012-01-02', '2012-01-03', '2012-01-04', '2012-01-05',
'2012-01-06', '2012-01-09', '2012-01-10', '2012-01-11',
'2012-01-12', '2012-01-13', '2012-01-16', '2012-01-17',
'2012-01-18', '2012-01-19', '2012-01-20', '2012-01-23',
'2012-01-24', '2012-01-25', '2012-01-26', '2012-01-27',
'2012-01-30', '2012-01-31', '2012-02-01'],
dtype='datetime64[ns]', freq='B')
plot_date does a trick: it converts dates to a number of days since a fixed epoch (0001-01-01 in older Matplotlib releases, 1970-01-01 by default since Matplotlib 3.3), plots those numbers, and then converts the ticks back to dates in order to draw nice tick labels. So with plot_date every calendar day counts as 1, business day or not.
You can plot your data against a uniform range of numbers but if you want dates as tick labels you need to do it yourself.
d = pd.date_range(start="1/1/2012", end="2/1/2012", freq="B")
v = np.linspace(1,10,len(d))
plt.plot(range(d.size), v)
xticks = plt.xticks()[0]
xticklabels = [(d[0] + int(x) * d.freq).strftime('%Y-%m-%d') for x in xticks]  # step in business days from the first date
plt.xticks(xticks, xticklabels)
plt.autoscale(True, axis='x', tight=True)
But be aware that the labels can be misleading. The segment between 2012-01-02 and 2012-01-09 represents five days, not seven.
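If the plot will be zoomed or panned, fixed tick labels go stale, so another way to do the same thing is a `FuncFormatter` that looks up the date for whatever integer position the tick lands on. This is a sketch of that variant; `index_to_date` is a helper name I made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

d = pd.date_range(start="1/1/2012", end="2/1/2012", freq="B")
v = np.linspace(1, 10, len(d))

def index_to_date(x, pos=None):
    """Label integer position x with the matching business day, or nothing."""
    i = int(round(x))
    return d[i].strftime('%Y-%m-%d') if 0 <= i < len(d) else ''

fig, ax = plt.subplots()
ax.plot(np.arange(len(d)), v)  # evenly spaced positions, so the line is straight
ax.xaxis.set_major_formatter(FuncFormatter(index_to_date))
fig.autofmt_xdate()
```

Ticks that fall outside the data range get an empty label instead of an extrapolated date.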