candlestick charts using Matplotlib skip weekends - python

I need some help. I am learning between matplotlib and numpy. I am simply reproducing a piece of code from "Intraday candlestick charts using Matplotlib" with my own csv file to learn from that code. The different part of my code that is different from it is the following:
import numpy as np
import matplotlib.pyplot as plt
import datetime
from mpl_finance import candlestick_ohlc
from matplotlib.dates import num2date
# data in a text file, 5 columns: time, opening, close, high, low
# note that I'm using the time you formated into an ordinal float data =
np.loadtxt("/Users/paul/Documents/python/Quant/INTC.csv", delimiter=",")
I am getting an error that says
ValueError: could not convert string to float: b'Date'.
I even try to use this line and still gives me the same error message
data = np.genfromtxt("/Users/paul/Documents/python/Quant/INTC.csv", delimiter=",", skip_header=1, usecols=[0,1,2,3,4], dtype=(dt, float,float,float, float))"
it's probably a basic concept that I am not understanding. much appriciated for some guidance.
sample data:
Date,Open,High,Low,Close,Volume,Adj Close
2017-11-06,46.599998,46.740002,46.090000,46.700001,46.700001,34035000
2017-11-07,46.700001,47.090000,46.389999,46.779999,46.779999,24461400
2017-11-08,46.619999,46.700001,46.279999,46.700001,46.700001,21565800
2017-11-09,46.049999,46.389999,45.650002,46.299999,46.299999,25570400
2017-11-10,46.040001,46.090000,45.380001,45.580002,45.580002,24095400
2017-11-13,45.259998,45.939999,45.250000,45.750000,45.750000,18999000

You can import "Date column' (string) by converting it to a datetime object. Once you have it as a datetime object you can filter out the weekends as shown below. you can plot the filtered data in matplotlib as matplotlib understands datetime objects. hope that helps.
'''
data in csv
Date,Open,High,Low,Close,Volume,Adj Close
2017-12-06,46.599998,46.740002,46.090000,46.700001,46.700001,34035000
2017-12-07,46.700001,47.090000,46.389999,46.779999,46.779999,24461400
2017-12-08,46.619999,46.700001,46.279999,46.700001,46.700001,21565800
2017-12-09,46.049999,46.389999,45.650002,46.299999,46.299999,25570400
2017-12-10,46.040001,46.090000,45.380001,45.580002,45.580002,24095400
2017-12-13,45.259998,45.939999,45.250000,45.750000,45.750000,18999000
'''
import numpy as np
from datetime import datetime
# use converter to convert a string object to datetime object. Note dtype is object for all columns
data = np.genfromtxt(r'stock.csv', delimiter = ',', names = True,
converters={0: lambda x: datetime.strptime(x, "%Y-%m-%d")}, dtype=object)
print data
'''
[ (datetime.datetime(2017, 12, 6, 0, 0), '46.599998', '46.740002', '46.090000', '46.700001', '46.700001', '34035000')
(datetime.datetime(2017, 12, 7, 0, 0), '46.700001', '47.090000', '46.389999', '46.779999', '46.779999', '24461400')
(datetime.datetime(2017, 12, 8, 0, 0), '46.619999', '46.700001', '46.279999', '46.700001', '46.700001', '21565800')
(datetime.datetime(2017, 12, 9, 0, 0), '46.049999', '46.389999', '45.650002', '46.299999', '46.299999', '25570400')
(datetime.datetime(2017, 12, 10, 0, 0), '46.040001', '46.090000', '45.380001', '45.580002', '45.580002', '24095400')
(datetime.datetime(2017, 12, 13, 0, 0), '45.259998', '45.939999', '45.250000', '45.750000', '45.750000', '18999000')]
'''
# check if a day is a weekday or not
def check_weekday_or_not(datetime_object):
if datetime_object.weekday() not in [5,6]:
# datetime.weekday() returns 5 and 6 for saturday and Sunday
return True
else:
return False
# create a function to apply on each row of the matrix
vfunc =np.vectorize(check_weekday_or_not)
filter_mask = vfunc(data['Date'])
print filter_mask
#[ True True True False False True]
# Apply the filter mask to obtain an array without weekends.
print data[filter_mask]
'''
array([ (datetime.datetime(2017, 12, 6, 0, 0), '46.599998', '46.740002', '46.090000', '46.700001', '46.700001', '34035000'),
(datetime.datetime(2017, 12, 7, 0, 0), '46.700001', '47.090000', '46.389999', '46.779999', '46.779999', '24461400'),
(datetime.datetime(2017, 12, 8, 0, 0), '46.619999', '46.700001', '46.279999', '46.700001', '46.700001', '21565800'),
(datetime.datetime(2017, 12, 13, 0, 0), '45.259998', '45.939999', '45.250000', '45.750000', '45.750000', '18999000')],
dtype=[('Date', 'O'), ('Open', 'O'), ('High', 'O'), ('Low', 'O'), ('Close', 'O'), ('Volume', 'O'), ('Adj_Close', 'O')])
'''

Related

Clipping a datatime series along the y-axis

I have a list of tuples, where each tuple is a datetime and float. I wish to clip the float values so that they are all above a threshold value. For example if I have:
a = [
(datetime.datetime(2021, 11, 1, 0, 0, tzinfo=tzutc()), 100),
(datetime.datetime(2021, 11, 1, 1, 0, tzinfo=tzutc()), 9.0),
(datetime.datetime(2021, 11, 1, 2, 0, tzinfo=tzutc()), 100.0)
]
and if I want to clip at 10.0, this would give me:
b = [
(datetime.datetime(2021, 11, 1, 0, 0, tzinfo=tzutc()), 100),
(datetime.datetime(2021, 11, 1, 0, ?, tzinfo=tzutc()), 10.0),
(datetime.datetime(2021, 11, 1, 1, ?, tzinfo=tzutc()), 10.0),
(datetime.datetime(2021, 11, 1, 2, 0, tzinfo=tzutc()), 100.0)
]
So if I were to plot the a data (before clipping), I would get a V shaped graph. However, if I clip the data at 10.0 to give me the b data, and plot, I will have a \_/ shaped graph instead. There is a bit of math involved in calculating the new times so I'm hoping there is already functionality available to do this kind of thing. The datetimes are sorted in order and are unique. I can fix the data so the difference between consecutive times is equal, should that be necessary.
Apologies for not putting a full answer yesterday, my SO account is still rate-limited.
I have made a bit more complex custom dataset to showcase several values in a row being below threshold.
import pandas as pd
from datetime import datetime
from matplotlib import pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline
df = pd.DataFrame([
(datetime(2021, 10, 31, 23, 0), 0),
(datetime(2021, 11, 1, 0, 0), 80),
(datetime(2021, 11, 1, 1, 0), 100),
(datetime(2021, 11, 1, 2, 0), 6),
(datetime(2021, 11, 1, 3, 0), 105),
(datetime(2021, 11, 1, 4, 0), 70),
(datetime(2021, 11, 1, 5, 0), 200),
(datetime(2021, 11, 1, 6, 0), 0),
(datetime(2021, 11, 1, 7, 0), 7),
(datetime(2021, 11, 1, 8, 0), 0),
(datetime(2021, 11, 1, 9, 0), 20),
(datetime(2021, 11, 1, 10, 0), 100),
(datetime(2021, 11, 1, 11, 0), 0)
], columns=['time', 'whatever'])
THRESHOLD = 10
The first thing to do here is to express index in terms of timedelta so that it behaves as any usual number we can then do all kinds of calculations with. For convenience, I am also expressing it as Series - an even better approach would be to create it as such from the get go, save the initial timestamp and reindex.
start_time = df['time'][0]
df.set_index((df['time'] - start_time).dt.total_seconds(), inplace=True)
series = df['whatever']
Then, I've tried InterpolatedUnivariateSpline from scipy:
roots = InterpolatedUnivariateSpline(df.index, series.values - THRESHOLD).roots()
threshold_crossings = pd.Series([THRESHOLD] * len(roots), index=roots)
new_series = pd.concat([series[series > THRESHOLD], threshold_crossings]).sort_index()
Let's test it out:
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(series)
ax.plot(df.index, [THRESHOLD] * len(df.index), 'k-.', label='threshold')
ax.plot(new_series)
ax.set_xlabel('$t-t_0$, s')
axins = ax.inset_axes([0.6, 0.6, 0.35, 0.3])
axins.plot(series)
axins.plot(df.index, [THRESHOLD] * len(df.index), 'k-.')
axins.plot(new_series)
axins.set_ylim(0, 20)
ax.indicate_inset_zoom(axins, edgecolor="black")
ax.set_ylabel('whatever, a.u.')
ax.legend(loc='upper left')
ax.set_title('Roots from InterpolatedUnivariateSpline')
Not so great. Spline roots interpolation is quite a bit off (after all, it uses a cubic B-spline under the hood and can't find roots if setting order to 1). Ah well. For monotonic functions, we could just inverse the interpolation, but this is not the case here. I hope someone finds a better way to do it, but my next step was rolling out a custom function:
def my_interp(series: pd.Series, thr: float) -> pd.Series:
needs_interp = series > thr
# XOR means we are only considering transition points
needs_interp = (needs_interp ^ needs_interp.shift(-1)).fillna(False)
# The last point will never be interpolated
x = series.index.to_series()
k = series.diff(periods=-1) / x.diff(periods=-1)
b = series - k * x
x_fill = ((thr - b) / k)[needs_interp]
fill_series = pd.Series(data=[thr] * x_fill.size, index=x_fill.values)
# NB! needs_interp is a wrong mask to use for series here
return pd.concat([series[series > thr], fill_series]).sort_index()
new= my_interp(series, THRESHOLD)
It achieves what you want to do with good precision:
To get back to timestamp representation, one would simply do
new_series.index = (start_time + pd.to_timedelta(new_series.index, unit='s'))
With that said, there are a couple caveats:
The function above assumes the timestamps are sorted (can be achieved
by sort_index), and no duplicates are present in the series
Edge conditions are nasty as usual. I have tested the function a little bit, the logic seems sound and it does not break if either side of the series is above/below the threshold, and it handles irregular data just fine, but still - watch out for NaNs in your data and consider how you should handle all the edge conditions, sorting etc.
There is no logic dedicated to handling data points exactly at threshold or ensuring there is any regularity in new timestamps. This could lead to bugs, too: e.g. if some portion of your code relies on having at least 2 data points every day, it might not hold after the transformation.

Hanging in View method

I've been recently learning python through a course. Everything works smoothly except when I use view method. Anybody having this problem as well?
I even used a sample code in https://pythonhosted.org/scikit-fuzzy/auto_examples/plot_tipping_problem_newapi.html#example-plot-tipping-problem-newapi-py. (link updated)
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl
quality = ctrl.Antecedent(np.arange(0, 11, 1), 'quality')
service = ctrl.Antecedent(np.arange(0, 11, 1), 'service')
tip = ctrl.Consequent(np.arange(0, 26, 1), 'tip')
quality.automf(3)
service.automf(3)
tip['low'] = fuzz.trimf(tip.universe, [0, 0, 13])
tip['medium'] = fuzz.trimf(tip.universe, [0, 13, 25])
tip['high'] = fuzz.trimf(tip.universe, [13, 25, 25])
# HERE COMES MY PROBLEM
quality['average'].view()
Whenever I get to view query part, all I get is a little square box that should show me the graph but it just keeps on loading. Any advice is greatly appreciated. Thank you!
This is the complete example running:
import matplotlib.pyplot as plt
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl
quality = ctrl.Antecedent(np.arange(0, 11, 1), 'quality')
service = ctrl.Antecedent(np.arange(0, 11, 1), 'service')
tip = ctrl.Consequent(np.arange(0, 26, 1), 'tip')
quality.automf(3)
service.automf(3)
tip['low'] = fuzz.trimf(tip.universe, [0, 0, 13])
tip['medium'] = fuzz.trimf(tip.universe, [0, 13, 25])
tip['high'] = fuzz.trimf(tip.universe, [13, 25, 25])
Here is my problem
quality['average'].view()
service.view()
tip.view()
rule1 = ctrl.Rule(quality['poor'] | service['poor'], tip['low'])
rule2 = ctrl.Rule(service['average'], tip['medium'])
rule3 = ctrl.Rule(service['good'] | quality['good'], tip['high'])
rule1.view()
tipping_ctrl = ctrl.ControlSystem([rule1, rule2, rule3])
tipping = ctrl.ControlSystemSimulation(tipping_ctrl)
Pass inputs to the ControlSystem using Antecedent labels with Pythonic API
# Note: if you like passing many inputs all at once, use .inputs(dict_of_data)
tipping.input['quality'] = 6.5
tipping.input['service'] = 9.8
# Crunch the numbers
tipping.compute()
print (tip)
tip.view(sim=tipping)
plt.show()
Since skfuzzy uses matplotlib and NetworkX underhood, you can try this code to show your figure:
import matplotlib.pyplot as plt
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl
quality = ctrl.Antecedent(np.arange(0, 11, 1), 'quality')
service = ctrl.Antecedent(np.arange(0, 11, 1), 'service')
tip = ctrl.Consequent(np.arange(0, 26, 1), 'tip')
quality.automf(3)
service.automf(3)
tip['low'] = fuzz.trimf(tip.universe, [0, 0, 13])
tip['medium'] = fuzz.trimf(tip.universe, [0, 13, 25])
tip['high'] = fuzz.trimf(tip.universe, [13, 25, 25])
# HERE COMES MY PROBLEM
quality['average'].view()
plt.show()

Python Referencing data by interpolation

I have two datasets, One of which has time array in datetime.datetime form, and x,y,z coordinates array of that time, like time[0]=datetime.datetime(2000,1,21,0,7,25), x[0]=-6.7, etc.
I'd like to calculate something from the coordinates, but that needs another parameter (Ma) which depend on time. Second data set has another time array in same datetime form, and the parameter recorded at that time, like time[0]=datetime.datetime(2000,1,1,0,3), Ma[0]=2.73
The problem is that the time array of two data set is different (though the ranges are similar)
So I want to interpolate the parameter's value at each time of data set 1, like Ma[0], but 0 is not index of time of dataset2, but corresponds to index of dataset 1.
How can I do that?
PS. Can I convert the time form to simpler one? datetime.datetime seems quite cumbersome.
The following is an example of how to interpolate your values. The coord_ and ma_ arrays will be your imported data.
The first thing the script does is build some more sensible data structures from your disparate 1 dimensional arrays. The part that you're actually looking for is the call to np.interp, documented here.
import numpy as np
import datetime
import time
# Numpy cannot interpolate between datetimes
# This function converts a datetime to a timestamp
def to_ts(dt):
return time.mktime(dt.timetuple())
coord_dts = np.array([
datetime.datetime(2000, 1, 1, 12),
datetime.datetime(2000, 1, 2, 12),
datetime.datetime(2000, 1, 3, 12),
datetime.datetime(2000, 1, 4, 12)
])
coord_xs = np.array([3, 5, 8, 13])
coord_ys = np.array([2, 3, 5, 7])
coord_zs = np.array([1, 3, 6, 10])
ma_dts = np.array([
datetime.datetime(2000, 1, 1),
datetime.datetime(2000, 1, 2),
datetime.datetime(2000, 1, 3),
datetime.datetime(2000, 1, 4)
])
ma_vals = np.array([1, 2, 3, 4])
# Handling the data as separate arrays will be painful.
# This builds an array of dictionaries with the form:
# [ { 'time': timestamp, 'x': x coordinate, 'y': y coordinate, 'z': z coordinate }, ... ]
coords = np.array([
{ 'time': to_ts(coord_dts[idx]), 'x': coord_xs[idx], 'y': coord_ys[idx], 'z': coord_zs[idx] }
for idx, _ in enumerate(coord_dts)
])
# Build array of timestamps from ma datetimes
ma_ts = [ to_ts(dt) for dt in ma_dts ]
for coord in coords:
print("ma interpolated value", np.interp(coord['time'], ma_ts, ma_vals))
print("at coordinates:", coord['x'], coord['y'], coord['z'])

Interpolation of datetimes for smooth matplotlib plot in python

I have lists of datetimes and values like this:
import datetime
x = [datetime.datetime(2016, 9, 26, 0, 0), datetime.datetime(2016, 9, 27, 0, 0),
datetime.datetime(2016, 9, 28, 0, 0), datetime.datetime(2016, 9, 29, 0, 0),
datetime.datetime(2016, 9, 30, 0, 0), datetime.datetime(2016, 10, 1, 0, 0)]
y = [26060, 23243, 22834, 22541, 22441, 23248]
And can plot them like this:
import matplotlib.pyplot as plt
plt.plot(x, y)
I would like to be able to plot a smooth version using more x-points. So first I do this:
delta_t = max(x) - min(x)
N_points = 300
xnew = [min(x) + i*delta_t/N_points for i in range(N_points)]
Then attempting a spline fit with scipy:
from scipy.interpolate import spline
ynew = spline(x, y, xnew)
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
What is the best way to proceed? I am open to solutions involving other libraries such as pandas or plotly.
You're trying to pass a list of datetimes to the spline function, which are Python objects (hence dtype('O')). You need to convert the datetimes to a numeric format first, and then convert them back if you wish:
int_x = [i.total_seconds() for i in x]
ynew = spline(int_x, y, xnew)
Edit: total_seconds() is actually a timedelta method, not for datetimes. However it looks like you sorted it out so I'll leave this answer as is.
Figured something out:
x_ts = [x_.timestamp() for x_ in x]
xnew_ts = [x_.timestamp() for x_ in xnew]
ynew = spline(x_ts, y, xnew_ts)
plt.plot(xnew, ynew)
This works very nicely, but I'm still open to ideas for simpler methods.

Numpy structured array with datetime

I tried to build a structured array with a datetime coloumn
import numpy as np
na_trades = np.zeros(2, dtype = 'datetime64,i4')
na_trades[0] = (np.datetime64('1970-01-01 00:00:00'),0)
TypeError: Cannot cast NumPy timedelta64 scalar from metadata [s] to according to the rule 'same_kind'
Is there a way to fix this?
You have to specify that the datetime64 is in seconds when you create the array because the one you parse and try to assign is a datetime64[s]:
na_trades = np.zeros(2, dtype='datetime64[s],i4')
na_trades[0] = (np.datetime64('1971-01-01 00:00:00'), 0)
The error you get means that the datetime64 object that you specified is not same_kind as the one you try to assing. You try to assign a seconds resolution one, and you created a different one when you constructed the array (by default I think it's nanoseconds).
Try following:
>>> na_trades = np.zeros(2, dtype=[('dt', 'datetime64[s]'), ('vol', 'i4')])
>>> na_trades
array([(datetime.datetime(1970, 1, 1, 0, 0), 0),
(datetime.datetime(1970, 1, 1, 0, 0), 0)],
dtype=[('dt', ('<M8[s]', {})), ('vol', '<i4')])
>>> na_trades[0] = (np.datetime64('1970-01-02 00:00:00'),1)
>>> na_trades
array([(datetime.datetime(4707, 11, 29, 0, 0), 1),
(datetime.datetime(1970, 1, 1, 0, 0), 0)],
dtype=[('dt', ('<M8[s]', {})), ('vol', '<i4')])

Categories

Resources