Getting RSI in python

I've been trying to calculate the 14-day RSI of stocks, and I managed to get it to work, somewhat, but it gives me inaccurate numbers:
import pandas as pd
import datetime as dt
import pandas_datareader as web
ticker = 'TSLA'
start = dt.datetime(2018, 1, 1)
end = dt.datetime.now()
data = web.DataReader(ticker, 'yahoo', start, end)
delta = data['Adj Close'].diff(1)
delta.dropna(inplace=True)
positive = delta.copy()
negative = delta.copy()
positive[positive < 0] = 0
negative[negative > 0] = 0
days = 14
average_gain = positive.rolling(window=days).mean()
average_loss = abs(negative.rolling(window=days).mean())
relative_strenght = average_gain / average_loss
rsi = 100.0 - (100.0 / (1.0 + relative_strenght))
print(ticker + str(rsi))
It ends up giving me 77.991564 (14-day RSI) when I should be getting 70.13 (14-day RSI). Does anyone know what I'm doing wrong?
Also, yes, I've read Calculating RSI in Python, but it doesn't help me with what I need.

Here is one way to calculate RSI yourself. The code could be optimized, but I prefer to keep it easy to understand and let you optimize it.
For the example, we assume you have a DataFrame called df with a column called 'Close' for the close prices. By the way, note that if you compare your RSI results with a trading station, you should be sure you are comparing the same values: if the station uses the bid close and you calculate on the mid or the ask, the result will not be the same.
Let's see the code:
import matplotlib.pyplot as plt
import seaborn as sns

def rsi(df, _window=14, _plot=0, _start=None, _end=None):
    """[RSI function]
    Args:
        df ([DataFrame]): [DataFrame with a column 'Close' for the close price]
        _window ([int]): [The lookback window.] (default: {14})
        _plot ([int]): [1 if you want to see the plot] (default: {0})
        _start ([Date]): [if _plot=1, start of plot] (default: {None})
        _end ([Date]): [if _plot=1, end of plot] (default: {None})
    """
    ##### 'Diff' is the difference between the previous close and the current one
    df['Diff'] = df['Close'].transform(lambda x: x.diff())
    ##### In 'Up', keep only the positive values
    df['Up'] = df['Diff']
    df.loc[(df['Up'] < 0), 'Up'] = 0
    ##### In 'Down', keep only the negative values (as absolute values)
    df['Down'] = df['Diff']
    df.loc[(df['Down'] > 0), 'Down'] = 0
    df['Down'] = abs(df['Down'])
    ##### Moving average on Up & Down
    df['avg_up' + str(_window)] = df['Up'].transform(lambda x: x.rolling(window=_window).mean())
    df['avg_down' + str(_window)] = df['Down'].transform(lambda x: x.rolling(window=_window).mean())
    ##### RS is the ratio of the means of Up & Down
    df['RS_' + str(_window)] = df['avg_up' + str(_window)] / df['avg_down' + str(_window)]
    ##### RSI calculation: 100 - (100 / (1 + RS))
    df['RSI_' + str(_window)] = 100 - (100 / (1 + df['RS_' + str(_window)]))
    ##### Drop the intermediate columns
    df = df.drop(['Diff', 'Up', 'Down', 'avg_up' + str(_window),
                  'avg_down' + str(_window), 'RS_' + str(_window)], axis=1)
    ##### If asked, plot it!
    if _plot == 1:
        sns.set()
        fig = plt.figure(facecolor='white', figsize=(30, 5))
        ax0 = plt.subplot2grid((6, 4), (1, 0), rowspan=4, colspan=4)
        ax0.plot(df[(df.index <= _end) & (df.index >= _start)]['Close'])
        ax0.set_facecolor('ghostwhite')
        ax0.legend(['Close'], ncol=3, loc='upper left', fontsize=15)
        plt.title("Close from " + str(_start) + ' to ' + str(_end), fontsize=20)
        ax1 = plt.subplot2grid((6, 4), (5, 0), rowspan=1, colspan=4, sharex=ax0)
        ax1.plot(df[(df.index <= _end) & (df.index >= _start)]['RSI_' + str(_window)], color='blue')
        ax1.legend(['RSI_' + str(_window)], ncol=3, loc='upper left', fontsize=12)
        ax1.set_facecolor('silver')
        plt.subplots_adjust(left=.09, bottom=.09, right=1, top=.95, wspace=.20, hspace=0)
        plt.show()
    return df
To call the function with the default values, you just have to type
df = rsi(df)
or pass _window and/or _plot as arguments to change them.
Note that if you pass _plot=1, you also need to provide the start and end of the plot (_start and _end), as strings or datetimes.
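A note on the mismatch in the original question: many charting platforms compute RSI with Wilder's smoothing (an exponential average with alpha = 1/period) rather than a plain rolling mean, which is the usual reason the numbers come out differently. A minimal sketch of that variant, assuming a pandas Series of close prices (the name rsi_wilder is just for illustration):
import pandas as pd

def rsi_wilder(close, period=14):
    # RSI using Wilder's smoothing: an EMA with alpha = 1/period instead of a simple rolling mean
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

For example, data['RSI_14'] = rsi_wilder(data['Adj Close'], 14) would add it as a column to the question's DataFrame.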

Related

How to increase the time step in charts

I have a function that reads a CSV and outputs 12 graphs, but it displays the time ticks with a very small interval.
Here's the function!
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.widgets import MultiCursor

def Gr():
    df = pd.read_csv('DataSet.csv')
    '''start = df['Time'].iloc[0]
    start = str(start)
    start1 = start.replace(':', '-')
    end = df['Time'].iloc[-1]
    end = str(end)
    end1 = end.replace(':', '-')
    index = pd.date_range(start=start1, end=end1, freq="S")
    index = [pd.to_datetime(date, format='%H:%M:%S').date() for date in index] '''
    names = ['P', 'Filter', 'Answers', 'step', 'step2', 'Comulative', 'Delta_ema',
             'ComulativePOC', 'Delta_P', 'Sum', 'SpeedUp', 'M']
    features = df[names]
    features.index = df['Time']
    axs = features.plot(subplots=True)
    cursor = MultiCursor(axs[1].get_figure().canvas, axs)
    plt.subplots_adjust(wspace=0.19, hspace=0.05, top=0.99, right=0.988, bottom=0.052, left=0.055)
    plt.show()
Here is a screenshot of the result of the function; I circled the time axis at the bottom. I would like to increase the tick interval to at least once every 5 seconds, or even 1 second.
Is it possible to do it this way, without a figure?
You should use something like this before your plt.show() line:
import matplotlib.dates as m_dates
ax = plt.gca() # get the current axis
ax.xaxis.set_major_locator(m_dates.SecondLocator(interval=5)) # every five seconds
Info about SecondLocator: https://matplotlib.org/stable/api/dates_api.html#matplotlib.dates.SecondLocator
Info about dates in general: https://matplotlib.org/stable/api/dates_api.html
Info about tick locating: https://matplotlib.org/stable/api/ticker_api.html?highlight=ticks%20locator
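For the locator to take effect, the x-axis has to hold real datetimes rather than plain strings, so the Time column may need converting with pd.to_datetime first. A small self-contained sketch (made-up one-second data standing in for DataSet.csv) showing a tick every 5 seconds:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as m_dates

# hypothetical data: one value per second
times = pd.date_range("2023-01-01 10:00:00", periods=60, freq="S")
df = pd.DataFrame({"P": range(60)}, index=times)

fig, ax = plt.subplots()
ax.plot(df.index, df["P"])
ax.xaxis.set_major_locator(m_dates.SecondLocator(interval=5))    # a tick every five seconds
ax.xaxis.set_major_formatter(m_dates.DateFormatter("%H:%M:%S"))  # show ticks as HH:MM:SS
plt.show()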

Python plotly how to change X and Y axis in button

I'm working on a graph that illustrates computer usage each day. I want to have a button that groups dates monthly for the last year, sets y to the average (mean), and draws an average line.
My code:
import datetime
import numpy as np
import pandas as pd
import plotly.graph_objects as go

example_data = {"date": ["29/07/2022", "30/07/2022", "31/07/2022", "01/08/2022", "02/08/2022"],
                "time_spent": [15840, 21720, 40020, 1200, 4200]}
df = pd.DataFrame(example_data)
df["date"] = pd.to_datetime(df["date"], dayfirst=True)
df['Time spent'] = df['time_spent'].apply(lambda x: str(datetime.timedelta(seconds=x)))
df['Time spent'] = pd.to_datetime(df['Time spent'])
df = df.drop("time_spent", axis=1)

dfall = df.resample("M", on="date").mean().copy()
dfyearly = dfall.tail(12).copy()
dfweekly = df.tail(7).copy()
dfmonthly = df.tail(30).copy()
del df

dfs = {'Week': dfweekly, 'Month': dfmonthly, 'Year': dfyearly, "All": dfall}
for dframe in list(dfs.values()):
    dframe['StfTime'] = dframe['Time spent'].apply(lambda x: x.strftime("%H:%M"))

frames = len(dfs)  # number of dataframes organized in dict
columns = len(dfs['Week'].columns) - 1  # number of columns in df, minus 1 for date
scenarios = [list(s) for s in [e == 1 for e in np.eye(frames)]]
visibility = [list(np.repeat(e, columns)) for e in scenarios]

lowest_value = datetime.datetime.combine(datetime.date.today(), datetime.datetime.min.time())
highest_value = dfweekly["Time spent"].max().ceil("H")

buttons = []
fig = go.Figure()
for i, (period, df) in enumerate(dfs.items()):
    print(i)
    for column in df.columns[1:]:
        fig.add_bar(
            name=column,
            x=df['date'],
            y=df[column],
            customdata=df[['StfTime']],
            text=df['StfTime'],
            visible=True if period == 'Week' else False  # 'Week' values are shown from the start
        )
    # Change display data to a more friendly format
    fig.update_traces(textfont=dict(size=20), hovertemplate='<b>Time ON</b>: %{customdata[0]}</br>')
    # Change range for better scaling
    this_value = df["Time spent"].max().ceil("H")
    if highest_value <= this_value:
        highest_value = this_value
    fig.update_yaxes(range=[lowest_value, highest_value])
    # Add average value indicator
    average_value = df["Time spent"].mean()
    fig.add_hline(y=average_value, line_width=3, line_dash="dash",
                  line_color="green")
    # one button per dataframe to trigger the visibility
    # of all columns / traces for each dataframe
    button = dict(label=period,
                  method='restyle',
                  args=['visible', visibility[i]])
    buttons.append(button)

fig.update_yaxes(dtick=60 * 60 * 1000, tickformat='%H:%M')
fig.update_xaxes(type='date', dtick='D1')
fig.update_layout(updatemenus=[dict(type="dropdown",
                                    direction="down",
                                    buttons=buttons)])
fig.show()
EDIT 1: Thanks to vestland I managed to get a semi-working dropdown.
The problem is that the line added with add_hline affects all bar charts; I want it to display only on the chart it was added for. Also, after passing in custom data for a nicer display, the space between bars is doubled. Any way to fix the above issues?
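One possible way around the add_hline issue (a sketch, not a drop-in fix for the code above): add_hline draws a layout shape, which is not attached to any trace, so the dropdown's 'restyle' of trace visibility never hides it. Drawing the average as an ordinary go.Scatter trace instead makes it toggle like any other trace. Standalone illustration with made-up numbers:
import plotly.graph_objects as go

dates = ["2022-07-29", "2022-07-30", "2022-07-31"]
hours = [4.4, 6.0, 11.1]
average_value = sum(hours) / len(hours)

fig = go.Figure()
fig.add_bar(x=dates, y=hours, name="time spent")
fig.add_trace(go.Scatter(
    x=[dates[0], dates[-1]],           # span the plotted date range
    y=[average_value, average_value],  # flat line at the average
    mode="lines",
    line=dict(color="green", dash="dash", width=3),
    name="average",
))
fig.show()

In the question's loop that would mean adding one such trace per dataframe (with the same visible flag as its bars) instead of calling fig.add_hline, and counting the extra trace when building the visibility lists.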

Convert/reshape a dataset from wide to long format and convert the time column into a time format for time-series analysis

I have a dataset with 7 columns: level, Time_30, Time_60, Time_90, Time_120, Time_150 and Time_180.
My main goal is to do a time-series anomaly detection using cell count in a 30-minute interval.
I want to do the following data preparation steps:
(I) Melt/reshape the df into the appropriate time-series format (from wide to long): consolidate the columns Time_30, Time_60, ..., Time_180 into one column time with 6 levels (30, 60, ..., 180).
(II) Since the result from (I) comes out as 30, 60, ..., 180, set the time column to the appropriate time or date format for time series (something like '%H:%M:%S').
(III) Use a for-loop to plot the time series for each level (A, B, ..., F) for comparison purposes.
(IV) Anomaly detection
# generate/import dataset
import pandas as pd
df = pd.DataFrame({'level':[A,B,C,D,E,F],
'Time_30':[1993.05,1999.45, 2001.11, 2007.39, 2219.77],
'Time_60':[2123.15,2299.59, 2339.19, 2443.37, 2553.15],
'Time_90':[2323.56,2495.99,2499.13, 2548.71, 2656.0],
'Time_120':[2355.52,2491.19,2519.92,2611.81, 2753.11],
'Time_150':[2425.31,2599.51, 2539.9, 2713.77, 2893.58],
'Time_180':[2443.35,2609.92, 2632.49, 2774.03, 2901.25]} )
Desired outcome
# first series
level, time, count
A, 30, 1993.05
B, 60, 2123.15
C, 90, 2323.56
D, 120, 2355.52
E, 150, 2425.31
F, 180, 2443.35
# 2nd series
level,time,count
A,30,1999.45
B,60,2299.59
C,90,2495.99
D,120,2491.19
E,150,2599.51
F,180,2609.92
.
.
.
.
# up until the last series
See below for my attempt
# (I)
df1 = pd.melt(df,id_vars = ['level'],var_name = 'time',value_name = 'count') #
# (II)
df1['time'] = pd.to_datetime(df1['time'],format= '%H:%M:%S' ).dt.time
OR
df1['time'] = pd.to_timedelta(df1['time'], unit='m')
# (III)
plt.figure(figsize=(10,5))
plt.plot(df1)
for timex in range(30,180):
    plt.axvline(datetime(timex,1,1), color='k', linestyle='--', alpha=0.3)
# Perform STL Decomp
stl = STL(df1)
result = stl.fit()
seasonal, trend, resid = result.seasonal, result.trend, result.resid
plt.figure(figsize=(8,6))
plt.subplot(4,1,1)
plt.plot(df1)
plt.title('Original Series', fontsize=16)
plt.subplot(4,1,2)
plt.plot(trend)
plt.title('Trend', fontsize=16)
plt.subplot(4,1,3)
plt.plot(seasonal)
plt.title('Seasonal', fontsize=16)
plt.subplot(4,1,4)
plt.plot(resid)
plt.title('Residual', fontsize=16)
plt.tight_layout()
estimated = trend + seasonal
plt.figure(figsize=(12,4))
plt.plot(df1)
plt.plot(estimated)
plt.figure(figsize=(10,4))
plt.plot(resid)
# Anomaly detection
resid_mu = resid.mean()
resid_dev = resid.std()
lower = resid_mu - 3*resid_dev
upper = resid_mu + 3*resid_dev
anomalies = df1[(resid < lower) | (resid > upper)] # returns the datapoints with the anomalies
anomalies
plt.plot(df1)
for timex in range(30,180):
    plt.axvline(datetime(timex,1,1), color='k', linestyle='--', alpha=0.6)
plt.scatter(anomalies.index, anomalies['count'], color='r', marker='D')
Please note: if you can only attempt (I) and/or (II), that would be much appreciated.
I made a few small edits to your sample dataframe based on my comment above:
import pandas as pd
df = pd.DataFrame({'level':['A','B','C','D','E'],
'Time_30':[1993.05,1999.45, 2001.11, 2007.39, 2219.77],
'Time_60':[2123.15,2299.59, 2339.19, 2443.37, 2553.15],
'Time_90':[2323.56,2495.99,2499.13, 2548.71, 2656.0],
'Time_120':[2355.52,2491.19,2519.92,2611.81, 2753.11],
'Time_150':[2425.31,2599.51, 2539.9, 2713.77, 2893.58],
'Time_180':[2443.35,2609.92, 2632.49, 2774.03, 2901.25]} )
First, manipulate the Time_* column names to be integer values:
timecols = [int(c.replace("Time_","")) for c in df.columns if c != 'level']
df.columns = ['level'] + timecols
After that you can pd.melt() like you were thinking, yielding a dataframe with all those "series" you mentioned above concatenated together:
df1 = df.melt(id_vars=['level'], value_vars=timecols, var_name='time', value_name='count').sort_values(['level','time']).reset_index(drop=True)
print(df1.head(10))
  level  time    count
0     A    30  1993.05
1     A    60  2123.15
2     A    90  2323.56
3     A   120  2355.52
4     A   150  2425.31
5     A   180  2443.35
6     B    30  1999.45
7     B    60  2299.59
8     B    90  2495.99
9     B   120  2491.19
If you want to loop over the levels, select them with:
for level in df1['level'].unique():
    tmp = df1[df1['level']==level]
or
for level in df1['level'].unique():
    tmp = df1[df1['level']==level].copy()
...if you intend to modify/add data to the tmp dataframe.
As for making timestamps, you could do:
df1['time'] = pd.to_timedelta(df1['time'], unit='min')
...like you were attempting, but it depends on how you're using it. If you just want strings that look like "00:30:00", etc, you can try something like:
df1['time'] = pd.to_timedelta(df1['time'], unit='min').apply(lambda x:str(x)[-8:])
Anyway, hope that gets you on track for what you need.
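For part (III), a minimal plotting sketch building on df1 from above (matplotlib assumed; this uses time while it is still integer minutes, i.e. before any timedelta conversion):
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5))
for level in df1['level'].unique():
    tmp = df1[df1['level'] == level]
    ax.plot(tmp['time'], tmp['count'], marker='o', label=level)  # one line per level
ax.set_xlabel('time (minutes)')
ax.set_ylabel('count')
ax.legend(title='level')
plt.show()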

How to plot 3 or more values in plot.bar()

I tried to make plot.bar() with 2 values by having them in a list, but I'm unable to plot 3 values.
I tried to add plot.bar(x,y,z), but it didn't work.
ce_data = ce_data.drop(
    ['pchangeinOpenInterest', 'totalTradedVolume', 'impliedVolatility',  # this removes unnecessary items
     'pChange', 'totalBuyQuantity', 'totalSellQuantity', 'bidQty',
     'bidprice', 'askQty', 'askPrice', 'askQty', 'identifier', 'lastPrice', 'change', 'expiryDate',
     'underlying'], axis=1)[
    ['openInterest', 'changeinOpenInterest', 'strikePrice', 'underlyingValue']]

style.use('ggplot')
ce_data.to_csv('kumar.csv')
df = pd.read_csv('kumar.csv', parse_dates=True, index_col=0)

pivot = df.iloc[2, 3]  # this selects the strike price
pivot_round = round(pivot, -2)  # round off the price

x = df['strikePrice'].tolist()
y = df['changeinOpenInterest'].tolist()
z = df['openInterest'].tolist()

for i in range(len(x)):
    if int(x[i]) >= pivot_round - 400:
        xleftpos = i
        break
for i in range(len(x)):
    if int(x[i]) >= pivot_round + 400:
        xrightpos = i
        break

x = x[xleftpos:xrightpos]
y = y[xleftpos:xrightpos]
z = z[xleftpos:xrightpos]

plot.bar([value for value in range(len(x))], y)
plot.set_xticks([idx + 0.5 for idx in range(len(x))])
plot.set_xticklabels(x, rotation=35, ha='right', size=10)
I am expecting strike price on the x-axis and y and z (change in OI and OI) as bars.
IIUC, here's how I'd do it. This should have a single x-axis w/ 'strikePrice' and two bars of 'changeinOpenInterest' and 'openInterest'.
disp_df = df.set_index('strikePrice')[['changeinOpenInterest', 'openInterest']]
disp_df.plot(kind='bar')
You can add the bells and whistles you want to the plot, but this avoids a lot of the manipulation you did above.
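As a self-contained illustration with made-up numbers (column names as in the question), this is roughly what that looks like:
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical data standing in for the option-chain CSV
df = pd.DataFrame({
    'strikePrice': [900, 1000, 1100, 1200],
    'changeinOpenInterest': [150, 320, 210, 90],
    'openInterest': [1200, 2500, 1800, 700],
})

disp_df = df.set_index('strikePrice')[['changeinOpenInterest', 'openInterest']]
disp_df.plot(kind='bar', rot=35)  # grouped bars with strike price on the x-axis
plt.ylabel('contracts')
plt.show()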

Adding a 45 degree line to a time series stock data plot

I guess this is supposed to be simple, but I can't seem to make it work.
I have some stock data:
import pandas as pd
import numpy as np
df = pd.DataFrame(index=pd.date_range(start = "06/01/2018", end = "08/01/2018"),
data = np.random.rand(62)*100)
I am doing some analysis on it, which results in my drawing some lines on the graph.
I want to plot a 45-degree line somewhere on the graph as a reference for the lines I drew.
What I have tried is:
x = df.tail(len(df)/20).index
x = x.reset_index()
x_first_val = df.loc[x.loc[0].date].adj_close
...in order to get some point and then use slope = 1 to calculate the y-values, but this sounds all wrong.
Any ideas?
Here is a possibility:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(index=pd.date_range(start = "06/01/2018", end = "08/01/2018"),
data=np.random.rand(62)*100,
columns=['data'])
# Get values for the time:
index_range = df.index[('2018-06-18' < df.index) & (df.index < '2018-07-21')]
# get the timestamps in nanoseconds (since epoch)
timestamps_ns = index_range.astype(np.int64)
# convert it to a relative number of days (for example, could be seconds)
time_day = (timestamps_ns - timestamps_ns[0]) / 1e9 / 60 / 60 / 24
# Define y-data for a line:
slope = 3 # unit: "something" per day
something = time_day * slope
trendline = pd.Series(something, index=index_range)
# Graph:
df.plot(label='data', alpha=0.8)
trendline.plot(label='some trend')
plt.legend(); plt.ylabel('something');
which gives the data plotted together with the trend line.
Edit: the first version of the answer, using dayofyear instead of the timestamps:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(index=pd.date_range(start = "06/01/2018", end = "08/01/2018"),
data=np.random.rand(62)*100,
columns=['data'])
# Define data for a line:
slope = 3 # unit: "something" per day
index_range = df.index[('2018-06-18' < df.index) & (df.index < '2018-07-21')]
dayofyear = index_range.dayofyear # it will not work around the new year...
dayofyear = dayofyear - dayofyear[0]
something = dayofyear * slope
trendline = pd.Series(something, index=index_range)
# Graph:
df.plot(label='data', alpha=0.8)
trendline.plot(label='some trend')
plt.legend(); plt.ylabel('something');
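If the goal is literally a straight reference line drawn from a point and a slope (rather than computing the y-values by hand), matplotlib's axline can do that directly; note that a slope of 1 in data units only looks like 45 degrees on screen when both axes use the same scale. A small sketch, assuming matplotlib >= 3.3 and the same df as above:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df = pd.DataFrame(index=pd.date_range(start="06/01/2018", end="08/01/2018"),
                  data=np.random.rand(62)*100,
                  columns=['data'])

fig, ax = plt.subplots()
ax.plot(df.index, df['data'], alpha=0.8, label='data')
# matplotlib stores dates as floating-point days, so the slope here is in "y-units per day"
x0 = mdates.date2num(pd.Timestamp('2018-06-18'))
ax.axline((x0, 20), slope=3, color='k', linestyle='--', label='reference line')
ax.legend()
plt.ylabel('something')
plt.show()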
