I'm new to coding and am trying to understand a lecture on Quantopian by going through its code, but when I run the code in PyCharm there is no output. Can someone tell me what's going on and how to resolve it?
Below is my code (Python 2.7.13):
import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint
# just set the seed for the random number generator
np.random.seed(107)
import matplotlib.pyplot as plt
X_returns = np.random.normal(0, 1, 100) # Generate the daily returns
# sum them and shift all the prices up into a reasonable range
X = pd.Series(np.cumsum(X_returns), name='X') + 50
X.plot();
The sole output, when I run this, is: "Process finished with exit code 0"
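X.plot() only draws onto a figure. When the code runs as a script (as in PyCharm) rather than in a notebook, nothing is displayed until plt.show() is called. Just add plt.show() at the end: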
import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint
# just set the seed for the random number generator
np.random.seed(107)
import matplotlib.pyplot as plt
X_returns = np.random.normal(0, 1, 100) # Generate the daily returns
# sum them and shift all the prices up into a reasonable range
X = pd.Series(np.cumsum(X_returns), name='X') + 50
X.plot()
plt.show()
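If no window can be opened at all (for example on a remote machine), one alternative is to write the figure to a file instead of showing it. This is only a minimal sketch: the Agg backend and the file name x_prices.png are my own choices for illustration, not something from the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(107)
X = pd.Series(np.cumsum(np.random.normal(0, 1, 100)), name='X') + 50

ax = X.plot()                # Series.plot returns the matplotlib Axes it drew on
out_path = "x_prices.png"    # hypothetical output file name
ax.figure.savefig(out_path)  # write the figure to disk instead of showing it
plt.close(ax.figure)
```

Opening x_prices.png afterwards should show the same line you would get from plt.show().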
I am getting a ValueError saying that exog and endog are different sizes.
When I type len(y) or len(y_scaled), it returns 0, but it is supposed to be 5. I hope someone can help. Thanks in advance.
import datetime
import dateutil
import pandas_datareader.data as wb
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
year=5
tickers =["0200.KL"]
ohlc = wb.DataReader(tickers, data_source="yahoo",start=datetime.date.today()-dateutil.relativedelta.relativedelta(years=year),end=datetime.date.today())
n=5 #get 5 consecutive data
df =ohlc.copy()
series=df["Adj Close"]
slopes=[i*0 for i in range(n-1)]
for i in range(n,len(series)+1):
    y=series[i-n:n]
    x=np.array(range(n))
    #normalize x and y variable
    y_scaled=(y-y.min())/(y.max()-y.min())
    X_scaled=(x-x.min())/(x.max()-x.min())
    #add a constant to the equation
    X_scaled=sm.add_constant(X_scaled)
    model=sm.OLS(y_scaled,X_scaled)
    results=model.fit()
    slopes.append(results.params[-1])
#slope coefficient is the theta in radians
slopes_angle=np.rad2deg(np.arctan(np.array(slopes)))
np.array(slopes_angle)
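Solved. Thank you.
It should be y=df["Adj Close"][i-n:i] instead of y=series[i-n:n]. With the stop index fixed at n, the slice shrinks on every iteration and is empty once i reaches 2n, which is why len(y) ended up as 0.
The full code is below: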
import datetime
import dateutil
import pandas_datareader.data as wb
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
%matplotlib inline
year=1
tickers ="AAPL"
ohlc = wb.DataReader(tickers, data_source="yahoo",start=datetime.date.today()-dateutil.relativedelta.relativedelta(years=year),end=datetime.date.today())
n=5 #get 5 consecutive datas
df =ohlc.copy()
slopes=[i*0 for i in range(n-1)]
for i in range(n,len(df)+1):
    y=df["Adj Close"][i-n:i]
    x=np.array(range(n))
    #normalize x and y variable
    y_scaled=(y-y.min())/(y.max()-y.min())
    X_scaled=(x-x.min())/(x.max()-x.min())
    #add a constant to the equation
    X_scaled=sm.add_constant(X_scaled)
    model=sm.OLS(y_scaled,X_scaled)
    results=model.fit()
    slopes.append(results.params[-1])
#slope coefficient is the theta in radians
slopes_angle=np.rad2deg(np.arctan(np.array(slopes)))
slopes_angle=np.array(slopes_angle)
plt.plot(slopes_angle)
plt.title("Slope Coefficient of 5 Consecutive Stock Price Data")
plt.ylabel("Slope Coefficient")
plt.xlabel("Period")
plt.show()
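To see why the original slice came back empty, here is a small sketch on made-up data. The Series of ten integers is only a stand-in for the "Adj Close" column. It compares the length of each window under the two slice expressions.

```python
import pandas as pd

series = pd.Series(range(10))  # made-up stand-in for the "Adj Close" column
n = 5

wrong_lengths = []
right_lengths = []
for i in range(n, len(series) + 1):
    wrong_lengths.append(len(series[i - n:n]))  # stop index is stuck at n
    right_lengths.append(len(series[i - n:i]))  # stop index moves with i

print(wrong_lengths)  # [5, 4, 3, 2, 1, 0]
print(right_lengths)  # [5, 5, 5, 5, 5, 5]
```

Only the first window has the expected five points with the original slice, so the OLS call fails as soon as y is shorter than x.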
I have written a simple K-means script, but I am having difficulty exploring the result cluster by cluster.
GitHub link: https://github.com/AkshayBayas/Machine-learning-/blob/master/K-Means%20algorithm.ipynb
Code:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
%pylab
Df = pd.read_csv('Kdata.csv')
from sklearn.cluster import KMeans
KModule = KMeans()
K_model = KModule.fit(Df)
K_result = K_model.predict(Df)
centers = K_model.cluster_centers_
K_model.labels_
plt.scatter (x1,x2, c = K_model.labels_, cmap = 'rainbow' )
Can anyone help?
I'm not sure what you mean by "explore cluster by cluster".
If you don't specify the number of clusters, KMeans defaults to 8. If you start with 3, as in the code below, you can separate them. You also need to make the cluster label categorical so that it is not colored on a continuous scale:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Df = pd.read_csv('Kdata.csv')
from sklearn.cluster import KMeans
KModule = KMeans(n_clusters=3)
K_model = KModule.fit(Df)
K_result = K_model.predict(Df)
Df['cluster'] = pd.Categorical(K_model.labels_)
sns.scatterplot(x="V1",y="V2",data=Df,hue='cluster',palette='rainbow')
Df.plot.scatter("V1","V2",c='cluster',cmap = 'rainbow')
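If "cluster by cluster" means looking at each group separately, one way is to store the labels as a column and then group or filter on it. This is a rough sketch on made-up data: the three blobs stand in for Kdata.csv, and the column names V1 and V2 are assumed as in the code above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Made-up data standing in for Kdata.csv: three blobs in two columns V1, V2
rng = np.random.RandomState(0)
centers = np.array([[0, 0], [5, 5], [0, 5]])
points = np.vstack([c + rng.normal(0, 0.5, size=(50, 2)) for c in centers])
Df = pd.DataFrame(points, columns=["V1", "V2"])

features = ["V1", "V2"]
K_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Df[features])
Df["cluster"] = K_model.labels_

# How many rows fell into each cluster, and the mean of each feature per cluster
cluster_sizes = Df["cluster"].value_counts().sort_index()
cluster_means = Df.groupby("cluster")[features].mean()
print(cluster_sizes)
print(cluster_means)

# Pull out the rows of a single cluster to inspect it on its own
cluster_0 = Df[Df["cluster"] == 0]
print(cluster_0.describe())
```

Fitting on Df[features] rather than on Df keeps the label column out of the model if the cell is run a second time.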
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import RidgeCV
tips = sns.load_dataset('tips')
X = tips.drop(columns=['tip','sex', 'smoker', 'day', 'time'])
y = tips['tip']
alphas = 10**np.linspace(10,-2,100)*0.5
ridge_clf = RidgeCV(alphas=alphas,scoring='r2').fit(X, y)
ridge_clf.score(X, y)
I want to plot the following graph for RidgeCV, but I don't see any option to do that as with GridSearchCV. I'd appreciate your suggestions!
There is no indication what the colors stand for. I assume they stand for features and we investigate the size of each feature weight as function of alpha. Here is my solution:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import RidgeCV
tips = sns.load_dataset('tips')
X = tips.drop(columns=['tip','sex', 'smoker', 'day', 'time'])
y = tips['tip']
alphas = 10**np.linspace(10,-2,100)*0.5
w = list()
for a in alphas:
    ridge_clf = RidgeCV(alphas=[a],cv=10).fit(X, y)
    w.append(ridge_clf.coef_)
w = np.array(w)
plt.semilogx(alphas,w)
plt.title('Ridge coefficients as function of the regularization')
plt.xlabel('alpha')
plt.ylabel('weights')
plt.legend(X.keys())
Output:
Since you only have two features in X there are only two lines.
Here is the code for generating the plot that you posted.
First, note that RidgeCV does not return the coefficients for each alpha value passed in the alphas parameter.
The purpose of RidgeCV is to try the different alpha values given in alphas and then, based on the cross-validation score, return the best alpha along with the fitted model.
Hence, the only way to get the coefficients for each alpha value with RidgeCV is to fit it once per alpha, passing a single value each time.
Example:
# Author: Fabian Pedregosa -- <fabian.pedregosa#inria.fr>
# License: BSD 3 clause
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
# X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
# #############################################################################
# Compute paths
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
coefs = []
for a in alphas:
    ridge = linear_model.RidgeCV(alphas=[a], fit_intercept=False, cv=3)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)
# #############################################################################
# Display results
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale('log')
ax.set_xlim(ax.get_xlim()[::-1]) # reverse axis
plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('RidgeCV coefficients as a function of the regularization')
plt.axis('tight')
plt.show()
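With a single value in alphas there is nothing for cross-validation to choose between, so fitting a plain Ridge model per alpha is a lighter way to trace the same kind of path. The sketch below swaps RidgeCV for Ridge on the same Hilbert-matrix toy problem. The choice of 50 alphas is arbitrary, and I have not checked that the two give numerically identical coefficients on such an ill-conditioned matrix.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Same toy problem as above: X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)

alphas = np.logspace(-10, -2, 50)

# One Ridge fit per alpha; each fit contributes one row of coefficients
coefs = np.array([
    Ridge(alpha=a, fit_intercept=False).fit(X, y).coef_
    for a in alphas
])
print(coefs.shape)  # one row per alpha, one column per feature
```

The resulting array can be passed to ax.plot(alphas, coefs) exactly as in the display section of the example above.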
I am stuck getting forecast values into a Power BI query dataset. Below is the code I have, where I tried to export the y_hat values using pd.DataFrame. The code does not give an error, but only the original dataset values are returned, not the forecasted values for future dates. I want a separate dataset containing the full forecasted values for the next 6 months. How can I achieve this?
# 'dataset' holds the input data for this script
dataset = dataset.drop_duplicates()
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
from matplotlib.dates import DateFormatter
import numpy as np
import pandas as pd
import datetime
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import os
from datetime import datetime, timedelta
dataset['Month']= pd.to_datetime(dataset['Month'])
dataset.set_index('Month', inplace=True)
def get_prediction(dataset):
list_TPID = dataset.TPID.unique()
for TPID in list_TPID:
TPID_df = dataset.loc[dataset['TPID'] == TPID]
train, test = dataset.iloc[:4,0] , dataset.iloc[3:,0]
model= ExponentialSmoothing(train,trend='add',damped=False).fit()
y_hat = model.forecast(6)
dfoutput= pd.DataFrame(y_hat)
What you have in your snippet are a bunch of imports and a function definition. What you seem to be missing is a return statement in the function, like return(dfoutput). Your indentation seems a bit off though. But if everything else is correct you are missing a call to your function such as output=get_prediction(dataset=dataset).
As long as output does indeed end up as a dataframe, then that will be made available to you in PowerBI after the code is run.
How would I calculate standardized residuals from an ARIMA model fitted with the SARIMAX function?
Let's say we have some basic model:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='ticks', context='poster')
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.seasonal import seasonal_decompose
#plt.style.use("ggplot")
import pandas_datareader.data as web
import statsmodels.api as sm
import scipy
import statsmodels.stats.api as sms
import datetime
model = SARIMAX(df, order = (6, 0, 0), trend = "c");
model_results = model.fit(maxiter = 500);
print(model_results.summary());
I need a way to standardize the residuals so that a basic plot of them looks the same as the standardized-residual panel produced by model_results.plot_diagnostics(figsize = (16, 10)).
I think you can use the function "internally_studentized_residual" from https://stackoverflow.com/a/57155553/14294235
It should work like this:
model = SARIMAX(df, order = (6, 0, 0), trend = "c");
model_results = model.fit(maxiter = 500);
model_fitted_y = model_results.fittedvalues
resid_studentized = internally_studentized_residual(df,model_fitted_y)
resid_studentized = -resid_studentized
plt.plot(resid_studentized)
plt.axhline(y=0, color='b', linestyle='--')
plt.show()