How can I calculate standardized residuals in Python?

How would I calculate standardized residuals from an ARIMA model fitted with the SARIMAX class?
Let's say we have a basic model:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
import datetime
import pandas_datareader.data as web
import statsmodels.api as sm
import statsmodels.stats.api as sms
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.seasonal import seasonal_decompose
#plt.style.use("ggplot")
sns.set(style='ticks', context='poster')
# df holds the time series to model (loaded elsewhere)
model = SARIMAX(df, order=(6, 0, 0), trend="c")
model_results = model.fit(maxiter=500)
print(model_results.summary())
I need the standardized residuals so that, when I plot them with a basic plot function, they look the same as the residual panel produced by model_results.plot_diagnostics(figsize=(16, 10)).

I think you can use the function "internally_studentized_residual" from https://stackoverflow.com/a/57155553/14294235
It should work like this:
# internally_studentized_residual is defined in the linked answer
model = SARIMAX(df, order=(6, 0, 0), trend="c")
model_results = model.fit(maxiter=500)
model_fitted_y = model_results.fittedvalues
resid_studentized = internally_studentized_residual(df, model_fitted_y)
# flip the sign, as in the original answer, to match the diagnostics plot
resid_studentized = -resid_studentized
plt.plot(resid_studentized)
plt.axhline(y=0, color='b', linestyle='--')
plt.show()

Related

Create legends in scatter plt

I create a mask of my dataset to plot only the animal materials, but when I plot the masked data I have problems with the legend: only the first material appears in it, and I don't know how to add the other two materials.
import numpy as np
import umap
import umap.plot
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, HTML
import cufflinks as cf
cf.set_config_file(sharing='public', theme='ggplot', offline=True)
import seaborn as sns
palpations = np.load('big_matrix_16384.npz', allow_pickle=True)
X = palpations['arr_0']
embedding = umap.UMAP(n_neighbors=50,
                      min_dist=0.2,
                      metric='correlation').fit(X)
emb = embedding.transform(X)
# Data is the metadata DataFrame (one row per sample in X)
mask_1 = Data["Tipo"] == "Animal"
emb_tipo_1 = emb[mask_1]
data_tipo_1 = Data[mask_1]
cmap = plt.cm.Spectral
c = [sns.color_palette("Set2")[x] for x in data_tipo_1.Material.map({"bone": 0, "cartilage": 1, "liver_raw_piece1": 2})]
plt.scatter(emb_tipo_1[:, 0],
            emb_tipo_1[:, 1],
            c=c,
            label=np.unique(data_tipo_1.Material), s=10)
plt.gca().set_aspect("equal", "datalim")
plt.title("UMAP: animal samples")
plt.legend()
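A single plt.scatter call creates one artist, so it can only produce one legend entry no matter what is passed as `label`. One way to get an entry per material is to call scatter once per material, each with its own label. The sketch below keeps the question's `Material` column name but uses random points as a stand-in for the UMAP embedding (an assumption, since the data file is not available).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for data_tipo_1 and emb_tipo_1 from the question
rng = np.random.default_rng(0)
materials = ["bone", "cartilage", "liver_raw_piece1"]
data_tipo_1 = pd.DataFrame({"Material": rng.choice(materials, size=60)})
emb_tipo_1 = rng.normal(size=(60, 2))

colors = plt.cm.Set2.colors  # same qualitative palette as in the question
fig, ax = plt.subplots()
for i, material in enumerate(materials):
    # one scatter call (one legend entry) per material
    sel = (data_tipo_1["Material"] == material).to_numpy()
    ax.scatter(emb_tipo_1[sel, 0], emb_tipo_1[sel, 1],
               color=colors[i], s=10, label=material)
ax.set_aspect("equal", "datalim")
ax.set_title("UMAP: animal samples")
ax.legend(title="Material")

labels = [t.get_text() for t in ax.get_legend().get_texts()]
```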

How to build an effective K-means algorithm?

I have written a simple K-means algorithm, but I am finding it difficult to explore the results cluster by cluster.
Github Link: https://github.com/AkshayBayas/Machine-learning-/blob/master/K-Means%20algorithm.ipynb
Code:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
Df = pd.read_csv('Kdata.csv')
from sklearn.cluster import KMeans
KModule = KMeans()
K_model = KModule.fit(Df)
K_result = K_model.predict(Df)
centers = K_model.cluster_centers_
K_model.labels_
# plot the first two feature columns, colored by cluster label
plt.scatter(Df.iloc[:, 0], Df.iloc[:, 1], c=K_model.labels_, cmap='rainbow')
Can anyone help?
No idea what you mean by "explore cluster by cluster".
If you don't specify the number of clusters, KMeans defaults to 8, so if you set it to 3 as in the code below, you can separate the groups. You also need to store the cluster labels as a categorical column, so they are not colored on a continuous scale:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
Df = pd.read_csv('Kdata.csv')
KModule = KMeans(n_clusters=3)
K_model = KModule.fit(Df)
K_result = K_model.predict(Df)
Df['cluster'] = pd.Categorical(K_model.labels_)
# seaborn: pass x/y as keywords and use palette (not cmap) for hue colors
sns.scatterplot(x="V1", y="V2", data=Df, hue='cluster', palette='rainbow')
plt.show()
# pandas (recent versions): a categorical c column gets one color per category
Df.plot.scatter("V1", "V2", c='cluster', cmap='rainbow')
plt.show()
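To explore the result cluster by cluster, one option is to keep the labels in a column and group or filter on it. The sketch below uses scikit-learn's make_blobs to generate stand-in data with columns V1 and V2 (an assumption matching the answer's column names, since Kdata.csv is not available).

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for Kdata.csv: three well-separated blobs
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
Df = pd.DataFrame(X, columns=["V1", "V2"])

K_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Df[["V1", "V2"]])
Df["cluster"] = pd.Categorical(K_model.labels_)

# Per-cluster summary: size and mean of each feature
summary = Df.groupby("cluster", observed=True)[["V1", "V2"]].agg(["count", "mean"])
print(summary)

# Or inspect one cluster at a time
for label, members in Df.groupby("cluster", observed=True):
    print(f"cluster {label}: {len(members)} points")
    print(members.describe().loc[["mean", "min", "max"]])
```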

Plot RidgeCV coefficients as a function of the regularization

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import RidgeCV
tips = sns.load_dataset('tips')
X = tips.drop(columns=['tip', 'sex', 'smoker', 'day', 'time'])
y = tips['tip']
alphas = 10**np.linspace(10, -2, 100) * 0.5
ridge_clf = RidgeCV(alphas=alphas, scoring='r2').fit(X, y)
ridge_clf.score(X, y)
I wanted to plot the following graph (image not included) for RidgeCV, but I don't see any option to do that, as there is for GridSearchCV. I appreciate your suggestions!
There is no indication of what the colors stand for. I assume they stand for features, and that we are looking at the size of each feature's weight as a function of alpha. Here is my solution:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import RidgeCV
tips = sns.load_dataset('tips')
X = tips.drop(columns=['tip', 'sex', 'smoker', 'day', 'time'])
y = tips['tip']
alphas = 10**np.linspace(10, -2, 100) * 0.5
w = list()
for a in alphas:
    # a RidgeCV with a single alpha refits the model with exactly that alpha
    ridge_clf = RidgeCV(alphas=[a], cv=10).fit(X, y)
    w.append(ridge_clf.coef_)
w = np.array(w)
plt.semilogx(alphas, w)
plt.title('Ridge coefficients as function of the regularization')
plt.xlabel('alpha')
plt.ylabel('weights')
plt.legend(X.columns)
plt.show()
Output: (plot of the coefficient paths, not included)
Since you only have two features in X (total_bill and size), there are only two lines.
Here is the code for generating the plot that you posted.
First, note that RidgeCV does not return the coefficients for each alpha value passed in the alphas parameter.
RidgeCV tries each alpha value listed in alphas, picks the best one according to the cross-validation score, and returns that best alpha (alpha_) together with the model fitted with it.
Hence, the only way to get the coefficients for each alpha value with RidgeCV is to iterate over the alphas, fitting a RidgeCV with one alpha value at a time.
Example:
# Author: Fabian Pedregosa -- <fabian.pedregosa#inria.fr>
# License: BSD 3 clause
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
# X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
# #############################################################################
# Compute paths
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
coefs = []
for a in alphas:
    ridge = linear_model.RidgeCV(alphas=[a], fit_intercept=False, cv=3)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)
# #############################################################################
# Display results
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale('log')
ax.set_xlim(ax.get_xlim()[::-1]) # reverse axis
plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('RidgeCV coefficients as a function of the regularization')
plt.axis('tight')
plt.show()
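Because a RidgeCV fitted with a single alpha simply refits with that alpha, the same coefficient path can also be computed with plain sklearn.linear_model.Ridge, while one RidgeCV over the whole grid tells you which alpha cross-validation would pick. The sketch below checks that the coefficients RidgeCV keeps match the Ridge path at its chosen alpha_. The random, well-conditioned design matrix is an assumption: with the Hilbert matrix above, small numerical differences between solvers could show up.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

# Synthetic regression problem (hypothetical data, 4 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=50)
alphas = np.logspace(-3, 3, 40)

# One plain Ridge fit per alpha gives the full coefficient path
coefs = np.array([Ridge(alpha=a).fit(X, y).coef_ for a in alphas])

# A single RidgeCV over the grid keeps only the model for the best alpha
cv_model = RidgeCV(alphas=alphas).fit(X, y)
best = np.flatnonzero(np.isclose(alphas, cv_model.alpha_))[0]
print("chosen alpha:", cv_model.alpha_)
```

The `coefs` array can be passed to ax.plot(alphas, coefs) exactly as in the example above, and ax.axvline(cv_model.alpha_) marks the alpha that cross-validation selected.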

Pandas code not plotting

I'm new to coding and am trying to understand a lecture on Quantopian by going through the code, but when I run the code in PyCharm, there is no output. Can someone tell me what's going on and how to resolve this?
Below is a piece of my code (Python 2.7.13):
import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint
# just set the seed for the random number generator
np.random.seed(107)
import matplotlib.pyplot as plt
X_returns = np.random.normal(0, 1, 100) # Generate the daily returns
# sum them and shift all the prices up into a reasonable range
X = pd.Series(np.cumsum(X_returns), name='X') + 50
X.plot();
The sole output, when I run this, is: "Process finished with exit code 0"
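When you run a plain script (unlike a Quantopian or Jupyter notebook), matplotlib does not display figures until plt.show() is called. Just add plt.show() at the end: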
import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint
# just set the seed for the random number generator
np.random.seed(107)
import matplotlib.pyplot as plt
X_returns = np.random.normal(0, 1, 100) # Generate the daily returns
# sum them and shift all the prices up into a reasonable range
X = pd.Series(np.cumsum(X_returns), name='X') + 50
X.plot()
plt.show()
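If the script runs somewhere without a display (for example over SSH or in CI), plt.show() may not open a window either. An alternative is to select the non-interactive Agg backend and write the figure to a file; the output path below is an arbitrary example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files only, no GUI window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(107)
X_returns = np.random.normal(0, 1, 100)  # generate the daily returns
X = pd.Series(np.cumsum(X_returns), name='X') + 50

ax = X.plot()
ax.set_title("Simulated price series X")

# Hypothetical output path in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "x_prices.png")
plt.savefig(out_path, dpi=100)
plt.close()
print("saved to", out_path)
```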

Fit kernel distribution to my data

import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import numpy as np
import matplotlib as mpl
import seaborn as sns
from scipy.stats import gaussian_kde
from numpy import linspace,hstack
LINE_WIDTH = 3
filename = ''  # path to the whitespace-separated data file
data = [list(map(float, line.split())) for line in open(filename, 'r') if line.strip()]
dataM=np.array(data)
meandata=np.mean(dataM,axis=0)
SD = np.std(dataM,axis=0)
sns.set_palette("hls")
mpl.rc("figure", figsize=(8, 4))
xs = np.linspace(meandata[0]-(4 * SD[0]) ,meandata[0]+( 4 * SD[0]), dataM[:,0].size)
ys=dataM[:,0]
n,bins,patches=plt.hist(ys,15)
I get this plot (image not included).
I want to plot a Gaussian kernel density estimate over my histogram, but I get the error TypeError: 'module' object is not callable
when I try to do this:
my_pdf = gaussian_kde(ys)
x = linspace(30,100,1000)
plt(x,my_pdf(x),'r') # distribution function
plt.hist(ys,normed=1,alpha=.3) # histogram
plt.show()
What am I doing wrong?
You can do this directly using seaborn. It would be something like this:
import pandas as pd
import seaborn as sns
import scipy.stats
import matplotlib.pyplot as plt
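# assumed layout: whitespace-separated columns, no header; use the first column
data = pd.read_csv('input.txt', sep=r'\s+', header=None)[0]
sns.distplot(data, kde=False, fit=scipy.stats.norm)
plt.show()
For a KDE plot, just do:
sns.distplot(data)
Note that distplot is deprecated since seaborn 0.11; histplot and kdeplot replace it.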
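As for the error in the question itself: TypeError: 'module' object is not callable comes from plt(x, my_pdf(x), 'r'), which calls the matplotlib.pyplot module instead of its plot function. Also, hist's normed argument was removed in Matplotlib 3.1 in favor of density. A corrected sketch, with a synthetic normal sample standing in for dataM[:, 0] (an assumption, since the data file is not shown):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for ys = dataM[:, 0]
rng = np.random.default_rng(0)
ys = rng.normal(loc=65, scale=8, size=500)

my_pdf = gaussian_kde(ys)
x = np.linspace(30, 100, 1000)

fig, ax = plt.subplots()
ax.hist(ys, bins=15, density=True, alpha=0.3)  # histogram scaled as a density
ax.plot(x, my_pdf(x), "r")                     # KDE: call plot, not the module

# Rough check that the KDE integrates to about 1 over the plotted range
area = my_pdf(x).sum() * (x[1] - x[0])
```

Using density=True puts the histogram on the same scale as the KDE curve, which is why the two overlay correctly.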
