How to choose coefficients with scikit LinearRegression [closed] - python

I want to fit an autoregressive model to some data stored in a dataframe, with 96 data points per day. The data is the value of solar irradiance in some region, and I know it has a 1-day seasonality. I want to obtain a simple linear model using scikit-learn's LinearRegression, and I want to specify which lagged data points to use: the last 10 data points, plus the data point with a lag of 97, which corresponds to the data point from 24 hours earlier. How can I specify which lagged coefficients I want to use? I don't want 97 coefficients, just 11 of them: the previous 10 data points and the data point 97 positions back.

Just make a dataset X with 11 columns [x0-97, x0-10, x0-9, ..., x0-1]; the series of x0 will then be your target Y.
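A minimal sketch of that idea, assuming the series lives in a column df["irradiance"] (a hypothetical name) of your dataframe:
import pandas as pd
from sklearn.linear_model import LinearRegression

lags = list(range(1, 11)) + [97]  # the 10 most recent lags plus the daily-seasonal lag
X = pd.concat({f"lag_{k}": df["irradiance"].shift(k) for k in lags}, axis=1)
y = df["irradiance"]
mask = X.notna().all(axis=1)  # drop rows that don't have a full lag history
model = LinearRegression().fit(X[mask], y[mask])
print(model.coef_)  # one coefficient per chosen lag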

Related

Reducing data to one dimension using PCA [closed]

Can the dimension of the data be reduced to only one principal component?
I tried it on the iris data set:
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import pandas as pd
import matplotlib.pyplot as plt

X = StandardScaler().fit_transform(load_iris().data)  # X = standardized iris data
pca = PCA(n_components=1)
pca_X = pca.fit_transform(X)
pca_df = pd.DataFrame(pca_X, columns=["PCA1"])
plt.plot(pca_df["PCA1"], "o")
plt.show()
We can see three different clusters. So can the dimension be reduced to 1?
You can choose to reduce the dimensions to 1 using PCA; the only thing it promises is that the resulting principal component points in the direction of highest variance in the data.
If you are reducing the dimensions in order to improve classification, you can use Linear Discriminant Analysis (LDA), which gives you the direction of maximum separation between the classes.
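For example, a minimal scikit-learn sketch (not part of the original answer) of projecting the iris data onto a single, maximally class-separating LDA axis:
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
# LDA is supervised: it uses the class labels to find the most separating direction
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(iris.data, iris.target)
print(X_lda.shape)  # (150, 1): every sample projected onto one discriminant axis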
Yes, the dimension can be reduced to 1, which is exactly what you have done in your example.
The y-axis in your plot shows the coordinate of each observation with respect to the first principal component.
The three clusters likely relate to the three species in the Iris dataset and have nothing to do with the number of components.

Calculate the standard deviation for the variable in the matrix created by tensorflow [closed]

import tensorflow as tf
input = [50, 10]
O1 = layers.fully_connected(input, 20, tf.sigmoid)
Why is my input wrong?
I am not sure I understand the question, but...
The sigmoid layer will output an array with numbers between 0 and 1, but you can't really calculate what the standard deviation will be before feeding your network.
If you are talking about the matrix that contains the weight parameters, then this depends on how you initialize them. But after the training of the network, the deviation will not be the same as before the training.
EDIT:
Ok, so you simply want to calculate the standard deviation of a matrix. In that case, use NumPy:
import numpy as np

a = np.array([[1, 2], [3, 4]])  # or your 50 by 50 matrix
print(np.std(a))
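If you would rather stay in TensorFlow (assuming TF 2.x here), tf.math.reduce_std gives the same population standard deviation:
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # or your 50 by 50 matrix, as a float tensor
print(tf.math.reduce_std(a).numpy())  # same value as np.std on the equivalent array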

Normalize a vector with pre-defined mean [closed]

I would like to normalize a vector such that the mean of the normalized vector would be a certain pre-defined value. For instance, I want the mean to be 0.1 in the following example:
import numpy as np
from sklearn.preprocessing import normalize
array = np.arange(1,11)
array_norm = normalize(array[:,np.newaxis], axis=0).ravel()
Of course, np.mean(array_norm) is about 0.28 and not 0.1. Is there a way to do this in Python?
You could just multiply each element by mean_you_want / current_mean. If you multiply each element by a scalar, the mean will also be multiplied by that scalar. In your case, that would be 0.1/np.mean(array_norm)
array_norm *= 0.1/np.mean(array_norm)
This should do the trick.
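Putting the question and the answer together, a complete sketch with a target mean of 0.1 (as in the question):
import numpy as np
from sklearn.preprocessing import normalize

array = np.arange(1, 11)
array_norm = normalize(array[:, np.newaxis], axis=0).ravel()

target_mean = 0.1
array_norm *= target_mean / np.mean(array_norm)  # rescale so the mean hits the target
print(np.mean(array_norm))  # 0.1 (up to floating-point error)
Note that after this rescaling the vector is no longer unit-norm; the scaling trades the L2 normalization for the desired mean.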

Filling in the NaN values with Regression [closed]

I have x1 = Job Level (numerical), x2 = Job Code (categorical) and y = Stock Value (numerical). For a data set of 3x500, I have 250 NaN values in Stock Value.
What do I need to change in my code below to read x2 as a categorical value and rerun the program to find the coefficients?
Data set example
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_excel("stats.xlsx")
df_nonull = df.dropna()
X_train = df_nonull[['Job Code', 'Job Level']]
y_train = df_nonull[['Stock Value']]

X_test = df[['Job Code', 'Job Level']]
y_test = df[['Stock Value']]

regressor = LinearRegression()
model = regressor.fit(X_train, y_train)
# display coefficients
print(regressor.coef_)
This is a straightforward model training problem. Your available training data (observations) are the rows with Stock Value present; your later "real" data are the rows without.
Categorical data is quite legal in such cases. In fact, you might try declaring Job Level as categorical, as well, since it's discrete; that will free you from any assumptions of linearity (although it also denies any applicability of the level-code ordering).
Your task is to choose a model type that serves your data properly. This requires research and experimentation; welcome to Data Science. Since you haven't discussed your data's shape, density, connectivity, clustering, etc., there's really not much we can explore with you. Six observations on three features (note that Job Code and Job Title are not 100% coupled) is not enough for educated speculation.
Try adding some polynomial terms to your "linear" regression: perhaps a squared term and a square root for each input. That's often the first attempt for such a task. One possible way to treat the categorical column is sketched below.
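As a sketch (not the answerer's code), one common way to feed Job Code into LinearRegression as a categorical feature is one-hot encoding with pd.get_dummies; the column names here are taken from the question:
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_excel("stats.xlsx")
# One-hot encode the categorical column; Job Level stays numeric
X_all = pd.get_dummies(df[['Job Level', 'Job Code']], columns=['Job Code'])
y_all = df['Stock Value']

train = y_all.notna()  # rows where Stock Value is known
model = LinearRegression().fit(X_all[train], y_all[train])
df.loc[~train, 'Stock Value'] = model.predict(X_all[~train])  # fill in the NaN rows
print(model.coef_)  # one coefficient per numeric column and per Job Code dummy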

python - how to find area under curve? [closed]

I would like to ask if it is possible to calculate the area under the curve for a fitted distribution curve.
The curve would look like this:
I've seen some posts online regarding the usage of trapz, but I'm not sure if it will work for a curve like that. Please enlighten me, and thank you for the help!
If your distribution, f, is discretized on a set of points, x, that you know about, then you can use scipy.integrate.trapz or scipy.integrate.simps directly (pass f, x as arguments in that order). For a quick check (e.g. that your distribution is normalized), just sum the values of f and multiply by the grid spacing:
import numpy as np
from scipy.integrate import trapz, simps
x, dx = np.linspace(-100, 250, 50, retstep=True)
mean, sigma = 90, 20
f = np.exp(-((x-mean)/sigma)**2/2) / sigma / np.sqrt(2 * np.pi)
print('{:18.16f}'.format(np.sum(f)*dx))
print('{:18.16f}'.format(trapz(f, x)))
print('{:18.16f}'.format(simps(f, x)))
Output:
1.0000000000000002
0.9999999999999992
1.0000000000000016
Firstly, you have to find a function from the graph. You can check here. Then you can use integration in Python with SciPy. You can check here for integration.
It is just math stuff, as Daniel Sanchez says.
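As a rough sketch of that approach, assuming the fitted distribution is a Gaussian (the data below is synthetic, just to make the example runnable), you can fit the curve with scipy.optimize.curve_fit and then integrate the fitted function with scipy.integrate.quad:
import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import quad

def gaussian(x, a, mu, sigma):
    return a * np.exp(-((x - mu) / sigma) ** 2 / 2)

# x, y would be your sampled curve; here we reuse the normalized Gaussian from above
x = np.linspace(-100, 250, 50)
y = gaussian(x, 1 / (20 * np.sqrt(2 * np.pi)), 90, 20)

params, _ = curve_fit(gaussian, x, y, p0=[1, 90, 20])
area, _ = quad(lambda t: gaussian(t, *params), x.min(), x.max())
print(area)  # ~1.0 for this normalized example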
