Python sklearn polynomial preprocessing and dimensional problems

I am experimenting with fitting degree 1-3 polynomial transformations to the original data, with 100 predicted values for each degree. I first 1) reshaped the original data, 2) applied fit_transform to the training set and the prediction space (of data features), 3) obtained linear predictions on the prediction space, and 4) collected them into an array, using the following code:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split

np.random.seed(0)
n = 100
x = np.linspace(0, 10, n) + np.random.randn(n)/5
y = np.sin(x) + x/6 + np.random.randn(n)/10
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
pred_data = np.linspace(0, 10, 100).reshape(-1, 1)
results = []
for i in [1, 2, 3]:
    poly = PolynomialFeatures(degree=i)
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
    x_poly1 = poly.fit_transform(x_train)
    pred_data = poly.fit_transform(pred_data)
    linreg1 = LinearRegression().fit(x_poly1, y_train)
    pred = linreg1.predict(pred_data)
    results.append(pred)
results
However, I did not get what I wanted: Python did not return an array of shape (3, 100) as I was expecting. Instead, I received an error message:
ValueError: shapes (100,10) and (4,1) not aligned: 10 (dim 1) != 4 (dim 0)
This seems to be a dimensional problem resulting either from the "reshape" or the "fit_transform" step. I am confused, as this was supposed to be a straightforward test. Would anyone enlighten me on this? It would be much appreciated.

First, as I suggested in a comment, you should always call just transform() (never fit_transform()) on test data (pred_data in your case).
But even if you do that, a different error occurs. The error is due to this line:
pred_data = poly.fit_transform(pred_data)
Here you are replacing the original pred_data with its transformed version. The first iteration of the loop works, but the second and third become invalid, because each iteration requires the original pred_data of shape (100, 1) defined in this line above the for loop (in your traceback, a degree-3 expansion of the already-expanded two-column pred_data has 10 feature columns, while the degree-3 model fitted on the single-column x_train expects only 4, hence (100, 10) versus (4, 1)):
pred_data = np.linspace(0,10,100).reshape(-1,1)
Change the name of the variable inside the loop to something else and everything works:
for i in [1, 2, 3]:
    poly = PolynomialFeatures(degree=i)
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
    x_poly1 = poly.fit_transform(x_train)
    # Changed here: transform only, and store the result under a new name
    pred_data_poly1 = poly.transform(pred_data)
    linreg1 = LinearRegression().fit(x_poly1, y_train)
    pred = linreg1.predict(pred_data_poly1)
    results.append(pred)
results
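One further detail, since the goal was an array of shape (3, 100): because y was reshaped to (-1, 1), each pred comes back with shape (100, 1), so np.array(results) ends up with shape (3, 100, 1). A minimal variant of the loop above (same setup; only the append line changes) that flattens each prediction first:
results = []
for i in [1, 2, 3]:
    poly = PolynomialFeatures(degree=i)
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
    x_poly1 = poly.fit_transform(x_train)
    pred_data_poly1 = poly.transform(pred_data)
    linreg1 = LinearRegression().fit(x_poly1, y_train)
    # ravel() flattens each (100, 1) prediction to (100,)
    results.append(linreg1.predict(pred_data_poly1).ravel())
print(np.array(results).shape)  # (3, 100)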

Related

After executing the last line I get the following error: ValueError: y should be a 1d array, got an array of shape (4457, 2) instead

y = pd.get_dummies(messages['label'])
y = y.iloc[:,1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,random_state = 0)
from sklearn.naive_bayes import MultinomialNB
spam_detect_model = MultinomialNB().fit(X_train, y_train)
y_pred = spam_detect_model.predict(y_test)
After this I get the error: ValueError: y should be a 1d array, got an array of shape (4457, 2) instead.
First of all, directly under your y variable, do this to convert it to an integer label:
y = y.apply(lambda x: x.argmax(), axis=1).values
And use y_pred = spam_detect_model.predict(X_test), not y_test: a model predicts from features, not from labels.
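Putting both fixes together, a minimal sketch (messages and X are assumed to be the label Series and feature matrix from the asker's earlier preprocessing, which is not shown in the question):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Collapse the two dummy columns into a single integer label per row
y = pd.get_dummies(messages['label'])
y = y.apply(lambda row: row.argmax(), axis=1).values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
spam_detect_model = MultinomialNB().fit(X_train, y_train)
y_pred = spam_detect_model.predict(X_test)  # predict from features, not labels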

ROC curve for each class

Code:
import pandas as pd
from sklearn.preprocessing import label_binarize, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

df = pd.read_csv(r"model_data_TBI_3(o).csv", index_col=["X", "Y"])
X = df.drop("class", axis="columns")
y = df["class"]
array = y.values
y = array[::]
y = label_binarize(y, classes=[1, 2, 3], pos_label=1, neg_label=0)
n_classes = y.shape[1]
print(n_classes)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=45, train_size=0.10)
mm = MinMaxScaler()
mm.fit(X_train)
X_train_mm = mm.transform(X_train)
X_test_mm = mm.transform(X_test)
svm_tuned = SVC(kernel="rbf", C=2.33, gamma=0.00046, probability=True,
                break_ties=False, decision_function_shape="ovo",
                random_state=1, shrinking=True, tol=0.12,
                class_weight={1: 100, 1: 75, 2: 55})
y_score = svm_tuned.fit(X_train_mm, y_train).decision_function(X_test_mm)
Result:
ValueError: y should be a 1d array, got an array of shape (329, 3) instead.
I want to plot the ROC curve of my SVM model. My data has classes 1, 2, and 3. I binarized them using label_binarize, but I am still getting this error. The traceback points at the y_score line. To those who want to clear up my doubt: please don't send me the iris-data code, I can look that up on the sklearn website; just give me an explanation, or write your code as a solution to this problem. I am really thankful to those who will answer. Sorry if I made any mistakes in posting questions; I am new to Stack Overflow and new to Python and machine learning.
Thank you
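For reference (not from the original thread): the error occurs because SVC.fit expects a 1d label vector, while label_binarize has turned y into a (n_samples, 3) one-hot matrix. A common route to per-class ROC curves is to wrap the estimator in OneVsRestClassifier, which does accept the binarized y; a minimal sketch reusing the names from the question, with a trimmed-down SVC:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import roc_curve, auc

# OneVsRestClassifier fits one binary SVC per column of the binarized y
clf = OneVsRestClassifier(SVC(kernel="rbf", C=2.33, gamma=0.00046, random_state=1))
y_score = clf.fit(X_train_mm, y_train).decision_function(X_test_mm)

# One ROC curve per class column
for k in range(n_classes):
    fpr, tpr, _ = roc_curve(y_test[:, k], y_score[:, k])
    print(f"class {k + 1}: AUC = {auc(fpr, tpr):.3f}")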

Error when calculating predicted values of polynomial regression in Python

I am trying to calculate predicted values after running a polynomial regression in Python using the following code:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n)/5
y = np.sin(x) + x/6 + np.random.randn(n)/10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)
X = X_train.reshape(-1, 1)
X_predict = np.linspace(0, 10, 100)
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X)
model = LinearRegression()
reg_poly = model.fit(X_train_poly, y_train)
y_predict = model.predict(X_predict)
After running it I get the following error:
ValueError: Expected 2D array, got 1D array instead:
array=[ 0. 0.1010101 0.2020202 0.3030303 0.4040404 0.50505051 ......
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I tried reshaping the array as suggested in the error message, so the last line of code became:
y_predict = model.predict(X_predict.reshape(-1,1))
But as a result I got this error:
ValueError: shapes (100,1) and (3,) not aligned: 1 (dim 1) != 3 (dim 0)
Can someone please explain what I am doing wrong?
You forgot to prepare the data for your prediction in the same way you prepared the training data for the model. In particular, you forgot to fit_transform your X_predict with PolynomialFeatures.
Since the shape of the data used for prediction has to exactly match the shape used for training, you need to recreate for X_predict everything you did to produce X_train_poly (which you used for training). Therefore your last line should look like:
y_predict = model.predict(poly.fit_transform(X_predict.reshape(-1, 1)))
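As the first answer above points out, once poly has been fitted on the training data, calling transform() alone is the cleaner habit for prediction data; for PolynomialFeatures the result is identical either way, since the expansion depends only on the degree:
y_predict = model.predict(poly.transform(X_predict.reshape(-1, 1)))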

What is the problem with my code? linreg.predict() is not giving the right answer

So, the question given to me was:
Write a function that fits a polynomial LinearRegression model on the training data X_train for degrees 1, 3, 6, and 9. (Use PolynomialFeatures in sklearn.preprocessing to create the polynomial features and then fit a linear regression model) For each model, find 100 predicted values over the interval x = 0 to 10 (e.g. np.linspace(0,10,100)) and store this in a numpy array. The first row of this array should correspond to the output from the model trained on degree 1, the second row degree 3, the third row degree 6, and the fourth row degree 9.
I tried the problem myself and failed, then saw another person's GitHub code for the same assignment; it was very similar to mine, but it worked.
So what is the difference between my code and the other person's code?
Here is some basic setup code prior to my question:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n)/5
y = np.sin(x) + x/6 + np.random.randn(n)/10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)
Here is my approach:
pred = np.linspace(0, 10, 100).reshape(100, 1)
k = np.zeros((4, 100))
for count, i in enumerate([1, 3, 6, 9]):
    poly = PolynomialFeatures(degree=i)
    X_poly = poly.fit_transform(X_train.reshape(-1, 1))
    linreg = LinearRegression()
    linreg.fit(X_poly, y_train.reshape(-1, 1))
    pred = poly.fit_transform(pred.reshape(-1, 1))
    t = linreg.predict(pred)
    #print(t)  # used for debugging
    print("### **** ####")  # used for debugging
    k[count, :] = t.reshape(1, -1)
print(k)
Here is the code that works:
result = np.zeros((4, 100))
for i, degree in enumerate([1, 3, 6, 9]):
    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X_train.reshape(11, 1))
    linreg = LinearRegression().fit(X_poly, y_train)
    y = linreg.predict(poly.fit_transform(np.linspace(0, 10, 100).reshape(100, 1)))
    result[i, :] = y
print(result)
My approach raised an error:
     13     print("### **** ####")  # used for debugging
---> 14     k[count,:]=t.reshape(1,-1)
     15
     16
ValueError: could not broadcast input array from shape (200) into shape (100)
while the other code worked fine.
The difference lies in the argument to linreg.predict. You are overwriting your pred variable with the result of poly.fit_transform, which changes its shape from (100, 1) to (100, 2) in the first iteration of the loop. In the second iteration, pred.reshape(-1, 1) then turns that (100, 2) array into (200, 1), so t comes out with 200 rows and no longer fits into a row of k, resulting in the error you are facing. The working code never reassigns its prediction grid, so its shape stays (100, 1) in every iteration.
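A minimal fix along those lines, keeping the asker's variable names (only the transformed grid gets a new name, and transform() is used since poly is already fitted on the training data):
pred = np.linspace(0, 10, 100).reshape(100, 1)
k = np.zeros((4, 100))
for count, i in enumerate([1, 3, 6, 9]):
    poly = PolynomialFeatures(degree=i)
    X_poly = poly.fit_transform(X_train.reshape(-1, 1))
    linreg = LinearRegression().fit(X_poly, y_train.reshape(-1, 1))
    pred_poly = poly.transform(pred)  # pred itself stays (100, 1)
    t = linreg.predict(pred_poly)
    k[count, :] = t.reshape(1, -1)
print(k)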

How to fix sklearn multiple linear regression ValueError in python (inconsistent numbers of samples: [2, 1])

I had my linear regression working perfectly with a single feature. Since trying to use two features, I get the following error: ValueError: Found input variables with inconsistent numbers of samples: [2, 1]
The first print statement is printing the following:
(2, 6497) (1, 6497)
Then the code crashes at the train_test_split phase.
Any ideas?
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

feat_scores = {}
X = df[['alcohol','density']].values.reshape(2,-1)
y = df['quality'].values.reshape(1,-1)
print(X.shape, y.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
reg = LinearRegression()
reg.fit(X_train, y_train)
reg.predict(y_train)
Your mistake is in these lines:
X = df[['alcohol','density']].values.reshape(2,-1)
y = df['quality'].values.reshape(1,-1)
Don't reshape the data into (2, 6497) and (1, 6497); instead, pass it as (6497, 2) and (6497,): scikit-learn expects samples in rows and features in columns.
sklearn takes DataFrames/Series directly, so you could simply write:
X = df[['alcohol','density']]
y = df['quality']
Also, you can only predict from X values, hence:
reg.predict(X_train)
or
reg.predict(X_test)
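Putting the pieces together, a minimal sketch (df is assumed to be the wine-quality DataFrame from the question):
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X = df[['alcohol', 'density']]  # shape (6497, 2): samples in rows, features in columns
y = df['quality']               # shape (6497,)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
reg = LinearRegression().fit(X_train, y_train)
print(reg.predict(X_test)[:5])  # predict from the features, never from the targets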
