Sklearn | LinearRegression | Fit - python

I'm having a few issues with LinearRegression algorithm in Scikit Learn - I have trawled through the forums and Googled a lot, but for some reason, I haven't managed to bypass the error. I am using Python 3.5
Below is what I've attempted, but keep getting a value error:"Found input variables with inconsistent numbers of samples: [403, 174]"
X = df[["Impressions", "Clicks", "Eligible_Impressions", "Measureable_Impressions", "Viewable_Impressions"]].values
y = df["Total_Conversions"].values.reshape(-1,1)
print ("The shape of X is {}".format(X.shape))
print ("The shape of y is {}".format(y.shape))
The shape of X is (577, 5)
The shape of y is (577, 1)
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print (y_pred)
print ("The shape of X_train is {}".format(X_train.shape))
print ("The shape of y_train is {}".format(y_train.shape))
print ("The shape of X_test is {}".format(X_test.shape))
print ("The shape of y_test is {}".format(y_test.shape))
The shape of X_train is (403, 5)
The shape of y_train is (174, 5)
The shape of X_test is (403, 1)
The shape of y_test is (174, 1)
Am I missing something glaringly obvious?
Any help would be greatly appreciated.
Kind Regards,
Adrian

Looks like your Train and Tests contain different number of rows for X and y. And its because you're storing the return values of train_test_split() in the incorrect order
Change this
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
To this
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)

Related

how to input the model into the KNN classification algorithm?

I want to make image clasification using KNN. i use https://pythonprogramming.net/loading-custom-data-deep-learning-python-tensorflow-keras/ to make a model. i have 20 image which 10 image in dog category and 10 image in cat category. I'm having trouble entering the model into the KNN algorithm,there is a problem in my coding. this is my code:
knn_model=KNeighborsClassifier(n_neighbors=3) #define K=3
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
predict_knn=knn_model.predict(X_test)
print(predict_knn)
there is an error : found input variables with inconsistent numbers of samples: [60, 20]
I need your opinion how to fix this code. thank you.
The problem could be due to the inconsistent sample size of X and y.
1. len(y) == 20
# Works
import numpy as np
from sklearn.model_selection import train_test_split
X, y = np.arange(20*32*32*3).reshape((20, 32, 32, 3)), list(range(20))
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
2. len(y) == 60
# Does not work
X, y = np.arange(20*32*32*3).reshape((20, 32, 32, 3)), list(range(60))
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
The second script produces the below error.

How to solve "ValueError: y should be a 1d array, got an array of shape (3, 5) instead." for naive Bayes?

from sklearn.model_selection import train_test_split
X = data.drop('Vickers Hardness\n(HV0.5)', axis=1)
y = data['Vickers Hardness\n(HV0.5)']
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size = 0.3)
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
ValueError: y should be a 1d array, got an array of shape (3, 5) instead.
Used data:
How to rectify this error in naive bayes? how can I put y in 1D array?
The assignments of the train/test split are not ordered right, use:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)

How to split a tuple using train_test_split?

X = (569,30)
y = (569,)
X_train, X_test, y_train, y_test = train_test_split(np.asarray(X),np.asarray(y),test_size = 0.25, random_state=0)
I am expecting output as below:
X_train has shape (426, 30)
X_test has shape (143, 30)
y_train has shape (426,)
y_test has shape (143,)
But i am getting the following warning
ValueError: Found input variables with inconsistent numbers of samples: [2, 1]
I know that, i can get the desired output in another way, all the problems found in the online show that lengths of X and y are not same but in my case that's not the problem.
It seems that you're misunderstanding what train_test_split does. It is not expecting the shapes of the input arrays, what it does is to split the input arrays into train and test sets. So you must feed it the actual arrays, for instace:
X = np.random.rand(569,30)
y = np.random.randint(0,2,(569))
X_train, X_test, y_train, y_test = train_test_split(np.asarray(X),np.asarray(y),test_size = 0.25, random_state=0)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(426, 30)
(143, 30)
(426,)
(143,)

too many values to unpack (expected 2)?

I got dataframe to work with and when i test/split the data this error msg pops up
too many values to unpack (expected 2)
I just set my target columns Global as y value and rest of the columns as X for train_test_split. Not sure where to start fix this issues
X = df[['Year_of_Release', 'Critic_Score', 'Critic_Count',
'User_Score', 'User_Count', 'Platform_PC', 'Platform_PS3',
'Platform_PS4', 'Platform_Wii', 'Platform_X360',
'Platform_XOne', 'Genre_Action', 'Genre_Adventure', 'Genre_Fighting',
'Genre_Misc', 'Genre_Platform', 'Genre_Puzzle', 'Genre_Racing',
'Genre_Role-Playing', 'Genre_Shooter', 'Genre_Simulation',
'Genre_Sports', 'Genre_Strategy', 'Rating_E', 'Rating_E10+', 'Rating_M',
'Rating_RP', 'Rating_T']]
y = df[['Global']]
print(X.shape)
print(y.shape)
from sklearn.model_selection import train_test_split
X_train, X_test = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=42)
X_train, X_val = train_test_split(X_train, train_size=0.8, test_size=0.2, random_state=42)
target = 'Global'
y_train = X_train[target]
y_val = X_val[target]
y_test = X_test[target]
X_train = X_train.drop(columns=target)
X_val = X_val.drop(columns=target)
X_test = X_test.drop(columns=target)
X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-75-d3fede999d7b> in <module>()
1 from sklearn.model_selection import train_test_split
2
----> 3 X_train, X_test = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=42)
4
5
ValueError: too many values to unpack (expected 2)
X = df[['Year_of_Release', 'Critic_Score', 'Critic_Count',
'User_Score', 'User_Count', 'Platform_PC', 'Platform_PS3',
'Platform_PS4', 'Platform_Wii', 'Platform_X360',
'Platform_XOne', 'Genre_Action', 'Genre_Adventure', 'Genre_Fighting',
'Genre_Misc', 'Genre_Platform', 'Genre_Puzzle', 'Genre_Racing',
'Genre_Role-Playing', 'Genre_Shooter', 'Genre_Simulation',
'Genre_Sports', 'Genre_Strategy', 'Rating_E', 'Rating_E10+', 'Rating_M',
'Rating_RP', 'Rating_T']]
y = df[['Global']]
print(X.shape)
print(y.shape)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
train_test_split from sklearn returns 4 values X_Train and y_train along with X_test y_test.
Refer to the official documentation here.
Also you should specify either of the test size or train size .

Value Error faced during my logistic regression code

I am getting value error related to shape of input when I am traning logistic model
titanic_data = pd.read_csv("E:\\Python\\CSV\\train.csv")
titanic_data.drop('Cabin', axis=1, inplace=True)
titanic_data.dropna(inplace=True)
#print(titanic_data.head(10))
new_sex = pd.get_dummies(titanic_data['Sex'],drop_first=True)
new_embarked = pd.get_dummies(titanic_data['Embarked'],drop_first=True)
new_pcl = pd.get_dummies(titanic_data['Pclass'],drop_first=True)
titanic_data = pd.concat([titanic_data,new_sex,new_embarked,new_pcl],axis=1)
titanic_data.drop(['PassengerId','Pclass','Name','Sex','Ticket','Embarked','Age','Fare'],axis=1,inplace=True)
X = titanic_data.drop(['Survived'],axis=1)
y = titanic_data['Survived']
print(X)
print(y)
X_train, y_train, X_test, y_test = train_test_split(X,y,test_size=0.3, random_state=1)
logreg = LogisticRegression()
logreg.fit(X_train,y_train)
Error
raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (214, 7)
you are unpacking your split data in to the wrong variables the order should be as follows:
X_train, X_test, y_train, y_test = train_test_split(...)
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Categories

Resources