I got a ValueError from the train_test_split method - Python

This is my code:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
and this is what I got:
ValueError: Found input variables with inconsistent numbers of samples: [7, 5000]
I have no idea what happened. I ran it over and over and that is all I got.

What shapes do your arrays have? It seems that X and y have different lengths: the error reports 7 samples for X and 5000 for y.
Check with print(X.shape, y.shape).
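A minimal sketch of that check. It assumes X was stored transposed (features as rows), which is only one way to end up with [7, 5000]; the array shapes below are made up to reproduce the message and the real data may differ.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 5000 samples with 7 features, stored transposed by mistake.
X = np.zeros((7, 5000))
y = np.zeros(5000)

print(X.shape, y.shape)  # the first dimension of each must match

try:
    train_test_split(X, y, test_size=0.4, random_state=101)
    message = None
except ValueError as err:
    message = str(err)
    print(message)

# Transposing puts one sample per row, so the lengths agree.
X_fixed = X.T
X_train, X_test, y_train, y_test = train_test_split(
    X_fixed, y, test_size=0.4, random_state=101
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

If the shapes differ for another reason (for example, rows dropped from X but not from y), the fix is to rebuild both from the same filtered data rather than to transpose.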

Related

How to solve sklearn error: "Found input variables with inconsistent numbers of samples"?

I have a problem doing a 70-30 split with sklearn. I get an error on this line:
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
The error is:
Found input variables with inconsistent numbers of samples
Context
from imblearn.over_sampling import SMOTE
sm = SMOTE(k_neighbors = 1)
X = data.drop('cluster',axis=1)
y = data['cluster']
X_smote, y_smote= sm.fit_sample(X,y)
data_bal = pd.DataFrame(columns=X.columns.values, data=X_smote)
data_bal['cluster']=y_smote
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
y_train.value_counts().plot(kind='bar')
Edit
I solved the error: I just had to change stratify=y to stratify=y_smote.
Just an observation in your line of code:
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
This error is typically raised when one input has a different length (number of samples) from the others. Here stratify=y still refers to the original labels, while X_smote and y_smote are the resampled arrays.
Check the lengths of X_smote, y_smote and y to see if they are all as expected.
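The length mismatch can be reproduced without imblearn. The sketch below is an illustration only: it duplicates minority rows with NumPy as a hypothetical stand-in for SMOTE, and the names X_res and y_res are invented for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for SMOTE: the minority class is duplicated so that
# the resampled arrays are longer than the originals.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

minority = np.where(y == 1)[0]
extra = np.repeat(minority, 3)            # 6 extra minority rows
X_res = np.vstack([X, X[extra]])          # 16 rows
y_res = np.concatenate([y, y[extra]])     # 16 labels

# stratify=y has 10 entries but X_res has 16, so the split is rejected.
try:
    train_test_split(X_res, y_res, test_size=0.25, stratify=y)
    failed = False
except ValueError as err:
    failed = True
    print(err)

# stratify must be the labels of the arrays actually being split.
X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.25, stratify=y_res, random_state=0
)
print(np.bincount(y_train), np.bincount(y_test))
```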
I got the same issue, but when I changed
x_train,y_train,x_test,y_test = train_test_split(x,y,test_size=0.25,random_state=42)
to
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.25,random_state=42)
the error went away.

How to split a tuple using train_test_split?

X = (569,30)
y = (569,)
X_train, X_test, y_train, y_test = train_test_split(np.asarray(X),np.asarray(y),test_size = 0.25, random_state=0)
I am expecting output as below:
X_train has shape (426, 30)
X_test has shape (143, 30)
y_train has shape (426,)
y_test has shape (143,)
But I am getting the following error:
ValueError: Found input variables with inconsistent numbers of samples: [2, 1]
I know that I can get the desired output in another way. All the answers I found online say that the lengths of X and y are not the same, but in my case that's not the problem.
It seems that you're misunderstanding what train_test_split does. It does not expect the shapes of the input arrays; it splits the input arrays themselves into train and test sets. With X = (569,30) and y = (569,), np.asarray produces arrays of length 2 and 1, hence the [2, 1] in the error. So you must feed it the actual arrays, for instance:
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(569,30)
y = np.random.randint(0,2,(569))
X_train, X_test, y_train, y_test = train_test_split(np.asarray(X),np.asarray(y),test_size = 0.25, random_state=0)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(426, 30)
(143, 30)
(426,)
(143,)

How to fix sklearn multiple linear regression ValueError in python (inconsistent numbers of samples: [2, 1])

I had my linear regression working perfectly with a single feature. Ever since trying to use two I get the following error: ValueError: Found input variables with inconsistent numbers of samples: [2, 1]
The first print statement is printing the following:
(2, 6497) (1, 6497)
Then the code crashes at the train_test_split phase.
Any ideas?
feat_scores = {}
X = df[['alcohol','density']].values.reshape(2,-1)
y = df['quality'].values.reshape(1,-1)
print (X.shape, y.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print (X_train.shape, y_train.shape)
print (X_test.shape, y_test.shape)
reg = LinearRegression()
reg.fit(X_train, y_train)
reg.predict(y_train)
The mistake is in these lines:
X = df[['alcohol','density']].values.reshape(2,-1)
y = df['quality'].values.reshape(1,-1)
Don't reshape the data into (2, 6497) and (1, 6497). scikit-learn expects one row per sample, i.e. (6497, 2) and (6497,).
scikit-learn accepts DataFrames and Series directly, so you can pass:
X = df[['alcohol','density']]
y = df['quality']
Also, predict takes feature values (X), not targets, hence
reg.predict(X_train)
or
reg.predict(X_test)
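A small sketch of the corrected flow. The DataFrame is a hypothetical stand-in with random values (only the column names come from the question), and the extra check illustrates why reshape(2, -1) would be wrong even if the shape were accepted: it is not a transpose.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the wine data: 10 rows, same column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "alcohol": rng.uniform(8, 14, size=10),
    "density": rng.uniform(0.99, 1.0, size=10),
    "quality": rng.integers(3, 9, size=10),
})

X = df[["alcohol", "density"]]   # shape (10, 2): one row per sample
y = df["quality"]                # shape (10,)

# reshape(2, -1) is not a transpose: it refills the values in row-major
# order, so the first "row" mixes alcohol and density values.
scrambled = X.values.reshape(2, -1)
print(scrambled.shape, np.array_equal(scrambled, X.values.T))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
reg = LinearRegression().fit(X_train, y_train)
predictions = reg.predict(X_test)   # predict on features, not on y
print(predictions.shape)
```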

Pandas and scikit-learn - train_test_split dimensions of X, y

I have a pandas dataframe with the following info:
RangeIndex: 920 entries, 0 to 919 Data columns (total 41 columns)
X = df[df.columns[:-1]]
Y = df['my_Target']
train_X,train_y,test_X, test_y =train_test_split(X,Y,test_size=0.33,shuffle = True, random_state=45)
The last column is the target, and the rest is the data.
The shape is the following:
print(train_X.shape,train_y.shape,test_X.shape, test_y.shape)
(616, 40) (304, 40) (616,) (304,)
However when I train a model:
model=svm.SVC(kernel='linear',C=0.1,gamma=0.1)
model.fit(train_X,train_Y)
prediction2=model.predict(test_X)
print('Accuracy for linear SVM is',metrics.accuracy_score(prediction2,test_Y))
it gives the following error:
model.fit(train_X,train_Y)
ValueError: Found input variables with inconsistent numbers of samples: [616, 2]
Anyone got a hint about what is going on?
Your variables are in the wrong order. It should be:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
Per the docs, the return order is X_train, then X_test, then y_train, then y_test.
You have:
train_X,train_y,test_X, test_y
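To see the effect of the ordering, here is a sketch on placeholder arrays with the same dimensions as in the question (920 rows, 40 features); the zeros are stand-ins for the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data with the same shape as in the question.
X = np.zeros((920, 40))
y = np.zeros(920)

# Wrong order: the second name receives the X test split, not the y train split.
train_X, train_y, test_X, test_y = train_test_split(
    X, y, test_size=0.33, random_state=45
)
wrong_shapes = (train_X.shape, train_y.shape, test_X.shape, test_y.shape)
print(wrong_shapes)

# Documented order for two inputs: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=45
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

With the wrong order, the variable named train_y is really the 304-row test portion of X, which matches the shapes printed in the question.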

Sklearn | LinearRegression | Fit

I'm having a few issues with LinearRegression algorithm in Scikit Learn - I have trawled through the forums and Googled a lot, but for some reason, I haven't managed to bypass the error. I am using Python 3.5
Below is what I've attempted, but I keep getting a ValueError: "Found input variables with inconsistent numbers of samples: [403, 174]"
X = df[["Impressions", "Clicks", "Eligible_Impressions", "Measureable_Impressions", "Viewable_Impressions"]].values
y = df["Total_Conversions"].values.reshape(-1,1)
print ("The shape of X is {}".format(X.shape))
print ("The shape of y is {}".format(y.shape))
The shape of X is (577, 5)
The shape of y is (577, 1)
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print (y_pred)
print ("The shape of X_train is {}".format(X_train.shape))
print ("The shape of y_train is {}".format(y_train.shape))
print ("The shape of X_test is {}".format(X_test.shape))
print ("The shape of y_test is {}".format(y_test.shape))
The shape of X_train is (403, 5)
The shape of y_train is (174, 5)
The shape of X_test is (403, 1)
The shape of y_test is (174, 1)
Am I missing something glaringly obvious?
Any help would be greatly appreciated.
Kind Regards,
Adrian
It looks like your train and test sets contain different numbers of rows for X and y. That is because you're storing the return values of train_test_split() in the wrong order.
Change this
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
To this
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
