I have a DataFrame to work with, and when I try to split it into train and test sets, this error message pops up:
too many values to unpack (expected 2)
I set my target column, Global, as y and the rest of the columns as X for train_test_split. I'm not sure where to start fixing this issue.
X = df[['Year_of_Release', 'Critic_Score', 'Critic_Count',
'User_Score', 'User_Count', 'Platform_PC', 'Platform_PS3',
'Platform_PS4', 'Platform_Wii', 'Platform_X360',
'Platform_XOne', 'Genre_Action', 'Genre_Adventure', 'Genre_Fighting',
'Genre_Misc', 'Genre_Platform', 'Genre_Puzzle', 'Genre_Racing',
'Genre_Role-Playing', 'Genre_Shooter', 'Genre_Simulation',
'Genre_Sports', 'Genre_Strategy', 'Rating_E', 'Rating_E10+', 'Rating_M',
'Rating_RP', 'Rating_T']]
y = df[['Global']]
print(X.shape)
print(y.shape)
from sklearn.model_selection import train_test_split
X_train, X_test = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=42)
X_train, X_val = train_test_split(X_train, train_size=0.8, test_size=0.2, random_state=42)
target = 'Global'
y_train = X_train[target]
y_val = X_val[target]
y_test = X_test[target]
X_train = X_train.drop(columns=target)
X_val = X_val.drop(columns=target)
X_test = X_test.drop(columns=target)
X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-75-d3fede999d7b> in <module>()
1 from sklearn.model_selection import train_test_split
2
----> 3 X_train, X_test = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=42)
4
5
ValueError: too many values to unpack (expected 2)
X = df[['Year_of_Release', 'Critic_Score', 'Critic_Count',
'User_Score', 'User_Count', 'Platform_PC', 'Platform_PS3',
'Platform_PS4', 'Platform_Wii', 'Platform_X360',
'Platform_XOne', 'Genre_Action', 'Genre_Adventure', 'Genre_Fighting',
'Genre_Misc', 'Genre_Platform', 'Genre_Puzzle', 'Genre_Racing',
'Genre_Role-Playing', 'Genre_Shooter', 'Genre_Simulation',
'Genre_Sports', 'Genre_Strategy', 'Rating_E', 'Rating_E10+', 'Rating_M',
'Rating_RP', 'Rating_T']]
y = df[['Global']]
print(X.shape)
print(y.shape)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
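When you pass it two arrays (X and y), train_test_split from sklearn returns four values, in this order: X_train, X_test, y_train, y_test. Unpacking them into only two names is what raises the error.
Refer to the official documentation here.
Also, you only need to specify one of test_size or train_size; the other is inferred as its complement.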
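As a rough sketch of how the three-way split from the question could look with four-value unpacking: the small DataFrame below is made-up stand-in data (only the column name Global comes from the question), and the second call splits the training portion again to carve out a validation set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the asker's DataFrame; only "Global" is taken from the question.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Critic_Score": rng.integers(0, 100, size=100),
    "User_Score": rng.random(100) * 10,
    "Global": rng.random(100),
})

X = df.drop(columns="Global")  # features
y = df["Global"]               # target as a 1D Series

# First split: 80% train, 20% test. Two inputs give four outputs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Second split: take 20% of the training portion as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42
)

print(X_train.shape, y_train.shape)
print(X_val.shape, y_val.shape)
print(X_test.shape, y_test.shape)
```

Because y is passed to train_test_split directly, there is no need to pull the target column back out of X_train afterwards.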
I want to do image classification using KNN. I followed https://pythonprogramming.net/loading-custom-data-deep-learning-python-tensorflow-keras/ to make a model. I have 20 images: 10 in the dog category and 10 in the cat category. I'm having trouble feeding the model into the KNN algorithm; there is a problem in my code. This is my code:
knn_model=KNeighborsClassifier(n_neighbors=3) #define K=3
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
predict_knn=knn_model.predict(X_test)
print(predict_knn)
There is an error: Found input variables with inconsistent numbers of samples: [60, 20]
I need your opinion on how to fix this code. Thank you.
The problem is probably that X and y contain different numbers of samples.
1. len(y) == 20
# Works
import numpy as np
from sklearn.model_selection import train_test_split
X, y = np.arange(20*32*32*3).reshape((20, 32, 32, 3)), list(range(20))
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
2. len(y) == 60
# Does not work
X, y = np.arange(20*32*32*3).reshape((20, 32, 32, 3)), list(range(60))
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
The second script raises a ValueError because X has 20 samples while y has 60.
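A minimal sketch of how the pieces could fit together once X and y have the same length. The random arrays are placeholders rather than the asker's images, and flattening each image to a 1D feature vector is an assumption about how the data would be fed to KNeighborsClassifier, which expects 2D input. The classifier also has to be fitted before predict is called.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data: 20 random "images" of 32x32x3, 10 per class (0 = dog, 1 = cat).
rng = np.random.default_rng(0)
X = rng.random((20, 32, 32, 3))
y = np.array([0] * 10 + [1] * 10)

# Guard against the "inconsistent numbers of samples" error up front.
if len(X) != len(y):
    raise ValueError(f"X has {len(X)} samples but y has {len(y)}")

# Flatten each image into one row of 32*32*3 features.
X_flat = X.reshape(len(X), -1)

X_train, X_test, y_train, y_test = train_test_split(
    X_flat, y, test_size=0.3, random_state=0
)

knn_model = KNeighborsClassifier(n_neighbors=3)  # K = 3
knn_model.fit(X_train, y_train)                  # fit before predicting
predict_knn = knn_model.predict(X_test)
print(predict_knn.shape)
```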
I am trying to split data into training and validation sets using train_test_split from the cuml.preprocessing.model_selection module,
but I got an error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-317-4e11838456ea> in <module>
----> 1 X_train, X_test, y_train, y_test = train_test_split(train_dfIF,train_y, test_size=0.20, random_state=42)
/opt/conda/lib/python3.7/site-packages/cuml/preprocessing/model_selection.py in train_test_split(X, y, test_size, train_size, shuffle, random_state, seed, stratify)
454 X_train = X.iloc[0:train_size]
455 if y is not None:
--> 456 y_train = y.iloc[0:train_size]
457
458 if hasattr(X, "__cuda_array_interface__") or \
AttributeError: 'cupy.core.core.ndarray' object has no attribute 'iloc'
I am not using iloc myself, though.
Here is the code:
from cuml.preprocessing.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(train_dfIF,train_y, test_size=0.20, random_state=42)
Here train_dfIF is a cudf DataFrame and train_y is a cupy array.
You cannot (currently) pass an array to the y parameter if your X parameter is a dataframe. I would recommend passing two dataframes or two arrays, not one of each.
from cuml.preprocessing.model_selection import train_test_split
import cudf
import cupy as cp
df = cudf.DataFrame({
    "a": range(5),
    "b": range(5),
})
y = cudf.Series(range(5))
# train_test_split(df, y.values, test_size=0.20, random_state=42) # fail
X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.20, random_state=42) # succeed
X_train, X_test, y_train, y_test = train_test_split(df.values, y.values, test_size=0.20, random_state=42) # succeed
from sklearn.model_selection import train_test_split
X = data.drop('Vickers Hardness\n(HV0.5)', axis=1)
y = data['Vickers Hardness\n(HV0.5)']
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size = 0.3)
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
ValueError: y should be a 1d array, got an array of shape (3, 5) instead.
How can I fix this error with naive Bayes? How can I make y a 1D array?
The return values of train_test_split are unpacked in the wrong order. Use:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
I have a problem with a 70-30 split in sklearn. I get an error on this line:
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
The error is:
Found input variables with inconsistent numbers of samples
Context
from imblearn.over_sampling import SMOTE
sm = SMOTE(k_neighbors = 1)
X = data.drop('cluster',axis=1)
y = data['cluster']
X_smote, y_smote= sm.fit_sample(X,y)
data_bal = pd.DataFrame(columns=X.columns.values, data=X_smote)
data_bal['cluster']=y_smote
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
y_train.value_counts().plot(kind='bar')
Edit
I solved the error: I just had to change stratify=y to stratify=y_smote.
Just an observation on your line of code:
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
This error typically means that one input does not have the same length or dimensions as the other inputs.
Check the lengths and/or dimensions of X_smote, y_smote, and y to see whether they are all as expected.
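To illustrate the length check, here is a small sketch that uses plain NumPy arrays in place of real SMOTE output. The names X_res, y_res, and y_orig are made up for this example; the point is only that the array passed to stratify must have as many entries as the arrays being split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins for resampled data: 20 samples, balanced classes.
X_res = np.arange(40).reshape(20, 2)
y_res = np.array([0] * 10 + [1] * 10)

# Stand-in for the original, shorter label array (12 samples, imbalanced).
y_orig = np.array([0] * 10 + [1] * 2)

print(len(X_res), len(y_res), len(y_orig))

# Stratifying on the shorter array fails with a ValueError.
mismatch_error = None
try:
    train_test_split(X_res, y_res, test_size=0.3, stratify=y_orig)
except ValueError as exc:
    mismatch_error = exc
    print("Failed:", exc)

# Stratifying on the labels that belong to X_res works.
X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.3, stratify=y_res, random_state=0
)
print(np.bincount(y_train), np.bincount(y_test))
```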
I had the same issue, but when I changed
x_train,y_train,x_test,y_test = train_test_split(x,y,test_size=0.25,random_state=42)
to
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.25,random_state=42)
the error went away.
I'm having a few issues with the LinearRegression algorithm in scikit-learn. I have trawled through the forums and Googled a lot, but for some reason I haven't managed to get past the error. I am using Python 3.5.
Below is what I've attempted, but I keep getting a ValueError: "Found input variables with inconsistent numbers of samples: [403, 174]"
X = df[["Impressions", "Clicks", "Eligible_Impressions", "Measureable_Impressions", "Viewable_Impressions"]].values
y = df["Total_Conversions"].values.reshape(-1,1)
print ("The shape of X is {}".format(X.shape))
print ("The shape of y is {}".format(y.shape))
The shape of X is (577, 5)
The shape of y is (577, 1)
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print (y_pred)
print ("The shape of X_train is {}".format(X_train.shape))
print ("The shape of y_train is {}".format(y_train.shape))
print ("The shape of X_test is {}".format(X_test.shape))
print ("The shape of y_test is {}".format(y_test.shape))
The shape of X_train is (403, 5)
The shape of y_train is (174, 5)
The shape of X_test is (403, 1)
The shape of y_test is (174, 1)
Am I missing something glaringly obvious?
Any help would be greatly appreciated.
Kind Regards,
Adrian
It looks like your train and test sets contain different numbers of rows for X and y. That's because you're storing the return values of train_test_split() in the wrong order.
Change this:
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
To this:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
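A quick way to see the effect is to compare shapes under both unpacking orders. The zero-filled arrays below are placeholders with the same shapes as in the question, not the asker's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the shapes reported in the question.
X = np.zeros((577, 5))
y = np.zeros((577, 1))

# Wrong order: the second name receives X's test rows, the third y's train rows.
X_train_bad, y_train_bad, X_test_bad, y_test_bad = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train_bad.shape, y_train_bad.shape)  # row counts differ, so fit() fails

# Correct order: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```

With the wrong order, the variable named y_train actually holds the test rows of X, which is where the (174, 5) shape in the question comes from.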