Print predict ValueError: Expected 2D array, got 1D array instead - python

The error is raised by the last two lines of my code:
ValueError: Expected 2D array, got 1D array instead: array=[0 1].
Reshape your data either using array.reshape(-1, 1) if your data has a
single feature or array.reshape(1, -1) if it contains a single sample.
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit
%matplotlib inline
df = pd.read_csv('.......csv')
df.drop(['Company'], axis=1, inplace=True)
x = pd.DataFrame(df.drop(['R&D Expense'], axis=1))
y = pd.DataFrame(df['R&D Expense'])
X_test = x.index[[0,1]]
y_test = y.index[[0,1]]
X_train = x.drop(x.index[[0,1]])
y_train = y.drop(y.index[[0,1]])
from sklearn.metrics import r2_score
def performance_metric(y_true, y_predict):
    score = r2_score(y_true, y_predict)
    return score
from sklearn.metrics import make_scorer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
def fit_model_shuffle(x, y):
    cv_sets = ShuffleSplit(n_splits=10, test_size=0.20, random_state=0)
    regressor = KNeighborsRegressor()
    params = {'n_neighbors': range(3, 10)}
    scoring_fnc = make_scorer(performance_metric)
    grid = GridSearchCV(regressor, param_grid=params, scoring=scoring_fnc, cv=cv_sets)
    grid = grid.fit(x, y)
    return grid.best_estimator_
reg = fit_model_shuffle(X_train, y_train)
for i, y_predict in enumerate(reg.predict(X_test), 1):
    print(i, y_predict)

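The error message points at the cause: scikit-learn expects the input to be a 2D array with one sample per row. In your code, X_test = x.index[[0,1]] holds the row labels [0, 1] rather than the rows themselves, which is why the message shows array=[0 1]; selecting the rows with x.iloc[[0, 1]] gives a 2D input. More generally, if you are doing regression with a single input feature, reshape the data before passing it to the regressor:
my_data = my_data.reshape(-1, 1)
This gives an array of shape (n_samples, 1), i.e. 2x1 for two samples.
On the other hand (unlikely here), if [0, 1] is a single sample with two features, use
my_data = my_data.reshape(1, -1)
to get a 1x2 array.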
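As a minimal sketch of the two reshapes (the array values are made up for illustration and are not taken from the asker's data):

```python
import numpy as np

# Hypothetical 1D data: two values, shape (2,)
my_data = np.array([0.5, 1.5])

# Two samples with one feature each -> shape (2, 1)
as_column = my_data.reshape(-1, 1)

# One sample with two features -> shape (1, 2)
as_row = my_data.reshape(1, -1)

print(my_data.shape, as_column.shape, as_row.shape)
```

Which of the two is right depends on what the values mean: separate observations of one feature, or several features of one observation.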

Related

Getting 'ValueError: Expected 2D array, got 1D array instead: array=[-0.27861589].' in python for predicting SVR regression

I am having trouble with an array dimension problem in the code below. When I try to compute y_predict, a ValueError is raised. Here is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#importing dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:,1:2].values
y = dataset.iloc[:,2].values
y=np.reshape(y,(10,1))
#Splitting dataset into training set and test set
'''from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state = 0)'''
#Feature scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)
######## SVR regression
from sklearn.svm import SVR
svr_regressor = SVR(kernel='rbf') #rbf = gaussian kernel
svr_regressor.fit(X, y)
#Prediction of given value using SVR regression
X = np.reshape(X,(-1, 1))
y_predict = sc_y.inverse_transform(svr_regressor.predict(sc_X.transform([[6.5]])))
########### Visualization of svr model
plt.scatter(X, y, color = 'blue')
plt.plot(X, svr_regressor.predict(X), color = 'red')
plt.show()
I am getting this error:
ValueError: Expected 2D array, got 1D array instead:
array=[-0.27861589].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
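The one-element array in the message looks like the output of svr_regressor.predict, which is 1D, being handed to sc_y.inverse_transform, which expects 2D. A minimal sketch of one way around it, assuming made-up numbers in place of Position_Salaries.csv:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Made-up stand-in for the CSV: 10 levels and salaries (assumption, not the real data)
X = np.arange(1, 11, dtype=float).reshape(-1, 1)   # shape (10, 1)
y = (X.ravel() ** 2) * 1000.0                      # shape (10,)

sc_X = StandardScaler()
sc_y = StandardScaler()
X_scaled = sc_X.fit_transform(X)
# Scalers need 2D input, so scale y as a column, then flatten it again for fit()
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

svr_regressor = SVR(kernel='rbf')
svr_regressor.fit(X_scaled, y_scaled)

# predict() returns a 1D array of shape (1,)
scaled_pred = svr_regressor.predict(sc_X.transform([[6.5]]))
# inverse_transform() expects 2D, so reshape to a column first
y_predict = sc_y.inverse_transform(scaled_pred.reshape(-1, 1))

print(scaled_pred.shape, y_predict.shape)
```

The only change that matters for the error is the reshape(-1, 1) before inverse_transform; flattening y for fit just avoids a shape warning.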

I keep getting this error: y should be a 1d array, got an array of shape (576, 8) instead

I'm trying to train and test on my data using MLPRegressor, but I always end up with the same error on the line "classifier.fit(xtrain, ttrain)". I'm relatively new to Python, so this has had me stuck for a while now.
I've tried ravel() and reshaping, but neither has worked.
Does anyone have any good advice?
data = pd.read_excel("data.xlsx")
print(data)
from sklearn import preprocessing
from sklearn.preprocessing import MinMaxScaler
inputs = data.values[:,:8].astype(float)
#Normalize the inputs
scaler = MinMaxScaler()
scaled = scaler.fit_transform(inputs)
print(inputs.ptp(axis=0))
print(scaled.ptp(axis=0))
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_excel("ENB2012_data.xlsx")
x = df.iloc[:,1:2].values
y = df.iloc[:,2].values
df
from sklearn.neural_network import MLPRegressor
regressor = MLPRegressor(max_iter=5000)
regressor.fit(inputs, scaled)
outputs = regressor.predict(inputs)
print("MLP Regressor: \n", outputs)
from sklearn.metrics import mean_absolute_error
regressor = MLPRegressor(max_iter=5000)
regressor.fit(inputs, scaled)
outputs = regressor.predict(inputs)
print(mean_absolute_error(outputs, scaled))
from numpy.lib.shape_base import split
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.base import ClassifierMixin
#split the data to training and testing
xtrain, xtest, ttrain, ttest = train_test_split(outputs, scaled)
#train the classifiers
classifier = SVC(gamma = "auto")
classifier.fit(xtrain, ttrain)
ytrain = classifier(xtrain)
ytest = classifier.predict(xtest)
Edit:
ValueError                                Traceback (most recent call last)
<ipython-input-229-62670b265a4b> in <module>()
      4 #train the classifiers
      5 classifier = SVC(gamma = "auto")
----> 6 classifier.fit(xtrain, ttrain)
      7 ytrain = classifier(xtrain)
      8 ytest = classifier.predict(xtest)
4 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in column_or_1d(y, warn)
   1037
   1038     raise ValueError(
   1039         "y should be a 1d array, got an array of shape {} instead.".format(shape)
   1040     )
   1041
ValueError: y should be a 1d array, got an array of shape (576, 8) instead.
While we need more context to answer your question fully, start by using np.shape(inputs) to check the shape of your input variable.
Check right after this line:
inputs = data.values[:,:8].astype(float)
# Check with np.shape
print(np.shape(inputs))
# Expected output:
...we need to know this
The traceback says that the target passed to fit (ttrain, which comes from scaled) must be a 1-dimensional array, but it has shape (576, 8). Here are a few examples:
# This has shape (3,) (1-dimensional).
np.shape([0, 1, 2])
# This has shape (3, 3) (2-dimensional).
np.shape([[0, 1, 2],[3, 4, 5],[6, 7, 8]])
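To make the shape check concrete, here is a small sketch using a zero-filled stand-in with the same (576, 8) shape as in the traceback. It also hints at why ravel() may not have helped: flattening eight columns produces eight times as many values as there are rows. Picking column 0 below is purely illustrative; which column is the real target is something only the asker can say.

```python
import numpy as np

# Stand-in for a target with 8 columns, like the (576, 8) array in the traceback
targets_2d = np.zeros((576, 8))
print(targets_2d.ndim, targets_2d.shape)

# ravel() flattens it, but yields 576 * 8 values, which no longer matches 576 input rows
flattened = targets_2d.ravel()
print(flattened.shape)

# Selecting a single column gives one target value per row
one_column = targets_2d[:, 0]
print(one_column.shape)
```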

Expected 2d array but got scalar array instead

I am getting this error
ValueError: Expected 2D array, got scalar array instead: array=6.5.
Reshape your data either using array.reshape(-1, 1) if your data has a
single feature or array.reshape(1, -1) if it contains a single sample.
while executing this code
# SVR
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import SVR
# Load dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Fitting the SVR to the data set
regressor = SVR(kernel = 'rbf', gamma = 'auto')
regressor.fit(X, y)
# Predicting a new result
y_pred = regressor.predict(6.5)
You need to understand how an SVM works. Your training data is a matrix of shape (n_samples, n_features), so the SVM operates in a feature space of n_features dimensions. It cannot predict a value for a scalar input; predict expects a 2D array of shape (n_samples, n_features), even when n_features is 1. For the code above that means passing [[6.5]] instead of 6.5. In general, if your data set has 5 feature columns, you can predict values for any row vector of 5 columns. See the example below.
import numpy as np
from sklearn.svm import SVR
# Data: 200 instances of 5 features each
X = np.random.randint(1, 100, size=(200, 5))
y = np.random.randint(0, 2, size=200)
reg = SVR()
reg.fit(X, y)
X_new = np.array([[0, 1, 2, 3, 4]]) # Input to .predict must be 2-dimensional
reg.predict(X_new)
# Predicting a new result with Linear Regression
# (lin_reg, lin_reg_2 and poly_reg are not defined in this snippet; they are assumed to be already fitted)
X_test = np.array([[6.5]])
print(lin_reg.predict(X_test))
# Predicting a new result with Polynomial Regression
print(lin_reg_2.predict(poly_reg.fit_transform(X_test)))
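If the value to predict starts out as a bare scalar, there are a few equivalent ways to turn it into a single sample with a single feature. This is only a NumPy sketch of the shapes involved, not a full prediction example:

```python
import numpy as np

value = 6.5  # a bare scalar, as passed to predict() in the question

# Three equivalent ways to get one sample with one feature, shape (1, 1)
a = np.array([[value]])
b = np.array(value).reshape(1, -1)
c = np.atleast_2d(value)

print(a.shape, b.shape, c.shape)
```

Any of these can then be passed to predict in place of the scalar.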

Expected 2D array, got 1D array instead, any solution?

I'm new to machine learning and I am trying to predict stock prices 30 days ahead.
This is my code:
import pandas as pd
import matplotlib.pyplot as plt
import pymysql as MySQLdb
import numpy as np
import sqlalchemy
import datetime
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split
forecast_out = int(30)
df['Prediction'] = df[['LastPrice']].shift(-forecast_out)
df['Prediction'].fillna(0)
X = np.array(df['Prediction'].fillna(0))
X = preprocessing.scale(X)
X_forecast = X[-forecast_out:]
X = X[:-forecast_out]
y = np.array(df['Prediction'].fillna(0))
y = y[:-forecast_out]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
X_train, X_test, y_train, y_test.reshape(-1,1)
# Training
clf = LinearRegression()
clf.fit(X_train,y_train)
# Testing
confidence = clf.score(X_test, y_test)
print("confidence: ", confidence)
forecast_prediction = clf.predict(X_forecast)
print(forecast_prediction)
I got this error:
ValueError: Expected 2D array, got 1D array instead:
array=[-0.46939923 -0.47076913 -0.47004993 ... -0.42782272 3.07433019 -0.46573474].
Reshape your data either using
array.reshape(-1, 1) if your data has a single feature
or
array.reshape(1, -1) if it contains a single sample.
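It's expecting a 2D array when you're only passing in a 1D array. One way to get a 2D input is to put another set of brackets around the argument where the problem occurs. For example
x = [1,2,3,4]
Foo(x)
If that throws the error, you could do
Foo([x])
Note that [x] has shape (1, 4), which is one sample with four features. If x holds many samples of a single feature, as in the question, use np.array(x).reshape(-1, 1) instead, as the error message suggests.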
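To see what the extra brackets do to the shape, and how that differs from the reshape(-1, 1) suggested in the error message, here is a small sketch with made-up values:

```python
import numpy as np

x = np.array([1, 2, 3, 4])   # shape (4,)

wrapped = np.array([x])      # extra brackets: shape (1, 4), one sample with four features
column = x.reshape(-1, 1)    # shape (4, 1), four samples with one feature

print(wrapped.shape, column.shape)
```

Both are 2D, so both get past the dimension check, but only one of them matches what the data actually represents.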

Naivebayes MultinomialNB scikit-learn/sklearn

I am building a naive Bayes classifier, following the tutorial on the scikit-learn website.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time
import csv
import string
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Importing dataset
data = pd.read_csv("test.csv", quotechar='"', delimiter=',',quoting=csv.QUOTE_ALL, skipinitialspace=True,error_bad_lines=False)
df2 = data.set_index("name", drop = False)
df2['sentiment'] = df2['rating'].apply(lambda rating : +1 if rating > 3 else -1)
train, test = train_test_split(df2, test_size=0.2)
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(train['review'])
test_matrix = count_vect.transform(test['review'])
clf = MultinomialNB().fit(X_train_tfidf, train['sentiment'])
The first argument is the vocabulary dictionary and it returns a document-term matrix.
What should the second argument, twenty_train.target, be?
Edit: data example
Name, review,rating
film1,......,1
film2, the film is....,5
film3, film about..., 4
With this instruction I created a new column: if the rating is > 3 the review is positive, otherwise it is negative.
df2['sentiment'] = df2['rating'].apply(lambda rating : +1 if rating > 3 else -1)
The fit method of MultinomialNB expects x and y as input.
Here x should be the training vectors (training data) and y should be the target values.
clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target)
In more detail:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is
    the number of features.
y : array-like, shape = [n_samples]
    Target values.
Note: Make sure that shape = [n_samples, n_features] and shape = [n_samples] of x and y are defined correctly. Otherwise, the fit will throw an error.
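For data shaped like the question's, the target would presumably be the sentiment column rather than twenty_train.target. The sketch below assumes a handful of made-up reviews and ratings in place of the CSV, and fits on the count matrix directly:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up reviews and ratings, standing in for the asker's CSV (assumption)
reviews = ["great film", "terrible plot", "loved the film", "boring and terrible"]
ratings = [5, 1, 4, 2]
# Same rule as in the question: rating > 3 is positive (+1), otherwise negative (-1)
sentiment = [1 if r > 3 else -1 for r in ratings]

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(reviews)   # sparse matrix, shape (n_samples, n_features)

# y must have one label per row of X
assert X_train_counts.shape[0] == len(sentiment)

clf = MultinomialNB().fit(X_train_counts, sentiment)
pred = clf.predict(count_vect.transform(["great film"]))
print(X_train_counts.shape, pred)
```

New text has to go through the same fitted vectorizer with transform, not fit_transform, so that its columns line up with the training matrix.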
Toy example:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics
categories = ['alt.atheism', 'talk.religion.misc',
              'comp.graphics', 'sci.space']
newsgroups_train = fetch_20newsgroups(subset='train',
                                      categories=categories)
vectorizer = TfidfVectorizer()
# the following will be the training data
vectors = vectorizer.fit_transform(newsgroups_train.data)
vectors.shape
newsgroups_test = fetch_20newsgroups(subset='test',
                                     categories=categories)
# this is the test data
vectors_test = vectorizer.transform(newsgroups_test.data)
clf = MultinomialNB(alpha=.01)
# the fitting is done using the TRAINING data
# Check the shapes before fitting
vectors.shape
#(2034, 34118)
newsgroups_train.target.shape
#(2034,)
# fit the model using the TRAINING data
clf.fit(vectors, newsgroups_train.target)
# the PREDICTION is done using the TEST data
pred = clf.predict(vectors_test)
EDIT:
The newsgroups_train.target is just a numpy array that contains the labels (or targets or classes).
import numpy as np
newsgroups_train.target
array([1, 3, 2, ..., 1, 0, 1])
np.unique(newsgroups_train.target)
array([0, 1, 2, 3])
So in this example we have 4 different classes/targets.
This variable is needed in order to fit a classifier.
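If you also want to know how many samples fall into each class, np.unique can return the counts. A small sketch on a made-up label array (not the real newsgroups_train.target):

```python
import numpy as np

# Made-up label array standing in for newsgroups_train.target
target = np.array([1, 3, 2, 1, 0, 1])

# classes holds the distinct labels, counts how often each one occurs
classes, counts = np.unique(target, return_counts=True)
print(classes)
print(counts)
```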
