I received this error while practicing the Simple Linear Regression Model; I assume there is an issue with my set of data.
Here is the Error
ValueError: Expected 2D array, got 1D array instead:
array=[1140. 1635. 1755. 1354. 1978. 1696. 1212. 2736. 1055. 2839. 2325. 1688.
2733. 2332. 2159. 2133.].
Here is my Dataset
Here the code
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
df = pd.read_csv('C:/Users/AgroMech/Desktop/ASDS/data.csv')
df.shape
print(df.duplicated())
df.isnull().any()
df.isnull().sum()
df.dropna(inplace = True)
x=df["Area"]
y=df["Price"]
df.describe()
reg = linear_model.LinearRegression()
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=4)
x_train.head()
reg=LinearRegression()
reg.fit(x_train,y_train)
LinearRegression(copy_x=True, fit_intercept=True, n_jobs=1, normalize=False)
reg.coef_
reg.predict(x_test)
np.mean((reg.predict(x_test) - y_test)**2)
As the error suggests when executing reg.fit(x_train, y_train):
ValueError: Expected 2D array, got 1D array instead:
array=[1140. 1635. 1755. 1354. 1978. 1696. 1212. 2736. 1055. 2839. 2325. 1688.
2733. 2332. 2159. 2133.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
This means your arrays don't have the right shape for reg.fit(). You can reshape them explicitly:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=4)
x_train = x_train.values.reshape(-1,1)
x_test = x_test.values.reshape(-1,1)
y_train = y_train.values.reshape(-1,1)
y_test = y_test.values.reshape(-1,1)
or you can reshape your original x and y values:
x = df[['Area']]
y = df[['Price']]
Also note that LinearRegression takes a copy_X argument and not copy_x.
The easiest way to reshape your x variable (from a 1D array to a 2D) is:
x = df[["Area"]]
Related
Here's my code:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
california_housing = fetch_california_housing(as_frame=True)
data = california_housing.frame
X = data.drop(columns=['MedHouseVal'])
y = data['MedHouseVal']
model = LinearRegression()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
model.score(predictions, y_test)
Here's the error message:
ValueError: Expected 2D array, got 1D array instead: array=[0.71912284
1.76401657 2.70965883 ... 4.46877017 1.18751119 2.00940251]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
score needs to be called on the testing features and not output:
model.fit(X_train, y_train)
model.score(X_test, y_test)
y = pd.get_dummies(messages['label'])
y = y.iloc[:,1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,random_state = 0)
from sklearn.naive_bayes import MultinomialNB
spam_detect_model = MultinomialNB().fit(X_train, y_train)
y_pred = spam_detect_model.predict(y_test)
<after this getting this error ValueError: y should be a 1d array, got an array of shape (4457, 2) instead.>
First of all, directly under your y variable, do this to convert it to integer:
y = y.apply(lambda x: x.argmax(), axis=1).values
And y_pred = spam_detect_model.predict(X_test) not y_test.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv('C:/Users/Dell/Desktop/Salary.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3,
random_state=0)
from sklearn.linear_model import LinearRegression
simplelinearRegresson = LinearRegression()
simplelinearRegresson.fit(X_train, y_train)
y_predict = simplelinearRegresson.predict(X_test)
Below line has error:
y_predict_val = simplelinearRegresson.predict(11)
You need to convert your scalar to a 2D array with shape (number of samples, number of features).
y_predict_val = simplelinearRegresson.predict([[11]])
This is what the predict method expects. See docs for more info.
I'm new using Machine Learning and I am trying to predict the price of the stocks in 30 days.
This is my code:
import pandas as pd
import matplotlib.pyplot as plt
import pymysql as MySQLdb
import numpy as np
import sqlalchemy
import datetime
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split
forecast_out = int(30)
df['Prediction'] = df[['LastPrice']].shift(-forecast_out)
df['Prediction'].fillna(0)
X = np.array(df['Prediction'].fillna(0))
X = preprocessing.scale(X)
X_forecast = X[-forecast_out:]
X = X[:-forecast_out]
y = np.array(df['Prediction'].fillna(0))
y = y[:-forecast_out]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
X_train, X_test, y_train, y_test.reshape(-1,1)
# Training
clf = LinearRegression()
clf.fit(X_train,y_train)
# Testing
confidence = clf.score(X_test, y_test)
print("confidence: ", confidence)
forecast_prediction = clf.predict(X_forecast)
print(forecast_prediction)
I got this error:
ValueError: Expected 2D array, got 1D array instead:
array=[-0.46939923 -0.47076913 -0.47004993 ... -0.42782272 3.07433019 -0.46573474].
Reshape your data either using
array.reshape(-1, 1) if your data has a single feature
or
array.reshape(1, -1) if it contains a single sample.
It's expecting a 2D Array when you're only passing in a 1D Array. You can solve this by putting another set of brackets around where you're getting the probelm. For example
x = [1,2,3,4]
Foo(x)
If that throws the error, you could just do
Foo([x])
I would like to normalize a training and test data set using MinMaxScaler in sklearn.preprocessing. However, the package does not appear to be accepting my test data set.
import pandas as pd
import numpy as np
# Read in data.
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data',
header=None)
df_wine.columns = ['Class label', 'Alcohol', 'Malic acid', 'Ash',
'Alcalinity of ash', 'Magnesium', 'Total phenols',
'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins',
'Color intensity', 'Hue', 'OD280/OD315 of diluted wines',
'Proline']
# Split into train/test data.
from sklearn.model_selection import train_test_split
X = df_wine.iloc[:, 1:].values
y = df_wine.iloc[:, 0].values
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3,
random_state = 0)
# Normalize features using min-max scaling.
from sklearn.preprocessing import MinMaxScaler
mms = MinMaxScaler()
X_train_norm = mms.fit_transform(X_train)
X_test_norm = mms.transform(X_test)
When executing this, I get a DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. along with a ValueError: operands could not be broadcast together with shapes (124,) (13,) (124,).
Reshaping the data still yields an error.
X_test_norm = mms.transform(X_test.reshape(-1, 1))
This reshaping yields an error ValueError: non-broadcastable output operand with shape (124,1) doesn't match the broadcast shape (124,13).
Any input on how to get fix this error would be helpful.
The partitioning of train/test data must be specified in the same order as the input array to the train_test_split() function for it to unpack them corresponding to that order.
Clearly, when the order was specified as X_train, y_train, X_test, y_test, the resulting shapes of y_train (len(y_train)=54) and X_test (len(X_test)=124) got swapped resulting in the ValueError.
Instead, you must:
# Split into train/test data.
# _________________________________
# | | \
# | | \
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# | | /
# |__________|_____________________________________/
# (or)
# y_train, y_test, X_train, X_test = train_test_split(y, X, test_size=0.3, random_state=0)
# Normalize features using min-max scaling.
from sklearn.preprocessing import MinMaxScaler
mms = MinMaxScaler()
X_train_norm = mms.fit_transform(X_train)
X_test_norm = mms.transform(X_test)
produces:
X_train_norm[0]
array([ 0.72043011, 0.20378151, 0.53763441, 0.30927835, 0.33695652,
0.54316547, 0.73700306, 0.25 , 0.40189873, 0.24068768,
0.48717949, 1. , 0.5854251 ])
X_test_norm[0]
array([ 0.72849462, 0.16386555, 0.47849462, 0.29896907, 0.52173913,
0.53956835, 0.74311927, 0.13461538, 0.37974684, 0.4364852 ,
0.32478632, 0.70695971, 0.60566802])