I'm new to machine learning and I am trying to predict stock prices 30 days ahead.
This is my code:
import pandas as pd
import matplotlib.pyplot as plt
import pymysql as MySQLdb
import numpy as np
import sqlalchemy
import datetime
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split
forecast_out = int(30)
df['Prediction'] = df[['LastPrice']].shift(-forecast_out)
df['Prediction'].fillna(0)
X = np.array(df['Prediction'].fillna(0))
X = preprocessing.scale(X)
X_forecast = X[-forecast_out:]
X = X[:-forecast_out]
y = np.array(df['Prediction'].fillna(0))
y = y[:-forecast_out]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
X_train, X_test, y_train, y_test.reshape(-1,1)
# Training
clf = LinearRegression()
clf.fit(X_train,y_train)
# Testing
confidence = clf.score(X_test, y_test)
print("confidence: ", confidence)
forecast_prediction = clf.predict(X_forecast)
print(forecast_prediction)
I got this error:
ValueError: Expected 2D array, got 1D array instead:
array=[-0.46939923 -0.47076913 -0.47004993 ... -0.42782272 3.07433019 -0.46573474].
Reshape your data either using
array.reshape(-1, 1) if your data has a single feature
or
array.reshape(1, -1) if it contains a single sample.
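It's expecting a 2D array, but you're passing in a 1D array. For a single feature with many samples, as in the question, use array.reshape(-1, 1), as the error message suggests. For a single sample, you can put another set of brackets around the value that causes the problem. For example: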
x = [1,2,3,4]
Foo(x)
If that throws the error, you could just do
Foo([x])
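To see the difference between the two shapes, here is a minimal sketch using NumPy (the variable names are only illustrative):

```python
import numpy as np

x = [1, 2, 3, 4]

# Wrapping the list in brackets gives one row with four columns,
# which scikit-learn reads as a single sample with four features.
as_single_sample = np.array([x])

# reshape(-1, 1) gives four rows with one column each,
# which scikit-learn reads as four samples of a single feature.
as_single_feature = np.array(x).reshape(-1, 1)

print(as_single_sample.shape)   # (1, 4)
print(as_single_feature.shape)  # (4, 1)
```

Which one you need depends on whether the values are separate observations of one feature or the features of one observation.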
I am having trouble solving the array dimension problem in the code below. When I try to compute y_predict, a ValueError is raised. Here is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#importing dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:,1:2].values
y = dataset.iloc[:,2].values
y=np.reshape(y,(10,1))
#Splitting dataset into training set and test set
'''from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state = 0)'''
#Feature scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)
######## SVR regression
from sklearn.svm import SVR
svr_regressor = SVR(kernel='rbf') #rbf = gaussian kernel
svr_regressor.fit(X, y)
#Prediction of given value using SVR regression
X = np.reshape(X,(-1, 1))
y_predict = sc_y.inverse_transform(svr_regressor.predict(sc_X.transform([[6.5]])))
########### Visualization of svr model
plt.scatter(X, y, color = 'blue')
plt.plot(X, svr_regressor.predict(X), color = 'red')
plt.show()
I am getting error:
ValueError: Expected 2D array, got 1D array instead:
array=[-0.27861589].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
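The array in the message, [-0.27861589], has a single element, which suggests the 1D array is the output of svr_regressor.predict being passed straight to sc_y.inverse_transform. Here is a minimal sketch of one possible fix, assuming that is the cause; the data is made up to stand in for Position_Salaries.csv, and X_scaled, y_scaled and scaled_prediction are illustrative names:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Made-up stand-in for Position_Salaries.csv: 10 levels and salaries.
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = (1000.0 * X ** 2).reshape(-1, 1)

sc_X = StandardScaler()
sc_y = StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y)

svr_regressor = SVR(kernel="rbf")
# SVR expects a 1D target, so flatten the scaled column.
svr_regressor.fit(X_scaled, y_scaled.ravel())

# predict returns a 1D array of shape (1,) for one sample ...
scaled_prediction = svr_regressor.predict(sc_X.transform([[6.5]]))
# ... so reshape it to (1, 1) before undoing the scaling.
y_predict = sc_y.inverse_transform(scaled_prediction.reshape(-1, 1))

print(y_predict.shape)  # (1, 1)
```

The scalers work on 2D arrays, while predict returns 1D, so the reshape sits between the two calls.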
I received this error while practicing the Simple Linear Regression Model; I assume there is an issue with my set of data.
Here is the Error
ValueError: Expected 2D array, got 1D array instead:
array=[1140. 1635. 1755. 1354. 1978. 1696. 1212. 2736. 1055. 2839. 2325. 1688.
2733. 2332. 2159. 2133.].
Here is my Dataset
Here is the code
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
df = pd.read_csv('C:/Users/AgroMech/Desktop/ASDS/data.csv')
df.shape
print(df.duplicated())
df.isnull().any()
df.isnull().sum()
df.dropna(inplace = True)
x=df["Area"]
y=df["Price"]
df.describe()
reg = linear_model.LinearRegression()
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=4)
x_train.head()
reg=LinearRegression()
reg.fit(x_train,y_train)
LinearRegression(copy_x=True, fit_intercept=True, n_jobs=1, normalize=False)
reg.coef_
reg.predict(x_test)
np.mean((reg.predict(x_test) - y_test)**2)
As the error suggests when executing reg.fit(x_train, y_train):
ValueError: Expected 2D array, got 1D array instead:
array=[1140. 1635. 1755. 1354. 1978. 1696. 1212. 2736. 1055. 2839. 2325. 1688.
2733. 2332. 2159. 2133.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
This means your arrays don't have the right shape for reg.fit(). You can reshape them explicitly:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=4)
x_train = x_train.values.reshape(-1,1)
x_test = x_test.values.reshape(-1,1)
y_train = y_train.values.reshape(-1,1)
y_test = y_test.values.reshape(-1,1)
or you can reshape your original x and y values:
x = df[['Area']]
y = df[['Price']]
Also note that LinearRegression takes a copy_X argument and not copy_x.
The easiest way to reshape your x variable (from a 1D array to a 2D) is:
x = df[["Area"]]
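A small sketch of why the double brackets help. The Area values are taken from the error message above, but the Price values are made up because the real data.csv is not shown:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Area values from the error message; Price values are made up.
df = pd.DataFrame({
    "Area": [1140.0, 1635.0, 1755.0, 1354.0],
    "Price": [100.0, 150.0, 160.0, 120.0],
})

series_x = df["Area"]    # single brackets: 1D Series
frame_x = df[["Area"]]   # double brackets: 2D DataFrame

print(series_x.shape)  # (4,)
print(frame_x.shape)   # (4, 1)

reg = LinearRegression()
reg.fit(frame_x, df["Price"])  # features are 2D, so fit accepts them
```

The target can stay 1D; only the feature matrix has to be 2D.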
Here's my code:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
california_housing = fetch_california_housing(as_frame=True)
data = california_housing.frame
X = data.drop(columns=['MedHouseVal'])
y = data['MedHouseVal']
model = LinearRegression()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
model.score(predictions, y_test)
Here's the error message:
ValueError: Expected 2D array, got 1D array instead: array=[0.71912284
1.76401657 2.70965883 ... 4.46877017 1.18751119 2.00940251]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
score must be called with the test features, not with the predictions; it runs predict internally:
model.fit(X_train, y_train)
model.score(X_test, y_test)
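If you already have the predictions and want to score them directly, sklearn.metrics.r2_score should give the same number for a regressor. A minimal sketch on synthetic data (used here only so it runs without downloading the housing dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the California housing frame.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# score() predicts from the features internally, then computes R^2.
score_from_features = model.score(X_test, y_test)

# Same metric computed from predictions you already have.
predictions = model.predict(X_test)
score_from_predictions = r2_score(y_test, predictions)

print(score_from_features, score_from_predictions)
```

Note the argument order of r2_score: true values first, predictions second.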
I'm trying to work through an example script on machine learning, "Common pitfalls in interpretation of coefficients of linear models", but I'm having trouble understanding some of the steps. The beginning of the script looks like this:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_openml
survey = fetch_openml(data_id=534, as_frame=True)
# We identify features `X` and targets `y`: the column WAGE is our
# target variable (i.e., the variable which we want to predict).
X = survey.data[survey.feature_names]
X.describe(include="all")
X.head()
# Our target for prediction is the wage.
y = survey.target.values.ravel()
survey.target.head()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
train_dataset = X_train.copy()
train_dataset.insert(0, "WAGE", y_train)
_ = sns.pairplot(train_dataset, kind='reg', diag_kind='kde')
My problem is in the lines
y = survey.target.values.ravel()
survey.target.head()
If we examine survey.target.head() immediately after these lines, the output is
Out[36]:
0 5.10
1 4.95
2 6.67
3 4.00
4 7.50
Name: WAGE, dtype: float64
How does the model know that WAGE is the target variable? Does it not have to be explicitly declared?
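The model itself never needs to know that WAGE is the target: fetch_openml returns the dataset's default target column as survey.target, and whatever you pass as y is treated as the target. The line survey.target.values.ravel() is meant to flatten the array, but in this example it is not necessary. survey.target is a pandas Series (i.e., a single column, which is already 1D) and survey.target.values is a NumPy array. You can use either one for the train/test split, since survey.target holds only one column.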
type(survey.target)
pandas.core.series.Series
type(survey.target.values)
numpy.ndarray
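A quick check of the shapes, as a sketch. The Series here is a made-up stand-in for survey.target, built from the five values shown in the question:

```python
import numpy as np
import pandas as pd

# Stand-in for survey.target, using the values printed in the question.
target = pd.Series([5.10, 4.95, 6.67, 4.00, 7.50], name="WAGE")

values = target.values             # already a 1D NumPy array
flattened = target.values.ravel()  # ravel() leaves a 1D array unchanged

print(values.shape, flattened.shape)

# ravel() only matters when the target is a one-column DataFrame,
# whose values are 2D with shape (n, 1).
column = target.to_frame().values
print(column.shape, column.ravel().shape)
```

So ravel() is harmless here, and it would only be needed if the target arrived as a one-column DataFrame.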
If we use just survey.target, you can see that the regression will work:
y = survey.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
train_dataset = X_train.copy()
train_dataset.insert(0, "WAGE", y_train)
sns.pairplot(train_dataset, kind='reg', diag_kind='kde')
As another example, suppose you want to regress petal width on the other columns of the iris dataset. You select the columns of the DataFrame with square brackets []; double brackets return a DataFrame, which is 2D:
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
dat = load_iris(as_frame=True).frame
X = dat[['sepal length (cm)','sepal width (cm)','petal length (cm)']]
y = dat[['petal width (cm)']]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
LR = LinearRegression()
LR.fit(X_train,y_train)
plt.scatter(x=y_test,y=LR.predict(X_test))
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv('C:/Users/Dell/Desktop/Salary.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3,
random_state=0)
from sklearn.linear_model import LinearRegression
simplelinearRegresson = LinearRegression()
simplelinearRegresson.fit(X_train, y_train)
y_predict = simplelinearRegresson.predict(X_test)
The line below raises an error:
y_predict_val = simplelinearRegresson.predict(11)
You need to convert your scalar to a 2D array with shape (number of samples, number of features).
y_predict_val = simplelinearRegresson.predict([[11]])
This is what the predict method expects. See docs for more info.
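For illustration, here is a sketch on made-up data (standing in for Salary.csv, which is not shown) of predicting for one value and for several values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: one feature, four samples.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([10.0, 20.0, 30.0, 40.0])

model = LinearRegression().fit(X_train, y_train)

# One sample with one feature: shape (1, 1) in, shape (1,) out.
single = model.predict([[11]])

# The same input built with reshape(1, -1).
also_single = model.predict(np.array([11]).reshape(1, -1))

# Several samples of one feature: shape (3, 1) in, shape (3,) out.
several = model.predict(np.array([11, 12, 13]).reshape(-1, 1))

print(single.shape, several.shape)
```

In each case the input rows are samples and the columns are features, matching the shape used in fit.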