First I looked at all the related questions, where very similar problems are described, and followed the suggestions from the links, but none of them worked for me:
Data Conversion Error while applying a function to each row in pandas Python
Getting deprecation warning in Sklearn over 1d array, despite not having a 1D array
I also tried to follow the error message itself, but that didn't work either.
The code looks like this:
# Importing the libraries
import numpy as np
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# avoid DataConversionError
X = X.astype(float)
y = y.astype(float)
## Attempt to avoid DeprecationWarning for sklearn.preprocessing
#X = X.reshape(-1,1) # attempt 1
#X = np.array(X).reshape((len(X), 1)) # attempt 2
#X = np.array([X]) # attempt 3
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)
# Fitting SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)
# Predicting a new result
y_pred = regressor.predict(sc_X.transform(np.array([6.5])))
y_pred = sc_y.inverse_transform(y_pred)
The data looks like this:
Position,Level,Salary
Business Analyst,1,45000
Junior Consultant,2,50000
Senior Consultant,3,60000
Manager,4,80000
Country Manager,5,110000
Region Manager,6,150000
Partner,7,200000
Senior Partner,8,300000
C-level,9,500000
CEO,10,1000000
The full error log goes like this:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/data.py:586: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/data.py:649: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/data.py:649: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
I am using only the second and third columns, so there is no need for one-hot encoding of the first column. The only problem is the DeprecationWarning.
I tried all the suggestions given, but none of them worked.
Any help would be truly appreciated.
This was a strange one. The code I used to get rid of the deprecation warnings is below, with a slight modification to how StandardScaler() is fitted and transform() is called. The solution involved painstakingly reshaping and raveling the arrays according to the warning messages. I'm not sure this is the best way, but it removed the warnings.
# Importing the libraries
import numpy as np
import pandas as pd
from io import StringIO
from sklearn.preprocessing import StandardScaler
# Setting up data string to be read in as a .csv
data = StringIO("""Position,Level,Salary
Business Analyst,1,45000
Junior Consultant,2,50000
Senior Consultant,3,60000
Manager,4,80000
Country Manager,5,110000
Region Manager,6,150000
Partner,7,200000
Senior Partner,8,300000
C-level,9,500000
CEO,10,1000000""")
dataset = pd.read_csv(data)
# Importing the dataset
#dataset = pd.read_csv('Position_Salaries.csv')
# Deprecation warnings call for reshaping of single feature arrays with reshape(-1,1)
X = dataset.iloc[:, 1:2].values.reshape(-1,1)
y = dataset.iloc[:, 2].values.reshape(-1,1)
# avoid DataConversionError
X = X.astype(float)
y = y.astype(float)
#sc_X = StandardScaler()
#sc_y = StandardScaler()
X_scaler = StandardScaler().fit(X)
y_scaler = StandardScaler().fit(y)
X_scaled = X_scaler.transform(X)
y_scaled = y_scaler.transform(y)
# Fitting SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
# One of the warnings called for ravel()
regressor.fit(X_scaled, y_scaled.ravel())
# Predicting a new result
# The warnings called for single samples to reshaped with reshape(1,-1)
X_new = np.array([6.5]).reshape(1,-1)
X_new_scaled = X_scaler.transform(X_new)
y_pred = regressor.predict(X_new_scaled)
y_pred = y_scaler.inverse_transform(y_pred)
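In short: StandardScaler wants 2d input, hence the reshape(-1, 1) on both X and y; regressor.fit wants a 1d target, hence the ravel() on y_scaled; and a single sample passed to transform/predict must be shaped (1, n_features), hence the reshape(1, -1). Each reshape addresses one of the warnings quoted above.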
Hello, I have used many options to normalize the data in my DataFrame column elnino_1["air_temp"], but it always shows me an error like "Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample." or "'int' object is not callable".
I tried this code:
(1)
elnino_1["air_temp"].min=-1
elnino_1["air_temp"].max=1
elnino_1_std = (elnino_1["air_temp"] - elnino_1["air_temp"].min(axis=0)) / (elnino_1["air_temp"].max(axis=0) - elnino_1["air_temp"].min(axis=0))
elnino_1_scaled = elnino_1_std * (max - min) + min
(2)
XD=elnino_1["air_temp"]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaler = MinMaxScaler(feature_range=(-1, 1))
In both options I use these imports:
from sklearn.preprocessing import scale
from sklearn import preprocessing
What should I do to normalize this data?
As I do not have access to your dataset, I'm using make_classification here to generate some synthetic data. Please run through this in a notebook to gain understanding. (Note there may be slight differences, as I'm using a numpy array as the dataset while yours is a DataFrame.)
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, n_informative=1)
pd.DataFrame(X).head()
Thereafter, we fit a MinMaxScaler to the data. MinMaxScaler expects a 2d array as input, in other words a 'table'. Throughout these examples, call X.shape to understand how array shapes work. For example, in the above, X.shape is (100, 2), where the shape is (num_rows, num_columns).
scaler = MinMaxScaler(feature_range=(-1, 1))
X_norm = scaler.fit_transform(X)
pd.DataFrame(X_norm).head()
In your case, you are only trying to fit/scale a single column. When you fit only elnino_1["air_temp"], it is a 1d array with a shape like (100,). So we have to reshape it into a 2d array first.
x1_norm = scaler.fit_transform(X[:, 1].reshape(-1,1))
pd.DataFrame(x1_norm)
For example, if xyz.shape is (100,) and I want it to be (100, 1), I can use xyz.reshape(100, 1) if I'm being specific. Alternatively, the length of a dimension set to -1 is inferred automatically from the other specified dimensions, which is useful when you don't want to hard-code an array's size. Thus xyz.reshape(-1, 1) achieves the same as above.
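A quick demonstration of that equivalence (xyz here is just an illustrative array):
import numpy as np
xyz = np.arange(100)         # shape (100,)
a = xyz.reshape(100, 1)      # explicit row count
b = xyz.reshape(-1, 1)       # numpy infers the 100
print(a.shape, b.shape)      # (100, 1) (100, 1)
print(np.array_equal(a, b))  # True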
Whenever I try to predict, I see an error. I am stuck on the line y_pred = regressor.predict(6.5) in the code.
I am getting the error:
ValueError: Expected 2D array, got scalar array instead:
array=6.5.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
The code (run in Spyder):
# SVR
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)
# Fitting SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)
# Predicting a new result
y_pred = regressor.predict(6.5)
Error: y_pred = regressor.predict(sc_X.transform(6.5))
Traceback (most recent call last):
File "<ipython-input-11-64bf1bca4870>", line 1, in <module>
y_pred = regressor.predict(sc_X.transform(6.5))
File "C:\Users\achiever\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py", line 758, in transform
force_all_finite='allow-nan')
File "C:\Users\achiever\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 514, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got scalar array instead: array=6.5. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Well, obviously, since regressor.predict() expects a list/array of values to make predictions on, and you're passing it a single float, it won't work:
# Predicting a new result
y_pred = regressor.predict(6.5)
At the very least:
# Predicting a new result
y_pred = regressor.predict(np.array([6.5]))
But presumably you have more stuff you want to pass to it, so more like:
# Predicting a new result
y_pred = regressor.predict(some_data_array)
EDIT:
you need to arrange the shape of the 2d array you pass to the predictor so it looks like this:
data = [[1,0,0,1],[0,1,12,5],....]
where [1,0,0,1] is ONE set of parameters for ONE data point for which you want a prediction, and [0,1,12,5] is ANOTHER data point.
At any rate, they should all have the same number of features (e.g. 4 in my example), and that must match the number of features in the data you used to train your predictor.
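A minimal sketch of that shape requirement (the model, data, and numbers here are illustrative, not from the question):
import numpy as np
from sklearn.svm import SVR
# Train on 4 features, so predict() must also receive rows of 4 features
X_train = np.random.rand(20, 4)
y_train = np.random.rand(20)
model = SVR(kernel='rbf')
model.fit(X_train, y_train)
data = np.array([[1, 0, 0, 1], [0, 1, 12, 5]])  # two samples, 4 features each
print(model.predict(data))  # one prediction per row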
y_pred = sc_y.inverse_transform(regressor.predict(sc_X.transform(np.array([[6.5]]))))
Use the reshape function:
sc_y.inverse_transform(regressor.predict(sc_X.transform([[6.5]])).reshape(1,-1))
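Here the reshape(1, -1) turns the 1d array returned by predict() into a single-sample 2d array; recent scikit-learn versions require 2d input for inverse_transform as well.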
I am getting this error
ValueError: Expected 2D array, got scalar array instead: array=6.5.
Reshape your data either using array.reshape(-1, 1) if your data has a
single feature or array.reshape(1, -1) if it contains a single sample.
while executing this code
# SVR
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import SVR
# Load dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Fitting the SVR to the data set
regressor = SVR(kernel = 'rbf', gamma = 'auto')
regressor.fit(X, y)
# Predicting a new result
y_pred = regressor.predict(6.5)
You need to understand how SVM works. Your training data is a matrix of shape (n_samples, n_features). That means your SVM operates in a feature space of n_features dimensions. Hence, it cannot predict a value for a scalar input; you can only predict values for vectors of dimension n_features, and .predict expects a 2d array of such row vectors, even for a single sample. So, if your data set has 5 feature columns, you can predict values for an arbitrary row vector with 5 columns. See the example below.
import numpy as np
from sklearn.svm import SVR
# Data: 200 instances of 5 features each
X = np.random.randint(1, 100, size=(200, 5))
y = np.random.randint(0, 2, size=200)
reg = SVR()
reg.fit(X, y)
y_test = np.array([[0, 1, 2, 3, 4]]) # Input to .predict must be 2-dimensional
reg.predict(y_test)
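Another answer applies the same fix to the linear/polynomial regression variant of this exercise (lin_reg, lin_reg_2, and poly_reg are assumed to be models already fitted earlier in that script) by wrapping the scalar in a 2d array: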
# Predicting a new result with Linear Regression
X_test = np.array([[6.5]])
print(lin_reg.predict(X_test))
# Predicting a new result with Polynomial Regression
print(lin_reg_2.predict(poly_reg.fit_transform(X_test)))
I am getting the following error while trying to load a scaler to make a prediction with a trained neural network: ValueError: operands could not be broadcast together with shapes (317,257) (269,) (317,257)
Some context: the first shape, (317, 257), is the shape of the data set I am trying to predict on, while (269,) comes from the training set. I pickled the scaler and loaded it back in, but I will get to that later. Before OneHotEncoder is applied, the data set has only 9 columns. After LabelEncoder and OneHotEncoder are applied, it expands to 269 columns (there are a lot of large categorical variables in the data).
My code:
Applying OneHotEncoder after applying LabelEncoder:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_4 = LabelEncoder()
X[:, 3] = labelencoder_X_4.fit_transform(X[:, 3].astype(str))
labelencoder_X_5 = LabelEncoder()
X[:, 4] = labelencoder_X_5.fit_transform(X[:, 4].astype(str))
labelencoder_X_6 = LabelEncoder()
X[:, 5] = labelencoder_X_6.fit_transform(X[:, 5].astype(str))
labelencoder_X_7 = LabelEncoder()
X[:, 6] = labelencoder_X_7.fit_transform(X[:, 6].astype(str))
labelencoder_X_8 = LabelEncoder()
X[:, 7] = labelencoder_X_8.fit_transform(X[:, 7].astype(str))
labelencoder_X_9 = LabelEncoder()
X[:, 8] = labelencoder_X_9.fit_transform(X[:, 8].astype(str))
onehotencoder_X = OneHotEncoder(categorical_features = [1,3,4,5,6,7,8])
X = onehotencoder_X.fit_transform(X).toarray()
Fitting to the scaler during training:
from sklearn.preprocessing import MinMaxScaler
from sklearn.externals import joblib
sc = MinMaxScaler()
X_train = sc.fit_transform(X_train)
joblib.dump(sc, 'scaler.pkl')
X_test = sc.transform(X_test)
Then retrieving the scaler and attempting to transform test data:
from sklearn.externals import joblib
sc = joblib.load('scaler.pkl')
dataset = sc.transform(dataset) #Error happens here
The test data set is pretty large, since it expands to 257 of the 269 variables, but ideally I want to be able to predict on just a single row of data. In order to get the correct shape, do I have to append that row to a set that contains ALL of the different categories within the categorical data? What happens if a row of data has a value that was not present in the training data? This seems inefficient, so there must be a simple fix, right?
Thank you for any help you can provide. If you need any additional details, please let me know.
Update: I am now pickling all the encoders and loading them back in when encoding the test set, then using .transform instead of .fit_transform on the test set. I feel like this is a step in the right direction, but here is the error I am getting now: ValueError: y contains new labels: ['0' '1' '2']
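For reference, a minimal sketch of that pickle-and-reuse workflow (the column index, file name, and toy data are illustrative, not from the question):
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.externals import joblib  # plain `import joblib` in newer versions
# Toy stand-ins for the real train/test matrices
X_train = np.array([['u', 'red'], ['v', 'blue']], dtype=object)
X_test = np.array([['w', 'red']], dtype=object)
# During training: fit one encoder per categorical column and persist it
le_1 = LabelEncoder()
X_train[:, 1] = le_1.fit_transform(X_train[:, 1].astype(str))
joblib.dump(le_1, 'labelencoder_1.pkl')
# At prediction time: load the fitted encoder and only transform
le_1 = joblib.load('labelencoder_1.pkl')
X_test[:, 1] = le_1.transform(X_test[:, 1].astype(str))  # raises ValueError on unseen labels
Note that LabelEncoder.transform raises exactly this "y contains new labels" ValueError when the test set contains categories the encoder never saw during fitting, which matches the error in the update.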
I am using the following Python program to implement a basic decision tree classifier:
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
import numpy as np
features = [[140,1],[130,1],[150,0],[170,0]]
labels = [0,0,1,1]
clf = DecisionTreeClassifier()
model = clf.fit(features, labels)
a = model.predict ([160,0])
print (a)
It prints out the predicted value but gives a warning,
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and
willraise ValueError in 0.19. Reshape your data either using X.reshape(-1,
1) if your data has a single feature or X.reshape(1, -1) if it contains a
single sample.
I tried to fix it using this:
features = np.array(features).reshape(-1, 2)
labels = np.array(labels).reshape(-1, 1)
But this showed the same warning. Any suggestions?
The problem is with model.predict. This works:
a = model.predict ([[160,0]])
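For completeness, a sketch of the full corrected script; the reshape calls on features and labels were unnecessary, since fit() already accepts a 2d feature list and a 1d label list, and the only real problem was the 1d argument to predict():
from sklearn.tree import DecisionTreeClassifier
features = [[140, 1], [130, 1], [150, 0], [170, 0]]  # already 2d: 4 samples, 2 features
labels = [0, 0, 1, 1]                                # 1d labels are what fit() expects
clf = DecisionTreeClassifier()
model = clf.fit(features, labels)
a = model.predict([[160, 0]])  # 2d: one row per sample to predict
print(a)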