I am using the following Python program to implement a basic decision tree classifier:
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
import numpy as np
features = [[140,1],[130,1],[150,0],[170,0]]
labels = [0,0,1,1]
clf = DecisionTreeClassifier()
model = clf.fit(features, labels)
a = model.predict ([160,0])
print (a)
It prints out the predicted value but gives a warning,
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
I have tried to fix it using this,
features = np.array(features).reshape(-1, 2)
labels = np.array(labels).reshape(-1, 1)
But this showed the same warning. Any suggestions?
The problem is with model.predict. This works:
a = model.predict([[160, 0]])
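For context, predict() expects a 2D array of shape (n_samples, n_features), so a single sample has to be wrapped in an outer list (or reshaped with reshape(1, -1)). A minimal sketch of the whole script with that fix:
from sklearn.tree import DecisionTreeClassifier

# each inner list is one sample with two features
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]

clf = DecisionTreeClassifier()
model = clf.fit(features, labels)

# predict() wants shape (n_samples, n_features), hence the nested brackets
a = model.predict([[160, 0]])
print(a)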
Hello, I have tried many options to normalize the data in my DataFrame column elnino_1["air_temp"], but it always shows me an error like "Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample." or "'int' object is not callable".
I try this code:
(1)
elnino_1["air_temp"].min=-1
elnino_1["air_temp"].max=1
elnino_1_std = (elnino_1["air_temp"] - elnino_1["air_temp"].min(axis=0)) / (elnino_1["air_temp"].max(axis=0) - elnino_1["air_temp"].min(axis=0))
elnino_1_scaled = elnino_1_std * (max - min) + min
(2)
XD=elnino_1["air_temp"]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaler = MinMaxScaler(feature_range=(-1, 1))
In both options I use these libraries:
from sklearn.preprocessing import scale
from sklearn import preprocessing
What should I do to normalize this data?
As I do not have access to your dataset, I'm using make_classification here to generate some synthetic data. Please run through it in a notebook to gain understanding. (Do note there may be slight differences, as I'm using a NumPy array as the dataset while yours is a DataFrame.)
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, n_informative=1)
pd.DataFrame(X).head()
Thereafter, we fit a MinMaxScaler to the data. MinMaxScaler expects a 2D array as input, in other words a 'table'. Throughout these steps, call X.shape to understand how array shapes work. For example, in the above, X.shape is (100, 2), i.e. (n_rows, n_columns).
scaler = MinMaxScaler(feature_range=(-1, 1))
X_norm = scaler.fit_transform(X)
pd.DataFrame(X_norm).head()
In your case, you are only trying to fit/scale a single column. On its own, elnino_1["air_temp"] is a 1D array with a shape like (100,), so we have to reshape it into a 2D array.
x1_norm = scaler.fit_transform(X[:, 1].reshape(-1,1))
pd.DataFrame(x1_norm)
For example, if xyz.shape is (100,) and I want it to be (100, 1), I can use xyz.reshape(100, 1) if I'm being specific. A dimension set to -1 is determined automatically by inferring it from the other specified dimensions, which is handy when you don't want to hard-code the array length. Thus xyz.reshape(-1, 1) achieves the same as above.
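Applied to your case, a rough sketch (assuming elnino_1 is a pandas DataFrame with a numeric "air_temp" column, since I can't test against your actual data) would be:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(-1, 1))

# the Series is 1D, so reshape it into a single-column 2D array before scaling
air_temp_2d = elnino_1["air_temp"].values.reshape(-1, 1)
elnino_1["air_temp_scaled"] = scaler.fit_transform(air_temp_2d).ravel()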
Here is the problem:
1. Extract just the median_income column from the independent variables (from X_train and X_test).
2. Perform Linear Regression to predict housing values based on median_income.
3. Predict output for the test dataset using the fitted model.
4. Plot the fitted model for the training data as well as for the test data to check if the fitted model satisfies the test data.
I did a linear regression earlier. Following is the code:
import pandas as pd
import os
os.getcwd()
os.chdir('/Users/saurabhsaha/Documents/PGP-AI:ML-Purdue/New/datasets')
df=pd.read_excel('California_housing.xlsx')
df.total_bedrooms=df.total_bedrooms.fillna(df.total_bedrooms.mean())
x = df.iloc[:,2:8]
y = df.median_house_value
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=.20)
from sklearn.linear_model import LinearRegression
california_model = LinearRegression().fit(x_train,y_train)
california_model.predict(x_test)
Predicted_values = pd.DataFrame(california_model.predict(x_test), columns=['Pred'])
Predicted_values
Final = pd.concat([x_test.reset_index(drop=True), y_test.reset_index(drop=True), Predicted_values], axis=1)
Final['Err_pct'] = abs(Final.median_house_value - Final.Pred) / Final.median_house_value
Here is my dataset- https://docs.google.com/spreadsheets/d/1vYngxWw7tqX8FpwkWB5G7Q9axhe9ipTu/edit?usp=sharing&ouid=114925088866643320785&rtpof=true&sd=true
Following is my code.
x1_train=x_train.median_income
x1_train
x1_train.shape
x1_test=x_test.median_income
x1_test
type(x1_test)
x1_test.shape
from sklearn.linear_model import LinearRegression
california_model_new = LinearRegression().fit(x1_train, y_train)
I get an error right here, and when I try converting my 1D array to 2D as follows, I cannot:
import numpy as np
x1_train = x1_train.reshape(-1, 1)
x1_test = x1_test.reshape(-1, 1)
This is the error I get
AttributeError: 'Series' object has no attribute 'reshape'
I am new to data science, so if you can explain a bit it would be really helpful.
x1_train and x1_test are pandas Series objects, whereas the reshape() method applies to NumPy arrays.
Do this instead:
x1_train = x1_train.to_numpy().reshape(-1, 1)
x1_test = x1_test.to_numpy().reshape(-1, 1)
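With both arrays reshaped to 2D, the single-feature regression from your snippet should then fit and predict without the error, for example (a sketch reusing your variable names):
from sklearn.linear_model import LinearRegression

# x1_train and x1_test now have shape (n_samples, 1), which fit()/predict() expect
california_model_new = LinearRegression().fit(x1_train, y_train)
y_pred_new = california_model_new.predict(x1_test)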
I am attempting to use MultinomialNB from sklearn to classify some data. I have made a sample csv with some labelled training data, which I want to use to train the model but I receive the following error message:
ValueError: Expected 2D array, got 1D array instead: array=[0 1 2 2].
I know it is a very small data set but I will eventually add more data once the code is working.
Here is my code:
import numpy as np
import pandas as pd
import array as array
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
data_file = pd.read_csv("CSV_Labels.csv", engine='python')
data_file.tail()
vectorizer = CountVectorizer(stop_words='english')
all_features = vectorizer.fit_transform(data_file.Word)
all_features.shape
x_train = data_file.label
y_train = data_file.Word
x_train.values.reshape(1, -1)
y_train.values.reshape(1, -1)
classifer = MultinomialNB()
classifer.fit(x_train, y_train)
Try this:
x_train = x_train.values.reshape(-1, 1)
y_train = y_train.values.reshape(-1, 1)
NumPy reshape operations are not in-place, so the arrays you were passing to the classifier still had the old shapes.
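A quick standalone illustration of that point (just a sketch, not your data):
import numpy as np

a = np.array([0, 1, 2, 2])
a.reshape(-1, 1)      # returns a new (4, 1) array, which is thrown away here
print(a.shape)        # still (4,)

a = a.reshape(-1, 1)  # keep the reshaped array by assigning it back
print(a.shape)        # now (4, 1)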
I am working on a Multi-Target (binary) classification. There are 11 targets and I am using sklearn's MultiOutputClassifier. I am having difficulty with the predict_proba function. See a snippet of the dataset and code below:
import pandas as pd
import numpy as npy
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier
data = pd.read_csv("123.csv")
dataset
target = ['H67BC97','H67GC93','H67LC63','H67WC103','H67RC91','H67YC73','H67RC92','H67GC94','H67LC64','H67NC60','H67YC72']
train, test = train_test_split(data, test_size=0.2)
X_train = train.drop(['H67BC97','H67GC93','H67LC63','H67WC103','H67RC91','H67YC73','H67RC92','H67GC94','H67LC64','H67NC60','H67YC72','FORMULA_NUMBER'],axis=1)
X_test = test.drop(['H67BC97','H67GC93','H67LC63','H67WC103','H67RC91','H67YC73','H67RC92','H67GC94','H67LC64','H67NC60','H67YC72','FORMULA_NUMBER'],axis=1)
Y_train = train[target]
Y_test = test[target]
model = MultiOutputClassifier(GradientBoostingClassifier())
model.fit(X_train, Y_train)
target_probabilities = model.predict_proba(X_test)
print(target_probabilities)
The probabilities output does not seem to be in the correct form. I get 11 arrays of shape 565x2 (565 is the length of my test set). I'd like to save the target_probabilities to a CSV file, but I get the error: ValueError: Expected 1D or 2D array, got 3D array instead. My question is essentially the same as this one - https://datascience.stackexchange.com/questions/22762/understanding-predict-proba-from-multioutputclassifier - but the answer there only explains why the output is a set of arrays.
EDIT: I have simplified the problem.
target_probabilities = npy.array(target_probabilities)
Now target_probabilities is an array of shape (11, 565, 2). I need to change it to shape (565, 11), where row i is target_probabilities[:, i][:, 1], for i in range(565).
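For concreteness, a sketch of the reshaping I'm after (assuming index 1 of the last axis holds the positive-class probability for every target):
# take the positive-class column for all 11 targets, then transpose
# from (11, 565) to (565, 11) so each row corresponds to one test sample
positive_probs = target_probabilities[:, :, 1].T

# e.g. write it out with the target names as column headers
pd.DataFrame(positive_probs, columns=target).to_csv("target_probabilities.csv", index=False)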
First I looked at all the related questions; there are very similar problems given. I followed the suggestions from the links, but none of them worked for me.
Data Conversion Error while applying a function to each row in pandas Python
Getting deprecation warning in Sklearn over 1d array, despite not having a 1D array
I also tried to follow the error message, but that didn't work either.
The code looks like this:
# Importing the libraries
import numpy as np
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# avoid DataConversionError
X = X.astype(float)
y = y.astype(float)
## Attempt to avoid DeprecationWarning for sklearn.preprocessing
#X = X.reshape(-1,1) # attempt 1
#X = np.array(X).reshape((len(X), 1)) # attempt 2
#X = np.array([X]) # attempt 3
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)
# Fitting SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)
# Predicting a new result
y_pred = regressor.predict(sc_X.transform(np.array([6.5])))
y_pred = sc_y.inverse_transform(y_pred)
The data looks like this:
Position,Level,Salary
Business Analyst,1,45000
Junior Consultant,2,50000
Senior Consultant,3,60000
Manager,4,80000
Country Manager,5,110000
Region Manager,6,150000
Partner,7,200000
Senior Partner,8,300000
C-level,9,500000
CEO,10,1000000
The full error log goes like this:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/data.py:586: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/data.py:649: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/preprocessing/data.py:649: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
I am using only the second and third columns, so there is no need for one-hot encoding of the first column. The only problem is the DeprecationWarning. I tried all the suggestions given, but none of them worked, so any help would be truly appreciated.
This was a strange one. The code I used to get rid of the deprecation warnings is below, with a slight modification to how you fit StandardScaler() and call transform(). The solution involved painstakingly reshaping and raveling the arrays according to the warning messages. Not sure if this is the best way, but it removed the warnings.
# Importing the libraries
import numpy as np
import pandas as pd
from io import StringIO
from sklearn.preprocessing import StandardScaler
# Setting up data string to be read in as a .csv
data = StringIO("""Position,Level,Salary
Business Analyst,1,45000
Junior Consultant,2,50000
Senior Consultant,3,60000
Manager,4,80000
Country Manager,5,110000
Region Manager,6,150000
Partner,7,200000
Senior Partner,8,300000
C-level,9,500000
CEO,10,1000000""")
dataset = pd.read_csv(data)
# Importing the dataset
#dataset = pd.read_csv('Position_Salaries.csv')
# Deprecation warnings call for reshaping of single feature arrays with reshape(-1,1)
X = dataset.iloc[:, 1:2].values.reshape(-1,1)
y = dataset.iloc[:, 2].values.reshape(-1,1)
# avoid DataConversionError
X = X.astype(float)
y = y.astype(float)
#sc_X = StandardScaler()
#sc_y = StandardScaler()
X_scaler = StandardScaler().fit(X)
y_scaler = StandardScaler().fit(y)
X_scaled = X_scaler.transform(X)
y_scaled = y_scaler.transform(y)
# Fitting SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
# One of the warnings called for ravel()
regressor.fit(X_scaled, y_scaled.ravel())
# Predicting a new result
# The warnings called for single samples to reshaped with reshape(1,-1)
X_new = np.array([6.5]).reshape(1,-1)
X_new_scaled = X_scaler.transform(X_new)
y_pred = regressor.predict(X_new_scaled)
y_pred = y_scaler.inverse_transform(y_pred)