AttributeError: 'SimpleImputer' object has no attribute 'mean' - python

I am trying to preprocess the iris dataset, but at the imputation step I get this error when I use SimpleImputer and try to print the mean of every column.
Here is the full code for reference. The error occurs in the last part.
import numpy as np
from sklearn.datasets import load_iris
from sklearn import preprocessing
iris = load_iris()
X = iris.data
iris_normalized = preprocessing.normalize(iris.data, norm='l2')
print(iris_normalized.mean(axis=0))
enc = preprocessing.OneHotEncoder()
iris_target_onehot = enc.fit_transform(iris.target.reshape(-1,1))
print(iris_target_onehot.toarray()[[0,50,100]])
X[:50,:] = np.nan
from sklearn.impute import SimpleImputer
iris_imputed = SimpleImputer(missing_values=np.nan,strategy='mean')
iris_imputed.fit_transform(X)
print(iris_imputed.mean(axis=0))
Sorry I am new to Machine Learning.

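SimpleImputer is an estimator, not an array, so it has no mean method. fit_transform returns the imputed NumPy array; assign that result to iris_imputed before calling mean on it, and it will work:
iris_imputed = iris_imputed.fit_transform(iris.data)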
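A minimal sketch of the same fix, assuming a copy of the iris data so the original array is left untouched. The names imputer and X_imputed are illustrative and do not come from the question; keeping the estimator and its output in separate variables avoids mixing them up.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer

# Work on a copy so that iris.data itself is not modified.
X = load_iris().data.copy()
X[:50, :] = np.nan  # mark the first 50 rows as missing, as in the question

# The imputer is the estimator; fit_transform returns the filled-in array.
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
X_imputed = imputer.fit_transform(X)

# mean() exists on the returned NumPy array, not on the imputer.
print(X_imputed.mean(axis=0))
# The column means learned during fit are stored on the estimator.
print(imputer.statistics_)
```

Because every missing entry is replaced by its column mean, the column means of X_imputed should match imputer.statistics_.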

Related

AttributeError: 'Pipeline' object has no attribute 'fit_resample'

Based on the documentation at the following link (pipeline and imbalanced),
I tried to implement the code on a dataset. Here is the code:
import numpy as np
import pandas as pd
from collections import Counter
from sklearn.preprocessing import LabelEncoder,OneHotEncoder
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import GaussianNB
data =pd.read_csv('aug_train.csv')
data.drop('id',axis=1,inplace=True)
print(data.info())
print(data.select_dtypes(include='object').columns.tolist())
data[data.select_dtypes(include='object').columns.tolist()]=data[data.select_dtypes(include='object').columns.tolist()].apply(LabelEncoder().fit_transform)
print(data.head())
#print(data['Response'].value_counts())
mymodel =GaussianNB()
y =data['Response'].values
print(Counter(y))
X =data.drop('Response',axis=1).values
#X,y =SMOTE().fit_resample(X,y)
#mymodel.fit(X,y)
#print(mymodel.score(X,y))
#print(Counter(y))
over = SMOTE(sampling_strategy=0.1)
under = RandomUnderSampler(sampling_strategy=0.5)
steps = [('o', over), ('u', under)]
pipeline = Pipeline(steps=steps)
# transform the dataset
X, y = pipeline.fit_resample(X, y)
The main problem in this code is with this line:
X, y = pipeline.fit_resample(X, y)
The error says AttributeError: 'Pipeline' object has no attribute 'fit_resample'. How can I fix this issue? Thanks in advance.
The tutorial uses imblearn.pipeline.Pipeline, while your code imports sklearn.pipeline.Pipeline (check the import statements). These are different classes: the scikit-learn Pipeline has no fit_resample method, whereas the imbalanced-learn Pipeline does.

KeyError: None of [Index([ ], dtype='object')] are in the [columns]

I am learning scikit-learn and have been working through it, but when I started with KNN I ran into a problem. Here is my code:
# import required packages
import numpy as np # later use
import pandas as pd
from sklearn import neighbors, metrics # later use
from sklearn.model_selection import train_test_split # later use
from sklearn.preprocessing import LabelEncoder
# start training
data = pd.read_csv('data/car.data')
X = data[[
    'buying',
    'maint',
    'safety'
]]
y = data[['class']]
Le = LabelEncoder()
for i in range(len(X[0])):
    X[:, i] = Le.fit_transform(X[:, i])
After running the debugger, the problem seems to be at X = data[[
and I have no idea how to solve it.
I saw the same error once before, and that time I just added .values to the end of the variable X.
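One likely cause, assuming data/car.data is the UCI Car Evaluation file (which ships without a header row): read_csv then treats the first data row as column names, so 'buying', 'maint' and 'safety' do not exist and the selection raises KeyError. The sketch below uses a small in-memory stand-in for the file; the column names and sample rows are assumptions for illustration.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for data/car.data: comma-separated, no header row.
raw = io.StringIO(
    "vhigh,vhigh,2,2,small,low,unacc\n"
    "low,med,4,4,big,high,vgood\n"
    "med,low,3,more,med,med,acc\n"
)

# Supply the column names explicitly because the file has none.
columns = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
data = pd.read_csv(raw, header=None, names=columns)

# Convert to a writable object array so that X[:, i] indexing works.
X = data[['buying', 'maint', 'safety']].to_numpy(dtype=object, copy=True)
y = data['class']

# Encode each string column as integers, one column at a time.
le = LabelEncoder()
for i in range(X.shape[1]):
    X[:, i] = le.fit_transform(X[:, i])
print(X)
```

The X[:, i] indexing in the loop only works on a NumPy array, which is why adding .values (or to_numpy) matters: a DataFrame does not support that indexing form.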

creating a pipeline for onehotencoded variables not working

I have a problem where I am trying to apply transformations to my categorical feature 'country' and to the rest of my numerical columns. How can I do this? Here is what I am trying:
preprocess = make_column_transformer(
    (numeric_cols, make_pipeline(MinMaxScaler())),
    (categorical_cols, OneHotEncoder()))
model = make_pipeline(preprocess, XGBClassifier())
model.fit(X_train, y_train)
Note that numeric_cols is passed as a list, and so is categorical_cols.
However, I get this error: TypeError: All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. followed by a list of all my numerical columns and (type <class 'list'>) doesn't.
What am I doing wrong? Also, how can I deal with unseen categories in the country column?
You need to put the transformer first and the columns second in each tuple. The help page gives the signature:
sklearn.compose.make_column_transformer(*transformers, **kwargs)
Something like the following will work:
from sklearn.preprocessing import StandardScaler, OneHotEncoder,MinMaxScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier
import numpy as np
import pandas as pd
X = pd.DataFrame({'x1': np.random.uniform(0, 1, 5),
                  'x2': np.random.choice(['A', 'B'], 5)})
# Integer class labels with both classes present; recent XGBoost versions reject string labels.
y = pd.Series([0, 1, 0, 1, 1])
numeric_cols = X.select_dtypes('number').columns.to_list()
categorical_cols = X.select_dtypes('object').columns.to_list()
# Each tuple is (transformer, columns), in that order.
preprocess = make_column_transformer(
    (MinMaxScaler(), numeric_cols),
    (OneHotEncoder(), categorical_cols)
)
model = make_pipeline(preprocess, XGBClassifier())
model.fit(X, y)
Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('minmaxscaler',
                                                  MinMaxScaler(), ['x1']),
                                                 ('onehotencoder',
                                                  OneHotEncoder(), ['x2'])])),
                ('xgbclassifier', XGBClassifier())])
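For the second part of the question (unseen categories), one option is OneHotEncoder's handle_unknown='ignore' parameter, which encodes a category not seen during fit as all zeros instead of raising an error. A minimal sketch, with hypothetical data and only the preprocessing step (no classifier); sparse_threshold=0 is set here just to get a dense array that is easy to print.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical training and test frames; 'JP' never appears in training.
X_train = pd.DataFrame({'x1': [0.1, 0.5, 0.9], 'country': ['FR', 'DE', 'FR']})
X_test = pd.DataFrame({'x1': [0.3], 'country': ['JP']})

preprocess = make_column_transformer(
    (MinMaxScaler(), ['x1']),
    # Unknown categories at transform time become an all-zero row.
    (OneHotEncoder(handle_unknown='ignore'), ['country']),
    sparse_threshold=0,  # always return a dense array
)
preprocess.fit(X_train)

out = np.asarray(preprocess.transform(X_test))
print(out)  # scaled x1 followed by the one-hot columns for DE and FR
```

With the default handle_unknown='error', the same transform call would raise instead of producing zeros.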

StandardScaler.fit() displaying value error

I was using StandardScaler to scale my data as shown in a tutorial, but it's not working.
I copied the same code that was used in the tutorial, but the error was still displayed.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS',axis=1))
scaled_features = scaler.transform(df.drop('TARGET CLASS',axis=1))
The error is as follows:
TypeError: fit() missing 1 required positional argument: 'X'
I tried to reproduce your problem, and the code as posted appears correct: it runs without error. Here is a stand-alone example I created to test it:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
data = load_iris()
df = pd.DataFrame(data.data, columns=['TARGET CLASS', 'a', 'b', 'c'])
scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS', axis=1))
scaled_features = scaler.transform(df.drop('TARGET CLASS',axis=1))
I suggest you examine your variable df by printing it. For instance, you could convert it to a NumPy array before passing it and print its contents:
import numpy as np
X = df.drop('TARGET CLASS',axis=1).values
print(X)
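Since the posted code runs as written, the actual script may differ slightly from it. One common way to get "fit() missing 1 required positional argument: 'X'" is to assign the class without parentheses (scaler = StandardScaler), so that fit is called on the class rather than on an instance. This is an assumption about the cause, not something shown in the question. A small sketch with hypothetical data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # hypothetical data

# Mistake: no parentheses, so scaler_cls is the class itself, not an instance.
scaler_cls = StandardScaler
try:
    scaler_cls.fit(X)  # X ends up bound to `self`
    failed = False
except (TypeError, AttributeError) as exc:
    # The exact exception type and message depend on the scikit-learn version.
    failed = True
    print(type(exc).__name__, exc)

# Correct: instantiate first, then fit and transform.
scaler = StandardScaler()
scaled = scaler.fit(X).transform(X)
print(scaled.mean(axis=0))
```

If the script contains StandardScaler without the trailing parentheses, adding them should resolve the error.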

Trying to run regression code. Getting error about 'linear_model'

I am trying to run this regression code.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn
import sklearn.cross_validation
# Load the data
oecd_bli = pd.read_csv("C:/Users/Excel/Desktop/Briefcase/PDFs/ALL PYTHON & R CODE SAMPLES/Hands-On Machine_Learning_with_Scikit_Learn_and_Tensorflow/GDP Per Capita/oecd_bli.csv", thousands=',')
gdp_per_capita = pd.read_csv("C:/Users/Excel/Desktop/Briefcase/PDFs/ALL PYTHON & R CODE SAMPLES/Hands-On Machine_Learning_with_Scikit_Learn_and_Tensorflow/GDP Per Capita/gdp_per_capita.csv",thousands=',')
# view first 10 rows of data frame
oecd_bli[:10]
gdp_per_capita[:10]
country_stats = pd.merge(oecd_bli, gdp_per_capita, left_index=True, right_index=True)
country_stats[:10]
X = np.c_[country_stats["GDP"]]
Y = np.c_[country_stats["VALUE"]]
print(X)
print(Y)
# Visualize the data
country_stats.plot(kind='scatter', x="GDP", y='VALUE')
plt.show()
# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()
# Train the model
lin_reg_model.fit(X, Y)
# Make a prediction for Cyprus
X_new = [[22587]] # Cyprus' GDP per capita
print(lin_reg_model.predict(X_new))
I get this error.
AttributeError: module 'sklearn' has no attribute 'linear_model'
I'm not sure what's going on. I am trying to learn about this from an example that I saw in a book.
#import package, call the class
from sklearn.linear_model import LinearRegression
#build the model(create a regression object)
model = LinearRegression()
#fit the model
model.fit(x,y)
linear_model is a subpackage of sklearn. It won't be available if you only run import sklearn. Try import sklearn.linear_model instead.
Python does not automatically import all subpackages. When I explicitly imported linear_model, it worked:
from sklearn import linear_model
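A minimal sketch of the explicit import followed by the same call pattern as in the question. The data here is made up (y = 2x + 1) and stands in for the GDP and life-satisfaction columns.

```python
import numpy as np
import sklearn.linear_model  # explicitly import the subpackage

# Hypothetical data following y = 2x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2 * X.ravel() + 1

# sklearn.linear_model is now available as an attribute of sklearn.
lin_reg_model = sklearn.linear_model.LinearRegression()
lin_reg_model.fit(X, y)

pred = lin_reg_model.predict([[5.0]])
print(pred)
```

Either import form (import sklearn.linear_model or from sklearn import linear_model) loads the subpackage; the difference is only in how it is referenced afterwards.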
