I created a Python file called dataFramePreprocessing.py with some functions that I use in my other notebooks. One of the functions uses sklearn.preprocessing. This is the function raising the error:
def scaleBinDf(df):
    from sklearn import preprocessing
    ...
    min_max_scaler = preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    ...
When I call the function in the other file (all the other functions work just fine), like so:
import dataFramePreprocessing as pr
from sklearn import preprocessing
pr.scaleBinDf(bindf)
this happens:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-15-616840fc11d7> in <module>
1 from sklearn import preprocessing
----> 2 pr.scaleBinDf(bindf)
~/Desktop/thesis/IDSProject/dataFramePreprocessing.py in scaleBinDf(df)
77 from sklearn import preprocessing
78 df2 = df.drop('Label', axis=1)
---> 79 colList = df2.columns
80 x = df2.values
81 min_max_scaler = preprocessing.MinMaxScaler()
NameError: name 'preprocessing' is not defined
Does anyone have an idea how I could fix that?
Just write the import statement outside the function as follows:
from sklearn import preprocessing
def scaleBinDf(df):
    ...
    min_max_scaler = preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    ...
Now call the function like this:
import dataFramePreprocessing as pr
pr.scaleBinDf(bindf)
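A minimal sketch of why the extra import in the calling notebook does not help: a function looks up global names in the module where it was defined, not in the caller's namespace. The two modules below are hypothetical stand-ins built from strings, with math playing the role of sklearn.preprocessing; load_module is an illustrative helper, not part of any library.

```python
import math  # imported by the caller, like the notebook's own import
import types


def load_module(name, source):
    """Build a module object from source text (stand-in for a .py file)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# No import inside the module: the caller's `import math` is invisible here.
broken = load_module("broken", "def root(x):\n    return math.sqrt(x)\n")

# Import at module level: every function in the module can see `math`.
fixed = load_module("fixed", "import math\n\ndef root(x):\n    return math.sqrt(x)\n")

try:
    broken.root(9.0)
    broken_error = None
except NameError as exc:
    broken_error = str(exc)

print(broken_error)
print(fixed.root(9.0))
```

If the module file was edited after the notebook first imported it, the kernel may also need to be restarted before the change is picked up.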
Related
I am trying to do cancer detection using a Random Forest classifier. I am trying to create a pickle file with the command pickle.dump(forest,open("model.pkl","wb")), but I am getting a NameError:
NameError Traceback (most recent call last)
c:\Users\hp\newtest\pcancer.ipynb Cell 6' in <cell line: 1>()
----> 1 pickle.dump(forest,open("model.pkl","wb"))
NameError: name 'forest' is not defined
This is my source code for detection:
import numpy as np
import pandas as pd
import warnings as wr
#Ignoring warnings
from sklearn.exceptions import UndefinedMetricWarning
wr.filterwarnings("ignore", category=UndefinedMetricWarning)
import pickle
df=pd.read_csv('Prostate_cancer_data.csv')
print(df.head(10))
print(df.shape)
print(df.isna().sum())
df=df.dropna(axis=1)#Drop the column with empty data
df=df.drop(['id'],axis=1)
#Encoding first column
from sklearn.preprocessing import LabelEncoder
labelencoder_X=LabelEncoder()
df.iloc[:,0]=labelencoder_X.fit_transform(df.iloc[:,0].values)
#Splitting data for dependence
X=df.iloc[:,1:].values
Y=df.iloc[:,0].values
#Train-Test split
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.25,random_state=1)
#Standard scaling
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
X_train=sc.fit_transform(X_train)
X_test=sc.fit_transform(X_test)
from sklearn.ensemble import RandomForestClassifier
def models(X_train,Y_train):
#Random forest classifier
forest=RandomForestClassifier(n_estimators=10,criterion='entropy',random_state=0)
forest.fit(X_train,Y_train)
print("Random Forest:",forest.score(X_train,Y_train))
return forest
print("Accuracy")
model=models(X_train,Y_train)
model=models(X_train,Y_train)
This part of the code is not indented correctly, so it ends up inside the body of models(): it becomes a local statement that acts as a recursive call instead of a top-level call.
There is an indentation problem in the last section of your code. Below is the correctly indented code. When you create the pickle file, dump the model object rather than forest: forest exists only inside models(), and the object it returns is bound to the name model.
from sklearn.ensemble import RandomForestClassifier
def models(X_train,Y_train):
    #Random forest classifier
    forest=RandomForestClassifier(n_estimators=10,criterion='entropy',random_state=0)
    forest.fit(X_train,Y_train)
    print("Random Forest:",forest.score(X_train,Y_train))
    return forest
print("Accuracy")
model=models(X_train,Y_train)
pickle.dump(model,open("model.pkl","wb"))
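The same idea as a small stdlib-only sketch: a name created inside a function is local to it, so only the returned object (bound to a new name by the caller) can be pickled afterwards. The dictionary is a hypothetical stand-in for the trained classifier, and build_model and the temporary path are illustrative names, not part of the original code.

```python
import os
import pickle
import tempfile


def build_model():
    # `forest` exists only while this function runs.
    forest = {"n_estimators": 10, "criterion": "entropy"}  # stand-in for a fitted model
    return forest


# The returned object is bound to `model`; `forest` is not defined out here.
model = build_model()

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# `with` closes the file even if pickling fails.
with open(path, "wb") as fh:
    pickle.dump(model, fh)

with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(restored == model)
```

Using a with block is optional here, but it avoids leaving the file handle open as open("model.pkl","wb") passed inline would.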
When using StandardScaler or MinMaxScaler with the PythonAdv kernel, the Jupyter notebook raises an error. However, with the Python 3 environment the same notebook works fine:
from sklearn.preprocessing import MinMaxScaler
# Scale X values
X_scalar = MinMaxScaler().fit(X_train)
#print(X_scalar)
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
Error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-e5dc00a586d3> in <module>
4 X_scalar = MinMaxScaler().fit(X_train)
5 #print(X_scalar)
----> 6 X_train_scaled = X_scaler.transform(X_train)
7 X_test_scaled = X_scaler.transform(X_test)
NameError: name 'X_scaler' is not defined
I have Anaconda 3, python 3.6 and PythonAdv environments on Git Bash on Windows.
from sklearn.preprocessing import MinMaxScaler
# Scale X values
X_scaler = MinMaxScaler().fit(X_train)
#print(X_scaler)
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
There is a small typo: you define X_scalar but then use X_scaler. The corrected code above uses X_scaler throughout.
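For reference, a rough pure-Python sketch of the fit-then-transform pattern, assuming the default 0 to 1 feature range, where each column is scaled as (x - min) / (max - min). The helper names fit_min_max and transform_min_max are hypothetical, and the sketch assumes no column is constant (so max - min is never zero). The point is that the statistics are learned once from the training data and reused for the test data under one consistently spelled name.

```python
def fit_min_max(rows):
    """Learn per-column minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(col) for col in columns], [max(col) for col in columns]


def transform_min_max(rows, mins, maxs):
    """Scale each value with the statistics learned by fit_min_max."""
    return [
        [(value - lo) / (hi - lo) for value, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


X_train = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
X_test = [[2.0, 40.0]]

mins, maxs = fit_min_max(X_train)  # fit on training data only
X_train_scaled = transform_min_max(X_train, mins, maxs)
X_test_scaled = transform_min_max(X_test, mins, maxs)

print(X_train_scaled)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
print(X_test_scaled)   # [[0.25, 1.5]]
```

Test values can fall outside the 0 to 1 range, as the 1.5 here shows, because the minimum and maximum come from the training set.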
I am trying to implement nearest neighbor classifier in Turi Create, however I am unsure of this error I am getting. This error occurs when I create the actual model. I am using python 3.6 if that helps.
Error:
Traceback (most recent call last):
File "/Users/PycharmProjects/turi/turi.py", line 51, in <module>
iris_cross()
File "/Users/PycharmProjects/turi/turi.py", line 37, in iris_cross
clf = tc.nearest_neighbor_classifier(train_data, target='4', features=features)
TypeError: 'module' object is not callable
Code:
import turicreate as tc
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
import time
import numpy as np
#Iris Classification Cross Validation
def iris_cross():
    iris = datasets.load_iris()
    features = ['0','1','2','3']
    target = iris.target_names
    x = iris.data
    y = iris.target.astype(int)
    undata = np.column_stack((x,y))
    data = tc.SFrame(pd.DataFrame(undata))
    print(data)
    train_data, test_data = data.random_split(.8)
    clf = tc.nearest_neighbor_classifier(train_data, target='4', features=features)
    print('done')
iris_cross()
You have to actually call the create() method of the nearest_neighbor_classifier. See the library API.
Run the following line of code instead:
clf = tc.nearest_neighbor_classifier.create(train_data, target='4', features=features)
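The error message itself can be reproduced with any module, which may help when diagnosing it elsewhere. This sketch uses the standard-library json module as a stand-in for tc.nearest_neighbor_classifier: the module object is not callable, but a function inside it is.

```python
import json

# A module is a namespace, not a function.
module_is_callable = callable(json)
function_is_callable = callable(json.dumps)

try:
    json({"a": 1})  # same mistake as calling tc.nearest_neighbor_classifier(...)
    message = None
except TypeError as exc:
    message = str(exc)

print(module_is_callable, function_is_callable)
print(message)
print(json.dumps({"a": 1}))  # calling the function inside the module works
```

Checking callable(...) on the name you are about to call is a quick way to tell a submodule from a function or class.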
I'm following this tutorial on implementing a recommendation system, and I ran into a problem when importing train_test_split from sklearn.model_selection in order to do the train/test split.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
#create columns name
header = ['user_id', 'item_id', 'rating', 'timestamp']
#read data containing the full dataset of ratings
df = pd.read_csv('ml-100k/u.data', sep='\t', names=header)
n_users = df.user_id.unique().shape[0]
n_items = df.item_id.unique().shape[0]
print 'Number of users = ' + str(n_users) + ' | Number of movies = ' + str(n_items)
#train_data, test_data = train_test_split(df,test_size=0.25)
#print 'train shape = ' + str(train_data.shape)
Log error:
Traceback (most recent call last):
File "C:/Users/PycharmProjects/recommendation_system_trials/engine.py", line 3, in <module>
from sklearn.model_selection import train_test_split
File "C:\Users\hello2\lib\site-packages\sklearn\__init__.py", line 57, in <module>
from .base import clone
File "C:\Users\hello2\lib\site-packages\sklearn\base.py", line 10, in <module>
from scipy import sparse
ImportError: No module named scipy
Why am I getting this error? I'm not using scipy; I just wanted to import train_test_split.
Thank you for your help.
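The traceback shows where the dependency comes from: sklearn's own base.py runs "from scipy import sparse", so scikit-learn needs SciPy to be installed even when your script never imports it directly. As a small sketch, the following checks which of these packages the current interpreter can see; the list of package names is an assumption for illustration.

```python
import importlib.util

# Packages to check; adjust the list as needed (assumed for illustration).
required = ["numpy", "scipy", "sklearn"]

# find_spec returns None when a top-level package cannot be found.
status = {name: importlib.util.find_spec(name) is not None for name in required}

for name, found in status.items():
    print(name, "found" if found else "MISSING")
```

Anything reported as missing would need to be installed into the same environment the script runs in, for example with pip or conda.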
I've been attempting to fit this data by a Linear Regression, following a tutorial on bigdataexaminer. Everything was working fine up until this point. I imported LinearRegression from sklearn, and printed the number of coefficients just fine. This was the code before I attempted to grab the coefficients from the console.
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import sklearn
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
boston = load_boston()
bos = pd.DataFrame(boston.data)
bos.columns = boston.feature_names
bos['PRICE'] = boston.target
X = bos.drop('PRICE', axis = 1)
lm = LinearRegression()
After I had all this set up I ran the following command, and it returned the proper output:
In [68]: print('Number of coefficients:', len(lm.coef_))
Number of coefficients: 13
However, if I now run this same line again, or use lm.coef_ in any way, it tells me coef_ isn't an attribute of LinearRegression. This happens right after I had just used it successfully, and I didn't touch any of the code in between.
In [70]: print('Number of coefficients:', len(lm.coef_))
Traceback (most recent call last):
File "<ipython-input-70-5ad192630df3>", line 1, in <module>
print('Number of coefficients:', len(lm.coef_))
AttributeError: 'LinearRegression' object has no attribute 'coef_'
The coef_ attribute is created when the fit() method is called. Before that, it will be undefined:
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.datasets import load_boston
>>> from sklearn.linear_model import LinearRegression
>>> boston = load_boston()
>>> lm = LinearRegression()
>>> lm.coef_
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-22-975676802622> in <module>()
7
8 lm = LinearRegression()
----> 9 lm.coef_
AttributeError: 'LinearRegression' object has no attribute 'coef_'
If we call fit(), the coefficients will be defined:
>>> lm.fit(boston.data, boston.target)
>>> lm.coef_
array([ -1.07170557e-01, 4.63952195e-02, 2.08602395e-02,
2.68856140e+00, -1.77957587e+01, 3.80475246e+00,
7.51061703e-04, -1.47575880e+00, 3.05655038e-01,
-1.23293463e-02, -9.53463555e-01, 9.39251272e-03,
-5.25466633e-01])
My guess is that somehow you forgot to call fit() when you ran the problematic line.
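A toy illustration of that behaviour, not scikit-learn's actual implementation: TinyLinearRegression is a hypothetical one-feature least-squares model whose trailing-underscore attributes are created inside fit(), so they do not exist on a fresh instance. It also shows one way the situation in the question can arise, namely re-running the line that creates the estimator.

```python
class TinyLinearRegression:
    """Hypothetical one-feature least-squares model, for illustration only."""

    def fit(self, x, y):
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        # Attributes ending in "_" are created here, not in __init__.
        self.coef_ = sxy / sxx
        self.intercept_ = mean_y - self.coef_ * mean_x
        return self


lm = TinyLinearRegression()
had_coef_before_fit = hasattr(lm, "coef_")

lm.fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # points on y = 2x + 1
had_coef_after_fit = hasattr(lm, "coef_")
print(had_coef_before_fit, had_coef_after_fit, lm.coef_, lm.intercept_)
# False True 2.0 1.0

# Re-running the constructor line replaces the fitted object with a fresh one.
lm = TinyLinearRegression()
needs_refit = not hasattr(lm, "coef_")
print(needs_refit)  # True
```

In an interactive session, re-executing a cell that contains lm = LinearRegression() would have the same effect of discarding the fitted state.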
I ran into the same problem with linear regression: the error said the object has no attribute 'coef_'.
Only a slight change in the code is needed: call fit() before reading the attributes.
linreg = LinearRegression()
linreg.fit(X,y) # fit the linear model to the data
print(linreg.intercept_)
print(linreg.coef_)
I hope this helps. Thanks.