When I apply StandardScaler or MinMaxScaler under the PythonAdv kernel, the Jupyter notebook raises an error. However, the same notebook works fine in the Python 3 environment:
from sklearn.preprocessing import MinMaxScaler
# Scale X values
X_scalar = MinMaxScaler().fit(X_train)
#print(X_scalar)
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
Error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-e5dc00a586d3> in <module>
4 X_scalar = MinMaxScaler().fit(X_train)
5 #print(X_scalar)
----> 6 X_train_scaled = X_scaler.transform(X_train)
7 X_test_scaled = X_scaler.transform(X_test)
NameError: name 'X_scaler' is not defined
I have Anaconda 3, python 3.6 and PythonAdv environments on Git Bash on Windows.
from sklearn.preprocessing import MinMaxScaler
# Scale X values
X_scaler = MinMaxScaler().fit(X_train)
#print(X_scaler)
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
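There is a small typo: you define X_scalar but then use X_scaler. It probably works under the other kernel only because X_scaler is still defined there from a cell you ran earlier in that session.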
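A minimal sketch of the fit-then-transform pattern with one consistent variable name. The toy arrays are made up for illustration and stand in for the X_train and X_test from the question:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data (an assumption for illustration, not the asker's dataset)
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

# Fit on the training data only, then reuse the same fitted object
X_scaler = MinMaxScaler().fit(X_train)
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)
```

Because the scaler learns its minimum and maximum from X_train alone, test values outside the training range can land outside [0, 1].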
Related
I'm trying to run a RandomForestClassifier model on my dataset and the error below pops up. Does anyone know a solution? I'm using Spark version 3.3.1 and Python version 3.8.
model_df = output.select(["features","OrderMonth"])
train_df, test_df = model_df.randomSplit([0.7,0.3])
from pyspark.ml.classification import RandomForestClassifier
rfc = RandomForestClassifier(numTrees=10, labelCol="OrderMonth").fit(train_df)
rf_pred = rfc.transform(test_df)
rf_pred.show()
Py4JJavaError Traceback (most recent call last)
<ipython-input-56-5ed675f09e07> in <module>
7 from pyspark.ml.classification import RandomForestClassifier
8
----> 9 rfc = RandomForestClassifier(numTrees=10, labelCol="OrderMonth").fit(train_df)#change n_e to a bigger number
10
11 #rfc.fit(X_train,y_train)
I created a python file called dataFramePreprocessing.py with some defined functions to use in my other notebooks. In one of the functions I'm using sklearn.preprocessing. This is the function raising an error:
def scaleBinDf(df):
    from sklearn import preprocessing
    ...
    min_max_scaler = preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    ...
When I call the function in the other file (all the other functions work just fine), like so:
import dataFramePreprocessing as pr
from sklearn import preprocessing
pr.scaleBinDf(bindf)
this happens:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-15-616840fc11d7> in <module>
1 from sklearn import preprocessing
----> 2 pr.scaleBinDf(bindf)
~/Desktop/thesis/IDSProject/dataFramePreprocessing.py in scaleBinDf(df)
77 from sklearn import preprocessing
78 df2 = df.drop('Label', axis=1)
---> 79 colList = df2.columns
80 x = df2.values
81 min_max_scaler = preprocessing.MinMaxScaler()
NameError: name 'preprocessing' is not defined
Does anyone have an idea how I could fix that?
Just write the import statement outside the function as follows:
from sklearn import preprocessing

def scaleBinDf(df):
    ...
    min_max_scaler = preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    ...
Now call this function like this:
import dataFramePreprocessing as pr
pr.scaleBinDf(bindf)
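As a self-contained sketch of the module-level import pattern: the function name scale_features, the label_col parameter and the toy DataFrame are hypothetical, and the body only mirrors the steps visible in the traceback (drop the 'Label' column, scale the remaining values).

```python
import pandas as pd
from sklearn import preprocessing  # module-level import, visible to every function below


def scale_features(df, label_col="Label"):
    """Min-max scale every column except the label (hypothetical helper)."""
    features = df.drop(label_col, axis=1)
    scaler = preprocessing.MinMaxScaler()
    scaled_values = scaler.fit_transform(features.values)
    out = pd.DataFrame(scaled_values, columns=features.columns, index=df.index)
    out[label_col] = df[label_col]
    return out


# Toy input, made up for illustration
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 30.0, 20.0], "Label": [0, 1, 0]})
scaled = scale_features(df)
print(scaled)
```

If the notebook imported the module before the file was edited, the kernel may still be running the old version; restarting the kernel or calling importlib.reload(pr) picks up the change.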
I tried to use the code below to fit a robust regression model using RANSAC:
from sklearn.linear_model import RANSACRegressor
ransac = RANSACRegressor(LinearRegression(),
                         max_trials=100,
                         min_samples=50,
                         residual_metric=lambda x: np.sum(np.abs(x), axis=1),
                         residual_threshold=5.0,
                         random_state=0)
ransac.fit(X,y)
And I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-38-832d8b5d351b> in <module>
5 residual_metric=lambda x: np.sum(np.abs(x), axis=1),
6 residual_threshold=5.0,
----> 7 random_state=0)
8 ransac.fit(X,y)
TypeError: __init__() got an unexpected keyword argument 'residual_metric'
Can you help me understand what's wrong?
Most likely this code was written for an older version of scikit-learn. The residual_metric parameter of RANSACRegressor was deprecated and later removed; the loss parameter replaces it. Without residual_metric, the code runs fine:
from sklearn.linear_model import RANSACRegressor, LinearRegression
ransac = RANSACRegressor(LinearRegression(),
                         max_trials=100,
                         min_samples=50,
                         residual_threshold=5.0,
                         random_state=0)
ransac
RANSACRegressor(base_estimator=LinearRegression(), min_samples=50,
random_state=0, residual_threshold=5.0)
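If you still want a custom residual, a sketch along these lines may work in current scikit-learn, where the loss parameter accepts a callable taking the true and predicted values. The synthetic data and the abs_loss helper are assumptions for illustration, not part of the original question:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Synthetic line y = 3x + 2 with small noise and three gross outliers (made-up data)
rng = np.random.RandomState(0)
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=0.5, size=100)
outlier_idx = np.array([5, 40, 75])
y[outlier_idx] += 200.0


def abs_loss(y_true, y_pred):
    # Per-sample absolute error; must return a 1-D array with one value per sample
    return np.abs(y_true - y_pred)


ransac = RANSACRegressor(LinearRegression(),
                         max_trials=100,
                         min_samples=50,
                         loss=abs_loss,
                         residual_threshold=5.0,
                         random_state=0)
ransac.fit(X, y)

print(ransac.estimator_.coef_, ransac.estimator_.intercept_)
print(ransac.inlier_mask_.sum(), "inliers")
```

Samples whose loss exceeds residual_threshold are treated as outliers, which is the role residual_metric used to play.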
I am new to PySpark and am using the logistic regression API. I followed some tutorials and did it this way:
from pyspark.ml.classification import LogisticRegression
train, test = df.randomSplit([0.80, 0.20], seed = some_seed)
LR = LogisticRegression(featuresCol = 'features', labelCol = 'label', maxIter=some_iter)
LR_model = LR.fit(train)
When I call
trainingSummary = LR_model.summary
trainingSummary.roc
I get
--------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-319-bf79768ab64e> in <module>()
1 trainingSummary = LR_model.summary
2
----> 3 trainingSummary.roc
AttributeError: 'LogisticRegressionTrainingSummary' object has no attribute 'roc'
Does anyone have an idea?
I started studying machine learning. I am following a Google tutorial, but I hit this error, and the answers I have found haven't worked for my code. I'm not sure, but it seems that the Python version has changed and no longer uses some library.
This is the error:
[0 1 2]
[0 1 2]
Warning (from warnings module):
File "C:\Users\Moi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\externals\six.py", line 31
"(https://pypi.org/project/six/).", DeprecationWarning)
DeprecationWarning: The module is deprecated in version 0.21 and will be removed in version 0.23 since we've dropped support for Python 2.7. Please rely on the official version of six (https://pypi.org/project/six/).
Traceback (most recent call last):
File "C:\Users\Moi\Desktop\python\ML\decision tree.py", line 30, in <module>
graph.write_pdf("iris.pdf")
AttributeError: 'list' object has no attribute 'write_pdf'
This is the code:
import numpy as np
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris ()
test_idx = [0,50,100]
#training data
train_target = np.delete(iris.target ,test_idx)
train_data = np.delete(iris.data, test_idx, axis= 0)
#testing data
test_target = iris.target [test_idx]
test_data = iris.data[test_idx]
clf = tree.DecisionTreeClassifier ()
clf.fit (train_data, train_target)
print (test_target )
print (clf.predict (test_data))
# viz code
from sklearn.externals.six import StringIO
import pydot
dot_data =StringIO()
tree.export_graphviz(clf,
                     out_file=dot_data,
                     feature_names=iris.feature_names,
                     class_names=iris.target_names,
                     filled=True, rounded=True,
                     impurity=False)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
graph_from_dot_data returns a list of graphs (hence the 'list' object in the error), so you have to unpack it to get the graph.
Change:
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
to:
(graph,) = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
Credit: https://www.programcreek.com/python/example/84621/pydot.graph_from_dot_data
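A sketch that also sidesteps the deprecated sklearn.externals.six import: with out_file=None, export_graphviz returns the DOT source as a string, so no StringIO is needed. The pydot lines are left commented out because they assume pydot and the Graphviz binaries are installed:

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# out_file=None makes export_graphviz return the DOT text directly
dot_source = tree.export_graphviz(clf,
                                  out_file=None,
                                  feature_names=iris.feature_names,
                                  class_names=iris.target_names,
                                  filled=True, rounded=True,
                                  impurity=False)
print(dot_source[:60])

# Requires pydot and Graphviz; unpack the single graph from the returned list:
# import pydot
# (graph,) = pydot.graph_from_dot_data(dot_source)
# graph.write_pdf("iris.pdf")
```

The one-element unpacking `(graph,) = ...` raises an error if the list ever holds more than one graph, which makes it a slightly safer choice than indexing with [0].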