sklearn doesn't have attribute 'datasets' - python

I have started using scikit-learn for my work. So I was going through the tutorial, which gives the standard procedure to load some datasets:
$ python
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> digits = datasets.load_digits()
However, for my convenience, I tried loading the data in the following way:
In [1]: import sklearn
In [2]: iris = sklearn.datasets.load_iris()
However, this throws the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-db77d2036db5> in <module>()
----> 1 iris = sklearn.datasets.load_iris()
AttributeError: 'module' object has no attribute 'datasets'
However, if I use the apparently similar method:
In [3]: from sklearn import datasets
In [4]: iris = datasets.load_iris()
It works without a problem. In fact, the following also works:
In [5]: iris = sklearn.datasets.load_iris()
I am completely confused about this. Am I missing something very trivial? What is the difference between the two approaches?

sklearn is a package. This answer said it very succinctly:
when you import a package, only variables/functions/classes in the __init__.py file of that package are directly visible, not sub-packages or modules.
datasets is a sub-package of sklearn. This is why this happens:
In [1]: import sklearn
In [2]: sklearn.datasets
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-325a2bfc35d0> in <module>()
----> 1 sklearn.datasets
AttributeError: module 'sklearn' has no attribute 'datasets'
However, the reason why this works:
In [3]: from sklearn import datasets
In [4]: sklearn.datasets
Out[4]: <module 'sklearn.datasets' from '/home/ethan/.virtualenvs/test3/lib/python3.5/site-packages/sklearn/datasets/__init__.py'>
is that when you load the sub-package datasets by doing from sklearn import datasets it is automatically added to the namespace of the package sklearn. This is one of the lesser-known "traps" of the Python import system.
Also, note that if you look at the __init__.py for sklearn you will see 'datasets' as a member of __all__, but this only allows you to do:
In [1]: from sklearn import *
In [2]: datasets
Out[2]: <module 'sklearn.datasets' from '/home/ethan/.virtualenvs/test3/lib/python3.5/site-packages/sklearn/datasets/__init__.py'>
One last point to note is that if you inspect either sklearn or datasets you will see that, although they are packages, their type is module. This is because all packages are considered modules - however, not all modules are packages.
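You can check this in the interpreter; a quick illustration (the __path__ attribute is what only packages carry):
In [5]: import sklearn
In [6]: type(sklearn)
Out[6]: <class 'module'>
In [7]: hasattr(sklearn, '__path__')  # only packages have a __path__
Out[7]: True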

Import Error: cannot import name 'tree' from 'sklearn.tree'

I am on my second day of re-taking Python for the gazillionth time!
I am doing a tutorial on ML in Python, using the following code:
import sklearn.tree
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import tree
music_data = pd.read_csv('music.csv')
x = music_data.drop(columns=['genre'])
y = music_data['genre']
model = DecisionTreeClassifier()
model.fit(x,y)
tree.export_graphviz(model, out_file='music-recommender.dot',
                     feature_names=['age','gender'],
                     class_names=sorted(y.unique()),
                     label='all',
                     rounded=True,
                     filled=True)
I keep getting the following error:
ImportError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13088/3820271611.py in <module>
2 import pandas as pd
3 from sklearn.tree import DecisionTreeClassifier
----> 4 from sklearn.tree import tree
5
6 music_data = pd.read_csv('music.csv')
ImportError: cannot import name 'tree' from 'sklearn.tree' (C:\Anaconda\lib\site-packages\sklearn\tree\__init__.py)
I've tried to find a solution online, but I don't think it's the version of Python/Anaconda because I literally just installed both. I also don't think it's sklearn.tree, since I was able to import DecisionTreeClassifier.
As this answer indicates, you're looking at some older code; this is always a risk with programming. But there's another thing you need to know about your code.
First off, scikit-learn contains several modules, and almost everything you need from it is in one of those. In my experience, most people import things like this:
from sklearn.tree import DecisionTreeRegressor # A regressor class.
from sklearn.tree import plot_tree # A helpful function.
from sklearn.metrics import mean_squared_error # An evaluation function.
It looks like the tutorial wants something similar to plot_tree(). This new-ish function is much easier to use than the older Graphviz visualization. So unless you really need the DOT file for some reason, you should be able to do this:
from sklearn.tree import plot_tree
plot_tree(model)
Bottom line: there will probably be more broken things in that material. So if I were you I'd either make a new environment with a version of sklearn matching whatever material you're using... or ditch that material and look for something newer.
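For reference, a minimal corrected version of the tutorial script might look like this (a sketch, assuming music.csv really has 'age', 'gender' and 'genre' columns as the tutorial implies):
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

music_data = pd.read_csv('music.csv')
X = music_data.drop(columns=['genre'])
y = music_data['genre']

model = DecisionTreeClassifier()
model.fit(X, y)

# plot_tree replaces the older export_graphviz/DOT workflow
plot_tree(model, feature_names=['age', 'gender'],
          class_names=sorted(y.unique()), filled=True, rounded=True)
plt.show()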
from sklearn.tree import tree looks wrong. Did you mean from sklearn import tree?
According to the official scikit-learn Decision Trees documentation, you really do not need that many imports.
It can be done simply as follows:
from sklearn import tree
import pandas as pd
music_data = pd.read_csv('music.csv')
X = music_data.drop(columns=['genre'])
y = music_data['genre']
model = tree.DecisionTreeClassifier()
model.fit(X,y)
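If you do want the DOT file the tutorial produces, the same tree module exposes export_graphviz; a sketch continuing from the code above:
tree.export_graphviz(model, out_file='music-recommender.dot',
                     feature_names=['age', 'gender'],
                     class_names=sorted(y.unique()),
                     label='all', rounded=True, filled=True)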

I get an error: cannot import from file helper

Please help, I get the error below when running a Jupyter notebook.
import numpy as np
import pandas as pd
from helper import boston_dataframe
np.set_printoptions(precision=3, suppress=True)
Error:
ImportError Traceback (most recent call last)
<ipython-input-3-a6117bd64450> in <module>
1 import numpy as np
2 import pandas as pd
----> 3 from helper import boston_dataframe
4
5
ImportError: cannot import name 'boston_dataframe' from 'helper' (/Users/irina/opt/anaconda3/lib/python3.8/site-packages/helper/__init__.py)
Since you did not say where you got the notebook, I have to guess that it came from the course Supervised Learning: Regression provided by IBM.
The zip folder for week 1 provides helper.py.
What you need to do is change the working directory to where that file is. See Change IPython/Jupyter notebook working directory.
Alternatively, you can load the Boston data from sklearn and put it into a pandas DataFrame, as sketched below.
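For example (a sketch, not from the course; note that load_boston was removed in scikit-learn 1.2, so on recent versions the dataset has to come from OpenML):
import pandas as pd
from sklearn.datasets import fetch_openml

boston = fetch_openml(name="boston", version=1, as_frame=True)
boston_df = boston.frame  # features plus the MEDV target column
print(boston_df.head())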
Advice for you:
Learn how to use Jupyter notebooks
Learn how Python imports work
Learn how to provide information in a question so that no one needs to guess

'svd' has no "split" attribute

I am trying to build a recommender system using the SVD algorithm from the surprise Python package. I am importing a csv file and then doing the operations below, but it shows an error. How do I solve this?
from surprise import SVD,Reader,Dataset
ratings = pd.read_csv("/content/ratings_small.csv")
data = Dataset.load_from_df(ratings[['userId','movieId','rating']],reader)
data.split(n_folds=5)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-29-f3bf344cf3e2> in <module>()
----> 1 data.split(n_folds=5)
AttributeError: 'DatasetAutoFolds' object has no attribute 'split'
It says it has no split attribute, but I went through a question where they used it.
You need to import KFold from model_selection to split the data and perform cross validation.
This works.
import pandas as pd
from surprise import SVD, Reader, Dataset
from surprise.model_selection import KFold

reader = Reader(rating_scale=(0.5, 5))  # the question's code never defined reader
ratings = pd.read_csv("/content/ratings_small.csv")
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
kf = KFold(n_splits=5)
for trainset, testset in kf.split(data):  # split() yields (trainset, testset) pairs
    SVD().fit(trainset)
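Alternatively (a suggestion beyond the original answer), surprise's cross_validate helper runs the whole fold loop for you:
from surprise.model_selection import cross_validate
cross_validate(SVD(), data, measures=['RMSE', 'MAE'], cv=5, verbose=True)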

Does xgboost have feature_importances_?

I'm calling xgboost via its scikit-learn-style Python interface:
model = xgboost.XGBRegressor()
%time model.fit(trainX, trainY)
testY = model.predict(testX)
Some sklearn models tell you which importance they assign to features via the attribute feature_importances_. This doesn't seem to exist for the XGBRegressor:
model.feature_importances_
AttributeError Traceback (most recent call last)
<ipython-input-36-fbaa36f9f167> in <module>()
----> 1 model.feature_importances_
AttributeError: 'XGBRegressor' object has no attribute 'feature_importances_'
The weird thing is: For a collaborator of mine the attribute feature_importances_ is there! What could be the issue?
These are the versions I have:
In [2]: xgboost.__version__
Out[2]: '0.6'
In [4]: sklearn.__version__
Out[4]: '0.18.1'
... and the xgboost C++ library from github, commit ef8d92fc52c674c44b824949388e72175f72e4d1.
How did you install xgboost? Did you build the package after cloning it from github, as described in the doc?
http://xgboost.readthedocs.io/en/latest/build.html
As in this answer:
Feature Importance with XGBClassifier
There always seems to be a problem with the pip installation of xgboost. Building and installing it from source seems to help.
This worked for me:
model.get_booster().get_score(importance_type='weight')
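get_score returns a dict mapping feature names to scores; to read it more easily you can sort it (a small sketch, assuming model is the fitted XGBRegressor from the question):
scores = model.get_booster().get_score(importance_type='weight')
for feature, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(feature, score)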
Hope it helps.
This may also be useful:
xgb.plot_importance(bst)
See the documentation for plot_importance.

Why can't I import statsmodels directly?

I'm certainly missing something very obvious here, but why does this work:
a = [0.2635,0.654654,0.365,0.4545,1.5465,3.545]
import statsmodels.robust as rb
print rb.scale.mad(a)
0.356309343367
but this doesn't:
import statsmodels as sm
print sm.robust.scale.mad(a)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-1ce0c872b0be> in <module>()
----> 1 print statsmodels.robust.scale.mad(a)
AttributeError: 'module' object has no attribute 'robust'
For the long answer, see http://www.statsmodels.org/stable/importpaths.html
statsmodels intentionally keeps its __init__.py files mostly empty, but provides a parallel import collection through api.py.
The recommended import for interactive work, import statsmodels.api as sm, imports almost all of statsmodels, numpy, pandas and patsy, and large parts of scipy. This is slooow on cold start.
If we want to import just a specific part of statsmodels, then we don't need to import all these extras. Having empty __init__.py means that we can import just a single module (which of course imports the dependencies of that module).
e.g. from statsmodels.robust.scale import mad or
import statsmodels.robust.scale as smscale
smscale.mad(...)
(Small caveat: Some of the very low level imports might not remain always backwards compatible if the internal structure changes. However, the general policy is to deprecate functions over one or two releases while maintaining the old access structure.)
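Concretely, the two import styles look like this (a sketch using the question's data; sm.robust is exposed through statsmodels.api):
# Heavy, convenient interactive import (pulls in most of statsmodels):
import statsmodels.api as sm
a = [0.2635, 0.654654, 0.365, 0.4545, 1.5465, 3.545]
print(sm.robust.scale.mad(a))

# Lightweight, targeted import of a single module:
from statsmodels.robust.scale import mad
print(mad(a))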
You can, you just have to import robust as well:
import statsmodels as sm
import statsmodels.robust
Then:
>>> sm.robust.scale.mad(a)
0.35630934336679576
robust is a subpackage of statsmodels, and importing a package does not in general automatically import subpackages (unless the package is written to do so explicitly).
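To see the rule in isolation, here is a hypothetical two-file package (mypkg and sub are made-up names, not a real library):
# Layout: mypkg/__init__.py (empty) and mypkg/sub/__init__.py
import mypkg
mypkg.sub        # AttributeError: module 'mypkg' has no attribute 'sub'
import mypkg.sub
mypkg.sub        # works now: importing the submodule bound 'sub' on mypkg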
