I'm certainly missing something very obvious here, but why does this work:
a = [0.2635,0.654654,0.365,0.4545,1.5465,3.545]
import statsmodels.robust as rb
print rb.scale.mad(a)
0.356309343367
but this doesn't:
import statsmodels as sm
print sm.robust.scale.mad(a)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-1ce0c872b0be> in <module>()
----> 1 print statsmodels.robust.scale.mad(a)
AttributeError: 'module' object has no attribute 'robust'
For the long answer, see http://www.statsmodels.org/stable/importpaths.html
Statsmodels intentionally keeps its __init__.py mostly empty, but provides a parallel import collection through api.py.
The recommended import for interactive work, import statsmodels.api as sm, imports almost all of statsmodels, numpy, pandas and patsy, and large parts of scipy. This is slow on a cold start.
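For the example in the question, the api route looks like this (a quick sketch; statsmodels.api should expose the robust subpackage, so sm.robust works without any extra imports):
import statsmodels.api as sm  # eager import: pulls in most of statsmodels and its dependencies

a = [0.2635, 0.654654, 0.365, 0.4545, 1.5465, 3.545]
print(sm.robust.scale.mad(a))  # ~0.3563, same value as above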
If we only want a specific part of statsmodels, then we don't need to import all these extras. Having an empty __init__.py means that we can import just a single module (which, of course, imports that module's own dependencies).
e.g. from statsmodels.robust.scale import mad or
import statsmodels.robust.scale as smscale
smscale.mad(...)
(Small caveat: some of the very low-level imports might not always remain backwards compatible if the internal structure changes. However, the general policy is to deprecate functions over one or two releases while maintaining the old access structure.)
You can, you just have to import robust as well:
import statsmodels as sm
import statsmodels.robust
Then:
>>> sm.robust.scale.mad(a)
0.35630934336679576
robust is a subpackage of statsmodels, and importing a package does not in general automatically import subpackages (unless the package is written to do so explicitly).
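To illustrate "written to do so explicitly": a package can pre-import its own subpackages in its __init__.py, and only then does attribute access work right after importing the parent. A toy sketch with made-up names:
# mypkg/__init__.py
from . import subpkg      # the package opts in: subpkg is imported eagerly

# user code
import mypkg
mypkg.subpkg.something()   # works, because mypkg's __init__.py already imported subpkg
Statsmodels deliberately does not do this in __init__.py (only in api.py), which is why sm.robust is missing until you import it yourself.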
Related
Scipy handles submodules differently from Numpy. For example:
import scipy as sp
import numpy as np
A = np.eye(4)
np.linalg.det(A)
sp.linalg.det(A)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'scipy' has no attribute 'linalg'
This is moderately annoying because of the asymmetry with respect to Numpy, but it is exactly the behaviour that the documentation describes. The proper usage according to the docs is
from scipy import linalg
import numpy as np
A = np.eye(4)
np.linalg.det(A)
linalg.det(A) # using Scipy
which works just fine.
Now, here's the weird thing
import scipy as sp
import numpy as np
from scipy.linalg import expm # extra line inserted into first example
A = np.eye(4)
np.linalg.det(A)
sp.linalg.det(A)
then the Numpy-style code works just fine. The extra line causes linalg to be added to the namespace sp, a side effect of the extra import.
I get the programming pattern I want but the third line is not easy to explain in example code.
QUESTION: why does Scipy do this? is there any more straightforward way to have it so that Scipy behaves more Numpyish?
In fact, you almost never need or want to import scipy as sp, or anything like that.
There is almost nothing in the top-level scipy namespace. All the useful stuff is in subpackages (one exception is LowLevelCallable, which is in the top-level namespace). So users are better off either importing names from subpackages (from scipy.signal import detrend) or importing the subpackages themselves (from scipy import signal; signal.detrend(...)).
As to the disparity with numpy, numpy is very much the opposite: a lot of useful stuff is in the top-level namespace, so you import it from there.
Unless you're using np.linalg, np.random, np.fft or np.testing, which are public-facing usable submodules.
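If you want to see the attribute-binding side effect from the "weird thing" example directly, a small check (with SciPy versions that behave as shown above):
import scipy
print(hasattr(scipy, 'linalg'))   # False: the subpackage has not been imported yet

from scipy.linalg import expm     # the "extra line"
print(hasattr(scipy, 'linalg'))   # True: importing scipy.linalg bound it as an
                                  # attribute of the already-imported scipy module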
When importing the lines below, Jupyter gives an error:
ImportError: cannot import name 'deprecated' from 'gensim.utils'
from gensim.summarization.summarizer import summarize
from gensim.summarization import keywords
Error as follows:
~\AppData\Local\Programs\Python\Python39\Lib\site-packages\gensim\summarization\summarizer.py in <module>
54
55 import logging
---> 56 from gensim.utils import deprecated
57 from gensim.summarization.pagerank_weighted import pagerank_weighted as _pagerank
58 from gensim.summarization.textcleaner import clean_text_by_sentences as _clean_text_by_sentences
ImportError: cannot import name 'deprecated' from 'gensim.utils' (C:\Users\PavanKumar\AppData\Local\Programs\Python\Python39\Lib\site-packages\gensim\utils.py)
The summarization code was removed in Gensim 4.0. See:
https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4#12-removed-gensimsummarization
12. Removed gensim.summarization
Despite its general-sounding name, the module will not satisfy the majority of use cases in production and is likely to waste people's time. See this GitHub ticket for more motivation behind this.
If you need it, you could try:
installing the older gensim version; or…
copying the source code out into your own local module
However, I expect you'd likely be disappointed by its inflexibility and how little it can do. It's only extractive summarization (choosing a few key sentences from those that already exist), which only gives impressive results when the source text was already well written in an expository style that mixes high-level summaries with details. And its method of analyzing and ranking words is very crude and hard to customize.
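If you do pin an older release, a minimal sketch of the old API (assuming gensim 3.8.3, the last 3.x release; article.txt is a placeholder for your own text):
# pip install gensim==3.8.3   (gensim.summarization only exists in the 3.x line)
from gensim.summarization.summarizer import summarize
from gensim.summarization import keywords

text = open("article.txt").read()     # any reasonably long, well-structured text
print(summarize(text, ratio=0.2))     # extractive summary: roughly 20% of the sentences
print(keywords(text, words=10))       # top keyword candidates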
I'm working in Python, searching for an implementation of a random subspace ensemble classifier, and I found the following code on GitHub:
https://github.com/mwygoda/randomSubspaceImplementation/blob/master/solution.py
The author depends on these two imports:
from utils import prepare_data_from_file
from utils import plot_results
I tried to install utils with pip3; it installed, and import utils as ut works, but I still get the error cannot import name 'plot_results' or 'prepare_data_from_file'.
Can anyone help me fix this?
That file is not in the repo. You will have to implement it yourself.
From the code, it looks like prepare_data_from_file returns a feature vector and target labels, e.g.:
import pandas as pd

def prepare_data_from_file(file):
    # Guess: read a CSV and return feature and label columns ('A' and 'B' are placeholders)
    df = pd.read_csv(file)
    return df['A'], df['B']
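You'd have to guess at plot_results the same way, e.g. something like this (assuming the caller just collects a sequence of scores; matplotlib is the obvious choice):
import matplotlib.pyplot as plt

def plot_results(results):
    # Guess: plot whatever sequence of scores the caller accumulated
    plt.plot(results)
    plt.xlabel("run")
    plt.ylabel("score")
    plt.show()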
But this is mere speculation. Now get off Stack Overflow and go do your assignment.
When I run GridSearchCV() or RandomizedSearchCV() in parallel (with n_jobs>1 or n_jobs=-1 set),
it shows this message:
ImportError: [joblib] Attempting to do parallel computing without
protecting your import on a system that does not support forking. To
use parallel-computing in a script, you must protect your main loop
using "if name == 'main'". Please see the joblib documentation on
Parallel for more information" I put the code in a class in .py file
and call it using if_name_=='main in other .py file but it still shows
this message
It works fine when n_jobs=1.
import platform; print(platform.platform())
Windows-10-10.0.10586-SP0
import numpy; print("NumPy", numpy.__version__)
NumPy 1.13.1
import scipy; print("SciPy", scipy.__version__)
SciPy 0.19.1
import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.19.0
UPDATE
I tried this code but it still gives me the same error
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

class Test():
    def __init__(self):
        attributes = [..]
        dataset = pd.read_csv("..")
        X = dataset[[..]]
        Y = dataset[...]
        model = DecisionTreeClassifier()
        model = RandomizedSearchCV(....)
        model.fit(X, Y)

if __name__ == '__main__':
    Test()
joblib is known for this behaviour and is rather explicit in documenting it:
Warning
Under Windows, it is important to protect the main loop of code to avoid recursive spawning of subprocesses when using joblib.Parallel. In other words, you should be writing code like this:
import ....

def function1(...):
    ...

def function2(...):
    ...

...
if __name__ == '__main__':
    # do stuff with imports and functions defined above
    ...
No code should run outside of the “if __name__ == ‘__main__’” blocks, only imports and definitions.
So, refactor your code to meet this well-defined requirement, and it will start to benefit from joblib's parallel tools.
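Applied to the snippet from the question, the refactor could look like this (a sketch; the CSV path, column names and search space are placeholders):
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

def run_search(csv_path):
    dataset = pd.read_csv(csv_path)
    X = dataset.drop(columns=["target"])      # placeholder feature columns
    y = dataset["target"]                     # placeholder target column
    search = RandomizedSearchCV(
        DecisionTreeClassifier(),
        param_distributions={"max_depth": [3, 5, 10, None]},
        n_iter=4,
        n_jobs=-1,                            # workers are spawned inside fit()...
    )
    search.fit(X, y)
    return search.best_params_

if __name__ == '__main__':
    # ...so the call that triggers spawning must sit behind this guard on Windows
    print(run_search("data.csv"))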
I imagine this won't be the most useful answer, but you could always parallelize the process manually. https://docs.python.org/2/library/multiprocessing.html
I tried two different import syntaxes I thought were equivalent. Weirdness seems to ensue:
In [7]: import sympy
In [8]: sympy.physics.units.find_unit("Giga Electron Volt")
Traceback (most recent call last):
File "<ipython-input-8-8a26ac4a085a>", line 1, in <module>
sympy.physics.units.find_unit("Giga Electron Volt")
AttributeError: 'module' object has no attribute 'physics'
In [9]: import sympy.physics.units as u
In [10]: u.find_unit("coul")
Out[10]: ['coulomb', 'coulombs']
In [11]: import sympy
In [12]: sympy.physics.units.find_unit("coul")
Out[12]: ['coulomb', 'coulombs']
Take a look at the source code of sympy here: https://github.com/sympy/sympy/blob/master/sympy/__init__.py#L55
from .calculus import *
# Adds about .04-.05 seconds of import time
# from combinatorics import *
# This module is slow to import:
#from physics import units
from .plotting import plot, textplot, plot_backends, plot_implicit
They are not importing the physics module, because it obviously takes quite some time to load. This is why you get the error on the first try.
After loading it manually, the interpreter has it loaded and knows where it is (from your manual import). That's why it works on the second try.
So the phenomenon is not down to Python's import functionality itself, but to how the module is initialized in sympy's __init__.py.
P.S.
If you were to uncomment the line that loads units from the physics module, you could then write:
import sympy
sympy.units.find_unit("coul")