I have the following package structure:
analysis/
    __init__.py
    main.py
    utils/
        __init__.py
        myzip.py
myzip.py contains the following:
import pandas

def save():
    ...

def load():
    ...
In my main.py script I do:
from utils import myzip
and when I type myzip.<TAB> or do dir(myzip), the imported pandas shows up as well. Can I avoid exposing the pandas that is imported in the submodule? Is there a best practice for importing third-party modules?
I tried adding the following to analysis/utils/__init__.py:
from utils.myzip import save, load
but it still shows pandas when I dir(myzip) from main.py.
Looking at from sklearn import cluster, they manage to achieve this without showing all the numpy imports they have, e.g. in cluster/k_means_.py.
As described in this question, you can import the modules with an alias that begins with an underscore (e.g., import pandas as _pandas). The name will still be available as myzip._pandas, but IPython tab-completion will not autocomplete it (unless you explicitly type the underscore first). Also, it will not be imported if you do from myzip import *, although you shouldn't do that anyway.
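For instance, a minimal sketch of what myzip.py could look like with that underscore alias:

# myzip.py
import pandas as _pandas  # leading underscore keeps it out of tab-completion and star imports

def save():
    ...

def load():
    ...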
However, as mentioned in this other question, the better "solution" is to just not worry about it. If someone does import myzip, it does no harm for them to be able to access myzip.pandas; it's not like they couldn't import pandas themselves anyway. Also, there is no risk of a name conflict in this situation, since pandas is namespaced under your module. The only way a name conflict could arise is if your module itself used the name pandas for two different things (e.g., defining a global variable called pandas in addition to the imported module); but this is a problem internal to your module, regardless of whether pandas is externally accessible.
A name conflict can arise if someone has their own variable called pandas and then does from myzip import *, but star import is discouraged for precisely that reason, and the names of imported modules are no different than other names in this regard. For instance, someone doing from myzip import * could face a conflict with the names save or load. There's no use in worrying specifically about imported module names when it comes to star-import name conflicts.
Also, it's worth noting that many widely-used libraries expose their own imports in this way, and it's not considered a problem. Pandas itself was (in older versions) an example:
>>> import pandas
>>> pandas.np
<module 'numpy' from '...'>
. . . so you are in good company if you just consider it a non-problem.
If moduleB is imported at the module-level of moduleA, then moduleB is part of the namespace of moduleA.
One way to hide it however would be to import it with an alias:
import pandas as _hidden_pandas
It would then appear as _hidden_pandas, hiding it to some extent.
The tab-completion would at least not find it.
A partial solution
Nest the submodule into a folder and import only the necessary methods in the __init__.py, that is:
analysis/
    __init__.py
    main.py
    utils/
        __init__.py
        myzip/            <-- new folder
            __init__.py
            myzip.py
where the myzip/__init__.py has:
from .myzip import load, save
then after from utils import myzip, dir(myzip) will list load, save and myzip, but not pandas, which is now hidden one level down inside myzip.myzip.<TAB>.
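For illustration, roughly what you would then see (a sketch; dunder names filtered out):

>>> from utils import myzip
>>> sorted(name for name in dir(myzip) if not name.startswith('_'))
['load', 'myzip', 'save']
>>> 'pandas' in dir(myzip.myzip)
True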
I have not figured out how sklearn hides its third-party modules.
Related
Simple question, I've searched to no avail. Say I have a file "funcs.py"; in it there's a function I want to call from my current script. The function uses another library (e.g. pandas). Where do I import that library? What's the convention?
Do I put it inside the function in funcs.py?
# funcs.py
def make_df():
    import pandas as pd
    return pd.DataFrame(index=[1, 2, 3], data=[1, 2, 3])
Do I put it outside the function in funcs.py?
# funcs.py
import pandas as pd

def make_df():
    return pd.DataFrame(index=[1, 2, 3], data=[1, 2, 3])
Or do I put it in the current script I'm using?
# main.py
import pandas as pd
from funcs import make_df

df = make_df()
Thanks and kind regards.
In Python, each file is a module. Each module has its own namespace, its own set of variables. Each function also has its own local namespace.
When you use the name pd in a function defined in the module funcs, Python first looks for a local variable pd inside the function; if there is none, it looks in the namespace of the module the function was defined in, funcs. It will not look in the module main, even if the code calling the function lives in main.py.
This is known as lexical scoping: variables are looked up close to where the code is defined, not where it is used. Some languages do look up variables close to where the code is used; that is known as dynamic scoping, and in such a language something like your option #3 would work. But most languages, including Python, follow lexical scoping rules, so it won't.
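A minimal sketch of why option #3 fails under lexical scoping (assuming funcs.py does not import pandas itself):

# main.py
import pandas as pd        # binds pd in main's namespace only
from funcs import make_df

df = make_df()             # NameError: make_df looks up pd in funcs' namespace, not main's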
So pandas has to be imported in funcs.py. main.py doesn't have to import or even
know anything about pandas to use make_df.
If you import pandas at the top of funcs.py, then when you import the module funcs from main.py, the line import pandas as pd at the top of funcs.py will be executed, the pandas module will be loaded, and a reference to it will be created in funcs bound to the name pd. There is no need to re-import it in main.py.
If you do re-import pandas in main.py, Python is smart enough not to reload the entire module just because you imported it in two places; it will just give you a reference to the already loaded pandas module.
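A quick sketch of that caching behavior:

import sys

import pandas as pd
print('pandas' in sys.modules)  # True: the module object is cached after the first import

import pandas as pd2            # no second load; just another reference from the cache
print(pd is pd2)                # True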
Putting the import in the body of the function will work but it's not considered good practice, unless you have a really good reason to do so. Normally imports go at the top of the file where they are used.
#3 wouldn't work. In most cases, #2 is the preferred option (the main exception would be if the library is a large (slow to import) library that's only used by that function). You might also want to consider one of these options (for optional dependencies):
# funcs.py
try:
    import pandas as pd
except ImportError:
    pass

def make_df():
    return pd.DataFrame(index=[1, 2, 3], data=[1, 2, 3])
or
# funcs.py
try:
    import pandas as pd
except ImportError:
    pd = None  # keep the name defined so the check below does not raise NameError

if pd is not None:
    def make_df():
        return pd.DataFrame(index=[1, 2, 3], data=[1, 2, 3])
The best practice is to put the import at the beginning of your code in funcs.py.
There is no need to, and you should not, import pandas in your main.py.
Basically, when you do import pandas, the pandas library becomes part of your module's namespace.
Straight from the source at https://www.python.org/dev/peps/pep-0008/?#imports
Imports are always put at the top of the file, just after any module
comments and docstrings, and before module globals and constants.
Best practice is to do all imports in the first few lines of your script file, before doing any other coding.
I have a file, myfile.py, which imports Class1 from file.py, and file.py contains imports of different classes from file2.py, file3.py, and file4.py.
In my myfile.py, can I access these classes or do I need to again import file2.py, file3.py, etc.?
Does Python automatically add all the imports included in the file I imported, and can I use them automatically?
Best practice is to import every module that defines identifiers you need, and use those identifiers as qualified by the module's name; I recommend using from only when what you're importing is a module from within a package. The question has often been discussed on SO.
Importing a module, say moda, from many modules (say modb, modc, modd, ...) that need one or more of the identifiers moda defines, does not slow you down: moda's bytecode is loaded (and possibly built from its sources, if needed) only once, the first time moda is imported anywhere; all other imports of the module then use a fast path involving a cache (a dict mapping module names to module objects, accessible as sys.modules in case of need... if you first import sys, of course!-).
Python doesn't automatically introduce anything into the namespace of myfile.py, but you can access everything that is in the namespaces of all the other modules.
That is to say, if in file1.py you did from file2 import SomeClass and in myfile.py you did import file1, then you can access it within myfile as file1.SomeClass. If in file1.py you did import file2 and in myfile.py you did import file1, then you can access the class from within myfile as file1.file2.SomeClass. (These aren't generally the best ways to do it, especially not the second example.)
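A small sketch of the two access patterns (file names as in the question; SomeClass is hypothetical):

# file2.py
class SomeClass:
    pass

# file1.py
from file2 import SomeClass   # re-exports the name as file1.SomeClass
import file2                  # also makes file1.file2 available

# myfile.py
import file1
a = file1.SomeClass()          # via the re-exported name
b = file1.file2.SomeClass()    # via the chained module attribute (works, but leans on file1's internals)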
This is easily tested.
In the myfile module, you can either do from file import ClassFromFile2 or from file2 import ClassFromFile2 to access ClassFromFile2, assuming that the class is also imported in file.
This technique is often used to simplify the API a bit. For example, a db.py module might import various things from the modules mysqldb, sqlalchemy and some other helpers. Then, everything can be accessed via the db module.
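As a rough illustration, such a facade might look like this (the helper module and names here are hypothetical):

# db.py -- hypothetical facade module
from sqlalchemy import create_engine   # third-party piece re-exported for convenience
from helpers import row_to_dict        # hypothetical in-project helper

__all__ = ['create_engine', 'row_to_dict']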
If you are using a wildcard import, then yes: a wildcard import creates new names in your current namespace for the contents of the imported module. If not, you need to go through the namespace of the module you imported, as usual.
In pandas we can directly do the following:
import pandas as pd
df = pd.DataFrame()
Here, pandas is a package and DataFrame is a class. So how does this work, given that DataFrame is actually defined in pandas.core.frame (it is defined in frame.py, which is inside the core folder of pandas)?
Note: I was thinking that this kind of behavior can be achieved by doing something in the __init__.py file. Can anyone help me understand this?
__init__.py is technically just another python module, so any name defined in a package's __init__.py is directly accessible from the package itself. And it's indeed a common pattern in Python to use a package's __init__.py as a facade for submodules / subpackages.
FWIW, note that pandas' __init__.py, while an example of using __init__.py as a facade, does not follow good practice in that it uses "star imports" (from submodule import *), which makes it very painful to trace the origin of a name (the module where it is defined), especially with such a large package as pandas, and is very brittle too, since if two submodules export the same name, the one imported last will shadow the first. The good practice is to always explicitly specify which names you want to import:
from submodule1 import foo, bar
from submodule2 import baaz, quux
which makes clear where a name comes from and will make a duplicate name much more obvious:
from submodule1 import foo, bar
from submodule2 import baaz, quux
from submodule3 import foo # oops, we will have to rename either this one or submodule1.foo
DataFrame, as you said, is defined in pandas/core/frame.py.
Let's take a look at pandas/__init__.py in pandas directory on github.
Line 42:
from pandas.core.api import *
And pandas/core/api.py imports DataFrame from pandas/core/frame.py in line 23:
from pandas.core.frame import DataFrame
So, since pandas/__init__.py does import * from pandas/core/api.py, and pandas/core/api.py imports DataFrame, DataFrame ends up available directly on pandas.
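If you want the same effect in your own package without star imports, a minimal sketch (all names here are made up) would be:

# mypkg/core/frame.py
class DataFrame:
    ...

# mypkg/__init__.py
from mypkg.core.frame import DataFrame   # explicit re-export, no star import

# user code
import mypkg
df = mypkg.DataFrame()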
I am new to Python and I am creating a module to re-use some code.
My module (impy.py) looks like this (it has one function so far)...
import numpy as np

def read_image(fname):
    ....
and it is stored in the following directory:
custom_modules/
    __init__.py
    impy.py
As you can see it uses the module numpy. The problem is that when I import it from another script, like this...
import custom_modules.impy as im
and when I type im., I get the option of calling not only the function read_image() but also the module np.
How can I make only the functions I write in my module available, and not the modules that my module itself imports (numpy in this case)?
Thank you very much for your help.
I've got a proposition that could answer the following concern: "I do not want to mix class/module attributes with class/module imports", because IDLE also proposes access to imported modules within a class or module.
It simply consists of using the naming convention for names that coders normally don't want accessed and IDEs don't propose: a name starting with an underscore. This is also known as the "weak « internal use » indicator", as described in PEP 8 / Naming styles.
class C(object):
    import numpy as _np  # <-- here

    def __init__(self):
        # whatever we need
        pass

    def do(self, arg):
        # something useful
        pass
Now, in IDLE, auto-completion will only propose the do function; the imported module is not proposed.
By the way, you should change the title of your question: you do not want to avoid imports of your imported modules (that would make them unusable), so it should rather be "how to prevent the IDE from showing imported modules of an imported module" or something similar.
You could import numpy inside your function
def read_image(fname):
    import numpy as np
    ....
making it locally available to the read_image code, but not globally available.
One warning though: the import statement then runs every time read_image is called. Python caches the module in sys.modules, so numpy isn't fully re-imported on each call, but the first call pays the full import cost and every call pays a small lookup overhead; this can matter if you call read_image many times.
If you really want to hide it, then I suggest creating a new directory such that your structure looks like this:
custom_modules/
    __init__.py
    impy/
        __init__.py
        impy.py
and let the new impy/__init__.py contain
from .impy import read_image
This way, you can control what ends up in the custom_modules.impy namespace.
I have the following situation: there is a module called enthought.chaco2, and I have many imports like from enthought.chaco.api import ..
So what's the quickest way to add chaco.api and make it dispatch to the correct one?
I tried a few things, for example:
import enthought.chaco2 as c2
import enthought
enthought.chaco = c2
but it doesn't work. I might have to create a real module and add it to the path; is that the only way?
What is the behavior you're looking for?
You could use from enthought.chaco import api as ChacoApi and then address any content from the module through ChacoApi, like ChacoApi.foo() or chaco_class = ChacoApi.MyClass().
You could use (and that's not recommended) from enthought.chaco.api import * and have all the content of the module added to your base namespace.
You could add an __all__ variable declaration to chaco's __init__.py file and have the previous example (with the *) import only what you listed in __all__.
Or you could specifically import any content you might use, the way you do right now, which is perfectly fine in my opinion...
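For completeness, the aliasing attempted in the question can also be made to work by registering entries in sys.modules instead of assigning an attribute. A rough, untested sketch (it assumes enthought.chaco2 exposes an api submodule, and Plot is a hypothetical name):

import sys
import enthought.chaco2
import enthought.chaco2.api

# make 'enthought.chaco' and 'enthought.chaco.api' resolve to the chaco2 modules
sys.modules['enthought.chaco'] = sys.modules['enthought.chaco2']
sys.modules['enthought.chaco.api'] = sys.modules['enthought.chaco2.api']

from enthought.chaco.api import Plot  # now served by enthought.chaco2.api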