How is the DataFrame class imported directly from the pandas package? - python

In pandas we can directly do the following:
import pandas as pd
df = pd.DataFrame()
Here, pandas is a package and DataFrame is a class. How does this work, given that DataFrame is actually defined in pandas.core.frame (that is, in frame.py inside the core folder of pandas)?
Note: I was thinking that this kind of behavior can be achieved by doing something in the __init__.py file. Can anyone help me understand this?

__init__.py is technically just another Python module, so any name defined in a package's __init__.py is directly accessible from the package itself. It is indeed a common pattern in Python to use a package's __init__.py as a facade for submodules / subpackages.
FWIW, note that pandas' __init__.py, while an example of using __init__.py as a facade, does not follow good practice in that it uses "star imports" (from submodule import *), which makes it very painful to trace the origin of a name (the module where it is defined) - especially in a package as large as pandas - and is brittle too, since if two submodules export the same name, the one imported last will shadow the first. The good practice is to always explicitly specify which names you want to import:
from submodule1 import foo, bar
from submodule2 import baaz, quux
which makes it clear where each name comes from, and makes a duplicate name much more obvious:
from submodule1 import foo, bar
from submodule2 import baaz, quux
from submodule3 import foo # oops, we will have to rename either this one or submodule1.foo
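To make the facade pattern concrete, here is a minimal sketch that builds a throwaway package on disk and re-exports names explicitly from its __init__.py. The names facade_pkg, submodule1, submodule2, foo and baaz are made up for the demonstration:

```python
import pathlib
import sys
import tempfile

# Build a throwaway package on disk
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "facade_pkg"
pkg.mkdir()
(pkg / "submodule1.py").write_text("def foo():\n    return 'foo'\n")
(pkg / "submodule2.py").write_text("def baaz():\n    return 'baaz'\n")

# The facade: __init__.py re-exports explicitly named objects
(pkg / "__init__.py").write_text(
    "from facade_pkg.submodule1 import foo\n"
    "from facade_pkg.submodule2 import baaz\n"
)

sys.path.insert(0, str(root))
import facade_pkg

# Both names are now accessible directly on the package
print(facade_pkg.foo())   # foo
print(facade_pkg.baaz())  # baaz
```

The explicit imports also mean that a reader of __init__.py can see exactly which submodule each public name comes from.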

DataFrame, as you said, is defined in pandas/core/frame.py.
Let's take a look at pandas/__init__.py in the pandas repository on GitHub.
Line 42:
from pandas.core.api import *
And pandas/core/api.py imports DataFrame from pandas/core/frame.py on line 23:
from pandas.core.frame import DataFrame
So, since pandas/__init__.py does import * from pandas/core/api.py, and pandas/core/api.py imports DataFrame, DataFrame ends up available directly on pandas.
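The standard library uses the same pattern, so you can verify the mechanism without reading the pandas source: json/__init__.py does from .decoder import JSONDecoder, which makes the class available on the package itself:

```python
import json
import json.decoder

# json/__init__.py re-exports JSONDecoder from json/decoder.py,
# just as pandas/__init__.py ends up re-exporting DataFrame
print(json.JSONDecoder is json.decoder.JSONDecoder)  # True
```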

Related

Multi layer package in python

I have a python package with packages in it. This explanation seems strange, so I'll include my package's structure:
package/
    __init__.py
    subpackage1/
        __init__.py
        file1.py
    subpackage2/
        __init__.py
        file2.py
(I'm simplifying it for easier understanding).
The __init__.py on the top level looks like this:
__all__ = ["subpackage1", "subpackage2"]
And, for some reason, when importing the package, it doesn't recognise anything from file1.py or file2.py. Any ideas how to fix it?
If you need more details, here's the project on GitHub: https://github.com/Retr0MrWave/mathModule. The directory I called package is mathmodule_pkg in the actual project.
Filling the __all__ field with names does not make imports possible, it merely serves as a hint of what you mean to make importable. This hint is picked up by star-imports to restrict what is imported, and IDEs like pycharm also use it to get an idea of what is and isn't exposed - but that's about it.
If you want to enable top-level imports of your nested classes and functions, you need to
import them into the top-level __init__.py
bind them to names that can be used for the import
optionally, reference said names in __all__ to make the API nice and obvious
Using the project you're referencing as an example, this is what it would look like:
mathmodule_pkg/__init__.py
import mathmodule_pkg.calculus.DerrivativeAndIntegral #1
integral = mathmodule_pkg.calculus.DerrivativeAndIntegral.integral #2
__all__ = ['integral'] # 3
Using the very common form of from some.package import some_name we can combine steps 1 and 2 and reduce the potential for bugs when re-binding the name:
from mathmodule_pkg.calculus.DerrivativeAndIntegral import integral # 1 and 2
__all__ = ['integral'] # 3
Using either form, after installing your package the following will be possible:
>>> from mathmodule_pkg import integral
>>> integral(...)
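The difference between listing a name in __all__ and actually importing it can be checked directly. demo_pkg below is a made-up in-memory module, not the real project:

```python
import sys
import types

# A throwaway module with two names, but __all__ listing only one
mod = types.ModuleType("demo_pkg")
mod.integral = lambda: "integral"
mod.helper = lambda: "helper"
mod.__all__ = ["integral"]
sys.modules["demo_pkg"] = mod

ns = {}
exec("from demo_pkg import *", ns)
print("integral" in ns)  # True  -- listed in __all__
print("helper" in ns)    # False -- filtered out of the star import

# __all__ does not block explicit imports; it only filters star imports
ns2 = {}
exec("from demo_pkg import helper", ns2)
print("helper" in ns2)   # True
```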

Importing a function from another file, where to import other libraries?

Simple question, I've searched to no avail. Say I have a file "funcs.py"; in it there's a function I want to call from my current script. The function uses another library (e.g. pandas). Where do I import that library? What's the convention?
Do I put it inside the function in funcs.py?
#funcs.py
def make_df():
    import pandas as pd
    return pd.DataFrame(index=[1,2,3], data=[1,2,3])
Do I put it outside the function in funcs.py?
#funcs.py
import pandas as pd

def make_df():
    return pd.DataFrame(index=[1,2,3], data=[1,2,3])
Or do I put it in the current script I'm using?
#main.py
import pandas as pd
from funcs import make_df
df = make_df()
Thanks and kind regards.
In Python each file is a module. Each module has its own namespace - its own set of variables. Each function also has its own local namespace.
When you use the name pd in a function defined in the module funcs, Python will first look for a local variable pd in the function; if there isn't one,
it will look it up in the namespace of the function's own module, funcs. It will not look for it in the module main, even if the code calling the function is in main.py.
This is known as lexical scoping: variables are looked up close to where the code is defined, not where it is used. Some languages do look up variables close to where the code is used - that is known as dynamic scoping, and in one of those languages something like your option #3 would work - but most languages, including Python, follow lexical scoping rules, so it won't.
So pandas has to be imported in funcs.py. main.py doesn't have to import or even
know anything about pandas to use make_df.
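This can be demonstrated without two separate files by building the funcs module in memory (the names funcs, greeting and get_greeting are illustrative):

```python
import types

# Simulate funcs.py as an in-memory module
funcs = types.ModuleType("funcs")
exec(
    "greeting = 'hello from funcs'\n"
    "def get_greeting():\n"
    "    return greeting\n",
    funcs.__dict__,
)

greeting = "hello from main"  # the same name in the calling module
print(funcs.get_greeting())   # hello from funcs -- lexical scoping wins
```

The function resolves greeting in the module where it was defined, no matter who calls it.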
If you import pandas at the top of funcs.py, then when you import the module funcs from main.py, the line import pandas as pd at the top of funcs.py will be executed, the pandas module will be loaded, and a reference to it will be created in funcs, bound to the name pd. There is no need to re-import it in main.py.
If you do re-import pandas in main.py, Python will be smart enough not to reload the entire module just because you imported it in two places, it will just give you a reference to the already loaded pandas module.
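You can observe that module cache with any stdlib module:

```python
import sys
import json
import json as j2  # a "second" import, as if from another file

# Both names are references to the single cached module object
print(json is j2)                   # True
print(sys.modules["json"] is json)  # True
```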
Putting the import in the body of the function will work but it's not considered good practice, unless you have a really good reason to do so. Normally imports go at the top of the file where they are used.
#3 wouldn't work. In most cases, #2 is the preferred option (the main exception would be if the library is a large (slow to import) library that's only used by that function). You might also want to consider one of these options (for optional dependencies):
#funcs.py
try:
    import pandas as pd
except ImportError:
    pass

def make_df():
    return pd.DataFrame(index=[1,2,3], data=[1,2,3])
or
#funcs.py
try:
    import pandas as pd
except ImportError:
    pd = None  # so the name exists even when pandas is missing

if pd is not None:
    def make_df():
        return pd.DataFrame(index=[1,2,3], data=[1,2,3])
The best practice is to put the import at the beginning of funcs.py.
There is no need to import pandas in your main.py, and you should not.
Basically, when you use import pandas, the pandas library becomes part of your code.
Straight from the source at https://www.python.org/dev/peps/pep-0008/?#imports
Imports are always put at the top of the file, just after any module
comments and docstrings, and before module globals and constants.
Best practice is to do all imports in the first few lines of your script file, before doing any other coding.

Import specific names only

I have the following package structure:
analysis/
    __init__.py
    main.py
    utils/
        __init__.py
        myzip.py
myzip.py contains the following:
import pandas

def save():
    ...

def load():
    ...
In my main.py script I do:
from utils import myzip
and when I type myzip.<TAB> or do dir(myzip) the imported pandas appears as well. Can I avoid showing the pandas imported in the submodule? Is there a best practice for importing third party modules?
I tried adding the following to analysis/utils/__init__.py:
from utils.myzip import save, load
but it still shows pandas when I dir(myzip) from main.py.
Looking at from sklearn import cluster they manage to achieve this, without showing all the numpy imports they have e.g. in cluster/k_means_.py
As described in this question, you can import the modules with an alias that begins with an underscore (e.g., import pandas as _pandas). The name will still be available as myzip._pandas, but IPython tab-completion will not autocomplete it (unless you explicitly type the underscore first). Also, it will not be imported if you do from myzip import *, although you shouldn't do that anyway.
However, as mentioned in this other question, the better "solution" is to just not worry about it. If someone does import myzip, it does no harm for them to be able to access myzip.pandas; it's not like they couldn't import pandas themselves anyway. Also, there is no risk of a name conflict in this situation, since pandas is namespaced under your module. The only way a name conflict could arise is if your module itself used the name pandas for two different things (e.g., defining a global variable called pandas in addition to the imported module); but this is a problem internal to your module, regardless of whether pandas is externally accessible.
A name conflict can arise if someone has their own variable called pandas and then does from myzip import *, but star import is discouraged for precisely that reason, and the names of imported modules are no different than other names in this regard. For instance, someone doing from myzip import * could face a conflict with the names save or load. There's no use in worrying specifically about imported module names when it comes to star-import name conflicts.
Also, it's worth noting that many widely-used libraries expose their own imports in this way, and it's not considered a problem. Pandas itself is an example:
>>> import pandas
>>> pandas.np
<module 'numpy' from '...'>
. . . so you are in good company if you just consider it a non-problem.
If moduleB is imported at the module-level of moduleA, then moduleB is part of the namespace of moduleA.
One way to hide it however would be to import it with an alias:
import pandas as _hidden_pandas
It would then appear as _hidden_pandas, hiding it to some extent.
The tab-completion would at least not find it.
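A small sketch of the underscore-alias trick, using an in-memory stand-in for myzip.py (json plays the role of pandas here):

```python
import sys
import types

# Simulate myzip.py importing its dependency under an underscore alias
myzip = types.ModuleType("myzip")
exec(
    "import json as _json\n"
    "def save(obj):\n"
    "    return _json.dumps(obj)\n",
    myzip.__dict__,
)
sys.modules["myzip"] = myzip

ns = {}
exec("from myzip import *", ns)  # no __all__: underscore names are skipped
print("save" in ns)              # True
print("_json" in ns)             # False -- hidden from the star import
print(myzip.save({"a": 1}))      # {"a": 1}
```

The dependency still works inside the module; it is only the public-facing name that disappears from star imports and tab completion.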
A partial solution
Nest the submodule into a folder and import only the necessary methods in the __init__.py, that is:
analysis/
    __init__.py
    main.py
    utils/
        __init__.py
        myzip/
            __init__.py
            myzip.py
where the myzip/__init__.py has:
from .myzip import load, save
then after from utils import myzip, dir(myzip) will list load, save and myzip, but not pandas, which is hidden inside myzip.myzip.<TAB>.
I have not figured out how sklearn hides its third-party modules.

Do not see module imports at top level of Python package

Consider the following python package structure
working_directory/
-- test_run.py
-- mypackage/
---- __init__.py
---- file1.py
---- file2.py
and say inside file1.py I have defined a function, func1() and I've also imported some functions from numpy with something like from numpy import array. Now I want to import and use mypackage from test_run.py without seeing these numpy functions in the namespace. I want to import it using import mypackage as mp and see
mp.file1.func1()
mp.file2.func2()
etc
I don't want to see mp.file1.array(). How can I do it?
One possibility would be to use underscores:
from numpy import array as _array.
Although this doesn't prohibit people from accessing mp.file1._array, it is generally accepted that variables beginning with underscores are 'private'.
AFAIK, there is no simple way to disallow access to a variable in Python. (One way would be to make it a property of a class; see https://docs.python.org/3/library/functions.html#property.)

Monkey patching and dispatching

I have the following situation: a module called enthought.chaco2, and many imports like from enthought.chaco.api import ..
So what's the quickest way to add chaco.api and make it dispatch to the correct one?
I tried a few things, for example:
import enthought.chaco2 as c2
import enthought
enthought.chaco = c2
but it doesn't work. I might have to create a real module and add it to the path; is that the only way?
What is the behavior you're looking for?
You could use from enthought.chaco import api as ChacoApi and then address any content from the module through ChacoApi, like ChacoApi.foo() or chaco_class = ChacoApi.MyClass().
You could use (though it's not recommended) from enthought.chaco.api import * and have all the content of the module added to your base namespace.
You could add an __all__ variable to chaco's __init__.py file and have the previous example (with the *) import only the names you listed in __all__.
Or you could import specifically any content you might use the way you do right now which is perfectly fine in my opinion...
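For the record, assigning enthought.chaco = c2 only sets an attribute on the enthought module object; the import system looks modules up in sys.modules, so registering the alias there is what makes later import statements resolve it. A minimal sketch of that mechanism, with json standing in for chaco2 and a made-up alias name:

```python
import sys
import json

# Register an alias in sys.modules; later imports will find it there
sys.modules["chaco_alias"] = json

import chaco_alias  # resolved straight from the module cache

print(chaco_alias is json)          # True
print(chaco_alias.loads("[1, 2]"))  # [1, 2]
```

For a dotted path like enthought.chaco.api, you would need to register each dotted name (e.g. sys.modules['enthought.chaco'] and sys.modules['enthought.chaco.api']) in the same way.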
