I am writing a Python module that uses many imports of other modules.
I am a bit confused about whether I should import all the necessary dependent modules at the top of the file or only when they are needed.
I would also like to know the implications of both approaches.
I come from a C++ background, so I am really thrilled with this feature, and I see no reason not to use __import__(), importing the modules only when needed inside my functions.
Kindly throw some light on this.
To write less code, import a module in the first lines of the script, e.g.:
# File1.py
import os

# use os somewhere:
os.chdir(some_dir)
...
...
# use os somewhere else; you don't need to "import os" again
os.environ.update(some_dict)
While sometimes you may need to import a module locally (e.g., in a function):
abc = 3

def foo():
    from some_module import abc  # importing inside foo protects you from naming conflicts
    abc(...)  # calls the imported abc; nothing to do with the variable "abc" outside foo
Don't worry about the time consumed by calling foo() multiple times, since an import statement loads a module only once. Once a module is imported, the module object is stored in the dictionary sys.modules, which serves as a cache so that running the same import statement again is just a lookup.
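A quick sketch showing the caching in action:
import sys
import os

print('os' in sys.modules)             # True: cached after the first import

import os as os_again                   # this import is only a sys.modules lookup
print(os_again is sys.modules['os'])    # True: the very same module object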
As @bruno desthuilliers mentioned, importing inside functions may not be that Pythonic; it goes against PEP 8, and here's a discussion I found on the topic. You should stick to importing at the top of the file most of the time.
First, __import__ isn't usually needed anywhere. Its main purpose is to support dynamic importing of things that you don't know ahead of time (think plug-ins). You can easily use the import statement inside your function:
import sys

def foo():
    import this

if __name__ == "__main__":
    print(sys.version_info)
    foo()
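If you do need the dynamic, plug-in style of importing mentioned above, a minimal sketch might look like this; importlib.import_module is the friendlier modern wrapper around the same machinery as __import__, and load_plugin and the 'json' argument are just illustrative:
import importlib

def load_plugin(plugin_name):
    # the module name is only known at runtime (config file, CLI flag, ...),
    # so a static "import" statement cannot express it
    return importlib.import_module(plugin_name)

json_plugin = load_plugin('json')            # resolved dynamically by name
print(json_plugin.dumps({'loaded': 'dynamically'}))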
The main advantage of importing everything up front is that it is the most customary place: that's where people reading your code will look to see whether something is imported. Also, you don't need to write import os in every function that uses os. The main downsides of this approach are that:
you can get yourself into unresolvable import loops (A imports B, which imports A)
you pull everything into memory even if you aren't going to use it.
The second problem isn't typically an issue -- very rarely do you notice the performance or memory impact of an import.
If you run into the first problem, it's likely a symptom of poorly grouped code and the common stuff should be factored into a new module C which both A and B can use.
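To make the import-loop problem concrete, here is a hypothetical pair of modules that import each other; whichever module is imported first hands the other a partially initialized version of itself:
# a.py (hypothetical)
import b                 # starts loading b before a has finished executing

def from_a():
    return 'a'

# b.py (hypothetical)
import a                 # gets the partially initialized module object
print(a.from_a())        # AttributeError: from_a is not defined yet when b runs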
Firstly, using imports inside functions goes against PEP 8.
Calling import is a relatively expensive operation EVEN if the module is already loaded, so if your function is going to be called many times, the repeated import will not pay off.
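To put a rough number on that, here is a small sketch comparing a loop that re-imports on every iteration with one that imports once up front (json is just a convenient stdlib example):
import timeit

# re-runs the import machinery (a sys.modules lookup plus a name binding) per call
with_reimport = timeit.timeit('import json; json.dumps([])', number=100000)
# imports once in setup, so the loop only pays for the call itself
without = timeit.timeit('json.dumps([])', setup='import json', number=100000)

print(with_reimport, without)   # the re-importing loop is measurably slower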
Also, when you write import test, Python effectively does this:
test = __import__('test')
The only downside of imports at the top of the file is that the namespace can get polluted very fast, depending on the complexity of the file; but if your file is too complex, that is a sign of bad design anyway.
I have kind of a tricky question, one that is difficult to even describe.
Suppose I have this script, which we will call master:
# in master.py
import slave as slv

def import_func():
    import time

slv.method(import_func)
I want to make sure method in slave.py, which looks like this:
# in slave.py
def method(import_func):
    import_func()
    time.sleep(10)
actually runs as if I had imported the time package. Currently it does not work, I believe because the import exists only in the scope of import_func().
Keep in mind that the rules of the game are:
I cannot import anything in slave.py outside method
I need to pass the imports which method needs through import_func() in master.py
the procedure must work for a variable number of imports inside method. In other words, method cannot know how many imports it will receive but needs to work nonetheless.
the procedure needs to work for any import possible. So options like pyforest are not suitable.
I know it can theoretically be done through importlib, but I would prefer a more straightforward idea, because if we have a lot of imports with different 'as' labels it would become extremely tedious and convoluted with importlib.
I know it is kind of a quirky question but I'd really like to know if it is possible. Thanks
What you can do is this in the master file:
# in master.py
import slave as slv

def import_func():
    import time
    return time

slv.method(import_func)
Now use the returned time value in the slave file:
# in slave.py
def method(import_func):
    time = import_func()
    time.sleep(10)
Why would you have to do this? It's a matter of scope. When import_func() is called from slave.py, the import statement binds the name time only in import_func's local namespace. When the function returns, that local name disappears; the module itself remains cached in sys.modules, but nothing in slave.py refers to it.
By returning time from import_func(), you hand the module object to the caller, which can bind it to a name and keep using it after the function terminates.
Now, to import more modules? Simple: return a list with multiple modules inside, or maybe a dictionary for simpler access. That's one way of doing it.
[Edit] Using a dictionary and importlib to pass multiple imports to slave.py:
master.py:
import slave as slv
import importlib

def master_import(packname, imports):
    imports[packname] = importlib.import_module(packname)

def import_func():
    imports = {}
    master_import('time', imports)
    return imports

slv.method(import_func)
slave.py:
# in slave.py
def method(import_func):
    imports = import_func()
    imports['time'].sleep(10)
This way, you can literally import any modules you want on the master.py side, using the master_import() function, and pass them to the slave script.
Check this answer on how to use importlib.
I have a module that defines a class which instantiates a class from one of two (or more) other modules. Below are a couple of code examples. In the first example, two modules are imported, but only one is used (one per instance of MyIo). In the second example, only the required module is imported. There may be one or more instances of MyIo in a higher level module.
I like that the second example only imports what is used, but I don't really like that the import is taking place in a 'code execution' section.
My questions are:
Which of the examples is a better architectural choice, and why?
Is there a penalty for importing modules that are not eventually used?
Are imports in code execution sections in Python considered 'bad form'?
This example imports both modules, but only uses one...
''' MyIo.py '''
...
...
from DevSerial import Device as DeviceSerial
from DevUSB import Device as DeviceUSB

class MyIo:
    def __init__(self, port):
        if port.lower() == 'usb':
            self.device = DeviceUSB()
        else:
            self.device = DeviceSerial(port)
...
...
The following imports only the module being used...
''' MyIo.py '''
...
...
class MyIo:
    def __init__(self, port):
        if port.lower() == 'usb':
            from DevUSB import Device
            self.device = Device()
        else:
            from DevSerial import Device
            self.device = Device(port)
...
...
As per PEP 8, all imports should be together at the top of the file. Having them spread throughout the file leads to software that is hard to maintain and debug.
The only performance overhead I can think of is at program startup - it has to load more modules. Once the program is running there shouldn't be any extra overhead.
To answer your questions:
The former. It makes it immediately obvious which other modules are used, whereas with the second you have to dig through the code to find all the dependencies.
Yes, but only at startup.
Yes.
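To see that the penalty really is a one-time, load-time cost, here is a small sketch timing a first import against a repeated one:
import time

t0 = time.perf_counter()
import json                  # first import: finds, loads, and executes the module
t1 = time.perf_counter()
import json                  # repeated import: just a sys.modules cache hit
t2 = time.perf_counter()

print('first: %.6fs, second: %.6fs' % (t1 - t0, t2 - t1))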
Actually, even though you are importing the modules inside a function, they will still exist in sys.modules once your function is done executing, unless you delete them from there manually. So yeah, there's no point in not importing them directly at the top of your code (like example #1).
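A quick sketch of that behaviour:
import sys

def f():
    import json   # binds the name json only inside f

f()
print('json' in sys.modules)   # True: the module object stays cached after f returns
# json.dumps({})               # but this would be a NameError out here:
                               # the *name* was local to f, only the cache survives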
The most common use for imports that are not just jammed up at the top of the file is for situations where sibling modules represent different, mutually exclusive options: the best example is os.path, which is automatically swapped for the appropriate module. Even there, it's common to do the differential import up at the top and not down in the code.
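Such a top-level differential import might look like the following sketch; ntpath and posixpath are the concrete stdlib modules behind os.path:
import sys

# choose the implementation once, up at the top, not down in the code
if sys.platform == 'win32':
    import ntpath as path_impl
else:
    import posixpath as path_impl

print(path_impl.join('a', 'b'))   # 'a\\b' on Windows, 'a/b' elsewhere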
Is there any effect of unused imports in a Python script?
You pollute your namespace with names that could interfere with your variables and occupy some memory.
Also, you will have a longer startup time, as the program has to load each module.
In any case, I would not get too neurotic about this: while writing, you could end up adding and deleting import os continuously as your code is modified. Some IDEs, such as PyCharm, detect unused imports, so you can rely on them once your code is finished or nearly complete.
"Unused" might be a bit harder to define than you think, for example this code in test.py:
import sys
import unused_stuff
sys.exit(0)
unused_stuff seems to be unused, but if it were to contain:
import __main__

def f(x):
    print("Oh no you don't")

__main__.sys.exit = f
Then running test.py doesn't do what you'd expect from a casual glance.
I am importing a lot of different scripts, so the top of my file gets cluttered with import statements, e.g.:
from somewhere.fileA import ...
from somewhere.fileB import ...
from somewhere.fileC import ...
...
Is there a way to move all of these somewhere else and then all I have to do is import that file instead so it's just one clean import?
I strongly advise against what you want to do. You are repeating the global include file mistake. Although here only one module imports all the others (as opposed to all modules importing a global one), the point remains: if there's a valid reason for all those modules to be collected under a common name, fine; if there's no reason, they should be kept as separate imports. The reason is documentation. If I open your file and see only one import, I get no information about what is imported or where it comes from. If, on the other hand, I have the list of imports, I know at a glance what is needed and what is not.
Also, there's another important mistake I assume you are making. When you write
from somewhere.fileA import ...
from somewhere.fileB import ...
from somewhere.fileC import ...
I assume you are importing, for example, a class, like this
from somewhere.fileA import MyClass
this is wrong. This alternative solution is much better:
from somewhere import fileA
<later>
a=fileA.MyClass()
Why? Two reasons: first, namespacing. If two modules each have a class named MyClass, you would have a clash. Second, documentation. Suppose you use the first option, and I find in your code the following line
a=MyClass()
now I have no idea where this MyClass comes from, and I will have to grep through all your files to find it. Having it qualified with the module name lets me immediately understand where it comes from, and immediately find, via a search, where stuff coming from the fileA module is used in your program.
Final note: when you say "fileA" you are making a mistake. These are modules (or packages), not files. Modules map to files, and packages map to directories, but they may also map to egg files, and you may even create a module having no file at all. This is about naming of concepts, and it's a lateral issue.
Of course there is; just create a file called myimports.py in the same directory where your main file is and put your imports there. Then you can simply use from myimports import * in your main script.
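A minimal sketch of that approach; the imported names are hypothetical stand-ins for whatever your project actually uses:
# myimports.py -- gathers the imports in one place
from somewhere.fileA import MyClassA     # hypothetical names
from somewhere.fileB import MyClassB
from somewhere.fileC import helper_func

# main.py -- one clean import pulls them all into scope
from myimports import *

obj = MyClassA()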
Summary: when a certain Python module is imported, I want to be able to intercept this action, and instead of loading the required class, I want to load another class of my choice.
Reason: I am working on some legacy code. I need to write some unit test code before I start some enhancement/refactoring. The code imports a certain module which will fail in a unit test setting, however, because of a database server dependency.
Pseudo code:
from LegacyDataLoader import load_me_data
...

def do_something():
    data = load_me_data()
So, ideally, when Python executes the import line above in a unit test, an alternative class, say MockDataLoader, is loaded instead.
I am still using Python 2.4.3. I suppose there is an import hook I can manipulate.
Edit
Thanks a lot for the answers so far. They are all very helpful.
One particular type of suggestion is about manipulating PYTHONPATH. It does not work in my case, so I will elaborate on my particular situation here.
The original codebase is organised in this way
./dir1/myapp/database/LegacyDataLoader.py
./dir1/myapp/database/Other.py
./dir1/myapp/database/__init__.py
./dir1/myapp/__init__.py
My goal is to enhance the Other class in the Other module. But since it is legacy code, I do not feel comfortable working on it without strapping a test suite around it first.
Now I introduce this unit test code
./unit_test/test.py
The content is simply:
from myapp.database.Other import Other

def test1():
    o = Other()
    o.do_something()

if __name__ == "__main__":
    test1()
When the CI server runs the above test, the test fails. It is because class Other uses LegacyDataLoader, and LegacyDataLoader cannot establish a database connection to the db server from the CI box.
Now let's add a fake class as suggested:
./unit_test_fake/myapp/database/LegacyDataLoader.py
./unit_test_fake/myapp/database/__init__.py
./unit_test_fake/myapp/__init__.py
Modify the PYTHONPATH to
export PYTHONPATH=unit_test_fake:dir1:unit_test
Now the test fails for another reason
File "unit_test/test.py", line 1, in <module>
from myapp.database.Other import Other
ImportError: No module named Other
It has something to do with the way Python resolves packages: the first myapp found on the path (here, the one under unit_test_fake) wins, and since its database package does not contain Other, the import fails.
You can intercept import and from ... import statements by defining your own __import__ function and assigning it to __builtin__.__import__ (make sure to save the previous value, since your override will no doubt want to delegate to it; and you'll need to import __builtin__ to get the builtin-objects module).
For example (Py2.4 specific, since that's what you're asking about), save in aim.py the following:
import __builtin__

realimp = __builtin__.__import__

def my_import(name, globals={}, locals={}, fromlist=[]):
    print 'importing', name, fromlist
    return realimp(name, globals, locals, fromlist)

__builtin__.__import__ = my_import

from os import path
and now:
$ python2.4 aim.py
importing os ('path',)
So this lets you intercept any specific import request you want, and alter the imported module[s] as you wish before you return them -- see the specs here. This is the kind of "hook" you're looking for, right?
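Building on that recipe, here is a sketch of making the interception selective for the module from your question (still Py2.4-style, and assuming a MockDataLoader module is importable):
import __builtin__

realimp = __builtin__.__import__

def my_import(name, globals={}, locals={}, fromlist=[]):
    if name == 'LegacyDataLoader':
        # hand back the mock module instead of the real one; the from-import
        # machinery then looks up load_me_data on whatever we return
        return realimp('MockDataLoader', globals, locals, fromlist)
    return realimp(name, globals, locals, fromlist)

__builtin__.__import__ = my_import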
There are cleaner ways to do this, but I'll assume that you can't modify the file containing from LegacyDataLoader import load_me_data.
The simplest thing to do is probably to create a new directory called testing_shims, and create a LegacyDataLoader.py file in it. In that file, define whatever fake load_me_data you like. When running the unit tests, put testing_shims into your PYTHONPATH environment variable as the first directory. Alternately, you can modify your test runner to insert testing_shims as the first value in sys.path.
This way, your file will be found when importing LegacyDataLoader, and your code will be loaded instead of the real code.
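A minimal sketch of the sys.path variant; module_under_test is a hypothetical stand-in for the module you are testing:
# at the very top of the test runner, before anything imports LegacyDataLoader
import sys
sys.path.insert(0, 'testing_shims')   # make the shim directory win the search

import module_under_test              # now picks up the fake LegacyDataLoader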
The import statement just grabs stuff from sys.modules if a matching name is found there, so the simplest thing is to make sure you insert your own module into sys.modules under the target name before anything else tries to import the real thing.
# in test code
import sys
import MockDataLoader
sys.modules['LegacyDataLoader'] = MockDataLoader
import module_under_test
There are a handful of variations on the theme, but that basic approach should work fine to do what you describe in the question. A slightly simpler approach would be this, using just a mock function to replace the one in question:
# in test code
import module_under_test

def mock_load_me_data():
    # do mock stuff here
    pass

module_under_test.load_me_data = mock_load_me_data
That simply replaces the appropriate name right in the module itself, so when you invoke the code under test, presumably do_something() in your question, it calls your mock routine.
Well, if the import fails by raising an exception, you could put it in a try...except block:
try:
    from LegacyDataLoader import load_me_data
except:  # put the error that occurs here, so as not to mask actual problems
    from MockDataLoader import load_me_data
Is that what you're looking for? If the import doesn't fail by raising an exception, you could instead have the script select the mock when run with a special command-line flag, like --unittest:
import sys

if "--unittest" in sys.argv:
    from MockDataLoader import load_me_data
else:
    from LegacyDataLoader import load_me_data