I have a codebase where I'm cleaning up some messy decisions by the previous developer. Frequently, he has done something like:
from scipy import *
from numpy import *
...This, of course, pollutes the name space and makes it difficult to tell where an attribute in the module is originally from.
Is there any way to have Python analyze and fix this for me? Has anyone made a utility for this? If not, how might a utility like this be made?
I think PurityLake's and Martijn Pieters's assisted-manual solutions are probably the best way to go. But it's not impossible to do this programmatically.
First, you need to get a list of all names that exist in the module's dictionary and might be used in the code. I'm assuming your code isn't directly calling any dunder functions, etc.
Then, you need to iterate through them, using inspect.getmodule() to find out which module each object was originally defined in. And I'm assuming that you're not using anything that's been doubly from foo import *-ed. Make a list of all of the names that were defined in the numpy and scipy modules.
Now you can take that output and just replace each foo with numpy.foo.
So, putting it together, something like this:
import inspect
import os
import re
import sys

for modname in sys.argv[1:]:
    with open(modname + '.py') as srcfile:
        src = srcfile.read()
    src = src.replace('from numpy import *', 'import numpy')
    src = src.replace('from scipy import *', 'import scipy')
    mod = __import__(modname)
    for name in dir(mod):
        original_mod = inspect.getmodule(getattr(mod, name))
        if original_mod is None:
            continue  # plain constants and the like have no defining module
        # Compare the top-level package so numpy submodules count as numpy
        package = original_mod.__name__.split('.')[0]
        if package in ('numpy', 'scipy'):
            # Match whole identifiers only, and skip existing attribute access
            pattern = r'(?<![\w.])' + re.escape(name) + r'\b'
            src = re.sub(pattern, package + '.' + name, src)
    with open(modname + '.tmp', 'w') as dstfile:
        dstfile.write(src)
    os.rename(modname + '.py', modname + '.bak')
    os.rename(modname + '.tmp', modname + '.py')
If either of the assumptions is wrong, it's not hard to change the code. Also, you might want to use tempfile.NamedTemporaryFile and other improvements to make sure you don't accidentally overwrite things with temporary files. (I just didn't want to deal with the headache of writing something cross-platform; if you're not running on Windows, it's easy.) And add in some error handling, obviously, and probably some reporting.
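The temporary-file improvement could look roughly like the following. This is a minimal sketch rather than part of the script above: safe_rewrite is a hypothetical helper name, and it assumes Python 3.3+ so that os.replace is available, which overwrites the destination on both POSIX and Windows.

```python
import os
import shutil
import tempfile


def safe_rewrite(path, new_text):
    """Replace the contents of `path`, keeping the old version as `path + '.bak'`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep a backup copy of the original file first.
    shutil.copy2(path, path + '.bak')
    # Write the new text to a uniquely named file in the same directory,
    # so the final rename never crosses a filesystem boundary.
    with tempfile.NamedTemporaryFile('w', dir=directory, suffix='.tmp',
                                     delete=False) as tmp:
        tmp.write(new_text)
        tmp_name = tmp.name
    # Swap the finished file into place in a single step.
    os.replace(tmp_name, path)
```

Because the temporary name is generated by tempfile, it cannot collide with an existing .tmp file sitting next to the module.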
Yes. Remove the imports and run a linter on the module.
I recommend using flake8, although it may also create a lot of noise about style errors.
Merely removing the imports and trying to run the code is probably not going to be enough, as many name errors won't be raised until you run just the right line of code with just the right input. A linter will instead analyze the code by parsing and will detect potential NameErrors without having to run the code.
This all presumes that there are no reliable unit tests, or that the tests do not provide enough coverage.
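To illustrate what "analyze the code by parsing" means, here is a rough sketch of the idea using only the standard ast module. It is not how flake8 is implemented, and possibly_undefined_names is a hypothetical helper; it ignores scoping, except clauses, and other details a real linter handles.

```python
import ast
import builtins


def possibly_undefined_names(source):
    """Return names that are read somewhere in `source` but never bound."""
    tree = ast.parse(source)
    bound = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:
                bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                               ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.alias):
            # "import a.b" binds "a"; "import a as b" binds "b"
            bound.add((node.asname or node.name).split('.')[0])
    return sorted(used - bound)


example = """
import os
def area(r):
    return pi * r ** 2
print(area(2), os.getcwd(), sqrt(4))
"""
print(possibly_undefined_names(example))  # ['pi', 'sqrt']
```

Nothing in the example source is executed; the missing names are found purely from the syntax tree.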
In this case, where there are multiple from module import * lines, it gets a little more painful in that you need to figure out for each and every missing name what module supplied that name. That will require manual work, but you can simply import the module in a python interpreter and test if the missing name is defined on that module:
>>> import scipy, numpy
>>> 'loadtxt' in dir(numpy)
True
You do need to take into account that, in this specific case, there is overlap between the numpy and scipy modules; for any name defined in both modules, the module imported last wins.
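The manual lookup can be scripted. The sketch below uses the standard math and cmath modules as stand-ins for numpy and scipy, since both define sqrt; providers is a hypothetical helper name, and hasattr is only an approximation of what a star import would actually bind.

```python
import importlib


def providers(name, module_names):
    """Return the modules, in import order, that define `name`."""
    return [m for m in module_names
            if hasattr(importlib.import_module(m), name)]


# Same order as the "from ... import *" lines in the file being cleaned up
star_import_order = ['math', 'cmath']

found = providers('sqrt', star_import_order)
print(found)      # ['math', 'cmath']
print(found[-1])  # 'cmath': the module imported last wins
```

When the list has more than one entry, the last one is the module the code was really using.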
Note that leaving any from module import * line in place means the linter will not be able to detect what names might raise NameErrors!
I've now made a small utility for doing this which I call 'dedazzler'. It will find lines that are 'from module import *', and then expand the 'dir' of the target modules, replacing the lines.
After running it, you still need to run a linter. Here's the particularly interesting part of the code:
import re
import sys
import time

star_match = re.compile(r'from\s(?P<module>[\.\w]+)\simport\s[*]')
now = str(time.time())
error = lambda x: sys.stderr.write(x + '\n')

def replace_imports(lines):
    """
    Iterates through lines in a Python file, looks for 'from module import *'
    statements, and attempts to fix them.
    """
    for line_num, line in enumerate(lines):
        match = star_match.search(line)
        if match:
            newline = import_generator(match.groupdict()['module'])
            if newline:
                lines[line_num] = newline
    return lines

def import_generator(modulename):
    try:
        # __import__ returns the top-level package, so walk down to the submodule
        prop_depth = modulename.split('.')[1:]
        namespace = __import__(modulename)
        for prop in prop_depth:
            namespace = getattr(namespace, prop)
    except ImportError:
        error("Couldn't import module '%s'!" % modulename)
        return
    directory = [name for name in dir(namespace) if not name.startswith('_')]
    return "from %s import %s\n" % (modulename, ', '.join(directory))
I'm maintaining this in a more useful stand-alone utility form here:
https://github.com/USGM/dedazzler/
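One refinement worth considering: a real star import honours a module's __all__ list when it defines one, so dir() can return more names than the star import ever bound. The following is a sketch of that rule, not code from dedazzler; star_names and the demo module are hypothetical.

```python
import types


def star_names(module):
    """Names that "from module import *" would bind."""
    if hasattr(module, '__all__'):
        return list(module.__all__)
    # Without __all__, every name not starting with an underscore is exported
    return [n for n in vars(module) if not n.startswith('_')]


demo = types.ModuleType('demo')  # hypothetical stand-in module
demo.helper = lambda: None
demo.public_api = lambda: None
print(star_names(demo))  # ['helper', 'public_api']

demo.__all__ = ['public_api']
print(star_names(demo))  # ['public_api']
```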
OK, this is what I think you could do: break the program. Remove the imports and note the errors that result. Then import only the modules that you want. This may take a while, but it is the only way I know of doing this; I will be happily surprised if someone does know of a tool to help.
EDIT:
ah yes, a linter, I hadn't thought of that.
Is there a way of unit testing what modules are imported in a Python file (a bit like ArchUnit in Java)? The context is in implementing a hexagonal architecture and wanting to ensure that the domain model does not import any code that resides in an adapter. I'd like unit tests to fail if there are forbidden imports.
For example, I might like to test that no modules within foo.bar.domain import anything from foo.bar.adapter. Imports of foo.bar.domain should be allowed from within foo.bar.adapter.
Is this possible in Python and what's the best way of achieving this?
You can use the -Ximporttime Python flag to trace imports. I'm not entirely sure what would be the logic for finding forbidden imports in your case, but here's a silly example script that might help:
import subprocess
import sys

# Run the import in a child interpreter; -Ximporttime reports on stderr
process = subprocess.run(
    (sys.executable, '-Ximporttime', '-c', 'import ' + 'shlex'),
    stdout=subprocess.DEVNULL,
    stderr=subprocess.PIPE,
    encoding='utf-8',
)
blacklisted_imports = {'enum', 're', 'zipfile'}
# The module name is the last '|'-separated column of each report line
data = [
    x.rpartition('|')[2].strip() for x in process.stderr.split('\n')
]
for import_ in data:
    if import_ in blacklisted_imports:
        print('found bad import:', import_)
Output:
found bad import: enum
found bad import: re
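A static alternative is to parse the source of each domain module and inspect its import statements, which fits naturally into a unit test. This is only a sketch under assumptions: forbidden_imports and imported_modules are hypothetical helpers, the package names come from the question, and relative imports are skipped.

```python
import ast


def imported_modules(source):
    """Collect the absolute module names imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
    return found


def forbidden_imports(source, forbidden_prefix):
    """Return imported modules that live in or under `forbidden_prefix`."""
    return sorted(
        m for m in imported_modules(source)
        if m == forbidden_prefix or m.startswith(forbidden_prefix + '.')
    )


domain_source = (
    "import foo.bar.adapter.db\n"
    "from foo.bar.domain import model\n"
    "from foo.bar.adapter import http\n"
)
print(forbidden_imports(domain_source, 'foo.bar.adapter'))
# ['foo.bar.adapter', 'foo.bar.adapter.db']
```

A test could read every .py file under foo/bar/domain and assert that forbidden_imports returns an empty list for each one.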
I am not aware of testing methods for this specific case, but someone might know more about it. One thing that comes to mind is a try/except around a call to a method from the other module, to check whether it can be called. Another hacky way would be to add custom string constants to the global context of each module; if they exist, you know that the submodule imported/used the other module.
Check more on this stack overflow post.
I have written a script for XBMC which optionally downloads a dll and then imports a module that depends on that dll if the download was successful.
However, placing the import inside a function generates a Python syntax warning.
Simplified example:
1 def importIfPresent():
2     if chkFunction() is True:
3         from myOptionModule import *
Line 3 generates the warning, but doesn't stop the script. I can't place this code at the start outside of a function because I need to generate dialog boxes to prompt the download and then hash the file once it is downloaded to check success. I also call this same code at startup in order to check if the user has already downloaded the dll.
Is there a different/better way to do this without generating the syntax warning? Or should I just ignore the warning and leave it as is?
Thank you! Using the useful responses below, I now have:
import importlib

myOptionalModule = None

def importIfPresent():
    global myOptionalModule  # rebind the module-level name, not a local
    if chkFunction() is True:
        try:
            myOptionalModule = importlib.import_module('modulex')
        except ImportError:
            myOptionalModule = None
...
importIfPresent()
...
def laterFunction():
    if myOptionalModule is not None:
        myParam = 'something expected'
        myClass = getattr(myOptionalModule, 'importClassName')
        myFunction = getattr(myClass, 'functionName')
        result = myFunction(myClass(), myParam)
    else:
        callAlternativeMethod()
I am posting this back mainly to share with other beginners like myself the way I learned through the discussion to use the functionality of a module imported this way instead of the standard import statement. I'm sure that there are more elegant ways of doing this that the experts will share as well...
You're not getting the warning for doing an import inside a function; you're getting the warning for using from <module> import * inside a function. In Python 3, this actually becomes a SyntaxError, not a SyntaxWarning. See this answer for why wildcard imports like this, in general and especially inside functions, are discouraged.
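You can check the Python 3 behaviour without touching your script by compiling a small snippet. This is a minimal sketch; the function name f and the choice of os are arbitrary.

```python
# A wildcard import inside a function body, as in the question
source = "def f():\n    from os import *\n"

try:
    compile(source, '<demo>', 'exec')
    outcome = 'compiled'
except SyntaxError as exc:
    outcome = 'SyntaxError: ' + str(exc.msg)

print(outcome)
```

On Python 3 the except branch is taken at compile time, before any of the code runs.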
Also, this code isn't doing what you think it does. When you do an import inside a function, the import only takes effect inside the function. You're not importing that module into the global namespace of the file, which I believe is what you're really trying to do.
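A small demonstration of that scoping rule, using the standard colorsys module purely as an example (any module not already imported at the top of the file would do):

```python
def import_inside_function():
    import colorsys  # bound only in this function's local namespace
    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)


def name_is_global(name):
    """Report whether `name` is bound at module level in this file."""
    return name in globals()


result = import_inside_function()
visible_globally = name_is_global('colorsys')

print(result)            # (0.0, 1.0, 1.0)
print(visible_globally)  # False
```

The module itself is loaded and cached in sys.modules, but the name colorsys never becomes a global of the importing file.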
As suggested in another answer importlib can help you here:
import importlib

try:
    import myOptionModule as opt
except ImportError:
    opt = None

def importIfPresent():
    global opt
    if chkFunction() is True:
        opt = importlib.import_module("myOptionModule")
I believe you need to use the importlib library to facilitate this.
The code would be at the top of the mod:
import importlib
then replace "from myOptionModule import *" with "module = importlib.import_module('myOptionModule')" (note that the module name is passed as a string). You can then import the defs/classes you want or import them all by using getattr(module,NAME(S)TOIMPORT).
See if that works.
Check out chapter 30 and 31 of Learning Python by Lutz for more info.
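Put together, the optional-import pattern might be sketched like this. load_optional is a hypothetical helper, json stands in for a module that is present, and no_such_optional_module_xyz is assumed not to exist on your system.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


present = load_optional('json')
missing = load_optional('no_such_optional_module_xyz')

if present is not None:
    # Attributes can be looked up by name, or simply as present.dumps
    dumps = getattr(present, 'dumps')
    print(dumps([1, 2]))  # [1, 2]

print(missing)  # None
```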
I intend to make a suite of files at some point soon, and the best way to organize it is to have a list, that list will be at the very top of a file, and after it will come a ridiculous amount of code to handle what that list controls and how it operates. I'm looking to write said list only once, and said list is a list of folder and file names in this format:
[(folder/filename, bool, bool, int), (folder/filename, bool, bool, int)]
As you can see, folder/filename are the same (sort of). File name is folder name with .py on the end, but doing import XXX you don't need to do import XXX.py, so I don't see this causing an issue.
The problem I'm facing is importing using this method...
for (testName, auto, hardware, bit) in testList:
    print(testName)
    paths = "\\" + testName
    print(paths)
    addpath(paths)
    sys.modules[testName] = testName  # One of a few options I've seen suggested on the net
    print("Path Added")
    test = testName + ".Helloworld()"
    eval(test)
So for each test I have, print the name, assemble a string which contains the path ("\\testName"), for this example, print the test path, then add the path to the list (sys.path.append(path)), then print to confirm it happened, then assemble a string which will be executed by eval for the tests main module and eventually eval it.
As you can see, I'm currently having to have a list of imports at the top. I can't simply do import testName (the contents of testName are the name of the module I wish to import), as it will try to find a module called testName, not a module called the contents of testName.
I've seen a few examples of where this has been done, but can't find any which work in my circumstances. If someone could literally throw a chunk of code which does it that would be wonderful.
I'd also request that I'm not hung, drawn, nor quartered for use of eval, it is used in a very controlled environment (the list through which it cycles is within the .py file, so no "end user" should mess with it).
Not sure if I understood everything correctly, but you can import a module dynamically using __import__:
mod = __import__(testName)
mod.HelloWorld()
Edit: I wasn't aware that the use of __import__ was discouraged by the python docs for user code: __import__ documentation (as noted by Bakuriu)
This should also work and would be considered better style:
import importlib
mod = importlib.import_module(testName)
mod.HelloWorld()
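Applied to a list in the format from the question, a loop might look like the sketch below. The standard math and cmath modules and their sqrt function stand in for the test modules and their entry point, which are assumptions for the sake of a runnable example; getattr replaces the eval call.

```python
import importlib

# (module name, auto, hardware, bit), mirroring the question's tuple layout
test_list = [
    ('math', True, False, 32),
    ('cmath', False, True, 64),
]

results = {}
for test_name, auto, hardware, bit in test_list:
    mod = importlib.import_module(test_name)
    # Look the entry point up by name instead of building a string for eval
    entry_point = getattr(mod, 'sqrt')
    results[test_name] = entry_point(4)

print(results)  # {'math': 2.0, 'cmath': (2+0j)}
```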
Never, ever, ever mess with sys.modules directly if you don't know exactly what you are doing.
There are a lot of ways to do what you want:
The built-in __import__ function
Using imp.load_module (note that the imp module has been deprecated since Python 3.4 and was removed in Python 3.12)
Using importlib.import_module
I'd avoid using __import__ directly, and go for importlib.import_module (which is also suggested at the end of the documentation of __import__).
Add the path where module resides to sys.path. Import the module using __import__ function which accepts a string variable as module name.
import sys
sys.path.insert(0, mypath) # mypath = path of module to be imported
imported_module = __import__("string_module_name") # __import__ accepts string
imported_module.myfunction() # All symbols in mymodule are now available normally
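If you would rather not modify sys.path, Python 3 can also load a module straight from a file path through importlib.util. This is a different technique from the one above, shown as a sketch: load_from_path is a hypothetical helper, and the temporary file only exists to make the example self-contained.

```python
import importlib.util
import os
import tempfile


def load_from_path(module_name, file_path):
    """Load a module object from an explicit .py file path."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the module body
    return module


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'string_module_name.py')
    with open(path, 'w') as handle:
        handle.write("def myfunction():\n    return 'hello'\n")
    imported_module = load_from_path('string_module_name', path)
    greeting = imported_module.myfunction()

print(greeting)  # hello
```

Note that a module loaded this way is not registered in sys.modules unless you add it yourself.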
I'm refactoring and eliminating wildcard imports on some fairly monolithic code.
Pylint seems to do a great job of listing all the unused imports that come along with a wildcard import, but what I wish it did was provide a list of used imports so I can quickly replace the wildcard import. Any quick ways of doing this? I'm about to go parse the output of Pylint and do a set.difference() on this and the dir() of the imported module. But I bet there's some tool/procedure I'm not aware of.
NB: pylint does not recommend a set of used imports. When changing this, you have to be aware of other modules importing the code you are modifying, which could use symbols which belong to the namespace of the module you are refactoring only because you have unused imports.
I recommend the following procedure to refactor from foo import *:
in an interactive shell, type:
import re
import foo as module  # XXX use the correct module name here!
module_name = module.__name__
import_line = 'from %s import (%%s)' % module_name
length = len(import_line) - 3
print(import_line % (',\n' + length * ' ').join([a for a in dir(module)
                                                 if not re.match('__.*[^_]{2}', a)]))
replace the from foo import * line with the one printed above
run pylint, and remove the unused imports flagged by pylint
run pylint again on the whole code base, looking for imports of nonexistent symbols
run your unit tests
repeat with from bar import *
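The "list of used imports" can also be approximated directly, without going through pylint output, by intersecting the names a file reads with the names the module exports. This is a sketch, not a pylint feature: used_star_names is a hypothetical helper, math stands in for foo, and local names that shadow an exported name will produce false positives.

```python
import ast
import importlib


def used_star_names(source, module_name):
    """Names exported by `module_name` that `source` actually reads."""
    module = importlib.import_module(module_name)
    exported = getattr(module, '__all__', None)
    if exported is None:
        exported = [n for n in vars(module) if not n.startswith('_')]
    loaded = {
        node.id for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return sorted(loaded & set(exported))


source = "from math import *\nprint(sqrt(pi) + floor(2.5))\n"
names = used_star_names(source, 'math')
print('from math import ' + ', '.join(names))
# from math import floor, pi, sqrt
```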
Here's dewildcard, a very simple tool based on Alex's initial ideas:
https://github.com/quentinsf/dewildcard
This is an old question, but I wrote something that does this based on autoflake.
See here: https://github.com/fake-name/autoflake/blob/master/autostar.py
It works the opposite way dewildcard does, in that it attempts to fully qualify all uses of wildcard items.
E.g.
from os.path import *
Is converted to
import os.path
and all uses of <func> from that module are rewritten as os.path.<func>.
Can a python module detect if has been imported with import module or from module import *? Something like
if __something__ == 'something':
    print('Directly imported with "import ' + __name__ + '"')
else:
    print('Imported with "from ' + __name__ + ' import *"')
Thank you.
No, it's not possible to detect this from within the module's code. Upon the first import, the module body is executed and a new module object is inserted in sys.modules. Only after this, the requested names are inserted into the namespace of the importing module.
Upon later imports, the module body isn't even executed. So if a module is first imported as
import module
and a second time as
from module import name
it has no chance to do anything at all during the second import. In particular, it cannot check how it is imported.
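The "body runs only once" behaviour is easy to confirm. The sketch below writes a throwaway module, here given the hypothetical name demo_once_mod, whose body appends a line to a log file each time it executes, then imports it in both styles.

```python
import importlib
import os
import sys
import tempfile

MODULE_SOURCE = (
    "import os\n"
    "with open(os.path.join(os.path.dirname(__file__), 'runs.log'), 'a') as log:\n"
    "    log.write('executed\\n')\n"
    "name = 'value'\n"
)


def count_body_executions():
    """Import the demo module twice and count how often its body ran."""
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, 'demo_once_mod.py'), 'w') as handle:
            handle.write(MODULE_SOURCE)
        sys.path.insert(0, folder)
        importlib.invalidate_caches()
        try:
            exec('import demo_once_mod', {})
            exec('from demo_once_mod import name', {})
            with open(os.path.join(folder, 'runs.log')) as log:
                return len(log.readlines())
        finally:
            # Leave the interpreter state as we found it
            sys.path.remove(folder)
            sys.modules.pop('demo_once_mod', None)


print(count_body_executions())  # 1
```

The second import finds the module in sys.modules and only copies the requested name, so the module gets no opportunity to observe it.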
While Sven's answer is probably the correct one, and this might seem a bit obvious, it is what I was really looking for when I stumbled upon this question.
This module will at least know that you passed an input argument to it, which allows unit testing of just this specific script without the unit test being performed in the module that imported it.
import sys

def myfunction(blah):
    return "something like: " + blah

noargs = len(sys.argv)
if noargs > 1:
    # sys.argv[0] is the script name, so the arguments start at index 1
    for i in range(noargs - 1):
        print(myfunction(sys.argv[i + 1]))
However, it doesn't really help you, Emilio, if you have no input arguments. : )