Tool to help eliminate wildcard imports - python

I'm refactoring and eliminating wildcard imports on some fairly monolithic code.
Pylint seems to do a great job of listing all the unused imports that come along with a wildcard import, but what I wish it did was provide a list of the used imports so I can quickly replace the wildcard import. Any quick ways of doing this? I'm about to go parse the output of Pylint and do a set.difference() on this and the dir() of the imported module. But I bet there's some tool/procedure I'm not aware of.

NB: Pylint does not recommend a set of used imports. When changing this, you have to be aware of other modules that import the code you are modifying; they may be using symbols that only appear in your module's namespace because of those unused imports.
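For example (util.py and report.py are hypothetical modules, just to illustrate the hazard):
# util.py
from os.path import *      # unused here, but silently re-exports join, exists, ...

# report.py
from util import join      # works only because of util's wildcard import;
                            # breaks as soon as util.py drops the star import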
I recommend the following procedure to refactor from foo import *:
in an interactive shell, type:
import re
import foo as module  # XXX use the correct module name here!
module_name = module.__name__
import_line = 'from %s import (%%s)' % module_name
length = len(import_line) - 3
# print an explicit import statement listing every public name, aligned under the paren
print(import_line % (',\n' + length * ' ').join(
    a for a in dir(module) if not re.match('__.*[^_]{2}', a)))
replace the from foo import * line with the one printed above
run pylint, and remove the unused imports flagged by pylint
run pylint again on the whole codebase, looking for imports of non-existent symbols
run your unit tests
repeat with from bar import *

Here's dewildcard, a very simple tool based on Alex's initial ideas:
https://github.com/quentinsf/dewildcard

This is an old question, but I wrote something that does this based on autoflake.
See here: https://github.com/fake-name/autoflake/blob/master/autostar.py
It works the opposite way dewildcard does, in that it attempts to fully qualify all uses of wildcard items.
E.g.
from os.path import *
Is converted to
import os.path
and all uses of names from os.path are rewritten as os.path.<func>, i.e. prefixed with the proper module path.
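Roughly, the effect on calling code looks like this (an illustrative sketch; base is a placeholder name, not autostar's actual output):
# before
from os.path import *
full = join(base, 'settings.cfg')

# after
import os.path
full = os.path.join(base, 'settings.cfg')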


Python Importing modules in a package

I currently have a module I created that has a number of functions.
It's getting quite large so I figured I should make it into a package and split the functions up to make it more manageable.
I'm just testing out how this all works before I do this for real so apologies if it seems a bit tenuous.
I've created a folder called pack_test and in it I have:
__init__.py
foo.py
bar.py
__init__.py contains:
__all__ = ['foo', 'bar']
from . import *
import subprocess
from os import environ
In the console I can write import pack_test as pt and this is fine, no errors.
Typing pt. and pressing Tab twice shows me that I can see pt.bar, pt.environ, pt.foo and pt.subprocess in there.
All good so far.
If I want to reference subprocess or environ in foo.py or bar.py, how do I do it there?
If in bar.py I have a function which just does return subprocess.call('ls'), it fails with NameError: name 'subprocess' is not defined. There must be something I'm missing which enables me to reference subprocess from the level above? Presumably, once I have the syntax for that, I can also access environ in a similar way?
The alternative, as I see it, would be to have import subprocess in both foo.py and bar.py, but it seems a bit odd to me to have it appear across multiple files when I could have it once at a higher level, particularly if I went on to have a large number of files rather than just two in this example.
TL;DR:
__init__.py:
from . import foo
from . import bar
__all__ = ["foo", "bar"]
foo.py:
import subprocess
from os import environ
# your code here
bar.py:
import subprocess
from os import environ
# your code here
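With that layout, a quick interactive check might look like this (list_dir is a hypothetical function standing in for the subprocess.call('ls') example from the question):
# bar.py
import subprocess

def list_dir():
    return subprocess.call('ls')

# interactive session
import pack_test as pt
pt.bar.list_dir()   # subprocess is resolved inside bar.py itself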
There must be something I'm missing which enables me to reference subprocess from the level above?
Nope, this is the expected behaviour.
import loads a module (if it isn't already loaded), caches it in sys.modules (again, if it isn't already there), and binds the imported names in the current namespace. Each Python module has (or "is") its own namespace (there's no real "global" namespace). In other words, you have to import what you need in each module, i.e. if foo.py needs subprocess, it must explicitly import it.
This can seem a bit tedious at first, but in the long run it really helps with maintainability - you just have to read the imports at the top of your module (PEP 8: always put all imports at the beginning of the module) to know where a name comes from.
Also, you should not use star imports (aka wildcard imports, aka from xxx import *) anywhere other than in your Python shell (and even then...) - it's a maintenance time bomb. Not only because you don't know where each name comes from, but also because it's a sure way to rebind an already-imported name. Imagine that your foo module defines a function "func". Somewhere you have "from foo import *; from bar import *", then later in the code a call to func. Now someone edits bar.py and adds a (distinct) "func" function, and suddenly your call fails, because you're not calling the "func" you expected. Now enjoy debugging this... And real-life examples are usually a bit more complex than this.
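A minimal sketch of that failure mode (foo, bar and func are hypothetical):
# foo.py
def func():
    return "foo's func"

# bar.py - someone later adds a distinct func
def func():
    return "bar's func"

# main.py
from foo import *
from bar import *
print(func())   # prints "bar's func": foo's func has been silently rebound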
So if you value your sanity, don't be lazy, and don't try to be smart either; just do the simple, obvious thing: explicitly import the names you're interested in at the top of your modules.
(been here, done that etc)
You could create modules.py containing
import subprocess
import os
Then in foo.py, or any of your other files, just have:
from modules import *
Your import statements in your files then stay static; just update modules.py when you want to make an additional module accessible to them all.
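A minimal sketch of that layout (same file names as above):
# modules.py
import subprocess
import os

# foo.py
from modules import *

def show_home():
    # both os and subprocess come from modules.py via the star import
    return os.environ.get('HOME')
Note that this reintroduces a wildcard import, with the drawbacks described in the previous answer.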

is there any point in using relative paths in Python import statement?

I have a Python package called Util. It includes a bunch of files. Here is the include statements on top of one of the files in there:
from config_util import ConfigUtil #ConfigUtil is a class inside the config_util module
import error_helper as eh
This works fine when I run my unit tests.
When I install Util in a virtual environment in another package everything breaks. I will need to change the import statements to
from Util.config_util import ConfigUtil
from Util import error_helper as eh
and then everything works as before. So is there any point in using the first form or is it safe to say that it is better practice to always use the second form?
If there is no point in using the first form, then why is it allowed?
Just wrong:
from config_util import ConfigUtil
import error_helper as eh
It will only work if you happen to be in the directory Util, so that the imports resolve in the current working directory. Or you have messed with sys.path using some bad hack.
Right (using absolute imports):
from Util.config_util import ConfigUtil
import Util.error_helper as eh
Also right (using relative imports):
from .config_util import ConfigUtil
from . import error_helper as eh
There is no particular advantage to using relative imports, only a couple of minor things I can think of:
Saves a few bytes in the source file (so what / who cares?)
Enables you to rename the top level without editing import statements in source code (...but how often do you do that?)
For your practical problems, maybe this answer can help you.
Regarding your direct question: there's not a lot to it, but they let you move files and rename containing directories more easily. You may also prefer relative imports for stylistic reasons; I sure do.
The semantics are the same if the paths are correct. If your module is foo.bar, then from foo.bar.baz import Baz and from .baz import Baz are the same. If they don't behave the same, then you're likely running your Python file as a script (python foo/bar.py), in which case it will be module __main__ instead of foo.bar.
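To see the script-vs-module distinction from that last point, assuming the Util package from the question and a hypothetical module Util/report.py inside it:
# Util/report.py
from .config_util import ConfigUtil   # relative import

# python -m Util.report   -> runs with package context (__package__ == 'Util'),
#                            so the relative import works
# python Util/report.py   -> runs as a plain script with no package context,
#                            so the relative import fails with ImportError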

Python : 'import module' vs 'import module as'

Are there any differences between the following two statements?
import os
import os as os
If so, which one is preferred?
It is just used for simplification; for example:
import random
print(random.randint(1, 100))
is the same as:
import random as r
print(r.randint(1, 100))
So you can use r instead of random every time.
The general forms of the "as" keyword in import statements are:
import MODULE as NAME
from MODULE import NAME as ALIAS
Using "as" lets the developer pick a name of their own choosing for an imported module (or for a name imported from it).
Example:
import random
print(random.randint(1, 100))
Now I would like to use a name of my own choosing for the random module, so I can rewrite the above code as:
import random as myrand
print(myrand.randint(1, 100))
Now coming to your question: which one is preferred?
The answer is that it is your choice; there is no performance impact from using "as" when importing modules.
Are there any differences between the following two statements?
No.
If so, which one is preferred?
The first one (import os), because the second one does the exact same thing but is longer and repeats itself for no reason.
If you want to use name f for imported module foo, use
import foo as f
# other examples
import numpy as np
import pandas as pd
In your case, use import os
The import .... as syntax was designed to limit errors.
This syntax allows us to give a name of our choice to the package or module we are importing; in theory that could lead to name clashes, but in practice the as syntax is used to avoid them.
Renaming is particularly useful when experimenting with different implementations of a module.
Example: if we had two modules ModA and ModB that had the same API, we could write import ModA as MyMod in a program, and later on switch to using import ModB as MyMod.
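For a runnable sketch of that switch, two standard-library modules with a similar dumps/loads API work just as well as the hypothetical ModA/ModB:
import json as serializer
# import pickle as serializer   # switch implementations by changing one line

payload = serializer.dumps({"id": 1})
print(payload)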
In answering your question, there is no preferred syntax. It is all up to you to decide.

Reversing from module import *

I have a codebase where I'm cleaning up some messy decisions by the previous developer. Frequently, he has done something like:
from scipy import *
from numpy import *
...This, of course, pollutes the namespace and makes it difficult to tell where an attribute in the module originally came from.
Is there any way to have Python analyze and fix this for me? Has anyone made a utility for this? If not, how might a utility like this be made?
I think PurityLake's and Martijn Pieters's assisted-manual solutions are probably the best way to go. But it's not impossible to do this programmatically.
First, you need to get a list of all names that exist in the module's dictionary and might be used in the code. I'm assuming your code isn't directly calling any dunder functions, etc.
Then, you need to iterate through them, using inspect.getmodule() to find out which module each object was originally defined in. And I'm assuming that you're not using anything that's been doubly from foo import *-ed. Make a list of all of the names that were defined in the numpy and scipy modules.
Now you can take that output and just replace each foo with numpy.foo.
So, putting it together, something like this:
import inspect
import os
import sys

for modname in sys.argv[1:]:
    with open(modname + '.py') as srcfile:
        src = srcfile.read()
    src = src.replace('from numpy import *', 'import numpy')
    src = src.replace('from scipy import *', 'import scipy')
    mod = __import__(modname)
    for name in dir(mod):
        original_mod = inspect.getmodule(getattr(mod, name))
        if original_mod.__name__ == 'numpy':
            src = src.replace(name, 'numpy.' + name)
        elif original_mod.__name__ == 'scipy':
            src = src.replace(name, 'scipy.' + name)
    with open(modname + '.tmp', 'w') as dstfile:
        dstfile.write(src)
    os.rename(modname + '.py', modname + '.bak')
    os.rename(modname + '.tmp', modname + '.py')
If either of the assumptions is wrong, it's not hard to change the code. Also, you might want to use tempfile.NamedTemporaryFile and other improvements to make sure you don't accidentally overwrite things with temporary files. (I just didn't want to deal with the headache of writing something cross-platform; if you're not running on Windows, it's easy.) And add in some error handling, obviously, and probably some reporting.
Yes. Remove the imports and run a linter on the module.
I recommend using flake8, although it may also create a lot of noise about style errors.
Merely removing the imports and trying to run the code is probably not going to be enough, as many name errors won't be raised until you run just the right line of code with just the right input. A linter will instead analyze the code by parsing and will detect potential NameErrors without having to run the code.
This all presumes that there are no reliable unit tests, or that the tests do not provide enough coverage.
In this case, where there are multiple from module import * lines, it gets a little more painful in that you need to figure out for each and every missing name what module supplied that name. That will require manual work, but you can simply import the module in a python interpreter and test if the missing name is defined on that module:
>>> import scipy, numpy
>>> 'loadtxt' in dir(numpy)
True
You do need to take into account that in this specific case there is overlap between the numpy and scipy modules; for any name defined in both, the module imported last wins.
Note that leaving any from module import * line in place means the linter will not be able to detect what names might raise NameErrors!
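A small helper along those lines, automating the dir() check from the interactive session above (missing_names stands for whatever list of undefined names the linter reported):
import numpy
import scipy

def find_origin(name, candidates=(numpy, scipy)):
    """Return the names of the candidate modules that define the given name."""
    return [mod.__name__ for mod in candidates if name in dir(mod)]

missing_names = ['loadtxt', 'array']   # example linter output
for name in missing_names:
    print(name, '->', find_origin(name))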
I've now made a small utility for doing this which I call 'dedazzler'. It will find lines that are 'from module import *', and then expand the 'dir' of the target modules, replacing the lines.
After running it, you still need to run a linter. Here's the particularly interesting part of the code:
import re
import sys
import time

star_match = re.compile(r'from\s(?P<module>[\.\w]+)\simport\s[*]')
now = str(time.time())
error = lambda x: sys.stderr.write(x + '\n')

def replace_imports(lines):
    """
    Iterates through lines in a Python file, looks for 'from module import *'
    statements, and attempts to fix them.
    """
    for line_num, line in enumerate(lines):
        match = star_match.search(line)
        if match:
            newline = import_generator(match.groupdict()['module'])
            if newline:
                lines[line_num] = newline
    return lines

def import_generator(modulename):
    try:
        prop_depth = modulename.split('.')[1:]
        namespace = __import__(modulename)
        for prop in prop_depth:
            namespace = getattr(namespace, prop)
    except ImportError:
        error("Couldn't import module '%s'!" % modulename)
        return
    directory = [name for name in dir(namespace) if not name.startswith('_')]
    return "from %s import %s\n" % (modulename, ', '.join(directory))
I'm maintaining this in a more useful stand-alone utility form here:
https://github.com/USGM/dedazzler/
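A minimal driver for the snippet above might look like this (the file name is hypothetical; the stand-alone tool handles this for you):
# rewrite a single file in place using replace_imports() from above
filename = 'mymodule.py'
with open(filename) as f:
    lines = f.readlines()
with open(filename, 'w') as f:
    f.writelines(replace_imports(lines))
After that, run a linter to remove the unused names, as noted above.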
OK, this is what I think you could do: break the program. Remove the imports and note the errors that result, then import only the names you actually want. This may take a while, but it's the only way I know of doing this; I will be happily surprised if someone does know of a tool to help.
EDIT:
Ah yes, a linter; I hadn't thought of that.

Import file using string as name [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Dynamic module import in Python
I intend to make a suite of files at some point soon, and the best way to organize it is to have a list. That list will be at the very top of a file, and after it will come a ridiculous amount of code to handle what that list controls and how it operates. I'm looking to write said list only once; it is a list of folder and file names in this format:
[(folder/filename, bool, bool, int), (folder/filename, bool, bool, int)]
As you can see, folder/filename are the same (sort of). The file name is the folder name with .py on the end, but when doing import XXX you don't need to write import XXX.py, so I don't see this causing an issue.
The problem I'm facing is importing using this method...
for (testName, auto, hardware, bit) in testList:
    print(testName)
    paths = "\\" + testName
    print(paths)
    addpath(paths)  # i.e. sys.path.append(paths), see below
    sys.modules[testName] = testName # One of a few options I've seen suggested on the net
    print("Path Added")
    test = testName + ".Helloworld()"
    eval(test)
So for each test I have: print the name, assemble a string which contains the path ("\\testName" in this example), print the test path, add the path to the list (sys.path.append(path)), print to confirm it happened, assemble a string which calls the test's main function, and eventually eval it.
As you can see, I'm currently having to have a list of imports at the top. I can't simply do import testName (the contents of testName are the name of the module I wish to import), as it will try to find a module called testName, not a module called the contents of testName.
I've seen a few examples of where this has been done, but can't find any which work in my circumstances. If someone could literally throw a chunk of code which does it that would be wonderful.
I'd also request that I'm not hung, drawn, nor quartered for use of eval, it is used in a very controlled environment (the list through which it cycles is within the .py file, so no "end user" should mess with it).
Not sure if I understood everything correctly, but you can import a module dynamically using __import__:
mod = __import__(testName)
mod.HelloWorld()
Edit: I wasn't aware that the use of __import__ was discouraged by the python docs for user code: __import__ documentation (as noted by Bakuriu)
This should also work and would be considered better style:
import importlib
mod = importlib.import_module(testName)
mod.HelloWorld()
Never, ever, ever mess with sys.modules directly if you don't know exactly what you are doing.
There are a lot of ways to do what you want:
The built-in __import__ function
Using imp.load_module
Using importlib.import_module
I'd avoid using __import__ directly, and go for importlib.import_module (which is also suggested at the end of the documentation of __import__).
Add the path where the module resides to sys.path, then import the module using the __import__ function, which accepts a string variable as the module name.
import sys
sys.path.insert(0, mypath) # mypath = path of module to be imported
imported_module = __import__("string_module_name") # __import__ accepts string
imported_module.myfunction() # All symbols in mymodule are now available normally
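Putting that together with the loop from the question, a sketch using importlib instead of eval (testList and Helloworld are the names from the question; the path handling is assumed to match the original setup):
import importlib
import sys

for (testName, auto, hardware, bit) in testList:
    sys.path.append("\\" + testName)            # as in the original addpath(paths)
    module = importlib.import_module(testName)  # import by string name
    module.Helloworld()                         # call the entry point directly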
