This may be a dumb question, but charging boldly ahead anyway.
I have a library of about a dozen Python modules I maintain for general use. Recently, after advice found here on SO, I changed all of the modules so they are imported in the import x as y style instead of from x import *. This solved several problems and made the code easier to manage.
However, there was an unintended side effect of this. Many of the modules use Python builtin modules like sys or os to do whatever, and the way the code was previously set up, if I typed import sys in module x, and used from x import * in module y, I didn't have to import sys in module y. As a result, I took this for granted quite a lot (terrible practice, I know). When I switched to import x, this caused a lot of broken functions, as you can imagine.
So here's the main issue: because Python is an interpreted language, errors about missing modules in a function won't show up until a function is actually run. And because this is just a general-use library, some of these errors could persist for months undetected, or longer.
I'm completely prepared to write a unit test for each module (if __name__ == "__main__" and all that), but I wanted to ask first: is there an automated way of checking every function in a module for import/syntax errors, or any other error that is not dependent on input? Things that a compiler would catch in C or another language. A brief Google and SO search didn't turn up anything. Any suggestions are welcome and appreciated.
Yes. PyFlakes will warn you about the most basic of those errors, and you should make sure it's integrated with your favorite text editor, so it tells you about missing imports or unused imports whenever you save the file.
PyFlakes will, amongst other things, tell you about:
syntax errors
undefined names
missing imports
unused imports
To run PyFlakes on all the files in a directory, you can simply do:
pyflakes /path/to/dir
One big advantage that PyFlakes has over more advanced linting tools like PyLint is that it performs static analysis - meaning it doesn't need to import your code (which can be a pain if you've got some complex dependencies). It just analyses the abstract syntax tree of your Python source, and therefore catches the most basic errors - the ones that usually prevent your script from having even a chance of running.
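For example, given a file like the following (file name and exact message wording are illustrative), PyFlakes flags the problem without ever executing the function:

# listing.py -- the missing import would otherwise only blow up at call time
def list_files(directory):
    return os.listdir(directory)   # os was never imported

Running pyflakes listing.py reports something along the lines of listing.py:3: undefined name 'os', which is exactly the class of latent error described in the question.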
I should also mention that there is a related tool, flake8, which combines PyFlakes with PEP8 convention checks and McCabe code complexity analysis.
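If you go that route, it can be invoked the same way:

flake8 /path/to/dir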
There are PyFlakes integrations for every editor (or IDE) I know of. Here are just a couple (in no particular order):
Sublime Text
vim
emacs
TextMate
Eclipse / PyDev
Related
My apologies if this post is a bit inappropriate, but I'm desperate for comments on a programming problem.
I have been frustrated for years about what I believe is a Python design flaw, and many others agree. It's about how import statements work, particularly code like:
from ..mypackage import mymodule
from ..mypackage.mymodule import mymethod
from .. import mypackage
and similar statements.
Only the most simple, contrived, canonical cases actually work. Anything else results in the error message:
ImportError: attempted relative import with no known parent package
I use sys.path.append() as a workaround, but that should not be necessary, and it is not portable.
It seems the issue revolves around where Python thinks the importing module is in the file system at the time it attempts to execute the import statements. My opinion is that Python should be able to figure out whether it is in a package and exactly where in the file hierarchy that package sits. The import statements should work as expected whether the importing module is called from another module, run from an interpreter, or run from PyCharm, IDLE, Spyder, or in some other way.
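To make the failure concrete, here is a minimal sketch (package and file names are illustrative):

mypackage/
    __init__.py
    mymodule.py
    main.py                       # contains: from . import mymodule

# from the directory containing mypackage/:
python mypackage/main.py          # ImportError: attempted relative import with no known parent package
python -m mypackage.main          # works: the -m switch sets __package__ to 'mypackage'

In other words, the same file works or fails depending on how the interpreter was told to find it, which is what makes the behaviour feel so arbitrary.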
There is an SO post, Relative imports for the billionth time, which addresses this problem. The main article plus 15 answers and 36 comments indicate that this issue has been around for a long time. Many very smart people have offered exotic explanations and proposed cumbersome solutions to an issue that should not exist, and yet the powers that control the development of Python have not moved on the matter. It should not be necessary to be knee deep in Python internals in order to write a simple application. That's the whole idea of a high-level language; this is not C++.
Someone reading this must have influence with Python developers. Please use it.
Any comments, please.
I have an environment with some extreme constraints that require me to reduce the size of a planned Python 3.8.1 installation. The OS is not connected to the internet, and a user will never open an interactive shell or attach a debugger.
There are of course lots of ways to do this, and one of the ways I am exploring is to remove some core modules, for example python3-email. I am concerned that 3rd-party packages that future developers may include in their apps could have import-time dependencies on core Python features they never actually use. For example, if python3-email is missing, which 3rd-party packages that one would expect to work might not? If a developer decides to use a logging package that contains an unreferenced EmailLogger class in a referenced module, it will break simply because import email appears at the top.
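A hypothetical sketch of that failure mode (the package and class names are invented for illustration):

# applog/handlers.py, somewhere inside a third-party logging package
import email.mime.text            # module-level import runs unconditionally

class StreamLogger:
    ...

class EmailLogger:                # never referenced by the application
    ...

On the stripped-down interpreter, import applog.handlers raises ModuleNotFoundError even though EmailLogger is never instantiated.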
Do package design requirements or guidelines exist that address this?
It's an interesting question, but it is too broad to be cleanly answered here. In short, the Python standard library is expected to always be there, even though it is sometimes broken up into multiple parts (Debian does this, for example). But as you say yourself, you don't know what your requirements are, since you don't yet know what future packages will run on this interpreter... so this is impossible to answer. One thing you could do is to run something like modulefinder on the future code before letting it run on that constrained Python interpreter.
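A rough sketch of that approach with the standard library's modulefinder (the script name is illustrative):

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('future_app.py')    # statically follows the imports of the candidate code
print(sorted(finder.modules))         # every module the script would pull in
print(sorted(finder.badmodules))      # imports that could not be resolved

Anything in the first list that your trimmed-down installation no longer ships is a problem you would otherwise only discover at runtime.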
I was able to get to a solution. The issue was best described to me as cascading imports. It is possible to stop a module from being loaded by adding an entry to sys.modules. For example, when the asyncio module is imported, the ssl and _ssl modules will be loaded, even though they are only needed if SSL is actually used. This can be stopped with the code below, and it can be verified both by seeing that the Python process is about 3 MB smaller and by using module load hooks to watch each module as it loads:
import importhook   # third-party hook library used here to watch module loads
import sys

# A None entry in sys.modules makes any later "import ssl" raise ImportError,
# which asyncio catches, so it simply runs without SSL support.
sys.modules['ssl'] = None

@importhook.on_import(importhook.ANY_MODULE)
def on_any_import(module):
    print(module.__spec__.name)
    assert module.__spec__.name not in ['ssl', '_ssl']

import asyncio   # loads successfully without pulling in ssl/_ssl
For my original question about 3rd-party design guidelines, some recommend placing the import statements within the class rather than at the module level; however, this is not routinely done.
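For completeness, a sketch of that deferral pattern (names are illustrative); the trade-off is that a missing module now surfaces only when the class is actually used:

class EmailLogger:
    def send(self, subject, body):
        # deferred imports: email/smtplib are only loaded if send() is ever called,
        # so a stripped-down interpreter without them can still import this module
        import smtplib
        from email.message import EmailMessage
        msg = EmailMessage()
        msg['Subject'] = subject
        msg.set_content(body)
        # ... hand msg off to smtplib here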
I've got the following Python code in a module:
import ldap
import ldap.sasl
x = ldap.VERSION3
y = ldap.sasl.gssapi
Eclipse (with PyDev) warns me that my first import statement is unused. But it's clearly being used. Python apparently implicitly imports parent packages -- which I find weird, since Python prefers explicit, and I can't find any mention of this in the documentation. But that doesn't mean that I'm not using the first. Even more odd, if I remove the last line, PyDev claims that both of the import statements are unused. (I think this last case is clearly a bug in PyDev.)
So my question is, is there a way to turn off the warning for the first line, without turning off warnings for all unused imports? And I'd rather not pollute my code with #UnusedImport comments.
The right answer here is to do what PyDev says.
Because import ldap.sasl always imports ldap, the import ldap statement is not necessary, and therefore should be removed.
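In other words, this is enough on its own (assuming python-ldap is installed):

import ldap.sasl      # also binds the name "ldap" in this namespace

x = ldap.VERSION3     # works: the parent package was imported as a side effect
y = ldap.sasl.gssapi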
As for PyDev claiming that both are unused if you remove the last line... Well, that's definitely not the best messaging in the world, but it's not really wrong. The import ldap is unnecessary because you have import ldap.sasl. But the import ldap.sasl is unnecessary because you never use it. True, if you remove the import ldap.sasl, then the import ldap stops being unnecessary, but the warnings aren't about what would be true for a different version of your code, right?
You're right that the tutorial section on Packages doesn't explain this at all, and the 2.x reference documentation doesn't really say anything directly about it.
However, the 3.x reference documentation on the import system specifically describes this behavior, and gives examples (e.g., see the "Regular packages" section), and the 2.x reference does directly refer to the original package spec, which says:
Whenever a submodule of a package is loaded, Python makes sure that the package itself is loaded first, loading its __init__.py file if necessary. The same for packages. Thus, when the statement import Sound.Effects.echo is executed, it first ensures that Sound is loaded; then it ensures that Sound.Effects is loaded; and only then does it ensure that Sound.Effects.echo is loaded (loading it if it hasn't been loaded before).
Also, all existing Python 2.x implementations do things the way the 3.x documentation and the original package spec describe, and it's not likely people will be creating brand-new 2.x implementations in the future, so I think you can rely on this being a guarantee.
If you want to know the original rationale, you have to read the ni module from Python 1.3. (I don't have a link for it.) If you want to know why it's still that way in 2.7, it's because the first radical cleanup in Python didn't happen until 3.0. If you want to know why it's still that way in 3.0, and even in 3.3 (after import was improved and further cleaned up), you'll have to read the discussions around PEP 328, importlib, etc. on python-ideas and python-dev. When there's a consensus not to change something (or when there's so little discussion that nobody even finds it necessary to call for a consensus), you don't get a PEP or any other "paper trail" outside the mailing lists. If I remember correctly, it did come up in passing while discussing the relative-vs.-absolute import ideas that became PEP 328, but nobody thought it was a problem that needed to be fixed.
I have a Python project in eclipse, which imports modules that can't be found by Python. Here's a list of some of the cases:
some of the files might import both the 2.x and 3.x versions of some built-in modules for compatibility purposes (but I can specify only one grammar version in the project's settings)
since the scripts I'm writing will be run in an environment very different from mine, some of the modules I use don't even exist on my system (like Windows-specific modules, or modules from other projects that I REALLY don't want to link directly to this one)
modules that might or might not be installed on the machine where the script is going to be executed (wrapped in try-except clauses, of course; see the sketch after this list)
and so on...
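A sketch of the kind of guarded imports described above (module names are illustrative):

# 2.x/3.x compatibility: only one branch resolves on any given interpreter
try:
    import cPickle as pickle      # Python 2
except ImportError:
    import pickle                 # Python 3

# optional, platform-specific dependency that may simply not exist here
try:
    import win32api
except ImportError:
    win32api = None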
It is very annoying to have these modules marked as errors, since they make REAL syntax errors much less noticeable.
I know that this behavior can somehow be overridden - I have another project that doesn't mark unresolved imports as errors, but I just can't find the right setting for it. Can anyone help me?
How about adding #@UnresolvedImport to your imports? E.g.:
import a_module_pydev_doesnt_know #@UnresolvedImport
You can simply press Ctrl-1 when your cursor is placed on a line where PyDev marked an error and select the corresponding entry to add that comment automatically.
Edit: I don't have much experience with it, but it seems that if you want to change this for a whole project (or do it without touching your code), you can also add the module in question to the forced built-ins: http://pydev.org/manual_101_interpreter.html#PyDevInterpreterConfiguration-ForcedBuiltins
Question: How can I systematically probe into the files that are touched by the interpreter at any point (as in debug mode)?
When everything fails I get an error message. What I'm asking for is the opposite: everything works, but I don't know how much redundant rubbish I have in comparison to what is actually used, even though I can imagine that something like pynotify could probably trace it.
Context:
I've spent all morning exercising trial & error to get a package to work. I'm sure I have copied the relevant python package into at least 3 directories and messed up my windows setx -m path badly with junk. Now I'm wondering how to clean it all up without breaking any dependencies, and actually learn from the process.
I can't possibly be the only one wondering about this. Some talented test-developer must have written a script/package that:
import everything from everywhere
check for all dependencies
E = list(errorMessages)
L = list_of_stuff_that_was_used
print L
print E
so if I have something stored which is not in L, I can delete it. But of course the probing has to be thorough to exhaust all accessible files (or at least actively used directories).
What the question is NOT about:
I'm not interested in what is on the sys.path. This is trivial.
More Context:
I know from The Hitchhikers Guide to Packaging that the future of this problem is being addressed, but it does not probe into the past. So with the transition from Python 2.x to 3.x, this problem must become more and more relevant?
The dynamic nature of python makes this a next to impossible task.
Functions can import too, for example. Are you going to run all code in all modules?
And then there are backward-compatibility tests: import pysqlite2 if sqlite3 is not present, use a backport module if collections.Counter isn't present in the current version of Python, etc. There are platform-specific modules (os.path is posixpath, ntpath (same code but renamed) or riscospath depending on the current platform), and wholesale imports into the main module (posix, nt, os2, ce and riscos can all be used by the os module, depending on the platform, to supply functions).
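You can see that platform-specific aliasing directly:

import os
print(os.path.__name__)   # 'posixpath' on Unix-like systems, 'ntpath' on Windows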
Packages that use setuptools declare their dependencies and are discoverable through the pkg_resources library. That's the limit of what you can reasonably discover.
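A sketch of what that discovery looks like (pkg_resources ships with setuptools):

import pkg_resources

for dist in pkg_resources.working_set:              # every installed distribution
    deps = [str(req) for req in dist.requires()]    # its declared dependencies
    print(dist.project_name, deps)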