What reasons (if any) exist for prohibiting script import? - python

An executable script usually looks like this:
import modules
define some CONSTANTS, Classes, functions
if __name__ == "__main__":
    really_do_something()
Recently I saw a script using the negated form of the common idiom:
if __name__ != "__main__":
    print('The executable must not be imported.')
    sys.exit(1)
I find it un-Pythonic. Why should anybody want to prevent consenting adults from importing the file? Are there valid reasons?
I could not find any reasons, except that it is simpler to write this != guard at the top of the script than to put the standard == guard near the bottom.
Even if the answer looks obvious, given the complexity of Python import system I decided to ask just to be sure.

Consider the possibility that the script executes immediate actions. That is, there are commands at "module scope" in the script file:
#!/usr/bin/env python3
with open('/etc/passwd') as pwd:
    ...
In this case, importing the file will cause those commands to be run. And while it may provide some subroutines or class definitions, it might not.
So putting in a warning saying "you imported this, but you shouldn't, because it won't do what you want" is a friendly thing. It really says: "this file isn't set up to be imported; if you want this functionality, run it as a program (e.g. via the system)."
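As a minimal sketch of what "run it as a program" might look like from other Python code (the filename is hypothetical), you could spawn the script as a child process instead of importing it:

import subprocess
import sys

# Run the script in its own process; nothing from it leaks into our namespace.
result = subprocess.run(
    [sys.executable, "the_script.py"],   # hypothetical filename
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)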

It depends. If the script is just a quick-and-dirty script to do some simple stuff, then I don't see any point in importing it. Just copy and paste what you need into your code and you are done (obviously I'm assuming a compatible open source license).
If the script is part of some library/software, then the best practice would be that each script should be of the form:
import argparse
from somewhere import main

parser = argparse.ArgumentParser()
# ... define the expected command-line arguments here ...
args = parser.parse_args()
main(args)
In other words: it contains essentially nothing of its own; it just imports things, parses the command-line arguments and calls the main function. In that case it does not make sense to import this script, because it is empty. You can just perform the imports yourself and skip the argument parsing.
This is where such a guard might be useful. Since it does not make any logical sense to import the script, maybe you can tell the user that instead of importing the script they should simply import the main function.
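As a sketch (the module name somewhere is just the placeholder from the snippet above), such a guard could point the user at the right import:

import sys

if __name__ != "__main__":
    sys.exit("This is a launcher script; use 'from somewhere import main' "
             "instead of importing this file.")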
If, however, the script is complex, imports stuff, defines classes, functions, glues pieces together etc, then yes it does make sense to import it and in this case using a guard similar to the one you provided as an example limits the usefulness of the script.
Note, however, that the script might contain logic that is executed while the module is being defined; in that case importing it might be useless (though the script should then be refactored to put such logic into a function that can be imported safely).

Related

Proper way to use nested modules

I've split a program into three scripts. One of them, 'classes.py', is a module defining all the classes I need. Another one is a sort of setup module, call it 'setup.py', which instantiates a lot of objects from 'classes.py' (it's just a bunch of variable assignments with a few for loops, no functions or classes). It has a lot of strings and stuff I don't want to see when I'm working on the third script which is the program itself, i.e. the script that actually does something with all of the above.
The only way I got this to work was to add, in the 'setup.py' script:
from classes import *
This allows me to write quickly in the setup file without having the namespace added everywhere. And, in the main script:
import setup
This has the advantage of PyCharm giving me full code completion for classes and methods, which is nice.
What I'd like to achieve is having the main script import the classes, and then run the setup script to create the objects I need, with two simple commands. But I can't import the classes script into the main script because then the setup script can't do anything, having no class definitions. Should I import the classes into both scripts, or do something else entirely?
Import in each file. Consider this SO post. From the answer by Mr Fooz there,
Each module has its own namespace. So for boo.py to see something from an external module, boo.py must import it itself.
It is possible to write a language where namespaces are stacked the way you expect them to be: this is called dynamic scoping. Some languages, like the original Lisp, early versions of Perl, PostScript, etc., do use (or support) dynamic scoping.
Most languages use lexical scoping instead. It turns out this is a much nicer way for languages to work: this way a module can reason about how it will work based on its own code without having to worry about how it was called.
See this article for additional details: http://en.wikipedia.org/wiki/Scope_%28programming%29
Intuitively this feels nicer too, as you can immediately see (in the file itself) which dependencies the code has - this will allow you to understand your code much better a month, or even a year, from now.
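A minimal sketch of that layout, with invented class and variable names (save each part as the indicated file):

# classes.py
class Widget:
    def __init__(self, name):
        self.name = name

# setup.py -- does its own import of the classes it instantiates
from classes import Widget
widgets = [Widget(f"w{i}") for i in range(3)]

# main.py -- also does its own imports
from classes import Widget    # gives the IDE full knowledge of the classes
import setup                  # importing runs the object-creation code once

print([w.name for w in setup.widgets])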

Circular imports hell

Python is an extremely elegant language. Well, except... except imports. I still can't get them to work the way that seems natural to me.
I have a class MyObjectA which is in file mypackage/myobjecta.py. This object uses some utility functions which are in mypackage/utils.py. So in my first lines in myobjecta.py I write:
from mypackage.utils import util_func1, util_func2
But some of the utility functions create and return new instances of MyObjectA. So I need to write in utils.py:
from mypackage.myobjecta import MyObjectA
Well, no I can't. This is a circular import and Python will refuse to do that.
There are many questions here regarding this issue, but none seems to give a satisfactory answer. From what I can read in all the answers:

1. Reorganize your modules, you are doing it wrong! But I do not know how to organize my modules better, even in a case as simple as the one I presented.
2. Try import ... rather than from ... import ... (personally I hate writing, and potentially refactoring, all the full name qualifiers; I love to see exactly what I am importing into the module from the outside world). Would that help? I am not sure; there are still circular imports.
3. Use hacks like importing something in the inner scope of a function body, just one line before you use something from the other module.

I am still hoping there is a solution number 4 which would be Pythonic in the sense of being functional, elegant, simple and working. Or is there not?
Note: I am primarily a C++ programmer; the example above is so easily solved there by including the corresponding headers that I can't believe it is not possible in Python.
There is nothing hackish about importing something in a function body, it's an absolutely valid pattern:
def some_function():
    import logging
    do_some_logging()
Usually ImportErrors are only raised because of the way the import machinery evaluates the top-level statements of the entire file when a module is first loaded.
In case you do not have a logical circular dependency, nothing is impossible in Python: there is a way around it if you positively want your imports at the top.
From David Beazley's excellent talk Modules and Packages: Live and Let Die! (PyCon 2015, 1:54:00), here is a way to deal with circular imports in Python:
try:
    from images.serializers import SimplifiedImageSerializer
except ImportError:
    import sys
    SimplifiedImageSerializer = sys.modules[__package__ + '.SimplifiedImageSerializer']
This tries to import SimplifiedImageSerializer and, if an ImportError is raised (due to a circular import or the name not existing), it pulls it from the import cache.
PS: You have to read this entire post in David Beazley's voice.
Don't import mypackage.utils into your main module; it already exists in mypackage.myobjecta. Once you import mypackage.myobjecta, the code from that module is executed and you don't need to import anything else into your current module, because mypackage.myobjecta is already complete.
What you want isn't possible. There's no way for Python to know in which order it needs to execute the top-level code in order to do what you ask.
Assume you import utils first. Python will begin by evaluating the first statement, from mypackage.myobjecta import MyObjectA, which requires executing the top level of the myobjecta module. Python must then execute from mypackage.utils import util_func1, util_func2, but it can't do that until it resolves the myobjecta import.
Instead of recursing infinitely, Python resolves this situation by handing the innermost import a partially initialized module. Thus the inner import of utils returns without the rest of that file having executed, and your from-import fails because util_func1 doesn't exist yet.
The reason import myobjecta works is that it allows the symbols to be resolved later, after the body of every module has executed. Personally, I've run into a lot of confusion even with this kind of circular import, and so I don't recommend using them at all.
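For completeness, here is roughly what the "plain import" variant alluded to above looks like, using the names from the question (save each part as the indicated file); both modules bind only the package name at import time and defer attribute lookups to call time:

# mypackage/utils.py
import mypackage.myobjecta

def util_func1():
    # the class is looked up when the function is called, not at import time
    return mypackage.myobjecta.MyObjectA()

# mypackage/myobjecta.py
import mypackage.utils

class MyObjectA:
    def do_work(self):
        # hypothetical method, also resolved at call time
        return mypackage.utils.util_func1()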
If you really want to use a circular import anyway, and you want them to be "from" imports, I think the only way it can reliably work is this: Define all symbols used by another module before importing from that module. In this case, your definitions for util_func1 and util_func2 must be before your from mypackage.myobjecta import MyObjectA statement in utils, and the definition of MyObjectA must be before from mypackage.utils import util_func1, util_func2 in myobjecta.
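A sketch of that ordering, again with the question's names (an illustration of the constraint just described, not a recommendation):

# mypackage/utils.py
def util_func1():
    return MyObjectA()    # looked up in this module's globals at call time

def util_func2():
    return MyObjectA()

from mypackage.myobjecta import MyObjectA   # deliberately placed after the definitions

# mypackage/myobjecta.py
class MyObjectA:
    pass

from mypackage.utils import util_func1, util_func2   # after MyObjectA is defined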
Compiled languages like C# can handle situations like this because the top level is a collection of definitions, not instructions. They don't have to create every class and every function in the order given. They can work things out in whatever order is required to avoid any cycles. (C++ does it by duplicating information in prototypes, which I personally feel is a rather hacky solution, but that's also not how Python works.)
The advantage of a system like Python is that it's highly dynamic. Yes you can define a class or a function differently based on something you only know at runtime. Or modify a class after it's been created. Or try to import dependencies and go without them if they're not available. If you don't feel these things are worth the inconvenience of adhering to a strict dependency tree, that's totally reasonable, and maybe you'd be better served by a compiled language.
Pythonistas frown upon importing from inside a function. Pythonistas usually frown upon global variables. Yet I have seen both, and I don't think the projects that used them were any worse than others written by strict Pythonistas. The feature exists; I'm not going to get into a long argument over its utility.
There is a trade-off with importing inside a function: when you import at the top of a file, the import takes some time (a small amount, but some), and Python caches the loaded module, so if another file needs the same import, Python retrieves it quickly without loading it again. If you import inside a function, however, Python has to process the import line each time the function is called, which might, in a tiny way, slow your program down.
A solution to this is to cache the module independently. Okay, this uses imports inside function bodies AND global variables. Wow!
_MODULEA = None

def util1():
    global _MODULEA            # without this, the import would bind a local name and
                               # the 'is None' check would raise UnboundLocalError
    if _MODULEA is None:
        from mymodule import modulea as _MODULEA
    obj = _MODULEA.ClassYouWant
    return obj
I saw this strategy adopted in a project using a flat API. Whether you like it or not (and I'm not sure I do myself), it works and it is fast, because the import only really happens once, when the function first executes. Still, I would recommend restructuring: problems with circular imports usually point to a problem in the structure, and that is always worth fixing. I do agree, though, that it would be nice if Python gave more useful errors when this kind of situation happens.

Two variations of Python's main function

When writing scripts for personal use, I am used to doing this:
def do_something():
    # Do something.
    pass

if __name__ == '__main__':
    do_something()
Or, we can also do this:
def do_something():
    # Do something.
    pass

do_something()  # No if __name__ thingy.
I know the first form is useful when differentiating between importing the script as a module or calling it directly, but otherwise for scripts that will only be executed (and never imported), is there any reason to prefer one over the other?
Even if the script is only meant to be executed, it might sometimes be useful to import it anyway -- in an interactive shell, by documentation generation tools, in unit tests or to perform timings. So routinely using the more general form will never hurt.
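For instance, a quick sketch (assuming the first form above is saved as myscript.py): importing it lets you time do_something without re-running the whole file:

import timeit
import myscript            # hypothetical module name for the script above

# Works only because importing myscript doesn't execute do_something() on import.
print(timeit.timeit(myscript.do_something, number=1000))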
The first form is just good practice. One of the immutable laws of writing computer programs is that someone in the distant future (like right after you get assigned to another project or quit or get bored with maintaining the code) will want to use your "always standalone" script, or parts of it, for some other purpose.
If we assume that you are absolutely a rock-star programmer and that every character of your source files is saturated with genius (it is, right?), it makes sense that someone else will get an eyeful of your artwork, be simply knocked to the floor by your brilliance, and want to use it.
Now there's either the choice of making them cut-n-paste your code into a file with their name on the top, which is very un-DRY and makes the contribution of your extraordinary mind get credited to someone else, or you can just add that leetle bit of code and let them import your module and directly use the class or function that made them realize how very, very little they really knew about programming before they encountered it.
Your choice!

Does python import all the listed libraries?

I'm just wondering: I often have really long Python files, and imports tend to stack up quite quickly.
PEP8 says that the imports should always be written at the beginning of the file.
Do all the imported libraries get imported when calling a function coded in the file? Or do only the necessary libraries get called?
Does it make sense to worry about this? Is there no reason to import libraries within the functions or classes that need them?
Every time Python hits an import statement, it checks to see if that module has already been imported, and if not, imports it. So the imports at the top of your file will happen as soon as your file is run or imported by another module.
There is some overhead to this, so it's generally best to keep imports at the top of your file so that cost gets taken care of up front.
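A quick sketch illustrating that caching (json is used here only as an arbitrary stdlib module):

import sys

import json                      # first import: the module is loaded and cached
print("json" in sys.modules)     # True: it now lives in the module cache

import json                      # no re-execution; this is just a cache lookup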
The best place for imports is at the top of your file. That documents the dependencies in one place and makes errors from their absence appear earlier. The import itself actually occurs at the time of the import statement, but this seldom matters much.
It is not typical that you have anything to gain by not importing a library until you are in a function or method that needs it. (There is never anything to gain by doing so inside the body of a class.) It is rare that you want optional dependencies and even rarer that this is the right technique to get them, though. Perhaps you can share a compelling use case?
Does it make sense to worry about this?
No.
There is no reason to import libraries within the functions or classes that need them. It's just slower, because the import statement has to check whether the module has already been imported each time it runs, and find that it has. If you put this in a function that's called frequently, you can waste some time on all that import checking.
Imports happen when the module that contains the imports gets executed or imported, not when the functions are called.
Ordinarily, I wouldn't worry about it. If you are encountering slowdowns, you might profile to see whether your problem is related to this. If it is, you can check whether your module can be divided up into smaller modules.
But if all the files are getting used by the same program, you'll just end up importing everything anyway.
If a function inside a module is the only one to import a given other module (say you have a function that sends tweets, but only if some configuration option is on), then it makes sense to import that specific module inside the function.
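A minimal sketch of that pattern (the module and option names are made up):

def send_tweet(message, config):
    """Send a tweet only when the feature is switched on."""
    if not config.get("tweets_enabled"):
        return
    import tweeting_client            # hypothetical module, loaded only when needed
    tweeting_client.post(message)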
Unless I see some profiling data proving otherwise, my guess is that the overhead of an import statement in a function is completely negligible.

Python Profiling in Eclipse

This question is semi-based on this one:
How can you profile a python script?
I thought that this would be a great idea to run on some of my programs. Although profiling from a batch file as explained in the aforementioned answer is possible, I think it would be even better to have this option in Eclipse. At the same time, wouldn't making my entire program a function and profiling it mean that I have to alter the source code?
How can I configure Eclipse so that I have the ability to run the profile command on my existing programs?
Any tips or suggestions are welcomed!
If you follow the common Python idiom of making all your code, even the "existing programs", importable as modules, you can do exactly what you describe, without any additional hassle.
Here is the specific idiom I am talking about, which turns your program's flow "upside-down", since the __name__ == '__main__' check is placed at the bottom of the file, once all your defs are done:
# program.py file
def foo():
    """ analogous to a main(). do something here """
    pass

# ... fill in rest of function def's here ...

# here is where the code execution and control flow will
# actually originate for your code, when program.py is
# invoked as a program. a very common Pythonism...
if __name__ == '__main__':
    foo()
In my experience, it is quite easy to retrofit any existing scripts you have to follow this form, probably a couple minutes at most.
Since there are other benefits to having your program also be a module, you'll find that most Python scripts out there actually do it this way. One benefit of doing it this way: anything Python you write is potentially usable in module form, including cProfile-ing of your foo().
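For example (a sketch, assuming the program.py above), you can then profile foo() without touching the source:

import cProfile
import program                    # importing does not run foo(), thanks to the guard

cProfile.run("program.foo()", sort="cumulative")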
You can always make separate modules that just profile specific things in your other modules. You can organize modules like these in a separate package. That way you don't change your existing code.
