When writing scripts for personal use, I am used to doing this:
def do_something():
    # Do something.
    pass

if __name__ == '__main__':
    do_something()
Or, we can also do this:
def do_something():
    # Do something.
    pass

do_something()  # No if __name__ thingy.
I know the first form is useful for distinguishing between importing the script as a module and running it directly, but otherwise, for scripts that will only be executed (and never imported), is there any reason to prefer one over the other?
Even if the script is only meant to be executed, it might sometimes be useful to import it anyway -- in an interactive shell, by documentation generation tools, in unit tests or to perform timings. So routinely using the more general form will never hurt.
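A minimal sketch of that point, assuming the guarded script above is saved as myscript.py (the name is hypothetical): because of the __main__ guard, importing it never triggers execution, so you can reuse or time the function from elsewhere.

# Because of the __main__ guard, this import does NOT run do_something().
import timeit

import myscript

# Reuse the function directly...
myscript.do_something()

# ...or time it; timeit.timeit accepts a callable.
print(timeit.timeit(myscript.do_something, number=1000))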
The first form is just good practice. One of the immutable laws of writing computer programs is that someone in the distant future (like right after you get assigned to another project or quit or get bored with maintaining the code) will want to use your "always standalone" script, or parts of it, for some other purpose.
If we assume that you are absolutely a rock-star programmer and that every character of your source files is saturated with genius (it is, right?), it makes sense that someone else will get an eyeful of your artwork and be simply knocked to the floor by your brilliance and will want to use it.
Now there's either the choice of making them cut-n-paste your code into a file with their name on the top, which is very un-DRY and makes the contribution of your extraordinary mind get credited to someone else, or you can just add that leetle bit of code and let them import your module and directly use the class or function that made them realize how very, very little they really knew about programming before they encountered it.
Your choice!
An executable script usually looks like this:
import modules

# define some CONSTANTS, classes and functions

if __name__ == "__main__":
    really_do_something()
Recently I saw a script using the negated form of the common idiom:
if __name__ != "__main__":
print('The executable must not be imported.')
sys.exit(1)
I find it not Pythonic. Why should anybody want to prevent consenting adults from importing the file? Are there valid reasons?
I could not find any reasons, except that it is simpler to write this != guard on the top of the script compared to the standard == guard near the bottom of the script.
Even if the answer looks obvious, given the complexity of Python import system I decided to ask just to be sure.
Consider the possibility that the script executes immediate actions. That is, there are commands at "module scope" in the script file:
#!/usr/bin/env python3

with open('/etc/passwd') as pwd:
    ...
In this case, importing the file will cause those commands to be run. And while it may provide some subroutines or class definitions, it might not.
So putting in a warning saying "you imported this, but you shouldn't, because it won't do what you want" is a friendly thing. It really says: "This file isn't set up to be imported. If you want this functionality, run it as a separate program (e.g. via a system call) instead."
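As a hedged sketch of that idea (the file contents are hypothetical), the guard sits at the top of such a script, before any of the module-scope work runs:

#!/usr/bin/env python3
# Hypothetical standalone script: warn anyone who imports it by mistake.
import sys

if __name__ != "__main__":
    print('This file is not set up to be imported; run it directly instead.')
    sys.exit(1)

# Module-scope work starts here and would otherwise run on import.
with open('/etc/passwd') as pwd:
    for line in pwd:
        print(line.split(':')[0])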
It depends. If the script is just a quick-and-dirty script to do some simple stuff, then I don't see any point in importing it. Just copy&paste what you need into your code and you are done (obviously I'm assuming a compatible open-source license).
If the script is part of some library/software, then the best practice would be that each script should be of the form:
import argparse
from somewhere import main
parser = argparse.ArgumentParser()
# ... declare the expected arguments here ...
args = parser.parse_args()
main(args)
In other words: it does not contain any logic of its own; it just imports things, parses the command-line arguments and calls the main function. In this case it does not make sense to import this script, because it's essentially empty. You can just perform the imports yourself, leaving out the argument-parsing part.
This is where such a guard might be useful. Since it does not make any logical sense to import the script, maybe you can tell the user that instead of importing the script they should simply import the main function.
If, however, the script is complex, imports things, defines classes and functions, glues pieces together, etc., then yes, it does make sense to import it, and in this case a guard like the one in your example limits the usefulness of the script.
Note, however, that the script might contain some logic that is executed at module scope; in that case importing it might be useless (but the script should then be refactored to place such logic in a function that can be safely imported instead).
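A minimal sketch of that refactoring (the file name is hypothetical): the module-scope work moves into a function, so importing the file has no side effects while running it directly still does the job.

# report.py (hypothetical name)
def main():
    # The logic that used to run at module scope lives here now.
    with open('/etc/passwd') as pwd:
        for line in pwd:
            print(line.split(':')[0])

if __name__ == '__main__':
    main()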
I use IPython Notebooks extensively in my research. I find them to be a wonderful tool.
However, on more than one occasion, I have been bitten by subtle bugs stemming from variable scope. For example, I will be doing some exploratory analysis:
foo = 1
bar = 2
foo + bar
And I decide that foo + bar is a useful algorithm for my purposes, so I encapsulate it in a function to make it easier to apply to a wider range of inputs:
def the_function(foo, bar):
    return foo + bar
Inevitably, somewhere down the line, after building a workflow from the ground up, I will have a typo somewhere (e.g. def the_function(fooo, bar):) that causes a global variable to be used (and/or modified) in a function call. This causes unseen side effects and leads to spurious results. But because it typically returns a result, it can be difficult to find where the problem actually occurs.
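To make the failure mode concrete, here is a small sketch of the kind of typo described above: the misspelled parameter leaves foo unbound inside the function, so Python silently falls back to the notebook-level global.

foo = 1
bar = 2

# Typo: the parameter is "fooo", so "foo" in the body resolves to the global above.
def the_function(fooo, bar):
    return foo + bar

print(the_function(10, 20))  # prints 21, not 30 -- and no error is raised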
Now, I recognize that this behavior is a feature, which I deliberately use often (for convenience, or for necessity i.e. function closures or decorators). But as I keep running into bugs, I'm thinking I need a better strategy for avoiding such problems (current strategy = "be careful").
For example, one strategy might be to always prepend '_' to local variable names. But I'm curious whether there are other strategies - even "Pythonic" or community-encouraged ones.
I know that Python 2.x differs from Python 3.x in some scoping respects - I use Python 3.x.
Also, strategies should consider the interactive nature of scientific computing, as would be used in an IPython Notebook venue.
Thoughts?
EDIT: To be more specific, I am looking for IPython Notebook strategies.
I was tempted to flag this question as too broad, but perhaps the following will help you.
When you decide to wrap some useful code in a function, write some tests. If you think the code is useful, you must have used it with some examples. Write the test first lest you 'forget'.
My personal policy for a library module is to run the tests in an if __name__ == '__main__': block, whether the test code is in the same file or a different file. I also execute the file to run the tests multiple times during a programming session, after every small unit of change (trivial in IDLE or a similar IDE).
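As a rough sketch of that policy (all names are illustrative), the tests live next to the code and only run when the file is executed, not when it is imported:

# my_library.py -- a sketch of the policy described above
def the_function(foo, bar):
    return foo + bar

def _test():
    assert the_function(1, 2) == 3
    assert the_function(0, 0) == 0
    print('all tests passed')

if __name__ == '__main__':
    _test()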
Use a code-checker program, which will catch some typo-based errors - e.g. reporting that 'fooo' is set but never used.
Keep track of the particular kinds of errors you make, analyze them and think about personal countermeasures, or at least learn to recognize the symptoms.
Looking at your example, when you do write a function, don't use the same names for both global objects and parameters. In your example, delete or change the global 'foo' and 'bar' or use something else for parameter names.
I would suggest that you separate your concerns. For your exploratory analysis, write your code in the iPython notebook, but when you've decided that there are some functions that are useful, instead, open up an editor and put your functions into a python file which you can then import.
You can use IPython magics to auto-reload things you've imported. So once you've tested your functions in IPython, you can simply copy them to your module. This way, the scope of your functions is isolated from your notebook. An additional advantage is that when you're ready to run things in a headless environment, you already have your entire codebase in one place.
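A minimal sketch of that workflow, assuming your functions live in a file called my_functions.py next to the notebook (the module name is made up):

# In a notebook cell: reload imported modules automatically, so edits
# made in your editor are picked up without restarting the kernel.
%load_ext autoreload
%autoreload 2

from my_functions import the_function  # my_functions.py is hypothetical
the_function(1, 2)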
In the end, I made my own solution to the problem. It builds on both answers given so far.
You can find my solution, which is a cell magic extension, on github: https://github.com/brazilbean/modulemagic
In brief, this extension gives you the ability to create %%module cells in the notebook. These cells are saved as a file and imported back into your session. It effectively accomplishes what @shadanan had suggested, but allows you to keep all your work in the same place (convenient, and in line with the Notebook philosophy of providing code and results in the same place).
Because the import process sandboxes the code, it solves all of the scope shadowing errors that motivated my original question. It also involves little to no overhead to use - no renaming of variables, having other editors open, etc.
I have a function of around 80 lines in a file. I need the same functionality in another file, so I am currently importing the first file to get the function.
My question is: in terms of running time, which technique is better - importing the complete file and calling the function, or copying the function as-is and running it from the same package?
I know it won't matter much, but I want to understand it: when building a large project, is it better to import a complete file in Python or just add the function to the current namespace?
Importing is how you're supposed to do it. That's why it's possible. Performance is a complicated question, but in general it really doesn't matter. People who really, really need performance, and can't be satisfied by just fixing the basic algorithm, are not using Python in the first place. :) (At least not for the tiny part of the project where the performance really matters. ;) )
Importing is good because it helps you manage things easily. What if you need the same function again? Instead of making changes in multiple places, there is just one centralized location: your module.
In case the function is small and you won't need it anywhere else, put it in the file itself.
If it is complex and would require to be used again, separate it and put it inside a module.
Performance should not be your concern here. It should hardly matter. And even if it does, ask yourself - does it matter to you?
Copy/Paste cannot be better. Importing affects load-time performance, not run-time (if you import it at the top-level).
The whole point of importing is to allow code reuse and organization.
Remember too that you can do either
import MyModule
to get the whole file or
from MyModule import MyFunction
for when you only need to reference that one part of the module.
If the two modules are unrelated except for that common function, you may wish to consider extracting that function (and maybe other things that are related to that function) into a third module.
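A hedged sketch of that layout (all names invented): the shared function moves into a third module and both original files import it instead of each keeping a copy.

# common.py (hypothetical): the shared function lives here now
def shared_function(data):
    return sorted(data)

# module_a.py and module_b.py then both do:
#     from common import shared_function
# instead of duplicating the 80-line function.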
My background is C and C++. I like Python a lot, but there's one aspect of it (and other interpreted languages I guess) that is really hard to work with when you're used to compiled languages.
When I've written something in Python and come to the point where I can run it, there's still no guarantee that no language-specific errors remain. For me that means that I can't rely solely on my runtime defense (rigorous testing of input, asserts etc.) to avoid crashes, because in 6 months, when some otherwise nice code finally gets run, it might crash due to some stupid typo.
Clearly a system should be tested enough to make sure all code has been run, but most of the time I use Python for in-house scripts and small tools, which of course never get the QA attention they need. Also, some code is so simple that (if your background is C/C++) you know it will work fine as long as it compiles (e.g. getter methods inside classes, usually a simple return of a member variable).
So, my question is the obvious - is there any way (with a special tool or something) I can make sure all the code in my Python script will "compile" and run?
Look at PyChecker and PyLint.
Here's example output from pylint, resulting from the trivial program in foo.py:

print a

************* Module foo
C:  1: Black listed name "foo"
C:  1: Missing docstring
E:  1: Undefined variable 'a'
...
|error    |1     |1        |=       |

As you can see, it detects the undefined variable, which py_compile won't (deliberately).
Trivial example of why tests aren't good enough, even if they cover "every line":
bar = "Foo"
foo = "Bar"
def baz(X):
return bar if X else fo0
print baz(input("True or False: "))
EDIT: PyChecker handles the ternary for me:
Processing ternary...
True or False: True
Foo
Warnings...
ternary.py:6: No global (fo0) found
ternary.py:8: Using input() is a security problem, consider using raw_input()
Others have mentioned tools like PyLint, which are pretty good, but the long and the short of it is that it's simply not possible to do 100%. In fact, you might not even want to. Part of the benefit of Python's dynamism is that you can do crazy things like insert names into the local scope through a dictionary access.
What it comes down to is that if you want a way to catch type errors at compile time, you shouldn't use Python. A language choice always involves a set of trade-offs. If you choose Python over C, just be aware that you're trading a strong type system for faster development, better string manipulation, etc.
I think what you are looking for is test line coverage. You want to add tests to your script that will make sure all of your lines of code, or as many as you have time to, get tested. Testing is a great deal of work, but if you want the kind of assurance you are asking for, there is no free lunch, sorry :(
If you are using Eclipse with Pydev as an IDE, it can flag many typos for you with red squigglies immediately, and has Pylint integration too. For example:
foo = 5
print food
will be flagged as "Undefined variable: food". Of course this is not always accurate (perhaps food was defined earlier using setattr or other exotic techniques), but it works well most of the time.
In general, you can only statically analyze your code to the extent that your code is actually static; the more dynamic your code is, the more you really do need automated testing.
Your code actually gets compiled when you run it; the Python runtime will complain if there is a syntax error in the code. But compared to statically compiled languages like C/C++ or Java, it does not check whether variable names and types are correct - for that you need to actually run the code (e.g. with automated tests).
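For completeness, here is a small sketch of that compile step in isolation: py_compile catches syntax errors only, so undefined-name bugs like the fo0 example above still slip through.

# Byte-compile a file to catch syntax errors only.
import py_compile

try:
    py_compile.compile('ternary.py', doraise=True)
    print('syntax OK')
except py_compile.PyCompileError as err:
    print('syntax error:', err)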
This questions is semi-based of this one here:
How can you profile a python script?
I thought that this would be a great idea to run on some of my programs. Although profiling from a batch file as explained in the aforementioned answer is possible, I think it would be even better to have this option in Eclipse. At the same time, wouldn't making my entire program a function and profiling it mean I have to alter the source code?
How can I configure eclipse such that I have the ability to run the profile command on my existing programs?
Any tips or suggestions are welcomed!
If you follow the common Python idiom of making all your code, even the "existing programs", importable as modules, you can do exactly what you describe, without any additional hassle.
Here is the specific idiom I am talking about, which turns your program's flow "upside-down", since the __name__ == '__main__' check is placed at the bottom of the file, once all your defs are done:
# program.py file
def foo():
    """Analogous to a main(). Do something here."""
    pass

# ... fill in rest of function def's here ...

# here is where the code execution and control flow will
# actually originate for your code, when program.py is
# invoked as a program. a very common Pythonism...
if __name__ == '__main__':
    foo()
In my experience, it is quite easy to retrofit any existing scripts you have to follow this form, probably a couple minutes at most.
Since there are other benefits to having your program also be a module, you'll find most Python scripts out there actually do it this way. One benefit of doing it this way: anything Python you write is potentially usable in module form, including cProfile-ing your foo().
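For example, a hedged sketch of profiling from outside the file (assuming the program.py sketched above is importable from where you run this):

# profile_program.py (hypothetical): profile foo() without editing program.py
import cProfile

import program  # the importable program.py from above

cProfile.run('program.foo()', sort='cumulative')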
You can always create separate modules that just profile specific things in your other modules. You can organize such modules in a separate package. That way you don't change your existing code.