"Boilerplate" code in Python? - python

Google has a Python tutorial, and they describe boilerplate code as "unfortunate" and provide this example:
#!/usr/bin/python
# import modules used here -- sys is a very standard one
import sys

# Gather our code in a main() function
def main():
    print 'Hello there', sys.argv[1]
    # Command line args are in sys.argv[1], sys.argv[2] ..
    # sys.argv[0] is the script name itself and can be ignored

# Standard boilerplate to call the main() function to begin
# the program.
if __name__ == '__main__':
    main()
Now, I've heard boilerplate code being described as "seemingly repetitive code that shows up again and again in order to get some result that seems like it ought to be much simpler" (example).
Anyway, in Python, the part of the example above considered "boilerplate" code was:
if __name__ == '__main__':
    main()
Now, my questions are as follows:
1) Does boilerplate code in Python (like the example provided) take on the same definition as the definition I provided? If so, why?
2) Is this code even necessary? It seems to me like the code runs whether or not there's a main method. What makes using this code better? Is it even better?
3) Why do we use that code and what service does it provide?
4) Does this occur throughout Python? Are there other examples of "boilerplate code"?
Oh, and just an off-topic question: do you need to import sys to use command line arguments in Python? How does it handle such arguments if it's not there?

1) It is repetitive in the sense that it's repeated for each script that you might execute from the command line.
2) If you put your main code in a function like this, you can import the module without executing it. This is sometimes useful. It also keeps things organized a bit more.
3) Same as #2, as far as I can tell.
4) Python is generally pretty good at avoiding boilerplate. It's flexible enough that in most situations you can write code to produce the boilerplate rather than writing boilerplate code.
Off-topic question:
The arguments are passed to your program either way; if you don't write code to read them, they are simply ignored.
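For illustration, here's a minimal sketch (the file name script.py is made up):
import sys

# The interpreter always receives the command line; importing sys just
# gives you a way to read it, via the sys.argv list.
if __name__ == '__main__':
    print(sys.argv)   # python script.py foo bar  ->  ['script.py', 'foo', 'bar']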

The reason that the if __name__ == "__main__": block is called boilerplate in this case is that it replicates a functionality that is automatic in many other languages. In Java or C++, among many others, when you run your code it will look for a main() method and run it, and even complain if it's not there. Python runs whatever code is in your file, so you need to tell it to run the main() method; a simple alternative would be to make running the main() method the default functionality.
So, if __name__ == "__main__": is a common pattern that could be shorter. There's no reason you couldn't do something different, like:
if __name__ == "__main__":
print "Hello, Stack Overflow!"
for i in range(3):
print i
exit(0)
This will work just fine; although my example is a little silly, you can see that you can put whatever you like there. The Python designers chose this behavior over automatically running the main() method (which may well not exist), presumably because Python is a "scripting" language; so you can write some commands directly into a file, run it, and your commands execute. I personally prefer it the Python way because it makes starting up in Python easier for beginners, and it's always nice to have a language where Hello World is one line.

The reason you use an "if main" check is so you can have a module that runs some part of its code at toplevel (to create the things – constants, functions, or classes – it exports), and some part only when executed as a script (e.g. unit tests for its functionality).
The reason the latter code should be wrapped in a function is because local variables of the main() block would leak into the module's scope.
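A minimal sketch of that leak, with made-up names, comparing the two layouts:
# leaky.py -- guard code at top level: 'greeting' becomes a module global,
# visible to every function defined in this file
if __name__ == "__main__":
    greeting = "hello"
    print(greeting)

# tidy.py -- the same code wrapped in main(): 'greeting' stays local
# and disappears when main() returns
def main():
    greeting = "hello"
    print(greeting)

if __name__ == "__main__":
    main()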
Now, an alternate design could be that a file executed as a script would have to declare a function named, say, __main__(), but that would mean adding a new magic function name to the language, while the __name__ mechanism is already there. (And couldn't be removed, because every module has to have a __name__, and a module executed as a script has to have a "special" name because of how module names are assigned.) Introducing two mechanisms to do the same thing just to get rid of two lines of boilerplate – and usually two lines of boilerplate per application – just doesn't seem worth it.

You don't need to add a if __name__ == '__main__' for one off scripts that aren't intended to be a part of a larger project. See here for a great explanation. You only need it if you want to run the file by itself AND include it as a module along with other python files.
If you just want to run one file, you can have zero boilerplate:
print 1
and run it with $ python your_file.py
Adding the shebang line #!/usr/bin/python and running chmod +x print_one.py lets you run it with
./print_one.py
Finally, # coding: utf-8 allows you to add unicode to your file if you want to put ❤'s all over the place.
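For Python 2, that header is what makes non-ASCII source text legal at all (Python 3 already defaults to UTF-8); a minimal sketch:
# coding: utf-8
# Without the declaration above, Python 2 rejects this file with a SyntaxError.
print u'I ❤ Python'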

1) The main boilerplate is common, but it cannot be any simpler.
2) main() is not called without the boilerplate.
3) The boilerplate allows module usage both as a standalone script and as a library in other programs.
4) It's very common. doctest is another one (see the sketch below).
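For instance, the standard doctest boilerplate looks like this (square() is just a made-up example function):
def square(x):
    """Return x squared.

    >>> square(3)
    9
    """
    return x * x

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # runs the examples embedded in the docstrings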
Train to become a Python guru…and good luck with the thesis! ;-)

Let's take a moment to see what happens when you call import sys:
Python searches its module path and loads the sys module
the module's attributes, such as the argv list, then become available to your code
So, what's happening here?
Code written elsewhere is being reused to perform certain operations within the scope of the current program. Programming in this fashion has a lot of benefits. It separates the logic from the actual labour.
Now, as far as the boilerplate is concerned, there are two parts:
the program itself (the logic), defined under main, and
the call part that checks whether the file is being run as the main program.
You essentially write your program under main, using all the functions you defined just before defining main (or elsewhere), and let Python call main() only when the file is run directly.
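A minimal sketch of that structure (greet() is a made-up helper):
import sys

def greet(name):
    return 'Hello, ' + name

def main():
    # the logic lives here; the helpers defined above do the labour
    print(greet(sys.argv[1]))

if __name__ == '__main__':
    main()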

I am equally confused by what the tutorial means by "boilerplate code": does it mean that this code can be avoided in a simple script? Or is it a criticism of Python features that force the use of this syntax? Or even an invitation to use this "boilerplate" code?
I don't know; however, after many years of Python programming, it is at least clear to me what the different syntaxes do, even if I am probably still not sure what the best way of doing it is.
Often you want to put code at the end of a script, for tests or for things you want to execute directly, but this has some implications/side-effects:
1. the code gets executed even when the script is imported, which is rarely what is wanted;
2. the variables and values it defines are created in (and exported from) the module's namespace;
3. the code at the end of the script can be executed by calling the script (python script.py) or by running it from the IPython shell (%run script.py), but there is no way to run it from other scripts.
The most basic mechanism to avoid executing that code unconditionally is the syntax:
if __name__ == '__main__':
which makes the code run only when the script is called or run directly, avoiding problem 1. The other two points still hold.
The "boilerplate" code with a separate main() function, adds a further step, excluding also above points 2 and 3, so for example you can call a number of tests from different scripts, that sometimes can take another level (e.g.: a number of functions, one for each test, so they can be individually be called from outside, and a main that calls all test functions, without needs to know from outside which one they are).
I add that the main reason I find this structures often unsatisfying, apart from its complexity, is that sometimes I would like to maintain point 2 and I lose this possibility if the code is moved to a separate function.
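A minimal sketch of that extra level of structure, with made-up test names:
def test_lowercase():
    assert 'ABC'.lower() == 'abc'

def test_split():
    assert 'a b'.split() == ['a', 'b']

def main():
    # callers need not know the individual test names
    test_lowercase()
    test_split()
    print('all tests passed')

if __name__ == '__main__':
    main()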

Related

How to import script that requires __name__ == "__main__"

I'm pretty new to Python, and this question probably shows that. I'm working on the multiprocessing part of my script and couldn't find a definitive answer to my problem.
I'm struggling with one thing. When using multiprocessing, part of the code has to be guarded with if __name__ == "__main__". I get that, and my pool is working great. But I would love to import that whole script (making it one big function that returns a value would be best). And here is the problem. First, how can I import something if part of it will only run when launched from the main/source file, because of that guard? Secondly, if I manage to work it out and the whole script ends up in one big function, pickle can't handle that; will the use of "multiprocessing on dill" or "pathos" fix it?
Thanks!
You are probably confused about the concept. The if __name__ == "__main__" guard in Python exists precisely so that every Python file can be importable.
Without the guard, a file, once imported, would have the same behavior as if it were the "root" program - and it would require a lot of boilerplate and inter-process communication (like writing a "PID" file at a fixed filesystem location) to coordinate imports of the same code, including for multiprocessing.
Just leave under the guard whatever code needs to run for the root process. Everything else you move into functions that you can call from the importing code.
If you'd run "all" the script, even the part setting up the multiprocessing workers would run, and any simple job would create more workers exponentially until all machine resources were taken (i.e.: it would crash hard and fast, potentially taking the machine to an unresponsive state).
So, this is a good pattern - th "dothejob" function can call all
other functions you need, so you just need to import and call it,
either from a master process, or from any other project importing
your file as a Python module.
import multiprocessing

...

def dothejob():
    ...

def start():
    # code to setup and start multiprocessing workers:
    # like:
    worker1 = multiprocessing.Process(target=dothejob)
    ...
    worker1.start()
    ...
    worker1.join()

if __name__ == "__main__":
    start()

Python Setuptools: quick way to add scripts without "main" function as "console_scripts" entry points

My request seems unorthodox, but I would like to quickly package an old repository, consisting mostly of python executable scripts.
The problem is that those scripts were not designed as modules, so some of them execute code directly at the module top level, and some others have the if __name__=='__main__' part.
How would you distribute those scripts using setuptools, without too much rewrite?
I know I could just put them under the scripts option of setup(), but it's not advised, and also it doesn't allow me to rename them.
I would like to skip defining a main() function in all those scripts, also because some scripts call weird recursive functions with side effects on global variables, so I'm a bit afraid of breaking stuff.
When I try providing only the module name as console_scripts (e.g "myscript=mypkg.myscript" instead of "myscript=mypkg.myscript:main"), it logically complains after installation that a module is not callable.
Is there a way to create scripts from modules, at least when they have an if __name__=='__main__'?
I just realised part of the answer:
in the case where the module executes everything at the top level, i.e. on import, it's OK to define a dummy "no-op" main function, like so:
# Content of mypkg/myscript.py
print("myscript being executed!")

def main():
    pass  # Do nothing!
This solution will still force me to add this line to the existing scripts, but I think it's a quick but cautious solution.
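For reference, the matching console_scripts declaration in setup() would look roughly like this (the package name comes from the question; the rest of the metadata is made up):
# setup.py
from setuptools import setup, find_packages

setup(
    name="mypkg",
    version="0.1",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # the left-hand name is the installed command, so the
            # scripts can be renamed here independently of the modules
            "myscript = mypkg.myscript:main",
        ],
    },
)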
No solution if the code is under an if __name__=='__main__', though...
You can use the following code.
def main():
    pass  # or do something

if __name__ == "__main__":
    main()

refactoring code to keep large objects/models in memory in iPython to be reused in python scripts

My script spends about a minute loading lots of variables, which it then uses globally in many functions. Every time I call that script in IPython, it loads them again, which takes time.
I tried to take these calls to load and populate functions out of that script, but then these global variables are not available to the functions in the script.
It gives a NameError: name 'clf' is not defined error message.
What is the best way to refactor this code so that these globals stay in memory and the script can use them? The script loads many variables like these and uses them in other functions as globals:
(vectorizer_title, vectorizer_desc, clf,
 df_instance, vocab, all_tokens, df_dist_all,
 df_soc2class_proba, dict_p2s,
 dict_f2m, token_pattern, cleanup_pattern,
 excluded_words) = load_data_and_model(lang)

(dict_token2idx_all, dict_token2idx_instance,
 dist_array, token_dist_to_instance_min,
 dict_bigram_by_instance, denominate,
 similar_threshold) = populate_data(1)
I had asked this question after trying
from depended_library import *
which had not worked in IPython.
But run with plain python, and used in a Flask Web API, it works.
Importing a library using the from statement also executes the code outside of functions in the depended_library, in addition to defining the functions.
(If someone explains the problem with IPython and suggests a solution, I shall select it as the answer.)
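That is also why moving the expensive loading into a module of its own helps: Python caches imported modules in sys.modules, so the top-level loading code runs only on the first import. A minimal sketch with made-up file and variable names:
# heavy_data.py
print('loading models ...')   # top-level code: runs once, on first import
clf = {'dummy': 'model'}      # stand-in for the expensive load_data_and_model()

# user_script.py
import heavy_data             # pays the loading cost only the first time

def classify(x):
    return heavy_data.clf     # functions reach the shared objects via the module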

Conventions of Importing Python Main Programs

Often I write command line utilities that are only meant to be run as main. For example, I might have a file that looks like this:
#!/usr/bin/env python
if __name__ == '__main__':
    import sys
    # do stuff
In other words, there is nothing going on that isn't under the if statement checking that this file is being run as main. I tried importing a file like this to see what would happen, and the import was successful.
So as I expected, one is allowed to import files like this, but what is the convention surrounding this practice? Is one supposed to throw an error telling the user that there is nothing to be imported? Or if all the contents of the file are supposed to be run as main, does one need to check if the program is being run as main? Or is the conditional not necessary?
Also, if I have import statements, should they be at the top of the file, or under the conditional? If the modules are only being used under the conditional, it would seem to me that they should be imported under the conditional and not at the top of the file.
If you are writing simple utilities that you are entirely certain that you will never import as a module in another program, then you really do not need to include the if __name__ == '__main__' stuff. The fundamental point of that construct is to allow a module to be developed that can both be imported as a module for use, and run as a stand-alone program for some other purpose. For example, if you had a module and had some test vectors you wanted to run on it regularly, you would put the trigger mechanism for your test vectors in the if __name__ block.
Another example might be if you have a stand-alone program that you develop, that would also provide useful functions for others. If you have a look at the pip module, this is an excellent example of this technique.
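A minimal sketch of such a dual-use module (the function and the test vectors are made up):
def double(x):
    return 2 * x

if __name__ == '__main__':
    # trigger mechanism for the test vectors: runs only stand-alone,
    # never on import
    for given, expected in [(0, 0), (2, 4), (-3, -6)]:
        assert double(given) == expected
    print('all test vectors passed')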

Embedded Python - Blocking operations in time module

I'm developing my own Python code interpreter using the Python C API, as described in the Python documentation. I've taken a look at the Python source code and tried to follow the same steps that are carried out in the standard interpreter when executing a .py file. These steps (the sequence of C API function calls) are basically:
PyRun_AnyFileExFlags()
PyRun_SimpleFileExFlags()
PyRun_FileExFlags()
PyArena_New()
PyParser_ASTFromFile()
run_mod()
PyAST_Compile()
PyEval_EvalCode()
PyEval_EvalCodeEx()
PyThreadState_GET()
PyFrame_New()
PyEval_EvalFrameEx()
The only difference in my code is that I do the AST compilation, frame creation, etc. manually, and then I call PyEval_EvalFrame.
With this, I am able to execute an arbitrary .py file with my program, as if it were the normal Python interpreter. My problem comes when the code that my program is executing makes use of the time module: all time module operations get blocked in the GIL! For example, if the Python code calls time.sleep(1), this call is blocked and never gets executed.
Obviously I am doing something wrong that blocks the GIL (and therefore blocks the time module), but I don't know how to correct it. The last statement in my code where I have control is PyEval_EvalFrameEx, and from that point on everything runs "as in the regular Python interpreter", I think.
Anybody had a similar problem? What am I doing wrong, so that I block the time module?
Hope somebody can help me...
Thanks for your time. Best regards,
R.
You need to provide more detail.
How does your interpreter's behavior differ from the standard interpreter?
If you just want to run arbitrary source files, why are you not calling one of the higher level interfaces, like PyRun_SimpleFile? Did your code call Py_Initialize?
