When programming in C, I used to have code sections used only for debugging purposes (logging commands and the like). Those statements could be completely disabled for production by using #ifdef pre-processor directives, like this:
#ifdef MACRO
controlled text
#endif /* MACRO */
What is the best way to do something similar in Python?
If you just want to disable logging methods, use the logging module. If the log level is set to exclude, say, debug statements, then logging.debug will be very close to a no-op (it just checks the log level and returns without interpolating the log string).
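For example, a minimal sketch: with the level set above DEBUG, the debug call is filtered out before the message is ever formatted.

import logging

logging.basicConfig(level=logging.INFO)  # DEBUG records are excluded

# With lazy %-style arguments, the string is never interpolated,
# because the record is discarded before formatting.
logging.debug("expensive value: %r", list(range(5)))
logging.info("this message is emitted")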
If you want to actually remove chunks of code at bytecode compile time conditional on a particular variable, your only option is the rather enigmatic __debug__ global variable. This variable is set to True unless the -O flag is passed to Python (or PYTHONOPTIMIZE is set to something nonempty in the environment).
If __debug__ is used in an if statement, the if statement is actually compiled into only the True branch. This particular optimization is as close to a preprocessor macro as Python ever gets.
Note that, unlike macros, your code must still be syntactically correct in both branches of the if.
To show how __debug__ works, consider these two functions:
def f():
    if __debug__: return 3
    else: return 4

def g():
    if True: return 3
    else: return 4
Now check them out with dis:
>>> dis.dis(f)
2 0 LOAD_CONST 1 (3)
3 RETURN_VALUE
>>> dis.dis(g)
2 0 LOAD_GLOBAL 0 (True)
3 JUMP_IF_FALSE 5 (to 11)
6 POP_TOP
7 LOAD_CONST 1 (3)
10 RETURN_VALUE
>> 11 POP_TOP
3 12 LOAD_CONST 2 (4)
15 RETURN_VALUE
16 LOAD_CONST 0 (None)
19 RETURN_VALUE
As you can see, only f is "optimized".
It is important to understand that in Python def and class are regular executable statements...
import os

if os.name == "posix":
    def foo(x):
        return x * x
else:
    def foo(x):
        return x + 42
...
so to do what you would do with the preprocessor in C and C++, you can use the regular Python language itself.
The Python language is fundamentally different from C and C++ on this point because there is no concept of "compile time"; the only two phases are "parse time" (when the source code is read in) and "run time", when the parsed code (normally mostly composed of definition statements, but in fact arbitrary Python code) is executed.
I am using the term "parse time" even though, technically, the transformation applied when the source code is read in is a full compilation to bytecode; the semantics of C and C++ compilation are different, because, for example, the definition of a function happens during that phase in C and C++, while it happens at runtime in Python.
Even the equivalent of C and C++'s #include (which in Python is import) is a regular statement executed at run time, not at compile (parse) time, so it can be placed inside a regular Python if. It is quite common, for example, to have an import inside a try block that provides alternate definitions for some functions when a specific optional Python library is not present on the system.
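A sketch of that pattern, using ujson as an illustrative optional dependency:

try:
    import ujson as json  # optional, faster third-party library
except ImportError:
    import json  # stdlib fallback with a compatible API for this use

print(json.dumps({"key": "value"}))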
Finally, note that in Python you can even create new functions and classes at runtime from scratch using exec, without having them in your source code at all. You can also assemble those objects directly in code, because classes and functions are just regular objects (this is normally done only for classes, however).
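A tiny sketch of the exec route (the function name is invented for illustration):

source = """
def square(x):
    return x * x
"""
namespace = {}
exec(source, namespace)        # parses and runs the definition at run time
print(namespace["square"](5))  # -> 25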
There are tools that do try to treat def and class definitions and import statements as "static", for example to do static analysis of Python code and generate warnings on suspicious fragments, or to create a self-contained deployable package that doesn't depend on a specific Python installation being present on the system. All of them, however, have to account for Python being more dynamic than C or C++ in this area, and they allow adding exceptions for the places where automatic analysis fails.
Here is an example that I use to distinguish between Python 2 & 3 for my Python Tk programs:
import sys

if sys.version_info[0] == 3:
    from tkinter import *
    from tkinter import ttk
else:
    from Tkinter import *
    import ttk

""" rest of your code """
Hope that is a useful illustration.
As far as I am aware, you have to use actual if statements. There is no preprocessor, so there is no analogue to preprocessor directives.
Edit: Actually, it looks like the top answer to this question will be more illuminating: How would you do the equivalent of preprocessor directives in Python?
Supposedly there is a special variable __debug__ which, when used in an if statement, is resolved when the source is compiled to bytecode, so the dead branch is never evaluated during execution.
There is no direct equivalent that I'm aware of, so you might want to zoom-out and reconsider the problems that you used to solve using the preprocessor.
If it's just diagnostic logging you're after then there is a comprehensive logging module which should cover everything you wanted and more.
http://docs.python.org/library/logging.html
What else do you use the preprocessor for? Test configurations? There's the configparser module for that.
http://docs.python.org/library/configparser.html
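A minimal configparser sketch (the section and option names are invented for illustration):

import configparser

config = configparser.ConfigParser()
config.read_string("""
[features]
debug = yes
""")

if config.getboolean("features", "debug"):
    print("debug features enabled")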
Anything else?
If you are using #ifdef to check for variables that may have been defined in the scope above the current file, you can use exceptions. For example, I have scripts that I want to run differently from within ipython vs outside ipython (show plots vs save plots, for example). So I add
ipy = False
try:
    ipy = __IPYTHON__
except NameError:
    pass
This leaves me with a variable ipy, which tells me whether or not __IPYTHON__ was declared in a scope above my current script. This is the closest parallel I know of to #ifdef in Python.
For ipython, this is a great solution. You could use similar constructions in other contexts, in which a calling script sets variable values and the inner scripts check accordingly. Whether or not this makes sense, of course, would depend on your specific use case.
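A sketch of that generalization (both file names and the RUN_MODE flag are hypothetical):

# driver.py: inject a flag, then run the inner script
import builtins
builtins.RUN_MODE = "batch"
exec(open("inner.py").read())

# inner.py: detect whether a driver defined RUN_MODE
try:
    mode = RUN_MODE
except NameError:
    mode = "interactive"
print(mode)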
If you're working on Spyder, you probably only need this:
try:
    print(x)
except NameError:
    # code to run under ifndef
    x = "x is defined now!"
    # other code
The first time you run your script, the code under # code to run under ifndef executes; on later runs in the same console, it is skipped.
Hope it works:)
This can be achieved by passing a command line argument, as below:
import sys

my_macro = 0
if len(sys.argv) > 1:
    for x in sys.argv:
        if x == "MACRO":
            my_macro = 1

if my_macro == 1:
    pass  # controlled text goes here
Run the script as follows and observe the results:
python myscript.py MACRO
Hope this helps.
Related
I was doing some troubleshooting and I was curious if it is possible to run a Python script interactively, change a function defined in the script, save the file, then have the interactive shell recognize the changes. Here is an example of what I am doing currently:
my_script.py:
def dummy_func():
    print('Something')

def main():
    dummy_func()

if __name__ == '__main__':
    main()
I go to my terminal and run:
>python -i my_script.py
Something
>>>
If I go back to my_script.py in my editor and make the following change:
def dummy_func():
    print('Something else')
Then go back to the terminal (which is still open) and re-run the updated function:
>>>dummy_func()
Something
>>>
Is it possible to do something to instead get the following behavior?:
>>>dummy_func()
Something else
>>>
I know it is possible to reload modules using importlib and reload but as far as I can tell that does not apply here since I am not importing anything.
I think this may be distinct from How do I unload (reload) a Python module?. I am asking if there is a way to reload the current file you are running interactively through the python shell, while that question is asking about reloading a module you have imported into another python script.
From what I can find, the short answer is:
No, normally the Python interpreter does not recognize changes to a file once that file has been parsed, analyzed, and fed into the interpreter.
What you should do instead apparently is use your .py file as a module, import that as a module into another .py file, then run that new file. This allows your first file to be reloaded through the interactive interpreter. Here's an example:
from importlib import reload  # Python 3.4+ only.
import foo

while True:
    # Do some things.
    if is_changed(foo):  # is_changed() is a user-supplied check, not stdlib
        foo = reload(foo)
I am still a little fuzzy on the details, but maybe someone can help fill those in. As far as I can tell from the sources linked below, the interpreter takes some steps to load your program from the saved Python file into memory (glossing over a lot of details). Once this process has happened, the interpreter does not perform it again unless you explicitly ask it to, for example by using importlib's reload() function.
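Concretely, assuming the my_script.py from the question, a session might look like this (sketch; the reload return value prints the module's full path):

>>> import my_script
>>> my_script.dummy_func()
Something
>>> # ... edit my_script.py in your editor and save ...
>>> import importlib
>>> importlib.reload(my_script)
<module 'my_script' from '.../my_script.py'>
>>> my_script.dummy_func()
Something else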
Sources:
How do I unload (reload) a Python module? (quoted above)
A Python Interpreter Written in Python:
This link has a lot more information about how the interpreter works, and I found this section particularly helpful:
Real Python Bytecode
At this point, we'll abandon our toy instruction
sets and switch to real Python bytecode. The structure of bytecode is
similar to our toy interpreter's verbose instruction sets, except that
it uses one byte instead of a long name to identify each instruction.
To understand this structure, we'll walk through the bytecode of a
short function. Consider the example below:
>>> def cond():
...     x = 3
...     if x < 5:
...         return 'yes'
...     else:
...         return 'no'
...
Python exposes a boatload of its internals at run time, and we can access them right
from the REPL. For the function object cond, cond.__code__ is the code
object associated with it, and cond.__code__.co_code is the bytecode.
There's almost never a good reason to use these attributes directly
when you're writing Python code, but they do allow us to get up to all
sorts of mischief—and to look at the internals in order to understand
them.
>>> cond.__code__.co_code # the bytecode as raw bytes
b'd\x01\x00}\x00\x00|\x00\x00d\x02\x00k\x00\x00r\x16\x00d\x03\x00Sd\x04\x00Sd\x00\x00S'
>>> list(cond.__code__.co_code) # the bytecode as numbers
[100, 1, 0, 125, 0, 0, 124, 0, 0, 100, 2, 0, 107, 0, 0, 114, 22, 0, 100, 3, 0, 83,
100, 4, 0, 83, 100, 0, 0, 83]
When we just print the bytecode, it
looks unintelligible—all we can tell is that it's a series of bytes.
Luckily, there's a powerful tool we can use to understand it: the dis
module in the Python standard library.
dis is a bytecode disassembler. A disassembler takes low-level code
that is written for machines, like assembly code or bytecode, and
prints it in a human-readable way. When we run dis.dis, it outputs an
explanation of the bytecode it is passed.
>>> dis.dis(cond)
2 0 LOAD_CONST 1 (3)
3 STORE_FAST 0 (x)
3 6 LOAD_FAST 0 (x)
9 LOAD_CONST 2 (5)
12 COMPARE_OP 0 (<)
15 POP_JUMP_IF_FALSE 22
4 18 LOAD_CONST 3 ('yes')
21 RETURN_VALUE
6 >> 22 LOAD_CONST 4 ('no')
25 RETURN_VALUE
26 LOAD_CONST 0 (None)
29 RETURN_VALUE
What does all this mean? Let's look at the first instruction LOAD_CONST as an example. The number in the
first column (2) shows the line number in our Python source code. The
second column is an index into the bytecode, telling us that the
LOAD_CONST instruction appears at position zero. The third column is
the instruction itself, mapped to its human-readable name. The fourth
column, when present, is the argument to that instruction. The fifth
column, when present, is a hint about what the argument means.
How does the Python Runtime actually work?:
With Python, it uses an interpreter rather than a compiler. An
interpreter works in exactly the same way as a compiler, with one
difference: instead of code generation, it loads the output in-memory
and executes it directly on your system. (The exact details of how
this happens can vary wildly between different languages and different
interpreters.)
importlib — The implementation of import:
When reload() is executed:
Python module’s code is recompiled and the module-level code
re-executed, defining a new set of objects which are bound to names in
the module’s dictionary by reusing the loader which originally loaded
the module. The init function of extension modules is not called a
second time.
My python scripts often contain "executable code" (functions, classes, &c) in the first part of the file and "test code" (interactive experiments) at the end.
I want python, py_compile, pylint &c to completely ignore the experimental stuff at the end.
I am looking for something like #if 0 for cpp.
How can this be done?
Here are some ideas and the reasons they are bad:
sys.exit(0): works for python but not py_compile and pylint
put all experimental code under def test():: I can no longer copy/paste the code into a python REPL because it has non-trivial indent
put all experimental code between lines with """: emacs no longer indents and fontifies the code properly
comment and uncomment the code all the time: I am too lazy (yes, this is a single key press, but I have to remember to do that!)
put the test code into a separate file: I want to keep the related stuff together
PS. My IDE is Emacs and my python interpreter is pyspark.
Use ipython rather than python for your REPL. It has better code completion and introspection, and when you paste indented code it can automatically "de-indent" the pasted code.
Thus you can put your experimental code in a test function and then paste in parts without worrying about having to de-indent your code.
If you are pasting large blocks that can be considered individual blocks then you will need to use the %paste or %cpaste magics.
e.g.
for i in range(3):
    i *= 2
# with the following blank line, the loop above is a complete block

print(i)
With a normal paste:
In [1]: for i in range(3):
   ...:     i *= 2
   ...:
In [2]: print(i)
4
Using %paste
In [3]: %paste
for i in range(3):
    i *= 2
    print(i)
## -- End pasted text --
0
2
4
In [4]:
PySpark and IPython
It is also possible to launch PySpark in IPython, the enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To use IPython, set the IPYTHON variable to 1 when running bin/pyspark:
$ IPYTHON=1 ./bin/pyspark
Unfortunately, there is no widely adopted (or indeed any) standard describing what you are talking about, so getting a bunch of Python-specific tools to work like this will be difficult.
However, you could wrap these commands in such a way that they only read until a signifier. For example (assuming you are on a unix system):
cat $file | sed '/exit(0)/q' | sed '/exit(0)/d'
The command will read until 'exit(0)' is found. You could pipe this into your checkers, or create a temp file that your checkers read. You could create wrapper executable files on your path that may work with your editors.
Windows may be able to use a similar technique.
I might advise a different approach. Separate files might be best. You might explore iPython notebooks as a possible solution, but I'm not sure exactly what your use case is.
Follow something like option 2.
I usually put experimental code in a main method.
def main():
    # experimental code goes here
Then, if you want to execute the experimental code, just call main():
main()
With python-mode.el, mark arbitrary chunks as a section, for example via py-sectionize-region.
Then call py-execute-section.
Updated after comment:
python-mode.el is delivered by melpa.
M-x list-packages RET
Look for python-mode - the built-in python.el provides 'python, while python-mode.el provides 'python-mode.
Development has just moved here: https://gitlab.com/python-mode-devs/python-mode
I think the standard ('Pythonic') way to deal with this is to do it like so:
class MyClass(object):
    ...

def my_function():
    ...

if __name__ == '__main__':
    # testing code here
Edit after your comment
I don't think what you want is possible using a plain Python interpreter. You could have a look at the IEP Python editor (website, bitbucket): it supports something like Matlab's cell mode, where a cell can be defined with a double comment character (##):
## main code
class MyClass(object):
    ...

def my_function():
    ...

## testing code
do_some_testing_please()
All code from a ##-beginning line until either the next such line or end-of-file constitutes a single cell.
Whenever the cursor is within a particular cell and you strike some hotkey (default Ctrl+Enter), the code within that cell is executed in the currently running interpreter. An additional feature of IEP is that selected code can be executed with F9; a pretty standard feature but the nice thing here is that IEP will smartly deal with whitespace, so just selecting and pasting stuff from inside a method will automatically work.
I suggest you use a proper version control system to keep the "real" and the "experimental" parts separated.
For example, using Git, you could only include the real code without the experimental parts in your commits (using add -p), and then temporarily stash the experimental parts for running your various tools.
You could also keep the experimental parts in their own branch which you then rebase on top of the non-experimental parts when you need them.
Another possibility is to put tests as doctests into the docstrings of your code, which admittedly is only practical for simpler cases.
This way, they are only treated as executable code by the doctest module, but as comments otherwise.
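A minimal doctest sketch (the function is invented for illustration):

def square(x):
    """Return x squared.

    >>> square(3)
    9
    """
    return x * x

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # executes the examples embedded in docstrings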
I wish to write a Python script that needs to do task 'A' and task 'B'. Luckily there are existing Python modules for both tasks, but unfortunately the library that can do task 'A' is Python 2 only, and the library that can do task 'B' is Python 3 only.
In my case the libraries are small and permissively-licensed enough that I could probably convert them both to Python 3 without much difficulty. But I'm wondering what is the "right" thing to do in this situation - is there some special way in which a module written in Python 2 can be imported directly into a Python 3 program, for example?
The "right" way is to translate the Py2-only module to Py3 and offer the translation upstream with a pull request (or equivalent approach for non-git upstream repos). Seriously. Horrible hacks to make py2 and py3 packages work together are not worth the effort.
I presume you know of tools such as 2to3, that aim to make the job of porting code to py3k easier, just repeating it here for others' reference.
In situations where I have to use libraries from python3 and python2, I've been able to work around it using the subprocess module. Alternatively, I've gotten around this issue with shell scripts that pipe output from the python2 script to the python3 script and vice-versa. This of course covers only a tiny fraction of use cases, but if you're transferring text (or maybe even picklable objects) between 2 & 3, it (or a more thought-out variant) should work.
To the best of my knowledge, there isn't a best practice when it comes to mixing versions of python.
I present to you an ugly hack
Consider the following simple toy example, involving three files:
# py2.py
# file uses python2, here illustrated by the print statement

def hello_world():
    print 'hello world'

if __name__ == '__main__':
    hello_world()
# py3.py
# there's nothing py3 about this, but let's assume that there is,
# and that this is a library that will work only on python3

def count_words(phrase):
    return len(phrase.split())
# controller.py
# main script that coordinates the work, written in python3;
# calls the python2 library through the subprocess module.
# The limitation here is that every function needed has to have a
# script associated with it that accepts command line arguments.

import subprocess
import py3

if __name__ == '__main__':
    phrase = subprocess.check_output('python py2.py', shell=True)
    num_words = py3.count_words(phrase)
    print(num_words)
# If I run the following in bash, it outputs `2`
hals-halbook: toy hal$ python3 controller.py
2
My Python code is interspersed with lots of function calls used for debugging, profiling, tracing, etc.
for example:
import logging
logging.root.setLevel(logging.DEBUG)
logging.debug('hello')

j = 0
for i in range(10):
    j += i
    logging.debug('i %d j %d' % (i, j))
print(j)
logging.debug('bye')
I want to #define these resource-consuming functions out of the code, something like the C equivalent:
#define logging.debug(val)
Yes, I know the logging module's log-level mechanism can be used to mask out log calls below the set level. But I'm asking for a general way to have the Python interpreter skip functions (that take time to run even if they don't do much).
One idea is to redefine the functions I want to comment out as empty functions:
def lazy(*args): pass
logging.debug = lazy
The above idea still calls a function, though, and may create a myriad of other problems.
Python does not have a preprocessor, although you could run your python source through an external preprocessor to get the same effect - e.g. sed "/logging.debug/d" will strip out all the debug logging commands. This is not very elegant though - you will end up needing some sort of build system to run all your modules through the preprocessor and perhaps create a new directory tree of the processed .py files before running the main script.
Alternatively if you put all your debug statements in an if __debug__: block they will get optimised out when python is run with the -O (optimise) flag.
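For example, a small sketch (debug_demo.py is a hypothetical file name):

# debug_demo.py
if __debug__:
    print("debug checks enabled")  # stripped from the bytecode under -O
print("always runs")

# $ python debug_demo.py     -> prints both lines
# $ python -O debug_demo.py  -> prints only "always runs"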
As an aside, I checked the code with the dis module to ensure that it did get optimised away. I discovered that both
if __debug__: doStuff()
and
if 0: doStuff()
are optimised, but
if False: doStuff()
is not. This is because False is a regular Python object, and you can in fact do this:
>>> False = True
>>> if False: print "Illogical, captain"
Illogical, captain
Which seems to me a flaw in the language - hopefully it is fixed in Python 3.
Edit:
This is fixed in Python 3: Assigning to True or False now gives a SyntaxError.
Since True and False are constants in Python 3, it means that if False: doStuff() is now optimised:
>>> def f():
...     if False: print("illogical")
...
>>> dis.dis(f)
2 0 LOAD_CONST 0 (None)
3 RETURN_VALUE
Although I think the question is perfectly clear and valid (notwithstanding the many responses that suggest otherwise), the short answer is "there's no support in Python for this".
The only potential solution other than the preprocessor suggestion would be to use some bytecode hacking. I won't even begin to imagine how this should work in terms of the high-level API, but at a low level you could imagine examining code objects for particular sequences of instructions and re-writing them to eliminate them.
For example, look at the following two functions:
>>> def func():
...     if debug:  # analogous to if __debug__:
...         foo
>>> dis.dis(func)
2 0 LOAD_GLOBAL 0 (debug)
3 JUMP_IF_FALSE 8 (to 14)
6 POP_TOP
3 7 LOAD_GLOBAL 1 (foo)
10 POP_TOP
11 JUMP_FORWARD 1 (to 15)
>> 14 POP_TOP
>> 15 LOAD_CONST 0 (None)
18 RETURN_VALUE
Here you could scan for the LOAD_GLOBAL of debug, and eliminate it and everything up to the JUMP_IF_FALSE target.
This one is the more traditional C-style debug() function that gets nicely obliterated by a preprocessor:
>>> def func2():
...     debug('bar', baz)
>>> dis.dis(func2)
2 0 LOAD_GLOBAL 0 (debug)
3 LOAD_CONST 1 ('bar')
6 LOAD_GLOBAL 1 (baz)
9 CALL_FUNCTION 2
12 POP_TOP
13 LOAD_CONST 0 (None)
16 RETURN_VALUE
Here you would look for LOAD_GLOBAL of debug and wipe everything up to the corresponding CALL_FUNCTION.
Of course, both of those descriptions of what you would do are far simpler than what you'd really need for all but the most simplistic patterns of use, but I think it would be feasible. Would make a cute project, if nobody's already done it.
Well, you can always implement your own simple preprocessor that does the trick. Or, even better, you can use an already existing one. Say http://code.google.com/p/preprocess/
Use a module-scoped variable?
from config_module import debug_flag
and use this "variable" to gate access to the logging function(s). You would build yourself a logging module that uses the debug_flag to gate the logging functionality.
I think that completely avoiding the call to a function is not possible, as Python works in a different way than C. The #define takes place in the pre-compiler, before the code is compiled. In Python, there's no such thing.
If you want to completely remove the calls to debug in a working environment, I think the only way is to actually change the code before execution. With a script run prior to execution you could comment/uncomment the debug lines.
Something like this:
File logging.py
# Main module
def log():
    print 'logging'

def main():
    log()
    print 'Hello'
    log()
File call_log.py
import re

# To log or not to log, that's the question
log = True

# Change the logging
with open('logging.py') as f:
    new_data = []
    for line in f:
        if not log and re.match(r'\s*log.*', line):
            # Comment the call out
            line = '#' + line
        if log and re.match(r'#\s*log.*', line):
            # Uncomment it
            line = line[1:]
        new_data.append(line)

# Save the file with the adequate log level
with open('logging.py', 'w') as f:
    f.write(''.join(new_data))

# Call the module (note: this example file name shadows the stdlib logging module)
import logging
logging.main()
Of course, it has its problems, specially if there are a lot of modules and are complex, but could be usable if you need to absolutely avoid the calling to a function.
Before you do this, have you profiled to verify that the logging is actually taking a substantial amount of time? You may find that you spend more time trying to remove the calls than you save.
Next, have you tried something like Psyco? If you've got things set up so logging is disabled, then Psyco may be able to optimise away most of the overhead of calling the logging function, noticing that it will always return without action.
If you still find logging taking an appreciable amount of time, you might then want to look at overriding the logging function inside critical loops, possibly by binding a local variable to either the logging function or a dummy function as appropriate (or by checking for None before calling it).
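A sketch of that local-binding idea (hot_loop is a made-up example function):

import logging

logger = logging.getLogger(__name__)

def hot_loop(items):
    # Bind the bound method to a local once; use None to skip calls entirely.
    debug = logger.debug if logger.isEnabledFor(logging.DEBUG) else None
    total = 0
    for x in items:
        total += x
        if debug is not None:
            debug("running total: %s", total)
    return total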
Define a function that does nothing, i.e.
def nuzzing(*args, **kwargs): pass
Then just overload all the functions you want to get rid of with your function, a la
logging.debug = nuzzing
I like the if __debug__ solution, except that putting it in front of every call is a bit distracting and ugly. I had this same problem and overcame it by writing a script which automatically parses your source files and replaces logging statements with pass statements (and commented-out copies of the logging statements). It can also undo this conversion.
I use it when I deploy new code to a production environment when there are lots of logging statements which I don't need in a production setting and they are affecting performance.
You can find the script here: http://dound.com/2010/02/python-logging-performance/
You can't skip function calls. You could redefine these as empty though, e.g. by creating another logging object that provides the same interface, but with empty functions (see the sketch below).
But by far the cleanest approach is to ignore the low priority log messages (as you suggested):
logging.root.setLevel(logging.CRITICAL)
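For completeness, a sketch of the same-interface-but-empty idea mentioned above (NullLogger is invented for illustration):

class NullLogger:
    """Mirrors the logging methods the code uses, but does nothing."""
    def debug(self, *args, **kwargs): pass
    def info(self, *args, **kwargs): pass

log = NullLogger()           # swap in a real logger to re-enable output
log.debug("costly: %s", 42)  # near-free no-op call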
I want to programmatically edit python source code. Basically I want to read a .py file, generate the AST, and then write back the modified python source code (i.e. another .py file).
There are ways to parse/compile python source code using standard python modules, such as ast or compiler. However, I don't think any of them support ways to modify the source code (e.g. delete this function declaration) and then write back the modifying python source code.
UPDATE: The reason I want to do this is I'd like to write a Mutation testing library for python, mostly by deleting statements / expressions, rerunning tests and seeing what breaks.
Pythoscope does this to the test cases it automatically generates, as does the 2to3 tool for Python 2.6 (it converts Python 2.x source into Python 3.x source).
Both these tools use the lib2to3 library, which is an implementation of the Python parser/compiler machinery that can preserve comments in source when it's round-tripped from source -> AST -> source.
The rope project may meet your needs if you want to do more refactoring like transforms.
The ast module is your other option, and there's an older example of how to "unparse" syntax trees back into code (using the parser module). But the ast module is more useful when doing an AST transform on code that is then transformed into a code object.
The redbaron project also may be a good fit (ht Xavier Combelle)
The builtin ast module doesn't seem to have a method to convert back to source. However, the codegen module here provides a pretty printer for the AST that would enable you to do so.
e.g.
import ast
import codegen

expr = """
def foo():
    print("hello world")
"""
p = ast.parse(expr)
p.body[0].body = [ast.parse("return 42").body[0]]  # Replace function body with "return 42"
print(codegen.to_source(p))
This will print:
def foo():
    return 42
Note that you may lose the exact formatting and comments, as these are not preserved.
However, you may not need to. If all you require is to execute the replaced AST, you can do so simply by calling compile() on the ast, and execing the resulting code object.
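A sketch of that compile-and-exec route, reusing the parse-and-replace example above:

import ast

p = ast.parse("def foo():\n    return 41\n")
p.body[0].body = [ast.parse("return 42").body[0]]  # swap in a new body
ast.fix_missing_locations(p)

namespace = {}
exec(compile(p, filename="<ast>", mode="exec"), namespace)
print(namespace["foo"]())  # -> 42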
Took a while, but Python 3.9 has this:
https://docs.python.org/3.9/whatsnew/3.9.html#ast
https://docs.python.org/3.9/library/ast.html#ast.unparse
ast.unparse(ast_obj)
Unparse an ast.AST object and generate a string with code that would produce an equivalent ast.AST object if parsed back with ast.parse().
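For example (a round-trip sketch; the exact formatting of the output can vary slightly between versions):

>>> import ast
>>> tree = ast.parse("def foo(x): return 2 * x")
>>> print(ast.unparse(tree))  # Python 3.9+
def foo(x):
    return 2 * x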
In a different answer I suggested using the astor package, but I have since found a more up-to-date AST un-parsing package called astunparse:
>>> import ast
>>> import astunparse
>>> print(astunparse.unparse(ast.parse('def foo(x): return 2 * x')))
def foo(x):
    return (2 * x)
I have tested this on Python 3.5.
You might not need to re-generate source code. That's a bit dangerous for me to say, of course, since you have not actually explained why you think you need to generate a .py file full of code; but:
If you want to generate a .py file that people will actually use, maybe so that they can fill out a form and get a useful .py file to insert into their project, then you don't want to change it into an AST and back, because you'll lose all formatting (think of the blank lines that make Python so readable by grouping related sets of lines together) as well as comments, since ast nodes keep only lineno and col_offset attributes. Instead, you'll probably want to use a templating engine (the Django template language, for example, is designed to make templating even text files easy) to customize the .py file, or else use Rick Copeland's MetaPython extension.
If you are trying to make a change during compilation of a module, note that you don't have to go all the way back to text; you can just compile the AST directly instead of turning it back into a .py file.
But in almost any and every case, you are probably trying to do something dynamic that a language like Python actually makes very easy, without writing new .py files! If you expand your question to let us know what you actually want to accomplish, new .py files will probably not be involved in the answer at all; I have seen hundreds of Python projects doing hundreds of real-world things, and not a single one of them needed to ever write a .py file. So, I must admit, I'm a bit of a skeptic that you've found the first good use-case. :-)
Update: now that you've explained what you're trying to do, I'd be tempted to just operate on the AST anyway. You will want to mutate by removing not lines of a file (which could result in half-statements that simply die with a SyntaxError), but whole statements, and what better place to do that than in the AST?
Parsing and modifying the code structure is certainly possible with the help of the ast module, and I will show it in an example in a moment. However, writing back the modified source code is not possible with the ast module alone. There are other modules available for this job, such as the one here.
NOTE: The example below can be treated as an introductory tutorial on the usage of the ast module; a more comprehensive guide is available in the Green Tree Snakes tutorial and the official documentation on the ast module.
Introduction to ast:
>>> import ast
>>> tree = ast.parse("print 'Hello Python!!'")
>>> exec(compile(tree, filename="<ast>", mode="exec"))
Hello Python!!
You can parse Python code (represented as a string) by simply calling the API ast.parse(). This returns a handle to the Abstract Syntax Tree (AST) structure. Interestingly, you can compile this structure back and execute it, as shown above.
Another very useful API is ast.dump(), which dumps the whole AST in string form. It can be used to inspect the tree structure and is very helpful in debugging. For example,
On Python 2.7:
>>> import ast
>>> tree = ast.parse("print 'Hello Python!!'")
>>> ast.dump(tree)
"Module(body=[Print(dest=None, values=[Str(s='Hello Python!!')], nl=True)])"
On Python 3.5:
>>> import ast
>>> tree = ast.parse("print ('Hello Python!!')")
>>> ast.dump(tree)
"Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Str(s='Hello Python!!')], keywords=[]))])"
Notice the difference in syntax for print statement in Python 2.7 vs. Python 3.5 and the difference in type of AST node in respective trees.
How to modify code using ast:
Now, let's have a look at an example of modifying Python code with the ast module. The main tool for modifying the AST structure is the ast.NodeTransformer class. Whenever one needs to modify the AST, one subclasses it and writes the node transformation(s) accordingly.
For our example, let's try to write a simple utility which transforms Python 2 print statements into Python 3 function calls.
Print statement to function call converter utility: print2to3.py:
#!/usr/bin/env python
'''
This utility converts the python (2.7) print statements to Python 3 alike function calls before running the code.

USAGE:
    python print2to3.py <filename>
'''
import ast
import sys


class P2to3(ast.NodeTransformer):
    def visit_Print(self, node):
        new_node = ast.Expr(value=ast.Call(func=ast.Name(id='print', ctx=ast.Load()),
                                           args=node.values,
                                           keywords=[], starargs=None, kwargs=None))
        ast.copy_location(new_node, node)
        return new_node


def main(filename=None):
    if not filename:
        return
    with open(filename, 'r') as fp:
        data = fp.readlines()
    data = ''.join(data)
    tree = ast.parse(data)
    print "Converting python 2 print statements to Python 3 function calls"
    print "-" * 35
    P2to3().visit(tree)
    ast.fix_missing_locations(tree)
    # print ast.dump(tree)
    exec(compile(tree, filename="p23", mode="exec"))


if __name__ == '__main__':
    if len(sys.argv) <= 1:
        print("\nUSAGE:\n\t print2to3.py <filename>")
        sys.exit(1)
    else:
        main(sys.argv[1])
This utility can be tried on a small example file, such as the one below, and it should work fine.
Test Input file : py2.py
class A(object):
    def __init__(self):
        pass

def good():
    print "I am good"

main = good

if __name__ == '__main__':
    print "I am in main"
    main()
Please note that the above transformation is only for ast tutorial purposes; in a real-case scenario one would have to look at all the different print forms, such as print " x is %s" % ("Hello Python").
If you are looking at this in 2019, then you can use the libcst package. It has a syntax similar to ast. It works like a charm and preserves the code structure. It's especially helpful for projects where you have to preserve comments, whitespace, newlines, etc.
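A minimal libcst round-trip sketch:

import libcst as cst

module = cst.parse_module("x = 1  # this comment survives\n")
print(module.code)  # reproduces the source exactly, comment included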
If you don't need to care about preserving comments, whitespace and the rest, then the combination of ast and astor works well.
I've recently created a quite stable (the core is really well tested) and extensible piece of code which generates code from an ast tree: https://github.com/paluh/code-formatter.
I'm using my project as a base for a small vim plugin (which I use every day), so my goal is to generate really nice and readable Python code.
P.S.
I've tried to extend codegen, but its architecture is based on the ast.NodeVisitor interface, so formatters (visitor_ methods) are just functions. I found this structure quite limiting and hard to optimize (in the case of long and nested expressions it's easier to keep an object tree and cache partial results; otherwise you can hit exponential complexity when searching for the best layout). BUT codegen, like every piece of mitsuhiko's work I've read, is very well written and concise.
One of the other answers recommends codegen, which seems to have been superseded by astor. The version of astor on PyPI (version 0.5 as of this writing) seems to be a little outdated as well, so you can install the development version of astor as follows.
pip install git+https://github.com/berkerpeksag/astor.git#egg=astor
Then you can use astor.to_source to convert a Python AST to human-readable Python source code:
>>> import ast
>>> import astor
>>> print(astor.to_source(ast.parse('def foo(x): return 2 * x')))
def foo(x):
    return 2 * x
I have tested this on Python 3.5.
Unfortunately, none of the answers above actually met both of these conditions:
Preserve the syntactical integrity of the surrounding source code (e.g. keeping comments and other formatting for the rest of the code)
Actually use AST (not CST).
I've recently written a small toolkit to do pure AST-based refactorings, called refactor. For example, if you want to replace all placeholders with 42, you can simply write a rule like this:
class Replace(Rule):
    def match(self, node):
        assert isinstance(node, ast.Name)
        assert node.id == 'placeholder'

        replacement = ast.Constant(42)
        return ReplacementAction(node, replacement)
And it will find all matching nodes, replace them with the new nodes, and generate the final form:
--- test_file.py
+++ test_file.py
@@ -1,11 +1,11 @@
 def main():
-    print(placeholder * 3 + 2)
-    print(2 + placeholder + 3)
+    print(42 * 3 + 2)
+    print(2 + 42 + 3)
     # some comments
-    placeholder  # maybe other comments
+    42  # maybe other comments
     if something:
         other_thing
-    print(placeholder)
+    print(42)

 if __name__ == "__main__":
     main()
We had a similar need, which wasn't solved by other answers here. So we created a library for this, ASTTokens, which takes an AST tree produced with the ast or astroid modules, and marks it with the ranges of text in the original source code.
It doesn't do modifications of code directly, but that's not hard to add on top, since it does tell you the range of text you need to modify.
For example, this wraps a function call in WRAP(...), preserving comments and everything else:
example = """
def foo():  # Test
    '''My func'''
    log("hello world")  # Print
"""

import ast, asttokens

atok = asttokens.ASTTokens(example, parse=True)
call = next(n for n in ast.walk(atok.tree) if isinstance(n, ast.Call))
start, end = atok.get_text_range(call)
print(atok.text[:start] + ('WRAP(%s)' % atok.text[start:end]) + atok.text[end:])
Produces:
def foo():  # Test
    '''My func'''
    WRAP(log("hello world"))  # Print
Hope this helps!
A Program Transformation System is a tool that parses source text, builds ASTs, and allows you to modify them using source-to-source transformations ("if you see this pattern, replace it by that pattern"). Such tools are ideal for doing mutation of existing source code, where the mutations are just "if you see this pattern, replace it by a pattern variant".
Of course, you need a program transformation engine that can parse the language of interest to you, and still do the pattern-directed transformations. Our DMS Software Reengineering Toolkit is a system that can do that, and handles Python, and a variety of other languages.
See this SO answer for an example of a DMS-parsed AST for Python capturing comments accurately. DMS can make changes to the AST and regenerate valid text, including the comments. You can ask it to prettyprint the AST, using its own formatting conventions (you can change these), or do "fidelity printing", which uses the original line and column information to maximally preserve the original layout (some change in layout where new code is inserted is unavoidable).
To implement a "mutation" rule for Python with DMS, you could write the following:
rule mutate_addition(s:sum, p:product):sum->sum =
  " \s + \p " -> " \s - \p "
  if mutate_this_place(s);
This rule replaces "+" with "-" in a syntactically correct way; it operates on the AST and thus won't touch strings or comments that happen to look similar. The extra condition on mutate_this_place lets you control how often this occurs; you don't want to mutate every place in the program.
You'd obviously want a bunch more rules like this that detect various code structures, and replace them by the mutated versions. DMS is happy to apply a set of rules. The mutated AST is then prettyprinted.
I used to use baron for this, but have now switched to parso because it's up to date with modern python. It works great.
I also needed this for a mutation tester. It's really quite simple to make one with parso; check out my code at https://github.com/boxed/mutmut
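A minimal parso round-trip sketch:

import parso

tree = parso.parse("x = 1  # comment kept\n")
print(tree.get_code())  # parso reproduces the source exactly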