Python meta-circular evaluator

It's not uncommon for an intro programming class to write a Lisp metacircular evaluator. Has there been any attempt at doing this for Python?
Yes, I know that Lisp's structure and syntax lends itself nicely to a metacircular evaluator, etc etc. Python will most likely be more difficult. I am just curious as to whether such an attempt has been made.

For those who don't know what a meta-circular evaluator is, it is an interpreter which is written in the language to be interpreted. For example: a Lisp interpreter written in Lisp, or in our case, a Python interpreter written in Python. For more information, read this chapter from SICP.
As JBernardo said, PyPy is one. However, PyPy's Python interpreter, the meta-circular evaluator that is, is implemented in a statically typed subset of Python called RPython.
You'll be pleased to know that, as of the 1.5 release, PyPy is fully compliant with the official Python 2.7 specification. Even more so: PyPy nearly always beats CPython in performance benchmarks.
For more information see PyPy docs and PyPy extra docs.

I think I wrote one here:

"""
Metacircular Python interpreter with macro feature.
By Cees Timmerman, 14aug13.
"""
import re

re_macros = re.compile(r"^#define (\S+) ([^\r\n]+)", re.MULTILINE)

def meta_python_exec(code):
    # Optional meta feature.
    macros = re_macros.findall(code)
    code = re_macros.sub("", code)
    for m in macros:
        code = code.replace(m[0], m[1])
    # Run the code.
    exec(code)

if __name__ == "__main__":
    #code = open("metacircular_overflow.py", "r").read()  # Causes a stack overflow in Python 3.2.3, but simply raises "RuntimeError: maximum recursion depth exceeded while calling a Python object" in Python 2.7.3.
    code = "#define 1 2\r\nprint(1 + 1)"
    meta_python_exec(code)

Related

Simulate Python interactive mode

I'm writing a private online Python interpreter for VK, which would closely simulate the IDLE console. Only me and some people on a whitelist would be able to use this feature, so there's no unsafe code that can harm my server. But I have a little problem. For example, I send the string with the code def foo():, and I don't want to get a SyntaxError but to continue defining the function line by line, without writing long strings joined with \n. exec() and eval() don't suit me in that case. What should I use to get the desired effect? Sorry if this is a duplicate; I still couldn't work it out from similar questions.
The Python standard library provides the code and codeop modules to help you with this. The code module just straight-up simulates the standard interactive interpreter:
import code
code.interact()
It also provides a few facilities for more detailed control and customization of how it works.
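For the line-by-line case in the question, one option (a minimal sketch; the sample lines below are just illustrative) is code.InteractiveConsole, whose push() method reports whether more input is needed:

import code

console = code.InteractiveConsole()
# push() returns True while the statement is still incomplete, so sending
# "def foo():" on its own no longer raises a SyntaxError.
print(console.push("def foo():"))      # True: keep sending lines
print(console.push("    return 42"))   # True: the block is not closed yet
print(console.push(""))                # False: a blank line ends the block
console.push("print(foo())")           # prints 42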
If you want to build things up from more basic components, the codeop module provides a command compiler that remembers __future__ statements and recognizes incomplete commands:
import codeop

compiler = codeop.CommandCompiler()
try:
    codeobject = compiler(some_source_string)
    # codeobject is an executable code object if some_source_string was a
    # complete command, or None if the command is incomplete.
except (SyntaxError, OverflowError, ValueError):
    # If some_source_string is invalid, we end up here.
    # OverflowError and ValueError can occur in some cases involving invalid literals.
    pass
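To sketch how that fits into a loop (the prompts and variable names below are placeholders, not part of the codeop API): keep buffering lines until the compiler returns a code object instead of None, then execute it.

import codeop

compiler = codeop.CommandCompiler()
namespace = {}
buffer = []
while True:
    try:
        line = input("... " if buffer else ">>> ")
    except EOFError:
        break
    buffer.append(line)
    source = "\n".join(buffer)
    try:
        codeobject = compiler(source)
    except (SyntaxError, OverflowError, ValueError):
        print("invalid input")
        buffer = []
        continue
    if codeobject is None:
        continue              # incomplete command: read another line
    exec(codeobject, namespace)
    buffer = []

As with the standard interactive interpreter, an indented block is closed by entering a blank line.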
It boils down to reading input, then running exec code in globals, locals (Python 2 statement syntax; exec(code, globals, locals) in Python 3) in an infinite loop.
See e.g. IPython.frontend.terminal.console.interactiveshell.TerminalInteractiveShell.mainloop().
Continuation detection is done in inputsplitter.push_accepts_more() by trying ast.parse().
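A toy version of that check (much cruder than IPython's, since a bare SyntaxError can mean either "incomplete" or "genuinely invalid"; codeop.compile_command makes that distinction properly):

import ast

def push_accepts_more(source):
    # Rough stand-in for IPython's continuation detection: if the buffer
    # doesn't parse yet, assume the user is still typing the block.
    try:
        ast.parse(source)
        return False
    except SyntaxError:
        return True

print(push_accepts_more("def foo():"))                # True: keep reading
print(push_accepts_more("def foo():\n    return 1"))  # False: block is complete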
Actually, IPython already has an interactive web console called Jupyter Notebook, so your best bet is probably to reuse it.

What is the correct way (if any) to use Python 2 and 3 libraries in the same program?

I wish to write a python script that needs to do task 'A' and task 'B'. Luckily there are existing Python modules for both tasks, but unfortunately the library that can do task 'A' is Python 2 only, and the library that can do task 'B' is Python 3 only.
In my case the libraries are small and permissively-licensed enough that I could probably convert them both to Python 3 without much difficulty. But I'm wondering what is the "right" thing to do in this situation - is there some special way in which a module written in Python 2 can be imported directly into a Python 3 program, for example?
The "right" way is to translate the Py2-only module to Py3 and offer the translation upstream with a pull request (or equivalent approach for non-git upstream repos). Seriously. Horrible hacks to make py2 and py3 packages work together are not worth the effort.
I presume you know of tools such as 2to3, that aim to make the job of porting code to py3k easier, just repeating it here for others' reference.
In situations where I have to use libraries from Python 3 and Python 2, I've been able to work around it using the subprocess module. Alternatively, I've gotten around this issue with shell scripts that pipe output from the Python 2 script to the Python 3 script and vice versa. This of course covers only a tiny fraction of use cases, but if you're transferring text (or maybe even picklable objects) between 2 & 3, it (or a more thought-out variant) should work.
To the best of my knowledge, there isn't a best practice when it comes to mixing versions of python.
I present to you an ugly hack
Consider the following simple toy example, involving three files:
# py2.py
# file uses python2, here illustrated by the print statement
def hello_world():
    print 'hello world'

if __name__ == '__main__':
    hello_world()

# py3.py
# there's nothing py3 about this, but lets assume that there is,
# and that this is a library that will work only on python3
def count_words(phrase):
    return len(phrase.split())

# controller.py
# main script that coordinates the work, written in python3
# calls the python2 library through subprocess module
# the limitation here is that every function needed has to have a script
# associated with it that accepts command line arguments.
import subprocess
import py3

if __name__ == '__main__':
    phrase = subprocess.check_output('python py2.py', shell=True)
    num_words = py3.count_words(phrase)
    print(num_words)

# If I run the following in bash, it outputs `2`
hals-halbook: toy hal$ python3 controller.py
2
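One small caveat not shown above: under Python 3, subprocess.check_output returns bytes. The bytes happen to work for count_words, but if the Python 3 library expects str you would decode first (assuming the child process writes UTF-8 text):

phrase = subprocess.check_output('python py2.py', shell=True).decode('utf-8')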

Unavailable assertion methods in Python 3.1 unittest

I'm new to python programming and especially to unit-testing framework.
For some reason, working with PyDev (Python 3.1 interpreter) I cannot use all of those new
assert methods (such as assertRegexpMatches etc.).
Here's an example code:
class TestParser(unittest.TestCase):
    def testskipCommentAndSpaces(self):
        if os.path.isfile(sys.argv[1]):
            #self.vmFilesListPath = sys.argv[1]
            vmFilesListPath = sys.argv[1]
        else:
            #self.vmFilesListPath = get_all_vm_files(sys.argv[1])
            vmFilesListPath = get_all_vm_files(sys.argv[1])
        #parser = Parser(self.vmFilesListPath)
        parser = Parser(vmFilesListPath)
        commands = parser.getCommands()
        for command in commands:
            for token in commands:
                p = re.search(r"(////)", str(token))
                self.assertNotRegexpMatches(str(token), p)
What I get is: AttributeError: 'TestParser' object has no attribute 'assertNotRegexpMatches'
Needless to say, hasattr(self, 'assertNotRegexpMatches') returns False, while the "simple" assert methods work fine.
I'm sure the interpreter is set to 3.1, i.e. the correct version I need (since I also have Python 2.7 installed on my system).
Would thank you for your help,
Igor.L
While the unittest module in Python 3.1 had an assertRegexpMatches method, there is no documented assertNotRegexpMatches. In Python 3.2, assertRegexpMatches was renamed to assertRegex and the complementary assertNotRegex was added.
Note that Python 3.1 is obsolete and no longer maintained other than critical security fixes. There have been many features, fixes, and major performance improvements added in Python 3.2 and now 3.3 which was just released. Consider upgrading to one of them.
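For reference, a minimal sketch of the 3.2+ spelling (the token string and pattern here are just illustrative); note that the second argument should be a pattern (a string or compiled regex), not the result of re.search():

import unittest

class TestTokens(unittest.TestCase):
    def test_no_comment_marker(self):
        token = "push constant 7"
        # assertNotRegex (Python 3.2+) fails if the pattern IS found in the text.
        self.assertNotRegex(token, r"//")

if __name__ == "__main__":
    unittest.main()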

How is Lisp's read-eval-print loop different than Python's?

I've encountered the following statement by Richard Stallman:
'When you start a Lisp system, it enters a read-eval-print loop. Most other languages have nothing comparable to read, nothing comparable to eval, and nothing comparable to print. What gaping deficiencies! '
Now, I have done very little programming in Lisp, but I've written a considerable amount of code in Python and recently a little in Erlang. My impression was that these languages also offer a read-eval-print loop, but Stallman disagrees (at least about Python):
'I skimmed documentation of Python after people told me it was fundamentally similar to Lisp. My conclusion is that that is not so. When you start Lisp, it does 'read', 'eval', and 'print', all of which are missing in Python.'
Is there really a fundamental technical difference between Lisp's and Python's read-eval-print loops? Can you give examples of things that Lisp REPL makes easy and that are difficult to do in Python?
In support of Stallman's position, Python does not do the same thing as typical Lisp systems in the following areas:
The read function in Lisp reads an S-expression, which represents an arbitrary data structure that can either be treated as data, or evaluated as code. The closest thing in Python reads a single string, which you would have to parse yourself if you want it to mean anything.
The eval function in Lisp can execute any Lisp code. The eval function in Python evaluates only expressions; you need exec (a statement in Python 2, a built-in function in Python 3) to run statements. But both of these work with Python source code represented as text, and you have to jump through a bunch of hoops to "eval" a Python AST.
The print function in Lisp writes out an S-expression in exactly the same form that read accepts. print in Python prints out something defined by the data you're trying to print, which is certainly not always reversible.
Stallman's statement is a bit disingenuous, because clearly Python does have functions named exactly eval and print, but they do something different (and inferior) to what he expects.
In my opinion, Python does have some aspects similar to Lisp, and I can understand why people might have recommended that Stallman look into Python. However, as Paul Graham argues in What Made Lisp Different, any programming language that includes all the capabilities of Lisp must also be Lisp.
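To make the eval/exec asymmetry mentioned above concrete (a small sketch):

# eval() accepts expressions only; statements require exec().
print(eval("1 + 2"))          # 3

try:
    eval("x = 1")             # assignment is a statement, not an expression
except SyntaxError as error:
    print("eval rejected it:", error)

namespace = {}
exec("x = 1", namespace)      # exec() runs statements
print(namespace["x"])         # 1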
Stallman's point is that not implementing an explicit "reader" makes Python's REPL appear crippled compared to Lisp's, because it removes a crucial step from the REPL process. The reader is the component that transforms a textual input stream into in-memory data structures; think of something like an XML parser built into the language and used both for source code and for data. This is useful not only for writing macros (which would in theory be possible in Python with the ast module), but also for debugging and introspection.
Say you're interested in how the incf special form is implemented. You can test it like this:
[4]> (macroexpand '(incf a))
(SETQ A (+ A 1)) ;
But incf can do much more than incrementing symbol values. What exactly does it do when asked to increment a hash table entry? Let's see:
[2]> (macroexpand '(incf (gethash htable key)))
(LET* ((#:G3069 HTABLE) (#:G3070 KEY) (#:G3071 (+ (GETHASH #:G3069 #:G3070) 1)))
(SYSTEM::PUTHASH #:G3069 #:G3070 #:G3071)) ;
Here we learn that incf calls a system-specific puthash function, which is an implementation detail of this Common Lisp system. Note how the "printer" is making use of features known to the "reader", such as introducing anonymous symbols with the #: syntax, and referring to the same symbols within the scope of the expanded expression. Emulating this kind of inspection in Python would be much more verbose and less accessible.
In addition to the obvious uses at the REPL, experienced Lispers use print and read in the code as a simple and readily available serialization tool, comparable to XML or json. While Python has the str function, equivalent to Lisp's print, it lacks the equivalent of read, the closest equivalent being eval. eval of course conflates two different concepts, parsing and evaluation, which leads to problems like this and solutions like this and is a recurring topic on Python forums. This would not be an issue in Lisp precisely because the reader and the evaluator are cleanly separated.
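For literal data, Python's closest safe analogue of that print/read round trip is repr plus ast.literal_eval (a minimal sketch; unlike Lisp's reader it only handles Python literals, not arbitrary code):

import ast

data = {"op": "incf", "args": [1, 2, 3]}
text = repr(data)                  # "print": data -> text
restored = ast.literal_eval(text)  # "read": text -> data, without evaluating code
assert restored == data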
Finally, advanced features of the reader facility enable the programmer to extend the language in ways that even macros could not otherwise provide. A perfect example of making hard things possible is the infix package by Mark Kantrowitz, which implements a full-featured infix syntax as a reader macro.
In a Lisp-based system one typically develops the program while it is running from the REPL (read eval print loop). So it integrates a bunch of tools: completion, editor, command-line-interpreter, debugger, ... The default is to have that. Type an expression with an error - you are in another REPL level with some debugging commands enabled. You actually have to do something to get rid of this behavior.
You can have two different meanings of the REPL concept:
the Read Eval Print Loop as in Lisp (or a few other similar languages). It reads programs and data, evaluates them, and prints the result data. Python does not work this way. Lisp's REPL allows you to work directly in a meta-programming way: writing code which generates code, checking the expansions, transforming actual code, and so on. Lisp has read/eval/print as its top loop; Python has something like readstring/evaluate/printstring as its top loop.
the Command Line Interface: an interactive shell. See for example IPython. Compare that to Common Lisp's SLIME.
The default Python shell is not really that powerful for interactive use:
Python 2.7.2 (default, Jun 20 2012, 16:23:33)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a+2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>>
You get an error message and that's it.
Compare that to the CLISP REPL:
rjmba:~ joswig$ clisp
i i i i i i i ooooo o ooooooo ooooo ooooo
I I I I I I I 8 8 8 8 8 o 8 8
I \ `+' / I 8 8 8 8 8 8
\ `-+-' / 8 8 8 ooooo 8oooo
`-__|__-' 8 8 8 8 8
| 8 o 8 8 o 8 8
------+------ ooooo 8oooooo ooo8ooo ooooo 8
Welcome to GNU CLISP 2.49 (2010-07-07) <http://clisp.cons.org/>
Copyright (c) Bruno Haible, Michael Stoll 1992, 1993
Copyright (c) Bruno Haible, Marcus Daniels 1994-1997
Copyright (c) Bruno Haible, Pierpaolo Bernardi, Sam Steingold 1998
Copyright (c) Bruno Haible, Sam Steingold 1999-2000
Copyright (c) Sam Steingold, Bruno Haible 2001-2010
Type :h and hit Enter for context help.
[1]> (+ a 2)
*** - SYSTEM::READ-EVAL-PRINT: variable A has no value
The following restarts are available:
USE-VALUE :R1 Input a value to be used instead of A.
STORE-VALUE :R2 Input a new value for A.
ABORT :R3 Abort main loop
Break 1 [2]>
CLISP uses Lisp's condition system to break into a debugger REPL. It presents some restarts. Within the error context, the new REPL provides extended commands.
Let's use the :R1 restart:
Break 1 [2]> :r1
Use instead of A> 2
4
[3]>
Thus you get interactive repair of programs and execution runs...
Python's interactive mode differs from Python's "read code from file" mode in several small but crucial ways, probably inherent in the textual representation of the language. Python is also not homoiconic, which is why I call it "interactive mode" rather than a "read-eval-print loop". That aside, I'd say that it is more a difference of degree than a difference in kind.
Now, for something that actually comes close to a "difference in kind": in a Python code file, you can easily insert blank lines:
def foo(n):
    m = n + 1

    return m
If you try to paste the identical code into the interpreter, it will consider the function to be "closed" and complain that you have a naked return statement at the wrong indentation. This does not happen in (Common) Lisp.
Furthermore, there are some rather handy convenience variables in Common Lisp (CL) that are not available (at least as far as I know) in Python. Both CL and Python have "value of last expression" (* in CL, _ in Python), but CL also has ** (value of expression before last) and *** (the value of the one before that) and +, ++ and +++ (the expressions themselves). CL also doesn't distinguish between expressions and statements (in essence, everything is an expression) and all of that does help build a much richer REPL experience.
As I said at the beginning, it is more a difference in degree than a difference in kind. But had the gap between them been only a smidgen wider, it would probably be a difference in kind as well.

Parse a .py file, read the AST, modify it, then write back the modified source code

I want to programmatically edit python source code. Basically I want to read a .py file, generate the AST, and then write back the modified python source code (i.e. another .py file).
There are ways to parse/compile python source code using standard python modules, such as ast or compiler. However, I don't think any of them support ways to modify the source code (e.g. delete this function declaration) and then write back the modified python source code.
UPDATE: The reason I want to do this is I'd like to write a Mutation testing library for python, mostly by deleting statements / expressions, rerunning tests and seeing what breaks.
Pythoscope does this to the test cases it automatically generates, as does the 2to3 tool for Python 2.6 (it converts Python 2.x source into Python 3.x source).
Both of these tools use the lib2to3 library, which is an implementation of the Python parser/compiler machinery that can preserve comments in the source when it's round-tripped from source -> AST -> source.
The rope project may meet your needs if you want to do more refactoring like transforms.
The ast module is your other option, and there's an older example of how to "unparse" syntax trees back into code (using the parser module). But the ast module is more useful when doing an AST transform on code that is then transformed into a code object.
The redbaron project also may be a good fit (ht Xavier Combelle)
The builtin ast module doesn't seem to have a method to convert back to source. However, the codegen module here provides a pretty printer for the ast that would enable you to do so.
e.g.

import ast
import codegen

expr = """
def foo():
    print("hello world")
"""
p = ast.parse(expr)
p.body[0].body = [ast.parse("return 42").body[0]]  # Replace function body with "return 42"
print(codegen.to_source(p))
This will print:
def foo():
    return 42
Note that you may lose the exact formatting and comments, as these are not preserved.
However, you may not need to. If all you require is to execute the replaced AST, you can do so simply by calling compile() on the ast, and execing the resulting code object.
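A minimal sketch of that last point, reusing the p tree from the snippet above:

import ast

# Continuing from the snippet above: p is the tree whose function body was
# replaced with "return 42". fix_missing_locations is harmless here and is
# required when nodes are constructed by hand without location info.
ast.fix_missing_locations(p)
code_object = compile(p, filename="<ast>", mode="exec")
namespace = {}
exec(code_object, namespace)
print(namespace["foo"]())  # 42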
Took a while, but Python 3.9 has this:
https://docs.python.org/3.9/whatsnew/3.9.html#ast
https://docs.python.org/3.9/library/ast.html#ast.unparse
ast.unparse(ast_obj)
Unparse an ast.AST object and generate a string with code that would produce an equivalent ast.AST object if parsed back with ast.parse().
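A short round trip with that 3.9+ API might look like this (a small sketch; the exact output formatting is up to the unparser):

import ast

tree = ast.parse("def foo(x): return 2 * x")
print(ast.unparse(tree))
# Expected output (formatting can vary slightly between versions):
# def foo(x):
#     return 2 * x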
In a different answer I suggested using the astor package, but I have since found a more up-to-date AST un-parsing package called astunparse:
>>> import ast
>>> import astunparse
>>> print(astunparse.unparse(ast.parse('def foo(x): return 2 * x')))
def foo(x):
    return (2 * x)
I have tested this on Python 3.5.
You might not need to re-generate source code. That's a bit dangerous for me to say, of course, since you have not actually explained why you think you need to generate a .py file full of code; but:
If you want to generate a .py file that people will actually use, maybe so that they can fill out a form and get a useful .py file to insert into their project, then you don't want to change it into an AST and back, because you'll lose all of the formatting (think of the blank lines that make Python so readable by grouping related sets of lines together; ast nodes only have lineno and col_offset attributes) and all of the comments. Instead, you'll probably want to use a templating engine (the Django template language, for example, is designed to make templating even text files easy) to customize the .py file, or else use Rick Copeland's MetaPython extension.
If you are trying to make a change during compilation of a module, note that you don't have to go all the way back to text; you can just compile the AST directly instead of turning it back into a .py file.
But in almost any and every case, you are probably trying to do something dynamic that a language like Python actually makes very easy, without writing new .py files! If you expand your question to let us know what you actually want to accomplish, new .py files will probably not be involved in the answer at all; I have seen hundreds of Python projects doing hundreds of real-world things, and not a single one of them needed to ever write a .py file. So, I must admit, I'm a bit of a skeptic that you've found the first good use-case. :-)
Update: now that you've explained what you're trying to do, I'd be tempted to just operate on the AST anyway. You will want to mutate by removing, not lines of a file (which could result in half-statements that simply die with a SyntaxError), but whole statements — and what better place to do that than in the AST?
Parsing and modifying the code structure is certainly possible with the help of the ast module, and I will show it in an example in a moment. However, writing back the modified source code is not possible with the ast module alone. There are other modules available for this job, such as the one here.
NOTE: Example below can be treated as an introductory tutorial on the usage of ast module but a more comprehensive guide on using ast module is available here at Green Tree snakes tutorial and official documentation on ast module.
Introduction to ast:
>>> import ast
>>> tree = ast.parse("print 'Hello Python!!'")
>>> exec(compile(tree, filename="<ast>", mode="exec"))
Hello Python!!
You can parse python code (represented as a string) by simply calling the API ast.parse(). This returns a handle to the Abstract Syntax Tree (AST) structure. Interestingly, you can compile this structure back and execute it as shown above.
Another very useful API is ast.dump(), which dumps the whole AST in string form. It can be used to inspect the tree structure and is very helpful in debugging. For example,
On Python 2.7:
>>> import ast
>>> tree = ast.parse("print 'Hello Python!!'")
>>> ast.dump(tree)
"Module(body=[Print(dest=None, values=[Str(s='Hello Python!!')], nl=True)])"
On Python 3.5:
>>> import ast
>>> tree = ast.parse("print ('Hello Python!!')")
>>> ast.dump(tree)
"Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Str(s='Hello Python!!')], keywords=[]))])"
Notice the difference in syntax for print statement in Python 2.7 vs. Python 3.5 and the difference in type of AST node in respective trees.
How to modify code using ast:
Now, let's have a look at an example of modifying Python code with the ast module. The main tool for modifying the AST structure is the ast.NodeTransformer class. Whenever one needs to modify the AST, one has to subclass it and write the node transformation(s) accordingly.
For our example, let's try to write a simple utility which transforms Python 2 print statements into Python 3 function calls.
Print statement to function call converter utility: print2to3.py:
#!/usr/bin/env python
'''
This utility converts the python (2.7) statements to Python 3 alike function calls before running the code.
USAGE:
    python print2to3.py <filename>
'''
import ast
import sys


class P2to3(ast.NodeTransformer):
    def visit_Print(self, node):
        new_node = ast.Expr(value=ast.Call(func=ast.Name(id='print', ctx=ast.Load()),
                                           args=node.values,
                                           keywords=[], starargs=None, kwargs=None))
        ast.copy_location(new_node, node)
        return new_node


def main(filename=None):
    if not filename:
        return
    with open(filename, 'r') as fp:
        data = fp.readlines()
        data = ''.join(data)
    tree = ast.parse(data)
    print "Converting python 2 print statements to Python 3 function calls"
    print "-" * 35
    P2to3().visit(tree)
    ast.fix_missing_locations(tree)
    # print ast.dump(tree)
    exec(compile(tree, filename="p23", mode="exec"))


if __name__ == '__main__':
    if len(sys.argv) <= 1:
        print ("\nUSAGE:\n\t print2to3.py <filename>")
        sys.exit(1)
    else:
        main(sys.argv[1])
This utility can be tried on a small example file, such as the one below, and it should work fine.
Test Input file : py2.py
class A(object):
    def __init__(self):
        pass

def good():
    print "I am good"

main = good

if __name__ == '__main__':
    print "I am in main"
    main()
Please note that the above transformation is only for tutorial purposes; in a real-world scenario one will have to look at all the different cases, such as print " x is %s" % ("Hello Python").
If you are looking at this in 2019, then you can use the libcst package. It has a syntax similar to ast. It works like a charm and preserves the code structure, which is basically helpful for projects where you have to preserve comments, whitespace, newlines, etc.
If you don't need to care about preserving comments, whitespace and the rest, then the combination of ast and astor works well.
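A minimal round-trip sketch with libcst (the sample source is illustrative; the point is that comments and whitespace survive untouched):

import libcst as cst

source = "x = 1  # keep this comment\n\n\ndef f():\n    return x\n"
module = cst.parse_module(source)

assert module.code == source  # the CST round-trips the source exactly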
I've recently created a quite stable (the core is really well tested) and extensible piece of code which generates code from an ast tree: https://github.com/paluh/code-formatter .
I'm using my project as a base for a small vim plugin (which I use every day), so my goal is to generate really nice and readable Python code.
P.S.
I've tried to extend codegen, but its architecture is based on the ast.NodeVisitor interface, so the formatters (visitor_ methods) are just functions. I found this structure quite limiting and hard to optimize (in the case of long and nested expressions it's easier to keep an object tree and cache some partial results; otherwise you can hit exponential complexity if you want to search for the best layout). BUT codegen, like every piece of mitsuhiko's work I've read, is very well written and concise.
One of the other answers recommends codegen, which seems to have been superseded by astor. The version of astor on PyPI (version 0.5 as of this writing) seems to be a little outdated as well, so you can install the development version of astor as follows.
pip install git+https://github.com/berkerpeksag/astor.git#egg=astor
Then you can use astor.to_source to convert a Python AST to human-readable Python source code:
>>> import ast
>>> import astor
>>> print(astor.to_source(ast.parse('def foo(x): return 2 * x')))
def foo(x):
    return 2 * x
I have tested this on Python 3.5.
Unfortunately, none of the answers above actually met both of these conditions:
Preserve the syntactical integrity for the surrounding source code (e.g keeping comments, other sorts of formatting for the rest of the code)
Actually use AST (not CST).
I've recently written a small toolkit to do pure AST-based refactorings, called refactor. For example, if you want to replace all placeholders with 42, you can simply write a rule like this:
class Replace(Rule):
    def match(self, node):
        assert isinstance(node, ast.Name)
        assert node.id == 'placeholder'
        replacement = ast.Constant(42)
        return ReplacementAction(node, replacement)
And it will find all acceptable nodes, replace them with the new nodes, and generate the final form:

--- test_file.py
+++ test_file.py
@@ -1,11 +1,11 @@
 def main():
-    print(placeholder * 3 + 2)
-    print(2 + placeholder + 3)
+    print(42 * 3 + 2)
+    print(2 + 42 + 3)
     # some commments
-    placeholder # maybe other comments
+    42 # maybe other comments
     if something:
         other_thing
-    print(placeholder)
+    print(42)

 if __name__ == "__main__":
     main()
We had a similar need, which wasn't solved by other answers here. So we created a library for this, ASTTokens, which takes an AST tree produced with the ast or astroid modules, and marks it with the ranges of text in the original source code.
It doesn't do modifications of code directly, but that's not hard to add on top, since it does tell you the range of text you need to modify.
For example, this wraps a function call in WRAP(...), preserving comments and everything else:
example = """
def foo(): # Test
'''My func'''
log("hello world") # Print
"""
import ast, asttokens
atok = asttokens.ASTTokens(example, parse=True)
call = next(n for n in ast.walk(atok.tree) if isinstance(n, ast.Call))
start, end = atok.get_text_range(call)
print(atok.text[:start] + ('WRAP(%s)' % atok.text[start:end]) + atok.text[end:])
Produces:
def foo(): # Test
'''My func'''
WRAP(log("hello world")) # Print
Hope this helps!
A Program Transformation System is a tool that parses source text, builds ASTs, and allows you to modify them using source-to-source transformations ("if you see this pattern, replace it by that pattern"). Such tools are ideal for mutating existing source code: mutations are just "if you see this pattern, replace it by a pattern variant".
Of course, you need a program transformation engine that can parse the language of interest to you, and still do the pattern-directed transformations. Our DMS Software Reengineering Toolkit is a system that can do that, and handles Python, and a variety of other languages.
See this SO answer for an example of a DMS-parsed AST for Python capturing comments accurately. DMS can make changes to the AST, and regenerate valid text, including the comments. You can ask it to prettyprint the AST, using its own formatting conventions (you can changes these), or do "fidelity printing", which uses the original line and column information to maximally preserve the original layout (some change in layout where new code is inserted is unavoidable).
To implement a "mutation" rule for Python with DMS, you could write the following:
rule mutate_addition(s:sum, p:product):sum->sum =
    " \s + \p " -> " \s - \p"
    if mutate_this_place(s);
This rule replaces "+" with "-" in a syntactically correct way; it operates on the AST and thus won't touch strings or comments that happen to look right. The extra condition on "mutate_this_place" is to let you control how often this occurs; you don't want to mutate every place in the program.
You'd obviously want a bunch more rules like this that detect various code structures, and replace them by the mutated versions. DMS is happy to apply a set of rules. The mutated AST is then prettyprinted.
I used to use baron for this, but have now switched to parso because it's up to date with modern python. It works great.
I also needed this for a mutation tester. It's really quite simple to make one with parso; check out my code at https://github.com/boxed/mutmut
