Recursive regular expressions for defining syntax using 'fr' strings


When creating grammar rules for a language I am making, I would like to be able to check the syntax and step through it, rather than rely on my current method, which often misses syntax errors.
I've started off using regular expressions to define the grammar like so:
add = r"(\+)"
sub = r"(-)"
mul = r"(\*)"
div = r"(\\)"
pow = r"(\^)"
bin_op = fr"({add}|{sub}|{mul}|{div}|{pow})"
open_br = r"(\()"
close_br = r"(\))"
open_sq = r"(\[)"
close_sq = r"(\])"
dot = r"(\.)"
short_id = r"([A-Za-z]\d*)" # e.g. "a1", "b1232", etc.
long_id = r"([A-Za-z0-9]+)" # e.g. "sin2", "tan", etc. for use in assignment
long_id_ref = r"('" + long_id + "')" # e.g. "'sin'", for referencing
# note that "'sin'" is one value while "sin" = "s" * "i" * "n"
id_assign = fr"({short_id}|{long_id})" # for assignment
id_ref = fr"({short_id}|{long_id_ref})" # for reference (apostrophes)
integer = r"(\d+)" # e.g. 123
float = fr"(\d+{dot}\d+)" # e.g. 3.4
operand = fr"({integer}|{float}|{id_ref})"
The issue is that definitions may be recursive. For example, in expression = fr"{expression}{bin_op}{expression}|({open_br}{expression}{close_br})|({expression}{open_sq}{expression}{close_sq})|..." several of the alternatives refer to expression itself. Since expression is not yet defined at the point where it is being defined, an error would be raised.
It seems that (?R) would not work, since it would copy everything before it rather than the whole string. Does Python's regex have a way of dealing with this, or do I have to create my own BNF or regex interpreter that supports recursion?
Alternatively would it be feasible to use regular expressions but not use any recursion?
I know that there are 3rd-party applications that can help with this but I'd like to be able to do it all myself without external code.
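One way to get recursion without external code is to keep regular expressions for the tokens only and express the recursive rules as ordinary functions that call each other (a recursive-descent recognizer). The following is a minimal sketch under stated assumptions, not a full implementation of the grammar above: the left-recursive rule is rewritten as expression := primary (bin_op primary)*, "/" is assumed for division, operator precedence, unary minus and implicit multiplication are ignored, and the names TOKEN_RE, tokenize, Parser and check_syntax are made up for this illustration.

```python
import re

# Regular expressions are used only to recognise individual tokens.
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<number>\d+(?:\.\d+)?)
      | (?P<ident>'[A-Za-z0-9]+'|[A-Za-z]\d*)
      | (?P<op>[+\-*/^])
      | (?P<punct>[()\[\]])
    )""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs; raise SyntaxError on anything else."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


class Parser:
    """Recursion lives in the methods, not in the regular expressions."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def expect(self, value):
        if self.peek()[1] != value:
            raise SyntaxError(f"expected {value!r} at token {self.pos}")
        self.pos += 1

    def expression(self):
        # expression := primary (bin_op primary)*
        self.primary()
        while self.peek()[0] == "op":
            self.pos += 1
            self.primary()

    def primary(self):
        # primary := atom ('[' expression ']')*
        self.atom()
        while self.peek()[1] == "[":
            self.pos += 1
            self.expression()
            self.expect("]")

    def atom(self):
        # atom := number | ident | '(' expression ')'
        kind, value = self.peek()
        if kind in ("number", "ident"):
            self.pos += 1
        elif value == "(":
            self.pos += 1
            self.expression()
            self.expect(")")
        else:
            raise SyntaxError(f"unexpected token {value!r} at token {self.pos}")


def check_syntax(text):
    """Return True if text is a valid expression, else raise SyntaxError."""
    parser = Parser(tokenize(text))
    parser.expression()
    if parser.pos != len(parser.tokens):
        raise SyntaxError(f"unexpected token {parser.peek()[1]!r} at token {parser.pos}")
    return True
```

For example, check_syntax("a1 + (b2 * 3.4)[0] ^ 'sin'") returns True, while check_syntax("a1 + * 2") raises SyntaxError at the misplaced operator. Because each rule is a method, it is also possible to step through the parse in a debugger.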

Related

pyparsing optional parenthesis around an expression: pp.Opt(Suppress()) vs. nested_expr

QUESTIONS
This is a long post, so I will highlight my main two questions now before giving details:
How can one succinctly allow for optional matched parentheses/brackets around an expression?
How does one properly parse the content of nested_expr? This answer suggests that this function is not quite appropriate for this, and infix_notation is better, but that doesn't seem to fit my use case (I don't think).
DETAILS
I am working on a grammar to parse prolog strings. The data I have involves a lot of optional brackets or parentheses.
For example, both predicate([arg1, arg2, arg3]) and predicate(arg1, arg2, arg3) are legal and appear in the data.
My full grammar is a little complicated, and likely could be cleaned up, but I will paste it here for reproducibility. I have a couple versions of the grammar as I found new data that I had to account for. The first one works with the following example string:
pred(Var, arg_name1:arg#arg_type, arg_name2:(sub_arg1, sub_arg2))
For some visual clarity, I am turning the parsed strings into graphs, so this is what this one should look like:
Note that the arg_name2:(sub_arg1, sub_arg2) is slightly idiosyncratic syntax where the things inside the parens are supposed to be thought of as having an AND operator between them. The only thing indicating this is the fact that this wrapped expression essentially appears "naked" (i.e. has no predicate name of its own, it's just some values lumped together with parens).
VERSION 1: works on the above string
# GRAMMAR VER 1
predication = pp.Forward()
join_predication = pp.Forward()
entity = pp.Forward()
args_list = pp.Forward()
# atoms are used either as predicate names or bottom level argument values
# str_atoms are just quoted strings which may also appear as arguments
atom = pp.Word(pp.alphanums + '_' + '.')
str_atom = pp.QuotedString("'")
# TYPICAL ARGUMENT: arg_name:ARG_VALUE, where the ARG_VALUE may be an entity, join_predication, predication, or just an atom.
# Note that the arg_name is optional and may not always appear
# EXAMPLES:
# with name: pred(arg1:val1, arg2:val2)
# without name: pred(val1, val2)
argument = pp.Group(pp.Opt(atom("arg_name") + pp.Suppress(":")) + (entity | join_predication | predication | atom("arg_value") | str_atom("arg_value")))
# List of arguments
args_list = pp.Opt(pp.Suppress("[")) + pp.delimitedList(argument) + pp.Opt(pp.Suppress("]"))
# As in the example string above, sometimes predications are grouped together in parentheses and are meant to be understood as having an AND operator between them when evaluating the truth of both together
# EXAMPLE: pred(arg1:(sub_pred1, subpred2))
# I am just treating it as an args_list inside obligatory parentheses
join_predication <<= pp.Group(pp.Suppress("(") + args_list("args_list") + pp.Suppress(")"))("join_predication")
# pred_name with optional arguments (though I've never seen one without arguments, just in case)
predication <<= pp.Group(atom("pred_name") + pp.Suppress("(") + pp.Opt(args_list)("args_list") + pp.Suppress(")"))("predication")
# ent_name with optional arguments and a #type
entity <<= (pp.Group(((atom("ent_name")
+ pp.Suppress("(") + pp.Opt(args_list)("args_list") + pp.Suppress(")"))
| str_atom("ent_name") | atom("ent_name"))
+ pp.Suppress("#") + atom("type"))("entity"))
# starter symbol
lf_fragment = entity | join_predication | predication
Although this works, I came across another very similar string which used brackets instead of parentheses for a join_predication:
pred(Var, arg_name1:arg#arg_type, arg_name2:[sub_arg1, sub_arg2])
This broke my parser, seemingly because brackets are also used in other places and are often optional, so a bracket could be matched by the wrong parser element: I am doing nothing to enforce that an opening and a closing bracket must go together. For this I thought to turn to nested_expr, but this caused further problems because, as mentioned in this answer, parsing the elements inside of a nested_expr doesn't work very well, and I have lost a lot of the substructure I need for the graphs I'm building.
VERSION 2: using nested_expr
# only including those expressions that have been changed
# args_list might not have brackets
args_list = pp.nested_expr("[", "]", pp.delimitedList(argument)) | pp.delimitedList(argument)
# join_predication is an args_list with obligatory wrapping parens/brackets
join_predication <<= pp.nested_expr("(", ")", args_list("args_list"))("join_predication") | pp.nested_expr("[", "]", args_list("args_list"))("join_predication")
I likely need to ensure matching for predication and entity, but haven't for now.
Using the above grammar, I can parse both example strings, but I lose the named structure that I had before.
In the original grammar, parse_results['predication']['args_list'] was a list of every argument, exactly as I expected. In the new grammar, it only contains the first argument, Var, in the example strings.

Convert custom formula to python function [duplicate]

This question already has answers here:
Creating a function object from a string
(3 answers)
Closed 6 years ago.
Consider that we have the following input
formula = "(([foo] + [bar]) - ([baz]/2) )"
function_mapping = {
"foo" : FooFunction,
"bar" : BarFunction,
"baz" : BazFunction,
}
Is there any Python library that lets me parse the formula and convert it into a Python function representation?
e.g.
converted_formula = ((FooFunction() + BarFunction()) - (BazFunction()/2))
I am currently looking into something like
In [11]: ast = compiler.parse(formula)
In [12]: ast
Out[12]: Module(None, Stmt([Discard(Sub((Add((List([Name('foo')]), List([Name('bar')]))), Div((List([Name('baz')]), Const(2))))))]))
and then process this ast tree further.
Do you know of any cleaner alternate solution?
Any help or insight is much appreciated!

You could use the re module to do what you want via regular-expression pattern matching and relatively straight-forward text substitution.
import re
alias_pattern = re.compile(r'''(?:\[(\w+)\])''')
def mapper(mat):
    func_alias = mat.group(1)
    function = function_alias_mapping.get(func_alias)
    if not function:
        raise NameError(func_alias)
    return function.__name__ + '()'
# must be defined before anything can be mapped to them
def FooFunction(): return 15
def BarFunction(): return 30
def BazFunction(): return 6
function_alias_mapping = dict(foo=FooFunction, bar=BarFunction, baz=BazFunction)
formula = "(([foo] + [bar]) - ([baz]/2))" # Custom formula.
converted_formula = re.sub(alias_pattern, mapper, formula)
print('converted_formula = "{}"'.format(converted_formula))
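# define the contexts and the function in which to evaluate the formula expression
# eval() looks up '__builtins__' in the globals dict, so it is set there to block access to built-ins
global_context = dict(FooFunction=FooFunction,
                      BarFunction=BarFunction,
                      BazFunction=BazFunction,
                      __builtins__={})
local_context = {}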
function = lambda: eval(converted_formula, global_context, local_context)
print('answer = {}'.format(function())) # call function
Output:
converted_formula = "((FooFunction() + BarFunction()) - (BazFunction()/2))"
answer = 42

You can use what's called string formatting to accomplish this.
function_mapping = {
"foo" : FooFunction(),
"bar" : BarFunction(),
"baz" : BazFunction(),
}
formula = "(({foo} + {bar}) - ({baz}/2) )".format( **function_mapping )
This gives you the formula string with the values returned by FooFunction(), BarFunction() and BazFunction() substituted in.
But the functions are executed as soon as the dictionary is built, so perhaps a better solution would be
function_mapping = {
"foo" : "FooFunction",
"bar" : "BarFunction",
"baz" : "BazFunction",
}
formula = "(({foo}() + {bar}()) - ({baz}()/2) )".format( **function_mapping )
This will give you the string '((FooFunction() + BarFunction()) - (BazFunction()/2) )' which you can then execute at any time with the eval function.
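A runnable sketch of this second variant, for illustration only. The three function bodies (returning 15, 30 and 6) are assumed stand-ins, and the namespace dictionary is a made-up name; only the functions listed in it are visible to eval.

```python
# Stand-in implementations (assumed); the real functions could do anything.
def FooFunction(): return 15
def BarFunction(): return 30
def BazFunction(): return 6

function_mapping = {
    "foo": "FooFunction",
    "bar": "BarFunction",
    "baz": "BazFunction",
}

# Substitute the function names; nothing is called at this point.
formula = "(({foo}() + {bar}()) - ({baz}()/2))".format(**function_mapping)

# Evaluate later, exposing only the mapped functions and no built-ins.
namespace = {
    "FooFunction": FooFunction,
    "BarFunction": BarFunction,
    "BazFunction": BazFunction,
    "__builtins__": {},
}
result = eval(formula, namespace)
print(formula)
print(result)
```

The functions are only called when eval runs, so the same formula string can be evaluated again later to pick up new return values.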

If you change the syntax used in the formulas slightly, (another) way to do this, as I mentioned in a comment, would be to use string.Template substitution.
Out of curiosity I decided to find out if this other approach was viable, and consequently was able to come up with a better answer: not only is it simpler than my other one, it's also a little more flexible, in that it would be easy to add arguments to the functions being called, as noted in a comment below.
from string import Template
def FooFunction(): return 15
def BarFunction(): return 30
def BazFunction(): return 6
formula = "(($foo + $bar) - ($baz/2))"
function_mapping = dict(foo='FooFunction()', # note these calls could have args
bar='BarFunction()',
baz='BazFunction()')
converted_formula = Template(formula).substitute(function_mapping)
print('converted_formula = "{}"'.format(converted_formula))
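# define the contexts in which to evaluate the expression
# eval() looks up '__builtins__' in the globals dict, so it is set there to block access to built-ins
global_context = dict(FooFunction=FooFunction,
                      BarFunction=BarFunction,
                      BazFunction=BazFunction,
                      __builtins__={})
local_context = {}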
function = lambda: eval(converted_formula, global_context, local_context)
answer = function() # call it
print('answer = {}'.format(answer))
As a final note, notice that string.Template supports different kinds of Advanced usage which would allow you to fine-tune the expression syntax even further, because internally it uses the re module (in a more sophisticated way than I did in my original answer).
For the cases where the mapped functions all return values that can be represented as Python literals (like numbers) and aren't being called just for the side-effects they produce, you could make the following modification, which effectively caches (aka memoizes) the results:
function_cache = dict(foo=FooFunction(), # calls and caches function results
bar=BarFunction(),
baz=BazFunction())
def evaluate(formula):
    print('formula = {!r}'.format(formula))
    converted_formula = Template(formula).substitute(function_cache)
    print('converted_formula = "{}"'.format(converted_formula))
    return eval(converted_formula, global_context, local_context)
print('evaluate(formula) = {}'.format(evaluate(formula)))
Output:
formula = '(($foo + $bar) - ($baz/2))'
converted_formula = "((15 + 30) - (6/2))"
evaluate(formula) = 42

How to convert a Python expression into a string

I would like to convert a user input expression to a string.
The reason is that the input expression is user defined. I want to output the result of the expression, and print the statement which led to this result.
import sys
import shutil
expression1 = sys.path
expression2 = shutil.which
def get_expression_str(expression):
    if callable(expression):
        return expression.__module__ + '.' + expression.__name__
    else:
        raise TypeError('Could not convert expression to string')
#print(get_expression_str(expression1))
# returns : builtins.TypeError: Could not convert expression to string
#print(get_expression_str(expression2))
# returns : shutil.which
#print(str(expression1))
#results in a list like ['/home/bernard/clones/it-should-work/unit_test', ... ,'/usr/lib/python3/dist-packages']
#print(repr(expression1))
#results in a list like ['/home/bernard/clones/it-should-work/unit_test', ... ,'/usr/lib/python3/dist-packages']
I looked into the Python inspect module but even
inspect.iscode(sys.path)
returns False
For those who wonder why this is the reverse of parsing a string into an expression using functools.partial, see parse statement string.
Background.
A program should work. It should, but it does not always do so, because a program needs specific resources: OS, OS version, other packages, files, etc. Every program has different requirements (resources) to function properly.
Which specific requirements are needed cannot be predicted. The system knows best which resources are and are not available. So instead of manually checking all settings and configurations, let a helper program do this for you.
So the user, or the developer of a program, specifies the requirements together with statements on how to retrieve this information: expressions. These could be executed using eval. Could. As mentioned on StackOverflow, eval is evil.
Use of eval is hard to make secure using a blacklist, see: http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
Using multiple tips from SO, I use a namedtuple holding a string, to compare with the user input string, and a function.
A whitelist is better than a blacklist. An expression is returned only if the parsed expression string matches a "bare_expression".
This whitelist contains more information on how to process, for example, the "unit_of_measurement". It goes too far to explain what and why, but this is needed. The list of namedtuples is much more than just a whitelist and is defined as:
Expr_UOfM = collections.namedtuple('Expr_UOfM', ['bare_expression', 'keylist', 'function', 'unit_of_measurement', 'attrlist'])
The namedtuple which match a (very limited) list:
Exp_list = [Expr_UOfM('sys.path', '' , sys.path, un.STR, []),
Expr_UOfM('shutil.which', '', shutil.which, None, [])]
This list may be very long and its content is crucial for correct further processing. Note that the first and third fields are very similar. There should be a single point of reference, but for me this is not possible at the moment. Note that the string 'sys.path' is equal to (a part of) the user input, and the expression sys.path is part of the namedtuple list. This is a good separation, limiting possible abuse.
If the string and the expression are not 100% identical, weird behavior may occur which is very hard to debug.
So I want to use the get_expression_str function to check whether the first and third fields are identical, just for total robustness of the program.
I use Python 3.4
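One possible way to make that consistency check is to go in the other direction: resolve the dotted string to an object and compare it by identity with the object stored in the third field. This also works for non-callables such as sys.path. The sketch below is only an illustration under assumptions: resolve_dotted_name and matches_whitelist_entry are hypothetical helper names, the first part of the string is assumed to be an importable top-level module, and it should only be fed the trusted strings from the whitelist itself, since importing a module executes its code.

```python
import importlib
import shutil
import sys


def resolve_dotted_name(dotted_name):
    """Import the first component, then follow the remaining attributes."""
    parts = dotted_name.split('.')
    obj = importlib.import_module(parts[0])
    for part in parts[1:]:
        obj = getattr(obj, part)
    return obj


def matches_whitelist_entry(bare_expression, expected_object):
    """True if the string names exactly the object stored next to it."""
    return resolve_dotted_name(bare_expression) is expected_object


print(matches_whitelist_entry('sys.path', sys.path))
print(matches_whitelist_entry('shutil.which', shutil.which))
print(matches_whitelist_entry('sys.path', shutil.which))
```

The first two calls print True and the last prints False. A mismatched pair in the whitelist could therefore be detected once at start-up, before any user input is processed.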

You could use inspect.getsource() and wrap your expression in a lambda. Then you can get an expression with this function:
import inspect
def lambda_to_expr_str(lambda_fn):
    """c.f. https://stackoverflow.com/a/52615415/134077"""
    if not lambda_fn.__name__ == "<lambda>":
        raise ValueError('Tried to convert non-lambda expression to string')
    else:
        lambda_str = inspect.getsource(lambda_fn).strip()
        expression_start = lambda_str.index(':') + 1
        expression_str = lambda_str[expression_start:].strip()
        if expression_str.endswith(')') and '(' not in expression_str:
            # i.e. l = lambda_to_expr_str(lambda x: x + 1) => x + 1)
            expression_str = expression_str[:-1]
        return expression_str
Usage:
$ lambda_to_expr_str(lambda: sys.executable)
> 'sys.executable'
OR
$ f = lambda: sys.executable
$ lambda_to_expr_str(f)
> 'sys.executable'
And then eval with
$ eval(lambda_to_expr_str(lambda: sys.executable))
> '/usr/bin/python3.5'
Note that you can take parameters with this approach and pass them with the locals param of eval.
$ l = lambda_to_expr_str(lambda x: x + 1) # now l == 'x + 1'
$ eval(l, None, {'x': 1})
> 2
Here be Dragons. There are many pathological cases with this approach:
$ l, z = lambda_to_expr_str(lambda x: x + 1), 1234
$ l
> 'x + 1), 1234'
This is because inspect.getsource gets the entire line of code the lambda was declared on. Getting the source of functions declared with def would avoid this problem; however, passing a function body to eval is not possible as there could be side effects, i.e. setting variables, etc. Lambdas can produce side effects as well in Python 2, so even more dragons lie in pre-Python-3 land.

Why not use eval?
>>> exp1 = "sys.path"
>>> exp2 = "[x*x for x in [1,2,3]]"
>>> eval(exp1)
['', 'C:\\Python27\\lib\\site-packages\\setuptools-0.6c11-py2.7.egg', 'C:\\Pytho
n27\\lib\\site-packages\\pip-1.1-py2.7.egg', 'C:\\Python27\\lib\\site-packages\\
django_celery-3.1.1-py2.7.egg', 'C:\\Python27\\lib\\site-packages\\south-0.8.4-p
y2.7.egg', 'C:\\Windows\\system32\\python27.zip', 'C:\\Python27\\DLLs', 'C:\\Pyt
hon27\\lib', 'C:\\Python27\\lib\\plat-win', 'C:\\Python27\\lib\\lib-tk', 'C:\\Py
thon27', 'C:\\Python27\\lib\\site-packages', 'C:\\Python27\\lib\\site-packages\\
PIL']
>>> eval(exp2)
[1, 4, 9]

Check if a formula is a term in Z3Py

In Z3Py, I need to check if something is a term using the standard grammar term := const | var | f(t1,...,tn). I have written the following function to determine that, but my way of checking the n-ary function case doesn't seem optimal.
Is there a better way to do so? Utility functions such as is_term, is_atom, is_literal, etc. would be useful to include in Z3. I will put them in the contrib section.
CONNECTIVE_OPS = [Z3_OP_NOT,Z3_OP_AND,Z3_OP_OR,Z3_OP_IMPLIES,Z3_OP_IFF,Z3_OP_ITE]
REL_OPS = [Z3_OP_EQ,Z3_OP_LE,Z3_OP_LT,Z3_OP_GE,Z3_OP_GT]
def is_term(a):
    """
    term := const | var | f(t1,...,tn)
    """
    if is_const(a):
        return True
    else:
        r = (is_app(a) and \
             a.decl().kind() not in CONNECTIVE_OPS + REL_OPS and \
             all(is_term(c) for c in a.children()))
        return r

The function is reasonable, a few comments:
It depends on what you mean by "var" in your specification. Z3 represents variables as de Bruijn indices. There is a function in z3py, is_var(a), to check if a is a variable index.
There is another Boolean connective, Z3_OP_XOR.
There are additional relational operations, such as operations that compare bit-vectors.
It depends on your intent and usage of the code, but you could alternatively check whether the sort of the expression is Boolean and, if it is, ensure that the head function symbol is uninterpreted.
is_const(a) is defined as return is_app(a) and a.num_args() == 0. So is_const is really handled by the default case.
Expressions that Z3 creates as a result of simplification, parsing or other transformations may have many shared sub-expressions. So a straightforward recursive descent can take exponential time in the DAG size of the expression. You can deal with this by maintaining a hash table of visited nodes. From Python you can use Z3_get_ast_id to retrieve a unique number for the expression and maintain this in a set. The identifiers are unique as long as terms are not garbage collected, so you should just maintain such a set as a local variable.
So, something along the lines of:
def get_expr_id(e):
    return Z3_get_ast_id(e.ctx.ref(), e.ast)
def is_term_aux(a, seen):
    if get_expr_id(a) in seen:
        return True
    else:
        seen[get_expr_id(a)] = True
        r = (is_app(a) and \
             a.decl().kind() not in CONNECTIVE_OPS + REL_OPS and \
             all(is_term_aux(c, seen) for c in a.children()))
        return r
def is_term(a):
    return is_term_aux(a, {})

The "textbook" definitions of term, atom and literal used in first-order logic cannot be directly applied to Z3 expressions. In Z3, we allow expressions such as f(And(a, b)) > 0 and f(ForAll([x], g(x) == 0)), where f is a function from Boolean to Integer. These extensions do not increase the expressivity, but they are very convenient when writing problems. The SMT 2.0 standard also allows "term" if-then-else expressions. This is another feature that allows us to nest "formulas" inside "terms". Example: g(If(And(a, b), 1, 0)).
When implementing procedures that manipulate Z3 expressions, we sometimes need to distinguish between Boolean and non-Boolean expressions. In this case, a "term" is just an expression that does not have Boolean sort.
def is_term(a):
    return not is_bool(a)
In other instances, we want to process the Boolean connectives (And, Or, ...) in a special way. For example, we may be defining a CNF translator. In this case, we define an "atom" as any Boolean expression that is not a quantifier and is either a (free) variable or an application that is not one of the Boolean connectives.
def is_atom(a):
    return is_bool(a) and (is_var(a) or (is_app(a) and a.decl().kind() not in CONNECTIVE_OPS))
After we define an atom, a literal can be defined as:
def is_literal(a):
    return is_atom(a) or (is_not(a) and is_atom(a.arg(0)))
Here is an example that demonstrates these functions (also available online at rise4fun):
x = Int('x')
p, q = Bools('p q')
f = Function('f', IntSort(), BoolSort())
g = Function('g', IntSort(), IntSort())
print(is_literal(Not(x > 0)))
print(is_literal(f(x)))
print(is_atom(Not(x > 0)))
print(is_atom(f(x)))
print(is_atom(x))
print(is_term(f(x)))
print(is_term(g(x)))
print(is_term(x))
print(is_term(Var(1, IntSort())))

Python. Transform str('+') to mathematical operation

How can I transform str('+') into a mathematical operation?
For example:
a = [0,1,2] # or a = ['0','1','2']
b = ['+','-','*']
c = int(a[0]+b[0]+a[1])
In other words, how can I transform str('-1*2') into an int, without something like for i in c: if i == '+': ...
Thanks.

You can also use the operator module:
import operator as op
#Create a mapping between the string and the operator:
ops = {'+': op.add, '-': op.sub, '*': op.mul}
a = [0,1,2]
b = ['+','-','*']
#use the mapping
c = ops[b[0]](a[0], a[1])

I think you're looking for eval(), but I advise using something else...
however,
>>> eval('-1*2')
-2
eval 'executes' the string you pass to it, like code, so it's quite dangerous for security, especially if the parameters are user input...
in this case I suggest using a parsing library, such as ply http://www.dabeaz.com/ply/
which is really simple to use for such things and very effective :)

If your math expressions will fit Python syntax but you are skeered of eval (you should be) you can look into python's ast module (docs). It will parse the expression into an abstract syntax tree you can iterate over. You can evaluate a limited subset of Python and throw errors if you encounter anything outside your expression grammar.
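A minimal sketch of that idea, assuming only integer and float literals with +, -, * and / are allowed. The names safe_eval, _evaluate, _BINARY and _UNARY are made up for this illustration, and ast.Constant assumes Python 3.8 or newer.

```python
import ast
import operator

# The only operations this sketch accepts; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a whitelisted subset of expression nodes."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression: " + type(node).__name__)


def safe_eval(expression):
    """Parse the string as a single expression and evaluate it without eval()."""
    tree = ast.parse(expression, mode="eval")
    return _evaluate(tree.body)


print(safe_eval('-1*2'))  # -2
```

Names, function calls, attribute access and every other node type fall through to the ValueError, so something like safe_eval("__import__('os')") is rejected rather than executed.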

You can read about http://en.wikipedia.org/wiki/Reverse_Polish_notation
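As a rough illustration of why that notation helps: in Reverse Polish notation the operator follows its operands, so an expression can be evaluated with a single stack and no parentheses or precedence rules. The sketch below assumes whitespace-separated integer tokens and only the three operators from the question; eval_rpn and OPS are made-up names.

```python
import operator

# Map each operator symbol to the function that implements it.
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}


def eval_rpn(tokens):
    """Evaluate a sequence of RPN tokens such as ['0', '1', '+', '2', '*']."""
    stack = []
    for token in tokens:
        if token in OPS:
            if len(stack) < 2:
                raise ValueError("not enough operands for " + token)
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(int(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(eval_rpn("0 1 + 2 *".split()))  # (0 + 1) * 2 = 2
```

Converting ordinary infix input such as '-1*2' into this form is a separate step that is not shown here.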

Use eval like everyone else is saying but filter it first.
>>> s = '1 + 12 / 2 - 12*31'
>>> allowed = set(' 1234567890()+-/*\\')
>>> if allowed.issuperset(s):
...     eval(s)
...
-365

Use eval:
> eval(str('-1*2'))
> -2

Dett,
A Simple eval on the whole string... However be aware that if the user inputs the string, eval is risky, unless you do some parsing first
x = '-1*2'
y = eval(x)
y will then be the integer value.
