What are `lexpr` and `ApplicationExpression` in nltk?

What exactly does lexpr mean, and what do strings like the following r'\F x.x' mean? Also, what is ApplicationExpression?
from nltk.sem.logic import *
lexpr = Expression.fromstring
zero = lexpr(r'\F x.x')
one = lexpr(r'\F x.F(x)')
two = lexpr(r'\F x.F(F(x))')
three = lexpr(r'\F x.F(F(F(x)))')
four = lexpr(r'\F x.F(F(F(F(x))))')
succ = lexpr(r'\N F x.F(N(F,x))')
plus = lexpr(r'\M N F x.M(F,N(F,x))')
mult = lexpr(r'\M N F.M(N(F))')
pred = lexpr(r'\N F x.(N(\G H.H(G(F)))(\u.x)(\u.u))')
v1 = ApplicationExpression(succ, zero).simplify()

See http://goo.gl/zog68k, nltk.sem.logic.Expression is:
"""This is the base abstract object for all logical expressions"""
There are many types of logical expressions implemented in nltk. See line 1124, the ApplicationExpression is:
This class is used to represent two related types of logical expressions.
The first is a Predicate Expression, such as "P(x,y)". A predicate expression is comprised of a FunctionVariableExpression or
ConstantExpression as the predicate and a list of Expressions as the arguments.
The second is an application of one expression to another, such as
"(\x.dog(x))(fido)".
The reason Predicate Expressions are treated as Application Expressions is
that the Variable Expression predicate of the expression may be replaced
with another Expression, such as a LambdaExpression, which would mean that
the Predicate should be thought of as being applied to the arguments.
The logical expression reader will always curry arguments in an application expression.
So, "\x y.see(x,y)(john,mary)" will be represented internally as
"((\x y.(see(x))(y))(john))(mary)". This simplifies the internals since
there will always be exactly one argument in an application.
The str() method will usually print the curried forms of application
expressions. The one exception is when the application expression is
really a predicate expression (ie, underlying function is an
AbstractVariableExpression). This means that the example from above
will be returned as "(\x y.see(x,y)(john))(mary)".
I'm not exactly an expert in formal logics but your code above is trying to declare a logical function variable x:
>>> from nltk.sem.logic import *
>>> lexpr = Expression.fromstring
>>> zero = lexpr(r'\F x.x')
>>> succ = lexpr(r'\N F x.F(N(F,x))')
>>> v1 = ApplicationExpression(succ, zero).simplify()
>>> v1
<LambdaExpression \F x.F(x)>
>>> print v1
\F x.F(x)
For a crash course, see http://theory.stanford.edu/~arbrad/slides/cs156/lec2-4.pdf and a nltk crash course to lambda expressions, see http://www.cs.utsa.edu/~bylander/cs5233/nltk-intro.pdf

You are looking at a small part of quite a complicated toolkit. Below I try to give some background from a bit of research on the web, or you can just skip to the "direct answer" section if you like. I'll try to answer your question on the specific part you quote, but I am not an expert on either philosophical logic or natural language processing. The more I read about it, the less I seem to know, but I've included a load of hopefully useful references.
Description of tool / principles/ introduction
The code you've posted is a sub-series of the regression tests for the logic module of the Natural Language toolkit for python (NLTK). This toolkit is described in a fairly accessible academic paper here, seemingly written by the authors of the tool. It describes the motivation for the toolkit and writing the logic module - in a nutshell to help automate interpretation of natural language.
The code you've posted defines a number of logical forms (LFs as they are referred to in the paper I linked). LFs cover statements in First order predicate logic, combined with the lambda operator (i.e. first order lambda calculus). I will not attempt to completely describe First order predicate logic here. There's a tutorial on lambda calculus here.
The code comes from a set of regression tests (i.e. demonstrations that the toolbox works correctly on simple, known example cases) on the howto page, which exercise the toolbox by using it to do simple arithmetic operations. They are an exact encoding of this approach to arithmetic via lambda calculus (Wikipedia link) in the nltk toolkit.
The first five definitions are the numbers zero through four in lambda calculus (Church encoding). The next four are arithmetic operators: succ (successor), plus (addition), mult (multiplication) and pred (predecessor). You have not got the tests that go along with these, so at the moment you simply have a number of LFs, followed by one example of lambda calculus that combines two of these LFs (succ and zero) to get v1. Since you have applied succ to zero, the result should be one, and that is what they test for on the howto page, i.e. v1 == one should evaluate to True.
Direct answer to python bits
Let's go through the elements of the code you've posted one by one.
lexpr is the function that generates Logical EXPRessions. It is an alias for Expression.fromstring, created by the line lexpr = Expression.fromstring.
It takes a string argument. The r before the string tells Python to interpret it as a raw string literal. For the purposes of this question, that means we don't have to escape the \ symbol.
Within the strings, \ is the lambda operator.
F and x are both bound variables in lambda calculus; by convention F stands for a function and x for its argument.
The . (dot) separates the bound variables from the body of the expression / abstraction.
So - to take the string you quote in the question:
r'\F x.x'
It is the Church Encoding of zero. Church encoding is pretty abstract and hard to get your head round. This tutorial might help - I think I'm starting to get it... Unfortunately the example you've chosen is zero and from what I can work out, this is a definition, rather than something you can derive. It can't be "evaluated to 0" in any meaningful sense. This is the simplest explanation I've found. I'm not in a position to comment on its rigour / correctness.
A Church numeral is a procedure that takes one argument, and that argument is itself another procedure that also takes one argument. The procedure zero represents the integer 0 by returning a procedure that applies its input procedure zero times
Finally, the ApplicationExpression takes one expression and applies it to the other, in this case applying succ (successor) to zero. This is, aptly, called an application in lambda calculus.
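To make the encoding less abstract, here is a minimal sketch of the same Church numerals written as plain Python lambdas instead of NLTK strings. It is not taken from the NLTK tests: the curried form and the helper to_int are my own assumptions for illustration.

```python
# Church numerals as plain Python lambdas (curried versions of the NLTK strings).
zero = lambda f: lambda x: x                                   # \F x.x
succ = lambda n: lambda f: lambda x: f(n(f)(x))                # \N F x.F(N(F,x))
plus = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))   # \M N F x.M(F,N(F,x))
mult = lambda m: lambda n: lambda f: m(n(f))                   # \M N F.M(N(F))

def to_int(numeral):
    """Hypothetical helper: count how many times the numeral applies its function."""
    return numeral(lambda k: k + 1)(0)

one = succ(zero)
two = succ(one)
three = plus(one)(two)
six = mult(two)(three)
print(to_int(zero), to_int(one), to_int(three), to_int(six))
```

Applying succ to zero gives a numeral that applies F exactly once, which is what the v1 == one check on the howto page expresses.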
EDIT:
Wrote all that and then found a book hidden on the nltk site - Chapter 10 is particularly applicable to this question, with this section describing lambda calculus.

Related

Parsing Python function declaration

In order to write a custom documentation generator for my Python code, I'd like to write a regular expression capable of matching to following:
def my_function(arg1,arg2,arg3):
    """
    DOC
    """
My current problem is that, using the following regex:
def (.+)\((?:(\w+),)*(\w+)\)
I can only match my_function, arg2 and arg3 (according to Pythex.org).
I don't understand what I'm doing wrong, since my (?:(\w+),)* should match as many arguments as possible, until the last one (here arg3). Could anyone explain?
Thanks

This isn't possible in a general sense because Python is not a regular language: its functions can take on forms that can't be captured with regular expression syntax, especially in the context of OTHER Python structures. But take heart, there is a lot to learn from your question!
The fascinating thing is that although you said you're trying to learn regular expressions, you accidentally stumbled into the very heart of computer science itself, compiler theory!
I'm going to address less than a fraction of the tip of the iceberg in this post to help get you started, and then suggest a few free and paid resources to help you continue.
A python function by itself may take on several forms:
def foo(x):
    "docstring"
    <body>
def foo1(x):
    """doc
    string"""
    <body>
def foo2(x):
    <body>
Additionally, what comes before and after the function may not be another function!
This is what makes it impossible to autogenerate documentation with a regex by itself (well, not possible for me. I'm not smart enough to write a single regular expression that can account for the entire Python language!).
What you need to look into is parsing (by the way, I'm using the term parsing very loosely to cover parsing, tokenizing, and lexing just to keep things "simple"). Using regular expressions is typically a very important part of parsing.
The general strategy would be to parse the file into syntactic constructs. Identify which of those constructs are functions. Isolate the text of that function. THEN you can use a regular expression to parse the text of that construct. OR you can parse one level further and break up the function into distinct syntactic constructions -- function name, parameter declaration, doc string, body, etc... at which point your problem will be solved.
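If the goal is only a documentation generator for Python code, one way to get those syntactic constructs without writing a parser is the standard library's ast module. Note that this swaps in a different technique from the hand-written parser described in this answer; the function name describe_functions and the sample source are assumptions for illustration.

```python
import ast

# Sample input, modelled on the function in the question.
source = '''
def my_function(arg1, arg2, arg3):
    """
    DOC
    """
    return arg1
'''

def describe_functions(code):
    """Return (name, argument names, docstring) for every function in the code."""
    tree = ast.parse(code)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            args = [a.arg for a in node.args.args]
            found.append((node.name, args, ast.get_docstring(node)))
    return found

print(describe_functions(source))
```

This sketch only looks at plain positional parameters; *args, **kwargs, keyword-only parameters and async functions would need extra handling.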
I attempted to write a regular expression for a standard function definition (without parsing) like foo or foo1, but I struggled to do so even though I have written a few languages.
So just to be clear, I would reach for parsing rather than a simple regex any time your input spans multiple lines. Regex is most effective on single lines.
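For the single-line signature in the question, a regex is enough, with one catch: a capture group inside a repetition such as (?:(\w+),)* only keeps its last match, which is why arg1 disappears. A minimal sketch of a workaround is to capture the whole parameter list and split it afterwards. The names SIGNATURE and parse_signature are made up for this example, and the pattern assumes there are no parentheses inside the parameter list (so no default values like x=f()).

```python
import re

line = "def my_function(arg1,arg2,arg3):"

# The regex from the question: the repeated group is overwritten on each iteration.
original = re.match(r'def (.+)\((?:(\w+),)*(\w+)\)', line)
print(original.groups())

# Capture the whole parameter list as one group, then split it in Python.
SIGNATURE = re.compile(r'def\s+(\w+)\s*\(([^)]*)\)')

def parse_signature(text):
    m = SIGNATURE.match(text)
    if m is None:
        return None
    name, raw_args = m.groups()
    args = [a.strip() for a in raw_args.split(',') if a.strip()]
    return name, args

print(parse_signature(line))
```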
A parsing function looks like this:
def parse_fn_definition(definition):
    def parse_name(lines):
        <code>
    def parse_params(lines):
        <code>
    def parse_doc(lines):
        <code>
    def parse_body(lines):
        <code>
    ...
Now here's the real trick: Each of these parse functions returns two things:
0) The chunk that was parsed
1) The REST of the lines
so for instance,
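import re

def parse_name(lines):
    # Two groups: the function name, and whatever is left of the line after it
    pattern = r'def\s+([a-zA-Z_][a-zA-Z0-9_]*)(.*)'
    line = lines[0]
    m = re.match(pattern, line)
    if m:
        res, rest = m.groups()
        # Put the unparsed remainder back in front of the remaining lines
        return res, [rest] + lines[1:]
    else:
        raise Exception("Line cannot be parsed by parse_name: {}".format(line))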
So, once you've isolated the function text (that's a whole other set of tricks to do, usually involves creating something called a "grammar" -- don't worry, I set you up with some resources down below), you can parse the function text with the following technique:
def parse_fn(lines_of_text):
    name, rest = parse_name(lines_of_text)
    params, rest = parse_params(rest)
    doc_string, rest = parse_doc(rest)
    body, rest = parse_body(rest)
    function = [name, params, doc_string, body]
    res = function, rest
    return res
This function would return some data structure that represents the function (I just used a simple list for illustration) and the rest of the lines of text. That would get passed on to something that will appropriately catalog your function data and then classify and process the rest of the text!
Anyway, if this is something that interests you, don't give up! I would offer a few humble suggestions:
1) Start with an EASIER language to parse, like Scheme/LISP. These languages were designed to be easy to parse and manipulate! Then work your way up to more irregular languages.
2a) Peter Norvig has done some amazing and very accessible work on this. Check out Lispy!
2b) Peter Norvig's class CS212 (specifically unit 3 code) is very challenging but does an excellent job introducing fundamental language design concepts. Every job I've ever gotten, and my love for programming, is because of that course.
3) If you want to advance yourself even further and you can afford it, I would strongly recommend checking out Dave Beazley's workshops on compilers or interpreters. I've taken two courses from Dave, and while I can't promise this for everyone, my salary has literally doubled after each course, so I think it's a worthwhile investment.
4) Absolutely check out Structure and Interpretation of Computer Programs (the wizard book) and Compilers (the dragon book). They'll change your life.
5) DON'T GIVE UP! YOU GOT THIS!! Good luck to you!

How to change priority in math order(asterisk)

I want users to input math formulas in my system. How can I convert the case1 formula to the case2 formula using Python? In other words, I would like to change the order of operations specifically for double asterisks.
#case1
3*2**3**2*5
>>>7680
#case2
3*(2**3)**2*5
>>>960

Not only is this not something that Python supports, but really, why would you want to? Modifying BIDMAS or PEMDAS (depending on your location), would not only give you incorrect answers, but also confuse the hell out of any devs looking at the code.
Just use brackets like in Case 2, it's what they're for.

If users are supposed to enter formulas into your program, I would suggest keeping it as is. The reason is that exponentiation in mathematics is right-associative, meaning the execution goes from the top level down. For example: a**b**c = a**(b**c), by convention.
There are some programs that use bottom-up resolution of the stacked exponentiation -- MS Excel and LibreOffice are some of them, however, it is against the regular convention, and always confused the hell out of me.
If you would like to override this behavior, and still be mathematically correct, you have to use brackets.
You can always declare your own power function that resolves the way you want, something like numpy.power(). You could overload the built-in, but that's too much hassle.
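As a rough sketch of what such a function could look like (left_pow is a made-up name, not something from numpy or the standard library), you can fold the operands from left to right with functools.reduce:

```python
from functools import reduce
import operator

def left_pow(*operands):
    """Fold ** from left to right: left_pow(a, b, c) == (a ** b) ** c."""
    return reduce(operator.pow, operands)

print(2 ** 3 ** 2)                 # Python's default grouping: 2 ** (3 ** 2)
print(left_pow(2, 3, 2))           # left-to-right grouping: (2 ** 3) ** 2
print(3 * left_pow(2, 3, 2) * 5)   # the case2 result from the question
```

This keeps Python's own operator rules untouched and makes the unusual grouping explicit at the call site.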
Read this

Below is an example that achieves this using re:
import re
expression = '3*2**3**2*5'
asterisk_exprs = re.findall(r"\d+\*\*\d+", expression)  # List of all expressions matching the criterion
for expr in asterisk_exprs:
    expression = expression.replace(expr, "({})".format(expr))  # Wrap each match in parentheses
# Value of variable 'expression': '3*(2**3)**2*5'
To evaluate the mathematical value of the str expression, use eval:
>>> eval(expression)
960

How to apply De Morgan's laws to a parsed string? (transforming the string or with parse actions)

I am trying to write a program that evaluates whether a propositional logic formula is valid or invalid using the semantic tree method.
I managed to evaluate if a formula is well formed or not so far:
from pyparsing import *
from string import lowercase
def fbf():
    atom = Word(lowercase, max=1)  # lowercase alphabet
    op = oneOf('^ V => <=>')  # operators
    identOp = oneOf('( [ {')
    identCl = oneOf(') ] }')
    form = Forward()  # start the recursive definition
    # Grammar
    form << ( (Group(Literal('~') + form)) | ( Group(identOp + form + op + form + identCl) ) | ( Group(identOp + form + identCl) ) | (atom) )
    return form
# Main program
entrada = raw_input("Entrada: ")
try:
    print fbf().parseString(entrada, parseAll=True)
except ParseException as error:  # handle the error
    print error.markInputline()
    print error
print
Now I need to convert the negated formula ~(form) according to De Morgan's laws. The BNF of De Morgan's laws is something like this:
~((form) V (form)) = (~(form) ^ ~(form))
~((form) ^ (form)) = (~(form) V ~(form))
http://en.wikipedia.org/wiki/De_Morgans_laws
Parsing must be recursive. I was reading about parse actions, but I don't really understand them; I'm new to Python and very unskilled.
Can somebody help me on how to get this to work?

Juan Jose -
You are asking for a lot of work on the part of this audience, whether you realize it or not. Here are some suggestions on how to make progress on this problem:
Recognize that parsing the input is only the first step in this overall program. You can't just write any parser that gets through the input, and then declare yourself ready for the next step. You need to anticipate what you will do with the parsed output, and try to parse the data in such a way that it readies you to take the next step, which in your case is to do some logical transformations to apply De Morgan's laws. In fact, you may be best off working backwards: assume you have a parser, what would you need your transformation code to work with, how would an expression look, and how would you perform the transform itself? This will naturally structure your thinking to the application domain, and give you a target result format when you start writing the parser.
When you start to write your parser, look at other pyparsing examples that do similar tasks, such as SimpleBool.py on the pyparsing wiki. See how they parse the input to create a set of evaluatable objects, which can then be acted upon in the application domain (whether it is to evaluate them, transform them, or whatever). Think about what kind of objects you want to create in your parser that will work with the transformation methods you outlined in the last step.
Take time to write a BNF for the syntax you will parse. Write out some sample test strings that you would parse to help you anticipate syntax issues. Is "~~p ^ q V r" a valid string? Can identifiers be multiple characters, or will you restrict to just single characters (single will be easier to work with at the beginning, and you can expand it later easily)? Keep your syntax simple if you can, such as just supporting ()'s for grouping, instead of any matched pair of ()'s, []'s, or {}'s.
When you implement your parser, start with simple test cases first and work your way up. You may have to backtrack a bit if you find that you made some assumptions early on that more complicated strings don't support, but that's pretty typical for most programming projects.
As an implementation tip, read up on using the operatorPrecedence helper, as it is specifically designed for these types of parsing jobs. Look at how it is used in SimpleBool.py to create an object hierarchy that mirrors the structure of the input string. Then think about what objects would do in your transformation process.
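To illustrate the "work backwards" suggestion, here is a minimal sketch of what the transformation step could look like if the parser produced nested tuples. The tuple layout ('~', form) / (op, left, right) and the names negate and push_negations are assumptions for this example, not something pyparsing gives you out of the box.

```python
def negate(form):
    """Return the negation of form, pushing ~ inward with De Morgan's laws."""
    if isinstance(form, str):              # an atom such as 'p'
        return ('~', form)
    op = form[0]
    if op == '~':                          # ~~A = A
        return push_negations(form[1])
    if op == 'V':                          # ~(A V B) = (~A ^ ~B)
        return ('^', negate(form[1]), negate(form[2]))
    if op == '^':                          # ~(A ^ B) = (~A V ~B)
        return ('V', negate(form[1]), negate(form[2]))
    # Other operators (=>, <=>) are left alone; the negation stays outside.
    return ('~', push_negations(form))

def push_negations(form):
    """Recursively rewrite form so that ~ never sits directly on a ^ or V."""
    if isinstance(form, str):
        return form
    op = form[0]
    if op == '~':
        return negate(form[1])
    return (op, push_negations(form[1]), push_negations(form[2]))

print(push_negations(('~', ('V', 'p', 'q'))))
print(push_negations(('~', ('^', ('~', 'p'), ('V', 'q', 'r')))))
```

Once you know this is the shape the transformation wants, you can design the parser (for example with parse actions) to produce it.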
Good luck!

Safe expression parser in Python

How can I allow users to execute mathematical expressions in a safe way?
Do I need to write a full parser?
Is there something like ast.literal_eval(), but for expressions?

The examples provided with Pyparsing include several expression parsers:
https://github.com/pyparsing/pyparsing/blob/master/examples/fourFn.py is a conventional arithmetic infix notation parser/evaluator implementation using pyparsing. (Despite its name, this actually does 5-function arithmetic, plus several trig functions.)
https://github.com/pyparsing/pyparsing/blob/master/examples/simpleBool.py is a boolean infix notation parser/evaluator, using a pyparsing helper method operatorPrecedence, which simplifies the definition of infix operator notations.
https://github.com/pyparsing/pyparsing/blob/master/examples/simpleArith.py and https://github.com/pyparsing/pyparsing/blob/master/examples/eval_arith.py recast fourFn.py using operatorPrecedence. The first just parses and returns a parse tree; the second adds evaluation logic.
If you want a more pre-packaged solution, look at plusminus, a pyparsing-based extensible arithmetic parsing package.

What sort of expressions do you want? Variable assignment? Function evaluation?
SymPy aims to become a full-fledged Python CAS.

A few weeks ago I did a similar thing, but for logical expressions (or, and, not, comparisons, parentheses etc.). I did this using the PLY parser. I created a simple lexer and parser. The parser generated an AST that was later used to perform the calculations. Doing it this way allows you to fully control what the user enters, because only expressions that are compatible with the grammar will be parsed.

Yes. Even if there were an equivalent of ast.literal_eval() for expressions, a Python expression can be lots of things other than just a pure mathematical expression, for example an arbitrary function call.
It wouldn't surprise me if there's already a good mathematical expression parser/evaluator available out there in some open-source module, but if not, it's pretty easy to write one of your own.
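As a rough sketch of "writing one of your own" without a full parser, you can let the standard ast module do the parsing and then walk the tree, accepting only a whitelist of node types. The names safe_eval, _BIN_OPS and _UNARY_OPS are made up for this example, and the set of allowed operators is an assumption. Note that it does nothing about very expensive inputs such as huge exponents.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_eval(text):
    """Evaluate a purely arithmetic expression; raise ValueError otherwise."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression: " + type(node).__name__)
    return _eval(ast.parse(text, mode='eval'))

print(safe_eval("3*(2**3)**2*5"))
```

Function calls, names and attribute access all fall through to the ValueError, so something like __import__('os') is rejected before anything is executed.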

Maths expressions will consist of numeric and punctuation characters, possibly 'E' or 'e' if you allow scientific notation for numbers, and the only (other) legal use of alpha characters will be if you allow/provide specific maths functions (e.g. stddev). So it should be trivial to run along the string looking for alpha characters and check the next little bit isn't suspicious, then simply eval the string in a try/except block.
Re the comments this reply has received... I agree this approach is playing with fire. Still, that doesn't mean it can't be done safely. I'm new to python (< 2 months), so may not know the workarounds to which this is vulnerable (and of course a new Python version could always render the code unsafe in the future), but - for what little it's worth (mainly my own amusement) - here's my crack at it:
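def evalMaths(s):
    i = 0
    while i < len(s):
        idn = ''  # identifier collected at the current position
        # Check the bound first so s[i] is never read past the end of the string
        while i < len(s) and s[i].isalpha():
            idn += s[i]
            i += 1
        if (idn and idn != 'e' and idn != 'abs' and idn != 'round'):
            raise Exception("you naughty boy: don't " + repr(idn))
        else:
            i += 1
    return eval(s)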
I would be very interested to hear if/how it can be circumvented... (^_^) BTW / I know you can call functions like abs2783 or _983 - if they existed, but they won't. I mean something practical.
In fact, if anyone can do so, I'll create a question with 200 bounty and accept their answer.

Function Parser with RegEx in Python

I have a source code in Fortran (almost irrelevant) and I want to parse the function names and arguments.
eg using
(\w+)\([^\(\)]+\)
with
a(b(1 + 2 * 2), c(3,4))
I get the following: (as expected)
b, 1 + 2 * 2
c, 3,4
where I would need
a, b(1 + 2 * 2), c(3,4)
b, 1 + 2 * 2
c, 3,4
Any suggestions?
Thanks for your time...

It can be done with regular expressions-- use them to tokenize the string, and work with the tokens. i.e. see re.Scanner. Alternatively, just use pyparsing.

This is a nonlinear grammar -- you need to be able to recurse on a set of allowed rules. Look at pyparsing to do simple CFG (Context Free Grammar) parsing via readable specifications.
It's been a while since I've written out CFGs, and I'm probably rusty, so I'll refer you to the Python EBNF to get an idea of how you can construct one for a subset of a language syntax.
Edit: If the example will always be simple, you can code a small state machine class/function that iterates over the tokenized input string, as @Devin Jeanpierre suggests.

You can take a look at PLY (Python Lex-Yacc), it's (in my opinion) very simple to use and well documented, and it comes with a calculator example which could be a good starting point.

I don't think this is a job for regular expressions... they can't really handle nested patterns.
This is because regexes are compiled into FSMs (Finite State Machines). In order to parse arbitrarily nested expressions, you can't use a FSM, because you need infinitely many states to keep track of the arbitrary nesting. Also see this SO thread.

You can't do this with regular expressions alone, because the structure is recursive. You should first match the outermost function and its arguments, print the name of the function, then do the same (match the function name, then its arguments) for each of its arguments. Regexes alone are not enough.
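A minimal sketch of that idea: use a regex only to find each "name(" and then count parentheses by hand to find where that call ends. This uses a depth counter rather than explicit recursion, find_calls is a made-up name, and it assumes the input has no string literals containing parentheses.

```python
import re

def find_calls(text):
    """Return (name, argument_text) pairs for every call, in order of appearance."""
    calls = []
    for m in re.finditer(r'(\w+)\(', text):
        depth = 1
        i = m.end()
        # Walk forward until the parenthesis opened by this call is closed.
        while i < len(text) and depth:
            if text[i] == '(':
                depth += 1
            elif text[i] == ')':
                depth -= 1
            i += 1
        if depth == 0:
            calls.append((m.group(1), text[m.end():i - 1]))
    return calls

for name, args in find_calls("a(b(1 + 2 * 2), c(3,4))"):
    print(name + ", " + args)
```

For the example in the question this lists a with its full argument text first, followed by b and c.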
