Algorithm for parsing expressions in python?

I have the following algorithm for parsing expressions in Python:
def parse(strinput):
    for operator in ["+-", "*/"]:
        depth = 0
        for p in range(len(strinput) - 1, -1, -1):
            if strinput[p] == ')': depth += 1
            elif strinput[p] == '(': depth -= 1
            elif depth==0 and strinput[p] in operator:
                # strinput is a compound expression
                return (strinput[p], parse(strinput[:p]), parse(strinput[p+1:]))
    strinput = strinput.strip()
    if strinput[0] == '(':
        # strinput is a parenthesized expression?
        return parse(strinput[1:-1])
    # strinput is an atom!
    return strinput
(it can be found here: http://news.ycombinator.com/item?id=284842)
I have a hard time understanding it, and the Python docs have not been much help here. Can someone tell me what the line for operator in ["+-", "*/"]: means?
I understand that it loops over each string in a two-element list, but why is the list written as ["+-", "*/"]? How does Python separate it? In the first iteration, is operator "+-"?
Any help would mean a lot. Thanks

You're correct; for operator in ["+-", "*/"]: means operator will be "+-" the first time through the loop and "*/" the second time.
Notice how it later checks if strinput[p] in operator. The in operator on a string tests for a substring, and strinput[p] is a single character, so this expression is only true if strinput[p] is "+" or "-" on the first pass, and "*" or "/" on the second.
(The reason they do this is order of operations: "+" and "-" have equal precedence, which is lower than that of "*" and "/". Because the lowest-precedence operators are searched for first, the expression is split on them first, so they end up at the root of the result and are applied last.)
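To see the two passes in action, here is a small sketch that repeats the parse function from the question (with its indentation restored) and runs it on two sample inputs. The sample expressions are my own, not from the original post.

```python
def parse(strinput):
    # First pass looks for + or -, second pass for * or /.
    for operator in ["+-", "*/"]:
        depth = 0
        # Scan right to left so operators of equal precedence group to the left.
        for p in range(len(strinput) - 1, -1, -1):
            if strinput[p] == ')':
                depth += 1
            elif strinput[p] == '(':
                depth -= 1
            elif depth == 0 and strinput[p] in operator:
                return (strinput[p], parse(strinput[:p]), parse(strinput[p + 1:]))
    strinput = strinput.strip()
    if strinput[0] == '(':
        return parse(strinput[1:-1])
    return strinput


# "+" is found on the first pass, so it becomes the root and "*" ends up nested.
print(parse("1+2*3"))
# Inside parentheses depth is not 0, so "+" is skipped and "*" becomes the root.
print(parse("(1+2)*3"))
```

The result is a nested tuple of the form (operator, left, right), with plain strings as the leaves.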

Related

Function that is made to remove round brackets does not work, Python 3

I have a function that takes an argument, preferably a string, and turns each character of the string into an element of a list. It then iterates through the list and is supposed to remove the elements that are round brackets, i.e. ( and ). Here is the code:
def func(s):
    n = 0
    s = [i for i in s]
    for i in s:
        if s[n] == "(" or s[n] == ")":
            del s[n]
        else:
            n += 1
            continue
    return s
print(func("ubib0_)IUBi(biub()()()9uibib()((U*H)9g)*(GB(uG(*UV(V79V*&^&87vyutgivugyrxerdtufcviO)()(()()()(0()90Y*(g780(&*^(UV(08U970u9yUV())))))))))"))
However, the function stops the iteration and ends/returns the list early (when some round brackets are still there).
I also went with another way, a way that works:
def func(s):
    n = 0
    s = [i for i in s]
    s2 = [i for i in s if i != "(" and i != ")"]
    return s2
print(func("ubib0_)IUBi(biub()()()9uibib()((U*H)9g)*(GB(uG(*UV(V79V*&^&87vyutgivugyrxerdtufcviO)()(()()()(0()90Y*(g780(&*^(UV(08U970u9yUV())))))))))"))
Why does this work while the other way doesn't? They look like they'd output the same result.
What am I doing wrong in the first example?

Your concept is correct, in that you either delete the current item or increment n.
Where you've gone wrong is the for i in s loop: you never use i, and because you delete from the list while the for loop is iterating over it, the loop runs out of items before n has reached the end of the list. Changing for i in s to while n < len(s) will fix the problem.
A couple of things you may find useful:
list(s) looks cleaner than [i for i in s]
i not in "()" is another way to write i != "(" and i != ")"
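Putting those suggestions together, a minimal sketch of the repaired function could look like this. The name remove_brackets and the short test string are my own choices for illustration.

```python
def remove_brackets(s):
    chars = list(s)          # cleaner than [i for i in s]
    n = 0
    # Re-check the length on every pass, because deletions shrink the list.
    while n < len(chars):
        if chars[n] in "()":
            del chars[n]     # stay at n: the next element has shifted into place
        else:
            n += 1           # only advance when nothing was deleted
    return chars


print(remove_brackets("a(b)c(())d"))
# The comprehension from the question gives the same list without any index bookkeeping.
print([c for c in "a(b)c(())d" if c not in "()"])
```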

At the beginning, while you are only incrementing n, n stays equal to the position the for loop has reached. But when you meet a bracket, n keeps the same value on the next iteration while the loop's position still moves forward. This happens every time s[n] == "(" or s[n] == ")", so the gap between n and the loop's position keeps growing.
To work correctly, your program needs to check every symbol in the list with s[n] to see whether it is '(' or ')'. That doesn't happen, because the iteration stops when the loop's position reaches the end of the (now shorter) list. At that point n is much smaller, has not reached the end of the list, and so not every symbol has been checked.
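The drift described above can be made visible by recording both counters on each pass. This is only an illustrative sketch: trace is a hypothetical helper, and enumerate is used solely to expose the position that the for loop tracks internally.

```python
def trace(s):
    chars = list(s)
    n = 0
    steps = []
    # position is the for loop's own counter; n is the index the code inspects.
    for position, _ in enumerate(chars):
        steps.append((position, n))
        if chars[n] in "()":
            del chars[n]
        else:
            n += 1
    return steps, chars


steps, leftover = trace("(())")
print(steps)     # pairs of (loop position, n)
print(leftover)  # brackets that were never examined
```

For the input "(())" the loop only runs twice: each deletion shortens the list by one, so the loop position catches up with the end of the list while two brackets are still in it.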

How to deduce operator precedence from a string input in python?

I am a beginner to Python. I tried learning Python and C++ in the past and got as far as classes, but then had to abandon it. Now I am learning Python from the beginning, as I have forgotten it all.
I am trying to make a calculator like the one on a mobile phone, using Python but without any GUI. On a mobile calculator you can chain operations: if you type 95+8x2, it has no problem deducing operator precedence from your input and gives 111 as the result. I am trying to do something similar.
The problem is that the only way I know how to do it as a beginner would require a lot of code. It would not be complex, but it would get too long and waste a lot of time. Here is how I have thought of doing it so far:
Find the location of each operator in the input using its index. I have named the input alg_operation, and I use, for example, alg_operation.find('*') to find where the multiplication operator is. I call this index location_prod. With it I can calculate the product by converting the part of the string before the operator to a float and multiplying it by the part just after it (converted to a float as well).
After finding the location of each of the 5 operators I have decided to include (exponentiation, multiplication, division (// not /), addition and subtraction), I would have to write code for 120 different arrangements of these operators, which may not be complex but will definitely take a lot of time.
How can I quickly deduce operator precedence from the string input?
I will update this post if I learn anything new, since I am a beginner to programming.

You can indeed evaluate an arbitrary python expression with eval. Use of eval is a pretty big code smell, because it lets you execute anything, but for completeness it would be done like this:
expr = input("Please, please, please only write maths: ")
print(eval(expr))
Note that you could type __import__("shutil").rmtree("/home") and Python would cheerfully run it. So obviously we don't want to do this.
We could try to protect ourselves by sanitising the input. In this case this might actually work, with something like:
safe_chars = set(".*+/- 0123456789")
if not all(x in safe_chars for x in expr):
    raise ValueError("You tried to enter dangerous data!")
I can't immediately think of any way to do anything dangerous with input consisting only of those chars, but doubtless someone will point it out immediately in the comments [in which case I'll add it here]. More generally, sanitising data like this is hard, because in order to know what's safe you really need to understand the input, by which point you've just written a parser.
Please do note that eval is inherently dangerous. It can be a useful hack for once-off code, although even then... and it is of course useful when you actually want to evaluate python code.
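One way to avoid both eval and hand-rolled character filtering is to let Python's own parser build a syntax tree and then walk it, accepting only arithmetic nodes. The following is a sketch under my own assumptions, not something from the answer above: safe_eval and the two lookup tables are hypothetical names, and ** is left out on purpose, since an input such as 9**9**9**9 uses only "safe" characters yet would try to compute an astronomically large number.

```python
import ast
import operator

# Whitelisted operations (an illustrative choice, extend with care).
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expr):
    """Evaluate a plain arithmetic expression; reject everything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Names, calls, attribute access and so on all end up here.
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval"))


print(safe_eval("95+8*2"))
```

Operator precedence comes for free here, because ast.parse applies Python's own grammar. Input that is not valid Python at all raises SyntaxError rather than ValueError.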

Converting it to reverse Polish notation will solve your problem:
def readNumber(string: str, index: int):
    number = ""
    while index < len(string) and isNumber(string[index]):
        number += string[index]
        index += 1
    return float(number), index
def isOperator(token):
    operators = ['+', '-', '*', '/', '%']
    return token in operators
def isNumber(token):
    return (token >= '0' and token <= '9') or token == "."
def toRpn(string: str):
    """
    Converts infix notation to reverse polish notation
    """
    precedence = {
        '(': 0,
        '-': 1,
        '+': 1,
        '*': 2,
        '/': 2,
        '%': 2,
    }
    i = 0
    fin = []
    ops = []
    while i < len(string):
        token = string[i]
        if isNumber(token):
            number, i = readNumber(string, i)
            fin.append(number)
            continue
        if isOperator(token):
            # Pop every waiting operator of equal or higher precedence, not
            # just one, so that "10-2*3-4" is evaluated left to right.
            while ops and precedence[ops[-1]] >= precedence[token]:
                fin.append(ops.pop())
            ops.append(token)
            i += 1
            continue
        if token == '(':
            ops.append(token)
            i += 1
            continue
        if token == ')':
            # Move operators to the output until the matching '(' is found
            while ops:
                operator = ops.pop()
                if operator == '(':
                    break
                fin.append(operator)
            i += 1
            continue
        # Skip anything else (e.g. spaces)
        i += 1
    while ops:
        fin.append(ops.pop())
    return fin
def calculate_rpn(rpn: list):
    """
    Calculates the result of an expression in reverse polish notation
    """
    stack = []
    for token in rpn:
        if isOperator(token):
            a = stack.pop()
            b = stack.pop()
            if token == '+':
                stack.append(b + a)
            elif token == '-':
                stack.append(b - a)
            elif token == '*':
                stack.append(b * a)
            elif token == '/':
                stack.append(b / a)
            elif token == '%':
                stack.append(b % a)
        else:
            stack.append(token)
    return stack[0]
print("90165: ", calculate_rpn(toRpn("27+38+81+48*33*53+91*53+82*14+96")))
print("616222: ", calculate_rpn(toRpn("22*26*53+66*8+7*76*25*44+78+100")))
print(calculate_rpn(toRpn("(22+4)*4")))
You can easily add more operators and their precedence if you want.
To do so, add the operator to the precedence dictionary and to the isOperator function, and add a matching branch for it in the calculate_rpn function.
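As a rough illustration of how little has to change when an operator is added, here is a compact, table-driven variant of the same idea. It is a sketch of my own and not the answer's code: OPERATORS, tokenize, to_rpn and evaluate are hypothetical names, and ^ is treated as a left-associative power operator purely to keep the example short.

```python
import operator

# symbol -> (precedence, function); adding an operator is one new entry.
OPERATORS = {
    '+': (1, operator.add),
    '-': (1, operator.sub),
    '*': (2, operator.mul),
    '/': (2, operator.truediv),
    '%': (2, operator.mod),
    '^': (3, operator.pow),   # the newly added operator (assumption: left-associative)
}


def tokenize(text):
    """Split the input into floats, operator symbols and parentheses."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isdigit() or ch == '.':
            j = i
            while j < len(text) and (text[j].isdigit() or text[j] == '.'):
                j += 1
            tokens.append(float(text[i:j]))
            i = j
        elif ch.isspace():
            i += 1
        elif ch in OPERATORS or ch in "()":
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


def to_rpn(text):
    """Infix to reverse Polish notation (shunting-yard, binary operators only)."""
    output, stack = [], []
    for token in tokenize(text):
        if isinstance(token, float):
            output.append(token)
        elif token == '(':
            stack.append(token)
        elif token == ')':
            while stack and stack[-1] != '(':
                output.append(stack.pop())
            if stack:
                stack.pop()  # discard the matching '('
        else:
            # Pop operators of equal or higher precedence before pushing.
            while (stack and stack[-1] in OPERATORS
                   and OPERATORS[stack[-1]][0] >= OPERATORS[token][0]):
                output.append(stack.pop())
            stack.append(token)
    while stack:
        output.append(stack.pop())
    return output


def evaluate(text):
    stack = []
    for token in to_rpn(text):
        if isinstance(token, float):
            stack.append(token)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(OPERATORS[token][1](left, right))
    return stack[0]


print(evaluate("95+8*2"))   # the example from the question
print(evaluate("2^3*2"))
```

Keeping precedence and behaviour in one table means there is no separate isOperator list or if/elif chain to keep in sync. Unary minus and malformed input are not handled in this sketch.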

Convert a string expression to numerical value in python

Recently I got an interview question that asks to convert string expressions like "1+2-3" and "-2+4" to 0 and 2 respectively. Assume the inputs are single-digit numbers separated by signs, and that there is no NULL input. I tried the solution below, but the interviewer said it was close but not a perfect solution. Please help me here. Thanks.
def ans(input):
    result, j = 0, 0
    for i in input:
        if i == '+' or i == '-':
            j = i
        else:
            i = int(i)
            result = result j i
    return result
ans("1+2-3")
ans("-2+4")
I am making some silly mistake but I am learning. Thanks in advance.

Two things need fixing to work at all:
You need to handle the initial value properly; when the initial value is non-negative, this fails. Before the loop, set j = '+' so a non-sign prefixed value is added (also, for style points, j is a terrible name, could you use op or something?).
You can't use variables as operators.
Replace:
result = result j i
with:
if j == '+':
    result += i
else:
    result -= i
Note: If modules are allowed, a generalization can be used to handle operators the "nice" way (though more work would be needed to obey operator precedence). You'd define:
import operator
ops = {'+': operator.add, '-': operator.sub, ...}
then make the initial value of op operator.add and change the test for operators to:
if i in ops:
    op = ops[i]
else:
    result = op(result, int(i))
which scales to many more operators, dynamically selecting the operation to perform without cascading if/elif checks.
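Assembled into one function, the dictionary approach might look like the sketch below. Only + and - are filled in because those are the two the question needs; the parameter name expression is my own.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub}


def ans(expression):
    result = 0
    op = operator.add            # an unsigned leading digit is simply added
    for ch in expression:
        if ch in OPS:
            op = OPS[ch]         # remember which operation applies next
        else:
            result = op(result, int(ch))
    return result


print(ans("1+2-3"))
print(ans("-2+4"))
```

Like the original, this assumes single-digit operands and evaluates strictly left to right.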
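Side-note: While violating the spirit of the challenge, ast.literal_eval (at least as of Python 3.5; see bug #22525) will actually parse strings like this safely (eval is unsafe, since it can execute arbitrary code, but ast.literal_eval can only parse Python literals and, in that version, some basic compile-time math). Be aware that this did change: since Python 3.7, ast.literal_eval no longer accepts addition and subtraction of arbitrary numbers, so "1+2-3" raises ValueError there. On the older versions where it works, you could just do: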
import ast
ans = ast.literal_eval
Sure, it handles many other literals too, but we never defined the failure case behavior anyway. :-)

Using eval() is the simplest solution, for example:
eval("1+2-3")
The following code gives another solution that does not use the built-in eval:
import operator
class Parse(object):
    def __init__(self, input):
        self.input = input
        self.pos = 0
        self.end = len(input)
    def eval(self):
        result = self.match_digits()
        while self.pos < self.end:
            op = self.match_operator()
            operand = self.match_digits()
            result = op(result, operand)
        return result
    def match_operator(self):
        look_ahead = self.input[self.pos]
        self.advance()
        return operator.add if look_ahead == '+' else operator.sub
    def match_digits(self):
        look_ahead = self.input[self.pos]
        positive = 1
        if look_ahead == '-':
            positive = -1
            self.advance()
        digits, s = 0, self.pos
        while s < self.end and self.input[s].isdigit():
            digits = digits * 10 + int(self.input[s])
            s += 1
        self.advance(s-self.pos)
        return digits * positive
    def advance(self, offset=1):
        self.pos += offset
For testing:
p = Parse(input='2+1+0-3')
print(p.eval())
p = Parse(input='-2+-13+3')
print(p.eval())

I think the most flexible solution (not using eval and able to handle any operations) is to parse the string into a binary expression tree, where the leaves are numbers and the branches are operators (+, -, /, *, etc.).
For example, "1+(5*12)/17" would be parsed into the following structure:
      "+"
     /   \
    1    "/"
        /   \
     "()"    17
      /
    "*"
    /  \
   5    12
Once you've parsed a string into this structure, it's easy to compute by traversing branches depth-first, right to left.
If you need to handle variables, then you'd have to get locals() and replace accordingly, either as you parse the string, or as you traverse the tree.
EDIT:
I created a working example to illustrate this, you can find the source on github: https://github.com/MJWunderlich/py-math-expression-evaluator
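For a feel of what evaluating such a tree involves, here is a minimal sketch that is independent of the linked repository. The tree is written by hand as nested tuples of the form (operator, left, right); the "()" node is dropped, since grouping is already captured by the shape of the tree. The names OPS and evaluate are my own.

```python
import operator

OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}


def evaluate(node):
    """Depth-first evaluation: branches are tuples, leaves are numbers."""
    if isinstance(node, tuple):
        symbol, left, right = node
        return OPS[symbol](evaluate(left), evaluate(right))
    return node


# Hand-built tree for 1+(5*12)/17
tree = ('+', 1, ('/', ('*', 5, 12), 17))
print(evaluate(tree))
```

Supporting variables would mean adding one more leaf case that looks a name up in a dictionary.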

What about:
def f(s):
    s = s.strip()
    # Scan right to left so that "8-3-2" groups as (8-3)-2
    for i in range(len(s) - 1, -1, -1):
        if s[i] == '+':
            return f(s[:i]) + f(s[i+1:])
        if s[i] == '-':
            return f(s[:i]) - f(s[i+1:])
    for i in range(len(s) - 1, -1, -1):
        if s[i] == '*':
            return f(s[:i]) * f(s[i+1:])
        if s[i] == '/':
            return f(s[:i]) / f(s[i+1:])
    return 0 if s == '' else int(s)
It doesn't work with parentheses, though.

string match algorithm code for advice

I am debugging the following problem; the problem statement and the reference code are posted below. My question: I think this if condition is not necessary and could be removed safely. Is that right? If I am wrong, please feel free to correct me. Thanks.
if len(first) > 1 and first[0] == '*' and len(second) == 0:
    return False
Given two strings where first string may contain wild card characters and second string is a normal string. Write a function that returns true if the two strings match. The following are allowed wild card characters in first string.
* --> Matches with 0 or more instances of any character or set of characters.
? --> Matches with any one character.
For example, g*ks matches geeks, and the string ge?ks* matches geeksforgeeks (note the * at the end of the first string). But g*k doesn't match gee, as the character k is not present in the second string.
# Python program to match wild card characters
# The main function that checks if two given strings match.
# The first string may contain wildcard characters
def match(first, second):
    # If we reach at the end of both strings, we are done
    if len(first) == 0 and len(second) == 0:
        return True
    # Make sure that the characters after '*' are present
    # in second string. This function assumes that the first
    # string will not contain two consecutive '*'
    if len(first) > 1 and first[0] == '*' and len(second) == 0:
        return False
    # If the first string contains '?', or current characters
    # of both strings match (both strings must be non-empty,
    # so that a lone '?' can match the last character)
    if (len(first) != 0 and len(second) != 0
            and (first[0] == '?' or first[0] == second[0])):
        return match(first[1:], second[1:])
    # If there is *, then there are two possibilities
    # a) We consider current character of second string
    # b) We ignore current character of second string.
    if len(first) != 0 and first[0] == '*':
        return match(first[1:], second) or match(first, second[1:])
    return False
thanks in advance,
Lin

That if statement is critical to the proper operation of the function. Removing it will have disastrous consequences.
For example, assume that first="*a" and second="". In other words, the function was called as match("*a",""). Then the if statement will cause the function to return False (which is correct since there is no a in second). Without the if statement, the code will proceed to the line
return match(first[1:],second) or match(first,second[1:])
The call match(first[1:],second) will evaluate to match("a","") which will return False. But when the code calls match(first,second[1:]), the call is equivalent to match("*a",""), and the result is infinite recursion.
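The difference is easy to reproduce. The sketch below is a condensed version of the function with a hypothetical guard parameter (my addition, not part of the original code) that switches the check in question on or off.

```python
def match(first, second, guard=True):
    if not first and not second:
        return True
    # The check under discussion: a '*' followed by more pattern cannot
    # match once the second string is exhausted.
    if guard and len(first) > 1 and first[0] == '*' and not second:
        return False
    if first and second and (first[0] == '?' or first[0] == second[0]):
        return match(first[1:], second[1:], guard)
    if first and first[0] == '*':
        return (match(first[1:], second, guard)
                or match(first, second[1:], guard))
    return False


print(match("*a", ""))            # with the check
try:
    match("*a", "", guard=False)  # without the check
except RecursionError:
    print("without the check the call recursed until Python gave up")
```

In Python the "infinite" recursion ends in a RecursionError once the interpreter's recursion limit is hit, because slicing an empty string just yields another empty string and the same call is repeated.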

Python reversing a string using recursion

I want to use recursion to reverse a string in Python so that it displays the characters backwards (i.e. "Hello" will become "olleH" or "o l l e H").
I wrote one that does it iteratively:
def Reverse( s ):
    result = ""
    n = 0
    start = 0
    while ( s[n:] != "" ):
        while ( s[n:] != "" and s[n] != ' ' ):
            n = n + 1
        result = s[ start: n ] + " " + result
        start = n
    return result
But how exactly do I do this recursively? I am confused on this part, especially because I don't work with python and recursion much.
Any help would be appreciated.

def rreverse(s):
    if s == "":
        return s
    else:
        return rreverse(s[1:]) + s[0]
(Very few people do heavy recursive processing in Python, the language wasn't designed for it.)

To solve a problem recursively, find a trivial case that is easy to solve, and figure out how to get to that trivial case by breaking the problem down into simpler and simpler versions of itself.
What is the first thing you do in reversing a string? Literally the first thing? You get the last character of the string, right?
So the reverse of a string is the last character, followed by the reverse of everything but the last character, which is where the recursion comes in. The last character of a string can be written as x[-1] while everything but the last character is x[:-1].
Now, how do you "bottom out"? That is, what is the trivial case you can solve without recursion? One answer is the one-character string, which is the same forward and reversed. So if you get a one-character string, you are done.
But the empty string is even more trivial, and someone might actually pass that in to your function, so we should probably use that instead. A one-character string can, after all, also be broken down into the last character and everything but the last character; it's just that everything but the last character is the empty string. So if we handle the empty string by just returning it, we're set.
Put it all together and you get:
def backward(text):
    if text == "":
        return text
    else:
        return text[-1] + backward(text[:-1])
Or in one line:
backward = lambda t: t[-1] + backward(t[:-1]) if t else t
As others have pointed out, this is not the way you would usually do this in Python. An iterative solution is going to be faster, and using slicing to do it is going to be faster still.
Additionally, Python imposes a limit on stack size, and there's no tail call optimization, so a recursive solution would be limited to reversing strings of only about a thousand characters. You can increase Python's stack size, but there would still be a fixed limit, while other solutions can always handle a string of any length.
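The limit is easy to observe. The following sketch repeats the backward function and feeds it a string deliberately longer than the current recursion limit; the test strings are my own.

```python
import sys


def backward(text):
    if text == "":
        return text
    return text[-1] + backward(text[:-1])


print(backward("Hello"))

# Build a string that needs more nested calls than the interpreter allows.
long_text = "ab" * sys.getrecursionlimit()
try:
    backward(long_text)
    hit_limit = False
except RecursionError:
    hit_limit = True
print("recursion limit hit:", hit_limit)

# Slicing has no such limit and does the work in a single step.
print(len(long_text[::-1]))
```

sys.setrecursionlimit can raise the ceiling, but as noted above that only moves the fixed limit rather than removing it.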

I just want to add some explanations based on Fred Foo's answer.
Let's say we have a string called 'abc', and we want to return its reverse which should be 'cba'.
def reverse(s):
    if s == "":
        return s
    else:
        return reverse(s[1:]) + s[0]
s = "abc"
print(reverse(s))
Here is how the code works when we call the function:
reverse('abc')                       # s = abc
= reverse('bc') + 'a'                # s[1:] = bc, s[0] = a
= reverse('c') + 'b' + 'a'           # s[1:] = c, s[0] = b
= reverse('') + 'c' + 'b' + 'a'      # s[1:] = '', s[0] = c
= 'cba'

If this isn't just a homework question and you're actually trying to reverse a string for some greater goal, just do s[::-1].

def reverse_string(s):
    if s: return s[-1] + reverse_string(s[0:-1])
    else: return s
or
def reverse_string(s):
    return s[-1] + reverse_string(s[0:-1]) if s else s

I know it's too late to answer the original question, and there are several better ways already given here. My answer is for documentation purposes, in case someone is trying to implement tail recursion for string reversal.
def tail_rev(in_string,rev_string):
    if in_string=='':
        return rev_string
    else:
        rev_string+=in_string[-1]
        return tail_rev(in_string[:-1],rev_string)
in_string=input("Enter String: ")
rev_string=tail_rev(in_string,'')
print(f"Reverse of {in_string} is {rev_string}")
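Since Python performs no tail call optimization, a tail-recursive function like this still uses one stack frame per character. The usual workaround is to rewrite the tail call as a loop by hand, which the sketch below shows side by side with the recursive form. loop_rev is a hypothetical name, and the default argument on tail_rev is my own convenience.

```python
def tail_rev(in_string, rev_string=""):
    if in_string == "":
        return rev_string
    # The recursive call is the very last thing done: a tail call.
    return tail_rev(in_string[:-1], rev_string + in_string[-1])


def loop_rev(in_string):
    rev_string = ""
    # Each loop pass does what one tail call did: update both "arguments".
    while in_string != "":
        rev_string += in_string[-1]
        in_string = in_string[:-1]
    return rev_string


print(tail_rev("abc"), loop_rev("abc"))
print(len(loop_rev("x" * 5000)))  # far deeper than the default recursion limit
```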

s = input("Enter your string: ")
def rev(s):
    if len(s) <= 1:
        # Base case: print the last remaining char (or nothing) and stop
        print(s)
        return
    # print the last char in string
    # end="" prints all chars in string on same line
    print(s[-1], end="")
    # Recurse on the same string, but with 1 char less
    return rev(s[:-1])
rev(s)

If you do not want to return a result, you can use this solution, which reverses the list in place. This question is part of LeetCode.
from typing import List
class Solution:
    i = 0
    def reverseString(self, s: List[str]) -> None:
        """
        Do not return anything, modify s in-place instead.
        """
        if self.i >= (len(s)//2):
            return
        # Swap the i-th character from the front with the i-th from the back
        s[self.i], s[len(s)-self.i-1] = s[len(s)-self.i-1], s[self.i]
        self.i += 1
        self.reverseString(s)
