Pythonic way to separate operators and operands in an expression - python

I am trying to separate the operators (including parentheses) and the operands in an expression. For example given an expression
expr = "(32+54)*342-(4*(3-9))"
I am trying to get
['(', '32', '+', '54', ')', '*', '342', '-', '(', '4', '*', '(', '3', '-', '9', ')', ')']
Here is the code that I wrote. Is there a better way of doing it in Python?
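import string

l = list(expr)
n = ''
expr = []
try:
    for c in l:
        if c in string.digits:
            # accumulate consecutive digits into one number
            n += c
        else:
            if n != '':
                expr.append(n)
                n = ''
            expr.append(c)
finally:
    # flush a number that ends the expression
    if n != '':
        expr.append(n)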

We can do this with re.split():
>>> import re
>>> expr = "(32+54)*342-(4*(3-9))"
>>> re.split("([-()+*/])", expr)
['', '(', '32', '+', '54', ')', '', '*', '342', '-', '', '(', '4', '*', '', '(', '3', '-', '9', ')', '', ')', '']
This does insert some empty strings, but they are easy to strip out, e.g. with a list comprehension:
>>> [part for part in re.split("([-()+*/])", expr) if part]
['(', '32', '+', '54', ')', '*', '342', '-', '(', '4', '*', '(', '3', '-', '9', ')', ')']

If you are only trying to tokenize the stream, your approach is fine, if somewhat old-fashioned. You can use a regular expression to split the tokens more easily.
However, if you also want to do something with the tokens (such as evaluate them) then I suggest you look at a parsing module that can handle recursion (regular expressions cannot handle recursion), such as pyparsing.
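To illustrate why evaluation needs recursion, here is a minimal sketch of a hand-written recursive-descent evaluator working on the token list. It is not pyparsing's API; the names tokenize_expr and evaluate are made up for this example, and it assumes Python 3, non-negative integer operands, the operators + - * /, and balanced parentheses.

```python
import re

def tokenize_expr(expr):
    """Split an arithmetic expression into number and operator tokens."""
    return re.findall(r'\d+|[-+*/()]', expr)

def evaluate(tokens):
    """Evaluate a token list with the usual precedence rules."""
    pos = 0  # index of the next unread token

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # factor := NUMBER | '(' expression ')'
        tok = advance()
        if tok == '(':
            value = expression()  # recursion handles nested parentheses
            if advance() != ')':
                raise ValueError("expected ')'")
            return value
        return int(tok)

    def term():
        # term := factor (('*' | '/') factor)*
        value = factor()
        while peek() in ('*', '/'):
            if advance() == '*':
                value *= factor()
            else:
                value /= factor()
        return value

    def expression():
        # expression := term (('+' | '-') term)*
        value = term()
        while peek() in ('+', '-'):
            if advance() == '+':
                value += term()
            else:
                value -= term()
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("unexpected token: %r" % tokens[pos])
    return result

print(evaluate(tokenize_expr("(32+54)*342-(4*(3-9))")))  # prints 29436
```

Each grammar rule becomes one function, and factor() calling expression() is the recursive step that a single regular expression cannot express.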

Python: Batteries Included.
>>> import tokenize, StringIO  # Python 2 (on Python 3, StringIO lives in the io module)
>>> [x[1] for x in tokenize.generate_tokens(StringIO.StringIO('(32+54)*342-(4*(3-9))').readline)]
['(', '32', '+', '54', ')', '*', '342', '-', '(', '4', '*', '(', '3', '-', '9', ')', ')', '']
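On Python 3 the same idea might look like the sketch below. The helper name python_tokens is made up for this example. Instead of relying on the exact trailing tokens, it keeps only number, name, and operator tokens, so the end-of-input markers the tokenizer emits are dropped.

```python
import io
import tokenize

def python_tokens(expr):
    """Tokenize expr with the standard library's Python tokenizer."""
    wanted = {tokenize.NUMBER, tokenize.NAME, tokenize.OP}
    readline = io.StringIO(expr).readline
    # keep the token text, skipping NEWLINE/ENDMARKER bookkeeping tokens
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type in wanted]

print(python_tokens('(32+54)*342-(4*(3-9))'))
```

Because this is Python's own tokenizer, it follows Python's lexical rules: for instance ** comes back as a single token.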

>>> if True:
...     exp=[]
...     expr = "(32+54)*342-(4*(3-9))"
...     flag=False
...     for i in expr:
...         if i.isdigit() and flag:
...             exp.append(str(exp.pop(len(exp)-1))+i)
...         elif i.isdigit():
...             flag=True
...             exp.append(i)
...         else:
...             flag=False
...             exp.append(i)
...     print(exp)
['(', '32', '+', '54', ')', '*', '342', '-', '(', '4', '*', '(', '3', '-', '9', ')', ')']
>>>

Related

How to split duplicated separator in Python

I have a string with the format
exp = '(( 200 + (4 * 3.14)) / ( 2 ** 3 ))'
I would like to separate the string into tokens using re.split() and keep the separators as well. However, I am not able to keep ** together as one token; it ends up being split at each * instead.
This is my code: tokens = re.split(r'([+|-|**?|/|(|)])',exp)
My Output (wrong):
['(', '(', '200', '+', '(', '4', '*', '3.14', ')', ')', '/', '(', '2', '*', '*', '3', ')', ')']
Is there a way to distinguish between the separators * and ** when splitting? Thank you so much!
Desired Output:
['(', '(', '200', '+', '(', '4', '*', '3.14', ')', ')', '/', '(', '2', '**', '3', ')', ')']
The [...] notation only lets you specify individual characters. To match alternatives of different lengths you need to use the | operator outside of the brackets. This also means that you need to escape the regular expression operators, and that you need to place the longer patterns before the shorter ones (i.e. ** before *):
tokens = re.split(r'(\*\*|\*|\+|\-|/|\(|\))',exp)
or, shorter (with - placed first in the class so that it is a literal; written as +-/ it would be a range that also matches . and would split 3.14):
tokens = re.split(r'(\*\*|[-*+/()])',exp)
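Splitting this way still leaves whitespace and empty strings in the result, because the input contains spaces and adjacent separators. A minimal sketch of the complete round trip follows; the name TOKEN_SPLIT is made up for this example, and the character class is written with - first so that it is a literal rather than part of a range.

```python
import re

exp = '(( 200 + (4 * 3.14)) / ( 2 ** 3 ))'

# '**' is listed before the single-character class so it wins over '*'
TOKEN_SPLIT = re.compile(r'(\*\*|[-+*/()])')

# strip surrounding spaces and drop pieces that end up empty
tokens = [part.strip() for part in TOKEN_SPLIT.split(exp) if part.strip()]
print(tokens)
```

This yields the desired list, with '**' and '3.14' each kept as a single token.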

re.split on multiple characters (and maintaining the characters) produces a list containing also empty strings

I need to split a mathematical expression based on the delimiters. The delimiters are (, ), +, -, *, /, ^ and space. I came up with the following regular expression
"([\\s\\(\\)\\-\\+\\*/\\^])"
which also keeps the delimiters in the resulting list (which is what I want), but it also produces empty string "" elements, which I don't want. I hardly ever use regular expressions (unfortunately), so I am not sure whether it is possible to avoid this.
Here's an example of the problem:
>>> import re
>>> e = "((12*x^3+4 * 3)*3)"
>>> re.split("([\\s\\(\\)\\-\\+\\*/\\^])", e)
['', '(', '', '(', '12', '*', 'x', '^', '3', '+', '4',
' ', '', ' ', '', ' ', '', '*', '', ' ', '3', ')', '', '*', '3', ')', '']
Is there a way to not produce those empty strings, maybe by modifying my regular expression? Of course I can remove them using for example filter, but the idea would be not to produce them at all.
Edit
I would also need to not include spaces. If you can help also in that matter, it would be great.
You could add \w+, remove the \s and do a findall:
import re
e = "((12*x^3+44 * 3)*3)"
print(re.findall(r"(\w+|[()\-+*/^])", e))
Output:
['(', '(', '12', '*', 'x', '^', '3', '+', '44', '*', '3', ')', '*', '3', ')']
Depending on what you want you can change the regex:
e = "((12a*x^3+44 * 3)*3)"
print(re.findall(r"(\d+|[a-z()\-+*/^])", e))
print(re.findall(r"(\w+|[()\-+*/^])", e))
The first treats 12a as two tokens, the second as one:
['(', '(', '12', 'a', '*', 'x', '^', '3', '+', '44', '*', '3', ')', '*', '3', ')']
['(', '(', '12a', '*', 'x', '^', '3', '+', '44', '*', '3', ')', '*', '3', ')']
Just strip/filter them out in a comprehension.
result = [item for item in re.split("([\\s\\(\\)\\-\\+\\*/\\^])", e) if item.strip()]
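If you also need to know what kind of token each piece is, one common pattern is a single regex with named groups, read back through the match's lastgroup attribute. The sketch below is only an illustration: the names TOKEN_RE and classify and the group names are made up for this example, and the grammar (decimal numbers, identifiers, single-character operators) is an assumption about what the expressions contain.

```python
import re

TOKEN_RE = re.compile(r'''
    (?P<NUMBER>\d+(?:\.\d+)?)   # integer or decimal literal
  | (?P<NAME>[A-Za-z_]\w*)      # variable name
  | (?P<OP>[-+*/^()])           # single-character operator or parenthesis
  | (?P<SKIP>\s+)               # whitespace, discarded
''', re.VERBOSE)

def classify(expr):
    """Return (kind, text) pairs for every token in expr, skipping spaces."""
    return [(m.lastgroup, m.group())
            for m in TOKEN_RE.finditer(expr)
            if m.lastgroup != 'SKIP']

for kind, text in classify("((12*x^3+44 * 3)*3)"):
    print(kind, text)
```

Like findall, finditer silently skips characters that match none of the alternatives, so this does not report invalid input.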

How to split the integers and Operators characters from string in python?

I want to split the string into integers and operators for doing Infix expression evaluation in python.
Here is my string:
>>> s = "(1-2+3)*5+10/2"
I tried this to split:
>>> list(s)
['(', '1', '-', '2', '+', '3', ')', '*', '5', '+', '1', '0', '/', '2']
This is wrong, since '10' is split into '1', '0'.
I tried an alternative:
>>> re.findall('[+-/*//()]+|\d+',s)
['(', '1', '-', '2', '+', '3', ')*', '5', '+', '10', '/', '2']
This also went wrong, since ')*' should be split into ')', '*'.
Could you help to split the operators and integers from the given expression?
This is not the best solution for infix. Remove the + after the [] (and put - first in the class, since +-/ inside [] is a range that also matches , and .), like:
import re
s = "(1-2+3)*5+10/2"
print(re.findall(r'[-+*/()]|\d+', s))
['(', '1', '-', '2', '+', '3', ')', '*', '5', '+', '10', '/', '2']
Try the following link for correct solution: Simple Balanced Parentheses
from pythonds.basic.stack import Stack

def postfixEval(postfixExpr):
    operandStack = Stack()
    tokenList = postfixExpr.split()
    for token in tokenList:
        if token.isdigit():  # also accepts multi-digit operands
            operandStack.push(int(token))
        else:
            # the operand pushed last is the right-hand one
            operand2 = operandStack.pop()
            operand1 = operandStack.pop()
            result = doMath(token,operand1,operand2)
            operandStack.push(result)
    return operandStack.pop()

def doMath(op, op1, op2):
    if op == "*":
        return op1 * op2
    elif op == "/":
        return op1 / op2
    elif op == "+":
        return op1 + op2
    else:
        return op1 - op2

print(postfixEval('7 8 + 3 2 + /'))
Keep in mind that this is a postfix implementation and it's just an example. Do the infix version yourself, and if you have any difficulties just ask.
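For the infix side, one standard route is to reorder the tokens into postfix first with the shunting-yard algorithm. The following is a minimal sketch, not taken from the linked material: the names PRECEDENCE and infix_to_postfix are made up for this example, and it assumes non-negative integer operands, the four left-associative operators + - * /, and balanced parentheses.

```python
import re

PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}

def infix_to_postfix(tokens):
    """Reorder infix tokens into postfix (all operators left-associative)."""
    output, stack = [], []
    for tok in tokens:
        if tok.isdigit():
            output.append(tok)
        elif tok == '(':
            stack.append(tok)
        elif tok == ')':
            # pop back to the matching opening parenthesis
            while stack[-1] != '(':
                output.append(stack.pop())
            stack.pop()  # discard the '(' itself
        else:
            # pop operators of higher or equal precedence first
            while (stack and stack[-1] != '('
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
    while stack:
        output.append(stack.pop())
    return output

tokens = re.findall(r'\d+|[-+*/()]', "(1-2+3)*5+10/2")
print(' '.join(infix_to_postfix(tokens)))  # prints: 1 2 - 3 + 5 * 10 2 / +
```

The output is in postfix order, the notation that a stack-based evaluator like the one above consumes.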
Try
re.findall(r'[-+*/()]|\d+', s)
You don't need the +, since you only want to match one special character at a time.
Using split:
print([t for t in re.split(r'([-+*/()])|\s+', s) if t])
If you can avoid regular expressions, you can try an iterative solution (just rough code):
s = "(1-2+3)*5+10/2"
numbers = "0123456789."

def split_operators(s):
    l = []
    last_number = ""
    for c in s:
        if c in numbers:
            # keep collecting the characters of the current number
            last_number += c
        else:
            if last_number:
                l.append(last_number)
                last_number = ""
            if c:
                l.append(c)
    # flush a number that ends the string
    if last_number:
        l.append(last_number)
    return l

print(split_operators(s))
result:
['(', '1', '-', '2', '+', '3', ')', '*', '5', '+', '10', '/', '2']
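The same loop can be condensed with itertools.groupby, which groups consecutive characters by whether they belong to a number. This is a sketch under the same assumptions as the loop above (numbers consist of digits and '.'); the name split_with_groupby is made up for this example, and unlike the loop it also drops whitespace.

```python
from itertools import groupby

def split_with_groupby(s):
    """Split s into number strings and single-character operators."""
    tokens = []
    for is_number, group in groupby(s, key=lambda c: c.isdigit() or c == '.'):
        if is_number:
            # a run of digit characters forms one number token
            tokens.append(''.join(group))
        else:
            # every other non-space character is its own token
            tokens.extend(c for c in group if not c.isspace())
    return tokens

print(split_with_groupby("(1-2+3)*5+10/2"))
```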

Extracting expression

I have an expression that I want to break apart into its pieces in Python 2.6. Here is an example:
[a]+[c]*0.6/[b]-([a]-[f]*0.9)
This should become:
(
'[a]',
'+',
'[c]',
'*',
'0.6',
'/',
'[b]',
'-',
'(',
'[a]',
'-',
'[f]',
'*',
'0.9',
')',
)
I need it as a list. Please give me a hand. Thanks.
>>> import re
>>> expr = '[a]+[c]*0.6/[b]-([a]-[f]*0.9)'
>>> re.findall(r'(?:\[.*?\])|(?:\d+\.*\d*)|.', expr)
['[a]', '+', '[c]', '*', '0.6', '/', '[b]', '-', '(', '[a]', '-', '[f]', '*', '0.9', ')']
One approach would be to create a list of regular expressions to match each token, something like:
import re
tokens = [r'\[.*?\]', r'\(', r'\)', r'\+', r'\*', r'\-', r'/', r'\d+\.\d+', r'\d+']
regex = re.compile('|'.join(tokens))
Then you could use findall on your expression to return a list of matches:
>>> regex.findall('[a]+[c]*0.6/[b]-([a]-[f]*0.9)')
<<<
['[a]',
'+',
'[c]',
'*',
'0.6',
'/',
'[b]',
'-',
'(',
'[a]',
'-',
'[f]',
'*',
'0.9',
')']
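One caveat with findall is that it silently skips any text that matches none of the alternatives. If you would rather get an error for unexpected input, a minimal sketch is to match token by token from the current position. The names TOKEN_RE and tokenize_strict are made up for this example, and the token set is an assumption based on the expression shown in the question.

```python
import re

# bracketed names, decimals before integers, then operators and parentheses
TOKEN_RE = re.compile(r'\[.*?\]|\d+\.\d+|\d+|[-+*/()]')

def tokenize_strict(expr):
    """Tokenize expr, raising ValueError on any unrecognized character."""
    tokens = []
    pos = 0
    while pos < len(expr):
        if expr[pos].isspace():
            pos += 1  # ignore whitespace between tokens
            continue
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise ValueError("unexpected character %r at position %d"
                             % (expr[pos], pos))
        tokens.append(match.group())
        pos = match.end()
    return tokens

print(tokenize_strict('[a]+[c]*0.6/[b]-([a]-[f]*0.9)'))
```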

Splitting a list in python

I'm writing a parser in Python. I've converted an input string into a list of tokens, such as:
['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')', '+', '4', ')', '/', '3', '.', 'x', '^', '2']
I want to be able to split the list into multiple lists, like the str.split('+') function. But there doesn't seem to be a way to do my_list.split('+'). Any ideas?
Thanks!
You can write your own split function for lists quite easily by using yield:
def split_list(l, sep):
    current = []
    for x in l:
        if x == sep:
            yield current
            current = []
        else:
            current.append(x)
    yield current
An alternative way is to use list.index and catch the exception:
def split_list(l, sep):
    i = 0
    try:
        while True:
            j = l.index(sep, i)
            yield l[i:j]
            i = j + 1
    except ValueError:
        yield l[i:]
Either way you can call it like this:
l = ['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')', '+', '4', ')',
     '/', '3', '.', 'x', '^', '2']
for r in split_list(l, '+'):
    print(r)
Result:
['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')']
['4', ')', '/', '3', '.', 'x', '^', '2']
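A third variant, sketched here with itertools.groupby, groups consecutive tokens by whether they equal the separator and keeps only the non-separator runs. The name split_list_groupby is made up for this example. Note that it behaves differently from the two generators above at the edges: leading, trailing, or repeated separators do not produce empty sublists.

```python
from itertools import groupby

def split_list_groupby(tokens, sep):
    """Split tokens on sep, dropping the separators themselves."""
    return [list(group)
            for is_sep, group in groupby(tokens, key=lambda t: t == sep)
            if not is_sep]

l = ['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')', '+', '4', ')',
     '/', '3', '.', 'x', '^', '2']
for part in split_list_groupby(l, '+'):
    print(part)
```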
For parsing in Python you might also want to look at something like pyparsing.
As a quick hack, you can join the list into a string with ''.join(), split that string at '+', and then call list() on each piece to break it back into individual tokens (giving a list of lists). Note that this only works while every token is a single character:
a = ['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')', '+', '4', ')', '/', '3', '.', 'x', '^', '2']
b = ''.join(a).split('+')
c = []
for el in b:
    c.append(list(el))
print(c)
result:
[['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')'], ['4', ')', '/', '3', '.', 'x', '^', '2']]
