I have an expression and I want to tokenize it in Python 2.6. Here is an example:
[a]+[c]*0.6/[b]-([a]-[f]*0.9)
This should become:
(
'[a]',
'+',
'[c]',
'*',
'0.6',
'/',
'[b]',
'-',
'(',
'[a]',
'-',
'[f]',
'*',
'0.9',
')',
)
I need the result as a list. Please give me a hand. Thanks.
>>> import re
>>> expr = '[a]+[c]*0.6/[b]-([a]-[f]*0.9)'
>>> re.findall(r'\[.*?\]|\d+\.?\d*|.', expr)
['[a]', '+', '[c]', '*', '0.6', '/', '[b]', '-', '(', '[a]', '-', '[f]', '*', '0.9', ')']
One approach would be to create a list of regular expressions to match each token, something like:
import re
tokens = [r'\[.?\]', r'\(', r'\)', r'\+', r'\*', r'\-', r'/', r'\d+\.\d+', r'\d+']
regex = re.compile('|'.join(tokens))
Then you could use findall on your expression to return a list of matches:
>>> regex.findall('[a]+[c]*0.6/[b]-([a]-[f]*0.9)')
['[a]',
'+',
'[c]',
'*',
'0.6',
'/',
'[b]',
'-',
'(',
'[a]',
'-',
'[f]',
'*',
'0.9',
')']
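A variation on the same idea, sketched here under the assumption that a parser will later want to know what kind of token each piece is: give every pattern a name with `(?P<name>...)`, join them with `|`, and scan with the compiled pattern's `match(text, pos)` so each token comes back with its kind. The kind names (`NUMBER`, `VAR`, `OP`, `SKIP`) and the `tokenize_expr` helper are hypothetical, not part of the original answer.

```python
import re

# Hypothetical token kinds; more specific patterns are listed first.
TOKEN_SPEC = [
    ('NUMBER', r'\d+(?:\.\d+)?'),   # integer or decimal such as 0.6
    ('VAR',    r'\[[^\]]*\]'),      # bracketed name such as [a]
    ('OP',     r'[-+*/()]'),        # operator or parenthesis
    ('SKIP',   r'\s+'),             # whitespace, discarded below
]
MASTER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))

def tokenize_expr(text):
    """Return a list of (kind, value) pairs; raise ValueError on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError('unexpected character %r at %d' % (text[pos], pos))
        if m.lastgroup != 'SKIP':
            # lastgroup is the name of the alternative that matched
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, `tokenize_expr('[a] + 0.6')` returns `[('VAR', '[a]'), ('OP', '+'), ('NUMBER', '0.6')]`. Unlike `findall`, this loop reports characters it does not recognize instead of silently skipping them.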
Related
For example, I have the string "section 213(d)-456(c)".
How can I split it to get a list of strings:
['section', '213', '(', 'd', ')', '-', '456', '(', 'c', ')'].
Thank you!
You can do this with a regular expression:
import re
text = "section 213(d)-456(c)"
output = re.split(r"(\W)", text)
Output: ['section', ' ', '213', '(', 'd', ')', '', '-', '456', '(', 'c', ')', '']
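Here \W matches any non-word character. Note that the space and some empty strings still appear in the output, so you would need to filter them out to get exactly the list you asked for.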
You can come close with
re.split(r'([-\s()])', 'section 213(d)-456(c)')
When the pattern contains a capturing group, the text matched by that group (here, each delimiter) is included in the result.
However, this also keeps the space delimiter and produces some empty strings:
['section', ' ', '213', '(', 'd', ')', '', '-', '456', '(', 'c', ')', '']
You can easily remove these afterward.
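One way to remove them afterward is a list comprehension that keeps only pieces containing non-whitespace characters:

```python
import re

text = 'section 213(d)-456(c)'
# The capturing group keeps the delimiters; p.strip() is falsy for '' and ' '.
parts = [p for p in re.split(r'([-\s()])', text) if p.strip()]
print(parts)  # ['section', '213', '(', 'd', ')', '-', '456', '(', 'c', ')']
```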
I have a string with the format
exp = '(( 200 + (4 * 3.14)) / ( 2 ** 3 ))'
I would like to split the string into tokens using re.split() and keep the separators as well. However, I cannot keep ** together; it ends up split into two * tokens instead.
This is my code: tokens = re.split(r'([+|-|**?|/|(|)])',exp)
My Output (wrong):
['(', '(', '200', '+', '(', '4', '*', '3.14', ')', ')', '/', '(', '2', '*', '*', '3', ')', ')']
Is there a way to keep ** as a single separator instead of splitting it into two *? Thank you so much!
Desired Output:
['(', '(', '200', '+', '(', '4', '*', '3.14', ')', ')', '/', '(', '2', '**', '3', ')', ')']
The [...] notation only lets you specify individual characters. To match alternatives of different lengths you need the | operator outside the brackets. This also means you need to escape the regular-expression metacharacters and place longer patterns before shorter ones (e.g. ** before *).
tokens = re.split(r'(\*\*|\*|\+|\-|/|\(|\))',exp)
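or, more compactly (with - placed first in the character class; written as +-/ it would be a range that also matches '.', breaking 3.14 apart):
tokens = re.split(r'(\*\*|[-*+/()])',exp)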
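On the string in the question, `re.split` also returns empty strings and whitespace-only pieces between adjacent delimiters. A minimal sketch that strips them, using a compact pattern with `-` placed first in the character class so it is not read as a range:

```python
import re

exp = '(( 200 + (4 * 3.14)) / ( 2 ** 3 ))'
# '**' is tried before the character class, so it wins over a single '*'.
pieces = re.split(r'(\*\*|[-*+/()])', exp)
# Strip surrounding spaces and drop pieces that were empty or only whitespace.
tokens = [p.strip() for p in pieces if p.strip()]
print(tokens)
```

This prints the desired output from the question.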
I need to split a mathematical expression based on the delimiters. The delimiters are (, ), +, -, *, /, ^ and space. I came up with the following regular expression
"([\\s\\(\\)\\-\\+\\*/\\^])"
which also keeps the delimiters in the resulting list (which is what I want), but it also produces empty-string ("") elements, which I don't want. I hardly ever use regular expressions (unfortunately), so I am not sure whether it is possible to avoid this.
Here's an example of the problem:
>>> import re
>>> e = "((12*x^3+4 * 3)*3)"
>>> re.split("([\\s\\(\\)\\-\\+\\*/\\^])", e)
['', '(', '', '(', '12', '*', 'x', '^', '3', '+', '4',
' ', '', ' ', '', ' ', '', '*', '', ' ', '3', ')', '', '*', '3', ')', '']
Is there a way to avoid producing those empty strings, maybe by modifying my regular expression? Of course I can remove them afterward, for example with filter, but I would prefer not to produce them at all.
Edit
I would also like to exclude spaces. If you can help with that too, it would be great.
You could add \w+, remove the \s and do a findall:
import re
e = "((12*x^3+44 * 3)*3)"
print(re.findall(r"(\w+|[()\-+*/^])", e))
Output:
['(', '(', '12', '*', 'x', '^', '3', '+', '44', '*', '3', ')', '*', '3', ')']
Depending on what you want you can change the regex:
e = "((12a*x^3+44 * 3)*3)"
print(re.findall(r"(\d+|[a-z()\-+*/^])", e))
print(re.findall(r"(\w+|[()\-+*/^])", e))
The first treats 12a as two tokens, the second as one:
['(', '(', '12', 'a', '*', 'x', '^', '3', '+', '44', '*', '3', ')', '*', '3', ')']
['(', '(', '12a', '*', 'x', '^', '3', '+', '44', '*', '3', ')', '*', '3', ')']
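Neither pattern handles decimal numbers: an input such as 1.5 comes back as '1' and '5', and findall silently skips the '.'. A sketch that tries a decimal alternative first (the sample string is made up for illustration):

```python
import re

e = "((1.5*x^3+44 * 3)*3)"  # hypothetical input containing a decimal
# Decimals are tried before \w+, so '1.5' stays one token; spaces match
# no alternative, and findall simply skips them.
tokens = re.findall(r"\d+\.\d+|\w+|[-()+*/^]", e)
print(tokens)
```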
Just strip/filter them out in a comprehension.
result = [item for item in re.split("([\\s\\(\\)\\-\\+\\*/\\^])", e) if item.strip()]
I am trying to separate the operators (including parentheses) and the operands in an expression. For example, given the expression
expr = "(32+54)*342-(4*(3-9))"
I am trying to get
['(', '32', '+', '54', ')', '*', '342', '-', '(', '4', '*', '(', '3', '-', '9', ')', ')']
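Here is the code that I wrote. Is there a better way of doing this in Python?
import string

l = list(expr)
n = ''          # digits of the number currently being read
expr = []       # the resulting tokens
for c in l:
    if c in string.digits:
        n += c
    else:
        if n != '':
            expr.append(n)  # flush the pending number
            n = ''
        expr.append(c)      # operator or parenthesis
if n != '':
    expr.append(n)          # flush a number at the end of the input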
We can do this with re.split():
>>> import re
>>> expr = "(32+54)*342-(4*(3-9))"
>>> re.split("([-()+*/])", expr)
['', '(', '32', '+', '54', ')', '', '*', '342', '-', '', '(', '4', '*', '', '(', '3', '-', '9', ')', '', ')', '']
This does insert some empty strings, but they can easily be stripped out, e.g. with a list comprehension:
>>> [part for part in re.split("([-()+*/])", expr) if part]
['(', '32', '+', '54', ')', '*', '342', '-', '(', '4', '*', '(', '3', '-', '9', ')', ')']
If you are only trying to tokenize the stream, your approach is fine, but somewhat old-fashioned. You can use a regular expression to split the tokens more easily.
However, if you also want to do something with the tokens (such as evaluate them) then I suggest you look at a parsing module that can handle recursion (regular expressions cannot handle recursion), such as pyparsing.
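To illustrate why nested parentheses call for recursion, here is a small hand-written recursive-descent evaluator. This is a swapped-in technique, not pyparsing, and a sketch under stated assumptions: only non-negative integers, the four binary operators and parentheses; no unary minus or exponentiation. The function names are hypothetical.

```python
import re

def tokenize(expr):
    # Integers, operators and parentheses; findall skips whitespace.
    return re.findall(r'\d+|[-+*/()]', expr)

def evaluate(tokens):
    pos = [0]  # index of the next unread token (a list so inner functions can update it)

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def take():
        tok = tokens[pos[0]]
        pos[0] += 1
        return tok

    def expression():
        # expression := term (('+' | '-') term)*
        value = term()
        while peek() in ('+', '-'):
            if take() == '+':
                value += term()
            else:
                value -= term()
        return value

    def term():
        # term := factor (('*' | '/') factor)*
        value = factor()
        while peek() in ('*', '/'):
            if take() == '*':
                value *= factor()
            else:
                value /= factor()  # true division on Python 3
        return value

    def factor():
        # factor := NUMBER | '(' expression ')'; this recursion handles nesting
        tok = take()
        if tok == '(':
            value = expression()
            if take() != ')':
                raise ValueError('expected )')
            return value
        return int(tok)

    result = expression()
    if pos[0] != len(tokens):
        raise ValueError('unexpected token %r' % tokens[pos[0]])
    return result
```

For the expression in the question, `evaluate(tokenize("(32+54)*342-(4*(3-9))"))` returns 29436. Each grammar rule is one function, and parentheses are handled by `factor` calling back into `expression`, which is exactly the kind of nesting a single regular expression cannot track.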
Python: Batteries Included.
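>>> import tokenize, StringIO
>>> [x[1] for x in tokenize.generate_tokens(StringIO.StringIO('(32+54)*342-(4*(3-9))').readline)]
['(', '32', '+', '54', ')', '*', '342', '-', '(', '4', '*', '(', '3', '-', '9', ')', ')', '']
(This is Python 2 code; the trailing '' is the string of the ENDMARKER token.)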
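In Python 3, `StringIO` lives in the `io` module, and the tokenizer may also emit an empty NEWLINE token before ENDMARKER. A sketch that keeps only non-empty token strings:

```python
import io
import tokenize

expr = '(32+54)*342-(4*(3-9))'
# generate_tokens takes a readline callable that returns str lines.
tokens = [tok.string
          for tok in tokenize.generate_tokens(io.StringIO(expr).readline)
          if tok.string.strip()]  # drop empty NEWLINE/ENDMARKER strings
print(tokens)
```

Keep in mind that this is Python's own tokenizer, so it follows Python's rules: for example, `**` and `//` come back as single operators, and names like `x` become NAME tokens.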
exp = []
expr = "(32+54)*342-(4*(3-9))"
flag = False  # True while the previous character was a digit
for i in expr:
    if i.isdigit() and flag:
        exp.append(exp.pop() + i)  # extend the number being built
    elif i.isdigit():
        flag = True
        exp.append(i)  # start a new number
    else:
        flag = False
        exp.append(i)  # operator or parenthesis
print(exp)
Output:
['(', '32', '+', '54', ')', '*', '342', '-', '(', '4', '*', '(', '3', '-', '9', ')', ')']
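The same digit-grouping idea can also be expressed with `itertools.groupby`, which groups consecutive characters by whether they are digits. A sketch that, like the loops above, handles only integers:

```python
from itertools import groupby

expr = "(32+54)*342-(4*(3-9))"
tokens = []
for is_digit, group in groupby(expr, key=str.isdigit):
    if is_digit:
        tokens.append(''.join(group))  # a run of digits becomes one number
    else:
        tokens.extend(group)           # each operator/parenthesis is its own token
print(tokens)
```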
I'm writing a parser in Python. I've converted an input string into a list of tokens, such as:
['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')', '+', '4', ')', '/', '3', '.', 'x', '^', '2']
I want to be able to split the list into multiple lists, the way the str.split('+') method does for strings. But there doesn't seem to be a way to do my_list.split('+'). Any ideas?
Thanks!
You can write your own split function for lists quite easily by using yield:
def split_list(l, sep):
    current = []
    for x in l:
        if x == sep:
            yield current
            current = []
        else:
            current.append(x)
    yield current
An alternative way is to use list.index and catch the exception:
def split_list(l, sep):
    i = 0
    try:
        while True:
            j = l.index(sep, i)
            yield l[i:j]
            i = j + 1
    except ValueError:
        yield l[i:]
Either way you can call it like this:
l = ['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')', '+', '4', ')',
'/', '3', '.', 'x', '^', '2']
for r in split_list(l, '+'):
    print(r)
Result:
['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')']
['4', ')', '/', '3', '.', 'x', '^', '2']
For parsing in Python you might also want to look at something like pyparsing.
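Note that in the example list the '+' sits inside the outer parentheses, so splitting on every '+' cuts a parenthesized group in half. If a parser needs to split only at top-level separators, one possible sketch tracks the parenthesis depth; the `split_top_level` name is hypothetical:

```python
def split_top_level(tokens, sep):
    """Split a token list on sep, ignoring separators nested inside parentheses."""
    parts = [[]]
    depth = 0
    for tok in tokens:
        if tok == '(':
            depth += 1
        elif tok == ')':
            depth -= 1
        if tok == sep and depth == 0:
            parts.append([])      # start a new part at a top-level separator
        else:
            parts[-1].append(tok)
    return parts
```

With this helper, the example list contains no top-level '+', so it comes back as a single part, while `split_top_level(['1', '+', '(', '2', '+', '3', ')'], '+')` gives `[['1'], ['(', '2', '+', '3', ')']]`.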
A quick hack: first use ''.join() to build a string from your list, split it at '+' (this gives a list of strings), then call list() on each piece to break it back into individual tokens. This only works if every token is a single character; a token such as '32' or '3.14' would come back split into separate characters.
a = ['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')', '+', '4', ')', '/', '3', '.', 'x', '^', '2']
b = ''.join(a).split('+')
c = []
for el in b:
c.append(list(el))
print(c)
result:
[['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')'], ['4', ')', '/', '3', '.', 'x', '^', '2']]