pyparsing: setResultsName for multiple elements get combined

Here is the text I'm parsing:
x ~ normal(mu, 1)
y ~ normal(mu2, 1)
The parser matches those lines using:
model_definition = Group(identifier.setResultsName('random_variable_name') + '~' + expression).setResultsName('model_definition')
(Note the .setResultsName('model_definition') at the end of the line.)
The problem is that when there are two model definitions, they aren't named separately in the ParseResults object:
It looks like the first one gets overridden by the second. The reason I'm naming them is to make executing the lines easier - this way I (hopefully) don't have to figure out what is going on at evaluation time - the parser has already labelled everything. How can I get both model_definitions labelled? It would be nice if model_definition held a list of every model definition found.
Just in case, here is some more of my code:
model_definition = Group(identifier.setResultsName('random_variable_name') + '~' + expression).setResultsName('model_definition')
expression << Or([function_application, number, identifier, list_literal, probability_expression])
statement = Optional(newline) + Or([model_definition, assignment, function_application]) + Optional(newline)
line = OneOrMore('\n').suppress()
comment = Group('#' + SkipTo(newline)).suppress()
program = OneOrMore(Or([line, statement, comment]))
ast = program.parseString(input_string)
return ast

Not documented that I know of, but I found something in pyparsing.py:
I changed .setResultsName('model_definition') to .setResultsName('model_definition*') and they listed correctly!
Edit: it is documented, but it is a flag you pass to setResultsName:
setResultsName( string, listAllMatches=False ) - name to be given to tokens matching the element; if multiple tokens within a repetition group (such as ZeroOrMore or delimitedList) the default is to return only the last matching token - if listAllMatches is set to True, then a list of matching tokens is returned.

Here is enough of your code to get things to work:
from pyparsing import *
# fake in the bare minimum to parse the given test strings
identifier = Word(alphas, alphanums)
integer = Word(nums)
function_call = identifier + '(' + Optional(delimitedList(identifier | integer)) + ')'
expression = function_call
model_definition = Group(identifier.setResultsName('random_variable_name') + '~' + expression)
sample = """
x ~ normal(mu, 1)
y ~ normal(mu2, 1)
"""
The trailing '*' is there in setResultsName for those cases where you use the short form of setResultsName: expr("name*") vs expr.setResultsName("name", listAllMatches=True). If you prefer calling setResultsName, then I would not use the '*' notation, but would pass the listAllMatches argument.
If you are getting names that step on each other, you may need to add a level of Grouping. Here is your solution using listAllMatches=True, by virtue of the trailing '*' notation:
model_definition1 = model_definition('model_definition*')
print(OneOrMore(model_definition1).parseString(sample).dump())
It returns this parse result:
[['x', '~', 'normal', '(', 'mu', '1', ')'], ['y', '~', 'normal', '(', 'mu2', '1', ')']]
- model_definition: [['x', '~', 'normal', '(', 'mu', '1', ')'], ['y', '~', 'normal', '(', 'mu2', '1', ')']]
[0]:
['x', '~', 'normal', '(', 'mu', '1', ')']
- random_variable_name: x
[1]:
['y', '~', 'normal', '(', 'mu2', '1', ')']
Here is a variation that does not use listAllMatches, but adds another level of Group:
model_definition2 = model_definition('model_definition')
print(OneOrMore(Group(model_definition2)).parseString(sample).dump())
gives:
[[['x', '~', 'normal', '(', 'mu', '1', ')']], [['y', '~', 'normal', '(', 'mu2', '1', ')']]]
[0]:
[['x', '~', 'normal', '(', 'mu', '1', ')']]
- model_definition: ['x', '~', 'normal', '(', 'mu', '1', ')']
- random_variable_name: x
[1]:
[['y', '~', 'normal', '(', 'mu2', '1', ')']]
- model_definition: ['y', '~', 'normal', '(', 'mu2', '1', ')']
- random_variable_name: y
In both cases, I see the full content being returned, so I don't quite understand what you mean by "if you return multiple, it fails to split out each child."

Related

How to split duplicated separator in Python

I have a string with the format
exp = '(( 200 + (4 * 3.14)) / ( 2 ** 3 ))'
I would like to split the string into tokens using re.split() and keep the separators as well. However, I can't get ** to stay together; it ends up split into two * tokens instead.
This is my code: tokens = re.split(r'([+|-|**?|/|(|)])',exp)
My Output (wrong):
['(', '(', '200', '+', '(', '4', '*', '3.14', ')', ')', '/', '(', '2', '*', '*', '3', ')', ')']
Is there a way to make the split distinguish between * and **? Thank you so much!
Desired Output:
['(', '(', '200', '+', '(', '4', '*', '3.14', ')', ')', '/', '(', '2', '**', '3', ')', ')']

Using the [...] notation only allows you to specify individual characters. To get alternatives of different lengths, you need to use the | operator outside the brackets. This also means that you need to escape the regular expression operators, and that you need to place the longer patterns before the shorter ones (i.e. ** before *):
tokens = re.split(r'(\*\*|\*|\+|\-|/|\(|\))',exp)
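or, more compactly:
tokens = re.split(r'(\*\*|[-*+/()])',exp)
(Inside a character class, put - first so that it is a literal: in [*+-/()] the sequence +-/ is a range that also matches , and ., which would split 3.14 apart.)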
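A quick way to check the alternation approach on the question's string. This is a sketch: re.split() keeps the captured separators but also returns empty strings and whitespace-only pieces between adjacent separators, so the list comprehension here strips and drops them (that filtering step is an addition, not part of the answer above).

```python
import re

exp = '(( 200 + (4 * 3.14)) / ( 2 ** 3 ))'

# '**' is listed before the character class so it wins over a single '*';
# '-' is placed first in the class so it is a literal, not a range.
pattern = re.compile(r'(\*\*|[-*+/()])')

# re.split keeps captured separators, but also yields '' and runs of spaces;
# strip each piece and drop the ones that end up empty.
tokens = [piece.strip() for piece in pattern.split(exp) if piece.strip()]
print(tokens)
```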

How to properly split this list of strings?

I have a list of strings like this:
['z+2-44', '4+55+z+88']
How can I split the strings in this list so that the result is something like
[['z','+','2','-','44'],['4','+','55','+','z','+','88']]
I have already tried the split method, but that splits 44 into 4 and 4, and I'm not sure what else to try.

You can use regex:
import re
lst = ['z+2-44', '4+55+z+88']
[re.findall(r'\w+|\W+', s) for s in lst]
# [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
\w+|\W+ matches a pattern that consists either of word characters (alphanumeric values in your case) or non word characters (+- signs in your case).

This works using itertools.groupby:
import itertools
z = ['z+2-44', '4+55+z+88']
print([["".join(x) for k,x in itertools.groupby(i,str.isalnum)] for i in z])
output:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
It groups consecutive characters by whether they are alphanumeric, then joins each group back into a string inside a list comprehension.
EDIT: the general case of a calculator with parentheses has been asked as a follow-up question. If z is as follows:
z = ['z+2-44', '4+55+((z+88))']
then with the previous grouping we get:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+((', 'z', '+', '88', '))']]
That is hard to use as a token list, because '+((' and '))' end up as single items. So one change is to join only the alphanumeric groups, keep the other groups as individual characters, and flatten the result at the end using chain.from_iterable:
print([list(itertools.chain.from_iterable(["".join(x)] if k else x for k,x in itertools.groupby(i,str.isalnum))) for i in z])
which yields:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', '(', '(', 'z', '+', '88', ')', ')']]
(Note that the regex answer can be adapted in the same way: [re.findall(r'\w+|\W', s) for s in lst], with no + after \W, so each non-word character becomes its own token.)
Also, "".join(list(x)) is slightly faster than "".join(x), but I've left that out to avoid making an already complex expression harder to read.
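The nested comprehension above can be hard to read, so here is the same groupby logic written as a small function. The name split_tokens is made up for this sketch; it joins runs of alphanumeric characters and emits every other character as its own token.

```python
import itertools

def split_tokens(expr):
    """Split expr into tokens: runs of alphanumeric characters stay together,
    every other character becomes its own token."""
    tokens = []
    for is_alnum, chars in itertools.groupby(expr, str.isalnum):
        if is_alnum:
            tokens.append("".join(chars))  # e.g. '88' stays one token
        else:
            tokens.extend(chars)           # e.g. '+((' becomes '+', '(', '('
    return tokens

z = ['z+2-44', '4+55+((z+88))']
print([split_tokens(s) for s in z])
```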

Alternative solution using re.split function:
import re
l = ['z+2-44', '4+55+z+88']
print([list(filter(None, re.split(r'(\w+)', i))) for i in l])
The output:
[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]

You could use just the built-in str.replace() and str.split() methods in a list comprehension:
In [34]: lst = ['z+2-44', '4+55+z+88']
In [35]: [s.replace('+', ' + ').replace('-', ' - ').split() for s in lst]
Out[35]: [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
Note, however, that this approach is not efficient for longer strings; in that case, regex is the better way to go.
Another Pythonic option is the tokenize module (the trailing NEWLINE and ENDMARKER tokens it emits are filtered out by type):
In [56]: from io import StringIO
In [57]: import tokenize
In [59]: [[t.string for t in tokenize.generate_tokens(StringIO(i).readline) if t.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)] for i in lst]
Out[59]: [['z', '+', '2', '-', '44'], ['4', '+', '55', '+', 'z', '+', '88']]
The tokenize module provides a lexical scanner for Python source code, implemented in Python. The scanner in this module returns comments as tokens as well, making it useful for implementing “pretty-printers,” including colorizers for on-screen displays.

If you want to stick with split (hence avoiding regex), you can provide it with an optional character to split on:
>>> testing = 'z+2-44'
>>> testing.split('+')
['z', '2-44']
>>> testing.split('-')
['z+2', '44']
So, you could whip something up by chaining the split commands.
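One way to chain split calls without regex, assuming single-character separators that do not overlap: split on each separator in turn and put the separator back between the pieces. The helper name split_keep is hypothetical, and unlike the plain split() calls above it keeps the operators.

```python
def split_keep(s, separators=('+', '-')):
    """Split s on each separator in turn with str.split, keeping the separators."""
    parts = [s]
    for sep in separators:
        next_parts = []
        for part in parts:
            pieces = part.split(sep)
            for i, piece in enumerate(pieces):
                if i:  # a separator sat between pieces[i-1] and pieces[i]
                    next_parts.append(sep)
                next_parts.append(piece)
        parts = next_parts
    # leading, trailing or doubled separators leave empty strings behind
    return [p for p in parts if p]

print(split_keep('z+2-44'))
```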
However, using regular expressions would probably be more readable:
>>> import re
>>> re.split(r'\+|\-', testing)
['z', '2', '44']
This just says "split the string at any + or - character" (the backslashes escape the characters: + has a special meaning in a regex, while escaping - is harmless but not required outside a character class).
Lastly, in this particular case, I imagine the goal is something like "split at every non-alphanumeric character", in which case regex can still save the day:
>>> re.split('[^a-zA-Z0-9]', testing)
['z', '2', '44']
There are, of course, many other solutions, as discussed in other Stack Overflow questions:
Python: Split string with multiple delimiters
Split Strings with Multiple Delimiters?
My answers here aim for simple, readable code rather than performance, in honor of Donald Knuth.

re.split on multiple characters (and maintaining the characters) produces a list containing also empty strings

I need to split a mathematical expression based on the delimiters. The delimiters are (, ), +, -, *, /, ^ and space. I came up with the following regular expression
"([\\s\\(\\)\\-\\+\\*/\\^])"
which also keeps the delimiters in the resulting list (which is what I want), but it also produces empty-string ("") elements, which I don't want. I hardly ever use regular expressions (unfortunately), so I am not sure whether it is possible to avoid this.
Here's an example of the problem:
>>> import re
>>> e = "((12*x^3+4 * 3)*3)"
>>> re.split("([\\s\\(\\)\\-\\+\\*/\\^])", e)
['', '(', '', '(', '12', '*', 'x', '^', '3', '+', '4',
' ', '', ' ', '', ' ', '', '*', '', ' ', '3', ')', '', '*', '3', ')', '']
Is there a way to not produce those empty strings, maybe by modifying my regular expression? Of course I can remove them using for example filter, but the idea would be not to produce them at all.
Edit
I also need to exclude spaces. If you can help with that as well, it would be great.

You could add \w+, remove the \s and do a findall:
import re
e = "((12*x^3+44 * 3)*3)"
print(re.findall(r"(\w+|[()\-+*/^])", e))
Output:
['(', '(', '12', '*', 'x', '^', '3', '+', '44', '*', '3', ')', '*', '3', ')']
Depending on what you want you can change the regex:
e = "((12a*x^3+44 * 3)*3)"
print(re.findall(r"(\d+|[a-z()\-+*/^])", e))
print(re.findall(r"(\w+|[()\-+*/^])", e))
The first treats 12a as two tokens, the second as one:
['(', '(', '12', 'a', '*', 'x', '^', '3', '+', '44', '*', '3', ')', '*', '3', ')']
['(', '(', '12a', '*', 'x', '^', '3', '+', '44', '*', '3', ')', '*', '3', ')']

Just strip/filter them out in a comprehension.
result = [item for item in re.split("([\\s\\(\\)\\-\\+\\*/\\^])", e) if item.strip()]
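To compare the two answers, this sketch runs both on the question's original input: findall with word runs or single operator characters, and re.split followed by a filter that drops empty and whitespace-only items. On this input they give the same token list; they differ for characters neither pattern lists, such as the '.' in 3.14 (findall silently drops it, while the split version keeps 3.14 in one piece).

```python
import re

e = "((12*x^3+4 * 3)*3)"

# Approach 1: findall with word runs or single operator characters (spaces never match).
found = re.findall(r"\w+|[()\-+*/^]", e)

# Approach 2: split on the delimiters, then drop '' and whitespace-only items.
split_filtered = [item for item in re.split(r"([\s()\-+*/^])", e) if item.strip()]

print(found)
print(found == split_filtered)
```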

How to replace a character in a list of lists

I have to replace some of the entries in a list of lists with a specific word given by the user. I tried multiple times and kept getting errors; after fixing them I now have code that runs, but it won't print anything. Even though I added a print call so I could see how the code ran, nothing shows up.
Here is the list of lists:
table = [['*', '*', '*', '*', '*'],
['*', '*', '*', '*', '*'],
['*', '*', '*', '*', '*'],
['*', '*', '*', '*', '*'],
['*', '*', '*', '*', '*']]
and here is the code I tried:
i = 0
def create_table(secret):
    secret = input("Enter the secret Word: ")
    secret = secret.upper()
    secret = secret.replace('J','I')
    return secret
    for row in range(5):
        for col in range(5):
            table = [t.replace(table[row][col], secret[i]) for t in table]
            i +=1
    print(table)
print(create_table(secret))

You have return secret halfway through your function. This means that the rest of the code in that function will never execute. You should move return secret to the end of the function definition.
You are also accepting a parameter to the create_table() function that you then immediately overwrite, so you can get rid of it.

table is a mutable list, so just do:
table[row][col] = secret[i]
and remove the return secret or you won't get to the code.
A simple example:
import pprint
table = [['*', '*', '*', '*', '*'],
         ['*', '*', '*', '*', '*'],
         ['*', '*', '*', '*', '*'],
         ['*', '*', '*', '*', '*'],
         ['*', '*', '*', '*', '*']]
def create_table():
    secret = 'ABCDEFGHIJKLMNOPQRSTUVWXY'
    for row in range(5):
        for col in range(5):
            table[row][col] = secret[row*5 + col]
    pprint.pprint(table)
create_table()
Output:
[['A', 'B', 'C', 'D', 'E'],
 ['F', 'G', 'H', 'I', 'J'],
 ['K', 'L', 'M', 'N', 'O'],
 ['P', 'Q', 'R', 'S', 'T'],
 ['U', 'V', 'W', 'X', 'Y']]

There are a few problems with your code.
One thing to note is that a function stops executing as soon as it returns something.*
So in your function create_table, the lines after your return statement are not executed at all.
Also note that you should either print from within the function, or return a value and print it from the main body. Right now you're printing from inside your function and also passing the function call as an argument to print in the main body.
Just return table from your function and print it from the main body. That's the standard practice and the right way to do it.
Or you don't even need to do that, since you're modifying a global variable from inside your function anyway.
Edit: To modify the variable table from inside your function, add the line
global table
within your function before you try to make any changes to table, so that your function knows it's the global variable you're trying to modify rather than a new local variable with the same name.
*I think there may be a way to work around this, but I'm not sure.
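A minimal sketch that puts the suggestions above together: the word is passed in as a parameter instead of being read with input() inside the function, the table is built locally, and the function returns it so the caller does the printing. Filling cells row by row, ignoring letters past the 25th, and the sample word 'jigsaw' are assumptions made for illustration.

```python
def create_table(secret):
    """Return a new 5x5 table whose first cells hold the letters of `secret`.

    The word is upper-cased and 'J' is folded into 'I', as in the question.
    Letters beyond the 25th are ignored; unused cells keep '*'.
    """
    secret = secret.upper().replace('J', 'I')
    table = [['*'] * 5 for _ in range(5)]  # five independent rows
    for i, letter in enumerate(secret[:25]):
        table[i // 5][i % 5] = letter  # row-major position of the i-th letter
    return table

# The caller reads the word (e.g. with input()) and prints the result.
for row in create_table('jigsaw'):
    print(row)
```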

Separating strings (list elements) with many splitters without losing splitter from list

I want to split list elements that contain any value from
list_operators = ['+', '-', '*', '(', ')']
without losing the operators from the list, and without using regex.
For instance:
my_list = ['a', '=', 'x+y*z', '//', 'moo']
Wanted output :
['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo']
Note that x, y and z are words, not single characters:
['john+doe/12*5']
['john','+','doe','/','12','*','5']

You can use itertools.groupby() to achieve this:
from itertools import groupby
operators = {'+', '-', '*', '(', ')'}
fragments = ['a', '=', 'x+y*z', '//', 'moo', '-', 'spam*(eggs-ham)']
separated = []
for fragment in fragments:
    for is_operator, group in groupby(fragment, lambda c: c in operators):
        if is_operator:
            separated.extend(group)
        else:
            separated.append(''.join(group))
>>> separated
['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo', '-',
'spam', '*', '(', 'eggs', '-', 'ham', ')']
Note that I've changed the names of your variables to be a little more meaningful, and made operators a set because we only care about membership, not order (although the code would work just as well, if a little more slowly, with a list).
groupby() returns an iterable of (key, group) pairs, starting a new group whenever key changes. Since I've chosen a key function (lambda c: c in operators) that just tests for a character's membership in operators, the result of the groupby() call looks something like this:
[
(False, ['s', 'p', 'a', 'm']),
(True, ['*', '(']),
(False, ['e', 'g', 'g', 's']),
(True, ['-']),
(False, ['h', 'a', 'm']),
(True, [')'])
]
(groupby() actually returns a groupby object made up of (key,grouper object) tuples - I've converted those objects to lists in the example above for clarity).
The rest of the code is straightforward: if is_operator is True, the characters in group are used to extend separated; if it's False, the characters in group are joined back into a string and appended to separated.
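To see the (key, group) pairs for yourself, you can materialise each group as a list, which is what the illustration above does by hand. This sketch reuses the operators set and the 'spam*(eggs-ham)' fragment from the answer.

```python
from itertools import groupby

operators = {'+', '-', '*', '(', ')'}

# Convert each grouper object to a list so the (key, group) pairs can be printed.
pairs = [(is_operator, list(group))
         for is_operator, group in groupby('spam*(eggs-ham)', lambda c: c in operators)]
for pair in pairs:
    print(pair)
```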

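This is an easy way of doing it:
slist = []
for x in my_list:
    if len(set(list_operators) & set(list(x))) != 0:
        for i in list(x):
            slist.append(i)
    else:
        slist.append(x)
slist
['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo']
Note that this splits any element containing an operator into single characters, so it only works when the operands are one character long ('john+doe' would become 'j', 'o', 'h', 'n', '+', ...).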

You can also do something like this:
import re
from itertools import chain
list_operators = ['+', '-', '*', '(', ')']
tokenizer = re.compile(r"[{}]|\w+".format("".join(map(re.escape, list_operators))))
my_list = ['a', '=', 'x+y*z', '//', 'moo', 'john+doe/12*5']
parsed = list(chain.from_iterable(map(tokenizer.findall, my_list)))
parsed result (note that '=', '//' and '/' are dropped, because they match neither alternative of the pattern):
['a', 'x', '+', 'y', '*', 'z', 'moo', 'john', '+', 'doe', '12', '*', '5']
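If regex is acceptable after all (the question asked to avoid it) and you also want to keep tokens such as '=', '//' and '/', one option is a third alternative that matches runs of other non-space characters. This is a sketch of that variation, not part of the original answer:

```python
import re
from itertools import chain

list_operators = ['+', '-', '*', '(', ')']
ops = "".join(map(re.escape, list_operators))

# Alternatives, tried in order at each position:
#   1. a single operator character
#   2. a run of word characters
#   3. a run of other non-space characters that are not operators (keeps '=', '//', '/')
tokenizer = re.compile(r"[{0}]|\w+|[^\w\s{0}]+".format(ops))

my_list = ['a', '=', 'x+y*z', '//', 'moo', 'john+doe/12*5']
parsed = list(chain.from_iterable(map(tokenizer.findall, my_list)))
print(parsed)
```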
