I am using the "awesomest" parsing library in the world existing right now. Pyparsing. The problem at hand is to generate a PyMongo dictionary from a given SQL string (For select statements). The grammar def I am using is following :
sql_stmt = (select_key_word + ('*' | column_list).setResultsName
("columns") + form_key_word + table_name_list.setResultsName
("collections") +
Optional(where_condition, "").setResultsName("where"))
Here the select_key_word, column_list etc. constructs are valid grammar defs. and using this i can parse a string like "Select * from collection_1 where (Sal = 1000 or Sal=5000) AND Car>2"
The problem i have is that, the where part is being parsed is like this :
[[u'where', [u'(', [u'Sal', '=', u'1000'], 'or', [u'Sal', '=', u'5000'], u')'], 'and', [u'Car', '>', u'2']]]
Which is fine if i want it translated into something sqlish. But a valid representation of that same in pymongo would be something like this :
{u'$or': [{u'$and': [{u'Sal': u'1000'}, {u'Sal': u'5000'}]}, {u'Car': {u'$gte': u'2'}}]}
That is where I am stuck. Can anybody give me a direction? it seems to me that setParseAction will be a way to go, but just can't figure that out
the code for the where_contidion is :
where_expr = Forward()
and_keyword = get_conjunction_as_grammar("and")
or_keyword = get_conjunction_as_grammar("or")
in_operation = get_operation_as_grammar("in")
column_value = get_real_number_as_grammar() | get_int_as_grammar() | \
quotedString
binary_operator = get_bin_op_as_grammar()
col_name = get_column_name_as_grammar()
where_condn = Group(
(col_name + binary_operator + column_value) |
(col_name + in_operation + "(" + delimitedList(column_value) + ")" ) |
("(" + where_expr + ")")
)
where_expr << where_condn + ZeroOrMore((and_keyword | or_keyword)
+ where_expr)
where_condition = Group(CaselessLiteral("where") + where_expr)
Thanks in advance. Please let me know if you need any other information.
Yes, parse actions are just the thing for this kind of project. Also, if you are trying to evaluate an expression that can have parenthetical nesting of operations of varying precedence, then operatorPrecedence is often a handy shortcut:
from pyparsing import *
and_keyword = CaselessKeyword("and")
or_keyword = CaselessKeyword("or")
in_operation = CaselessKeyword("in")
value = quotedString | Word(alphanums)
comparisonOp = oneOf("= != > < >= <=")
LPAR,RPAR = map(Suppress,"()")
valueList = LPAR + delimitedList(value) + RPAR
comparisonExpr = value + comparisonOp + value | value + in_operation + Group(valueList)
def makePymongoComparison(tokens):
v1,op,v2 = tokens
if op != 'in':
if op != '=':
op = {
"!=" : "$ne",
">" : "$gt",
"<" : "$lt",
">=" : "$gte",
"<=" : "$lte",
}[op]
v2 = "{'%s': '%s'}" % (op, v2)
return "{'%s': '%s'}" % (v1, v2)
else:
return "{'%s': {'$in': [%s]}}" % (v1, ','.join("'%s'"%v for v in v2))
comparisonExpr.setParseAction(makePymongoComparison)
def handleBinaryOp(op):
def pa(tokens):
return "{'$%s': %s}" % (op, ', '.join(tokens.asList()[0][::2]))
return pa
handleAnd = handleBinaryOp("and")
handleOr = handleBinaryOp("or")
whereOperand = comparisonExpr
where_expr = operatorPrecedence(whereOperand,
[
(and_keyword, 2, opAssoc.LEFT, handleAnd),
(or_keyword, 2, opAssoc.LEFT, handleOr),
])
where_condition = Group(CaselessLiteral("where") + where_expr)
print where_expr.parseString("(Sal = 1000 or Sal=5000) AND Car>2")[0]
print where_expr.parseString("(Sal = 1000 or Sal=5000) AND Car in (1,2,3)")[0]
prints:
{'$and': {'$or': {'Sal': '1000'}, {'Sal': '5000'}}, {'Car': '{'$gt': '2'}'}}
{'$and': {'$or': {'Sal': '1000'}, {'Sal': '5000'}}, {'Car': {'$in': ['1','2','3']}}}
Still needs a few tweaks, but I hope this gets you further along.
Related
I'd like to use pyparsing to parse an expression of the form: expr = '(gimme [some {nested [lists]}])', and get back a python list of the form: [[['gimme', ['some', ['nested', ['lists']]]]]]. Right now my grammar looks like this:
nestedParens = nestedExpr('(', ')')
nestedBrackets = nestedExpr('[', ']')
nestedCurlies = nestedExpr('{', '}')
enclosed = nestedParens | nestedBrackets | nestedCurlies
Presently, enclosed.searchString(expr) returns a list of the form: [[['gimme', ['some', '{nested', '[lists]}']]]]. This is not what I want because it's not recognizing the square or curly brackets, but I don't know why.
Here's a pyparsing solution that uses a self-modifying grammar to dynamically match the correct closing brace character.
from pyparsing import *
data = '(gimme [some {nested, nested [lists]}])'
opening = oneOf("( { [")
nonBracePrintables = ''.join(c for c in printables if c not in '(){}[]')
closingFor = dict(zip("({[",")}]"))
closing = Forward()
# initialize closing with an expression
closing << NoMatch()
closingStack = []
def pushClosing(t):
closingStack.append(closing.expr)
closing << Literal( closingFor[t[0]] )
def popClosing():
closing << closingStack.pop()
opening.setParseAction(pushClosing)
closing.setParseAction(popClosing)
matchedNesting = nestedExpr( opening, closing, Word(alphas) | Word(nonBracePrintables) )
print matchedNesting.parseString(data).asList()
prints:
[['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
Updated: I posted the above solution because I had actually written it over a year ago as an experiment. I just took a closer look at your original post, and it made me think of the recursive type definition created by the operatorPrecedence method, and so I redid this solution, using your original approach - much simpler to follow! (might have a left-recursion issue with the right input data though, not thoroughly tested):
from pyparsing import *
enclosed = Forward()
nestedParens = nestedExpr('(', ')', content=enclosed)
nestedBrackets = nestedExpr('[', ']', content=enclosed)
nestedCurlies = nestedExpr('{', '}', content=enclosed)
enclosed << (Word(alphas) | ',' | nestedParens | nestedBrackets | nestedCurlies)
data = '(gimme [some {nested, nested [lists]}])'
print enclosed.parseString(data).asList()
Gives:
[['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
EDITED:
Here is a diagram of the updated parser, using the railroad diagramming support coming in pyparsing 3.0.
This should do the trick for you. I tested it on your example:
import re
import ast
def parse(s):
s = re.sub("[\{\(\[]", '[', s)
s = re.sub("[\}\)\]]", ']', s)
answer = ''
for i,char in enumerate(s):
if char == '[':
answer += char + "'"
elif char == '[':
answer += "'" + char + "'"
elif char == ']':
answer += char
else:
answer += char
if s[i+1] in '[]':
answer += "', "
ast.literal_eval("s=%s" %answer)
return s
Comment if you need more
I have this string: a9*a9 + a10*a10
and I would like to have: a9*a8 + a10*a9
I think re.sub() from Python should be useful, but I am not familiar with group() that I've seen in some examples. Any help would be appreciated.
here's another solution method:
import re
s = 'a9*a9 + a10*a10 + a8*a8 + a255*a255 + b58*b58 + c58*c58'
string = re.sub('[ ]', '', s) # removed whitespace from string (optional:only if you are not sure how many space you can get in string)
x = string.split('+')
pattern = re.compile(r'([a-z])([\d]+)')
ans = ''
for element in x:
for letter, num in re.findall(pattern, element):
st = ''
for i in range(len(element.split('*'))):
st = st + '*' + (letter+str(int(num)-i))
# print(str(letter) + str(int(num)-i))
ans = ans + '+' + st[1:]
print(ans[1:])
Output :
a9*a8+a10*a9+a8*a7+a255*a254+b58*b57+c58*c57
Assuming the structure of the input is a\d*a\d + a\d*a\d + ... you can use a callback in the re.sub function:
import re
def decrement(match):
if match.group(1) != match.group(2):
return match.group(0)
return 'a{}*a{}'.format(match.group(1), str(int(match.group(2)) - 1))
re.sub(r'a(\d)\*a(\d)', decrement, 'a3*a3 + a5*a5 + a3*a7')
# a3*a2 + a5*a4 + a3*a7
So, i have some expressions in EBNF form for parsing some systems of differential equations
END = Literal(';').suppress()
POINT = Literal('.')
COMMA = Literal(',').suppress()
COLON = Word(':', max=1).suppress()
EQUAL = Literal('=').suppress()
VARNAME = Word(alphas, max=1)
NATNUM = Word(nums) # 1234567890
SIGN = oneOf('+ -')
OPER = oneOf('+ - * / ^ ')
REALNUM = Combine(Optional(SIGN) + NATNUM + Optional(POINT + NATNUM)) # Real Numbers 2.3, 4.5
STEP = Dict(Group('Step' + COLON + REALNUM + END)) # Step: 0.01 ;
RANGE = Dict(Group('Range' + COLON + REALNUM + END)) # Range: 2.0 ;
VARINIT = Group(VARNAME + Suppress('=') + REALNUM) # x=32.31
ZEROVAR = Dict(Group('Vars0' + COLON + VARINIT + Optional(COMMA + VARINIT) + END))
COEFF = Dict(Group('Coeff' + COLON + VARINIT + Optional(COMMA + VARINIT) + END))
EXPESS = Forward()
EXPESS << Combine((REALNUM | VARNAME) + ZeroOrMore(OPER + EXPESS), adjacent=False)
IDENT = Combine('d'+VARNAME)
FUNC = Group(IDENT + EQUAL + EXPESS)
DIFUR = Dict(Group('Exp' + COLON + FUNC + ZeroOrMore(COMMA + FUNC) + END))
STATE = Suppress("Start") + DIFUR + ZEROVAR + COEFF + STEP + RANGE + Suppress("Stop")
I'd like to receive such kind of JSON by parsing the finally STATE expression:
{
'Vars0': {
'y', '0.55',
'x', '0.02',
},
'Exp': {
'dx': 'a*x-y',
'dy': 'b*x-y',
'dz':'800-2*4*x+z'
},
'Range': '2.0',
'Step': '0.05',
'Coeff': {
'a': '5',
'b': '2'
}
}
But instead i've got some thing ugly like this for example 'Vars0': ([(['y', '0.55'], {}), (['x', '0.02'], {})], {}) and etc.
What is my stupid mistake?
p.s. parsing plain text for parsing can be like this
What you have isn't JSON, it's a Python dictionary variable, which fortunately means it can be pretty printed with the pprint module.
Have a look, specifically, at pprint.pprint: https://docs.python.org/2/library/pprint.html#pprint.pprint .
Setting an indent of 4 and a width of 1 might produce something pleasing to you. Example: https://ideone.com/pYESaW
I am using pyparsing to parse a hex string and I am searching for an automatic way of print the parser tree.
A near approach is command dump but it print a lot of duplicated info.
For example:
from pyparsing import * #Word, Optional, OneOrMore, Group, ParseException
data = Forward()
arrayExpr = Forward()
def data_array(s,l,t):
n = int(t[0], 16)
arrayExpr << ( n * data)
return t[0]
array = Word(hexnums, exact=2).setParseAction(data_array) + arrayExpr
data << (Literal('01') + array.setResultsName('array')
| Literal('03') + Word(hexnums, exact=2)('char')
| Literal('04') + Word(hexnums, exact=2)('boolean'))
frame = (Word(hexnums, exact=2)('id') \
+ data('data'))('frame')
result = frame.parseString("02010203010302");
print result.dump()
The goal is that result of result.dump() was something similar to
- frame: ['02', '01', '03', '03', '01', '04', '02', '03', '02']
- id: 02
- array: ['03', '03', '01', '04', '02', '03', '02']
- char: 01
- boolean: 02
- char: 02
The pretty print isn't mandatory, the pretended is the tree structure.
Is there a way of make this print or I will need to had a setParseAction for all rules ?
Looks like you'll need a setParseAction for each of the rules.
From parsing to object hierarchy: "Attach parse actions to each expression, but here is the trick: use a class instead of a function. The class's init method will get called, and return an instance of that class. "
Prefer to add an answer instead of the edit the question, to much code ..
Isn't perfect, the levels don't get right and the classes could be discarded if I could get the resultsName from printAction. Maybe should create a new question :-/
If someone use it and improve please say how :)
#!/usr/bin/python
from pyparsing import * #Word, Optional, OneOrMore, Group, ParseException
data = Forward()
level = 0
arrayExpr = Forward()
def data_array(s,l,t):
n = int(t[0], 16)
arrayExpr << ( n * data)
return t[0]
class TreeChild(object):
def __init__(self,t):
self.args = t
def __str__(self):
ret = " %s: " % self.name
return ' ' * level + ret + self.args[0] + "\n"
class TreeBranch(object):
def __init__(self,t):
self.args = t
def __str__(self):
global level
level = level + 1
childs = " ".join(map(str,self.args))
level = level - 1
ret = " %s: " % self.name + '\n'
return ' ' * level + ret + childs + "\n"
class Frame(TreeBranch):
name = 'frame'
class Char(TreeChild):
name = 'char'
class Boolean(TreeChild):
name = 'boolean'
class Id(TreeChild):
name = 'id'
class Array(TreeBranch):
name = 'array'
array = Suppress(Word(hexnums, exact=2).setParseAction(data_array)) + arrayExpr
data << (Suppress(Literal('01')) + array.setResultsName('array').setParseAction(Array)
| Suppress(Literal('03')) + Word(hexnums, exact=2)('char').setParseAction(Char)
| Suppress(Literal('04')) + Word(hexnums, exact=2)('boolean').setParseAction(Boolean))
frame = (Word(hexnums, exact=2)('id').setParseAction(Id) \
+ data('data'))('frame').setParseAction(Frame)
result = frame.parseString("020103030104020302");
print result[0]
I'd like to use pyparsing to parse an expression of the form: expr = '(gimme [some {nested [lists]}])', and get back a python list of the form: [[['gimme', ['some', ['nested', ['lists']]]]]]. Right now my grammar looks like this:
nestedParens = nestedExpr('(', ')')
nestedBrackets = nestedExpr('[', ']')
nestedCurlies = nestedExpr('{', '}')
enclosed = nestedParens | nestedBrackets | nestedCurlies
Presently, enclosed.searchString(expr) returns a list of the form: [[['gimme', ['some', '{nested', '[lists]}']]]]. This is not what I want because it's not recognizing the square or curly brackets, but I don't know why.
Here's a pyparsing solution that uses a self-modifying grammar to dynamically match the correct closing brace character.
from pyparsing import *
data = '(gimme [some {nested, nested [lists]}])'
opening = oneOf("( { [")
nonBracePrintables = ''.join(c for c in printables if c not in '(){}[]')
closingFor = dict(zip("({[",")}]"))
closing = Forward()
# initialize closing with an expression
closing << NoMatch()
closingStack = []
def pushClosing(t):
closingStack.append(closing.expr)
closing << Literal( closingFor[t[0]] )
def popClosing():
closing << closingStack.pop()
opening.setParseAction(pushClosing)
closing.setParseAction(popClosing)
matchedNesting = nestedExpr( opening, closing, Word(alphas) | Word(nonBracePrintables) )
print matchedNesting.parseString(data).asList()
prints:
[['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
Updated: I posted the above solution because I had actually written it over a year ago as an experiment. I just took a closer look at your original post, and it made me think of the recursive type definition created by the operatorPrecedence method, and so I redid this solution, using your original approach - much simpler to follow! (might have a left-recursion issue with the right input data though, not thoroughly tested):
from pyparsing import *
enclosed = Forward()
nestedParens = nestedExpr('(', ')', content=enclosed)
nestedBrackets = nestedExpr('[', ']', content=enclosed)
nestedCurlies = nestedExpr('{', '}', content=enclosed)
enclosed << (Word(alphas) | ',' | nestedParens | nestedBrackets | nestedCurlies)
data = '(gimme [some {nested, nested [lists]}])'
print enclosed.parseString(data).asList()
Gives:
[['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
EDITED:
Here is a diagram of the updated parser, using the railroad diagramming support coming in pyparsing 3.0.
This should do the trick for you. I tested it on your example:
import re
import ast
def parse(s):
s = re.sub("[\{\(\[]", '[', s)
s = re.sub("[\}\)\]]", ']', s)
answer = ''
for i,char in enumerate(s):
if char == '[':
answer += char + "'"
elif char == '[':
answer += "'" + char + "'"
elif char == ']':
answer += char
else:
answer += char
if s[i+1] in '[]':
answer += "', "
ast.literal_eval("s=%s" %answer)
return s
Comment if you need more