PLY Lex: ID could be anything

PLY Lex: ID could be anything - python

I have a following simple format:
BLOCK ID {
SUBBLOCK ID {
SUBSUBBLOCK ID {
SOME STATEMENTS;
};
};
};
I configured ply to work with this format. But the issue is that ID could be any string including "BLOCK", "SUBBLOCK", etc.
In the lexer I define ID as:
#TOKEN(r'[a-zA-Z_][a-zA-Z_0-9]*')
def t_ID(self, t):
t.type = self.keyword_map.get(t.value, "ID")
return t
But it means that BLOCK word will not be allowed as a block name.
How I can overcome this issue?

The easiest solution is to create a non-terminal name to be used instead of ID in productions which need a name, such as block : BLOCK name braced_statements:
# Docstring is added later
def p_name(self, p):
p[0] = p[1]
Then you compute the productions for name and assign them to p_name's docstring by executing this before you generate the parser:
Parser.p_name.__doc__ = '\n| '.join(
['name : ID']
+ list(Lexer.keyword_map.values())
)

I probably dont understand your question
but i think something like the following would work (its been a long time since
I messed with PLY so keep in mind this is pseudocode
FUNCTION_ARGS = zeroOrMore(lazy('STATEMENT'),sep=',')
FUNCTION_CALL = t_ID + lparen + FUNCTION_ARGS + rparen
STATEMENT= FUNCTION_CALL | t_Literal | t_ID
SUBBLOCK = Literal('SUBBLOCK') + t_ID + lbrace + STATEMENT + rbrace
BLOCK = Literal('BLOCK') + lbrace + oneOrMore(SUBBLOCK) + rbrace

Related

PyParsing: parse if not a keyword

I am trying to parse a file as follows:
testp.txt
title = Test Suite A;
timeout = 10000
exp_delay = 500;
log = TRUE;
sect
{
type = typeA;
name = "HelloWorld";
output_log = "c:\test\out.log";
};
sect
{
name = "GoodbyeAll";
type = typeB;
comm1_req = 0xDEADBEEF;
comm1_resp = (int, 1234366);
};
The file first contains a section with parameters and then some sects. I can parse a file containing just parameters and I can parse a file just containing sects but I can't parse both.
from pyparsing import *
from pathlib import Path
command_req = Word(alphanums)
command_resp = "(" + delimitedList(Word(alphanums)) + ")"
kW = Word(alphas+'_', alphanums+'_') | command_req | command_resp
keyName = ~Literal("sect") + Word(alphas+'_', alphanums+'_') + FollowedBy("=")
keyValue = dblQuotedString.setParseAction( removeQuotes ) | OneOrMore(kW,stopOn=LineEnd())
param = dictOf(keyName, Suppress("=")+keyValue+Optional(Suppress(";")))
node = Group(Literal("sect") + Literal("{") + OneOrMore(param) + Literal("};"))
final = OneOrMore(node) | OneOrMore(param)
param.setDebug()
p = Path(__file__).with_name("testp.txt")
with open(p) as f:
try:
x = final.parseFile(f, parseAll=True)
print(x)
print("...")
dx = x.asDict()
print(dx)
except ParseException as pe:
print(pe)
The issue I have is that param matches against sect so it expects a =. So I tried putting in ~Literal("sect") in keyName but that just leads to another error:
Exception raised:Found unwanted token, "sect", found '\n' (at char 188), (line:4, col:56)
Expected end of text, found 's' (at char 190), (line:6, col:1)
How do I get it use one parse method for sect and another (param) if not sect?
My final goal would be to have the whole lot in a Dict with the global params and sects included.
EDIT
Think I've figured it out:
This line...
final = OneOrMore(node) | OneOrMore(param)
...should be:
final = ZeroOrMore(param) + ZeroOrMore(node)
But I wonder if there is a more structured way (as I'd ultimately like a dict)?

Dealing with ZeroOrMore in pyparsing

I'm trying to parse pactl list with pyparsing: So far all parse is working correctly but I cannot make ZeroOrMore to work correctly.
I can find foo: or foo: bar and try to deal with that with ZeroOrMore but it doesn't work, I have to add special case "Argument:" to find results without value, but there're Argument: foo results (with value) so it will not work, and I expect any other property to exist without value.
With this definition, and a fixed pactl list output:
#!/usr/bin/env python
#
# parsing pactl list
#
from pyparsing import *
import os
from subprocess import check_output
import sys
data = '''
Module #6
Argument:
Name: module-alsa-card
Usage counter: 0
Properties:
module.author = "Lennart Poettering"
module.description = "ALSA Card"
module.version = "14.0-rebootstrapped"
'''
indentStack = [1]
stmt = Forward()
identifier = Word(alphanums+"-_.")
sect_def = Group(Group(identifier) + Suppress("#") + Group(Word(nums)))
inner_section = indentedBlock(stmt, indentStack)
section = (sect_def + inner_section)
value = Group(Group(Combine(OneOrMore(identifier|White(' ')))) + Suppress(":") + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_".')|White(' ', max=1)))))
prop_name = Literal("Properties:")
prop_section = indentedBlock(stmt, indentStack)
prop_val = Group(Group(identifier) + Suppress("=") + Group(Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))))
prop = (prop_name + prop_section)
stmt << ( section | prop | ("Argument:") | value | prop_val )
syntax = OneOrMore(stmt)
parseTree = syntax.parseString(data)
parseTree.pprint()
This gets:
$ ./pactl.py
Module #6
Argument:
Name: module-alsa-card
Usage counter: 0
Properties:
module.author = "Lennart Poettering"
module.description = "ALSA Card"
module.version = "14.0-rebootstrapped"
[[['Module'], ['6']],
[['Argument:'],
[[['Name'], ['module-alsa-card']]],
[[['Usage counter'], ['0']]],
['Properties:',
[[[['module.author'], ['"Lennart Poettering"']]],
[[['module.description'], ['"ALSA Card"']]],
[[['module.version'], ['"14.0-rebootstrapped"']]]]]]]
So far so good, but removing special case for Argument: it gets into error, as ZeroOrMore doesn't behave as expected:
#!/usr/bin/env python
#
# parsing pactl list
#
from pyparsing import *
import os
from subprocess import check_output
import sys
data = '''
Module #6
Argument:
Name: module-alsa-card
Usage counter: 0
Properties:
module.author = "Lennart Poettering"
module.description = "ALSA Card"
module.version = "14.0-rebootstrapped"
'''
indentStack = [1]
stmt = Forward()
identifier = Word(alphanums+"-_.")
sect_def = Group(Group(identifier) + Suppress("#") + Group(Word(nums)))
inner_section = indentedBlock(stmt, indentStack)
section = (sect_def + inner_section)
value = Group(Group(Combine(OneOrMore(identifier|White(' ')))) + Suppress(":") + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_".')|White(' ', max=1))))).setDebug()
prop_name = Literal("Properties:")
prop_section = indentedBlock(stmt, indentStack)
prop_val = Group(Group(identifier) + Suppress("=") + Group(Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))))
prop = (prop_name + prop_section)
stmt << ( section | prop | value | prop_val )
syntax = OneOrMore(stmt)
parseTree = syntax.parseString(data)
parseTree.pprint()
This results in:
$ ./pactl.py
Module #6
Argument:
Name: module-alsa-card
Usage counter: 0
Properties:
module.author = "Lennart Poettering"
module.description = "ALSA Card"
module.version = "14.0-rebootstrapped"
Match Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) at loc 19(3,9)
Matched Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) -> [[['Argument'], ['Name']]]
Match Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) at loc 1(2,1)
Exception raised:Expected ":", found '#' (at char 8), (line:2, col:8)
Traceback (most recent call last):
File "/home/alberto/projects/node/pacmd_list_json/./pactl.py", line 55, in <module>
parseTree = syntax.parseString(partial)
File "/usr/local/lib/python3.9/site-packages/pyparsing.py", line 1955, in parseString
raise exc
File "/usr/local/lib/python3.9/site-packages/pyparsing.py", line 6336, in checkUnindent
raise ParseException(s, l, "not an unindent")
pyparsing.ParseException: Expected {{Group:({Group:(W:(ABCD...)) Suppress:("#") Group:(W:(0123...))}) indented block} | {"Properties:" indented block} | Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) | Group:({Group:(W:(ABCD...)) Suppress:("=") Group:(Combine:({{W:(ABCD...) | <SP><TAB>}}...))})}, found ':' (at char 41), (line:4, col:13)
See from setDebug value grammar ZeroOrMore is getting the tokens from next line [[['Argument'], ['Name']]]
I tried LineEnd() and other tricks but none works.
Any idea on how to deal with ZeroOrMore to stop on LineEnd() or without special cases?
NOTE: Real output can be retrieved using:
env = os.environ.copy()
env['LANG'] = 'C'
data = check_output(
['pactl', 'list'], universal_newlines=True, env=env)

indentedBlock is not the easiest pyparsing element to work with. But there are a few things that you are doing that are getting in your way.
To debug this, I broke down some of your more complex expressions, use setName() to give them names, and then added .setDebug(). Like this:
identifier = Word(alphas, alphanums+"-_.").setName("identifier").setDebug()
This will tell pyparsing to output a message whenever this expression is about to be matched, if it matched successfully, or if not, the exception that was raised.
Match identifier at loc 1(2,1)
Matched identifier -> ['Module']
Match identifier at loc 15(3,5)
Matched identifier -> ['Argument']
Match identifier at loc 15(3,5)
Matched identifier -> ['Argument']
Match identifier at loc 23(3,13)
Exception raised:Expected identifier, found ':' (at char 23), (line:3, col:13)
It looks like these expressions are messing up the indentedBlock matching, by processing whitespace that should be indentation space:
Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))
The " character in the Word and the whitespace lead me to believe you are trying to match quoted strings. I replaced this expression with:
Combine(OneOrMore(Word(alphas, alphanums+'-/.') | quotedString))
You also need to take care not to read past the end of the line, or you'll also mess up the indentedBlock indentation tracking. I added this expression for a newline at the top:
NL = LineEnd()
and then used it as the stopOn argument to OneOrMore and ZeroOrMore:
prop_val_value = Combine(OneOrMore(Word(alphas, alphanums+'-/.') | quotedString(), stopOn=NL)).setName("prop_val_value")#.setDebug()
prop_val = Group(identifier + Suppress("=") + Group(prop_val_value)).setName("prop_val")#.setDebug()
Here is the parser I ended up with:
indentStack = [1]
stmt = Forward()
NL = LineEnd()
identifier = Word(alphas, alphanums+"-_.").setName("identifier").setDebug()
sect_def = Group(Group(identifier) + Suppress("#") + Group(Word(nums))).setName("sect_def")#.setDebug()
inner_section = indentedBlock(stmt, indentStack)
section = (sect_def + inner_section)
#~ value = Group(Group(Combine(OneOrMore(identifier|White(' ')))) + Suppress(":") + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_".')|White(' ', max=1))))).setDebug()
value_label = originalTextFor(OneOrMore(identifier)).setName("value_label")#.setDebug()
value = Group(value_label
+ Suppress(":")
+ Optional(~NL + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_.') | quotedString(), stopOn=NL))))).setName("value")#.setDebug()
prop_name = Literal("Properties:")
prop_section = indentedBlock(stmt, indentStack)
#~ prop_val = Group(Group(identifier) + Suppress("=") + Group(Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))))
prop_val_value = Combine(OneOrMore(Word(alphas, alphanums+'-/.') | quotedString(), stopOn=NL)).setName("prop_val_value")#.setDebug()
prop_val = Group(identifier + Suppress("=") + Group(prop_val_value)).setName("prop_val")#.setDebug()
prop = (prop_name + prop_section).setName("prop")#.setDebug()
stmt << ( section | prop | value | prop_val )
Which gives this:
[[['Module'], ['6']],
[[['Argument']],
[['Name', ['module-alsa-card']]],
[['Usage counter', ['0']]],
['Properties:',
[[['module.author', ['"Lennart Poettering"']]],
[['module.description', ['"ALSA Card"']]],
[['module.version', ['"14.0-rebootstrapped"']]]]]]]

pyparsing, ignore can't ignore string inline

I want to use the pyparsing's commaSeparatedList to seperate a string and ignore the staff inside '{' '}'.
example:
a = 'xyz,abc{def,123,456}'
after parse, I want to got
['xyz','abc{def,123,456}']
I wrote this:
nested_expr = '{' + SkipTo('}') + '}'
commaSeparatedList.ignore(nested_expr).parseString(a)
result: (['xyz', 'abc{def', '123', '456}'], {})
Actulally
It seems like when there is a separater before '{', this will work
a = 'xyz,abc,{def,123,456}'
commaSeparatedList.ignore(nested_expr).parseString(a)
result: (['xyz', 'abc', ''], {})
Could you take a look why this is happening?

Open up the pyparsing.py source file and see how commaSeparatedList is implemented - it isn't that much, and easily adapted to your case:
# original
_commasepitem = Combine(OneOrMore(Word(printables, excludeChars=',') +
Optional( Word(" \t") +
~Literal(",") + ~LineEnd() ) ) ).streamline().setName("commaItem")
commaSeparatedList = delimitedList( Optional( quotedString.copy() | _commasepitem, default="") ).setName("commaSeparatedList")
# modified
_commasepitem = Combine(OneOrMore(QuotedString('{',endQuoteChar='}',unquoteResults=False) | Word(printables, excludeChars=',{}') +
Optional( Word(" \t") +
~Literal(",") + ~LineEnd() ) ) ).streamline().setName("commaItem")
commaSeparatedList = delimitedList( Optional(_commasepitem, default="") ).setName("commaSeparatedList")
It's important that the Word in _commasepitem not allow for inclusion of '{}' characters.

Parsing significant whitespace with a PEG in Python (specificly Parsley)

I'm creating a syntax that supports significant whitespace (most like the "Z" lisp variant than Python or yaml, but same idea)
I came across this article on how to do significant whitespace parsing in a pegasus a PEG parser for C#
But I've been less than successful at converting that to parsley, looks like the #STATE# variable in Pegasus follows backtracking in some way.
This is the closest I've gotten to a simple parser, If I use the version of indent with look ahead it can't parse children, and if I use the version without, it can't parse siblings.
If this is a limitation of parsley and I need to use PyPEG or Parsimonious or something, I'm open to that, but it seems like if the internal indent variable could follow the PEGs internal backtracking this would all work.
import parsley
def indent(s):
s['i'] += 2
print('indent i=%d' % s['i'])
def deindent(s):
s['i'] -= 2
print('deindent i=%d' % s['i'])
grammar = parsley.makeGrammar(r'''
id = <letterOrDigit+>
eol = '\n' | end
nots = anything:x ?(x != ' ')
node = I:i id:name eol !(fn_print(_state['i'], name)) -> i, name
#I = !(' ' * _state['i'])
I = (' '*):spaces ?(len(spaces) == _state['i'])
#indent = ~~(!(' ' * (_state['i'] + 2)) nots) -> fn_indent(_state)
#deindent = ~~(!(' ' * (_state['i'] - 2)) nots) -> fn_deindent(_state)
indent = -> fn_indent(_state)
deindent = -> fn_deindent(_state)
child_list = indent (ntree+):children deindent -> children
ntree = node:parent (child_list?):children -> parent, children
nodes = ntree+
''', {
'_state': {'i': 0},
'fn_indent': indent,
'fn_deindent': deindent,
'fn_print': print,
})
test_string = '\n'.join((
'brother',
' brochild1',
#' gchild1',
#' brochild2',
#' grandchild',
'sister',
#' sischild',
#'brother2',
))
nodes = grammar(test_string).nodes()

PLPGSQL using single quotes in function call (python)

I am having problems when using single quotes in a insert value for a plpgsql function
It looks like this:
"AND (u.firstname LIKE 'koen') OR
(u.firstname LIKE 'dirk')"
This is done with python
I have tried \' and '' and ''' and '''' and ''''' and even '''''''
none of them seem to be working and return the following error:
[FAIL][syntax error at or near "koen"
LINE 1: ...'u.firstname', 'ASC', 'AND (u.firstname LIKE 'koe...
Any help is appreciated!
Thanks a lot!
======================== EDIT =========================
Sorry! here is my plpgsql function:
CREATE FUNCTION get_members(IN in_company_uuid uuid, IN in_start integer, IN in_limit integer, IN in_sort character varying, IN in_order character varying, IN in_querystring CHARACTER VARYING, IN in_filterstring CHARACTER VARYING, IN OUT out_status integer, OUT out_status_description character varying, OUT out_value character varying[]) RETURNS record
LANGUAGE plpgsql
AS $$DECLARE
temp_record RECORD;
temp_out_value VARCHAR[];
--temp_member_struct MEMBER_STRUCT;
temp_iterator INTEGER := 0;
BEGIN
FOR temp_record IN EXECUTE '
SELECT DISTINCT ON
(' || in_sort || ')
u.user_uuid,
u.firstname,
u.preposition,
u.lastname,
array_to_string_ex(ARRAY(SELECT email FROM emails WHERE user_uuid = u.user_uuid)) as emails,
array_to_string_ex(ARRAY(SELECT mobilenumber FROM mobilenumbers WHERE user_uuid = u.user_uuid)) as mobilenumbers,
array_to_string_ex(ARRAY(SELECT c.name FROM targetgroupusers AS tgu LEFT JOIN membercategories as mc ON mc.targetgroup_uuid = tgu.targetgroup_uuid LEFT JOIN categories AS c ON mc.category_uuid = c.category_uuid WHERE tgu.user_uuid = u.user_uuid)) as categories,
array_to_string_ex(ARRAY(SELECT color FROM membercategories WHERE targetgroup_uuid IN(SELECT targetgroup_uuid FROM targetgroupusers WHERE user_uuid = u.user_uuid))) as colors
FROM
membercategories AS mc
LEFT JOIN
targetgroups AS tg
ON
tg.targetgroup_uuid = mc.targetgroup_uuid
LEFT JOIN
targetgroupusers AS tgu
ON
tgu.targetgroup_uuid = tg.targetgroup_uuid
LEFT JOIN
users AS u
ON
u.user_uuid = tgu.user_uuid
WHERE
mc.company_uuid = ''' || in_company_uuid || '''
' || in_querystring || '
' || in_filterstring || '
ORDER BY
' || in_sort || ' ' || in_order || '
OFFSET
' || in_start || '
LIMIT
' || in_limit
LOOP
temp_out_value[temp_iterator] = ARRAY[temp_record.user_uuid::VARCHAR(36),
temp_record.firstname,
temp_record.preposition,
temp_record.lastname,
temp_record.emails,
temp_record.mobilenumbers,
temp_record.categories,
temp_record.colors];
temp_iterator = temp_iterator+1;
END LOOP;
out_status := 0;
out_status_description := 'Members retrieved';
out_value := temp_out_value;
RETURN;
END$$;
Here is how i call the function:
def get_members(companyuuid, start, limit, sort, order, querystring = None, filterstring = None):
logRequest()
def doWork(cursor):
if not companyuuid:
raise Exception("companyuuid cannot be None!")
queryarray = [str(s) for s in querystring.split("|")]
queryfields = ['firstname', 'preposition', 'lastname', 'emails', 'mobilenumbers']
temp_querystring = ""
for j in xrange(len(queryfields)):
for i in xrange(len(queryarray)):
temp_querystring += "(u.%s LIKE ''%%%s%'') OR "%(queryfields[j], queryarray[i])
temp_querystring = "AND %s"%temp_querystring.rstrip(" OR ")
temp_filterstring = filterstring
print "querystring: %s"%temp_querystring
heizoodb.call(cursor=cursor,
scheme="public",
function="get_members",
functionArgs=(companyuuid, start, limit, sort, order, temp_querystring, temp_filterstring),
returnsValue=True)
And my latest error =D
com.gravityzoo.core.libs.sql.PostgreSQLDB.runSQLTransaction: not enough arguments for format string, result=[None]
SQLinjection to be added later ;)
Thanks!

I don't know Python well, but it probably supports data binding where you first prepare a statement (and you don't need quotes around the question marks there):
prepare("..... AND (u.firstname LIKE ?) OR (u.firstname LIKE ?)")
and then you call execute('koen', 'dirk') or whatever that function is called in Python.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

PLY Lex: ID could be anything - python

Related

PyParsing: parse if not a keyword

Dealing with ZeroOrMore in pyparsing

pyparsing, ignore can't ignore string inline

Parsing significant whitespace with a PEG in Python (specificly Parsley)

PLPGSQL using single quotes in function call (python)

Categories

Resources