Translating an EBNF grammar to pyparsing gives an error - python

I am writing a parser to convert a simple DSL into an Elasticsearch query. Some of the possible queries are:
response:success
response:success AND extension:php OR extension:css
response:success AND (extension:php OR extension:css)
time >= 2020-01-09
time >= 2020-01-09 AND response:success OR os:windows
NOT response:success
response:success AND NOT os:windows
I have written the following EBNF grammar for this:
<expr> ::= <or>
<or> ::= <and> (" OR " <and>)*
<and> ::= <unary> ((" AND ") <unary>)*
<unary> ::= " NOT " <unary> | <equality>
<equality> ::= (<word> ":" <word>) | <comparison>
<comparison> ::= "(" <expr> ")" | (<word> (" > " | " >= " | " < " | " <= ") <word>)+
<word> ::= ("a" | "b" | "c" | "d" | "e" | "f" | "g"
| "h" | "i" | "j" | "k" | "l" | "m" | "n"
| "o" | "p" | "q" | "r" | "s" | "t" | "u"
| "v" | "w" | "x" | "y" | "z")+
The precedence of operators in the DSL is:
() > NOT > AND > OR
Also, exact matching (i.e. ':') has higher precedence than the comparison operators.
I believe the above grammar captures the idea of my DSL. I am having a difficult time translating it to pyparsing; this is what I have now:
from pyparsing import *
AND = Keyword('AND') | Keyword('and')
OR = Keyword('OR') | Keyword('or')
NOT = Keyword('NOT') | Keyword('not')
word = Word(printables, excludeChars=':')
expr = Forward()
expr << Or
Comparison = Literal('(') + expr + Literal(')') + OneOrMore(word + ( Literal('>') | Literal('>=') | Literal('<') | Literal('<=')) + word)
Equality = (word + Literal(':') + word) | Comparison
Unary = Forward()
Unary << (NOT + Unary) | Equality
And = Unary + ZeroOrMore(AND + Unary)
Or = And + ZeroOrMore(OR + And)
The error I get is:
Traceback (most recent call last):
File "qql.py", line 54, in <module>
expr << Or
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pyparsing.py", line 5006, in __lshift__
self.mayIndexError = self.expr.mayIndexError
AttributeError: type object 'Or' has no attribute 'mayIndexError'
I think it's because I don't understand Forward() correctly.
Question: how can I correctly translate the above grammar to pyparsing?
EDIT: when I changed the pyparsing code to:
AND = Keyword('AND')
OR = Keyword('OR')
NOT = Keyword('NOT')
word = Word(printables, excludeChars=':')
expr = Forward()
Comparison = Literal('(') + expr + Literal(')') + OneOrMore(word + ( Literal('>') | Literal('>=') | Literal('<') | Literal('<=')) + word)
Equality = (word + Literal(':') + word) | Comparison
Unary = Forward()
Unary << ((NOT + Unary) | Equality)
And = Unary + ZeroOrMore(AND) + Unary
Or = And + ZeroOrMore(OR + And)
expr << Or
Q = """response : 200 \
AND extesnion: php \
OR extension: css \
"""
print(expr.parseString(Q))
I get this output:
['response', ':', '200', 'AND', 'extesnion', ':', 'php']
Why is the OR expression not parsed?
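To make the intended precedence concrete, here is a minimal hand-written recursive-descent sketch of the grammar above. It deliberately uses only the standard library rather than pyparsing, so it shows the shape of the rules, not a pyparsing solution. The names tokenize, Parser and parse are hypothetical, the tuple-based tree is just one possible output format, and the tokenizer assumes that words never contain whitespace, parentheses, ':' or comparison characters.

```python
import re

# Multi-character operators are listed before their one-character prefixes,
# so ">=" is never split into ">" followed by "=".
TOKEN_RE = re.compile(r"\(|\)|>=|<=|>|<|:|[^\s():<>=]+")
COMPARATORS = {">", ">=", "<", "<="}


def tokenize(text):
    """Split a query string into words, operators and parentheses."""
    return TOKEN_RE.findall(text)


class Parser:
    """One method per grammar rule; lower-precedence rules call higher ones."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def parse_or(self):
        # <or> ::= <and> ("OR" <and>)*
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.parse_and())
        return node

    def parse_and(self):
        # <and> ::= <unary> ("AND" <unary>)*
        node = self.parse_unary()
        while self.peek() == "AND":
            self.take()
            node = ("AND", node, self.parse_unary())
        return node

    def parse_unary(self):
        # <unary> ::= "NOT" <unary> | <primary>
        if self.peek() == "NOT":
            self.take()
            return ("NOT", self.parse_unary())
        return self.parse_primary()

    def parse_primary(self):
        # <primary> ::= "(" <expr> ")" | <word> ":" <word> | <word> <cmp> <word>
        if self.peek() == "(":
            self.take()
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        field = self.take()
        op = self.take()
        if op != ":" and op not in COMPARATORS:
            raise ValueError(f"unexpected operator {op!r}")
        return (op, field, self.take())


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("response:success AND extension:php OR extension:css"))
```

Note how the repetition in parse_and wraps the operator and its operand together. That is the same grouping as ZeroOrMore(AND + Unary) in the first pyparsing attempt, and it differs from ZeroOrMore(AND) + Unary in the edited version.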

Related

Is there any way to print complex tables embedded in strings in Python?

I'm trying to print a confusion matrix, but I'm not sure if there is an easy way to print this kind of table without writing complex loops to concatenate the string. Right now, I do it as in this example. Is there a different way?
t1 = "|" + " " * 8 + "|" + " " * 10 + "| Predicted |" + " " * 10 + "|\n"
t2 = "|" + " " * 8 + "|" + " " * 10 + "| Positive |" + " Negative |\n"
t3 = "| Actual |" + " Positive |" + " " * (10 - len(str(TP))) + str(TP) + " |" + " " * (9 - len(str(FN))) + str(FN) + " |\n"
t4 = "|" + " " * 8 + "|" + " Negative |" + " " * (10 - len(str(FP))) + str(FP) + " |" + " " * (9 - len(str(TN))) + str(TN) + " |\n"
print(t1+t2+t3+t4)
| | | Predicted | |
| | | Positive | Negative |
| Actual | Positive | 1 | 2 |
| | Negative | 1 | 2 |
I am trying to find an easy way to print this kind of table.
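One way to avoid the manual padding arithmetic is to let Python's format specification do the alignment. The sketch below is only an illustration: the helper name format_confusion_matrix and its width parameter are hypothetical, and it pads every cell to the same width instead of reproducing the exact column widths from the question.

```python
def format_confusion_matrix(tp, fn, fp, tn, width=10):
    """Return a 4-row text table with every cell padded to `width` characters."""
    rows = [
        ("", "", "Predicted", ""),
        ("", "", "Positive", "Negative"),
        ("Actual", "Positive", str(tp), str(fn)),
        ("", "Negative", str(fp), str(tn)),
    ]
    lines = []
    for row in rows:
        # "<" left-aligns the cell text; the nested {width} sets the padding.
        cells = [f" {cell:<{width}} " for cell in row]
        lines.append("|" + "|".join(cells) + "|")
    return "\n".join(lines)


print(format_confusion_matrix(tp=1, fn=2, fp=1, tn=2))
```

Because the layout lives in the rows list, adding a row or changing a label does not require recomputing any padding by hand.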

Flatten nested array in Spark DataFrame

I'm reading in some JSON of the form:
{"a": [{"b": {"c": 1, "d": 2}}]}
That is, the array items are unnecessarily nested. Now, because this happens inside an array, the answers given in How to flatten a struct in a Spark dataframe? don't apply directly.
This is how the dataframe looks when parsed:
root
 |-- a: array
 |    |-- element: struct
 |    |    |-- b: struct
 |    |    |    |-- c: integer
 |    |    |    |-- d: integer
I'm looking to transform the dataframe into this:
root
 |-- a: array
 |    |-- element: struct
 |    |    |-- b_c: integer
 |    |    |-- b_d: integer
How do I go about aliasing the columns inside the array to effectively unnest it?
You can use transform:
df2 = df.selectExpr("transform(a, x -> struct(x.b.c as b_c, x.b.d as b_d)) as a")
Using the method presented in the accepted answer I wrote a function to recursively unnest a dataframe (recursing into nested arrays as well):
from pyspark.sql.types import ArrayType, StructType
def flatten(df, sentinel="x"):
    def _gen_flatten_expr(schema, indent, parents, last, transform=False):
        def handle(field, last):
            path = parents + (field.name,)
            alias = (
                " as "
                + "_".join(path[1:] if transform else path)
                + ("," if not last else "")
            )
            if isinstance(field.dataType, StructType):
                yield from _gen_flatten_expr(
                    field.dataType, indent, path, last, transform
                )
            elif (
                isinstance(field.dataType, ArrayType) and
                isinstance(field.dataType.elementType, StructType)
            ):
                yield indent, "transform("
                yield indent + 1, ".".join(path) + ","
                yield indent + 1, sentinel + " -> struct("
                yield from _gen_flatten_expr(
                    field.dataType.elementType,
                    indent + 2,
                    (sentinel,),
                    True,
                    True
                )
                yield indent + 1, ")"
                yield indent, ")" + alias
            else:
                yield (indent, ".".join(path) + alias)
        try:
            *fields, last_field = schema.fields
        except ValueError:
            pass
        else:
            for field in fields:
                yield from handle(field, False)
            yield from handle(last_field, last)
    lines = []
    for indent, line in _gen_flatten_expr(df.schema, 0, (), True):
        spaces = " " * 4 * indent
        lines.append(spaces + line)
    expr = "struct(" + "\n".join(lines) + ") as " + sentinel
    return df.selectExpr(expr).select(sentinel + ".*")
Simplified Approach:
from pyspark.sql.functions import col
def flatten_df(nested_df):
    stack = [((), nested_df)]
    columns = []
    while len(stack) > 0:
        parents, df = stack.pop()
        flat_cols = [
            col(".".join(parents + (c[0],))).alias("_".join(parents + (c[0],)))
            for c in df.dtypes
            if c[1][:6] != "struct"
        ]
        nested_cols = [
            c[0]
            for c in df.dtypes
            if c[1][:6] == "struct"
        ]
        columns.extend(flat_cols)
        for nested_col in nested_cols:
            projected_df = df.select(nested_col + ".*")
            stack.append((parents + (nested_col,), projected_df))
    return nested_df.select(columns)
ref: https://learn.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema

How to have a python script print out the exact match in a txt file?

I have a Python script that takes user input, then searches a txt file and prints the matching line. However, when the user inputs a name that is a part of other names, it prints every line that contains it.
For example, if the user inputs STT1, but there are other lines called STT12, STT13, STT14, etc., the script prints each of those lines as well, since they all contain STT1. How do I make the script print only the exact match?
Example Output
Please enter your area: TTT17
Please enter STT name: STT1
STT Location:
TTT17 | STT1 | Floor 1 | Row 2 | Section 2
STT Location:
TTT17 | STT13 | Floor 1 | Row 22 | Section 2
STT Location:
TTT17 | STT14 | Floor 1 | Row 22 | Section 2
STT Location:
TTT17 | STT17 | Floor 1 | Row 42 | Section 2
STT Location:
TTT17 | STT18 | Floor 1 | Row 42 | Section 2
Example Code
def find_stt():
    with open('{}'.format(db_file), 'r') as f:
        find_flag = False
        for line in f.readlines():
            if line.startswith(area) and name in line:
                print("\n" + "\n" + "\n" + color.BOLD + "STT Location:" + color.END +
                      "\n" + "\n" + color.BOLD + ' '.join(line.split()) + color.END + "\n")
                find_flag = True
        if not find_flag:
            failed_search()
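You are checking with name in line. This condition is true for every line that contains name as a substring, so STT1 also matches STT13. A regular expression with word boundaries is one solution (this needs import re at the top of the script; re.escape keeps any special characters in name from being read as regex syntax):
if line.startswith(area) and re.search(r'\b' + re.escape(name) + r'\b', line):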
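The effect of the word boundaries can be checked on a few of the lines from the question. This is a small self-contained sketch: the helper find_exact and the in-memory sample list are hypothetical stand-ins for the file loop in find_stt.

```python
import re


def find_exact(lines, area, name):
    """Return the lines that start with `area` and contain `name` as a whole word."""
    # \b matches only at a word boundary, so "STT1" is not found inside "STT13".
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return [
        line for line in lines
        if line.startswith(area) and pattern.search(line)
    ]


sample = [
    "TTT17 | STT1 | Floor 1 | Row 2 | Section 2",
    "TTT17 | STT13 | Floor 1 | Row 22 | Section 2",
    "TTT17 | STT14 | Floor 1 | Row 22 | Section 2",
]

for match in find_exact(sample, "TTT17", "STT1"):
    print(match)
```

With this sample, only the STT1 line is printed, because in STT13 and STT14 the character after STT1 is a digit and therefore not a word boundary.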
I was able to figure it out. May not be the best way, but it works. Basically I took my input variable,
name = input("Please enter STT name: ").upper()
and created a new variable with it adding a space at the end
name_search = name + " "
def find_stt():
    with open("{}".format(db_file), "r") as f:
        find_flag = False
        for line in f.readlines():
            if line.startswith(area) and name_search in line:
                print("\n" + "\n" + "\n" + color.BOLD + "SPP Location:" + color.END +
                      "\n" + "\n" + color.BOLD + " ".join(line.split()) + color.END +
                      "\n")
                find_flag = True
        if not find_flag:
            failed_search()
This resolved my issue.

Create a 3x3 grid

I am fairly new to Python and I want to create a printable 3x3 grid to represent a Tic-Tac-Toe board. I just want ideas to tidy it up and make the code look better. Many thanks.
def display_board(board):
    print(' | | ')
    print(board[7] + ' | ' + board[8] + ' | ' + board[9])
    print(' | | ')
    print('---- ---- ----')
    print(' | | ')
    print(board[4] + ' | ' + board[5] + ' | ' + board[6])
    print(' | | ')
    print('---- ---- ----')
    print(' | | ')
    print(board[1] + ' | ' + board[2] + ' | ' + board[3])
    print(' | | ')
This function does create the desired result but I want ideas of how to tidy it up. I've tried for loops but it ends up breaking.
print("""
| |
------------
| |
------------
| |
""")
will result in a board like this:
| |
------------
| |
------------
| |
Similar to Brian's answer; however, you can set the value for each cell within the .format method.
print("""
{}|{}|{}
-----
{}|{}|{}
-----
{}|{}|{}
""".format('x','x','o','o','o','x','o','x','o'))
will result in:
x|x|o
-----
o|o|x
-----
o|x|o
Each {} will be replaced by an argument of the .format method, in order.
Check this link for more information.
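Since the question mentions that attempts with for loops kept breaking, here is one possible loop-based sketch. It assumes the same layout as the question's display_board: a 10-element list where index 0 is unused and indices 7, 8, 9 form the top row. The name format_board is hypothetical, and the function returns the text instead of printing it so that it is easy to check.

```python
def format_board(board):
    """Build the board text row by row; board[0] is an unused placeholder."""
    rows = []
    # Rows are listed top to bottom: 7-8-9, then 4-5-6, then 1-2-3.
    for start in (7, 4, 1):
        rows.append(" | ".join(board[start:start + 3]))
    separator = "\n" + "-" * 9 + "\n"
    return separator.join(rows)


board = ["#", "X", "O", "X", "O", "X", "O", "X", "O", "X"]
print(format_board(board))
```

The slice board[start:start + 3] picks the three cells of one row, so the loop body never has to name individual indices.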

python maze using recursion

I want to solve a maze using recursion. The program opens a text file like this one:
10 20
1 1
10 20
-----------------------------------------
| | | | | | | | | |
|-+ +-+-+ +-+ + +-+ + + +-+-+ +-+-+ + + |
| | | | | | | | | |
| + +-+ + + +-+-+-+ + + + + + +-+ + + + |
| | | | | | | | | | |
| +-+-+-+-+ +-+ +-+-+-+-+ +-+ + +-+-+ +-|
| | | | | | | | | | |
| + + +-+ +-+-+ + + + +-+ +-+ + + + +-+ |
| | | | | | | | | | | |
|-+-+ +-+-+-+-+-+-+-+ +-+ +-+-+ +-+-+ +-|
| | | | | | | | | | | |
| +-+-+ +-+-+ +-+ + +-+-+ +-+ +-+ + + + |
| | | | | | | | | | |
|-+ +-+ + + +-+ +-+-+ + +-+ + + +-+ +-+ |
| | | | | | | | | | | | | | | | |
|-+ + +-+ + + + + + +-+ + + + + +-+-+ + |
| | | | | | | |
| + + +-+ + +-+-+-+ + +-+ + + +-+-+ +-+ |
| | | | | | | | | | |
-----------------------------------------
The first line of the file is the size of the maze (10 20), the second line is the starting point (1 1), and the third line is the exit (10 20). I want to mark the correct path with "*". This is what my code looks like:
EDIT: I changed some of the code in the findpath() function, and now I don't get any errors, but the maze is empty: the path ('*') is not drawn on the maze.
class maze():
    def __init__(self, file):
        self.maze_list = []
        data = file.readlines()
        size = data.pop(0).split()  # size of maze
        start = data.pop(0).split()  # starting row and column; keeps being 0 because the previous not there anymore
        self.start_i = start[0]  # starting row
        self.start_j = start[1]  # starting column
        exit = data.pop(0).split()  # exit row and column
        self.end_i = exit[0]
        self.end_j = exit[1]
        for i in data:
            self.maze_list.append(list(i[:-1]))  # removes \n character off of each list of list
        print(self.maze_list)  # debug
    def print_maze(self):
        for i in range(len(self.maze_list)):
            for j in range(len(self.maze_list[0])):
                print(self.maze_list[i][j], end='')
            print()
def main():
    filename = input('Please enter a file name to be processed: ')  # prompt user for a file
    try:
        file = open(filename, 'r')
    except:  # if a non-existing file is attempted to open, returns error
        print("File Does Not Exist")
        return
    mazeObject = maze(file)
    mazeObject.print_maze()  # prints maze
    row = int(mazeObject.start_i)
    col = int(mazeObject.start_j)
    findpath(row, col, mazeObject)  # finds solution route of maze if any
def findpath(row, col, mazeObject):
    if row == mazeObject.end_i and col == mazeObject.end_j:  # returns True if it has found the Exit of the maze
        mazeObject.maze_list[row][col] = ' '  # to delete wrong path
        return True
    elif mazeObject.maze_list[row][col] == "|":  # returns False if it finds wall
        return False
    elif mazeObject.maze_list[row][col] == '-':  # returns False if it finds a wall
        return False
    elif mazeObject.maze_list[row][col] == '+':  # returns False if it finds a wall
        return False
    elif mazeObject.maze_list[row][col] == '*':  # returns False if the path has been visited
        return False
    mazeObject.maze_list[row][col] = '*'  # marks the path followed with an *
    if ((findpath(row + 1, col, mazeObject))
            or (findpath(row, col - 1, mazeObject))
            or (findpath(row - 1, col, mazeObject))
            or (findpath(row, col + 1, mazeObject))):  # recursion method
        mazeObject.maze_list[row][col] = ' '  # to delete wrong path
        return True
    return False
So now my question is, where is the error? I mean the program just prints out the maze without the solution. I want to fill the correct path with "*".
Looking at your code I see several errors. You do not handle the entry and exit row/column pairs correctly. (10,20) is correct for this maze, if you assume that every other row, and every other column, is a grid line. That is, if the | and - characters represent infinitely thin lines that have occasional breaks in them, much like traditional maze drawings.
You'll need to multiply/divide by two, and deal with the inevitable fencepost errors, in order to correctly translate your file parameters into array row/column values.
Next, your findpath function is confused:
First, it should be a method of the class. It accesses internal data, and contains "inner knowledge" about the class details. Make it a method!
Second, your exit condition replaces the current character with a space "to delete wrong path". But if you have found the exit, the path is by definition correct. Don't do that!
Third, you have a bunch of if statements for various character types. That is fine, but please replace them with a single if statement using
if self.maze_list[row][col] in '|-+*':
    return False
Fourth, you wait to mark the current cell with a '*' until after your checks. But you should mark the cell before you declare victory when you reach the exit location. Move the exit test down, I think.
That should clean things up nicely.
Fifth, and finally, your recursive test is backwards. Your code returns True when it reached the exit location. Your code returns False when it runs into a wall, or tries to cross its own path. Therefore, if the code takes a dead end path, it will reach the end, return false, unroll the recursion a few times, returning false all along, until it gets back.
Thus, if you EVER see a True return, you know the code found the exit down that path. You want to immediately return true and do nothing else. Certainly don't erase the path - it leads to the exit!
On the other hand, if none of your possible directions return true, then you have found a dead end - the exit does not lie in this direction. You should erase your path, return False, and hope that the exit can be found at a higher level.
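Putting these points together, a corrected search might look like the sketch below. It is a standalone function rather than a method, purely to keep the example short, and it runs on a tiny hand-made maze instead of the file from the question. The names solve and cell_to_index are hypothetical, and the mapping 2 * n - 1 from file coordinates to array indices is an assumption based on the "every other row and column is a grid line" reading described above.

```python
def cell_to_index(cell_row, cell_col):
    """Translate 1-based maze cell coordinates to array indices (assumed mapping)."""
    return 2 * cell_row - 1, 2 * cell_col - 1


def solve(grid, row, col, end_row, end_col):
    """Depth-first search that leaves '*' only on the path to the exit."""
    if not (0 <= row < len(grid) and 0 <= col < len(grid[row])):
        return False
    if grid[row][col] in "|-+*":  # wall, or a cell already on the current path
        return False
    grid[row][col] = "*"  # mark first, so the exit cell is marked too
    if (row, col) == (end_row, end_col):
        return True
    for d_row, d_col in ((1, 0), (0, -1), (-1, 0), (0, 1)):
        if solve(grid, row + d_row, col + d_col, end_row, end_col):
            return True  # the exit lies this way: keep the marks
    grid[row][col] = " "  # dead end: erase the mark and back up
    return False


MAZE = [
    "-------",
    "|   | |",
    "| +-+ |",
    "|     |",
    "-------",
]

grid = [list(line) for line in MAZE]
start = cell_to_index(1, 1)
end = cell_to_index(1, 3)
found = solve(grid, start[0], start[1], end[0], end[1])
solved = ["".join(line) for line in grid]
print(found)
print("\n".join(solved))
```

The start and exit are converted to integers before the search begins. The code in the question keeps end_i and end_j as strings read from the file, so its comparison against the integer row and col can never be true.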
