I am having an issue parsing a string and adding parentheses to it in Python. One of my issues is that the string may be input in a fully parenthesized manner (i.e. (01U(1U0))) or with none at all (i.e. 01U1U0). The grammar that I cannot seem to split it on is:
A -> e* | (eUe)
e -> any combination of chars from the grammar
e* has higher precedence than U.
Hopefully this makes sense. Does anyone have any ideas how I can parse through and check the parentheses?
Probably the simplest way to write a parser for an LL grammar is to have one function per non-terminal symbol. Each function consumes the characters of the input string that correspond to one application of that symbol's rules. Below is the parser for the grammar
A -> e+ | '(' A 'U' A ')'
e -> '0' | '1'
This is not exactly the grammar you wanted, but it seemed more relevant to the examples you gave. The grammar is LL(1): you only need to read one character to know how to proceed. The following parser defines two functions, A() and e(), for these two non-terminal symbols:
class MySyntaxError(Exception):
    def __init__(self, text, index):
        self.text = text
        self.index = index
    def __str__(self):
        return "Syntax error at index " + repr(self.index) + ": " + self.text
def parse(input):
    index = 0  # position of the next unread character
    def eat_char(set):
        nonlocal index
        if index < len(input) and input[index] in set:
            index += 1
        else:
            raise MySyntaxError("expected char in " + repr(set), index)
    def e():
        eat_char(['0', '1'])
    def A():
        if index == len(input): # Successfully parsed
            return
        elif input[index] in ['0', '1']:
            # Check the bound first so a run of digits at the very end doesn't raise IndexError
            while index < len(input) and input[index] in ['0', '1']:
                e()
        elif input[index] == '(':
            eat_char(['('])
            A()
            eat_char(['U'])
            A()
            eat_char([')'])
        else:
            raise MySyntaxError("expected char '0', '1' or '('", index)
    A()
    if index != len(input): # Parsing didn't reach the end of the string
        raise MySyntaxError("parsing ended before the end of the input", index)
def test(string):
    try:
        parse(string)
        print("The string " + string + " has been successfully parsed")
    except MySyntaxError as e:
        print("The string " + string + " can't be parsed:")
        print(e)
test("(01U(1U0))")
test("(01U(1U0)") # Missing parenthesis, syntax error
test("(01U(1U0)))") # Too many parentheses, syntax error
test("01U(1U0)") # No parentheses, syntax error
Note that using e* instead of e+ lets A derive the empty string, which complicates the parser.
The pushdown automaton is hidden in the parser. Building the automaton from the grammar is not so simple. Here, the PDA has only one state. We can describe the automaton this way:
from state [1]
read a '0' or a '1' and loop into state [1]
read a '(', push the parenthesis and loop into state [1]
read a 'U', when there is an open parenthesis on the top of stack, push the 'U' and loop into state [1]
read a ')', when there is a 'U' on the top of the stack, pop the 'U', pop the opened parenthesis below it, and loop into state [1]
There is a straightforward way to write this automaton in Python. Usually, you need a goto to write an automaton: each state binds to a label, and going from one state to another is done with a goto. Unfortunately, there is no goto in Python. However, since you have only one state, it isn't necessary and a loop will do:
def parse(input):
    index = 0
    stack = []
    def top():
        # Symbol on top of the stack, or None if the stack is empty
        if len(stack) > 0:
            return stack[-1]
        else:
            return None
    while index < len(input):
        if input[index] in ['0', '1']:
            pass
        elif input[index] == '(':
            stack.append('(')
        elif input[index] == 'U' and top() == '(':
            stack.append('U')
        elif input[index] == ')' and top() == 'U':
            stack.pop()  # pop the 'U'
            stack.pop()  # pop the matching '('
        else:
            raise MySyntaxError("Unexpected char '" + input[index] + "'", index)
        index += 1
    if len(stack):
        raise MySyntaxError("unterminated input", index)
This is a PDA with an explicit stack. The previous parser used the program's call stack as an implicit stack, remembering the number of parentheses read in the depth of recursive calls. This second parser is useful when you only want to check the validity of a string. But it is unhandy when you want to produce a representation of the input, like an Abstract Syntax Tree: ASTs are usually built from the grammar and are thus easier to produce from the first parser. The first one is also easier to read, as you don't have to compute the automaton to understand it.
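To illustrate the point about ASTs, here is a minimal sketch of how the recursive-descent approach could return a tree instead of only validating, for the same grammar A -> e+ | '(' A 'U' A ')'. The names parse_tree and ParseError, and the node shapes (a run of digits as a plain string, a union as a ('U', left, right) tuple), are hypothetical choices for this example; it uses nonlocal, so it assumes Python 3.

```python
class ParseError(Exception):
    """Hypothetical error type for this sketch (plays the role of MySyntaxError)."""


def parse_tree(text):
    """Parse A -> e+ | '(' A 'U' A ')' with e -> '0' | '1' and return a tree.

    Hypothetical node shapes: a run of digits becomes the string itself,
    and '(' A 'U' A ')' becomes the tuple ('U', left, right).
    """
    index = 0  # position of the next unread character

    def peek():
        # Current character, or None at the end of the input
        return text[index] if index < len(text) else None

    def expect(char):
        nonlocal index
        if peek() != char:
            raise ParseError(f"expected {char!r} at index {index}")
        index += 1

    def A():
        nonlocal index
        c = peek()
        if c is not None and c in '01':
            start = index
            # e+ : consume one or more '0'/'1' characters
            while peek() is not None and peek() in '01':
                index += 1
            return text[start:index]
        if c == '(':
            expect('(')
            left = A()
            expect('U')
            right = A()
            expect(')')
            return ('U', left, right)
        raise ParseError(f"expected '0', '1' or '(' at index {index}")

    tree = A()
    if index != len(text):
        raise ParseError(f"unexpected input at index {index}")
    return tree


print(parse_tree("(01U(1U0))"))  # ('U', '01', ('U', '1', '0'))
```

Because each function returns the subtree it recognized, the shape of the result follows the grammar directly; the explicit-stack PDA would need extra bookkeeping to build the same structure.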
So far I have done this. I am stuck on the recursion: I have no idea how to move forward with joining, reversing, etc.
def callrecursion(s):
    a=s.index('(')
    z=len(s) - string[::-1].index(')') -1
    newStr=s[a+1:z]
    # Something is missing here, I can't figure it out
    print(newStr)
    return newStr
def reverseParentheses(s):
    if '(' in s:
        return reverseParentheses(callrecursion(s))
        print('wabba labba dub dub')
    else:
        return s
string='a(bcdefghijkl(mno)p)q'
reverseParentheses(string)
EXPECTED OUTPUT : "apmnolkjihgfedcbq"
def reverseParentheses(s):
    if '(' in s:
        posopen=s.find('(')
        s=s[:posopen]+reverseParentheses(s[posopen+1:])
        posclose=s.find(')',posopen+1)
        s=s[:posopen]+s[posopen:posclose][::-1]+s[posclose+1:]
    return s
string='a(bcdefghijkl(mno)p)q'
print(string)
print(reverseParentheses(string))
print('apmnolkjihgfedcbq') # your test
string='a(bc)(ef)g'
print(string)
print(reverseParentheses(string))
The idea is to go 'inward' as long as possible (where 'inward' does not even mean 'nesting': it continues as long as there are any opening parentheses), so the innermost pairs are flipped first, and then the rest as the recursion returns. This way 'parallel' parentheses seem to work too, while simply pairing the "first opening parenthesis" with the "last closing one" does not handle them well. Or at least that is what I think.
Btw: recursion is just a convoluted replacement for rfind here:
def reverseParentheses(s):
    while '(' in s:
        posopen=s.rfind('(')
        posclose=s.find(')',posopen+1)
        s=s[:posopen]+s[posopen+1:posclose][::-1]+s[posclose+1:]
    return s
(... TBH: having now tried it, the recursive magic dies on empty parentheses () placed in the string, while this one works)
I've come up with the following logic (assuming the parentheses are properly nested).
The base case is the absence of parentheses in s, so it is returned unchanged.
Otherwise we locate the indices of the leftmost opening and rightmost closing parentheses
(taking into account that the string may have been reversed, so ')' may act as the opening parenthesis and '(' as the closing one).
Having obtained beg and end, the remaining job is quite simple: pass the reversed substring between beg and end to the next recursive call.
def reverseParentheses(s):
    if s.find('(') == -1:
        return s
    if s.find('(') < s.find(')'):
        beg, end = s.find('('), s.rfind(')')
    else:
        beg, end = s.find(')'), s.rfind('(')
    return s[:beg] + reverseParentheses(s[beg + 1:end][::-1]) + s[end + 1:]
Assuming that the numbers of opening and closing brackets always match, this might be one of the simplest methods to reverse the words in parentheses:
def reverse_parentheses(st: str) -> str:
    while True:
        split1 = st.split('(')
        split2 = split1[-1].split(')')[0]  # contents of the innermost (last) pair
        st = st.replace(f'({split2})', f'{split2[::-1]}')
        if '(' not in st and ')' not in st:
            return st
# s = "(abcd)"
# s = "(ed(et)el)"
# s = "(ed(et(oc))el)"
# s = "(u(love)i)"
s= "((ng)ipm(ca))"
result = reverse_parentheses(s)  # avoid shadowing the built-in reversed()
print(result)
You have a few issues in your code, and much of the logic missing. This adapts your code and produces the desired output:
def callrecursion(s):
    a=s.index('(')
    # 's' not 'string'
    z=len(s) - s[::-1].index(')') -1
    newStr=s[a+1:z][::-1]
    # Need to consider swapped parentheses
    newStr=newStr.replace('(', "$") # Placeholder for other swap
    newStr=newStr.replace(')', "(")
    newStr=newStr.replace('$', ")")
    #Need to recombine initial and trailing portions of original string
    newStr = s[:a] + newStr + s[z+1:]
    return newStr
def reverseParentheses(s):
    if '(' in s:
        return reverseParentheses(callrecursion(s))
        print('wabba labba dub dub')
    else:
        return s
string='a(bcdefghijkl(mno)p)q'
print(reverseParentheses(string))
>>>apmnolkjihgfedcbq
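The three replace() calls with the '$' placeholder exist only to swap '(' and ')'. As an alternative sketch, assuming Python 3: str.maketrans and str.translate can swap both characters in a single pass, so no placeholder is needed. SWAP_PARENS is a name introduced here. Like the code above, this pairs the leftmost '(' with the rightmost ')', so it assumes the parentheses nest rather than sit side by side as in 'a(bc)(ef)g'.

```python
# Translation table that swaps '(' and ')' simultaneously
SWAP_PARENS = str.maketrans('()', ')(')


def callrecursion(s):
    a = s.index('(')                     # leftmost '('
    z = len(s) - s[::-1].index(')') - 1  # rightmost ')'
    # Reverse the inside, then swap parentheses so nested pairs stay well-formed
    inner = s[a + 1:z][::-1].translate(SWAP_PARENS)
    return s[:a] + inner + s[z + 1:]


def reverseParentheses(s):
    if '(' in s:
        return reverseParentheses(callrecursion(s))
    return s


print(reverseParentheses('a(bcdefghijkl(mno)p)q'))  # apmnolkjihgfedcbq
```

Because translate() maps each character independently, there is no risk of the placeholder character already occurring in the input.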
While the existing O(n^2) solutions were sufficient here, this problem is solvable in O(n) time, and the solution is pretty fun.
The idea is to build a k-ary tree to represent our string, and traverse it with DFS. Each 'level' of the tree represents one layer of nested parentheses. There is one node for each set of parentheses, and one node for each letter, so there are only O(n) nodes in the tree.
For example, the tree-nodes at the top level are either:
A letter that is not contained in parentheses
A tree-node representing a pair of parentheses at the outermost layer of our string, which may have child tree-nodes
To get the effect of reversals, we can traverse the tree in a depth-first way recursively. Besides knowing our current node, we just need to know if we're in 'reverse mode': a boolean to tell us whether to visit our node's children from left to right, or right to left.
Every time we go down a level in our tree, whether we're in 'reverse mode' or not is flipped.
Python code:
class TreeNode:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
def reverseParentheses(s: str) -> str:
    root_node = TreeNode()
    curr_node = root_node
    # Build the tree
    for let in s:
        # Go down a level-- new child
        if let == '(':
            new_child = TreeNode(parent=curr_node)
            curr_node.children.append(new_child)
            curr_node = new_child
        # Go back to our parent
        elif let == ')':
            curr_node = curr_node.parent
        else:
            curr_node.children.append(let)
    answer = []
    def dfs(node, is_reversed: bool):
        nonlocal answer
        num_children = len(node.children)
        if is_reversed:
            range_start, range_end, range_step = num_children-1, -1, -1
        else:
            range_start, range_end, range_step = 0, num_children, 1
        for i in range(range_start, range_end, range_step):
            if isinstance(node.children[i], str):
                answer.append(node.children[i])
            else:
                dfs(node.children[i], not is_reversed)
    dfs(root_node, False)
    return ''.join(answer)
Here is a corrected version of your callrecursion function:
def callrecursion(text):
    print(text)
    a = text.find('(') + 1
    z = text.rfind(')') + 1
    newStr = text[:a - 1] + text[a:z-1][::-1].replace('(', ']').replace(')', '[').replace(']', ')').replace('[', '(') + text[z:]
    return newStr
You probably have to take into account whether the parenthesis is the first/last character.
I would understand how to do this if I were only looking for one specific character, but in this instance I am looking for any of the 4 operators, '+', '-', '*', '/'. The function returns -1 if there is no operator in the passed string, txt; otherwise it returns the position of the leftmost operator. So I'm thinking find() would be optimal here.
What I have so far:
def findNextOpr(txt):
    # txt must be a nonempty string.
    if len(txt) <= 0 or not isinstance(txt, str):
        print("type error: findNextOpr")
        return "type error: findNextOpr"
    if '+' in txt:
        return txt.find('+')
    elif '-' in txt:
        return txt.find('-')
    else:
        return -1
I think if I did what I did for the '+' and '-' operators for the other operators, it wouldn't work for multiple instances of that operator in one expression. Can a loop be incorporated here?
Your current approach is not very efficient, as you iterate over txt multiple times: twice (in and find()) for each operator.
You could use index() instead of find() and just ignore the ValueError exception, e.g.:
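def findNextOpr(txt):
    positions = []
    for o in '+-*/':
        try:
            positions.append(txt.index(o))  # one scan per operator
        except ValueError:  # this operator is not in txt
            pass
    # The leftmost operator wins, not the first one tried in '+-*/'
    return min(positions) if positions else -1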
You can do this in a single (perhaps more readable) pass by enumerate()ing the txt and return if you find the character, e.g.:
def findNextOpr(txt):
    for i, c in enumerate(txt):
        if c in '+-*/':
            return i
    return -1
Note: if you wanted all of the operators you could change the return to yield, and then just iterate over the generator, e.g.:
def findNextOpr(txt):
    for i, c in enumerate(txt):
        if c in '+-*/':
            yield i
In []:
for op in findNextOpr('1+2-3+4'):
    print(op)
Out[]:
1
3
5
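With the generator version, the original "leftmost operator or -1" behaviour is still one call away: next() accepts a default value that it returns when the generator is exhausted. first_operator is a hypothetical helper name used only for this sketch.

```python
def findNextOpr(txt):
    # Generator version: yields the index of every operator, left to right
    for i, c in enumerate(txt):
        if c in '+-*/':
            yield i


def first_operator(txt):
    # next() stops at the first yielded index; -1 if nothing is yielded
    return next(findNextOpr(txt), -1)


print(first_operator('1+2-3+4'))  # 1
print(first_operator('123'))      # -1
```

Since the generator is lazy, first_operator stops scanning as soon as it finds the first operator.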
You can improve your code a bit because you keep looking at the string a lot of times. '+' in txt actually searches through the string just like txt.find('+') does. So you can combine those easily to avoid having to search through it twice:
pos = txt.find('+')
if pos >= 0:
    return pos
But this still leaves you with the problem that it returns the position of the first operator you check, as long as that operator appears anywhere in the string. So you don't actually get the first position at which any of these operators occurs in the string.
So what you want to do is look for all operators separately, and then return the lowest non-negative number, since that's the first occurrence of any of the operators within the string:
plusPos = txt.find('+')
minusPos = txt.find('-')
multPos = txt.find('*')
divPos = txt.find('/')
# default=-1 handles the case where none of the operators is present
return min((pos for pos in (plusPos, minusPos, multPos, divPos) if pos >= 0), default=-1)
First, you shouldn't be printing or returning error messages; you should be raising exceptions. TypeError and ValueError would be appropriate here. (A string that isn't long enough is the latter, not the former.)
Second, you can simply find the positions of all the operators in the string using a list comprehension, exclude results of -1, and return the lowest of the positions using min().
def findNextOpr(text, start=0):
    ops = "+-/*"
    assert isinstance(text, str), "text must be a string"
    # "text must not be empty" isn't strictly true:
    # you'll get a perfectly sensible result for an empty string
    assert text, "text must not be empty"
    op_idxs = [pos for pos in (text.find(op, start) for op in ops) if pos > -1]
    return min(op_idxs) if op_idxs else -1
I've added a start argument that can be used to find the next operator: simply pass in the index of the last-found operator, plus 1.
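Below is a sketch that combines both suggestions: it raises TypeError and ValueError explicitly (assert statements are removed when Python runs with -O, so they are a poor fit for input validation), and it includes a loop that uses start to collect every operator position. all_operator_positions is a hypothetical helper; the empty-string check is kept only to mirror the original, even though, as its comment says, an empty string would simply give -1.

```python
def findNextOpr(text, start=0):
    """Return the index of the leftmost operator at or after start, or -1."""
    # Raise exceptions instead of printing or returning error messages
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    if not text:
        raise ValueError("text must not be empty")
    op_idxs = [pos for pos in (text.find(op, start) for op in "+-/*") if pos > -1]
    return min(op_idxs) if op_idxs else -1


def all_operator_positions(text):
    # Hypothetical helper: restart the search just past the last operator found
    positions = []
    pos = findNextOpr(text)
    while pos != -1:
        positions.append(pos)
        pos = findNextOpr(text, pos + 1)
    return positions


print(all_operator_positions('1+2-3*4/5'))  # [1, 3, 5, 7]
```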
I'm trying to split and organize a string in a single function. My goal is to separate lowercase and uppercase characters and then return a new string, essentially like so:
"lowercasestring" + " " + "uppercasestring".
Importantly, all characters must be returned in the order they were received, but split up.
My problem is that I have to do this recursively in a single function (for educational purposes), and I struggle to understand how this is doable without an external function calling the recursive one and then modifying the string.
def split_rec(string):
    if string == '':
        return "-" #used to separate later
    elif str.islower(string[0]) or string[0] == "_" or string[0] == ".": #case1
        return string[0] + split_rec(string[1:])
    elif str.isupper(string[0]) or string[0] == " " or string[0] == "|": #case2
        return split_rec(string[1:]) + string[0]
    else: #discard other
        return split_rec(string[1:])
def call_split_rec(string):
    ##Essentially I want to integrate the functionality of this whole function into the recursion
    mystring = split_rec(string)
    left, right = mystring.split("-")
    switch_right = right[::-1]  # undo the reversed order of case2 characters
    print(left + " " + switch_right)
The recursion alone would return:
"lowerUPPERcaseCASE" -> "lowercase" + "ESACREPPU"
My best attempt at solving this in a single function was to make case2:
elif str.isupper(string[-1]) or string[-1] == " " or string[-1] == "|": #case2
    return split_rec(string[:-1]) + string[-1]
That way the uppercase letters would be added last letter first, in order to print the string correctly. The issue here is that I obviously get stuck when the first character is uppercase and the last one is lowercase.
I've spent a lot of time trying to figure out a good solution to this, but I'm unable to, and there's no help for me to be found. I hope the question is not too stupid; if so, feel free to remove it. Thanks!
I wouldn't do this recursively, but I guess you don't have a choice here. ;)
The simple way to do this in one function is to use a couple of extra arguments to act as temporary storage for the lower and upper case chars.
def split_rec(s, lo='', up=''):
    ''' Recursively split s into lower and upper case parts '''
    # Handle the base case: s is the empty string
    if not s:
        return lo + ' ' + up
    #Otherwise, append the leading char of s
    # to the appropriate destination...
    c = s[0]
    if c.islower():
        lo += c
    else:
        up += c
    # ... and recurse
    return split_rec(s[1:], lo, up)
# Test
print(split_rec("lowerUPPERcaseCASE"))
output
lowercase UPPERCASE
I have a couple of comments about your code.
It's not a great idea to use string as a variable name, since that's the name of a standard module. It won't hurt anything, unless you want to import that module, but it's still potentially confusing to people reading your code. The string module doesn't get a lot of use these days, but in the early versions of Python the standard string functions lived there. But then the str type inherited those functions as methods, making the old string functions obsolete.
And on that note, you generally should call those str methods as methods, rather than as functions. So don't do:
str.islower(s[0])
instead, do
s[0].islower()
Another take with recursive helper functions
def f(s):
    def lower(s):
        if not s:
            return ''
        c = s[0] if s[0].islower() else ''
        return c + lower(s[1:])
    def upper(s):
        if not s:
            return ''
        c = s[0] if s[0].isupper() else ''
        return c + upper(s[1:])
    return lower(s) + ' ' + upper(s)
The easiest way would be to use sorted with a custom key:
>>> ''.join(sorted("lowerUPPERcaseCASE" + " ", key=str.isupper))
'lowercase UPPERCASE'
There's really no reason to use any recursive function here. If it's for educational purpose, you could try to find a problem for which it's actually a good idea to write a recursive function (fibonacci, tree parsing, merge sort, ...).
As mentioned by @PM2Ring in the comments, this sort works fine here because Python's sorted is stable: when sorting by case, letters with the same case keep their original order relative to one another.
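One consequence of sorting with key=str.isupper is worth knowing: every character that is not uppercase, including digits and punctuation, lands in the lowercase group, whereas the question's split_rec discards most such characters. A minimal sketch that shows this, relying on the same stability guarantee (split_case is a hypothetical name):

```python
def split_case(s):
    # sorted() is stable, so each group keeps its original order.
    # str.isupper is False for lowercase letters *and* for digits,
    # punctuation and the added space, so all of those come first.
    return ''.join(sorted(s + ' ', key=str.isupper))


print(split_case("lowerUPPERcaseCASE"))  # lowercase UPPERCASE
print(split_case("aB1c_D"))              # a1c_ BD
```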
Here is a way to do it with only the string as parameter:
def split_rec(s):
    if '|' not in s:
        s = s + '|'
    if s.startswith('|'):
        return s.replace('|', ' ')
    elif s[0].islower():
        return s[0] + split_rec(s[1:])
    elif s[0].isupper():
        # we move the uppercase first letter to the end
        s = s[1:] + s[0]
        return split_rec(s)
    else:
        return split_rec(s[1:])
split_rec('aAbBCD')
# 'ab ABCD'
The idea is:
We add a marker at the end (I chose |)
If the first char is lowercase, we return it + the organized rest
If it is uppercase, we move it to the end, and reorganize the whole string
We stop once we reach the marker: the current string is the marker followed by the organized uppercase letters. We replace the marker by a space and return it.
The goal is to implement a simplification operation: remove the parentheses around the very first element in an expression tree and in each of its sub-expression trees, where the expression is given as a string input enclosed in various parentheses. This must work for an arbitrary number of parentheses, so for example:
(12)3((45)6) -> 123(456), remove the parentheses around 12 then around 45
((12)3)4(((5)67)8) -> 1234(5678), remove the parentheses around 12, then 123, then 5, then 567. Do not remove the parentheses around 5678 since that is the second element.
How do I do this?
EDIT: So far what I have is this:
def simplify(expression):
    """
    call itself recursively until no consecutive parentheses exist
    """
    result = []
    consec_parens = 0
    inside_nested = False
    for char in expression:
        if char == ')' and inside_nested:
            inside_nested = False
            consec_parens = 0
            continue
        if char == '(':
            consec_parens += 1
        else:
            consec_parens = 0
        if consec_parens == 2:
            inside_nested = True
        else:
            result.append(char)
    result = ''.join(result)
    if result == expression:
        return result
    return simplify(result)
It works for all cases where the number of nested parentheses is at least two, but it doesn't work for the head, i.e. for (AB)C, it does not remove the parentheses around AB. However, for ((AB)C) it removes the parentheses around AB resulting in (ABC).
This can be viewed as a finite state machine (with three states) which you instantiate once per level, where each ( symbol creates a new level. Alternatively, it is a deterministic pushdown automaton with two trivial states (an in-progress state and a done state, neither of which we model explicitly) and three stack symbols, each representing the state of the machine for the current level:
Before - The state we are in immediately after entering a level. Encountering any characters except ) transitions to some other state.
Inside - The state we are in while inside parentheses that need to be removed. Entered by encountering a ( while in Before.
Done - The state we are in when the current level has been processed. This means that either we already removed a set of parentheses or we did not need to, since the first element wasn't enclosed in them.
Additionally, encountering a ( pushes a new symbol onto the stack, which models entering a new level, and a ) pops one symbol from it, which models leaving a level. All input characters get appended to the result except when the Before → Inside and Inside → Done transitions occur.
The following code is a simple translation of the above into Python:
from enum import Enum
class State(Enum):
    Before = 0
    Inside = 1
    Done = 2
def simplify(expression):
    levels = [State.Before]
    result = []
    for c in expression:
        if c == '(':
            if levels[-1] == State.Before:
                levels[-1] = State.Inside
            else:
                result.append(c)
            levels.append(State.Before)
        elif c == ')':
            levels.pop()
            if levels[-1] == State.Inside:
                levels[-1] = State.Done
            else:
                result.append(c)
        else:
            if levels[-1] == State.Before:
                levels[-1] = State.Done
            result.append(c)
    return ''.join(result)
Testing the above out, we get:
>>> simplify('(12)3((45)6)')
'123(456)'
>>> simplify('((12)3)4(((5)67)8)')
'1234(5678)'
>>> simplify('(AB)C')
'ABC'
>>> simplify('((AB)C)')
'ABC'
I'm working on the following problem:
I need to implement a function that finds errors in the usage of brackets in a string.
If string is balanced, then I should return
{}
Success
If not, I need to report the position of the problematic bracket.
{[}
3
So for that reason I decided to create a class Bracket
class Bracket:
    def __init__(self, bracket_type, position):
        self.bracket_type = bracket_type
        self.position = position
    def Match(self, c):
        if self.bracket_type == '[' and c == ']':
            return True
        if self.bracket_type == '{' and c == '}':
            return True
        if self.bracket_type == '(' and c == ')':
            return True
        return False
Next, I'm using a stack to check whether the string is balanced. I created a loop going through every symbol, and if the symbol is a bracket, I want to assign it to my special class so I can match it against a closing one later.
import sys
if __name__ == "__main__":
    text = sys.stdin.read()
    brackets_stack = []
    Balanced = True
    for i, symbol in enumerate(str(text)):
        if symbol in ['(', '[', '{']:
            j = i
            brackets_stack.append(symbol)
            new_bracket = Bracket(brackets_stack[j], j)
        elif new_bracket.Match(symbol) == True:
            brackets_stack.pop(i)
        elif len(brackets_stack) == 0:
            print("Success")
But it works well only with cases like this
{}
Success
For other tests, like
[()]
It shows that the array is not empty yet, as its length is equal to 1. I think the problem lies in the variable new_bracket. After removing "(" from the array, my program doesn't compare "[" for matching.
I don't know why.
Can somebody help me?
Your code has a few problems. First, there are a few things that I want to address:
Your code will throw an error if the symbols string does not start with an opening bracket. This is because you initialize new_bracket inside the first if statement: if the first character is NOT an opening bracket, the elif statement will try to use new_bracket before it has been assigned, and Python will raise a NameError. This can be fixed with an elif symbol in [')', ']', '}']: return False (in the function version below). Since starting with a closing bracket obviously means the text is unbalanced, this makes sense and prevents the program from using new_bracket before it is initialized.
Your use of the j variable is useless since you never use it outside of the scope of that if statement. You could have just used i.
Your use of list.pop(i) may cause errors if i is out of range of brackets_stack. You don't need the index since you want the symbol at the end of the stack, which is automatically what list.pop() does.
elif len(brackets_stack) == 0: should not be inside of the for loop. This should be outside of the loop since you only want to say success after all the text has been processed.
new_bracket = Bracket(brackets_stack[j], j) is wrong in many ways. The first is that you don't want to make a new symbol with brackets_stack[j], you want it to be made with either text[j] or better yet, symbol. Another is that this will only be changed whenever a new open bracket is seen. However, if you find the match to that bracket, since you never change this variable, it will stay as the previous bracket. You should re-declare what the symbol of your new_bracket should be. The name itself is then misleading and is changed below.
With all this in mind, below is a revised version of what you want to do, encapsulated in a function. There are some things you need to add to your class as well.
class Bracket:
    def __init__(self, bracket_type, position):
        self.bracket_type = bracket_type
        self.position = position
    def Match(self, c):
        if self.bracket_type:  # handles empty bracket_type (nothing open)
            if self.bracket_type == '[' and c == ']':
                return True
            if self.bracket_type == '{' and c == '}':
                return True
            if self.bracket_type == '(' and c == ')':
                return True
        return False
    # update function to reuse the same object instead of creating a new one each time
    def update(self, new_type, new_position):
        self.bracket_type = new_type
        self.position = new_position
# pass in text to check whether it has balanced brackets
def isBalanced(text):
    brackets_stack = []  # (symbol, position) of every opening bracket not yet closed
    current_bracket = Bracket('', 0)  # innermost unclosed bracket; '' means none
    for i, symbol in enumerate(str(text)):
        if symbol in ['(', '[', '{']:
            brackets_stack.append((symbol, i))
            current_bracket.update(symbol, i)  # call update function
        elif symbol in [')', ']', '}']:
            # a closer with nothing open, or of the wrong type, means unbalanced
            if not current_bracket.Match(symbol):
                return False
            brackets_stack.pop()
            # the innermost unclosed bracket is now the new top of the stack
            if brackets_stack:
                current_bracket.update(*brackets_stack[-1])
            else:
                current_bracket.update('', 0)
        else:  # assumes text contains only brackets, so any other char fails
            return False
    # same as saying return len(brackets_stack) == 0
    return not brackets_stack
This code assumes that text ONLY contains brackets. I do not know if this is a reasonable assumption, so keep it in mind for whatever application you are using it for. Also, as MTset commented on your post, it assumes simple, symmetrical, properly nested brackets: [{]} will return False, even though each bracket type appears in matching pairs.
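The question also asks for the position of the problematic bracket, which a boolean result can't provide. Here is a minimal sketch under stated assumptions: positions are 1-based (matching the {[} -> 3 example), the first closing bracket without a matching opener is reported, and if there is none, the leftmost opening bracket that is never closed is reported. find_mismatch is a hypothetical name, and non-bracket characters are simply ignored, which also removes the "text contains only brackets" assumption.

```python
OPENING = '([{'
CLOSING = ')]}'
PAIRS = dict(zip(CLOSING, OPENING))  # maps each closer to its opener


def find_mismatch(text):
    """Return 'Success' if the brackets are balanced, else a 1-based position."""
    stack = []  # (bracket, 1-based position) of openers not yet closed
    for pos, ch in enumerate(text, start=1):
        if ch in OPENING:
            stack.append((ch, pos))
        elif ch in CLOSING:
            # Nothing open, or the innermost open bracket is of another type
            if not stack or stack[-1][0] != PAIRS[ch]:
                return pos
            stack.pop()
        # Any other character is ignored
    if stack:
        return stack[0][1]  # leftmost opener that was never closed
    return 'Success'


print(find_mismatch('{}'))    # Success
print(find_mismatch('{[}'))   # 3
print(find_mismatch('[()]'))  # Success
```

Storing the position alongside each opener on the stack is what makes the unmatched-opener case reportable; a single "current bracket" variable would lose that information after a pop.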