Remove the parentheses - python

Recently I was solving a problem on Codewars and got stuck.
Basically, what it asks for is:
You are given a string for example :
"example(unwanted thing)example"
Your task is to remove everything inside the parentheses as well as the parentheses themselves.
The example above would return:
"exampleexample"
Don't worry about other brackets like "[]" and "{}" as these will never appear.
There can be multiple parentheses.
The parentheses can be nested.
Some other test cases are given below :
test.assert_equals(remove_parentheses("hello example (words(more words) here) something"), "hello example  something")
test.assert_equals(remove_parentheses("(first group) (second group) (third group)"), "  ")
I looked up online and I found some solution involving Regex, but I wanted to solve this problem without Regex.
Till now I have tried similar solutions as given below :
def remove_parentheses(s):
    while s.find('(') != -1 or s.find(')') != -1 :
        f = s.find('(')
        l = s.find(')')
        s = s[:f] + s [l+1:]
    return s
But when I try to run this snippet, I get Execution Timed Out.

You just need to track the number of open parentheses (the nested depth, technically) to see whether the current character should be included in the output.
def remove_parentheses(s):
    parentheses_count = 0
    output = ""
    for i in s:
        if i=="(":
            parentheses_count += 1
        elif i==")":
            parentheses_count -= 1
        else:
            if parentheses_count == 0:
                output += i
    return output
print(remove_parentheses("hello example (words(more words) here) something"))
print(remove_parentheses("(first group) (second group) (third group)"))

Use a stack to check whether each '(' has been closed.
If the length of the stack is not zero, parentheses are still open, so you have to ignore the characters.
The code below will pass all the test cases.
def remove_parentheses(s):
    stack = []
    answer = []
    for character in s:
        if(character == '('):
            stack.append('(')
            continue
        if(character == ')'):
            stack.pop()
            continue
        if(len(stack) == 0):
            answer.append(character)
    return "".join(answer)

Your code times out because it is stuck in an infinite loop. The slice s = s[:f] + s [l+1:] does not remove the parentheses properly for a nested example such as
hello example (words(more words) here) something
Your code locates the first ( and the first ), so the first pass through the loop returns hello example  here) something. Both ( are gone but one ) is left behind, so on every later pass s.find('(') returns -1 while s.find(')') still succeeds, and the loop condition never becomes false.
To be honest, an approach like this is not ideal: it is hard to read, because you have to dry-run the indices through the loop one by one. You could keep debugging this code and fix the indexing, for example by only searching for the closing bracket that belongs to the ( you located, which would make it even harder to read but would get the job done.
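One way the indexing fix described above could look is sketched here. It is only an illustration, not the asker's or the kata's reference solution: it finds the first ')' and then the nearest '(' before it, so each pass removes one innermost pair. Stopping on an unmatched bracket is an assumption made so the loop always terminates.

```python
def remove_parentheses(s):
    while True:
        close = s.find(")")
        if close == -1:
            return s
        # Nearest "(" to the left of this ")" is its matching opener.
        open_ = s.rfind("(", 0, close)
        if open_ == -1:
            # Unmatched ")": give up instead of looping forever (assumption).
            return s
        s = s[:open_] + s[close + 1:]
```

Each pass shortens the string by at least two characters, so the loop cannot run forever, although repeated slicing makes it slower than a single pass with a depth counter.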
Personally, I would suggest looking into regular expressions, often referred to as regex.
A very simple algorithm built on regex is:
import re
def remove_parentheses(s):
    # Raw string so that \( and \) reach the regex engine unchanged.
    s = re.sub(r"\(.{1,25}\)", "", s)
    return s
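Note that the pattern above caps the bracket contents at 25 characters and relies on greedy matching, so it may misbehave on longer or differently nested input. A hedged alternative, sketched here as one possible approach rather than the answerer's method, is to remove only innermost pairs (those containing no other brackets) and repeat until nothing changes.

```python
import re

# Matches a "(" ... ")" pair that contains no other parentheses.
INNERMOST = re.compile(r"\([^()]*\)")

def remove_parentheses(s):
    while True:
        # subn returns the new string and the number of replacements made.
        s, count = INNERMOST.subn("", s)
        if count == 0:
            return s
```

Each pass peels off one level of nesting, so the number of passes is bounded by the nesting depth plus one.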

def f(s):
    pairs = []
    output = ''
    for i, v in enumerate(s):
        if "(" == v:
            pairs.append(1)
        if ")" == v:
            pairs.pop()
            continue
        if len(pairs) ==0:
            output +=v
    return output

Can be achieved easily if we use a recursive function.. Try this out.
def rmp(st):
    if st.find('(') == -1 or st.find(')') == -1: return st
    else :
        i=st.rindex('(')
        j=st[i+1:].index(')')
        return rmp(st[:i] + st[i+1+j+1:])
Here are a few cases I tested...
print(rmp("hello example (words(more words) here) something"))
print(rmp("(first group) (second group) (third group)"))
print(rmp("This does(n't) work (so well)"))
print(rmp("(1233)qw()"))
print(rmp("(1(2(3(4(5(6(7(8))))))))abcdqw(hkfjfj)"))
And the results are..
hello example something
This does work
qw
abcdqw

Related

Formatting in python(Kivy) like in Stack overflow

My issue is that I would like to take input text with formatting like you would use when creating a Stackoverflow post and reformat it into the required text string. The best way I can think is to give an example....
# This is the input string
Hello **there**, how are **you**
# This is the intended output string
Hello [font=Nunito-Black.ttf]there[/font], how are [font=Nunito-Black.ttf]you[/font]
So each ** is replaced by a different string that has an opening and a closing part, and this needs to work as many times as needed for any string (it occurs twice in the example).
I have tried to use a variable to record whether the ** to be replaced is an opening or a closing part, but I haven't managed to get a function to work yet, which is why the attempt below is incomplete.
I think replacing the correct ** is hard because I have been trying to use index, which only returns the position of the first occurrence in the string.
My attempt as of now
def formatting_text(input_text):
    if input_text:
        if '**' in input_text:
            d = '**'
            for line in input_text:
                s = [e+d for e in line.split(d) if e]
                count = 0
                for y in s:
                    if y == '**' and count == 0:
                        s.index(y)
                        # replace with required part
            return output_text
    return input_text
I have tried to find this answer, so I'm sorry if it has already been asked, but I have had no luck finding it and don't know what to search for.
Of course thank you for any help
A general solution for your case,
Using re
import re
def formatting_text(input_text, special_char, left_repl, right_repl):
    # Define re pattern.
    RE_PATTERN = f"[{special_char}].\w+.[{special_char}]"
    for word in re.findall(RE_PATTERN, input_text):
        # Re-assign with replacement with the parts.
        new_word = left_repl+word.strip(special_char)+right_repl
        input_text = input_text.replace(word, new_word)
    return input_text
print(formatting_text("Hello **there**, how are **you**", "**", "[font=Nunito-Black.ttf]", "[/font]"))
Without using re
def formatting_text(input_text, special_char, left_repl, right_repl):
    while True:
        # Replace the left part.
        input_text = input_text.replace(special_char, left_repl, 1)
        # Replace the right part.
        input_text = input_text.replace(special_char, right_repl, 1)
        if input_text.find(special_char) == -1:
            # Nothing found, time to stop.
            break
    return input_text
print(formatting_text("Hello **there**, how are **you**", "**", "[font=Nunito-Black.ttf]", "[/font]"))
The above solution should also work for other values of special_char such as __, *, or <. But if you only want to make the text bold, you may prefer Kivy's bold markup for labels, i.e. [b] and the closing [/b].
So the formatting stack overflow uses is markdown, implemented in javascript. If you just want the single case to be formatted then you can see an implementation here where they use regex to find the matches and then just iterate through them.
STRONG_RE = r'(\*{2})(.+?)\1'
I would recommend against re-implementing an entire markdown solution yourself when you can just import one.
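For the single bold case only, the STRONG_RE pattern quoted above can be applied with re.sub. This is a minimal sketch under that assumption: the function name and its parameters are illustrative, and it does not handle nested or unbalanced markers the way a full markdown library would.

```python
import re

STRONG_RE = r'(\*{2})(.+?)\1'

def formatting_text(input_text, left_repl, right_repl):
    # Group 2 is the text between the two ** markers. A function is used as
    # the replacement so the tag strings are inserted literally.
    return re.sub(STRONG_RE,
                  lambda m: left_repl + m.group(2) + right_repl,
                  input_text)

print(formatting_text("Hello **there**, how are **you**",
                      "[font=Nunito-Black.ttf]", "[/font]"))
```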

Python re.findall regex and text processing

I'm looking to find and modify some sql syntax around the convert function. I want basically any convert(A,B) or CONVERT(A,B) in all my files to be selected and converted to B::A.
So far I tried selecting them with re.findall(r"\bconvert\b\(.*?,.*\)", l, re.IGNORECASE) But it's only returning a small selection out of what I want and I also have trouble actually manipulating the A/B I mentioned.
For example, a sample line (note the nested structure here is irrelevant, I'm only getting the outer layer working if possible)
convert(varchar, '/' || convert(nvarchar, es.Item_ID) || ':' || convert(nvarchar, o.Option_Number) || '/') as LocPath
...should become...
'/' || es.Item_ID::nvarchar || ':' || o.Option_Number::nvarchar || '/' :: varchar as LocPath
Example2:
SELECT LocationID AS ItemId, convert(bigint, -1),
...should become...
SELECT LocationID AS ItemId, -1::bigint,
I think this should be possible with some kind of re.sub with groups and currently have a code structure inside a for each loop where line is the each line in the file:
matchConvert = ["convert(", "CONVERT("]
a = next((a for a in matchConvert if a in line), False)
if a:
    print("convert() line")
    #line = re.sub(re.escape(a) + r'', '', line)
Edit: In the end I went with a non re solution and handled each line by identifying each block and manipulate them accordingly.
This may be an X/Y problem, meaning you’re asking how to do something with Regex that may be better solved with parsing (meaning using/modifying/writing a SQL parser). An indication that this is the case is the fact that “convert” calls can be nested. I’m guessing Regex is going to be more of a headache than it’s worth here in the long run if you’re working with a lot of files and they’re at all complicated.
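To illustrate why nesting pushes this towards parsing, here is a small sketch of the core step a hand-written parser would need: splitting an argument list on commas only at bracket depth zero. The function name split_top_level is hypothetical, and the sketch assumes no commas or parentheses appear inside SQL string literals, which a real SQL parser would have to handle.

```python
def split_top_level(args):
    """Split on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in args:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # Top-level comma: close the current argument.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts

print(split_top_level("varchar, '/' || convert(nvarchar, es.Item_ID) || '/'"))
```

A single regular expression cannot track the depth counter used here, which is why the nested calls are hard to match reliably with re alone.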
The task:
Swap the parameters of all the 'convert' functions in the given string. Parameters can contain any character, including nested 'convert' functions.
A solution:
def convert_py(s):
    #capturing start:
    left=s.index('convert')
    start=s[:left]
    #capturing part_1:
    c=0
    line=''
    for n1,i in enumerate(s[left+8:],start=len(start)+8):
        if i==',' and c==0:
            part_1=line
            break
        if i==')':
            c-=1
        if i=='(':
            c+=1
        line+=i
    #capturing part_2:
    c=0
    line=''
    for n2,i in enumerate(s[n1+1:],start=n1+1):
        if i==')':
            c-=1
        if i=='(':
            c+=1
        if c<0:
            part_2=line
            break
        line+=i
    #capturing end:
    end=s[n2+1:]
    #capturing result:
    result=start+part_2.lstrip()+' :: '+part_1+end
    return result
def multi_convert_py(s):
    converts=s.count('convert')
    for n in range(converts):
        s=convert_py(s)
    return s
Notes:
Unlike the solution based on the re module, which is presented in another answer, this version should not fail if there are more than two parameters in the 'convert' function in the given string. However, it will swap them only once, for example: convert(a,b, c) --> b, c :: a
I am afraid that unforeseen cases may arise that will lead to failure. Please tell if you find any flaws
The task:
Swap the parameters of all the 'convert' functions in the given string. Parameters can contain any character, including nested 'convert' functions.
A solution based on the re module:
def convert_re(s):
    import re
    start,part_1,part_2,end=re.search(r'''
        (.*?)
        convert\(
        ([^,)(]+\(.+?\)[^,)(]*|[^,)(]+)
        ,
        ([^,)(]+\(.+?\)[^,)(]*|[^,)(]+)
        \)
        (.*)
        ''',s,re.X).groups()
    result=start+part_2.lstrip()+' :: '+part_1+end
    return result
def multi_convert_re(s):
    converts=s.count('convert')
    for n in range(converts):
        s=convert_re(s)
    return s
Description of the 'convert_re' function:
Regular expression:
start is the first group, with whatever comes before 'convert'
Then follows convert\(, which has no group and contains the name of the function and the opening '('
part_1 is the second group ([^,)(]+\(.+?\)[^,)(]*|[^,)(]+). This should match the first parameter. It can be anything except ,)( or a function call preceded by anything except ,)( and optionally followed by anything except ,)( with anything inside its parentheses (except a newline)
Then follows a comma, which has no group
part_2 is the third group; it acts like the second, but should catch everything that's left inside the outer function
Then follows ), which has no group
end is the fourth group (.*), with whatever is left before the newline.
The resulting string is then created by swapping part_1 and part_2, putting ' :: ' between them, removing spaces on the left from part_2 and adding start to the beginning and end to the end.
Description of the 'multi_convert_re' function
Repeatedly calls 'convert_re' function until there are no "convert" left.
Notes:
N.B.: The code implies that the 'convert' function in the string has exactly two parameters.
The code works on the given examples, but I'm afraid there may still be unforeseen flaws when it comes to other examples. Please tell, if you find any flaws.
I have provided another solution presented in another answer that is not based on the re module. It may turn out that the results will be different.
Here's my solution based on #Иван-Балван's code. Breaking this structure into blocks makes further specification a lot easier than I previously thought and I'll be using this method for a lot of other operations as well.
# Check for balanced brackets
def checkBracket(my_string):
    count = 0
    for c in my_string:
        if c == "(":
            count+=1
        elif c == ")":
            count-=1
    return count
# Modify the first convert in line
# Based on suggestions from stackoverflow.com/questions/73040953
def modifyConvert(l):
    # find the location of convert()
    count = l.index('convert(')
    # select the group before convert() call
    before = l[:count]
    group=""
    n1=0
    n2=0
    A=""
    B=""
    operate = False
    operators = ["|", "<", ">", "="]
    # look for A group before comma
    for n1, i in enumerate(l[count+8:], start=len(before)+8):
        # find current position in l
        checkIndex = checkBracket(l[count+8:][:n1-len(before)-8])
        if i == ',' and checkIndex == 0:
            A = group
            break
        group += i
    # look for B group after comma
    group = ""
    for n2, i in enumerate(l[n1+1:], start=n1+1):
        checkIndex = checkBracket(l[count+n1-len(before):][:n2-n1+1])
        if i == ',' and checkIndex == 0:
            return l
        elif checkIndex < 0:
            B = group
            break
        group += i
        # mark operators
        if i in operators:
            operate = True
    # select the group after convert() call
    after = l[n2+1:]
    # (B) if it contains operators
    if operate:
        return before + "(" + B.lstrip() + ') :: ' + A + after
    else:
        return before + B.lstrip() + '::' + A + after
# Modify cast syntax with convert(a,b). return line.
def convertCast(l):
    # Call helper for nested cases
    i = l.count('convert(')
    while i>0:
        i -= 1
        l = modifyConvert(l)
    return l

How do I remove specific elements from a string without using replace() method?

I'm trying to traverse through a string using its indices and remove specific elements. Due to string length getting shorter as elements are removed, it always goes out of range by the time the final element is reached.
Here's some code to illustrate what I'm trying to do. For example, going from "1.2.3.4" to "1234".
string = "1.2.3.4"
for i in range(len(string)):
    if string[i] == ".":
        string = string[:i] + string[i+1:]
I know there are alternate approaches, like using the string method replace() and running string = string.replace(string[i], "", 1), or traversing the individual elements (not indices).
But how would I solve it using the approach above (traversing string indices)? What techniques can I use to halt the loop after it reaches the final element of the string? Without continuing to advance the index, which will go out of range as elements are removed earlier in the string.
Use this:
string = "1.2.3.4"
res = ""
for s in string:
    if s != '.':
        res += s
The result is of course '1234'.
You can use the re module:
import re
string = "1.2.3.4"
string = re.sub(r'\.', '', string)
print(string)
If I understand correctly, you want to modify a string by index while its length keeps changing.
That's pretty dangerous.
The problem you ran into is caused by range(len(string)). Once the range is created, it does not change. Inside the loop the string gets shorter, and that's why you get an out-of-range error.
So what you want to do is check the current length of the string on every iteration and only advance the index when nothing was removed. Here is an example:
string = '1.2.3.4'
i = 0
while i < len(string):
    if string[i] == '.':
        string = string[:i] + string[i+1:]
    else:
        i += 1
Still, there are plenty of better ways to deal with your string; I would not recommend this one.
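Another index-based option, not taken from the answers here, is to walk the indices from the end of the string towards the start. Removing a character then only shifts positions that have already been visited, so a range fixed up front stays valid. The function name remove_char_by_index is hypothetical and the sketch is for illustration only.

```python
def remove_char_by_index(text, target):
    # Iterate from the last index down to 0 so that deletions never
    # affect the indices still to be visited.
    for i in range(len(text) - 1, -1, -1):
        if text[i] == target:
            text = text[:i] + text[i + 1:]
    return text

print(remove_char_by_index("1.2.3.4", "."))
```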
It could be done like this (with a try/except block), but that's not really a great way to approach this problem (or any problem):
string = "1.2.3.4"
for i in range(len(string)):
    try:
        if string[i] == ".":
            string = string[:i] + string[i+1:]
    except IndexError:
        # The string is now shorter than the original range: stop here.
        break
The result is 1234.
The only real change is that by adding a try/except inside the loop, we save ourselves from the IndexError that would normally come up once we try to access an index that is now out of bounds.
Once that happens, the exception is caught and we simply exit the loop with our finished string.

How do I find the first of a few characters in a string in python

How do I find the first of a few characters in a string in python? I have used find() and index() but they find only one character. How do I find the first position of a single character out of the few characters I want to be searched for?
So I want to find the position of the first operator(out of the 4 arithmetic operators) in an inputted string else it should return -1.
Sorry if this is a very stupid question but I have been searching and trying out multiple options over the past few days. I am also a beginner in python.
I tried this but i know its wrong:
>>> str1 ='12-23+23*12/12'
>>> str1.find('+') or str1.find('-') or str1.find('*') or str1.find('/')
This returns the first operator shown that is the + operator.
Also, I have tried
for x in str1:
    if (x=='+' or x=='-' or x=='*' or x=='/'):
        print(str1[x])
I know this is wrong.
I am a beginner and I'm trying to learn over a summer course I have taken. So I have not much knowledge on the topic.
str1 = '12-23+23*12/12+65'
str2 = str1
while True:
    if '+' not in str2:  # Taking '+' as an example
        break
    found = str2.find('+')
    print(found)
    str2 = str2.replace('+', '', 1)
    # str2 = str2[:found] + str2[found+1:]
There is another way to do what I did in the last line of the code; I have added it in the comment above:
str2 = str2[:found] + str2[found+1:]
You can do something like below:
validInput = set(['+','-','*','/'])
checkString = '12-23+23*12*12'
def checkInput(input):
    if input not in validInput:
        raise Exception
    return input
def findSign(sign,string):
    sign = checkInput(sign)
    if sign in string:
        return string.find(sign)
    return -1
print(findSign('/',checkString))
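The snippets above each search for one chosen operator at a time. For the question as asked (the position of the first of several characters, or -1 if none is present), a minimal sketch could scan the string once and stop at the first character that belongs to the set. The name find_first_of and its default argument are hypothetical.

```python
def find_first_of(text, chars="+-*/"):
    # Return the index of the first character of `text` found in `chars`.
    for index, ch in enumerate(text):
        if ch in chars:
            return index
    return -1

print(find_first_of('12-23+23*12/12'))
```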

Python item in list not working

I am having an issue with an if statement regarding an item in a list. Here is the code I am using
score = 0
for j in range(0,1):
    for k in range(0,len(split)):
        keyword = str(split[k][1])
        words = texts[j]
        print(keyword,words)
        if str(keyword) in list(words):
            print("true")
            score = score + float(split[k][0])
        else:
            print("false")
print(score)
Here is the portion of the output where the statement is visibly wrong. What is wrong in the situation?
"now" ['anonym', 'now']
false
0
Your keyword is "now" - INCLUDING the quote marks. It indeed does not exist in words, which only includes words without quote marks. Either fix whatever problem with the source of the data is adding those quotes, or strip them off with something like keyword = keyword.strip('"').
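The effect can be reproduced in isolation. The values below are made up to mirror the printed output in the question; they are not the asker's actual data.

```python
keyword = '"now"'            # the quote marks are part of the string
words = ['anonym', 'now']

print(keyword in words)             # membership test with the quotes
print(keyword.strip('"') in words)  # membership test after stripping them
```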
