Split string by comma, but ignore commas within brackets


I'm trying to split a string by commas using python:
s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
But I want to ignore any commas within brackets []. So the result for above would be:
["year:2020", "concepts:[ab553,cd779]", "publisher:elsevier"]
Anybody have advice on how to do this? I tried to use re.split like so:
params = re.split(",(?![\w\d\s])", param)
But it is not working properly.

result = re.split(r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])", subject, 0)
, # Match the character “,” literally
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
(?: # Match the regular expression below
[^,\[\]] # Match any single character NOT present in the list below
# The literal character “,”
# The literal character “[”
# The literal character “]”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
, # Match the character “,” literally
)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[^,\[\]] # Match any single character NOT present in the list below
# The literal character “,”
# The literal character “[”
# The literal character “]”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
] # Match the character “]” literally
)
Updated to support more than 2 items in brackets. E.g.
year:2020,concepts:[ab553,cd779],publisher:elsevier,year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier
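As a quick check, the pattern above can be run against the longer example string. This is only a sketch: `subject` follows the name used in the answer, while `pattern` and `parts` are names assumed here for illustration.

```python
import re

# Pattern copied from the answer above: split on a comma unless it is
# followed by bracket-free items and then a closing "]".
pattern = r",(?!(?:[^,\[\]]+,)*[^,\[\]]+])"

subject = (
    "year:2020,concepts:[ab553,cd779],publisher:elsevier,"
    "year:2020,concepts:[ab553,cd779,xx345],publisher:elsevier"
)

parts = re.split(pattern, subject)
for part in parts:
    print(part)
```

Each bracketed list stays in one piece, including the one with three items.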

This regex works on your example:
,(?=[^,]+?:)
Here, we use a positive lookahead to find commas that are followed by one or more non-comma characters and then a colon. This correctly finds the <comma><key> pattern you are searching for. Of course, if the keys are allowed to have commas, or if the bracketed values can contain colons, this would have to be adapted a little further.
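A minimal sketch of how this lookahead might be used with re.split, assuming every item outside the brackets has the form key:value (the variable name `params` is borrowed from the question):

```python
import re

s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"

# Split only at commas that are followed by "<key>:", that is, by
# non-comma characters and then a colon.
params = re.split(r",(?=[^,]+?:)", s)
print(params)
```

The comma inside the brackets is skipped because no colon appears before the next comma.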

You can work this out using a user-defined function instead of split:
s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
def split_by_commas(s):
    lst = list()
    last_bracket = ''
    word = ""
    for c in s:
        if c == '[' or c == ']':
            last_bracket = c
        if c == ',' and last_bracket == ']':
            lst.append(word)
            word = ""
            continue
        elif c == ',' and last_bracket == '[':
            word += c
            continue
        elif c == ',':
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst
main_lst = split_by_commas(s)
print(main_lst)
The result of the run of above code:
['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']

A pattern that uses only a lookahead checks for a closing bracket to the right, but it does not verify that there is a matching opening bracket on the left.
Instead of using split, you could match either one or more repetitions of values between square brackets, or any run of characters other than a comma.
(?:[^,]*\[[^][]*])+[^,]*|[^,]+
import re
s = "year:2020,concepts:[ab553,cd779],publisher:elsevier"
params = re.findall(r"(?:[^,]*\[[^][]*])+[^,]*|[^,]+", s)
print(params)
Output
['year:2020', 'concepts:[ab553,cd779]', 'publisher:elsevier']

I adapted @Bemwa's solution (which didn't work for my use-case):
def split_by_commas(s):
    lst = list()
    brackets = 0
    word = ""
    for c in s:
        if c == "[":
            brackets += 1
        elif c == "]":
            if brackets > 0:
                brackets -= 1
        elif c == "," and not brackets:
            lst.append(word)
            word = ""
            continue
        word += c
    lst.append(word)
    return lst
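Counting the bracket depth, as above, also copes with nested brackets. The following is a small self-contained sketch of the same idea; the function name `split_top_level` and its parameters are hypothetical, chosen only for this illustration.

```python
def split_top_level(text, sep=",", open_ch="[", close_ch="]"):
    """Split text on sep, ignoring separators inside brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # Top-level separator: close the current piece.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


nested = "year:2020,concepts:[ab553,[cd779,ef001]],publisher:elsevier"
print(split_top_level(nested))
```

Collecting characters in a list and joining once per piece avoids repeated string concatenation, although for short inputs the difference is negligible.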

Related

How to use a list of numbers as index inputs

So I have a list of numbers (answer_index) that correspond to the index locations (indices) of a character (char) in a word (word). I would like to use the numbers in the list as index inputs (indexes) later in the code to replace every character except my chosen character (char) with "*", so that the final print (new_word) in this instance would be ****ee instead of coffee. It is important that word keeps its original value while new_word becomes the modified version. Does anyone have a solution for turning a list into valid index inputs? I will also accept easier ways to meet my goal. (Note: I am extremely new to Python, so I'm sure my code looks horrendous.) Code below:
word = 'coffee'
print(word)
def find(string, char):
    for i, c in enumerate(string):
        if c == char:
            yield i
string = word
char = "e"
indices = (list(find(string, char)))
answer_index = (list(indices))
print(answer_index)
for t in range(0, len(answer_index)):
    answer_index[t] = int(answer_index[t])
indexes = [(answer_index)]
new_character = '*'
result = ''
for i in indexes:
    new_word = word[:i] + new_character + word[i+1:]
print(new_word)

You hardly ever need to work with indices directly:
string = "coffee"
char_to_reveal = "e"
censored_string = "".join(char if char == char_to_reveal else "*" for char in string)
print(censored_string)
Output:
****ee
If you're trying to implement a game of hangman, you might be better off using a dictionary which maps characters to other characters:
string = "coffee"
map_to = "*" * len(string)
mapping = str.maketrans(string, map_to)
translated_string = string.translate(mapping)
print(f"All letters are currently hidden: {translated_string}")
char_to_reveal = "e"
del mapping[ord(char_to_reveal)]
translated_string = string.translate(mapping)
print(f"'{char_to_reveal}' has been revealed: {translated_string}")
Output:
All letters are currently hidden: ******
'e' has been revealed: ****ee

A simple and usually fast way to replace all characters except some is regular expression substitution. In this case, it would look something like:
import re
re.sub('[^e]', '*', 'coffee') # returns '****ee'
Here, [^...] is a negated character class. '[^e]' will match (and then replace) any character except "e".
Other options include decomposing the string into an iterable of characters (@PaulM's answer) or working with a bytearray instead.
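If the characters to keep are not known in advance, the character class can be built at run time. The sketch below assumes a hypothetical helper named `mask_except`; re.escape is used so that characters such as "]" or "-" are treated literally inside the class.

```python
import re


def mask_except(word, keep, mask="*"):
    """Replace every character of word that is not in keep with mask."""
    # re.escape protects characters that are special inside [...].
    pattern = "[^" + re.escape(keep) + "]"
    return re.sub(pattern, mask, word)


print(mask_except("coffee", "e"))
print(mask_except("coffee", "ef"))
```

This sketch assumes `keep` is non-empty; an empty string would produce an invalid pattern.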

In Python, it's often not idiomatic to use indexes, unless you really want to do something with them. I'd avoid them for this problem and instead just iterate over the word, read each character and create a new word:
word = "coffee"
char_to_keep = "e"
new_word = ""
for char in word:
    if char == char_to_keep:
        new_word += char_to_keep
    else:
        new_word += "*"
print(new_word)
# prints: ****ee
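If the list of indices is still needed elsewhere (as answer_index is in the question), it can be used directly to decide which positions to keep. This is a minimal sketch under that assumption; the name `keep` is introduced here for illustration.

```python
word = "coffee"
char = "e"

# Positions of the chosen character, like answer_index in the question.
answer_index = [i for i, c in enumerate(word) if c == char]

# A set makes the "is this position kept?" check cheap.
keep = set(answer_index)
new_word = "".join(c if i in keep else "*" for i, c in enumerate(word))

print(answer_index)
print(new_word)
print(word)  # the original string is unchanged
```

Because strings are immutable, building new_word never modifies word.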

How can i remove only the last bracket from a string in python?

How can I remove only the last bracket from a string?
For example,
Input 1:
"hell(h)o(world)"
I want this result:
"hell(h)o"
Input 2:
hel(lo(wor)ld)
I want:
hel
As you can see, the middle brackets remain intact; only the last bracket is removed.
I tried:
import re
string = 'hell(h)o(world)'
print(re.sub('[()]', '', string))
Output:
hellhoworld
I figured out a solution. I did it like this:
string = 'hell(h)o(world)'
if (string[-1] == ")"):
    add=int(string.rfind('(', 0))
    print(string[:add])
Output:
hell(h)o
I'm looking for other optimised solutions or suggestions.

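See whether the code below is useful; let me know and I will optimize it further. Note that it removes only the last pair of bracket characters and keeps the text between them, so for this input it prints hell(h)oworld.
string = 'hell(h)o(world)'
count=0
r=''
for i in reversed(string):
    if count <2 and (i == ')' or i=='('):
        count+=1
        pass
    else:
        r+=i
for i in reversed(r):
    print(i, end='')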

If you want to remove the last bracket from the string even if it's not at the end of the string, you can try something like this. This will only work if you know you have a substring beginning and ending with parentheses somewhere in the string, so you may want to implement some sort of check for that. You will also need to modify it if you are dealing with nested parentheses.
str = "hell(h)o(world)"
r_str = str[::-1] # creates reverse copy of string
for i in range(len(str)):
    if r_str[i] == ")":
        start = i
    elif r_str[i] == "(":
        end = i+1
        break
x = r_str[start:end][::-1] # substring that we want to remove
str = str.replace(x,'')
print(str)
output:
hell(h)o
If the string is not at the end:
str = "hell(h)o(world)blahblahblah"
output:
hell(h)oblahblahblah
Edit: Here is a modified version to detect nested parentheses. However, please keep in mind that this will not work if there are unbalanced parentheses in the string.
str = "hell(h)o(w(orld))"
r_str = str[::-1]
p_count = 0
for i in range(len(str)):
    if r_str[i] == ")":
        if p_count == 0:
            start = i
        p_count = p_count+1
    elif r_str[i] == "(":
        if p_count == 1:
            end = i+1
            break
        else:
            p_count = p_count - 1
x = r_str[start:end][::-1]
print("x:", x)
str = str.replace(x,'')
print(str)
output:
hell(h)o

Something like this?
string = 'hell(h)o(w(orl)d)23'
new_str = ''
escaped = 0
for char in reversed(string):
    if escaped is not None and char == ')':
        escaped += 1
    if not escaped:
        new_str = char + new_str
    if escaped is not None and char == '(':
        escaped -= 1
        if escaped == 0:
            escaped = None
print(new_str)
Scanning from the right, this starts skipping characters at the first ) and stops once that level is closed by its matching (.
So a nested () does not affect it.

Using re.sub('[()]', '', string) will replace any parenthesis in the string with an empty string.
To match the last set of balanced parentheses, and if you can make use of the regex module from PyPI, you can use a recursive pattern that repeats the first subgroup and asserts that there are no more occurrences of either ( or ) to the right:
(\((?:[^()\n]++|(?1))*\))(?=[^()\n]*$)
The pattern matches:
( Capture group 1
\( Match (
(?:[^()\n]++|(?1))* Repeat 0+ times: either match one or more characters other than (, ) or a newline (possessively), or recurse into group 1 using (?1)
\) Match )
) Close group 1
(?=[^()\n]*$) Positive lookahead, assert till the end of the string no ( or ) or newline
For example
import regex
strings = [
    "hell(h)o(world)",
    "hel(lo(wor)ld)",
    "hell(h)o(world)blahblahblah"
]
pattern = r"(\((?:[^()]++|(?1))*\))(?=[^()]*$)"
for s in strings:
    print(regex.sub(pattern, "", s))
Output
hell(h)o
hel
hell(h)oblahblahblah
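If installing the third-party regex module is not an option, the same result can be approximated with the standard library alone by scanning backwards from the last closing parenthesis. This is a sketch rather than an equivalent of the pattern above: the function name `remove_last_group` is hypothetical, and it assumes the parentheses are balanced (the string is returned unchanged if no matching opener is found).

```python
def remove_last_group(s):
    """Remove the last balanced (...) group, including nested groups."""
    end = s.rfind(")")
    if end == -1:
        return s  # nothing to remove
    depth = 0
    # Walk left from the last ")" until its matching "(" is found.
    for i in range(end, -1, -1):
        if s[i] == ")":
            depth += 1
        elif s[i] == "(":
            depth -= 1
            if depth == 0:
                return s[:i] + s[end + 1:]
    return s  # unbalanced input: leave it untouched


for text in ["hell(h)o(world)", "hel(lo(wor)ld)", "hell(h)o(world)blahblahblah"]:
    print(remove_last_group(text))
```

Slicing by position avoids str.replace, which would also remove an identical group appearing earlier in the string.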

Append last letter in a string to another string

I am constructing a chatbot that rhymes in Python. Is it possible to identify the last vowel (and all the letters after that vowel) in a random word and then append those letters to another string without having to go through all the possible letters one by one (like in the following example)?
lastLetters = '' # String we want to append the letters to
if user_answer.endswith("a"):
    lastLetters += "a"
elif user_answer.endswith("b"):
    lastLetters += "b"
For example, if the word was "right", we'd want to get "ight".

You need to find the last index of a vowel, for that you could do something like this (a bit fancy):
s = input("Enter the word: ") # You can do this to get user input
last_index = len(s) - next((i for i, e in enumerate(reversed(s), 1) if e in "aeiou"), -1)
result = s[last_index:]
print(result)
Output
ight
An alternative using regex:
import re
s = "right"
last_index = -1
match = re.search("[aeiou][^aeiou]*$", s)
if match:
last_index = match.start()
result = s[last_index:]
print(result)
The pattern [aeiou][^aeiou]*$ means: match a vowel followed by zero or more characters that are not vowels ([^aeiou] means not a vowel; the ^ sign inside brackets means negation in regex) until the end of the string. So it basically matches from the last vowel onward. Notice this assumes a string composed only of lowercase consonants and vowels.
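The regex approach can be wrapped in a small helper for the chatbot. The sketch below makes two assumptions that are not in the answer: matching is case-insensitive, and a word without any vowel yields an empty string. The name `rhyme_suffix` is hypothetical.

```python
import re

# A vowel followed only by non-vowels up to the end of the string.
LAST_VOWEL = re.compile(r"[aeiou][^aeiou]*$", re.IGNORECASE)


def rhyme_suffix(word):
    """Return the last vowel and everything after it, or '' if none."""
    match = LAST_VOWEL.search(word)
    return match.group(0) if match else ""


last_letters = ""
last_letters += rhyme_suffix("right")
print(last_letters)
```

Strings are immutable in Python, so the suffix is appended with += rather than with an append method.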

How to find the largest repeating substring given character in Python?

Given some string say 'aabaaab', how would I go about finding the largest substring of a. So it should return 'aaa'. Any help would be greatly appreciated.
def sub_string(s):
    best_run = 0
    current_run = 0
    for char in s:
        if char == 'a'
            current_run += 1
        else:
            current_letter = char
    return(best_run)
I have something like the one above. Not sure where I can fix it up.

Not the most efficient, but a straightforward solution:
word = "aasfgaaassaasdsddaaaaaafff"
substr_count = 0
substr_counts = []
character = "f"
for i, letter in enumerate(word):
    if (letter == character):
        substr_count += 1
    else:
        substr_counts.append(substr_count)
        substr_count = 0
    if (i == len(word) - 1):
        substr_counts.append(substr_count)
print(max(substr_counts))
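The same idea can be written without storing every run length, by keeping only the current run and the best run seen so far (the two counters the question started with). This is a minimal sketch; the function name `longest_run` and its `target` parameter are assumptions made for illustration.

```python
def longest_run(s, target="a"):
    """Return the longest run of target in s as a string."""
    best = 0
    current = 0
    for ch in s:
        if ch == target:
            current += 1
            best = max(best, current)
        else:
            current = 0  # the run is broken, start counting again
    return target * best


print(longest_run("aabaaab"))
```

Since the answer consists of a single repeated character, it can be rebuilt from the run length instead of slicing the input.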

If you want a short method using standard Python tools (and want to avoid writing loops to reconstruct the string as you iterate), you can use regex to split the string by any non-a characters, then get the max() according to len:
import re
test_string = 'aabaaab'
split_string_list = re.split( '[^a]', test_string )
longest_string_subset = max( split_string_list, key=len )
print( longest_string_subset )
The re library is for regex, and '[^a]' is a regex pattern for any non-a character. Basically, 'aabaaab' is split into a list at every match of the pattern, so that it becomes ['aa', 'aaa', '']. Then, the max() call looks for the longest string based on len (aka length).
You can read more about functions like re.split() in the docs: https://docs.python.org/2/library/re.html

Check special symbols in string endings

How to check special symbols such as !?,(). in the words ending? For example Hello??? or Hello,, or Hello! returns True but H!??llo or Hel,lo returns False.
I know how to check the only last symbol of string but how to check if two or more last characters are symbols?

You may have to use regex for this.
import re
def checkword(word):
    m = re.match(r"\w+[!?,().]+$", word)
    if m is not None:
        return True
    return False
That regex is:
\w+ # one or more word characters (letters, digits and underscore)
[!?,().]+ # one or more of the characters inside the brackets
# (this is called a character class)
$ # assert end of string
Using re.match forces the match to begin at the beginning of the string, or else we'd have to use ^ before the regular expression.
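To check this against the examples from the question, here is a small sketch that uses re.fullmatch, which anchors the pattern at both ends so neither ^ nor $ is needed. The names `SYMBOLS` and `ends_with_symbols` are hypothetical.

```python
import re

SYMBOLS = "!?,()."
# One or more word characters followed by one or more listed symbols.
PATTERN = re.compile(r"\w+[" + re.escape(SYMBOLS) + r"]+")


def ends_with_symbols(word):
    """True if word is word characters followed only by symbols."""
    return PATTERN.fullmatch(word) is not None


for word in ["Hello???", "Hello,,", "Hello!", "H!??llo", "Hel,lo"]:
    print(word, ends_with_symbols(word))
```

The first three words give True and the last two give False, matching the behaviour asked for in the question.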

You can try something like this:
word = "Hello!"
def checkSym(word):
    return word[-1] in "!?,()."
print(checkSym(word))
The result is:
True
Try giving different strings as input and check the results.
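In case you want to count every symbol at the end of the string, you can use the following (the i >= 0 check stops the loop when the whole string consists of symbols):
def symbolCount(word):
    i = len(word)-1
    c = 0
    while i >= 0 and word[i] in "!?,().":
        c = c + 1
        i = i - 1
    return c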
Testing it with word = "Hello!?.":
print(symbolCount(word))
The result is:
3

If you want to get a count of the 'special' characters at the end of a given string:
special = '!?,().'
s = 'Hello???'
count = 0
for c in s[::-1]:
    if c in special:
        count += 1
    else:
        break
print("Found {} special characters at the end of the string.".format(count))

You can use re.findall. Note that \W matches any non-word character, not only the symbols listed in the question:
import re
s = "Hello???"
if re.findall(r'\W+$', s):
    pass

You could try this.
string="gffrwr."
print(string[-1] in "!?,().")
