Split by "pipeline" with a negative lookahead/lookbehind

I have this code:
import re
my_string = "(chargeur|magn(e|é)tique) sans fil|chargeur solaire (imperm(e|é)able|pliable)|chargeur ext(e|é)rieur externe|chargeur de t(e|é)l(e|é)phone solaire"
my_list = re.split(r'\|\s*(?![^()]*\))', my_string)
print(my_list)
I am trying to split on the pipe character ("|") while also taking the parentheses into consideration. For example, my list should look something like this:
my_list = ['(chargeur|magn(e|é)tique) sans fil', 'chargeur solaire (imperm(e|é)able|pliable)' etc..]
But instead I get this:
my_list = ['(chargeur', 'magn(e|é)tique) sans fil', 'chargeur solaire (imperm(e|é)able|pliable)' etc..]
I know there is a negative lookahead and lookbehind approach, but I don't really understand how I should combine them so that all the parentheses are taken into account. Thank you!

I think I have found the solution:
def split_string(string):
    result = []
    current = []
    level = 0
    for char in string:
        if char == "(":
            level += 1
        elif char == ")":
            level -= 1
        if char == "|" and level == 0:
            result.append("".join(current))
            current = []
            continue
        current.append(char)
    result.append("".join(current))
    return result
my_string = "(chargeur|magn(e|é)tique) sans fil|chargeur solaire (imperm(e|é)able|pliable)|chargeur ext(e|é)rieur externe|chargeur de t(e|é)l(e|é)phone solaire"
my_list = split_string(my_string)
print(my_list)
The code uses a counter, level, to keep track of how deeply nested in parentheses the current character is. Whenever a left parenthesis is encountered, level is increased, and whenever a right parenthesis is encountered, level is decreased. The string is split at a pipe only when level is 0, meaning that the pipe is not inside any parentheses.
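If the input might contain unbalanced parentheses, the same depth-counting idea can be extended to fail loudly instead of silently returning odd pieces. This is only a sketch: the name split_top_level and the sep parameter are my own and not part of the answer above.

```python
def split_top_level(text, sep="|"):
    """Split text on sep, ignoring separators inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ")" appeared before its matching "(".
                raise ValueError("unbalanced ')' in input")
        if ch == sep and depth == 0:
            # Top-level separator: close the current piece.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    parts.append("".join(current))
    return parts


print(split_top_level("(a|b(c|d)) x|y (e|f)"))
# ['(a|b(c|d)) x', 'y (e|f)']
```

Raising an error is a design choice; depending on the data it may be preferable to return the pieces found so far.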

Related

How to use a list of numbers as index inputs

So I have a list of numbers (answer_index) which correspond to the index locations (indices) of a character (char) in a word (word). I would like to use the numbers in the list as index inputs (indexes) later in the code, to replace every character except my chosen character (char) with "*", so that the final print (new_word) in this instance would be ****ee instead of coffee. It is important that word keeps its original value while new_word becomes the modified version. Does anyone have a solution for turning a list into valid index inputs? I will also accept easier ways to meet my goal. (Note: I am extremely new to Python, so I'm sure my code looks horrendous.) Code below:
word = 'coffee'
print(word)
def find(string, char):
    for i, c in enumerate(string):
        if c == char:
            yield i
string = word
char = "e"
indices = (list(find(string, char)))
answer_index = (list(indices))
print(answer_index)
for t in range(0, len(answer_index)):
    answer_index[t] = int(answer_index[t])
indexes = [(answer_index)]
new_character = '*'
result = ''
for i in indexes:
    new_word = word[:i] + new_character + word[i+1:]
print(new_word)

You hardly ever need to work with indices directly:
string = "coffee"
char_to_reveal = "e"
censored_string = "".join(char if char == char_to_reveal else "*" for char in string)
print(censored_string)
Output:
****ee
If you're trying to implement a game of hangman, you might be better off using a dictionary which maps characters to other characters:
string = "coffee"
map_to = "*" * len(string)
mapping = str.maketrans(string, map_to)
translated_string = string.translate(mapping)
print(f"All letters are currently hidden: {translated_string}")
char_to_reveal = "e"
del mapping[ord(char_to_reveal)]
translated_string = string.translate(mapping)
print(f"'{char_to_reveal}' has been revealed: {translated_string}")
Output:
All letters are currently hidden: ******
'e' has been revealed: ****ee
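For a hangman-style game, another option is to rebuild the masked word from the set of letters guessed so far. This is just a sketch: the mask helper and the guessed set are hypothetical names, not taken from the answer above.

```python
def mask(word, guessed, hidden="*"):
    """Show only the characters of word that are in guessed."""
    return "".join(ch if ch in guessed else hidden for ch in word)


word = "coffee"
guessed = set()

guessed.add("e")
print(mask(word, guessed))  # ****ee

guessed.add("f")
print(mask(word, guessed))  # **ffee
```

The original word is never modified, so it stays available for checking whether the game has been won.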

A concise way to replace all characters except some is to use regular expression substitution. In this case, it would look something like:
import re
re.sub('[^e]', '*', 'coffee') # returns '****ee'
Here, [^...] is a negated character class. '[^e]' matches (and therefore replaces) any character except "e".
Other options include decomposing the string into an iterable of characters (@PaulM's answer) or working with a bytearray instead.
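If the characters to keep come from a variable, the negated class can be built at run time. The sketch below assumes a hypothetical helper mask_except and a non-empty keep string; re.escape is used so that characters such as "]" or "^" are not misread inside the class.

```python
import re


def mask_except(text, keep, mask="*"):
    """Replace every character of text that is not in keep with mask."""
    # Build a negated character class such as [^e] from the kept characters.
    pattern = "[^" + re.escape(keep) + "]"
    return re.sub(pattern, mask, text)


print(mask_except("coffee", "e"))   # ****ee
print(mask_except("coffee", "fe"))  # **ffee
```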

In Python, it's often not idiomatic to use indexes unless you really want to do something with them. I'd avoid them for this problem and instead just iterate over the word, read each character and build a new word:
word = "coffee"
char_to_keep = "e"
new_word = ""
for char in word:
    if char == char_to_keep:
        new_word += char_to_keep
    else:
        new_word += "*"
print(new_word)
# prints: ****ee

How to find the largest repeating substring given character in Python?

Given some string, say 'aabaaab', how would I go about finding the longest substring made up only of 'a'? Here it should return 'aaa'. Any help would be greatly appreciated.
def sub_string(s):
    best_run = 0
    current_run = 0
    for char in s:
        if char == 'a':
            current_run += 1
        else:
            current_letter = char
    return(best_run)
I have something like the one above. Not sure how to fix it up.

Not the most efficient, but a straightforward solution:
word = "aasfgaaassaasdsddaaaaaafff"
substr_count = 0
substr_counts = []
character = "f"
for i, letter in enumerate(word):
    if (letter == character):
        substr_count += 1
    else:
        # the run ended: record its length and start over
        substr_counts.append(substr_count)
        substr_count = 0
    if (i == len(word) - 1):
        # record a run that reaches the end of the word
        substr_counts.append(substr_count)
print(max(substr_counts))  # length of the longest run of "f": 3

If you want a short method using standard Python tools (and want to avoid writing loops to reconstruct the string as you iterate), you can use a regex to split the string on any non-a character, then get the max() according to len:
import re
test_string = 'aabaaab'
split_string_list = re.split( '[^a]', test_string )
longest_string_subset = max( split_string_list, key=len )
print( longest_string_subset )
The re library is for regex, and '[^a]' is a regex pattern for any non-a character. Basically, 'aabaaab' is split into a list at every match of the pattern, so that it becomes ['aa', 'aaa', '']. Then max() picks the longest string based on len (i.e. length).
You can read more about functions like re.split() in the docs: https://docs.python.org/2/library/re.html
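If you'd rather not use a regex, itertools.groupby from the standard library groups consecutive equal characters, which gives the runs directly. A minimal sketch, assuming a hypothetical helper name longest_run that returns an empty string when the character does not occur:

```python
from itertools import groupby


def longest_run(text, target):
    """Return the longest substring of text made only of target."""
    # groupby yields (character, iterator over one consecutive run).
    runs = ("".join(group) for char, group in groupby(text) if char == target)
    return max(runs, key=len, default="")


print(longest_run("aabaaab", "a"))  # aaa
```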

How to find the multiple instances of a data between two special characters in python

I am a beginner in Python, so please excuse me if my question is too simple. I want to find the multiple instances of data between two special characters in a string and also count the number of instances. So far I have the following code.
import re
count=0
myString="abcde(fghi)defggdfsidf(ijkl)gfders(gkjh)hgstfvd"
startString = '('
endString = ')'
for item in myString:
    portString=myString[myString.find(startString)+len(startString):myString.find(endString)]
    print(portString)
    count=count+1
My desired output is
fghi
ijkl
gkjh
But my code always starts searching from the beginning of the string and so keeps producing fghi. Can anyone tell me what the problem is?

You can use non-greedy regexes:
import re
count = 0
myString = "abcde(fghi)defggdfsidf(ijkl)gfders(gkjh)hgstfvd"
rx = re.compile(r'\((.*?)\)')  # non-greedy version inside parens
pos = 0
while True:
    m = rx.search(myString[pos:])  # search starting at pos (initially 0)
    if m is None: break
    count += 1
    print(m.group(1))
    pos += m.end()  # next search will start past last ')'
The solution above only makes sense if the parentheses are correctly balanced, or if you want each match to run from an opening parenthesis to the first closing one after it.
If you want to select only parenthesized text that itself contains no opening or closing parentheses, you have to specify that in the regex:
count = 0
myString = "abcde(fghi)defg(gdfsidf(ijkl)g(fders(gkjh)hgstfvd"
rx = re.compile(r'\(([^()]*)\)')
pos = 0
while True:
    m = rx.search(myString[pos:])  # search starting at pos (initially 0)
    if m is None: break
    count += 1
    print(m.group(1))
    pos += m.end()  # next search will start past last ')'
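The manual position tracking can also be left to the re module: finditer walks through all non-overlapping matches for you. A small sketch reusing the second pattern from above (the variable names are my own):

```python
import re

my_string = "abcde(fghi)defg(gdfsidf(ijkl)g(fders(gkjh)hgstfvd"
rx = re.compile(r'\(([^()]*)\)')  # parenthesized text with no nested parens

# finditer yields one match object per non-overlapping match, left to right.
matches = [m.group(1) for m in rx.finditer(my_string)]
count = len(matches)

for text in matches:
    print(text)
print(count)  # 3
```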

As an alternative to regex, if you'd prefer to keep the loop, note that str.find() takes an optional parameter telling it where to start looking. Just keep track of where the closing parenthesis is and start again from just after it.
Unfortunately it's not quite that simple, as the loop condition has to change too, so that it stops after the last set of parentheses.
Something like this should do the trick:
count = 0
myString = "abcde(fghi)defggdfsidf(ijkl)gfders(gkjh)hgstfvd"
startString = '('
endString = ')'
endStringIndex = -1  # so that the first search starts at index 0
while True:
    startStringIndex = myString.find(startString, endStringIndex + 1)
    if startStringIndex == -1:
        break  # no more opening parentheses
    endStringIndex = myString.find(endString, startStringIndex + 1)
    if endStringIndex == -1:
        break  # opening parenthesis without a closing one
    portString = myString[startStringIndex + len(startString):endStringIndex]
    print(portString)
    count += 1
Output:
fghi
ijkl
gkjh

You can use re.findall:
>>> import re
>>> myString = "abcde(fghi)defggdfsidf(ijkl)gfders(gkjh)hgstfvd"
>>> matches = re.findall(r'\((\w+)\)', myString)
>>> count = len(matches)
>>> print('\n'.join(matches))
fghi
ijkl
gkjh
>>> print(count)
3

Python, replacing characters in a string while preserving original string

More specifically:
Given a string and a non-empty word string, return a version of the original String where all chars have been replaced by pluses ("+"), except for appearances of the word string which are preserved unchanged.
def plusOut(base, word):
plusOut("12xy34", "xy") → "++xy++"
plusOut("12xy34", "1") → "1+++++"
plusOut("12xy34xyabcxy", "xy") → "++xy++xy+++xy"
My original thought was this:
def main():
    x = base.split(word)
    y = ''.join(x)
    print(y.replace(y,'+')*len(y))
From here I have trouble reinserting the word back into the str in the correct places. Any help is appreciated.

You can use any string to join (instead of the empty string '' like you have).
def plusOut(s, word):
    x = s.split(word)
    y = ['+' * len(z) for z in x]
    final = word.join(y)
    return final
Edit: I've removed the regex, but I'm keeping the function across multiple lines to more closely match your original code.
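To see why the split/join approach works, it may help to look at the intermediate list and to check the result against the three examples from the question. This is only an illustrative sketch; plus_out is the same idea as above, under a name of my own.

```python
def plus_out(base, word):
    pieces = base.split(word)                     # text between occurrences of word
    masked = ["+" * len(piece) for piece in pieces]
    return word.join(masked)                      # put word back between the pieces


# The trailing '' is what lets join() restore a word at the very end.
print("12xy34xyabcxy".split("xy"))  # ['12', '34', 'abc', '']

print(plus_out("12xy34", "xy"))         # ++xy++
print(plus_out("12xy34", "1"))          # 1+++++
print(plus_out("12xy34xyabcxy", "xy"))  # ++xy++xy+++xy
```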

A regex is not required. We can solve this without any libraries, iterating through exactly once.
We iterate through the indices i of the string. If the slice of length len(word) starting at i equals the word, we yield the word and jump ahead by len(word); otherwise we yield '+' and advance by one.
def replace_chars_except_word(string, word):
    def generate_chars():
        i = 0
        while i < len(string):
            if string[i:(i+len(word))] == word:
                i += len(word)
                yield word
            else:
                yield '+'
                i += 1
    return ''.join(generate_chars())
if __name__ == '__main__':
    test_string = 'stringabcdefg string11010string1'
    result = replace_chars_except_word(test_string, word = 'string')
    assert result == 'string++++++++string+++++string+'
I use an internal generator function to yield the strings, but you could replace the internal function with a buffer. (This is slightly less memory efficient.)
def replace_chars_except_word(string, word):
    buffer = []
    i = 0
    while i < len(string):
        if string[i:i+len(word)] == word:
            buffer.append(word)
            i += len(word)
        else:
            buffer.append('+')
            i += 1
    return ''.join(buffer)

Find out word at specific index

I have a string with multiple words separated by underscores like this:
string = 'this_is_my_string'
And let's for example take string[n] which will return a letter.
Now for this index I want to get the whole word between the underscores.
So for string[12] I'd want to get back the word 'string' and for string[1] I'd get back 'this'

A very simple approach using string slicing is to:
1. slice the string into two parts at the given position
2. split() each part on _
3. concatenate the last item from part 1 and the first item from part 2
Sample code:
>>> my_string = 'this_is_my_sample_string'
#                              ^ index 14
>>> pos = 14
>>> my_string[:pos].split('_')[-1] + my_string[pos:].split('_')[0]
'sample'
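The same idea can be written without building the intermediate lists, by searching for the nearest underscore on each side of the position. A sketch under my own assumptions: the helper name word_at is hypothetical, pos is a valid index, and an index that lands on an underscore returns an empty string.

```python
def word_at(text, pos, sep="_"):
    """Return the sep-delimited word of text that contains index pos."""
    if text[pos] == sep:
        return ""  # pos points at a separator, not at a word
    # rfind returns -1 when there is no separator to the left, so start becomes 0.
    start = text.rfind(sep, 0, pos) + 1
    end = text.find(sep, pos)
    if end == -1:
        end = len(text)  # no separator to the right: word runs to the end
    return text[start:end]


print(word_at("this_is_my_string", 12))  # string
print(word_at("this_is_my_string", 1))   # this
```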

This should work:
string = 'this_is_my_string'
words = string.split('_')
idx = 0
indexes = {}
for word in words:
    for _ in range(len(word)):
        indexes[idx] = word  # map each character position to its word
        idx += 1
    idx += 1  # skip the underscore that follows the word
print(indexes[1])   # this
print(indexes[12])  # string

The following code works. You can change the index and string variables and adapt to new strings. You can also define a new function with the code to generalize it.
string = 'this_is_my_string'
sp = string.split('_')
index = 12
total_len = 0
for word in sp:
    total_len += (len(word) + 1)  # The '+1' accounts for the underscore
    if index < total_len:
        result = word
        break
print(result)

A little bit of regular expression magic does the job:
import re
def wordAtIndex(text, pos):
    p = re.compile(r'(_|$)')
    beg = 0
    for m in p.finditer(text):
        #(end, sym) = (m.start(), m.group())
        #print (end, sym)
        end = m.start()
        if pos < end:  # 'pos' is within current split piece
            break
        beg = end+1  # advance to next split piece
    if pos == beg-1:  # handle case where 'pos' is index of split character
        return ""
    else:
        return text[beg:end]
text = 'this_is_my_string'
for i in range(0, len(text)+1):
    print("Text["+str(i)+"]: ", wordAtIndex(text, i))
It splits the input string at '_' characters or at end-of-string, and then iteratively compares the given position index with the actual split position.
