Is calling str.replace() twice the best solution for overlapping matches?

When I execute the following code, I expect every ' a ' to be replaced by ' b ', yet only non-overlapping matches are replaced.
>>> " a a a a a a a a ".replace(' a ', ' b ')
' b a b a b a b a '
So I use the following:
>>> " a a a a a a a a ".replace(' a ', ' b ').replace(' a ', ' b ')
' b b b b b b b b '
Is this a bug or a feature of replace?
According to the docs, all occurrences are replaced:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

Most likely your best bet is using regex. Lookbehind/lookahead expressions let you match part of a string surrounded by a specific expression.
import re
s = " a a a a a a a a "
pattern = r'(?<= )a(?= )'
print(re.sub(pattern, "b", s))
Spaces don't actually become part of the match, so they don't get replaced.
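To see why a single str.replace() call skips every other ' a ', it may help to run both approaches side by side. This is only a minimal sketch; the names once and lookaround are introduced here for illustration and are not from the original post.

```python
import re

s = " a a a a a a a a "

# str.replace scans left to right and consumes each match, including its
# trailing space, so that space cannot also start the next match.
once = s.replace(" a ", " b ")

# Lookbehind/lookahead only assert the surrounding spaces without consuming
# them, so every 'a' between two spaces is matched in a single pass.
lookaround = re.sub(r"(?<= )a(?= )", "b", s)

print(repr(once))        # ' b a b a b a b a '
print(repr(lookaround))  # ' b b b b b b b b '
```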

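Why not replace only what you actually want to change, which is 'a' rather than ' a '?
" a a a a a a a a ".replace('a', 'b')
This gives the output:
' b b b b b b b b '
Note that this also replaces any 'a' inside a longer word, so it only works when every 'a' in the string stands alone.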

Related

Replacing spaces in one string with characters of other string

Say I have two strings, string1="A B C " and string2="abc". How do I combine these two strings so that string1 becomes "AaBbCc"? Basically, I want each space in string1 to be replaced by the next character from string2. I tried using two for loops like this:
string1="A B C "
string2="abc"
for char1 in string1:
    if char1==" ":
        for char2 in string2:
            string1.replace(char1,char2)
    else:
        pass
print(string1)
But that doesn't work. I'm fairly new to Python, so could somebody help me? I'm using Python 3. Thank you in advance.

You can call iter() on string2 and replace each ' ' in string1 with the next character from that iterator:
>>> string1 = "A B C "
>>> string2 = "abc"
>>> itrStr2 = iter(string2)
>>> ''.join(st if st!=' ' else next(itrStr2) for st in string1)
'AaBbCc'
If string1 has more spaces than string2 has characters, you can use itertools.cycle instead:
>>> from itertools import cycle
>>> string1 = "A B C A B C "
>>> string2 = "abc"
>>> itrStr2 = cycle(string2)
>>> ''.join(st if st!=' ' else next(itrStr2) for st in string1)
'AaBbCcAaBbCc'

string1 = "A B C "
string2 = "abc"
out, repl = '', list(string2)
for s in string1:
    out += s if s != " " else repl.pop(0)
print(out)  # AaBbCc
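Neither answer spells out why the loop in the question has no visible effect. The likely reason is that Python strings are immutable: str.replace() returns a new string, and the question's code discards that return value. Even if it were assigned, every space would receive the same character. A minimal sketch (result is a name introduced here for illustration):

```python
string1 = "A B C "

# str.replace returns a new string; the original is left untouched.
result = string1.replace(" ", "a")

print(repr(string1))  # 'A B C '
print(repr(result))   # 'AaBaCa'
```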

How to extract unique substring from a string in Python, when comparing it against another string?

I have two strings, say 'a' and 'b'. I want to compare 'a' against 'b' and extract only the unique part of 'a'. I could simply check whether 'b' is in 'a' and extract the rest. The issue is that either string may be missing whitespace at random positions, which makes this difficult.
Here is what I have done so far:
a = "catsand dogs some other strings"
b = "cats and dogs"
a_no_space = a.replace(" ", "")
b_no_space = b.replace(" ", "")
if(b_no_space in a_no_space and len(a_no_space) > len(b_no_space)):
    unique = a[b_no_space.index(b_no_space)+len(b_no_space):]
With this solution, I get the following result:
s some other strings
I don't want that 's' at the beginning. How can I fix this in Python?
Does using regex help here? If so how?

You can convert your search string into a regular expression in which the spaces are replaced by '\s*', which accepts any amount of whitespace between words (including none):
import re
a = "catsand dogs some other strings"
b = "cats and dogs"
pattern = r"\s*".join(map(re.escape, re.split(r"\s+", b)))  # r'cats\s*and\s*dogs'
r = re.sub(pattern, "", a)  # ' some other strings'
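The pattern above only tolerates missing whitespace where b has a space. If whitespace may be missing or added anywhere in either string, one possible variation is to allow optional whitespace between every pair of non-space characters. The helper name remove_ignoring_spaces is hypothetical, and this is a sketch under that assumption rather than a tested general solution:

```python
import re

def remove_ignoring_spaces(text, fragment):
    """Remove the first occurrence of fragment from text, ignoring whitespace differences."""
    # Escape each non-space character so regex metacharacters stay literal.
    chars = [re.escape(ch) for ch in fragment if not ch.isspace()]
    if not chars:
        return text
    # Allow any amount of whitespace (including none) between characters.
    pattern = r"\s*".join(chars)
    return re.sub(pattern, "", text, count=1)

print(repr(remove_ignoring_spaces("catsand dogs some other strings", "cats and dogs")))
# ' some other strings'
print(repr(remove_ignoring_spaces("cats and dogs some other strings", "catsanddogs")))
# ' some other strings'
```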

Here is a solution that progressively slices the larger string according to the letters of the substring:
if len(a) > len(b):
    for letter in b:
        if letter in a and letter != " ":
            a = a[a.index(letter) + 1:]
    print(a)
else:
    for letter in a:
        if letter in b and letter != " ":
            b = b[b.index(letter) + 1:]
    print(b)

Add missing periods in Python

I have the following list of sentences:
list_of_sentense = ['Hi how are you?', 'I am good', 'Great!', 'I am doing good,', 'Good.']
I want to convert it into:
['Hi how are you?', 'I am good.', 'Great!', 'I am doing good.', 'Good.']
So I need to insert a period only if a sentence doesn't end with '?', '!' or '.'. Also if a sentence ends with a comma I need to change it into a period.
My code is here:
list_of_sentense_fixed = []
for i in range(len(list_of_sentense)):
    b = list_of_sentense[i]
    b = b + '.' if (not b.endswith('.')) or (not b.endswith('!')) or (not b.endswith('?')) else b
    list_of_sentense_fixed.append(b)
But it doesn't work properly.

Just define a function to fix one sentence, then use a list comprehension to construct a new list from the old one:
def fix_sentence(sentence):
    if sentence == "":  # Don't change empty strings.
        return sentence
    if sentence[-1] in ["?", ".", "!"]:  # Don't change if already okay.
        return sentence
    if sentence[-1] == ",":  # Change trailing ',' to '.'.
        return sentence[:-1] + "."
    return sentence + "."  # Otherwise, add '.'.
orig_sentences = ['Hi how are you?', 'I am good', 'Great!', 'I am doing good,', 'Good.']
fixed_sentences = [fix_sentence(item) for item in orig_sentences]
print(fixed_sentences)
This outputs, as requested:
['Hi how are you?', 'I am good.', 'Great!', 'I am doing good.', 'Good.']
With a separate function, you can simply extend fix_sentence() if and when new rules need to be added.
For example, the first two lines of the function handle empty strings, so you don't get an exception when trying to read the last character.

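The condition in your code is always true, because no string can end with '.', '!' and '?' all at once. According to De Morgan's laws, the checks should be combined with and: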
b = b + '.' if (not b.endswith('.')) and (not b.endswith('!')) and (not b.endswith('?')) else b
You can simplify to:
b = b + '.' if b and b[-1] not in ('.', '!', '?') else b
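The one-liner above does not yet turn a trailing comma into a period, which the question also asks for. A possible sketch that covers both rules relies on str.endswith() accepting a tuple of suffixes; the function name fix_ending is made up for this example:

```python
def fix_ending(sentence):
    """Make sure a non-empty sentence ends with '.', '!' or '?'."""
    if not sentence:
        return sentence  # leave empty strings alone
    if sentence.endswith((".", "!", "?")):
        return sentence  # already terminated
    # Drop trailing commas, then add the period.
    return sentence.rstrip(",") + "."

sentences = ['Hi how are you?', 'I am good', 'Great!', 'I am doing good,', 'Good.']
fixed = [fix_ending(s) for s in sentences]
print(fixed)
# ['Hi how are you?', 'I am good.', 'Great!', 'I am doing good.', 'Good.']
```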

Splitting Strings within an Array

I am writing a program in Python that reads in a text file and executes any Python commands within it. The commands may be out of order, but each command has a letter ID such as {% (c) print x %}
I've been able to sort all the commands within the document into an array, in the correct order. My question is, how do I remove the (c), so I can run exec(statement) on the string?
Here is the full example array:
[' (a) import random ', ' (b) x = random.randint(1,6) ', ' (c) print x ', ' (d) print 2*x ']
Also, I am very new to Python; this is my first assignment with it.

You can remove the index part by slicing the string:
for cmd in arr:
    exec(cmd[5:])

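Take everything to the right of the first closing parenthesis and exec it:
for cmd in arr:
    # maxsplit=1: split only at the ID's ") ", not at the one after randint(1,6)
    exec(cmd.split(") ", 1)[1])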

Stripping the command-id prefixes is a good job for a regular expression:
>>> import re
>>> commands = [' (a) import random ', ' (b) x = random.randint(1,6) ', ' (c) print x ', ' (d) print 2*x ']
>>> [re.search(r'.*?\)\s*(.*)', command).group(1) for command in commands]
['import random ', 'x = random.randint(1,6) ', 'print x ', 'print 2*x ']
The regex components mean:
.*?\) means "Get the shortest group of any characters that ends with a closing parenthesis."
\s* means "Zero or more whitespace characters."
(.*) means "Collect all the remaining characters into group(1)."
Hope this explanation makes it all clear :-)

Since the pattern looks simple and consistent, you could use regex.
This also allows for both (a) and (abc123) as valid IDs.
import re
lines = [
' (a) import random ',
' (b) x = random.randint(1,6) ',
' (c) print x ',
' (d) print 2*x '
]
for line in lines:
    print(re.sub(r"^[ \t]+(\(\w+\))", "", line))
Which would output:
import random
x = random.randint(1,6)
print x
print 2*x
If you really only want to match a single letter, then replace \w+ with [a-zA-Z].

You may use a simple regex to strip the leading letter in parentheses:
import re
lst = [' (a) import random ', ' (b) x = random.randint(1,6) ', ' (c) print x ', ' (d) print 2*x ']
for ele in lst:
    print(re.sub(r"^ \([a-z]\)", "", ele))
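Putting the pieces together for Python 3 might look like the sketch below. The command list is hypothetical (the original uses Python 2 print statements), and the shared namespace dict is an assumption about how the statements should see each other's names. Note the .strip(): a statement with leading spaces makes exec raise IndentationError.

```python
import re

# Hypothetical Python 3 commands in the same ' (id) statement ' layout.
commands = [" (a) import math ", " (b) x = math.floor(2.7) ", " (c) y = 2 * x "]

namespace = {}  # shared, so later statements can use earlier names
for command in commands:
    # Remove the leading '(id)' prefix, then any surrounding whitespace.
    statement = re.sub(r"^\s*\(\w+\)\s*", "", command).strip()
    exec(statement, namespace)

print(namespace["x"], namespace["y"])  # 2 4
```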

'One-lineize' python statement

I'd like to know if one can write the following statement in one line:
new = ''
for char in text:
    if char in blacklist:
        new += ' '
    else:
        new += char
I tried this, but I get a syntax error:
new = ''.join(c for c in text if c not in blacklist else ' ')
I know it's not better or prettier; I just want to know if it's possible.

Iterating over it seems like an overly complicated way to do it. Why not use a regex?
import re
blacklist = re.compile(r'[xyz]') # Blacklist the characters 'x', 'y', 'z'
new = re.sub(blacklist, ' ', text)
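If the blacklist is an arbitrary string rather than a hand-written character class, one way to build the pattern is to escape it first. This is a sketch with made-up sample values, chosen to include characters that are special inside a character class:

```python
import re

blacklist = "x-z]"   # hypothetical blacklist containing regex metacharacters
text = "ax-b]zc"     # hypothetical input

# re.escape keeps '-' and ']' literal inside the character class.
pattern = re.compile("[" + re.escape(blacklist) + "]")
new = pattern.sub(" ", text)

print(repr(new))  # 'a  b  c'
```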

You're using your in-line conditional in the wrong place (it'd work if you didn't have the else ' ' there, as then it'd just be a filter on the iterable). As it is, you'll want to do it this way:
new = ''.join(c if c not in blacklist else ' ' for c in text)
You could also do it like this if you wanted:
new = ''.join(' ' if c in blacklist else c for c in text)

You almost had it:
''.join(c if c not in blacklist else ' ' for c in text)
The X if Y else Z is an expression in itself, so you can't split it up by putting the for c in text part in the middle.
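A small sketch of the difference between the two positions of if, using sample values that are not from the question:

```python
text = "hello world"
blacklist = "lo"

# A trailing 'if' is a filter: blacklisted characters are dropped.
filtered = "".join(c for c in text if c not in blacklist)

# 'X if Y else Z' before 'for' is an expression: every character is mapped.
mapped = "".join(c if c not in blacklist else " " for c in text)

print(repr(filtered))  # 'he wrd'
print(repr(mapped))    # 'he    w r d'
```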

Use the translate method of str. Build a string of your whitelist characters, with ' ' in place of the blacklist ones:
>>> table = ''.join(c if c not in 'axy' else ' ' for c in map(chr,range(256)))
Then call translate with this table:
>>> 'xyzzy'.translate(table)
'  zz '
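In Python 3, the table can also be built with str.maketrans, which maps each character of its first argument to the character at the same position in the second. A minimal sketch, reusing the 'axy' blacklist from above:

```python
blacklist = "axy"

# Map every blacklisted character to a space; other characters pass through.
table = str.maketrans(blacklist, " " * len(blacklist))

print(repr("xyzzy".translate(table)))  # '  zz '
```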
