Removing consecutive duplicates in a string [duplicate]


This question already has answers here:
How to remove duplicates only if consecutive in a string? [duplicate]
(9 answers)
Closed 3 years ago.
I'm trying to remove consecutive duplicates from a string, but for some inputs my code fails with an 'IndexError: string index out of range' error.
e.g., input: tadayutaysgcgtttggytytyyyikk
expected output: tadayutaysgcgtgytytyik
def removeDups(str):
    smallOut = ''
    if len(str) == 1 or len(str) == 0:
        return str
    if str[0] == str[1]:
        smallOut = removeDups(str[2:])
        # Fails when str[2:] is empty (e.g. "kk" at the end of the input):
        # smallOut is '' and smallOut[0] raises IndexError
        if str[1] == smallOut[0]:
            return smallOut
        else:
            return str[1] + smallOut
    else:
        smallOut = removeDups(str[1:])
        return str[0] + smallOut
string = input().strip()
print(removeDups(string))
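For reference, a minimal sketch of a fixed recursive version (the name remove_dups is illustrative, chosen to avoid shadowing the built-in str). It recurses on everything after the first character; because the deduplicated rest always starts with s[1], the first character is dropped exactly when it equals s[1], so the code never indexes into an empty string.

```python
def remove_dups(s: str) -> str:
    # Base case: 0 or 1 characters cannot contain consecutive duplicates.
    if len(s) <= 1:
        return s
    rest = remove_dups(s[1:])  # non-empty, and always starts with s[1]
    # Keep s[0] only if it starts a new run.
    return rest if s[0] == s[1] else s[0] + rest

print(remove_dups('tadayutaysgcgtttggytytyyyikk'))  # tadayutaysgcgtgytytyik
```

Being recursive, this will hit CPython's default recursion limit on strings of roughly a thousand characters or more; the non-recursive answers below avoid that.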

To remove consecutive duplicates, you can use the re module:
import re
s = 'tadayutaysgcgtttggytytyyyikk'
print( re.sub(r'(.)\1+', r'\1', s) )
Prints:
tadayutaysgcgtgytytyik
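One caveat: . does not match a newline by default, so runs of \n are left alone. A small sketch (collapse_runs is an illustrative name) that passes re.DOTALL so every character, newlines included, gets collapsed:

```python
import re

def collapse_runs(s: str) -> str:
    # (.) captures one character and \1+ matches one or more repeats of it;
    # re.DOTALL lets "." also match "\n", so runs of newlines collapse too.
    return re.sub(r'(.)\1+', r'\1', s, flags=re.DOTALL)

print(collapse_runs('tadayutaysgcgtttggytytyyyikk'))  # tadayutaysgcgtgytytyik
```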
Or, without re, using itertools.groupby:
from itertools import groupby
s = 'tadayutaysgcgtttggytytyyyikk'
print(''.join(v for v, _ in groupby(s)))
Prints:
tadayutaysgcgtgytytyik
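To see why the groupby one-liner works, it helps to look at what groupby yields: one (key, group) pair per run of equal adjacent characters. A short sketch (runs is an illustrative helper) that also reports each run's length:

```python
from itertools import groupby

def runs(s):
    # groupby yields (key, group) for each run of equal adjacent characters;
    # the one-liner above keeps only the key, i.e. one character per run.
    return [(key, len(list(group))) for key, group in groupby(s)]

print(runs('tttggy'))  # [('t', 3), ('g', 2), ('y', 1)]
```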

I use a standard approach for removing consecutive duplicates: append each character to the result, then keep incrementing the pointer while the current and the next characters are the same.
ans = ""
s = 'tadayutaysgcgtttggytytyyyikk'
i = 0
while i < len(s):
    ans = ans + s[i]
    # skip over the rest of the current run
    while i + 1 < len(s) and s[i] == s[i + 1]:
        i += 1
    i += 1
print(ans)

Related

Divide string into pairs [duplicate]

This question already has answers here:
Split string every nth character?
(19 answers)
Splitting a string into 2-letter segments [duplicate]
(6 answers)
Closed 2 years ago.
I want to divide text into pairs.
Input: text = "abcde"
Goal Output: result = ["ab", "cd", "e_"]
Current Output: result = ['ab', 'abcd']
My current code is below, but I don't know how to proceed from here. Does anyone have a tip for me?
def split_pairs(text):
    result = []
    if text is None or not text:
        return []
    pair = ""
    for i in range(len(text)):
        if i % 2 == 0:
            pair += text[i]
            pair += text[i+1]
        else:
            result.append(pair)
    return result

You could use a list comprehension to zip the characters at even indices with the characters at the following odd indices. With itertools.zip_longest, the fillvalue argument provides a "fill in" character when the two sequences differ in length.
>>> from itertools import zip_longest
>>> s = 'abcde'
>>> pairs = [i+j for i,j in zip_longest(s[::2], s[1::2], fillvalue='_')]
>>> pairs
['ab', 'cd', 'e_']
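The same zip_longest idea extends to chunks of any size by passing one shared iterator k times (the "grouper" idiom from the itertools recipes). A sketch, with chunks as an illustrative name:

```python
from itertools import zip_longest

def chunks(s: str, k: int = 2, fill: str = "_") -> list:
    # Passing the same iterator k times makes zip_longest pull k consecutive
    # characters per tuple; fillvalue pads the final, shorter chunk.
    it = iter(s)
    return ["".join(group) for group in zip_longest(*[it] * k, fillvalue=fill)]

print(chunks("abcde"))  # ['ab', 'cd', 'e_']
```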

You should reset your "pair" variable after appending it to "result":
def split_pairs(text):
    result = []
    if text is None or not text:
        return []
    pair = ""
    for i in range(len(text)):
        if i % 2 == 0:
            pair += text[i]
            pair += text[i+1]  # note: still raises IndexError for odd-length text
        else:
            result.append(pair)
            pair = ""
    return result

You could also use a list comprehension over a range with a step argument, plus ljust to pad with _. This also works nicely for chunk sizes other than 2:
>>> s = "abcde"
>>> k = 2
>>> [s[i:i+k].ljust(k, "_") for i in range(0, len(s), k)]
['ab', 'cd', 'e_']

I'm not sure whether your code needed to stay in the format you originally wrote it in, but the code below gets the job done.
def split_pairs(text):
    result = [text[i:i+2] for i in range(0, len(text), 2)]
    if len(text) % 2 != 0:
        result[-1] += "_"  # pad the final single character
    return result

The issue here is that the "pair" variable is never reset to "".
Make sure you set it back to an empty string in your else block.
def split_pairs(text):
    result = []
    if text is None or not text:
        return []
    pair = ""
    for i in range(len(text)):
        if i % 2 == 0:
            pair += text[i]
            pair += text[i+1]  # raises IndexError when len(text) is odd
        else:
            result.append(pair)
            pair = ""  # Make sure you reset it
    return result
If you want a "_" at the end (for an odd number of characters), you could do the following:
def split_pairs(text):
    result = []
    if text is None or not text:
        return []
    pair = ["_", "_"]  # strings are immutable, so build each pair in a list
    for i in range(len(text)):
        if i % 2 == 0:
            pair[0] = text[i]
            if i + 1 < len(text):  # Avoiding overflow
                pair[1] = text[i + 1]
        else:
            result.append("".join(pair))
            pair = ["_", "_"]  # Make sure you reset it
    if pair != ["_", "_"]:  # Last, unpaired character
        result.append("".join(pair))
    return result

PYTHON how I separate numbers by dots every three steps? [duplicate]

This question already has answers here:
How do I slice a string every 3 indices? [duplicate]
(4 answers)
Closed 4 years ago.
Example:
test = "123456789123"
I tried:
test = "1234567"
print(".".join(test))
Result:
1.2.3.4.5.6.7
but I would like this result:
123.456.789.123

Here is a simple regex solution:
import re
print(re.sub(r'(?<!^)(?=(\d{3})+$)', r'.', "12345673456456456"))
It produces the following output:
12.345.673.456.456.456
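The regex uses a lookahead to check that the number of digits after a given position is a multiple of 3, and the (?<!^) lookbehind stops a dot from being inserted at the very start. Because it groups from the right, the leftmost group may be shorter than 3 digits.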
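The same right-to-left grouping can be written without a regex: compute the length of the leading (possibly shorter) group, then slice the rest in steps of 3. A sketch, where group_from_right, size and sep are illustrative names:

```python
def group_from_right(digits: str, size: int = 3, sep: str = ".") -> str:
    # Length of the leading group: the remainder, or a full group if none.
    head = len(digits) % size or size
    parts = [digits[:head]] + [digits[i:i + size] for i in range(head, len(digits), size)]
    return sep.join(parts)

print(group_from_right("12345673456456456"))  # 12.345.673.456.456.456
```

Most of the loop-based answers below group from the left instead, which gives the same result only when the length is a multiple of 3.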

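If you don't specifically need dots, you can use the thousands separator from the format mini-language (value must be a number, e.g. int(test)):
formatted_number = "{:,}".format(value)
And if you really want dots:
formatted_number = formatted_number.replace(',', '.')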
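Put together for a digit string, a sketch using an f-string (dotted is an illustrative name). Note the assumption that the input is purely numeric: int() would reject anything else, and it drops leading zeros.

```python
def dotted(digits: str) -> str:
    # "," in the format spec inserts thousands separators; swap them for dots.
    # Caveat: int() drops leading zeros, so "0123456" becomes "123.456".
    return f"{int(digits):,}".replace(",", ".")

print(dotted("123456789123"))  # 123.456.789.123
```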

Very naive solution without list comprehension:
test = '123456789123'
result = ''
while test:
    result += test[:3]
    if len(test) > 3:
        result += '.'
    test = test[3:]
print(result)

You can loop over the characters and track where you are with a counter:
s = "123456789123"
output = []
count = 0
for c in s:
    # after every three characters, insert a dot before the next one
    if count == 3:
        output.append(".")
        count = 0
    output.append(c)
    count += 1
result = ''.join(output)
print(result)

A straightforward implementation:
test = "1234567891234431222334442234ee3432"
testJoined = ""
for i in range(0, len(test), 3):
    testJoined += test[i:i+3] + "."
print(testJoined[0:-1])  # drop the trailing dot
test1:
"1234567891234431222334442234ee3432"
result1:
123.456.789.123.443.122.233.444.223.4ee.343.2
test2:
"1234567891234431222334442234ee34aa32"
result2:
123.456.789.123.443.122.233.444.223.4ee.34a.a32

Reversing a string in Python using a loop? [duplicate]

This question already has answers here:
How do I reverse a string in Python?
(19 answers)
Closed 6 years ago.
I'm stuck at an exercise where I need to reverse a random string in a function using only a loop (for loop or while?).
I can't use "".join(reversed(string)) or string[::-1] here, so it's a bit tricky.
My code looks something like this:
def reverse(text):
    while len(text) > 0:
        print text[(len(text)) - 1],
        del(text[(len(text)) - 1]
I use the , to print out every single letter in text on the same line!
I get invalid syntax on del(text[(len(text)) - 1]
Any suggestions?

Python strings are immutable, so you cannot use the del statement to remove characters in place. However, you can build up a new string while looping through the original one:
def reverse(text):
    rev_text = ""
    for char in text:
        rev_text = char + rev_text
    return rev_text
reverse("hello")
# 'olleh'

The problem is that you can't use del on a string in Python.
However, this code works without del and will hopefully do the trick:
def reverse(text):
    a = ""
    for i in range(1, len(text) + 1):
        a += text[len(text) - i]
    return a
print(reverse("Hello World!")) # prints: !dlroW olleH
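Both loops above create a new string on every iteration. If the exercise allows joining a list you built yourself (an assumption; it only rules out reversed() and slicing), a sketch that collects characters with a while loop and joins once:

```python
def reverse(text: str) -> str:
    chars = []
    # Walk the indices from last to first and collect each character.
    i = len(text) - 1
    while i >= 0:
        chars.append(text[i])
        i -= 1
    # Join once at the end instead of building a new string every iteration.
    return "".join(chars)

print(reverse("Hello World!"))  # !dlroW olleH
```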

Python strings are immutable. You cannot use del on a string.
text = 'abcde'
length = len(text)
text_rev = ""
while length > 0:
    text_rev += text[length-1]
    length = length - 1
print(text_rev)
Hope this helps.

Here is my attempt using a decorator and a for loop. Put everything in one file.
Implementation details:
def reverse(func):
    def reverse_engine(items):
        partial_items = []
        for item in items:
            partial_items = [item] + partial_items
        return func(partial_items)
    return reverse_engine
Usage:
Example 1:
@reverse
def echo_alphabets(word):
    return ''.join(word)
echo_alphabets('hello')
# olleh
Example 2:
@reverse
def echo_words(words):
    return words
echo_words([':)', '3.6.0', 'Python', 'Hello'])
# ['Hello', 'Python', '3.6.0', ':)']
Example 3:
@reverse
def reverse_and_square(numbers):
    return list(
        map(lambda number: number ** 2, numbers)
    )
reverse_and_square(range(1, 6))
# [25, 16, 9, 4, 1]

Count overlapping substring in a string [duplicate]

This question already has answers here:
String count with overlapping occurrences [closed]
(25 answers)
Closed 1 year ago.
Say I have string = 'hannahannahskdjhannahannah' and I want to count the number of times the string hannah occurs. I can't simply use count, because it only counts non-overlapping occurrences. That is, I expect 4, but string.count('hannah') returns only 2.

You could use a running index to fetch the next occurrence:
bla = 'hannahannahskdjhannahannah'
cnt = 0
idx = 0
while True:
    idx = bla.find('hannah', idx)
    if idx >= 0:
        cnt += 1
        idx += 1
    else:
        break
print(cnt)
Gives:
4

How about something like this?
>>> d = {}
>>> string = 'hannahannahskdjhannahannah'
>>> for i in range(0, len(string) - len('hannah') + 1):
...     if string[i:i+len('hannah')] == 'hannah':
...         d['hannah'] = d.get('hannah', 0) + 1
...
>>> d
{'hannah': 4}
This searches the string for hannah by slicing it at every index from 0 up to the length of the string minus the length of hannah.

'''
s: main string
sub: sub-string
count: number of sub-strings found
p: index of the last match; the next search starts just after it
'''
count = 0
p = 0
for letter in s:
    p = s.find(sub, p)
    if p != -1:
        count += 1
        p += 1
    else:
        break  # no more matches; a start of -1 would wrap around to the end of s
print(count)

If you also want to count non-contiguous occurrences (i.e. subsequences), this is one way to do it:
def subword(lookup,whole):
    if len(whole)<len(lookup):
        return 0
    if lookup==whole:
        return 1
    if lookup=='':
        return 1
    if lookup[0]==whole[0]:
        return subword(lookup[1:],whole[1:])+subword(lookup,whole[1:])
    return subword(lookup,whole[1:])
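The recursion above revisits the same (lookup, whole) suffix pairs many times, so it slows down sharply on longer input. A sketch of the same count as a bottom-up dynamic program (count_subsequences is an illustrative name, not from the original answer), running in O(len(lookup) * len(whole)) time:

```python
def count_subsequences(lookup: str, whole: str) -> int:
    # ways[j] = number of ways to form lookup[:j] from the part of whole seen so far
    ways = [1] + [0] * len(lookup)
    for ch in whole:
        # Go backwards so each character of whole is used at most once per match.
        for j in range(len(lookup), 0, -1):
            if lookup[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(lookup)]

print(count_subsequences("ab", "aabb"))  # 4
```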

def Count_overlap(string, substring):
    count = 0
    start = 0
    while start < len(string):
        pos = string.find(substring, start)
        if pos != -1:
            start = pos + 1
            count += 1
        else:
            break
    return count
string = "hannahannahskdjhannahannah"
print(Count_overlap(string, "hannah"))

I don't want to answer this for you, as it's simple enough to work out yourself.
But if I were you, I'd use the str.find() method, which takes the string you're looking for and the position to start looking from, combined with a while loop that uses the result of find as its condition in some way.
That should, in theory, give you the answer.

String count with overlapping occurrences [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 months ago.
The community reviewed whether to reopen this question 3 months ago and left it closed:
Original close reason(s) were not resolved
What's the best way to count the number of occurrences of a given string, including overlapping ones, in Python? This is one way:
def function(string, str_to_search_for):
    count = 0
    for x in range(len(string) - len(str_to_search_for) + 1):
        if string[x:x+len(str_to_search_for)] == str_to_search_for:
            count += 1
    return count
function('1011101111','11')
This method returns 5.
Is there a better way in Python?

Well, this might be faster since it does the comparing in C:
def occurrences(string, sub):
    count = start = 0
    while True:
        start = string.find(sub, start) + 1
        if start > 0:
            count+=1
        else:
            return count

>>> import re
>>> text = '1011101111'
>>> len(re.findall('(?=11)', text))
5
If you don't want to load the whole list of matches into memory (which will rarely be a problem), you could do this instead:
>>> sum(1 for _ in re.finditer('(?=11)', text))
5
As a function (re.escape makes sure the substring doesn't interfere with the regex):
def occurrences(text, sub):
    return len(re.findall('(?={0})'.format(re.escape(sub)), text))
>>> occurrences(text, '11')
5

You can also try the third-party regex module (pip install regex), which supports overlapping matches:
import regex as re
def count_overlapping(text, search_for):
    return len(re.findall(search_for, text, overlapped=True))
count_overlapping('1011101111','11') # 5

Python's str.count counts non-overlapping substrings:
In [3]: "ababa".count("aba")
Out[3]: 1
Here are a few ways to count overlapping sequences, I'm sure there are many more :)
Look-ahead regular expressions
How to find overlapping matches with a regexp?
In [10]: re.findall("a(?=ba)", "ababa")
Out[10]: ['a', 'a']
Generate all substrings
In [11]: data = "ababa"
In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
Out[17]: 2

def count_substring(string, sub_string):
    count = 0
    for pos in range(len(string)):
        if string[pos:].startswith(sub_string):
            count += 1
    return count
This could be the easiest way.

A fairly Pythonic way would be to use a list comprehension here, although it probably isn't the most efficient.
sequence = 'abaaadcaaaa'
substr = 'aa'
counts = sum([
sequence.startswith(substr, i) for i in range(len(sequence))
])
print(counts) # 5
The list would be [False, False, True, True, False, False, False, True, True, True, False], since it checks every index of the string, and because int(True) == 1, sum gives the total number of matches.

s = "bobobob"
sub = "bob"
ln = len(sub)
print(sum(sub == s[i:i+ln] for i in range(len(s)-(ln-1))))

How to find a pattern in another string with overlapping
This function (another solution!) receives a pattern and a text, and returns a list of all matching substrings in the text together with their positions.
import re
def occurrences(pattern, text):
    """
    input: search a pattern (regular expression) in a text
    returns: a list of substrings and their positions
    """
    p = re.compile('(?=({0}))'.format(pattern))
    matches = re.finditer(p, text)
    return [(match.group(1), match.start()) for match in matches]
print (occurrences('ana', 'banana'))
print (occurrences('.ana', 'Banana-fana fo-fana'))
[('ana', 1), ('ana', 3)]
[('Bana', 0), ('nana', 2), ('fana', 7), ('fana', 15)]

My answer to the "bob" question from the course:
s = 'azcbobobegghaklbob'
total = 0
for i in range(len(s)-2):
    if s[i:i+3] == 'bob':
        total += 1
print('number of times bob occurs is:', total)

Here is my edX MIT "find bob" solution (find the number of "bob" occurrences in a string named s), which basically counts overlapping occurrences of a given substring:
s = 'azcbobobegghakl'
count = 0
while 'bob' in s:
    count += 1
    s = s[(s.find('bob') + 2):]  # skip only 2 characters so an overlapping "b" is kept
print("Number of times bob occurs is: {}".format(count))

If the strings are large, you may want to use Rabin-Karp. In summary:
a rolling window of substring size, moving over a string
a hash with O(1) overhead for adding and removing (i.e. move by 1 char)
implemented in C or relying on pypy
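A minimal pure-Python sketch of that idea, assuming base 256 and a large prime modulus (both illustrative choices, as is the name rabin_karp_count). Each hash hit is confirmed with a direct slice comparison, so collisions cannot inflate the count. Being pure Python, it illustrates the rolling window rather than outperforming str.find, which is why the answer points to C or PyPy:

```python
def rabin_karp_count(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> int:
    """Count overlapping occurrences of pattern in text with a rolling hash."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return 0  # unlike str.count, an empty pattern counts as 0 here
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    count = 0
    for start in range(n - m + 1):
        # On a hash hit, compare the actual strings to rule out collisions.
        if w_hash == p_hash and text[start:start + m] == pattern:
            count += 1
        if start + m < n:
            # Roll the window by one character: drop text[start], add text[start + m].
            w_hash = (w_hash - ord(text[start]) * high) % mod
            w_hash = (w_hash * base + ord(text[start + m])) % mod
    return count

print(rabin_karp_count('1011101111', '11'))  # 5
```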

That can be solved using a regex lookahead:
import re
def function(string, sub_string):
    # re.escape keeps characters such as '.' or '+' in sub_string literal
    match = re.findall('(?=' + re.escape(sub_string) + ')', string)
    return len(match)

def count_substring(string, sub_string):
    counter = 0
    for i in range(len(string)):
        if string[i:].startswith(sub_string):
            counter = counter + 1
    return counter
The code above loops over the string once and, at each index, checks whether the rest of the string starts with the substring being counted.

re.subn hasn't been mentioned yet:
>>> import re
>>> re.subn('(?=11)', '', '1011101111')[1]
5

def count_overlaps(string, look_for):
    start = 0
    matches = 0
    while True:
        start = string.find(look_for, start)
        if start < 0:
            break
        start += 1
        matches += 1
    return matches
print(count_overlaps('abrabra', 'abra'))

A function that takes two strings as input and counts how many times sub occurs in string, including overlaps. To check each window for sub, I used the in operator.
def count_Occurrences(string, sub):
    count = 0
    for i in range(0, len(string)-len(sub)+1):
        if sub in string[i:i+len(sub)]:
            count = count + 1
    print('Number of times sub occurs in string (including overlaps):', count)

For a duplicate question, I decided to compare the string 3 characters at a time, checking each 3-character window against the target, e.g.:
counted = 0
for i in range(len(string)):
    if string[i:i+3] == 'xox':
        counted = counted + 1
print(counted)

An alternative very close to the accepted answer, but using the while condition as the test instead of an if inside the loop:
def countSubstr(string, sub):
    count = 0
    while sub in string:
        count += 1
        string = string[string.find(sub) + 1:]
    return count
This avoids while True: and is a little cleaner, in my opinion.

This is another example of using str.find(), but many of the answers make it more complicated than necessary:
def occurrences(text, sub):
    c, n = 0, text.find(sub)
    while n != -1:
        c += 1
        n = text.find(sub, n+1)
    return c
In []:
occurrences('1011101111', '11')
Out[]:
5

Given
sequence = '1011101111'
sub = "11"
Code
In this particular case:
sum(x == tuple(sub) for x in zip(sequence, sequence[1:]))
# 5
More generally, this
windows = zip(*([sequence[i:] for i, _ in enumerate(sequence)][:len(sub)]))
sum(x == tuple(sub) for x in windows)
# 5
or extend to generators:
import itertools as it
iter_ = (sequence[i:] for i, _ in enumerate(sequence))
windows = zip(*(it.islice(iter_, None, len(sub))))
sum(x == tuple(sub) for x in windows)
Alternative
You can use more_itertools.locate:
import more_itertools as mit
len(list(mit.locate(sequence, pred=lambda *args: args == tuple(sub), window_size=len(sub))))
# 5

A simple way to count substring occurrence is to use count():
>>> s = 'bobob'
>>> s.count('bob')
1
You can use replace() to count overlapping occurrences if you know which part will overlap:
>>> s = 'bobob'
>>> s.replace('b', 'bb').count('bob')
2
Note that besides being specific to one pattern, this has other limitations:
>>> s = 'aaa'
>>> s.count('aa')  # there are actually two overlapping occurrences
1
>>> s.replace('a', 'aa').count('aa')  # and the replace trick overcounts
3

def occurance_of_pattern(text, pattern):
    text_len, pattern_len = len(text), len(pattern)
    return sum(1 for idx in range(text_len - pattern_len + 1) if text[idx: idx+pattern_len] == pattern)

I wanted to check whether the number of repeated prefix characters equals the number of repeated postfix characters, e.g., passing for "foo" and """foo""" but failing for """bar"":
from collections import deque
from functools import partial
from itertools import count, takewhile
from operator import eq
# From https://stackoverflow.com/a/15112059
def count_iter_items(iterable):
    """
    Consume an iterable not reading it into memory; return the number of items.
    :param iterable: An iterable
    :type iterable: ```Iterable```
    :return: Number of items in iterable
    :rtype: ```int```
    """
    counter = count()
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)
def begin_matches_end(s):
    """
    Checks if the begin matches the end of the string
    :param s: Input string of length > 0
    :type s: ```str```
    :return: Whether the beginning matches the end (checks first match chars)
    :rtype: ```bool```
    """
    return (count_iter_items(takewhile(partial(eq, s[0]), s)) ==
            count_iter_items(takewhile(partial(eq, s[0]), s[::-1])))

Solution that replaces parts of the string:
s = 'lolololol'
t = 0
t += s.count('lol')
s = s.replace('lol', 'lo1')
t += s.count('1ol')
print("Number of times lol occurs is:", t)
Answer is 4.

If you want to count every substring of length 5 (k-mers; adjust the length as needed):
from collections import defaultdict
def MerCount(s):
    d = defaultdict(int)  # substring -> number of occurrences
    for i in range(len(s)-4):
        d[s[i:i+5]] += 1
    return d
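Alternatively, collections.Counter can do the tallying in one expression; a sketch with the illustrative name kmer_counts and a window width parameter k:

```python
from collections import Counter

def kmer_counts(s: str, k: int = 5) -> Counter:
    # Slide a window of width k over s; Counter tallies each distinct substring,
    # so overlapping occurrences are all counted.
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

print(kmer_counts("ACGTACGTA", k=4).most_common(2))
```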

Develop Reference
