Python: Best Way to remove duplicate character from string

Python: Best Way to remove duplicate character from string - python

How can I remove duplicate characters from a string using Python? For example, let's say I have a string:
foo = "SSYYNNOOPPSSIISS"
How can I make the string:
foo = SYNOPSIS
I'm new to python and What I have tired and it's working. I knew there is smart and best way to do this.. and only experience can show this..
def RemoveDupliChar(Word):
NewWord = " "
index = 0
for char in Word:
if char != NewWord[index]:
NewWord += char
index += 1
print(NewWord.strip())
NOTE: Order is important and this question is not similar to this one.

Using itertools.groupby:
>>> foo = "SSYYNNOOPPSSIISS"
>>> import itertools
>>> ''.join(ch for ch, _ in itertools.groupby(foo))
'SYNOPSIS'

This is a solution without importing itertools:
foo = "SSYYNNOOPPSSIISS"
''.join([foo[i] for i in range(len(foo)-1) if foo[i+1]!= foo[i]]+[foo[-1]])
Out[1]: 'SYNOPSIS'
But it is slower than the others method!

How about this:
oldstring = 'SSSYYYNNNOOOOOPPPSSSIIISSS'
newstring = oldstring[0]
for char in oldstring[1:]:
if char != newstring[-1]:
newstring += char

def remove_duplicates(astring):
if isinstance(astring,str) :
#the first approach will be to use set so we will convert string to set and then convert back set to string and compare the lenght of the 2
newstring = astring[0]
for char in astring[1:]:
if char not in newstring:
newstring += char
return newstring,len(astring)-len(newstring)
else:
raise TypeError("only deal with alpha strings")
I've discover that solution with itertools and with list comprehesion even the solution when we compare the char to the last element of the list doesn't works

def removeDuplicate(s):
if (len(s)) < 2:
return s
result = []
for i in s:
if i not in result:
result.append(i)
return ''.join(result)

How about
foo = "SSYYNNOOPPSSIISS"
def rm_dup(input_str):
newstring = foo[0]
for i in xrange(len(input_str)):
if newstring[(len(newstring) - 1 )] != input_str[i]:
newstring += input_str[i]
else:
pass
return newstring
print rm_dup(foo)

You can try this:
string1 = "example1122334455"
string2 = "hello there"
def duplicate(string):
temp = ''
for i in string:
if i not in temp:
temp += i
return temp;
print(duplicate(string1))
print(duplicate(string2))

Related

remove consecutive substrings from a string without importing any packages

I want to remove consecutive "a" substrings and replace them with one "a" from a string without importing any packages.
For example, I want to get abbccca from aaabbcccaaaa.
Any suggestions?
Thanks.

This method will remove a determined repeated char from your string:
def remove_dulicated_char(string, char):
new_s = ""
prev = ""
for c in string:
if len(new_s) == 0:
new_s += c
prev = c
if c == prev and c == char:
continue
else:
new_s += c
prev = c
return new_s
print(remove_dulicated_char("aaabbcccaaaa", "a"))

Whats wrong with using a loop?
oldstring = 'aaabbcccaaaa'
# Initialise the first character as the same as the initial string
# as this will always be the same.
newstring = oldstring[0]
# Loop through each character starting at the second character
# check if the preceding character is an a, if it isn't add it to
# the new string. If it is an a then check if the current character
# is an a too. If the current character isn't an a then add it to
# the new string.
for i in range(1, len(oldstring)):
if oldstring[i-1] != 'a':
newstring += oldstring[i]
else:
if oldstring[i] != 'a':
newstring += oldstring[i]
print(newstring)

using python regular expressions this will do it.
If you don't know about regex. They are extremely powerful for
this kind of matching
import re
str = 'aaabbcccaaaa'
print(re.sub('a+', 'a', str))

You can use a function that removes double values of a string occurrence recursively until only one occurrence of the repeating string remains:
val = 'aaabbcccaaaaaaaaaaa'
def remove_doubles(v):
v = v.replace('aa', 'a')
if 'aa' in v:
v = remove_doubles(v)
if 'aa' in v:
v = remove_doubles(v)
else: return v
else: return v
print(remove_doubles(val))

There are many ways to do this. Here's another one:
def remove_duplicates(s, x):
t = [s[0]]
for c in s[1:]:
if c != x or t[-1] != x:
t.append(c)
return ''.join(t)
print(remove_duplicates('aaabbcccaaaa', 'a'))

How to remove Triplicate Letters in Python

So I'm a little confused as far as putting this small code together. My teacher gave me this info:
Iterate over the string and remove any triplicated letters (e.g.
"byeee mmmy friiiennd" becomes "bye my friennd"). You may assume any
immediate following same letters are a triplicate.
I've mostly only seen examples for duplicates, so how do I remove triplicates? My code doesn't return anything when I run it.
def removeTriplicateLetters(i):
result = ''
for i in result:
if i not in result:
result.append(i)
return result
def main():
print(removeTriplicateLetters('byeee mmmy friiiennd'))
main()

I have generalized the scenario with "n". In your case, you can pass n=3 as below
def remove_n_plicates(input_string, n):
i=0
final_string = ''
if not input_string:
return final_string
while(True):
final_string += input_string[i]
if input_string[i:i+n] == input_string[i]*n:
i += n
else:
i += 1
if i >= len(input_string):
break
return final_string
input_string = "byeee mmmy friiiennd"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
# bye my friennd
You can use this for any "n" value now (where n > 0 and n < length of input string)

Your code returns an empty string because that's exactly what you coded:
result = ''
for i in result:
...
return result
Since result is an empty string, you don't enter the loop at all.
If you did enter the loop you couldn't return anything:
for i in result:
if i not in result:
The if makes no sense: to get to that statement, i must be in result
Instead, do as #newbie showed you. Iterate through the string, looking at a 3-character slice. If the slice is equal to 3 copies of the first character, then you've identified a triplet.
if input_string[i:i+n] == input_string[i]*n:

Without going in to writing the code to resolve the problem.
When you iterate over the string, add that iteration to a new string.
If the next iteration is the same as the previous iteration then do not add that to the new string.
This will catch both the triple and the double characters in your problem.
Tweaked a previous answer to remove a few lines that were not needed.
def remove_n_plicates(input_string, n):
i=0
result = ''
while(True):
result += input_string[i]
if input_string[i:i+n] == input_string[i]*n:
i += n
else:
i += 1
if i >= len(input_string):
break
return result
input_string = "byeee mmmy friiiennd"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
# bye my friennd

Here's a fun way using itertools.groupby:
def removeTriplicateLetters(s):
return ''.join(k*(l//3+l%3) for k,l in ((k,len(list(g))) for k, g in groupby(s)))
>>> removeTriplicateLetters('byeee mmmy friiiennd')
'bye my friennd'

just modifying #newbie solution and using stack data structure as solution
def remove_n_plicates(input_string, n):
if input_string =='' or n<1:
return None
w = ''
c = 0
if input_string!='':
tmp =[]
for i in range(len(input_string)):
if c==n:
w+=str(tmp[-1])
tmp=[]
c =0
if tmp==[]:
tmp.append(input_string[i])
c = 1
else:
if input_string[i]==tmp[-1]:
tmp.append(input_string[i])
c+=1
elif input_string[i]!=tmp[-1]:
w+=str(''.join(tmp))
tmp=[input_string[i]]
c = 1
w+=''.join(tmp)
return w
input_string = "byeee mmmy friiiennd nnnn"
output_string = remove_n_plicates(input_string, 3)
print(output_string)
output
bye my friennd nn

so this is a bit dirty but it's short and works
def removeTriplicateLetters(i):
result,string = i[:2],i[2:]
for k in string:
if result[-1]==k and result[-2]==k:
result=result[:-1]
else:
result+=k
return result
print(removeTriplicateLetters('byeee mmmy friiiennd'))
bye my friennd

You have already got a working solution. But here, I come with another way to achieve your goal.
def removeTriplicateLetters(sentence):
"""
:param sentence: The sentence to transform.
:param words: The words in the sentence.
:param new_words: The list of the final words of the new sentence.
"""
words = sentence.split(" ") # split the sentence into words
new_words = []
for word in words: # loop through words of the sentence
new_word = []
for char in word: # loop through characters in a word
position = word.index(char)
if word.count(char) >= 3:
new_word = [i for i in word if i != char]
new_word.insert(position, char)
new_words.append(''.join(new_word))
return ' '.join(new_words)
def main():
print(removeTriplicateLetters('byeee mmmy friiiennd'))
main()
Output: bye my friennd

Split a string, loop through it character by character, and replace specific ones?

I'm working on an assignment and have gotten stuck on a particular task. I need to write two functions that do similar things. The first needs to correct capitalization at the beginning of a sentence, and count when this is done. I've tried the below code:
def fix_capitalization(usrStr):
count = 0
fixStr = usrStr.split('.')
for sentence in fixStr:
if sentence[0].islower():
sentence[0].upper()
count += 1
print('Number of letters capitalized: %d' % count)
print('Edited text: %s' % fixStr)
Bu receive an out of range error. I'm getting an "Index out of range error" and am not sure why. Should't sentence[0] simply reference the first character in that particular string in the list?
I also need to replace certain characters with others, as shown below:
def replace_punctuation(usrStr):
s = list(usrStr)
exclamationCount = 0
semicolonCount = 0
for sentence in s:
for i in sentence:
if i == '!':
sentence[i] = '.'
exclamationCount += 1
if i == ';':
sentence[i] = ','
semicolonCount += 1
newStr = ''.join(s)
print(newStr)
print(semicolonCount)
print(exclamationCount)
But I'm struggling to figure out how to actually do the replacing once the character is found. Where am I going wrong here?
Thank you in advance for any help!

I would use str.capitalize over str.upper on one character. It also works correctly on empty strings. The other major improvement would be to use enumerate to also track the index as you iterate over the list:
def fix_capitalization(s):
sentences = [sentence.strip() for sentence in s.split('.')]
count = 0
for index, sentence in enumerate(sentences):
capitalized = sentence.capitalize()
if capitalized != sentence:
count += 1
sentences[index] = capitalized
result = '. '.join(sentences)
return result, count
You can take a similar approach to replacing punctuation:
replacements = {'!': '.', ';': ','}
def replace_punctuation(s):
l = list(s)
counts = dict.fromkeys(replacements, 0)
for index, item in enumerate(l):
if item in replacements:
l[index] = replacements[item]
counts[item] += 1
print("Replacement counts:")
for k, v in counts.items():
print("{} {:>5}".format(k, v))
return ''.join(l)

There are better ways to do these things but I'll try to change your code minimally so you will learn something.
The first function's issue is that when you split the sentence like "Hello." there will be two sentences in your fixStr list that the last one is an empty string; so the first index of an empty string is out of range. fix it by doing this.
def fix_capitalization(usrStr):
count = 0
fixStr = usrStr.split('.')
for sentence in fixStr:
# changed line
if sentence != "":
sentence[0].upper()
count += 1
print('Number of letters capitalized: %d' % count)
print('Edited text: %s' % fixStr)
In second snippet you are trying to write, when you pass a string to list() you get a list of characters of that string. So all you need to do is to iterate over the elements of the list and replace them and after that get string from the list.
def replace_punctuation(usrStr):
newStr = ""
s = list(usrStr)
exclamationCount = 0
semicolonCount = 0
for c in s:
if c == '!':
c = '.'
exclamationCount += 1
if c == ';':
c = ','
semicolonCount += 1
newStr = newStr + c
print(newStr)
print(semicolonCount)
print(exclamationCount)
Hope I helped!

Python has a nice build in function for this
for str in list:
new_str = str.replace('!', '.').replace(';', ',')
You can write a oneliner to get a new list
new_list = [str.replace('!', '.').replace(';', ',') for str in list]
You also could go for the split/join method
new_str = '.'.join(str.split('!'))
new_str = ','.join(str.split(';'))
To count capitalized letters you could do
result = len([cap for cap in str if str(cap).isupper()])
And to capitalize them words just use the
str.capitalize()
Hope this works out for you

Changing lowercase characters to uppercase and vice-versa in Python

I'm trying to write a function which given a string returns another string that flips its lowercase characters to uppercase and viceversa.
My current aproach is as follows:
def swap_case(s):
word = []
for char in s:
word.append(char)
for char in word:
if char.islower():
char.upper()
else:
char.lower()
str1 = ''.join(word)
However the string doesn't actually change, how should I alter the characters inside the loop to do so?
PS: I know s.swapcase() would easily solve this, but I want to alter the characters inside the string during a for loop.

def swap_case(s):
swapped = []
for char in s:
if char.islower():
swapped.append(char.upper())
elif char.isupper():
swapped.append(char.lower())
else:
swapped.append(char)
return ''.join(swapped)

you can use swapcase.
string.swapcase(s)
Return a copy of s, but with lower case letters converted to upper case and vice versa.
Source : Python docs

>>> name = 'Mr.Ed' or name= ["M","r",".","E","d"]
>>> ''.join(c.lower() if c.isupper() else c.upper() for c in name)
'mR.eD'

Your code is not working because you are not storing the chars after transforming them. You created a list with all chars and then, you access the values without actually saving them. Also, you should make your function return str1 or you will never get a result. Here's a workaround:
def swap_case(s):
word = []
for char in s:
if char.islower():
word.append(char.upper())
else:
word.append(char.lower())
str1 = ''.join(word)
return str1
By the way, I know you want to do it with a for loop, but you should know there are better ways to perform this which would be more pythonic.

You can use this code:
s = "yourString"
res = s.swapcase()
print(res)
Which will output:
YOURsTRING

def swap_case(s):
x = ''
for i in s:
if i.isupper():
i = i.lower()
else:
i = i.upper()
x += ''.join(i)
return x

Swapping uppercase and lowercase in a string [duplicate]

This question already has answers here:
How can I invert (swap) the case of each letter in a string?
(8 answers)
How can I use `return` to get back multiple values from a loop? Can I put them in a list?
(2 answers)
Closed 6 months ago.
I would like to change the chars of a string from lowercase to uppercase.
My code is below, the output I get with my code is a; could you please tell me where I am wrong and explain why?
Thanks in advance
test = "AltERNating"
def to_alternating_case(string):
words = list(string)
for word in words:
if word.isupper() == True:
return word.lower()
else:
return word.upper()
print to_alternating_case(test)

If you want to invert the case of that string, try this:
>>> 'AltERNating'.swapcase()
'aLTernATING'

There are two answers to this: an easy one and a hard one.
The easy one
Python has a built in function to do that, i dont exactly remember what it is, but something along the lines of
string.swapcase()
The hard one
You define your own function. The way you made your function is wrong, because
iterating over a string will return it letter by letter, and you just return the first letter instead of continuing the iteration.
def to_alternating_case(string):
temp = ""
for character in string:
if character.isupper() == True:
temp += character.lower()
else:
temp += word.upper()
return temp

Your loop iterates over the characters in the input string. It then returns from the very first iteration. Thus, you always get a 1-char return value.
test = "AltERNating"
def to_alternating_case(string):
words = list(string)
rval = ''
for c in words:
if word.isupper():
rval += c.lower()
else:
rval += c.upper()
return rval
print to_alternating_case(test)

That's because your function returns the first character only. I mean return keyword breaks your for loop.
Also, note that is unnecessary to convert the string into a list by running words = list(string) because you can iterate over a string just as you did with the list.
If you're looking for an algorithmic solution instead of the swapcase() then modify your method this way instead:
test = "AltERNating"
def to_alternating_case(string):
res = ""
for word in string:
if word.isupper() == True:
res = res + word.lower()
else:
res = res + word.upper()
return res
print to_alternating_case(test)

You are returning the first alphabet after looping over the word alternating which is not what you are expecting. There are some suggestions to directly loop over the string rather than converting it to a list, and expression if <variable-name> == True can be directly simplified to if <variable-name>. Answer with modifications as follows:
test = "AltERNating"
def to_alternating_case(string):
result = ''
for word in string:
if word.isupper():
result += word.lower()
else:
result += word.upper()
return result
print to_alternating_case(test)
OR using list comprehension :
def to_alternating_case(string):
result =[word.lower() if word.isupper() else word.upper() for word in string]
return ''.join(result)
OR using map, lambda:
def to_alternating_case(string):
result = map(lambda word:word.lower() if word.isupper() else word.upper(), string)
return ''.join(result)

You should do that like this:
test = "AltERNating"
def to_alternating_case(string):
words = list(string)
newstring = ""
if word.isupper():
newstring += word.lower()
else:
newstring += word.upper()
return alternative
print to_alternating_case(test)

def myfunc(string):
i=0
newstring=''
for x in string:
if i%2==0:
newstring=newstring+x.lower()
else:
newstring=newstring+x.upper()
i+=1
return newstring

contents='abcdefgasdfadfasdf'
temp=''
ss=list(contents)
for item in range(len(ss)):
if item%2==0:
temp+=ss[item].lower()
else:
temp+=ss[item].upper()
print(temp)
you can add this code inside a function also and in place of print use the return key

string=input("enter string:")
temp=''
ss=list(string)
for item in range(len(ss)):
if item%2==0:
temp+=ss[item].lower()
else:
temp+=ss[item].upper()
print(temp)

Here is a short form of the hard way:
alt_case = lambda s : ''.join([c.upper() if c.islower() else c.lower() for c in s])
print(alt_case('AltERNating'))
As I was looking for a solution making a all upper or all lower string alternating case, here is a solution to this problem:
alt_case = lambda s : ''.join([c.upper() if i%2 == 0 else c.lower() for i, c in enumerate(s)])
print(alt_case('alternating'))

You could use swapcase() method
string_name.swapcase()
or you could be a little bit fancy and use list comprehension
string = "thE big BROWN FoX JuMPeD oVEr thE LAZY Dog"
y = "".join([val.upper() if val.islower() else val.lower() for val in string])
print(y)
>>> 'THe BIG brown fOx jUmpEd OveR THe lazy dOG'

This doesn't use any 'pythonic' methods and gives the answer in a basic logical format using ASCII :
sentence = 'aWESOME is cODING'
words = sentence.split(' ')
sentence = ' '.join(reversed(words))
ans =''
for s in sentence:
if ord(s) >= 97 and ord(s) <= 122:
ans = ans + chr(ord(s) - 32)
elif ord(s) >= 65 and ord(s) <= 90 :
ans = ans + chr(ord(s) + 32)
else :
ans += ' '
print(ans)
So, the output will be : Coding IS Awesome

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: Best Way to remove duplicate character from string - python

Using itertools.groupby: >>> foo = "SSYYNNOOPPSSIISS" >>> import itertools >>> ''.join(ch for ch, _ in itertools.groupby(foo)) 'SYNOPSIS'

This is a solution without importing itertools: foo = "SSYYNNOOPPSSIISS" ''.join([foo[i] for i in range(len(foo)-1) if foo[i+1]!= foo[i]]+[foo[-1]]) Out[1]: 'SYNOPSIS' But it is slower than the others method!

How about this: oldstring = 'SSSYYYNNNOOOOOPPPSSSIIISSS' newstring = oldstring[0] for char in oldstring[1:]: if char != newstring[-1]: newstring += char

def removeDuplicate(s): if (len(s)) < 2: return s result = [] for i in s: if i not in result: result.append(i) return ''.join(result)

How about foo = "SSYYNNOOPPSSIISS" def rm_dup(input_str): newstring = foo[0] for i in xrange(len(input_str)): if newstring[(len(newstring) - 1 )] != input_str[i]: newstring += input_str[i] else: pass return newstring print rm_dup(foo)

You can try this: string1 = "example1122334455" string2 = "hello there" def duplicate(string): temp = '' for i in string: if i not in temp: temp += i return temp; print(duplicate(string1)) print(duplicate(string2))

Related

remove consecutive substrings from a string without importing any packages

How to remove Triplicate Letters in Python

Split a string, loop through it character by character, and replace specific ones?

Changing lowercase characters to uppercase and vice-versa in Python

Swapping uppercase and lowercase in a string [duplicate]

Categories

Resources