How to remove duplicated characters from a string? [duplicate] - python

This question already has answers here:
Removing duplicate characters from a string
(15 answers)
Closed 3 years ago.
How can I remove all repeated characters from a string?
e.g.:
Input: string = 'Hello'
Output: 'Heo'
This is a different question from "Removing duplicate characters from a string": I don't want to print out the duplicates, I want to delete them.

You can use a generator expression with join, like this:
>>> x = 'Hello'
>>> ''.join(c for c in x if x.count(c) == 1)
'Heo'

You could construct a Counter from the string and keep only the characters whose count in the counter is 1:
from collections import Counter
string = 'Hello'
c = Counter(string)
''.join([i for i in string if c[i]==1])
# 'Heo'
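Calling x.count(c) for every character rescans the whole string each time, so the generator-expression answer does O(n^2) work, while a Counter is built in a single pass. A minimal sketch wrapping the Counter approach in a reusable function (remove_repeated is a hypothetical name, not a library function):

```python
from collections import Counter

def remove_repeated(s):
    # tally every character once, O(n)
    counts = Counter(s)
    # keep, in original order, only the characters that occur exactly once
    return ''.join(ch for ch in s if counts[ch] == 1)

print(remove_repeated('Hello'))   # Heo
print(remove_repeated('banana'))  # b
```

The comparison is case-sensitive, so 'H' and 'h' count as different characters.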

a = 'Hello'
list_a = list(a)
output = []
for i in list_a:
    # keep only the characters that occur exactly once
    if list_a.count(i) == 1:
        output.append(i)
print(''.join(output))  # Heo

In addition to the other answers, a filter is also possible:
s = 'Hello'
result = ''.join(filter(lambda c: s.count(c) == 1, s))
# result - Heo

If you limit your question to cases with only repeated consecutive letters (as your example suggests), you could employ regular expressions:
import re
print(re.sub(r"(.)\1+", "", "hello")) # result = heo
print(re.sub(r"(.)\1+", "", "helloo")) # result = he
print(re.sub(r"(.)\1+", "", "hellooo")) # result = he
print(re.sub(r"(.)\1+", "", "sports")) # result = sports
If you need to apply the regular expression many times, it's worth compiling it beforehand:
prog = re.compile(r"(.)\1+")
print(prog.sub("", "hello"))
To restrict the search for duplicated letters on some subset of characters, you can adjust the regular expression accordingly.
print(re.sub(r"(\S)\1+", "", "hello")) # Search duplicated non-whitespace chars
print(re.sub(r"([a-z])\1+", "", "hello")) # Search for duplicated lowercase letters
Alternatively, an approach using itertools.groupby and a list comprehension could look as follows:
from itertools import groupby
dedup = lambda s: "".join([i for i, g in groupby(s) if len(list(g))==1])
print(dedup("hello")) # result = heo
print(dedup("helloo")) # result = he
print(dedup("hellooo")) # result = he
print(dedup("sports")) # result = sports
Note that on my machine, the first method (regular expressions) was about 8-10 times faster than the second one (groupby). (System: Python 3.6.7, MacBook Pro (Mid 2015))
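To check a speed comparison like this on your own machine, a rough timeit sketch could look like the following. The function names and the sample input are illustrative assumptions, and timings vary with hardware and Python version, so no numbers are claimed here:

```python
import re
import timeit
from itertools import groupby

PATTERN = re.compile(r"(.)\1+")

def dedup_regex(s):
    # drop every run of two or more identical adjacent characters
    return PATTERN.sub("", s)

def dedup_groupby(s):
    # keep only the runs of length 1
    return "".join(k for k, g in groupby(s) if len(list(g)) == 1)

sample = "hellooo sports" * 1000
assert dedup_regex(sample) == dedup_groupby(sample)  # both give the same result

for fn in (dedup_regex, dedup_groupby):
    t = timeit.timeit(lambda: fn(sample), number=20)
    print(f"{fn.__name__}: {t:.3f}s")
```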

Related

Finding exact number of characters in word

I'm looking for a way to find words that contain a given character an exact number of times.
For example:
If we have this input: ['teststring1','strringrr','wow','strarirngr'] and we are looking for 4 r characters,
it should return only ['strringrr','strarirngr'], because those are the words with exactly 4 r's in them.
I decided to use regex, but after reading the documentation I can't find a function that satisfies my needs.
I tried [r{4}], but it apparently matches any word with an r in it.
Please help
Something like this:
import collections
def map_characters(string):
    # count how many times each character occurs in the string
    characters = collections.defaultdict(int)
    for char in string:
        characters[char] += 1
    return characters
items = ['teststring1','strringrr','wow','strarirngr']
for item in items:
    characters_map = map_characters(item)
    # print the word only if it contains exactly 4 'r' letters
    if characters_map['r'] == 4:
        print(item)
# this prints strringrr and strarirngr,
# because these words have exactly 4 r letters
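The standard library's collections.Counter already does the counting that map_characters performs by hand. A minimal sketch of the same exact-count filter using it:

```python
from collections import Counter

items = ['teststring1', 'strringrr', 'wow', 'strarirngr']
# Counter(item)['r'] is the number of 'r' characters (0 if there are none)
matches = [item for item in items if Counter(item)['r'] == 4]
print(matches)  # ['strringrr', 'strarirngr']
```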
You can use str.count() to count the occurrences of a character, combined with list comprehensions to create a new list:
myArray = ['teststring1','strringrr','wow','strarirngr']
letter = "r"
amount = 4
filtered = [item for item in myArray if item.count(letter) == amount]
print(filtered) # ['strringrr', 'strarirngr']
If you wanted to make this reusable (to look for different letters or different amounts), you could pack it into a function:
def filterList(stringList, pattern, occurrences):
    return [item for item in stringList if item.count(pattern) == occurrences]
myArray = ['teststring1','strringrr','wow','strarirngr']
letter = "r"
amount = 4
print(filterList(myArray, letter, amount)) # ['strringrr', 'strarirngr']
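The square brackets define a character class, which matches a single character from the set, e.g. [abc] matches one a, b or c. Inside the brackets, {4} is not a quantifier, so [r{4}] matches any one of r, {, 4 or }, and any word containing an r is a match. Without the brackets, r{4} matches four consecutive r's, which neither 'strringrr' nor 'strarirngr' contains, so to count r's spread across the word you need str.count() or a pattern that allows other characters between them.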
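A quick way to see what each of these patterns actually matches, run against the question's word list plus a made-up word containing a curly brace ('a{b' is only there for illustration):

```python
import re

words = ['teststring1', 'strringrr', 'wow', 'strarirngr', 'a{b']

# [r{4}] is a character class: it matches ONE of 'r', '{', '4' or '}'
bracketed = [w for w in words if re.search(r'[r{4}]', w)]
# r{4} without brackets requires four r's in a row
consecutive = [w for w in words if re.search(r'r{4}', w)]

print(bracketed)    # ['teststring1', 'strringrr', 'strarirngr', 'a{b']
print(consecutive)  # []
```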
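Since you asked about using regex, you could use the following. Note that re.match with (.*r.*){4} accepts words with at least four r's; re.fullmatch with [^r]*(?:r[^r]*){4} requires exactly four:
import re
l = ['teststring1', 'strringrr', 'wow', 'strarirngr']
[ word for word in l if re.match(r'(.*r.*){4}', word) ]  # at least four r's
[ word for word in l if re.fullmatch(r'[^r]*(?:r[^r]*){4}', word) ]  # exactly four r's
output (both, for this list): ['strringrr', 'strarirngr']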

How can we remove word with repeated single character?

I am trying to use regex in Python to reduce words made of a single repeated character to just that character, for example:
good => good
gggggggg => g
What I have tried so far is the following:
re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
The problem with the above solution is that it changes good to god, and I only want to collapse words made of a single repeated character.
A better approach here is to use a set:
def modify(s):
    # create a set from the string
    c = set(s)
    # if there is only one distinct character, convert the set back to a string
    if len(c) == 1:
        return ''.join(c)
    # else return the original string
    else:
        return s
print(modify('good'))
print(modify('gggggggg'))
If you want to use a regex, anchor it to the start and end of the string with ^ and $ (inspired by @bobblebubble's comment):
import re
def modify(s):
    # the regex only matches if a single character is repeated,
    # and ^ and $ mark the start and end of the string
    out = re.sub(r'^([a-z])\1+$', r'\1', s)
    return out
print(modify('good'))
print(modify('gggggggg'))
The output will be
good
g
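If the words are inside a longer sentence rather than being the whole string, one option (an extension, not part of the answer above) is to swap ^ and $ for \b word boundaries and allow any word character with \w. A minimal sketch; collapse_repeated_words is a hypothetical name:

```python
import re

def collapse_repeated_words(text):
    # \b(\w)\1+\b matches a whole word made of one repeated character
    # and replaces it with a single copy of that character
    return re.sub(r'\b(\w)\1+\b', r'\1', text)

print(collapse_repeated_words('good gggggggg zzz ab'))  # good g z ab
```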
If you do not want to use a set in your method, this should do the trick:
def simplify(s):
    l = len(s)
    if l > 1 and s.count(s[0]) == l:
        return s[0]
    return s
print(simplify('good'))
print(simplify('abba'))
print(simplify('ggggg'))
print(simplify('g'))
print(simplify(''))
output:
good
abba
g
g
Explanations:
Compute the length of the string.
Count the characters that are equal to the first one and compare that count with the string length.
Depending on the result, return the first character or the whole string.
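In C#, you can use the Trim method (note that Trim('g') strips every leading and trailing g, so "ggggggg".Trim('g') actually returns an empty string):
Take a look at this example:
"ggggggg".Trim('g');
Update:
For characters that are in the middle of the string, use this function, thanks to this answer.
In C#:
public static string RemoveDuplicates(string input)
{
    // Distinct() comes from LINQ (using System.Linq;)
    return new string(input.ToCharArray().Distinct().ToArray());
}
In Python:
mylist = 'gggggggg'
used = set()
# keep each character the first time it is seen; used.add(x) returns None, so "or True" keeps the condition true
unique = [x for x in mylist if x not in used and (used.add(x) or True)]
print(''.join(unique))  # g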
But I think none of these answers handles a case like aaaaabbbbbcda: this string has an a at the end which does not appear in the result (abcd). For this kind of situation, use this function, which I wrote:
In:
def unique(s):
    used = set()
    ret = list()
    for x in s:
        if x not in used:
            ret.append(x)
            # forget earlier characters, so only the current run's character is remembered
            used = set()
            used.add(x)
    return ret
print(unique('aaaaabbbbbcda'))
out:
['a', 'b', 'c', 'd', 'a']
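The same run-collapsing behaviour is available from itertools.groupby, which groups consecutive equal items. A minimal sketch (collapse_runs is a hypothetical name):

```python
from itertools import groupby

def collapse_runs(s):
    # groupby yields one (key, group) pair per run of equal adjacent characters
    return [key for key, _ in groupby(s)]

print(collapse_runs('aaaaabbbbbcda'))           # ['a', 'b', 'c', 'd', 'a']
print(''.join(collapse_runs('aaaaabbbbbcda')))  # abcda
```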

Print the first, second occurred character in a list

I'm working on a simple algorithm that prints the first character to occur a second time.
For example:
string ='abcabc'
output = a
string = 'abccba'
output = c
string = 'abba'
output = b
What I have done is:
string = 'abcabc'
s = []
for x in string:
    if x in s:
        print(x)
        break
    else:
        s.append(x)
output: a
But its time complexity is O(n^2). How can I do this in O(n)?
Change s = [] to s = set() (and, correspondingly, append to add). A membership test (in) on a set takes O(1) time on average, unlike in on a list, which scans the list sequentially.
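Put together as a function and checked against the question's three examples, the set-based version could look like this (first_repeated is a hypothetical name):

```python
def first_repeated(s):
    # return the first character that is seen for a second time, or None
    seen = set()
    for ch in s:
        if ch in seen:   # average O(1) membership test on a set
            return ch
        seen.add(ch)
    return None

for text in ('abcabc', 'abccba', 'abba'):
    print(text, '->', first_repeated(text))
# abcabc -> a
# abccba -> c
# abba -> b
```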
Alternatively, with regular expressions (O(n^2), but rather fast and easy):
import re
match = re.search(r'(.).*\1', string)
if match:
    print(match.group(1))
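The regular expression (.).*\1 means "any character, which we'll remember for later, then any number of intervening characters, then the remembered character again". Since the regex is scanned left to right, it returns the earliest character that has any later repeat, so it finds a in "abba" (and in "abccba") rather than b (or c). If you need the question's behaviour (the character whose second occurrence comes first), use the set-based loop instead.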
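To see how the regex result differs from the question's expected output (c for abccba, b for abba), a small comparison sketch; first_with_later_repeat is a hypothetical helper name:

```python
import re

def first_with_later_repeat(s):
    # (.).*\1 finds the leftmost character that appears again anywhere later
    match = re.search(r'(.).*\1', s)
    return match.group(1) if match else None

for text in ('abcabc', 'abccba', 'abba'):
    print(text, '->', first_with_later_repeat(text))
# abcabc -> a
# abccba -> a
# abba -> a
```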
Use a dictionary:
string = 'abcabc'
s = {}
for x in string:
    if x in s:
        print(x)
        break
    else:
        s[x] = 0
or use a set:
string = 'abcabc'
s = set()
for x in string:
    if x in s:
        print(x)
        break
    else:
        s.add(x)
Both dictionaries and sets are hash-based, so lookups and insertions take O(1) time on average.

python - Replace several different characters by only one [duplicate]

This question already has answers here:
how to replace multiple characters in a string?
(3 answers)
Closed 5 years ago.
I'm looking for a way to replace some characters with another one.
For example, we have:
chars_to_be_replaced = "ihgr"
and we want them to be replaced by
new_char = "b"
So that the new string
s = "im hungry"
becomes
s' = "bm bunbby".
I'm well aware you can do this one char at a time with .replace or with regular expressions, but I'm looking for a way to go only once through the string.
Does re.sub go through the string only once? Are there other ways to do this? Thanks
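You can use str.translate(). In Python 3 the translation table comes from str.maketrans (in Python 2 it was string.maketrans):
chars_to_be_replaced = "ihgr"
new_char = "b"
s = "im hungry"
# map each character to be replaced onto new_char
trantab = str.maketrans(chars_to_be_replaced, new_char * len(chars_to_be_replaced))
print(s.translate(trantab))
# bm bunbby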
How about this:
chars_to_be_replaced = "ihgr"
new_char = "b"
my_dict = {k: new_char for k in chars_to_be_replaced}
s = "im hungry"
new_s = ''.join(my_dict.get(x, x) for x in s)
print(new_s) # bm bunbby
''.join(my_dict.get(x, x) for x in s): for each letter in the original string, this looks up the letter's replacement in the dictionary; if the letter is not a key, get returns the letter itself.
NOTE: You can speed it up (a bit) by passing a list to join instead of a generator:
new_s = ''.join([my_dict.get(x, x) for x in s])
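On the re.sub part of the question: a single re.sub call with a character class effectively makes one left-to-right pass over the string, instead of one pass per character as with chained .replace calls. A minimal sketch, where re.escape guards against characters such as ] or - that would otherwise be special inside the brackets:

```python
import re

chars_to_be_replaced = "ihgr"
new_char = "b"
s = "im hungry"

# one character class matches any of the listed characters;
# re.escape keeps characters such as ']' or '-' from being treated as syntax
pattern = "[" + re.escape(chars_to_be_replaced) + "]"
new_s = re.sub(pattern, new_char, s)
print(new_s)  # bm bunbby
```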

python string manipulation [duplicate]

I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store in a new string like this: new_string = AX&E
I tried doing this:
import re
p = re.compile(r"\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives output: AX&EUr)
Is there any way to correct this, rather than iterating over each element in the string?
Another simple option is to remove the innermost parentheses at every stage, until there are no more parentheses:
import re
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK
You can just use string manipulation without a regular expression:
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile(r"""\([^\)]*\)""").sub('', s)
'AX&E'
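Yeah, it should be (this works for the un-nested string below; for nested brackets as in the question, see the other answers):
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile(r"\(.*?\)", re.DOTALL)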
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'
Nested brackets (or tags, ...) cannot be handled in a general way with regular expressions. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex that handles two levels of nesting, but it is already ugly, and one for three levels will be quite long. You don't want to think about four levels. ;-)
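For single-character brackets like these, a small hand-written scanner with a depth counter is often enough, as a lightweight alternative to a parser library. A minimal sketch (strip_bracketed is a hypothetical name); a stray closing bracket with nothing open is kept as ordinary text:

```python
def strip_bracketed(s, opener='(', closer=')'):
    # single left-to-right pass; depth counts how many brackets are currently open
    out = []
    depth = 0
    for ch in s:
        if ch == opener:
            depth += 1
        elif ch == closer and depth > 0:
            depth -= 1
        elif depth == 0:
            # outside every bracket pair, so keep the character
            out.append(ch)
    return ''.join(out)

print(strip_bracketed("AX(p>q)&E((-p)Ur)"))  # AX&E
```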
You can use PyParsing to parse the string:
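from pyparsing import nestedExpr
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
# wrap the whole string in brackets so it parses as one nested expression
result = expr.parseString('(' + s + ')').asList()[0]
# top-level strings are the parts outside any brackets; nested lists hold the bracketed content
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)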
Most code is from: How can a recursive regexp be implemented in python?
You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E
This is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
print(max(user)) # largest character (by code point) in the user string
print(min(user)) # smallest character (by code point) in the user string
print(user.join(["1", "2", "3", "4"])) # join needs strings, not ints
input()
