Removing duplicate characters from a string - python

How can I remove duplicate characters from a string using Python? For example, let's say I have a string:
foo = 'mppmt'
How can I make the string:
foo = 'mpt'
NOTE: Order is not important

If order does not matter, you can use
"".join(set(foo))
set() will create a set of unique letters in the string, and "".join() will join the letters back to a string in arbitrary order.
If order does matter, you can use a dict instead of a set, which since Python 3.7 preserves the insertion order of the keys. (In the CPython implementation, this is already supported in Python 3.6 as an implementation detail.)
foo = "mppmt"
result = "".join(dict.fromkeys(foo))
resulting in the string "mpt". In earlier versions of Python, you can use collections.OrderedDict, which has been available starting from Python 2.7.
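As a rough side-by-side sketch of the two approaches, the snippet below wraps each in a small helper. The names dedupe_unordered and dedupe_ordered are made up for illustration, and the ordered version assumes Python 3.7 or later, where dict preserves insertion order.

```python
def dedupe_unordered(text):
    # set() keeps one copy of each character; iteration order is arbitrary
    return "".join(set(text))


def dedupe_ordered(text):
    # dict.fromkeys() keeps the first occurrence of each key, in insertion order
    return "".join(dict.fromkeys(text))


print(sorted(dedupe_unordered("mppmt")))  # ['m', 'p', 't']
print(dedupe_ordered("mppmt"))            # mpt
```

Sorting the unordered result is only done here to get a stable printout; the helper itself makes no promise about order.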

If order does matter, how about:
>>> foo = 'mppmt'
>>> ''.join(sorted(set(foo), key=foo.index))
'mpt'

If order does not matter:
>>> foo='mppmt'
>>> ''.join(set(foo))
'pmt'
To keep the order:
>>> foo='mppmt'
>>> ''.join([j for i,j in enumerate(foo) if j not in foo[:i]])
'mpt'

Create a list to collect the result, plus a set (which doesn't allow duplicates) to track the characters already seen.
Method 1:
def fix(string):
    seen = set()
    chars = []
    for ch in string:
        if ch not in seen:
            seen.add(ch)
            chars.append(ch)
    return ''.join(chars)
string = "Protiijaayiiii"
print(fix(string))
Method 2:
s = "Protijayi"
aa = [ch for i, ch in enumerate(s) if ch not in s[:i]]
print(''.join(aa))
Method 3:
dd = ''.join(dict.fromkeys(s))
print(dd)

As was mentioned, "".join(set(foo)) and collections.OrderedDict will do.
I added foo = foo.lower() in case the string has both upper- and lowercase characters and you need to remove ALL duplicates regardless of case.
from collections import OrderedDict
foo = "EugeneEhGhsnaWW"
foo = foo.lower()
print("".join(OrderedDict.fromkeys(foo)))
prints eugnhsaw
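Lowercasing the whole string also changes the case of the characters that are kept. If the original casing of each first occurrence should survive, one possible sketch is to compare on the lowercased character only. The helper name dedupe_ignore_case is hypothetical and not part of the answer above.

```python
def dedupe_ignore_case(text):
    seen = set()   # lowercased forms of the characters already kept
    kept = []
    for ch in text:
        key = ch.lower()
        if key not in seen:
            seen.add(key)
            kept.append(ch)  # keep the character as originally written
    return "".join(kept)


print(dedupe_ignore_case("EugeneEhGhsnaWW"))  # EugnhsaW
```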

# Check the code and apply it in your program:
# Input: 'ppppmm'
s = 'ppppmm'
s = ''.join(set(s))
print(s)
# Output: pm (or mp, since set order is arbitrary)

If order is important,
seen = set()
result = []
for c in foo:
    if c not in seen:
        result.append(c)
        seen.add(c)
result = ''.join(result)
Or to do it without sets:
result = []
for c in foo:
    if c not in result:
        result.append(c)
result = ''.join(result)

def dupe(str1):
    s = set(str1)
    return "".join(s)
str1 = 'geeksforgeeks'
a = dupe(str1)
print(a)
This works well if order is not important.

d = {}
s = "YOUR_DESIRED_STRING"
res = []
for c in s:
    if c not in d:
        res.append(c)
        d[c] = 1
print("".join(res))
The variable c traverses the string s in the for loop. Each c is checked against the dict d (which starts out empty); if c is not yet a key of d, it is appended to the list res and d[c] is set to 1 to mark it as seen. Once the loop has finished, res holds each unique character in order of first appearance, and the joined result is printed.

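Using regular expressions (note that this only collapses consecutive repeats):
import re
pattern = r'(.)\1+'  # (.) captures any character; \1+ matches one or more immediate repeats of it
repl = r'\1'  # replace the whole run with a single copy
text = 'shhhhh!!!'
re.sub(pattern, repl, text)
output: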
sh!
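To make the difference from the other answers concrete, here is a small sketch comparing the regex substitution with dict.fromkeys on the question's own input. The wrapper name collapse_runs is an assumption added for illustration.

```python
import re


def collapse_runs(text):
    # Replace each run of an immediately repeated character with one copy
    return re.sub(r'(.)\1+', r'\1', text)


print(collapse_runs('shhhhh!!!'))       # sh!
print(collapse_runs('mppmt'))           # mpmt (the second m is not adjacent to the first)
print(''.join(dict.fromkeys('mppmt')))  # mpt
```

So the regex fits inputs where duplicates are always adjacent, while the dict-based version removes repeats anywhere in the string.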

Since a string is a sequence of characters, passing it to dict.fromkeys() removes all duplicates and (on Python 3.7+) retains the order.
"".join(dict.fromkeys(foo))

Functional programming style while keeping order:
import functools
def get_unique_char(a, b):
    if b not in a:
        return a + b
    else:
        return a
if __name__ == '__main__':
    foo = 'mppmt'
    gen = functools.reduce(get_unique_char, foo)
    print(''.join(list(gen)))

def remove_duplicates(value):
    var = ""
    for i in value:
        # append each character only the first time it is seen
        if i not in var:
            var = var + i
    return var
print(remove_duplicates("11223445566666ababzzz###123#*#*"))

from collections import OrderedDict
def remove_duplicates(value):
    m = list(OrderedDict.fromkeys(value))
    s = ''
    for i in m:
        s += i
    return s
print(remove_duplicates("11223445566666ababzzz###123#*#*"))

mylist = ["ABA", "CAA", "ADA"]
results = []
for item in mylist:
    print(item)  # echo each input string, matching the output shown below
    buffer = []
    for char in item:
        if char not in buffer:
            buffer.append(char)
    results.append("".join(buffer))
print(results)
output
ABA
CAA
ADA
['AB', 'CA', 'AD']

Related

Print the first, second occurred character in a list

I am working on a simple algorithm which prints the first character that occurs twice or more.
For example:
string ='abcabc'
output = a
string = 'abccba'
output = c
string = 'abba'
output = b
What I have done is:
string = 'abcabc'
s = []
for x in string:
    if x in s:
        print(x)
        break
    else:
        s.append(x)
output: a
But its time complexity is O(n^2), how can I do this in O(n)?
Change s = [] to s = set() (and obviously the corresponding append to add). An in test on a set is O(1) on average, unlike in on a list, which scans the elements sequentially.
Alternately, with regular expressions (O(n^2), but rather fast and easy):
import re
match = re.search(r'(.).*\1', string)
if match:
    print(match.group(1))
The regular expression (.).*\1 means "any character which we'll remember for later, any number of intervening characters, then the remembered character again". Since the regexp is scanned left-to-right, it reports the leftmost character that occurs again later: for "abba" it finds a rather than b, so it differs from the loop above whenever a later character completes its repeat first.
Use dictionaries
string = 'abcabc'
s = {}
for x in string:
    if x in s:
        print(x)
        break
    else:
        s[x] = 0
or use sets
string = 'abcabc'
s = set()
for x in string:
    if x in s:
        print(x)
        break
    else:
        s.add(x)
Both dictionaries and sets are hash-based, so membership tests take O(1) time on average.
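The set-based loop can also be wrapped in a function so it can be checked against all three examples from the question. This is a minimal sketch; the name first_repeated and the choice to return None when nothing repeats are assumptions added here.

```python
def first_repeated(text):
    seen = set()
    for ch in text:
        if ch in seen:   # average O(1) membership test
            return ch
        seen.add(ch)
    return None          # no character occurs twice


for sample in ('abcabc', 'abccba', 'abba'):
    print(sample, first_repeated(sample))
```

Each character is visited once, so the whole scan is O(n) on average.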

Find common characters between two strings

I am trying to print the common letters from two different user inputs using a for loop. (I need to do it using a for loop.) I am running into two problems: 1. My statement "if char not in output..." is not pulling unique values. 2. The output is giving me a list of individual letters rather than a single string. I tried to split the output, but split ran into a type error.
wrd = 'one'
sec_wrd = 'toe'
def unique_letters(x):
    output = []
    for char in x:
        if char not in output and char != " ":
            output.append(char)
    return output
final_output = (unique_letters(wrd) + unique_letters(sec_wrd))
print(sorted(final_output))
You are trying to perform a set intersection. Python has the set.intersection method for this. You can use it for your use case as:
>>> word_1 = 'one'
>>> word_2 = 'toe'
# ''.join(...) joins the intersection of the sets to get back a string.
# There is no need to convert word_2 to a set first;
# set.intersection accepts any iterable.
>>> ''.join(set(word_1).intersection(word_2))
'oe'
set will return the unique characters in your string. The set.intersection method will return the characters which are common to both sets.
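Because a set has no defined order, the joined result can come out as 'oe' or 'eo'. If the common letters should follow the order of the first word, one possible sketch is below. The function name common_letters is hypothetical, and the order guarantee assumes Python 3.7+ dict ordering.

```python
def common_letters(first, second):
    other = set(second)  # average O(1) lookups
    # dict.fromkeys drops repeats while keeping first-seen order
    return ''.join(ch for ch in dict.fromkeys(first) if ch in other)


print(common_letters('one', 'toe'))  # oe
```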
If a for loop is a must for you, then you may use a list comprehension:
>>> unique_1 = [w for w in set(word_1) if w in word_2]
# OR
# >>> unique_2 = [w for w in set(word_2) if w in word_1]
>>> ''.join(unique_1) # Or, ''.join(unique_2)
'oe'
The above result could also be achieved with an explicit for loop:
my_str = ''
for w in set(word_1):
    if w in word_2:
        my_str += w
# where `my_str` will hold `'oe'`
For this kind of problem, you're probably better off using sets:
wrd = 'one'
sec_wrd = 'toe'
wrd = set(wrd)
sec_wrd = set(sec_wrd)
print(''.join(sorted(wrd.intersection(sec_wrd))))
I solved this today on CodeSignal. It passed all tests.
def solution(s1, s2):
    common_char = ""
    for i in s1:
        if i not in common_char:
            i_in_s1 = s1.count(i)
            i_in_s2 = s2.count(i)
            comm_num = []
            comm_num.append(i_in_s1)
            comm_num.append(i_in_s2)
            comm_i = min(comm_num)
            new_char = i * comm_i
            common_char += new_char
    return len(common_char)
Function to solve the problem
def find_common_characters(msg1, msg2):
    # set() is used to remove duplicates
    set1 = set(msg1)
    set2 = set(msg2)
    # subtract this set if you wish to exclude spaces
    remove = {" "}
    set3 = (set1 & set2) - remove
    msg = ''.join(set3)
    return msg
Providing input and calling the function
Provide different values for msg1, msg2 and test your program
msg1 = "python"
msg2 = "Python"
common_characters = find_common_characters(msg1, msg2)
print(common_characters)
Here is a one-liner if you want the number of common characters between them:
def solution(s1, s2):
    return sum(min(s1.count(x), s2.count(x)) for x in set(s1))
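The same count can probably be expressed with collections.Counter, whose & operator keeps the minimum count per key. The sketch below is an alternative, not the answer's own method; the name common_count and the sample strings are made up for illustration.

```python
from collections import Counter


def common_count(s1, s2):
    # Counter & Counter keeps each shared key with the smaller of its two counts
    overlap = Counter(s1) & Counter(s2)
    return sum(overlap.values())


print(common_count('aabcc', 'adcaa'))  # 3 (two a's and one c)
```

This counts each string once instead of calling str.count for every distinct character, which may matter for long inputs.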

Replacing strings, not characters without the use of .replace and joining the strings the characters

A similar question has been asked, but all the posts here refer to replacing single characters. I'm trying to replace a whole word in a string. I've replaced it, but I can't print it with spaces in between.
Here is the function replace that replaces it:
def replace(a, b, c):
    new = b.split()
    result = ''
    for x in new:
        if x == a:
            x = c
        result += x
    print(' '.join(result))
Calling it with:
replace('dogs', 'I like dogs', 'kelvin')
My result is this:
I l i k e k e l v i n
What I'm looking for is:
I like kelvin
The issue here is that result is a string, and when join is called it takes each character in result and joins them with a space.
Instead, use a list, append to it (it's also faster than using += on strings) and print it out by unpacking it.
That is:
def replace(a, b, c):
    new = b.split(' ')
    result = []
    for x in new:
        if x == a:
            x = c
        result.append(x)
    print(*result)
print(*result) will supply the elements of the result list as positional arguments to print, which prints them out separated by a single space by default.
"I like dogs".replace("dogs", "kelvin") can of course be used here, but I'm pretty sure that defeats the point.
Substrings and space preserving method:
def replace(a, b, c):
    # Find all indices where 'a' exists
    xs = []
    x = b.find(a)
    while x != -1:
        xs.append(x)
        x = b.find(a, x+len(a))
    # Use slice assignment (starting from the last index)
    result = list(b)
    for i in reversed(xs):
        result[i:i+len(a)] = c
    return ''.join(result)
>>> replace('dogs', 'I like dogs dogsdogs and hotdogs', 'kelvin')
'I like kelvin kelvinkelvin and hotkelvin'
Just make result a list, and the joining will work:
result = []
You are just generating one long string and join its chars.
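For a version that returns the string rather than printing it, the list-based fix can be condensed into a comprehension. This is only a sketch; replace_word is a hypothetical name, and like the answers above it matches whole space-separated words only.

```python
def replace_word(old, sentence, new):
    # Split on single spaces, swap matching words, then join with spaces again
    words = [new if word == old else word for word in sentence.split(' ')]
    return ' '.join(words)


print(replace_word('dogs', 'I like dogs', 'kelvin'))  # I like kelvin
```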

Cycle through string

I need to be able to cycle through a string of characters using the modulo operator so that each character can be passed to a function. This is a simple question, I know, but I am seriously confused as to how to do it. Here is what I have but it gives me the error "TypeError: not all arguments converted during string formatting". Any suggestions would be appreciated.
key = 'abc'
def encrypt(key,string):
    c = ''
    for i in range(0,len(string)):
        t = (key)%3
        a = XOR(ord(string[i]),ord(t))
        b = chr(a)
        c = c + b
    return(c)
Ingredients
Here are some ingredients which help you write your encrypt function in a concise way:
You can directly iterate over the characters of a string:
>>> my_string = 'hello'
>>> for c in my_string:
...     print(c)
...
h
e
l
l
o
You can cycle through any iterable (like, for example, a string) using the cycle function from the itertools module of the standard library:
>>> from itertools import cycle
>>> for x in cycle('abc'):
...     print(x)
...
a
b
c
a
b
c
a
# goes on infinitely, abort with Ctrl-C
You can use the zip function to iterate over two sequences at the same time:
>>> for a, b in zip('hello', 'world'):
...     print(a, b)
...
h w
e o
l r
l l
o d
Edit: as kichik suggests, in Python 2 you can also use itertools.izip, which is beneficial if you deal with very large input strings. (In Python 3 the built-in zip is already lazy and izip no longer exists.)
You calculate the xor of two numbers by using the ^ operator:
>>> 5 ^ 3
6
You can concatenate a sequence of individual strings to a single string using the join function:
>>> ''.join(['hello', 'how', 'are', 'you'])
'hellohowareyou'
You can feed join with a so-called generator expression, which is similar to a for loop, but as a single expression:
>>> ''.join(str(x+5) for x in range(3))
'567'
Putting it all together
from itertools import cycle
def encrypt(key, string):
    # Python 3: the built-in zip is lazy; on Python 2 use itertools.izip instead
    return ''.join(chr(ord(k) ^ ord(c))
                   for k, c in zip(cycle(key), string))
You can (and probably should) iterate over the characters without using range(), and you'll want to run ord() on each character before you apply % to it, as the % operator means something else for a string. This should work:
key = 'abc'
for c in key:
    print(XOR(ord(c), ord(c) % 3))  # XOR is the asker's own helper function
You can use itertools.cycle() to cycle through the key and zip() (itertools.izip() on Python 2) for an easy combination of the two.
import itertools
def encrypt(key, string):
    keyi = itertools.cycle(key)
    result = ''
    for k, v in zip(keyi, string):
        a = ord(v) ^ ord(k)
        result += chr(a)
    return result
And then use it like this:
>>> encrypt('\x00', 'abc')
'abc'
>>> encrypt('\x01', 'abc')
'`cb'
You got an error about formatting because % is not a modulo operator for strings. It's used for formatting strings. You probably meant to use something like this:
key[i%3]
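Putting that indexing idea into the original loop might look like the sketch below. It assumes Python 3 and a non-empty key, uses len(key) instead of the hard-coded 3, and replaces the asker's XOR helper with the built-in ^ operator.

```python
def encrypt(key, text):
    out = []
    for i, ch in enumerate(text):
        k = key[i % len(key)]              # wrap around the key with modulo
        out.append(chr(ord(ch) ^ ord(k)))  # XOR the two code points
    return ''.join(out)


secret = encrypt('abc', 'hello')
print(encrypt('abc', secret))  # hello (XOR with the same key undoes itself)
```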

How would one alternately add 2 characters into a string in python?

Like, for example, I have the string '12345' and the string '+*' and I want to make it so that the new string would be '1+2*3+4*5', alternating between the two characters in the second string. I know how to do it with one character using join(), but I just can't figure out how to do it with both alternating. Any help would be greatly appreciated. Thanks!
You could use itertools.cycle() to forever alternate between the characters:
from itertools import cycle
result = ''.join([c for pair in zip(inputstring, cycle('+*')) for c in pair])[:-1]
You do need to remove that last + added on, but this does work just fine otherwise:
>>> from itertools import cycle
>>> inputstring = '12345'
>>> ''.join([c for pair in zip(inputstring, cycle('+*')) for c in pair])[:-1]
'1+2*3+4*5'
import itertools
s = '12345'
op = '+*'
answer = ''.join(itertools.chain.from_iterable(zip(s, itertools.cycle(op))))[:-1]
print(answer)
Output:
1+2*3+4*5
You could use this code:
string = "12345"
separator = "+*"
result = ""
for i, c in enumerate(string):  # enumerate yields (index, character) tuples
    result += c  # append character
    if i == len(string) - 1:  # stop after the last character
        break
    if i % 2 == 0:  # even index
        result += separator[0]  # append +
    else:
        result += separator[1]  # append *
print(result)  # output "1+2*3+4*5"
The following works without having to trim the end (for an odd-length input such as '12345'):
from itertools import repeat, zip_longest  # izip_longest in Python 2
''.join(map(lambda x: x[0] + x[1], zip_longest('12345', ''.join(repeat('+*', len('12345') // 2)), fillvalue='')))
From the Python 2 documentation (the function is named zip_longest in Python 3):
itertools.izip_longest(*iterables[, fillvalue]): Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted.
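For inputs of any length, a loop that emits a separator before every character except the first avoids both the trailing trim and the length arithmetic. This is a minimal sketch assuming Python 3 and a non-empty separator string; the name interleave is made up here.

```python
from itertools import cycle


def interleave(text, separators):
    pieces = []
    seps = cycle(separators)
    for i, ch in enumerate(text):
        if i:  # add a separator before every character except the first
            pieces.append(next(seps))
        pieces.append(ch)
    return ''.join(pieces)


print(interleave('12345', '+*'))  # 1+2*3+4*5
print(interleave('1234', '+*'))   # 1+2*3+4
```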
