How to remove duplicates only if consecutive in a string? [duplicate] - python

For a string such as '12233322155552', by removing the duplicates, I can get '1235'.
But what I want to keep is '1232152', only removing the consecutive duplicates.

import re
# Only repeated numbers
answer = re.sub(r'(\d)\1+', r'\1', '12233322155552')
# Any repeated character
answer = re.sub(r'(.)\1+', r'\1', '12233322155552')
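As a quick check of the two patterns above, here is a minimal sketch that wraps them in a helper. The name `squeeze` and its `digits_only` flag are made up for illustration and are not part of the answer.

```python
import re

def squeeze(text, digits_only=False):
    # Hypothetical helper: collapse each run of a repeated character to one copy.
    # (\d)\1+ matches a digit followed by one or more copies of itself;
    # (.)\1+ does the same for any character except a newline.
    pattern = r'(\d)\1+' if digits_only else r'(.)\1+'
    return re.sub(pattern, r'\1', text)

print(squeeze('12233322155552'))              # 1232152
print(squeeze('aa1122bb', digits_only=True))  # aa12bb
```

The digit-only pattern leaves the repeated letters alone, which is the practical difference between the two calls.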

You can use itertools; here is the one-liner:
>>> import itertools
>>> s = '12233322155552'
>>> ''.join(i for i, _ in itertools.groupby(s))
'1232152'

Microsoft / Amazon job interview type of question:
This is the pseudocode; the actual code is left as an exercise.
for each char in the string do:
    if the current char is equal to the next char:
        delete next char
    else:
        continue
return string
At a higher level, try (not an actual implementation):
for s in string:
    if s == s+1:  ## check until the end of the string
        delete s+1
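One possible way to turn the pseudocode into running Python is sketched below. Because strings are immutable, it collects the characters to keep instead of deleting anything; the function name `remove_consecutive` is an assumption, not something from the answer.

```python
def remove_consecutive(text):
    # Collect kept characters in a list; nothing is deleted in place.
    kept = []
    for index, char in enumerate(text):
        # Keep the character unless it equals the one right before it.
        if index == 0 or char != text[index - 1]:
            kept.append(char)
    return ''.join(kept)

print(remove_consecutive('12233322155552'))  # 1232152
```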

Hint: the itertools module is super-useful. One function in particular, itertools.groupby, might come in really handy here:
itertools.groupby(iterable[, key])
Make an iterator that returns consecutive keys and groups from
the iterable. The key is a function computing a key value for each
element. If not specified or is None, key defaults to an identity
function and returns the element unchanged. Generally, the iterable
needs to already be sorted on the same key function.
So since strings are iterable, what you could do is:
use groupby to collect neighbouring elements
extract the keys from the iterator returned by groupby
join the keys together
which can all be done in one clean line..
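To make the three steps concrete, this small sketch first prints what groupby produces for the example string and then joins the keys. The variable names are chosen for illustration only.

```python
from itertools import groupby

s = '12233322155552'

# Each item from groupby is (key, group); the group is an iterator over one run.
runs = [(key, len(list(group))) for key, group in groupby(s)]
print(runs)
# [('1', 1), ('2', 2), ('3', 3), ('2', 2), ('1', 1), ('5', 4), ('2', 1)]

# Joining only the keys drops the repeats inside each run.
result = ''.join(key for key, _ in groupby(s))
print(result)  # 1232152
```

Note that the string does not need to be sorted here: grouping only neighbouring equal elements is exactly the behaviour wanted.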

First of all, you can't remove anything from a string in Python (google "Python immutable string" if this is not clear).
My first approach would be:
foo = '12233322155552'
bar = ''
for char in foo:
    # append only when the character differs from the last one kept
    if bar == '' or char != bar[-1]:
        bar += char
or, using the itertools hint from above:
from itertools import groupby
''.join([k[0] for k in groupby(foo)])

+1 for groupby. Off the cuff, something like:
from itertools import groupby
def remove_dupes(arg):
    # create generator of distinct characters, ignore grouper objects
    unique = (i[0] for i in groupby(arg))
    return ''.join(unique)
Cooks for me in Python 2.7.2

number = '12233322155552'
temp_list = []
for item in number:
    if len(temp_list) == 0:
        temp_list.append(item)
    elif len(temp_list) > 0:
        if temp_list[-1] != item:
            temp_list.append(item)
print(''.join(temp_list))

This would be a way:
def fix(a):
    list = []
    for element in a:
        # fill the list if the list is empty
        if len(list) == 0: list.append(element)
        # check with the last element of the list
        if list[-1] != element: list.append(element)
    print(''.join(list))
a = 'GGGGiiiiniiiGinnaaaaaProtijayi'
fix(a)
# output => GiniGinaProtijayi

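import re
t = '12233322155552'
for i in t:
    dup = i + i
    # re.escape keeps characters such as '.' or '*' from being read as regex syntax
    t = re.sub(re.escape(dup), i, t)
print(t)
The final output is 1232152.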

Related

Which item in list - Python

I am making a console game using python and I am checking if an item is in a list using:
if variable in list:
I want to check which variable in that list it was like list[0] for example. Any help would be appreciated :)
You can do it using the list method index, as follows:
list.index(variable)
index gives you an integer that matches the location of the first appearance of the value you are looking for, and it will raise a ValueError if the value is not found.
If you are already checking if the value is in the list, then within the if statement you can get the index by:
if variable in list:
    variable_at = list.index(variable)
Example:
foo = ['this','is','not','This','it','is','that','This']
if 'This' in foo:
    print(foo.index('This'))
Outputs:
3
Take a look at the answer below, which has more complete information.
Finding the index of an item in a list
We can take inspiration from other languages such as JavaScript and create a function that returns the index if the item exists, or -1 otherwise.
from typing import Any
list_ = [5, 6, 7, 8]
def check_element(alist: list, item: Any) -> int:
    if item in alist:
        return alist.index(item)
    else:
        return -1
and the usage is
check1 = check_element(list_, 5)  # 0
check2 = check_element(list_, 9)  # -1
and this one is for one-line lovers
check_element_one_liner = lambda alist, item: alist.index(item) if item in alist else -1
alternative_check1 = check_element_one_liner(list_, 5)
alternative_check2 = check_element_one_liner(list_, 9)
and a slightly shorter version :)
check_shorter = lambda a, i: a.index(i) if i in a else -1
Using a library, you could use numpy's np.where(np.array(list) == variable); the comparison is only element-wise on a NumPy array, not on a plain list.
In vanilla Python, I can think of something like:
idx = [idx for idx, item in enumerate(list) if item == variable][0]
But this solution is not foolproof: if there are no matching results, it will raise an IndexError. You could guard against that with an if right before:
if variable in list:
    idx = [idx for idx, item in enumerate(list) if item == variable][0]
else:
    idx = None
I understand that you want to get a sublist containing only the elements of the original list that match a certain condition (in your example case, you want to extract all the elements that are equal to the first element of the list).
You can do that by using the built-in filter function which allows you to produce a new list containing only the elements that match a specific condition.
Here's an example:
a = [1,1,1,3,4]
variable = a[0]
b = list(filter(lambda x : x == variable, a)) # [1,1,1]
This answer assumes that you only search for one (the first) matching element in the list.
Using the index method of a list should be the way to go. You just have to wrap it in a try-except statement. Here is an alternative version using next.
def get_index(data, search):
    return next((index for index, value in enumerate(data) if value == search), None)
my_list = list('ABCDEFGH')
print(get_index(my_list, 'C'))
print(get_index(my_list, 'X'))
The output is
2
None
Assuming that you want to check that it exists and get its index, the most efficient way is to use list.index. It returns the index of the first matching item and otherwise raises a ValueError, so it can be used as follows:
items = [1,2,3,4,5]
item_index = None
try:
    item_index = items.index(3)  # look for 3 in the list
except ValueError:
    # do item not found logic
    print("item not found")  # example
else:
    # do item found logic knowing item_index
    print(items[item_index])  # example, prints 3
Also, please avoid naming variables list, as that shadows the built-in list type.
If you simply want to check if the number is in the list and print it or print its index, you could try this:
ls = [1,2,3]
num = 2
if num in ls:
    # to print the num
    print(num)
    # to print the index of num
    print(ls.index(num))
else:
    print('Number not in the list')
animals = ['cat', 'dog', 'rabbit', 'horse']
index = animals.index('dog')
print(index)

Contradictory outputs in simple recursive function

Note: Goal of the function is to remove duplicate(repeated) characters.
Now for the same given recursive function, different output pops out for different argument:
def rd(x):
    if x[0]==x[-1]:
        return x
    elif x[0]==x[1]:
        return rd(x[1: ])
    else:
        return x[0]+rd(x[1: ])
print("Enter a sentence")
r=raw_input()
print("simplified: "+rd(r))
This function works only if the duplicate characters occur within the first six characters of the string, for example:
if r=abcdeeeeeeefghijk or if r=abcdeffffffghijk
But if the duplicate characters come after the first six characters, the output is the same as the input. That means the function doesn't work with the following value of "r":
if r=abcdefggggggggghijkde (repeating characters are after the first six characters)
Your function doesn't work properly because of the first check, if x[0]==x[-1]. It compares the first and last characters of the current substring, which lets many cases slip through, such as affffffa or asdkkkkkk. Let's see why:
example 1: 'affffffa'
Here it is obvious: the first and last characters are both 'a', so the string is returned unchanged.
example 2: 'asdkkkkkk'
Here we hit case 3 of your function, again and again:
'a' +rd('sdkkkkkk')
'a'+'s' +rd('dkkkkkk')
'a'+'s'+'d' +rd('kkkkkk')
and when we reach 'kkkkkk' it stops, because the first and last characters are the same.
example 3: 'asdfhhhhf'
This is the same as example 2: the recursion chain arrives at fhhhhf, where the first and last characters are the same, so it is left untouched.
How to fix it? Simple: as others have already shown, check the length of the string first.
def rd(x):
    if len(x)<2: #if my string is 1 or less character long leave it untouched
        return x
    elif x[0]==x[1]:
        return rd(x[1: ])
    else:
        return x[0]+rd(x[1: ])
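As a sanity check, the length-based version can be run against the three example strings discussed in this answer. This is only a sketch that repeats the function so it runs on its own.

```python
def rd(x):
    # Base case: fewer than two characters, so there is nothing to compare.
    if len(x) < 2:
        return x
    # Drop the first character when the next one is identical.
    if x[0] == x[1]:
        return rd(x[1:])
    return x[0] + rd(x[1:])

for sample in ('affffffa', 'asdkkkkkk', 'asdfhhhhf'):
    print(sample, '->', rd(sample))
# affffffa -> afa
# asdkkkkkk -> asdk
# asdfhhhhf -> asdfhf
```

One caveat: each character costs one recursive call, so very long inputs can exceed Python's default recursion limit of 1000 frames.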
Here is an alternative, iterative way of doing the same thing: you can use the unique_justseen recipe from the itertools recipes
from itertools import groupby
from operator import itemgetter
def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return map(next, map(itemgetter(1), groupby(iterable, key)))
def clean(text):
    return "".join(unique_justseen(text))
Test:
>>> clean("abcdefggggggggghijk")
'abcdefghijk'
>>> clean("abcdefghijkkkkkkkk")
'abcdefghijk'
>>> clean("abcdeffffffghijk")
'abcdefghijk'
And if you don't want to import anything, here is another way:
def clean(text):
    result=""
    last=""
    for c in text:
        if c!=last:
            last = c
            result += c
    return result
The only issue I found with your code was the first if statement. I assume you used it to make sure that the string was at least 2 characters long. That check can be done with the built-in len(); in fact the whole function can be written another way, but we will leave it recursive for the OP's sake.
def rd(x):
    if len(x) < 2: #Modified to return if len < 2. accomplishes same as original code and more
        return x
    elif x[0]==x[1]:
        return rd(x[1: ])
    else:
        return x[0]+rd(x[1: ])
r=raw_input("Enter a sentence: ")
print("simplified: "+rd(r))
I would, however, recommend not making the function recursive and instead building a new string as follows. Note that this removes every repeated character, not only consecutive ones:
from collections import OrderedDict
def rd(string):
    # assuming order matters we use OrderedDict; no longer recursive
    # OrderedDict.fromkeys creates an ordered dict such as {'a': None}; duplicate keys collapse because it is a dict
    # iterating over the dict yields its keys in insertion order
    # ''.join concatenates those keys into the returned string
    return "".join(OrderedDict.fromkeys(string))
r=raw_input("Enter a sentence: ")
print("simplified: "+rd(r))
Your function is almost correct, but to handle the last letter the base case must check the length instead (using <= 1 also covers empty input):
def rd(x):
    if len(x) <= 1:
        return x
    elif x[0]==x[1]:
        return rd(x[1: ])
    else:
        return x[0]+rd(x[1: ])
print("Enter a sentence")
r=raw_input()
print("simplified: "+rd(r))

Python string function that removes one duplicate pair from multiple duplicates

I'm looking for a string function that removes one duplicate pair from multiple duplicates.
What I'd like the function to do:
input = ['a','a','a','b','b','c','d','d','d','d']
output = ['a','c']
Here's what I have so far:
def double(lijst):
    """
    returns all duplicates in the list as a set
    """
    res = set()
    zien = set()
    for x in lijst:
        if x in zien or zien.add(x):
            res.add(x)
    return(res)
def main():
    list_1 = ['a','a','a','b','b','c']
    list_2 = set(list_1)
    print(list_2 - double(list_1))
main()
The problem is that it removes all duplicates and doesn't leave the 'a'. Any ideas how to approach this problem?
For those interested in why I need this: I want to track when a Levenshtein function is processing vowel steps. If a vowel is being inserted or deleted, I want to assign a different value to that step (first I need to track whether a vowel has passed on either side of the matrix before the current step, though). Hence I need to remove duplicate pairs from a vowel list, as explained in the input/output example.
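This solves your problem. Take a look.
lsit = ['a','a','a','b','b','c']
for i in list(lsit):  # iterate over a copy, since lsit is modified inside the loop
    temp = lsit.count(i)
    if temp % 2 == 0:
        # even count: remove every copy
        for x in range(temp):
            lsit.remove(i)
    else:
        # odd count: leave a single copy
        for x in range(temp - 1):
            lsit.remove(i)
print(lsit)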
Output:
['a','c']
Just iterate through the list. If an element is not yet in the result set, add it. If it is already in the set, remove it, so that the two occurrences cancel out.
The code is simple:
def double(l):
    """
    returns the elements that occur an odd number of times, as a set
    """
    res = set()
    for x in l:
        if x in res:
            res.remove(x)
        else:
            res.add(x)
    return res
input = ['a','a','a','b','b','c','d','d','d','d']
print(double(input))
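If the result should be a list in order of first appearance, as in the question's expected output, a counting variant is one option. This is a sketch, not the answerer's method; the helper name `odd_ones` is made up.

```python
from collections import Counter

def odd_ones(items):
    # Hypothetical helper: keep one copy of every element that occurs
    # an odd number of times, in order of first appearance.
    counts = Counter(items)
    result = []
    for item in counts:  # a Counter keeps first-insertion order on Python 3.7+
        if counts[item] % 2 == 1:
            result.append(item)
    return result

print(odd_ones(['a', 'a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'd']))  # ['a', 'c']
```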

How would one alternately add 2 characters into a string in python?

Like, for example, I have the string '12345' and the string '+*' and I want to make it so that the new string would be '1+2*3+4*5', alternating between the two characters in the second string. I know how to do it with one character using join(), but I just can't figure out how to do it with both alternating. Any help would be greatly appreciated. Thanks!
You could use itertools.cycle() to forever alternate between the characters:
from itertools import cycle
result = ''.join([c for pair in zip(inputstring, cycle('+*')) for c in pair])[:-1]
You do need to remove that last + added on, but this does work just fine otherwise:
>>> from itertools import cycle
>>> inputstring = '12345'
>>> ''.join([c for pair in zip(inputstring, cycle('+*')) for c in pair])[:-1]
'1+2*3+4*5'
import itertools
s = '12345'
op = '+*'
answer = ''.join(itertools.chain.from_iterable(zip(s, itertools.cycle(op))))[:-1]
print(answer)
Output:
1+2*3+4*5
You could use this code:
string = "12345"
separator = "+*"
result = ""
for i, c in enumerate(string):  # enumerate yields (index, character) tuples
    result += c  # append character
    if i == len(string) - 1:  # last character reached
        break
    if i % 2 == 0:  # even index
        result += separator[0]  # append +
    else:
        result += separator[1]  # append *
print(result)  # output "1+2*3+4*5"
The following works without having to trim the end (Python 2; on Python 3 the function is named zip_longest and the division must be len('12345') // 2).
from itertools import izip_longest, repeat
''.join(map(lambda x: x[0] + x[1], izip_longest('12345', ''.join(repeat('+*', len('12345') / 2)), fillvalue='')))
From the Python documentation:
itertools.izip_longest(*iterables[, fillvalue]): Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted.
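The same idea can be sketched for Python 3, where izip_longest is named zip_longest. The helper name `interleave` is an assumption for illustration; it also combines cycle and islice so the separator string can have any length.

```python
from itertools import cycle, islice, zip_longest

def interleave(text, separators):
    # Take exactly len(text) - 1 separators, cycling through them in order.
    seps = islice(cycle(separators), max(len(text) - 1, 0))
    # zip_longest pads the shorter separator stream with '' for the final character.
    return ''.join(char + sep for char, sep in zip_longest(text, seps, fillvalue=''))

print(interleave('12345', '+*'))  # 1+2*3+4*5
```

Because only len(text) - 1 separators are ever produced, nothing has to be trimmed from the end.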

Skip an iteration while looping through a list - Python

Is there a way to skip the first iteration in this for-loop, so that I can put a for-loop inside a for-loop in order to compare the first element in the list with the rest of them.
from collections import Counter
vowelCounter = Counter()
vowelList = {'a','e','i','o','u'}
userString = input("Enter a string ")
displayed = False
for letter in userString:
    letter = letter.lower()
    if letter in vowelList:
        vowelCounter[letter] +=1
for vowelCount1 in vowelCounter.items():
    char, count = vowelCount1
    for vowelCount2 in vowelCounter.items(STARTING AT 2)
        char2, count2 = vowelCount2
        if count > count2 : CONDITION
How would the syntax go for this? I only need a for-loop nested 5 deep. So the next would start at 3, then at 4, then at 5, followed by the correct print statement depending on the condition.
Thanks
You could do:
for vowelCount2 in list(vowelCounter.items())[1:]:
This will give you all the elements of vowelCounter.items() except the first one. (In Python 3, items() returns a view that cannot be sliced, hence the list() call.)
The [1:] means you're slicing the list: start at index 1 instead of at index 0. As such you're excluding the first element from the list.
If you want the index to depend on the previous loop you can do:
items = list(vowelCounter.items())
for i, vowelCount1 in enumerate(items):
    # ...
    for vowelCount2 in items[i + 1:]:
        # ...
This means the starting index i + 1 depends on the index of vowelCount1, so each element is compared only with those after it. The function enumerate(mylist) gives you an index and an element of the list each time as you're iterating over mylist.
It looks like what you want is to compare each count to every other count. While you can do what you suggested, a more succinct way might be to use itertools.combinations:
import itertools
for v1, v2 in itertools.combinations(vowelCounter, 2):
    if vowelCounter[v1] > vowelCounter[v2]:
        # ...
This will iterate over all pairs of vowels for comparison. Doing it this way, you may also want to check if vowelCounter[v2] > vowelCounter[v1] as you won't see these two again (this goes for this method or the nested for loop method). Or, you can use the itertools.permutations function with the same arguments and just one check would suffice.
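The difference between the two functions can be seen on a small example. The counts below are made up for illustration and do not come from the question.

```python
from collections import Counter
from itertools import combinations, permutations

counts = Counter({'a': 3, 'e': 1, 'o': 2})  # made-up vowel counts

# combinations yields each unordered pair once; permutations yields both orders.
unordered = list(combinations(counts, 2))
ordered = list(permutations(counts, 2))
print(len(unordered), len(ordered))  # 3 6

# With permutations, a single "greater than" check covers every direction.
greater = [(v1, v2) for v1, v2 in ordered if counts[v1] > counts[v2]]
print(greater)  # [('a', 'e'), ('a', 'o'), ('o', 'e')]
```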
To skip an iteration you can use the continue keyword, e.g.:
values = [1,2,3,4,5,6,7,8,9,10]
for value in values:
    if value == values[0]:
        continue
    print(value)
Would give you:
2
3
4
5
6
7
8
9
10
I hope this answers your question.
Slicing a list with [1:], as suggested by a few others, creates a new list. For large lists it is more economical to use a slice iterator with itertools.islice(), which does not copy:
from itertools import islice
for car in islice(cars, 1, None):
    # do something
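A runnable version of the islice idea, using sample data that is not from the question, might look like this. It also shows how the start index can depend on the outer loop.

```python
from itertools import islice

cars = ['audi', 'bmw', 'fiat', 'volvo']  # sample data, assumed for illustration

# islice(cars, 1, None) yields items from index 1 to the end without building a new list.
rest = list(islice(cars, 1, None))
print(rest)  # ['bmw', 'fiat', 'volvo']

# Compare each element only with the ones after it.
pairs = [(first, other)
         for i, first in enumerate(cars)
         for other in islice(cars, i + 1, None)]
print(len(pairs))  # 6
```

islice still steps over the skipped items one by one, so the saving is in memory rather than in the number of iteration steps.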
