I'm looking for a string function that removes one duplicate pair from multiple duplicates.
What I'd like the function to do:
input = ['a','a','a','b','b','c','d','d','d','d']
output = ['a','c']
Here's what I have so far:
def double(lijst):
    """
    returns all duplicates in the list as a set
    """
    res = set()
    zien = set()
    for x in lijst:
        if x in zien or zien.add(x):
            res.add(x)
    return res

def main():
    list_1 = ['a','a','a','b','b','c']
    list_2 = set(list_1)
    print(list_2 - double(list_1))

main()
The problem being that it removes all duplicates, and doesn't leave the 'a'. Any ideas how to approach this problem?
For those interested in why I need this: I want to track when a Levenshtein function is processing vowel steps. If a vowel is being inserted or deleted, I want to assign a different value to that step (first, though, I need to track whether a vowel has passed on either side of the matrix before the current step). Hence I need to remove duplicate pairs from a vowel list, as shown in the input/output example.
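This solves your problem. Take a look.
lsit = ['a','a','a','b','b','c']
# iterate over a copy, because lsit is modified inside the loop
for i in lsit[:]:
    temp = lsit.count(i)
    if temp % 2 == 0:
        # even count: every occurrence pairs up, so remove them all
        for x in range(temp):
            lsit.remove(i)
    else:
        # odd count: keep exactly one occurrence
        for x in range(temp - 1):
            lsit.remove(i)
print(lsit)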
Output:
['a','c']
Just iterate through the list. If an element is not yet in the result set, add it. If it is already in the set, remove it, so that the two occurrences cancel each other out.
The code is simple:
def double(l):
    """
    returns the elements that occur an odd number of times, as a set
    """
    res = set()
    for x in l:
        if x in res:
            res.remove(x)
        else:
            res.add(x)
    return res

input = ['a','a','a','b','b','c','d','d','d','d']
print(double(input))
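A set does not remember the order in which elements appeared. If that order matters, a minimal sketch along the same lines, assuming Python 3.7+ (where `Counter` keeps first-insertion order) and using the made-up helper name `odd_leftovers`, could count each element and keep those that occur an odd number of times:

```python
from collections import Counter


def odd_leftovers(items):
    """Return the elements left over after cancelling duplicate pairs.

    An element survives only if it occurs an odd number of times.
    The result follows the order of first appearance.
    """
    counts = Counter(items)
    return [item for item, n in counts.items() if n % 2 == 1]


print(odd_leftovers(['a', 'a', 'a', 'b', 'b', 'c', 'd', 'd', 'd', 'd']))
# ['a', 'c']
```

This returns a list rather than a set, which may or may not be what the caller wants.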
I have lists that are either empty or filled with data. I am trying to store the last element of the list in a variable. If there are elements in the list, it works fine. However, when I pass in an empty list [], I get an error like: IndexError: list index out of range. Which syntax should I be using for []?
ids = [
    'abc123',
    'ab233',
    '23231ad',
    'a23r2d23'
]
ids = []
# I tried these for empty
final = [ids if [] else ids[-1]] #error
# final = [ids if ids == None else ids == ids[-1]] # error
# final = [ids if ids == [] else ids == ids[-1]] # gives [[]] instead of []
print(final)
Basically, if ids is an empty list, I need it to give []. If there are elements, then give the last element, which is working.
Here is one way to do this:
final = ids[-1] if ids else None
(Replace None with the value you'd like final to take when the list is empty.)
You can check for an empty list with the expression below.
data = []
if data:  # an empty list evaluates to False
    print("list has elements")
else:
    print("list is empty")
So what you can do is:
final = data[-1] if data else []
print(final)
final = ids[-1] if len(ids) > 0 else []
This will handle the immediate problem. Please work through class materials or tutorials a little more for individual techniques. For instance, your expression ids if [] doesn't do what you (currently) seem to think: it does not check ids against the empty list. All it does is check whether that empty list literal is "truthy", and an empty list evaluates to False, so the else branch ids[-1] always runs.
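To make the truthiness point concrete, here is a small sketch. The helper name `last_or_default` is made up for illustration; it simply wraps the conditional expression from the answers above.

```python
def last_or_default(items, default=None):
    """Return the last element of items, or default if items is empty."""
    return items[-1] if items else default


# An empty list is falsy, so `ids if [] else ids[-1]` always takes the
# else branch, whatever ids contains.
print(bool([]))                               # False
print(last_or_default(['abc123', 'ab233']))   # ab233
print(last_or_default([], default=[]))        # []
```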
You are getting the error because you won't be able to select the last item if the list is empty, and it will rightfully throw an IndexError.
Try this example
ids = [[i for i in range(10)] for x in range(3)]
ids.append([])
last_if_not_empty = [i[-1] for i in ids if i]
Here the condition if i keeps only the non-empty lists, so the empty ones are filtered out. From there you can pick out the last element of each list.
a = ids[-1] if len(ids) != 0 else 0
In the following functions I tried to simulate something like a breadth-first search.
The first function, Hallo, takes a search name V and is supposed to give you, as a list, all the partners that V is paired with in the tuples.
The second function is supposed to keep two lists: one for the items already visited (besucht) and one for the items still waiting to be processed, i.e. the queue (Liste).
With two loops I go through the list newly generated for the given word, so that I can use those words to generate further tuple partners that aren't in the besucht list yet. After an item has been used, it is deleted from Liste and appended to besucht.
Question: It doesn't work as I intended and I don't understand why.
V = {'CGN', 'FRA','KEL','MUC','PAD','SFX','STR','TXL'}
E = {('SFX','TXL'),('FRA','CGN'),('FRA','MUC'),('FRA','STR'),('CGN','FRA'),('STR','FRA'),('TXL','SFX'),('CGN','PAD'),('PAD','KEL'),('MUC','KEL'),('KEL','STR') }
S = {('A','B'),('A','B')}
def Hallo(V,E):
    Total = []
    Que = []
    for i in E:
        for j in i:
            if j == V:
                Total.append(i)
    for i in Total:
        for a in i:
            if a != V:
                if a not in Que:
                    Que.append(a)
    return Que

def Durchsuchen(V,E):
    besucht = []
    Liste = []
    Liste.append(Hallo(V,E))
    besucht.append(V)
    while len(Liste) != 0:
        for i in Liste:
            if i not in besucht:
                besucht.append(i)
                Liste.remove(i)
                Liste.append(Hallo(i,E))
    print(Liste)
    print(besucht)

print(Durchsuchen('FRA',E))
What is it supposed to do? It's supposed to give you all the possibilities, i.e. if you pass in 'FRA' it will generate [MUC, STR, CGN], and since MUC etc. are in this list, it should then also give you KEL, for example. In other words, all the possible options that can be reached.
Well your problem is that you use remove() on a list inside a loop that iterates through it - which should never be done. This changes the logic that the loop relies on and causes your unexpected results.
The first function works just fine, so I'd suggest you use it once on your first word 'FRA', and then use it on every word that pairs with it ([MUC, STR, CGN] in your example). Then use the following code on the lists that you got:
newList = []
for ls in listOfLists:
    for word in ls:
        if word not in newList:
            newList.append(word)
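The same idea can also be written as a breadth-first search that never removes items from a list while looping over it. This is only a sketch: the function name `reachable` is made up for illustration, and it assumes, as `Hallo` does, that each pair in `E` connects its two entries in both directions.

```python
from collections import deque


def reachable(start, edges):
    """Return every node reachable from start, in breadth-first order."""
    # Build an adjacency map, treating each pair as undirected.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    visited = [start]        # nodes already seen, in visit order
    queue = deque([start])   # nodes whose neighbours are still unexplored
    while queue:
        node = queue.popleft()
        # sorted() only makes the visit order deterministic
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in visited:
                visited.append(nxt)
                queue.append(nxt)
    return visited


E = {('SFX', 'TXL'), ('FRA', 'CGN'), ('FRA', 'MUC'), ('FRA', 'STR'),
     ('CGN', 'FRA'), ('STR', 'FRA'), ('TXL', 'SFX'), ('CGN', 'PAD'),
     ('PAD', 'KEL'), ('MUC', 'KEL'), ('KEL', 'STR')}
print(reachable('FRA', E))
# ['FRA', 'CGN', 'MUC', 'STR', 'PAD', 'KEL']
```

The queue is only ever consumed with `popleft()` and extended with `append()`, so no loop iterates over a list that is being modified.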
I looked this up and found a close example, but the answer at this link (Remove adjacent duplicate elements from a list) doesn't pass the test cases for this problem. So this is all I have so far:
def remove_dups(thelist):
    """Returns: a COPY of thelist with adjacent duplicates removed.
    Example: for thelist = [1,2,2,3,3,3,4,5,1,1,1],
    the answer is [1,2,3,4,5,1]
    Precondition: thelist is a list of ints"""
    i = 1
    if len(thelist) == 0:
        return []
    elif len(thelist) == 1:
        return thelist
    elif thelist[i] == thelist[i-1]:
        del thelist[i]
    return remove_dups(thelist[i:])

def test_remove_dups():
    assert_equals([], remove_dups([]))
    assert_equals([3], remove_dups([3,3]))
    assert_equals([4], remove_dups([4]))
    assert_equals([5], remove_dups([5, 5]))
    assert_equals([1,2,3,4,5,1], remove_dups([1,2,2,3,3,3,4,5,1,1,1]))
    # test for whether the code is really returning a copy of the original list
    mylist = [3]
    assert_equals(False, mylist is remove_dups(mylist))
EDIT: While I do understand that the accepted answer linked above using itertools.groupby would work, I think it wouldn't teach me what's wrong with my code and would defeat the purpose of the exercise if I imported groupby from itertools.
from itertools import groupby

def remove_dups(lst):
    return [k for k, items in groupby(lst)]
If you really want a recursive solution, I would suggest something like
def remove_dups(lst):
    if lst:
        firstval = lst[0]
        # find lowest index of val != firstval
        for index, value in enumerate(lst):
            if value != firstval:
                return [firstval] + remove_dups(lst[index:])
        # no such value found
        return [firstval]
    else:
        # empty list
        return []
Your assertion fails, because in
return thelist
you are returning the same list, and not a copy as specified in the comments.
Try:
return thelist[:]
When using recursion with a list, it is most of the time a matter of returning a sub-list or part of that list, which makes testing for an empty list the termination case. Then you have two cases:
The current value is different from the last one we saw, so we want to keep it.
The current value is the same as the last one we saw, so we discard it and keep iterating on the "rest" of the values.
Which translates into this code:
l = [1,2,2,3,3,3,4,5,1,1,1]

def dedup(values, uniq):
    # The list of values is empty our work here is done
    if not values:
        return uniq
    # We add a value in 'uniq' for two reasons:
    # 1/ it is empty and we need to start somewhere
    # 2/ it is different from the last value that was added
    if not uniq or values[0] != uniq[-1]:
        uniq.append(values.pop(0))
        return dedup(values, uniq)
    # We just added the exact same value so we remove it from 'values' and
    # move to the next iteration
    return dedup(values[1:], uniq)

print(dedup(l, []))  # output: [1, 2, 3, 4, 5, 1]
The problem is with your return statement. You are returning
return remove_dups(thelist[i:])
so the output will always be just the last element of the list. For the example above:
>>> print(remove_dups([1,2,2,3,3,3,4,5,1,1,1]))  # desired: [1,2,3,4,5,1]
[1]
It finally returns a single-element list because the 0th element is never kept.
Here is a recursive solution:
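def remove_dups(lst):
    if len(lst) > 1:
        if lst[0] != lst[1]:
            return [lst[0]] + remove_dups(lst[1:])
        # lst[0] duplicates lst[1]: skip it without modifying the caller's list
        return remove_dups(lst[1:])
    else:
        # return a copy, as the exercise requires
        return lst[:]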
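For comparison, here is a non-recursive sketch. It is not what the exercise asks for, and the name `remove_dups_iter` is made up here, but it builds a new list, so the "returns a copy" requirement holds automatically.

```python
def remove_dups_iter(thelist):
    """Return a new list with adjacent duplicates removed."""
    result = []
    for value in thelist:
        # Keep a value only if it differs from the last value kept.
        if not result or result[-1] != value:
            result.append(value)
    return result


mylist = [3]
print(remove_dups_iter([1, 2, 2, 3, 3, 3, 4, 5, 1, 1, 1]))  # [1, 2, 3, 4, 5, 1]
print(remove_dups_iter(mylist) is mylist)                   # False
```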
I am trying to remove duplicates from a list of unicode strings without changing the order in which the elements appear (so I don't want to use set).
Program:
result = [u'http://google.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html',u'http://amazon.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://yahoo.com']
result.reverse()
for e in result:
    count_e = result.count(e)
    if count_e > 1:
        for i in range(0, count_e - 1):
            result.remove(e)
result.reverse()
print(result)
Output:
[u'http://google.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html', u'http://amazon.com', u'http://yahoo.com']
Expected Output:
[u'http://google.com', u'http://catb.org/~esr/faqs/hacker-howto.html', u'http://amazon.com', u'http://yahoo.com']
So, is there any way of doing this as simply as possible?
You actually don't have duplicates in your list. One time you have http://catb.org while another time you have http://www.catb.org.
You'll have to figure a way to determine whether the URL has www. in front or not.
You can create a new list and add items to it if they're not already in it.
result = [...]  # some list items
uniq = []
for item in result:
    if item not in uniq:
        uniq.append(item)
You could use a set and then sort it by the original index:
sorted(set(result), key=result.index)
This works because index returns the first occurrence (so it keeps them in order according to first appearance in the original list)
I also notice that one of the strings in your original isn't a unicode string. So you might want to do something like:
u = [unicode(s) for s in result]
return sorted(set(u), key=u.index)
EDIT: 'http://google.com' and 'http://www.google.com' are not string duplicates. If you want to treat them as such, you could do something like:
def remove_www(s):
    s = unicode(s)
    prefix = u'http://'
    # u'http://www.' is 11 characters long, u'http://' is 7
    suffix = s[11:] if s.startswith(u'http://www.') else s[7:]
    return prefix+suffix
And then replace the earlier code with
u = [remove_www(s) for s in result]
return sorted(set(u), key=u.index)
Here is a method that modifies result in place:
result = [u'http://google.com', u'http://catb.org/~esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html',u'http://amazon.com', 'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://yahoo.com']
seen = set()
i = 0
while i < len(result):
    if result[i] not in seen:
        seen.add(result[i])
        i += 1
    else:
        del result[i]
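If building a new list is acceptable, the same `seen` idea can take a key function that decides what counts as a duplicate. The sketch below assumes Python 3. `unique_by` and `strip_www` are hypothetical helpers, and dropping a leading `www.` is just one possible way to normalise the URLs.

```python
def strip_www(url):
    """Normalise a URL by dropping a leading 'www.' after 'http://'."""
    prefix = 'http://www.'
    if url.startswith(prefix):
        return 'http://' + url[len(prefix):]
    return url


def unique_by(items, key=lambda item: item):
    """Return items in original order, keeping the first item per key."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


urls = ['http://google.com',
        'http://catb.org/~esr/faqs/hacker-howto.html',
        'http://www.catb.org/~esr/faqs/hacker-howto.html',
        'http://amazon.com',
        'http://google.com']
print(unique_by(urls, key=strip_www))
# ['http://google.com', 'http://catb.org/~esr/faqs/hacker-howto.html', 'http://amazon.com']
```

Each membership test against the set takes roughly constant time, so this avoids the repeated list scans that `item not in uniq` or `result.index` perform on long lists.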
For a string such as '12233322155552', by removing the duplicates, I can get '1235'.
But what I want to keep is '1232152', only removing the consecutive duplicates.
import re
# Only repeated numbers
answer = re.sub(r'(\d)\1+', r'\1', '12233322155552')
# Any repeated character
answer = re.sub(r'(.)\1+', r'\1', '12233322155552')
You can use itertools; here is the one-liner:
>>> import itertools
>>> s = '12233322155552'
>>> ''.join(i for i, _ in itertools.groupby(s))
'1232152'
Microsoft / Amazon job interview type of question:
This is the pseudocode; the actual code is left as an exercise.
for each char in the string do:
    if the current char is equal to the next char:
        delete next char
    else
        continue
return string
At a higher level, try (not an actual implementation):
for s in string:
    if s == s+1:  ## check until the end of the string
        delete s+1
Hint: the itertools module is super-useful. One function in particular, itertools.groupby, might come in really handy here:
itertools.groupby(iterable[, key])
Make an iterator that returns consecutive keys and groups from
the iterable. The key is a function computing a key value for each
element. If not specified or is None, key defaults to an identity
function and returns the element unchanged. Generally, the iterable
needs to already be sorted on the same key function.
So since strings are iterable, what you could do is:
use groupby to collect neighbouring elements
extract the keys from the iterator returned by groupby
join the keys together
which can all be done in one clean line..
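To see what `groupby` yields before the keys are joined, a small sketch (the helper name `runs` is made up here) lists each run of equal neighbours together with its length:

```python
from itertools import groupby


def runs(text):
    """Return (character, run length) pairs for consecutive equal characters."""
    return [(key, len(list(group))) for key, group in groupby(text)]


print(runs('12233322155552'))
# [('1', 1), ('2', 2), ('3', 3), ('2', 2), ('1', 1), ('5', 4), ('2', 1)]
```

Because `groupby` only merges neighbours, the separate runs of '2' stay separate, which is exactly why joining the keys gives '1232152' rather than '1235'.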
First of all, you can't remove anything from a string in Python (google "Python immutable string" if this is not clear).
My first approach would be:
foo = '12233322155552'
bar = ''
for ch in foo:
    if bar == '' or ch != bar[-1]:
        bar += ch
or, using the itertools hint from above:
from itertools import groupby
''.join([k[0] for k in groupby(foo)])
+1 for groupby. Off the cuff, something like:
from itertools import groupby

def remove_dupes(arg):
    # create generator of distinct characters, ignore grouper objects
    unique = (i[0] for i in groupby(arg))
    return ''.join(unique)
Cooks for me in Python 2.7.2
number = '12233322155552'
temp_list = []
for item in number:
    if len(temp_list) == 0:
        temp_list.append(item)
    elif len(temp_list) > 0:
        if temp_list[-1] != item:
            temp_list.append(item)
print(''.join(temp_list))
This would be a way:
def fix(a):
    chars = []  # avoid naming this 'list', which would shadow the built-in
    for element in a:
        # fill the list if the list is empty
        if len(chars) == 0:
            chars.append(element)
        # check with the last element of the list
        if chars[-1] != element:
            chars.append(element)
    print(''.join(chars))
a= 'GGGGiiiiniiiGinnaaaaaProtijayi'
fix(a)
# output => GiniGinaProtijayi
import re
t = '12233322155552'
for i in t:
    dup = re.escape(i) * 2  # escape so characters like '.' are matched literally
    t = re.sub(dup, i, t)
print(t)
You can get final output as 1232152