Remove duplicates with a different equality test in Python

I'm looking for a Python function similar to Haskell's nubBy, which removes duplicates using a caller-supplied equality test.
The function would take the equality test and the list as parameters, and would return the list of elements with no duplicates.
Example:
In [1]: remove(lambda x, y: x+y == 12, [2, 3, 6, 9, 10])
Out[1]: [2,3,6]
For example, here (2 and 10) and (9 and 3) are duplicates. I don't care if the output is [10, 9, 6] or [2, 3, 6].
Is there an equivalent built-in function in Python? If not, what is the best way to efficiently implement it?

There is no built-in method (as the use case is rather esoteric), but you can easily write one:
def removeDups(duptest, iterable):
    res = []
    for e in iterable:
        if not any(duptest(e, r) for r in res):
            res.append(e)
    return res
Now, in the console:
>>> removeDups(lambda x,y: x+y == 10, [2,3,5,7,8])
[2, 3, 5]
>>> removeDups(lambda x,y: x+y == 10, [2,3,6,7,8])
[2, 3, 6]
>>> removeDups(lambda x, y: x+y == 12, [2, 3, 6, 9, 10])
[2, 3, 6]
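If your duplicate test can be rephrased as a key function (here, any pair summing to 12 shares the key min(x, 12 - x)), you can drop to linear time with a set of seen keys. A sketch; the helper name is mine, not from the thread:
def remove_dups_by_key(key, iterable):
    seen = set()  # keys already encountered
    res = []
    for e in iterable:
        k = key(e)
        if k not in seen:
            seen.add(k)
            res.append(e)
    return res

>>> remove_dups_by_key(lambda x: min(x, 12 - x), [2, 3, 6, 9, 10])
[2, 3, 6]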

This remove function will allow you to specify any pairwise equality function. It will keep the last of each set of duplicates.
values = [2, 3, 5, 7, 8]

def addstoten(item, other):
    return item + other == 10

def remove(eq, values):
    values = tuple(values)
    for index, item in enumerate(values):
        if not any(eq(item, other) for other in values[index + 1:]):
            yield item

print(list(remove(addstoten, values)))

Related

Identifying certain elements which are not in one list in Python [duplicate]

I want to take the difference between lists x and y:
>>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 3, 5, 7, 9]
>>> x - y
# should return [0, 2, 4, 6, 8]
Use a list comprehension to compute the difference while maintaining the original order from x:
[item for item in x if item not in y]
If you don't need list properties (e.g. ordering), use a set difference, as the other answers suggest:
list(set(x) - set(y))
To allow x - y infix syntax, override __sub__ on a class inheriting from list:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(args)
    def __sub__(self, other):
        return self.__class__(*[item for item in self if item not in other])
Usage:
x = MyList(1, 2, 3, 4)
y = MyList(2, 5, 2)
z = x - y
Use set difference
>>> z = list(set(x) - set(y))
>>> z
[0, 8, 2, 4, 6]
Or you might just have x and y be sets so you don't have to do any conversions.
If duplicates and ordering matter:
a = [1, 2, 3, 3, 3, 3, 4]
b = [1, 3]
[i for i in a if i not in b or b.remove(i)]
# result: [2, 3, 3, 3, 4]
(For each match, b.remove(i) returns None, which is falsy, so the item is dropped from the output while one occurrence is consumed from b. Note that this mutates b.)
That is a "set subtraction" operation. Use the set data structure for that.
In Python 2.7:
x = {1,2,3,4,5,6,7,8,9,0}
y = {1,3,5,7,9}
print x - y
Output:
>>> print x - y
set([0, 8, 2, 4, 6])
For many use cases, the answer you want is:
ys = set(y)
[item for item in x if item not in ys]
This is a hybrid between aaronasterling's answer and quantumSoup's answer.
aaronasterling's version does len(y) item comparisons for each element in x, so it takes quadratic time. quantumSoup's version uses sets, so it does a single constant-time set lookup for each element in x—but, because it converts both x and y into sets, it loses the order of your elements.
By converting only y into a set, and iterating x in order, you get the best of both worlds—linear time, and order preservation.*
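A quick, unscientific way to see the gap yourself (a sketch using timeit; the exact numbers depend on your machine):
import timeit

x = list(range(10_000))
y = list(range(0, 10_000, 2))

print(timeit.timeit("[i for i in x if i not in y]",
                    globals=globals(), number=5))  # list lookup: quadratic overall
print(timeit.timeit("ys = set(y)\n[i for i in x if i not in ys]",
                    globals=globals(), number=5))  # set lookup: linear overall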
However, this still has a problem from quantumSoup's version: It requires your elements to be hashable. That's pretty much built into the nature of sets.** If you're trying to, e.g., subtract a list of dicts from another list of dicts, but the list to subtract is large, what do you do?
If you can decorate your values in some way that they're hashable, that solves the problem. For example, with a flat dictionary whose values are themselves hashable:
ys = {tuple(item.items()) for item in y}
[item for item in x if tuple(item.items()) not in ys]
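For instance, with some hypothetical flat dicts:
x = [{'a': 1}, {'b': 2}, {'c': 3}]
y = [{'b': 2}]
ys = {tuple(item.items()) for item in y}
print([item for item in x if tuple(item.items()) not in ys])
# [{'a': 1}, {'c': 3}]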
If your types are a bit more complicated (e.g., often you're dealing with JSON-compatible values, which are hashable, or lists or dicts whose values are recursively the same type), you can still use this solution. But some types just can't be converted into anything hashable.
If your items aren't, and can't be made, hashable, but they are comparable, you can at least get log-linear time (O(N*log M), which is a lot better than the O(N*M) time of the list solution, but not as good as the O(N+M) time of the set solution) by sorting and using bisect:
import bisect

ys = sorted(y)
def bisect_contains(seq, item):
    # bisect_left finds the first position where item could be inserted,
    # which points at item itself if it is present
    index = bisect.bisect_left(seq, item)
    return index < len(seq) and seq[index] == item

[item for item in x if not bisect_contains(ys, item)]
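For example, with lists as items (unhashable, but comparable), using hypothetical data:
x = [[1], [2], [3], [4]]
y = [[2], [4]]
ys = sorted(y)
print([item for item in x if not bisect_contains(ys, item)])
# [[1], [3]]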
If your items are neither hashable nor comparable, then you're stuck with the quadratic solution.
* Note that you could also do this by using a pair of OrderedSet objects, for which you can find recipes and third-party modules. But I think this is simpler.
** The reason set lookups are constant time is that all it has to do is hash the value and see if there's an entry for that hash. If it can't hash the value, this won't work.
If the lists allow duplicate elements, you can use Counter from collections:
from collections import Counter
result = list((Counter(x)-Counter(y)).elements())
If you need to preserve the order of elements from x:
result = [v for c in [Counter(y)] for v in x if not c[v] or c.subtract([v])]
(For each v with a positive count, c.subtract([v]) returns None, which is falsy, so the element is dropped from the result while one count is consumed.)
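For example, with hypothetical inputs containing duplicates:
from collections import Counter

x = [1, 2, 2, 2, 3]
y = [2, 2]
print(list((Counter(x) - Counter(y)).elements()))
# [1, 2, 3]  (exactly two 2s were removed)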
The other solutions have one of a few problems:
They don't preserve order, or
They don't remove a precise count of elements, e.g. for x = [1, 2, 2, 2] and y = [2, 2] they convert y to a set, and either remove all matching elements (leaving [1] only) or remove one of each unique element (leaving [1, 2, 2]), when the proper behavior would be to remove 2 twice, leaving [1, 2], or
They do O(m * n) work, where an optimal solution can do O(m + n) work
Alain was on the right track with Counter to solve #2 and #3, but that solution will lose ordering. The solution that preserves order (removing the first n copies of each value for n repetitions in the list of values to remove) is:
from collections import Counter

x = [1, 2, 3, 4, 3, 2, 1]
y = [1, 2, 2]

remaining = Counter(y)

out = []
for val in x:
    if remaining[val]:
        remaining[val] -= 1
    else:
        out.append(val)

# out is now [3, 4, 3, 1], having removed the first 1 and both 2s.
To make it remove the last copies of each element, just change the for loop to for val in reversed(x): and add out.reverse() immediately after exiting the for loop.
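A sketch of that variant, reusing x, y, and Counter from above:
remaining = Counter(y)
out = []
for val in reversed(x):
    if remaining[val]:
        remaining[val] -= 1
    else:
        out.append(val)
out.reverse()
# out is now [1, 3, 4, 3], having removed the last 1 and both 2s.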
Constructing the Counter is O(m) in the length of y, iterating x is O(n) in its length, and Counter membership testing and mutation are O(1), while list.append is amortized O(1) (a given append can be O(n), but across many appends the average is O(1), since few of them trigger a reallocation), so the overall work done is O(m + n).
You can also determine whether any elements of y were not removed from x by testing:
remaining = +remaining  # unary plus removes all keys with zero counts from the Counter
if remaining:
    # remaining contains the elements with non-zero counts
Looking up values in a set is faster than looking them up in a list, but build the set once, outside the comprehension; written inline as set(y), it would be rebuilt for every element of x:
ys = set(y)
[item for item in x if item not in ys]
This scales much better than:
[item for item in x if item not in y]
Both preserve the order of x.
We can use set methods as well to find the difference between two lists:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [1, 3, 5, 7, 9]
list(set(x).difference(y))
[0, 2, 4, 6, 8]
Try this.
def subtract_lists(a, b):
    """Subtracts two lists. Throws ValueError if b contains items not in a."""
    # Terminate if b is empty; otherwise remove b[0] from a and recurse
    return a if len(b) == 0 else [a[:i] + subtract_lists(a[i+1:], b[1:])
                                  for i in [a.index(b[0])]][0]
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> y = [1,3,5,7,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0]
>>> x = [1,2,3,4,5,6,7,8,9,0,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0, 9] #9 is only deleted once
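And the advertised ValueError, raised by a.index when an item of b is missing from a:
>>> subtract_lists([1, 2], [3])
Traceback (most recent call last):
  ...
ValueError: 3 is not in list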
The answer provided by @aaronasterling looks good; however, it is not compatible with the default interface of list: x = MyList(1, 2, 3, 4) vs x = MyList([1, 2, 3, 4]). The version below behaves like a regular Python list:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(*args)
    def __sub__(self, other):
        return self.__class__([item for item in self if item not in other])
Example:
x = MyList([1, 2, 3, 4])
y = MyList([2, 5, 2])
z = x - y
from collections import Counter
y = Counter(y)
x = Counter(x)
print(list(x-y))
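Note that iterating a Counter yields each key once, so list(x - y) loses multiplicity; use .elements() if repeats should survive. A quick illustration with hypothetical values:
>>> from collections import Counter
>>> list(Counter([1, 2, 2, 2]) - Counter([2]))
[1, 2]
>>> list((Counter([1, 2, 2, 2]) - Counter([2])).elements())
[1, 2, 2]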
Let:
>>> xs = [1, 2, 3, 4, 3, 2, 1]
>>> ys = [1, 3, 3]
Keep each unique item only once   xs - ys == {2, 4}
Take the set difference:
>>> set(xs) - set(ys)
{2, 4}
Remove all occurrences   xs - ys == [2, 4, 2]
>>> [x for x in xs if x not in ys]
[2, 4, 2]
If ys is large, convert only¹ ys into a set for better performance:
>>> ys_set = set(ys)
>>> [x for x in xs if x not in ys_set]
[2, 4, 2]
Only remove same number of occurrences   xs - ys == [2, 4, 2, 1]
from collections import Counter

def diff(xs, ys):
    counter = Counter(ys)
    for x in xs:
        if counter[x] > 0:
            counter[x] -= 1
            continue
        yield x
>>> list(diff(xs, ys))
[2, 4, 2, 1]
¹ Converting xs to a set and taking the set difference is unnecessary (and slower, as well as order-destroying), since we only need to iterate once over xs.
This example subtracts two lists of point pairs:
# List of pairs of points
pairs = []
pairs.append([(602, 336), (624, 365)])
pairs.append([(635, 336), (654, 365)])
pairs.append([(642, 342), (648, 358)])
pairs.append([(644, 344), (646, 356)])
pairs.append([(653, 337), (671, 365)])
pairs.append([(728, 13), (739, 32)])
pairs.append([(756, 59), (767, 79)])

items_to_remove = []
items_to_remove.append([(642, 342), (648, 358)])
items_to_remove.append([(644, 344), (646, 356)])

print("Initial List Size: ", len(pairs))

for a in items_to_remove:
    for b in pairs[:]:  # iterate over a copy so removing is safe
        if a == b:
            pairs.remove(b)

print("Final List Size: ", len(pairs))
list1 = ['a', 'c', 'a', 'b', 'k']
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f']
for e in list1:
    try:
        list2.remove(e)
    except ValueError:
        print(f'{e} not in list')
list2
# ['a', 'a', 'c', 'd', 'e', 'f']
This mutates list2; if you want to preserve list2, make a copy and operate on the copy instead.
def listsubtraction(parent, child):
    answer = []
    for element in parent:
        if element not in child:
            answer.append(element)
    return answer
I think this should work. I am a beginner, so pardon me for any mistakes.
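A quick check against the x and y from the earlier question:
>>> listsubtraction([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 3, 5, 7, 9])
[0, 2, 4, 6, 8]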

How to calculate a cumulative product of a list using list comprehension

I'm trying my hand at converting the following loop to a comprehension.
The problem: given an input_list = [1, 2, 3, 4, 5], return a list in which each element is the product of all elements up to that index, going from left to right. Hence the returned list would be [1, 2, 6, 24, 120].
The normal loop I have (and it's working):
l2r = list()
for i in range(lst_len):
    if i == 0:
        l2r.append(lst_num[i])
    else:
        l2r.append(lst_num[i] * l2r[i-1])
Python 3.8+ solution:
:= Assignment Expressions
lst = [1, 2, 3, 4, 5]
curr = 1
out = [(curr:=curr*v) for v in lst]
print(out)
Prints:
[1, 2, 6, 24, 120]
Other solution (with itertools.accumulate):
from itertools import accumulate
out = [*accumulate(lst, lambda a, b: a*b)]
print(out)
Well, you could do it like this(a):
import math
orig = [1, 2, 3, 4, 5]
print([math.prod(orig[:pos]) for pos in range(1, len(orig) + 1)])
This generates what you wanted:
[1, 2, 6, 24, 120]
and basically works by running a counter from 1 to the size of the list, at each point working out the product of all terms before that position:
pos  values     prod
===  =========  ====
 1   1             1
 2   1,2           2
 3   1,2,3         6
 4   1,2,3,4      24
 5   1,2,3,4,5   120
(a) Just keep in mind that's less efficient at runtime since it calculates the full product for every single element (rather than caching the most recently obtained product). You can avoid that while still making your code more compact (often the reason for using list comprehensions), with something like:
def listToListOfProds(orig):
    curr = 1
    newList = []
    for item in orig:
        curr *= item
        newList.append(curr)
    return newList

print(listToListOfProds([1, 2, 3, 4, 5]))
That's obviously not a list comprehension, but it still has the advantage that it doesn't clutter up the code where you need to do the calculation.
People seem to often discount the function solution in Python, simply because the language is so expressive and allows things like list comprehensions to do a lot of work in minimal source code.
But, other than the function itself, this solution has the same advantage as a one-line list comprehension in that it, well, takes up one line :-)
In addition, you're free to change the function whenever you want (if you find a better way in a later Python version, for example), without having to change all the different places in the code that call it.
This should not be made into a list comprehension if one iteration depends on the state of an earlier one!
If the goal is a one-liner, then there are lots of solutions, with @AndrejKesely's itertools.accumulate() being an excellent one (+1). Here's mine that abuses functools.reduce():
from functools import reduce
lst = [1, 2, 3, 4, 5]
print(reduce(lambda x, y: x + [x[-1] * y], lst, [lst.pop(0)]))
But as far as list comprehensions go, @AndrejKesely's assignment-expression-based solution is the wrong thing to do (-1). Here's a more self-contained comprehension that doesn't leak into the surrounding scope:
lst = [1, 2, 3, 4, 5]
seq = [a.append(a[-1] * b) or a.pop(0) for a in [[lst.pop(0)]] for b in [*lst, 1]]
print(seq)
But it's still the wrong thing to do! This is based on a similar problem that also got upvoted for the wrong reasons.
A recursive function could help.
input_list = [1, 2, 3, 4, 5]

def cumprod(ls, i=None):
    i = len(ls) - 1 if i is None else i
    if i == 0:
        return ls[0]  # base case: the running product at index 0 is the first element
    return ls[i] * cumprod(ls, i - 1)

output_list = [cumprod(input_list, i) for i in range(len(input_list))]
output_list has value [1, 2, 6, 24, 120]
This method can be compressed in Python 3.8+ using the walrus operator:
input_list = [1, 2, 3, 4, 5]

def cumprod_inline(ls, i=None):
    return ls[0] if (i := len(ls) - 1 if i is None else i) == 0 else ls[i] * cumprod_inline(ls, i - 1)

output_list = [cumprod_inline(input_list, i) for i in range(len(input_list))]
output_list has value [1, 2, 6, 24, 120]
Because you plan to use this in a list comprehension, there's no need to provide a default for the i argument. This removes the need to check whether i is None.
input_list = [1, 2, 3, 4, 5]

def cumprod_inline_nodefault(ls, i):
    return ls[0] if i == 0 else ls[i] * cumprod_inline_nodefault(ls, i - 1)

output_list = [cumprod_inline_nodefault(input_list, i) for i in range(len(input_list))]
output_list has value [1, 2, 6, 24, 120]
Finally, if you really wanted to keep it to a single, self-contained list-comprehension line, you can use recursive lambda calls:
input_list = [1, 2, 3, 4, 5]
output_list = [(lambda func, ls, i: func(func, ls, i))
               (lambda func, ls, i: ls[0] if i == 0 else ls[i] * func(func, ls, i - 1),
                input_list, i)
               for i in range(len(input_list))]
output_list has value [1, 2, 6, 24, 120]
It's entirely over-engineered and barely legible, but hey! It works, and it's just for fun.
For your list, it might not be intentional that the numbers are consecutive, starting from 1. But for cases where that pattern is intentional, you can use the built-in function factorial():
from math import factorial
input_list = [1, 2, 3, 4, 5]
l2r = [factorial(i) for i in input_list]
print(l2r)
Output:
[1, 2, 6, 24, 120]
The numpy package provides fast, vectorized implementations of many cumulative operations. To obtain, for example, a cumulative product:
>>> import numpy as np
>>> np.cumprod([1, 2, 3, 4, 5])
array([ 1, 2, 6, 24, 120])
The above returns a numpy array. If you are not familiar with numpy, you may prefer to obtain just a normal python list:
>>> list(np.cumprod([1, 2, 3, 4, 5]))
[1, 2, 6, 24, 120]
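Alternatively, numpy arrays have a .tolist() method, which also converts the elements themselves to plain Python ints:
>>> np.cumprod([1, 2, 3, 4, 5]).tolist()
[1, 2, 6, 24, 120]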
Using itertools and operator:
from itertools import accumulate
import operator as op

ip_lst = [1, 2, 3, 4, 5]
print(list(accumulate(ip_lst, func=op.mul)))

Code not performing as expected [For in cycle]

Why is this not working? Actual result is [] for any entry.
def non_unique(ints):
    """
    Return a list consisting of only the non-unique elements from the list lst.

    You are given a non-empty list of integers (ints). You should return a
    list consisting of only the non-unique elements in this list. To do so
    you will need to remove all unique elements (elements which are
    contained in a given list only once). When solving this task, do not
    change the order of the list.

    >>> non_unique([1, 2, 3, 1, 3])
    [1, 3, 1, 3]
    >>> non_unique([1, 2, 3, 4, 5])
    []
    >>> non_unique([5, 5, 5, 5, 5])
    [5, 5, 5, 5, 5]
    >>> non_unique([10, 9, 10, 10, 9, 8])
    [10, 9, 10, 10, 9]
    """
    new_list = []
    for x in ints:
        for a in ints:
            if ints.index(x) != ints.index(a):
                if x == a:
                    new_list.append(a)
    return new_list
Working code (not from me):
result = []
for c in ints:
    if ints.count(c) > 1:
        result.append(c)
return result
list.index returns the first index that contains the input parameter, so if x == a is true, then ints.index(x) will always equal ints.index(a). If you want to keep your same code structure, I'd recommend keeping track of the indices within the loop using enumerate, as in:
for x_ind, x in enumerate(ints):
    for a_ind, a in enumerate(ints):
        if x_ind != a_ind:
            if x == a:
                new_list.append(a)
Although, for what it's worth, I think your example of working code is a better way of accomplishing the same task.
Although the example of working code is correct, it suffers from quadratic complexity, which makes it slow for larger lists. I'd prefer something like this:
from nltk.probability import FreqDist

def non_unique(ints):
    fd = FreqDist(ints)
    return [x for x in ints if fd[x] > 1]
It precomputes a frequency distribution in the first step, and then selects all non-unique elements. Both steps run in O(n).
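If you'd rather not depend on nltk, collections.Counter from the standard library works the same way here; a sketch:
from collections import Counter

def non_unique(ints):
    counts = Counter(ints)
    return [x for x in ints if counts[x] > 1]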

Python function that returns values from list smaller than a number

My function needs to take in a list of integers and a certain integer and return the numbers in the list that are smaller than the specific integer. Any advice?
def smallerThanN(intList, intN):
    y = 0
    newlist = []
    list1 = intList
    for x in intList:
        if int(x) < int(intN):
            print(intN)
            y += 1
            newlist.append(x)
    return newlist
Use a list comprehension with an "if" filter to extract those values in the list less than the specified value:
def smaller_than(sequence, value):
    return [item for item in sequence if item < value]
I recommend giving the variables more generic names, because this code will work for any sequence regardless of the type of its items (provided, of course, that comparisons are valid for the type in question).
>>> smaller_than([1,2,3,4,5,6,7,8], 5)
[1, 2, 3, 4]
>>> smaller_than('abcdefg', 'd')
['a', 'b', 'c']
>>> smaller_than(set([1.34, 33.12, 1.0, 11.72, 10]), 10)
[1.0, 1.34]
N.B. There is already a similar answer, however, I'd prefer to declare a function instead of binding a lambda expression.
integers_list = [4, 6, 1, 99, 45, 76, 12]
smallerThan = lambda x, y: [i for i in x if i < y]
print(smallerThan(integers_list, 12))
Output:
[4, 6, 1]
def smallerThanN(intList, intN):
    return [x for x in intList if x < intN]
>>> smallerThanN([1, 4, 10, 2, 7], 5)
[1, 4, 2]

concatenate an arbitrary number of lists in a function in Python

I hope to write the join_lists function to take an arbitrary number of lists and concatenate them. For example, if the inputs are
m = [1, 2, 3]
n = [4, 5, 6]
o = [7, 8, 9]
then when I call print join_lists(m, n, o), it will return [1, 2, 3, 4, 5, 6, 7, 8, 9]. I realize I should use *args as the argument in join_lists, but I'm not sure how to concatenate an arbitrary number of lists. Thanks.
Although you can use something which invokes __add__ sequentially, that is very much the wrong thing (for starters you end up creating as many new lists as there are lists in your input, which ends up having quadratic complexity).
The standard tool is itertools.chain:
import itertools

def concatenate(*lists):
    return itertools.chain(*lists)
or
def concatenate(*lists):
    return itertools.chain.from_iterable(lists)
This will return a generator which yields each element of the lists in sequence. If you need it as a list, use list: list(itertools.chain.from_iterable(lists))
If you insist on doing this "by hand", then use extend:
def concatenate(*lists):
    newlist = []
    for l in lists:
        newlist.extend(l)
    return newlist
Actually, don't use extend like that - it's still inefficient, because it has to keep extending the original list. The "right" way (it's still really the wrong way):
def concatenate(*lists):
    lengths = [len(l) for l in lists]  # a real list (not a map iterator), since it is consumed twice
    newlen = sum(lengths)
    newlist = [None] * newlen
    start = 0
    end = 0
    for l, n in zip(lists, lengths):
        end += n
        newlist[start:end] = l
        start += n
    return newlist
http://ideone.com/Mi3UyL
You'll note that this still ends up doing as many copy operations as there are total slots in the lists. So, this isn't any better than using list(chain.from_iterable(lists)), and is probably worse, because list can make use of optimisations at the C level.
Finally, here's a version using extend (suboptimal) in one line, using reduce:
from functools import reduce
concatenate = lambda *lists: reduce((lambda a, b: a.extend(b) or a), lists, [])
One way would be this (using reduce) because I currently feel functional:
import operator
from functools import reduce
def concatenate(*lists):
    return reduce(operator.add, lists)
However, a better functional method is given in Marcin's answer:
from itertools import chain
def concatenate(*lists):
    return chain(*lists)
although you might as well use itertools.chain(*iterable_of_lists) directly.
A procedural way:
def concatenate(*lists):
    new_list = []
    for i in lists:
        new_list.extend(i)
    return new_list
A golfed version: j=lambda*x:sum(x,[]) (do not actually use this).
You can use sum() with an empty list as the start argument (note, though, that sum() concatenates with + under the hood, so it copies repeatedly and is quadratic in the total length):
def join_lists(*lists):
    return sum(lists, [])
For example:
>>> join_lists([1, 2, 3], [4, 5, 6])
[1, 2, 3, 4, 5, 6]
Another way:
>>> m = [1, 2, 3]
>>> n = [4, 5, 6]
>>> o = [7, 8, 9]
>>> p = []
>>> for (i, j, k) in (m, n, o):
...     p.append(i)
...     p.append(j)
...     p.append(k)
...
>>> p
[1, 2, 3, 4, 5, 6, 7, 8, 9]
This seems to work just fine:
def join_lists(*args):
    output = []
    for lst in args:
        output += lst
    return output
It returns a new list with all the items of the previous lists. Is using + not appropriate for this kind of list processing?
Or you could be logical instead: copy the items of the first list passed to the join_lists function (the items, not the list object itself) into a new list, to which you can then add the elements of the other lists:
m = [1, 2, 3]
n = [4, 5, 6]
o = [7, 8, 9]

def join_lists(*x):
    new_list = x[0][:]  # copy the first list's items so the original isn't mutated
    for item in x[1:]:  # the remaining lists
        new_list += item
    return new_list

then
print(join_lists(m, n, o))
would output:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
