Remove all values within one list from another list? [duplicate] - python

This question already has answers here:
Remove all the elements that occur in one list from another
(13 answers)
Closed 6 years ago.
I am looking for a way to remove all values within a list from another list.
Something like this:
a = range(1,10)
a.remove([2,3,7])
print a
a = [1,4,5,6,8,9]

>>> a = range(1, 10)
>>> [x for x in a if x not in [2, 3, 7]]
[1, 4, 5, 6, 8, 9]

I was looking for a fast way to do this, so I ran some experiments with the suggested approaches, and the results surprised me enough to share them.
The experiments were done with the pythonbenchmark tool and with
a = range(1,50000) # Source list
b = range(1,15000) # Items to remove
Results:
def comprehension(a, b):
    return [x for x in a if x not in b]

5 tries, average time 12.8 sec

def filter_function(a, b):
    return filter(lambda x: x not in b, a)

5 tries, average time 12.6 sec

def modification(a, b):
    for x in b:
        try:
            a.remove(x)
        except ValueError:
            pass
    return a

5 tries, average time 0.27 sec

def set_approach(a, b):
    return list(set(a) - set(b))

5 tries, average time 0.0057 sec
Also I made another measurement with bigger inputs size for the last two functions
a = range(1,500000)
b = range(1,100000)
And the results:
For modification (remove method) - average time is 252 seconds
For set approach - average time is 0.75 seconds
So you can see that the approach with sets is significantly faster than the others. Yes, it doesn't preserve duplicates or order, but if you don't need those, it's the one for you.
There is almost no difference between the list comprehension and the filter function. Using 'remove' is ~50 times faster, but it modifies the source list.
And the best choice is using sets - it's more than 1000 times faster than the list comprehension!
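If you do need order and duplicates preserved but still want set-like speed, a hybrid is worth adding to the benchmark: build the set once, then filter with a comprehension. A sketch (timings will vary by machine, so measure it yourself):

def set_comprehension(a, b):
    b_set = set(b)                           # one O(len(b)) conversion
    return [x for x in a if x not in b_set]  # O(1) membership tests, order kept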

If you don't have repeated values and don't care about order, you can use set difference.
x = set(range(10))
y = x - set([2, 3, 7])
# y = set([0, 1, 4, 5, 6, 8, 9])
and then convert back to list, if needed.

a = range(1,10)
itemsToRemove = set([2, 3, 7])
b = filter(lambda x: x not in itemsToRemove, a)
or
b = [x for x in a if x not in itemsToRemove]
Don't create the set inside the lambda or inside the comprehension. If you do, it'll be recreated on every iteration, defeating the point of using a set at all.
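One way to confirm the cost on your machine is timeit; this sketch (with made-up sizes) times the set built once against the set rebuilt on every iteration:

import timeit

setup = "a = list(range(10000)); items = [2, 3, 7]"
fast = "s = set(items)\n[x for x in a if x not in s]"     # set built once
slow = "[x for x in a if x not in set(items)]"            # set rebuilt per element
print(timeit.timeit(fast, setup=setup, number=100))
print(timeit.timeit(slow, setup=setup, number=100))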

The simplest way is
>>> a = range(1, 10)
>>> for x in [2, 3, 7]:
...     a.remove(x)
...
>>> a
[1, 4, 5, 6, 8, 9]
One possible problem here is that each time you call remove(), all the items after it are shifted down the list to fill the hole. So if a grows very large, this will end up being quite slow.
This way builds a brand new list. The advantage is that we avoid all the shifting of the first approach:
>>> removeset = set([2, 3, 7])
>>> a = [x for x in a if x not in removeset]
If you want to modify a in place, just one small change is required
>>> removeset = set([2, 3, 7])
>>> a[:] = [x for x in a if x not in removeset]
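The distinction matters when other names refer to the same list. A small demonstration (hypothetical names):

a = [1, 2, 3, 7]
alias = a                          # a second name for the same list object
a[:] = [x for x in a if x != 7]    # slice assignment replaces the contents in place
print(alias)                       # [1, 2, 3] - the alias sees the change
a = [x for x in a if x != 1]       # plain assignment rebinds a to a brand new list
print(alias)                       # [1, 2, 3] - the alias still holds the old object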

Others have suggested ways to build a new list after filtering, e.g.
newl = [x for x in l if x not in [2,3,7]]
or
newl = filter(lambda x: x not in [2,3,7], l)
but from your question it looks like you want in-place modification. For that you can do the following; it will also be much, much faster if the original list is long and the items to remove are few:
l = range(1,10)
for o in set([2,3,7,11]):
    try:
        l.remove(o)
    except ValueError:
        pass
print l
output:
[1, 4, 5, 6, 8, 9]
I am catching the ValueError exception so it works even if some items are not in the original list.
Also, if you do not need in-place modification, the solution by S.Mark is simpler.

>>> a=range(1,10)
>>> for i in [2,3,7]: a.remove(i)
...
>>> a
[1, 4, 5, 6, 8, 9]
>>> a=range(1,10)
>>> b=map(a.remove,[2,3,7])
>>> a
[1, 4, 5, 6, 8, 9]
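Note that the map trick depends on Python 2, where map is eager. On Python 3, map returns a lazy iterator and nothing is removed until it is consumed; wrapping it in list() forces the calls, though a plain loop is clearer:

a = list(range(1, 10))
list(map(a.remove, [2, 3, 7]))   # list() forces the lazy map on Python 3
print(a)                          # [1, 4, 5, 6, 8, 9]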


Identifying certain elements which are not in one list in Python [duplicate]

I want to take the difference between lists x and y:
>>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 3, 5, 7, 9]
>>> x - y
# should return [0, 2, 4, 6, 8]
Use a list comprehension to compute the difference while maintaining the original order from x:
[item for item in x if item not in y]
If you don't need list properties (e.g. ordering), use a set difference, as the other answers suggest:
list(set(x) - set(y))
To allow x - y infix syntax, override __sub__ on a class inheriting from list:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(args)

    def __sub__(self, other):
        return self.__class__(*[item for item in self if item not in other])
Usage:
x = MyList(1, 2, 3, 4)
y = MyList(2, 5, 2)
z = x - y
Use set difference
>>> z = list(set(x) - set(y))
>>> z
[0, 8, 2, 4, 6]
Or you might just have x and y be sets so you don't have to do any conversions.
If duplicates and ordering matter:
[i for i in a if i not in b or b.remove(i)]
a = [1,2,3,3,3,3,4]
b = [1,3]
result: [2, 3, 3, 3, 4]
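This works because list.remove returns None (falsy): a matched item fails the condition and is consumed from b at the same time. Note that it mutates b. Spelled out as an explicit loop:

a = [1, 2, 3, 3, 3, 3, 4]
b = [1, 3]
result = []
for i in a:
    if i in b:
        b.remove(i)        # consume one matching occurrence from b
    else:
        result.append(i)   # keep items with no match left in b
print(result)              # [2, 3, 3, 3, 4]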
That is a "set subtraction" operation. Use the set data structure for that.
In Python 2.7:
x = {1,2,3,4,5,6,7,8,9,0}
y = {1,3,5,7,9}
print x - y
Output:
>>> print x - y
set([0, 8, 2, 4, 6])
For many use cases, the answer you want is:
ys = set(y)
[item for item in x if item not in ys]
This is a hybrid between aaronasterling's answer and quantumSoup's answer.
aaronasterling's version does len(y) item comparisons for each element in x, so it takes quadratic time. quantumSoup's version uses sets, so it does a single constant-time set lookup for each element in x—but, because it converts both x and y into sets, it loses the order of your elements.
By converting only y into a set, and iterating x in order, you get the best of both worlds—linear time, and order preservation.*
However, this still has a problem from quantumSoup's version: It requires your elements to be hashable. That's pretty much built into the nature of sets.** If you're trying to, e.g., subtract a list of dicts from another list of dicts, but the list to subtract is large, what do you do?
If you can decorate your values in some way that they're hashable, that solves the problem. For example, with a flat dictionary whose values are themselves hashable:
ys = {tuple(item.items()) for item in y}
[item for item in x if tuple(item.items()) not in ys]
If your types are a bit more complicated (e.g., often you're dealing with JSON-compatible values, which are hashable, or lists or dicts whose values are recursively the same type), you can still use this solution. But some types just can't be converted into anything hashable.
If your items aren't, and can't be made, hashable, but they are comparable, you can at least get log-linear time (O(N*log M), which is a lot better than the O(N*M) time of the list solution, but not as good as the O(N+M) time of the set solution) by sorting and using bisect:
import bisect

ys = sorted(y)
def bisect_contains(seq, item):
    index = bisect.bisect_left(seq, item)   # bisect_left so an exact match lands on its own index
    return index < len(seq) and seq[index] == item
[item for item in x if not bisect_contains(ys, item)]
If your items are neither hashable nor comparable, then you're stuck with the quadratic solution.
* Note that you could also do this by using a pair of OrderedSet objects, for which you can find recipes and third-party modules. But I think this is simpler.
** The reason set lookups are constant time is that all it has to do is hash the value and see if there's an entry for that hash. If it can't hash the value, this won't work.
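As a concrete sketch of the decoration trick for flat dicts (sorting the items makes the key independent of insertion order; the data here is made up):

x = [{"a": 1}, {"a": 2}, {"b": 3}]
y = [{"a": 2}]
ys = {tuple(sorted(item.items())) for item in y}   # decorate each dict into a hashable key
print([item for item in x if tuple(sorted(item.items())) not in ys])
# [{'a': 1}, {'b': 3}]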
If the lists allow duplicate elements, you can use Counter from collections:
from collections import Counter
result = list((Counter(x)-Counter(y)).elements())
If you need to preserve the order of elements from x:
result = [ v for c in [Counter(y)] for v in x if not c[v] or c.subtract([v]) ]
The other solutions have one of a few problems:
They don't preserve order, or
They don't remove a precise count of elements, e.g. for x = [1, 2, 2, 2] and y = [2, 2] they convert y to a set, and either remove all matching elements (leaving [1] only) or remove one of each unique element (leaving [1, 2, 2]), when the proper behavior would be to remove 2 twice, leaving [1, 2], or
They do O(m * n) work, where an optimal solution can do O(m + n) work
Alain was on the right track with Counter to solve #2 and #3, but that solution will lose ordering. The solution that preserves order (removing the first n copies of each value for n repetitions in the list of values to remove) is:
from collections import Counter
x = [1,2,3,4,3,2,1]
y = [1,2,2]
remaining = Counter(y)
out = []
for val in x:
    if remaining[val]:
        remaining[val] -= 1
    else:
        out.append(val)
# out is now [3, 4, 3, 1], having removed the first 1 and both 2s.
To make it remove the last copies of each element, just change the for loop to for val in reversed(x): and add out.reverse() immediately after exiting the for loop.
Constructing the Counter is O(n) in terms of y's length, iterating x is O(n) in terms of x's length, and Counter membership testing and mutation are O(1), while list.append is amortized O(1) (a given append can be O(n), but for many appends, the overall big-O averages O(1) since fewer and fewer of them require a reallocation), so the overall work done is O(m + n).
You can also test whether any elements in y were not removed from x:
remaining = +remaining  # removes all keys with zero counts from the Counter
if remaining:
    # remaining contained elements with non-zero counts
Looking up values in a set is faster than looking them up in a list, but build the set once, outside the comprehension:
ys = set(y)
[item for item in x if item not in ys]
This will scale much better than:
[item for item in x if item not in y]
Both preserve the order of the lists.
We can use set methods as well to find the difference between two lists:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [1, 3, 5, 7, 9]
list(set(x).difference(y))
[0, 2, 4, 6, 8]
Try this.
def subtract_lists(a, b):
    """Subtracts two lists. Throws ValueError if b contains items not in a."""
    # Terminate if b is empty, otherwise remove b[0] from a and recurse
    return a if len(b) == 0 else [a[:i] + subtract_lists(a[i+1:], b[1:])
                                  for i in [a.index(b[0])]][0]
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> y = [1,3,5,7,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0]
>>> x = [1,2,3,4,5,6,7,8,9,0,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0, 9] #9 is only deleted once
>>>
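Be aware that deep recursion will hit Python's recursion limit on long inputs. An iterative sketch with the same behavior (removes each item of b once, raises ValueError if it is missing from a):

def subtract_lists(a, b):
    result = list(a)          # work on a copy so a is untouched
    for item in b:
        result.remove(item)   # raises ValueError if item is not present
    return result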
The answer provided by @aaronasterling looks good. However, it is not compatible with the default interface of list: x = MyList(1, 2, 3, 4) vs x = MyList([1, 2, 3, 4]). The code below behaves more like a standard Python list:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(*args)

    def __sub__(self, other):
        return self.__class__([item for item in self if item not in other])
Example:
x = MyList([1, 2, 3, 4])
y = MyList([2, 5, 2])
z = x - y
from collections import Counter

y = Counter(y)
x = Counter(x)
print(list((x - y).elements()))   # .elements() keeps duplicates; list(x - y) would give unique keys only
Let:
>>> xs = [1, 2, 3, 4, 3, 2, 1]
>>> ys = [1, 3, 3]
Keep each unique item only once   xs - ys == {2, 4}
Take the set difference:
>>> set(xs) - set(ys)
{2, 4}
Remove all occurrences   xs - ys == [2, 4, 2]
>>> [x for x in xs if x not in ys]
[2, 4, 2]
If ys is large, convert only¹ ys into a set for better performance:
>>> ys_set = set(ys)
>>> [x for x in xs if x not in ys_set]
[2, 4, 2]
Only remove same number of occurrences   xs - ys == [2, 4, 2, 1]
from collections import Counter

def diff(xs, ys):
    counter = Counter(ys)
    for x in xs:
        if counter[x] > 0:
            counter[x] -= 1
            continue
        yield x
>>> list(diff(xs, ys))
[2, 4, 2, 1]
¹ Converting xs to a set and taking the set difference is unnecessary (and slower, as well as order-destroying), since we only need to iterate once over xs.
This example subtracts two lists:
# List of pairs of points
points = []
points.append([(602, 336), (624, 365)])
points.append([(635, 336), (654, 365)])
points.append([(642, 342), (648, 358)])
points.append([(644, 344), (646, 356)])
points.append([(653, 337), (671, 365)])
points.append([(728, 13), (739, 32)])
points.append([(756, 59), (767, 79)])

items_to_remove = []
items_to_remove.append([(642, 342), (648, 358)])
items_to_remove.append([(644, 344), (646, 356)])

print("Initial List Size: ", len(points))

for a in items_to_remove:
    if a in points:
        points.remove(a)   # remove the first matching pair

print("Final List Size: ", len(points))
list1 = ['a', 'c', 'a', 'b', 'k']
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f']
for e in list1:
    try:
        list2.remove(e)
    except ValueError:
        print(f'{e} not in list')
list2
# ['a', 'a', 'c', 'd', 'e', 'f']
This will change list2. If you want to protect list2, just copy it first and use the copy in this code.
def listsubtraction(parent, child):
    answer = []
    for element in parent:
        if element not in child:
            answer.append(element)
    return answer

I think this should work. I am a beginner, so pardon me for any mistakes.

list only stops once the element of the list is the number 7

I want to write code that builds a sublist, one that stops once an element of the list equals a certain number, for example 9.
I've already tried different operators and if statements.
def sublist(list):
    return [x for x in list if x < 9]
[7,8,3,2,4,9,51]
The output for the list above should be:
[7,8,3,2,4]
List comprehensions really are for making mapping/filtering combinations. If the length depends on some previous state in the iteration, you're better off with a for-loop, it will be more readable. However, this is a use-case for itertools.takewhile. Here is a functional approach to this task, just for fun, some may even consider it readable:
>>> from itertools import takewhile
>>> from functools import partial
>>> import operator as op
>>> list(takewhile(partial(op.ne, 9), [7,8,3,2,4,9,51]))
[7, 8, 3, 2, 4]
You can use the iter() builtin with a sentinel value (see the official docs):
l = [7,8,3,2,4,9,51]
sublist = [*iter(lambda i=iter(l): next(i), 9)]
print(sublist)
Prints:
[7, 8, 3, 2, 4]
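The one-liner is dense: iter(callable, sentinel) calls the lambda repeatedly, pulling the next element from an iterator over l, and stops as soon as the sentinel 9 comes back. The same thing spelled out:

l = [7, 8, 3, 2, 4, 9, 51]
it = iter(l)                               # a single iterator over l
sublist = list(iter(lambda: next(it), 9))  # call next(it) until it returns 9
print(sublist)                             # [7, 8, 3, 2, 4]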
To begin with, it's not a good idea to shadow built-in names like list by using them as variable names.
The list comprehension [x for x in list if x < 9] keeps the elements that are less than 9, but it won't stop when it encounters a 9; it will go over the entire list.
Example:
li = [7,8,3,2,4,9,51,8,7]
print([x for x in li if x < 9])
The output is
[7, 8, 3, 2, 4, 8, 7]
To achieve what you are looking for, you want a for loop which breaks when it encounters a given element (9 in your case)
li = [7,8,3,2,4,9,51]
res = []
item = 9

# Iterate over the list
for x in li:
    # If the item is encountered, break the loop
    if x == item:
        break
    # Otherwise append the element to the result
    res.append(x)

print(res)
The output is
[7, 8, 3, 2, 4]

Code not performing as expected [For in cycle]

Why is this not working? Actual result is [] for any entry.
def non_unique(ints):
    """
    Return a list consisting of only the non-unique elements from the list lst.

    You are given a non-empty list of integers (ints). You should return a
    list consisting of only the non-unique elements in this list. To do so
    you will need to remove all unique elements (elements which are
    contained in a given list only once). When solving this task, do not
    change the order of the list.

    >>> non_unique([1, 2, 3, 1, 3])
    [1, 3, 1, 3]
    >>> non_unique([1, 2, 3, 4, 5])
    []
    >>> non_unique([5, 5, 5, 5, 5])
    [5, 5, 5, 5, 5]
    >>> non_unique([10, 9, 10, 10, 9, 8])
    [10, 9, 10, 10, 9]
    """
    new_list = []
    for x in ints:
        for a in ints:
            if ints.index(x) != ints.index(a):
                if x == a:
                    new_list.append(a)
    return new_list
Working code (not from me):
result = []
for c in ints:
    if ints.count(c) > 1:
        result.append(c)
return result
list.index will return the first index that contains the input parameter, so if x == a is true, then ints.index(x) will always equal ints.index(a). If you want to keep your same code structure, I'd recommend keeping track of the indices within the loop using enumerate, as in:
for x_ind, x in enumerate(ints):
    for a_ind, a in enumerate(ints):
        if x_ind != a_ind:
            if x == a:
                new_list.append(a)
Although, for what it's worth, I think your example of working code is a better way of accomplishing the same task.
Although the example of working code is correct, it suffers from quadratic complexity, which makes it slow for larger lists. I'd prefer something like this:
from nltk.probability import FreqDist

def non_unique(ints):
    fd = FreqDist(ints)
    return [x for x in ints if fd[x] > 1]
It precomputes a frequency distribution in the first step, and then selects all non-unique elements. Both steps have a O(n) performance characteristic.
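The same approach works with collections.Counter from the standard library, so the nltk dependency isn't needed; a minimal sketch:

from collections import Counter

def non_unique(ints):
    counts = Counter(ints)                     # O(n) frequency table
    return [x for x in ints if counts[x] > 1]  # keep elements that occur more than once

print(non_unique([10, 9, 10, 10, 9, 8]))       # [10, 9, 10, 10, 9]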

concatenate an arbitrary number of lists in a function in Python

I hope to write the join_lists function to take an arbitrary number of lists and concatenate them. For example, if the inputs are
m = [1, 2, 3]
n = [4, 5, 6]
o = [7, 8, 9]
then when I call print join_lists(m, n, o), it should return [1, 2, 3, 4, 5, 6, 7, 8, 9]. I realize I should use *args as the argument in join_lists, but I'm not sure how to concatenate an arbitrary number of lists. Thanks.
Although you can use something which invokes __add__ sequentially, that is very much the wrong thing (for starters you end up creating as many new lists as there are lists in your input, which ends up having quadratic complexity).
The standard tool is itertools.chain:
import itertools

def concatenate(*lists):
    return itertools.chain(*lists)
or
def concatenate(*lists):
    return itertools.chain.from_iterable(lists)
This will return a generator which yields each element of the lists in sequence. If you need it as a list, use list: list(itertools.chain.from_iterable(lists))
If you insist on doing this "by hand", then use extend:
def concatenate(*lists):
    newlist = []
    for l in lists:
        newlist.extend(l)
    return newlist
Actually, don't use extend like that - it's still inefficient, because it has to keep extending the original list. The "right" way (it's still really the wrong way):
def concatenate(*lists):
    lengths = map(len, lists)   # Python 2: map returns a list; on Python 3 use list(map(len, lists))
    newlen = sum(lengths)
    newlist = [None] * newlen
    start = 0
    end = 0
    for l, n in zip(lists, lengths):
        end += n
        newlist[start:end] = l   # was newlist[start:end] = list, a bug
        start += n
    return newlist
http://ideone.com/Mi3UyL
You'll note that this still ends up doing as many copy operations as there are total slots in the lists. So, this isn't any better than using list(chain.from_iterable(lists)), and is probably worse, because list can make use of optimisations at the C level.
Finally, here's a version using extend (suboptimal) in one line, using reduce:
concatenate = lambda *lists: reduce((lambda a,b: a.extend(b) or a),lists,[])
One way would be this (using reduce) because I currently feel functional:
import operator
from functools import reduce

def concatenate(*lists):
    return reduce(operator.add, lists)

However, a better functional method is given in Marcin's answer:

from itertools import chain

def concatenate(*lists):
    return chain(*lists)
although you might as well use itertools.chain(*iterable_of_lists) directly.
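A quick REPL check of the chain version:

>>> from itertools import chain
>>> list(chain([1, 2, 3], [4, 5, 6], [7, 8, 9]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]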
A procedural way:
def concatenate(*lists):
    new_list = []
    for i in lists:
        new_list.extend(i)
    return new_list
A golfed version: j=lambda*x:sum(x,[]) (do not actually use this).
You can use sum() with an empty list as the start argument:
def join_lists(*lists):
    return sum(lists, [])
For example:
>>> join_lists([1, 2, 3], [4, 5, 6])
[1, 2, 3, 4, 5, 6]
Another way:
>>> m = [1, 2, 3]
>>> n = [4, 5, 6]
>>> o = [7, 8, 9]
>>> p = []
>>> for (i, j, k) in (m, n, o):
...     p.append(i)
...     p.append(j)
...     p.append(k)
...
>>> p
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
This seems to work just fine:
def join_lists(*args):
    output = []
    for lst in args:
        output += lst
    return output
It returns a new list with all the items of the previous lists. Is using + not appropriate for this kind of list processing?
Or you could do it by hand: copy the items of the first list passed to join_lists into a new list, then add the elements of the other lists to it:
m = [1, 2, 3]
n = [4, 5, 6]
o = [7, 8, 9]
def join_lists(*x):
    new_list = x[0][:]     # copy the first list so the original isn't mutated
    for item in x[1:]:     # the remaining lists
        new_list += item
    return new_list
then
print(join_lists(m, n, o))
would output:
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Average List from a List of Lists - Is there a more efficient way?

I have a list of lists named 'run'. I am creating an average of those lists using this section of my code:
ave = [0 for t in range(s)]
for t in range(s):
    z = 0
    for i in range(l):
        z = z + run[i][t]
    # Converted values to a string for output purposes
    # Added \n to output
    ave[t] = str(z / l) + "\n"
Much to my surprise, this code worked the first time that I wrote it. I'm now planning on working with much larger lists and many more values, and it's possible that performance issues will come into play. Is this method of writing an average inefficient in its use of computational resources, and how could I write code that was more efficient?
List comprehensions may be more efficient.
>>> run = [[1, 2, 3, 4, 5], [6, 7, 8, 9], [10, 11, 12, 13]]
>>> [sum(elem)/len(elem) for elem in zip(*run)]
[5.666666666666667, 6.666666666666667, 7.666666666666667, 8.666666666666666]
Alternatively, you could try map()
>>> list(map(lambda x: sum(x)/len(x), zip(*run)))
[5.666666666666667, 6.666666666666667, 7.666666666666667, 8.666666666666666]
You can improve efficiency by having Python do more of the work for you with efficient built-in functions and list comprehensions:
averages = [sum(items) / len(run) for items in zip(*run)]
import numpy as np
ave = [np.mean(col) for col in zip(*run)]   # note: it's np.mean (or np.average); np.avg does not exist
OR
ave = [sum(col)/len(col) for col in zip(*run)]
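If every inner list has the same length, numpy can also average down the columns in a single call (assuming numpy is installed; the rows here are made up):

import numpy as np

run = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(np.mean(run, axis=0))   # [4. 5. 6.]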
I came to this question looking for the following. It is crude, but because it does not use zip it does not ignore any value when the inner lists have different lengths.
If you have a numerical list of lists with different lengths and want to find the average list:
import numpy as np
def my_mean(list_of_lists):
    maxlen = max([len(l) for l in list_of_lists])
    for i in range(len(list_of_lists)):
        while len(list_of_lists[i]) < maxlen:
            list_of_lists[i].append(np.nan)   # pad the shorter lists in place
    return np.nanmean(list_of_lists, axis=0)
aaa = [1, 2, 3]
bbb = [1, 2, 3, 5]
ccc = [4, 5, 6, 5, 10]
lofl = [aaa, bbb, ccc]
print(my_mean(lofl))
gives
[ 2. 3. 4. 5. 10.]
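If you would rather not mutate the input lists, itertools.zip_longest can do the padding on the fly; a sketch of the same computation:

from itertools import zip_longest
import numpy as np

def my_mean(list_of_lists):
    # zip_longest pads the shorter lists with nan while zipping the columns
    cols = zip_longest(*list_of_lists, fillvalue=np.nan)
    return [float(np.nanmean(col)) for col in cols]

print(my_mean([[1, 2, 3], [1, 2, 3, 5], [4, 5, 6, 5, 10]]))
# [2.0, 3.0, 4.0, 5.0, 10.0]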
