I want to take the difference between lists x and y:
>>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 3, 5, 7, 9]
>>> x - y
# should return [0, 2, 4, 6, 8]
Use a list comprehension to compute the difference while maintaining the original order from x:
[item for item in x if item not in y]
If you don't need list properties (e.g. ordering), use a set difference, as the other answers suggest:
list(set(x) - set(y))
To allow x - y infix syntax, override __sub__ on a class inheriting from list:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(args)

    def __sub__(self, other):
        return self.__class__(*[item for item in self if item not in other])
Usage:
x = MyList(1, 2, 3, 4)
y = MyList(2, 5, 2)
z = x - y
Use set difference
>>> z = list(set(x) - set(y))
>>> z
[0, 8, 2, 4, 6]
Or you might just have x and y be sets so you don't have to do any conversions.
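For example, starting from sets directly (a minimal sketch; note that sets are unordered, so the display order may vary):
>>> x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> y = {1, 3, 5, 7, 9}
>>> x - y
{0, 8, 2, 4, 6}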
If duplicates and the ordering of items are a problem:
a = [1, 2, 3, 3, 3, 3, 4]
b = [1, 3]
[i for i in a if i not in b or b.remove(i)]
# result: [2, 3, 3, 3, 4]
Note that this consumes b as a side effect: b.remove(i) returns None (which is falsy), so each match is both excluded from the result and removed from b.
That is a "set subtraction" operation. Use the set data structure for that.
In Python 2.7:
x = {1,2,3,4,5,6,7,8,9,0}
y = {1,3,5,7,9}
print x - y
Output:
>>> print x - y
set([0, 8, 2, 4, 6])
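In Python 3, the same thing needs only print as a function, and the set displays with brace syntax:
x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0}
y = {1, 3, 5, 7, 9}
print(x - y)  # e.g. {0, 8, 2, 4, 6}; set display order may vary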
For many use cases, the answer you want is:
ys = set(y)
[item for item in x if item not in ys]
This is a hybrid between aaronasterling's answer and quantumSoup's answer.
aaronasterling's version does len(y) item comparisons for each element in x, so it takes quadratic time. quantumSoup's version uses sets, so it does a single constant-time set lookup for each element in x—but, because it converts both x and y into sets, it loses the order of your elements.
By converting only y into a set, and iterating x in order, you get the best of both worlds—linear time, and order preservation.*
However, this still has a problem from quantumSoup's version: It requires your elements to be hashable. That's pretty much built into the nature of sets.** If you're trying to, e.g., subtract a list of dicts from another list of dicts, but the list to subtract is large, what do you do?
If you can decorate your values in some way that makes them hashable, that solves the problem. For example, with a flat dictionary whose values are themselves hashable:
ys = {tuple(item.items()) for item in y}
[item for item in x if tuple(item.items()) not in ys]
If your types are a bit more complicated (e.g., often you're dealing with JSON-compatible values, which are hashable, or lists or dicts whose values are recursively the same type), you can still use this solution. But some types just can't be converted into anything hashable.
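For instance, here is a minimal sketch of a hypothetical freeze helper (the name, and the assumption that the values are JSON-compatible, are mine) that recursively converts dicts and lists into hashable tuples:
def freeze(value):
    # Recursively convert a JSON-compatible value into a hashable one.
    if isinstance(value, dict):
        return tuple(sorted((k, freeze(v)) for k, v in value.items()))
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value

ys = {freeze(item) for item in y}
[item for item in x if freeze(item) not in ys]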
If your items aren't, and can't be made, hashable, but they are comparable, you can at least get log-linear time (O(N*log M), which is a lot better than the O(N*M) time of the list solution, but not as good as the O(N+M) time of the set solution) by sorting and using bisect:
import bisect

ys = sorted(y)
def bisect_contains(seq, item):
    # bisect_left returns the first position at which item could be
    # inserted; item is present iff that position holds an equal value
    index = bisect.bisect_left(seq, item)
    return index < len(seq) and seq[index] == item
[item for item in x if not bisect_contains(ys, item)]
If your items are neither hashable nor comparable, then you're stuck with the quadratic solution.
* Note that you could also do this by using a pair of OrderedSet objects, for which you can find recipes and third-party modules. But I think this is simpler.
** The reason set lookups are constant time is that all it has to do is hash the value and see if there's an entry for that hash. If it can't hash the value, this won't work.
If the lists allow duplicate elements, you can use Counter from collections:
from collections import Counter
result = list((Counter(x)-Counter(y)).elements())
If you need to preserve the order of elements from x:
result = [v for c in [Counter(y)] for v in x if not c[v] or c.subtract([v])]
The other solutions have one of a few problems:
They don't preserve order, or
They don't remove a precise count of elements, e.g. for x = [1, 2, 2, 2] and y = [2, 2] they convert y to a set, and either remove all matching elements (leaving [1] only) or remove one of each unique element (leaving [1, 2, 2]), when the proper behavior would be to remove 2 twice, leaving [1, 2], or
They do O(m * n) work, where an optimal solution can do O(m + n) work
Alain was on the right track with Counter to solve #2 and #3, but that solution will lose ordering. The solution that preserves order (removing the first n copies of each value for n repetitions in the list of values to remove) is:
from collections import Counter

x = [1, 2, 3, 4, 3, 2, 1]
y = [1, 2, 2]

remaining = Counter(y)
out = []
for val in x:
    if remaining[val]:
        remaining[val] -= 1
    else:
        out.append(val)

# out is now [3, 4, 3, 1], having removed the first 1 and both 2s.
To make it remove the last copies of each element, just change the for loop to for val in reversed(x): and add out.reverse() immediately after exiting the for loop.
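Applied to the same x and y, that reversed variant would look like this (a sketch; the Counter bookkeeping is identical, we just walk x backwards):
remaining = Counter(y)
out = []
for val in reversed(x):
    if remaining[val]:
        remaining[val] -= 1
    else:
        out.append(val)
out.reverse()
# out is now [1, 3, 4, 3], having removed the last 1 and both 2s.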
Constructing the Counter is O(m) in terms of y's length, iterating x is O(n) in terms of x's length, Counter membership testing and mutation are O(1), and list.append is amortized O(1) (any given append can be O(n), but across many appends the average is O(1), since fewer and fewer of them require a reallocation), so the overall work done is O(m + n).
You can also determine whether there were any elements in y that were not removed from x by testing:
remaining = +remaining  # removes all keys with zero counts from the Counter
if remaining:
    # remaining contains the elements with non-zero counts
Looking up values in a set is faster than looking them up in a list, but make sure to build the set once, outside the comprehension; otherwise set(y) is rebuilt for every item of x:
ys = set(y)
[item for item in x if item not in ys]
I believe this will scale slightly better than:
[item for item in x if item not in y]
Both preserve the order of the lists.
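As a rough check, a hypothetical timeit micro-benchmark (numbers vary by machine; the sizes are arbitrary):
import timeit
setup = "x = list(range(1000)); y = list(range(0, 1000, 2)); ys = set(y)"
print(timeit.timeit("[i for i in x if i not in ys]", setup, number=1000))
print(timeit.timeit("[i for i in x if i not in y]", setup, number=1000))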
We can use set methods as well to find the difference between two lists:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [1, 3, 5, 7, 9]
list(set(x).difference(y))
[0, 2, 4, 6, 8]
Try this.
def subtract_lists(a, b):
    """Subtracts two lists. Throws ValueError if b contains items not in a."""
    # Terminate if b is empty, otherwise remove b[0] from a and recurse
    return a if len(b) == 0 else [a[:i] + subtract_lists(a[i+1:], b[1:])
                                  for i in [a.index(b[0])]][0]
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> y = [1,3,5,7,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0]
>>> x = [1,2,3,4,5,6,7,8,9,0,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0, 9] #9 is only deleted once
>>>
The answer provided by @aaronasterling looks good; however, it is not compatible with the default interface of list: x = MyList(1, 2, 3, 4) vs x = MyList([1, 2, 3, 4]). Thus, the code below can be used as a more list-friendly alternative:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(*args)

    def __sub__(self, other):
        return self.__class__([item for item in self if item not in other])
Example:
x = MyList([1, 2, 3, 4])
y = MyList([2, 5, 2])
z = x - y
from collections import Counter
y = Counter(y)
x = Counter(x)
print(list(x-y))
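Note that iterating a Counter yields each distinct key only once, so list(x - y) drops repeated elements; if you want duplicates kept according to their remaining counts, expand the result with list((x - y).elements()) instead.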
Let:
>>> xs = [1, 2, 3, 4, 3, 2, 1]
>>> ys = [1, 3, 3]
Keep each unique item only once (xs - ys == {2, 4})
Take the set difference:
>>> set(xs) - set(ys)
{2, 4}
Remove all occurrences (xs - ys == [2, 4, 2])
>>> [x for x in xs if x not in ys]
[2, 4, 2]
If ys is large, convert only ys into a set[1] for better performance:
>>> ys_set = set(ys)
>>> [x for x in xs if x not in ys_set]
[2, 4, 2]
Only remove the same number of occurrences (xs - ys == [2, 4, 2, 1])
from collections import Counter

def diff(xs, ys):
    counter = Counter(ys)
    for x in xs:
        if counter[x] > 0:
            counter[x] -= 1
            continue
        yield x
>>> list(diff(xs, ys))
[2, 4, 2, 1]
[1] Converting xs to a set and taking the set difference is unnecessary (and slower, as well as order-destroying), since we only need to iterate once over xs.
This example subtracts two lists of point pairs (avoiding list as a variable name, since that shadows the built-in, and testing membership rather than removing items from a list while iterating over it):
# List of pairs of points
pairs = []
pairs.append([(602, 336), (624, 365)])
pairs.append([(635, 336), (654, 365)])
pairs.append([(642, 342), (648, 358)])
pairs.append([(644, 344), (646, 356)])
pairs.append([(653, 337), (671, 365)])
pairs.append([(728, 13), (739, 32)])
pairs.append([(756, 59), (767, 79)])

items_to_remove = []
items_to_remove.append([(642, 342), (648, 358)])
items_to_remove.append([(644, 344), (646, 356)])

print("Initial List Size: ", len(pairs))
for a in items_to_remove:
    if a in pairs:
        pairs.remove(a)
print("Final List Size: ", len(pairs))
list1 = ['a', 'c', 'a', 'b', 'k']
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f']
for e in list1:
    try:
        list2.remove(e)
    except ValueError:
        print(f'{e} not in list')
list2
# ['a', 'a', 'c', 'd', 'e', 'f']
This mutates list2. If you want to protect list2, copy it first and use the copy in this code.
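For example, a minimal sketch that leaves list2 untouched:
list2_copy = list2.copy()
for e in list1:
    try:
        list2_copy.remove(e)
    except ValueError:
        print(f'{e} not in list')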
def listsubtraction(parent, child):
    answer = []
    for element in parent:
        if element not in child:
            answer.append(element)
    return answer
I think this should work. I am a beginner so pardon me for any mistakes
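For example (note that this removes every occurrence of each element of child, and keeps duplicates from parent that aren't in child):
>>> listsubtraction([1, 2, 2, 3, 4], [2, 4])
[1, 3]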
I want to write code that builds a sublist from a list, stopping once an element of the list is a certain number, for example 9.
I've already tried using different operators and if statements.
def sublist(list):
    return [x for x in list if x < 9]
[7,8,3,2,4,9,51]
The output for the list above should be:
[7,8,3,2,4]
List comprehensions really are for mapping/filtering combinations. If the length depends on some previous state in the iteration, you're better off with a for loop; it will be more readable. However, this is a use case for itertools.takewhile. Here is a functional approach to this task, just for fun; some may even consider it readable:
>>> from itertools import takewhile
>>> from functools import partial
>>> import operator as op
>>> list(takewhile(partial(op.ne, 9), [7,8,3,2,4,9,51]))
[7, 8, 3, 2, 4]
You can use the iter() builtin with a sentinel value (see the official docs for iter()).
l = [7,8,3,2,4,9,51]
sublist = [*iter(lambda i=iter(l): next(i), 9)]
print(sublist)
Prints:
[7, 8, 3, 2, 4]
To begin with, it's not a good idea to use built-in names like list as variable names.
The list comprehension [x for x in list if x < 9] keeps only the elements less than 9, but it won't stop when it encounters a 9; instead, it will go over the entire list.
Example:
li = [7,8,3,2,4,9,51,8,7]
print([x for x in li if x < 9])
The output is
[7, 8, 3, 2, 4, 8, 7]
To achieve what you are looking for, you want a for loop which breaks when it encounters a given element (9 in your case)
li = [7, 8, 3, 2, 4, 9, 51]
res = []
item = 9

# Iterate over the list
for x in li:
    # If the item is encountered, break the loop
    if x == item:
        break
    # Otherwise append the element to the result
    res.append(x)

print(res)
The output is
[7, 8, 3, 2, 4]
Why is this not working? Actual result is [] for any entry.
def non_unique(ints):
    """
    Return a list consisting of only the non-unique elements from the list lst.

    You are given a non-empty list of integers (ints). You should return a
    list consisting of only the non-unique elements in this list. To do so
    you will need to remove all unique elements (elements which are
    contained in a given list only once). When solving this task, do not
    change the order of the list.

    >>> non_unique([1, 2, 3, 1, 3])
    [1, 3, 1, 3]
    >>> non_unique([1, 2, 3, 4, 5])
    []
    >>> non_unique([5, 5, 5, 5, 5])
    [5, 5, 5, 5, 5]
    >>> non_unique([10, 9, 10, 10, 9, 8])
    [10, 9, 10, 10, 9]
    """
    new_list = []
    for x in ints:
        for a in ints:
            if ints.index(x) != ints.index(a):
                if x == a:
                    new_list.append(a)
    return new_list
Working code (not from me):
result = []
for c in ints:
    if ints.count(c) > 1:
        result.append(c)
return result
list.index returns the first index that contains the input parameter, so if x == a is true, then ints.index(x) will always equal ints.index(a). If you want to keep your same code structure, I'd recommend keeping track of the indices within the loop using enumerate, as in:
for x_ind, x in enumerate(ints):
    for a_ind, a in enumerate(ints):
        if x_ind != a_ind:
            if x == a:
                new_list.append(a)
Although, for what it's worth, I think your example of working code is a better way of accomplishing the same task.
Although the example of working code is correct, it suffers from quadratic complexity, which makes it slow for larger lists. I'd prefer something like this:
from nltk.probability import FreqDist

def non_unique(ints):
    fd = FreqDist(ints)
    return [x for x in ints if fd[x] > 1]
It precomputes a frequency distribution in the first step, and then selects all non-unique elements. Both steps have an O(n) performance characteristic.
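If you'd rather not depend on nltk, collections.Counter from the standard library behaves the same way here:
from collections import Counter

def non_unique(ints):
    counts = Counter(ints)
    return [x for x in ints if counts[x] > 1]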
I hope to write the join_lists function to take an arbitrary number of lists and concatenate them. For example, if the inputs are
m = [1, 2, 3]
n = [4, 5, 6]
o = [7, 8, 9]
then when I call print join_lists(m, n, o), it will return [1, 2, 3, 4, 5, 6, 7, 8, 9]. I realize I should use *args as the argument in join_lists, but I am not sure how to concatenate an arbitrary number of lists. Thanks.
Although you can use something which invokes __add__ sequentially, that is very much the wrong thing (for starters you end up creating as many new lists as there are lists in your input, which ends up having quadratic complexity).
The standard tool is itertools.chain:
import itertools

def concatenate(*lists):
    return itertools.chain(*lists)
or
def concatenate(*lists):
    return itertools.chain.from_iterable(lists)
This will return a generator which yields each element of the lists in sequence. If you need it as a list, use list: list(itertools.chain.from_iterable(lists))
If you insist on doing this "by hand", then use extend:
def concatenate(*lists):
    newlist = []
    for l in lists:
        newlist.extend(l)
    return newlist
Actually, don't use extend like that - it's still inefficient, because it has to keep extending the original list. The "right" way (it's still really the wrong way):
def concatenate(*lists):
    lengths = [len(l) for l in lists]  # a list (not a map object) so it can be iterated twice
    newlen = sum(lengths)
    newlist = [None] * newlen
    start = 0
    end = 0
    for l, n in zip(lists, lengths):
        end += n
        newlist[start:end] = l
        start += n
    return newlist
http://ideone.com/Mi3UyL
You'll note that this still ends up doing as many copy operations as there are total slots in the lists. So, this isn't any better than using list(chain.from_iterable(lists)), and is probably worse, because list can make use of optimisations at the C level.
Finally, here's a version using extend (suboptimal) in one line, using reduce (in Python 3, import it with from functools import reduce):
concatenate = lambda *lists: reduce((lambda a, b: a.extend(b) or a), lists, [])
One way would be this (using reduce) because I currently feel functional:
import operator
from functools import reduce
def concatenate(*lists):
    return reduce(operator.add, lists)
However, a better functional method is given in Marcin's answer:
from itertools import chain

def concatenate(*lists):
    return chain(*lists)
although you might as well use itertools.chain(*iterable_of_lists) directly.
A procedural way:
def concatenate(*lists):
    new_list = []
    for i in lists:
        new_list.extend(i)
    return new_list
A golfed version: j=lambda*x:sum(x,[]) (do not actually use this).
You can use sum() with an empty list as the start argument:
def join_lists(*lists):
    return sum(lists, [])
For example:
>>> join_lists([1, 2, 3], [4, 5, 6])
[1, 2, 3, 4, 5, 6]
Another way (note that the tuple unpacking only works because each list has exactly three elements):
>>> m = [1, 2, 3]
>>> n = [4, 5, 6]
>>> o = [7, 8, 9]
>>> p = []
>>> for (i, j, k) in (m, n, o):
...     p.append(i)
...     p.append(j)
...     p.append(k)
...
>>> p
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
This seems to work just fine:
def join_lists(*args):
    output = []
    for lst in args:
        output += lst
    return output
It returns a new list with all the items of the previous lists. Is using + not appropriate for this kind of list processing?
Or you could be logical instead: make a variable (here z) refer to the first list passed to the join_lists function, copy its items (not the list itself) into a new list, and then add the elements of the other lists to it:
m = [1, 2, 3]
n = [4, 5, 6]
o = [7, 8, 9]
def join_lists(*x):
    z = x[0]               # the first list passed in
    new_list = list(z)     # copy its items, not the list itself
    for item in x:
        if item is not z:  # skip the first list; append the rest
            new_list += item
    return new_list
then
print(join_lists(m, n, o))
would output:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
I have a list of lists named 'run'. I am creating an average of those lists using this section of my code:
ave = [0 for t in range(s)]
for t in range(s):
    z = 0
    for i in range(l):
        z = z + run[i][t]
    # Converted values to a string for output purposes
    # Added \n to output
    ave[t] = str(z / l) + "\n"
Much to my surprise, this code worked the first time that I wrote it. I'm now planning on working with much larger lists and many more values, and it's possible that performance issues will come into play. Is this method of writing an average inefficient in its use of computational resources, and how could I write code that was more efficient?
List comprehensions may be more efficient.
>>> run = [[1, 2, 3, 4, 5], [6, 7, 8, 9], [10, 11, 12, 13]]
>>> [sum(elem)/len(elem) for elem in zip(*run)]
[5.666666666666667, 6.666666666666667, 7.666666666666667, 8.666666666666666]
Alternatively, you could try map()
>>> list(map(lambda x: sum(x)/len(x), zip(*run)))
[5.666666666666667, 6.666666666666667, 7.666666666666667, 8.666666666666666]
You can improve efficiency by having Python do more of the work for you with efficient built-in functions and list comprehensions:
averages = [sum(items) / len(run) for items in zip(*run)]
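Equivalently (assuming Python 3.4+), the standard library's statistics.mean makes the intent explicit; len(items) equals len(run) here because zip stops at the shortest list:
from statistics import mean
averages = [mean(items) for items in zip(*run)]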
import numpy as np
ave = [np.mean(col) for col in zip(*run)]
OR
ave = [sum(col)/len(col) for col in zip(*run)]
I came to this question looking for the following. It is crude, but because it does not use zip, it does not ignore any values when the lists have different lengths.
If you have a numerical list of lists with different lengths and want to find the average list:
import numpy as np
def my_mean(list_of_lists):
    maxlen = max([len(l) for l in list_of_lists])
    for i in range(len(list_of_lists)):
        while len(list_of_lists[i]) < maxlen:
            list_of_lists[i].append(np.nan)
    return np.nanmean(list_of_lists, axis=0)
aaa = [1, 2, 3]
bbb = [1, 2, 3, 5]
ccc = [4, 5, 6, 5, 10]
lofl = [aaa, bbb, ccc]
print(my_mean(lofl))
gives
[ 2. 3. 4. 5. 10.]
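Note that my_mean pads the input lists in place. If mutating the inputs is a concern, a sketch using itertools.zip_longest avoids that:
import numpy as np
from itertools import zip_longest

def my_mean(list_of_lists):
    # zip_longest pads shorter lists with nan on the fly, without touching the inputs
    padded = list(zip_longest(*list_of_lists, fillvalue=np.nan))
    return np.nanmean(padded, axis=1)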