I want to remove items that are duplicated across the sublists of a list in Python.
Example:
myList = [[1,2,3], [4,5,6,3], [7,8,9], [0,2,4]]
to
myList = [[1,2,3], [4,5,6], [7,8,9], [0]]
I tried with this code :
myList = [[1,2,3],[4,5,6,3],[7,8,9], [0,2,4]]
nbr = []
for x in myList:
    for i in x:
        if i not in nbr:
            nbr.append(i)
        else:
            x.remove(i)
But some duplicate items are not deleted.
For example, I get: [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 4]]
I still have the number 4 that repeats.
You iterate over a list that you are also modifying:
...
    for i in x:
        ...
            x.remove(i)
That means the loop may skip an element on the next iteration.
The solution is to create a shallow copy of the list and iterate over that while modifying the original list:
...
    for i in x.copy():
        ...
            x.remove(i)
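Put together with the code from the question, the shallow-copy fix could look like the sketch below. The names my_list, seen and sub are only illustrative renames of the asker's myList, nbr and x.

```python
my_list = [[1, 2, 3], [4, 5, 6, 3], [7, 8, 9], [0, 2, 4]]
seen = []

for sub in my_list:
    # Iterate over a snapshot so that removing from `sub` cannot
    # shift the positions the loop is still going to visit.
    for item in sub.copy():
        if item not in seen:
            seen.append(item)
        else:
            sub.remove(item)

print(my_list)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
```

This keeps the original approach (a list for membership tests and list.remove()), so it is correct but not fast.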
You can make this much faster by:
Using a set for repeated membership testing instead of a list, and
Rebuilding each sublist rather than repeatedly calling list.remove() (a linear-time operation, each time) in a loop.
seen = set()
for i, sublist in enumerate(myList):
    new_list = []
    for x in sublist:
        if x not in seen:
            seen.add(x)
            new_list.append(x)
    myList[i] = new_list
>>> print(myList)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
If you want mild speed gains and moderate readability loss, you can also write this as:
seen = set()
for i, sublist in enumerate(myList):
    myList[i] = [x for x in sublist if not (x in seen or seen.add(x))]
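The condition in that comprehension relies on set.add() returning None, which is falsy. A small sketch of the same expression in isolation may make this easier to see; the helper name is_first_occurrence is hypothetical and only used for illustration.

```python
seen = set()

def is_first_occurrence(x):
    # If x is already in `seen`, the `or` short-circuits to True and
    # the function returns False. Otherwise seen.add(x) runs, returns
    # None (falsy), and `not None` gives True.
    return not (x in seen or seen.add(x))

results = [is_first_occurrence(v) for v in [1, 2, 1, 3, 2]]
print(results)  # [True, True, False, True, False]
```

The side effect hidden inside the condition is what causes the readability loss mentioned above.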
Why you got the wrong answer: in your code, after scanning the first 3 sublists, nbr = [1, 2, 3, 4, 5, 6, 7, 8, 9]. Now x = [0, 2, 4]. A duplicate is detected when i = x[1], so x becomes [0, 4]. The loop then moves on to index 2, which is past the end of the shortened list, so the for loop stops and 4 is never checked.
Optimizations have been proposed in other answers. Generally, a list is only efficient for retrieving an element by index and for appending or removing at the end.
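The skipped element can be observed directly with a short trace of the last sublist. This is only a sketch: the visited list is an illustrative addition that records which values the loop actually sees.

```python
nbr = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # state after the first three sublists
x = [0, 2, 4]
visited = []

for i in x:
    visited.append(i)
    if i not in nbr:
        nbr.append(i)
    else:
        x.remove(i)  # shortens x while the loop is still running

print(visited)  # [0, 2]
print(x)        # [0, 4]
```

The value 4 never shows up in visited, which is why it survives in the result.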
An example:
list = [[2, 1, 2, 3, 4],
[0, 4, 5],
[1, 8, 9]]
So the first element of each nested list determines how many of the following numbers are put into a flat list.
[2, 1, 2, 3, 4] -> 2: so 1 and 2 get picked up
[0, 4, 5] -> 0: no number gets picked up
[1, 8, 9] -> 1: so 8 gets picked up
Output would be:
[1, 2, 8]
This is what I have so far:
def nested_list(numbers):
    if isinstance(numbers[0], list):
        if numbers[0][0] > 0:
            nested_list(numbers[0][1:numbers[0][0] + 1])
    else:
        numbers = list(numbers[0])
    return numbers + nested_list(numbers[1:])
I am trying to build the list through recursion, but something is wrong. What am I missing, or could this even be done without recursion?
You can try a list comprehension with extended iterable unpacking here.
[val for idx, *rem in lst for val in rem[:idx]]
# [1, 2, 8]
NB This solution assumes you would always have a sub-list of size 1 or greater. We can filter out empty sub-lists using filter(None, lst)
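A minimal sketch of the filter(None, lst) variant mentioned above, assuming an input that contains one empty sublist (the sample data is made up for illustration):

```python
lst = [[2, 1, 2, 3, 4], [], [0, 4, 5], [1, 8, 9]]

# filter(None, lst) drops falsy items, so the empty sublist is skipped
# before the unpacking `idx, *rem` would fail on it.
result = [val for idx, *rem in filter(None, lst) for val in rem[:idx]]

print(result)  # [1, 2, 8]
```

Without the filter, unpacking the empty sublist raises a ValueError because there is no first element to bind to idx.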
list1 = [[2, 1, 2, 3, 4],
         [0, 4, 5],
         [1, 8, 9]]
list2 = []
for nested_list in list1:
    for i in range(nested_list[0]):
        list2.append(nested_list[i+1])
You can try List-comprehension:
>>> [sub[i] for sub in lst for i in range(1, sub[0]+1) ]
[1, 2, 8]
PS: The solution expects each sublist to be a non-empty list, else it will throw IndexError exception due to sub[0].
Another list comprehension
sum([x[1:x[0] + 1] for x in arr], [])
# [1, 2, 8]
Using the builtin function map to apply the picking function, and itertools.chain to flatten the resulting list of lists:
import itertools
def pick(l):
    return l[1:1+l[0]]
ll = [[2, 1, 2, 3, 4], [0, 4, 5], [1, 8, 9]]
print( list(map(pick, ll)) )
# [[1, 2], [], [8]]
print( list(itertools.chain.from_iterable(map(pick, ll))) )
# [1, 2, 8]
Or alternatively, with a list comprehension:
ll = [[2, 1, 2, 3, 4], [0, 4, 5], [1, 8, 9]]
print( [x for l in ll for x in l[1:1+l[0]]] )
# [1, 2, 8]
Two important notes:
I've renamed your list of lists ll rather than list. This is because list is already the name of the builtin class list in python. Shadowing the name of a builtin is very dangerous and can have unexpected consequences. I strongly advise you never to use the name of a builtin, when naming your own variables.
For both solutions above, the error-handling behaves the same: exception IndexError will be raised if one of the sublists is empty (because we need to access the first element to know how many elements to pick, so an error is raised if there is no first element). However, no exception will be raised if there are not enough elements in one of the sublists. For instance, if one of the sublists is [12, 3, 4], then both solutions above will silently pick the two elements 3 and 4, even though they were asked to pick 12 elements and not just 2. If you want an exception to be raised for this situation, you can modify function pick in the first solution:
def pick(l):
    if len(l) == 0 or len(l) <= l[0]:
        raise ValueError('in function pick: too few elements in sublist {}'.format(l))
    return l[1:1+l[0]]
ll = [[2, 1, 2, 3, 4], [0, 4, 5], [1, 8, 9], [12, 3, 4]]
print( [x for l in ll for x in l[1:1+l[0]]] )
# [1, 2, 8, 3, 4]
print( [x for l in ll for x in pick(l)] )
# ValueError: in function pick: too few elements in sublist [12, 3, 4]
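The first note above, about shadowing the builtin list, can be illustrated with a short sketch. The function name shadowing_demo is hypothetical, and the shadowing is kept inside the function so that the builtin stays usable elsewhere.

```python
def shadowing_demo():
    # Inside this function the name `list` now refers to our data,
    # not to the builtin class.
    list = [[2, 1, 2, 3, 4], [0, 4, 5]]
    try:
        return list((1, 2, 3))  # intended as a call to the builtin
    except TypeError as exc:
        return "TypeError: {}".format(exc)

message = shadowing_demo()
print(message)  # TypeError: 'list' object is not callable
```

At module level the same assignment would hide the builtin for the rest of the script, which is why a different name such as ll is safer.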
lst_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
lst_b = [[1, 4, 7], [6, 5, 4], [9, 8, 7]]
My goal is to check, for each nested list in lst_a, whether its first entry equals the first entry of any nested list in lst_b. If it does not, copy ONLY THAT sublist to lst_b. In this example, lst_a[0] would not be copied, but lst_a[1] and lst_a[2] would.
I tried to achieve my goal with list comprehension but it won't work.
zero = [x[0] for x in lst_a]
if zero not in lst_b:
    # I don't know what to do here.
Creating a tuple or a dictionary isn't possible because the whole process runs in a loop in which new data comes in every second, and I am trying to avoid copying duplicates to the list.
EDIT: lst_b should look like that after the whole process:
lst_b = [[1, 4, 7], [6, 5, 4], [9, 8, 7], [4, 5, 6], [7, 8, 9]]
Extract all the first elements from lst_b into a set so you can check membership efficiently. Then use a list comprehension to copy all the sublists in lst_a that match your criteria.
first_elements = {x[0] for x in lst_b}
result = [x for x in lst_a if x[0] not in first_elements]
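Since the question mentions that new data arrives repeatedly, one possible variant keeps the set alive between batches and appends to lst_b in place. This is only a sketch under that assumption; the function name add_new_rows is hypothetical.

```python
lst_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
lst_b = [[1, 4, 7], [6, 5, 4], [9, 8, 7]]

# Built once, then kept in sync with lst_b as rows are appended.
first_elements = {row[0] for row in lst_b}

def add_new_rows(incoming):
    for row in incoming:
        if row[0] not in first_elements:
            lst_b.append(row)
            first_elements.add(row[0])

add_new_rows(lst_a)
add_new_rows(lst_a)  # a repeated batch adds nothing
print(lst_b)
# [[1, 4, 7], [6, 5, 4], [9, 8, 7], [4, 5, 6], [7, 8, 9]]
```

Updating the set alongside lst_b avoids rebuilding it from scratch for every batch.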
It's a bit of a mouthful, but not too bad:
lst_b.extend(x for x in lst_a if not any(x[0] == y[0] for y in lst_b))
If you want a new list rather than modifying lst_b in place, then
lst_c = lst_b + [x for x in lst_a if not any(x[0] == y[0] for y in lst_b)]
In either case, we examine each sublist x in lst_a. any(x[0] == y[0] for y in lst_b) is True if the first element of the sublist is equal to the first element of any sublist in lst_b. If that's not true, then we'll include x in our final result.
Using any allows us to avoid checking against every sublist in lst_b when finding a single match is sufficient. (There are cases where this could be more efficient than first creating an entire set of first elements, as in @Barmar's answer, but on average that approach is probably more efficient.)
Another way:
exclude=set(next(zip(*lst_b)))
lst_b+=[sl for sl in lst_a if sl[0] not in exclude]
>>> lst_b
[[1, 4, 7], [6, 5, 4], [9, 8, 7], [4, 5, 6], [7, 8, 9]]
Explanation:
zip(*lst_b) is an iterator over the transpose of the matrix lst_b. The * unpacks the sublists as separate arguments, so the iterator yields the tuples (1, 6, 9), (4, 5, 8), (7, 4, 7) in turn.
next(zip(*lst_b)): we only need the first element of that transpose: (1, 6, 9)
set(next(zip(*lst_b))): we only need the unique elements of that, so turn it into a set. You get {1, 6, 9} (order does not matter)
[sl for sl in lst_a if sl[0] not in exclude] filter on that condition.
lst_b+= extend lst_b with the filtered elements.
Profit!
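The steps in the explanation above can be checked one at a time. The intermediate names columns, first_column and filtered are illustrative only.

```python
lst_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
lst_b = [[1, 4, 7], [6, 5, 4], [9, 8, 7]]

columns = list(zip(*lst_b))       # all columns, materialized for display
first_column = next(zip(*lst_b))  # only the first column is computed
exclude = set(first_column)
filtered = [sl for sl in lst_a if sl[0] not in exclude]

print(columns)       # [(1, 6, 9), (4, 5, 8), (7, 4, 7)]
print(first_column)  # (1, 6, 9)
print(filtered)      # [[4, 5, 6], [7, 8, 9]]
```

Because zip is lazy, next(zip(*lst_b)) stops after building the first tuple and never produces the other columns.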
There may be more efficient ways of doing this, but this accomplishes the goal.
>>> [a for a in lst_a if a[0] not in [b[0] for b in lst_b]]
[[4, 5, 6], [7, 8, 9]]
I have 2 arrays:
arr1 = [a,b,c,d,e]
arr2 = [c,d,e]
I want to get arr1 without the elements of arr2.
Mathematically, you're looking for a difference between two sets represented in lists. So how about using the Python set, which has a builtin difference operation (overloaded on the - operator)?
>>> arr = [1, 2, 3, 4, 5]
>>> arr2 = [3, 4, 9]
>>> sdiff = set(arr) - set(arr2)
>>> sdiff
{1, 2, 5}
>>> list(sdiff)
[1, 2, 5]
It would be more convenient to have your information in a set in the first place, though. This operation suggests that a set better fits your application semantics than a list. On the other hand, if you may have duplicates in the lists, then set is not a good solution.
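For the case with duplicates, one possible alternative (not part of the answer above) is a multiset difference with collections.Counter, where each occurrence in the second list cancels one occurrence in the first. The sample data here is made up for illustration.

```python
from collections import Counter

arr1 = [1, 2, 2, 3, 4]
arr2 = [2, 3]

# Counter subtraction keeps only elements with a positive remaining count.
remaining = Counter(arr1) - Counter(arr2)
diff = list(remaining.elements())

print(diff)  # [1, 2, 4]
```

Whether this is the right semantics depends on the application: here one of the two 2s survives, whereas a set difference would drop both.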
So you want the difference of two lists:
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list2 = [1, 2, 3, 4, 4, 6, 7, 8, 11, 77]
def list_difference(list1, list2):
    """uses list1 as the reference, returns list of items not in list2"""
    diff_list = []
    for item in list1:
        if item not in list2:
            diff_list.append(item)
    return diff_list
print(list_difference(list1, list2)) # [5, 9, 10]
Or using list comprehension:
# simpler using list comprehension
diff_list = [item for item in list1 if item not in list2]
print(diff_list) # [5, 9, 10]
If you care about (1) preserving the order in which the items appear and (2) efficiency in the case where your lists are large, you probably want a hybrid of the two solutions already proposed.
list2_items = set(list2)
[x for x in list1 if x not in list2_items]
(Converting both to sets will lose the ordering. Using if x not in list2 in your list comprehension will give you in effect an iteration over both lists, which will be inefficient if list2 is large.)
If you know that list2 is not very long and don't need to save every possible microsecond, you should probably go with the simple list comprehension proposed by Flavius: it's short, simple and says exactly what you mean.
I have two lists that I need to combine, where any elements of the second list that duplicate elements of the first list are ignored. It's a bit hard to explain, so let me show an example of what the code looks like and what I want as a result.
first_list = [1, 2, 2, 5]
second_list = [2, 5, 7, 9]
# The result of combining the two lists should result in this list:
resulting_list = [1, 2, 2, 5, 7, 9]
You'll notice that the result has the first list, including its two "2" values, but the additional 2 and 5 values from second_list are not added to it.
Normally for something like this I would use sets, but a set on first_list would purge the duplicate values it already has. So I'm simply wondering what the best/fastest way to achieve this desired combination is.
Thanks.
You need to append to the first list those elements of the second list that aren't in the first - sets are the easiest way of determining which elements they are, like this:
first_list = [1, 2, 2, 5]
second_list = [2, 5, 7, 9]
in_first = set(first_list)
in_second = set(second_list)
in_second_but_not_in_first = in_second - in_first
result = first_list + list(in_second_but_not_in_first)
print(result) # Prints [1, 2, 2, 5, 9, 7]
Or if you prefer one-liners 8-)
print(first_list + list(set(second_list) - set(first_list)))
resulting_list = list(first_list)
resulting_list.extend(x for x in second_list if x not in resulting_list)
You can use sets:
first_list = [1, 2, 2, 5]
second_list = [2, 5, 7, 9]
resultList= list(set(first_list) | set(second_list))
print(resultList)
# Results in : resultList = [1,2,5,7,9]
first_list = [1, 2, 2, 5]
second_list = [2, 5, 7, 9]
print( set( first_list + second_list ) )
You can bring this down to a single expression if you use numpy:
import numpy as np
a = [1,2,3,4,5,6,7]
b = [2,4,7,8,9,10,11,12]
np.unique(a + b).tolist()  # np.unique already returns sorted unique values
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Simplest to me is:
first_list = [1, 2, 2, 5]
second_list = [2, 5, 7, 9]
merged_list = list(set(first_list+second_list))
print(merged_list)
#prints [1, 2, 5, 7, 9]
resulting_list = first_list + [i for i in second_list if i not in first_list]
You can also combine RichieHindle's and Ned Batchelder's responses for an average-case O(m+n) algorithm that preserves order:
first_list = [1, 2, 2, 5]
second_list = [2, 5, 7, 9]
fs = set(first_list)
resulting_list = first_list + [x for x in second_list if x not in fs]
assert(resulting_list == [1, 2, 2, 5, 7, 9])
Note that x in s has a worst-case complexity of O(m), so the worst-case complexity of this code is still O(m*n).
Based on the recipe :
resulting_list = list(set().union(first_list, second_list))
You can use dict.fromkeys to return a list with no duplicates:
def mergeTwoListNoDuplicates(list1, list2):
    """
    Merges two lists together without duplicates
    :param list1:
    :param list2:
    :return:
    """
    merged_list = list1 + list2
    merged_list = list(dict.fromkeys(merged_list))
    return merged_list
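Applied to the lists from the question, the dict.fromkeys approach behaves as in the sketch below. It assumes Python 3.7 or later, where dicts keep insertion order.

```python
first_list = [1, 2, 2, 5]
second_list = [2, 5, 7, 9]

# Keys are unique and keep the order of first appearance.
merged = list(dict.fromkeys(first_list + second_list))

print(merged)  # [1, 2, 5, 7, 9]
```

Note that the repeated 2 inside first_list is collapsed as well, so this differs from the [1, 2, 2, 5, 7, 9] result requested in the question.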
This might help
def union(a,b):
    for e in b:
        if e not in a:
            a.append(e)
The union function merges the second list into the first without duplicating an element that is already in a, similar to the set union operator. It does not change b. For example, if a=[1,2,3] and b=[2,3,4], then after union(a,b), a=[1,2,3,4] and b=[2,3,4].
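A short usage sketch of that function with the lists from the question. One point that is easy to miss: the function modifies a in place and returns None.

```python
def union(a, b):
    for e in b:
        if e not in a:
            a.append(e)

first_list = [1, 2, 2, 5]
second_list = [2, 5, 7, 9]

returned = union(first_list, second_list)

print(first_list)   # [1, 2, 2, 5, 7, 9]
print(second_list)  # [2, 5, 7, 9]
print(returned)     # None
```

If the original first_list must be preserved, passing a copy such as list(first_list) and using that copy as the result would be one way to do it.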
first_list = [1, 2, 2, 5]
second_list = [2, 5, 7, 9]
newList = []
for i in first_list:
    newList.append(i)
for z in second_list:
    if z not in newList:
        newList.append(z)
newList.sort()
print(newList)
# [1, 2, 2, 5, 7, 9]