Remove duplicate items from lists in Python lists - python

I want to remove duplicate items from lists in sublists on Python.
Exemple :
myList = [[1,2,3], [4,5,6,3], [7,8,9], [0,2,4]]
to
myList = [[1,2,3], [4,5,6], [7,8,9], [0]]
I tried with this code :
myList = [[1,2,3],[4,5,6,3],[7,8,9], [0,2,4]]
nbr = []
for x in myList:
for i in x:
if i not in nbr:
nbr.append(i)
else:
x.remove(i)
But some duplicate items are not deleted.
Like this : [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 4]]
I still have the number 4 that repeats.

You iterate over a list that you are also modifying:
...
for i in x:
...
x.remove(i)
That means that it may skip an element on next iteration.
The solution is to create a shallow copy of the list and iterate over that while modifying the original list:
...
for i in x.copy():
...
x.remove(i)

You can make this much faster by:
Using a set for repeated membership testing instead of a list, and
Rebuilding each sublist rather than repeatedly calling list.remove() (a linear-time operation, each time) in a loop.
seen = set()
for i, sublist in enumerate(myList):
new_list = []
for x in sublist:
if x not in seen:
seen.add(x)
new_list.append(x)
myList[i] = new_list
>>> print(myList)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
If you want mild speed gains and moderate readability loss, you can also write this as:
seen = set()
for i, sublist in enumerate(myList):
myList[i] = [x for x in sublist if not (x in seen or seen.add(x))]

Why you got wrong answer: In your code, after scanning the first 3 sublists, nbr = [1, 2, 3, 4, 5, 6, 7, 8, 9]. Now x = [0, 2, 4]. Duplicate is detected when i = x[1], so x = [0, 4]. Now i move to x[2] which stops the for loop.
Optimization has been proposed in other answers. Generally, 'list' is only good for retrieving element and appending/removing at the rear.

Related

How to get the get the lowest sum of 3 elements in matrix

A = [[1, 2, 4],
[3, 5, 6],
[7,8,9]]
def sumof(A,3):
We need to find the lowest sum of 3 elements
here 1 + 2 + 3 = 6 so output is (1,2,3)
Can we do by using zip
one basic solution, convert nested list into flat list, sort it,slice sorted list and sum:
A = [[1, 2, 4],
[3, 5, 6],
[7,8,9]]
def sumof(A, n):
# convert nested list into flat list
flat_list = [item for sublist in A for item in sublist]
return sum(sorted(flat_list)[:n])
print (sumof(A,3))
If you have a large array, you can do this without sorting the list,
which is a little faster like
from operator import add
from functools import reduce
A = [[1, 2, 4],
[3, 5, 6],
[7,8,9]]
addlists = lambda l: reduce(add, l)
list_A = addlists(A)
result = [list_A.pop(list_A.index(min(list_A))) for _ in range(3)]
It's a little more complicated, though the modules imported are really useful.

remove a list of lists while iterating over it

I know that you are not supposed to remove an element of a list while iterating over it but I have to.
I'm trying to iterate over a list of lists and if I find a value in a list of my lists i need to remove it.
This is what I've tried so far.
dict[["A1","A2"],
["B1","B2"],
["C1","C2"]]
for i in range(len(dict)):
if dict[i][0]=="A1":
dict.pop(i)
But it's giving me an error of out of range.
How can I do it with list comprehensions or any other approach?
Do you mean this?
old = [["A1","A2"], ["B1","B2"], ["C1","C2"]]
new = [x for x in old if x[0] != "A1"]
You can't. You will get an exception. Create a new list as a copy.
>>> disallowed = [1, 2, 3]
>>> my_list = [ [1, 2, 3, 4, 5, 6, 7], [3, 3, 4, 5, 8, 8, 2] ]
>>> filtered_list = [[y for y in x if y not in disallowed] for x in my_list]
>>> print filtered_list
[[4, 5, 6, 7], [4, 5, 8, 8]]
You can actually delete from a list while you iterate over it, provided you to it backwards (so deletion only affects higher indices which you have already seen during this iteration):
data = [["A1","A2"],
["B1","B2"],
["C1","C2"]]
for i, pair in reversed(data):
if pair[0] == 'A1':
del data[i]

Python: How to only retain elements from all previous sublists while iterating over list?

So this is what I have:
lst = [[1,4,5,9], [4,5,7,9], [6,2,9], [4,5,9], [4,5]]
And I want to make a new list where the sublists only have the elements that are shared with ALL the previous sublists.
It should look like this:
new_lst = [[1,4,5,9], [4,5,9], [9], [9], []]
I tried converting the integers to strings and iterating over it, but I can't seem to get the right result. I'm new to python, so any help is much appreciated!
If you don't care about the order of the items, you can use a set based approach:
last = set(lst[0])
res = [lst[0]]
for item in lst[1:]:
last &= set(item)
res.append(list(last))
Here, res contains the resulting list of lists. The important line is last &= set(item) which calculates the intersection between the previous and the current item.
You can use accumulate
from itertools import accumulate
new_lst = map(list,accumulate(map(set,lst), set.intersection))
print(list(new_lst))
which produces
[[1, 4, 5, 9], [4, 5, 9], [9], [9], []]
but it doesn't guarantee the items of the sublists will be in the same order.
In case you do care about the order of the items, you can do it manually
last = set(lst[0])
new_lst = [lst[0]]
for l in lst[1:]:
new_els = [n for n in l if n in last]
new_lst.append(new_els)
last &= set(new_els)
print(new_lst)
Or just ignore sets and use lists instead, but sets are faster, since they are hashed and intersection is a core functionality.
One line answer:
>>> from functools import reduce # for forward compatible.
>>> lst = [[1,4,5,9], [4,5,7,9], [4,5,7,9], [4,5,9], [4,5]]
>>> [list(reduce(lambda x,y: set(x)&set(y), lst[:i+1])) for i in range(len(lst))]
[[1, 4, 5, 9], [9, 4, 5], [9], [9], []]
The explanation:
[1, 4, 5, 9] = list(set([1, 4, 5, 9]))
[9, 4, 5] = list(set([1, 4, 5, 9]) & set([4,5,7,9]))
[9] = list(set([1, 4, 5, 9]) & set([4,5,7,9]) & set([4,5,7,9]))
...
in the i'st loop, it should be union the items in lst[:i+1].

How can I find same values in a list and group together a new list?

From this list:
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
I'm trying to create:
L = [[1],[2,2],[3,3,3],[4,4,4,4],[5,5,5,5,5]]
Any value which is found to be the same is grouped into it's own sublist.
Here is my attempt so far, I'm thinking I should use a while loop?
global n
n = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5] #Sorted list
l = [] #Empty list to append values to
def compare(val):
""" This function receives index values
from the n list (n[0] etc) """
global valin
valin = val
global count
count = 0
for i in xrange(len(n)):
if valin == n[count]: # If the input value i.e. n[x] == n[iteration]
temp = valin, n[count]
l.append(temp) #append the values to a new list
count +=1
else:
count +=1
for x in xrange (len(n)):
compare(n[x]) #pass the n[x] to compare function
Use itertools.groupby:
from itertools import groupby
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
print([list(j) for i, j in groupby(N)])
Output:
[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
Side note: Prevent from using global variable when you don't need to.
Someone mentions for N=[1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 1] it will get [[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [1]]
In other words, when numbers of the list isn't in order or it is a mess list, it's not available.
So I have better answer to solve this problem.
from collections import Counter
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
C = Counter(N)
print [ [k,]*v for k,v in C.items()]
You can use itertools.groupby along with a list comprehension
>>> l = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
>>> [list(v) for k,v in itertools.groupby(l)]
[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
This can be assigned to the variable L as in
L = [list(v) for k,v in itertools.groupby(l)]
You're overcomplicating this.
What you want to do is: for each value, if it's the same as the last value, just append it to the list of last values; otherwise, create a new list. You can translate that English directly to Python:
new_list = []
for value in old_list:
if new_list and new_list[-1][0] == value:
new_list[-1].append(value)
else:
new_list.append([value])
There are even simpler ways to do this if you're willing to get a bit more abstract, e.g., by using the grouping functions in itertools. But this should be easy to understand.
If you really need to do this with a while loop, you can translate any for loop into a while loop like this:
for value in iterable:
do_stuff(value)
iterator = iter(iterable)
while True:
try:
value = next(iterator)
except StopIteration:
break
do_stuff(value)
Or, if you know the iterable is a sequence, you can use a slightly simpler while loop:
index = 0
while index < len(sequence):
value = sequence[index]
do_stuff(value)
index += 1
But both of these make your code less readable, less Pythonic, more complicated, less efficient, easier to get wrong, etc.
You can do that using numpy too:
import numpy as np
N = np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
counter = np.arange(1, np.alen(N))
L = np.split(N, counter[N[1:]!=N[:-1]])
The advantage of this method is when you have another list which is related to N and you want to split it in the same way.
Another slightly different solution that doesn't rely on itertools:
#!/usr/bin/env python
def group(items):
"""
groups a sorted list of integers into sublists based on the integer key
"""
if len(items) == 0:
return []
grouped_items = []
prev_item, rest_items = items[0], items[1:]
subgroup = [prev_item]
for item in rest_items:
if item != prev_item:
grouped_items.append(subgroup)
subgroup = []
subgroup.append(item)
prev_item = item
grouped_items.append(subgroup)
return grouped_items
print group([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
# [[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]

array except other

I have 2 arrays:
arr1 = [a,b,c,d,e]
arr2 = [c,d,e]
I want to give array arr1 except arr2.
Mathematically, you're looking for a difference between two sets represented in lists. So how about using the Python set, which has a builtin difference operation (overloaded on the - operator)?
>>>
>>> arr = [1, 2, 3, 4, 5]
>>> arr2 = [3, 4, 9]
>>> set(arr) - set(arr2)
>>> sdiff = set(arr) - set(arr2)
>>> sdiff
set([1, 2, 5])
>>> list(sdiff)
[1, 2, 5]
>>>
It would be more convenient to have your information in a set in the first place, though. This operation suggests that a set better fits your application semantics than a list. On the other hand, if you may have duplicates in the lists, then set is not a good solution.
So you want the difference of two lists:
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list2 = [1, 2, 3, 4, 4, 6, 7, 8, 11, 77]
def list_difference(list1, list2):
"""uses list1 as the reference, returns list of items not in list2"""
diff_list = []
for item in list1:
if not item in list2:
diff_list.append(item)
return diff_list
print list_difference(list1, list2) # [5, 9, 10]
Or using list comprehension:
# simpler using list comprehension
diff_list = [item for item in list1 if item not in list2]
print diff_list # [5, 9, 10]
If you care about (1) preserving the order in which the items appear and (2) efficiency in the case where your lists are large, you probably want a hybrid of the two solutions already proposed.
list2_items = set(list2)
[x for x in list1 if x not in list2_items]
(Converting both to sets will lose the ordering. Using if x not in list2 in your list comprehension will give you in effect an iteration over both lists, which will be inefficient if list2 is large.)
If you know that list2 is not very long and don't need to save every possible microsecond, you should probably go with the simple list comprehension proposed by Flavius: it's short, simple and says exactly what you mean.

Categories

Resources