Unique list of lists - python

I have a nested list as an example:
lst_a = [[1,2,3,5], [1,2,3,7], [1,2,3,9], [1,2,6,8]]
I'm trying to check whether the first 3 elements of a nested list are the same as those of another nested list.
I.e. if [1,2,3] appears at the start of several sublists, remove all but one of them, so that the nested list is unique with respect to that prefix.
I'm not sure what the most Pythonic way of doing this would be.
for i in range(0, len(lst_a)):
    if lst_a[i][:3] == lst_a[i-1][:3]:
        lst_a[i].pop()
Desired output:
lst_a = [[1,2,3,9], [1,2,6,8]]

If, as you said in the comments, sublists that have the same first three elements are always next to each other (but the list is not necessarily sorted), you can use itertools.groupby to group those elements and then take the first element of each group with next.
>>> from itertools import groupby
>>> lst_a = [[1,2,3,5], [1,2,3,7], [1,2,3,9], [1,2,6,8]]
>>> [next(g) for k, g in groupby(lst_a, key=lambda x: x[:3])]
[[1, 2, 3, 5], [1, 2, 6, 8]]
Or use a list comprehension with enumerate and compare the current element with the last one:
>>> [x for i, x in enumerate(lst_a) if i == 0 or lst_a[i-1][:3] != x[:3]]
[[1, 2, 3, 5], [1, 2, 6, 8]]
This does not require any imports, but IMHO when using groupby it is much clearer what the code is supposed to do. Note, however, that unlike your method, both of those will create a new filtered list, instead of updating/deleting from the original list.
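Both snippets above keep the first sublist of each group, whereas the desired output in the question keeps the last one. A minimal sketch of that variant, under the same assumption that sublists with equal prefixes are adjacent; the function name keep_last_per_prefix and the parameter n are made up for illustration:

```python
from itertools import groupby

def keep_last_per_prefix(rows, n=3):
    """Keep the last sublist of each run of sublists sharing the first n items."""
    # groupby only groups *adjacent* items, so equal prefixes must be neighbours.
    return [list(group)[-1] for _, group in groupby(rows, key=lambda r: r[:n])]

lst_a = [[1, 2, 3, 5], [1, 2, 3, 7], [1, 2, 3, 9], [1, 2, 6, 8]]
print(keep_last_per_prefix(lst_a))  # [[1, 2, 3, 9], [1, 2, 6, 8]]
```

Like the versions above, this builds a new list rather than modifying lst_a in place.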

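I think you are missing a second for loop if you want to check all possible pairs. I guess it should look like:
to_delete = set()
for i in range(len(lst_a)):
    for j in range(i + 1, len(lst_a)):
        if lst_a[i][:3] == lst_a[j][:3]:
            to_delete.add(i)  # a later sublist has the same first three elements
lst_a = [x for i, x in enumerate(lst_a) if i not in to_delete]
Deleting while going through the list is not the best idea; it is safer to record the unwanted elements first and remove them at the end.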

Going with your approach, see the code below. It compares each sublist with the last one that was kept:
lst = [lst_a[0]]
for li in lst_a[1:]:
    if li[:3] != lst[-1][:3]:
        lst.append(li)
print(lst)
Hope this helps!

You can use a dictionary to filter a list:
dct = {tuple(i[:3]): i for i in lst_a}
# {(1, 2, 3): [1, 2, 3, 9], (1, 2, 6): [1, 2, 6, 8]}
list(dct.values())
# [[1, 2, 3, 9], [1, 2, 6, 8]]
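The dictionary approach keeps the last sublist seen for each prefix, because later assignments overwrite earlier ones. A small sketch of how one might choose between keeping the first or the last occurrence; the function name unique_by_prefix and its keep parameter are hypothetical, and the ordering of the result relies on dict preserving insertion order (Python 3.7+):

```python
def unique_by_prefix(rows, n=3, keep="last"):
    """Return one sublist per distinct n-item prefix (hypothetical helper)."""
    chosen = {}
    for row in rows:
        key = tuple(row[:n])  # lists are unhashable, so use a tuple as the key
        if keep == "last":
            chosen[key] = row            # later rows overwrite earlier ones
        else:
            chosen.setdefault(key, row)  # only the first row per key is stored
    return list(chosen.values())

lst_a = [[1, 2, 3, 5], [1, 2, 3, 7], [1, 2, 3, 9], [1, 2, 6, 8]]
print(unique_by_prefix(lst_a))                # [[1, 2, 3, 9], [1, 2, 6, 8]]
print(unique_by_prefix(lst_a, keep="first"))  # [[1, 2, 3, 5], [1, 2, 6, 8]]
```

Unlike groupby, this does not require sublists with the same prefix to be adjacent.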

Related

Remove common in two list

I have a problem with a Python function uncommon(l1,l2) that takes two lists sorted in ascending order as arguments and returns the list of all elements that appear in exactly one of the two lists. The list returned should be in ascending order. All such elements should be listed only once, even if they appear multiple times in l1 or l2.
Thus, uncommon([2,2,4],[1,3,3,4,5]) should return [1,2,3,5] while uncommon([1,2,3],[1,1,2,3,3]) should return []
I have tried
def uncommon(l1,l2):
    sl1=set(l1)
    sl2=set(l2)
Using sets is a good start: sets support the usual group operations directly.
If we think in terms of Venn diagrams, the "uncommon" elements are everything in the union of the two lists minus their intersection.
In Python, this is called the symmetric difference, and it is built into sets:
def uncommon(l1, l2):
    set1 = set(l1)
    set2 = set(l2)
    return sorted(set1.symmetric_difference(set2))
print(uncommon([2, 2, 4], [1, 3, 3, 4, 5])) # [1, 2, 3, 5]
print(uncommon([2, 2, 4], [1, 3, 3, 4, 5, 255])) # [1, 2, 3, 5, 255]
print(uncommon([1, 2, 3], [1, 1, 2, 3, 3])) # []
For a solution that doesn't require sorting (O(n*logn)), you can merge the already sorted lists with heapq.merge after removing the duplicates and the common elements:
from heapq import merge
def uncommon(l1, l2):
    # dict.fromkeys removes duplicates while keeping the (sorted) input order
    d1 = dict.fromkeys(l1)
    d2 = dict.fromkeys(l2)
    drop = set(d1).intersection(d2)
    return list(merge(*([x for x in d if x not in drop]
                        for d in [d1, d2])))
uncommon([2,2,4], [1,3,3,4,5,256])
# [1, 2, 3, 5, 256]
To get the uncommon elements of two lists, remove the common elements (the intersection) from the union of the two lists.
def uncommon(l1,l2):
    l1 = set(l1)
    l2 = set(l2)
    return sorted(l1.union(l2) - l1.intersection(l2))
print(uncommon([2,2,4],[1,3,3,4,5]))
print(uncommon([1,2,3],[1,1,2,3,3]))
OUTPUT
[1, 2, 3, 5]
[]
def uncommon(l1,l2):
    sl1=set(l1)
    sl2=set(l2)
    return list(sl1^sl2)
Thanks for the help! Actually I found a solution like this. What about this?
The common elements are those contained in both lists. So first get rid of the elements of list 1 that are also in list 2. Then get rid of the elements of list 2 that are also in list 1.
Return the remaining elements. Lists do not support the - operator, so convert them to sets first:
def uncommon(l1, l2):
    d1 = set(l1) - set(l2)
    d2 = set(l2) - set(l1)
    return sorted(d1 | d2)
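Since the question states that both inputs are already sorted, a set-free alternative is to walk the two lists in parallel. The following is only a sketch of that idea, not taken from any of the answers; the name uncommon_sorted is made up for illustration:

```python
def uncommon_sorted(l1, l2):
    """Elements that occur in exactly one of two ascending lists, each listed once."""
    result = []
    i = j = 0
    while i < len(l1) or j < len(l2):
        if j >= len(l2) or (i < len(l1) and l1[i] < l2[j]):
            value, in_both = l1[i], False   # smallest remaining value is only in l1
        elif i >= len(l1) or l2[j] < l1[i]:
            value, in_both = l2[j], False   # smallest remaining value is only in l2
        else:
            value, in_both = l1[i], True    # same value at the front of both lists
        # Skip every repetition of the current value in both lists.
        while i < len(l1) and l1[i] == value:
            i += 1
        while j < len(l2) and l2[j] == value:
            j += 1
        if not in_both:
            result.append(value)
    return result

print(uncommon_sorted([2, 2, 4], [1, 3, 3, 4, 5]))  # [1, 2, 3, 5]
print(uncommon_sorted([1, 2, 3], [1, 1, 2, 3, 3]))  # []
```

This runs in a single pass over both lists and returns the result in ascending order without a final sort, at the cost of being longer than the set-based versions.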

Finding minimum index of a sub-list of a list

Say, I have a list and a sub-list constructed from that list. I want to find the number from the sub-list that appears earliest in the original list.
Example:
lst = [5, 3, 4, 1, 2, 6]
sublst = [1, 2, 3]
In this case, I want to select 3 since it appears in lst 2nd, while 1 and 2 appear 4th and 5th respectively. What I have so far:
lst[min(lst.index(num) for num in sublst)]
This seems really convoluted and difficult-to-read. I was wondering if there was a cleaner way to write this.
You should make sublst a set so that membership tests are efficient. Then you could use a simple for loop:
lst = [5, 3, 4, 1, 2, 6]
sublst = {1, 2, 3}
for l in lst:
    if l in sublst:
        break
print(l)
You could also write that as a generator expression that yields all values in lst that are in sublst. Because a generator is lazy, calling next on it stops at the first matching value:
first = (l for l in lst if l in sublst)
print(next(first))
Output in both cases for your sample data is
3
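Both versions above assume that at least one value of sublst occurs in lst; the loop would otherwise print the last element of lst, and next would raise StopIteration. A small sketch that wraps the generator approach in a function and passes a default to next; the name first_in and the default parameter are assumptions made for illustration:

```python
def first_in(lst, candidates, default=None):
    """Return the first element of lst that is in candidates, else default."""
    wanted = set(candidates)  # set membership tests are O(1) on average
    return next((x for x in lst if x in wanted), default)

print(first_in([5, 3, 4, 1, 2, 6], [1, 2, 3]))  # 3
print(first_in([5, 3, 4], [7, 8]))              # None
```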

Unable to create duplicate list from existing list using list comprehension with an if condition

I have a sorted list with duplicate elements like
>>> randList = [1, 2, 2, 3, 4, 4, 5]
>>> randList
[1, 2, 2, 3, 4, 4, 5]
I need to create a list that removes the adjacent duplicate elements. I can do it like:
>>> dupList = []
>>> for num in randList:
...     if num not in dupList:
...         dupList.append(num)
But I want to do it with list comprehension. I tried the following code:
>>> newList = []
>>> newList = [num for num in randList if num not in newList]
But I get the result like the if condition isn't working.
>>> newList
[1, 2, 2, 3, 4, 4, 5]
Any help would be appreciated.
Thanks!!
Edit 1: The wording of the question does seem to be confusing given the data I have provided. The for loop that I am using will remove all duplicates, but since I am sorting the list beforehand, that shouldn't be a problem when removing adjacent duplicates.
Using itertools.groupby is the simplest approach to remove adjacent (and only adjacent) duplicates, even for unsorted input:
>>> from itertools import groupby
>>> [k for k, _ in groupby(randList)]
[1, 2, 3, 4, 5]
Removing all duplicates while maintaining the order of occurrence can be achieved efficiently with an OrderedDict. This also works for both ordered and unordered input:
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(randList))
[1, 2, 3, 4, 5]
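On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea works without an import. A minimal sketch; the function name dedup_keep_order is made up for illustration:

```python
def dedup_keep_order(items):
    """Remove all duplicates, keeping the first occurrence of each item."""
    # dict keys are unique and, since Python 3.7, kept in insertion order.
    return list(dict.fromkeys(items))

print(dedup_keep_order([1, 2, 2, 3, 4, 4, 5]))        # [1, 2, 3, 4, 5]
print(dedup_keep_order([1, 2, 2, 3, 4, 4, 2, 5, 1]))  # [1, 2, 3, 4, 5]
```

As with OrderedDict, the items must be hashable.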
"I need to create a list that removes the adjacent duplicate elements"
Note that your for-loop-based solution will remove ALL duplicates, not only adjacent ones. Test it with this:
rand_list = [1, 2, 2, 3, 4, 4, 2, 5, 1]
According to your spec the result should be:
[1, 2, 3, 4, 2, 5, 1]
but you'll get
[1, 2, 3, 4, 5]
instead.
A working solution to only remove adjacent duplicates is to use a generator:
def dedup_adjacent(seq):
    prev = seq[0]
    yield prev
    for current in seq[1:]:
        if current == prev:
            continue
        yield current
        prev = current
rand_list = [1, 2, 2, 3, 4, 4, 2, 5, 1]
list(dedup_adjacent(rand_list))
=> [1, 2, 3, 4, 2, 5, 1]
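The generator above indexes and slices its argument, so it needs a non-empty sequence. A possible variant that accepts any iterable, including an empty one, is sketched below; the name dedup_adjacent_iter and the sentinel object are assumptions for illustration, not part of the answer:

```python
def dedup_adjacent_iter(iterable):
    """Yield items, skipping any item equal to the one immediately before it."""
    sentinel = object()   # unique marker meaning "no previous item yet"
    prev = sentinel
    for current in iterable:
        if prev is sentinel or current != prev:
            yield current
        prev = current

print(list(dedup_adjacent_iter([1, 2, 2, 3, 4, 4, 2, 5, 1])))  # [1, 2, 3, 4, 2, 5, 1]
print(list(dedup_adjacent_iter([])))                           # []
```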
Python first evaluates the list comprehension and then assigns it to newList, so you cannot refer to it during execution of the list comprehension.
You can remove duplicates in two ways:
1. Using a for loop
rand_list = [1,2,2,3,3,4,5]
new_list=[]
for i in rand_list:
    if i not in new_list:
        new_list.append(i)
2. Convert the list to a set, convert the set back to a list, and finally sort the new list.
Since a set does not keep its values in any particular order, you need to sort the list after the conversion to get the items in ascending order.
rand_list = [1,2,2,3,3,4,5]
sets = set(rand_list)
new_list = list(sets)
new_list.sort()
Update: Comparison of different Approaches
There have been three ways of achieving the goal of removing adjacent duplicate elements in a sorted list, i.e. removing all duplicates:
using groupby (only adjacent elements, requires initial sorting)
using OrderedDict (all duplicates removed)
using sorted(list(set(_))) (all duplicates removed, ordering restored by sorting).
I compared the running times of the different solutions using:
from timeit import timeit
print('groupby:', timeit('from itertools import groupby; l = [x // 5 for x in range(1000)]; [k for k, _ in groupby(l)]'))
print('OrderedDict:', timeit('from collections import OrderedDict; l = [x // 5 for x in range(1000)]; list(OrderedDict.fromkeys(l))'))
print('Set:', timeit('l = [x // 5 for x in range(1000)]; sorted(list(set(l)))'))
> groupby: 78.83623623599942
> OrderedDict: 94.54144410200024
> Set: 65.60372123999969
Note that in this measurement the set approach is the fastest of the three alternatives.
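The statements timed above also include the import and the construction of the input list, and timeit runs each statement one million times by default. A sketch of how the same comparison might be set up with that work moved into the setup argument and an explicit number; the names SETUP, STATEMENTS and time_dedup are made up here, and no timings are claimed:

```python
from timeit import timeit

# Imports and input data are built once in setup and are not part of the timing.
SETUP = (
    "from itertools import groupby\n"
    "from collections import OrderedDict\n"
    "data = [x // 5 for x in range(1000)]"
)

STATEMENTS = {
    "groupby": "[k for k, _ in groupby(data)]",
    "OrderedDict": "list(OrderedDict.fromkeys(data))",
    "set": "sorted(set(data))",
}

def time_dedup(number=1000):
    """Return the total seconds taken by `number` runs of each statement."""
    return {name: timeit(stmt, setup=SETUP, number=number)
            for name, stmt in STATEMENTS.items()}

if __name__ == "__main__":
    for name, seconds in time_dedup().items():
        print(f"{name}: {seconds:.4f} s")
```

The absolute numbers depend on the machine and Python version, so they are best compared only against each other within one run.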
Old Answer
Python first evaluates the list comprehension and then assigns it to newList, so you cannot refer to it during execution of the list comprehension. To illustrate, consider the following code:
randList = [1, 2, 2, 3, 4, 4, 5]
newList = []
newList = [num for num in randList if print(newList)]
> []
> []
> []
> …
This becomes even more evident if you try:
# Do not initialize newList2
newList2 = [num for num in randList if print(newList2)]
> NameError: name 'newList2' is not defined
You can remove duplicates by turning randList into a set:
sorted(list(set(randList)))
> [1, 2, 3, 4, 5]
Be aware that this does remove all duplicates (not just adjacent ones) and ordering is not preserved. The former also holds true for your proposed solution with the loop.
edit: added a sorted clause as to specification of required ordering.
In the line newList = [num for num in randList if num not in newList], the list on the right-hand side is built first and only then assigned to newList. That's why the check num not in newList returns True every time: newList remains empty until the assignment.
You can try this:
randList = [1, 2, 2, 3, 4, 4, 5]
new_list=[]
for i in randList:
    if i not in new_list:
        new_list.append(i)
print(new_list)
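If a comprehension is still wanted, one commonly seen workaround is to track the values in a separate set that already exists before the comprehension runs. The sketch below relies on set.add returning None; the helper name unique_in_order is made up for illustration, and many readers find the explicit loop clearer:

```python
def unique_in_order(items):
    """Remove all duplicates with a list comprehension, keeping first occurrences."""
    seen = set()
    # `x in seen` is checked first; if it is False, seen.add(x) runs and
    # returns None, so the whole `or` is falsy and x is kept.
    return [x for x in items if not (x in seen or seen.add(x))]

print(unique_in_order([1, 2, 2, 3, 4, 4, 5]))  # [1, 2, 3, 4, 5]
```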
You cannot access the items in a list comprehension as you go along. The items in a list comprehension are only accessible once the comprehension is completed.
For large lists, checking for membership in a list will be expensive, albeit with minimal memory requirements. Instead, you can record the values seen so far in a set:
randList = [1, 2, 2, 3, 4, 4, 5]
def gen_values(L):
    seen = set()
    for i in L:
        if i not in seen:
            seen.add(i)
            yield i
print(list(gen_values(randList)))
[1, 2, 3, 4, 5]
This algorithm has been implemented in the 3rd party toolz library. It's also known as the unique_everseen recipe in the itertools docs:
from toolz import unique
res = list(unique(randList))
Since your list is sorted, using set will likely be the fastest way to achieve your goal. Keep in mind that a set does not guarantee any ordering, so sort the result if the order matters:
>>> randList = [1, 2, 2, 3, 4, 4, 5]
>>> randList
[1, 2, 2, 3, 4, 4, 5]
>>> remove_dup_list = list(set(randList))
>>> remove_dup_list
[1, 2, 3, 4, 5]
>>>

unite lists if at least one value matches in python

Let's say I have a list of lists, for example:
[[0, 2], [0, 1], [2, 3], [4, 5, 7, 8], [6, 4]]
and if at least one of the values in a list also appears in a different list, I would like to unite the lists, so in the example the final result would be:
[[0, 1, 2, 3], [4, 5, 6, 7, 8]]
I really don't care about the order of the values inside the lists ([0, 1, 2, 3] or [0, 2, 1, 3]).
I tried to do it but it doesn't work. So have you got any ideas? Thanks.
Edit (sorry for not posting the code that I tried before):
What I tried to do was the following:
for p in llista:
    for q in p:
        for k in llista:
            if p==k:
                llista.remove(k)
            else:
                for h in k:
                    if p!=k:
                        if q==h:
                            k.remove(h)
                            for t in k:
                                if t not in p:
                                    p.append(t)
llista_final = [x for x in llista if x != []]
Where llista is the list of lists.
I have to admit this is a tricky problem. I'm really curious what this problem represents and/or where you found it...
I initially thought this is just a graph connected components problem, but I wanted to take a shortcut and avoid creating an explicit representation of the graph, running BFS, etc...
The idea of the solution is this: for every sublist, check whether it has some element in common with any other sublist, and if so replace that other sublist with their union.
Not very pythonic, but here it is:
def merge(l):
    l = list(map(tuple, l))
    for i, h in enumerate(l):
        sh = set(h)
        for j, k in enumerate(l):
            if i == j: continue
            sk = set(k)
            if sh & sk: # h and k have some element in common
                l[j] = tuple(sh | sk)
    return list(map(list, set(l)))
Here is a function that does what you want. I tried to use self-documenting variable names and comments to help you understand how this code works. As far as I can tell, the code is pythonic. I used sets to speed up and simplify some of the operations. The downside of that is that the items in your input list-of-lists must be hashable, but your example uses integers which works perfectly well.
def cliquesfromlistoflists(inputlistoflists):
    """Given a list of lists, return a new list of lists that unites
    the old lists that have at least one element in common.
    """
    listofdisjointsets = []
    for inputlist in inputlistoflists:
        # Update the list of disjoint sets using the current sublist
        inputset = set(inputlist)
        unionofsetsoverlappinginputset = inputset.copy()
        listofdisjointsetsnotoverlappinginputset = []
        for aset in listofdisjointsets:
            # Unite set if overlaps the new input set, else just store it
            if aset.isdisjoint(inputset):
                listofdisjointsetsnotoverlappinginputset.append(aset)
            else:
                unionofsetsoverlappinginputset.update(aset)
        listofdisjointsets = (listofdisjointsetsnotoverlappinginputset
                              + [unionofsetsoverlappinginputset])
    # Return the information in a list-of-lists format
    return [list(aset) for aset in listofdisjointsets]
print(cliquesfromlistoflists([[0, 2], [0, 1], [2, 3], [4, 5, 7, 8], [6, 4]]))
# printout is [[0, 1, 2, 3], [4, 5, 6, 7, 8]]
This solution adapts the generic breadth-first search: it gradually empties the initial deque and updates a result list, either merging the popped sublist into an existing group when they share an element, or appending it as a new group when no match is found:
from collections import deque
d = deque([[0,2] , [0,1] , [2,3] , [4,5,7,8] , [6,4]])
result = [d.popleft()]
while d:
    v = d.popleft()
    result = [list(set(i+v)) if any(c in i for c in v) else i for i in result] if any(any(c in i for c in v) for i in result) else result + [v]
print(result)
Output:
[[0, 1, 2, 3], [8, 4, 5, 6, 7]]
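One of the answers above mentions that this is essentially a connected components problem. A union-find (disjoint-set) structure is one standard way to compute such components; the sketch below is not taken from any of the answers, and the names merge_overlapping, find and union are chosen here for illustration. It assumes the items are hashable:

```python
def merge_overlapping(lists):
    """Unite sublists that share at least one element, directly or transitively."""
    parent = {}

    def find(x):
        # Follow parent links to the representative, halving the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for sub in lists:
        for item in sub:
            find(item)            # register every item, even in 1-element lists
        for item in sub[1:]:
            union(sub[0], item)   # all items of one sublist belong together

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)
    return [sorted(group) for group in groups.values()]

print(merge_overlapping([[0, 2], [0, 1], [2, 3], [4, 5, 7, 8], [6, 4]]))
# [[0, 1, 2, 3], [4, 5, 6, 7, 8]]
```

Because groups are joined through shared representatives, a sublist that bridges two previously separate groups merges them into one.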

Function that compares 1st and last element, 2nd and 2nd last element, and so on

I want to write a function that compares the first element of this list with the last element of this list, the second element of this list with the second last element of this list, and so on. If the compared elements are the same, I want to add the element to a new list. Finally, I'd like to print this new list.
For example,
>>> f([1,5,7,7,8,1])
[1,7]
>>> f([3,1,4,1,5])
[1,4]
>>> f([2,3,5,7,1,3,5])
[3,7]
I was thinking to take the first (i) and last (k) element, compare them, then raise i but lower k, then repeat the process. When i and k 'overlap', stop, and print the list. I've tried to visualise my thoughts in the following code:
def f(x):
    newlist=[]
    k=len(x)-1
    i=0
    for j in x:
        if x[i]==x[k]:
            if i<k:
                newlist.append(x[i])
        i=i+1
        k=k-1
    print(newlist)
Please let me know if there are any errors in my code, or if there is a more suitable way to address the problem.
As I am new to Python, I am not very good at understanding complicated terminology/features of Python. As such, I would appreciate it if you took this into account in your answer.
You could use a conditional list comprehension with enumerate, comparing the element x at index i to the element at index -1-i (-1 being the last index of the list):
>>> lst = [1,5,7,7,8,1]
>>> [x for i, x in enumerate(lst[:(len(lst)+1)//2]) if lst[-1-i] == x]
[1, 7]
>>> lst = [3,1,4,1,5]
>>> [x for i, x in enumerate(lst[:(len(lst)+1)//2]) if lst[-1-i] == x]
[1, 4]
Or, as already suggested in other answers, use zip. However, it is enough to slice the first argument; the second one can just be the reversed list, as zip will stop once one of the argument lists is finished, making the code a bit shorter.
>>> [x for x, y in zip(lst[:(len(lst)+1)//2], reversed(lst)) if x == y]
In both approaches, (len(lst)+1)//2 is equivalent to int(math.ceil(len(lst)/2)).
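Since the question asks for something beginner-friendly, the same idea can also be spelled out with two indices that move towards each other, much like the plan described in the question. This is only a sketch; the function name mirrored_matches is made up for illustration:

```python
def mirrored_matches(values):
    """Compare 1st with last, 2nd with 2nd last, ...; collect the equal ones."""
    result = []
    i = 0                  # index moving forward from the start
    k = len(values) - 1    # index moving backward from the end
    while i <= k:          # i == k is the middle element of an odd-length list
        if values[i] == values[k]:
            result.append(values[i])
        i += 1
        k -= 1
    return result

print(mirrored_matches([1, 5, 7, 7, 8, 1]))     # [1, 7]
print(mirrored_matches([3, 1, 4, 1, 5]))        # [1, 4]
print(mirrored_matches([2, 3, 5, 7, 1, 3, 5]))  # [3, 7]
```

Using i <= k rather than i < k is what includes the middle element of an odd-length list, which the expected outputs in the question require.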
Maybe you want something like this for a list of even length (note the integer division //, which range requires in Python 3):
>>> r=[l[i] for i in range(len(l)//2) if l[i]==l[-(i+1)]]
>>> r
[3]
>>> l=[1,5,7,7,8,1]
>>> r=[l[i] for i in range(len(l)//2) if l[i]==l[-(i+1)]]
>>> r
[1, 7]
And for a list of odd length:
>>> l=[3,1,4,1,5]
>>> r=[l[i] for i in range(len(l)//2+1) if l[i]==l[-(i+1)]]
>>> r
[1, 4]
So you can create a function:
def myfunc(mylist):
    if len(mylist) % 2 == 0:
        return [mylist[i] for i in range(len(mylist)//2) if mylist[i]==mylist[-(i+1)]]
    else:
        return [mylist[i] for i in range(len(mylist)//2+1) if mylist[i]==mylist[-(i+1)]]
and use it this way:
>>> l=[1,5,7,7,8,1]
>>> myfunc(l)
[1, 7]
>>> l=[3,1,4,1,5]
>>> myfunc(l)
[1, 4]
What you can do is zip over the first half and the reversed second half and use a list comprehension to build a list of the elements that match:
[element_1 for element_1, element_2 in zip(l[:len(l)//2], reversed(l[(len(l)+1)//2:])) if element_1 == element_2]
What happens is that you iterate over the first half as element_1 and over the reversed second half as element_2, and keep an element only if the two are the same:
l = [1, 2, 3, 3, 2, 4]
l[:len(l)//2] == [1, 2, 3]
list(reversed(l[(len(l)+1)//2:])) == [4, 2, 3]
1 != 4, 2 == 2, 3 == 3, result == [2, 3]
If you also want to include the middle element of an odd-length list, just extend both slices to include the middle element, which always compares equal to itself:
[element_1 for element_1, element_2 in zip(l[:(len(l) + 1)//2], reversed(l[len(l)//2:])) if element_1 == element_2]
l = [3, 1, 4, 1, 5]
l[:(len(l) + 1)//2] == [3, 1, 4]
list(reversed(l[len(l)//2:])) == [5, 1, 4]
3 != 5, 1 == 1, 4 == 4, result == [1, 4]
Here is my solution:
[el1 for (el1, el2) in zip(L[:len(L)//2+1], L[len(L)//2:][::-1]) if el1==el2]
There is a lot going on, so let me explain step by step:
L[:len(L)//2+1] is the first half of the list plus one extra element (which is needed for lists of odd length)
L[len(L)//2:][::-1] is the second half of the list, reversed ([::-1])
zip pairs up the elements of the two lists and stops at the end of the shorter one. We rely on this when the length of the list is even, so that the extra element in the first half is ignored
A list comprehension is essentially equivalent to a for loop, but is handy for creating a list "on the fly". It keeps an element only if the if condition is true; otherwise the element is skipped.
You can easily modify the solution above if you are interested in the indexes (of the first half) where the match occurs:
[idx for idx, (el1, el2) in enumerate(zip(L[:len(L)//2+1], L[len(L)//2:][::-1])) if el1==el2]
You can use the following, which makes use of zip_longest:
from itertools import zip_longest
def compare(lst):
    size = len(lst) // 2
    return [y for x, y in zip_longest(lst[:size], lst[-1:size-1:-1], fillvalue=None) if x == y or x is None]
print(compare([1, 5, 7, 7, 8, 1])) # [1, 7]
print(compare([3, 1, 4, 1, 5])) # [1, 4]
print(compare([2, 3, 5, 7, 1, 3, 5])) # [3, 7]
On zip_longest:
Normally, zip stops as soon as one of its iterables runs out. zip_longest does not have that limitation; it keeps going until the longest iterable is exhausted, filling in the missing values with fillvalue.
Example:
list(zip([1, 2, 3], ['a'])) # [(1, 'a')]
list(zip_longest([1, 2, 3], ['a'], fillvalue='z')) # [(1, 'a'), (2, 'z'), (3, 'z')]
