I have two quite long lists and I know that all of the elements of the shorter are contained in the longer, yet I need to isolate the elements in the longer list which are not in the shorter so that I can remove them individually from the dictionary I got the longer list from.
What I have so far is:
for e in range(len(lst_ck)):
    if lst_ck[e] not in lst_rk:
        del currs[lst_ck[e]]
        del lst_ck[e]
lst_ck is the longer list and lst_rk is the shorter, currs is the dictionary from which came lst_ck. If it helps, they are both lists of 3 digit keys from dictionaries.
Use sets to find the difference:
l1 = [1,2,3,4]
l2 = [1,2,3,4,6,7,8]
print(set(l2).difference(l1))
set([6, 7, 8]) # in l2 but not in l1
Then remove the elements.
diff = set(l2).difference(l1)
your_list[:] = [ele for ele in your_list if ele not in diff]
If your lists are very big you may prefer a generator expression:
your_list[:] = (ele for ele in your_list if ele not in diff)
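Applied to the dictionary from the question, the same set difference can drive the deletions directly. This is only a minimal sketch: it assumes currs is keyed by 3-digit strings and lst_rk holds the keys to keep, and the sample values are made up for illustration.

```python
# Hypothetical sample data standing in for the asker's dictionary and key list.
currs = {"101": "a", "102": "b", "103": "c", "104": "d"}
lst_rk = ["101", "103"]

# Keys present in the dictionary but absent from the shorter list.
extra_keys = set(currs).difference(lst_rk)

# Iterate over the separate set, not over currs itself, so deleting is safe.
for key in extra_keys:
    del currs[key]

print(currs)  # {'101': 'a', '103': 'c'}
```

Because the loop walks a separate set, nothing is removed from a container while that same container is being iterated.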
If you don't care about multiple occurrences of the same item, use a set:
diff = set(lst_ck) - set(lst_rk)
If you do care, try this:
diff = [e for e in lst_ck if e not in lst_rk]
I have two lists, both fairly long. List A contains a list of integers, some of which are repeated in list B. I can find which elements appear in both by using:
idx = set(list_A).intersection(list_B)
This returns a set of all the elements appearing in both list A and list B.
However, I would like to find a way to find the matches between the two lists and also retain information about the elements' positions in both lists. Such a function might look like:
def match_lists(list_A, list_B):
    .
    .
    .
    return match_A, match_B
where match_A would contain the positions of elements in list_A that had a match somewhere in list_B and vice-versa for match_B.
I can see how to construct such lists using a for-loop, however this feels like it would be prohibitively slow for long lists.
Regarding duplicates: list_B has no duplicates in it, if there is a duplicate in list_A then return all the matched positions as a list, so match_A would be a list of lists.
That should do the job :)
def match_list(list_A, list_B):
    intersect = set(list_A).intersection(list_B)
    interPosA = [[i for i, x in enumerate(list_A) if x == dup] for dup in intersect]
    interPosB = [i for i, x in enumerate(list_B) if x in intersect]
    return interPosA, interPosB
(Thanks to machine yearning for duplicate edit)
Use dicts or defaultdicts to store the unique values as keys that map to the indices they appear at, then combine the dicts:
from collections import defaultdict
def make_offset_dict(it):
    ret = defaultdict(list) # Or set, the values are unique indices either way
    for i, x in enumerate(it):
        ret[x].append(i)
    return ret
dictA = make_offset_dict(A)
dictB = make_offset_dict(B)
for k in dictA.viewkeys() & dictB.viewkeys(): # Plain .keys() on Py3
    print(k, dictA[k], dictB[k])
This iterates A and B exactly once each so it works even if they're one-time use iterators, e.g. from a file-like object, and it works efficiently, storing no more data than needed and sticking to cheap hashing based operations instead of repeated iteration.
This isn't the solution to your specific problem, but it preserves all the information needed to solve your problem and then some (e.g. it's cheap to figure out where the matches are located for any given value in either A or B); you can trivially adapt it to your use case or more complicated ones.
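One possible adaptation to the match_A / match_B shape asked for in the question is sketched below. It is written for Python 3 (plain .keys()), assumes list_B has no duplicates as the question states, and the sample lists are invented.

```python
from collections import defaultdict

def make_offset_dict(it):
    """Map each value to the list of positions where it occurs."""
    ret = defaultdict(list)
    for i, x in enumerate(it):
        ret[x].append(i)
    return ret

def match_lists(list_A, list_B):
    """Return positions in list_A (grouped per value) and in list_B."""
    dictA = make_offset_dict(list_A)
    dictB = make_offset_dict(list_B)
    common = dictA.keys() & dictB.keys()
    # Walk list_B in order so the two results line up entry by entry.
    match_A, match_B = [], []
    for j, x in enumerate(list_B):
        if x in common:
            match_A.append(dictA[x])
            match_B.append(j)
    return match_A, match_B

match_A, match_B = match_lists([5, 7, 5, 9], [7, 1, 5])
print(match_A, match_B)  # [[1], [0, 2]] [0, 2]
```

Here match_A[k] lists every position in list_A holding the value found at list_B[match_B[k]].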
How about this:
def match_lists(list_A, list_B):
    idx = set(list_A).intersection(list_B)
    A_indexes = []
    for i, element in enumerate(list_A):
        if element in idx:
            A_indexes.append(i)
    B_indexes = []
    for i, element in enumerate(list_B):
        if element in idx:
            B_indexes.append(i)
    return A_indexes, B_indexes
This only runs through each list once (requiring only one dict) and also works with duplicates in list_B
def match_lists(list_A, list_B):
    da = dict((e, i) for i, e in enumerate(list_A))
    for bi, e in enumerate(list_B):
        try:
            ai = da[e]
            yield (e, ai, bi) # element e is in position ai in list_A and bi in list_B
        except KeyError:
            pass
Try this:
def match_lists(list_A, list_B):
    match_A = {}
    match_B = {}
    for elem in list_A:
        if elem in list_B:
            match_A[elem] = list_A.index(elem)
            match_B[elem] = list_B.index(elem)
    return match_A, match_B
I'd like to know how I can easily generate a list based on the values/order of two other lists:
list_a = ['web1','web2','web3','web1','web4']
list_b = ['web2','web4','web1','web5','web1']
I'd like to retrieve the "list_b" list ordered by value from "list_a":
final = ['web1','web2','web1','web4','web5']
If an entry exists in list_b but not in list_a, then the value is appended at the end of the list.
I'm not sure where to start. My initial thought was to retrieve all the indexes with enumerate, [i for i, x in enumerate(mylist) if x == value], then sort the list, but I'm having a hard time handling entries with multiple indexes (e.g. web1). Just wondering if there is an easy way to achieve this?
An extremely simplistic way would be to just iterate over list_a, and should you find each element in list_b you remove it and append it to a list. Then after iterating all that remains in list_b are the elements that you need to add to the end of your list.
list_a = ['web1','web2','web3','web1','web4']
list_b = ['web2','web4','web1','web5','web1']
front = []
for ele in list_a:
    if ele in list_b:
        front.append(ele)
        list_b.remove(ele)
final = front + list_b
print(final)
Outputs:
['web1', 'web2', 'web1', 'web4', 'web5']
Another trickier way would be to use collections.Counter and a few list comprehensions, leveraging the set intersection and difference of the counters.
from collections import Counter
cnt_a, cnt_b = Counter(list_a), Counter(list_b)
intersct = (cnt_a & cnt_b)
diff = (cnt_b - cnt_a)
final = [a for a in list_a if a in intersct] + [b for b in list_b if b in diff]
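Note that the comprehensions above only test membership in the counters, so if list_a repeats a value more often than list_b does, extra copies could slip into the result. A hedged variant that tracks how many copies are still unplaced might look like this (order_by is a made-up name, not something from the answer):

```python
from collections import Counter

def order_by(list_a, list_b):
    """Reorder list_b to follow list_a; leftovers keep their list_b order."""
    remaining = Counter(list_b)  # copies of each value not yet placed
    front = []
    for a in list_a:
        if remaining[a] > 0:
            front.append(a)
            remaining[a] -= 1
    back = []
    for b in list_b:
        if remaining[b] > 0:
            back.append(b)
            remaining[b] -= 1
    return front + back

list_a = ['web1', 'web2', 'web3', 'web1', 'web4']
list_b = ['web2', 'web4', 'web1', 'web5', 'web1']
print(order_by(list_a, list_b))  # ['web1', 'web2', 'web1', 'web4', 'web5']
```

Unlike the remove-based loop, this version leaves list_b unmodified and avoids the linear scan that list.remove performs on every hit.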
Say I have a dictionary of lists,
C = {}
li = []
li.append(x)
C[ind] = li
And I want to check if another list is a member of this dictionary.
for s in C.values():
    s.append(w)
Python checks it for any occurrences of the values in s and the dictionary values. But I want to check if any of the lists in the dictionary is identical to the given list.
How can I do it?
Use any for a list of lists:
d = {1 : [1,2,3], 2: [2,1]}
lsts = [[1,2],[2,1]]
print(any(x in d.values() for x in lsts))
True
d = {1:[1,2,3],2:[1,2]}
lsts = [[3,2,1],[2,1]]
print(any(x in d.values() for x in lsts))
False
Or in for a single list:
lst = [1,2]
lst in d.values() # d.itervalues() on Python 2
Python compares the lists element by element, so two lists are equal only if they hold the same elements in the same order. A simple comparison therefore does what you want.
in does the trick perfectly, because it does a comparison with each element behind the scenes, so it works even for mutable elements:
lst in d.values()
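Keep in mind that in on d.values() scans the values one by one. If you need many lookups and the items inside the lists are hashable, one option (an assumption on my part, not something the question requires) is to index the values as tuples in a set:

```python
d = {1: [1, 2, 3], 2: [2, 1]}

# Lists are unhashable, so store an immutable tuple copy of each value.
seen = {tuple(v) for v in d.values()}

def is_value(lst):
    """True if some dictionary value equals lst (same items, same order)."""
    return tuple(lst) in seen

print(is_value([2, 1]))  # True
print(is_value([1, 2]))  # False
```

The set is a snapshot, so it has to be rebuilt whenever the dictionary or any of its lists changes.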
Is it possible to run through a subset in a python list?
I have the following problem, I have two lists, list1 is very long and list2 is quite short. Now, I want to check which elements of the list2 are also in list1. My current version looks like this:
for item in list1:
    if item in list2:
        # do something
This takes a very long time. Is it possible to get a subset and then run through the list?
I need to do this many times.
If the list elements are hashable, you can find the intersection using sets:
>>> for x in set(list2).intersection(list1):
...     print(x)
If they are not hashable, you can at least speed-up the search by sorting the shorter list and doing bisected lookups:
>>> from bisect import bisect_left
>>> list2.sort()
>>> n = len(list2)
>>> for x in list1:
...     i = bisect_left(list2, x)
...     if i != n and list2[i] == x:
...         print(x)
If your data elements are neither hashable nor sortable, then you won't be able to speed-up your original code:
>>> for x in list1:
...     if x in list2:
...         print(x)
The running time of the set-intersection approach is proportional to the sum of lengths of the two lists, O(n1 + n2). The running time of the bisected-search approach is O((n1 + n2) * log(n2)). The running time of the original brute-force approach is O(n1 * n2).
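For the unhashable-but-sortable case, the bisected lookup can be wrapped in a small helper. This is just a sketch: common_sorted is a hypothetical name, and it works on a sorted copy so list2 is left untouched.

```python
from bisect import bisect_left

def common_sorted(list1, list2):
    """Items of list1 that also occur in list2, found by binary search."""
    ordered = sorted(list2)  # sorted copy; list2 itself is not modified
    n = len(ordered)
    found = []
    for x in list1:
        i = bisect_left(ordered, x)
        if i != n and ordered[i] == x:
            found.append(x)
    return found

# Lists are unhashable but comparable, so sets are not an option here.
list1 = [[1, 2], [3, 4], [5, 6], [3, 4]]
list2 = [[5, 6], [3, 4]]
print(common_sorted(list1, list2))  # [[3, 4], [5, 6], [3, 4]]
```

Unlike the set intersection, this reports a match once per occurrence in list1 and keeps list1's order.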
You can use sets here; they provide O(1) average-case lookup compared to O(N) for lists.
But set items must be hashable (which in practice usually means immutable).
s = set(list1)
for item in list2:
    if item in s:
        pass # do something
You can use a list comprehension as shorthand:
list3 = [l for l in list1 if l in list2]
If your lists have elements that repeat, remove the duplicates first:
l2 = list(set(list2))
l1 = list(set(list1))
list3 = [l for l in l1 if l in l2]
I was asked this on an interview this past week, and I didn't have the answer (the correct answer anyways). Say for instance you have list A which has the following elements [1,3,5,7,9,10] and then you have list B, which has the following elements: [3,4,5,6,7], and you want to know which elements in list B are in list A. My answer was:
third_list = []
for item in listA:
    for item1 in listB:
        if item1 == item:
            third_list.append(item1) # put item1 in some third list
But I know this is bad, because say listA is one million elements, and listB is a hundred thousand, this solution is just rubbish.
What's the best way to achieve something like this without iteration of both lists?
set(listA) & set(listB) is simplest.
I'd suggest converting them both to sets and doing an intersection:
setA = set(listA)
setB = set(listB)
setA.intersection(setB)
Edit: Note that this will remove any duplicate elements that were in both lists. So if we had listA = [1,1,2,2,3] and listB = [1,1,2,3] then the intersection will only be set([1,2,3]). Also, for a worst-case estimate, this will be as slow as the list comprehension - O(n * m), where n and m are the respective lengths of the lists. Average case is a far better O(n) + O(m) + O(min(m,n)) == O(max(m,n)), however.
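If the duplicates matter, one way to keep them (a sketch, not the only option) is to intersect collections.Counter objects instead of sets, which keeps the minimum count of each element:

```python
from collections import Counter

listA = [1, 1, 2, 2, 3]
listB = [1, 1, 2, 3]

# Counter & Counter keeps, for each element, the smaller of the two counts.
common = Counter(listA) & Counter(listB)
print(sorted(common.elements()))  # [1, 1, 2, 3]
```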
Well, I may as well throw filter into the mix:
filter(lambda x: x in listb, lista) # on Python 3 this returns an iterator; wrap it in list(...) to get a list
Using list comprehension and using the in operator to test for membership:
[i for i in lista if i in listb]
would yield:
[3, 5, 7]
Alternatively, one could use set operations and see what the intersection of both lists (converted to sets) would be.
You can use sets (preferred):
listC = list(set(listA) & set(listB))
Or a list comprehension:
listC = [i for i in listA if i in listB]
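The two can also be combined. As a rough sketch using the lists from the question, converting only listB to a set keeps listA's order and duplicates while avoiding a linear scan of listB for every element:

```python
listA = [1, 3, 5, 7, 9, 10]
listB = [3, 4, 5, 6, 7]

setB = set(listB)  # built once; membership tests are O(1) on average
listC = [i for i in listA if i in setB]
print(listC)  # [3, 5, 7]
```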