I have two fairly long lists. List A contains integers, some of which also appear in list B. I can find which elements appear in both by using:
idx = set(list_A).intersection(list_B)
This returns a set of all the elements appearing in both list A and list B.
However, I would like to match the two lists while also retaining the positions of the matching elements in both lists. Such a function might look like:
def match_lists(list_A, list_B):
    .
    .
    .
    return match_A, match_B
where match_A would contain the positions of elements in list_A that had a match somewhere in list_B and vice-versa for match_B.
I can see how to construct such lists using a for-loop, but this feels like it would be prohibitively slow for long lists.
Regarding duplicates: list_B has no duplicates in it. If there is a duplicate in list_A, then return all the matched positions as a list, so match_A would be a list of lists.
That should do the job :)
def match_list(list_A, list_B):
    intersect = set(list_A).intersection(list_B)
    interPosA = [[i for i, x in enumerate(list_A) if x == dup] for dup in intersect]
    interPosB = [i for i, x in enumerate(list_B) if x in intersect]
    return interPosA, interPosB
(Thanks to machine yearning for duplicate edit)
Use dicts or defaultdicts to store the unique values as keys that map to the indices they appear at, then combine the dicts:
from collections import defaultdict
def make_offset_dict(it):
    ret = defaultdict(list)  # Or set; the values are unique indices either way
    for i, x in enumerate(it):
        ret[x].append(i)
    return ret

dictA = make_offset_dict(A)
dictB = make_offset_dict(B)
for k in dictA.keys() & dictB.keys():  # Use .viewkeys() on Python 2
    print(k, dictA[k], dictB[k])
This iterates A and B exactly once each, so it works even if they're one-time-use iterators (e.g. from a file-like object). It is also efficient: it stores no more data than needed and sticks to cheap hash-based operations instead of repeated iteration.
This isn't the solution to your specific problem, but it preserves all the information needed to solve your problem and then some (e.g. it's cheap to figure out where the matches are located for any given value in either A or B); you can trivially adapt it to your use case or more complicated ones.
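One possible adaptation to the return format the question asks for is sketched below. The name match_lists comes from the question; pairing the two outputs in the order the values appear in list_B is an assumption, as is relying on list_B having no duplicates.

```python
from collections import defaultdict

def match_lists(list_A, list_B):
    # Map each value in list_A to every position it occupies.
    positions_A = defaultdict(list)
    for i, x in enumerate(list_A):
        positions_A[x].append(i)

    match_A, match_B = [], []
    # list_B is assumed to have no duplicates, so each value is seen once.
    for j, x in enumerate(list_B):
        if x in positions_A:          # membership test does not create keys
            match_A.append(positions_A[x])
            match_B.append(j)
    return match_A, match_B

match_A, match_B = match_lists([5, 7, 5, 9, 3], [3, 5, 8])
print(match_A)  # [[4], [0, 2]]
print(match_B)  # [0, 1]
```

Here match_A[k] holds every position in list_A of the value found at list_B[match_B[k]].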
How about this:
def match_lists(list_A, list_B):
    idx = set(list_A).intersection(list_B)
    A_indexes = []
    for i, element in enumerate(list_A):
        if element in idx:
            A_indexes.append(i)
    B_indexes = []
    for i, element in enumerate(list_B):
        if element in idx:
            B_indexes.append(i)
    return A_indexes, B_indexes
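This runs through each list only once (requiring only one dict) and also works with duplicates in list_B. Note that it is a generator, and that a value repeated in list_A is reported only at its last position, because the dict keeps one index per value.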
def match_lists(list_A, list_B):
    da = dict((e, i) for i, e in enumerate(list_A))
    for bi, e in enumerate(list_B):
        try:
            ai = da[e]
            yield (e, ai, bi)  # element e is in position ai in list_A and bi in list_B
        except KeyError:
            pass
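A short usage sketch of this generator approach follows. The function is restated in a slightly condensed form (a membership test instead of try/except) so the block runs on its own, and the sample lists are made up for illustration.

```python
def match_lists(list_A, list_B):
    # One dict lookup per element of list_B. The dict keeps a single
    # position per value, so a value repeated in list_A maps to its
    # last position.
    da = {e: i for i, e in enumerate(list_A)}
    for bi, e in enumerate(list_B):
        if e in da:
            yield (e, da[e], bi)

# The function is a generator, so wrap it in list() to collect all matches.
matches = list(match_lists([5, 7, 5, 9], [9, 5, 5]))
print(matches)  # [(9, 3, 0), (5, 2, 1), (5, 2, 2)]
```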
Try this:
def match_lists(list_A, list_B):
    match_A = {}
    match_B = {}
    for elem in list_A:
        if elem in list_B:
            match_A[elem] = list_A.index(elem)
            match_B[elem] = list_B.index(elem)
    return match_A, match_B
Description
I have two lists of lists which are derived from CSVs (minimal working example below). The real dataset is too large to process manually.
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
mainlist contains a pair of identifiers (mainlist[x][0], mainlist[x][1]), which are associated with two integers (mainlist[x][2] and mainlist[x][3]).
replacelist is a second list of lists which also contains the same pairs of identifiers (but not in the same order within a pair, or across rows). All sublist pairs are unique. Importantly, replacelist[x][2],replacelist[x][3] corresponds to a replacement for replacelist[x][0],replacelist[x][1], respectively.
I need to create a new third list, newlist which copies mainlist but replaces the identifiers with those from replacelist[x][2],replacelist[x][3]
For example, given:
mainlist[2] is: [JQ61,SQ48,186,284]
The matching pair in replacelist is
replacelist[4]: [SQ48,JQ61,QF87,QF63]
Therefore the expected output is
newlist[2] = [QF87,QF63,186,284]
More clearly put:
if replacelist = [[A, B, C, D]]
A is replaced with C, and B is replaced with D.
but it may appear in mainlist as [[B, A]]
Note that newlist keeps the same row order as mainlist.
Attempt
What has me totally stumped on a simple problem is I feel I can't use basic list comprehension [i for i in replacelist if i in mainlist] as the order within a pair changes, and if I sorted(list) then I lose information about what to replace the lists with. Current solution (with commented blanks):
newlist = []
for k in replacelist:
    for i in mainlist:
        if k[0] and k[1] in i:
            # retrieve mainlist order, then use some kind of indexing to check a series of nested if statements to work out positional replacement.
As you can see, this solution is clearly inefficient and I can't work out the best way to perform the final step in a few lines.
I can add more information if this is not clear
It would help to have replacelist as a dict:
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
# Key: the unordered pair of identifiers; value: a per-identifier replacement map
replacements = {frozenset(r[:2]): dict(zip(r[:2], r[2:])) for r in replacelist}
newlist = []
for *ids, val1, val2 in mainlist:
    reps = replacements[frozenset(ids)]
    newlist.append([reps[ids[0]], reps[ids[1]], val1, val2])
First, transform both lists into dictionaries:
from collections import OrderedDict
maindct = OrderedDict((frozenset(item[:2]), item[2:]) for item in mainlist)
replacedct = {frozenset(item[:2]): item[2:] for item in replacelist}
# Now it is trivial to build a list with the desired output:
output_list = [replacedct[key] + maindct[key] for key in maindct]
The key point is that a dictionary eliminates the cost of searching the replacement list for each item. With a list you have to scan the whole list for every item, so the running time grows with the square of the list length. With Python dictionaries the lookup time is constant on average and does not depend on the amount of data.
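As an end-to-end illustration, here is a minimal sketch that runs the dictionary lookup on the sample data from the question. It iterates mainlist directly instead of building an OrderedDict, which is a simplification I am assuming is acceptable here; the new identifiers are emitted in the order they appear in the replacelist row, which reproduces the newlist[2] example given in the question.

```python
mainlist = [["MH75", "QF12", 0, 38], ["JQ59", "QR21", 105, 191],
            ["JQ61", "SQ48", 186, 284], ["SQ84", "QF36", 0, 123],
            ["GA55", "VA63", 80, 245], ["MH98", "CX12", 171, 263]]
replacelist = [["MH75", "QF12", "BA89", "QR29"], ["QR21", "JQ59", "VA51", "MH52"],
               ["GA55", "VA63", "MH19", "CX84"], ["SQ84", "QF36", "SQ08", "JQ65"],
               ["SQ48", "JQ61", "QF87", "QF63"], ["MH98", "CX12", "GA34", "GA60"]]

# Build the lookup once: unordered identifier pair -> replacement identifiers.
replacedct = {frozenset(row[:2]): row[2:] for row in replacelist}

# One dictionary lookup per row of mainlist; row order is preserved.
newlist = [replacedct[frozenset(row[:2])] + row[2:] for row in mainlist]

print(newlist[2])  # ['QF87', 'QF63', 186, 284]
```

A frozenset is used as the key because it is hashable and ignores the order of the two identifiers within a pair.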
I have a list of lists in python of the form
A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
I need a fast way to get the row index of an element in that structure.
method(2) = 0
method(8) = 1
method(12) = 2
and so on. As always, the faster the method the better, as my actual list of lists is quite large.
In its current form, the data structure (a list of lists) is neither convenient nor efficient for the queries you want to make on it. Restructure it into the form:
item -> list of sublist indexes # assuming items can be present in multiple sublists
This way the lookups would be instant, by key - O(1). Let's use defaultdict(list):
>>> from collections import defaultdict
>>>
>>> d = defaultdict(list)
>>> for index, sublist in enumerate(A):
...     for item in sublist:
...         d[item].append(index)
...
>>> d[2]
[0]
>>> d[8]
[1]
>>> d[12]
[2]
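If every item is known to occur in exactly one sublist (an assumption that holds for the sample data but is not stated in the question), a flat dict gives the row index directly instead of a one-element list. A minimal sketch, reusing the name method from the question:

```python
A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

# Assumes each item occurs in exactly one row; a repeated item would
# keep only the index of the last row that contains it.
row_of = {item: index for index, sublist in enumerate(A) for item in sublist}

def method(value):
    # Return the row index of value, or -1 if it is not present.
    return row_of.get(value, -1)

print(method(2), method(8), method(12), method(99))  # 0 1 2 -1
```

Building the dict costs one pass over all elements; every lookup afterwards is a single hash lookup.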
It is very simple using next() with a generator expression:
def method(lists, value):
    return next(i for i, v in enumerate(lists) if value in v)
The problem with that is that it raises StopIteration if value does not occur. With a slightly longer function call, you can supply a default of -1:
def method(lists, value):
    return next((i for i, v in enumerate(lists) if value in v), -1)
Here is another way using numpy
import numpy
A = [[1,2,3,4],[5,6,7,8],[9,10,11,12]]
my_array = numpy.array(A)
numpy.where(my_array==2) ## will return both the list and the index within the list
numpy.where(my_array==12)
## As a follow up if we want only the index we can always do :
numpy.where(my_array==12)[0][0] # will return 2 , index of list
numpy.where(my_array==12)[1][0] # will return 3 , index within list
The find operation on a list is linear. The following is simple Python code to find an element in a list of lists.
A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
def method(value):
    # Scan the rows in order; "sublist" avoids shadowing the built-in list
    for idx, sublist in enumerate(A):
        if value in sublist:
            return idx
    return -1
print(method(12))
I'd like to know how I can easily generate a list based on the values/order of two other lists:
list_a = ['web1','web2','web3','web1','web4']
list_b = ['web2','web4','web1','web5','web1']
I'd like to retrieve the "list_b" list ordered by value from "list_a":
final = ['web1','web2','web1','web4','web5']
If an entry exists in list_b but not in list_a, then the value is appended at the end of the list.
I'm not sure where to start. My initial thinking was to retrieve all the indexes with enumerate, [i for i, x in enumerate(mylist) if x==value], then sort the list, but I'm having a hard time managing entries with multiple indexes (e.g. web1). Is there an easy way to achieve this?
An extremely simplistic way would be to just iterate over list_a, and should you find each element in list_b you remove it and append it to a list. Then after iterating all that remains in list_b are the elements that you need to add to the end of your list.
list_a = ['web1','web2','web3','web1','web4']
list_b = ['web2','web4','web1','web5','web1']
front = []
for ele in list_a:
    if ele in list_b:
        front.append(ele)
        list_b.remove(ele)
final = front + list_b
print(final)
Outputs:
['web1', 'web2', 'web1', 'web4', 'web5']
Another trickier way would be to use collections.Counter and a few list comprehensions, leveraging the set intersection and difference of the counters.
from collections import Counter
cnt_a, cnt_b = Counter(list_a), Counter(list_b)
intersct = (cnt_a & cnt_b)
diff = (cnt_b - cnt_a)
final = [a for a in list_a if a in intersct] + [b for b in list_b if b in diff]
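For long lists, a variant that keeps per-value counts may be worth considering: it avoids the repeated linear list_b.remove calls of the first approach and, unlike the comprehension version, takes from list_a no more copies of a value than list_b actually holds. This is only a sketch; the helper names available, used, front and tail are mine.

```python
from collections import Counter

list_a = ['web1', 'web2', 'web3', 'web1', 'web4']
list_b = ['web2', 'web4', 'web1', 'web5', 'web1']

available = Counter(list_b)   # copies of each value present in list_b
used = Counter()              # copies already moved to the front

front = []
for ele in list_a:
    if used[ele] < available[ele]:   # Counter returns 0 for missing keys
        front.append(ele)
        used[ele] += 1

# Leftovers keep their order from list_b; the first used[ele] copies of
# each value are skipped because they are already in front.
tail = []
for ele in list_b:
    if used[ele] > 0:
        used[ele] -= 1
    else:
        tail.append(ele)

final = front + tail
print(final)  # ['web1', 'web2', 'web1', 'web4', 'web5']
```

Unlike the remove-based loop, this leaves list_b unmodified.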
I have two quite long lists and I know that all of the elements of the shorter are contained in the longer, yet I need to isolate the elements in the longer list which are not in the shorter so that I can remove them individually from the dictionary I got the longer list from.
What I have so far is:
for e in range(len(lst_ck)):
    if lst_ck[e] not in lst_rk:
        del currs[lst_ck[e]]
        del lst_ck[e]
lst_ck is the longer list and lst_rk is the shorter, currs is the dictionary from which came lst_ck. If it helps, they are both lists of 3 digit keys from dictionaries.
Use sets to find the difference:
l1 = [1,2,3,4]
l2 = [1,2,3,4,6,7,8]
print(set(l2).difference(l1))
{6, 7, 8}  # in l2 but not in l1 (the display order of a set may vary)
Then remove the elements:
diff = set(l2).difference(l1)
your_list[:] = [ele for ele in your_list if ele not in diff]
If your lists are very big you may prefer a generator expression:
your_list[:] = (ele for ele in your_list if ele not in diff)
If you don't care about multiple occurrences of the same item, use set.
diff = set(lst_ck) - set(lst_rk)
If you do care, try this:
diff = [e for e in lst_ck if e not in lst_rk]
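Applied to the names from the question, the whole clean-up might look like the sketch below. The dictionary contents are hypothetical stand-in data; only the names currs, lst_ck and lst_rk come from the question.

```python
# Hypothetical stand-in data with 3-digit keys.
currs = {'101': 'a', '202': 'b', '303': 'c', '404': 'd'}
lst_ck = list(currs)        # the longer list: every key in currs
lst_rk = ['101', '303']     # the shorter list: the keys to keep

keep = set(lst_rk)          # set membership tests are O(1) on average

# lst_ck is a separate list, so deleting from currs while looping is safe.
for key in lst_ck:
    if key not in keep:
        del currs[key]

# Rebuild the list in place instead of deleting from it while indexing it.
lst_ck[:] = [key for key in lst_ck if key in keep]

print(currs)   # {'101': 'a', '303': 'c'}
print(lst_ck)  # ['101', '303']
```

The original loop fails because del lst_ck[e] shortens the list while range(len(lst_ck)) still counts up to the old length, which skips elements and eventually raises IndexError.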
My question seems simple, but for a novice to python like myself this is starting to get too complex for me to get, so here's the situation:
I need to take a list such as:
L = [(a, b, c), (d, e, d), (etc, etc, etc), (etc, etc, etc)]
and make each index an individual list so that I may pull elements from each index specifically. The problem is that the list I am actually working with contains hundreds of indices such as the ones above and I cannot make something like:
L_new = list(L['insert specific index here'])
for each one, as that would mean filling up the memory with hundreds of lists corresponding to individual indices of the first list, which would be far too time- and memory-consuming from my point of view. So my question is this: how can I separate those indices and then pull individual parts from them without needing to create hundreds of individual lists (at least to the point where I won't need hundreds of individual lines to create them)?
I might be misreading your question, but I'm inclined to say that you don't actually have to do anything to be able to index your tuples. See my comment, but: L[0][0] will give "a", L[0][1] will give "b", L[2][1] will give "etc" etc...
If you really want a clean way to turn this into a list of lists you could use a list comprehension:
cast = [list(entry) for entry in L]
In response to your comment: if you want to access across dimensions I would suggest list comprehension. For your comment specifically:
crosscut = [entry[0] for entry in L]
In response to comment 2: This is largely a part of a really useful operation called slicing. Specifically to do the referenced operation you would do this:
multiple_index = [entry[0:3] for entry in L]
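To make the indexing, slicing and cross-cutting concrete, here is a small self-contained sketch. The sample tuples are made up, and zip(*L) is an additional option not mentioned above for getting one tuple per position.

```python
L = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i')]

first_items = [entry[0] for entry in L]   # first element of every tuple
first_two = [entry[0:2] for entry in L]   # a slice of a tuple is still a tuple
columns = list(zip(*L))                   # transpose: one tuple per position

print(L[0][1])      # b
print(first_items)  # ['a', 'd', 'g']
print(first_two)    # [('a', 'b'), ('d', 'e'), ('g', 'h')]
print(columns[1])   # ('b', 'e', 'h')
```

None of these require creating one named list per index by hand.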
Depending on your readability preferences there are actually a number of possibilities here:
list_of_lists = []
for sublist in L:
    list_of_lists.append(list(sublist))

# Or, for lazy evaluation, a generator function that yields one list at a time
def lists_from(L):
    iterator = iter(L)
    for _ in range(len(L)):
        yield list(next(iterator))
What you have there is a list of tuples; access them like a list of lists:
L[3][2]
will get the third element from the fourth tuple in your list L (indices start at 0).
Two ways of using inner lists:
for index, sublist in enumerate(L):
    # do something with sublist
    pass
or with an iterator
iterator = iter(L)
sublist = next(iterator) # <-- yields the first sublist
In both cases, sublist elements can be reached via
direct index
sublist[2]
iteration
iterator = iter(sublist)
next(iterator) # <-- yields first elem of sublist
for elem in sublist:
    # do something with my elem
    pass