Python maintain order in intersection of lists - python

I have a list A and a list B. I want to get the common elements of the two lists, but the result should keep the order of list A.
First I tried converting them to sets and taking the intersection, but that does not preserve the order:
common = list(set(A).intersection(set(B)))
so I decided to do list comprehension:
common = [i for i in A if i in B]
I am getting
IndexError: too many indices for array

As a general answer for such problems, you can use the sorted function with lambda x: A.index(x) as its key, which sorts the result by position in list A:
>>> common = sorted(set(A).intersection(B), key=lambda x: A.index(x))
Also note that you don't need to call set(B) for the intersection; intersection accepts any iterable.
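One thing to keep in mind with the sorted approach is that A.index scans A from the start on every call, so the sort key becomes expensive for long lists. A minimal sketch of one way around that, assuming the items are hashable, is to precompute each value's position once. The sample values and the name `position` are illustrative only:

```python
A = [5, 3, 8, 1, 9]
B = [9, 5, 7, 1]

# Map each value to its first position in A.
# setdefault keeps the earliest index if A contains repeats.
position = {}
for i, value in enumerate(A):
    position.setdefault(value, i)

# Same result as key=lambda x: A.index(x), but each key lookup is a dict access.
common = sorted(set(A).intersection(B), key=position.__getitem__)
print(common)  # [5, 1, 9]
```

Because this goes through a set, repeated values in A appear only once in the result.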

Your code (common = [i for i in A if i in B]) works just fine. Perhaps what you wrote in your question was not the exact code that raised the IndexError.
You can speed up the membership tests by making a set from B:
set_b = set(B)
common = [i for i in A if i in set_b]
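The comprehension with a precomputed set can be wrapped in a small helper. This is only a sketch: the function name `ordered_intersection` and the sample data are made up for illustration, and it assumes the items are hashable. Unlike the set-based intersection, it keeps repeated items from A, so an optional de-duplication step is shown as well:

```python
def ordered_intersection(a, b):
    """Return the items of `a` that also occur in `b`, in the order they appear in `a`."""
    set_b = set(b)  # built once, so each membership test is O(1) on average
    return [item for item in a if item in set_b]


A = ["d", "a", "c", "a", "b"]
B = ["a", "b", "d", "x"]

with_repeats = ordered_intersection(A, B)
print(with_repeats)  # ['d', 'a', 'a', 'b']

# dict.fromkeys drops repeats while keeping first-seen order (Python 3.7+).
unique = list(dict.fromkeys(with_repeats))
print(unique)  # ['d', 'a', 'b']
```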

Related

Indices of elements in a python list of tuples

Problem
Given 2 lists A and B, I want to get the indices of all elements in List A which are present in List B. Each element is a tuple.
I am using lists of size 40,000 elements or so.
Sample case
Input:
A = [(1,2),(3,4),(5,6),(7,8)]
B = [(1,2),(3,4),(5,6)]
Expected output:
[0,1,2]
Attempted solutions
I tried two solutions:
1) using map function
m = map(A.index, B)
list(m)
2) using list comprehension
m = [A.index(item) for item in B if item in A]
These methods seem to be taking too much time. Is there any other way to accomplish this?
Below would be your best bet. Build a set from B once, outside the comprehension, because looking up a tuple in a set takes O(1) time on average:
set_b = set(B)
m = [index for index, item in enumerate(A) if item in set_b]
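The attempts in the question return indices in the order of B, whereas enumerating A returns them in the order of A. If the order of B matters, one possible sketch is to build a tuple-to-index dictionary from A once and look each element of B up in it. The name `index_of` and the reordered B below are assumptions for illustration:

```python
A = [(1, 2), (3, 4), (5, 6), (7, 8)]
B = [(5, 6), (1, 2), (9, 9)]  # hypothetical B: reordered, one tuple not in A

# Map each tuple to its first index in A (what A.index would return).
index_of = {}
for i, item in enumerate(A):
    index_of.setdefault(item, i)

# Equivalent to [A.index(item) for item in B if item in A],
# but every lookup is a dict access instead of a scan of A.
indices = [index_of[item] for item in B if item in index_of]
print(indices)  # [2, 0]
```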

Python: How To Get The Number Of Items In One List Which Belong To Another

I have a list of ~3000 items. Let's call it listA.
And another list with 1,000,000 items. Let's call it listB.
I want to check how many items of listA belong in listB. For example to get an answer like 436.
The obvious way is to have a nested loop looking for each item, but this is slow, especially due to the size of the lists.
What is the fastest and/or Pythonic way to get the number of the items of one list belonging to another?
Make a set out of list_b. That will avoid nested loops and make the contains-check O(1). The entire process will be O(M+N) which should be fairly optimal:
set_b = set(list_b)
count = sum(1 for a in list_a if a in set_b)
# OR shorter, but maybe less intuitive
count = sum(a in set_b for a in list_a)
# where the bool expression is coerced to int {0; 1} for the summing
If you don't want to (or have to) count repeated elements in list_a, you can use set intersection:
count = len(set(list_a) & set(list_b))
# OR
count = len(set(list_a).intersection(list_b)) # avoids one conversion
One should also note that these set-based operations only work if the items in your lists are hashable (e.g. not lists themselves)!
Another option is to use set and find the intersection:
len(set(listA).intersection(listB))
You can also loop over the contents of listA with a generator that yields the matching values. Pass listB as a set so that each membership test is O(1) rather than a scan of the whole list:
def get_number_of_elements(s, a):
    for i in s:
        if i in a:
            yield i
print(len(list(get_number_of_elements(listA, set(listB)))))
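The counting variants above differ in how they treat repeated items, which a small made-up example shows. The last part sketches one way to handle unhashable items such as inner lists, by converting them to tuples first. All sample data and variable names here are illustrative:

```python
list_a = ["x", "y", "y", "z", "q"]
list_b = ["y", "z", "z", "w"]

set_b = set(list_b)

# Counts every element of list_a found in list_b, repeats included: y, y, z.
count_with_repeats = sum(a in set_b for a in list_a)
print(count_with_repeats)  # 3

# Counts distinct common values only: y, z.
count_unique = len(set(list_a) & set_b)
print(count_unique)  # 2

# Lists are not hashable, so convert inner lists to tuples before building the set.
rows_a = [[1, 2], [3, 4]]
rows_b = [[3, 4], [5, 6]]
set_rows_b = {tuple(row) for row in rows_b}
count_rows = sum(tuple(row) in set_rows_b for row in rows_a)
print(count_rows)  # 1
```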

Python: replace values of sublist, with values looked up from another sublist without indexing

Description
I have two lists of lists which are derived from CSVs (minimal working example below). The real dataset is too large to do this manually.
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
mainlist contains a pair of identifiers (mainlist[x][0], mainlist[x][1]) which are associated with two integers (mainlist[x][2] and mainlist[x][3]).
replacelist is a second list of lists which also contains the same pairs of identifiers (but not in the same order within a pair, or across rows). All sublist pairs are unique. Importantly, replacelist[x][2],replacelist[x][3] corresponds to a replacement for replacelist[x][0],replacelist[x][1], respectively.
I need to create a new third list, newlist which copies mainlist but replaces the identifiers with those from replacelist[x][2],replacelist[x][3]
For example, given:
mainlist[2] is: [JQ61,SQ48,186,284]
The matching pair in replacelist is
replacelist[4]: [SQ48,JQ61,QF87,QF63]
Therefore the expected output is
newlist[2] = [QF87,QF63,186,284]
More clearly put:
if replacelist = [[A, B, C, D]]
A is replaced with C, and B is replaced with D.
but it may appear in mainlist as [[B, A]]
Note newlist row position uses the same as mainlist
Attempt
What has me stumped on what seems a simple problem is that I feel I can't use a basic list comprehension [i for i in replacelist if i in mainlist], because the order within a pair changes, and if I use sorted(list) I lose the information about what to replace the identifiers with. Current solution (with commented blanks):
newlist = []
for k in replacelist:
    for i in mainlist:
        if k[0] in i and k[1] in i:
            # retrieve mainlist order, then use some kind of indexing to check a series of nested if statements to work out positional replacement.
            pass
As you can see, this solution is clearly inefficient and I can't work out the best way to perform the final step in a few lines.
I can add more information if this is not clear
It helps to turn replacelist into a dict keyed by the unordered pair of identifiers:
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
replacements = {frozenset(r[:2]): dict(zip(r[:2], r[2:])) for r in replacelist}
newlist = []
for id1, id2, val1, val2 in mainlist:
    reps = replacements[frozenset([id1, id2])]
    newlist.append([reps[id1], reps[id2], val1, val2])
First, transform both lists into dictionaries:
from collections import OrderedDict
maindct = OrderedDict((frozenset(item[:2]), item[2:]) for item in mainlist)
replacedct = {frozenset(item[:2]): item[2:] for item in replacelist}
# Now it is trivial to create a list with the desired output:
output_list = [replacedct[key] + maindct[key] for key in maindct]
The key point is that a dictionary eliminates the search through the replacement list. With a list you have to scan the whole list for each item, so the running time grows with the square of the list length. Lookups in a Python dictionary take constant time on average and do not depend on the size of the data.
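As a runnable sketch of the frozenset-keyed lookup, the question's sample data can be processed as below. It applies the per-identifier rule stated in the question (A is replaced with C, B with D) and keeps the order in which each pair appears in mainlist. Under that rule row 2 comes out as ["QF63", "QF87", 186, 284], whereas the question's worked example lists the replacements in replacelist order, which is what concatenating the replacement pair directly would give. Which of the two is wanted is an assumption the reader has to settle:

```python
mainlist = [["MH75", "QF12", 0, 38], ["JQ59", "QR21", 105, 191],
            ["JQ61", "SQ48", 186, 284], ["SQ84", "QF36", 0, 123],
            ["GA55", "VA63", 80, 245], ["MH98", "CX12", 171, 263]]
replacelist = [["MH75", "QF12", "BA89", "QR29"], ["QR21", "JQ59", "VA51", "MH52"],
               ["GA55", "VA63", "MH19", "CX84"], ["SQ84", "QF36", "SQ08", "JQ65"],
               ["SQ48", "JQ61", "QF87", "QF63"], ["MH98", "CX12", "GA34", "GA60"]]

# Key: the unordered pair of identifiers. Value: old identifier -> new identifier.
replacements = {
    frozenset(row[:2]): dict(zip(row[:2], row[2:]))
    for row in replacelist
}

newlist = []
for id1, id2, val1, val2 in mainlist:
    reps = replacements[frozenset((id1, id2))]  # order inside the pair does not matter
    newlist.append([reps[id1], reps[id2], val1, val2])

print(newlist[1])  # ['MH52', 'VA51', 105, 191]
print(newlist[2])  # ['QF63', 'QF87', 186, 284]
```

A pair that is missing from replacelist raises KeyError here; whether to skip such rows or keep them unchanged depends on the data.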

Delete all occurrences of specific values from list of lists python

As far as I can see this question (surprisingly?) has not been asked before - unless I am failing to spot an equivalent question due to lack of experience. (Similar questions have been asked about 1D lists.)
I have a list_A that has int values in it.
I want to delete all occurrences of all the values specified in list_A from my list_of_lists. As a novice coder I can hack something together here myself using list comprehensions and for loops, but given what I have read about the inefficiency of deleting elements from within lists, I am looking for advice from more experienced users about the fastest way to go about this.
list_of_lists= [
[1,2,3,4,5,6,8,9],
[0,2,4,5,6,7],
[0,1,6],
[0,4,9],
[0,1,3,5],
[0,1,4],
[0,1,2],
[1,8],
[0,7],
[0,3]
]
Further info
I am not looking to eliminate duplicates (there is already a question on here about that). I am looking to eliminate all occurrences of selected values.
list_A may typically have 200 values in it
list_of_lists will have a similar (long tailed) distribution to that shown above but in the order of up to 10,000 rows by 10,000 columns
Output can be a modified version of original list_of_lists or completely new list - whichever is quicker
Last but not least (thanks to RemcoGerlich for drawing attention to this) - I need to eliminate empty sublists from within the list of lists
Many thanks
Using a list comprehension should work:
new_list = [[i for i in l if i not in list_A] for l in list_of_lists]
After that, if you want to remove empty lists, you can do:
for i in new_list[:]:  # iterate over a copy; removing from a list while iterating over it skips elements
    if not i:
        new_list.remove(i)
or, as @ferhatelmas pointed out in the comments:
new_list = [i for i in new_list if i]
To avoid duplicates in list_A and make the membership tests faster, you can convert it to a set first with list_A = set(list_A)
I'd write a function that just takes one list (or iterable) and a set, and returns a new list with the values from the set removed:
def list_without_values(L, values):
    return [l for l in L if l not in values]
Then I'd turn list_A into a set for fast checking, and loop over the original list:
set_A = set(list_A)
list_of_lists = [list_without_values(L, set_A) for L in list_of_lists]
Should be fast enough, and readability is what matters most.
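Filtering the values and dropping the sublists that end up empty can be combined without building an intermediate list of lists. The following is a minimal sketch with made-up data. The names `filtered` and `result` are illustrative:

```python
list_A = [0, 1, 8]
list_of_lists = [[1, 2, 3], [0, 1], [1, 8], [0, 7]]

set_A = set(list_A)  # O(1) average membership tests

# Generator of filtered rows; nothing is materialised until the outer comprehension runs.
filtered = ([x for x in row if x not in set_A] for row in list_of_lists)

# Keep only the rows that still contain something.
result = [row for row in filtered if row]
print(result)  # [[2, 3], [7]]
```

This builds a new list rather than deleting in place, which avoids repeated element shifting inside the existing lists.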

How to separate one list in two via list comprehension or otherwise

I have a list of dictionary items like so:
L = [{"a":1, "b":0}, {"a":3, "b":1}...]
I would like to split these entries based upon the value of "b", either 0 or 1.
A(b=0) = [{"a":1, "b":1}, ....]
B(b=1) = [{"a":3, "b":2}, .....]
I am comfortable with using simple list comprehensions, and i am currently looping through the list L two times.
A = [d for d in L if d["b"] == 0]
B = [d for d in L if d["b"] != 0]
Clearly this is not the most efficient way.
An else clause does not seem to be available within the list comprehension functionality.
Can I do what I want via list comprehension?
Is there a better way to do this?
I am looking for a good balance between readability and efficiency, leaning towards readability.
Thanks!
update:
Thanks everyone for the comments and ideas! The easiest one for me to read is the one by Thomas, but I will look at Alex's suggestion as well. I had not found any reference to the collections module before.
Don't use a list comprehension. List comprehensions are for when you want a single list result. You obviously don't :) Use a regular for loop:
A = []
B = []
for item in L:
    if item['b'] == 0:
        target = A
    else:
        target = B
    target.append(item)
You can shorten the snippet by doing, say, (A, B)[item['b'] != 0].append(item), but why bother?
If the b value can be only 0 or 1, @Thomas's simple solution is probably best. For a more general case (in which you want to discriminate among several possible values of b -- your sample "expected results" appear to be completely divorced from and contradictory to your question's text, so it's far from obvious whether you actually need some generality;-):
from collections import defaultdict
separated = defaultdict(list)
for x in L:
    separated[x['b']].append(x)
When this code executes, separated ends up with a dict (actually an instance of collections.defaultdict, a dict subclass) whose keys are all values for b that actually occur in dicts in list L, the corresponding values being the separated sublists. So, for example, if b takes only the values 0 and 1, separated[0] would be what (in your question's text as opposed to the example) you want as list A, and separated[1] what you want as list B.
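A short runnable illustration of the defaultdict grouping, using made-up sample dicts that follow the question's text (b is 0 or 1):

```python
from collections import defaultdict

L = [{"a": 1, "b": 0}, {"a": 3, "b": 1}, {"a": 5, "b": 0}]

separated = defaultdict(list)
for d in L:
    separated[d["b"]].append(d)  # a single pass over L

A = separated[0]
B = separated[1]
print(A)  # [{'a': 1, 'b': 0}, {'a': 5, 'b': 0}]
print(B)  # [{'a': 3, 'b': 1}]

# Indexing a missing key on a defaultdict inserts an empty list as a side effect;
# .get() reads without inserting.
missing = separated.get(2, [])
print(missing)  # []
```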
