Generate all combinations from multiple lists with repeat lists - python

I have multiple lists, some of them are repeats and I need all combinations, excluding ones where the same element from a repeated list is chosen. For example, I have
import itertools
list1 = [1,2,3]
list2 = [4,5,6]
list3 = [4,5,6]
list4 = [7,8,9]
a = [list1,list2,list3,list4]
print list(itertools.product(*a))
which outputs
(1,4,4,7)
(1,4,4,8)
(1,4,4,9)
(1,4,5,7)
.
.
.
etc, as you'd expect, but what I want it to do is output every combination, without repeating elements from lists 2 and 3. Like this:
(1,4,5,7)
(1,4,5,8)
(1,4,5,9)
(1,4,6,7)
(1,4,6,8)
(1,4,6,9)
(1,5,6,7)
(1,5,6,8)
(1,5,6,9)
(2,4,5,7)
.
.
.
I'd obviously like to avoid having to remove them manually after creating the list, but any help on how to do this efficiently is really appreciated. Thanks.

The easy way is a generator expression with a filter:
print list(item for item in itertools.product(*a) if item[1] != item[2])
If two items count as the same when they contain the same elements, and each item is guaranteed to contain no repeated elements, you can discard duplicates by converting each item to a set and appending it to your list only if it is not already there:
result = []
for item in itertools.product(*a):
    if item[1] == item[2]:
        continue
    item = set(item)
    if item not in result:
        result.append(item)
print result
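If the goal is exactly the output listed in the question, where each pair drawn from the repeated list appears once (4,5 but never 5,4 or 4,4), one option is to pick that pair with itertools.combinations and pass it to itertools.product, so nothing has to be filtered afterwards. This is only a sketch, assuming list2 and list3 are always identical; the variable names follow the question.

```python
from itertools import combinations, product

list1 = [1, 2, 3]
list2 = [4, 5, 6]  # used twice in the question (as list2 and list3)
list4 = [7, 8, 9]

# combinations(list2, 2) yields each unordered pair of distinct elements once.
result = [
    (first,) + pair + (last,)
    for first, pair, last in product(list1, combinations(list2, 2), list4)
]

print(result[:4])  # [(1, 4, 5, 7), (1, 4, 5, 8), (1, 4, 5, 9), (1, 4, 6, 7)]
```

This produces 27 tuples (3 * 3 * 3) instead of the 81 that the plain product generates.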

Related

Can you use AND in List comprehension conditional statements?

I am trying to use List Comprehension to perform the following. I want to make a new list (unique) that only has the common numbers from both lists.
unique = []
for listcomp in range(len(list1)):
    if list1[listcomp] in list2 and list1[listcomp] not in unique:
        unique.append(list1[listcomp])
    else:
        continue
Above works fine but when I create the List comprehension below I get duplicates if list1 has duplicate numbers. i.e. list1 = [1, 1, 2], list2 = [1, 5]. I created my list comprehension as
unique = [list1[listcomp] for listcomp in range(len(list1)) if list1[listcomp] in list2 and list1[listcomp] not in unique]
If I'm getting duplicates I assume the "and" statement isn't being applied? I have read other queries about moving the if statement further up the comprehension statement but this didn't work. Can you use AND to extend your conditions?
Many thanks
My full code is:-
import random as rnd
# Randomly generate the size of your list
list1size = rnd.randint(1,20)
list2size = rnd.randint(1,20)
# Declare your list variables
list1 = []
list2 = []
# Fill your lists with randomly generated numbers upto the listsize generated above
for x in range(list1size):
    list1.append(rnd.randint(1,15))
for y in range(list2size):
    list2.append(rnd.randint(1,15))
# Not required but easier to read lists once sorted
list1.sort()
list2.sort()
print(list1)
print(list2)
# Now to compare old school
unique = []
# for listcomp in range(len(list1)):
#     if list1[listcomp] in list2 and list1[listcomp] not in unique:
#         unique.append(list1[listcomp])
#     else:
#         continue
# Now to compare with list comprehension
unique = [list1[listcomp] for listcomp in range(len(list1)) if list1[listcomp] in list2 and list1[listcomp] not in unique]
# Above doesn't stop duplicates if they are in List1 so I assume you can't use AND
print(f"The common numbers in both lists are {unique}")
You can't access the elements produced by a list comprehension as you go along. Your condition list1[listcomp] not in unique will always be True, since at that moment unique is still the empty list initialised in unique = [].
So the and condition is being applied, but not in the way you want.
Instead, you can create a "seen" set holding items you have already found and omit them. The standard implementation is found in the itertools unique_everseen recipe.
If you have the 3rd party toolz library, you can use the identical toolz.unique and feed a generator expression. More Pythonic, you can iterate elements directly rather than using indices:
from toolz import unique
unique = list(unique(i for i in list1 if i in list2))
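Without a third-party library, the "seen" set idea can be written out by hand. The following is a minimal sketch, not the itertools recipe itself; the function name common_unique is made up for illustration, and it assumes the list elements are hashable.

```python
def common_unique(first, second):
    """Return the items of `first` that also occur in `second`, without duplicates."""
    second_set = set(second)  # O(1) membership tests instead of scanning a list
    seen = set()              # items already added to the result
    result = []
    for value in first:
        if value in second_set and value not in seen:
            seen.add(value)
            result.append(value)
    return result


print(common_unique([1, 1, 2], [1, 5]))  # [1]
```

Unlike the list comprehension in the question, the membership test here looks at a set that is updated while the loop runs.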

Randomly choose an element of one list that's NOT in a second list

Say I have a list2 of randomly chosen elements from a large list1. Is there a clever way of choosing an element from list1 that's NOT already in list2?
For example:
import random
list1 = range(20,100)
list2 = [37,49,22,35,72] # could be much longer
while True:
    n = random.choice(list1)
    if n not in list2:
        break
# now n is an element of list1 that's not in list2
I feel like there must be a more efficient way of doing this than a guess-and-check while-loop.
You can subtract list2 from list1:
list3 = list(set(list1)-set(list2))
and choose from it randomly:
random.choice(list3)
Note: you need to reconvert the set to a list.
You could use:
import random
list1 = range(20,100)
list2 = [37,49,22,35,72]
not_in_list2 = [item for item in list1 if item not in list2]
n = random.choice(not_in_list2)
This uses a list comprehension to create a list of all elements in list1 that aren't in list2. It then selects randomly from this list. Unlike when working with sets, this technique does not change the probability of items being selected, because it does not remove duplicate elements from list1.
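When list2 is long, the test item not in list2 scans the whole list for every element of list1. A possible variant, sketched here under the assumption that the elements are hashable, converts list2 to a set once for the lookups while still keeping duplicates from list1. The helper name choose_excluding is hypothetical.

```python
import random


def choose_excluding(population, excluded):
    """Pick a random element of `population` that does not occur in `excluded`."""
    excluded_set = set(excluded)  # fast membership test
    candidates = [item for item in population if item not in excluded_set]
    if not candidates:
        raise ValueError("every element of population is excluded")
    return random.choice(candidates)


list1 = range(20, 100)
list2 = [37, 49, 22, 35, 72]
n = choose_excluding(list1, list2)
print(n)
```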
In case that there are no repeating elements in list1, this is a pythonic way, working with set and -:
import random
list1 = range(20,100)
list2 = [37,49,22,35,72] # could be much longer
n = random.choice(tuple(set(list1)-set(list2)))
# now n is an element of list1 that's not in list2
The tuple call is needed because random.choice indexes into its argument, and a set does not support indexing (passing the set directly raises a TypeError).
If you want to randomly select more than one item, random.sample is a better fit than choice. From Python 3.11 on, random.sample requires a sequence, so convert the set to a list first:
import random
diff = list(set(list1)-set(list2))
num_to_select = 1 # set the number to select here.
list_of_random_items = random.sample(diff, num_to_select)
If you do not want the overhead of creating a new list (or a new list and two sets) which can become quite costly if list1 is very large, there is another option.
import random
list1 = list(range(20,100))  # a real list, so that remove() is available
list2 = [37,49,22,35,72]
for i in list2:
    while i in list1:
        list1.remove(i)
random.choice(list1)
Just iterate through the items in list2 and remove them from list1. Since list.remove() only removes the first occurrence of an item, I added a while-loop to ensure that all occurrences are removed.

Test if two lists of lists are equal

Say I have two lists of lists in Python,
l1 = [['a',1], ['b',2], ['c',3]]
l2 = [['b',2], ['c',3], ['a',1]]
What is the most elegant way to test they are equal in the sense that the elements of l1 are simply some permutation of the elements in l2?
Note to do this for ordinary lists see here, however this uses set which does not work for lists of lists.
l1 = [['a',1], ['b',2], ['c',3]]
l2 = [['b',2], ['c',3], ['a',1]]
print sorted(l1) == sorted(l2)
Result:
True
Set doesn't work for a list of lists, but it works for a list of tuples. So you can map each sublist to a tuple and use set as:
>>> l1 = [['a',1], ['b',2], ['c',3]]
>>> l2 = [['b',2], ['c',3], ['a',1]]
>>> print set(map(tuple,l1)) == set(map(tuple,l2))
True
For one liner solution to the above question, refer to my answer in this question
I am quoting the same answer over here. This will work regardless of whether your input is a simple list or a nested one.
Let the two lists be list1 and list2. To check whether the two lists
have the same elements, I think the following is the best approach:
if ((len(list1) == len(list2)) and
        (all(i in list2 for i in list1))):
    print 'True'
else:
    print 'False'
The code above checks that both lists have the same length and that every
element of list1 is in list2. Provided neither list contains duplicates, this means the two lists hold the same elements; the elements need not be in the same order.
But if you only want to check whether all elements of list1 are
present in list2, then the following is enough:
if all(i in list2 for i in list1):
    print 'True'
else:
    print 'False'
The difference is that the latter will also print True if list2 contains
extra elements in addition to all the elements of list1. In other words,
it only ensures that every element of list1 is present in list2,
regardless of whether list2 has extra elements.
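A length check combined with a membership test cannot tell apart lists that differ only in how often an element repeats. One way around that is to compare element counts with collections.Counter. The sketch below assumes the inner lists contain only hashable values (so they can be turned into tuples); the function name is_permutation is made up for illustration.

```python
from collections import Counter


def is_permutation(first, second):
    """True if both lists of lists hold the same sublists the same number of times."""
    return Counter(map(tuple, first)) == Counter(map(tuple, second))


l1 = [['a', 1], ['b', 2], ['c', 3]]
l2 = [['b', 2], ['c', 3], ['a', 1]]
print(is_permutation(l1, l2))                       # True
print(is_permutation([[1], [1], [2]], [[1], [2], [2]]))  # False
```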

Optimizing a nested for loop with two lists

I have a program that searches through two separate lists, let's call them list1 and list2.
I only want to print the instances where list1 and list2 have matching items. The thing is, not all fields of the items in both lists match each other, but the first, third and fourth should.
If they match, I want the complete lists (including the mismatching items) to be appended to two corresponding lists.
I have written the following code:
for item in list1:
    for item2 in list2:
        if (item[0] and item[2:4])==(item[0] and item2[2:4]):
            newlist1.append(item)
            newlist2.append(item2)
            break
This works, but it's quite inefficient. For some of the larger files I'm looking through it can take more than 10 seconds to complete the match, and it should ideally be at most half of that.
What I'm thinking is that it shouldn't have to start over from the beginning in list2 each time the code is run, it should be enough to continue from the last point where there was a match. But I don't know how to write it in code.
Your condition (item[0] and item[2:4])==(item[0] and item2[2:4]) is wrong.
Besides that the second item[0] should probably be item2[0], what (item[0] and item[2:4]) does is the following (analogously for (item2[0] and item2[2:4])):
if item[0] is 0 (or any other falsy value), it returns item[0] itself, i.e. 0
if item[0] is not 0 (i.e. truthy), it returns whatever item[2:4] is
And this is then compared to the result of the second term. Thus, [0,1,1,1] would "equal" [0,2,2,2], and [1,1,1,1] would "equal" [2,1,1,1].
Try using tuples instead:
if (item[0], item[2:4]) == (item2[0], item2[2:4]):
Or use operator.itemgetter as suggested in the other answer.
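The difference between the and-based condition and the tuple comparison can be checked directly with one of the example pairs mentioned above. The values are just illustrative sample data.

```python
item = [1, 1, 1, 1]
item2 = [2, 1, 1, 1]

# `x and y` evaluates to y when x is truthy, so item[0] drops out of the comparison.
and_left = item[0] and item[2:4]    # [1, 1]
and_right = item[0] and item2[2:4]  # [1, 1]
print(and_left == and_right)        # True, although item[0] != item2[0]

# Comparing tuples checks the first field and the slice together.
print((item[0], item[2:4]) == (item2[0], item2[2:4]))  # False
```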
To speed up the pairwise matching of items from both lists, put the items from the first list into a dictionary, using those tuples as key, and then iterating over the other list and looking up the matching items in the dictionary. Complexity will be O(n+m) instead of O(n*m) (n and m being the length of the lists).
import operator
key = operator.itemgetter(0, 2, 3)
list1_dict = {}
for item in list1:
    list1_dict.setdefault(key(item), []).append(item)
for item2 in list2:
    for item in list1_dict.get(key(item2), []):
        newlist1.append(item)
        newlist2.append(item2)
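For reference, here is the same dictionary lookup as a self-contained sketch with a few rows of made-up sample data. The function name match_rows and the sample rows are assumptions for illustration; only the key fields (first, third and fourth) come from the question.

```python
from collections import defaultdict
from operator import itemgetter


def match_rows(rows1, rows2, key=itemgetter(0, 2, 3)):
    """Pair up rows from both lists whose key fields are equal."""
    index = defaultdict(list)
    for row in rows1:                 # one pass over the first list
        index[key(row)].append(row)
    matched1, matched2 = [], []
    for row2 in rows2:                # one pass over the second list
        for row in index.get(key(row2), []):
            matched1.append(row)
            matched2.append(row2)
    return matched1, matched2


rows1 = [["a", 1, "x", "y"], ["b", 2, "x", "z"]]
rows2 = [["b", 9, "x", "z"], ["c", 3, "q", "r"], ["a", 5, "x", "y"]]
newlist1, newlist2 = match_rows(rows1, rows2)
print(newlist1)  # [['b', 2, 'x', 'z'], ['a', 1, 'x', 'y']]
print(newlist2)  # [['b', 9, 'x', 'z'], ['a', 5, 'x', 'y']]
```

Note that the results come out in the order of the second list, and that every matching pair is kept, whereas the loop in the question stops at the first match for each item.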
from operator import itemgetter
getter = itemgetter(0, 2, 3)
for item,item2 in zip(list1, list2):
    if getter(item) == getter(item2):
        newlist1.append(item)
        newlist2.append(item2)
        break
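This may reduce the running time somewhat, but note that zip only compares items at the same position in both lists, and the break stops at the first match.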

Modifying list elements based on key word of the element

I have many lists, and I want to perform some operations on specific elements of them. So if I have something like:
list1 = ['list1_itemA', 'list1_itemB', 'list1_itemC', 'list1_itemD']
list2 = ['list2_itemA', 'list2_itemC','list2_itemB']
What interests me is 'itemC' wherever it occurs in any of the lists; I need to isolate the element which contains itemC for further manipulation. I thought about sorting the lists so that itemC occupies the first index, which could then be accessed with list[0].
But in my case itemA, itemB, itemC and itemD are biological species names, and I don't know how to force a list element to occupy the first index (that would be an element containing a certain string, e.g. 'cow' in my analysis or 'itemC' here). Is this possible with Python?
You can extract items containing "itemC" without ordering, or worrying how many there are, with a "generator expression":
itemCs = []
for lst in (list1, list2):
    itemCs.extend(item for item in lst if "itemC" in item)
This gives itemCs == ['list1_itemC', 'list2_itemC'].
If you're trying to save the lists with a specific string contained in the text, you can use:
parse_lists = [ list1, list2, list3 ]
matching_lists = []
search_str = "itemC"
for thisList in parse_lists:
    if any( search_str in item for item in thisList ):
        matching_lists.append( thisList )
This has an advantage that you don't need to hard-code your list name in all your list item strings, which I'm assuming you're doing now.
Also interesting to note is that changing elements of matching_lists changes the original (referenced) lists as well. You can see this and this for clarity.
>>> [x for y in [list1, list2] for x in y if "itemC" in x]
['list1_itemC', 'list2_itemC']
or
>>> [x for y in [list1, list2] for x in y if any(search_term in x for search_term in ["itemC"])]
['list1_itemC', 'list2_itemC']
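If only the first matching element of each list is needed, which is what moving it to index 0 would have achieved, next with a generator expression avoids any sorting. This is a small sketch; the helper name first_match is made up for illustration.

```python
def first_match(items, keyword, default=None):
    """Return the first string in `items` that contains `keyword`, else `default`."""
    return next((item for item in items if keyword in item), default)


list1 = ['list1_itemA', 'list1_itemB', 'list1_itemC', 'list1_itemD']
list2 = ['list2_itemA', 'list2_itemC', 'list2_itemB']

print(first_match(list1, "itemC"))  # list1_itemC
print(first_match(list2, "itemC"))  # list2_itemC
print(first_match(list2, "cow"))    # None
```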
