Let's say I have the following elements: ['A', 'B', 1, 2]
My idea is to get the following combinations:
('A', 1)
('A', 2)
('B', 1)
('B', 2)
But these are not all the combinations of the above sequence; for example, I am deliberately not considering ('A', 'B') or (1, 2).
Using itertools.combinations, of course, gets me all the combinations:
from itertools import combinations
list(combinations(['A', 'B', 1, 2], 2))
# [('A', 'B'), ('A', 1), ('A', 2), ('B', 1), ('B', 2), (1, 2)]
It's possible for me to internally group the elements that cannot go together:
elems = [('A', 'B'), (1, 2)]
However, combinations does not expect iterables inside other iterables, so the outcome is not really unexpected: [(('A', 'B'), (1, 2))]. Not what I want, nonetheless.
What's the best way to achieve this?
You can use itertools.product to get the cartesian product of two lists:
from itertools import product
elems = [('A', 'B'), (1, 2)]
list(product(*elems))
# [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
You can use itertools.product after forming new input with values grouped by type:
from itertools import product as prd, groupby as gb
d = ['A', 'B', 1, 2]
result = list(prd(*[list(b) for _, b in gb(sorted(d, key=lambda x:str(type(x)), reverse=True), key=type)]))
Output:
[('A', 1), ('A', 2), ('B', 1), ('B', 2)]
This solution builds new sublists grouped by data type, so it keeps working when d gains further types or its elements arrive in a different order:
d = ['A', 1, 'B', 2, (1, 2), 'C', 3, (3, 4), (4, 5)]
result = list(prd(*[list(b) for _, b in gb(sorted(d, key=lambda x:str(type(x)), reverse=True), key=type)]))
Output:
[((1, 2), 'A', 1), ((1, 2), 'A', 2), ((1, 2), 'A', 3), ((1, 2), 'B', 1), ((1, 2), 'B', 2), ((1, 2), 'B', 3), ((1, 2), 'C', 1), ((1, 2), 'C', 2), ((1, 2), 'C', 3), ((3, 4), 'A', 1), ((3, 4), 'A', 2), ((3, 4), 'A', 3), ((3, 4), 'B', 1), ((3, 4), 'B', 2), ((3, 4), 'B', 3), ((3, 4), 'C', 1), ((3, 4), 'C', 2), ((3, 4), 'C', 3), ((4, 5), 'A', 1), ((4, 5), 'A', 2), ((4, 5), 'A', 3), ((4, 5), 'B', 1), ((4, 5), 'B', 2), ((4, 5), 'B', 3), ((4, 5), 'C', 1), ((4, 5), 'C', 2), ((4, 5), 'C', 3)]
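A variation on the same idea, as a rough sketch: collect the elements into one bucket per type with a plain dict and pass the buckets to itertools.product. The helper name product_by_type is made up for this illustration. It assumes that "cannot go together" simply means "has the same type", and it orders the groups by the first appearance of each type rather than by type name.

```python
from itertools import product

def product_by_type(items):
    """Build tuples that hold exactly one element per type.

    Hypothetical helper: assumes elements of the same type must never be combined.
    """
    buckets = {}  # type -> elements of that type, in input order
    for item in items:
        buckets.setdefault(type(item), []).append(item)
    # dicts preserve insertion order (Python 3.7+), so the groups appear
    # in the order in which each type was first seen
    return list(product(*buckets.values()))

print(product_by_type(['A', 'B', 1, 2]))
# [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
print(product_by_type(['A', 1, 'B', 2]))
# [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
```

Because no sorting is involved, this makes a single pass over the input, at the cost of a group order that depends on the input order.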
Let's say I have a list:
t = [('a', 1), ('a', 6), ('b', 2), ('c', 3), ('c', 5), ('d', 4)]
There are two tuples with 'a' as the first element, and two tuples with 'c' as the first element. I want to only keep the first instance of each, so I end up with:
t = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
How can I achieve that?
You can use a dictionary to help you filter the duplicate keys:
>>> t = [('a', 1), ('a', 6), ('b', 2), ('c', 3), ('c', 5), ('d', 4)]
>>> d = {}
>>> for x, y in t:
...     if x not in d:
...         d[x] = y
...
>>> d
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> t = list(d.items())
>>> t
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
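The same dictionary idea can be wrapped in a small function. This is only a sketch: the name keep_first_per_key is invented here, the input is assumed to consist of 2-tuples with hashable first elements, and the result order relies on dicts preserving insertion order (guaranteed since Python 3.7).

```python
def keep_first_per_key(pairs):
    """Return the pairs whose first element has not been seen before."""
    first_seen = {}
    for key, value in pairs:
        # setdefault stores the value only if the key is not present yet
        first_seen.setdefault(key, value)
    return list(first_seen.items())

t = [('a', 1), ('a', 6), ('b', 2), ('c', 3), ('c', 5), ('d', 4)]
print(keep_first_per_key(t))
# [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
```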
@MrGeek's answer is good, but if you do not want to use a dictionary, you could do something simple like this:
>>> t = [('a', 1), ('a', 6), ('b', 2), ('c', 3), ('c', 5), ('d', 4)]
>>> already_seen = []
>>> for e in list(t):  # iterate over a copy, because t is modified inside the loop
...     if e[0] not in already_seen:
...         already_seen.append(e[0])
...     else:
...         t.remove(e)
...
>>> t
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
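@gold_cy's comment is the easiest way:
You can use itertools.groupby to group your data, with the key parameter set to group by the first element of each tuple. Note that groupby only merges adjacent items, so this relies on tuples with the same first element sitting next to each other, as they do in t.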
import itertools as it
t = [list(my_iterator)[0] for g, my_iterator in it.groupby(t, key=lambda x: x[0])]
Output:
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
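One caveat worth illustrating: itertools.groupby starts a new group whenever the key changes, so equal keys that are not adjacent end up in separate groups. The sketch below uses a made-up list in which the two 'a' tuples are separated, and sorts by the key first to get one group per key.

```python
from itertools import groupby
from operator import itemgetter

first = itemgetter(0)
t = [('a', 1), ('b', 2), ('a', 6), ('c', 3)]  # the two 'a' tuples are not adjacent

# Without sorting, groupby starts a new group every time the key changes
unsorted_result = [next(group) for _, group in groupby(t, key=first)]
print(unsorted_result)
# [('a', 1), ('b', 2), ('a', 6), ('c', 3)]

# sorted() is stable, so within each key the original order is kept
sorted_result = [next(group) for _, group in groupby(sorted(t, key=first), key=first)]
print(sorted_result)
# [('a', 1), ('b', 2), ('c', 3)]
```

Sorting changes the order of the keys in the result, so if the original key order matters, the dictionary approach is probably the safer choice.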
If I have this list of tuples:
[(['a', 'b', 'c'], [1, 2, 3]),
(['d', 'e', 'f'], [4, 5, 6])]
How can I parametrize a test function, so the following pairs are tested:
[('a', 1), ('a', 2), ('a', 3),
('b', 1), ('b', 2), ('b', 3),
('c', 1), ('c', 2), ('c', 3),
('d', 4), ('d', 5), ('d', 6),
('e', 4), ('e', 5), ('e', 6),
('f', 4), ('f', 5), ('f', 6)]
I'm aware that two stacked decorators will combine the two lists in one of the tuples.
Use itertools.product.
Example code is here:
import itertools
A = [(['a', 'b', 'c'], [1, 2, 3]),
     (['d', 'e', 'f'], [4, 5, 6])]
L = []
for letters, numbers in A:
    # pair every letter with every number from the same tuple
    L += itertools.product(letters, numbers)
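The loop can also be written as a single expression. The following is a sketch of that alternative, using itertools.chain.from_iterable to concatenate the per-tuple products; the variable name pairs is chosen only for this example.

```python
from itertools import chain, product

A = [(['a', 'b', 'c'], [1, 2, 3]),
     (['d', 'e', 'f'], [4, 5, 6])]

# product(*group) pairs the two lists inside one tuple;
# chain.from_iterable concatenates the per-tuple results
pairs = list(chain.from_iterable(product(*group) for group in A))

print(len(pairs))   # 18
print(pairs[:4])    # [('a', 1), ('a', 2), ('a', 3), ('b', 1)]
```

Assuming pytest is the framework in question, a list like pairs can then be passed as the second argument of pytest.mark.parametrize, with two argument names (for instance "letter, number", which are placeholders) as the first.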
I have a dictionary with only 4 keys (mydictionary) and a list (mynodes) as follows.
mydictionary = {0: {('B', 'E', 'G'), ('A', 'E', 'G'), ('A', 'E', 'F'), ('A', 'D', 'F'), ('C', 'D', 'F'), ('C', 'E', 'F'), ('A', 'D', 'G'), ('C', 'D', 'G'), ('C', 'E', 'G'), ('B', 'E', 'F')},
1: {('A', 'C', 'G'), ('E', 'F', 'G'), ('D', 'E', 'F'), ('A', 'F', 'G'), ('A', 'B', 'G'), ('B', 'D', 'F'), ('C', 'F', 'G'), ('A', 'C', 'E'), ('D', 'E', 'G'), ('B', 'F', 'G'), ('B', 'C', 'G'), ('A', 'C', 'D'), ('A', 'B', 'F'), ('B', 'D', 'G'), ('B', 'C', 'F'), ('A', 'D', 'E'), ('C', 'D', 'E'), ('A', 'C', 'F'), ('A', 'B', 'E'), ('B', 'C', 'E'), ('D', 'F', 'G')},
2: {('B', 'D', 'E'), ('A', 'B', 'D'), ('B', 'C', 'D')},
3: {('A', 'B', 'C')}}
mynodes = ['E', 'D', 'G', 'F', 'B', 'A', 'C']
I want to count, for each node in the mynodes list, how many of the tuples under each key of mydictionary contain that node. For example, consider the above dictionary and list.
The output should be:
{'E': [(0, 6), (1, 8), (2, 1), (3, 0)],
'D': [(0, 4), (1, 8), (2, 3), (3, 0)],
'G': [(0, 5), (1, 10), (2, 0), (3, 0)],
'F': [(0, 5), (1, 10), (2, 0), (3, 0)],
'B': [(0, 2), (1, 9), (2, 3), (3, 1)],
'A': [(0, 4), (1, 9), (2, 1), (3, 1)],
'C': [(0, 4), (1, 9), (2, 1), (3, 1)]}
For example, consider E. It appears 6 times under key 0, 8 times under key 1, 1 time under key 2 and 0 times under key 3.
My current code is as follows.
triad_class_for_nodes = {}
for node in mynodes:
    temp_list = []
    for key, value in mydictionary.items():
        temp_counting = 0
        for triad in value:
            #print(triad[0])
            if node in triad:
                temp_counting = temp_counting + 1
        temp_list.append(tuple((key, temp_counting)))
    triad_class_for_nodes.update({node: temp_list})
print(triad_class_for_nodes)
This works fine with the small dictionary values.
However, in my real dataset, I have millions of tuples in the value list for each of my 4 keys in my dictionary. Hence, my existing code is really inefficient and takes days to run.
When I searched for ways to make this more efficient I came across this question (Fastest way to search a list in python), which suggests converting the list of values to a set. I tried this as well. However, it also takes days to run.
I am just wondering if there is a more efficient way of doing this in python. I am happy to transform my existing data formats into different structures (such as pandas dataframe) to make things more efficient.
A small sample of mydictionary and mynodes is attached below for testing purposes. https://drive.google.com/drive/folders/15Faa78xlNAYLPvqS3cKM1v8bV1HQzW2W?usp=sharing
mydictionary: see triads.txt
import ast
with open("triads.txt", "r") as file:
    mydictionary = ast.literal_eval(file.read())
mynodes: see nodes.txt
with open("nodes.txt", "r") as file:
    mynodes = ast.literal_eval(file.read())
I am happy to provide more details if needed.
Since you tagged pandas: first convert your dict to a pandas DataFrame, then stack it, and finally use crosstab:
import pandas as pd
s=pd.DataFrame.from_dict(mydictionary,'index').stack()
s = pd.DataFrame(s.values.tolist(), index=s.index).stack()
pd.crosstab(s.index.get_level_values(0),s)
col_0  A  B  C  D  E   F   G
row_0
0      4  2  4  4  6   5   5
1      9  9  9  8  8  10  10
2      1  3  1  3  1   0   0
3      1  1  1  0  0   0   0
Update
s=pd.crosstab(s.index.get_level_values(0), s).stack().reset_index()
s[['row_0',0]].apply(tuple,1).groupby(s['col_0']).agg(list).to_dict()
If you're not using pandas, you could do this with Counter from collections:
from collections import Counter,defaultdict
from itertools import product
counts = Counter((c,k) for k,v in mydictionary.items() for t in v for c in t )
result = defaultdict(list)
for c,k in product(mynodes,mydictionary):
    result[c].append((k,counts[(c,k)]))
print(dict(result))
{'E': [(0, 6), (1, 8), (2, 1), (3, 0)],
'D': [(0, 4), (1, 8), (2, 3), (3, 0)],
'G': [(0, 5), (1, 10), (2, 0), (3, 0)],
'F': [(0, 5), (1, 10), (2, 0), (3, 0)],
'B': [(0, 2), (1, 9), (2, 3), (3, 1)],
'A': [(0, 4), (1, 9), (2, 1), (3, 1)],
'C': [(0, 4), (1, 9), (2, 1), (3, 1)]}
Counter will manage counting instances for each combination of mydictionary key and node. You can then use these counts to create the expected output.
EDIT: the counts line, expanded:
counts = Counter()                           # initialize Counter() object
for key,tupleSet in mydictionary.items():    # loop through dictionary
    for tupl in tupleSet:                    # loop through tuple set of each key
        for node in tupl:                    # loop through node character in each tuple
            counts[(node,key)] += 1          # count 1 node/key pair
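The reason this approach tends to scale better is that every tuple is visited once in total, instead of once per node. A self-contained sketch of that idea follows; the function name count_nodes_per_key is invented for this example, and the sample data is just keys 2 and 3 of the question's dictionary. One deliberate difference: wrapping each tuple in set() makes a node that is repeated inside a single tuple count once, which matches the "node in triad" test in the question.

```python
from collections import Counter

def count_nodes_per_key(triads_by_key, nodes):
    """For each node, list (key, number of tuples under that key containing the node)."""
    counts = Counter(
        (node, key)
        for key, triads in triads_by_key.items()
        for triad in triads
        for node in set(triad)   # set() so a node repeated inside one tuple counts once
    )
    # a Counter returns 0 for missing pairs, so absent nodes need no special case
    return {node: [(key, counts[(node, key)]) for key in triads_by_key]
            for node in nodes}

# Keys 2 and 3 of the question's dictionary, kept small for readability
sample = {2: {('B', 'D', 'E'), ('A', 'B', 'D'), ('B', 'C', 'D')},
          3: {('A', 'B', 'C')}}
print(count_nodes_per_key(sample, ['E', 'D', 'B']))
# {'E': [(2, 1), (3, 0)], 'D': [(2, 3), (3, 0)], 'B': [(2, 3), (3, 1)]}
```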
I am trying to create tuples of the following kind:
('a', 0), ('b', 0), ('a', 1), ('b', 1), ('a', 2), ('b', 2), ('a', 3), ('b', 3)
from arrays:
A = ['a','b'] and numbers 0 through 3.
What is a good Pythonic way to write this? I keep ending up with an explicit for loop.
Use itertools.product.
from itertools import product
tuples = list(product(['a', 'b'], [0, 1, 2, 3]))
print(tuples) # [('a', 0), ('a', 1), ..., ('b', 0), ('b', 1), ...]
If you need them in the exact order you originally specified, then:
tuples = [(let, n) for n, let in product([0, 1, 2, 3], ['a', 'b'])]
If your comment that "I am ending with a real for loop here" means you ultimately just want to iterate over these elements, then:
for n, let in product([0, 1, 2, 3], ['a', 'b']):
    tup = (let, n)  # possibly unnecessary, depending on what you're doing
    ''' your code here '''
You could opt for itertools.product to get the Cartesian product you're looking for. If the element order isn't of significance, then we have
>>> from itertools import product
>>> list(product(A, range(4)))
[('a', 0),
('a', 1),
('a', 2),
('a', 3),
('b', 0),
('b', 1),
('b', 2),
('b', 3)]
If you need that particular order,
>>> list(tuple(reversed(x)) for x in product(range(4), A))
[('a', 0),
('b', 0),
('a', 1),
('b', 1),
('a', 2),
('b', 2),
('a', 3),
('b', 3)]
L = range(0, 4)
K = ['a', 'b']
L3 = [(i, j) for i in K for j in L]
print(L3)
OUTPUT
[('a', 0), ('a', 1), ('a', 2), ('a', 3), ('b', 0), ('b', 1), ('b', 2), ('b', 3)]
If you wish to use list comprehension... other answers are correct as well
Use a list comprehension (with list1 = ['a', 'b']):
>>> [(a,n) for a in list1 for n in range(4)]
[('a', 0), ('a', 1), ('a', 2), ('a', 3), ('b', 0), ('b', 1), ('b', 2), ('b', 3)]
If order matters:
>>> [(a,n) for n in range(4) for a in list1]
[('a', 0), ('b', 0), ('a', 1), ('b', 1), ('a', 2), ('b', 2), ('a', 3), ('b', 3)]
I need to get the cartesian product of iterables, like itertools.product gives me, but for optimization reasons I want those pairs/combinations with the lowest sum of indices to appear first.
So, for example, if I have two lists, a = [1, 2, 3, 4, 5] and b = ['a', 'b', 'c', 'd', 'e'], itertools.product gives me:
>>> list(itertools.product(a, b))
[(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e'), (2, 'a'), (2, 'b'), (2, 'c'), (2, 'd'), (2, 'e'), (3, 'a'), (3, 'b'), (3, 'c'), (3, 'd'), (3, 'e'), (4, 'a'), (4, 'b'), (4, 'c'), (4, 'd'), (4, 'e'), (5, 'a'), (5, 'b'), (5, 'c'), (5, 'd'), (5, 'e')]
Instead, I would want to see (2, 'a') before (1, 'c'). The exact order, between e.g. (1, 'b') and (2, 'a'), is unimportant.
Currently, I am sorting a list based on the product of the index ranges:
>>> sorted(itertools.product(range(len(a)), range(len(b))), key=sum)
[(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0), (0, 3), (1, 2), (2, 1), (3, 0), (0, 4), (1, 3), (2, 2), (3, 1), (4, 0), (1, 4), (2, 3), (3, 2), (4, 1), (2, 4), (3, 3), (4, 2), (3, 4), (4, 3), (4, 4)]
Then using that to index the lists. However, this takes too much memory with long lists. I need some kind of generator with the same calling convention as itertools.product, but I cannot figure out the way to iterate so that I get both the ordering and all the possible pairs exactly once.
def cartprod(x,y):
    nx = len(x)
    ny = len(y)
    for i in range(nx+ny):
        for j in range(max(0,i-ny+1), min(i+1,nx)):
            yield (x[j],y[i-j])
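To see why this produces the requested order: the outer loop walks over the index sum i, and the inner loop over every valid way to split i into j and i - j. The snippet below repeats the generator so that it runs on its own and applies it to the lists from the question; itertools.islice is used only to show that pairs are produced lazily.

```python
from itertools import islice

def cartprod(x, y):
    nx, ny = len(x), len(y)
    for i in range(nx + ny):                  # i = sum of the two indices
        # j must satisfy 0 <= j < nx and 0 <= i - j < ny
        for j in range(max(0, i - ny + 1), min(i + 1, nx)):
            yield (x[j], y[i - j])

a = [1, 2, 3, 4, 5]
b = ['a', 'b', 'c', 'd', 'e']

# Only the first few pairs are computed; nothing is sorted or stored up front
print(list(islice(cartprod(a, b), 6)))
# [(1, 'a'), (1, 'b'), (2, 'a'), (1, 'c'), (2, 'b'), (3, 'a')]
```

Note that this version needs len() and indexing, so both arguments have to be sequences rather than arbitrary iterables.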
Updated following @otus's comment: generate the indices ordered by sum, then use them to look up the values:
A = range(5)
B = 'abcde'
def indices(A,B):
    # iterate all possible target sums in order (the largest, max(A)+max(B), included)
    for m in range(max(A)+max(B)+1):
        for a in A:
            # stop once current target sum isn't possible
            if a > m:
                break
            # yield if sum equals current target sum
            if m-a in B:
                yield a,m-a
def values(A,B):
    for a,b in indices(range(len(A)),set(range(len(B)))):
        yield A[a],B[b]
print(list(values(A,B)))
output:
[(0, 'a'), (0, 'b'), (1, 'a'), (0, 'c'), (1, 'b'), (2, 'a'), (0, 'd'), (1, 'c'), (2, 'b'), (3, 'a'), (0, 'e'), (1, 'd'), (2, 'c'), (3, 'b'), (4, 'a'), (1, 'e'), (2, 'd'), (3, 'c'), (4, 'b'), (2, 'e'), (3, 'd'), (4, 'c'), (3, 'e'), (4, 'd'), (4, 'e')]
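Since the question asks for the calling convention of itertools.product, here is one possible generalization to any number of iterables. It is only a sketch under stated assumptions: the name product_by_index_sum is invented, each input is materialized as a tuple (as itertools.product itself does with its inputs), and index tuples are enumerated recursively for each target sum, which is simple but not tuned for speed.

```python
def product_by_index_sum(*iterables):
    """Like itertools.product, but yields tuples in order of increasing index sum."""
    pools = [tuple(it) for it in iterables]
    sizes = [len(pool) for pool in pools]
    if not pools:
        yield ()          # mirrors itertools.product() with no arguments
        return

    def index_tuples(total, sizes):
        # all index tuples that add up to `total` and stay inside `sizes`
        if len(sizes) == 1:
            if total < sizes[0]:
                yield (total,)
            return
        for i in range(min(total + 1, sizes[0])):
            for rest in index_tuples(total - i, sizes[1:]):
                yield (i,) + rest

    max_total = sum(size - 1 for size in sizes)
    for total in range(max_total + 1):
        for idx in index_tuples(total, sizes):
            yield tuple(pool[i] for pool, i in zip(pools, idx))

print(list(product_by_index_sum([1, 2], 'ab')))
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```

Only the input pools are kept in memory; the index tuples are generated one at a time, so no sorted list of all combinations is ever built.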