I have the following list:
data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
I want to split the list such that
split1 = ['A1', 'C3', 'B2', 'A2', 'C2', 'A3', 'C1', 'B1', 'B3']
split2 = ['D3', 'D2', 'D1']
The constraint is that items with the same prefix (A, B, etc.) must not end up in separate lists. The data can be split in any ratio, such as 50-50 or 80-20.
Here you go:
import numpy as np
data = np.array(['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3'])
# define some condition
condition = ['B', 'D']
boolean_selection = [np.any([ c in d for c in condition]) for d in data]
split1 = data[boolean_selection]
split2 = data[np.logical_not(boolean_selection)]
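If you would rather give a ratio than pick the prefixes by hand, here is a minimal sketch. It assumes the prefix is the first character of each item; the function name split_by_prefix and the greedy "largest group first" rule are my own choices, not something from the question. Whole groups are moved together, so the ratio is only approximated.

```python
from collections import defaultdict

def split_by_prefix(items, ratio=0.75):
    """Split items into two lists without separating items that share a prefix."""
    # Group items by their first character (assumed to be the prefix).
    groups = defaultdict(list)
    for item in items:
        groups[item[0]].append(item)

    target = ratio * len(items)
    first, second = [], []
    # Visit larger groups first; a group goes to `first` only if it still fits.
    for prefix in sorted(groups, key=lambda p: len(groups[p]), reverse=True):
        if len(first) + len(groups[prefix]) <= target:
            first.extend(groups[prefix])
        else:
            second.extend(groups[prefix])
    return first, second

data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
split1, split2 = split_by_prefix(data, ratio=0.75)
print(split1)
print(split2)
```

Items come out grouped by prefix rather than in their original order; shuffle the group order first if you need a random split.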
I have a project where I need to find an algorithm that can solve the following problem:
Given three lists of items:
A = [1,2,3,4,5]
B = [1,2,3,4,5]
C = [1,2,3,4,5]
With Python I can find all unique combinations with this line of code:
from itertools import product
allCombinations = list(set(product(A,B,C)))
But now I need to select, from all of those combinations, the ones that follow a roughly linear distribution.
For example, there are 125 unique combinations, and I want 50 of them in which A1, B1, C1 appear less often than A2, B2, C2, and so on (ideally the frequencies would grow almost linearly).
I have no idea how to solve this kind of problem. How can I select the combinations that best match this idea?
I can do it by hand with 125 combinations, but for more it's too difficult.
Thanks
#Edit
I'll remake the example here.
A=[1,2]
B=[1,2]
C=[1,2]
the combinations from this list are
(1,1,1) (1,2,1) (1,2,2) (1,1,2) (2,1,1) (2,1,2) (2,2,1) (2,2,2)
If I need to select 3 combinations, I will choose (2,2,2) (1,2,2) (2,2,1), because I want item 1 of lists A, B, C to appear less often than item 2.
The goal is to produce rarity: A, B, C represent three lists of items, and the first item of each list should be rarer than the second.
And I want to do it for a lot of items.
I think your problem is a little under-specified, so you have a choice to make as to how exactly you want to weight your combinations.
One possibility is to choose random combinations, but with a weight of i*j*k attributed to combination [A[i],B[j],C[k]]. So, for instance, combination [A2,B2,C2] will be 8 times more likely to be chosen than combination [A1,B1,C1].
We can use random.sample to sample with weights: https://docs.python.org/3/library/random.html#random.sample
Python 3.9:
import itertools # product
import random # sample
def sampleCombinations(A, B, C, k):
    allCombinations = list(itertools.product(enumerate(A), enumerate(B), enumerate(C)))
    weights = [(i+1) * (j+1) * (k+1) for (i,_), (j,_), (k,_) in allCombinations]
    sampled = random.sample(allCombinations, k, counts=weights)
    sampled_clean = [(x,y,z) for (_,x), (_,y), (_,z) in sampled]
    return sampled_clean
print(sampleCombinations(['A1','A2','A3','A4','A5'], ['B1','B2','B3','B4','B5'], ['C1','C2','C3','C4','C5'], 50))
print(sampleCombinations([1, 2], [1, 2], [1, 2], 3))
Note the use of enumerate to get the indices i,j,k that are needed to compute the weights. Then we don't forget to remove the indices in sampled_clean before returning the combinations. Also note the weights are computed as (i+1)*(j+1)*(k+1) rather than i*j*k, because everything is 0-indexed, not 1-indexed.
Note: the "counts" keyword argument of random.sample is new in Python 3.9. Prior to version 3.9, it was necessary to manually duplicate elements in the population to simulate the weights.
Python < 3.9:
import itertools # product
import random # sample
def sampleCombinations(A, B, C, k):
    allCombinations = list(itertools.product(enumerate(A), enumerate(B), enumerate(C)))
    weights = [(i+1) * (j+1) * (k+1) for (i,_), (j,_), (k,_) in allCombinations]
    weightedCombinations = [c for c,w in zip(allCombinations, weights) for _ in range(w)]
    sampled = random.sample(weightedCombinations, k)
    sampled_clean = [(x,y,z) for (_,x), (_,y), (_,z) in sampled]
    return sampled_clean
print(sampleCombinations(['A1','A2','A3','A4','A5'], ['B1','B2','B3','B4','B5'], ['C1','C2','C3','C4','C5'], 50))
# [('A3', 'B4', 'C2'), ('A4', 'B4', 'C5'), ('A2', 'B5', 'C5'), ('A4', 'B4', 'C4'), ('A3', 'B1', 'C4'), ('A4', 'B3', 'C3'), ('A4', 'B4', 'C2'), ('A5', 'B3', 'C4'), ('A2', 'B5', 'C3'), ('A5', 'B2', 'C2'), ('A5', 'B4', 'C3'), ('A4', 'B3', 'C1'), ('A3', 'B2', 'C5'), ('A2', 'B5', 'C5'), ('A4', 'B5', 'C5'), ('A5', 'B5', 'C5'), ('A3', 'B4', 'C5'), ('A3', 'B4', 'C5'), ('A5', 'B4', 'C2'), ('A2', 'B3', 'C1'), ('A2', 'B5', 'C2'), ('A3', 'B4', 'C4'), ('A4', 'B5', 'C1'), ('A3', 'B2', 'C2'), ('A4', 'B3', 'C5'), ('A2', 'B3', 'C3'), ('A3', 'B4', 'C1'), ('A5', 'B5', 'C4'), ('A3', 'B5', 'C5'), ('A3', 'B2', 'C5'), ('A5', 'B5', 'C3'), ('A5', 'B5', 'C3'), ('A3', 'B4', 'C4'), ('A4', 'B1', 'C1'), ('A3', 'B3', 'C4'), ('A4', 'B2', 'C5'), ('A5', 'B5', 'C5'), ('A4', 'B4', 'C3'), ('A1', 'B5', 'C3'), ('A4', 'B5', 'C3'), ('A4', 'B4', 'C2'), ('A5', 'B2', 'C2'), ('A5', 'B2', 'C5'), ('A4', 'B3', 'C5'), ('A4', 'B5', 'C1'), ('A4', 'B3', 'C5'), ('A5', 'B5', 'C5'), ('A3', 'B5', 'C3'), ('A5', 'B4', 'C5'), ('A3', 'B1', 'C4')]
print(sampleCombinations([1, 2], [1, 2], [1, 2], 3))
# [(2, 2, 2), (2, 2, 2), (1, 1, 1)]
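As the last output shows, the same combination can be drawn more than once, because duplicating elements by weight only makes the copies distinct, not the combinations. If the selected combinations must be unique, one option is weighted sampling without replacement. The sketch below swaps in the Efraimidis-Spirakis key trick (give each item the key random() ** (1 / weight) and keep the k largest keys) instead of random.sample; the function name sample_unique_combinations and the rng parameter are my own, and the weights are the same (i+1)*(j+1)*(k+1) as above.

```python
import itertools
import random

def sample_unique_combinations(A, B, C, k, rng=random):
    """Pick k distinct combinations, favouring later items in each list."""
    combos = list(itertools.product(enumerate(A), enumerate(B), enumerate(C)))
    k = min(k, len(combos))  # cannot return more distinct combinations than exist

    def key(combo):
        (i, _), (j, _), (m, _) = combo
        weight = (i + 1) * (j + 1) * (m + 1)
        # Efraimidis-Spirakis key: larger weights tend to produce larger keys.
        return rng.random() ** (1.0 / weight)

    # sorted() calls key once per element, so each combination gets one random key.
    chosen = sorted(combos, key=key, reverse=True)[:k]
    return [(x, y, z) for (_, x), (_, y), (_, z) in chosen]

print(sample_unique_combinations([1, 2], [1, 2], [1, 2], 3))
```

Pass rng=random.Random(some_seed) if you need reproducible draws.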
I need to write a script in Python to solve this task, but I can't figure out how to do it.
I have items (let's name them layers): A, B, C...
Each layer can have any number of variations.
For each variation, a target percentage is given: the share of output combinations that should contain it.
The output must be a given number of unique combinations of all layers that respects the given proportions.
For example:
layers = [
{'A0':'30%', 'A1':'30%', 'A2':'40%'},
{'B0':'10%', 'B1': '20%', 'B2': '40%', 'B3':'30%'},
{'C0':'50%'}
]
If I want to get exactly 10 unique combinations of the A, B, C layer variations,
the script should output a dataset like this:
[
('A0', 'B0'),
('A0', 'B1', 'C0'),
('A0', 'B1'),
('A1', 'B2', 'C0'),
('A1', 'B2'),
('A1', 'B3', 'C0'),
('A2', 'B2', 'C0'),
('A2', 'B2'),
('A2', 'B3', 'C0'),
('A2', 'B3')
]
So, the counts of each layer variation should align with the given proportions:
A0 = 3, A1 = 3, A2 = 4
B0 = 1, B1 = 2, B2 = 4, B3 = 3,
C0 = 5
If we want to get 20 variations the counts will be different:
A0 = 6, A1 = 6, A2 = 8
B0 = 2, B1 = 4, B2 = 8, B3 = 6,
C0 = 10
It should work for any number of layers, variations, proportions and get the exact count of the output combinations
(or the maximum of combinations, if there are no more combinations to get the exact number)
For every layer, you can find the distribution list and then recursively merge the results to produce the combinations. Due to the very high number of combinations that could result from get_combos, the latter is a generator, and you can use next to produce the values on-demand:
import itertools
layers = [{'A0': '30%', 'A1': '30%', 'A2': '40%'}, {'B0': '10%', 'B1': '20%', 'B2': '40%', 'B3': '30%'}, {'C0': '50%'}]
def layer_combos(l, d):
    return [i for a, b in l.items() for i in ([a]*int((d*(int(b[:-1])/float(100)))))]
def get_offsets(l, d, c = []):
    if not d:
        yield tuple(c)
    else:
        if l:
            yield from get_offsets(l[1:], d-1, c+[l[0]])
        if not c or c[-1] is not None:
            for i in range(d - len(l)):
                yield from get_offsets(l, d-(i+1), c+([None]*(i+1)))
def get_combos(l, d, c = []):
    if not l:
        if len((l:=[tuple(list(filter(None, i))) for i in zip(*c)])) == len(set(l)):
            yield l
    else:
        for i in itertools.permutations((l1:=layer_combos(l[0], d)), (l2:=len(l1))):
            for j in set(get_offsets(i, d)):
                yield from get_combos(l[1:], d, c + [j])
result = get_combos(layers, 10)
for _ in range(10): #first ten combinations
    print(next(result))
Output:
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3', 'C0'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3', 'C0'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0', 'C0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3', 'C0'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3', 'C0'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
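To check any of these results against the requested proportions, it helps to compute the expected count of each variation first. This is a small sketch, not part of the answer above: target_counts is a name I made up, and it assumes that fractional counts are rounded down, which matches the int(...) truncation in layer_combos.

```python
def target_counts(layers, n):
    """Return, per layer, how often each variation should appear in n combinations."""
    counts = []
    for layer in layers:
        # '30%' -> 30; integer arithmetic avoids floating-point rounding surprises.
        counts.append({name: n * int(pct.rstrip('%')) // 100
                       for name, pct in layer.items()})
    return counts

layers = [
    {'A0': '30%', 'A1': '30%', 'A2': '40%'},
    {'B0': '10%', 'B1': '20%', 'B2': '40%', 'B3': '30%'},
    {'C0': '50%'},
]
print(target_counts(layers, 10))
# [{'A0': 3, 'A1': 3, 'A2': 4}, {'B0': 1, 'B1': 2, 'B2': 4, 'B3': 3}, {'C0': 5}]
```

With n = 20 the same function gives the doubled counts listed in the question (A0 = 6, B2 = 8, C0 = 10, and so on).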
I wrote a program that generates some lists, something like
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'b5', 'b5', 'b4', 'D', 'c4']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b5']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b5', 'b5', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'D', 'c4']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b5', 'b5', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b3', 'b2']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'b5']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b5', 'b5', 'b4', 'b3', 'b2']
and I want to find the shortest list, that is, the list with the fewest elements.
Thanks,
You can use the min function:
min(data, key = len)
If you want to handle cases where there are multiple elements having the shortest length, you can sort the list in ascending order by length:
sorted(data, key = len)
You can sort by list length and take the first element, but this returns only one list even when several lists share the minimum length.
smallest_list = sorted(list_of_list, key=len)[0]
Another option is to get the length of the shortest list and then use it as a filter:
len_smallest_list = min(len(x) for x in list_of_list)
smallest_lists = [lst for lst in list_of_list if len(lst) == len_smallest_list]
Hello Stack Overflow members,
I'm trying to concatenate the keys (strings) on the one hand, and the values (lists) on the other hand, of a dictionary.
To make this clearer, here is what I have at the beginning:
dict = {'bk1':
{'k11': ['a1', 'b1', 'c1'],
'k12': ['a2', 'b2', 'c2']},
'bk2':
{'k21': ['d1', 'e1'],
'k22': ['d2', 'e2'],
'k23': ['d3', 'e3']},
'bk3':
{'k31': ['f1', 'g1', 'h1'],
'k32': ['f2', 'g2', 'h2']}
}
And here is what I would like at the end:
newdict = {'k11_k21_k31': ['a1', 'b1', 'c1', 'd1', 'e1', 'f1', 'g1', 'h1'],
'k11_k21_k32': ['a1', 'b1', 'c1', 'd1', 'e1', 'f2', 'g2', 'h2'],
'k11_k22_k31': ['a1', 'b1', 'c1', 'd2', 'e2', 'f1', 'g1', 'h1'],
'k11_k22_k32': ['a1', 'b1', 'c1', 'd2', 'e2', 'f2', 'g2', 'h2'],
'k11_k23_k31': ['a1', 'b1', 'c1', 'd3', 'e3', 'f1', 'g1', 'h1'],
'k11_k23_k32': ['a1', 'b1', 'c1', 'd3', 'e3', 'f2', 'g2', 'h2'],
'k12_k21_k31': ['a2', 'b2', 'c2', 'd1', 'e1', 'f1', 'g1', 'h1'],
'k12_k21_k32': ['a2', 'b2', 'c2', 'd1', 'e1', 'f2', 'g2', 'h2'],
'k12_k22_k31': ['a2', 'b2', 'c2', 'd2', 'e2', 'f1', 'g1', 'h1'],
'k12_k22_k32': ['a2', 'b2', 'c2', 'd2', 'e2', 'f2', 'g2', 'h2'],
'k12_k23_k31': ['a2', 'b2', 'c2', 'd3', 'e3', 'f1', 'g1', 'h1'],
'k12_k23_k32': ['a2', 'b2', 'c2', 'd3', 'e3', 'f2', 'g2', 'h2']}
I wish to do that with:
a variable number of "big keys" (bki), and for each bki, a variable number of keys (kij).
"Full combination" between "big keys". For example, I don't expect results like:
{'k11_k23': ['a1', 'b1', 'c1', 'd3', 'e3']}
where "bk3" is missing.
I tried nested "for" loops, but the number of loops depends on the number of "big keys"...
Then I felt that the problem could be solved with recursion (maybe?), but despite my research and my efforts to implement it, I failed.
Any help with a "recursive or not" solution would be greatly appreciated.
Thank you,
Mat
Whoaa, what a reactivity!
Thanks a lot for all your quick answers, it works perfect!
As suggested by #jksnw in the comments, you can use itertools.product to do this:
import itertools
dct = {
'bk1': {
'k11': ['a1', 'b1', 'c1'],
'k12': ['a2', 'b2', 'c2']
},
'bk2':{
'k21': ['d1', 'e1'],
'k22': ['d2', 'e2'],
'k23': ['d3', 'e3']
},
'bk3': {
'k31': ['f1', 'g1', 'h1'],
'k32': ['f2', 'g2', 'h2']
}
}
big_keys = dct.keys()
small_keys = (dct[big_key].keys() for big_key in big_keys)
res = {}
for keys_from_each in itertools.product(*small_keys):
    key = "_".join(keys_from_each)
    value = []
    for big_key, small_key in zip(big_keys, keys_from_each):
        value.extend(dct[big_key][small_key])
    res[key] = value
So that:
>>> res
{'k11_k21_k31': ['a1', 'b1', 'c1', 'd1', 'e1', 'f1', 'g1', 'h1'],
'k11_k21_k32': ['a1', 'b1', 'c1', 'd1', 'e1', 'f2', 'g2', 'h2'],
'k11_k22_k31': ['a1', 'b1', 'c1', 'd2', 'e2', 'f1', 'g1', 'h1'],
'k11_k22_k32': ['a1', 'b1', 'c1', 'd2', 'e2', 'f2', 'g2', 'h2'],
'k11_k23_k31': ['a1', 'b1', 'c1', 'd3', 'e3', 'f1', 'g1', 'h1'],
'k11_k23_k32': ['a1', 'b1', 'c1', 'd3', 'e3', 'f2', 'g2', 'h2'],
'k12_k21_k31': ['a2', 'b2', 'c2', 'd1', 'e1', 'f1', 'g1', 'h1'],
'k12_k21_k32': ['a2', 'b2', 'c2', 'd1', 'e1', 'f2', 'g2', 'h2'],
'k12_k22_k31': ['a2', 'b2', 'c2', 'd2', 'e2', 'f1', 'g1', 'h1'],
'k12_k22_k32': ['a2', 'b2', 'c2', 'd2', 'e2', 'f2', 'g2', 'h2'],
'k12_k23_k31': ['a2', 'b2', 'c2', 'd3', 'e3', 'f1', 'g1', 'h1'],
'k12_k23_k32': ['a2', 'b2', 'c2', 'd3', 'e3', 'f2', 'g2', 'h2']}
Here, itertools.product is used to get a list of the "small keys" that we take from each block:
>>> big_keys = dct.keys()
>>> small_keys = (dct[big_key].keys() for big_key in big_keys)
>>> list(itertools.product(*small_keys))
[('k12', 'k22', 'k31'),
('k12', 'k22', 'k32'),
('k12', 'k23', 'k31'),
('k12', 'k23', 'k32'),
('k12', 'k21', 'k31'),
('k12', 'k21', 'k32'),
('k11', 'k22', 'k31'),
('k11', 'k22', 'k32'),
('k11', 'k23', 'k31'),
('k11', 'k23', 'k32'),
('k11', 'k21', 'k31'),
('k11', 'k21', 'k32')]
You can use itertools.product together with reduce(lambda x,y:x+y,i) to flatten the nested lists. Also, do not use dict or the names of other Python built-in types or keywords as variable names (I used d):
>>> from itertools import product
>>> from functools import reduce  # reduce is no longer a built-in on Python 3
>>> v=[i.values() for i in d.values()]
>>> v=[reduce(lambda x,y:x+y,i) for i in product(*v)]
>>> k=[i.keys() for i in d.values()]
>>> k=['_'.join(i) for i in product(*k)]
>>> {k:v for k,v in zip(k,v)}
{'k31_k12_k22': ['f1', 'g1', 'h1', 'a2', 'b2', 'c2', 'd2', 'e2'],
'k32_k12_k21': ['f2', 'g2', 'h2', 'a2', 'b2', 'c2', 'd1', 'e1'],
'k31_k11_k22': ['f1', 'g1', 'h1', 'a1', 'b1', 'c1', 'd2', 'e2'],
'k31_k12_k23': ['f1', 'g1', 'h1', 'a2', 'b2', 'c2', 'd3', 'e3'],
'k32_k12_k22': ['f2', 'g2', 'h2', 'a2', 'b2', 'c2', 'd2', 'e2'],
'k31_k12_k21': ['f1', 'g1', 'h1', 'a2', 'b2', 'c2', 'd1', 'e1'],
'k32_k11_k23': ['f2', 'g2', 'h2', 'a1', 'b1', 'c1', 'd3', 'e3'],
'k32_k12_k23': ['f2', 'g2', 'h2', 'a2', 'b2', 'c2', 'd3', 'e3'],
'k31_k11_k21': ['f1', 'g1', 'h1', 'a1', 'b1', 'c1', 'd1', 'e1'],
'k31_k11_k23': ['f1', 'g1', 'h1', 'a1', 'b1', 'c1', 'd3', 'e3'],
'k32_k11_k21': ['f2', 'g2', 'h2', 'a1', 'b1', 'c1', 'd1', 'e1'],
'k32_k11_k22': ['f2', 'g2', 'h2', 'a1', 'b1', 'c1', 'd2', 'e2']}
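The key order in the outputs above reflects older Python versions, where dict ordering was arbitrary. For reference, here is a sketch of the same idea as a reusable function, assuming Python 3.7+ where dicts keep insertion order, so the keys come out as k11_k21_k31 and so on. The name combine is mine, and itertools.chain.from_iterable is used for flattening in place of reduce.

```python
from itertools import chain, product

def combine(d):
    """Build {'k11_k21_k31': [...], ...} from a dict of dicts of lists."""
    inner_dicts = list(d.values())
    result = {}
    for keys in product(*inner_dicts):  # iterating a dict yields its keys
        # Pair each chosen key with the inner dict it came from.
        lists = (inner[key] for inner, key in zip(inner_dicts, keys))
        result['_'.join(keys)] = list(chain.from_iterable(lists))
    return result

d = {
    'bk1': {'k11': ['a1', 'b1', 'c1'], 'k12': ['a2', 'b2', 'c2']},
    'bk2': {'k21': ['d1', 'e1'], 'k22': ['d2', 'e2'], 'k23': ['d3', 'e3']},
    'bk3': {'k31': ['f1', 'g1', 'h1'], 'k32': ['f2', 'g2', 'h2']},
}
newdict = combine(d)
print(len(newdict))            # 12
print(newdict['k11_k21_k31'])  # ['a1', 'b1', 'c1', 'd1', 'e1', 'f1', 'g1', 'h1']
```

Because product takes any number of iterables, this works for any number of "big keys" without nested loops or recursion.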