I have a project where I need to find an algorithm that solves the following problem.
Given three lists of items:
A = [1,2,3,4,5]
B = [1,2,3,4,5]
C = [1,2,3,4,5]
With Python I can find all unique combinations with:
from itertools import product
allCombinations = list(set(product(A,B,C)))
But now I need to pick, from all of those combinations, a subset that follows a roughly linear distribution.
For example, there are 125 unique combinations, and I want 50 of them in which A1, B1, C1 appear less often than A2, B2, C2, and so on (ideally the frequencies grow almost linearly).
I have no idea how to approach this kind of problem. How can I select the combinations that best match this requirement?
I can do it by hand with 125 combinations, but for more it is too difficult.
Thanks
#Edit
I'll remake the example here.
A=[1,2]
B=[1,2]
C=[1,2]
the combinations from this list are
(1,1,1) (1,2,1) (1,2,2) (1,1,2) (2,1,1) (2,1,2) (2,2,1) (2,2,2)
If I need to select 3 combinations, I will choose (2,2,2) (1,2,2) (2,2,1), because I want 1 to appear less often than 2 in each of the lists A, B, C.
The goal is to produce rarity: A, B, C represent three lists of items, and the first item of each list should be rarer than the second.
And I want to do it for a lot of items.
I think your problem is a little under-specified, so you have a choice to make as to how exactly you want to weight your combinations.
One possibility is to choose random combinations, but with a weight of i*j*k attributed to combination [A[i],B[j],C[k]]. So for instance, combination [A2,B2,C2] will be 8 times more likely to be chosen than combination [A1,B1,C1].
We can use random.sample to sample with weights: https://docs.python.org/3/library/random.html#random.sample
Python 3.9:
import itertools # product
import random # sample
def sampleCombinations(A, B, C, k):
    allCombinations = list(itertools.product(enumerate(A), enumerate(B), enumerate(C)))
    weights = [(i+1) * (j+1) * (k+1) for (i,_), (j,_), (k,_) in allCombinations]
    sampled = random.sample(allCombinations, k, counts=weights)
    sampled_clean = [(x,y,z) for (_,x), (_,y), (_,z) in sampled]
    return sampled_clean
print(sampleCombinations(['A1','A2','A3','A4','A5'], ['B1','B2','B3','B4','B5'], ['C1','C2','C3','C4','C5'], 50))
print(sampleCombinations([1, 2], [1, 2], [1, 2], 3))
Note the use of enumerate to get the indices i,j,k that are needed to compute the weights. Then we don't forget to remove the indices in sampled_clean before returning the combinations. Also note the weights are computed as (i+1)*(j+1)*(k+1) rather than i*j*k, because everything is 0-indexed, not 1-indexed.
Note: the "counts" keyword argument of random.sample is new in python 3.9. Prior to version 3.9, it was necessary to manually duplicate elements in the population to simulate the weights.
Python < 3.9:
import itertools # product
import random # sample
def sampleCombinations(A, B, C, k):
    allCombinations = list(itertools.product(enumerate(A), enumerate(B), enumerate(C)))
    weights = [(i+1) * (j+1) * (k+1) for (i,_), (j,_), (k,_) in allCombinations]
    weightedCombinations = [c for c,w in zip(allCombinations, weights) for _ in range(w)]
    sampled = random.sample(weightedCombinations, k)
    sampled_clean = [(x,y,z) for (_,x), (_,y), (_,z) in sampled]
    return sampled_clean
print(sampleCombinations(['A1','A2','A3','A4','A5'], ['B1','B2','B3','B4','B5'], ['C1','C2','C3','C4','C5'], 50))
# [('A3', 'B4', 'C2'), ('A4', 'B4', 'C5'), ('A2', 'B5', 'C5'), ('A4', 'B4', 'C4'), ('A3', 'B1', 'C4'), ('A4', 'B3', 'C3'), ('A4', 'B4', 'C2'), ('A5', 'B3', 'C4'), ('A2', 'B5', 'C3'), ('A5', 'B2', 'C2'), ('A5', 'B4', 'C3'), ('A4', 'B3', 'C1'), ('A3', 'B2', 'C5'), ('A2', 'B5', 'C5'), ('A4', 'B5', 'C5'), ('A5', 'B5', 'C5'), ('A3', 'B4', 'C5'), ('A3', 'B4', 'C5'), ('A5', 'B4', 'C2'), ('A2', 'B3', 'C1'), ('A2', 'B5', 'C2'), ('A3', 'B4', 'C4'), ('A4', 'B5', 'C1'), ('A3', 'B2', 'C2'), ('A4', 'B3', 'C5'), ('A2', 'B3', 'C3'), ('A3', 'B4', 'C1'), ('A5', 'B5', 'C4'), ('A3', 'B5', 'C5'), ('A3', 'B2', 'C5'), ('A5', 'B5', 'C3'), ('A5', 'B5', 'C3'), ('A3', 'B4', 'C4'), ('A4', 'B1', 'C1'), ('A3', 'B3', 'C4'), ('A4', 'B2', 'C5'), ('A5', 'B5', 'C5'), ('A4', 'B4', 'C3'), ('A1', 'B5', 'C3'), ('A4', 'B5', 'C3'), ('A4', 'B4', 'C2'), ('A5', 'B2', 'C2'), ('A5', 'B2', 'C5'), ('A4', 'B3', 'C5'), ('A4', 'B5', 'C1'), ('A4', 'B3', 'C5'), ('A5', 'B5', 'C5'), ('A3', 'B5', 'C3'), ('A5', 'B4', 'C5'), ('A3', 'B1', 'C4')]
print(sampleCombinations([1, 2], [1, 2], [1, 2], 3))
# [(2, 2, 2), (2, 2, 2), (1, 1, 1)]
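Note that both versions can return the same combination more than once, as the last output shows. If the selected combinations must be distinct, one option is weighted sampling without replacement. Here is a minimal sketch, assuming the same (i+1)*(j+1)*(k+1) weights. The function name sample_unique_combinations is hypothetical, and the random-key trick (the Efraimidis-Spirakis method) is a swapped-in technique, not something the code above does:

```python
import itertools
import random

def sample_unique_combinations(A, B, C, n, rng=random):
    """Pick up to n distinct combinations, favouring higher indices (illustrative helper)."""
    combos = list(itertools.product(enumerate(A), enumerate(B), enumerate(C)))
    n = min(n, len(combos))  # cannot return more distinct combinations than exist
    keyed = []
    for (i, x), (j, y), (k, z) in combos:
        weight = (i + 1) * (j + 1) * (k + 1)
        # Efraimidis-Spirakis key: a larger weight tends to produce a larger key.
        key = rng.random() ** (1.0 / weight)
        keyed.append((key, (x, y, z)))
    # Keep the n combinations with the largest keys; each combination appears once.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [combo for _, combo in keyed[:n]]

picked = sample_unique_combinations(
    ['A1', 'A2', 'A3', 'A4', 'A5'],
    ['B1', 'B2', 'B3', 'B4', 'B5'],
    ['C1', 'C2', 'C3', 'C4', 'C5'],
    50,
)
print(len(picked), len(set(picked)))
```

The result is random, so the exact frequencies vary from run to run; only the tendency (later items appear more often) is controlled by the weights.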
I need to write a Python script to solve this task, but I can't figure out how to do it.
I have items (let's call them layers): A, B, C...
Each layer can have any number of variations.
For each variation, we are given the percentage of the output in which it should appear.
The output must be a given number of unique combinations of all layers that matches the given proportions.
For example:
layers = [
    {'A0':'30%', 'A1':'30%', 'A2':'40%'},
    {'B0':'10%', 'B1': '20%', 'B2': '40%', 'B3':'30%'},
    {'C0':'50%'}
]
If I want to get exact 10 unique combinations of the A, B, C layers variations,
the script should output the dataset like this:
[
('A0', 'B0'),
('A0', 'B1', 'C0'),
('A0', 'B1'),
('A1', 'B2', 'C0'),
('A1', 'B2'),
('A1', 'B3', 'C0'),
('A2', 'B2', 'C0'),
('A2', 'B2'),
('A2', 'B3', 'C0'),
('A2', 'B3')
]
So, the counts of each layer variation should align with the given proportions:
A0 = 3, A1 = 3, A2 = 4
B0 = 1, B1 = 2, B2 = 4, B3 = 3,
C0 = 5
If we want to get 20 variations the counts will be different:
A0 = 6, A1 = 6, A2 = 8
B0 = 2, B1 = 4, B2 = 8, B3 = 6,
C0 = 10
It should work for any number of layers, variations, proportions and get the exact count of the output combinations
(or the maximum of combinations, if there are no more combinations to get the exact number)
For every layer, you can find the distribution list and then recursively merge the results to produce the combinations. Due to the very high number of combinations that could result from get_combos, the latter is a generator, and you can use next to produce the values on-demand:
import itertools
layers = [{'A0': '30%', 'A1': '30%', 'A2': '40%'}, {'B0': '10%', 'B1': '20%', 'B2': '40%', 'B3': '30%'}, {'C0': '50%'}]
def layer_combos(l, d):
    return [i for a, b in l.items() for i in ([a]*int((d*(int(b[:-1])/float(100)))))]
def get_offsets(l, d, c = []):
    if not d:
        yield tuple(c)
    else:
        if l:
            yield from get_offsets(l[1:], d-1, c+[l[0]])
        if not c or c[-1] is not None:
            for i in range(d - len(l)):
                yield from get_offsets(l, d-(i+1), c+([None]*(i+1)))
def get_combos(l, d, c = []):
    if not l:
        if len((l:=[tuple(list(filter(None, i))) for i in zip(*c)])) == len(set(l)):
            yield l
    else:
        for i in itertools.permutations((l1:=layer_combos(l[0], d)), (l2:=len(l1))):
            for j in set(get_offsets(i, d)):
                yield from get_combos(l[1:], d, c + [j])
result = get_combos(layers, 10)
for _ in range(10): #first ten combinations
    print(next(result))
Output:
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3', 'C0'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3', 'C0'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0', 'C0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3', 'C0'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3', 'C0'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
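To check that a generated dataset really matches the requested proportions, the per-variation counts can be compared against the targets. This is a small sketch; target_counts and matches_proportions are hypothetical helper names, and it assumes percentages are whole numbers written as 'NN%':

```python
from collections import Counter

layers = [
    {'A0': '30%', 'A1': '30%', 'A2': '40%'},
    {'B0': '10%', 'B1': '20%', 'B2': '40%', 'B3': '30%'},
    {'C0': '50%'},
]

def target_counts(layer, n):
    """How often each variation of one layer should appear among n combinations."""
    # Integer arithmetic avoids floating-point rounding; fractions are truncated.
    return {name: n * int(pct.rstrip('%')) // 100 for name, pct in layer.items()}

def matches_proportions(combos, layers):
    """True if the combinations are unique and hit every target count exactly."""
    n = len(combos)
    seen = Counter(item for combo in combos for item in combo)
    unique = len(set(combos)) == n
    return unique and all(
        seen[name] == count
        for layer in layers
        for name, count in target_counts(layer, n).items()
    )

# First dataset printed above.
first = [('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'),
         ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'),
         ('A2', 'B3', 'C0'), ('A2', 'B3')]
print(target_counts(layers[0], 20))         # {'A0': 6, 'A1': 6, 'A2': 8}
print(matches_proportions(first, layers))   # True
```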
I have two dataframes
df
Out[162]:
          colA  colB
L0 L1 L2
A1 B1 C1     1     2
      C2     3     4
   B2 C1     5     6
      C2     7     8
A2 B3 C1     9    10
      C2    11    12
   B4 C1    13    14
      C2    15    16
df1
Out[166]:
               rate
from to
CHF  CHF   1.000000
     MXN  19.673256
     ZAR   0.000000
     XAU   0.000775
     THB  32.961405
When I did
df.query('L0=="A1" & L2=="C1"')
Out[167]:
          colA  colB
L0 L1 L2
A1 B1 C1     1     2
   B2 C1     5     6
This gives me the expected output.
Then I wanted to apply the same approach to df1:
df1.query('ilevel_0=="CHF" & ilevel_1=="MXN"')
and
df1.query('from=="CHF" & to=="MXN"')
Both failed.
What happened here?
Data Input :
#df
{'colA': {('A1', 'B1', 'C1'): 1,
('A1', 'B1', 'C2'): 3,
('A1', 'B2', 'C1'): 5,
('A1', 'B2', 'C2'): 7,
('A2', 'B3', 'C1'): 9,
('A2', 'B3', 'C2'): 11,
('A2', 'B4', 'C1'): 13,
('A2', 'B4', 'C2'): 15},
'colB': {('A1', 'B1', 'C1'): 2,
('A1', 'B1', 'C2'): 4,
('A1', 'B2', 'C1'): 6,
('A1', 'B2', 'C2'): 8,
('A2', 'B3', 'C1'): 10,
('A2', 'B3', 'C2'): 12,
('A2', 'B4', 'C1'): 14,
('A2', 'B4', 'C2'): 16}}
#df1
{'rate': {('CHF', 'CHF'): 1.0,
('CHF', 'MXN'): 19.673256,
('CHF', 'THB'): 32.961405,
('CHF', 'XAU'): 0.000775,
('CHF', 'ZAR'): 0.0}}
Consider -
df1
               rate
from to
CHF  CHF   1.000000
     MXN  19.673256
     THB  32.961405
     XAU   0.000775
     ZAR   0.000000
First, df1.query('ilevel_0=="CHF" & ilevel_1=="MXN"') does not work because your index levels already have names. ilevel_* is the name pandas assigns to an index level only when that level has no name, so this command gives you an UndefinedVariableError.
Next, df1.query('from=="CHF" & to=="MXN"') does not work because from is a keyword in Python, so when pandas evaluates the expression, from == ... is considered invalid syntax. One workaround would be to rename the level -
df1.rename_axis(['frm', 'to']).query("frm == 'CHF' and to == 'MXN'")
              rate
frm to
CHF MXN  19.673256
Another would be getting rid of the axis names -
df1.rename_axis([None, None]).query("ilevel_0 == 'CHF' and ilevel_1 == 'MXN'")
              rate
CHF MXN  19.673256
Keep in mind that query suffers from a host of limitations, mostly revolving around restrictions with variable names.
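If renaming the levels is not desirable, the same selection can be written without query at all. The following is a minimal sketch that rebuilds df1 from the question's data and filters with a boolean mask built from Index.get_level_values; since no expression string is parsed, the level name from is not a problem. The variable names mask and selected are just illustrative:

```python
import pandas as pd

# Rebuild df1 from the data given in the question, keeping the 'from'/'to' level names.
index = pd.MultiIndex.from_tuples(
    [('CHF', 'CHF'), ('CHF', 'MXN'), ('CHF', 'THB'), ('CHF', 'XAU'), ('CHF', 'ZAR')],
    names=['from', 'to'],
)
df1 = pd.DataFrame({'rate': [1.0, 19.673256, 32.961405, 0.000775, 0.0]}, index=index)

# Compare each index level directly; the result is a boolean array, one entry per row.
mask = (
    (df1.index.get_level_values('from') == 'CHF')
    & (df1.index.get_level_values('to') == 'MXN')
)
selected = df1[mask]
print(selected)
```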
I need help in swapping 2 elements using Python for the following type of list that is generated randomly:
Actual list
list = [('a0', 'b5'), ('a0', 'b6'), ('a1', 'b0'), ('a1', 'b2'), ('a1', 'b3'), ('a1', 'b5'), ('a1', 'b6'), ('a2', 'b0'), ('a2', 'b2'), ('a2', 'b5'), ('a3', 'b4')]
After swapping element 'a1' with 'a2'
Array [('a0', 'b5'), ('a0', 'b6'), ('a2', 'b0'), ('a2', 'b2'), ('a2', 'b5'), ('a3', 'b4'), ('a1', 'b0'), ('a1', 'b2'), ('a1', 'b3'), ('a1', 'b5'), ('a1', 'b6')]
This is my code:
import math
import random
r1 = random.randrange(1, 5, 1)
r2 = random.randrange(4, 9, 2)
a = ['a' + str(j) for j in range(r1)]
b = ['b' + str(j) for j in range(r2)]
dd = []
total = math.floor((r1 * r2) * 80 / 100)
print("80% connection", total)
for x in a:
    for y in b:
        r3 = random.randrange(1, total, 2)
        if (r3 < 10):
            dd.append((x, y))
print("Connection", dd)
cop = [eb[0] for eb in dd]
s1 = random.randrange(len(a))
s2 = random.randrange(len(a))
print("Number to Swap", s1)
print("Range Number Two", s2)
for swp in range(len(dd)):
    if swp ==s1:
        for tes in range(len(a)):
            if a[s1] == cop[swp]:
                temp = dd[s1]
                dd[s1] = dd[s2]
                dd[s2] = temp
    else:
        for tes in range(len(a)):
            if a[s2] == cop[swp]:
                temp = dd[s1+1]
                dd[s1+1] = dd[swp]
                dd[swp] = temp
print("New Swap Array", dd)
This works with lists similar to your example. It swaps elements only when the list contains tuples with 'a1' or 'a2' as the first element, but it could easily be modified to work with other (and more) elements.
new_dd = []
first_els = []
second_els = []
end_els = []
for i in dd:
    if int(i[0][1]) < 1:
        new_dd.append(i)
    elif int(i[0][1]) > 2:
        end_els.append(i)
    elif int(i[0][1]) == 1:
        first_els.append(i)
    elif int(i[0][1]) == 2:
        second_els.append(i)
new_dd.extend(second_els)
new_dd.extend(first_els)
new_dd.extend(end_els)
print(dd)
print(new_dd)
Output:
[('a0', 'b1'), ('a0', 'b2'), ('a0', 'b3'), ('a1', 'b0'), ('a1', 'b1'), ('a1', 'b2'), ('a1', 'b3'), ('a2', 'b0'), ('a2', 'b1'), ('a2', 'b2'), ('a2', 'b3'), ('a3', 'b0'), ('a3', 'b2'), ('a3', 'b3')]
[('a0', 'b1'), ('a0', 'b2'), ('a0', 'b3'), ('a2', 'b0'), ('a2', 'b1'), ('a2', 'b2'), ('a2', 'b3'), ('a1', 'b0'), ('a1', 'b1'), ('a1', 'b2'), ('a1', 'b3'), ('a3', 'b0'), ('a3', 'b2'), ('a3', 'b3')]
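One way to generalize this to any two labels is to group the tuples by their first element and swap the positions of the two groups. This is only a sketch: swap_groups is a hypothetical helper, and it assumes the desired result is the block swap shown in the output above (each group keeps its internal order). It compares whole labels, so it also works for names like 'a10', where reading a single digit would not:

```python
def swap_groups(pairs, first, second):
    """Swap the blocks of tuples whose first element is `first` or `second`."""
    groups = {}
    for pair in pairs:
        # dicts preserve insertion order, so keys stay in order of first appearance
        groups.setdefault(pair[0], []).append(pair)
    order = list(groups)
    if first in groups and second in groups:
        i, j = order.index(first), order.index(second)
        order[i], order[j] = order[j], order[i]
    # If either label is missing, the list is returned in its original grouping.
    return [pair for key in order for pair in groups[key]]

dd = [('a0', 'b1'), ('a0', 'b2'), ('a0', 'b3'),
      ('a1', 'b0'), ('a1', 'b1'), ('a1', 'b2'), ('a1', 'b3'),
      ('a2', 'b0'), ('a2', 'b1'), ('a2', 'b2'), ('a2', 'b3'),
      ('a3', 'b0'), ('a3', 'b2'), ('a3', 'b3')]
print(swap_groups(dd, 'a1', 'a2'))
```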
We have some departures which can be assigned to different arrivals, just like this:
Dep1.arrivals = [A1, A2]
Dep2.arrivals = [A2, A3, A4]
Dep3.arrivals = [A3, A5]
The output of this function should be a list containing every possible combination of arrivals:
Output: [[A1, A2, A3], [A1, A2, A5], [A1, A3, A5], [A1, A4, A5], ...]
Notice that [A1, A3, A3] isn't contained in the list because you can not use an arrival twice. Also notice that [A1, A2, A3] is the same element as [A3, A1, A2] or [A3, A2, A1].
EDIT:
Many of the given solutions work in this case but not in general, for instance if the 3 sets of arrivals are equal:
Dep1.arrivals = [A1, A2, A3]
Dep2.arrivals = [A1, A2, A3]
Dep3.arrivals = [A1, A2, A3]
Then it returns:
('A1', 'A2', 'A3')
('A1', 'A3', 'A2')
('A2', 'A1', 'A3')
('A2', 'A3', 'A1')
('A3', 'A1', 'A2')
('A3', 'A2', 'A1')
Which is wrong since ('A1', 'A2', 'A3') and ('A3', 'A2', 'A1') are the same solution.
Thank you anyway!
You can do this using a list comprehension with itertools.product:
>>> import itertools
>>> lol = [["A1", "A2"], ["A2", "A3", "A4"], ["A3", "A5"]]
>>> print([x for x in itertools.product(*lol) if len(set(x)) == len(lol)])
Result
[('A1', 'A2', 'A3'),
('A1', 'A2', 'A5'),
('A1', 'A3', 'A5'),
('A1', 'A4', 'A3'),
('A1', 'A4', 'A5'),
('A2', 'A3', 'A5'),
('A2', 'A4', 'A3'),
('A2', 'A4', 'A5')]
Note that this is notionally equivalent to the code that @Kevin has given.
Edit: As the OP mentions in his edits, this solution doesn't deduplicate combinations that differ only in order.
To resolve that, the last statement can be altered as follows: first build a list of sorted tuples of arrivals, then convert the list to a set:
>>> lol = [["A1", "A2", "A3"], ["A1", "A2", "A3"], ["A1", "A2", "A3"]]
>>> set([tuple(sorted(x)) for x in itertools.product(*lol) if len(set(x)) == len(lol)])
{('A1', 'A2', 'A3')}
>>> lol = [["A1", "A2"], ["A2", "A3", "A4"], ["A3", "A5"]]
>>> set([tuple(sorted(x)) for x in itertools.product(*lol) if len(set(x)) == len(lol)])
{('A1', 'A2', 'A3'),
('A1', 'A2', 'A5'),
('A1', 'A3', 'A4'),
('A1', 'A3', 'A5'),
('A1', 'A4', 'A5'),
('A2', 'A3', 'A4'),
('A2', 'A3', 'A5'),
('A2', 'A4', 'A5')}
You could use product to generate all possible combinations of the departures, and then filter out combinations containing duplicates after the fact:
import itertools
arrivals = [
["A1", "A2"],
["A2", "A3", "A4"],
["A3", "A5"]
]
for items in itertools.product(*arrivals):
    if len(set(items)) < len(arrivals): continue
    print(items)
Result:
('A1', 'A2', 'A3')
('A1', 'A2', 'A5')
('A1', 'A3', 'A5')
('A1', 'A4', 'A3')
('A1', 'A4', 'A5')
('A2', 'A3', 'A5')
('A2', 'A4', 'A3')
('A2', 'A4', 'A5')
The question is tagged with itertools, but I suspect you did not look at itertools.combinations:
arrivals = ['A1', 'A2', 'A3', 'A4']
[a for a in itertools.combinations(arrivals, 3)]
#[('A1', 'A2', 'A3'),
#('A1', 'A2', 'A4'),
# ('A1', 'A3', 'A4'),
#('A2', 'A3', 'A4')]
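On its own, combinations ignores which departure may use which arrival. A possible way to combine both ideas is to take combinations over the union of all arrivals and keep only those that can actually be assigned to the departures. The sketch below does this by brute force over permutations, so it only suits small inputs; feasible_arrival_sets is a hypothetical name:

```python
import itertools

def feasible_arrival_sets(arrival_lists):
    """Order-independent arrival sets that give every departure an allowed arrival."""
    universe = sorted(set().union(*arrival_lists))
    n = len(arrival_lists)
    result = []
    for candidate in itertools.combinations(universe, n):
        # The candidate is feasible if some ordering matches each arrival
        # with a departure that accepts it.
        if any(
            all(arrival in allowed for arrival, allowed in zip(perm, arrival_lists))
            for perm in itertools.permutations(candidate)
        ):
            result.append(candidate)
    return result

arrivals = [["A1", "A2"], ["A2", "A3", "A4"], ["A3", "A5"]]
for combo in feasible_arrival_sets(arrivals):
    print(combo)
```

Because each candidate comes from combinations, every set is produced exactly once, so no deduplication step is needed afterwards.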