Trouble calling pair of index values in pandas Dataframe - python

I have a list of column names such that:
for c in collist:
    print(c)
returns
E1
E2
E3
E4
C1
C2
C3
C4
G1
I would like to loop through every single column in this list plus every combination of two columns from it.
import itertools
for i in itertools.combinations(collist, 2):
    collist.append(i)
print(collist)
['E1', 'E2', 'E3', 'E4', 'C1', 'C2', 'C3', 'C4', 'G1', ('E1', 'E2'), ('E1', 'E3'), ('E1', 'E4'), ('E1', 'C1'), ('E1', 'C2'), ('E1', 'C3'), ('E1', 'C4'), ('E1', 'G1'), ('E2', 'E3'), ('E2', 'E4'), ('E2', 'C1'), ('E2', 'C2'), ('E2', 'C3'), ('E2', 'C4'), ('E2', 'G1'), ('E3', 'E4'), ('E3', 'C1'), ('E3', 'C2'), ('E3', 'C3'), ('E3', 'C4'), ('E3', 'G1'), ('E4', 'C1'), ('E4', 'C2'), ('E4', 'C3'), ('E4', 'C4'), ('E4', 'G1'), ('C1', 'C2'), ('C1', 'C3'), ('C1', 'C4'), ('C1', 'G1'), ('C2', 'C3'), ('C2', 'C4'), ('C2', 'G1'), ('C3', 'C4'), ('C3', 'G1'), ('C4', 'G1')]
The problem is that when I go back to my DataFrame and loop over collist, it doesn't work:
for col in collist:
    print(data[[col]])
KeyError: "None of [Index([('E1', 'E2')], dtype='object')] are in the [columns]"
I believe the problem is that the loop is looking for ('E1','E2') as a single column, but that returns nothing.
However when I try this standalone, it works:
print(data[['E1','E2']])
I think I need to adjust collist so that each entry can be used to select columns from the DataFrame data. Any idea how?

Another way is to build the pairs as lists rather than tuples. Note that this version only pairs each column with the one that follows it:
collist_2 = []
for i in range(len(collist)):
    try:
        collist_2.append([collist[i], collist[i+1]])  # Add the value and the following one.
    except IndexError:  # Avoid raising an error when the index is out of range (the [i+1] for the final value of i)
        pass
final_list = collist + collist_2  # Concatenate both lists, the one with single values and the one with 2 values
Now you can print as you normally would:
for col in final_list:
    print(df[col])

You need to have your columns be in lists. A tuple (A, B) won't look for columns A and B; it'll look for a single column named (A, B).
import itertools
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=(5,10)))
cols = [list(x) for i in range(1, 3) for x in itertools.combinations(df.columns, i)]
for col in cols:
    print(df[col])
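As a minimal sketch of the same idea applied to the column names from the question, the selectors can be built as lists up front, without modifying the original list. The helper name build_selectors is hypothetical, and the DataFrame lookup is only shown in a comment so that the snippet runs without pandas installed.

```python
from itertools import combinations

def build_selectors(columns):
    """Return one-column and two-column selectors, each as a list."""
    singles = [[c] for c in columns]
    pairs = [list(p) for p in combinations(columns, 2)]
    return singles + pairs

collist = ['E1', 'E2', 'E3', 'E4', 'C1', 'C2', 'C3', 'C4', 'G1']
selectors = build_selectors(collist)
print(len(selectors))   # 9 single columns + 36 pairs
print(selectors[9])     # first pair, as a list rather than a tuple

# With a DataFrame named data, each selector could then be used as:
# for sel in selectors:
#     print(data[sel])
```

Because every selector is a list, pandas would treat it as a list of column labels rather than as one tuple-valued label.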

Related

Combinations with constraints: find a distribution that goes from low to high

I have a project where I need to find an algorithm that can solve the following problem:
I have three lists of items:
A = [1,2,3,4,5]
B = [1,2,3,4,5]
C = [1,2,3,4,5]
With Python I can find all unique combinations with this line of code:
from itertools import product
allCombinations = list(set(product(A,B,C)))
But now I need to pick, from all of those combinations, a subset that follows a roughly linear distribution.
For example, there are 125 unique combinations, and I want 50 of them in which A1, B1, C1 appear less often than A2, B2, C2, and so on (ideally the frequencies would grow almost linearly).
I have no idea how to solve this kind of problem. How can I select the combinations that best match this idea?
I can do it by hand with 125 combinations, but for more it's too difficult.
Thanks
Edit:
I'll remake the example here.
A=[1,2]
B=[1,2]
C=[1,2]
The combinations from these lists are:
(1,1,1) (1,2,1) (1,2,2) (1,1,2) (2,1,1) (2,1,2) (2,2,1) (2,2,2)
If I need to select 3 combinations, I will choose (2,2,2) (1,2,2) (2,2,1), because I want 1 to appear less often than 2 for each of A, B and C.
The goal is to produce rarity: A, B and C represent three lists of items, and the first item of each list should be rarer than the second.
And I want to do this for a lot of items.
I think your problem is a little under-specified, so you have a choice to make as to how exactly you want to weight your combinations.
One possibility is to choose random combinations, but with a weight of i*j*k attributed to combination [A[i],B[j],C[k]]. So for instance, combination [A2,B2,C2] will be 8 times more likely to be chosen than combination [A1,B1,C1].
We can use random.sample with its counts argument to sample with integer weights: https://docs.python.org/3/library/random.html#random.sample
Python 3.9:
import itertools # product
import random # sample
def sampleCombinations(A, B, C, k):
    allCombinations = list(itertools.product(enumerate(A), enumerate(B), enumerate(C)))
    weights = [(i+1) * (j+1) * (k+1) for (i,_), (j,_), (k,_) in allCombinations]
    sampled = random.sample(allCombinations, k, counts=weights)
    sampled_clean = [(x,y,z) for (_,x), (_,y), (_,z) in sampled]
    return sampled_clean
print(sampleCombinations(['A1','A2','A3','A4','A5'], ['B1','B2','B3','B4','B5'], ['C1','C2','C3','C4','C5'], 50))
print(sampleCombinations([1, 2], [1, 2], [1, 2], 3))
Note the use of enumerate to get the indices i,j,k that are needed to compute the weights. Then we don't forget to remove the indices in sampled_clean before returning the combinations. Also note the weights are computed as (i+1)*(j+1)*(k+1) rather than i*j*k, because everything is 0-indexed, not 1-indexed.
Note: the "counts" keyword argument of random.sample is new in Python 3.9. Prior to version 3.9, it was necessary to manually duplicate elements in the population to simulate the weights.
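To make the weighting concrete, here is a small sketch that lists the weights for the two-item example from the question, using the same (i+1)*(j+1)*(k+1) rule. The variable names are illustrative only.

```python
import itertools

A = B = C = [1, 2]
combos = list(itertools.product(enumerate(A), enumerate(B), enumerate(C)))
weights = {
    (x, y, z): (i + 1) * (j + 1) * (k + 1)
    for (i, x), (j, y), (k, z) in combos
}
for combo, weight in weights.items():
    print(combo, weight)

total = sum(weights.values())
print(total)  # 27, so (2, 2, 2) has weight 8 out of 27 on the first draw
```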
Python < 3.9:
import itertools # product
import random # sample
def sampleCombinations(A, B, C, k):
    allCombinations = list(itertools.product(enumerate(A), enumerate(B), enumerate(C)))
    weights = [(i+1) * (j+1) * (k+1) for (i,_), (j,_), (k,_) in allCombinations]
    weightedCombinations = [c for c,w in zip(allCombinations, weights) for _ in range(w)]
    sampled = random.sample(weightedCombinations, k)
    sampled_clean = [(x,y,z) for (_,x), (_,y), (_,z) in sampled]
    return sampled_clean
print(sampleCombinations(['A1','A2','A3','A4','A5'], ['B1','B2','B3','B4','B5'], ['C1','C2','C3','C4','C5'], 50))
# [('A3', 'B4', 'C2'), ('A4', 'B4', 'C5'), ('A2', 'B5', 'C5'), ('A4', 'B4', 'C4'), ('A3', 'B1', 'C4'), ('A4', 'B3', 'C3'), ('A4', 'B4', 'C2'), ('A5', 'B3', 'C4'), ('A2', 'B5', 'C3'), ('A5', 'B2', 'C2'), ('A5', 'B4', 'C3'), ('A4', 'B3', 'C1'), ('A3', 'B2', 'C5'), ('A2', 'B5', 'C5'), ('A4', 'B5', 'C5'), ('A5', 'B5', 'C5'), ('A3', 'B4', 'C5'), ('A3', 'B4', 'C5'), ('A5', 'B4', 'C2'), ('A2', 'B3', 'C1'), ('A2', 'B5', 'C2'), ('A3', 'B4', 'C4'), ('A4', 'B5', 'C1'), ('A3', 'B2', 'C2'), ('A4', 'B3', 'C5'), ('A2', 'B3', 'C3'), ('A3', 'B4', 'C1'), ('A5', 'B5', 'C4'), ('A3', 'B5', 'C5'), ('A3', 'B2', 'C5'), ('A5', 'B5', 'C3'), ('A5', 'B5', 'C3'), ('A3', 'B4', 'C4'), ('A4', 'B1', 'C1'), ('A3', 'B3', 'C4'), ('A4', 'B2', 'C5'), ('A5', 'B5', 'C5'), ('A4', 'B4', 'C3'), ('A1', 'B5', 'C3'), ('A4', 'B5', 'C3'), ('A4', 'B4', 'C2'), ('A5', 'B2', 'C2'), ('A5', 'B2', 'C5'), ('A4', 'B3', 'C5'), ('A4', 'B5', 'C1'), ('A4', 'B3', 'C5'), ('A5', 'B5', 'C5'), ('A3', 'B5', 'C3'), ('A5', 'B4', 'C5'), ('A3', 'B1', 'C4')]
print(sampleCombinations([1, 2], [1, 2], [1, 2], 3))
# [(2, 2, 2), (2, 2, 2), (1, 1, 1)]
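Both versions can return the same combination more than once, as the last output shows. If every selected combination has to be distinct, one option is weighted sampling without replacement. The sketch below uses the key-based scheme usually attributed to Efraimidis and Spirakis (draw u in [0, 1) per item and keep the k items with the largest u**(1/weight)). This is a swapped-in technique, not part of the answer above, and the function name and the seed parameter are assumptions made for illustration.

```python
import heapq
import itertools
import random

def sample_unique_combinations(A, B, C, k, seed=None):
    """Pick k distinct combinations, favouring higher indices."""
    rng = random.Random(seed)
    combos = list(itertools.product(enumerate(A), enumerate(B), enumerate(C)))
    keyed = []
    for (i, x), (j, y), (m, z) in combos:
        weight = (i + 1) * (j + 1) * (m + 1)
        # A larger weight pushes the key towards 1, so the item is more likely kept.
        key = rng.random() ** (1.0 / weight)
        keyed.append((key, (x, y, z)))
    # Keep the k items with the largest keys; no item can be picked twice.
    best = heapq.nlargest(k, keyed, key=lambda kv: kv[0])
    return [combo for _, combo in best]

items = [1, 2, 3, 4, 5]
picked = sample_unique_combinations(items, items, items, 50, seed=0)
print(len(picked), len(set(picked)))
```

Passing a seed only makes the run repeatable; leaving it as None gives a different selection each time.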

How to get a given number of unique combinations of layers variations, while maintaining a given proportion of each layer variant using Python?

I need to write a script in Python to solve this task, but I can't figure out how to do it.
I have items (let's name them layers): A, B, C...
Each layer can have any number of variations.
For each variation, a percentage is given that specifies how often it should appear in the output.
The output must be a given number of unique combinations of all layers that respects the given proportions.
For example:
layers = [
    {'A0':'30%', 'A1':'30%', 'A2':'40%'},
    {'B0':'10%', 'B1': '20%', 'B2': '40%', 'B3':'30%'},
    {'C0':'50%'}
]
If I want to get exactly 10 unique combinations of the A, B, C layer variations,
the script should output a dataset like this:
[
('A0', 'B0'),
('A0', 'B1', 'C0'),
('A0', 'B1'),
('A1', 'B2', 'C0'),
('A1', 'B2'),
('A1', 'B3', 'C0'),
('A2', 'B2', 'C0'),
('A2', 'B2'),
('A2', 'B3', 'C0'),
('A2', 'B3')
]
So, the counts of each layer variation should align with the given proportions:
A0 = 3, A1 = 3, A2 = 4
B0 = 1, B1 = 2, B2 = 4, B3 = 3,
C0 = 5
If we want to get 20 variations the counts will be different:
A0 = 6, A1 = 6, A2 = 8
B0 = 2, B1 = 4, B2 = 8, B3 = 6,
C0 = 10
It should work for any number of layers, variations and proportions, and return the exact number of output combinations requested
(or the maximum possible number, if there are not enough combinations to reach it).
For every layer, you can find the distribution list and then recursively merge the results to produce the combinations. Due to the very high number of combinations that could result from get_combos, the latter is a generator, and you can use next to produce the values on-demand:
import itertools
layers = [{'A0': '30%', 'A1': '30%', 'A2': '40%'}, {'B0': '10%', 'B1': '20%', 'B2': '40%', 'B3': '30%'}, {'C0': '50%'}]
def layer_combos(l, d):
    return [i for a, b in l.items() for i in ([a]*int((d*(int(b[:-1])/float(100)))))]
def get_offsets(l, d, c = []):
    if not d:
        yield tuple(c)
    else:
        if l:
            yield from get_offsets(l[1:], d-1, c+[l[0]])
        if not c or c[-1] is not None:
            for i in range(d - len(l)):
                yield from get_offsets(l, d-(i+1), c+([None]*(i+1)))
def get_combos(l, d, c = []):
    if not l:
        if len((l:=[tuple(list(filter(None, i))) for i in zip(*c)])) == len(set(l)):
            yield l
    else:
        for i in itertools.permutations((l1:=layer_combos(l[0], d)), (l2:=len(l1))):
            for j in set(get_offsets(i, d)):
                yield from get_combos(l[1:], d, c + [j])
result = get_combos(layers, 10)
for _ in range(10): #first ten combinations
    print(next(result))
Output:
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3', 'C0'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3', 'C0'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0', 'C0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3', 'C0'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3', 'C0'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
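As a sanity check on any result, the target count of each variation can be computed directly from the percentages. This is a minimal sketch with a hypothetical helper name, target_counts; it uses integer arithmetic and assumes that n times each percentage divides evenly by 100, as in the question's examples.

```python
def target_counts(layers, n):
    """Map each variation to the number of times it should appear in n combinations."""
    counts = {}
    for layer in layers:
        for name, percent in layer.items():
            # '30%' -> 30, then scale to n items using integer division.
            counts[name] = n * int(percent.rstrip('%')) // 100
    return counts

layers = [
    {'A0': '30%', 'A1': '30%', 'A2': '40%'},
    {'B0': '10%', 'B1': '20%', 'B2': '40%', 'B3': '30%'},
    {'C0': '50%'},
]
print(target_counts(layers, 10))
print(target_counts(layers, 20))
```

Counting how often each name occurs in a generated list and comparing against these targets would confirm that the proportions hold.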

How to find different combinations between two list?

There are two lists.
list_1=[a1,b1,c1,d1]
list_2=[a2,b2,c2,d2]
The conditions are: (i) each combination must contain four elements, and (ii) each combination must contain one a element (either a1 or a2), one b element (either b1 or b2), one c element (either c1 or c2) and one d element (either d1 or d2).
Please help me find all such combinations using Python 3.x.
You can use itertools.product:
from itertools import product
list_1 = ['a1','b1','c1','d1']
list_2 = ['a2','b2','c2','d2']
result = list(product(*zip(list_1, list_2)))
print(result)
[('a1', 'b1', 'c1', 'd1'), ('a1', 'b1', 'c1', 'd2'), ('a1', 'b1', 'c2', 'd1'), ('a1', 'b1', 'c2', 'd2'), ('a1', 'b2', 'c1', 'd1'), ('a1', 'b2', 'c1', 'd2'), ('a1', 'b2', 'c2', 'd1'), ('a1', 'b2', 'c2', 'd2'), ('a2', 'b1', 'c1', 'd1'), ('a2', 'b1', 'c1', 'd2'), ('a2', 'b1', 'c2', 'd1'), ('a2', 'b1', 'c2', 'd2'), ('a2', 'b2', 'c1', 'd1'), ('a2', 'b2', 'c1', 'd2'), ('a2', 'b2', 'c2', 'd1'), ('a2', 'b2', 'c2', 'd2')]
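To see why this works: zip pairs up the two alternatives for each letter, and product then picks one element from every pair. A short sketch using the same lists (the check at the end is only illustrative):

```python
from itertools import product

list_1 = ['a1', 'b1', 'c1', 'd1']
list_2 = ['a2', 'b2', 'c2', 'd2']

choices = list(zip(list_1, list_2))
print(choices)  # [('a1', 'a2'), ('b1', 'b2'), ('c1', 'c2'), ('d1', 'd2')]

result = list(product(*choices))
print(len(result))  # 2 choices for each of 4 letters: 16 combinations

# Every combination has exactly one a, one b, one c and one d, in that order.
assert all([item[0] for item in combo] == ['a', 'b', 'c', 'd'] for combo in result)
```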

Swapping two dimensional array using python

I need help in swapping 2 elements using Python for the following type of list that is generated randomly:
Actual list
list = [('a0', 'b5'), ('a0', 'b6'), ('a1', 'b0'), ('a1', 'b2'), ('a1', 'b3'), ('a1', 'b5'), ('a1', 'b6'), ('a2', 'b0'), ('a2', 'b2'), ('a2', 'b5'), ('a3', 'b4')]
After swapping element 'a1' with 'a2'
Array [('a0', 'b5'), ('a0', 'b6'), ('a2', 'b0'), ('a2', 'b2'), ('a2', 'b5'), ('a3', 'b4'), ('a1', 'b0'), ('a1', 'b2'), ('a1', 'b3'), ('a1', 'b5'), ('a1', 'b6')]
This is my code:
import math
import random
r1 = random.randrange(1, 5, 1)
r2 = random.randrange(4, 9, 2)
a = ['a' + str(j) for j in range(r1)]
b = ['b' + str(j) for j in range(r2)]
dd = []
total = math.floor((r1 * r2) * 80 / 100)
print("80% connection", total)
for x in a:
    for y in b:
        r3 = random.randrange(1, total, 2)
        if (r3 < 10):
            dd.append((x, y))
print("Connection", dd)
cop = [eb[0] for eb in dd]
s1 = random.randrange(len(a))
s2 = random.randrange(len(a))
print("Number to Swap", s1)
print("Range Number Two", s2)
for swp in range(len(dd)):
    if swp ==s1:
        for tes in range(len(a)):
            if a[s1] == cop[swp]:
                temp = dd[s1]
                dd[s1] = dd[s2]
                dd[s2] = temp
    else:
        for tes in range(len(a)):
            if a[s2] == cop[swp]:
                temp = dd[s1+1]
                dd[s1+1] = dd[swp]
                dd[swp] = temp
print("New Swap Array", dd)
This works with lists like the one in your example. It swaps elements only when the list contains tuples whose first element is 'a1' or 'a2', but it could easily be modified to handle other elements.
new_dd = []
first_els = []
second_els = []
end_els = []
for i in dd:
    if int(i[0][1]) < 1:
        new_dd.append(i)
    elif int(i[0][1]) > 2:
        end_els.append(i)
    elif int(i[0][1]) == 1:
        first_els.append(i)
    elif int(i[0][1]) == 2:
        second_els.append(i)
new_dd.extend(second_els)
new_dd.extend(first_els)
new_dd.extend(end_els)
print(dd)
print(new_dd)
Output:
[('a0', 'b1'), ('a0', 'b2'), ('a0', 'b3'), ('a1', 'b0'), ('a1', 'b1'), ('a1', 'b2'), ('a1', 'b3'), ('a2', 'b0'), ('a2', 'b1'), ('a2', 'b2'), ('a2', 'b3'), ('a3', 'b0'), ('a3', 'b2'), ('a3', 'b3')]
[('a0', 'b1'), ('a0', 'b2'), ('a0', 'b3'), ('a2', 'b0'), ('a2', 'b1'), ('a2', 'b2'), ('a2', 'b3'), ('a1', 'b0'), ('a1', 'b1'), ('a1', 'b2'), ('a1', 'b3'), ('a3', 'b0'), ('a3', 'b2'), ('a3', 'b3')]
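The same reordering can be sketched more generally by grouping the tuples on their first element and swapping the positions of two groups. The helper name swap_groups is hypothetical, and the sketch assumes that all tuples sharing a first element should stay together in their original relative order.

```python
def swap_groups(pairs, first, second):
    """Return a new list in which the blocks for `first` and `second` trade places."""
    groups = {}
    for pair in pairs:
        # dicts keep insertion order, so groups stay in order of first appearance
        groups.setdefault(pair[0], []).append(pair)
    order = list(groups)
    if first in groups and second in groups:
        i, j = order.index(first), order.index(second)
        order[i], order[j] = order[j], order[i]
    return [pair for key in order for pair in groups[key]]

dd = [('a0', 'b1'), ('a0', 'b2'), ('a1', 'b0'), ('a1', 'b3'),
      ('a2', 'b0'), ('a2', 'b2'), ('a3', 'b0')]
print(swap_groups(dd, 'a1', 'a2'))
```

If either label is missing from the list, the sketch returns the tuples grouped but otherwise unswapped.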

Mixing lists in python

from itertools import product
x_coord = ['a','b','c','d','e']
y_coord = ['1', '2', '3', '4', '5']
board = []
index = 0
for item in itertools.product(x_coord, y_coord):
    board += item
for elements in board:
    board[index] = board[index] + board[index +1]
    board.remove(board[index +1])
    index += 1
print board
Hello. Let me explain what I want to do with this:
I have two lists (x_coord and y_coord) and I want to mix them like this:
board = ['a1', 'a2', ..., 'e1', 'e2', ...]
But I get an "IndexError: list index out of range" error instead.
How should I proceed?
Note: if there are any mistakes in my English, please tell me. I'm learning English as well as coding.
You can try it like this:
>>> x_coord = ['a','b','c','d','e']
>>> y_coord = ['1', '2', '3', '4', '5']
>>> [item + item2 for item2 in y_coord for item in x_coord]
['a1', 'b1', 'c1', 'd1', 'e1', 'a2', 'b2', 'c2', 'd2', 'e2', 'a3', 'b3', 'c3', 'd3', 'e3', 'a4', 'b4', 'c4', 'd4', 'e4', 'a5', 'b5', 'c5', 'd5', 'e5']
Sorted results:
>>> sorted([item + item2 for item2 in y_coord for item in x_coord])
['a1', 'a2', 'a3', 'a4', 'a5', 'b1', 'b2', 'b3', 'b4', 'b5', 'c1', 'c2', 'c3', 'c4', 'c5', 'd1', 'd2', 'd3', 'd4', 'd5', 'e1', 'e2', 'e3', 'e4', 'e5']
from itertools import product
combined = list(map(lambda x: ''.join(x), product(x_coord, y_coord)))
coord = list(map(lambda x,y: x+y, x_coord, y_coord))
print(coord)
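Note that these two map calls are not equivalent. A small sketch for Python 3, where map returns an iterator and has to be wrapped in list before printing (the name paired is illustrative):

```python
from itertools import product

x_coord = ['a', 'b', 'c', 'd', 'e']
y_coord = ['1', '2', '3', '4', '5']

# product yields every (letter, digit) pair, so this gives all 25 squares.
combined = list(map(''.join, product(x_coord, y_coord)))
print(combined[:6])  # ['a1', 'a2', 'a3', 'a4', 'a5', 'b1']

# map over two lists walks them in parallel, so this gives only 5 items.
paired = list(map(lambda x, y: x + y, x_coord, y_coord))
print(paired)  # ['a1', 'b2', 'c3', 'd4', 'e5']
```

Only the product-based version matches the board layout asked for in the question.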
import itertools
t = [a + b for a,b in itertools.product(x_coord,y_coord)]
print(t)  # prints what you want
Normally, list(itertools.product(x_coord,y_coord)) gives the following:
[('a', '1'), ('a', '2'), ('a', '3'), ('a', '4'), ('a', '5'), ('b', '1'), ('b', '2'), ('b', '3'), ('b', '4'), ('b', '5'), ('c', '1'), ('c', '2'), ('c', '3'), ('c', '4'), ('c', '5'), ('d', '1'), ('d', '2'), ('d', '3'), ('d', '4'), ('d', '5'), ('e', '1'), ('e', '2'), ('e', '3'), ('e', '4'), ('e', '5')]
As you can see, it's already in order, because itertools.product pairs 'a' with every element of y_coord before moving on to 'b', and so on.
By using a list comprehension we can join the two elements of each pair with a+b, resulting in this:
['a1', 'a2', 'a3', 'a4', 'a5', 'b1', 'b2', 'b3', 'b4', 'b5', 'c1', 'c2', 'c3', 'c4', 'c5', 'd1', 'd2', 'd3', 'd4', 'd5', 'e1', 'e2', 'e3', 'e4', 'e5']
x_coord = ['a','b','c','d','e']
y_coord = ['1', '2', '3', '4', '5']
a=[]
for i in range(len(x_coord)):
    for j in range(len(y_coord)):
        a.append(x_coord[i].join(" "+ y_coord[j]))
b=[]
for item in a:
    b.append(item.replace(" ",""))
print(b)
