Related
I need to write a script in Python to solve this task, but I can't figure out how to do it.
I have items (let's name them layers): A, B, C...
Each layer can have any number of variations.
For each variation, the proportion percent is given that we want to get at the output.
At the output, we have to get a given number of unique combinations of all layers according to the given proportions.
For example:
layers = [
{'A0':'30%', 'A1':'30%', 'A2':'40%'},
{'B0':'10%', 'B1': '20%', 'B2' '40%', 'B3':'30%'},
{'C0':'50%'}
]
If I want to get exact 10 unique combinations of the A, B, C layers variations,
the script should output the dataset like this:
[
('A0', 'B0'),
('A0', 'B1', 'C0'),
('A0', 'B1'),
('A1', 'B2', 'C0'),
('A1', 'B2'),
('A1', 'B3', 'C0'),
('A2', 'B2', 'C0'),
('A2', 'B2'),
('A2', 'B3', 'C0'),
('A2', 'B3')
]
So, the counts of each layer variation should align with the given proportions:
A0 = 3, A1 = 3, A2 = 4
B0 = 1, B1 = 2, B2 = 4, B3 = 3,
C0 = 5
If we want to get 20 variations the counts will be different:
A0 = 6, A1 = 6, A2 = 8
B0 = 2, B1 = 4, B2 = 8, B3 = 6,
C0 = 10
It should work for any number of layers, variations, proportions and get the exact count of the output combinations
(or the maximum of combinations, if there are no more combinations to get the exact number)
For every layer, you can find the distribution list and then recursively merge the results to produce the combinations. Due to the very high number of combinations that could result from get_combos, the latter is a generator, and you can use next to produce the values on-demand:
import itertools
layers = [{'A0': '30%', 'A1': '30%', 'A2': '40%'}, {'B0': '10%', 'B1': '20%', 'B2': '40%', 'B3': '30%'}, {'C0': '50%'}]
def layer_combos(l, d):
return [i for a, b in l.items() for i in ([a]*int((d*(int(b[:-1])/float(100)))))]
def get_offsets(l, d, c = []):
if not d:
yield tuple(c)
else:
if l:
yield from get_offsets(l[1:], d-1, c+[l[0]])
if not c or c[-1] is not None:
for i in range(d - len(l)):
yield from get_offsets(l, d-(i+1), c+([None]*(i+1)))
def get_combos(l, d, c = []):
if not l:
if len((l:=[tuple(list(filter(None, i))) for i in zip(*c)])) == len(set(l)):
yield l
else:
for i in itertools.permutations((l1:=layer_combos(l[0], d)), (l2:=len(l1))):
for j in set(get_offsets(i, d)):
yield from get_combos(l[1:], d, c + [j])
result = get_combos(layers, 10)
for _ in range(10): #first ten combinations
print(next(result))
Output:
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3', 'C0'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3', 'C0'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0', 'C0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3', 'C0'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3', 'C0'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
Having a dataframe which looks like this:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
I wonder how to rearange the dataframe when having a different order in one column that one wants to apply to all the others, for example having changed the A column in this example?
df2 = pd.DataFrame({'A': ['A3', 'A0', 'A2', 'A1'],
'B': ['B3', 'B0', 'B2', 'B1'],
'C': ['C3', 'C0', 'C2', 'C1'],
'D': ['D3', 'D0', 'D2', 'D1']},
index=[0, 1, 2, 3])
You can use indexing via set_index, reindex and reset_index. Assumes your values in A are unique, which is the only case where such a transformation would make sense.
L = ['A3', 'A0', 'A2', 'A1']
res = df1.set_index('A').reindex(L).reset_index()
print(res)
A B C D
0 A3 B3 C3 D3
1 A0 B0 C0 D0
2 A2 B2 C2 D2
3 A1 B1 C1 D1
did you mean to sort 1 specific row? if so, use:
df1.iloc[:1] = df1.iloc[:1].sort_index(axis=1,ascending=False)
print(df1)
for all columns use:
df1 = df1.sort_index(axis=0,ascending=False)
for specific columns use the iloc function.
You can use the key parameter from the sorted function:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
key = {'A3': 0, 'A0': 1, 'A2' : 2, 'A1': 3}
df1['A'] = sorted(df1.A, key=lambda e: key.get(e, 4))
print(df1)
Output
A B C D
0 A3 B0 C0 D0
1 A0 B1 C1 D1
2 A2 B2 C2 D2
3 A1 B3 C3 D3
By changing the values of key, you can set whatever order you want.
UPDATE
If want you want is to alter the order of the other columns based on the new order of A, you could try something like this:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A3', 'A0', 'A2', 'A1'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
key = [df1.A.values.tolist().index(k) for k in df2.A]
df2.B = df2['B'][key].tolist()
print(df2)
Output
A B C D
0 A3 B3 C0 D0
1 A0 B0 C1 D1
2 A2 B2 C2 D2
3 A1 B1 C3 D3
To alter all the columns just apply the above for each column. Somthing like this:
for column in df2.columns.values:
if column != 'A':
df2[column] = df2[column][key].tolist()
print(df2)
Output
A B C D
0 A3 B3 C3 D3
1 A0 B0 C0 D0
2 A2 B2 C2 D2
3 A1 B1 C1 D1
I'd like to insert rows of a specific dataframe one time in two rows in another specific dataframe. At the end, I'd like to do this for several columns of df1 and df2 (not only D and E).
I've got two different dataframes:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'E': ['E0', 'E1', 'E2', 'E3']},
index=[0, 1, 2, 3])
And I'd like to merge them like
df3 = pd.DataFrame({'A': ['A0', 'A0', 'A1', 'A1', 'A2', 'A2', 'A3', 'A3'],
'B': ['B0', 'B0', 'B1', 'B1', 'B2', 'B2', 'B3', 'B3'],
'C': ['C0', 'C0', 'C1', 'C1', 'C2', 'C2', 'C3', 'C3'],
'D': ['D0', 'E0', 'D1', 'E1', 'D2', 'E2', 'D3', 'E3']},
index=[0, 1, 2, 3, 4, 5, 6, 7])
1) Using pd.concat and sort_index
In [1006]: (pd.concat([df1, df2.rename(columns={'E': 'D'})])
.sort_index().reset_index(drop=True))
Out[1006]:
A B C D
0 A0 B0 C0 D0
1 A0 B0 C0 E0
2 A1 B1 C1 D1
3 A1 B1 C1 E1
4 A2 B2 C2 D2
5 A2 B2 C2 E2
6 A3 B3 C3 D3
7 A3 B3 C3 E3
2) Or, Using append and sort_index
In [1007]: df1.append(df2.rename(columns={'E': 'D'})).sort_index().reset_index(drop=True)
Out[1007]:
A B C D
0 A0 B0 C0 D0
1 A0 B0 C0 E0
2 A1 B1 C1 D1
3 A1 B1 C1 E1
4 A2 B2 C2 D2
5 A2 B2 C2 E2
6 A3 B3 C3 D3
7 A3 B3 C3 E3
Test
In [1009]: (pd.concat([df1, df2.rename(columns={'E': 'D'})])
.sort_index().reset_index(drop=True)
.equals(df3))
Out[1009]: True
In [1010]: pd.concat([df1, df2.rename(columns={'E': 'D'})]).equals(df3)
Out[1010]: False
The concat function lets you combine multiple DataFrames:
frames = [df1, df2.rename(columns={'E': 'D'})]
pd.concat(frames)
You can additional DataFrames to the list, but you will have to rename columns to have have merge correctly.
The codes below throw an exception, ValueError: Shape of passed values is (7, 4), indices imply (7, 2).
df4 = pd.DataFrame({'E': ['B2', 'B3', 'B6', 'B7'],
'F': ['D2', 'D3', 'D6', 'D7'],
'G': ['F2', 'F3', 'F6', 'F7']},
index=[2, 2, 6, 7])
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2'],
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']},
index=[0, 1, 2])
result00 = pd.concat([df1, df4], axis=1,join='inner')
I am confused about the error. How to merge the two dataframe?
The result of merging i want is like below
you can use merge() method:
In [122]: pd.merge(df1, df4, left_index=True, right_index=True)
Out[122]:
A B C D E F G
2 A2 B2 C2 D2 B2 D2 F2
2 A2 B2 C2 D2 B3 D3 F3
you can use the pd.concat in the following form:
result00 = pd.concat([df1, df4], axis=1, join_axes = [df4.index], join = 'inner').dropna()
The earlier code did not work since there was a duplicate index in df2. Hope this helps
I have been puzzling for hours how to make a program for this problem. I searched for similar solutions but I had no success.
There are 6 sets of 2 values a [a1,a2] ; b [b1, b2] ; ... f [f1, f2].
Every combination needs to have at least one value from every set, but it can have also both. Therefore, there are 64 combinations.
What I need is to count all those combinations, and print something like this:
Combination 1: a1, b1, c1, d1, e1, f1 Sum: (sum of those listed)
Combination 2: ...
Total sum:
>>> from itertools import product
>>> for item in product(['a1', 'a2'], ['b1', 'b2'], ['c1', 'c2']):
... print item
...
('a1', 'b1', 'c1')
('a1', 'b1', 'c2')
('a1', 'b2', 'c1')
('a1', 'b2', 'c2')
('a2', 'b1', 'c1')
('a2', 'b1', 'c2')
('a2', 'b2', 'c1')
('a2', 'b2', 'c2')
It looks like your a1, a2 etc are numeric. That's fine too
>>> from itertools import product
>>> for item in product([1, 2], [3, 4], [5, 6]):
... print item, sum(item)
...
(1, 3, 5) 9
(1, 3, 6) 10
(1, 4, 5) 10
(1, 4, 6) 11
(2, 3, 5) 10
(2, 3, 6) 11
(2, 4, 5) 11
(2, 4, 6) 12