Using df.query on MultiIndex gives UndefinedVariableError - python

I have two dataframes
df
Out[162]:
          colA  colB
L0 L1 L2
A1 B1 C1     1     2
      C2     3     4
   B2 C1     5     6
      C2     7     8
A2 B3 C1     9    10
      C2    11    12
   B4 C1    13    14
      C2    15    16
df1
Out[166]:
               rate
from to
CHF  CHF   1.000000
     MXN  19.673256
     ZAR   0.000000
     XAU   0.000775
     THB  32.961405
When I did
df.query('L0=="A1" & L2=="C1"')
Out[167]:
          colA  colB
L0 L1 L2
A1 B1 C1     1     2
   B2 C1     5     6
This gives me the expected output.
Then I tried to apply the same approach to df1:
df1.query('ilevel_0=="CHF" & ilevel_1=="MXN"')
and
df1.query('from=="CHF" & to=="MXN"')
Both failed.
What happened here?
Data Input :
#df
{'colA': {('A1', 'B1', 'C1'): 1,
('A1', 'B1', 'C2'): 3,
('A1', 'B2', 'C1'): 5,
('A1', 'B2', 'C2'): 7,
('A2', 'B3', 'C1'): 9,
('A2', 'B3', 'C2'): 11,
('A2', 'B4', 'C1'): 13,
('A2', 'B4', 'C2'): 15},
'colB': {('A1', 'B1', 'C1'): 2,
('A1', 'B1', 'C2'): 4,
('A1', 'B2', 'C1'): 6,
('A1', 'B2', 'C2'): 8,
('A2', 'B3', 'C1'): 10,
('A2', 'B3', 'C2'): 12,
('A2', 'B4', 'C1'): 14,
('A2', 'B4', 'C2'): 16}}
#df1
{'rate': {('CHF', 'CHF'): 1.0,
('CHF', 'MXN'): 19.673256,
('CHF', 'THB'): 32.961405,
('CHF', 'XAU'): 0.000775,
('CHF', 'ZAR'): 0.0}}

Consider -
df1
               rate
from to
CHF  CHF   1.000000
     MXN  19.673256
     THB  32.961405
     XAU   0.000775
     ZAR   0.000000
First, df1.query('ilevel_0=="CHF" & ilevel_1=="MXN"') does not work because your index levels already have names. ilevel_* is the name pandas assigns to an index level only when that level has no name, so this command raises an UndefinedVariableError.
Next, df1.query('from=="CHF" & to=="MXN"') does not work because from is a keyword in Python, so when pandas evaluates the expression, from == ... is treated as invalid syntax. One workaround would be -
df1.rename_axis(['frm', 'to']).query("frm == 'CHF' and to == 'MXN'")
rate
frm to
CHF MXN 19.673256
Another would be getting rid of the axis names -
df1.rename_axis([None, None]).query("ilevel_0 == 'CHF' and ilevel_1 == 'MXN'")
rate
CHF MXN 19.673256
Keep in mind that query suffers from a host of limitations, mostly revolving around restrictions with variable names.
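If renaming the index levels is not desirable, the same row can be selected without query at all. The following is a minimal sketch, assuming df1 is rebuilt from the question's data input with the levels named from and to; the variable names rates, mask, masked and rate are only illustrative.

```python
import pandas as pd

# Rebuild df1 from the question's data, with index levels named 'from' and 'to'.
rates = {
    ('CHF', 'CHF'): 1.0,
    ('CHF', 'MXN'): 19.673256,
    ('CHF', 'THB'): 32.961405,
    ('CHF', 'XAU'): 0.000775,
    ('CHF', 'ZAR'): 0.0,
}
index = pd.MultiIndex.from_tuples(list(rates), names=['from', 'to'])
df1 = pd.DataFrame({'rate': list(rates.values())}, index=index)

# Boolean mask built from the index levels. No expression string is parsed,
# so the keyword-like level name 'from' causes no trouble here.
mask = (
    (df1.index.get_level_values('from') == 'CHF')
    & (df1.index.get_level_values('to') == 'MXN')
)
masked = df1[mask]

# Direct label lookup on the (from, to) pair.
rate = df1.loc[('CHF', 'MXN'), 'rate']

print(masked)
print(rate)
```

The mask form keeps the result as a DataFrame, while the .loc lookup returns the single rate value.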

Related

How to get a given number of unique combinations of layers variations, while maintaining a given proportion of each layer variant using Python?

I need to write a script in Python to solve this task, but I can't figure out how to do it.
I have items (let's name them layers): A, B, C...
Each layer can have any number of variations.
For each variation, the proportion percent is given that we want to get at the output.
At the output, we have to get a given number of unique combinations of all layers according to the given proportions.
For example:
layers = [
{'A0':'30%', 'A1':'30%', 'A2':'40%'},
{'B0':'10%', 'B1': '20%', 'B2': '40%', 'B3':'30%'},
{'C0':'50%'}
]
If I want to get exactly 10 unique combinations of the A, B, C layer variations,
the script should output a dataset like this:
[
('A0', 'B0'),
('A0', 'B1', 'C0'),
('A0', 'B1'),
('A1', 'B2', 'C0'),
('A1', 'B2'),
('A1', 'B3', 'C0'),
('A2', 'B2', 'C0'),
('A2', 'B2'),
('A2', 'B3', 'C0'),
('A2', 'B3')
]
So, the counts of each layer variation should align with the given proportions:
A0 = 3, A1 = 3, A2 = 4
B0 = 1, B1 = 2, B2 = 4, B3 = 3,
C0 = 5
If we want to get 20 variations the counts will be different:
A0 = 6, A1 = 6, A2 = 8
B0 = 2, B1 = 4, B2 = 8, B3 = 6,
C0 = 10
It should work for any number of layers, variations, and proportions, and return exactly the requested number of combinations
(or the maximum number available, if there are not enough unique combinations to reach it).
For every layer, you can find the distribution list and then recursively merge the results to produce the combinations. Due to the very high number of combinations that could result from get_combos, the latter is a generator, and you can use next to produce the values on-demand:
import itertools
layers = [{'A0': '30%', 'A1': '30%', 'A2': '40%'}, {'B0': '10%', 'B1': '20%', 'B2': '40%', 'B3': '30%'}, {'C0': '50%'}]
def layer_combos(l, d):
    return [i for a, b in l.items() for i in ([a]*int((d*(int(b[:-1])/float(100)))))]
def get_offsets(l, d, c = []):
    if not d:
        yield tuple(c)
    else:
        if l:
            yield from get_offsets(l[1:], d-1, c+[l[0]])
        if not c or c[-1] is not None:
            for i in range(d - len(l)):
                yield from get_offsets(l, d-(i+1), c+([None]*(i+1)))
def get_combos(l, d, c = []):
    if not l:
        if len((l:=[tuple(list(filter(None, i))) for i in zip(*c)])) == len(set(l)):
            yield l
    else:
        for i in itertools.permutations((l1:=layer_combos(l[0], d)), (l2:=len(l1))):
            for j in set(get_offsets(i, d)):
                yield from get_combos(l[1:], d, c + [j])
result = get_combos(layers, 10)
for _ in range(10): #first ten combinations
    print(next(result))
Output:
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3', 'C0'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3', 'C0'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0', 'C0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2'), ('A1', 'B2', 'C0'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0', 'C0'), ('A0', 'B1'), ('A0', 'B1', 'C0'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B3', 'C0')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3', 'C0'), ('A2', 'B2', 'C0'), ('A2', 'B2'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
[('A0', 'B0'), ('A0', 'B1', 'C0'), ('A0', 'B1'), ('A1', 'B2', 'C0'), ('A1', 'B2'), ('A1', 'B3', 'C0'), ('A2', 'B2'), ('A2', 'B2', 'C0'), ('A2', 'B3', 'C0'), ('A2', 'B3')]
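The per-variation counts that the combinations must satisfy can be checked on their own before generating anything. This is a small sketch, not part of the answer's code: target_counts is a hypothetical helper, and it uses integer arithmetic (rounding down) where the answer's layer_combos uses a float multiplication.

```python
layers = [
    {'A0': '30%', 'A1': '30%', 'A2': '40%'},
    {'B0': '10%', 'B1': '20%', 'B2': '40%', 'B3': '30%'},
    {'C0': '50%'},
]

def target_counts(layer, total):
    """Return how often each variation should appear among `total` combinations."""
    # Hypothetical helper: strips the '%' sign and rounds down with integer division.
    return {name: total * int(pct.rstrip('%')) // 100 for name, pct in layer.items()}

for total in (10, 20):
    print(total, [target_counts(layer, total) for layer in layers])
```

For 10 and 20 combinations this reproduces the counts listed in the question.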

Reorder your dataframe by reordering one column

Having a dataframe which looks like this:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
How can I rearrange the dataframe when one column has a new order that should be applied to all the other columns, for example after reordering column A as in this example?
df2 = pd.DataFrame({'A': ['A3', 'A0', 'A2', 'A1'],
'B': ['B3', 'B0', 'B2', 'B1'],
'C': ['C3', 'C0', 'C2', 'C1'],
'D': ['D3', 'D0', 'D2', 'D1']},
index=[0, 1, 2, 3])
You can use indexing via set_index, reindex and reset_index. Assumes your values in A are unique, which is the only case where such a transformation would make sense.
L = ['A3', 'A0', 'A2', 'A1']
res = df1.set_index('A').reindex(L).reset_index()
print(res)
A B C D
0 A3 B3 C3 D3
1 A0 B0 C0 D0
2 A2 B2 C2 D2
3 A1 B1 C1 D1
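A closely related variant, sketched here under the same uniqueness assumption, uses .loc instead of reindex. The difference is that .loc raises a KeyError when a requested label is missing, whereas reindex would fill that row with NaN. The name order is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])

order = ['A3', 'A0', 'A2', 'A1']

# The approach only makes sense when the values in A identify rows uniquely.
if not df1['A'].is_unique:
    raise ValueError("column A must contain unique values")

# Select rows by label in the requested order, then restore A as a column.
res = df1.set_index('A').loc[order].reset_index()
print(res)
```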
Did you mean to sort one specific row? If so, use:
df1.iloc[:1] = df1.iloc[:1].sort_index(axis=1,ascending=False)
print(df1)
For all columns, use:
df1 = df1.sort_index(axis=0,ascending=False)
For specific columns, use iloc.
You can use the key parameter from the sorted function:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
key = {'A3': 0, 'A0': 1, 'A2' : 2, 'A1': 3}
df1['A'] = sorted(df1.A, key=lambda e: key.get(e, 4))
print(df1)
Output
A B C D
0 A3 B0 C0 D0
1 A0 B1 C1 D1
2 A2 B2 C2 D2
3 A1 B3 C3 D3
By changing the values of key, you can set whatever order you want.
UPDATE
If what you want is to alter the order of the other columns based on the new order of A, you could try something like this:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A3', 'A0', 'A2', 'A1'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
key = [df1.A.values.tolist().index(k) for k in df2.A]
df2.B = df2['B'][key].tolist()
print(df2)
Output
A B C D
0 A3 B3 C0 D0
1 A0 B0 C1 D1
2 A2 B2 C2 D2
3 A1 B1 C3 D3
To alter all the columns, just apply the above to each column. Something like this:
for column in df2.columns.values:
    if column != 'A':
        df2[column] = df2[column][key].tolist()
print(df2)
Output
A B C D
0 A3 B3 C3 D3
1 A0 B0 C0 D0
2 A2 B2 C2 D2
3 A1 B1 C1 D1
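The position lookup and the per-column loop can also be done in one step. The sketch below assumes the values in A are unique; it uses Index.get_indexer to find the row positions and iloc to reorder every column at once. The names new_order, positions and reordered are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])

new_order = ['A3', 'A0', 'A2', 'A1']

# Row position in df1 of each value in new_order (-1 would mean "not found").
positions = pd.Index(df1['A']).get_indexer(new_order)

# Reorder all columns at once by position, then renumber the rows.
reordered = df1.iloc[positions].reset_index(drop=True)
print(reordered)
```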

How to insert a row of df1 one time in two rows of df2 in pandas dataframe

I'd like to insert rows of a specific dataframe one time in two rows in another specific dataframe. At the end, I'd like to do this for several columns of df1 and df2 (not only D and E).
I've got two different dataframes:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'E': ['E0', 'E1', 'E2', 'E3']},
index=[0, 1, 2, 3])
And I'd like to merge them like
df3 = pd.DataFrame({'A': ['A0', 'A0', 'A1', 'A1', 'A2', 'A2', 'A3', 'A3'],
'B': ['B0', 'B0', 'B1', 'B1', 'B2', 'B2', 'B3', 'B3'],
'C': ['C0', 'C0', 'C1', 'C1', 'C2', 'C2', 'C3', 'C3'],
'D': ['D0', 'E0', 'D1', 'E1', 'D2', 'E2', 'D3', 'E3']},
index=[0, 1, 2, 3, 4, 5, 6, 7])
1) Using pd.concat and sort_index
In [1006]: (pd.concat([df1, df2.rename(columns={'E': 'D'})])
.sort_index().reset_index(drop=True))
Out[1006]:
A B C D
0 A0 B0 C0 D0
1 A0 B0 C0 E0
2 A1 B1 C1 D1
3 A1 B1 C1 E1
4 A2 B2 C2 D2
5 A2 B2 C2 E2
6 A3 B3 C3 D3
7 A3 B3 C3 E3
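2) Or, using append and sort_index (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this form only works on older versions)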
In [1007]: df1.append(df2.rename(columns={'E': 'D'})).sort_index().reset_index(drop=True)
Out[1007]:
A B C D
0 A0 B0 C0 D0
1 A0 B0 C0 E0
2 A1 B1 C1 D1
3 A1 B1 C1 E1
4 A2 B2 C2 D2
5 A2 B2 C2 E2
6 A3 B3 C3 D3
7 A3 B3 C3 E3
Test
In [1009]: (pd.concat([df1, df2.rename(columns={'E': 'D'})])
.sort_index().reset_index(drop=True)
.equals(df3))
Out[1009]: True
In [1010]: pd.concat([df1, df2.rename(columns={'E': 'D'})]).equals(df3)
Out[1010]: False
The concat function lets you combine multiple DataFrames:
frames = [df1, df2.rename(columns={'E': 'D'})]
pd.concat(frames)
You can add additional DataFrames to the list, but you will have to rename columns so that they line up correctly.
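One detail worth making explicit: interleaving depends on rows with the same index label keeping their concatenation order, and the default sort algorithm of sort_index does not guarantee that. A minimal sketch, assuming the df1 and df2 from the question, that requests a stable sort (the name interleaved is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'E': ['E0', 'E1', 'E2', 'E3']},
                   index=[0, 1, 2, 3])

# Stack df2 under df1, with E renamed so both frames share the column D.
stacked = pd.concat([df1, df2.rename(columns={'E': 'D'})])

# mergesort is stable: for equal index labels the df1 row stays before the df2 row.
interleaved = stacked.sort_index(kind='mergesort').reset_index(drop=True)
print(interleaved)
```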

A strange error: ValueError: Shape of passed values is (7, 4), indices imply (7, 2)

The code below throws an exception: ValueError: Shape of passed values is (7, 4), indices imply (7, 2).
df4 = pd.DataFrame({'E': ['B2', 'B3', 'B6', 'B7'],
'F': ['D2', 'D3', 'D6', 'D7'],
'G': ['F2', 'F3', 'F6', 'F7']},
index=[2, 2, 6, 7])
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2'],
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']},
index=[0, 1, 2])
result00 = pd.concat([df1, df4], axis=1,join='inner')
I am confused by the error. How can I merge the two dataframes?
The merged result I want is like below.
You can use the merge() method:
In [122]: pd.merge(df1, df4, left_index=True, right_index=True)
Out[122]:
A B C D E F G
2 A2 B2 C2 D2 B2 D2 F2
2 A2 B2 C2 D2 B3 D3 F3
You can use pd.concat in the following form (note that the join_axes argument was removed in pandas 1.0, so this only works on older versions):
result00 = pd.concat([df1, df4], axis=1, join_axes = [df4.index], join = 'inner').dropna()
The earlier code did not work because df4 has a duplicate index value. Hope this helps.
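An index-based inner join is another option that tolerates the duplicate label. The sketch below assumes the df1 and df4 from the question and uses DataFrame.join; the name joined is only illustrative.

```python
import pandas as pd

df4 = pd.DataFrame({'E': ['B2', 'B3', 'B6', 'B7'],
                    'F': ['D2', 'D3', 'D6', 'D7'],
                    'G': ['F2', 'F3', 'F6', 'F7']},
                   index=[2, 2, 6, 7])
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2'],
                    'C': ['C0', 'C1', 'C2'],
                    'D': ['D0', 'D1', 'D2']},
                   index=[0, 1, 2])

# Inner join on the index: only label 2 exists in both frames, and because it
# appears twice in df4 the matching df1 row is repeated for each occurrence.
joined = df1.join(df4, how='inner')
print(joined)
```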

count combinations in python

I have been puzzling for hours how to make a program for this problem. I searched for similar solutions but I had no success.
There are 6 sets of 2 values a [a1,a2] ; b [b1, b2] ; ... f [f1, f2].
Every combination needs to have at least one value from every set, but it can also have both. Therefore, there are 64 combinations.
What I need is to count all those combinations, and print something like this:
Combination 1: a1, b1, c1, d1, e1, f1 Sum: (sum of those listed)
Combination 2: ...
Total sum:
>>> from itertools import product
>>> for item in product(['a1', 'a2'], ['b1', 'b2'], ['c1', 'c2']):
...     print(item)
...
('a1', 'b1', 'c1')
('a1', 'b1', 'c2')
('a1', 'b2', 'c1')
('a1', 'b2', 'c2')
('a2', 'b1', 'c1')
('a2', 'b1', 'c2')
('a2', 'b2', 'c1')
('a2', 'b2', 'c2')
It looks like your a1, a2 etc. are numeric. That's fine too.
>>> from itertools import product
>>> for item in product([1, 2], [3, 4], [5, 6]):
...     print(item, sum(item))
...
(1, 3, 5) 9
(1, 3, 6) 10
(1, 4, 5) 10
(1, 4, 6) 11
(2, 3, 5) 10
(2, 3, 6) 11
(2, 4, 5) 11
(2, 4, 6) 12
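Extending this to the six sets and the output format from the question might look like the sketch below. The numeric values are hypothetical placeholders, and it covers the case of exactly one value per set, which is what yields 64 combinations.

```python
from itertools import product

# Hypothetical values for the six two-element sets a..f.
sets = {
    'a': (1, 2),
    'b': (3, 4),
    'c': (5, 6),
    'd': (7, 8),
    'e': (9, 10),
    'f': (11, 12),
}

# One value from each set: 2**6 = 64 combinations.
combos = list(product(*sets.values()))

total = 0
for n, combo in enumerate(combos, start=1):
    combo_sum = sum(combo)
    total += combo_sum
    print(f"Combination {n}: {', '.join(map(str, combo))} Sum: {combo_sum}")
print(f"Total sum: {total}")
```

Replacing the placeholder tuples with the real values is the only change needed to apply it to actual data.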
