Consider the following DataFrame df:
df=
kind A B
names a1 a2 b1 b2 b3
Time
0.0 0.7804 0.5294 0.1895 0.9195 0.0508
0.1 0.1703 0.7095 0.8704 0.8566 0.5513
0.2 0.8147 0.9055 0.0506 0.4212 0.2464
0.3 0.3985 0.4515 0.7118 0.6146 0.2682
0.4 0.2505 0.2752 0.4097 0.3347 0.1296
When I run levs = df.columns.get_level_values("kind"), levs is equal to
Index(['A', 'A', 'B', 'B', 'B'], dtype='object', name='kind')
whereas I would like levs to be Index(['A', 'B'], dtype='object', name='kind').
One way to achieve this would be levs = list(set(levs)), but I am wondering whether there are other simple methods.
I think you can use levels:
out = df.columns.levels[0]
print(out)
Index(['A', 'B'], dtype='object')
EDIT: One idea with a lookup by the names of the MultiIndex:
d = {v: k for k, v in enumerate(df.columns.names)}
print(d)
{'kind': 0, 'names': 1}
out = df.columns.levels[d['kind']]
print(out)
Index(['A', 'B'], dtype='object', name='kind')
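One caveat worth knowing: .levels reports every category defined on the index, even after the corresponding columns have been dropped, whereas get_level_values(...).unique() reflects only the labels actually in use. A small sketch (column layout as in the question, values arbitrary):

```python
import numpy as np
import pandas as pd

# Rebuild columns like the frame above (values here are arbitrary)
cols = pd.MultiIndex.from_tuples(
    [('A', 'a1'), ('A', 'a2'), ('B', 'b1'), ('B', 'b2'), ('B', 'b3')],
    names=['kind', 'names'])
df = pd.DataFrame(np.random.rand(5, 5), columns=cols)

sub = df.drop(columns=['A'])
# .levels keeps every category ever defined, even after columns are dropped
print(sub.columns.levels[0])                          # Index(['A', 'B'], ...)
# get_level_values(...).unique() reflects only the labels still present
print(sub.columns.get_level_values('kind').unique())  # Index(['B'], ...)
```

So if the frame may have had columns removed, the get_level_values(...).unique() form is the safer choice.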
I am hoping to create a few new columns for 'data'.
The first created column is a/b, the second c/d, and the third e/f.
col1 is a list of names for the original columns.
The output of df should look like this:
a b c d e f res_a res_c res_e
1 2 3 4 2 3 0.5 0.75 2/3
res_a is a divided by b: a = 1, b = 2, therefore res_a = 1/2 = 0.5.
res_c is c divided by d: c = 3, d = 4, therefore res_c = 3/4 = 0.75.
My code looks like this now, but I can't get a/b, c/d, and e/f:
col1 = ['a', 'b', 'c']
col2 = ['d', 'e', 'f']
for col in col2:
    data[f'res_{col}'] = np.round(data[col1] / data[col2], decimals=2)
You could also use pandas.IndexSlice to pick out alternating columns with a list-slicing style of syntax:
cix = pd.IndexSlice
num = df.loc[:, cix['a'::2]]   # every second column starting at 'a': a, c, e
den = df.loc[:, cix['b'::2]]   # every second column starting at 'b': b, d, f
# .to_numpy() drops the labels; otherwise pandas would align the differing
# column names and produce NaN
df[['res_a', 'res_c', 'res_e']] = num.to_numpy() / den.to_numpy()
print(df)
# a b c d e f res_a res_c res_e
# 0 1 2 3 4 2 3 0.5 0.75 0.666667
You can read more about the pandas slicers in the docs
Use zip() to loop over two lists in parallel.
cols1 = ['a', 'c', 'e']
cols2 = ['b', 'd', 'f']
for c1, c2 in zip(cols1, cols2):
    data[f'res_{c1}'] = np.round(data[c1] / data[c2], decimals=2)
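If you prefer to avoid the Python-level loop entirely, the same ratios can be computed in one vectorized step. A sketch, assuming the single-row 'data' frame from the question:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'a': [1], 'b': [2], 'c': [3], 'd': [4], 'e': [2], 'f': [3]})
cols1 = ['a', 'c', 'e']   # numerators
cols2 = ['b', 'd', 'f']   # denominators

# .to_numpy() strips the labels so pandas does not try to align the columns
ratios = np.round(data[cols1].to_numpy() / data[cols2].to_numpy(), 2)
res = pd.DataFrame(ratios, columns=[f'res_{c}' for c in cols1], index=data.index)
data = pd.concat([data, res], axis=1)
print(data)
#    a  b  c  d  e  f  res_a  res_c  res_e
# 0  1  2  3  4  2  3    0.5   0.75   0.67
```

For a handful of columns the zip() loop is just as good; the vectorized form mainly pays off on wide frames.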
I am trying to rename a column and combine that renamed column to others like it. The row indexes will not be the same (i.e. I am not combining 'City' and 'State' from two columns).
df = pd.DataFrame({'Col_1': ['A', 'B', 'C'],
                   'Col_2': ['D', 'E', 'F'],
                   'Col_one': ['G', 'H', 'I']})
df.rename(columns={'Col_one' : 'Col_1'}, inplace=True)
# Desired output:
({'Col_1': ['A', 'B', 'C', 'G', 'H', 'I'],
'Col_2': ['D', 'E', 'F', '-', '-', '-'],})
I've tried pd.concat and a few other things, but it fails to combine the columns in a way I'm expecting. Thank you!
This is melt and pivot after you have renamed:
u = df.melt()
out = (u.assign(k=u.groupby("variable").cumcount())
        .pivot(index="k", columns="variable", values="value")
        .fillna('-'))
out = out.rename_axis(index=None, columns=None)
print(out)
Col_1 Col_2
0 A D
1 B E
2 C F
3 G -
4 H -
5 I -
Using concatenation, without modifying the original DataFrame (DataFrame.append was removed in pandas 2.0, so pd.concat does the job here):
result = pd.concat(
    [df[['Col_1', 'Col_2']],
     df[['Col_one']].rename(columns={'Col_one': 'Col_1'})],
    ignore_index=True
).fillna('-')
OUTPUT:
Col_1 Col_2
0 A D
1 B E
2 C F
3 G -
4 H -
5 I -
Might be a slightly longer method than other answers but the below delivered the required output.
df = pd.DataFrame({'Col_1': ['A', 'B', 'C'],
                   'Col_2': ['D', 'E', 'F'],
                   'Col_one': ['G', 'H', 'I']})
# Keep the values we want to move over as a Series
temp = df['Col_one']
# Append them as new rows (DataFrame.append was removed in pandas 2.0,
# so pd.concat is used instead)
df = pd.concat([df, pd.DataFrame({'Col_1': temp})], ignore_index=True)
# Drop the redundant column
df.drop(columns=['Col_one'], inplace=True)
# Populate NaN with -
df.fillna('-', inplace=True)
Output is
Col_1 Col_2
0 A D
1 B E
2 C F
3 G -
4 H -
5 I -
Using concat should work.
import pandas as pd
df = pd.DataFrame({'Col_1': ['A', 'B', 'C'],
                   'Col_2': ['D', 'E', 'F'],
                   'Col_one': ['G', 'H', 'I']})
df2 = pd.DataFrame()
df2['Col_1'] = pd.concat([df['Col_1'], df['Col_one']], axis=0)
df2 = df2.reset_index(drop=True)
df2['Col_2'] = df['Col_2']
df2['Col_2'] = df2['Col_2'].fillna('-')
print(df2)
prints
Col_1 Col_2
0 A D
1 B E
2 C F
3 G -
4 H -
5 I -
mydf = pd.DataFrame({'dts': ['1/1/2000', '1/1/2000', '1/1/2000', '1/2/2000', '1/3/2000', '1/3/2000'],
                     'product': ['A', 'B', 'A', 'A', 'A', 'B'],
                     'value': [1, 2, 2, 3, 6, 1]})
a = mydf.groupby(['dts', 'product']).sum()
so a now has a multi-level index...
a
Out[1]:
value
dts product
1/1/2000 A 3
B 2
1/2/2000 A 3
1/3/2000 A 6
B 1
How do I extract the product level of the index from a? a.index['product'] does not work.
Using get_level_values
>>> a.index.get_level_values(1)
Index(['A', 'B', 'A', 'A', 'B'], dtype='object', name='product')
You can also use the name of the level:
>>> a.index.get_level_values('product')
Index(['A', 'B', 'A', 'A', 'B'], dtype='object', name='product')
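If you only want the distinct products rather than one entry per row, Index.unique accepts a level as well. A sketch, rebuilding the a from the question:

```python
import pandas as pd

mydf = pd.DataFrame({'dts': ['1/1/2000', '1/1/2000', '1/1/2000', '1/2/2000', '1/3/2000', '1/3/2000'],
                     'product': ['A', 'B', 'A', 'A', 'A', 'B'],
                     'value': [1, 2, 2, 3, 6, 1]})
a = mydf.groupby(['dts', 'product']).sum()

print(a.index.get_level_values('product'))  # one entry per row: A, B, A, A, B
print(a.index.unique(level='product'))      # just the distinct labels: A, B
```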
I am trying to figure out a fast and clean way to map values in one DataFrame A using another. Let's say I have a DataFrame like this one:
C1 C2 C3 C4 C5
1 a b c a
2 d a e b a
3 a c
4 b e e
And now I want to change those letter codes to actual values. My DataFrame B with the explanations looks like this:
Code Value
1 a 'House'
2 b 'Bike'
3 c 'Lamp'
4 d 'Window'
5 e 'Car'
So far my brute-force approach has been to go through every element in A and check the value in B with isin(). I know that I can also use a Series (or a simple dictionary) instead of a DataFrame for B and use, for example, the Code column as an index. But I would still need multiple loops to map everything.
Is there any other nice way to achieve my goal?
You could use replace:
A.replace(B.set_index('Code')['Value'])
import pandas as pd
A = pd.DataFrame(
    {'C1': ['a', 'd', 'a', 'b'],
     'C2': ['b', 'a', 'c', 'e'],
     'C3': ['c', 'e', '', 'e'],
     'C4': ['a', 'b', '', ''],
     'C5': ['', 'a', '', '']})
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
                  'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})
print(A.replace(B.set_index('Code')['Value']))
yields
C1 C2 C3 C4 C5
0 'House' 'Bike' 'Lamp' 'House'
1 'Window' 'House' 'Car' 'Bike' 'House'
2 'House' 'Lamp'
3 'Bike' 'Car' 'Car'
Another alternative is map. Although it requires looping over columns, if I didn't mess up the tests, it is still faster than replace:
import numpy as np

A = pd.DataFrame(np.random.choice(list("abcdef"), (1000, 1000)))
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
                  'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})
B = B.set_index("Code")["Value"]
%timeit A.replace(B)
1 loop, best of 3: 970 ms per loop
C = pd.DataFrame()
%%timeit
for col in A:
    C[col] = A[col].map(B).fillna(A[col])
1 loop, best of 3: 586 ms per loop
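The per-column loop can also be written as a single apply, which keeps the map-plus-fallback behaviour without building up a separate frame. A sketch on a small A (with B as a plain Code-to-Value Series; the quoting of the values is dropped here for readability):

```python
import pandas as pd

A = pd.DataFrame({'C1': ['a', 'd', 'a', 'b'],
                  'C2': ['b', 'a', 'c', 'e'],
                  'C3': ['c', 'e', '', 'e']})
B = pd.Series({'a': 'House', 'b': 'Bike', 'c': 'Lamp', 'd': 'Window', 'e': 'Car'})

# Map every column through B; anything absent from B (like '') falls back to itself
C = A.apply(lambda col: col.map(B).fillna(col))
print(C)
```

This is the same map/fillna idea as above, just letting pandas drive the column loop.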
Consider the following DataFrame:
arrays = [['foo', 'bar', 'bar', 'bar'],
          ['A', 'B', 'C', 'D']]
tuples = list(zip(*arrays))
columnValues = pd.MultiIndex.from_tuples(tuples)
df = pd.DataFrame(np.random.rand(4,4), columns = columnValues)
print(df)
foo bar
A B C D
0 0.859664 0.671857 0.685368 0.939156
1 0.155301 0.495899 0.733943 0.585682
2 0.124663 0.467614 0.622972 0.567858
3 0.789442 0.048050 0.630039 0.722298
Say I want to remove the first column, like so:
df.drop(df.columns[[0]], axis = 1, inplace = True)
print(df)
bar
B C D
0 0.671857 0.685368 0.939156
1 0.495899 0.733943 0.585682
2 0.467614 0.622972 0.567858
3 0.048050 0.630039 0.722298
This produces the expected result; however, the column labels foo and A are retained:
print(df.columns.levels)
[['bar', 'foo'], ['A', 'B', 'C', 'D']]
Is there a way to completely drop a column, including its labels, from a MultiIndex DataFrame?
EDIT: As suggested by John, I had a look at https://github.com/pydata/pandas/issues/12822. What I got from it is that it's not a bug, however I believe the suggested solution (https://github.com/pydata/pandas/issues/2770#issuecomment-76500001) does not work for me. Am I missing something here?
df2 = df.drop(df.columns[[0]], axis = 1)
print(df2)
bar
B C D
0 0.969674 0.068575 0.688838
1 0.650791 0.122194 0.289639
2 0.373423 0.470032 0.749777
3 0.707488 0.734461 0.252820
print(df2.columns[[0]])
MultiIndex(levels=[['bar', 'foo'], ['A', 'B', 'C', 'D']],
labels=[[0], [1]])
df2.set_index(pd.MultiIndex.from_tuples(df2.columns.values))
ValueError: Length mismatch: Expected axis has 4 elements, new values have 3 elements
New Answer
As of pandas 0.20, pd.MultiIndex has a method pd.MultiIndex.remove_unused_levels
df.columns = df.columns.remove_unused_levels()
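For the frame above, the before/after effect looks like this (a sketch with arbitrary values):

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [('foo', 'A'), ('bar', 'B'), ('bar', 'C'), ('bar', 'D')])
df = pd.DataFrame(np.random.rand(4, 4), columns=cols)

df = df.drop(df.columns[[0]], axis=1)       # drop ('foo', 'A')
print(df.columns.levels)                    # 'foo' and 'A' still linger
df.columns = df.columns.remove_unused_levels()
print(df.columns.levels)                    # [['bar'], ['B', 'C', 'D']]
```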
Old Answer
Our savior is pd.MultiIndex.to_series(); it returns a Series of tuples restricted to what is actually in the DataFrame:
df.columns = pd.MultiIndex.from_tuples(df.columns.to_series())