How to remove extra row after set_index() without losing index name? - python

I would like to change my DataFrame index column with the df.set_index() function. While this provides a functional solution, it creates an "extra" row that I would like to get rid of.
df = pd.DataFrame({'A': ['a','b','c'], 'B': ['d','e','f'], 'C': [1,2,3]})
df looks like this:
A B C
0 a d 1
1 b e 2
2 c f 3
Changing the DataFrame index:
df.set_index('C')
Result:
  A B
C
1 a d
2 b e
3 c f
How can I make the dataframe look as follows?
C A B
1 a d
2 b e
3 c f
I saw a similar question here but the solution using reset_index() did not provide the desired result. I would like to keep the values I have on column C and only remove the extra row.

If you want to have C column as index:
In [50]: r = df.set_index('C')
In [51]: r
Out[51]:
A B
C
1 a d
2 b e
3 c f
In [52]: r.index.name
Out[52]: 'C'
In [53]: r.columns.name is None
Out[53]: True
In [54]: r = r.rename_axis(None, axis=0).rename_axis('C', axis=1)
In [55]: r
Out[55]:
C A B
1 a d
2 b e
3 c f
In [56]: r.index.name is None
Out[56]: True
In [57]: r.columns.name
Out[57]: 'C'
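NOTE: this display is misleading, because C now labels the column axis while the index itself (still holding the values 1, 2, 3) is unnamed.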
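A minimal, self-contained sketch of the same idea follows. It passes axis as a keyword, which should work across pandas versions, and checks where the name C ends up. The variable name r is taken from the answer above; nothing else is assumed.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['d', 'e', 'f'], 'C': [1, 2, 3]})

# Use C as the index, then move the name "C" from the index to the column axis.
r = df.set_index('C')
r = r.rename_axis(None, axis=0).rename_axis('C', axis=1)

# The values 1, 2, 3 are still the index; only the label moved.
print(r.index.tolist())
print(r.index.name, r.columns.name)
```

Because the index is now unnamed, calling r.reset_index() afterwards would produce a column called "index" rather than "C", which is one practical reason the display is considered misleading.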

Try this with [[]]:
df[['C','A','B']]
Example:
df = pd.DataFrame({'A': ['a','b','c'], 'B': ['d','e','f'], 'C': [1,2,3]})
print(df)
A B C
0 a d 1
1 b e 2
2 c f 3
df = df[['C','A','B']]
print(df)
C A B
0 1 a d
1 2 b e
2 3 c f

If you only want the DataFrame to print the way you describe, without showing the index:
print (df[['C','A','B']].to_string(index=False))
C A B
1 a d
2 b e
3 c f

Related

Interleave a list and distribute to two columns of a dataframe by pandas?

I have a list and I want to interleave the elements in all combinations then distribute them to two columns of a dataframe in pandas, like:
df = pd.DataFrame(columns = ["pair1","pair2"])
mylist = ["a", "b", "c"]
for i in mylist:
    for j in mylist:
        df.loc[df.shape[0]] = [i, j]
to output
pair1 pair2
0 a a
1 a b
2 a c
3 b a
4 b b
5 b c
6 c a
7 c b
8 c c
However, such an assignment is slow.
Do we have a faster method?
For a pandas solution, you could use pd.MultiIndex:
df[['pair1','pair2']] = pd.MultiIndex.from_product([mylist]*2).tolist()
or you could also cross-merge (if you have pandas>=1.2.0):
df = pd.merge(pd.Series(mylist, name='pair1'), pd.Series(mylist, name='pair2'), how='cross')
Output:
pair1 pair2
0 a a
1 a b
2 a c
3 b a
4 b b
5 b c
6 c a
7 c b
8 c c
You can use itertools.product() to generate the data ahead of time, rather than repeatedly appending to the end of the dataframe:
import pandas as pd
from itertools import product
mylist = ["a", "b", "c"]
df = pd.DataFrame(product(mylist, repeat=2), columns = ["pair1","pair2"])
print(df)
This outputs:
pair1 pair2
0 a a
1 a b
2 a c
3 b a
4 b b
5 b c
6 c a
7 c b
8 c c
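As a rough sanity check, the sketch below builds the same table three ways and compares the rows. The to_frame(index=False) call is a variation on the MultiIndex answer rather than the code quoted there, the variable names are illustrative, and the cross merge assumes pandas 1.2.0 or newer.

```python
import pandas as pd
from itertools import product

mylist = ["a", "b", "c"]

# 1) itertools.product feeding the DataFrame constructor
via_product = pd.DataFrame(product(mylist, repeat=2), columns=["pair1", "pair2"])

# 2) MultiIndex.from_product converted to a frame (variation on the answer above)
via_index = pd.MultiIndex.from_product(
    [mylist, mylist], names=["pair1", "pair2"]
).to_frame(index=False)

# 3) cross merge of two named Series (needs pandas >= 1.2.0)
via_merge = pd.merge(
    pd.Series(mylist, name="pair1"), pd.Series(mylist, name="pair2"), how="cross"
)

# All three hold the same 9 rows in the same order.
rows = via_product.values.tolist()
print(rows == via_index.values.tolist() == via_merge.values.tolist())
```

All three build the full table in one step, which avoids the cost of growing the DataFrame row by row with df.loc.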
One fast method is expand_grid from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor as jn
others = {'pair1': mylist, 'pair2': mylist}
jn.expand_grid(others = others).droplevel(axis = 1, level = 1)
pair1 pair2
0 a a
1 a b
2 a c
3 b a
4 b b
5 b c
6 c a
7 c b
8 c c

Expand pandas dataframe by replacing cell value with a list

I have a pandas dataframe like this below:
A B C
a b c
d e f
where A, B and C are column names. Now I have a list:
mylist = [1,2,3]
I want to replace the c in column C with the list, so that the dataframe expands to one row per value in the list, like below:
A B C
a b 1
a b 2
a b 3
d e f
Any help would be appreciated!
I tried this:
mylist = [1,2,3]
x=pd.DataFrame({'mylist':mylist})
x['C']='c'
res= pd.merge(df,x,on=['C'],how='left')
res['mylist']=res['mylist'].fillna(res['C'])
Then, to tidy up:
del res['C']
res.rename(columns={"mylist":"C"},inplace=True)
print(res)
Output:
A B C
0 a b 1
1 a b 2
2 a b 3
3 d e f
You can use:
print (df)
A B C
0 a b c
1 d e f
2 a b c
3 t e w
mylist = [1,2,3]
idx1 = df.index[df.C == 'c']
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
df = pd.concat([df.loc[idx1.repeat(len(mylist))].assign(C=mylist * len(idx1)), df[df.C != 'c']])
print (df)
A B C
0 a b 1
0 a b 2
0 a b 3
2 a b 1
2 a b 2
2 a b 3
1 d e f
3 t e w
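A different technique, not used in the answers above, is to put the list into the matching cells and let DataFrame.explode (pandas 0.25 or newer) expand them. This is only a sketch on the question's two-row frame, and the name expanded is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'd'], 'B': ['b', 'e'], 'C': ['c', 'f']})
mylist = [1, 2, 3]

# Swap each 'c' for the whole list, leave other values alone,
# then explode so every list element gets its own row.
expanded = (
    df.assign(C=df['C'].apply(lambda v: mylist if v == 'c' else v))
      .explode('C')
      .reset_index(drop=True)
)
print(expanded)
```

Unlike the repeat-and-concatenate approach, this keeps the rows in their original order, since non-matching rows are never split off and re-attached.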

How to simply add a column level to a pandas dataframe

Let's say I have a dataframe that looks like this:
df = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
df
Out[92]:
A B
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
Assuming that this dataframe already exists, how can I simply add a level 'C' to the column index so I get this:
df
Out[92]:
A B
C C
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
I saw SO answers like "python/pandas: how to combine two dataframes into one with hierarchical column index?", but they concatenate different dataframes instead of adding a column level to an already existing dataframe.
As suggested by @StevenG himself, a better answer:
df.columns = pd.MultiIndex.from_product([df.columns, ['C']])
print(df)
# A B
# C C
# a 0 0
# b 1 1
# c 2 2
# d 3 3
# e 4 4
option 1
set_index and T
df.T.set_index(np.repeat('C', df.shape[1]), append=True).T
option 2
pd.concat, keys, and swaplevel
pd.concat([df], axis=1, keys=['C']).swaplevel(0, 1, axis=1)
A solution which adds a name to the new level and is easier on the eyes than other answers already presented:
df['newlevel'] = 'C'
df = df.set_index('newlevel', append=True).unstack('newlevel')
print(df)
# A B
# newlevel C C
# a 0 0
# b 1 1
# c 2 2
# d 3 3
# e 4 4
You could just assign the columns like:
>>> df.columns = [df.columns, ['C', 'C']]
>>> df
A B
C C
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
>>>
Or for unknown length of columns:
>>> df.columns = [df.columns.get_level_values(0), np.repeat('C', df.shape[1])]
>>> df
A B
C C
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
>>>
Another way, for columns that are already a two-level MultiIndex (here inserting 'E' as a new middle level):
df.columns = pd.MultiIndex.from_tuples(map(lambda x: (x[0], 'E', x[1]), df.columns))
A B
E E
C D
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
I like it explicit (using MultiIndex) and chain-friendly (.set_axis):
df.set_axis(pd.MultiIndex.from_product([df.columns, ['C']]), axis=1)
This is particularly convenient when merging DataFrames with different column level numbers, where Pandas (1.4.2) raises a FutureWarning (FutureWarning: merging between different levels is deprecated and will be removed ... ):
import pandas as pd
df1 = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})
df2 = pd.DataFrame(index=list('abcde'), data=range(10, 15), columns=pd.MultiIndex.from_tuples([("C", "x")]))
# df1:
A B
a 0 0
b 1 1
# df2:
C
x
a 10
b 11
# merge while giving df1 another column level:
pd.merge(df1.set_axis(pd.MultiIndex.from_product([df1.columns, ['']]), axis=1),
         df2,
         left_index=True, right_index=True)
# result:
A B C
x
a 0 0 10
b 1 1 11
Another method, but using a list comprehension of tuples as the arg to pandas.MultiIndex.from_tuples():
df.columns = pd.MultiIndex.from_tuples([(col, 'C') for col in df.columns])
df
A B
C C
a 0 0
b 1 1
c 2 2
d 3 3
e 4 4
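To see what the added level changes in practice, here is a small sketch using the set_axis variant on the question's frame. The name out is illustrative; the calls are standard pandas.

```python
import pandas as pd

df = pd.DataFrame(index=list('abcde'), data={'A': range(5), 'B': range(5)})

# Add 'C' as a second column level without modifying df itself.
out = df.set_axis(pd.MultiIndex.from_product([df.columns, ['C']]), axis=1)

# Columns are now tuples, so a single column is selected with a tuple key.
print(out.columns.tolist())
print(out[('A', 'C')].tolist())

# Taking the cross-section at level 1 drops the level again.
flat = out.xs('C', axis=1, level=1)
print(flat.columns.tolist())
```

Since set_axis returns a new object here, the original df keeps its single-level columns, which is what makes this form convenient inside a method chain.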

Append values of row with same values

I have the following data frame:
1 A a
1 A b
2 B c
1 A d
How do I combine the values from rows that share the same first two columns into a single row, like this:
1 A a,b,d
2 B c
You can use groupby and apply the join function:
df.columns = ['a','b','c']
print (df)
a b c
0 1 A a
1 1 A b
2 2 B c
3 1 A d
print (df.groupby(['a', 'b'])['c'].apply(', '.join).reset_index())
a b c
0 1 A a, b, d
1 2 B c
Or, if the first column is the index:
df.columns = ['a','b']
print (df)
a b
1 A a
1 A b
2 B c
1 A d
df1 = df.b.groupby([df.index, df.a]).apply(', '.join).reset_index(name='c')
df1.columns = ['a','b','c']
print (df1)
a b c
0 1 A a, b, d
1 2 B c
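For reference, a self-contained sketch of the first variant is shown below. It builds the frame explicitly (the column names a, b, c follow the answer above) and uses agg instead of apply, which is a stylistic assumption rather than something the answer requires.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 1],
    'b': ['A', 'A', 'B', 'A'],
    'c': ['a', 'b', 'c', 'd'],
})

# Group on the two key columns and join the strings of each group.
out = df.groupby(['a', 'b'])['c'].agg(', '.join).reset_index()
print(out)
```

Note that str.join only works on strings, so a numeric column would need .astype(str) first.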

How do I include the zero-cross terms in pandas groupby?

How do I output a pandas groupby result, including zero cross-terms, to a csv file?
A toy example of exactly what I'm looking for:
I have a pandas dataframe that can be approximated as:
df = pd.DataFrame(np.random.choice(['A', 'B', 'C'], (10, 2)),
                  columns=['one', 'two'])
Which gave me the following:
one two
0 C C
1 C A
2 A B
3 B A
4 B C
5 B B
6 C C
7 A C
8 C B
9 C C
When I run groupby it works as expected:
grouped = df.groupby(['one', 'two']).size()
grouped
one two
A B 1
C 1
B A 1
B 1
C 1
C A 1
B 1
C 3
dtype: int64
However, I would like for the "A A 0" term to be included because I write this to a csv file:
grouped.to_csv("test1.csv", header=True)
!cat test1.csv
one,two,0
A,B,1
A,C,1
B,A,1
B,B,1
B,C,1
C,A,1
C,B,1
C,C,3
And I want the file to include the line: A,A,0.
You can do this with unstack:
grouped.unstack('two').fillna(0).stack()
which gives, for example, the following output (from a different random draw than the one in the question):
one two
A A 2
B 1
C 1
B A 0
B 1
C 3
C A 2
B 0
C 0
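The unstack approach can only restore combinations whose values each appear somewhere in the data. A variation that does not depend on that, sketched here under the assumption that the full set of categories is known up front, is to reindex against an explicit product of the levels. The frame below reproduces the ten rows printed in the question; the names levels, full and counts are illustrative.

```python
import pandas as pd

# The ten rows shown in the question, written out instead of drawn at random.
df = pd.DataFrame({'one': list('CCABBBCACC'), 'two': list('CABACBCCBC')})
grouped = df.groupby(['one', 'two']).size()

# Every (one, two) combination, whether or not it occurs in the data.
levels = ['A', 'B', 'C']
full = pd.MultiIndex.from_product([levels, levels], names=['one', 'two'])

# Missing combinations are filled with 0 and the counts stay integers.
counts = grouped.reindex(full, fill_value=0)

# to_csv() without a path returns the CSV text; pass a filename to write a file.
print(counts.rename('count').reset_index().to_csv(index=False))
```

With this data the row for A,A is present with a count of 0, and the counts still sum to the ten original rows.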
