How to replace certain elements in a pandas DataFrame index (Python)

I have a dictionary and want to use .replace to replace only the index labels that appear as keys in the dictionary with the corresponding values.
dicts = {"certain index element1": "changed element1",
"certain index element2": "changed element2",
"certain index element3": "changed element3",
}
This does not work:
df.replace(dicts,regex=False,inplace=True)
The df is huge, so I cannot rebuild the whole index from scratch. I only need to change certain labels; everything else should stay the same.
Replacing elements inside the df (not the index) this way works, but it has no effect on the index.

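Use rename(index=dicts). It relabels only the index entries that appear as keys in the dictionary and leaves all other labels unchanged.
Example: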
import numpy as np
import pandas as pd
df = pd.DataFrame(
    np.random.choice(list('abcd'), (10, 10)),
    index=list('ABCDEFGHIJ')
)
dc = {'A': '_A_', 'B': '_B_'}
df.rename(index=dc, inplace=True)
print(df)
0 1 2 3 4 5 6 7 8 9
_A_ a a a a b a d c c b
_B_ b c c a a a b b a b
C b b d b d c c a a b
D d d d d c b b a a d
E d c c c a d a d d a
F d c d c d d d d b d
G b c d c c b c a a b
H c c c b b a a b c a
I c a a b a d c c a a
J a a a c a b d c c c
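To make the difference between replace and rename concrete, here is a minimal sketch on a small hypothetical frame (the data and the names mapping, renamed, mapped and replaced are illustrative, not from the question). It assumes string index labels and shows that replace changes cell values only, while rename(index=...) and an Index.map with a fallback both change only the matching labels.

```python
import pandas as pd

# Hypothetical data: string index labels, one cell value that matches a key.
df = pd.DataFrame({"value": ["A", "x", "y", "z"]}, index=["A", "B", "C", "D"])
mapping = {"A": "_A_", "B": "_B_"}

# rename(index=...) touches only labels that appear as keys in the mapping.
renamed = df.rename(index=mapping)

# Same result via Index.map, falling back to the original label when absent.
mapped = df.copy()
mapped.index = df.index.map(lambda label: mapping.get(label, label))

# DataFrame.replace works on cell values, so the index is left untouched.
replaced = df.replace(mapping)

print(renamed.index.tolist())
print(replaced.index.tolist())
print(replaced["value"].tolist())
```

Neither rename nor the map variant needs the full index spelled out by hand, which matters for a large frame.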

Related

Reordering DataFrame in Pandas

I have a DataFrame that looks like this:
A B C D E
1 a a a a a
2 b b b b b
3 c c c c c
4 d d d d d
5 e e e e e
6 f f f f f
Does anyone know how to reorder it using pandas to make it look like this:
A B C D E F G H I J
1 a a a a a b b b b b
3 c c c c c d d d d d
5 e e e e e f f f f f
I tried reading the documentation at https://pandas.pydata.org/docs/user_guide/reshaping.html. It's difficult to understand as a beginner, so I'd appreciate any help.
Assuming that you want to merge every two consecutive rows into a single row, use the underlying numpy array and reshape it:
import pandas as pd
from string import ascii_uppercase
out = (pd.DataFrame(df.to_numpy().reshape(len(df)//2, -1),
index=df.index[::2])
.rename(columns=dict(enumerate(ascii_uppercase)))
)
Output:
A B C D E F G H I J
1 a a a a a b b b b b
3 c c c c c d d d d d
5 e e e e e f f f f f
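The reshape step can be checked end to end with a self-contained sketch. The frame below is rebuilt from the question's table; the index 1..6 and the variable names values and out are assumptions for illustration. The approach also assumes an even number of rows, since each output row is built from exactly two input rows.

```python
import pandas as pd
from string import ascii_uppercase

# Rebuild the 6x5 frame from the question (index 1..6, columns A..E).
df = pd.DataFrame([[ch] * 5 for ch in "abcdef"],
                  index=range(1, 7), columns=list("ABCDE"))

# Row-major reshape: each pair of consecutive rows becomes one row of 10 values.
values = df.to_numpy().reshape(len(df) // 2, -1)

# Keep every second original label and name the columns A..J.
out = pd.DataFrame(values, index=df.index[::2],
                   columns=list(ascii_uppercase[:values.shape[1]]))
print(out)
```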

Pandas Shift Row Value to Match Column Name

I have a sample dataset with a fixed list of column names. After shifting data around, the letters in each row no longer line up with the column names, as seen below.
I am trying to shift the values in each row so that each letter sits under the column of the same name. I have tried shift() but have not had much success. The result I am after is shown under "After" below. Any thoughts?
import pandas as pd
# every column needs five entries; a single space marks an empty cell
df = pd.DataFrame({'A': list('AAAAA'),
                   'B': list('CBBDE'),
                   'C': list('DDCEG'),
                   'D': list('EEDF '),
                   'E': list('FFE  '),
                   'F': list('GGF  '),
                   'G': list('  G  ')})
   A  B  C  D  E  F  G
0  A  C  D  E  F  G
1  A  B  D  E  F  G
2  A  B  C  D  E  F  G
3  A  D  E  F
4  A  E  G
After:
   A  B  C  D  E  F  G
0  A     C  D  E  F  G
1  A  B     D  E  F  G
2  A  B  C  D  E  F  G
3  A        D  E  F
4  A           E     G
Here's a broadcasted comparison approach. This will be quite fast, but does have a higher memory complexity.
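import numpy as np
a = df.to_numpy()
b = df.columns.to_numpy()
# any(axis=1) flags, per row, which column labels occur anywhere in that row
pd.DataFrame(np.equal.outer(a, b).any(axis=1) * b, columns=b)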
   A  B  C  D  E  F  G
0  A     C  D  E  F  G
1  A  B     D  E  F  G
2  A  B  C  D  E  F  G
3  A        D  E  F
4  A           E     G
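The broadcasting behind this approach may be easier to follow when written out step by step. The sketch below is one possible spelling, not the answerer's exact code: it uses explicit axis insertion and np.where instead of multiplying a boolean array by strings, and it assumes that empty cells hold a single space. The names eq, present and aligned are illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the question; a single space marks an empty cell (assumption).
df = pd.DataFrame({'A': list('AAAAA'),
                   'B': list('CBBDE'),
                   'C': list('DDCEG'),
                   'D': list('EEDF '),
                   'E': list('FFE  '),
                   'F': list('GGF  '),
                   'G': list('  G  ')})

a = df.to_numpy()          # shape (5, 7): cell values
b = df.columns.to_numpy()  # shape (7,): column labels

# eq[i, j, k] is True when the cell in row i, position j equals label k.
eq = a[:, :, None] == b[None, None, :]

# present[i, k] is True when row i contains label k anywhere.
present = eq.any(axis=1)

# Put each label in its own column, or an empty string where it is absent.
aligned = pd.DataFrame(np.where(present, b, ''), columns=b, index=df.index)
print(aligned)
```

The intermediate eq array has rows x columns x columns entries, which is where the higher memory cost mentioned above comes from.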
This is more of a pivot problem:
s=df.mask(df=='').stack().reset_index()
s.pivot(index='level_0',columns=0,values=0)
Out[34]:
0 A B C D
level_0
0 A B C D
1 A NaN C NaN
2 A NaN C D
Here's one way with stack, merge, pivot:
new_df = df.stack().reset_index()
(new_df.merge(new_df, left_on=['level_0', 'level_1'],
right_on=['level_0',0],
how='left')
.pivot(index='level_0', columns='level_1', values='0_y')
)
Output:
level_1 A B C D E F G
level_0
0 A NaN C D E F G
1 A B NaN D E F G
2 A B C D E F G
3 A NaN NaN D E F NaN
4 A NaN NaN NaN E NaN G

Expand pandas dataframe by replacing cell value with a list

I have a pandas dataframe like this below:
A B C
a b c
d e f
where A, B and C are column names. Now I have a list:
mylist = [1,2,3]
I want to replace the c in column C with the list, so that the row is repeated once for each value in the list, like below:
A B C
a b 1
a b 2
a b 3
d e f
Any help would be appreciated!
I tried this:
mylist = [1,2,3]
x=pd.DataFrame({'mylist':mylist})
x['C']='c'
res= pd.merge(df,x,on=['C'],how='left')
res['mylist']=res['mylist'].fillna(res['C'])
Then, to tidy up:
del res['C']
res.rename(columns={"mylist":"C"},inplace=True)
print(res)
Output:
A B C
0 a b 1
1 a b 2
2 a b 3
3 d e f
You can use:
print (df)
A B C
0 a b c
1 d e f
2 a b c
3 t e w
mylist = [1,2,3]
idx1 = df.index[df.C == 'c']
# DataFrame.append was removed in pandas 2.0, so combine the pieces with pd.concat
df = pd.concat([df.loc[idx1.repeat(len(mylist))].assign(C=mylist * len(idx1)), df[df.C != 'c']])
print (df)
A B C
0 a b 1
0 a b 2
0 a b 3
2 a b 1
2 a b 2
2 a b 3
1 d e f
3 t e w
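Another way to get the same expansion, sketched here as an alternative rather than as either answerer's method, is to put the whole list into the matching cells and call DataFrame.explode. This assumes a pandas version where explode accepts ignore_index (1.1 or later); the name expanded is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'd'], 'B': ['b', 'e'], 'C': ['c', 'f']})
mylist = [1, 2, 3]

# Swap each matching cell for the whole list, leaving other cells as they are.
expanded = df.copy()
expanded['C'] = [mylist if value == 'c' else value for value in df['C']]

# explode() emits one row per list element; scalar cells pass through unchanged.
expanded = expanded.explode('C', ignore_index=True)
print(expanded)
```

Unlike the repeat-and-concatenate approach, this keeps the rows in their original order instead of moving the unmatched rows to the end.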

How to remove extra row after set_index() without losing index name?

I would like to change my DataFrame index column with the df.set_index() function. While this provides a functional solution, it creates an "extra" row that I would like to get rid of.
df = pd.DataFrame({'A': ['a','b','c'], 'B': ['d','e','f'], 'C': [1,2,3]})
df looks like this:
A B C
0 a d 1
1 b e 2
2 c f 3
Changing the DataFrame index:
df.set_index('C')
Result:
A B
C
1 a d
2 b e
3 c f
How can I make the dataframe look as follows?
C A B
1 a d
2 b e
3 c f
I saw a similar question here but the solution using reset_index() did not provide the desired result. I would like to keep the values I have on column C and only remove the extra row.
If you want to have C column as index:
In [50]: r = df.set_index('C')
In [51]: r
Out[51]:
A B
C
1 a d
2 b e
3 c f
In [52]: r.index.name
Out[52]: 'C'
In [53]: r.columns.name is None
Out[53]: True
In [54]: r = r.rename_axis(None, axis=0).rename_axis('C', axis=1)
In [57]: r
Out[57]:
C A B
1 a d
2 b e
3 c f
In [55]: r.index.name is None
Out[55]: True
In [56]: r.columns.name
Out[56]: 'C'
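NOTE: this display is rather misleading, because 'C' is now the name of the columns axis and the index itself has no name.
Alternatively, keep C as a regular column and reorder the columns with [[...]]: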
df[['C','A','B']]
Example:
df = pd.DataFrame({'A': ['a','b','c'], 'B': ['d','e','f'], 'C': [1,2,3]})
print(df)
A B C
0 a d 1
1 b e 2
2 c f 3
df = df[['C','A','B']]
print(df)
C A B
0 1 a d
1 2 b e
2 3 c f
If you only want the DataFrame to be displayed (printed) that way, without changing it:
print (df[['C','A','B']].to_string(index=False))
C A B
1 a d
2 b e
3 c f
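It may help to see that the "extra" row is only the printed index name, not a row of data. The following sketch checks this on the question's frame; the names indexed and relabeled are illustrative, and it assumes a pandas version where rename_axis takes axis as a keyword.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['d', 'e', 'f'], 'C': [1, 2, 3]})
indexed = df.set_index('C')

# The "extra" row is just where pandas prints the index name.
print(indexed.index.name)
print(len(indexed))

# Moving the name from the row axis to the column axis changes only the display.
relabeled = indexed.rename_axis(None, axis="index").rename_axis("C", axis="columns")
print(relabeled)

# Lookups by the former 'C' values still work as before.
print(relabeled.loc[2, 'A'])
```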

Selecting columns from a pandas dataframe based on columns conditions

I want to select, into a new dataframe, the columns that have 'C' among their values:
protein 1 2 3 4 5
prot1 C M D F A
prot2 C D A M A
prot3 C C D F A
prot4 S D F C L
prot5 S D A I L
So I want to end up with this:
protein 1 2 4
prot1 C M F
prot2 C D M
prot3 C C F
prot4 S D C
prot5 S D I
The number of columns can be n. I only found examples where the column names must be specified, which I can't do here. The script should check column by column.
In [22]: df[['protein']].join(df[df.columns[df.eq('C').any()]])
Out[22]:
protein 1 2 4
0 prot1 C M F
1 prot2 C D M
2 prot3 C C F
3 prot4 S D C
4 prot5 S D I
Use:
import numpy as np
import pandas as pd
np.random.seed(123)
n = np.random.choice(['C','M','D', '-'], size=(3,10))
n[:,0] = ['a','b','w']
foo = pd.DataFrame(n)
print (foo)
0 1 2 3 4 5 6 7 8 9
0 a M D D C D D M - D
1 b M D M C M D - M C
2 w C - M - D M C C C
mask = foo.eq('C').any()
# always keep the first column (the row labels) in the output
mask.loc[0] = True
# filter the columns
print (foo.loc[:,mask])
0 1 4 7 8 9
0 a M C M - D
1 b M C - M C
2 w C - C C C
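Applied to the data from the question, the same boolean-mask idea can be sketched as follows. The frame is rebuilt by hand, and treating the column labels as the strings '1' to '5' is an assumption; the names has_c and selected are illustrative.

```python
import pandas as pd

# Question data, with column labels assumed to be strings.
df = pd.DataFrame({'protein': ['prot1', 'prot2', 'prot3', 'prot4', 'prot5'],
                   '1': list('CCCSS'),
                   '2': list('MDCDD'),
                   '3': list('DADFA'),
                   '4': list('FMFCI'),
                   '5': list('AAALL')})

# True for every column holding at least one 'C'.
has_c = df.eq('C').any()

# Always keep the identifier column, whatever it contains.
has_c['protein'] = True

selected = df.loc[:, has_c]
print(selected.columns.tolist())
```

No column names other than the identifier are hard-coded, so the selection works for any number of columns.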
