How to replace part of dataframe in pandas - python

I have a sample dataframe like this:
df1=
A B C
a 1 2
b 3 4
b 5 6
c 7 8
d 9 10
I would like to replace part of this dataframe (the rows where column A is b or c) with this dataframe:
df2=
A B C
b 9 10
b 11 12
c 13 14
I would like to get the result below:
df3=
A B C
a 1 2
b 9 10
b 11 12
c 13 14
d 9 10
I tried:
df1[df1.A.isin("bc")]...
But I couldn't figure out how to do the replacement.
Could someone tell me how to replace part of a dataframe?

As I explained, try update:
import pandas as pd
df1 = pd.DataFrame({"A":['a','b','b','c'], "B":[1,2,4,6], "C":[3,2,1,0]})
df2 = pd.DataFrame({"A":['b','b','c'], "B":[100,400,300], "C":[39,29,100]})
# give df2 the index labels of the matching rows in df1, so update aligns row by row
df2 = df2.set_index(df1.loc[df1.A.isin(df2.A),:].index)
df1.update(df2)
df1
Out[75]:
A B C
0 a 1.0 3.0
1 b 100.0 39.0
2 b 400.0 29.0
3 c 300.0 100.0

You need combine_first or update keyed on column A, but because A contains duplicates, number them first with cumcount:
df1['g'] = df1.groupby('A').cumcount()
df2['g'] = df2.groupby('A').cumcount()
df1 = df1.set_index(['A','g'])
df2 = df2.set_index(['A','g'])
df3 = df2.combine_first(df1).reset_index(level=1, drop=True).astype(int).reset_index()
print (df3)
A B C
0 a 1 2
1 b 9 10
2 b 11 12
3 c 13 14
4 d 9 10
Another solution:
df1['g'] = df1.groupby('A').cumcount()
df2['g'] = df2.groupby('A').cumcount()
df1 = df1.set_index(['A','g'])
df2 = df2.set_index(['A','g'])
df1.update(df2)
df1 = df1.reset_index(level=1, drop=True).astype(int).reset_index()
print (df1)
A B C
0 a 1 2
1 b 9 10
2 b 11 12
3 c 13 14
4 d 9 10
If the rows of df1 whose A value appears in df2 match df2 in number and order:
df2.index = df1.index[df1.A.isin(df2.A)]
df3 = df2.combine_first(df1)
print (df3)
A B C
0 a 1.0 2.0
1 b 9.0 10.0
2 b 11.0 12.0
3 c 13.0 14.0
4 d 9.0 10.0
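The cumcount approach above can be collected into one self-contained sketch. The helper name replace_by_key and the temporary column _n are made up for illustration; the sample data is taken from the question. It assumes that rows with the same A value should be matched in order of appearance.

```python
import pandas as pd

df1 = pd.DataFrame({"A": list("abbcd"), "B": [1, 3, 5, 7, 9], "C": [2, 4, 6, 8, 10]})
df2 = pd.DataFrame({"A": list("bbc"), "B": [9, 11, 13], "C": [10, 12, 14]})


def replace_by_key(base, patch, key="A"):
    """Replace rows of `base` with rows of `patch`, matching on `key` and on
    the order of occurrence within each key value (hypothetical helper)."""
    # Number repeated keys (b -> 0, 1) so that (key, counter) is unique.
    left = base.assign(_n=base.groupby(key).cumcount()).set_index([key, "_n"])
    right = patch.assign(_n=patch.groupby(key).cumcount()).set_index([key, "_n"])
    # Values from `right` win; gaps are filled from `left`.
    merged = right.combine_first(left)
    # combine_first may upcast to float, so restore the original dtypes.
    merged = merged.astype(left.dtypes.to_dict())
    # Drop the helper counter and turn the key back into a column.
    return merged.reset_index(level="_n", drop=True).reset_index()


df3 = replace_by_key(df1, df2)
print(df3)
```

Because assign returns a new frame, df1 and df2 are left unchanged. If df2 holds fewer rows for a key than df1 does, the unmatched df1 rows are kept as they are.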

You could solve your problem with the following:
import pandas as pd
df1 = pd.DataFrame({'A':['a','b','b','c','d'],'B':[1,3,5,7,9],'C':[2,4,6,8,10]})
df2 = pd.DataFrame({'A':['b','b','c'],'B':[9,11,13],'C':[10,12,14]})
# align df2 to the index labels of the matching rows in df1
df2 = df2.set_index(df1.loc[df1.A.isin(df2.A),:].index)
df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]
df1
Out[108]:
A B C
0 a 1 2
1 b 9 10
2 b 11 12
3 c 13 14
4 d 9 10

Related

Drop specific column and indexes in pandas DataFrame

DataFrame:
A B C
0 1 6 11
1 2 7 12
2 3 8 13
3 4 9 14
4 5 10 15
Is it possible to drop the values at index 2 to 4 in column B, or to replace them with NaN?
In this case, the values [8, 9, 10] should be removed.
I tried this: df.drop(columns=['B'], index=[8, 9, 10]), but then the whole column B is removed.
Dropping individual values does not make sense in a DataFrame, because every column must have the same length. You can set the values to NaN instead, using .loc / .iloc to select by index and column:
>>> df
A B C
a 1 6 11
b 2 7 12
c 3 8 13
d 4 9 14
e 5 10 15
import numpy as np
# By name:
df.loc['c':'e', 'B'] = np.nan
# By number (column B is at position 1):
df.iloc[2:5, 1] = np.nan
See "Indexing and selecting data" in the pandas documentation for details.
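As a minimal sketch of the same idea on the question's own frame (default integer index rather than the a..e labels used above), the two selections can be compared side by side. The names by_label and by_position are illustrative only, and converting B to float first is a choice made here so the NaN assignment does not depend on implicit upcasting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 10], "C": [11, 12, 13, 14, 15]}
)

# NaN is a float, so convert B up front instead of relying on implicit upcasting.
df["B"] = df["B"].astype(float)

# Label-based: .loc slices include both endpoints, so 2:4 covers rows 2, 3 and 4.
by_label = df.copy()
by_label.loc[2:4, "B"] = np.nan

# Position-based: .iloc slices exclude the end, so 2:5 covers the same rows.
by_position = df.copy()
by_position.iloc[2:5, by_position.columns.get_loc("B")] = np.nan

print(by_label)
```

Looking the column position up with get_loc avoids hard-coding the wrong column number.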
import pandas as pd
data = [
['A','B','C'],
[1,6,11],
[2,7,12],
[3,8,13],
[4,9,14],
[5,10,15]
]
df = pd.DataFrame(data=data[1:], columns=data[0])
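# note: shift(3) moves the values down three rows, so rows 0-2 become NaN
# and the values 8, 9, 10 fall off the end
df['B'] = df['B'].shift(3)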
>>>
A B C
0 1 NaN 11
1 2 NaN 12
2 3 NaN 13
3 4 6.0 14
4 5 7.0 15

Division in pandas dataframe

I am trying to divide my data frame by one of its columns.
Here is my data frame:
A B C
1 10 10
2 20 30
3 15 33
Now I want to divide columns "B" and "C" by column "A". My desired output looks like this:
A B C
1 10 10
2 10 15
3 5 11
I tried:
df/df['a']
Use DataFrame.div:
df[['B','C']] = df[['B','C']].div(df['A'], axis=0)
print (df)
A B C
0 1 10.0 10.0
1 2 10.0 15.0
2 3 5.0 11.0
If you need to divide all columns except A:
cols = df.columns.difference(['A'])
df[cols] = df[cols].div(df['A'], axis=0)
Try this:
import pandas as pd
d = {
'A': [1,2,3],
'B': [10,20,15],
'C': [10,30,33]
}
df = pd.DataFrame(d)
df['B'] = df['B']/df['A']
df['C'] = df['C']/df['A']
print(df)
Output:
A B C
0 1 10.0 10.0
1 2 10.0 15.0
2 3 5.0 11.0
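A short sketch of why the plain `/` operator does not give the desired result here, while div with axis=0 does. The variable names wrong and right are illustrative; the data is the frame from the question.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 15], "C": [10, 30, 33]})

# df / df["A"] aligns the Series index (0, 1, 2) with the column labels
# (A, B, C). No labels match, so every cell of the result is NaN.
wrong = df / df["A"]

# axis=0 aligns on the row index instead, so each row is divided by its own A.
right = df[["B", "C"]].div(df["A"], axis=0)
print(right)
```

Division always produces floats, which is why the answers above show 10.0 rather than 10.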

Insert row in pandas Dataframe based on Date Column

I have a DataFrame df and a list li. My DataFrame contains:
Student Score Date
A 10 15-03-19
C 11 16-03-19
A 12 16-03-19
B 10 16-03-19
A 9 17-03-19
My list contains the names of all students: li=[A,B,C]
If a student did not attend on a particular day, insert a row for that student and date with a score of 0.
My final DataFrame should look like this:
Student Score Date
A 10 15-03-19
B 0 15-03-19
C 0 15-03-19
C 11 16-03-19
A 12 16-03-19
B 10 16-03-19
A 9 17-03-19
B 0 17-03-19
C 0 17-03-19
Use DataFrame.reindex with MultiIndex.from_product:
li = list('ABC')
mux = pd.MultiIndex.from_product([df['Date'].unique(), li], names=['Date', 'Student'])
df = df.set_index(['Date', 'Student']).reindex(mux, fill_value=0).reset_index()
print (df)
Date Student Score
0 15-03-19 A 10
1 15-03-19 B 0
2 15-03-19 C 0
3 16-03-19 A 12
4 16-03-19 B 10
5 16-03-19 C 11
6 17-03-19 A 9
7 17-03-19 B 0
8 17-03-19 C 0
An alternative is a left join with DataFrame.merge against a helper DataFrame built with product; finally, replace the missing values with fillna:
from itertools import product
df1 = pd.DataFrame(list(product(df['Date'].unique(), li)), columns=['Date', 'Student'])
df = df1.merge(df, how='left').fillna(0)
print (df)
Date Student Score
0 15-03-19 A 10.0
1 15-03-19 B 0.0
2 15-03-19 C 0.0
3 16-03-19 A 12.0
4 16-03-19 B 10.0
5 16-03-19 C 11.0
6 17-03-19 A 9.0
7 17-03-19 B 0.0
8 17-03-19 C 0.0
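For reference, here is the reindex variant as a self-contained sketch with the question's data typed in by hand. The name full is illustrative. It assumes each (Date, Student) pair occurs at most once, since reindex raises on a duplicated index.

```python
import pandas as pd

df = pd.DataFrame({
    "Student": ["A", "C", "A", "B", "A"],
    "Score": [10, 11, 12, 10, 9],
    "Date": ["15-03-19", "16-03-19", "16-03-19", "16-03-19", "17-03-19"],
})
li = ["A", "B", "C"]

# Every (Date, Student) combination that should exist in the result.
mux = pd.MultiIndex.from_product(
    [df["Date"].unique(), li], names=["Date", "Student"]
)

# Reindexing onto the full grid creates the missing rows; fill_value=0
# gives them a score of 0 and keeps the Score column as integers.
full = df.set_index(["Date", "Student"]).reindex(mux, fill_value=0).reset_index()

# Restore the column order used in the question.
full = full[["Student", "Score", "Date"]]
print(full)
```

Unlike the merge variant, this keeps Score as an integer column because no NaN is ever introduced.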

Trying to join 2 Dataframes, and store certain data as an array in one cell

I have the following 3 data frames:
First dataframe:
DF1:
iID data1 data2
10 blue green
11 red teal
Second dataframe:
DF2:
iID rH repH
10 50 60
10 60 70
11 70 50
(DF2 can have either 1 or 2 rows per iID.)
I want my output DF to have an array in one cell for each of rH and repH.
So the output would be something like:
OUTPUT DF:
iID data1 data2 rH repH
10 blue green [50,60] [60,70]
11 red teal [70] [50]
IIUC
df1.merge(df2.groupby('iID').agg(lambda x : x.tolist()).reset_index())
Out[144]:
iID data1 data2 rH repH
0 10 blue green [50, 60] [60, 70]
1 11 red teal [70] [50]
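The one-liner above can be unpacked into two steps. This is a sketch with the question's data typed in by hand; the names lists and out are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"iID": [10, 11], "data1": ["blue", "red"], "data2": ["green", "teal"]}
)
df2 = pd.DataFrame({"iID": [10, 10, 11], "rH": [50, 60, 70], "repH": [60, 70, 50]})

# Collapse df2 to one row per iID, gathering each column's values into a list.
lists = df2.groupby("iID").agg(list).reset_index()

# Join on the shared key; how="left" keeps every row of df1.
out = df1.merge(lists, on="iID", how="left")
print(out)
```

With how="left", an iID that is present in df1 but absent from df2 would get NaN in rH and repH rather than an empty list.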
It is also worth mentioning the following options.
join, which is a left join by default:
df1.join(df2)
Or concat, which is an outer join by default:
pd.concat([df1, df2], axis=1)
To illustrate join with an example:
>>> df1 = pd.DataFrame({'a':range(6),
... 'b':[5,3,6,9,2,4]}, index=list('abcdef'))
>>> df2 = pd.DataFrame({'c':range(4),
... 'd':[10,20,30, 40]}, index=list('abhi'))
>>>
>>>
>>> df1
a b
a 0 5
b 1 3
c 2 6
d 3 9
e 4 2
f 5 4
>>> df2
c d
a 0 10
b 1 20
h 2 30
i 3 40
>>> df4 = df1.join(df2)
>>> df4
a b c d
a 0 5 0.0 10.0
b 1 3 1.0 20.0
c 2 6 NaN NaN
d 3 9 NaN NaN
e 4 2 NaN NaN
f 5 4 NaN NaN

Adding rows in dataframe based on values of another dataframe

I have the following two dataframes. Please note that 'amt' is grouped by 'id' in both dataframes.
df1
id code amt
0 A 1 5
1 A 2 5
2 B 3 10
3 C 4 6
4 D 5 8
5 E 6 11
df2
id code amt
0 B 1 9
1 C 12 10
I want to add a row to df2 for every id of df1 that is not contained in df2. For example, since ids A, D and E are not in df2, I want to add a row for each of them. Each appended row should contain the id, a null value for code, and the amt value stored in df1.
The result should be something like this:
id code name
0 B 1 9
1 C 12 10
2 A nan 5
3 D nan 8
4 E nan 11
I would highly appreciate some guidance on this.
By using pd.concat:
df=df1.drop(columns='code').drop_duplicates()
# rows of df whose id is missing from df2
df[~df.id.isin(df2.id)]
pd.concat([df2,df[~df.id.isin(df2.id)]],axis=0).rename(columns={'amt':'name'}).reset_index(drop=True)
Out[481]:
name code id
0 9 1.0 B
1 10 12.0 C
2 5 NaN A
3 8 NaN D
4 11 NaN E
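Drop duplicates from df1, add df2, drop every id that now appears more than once, then add what remains to df2.
# pd.concat is used because DataFrame.append was removed in pandas 2.0
only_in_df1 = (
pd.concat([df1.drop_duplicates('id'), df2])
.drop_duplicates('id', keep=False).assign(code=np.nan)
)
pd.concat([df2, only_in_df1], ignore_index=True)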
id code amt
0 B 1.0 9
1 C 12.0 10
2 A NaN 5
3 D NaN 8
4 E NaN 11
Slight variation:
m = ~np.isin(df1.id.values, df2.id.values)
d = ~df1.duplicated('id').values
pd.concat([df2, df1[m & d].assign(code=np.nan)], ignore_index=True)
id code amt
0 B 1.0 9
1 C 12.0 10
2 A NaN 5
3 D NaN 8
4 E NaN 11
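A self-contained sketch of the same result using isin and pd.concat, with the question's data typed in by hand. The names missing and result are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": ["A", "A", "B", "C", "D", "E"],
    "code": [1, 2, 3, 4, 5, 6],
    "amt": [5, 5, 10, 6, 8, 11],
})
df2 = pd.DataFrame({"id": ["B", "C"], "code": [1, 12], "amt": [9, 10]})

# One row per id from df1, keeping only the ids that df2 lacks.
first_per_id = df1.drop_duplicates("id")
missing = first_per_id[~first_per_id["id"].isin(df2["id"])]

# Blank out code for the new rows, then stack them under df2.
result = pd.concat([df2, missing.assign(code=np.nan)], ignore_index=True)
print(result)
```

The code column becomes float because it now holds NaN. This relies on amt being constant within each id, as the question states, so that taking the first row per id loses nothing.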
