Adding rows in dataframe based on values of another dataframe - python

I have the following two dataframes. Please note that 'amt' is grouped by 'id' in both dataframes.
df1
id code amt
0 A 1 5
1 A 2 5
2 B 3 10
3 C 4 6
4 D 5 8
5 E 6 11
df2
id code amt
0 B 1 9
1 C 12 10
I want to add a row in df2 for every id of df1 that is not contained in df2. For example, since ids A, D and E are not contained in df2, I want to add a row for each of these ids. The appended row should contain the id missing from df2, a null value for the code attribute, and the value stored in df1 for the amt attribute.
The result should be something like this:
id code name
0 B 1 9
1 C 12 10
2 A nan 5
3 D nan 8
4 E nan 11
I would appreciate any guidance on this.

By using pd.concat:
df = df1.drop(columns='code').drop_duplicates()   # one row per id, keeping amt
df[~df.id.isin(df2.id)]                           # rows of df1 whose id is not in df2
pd.concat([df2, df[~df.id.isin(df2.id)]], axis=0).rename(columns={'amt': 'name'}).reset_index(drop=True)
Out[481]:
name code id
0 9 1.0 B
1 10 12.0 C
2 5 NaN A
3 8 NaN D
4 11 NaN E

Drop duplicates from df1, then append df2, then drop more duplicates, then append again.
df2.append(
    df1.drop_duplicates('id').append(df2)
       .drop_duplicates('id', keep=False).assign(code=np.nan),
    ignore_index=True
)
id code amt
0 B 1.0 9
1 C 12.0 10
2 A NaN 5
3 D NaN 8
4 E NaN 11
Slight variation
m = ~np.in1d(df1.id.values, df2.id.values)
d = ~df1.duplicated('id').values
df2.append(df1[m & d].assign(code=np.nan), ignore_index=True)
id code amt
0 B 1.0 9
1 C 12.0 10
2 A NaN 5
3 D NaN 8
4 E NaN 11
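Note: DataFrame.append was removed in pandas 2.0, so the append-based answers need pd.concat on current versions. A minimal sketch of the same idea, assuming the data from the question:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': list('AABCDE'),
                    'code': [1, 2, 3, 4, 5, 6],
                    'amt': [5, 5, 10, 6, 8, 11]})
df2 = pd.DataFrame({'id': ['B', 'C'], 'code': [1, 12], 'amt': [9, 10]})

# one row per id from df1, keep only ids missing from df2, blank out code
missing = df1.drop_duplicates('id')
missing = missing[~missing['id'].isin(df2['id'])].assign(code=np.nan)

# pd.concat replaces the removed DataFrame.append
out = pd.concat([df2, missing], ignore_index=True)
print(out)
This reproduces the id/code/amt result shown above.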

Related

How to add pandas data frame column based on other rows values

I am trying to add a new column and set its value based on other rows' values. Let's say we have the following data frame:
df = pd.DataFrame({
    'B': [0, 1, 2, 3, 4, 5],
    'C': [1, 2, 3, 4, 5, 6],
    'D': [7, 8, 9, 4, 2, 3],
})
With this corresponding output:
B C D
0 0 1 7
1 1 2 8
2 2 3 9
3 3 4 4
4 4 5 2
5 5 6 3
I want to add a new column 'E' whose value in each row is the C value of the row whose B equals this row's B + 2.
For example, the first value of E should be 3 (we select the row where B = 0 + 2 = 2, and take the C value from that row).
I tried the following:
f['E'] = np.where(f.B == (f['B']+2))['C']
but it's not working.
You can set B as the index and use that to map the shifted B values:
df['E'] = df['B'].add(2).map(df.set_index('B')['C'])
Output:
B C D E
0 0 1 7 3.0
1 1 2 8 4.0
2 2 3 9 5.0
3 3 4 4 6.0
4 4 5 2 NaN
5 5 6 3 NaN
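For reference, a self-contained version of the map approach, assuming the data above:
import pandas as pd

df = pd.DataFrame({
    'B': [0, 1, 2, 3, 4, 5],
    'C': [1, 2, 3, 4, 5, 6],
    'D': [7, 8, 9, 4, 2, 3],
})
# for each row, look up the C value of the row whose B equals this row's B + 2;
# rows where B + 2 does not appear in column B get NaN
df['E'] = df['B'].add(2).map(df.set_index('B')['C'])
print(df)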

How to remove observations with missing values for specific columns from pandas DataFrame?

I have a pandas DataFrame containing columns with missing values. I want to remove observations (rows) with missing values, but only for specific columns. For example:
A B C D E
2 1 NaN 7 9
1 3 6 NaN 10
NaN 3 11 0 8
And let's say I want to remove observations with a missing value in column D. So I want a result like this:
A B C D E
2 1 NaN 7 9
NaN 3 11 0 8
Thank you for all suggestions.
Let's try a mask with pd.Series.notna():
df[df.D.notna()]
A B C D E
0 2.0 1 NaN 7.0 9
2 NaN 3 11.0 0.0 8
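An equivalent built-in is dropna with a subset; a minimal sketch, assuming the frame from the question:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2, 1, np.nan],
                   'B': [1, 3, 3],
                   'C': [np.nan, 6, 11],
                   'D': [7, np.nan, 0],
                   'E': [9, 10, 8]})

# keep only the rows where D is not missing; same result as df[df.D.notna()]
print(df.dropna(subset=['D']))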

Compute difference between values in dataframe column

I have this dataframe:
a b c d
4 7 5 12
3 8 2 8
1 9 3 5
9 2 6 4
I want column 'd' to become the difference between the n-th value of column 'a' and the (n+1)-th value of column 'a'.
I tried this but it doesn't run:
for i in data.index-1:
    data.iloc[i]['d'] = data.iloc[i]['a'] - data.iloc[i+1]['a']
Can anyone help me?
Basically what you want is diff.
df = pd.DataFrame.from_dict({"a":[4,3,1,9]})
df["d"] = df["a"].diff(periods=-1)
print(df)
Output
a d
0 4 1.0
1 3 2.0
2 1 -8.0
3 9 NaN
Let's try a simple loop-based way (note this computes a[n+1] - a[n], the opposite sign of diff(periods=-1) above):
df = pd.DataFrame.from_dict({'a': [2, 4, 8, 15]})
diff = []
for i in range(len(df)-1):
    diff.append(df['a'][i+1] - df['a'][i])
diff.append(np.nan)
df['d'] = diff
print(df)
a d
0 2 2.0
1 4 4.0
2 8 7.0
3 15 NaN
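For comparison, the same n minus (n+1) difference can be written with shift; a sketch, assuming the data from the first answer:
import pandas as pd

df = pd.DataFrame({'a': [4, 3, 1, 9]})
# a[n] - a[n+1]; equivalent to df['a'].diff(periods=-1)
df['d'] = df['a'] - df['a'].shift(-1)
print(df)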

Match values based on group value with columns values and merge it in two columns

df
index group1 group2 a b c d
0 a b 1 2 NaN NaN
1 b c NaN 5 1 NaN
2 c d NaN NaN 6 9
4 b a 1 7 NaN NaN
5 d a 6 NaN NaN 5
df expect
index group1 group2 one two
0 a b 1 2
1 b c 5 1
2 c d 6 9
4 b a 7 1
5 d a 5 6
I want to match values based on columns ['group1','group2'] and append them to columns ['one','two'] in that order. For example, in row index 5, group1 is 'd', so 'one' takes the value 5 from column 'd' first, and then 'two' takes the group2 value (6 from column 'a').
I am trying to use the lookup function: df.one = df.lookup(df.index, df.group1). It works on small data, but not on big data with lots of columns, where the values get mixed up.
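DataFrame.lookup has been deprecated and removed in recent pandas versions. A sketch of the same row/column pick with plain NumPy indexing, assuming the columns shown above:
import numpy as np
import pandas as pd

df = pd.DataFrame({'group1': ['a', 'b', 'c', 'b', 'd'],
                   'group2': ['b', 'c', 'd', 'a', 'a'],
                   'a': [1, np.nan, np.nan, 1, 6],
                   'b': [2, 5, np.nan, 7, np.nan],
                   'c': [np.nan, 1, 6, np.nan, np.nan],
                   'd': [np.nan, np.nan, 9, np.nan, 5]})

value_cols = ['a', 'b', 'c', 'd']
col_pos = {c: i for i, c in enumerate(value_cols)}
vals = df[value_cols].to_numpy()
rows = np.arange(len(df))

# for each row, pick the value from the column named by group1 / group2
df['one'] = vals[rows, df['group1'].map(col_pos).to_numpy()]
df['two'] = vals[rows, df['group2'].map(col_pos).to_numpy()]
print(df[['group1', 'group2', 'one', 'two']])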

overwrite and append pandas data frames on column value

I have a base dataframe df1:
id name count
1 a 10
2 b 20
3 c 30
4 d 40
5 e 50
Here I have a new dataframe with updates df2:
id name count
1 a 11
2 b 22
3 f 30
4 g 40
I want to overwrite and append these two dataframes on the column name.
For example: a and b are present in df1 but also in df2 with updated count values, so we update df1 with the new counts for a and b. Since f and g are not present in df1, we append them.
Here is an example after the desired operation:
id name count
1 a 11
2 b 22
3 c 30
4 d 40
5 e 50
3 f 30
4 g 40
I tried df.merge and pd.concat but nothing seems to give me the output that I require. Can anyone help?
Using combine_first
df2=df2.set_index(['id','name'])
df2.combine_first(df1.set_index(['id','name'])).reset_index()
Out[198]:
id name count
0 1 a 11.0
1 2 b 22.0
2 3 c 30.0
3 3 f 30.0
4 4 d 40.0
5 4 g 40.0
6 5 e 50.0
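An alternative sketch, assuming name is the key to update on: put df2 first and drop later duplicates of name (the row order then differs from the combine_first output above):
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': list('abcde'),
                    'count': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'name': list('abfg'),
                    'count': [11, 22, 30, 40]})

# df2 rows win for names present in both; df1 rows with new names are kept
out = pd.concat([df2, df1]).drop_duplicates(subset='name', keep='first')
print(out)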
