I have this dataframe, and I want to normalize/standardize columns B, C, and D using column A as the weight.
 A  B  C   D
34  5  1  12
26  9  0   2
10  0  4   1
Is that possible?
It sounds like you would like to divide the values in columns B, C, and D by the corresponding row value in column A.
To do this with a pandas dataframe called df:
print(df)
A B C D
34 5 1 12
26 9 0 2
10 0 4 1
cols = df.columns[1:]
for column in cols:
    df[column] = df[column] / df["A"]
print(df)
A B C D
34 0.147059 0.029412 0.352941
26 0.346154 0.000000 0.076923
10 0.000000 0.400000 0.100000
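If you'd rather avoid the Python-level loop, the same division can be written as one vectorized step with DataFrame.div (a minimal sketch, assuming df as above):

# divide every column except A by A, row-aligned
cols = df.columns[1:]
df[cols] = df[cols].div(df["A"], axis=0)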
Let's say I have two dataframes with the same columns; the first one has a unique index, and the second one does not:
df1:
   column1  column2
a        1        2
b        4        5
c        3        3

df2:
   column1  column2
a        1        2
a        4        5
c        3        3
b        1        2
b        4        5
a        3        3
Now, how can I compute the scalar product of the rows whose indexes match? The result would be a dataframe with one column holding the scalar products (for example, first row: 1*1 + 2*2 = 5) and the same index as the second dataframe:
result
a 5
a 14
c 18
b 14
b 41
a 9
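For reference, a minimal reproduction of the two frames (the names df1 and df2 are assumptions, matching the answer below):

import pandas as pd

df1 = pd.DataFrame({'column1': [1, 4, 3], 'column2': [2, 5, 3]},
                   index=['a', 'b', 'c'])
df2 = pd.DataFrame({'column1': [1, 4, 3, 1, 4, 3],
                    'column2': [2, 5, 3, 2, 5, 3]},
                   index=['a', 'a', 'c', 'b', 'b', 'a'])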
Multiply and then sum the DataFrames:
df = df2.mul(df1).sum(axis=1).to_frame('result')
print(df)
result
a 5
a 14
a 9
b 14
b 41
c 18
If the ordering of the output is important:
df = (df2.assign(a=range(len(df2)))    # add a positional helper column
         .set_index('a', append=True)  # index becomes unique: (label, position)
         .mul(df1, level=0)            # multiply, aligning on the label level only
         .sum(axis=1)
         .droplevel(1)                 # drop the positional level again
         .to_frame('result'))
print(df)
result
a 5
a 14
c 18
b 14
b 41
a 9
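An order-preserving alternative (a sketch) avoids index alignment altogether: since df1's index is unique, df1.loc[df2.index] repeats its rows in df2's order, and the rest is plain NumPy:

import pandas as pd

vals = df1.loc[df2.index].to_numpy() * df2.to_numpy()  # row-matched elementwise product
df = pd.DataFrame({'result': vals.sum(axis=1)}, index=df2.index)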
Let's take this sample dataframe and this list of ids:
df = pd.DataFrame({'Id': ['A', 'A', 'A', 'B', 'C', 'C', 'D', 'D'],
                   'Weight': [50, 20, 30, 1, 2, 8, 3, 2],
                   'Value': [100, 100, 100, 10, 20, 20, 30, 30]})
Id Weight Value
0 A 50 100
1 A 20 100
2 A 30 100
3 B 1 10
4 C 2 20
5 C 8 20
6 D 3 30
7 D 2 30
L = ['A','C']
The Value column has the same value for every row of a given id in the Id column. For the specific ids in L, I would like to apply the weights from the Weight column to the Value column. I am currently doing it the following way, but it is extremely slow with my real, much bigger dataframe:
for i in L:
    df.loc[df["Id"] == i, "Value"] = (df.loc[df["Id"] == i, "Value"] * df.loc[df["Id"] == i, "Weight"] /
                                      df[df["Id"] == i]["Weight"].sum())
How could I do that efficiently?
Expected output :
Id Weight Value
0 A 50 50
1 A 20 20
2 A 30 30
3 B 1 10
4 C 2 4
5 C 8 16
6 D 3 30
7 D 2 30
The idea is to work only on the rows filtered with Series.isin, using GroupBy.transform('sum') to get the per-group sums as a Series the same length as the filtered DataFrame:
L = ['A', 'C']
m = df['Id'].isin(L)                              # select only the ids in L
df1 = df[m].copy()
s = df1.groupby('Id')['Weight'].transform('sum')  # per-id weight sums, aligned row by row
df.loc[m, 'Value'] = df1['Value'].mul(df1['Weight']).div(s)
print(df)
Id Weight Value
0 A 50 50.0
1 A 20 20.0
2 A 30 30.0
3 B 1 10.0
4 C 2 4.0
5 C 8 16.0
6 D 3 30.0
7 D 2 30.0
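A variant of the same idea (a sketch) computes the group sums over the whole frame and applies the reweighting only where the mask holds, via numpy.where:

import numpy as np

s = df.groupby('Id')['Weight'].transform('sum')  # weight sums for every group
m = df['Id'].isin(L)
df['Value'] = np.where(m, df['Value'] * df['Weight'] / s, df['Value'])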
I have a base dataframe df1:
id name count
1 a 10
2 b 20
3 c 30
4 d 40
5 e 50
Here I have a new dataframe with updates, df2:
id name count
1 a 11
2 b 22
3 f 30
4 g 40
I want to overwrite and append these two dataframes on the name column.
For example: a and b are present in df1 but also in df2 with updated count values, so we update df1 with the new counts for a and b. Since f and g are not present in df1, we append them.
Here is an example after the desired operation:
id name count
1 a 11
2 b 22
3 c 30
4 d 40
5 e 50
3 f 30
4 g 40
I tried df.merge and pd.concat, but nothing seems to give me the output that I require. Can anyone help?
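For reference, a minimal reproduction of the two frames:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': ['a', 'b', 'c', 'd', 'e'],
                    'count': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'name': ['a', 'b', 'f', 'g'],
                    'count': [11, 22, 30, 40]})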
Using combine_first
df2 = df2.set_index(['id', 'name'])
df2.combine_first(df1.set_index(['id', 'name'])).reset_index()
Out[198]:
id name count
0 1 a 11.0
1 2 b 22.0
2 3 c 30.0
3 3 f 30.0
4 4 d 40.0
5 4 g 40.0
6 5 e 50.0
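Note that combine_first aligns on the index, so the result comes back sorted by (id, name), and count is upcast to float because NaNs appear during the alignment. If the integer dtype matters, a cast afterwards restores it (a small sketch, continuing from the code above):

out = df2.combine_first(df1.set_index(['id', 'name'])).reset_index()
out['count'] = out['count'].astype(int)  # safe here: no NaNs remain after combining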
I have a database from which I bring in a SQL table of events and alarms (df1), and I have a txt file of alarm codes and properties (df2) to watch for. I want to take the values of one column from df2, cross-check each value against an entire column in df1, and output the full rows of any matches into another dataframe, df3.
df1:
     A   B  C  D
0  100  20  1  1
1  101  30  1  1
2  102  21  2  3
3  103  15  2  3
4  104  40  2  3

df2:
    0  1    2    3    4
0  21  2    2    3    3
1  40  0  NaN  NaN  NaN
Output the entire rows of df1 whose column B value matches any of the values in df2's column 0 into df3.
df3:
     A   B  C  D
0  102  21  2  3
1  104  40  2  3
I was able to get single results using:
df1[df1['B'] == df2.iloc[0,0]]
But I need something that will do this on a larger scale.
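For a runnable reproduction (an assumption here: the answers below treat df2's column labels as strings like '0'; if your txt file parses them as integers, use 0 instead):

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [100, 101, 102, 103, 104],
                    'B': [20, 30, 21, 15, 40],
                    'C': [1, 1, 2, 2, 2],
                    'D': [1, 1, 3, 3, 3]})
df2 = pd.DataFrame({'0': [21, 40], '1': [2, 0], '2': [2, np.nan],
                    '3': [3, np.nan], '4': [3, np.nan]})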
Method 1: merge
Use merge on B and 0, then select only the df1 columns:
df1.merge(df2, left_on='B', right_on='0')[df1.columns]
A B C D
0 102 21 2 3
1 104 40 2 3
Method 2: loc
Alternatively, use loc to find the rows in df1 where B has a match in df2's column 0, using .isin:
df1.loc[df1.B.isin(df2['0'])]
A B C D
2 102 21 2 3
4 104 40 2 3
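Note that the loc version keeps the original row labels (2 and 4). If you want the fresh 0-based index shown in the expected df3, add reset_index:

df3 = df1.loc[df1.B.isin(df2['0'])].reset_index(drop=True)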
I have the following two dataframes. Please note that 'amt' is constant within each 'id' in both dataframes.
df1
id code amt
0 A 1 5
1 A 2 5
2 B 3 10
3 C 4 6
4 D 5 8
5 E 6 11
df2
id code amt
0 B 1 9
1 C 12 10
I want to add a row to df2 for every id of df1 not contained in df2. For example, as ids A, D, and E are not contained in df2, I want to add a row for each of these ids. The appended row should contain the id that is missing from df2, a null value for the code attribute, and the value stored in df1 for the amt attribute.
The result should be something like this:
id code name
0 B 1 9
1 C 12 10
2 A nan 5
3 D nan 8
4 E nan 11
I would highly appreciate some guidance on this.
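For reference, a minimal reproduction of both frames:

import pandas as pd

df1 = pd.DataFrame({'id': ['A', 'A', 'B', 'C', 'D', 'E'],
                    'code': [1, 2, 3, 4, 5, 6],
                    'amt': [5, 5, 10, 6, 8, 11]})
df2 = pd.DataFrame({'id': ['B', 'C'], 'code': [1, 12], 'amt': [9, 10]})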
By using pd.concat
# one row per id from df1, without the code column
df = df1.drop(columns='code').drop_duplicates()
# keep the ids missing from df2 and concatenate them after df2
pd.concat([df2, df[~df.id.isin(df2.id)]]).rename(columns={'amt': 'name'}).reset_index(drop=True)
Out[481]:
  id  code  name
0  B   1.0     9
1  C  12.0    10
2  A   NaN     5
3  D   NaN     8
4  E   NaN    11
Drop duplicates from df1, concatenate df2, drop more duplicates, then concatenate again:
import numpy as np
import pandas as pd

pd.concat(
    [df2,
     pd.concat([df1.drop_duplicates('id'), df2])
       .drop_duplicates('id', keep=False)   # ids that appear once, i.e. only in df1
       .assign(code=np.nan)],
    ignore_index=True
)
id code amt
0 B 1.0 9
1 C 12.0 10
2 A NaN 5
3 D NaN 8
4 E NaN 11
Slight variation
m = ~np.isin(df1.id.values, df2.id.values)  # ids in df1 that are absent from df2
d = ~df1.duplicated('id').values            # first occurrence of each id only
pd.concat([df2, df1[m & d].assign(code=np.nan)], ignore_index=True)
id code amt
0 B 1.0 9
1 C 12.0 10
2 A NaN 5
3 D NaN 8
4 E NaN 11