Pandas replacing subset of column values with another column based on index - python

I tried to check other questions but didn't find what I needed.
I have a dataframe df:
   a  b
0  6  4
1  5  6
2  2  2
3  7  4
4  3  6
5  5  2
6  4  7
and a second dataframe df2
    d
0  60
1  50
5  50
6  40
I want to replace the values in df['a'] with the values in df2['d'] - but only in the relevant indices.
Output:
    a  b
0  60  4
1  50  6
2   2  2
3   7  4
4   3  6
5  50  2
6  40  7
All other questions I saw, like this one, refer to a single value, but I want to replace values based on an entire column.
I know I can iterate the rows one by one and replace the values, but I'm looking for a more efficient way.
Note: df2 does not have any indices that are not in df, and I want every value in df2['d'] to replace the corresponding value in df['a'].

Simply use indexing:
df.loc[df2.index, 'a'] = df2['d']
output:
    a  b
0  60  4
1  50  6
2   2  2
3   7  4
4   3  6
5  50  2
6  40  7
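A runnable sketch of the accepted approach, with the dataframes reconstructed from the question:

```python
import pandas as pd

# Reconstruction of the question's frames
df = pd.DataFrame({'a': [6, 5, 2, 7, 3, 5, 4],
                   'b': [4, 6, 2, 4, 6, 2, 7]})
df2 = pd.DataFrame({'d': [60, 50, 50, 40]}, index=[0, 1, 5, 6])

# .loc aligns the assignment on index labels: only rows whose index
# appears in df2 are overwritten; the rest of df['a'] is untouched
df.loc[df2.index, 'a'] = df2['d']
print(df['a'].tolist())  # [60, 50, 2, 7, 3, 50, 40]
```

Because the assignment is label-aligned, this works even if df2's rows are in a different order than df's.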

Related

How to convert a column's unique values into columns

I have this dataFrame
dd = pd.DataFrame({'a':[1,1,1,1,2,2,2,2],'feature':[10,10,20,20,10,10,20,20],'h':['h_30','h_60','h_30','h_60','h_30','h_60','h_30','h_60'],'count':[1,2,3,4,5,6,7,8]})
   a  feature     h  count
0  1       10  h_30      1
1  1       10  h_60      2
2  1       20  h_30      3
3  1       20  h_60      4
4  2       10  h_30      5
5  2       10  h_60      6
6  2       20  h_30      7
7  2       20  h_60      8
What I want is to pivot the unique values of the h column into columns, using the count numbers as values, like this:
   a  feature  h_30  h_60
0  1       10     1     2
1  1       20     3     4
2  2       10     5     6
3  2       20     7     8
I tried this:
dd.pivot(index=['a', 'feature'], columns='h', values='count')
but got an error: ValueError: Length of passed values is 8, index implies 2
df.pivot does not accept a list of columns as index for pandas versions below 1.1.0. From the docs: "Changed in version 1.1.0: Also accept list of index names."
Try this:
import pandas as pd

pd.pivot_table(
    dd, index=["a", "feature"], columns="h", values="count"
).reset_index().rename_axis(None, axis=1)
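As a self-contained sketch (on pandas 1.1.0+ the original `dd.pivot(index=['a', 'feature'], ...)` call also works directly; `pivot_table` works on any version, though note its default aggfunc is 'mean', which is harmless here since each group holds a single value):

```python
import pandas as pd

dd = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2],
                   'feature': [10, 10, 20, 20, 10, 10, 20, 20],
                   'h': ['h_30', 'h_60'] * 4,
                   'count': [1, 2, 3, 4, 5, 6, 7, 8]})

# pivot_table accepts a list of index columns on any pandas version;
# reset_index flattens the MultiIndex back into columns, and
# rename_axis drops the leftover 'h' columns-axis name
out = (pd.pivot_table(dd, index=['a', 'feature'], columns='h', values='count')
         .reset_index()
         .rename_axis(None, axis=1))
print(out)
```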

sum numbers in two dataframes based on their intersecting indexes

I have two dataframes with some indexes in common and some that are not:
df1
   DATA
1     1
2     2
3     3
4     4
5     5
df2
   DATA
3     3
4     4
5     5
6     6
7     7
I want to sum them (or take the max; I actually need both, for different columns), treating missing indexes as 0.
In this example the result should be:
df_results
   DATA
1     1
2     2
3     6
4     8
5    10
6     6
7     7
where 3,4,5 were summed, but the rest remained the same.
Thx!
Try this:
combined = df1.add(df2, fill_value=0)
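A runnable sketch covering both operations; the max variant via concat/groupby is one possible approach and goes beyond the posted answer:

```python
import pandas as pd

df1 = pd.DataFrame({'DATA': [1, 2, 3, 4, 5]}, index=[1, 2, 3, 4, 5])
df2 = pd.DataFrame({'DATA': [3, 4, 5, 6, 7]}, index=[3, 4, 5, 6, 7])

# Sum over the union of indexes, treating a missing side as 0
summed = df1.add(df2, fill_value=0)

# Element-wise max over the union of indexes: stack both frames,
# then take the max per index label
maxed = pd.concat([df1, df2]).groupby(level=0).max()

print(summed['DATA'].tolist())  # indexes 3, 4, 5 are summed: 6, 8, 10
print(maxed['DATA'].tolist())
```

Note that `add(..., fill_value=0)` may upcast integer columns to float, because alignment introduces NaN before the fill.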

Filtering pandas dataframe groups based on groups comparison

I am trying to remove corrupted data from my pandas dataframe. I want to remove groups from the dataframe whose value differs from the previous group's value by more than one. Here is an example:
   Value
0      1
1      1
2      1
3      2
4      2
5      2
6      8   <- if I group by Value, this group's value is larger than
7      8      the previous group's by 6, so I want to remove this
8      3      group from the dataframe
9      3
Expected result:
   Value
0      1
1      1
2      1
3      2
4      2
5      2
6      3
7      3
Edit:
jezrael's solution is great, but in my case it is possible that there will be duplicate group values:
   Value
0      1
1      1
2      1
3      3
4      3
5      3
6      1
7      1
Sorry if I was not clear about this.
First remove duplicates to get the unique group values, then compare the difference with the shifted values, and last filter by boolean indexing:
s = df['Value'].drop_duplicates()
v = s[s.diff().gt(s.shift())]
df = df[~df['Value'].isin(v)]
print(df)
   Value
0      1
1      1
2      1
3      2
4      2
5      2
8      3
9      3
Maybe:
df2 = df.drop_duplicates()
keep = df2.loc[~df2['Value'].gt(df2['Value'].shift(-1)), 'Value'].tolist()
print(df[df['Value'].isin(keep)])
Output:
   Value
0      1
1      1
2      1
3      2
4      2
5      2
8      3
9      3
We can check whether the difference is less than or equal to 5 or is NaN. After that, we keep only the rows whose value is duplicated:
s = df[df['Value'].diff().le(5) | df['Value'].diff().isna()]
s[s.duplicated(keep=False)]
   Value
0      1
1      1
2      1
3      2
4      2
5      2
8      3
9      3
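The first answer's approach as a self-contained sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'Value': [1, 1, 1, 2, 2, 2, 8, 8, 3, 3]})

# Unique group values in order of appearance: 1, 2, 8, 3
s = df['Value'].drop_duplicates()
# Flag groups whose increase over the previous group exceeds the
# previous group's value (here 8: 8 - 2 = 6 > 2)
v = s[s.diff().gt(s.shift())]
out = df[~df['Value'].isin(v)]
print(out['Value'].tolist())  # [1, 1, 1, 2, 2, 2, 3, 3]
```

Note the surviving rows keep their original index labels (8 and 9 for the trailing 3s); reindex with `reset_index(drop=True)` if you need the 0..n-1 labels shown in the question's expected output.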

Groupby column keep multiple rows with minimum value

I have a dataframe consisting of two columns with ids and one column with numerical values. I want to group by the first id column and keep all the rows corresponding to the smallest values in the second column, keeping multiple rows if needed.
This is my pandas dataframe
id1  id2  num1
  1    1     9
  1    1     4
  1    2     4
  1    2     3
  1    3     7
  2    6     9
  2    6     1
  2    6     5
  2    9     3
  2    9     7
  3    2     8
  3    4     2
  3    4     7
  3    4     9
  3    4    10
What I want to have is:
id1  id2  num1
  1    1     9
  1    1     4
  2    6     9
  2    6     1
  2    6     5
  3    2     8
I have tried keeping the min value, finding the idxmin(), and removing duplicates, but all of these end up with only one row per id1 and id2 pair.
firstS.groupby('id1')['id2'].transform(min)
Many thanks in advance!
You are close; you only need to compare the id2 column with the transformed Series and filter by boolean indexing:
df = firstS[firstS['id2'] == firstS.groupby('id1')['id2'].transform(min)]
print(df)
    id1  id2  num1
0     1    1     9
1     1    1     4
5     2    6     9
6     2    6     1
7     2    6     5
10    3    2     8
Simplest way:
df = df.merge(df.groupby("id1").id2.min().reset_index())
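Both answers rely on the per-group minimum of id2; a self-contained sketch of the transform version:

```python
import pandas as pd

df = pd.DataFrame({'id1':  [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
                   'id2':  [1, 1, 2, 2, 3, 6, 6, 6, 9, 9, 2, 4, 4, 4, 4],
                   'num1': [9, 4, 4, 3, 7, 9, 1, 5, 3, 7, 8, 2, 7, 9, 10]})

# transform('min') broadcasts each id1 group's minimum id2 back onto
# every row, so the comparison keeps all rows that attain the minimum
out = df[df['id2'] == df.groupby('id1')['id2'].transform('min')]
print(out['num1'].tolist())  # [9, 4, 9, 1, 5, 8]
```

Unlike `idxmin()` or `drop_duplicates()`, this keeps every tied row, which is exactly what the question asks for.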

Index matching with multiple columns in python

I have two pandas dataframes with different sizes. The two dataframes look like:
df1 =
x  y  data
1  2     5
2  2     7
5  3     9
3  5     2
and another dataframe looks like:
df2 =
x  y  value
5  3      7
1  2      4
3  5      2
7  1      4
4  6      5
2  2      1
7  5      8
I am trying to merge these two dataframes so that the final dataframe has the matching combinations of x and y with their respective values. I expect the final dataframe in this format:
x  y  data  value
1  2     5      4
2  2     7      1
5  3     9      7
3  5     2      2
I tried this code but am not getting the expected results.
dfB.set_index('x').loc[dfA.x].reset_index()
Use merge; by default how='inner', so it can be omitted, and if joining only on identically named columns, the on parameter can be omitted too:
print(pd.merge(df1, df2))
   x  y  data  value
0  1  2     5      4
1  2  2     7      1
2  5  3     9      7
3  3  5     2      2
If the real data has additional columns with the same names, specify the join keys explicitly:
print(pd.merge(df1, df2, on=['x', 'y']))
   x  y  data  value
0  1  2     5      4
1  2  2     7      1
2  5  3     9      7
3  3  5     2      2
df1.merge(df2, on=['x', 'y'])
This will do. (Note that merge has no by parameter; the join keys are given with on.)
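A self-contained sketch of the inner merge on the question's data:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 5, 3], 'y': [2, 2, 3, 5],
                    'data': [5, 7, 9, 2]})
df2 = pd.DataFrame({'x': [5, 1, 3, 7, 4, 2, 7], 'y': [3, 2, 5, 1, 6, 2, 5],
                    'value': [7, 4, 2, 4, 5, 1, 8]})

# Inner merge on the shared columns keeps only (x, y) pairs present
# in both frames; rows come out in df1's (left) key order
out = pd.merge(df1, df2, on=['x', 'y'])
print(out['value'].tolist())  # [4, 1, 7, 2]
```

The rows of df2 with (x, y) pairs not in df1, such as (7, 1), are dropped; use `how='outer'` if you want to keep them.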
