Pandas replacing subset of column values with another column based on index - python

I tried to check other questions but didn't find what I needed.
I have a dataframe df:
   a  b
0  6  4
1  5  6
2  2  2
3  7  4
4  3  6
5  5  2
6  4  7
and a second dataframe df2
    d
0  60
1  50
5  50
6  40
I want to replace the values in df['a'] with the values in df2['d'] - but only in the relevant indices.
Output:
    a  b
0  60  4
1  50  6
2   2  2
3   7  4
4   3  6
5  50  2
6  40  7
All other questions I saw, like this one, refer to a single value, but I want to replace values based on an entire column.
I know I can iterate the rows one by one and replace the values, but I'm looking for a more efficient way.
Note: df2 does not have any indices that are not in df, and I want every value in df2['d'] to replace the corresponding value in df['a'].

Simply use indexing:
df.loc[df2.index, 'a'] = df2['d']
output:
    a  b
0  60  4
1  50  6
2   2  2
3   7  4
4   3  6
5  50  2
6  40  7
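A runnable sketch of the accepted approach, with the dataframes reconstructed from the question:

```python
import pandas as pd

# Reconstruction of the question's frames
df = pd.DataFrame({'a': [6, 5, 2, 7, 3, 5, 4],
                   'b': [4, 6, 2, 4, 6, 2, 7]})
df2 = pd.DataFrame({'d': [60, 50, 50, 40]}, index=[0, 1, 5, 6])

# .loc aligns the assignment on index labels: only rows whose index
# appears in df2 are overwritten; the rest of df['a'] is untouched
df.loc[df2.index, 'a'] = df2['d']
print(df['a'].tolist())  # [60, 50, 2, 7, 3, 50, 40]
```

Because the assignment is label-aligned, this works even if df2's rows are in a different order than df's.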

Related

How to convert a column's unique values into columns

I have this dataFrame
dd = pd.DataFrame({'a':[1,1,1,1,2,2,2,2],'feature':[10,10,20,20,10,10,20,20],'h':['h_30','h_60','h_30','h_60','h_30','h_60','h_30','h_60'],'count':[1,2,3,4,5,6,7,8]})
   a  feature     h  count
0  1       10  h_30      1
1  1       10  h_60      2
2  1       20  h_30      3
3  1       20  h_60      4
4  2       10  h_30      5
5  2       10  h_60      6
6  2       20  h_30      7
7  2       20  h_60      8
What I want is to pivot the unique values of the h column into columns, using the count numbers as values, like this:
   a  feature  h_30  h_60
0  1       10     1     2
1  1       20     3     4
2  2       10     5     6
3  2       20     7     8
I tried this:
dd.pivot(index=['a', 'feature'], columns='h', values='count')
but got an error: ValueError: Length of passed values is 8, index implies 2
df.pivot does not accept a list of columns as index for pandas versions below 1.1.0. From the docs: "Changed in version 1.1.0: Also accept list of index names."
Try this:
import pandas as pd

pd.pivot_table(
    dd, index=["a", "feature"], columns="h", values="count"
).reset_index().rename_axis(None, axis=1)
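As a self-contained sketch (on pandas 1.1.0+ the original `dd.pivot(index=['a', 'feature'], ...)` call also works directly; `pivot_table` works on any version, though note its default aggfunc is 'mean', which is harmless here since each group holds a single value):

```python
import pandas as pd

dd = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2],
                   'feature': [10, 10, 20, 20, 10, 10, 20, 20],
                   'h': ['h_30', 'h_60'] * 4,
                   'count': [1, 2, 3, 4, 5, 6, 7, 8]})

# pivot_table accepts a list of index columns on any pandas version;
# reset_index flattens the MultiIndex back into columns, and
# rename_axis drops the leftover 'h' columns-axis name
out = (pd.pivot_table(dd, index=['a', 'feature'], columns='h', values='count')
         .reset_index()
         .rename_axis(None, axis=1))
print(out)
```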

sum numbers in two dataframes based on their intersecting indexes

I have two dataframes with some indexes in common and some that are not:
df1
   DATA
1     1
2     2
3     3
4     4
5     5
df2
   DATA
3     3
4     4
5     5
6     6
7     7
I want to sum them (or take the max; I actually need both, for different columns), treating missing indexes as 0.
In this example the result should be:
df_results
   DATA
1     1
2     2
3     6
4     8
5    10
6     6
7     7
where 3,4,5 were summed, but the rest remained the same.
Thx!
Try this:
combined = df1.add(df2, fill_value=0)
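A runnable sketch covering both operations; the max variant via concat/groupby is one possible approach and goes beyond the posted answer:

```python
import pandas as pd

df1 = pd.DataFrame({'DATA': [1, 2, 3, 4, 5]}, index=[1, 2, 3, 4, 5])
df2 = pd.DataFrame({'DATA': [3, 4, 5, 6, 7]}, index=[3, 4, 5, 6, 7])

# Sum over the union of indexes, treating a missing side as 0
summed = df1.add(df2, fill_value=0)

# Element-wise max over the union of indexes: stack both frames,
# then take the max per index label
maxed = pd.concat([df1, df2]).groupby(level=0).max()

print(summed['DATA'].tolist())  # indexes 3, 4, 5 are summed: 6, 8, 10
print(maxed['DATA'].tolist())
```

Note that `add(..., fill_value=0)` may upcast integer columns to float, because alignment introduces NaN before the fill.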

Filtering pandas dataframe groups based on groups comparison

I am trying to remove corrupted data from my pandas dataframe. I want to remove groups from the dataframe whose value differs from the previous group's value by more than one. Here is an example:
   Value
0      1
1      1
2      1
3      2
4      2
5      2
6      8   <- if I group by Value, this group's value is larger than
7      8      the previous group's by 6, so I want to remove this
8      3      group from the dataframe
9      3
Expected result:
   Value
0      1
1      1
2      1
3      2
4      2
5      2
6      3
7      3
Edit:
jezrael's solution is great, but in my case it is possible that there will be duplicate group values:
   Value
0      1
1      1
2      1
3      3
4      3
5      3
6      1
7      1
Sorry if I was not clear about this.
First remove duplicates to get the unique group values, then compare the difference with the shifted values, and last filter by boolean indexing:
s = df['Value'].drop_duplicates()
v = s[s.diff().gt(s.shift())]
df = df[~df['Value'].isin(v)]
print(df)
   Value
0      1
1      1
2      1
3      2
4      2
5      2
8      3
9      3
Maybe:
df2 = df.drop_duplicates()
keep = df2.loc[~df2['Value'].gt(df2['Value'].shift(-1)), 'Value'].tolist()
print(df[df['Value'].isin(keep)])
Output:
   Value
0      1
1      1
2      1
3      2
4      2
5      2
8      3
9      3
We can check whether the difference is less than or equal to 5 or is NaN. After that, we keep only the rows whose value is duplicated:
s = df[df['Value'].diff().le(5) | df['Value'].diff().isna()]
s[s.duplicated(keep=False)]
   Value
0      1
1      1
2      1
3      2
4      2
5      2
8      3
9      3
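The first answer's approach as a self-contained sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'Value': [1, 1, 1, 2, 2, 2, 8, 8, 3, 3]})

# Unique group values in order of appearance: 1, 2, 8, 3
s = df['Value'].drop_duplicates()
# Flag groups whose increase over the previous group exceeds the
# previous group's value (here 8: 8 - 2 = 6 > 2)
v = s[s.diff().gt(s.shift())]
out = df[~df['Value'].isin(v)]
print(out['Value'].tolist())  # [1, 1, 1, 2, 2, 2, 3, 3]
```

Note the surviving rows keep their original index labels (8 and 9 for the trailing 3s); reindex with `reset_index(drop=True)` if you need the 0..n-1 labels shown in the question's expected output.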

Groupby column keep multiple rows with minimum value

I have a dataframe consisting of two columns with ids and one column with numerical values. I want to group by the first id column and keep all the rows corresponding to the smallest values in the second column, keeping multiple rows if needed.
This is my pandas dataframe
id1  id2  num1
  1    1     9
  1    1     4
  1    2     4
  1    2     3
  1    3     7
  2    6     9
  2    6     1
  2    6     5
  2    9     3
  2    9     7
  3    2     8
  3    4     2
  3    4     7
  3    4     9
  3    4    10
What I want to have is:
id1  id2  num1
  1    1     9
  1    1     4
  2    6     9
  2    6     1
  2    6     5
  3    2     8
I have tried keeping the min value, finding the idxmin(), and removing duplicates, but all of these end up with only one row per id1 and id2 pair.
firstS.groupby('id1')['id2'].transform(min)
Many thanks in advance!
You are close; you only need to compare the id2 column with the transformed Series and filter by boolean indexing:
df = firstS[firstS['id2'] == firstS.groupby('id1')['id2'].transform(min)]
print(df)
    id1  id2  num1
0     1    1     9
1     1    1     4
5     2    6     9
6     2    6     1
7     2    6     5
10    3    2     8
Simplest way:
df = df.merge(df.groupby("id1").id2.min().reset_index())
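Both answers rely on the per-group minimum of id2; a self-contained sketch of the transform version:

```python
import pandas as pd

df = pd.DataFrame({'id1':  [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
                   'id2':  [1, 1, 2, 2, 3, 6, 6, 6, 9, 9, 2, 4, 4, 4, 4],
                   'num1': [9, 4, 4, 3, 7, 9, 1, 5, 3, 7, 8, 2, 7, 9, 10]})

# transform('min') broadcasts each id1 group's minimum id2 back onto
# every row, so the comparison keeps all rows that attain the minimum
out = df[df['id2'] == df.groupby('id1')['id2'].transform('min')]
print(out['num1'].tolist())  # [9, 4, 9, 1, 5, 8]
```

Unlike `idxmin()` or `drop_duplicates()`, this keeps every tied row, which is exactly what the question asks for.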

Index matching with multiple columns in python

I have two pandas dataframes with different sizes. The two dataframes look like:
df1 =
x  y  data
1  2     5
2  2     7
5  3     9
3  5     2
and another dataframe looks like:
df2 =
x  y  value
5  3      7
1  2      4
3  5      2
7  1      4
4  6      5
2  2      1
7  5      8
I am trying to merge these two dataframes so that the final dataframe has the matching combinations of x and y with their respective values. I expect the final dataframe in this format:
x  y  data  value
1  2     5      4
2  2     7      1
5  3     9      7
3  5     2      2
I tried this code but am not getting the expected results.
dfB.set_index('x').loc[dfA.x].reset_index()
Use merge; by default how='inner', so it can be omitted, and if joining only on identically named columns, the on parameter can be omitted too:
print(pd.merge(df1, df2))
   x  y  data  value
0  1  2     5      4
1  2  2     7      1
2  5  3     9      7
3  3  5     2      2
If the real data has additional columns with the same names, specify the join keys explicitly:
print(pd.merge(df1, df2, on=['x', 'y']))
   x  y  data  value
0  1  2     5      4
1  2  2     7      1
2  5  3     9      7
3  3  5     2      2
df1.merge(df2, on=['x', 'y'])
This will do. (Note that merge has no by parameter; the join keys are given with on.)
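A self-contained sketch of the inner merge on the question's data:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 5, 3], 'y': [2, 2, 3, 5],
                    'data': [5, 7, 9, 2]})
df2 = pd.DataFrame({'x': [5, 1, 3, 7, 4, 2, 7], 'y': [3, 2, 5, 1, 6, 2, 5],
                    'value': [7, 4, 2, 4, 5, 1, 8]})

# Inner merge on the shared columns keeps only (x, y) pairs present
# in both frames; rows come out in df1's (left) key order
out = pd.merge(df1, df2, on=['x', 'y'])
print(out['value'].tolist())  # [4, 1, 7, 2]
```

The rows of df2 with (x, y) pairs not in df1, such as (7, 1), are dropped; use `how='outer'` if you want to keep them.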
