I am trying to add a column from one pandas dataframe to another pandas dataframe.
Here is dataframe 1:

print(df.head())
          Gender
ID Name
1  John     Male
2  Denver      0
0  NaN       NaN
3  Jeff     Male
Note: Both ID and Name are indexes
Here is dataframe 2:

print(df2.head())
   Appointed
ID
1          R
2        NaN
3         SL
Note: ID is the index here.
I am trying to add the Appointed column from df2 to df1 based on the ID. I tried inserting the column and copying it over from df2, but the Appointed column keeps coming back as all NaN values. So far I have had no luck; any suggestions would be greatly appreciated.
If I understand your problem correctly, you should get what you need using this:
df1.reset_index().merge(df2.reset_index(), on='ID')
   ID    Name Gender Appointed
0   1    John   Male         R
1   2  Denver      0       NaN
2   3    Jeff   Male        SL
Or, as an alternative, as pointed out by Wen, you could use join:
df1.join(df2)
          Gender Appointed
ID Name
1  John     Male         R
2  Denver      0       NaN
0  NaN       NaN       NaN
3  Jeff     Male        SL
Reset the index for both dataframes, then create a column named 'Appointed' in df1 and assign the corresponding column of df2 to it.
After resetting the index, both dataframes have an index starting from 0. When the column is assigned, the values align automatically by index, which is a property of pandas dataframes.
# reset_index returns a new dataframe, so reassign (or pass inplace=True)
df1 = df1.reset_index()
df2 = df2.reset_index()
df1['Appointed'] = df2['Appointed']
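As a self-contained check of the join approach, using small stand-in frames reconstructed from the question (the values here are assumptions, and the all-NaN row is left out for simplicity):

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames
df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'Name': ['John', 'Denver', 'Jeff'],
                    'Gender': ['Male', 0, 'Male']}).set_index(['ID', 'Name'])
df2 = pd.DataFrame({'ID': [1, 2, 3],
                    'Appointed': ['R', None, 'SL']}).set_index('ID')

# join matches df2's ID index against the ID level of df1's MultiIndex
out = df1.join(df2)
print(out)
```

Because df2's index is named ID and df1 has an ID level in its MultiIndex, join aligns the two without any reset_index.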
Let's say I have a data frame that looks like this. I want to delete everything with a certain ID if all of its Name values are empty. In this example, every Name value is missing in the rows where ID is 2. Even if I had 100 rows with ID 3 and only one Name value present, I would want to keep them.
ID  Name
1   NaN
1   Banana
1   NaN
2   NaN
2   NaN
2   NaN
3   Apple
3   NaN
So the desired output looks like this:
ID  Name
1   NaN
1   Banana
1   NaN
3   Apple
3   NaN
Everything I tried so far was wrong. In this attempt, I tried to count every NaN value that belongs to an ID, but it still returns too many rows. This is the closest I got to my desired outcome.
df = df[(df['ID']) & (df['Name'].isna().sum()) != 0]
You want to exclude rows from IDs that have as many NaNs as they have rows. Therefore, you can group by ID and count their number of rows and number of NaNs.
Based on this result, you can get the IDs from people whose row count equals their NaN count and exclude them from your original dataframe.
# Declare column that indicates if `Name` is NaN
df['isna'] = df['Name'].isna().astype(int)
# Declare a dataframe that counts the rows and NaNs per `ID`
counter = df.groupby('ID').agg({'Name':'size', 'isna':'sum'})
# Get IDs from people who have as many NaNs as they have rows
exclude = counter[counter['Name'] == counter['isna']].index.values
# Exclude these IDs from your data
df = df[~df['ID'].isin(exclude)]
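Run end-to-end on the sample data from the question, this behaves as expected (a sketch; the helper isna column is dropped at the end for tidiness):

```python
import pandas as pd
import numpy as np

# Sample data from the question
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 3, 3],
                   'Name': [np.nan, 'Banana', np.nan, np.nan, np.nan,
                            np.nan, 'Apple', np.nan]})

df['isna'] = df['Name'].isna().astype(int)
# 'size' counts all rows (including NaN), 'sum' counts the NaN flags
counter = df.groupby('ID').agg({'Name': 'size', 'isna': 'sum'})
exclude = counter[counter['Name'] == counter['isna']].index.values
df = df[~df['ID'].isin(exclude)].drop(columns='isna')
print(df)
```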
Using .groupby and .query
ids = df.groupby(["ID", "Name"]).agg(Count=("Name", "count")).reset_index()["ID"].tolist()
df = df.query("ID.isin(@ids)").reset_index(drop=True)
print(df)
Output:
ID Name
0 1 NaN
1 1 Banana
2 1 NaN
3 3 Apple
4 3 NaN
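As an aside, a more compact idiom not used in the answers above is groupby().filter, which keeps every group that has at least one non-null Name:

```python
import pandas as pd
import numpy as np

# Sample data from the question
df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 3, 3],
                   'Name': [np.nan, 'Banana', np.nan, np.nan, np.nan,
                            np.nan, 'Apple', np.nan]})

# Keep only the groups where at least one Name is present
out = df.groupby('ID').filter(lambda g: g['Name'].notna().any())
print(out.reset_index(drop=True))
```

filter can be slower than the counting approaches on very large frames, since the lambda runs once per group, but it states the intent directly.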
I have two dataframes:
df1:

colname  value
gender   M
status   business
age      60

df2:

colname  value
name     Susan
Place    Africa
gender   F
Is there a way I can concatenate these two dataframes to get the expected output below? I tried an outer join but it does not work. Thank you in advance.
Note: the dataframes do not always share the same attributes, and I am also trying to remove the colname and value columns.
Expected output:

  gender    status age   name   Place
0      M  business  60      0       0
1      F         0   0  Susan  Africa
You can convert to Series with colname as index and concat:
dfs = [df1, df2]
out = pd.concat([d.set_index('colname')['value'] for d in dfs],
                axis=1, ignore_index=True).T
output:
colname gender status age name Place
0 M business 60 NaN NaN
1 F NaN NaN Susan Africa
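This can be verified on small stand-ins for the two frames (values taken from the question):

```python
import pandas as pd

df1 = pd.DataFrame({'colname': ['gender', 'status', 'age'],
                    'value': ['M', 'business', 60]})
df2 = pd.DataFrame({'colname': ['name', 'Place', 'gender'],
                    'value': ['Susan', 'Africa', 'F']})

dfs = [df1, df2]
# Each frame becomes a Series indexed by colname; concat aligns them
# on the union of those indexes, and .T turns frames into rows
out = pd.concat([d.set_index('colname')['value'] for d in dfs],
                axis=1, ignore_index=True).T
print(out)
```

Appending .fillna(0) would match the 0-filled expected output exactly.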
Try this (assuming df1 and df2 have each already been reshaped to wide form, e.g. with set_index('colname').T; concatenating the original long-form frames would just stack the colname/value rows):
pd.concat([df1, df2], axis=0).fillna(0).reset_index(drop=True)
gender status age name place
0 M business 60 0 0
1 F 0 0 Susan Africa
The fillna will replace the NaN values with 0.
The lines below use the pandas transpose to solve this problem:
df_new = pd.concat([df1, df2])
df_new.set_index("colname").T
Note that because gender appears in both frames, the transposed result is a single row with a duplicated gender column rather than one row per frame, so this only matches the expected output when no colname repeats across the dataframes.
In python, I have a df that looks like this
Name ID
Anna 1
Sarah 2
Max 3
And a df that looks like this
Name ID
Dan 1
Hallie 2
Cam 3
How can I combine the dataframes so that the ID column looks like this?
Name ID
Anna 1
Sarah 2
Max 3
Dan 4
Hallie 5
Cam 6
This is just a minimal reproducible example; my actual data set has thousands of values. I am basically concatenating data frames and want the IDs in numerical order (continuing from the previous data frame) instead of restarting from one each time.
Use pd.concat:
out = pd.concat([df1, df2.assign(ID=df2['ID'] + df1['ID'].max())], ignore_index=True)
print(out)
# Output
Name ID
0 Anna 1
1 Sarah 2
2 Max 3
3 Dan 4
4 Hallie 5
5 Cam 6
Concatenate the two DataFrames, reset the index, and use the new index to assign the IDs:
df_new = pd.concat((df1, df2)).reset_index(drop=True)
df_new['ID'] = df_new.index + 1
Output:
Name ID
0 Anna 1
1 Sarah 2
2 Max 3
3 Dan 4
4 Hallie 5
5 Cam 6
You can concat dataframes with ignore_index=True and then set ID column:
df = pd.concat([df1, df2], ignore_index=True)
df['ID'] = df.index + 1
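All three answers produce the same result; as a quick runnable check on the example data:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Anna', 'Sarah', 'Max'], 'ID': [1, 2, 3]})
df2 = pd.DataFrame({'Name': ['Dan', 'Hallie', 'Cam'], 'ID': [1, 2, 3]})

# Stack the frames, then renumber IDs from the fresh 0-based index
df = pd.concat([df1, df2], ignore_index=True)
df['ID'] = df.index + 1
print(df)
```

The index-based variants assume every row should get a sequential ID; the first answer's `df2['ID'] + df1['ID'].max()` variant instead preserves any gaps already present in df2's numbering.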
I have a dataframe formatted like this in pandas.
(df)
School ID Column 1
School 1 AD6000
School 2 3000TO4000
School 3 5000TO6000
School 4 AC2000
School 5 BB3300
School 6 9000TO9900
....
All I want to do is split column 1 rows that have the word 'TO' in it as a delimiter into two new columns while leaving the original. The result would be this.
(df)
School ID Column 1 Column 2 Column 3
School 1 AD6000 NaN NaN
School 2 3000TO4000 3000 4000
School 3 5000TO6000 5000 6000
School 4 AC2000 NaN NaN
School 5 BB3300 NaN NaN
School 6 9000TO9900 9000 9900
....
Here's the code I have that I thought worked, but it turns out it leaves blanks in Columns 2 and 3 instead of splitting the numbers to the left and right of TO into their respective columns.
df[['Column 2','Column 3']] = df['Column 1'].str.extract(r'(\d+)TO(\d+)')
Thanks for the help.
That's because the right-hand side is a dataframe with different column names (0 and 1), and pandas can't find Column 2 or Column 3 in that dataframe.
You can pass the underlying numpy array instead of the dataframe:
df[['Column 2','Column 3']] = df['Column 1'].str.extract(r'(\d+)TO(\d+)').values
Output:
School ID Column 1 Column 2 Column 3
0 School 1 AD6000 NaN NaN
1 School 2 3000TO4000 3000 4000
2 School 3 5000TO6000 5000 6000
3 School 4 AC2000 NaN NaN
4 School 5 BB3300 NaN NaN
5 School 6 9000TO9900 9000 9900
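A self-contained check of the .values fix, on frames reconstructed from the question:

```python
import pandas as pd

df = pd.DataFrame({'School ID': [f'School {i}' for i in range(1, 7)],
                   'Column 1': ['AD6000', '3000TO4000', '5000TO6000',
                                'AC2000', 'BB3300', '9000TO9900']})

# .values strips the (0, 1) column labels so the assignment can't misalign
df[['Column 2', 'Column 3']] = df['Column 1'].str.extract(r'(\d+)TO(\d+)').values
print(df)
```

Note that extract returns strings, so the new columns hold '3000' etc.; add .astype(float) if numeric columns are needed.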
Use str.split and give the resulting columns new names:
new = df["Column 1"].str.split("TO", n=1, expand=True)
df["Col1"] = new[0]
df["Col2"] = new[1]
Note that this behaves differently from the extract approach for rows without "TO": the whole original string ends up in Col1 instead of NaN.
I am trying to create a new column in a pandas dataframe. I have names in one column and want to assign numbers to them in a new column: if a name is repeated sequentially, the rows get the same number; if it appears again after different names, it should get a new number.
For example, my df is like
Name
Stephen
Stephen
Mike
Carla
Carla
Stephen
my new column should be
Numbers
0
0
1
2
2
3
Sorry, I couldn't paste my dataframe here.
Try:
df['Numbers'] = (df['Name'] != df['Name'].shift()).cumsum() - 1
Output:
Name Numbers
0 Stephen 0
1 Stephen 0
2 Mike 1
3 Carla 2
4 Carla 2
5 Stephen 3
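A quick runnable check of the shift/cumsum trick on the example names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Stephen', 'Stephen', 'Mike',
                            'Carla', 'Carla', 'Stephen']})

# A comparison with the shifted column is True wherever a new run of
# names starts; cumsum numbers those runs, and -1 makes it zero-based
df['Numbers'] = (df['Name'] != df['Name'].shift()).cumsum() - 1
print(df)
```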