I would like to combine two columns (Column 1 + Column 2) for each row individually. Unfortunately, it didn't work for me. How do I solve this?
import pandas as pd
import numpy as np
d = {'Nameid': [1, 2, 3, 1], 'Name': ['Michael', 'Max', 'Susan', 'Michael'], 'Project': ['S455', 'G874', 'B7445', 'Z874']}
df = pd.DataFrame(data=d)
display(df.head(10))
df['Dataframe']='df'
d2 = {'Nameid': [4, 2, 5, 1], 'Name': ['Petrova', 'Michael', 'Mike', 'Gandalf'], 'Project': ['Z845', 'Q985', 'P512', 'Y541']}
df2 = pd.DataFrame(data=d2)
display(df2.head(10))
df2['Dataframe']='df2'
What I tried
df_merged = pd.concat([df,df2])
df_merged.head(10)
df3 = pd.concat([df,df2])
df3['unique_string'] = df['Nameid'].astype(str) + df['Dataframe'].astype(str)
df3.head(10)
As you can see, not every row was combined with its own values; it seems the values from the first DataFrame were used for all of the rows. How can I combine the two columns row by row?
What I want
You can simply concatenate the strings like this. There is no need for df['Dataframe'].astype(str), because that column already contains strings:
df_merged['unique_string'] = df_merged.Nameid.astype(str) + df_merged.Dataframe
print(df_merged)
Nameid Name Project Dataframe unique_string
0 1 Michael S455 df 1df
1 2 Max G874 df 2df
2 3 Susan B7445 df 3df
3 1 Michael Z874 df 1df
0 4 Petrova Z845 df2 4df2
1 2 Michael Q985 df2 2df2
2 5 Mike P512 df2 5df2
3 1 Gandalf Y541 df2 1df2
Make sure you use df3 on the right-hand side as well and assign back to df3; also call reset_index so that every row gets a unique label:
df3 = df3.reset_index()
df3['unique_string'] = df3['Nameid'].astype(str) + df3['Dataframe'].astype(str)
Use df3 instead of df, and also pass ignore_index=True so that a new default index is created:
df3 = pd.concat([df,df2], ignore_index=True)
df3['unique_string'] = df3['Nameid'].astype(str) + df3['Dataframe']
print (df3)
Nameid Name Project Dataframe unique_string
0 1 Michael S455 df 1df
1 2 Max G874 df 2df
2 3 Susan B7445 df 3df
3 1 Michael Z874 df 1df
4 4 Petrova Z845 df2 4df2
5 2 Michael Q985 df2 2df2
6 5 Mike P512 df2 5df2
7 1 Gandalf Y541 df2 1df2
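A minimal sketch of why the original attempt repeated values, using the question's data: pd.concat without ignore_index keeps the labels 0-3 from both frames, and a Series built from df is aligned on those labels, so both rows labelled 0 receive df's row-0 value (and so on). The Series.str.cat call in the fixed version is an alternative to + that is swapped in here for illustration; it should produce the same strings.

```python
import pandas as pd

df = pd.DataFrame({'Nameid': [1, 2, 3, 1],
                   'Name': ['Michael', 'Max', 'Susan', 'Michael'],
                   'Project': ['S455', 'G874', 'B7445', 'Z874']})
df['Dataframe'] = 'df'
df2 = pd.DataFrame({'Nameid': [4, 2, 5, 1],
                    'Name': ['Petrova', 'Michael', 'Mike', 'Gandalf'],
                    'Project': ['Z845', 'Q985', 'P512', 'Y541']})
df2['Dataframe'] = 'df2'

# Without ignore_index the combined index is 0, 1, 2, 3, 0, 1, 2, 3.
broken = pd.concat([df, df2])
# The right-hand side is built from df (labels 0-3), so pandas aligns it on
# those labels and both copies of each label receive df's value.
broken['unique_string'] = df['Nameid'].astype(str) + df['Dataframe']

# With a fresh 0..7 index and both operands taken from the same frame,
# every row is combined with its own values.
fixed = pd.concat([df, df2], ignore_index=True)
fixed['unique_string'] = fixed['Nameid'].astype(str).str.cat(fixed['Dataframe'])

print(broken['unique_string'].tolist())
print(fixed['unique_string'].tolist())
```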
Related
I have the following dataframe containing scores for a competition, as well as a column that counts the entry number for each person.
import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Jim', 'John','Jim', 'John','Jim','John','Jim','John','Jim','Jack','Jack','Jack','Jack'],'Score': [10,8,9,3,5,0, 1, 2,3, 4,5,6,8,9]})
df['Entry_No'] = df.groupby(['Name']).cumcount() + 1
df
Then I have another table that stores data on the maximum number of entries that each person can have:
df2 = pd.DataFrame({'Name': ['John', 'Jim', 'Jack'],'Limit': [2,3,1]})
df2
I am trying to drop the rows from df whose entry number is greater than that person's Limit in df2, so that my expected output is this:
If there are any ideas on how to help me achieve this that would be fantastic! Thanks
You can use pandas.merge to create another dataframe and then filter the rows by your condition:
df3 = pd.merge(df, df2, on="Name", how="left")
df3[df3["Entry_No"] <= df3["Limit"]][df.columns].reset_index(drop=True)
Name Score Entry_No
0 John 10 1
1 Jim 8 1
2 John 9 2
3 Jim 3 2
4 Jim 0 3
5 Jack 5 1
I used how="left" to keep the order of df and reset_index(drop=True) to reset the index of the resulting dataframe.
You could join the 2 dataframes, and then drop with a condition:
import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Jim', 'John','Jim', 'John','Jim','John','Jim','John','Jim','Jack','Jack','Jack','Jack'],'Score': [10,8,9,3,5,0, 1, 2,3, 4,5,6,8,9]})
df['Entry_No'] = df.groupby(['Name']).cumcount() + 1
df2 = pd.DataFrame({'Name': ['John', 'Jim', 'Jack'],'Limit': [2,3,1]})
df2 = df2.set_index('Name')
df = df.join(df2, on='Name')
df.drop(df[df.Entry_No>df.Limit].index, inplace = True)
This gives the expected output, plus the joined Limit column (drop it with df.drop(columns='Limit') if you don't need it).
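A sketch of a third variant that is not from the answers above: look up each row's limit with Series.map instead of merging, which keeps df's row order and adds no extra column. It assumes every name in df appears in df2; a name missing from df2 maps to NaN, the comparison is then False, and that row is dropped.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jim', 'John', 'Jim', 'John', 'Jim', 'John',
                            'Jim', 'John', 'Jim', 'Jack', 'Jack', 'Jack', 'Jack'],
                   'Score': [10, 8, 9, 3, 5, 0, 1, 2, 3, 4, 5, 6, 8, 9]})
df['Entry_No'] = df.groupby('Name').cumcount() + 1
df2 = pd.DataFrame({'Name': ['John', 'Jim', 'Jack'], 'Limit': [2, 3, 1]})

# Look up each row's limit by name; the result shares df's index.
limits = df['Name'].map(df2.set_index('Name')['Limit'])
# Keep only the entries within the limit; df's row order is preserved.
result = df[df['Entry_No'] <= limits].reset_index(drop=True)
print(result)
```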
How do I merge a column from one pandas DataFrame into another DataFrame when the new column has fewer rows? Specifically, I need the new column to be filled with NaN in the first few rows of the merged DataFrame instead of the last few rows. Please refer to the picture. Thanks.
Use:
import pandas as pd
df1 = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
})
df2 = pd.DataFrame({
'SMA':list('rty')
})
df3 = df1.join(df2.set_index(df1.index[-len(df2):]))
Or:
df3 = pd.concat([df1, df2.set_index(df1.index[-len(df2):])], axis=1)
print (df3)
A B SMA
0 a 4 NaN
1 b 5 NaN
2 c 4 NaN
3 d 5 r
4 e 5 t
5 f 4 y
How it works:
First, the last len(df2) labels of df1's index are selected:
print (df1.index[-len(df2):])
RangeIndex(start=3, stop=6, step=1)
Then these labels replace the existing index of df2 via DataFrame.set_index:
print (df2.set_index(df1.index[-len(df2):]))
SMA
3 r
4 t
5 y
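A sketch of the same idea using DataFrame.assign instead of join or concat, assuming df1 has at least as many rows as df2: the values of df2['SMA'] are given the last len(df2) labels of df1's index, and the assignment aligns on those labels, leaving NaN in the earlier rows.

```python
import pandas as pd

df1 = pd.DataFrame({'A': list('abcdef'), 'B': [4, 5, 4, 5, 5, 4]})
df2 = pd.DataFrame({'SMA': list('rty')})

# Re-label df2's values with the last len(df2) labels of df1's index ...
sma = pd.Series(df2['SMA'].to_numpy(), index=df1.index[-len(df2):])
# ... then assign: alignment on the index leaves the first rows as NaN.
df3 = df1.assign(SMA=sma)
print(df3)
```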
I have two dataframes; the code for both is below:
import pandas as pd
df1 = pd.DataFrame({'income1': [-13036.0, 1200.0, -12077.5, 1100.0],
'income2': [-30360.0, 2000.0, -2277.5, 1500.0],
})
df2 = pd.DataFrame({'name1': ['abc', 'deb', 'hghg', 'gfgf'],
'name2': ['dfd', 'dfd1', 'df3df', 'fggfg'],
})
I want to combine the 2 dfs to get a single df with names against the respective income values, as shown below. Any help is appreciated. Please note that I want it in the same sequence as shown in my output.
You can convert the values to NumPy arrays, flatten them with np.ravel, and pass them to the DataFrame constructor:
import numpy as np
df = pd.DataFrame({'Name': np.ravel(df2.to_numpy()),
'Income': np.ravel(df1.to_numpy())})
print (df)
Name Income
0 abc -13036.0
1 dfd -30360.0
2 deb 1200.0
3 dfd1 2000.0
4 hghg -12077.5
5 df3df -2277.5
6 gfgf 1100.0
7 fggfg 1500.0
Or use concat with DataFrame.stack, and Series.reset_index(drop=True) to get a default index:
df = pd.concat([df2.stack().reset_index(drop=True),
df1.stack().reset_index(drop=True)],axis=1, keys=['Name','Income'])
print (df)
Name Income
0 abc -13036.0
1 dfd -30360.0
2 deb 1200.0
3 dfd1 2000.0
4 hghg -12077.5
5 df3df -2277.5
6 gfgf 1100.0
7 fggfg 1500.0
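Try this (note that it places all name1/income1 values before the name2/income2 values and repeats the index labels 0-3, so the row order differs from the interleaved order requested):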
incomes = pd.concat([df1.income1, df1.income2], axis = 0)
names = pd.concat([df2.name1 , df2.name2] , axis = 0)
df = pd.DataFrame({'Name': names, 'Incomes': incomes})
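If you prefer the concat approach directly above but need the interleaved order from the question, one option (a sketch, not taken from the answers) is a stable sort on the repeated 0-3 labels, which places each row's name1/income1 directly before its name2/income2. The helper name interleave is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'income1': [-13036.0, 1200.0, -12077.5, 1100.0],
                    'income2': [-30360.0, 2000.0, -2277.5, 1500.0]})
df2 = pd.DataFrame({'name1': ['abc', 'deb', 'hghg', 'gfgf'],
                    'name2': ['dfd', 'dfd1', 'df3df', 'fggfg']})


def interleave(first, second):
    """Hypothetical helper: alternate the values of two equally indexed Series."""
    # Both Series carry the labels 0..3; a stable (mergesort) sort on the
    # index keeps first[i] directly before second[i].
    combined = pd.concat([first, second]).sort_index(kind='mergesort')
    return combined.reset_index(drop=True)


df = pd.DataFrame({'Name': interleave(df2.name1, df2.name2),
                   'Incomes': interleave(df1.income1, df1.income2)})
print(df)
```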
I have two data frames, df1 and df2. Both include information such as 'ID', 'Name', 'Score' and 'Status'. What I need is to update the 'Score' in df1 if that person's status in df2 is "Edit", and to drop the row from df1 if that person's status in df2 is "Cancel".
For example:
import pandas as pd
dic1 = {'ID': [1, 2, 3],
'Name':['Jack', 'Tom', 'Annie'],
'Score':[20, 10, 25],
'Status':['New', 'New', 'New']}
dic2 = {'ID': [1, 2],
'Name':['Jack', 'Tom'],
'Score':[28, 10],
'Status':['Edit', 'Cancel']}
df1 = pd.DataFrame(dic1)
df2 = pd.DataFrame(dic2)
The output should look like this:
ID Name Score Status
1 Jack 28 Edit
3 Annie 25 New
Any pointers or hints?
Use DataFrame.merge with a left join first, then filter out the Cancel rows as well as the columns ending with _, which come from the original DataFrame:
df = df1.merge(df2, on=['ID','Name'], how='left', suffixes=('_', ''))
df = df.loc[df['Status'] != 'Cancel', ~df.columns.str.endswith('_')]
print (df)
ID Name Score Status
0 1 Jack 28.0 Edit
2 3 Annie NaN NaN
EDIT: Add DataFrame.combine_first to fill in the missing values of rows that are not in df2:
df = df1.merge(df2, on=['ID','Name'], how='left', suffixes=('', '_'))
df = df.loc[df['Status_'] != 'Cancel']
df1 = df.loc[:, df.columns.str.endswith('_')]
df = df1.rename(columns=lambda x: x.rstrip('_')).combine_first(df).drop(df1.columns, axis=1)
print (df)
ID Name Score Status
0 1.0 Jack 28.0 Edit
2 3.0 Annie 25.0 New
Use the DataFrame.update method of pandas:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html
df1.update(df2)
print(df1)
df1 = df1[df1.Status != "Cancel"]
print(df1)
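df1.update(df2) aligns rows by index label, so the answer above works because Jack and Tom happen to sit at positions 0 and 1 in both frames. A sketch that keys both frames by ID first, so the result does not depend on row order (depending on your pandas version, Score may come back as float):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Jack', 'Tom', 'Annie'],
                    'Score': [20, 10, 25], 'Status': ['New', 'New', 'New']})
df2 = pd.DataFrame({'ID': [1, 2], 'Name': ['Jack', 'Tom'],
                    'Score': [28, 10], 'Status': ['Edit', 'Cancel']})

# update() aligns on the index, so key both frames by ID first.
updated = df1.set_index('ID')
updated.update(df2.set_index('ID'))  # modifies `updated` in place, returns None

# Drop the cancelled entries and turn ID back into a column.
result = updated[updated['Status'] != 'Cancel'].reset_index()
print(result)
```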
I have two dataframes in Python. I want to update rows in the first dataframe using matching values from another dataframe; the second dataframe serves as an override.
Here is an example with sample data and code:
DataFrame 1:
DataFrame 2:
I want to update DataFrame 1 based on matching Code and Name. In this example, DataFrame 1 should be updated as below:
Note: the row with Code = 2 and Name = Company2 is updated with the value 1000 (coming from DataFrame 2).
import pandas as pd
data1 = {
'Code': [1, 2, 3],
'Name': ['Company1', 'Company2', 'Company3'],
'Value': [200, 300, 400],
}
df1 = pd.DataFrame(data1, columns= ['Code','Name','Value'])
data2 = {
'Code': [2],
'Name': ['Company2'],
'Value': [1000],
}
df2 = pd.DataFrame(data2, columns= ['Code','Name','Value'])
Any pointers or hints?
Using DataFrame.update, which aligns on indices (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html):
>>> df1.set_index('Code', inplace=True)
>>> df1.update(df2.set_index('Code'))
>>> df1.reset_index() # to recover the initial structure
Code Name Value
0 1 Company1 200.0
1 2 Company2 1000.0
2 3 Company3 400.0
You can use concat + drop_duplicates, which updates the common rows and adds the new rows from df2:
pd.concat([df1,df2]).drop_duplicates(['Code','Name'],keep='last').sort_values('Code')
Out[1280]:
Code Name Value
0 1 Company1 200
0 2 Company2 1000
2 3 Company3 400
Update, following the comments: to match on both Code and Name, use them together as the index:
df1.set_index(['Code', 'Name'], inplace=True)
df1.update(df2.set_index(['Code', 'Name']))
df1.reset_index(inplace=True)  # drop=True would discard the Code and Name columns
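To illustrate the claim that concat + drop_duplicates also adds new rows, here is a sketch with a made-up df2 that contains one existing company (Code 2) and one hypothetical new one (Company4, Code 4, value 999):

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
# Hypothetical override frame: one existing row (Code 2) and one new row (Code 4).
df2 = pd.DataFrame({'Code': [2, 4],
                    'Name': ['Company2', 'Company4'],
                    'Value': [1000, 999]})

result = (pd.concat([df1, df2])
          # df2's rows come last, so keep='last' lets them win on (Code, Name)
          .drop_duplicates(['Code', 'Name'], keep='last')
          .sort_values('Code')
          .reset_index(drop=True))
print(result)
```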
You can merge the data first and then use numpy.where:
import numpy as np
updated = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))
updated['Value'] = np.where(pd.notnull(updated['Value_new']), updated['Value_new'], updated['Value'])
updated.drop('Value_new', axis=1, inplace=True)
Code Name Value
0 1 Company1 200.0
1 2 Company2 1000.0
2 3 Company3 400.0
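A slightly shorter variant of the np.where step (a sketch using the same merge) is Series.combine_first, which takes Value_new where it is present and falls back to Value elsewhere:

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

updated = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))
# Value_new is NaN for rows without an override; combine_first fills those
# positions from the original Value column.
updated['Value'] = updated['Value_new'].combine_first(updated['Value'])
updated = updated.drop(columns='Value_new')
print(updated)
```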
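There is an update function available. Note that it aligns on the index, so set the matching columns as the index first; calling df1.update(df2) directly on these frames would overwrite the Code 1 row (index 0) with df2's values.
Example:
df1 = df1.set_index(['Code', 'Name'])
df1.update(df2.set_index(['Code', 'Name']))
df1 = df1.reset_index()
For more info: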
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html
You can align indices and then use combine_first:
res = df2.set_index(['Code', 'Name'])\
.combine_first(df1.set_index(['Code', 'Name']))\
.reset_index()
print(res)
# Code Name Value
# 0 1 Company1 200.0
# 1 2 Company2 1000.0
# 2 3 Company3 400.0
Assuming Name and Code are redundant identifiers, you can also do:
import pandas as pd
vdic = pd.Series(df2.Value.values, index=df2.Name).to_dict()
df1.loc[df1.Name.isin(vdic.keys()), 'Value'] = df1.loc[df1.Name.isin(vdic.keys()), 'Name'].map(vdic)
# Code Name Value
#0 1 Company1 200
#1 2 Company2 1000
#2 3 Company3 400
You can use pd.Series.where on the result of left-joining df1 and df2:
merged = df1.merge(df2, on=['Code', 'Name'], how='left')
df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value)
>>> df1
Code Name Value
0 1 Company1 200.0
1 2 Company2 1000.0
2 3 Company3 400.0
You can change the line to
df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value).astype(int)
in order to return the value to be an integer.
There's something I often do.
I merge 'left' first:
df_merged = pd.merge(df1, df2, how = 'left', on = 'Code')
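Pandas will create columns with the suffix '_x' (for your left dataframe) and '_y' (for your right dataframe).
You want the ones that came from the right, so just remove the columns ending in '_x' and rename the '_y' ones. Note that rows without a match in df2 get NaN in the '_y' columns, so this only gives the desired result when df2 contains every Code in df1: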
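for col in list(df_merged.columns):
    if col.endswith('_x'):
        df_merged.drop(columns=col, inplace=True)
    elif col.endswith('_y'):
        # slice off the suffix; col.strip('_y') would also strip any
        # leading/trailing '_' or 'y' characters of the name itself
        new_name = col[:-2]
        df_merged.rename(columns={col: new_name}, inplace=True)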
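Because rows without a match in df2 end up with NaN in the _y columns, a sketch that falls back to the left-hand value in those rows (a variation on the loop above, not the author's code) could look like this:

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

df_merged = pd.merge(df1, df2, how='left', on='Code')  # adds _x / _y suffixes

for right_col in [c for c in df_merged.columns if c.endswith('_y')]:
    base = right_col[:-len('_y')]
    left_col = base + '_x'
    # Prefer the right-hand (override) value; fall back to the left one when
    # the row had no match in df2 and the _y value is NaN.
    df_merged[base] = df_merged[right_col].fillna(df_merged[left_col])
    df_merged = df_merged.drop(columns=[left_col, right_col])

print(df_merged)
```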
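Append the dataset, drop the duplicates by Code, and sort the values. DataFrame.append was removed in pandas 2.0, so pd.concat is used here, with df1 in the role of combined_df:
combined_df = pd.concat([df1, df2]).drop_duplicates(['Code'], keep='last').sort_values('Code')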
None of the above solutions worked for my particular example, which I think is due to the dtypes of my columns, but I eventually came to this solution:
indexes = df1.loc[df1.Code.isin(df2.Code.values)].index
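# .at is meant for a single cell; .loc is the accessor for a set of row labels.
# This assumes df2's rows are in the same order as the matching rows of df1.
df1.loc[indexes, 'Value'] = df2['Value'].values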