merging dataframes only by certain columns - python

I am trying to merge dataframes by index and keep only certain columns in the result.
result = pd.concat([self.retailer_categories_probes_df['euclidean_distance'], self.retailers_categories_df['euclidean_distance']])
But in the result I only get the 'euclidean_distance' column from the first table.
Any idea what is wrong?
Also, how can I give names to the destination columns?

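By default, pd.concat stacks its inputs along axis=0 (rows), so the second Series is appended below the first rather than placed beside it. I think you need axis=1: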
result = pd.concat([self.retailer_categories_probes_df['euclidean_distance'], self.retailers_categories_df['euclidean_distance']], axis=1)
See pd.concat() docs
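The question about naming the destination columns is not covered above. One option, sketched here with made-up data, is the keys argument of pd.concat: when Series are concatenated with axis=1, the keys become the column names. The frames probes and categories and the names probe_distance and category_distance are placeholders, not names from the question.

```python
import pandas as pd

# Toy stand-ins for the two dataframes in the question (hypothetical data).
probes = pd.DataFrame({"euclidean_distance": [0.1, 0.2, 0.3]}, index=["a", "b", "c"])
categories = pd.DataFrame({"euclidean_distance": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

# Default axis=0: the two Series are stacked into one long Series.
stacked = pd.concat([probes["euclidean_distance"], categories["euclidean_distance"]])

# axis=1 aligns on the index; keys supplies the new column names.
result = pd.concat(
    [probes["euclidean_distance"], categories["euclidean_distance"]],
    axis=1,
    keys=["probe_distance", "category_distance"],
)
print(result)
```

Calling result.columns = [...] after the concat would work as well; keys just does it in one step.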

Related

Inner Join two dataframes does not sort properly - python

I am trying to merge two dataframes with an inner join and append the values. I was able to perform the join, but for some reason the values are not ordered properly in each column.
To explain more about this:
Please see the screenshot below, where my first dataframe has the stored values of each column.
My second dataframe has the string values which need to be replaced with the values stored in dataframe 1 above.
Below is the output that I got, but when you compare the values with dataframe 2, they are not assigned properly. For example, in row 1 of dataframe 2, Column 1 (i.e. the second column in dataframe 2) should have the value 1.896552, but in my output I have something else.
Below is the code I used to produce the above result.
Joined_df_wna_test = pd.DataFrame()
for col in Unseen_cleaned_rmv_un_2:
    Joined_data = pd.merge(Output_df_unseen, my_dataframe_categorical, on=col, how='inner')
    Joined_df = pd.DataFrame(Joined_data)
    Joined_df_wna_test[col] = Joined_df.Value
Joined_df_wna_test
Could someone please help me overcome this issue?
I found the answer to this.
The how='left' part is what makes the result keep the row order of the left dataframe:
Joined_data = Unseen_cleaned_rmv_un.merge(my_dataframe_categorical, on=col, how='left')
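A small sketch of why the join type matters here, using made-up data (the frames left and lookup are hypothetical). A left merge keeps every row of the left dataframe, in its original order, and puts NaN where there is no match. An inner merge drops unmatched rows, so the remaining values no longer sit at the same row positions as in the original dataframe.

```python
import pandas as pd

# Hypothetical data: 'x' has no entry in the lookup table.
left = pd.DataFrame({"Column1": ["b", "x", "a", "b"]})
lookup = pd.DataFrame({"Column1": ["a", "b"], "Value": [1.0, 2.0]})

# Left merge: one output row per left row, same order, NaN for 'x'.
left_joined = left.merge(lookup, on="Column1", how="left")

# Inner merge: the 'x' row is dropped, so later rows shift up by one.
inner_joined = left.merge(lookup, on="Column1", how="inner")

print(left_joined)
print(inner_joined)
```

This one-row-per-left-row behaviour assumes the key is unique in the lookup table; duplicate keys there would multiply rows even with how='left'.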

Python Merge Two DataFrames Only Retrieve Specific Columns in the Result

Hi - I want to merge two pandas DataFrames, but I don't want to bring over ALL of the columns from both dataframes to my new dataframe. In the picture below, if I join df1 and df2 on 'acct' and want to bring back all the columns from df1 and ONLY 'entity' from df2, how would I write that? I don't want to have to drop any columns afterwards, so a normal merge isn't what I'm looking for. Can anyone help? Thanks!
You can select the columns you need from df2 directly inside the merge call. The selection df2[['acct', 'entity']] creates a new dataframe object, so the underlying objects df1 and df2 remain unchanged. An example would look like this:
df_result = df1.merge(df2[['acct', 'entity']], on='acct')
This will let you do your partial merge without modifying either original dataframe.
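For illustration, here is the same idea on toy data. The balance and region columns and all values are invented, since the picture from the question is not available. how='left' is my own addition to keep every row of df1; the call above uses the default inner join.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 from the question.
df1 = pd.DataFrame({"acct": [1, 2, 3], "balance": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({
    "acct": [1, 2, 3],
    "entity": ["A", "B", "C"],
    "region": ["N", "S", "E"],  # a column we do NOT want in the result
})

# Only 'acct' (the join key) and 'entity' are passed to merge,
# so 'region' never reaches the result and df2 itself is untouched.
df_result = df1.merge(df2[["acct", "entity"]], on="acct", how="left")
print(df_result)
```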

Check if two pandas dataframes are equal with mismatching indexes python

I was wondering if it is possible to check the similarity between the two dataframes below. They are the same, however the first and the third rows are flipped. Is there a way to check that these dataframes are the same regardless of the order of the index? Thank you for any help!
You can use an outer merge with an indicator column and then look for rows that do not appear in both dataframes. If the filtered result is empty, the two dataframes contain the same rows.
df_a = pd.DataFrame([['a','b','c'], ['c','d','e'], ['e','f','g']], columns=['col1','col2','col3'])
df_b = pd.DataFrame([['e','f','g'], ['c','d','e'], ['a','b','c']], columns=['col1','col2','col3'])
df_merged = pd.merge(df_a, df_b, on=df_a.columns.tolist(), how='outer', indicator='Exist')
print(df_merged[(df_merged['Exist'] != 'both')])
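Alternatively, sort both dataframes in the same way, reset the index, and then compare them. Sorting and comparing one column at a time is a weaker check: it only confirms that each column holds the same values, not that the rows match.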
If this is ok and you need help writing the code, let me know
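The sort-and-compare approach could be sketched roughly as below. The helper name same_rows is made up for this example. It assumes both dataframes have the same column names and sortable values. Note that DataFrame.equals also requires matching dtypes.

```python
import pandas as pd

df_a = pd.DataFrame([['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g']],
                    columns=['col1', 'col2', 'col3'])
df_b = pd.DataFrame([['e', 'f', 'g'], ['c', 'd', 'e'], ['a', 'b', 'c']],
                    columns=['col1', 'col2', 'col3'])


def same_rows(left, right):
    """Return True if both frames hold the same rows, ignoring row order and index."""
    cols = left.columns.tolist()
    if set(cols) != set(right.columns):
        return False
    # Sort by every column so identical rows end up at identical positions,
    # then drop the old index so it does not affect the comparison.
    left_sorted = left.sort_values(by=cols).reset_index(drop=True)
    right_sorted = right[cols].sort_values(by=cols).reset_index(drop=True)
    return left_sorted.equals(right_sorted)


print(same_rows(df_a, df_b))
```

Unlike the merge-based check, this version also notices when a row occurs a different number of times in each dataframe.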

How can I join two dataframes with different dtypes?

I have a dataframe (df1) whose index is a date range and which has no columns, and another dataframe (df2) with float values in every column.
I tried joining a specific column from df2 to df1 using the .join() method and ended up with all values as NaN. What should I do to solve this?
It's unclear what you mean without an example of the data or its shape, and without more details about what kind of join you're trying to do. It sounds like you are trying to put dataframes side by side without a shared column or index to join on. join and merge align rows on matching keys or index values, so if the two dataframes have no values in common there, you'll end up with NaNs. If I'm correct and you just want a concatenation of dataframes, then you can use concat. Note that pd.concat with axis=1 also aligns on the index, so the indexes of df1 and df2 must match for the values to line up. I can't provide exact code without more details, but it would look something like this:
new_df = pd.concat([df1, df2[['whatever_column_you_want_to_concatenate']]], axis=1)
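If the indexes do not match, one way to line the values up by position is to give the df2 column the index of df1 before joining. This is only a sketch under the assumption that df2 has exactly as many rows as df1 and that they are already in date order. The column names price and volume and all values are invented.

```python
import pandas as pd

# Hypothetical data: df1 has a date index and no columns,
# df2 has float columns and a default 0..n-1 index.
df1 = pd.DataFrame(index=pd.date_range("2020-01-01", periods=3, freq="D"))
df2 = pd.DataFrame({"price": [1.5, 2.5, 3.5], "volume": [10.0, 20.0, 30.0]})

# Replace df2's integer index with df1's dates (positional match),
# so that join finds identical index values on both sides.
price_by_date = df2[["price"]].set_axis(df1.index, axis=0)
new_df = df1.join(price_by_date)
print(new_df)
```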

pandas df.fillna - filling NaNs after outer join with correct values

I have two dataframes that share some columns.
I'm trying to:
1) Merge the two dataframes together, i.e. adding the columns which are different:
diff = df2[df2.columns.difference(df1.columns)]
merged = pd.merge(df1, diff, how='outer', sort=False, on='ID')
Up to here, everything works as expected.
2) Then replace the NaN values with the values from df2:
merged = merged[~merged.index.duplicated(keep='first')]
merged.fillna(value=df2)
And it is here that I get:
pandas.core.indexes.base.InvalidIndexError
I don't have any duplicates, and I can't find any information as to what can cause this.
The solution to this problem is to use a different method: combine_first().
This way, each missing value is filled with the value at the same index and column in the other dataframe, as described in "Merging together values within Series or DataFrame columns".
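A minimal sketch of combine_first() on made-up data, assuming both dataframes are indexed by ID (the column names a and b are placeholders). Values already present in df1 win. Gaps in df1, and columns that exist only in df2, are taken from df2.

```python
import numpy as np
import pandas as pd

# Hypothetical frames sharing the 'ID' key and the column 'a'.
df1 = pd.DataFrame({"ID": [1, 2, 3], "a": [1.0, np.nan, 3.0]}).set_index("ID")
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "a": [9.0, 2.0, 9.0],
                    "b": [10.0, 20.0, 30.0]}).set_index("ID")

# Non-null values of df1 are kept; the NaN at ID 2 is filled from df2,
# and column 'b' (missing in df1) is added from df2.
filled = df1.combine_first(df2)
print(filled)
```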
If the number of rows changes because of the merge, fillna can sometimes raise an error. Try the following:
merged.fillna(df2.groupby(level=0).transform("mean"))
