This question already has answers here:
Pandas Merging 101
(8 answers)
Closed last year.
I have a lookup table as a dataframe (1,000 rows) consisting of codes and labels. I have another dataframe (200,000 rows) consisting of codes and geometries.
I need to get the label for each code by looking it up in the lookup dataframe.
The output should be a dataframe.
I tried it as follows:
import pandas as pd

df = pd.read_csv(lookup_filepath)   # lookup table with 'codes' and 'labels' columns
codes = df['codes'].values
labels = df['labels'].values

df2 = pd.read_csv(data_filepath)    # data table with a 'code' column
print(df2.shape)

for ix in df2.index:
    code = df2.loc[ix, 'code']
    df2.loc[ix, 'label'] = labels[codes == code][0]
print(df2)
The result is correct, but it's very slow; looping over the rows is the bottleneck.
Can you help me?
You should use the merge method of DataFrames (https://pandas.pydata.org/docs/reference/api/pandas.merge.html). It lets you join two dataframes on a common column. Your code should look like this:
df2 = df2.merge(df, left_on="code", right_on="codes", how="left")
# Check labels using df2["labels"]
The common column names are specified in the left_on and right_on parameters. The parameter how='left' ensures that all rows from df2 are preserved even if a code has no match in the lookup table.
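As a sketch, with toy stand-ins for the two CSVs (the column names codes, labels, and code come from the question; the data values are made up):

```python
import pandas as pd

# Hypothetical lookup table: codes -> labels
df = pd.DataFrame({"codes": [1, 2, 3], "labels": ["a", "b", "c"]})
# Hypothetical data table with a 'code' column
df2 = pd.DataFrame({"code": [2, 1, 2, 4]})

# One vectorized merge replaces the whole loop
df2 = df2.merge(df, left_on="code", right_on="codes", how="left")
print(df2["labels"].tolist())  # ['b', 'a', 'b', nan]
```

Code 4 has no entry in the lookup table, so its label comes back as NaN rather than raising an error, which is the practical benefit of how='left' here.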
I have two variables (dataframes), one 47 columns wide and the other 87; call them DF1 and DF2.
I also have a variable (dataframe) called full_data. DF1 and DF2 are two different subsets of data that I want to merge together once I find that two rows are equal.
I am doing everything I want so far except appending the right value to the new dataframe.
Below is the line of code I have been playing around with:
full_data = full_data.append(pd.concat([df1[i:i+1].copy(),df2[j:j+1]].copy(), axis=1), ignore_index = True)
Once I find that the rows in DF1 and DF2 are equal, I am trying to read both rows and put them one after the other as a single row in the variable full_data. What is happening right now is that the line of code writes two rows instead of the single one I want.
What I want is full_data.append(DF1 DF2), and right now I am getting:
full_data(i) = DF1
full_data(i+1) = DF2
Any help would be appreciated.
EM
In the end I solved my problem. I was probably not clear enough in my question, but what was happening when concatenating is that I was getting duplicated or multiple rows, when the expected result was a single concatenated row.
The issue turned out to be the indexing: the indexes had to be reset, because pandas aligns on the index when concatenating.
I found an example and explanation here
My solution:
df3 = df2[j:j+1].copy()
df4 = df1[i:i+1].copy()
full_data = full_data.append(
    pd.concat([df4.reset_index(drop=True), df3.reset_index(drop=True)], axis=1),
    ignore_index=True
)
I first created a copy of my variables and then reset the indexes.
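To illustrate why the index reset matters, here is a small self-contained sketch with made-up frames standing in for DF1 and DF2:

```python
import pandas as pd

# Made-up stand-ins for the two source frames
df1 = pd.DataFrame({"a": [10, 11, 12]})
df2 = pd.DataFrame({"b": [20, 21, 22]})
i, j = 2, 0  # positions of the rows found to be equal

df4 = df1[i:i+1].copy()  # carries index 2
df3 = df2[j:j+1].copy()  # carries index 0

# Without reset_index, concat aligns on the differing indexes and
# produces two rows padded with NaNs; resetting both to 0 yields one row.
row = pd.concat([df4.reset_index(drop=True), df3.reset_index(drop=True)], axis=1)
print(row.shape)  # (1, 2)
```

Note that DataFrame.append was removed in pandas 2.0; full_data = pd.concat([full_data, row], ignore_index=True) is the modern equivalent of the append call in the solution above.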
Good day All,
I have two data frames that need to be merged, which is a little different from the examples I found so far, and I could not get it working. What I am currently getting is wrong, and I am sure it has to do with the index, as dataframe 1 only has one record. I need to copy the contents of dataframe 1 into new columns of dataframe 2 for all rows.
Current problem highlighted in red
I have tried merge, append, reset index etc...
DF 1:
Dataframe 1
DF 2:
Dataframe 2
Output Requirement:
Required Output
Any suggestions would be highly appreciated
Update:
I got it to work using the below statements, is there a more dynamic way than specifying the column names?
mod_df['Type'] = mod_df['Type'].fillna(method="ffill")
mod_df['Date'] = mod_df['Date'].fillna(method="ffill")
mod_df['Version'] = mod_df['Version'].fillna(method="ffill")
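Assuming the columns to fill are exactly the ones that came over from dataframe 1, one more dynamic option is to forward-fill the whole frame at once; a sketch with made-up data (note that fillna(method="ffill") is deprecated in recent pandas in favour of .ffill()):

```python
import pandas as pd
import numpy as np

# Hypothetical merged frame where dataframe 1's values landed only in row 0
mod_df = pd.DataFrame({
    "Type": ["A", np.nan, np.nan],
    "Date": ["2023-01-01", np.nan, np.nan],
    "Version": [1.0, np.nan, np.nan],
    "Value": [10, 20, 30],
})

# Forward-fill every column at once instead of naming each one
mod_df = mod_df.ffill()
print(mod_df["Type"].tolist())  # ['A', 'A', 'A']
```

Columns with no missing values, like Value here, pass through ffill unchanged, so there is no need to enumerate column names.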
Assuming you have a single row in df1, use a cross merge:
out = df2.merge(df1, how='cross')
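A minimal sketch of the cross merge, with hypothetical columns standing in for the real ones:

```python
import pandas as pd

# df1: a single row of metadata to broadcast onto every row of df2
df1 = pd.DataFrame({"Type": ["A"], "Version": [1]})
df2 = pd.DataFrame({"Value": [10, 20, 30]})

# how='cross' (pandas >= 1.2) pairs every row of df2 with every row of df1;
# with a one-row df1 this simply copies its columns onto each df2 row.
out = df2.merge(df1, how='cross')
print(out.shape)  # (3, 3)
```

Because df1 has exactly one row, the cross product has the same number of rows as df2, which is the required output.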
Before I start, I have found similar questions and tried the responding answers however, I am still running into an issue and can't figure out why.
I have 6 data frames. I want one resulting data frame that merges all 6 into one, based on their common column Country. Things to note: the data frames have different numbers of rows, and some countries do not have corresponding values, resulting in NaN.
Here is what I have tried:
from functools import reduce

data_frames = [WorldPopulation_df, WorldEconomy_df, WorldEducation_df, WorldAggression_df, WorldCorruption_df, WorldCyberCapabilities_df]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Country'], how='outer'), data_frames)
This doesn't work, as the final resulting data frame pairs up the wrong values with the wrong country. Any suggestions?
Let's see: pd.merge is used when you want to add new columns from a key.
If your 6 dataframes have the same columns, in the same order, you can try this:
columns_order = ['country', 'column_1']
concat_ = pd.concat(
    [data_1[columns_order], data_2[columns_order], data_3[columns_order],
     data_4[columns_order], data_5[columns_order], data_6[columns_order]],
    ignore_index=True,
    axis=0
)
From here, if you want to have a single value for the "country" column, you can apply a group by to it:
concat_.groupby(by=['country']).agg({'column_1': 'max'}).reset_index()
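A runnable sketch of the concat-plus-groupby approach, shortened to two made-up frames standing in for the six:

```python
import pandas as pd

columns_order = ['country', 'column_1']
# Two small hypothetical frames in place of data_1 ... data_6
data_1 = pd.DataFrame({'country': ['A', 'B'], 'column_1': [1, 5]})
data_2 = pd.DataFrame({'country': ['A', 'C'], 'column_1': [3, 2]})

# Stack the frames row-wise, then collapse duplicate countries
concat_ = pd.concat([data_1[columns_order], data_2[columns_order]],
                    ignore_index=True, axis=0)
result = concat_.groupby(by=['country']).agg({'column_1': 'max'}).reset_index()
print(result['column_1'].tolist())  # [3, 5, 2]
```

Country A appears in both frames, so the groupby keeps only its maximum value (3), leaving one row per country.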
I want to split one dataframe into two different dataframes based on the value of one of the columns.
E.g.: df (parent dataframe) has a column MODE with the values swiggy and zomato.
df1 should have all the columns of the rows where MODE = swiggy.
df2 should have all the columns of the rows where MODE = Zomato.
I know it's simple; I am a beginner. Please help. Thanks.
df1 = df[df['MODE'] == 'swiggy'] and df2 = df[df['MODE'] == 'Zomato'].
This way you filter the dataframe on the MODE column and assign each resulting dataframe to a new variable.
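A minimal sketch with a made-up parent dataframe:

```python
import pandas as pd

# Hypothetical parent dataframe with a MODE column
df = pd.DataFrame({"MODE": ["swiggy", "Zomato", "swiggy"],
                   "order_id": [1, 2, 3]})

# Boolean masks select the matching rows into two new frames
df1 = df[df["MODE"] == "swiggy"]
df2 = df[df["MODE"] == "Zomato"]
print(len(df1), len(df2))  # 2 1
```

Note that the comparison is case-sensitive, so 'Zomato' and 'zomato' would select different rows; check how the values are actually spelled in your data.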
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
The aim is to detect fraud from this dataset.
I have two dataframes with columns:
DF1[customerEmail, customerphone, customerdevice, customeripadd, NoOftransactions, Fraud] etc., shape (168, 11)
DF2[customerEmail, transactionid, payment methods, orderstatus] etc., shape (623, 11)
The customerEmail column is common in both the dataframes so it makes sense to merge tables on customerEmail.
The problem is that I have repeating customerEmail values in DF2 with no reference in DF1. So when I merge using:
DF3 = pd.merge(DF1, DF2, on='customerEmail')
the resulting shape is (819, 18), with the repeating email IDs carrying misleading data.
I want it to match using customerEmail from DF1, so my final dataframe DF3 should be roughly the same size as DF1.
Here's a link to the data for you to look at. Cheers
https://www.kaggle.com/aryanrastogi7767/ecommerce-fraud-data
Try changing the how parameter to 'left'.
For example:
DF3 = DF1.merge(DF2, how='left', on='customerEmail')
Failing this, we probably need some more information.
Maybe you should consider a different value for the how option. By default it is 'inner', which drops all rows without a match in both frames.
Maybe the option how='right' would help you, as then DF2 is the reference and DF1 is joined to DF2.
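To see how the how option changes the result, here is a toy sketch (the emails and columns are made up) contrasting the default inner merge with a left merge:

```python
import pandas as pd

# Toy stand-ins: DF2 contains an email that never appears in DF1
DF1 = pd.DataFrame({"customerEmail": ["a@x.com", "b@x.com"],
                    "Fraud": [0, 1]})
DF2 = pd.DataFrame({"customerEmail": ["a@x.com", "a@x.com", "c@x.com"],
                    "orderstatus": ["ok", "ok", "failed"]})

inner = DF1.merge(DF2, on="customerEmail")              # default how='inner'
left = DF1.merge(DF2, on="customerEmail", how="left")   # keeps every DF1 row
print(len(inner), len(left))  # 2 3
```

The inner merge drops b@x.com (no order) and c@x.com (no DF1 record), while the left merge keeps every DF1 email, padding unmatched ones with NaN; either way, an email with several DF2 orders still produces one output row per order.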