So I have 2 csv files with the same number of columns. The first csv file has its columns named (age, sex). The second file doesn't name its columns like the first one, but its data corresponds to the matching columns of the first csv file. How can I concat them properly?
First csv.
Second csv.
This is how I read my files:
df1 = pd.read_csv("input1.csv")
df2 = pd.read_csv("input2.csv", header=None)
I tried using concat() like this, but I get 4 columns as a result.
df = pd.concat([df1, df2])
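You can also use the append function, but note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current versions use pd.concat with ignore_index=True instead. Either way, make sure both frames have the same column names, otherwise you will end up with 4 columns.
Check this link, I found it very useful.
df1 = pd.read_csv("input1.csv")
df2 = pd.read_csv("input2.csv", header=None)
df2.columns = df1.columns
df = pd.concat([df1, df2], ignore_index=True)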
I found a solution. After reading the second file I added
df2.columns = df1.columns
Works just like I wanted to. I guess I better research more next time :). Thanks
Final code:
df1 = pd.read_csv("input1.csv")
df2 = pd.read_csv("input2.csv", header = None)
df2.columns = df1.columns
df = pd.concat([df1, df2])
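A minimal, self-contained sketch of the fix above, using in-memory strings as hypothetical stand-ins for input1.csv and input2.csv (the sample values are made up). It shows where the extra columns come from and how copying the column names removes them.

```python
import io
import pandas as pd

# Hypothetical stand-ins for input1.csv (with header) and input2.csv (no header)
csv1 = io.StringIO("age,sex\n30,M\n25,F\n")
csv2 = io.StringIO("41,F\n19,M\n")

df1 = pd.read_csv(csv1)
df2 = pd.read_csv(csv2, header=None)  # columns are labelled 0 and 1

# Without renaming, concat takes the union of the labels: age, sex, 0, 1
mismatched = pd.concat([df1, df2])

# Reuse df1's names so the columns line up, then stack the rows
df2.columns = df1.columns
df = pd.concat([df1, df2], ignore_index=True)

print(mismatched.columns.tolist())
print(df)
```

Passing ignore_index=True is optional; it just renumbers the rows instead of repeating the index values 0 and 1 from each file.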
I have two dataframes from a basic web scrape using pandas (below). The second table has fewer columns than the first, and I need to concat the dataframes. I have been manually inserting columns for a while, but seeing as they change frequently I would like a function that can assess the columns in df1, check whether they are all in df2, and if not, add the missing column to df2 so that the data from df2 lines up.
import pandas as pd
link = 'https://en.wikipedia.org/wiki/Opinion_polling_for_the_next_French_presidential_election'
df = pd.read_html(link,header=0)
df1 = df[1]
df1 = df1.drop([0])
df1 = df1.drop('Abs.',axis=1)
df2 = df[2]
df2 = df2.drop([0])
df2 = df2.drop(['Abs.'],axis=1)
Many thanks,
@divingTobi's answer:
pd.concat([df1, df2]) does the trick.
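To illustrate why no manual column insertion is needed, here is a small sketch with made-up frames (not the scraped Wikipedia tables). pd.concat aligns on column names and fills columns missing from one frame with NaN.

```python
import pandas as pd

# Made-up frames: df2 lacks the "C" column that df1 has
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df2 = pd.DataFrame({"A": [7], "B": [8]})

# concat takes the union of the columns; the missing "C" value becomes NaN
combined = pd.concat([df1, df2], ignore_index=True)

# To add the missing columns to df2 explicitly, reindex it on df1's columns
df2_aligned = df2.reindex(columns=df1.columns)

print(combined)
```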
I am trying to combine two dataframes but am not getting the right result. I have two files: one contains only the column names and the other contains only the data. I want to merge them.
I am using the '\001' delimiter.
Example:
df1:
56447MNEMILY 2703546.742893.9553218262930LP2018-11-21 09:18:46.040618
62872ILOPDYKE 1708138.269688.8052618165922LP2018-11-21 09:18:46.040618
04925MECARATUNK 2302545.231369.9861207221305LP2018-11-21 09:18:46.040618
df2:
meli_zip_cd_basemeli_stt_provncdmeli_city_nmmeli_typmeli_cntry_fipsmeli_latimeli_longimeli_area_cdmeli_fin_cdmeli_last_lnmeli_facmeli_msa_cdmeli_pmsa_cdmeli_dma_cdload_dt
Expected final result:
df_final:
meli_zip_cd_basemeli_stt_provncdmeli_city_nmmeli_typmeli_cntry_fipsmeli_latimeli_longimeli_area_cdmeli_fin_cdmeli_last_lnmeli_facmeli_msa_cdmeli_pmsa_cdmeli_dma_cdload_dt
56447MNEMILY 2703546.742893.9553218262930LP2018-11-21 09:18:46.040618
62872ILOPDYKE 1708138.269688.8052618165922LP2018-11-21 09:18:46.040618
04925MECARATUNK 2302545.231369.9861207221305LP2018-11-21 09:18:46.040618
If I understand you correctly, you want the first (and only) row from df2 to become the header of the first (and only) column in df1:
df1.columns = df2.iloc[0].values
I think I got the solution:
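# The data file has no header row, so pass header=None; otherwise its first record is consumed as column names
df1 = pd.read_csv('/medaff/eureka/CDP/DMN_MELI_ZIP/DMN_MELI_ZIP.txt', delimiter='\001', header=None)
# The header file holds only the column names, which read_csv exposes as df2.columns
df2 = pd.read_csv('/medaff/eureka/CDP/HEADERS/DMN_MELI_ZIP_HEADER.txt', delimiter='\001')
df1.columns = df2.columns
df1.to_csv('/medaff/eureka/CDP/HEADERS/test.txt', sep ='\001', index=False)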
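The same idea as a self-contained sketch, with in-memory strings standing in for the two files. The column names and the three-field layout are hypothetical, chosen only for illustration, and reading everything as text is an assumption made to keep leading zeros in zip codes.

```python
import io
import pandas as pd

SEP = "\x01"  # the same character as '\001'

# Hypothetical file contents: one header-only file, one data-only file
header_txt = SEP.join(["zip_cd", "state", "city"]) + "\n"
data_txt = (
    SEP.join(["56447", "MN", "EMILY"]) + "\n"
    + SEP.join(["04925", "ME", "CARATUNK"]) + "\n"
)

# header=None keeps the first record as data; dtype=str preserves leading zeros
data = pd.read_csv(io.StringIO(data_txt), sep=SEP, header=None, dtype=str)

# nrows=0 reads only the column names from the header file
header = pd.read_csv(io.StringIO(header_txt), sep=SEP, nrows=0)

data.columns = header.columns
print(data)
```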
I'm trying to execute a merge with pandas. The two files have a common key ("KEY_PLA") which I'm trying to use with a left join. But unfortunately, all columns which are transferred from the second file to the first file have NaN values.
Here is what I have done so far:
df_1 = pd.read_excel(path1, skiprows=1)
df_2 = pd.read_excel(path2, skiprows=1)
df_1.columns = ["Index", "KEY", "KEY_PLA", "INFO1", "INFO2"]
df_2.columns = ["Index", "KEY_PLA", "INFO4"]
df_1.drop(["Index"], axis=1, inplace=True)
df_2.drop(["Index"], axis=1, inplace=True)
# Merge all dataframes
df_merge = df_1.merge(df_2, on="KEY_PLA", how="left")
print(df_merge)
This is the result:
Here are the excel files:
Excel1
Excel2
What is wrong with the code? I also checked the types and even converted the columns to strings, but nothing works.
I think the problem is that the joined KEY_PLA columns have different types: one holds integers and the other strings.
The solution is to cast both to the same type, e.g. to ints:
print (df_1['KEY_PLA'].dtype)
object
print (df_2['KEY_PLA'].dtype)
int64
df_1['KEY_PLA'] = df_1['KEY_PLA'].astype(int)
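A small sketch of the cast followed by the merge, using made-up values rather than the Excel files from the question. Once both key columns share a dtype, the left join finds the matches, and only keys that are genuinely absent from df_2 stay NaN.

```python
import pandas as pd

# Made-up data: the key is stored as text in one frame and as integers in the other
df_1 = pd.DataFrame({"KEY_PLA": ["101", "102", "103"], "INFO1": ["a", "b", "c"]})
df_2 = pd.DataFrame({"KEY_PLA": [101, 103], "INFO4": ["x", "z"]})

# Cast the text key to integers so both sides have the same dtype
df_1["KEY_PLA"] = df_1["KEY_PLA"].astype("int64")

df_merge = df_1.merge(df_2, on="KEY_PLA", how="left")
print(df_merge)
```

If the keys can contain non-numeric values, casting both sides to str instead is the safer choice; stray whitespace in text keys can also prevent matches.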
I am trying to concat two dataframes using the below code. df1 is a daily update of values to the indexes in df2, which is an ongoing monthly dataset. df3 is the result that is saved.
The problem I am experiencing is that when an index value is not in df1 (no values for that particular day), it gets deleted from df3 altogether. In other words, if the index value is not in df1, then it doesn't appear in df3 at all.
How can I keep the original index of df3, so that if the index value is not in df1, it doesn't delete it? I also cannot enter 0 values, as it is relevant to the data that it is empty.
import os
import pandas as pd
import glob
def Monthly_aggregation_merge(month, date):
    # file to be merged
    df1 = pd.read_csv(r'Data\{}\{}\Aggregated\Aggregated_Daily_All.csv'.format(month,date), usecols=['CU', 'Parameters', 'Total/Max/Min'], index_col =[0,1])
    df1 = df1.rename(columns = {'Total/Max/Min':date}) # Change column name
    # original file that data should be merged with
    df2 = pd.read_csv(r'Data\{}\MonthlyData\July2017NEW.csv'.format(month), index_col = [0,1])
    df3 = pd.concat([df2, df1], axis=1).reindex(df1.index)
    df3.to_csv(r'Data\{}\MonthlyData\July2017NEW.csv'.format(month))
    print('Monthly Merge Done!')
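One likely cause is the trailing .reindex(df1.index), which keeps only the index labels present in the daily file. The sketch below uses made-up data and assumes the goal is to keep every row of the monthly frame df2; it contrasts the original call with a left join on the index, which leaves missing days as NaN rather than 0.

```python
import pandas as pd

# Made-up monthly frame (df2) with three (CU, Parameters) rows
idx2 = pd.MultiIndex.from_tuples(
    [("CU1", "Flow"), ("CU1", "Pressure"), ("CU2", "Flow")],
    names=["CU", "Parameters"],
)
df2 = pd.DataFrame({"2017-07-01": [1.0, 2.0, 3.0]}, index=idx2)

# Made-up daily frame (df1) that has no value for (CU1, Pressure)
idx1 = pd.MultiIndex.from_tuples(
    [("CU1", "Flow"), ("CU2", "Flow")], names=["CU", "Parameters"]
)
df1 = pd.DataFrame({"2017-07-02": [10.0, 30.0]}, index=idx1)

# Original approach: reindexing on df1.index drops rows missing from df1
dropped = pd.concat([df2, df1], axis=1).reindex(df1.index)

# Left join on the index keeps every row of df2; gaps stay NaN
kept = df2.join(df1, how="left")

print(dropped)
print(kept)
```

Replacing .reindex(df1.index) with .reindex(df2.index) in the original line should have a similar effect.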
I have two Excel files, loaded as df1 and df2.
df1.columns : url, content, ortheryy
df2.columns : url, content, othterxx
Some contents in df1 are empty, and df1 and df2 share some urls(not all).
What I want to do is fill df1's empty content by df2 if that row has same url.
I tried
ndf = pd.merge(df1, df2[['url', 'content']], on='url', how='left')
# how='inner' result same
Which results in two content columns: content_x and content_y.
I know it can be solved by looping through df1 and df2, but I'd like to do it the pandas way.
I think you need Series.combine_first or Series.fillna:
df1['content'] = df1['content'].combine_first(ndf['content_y'])
Or:
df1['content'] = df1['content'].fillna(ndf['content_y'])
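It works because the left join gives ndf the same index values as df1 (assuming df1 has the default integer index and the urls in df2 are unique, so the merge adds no extra rows).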
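As an alternative that does not depend on the index of the merged frame, the sketch below swaps the merge for a url-to-content lookup built with Series.map. This is a different technique from the one in the answer, and the data is made up.

```python
import pandas as pd

# Made-up data: two rows of df1 have no content, and only one of them is in df2
df1 = pd.DataFrame({
    "url": ["a.com", "b.com", "c.com"],
    "content": ["alpha", None, None],
})
df2 = pd.DataFrame({
    "url": ["b.com", "d.com"],
    "content": ["beta", "delta"],
})

# Build a url -> content lookup from df2, keeping the first entry per url
lookup = df2.drop_duplicates("url").set_index("url")["content"]

# Fill only the missing entries; rows with no match in df2 stay empty
df1["content"] = df1["content"].fillna(df1["url"].map(lookup))

print(df1)
```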