I have two DataFrames like this:
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=['j0', 'j1', 'j2'])
right = pd.DataFrame({'A': ['A1', 'A0', 'A2'],
                      'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])
I want to column-bind them on the 'A' column. How can I achieve this?
I want to do this with pd.concat, not pd.merge.
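One possible approach, sketched here as an assumption rather than a confirmed answer: pd.concat with axis=1 aligns rows on the index, so moving 'A' into the index of both frames first lets concat match the rows by 'A'. The names `bound` and `result` are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=['j0', 'j1', 'j2'])
right = pd.DataFrame({'A': ['A1', 'A0', 'A2'],
                      'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

# Make 'A' the index of both frames so that concat aligns rows on it.
bound = pd.concat([left.set_index('A'), right.set_index('A')], axis=1)

# Turn 'A' back into an ordinary column (rename_axis keeps the name explicit).
result = bound.rename_axis('A').reset_index()
print(result)
```

Note that this discards the original 'j…' and 'K…' index labels, and it assumes the values in 'A' are unique within each frame.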
Related
I'd like to create a function that accepts an arbitrary number of arrays, turns them into data frames, concatenates them column-wise, and returns the merged dataframe.
Example:
# Suppose we have 3 arrays:
data1 = {
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
    'C': ['C1', 'C2', 'C3', 'C4', 'C5'],
}
data2 = {
    'D': ['D1', 'D2', 'D3', 'D4', 'D5'],
    'E': ['E1', 'E2', 'E3', 'E4', 'E5'],
    'F': ['F1', 'F2', 'F3', 'F4', 'F5'],
}
data3 = {
    'G': ['G1', 'G2', 'G3', 'G4', 'G5'],
    'H': ['H1', 'H2', 'H3', 'H4', 'H5'],
    'I': ['I1', 'I2', 'I3', 'I4', 'I5'],
}
# We could convert them into data frames using:
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
# And finally join them with:
df4 = pd.concat([df1, df2, df3], axis=1)
The output dataframe would look like this:
I would like to create a function that can do this, but with an unspecified number of arrays, for example:
func(data1, data2)
func(data1, data2, data3)
func(data1, data2, data...n)
This is a short answer using list comprehension, provided by Ch3steR.
It works and is a very compact answer.
def func(*args):
    d = [pd.DataFrame(dc) for dc in args]
    return pd.concat(d, axis=1)
In the end I went for a longer and slower solution, but one that I will easily understand when looking at my code in the future:
def add_df(*args):
    """Concatenate the columns of any number of dataframes, each read from a CSV file."""
    frames = []  # renamed from `list` to avoid shadowing the built-in
    for file in args:
        df = pd.read_csv(file)
        frames.append(df)
    return pd.concat(frames, axis=1)
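Note that this longer version reads CSV files, whereas the short one takes dictionaries. A small usage sketch follows, with in-memory buffers standing in for real files; the CSV contents and the names `csv_a` and `csv_b` are made up for illustration.

```python
import io

import pandas as pd


def add_df(*args):
    """Concatenate the columns of any number of CSV sources."""
    frames = [pd.read_csv(source) for source in args]
    return pd.concat(frames, axis=1)


# Hypothetical CSV contents; pd.read_csv accepts file-like objects as well as paths.
csv_a = io.StringIO("A,B\nA1,B1\nA2,B2\n")
csv_b = io.StringIO("C\nC1\nC2\n")

combined = add_df(csv_a, csv_b)
print(combined)
```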
For the following dataframes, which are stored in a list of lists, I want to concatenate each sublist only if there is something to concatenate:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
fr_list = [[] for x in range(2)]
fr_list[0].append(df1)
fr_list[0].append(df1)
fr_list[1].append(df1)
for x in range(2):
    df = pd.concat(fr_list[x] if len(fr_list[x]) > 1)  # <-- here is the problem
The syntax you want is probably:
...
df = pd.concat((fr for fr in fr_list[x] if len(fr) > 1))
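Be aware that the generator above filters on len(fr), which is the row count of each frame, not the number of frames in the sublist. If the goal is to concatenate only when a sublist holds more than one frame, a plain if statement may be closer to what was asked. This is a sketch under that assumption; the `results` list and the fallback for short sublists are my own choices.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})

# Three sublists: two frames, one frame, and no frames.
fr_list = [[df1, df1], [df1], []]

results = []
for frames in fr_list:
    if len(frames) > 1:
        combined = pd.concat(frames)   # something to concatenate
    elif frames:
        combined = frames[0]           # assumption: keep a single frame as is
    else:
        combined = None                # assumption: nothing to return
    results.append(combined)
```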
I have a list of CSV files which I load as data frames using pd.read_csv().
I iterate through the list and use pd.concat() with axis=1 to join all the data frames together by columns.
This works as hoped. However, since all of the data frames have the same column names, the concatenated result has, for example, ten columns all with the key "Date".
Is there any way to give the columns unique names, for example London_Date, Berlin_Date? The names would be based on the names of the data frames.
If you pass a list of keys to concat(), you can then individually index any column you want with the given keys like so:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df2 = df1
df3 = df1
add = pd.concat([df1, df2, df3], axis = 1, keys=['Group_1', 'Group_2', 'Group_3'])
print(add.Group_1.A) # or add.Group_2.B etc...
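If you want flat names such as London_Date rather than a two-level column index, one option is to join the two levels after concatenating. A minimal sketch, assuming every key and column name is a string; the frames `london` and `berlin` and their data are made up for illustration.

```python
import pandas as pd

# Hypothetical frames sharing the same column names.
london = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02'], 'Temp': [5, 6]})
berlin = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02'], 'Temp': [1, 2]})

# keys= produces MultiIndex columns such as ('London', 'Date').
combined = pd.concat([london, berlin], axis=1, keys=['London', 'Berlin'])

# Collapse each (key, column) tuple into a single 'key_column' string.
combined.columns = ['_'.join(col) for col in combined.columns]
print(combined.columns.tolist())
# ['London_Date', 'London_Temp', 'Berlin_Date', 'Berlin_Temp']
```

Calling add_prefix('London_') on each frame before concatenating should give the same names without going through a MultiIndex.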
I have two dataframes
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A0', 'A5', 'A6', 'A7'],
                    'B': ['B1', 'B5', 'B6', 'B7'],
                    'C': ['A1', 'C5', 'C6', 'C7'],
                    'D': ['B1', 'D5', 'D6', 'D7']}, index=[4, 5, 6, 7])
Output I
pd.merge(df1, df2, how='outer', left_index=True, left_on='A', right_on='A')
Output II
pd.merge(df1, df2, how='outer', right_index=True, left_on='A', right_on='A')
Why do the two outputs above differ? I would like clarification on what right_index and left_index do.
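As a rough illustration of what these flags are for: left_index=True tells merge to use the left frame's index as its join key, so it is normally paired with right_on (or right_index) rather than with left_on, which would name the left key a second time. The frames and column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the left frame's keys live in its index,
# the right frame's keys live in an ordinary column.
left = pd.DataFrame({'val_l': [1, 2, 3]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'key': ['a', 'b', 'd'], 'val_r': [10, 20, 30]})

# Match the left index against the right 'key' column.
merged = pd.merge(left, right, how='inner', left_index=True, right_on='key')
print(merged)
```

Only the keys 'a' and 'b' appear on both sides, so the inner join keeps those two rows.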
I have the following 2 simple dataframes.
df1:
df2:
I want to add df2 to df1 by using something like:
df1["CF 0.3"]=df2
However, this only adds values where indexes in df1 and df2 are the same. I would like a way I can add a column so that missing indexes are automatically added and if there is not associated value of that index, it is filled with NaN. Something like this:
The way I did this is by writing
df1=df1.add(df2)
This adds automatically missing indexes but all values are NaN. Then I manually populated values by writing:
df1["CF 0.1"]=dummyDF1
df1["CF 0.3"]=dummyDF2
Is there an easier way to do this? I have a feeling I am missing something.
I hope you understand my question :)
Use concat; refer to the documentation for detailed help.
And here is an example based on the documentation:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'X': ['A4', 'A5', 'A6', 'A7'],
                    'XB': ['B4', 'B5', 'B6', 'B7'],
                    'XC': ['C4', 'C5', 'C6', 'C7'],
                    'XD': ['D4', 'D5', 'D6', 'D7']},
                   index=[4, 5, 6, 7])
df3 = pd.DataFrame({'YA': ['A8', 'A9', 'A10', 'A11'],
                    'YB': ['B8', 'B9', 'B10', 'B11'],
                    'YC': ['C8', 'C9', 'C10', 'C11'],
                    'YD': ['D8', 'D9', 'D10', 'D11']},
                   index=[8, 9, 10, 11])
# To get the result you are looking for, you need to reset the index.
# With the dataframes you have, merge may not work either,
# since merge needs a common index or column.
frames = [df1.reset_index(drop=True), df2.reset_index(drop=True), df3.reset_index(drop=True)]
df4 = pd.concat(frames, axis=1)
print(df4)
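If the original index labels should be kept rather than reset, concat with axis=1 already aligns on the union of the indexes and fills the gaps with NaN, which seems close to what the question describes. A small sketch; the values are made up, and only the column names "CF 0.1" and "CF 0.3" come from the question.

```python
import pandas as pd

# Hypothetical single-column frames with partly overlapping indexes.
df1 = pd.DataFrame({'CF 0.1': [1.0, 2.0, 3.0]}, index=[0, 1, 2])
df2 = pd.DataFrame({'CF 0.3': [10.0, 20.0, 30.0]}, index=[1, 2, 3])

# The default outer join keeps every index label from both frames.
combined = pd.concat([df1, df2], axis=1)
print(combined)
```

Labels 0 and 3 each exist in only one frame, so the other column holds NaN for them.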
Please read the docs and use concat, merge, or join:
http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra
Have a look at the concat function, which does what you are looking for.