Concat DataFrame under specific condition - python

For the following DataFrames, which are stored in a list of lists, I want to concatenate each sublist only if it contains more than one DataFrame:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
fr_list = [[] for x in range(2)]
fr_list[0].append(df1)
fr_list[0].append(df1)
fr_list[1].append(df1)
for x in range(2):
    df = pd.concat(fr_list[x] if len(fr_list[x]) > 1) # <-- here is the problem

The syntax you want is probably:
...
df = pd.concat((fr for fr in fr_list[x] if len(fr) > 1))
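A hedged alternative reading of the question: the generator expression above filters individual frames by their row count, whereas the original loop seems to ask whether the sublist itself holds more than one frame. A minimal sketch under that assumption follows. The helper name `concat_if_many` and the choice to return `None` for an empty sublist are illustrative and not from the original posts.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
fr_list = [[df1, df1], [df1]]


def concat_if_many(frames):
    """Concatenate only when there is more than one frame (hypothetical helper)."""
    if len(frames) > 1:
        return pd.concat(frames)
    if len(frames) == 1:
        return frames[0]  # nothing to concatenate, pass the single frame through
    return None  # assumption: an empty sublist yields no result


results = [concat_if_many(frames) for frames in fr_list]
```

Here `results[0]` stacks the two copies of `df1`, and `results[1]` is the single frame unchanged.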

Related

Function to concat undefined number of dataframes

I'd like to create a function that accepts an undefined number of arrays, turns them into data frames, concatenates them column-wise, and returns the merged dataframe.
Example:
# Suppose we have 3 arrays:
data1 = {
'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
'C': ['C1', 'C2', 'C3', 'C4', 'C5'],
}
data2 = {
'D': ['D1', 'D2', 'D3', 'D4', 'D5'],
'E': ['E1', 'E2', 'E3', 'E4', 'E5'],
'F': ['F1', 'F2', 'F3', 'F4', 'F5'],
}
data3 = {
'G': ['G1', 'G2', 'G3', 'G4', 'G5'],
'H': ['H1', 'H2', 'H3', 'H4', 'H5'],
'I': ['I1', 'I2', 'I3', 'I4', 'I5'],
}
# We could convert them into data frames using:
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
# And finally join them with:
df4 = pd.concat([df1, df2, df3], axis=1)
The output dataframe would look like this:
I would like to create a function that can do this for an unspecified number of arrays, for example:
func(data1, data2)
func(data1, data2, data3)
func(data1, data2, data...n)
This is a short answer using list comprehension, provided by Ch3steR.
It works and is a very compact answer.
def func(*args):
    d = [pd.DataFrame(dc) for dc in args]
    return pd.concat(d, axis=1)
In the end I went for a longer and slower solution, but one that I will easily understand when I look at my code in the future:
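def add_df(*args):
    """Concatenate the columns of any number of CSV files into one DataFrame."""
    frames = []  # renamed from `list` to avoid shadowing the built-in
    for file in args:
        df = pd.read_csv(file)  # each argument is a path to a CSV file
        frames.append(df)
    return pd.concat(frames, axis=1)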
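For reference, a small runnable sketch of the variadic approach applied to dictionaries like those in the question. The function name `concat_columns` and the shortened sample data are assumptions for illustration; the empty-input check is an extra that the original answers do not include.

```python
import pandas as pd


def concat_columns(*datasets):
    """Build one DataFrame per input and join them side by side (illustrative name)."""
    if not datasets:
        # assumption: fail early with a clear message when called with no inputs
        raise ValueError("need at least one dataset")
    frames = [pd.DataFrame(d) for d in datasets]
    return pd.concat(frames, axis=1)


data1 = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
data2 = {"D": ["D1", "D2"]}
merged = concat_columns(data1, data2)
```

Because `*datasets` collects every positional argument into a tuple, the same call works for two, three, or more inputs.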

Split data in list based on condition

I have the following list:
data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
I want to split the list such that
split1 = ['A1', 'C3', 'B2', 'A2', 'C2', 'A3', 'C1', 'B1', 'B3']
split2 = ['D3', 'D2', 'D1']
The constraint is that items with the same prefix (A, B, etc.) must not end up in different lists. The data can be split in any ratio, such as 50-50 or 80-20.
Here you go:
import numpy as np
data = np.array(['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3'])
# define some condition
condition = ['B', 'D']
boolean_selection = [np.any([ c in d for c in condition]) for d in data]
split1 = data[boolean_selection]
split2 = data[np.logical_not(boolean_selection)]
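The answer above hard-codes which prefixes go where. If the split should instead be driven by a target ratio, one possible sketch is to count items per prefix and greedily assign whole prefix groups to the first list until the target size is reached. The function name `split_by_prefix` and the greedy strategy are assumptions, and the resulting ratio is only approximate because groups are never broken up.

```python
from collections import Counter


def split_by_prefix(items, ratio=0.75):
    """Split items into two lists without separating items that share a first character."""
    counts = Counter(item[0] for item in items)  # keeps first-seen prefix order
    target = ratio * len(items)
    chosen, running = set(), 0
    for prefix, n in counts.items():
        if running < target:  # greedy: keep adding whole groups until target is met
            chosen.add(prefix)
            running += n
    first = [item for item in items if item[0] in chosen]
    second = [item for item in items if item[0] not in chosen]
    return first, second


data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
split1, split2 = split_by_prefix(data, ratio=0.75)
```

With this data and `ratio=0.75`, the D items land in `split2` and everything else stays in `split1`, in the original order.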

Append lists to list of lists

I want to append lists of dataframes to the inner lists of an existing list of lists:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
fr_list = [[] for x in range(2)]
fr_list[0].append(df1)
fr_list[0].append(df1)
fr_list[1].append(df1)
fr2 = [[] for x in range(2)]
fr2[0].append(df1)
fr2[1].append(df1)
fr_list.append(fr2) # <-- here is the problem
Output: fr_list = [[df1, df1], [df1], [fr2[0], fr2[1]]] List contains 3 elements
Expected: fr_list = [[df1, df1, fr2[0]],[df1, fr2[1]]] List contains 2 elements
fr_list = [a + b for a, b in zip(fr_list, fr2)]
Replace fr_list.append(fr2) with the code above.
Explanation: zip pairs each inner list of fr_list with the corresponding inner list of fr2, and the list comprehension concatenates each pair. Your original call appended the outer list fr2 to fr_list as a single new element instead of extending the inner lists.
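A short sketch of the pairwise merge, together with an in-place variant that uses `list.extend`. The two-row sample frame is a stand-in for the question's `df1`. Note that `zip` stops at the shorter input, so both outer lists are assumed to have the same length.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"]})
fr_list = [[df1, df1], [df1]]
fr2 = [[df1], [df1]]

# Build a new outer list whose inner lists are the pairwise concatenations.
merged = [a + b for a, b in zip(fr_list, fr2)]

# Alternative: grow the existing inner lists in place instead of rebinding fr_list.
for target, extra in zip(fr_list, fr2):
    target.extend(extra)
```

Either way the outer list keeps two elements, holding three and two frames respectively.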

Will the output change if right_index is used instead of left_index when both left_on and right_on are defined in a pandas merge?

I have two dataframes
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A0', 'A5', 'A6', 'A7'],
'B': ['B1', 'B5', 'B6', 'B7'],
'C': ['A1', 'C5', 'C6', 'C7'],
'D': ['B1', 'D5', 'D6', 'D7']},index=[4, 5, 6, 7])
Output I
pd.merge(df1, df2, how='outer', left_index=True, left_on='A', right_on='A')
Output II
pd.merge(df1, df2, how='outer', right_index=True, left_on='A', right_on='A')
Why do the two outputs above differ? I would like clarification on what right_index and left_index do.
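As a hedged illustration of the two flags: pandas documents `left_index`/`right_index` as alternatives to `left_on`/`right_on` for choosing the join key on each side, so passing both for the same side is ambiguous, and newer releases may reject that combination outright. The sketch below uses small made-up frames (not the ones from the question) and specifies exactly one key source per side.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"y": [10, 20]}, index=["a", "b"])

# Left side joins on its "key" column; right side joins on its index.
col_to_index = pd.merge(left, right, how="left", left_on="key", right_index=True)

# Mirror image: left side joins on its index; right side joins on a column.
left_indexed = left.set_index("key")
right_cols = right.rename_axis("key").reset_index()
index_to_col = pd.merge(
    left_indexed, right_cols, how="inner", left_index=True, right_on="key"
)
```

In both calls the flag only changes where the key values are read from (the index or a named column) on that side of the merge.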

column concat by specific column pandas

I have two DataFrames like
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']},
index=['j0', 'j1', 'j2'])
right = pd.DataFrame({'A': ['A1', 'A0', 'A2'],
'D': ['D0', 'D2', 'D3']},
index=['K0', 'K2', 'K3'])
I want to bind the columns by the 'A' column. How can I achieve this?
I want to do this with pd.concat, not pd.merge.
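One possible sketch, not taken from an answer in the thread: `pd.concat` with `axis=1` aligns rows on the index, so moving 'A' into the index of both frames first makes it act as the join key. This assumes the 'A' values are unique within each frame, and it discards the original `j*`/`K*` index labels.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=['j0', 'j1', 'j2'])
right = pd.DataFrame({'A': ['A1', 'A0', 'A2'],
                      'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

# Use 'A' as the index on both sides so concat aligns rows by its values.
combined = pd.concat([left.set_index('A'), right.set_index('A')], axis=1)

# Optionally turn 'A' back into an ordinary column.
result = combined.rename_axis('A').reset_index()
```

Rows are matched by their 'A' value rather than by position, so `A0` pairs `B0` with `D2` even though the two frames list it in different rows.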
