Create new Dataframe from matching two dataframe index's

Create new Dataframe from matching two dataframe index's - python

I'm looking create a new dataframe from data in two separate dataframes - effectively matching the index of each cell and input into a two column dataframe. My real datasets have the exact same number of rows and columns, FWIW. Example below:
DF1:
Col1 Col2 Col3
1 2 3
3 8 7
DF2:
Col1 Col2 Col3
A B E
R S W
Desired Dataframe:
Col1 Col2
1 A
2 B
3 E
3 R
8 S
7 W
Thank you for your help!

here is your code
df3 = pd.Series(df1.values.ravel('F'))
df4 = pd.Series(df2.values.ravel('F'))
df = pd.concat([df3, df4], axis=1)

Use, DataFrame.to_numpy and .flatten:
df = pd.DataFrame(
{'Col1': df1.to_numpy().flatten(), 'Col2': df2.to_numpy().flatten()})
# print(df)
Col1 Col2
0 1 A
1 2 B
2 3 E
3 3 R
4 8 S
5 7 W

You can do it easily like so:
list1 = df1.values.tolist()
list1 = [item for sublist in list1 for item in sublist]
list2 = df2.values.tolist()
list2 = [item for sublist in list2 for item in sublist]
df = {
'Col1': list1,
'Col2': list2
}
df = DataFrame(df)
print(df)
Hope this helps :)

pd.concat(map(lambda x: x.unstack().sort_index(level=-1), (df1, df2)), axis=1).reset_index(drop=True).rename(columns=['Col1', 'Col2'].__getitem__)
Result:
Col1 Col2
0 1 A
1 2 B
2 3 E
3 3 R
4 8 S
5 7 W

Another way (alternative):
pd.concat((df1.stack(),df2.stack()),axis=1).add_prefix('Col').reset_index(drop=True)
or:
d = {'Col1':df1,'Col2':df2}
pd.concat((v.stack() for k,v in d.items()),axis=1,keys=d.keys()).reset_index(drop=True)
#or pd.concat((d.values()),keys=d.keys()).stack().unstack(0).reset_index(drop=True)
Col1 Col2
0 1 A
1 2 B
2 3 E
3 3 R
4 8 S
5 7 W

Related

iloc[] by value columns

I want to use iloc with value in column.
df1 = pd.DataFrame({'col1': ['1' ,'1','1','2','2','2','2','2','3' ,'3','3'],
'col2': ['A' ,'B','C','D','E','F','G','H','I' ,'J','K']})
I want to select index 2 in each column value as data frame and the result will be like
col1 col2
1 C
2 F
3 K
Thank you so much

Use GroupBy.nth:
df2 = df1.groupby('col1', as_index=False).nth(2)
Alternative with GroupBy.cumcount:
df2 = df1[df1.groupby('col1').cumcount().eq(2)]
print (df2)
col1 col2
2 1 C
5 2 F
10 3 K

Use GroupBy.nth with as_index=False:
df1.groupby('col1', as_index=False).nth(2)
output:
col1 col2
2 1 C
5 2 F
10 3 K

df1.groupby('col1').agg(lambda ss:ss.iloc[2])
col2
col1
1 C
2 F
3 K

Unexpected row getting changed in pandas loc assignment

I want to copy a portion of a pandas dataframe onto a different portion, overwriting the existing values there. I am using .loc but more rows are changing than the ones I am referencing.
My example:
df = pd.DataFrame({
'col1': ['A', 'B', 'C', 'D', 'E'],
'col2': range(1, 6),
'col3': range(6, 11)
})
print(df)
col1 col2 col3
0 A 1 6
1 B 2 7
2 C 3 8
3 D 4 9
4 E 5 10
I want to write the values of col2 and col3 from the C and D rows onto the A and B rows. Using .loc:
df.loc[0:2, ["col2", "col3"]] = df.loc[2:4, ["col2", "col3"]].values
print(df)
col1 col2 col3
0 A 3 8
1 B 4 9
2 C 5 10
3 D 4 9
4 E 5 10
This does what I want for rows A and B, but row C has also changed. I expect only the first two rows to change, i.e. my expected output is
col1 col2 col3
0 A 3 8
1 B 4 9
2 C 3 8
3 D 4 9
4 E 5 10
Why did the C row also change, and how may I do this with only changing the first two rows?

Unlike list slicing pandas.DataFrame.loc slicing is inclusive-inclusive
Warning Note that contrary to usual python slices, both the start and the stop
are included
so you should do
df.loc[0:1, ["col2", "col3"]] = df.loc[2:3, ["col2", "col3"]].values

In addition, you can also pass a list of exhaustive elements, this way the rows need not to be consecutive:
df.loc[[0,1], ["col2", "col3"]] = df.loc[[2,3], ["col2", "col3"]].values

You went too far with the indices:
df.loc[0:1, ["col2", "col3"]] = df.loc[2:3, ["col2", "col3"]].values

Sum column in one dataframe based on row value of another dataframe

Say, I have one data frame df:
a b c d e
0 1 2 dd 5 Col1
1 2 3 ee 9 Col2
2 3 4 ff 1 Col4
There's another dataframe df2:
Col1 Col2 Col3
0 1 2 4
1 2 3 5
2 3 4 6
I need to add a column sum in the first dataframe, wherein it sums values of columns in the second dataframe df2, based on values of column e in df1.
Expected output
a b c d e Sum
0 1 2 dd 5 Col1 6
1 2 3 ee 9 Col2 9
2 3 4 ff 1 Col4 0
The Sum value in the last row is 0 because Col4 doesn't exist in df2.
What I tried: Writing some lamdas, apply function. Wasn't able to do it.
I'd greatly appreciate the help. Thank you.

Try
df['Sum']=df.e.map(df2.sum()).fillna(0)
df
Out[89]:
a b c d e Sum
0 1 2 dd 5 Col1 6.0
1 2 3 ee 9 Col2 9.0
2 3 4 ff 1 Col4 0.0

Try this. The following solution sums all values for a particular column if present in df2 using apply method and returns 0 if no such column exists in df2.
df1.loc[:,"sum"]=df1.loc[:,"e"].apply(lambda x: df2.loc[:,x].sum() if(x in df2.columns) else 0)

Use .iterrows() to iterate through a data frame pulling out the values for each row as well as index.
A nest for loop style of iteration can be used to grab needed values from the second dataframe and apply them to the first
import pandas as pd
df1 = pd.DataFrame(data={'a': [1,2,3], 'b': [2,3,4], 'c': ['dd', 'ee', 'ff'], 'd': [5,9,1], 'e': ['Col1','Col2','Col3']})
df2 = pd.DataFrame(data={'Col1': [1,2,3], 'Col2': [2,3,4], 'Col3': [4,5,6]})
df1['Sum'] = df1['a'].apply(lambda x: None)
for index, value in df1.iterrows():
sum = 0
for index2, value2 in df2.iterrows():
sum += value2[value['e']]
df1['Sum'][index] = sum
Output:
a b c d e Sum
0 1 2 dd 5 Col1 6
1 2 3 ee 9 Col2 9
2 3 4 ff 1 Col3 15

Create DataFrame from multiple lists?

I have two lists
list1=['a','b','c']
list2=[1,2]
I want my dataframe output to look like:
col1 col2
a 1
a 2
b 1
b 2
c 1
c 2
How can this be done?

Use itertools.product:
import itertools
list1 = ['a','b','c']
list2 = [1,2]
df = pd.DataFrame(itertools.product(list1, list2), columns=['col1', 'col2'])
print(df)
Output:
col1 col2
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2

If you don't want to explicitly import itertools, pd.MultiIndex has a from_product method that you might piggyback on:
list1 = ['a','b','c']
list2 = [1, 2]
pd.DataFrame(pd.MultiIndex.from_product((list1, list2)).to_list(), columns=['col1', 'col2'])
col1 col2
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2

Convert dataframe of list in columns to rows

I have a pandas DataFrame of this type
col1 col2 col3
1 [blue] [in,out]
2 [green, green] [in]
3 [green] [in]
and I need convert it to a dataframe that keep the first column and distribute all the other values in columns as rows:
col1 value
1 blue
1 in
1 out
2 green
2 green
2 in
3 green
3 in

Use DataFrame.stack with Series.explode for convert lists, last some data cleaning with DataFrame.reset_index:
df1 = (df.set_index('col1')
.stack()
.explode()
.reset_index(level=1, drop=True)
.reset_index(name='value'))
Alternative with DataFrame.melt and DataFrame.explode:
df1 = (df.melt('col1')
.explode('value')
.sort_values('col1')[['col1','value']]
.reset_index(drop=True)
)
print (df1)
col1 value
0 1 blue
1 1 in
2 1 out
3 2 green
4 2 green
5 2 in
6 3 green
7 3 in
Or list comprehension solution:
L = [(k, x) for k, v in df.set_index('col1').to_dict('index').items()
for k1, v1 in v.items()
for x in v1]
df1 = pd.DataFrame(L, columns=['col1','value'])
print (df1)
col1 value
0 1 blue
1 1 in
2 1 out
3 2 green
4 2 green
5 2 in
6 3 green
7 3 in

Another solution could consist of:
list comprehension to make col1 with new values and
using list concatenation of values in df['col2'] and df['col3'] in order to make value column.
The code is following:
df_final = pd.DataFrame(
{
'col1': [
i for i, sublist in zip(df['col1'], (df['col2'] + df['col3']).values)
for val in range(len(sublist))
],
'value': sum((df['col2'] + df['col3']).values, [])
}
)
print(df_final)
col1 value
0 1 blue
1 1 in
2 1 out
3 2 green
4 2 green
5 2 in
6 3 green
7 3 in

d = []
c = []
for i in range(len(df)):
d.append([j for j in df['c2'][i]])
d.append([j for j in df['c3'][i]])
c.append(str(df['c1'][i]) * (len(df['c2'][i])+ len(df['c3'][i])))
c = [list(j) for j in c]
d = [i for sublist in d for i in sublist]
c = [i for sublist in d for i in sublist]
df1 = pd.DataFrame()
df1['c1'] = c
df1['c2'] = d
df = df1

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Create new Dataframe from matching two dataframe index's - python

here is your code df3 = pd.Series(df1.values.ravel('F')) df4 = pd.Series(df2.values.ravel('F')) df = pd.concat([df3, df4], axis=1)

Use, DataFrame.to_numpy and .flatten: df = pd.DataFrame( {'Col1': df1.to_numpy().flatten(), 'Col2': df2.to_numpy().flatten()}) # print(df) Col1 Col2 0 1 A 1 2 B 2 3 E 3 3 R 4 8 S 5 7 W

You can do it easily like so: list1 = df1.values.tolist() list1 = [item for sublist in list1 for item in sublist] list2 = df2.values.tolist() list2 = [item for sublist in list2 for item in sublist] df = { 'Col1': list1, 'Col2': list2 } df = DataFrame(df) print(df) Hope this helps :)

pd.concat(map(lambda x: x.unstack().sort_index(level=-1), (df1, df2)), axis=1).reset_index(drop=True).rename(columns=['Col1', 'Col2'].getitem) Result: Col1 Col2 0 1 A 1 2 B 2 3 E 3 3 R 4 8 S 5 7 W

Related

iloc[] by value columns

Unexpected row getting changed in pandas loc assignment

Sum column in one dataframe based on row value of another dataframe

Create DataFrame from multiple lists?

Convert dataframe of list in columns to rows

Categories

Resources

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Create new Dataframe from matching two dataframe index's - python

here is your code df3 = pd.Series(df1.values.ravel('F')) df4 = pd.Series(df2.values.ravel('F')) df = pd.concat([df3, df4], axis=1)

Use, DataFrame.to_numpy and .flatten: df = pd.DataFrame( {'Col1': df1.to_numpy().flatten(), 'Col2': df2.to_numpy().flatten()}) # print(df) Col1 Col2 0 1 A 1 2 B 2 3 E 3 3 R 4 8 S 5 7 W

You can do it easily like so: list1 = df1.values.tolist() list1 = [item for sublist in list1 for item in sublist] list2 = df2.values.tolist() list2 = [item for sublist in list2 for item in sublist] df = { 'Col1': list1, 'Col2': list2 } df = DataFrame(df) print(df) Hope this helps :)

pd.concat(map(lambda x: x.unstack().sort_index(level=-1), (df1, df2)), axis=1).reset_index(drop=True).rename(columns=['Col1', 'Col2'].__getitem__) Result: Col1 Col2 0 1 A 1 2 B 2 3 E 3 3 R 4 8 S 5 7 W

Related

iloc[] by value columns

Unexpected row getting changed in pandas loc assignment

Sum column in one dataframe based on row value of another dataframe

Create DataFrame from multiple lists?

Convert dataframe of list in columns to rows

Categories

Resources

pd.concat(map(lambda x: x.unstack().sort_index(level=-1), (df1, df2)), axis=1).reset_index(drop=True).rename(columns=['Col1', 'Col2'].getitem) Result: Col1 Col2 0 1 A 1 2 B 2 3 E 3 3 R 4 8 S 5 7 W