Create new Dataframe from matching two dataframe index's - python

I'm looking create a new dataframe from data in two separate dataframes - effectively matching the index of each cell and input into a two column dataframe. My real datasets have the exact same number of rows and columns, FWIW. Example below:
DF1:
Col1 Col2 Col3
1 2 3
3 8 7
DF2:
Col1 Col2 Col3
A B E
R S W
Desired Dataframe:
Col1 Col2
1 A
2 B
3 E
3 R
8 S
7 W
Thank you for your help!

here is your code
df3 = pd.Series(df1.values.ravel('F'))
df4 = pd.Series(df2.values.ravel('F'))
df = pd.concat([df3, df4], axis=1)

Use, DataFrame.to_numpy and .flatten:
df = pd.DataFrame(
{'Col1': df1.to_numpy().flatten(), 'Col2': df2.to_numpy().flatten()})
# print(df)
Col1 Col2
0 1 A
1 2 B
2 3 E
3 3 R
4 8 S
5 7 W

You can do it easily like so:
list1 = df1.values.tolist()
list1 = [item for sublist in list1 for item in sublist]
list2 = df2.values.tolist()
list2 = [item for sublist in list2 for item in sublist]
df = {
'Col1': list1,
'Col2': list2
}
df = DataFrame(df)
print(df)
Hope this helps :)

pd.concat(map(lambda x: x.unstack().sort_index(level=-1), (df1, df2)), axis=1).reset_index(drop=True).rename(columns=['Col1', 'Col2'].__getitem__)
Result:
Col1 Col2
0 1 A
1 2 B
2 3 E
3 3 R
4 8 S
5 7 W

Another way (alternative):
pd.concat((df1.stack(),df2.stack()),axis=1).add_prefix('Col').reset_index(drop=True)
or:
d = {'Col1':df1,'Col2':df2}
pd.concat((v.stack() for k,v in d.items()),axis=1,keys=d.keys()).reset_index(drop=True)
#or pd.concat((d.values()),keys=d.keys()).stack().unstack(0).reset_index(drop=True)
Col1 Col2
0 1 A
1 2 B
2 3 E
3 3 R
4 8 S
5 7 W

Related

iloc[] by value columns

I want to use iloc with value in column.
df1 = pd.DataFrame({'col1': ['1' ,'1','1','2','2','2','2','2','3' ,'3','3'],
'col2': ['A' ,'B','C','D','E','F','G','H','I' ,'J','K']})
I want to select index 2 in each column value as data frame and the result will be like
col1 col2
1 C
2 F
3 K
Thank you so much
Use GroupBy.nth:
df2 = df1.groupby('col1', as_index=False).nth(2)
Alternative with GroupBy.cumcount:
df2 = df1[df1.groupby('col1').cumcount().eq(2)]
print (df2)
col1 col2
2 1 C
5 2 F
10 3 K
Use GroupBy.nth with as_index=False:
df1.groupby('col1', as_index=False).nth(2)
output:
col1 col2
2 1 C
5 2 F
10 3 K
df1.groupby('col1').agg(lambda ss:ss.iloc[2])
col2
col1
1 C
2 F
3 K

Unexpected row getting changed in pandas loc assignment

I want to copy a portion of a pandas dataframe onto a different portion, overwriting the existing values there. I am using .loc but more rows are changing than the ones I am referencing.
My example:
df = pd.DataFrame({
'col1': ['A', 'B', 'C', 'D', 'E'],
'col2': range(1, 6),
'col3': range(6, 11)
})
print(df)
col1 col2 col3
0 A 1 6
1 B 2 7
2 C 3 8
3 D 4 9
4 E 5 10
I want to write the values of col2 and col3 from the C and D rows onto the A and B rows. Using .loc:
df.loc[0:2, ["col2", "col3"]] = df.loc[2:4, ["col2", "col3"]].values
print(df)
col1 col2 col3
0 A 3 8
1 B 4 9
2 C 5 10
3 D 4 9
4 E 5 10
This does what I want for rows A and B, but row C has also changed. I expect only the first two rows to change, i.e. my expected output is
col1 col2 col3
0 A 3 8
1 B 4 9
2 C 3 8
3 D 4 9
4 E 5 10
Why did the C row also change, and how may I do this with only changing the first two rows?
Unlike list slicing pandas.DataFrame.loc slicing is inclusive-inclusive
Warning Note that contrary to usual python slices, both the start and the stop
are included
so you should do
df.loc[0:1, ["col2", "col3"]] = df.loc[2:3, ["col2", "col3"]].values
In addition, you can also pass a list of exhaustive elements, this way the rows need not to be consecutive:
df.loc[[0,1], ["col2", "col3"]] = df.loc[[2,3], ["col2", "col3"]].values
You went too far with the indices:
df.loc[0:1, ["col2", "col3"]] = df.loc[2:3, ["col2", "col3"]].values

Sum column in one dataframe based on row value of another dataframe

Say, I have one data frame df:
a b c d e
0 1 2 dd 5 Col1
1 2 3 ee 9 Col2
2 3 4 ff 1 Col4
There's another dataframe df2:
Col1 Col2 Col3
0 1 2 4
1 2 3 5
2 3 4 6
I need to add a column sum in the first dataframe, wherein it sums values of columns in the second dataframe df2, based on values of column e in df1.
Expected output
a b c d e Sum
0 1 2 dd 5 Col1 6
1 2 3 ee 9 Col2 9
2 3 4 ff 1 Col4 0
The Sum value in the last row is 0 because Col4 doesn't exist in df2.
What I tried: Writing some lamdas, apply function. Wasn't able to do it.
I'd greatly appreciate the help. Thank you.
Try
df['Sum']=df.e.map(df2.sum()).fillna(0)
df
Out[89]:
a b c d e Sum
0 1 2 dd 5 Col1 6.0
1 2 3 ee 9 Col2 9.0
2 3 4 ff 1 Col4 0.0
Try this. The following solution sums all values for a particular column if present in df2 using apply method and returns 0 if no such column exists in df2.
df1.loc[:,"sum"]=df1.loc[:,"e"].apply(lambda x: df2.loc[:,x].sum() if(x in df2.columns) else 0)
Use .iterrows() to iterate through a data frame pulling out the values for each row as well as index.
A nest for loop style of iteration can be used to grab needed values from the second dataframe and apply them to the first
import pandas as pd
df1 = pd.DataFrame(data={'a': [1,2,3], 'b': [2,3,4], 'c': ['dd', 'ee', 'ff'], 'd': [5,9,1], 'e': ['Col1','Col2','Col3']})
df2 = pd.DataFrame(data={'Col1': [1,2,3], 'Col2': [2,3,4], 'Col3': [4,5,6]})
df1['Sum'] = df1['a'].apply(lambda x: None)
for index, value in df1.iterrows():
sum = 0
for index2, value2 in df2.iterrows():
sum += value2[value['e']]
df1['Sum'][index] = sum
Output:
a b c d e Sum
0 1 2 dd 5 Col1 6
1 2 3 ee 9 Col2 9
2 3 4 ff 1 Col3 15

Create DataFrame from multiple lists?

I have two lists
list1=['a','b','c']
list2=[1,2]
I want my dataframe output to look like:
col1 col2
a 1
a 2
b 1
b 2
c 1
c 2
How can this be done?
Use itertools.product:
import itertools
list1 = ['a','b','c']
list2 = [1,2]
df = pd.DataFrame(itertools.product(list1, list2), columns=['col1', 'col2'])
print(df)
Output:
col1 col2
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2
If you don't want to explicitly import itertools, pd.MultiIndex has a from_product method that you might piggyback on:
list1 = ['a','b','c']
list2 = [1, 2]
pd.DataFrame(pd.MultiIndex.from_product((list1, list2)).to_list(), columns=['col1', 'col2'])
col1 col2
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2

Convert dataframe of list in columns to rows

I have a pandas DataFrame of this type
col1 col2 col3
1 [blue] [in,out]
2 [green, green] [in]
3 [green] [in]
and I need convert it to a dataframe that keep the first column and distribute all the other values in columns as rows:
col1 value
1 blue
1 in
1 out
2 green
2 green
2 in
3 green
3 in
Use DataFrame.stack with Series.explode for convert lists, last some data cleaning with DataFrame.reset_index:
df1 = (df.set_index('col1')
.stack()
.explode()
.reset_index(level=1, drop=True)
.reset_index(name='value'))
Alternative with DataFrame.melt and DataFrame.explode:
df1 = (df.melt('col1')
.explode('value')
.sort_values('col1')[['col1','value']]
.reset_index(drop=True)
)
print (df1)
col1 value
0 1 blue
1 1 in
2 1 out
3 2 green
4 2 green
5 2 in
6 3 green
7 3 in
Or list comprehension solution:
L = [(k, x) for k, v in df.set_index('col1').to_dict('index').items()
for k1, v1 in v.items()
for x in v1]
df1 = pd.DataFrame(L, columns=['col1','value'])
print (df1)
col1 value
0 1 blue
1 1 in
2 1 out
3 2 green
4 2 green
5 2 in
6 3 green
7 3 in
Another solution could consist of:
list comprehension to make col1 with new values and
using list concatenation of values in df['col2'] and df['col3'] in order to make value column.
The code is following:
df_final = pd.DataFrame(
{
'col1': [
i for i, sublist in zip(df['col1'], (df['col2'] + df['col3']).values)
for val in range(len(sublist))
],
'value': sum((df['col2'] + df['col3']).values, [])
}
)
print(df_final)
col1 value
0 1 blue
1 1 in
2 1 out
3 2 green
4 2 green
5 2 in
6 3 green
7 3 in
d = []
c = []
for i in range(len(df)):
d.append([j for j in df['c2'][i]])
d.append([j for j in df['c3'][i]])
c.append(str(df['c1'][i]) * (len(df['c2'][i])+ len(df['c3'][i])))
c = [list(j) for j in c]
d = [i for sublist in d for i in sublist]
c = [i for sublist in d for i in sublist]
df1 = pd.DataFrame()
df1['c1'] = c
df1['c2'] = d
df = df1

Categories

Resources