pandas groupby transpose str column

Here is what I am trying to do:
>>> import pandas as pd
>>> dftemp = pd.DataFrame({'a': [1] * 3 + [2] * 3 + [3], 'b': 'a a b c d e f'.split()})
>>> dftemp
a b
0 1 a
1 1 a
2 1 b
3 2 c
4 2 d
5 2 e
6 3 f
How can I transpose column 'b' grouped by column 'a', so that the output looks like this:
a b0 b1 b2
0 1 a a b
3 2 c d e
6 3 f NaN NaN

Using pivot_table with cumcount:
(df.assign(flag=df.groupby('a').b.cumcount())
.pivot_table(index='a', columns='flag', values='b', aggfunc='first')
.add_prefix('B'))
flag B0 B1 B2
a
1 a a b
2 c d e
3 f NaN NaN
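
For reference, here is a minimal self-contained sketch of the same cumcount idea. It swaps pivot_table for set_index/unstack, assumes the 7-row frame printed in the question, and uses the illustrative names pos and wide, which are not from the original answer.

```python
import pandas as pd

# Sample frame matching the table printed in the question (assumed: 7 rows).
dftemp = pd.DataFrame({"a": [1, 1, 1, 2, 2, 2, 3],
                       "b": list("aabcdef")})

# Number the rows inside each group: 0, 1, 2, ...
pos = dftemp.groupby("a").cumcount()

# Index by (group, position), then move the position level into columns.
# Groups with fewer rows than the largest group are padded with NaN.
wide = (dftemp.set_index(["a", pos])["b"]
        .unstack()
        .add_prefix("b")
        .reset_index())
print(wide)
```

The final reset_index turns 'a' back into a regular column, which is closer to the layout asked for in the question.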

You can group by the column, collect the values of each group into an array, and rebuild a DataFrame from those arrays:
df = df.groupby(['a'])['b'].apply(lambda x: x.values.flatten())
pd.DataFrame(df.values.tolist(),index=df.index).add_prefix('B')
Out:
B0 B1 B2
a
1 a a b
2 c d e
3 f None None

You could try something like this:
>>> dftemp = pd.DataFrame({'a': [1] * 3 + [2] * 2 + [3]*1, 'b': 'a a b c d e'.split()})
>>> dftemp
a b
0 1 a
1 1 a
2 1 b
3 2 c
4 2 d
5 3 e
>>> dftemp.groupby('a')['b'].apply(lambda df: df.reset_index(drop=True)).unstack()
0 1 2
a
1 a a b
2 c d None
3 e None None

Given the ordering of your DataFrame you could find where the group changes and use np.split to create a new DataFrame.
import numpy as np
import pandas as pd
splits = dftemp[(dftemp.a != dftemp.a.shift())].index.values
df = pd.DataFrame(np.split(dftemp.b.values, splits[1:])).add_prefix('b').fillna(np.nan)
df['a'] = dftemp.loc[splits, 'a'].values
Output
b0 b1 b2 a
0 a a b 1
1 c d e 2
2 f NaN NaN 3

Related

complete and repeat one dataframe along another one

How do you combine 2 dataframes so that one is repeated over and over and combined for every line of the other dataframe, for example :
d1 = pd.DataFrame([[1,3],[2,4]])
print(d1)
0 1
0 1 3
1 2 4
and
d2 = pd.DataFrame([['A','D'],['B','E'],['C','F']])
print(d2)
0 1
0 A D
1 B E
2 C F
combining in :
d3 = pd.DataFrame([[1,3,'A','D'],[1,3,'B','E'],[1,3,'C','F'],[2,4,'A','D'],[2,4,'B','E'],[2,4,'C','F']])
print(d3)
0 1 2 3
0 1 3 A D
1 1 3 B E
2 1 3 C F
3 2 4 A D
4 2 4 B E
5 2 4 C F
I can loop over d1 and concat, but is there existing functionality that already does this?
Thanks

I believe what you are looking for is a cross join.
You can use the following code (how='cross' requires pandas 1.2 or later); you will just need to clean up the column names afterwards:
df1 = pd.DataFrame([[1,3],[2,4]])
df2 = pd.DataFrame([['A','D'],['B','E'],['C','F']])
df1.merge(df2, how = 'cross')
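
The column clean-up mentioned above could look roughly like this. It is only a sketch: it assumes you want plain positional labels 0..3 as in the question's d3, and the relabelling step is my addition rather than part of the original answer.

```python
import pandas as pd

d1 = pd.DataFrame([[1, 3], [2, 4]])
d2 = pd.DataFrame([["A", "D"], ["B", "E"], ["C", "F"]])

# Cross join: every row of d1 paired with every row of d2.
d3 = d1.merge(d2, how="cross")

# Both frames use the labels 0 and 1, so merge adds suffixes to tell
# them apart. Relabel the columns positionally to get 0, 1, 2, 3.
d3.columns = range(d3.shape[1])
print(d3)
```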

Hopefully this works for you. Create a key column with the value 1 in both dataframes, join on that key, and then drop it.
import pandas as pd
d1 = pd.DataFrame([[1,3],[2,4]])
print(d1)
d2 = pd.DataFrame([['A','D'],['B','E'],['C','F']])
print(d2)
d1['key'] = 1
d2['key'] = 1
d1.merge(d2, on='key').drop('key', axis=1)

Here is an alternative solution using pd.merge() and df.assign()
d2.columns = ['2', '3']
d3 = pd.merge(d1.assign(key=1), d2.assign(key=1), on='key', suffixes=('', '')).drop('key', axis=1)
print(d3)
0 1 2 3
0 1 3 A D
1 1 3 B E
2 1 3 C F
3 2 4 A D
4 2 4 B E
5 2 4 C F

Join an array to every row in the pandas dataframe

I have a data frame and an array as follows:
df = pd.DataFrame({'x': range(0,5), 'y' : range(1,6)})
s = np.array(['a', 'b', 'c'])
I would like to attach the array to every row of the data frame, so that each element of the array fills its own column in every row.
What would be the most efficient way to do this?

Just plain assignment:
# replace the first `s` with your desired column names
df[s] = [s]*len(df)

Try this:
for i in s:
    df[i] = i
Output:
x y a b c
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c

You could use pandas.concat:
pd.concat([df, pd.DataFrame(s).T], axis=1).ffill()
output:
x y 0 1 2
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c

You can try using df.loc here.
df.loc[:, s] = s
print(df)
x y a b c
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c
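
If you would rather not modify df in place, a non-mutating variant can be sketched with assign. This is an alternative to the answers above, not taken from them. It assumes the array elements are also wanted as the column names, and out is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": range(0, 5), "y": range(1, 6)})
s = np.array(["a", "b", "c"])

# One keyword per array element; each scalar is broadcast down all rows.
# assign returns a new frame and leaves df untouched.
out = df.assign(**{str(v): str(v) for v in s})
print(out)
```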

How to populate categories in one column and paste the exact value in other column

It has been a long time since I last worked with the pandas library. I searched but could not come up with an efficient way to do this, although there may be an existing function for it in the library.
Let's say I have the dataframe below:
df1 = pd.DataFrame({'V1':['A','A','B'],
'V2':['B','C','C'],
'Value':[4, 1, 5]})
df1
I would like to extend this dataset so that it also contains the reversed combination of each pair of categories, with exactly the same value as the original pair:
df2 = pd.DataFrame({'V1':['A','B','A', 'C', 'B', 'C'],
'V2':['B','A','C','A','C','B'],
'Value':[4, 4 , 1, 1, 5, 5]})
df2
In other words, in df1, A and B have a Value of 4, and in the second dataframe I also want a row where B and A have a Value of 4. It is very similar to melting. I do not want to use a for loop; I am looking for a more efficient way.

Use:
df = pd.concat([df1, df1.rename(columns={'V2':'V1', 'V1':'V2'})]).sort_index().reset_index(drop=True)
Output:
V1 V2 Value
0 A B 4
1 B A 4
2 A C 1
3 C A 1
4 B C 5
5 C B 5
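
Here is a self-contained sketch of the same rename-and-concat idea, checked against the df2 from the question. The explicit mergesort is my addition: it is a stable sort, so each original row is guaranteed to stay ahead of its mirrored copy. The names mirrored and out are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"V1": ["A", "A", "B"],
                    "V2": ["B", "C", "C"],
                    "Value": [4, 1, 5]})

# Swap the two category columns to get the mirrored pairs.
mirrored = df1.rename(columns={"V1": "V2", "V2": "V1"})

# Stack both frames (concat aligns columns by name), then sort on the
# shared original index so each pair sits next to its mirror.
out = (pd.concat([df1, mirrored])
       .sort_index(kind="mergesort")
       .reset_index(drop=True))
print(out)
```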

Or np.vstack:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns)
V1 V2 Value
0 A B 4
1 A C 1
2 B C 5
3 B A 4
4 C A 1
5 C B 5
For correct order:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns, index=[*df1.index, *df1.index]).sort_index()
V1 V2 Value
0 A B 4
0 B A 4
1 A C 1
1 C A 1
2 B C 5
2 C B 5
And index reset:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns, index=[*df1.index, *df1.index]).sort_index().reset_index(drop=True)
V1 V2 Value
0 A B 4
1 B A 4
2 A C 1
3 C A 1
4 B C 5
5 C B 5

You can use assign with pd.concat (DataFrame.append was removed in pandas 2.0):
pd.concat([df1, df1.assign(V1=df1.V2, V2=df1.V1)], ignore_index=True)
Output:
V1 V2 Value
0 A B 4
1 A C 1
2 B C 5
3 B A 4
4 C A 1
5 C B 5

Python - Pandas Tricky sum of columns

This is a tricky question. I have a dataframe like the one below, and I want to create 3 columns with conditional sums such as:
If the id = A then A = A1 and B and C = B1
If the id = B then B = B1 and A and C = A1
Example data:
id A1 B1 A B C
A 5 4 5 4 4
B 6 1 6 1 6
A 7 2 7 2 2
B 6 8 6 8 6
C 2 1 2 1 0
I'm trying to come up with a general solution so I don't need a lot of sums by axis.

Your condition can be reduced to:
if id == A, then column A = column A1, column C = column B1
if id == B, then column B = column B1, column C = column A1
This translates to pandas code as:
df = pd.DataFrame([[5,4],[6,1],[7,2],[6,8],[2,1]], index=['A', 'B', 'A', 'B', 'C'], columns=['A1', 'B1'])
df['A'] = df['A1']
df['B'] = df['B1']
df['C'] = (df.index == 'B')*df['A1'] +(df.index == 'A')*df['B1']
# or the faster method from @user3483203
# df['id'] = df.index
# df['C'] = np.select([df.id.eq('A'), df.id.eq('B')], [df.B1, df.A1], 0)
# >>> df
# A1 B1 A B C
# A 5 4 5 4 4
# B 6 1 6 1 6
# A 7 2 7 2 2
# B 6 8 6 8 6
# C 2 1 2 1 0
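
The commented-out np.select variant can be written as a runnable sketch like the one below. It assumes id is kept as a regular column rather than the index, and that rows with any other id (here C) should get 0, as in the example data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": ["A", "B", "A", "B", "C"],
                   "A1": [5, 6, 7, 6, 2],
                   "B1": [4, 1, 2, 8, 1]})

df["A"] = df["A1"]
df["B"] = df["B1"]

# C takes B1 where id is A, A1 where id is B, and 0 for any other id.
df["C"] = np.select([df["id"].eq("A"), df["id"].eq("B")],
                    [df["B1"], df["A1"]],
                    default=0)
print(df)
```

np.select checks the conditions in order and takes the value for the first one that is true, so further ids can be handled by appending more condition/choice pairs.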

How can I add a column to a pandas DataFrame that uniquely identifies grouped data? [duplicate]

Given the following data frame:
import pandas as pd
import numpy as np
df=pd.DataFrame({'A':['A','A','A','B','B','B'],
'B':['a','a','b','a','a','a'],
})
df
A B
0 A a
1 A a
2 A b
3 B a
4 B a
5 B a
I'd like to create column 'C', which numbers the rows within each group in columns A and B like this:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3
I've tried this so far:
df['C']=df.groupby(['A','B'])['B'].transform('rank')
...but it doesn't work!

Use groupby/cumcount:
In [25]: df['C'] = df.groupby(['A','B']).cumcount()+1; df
Out[25]:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3

Use the groupby.rank function.
Here is a working example:
df = pd.DataFrame({'C1':['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})
df
C1 C2
a 1
a 2
a 3
b 4
b 5
df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)
df
C1 C2 RANK
a 1 1
a 2 2
a 3 3
b 4 1
b 5 2
