Append values of row with same values


I have the following data frame:
1 A a
1 A b
2 B c
1 A d
How do I combine the last-column values of all rows that share the same first two values, to get this data frame:
1 A a,b,d
2 B c

You can use groupby and apply the string method join:
df.columns = ['a','b','c']
print (df)
a b c
0 1 A a
1 1 A b
2 2 B c
3 1 A d
print (df.groupby(['a', 'b'])['c'].apply(', '.join).reset_index())
a b c
0 1 A a, b, d
1 2 B c
Or, if the first column is the index:
df.columns = ['a','b']
print (df)
a b
1 A a
1 A b
2 B c
1 A d
df1 = df.b.groupby([df.index, df.a]).apply(', '.join).reset_index(name='c')
df1.columns = ['a','b','c']
print (df1)
a b c
0 1 A a, b, d
1 2 B c
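A self-contained sketch of the groupby answer, assuming the three columns are named a, b and c as in the answer above. It uses a plain comma as the separator, which matches the output the question asks for (the answer uses ', ').

```python
import pandas as pd

# Sample data from the question; the column names a, b, c are an assumption
df = pd.DataFrame({'a': [1, 1, 2, 1],
                   'b': ['A', 'A', 'B', 'A'],
                   'c': ['a', 'b', 'c', 'd']})

# Group on the columns that must match, then join the strings of the remaining column
out = df.groupby(['a', 'b'])['c'].agg(','.join).reset_index()
print(out)
```

groupby sorts the group keys by default; pass sort=False to keep the order in which the groups first appear.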

Related

How to convert binary columns with multiple occurrences into categorical data in Pandas

I have the following example data set
A B C D
foo 0 1 1
bar 0 0 1
baz 1 1 0
How could I extract the column names of each 1 in a row and put them into another column E, so that I get the following table:
A B C D E
foo 0 1 1 C, D
bar 0 0 1 D
baz 1 1 0 B, C
Note that there can be more than two 1s per row.

You can use DataFrame.dot.
df['E'] = df[['B', 'C', 'D']].dot(df.columns[1:] + ', ').str.rstrip(', ')
df
A B C D E
0 foo 0 1 1 C, D
1 bar 0 0 1 D
2 baz 1 1 0 B, C
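The dot trick works because multiplying a Python string by 0 gives '' and by 1 gives the string itself, so each row's matrix product concatenates the names of the columns holding a 1. A minimal self-contained sketch, assuming the flag columns contain only 0 and 1:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'baz'],
                   'B': [0, 0, 1],
                   'C': [1, 0, 1],
                   'D': [1, 1, 0]})

flags = df[['B', 'C', 'D']]
# Each label gets a trailing separator: 'B, ', 'C, ', 'D, '
labels = flags.columns + ', '
# 0 * 'B, ' == '' and 1 * 'B, ' == 'B, ', so the dot product joins the labels of the 1s;
# rstrip then drops the trailing ', '
df['E'] = flags.dot(labels).str.rstrip(', ')
print(df)
```

A row with no 1s ends up as an empty string. Note that str.rstrip(', ') strips a set of characters, so a column name that itself ends in a comma or space would also be trimmed.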
Inspired by jezrael's answer in this post.
Another way is to convert each row to boolean and use it as a mask to select the column names.
cols = pd.Index(['B', 'C', 'D'])
df['E'] = df[cols].astype('bool').apply(lambda row: ", ".join(cols[row]), axis=1)
df
A B C D E
0 foo 0 1 1 C, D
1 bar 0 0 1 D
2 baz 1 1 0 B, C

Join an array to every row in the pandas dataframe

I have a data frame and an array as follows:
df = pd.DataFrame({'x': range(0,5), 'y' : range(1,6)})
s = np.array(['a', 'b', 'c'])
I would like to attach the array to every row of the data frame, so that each row gets 'a', 'b' and 'c' as new columns.
What would be the most efficient way to do this?

Just plain assignment:
# replace the first `s` with your desired column names
df[s] = [s]*len(df)

Try this:
for i in s:
    df[i] = i
Output:
x y a b c
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c
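The loop works because assigning a scalar to a column broadcasts it to every row. A self-contained sketch of the same idea; the assign(**...) variant, which returns a new frame instead of modifying df in place, is a substitution for illustration and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(0, 5), 'y': range(1, 6)})
s = np.array(['a', 'b', 'c'])

# In place: each scalar is broadcast down its new column
for name in s:
    df[name] = name

# Non-mutating variant: keyword argument names must be strings, hence str(name)
df2 = pd.DataFrame({'x': range(0, 5), 'y': range(1, 6)})
df2 = df2.assign(**{str(name): name for name in s})
print(df2)
```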

You could use pandas.concat:
pd.concat([df, pd.DataFrame(s).T], axis=1).ffill()
output:
x y 0 1 2
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c
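The ffill step is needed only because pd.DataFrame(s).T has a single row, so concat leaves NaN in rows 1 to 4. A sketch of a variation (not the answer's exact code) that skips ffill, which would also fill any NaN already present in x or y, by building the new block on df's index with s as the column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(0, 5), 'y': range(1, 6)})
s = np.array(['a', 'b', 'c'])

# One copy of s per row, aligned to df's index so concat matches the rows exactly
extra = pd.DataFrame([s] * len(df), columns=s, index=df.index)
out = pd.concat([df, extra], axis=1)
print(out)
```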

You can try using df.loc here.
df.loc[:, s] = s
print(df)
x y a b c
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c

python pandas index of ones (1s) at row-wise

From a pandas DataFrame, how do I get the labels of all the ones in each row?
My data frame has around a hundred columns. Here is an example:
a b c d
0 1 0 1 0
1 0 0 0 1
2 1 1 0 1
3 1 1 0 0
4 1 1 1 1
The expected result is
0 a,c
1 d
2 a,b,d
3 a,b
4 a,b,c,d
I found this question on Stack Overflow:
index of non "NaN" values in Pandas
but its solution works at the column level.
Thanks in advance.

If there are only 1 and 0 values, use DataFrame.dot to matrix-multiply by the column names with a separator appended, then remove the trailing separator with Series.str.rstrip:
df['e'] = df.dot(df.columns + ', ').str.rstrip(', ')
#if there are values other than 0 and 1, compare with 1 first
#df['e'] = df.eq(1).dot(df.columns + ', ').str.rstrip(', ')
print (df)
a b c d e
0 1 0 1 0 a, c
1 0 0 0 1 d
2 1 1 0 1 a, b, d
3 1 1 0 0 a, b
4 1 1 1 1 a, b, c, d
To get a Series instead, run this on the original df (without the new column e):
s = df.dot(df.columns + ', ').str.rstrip(', ')
print (s)
0 a, c
1 d
2 a, b, d
3 a, b
4 a, b, c, d
dtype: object
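A self-contained sketch of the commented-out eq(1) variant, for frames that may contain values other than 0 and 1. The value 2 in the last row is hypothetical, added only to show that just the exact 1s are reported.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 1, 1, 1],
                   'b': [0, 0, 1, 1, 1],
                   'c': [1, 0, 0, 0, 1],
                   'd': [0, 1, 1, 0, 2]})  # the 2 is hypothetical sample data

# Boolean mask that is True only where the value is exactly 1
mask = df.eq(1)
# True * 'a, ' == 'a, ' and False * 'a, ' == '', so each row joins its matching labels
s = mask.dot(df.columns + ', ').str.rstrip(', ')
print(s)
```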

Try:
df=df.stack()
# select level_1 so the join is not applied to the numeric value column
df=df.loc[df.eq(1)].reset_index(level=1).groupby(level=0)[['level_1']].agg(', '.join)
Outputs:
level_1
0 a, c
1 d
2 a, b, d
3 a, b
4 a, b, c, d
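A sketch of the same stack-and-filter idea that returns a plain Series and keeps rows containing no 1 at all. The all-zero last row is hypothetical sample data; without the final reindex such a row would simply disappear after filtering.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 1, 1, 1, 0],
                   'b': [0, 0, 1, 1, 1, 0],
                   'c': [1, 0, 0, 0, 1, 0],
                   'd': [0, 1, 1, 0, 1, 0]})  # last row (all zeros) is hypothetical

stacked = df.stack()               # Series indexed by (row, column)
ones = stacked[stacked.eq(1)]      # keep only the cells equal to 1
# Pair each matching column label with its row label, then join per row
labels = pd.Series(ones.index.get_level_values(1),
                   index=ones.index.get_level_values(0))
result = labels.groupby(level=0).agg(', '.join)
# Restore rows that had no 1 as empty strings
result = result.reindex(df.index, fill_value='')
print(result)
```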

Expand pandas dataframe by replacing cell value with a list

I have a pandas dataframe like this below:
A B C
a b c
d e f
where A, B and C are column names. Now I have a list:
mylist = [1,2,3]
I want to replace the c in column C with the list, so that the data frame expands to one row per list value, like below:
A B C
a b 1
a b 2
a b 3
d e f
Any help would be appreciated!

I tried this:
mylist = [1,2,3]
x=pd.DataFrame({'mylist':mylist})
x['C']='c'
res= pd.merge(df,x,on=['C'],how='left')
res['mylist']=res['mylist'].fillna(res['C'])
Then:
del res['C']
res.rename(columns={"mylist":"C"},inplace=True)
print(res)
Output:
A B C
0 a b 1
1 a b 2
2 a b 3
3 d e f

You can use:
print (df)
A B C
0 a b c
1 d e f
2 a b c
3 t e w
mylist = [1,2,3]
idx1 = df.index[df.C == 'c']
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two parts the same way
df = pd.concat([df.loc[idx1.repeat(len(mylist))].assign(C=mylist * len(idx1)), df[df.C != 'c']])
print (df)
A B C
0 a b 1
0 a b 2
0 a b 3
2 a b 1
2 a b 2
2 a b 3
1 d e f
3 t e w
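On pandas 0.25 or newer, DataFrame.explode offers another route: put the whole list into each matching cell and let explode give every list element its own row. This is a swapped-in technique, not the answer's method; unlike the repeat-and-concat version it keeps the original row order.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'd', 'a', 't'],
                   'B': ['b', 'e', 'b', 'e'],
                   'C': ['c', 'f', 'c', 'w']})
mylist = [1, 2, 3]

out = df.copy()
# Replace each 'c' with the whole list; other cells stay scalars
out['C'] = [list(mylist) if v == 'c' else v for v in out['C']]
# explode turns each list element into its own row and repeats the index label
out = out.explode('C')
print(out)
```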

Display two columns with a condition in Pandas

Suppose I have a dataframe df such as
A B C
0 a 1
1 b 1
2 c 2
I would like to return B and C where C == 1, like so:
B C
a 1
b 1
I've got as far as df.B[df.C==1], which returns
B
a
b
which selects the right rows but only column B. How do I get C as well?

You can use loc or query:
print(df.loc[df.C==1, ['B','C']])
B C
0 a 1
1 b 1
print(df[['B','C']].query('C == 1'))
B C
0 a 1
1 b 1
Or if you need only column C:
print(df.loc[df.C==1, 'C'])
0 1
1 1
Name: C, dtype: int64
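df.B[df.C==1] returns only B because selecting df.B first discards the other columns; loc applies the row mask and the column list in a single step. A self-contained sketch using the question's sample data (column A assumed to hold 0, 1, 2):

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2], 'B': ['a', 'b', 'c'], 'C': [1, 1, 2]})

# loc takes a row selector and a column selector at once
both = df.loc[df['C'] == 1, ['B', 'C']]
# query filters the already-selected columns with an expression string
same = df[['B', 'C']].query('C == 1')
print(both)
```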
