How to groupby and aggregate joining values as a string [duplicate] - python

This question already has answers here:
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 2 years ago.
I have a dataframe structured like this:
df_have = pd.DataFrame({'id':[1,1,2,3,4,4], 'desc':['yes','no','chair','bird','person','car']})
How can I get something like this:
df_want = pd.DataFrame({'id':[1,2,3,4], 'desc':['yes no','chair','bird','person car']})

Use groupby().apply:
df_have.groupby('id', as_index=False)['desc'].apply(' '.join)
Output:
   id        desc
0   1      yes no
1   2       chair
2   3        bird
3   4  person car

I would use agg with groupby:
df = df_have.groupby('id', as_index=False)[['desc']].agg(' '.join)
Output:
   id        desc
0   1      yes no
1   2       chair
2   3        bird
3   4  person car
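A variant sketch using named aggregation (available in pandas 0.25 and later), which names the output column explicitly. The df_safe line is an added assumption for the case where desc may contain non-string values or NaN; with the sample data it gives the same result.

```python
import pandas as pd

df_have = pd.DataFrame({'id': [1, 1, 2, 3, 4, 4],
                        'desc': ['yes', 'no', 'chair', 'bird', 'person', 'car']})

# Named aggregation: new_column=(source_column, function).
df = df_have.groupby('id', as_index=False).agg(desc=('desc', ' '.join))

# Assumption: if desc could hold numbers or NaN, drop NaN and cast to str
# first, because ' '.join fails on non-string items.
df_safe = df_have.groupby('id', as_index=False).agg(
    desc=('desc', lambda s: ' '.join(s.dropna().astype(str)))
)

print(df)
```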

Related

Replace the dataframe entries with binary value [duplicate]

This question already has answers here:
New column in pandas df based on array
(3 answers)
Create a new column with [0,1] based on match between two rows in Python
(2 answers)
How to update/create column in pandas based on values in a list
(2 answers)
Closed 5 months ago.
I'm trying to replace certain strings with a binary value. I tried find-and-replace, but it only works for one value at a time, and I'd like to do it for several different labels.
I'd like to replace dog and bat with 1, and cat, snail and deer with 0. From:
  animal
0    cat
1    dog
2  snail
3    bat
4   deer
To:
  animal
0      0
1      1
2      0
3      1
4      0
Here is my sample code:
cold=['cat','snail','deer']
df['animal'] = np.where(df['animal']=cold, '0', '1')
Your code almost works; it just needs a slight change: test membership with isin(cold) instead of comparing with =.
df['animal'] = np.where(df['animal'].isin(cold), '0', '1')
animal
0 0
1 1
2 0
3 1
4 0
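Note that np.where(..., '0', '1') fills the column with the strings '0' and '1'. A small sketch, rebuilding the sample frame from the question, that passes integers instead so the column ends up numeric:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'animal': ['cat', 'dog', 'snail', 'bat', 'deer']})
cold = ['cat', 'snail', 'deer']

# Integer choices (not '0'/'1' strings) give an integer column.
df['animal'] = np.where(df['animal'].isin(cold), 0, 1)
print(df['animal'].tolist())
```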
Or, as suggested in a comment, convert the negated membership mask to integers:
df['animal'] = (~df['animal'].isin(cold)).astype(int)
I like np.select the most:
import numpy as np
cold = ['cat','snail','deer']
df = (df
.assign(animal=lambda x: np.select([x.animal.isin(cold)],
[0], default=1)
)
)
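np.select pays off once there is more than one list to check. A sketch with two hypothetical lists (warm and cold here are illustrative names, not from the question) and a default of -1 for animals that appear in neither list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'animal': ['cat', 'dog', 'snail', 'bat', 'deer']})

# Hypothetical label lists; 'deer' is deliberately left out of both.
warm = ['dog', 'bat']
cold = ['cat', 'snail']

# Conditions are checked in order and the first match wins;
# rows matching no condition get the default.
df = df.assign(animal=lambda x: np.select(
    [x.animal.isin(warm), x.animal.isin(cold)],
    [1, 0],
    default=-1,
))
print(df['animal'].tolist())
```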

Count the number of occurrences from a column [duplicate]

This question already has answers here:
What is the most efficient way of counting occurrences in pandas?
(4 answers)
Closed 5 months ago.
Given a pandas DataFrame in this format:
ID  time   other
0   81219  blue
0   32323  green
1   423    red
1   4232   blue
1   42424  red
2   42422  blue
I want to create a DataFrame like the following, counting how many times each ID appears in the previous DataFrame:
ID  number_appears
0   2
1   3
2   1
Try this:
df.groupby('ID').count()
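Note that count() returns the non-null count for every remaining column (here time and other), not a single column. A sketch, rebuilding the sample data above, that uses size() instead to get exactly one number_appears column:

```python
import pandas as pd

df = pd.DataFrame({'ID': [0, 0, 1, 1, 1, 2],
                   'time': [81219, 32323, 423, 4232, 42424, 42422],
                   'other': ['blue', 'green', 'red', 'blue', 'red', 'blue']})

# count(): one column of non-null counts per original column.
per_column = df.groupby('ID').count()

# size(): one row count per group, turned back into a two-column frame.
counts = df.groupby('ID').size().reset_index(name='number_appears')
print(counts)
```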

What is the best way to join a Series with an additional symbol and leave out None and NaN values? [duplicate]

This question already has answers here:
How to replace the white space in a string in a pandas dataframe?
(4 answers)
Closed 1 year ago.
I have a Series like this:
0 stand and on the top of the m
1 be aware of the p
2 in the night o
3 tt
4 锉
Here is my code:
x1=x.str.split(pat='/').str[0].copy()
x2=x1.str.split(expand=True).copy()
x2['combined']=x2[x2.columns].apply(lambda row: '+'.join(row.values.astype(str)), axis=1)
x2['combined']
The result in x2['combined'] is:
0 stand+and+on+the+top+of+the+m
1 be+aware+of+the+p+None+None+None
2 in+the+night+o+None+None+None+None
3 tt+None+None+None+None+None+None+None
4 Nan+Nan+Nan+Nan+Nan+Nan+Nan+Nan
The outcome I want is:
0 stand+and+on+the+top+of+the+m
1 be+aware+of+the+p
2 in+the+night+o
3 tt
4
What should I do?
Just replace each run of whitespace with +:
x.str.replace(r'\s+', '+', regex=True)
Output:
0 stand+and+on+the+top+of+the+m
1 be+aware+of+the+p
2 in+the+night+o
3 tt
4 锉
Use:
combined = x.str.split(pat='/').str[0].str.split().str.join('+')
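A sketch of the split-and-join approach end to end, assuming a hypothetical sample where one entry carries a '/' suffix and one is NaN. split() with no argument never produces empty pieces, so no None padding appears, and fillna('') turns the missing row into an empty string as in the desired outcome:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: '/extra' shows the part after '/' being dropped,
# and the last entry is NaN to mimic a missing value.
x = pd.Series(['stand and on the top of the m', 'be aware of the p/extra',
               'in the night o', 'tt', np.nan])

combined = (x.str.split(pat='/').str[0]  # keep the text before the first '/'
             .str.split()                # split on any whitespace run
             .str.join('+')              # glue the words back with '+'
             .fillna(''))                # NaN rows become empty strings
print(combined.tolist())
```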

How to replace values in a column [duplicate]

This question already has answers here:
Replacing column values in a pandas DataFrame
(16 answers)
Closed 2 years ago.
I want to replace the Yes and No values in the No-show column with numeric 0/1 values.
Here is a simple answer:
df = pd.DataFrame({'No-show':['Yes','No','No','Yes']})
df['No-show'] = df['No-show'].replace('Yes',1).replace('No',0)
df
Output:
No-show
0 1
1 0
2 0
3 1
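An alternative sketch using Series.map with a dict, which performs both replacements in one call. Values missing from the dict become NaN, which makes unexpected spellings easy to spot:

```python
import pandas as pd

df = pd.DataFrame({'No-show': ['Yes', 'No', 'No', 'Yes']})

# Each value is looked up in the dict; unmapped values would become NaN.
df['No-show'] = df['No-show'].map({'Yes': 1, 'No': 0})
print(df['No-show'].tolist())
```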

Sort data frame in ascending order by mean of other column [duplicate]

This question already has answers here:
How to sort a dataFrame in python pandas by two or more columns?
(3 answers)
Closed 3 years ago.
I have a data frame:
df =
ID Num
a 3
b 4
b 2
a 1
I want to sort in ascending order while keeping rows with the same ID together.
My attempt:
df.sort_values(by=['Num'])
But that sorted by Num alone and ignored the ID column.
Desired output:
df =
ID Num
a 1
a 3
b 2
b 4
Just do:
df.sort_values(['ID', 'Num'])
Output
ID Num
3 a 1
0 a 3
2 b 2
1 b 4
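A sketch that rebuilds the sample frame and checks the result. The second part is an assumption based on the question title: if the groups should be ordered by the mean of Num rather than alphabetically by ID, the group mean can serve as a temporary sort key. With this sample (mean 2 for a, 3 for b) both orders coincide:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['a', 'b', 'b', 'a'], 'Num': [3, 4, 2, 1]})

# Sort by ID, then by Num within each ID; ignore_index renumbers rows 0..n-1.
by_id = df.sort_values(['ID', 'Num'], ignore_index=True)

# Assumed reading of the title: order groups by their mean Num,
# then by Num inside each group; the helper column is dropped afterwards.
by_mean = (df.assign(mean_num=df.groupby('ID')['Num'].transform('mean'))
             .sort_values(['mean_num', 'ID', 'Num'], ignore_index=True)
             .drop(columns='mean_num'))
print(by_id)
print(by_mean)
```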
