Replace the dataframe entries with binary value [duplicate] - python

This question already has answers here:
New column in pandas df based on array
(3 answers)
Create a new column with [0,1] based on match between two rows in Python
(2 answers)
How to update/create column in pandas based on values in a list
(2 answers)
Closed 5 months ago.
I'm trying to replace certain strings with a binary value. I tried find-and-replace, but it only works for one value at a time. I'd like to do it for several different labels at once.
I'd like to replace dog and bat with 1, and cat, snail and deer with 0:
animal
0 cat
1 dog
2 snail
3 bat
4 deer
To:
animal
0 0
1 1
2 0
3 1
4 0
Here is my sample code:
cold=['cat','snail','deer']
df['animal'] = np.where(df['animal']=cold, '0', '1')

Your code almost works; it just needs a slight change. Comparing a column against a list with = is not valid, so use isin instead:
df['animal'] = np.where(df['animal'].isin(cold), '0', '1')
animal
0 0
1 1
2 0
3 1
4 0
Or you could use the answer in the comment.
df['animal'] = (~df['animal'].isin(cold)).astype(int)
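For reference, here is a self-contained sketch of the isin approach. The DataFrame is rebuilt from the sample in the question, and the variable name is_cold is only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"animal": ["cat", "dog", "snail", "bat", "deer"]})
cold = ["cat", "snail", "deer"]

# isin() returns a boolean mask: True where the animal is in `cold`.
is_cold = df["animal"].isin(cold)

# Invert the mask so animals in `cold` become 0 and all others become 1.
df["animal"] = (~is_cold).astype(int)
print(df)
```

Note that this yields integers, whereas the np.where version above yields the strings '0' and '1'.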

I like np.select the most:
import numpy as np
cold = ['cat','snail','deer']
df = (df
      .assign(animal=lambda x: np.select([x.animal.isin(cold)],
                                         [0], default=1)
             )
     )

Related

Count the number of occurrences from a column [duplicate]

This question already has answers here:
What is the most efficient way of counting occurrences in pandas?
(4 answers)
Closed 5 months ago.
Given a pandas DataFrame of this format:
ID  time   other
0   81219  blue
0   32323  green
1   423    red
1   4232   blue
1   42424  red
2   42422  blue
I simply want to create a DataFrame like the following by counting the number of times each ID appears in the previous DataFrame.
ID  number_appears
0   2
1   3
2   1
Try this:
df.groupby('ID').count()
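Note that count() returns the number of non-null values for every remaining column (time and other here), not a single number_appears column. A minimal sketch that produces the exact shape asked for, assuming the sample data from the question, could use size() instead:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": [0, 0, 1, 1, 1, 2],
    "time": [81219, 32323, 423, 4232, 42424, 42422],
    "other": ["blue", "green", "red", "blue", "red", "blue"],
})

# size() counts rows per group; reset_index(name=...) turns the result
# into a DataFrame with the requested column name.
counts = df.groupby("ID").size().reset_index(name="number_appears")
print(counts)
```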

What is the best way to join a Series with an additional symbol and leave out None and NaN values? [duplicate]

This question already has answers here:
How to replace the white space in a string in a pandas dataframe?
(4 answers)
Closed 1 year ago.
I have a Series like this:
0 stand and on the top of the m
1 be aware of the p
2 in the night o
3 tt
4 锉
Here is my code
x1=x.str.split(pat='/').str[0].copy()
x2=x1.str.split(expand=True).copy()
x2['combined']=x2[x2.columns].apply(lambda row: '+'.join(row.values.astype(str)), axis=1)
x2['combined']
The result of x2['combined'] is:
0 stand+and+on+the+top+of+the+m
1 be+aware+of+the+p+None+None+None
2 in+the+night+o+None+None+None+None
3 tt+None+None+None+None+None+None+None
4 Nan+Nan+Nan+Nan+Nan+Nan+Nan+Nan
The outcome I want is
0 stand+and+on+the+top+of+the+m
1 be+aware+of+the+p
2 in+the+night+o
3 tt
4
What should I do?
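Just replace the whitespace with the separator (use a raw string for the regex so that \s is not treated as a string escape):
x.str.replace(r'\s+', '+', regex=True)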
output:
0 stand+and+on+the+top+of+the+m
1 be+aware+of+the+p
2 in+the+night+o
3 tt
4 锉
Use:
combined = x.str.split(pat='/').str[0].str.split().str.join('+')
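As a self-contained sketch of the split-and-join approach: the sample Series below is an assumption based on the question, with the last row taken to be a missing value, and the fillna('') step is an addition so that missing rows come out empty as in the desired output:

```python
import pandas as pd

# Assumed sample data; the last entry stands in for a missing value.
x = pd.Series([
    "stand and on the top of the m",
    "be aware of the p",
    "in the night o",
    "tt",
    None,
])

combined = (
    x.str.split(pat="/").str[0]  # keep the part before the first '/'
     .str.split()                # split on runs of whitespace
     .str.join("+")              # join the words with '+'
     .fillna("")                 # missing rows become empty strings
)
print(combined)
```

Because each row is split and joined on its own, no None or NaN padding is introduced, unlike with expand=True.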

How to groupby and aggregate joining values as a string [duplicate]

This question already has answers here:
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 2 years ago.
I have a dataframe structured like this:
df_have = pd.DataFrame({'id':[1,1,2,3,4,4], 'desc':['yes','no','chair','bird','person','car']})
How can I get something like this:
df_want = pd.DataFrame({'id':[1,2,3,4], 'desc':['yes no','chair','bird','person car']})
Use groupby().apply:
df_have.groupby('id', as_index=False)['desc'].apply(' '.join)
Output:
id desc
0 1 yes no
1 2 chair
2 3 bird
3 4 person car
I would use agg with groupby:
df = df_have.groupby('id',as_index=False)[['desc']].agg(' '.join)
id desc
0 1 yes no
1 2 chair
2 3 bird
3 4 person car

how to replace variables in a column [duplicate]

This question already has answers here:
Replacing column values in a pandas DataFrame
(16 answers)
Closed 2 years ago.
I want to replace the Yes and No values in the No-show column with 0 and 1 values.
Here is a simple answer:
df = pd.DataFrame({'No-show':['Yes','No','No','Yes']})
df['No-show'] = df['No-show'].replace('Yes',1).replace('No',0)
df
output:
No-show
0 1
1 0
2 0
3 1
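An alternative sketch uses a single dictionary with map instead of chained replace calls. The name mapping is illustrative, and the Yes to 1, No to 0 direction follows the answer above:

```python
import pandas as pd

df = pd.DataFrame({"No-show": ["Yes", "No", "No", "Yes"]})

# One dictionary holds the whole mapping; map() applies it element-wise.
mapping = {"Yes": 1, "No": 0}
df["No-show"] = df["No-show"].map(mapping)
print(df)
```

Keep in mind that map turns any value missing from the dictionary into NaN, whereas replace leaves unmatched values unchanged.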

Duplicate rows based on value with condition [duplicate]

This question already has answers here:
Pandas - Duplicate Row based on condition
(3 answers)
Closed 2 years ago.
I need to replicate some rows in a pandas DataFrame like this:
name times
A 2
B 1
C 3
D 20
...
What I need is to replicate each row as many times as its times value, but only when that value is less than 20.
What I'm doing now is:
for t in df["times"]:
    if t < 20:
        df = df.loc[df.index.repeat(t)]
But the script keeps running and I have to stop it (I've been waiting a long time...).
Is there any way to improve this or to do it another way?
Use:
# boolean mask: True where times < 20 (lt means "less than")
mask = df['times'].lt(20)
# keep only the rows matching the mask
df1 = df[mask].copy()
# repeat each of those rows `times` times
df1 = df1.loc[df1.index.repeat(df1['times'])]
# add back the rows with times >= 20, sort by the original index, and reset the index
df = pd.concat([df1, df[~mask]]).sort_index().reset_index(drop=True)
print (df)
name times
0 A 2
1 A 2
2 B 1
3 C 3
4 C 3
5 C 3
6 D 20
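A more compact variant, offered as a sketch rather than the answer's method, computes a per-row repeat count with np.where (the times value when it is below 20, otherwise 1) and repeats in a single step. The sample data is rebuilt from the question and the name repeats is illustrative:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"name": ["A", "B", "C", "D"], "times": [2, 1, 3, 20]})

# Rows with times < 20 are repeated `times` times; the rest appear once.
repeats = np.where(df["times"] < 20, df["times"], 1)

# Repeating the index labels and selecting with .loc duplicates the rows
# in their original order, so no concat or sort is needed.
result = df.loc[df.index.repeat(repeats)].reset_index(drop=True)
print(result)
```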
