Pandas DataFrame update one column using another column - python

I have a two-column DataFrame df; its columns are phone and label, where label can only be 0 or 1.
Here is an example:
phone label
a 0
b 1
a 1
a 0
c 0
b 0
What I want to do is count the number of '1' labels for each value of 'phone' and replace the 'phone' column with that count.
What I came up with is groupby, but I am not familiar with it.
The answer should be:
Count the number of 1s for each 'phone':
phone count
a 1
b 1
c 0
Replace 'phone' with 'count' in the original table:
phone
1
1
1
1
0
1

Taking into account that the label column can only contain 0 or 1, you can use the .transform('sum') method:
In [4]: df.label = df.groupby('phone')['label'].transform('sum')
In [5]: df
Out[5]:
phone label
0 a 1
1 b 1
2 a 1
3 a 1
4 c 0
5 b 1
Explanation:
In [2]: df
Out[2]:
phone label
0 a 0
1 b 1
2 a 1
3 a 0
4 c 0
5 b 0
In [3]: df.groupby('phone')['label'].transform('sum')
Out[3]:
0 1
1 1
2 1
3 1
4 0
5 1
dtype: int64
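For reference, the In/Out session above can be reproduced end to end with this self-contained sketch, using the example data from the question:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({"phone": list("abaacb"), "label": [0, 1, 1, 0, 0, 0]})

# For each row, replace label with the total number of 1s for that phone
df["label"] = df.groupby("phone")["label"].transform("sum")
print(df)
```

transform('sum') aggregates per group but broadcasts the result back to the original row index, which is why no merge is needed.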

You can filter and group data in pandas. For your case it would look like this.
Assume the data is:
phone label
0 a 0
1 b 1
2 a 1
3 a 1
4 c 1
5 d 1
6 a 0
7 c 0
8 b 0
df.groupby(['phone','label'])['label'].count()
phone label
a 0 2
1 2
b 0 1
1 1
c 0 1
1 1
d 1 1
If you require the group count of phones given label==1, then do this:
#first filter to get only label==1 rows
phone_rows_label_one_df = df[df.label==1]
#then do groupby
phone_rows_label_one_df.groupby(['phone'])['label'].count()
phone
a 2
b 1
c 1
d 1
To get the count as a new column in the dataframe, do this:
phone_rows_label_one_df.groupby(['phone'])['label'].count().reset_index(name='count')
phone count
0 a 2
1 b 1
2 c 1
3 d 1
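Putting the filter, groupby, and reset_index steps above together, a runnable sketch with the answer's sample data looks like this:

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({
    "phone": list("abaacdacb"),
    "label": [0, 1, 1, 1, 1, 1, 0, 0, 0],
})

# Keep only label == 1 rows, then count occurrences per phone
counts = (
    df[df.label == 1]
    .groupby("phone")["label"]
    .count()
    .reset_index(name="count")
)
print(counts)
```

reset_index(name='count') turns the grouped Series back into a flat DataFrame with a named count column.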

Related

iterate through columns pandas dataframe and create another column based on a condition

I have a dataframe df
ID ID2 escto1 escto2 escto3
1 A 1 0 0
2 B 0 1 0
3 C 0 0 3
4 D 0 2 0
so either using indexing or using wildcard
like column name 'escto*'
if df.iloc[:, 2:]>0 then df.helper=1
or
df.loc[(df.iloc[:, 3:]>0,'Transfer')]=1
So that output becomes
ID ID2 escto1 escto2 escto3 helper
1 A 1 0 0 1
2 B 0 1 0 1
3 C 0 0 3 1
4 D 0 2 0 1
One option is to use the boolean output:
df.assign(helper = df.filter(like='escto').gt(0).any(axis=1).astype(int))
ID ID2 escto1 escto2 escto3 helper
0 1 A 1 0 0 1
1 2 B 0 1 0 1
2 3 C 0 0 3 1
3 4 D 0 2 0 1
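As a self-contained sketch of the filter/any approach above, built on the question's example data:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "ID2": list("ABCD"),
    "escto1": [1, 0, 0, 0],
    "escto2": [0, 1, 0, 2],
    "escto3": [0, 0, 3, 0],
})

# filter(like='escto') selects the wildcard columns; any(axis=1)
# flags rows where at least one of them is positive
df = df.assign(helper=df.filter(like="escto").gt(0).any(axis=1).astype(int))
print(df)
```

Using filter(like=...) avoids hard-coding the positional slice df.iloc[:, 2:], so it keeps working if column order changes.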

Set value when row is maximum in group by - Python Pandas

I am trying to create a column (is_max) that has either 1 if a column B is the maximum in a group of values of column A or 0 if it is not.
Example:
[Input]
A B
1 2
2 3
1 4
2 5
[Output]
A B is_max
1 2 0
2 3 0
1 4 1
2 5 1
What I'm trying:
df['is_max'] = 0
df.loc[df.reset_index().groupby('A')['B'].idxmax(),'is_max'] = 1
Fix your code by removing the reset_index:
df['is_max'] = 0
df.loc[df.groupby('A')['B'].idxmax(),'is_max'] = 1
df
Out[39]:
A B is_max
0 1 2 0
1 2 3 0
2 1 4 1
3 2 5 1
I assume A is your grouping column, since you did not state it:
df['is_max']=(df['B']==df.groupby('A')['B'].transform('max')).astype(int)
or
df.groupby('A')['B'].apply(lambda x: x==x.max()).astype(int)
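A runnable sketch of the idxmax fix above, using the question's data:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({"A": [1, 2, 1, 2], "B": [2, 3, 4, 5]})

# idxmax returns the index label of each group's maximum B,
# which .loc can use directly to set the flag
df["is_max"] = 0
df.loc[df.groupby("A")["B"].idxmax(), "is_max"] = 1
print(df)
```

Note that idxmax works with the original index labels, which is exactly why the extra reset_index in the question broke the alignment.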

Use groupby and merge to create new column in pandas

So I have a pandas dataframe that looks something like this.
name is_something
0 a 0
1 b 1
2 c 0
3 c 1
4 a 1
5 b 0
6 a 1
7 c 0
8 a 1
Is there a way to use groupby and merge to create a new column that gives the number of times a name appears with an is_something value of 1 in the whole dataframe? The updated dataframe would look like this:
name is_something no_of_times_is_something_is_1
0 a 0 3
1 b 1 1
2 c 0 1
3 c 1 1
4 a 1 3
5 b 0 1
6 a 1 3
7 c 0 1
8 a 1 3
I know you can just loop through the dataframe to do this but I'm looking for a more efficient way because the dataset I'm working with is quite large. Thanks in advance!
If there are only 0 and 1 values in the is_something column, just use sum with GroupBy.transform to create a new column filled with the aggregated values:
df['new'] = df.groupby('name')['is_something'].transform('sum')
print (df)
name is_something new
0 a 0 3
1 b 1 1
2 c 0 1
3 c 1 1
4 a 1 3
5 b 0 1
6 a 1 3
7 c 0 1
8 a 1 3
If multiple values are possible, first compare with 1, convert to integer, and then use transform with sum:
df['new'] = df['is_something'].eq(1).astype('i1').groupby(df['name']).transform('sum')
Or we can just map it:
df['New']=df.name.map(df.query('is_something ==1').groupby('name')['is_something'].sum())
df
name is_something New
0 a 0 3
1 b 1 1
2 c 0 1
3 c 1 1
4 a 1 3
5 b 0 1
6 a 1 3
7 c 0 1
8 a 1 3
You could do:
df['new'] = df.groupby('name')['is_something'].transform(lambda xs: xs.eq(1).sum())
print(df)
Output
name is_something new
0 a 0 3
1 b 1 1
2 c 0 1
3 c 1 1
4 a 1 3
5 b 0 1
6 a 1 3
7 c 0 1
8 a 1 3
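All three answers above broadcast a per-name count of 1s back to every row; a minimal self-contained sketch of the transform('sum') variant with the question's data:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "name": list("abccabaca"),
    "is_something": [0, 1, 0, 1, 1, 0, 1, 0, 1],
})

# Sum the 0/1 flags per name and broadcast back to every row,
# avoiding any explicit Python loop
df["new"] = df.groupby("name")["is_something"].transform("sum")
print(df)
```

On a large frame this vectorized groupby/transform is far faster than iterating over rows, which is what the asker wanted to avoid.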

python: how to sum unique elements respectively of a dataframe column based on another column

For example, I have a df with two columns.
Input
df = pd.DataFrame({'user_id':list('aaabbbccc'),'label':[0,0,1,0,0,2,0,1,2]})
print('df\n',df)
Output
df
label user_id
0 0 a
1 0 a
2 1 a
3 0 b
4 0 b
5 2 b
6 0 c
7 1 c
8 2 c
I want to count the element in label group by user_id respectively.
The expected output is shown as follow.
Expected
df
label user_id label_0 label_1 label_2
0 0 a 2 1 0
1 0 a 2 1 0
2 1 a 2 1 0
3 0 b 2 0 1
4 0 b 2 0 1
5 2 b 2 0 1
6 0 c 1 1 1
7 1 c 1 1 1
8 2 c 1 1 1
Briefly, in column label_0, I count the number of 0 in column label based on column user_id.
Hoping for some help!
The idea is to create a helper DataFrame with groupby and size (or value_counts), then unstack it and join it back to the original df:
df = (df.join(df.groupby(['user_id', 'label'])
                .size()
                .unstack(fill_value=0)
                .add_prefix('label_'), 'user_id'))

df = (df.join(df.groupby('user_id')['label']
                .value_counts()
                .unstack(fill_value=0)
                .add_prefix('label_'), 'user_id'))
Or using crosstab and merge with left join:
df = (df.merge(pd.crosstab(df['user_id'], df['label'])
.add_prefix('label_'), on='user_id', how='left'))
print (df)
user_id label label_0 label_1 label_2
0 a 0 2 1 0
1 a 0 2 1 0
2 a 1 2 1 0
3 b 0 2 0 1
4 b 0 2 0 1
5 b 2 2 0 1
6 c 0 1 1 1
7 c 1 1 1 1
8 c 2 1 1 1
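A self-contained sketch of the crosstab/merge approach above, with a minor variation that resets the crosstab index before merging so 'user_id' is a plain column on both sides:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({"user_id": list("aaabbbccc"),
                   "label": [0, 0, 1, 0, 0, 2, 0, 1, 2]})

# Cross-tabulate label counts per user, then merge them back onto every row
ct = pd.crosstab(df["user_id"], df["label"]).add_prefix("label_").reset_index()
df = df.merge(ct, on="user_id", how="left")
print(df)
```

crosstab builds the users-by-labels count matrix in one call, and the left merge repeats each user's counts on all of that user's rows.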

Counting preceding entries of a column and creating a new variable of these counts

I have a data frame and I want to count the number of consecutive entries of one column and record the counts in a separate variable. Here is an example:
ID Class
1 A
1 A
2 A
1 B
1 B
1 B
2 B
1 C
1 C
2 A
2 A
2 A
I want in each group ID to count the number of consecutive classes, so the output would look like this:
ID Class Counts
1 A 0
1 A 1
2 A 0
1 B 0
1 B 1
1 B 2
2 B 0
1 C 0
1 C 1
2 A 0
2 A 1
2 A 2
I am not looking for the frequency of occurrence of specific entries as here, but rather for the consecutive occurrences of an entry at the ID level.
You can use cumcount on a grouper Series created by taking the cumsum of a comparison between the concatenated key and its shifted values:
#use separator which is not in data like _ or ¥
s = df['ID'].astype(str) + '¥' + df['Class']
df['Counts'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount()
print (df)
ID Class Counts
0 1 A 0
1 1 A 1
2 2 A 0
3 1 B 0
4 1 B 1
5 1 B 2
6 2 B 0
7 1 C 0
8 1 C 1
9 2 A 0
10 2 A 1
11 2 A 2
Another solution with ngroup (pandas 0.20.2+):
s = df.groupby(['ID','Class']).ngroup()
df['Counts'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount()
print (df)
ID Class Counts
0 1 A 0
1 1 A 1
2 2 A 0
3 1 B 0
4 1 B 1
5 1 B 2
6 2 B 0
7 1 C 0
8 1 C 1
9 2 A 0
10 2 A 1
11 2 A 2
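A self-contained sketch of the first solution above, with the question's example data:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 2, 2],
    "Class": list("AAABBBBCCAAA"),
})

# Build a combined key; wherever it changes from the previous row,
# cumsum starts a new run id, and cumcount numbers rows within each run
s = df["ID"].astype(str) + "¥" + df["Class"]
df["Counts"] = df.groupby(s.ne(s.shift()).cumsum()).cumcount()
print(df)
```

The shift/cumsum trick is the standard pandas idiom for labeling runs of consecutive equal values, which is exactly what distinguishes this from a plain frequency count.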
