Getting count groupby right on pandas [duplicate] - python

This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Closed 4 years ago.
I have this dataframe:
code
0 0000
1 0000
2 0123
3 0123
4 4567
I want to group by code and count how many of each code there is in the dataframe, so it ends up looking like this:
code count
0 0000 2
1 0123 2
2 4567 1
I was using: group = df.groupby('code').agg('count')
However, I wasn't getting the result I wanted.
Can someone help?

I had a similar problem, and I'm turning #user3483203's comments into an answer so the question can be marked as answered.
Using:
df.groupby('code').size()
# OR
df.code.value_counts()
This groups all rows with matching codes and outputs the total for each code.
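If the exact two-column frame from the question (code and count) is needed, resetting the index after size() is one way to get there. A minimal sketch, assuming a hypothetical reconstruction of the dataframe above:
import pandas as pd

# Hypothetical reconstruction of the example dataframe (codes kept as strings to preserve leading zeros)
df = pd.DataFrame({'code': ['0000', '0000', '0123', '0123', '4567']})

# size() counts rows per group; reset_index(name='count') turns the result back into a dataframe
counts = df.groupby('code').size().reset_index(name='count')
print(counts)
#    code  count
# 0  0000      2
# 1  0123      2
# 2  4567      1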

python-pandas: use values as index and count them in dataframe [duplicate]

This question already has answers here:
How to get value counts for multiple columns at once in Pandas DataFrame?
(14 answers)
Closed 2 years ago.
Need help with pandas.
I have this dataframe:
a b c
Yes No Yes
Yes Yes No
Yes No No
How can I write pandas code that turns my dataframe into:
a b c
Yes 3 1 1
No 0 2 2
I'm looking into using iloc and lambda, but I'm really clueless. Is there a way for me to implement this?
This actually does the trick:
df.apply(pd.value_counts)
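Note that df.apply(pd.value_counts) leaves NaN where a value never occurs in a column, so reproducing the 0 in the desired output takes a fillna. A minimal sketch, assuming a hypothetical reconstruction of the Yes/No dataframe above:
import pandas as pd

# Hypothetical reconstruction of the example dataframe
df = pd.DataFrame({'a': ['Yes', 'Yes', 'Yes'],
                   'b': ['No', 'Yes', 'No'],
                   'c': ['Yes', 'No', 'No']})

# Count 'Yes'/'No' per column, fix the row order, and replace missing counts with 0
counts = df.apply(pd.value_counts).reindex(['Yes', 'No']).fillna(0).astype(int)
print(counts)
#      a  b  c
# Yes  3  1  1
# No   0  2  2
In newer pandas releases the top-level pd.value_counts is deprecated; df.apply(lambda s: s.value_counts()) is an equivalent spelling.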

How to count values in dataframe column? [duplicate]

This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Count the frequency that a value occurs in a dataframe column
(15 answers)
Closed 3 years ago.
For the dataframe as below
animal direction
0 monkey north
1 frog north
2 monkey east
3 zebra west
....
I would like to count the number of each animal present in this dataframe, so that I end up with the dataframe below:
animal count
0 monkey 3
1 frog 9
2 zebra 4
3 elephant 11
....
How can I achieve this? I tried value_counts() and groupby, but I couldn't quite achieve what I wanted...
Thank you for help.
df.groupby(['col1']).size().reset_index(name='counts')
This method worked beautifully. Thank you.
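Applied to the example above (using only the rows shown, so the counts differ from the expected output), a sketch of that same pattern with the column name swapped in:
import pandas as pd

# Hypothetical reconstruction of the visible rows of the example dataframe
df = pd.DataFrame({'animal': ['monkey', 'frog', 'monkey', 'zebra'],
                   'direction': ['north', 'north', 'east', 'west']})

counts = df.groupby('animal').size().reset_index(name='count')
print(counts)
#    animal  count
# 0    frog      1
# 1  monkey      2
# 2   zebra      1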

Pandas dataframe with concatenated column [duplicate]

This question already has an answer here:
Pandas rolling sum on string column
(1 answer)
Closed 3 years ago.
I have a Pandas dataframe that looks like the code below. I need to add a dynamic column that concatenates every value in a sequence before a given line. A loop sounds like the logical solution, but it would be super inefficient over a very large dataframe (1M+ rows).
import pandas as pd

user_id = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3]
variable = ["A", "B", "C", "D", "A", "B", "C", "A", "B", "C", "D", "E"]
sequence = [0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4]
df = pd.DataFrame(list(zip(user_id, variable, sequence)), columns=['User_ID', 'Variables', 'Seq'])
# Need to add a column dynamically
df['dynamic_column'] = ["A", "AB", "ABC", "ABCD", "A", "AB", "ABC", "A", "AB", "ABC", "ABCD", "ABCDE"]
I need to be able to create the dynamic column in an efficient way based on the user_id and the sequence number. I have played with the pandas shift function and that just results in having to create a loop. Looking for some easy efficient way of creating that dynamic concatenated column.
This is a cumulative sum (cumsum) of the strings within each group:
df['dynamic_column'] = df.groupby('User_ID').Variables.apply(lambda x: x.cumsum())
Output:
0 A
1 AB
2 ABC
3 ABCD
4 A
5 AB
6 ABC
7 A
8 AB
9 ABC
10 ABCD
11 ABCDE
Name: Variables, dtype: object
Your question is a little vague, but would something like this work?
df['DynamicColumn'] = df['User_ID'] + df['Seq']
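If the intent of that last suggestion is to concatenate the two columns as text rather than add them numerically, both columns would need to be cast to strings first. A minimal sketch, assuming the User_ID and Seq columns built in the question:
# Hypothetical: join the two columns as strings instead of adding the integers
df['DynamicColumn'] = df['User_ID'].astype(str) + df['Seq'].astype(str)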

How to calculate in Python the sum of variables in each unique row in a dataframe? [duplicate]

This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 3 years ago.
I have a dataframe with information on different users (id), many duplicated categorical variables (photo_type), and their corresponding numbers of interactions (likes). How can I calculate the total number of likes for each photo type?
For example:
id photo_type likes
1 nature 2
2 art 4
3 art 1
4 fashion 3
5 fashion 2
I expect to get information like this:
total number of likes for nature: 2
total number of likes for art: 5
total number of likes for fashion: 5
Use pandas.DataFrame.groupby:
df.groupby('photo_type')['likes'].sum()
Output:
photo_type
art 5
fashion 5
nature 2
Name: likes, dtype: int64
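If a dataframe (rather than a Series) is preferred, reset_index turns the grouped result back into columns. A minimal sketch, assuming a hypothetical reconstruction of the example data:
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'photo_type': ['nature', 'art', 'art', 'fashion', 'fashion'],
                   'likes': [2, 4, 1, 3, 2]})

totals = df.groupby('photo_type')['likes'].sum().reset_index()
print(totals)
#   photo_type  likes
# 0        art      5
# 1    fashion      5
# 2     nature      2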

How do I clean phone numbers in pandas [duplicate]

This question already has answers here:
How to only do string manupilation on column of pandas that have 4 digits or less?
(3 answers)
Closed 3 years ago.
I have a pandas dataframe with a Phone column; however, the data is a bit inconsistent. Here are some examples that I would like to focus on.
df["Phone"]
0 732009852
1 738073222
2 755920306
3 0755353288
Row 3 has the necessary leading 0 for an Australian number. How do I update rows like 0, 1, and 2?
Use pandas.Series.str.zfill:
s = pd.Series(['732009852', '0755353288'])
s.str.zfill(10)
Output:
0 0732009852
1 0755353288
Or pd.Series.str.rjust:
print(df["Phone"].str.rjust(10, '0'))
Output:
0 0732009852
1 0738073222
2 0755920306
3 0755353288
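Both zfill and rjust are string methods, so they assume the Phone column is already stored as strings; if the numbers were read in as integers (which would also strip any leading zero), the column would need to be cast first. A minimal sketch, assuming a hypothetical reconstruction of the example column:
import pandas as pd

# Hypothetical reconstruction of the example Phone column as strings
df = pd.DataFrame({'Phone': ['732009852', '738073222', '755920306', '0755353288']})

# Cast to string (a no-op here) and left-pad to 10 characters with '0'
df['Phone'] = df['Phone'].astype(str).str.zfill(10)
print(df['Phone'])
# 0    0732009852
# 1    0738073222
# 2    0755920306
# 3    0755353288
# Name: Phone, dtype: object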
