This question already has answers here:
How to only do string manupilation on column of pandas that have 4 digits or less?
(3 answers)
Closed 3 years ago.
I have a pandas dataframe with a Phone column, but the data is a bit inconsistent. Here are some examples that I would like to focus on.
df["Phone"]
0 732009852
1 738073222
2 755920306
3 0755353288
Row 3 has the necessary leading 0 for an Australian number. How do I update rows like 0, 1 and 2?
Use pandas.Series.str.zfill:
s = pd.Series(['732009852', '0755353288'])
s.str.zfill(10)
Output:
0 0732009852
1 0755353288
Or pd.Series.str.rjust:
print(df["Phone"].str.rjust(10, '0'))
Output:
0 0732009852
1 0738073222
2 0755920306
3 0755353288
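As a minimal end-to-end sketch of the zfill approach on the four rows from the question: the dataframe below is rebuilt by hand, and the astype(str) call is an added assumption to cover the case where the column was not already stored as strings.

```python
import pandas as pd

# Sample rows from the question, stored as strings.
df = pd.DataFrame(
    {"Phone": ["732009852", "738073222", "755920306", "0755353288"]}
)

# zfill left-pads with zeros up to a width of 10 characters.
# Values that are already 10 characters long are left unchanged.
# astype(str) is a precaution in case the column holds non-string values.
df["Phone"] = df["Phone"].astype(str).str.zfill(10)

print(df["Phone"].tolist())
# ['0732009852', '0738073222', '0755920306', '0755353288']
```

Note that if the column was parsed as integers on load, the leading zero of row 3 would already be gone before this step, so padding every value to 10 characters restores it as well.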
This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 3 years ago.
I have a dataframe with information about different users (id), a categorical variable with many repeated values (photo_type), and the corresponding number of interactions (likes). How can I calculate the total number of likes for each photo type?
For example:
id photo_type likes
1 nature 2
2 art 4
3 art 1
4 fashion 3
5 fashion 2
I expect to get something like this:
total number of likes for nature: 2
total number of likes for art: 5
total number of likes for fashion: 5
Use pandas.DataFrame.groupby:
df.groupby('photo_type')['likes'].sum()
Output:
photo_type
art 5
fashion 5
nature 2
Name: likes, dtype: int64
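A small runnable sketch of the same groupby, with the dataframe from the question rebuilt by hand. The variable name totals and the print loop are only illustrative; they show one way to get the per-type lines the question asks for.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "photo_type": ["nature", "art", "art", "fashion", "fashion"],
        "likes": [2, 4, 1, 3, 2],
    }
)

# Group rows by photo_type and add up the likes within each group.
# The result is a Series indexed by photo_type (sorted by group key by default).
totals = df.groupby("photo_type")["likes"].sum()

# Print one line per photo type.
for photo_type, total in totals.items():
    print(f"total number of likes for {photo_type}: {total}")
```

If a regular dataframe is preferred over a Series, calling reset_index() on totals turns photo_type back into a column.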
This question already has answers here:
Pandas dataframe: truncate string fields
(4 answers)
Closed 4 years ago.
I have a dataframe with some columns having large sentences.
How do I truncate the columns to say 50 characters max?
current df:
a b c
I like data science 1 2
new truncated df for ONLY column a:
a b c
I like data 1 2
(The above is an example sentence I made up)
For a specific column:
df['a'] = df['a'].str[:50]
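To make the slicing concrete, here is a short sketch using the example row from the question. The limit of 11 characters (max_len) is a hypothetical value picked so the effect is visible on such a short sentence; the question itself asks for 50.

```python
import pandas as pd

# Example row from the question.
df = pd.DataFrame({"a": ["I like data science"], "b": [1], "c": [2]})

# Hypothetical limit, chosen so the truncation is visible on this short string.
max_len = 11

# .str[:n] slices every string in the column to its first n characters.
# Strings shorter than n are left as they are; columns b and c are untouched.
df["a"] = df["a"].str[:max_len]

print(df)
```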
This question already has answers here:
why should I make a copy of a data frame in pandas
(8 answers)
Closed 4 years ago.
I have a dataframe df
a b c
0 5 6 9
1 6 7 10
2 7 8 11
3 8 9 12
So if I want to select only columns a and b and store them in another df, I would use something like this:
df1 = df[['a','b']]
But I have seen places where people write it this way
df1 = df[['a','b']].copy()
Can anyone tell me what .copy() does, given that the earlier code works just fine?
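.copy() matters once you start modifying the new dataframe. Plain assignment does not create a new dataframe; it only gives the same object a second name (example using an in-place replace):
df2=df
df2.replace('blah','foo',inplace=True)
Here the change also shows up in df, so:
df.equals(df2)
Will be:
True
If you want the change to apply only to df2, copy first:
df2=df.copy()
df2.replace('blah','foo',inplace=True)
Then now (assuming df contained 'blah'):
df.equals(df2)
Returns:
False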
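For the column-subset case from the question, a minimal sketch: the dataframe is rebuilt from the question, and the value -1 written into df1 is arbitrary. With an explicit .copy(), df1 owns its data, so later edits to df1 leave df as it was. Without the .copy(), assigning into the subset may trigger a SettingWithCopyWarning, depending on the pandas version.

```python
import pandas as pd

# Dataframe from the question.
df = pd.DataFrame(
    {"a": [5, 6, 7, 8], "b": [6, 7, 8, 9], "c": [9, 10, 11, 12]}
)

# Select two columns and take an explicit, independent copy.
df1 = df[["a", "b"]].copy()

# Modify the copy; -1 is an arbitrary marker value.
df1.loc[0, "a"] = -1

print(df1.loc[0, "a"])  # -1, the copy changed
print(df.loc[0, "a"])   # 5, the original is untouched
```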
This question already has answers here:
Pandas filtering for multiple substrings in series
(3 answers)
Closed 4 years ago.
I tried
df = df[~df['event.properties.comment'].isin(['Extra'])]
The problem is that this only drops rows where the column is exactly 'Extra'; I need to drop the rows that contain it even as a substring.
Any help?
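You can use str.contains, which matches substrings, and negate the result with ~ to drop the matching rows. Filling NaN with '' first keeps the mask purely boolean. To check several substrings at once, join them with the regex or operator |.
Consider this df: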
vals ids
0 1 ~
1 2 bball
2 3 NaN
3 4 Extra text
df[~df.ids.fillna('').str.contains('Extra')]
Out:
vals ids
0 1 ~
1 2 bball
2 3 NaN
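A sketch of the multi-substring variant, using the same sample dataframe. The list unwanted is hypothetical and only for illustration; re.escape is used so that any regex metacharacters in the substrings are matched literally, and na=False is an alternative to fillna('') that treats missing values as non-matches.

```python
import re

import numpy as np
import pandas as pd

# Sample dataframe from the answer above.
df = pd.DataFrame(
    {"vals": [1, 2, 3, 4], "ids": ["~", "bball", np.nan, "Extra text"]}
)

# Hypothetical list of substrings whose rows should be dropped.
unwanted = ["Extra", "bball"]

# Build one regex of the form "Extra|bball", escaping each piece.
pattern = "|".join(re.escape(s) for s in unwanted)

# True where ids contains any unwanted substring; NaN counts as no match.
mask = df["ids"].str.contains(pattern, na=False)

# Keep only the rows that do not match.
filtered = df[~mask]
print(filtered)
```

Here the rows with ids "~" and NaN are kept, while "bball" and "Extra text" are dropped.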
This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Closed 4 years ago.
I have this dataframe:
code
0 0000
1 0000
2 0123
3 0123
4 4567
I want to group by code and count how many of each code there are in the dataframe, so that it looks like this:
code count
0 0000 2
1 0123 2
2 4567 1
I was using: group=df.groupby('code').agg('count')
However I wasn't getting it right.
Can someone help?
Had a similar problem and I'm forming @user3483203's comments into an answer so the question can be marked as answered.
Using:
df.groupby('code').size()
# OR
df.code.value_counts()
This groups all rows with matching codes and outputs the total for each code.
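To get the exact two-column layout shown in the question, one option is to turn the result of size() back into a dataframe. This is a sketch on the sample data; the column name count is taken from the desired output, and counts is just an illustrative variable name.

```python
import pandas as pd

# Sample data from the question; codes kept as strings to preserve leading zeros.
df = pd.DataFrame({"code": ["0000", "0000", "0123", "0123", "4567"]})

# size() counts the rows in each group and returns a Series indexed by code.
# reset_index(name="count") moves code back into a column and names the counts.
counts = df.groupby("code").size().reset_index(name="count")

print(counts)
```

The original attempt with agg('count') counts non-null values in the remaining columns, and since code is the only column here there is nothing left to count, which is likely why it did not give the expected result.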