This question already has answers here:
How to assign a name to the size() column?
(5 answers)
Closed 3 years ago.
I want to aggregate some data to append to a dataframe. The following gives me the number of wins per name
import pandas as pd
data = [[1,'tom', 10], [1,'nick', 15], [2,'juli', 14], [2,'peter', 20], [3,'juli', 3], [3,'peter', 13]]
have = pd.DataFrame(data, columns = ['Round', 'Winner', 'Score'])
WinCount= have.groupby(['Winner']).size().to_frame('WinCount')
WinCount
However, the output does not give me two columns named Winner and WinCount. Instead, the first column has no name, and the name Winner then appears on a second line:
How can I get a dataframe without these two "blank" fields?
Try this
WinCount=have.groupby(['Winner']).size().to_frame('WinCount').reset_index()
Output
Winner WinCount
0 juli 2
1 nick 1
2 peter 2
3 tom 1
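The blank header appears because groupby moves the grouping key into the index. A minimal sketch, reusing the question's data, of two ways to turn it back into an ordinary column (the names `counts`, `via_to_frame` and `via_name` are only illustrative):

```python
import pandas as pd

data = [[1, 'tom', 10], [1, 'nick', 15], [2, 'juli', 14],
        [2, 'peter', 20], [3, 'juli', 3], [3, 'peter', 13]]
have = pd.DataFrame(data, columns=['Round', 'Winner', 'Score'])

# size() returns a Series whose index is the grouping key 'Winner'
counts = have.groupby('Winner').size()

# Option 1: name the values, then move the index into a column
via_to_frame = counts.to_frame('WinCount').reset_index()

# Option 2: do both in one call on the Series
via_name = counts.reset_index(name='WinCount')

print(via_name.columns.tolist())
```

Both variants should give the same two-column frame; which one to use is mostly a matter of taste.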
This question already has answers here:
Pandas make new column from string slice of another column
(3 answers)
Closed 4 months ago.
data = [['Tom', '5-123g'], ['Max', '6-745.0d'], ['Bob', '5-900.0e'], ['Ben', '2-345',], ['Eva', '9-712.x']]
df = pd.DataFrame(data, columns=['Person', 'Action'])
I want to shorten the "Action" column to a length of 5. My current df has two columns:
['Person'] and ['Action']
I need it to look like this:
Person Action Action_short
0 Tom 5-123g 5-123
1 Max 6-745.0d 6-745
2 Bob 5-900.0e 5-900
3 Ben 2-345 2-345
4 Eva 9-712.x 9-712
What I've tried:
Checking the type of the Column
df['Action'].dtypes
The output is:
dtype('O')
Then I tried:
df['Action'] = df['Action'].map(str)
df['Action_short'] = df.Action.str.slice(start=0, stop=5)
I also tried it with:
df['Action'] = df['Action'].astype(str)
df['Action'] = df['Action'].values.astype(str)
df['Action'] = df['Action'].map(str)
df['Action'] = df['Action'].apply(str)
and with:
df['Action_short'] = df.Action.str.slice(0:5)
df['Action_short'] = df.Action.apply(lambda x: x[:5])
df['pos'] = df['Action'].str.find('.')
df['new_var'] = df.apply(lambda x: x['Action'][0:x['pos']],axis=1)
The output from all my versions was:
Person Action Action_short
0 Tom 5-123g 5-12
1 Max 6-745.0d 6-745
2 Bob 5-900.0e 5-90
3 Ben 2-345 2-34
4 Eva 9-712.x 9-712
The lambda function is not working with 3-222; it slices it to 3-22.
I don't understand why it works for some values and not for others.
Try this:
df['Action_short'] = df['Action'].str.slice(0, 5)
By using .str on a single column of a DataFrame (which is a pd.Series), you can access the pandas string methods, which are designed to look like the string operations on standard Python strings.
# slice by specifying the length you need
df['Action_short']=df['Action'].str[:5]
df
Person Action Action_short
0 Tom 5-123g 5-123
1 Max 6-745.0d 6-745
2 Bob 5-900.0e 5-900
3 Ben 2-345 2-345
4 Eva 9-712.x 9-712
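As a quick check, here is a small sketch on the question's data showing that `.str.slice(0, 5)` and `.str[:5]` produce the same result (the names `by_slice` and `by_index` are only illustrative):

```python
import pandas as pd

data = [['Tom', '5-123g'], ['Max', '6-745.0d'], ['Bob', '5-900.0e'],
        ['Ben', '2-345'], ['Eva', '9-712.x']]
df = pd.DataFrame(data, columns=['Person', 'Action'])

# Method form: start and stop are separate arguments, not "0:5"
by_slice = df['Action'].str.slice(0, 5)

# Indexing form: ordinary Python slice syntax on the .str accessor
by_index = df['Action'].str[:5]

df['Action_short'] = by_index
print(df)
```

Values shorter than five characters are returned unchanged, just as with plain Python string slicing.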
This question already has answers here:
Pandas column access w/column names containing spaces
(6 answers)
Closed last year.
I'm trying to read a column named Goods_Issue_Date_(GID)
How can I read this?
I tried:
Df.Goods_Issue_Date_(GID)
This returns an "invalid syntax" error.
Using the following dataframe as an example
data = [['Carrots', "Tuesday"], ['Apples', "Monday"], ['Pears', "Sunday"]]
df = pd.DataFrame(data, columns = ['Product', 'Goods_Issue_Date_(GID)'])
df.head()
Product Goods_Issue_Date_(GID)
0 Carrots Tuesday
1 Apples Monday
2 Pears Sunday
You can select the Goods_Issue_Date_(GID) column like so
df['Goods_Issue_Date_(GID)']
0 Tuesday
1 Monday
2 Sunday
Name: Goods_Issue_Date_(GID), dtype: object
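Attribute access (`df.name`) only works when the column label is a valid Python identifier, which a label containing parentheses is not. A short sketch of the access styles that work for any label, using the same example frame (the variable names are only illustrative):

```python
import pandas as pd

data = [['Carrots', 'Tuesday'], ['Apples', 'Monday'], ['Pears', 'Sunday']]
df = pd.DataFrame(data, columns=['Product', 'Goods_Issue_Date_(GID)'])

# Bracket access works for any column label
gid = df['Goods_Issue_Date_(GID)']

# .loc with a column label is equivalent
same = df.loc[:, 'Goods_Issue_Date_(GID)']

# Attribute access is fine here because 'Product' is a valid identifier
product = df.Product

print(gid.tolist())
```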
This question already has answers here:
Pandas groupby: how to select adjacent column data after selecting a row based on data in another column in pandas groupby groups?
(3 answers)
Closed 2 years ago.
The code below creates a table containing the max temps for each day. What I would like to do is return the index for all these max temp values so I can apply it to the original df.
df = pd.DataFrame({'date': list1, 'max_temp': list2})
grouped = df.groupby(by='date', as_index=False).max()
You can define another column called "index" before grouping the dataframe:
import pandas as pd
list1 = [7, 9, 3, 4]
list2 = [8, 6, 8, 9]
df = pd.DataFrame({'date': list1, 'max_temp': list2})
df['index'] = df.index
grouped = df.groupby(by="date", as_index=False).max()
print(grouped)
Output:
date max_temp index
0 3 8 2
1 4 9 3
2 7 8 0
3 9 6 1
Now, using df.query, we can get a "date" value by its value in the "index" column:
print(grouped.query("index==0")["date"])
Output:
2 7
Name: date, dtype: int64
df.groupby('date')['max_temp'].idxmax()
It would seem I've found a great solution at the following link:
Pandas groupby: how to select adjacent column data after selecting a row based on data in another column in pandas groupby groups?
(Although this doesn't seem to be the answer they accepted, for some reason.) Anyway, the following worked well for me, if anyone finds themselves in the same position:
idx = df.groupby('date')['max_temp'].transform(max) == df['max_temp']
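The approaches above behave differently once a date has several rows. A hedged sketch on made-up data (the values and the names `colwise`, `first_max_idx`, `mask` and `all_max_idx` are assumptions for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical sample with repeated dates and a tie on date 2
df = pd.DataFrame({'date': [1, 1, 2, 2, 2],
                   'max_temp': [8, 6, 5, 9, 9]})

# Column-wise max: each column is maximised on its own, so the
# 'index' column holds the largest index per date, not the row of the max temp
colwise = df.assign(index=df.index).groupby('date', as_index=False).max()

# idxmax: one index label per date (the first one when there is a tie)
first_max_idx = df.groupby('date')['max_temp'].idxmax()
rows_first = df.loc[first_max_idx]

# transform: boolean mask that marks every row equal to its group's max
mask = df.groupby('date')['max_temp'].transform('max') == df['max_temp']
all_max_idx = df.index[mask]

print(first_max_idx.tolist(), all_max_idx.tolist())
```

Use `idxmax` when one row per date is enough, and the `transform` mask when tied rows should all be kept.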
This question already has answers here:
Multi Index Sorting in Pandas
(2 answers)
Closed 2 years ago.
I have a dataframe and I aggregated it as below. I want to sort it in descending order by 'mean'. I'm using the code below, but it gives an error.
df_agg = df.groupby('Subject Field').agg({'seniority_level':['min','mean','median','max']})
df_agg.sort_values(by='mean',ascending=False).head(10)
Error
Your aggregated dataframe has a multi-level column index, so you need to address the column by specifying both seniority_level and mean.
df_agg.sort_values(('seniority_level', 'mean'), ascending=False)
Quick check to demonstrate:
df = pd.DataFrame({
'Accounting': [1, 2, 3],
'Acoustics': [4, 5, 6],
}).melt(var_name='Subject Field', value_name='seniority_level')
df_agg = df.groupby('Subject Field').agg(
{'seniority_level':['min', 'mean', 'median']}
)
df_agg.sort_values(('seniority_level','mean'), ascending=True)
              seniority_level
                          min mean median
Subject Field
Accounting                  1  2.0    2.0
Acoustics                   4  5.0    5.0
df_agg.sort_values(('seniority_level','mean'), ascending=False)
              seniority_level
                          min mean median
Subject Field
Acoustics                   4  5.0    5.0
Accounting                  1  2.0    2.0
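If the two-level header is not needed, one alternative is named aggregation, which yields flat column names so that sorting by 'mean' works directly. This is a sketch of a different technique from the tuple lookup above, on a small made-up frame (the data and the name `top` are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Subject Field': ['Accounting'] * 3 + ['Acoustics'] * 3,
    'seniority_level': [1, 2, 3, 4, 5, 6],
})

# Named aggregation: output_name=(input_column, function)
df_agg = df.groupby('Subject Field').agg(
    min=('seniority_level', 'min'),
    mean=('seniority_level', 'mean'),
    median=('seniority_level', 'median'),
)

# Columns are now a flat Index, so a plain string is enough
top = df_agg.sort_values(by='mean', ascending=False)
print(top)
```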
This question already has an answer here:
Difference between "as_index = False", and "reset_index()" in pandas groupby
(1 answer)
Closed 3 years ago.
I have this dataframe
df =
name age character
0 A 10 fire
1 A 15 water
2 A 20 earth
3 A 25 air
4 B 10 fire
5 B 7 air
I organized it with groupby,
df = df.groupby('name').aggregate(list)
then I have this output
age character
name
---------------------------------------
A [10,15,20,25] [fire, water, earth, air]
B [10, 7] [fire, air]
I tried to pivot this dataframe based on the name column, but after the groupby, name is no longer among the columns:
print(df.columns)
>>> ["age", "character"]
How can I turn name back into a regular column, so that I can use it for the pivot?
EDIT
Expected output is,
name age character
---------------------------------------
A [10,15,20,25] [fire, water, earth, air]
B [10, 7] [fire, air]
df = df.reset_index()
However, pivoting list data is usually unhelpful.
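For completeness, a small sketch of the two usual ways to keep the grouping key as a column, rebuilt from the question's table (the names `via_reset` and `via_flag` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'A', 'A', 'A', 'B', 'B'],
    'age': [10, 15, 20, 25, 10, 7],
    'character': ['fire', 'water', 'earth', 'air', 'fire', 'air'],
})

# Option 1: aggregate, then move the 'name' index back into a column
via_reset = df.groupby('name').agg(list).reset_index()

# Option 2: ask groupby not to use the key as the index in the first place
via_flag = df.groupby('name', as_index=False).agg(list)

print(via_reset.columns.tolist())
```

Either way, 'name' is an ordinary column again and can be passed to later operations.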