I want to display the values in a column along with their counts in separate columns.
The dataframe is:
Date Name SoldItem
15-Jul Joe TV
15-Jul Joe Fridge
15-Jul Joe Washing Machine
15-Jul Joe TV
15-Jul Joe Fridge
15-Jul Mary Chair
15-Jul Mary Fridge
16-Jul Joe Fridge
16-Jul Joe Fridge
16-Jul Tim Washing Machine
17-Jul Joe Washing Machine
17-Jul Jimmy Washing Machine
17-Jul Joe Washing Machine
17-Jul Joe Washing Machine
And I get the output as
Date Name Count
15-Jul Joe 2
Mary 1
16-Jul Joe 2
I want the final output to be
Date Joe Mary
15-Jul 2 1
16-Jul 2
Below is the code:
import pandas as pd

fields = ['Date', 'Name', 'SoldItem']
df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
df_fridge = df.loc[df['SoldItem'] == 'Fridge']
df_fridge_grp = df_fridge.groupby(["Date", "Name"]).size()
print(df_fridge_grp)
Any pointers would be appreciated. I am guessing it can be done with loc or iloc, but I am wondering whether my approach is wrong. Basically I want to count the values for certain types of items per person and then display that count against the name in a column.
Does
df_fridge_grp.unstack()
work?
Code:
df_new = df[df['SoldItem'] == 'Fridge'].groupby(['Date', 'Name']).count()
df_new = df_new.unstack().fillna(0).astype(int)
print(df_new)
Output:
SoldItem
Name Joe Mary
Date
15-Jul 2 1
16-Jul 2 0
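If you prefer a single step, pd.crosstab counts and pivots at once. This is a sketch assuming the same df as above; the column labels come out flat (just the names) rather than nested under SoldItem:

import pandas as pd

# Keep only fridge sales, then count per Date/Name and pivot Name into columns
fridge = df[df['SoldItem'] == 'Fridge']
print(pd.crosstab(fridge['Date'], fridge['Name']))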
Related
I have a dataframe that looks something like this:
id name last attribute_1_name attribute_1_rating attribute_2_name attribute_2_rating
1 Linda Smith Age 23 Hair Brown
3 Brian Lin Hair Black Job Barista
Essentially I'd like to transform this table to look like so:
id name last attribute_name attribute_rating
1 Linda Smith Age 23
1 Linda Smith Hair Brown
3 Brian Lin Hair Black
3 Brian Lin Job Barista
What's the most elegant and efficient way to perform this transformation in Python? Assuming there are many more rows and the attribute numbers go up to 13.
Assuming the attribute columns are named consistently, you can do this:
import pandas as pd

result = pd.DataFrame()
# n is the number of attribute pairs (13 in the full data)
for i in range(1, n + 1):
    attribute_name_col = f'attribute_{i}_name'
    attribute_rating_col = f'attribute_{i}_rating'
    melted = pd.melt(
        df,
        id_vars=['id', 'name', 'last', attribute_name_col],
        value_vars=[attribute_rating_col]
    )
    melted = melted.rename(
        columns={attribute_name_col: 'attribute_name',
                 'value': 'attribute_rating'}
    )
    melted = melted.drop('variable', axis=1)
    result = pd.concat([result, melted])
where df is your original dataframe. Then printing result gives
id name last attribute_name attribute_rating
1 Linda Smith Age 23
3 Brian Lin Hair Black
1 Linda Smith Hair Brown
3 Brian Lin Job Barista
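A loop-free alternative is pd.lreshape, which stacks groups of columns into long format and carries the remaining columns along as identifiers. This is a sketch assuming the columns really follow the attribute_{i}_name / attribute_{i}_rating pattern and that every listed column exists in the frame (rows with fewer attributes would just hold NaN in the later pairs):

import pandas as pd

n = 2  # number of attribute pairs; 13 for the full data
groups = {
    'attribute_name':   [f'attribute_{i}_name' for i in range(1, n + 1)],
    'attribute_rating': [f'attribute_{i}_rating' for i in range(1, n + 1)],
}
# Rows where a reshaped value is missing are dropped by default (dropna=True)
result = pd.lreshape(df, groups)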
I have a pandas dataframe that includes a "Name" column. Strings in the Name column may contain "Joe", "Bob", or "Joe Bob". I want to add a column for the type of person: just Joe, just Bob, or Both.
I was able to do this by creating boolean columns, turning them into strings, combining the strings, and then replacing the values. It just...didn't feel very elegant! I am new to Python...is there a better way to do this?
My original dataframe:
df = pd.DataFrame(data= [['Joe Biden'],['Bobby Kennedy'],['Joe Bob Briggs']], columns = ['Name'])
             Name
0       Joe Biden
1   Bobby Kennedy
2  Joe Bob Briggs
I added two boolean columns to find names:
df['Joe'] = df.Name.str.contains('Joe')
df['Joe'] = df.Joe.astype('int')
df['Bob'] = df.Name.str.contains('Bob')
df['Bob'] = df.Bob.astype('int')
Now my dataframe looks like this:
df = pd.DataFrame(data= [['Joe Biden',1,0],['Bobby Kennedy',0,1],['Joe Bob Briggs',1,1]], columns = ['Name','Joe', 'Bob'])
             Name  Joe  Bob
0       Joe Biden    1    0
1   Bobby Kennedy    0    1
2  Joe Bob Briggs    1    1
But what I really want is one "Type" column with categorical values: Joe, Bob, or Both.
To do that, I added a column to combine the booleans, then I replaced the values:
df["Type"] = df["Joe"].astype(str) + df["Bob"].astype(str)
             Name  Joe  Bob  Type
0       Joe Biden    1    0    10
1   Bobby Kennedy    0    1     1
2  Joe Bob Briggs    1    1    11
df['Type'] = df.Type.astype('str')
df['Type'].replace({'11': 'Both', '10': 'Joe', '1': 'Bob'}, inplace=True)
             Name  Joe  Bob  Type
0       Joe Biden    1    0   Joe
1   Bobby Kennedy    0    1   Bob
2  Joe Bob Briggs    1    1  Both
This feels clunky. Anyone have a better way?
Thanks!
You can use np.select to create the Type column.
You need to order your condlist correctly, from the most specific condition to the widest.
import numpy as np

df['Type'] = np.select([df['Name'].str.contains('Joe') & df['Name'].str.contains('Bob'),
                        df['Name'].str.contains('Joe'),
                        df['Name'].str.contains('Bob')],
                       choicelist=['Both', 'Joe', 'Bob'])
Output:
>>> df
Name Type
0 Joe Biden Joe
1 Bobby Kennedy Bob
2 Joe Bob Briggs Both
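If some names could contain neither substring, np.select also takes a default label for rows that match no condition. A small sketch; 'Neither' is just a placeholder value:

import numpy as np

conditions = [df['Name'].str.contains('Joe') & df['Name'].str.contains('Bob'),
              df['Name'].str.contains('Joe'),
              df['Name'].str.contains('Bob')]
df['Type'] = np.select(conditions, ['Both', 'Joe', 'Bob'], default='Neither')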
I have a question regarding how to fill missing date values in a pandas dataframe.
I found a similar question ( pandas fill missing dates in time series )
but this doesn't answer my actual question.
I have a dataframe looking something like this:
date amount person country
01.01.2019 10 John IT
01.03.2019 5 Jane SWE
01.05.2019 3 Jim SWE
01.05.2019 10 Jim SWE
02.01.2019 10 Bob UK
02.01.2019 10 Jane SWE
02.03.2019 10 Sue IT
As you can see, there are missing values in the dates.
What I need to do is to fill the missing date-values and fill remaining column values with the values from the previous line, EXCEPT for the column 'amount', which I need to be a 0, otherwise I would falsify my amounts.
I know there is a command for that in Pandas ( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html ) but I'm not sure how to apply that to filling missing values.
Something like this?
data = data.reindex(pd.date_range("2019-01-01", "2019-01-03"), method='backfill', fill_value="0")
The expected output would be as follows:
date amount person country
01.01.2019 10 John IT
01.02.2019 0 Jane SWE
01.03.2019 5 Jane SWE
01.04.2019 0 Jane SWE
01.05.2019 3 Jim SWE
01.05.2019 10 Jim SWE
02.01.2019 10 Bob UK
02.01.2019 10 Jane SWE
02.02.2019 0 Jane SWE
02.03.2019 10 Sue IT
I would appreciate any help on that regard.
Thank you and BR
I think the simplest approach is to replace the missing values in two steps:
data = data.reindex(pd.date_range("2014-11-01", "2019-10-31"))
data['amount'] = data['amount'].fillna(0)
data = data.bfill()
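For the full mechanics, here is a minimal sketch assuming the date column has been parsed and set as the index, and that the index is unique (with duplicate dates, as in the sample data, reindex raises, so duplicates would need to be handled first). The date format is an assumption; adjust it to match your data:

import pandas as pd

data = data.set_index(pd.to_datetime(data['date'], format='%m.%d.%Y')).drop(columns='date')
full_range = pd.date_range(data.index.min(), data.index.max(), freq='D')
data = data.reindex(full_range)            # insert the missing dates as NaN rows
data['amount'] = data['amount'].fillna(0)  # inserted dates get amount 0
data = data.bfill()                        # person/country come from the next known row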
I have a dataframe that looks like the following:
Name Status Date
1 Joe In 1/2/2003
2 Pete Out 1/2/2003
3 Mary In 1/2/2003
• • •
4 Joe In 3/4/2004
5 Pete In 3/5/2004
6 Mary Out 4/8/2004
If I do the following group-by action :
df.groupby(["Name", "Status"]).last()
I get the following:
Joe In 3/4/2004
Pete In 3/5/2004
Out 1/2/2003
Mary In 1/2/2003
Out 4/8/2004
Notice that Joe has no "out" grouping results because there are no "out" values for Joe in the dataframe.
I want to be able to select, from the dataframe or the subsequent groupby, people who have only "In" status or only "Out" status across a date range, as opposed to people who have both "In"s AND "Out"s across that range. I'm stumped as to how to approach this. I could proceed if the groupby result gave me something like:
Joe  Out  NaN
But it doesn't.
Oh, I do the groupby with last() because I need the last date for people who have both "In" and "Out" status, like Pete and Mary. But I need to treat Joe, who only has "In" status and no "Out" status for the period, differently.
Any guidance appreciated.
I'm not sure exactly what you want, but you can try reindexing.
From
x = df.groupby(['Name', 'Status']).last()
Date
Name Status
Joe In 3/4/2004
Mary In 1/2/2003
Out 4/8/2004
Pete In 3/5/2004
Out 1/2/2003
You can turn it into:
import numpy as np

size = x.index.levels[0].size
f = np.repeat(np.arange(size), 2)
s = [0, 1] * size
x.reindex(pd.MultiIndex(levels=x.index.levels, codes=[f, s]))  # 'labels=' in older pandas
Date
Name Status
Joe In 3/4/2004
Out NaN
Mary In 1/2/2003
Out 4/8/2004
Pete In 3/5/2004
Out 1/2/2003
I have a dataframe with one column of last names, and one column of first names. How do I merge these columns so that I have one column with first and last names?
Here is what I have:
First Name (Column 1)
John
Lisa
Jim
Last Name (Column 2)
Smith
Brown
Dandy
This is what I want:
Full Name
John Smith
Lisa Brown
Jim Dandy
Thank you!
Try
df.assign(name=df.apply(' '.join, axis=1)).drop(['first name', 'last name'], axis=1)
You get
name
0 bob smith
1 john smith
2 bill smith
Here's a sample df:
df
first name last name
0 bob smith
1 john smith
2 bill smith
You can do the following to combine columns:
df['combined']= df['first name'] + ' ' + df['last name']
df
first name last name combined
0 bob smith bob smith
1 john smith john smith
2 bill smith bill smith
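Series.str.cat does the same concatenation and handles the separator for you. A sketch using the same lowercase column names as the sample df; pass na_rep if either name can be missing:

df['combined'] = df['first name'].str.cat(df['last name'], sep=' ')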