Need to display the max price and company that has it - python

So, I just started working with Python and I need to display the maximum price and the company that has it. I got the data from a CSV file that has multiple columns describing some cars. I'm only interested in two of them: price and company.
I need to display the maximum price and the company that has it. Any advice?
This is what I tried, and I don't know how to get the company too, not only the maximum price.
import pandas as pd

df = pd.read_csv("Automobile_data.csv")
for x in df['price']:
    if x == df['price'].max():
        print(x)

Use Series.max for the maximum price, then create an index with DataFrame.set_index and get the company name with Series.idxmax:
df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 3],
})
print (df)
company price
0 a 7
1 b 8
2 c 9
3 d 4
4 e 2
5 f 3
print(df['price'].max())
9
print(df.set_index('company')['price'].idxmax())
c
Another idea is use DataFrame.agg:
s = df.set_index('company')['price'].agg(['max','idxmax'])
print (s['max'])
9
print (s['idxmax'])
c
If duplicated maximum values are possible and you need all companies with the max price, use boolean indexing with DataFrame.loc to get a Series:
df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 9],
})
print (df)
company price
0 a 7
1 b 8
2 c 9
3 d 4
4 e 2
5 f 9
print(df['price'].max())
9
#only first value
print(df.set_index('company')['price'].idxmax())
c
#all maximum values
s = df.loc[df['price'] == df['price'].max(), 'company']
print (s)
2 c
5 f
Name: company, dtype: object
If you need a one-row DataFrame instead:
out = df.loc[df['price'] == df['price'].max(), ['company','price']]
print (out)
company price
2 c 9
5 f 9
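If only the first maximum matters, the two steps can also be collapsed into one: passing Series.idxmax straight to DataFrame.loc returns the whole row, so company and price come back together. A minimal sketch, rebuilding the first (no-duplicates) sample:

```python
import pandas as pd

# same sample data as the first example above
df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 3],
})

# idxmax gives the index label of the max price; loc returns that row
row = df.loc[df['price'].idxmax()]
print(row['company'], row['price'])
```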

That is how not to use Pandas; Pandas is designed to avoid explicit loops.
import pandas as pd
df = pd.read_csv("Automobile_data.csv")
max_price = df[df['price'] == df['price'].max()]
print(max_price)
That is how you would do it. If you only want price and company:
print(max_price[['company','price']])
Explanation: we create a boolean filter that is True where the price equals the maximum price, then use it as a mask to select the rows we need.

In addition to Jezrael's complete answer, I would suggest using groupby as follows:
df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 3],
})
sorted_df = df.groupby(['price']).max().reset_index()
desired_row = sorted_df.iloc[-1]
price = desired_row['price']
company = desired_row['company']
print('Maximum price is: ', price)
print('The company is: ', company)
The above code prints:
Maximum price is: 9
The company is: c
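A related idiom that skips the groupby/sort round-trip entirely is DataFrame.nlargest, which returns the top-n rows directly; a sketch on the same sample data:

```python
import pandas as pd

# same sample data as above
df = pd.DataFrame({
    'company': list('abcdef'),
    'price': [7, 8, 9, 4, 2, 3],
})

# nlargest(1, 'price') returns the row with the highest price
top = df.nlargest(1, 'price')
print(top)
```

Passing a larger n (or keep='all') would also surface ties, which the groupby approach above does not.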

Related

Pandas unique count of number of records in a column where multiple values are present in another column

I'm trying to count the unique number of Customer_Key where the column Category has both the values A and B, grouped by the values in column Month. The sample dataframe is as follows:
Customer_Key  Category  Month
ck123         A         2
ck234         A         2
ck234         B         2
ck680         A         3
ck123         B         3
ck123         A         3
ck356         B         3
ck345         A         4
The expected outcome is
Month  Unique Customers
2      1
3      1
4      0
I'm not able to think of something here. Any lead/help will be appreciated. Thanks in advance.
Here is one way to accomplish it.
First, group by Month and Customer_Key; that gets us, for each customer within a month, the count of categories. The result is then grouped again by Month, taking the max count.
Decrementing that count gives the required number of unique customers belonging to both categories.
Hope it helps.
df2 = (df.groupby(['Month', 'Customer_Key']).count().reset_index()
         .groupby(['Month'])['Category'].max().reset_index())
df2['Category'] = df2['Category'] - 1
df2.rename(columns={'Category': 'Unique Customer'}, inplace=True)
df2
   Month  Unique Customer
0      2                1
1      3                1
2      4                0
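The count-and-decrement trick above relies on the duplicated customer rows being exactly one A and one B. A more explicit (if slower) sketch checks set membership per customer directly, which also stays correct if several customers in a month have both categories; column names follow the sample table:

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({
    'Customer_Key': ['ck123', 'ck234', 'ck234', 'ck680',
                     'ck123', 'ck123', 'ck356', 'ck345'],
    'Category':     ['A', 'A', 'B', 'A', 'B', 'A', 'B', 'A'],
    'Month':        [2, 2, 2, 3, 3, 3, 3, 4],
})

# for every (Month, customer) pair, check whether both categories are present,
# then count the True values per month
has_both = (df.groupby(['Month', 'Customer_Key'])['Category']
              .apply(lambda s: {'A', 'B'} <= set(s)))
out = has_both.groupby(level='Month').sum().astype(int)
print(out)
```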
Try something like this:
df.groupby(['Customer_Key', 'Month']) \
.sum() \
.query("Category in ('AB','BA')") \
.groupby('Month') \
.count() \
.rename(columns={'Category': 'Unique Customers'})
Edit...
The issue with this solution is that it does not count months with 0. I have prepared a fix:
import pandas as pd
from io import StringIO

data = StringIO("""ck123 A 2
ck234 A 2
ck234 B 2
ck680 A 3
ck123 B 3
ck123 A 3
ck356 B 3
ck345 A 4""")
# read the sample into a frame (column names taken from the question)
df = pd.read_csv(data, sep=' ', names=['Customer_Key', 'Category', 'Month'])

df1 = df.groupby(['Customer_Key', 'Month']).sum().reset_index()

def map_categories(row):
    if row['Category'] in ('AB', 'BA'):
        return 1
    else:
        return 0

df1['Unique Customers'] = df1.apply(map_categories, axis=1)
df1 = df1.groupby('Month')['Unique Customers'].sum().reset_index()

Python pandas: Need to know how many people have met two criteria

With this data set I want to know the people (id) who have made payments for both types a and b. Want to create a subset of data with the people who have made both a and b payments. (this is just an example set of data, one I'm using is much larger)
I've tried grouping by the id then making subset of data where type.len >= 2. Then tried creating another subset based on conditions df.loc[(df.type == 'a') & (df.type == 'b')]. I thought if I grouped by the id first then ran that df.loc code it would work but it doesn't.
Any help is much appreciated.
Thanks.
Separate the dataframe into two, one with type a payments and the other with type b payments, then merge them,
df_typea = df[df['type'] == 'a']
df_typeb = df[df['type'] == 'b']
df_merge = pd.merge(df_typea, df_typeb, how='outer', on='id', suffixes=('_a', '_b'))
This will create a separate column for each payment type.
Now, you can find the ids for which both payments have been made,
df_payments = df_merge[(df_merge['type_a'] == 'a') & (df_merge['type_b'] == 'b')]
Note that this will create two records for items similar to id 9, for which there are more than two payments. I am assuming that you simply want to check whether any payments of type 'a' and 'b' have been made for each id. In this case, you can simply drop any duplicates,
df_payments_no_duplicates = df_payments['id'].drop_duplicates()
You first split your DataFrame into two DataFrames:
one with type a payments only
one with type b payments only
You then join both DataFrames on id.
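That split-and-join idea can be sketched compactly with an inner merge, so only ids present in both halves survive; the sample data below is assumed (ids 1 and 5 have both payment types):

```python
import pandas as pd

# hypothetical sample data
df = pd.DataFrame({
    'id': [1, 1, 2, 3, 4, 4, 5, 5],
    'payment': [10, 15, 5, 20, 35, 30, 10, 20],
    'type': ['a', 'b', 'a', 'a', 'a', 'a', 'b', 'a'],
})

df_a = df[df['type'] == 'a']
df_b = df[df['type'] == 'b']

# inner join keeps only ids that appear in both frames
both_ids = (pd.merge(df_a, df_b, on='id', suffixes=('_a', '_b'))['id']
              .drop_duplicates())
print(both_ids.tolist())
```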
You can use groupby to solve this problem. First, group by id and type; then you can group again to see whether the id had both types.
import pandas as pd
df = pd.DataFrame({"id" : [1, 1, 2, 3, 4, 4, 5, 5], 'payment' : [10, 15, 5, 20, 35, 30, 10, 20], 'type' : ['a', 'b', 'a','a','a','a','b', 'a']})
df_group = df.groupby(['id', 'type']).nunique()
#print(df_group)
'''
payment
id type
1 a 1
b 1
2 a 1
3 a 1
4 a 2
5 a 1
b 1
'''
# if the value in this series is 2, the id has both a and b
data = df_group.groupby('id').size()
#print(data)
'''
id
1 2
2 1
3 1
4 1
5 2
dtype: int64
'''
You can use groupby and nunique to get the count of unique payment types done.
print (df.groupby('id')['type'].agg(['nunique']))
This will give you:
id
1 2
2 1
3 1
4 1
5 1
6 2
7 1
8 1
9 2
If you want to list only the rows that have both a and b types:
df['count'] = df.groupby('id')['type'].transform('nunique')
print (df[df['count'] > 1])
By using groupby.transform, each row will be populated with the unique count value. Then you can use count > 1 to filter out the rows that have both a and b.
This will give you:
id payment type count
0 1 10 a 2
1 1 15 b 2
7 6 10 b 2
8 6 15 a 2
11 9 35 a 2
12 9 30 a 2
13 9 10 b 2
You may also use the length of the returned set for the given id for column 'type':
len(set(df[df['id']==1]['type'])) # returns 2
len(set(df[df['id']==2]['type'])) # returns 1
Thus, the following would give you an answer to your question
paid_both = []
for i in set(df['id']):
    if len(set(df[df['id'] == i]['type'])) == 2:
        paid_both.append(i)
## paid_both = [1, 6, 9]  # the ids who paid both
You could then iterate through the unique id values to return the results for all ids. If 2 is returned, then the people have made payments for both types (a) and (b).

Grouping data from multiple columns in data frame into summary view

I have a data frame as below and would like to create summary information as shown. Can you please help with how this can be done in pandas?
Data-frame:
import pandas as pd
ds = pd.DataFrame([
    {"id": "1", "owner": "A", "delivery": "1-Jan", "priority": "High", "exception": "No Bill"},
    {"id": "2", "owner": "A", "delivery": "2-Jan", "priority": "Medium", "exception": ""},
    {"id": "3", "owner": "B", "delivery": "1-Jan", "priority": "High", "exception": "No Bill"},
    {"id": "4", "owner": "B", "delivery": "1-Jan", "priority": "High", "exception": "No Bill"},
    {"id": "5", "owner": "C", "delivery": "1-Jan", "priority": "High", "exception": ""},
    {"id": "6", "owner": "C", "delivery": "2-Jan", "priority": "High", "exception": ""},
    {"id": "7", "owner": "C", "delivery": "", "priority": "High", "exception": ""},
])
Result:
Use:
#crosstab and rename empty string column
df = pd.crosstab(ds['owner'], ds['delivery']).rename(columns={'':'No delivery Date'})
#change positions of columns - first one to last one
df = df[df.columns[1:].tolist() + df.columns[:1].tolist()]
#get counts by comparing and sum of True values
df['high_count'] = ds['priority'].eq('High').groupby(ds['owner']).sum().astype(int)
df['exception_count'] = ds['exception'].eq('No Bill').groupby(ds['owner']).sum().astype(int)
#convert id to string and join with ,
df['ids'] = ds['id'].astype(str).groupby(ds['owner']).agg(','.join)
#index to column
df = df.reset_index()
#remove index name 'delivery'
df.columns.name = None
print (df)
owner 1-Jan 2-Jan No delivery Date high_count exception_count ids
0 A 1 1 0 1 1 1,2
1 B 2 0 0 2 2 3,4
2 C 1 1 1 3 0 5,6,7
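If the per-date columns are not needed, the three derived columns alone can be produced with named aggregation (available since pandas 0.25); a sketch rebuilding the same ds:

```python
import pandas as pd

# same ds as above, rebuilt here so the sketch is self-contained
ds = pd.DataFrame({
    'id':        ['1', '2', '3', '4', '5', '6', '7'],
    'owner':     ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'delivery':  ['1-Jan', '2-Jan', '1-Jan', '1-Jan', '1-Jan', '2-Jan', ''],
    'priority':  ['High', 'Medium', 'High', 'High', 'High', 'High', 'High'],
    'exception': ['No Bill', '', 'No Bill', 'No Bill', '', '', ''],
})

# named aggregation: one output column per keyword argument
summary = ds.groupby('owner').agg(
    high_count=('priority', lambda s: (s == 'High').sum()),
    exception_count=('exception', lambda s: (s == 'No Bill').sum()),
    ids=('id', ','.join),
).reset_index()
print(summary)
```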

Python: Efficiently extract a single value for every group

I need to add a description column to a dataframe that is built by grouping items from another dataframe.
grouped = df1.groupby('item')
list = grouped['total'].agg(np.sum)
list = list.reset_index()
to assign a description label to every item I've come up with this solution:
def des(item):
    return df1['description'].loc[df1['item'] == item].iloc[0]

list['description'] = list['item'].apply(des)
It works, but it takes an enormous amount of time to execute.
I'd like to do something like this:
list = list.assign(description=df1['description'].loc[df1['item'] == list['item']])
or
list = list.assign(description=df1['description'].loc[df1['item'].isin(list['item'])])
These are very wrong, but I hope you get the idea; I'm hoping there is some pandas feature that does the trick more efficiently, but I can't find it.
Any ideas?
I think you need DataFrameGroupBy.agg with a dict of functions: sum for column total and first for description:
df = df1.groupby('item', as_index=False).agg({'total':'sum', 'description':'first'})
Also, don't use list as a variable name, because list is a Python built-in.
Sample:
df1 = pd.DataFrame({'description': list('abcdef'),
                    'B': [4, 5, 4, 5, 5, 4],
                    'total': [5, 3, 6, 9, 2, 4],
                    'item': list('aaabbb')})
print (df1)
B description item total
0 4 a a 5
1 5 b a 3
2 4 c a 6
3 5 d b 9
4 5 e b 2
5 4 f b 4
df = df1.groupby('item', as_index=False).agg({'total':'sum', 'description':'first'})
print (df)
item total description
0 a 14 a
1 b 15 d

Sort a column within groups in Pandas

I am new to pandas. I'm trying to sort a column within each group. So far, I was able to group the first and second column values together and calculate the mean of the third column. But I am still struggling to sort the third column.
This is my input dataframe
This is my dataframe after applying groupby and mean function
I used the following line of code to group input dataframe,
df_o=df.groupby(by=['Organization Group','Department']).agg({'Total Compensation':np.mean})
Please let me know how to sort the last column for each group in 1st column using pandas.
It seems you need sort_values:
#for return df add parameter as_index=False
df_o=df.groupby(['Organization Group','Department'],
as_index=False)['Total Compensation'].mean()
df_o = df_o.sort_values(['Total Compensation','Organization Group'])
Sample:
df = pd.DataFrame({'Organization Group': ['a', 'b', 'a', 'a'],
                   'Department': ['d', 'f', 'a', 'a'],
                   'Total Compensation': [1, 8, 9, 1]})
print (df)
Department Organization Group Total Compensation
0 d a 1
1 f b 8
2 a a 9
3 a a 1
df_o=df.groupby(['Organization Group','Department'],
as_index=False)['Total Compensation'].mean()
print (df_o)
Organization Group Department Total Compensation
0 a a 5
1 a d 1
2 b f 8
df_o = df_o.sort_values(['Total Compensation','Organization Group'])
print (df_o)
Organization Group Department Total Compensation
1 a d 1
0 a a 5
2 b f 8
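If the intent is to order compensation within each Organization Group (rather than globally by compensation), putting the group key first in sort_values does it; a sketch on the same sample:

```python
import pandas as pd

# same sample data as above
df = pd.DataFrame({'Organization Group': ['a', 'b', 'a', 'a'],
                   'Department': ['d', 'f', 'a', 'a'],
                   'Total Compensation': [1, 8, 9, 1]})

df_o = df.groupby(['Organization Group', 'Department'],
                  as_index=False)['Total Compensation'].mean()

# sort by group first, then by compensation inside each group
df_o = df_o.sort_values(['Organization Group', 'Total Compensation'])
print(df_o)
```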
