I am new to pandas and Stack Overflow, so I will try my best to explain my problem.
I have a dataframe like the one below, and I would like to aggregate rows with the same Customer id and Date (so each Customer id-Date combination appears only once) using multiple logics:
Sum of Quantity for that Date-Customer id combination (how many pieces in total the customer bought each purchase day)
Count of Sale id for that Date-Customer id combination (how many sales orders the customer placed each purchase day)
Distinct count of Shop id for that Date-Customer id combination (from how many shops the customer placed orders each purchase day)
Finally, Product category contains only 2 products, which I identified as 0 and 1. I would like to add two columns: one counting the sales orders with product category 0 and one counting the sales orders with product category 1.
I tried the code below to solve the first 3 points, but without success:
df = df.groupby('customer id','date').sum('Quantity').count('Sales id').nunique('Shop id')
I am really struggling with the fourth point.
Hope you can help me out here.
Dataframe
Desired Output
I found a solution for the first 3 points using the agg() method:
df.groupby(['Customer id','Date'],as_index=False).agg({'Quantity' : ['sum'], 'Sale id' : ['count'], 'shop id' : ['nunique']})
Ideally, I would add within agg() two additional aggregations that count 'Product category' when it is 0 and when it is 1.
Any ideas?
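The conditional counts can be folded into the same agg() call using named aggregation with small lambdas. A minimal sketch, where the sample data and the exact column spellings are assumptions based on the question's description:

```python
import pandas as pd

# Hypothetical sample data mirroring the columns described above
df = pd.DataFrame({
    'Customer id': [1, 1, 1, 2],
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-01'],
    'Sale id': [10, 11, 12, 13],
    'Shop id': [100, 100, 101, 102],
    'Quantity': [2, 3, 1, 5],
    'Product category': [0, 1, 0, 1],
})

out = df.groupby(['Customer id', 'Date'], as_index=False).agg(
    total_quantity=('Quantity', 'sum'),      # pieces bought per day
    n_orders=('Sale id', 'count'),           # sales orders per day
    n_shops=('Shop id', 'nunique'),          # distinct shops per day
    n_cat0=('Product category', lambda s: (s == 0).sum()),  # orders of category 0
    n_cat1=('Product category', lambda s: (s == 1).sum()),  # orders of category 1
)
print(out)
```

As a side benefit, named aggregation produces flat column names, avoiding the MultiIndex columns that the dict-of-lists form of agg() creates.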
I have the following tables: the first one (vle) contains behavioral activities (there are many activity types; some are shown in the activity_type column), and the other (UsersVle) contains users' activities. The date column represents a day and runs from 0 to 222. I want to aggregate users' activities into weeks based on the activity types. For example, in week 1 user 1 will have one column per activity type, and each column contains the total sum_clicks during that week. How can I do that with a pandas data frame in Python?
I would appreciate your help.
Derive a new field called WEEK from date (you haven't provided enough information about date to suggest how to translate it to a week, e.g. does 1 = Jan 1st?).
Join your two tables. Is id_site in table 2 a foreign key for id_site in table 1? If so, combined_df = table2.merge(table1, on='id_site'). Now, you should have all the fields in a single data frame.
Pivot like this: user_summary_by_week = pd.pivot_table(combined_df, index=['id_user', 'WEEK'], columns='activity_type', aggfunc='sum', fill_value=0).reset_index(col_level=1)
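A minimal end-to-end sketch of those three steps. The table contents are invented, and it assumes weeks are simply 7-day blocks starting at day 0 (adjust the WEEK derivation to your actual calendar):

```python
import pandas as pd

# Hypothetical rows for the two tables described in the question
vle = pd.DataFrame({
    'id_site': [1, 2, 3],
    'activity_type': ['forum', 'quiz', 'resource'],
})
users_vle = pd.DataFrame({
    'id_user': [1, 1, 1, 2],
    'id_site': [1, 2, 1, 3],
    'date': [0, 3, 8, 10],      # day offsets 0..222
    'sum_clicks': [4, 2, 5, 7],
})

combined_df = users_vle.merge(vle, on='id_site')    # step 2: join on id_site
combined_df['WEEK'] = combined_df['date'] // 7 + 1  # step 1: 7-day blocks -> week number

weekly = pd.pivot_table(                            # step 3: pivot
    combined_df,
    index=['id_user', 'WEEK'],
    columns='activity_type',
    values='sum_clicks',
    aggfunc='sum',
    fill_value=0,
).reset_index()
print(weekly)
```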
I have a dataframe containing election data from four different years. The "Votes" column contains the total votes a party got in different constituencies in each year. I need to find the winning party (the party that got the maximum total votes) for each year. I have grouped the data by "Election Year" and "Party". Now, how can I get the Election Year and Party in this case?
df1 = df.groupby(['Election Year', 'Party']).sum()
print(df1.loc[df1['Votes'].idxmax()])
The above code is not giving the expected result.
I have attached the dataframe after using groupby.
How can I get the expected result? Any suggestions are appreciated.
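One way to get a per-year winner, rather than the single global maximum that idxmax over the whole frame returns, is to take idxmax within each year group. The data below is invented to illustrate the shape:

```python
import pandas as pd

# Invented election data using the column names from the question
df = pd.DataFrame({
    'Election Year': [2014, 2014, 2014, 2019, 2019],
    'Party': ['A', 'A', 'B', 'A', 'B'],
    'Votes': [100, 50, 120, 80, 200],
})

# Total votes per (year, party) combination
totals = df.groupby(['Election Year', 'Party'], as_index=False)['Votes'].sum()
# For each year, keep the row holding that year's maximum total
winners = totals.loc[totals.groupby('Election Year')['Votes'].idxmax()]
print(winners)
```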
In this database I have two columns: one is the product ID associated with the sale, and the other is the quantity of that item sold during that same sale. I am trying to figure out how to get a full tally of each item. There are thousands of sales, so many product IDs are repeated in the ID column.
I am not sure how to approach this to find a solution.
Any help would be greatly appreciated.
Example of the columns:
PRODUCT_ID SLS_QTY
0 1164203101 2
1 72047351000 1
2 3600025824 1
3 7205861079 1
4 82775501058 1
You can do it with groupby followed by merge:
sls = sls.groupby('PRODUCT_ID',as_index=False).SLS_QTY.sum()
Totally = sls.merge(price, on = 'PRODUCT_ID', how = 'left')
Totally['sales']=Totally['SLS_QTY']*Totally['price']
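A runnable version of that idea, using the question's sample quantities plus an invented price table (the question itself only shows PRODUCT_ID and SLS_QTY, so the prices are assumptions):

```python
import pandas as pd

# Sample sales rows; one product id repeats across sales
sls = pd.DataFrame({
    'PRODUCT_ID': [1164203101, 72047351000, 3600025824, 1164203101],
    'SLS_QTY': [2, 1, 1, 3],
})
# Invented price lookup table, one row per product
price = pd.DataFrame({
    'PRODUCT_ID': [1164203101, 72047351000, 3600025824],
    'price': [5.0, 2.0, 10.0],
})

# Total quantity per product, then attach prices to compute revenue
totals = sls.groupby('PRODUCT_ID', as_index=False)['SLS_QTY'].sum()
totally = totals.merge(price, on='PRODUCT_ID', how='left')
totally['sales'] = totally['SLS_QTY'] * totally['price']
print(totally)
```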
First get all unique product ids by doing
all_product_ids = df['PRODUCT_ID'].unique()
This will return a numpy array containing all unique product ids. Next, for each product id you want to return each data instance (row) that has that product id. Then we can store the information in a dictionary.
sales_dict = {}
for product in all_product_ids:
    info = df.loc[df['PRODUCT_ID'] == product]
    total_sales = sum(info['SLS_QTY'].values)
    sales_dict[product] = total_sales
This might not be the most efficient way to do it, but it should get the job done.
I have a huge dataset with a lot of different client names, bills etc.
Now I want to show the 4 clients with the highest accumulated total bill.
So far I have used the groupby function:
data.groupby(by = ["CustomerName","Bill"], as_index=False).sum()
I tried grouping by the customer name and the bill, but this does not give me the total sum of all of a customer's orders, only each single order, since every distinct bill amount ends up in its own group.
Can someone tell me how to get customer x (with the highest accumulated bill) and the sum of all his orders in the first position, the customer with the second-highest accumulated bill in position 2, and so on?
Big thanks!
Since I don't know the full structure of your data frame, I recommend subsetting the relevant columns first:
data = data[["CustomerName", "Bill"]]
Then you just need to group by CustomerName and sum over the remaining columns (Bill in this case), keeping the result:
data = data.groupby(by=["CustomerName"]).sum()
Finally, sort by the Bill column in descending order and take the top 4:
data = data.sort_values(by='Bill', ascending=False)
print(data.head(4))
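Put together, the whole pipeline can be chained in one expression. The sample bills here are invented:

```python
import pandas as pd

# Invented data with the two relevant columns
data = pd.DataFrame({
    'CustomerName': ['x', 'y', 'x', 'z', 'w', 'y'],
    'Bill': [300, 50, 200, 120, 80, 60],
})

# Sum each customer's bills, sort descending, keep the top 4
top4 = (
    data.groupby('CustomerName')['Bill']
        .sum()
        .sort_values(ascending=False)
        .head(4)
)
print(top4)
```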
I have 4 columns: Date, Account #, Quantity, and Sale. I have daily data, but I want to be able to show weekly Sales per customer along with the Quantity.
I have been able to group the data by week, but I also want to group it by OracleNumber and sum the Quantity and Sale columns. How would I get that to work without messing up the week format?
import pandas as pd
names = ['Date','OracleNumber','Quantity','Sale']
sales = pd.read_csv("CustomerSalesNVG.csv",names=names)
sales['Date'] = pd.to_datetime(sales['Date'])
grouped=sales.groupby(sales['Date'].map(lambda x:x.week))
print(grouped.head())
IIUC, you could group by both the week of the Date column and the OracleNumber column by passing a list of keys to groupby, and perform the sum afterwards:
sales.groupby([sales['Date'].dt.isocalendar().week, 'OracleNumber']).sum()
(Note: Series.dt.week was deprecated and later removed; dt.isocalendar().week is the current equivalent.)
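A self-contained sketch of that groupby, with inline data standing in for the CSV (the dates and numbers are invented):

```python
import pandas as pd

# Invented daily rows standing in for CustomerSalesNVG.csv
sales = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-02', '2023-01-03', '2023-01-09', '2023-01-09']),
    'OracleNumber': [1, 1, 1, 2],
    'Quantity': [2, 3, 1, 4],
    'Sale': [20.0, 30.0, 10.0, 40.0],
})

# ISO week number of each date; works on pandas 2.x where .dt.week is gone
week = sales['Date'].dt.isocalendar().week
weekly = sales.groupby([week, 'OracleNumber'])[['Quantity', 'Sale']].sum()
print(weekly)
```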