Loop on a data frame to create multiple excels/csv - python

I am new to Python programming, and I am trying to figure out how to create multiple Excel files from a data frame. I have a pandas data frame as shown below:
Invoice No.      Voucher ID
MHI000000038710  100039
MHI000000038711  100043
MHI000000038712  100043
I am trying to create an Excel file for every Invoice No. in the data frame. For the above example, the output would be 3 Excel files named after each Invoice No. (i.e., MHI000000038710, MHI000000038711, MHI000000038712).
And each file should contain the rows grouped by Voucher ID.
Excel 1 (MHI000000038710.xlsx):

Invoice No.      Voucher ID
MHI000000038710  100039

Excel 2 (MHI000000038711.xlsx):

Invoice No.      Voucher ID
MHI000000038711  100043
MHI000000038712  100043

Excel 3 (MHI000000038712.xlsx):

Invoice No.      Voucher ID
MHI000000038711  100043
MHI000000038712  100043

First, iterate over the unique Voucher IDs and select only the rows matching each ID.
For each temporary dataframe (df_singleID) created this way, iterate over its unique Invoice No. values and use each name to build the output file name.
Assuming that your pandas dataframe is named df, this can be done as shown below:
for myid in df['Voucher ID'].unique():
    df_singleID = df[df['Voucher ID'] == myid]
    for myinvoice in df_singleID['Invoice No.'].unique():
        df_singleID.to_excel('./' + str(myinvoice) + '.xlsx')

I believe this should work:
for ind, row in df.iterrows():
    df.loc[df["Voucher ID"] == row["Voucher ID"]].to_excel(f'{row["Invoice No."]}.xlsx')
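For reference, here is a self-contained sketch of the same idea on the sample data above. It uses to_csv so it runs without the openpyxl dependency (the title mentions CSV as an option); swap in to_excel for real .xlsx output:

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    "Invoice No.": ["MHI000000038710", "MHI000000038711", "MHI000000038712"],
    "Voucher ID": [100039, 100043, 100043],
})

out_dir = Path(tempfile.mkdtemp())

# For each invoice, export every row that shares that invoice's Voucher ID.
for invoice, voucher in zip(df["Invoice No."], df["Voucher ID"]):
    df[df["Voucher ID"] == voucher].to_csv(out_dir / f"{invoice}.csv", index=False)

written = sorted(p.name for p in out_dir.glob("*.csv"))
# One file per invoice; the two invoices sharing voucher 100043 each contain both rows.
```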

Related

Pandas Naming Group by aggregated column

I'm trying to summarize a dataset of top 50 novels sold.
I want to create a table of authors and the number of books they have written.
I used the following code:
df.Author.value_counts().sort_values(ascending = False)
How can I name the column that lists the value count for each author?
You can name the counts column when resetting the index:
top_authors = (df.Author.value_counts()
                 .sort_values(ascending=False)
                 .rename_axis('Author')
                 .reset_index(name='Books Written'))
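A minimal runnable sketch of naming the counts column, on invented data ('Books Written' is just an illustrative label):

```python
import pandas as pd

df = pd.DataFrame({"Author": ["A. Author", "B. Writer", "A. Author"]})

# rename_axis names the index (the author names); reset_index(name=...)
# names the counts column. This works across pandas versions.
counts = (df["Author"].value_counts()
          .rename_axis("Author")
          .reset_index(name="Books Written"))
```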

Removing rows in DF based on counting filtered data pandas

I have a dataframe that contains ID & month of transaction.
I want to keep only the stores that have transactions in 12 months.
I tried first to filter by unique as follows:
df.groupby('STORE_NBR')['MONTH'].nunique()
The code gave me each store ID and its number of months. The problem is that not all store IDs appeared in the result, so I couldn't work out which rows to drop.
Try this:
df.groupby('STORE_NBR').filter(lambda group: group['MONTH'].nunique() >= 12)
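A small self-contained check of this filter (invented data: store 1 has all 12 months, store 2 only 3):

```python
import pandas as pd

df = pd.DataFrame({
    "STORE_NBR": [1] * 12 + [2] * 3,
    "MONTH": list(range(1, 13)) + [1, 2, 3],
})

# filter keeps every original row of each group that passes the predicate,
# so incomplete stores are dropped wholesale.
complete = df.groupby("STORE_NBR").filter(
    lambda group: group["MONTH"].nunique() >= 12
)
```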

Add a suffix number after each iteration when writing pandas data frame to excel file

I'm performing calculations in a double for loop over a unique list of products and a unique list of customers. I want to write each pandas data frame for a product/customer combination to an Excel file, incrementing a number in the file name by 1 each time. So essentially something like:
for product in product_list:
    for customer in customer_list:
        dataframe = data[(data.Product == product) & (data.Customer == customer)]
        # write to excel file:
        dataframe.to_excel('df1.xlsx')
where the code would write out the first dataframe and call it 'df1.xlsx', then 'df2.xlsx', 'df3.xlsx', etc.
thanks!
You could easily add a counter like this:
cnt = 0
for product in product_list:
    for customer in customer_list:
        dataframe = data[(data.Product == product) & (data.Customer == customer)]
        # write to excel file:
        cnt += 1
        dataframe.to_excel(f'df{cnt}.xlsx')
However, why not add the product and customer to the filename(s) so they can be more easily identified?
dataframe.to_excel(f'{product}-{customer}.xlsx')
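As a runnable check of the counter pattern on invented toy data (to_csv is used here so the sketch runs without openpyxl; use to_excel for .xlsx):

```python
import tempfile
from pathlib import Path

import pandas as pd

data = pd.DataFrame({
    "Product": ["P1", "P1", "P2"],
    "Customer": ["C1", "C2", "C1"],
    "Sales": [10, 20, 30],
})
product_list = data["Product"].unique()
customer_list = data["Customer"].unique()

out_dir = Path(tempfile.mkdtemp())
cnt = 0
for product in product_list:
    for customer in customer_list:
        combo = data[(data.Product == product) & (data.Customer == customer)]
        cnt += 1
        combo.to_csv(out_dir / f"df{cnt}.csv", index=False)

files = sorted(p.name for p in out_dir.glob("df*.csv"))
# 2 products x 2 customers -> df1 through df4 (empty combos still get a file).
```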

how to aggregate daily activities of users into weekly

I have the following two tables: the first one (vle) describes behavioral activities (many types of activities, some shown in the activity type column), and the other (UsersVle) has users' activities. The date column represents a day and runs from 0 to 222. I want to aggregate users' activities into weeks based on the activity types. For example, in week 1, user 1 would have one column per activity type, each holding the total sum_click for that week. How can I do that in a pandas data frame using Python?
I would appreciate your help.
Derive a new field called WEEK from date. (You haven't provided enough info about date to say how it translates to a week; e.g., does 1 mean Jan 1st?)
Join your two tables. Is id_site in table 2 a foreign key for id_site in table 1? If so, combined_df = table2.merge(table1, on='id_site'). Now, you should have all the fields in a single data frame.
Pivot like this: user_summary_by_week = pd.pivot_table(combined_df, index=['id_user', 'WEEK'], columns='activity_type', aggfunc='sum', fill_value=0).reset_index(col_level=1)
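On invented data, and assuming the day-based date column maps to weeks as date // 7 (since it runs 0 to 222), the three steps might look like this (column names guessed from the description; values is pinned to sum_click for clarity):

```python
import pandas as pd

# Table 1 (vle): activity metadata; Table 2 (UsersVle): per-day user clicks.
vle = pd.DataFrame({
    "id_site": [1, 2, 3],
    "activity_type": ["quiz", "forum", "quiz"],
})
users_vle = pd.DataFrame({
    "id_user": [101, 101, 101, 102],
    "id_site": [1, 2, 3, 1],
    "date": [0, 3, 8, 1],
    "sum_click": [4, 2, 5, 7],
})

# Step 1: derive the week (days 0-6 -> week 0, days 7-13 -> week 1, ...).
users_vle["WEEK"] = users_vle["date"] // 7

# Step 2: join on the foreign key.
combined_df = users_vle.merge(vle, on="id_site")

# Step 3: one row per user/week, one column per activity type.
user_summary_by_week = pd.pivot_table(
    combined_df, index=["id_user", "WEEK"], columns="activity_type",
    values="sum_click", aggfunc="sum", fill_value=0,
).reset_index()
```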

Pandas groupby on two column and create new column in excel based on result

I have an Excel file which I am reading in Jupyter.
It has three columns:
1) Webinar ID: (66 unique values)
2) Email: email ID of each participant (a participant can log out of a session and join again, so there are duplicate email IDs for the same Webinar ID)
3) Time in session (minutes): how long the participant was present in the session; since they might log out and log in again, there are multiple entries.
Code Used:
data_group = data.groupby(['Webinar ID', 'Email'])
data_group['Time in Session (minutes)'].sum()
I want to create a new column in the Excel file storing the sum of Time in Session (minutes) for each Webinar ID and Email pair.
Thanks!!
IIUC, you wish to create a new column with the sum of times per webinar group and email.
Let's use groupby with transform:
data['Sum Session Minutes'] = (data.groupby(['Webinar ID','Email'])['Time in Session (minutes)']
.transform('sum'))
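A quick illustration on toy data (column names taken from the question):

```python
import pandas as pd

data = pd.DataFrame({
    "Webinar ID": [1, 1, 2],
    "Email": ["a@x.com", "a@x.com", "b@x.com"],
    "Time in Session (minutes)": [10, 15, 30],
})

# transform('sum') broadcasts each group's total back to every row,
# so the result aligns with the original frame.
data["Sum Session Minutes"] = (
    data.groupby(["Webinar ID", "Email"])["Time in Session (minutes)"]
        .transform("sum")
)
```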
