Loop on a data frame to create multiple excels/csv

I am new to Python programming and am trying to work out how to create multiple Excel files from a data frame. I have a pandas data frame as shown below:
Invoice No.        Voucher ID
MHI000000038710    100039
MHI000000038711    100043
MHI000000038712    100043
I want to create one Excel file for every Invoice No. in the data frame. For the example above, the output would be 3 Excel files, each named after an Invoice No. (i.e., MHI000000038710, MHI000000038711, MHI000000038712).
Each Excel file should contain the rows grouped by Voucher ID, that is, every row that shares the invoice's Voucher ID:
Excel 1 (MHI000000038710.xlsx):
Invoice No.        Voucher ID
MHI000000038710    100039
Excel 2 (MHI000000038711.xlsx):
Invoice No.        Voucher ID
MHI000000038711    100043
MHI000000038712    100043
Excel 3 (MHI000000038712.xlsx):
Invoice No.        Voucher ID
MHI000000038711    100043
MHI000000038712    100043

First, iterate over the unique voucher IDs and select only the rows matching each ID.
For each temporary dataframe (df_singleID) created this way, iterate over its unique Invoice No. values and use each one as the output file name.
Assuming that your pandas dataframe is named df, this can be done as shown below:
for myid in df['Voucher ID'].unique():
    # all rows that share this voucher ID
    df_singleID = df[df['Voucher ID'] == myid]
    for myinvoice in df_singleID['Invoice No.'].unique():
        # one file per invoice, holding the whole voucher group
        df_singleID.to_excel('./' + str(myinvoice) + '.xlsx')

I believe this should work:
for ind, row in df.iterrows():
    # write every row that shares this row's voucher ID, named after this row's invoice
    df.loc[df["Voucher ID"] == row["Voucher ID"]].to_excel(f'{row["Invoice No."]}.xlsx')
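
The same idea can also be sketched with groupby, which builds each voucher group once instead of re-filtering for every row. This is only an illustration under a few assumptions: the sample frame is rebuilt from the question, the temporary output folder is hypothetical, and CSV is written so the snippet runs without an Excel engine (swapping in to_excel with an .xlsx name should work if an engine such as openpyxl is installed).

```python
import tempfile
from pathlib import Path

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Invoice No.": ["MHI000000038710", "MHI000000038711", "MHI000000038712"],
    "Voucher ID": [100039, 100043, 100043],
})

# Hypothetical output folder; any writable directory would do.
out_dir = Path(tempfile.mkdtemp())

written = []
for voucher_id, group in df.groupby("Voucher ID"):
    # Every invoice in the group gets its own file containing the whole group.
    for invoice in group["Invoice No."].unique():
        path = out_dir / f"{invoice}.csv"
        group.to_csv(path, index=False)
        written.append(path.name)
```

Each file is named after one invoice and holds all rows that share its voucher ID, matching the layout described in the question.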

Related

Pandas Naming Group by aggregated column

I'm trying to summarize a dataset of the top 50 novels sold.
I want to create a table of authors and the number of books each has written.
I used the following code:
df.Author.value_counts().sort_values(ascending=False)
How can I name the column that lists the value count for each author?

You can try the snippet below, which turns the counts into a data frame and names both columns (book_count is just an example name):
df['Author'].value_counts().rename_axis('Author').reset_index(name='book_count')
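
As a worked illustration on a tiny made-up frame, one way to get a named count column is to convert the value_counts result with reset_index(name=...). The author names and the column name book_count are placeholders, not values from the question.

```python
import pandas as pd

# Made-up data: three books by one author, two by another, one by a third.
df = pd.DataFrame({
    "Author": ["Author A", "Author A", "Author A", "Author B", "Author B", "Author C"],
})

author_counts = (
    df["Author"]
    .value_counts()                  # already sorted from most to fewest books
    .rename_axis("Author")           # name of the index, which becomes a column
    .reset_index(name="book_count")  # name of the count column
)
```

The result is a regular two-column data frame, so the count column can be referenced by name afterwards.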

Removing rows in DF based on counting filtered data pandas

I have a dataframe that contains the store ID and the month of each transaction.
I want to keep only the stores that have transactions in all 12 months.
I first tried counting the unique months per store as follows:
df.groupby('STORE_NBR')['MONTH'].nunique()
This gave me each store ID with its number of months. The problem is that not all store IDs appeared, so I couldn't work out which ones to drop.

Try this:
df.groupby('STORE_NBR').filter(lambda group: group['MONTH'].nunique() >= 12)
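
To see what the filter keeps, here is a small sketch on made-up data: store 1 has transactions in all 12 months, store 2 only in 3. The sample values are assumptions; the column names come from the question. A transform-based mask is shown as an alternative that avoids calling a Python lambda per group.

```python
import pandas as pd

# Made-up data: store 1 covers months 1-12, store 2 only months 1-3.
df = pd.DataFrame({
    "STORE_NBR": [1] * 12 + [2] * 3,
    "MONTH": list(range(1, 13)) + [1, 2, 3],
})

# Keep only the rows of stores that appear in at least 12 distinct months.
kept = df.groupby("STORE_NBR").filter(lambda group: group["MONTH"].nunique() >= 12)

# Alternative: broadcast the per-store month count to every row and mask.
mask = df.groupby("STORE_NBR")["MONTH"].transform("nunique") >= 12
kept_with_mask = df[mask]
```

Both approaches return the original rows of the qualifying stores, so the dropped stores are simply the ones absent from the result.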

Add a suffix number after each iteration when writing pandas data frame to excel file

I'm running calculations in a nested for loop over a unique list of products and a unique list of customers. I want to write the data frame for each product/customer combination to an Excel file, incrementing a number in the file name by 1 each time. So essentially something like:
for product in product_list:
    for customer in customer_list:
        dataframe = data[(data.Product==product) & (data.Customer==customer)]
        # write to excel file:
        dataframe.to_excel('df1.xlsx')
where the code would write out the first dataframe as 'df1.xlsx', then 'df2.xlsx', 'df3.xlsx', etc.
Thanks!

You could easily add a counter like this:
cnt = 0
for product in product_list:
    for customer in customer_list:
        dataframe = data[(data.Product==product) & (data.Customer==customer)]
        # write to excel file:
        cnt += 1
        dataframe.to_excel(f'df{cnt}.xlsx')
However, why not add the product and customer to the filename(s) so that they are more easily identifiable?
dataframe.to_excel(f'{product}-{customer}.xlsx')
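
The counter can also be handled by enumerate over itertools.product, which flattens the two loops into one. This is a sketch under assumptions: the sample data and the temporary output folder are made up, and CSV is used so the snippet runs without an Excel engine (to_excel with an .xlsx name is the drop-in equivalent when one is installed).

```python
import tempfile
from itertools import product as combinations_of
from pathlib import Path

import pandas as pd

# Made-up data with the column names used in the question.
data = pd.DataFrame({
    "Product": ["apple", "apple", "pear"],
    "Customer": ["ann", "bob", "ann"],
    "Qty": [1, 2, 3],
})
product_list = data["Product"].unique()
customer_list = data["Customer"].unique()

# Hypothetical output folder.
out_dir = Path(tempfile.mkdtemp())

written = []
# enumerate(..., start=1) supplies the running number: df1, df2, df3, ...
for cnt, (prod, cust) in enumerate(combinations_of(product_list, customer_list), start=1):
    dataframe = data[(data.Product == prod) & (data.Customer == cust)]
    path = out_dir / f"df{cnt}.csv"
    dataframe.to_csv(path, index=False)
    written.append(path.name)
```

Note that combinations with no matching rows (pear/bob here) still produce a file containing only the header; adding a check such as `if dataframe.empty: continue` would skip them, at the cost of gaps in how combinations map to numbers.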

how to aggregate daily activities of users into weekly

I have the following tables: the first one (vle) lists behavioral activities (many types of activity, some of which are shown in the activity type column), and the other (UsersVle) records users' activities. The date column represents a day and runs from 0 to 222. I want to aggregate users' activities into weeks based on activity type. For example, for week 1, user 1 would have one column per activity type, each holding the total sum_clicks during that week. How can I do that in a pandas data frame using Python?
I would appreciate your help.

1) Derive a new field called WEEK from date (you haven't provided enough info about date to suggest how to translate it to a week (e.g. 1 = Jan 1st?)).
2) Join your two tables. Is id_site in table 2 a foreign key for id_site in table 1? If so, combined_df = table2.merge(table1, on='id_site'). Now you should have all the fields in a single data frame.
3) Pivot like this: user_summary_by_week = pd.pivot_table(combined_df, index=['id_user', 'WEEK'], columns='activity_type', aggfunc='sum', fill_value=0).reset_index(col_level=1)
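
The three steps above can be sketched end to end on made-up data. Several things here are assumptions rather than facts from the question: the sample rows, the rule that days 0-6 form week 1 (WEEK = date // 7 + 1), and passing values='sum_clicks' explicitly so that only the click counts are aggregated.

```python
import pandas as pd

# Made-up stand-ins for the two tables described in the question.
vle = pd.DataFrame({
    "id_site": [1, 2],
    "activity_type": ["forum", "quiz"],
})
users_vle = pd.DataFrame({
    "id_user": [10, 10, 10, 11],
    "id_site": [1, 1, 2, 2],
    "date": [0, 6, 7, 8],
    "sum_clicks": [3, 2, 5, 1],
})

# Step 1: derive WEEK (assumption: day 0 starts week 1, weeks are 7 days long).
users_vle["WEEK"] = users_vle["date"] // 7 + 1

# Step 2: attach the activity type to every user activity row.
combined_df = users_vle.merge(vle, on="id_site")

# Step 3: one row per user and week, one column per activity type.
user_summary_by_week = pd.pivot_table(
    combined_df,
    index=["id_user", "WEEK"],
    columns="activity_type",
    values="sum_clicks",
    aggfunc="sum",
    fill_value=0,
).reset_index()
```

With this sample, user 10 ends up with 5 forum clicks in week 1 and 5 quiz clicks in week 2, and user 11 with 1 quiz click in week 2; activity types with no clicks in a week are filled with 0.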

Pandas groupby on two column and create new column in excel based on result

I have an Excel file which I am reading in Jupyter.
It has three columns:
1) Webinar ID: (66 unique values)
2) Email: email ID of participants (a participant can log out of a session and join again, so the same email ID can appear more than once for the same webinar ID)
3) Time in Session (minutes): time the participant was present in the session; since a participant may log out and log in again, there are multiple entries.
Code used:
data_group = data.groupby(['Webinar ID', 'Email'])
data_group['Time in Session (minutes)'].sum()
I want to create a new column in the Excel file that stores the sum of Time in Session (minutes) for each combination of Webinar ID and Email.
Thanks!!

IIUC, you wish to create a new column with the sum of times per webinar group and email.
Let's use groupby with transform:
data['Sum Session Minutes'] = (data.groupby(['Webinar ID', 'Email'])['Time in Session (minutes)']
                               .transform('sum'))
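
To show what transform does here, a small sketch on made-up rows: one participant joins webinar 1 twice, so both of that participant's rows receive the combined total. The sample values and email addresses are placeholders.

```python
import pandas as pd

# Made-up rows: a@example.com joins webinar 1 twice.
data = pd.DataFrame({
    "Webinar ID": [1, 1, 1, 2],
    "Email": ["a@example.com", "a@example.com", "b@example.com", "a@example.com"],
    "Time in Session (minutes)": [10, 5, 20, 7],
})

# transform('sum') returns one value per original row, so the result
# lines up with the frame and can be assigned as a new column directly.
data["Sum Session Minutes"] = (
    data.groupby(["Webinar ID", "Email"])["Time in Session (minutes)"].transform("sum")
)
```

Unlike a plain groupby sum, which collapses each group to one row, this keeps every original entry; the frame could then be written back out with data.to_excel, assuming an Excel engine is available.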
