I have an Excel file with multiple sheets. I want to group the columns in the logs list and store them in another Excel file, but some sheets do not contain some of the columns, so if a column does not exist in a sheet it should not be stored. The code runs without errors, but it stores only the last column.
import pandas as pd

sheets_names = ['R9_14062021','R9_02122020','R9_14062021','R9_28052021','R9_17052021','R9_03052021','R9_14042021','R9_24032020','R9_19032020','R9 30112020','R9_17112020','R7_27012021','LOGS R9 01032021','LOGS R7 SAT01032021','R7_30032020','G9_06032020','G5T_20012021','TNT_08122020','R7_SAT_24112020','G6T_12112020','R9 12102020']
logs = [' Msd','Provider Id','Terminal Type','chgtCh','accessRecordModule','playerPlay startOver','playerPlay PdL','playerPlay PVR','contentHasAds','pdlComplete','lirePdl','lireVod']

dfs_list = pd.read_excel('COMPIL LOGS INDICATEURS V14062021.xlsx', sheet_name=sheets_names)
writer = pd.ExcelWriter('pandas_multiple.xlsx', engine='xlsxwriter')

for sheet in dfs_list:
    df = dfs_list[sheet]
    df['Dt'] = pd.to_datetime(df['Dt']).dt.date
    df1 = df.groupby(['Dt','webApp','mw'])[' Msd'].count()
    for log in logs:
        if log in df:
            df1 = df.groupby(['Dt','webApp','mw'])[log].sum()
            df1.to_update.get(sheet)
    #df1.reset_index(inplace=True)
    df1.to_excel(writer, sheet_name=sheet)
writer.save()
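One way to keep all of the matching columns (a minimal sketch, reusing the sheets_names and logs lists and the file names above) is to collect the log columns that actually exist in each sheet and aggregate them in a single groupby, rather than overwriting df1 on every pass of the inner loop:

import pandas as pd

dfs_list = pd.read_excel('COMPIL LOGS INDICATEURS V14062021.xlsx', sheet_name=sheets_names)
writer = pd.ExcelWriter('pandas_multiple.xlsx', engine='xlsxwriter')

for sheet, df in dfs_list.items():
    df['Dt'] = pd.to_datetime(df['Dt']).dt.date
    # keep only the log columns that exist in this sheet
    present = [log for log in logs if log in df.columns]
    # count ' Msd' (as in the original) and sum the other log columns
    aggs = {log: ('count' if log == ' Msd' else 'sum') for log in present}
    grouped = df.groupby(['Dt', 'webApp', 'mw']).agg(aggs)
    grouped.to_excel(writer, sheet_name=sheet)
writer.save()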
m1 = pd.ExcelFile('C:\\Users\\a\\Desktop\\1.xlsx')
df1 = pd.read_excel(m1, sheet_name='Raw Data')
df1.head()
m2 = pd.read_excel('C:\\Users\\a\\Desktop\\stbd.xlsx')
m2.head()
# drop all values in m1
df1 = df1.drop(labels=range(0 , df1.shape[0]) , axis=0)
df1.shape
I wanted to delete the data in one sheet of an Excel workbook, overwrite that same sheet with new data, and refresh the pivot table in the next sheet with the new data. But once I run this code, all the other sheets in the workbook (the ones with the pivot tables) are deleted. Can someone support me with this?
You have to read all the sheets of the Excel file and then save them all, including the changed one:
m1 = pd.ExcelFile('C:\\Users\\a\\Desktop\\1.xlsx')
# read every sheet into a dict of DataFrames keyed by sheet name
df1 = pd.read_excel(m1, sheet_name=m1.sheet_names)

# change the specific sheet (placeholder edit: add a column so the change is visible)
df1["sheet to change"] = df1["sheet to change"].assign(changed='house')

writer = pd.ExcelWriter("output_excel_file.xlsx", engine='xlsxwriter')
for sheet in m1.sheet_names:
    df1[sheet].to_excel(writer, index=False, sheet_name=sheet)
writer.save()
Something like that.
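If you are on pandas 1.3 or newer, another option (a sketch, not tested against this workbook) is to append to the existing file with the openpyxl engine and replace only the one sheet; note that openpyxl may still drop pivot tables and other features it does not fully support:

import pandas as pd

# new_data is a placeholder for the DataFrame that should replace the sheet's contents
with pd.ExcelWriter('C:\\Users\\a\\Desktop\\1.xlsx', engine='openpyxl',
                    mode='a', if_sheet_exists='replace') as writer:
    new_data.to_excel(writer, sheet_name='Raw Data', index=False)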
I have two Excel files and both of them have 10 worksheets. I want to read each worksheet, compare them, and write the result to a third Excel file, which would also have multiple worksheets.
The program below works for a single worksheet:
import pandas as pd
df1 = pd.read_excel('zyx_5661.xlsx')
df2 = pd.read_excel('zyx_5662.xlsx')
df1.rename(columns= lambda x : x + '_file1', inplace=True)
df2.rename(columns= lambda x : x + '_file2', inplace=True)
df_join = df1.merge(right = df2, left_on = df1.columns.to_list(), right_on = df2.columns.to_list(), how = 'outer')
with pd.ExcelWriter('xl_join_diff.xlsx') as writer:
    df_join.to_excel(writer, sheet_name='testing', index=False)
How can I optimize it to work with multiple worksheets?
I think this should achieve what you need. Loop through each sheet name (assuming the sheets are named the same across both Excel documents; if not, you can use sheet indices instead), write each merged output to a new sheet, and save the Excel document once at the end.
import pandas as pd

writer = pd.ExcelWriter('xl_join_diff.xlsx')
for sheet in ['sheet1', 'sheet2', 'sheet3']:  # list of sheet names
    # Pull in data for each sheet, and merge together.
    df1 = pd.read_excel('zyx_5661.xlsx', sheet_name=sheet)
    df2 = pd.read_excel('zyx_5662.xlsx', sheet_name=sheet)
    df1.rename(columns=lambda x: x + '_file1', inplace=True)
    df2.rename(columns=lambda x: x + '_file2', inplace=True)
    df_join = df1.merge(right=df2, left_on=df1.columns.to_list(),
                        right_on=df2.columns.to_list(), how='outer')
    df_join.to_excel(writer, sheet_name=sheet, index=False)  # write to excel as a new sheet
writer.save()  # save the excel document once all sheets have been done
You can use a loop to read the files and their sheets:
writer = pd.ExcelWriter('multiple.xlsx', engine='xlsxwriter')
# create one writer for writing all sheets to a single file
list_files = ['zyx_5661.xlsx', 'zyx_5662.xlsx']
count_sheets = 0
for file_name in list_files:
    file = pd.ExcelFile(file_name)
    for sheet_name in file.sheet_names:
        df = pd.read_excel(file, sheet_name)
        # ... you can do your processing here
        count_sheets = count_sheets + 1
        df.to_excel(writer, sheet_name='Sheet-' + str(count_sheets))
writer.save()
How do I save the returned rows from a dataframe into an Excel sheet?
Story: I am working with a large txt file (1.7M rows) containing postal codes for Canada. I created a dataframe and extracted the values I need into it. One column of the dataframe is the province id (df['PID']). I created a list of the unique values found in that PID column, and I am successfully creating the (13) sheets, each named after a unique PID, in a new Excel spreadsheet.
Problem: Each sheet only contains the headers, not the row values.
I am having trouble writing the matching rows to each sheet. Here is my code:
import pandas as pd
# parse text file into dataframe
path = 'the_file.txt'
df = pd.read_csv(path, sep='\t', header=None, names=['ORIG', 'PID','PCODE'], encoding='iso-8859-1')
# extract characters to fill values
df['ORIG'] = df['ORIG']
df['PID'] = df['ORIG'].str[11:13].astype(int)
df['PCODE'] = df['ORIG'].str[:6]
# create list of unique province ID's
prov_ids = df['PID'].unique().tolist()
prov_ids_string = map(str, prov_ids)
# create new excel file
writer = pd.ExcelWriter('CanData.xlsx', engine='xlsxwriter')
for id in prov_ids_string:
    mydf = df.loc[df.PID==id]
    # NEED TO WRITE VALUES FROM ROW INTO SHEET HERE*
    mydf.to_excel(writer, sheet_name=id)
writer.save()
I know where the writing should happen, but I haven't gotten the correct result. How can I write only the rows with matching PIDs to their respective sheets?
Thank you
The following should work. The original loop iterated over map(str, prov_ids), so each id was a string while df.PID holds integers; df.PID==id therefore matched no rows, which is why only the headers were written. Iterate over the integer values instead and convert to a string only for the sheet name:
import pandas as pd
import xlsxwriter
# parse text file into dataframe
# extract characters to fill values
df['ORIG'] = df['ORIG']
df['PID'] = df['ORIG'].str[11:13].astype(int)
df['PCODE'] = df['ORIG'].str[:6]
# create list of unique province ID's
prov_ids = df['PID'].unique().tolist()
#prov_ids_string = map(str, prov_ids)
# create new excel file
writer = pd.ExcelWriter('./CanData.xlsx', engine='xlsxwriter')
for idx in prov_ids:
    mydf = df.loc[df.PID==idx]
    # write the matching rows for this province to its own sheet
    mydf.to_excel(writer, sheet_name=str(idx))
writer.save()
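For context, a small illustration of why the original loop produced empty sheets (the values here are hypothetical): PID is an integer column after .astype(int), and an integer never compares equal to its string form, so the boolean mask was all False.

import pandas as pd

df = pd.DataFrame({'PID': [11, 22, 11]})
print((df['PID'] == '11').any())  # False: the string '11' never equals the integer 11
print((df['PID'] == 11).any())    # True: matching on the integer works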
For example data:
df = pd.DataFrame()
df['ORIG'] = ['aaaaaa111111111111111111111',
              'bbbbbb2222222222222222222222']
df['PID'] = df['ORIG'].str[11:13].astype(int)
df['PCODE'] = df['ORIG'].str[:6]
print(df)
In my Sheet 11, I then have the row parsed from 'aaaaaa111111111111111111111' (PID 11, PCODE 'aaaaaa').
Kr.
I have input data in the form of a dictionary consisting of 3 dataframes of numbers. I wish to iterate through each dataframe, apply some operations, and finally write the results for each dataframe to Excel.
The following code works fine except that it only writes the resulting dataframe for the last key in the dictionary.
How do I get results for all 3 dataframes written to individual sheets?
Input_Data = {'k1':test1, 'k2':test24, 'k3':test3}
for v in Input_Data.values():
    df1 = v[126:236]
    df = df1.sort_index(ascending=False)
    Indexer = df.columns.tolist()
    df = [(pd.concat([df[Indexer[0]], df[Indexer[num]]], axis=1)) for num in [1,2,3,4,5,6]]
    df = [(df[num].astype(str).agg(','.join, axis=1)) for num in [0,1,2,3,4,5]]
    df = pd.DataFrame(df)
    dff = df.loc[0].append(df.loc[1].append(df.loc[2].append(df.loc[3].append(df.loc[4].append(df.loc[5])))))
    dff.to_excel('test.xlsx', index=False, header=False)
Your first issue is that each iteration of the loop re-creates test.xlsx, overwriting what the previous iteration wrote.
As per pandas documentation:
"Multiple sheets may be written to by specifying unique sheet_name. With all data written to the file it is necessary to save the changes. Note that creating an ExcelWriter object with a file name that already exists will result in the contents of the existing file being erased."
Second, you are not providing a sheet_name, so the data is written to the same default sheet each time.
An example solution, with ExcelWriter
# df1, df2, df3 - dataframes
input_data = {
    'sheet_name1': df1,
    'sheet_name2': df2,
    'sheet_name3': df3
}

# Initiate ExcelWriter - use the xlsxwriter engine
writer = pd.ExcelWriter('multiple_sheets.xlsx', engine='xlsxwriter')

# Iterate over the input_data dictionary
for sheet_name, df in input_data.items():
    """
    Perform operations here
    """
    # Write each dataframe to a different worksheet.
    df.to_excel(writer, sheet_name=sheet_name)

# Finally, save the ExcelWriter to file
writer.save()
Note 1. You only initiate and save the ExcelWriter object once; the iterations only add sheets to that object.
Note 2. Compared to your code, the sheet_name variable is passed to the to_excel() function.
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
# sheet_names: your list of sheet names; dfs: the matching list of dataframes
# Write each dataframe to a different worksheet.
for sheet_name, df in zip(sheet_names, dfs):
    df.to_excel(writer, sheet_name=sheet_name)
# Close the Pandas Excel writer and output the Excel file.
writer.save()
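Equivalently, the writer can be used as a context manager so the file is saved automatically when the block exits; a minimal sketch assuming the same input_data dict as above:

with pd.ExcelWriter('multiple_sheets.xlsx', engine='xlsxwriter') as writer:
    for sheet_name, df in input_data.items():
        df.to_excel(writer, sheet_name=sheet_name)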
Try changing the file name at each iteration (note this writes one file per dataframe, e.g. test1.xlsx, test2.xlsx, test3.xlsx, rather than a single multi-sheet workbook):
Input_Data = {'k1':test1, 'k2':test24, 'k3':test3}
file_number = 1
for v in Input_Data.values():
    df1 = v[126:236]
    df = df1.sort_index(ascending=False)
    Indexer = df.columns.tolist()
    df = [(pd.concat([df[Indexer[0]], df[Indexer[num]]], axis=1)) for num in [1,2,3,4,5,6]]
    df = [(df[num].astype(str).agg(','.join, axis=1)) for num in [0,1,2,3,4,5]]
    df = pd.DataFrame(df)
    dff = df.loc[0].append(df.loc[1].append(df.loc[2].append(df.loc[3].append(df.loc[4].append(df.loc[5])))))
    file_name = 'test'
    dff.to_excel(file_name + str(file_number) + ".xlsx", index=False, header=False)
    file_number = file_number + 1
I am trying to accomplish something that seems very simple using pandas, but I am getting stuck.
I want to merge multiple spreadsheets (each with multiple sheets) into one single MasterSpreadSheet containing all the sheets.
input example:
spreadsheet1 -> sheetname_a, sheetname_b, sheetname_c, sheetname_d
spreadsheet2 -> sheetname_a, sheetname_b, sheetname_c, sheetname_d
spreadsheet3 ......
output desired:
one single file with the data from all spreadsheets separated by the especific sheetname
MasterSpreadSheet -> sheetname_a, sheetname_b, sheetname_c, sheetname_d
Here is my code that generates that single MasterSpreadSheet, but it overwrites the previous spreadsheet's data, leaving the MasterFile with only the data from the last spreadsheet:
with pd.ExcelWriter(outputfolder + '/' + country + '-MasterSheet.xlsx') as writer:
    for spreadsheet in glob.glob(os.path.join(outputfolder, '*-Spreadsheet.xlsx')):
        sheets = pd.ExcelFile(spreadsheet).sheet_names
        for sheet in sheets:
            df = pd.DataFrame()
            sheetname = sheet.split('-')[-1]
            data = pd.read_excel(spreadsheet, sheet)
            data.index = [basename(spreadsheet)] * len(data)
            df = df.append(data)
            df.to_excel(writer, sheet_name=sheetname)
    writer.save()
    writer.close()
Suggestions ?
Thank you !
Got that working now :). I loop sheet by sheet first, then over the spreadsheet files inside that, and concat with pandas at the end of each sheet loop:
df1 = []
sheet_list = []
sheet_counter = 0

with pd.ExcelWriter(outputfolder + '/' + country + '-MasterSheet.xlsx') as writer:
    for template in glob.glob(os.path.join(templatefolder, '*.textfsm')):
        template_name = template.split('\\')[-1].split('.textfsm')[0]
        sheet_list.append(template_name)  # list of sheets per spreadsheet file
    for sheet in sheet_list:
        for spreadsheet in glob.glob(os.path.join(outputfolder, '*-Spreadsheet.xlsx')):
            data = pd.read_excel(spreadsheet, sheet_counter)
            data.index = [basename(spreadsheet)] * len(data)
            df1.append(data)
        df1 = pd.concat(df1)
        df1.to_excel(writer, sheet)
        df1 = []
        sheet_counter += 1  # adding a counter to get the next sheet of each spreadsheet
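A more compact sketch of the same idea, assuming every *-Spreadsheet.xlsx file has the same sheet names (outputfolder and country are the same variables as in the original code):

import glob
import os
from os.path import basename
import pandas as pd

files = glob.glob(os.path.join(outputfolder, '*-Spreadsheet.xlsx'))
sheet_names = pd.ExcelFile(files[0]).sheet_names

with pd.ExcelWriter(os.path.join(outputfolder, country + '-MasterSheet.xlsx')) as writer:
    for sheet in sheet_names:
        parts = []
        for f in files:
            data = pd.read_excel(f, sheet_name=sheet)
            data.index = [basename(f)] * len(data)  # tag each row with its source file
            parts.append(data)
        pd.concat(parts).to_excel(writer, sheet_name=sheet.split('-')[-1])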