Deleting a sheet in Excel using Python

m1 = pd.ExcelFile('C:\\Users\\a\\Desktop\\1.xlsx')
df1 = pd.read_excel(m1, sheet_name='Raw Data')
df1.head()
m2 = pd.read_excel('C:\\Users\\a\\Desktop\\stbd.xlsx')
m2.head()
# drop all rows in df1
df1 = df1.drop(labels=range(0, df1.shape[0]), axis=0)
df1.shape
I wanted to delete the data in one sheet of an Excel workbook, overwrite that sheet with new data, and have the pivot table in the next sheet refresh from the new data. But once I run this code, all the other sheets in the workbook (the ones with the pivot tables) are deleted. Can someone help me with this?

You have to read all the sheets of the Excel file and then save them all, including the changed one:
m1 = pd.ExcelFile('C:\\Users\\a\\Desktop\\1.xlsx')
df1 = pd.read_excel(m1, sheet_name=list(m1.sheet_names))  # dict of DataFrames, one per sheet
# change the specific sheet
df1["sheet to change"] = 'house'  # placeholder: replace this entry with your modified DataFrame
with pd.ExcelWriter("output_excel_file.xlsx", engine='xlsxwriter') as writer:
    for sheet in m1.sheet_names:
        df1[sheet].to_excel(writer, index=False, sheet_name=sheet)
Something like that.
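Note that pandas only ever writes cell values, so even a full round-trip like this will drop the pivot tables. If keeping the pivot-table sheets alive is the real goal, one alternative is to clear just the raw-data rows in place with openpyxl instead of rewriting the whole workbook. A sketch with stand-in file and sheet names (recent openpyxl versions preserve existing pivot tables on save, though nothing short of opening the file in Excel will actually refresh them):

```python
import pandas as pd
from openpyxl import load_workbook

# build a stand-in workbook (assumption: your real file already has these sheets)
with pd.ExcelWriter('1.xlsx') as w:
    pd.DataFrame({'a': [1, 2, 3]}).to_excel(w, sheet_name='Raw Data', index=False)
    pd.DataFrame({'b': [9]}).to_excel(w, sheet_name='Pivot', index=False)

# clear only the 'Raw Data' rows in place; all other sheets stay untouched
wb = load_workbook('1.xlsx')
ws = wb['Raw Data']
if ws.max_row > 1:
    ws.delete_rows(2, ws.max_row - 1)  # keep the header row
wb.save('1.xlsx')
```

New rows can then be appended to the cleared sheet with `ws.append(...)` before saving.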

Related

How to add an empty column to a specific sheet in Excel using pandas?

I have an Excel file that contains 3 sheets (PizzaHut, InAndOut, ColdStone). I want to add an empty column to the InAndOut sheet.
path = 'C:\\testing\\test.xlsx'
data = pd.ExcelFile(path)
sheets = data.sheet_names
if 'InAndOut' in sheets:
    # something something add empty column called 'toppings' to the sheet
    data.to_excel('output.xlsx')
Been looking around, but I couldn't find an intuitive solution to this.
Any help will be appreciated!
Read in the sheet by name.
Do what you need to do.
Overwrite the sheet with the modified data.
sheet_name = 'InAndOut'
df = pd.read_excel(path, sheet_name)
# Do whatever
with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name, index=False)
See pd.read_excel and pd.ExcelWriter.
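Put together for this question's sheets, it looks like the sketch below (the file built at the top is a stand-in for `C:\\testing\\test.xlsx`, and `if_sheet_exists='replace'` needs pandas 1.3 or newer):

```python
import pandas as pd

# stand-in workbook with the three sheets from the question
with pd.ExcelWriter('test.xlsx') as w:
    for name in ('PizzaHut', 'InAndOut', 'ColdStone'):
        pd.DataFrame({'item': ['x']}).to_excel(w, sheet_name=name, index=False)

# read just the target sheet, add the empty column, overwrite that sheet in place
df = pd.read_excel('test.xlsx', sheet_name='InAndOut')
df['toppings'] = pd.NA  # the empty 'toppings' column from the question
with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    df.to_excel(writer, sheet_name='InAndOut', index=False)
```

The other two sheets are left as they were; only 'InAndOut' is replaced.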

python add multiple columns to excel file one by one

I have an Excel file with multiple sheets. I want to group the columns in the logs list and store them in another Excel file, but some sheets do not contain some of the columns, so if a column does not exist in a sheet it should not be stored. The code runs, but it stores only the last column.
import pandas as pd

sheets_names = ['R9_14062021','R9_02122020','R9_14062021','R9_28052021','R9_17052021','R9_03052021','R9_14042021','R9_24032020','R9_19032020','R9 30112020','R9_17112020','R7_27012021','LOGS R9 01032021','LOGS R7 SAT01032021','R7_30032020','G9_06032020','G5T_20012021','TNT_08122020','R7_SAT_24112020','G6T_12112020','R9 12102020']
logs = [' Msd','Provider Id','Terminal Type','chgtCh','accessRecordModule','playerPlay startOver','playerPlay PdL','playerPlay PVR','contentHasAds','pdlComplete','lirePdl','lireVod']
dfs_list = pd.read_excel('COMPIL LOGS INDICATEURS V14062021.xlsx', sheet_name=sheets_names)
writer = pd.ExcelWriter('pandas_multiple.xlsx', engine='xlsxwriter')
for sheet in dfs_list:
    df = dfs_list[sheet]
    df['Dt'] = pd.to_datetime(df['Dt']).dt.date
    df1 = df.groupby(['Dt','webApp','mw'])[' Msd'].count()
    for log in logs:
        if log in df:
            df1 = df.groupby(['Dt','webApp','mw'])[log].sum()
            df1.to_update.get(sheet)
    #df1.reset_index(inplace=True)
    df1.to_excel(writer, sheet_name=sheet)
writer.save()
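The reason only the last column survives is that `df1` is reassigned on every pass through the inner loop. One fix is to build the whole aggregation in a single `groupby().agg()` call over whichever `logs` columns the sheet actually has. A sketch on a tiny stand-in frame (the column names come from the question):

```python
import pandas as pd

# tiny stand-in for one sheet of the workbook
df = pd.DataFrame({
    'Dt': ['2021-06-14', '2021-06-14'],
    'webApp': ['app1', 'app1'],
    'mw': ['mw1', 'mw1'],
    ' Msd': [1, 2],
    'chgtCh': [3, 4],
})
logs = [' Msd', 'chgtCh', 'lireVod']  # 'lireVod' is missing from this sheet

# keep only the columns that exist in this sheet; count ' Msd', sum the rest
present = [c for c in logs if c in df.columns]
agg_map = {c: ('count' if c == ' Msd' else 'sum') for c in present}
df1 = df.groupby(['Dt', 'webApp', 'mw']).agg(agg_map)
```

A single `df1.to_excel(writer, sheet_name=sheet)` per sheet then writes all the aggregated columns at once.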

Append to Existing excel without changing formatting

I'm trying to take data from other Excel/CSV sheets and append it to an existing workbook/worksheet. The code below works in terms of appending; however, it removes the formatting not only from the sheet I've appended to, but from all other sheets in the workbook.
From what I understand, the reason this happens is because I'm reading the entire ExcelWorkbook as a dictionary, turning it into a Pandas Dataframe, and then rewriting it back into Excel. But I'm not sure how to go about it otherwise.
How do I need to modify my code to make it so that I'm only appending the data I need, and leaving everything else untouched? Is pandas the incorrect way to go about this?
import os
import pandas as pd
import openpyxl
#Read Consolidated Sheet as Dictionary
#the 'Consolidation' excel sheet has 3 sheets:
#Consolidate, Skip1, Skip2
ws_dict = pd.read_excel('.\\Consolidation.xlsx', sheet_name=None)
#convert relevant sheet to dataframe
mod_df = ws_dict['Consolidate']
#check that mod_df is the 'Consolidate' Tab
mod_df
#do work on mod_df
#grab extra sheets with data and make into pd dataframes
excel1 = 'doc1.xlsx'
excel2 = 'doc2.xlsx'
df1 = pd.read_excel(excel1)
df1 = df1.reset_index(drop=True)
df2 = pd.read_excel(excel2)
df2 = df2.reset_index(drop=True)
#concate the sheets
mod_df = pd.concat([mod_df, df1, df2], axis=0, ignore_index = True, sort=False)
#reassign the modified to the sheet
ws_dict['Consolidate'] = mod_df
#write to the consolidation workbook
with pd.ExcelWriter('Consolidation.xlsx', engine='xlsxwriter') as writer:
    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name, index=False)
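Pandas cannot write a sheet without re-serializing it, so any formatting it does not know about is lost on the round-trip. If the requirement is append-only, a closer fit is to load the workbook with openpyxl and append rows to the one target sheet; existing sheets, rows, and their styles are left alone. A sketch with stand-in file and sheet names:

```python
import pandas as pd
from openpyxl import load_workbook

# stand-in for the existing Consolidation.xlsx (assumption: yours already exists)
pd.DataFrame({'col1': [1], 'col2': ['a']}).to_excel(
    'Consolidation.xlsx', sheet_name='Consolidate', index=False)

df1 = pd.DataFrame({'col1': [2], 'col2': ['b']})  # data to append

# append rows in place instead of rewriting the whole workbook
wb = load_workbook('Consolidation.xlsx')
ws = wb['Consolidate']
for row in df1.itertuples(index=False):
    ws.append(list(row))
wb.save('Consolidation.xlsx')
```

The same loop works for `df2` or any further frames, as long as their columns line up with the sheet's columns.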

Python Pandas - merge multiple spreadsheets that contains multiple sheets to a single MasterSpreadsheet having all the sheets

I am trying to accomplish something that seems very simple using pandas, but I am getting stuck.
I want to merge multiple spreadsheets (each with multiple sheets) into one single MasterSpreadSheet containing all the sheets.
input example:
spreadsheet1 -> sheetname_a, sheetname_b, sheetname_c, sheetname_d
spreadsheet2 -> sheetname_a, sheetname_b, sheetname_c, sheetname_d
spreadsheet3 ......
output desired:
one single file with the data from all spreadsheets, separated by the specific sheet name
MasterSpreadSheet -> sheetname_a, sheetname_b, sheetname_c, sheetname_d
Here is my code that generates that single MasterSpreadSheet, but it overwrites the previous spreadsheet's data, leaving the master file with only the data from the last spreadsheet:
with pd.ExcelWriter(outputfolder + '/' + country + '-MasterSheet.xlsx') as writer:
    for spreadsheet in glob.glob(os.path.join(outputfolder, '*-Spreadsheet.xlsx')):
        sheets = pd.ExcelFile(spreadsheet).sheet_names
        for sheet in sheets:
            df = pd.DataFrame()
            sheetname = sheet.split('-')[-1]
            data = pd.read_excel(spreadsheet, sheet)
            data.index = [basename(spreadsheet)] * len(data)
            df = df.append(data)
            df.to_excel(writer, sheet_name=sheetname)
    writer.save()
    writer.close()
Suggestions ?
Thank you !
Got that working now :). I loop sheet by sheet first, then over the spreadsheet files inside that, and call pandas concat at the end of each sheet loop:
df1 = []
sheet_list = []
sheet_counter = 0
with pd.ExcelWriter(outputfolder + '/' + country + '-MasterSheet.xlsx') as writer:
    for template in glob.glob(os.path.join(templatefolder, '*.textfsm')):
        template_name = template.split('\\')[-1].split('.textfsm')[0]
        sheet_list.append(template_name)  ## List of Sheets per Spreadsheet file
    for sheet in sheet_list:
        for spreadsheet in glob.glob(os.path.join(outputfolder, '*-Spreadsheet.xlsx')):
            data = pd.read_excel(spreadsheet, sheet_counter)
            data.index = [basename(spreadsheet)] * len(data)
            df1.append(data)
        df1 = pd.concat(df1)
        df1.to_excel(writer, sheet)
        df1 = []
        sheet_counter += 1  ## Adding a counter to get the next Sheet of each Spreadsheet
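The same result can be had without the counter by concatenating each sheet across all files in one pass. A sketch (the two files built below stand in for the `*-Spreadsheet.xlsx` glob, and all files are assumed to share the same sheet names):

```python
import pandas as pd

# build two stand-in spreadsheets with matching sheet names
for i in (1, 2):
    with pd.ExcelWriter(f'spreadsheet{i}.xlsx') as w:
        pd.DataFrame({'v': [i]}).to_excel(w, sheet_name='sheetname_a', index=False)
        pd.DataFrame({'v': [i * 10]}).to_excel(w, sheet_name='sheetname_b', index=False)

files = ['spreadsheet1.xlsx', 'spreadsheet2.xlsx']
sheet_names = pd.ExcelFile(files[0]).sheet_names

# one master workbook: each sheet is the concat of that sheet from every file
with pd.ExcelWriter('MasterSheet.xlsx') as writer:
    for sheet in sheet_names:
        parts = [pd.read_excel(f, sheet_name=sheet) for f in files]
        pd.concat(parts, ignore_index=True).to_excel(writer, sheet_name=sheet, index=False)
```

With the real glob, `files` would come from `glob.glob(os.path.join(outputfolder, '*-Spreadsheet.xlsx'))` as in the question.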

Turn Xlsxwriter sheet into Pandas Dataframe

I have a DataFrame read from an Excel sheet, to which I've added a few new columns using XlsxWriter. Now I need to filter this new set of data on the new column I created with XlsxWriter (which is a date column, by the way). Is there a way to turn this new worksheet back into a DataFrame so I can filter on the new column? I'll try to provide any useful code:
export = "files/extract.xlsx"
future_days = 12
writer = pd.ExcelWriter('files/new_report-%s.xlsx' % (date.today()), engine ='xlsxwriter')
workbook = writer.book
df = pd.read_excel(export)
df.to_excel(writer, 'Full Log', index=False)
log_sheet = writer.sheets['Full Log']
new_headers = ('todays date', 'Milestone Date')
log_sheet.write_row('CW1', new_headers)
# This for loop just writes in the formula for my new columns on every line
for row_num in range(2, len(df.index)+2):
    log_sheet.write_formula('CX' + str(row_num),'=IF(AND($BS{0}>1/1/1990,$BT{0}<>"Yes"),IF($BS{0}<=$CW{0},$BS{0},"Date In Future"),IF(AND($BW{0}>1/1/1990,$BX{0}<>"Yes"),IF($BW{0}<=CW{0},$BW{0},"Date In Future"),IF(AND($CA{0}>1/1/1990,$CCW{0}<>"Yes"),IF($CA{0}<=CW{0},$CA{0},"Date In Future"),IF(AND($CE{0}>1/1/1990,$CF{0}<>"Yes"),IF($CE{0}<CW{0},$CE{0},"Date In Future"),IF(AND($CI{0}>1/1/1990,$CJ{0}<>"Yes"),IF($CI{0}<CW{0},$CI{0},"Date In Future"),IF(AND($CM{0}>1/1/1990,$CN{0}<>"Yes"),IF($CM{0}<CW{0},$CM{0},"Date In Future"),"No Date"))))))'.format(row_num))
    log_sheet.write_formula('CW' + str(row_num), '=TODAY()+' + str(future_days))
    log_sheet.write_formula('CY' + str(row_num), '=IF(AND(AI{0}>DATEVALUE("1/1/1900"), AH{0}>DATEVALUE("1/1/1900"),A{0}<>"Test",A{0}<>"Dummy Test"),NETWORKDAYS(AH{0},AI{0}-1),"Test")'.format(row_num))
So now that's all done, I need to filter this "Full Log" sheet so it only keeps rows where the new milestone date column has passed today's date. I've used XlsxWriter's autofilter for this, but I don't like it because it doesn't actually apply the filter; it just sets it.
You can close the writer to save the file, then load that file into a new DataFrame:
writer.close()  # saves the file
df2 = pd.read_excel('files/new_report-%s.xlsx' % date.today(), sheet_name='Full Log')
Be aware, though, that XlsxWriter writes formulas without cached results, so pandas will read the formula cells back as empty/NaN unless the file has first been opened and recalculated in Excel.
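Since formula cells won't round-trip through pandas, another option is to skip the worksheet formulas for the columns you need to filter on and compute them in pandas before writing. A sketch (only the 'todays date' helper column from the question is shown, and the frame is a stand-in for the real extract):

```python
from datetime import date, timedelta

import pandas as pd

future_days = 12
df = pd.DataFrame({'id': [1, 2]})  # stand-in for pd.read_excel(export)

# compute the helper column in pandas so it can be filtered immediately,
# with no round-trip through Excel to evaluate a =TODAY()+12 formula
df['todays date'] = date.today() + timedelta(days=future_days)
```

Rows can then be filtered with a normal boolean mask, e.g. `df[df['Milestone Date'] <= df['todays date']]`, before calling `to_excel`.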
