I'm trying to overwrite one sheet of my Excel file with data from a .txt file. The Excel file has several sheets, but I only want to overwrite the 'Previous Month' sheet. Every time I run this code and open the Excel file, only the 'Previous Month' sheet is there and nothing else. Many solutions on here show how to add more sheets; I'm trying to update an existing sheet in a workbook with 8 sheets total.
How can I fix my code so that only the one sheet is edited but all of them stay there?
import pandas as pd
#importing previous month data#
writer = pd.ExcelWriter('file.xlsx')
df = pd.read_csv('file.txt', sep='\t')
df.to_excel(writer, sheet_name='Previous Month', startrow=4, startcol=2)
writer.save()
writer.close()
Edited code: whatever is happening here keeps corrupting my original file.
import pandas as pd
import openpyxl
#importing previous month data#
writer= pd.ExcelWriter('file.xlsx', mode= 'a', engine="openpyxl", if_sheet_exists="replace")
df = pd.read_csv('file.txt', sep='\t')
df.to_excel(writer, sheet_name="Previous Month", startrow=2, startcol=4)
writer.save()
writer.close()
You can use openpyxl.load_workbook() to do what you are looking for. I tried the suggestions above, but they didn't work for me, whereas load_workbook() usually runs without issues, so I hope it works for you as well.
I open the output file using load_workbook(), delete the existing sheet (Sheet2 here) if it exists, then create it again with create_sheet() and write the data using dataframe_to_rows. Let me know in case of questions or issues.
import pandas as pd
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
df = pd.read_csv('file.txt', sep='\t')
wb = openpyxl.load_workbook('output.xlsx')  # Open workbook
if "Sheet2" in wb.sheetnames:  # If sheet exists, delete it
    del wb['Sheet2']
ws = wb.create_sheet(title='Sheet2')  # Create new sheet
rows = dataframe_to_rows(df, index=False, header=True)  # Write dataframe as rows
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        # The 2 and 4 are the offsets, similar to startrow and startcol in your code
        ws.cell(row=r_idx+2, column=c_idx+4, value=value)
wb.save('output.xlsx')
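For comparison, the pandas-only route from the edited question can also work if the writer is used as a context manager instead of calling save() and close() by hand. The sketch below assumes pandas 1.3 or newer (where if_sheet_exists is available) with openpyxl installed; the file name demo.xlsx, the 'Other' sheet and the sample data are made up for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a small demo workbook with two sheets (hypothetical stand-in for file.xlsx).
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
wb = Workbook()
wb.active.title = "Previous Month"
wb["Previous Month"]["A1"] = "old data"
wb.create_sheet("Other")["A1"] = "keep me"
wb.save(path)

# Stand-in for pd.read_csv('file.txt', sep='\t').
df = pd.DataFrame({"Acct": [1, 2], "Order": [111, 222]})

# mode="a" opens the existing workbook; if_sheet_exists="replace" swaps out only
# the named sheet. The with-block saves and closes the file exactly once.
with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                    if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="Previous Month",
                startrow=4, startcol=2, index=False)

result = load_workbook(path)
print(result.sheetnames)
```

Both sheets should still be present afterwards, with the new data starting at cell C5 of 'Previous Month' (startrow and startcol are zero-based).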
Related
I have Excel data for three variables (Acct, Order, Date) in a sheet named Orders.
I have created a data frame by reading this sheet:
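import pandas as pd
sheet_file = pd.ExcelFile("Orders.xlsx", engine="openpyxl")
worksheets = sheet_file.sheet_names  # assumed: the original does not show how worksheets is defined
append_data = []
for sheet_name in worksheets:
    df = pd.read_excel(sheet_file, sheet_name, header=1)
    append_data.append(df)
append_data = pd.concat(append_data)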
I have another Excel file called "Total_Orders.xlsx" with ~100k rows and I need to append the above dataframe to this excel file (Sheet Name="Orders")
with pd.ExcelWriter('Total_Orders.xlsx',sheet_name='Orders',engine="openpyxl") as writer:
    append_data.to_excel(writer,startrow=2,header=False,index=False)
    writer.save()
The above is overwriting the data instead of appending it. I know startrow is the key here but I am not sure how to fix this. Any help is much appreciated
Have you tried mode="a", along these lines:
with pd.ExcelWriter("Total_Orders.xlsx", mode="a", engine="openpyxl") as writer:
    append_data.to_excel(writer, sheet_name="Orders")
EDIT - in response to comment
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import load_workbook
append_data = pd.DataFrame([{'Acct':3, 'Order':333, 'Note':'third'},
                            {'Acct':4, 'Order':444, 'Note':'fourth'}])
wb = load_workbook(filename = "stackoverflow.xlsx")
ws = wb["Orders"]
for r in dataframe_to_rows(append_data, index=False, header=False):  # No index and don't append the column headers
    ws.append(r)
wb.save("stackoverflow.xlsx")
The stackoverflow.xlsx before and after (screenshots not reproduced here; the 'Other' sheet was not affected).
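If you would rather stay within pandas, a similar append seems possible with if_sheet_exists="overlay", which, as far as I know, needs pandas 1.4 or newer. The sketch below assumes the 'Orders' sheet has no gaps, so the sheet's max_row is the last used row; the file name and sample rows are invented for the demonstration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

path = os.path.join(tempfile.mkdtemp(), "Total_Orders_demo.xlsx")

# Existing data: a header row plus two records on the 'Orders' sheet.
existing = pd.DataFrame({"Acct": [1, 2], "Order": [111, 222]})
existing.to_excel(path, sheet_name="Orders", index=False, engine="openpyxl")

append_data = pd.DataFrame({"Acct": [3, 4], "Order": [333, 444]})

with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                    if_sheet_exists="overlay") as writer:
    # max_row is 1-indexed and startrow is 0-indexed, so startrow=max_row
    # points at the first empty row below the existing data.
    next_row = writer.book["Orders"].max_row
    append_data.to_excel(writer, sheet_name="Orders", startrow=next_row,
                         header=False, index=False)

ws = load_workbook(path)["Orders"]
rows = [tuple(cell.value for cell in row) for row in ws.iter_rows()]
print(rows)
```

With "overlay", pandas writes into the existing sheet instead of replacing it, so the earlier rows stay in place and the new ones land directly below them.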
I’m really stuck on what should be an easy problem.
I have an Excel workbook in which I’m updating 2 columns for one record on the clean_data sheet. From there, I’m saving and closing the file.
After that, I’m trying to pull in the updated roll-up sheet values as a data frame (graphs_rolling), which has formulas that use the clean_data sheet.
When I view the data frame, all the values are NaN. I can open the Excel file and see the updated values on the graphs_rolling sheet. What can I do to get the data frame to populate with values?
Code is shown below:
import pandas as pd
import openpyxl
from openpyxl import load_workbook
#Import Data with Correct Rows and Columns for SSM Commercial
book = load_workbook('//CPI Projects//Test//SampleSSM//NewSSM.xlsx')
writer = pd.ExcelWriter('//CPI Projects//Test//SampleSSM//NewSSM.xlsx', engine = 'openpyxl')
writer.book = book
df1 = pd.read_excel('//CPI Projects//Test//SampleSSM//NewSSM.xlsx',sheet_name='clean_data')
df1.loc[df1['ev_id']==20201127, 'commercial_weight'] = 0
df1.loc[df1['ev_id']==20201127, 'commercial'] = 0
book.remove(book['clean_data'])
df1.to_excel(writer, sheet_name = 'clean_data',index=False)
writer.save()
writer.close()
df5 = pd.read_excel('//CPI Projects//Test//SampleSSM//NewSSM.xlsx',sheet_name='graphs_rolling_avg',skiprows=30)
print(df5)
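A likely cause, offered as an assumption rather than a confirmed diagnosis: openpyxl does not evaluate formulas, so a workbook it saves carries the formula text but no cached results, and pandas reads only cached results. The sketch below reproduces this on a made-up three-cell workbook (the path and cell layout are hypothetical).

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "formula_demo.xlsx")

wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=SUM(A1:A2)"  # stored as a formula, never calculated by openpyxl
wb.save(path)

# data_only=False (the default) returns the formula text.
formula = load_workbook(path)["Sheet"]["A3"].value
# data_only=True returns the cached result, which is missing until Excel
# (or another spreadsheet application) recalculates and saves the file.
cached = load_workbook(path, data_only=True)["Sheet"]["A3"].value

print(formula, cached)
```

Here cached comes back as None, which pandas reports as NaN. Opening the file in Excel and saving it fills in the cached values, which would explain why the numbers look fine when the workbook is viewed by hand.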
I am trying to create a repository "Master" Excel file from a CSV that will be generated and overwritten every couple of hours. The code below creates a new Excel file and writes the content of "combo1.csv" to "master.xlsx". However, whenever the combo1 file is updated, the code overwrites the contents of "master.xlsx". I need to append the contents of "combo1" to "Master" without the headers being inserted every time. Can someone help me with this?
import pandas as pd
writer = pd.ExcelWriter('master.xlsx', engine='xlsxwriter')
df = pd.read_csv('combo1.csv')
df.to_excel(writer, sheet_name='sheetname')
writer.save()
Refer to the "Append Data at the End of an Excel Sheet" section in this Medium article:
Using Python Pandas with Excel Sheets
(Credit to Nensi Trambadiya for the article)
Basically you'll have to first read the Excel file and find the number of rows before pushing the new data.
reader = pd.read_excel(r'master.xlsx')
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
First read the Excel file, then use the method below to append the rows.
import pandas as pd
from openpyxl import load_workbook  # xlsxwriter cannot open existing files, so openpyxl is used here
df = pd.DataFrame({'Name': ['abc','def','xyz','ysv'],
                   'Age': [8,45,32,26]})
writer = pd.ExcelWriter('master.xlsx', engine='openpyxl')
# Note: assigning writer.book and writer.sheets relies on older pandas behaviour
writer.book = load_workbook('master.xlsx')
writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
reader = pd.read_excel(r'master.xlsx')
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
writer.close()
import pandas as pd
from openpyxl import load_workbook
# new dataframe with same columns
df = pd.read_csv('combo.csv')
writer = pd.ExcelWriter('master.xlsx', engine='openpyxl')
# try to open an existing workbook
writer.book = load_workbook('master.xlsx')
# copy existing sheets
writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
# read existing file
reader = pd.read_excel(r'master.xlsx')
# write out the new sheet
df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)
writer.close()
Note that master.xlsx has to exist before running the script.
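An alternative sketch that avoids the writer.book assignment, which recent pandas releases no longer allow as far as I know. It uses the csv module and openpyxl directly; the helper name append_csv_to_master and the sample file are hypothetical, and it assumes the CSV always has a header row with the same columns.

```python
import csv
import os
import tempfile

from openpyxl import Workbook, load_workbook


def append_csv_to_master(csv_path, xlsx_path, sheet_name="sheetname"):
    """Append the CSV's data rows to the sheet, writing the header only once."""
    with open(csv_path, newline="") as f:
        header, *data = list(csv.reader(f))

    if os.path.exists(xlsx_path):
        wb = load_workbook(xlsx_path)
        ws = wb[sheet_name]
    else:
        # First run: create the workbook and write the header row.
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        ws.append(header)

    for row in data:
        ws.append(row)  # values stay strings; convert here if numbers are needed
    wb.save(xlsx_path)
    return ws.max_row


# Demo with a throwaway CSV standing in for combo1.csv.
tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, "combo1.csv")
xlsx_path = os.path.join(tmp, "master.xlsx")
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows([["Name", "Age"], ["abc", "8"], ["def", "45"]])

first = append_csv_to_master(csv_path, xlsx_path)
second = append_csv_to_master(csv_path, xlsx_path)
print(first, second)
```

Because the helper creates the workbook on the first run, the master file does not need to exist beforehand in this variant.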
I am trying to add an empty excel sheet into an existing Excel File using python xlsxwriter.
Setting the formula up as follows works well.
workbook = xlsxwriter.Workbook(file_name)
worksheet_cover = workbook.add_worksheet("Cover")
Output4 = workbook
Output4.close()
But once I try to add further sheets with dataframes into the Excel it overwrites the previous excel:
with pd.ExcelWriter('Luther_April_Output4.xlsx') as writer:
    data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
    data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
    data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
How should I write the code so that I can add empty sheets and existing data frames to an existing Excel file?
Alternatively, it would be helpful to know how to switch engines once I have produced the Excel file.
Thanks for any help!
If you're not forced to use xlsxwriter, try using openpyxl. Simply pass 'openpyxl' as the engine for the pandas built-in ExcelWriter class. I asked a question a while back about why this works. It works well with the syntax of to_excel() and it won't overwrite your existing sheets.
from openpyxl import load_workbook
import pandas as pd
book = load_workbook(file_name)
writer = pd.ExcelWriter(file_name, engine='openpyxl')
writer.book = book
data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
writer.save()
You could use pandas.ExcelWriter with the optional mode='a' argument to append to an existing Excel workbook.
You can also append to an existing Excel file:
>>> with ExcelWriter('path_to_file.xlsx', mode='a') as writer:
...     df.to_excel(writer, sheet_name='Sheet3')
Unfortunately, this requires a different engine, since, as you observe, the xlsxwriter engine does not support the optional mode='a' (append). If you try to pass this parameter to the constructor, it raises an error.
So you will need to use a different engine to do the append, such as openpyxl. Make sure the package is installed, otherwise you'll get a "Module Not Found" error. I have tested openpyxl as the engine, and it is able to append a new worksheet to an existing workbook:
with pd.ExcelWriter(engine='openpyxl', path='Luther_April_Output4.xlsx', mode='a') as writer:
    data_DifferingRates.to_excel(writer, sheet_name='Differing Rates')
    data_DifferingMonthorYear.to_excel(writer, sheet_name='Differing Month or Year')
    data_DoubleEntries.to_excel(writer, sheet_name='Double Entries')
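For the empty sheet itself, a minimal sketch with openpyxl, assuming the workbook already exists; the demo file and the choice to put 'Cover' first are illustrative only.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Throwaway workbook standing in for the existing output file.
path = os.path.join(tempfile.mkdtemp(), "output_demo.xlsx")
wb = Workbook()
wb.active.title = "Differing Rates"
wb.save(path)

# Reopen the file and add an empty sheet without touching the others.
wb = load_workbook(path)
if "Cover" not in wb.sheetnames:
    wb.create_sheet("Cover", 0)  # index 0 places it before the existing sheets
wb.save(path)

sheet_names = load_workbook(path).sheetnames
print(sheet_names)
```

The same load, modify, save cycle works whether the file was first produced by xlsxwriter or by openpyxl, since both write ordinary .xlsx files.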
I think you need to write the data into a new file. This works for me:
# Write multiple tabs (sheets) into a new file
import pandas as pd
from openpyxl import load_workbook
Work_PATH = r'C:\PythonTest'+'\\'
ar_source = Work_PATH + 'Test.xlsx'
Output_Wkbk = Work_PATH + 'New_Wkbk.xlsx'
# Need workbook from openpyxl load_workbook to enumerate tabs
# is there another way with only xlsxwriter?
workbook = load_workbook(filename=ar_source)
# Get the sheet names in the workbook as a list.
# You can also set the list manually: tabs = ['sheet1', 'sheet2']
tabs = workbook.sheetnames
print('\nWorkbook sheets: ', tabs, '\n')
# Replace this function with functions for what you need to do
def default_col_width(df, sheetname, writer):
    # Note, this seems to use xlsxwriter as the default engine.
    for column in df:
        # map col width to col name. Ugh.
        column_width = max(df[column].astype(str).map(len).max(), len(column))
        # set special column widths
        narrower_col = ['OS','URL']  # change to fit your workbook
        if column in narrower_col: column_width = 10
        if column_width > 30: column_width = 30
        if column == 'IP Address': column_width = 15  # change for your workbook
        col_index = df.columns.get_loc(column)
        writer.sheets[sheetname].set_column(col_index, col_index, column_width)
    return
# Note nothing is returned. Writer.sheets is global.
with pd.ExcelWriter(Output_Wkbk, engine='xlsxwriter') as writer:
    # Iterate through the list of sheet names
    for tab in tabs:
        df1 = pd.read_excel(ar_source, tab).astype(str)
        # I need to trim my input
        df1.drop(list(df1)[23:], axis='columns', inplace=True, errors='ignore')
        try:
            # Set spreadsheet focus
            df1.to_excel(writer, sheet_name=tab, index=False, na_rep=' ')
            # Do something with the spreadsheet - Calling a function
            default_col_width(df1, tab, writer)
        except Exception:
            # Function call failed so just copy tab with no changes
            df1.to_excel(writer, sheet_name=tab, index=False, na_rep=' ')
If I use the input file name as the output file name, it fails and erases the original. There is no need to save or close if you use with; it closes automatically.
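One way to guard against losing the original when the output name equals the input name is to write to a temporary file and swap it in only after the write succeeds. This is a general file-handling pattern rather than anything specific to pandas; the helper write_then_replace and its write_fn callback are hypothetical names.

```python
import os
import tempfile


def write_then_replace(target_path, write_fn):
    """Call write_fn(temp_path); replace target_path only if it succeeds."""
    directory = os.path.dirname(os.path.abspath(target_path))
    suffix = os.path.splitext(target_path)[1]
    fd, temp_path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)  # write_fn opens the path itself
    try:
        write_fn(temp_path)
        os.replace(temp_path, target_path)  # swap in the finished file
    except Exception:
        os.remove(temp_path)  # failed write: the original stays untouched
        raise


# Demo with plain text files; with pandas, write_fn could wrap
# a `with pd.ExcelWriter(temp_path, ...)` block instead.
target = os.path.join(tempfile.mkdtemp(), "report.txt")
with open(target, "w") as f:
    f.write("original")


def failing_write(path):
    raise RuntimeError("simulated failure")


def good_write(path):
    with open(path, "w") as f:
        f.write("updated")


try:
    write_then_replace(target, failing_write)
except RuntimeError:
    pass
with open(target) as f:
    after_failure = f.read()

write_then_replace(target, good_write)
with open(target) as f:
    after_success = f.read()
print(after_failure, after_success)
```

The temporary file keeps the target's extension so that an Excel writer can still infer the file type from the name.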
I'm in the midst of writing an IPython notebook that will pull the contents of a .csv file and paste them into a specified tab of an .xlsx file. The tab in the .xlsx is filled with a bunch of pre-programmed formulas so that I can run an analysis on the original content of the .csv file.
I've run into a snag, however, with the date fields that I copy over from the .csv into the .xlsx file.
The dates do not get properly processed by the Excel formulas unless I double-click the date cells or apply Excel's "text to columns" function on the column of dates and set a tab as the delimiter (which I should note, does not split the cell).
I'm wondering if there's a way to either...
write a helper function that logs the keystrokes of applying the "text to columns" function call
write a helper function to double click and return down each row of the column of dates
from openpyxl import load_workbook
import pandas as pd
def transfer_hours(report_name, ER_hours_analysis_wb):
    df = pd.read_csv(report_name, index_col=0)
    book = load_workbook(ER_hours_analysis_wb)
    sheet_name = "ER Work Log"
    with pd.ExcelWriter("ER Hours Analysis 248112.xlsx",
                        engine='openpyxl') as writer:
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, sheet_name=sheet_name,
                    startrow=1, startcol=0, engine='openpyxl')
Use the openpyxl module
from openpyxl import load_workbook
wb = load_workbook(filename=filePath, read_only=False, data_only=False)
Setting data_only to False will return the formulas whereas data_only=True returns the non-formula values.
As great a tool as pandas is designed to be, in this case there may not be a reason to include it.
Here is a shorter structure for what you're trying to accomplish:
import csv
import datetime
from openpyxl import load_workbook
def transfer_hours(report_name, ER_hours_analysis_wb):
    wb = load_workbook(ER_hours_analysis_wb)
    ws = wb['ER Work Log']
    with open(report_name, 'rt', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        # openpyxl rows and columns are 1-indexed
        for rownum, row in enumerate(reader, start=1):
            for colnum, col in enumerate(row, start=1):
                # assumes every cell holds a date in month/day/year format
                dttm = datetime.datetime.strptime(col, "%m/%d/%Y")
                ws.cell(column=colnum, row=rownum).value = dttm
    wb.save('new_spreadsheet.xlsx')
What you'll be able to do from here is break out which columns should have what format based on their position in the csv. Here is an example:
for rownum, row in enumerate(reader, start=1):
    ws.cell(column=1, row=rownum, value=row[0])
    dttm = datetime.datetime.strptime(row[1], "%m/%d/%Y")
    ws.cell(column=2, row=rownum).value = dttm
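The per-column conversion can be factored into a small helper that is easy to check before anything is written to the workbook. The sketch below is stdlib-only; the name convert_row, the date_columns argument and the sample rows are assumptions, and cells that do not match the format are left as text.

```python
import csv
import datetime
import io


def convert_row(row, date_columns, fmt="%m/%d/%Y"):
    """Return a copy of row with the given 0-based columns parsed as dates."""
    converted = list(row)
    for idx in date_columns:
        try:
            converted[idx] = datetime.datetime.strptime(row[idx], fmt)
        except (ValueError, IndexError):
            pass  # header text or malformed/missing value: keep as-is
    return converted


# Sample CSV content standing in for the report file.
sample = io.StringIO("Task,Date\nReview,03/15/2021\nDeploy,not a date\n")
rows = [convert_row(row, date_columns=[1]) for row in csv.reader(sample)]
print(rows)
```

Each converted row can then be passed to ws.append(...). openpyxl stores datetime objects as real Excel dates, so the formulas should see dates rather than text.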
For reference:
https://openpyxl.readthedocs.io/en/stable/usage.html
In Python, how do I read a file line-by-line into a list?
How to format columns with headers using OpenPyXL