I created a pandas dataframe in my code and tried to append the final output to an existing Excel workbook. The existing workbook is called "Directory" and has three different sheets in it. I want to append my output to one of the sheets in the workbook, called "raw_data". This sheet already has some data in it, but its columns match the columns in my new dataframe. Here is my code:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
from openpyxl import Workbook
with pd.ExcelWriter(r'C:\Users\Documents\Directory.xlsx', engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, sheet_name='raw_data', index=False, header=False)
# The with block saves and closes the workbook on exit, so no explicit save()/close() is needed
My code "runs" without any error, but when I check the workbook afterwards, it has not appended my dataframe to the specified sheet, "raw_data". Instead it creates a new sheet called "raw_data1" and stores the data in that tab. I couldn't figure out which part of my code is incorrect. Could anyone please help me with this? Thank you.
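One possible fix, sketched under the assumption that pandas 1.4 or later is installed (as far as I know, that is where if_sheet_exists='overlay' became available): write into the existing sheet and start at the first row below the existing data. The helper name append_rows is hypothetical, not part of pandas.

```python
import pandas as pd


def append_rows(path, df, sheet_name):
    """Append df below the existing rows of sheet_name (hypothetical helper)."""
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # max_row is the last row openpyxl considers used in the sheet
        start = writer.book[sheet_name].max_row
        # startrow is 0-based, so passing max_row targets the next free row
        df.to_excel(writer, sheet_name=sheet_name, startrow=start,
                    index=False, header=False)
```

Calling append_rows(r'C:\Users\Documents\Directory.xlsx', df, 'raw_data') would then add the rows to "raw_data" instead of creating "raw_data1". Note that a completely empty sheet still reports max_row as 1, so this sketch assumes the sheet already has at least a header row.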
Related
I'm trying to append some information to an Excel file using pandas.
My Excel file has several sheets, most of them with formulas.
I'm only trying to replace the cells in one specific sheet, sheet2.
Function:
import pandas as pd

def write():
    # path is assumed to be defined elsewhere
    df = pd.read_excel(path, "sheet2")
    df.loc['A','B'] = 10
    with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name="sheet2")
The problem
The cell values in that sheet get replaced, BUT the cells with formulas in the other sheets are empty.
When I open the Excel file in Protected View they are empty, but they reappear once editing is enabled.
Help.
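A likely explanation, offered as an assumption rather than a confirmed diagnosis: openpyxl, which pandas uses here, keeps the formulas when it rewrites the file but does not store their computed results, so a viewer that does not recalculate has nothing to show until Excel recomputes them. The minimal sketch below reproduces that on a throwaway workbook; the function name is hypothetical.

```python
from openpyxl import Workbook, load_workbook


def formula_round_trip(path):
    """Save a formula with openpyxl, then read back its text and cached value."""
    wb = Workbook()
    ws = wb.active           # the default sheet is titled "Sheet"
    ws["A1"] = 2
    ws["A2"] = "=A1*10"      # stored as a formula string
    wb.save(path)
    # The formula text survives the round trip ...
    formula = load_workbook(path)["Sheet"]["A2"].value
    # ... but data_only=True asks for the cached result, which was never written
    cached = load_workbook(path, data_only=True)["Sheet"]["A2"].value
    return formula, cached
```

If that is what is happening, the formulas are not lost; they only lack a stored result until Excel recalculates the workbook and saves it again.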
I'm a super beginner and still learning Python.
I have an Excel workbook which contains multiple sheets, and I only want certain sheets to be copied and pasted into a newly created workbook, but I'm having some trouble.
Below is my code.
import pandas as pd
import openpyxl
df = pd.read_excel('AMT.xlsb', sheet_name=['Roster','LOA'])
# print whole sheet data
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name=['Roster','LOA'])
I get the error "IndexError: At least one sheet must be visible", even though none of the sheets in the AMT file are hidden.
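It looks like you are getting a dict of frames rather than a single frame: when sheet_name is a list, read_excel returns a dict of DataFrames keyed by sheet name. Try this: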
import pandas as pd
import openpyxl
df = pd.read_excel('AMT.xlsb', sheet_name='Roster')
df1 = pd.read_excel('AMT.xlsb', sheet_name='LOA')
# write each sheet to the new workbook
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name="Roster", index=False)
    df1.to_excel(writer, sheet_name="LOA", index=False)
You may still have some clean up after...
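The same idea can be written as a loop over the dict that read_excel returns for a list of sheet names. This is only a sketch: copy_sheets is a hypothetical helper, and it copies cell values only, not formatting or formulas.

```python
import pandas as pd


def copy_sheets(src, dst, sheet_names):
    """Copy the listed sheets from src into a new workbook dst (values only)."""
    # With a list, read_excel returns {sheet name: DataFrame}
    frames = pd.read_excel(src, sheet_name=sheet_names)
    with pd.ExcelWriter(dst) as writer:
        for name, frame in frames.items():
            frame.to_excel(writer, sheet_name=name, index=False)
```

For the question above, the call would be copy_sheets('AMT.xlsb', 'output.xlsx', ['Roster', 'LOA']), assuming a reader for .xlsb files (such as pyxlsb) is installed.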
I have an Excel workbook that has more than one worksheet (i.e. sheet1 and sheet2),
and I did this:
import pandas
df1 = pandas.read_excel('file.xlsx', sheet_name='sheet1')
####doing something on sheet1, sheet2 is not touched######
df1.to_excel('file.xlsx', sheet_name='sheet1')
After doing the above, I found that sheet2 was missing once the file was saved.
Is there a way to open and save on same file without affecting other worksheets?
A possible way to do that is by loading all of your sheets, then modifying only the first one. Although it works, you may lose any custom styling from your tables.
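import pandas as pd
# Load all sheets into a dict of {sheet name: DataFrame}
workbook = pd.read_excel('file.xlsx', sheet_name=None)
# do something to workbook['sheet1']
# Write all sheets back; the with block saves and closes the file on exit
# (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('file.xlsx', engine='xlsxwriter') as writer:
    for sheet, df in workbook.items():
        # index=False avoids adding an extra index column on every round trip
        df.to_excel(writer, sheet_name=sheet, index=False)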
As far as I know, the only way to overwrite a sheet while keeping the other ones untouched requires using third-party libraries. For instance, here's an option with openpyxl:
First, modify the data as you wish:
import pandas as pd
fname = 'file.xlsx'
target_sheet = 'sheet1'
df = pd.read_excel(fname, sheet_name=target_sheet)
# further modification to `df` ...
Then, save it to the specified sheet:
# Load required functions
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
# Read excel file (all sheets)
wb = load_workbook(fname)
# Get the index from target-sheet
idx = wb.sheetnames.index(target_sheet)
# Delete the existing target-sheet
del wb[target_sheet]
# Create a new empty target-sheet
wb.create_sheet(target_sheet, idx)
# Write `df` data on it
for r in dataframe_to_rows(df, index=False, header=True):
    wb[target_sheet].append(r)
# Save file
wb.save(fname)
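Newer pandas versions can also do this directly. A minimal sketch, assuming pandas 1.3 or later with openpyxl installed, where ExcelWriter accepts if_sheet_exists='replace' in append mode; replace_sheet is a hypothetical helper name.

```python
import pandas as pd


def replace_sheet(path, df, sheet_name):
    """Overwrite one sheet of an existing workbook, leaving the others in place."""
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        # The old sheet is dropped and recreated with the contents of df
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```

As with the openpyxl version above, the replaced sheet loses its previous styling, since it is recreated from scratch.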
I suspect something is going on in the below section that is throwing off the code:
####doing something on ws1, ws2 is not touched######
When I ran your code on my system, the saved workbook still contained both worksheets.
As an isolation test, can you comment out or remove the code in that section and confirm whether the error still appears?
I just want to overwrite certain columns based on my dataframe. Suppose df2 is my dataframe.
Below is the code I use. The problem is that it overwrites the other columns and rows even though I coded it to start at column 80.
I want it to overwrite column 80 and beyond only, not anything before column 80. Here 80 is the column index, not its name.
import pandas as pd
import xlsxwriter
df2 = pd.read_excel(r'C:\Users\RUI LEONHART\Google Drive\Shop\STOCK V2.xlsx',
                    usecols=['XS1','S1','M1','L1','XL1','XXL1'])
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df2.to_excel(writer, sheet_name='Sheet1', startcol=80)
# Get the xlsxwriter objects from the dataframe writer object.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Close the Pandas Excel writer and output the Excel file
# (close() also saves; writer.save() was removed in pandas 2.0).
writer.close()
I searched around for a solution. The closest one is this:
python: update dataframe to existing excel sheet without overwriting contents on the same sheet and other sheets
but it still overwrites the columns and rows that I don't want changed.
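Part of the difficulty is that XlsxWriter can only create new files, so a writer using that engine always starts from an empty workbook. A hedged sketch of one alternative, assuming pandas 1.4 or later and the openpyxl engine: open the existing file in append mode with if_sheet_exists='overlay' and write at the desired column offset. The helper name overlay_columns is hypothetical.

```python
import pandas as pd


def overlay_columns(path, df, sheet_name, startcol):
    """Write df into an existing sheet starting at the 0-based column startcol."""
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # Only the cells covered by df are written; other cells stay as they are
        df.to_excel(writer, sheet_name=sheet_name, startcol=startcol,
                    index=False)
```

With startcol=80 the header lands in the 81st spreadsheet column and everything to its left is left alone.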
I want to import the values from a Pandas dataframe into an existing Excel sheet. I want to insert the data inside the sheet without deleting what is already there in the other cells (like formulas that use that data, etc.).
I tried using data.to_excel like:
writer = pd.ExcelWriter(r'path\TestBook.xlsm')
data.to_excel(writer, sheet_name='Sheet1', startrow=1, startcol=11, index=False)
writer.close()
The problem is that this way I overwrite the entire sheet.
Is there a way to only add the dataframe? It would be perfect if I could also keep the format of the destination cells.
Thanks
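One way to do this without touching the rest of the sheet is to skip to_excel and set the cell values with openpyxl directly. This is a sketch under assumptions, not a definitive implementation: write_frame_into_sheet is a hypothetical helper, and startrow/startcol are 0-based to mirror the pandas arguments used above.

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def write_frame_into_sheet(path, df, sheet_name, startrow, startcol):
    """Write df (header plus rows) cell by cell into an existing sheet."""
    # For an .xlsm file, load_workbook(path, keep_vba=True) keeps the macros
    wb = load_workbook(path)
    ws = wb[sheet_name]
    rows = dataframe_to_rows(df, index=False, header=True)
    for r, row in enumerate(rows, start=startrow + 1):
        for c, value in enumerate(row, start=startcol + 1):
            # Assigning .value leaves the cell's existing style in place
            ws.cell(row=r, column=c).value = value
    wb.save(path)
```

Cells outside the written block, including formulas, are saved back as they were loaded.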
I found a good solution for it. xlwings natively supports pandas DataFrames:
https://docs.xlwings.org/en/stable/datastructures.html#pandas-dataframes
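The ExcelWriter class (rather than to_excel itself) takes a mode parameter to write a new file (w) or append to an existing one (a). To write into a sheet that already exists, append mode also needs if_sheet_exists='overlay' with the openpyxl engine (available in pandas 1.4 and later, as far as I know). See the example below:
with pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name='Data', startrow=2, startcol=2)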