I'm using to_excel to write multiple DataFrames to multiple Excel documents. This works fine, except that the DataFrame index is written in bold with a border around each cell (see image).
The following code is a simplification of the code I use but has the same problem:
import numpy as np
import pandas as pd
from openpyxl import load_workbook
df = pd.DataFrame(np.random.randint(50,60, size=(20, 3)))
xls_loc = r'test_doc.xlsx'
wb = load_workbook(xls_loc)
writer = pd.ExcelWriter(xls_loc, engine='openpyxl')
writer.book = wb
df.to_excel(writer, sheet_name='test sheet',index=True,startrow=1,startcol=1, header=False)
writer.save()
writer.close()
Is there a way to write the index without making it bold and adding borders?
Make the index a new column and then set index=False in to_excel()
df.insert(0, 'index', df.index)
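A minimal sketch of this idea, assuming openpyxl is installed. The temporary file path and the column label "index" are placeholders chosen here, not part of the original answer. Because the index travels as an ordinary column and header=False is passed, pandas has no index or header cells left to style, which can be checked by reading the fonts back with openpyxl.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame(np.random.randint(50, 60, size=(20, 3)))

# Move the index into a regular column so it is written like any other data.
out = df.copy()
out.insert(0, "index", out.index)

# Placeholder path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "plain_index.xlsx")
out.to_excel(path, sheet_name="test sheet", index=False, header=False,
             startrow=1, startcol=1, engine="openpyxl")

# Read the cells back: the former index now sits in column B, rows 2 to 21.
ws = load_workbook(path)["test sheet"]
index_cells = [ws.cell(row=r, column=2) for r in range(2, 22)]
any_bold = any(bool(c.font.bold) for c in index_cells)
```

The same pattern should work with an ExcelWriter object in place of the path if the data has to land in an existing workbook.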
You could insert the dataframe using xlwings to avoid formatting:
import pandas as pd
import xlwings as xw
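# Any DataFrame works here; pd._testing.makeDataFrame() is a private helper that may not exist in your pandas version.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
with xw.App(visible=False) as app:
    wb = xw.Book()
    wb.sheets[0]["A1"].value = df
    wb.save("test.xlsx")
    wb.close()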
import pandas as pd
data = [11,12,13,14,15]
df = pd.DataFrame(data)
wb = pd.ExcelWriter('FileName.xlsx', engine='xlsxwriter')
df.style.set_properties(**{'text-align': 'center'}).to_excel(wb, sheet_name='sheet_01',index=False,header=None)
wb.close()  # close() saves the file; save() was deprecated in pandas 1.5 and removed in 2.0
In the to_excel() method, index=False and header=None are the main trick.
You can find what I've tried so far below:
import pandas
from openpyxl import load_workbook
book = load_workbook('C:/Users/Abhijeet/Downloads/New Project/Masterfil.xlsx')
writer = pandas.ExcelWriter('C:/Users/Abhijeet/Downloads/New Project/Masterfiles.xlsx', engine='openpyxl',mode='a',if_sheet_exists='replace')
df.to_excel(writer,'b2b')
writer.save()
writer.close()
Generate Sample data
import pandas as pd
# DataFrame with columns Col1 and Col2
df = pd.DataFrame({'Col1': ['A', 'B', 'C', 'D'],
                   'Col2': [10, 0, 30, 50]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('sample.xlsx', engine='xlsxwriter')
# Write the dataframe to a sheet through the XlsxWriter engine.
df.to_excel(writer, sheet_name='Sheet1', index=False)
# Close the Pandas Excel writer and output the Excel file.
writer.close()
This code will add two columns, Col1 and Col2, with data to Sheet1 of sample.xlsx.
To append data to an existing Excel file
import pandas as pd
from openpyxl import load_workbook
# new dataframe with same columns
df = pd.DataFrame({'Col1': ['E','F','G','H'],
'Col2': [100,70,40,60]})
writer = pd.ExcelWriter('sample.xlsx', engine='openpyxl')
# try to open an existing workbook
writer.book = load_workbook('sample.xlsx')
# copy existing sheets
writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
# read existing file
reader = pd.read_excel(r'sample.xlsx')
# write out the new sheet
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
writer.close()
This code will append the new rows below the existing data in the Excel file.
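Newer pandas versions may reject assignments to writer.book and writer.sheets. A hedged sketch of the same append using the writer's own append mode follows. It assumes pandas 1.4 or later (for if_sheet_exists="overlay") and openpyxl; the temporary path and the names first, extra, and combined are placeholders introduced here.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")  # placeholder path

# Create the initial workbook: a header row plus four data rows.
first = pd.DataFrame({"Col1": ["A", "B", "C", "D"], "Col2": [10, 0, 30, 50]})
first.to_excel(path, sheet_name="Sheet1", index=False, engine="openpyxl")

# New rows with the same columns.
extra = pd.DataFrame({"Col1": ["E", "F", "G", "H"], "Col2": [100, 70, 40, 60]})

# mode="a" opens the existing file; if_sheet_exists="overlay" writes into the
# existing sheet instead of creating a new one.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    # max_row is the last used row (1-based), which equals the first free
    # row in the 0-based numbering that startrow expects.
    start = writer.sheets["Sheet1"].max_row
    extra.to_excel(writer, sheet_name="Sheet1", index=False, header=False,
                   startrow=start)

combined = pd.read_excel(path, sheet_name="Sheet1")
```

Taking the start row from the worksheet itself avoids a separate read_excel call just to count rows.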
Check these as well
how to append data using openpyxl python to excel file from a specified row?
Suppose you have an Excel file abc.xlsx and a DataFrame to append to it, called df1.
1. Read the file using pandas
import pandas as pd
df = pd.read_excel("abc.xlsx")
2. Concatenate the two DataFrames and write the result to 'abc.xlsx'
finaldf = pd.concat([df, df1], ignore_index=True)
# write finaldf to abc.xlsx and you are done
finaldf.to_excel("abc.xlsx", index=False)
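The read, concatenate, and write steps can be sketched end to end as follows. This is only an illustration: the temporary path and the sample data are made up here, pd.read_excel is used because the file is an .xlsx workbook, and pd.concat is given a list of DataFrames.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "abc.xlsx")  # placeholder path

# Stand-in for the existing file.
pd.DataFrame({"Col1": ["A", "B"], "Col2": [1, 2]}).to_excel(path, index=False)

df1 = pd.DataFrame({"Col1": ["C"], "Col2": [3]})  # rows to append

# Read the current contents, concatenate, and write the whole table back.
df = pd.read_excel(path)
finaldf = pd.concat([df, df1], ignore_index=True)
finaldf.to_excel(path, index=False)

result = pd.read_excel(path)
```

Note that this rewrites the file from scratch, so any other sheets or cell formatting in the original workbook are not carried over.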
I am trying to use this code to append a dataframe to an existing sheet in Excel, but instead of appending the new data to it, it creates a new sheet. Here is the code:
import pandas as pd
import openpyxl as op
df = ['normal_dataframe']
with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet1', header=False, index=False)
'test.xlsx' already has a 'Sheet1', but after appending there are two sheets: 'Sheet1' and 'Sheet11'.
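One way to avoid the extra 'Sheet11' is to bypass ExcelWriter and append the rows to the existing worksheet with openpyxl directly. The sketch below assumes openpyxl is installed; the temporary path and sample values are placeholders, and dataframe_to_rows comes from openpyxl.utils.dataframe.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

path = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # placeholder path

# Stand-in for the existing workbook that already has a 'Sheet1'.
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["a", "b"])
ws.append([1, 2])
wb.save(path)

df = pd.DataFrame({"a": [3, 5], "b": [4, 6]})  # rows to append

# Reopen the file and append each DataFrame row below the existing data.
wb = load_workbook(path)
ws = wb["Sheet1"]
for row in dataframe_to_rows(df, index=False, header=False):
    ws.append(row)
wb.save(path)

# Read back the sheet names and cell values to confirm the result.
check = load_workbook(path)
sheet_names = check.sheetnames
rows = [list(r) for r in check["Sheet1"].iter_rows(values_only=True)]
```

Since the workbook is loaded and saved by openpyxl, the other sheets stay in place and no second sheet is created.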
One approach with COM:
import win32com.client
xl = win32com.client.Dispatch("Excel.Application")
path = r'c:\Users\Alex20\Documents\test.xlsx'
wb = xl.Workbooks.Open(path)
ws = wb.Worksheets("Sheet1")
ws.Range("E9:F10").Value = [[9,9],[10,10]]
wb.Close(True)
xl.Quit()
I am trying to write two Pandas dataframes to two different worksheets within the same workbook.
I am using openpyxl 3.0.7 and Pandas 1.2.3.
My workbook's name is 'test.xlsx', and there are two tabs inside: 'Tab1' and 'Tab2'.
Here is the code I am using:
import pandas as pd
from openpyxl import load_workbook
def export(df1, df2):
    excelBook = load_workbook('test.xlsx')
    with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
        writer.book = excelBook
        writer.sheets = dict((ws.title, ws) for ws in excelBook.worksheets)
        df1.to_excel(writer, sheet_name = 'Tab1', index = False)
        df2.to_excel(writer, sheet_name = 'Tab2', index = False)
        writer.save()
df1 = pd.DataFrame(data = [1,2,3], columns = ['Numbers1'])
df2 = pd.DataFrame(data = [4,5,6], columns = ['Numbers2'])
export(df1, df2)
When running the above code, it executes without error. However, when I go to open test.xlsx in Excel, I get a warning telling me that: "We found a problem with some content in 'test.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes."
When I click "Yes", Excel fixes the issue and my two dataframes are populated on their proper tabs. I can then save the file as a new filename, and the file is no longer corrupted.
Any help is much appreciated!
Try to use one engine to open/write at one time:
import pandas as pd
def export(df1, df2):
    with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
        df1.to_excel(writer, sheet_name = 'Tab1', index = False)
        df2.to_excel(writer, sheet_name = 'Tab2', index = False)
        # No writer.save() here: the with block saves the file on exit.
The solution to this question is to remove writer.save() from the script. In Pandas versions 1.1.5 and earlier, having this writer.save() did not cause file corruption. However, in versions 1.2.0 and later, this does cause file corruption. The official pandas docs do not show using writer.save after calling pd.ExcelWriter.
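A minimal sketch of the pattern without writer.save(), assuming openpyxl is installed. The temporary path is a placeholder, and the file is created fresh here rather than loaded from an existing test.xlsx.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "test.xlsx")  # placeholder path

df1 = pd.DataFrame(data=[1, 2, 3], columns=["Numbers1"])
df2 = pd.DataFrame(data=[4, 5, 6], columns=["Numbers2"])

# The with block saves and closes the file on exit, so there is no explicit
# writer.save() or writer.close() call.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    df1.to_excel(writer, sheet_name="Tab1", index=False)
    df2.to_excel(writer, sheet_name="Tab2", index=False)

# sheet_name=None returns a dict of DataFrames keyed by sheet name.
sheets = pd.read_excel(path, sheet_name=None)
```

Reading the file back with sheet_name=None is a quick way to confirm that both tabs were written.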
I've been trying to write a script that loads all Excel files from a specific location and copies the worksheets in these files into one workbook. I end up with an error:
AttributeError: 'DataFrame' object has no attribute 'DataFrame'.
I'm pretty new to this, so I would really appreciate any tips on how to make it work. I can only use openpyxl, because at the moment I cannot install the xlrd module on my workstation.
from pandas import ExcelWriter
import glob
import pandas as pd
import openpyxl
writer = ExcelWriter("output.xlsx")
for filename in glob.glob(r"C:\path\*.xlsx"):
    wb = openpyxl.load_workbook(filename)
    for ws in wb.sheetnames:
        ws = wb[ws]
        print(ws)
        data = ws.values
        columns = next(data)[0:]
        df = pd.DataFrame(data, columns=columns)
        print(df)
        for df in df.DataFrame:
            df.to_excel(writer, sheet_name=ws)
writer.save()
First, sheet_name must be a string, not a worksheet object. Second, the last for loop is not needed, since we already loop through the sheet names.
from pandas import ExcelWriter
import glob
import pandas as pd
import openpyxl
writer = ExcelWriter("output.xlsx")
for filename in glob.glob(r"C:\path\*.xlsx"):
    wb = openpyxl.load_workbook(filename)
    for ws in wb.sheetnames:
        ws1 = wb[ws]
        data = ws1.values
        columns = next(data)[0:]
        df = pd.DataFrame(data, columns=columns)
        # sheet_name is the sheet's name (a string), not the worksheet object
        df.to_excel(writer, sheet_name=ws, index=False)
writer.close()  # close() saves the file; save() was removed in pandas 2.0
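As a variation, pandas can read every sheet of a workbook in one call through the openpyxl engine, so xlrd is not required. The sketch below is an assumption-laden illustration: the temporary folder, the two stand-in workbooks, and the file-name prefix for sheet names are all invented here. The prefix keeps sheets with the same name in different files apart, and names are truncated to Excel's 31-character limit, which could still collide for long names.

```python
import glob
import os
import tempfile

import pandas as pd

folder = tempfile.mkdtemp()  # placeholder for the real input folder

# Two small stand-in workbooks that both contain a sheet called "Data".
for stem, value in [("one", 1), ("two", 2)]:
    pd.DataFrame({"x": [value]}).to_excel(
        os.path.join(folder, f"{stem}.xlsx"), sheet_name="Data", index=False)

# Keep the output outside the input folder so the glob does not pick it up.
output = os.path.join(folder, "combined", "output.xlsx")
os.makedirs(os.path.dirname(output))

with pd.ExcelWriter(output, engine="openpyxl") as writer:
    for filename in sorted(glob.glob(os.path.join(folder, "*.xlsx"))):
        stem = os.path.splitext(os.path.basename(filename))[0]
        # sheet_name=None loads every sheet into a dict of DataFrames.
        all_sheets = pd.read_excel(filename, sheet_name=None, engine="openpyxl")
        for name, df in all_sheets.items():
            # Prefix with the file name and truncate to 31 characters.
            df.to_excel(writer, sheet_name=f"{stem}_{name}"[:31], index=False)

combined_names = list(pd.read_excel(output, sheet_name=None))
```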
I have an xlsx file with multiple tabs, one of them being Town_names that already has some data in it.
I'd like to overwrite that data with a dataframe - Town_namesDF - while keeping the rest of the xlsx tabs intact.
I've tried the following:
with pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl', mode='a') as writer:
    Town_namesDF.to_excel(writer,sheet_name='Town_names')
writer.save()
writer.close()
But it ends up creating a new tab Town_names1 instead of overwriting the Town_names tab. Am I missing something? Thanks.
There is no direct option for overwriting a sheet (unlike Julia's XLSX, which has a cell_ref option), so simply delete the existing sheet if it exists and then write.
with pd.ExcelWriter('/path/to/file.xlsx', engine="openpyxl", mode='a') as writer:
    workBook = writer.book
    try:
        workBook.remove(workBook['Town_names'])
    except KeyError:
        print("worksheet doesn't exist")
    finally:
        df.to_excel(writer, sheet_name='Town_names')
# The with block saves the file on exit, so writer.save() is not needed.
You could try this to store all of the other sheets temporarily and then add them back. I don't think this would save any formulas or formatting though.
Store_sheet1 = pd.read_excel('path/to/file.xlsx', sheet_name='Sheet1')
Store_sheet2 = pd.read_excel('path/to/file.xlsx', sheet_name='Sheet2')
Store_sheet3 = pd.read_excel('path/to/file.xlsx', sheet_name='Sheet3')
# Rewrite the whole file (default mode='w') and add every sheet back.
with pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl') as writer:
    Town_namesDF.to_excel(writer, sheet_name='Town_names')
    Store_sheet1.to_excel(writer, sheet_name='Sheet1')
    Store_sheet2.to_excel(writer, sheet_name='Sheet2')
    Store_sheet3.to_excel(writer, sheet_name='Sheet3')
Well, I've managed to do this. This is not a clean solution and not fast at all, but I've made use of openpyxl documentation for working with pandas found here: https://openpyxl.readthedocs.io/en/latest/pandas.html
I'm effectively selecting the Town_names sheet, clearing it with ws.delete_rows() and then appending each row of my dataframe to the sheet.
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
wb = openpyxl.load_workbook(r'path/to/file.xlsx')
ws = wb['Town_names']  # get_sheet_by_name() is deprecated
ws.delete_rows(1, ws.max_row)  # rows are 1-indexed; remove every used row
wb.save(r'path/to/file.xlsx')
wb = openpyxl.load_workbook(r'path/to/file.xlsx')
activeSheet = wb['Town_names']
for r in dataframe_to_rows(Town_namesDF, index=False, header=True):
    activeSheet.append(r)
for cell in activeSheet['A'] + activeSheet[1]:
    cell.style = 'Pandas'
wb.save(r'path/to/file.xlsx')
A bit messy and I hope there's a better solution than mine, but this worked for me.
Since pandas version 1.3.0 there is a new ExcelWriter parameter, if_sheet_exists, which accepts 'error', 'new' or 'replace' (and, from 1.4.0, 'overlay'):
pd.ExcelWriter(r'path/to/file.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace')
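A short sketch of how this parameter might be used for the Town_names case, assuming pandas 1.3.0 or later with openpyxl. The temporary path, the stand-in workbook, and the sample town values are placeholders invented for illustration.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "file.xlsx")  # placeholder path

# Stand-in workbook with two tabs.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"town": ["Old A", "Old B", "Old C"]}).to_excel(
        writer, sheet_name="Town_names", index=False)
    pd.DataFrame({"n": [1, 2]}).to_excel(writer, sheet_name="Other", index=False)

Town_namesDF = pd.DataFrame({"town": ["New A", "New B"]})

# mode="a" keeps the other tabs; if_sheet_exists="replace" swaps out the
# existing Town_names sheet instead of adding Town_names1.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    Town_namesDF.to_excel(writer, sheet_name="Town_names", index=False)

sheets = pd.read_excel(path, sheet_name=None)
```

The replaced sheet is written from scratch, so any formatting that was on the old Town_names tab is not kept.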
Hi you could use xlwings for that task. Here is an example.
import xlwings as xw
import pandas as pd
filename = "test.xlsx"
df = pd.read_excel(filename, "Town_names")
# Do your modifications of the worksheet here. For example, the following line "df * 2".
df = df * 2
app = xw.App(visible=False)
wb = xw.Book(filename)
ws = wb.sheets["Town_names"]
ws.clear()
ws["A1"].options(pd.DataFrame, header=1, index=False, expand='table').value = df
# If formatting of column names and index is needed as xlsxwriter does it, the following lines will do it.
ws["A1"].expand("right").api.Font.Bold = True
ws["A1"].expand("down").api.Font.Bold = True
ws["A1"].expand("right").api.Borders.Weight = 2
ws["A1"].expand("down").api.Borders.Weight = 2
wb.save(filename)
app.quit()