Insert Pandas Dataframe in Excel sheet between preexisting data - python

My current project deals with writing data from one Excel file into a specific format chosen by the user. The format is saved in a folder as an Excel file in which the headers and some other text (which always stay the same) are already present, so the only thing left to do is fill the file with data.
For this I would like to "simply" insert my pandas dataframe at a certain row, so that neither the header nor the footer is overwritten.
Here is an example format:
And this is how I want the result to look:
I already managed to write the data to the file below the header row, but it overwrites the footer. This is the code that does exactly that:
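import shutil
import pandas as pd
from openpyxl import load_workbook
fileName = saveFolder + "test.xlsx"
shutil.copyfile(format_path, fileName)  # work on a copy of the template
book = load_workbook(fileName)
writer = pd.ExcelWriter(fileName, engine='openpyxl')
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}
# Writes df starting below the header row, overwriting whatever is already there (including the footer)
df.to_excel(writer, sheet_name="Tabelle1", startrow=1)
writer.close()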
If this is not possible the only workaround I can think of is to read the format used, save it in python, write the header in the given format (background colour, fontsize,...), then the data, and then the footer.
However, if I remember correctly when reading text python will not remember which words are written in bold, and which words are normal. If someone, however, knows how to do this, I would also very much appreciate comments that try to solve my issue in that direction.
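On the question of whether formatting survives a read: as far as cell-level formatting goes, openpyxl does expose it on each cell after loading. The following is only a minimal sketch of that idea; the helper name bold_flags and the tiny in-memory workbook are made up for illustration, and it looks at whole-cell fonts only, not at individual bold words inside one cell.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font


def bold_flags(xlsx_bytes, cell_refs):
    """Return {cell reference: True/False} telling whether each cell's font is bold."""
    wb = load_workbook(BytesIO(xlsx_bytes))
    ws = wb.active
    return {ref: bool(ws[ref].font.bold) for ref in cell_refs}


# Build a small stand-in for a template: one bold header cell, one plain cell.
template = Workbook()
sheet = template.active
sheet["A1"] = "Header"
sheet["A1"].font = Font(bold=True)
sheet["A2"] = "plain text"

buffer = BytesIO()
template.save(buffer)

flags = bold_flags(buffer.getvalue(), ["A1", "A2"])
print(flags)
```

Because the font is stored on the cell object, writing new values into other cells with openpyxl leaves it untouched.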

To preserve the existing formatting, you need to insert the data into specific cells, which openpyxl lets you do. to_excel() will overwrite the worksheet you are trying to add the data to. There are some gaps in the question, but I will try to answer it as best I can. The code below will:
Assume a dataframe with a few rows of data already exists
Open the template file (like the one in the screenshot shared)
Add that dataframe to the file (insert rows and write the data without its header)
Save it as a new file
I am assuming there is just one sheet in the template and you are writing to that. You can use for loops to handle more sheets or more files.
Note that new rows need to be inserted for the data so that the footer moves down and does NOT get overwritten. The format, color, etc. of the template will remain intact.
dataframe df (including header)
Name Age Nationality
ABC 12 US
DEF 111 UK
GHI 22 India
JKL 49 Japan
Code
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
file = 'inputfile.xlsx'  ## Your template file
wb = openpyxl.load_workbook(filename=file)
ws = wb.active  ## Or use ws = wb['Sheet1'] to pick a specific sheet
ws.insert_rows(idx=2, amount=len(df))  ## Insert as many rows as in df (4 in our case) after row 1
rows = dataframe_to_rows(df, index=False, header=False)  ## Data rows only, no header
for r_idx, row in enumerate(rows, 2):  ## Start writing at row 2
    for c_idx, value in enumerate(row, 1):
        ws.cell(row=r_idx, column=c_idx, value=value)  ## Add the data
wb.save('NewFile.xlsx')  ## Your output file
Template (inputfile.xlsx)
Output (Newfile.xlsx)
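For anyone who wants to try the approach without a template file at hand, here is a self-contained sketch of the same steps. The in-memory "template" (one header row and one footer row) and the helper name insert_dataframe are assumptions made for the demo. Also note that, per the openpyxl documentation, inserting rows does not adjust dependent objects such as formulae, tables or charts, so a template that relies on those may need extra care.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def insert_dataframe(ws, df, first_row):
    """Insert len(df) empty rows at first_row, then write df's values (no header) into them."""
    ws.insert_rows(idx=first_row, amount=len(df))
    rows = dataframe_to_rows(df, index=False, header=False)
    for r_idx, row in enumerate(rows, first_row):
        for c_idx, value in enumerate(row, 1):
            ws.cell(row=r_idx, column=c_idx, value=value)


# Stand-in for the template: header in row 1, footer in row 2.
wb = Workbook()
ws = wb.active
ws.append(["Name", "Age", "Nationality"])
ws.append(["Footer text"])

df = pd.DataFrame(
    {"Name": ["ABC", "DEF"], "Age": [12, 111], "Nationality": ["US", "UK"]}
)
insert_dataframe(ws, df, first_row=2)

# The footer has moved down by len(df) rows instead of being overwritten.
first_column = [ws.cell(row=r, column=1).value for r in range(1, 5)]
print(first_column)
```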

Related

How to modify xml tables inside excel with openpyxl?

I'm trying to fill a table (inside an xlsx template for Dynamics NAV) with openpyxl, but when I open the file with Excel it prompts an alert: “We found a problem with some content in <Excel_filename>. Do you want us to try recovering the file as much as we can? If you trust the source of this workbook, then click Yes”
Then Excel 'repairs' the file and I can still see the data, but the table /xl/tables/table1.xml is gone, and Navision won't accept the file.
This is my code in python:
import openpyxl
wb = openpyxl.load_workbook("data_source.xlsx", data_only=True)
sheet1 = wb.active
wb2 = openpyxl.load_workbook('template.xlsx')
sheet2 = wb2.active
filas = sheet1.max_row
for fila in range(3, filas + 1):  # + 1 so that the last row is copied too
    sheet2["A" + str(fila)] = sheet1["A" + str(fila)].value
    sheet2["B" + str(fila)] = sheet1["B" + str(fila)].value
    sheet2["C" + str(fila)] = "FRA"
    sheet2["D" + str(fila)] = "NAC"
wb2.save('tax1.xlsx')
wb2.close()
When I create a table from zero with the code they show in the openpyxl official site:
https://openpyxl.readthedocs.io/en/latest/worksheet_tables.html#creating-a-table
it works fine only if the table starts from row one (ref="A1:E5").
...but this template has a table that starts from row 3!
So when I try to make the table I need (ref="A3:D6") I get this: 'UserWarning: File may not be readable: column headings must be strings.' and as expected, I get the same alert and the same result when I open it with Excel.
Is there a way to modify/fill a table without corrupting the xlsx file?
or, like a workaround
Is there a way to create a table from A3 with no errors?
Thanks in advance
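The warning text suggests that openpyxl expects the first row of the table's ref to hold string headings, so one thing worth checking is whether row 3 already contains them when the table is added. The sketch below is only an illustration under that assumption: the headings, the sample records and the table name Table1 are invented, and it assumes a reasonably recent openpyxl where ws.tables behaves like a dictionary.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook
from openpyxl.worksheet.table import Table, TableStyleInfo

wb = Workbook()
ws = wb.active
ws["A1"] = "Free-form text above the table"  # rows 1-2 are not part of the table

# The table's header row is row 3; every heading is a non-empty string.
for col, heading in enumerate(["Code", "Name", "Country", "Type"], start=1):
    ws.cell(row=3, column=col, value=heading)

# Data rows 4-6.
records = [(1, "Alpha", "FRA", "NAC"), (2, "Beta", "FRA", "NAC"), (3, "Gamma", "FRA", "NAC")]
for r, record in enumerate(records, start=4):
    for c, value in enumerate(record, start=1):
        ws.cell(row=r, column=c, value=value)

table = Table(displayName="Table1", ref="A3:D6")
table.tableStyleInfo = TableStyleInfo(name="TableStyleMedium9", showRowStripes=True)
ws.add_table(table)

buffer = BytesIO()
wb.save(buffer)

# Reload to confirm the table definition survived the round trip.
reloaded = load_workbook(BytesIO(buffer.getvalue())).active
table_ref = reloaded.tables["Table1"].ref
print(table_ref)
```

Whether this also resolves the Dynamics NAV template case is not something the sketch can confirm; it only shows a table that starts at A3.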

How can I use xlsxwriter and Excelwriter together to populate an excel file which contains some formatted text as well as python exported DataFrame?

I am writing a program in Python that works on a CSV dataset. The aim of the code is to export the aggregated output to an Excel file, along with hardcoded header information. The Excel file is the final report that is sent to the client, so it has to be complete (as shown in the image below).
This is how I began:
I first created a typical report header using xlsxwriter and then tried to export the summary DataFrame (i.e. the main table output) below the header using ExcelWriter and DataFrame.to_excel.
But as soon as I write the DataFrame into the Excel template created in the step above, the header I created is wiped out and its cells appear blank. Only the DataFrame (table output) is displayed.
Alternatively, if I first export the DataFrame to Excel and then try to add a header to the report, the header remains but the DataFrame is gone.
What should I do to retain both the DataFrame (table output) and the header information? I used xlsxwriter and the pandas ExcelWriter.
Below are a few lines of code that might help explain the problem.
writer = pd.ExcelWriter('SampleReport_1.xlsx', engine='xlsxwriter')
workbook = writer.book
WS_I = workbook.add_worksheet('I')
with ExcelWriter('SampleReport_1.xlsx') as writer:
    dfI.to_excel(writer, sheet_name='I', header=True, index=True, startrow=9, startcol=0, engine=xlsxwriter, merge_cells=True)
    writer.save()
cell_format1 = workbook.add_format()
cell_format1.set_bold()
for worksheet in workbook.worksheets():
    worksheet.write('A1', 'Exhibit', cell_format1)
workbook.close()
I think I now get what the problem is. I tried the following approach on a sample Excel sheet with a preexisting header and it worked:
First, save your dataframe to csv
df.to_csv(r'C:\Users\[your path]\SampleReport_1.csv', index=False, header=True)
Then open the Excel sheet with the preformatted header and (depending on your version of Excel) import the csv file by going to "data/get external data/from text". Importantly, in the last import step, Excel will ask you for the cell into which the csv is to be imported. Make sure that cell is below the preformatted block (I think it's cell A10, in your case).
Let me know if that works.
I found the solution below:
import pandas as pd
writer = pd.ExcelWriter('SampleReport_1.xlsx', engine='xlsxwriter')
workbook = writer.book
WS_I = workbook.add_worksheet('I')
cell_format1 = workbook.add_format()
cell_format1.set_bold()
for worksheet in workbook.worksheets():
    worksheet.write('A1', 'Exhibit', cell_format1)
# Add the table before closing: nothing written after close() reaches the file
WS_I.add_table('A3:G6', {'data': dfI_new.values.tolist(), 'header_row': True})
workbook.close()
Create a 2nd excel file with formatted template and link each cell to the 1st excel file. As long as you enable the link, changes to your python output should populate in the "template" excel file.
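Another way to keep both the header and the table is to write them through one and the same writer, so that neither write replaces the file produced by the other. This is only a sketch under assumptions: the sample DataFrame is invented, the file is written to an in-memory buffer instead of SampleReport_1.xlsx, and openpyxl is used at the end purely to check the result.

```python
from io import BytesIO

import pandas as pd
from openpyxl import load_workbook

# Hypothetical summary table standing in for dfI.
summary = pd.DataFrame({"region": ["North", "South"], "sales": [10, 20]})

buffer = BytesIO()
with pd.ExcelWriter(buffer, engine="xlsxwriter") as writer:
    # startrow is zero-based, so the table header lands in Excel row 10.
    summary.to_excel(writer, sheet_name="I", startrow=9, index=False)

    # Reuse the workbook and worksheet that pandas just created.
    workbook = writer.book
    worksheet = writer.sheets["I"]
    bold = workbook.add_format({"bold": True})
    worksheet.write("A1", "Exhibit", bold)

# Read the result back to confirm both pieces are present.
check = load_workbook(BytesIO(buffer.getvalue()))["I"]
cells = (check["A1"].value, check["A10"].value, check["A11"].value)
print(cells)
```

The key point is that the sheet is created once by to_excel and then fetched from writer.sheets, instead of being added a second time or written by a second writer.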

Copying several columns from a csv file to an existing xls file using Python

I'm pretty new to Python and am having some difficulty getting started on this. I am using Python 3.
I've googled and found quite a few Python modules that help with this, but was hoping for a more definite answer here. Basically, I need to read certain columns from a csv file, i.e. G, H, I, K, and M. The ones I need aren't consecutive.
I need to read those columns from the csv file and transfer them to empty columns in an existing xls file that already has data in it.
I looked into openpyxl, but it doesn't seem to work with csv/xls files, only xlsx.
Can I use the xlwt module to do this?
Any guidance on which module may work best for my use case would be greatly appreciated. Meanwhile, I'm going to tinker around with xlwt/xlrd.
I recommend using pandas. It has convenient functions to read and write csv and xls files.
import pandas as pd
from openpyxl import load_workbook
#read the csv file
df_1 = pd.read_csv('c:/test/test.csv')
#lets say df_1 has columns colA and colB
print(df_1)
#read the xls(x) file
df_2=pd.read_excel('c:/test/test.xlsx')
#lets say df_2 has columns aa and bb
#now add a column from df_1 to df_2
df_2['colA']=df_1['colA']
#save the combined output
writer = pd.ExcelWriter('c:/test/combined.xlsx')
df_2.to_excel(writer)
writer.close() #close() saves the file; writer.save() was removed in pandas 2.0
#alternatively, if you want to add just one column to an existing xlsx file:
#i.e. get colA from df_1 into a new dataframe
df_3=pd.DataFrame(df_1['colA'])
#create writer using openpyxl engine
writer = pd.ExcelWriter('c:/test/combined.xlsx', engine='openpyxl')
#workaround to reuse the existing sheets; assigning writer.book only works on older pandas (before 2.0)
book = load_workbook('c:/test/combined.xlsx')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
column_to_write=16 #this would go to column Q (zero based index)
writeRowIndex=False #don't write the row index
sheetName='Sheet1' #which sheet to write on
#now write the single column df_3 to the file
df_3.to_excel(writer, sheet_name=sheetName, columns=['colA'], startcol=column_to_write, index=writeRowIndex)
writer.close()
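On recent pandas versions the same "add one column to an existing sheet" step can probably be done without touching writer.book, by opening the writer in append mode. The sketch below assumes pandas 1.4 or later (where if_sheet_exists="overlay" is available) with openpyxl installed; the temporary file and the sample data are invented stand-ins for combined.xlsx and the csv column.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "combined.xlsx")

    # Stand-in for the existing workbook with columns aa and bb.
    existing = pd.DataFrame({"aa": [1, 2], "bb": [3, 4]})
    existing.to_excel(path, sheet_name="Sheet1", index=False)

    # Stand-in for the column taken from the csv file.
    new_col = pd.DataFrame({"colA": ["x", "y"]})

    # mode="a" keeps the existing workbook; "overlay" writes into the existing sheet.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="overlay") as writer:
        new_col.to_excel(writer, sheet_name="Sheet1", startcol=16, index=False)  # column Q

    ws = load_workbook(path)["Sheet1"]
    result = (ws["A1"].value, ws["A2"].value, ws["Q1"].value, ws["Q2"].value)

print(result)
```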
You could try XlsxWriter, which is a fully featured Python module for writing files in the Excel 2007+ XLSX format.
https://pypi.python.org/pypi/XlsxWriter

Exporting plain text header and image to Excel

I am fairly new to Python, but I'm getting stuck trying to pass an image file into a header during the DataFrame.to_excel() portion of my file.
Basically what I want is a picture in the first cell of the Excel table, followed by a couple of rows (5 to be exact) of text which will include a date (probably from datetime.date.today().ctime() if possible).
I already have the code to output the table portion as:
mydataframe.to_excel(my_path_name, sheet_name= my_sheet_name, index=False, startrow=7,startcol=0)
Is there a way to output the image and text portion directly from Python?
UPDATE:
For clarity, mydataframe is exporting the meat and potatoes of the worksheet (data rows and columns). I already have it starting on row 7 of the worksheet in Excel. The header portion is the trouble spot.
I found the solution and thanks for all of the help.
The simple answer is to use the xlsxwriter package as the engine. In other words assume that the image is saved at the path /image.png. Then the code to insert the data into the excel file with the image located at the top of the data would be:
# Importing packages and storing string for image file
import pandas as pd
import xlsxwriter
import numpy as np
image_file = '/image.png'
# Creating a fictitious data set since the actual data doesn't matter
dataframe = pd.DataFrame(np.random.rand(5,2),columns=['a','b'])
# Opening the xlsxwriter object to a path on the C:/ drive
writer = pd.ExcelWriter('C:/file.xlsx',engine='xlsxwriter')
dataframe.to_excel(writer,sheet_name = 'Arbitrary', startrow=3)
# Accessing the workbook / worksheet
workbook = writer.book
worksheet = writer.sheets['Arbitrary']
# Inserting the image into the workbook in cell A1
worksheet.insert_image('A1',image_file)
# Closing the writer saves the file to the specified path and filename
# (writer.save() was removed in pandas 2.0; close() does the saving)
writer.close()
And now I have an image on the top of my excel file. Huzzah!

xlsxwriter: is there a way to open an existing worksheet in my workbook?

I'm able to open my pre-existing workbook, but I don't see any way to open pre-existing worksheets within that workbook. Is there any way to do this?
You cannot append to an existing xlsx file with xlsxwriter.
There is a module called openpyxl which allows you to read and write to preexisting excel file, but I am sure that the method to do so involves reading from the excel file, storing all the information somehow (database or arrays), and then rewriting when you call workbook.close() which will then write all of the information to your xlsx file.
Similarly, you can use a method of your own to "append" to xlsx documents. I recently had to append to a xlsx file because I had a lot of different tests in which I had GPS data coming in to a main worksheet, and then I had to append a new sheet each time a test started as well. The only way I could get around this without openpyxl was to read the excel file with xlrd and then run through the rows and columns...
i.e.
cells = []
for row in range(sheet.nrows):
    cells.append([])
    for col in range(sheet.ncols):
        cells[row].append(sheet.cell(row, col).value)  # read from the sheet, not the workbook
You don't need arrays, though. For example, this works perfectly fine:
import xlrd
import xlsxwriter
from os.path import expanduser
home = expanduser("~")
# this writes test data to an excel file
wb = xlsxwriter.Workbook("{}/Desktop/test.xlsx".format(home))
sheet1 = wb.add_worksheet()
for row in range(10):
    for col in range(20):
        sheet1.write(row, col, "test ({}, {})".format(row, col))
wb.close()
# open the file for reading
wbRD = xlrd.open_workbook("{}/Desktop/test.xlsx".format(home))
sheets = wbRD.sheets()
# open the same file for writing (just don't write yet)
wb = xlsxwriter.Workbook("{}/Desktop/test.xlsx".format(home))
# run through the sheets and store sheets in workbook
# this still doesn't write to the file yet
for sheet in sheets: # write data from old file
    newSheet = wb.add_worksheet(sheet.name)
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            newSheet.write(row, col, sheet.cell(row, col).value)
for row in range(10, 20): # write NEW data
    for col in range(20):
        newSheet.write(row, col, "test ({}, {})".format(row, col))
wb.close() # THIS writes
However, I found that it was easier to read the data and store it in a 2-dimensional array, because I was manipulating the data and receiving input over and over again, and did not want to write to the excel file until the test was over (which you could just as easily do with xlsxwriter, since that is probably what it does anyway until you call .close()).
After searching a bit for a way to open an existing sheet in xlsx, I discovered
existingWorksheet = wb.get_worksheet_by_name('Your Worksheet name goes here...')
existingWorksheet.write_row(0,0,'xyz')
You can now append/write any data to the open worksheet.
You can use the workbook.get_worksheet_by_name() feature:
https://xlsxwriter.readthedocs.io/workbook.html#get_worksheet_by_name
According to https://xlsxwriter.readthedocs.io/changes.html the feature has been added on May 13, 2016.
"Release 0.8.7 - May 13 2016
-Fix for issue when inserting read-only images on Windows. Issue #352.
-Added get_worksheet_by_name() method to allow the retrieval of a worksheet from a workbook via its name.
-Fixed issue where internal file creation and modification dates were in the local timezone instead of UTC."
Although this is mentioned in the last two answers with its documentation link, and the documentation does suggest there are new methods for working with worksheets, I was not able to find these methods in the latest package, "xlsxwriter==3.0.3".
"xlrd" has since removed support for anything other than xls files.
Hence I worked it out with "openpyxl", which gives you the expected functionality mentioned in the first answer above.
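A minimal sketch of what that looks like with openpyxl: load the workbook, pick the existing worksheet by name, and write to it. The sheet name "Data" and the sample rows are made up, and an in-memory buffer stands in for the file on disk.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Stand-in for a pre-existing workbook with one sheet called "Data".
original = Workbook()
sheet = original.active
sheet.title = "Data"
sheet.append(["id", "value"])
sheet.append([1, "a"])
stored = BytesIO()
original.save(stored)

# Open the existing workbook and the existing worksheet by name.
stored.seek(0)
wb = load_workbook(stored)
ws = wb["Data"]

ws.append([2, "b"])   # adds a row after the last used row
ws["C1"] = "note"     # writes to one specific cell

updated = BytesIO()
wb.save(updated)

# Reload to confirm that old and new content are both present.
check = load_workbook(BytesIO(updated.getvalue()))["Data"]
values = [[cell.value for cell in row] for row in check.iter_rows()]
print(values)
```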
