Importing Multiple Excel Files using OpenPyXL

Importing Multiple Excel Files using OpenPyXL - python

I am trying to read in multiple excel files and append the data from each file into one master file. Each file will have the same headers (So I can skip the import of the first row after the initial file).
I am pretty new to both Python and the OpenPyXL module. I am able to import the first workbook without problem. My problem comes in when I need to open the subsequent file and copy the data to paste into the original worksheet.
Here is my code so far:
# Creating blank workbook
from openpyxl import Workbook
wb = Workbook()
# grab active worksheet
ws = wb.active
# Read in excel data
from openpyxl import load_workbook
wb = load_workbook('first_file.xlsx') #explicitly loading workbook, will automate later
# grab active worksheet in current workbook
ws = wb.active
#get max columns and rows
sheet = wb.get_sheet_by_name('Sheet1')
print ("Rows: ", sheet.max_row) # for debugging purposes
print ("Columns: ", sheet.max_column) # for debugging purposes
last_data_point = ws.cell(row = sheet.max_row, column = sheet.max_column).coordinate
print ("Last data point in current worksheet:", last_data_point) #for debugging purposes
#import next file and add to master
append_point = ws.cell(row = sheet.max_row + 1, column = 1).coordinate
print ("Start new data at:", append_point)
wb = load_workbook('second_file.xlsx')
sheet2 = wb.get_sheet_by_name('Sheet1')
start = ws.cell(coordinate='A2').coordinate
print("New data start: ", start)
end = ws.cell(row = sheet2.max_row, column = sheet2.max_column).coordinate
print ("New data end: ", end)
# write a value to selected cell
#sheet[append_point] = 311
#print (ws.cell(append_point).value)
#save file
wb.save('master_file.xlsx')
Thanks!

I don't really understand your code. It looks too complicated. When copying between worksheets you probably want to use ws.rows.
wb1 = load_workbook('master.xlsx')
ws2 = wb1.active
for f in files:
wb2 = load_workbook(f)
ws2 = wb2['Sheet1']
for row in ws2.rows[1:]:
ws1.append((cell.value for cell in row))

Related

How to copy worksheet onto a different worksheet (not as additional worksheet)

I'm trying to copy a worksheet from one excel file, to a worksheet on different excel file. But the problem is it is adding it as a additional sheet and not pasting on the existing sheet and overwriting the sheet. I know that I am using Before=wb2.Worksheets(1) which adds the new sheet before the existing sheet, but what is the argument to paste onto the existing sheet instead?
import time, os.path, os
from win32com.client import Dispatch
path1 = 'C:\\example.xlsx'
path2 = 'C:\\Todolist2.xlsx'
path3 = 'C:\\example2.xlsx'
xl = Dispatch("Excel.Application")
xl.Visible = True
wb1= xl.Workbooks.Open(Filename=path1)
wb2= xl.Workbooks.Open(Filename=path2)
ws1 = wb1.Worksheets(1)
ws1.Copy(Before=wb2.Worksheets(1))

One way to do it is using openpyxl library. If the cell has style, we can copy it to new sheet also.
import openpyxl as xl
from copy import copy
path1 = 'C:\\example.xlsx'
path2 = 'C:\\Todolist2.xlsx'
wb1 = xl.load_workbook(filename=path1)
ws1 = wb1.worksheets[0]
wb2 = xl.load_workbook(filename=path2)
ws2 = wb2.worksheets[0]
for row in ws1:
for cell in row:
ws2[cell.coordinate].value = cell.value
if cell.has_style:
ws2[cell.coordinate].font = copy(cell.font)
ws2[cell.coordinate].border = copy(cell.border)
ws2[cell.coordinate].fill = copy(cell.fill)
ws2[cell.coordinate].number_format = copy(cell.number_format)
ws2[cell.coordinate].protection = copy(cell.protection)
ws2[cell.coordinate].alignment = copy(cell.alignment)
wb2.save(path2)
Then you will see your sheet2 is replaced by sheet1.

#tomcy
The code is below. What I am really trying to accomplish is to be able to keep rewriting data to Todolist2.xlsx. I would really want to have Todolist2.xlsx open in excel application in windows and have it update the sheet whenever there is new data. So far I have found two ways to do this. One is the code you are helping me with using openpyxl. Doing it this way I think I will have to write data to Todolist2 then open. Then with new data, it will have to close before writing data back in. Then reopen it again. Below is what I have so far. Using the 10 sleep to allow me the chance to update example.xlsx so as to simulate writing new data to Todolist2. It works the first go, but on the second, it gives me permission denied to Todolist2.
import openpyxl as xl
from copy import copy
import time
path1 = 'C:\\example.xlsx'
path2 = 'C:\\Todolist2.xlsx'
wb1 = xl.load_workbook(filename=path1)
ws1 = wb1.worksheets[0]
wb2 = xl.load_workbook(filename=path2)
ws2 = wb2.worksheets[0]
while True:
for row in ws1:
for cell in row:
ws2[cell.coordinate].value = cell.value
if cell.has_style:
ws2[cell.coordinate].font = copy(cell.font)
ws2[cell.coordinate].border = copy(cell.border)
ws2[cell.coordinate].fill = copy(cell.fill)
ws2[cell.coordinate].number_format =
copy(cell.number_format)
ws2[cell.coordinate].protection = copy(cell.protection)
ws2[cell.coordinate].alignment = copy(cell.alignment)
wb2.save(path2)
wb2.close()
time.sleep(10) #during this time I will modify example.xlsx and
#save, so on the next go around it rewrites the
#new data to Todolist1.xlsx
The second way I'm trying to solve this is with win32com. This allows me to keep Todolist2 open in excel in windows while it writes to it from example, example1, then example2. But the problem is, it does not write on the activesheet, it keeps adding additional sheets. So on this one, If I can find a way to keep rewriting over the active sheet in Todolist2 or after it adds the additional sheet, if I can only delete one sheet i'm golden.
import time, os.path, os
from win32com.client import Dispatch
path1 = 'C:\\example.xlsx'
path2 = 'C:\\Todolist2.xlsx'
path3 = 'C:\\example2.xlsx'
path4 = 'C:\\example3.xlsx'
xl = Dispatch("Excel.Application")
xl.Visible = True
wb1= xl.Workbooks.Open(Filename=path1)
wb2= xl.Workbooks.Open(Filename=path2)
ws1 = wb1.Worksheets(1)
ws1.Copy(Before=wb2.Worksheets(1))
time.sleep(5)
wb3= xl.Workbooks.Open(Filename=path3)
ws3 = wb3.Worksheets(1)
ws2 = wb2.Worksheets(3) #it seems by using (3) is the only way it
#allows me to delete one sheet before it
#adds another.
ws2.Delete()
ws3.Copy(Before=wb2.Worksheets(1))
time.sleep(5)
wb4= xl.Workbooks.Open(Filename=path4)
ws4 = wb4.Worksheets(1)
ws2.Delete() #I got into trouble here, and throws error even
#though it does the delete and copy
ws4.Copy(Before=wb2.Worksheets(1))

How to copy a worksheet from one excel file to another?
from openpyxl import load_workbook
from copy import copy
def copySheet(target, source):
for (row, col), source_cell in source._cells.items():
target_cell = target.cell(column=col, row=row)
target_cell._value = source_cell._value
target_cell.data_type = source_cell.data_type
if source_cell.has_style:
target_cell.font = copy(source_cell.font)
target_cell.border = copy(source_cell.border)
target_cell.fill = copy(source_cell.fill)
target_cell.number_format = copy(source_cell.number_format)
target_cell.protection = copy(source_cell.protection)
target_cell.alignment = copy(source_cell.alignment)
if source_cell.hyperlink:
target_cell._hyperlink = copy(source_cell.hyperlink)
if source_cell.comment:
target_cell.comment = copy(source_cell.comment)
for attr in ('row_dimensions', 'column_dimensions'):
src = getattr(source, attr)
trg = getattr(target, attr)
for key, dim in src.items():
trg[key] = copy(dim)
trg[key].worksheet = trg
target.sheet_format = copy(source.sheet_format)
target.sheet_properties = copy(source.sheet_properties)
target.merged_cells = copy(source.merged_cells)
target.page_margins = copy(source.page_margins)
target.page_setup = copy(source.page_setup)
target.print_options = copy(source.print_options)
"copy to"
wb1 = load_workbook(path_to)
target = wb1.create_sheet("lol")
"copy from"
wb2 = load_workbook(path_from)
source = wb2.active
copySheet(target=target, source=source)
wb1.save("fusion.xlsx")

How to paste to a specific column with python excel

I am wanting to copy and paste data from a csv to an excel so I can later filter that table. I have done all these steps in VBA but I've noticed that VBA can be buggy so am wanting to migrate to Python.
I have converted the csv to an excel and I have successfully copied the converted xlsx file to the excel document.
My question is, how do I copy and paste to a specific starting column. As I have other data I need to copy at cell AN1.
I have tried the below.. I am able to write to one specific cell but I am wanting to post the data...
for row in ws1:
for cell in row:
ws2['K1'].value
#ws2[cell.coordinate].value = cell.value
wb2.save(path2)
Entirety...
## csv to xlsx
from openpyxl import Workbook
import csv
wb = Workbook()
ws = wb.active
with open('C:/B.csv', 'r') as f:
for row in csv.reader(f):
ws.append(row)
wb.save('C:/B.xlsx')
###### COPY FROM B to existing E workbook
import openpyxl as xl
path1 = 'C:/B.xlsx'
path2 = 'C:/E.xlsx'
wb1 = xl.load_workbook(filename=path1)
ws1 = wb1.worksheets[0]
wb2 = xl.load_workbook(filename=path2)
ws2 = wb2.worksheets[0]
#ws2 = wb2.create_sheet(ws1.title)
#cell.value = ['A2']
for row in ws1:
for cell in row:
ws2.cell(row=1, column=1).value = cell.value
wb2.save(path2)

Copying columns between two different workbooks using openpyxl could be done as follows:
import openpyxl
wb1 = openpyxl.load_workbook('B.xlsx')
ws1 = wb1.active
wb2 = openpyxl.load_workbook('E.xlsx')
ws2 = wb2.active
for src, dst in zip(ws1['B:B'], ws2['AN:AN']):
dst.value = src.value
wb2.save('E.xlsx')
For a range of columns, the following would work:
import openpyxl
wb1 = openpyxl.load_workbook('B.xlsx')
ws1 = wb1.active
wb2 = openpyxl.load_workbook('E.xlsx')
ws2 = wb2.active
for src, dst in zip(ws1['A:I'], ws2['AN:AV']):
for cell_src, cell_dst in zip(src, dst):
cell_dst.value = cell_src.value
wb2.save('E.xlsx')

for row in range(1, ws1.max_row + 1):
#for cell in row:
ws1.column_dimensions.group('A', 'D', hidden=True)
sheet.cell(row=i + 2, column=k + 1).value = val
wb2.save(path2)
Should do it

Unfortunately the solutions provide were very much unacceptable as they did not work. VBA is also off the table. I am using openpyxl and the above created an error. Ideally I would like to copy to a new column, but that is beyond my skill. Instead use the below and use excel formulas to get the data where you want. I will have to spend about 4 hours redesigning my excel but worth it I suppose as I am unable to find a workaround.
## csv to xlsx
from openpyxl import Workbook
import csv
wb = Workbook()
ws = wb.active
with open('C/B.csv', 'r') as f:
for row in csv.reader(f):
ws.append(row)
wb.save('C:/B.xlsx')
###### COPY FROM B to existing E workbook
import openpyxl as xl
path1 = 'C:/B.xlsx'
path2 = 'C:/E.xlsx'
wb1 = xl.load_workbook(filename=path1)
ws1 = wb1.worksheets[0]
wb2 = xl.load_workbook(filename=path2)
ws2 = wb2.worksheets[0]
#ws2 = wb2.create_sheet(ws1.title)
#cell.value = ['A2']
for row in ws1:
for cell in row:
ws2[cell.coordinate].value = cell.value
wb2.save(path2)

Improving python code for updating google spreadsheets

First of all, I am new to python (practically I have learned only from Sololearn, that too only up to half course). So I request you to give me a little bit detailed answer.
My task has following broad steps:-
Delete old .xlsx file(if any)
Convert two .xls files into .xlsx file using win32, delete the first row and then delete .xls file [weird xls files already downloaded into source directory + xlrd,pyexcel show error (unsupported format or corrupt) file in opening .xls file (online analysis of file predicts it to be html/htm) ]
Get data from xlsx file
First, delete old worksheet on google spreadsheet to remove old data. Create a new worksheet with the same name. Insert data into new worksheet on the google spreadsheet.
Open second sheet(which imports data from the first sheet) and update one cell in Dummy Sheet to make sure google spreadsheet is synchronised in the background.
Now, I wrote a code by combining many codes and by using a lot of google.
The code is working fine but it takes on an avg about 65 seconds to complete the whole process.
My question has 3 parts:-
Is there any way to directly access data from .xls file?
Is there any way this code's performance can be improved.
Any other more efficient method for completing the above-said task?
My Code:-
import time
import win32com.client as win32
import os
import openpyxl
from openpyxl.utils import get_column_letter
import gspread
from oauth2client.service_account import ServiceAccountCredentials
start = time.time()
# set input-output file locations
source_dir = "C:\\Users\\XYZ\\Downloads"
output_dir = "C:\\Users\\XYZ\\Excels"
# use creds to create a client to interact with the Google Drive API
# make sure to share files with email contained in json file
scope = ['https://spreadsheets.google.com/feeds']
# code will not work without json file
creds = ServiceAccountCredentials.from_json_keyfile_name("C:\\Users\\XYZ\\your.json", scope)
gc = gspread.authorize(creds)
# following code is to open any spreadsheet by name
sh = gc.open("First Sheet")
def save_as_xlsx(input_file,output_dir,output_file_name) :
# call excel using win32, then open .xls file
# delete first row and then save as .xlsx
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(input_file)
wbk = excel.ActiveWorkbook
sheet = wbk.Sheets(1)
sheet.Rows(1).Delete()
wb.SaveAs(output_dir + '\\' + output_file_name, FileFormat = 51)
#FileFormat = 51 is for .xlsx extension. FileFormat = 56 is for .xls extension
wb.Close()
excel.Application.Quit()
return True
def get_the_data_from_xlsx(output_dir,output_file_name) :
# use openpyxl.load to find out last cell of file
# store cell values in list called data
wb = openpyxl.load_workbook(output_dir + '\\' + output_file_name)
sheet = wb.active
max_row_no = sheet.max_row
max_column_no = sheet.max_column
max_column = get_column_letter(max_column_no)
last_cell = str(max_column) + str(max_row_no)
cell_addresses = sheet['A1' : last_cell]
data = []
for i in cell_addresses :
for e in i :
data.append(e.value)
return (data,last_cell)
def insert_data_into_spreadsheet(name_of_worksheet,data,last_cell) :
# Find a workbook by name in already opened spreadsheet
# delete the worksheet to clear old data
# create worksheet with same name to maintain import connections in sheets.
worksheet = sh.worksheet(name_of_worksheet)
sh.del_worksheet(worksheet)
worksheet = sh.add_worksheet(title=name_of_worksheet, rows="500", cols="30")
# store range of cells for spreadsheet in list named cell_list
cell_list = worksheet.range('A1' + ':' + str(last_cell))
# attach all the values from data list as per the cell_list
a = 0
for cell in cell_list :
cell.value = data[a]
a = a + 1
# update all cells stored in cell_list in one go
worksheet.update_cells(cell_list)
def delete_file(directory,file_initials) :
for filename in os.listdir(directory) :
if filename.startswith(file_initials) :
os.unlink(directory +"\\" + filename)
# check if files are in source_dir
for filename in os.listdir(source_dir) :
# check for file1.xls and set input_file name if any file exists.
if filename.startswith("file1"):
input_file = source_dir + "\\file1.xls"
output_file1 = "output_file1.xlsx"
# detect and delete any old file in output directory
delete_file(output_dir,"output_file1")
if save_as_xlsx(input_file,output_dir,output_file1) == True :
# delete the file from source directory after work is done
delete_file(source_dir,'file1')
# get data from new xlsx file
data_from_xlsx = get_the_data_from_xlsx(output_dir,output_file1)
data_to_spreadsheet = data_from_xlsx[0]
last_cell = data_from_xlsx[1]
# insert updated data into spreadsheet
insert_data_into_spreadsheet("file1_data",data_to_spreadsheet,last_cell)
# repeat the same process for 2nd file
if filename.startswith('file2'):
input_file = source_dir + "\\file2.xls"
output_file2 = "output_file2.xlsx"
delete_file(output_dir,"output_file2")
if save_as_xlsx(input_file,output_dir,output_file2) == True :
delete_file(source_dir,'file2')
data_from_xlsx = get_the_data_from_xlsx(output_dir,output_file2)
data_to_spreadsheet = data_from_xlsx[0]
last_cell = data_from_xlsx[1]
insert_data_into_spreadsheet("file2_data",data_to_spreadsheet,last_cell)
# open spreadsheet by name and open Dummy worksheet
# update one cell to sync the sheet with other sheets
sh = gc.open("second sheet")
worksheet = sh.worksheet("Dummy")
worksheet.update_acell('B1', '=Today()')
end = time.time()
print(end-start)

Print Excel workbook using python

Suppose I have an excel file excel_file.xlsx and i want to send it to my printer using Python so I use:
import os
os.startfile('path/to/file','print')
My problem is that this only prints the first sheet of the excel workbook but i want all the sheets printed. Is there any way to print the entire workbook?
Also, I used Openpyxl to create the file, but it doesn't seem to have any option to select the number of sheets for printing.
Any help would be greatly appreciated.

from xlrd import open_workbook
from openpyxl.reader.excel import load_workbook
import os
import shutil
path_to_workbook = "/Users/username/path/sheet.xlsx"
worksheets_folder = "/Users/username/path/worksheets/"
workbook = open_workbook(path_to_workbook)
def main():
all_sheet_names = []
for s in workbook.sheets():
all_sheet_names.append(s.name)
for sheet in workbook.sheets():
if not os.path.exists("worksheets"):
os.makedirs("worksheets")
working_sheet = sheet.name
path_to_new_workbook = worksheets_folder + '{}.xlsx'.format(sheet.name)
shutil.copyfile(path_to_workbook, path_to_new_workbook)
nwb = load_workbook(path_to_new_workbook)
print "working_sheet = " + working_sheet
for name in all_sheet_names:
if name != working_sheet:
nwb.remove_sheet(nwb.get_sheet_by_name(name))
nwb.save(path_to_new_workbook)
ws_files = get_file_names(worksheets_folder, ".xlsx")
# Uncomment print command
for f in xrange(0, len(ws_files)):
path_to_file = worksheets_folder + ws_files[f]
# os.startfile(path_to_file, 'print')
print 'PRINT: ' + path_to_file
# remove worksheets folder
shutil.rmtree(worksheets_folder)
def get_file_names(folder, extension):
names = []
for file_name in os.listdir(folder):
if file_name.endswith(extension):
names.append(file_name)
return names
if __name__ == '__main__':
main()
probably not the best approach, but it should work.
As a workaround you can create separate .xlsx files where each has only one spreadsheet and then print them with os.startfile(path_to_file, 'print')

I have had this issue(on windows) and it was solved by using pywin32 module and this code block(in line 5 you can specify the sheets you want to print.)
import win32com.client
o = win32com.client.Dispatch('Excel.Application')
o.visible = True
wb = o.Workbooks.Open('/Users/1/Desktop/Sample.xlsx')
ws = wb.Worksheets([1 ,2 ,3])
ws.printout()

you could embed vBa on open() command to print the excel file to a default printer using xlsxwriter's utility mentioned in this article:
PBPYthon's Embed vBA in Excel

Turns out, the problem was with Microsoft Excel,
os.startfile just sends the file to the system's default app used to open those file types. I just had to change the default to another app (WPS Office in my case) and the problem was solved.

Seems like you should be able to just loop through and change which page is active. I tried this and it did print out every sheet, BUT for whatever reason on the first print it grouped together two sheets, so it gave me one duplicate page for each workbook.
wb = op.load_workbook(filepath)
for sheet in wb.sheetnames:
sel_sheet = wb[sheet]
# find the max row and max column in the sheet
max_row = sel_sheet.max_row
max_column = sel_sheet.max_column
# identify the sheets that have some data in them
if (max_row > 1) & (max_column > 1):
# Creating new file for each sheet
sheet_names = wb.sheetnames
wb.active = sheet_names.index(sheet)
wb.save(filepath)
os.startfile(filepath, "print")

How to stop openpyxl - python from clearing my excel file every time I re-run the program?

I wrote a simple program for testing with openpyxl where I simply open the .xlsx file, input data into a certain cell, then close the program and run it again, inputting data in a different cell, but when I open the .xlsx after running the program for the second.
My assumption is that openpyxl clears the entire .xlsx file everytime you open it again, is there a way to avoid this?
Here is my code:
from openpyxl import Workbook
wb = Workbook()
dest_filename = 'teste.xlsx'
ws = wb.active
ws.title = "2017"
Row = int(input('row: '))
Column = int(input('column: '))
data = input('data: ')
ws.cell(row = Row, column = Column).value = data
wb.save(filename = dest_filename)
Here is the .xlsx file after running the program for the first time
Here is the .xlsx file after running the program for the second time

You have not read the excel file at all:
Use this to read the existing workbook:
from openpyxl import Workbook,load_workbook
import os
dest_filename = 'teste.xlsx'
if os.path.isfile(dest_filename):
wb = load_workbook(filename = dest_filename)
else:
wb = Workbook()
ws = wb.active
ws.title = "2017"
Row = int(input('row: '))
Column = int(input('column: '))
data = input('data: ')
ws.cell(row = Row, column = Column).value = data
wb.save(filename = dest_filename)
Output:

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Importing Multiple Excel Files using OpenPyXL - python

Related

How to copy worksheet onto a different worksheet (not as additional worksheet)

How to paste to a specific column with python excel

Improving python code for updating google spreadsheets

Print Excel workbook using python

How to stop openpyxl - python from clearing my excel file every time I re-run the program?

Categories

Resources