I want to loop over multiple files, creating a new formula for each file as I go. To do that I want to concatenate a string with a variable, then use xlwings to create the formula. To that end I've written this code:
file_name=r'[Element_IA_Gross.xlsx]'
equation=r'=SUM(\''+file_name+'Total Immediates\'!$R6:$T6)'
ws.range("O19").value= [equation]
If I print the variable equation I see what I want in the Excel formula bar. However, the code above doesn't work. Any help much appreciated.
Thanks
The formula looks for the workbook in Excel's default location, so Excel opens a file search window and waits for input.
The following worked for me: open both workbooks, add the formula to 'Book1', save 'Book1', and then close both books.
import xlwings as xw
file_name = 'Element_IA_Gross.xlsx'
workbook = 'Book1.xlsx'
with xw.App() as app:
    wb1 = xw.Book(workbook)
    wb2 = xw.Book(file_name)
    ws = wb1.sheets('Sheet1')
    equation = "=SUM('[" + file_name + "]Total Immediates'!$R6:$T6)"
    ws["O19"].value = equation
    wb1.save(workbook)
    wb1.close()
    wb2.close()
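Since the goal in the question is one formula per file, the string concatenation can be factored into a small helper and called in a loop. This is only a minimal sketch: the name `external_sum_formula` and the second file name are hypothetical, and the helper just builds the formula text. Writing it to a cell with xlwings still needs the referenced workbooks to be open, as in the code above.

```python
def external_sum_formula(file_name, sheet_name, cell_range):
    """Build =SUM('[file]sheet'!range) for a workbook that is open in Excel."""
    # A single quote inside a quoted Excel reference is escaped by doubling it.
    quoted_sheet = sheet_name.replace("'", "''")
    return f"=SUM('[{file_name}]{quoted_sheet}'!{cell_range})"


# Hypothetical list of source workbooks to loop over
file_names = ["Element_IA_Gross.xlsx", "Element_IB_Gross.xlsx"]
formulas = [
    external_sum_formula(name, "Total Immediates", "$R6:$T6")
    for name in file_names
]
for formula in formulas:
    print(formula)
```

Each string in `formulas` can then be assigned to a cell's `.value` in the same way as `equation` above.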
-------------Option 2 ----------
This option also worked, with the whole path and filename included in the formula.
import os
import xlwings as xw
path = r'F:\Projects\xlwings_test'
file_name = 'Element_IA_Gross.xlsx'
# os.path.join (not os.join) builds the full path to the source workbook
link = os.path.join(path, file_name)
workbook = 'Book1.xlsx'
with xw.App() as app:
    wb1 = xw.Book(workbook)
    ws = wb1.sheets('Sheet1')
    # The leading "=" is needed so Excel treats the string as a formula
    equation = "=SUM('[" + link + "]Total Immediates'!$R6:$T6)"
    ws["O19"].value = equation
    wb1.save(workbook)
    wb1.close()
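Because Excel falls back to a file search window when it cannot find the workbook named in a formula, it may help to check the source paths before writing any formulas. The sketch below assumes the sources sit in one folder; `missing_sources` and the file names are hypothetical, and the demo uses a throwaway temporary folder rather than the real project directory.

```python
import os
import tempfile


def missing_sources(folder, file_names):
    """Return the names in file_names that do not exist inside folder."""
    return [
        name for name in file_names
        if not os.path.isfile(os.path.join(folder, name))
    ]


# Demo with a throwaway folder and hypothetical file names
demo_folder = tempfile.mkdtemp()
open(os.path.join(demo_folder, "Element_IA_Gross.xlsx"), "w").close()

missing = missing_sources(
    demo_folder, ["Element_IA_Gross.xlsx", "Element_IB_Gross.xlsx"]
)
print(missing)
```

Skipping or reporting the files in `missing` avoids a script that silently waits on a dialog.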
I need to copy cells such as hyperlinks from one Excel file to another. I can't find anything relating to this problem. I can copy cell values, but that's not what I need.
I tried to modify some examples of copying cells from one book to another, but without success.
To copy the value of each cell from a source workbook (in this example 'foo1.xlsx') to a new destination workbook, and have the destination cells link back to the source cells:
from openpyxl import load_workbook, Workbook
from openpyxl.worksheet.hyperlink import Hyperlink
source_path = "foo1.xlsx"
source_sheet = 'Sheet1'
source_wb = load_workbook(source_path)
source_ws = source_wb[source_sheet]
### Create a new workbook and worksheet to copy data to and rename the
### sheet to 'Sheet1'
destination_wb = Workbook()
destination_ws = destination_wb.active
destination_ws.title = 'Sheet1'
### Loop thru the rows and cells in the source sheet
for row in source_ws.iter_rows():
    for source_cell in row:
        cell_coord = source_cell.coordinate
        # Skipping empty cells.
        # Otherwise these cells in the destination workbook will be
        # filled with the source filename.
        if source_cell.value is None:
            continue
        ### Create hyperlink to source cell
        hyperlink = Hyperlink(target=source_path,
                              ref=cell_coord,
                              location=f'{source_sheet}!{cell_coord}')
        ### Copy source cell value to the destination sheet
        destination_ws.cell(source_cell.row, source_cell.column).value = source_cell.value
        ### Update destination cell with hyperlink to source cell
        destination_ws.cell(source_cell.row, source_cell.column).hyperlink = hyperlink
### Save new workbook specifying file name
destination_wb.save('foo2.xlsx')
###################################################
To put the full path to the source cell in the destination cell instead:
Rather than copying the source value and attaching a hyperlink to it, set the cell value to the link path. Replace the 8 lines starting with (and including)
### Create hyperlink to source cell
with
        ### Set full path to the original cell
        destination_ws.cell(source_cell.row, source_cell.column).value = \
            f'{source_path}#{source_sheet}!{cell_coord}'
moken's solution is more convenient and reliable.
This is the code to store a hyperlink to a cell in a different file:
import os
from openpyxl import load_workbook, Workbook
def main():
    input_workbook_path = r"c:\excel_books\input book.xlsx"
    output_workbook_path = r"c:\excel_books\output book.xlsx"
    input_wb = load_workbook(input_workbook_path)
    output_wb = Workbook()
    sheet_in = input_wb["Sheet1"]
    sheet_out = output_wb["Sheet"]
    cell_index = "B12"
    anchor = "CLICK HERE"
    # =HYPERLINK("[c:\excel_books\input book.xlsx]Sheet1!B12","CLICK HERE")
    external_cell_link = f'=HYPERLINK("[{input_workbook_path}]{sheet_in.title}!{cell_index}", "{anchor}")'
    sheet_out["A2"].value = external_cell_link
    output_wb.save(output_workbook_path)
if __name__ == '__main__':
    main()
This code gets the value of a cell in a different file:
import os
from openpyxl import load_workbook, utils, Workbook
def construct_link(workbook_absolute_path, sheet_name, cell_index):
    r"""
    Constructs the full path to the cell in the external
    book, e.g. - ='c:\excel_books\[input book.xlsx]Sheet1'!C1
    """
    # Adding square brackets around the filename in the path.
    # Before - c:\excel_books\input book.xlsx
    # After - c:\excel_books\[input book.xlsx]
    filename = os.path.basename(workbook_absolute_path)
    dirname = os.path.dirname(workbook_absolute_path)
    full_path = os.path.join(dirname, f"[{filename}]")
    return f"={utils.quote_sheetname(full_path + sheet_name)}!{cell_index}"
def main():
    input_workbook_path = r"c:\excel_books\input book.xlsx"
    output_workbook_path = r"c:\excel_books\output book.xlsx"
    input_wb = load_workbook(input_workbook_path)
    output_wb = Workbook()
    sheet_in = input_wb["Sheet1"]
    sheet_out = output_wb["Sheet"]
    external_cell_link = construct_link(
        input_workbook_path,
        sheet_in.title,
        "C1")
    sheet_out["A2"].value = external_cell_link
    output_wb.save(output_workbook_path)
if __name__ == '__main__':
    main()
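One caveat with construct_link: os.path splits on backslashes only when the script itself runs on Windows. A minimal sketch of the same idea that handles Windows-style paths on any platform follows, using the standard-library ntpath module (the Windows implementation of os.path). The name `construct_link_nt` is hypothetical, and the quoting is done by hand instead of with openpyxl's helper.

```python
import ntpath


def construct_link_nt(workbook_absolute_path, sheet_name, cell_index):
    """Build ='dir\\[file.xlsx]Sheet'!Cell from a Windows-style path."""
    dirname, filename = ntpath.split(workbook_absolute_path)
    # Wrap only the file name in square brackets, then append the sheet name
    reference = ntpath.join(dirname, f"[{filename}]") + sheet_name
    # A single quote inside a quoted Excel reference is escaped by doubling it
    reference = reference.replace("'", "''")
    return f"='{reference}'!{cell_index}"


link = construct_link_nt(r"c:\excel_books\input book.xlsx", "Sheet1", "C1")
print(link)
```

The printed string has the same shape as the example in the docstring above.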
This link might be helpful - Control when external references (links) are updated
I've been trying to edit an .xlsx worksheet using Python, and I can successfully alter a cell's value, but when I use the save command from openpyxl, close the program, and open the Excel spreadsheet, no changes have been saved. I have attached the code below and would appreciate your help. I have tried what other Stack Overflow posts suggest, but it still doesn't work, so I've turned to creating my first post here.
def editStock(choice, edit, stockSymbol):
    sheet = setup()
    stockRow = getStockRow(stockSymbol)
    if choice == 7 or choice == 3 or choice == 2 or choice == 1:
        print("Before")
        print(sheet.cell(row = stockRow, column = choice).value)
        sheet.cell(row = stockRow, column = choice).value = edit
        print("After")
        print(sheet.cell(row = stockRow, column = choice).value)
    else:
        sheet.cell(row = stockRow, column = choice).value = float(edit)
    workbook = getWorkBook()
    workbook.save(filename="Stocks.xlsx")
Here's my setup():
def setup():
    directory = "C:\\Users\\shrey\\Desktop"
    directory = directory.lower()
    os.chdir(directory)
    spreadname = "Stocks.xlsx"
    workbook = openpyxl.load_workbook(spreadname)
    sheet = workbook["Sheet1"]
    return sheet
Here's my getWorkBook() for reference:
def getWorkBook():
    directory = "C:\\Users\\shrey\\Desktop"
    directory = directory.lower()
    os.chdir(directory)
    spreadname = "Stocks.xlsx"
    workbook = openpyxl.load_workbook(spreadname)
    return workbook
Here's my output when I call editStock():
Before
None
After
Dec-21-2021
And proof that it doesn't work: date is not altered
Sorry, the image is not very clear but the Dec-21-2021 should be right after the 'TSLA'
You should probably make a separate test script and share the whole thing, because the methods you posted may be the ones that work correctly. People on Stack Overflow will typically ask for that; I couldn't just grab your code and run it, which should usually be possible.
I wrote this little script to see what the matter was, and for me it worked fine.
But I noticed that there were two sheets called Sheet1, so make sure you check each of the sheets (the tabs). Once I figured that out, the data showed up just fine.
This code works when I run it (including if the file already exists):
import os
import openpyxl
spreadname = "Stocks.xlsx"
sheetname = "THISONE"
if not os.path.exists(spreadname):
    workbook = openpyxl.Workbook()
    workbook.create_sheet(title=sheetname)
else:
    workbook = openpyxl.load_workbook(spreadname)
sheet = workbook[sheetname]
c1 = sheet.cell(row=3, column=6)
c1.value = 123.456
c2 = sheet['B9']
c2.value = 456.321
workbook.save(spreadname)
Specifically, it creates a sheet called "THISONE" and the data is there.
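One more thing worth checking in the question's code: setup() and getWorkBook() each call openpyxl.load_workbook, so the sheet that gets edited belongs to one in-memory workbook while a second, freshly loaded and unmodified workbook is the one that gets saved. A minimal sketch of loading once and saving the same object follows. The helper name `edit_cell`, the temporary path, and the demo data are hypothetical.

```python
import os
import tempfile

import openpyxl


def edit_cell(path, sheet_name, row, column, value):
    """Load the workbook once, edit a cell, and save that same object."""
    workbook = openpyxl.load_workbook(path)
    workbook[sheet_name].cell(row=row, column=column).value = value
    workbook.save(path)


# Demo on a throwaway file (hypothetical path and data)
demo_path = os.path.join(tempfile.mkdtemp(), "Stocks.xlsx")
wb = openpyxl.Workbook()
wb.active.title = "Sheet1"
wb["Sheet1"]["A2"] = "TSLA"
wb.save(demo_path)

edit_cell(demo_path, "Sheet1", row=2, column=7, value="Dec-21-2021")

# Reload from disk to confirm the edit was actually written
saved_value = openpyxl.load_workbook(demo_path)["Sheet1"].cell(row=2, column=7).value
print(saved_value)
```

In the original code, the equivalent change would be to return the workbook from setup() (or pass it around) and call save on that object.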
I have a number of HTML files that I need to open up or import into a single Excel Workbook and simply save the Workbook. Each HTML file should be on its own Worksheet inside the Workbook.
My existing code does not work: it crashes on the workbook.Open(html) line and probably would on the following lines too. I can't find anything specific to this topic by searching the web.
import win32com.client as win32
import pathlib as path
def save_html_files_to_worksheets(read_directory):
    read_path = path.Path(read_directory)
    save_path = read_path.joinpath('Single_Workbook_Containing_HTML_Files.xlsx')
    excel_app = win32.gencache.EnsureDispatch('Excel.Application')
    workbook = excel_app.Workbooks.Add()  # create a new excel workbook
    indx = 1  # used to add new worksheets dependent on number of html files
    for html in read_path.glob('*.html'):  # loop through directory getting html files
        workbook.Open(html)  # open the html in the newly created workbook - this doesn't work though
        worksheet = workbook.Worksheets(indx)  # each iteration in loop add new worksheet
        worksheet.Name = 'Test' + str(indx)  # name added worksheets
        indx += 1
    workbook.SaveAs(str(save_path), 51)  # win32com requires string like path, 51 is xlsx extension
    excel_app.Application.Quit()
save_html_files_to_worksheets(r'C:\Users\<UserName>\Desktop\HTML_FOLDER')
The following code does half of what I want, if this helps. It converts each HTML file into a separate Excel file. I need all the HTML files in one Excel file, each on its own worksheet.
import win32com.client as win32
import pathlib as path
def save_as_xlsx(read_directory):
    read_path = path.Path(read_directory)
    excel_app = win32.gencache.EnsureDispatch('Excel.Application')
    for html in read_path.glob('*.html'):
        save_path = read_path.joinpath(html.stem + '.xlsx')
        wb = excel_app.Workbooks.Open(html)
        wb.SaveAs(str(save_path), 51)
    excel_app.Application.Quit()
save_as_xlsx(r'C:\Users\<UserName>\Desktop\HTML_FOLDER')
Here is a link to a sample HTML file you can use, the data in the file is not real: HTML Download Link
One solution would be to open each HTML file in a temporary workbook and copy its sheet into the workbook that collects them all:
workbook = excel_app.Application.Workbooks.Add()
sheet = workbook.Sheets(1)
for path in read_path.glob('*.html'):
    # Pass a plain string path to Excel
    workbook_tmp = excel_app.Application.Workbooks.Open(str(path))
    workbook_tmp.Sheets(1).Copy(Before=sheet)
    workbook_tmp.Close()
# Remove the redundant 'Sheet1'; DisplayAlerts suppresses the confirmation prompt
excel_app.Application.DisplayAlerts = False
sheet.Delete()
excel_app.Application.DisplayAlerts = True
I believe pandas will make your job much easier.
pip install pandas
Here's an example of how to read multiple tables from a Wikipedia HTML page into pandas DataFrames and save them to disk.
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_American_films_of_2017"
wikitables = pd.read_html(url, header=0, attrs={"class":"wikitable"})
for idx,df in enumerate(wikitables):
    df.to_csv('{}.csv'.format(idx),index=False)
For your use case, something like this should work:
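import pathlib as path
import pandas as pd
def save_as_xlsx(read_directory):
    read_path = path.Path(read_directory)
    save_path = read_path.joinpath('Single_Workbook_Containing_HTML_Files.xlsx')
    # One workbook; each table found in each HTML file gets its own worksheet
    with pd.ExcelWriter(save_path) as writer:
        for html in read_path.glob('*.html'):
            dfs_from_html = pd.read_html(str(html), header=0)
            for idx, df in enumerate(dfs_from_html):
                # Sheet names are limited to 31 characters, so truncate the file stem
                sheet_name = '{}_{}'.format(html.stem[:25], idx)
                df.to_excel(writer, sheet_name=sheet_name, index=False)
** Make sure to pass the correct HTML attributes (the attrs argument) to pd.read_html.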
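When worksheets are named after file stems, Excel's naming rules apply: a sheet name can have at most 31 characters, cannot contain any of \ / ? * [ ] :, and must be unique within the workbook regardless of case. The helper below is a minimal sketch of one way to enforce that. The name `safe_sheet_name` and the sample stems are hypothetical, and it does not cover every rule (for example, names may not begin or end with an apostrophe).

```python
import re


def safe_sheet_name(raw_name, used_names):
    """Return a worksheet name Excel accepts, unique within used_names."""
    # Replace forbidden characters and cut to the 31-character limit
    cleaned = re.sub(r"[\\/?*\[\]:]", "_", raw_name)[:31] or "Sheet"
    candidate = cleaned
    counter = 1
    # Sheet names are compared case-insensitively, so track them lowercased
    while candidate.lower() in used_names:
        suffix = f"_{counter}"
        candidate = cleaned[:31 - len(suffix)] + suffix
        counter += 1
    used_names.add(candidate.lower())
    return candidate


used = set()
names = [
    safe_sheet_name(stem, used)
    for stem in ["report:2017", "report?2017", "a" * 40]
]
print(names)
```

The returned names can be passed as sheet_name to DataFrame.to_excel or assigned to a worksheet's title.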
How about this?
Sub From_XML_To_XL()
    'UpdatebyKutoolsforExcel20151214
    Dim xWb As Workbook
    Dim xSWb As Workbook
    Dim xStrPath As String
    Dim xFileDialog As FileDialog
    Dim xFile As String
    Dim xCount As Long
    On Error GoTo ErrHandler
    Set xFileDialog = Application.FileDialog(msoFileDialogFolderPicker)
    xFileDialog.AllowMultiSelect = False
    xFileDialog.Title = "Select a folder [Kutools for Excel]"
    If xFileDialog.Show = -1 Then
        xStrPath = xFileDialog.SelectedItems(1)
    End If
    If xStrPath = "" Then Exit Sub
    Application.ScreenUpdating = False
    Set xSWb = ThisWorkbook
    xCount = 1
    xFile = Dir(xStrPath & "\*.xml")
    Do While xFile <> ""
        Set xWb = Workbooks.OpenXML(xStrPath & "\" & xFile)
        xWb.Sheets(1).UsedRange.Copy xSWb.Sheets(1).Cells(xCount, 1)
        xWb.Close False
        xCount = xSWb.Sheets(1).UsedRange.Rows.Count + 2
        xFile = Dir()
    Loop
    Application.ScreenUpdating = True
    xSWb.Save
    Exit Sub
ErrHandler:
    MsgBox "no files xml", , "Kutools for Excel"
End Sub
First of all, I am new to Python (practically, I have learned only from Sololearn, and only about half of the course), so I would appreciate a fairly detailed answer.
My task has the following broad steps:
1. Delete the old .xlsx file (if any).
2. Convert two .xls files into .xlsx files using win32, delete the first row, and then delete the .xls file. [The odd .xls files are already downloaded into the source directory; xlrd and pyexcel report an "unsupported format or corrupt file" error when opening them, and online analysis of a file suggests it is actually HTML/HTM.]
3. Get the data from the .xlsx file.
4. Delete the old worksheet in the Google spreadsheet to remove old data, create a new worksheet with the same name, and insert the data into the new worksheet.
5. Open the second sheet (which imports data from the first sheet) and update one cell in the Dummy sheet to make sure the Google spreadsheet is synchronised in the background.
I wrote the code by combining many snippets and with a lot of googling.
The code works, but it takes about 65 seconds on average to complete the whole process.
My question has 3 parts:
1. Is there any way to directly access the data in the .xls file?
2. Is there any way to improve this code's performance?
3. Is there a more efficient method for completing the task described above?
My code:
import time
import win32com.client as win32
import os
import openpyxl
from openpyxl.utils import get_column_letter
import gspread
from oauth2client.service_account import ServiceAccountCredentials
start = time.time()
# set input-output file locations
source_dir = "C:\\Users\\XYZ\\Downloads"
output_dir = "C:\\Users\\XYZ\\Excels"
# use creds to create a client to interact with the Google Drive API
# make sure to share files with email contained in json file
scope = ['https://spreadsheets.google.com/feeds']
# code will not work without json file
creds = ServiceAccountCredentials.from_json_keyfile_name("C:\\Users\\XYZ\\your.json", scope)
gc = gspread.authorize(creds)
# following code is to open any spreadsheet by name
sh = gc.open("First Sheet")
def save_as_xlsx(input_file,output_dir,output_file_name) :
    # call excel using win32, then open .xls file
    # delete first row and then save as .xlsx
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    wb = excel.Workbooks.Open(input_file)
    wbk = excel.ActiveWorkbook
    sheet = wbk.Sheets(1)
    sheet.Rows(1).Delete()
    wb.SaveAs(output_dir + '\\' + output_file_name, FileFormat = 51)
    #FileFormat = 51 is for .xlsx extension. FileFormat = 56 is for .xls extension
    wb.Close()
    excel.Application.Quit()
    return True
def get_the_data_from_xlsx(output_dir,output_file_name) :
    # use openpyxl.load to find out last cell of file
    # store cell values in list called data
    wb = openpyxl.load_workbook(output_dir + '\\' + output_file_name)
    sheet = wb.active
    max_row_no = sheet.max_row
    max_column_no = sheet.max_column
    max_column = get_column_letter(max_column_no)
    last_cell = str(max_column) + str(max_row_no)
    cell_addresses = sheet['A1' : last_cell]
    data = []
    for i in cell_addresses :
        for e in i :
            data.append(e.value)
    return (data,last_cell)
def insert_data_into_spreadsheet(name_of_worksheet,data,last_cell) :
    # Find a worksheet by name in already opened spreadsheet
    # delete the worksheet to clear old data
    # create worksheet with same name to maintain import connections in sheets.
    worksheet = sh.worksheet(name_of_worksheet)
    sh.del_worksheet(worksheet)
    worksheet = sh.add_worksheet(title=name_of_worksheet, rows="500", cols="30")
    # store range of cells for spreadsheet in list named cell_list
    cell_list = worksheet.range('A1' + ':' + str(last_cell))
    # attach all the values from data list as per the cell_list
    a = 0
    for cell in cell_list :
        cell.value = data[a]
        a = a + 1
    # update all cells stored in cell_list in one go
    worksheet.update_cells(cell_list)
def delete_file(directory,file_initials) :
    for filename in os.listdir(directory) :
        if filename.startswith(file_initials) :
            os.unlink(directory +"\\" + filename)
# check if files are in source_dir
for filename in os.listdir(source_dir) :
    # check for file1.xls and set input_file name if any file exists.
    if filename.startswith("file1"):
        input_file = source_dir + "\\file1.xls"
        output_file1 = "output_file1.xlsx"
        # detect and delete any old file in output directory
        delete_file(output_dir,"output_file1")
        if save_as_xlsx(input_file,output_dir,output_file1) == True :
            # delete the file from source directory after work is done
            delete_file(source_dir,'file1')
            # get data from new xlsx file
            data_from_xlsx = get_the_data_from_xlsx(output_dir,output_file1)
            data_to_spreadsheet = data_from_xlsx[0]
            last_cell = data_from_xlsx[1]
            # insert updated data into spreadsheet
            insert_data_into_spreadsheet("file1_data",data_to_spreadsheet,last_cell)
    # repeat the same process for 2nd file
    if filename.startswith('file2'):
        input_file = source_dir + "\\file2.xls"
        output_file2 = "output_file2.xlsx"
        delete_file(output_dir,"output_file2")
        if save_as_xlsx(input_file,output_dir,output_file2) == True :
            delete_file(source_dir,'file2')
            data_from_xlsx = get_the_data_from_xlsx(output_dir,output_file2)
            data_to_spreadsheet = data_from_xlsx[0]
            last_cell = data_from_xlsx[1]
            insert_data_into_spreadsheet("file2_data",data_to_spreadsheet,last_cell)
# open spreadsheet by name and open Dummy worksheet
# update one cell to sync the sheet with other sheets
sh = gc.open("second sheet")
worksheet = sh.worksheet("Dummy")
worksheet.update_acell('B1', '=Today()')
end = time.time()
print(end-start)
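On the first part of the question: if the downloaded .xls files really are HTML tables, as the online analysis suggests, their rows could in principle be read directly, without launching Excel through win32 or re-saving as .xlsx. The following is only a sketch under that assumption, using the standard-library html.parser. The class name `TableRows`, the function `rows_from_html`, and the sample markup are hypothetical, and real files with nested tables or unclosed tags would need more care.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_from_html(text, skip_rows=1):
    """Parse table rows and drop the first skip_rows rows (the unwanted first row)."""
    parser = TableRows()
    parser.feed(text)
    parser.close()
    return parser.rows[skip_rows:]


# Hypothetical sample standing in for the contents of a downloaded file
sample = (
    "<table><tr><td>Report title</td></tr>"
    "<tr><th>Symbol</th><th>Price</th></tr>"
    "<tr><td>ABC</td><td>10.5</td></tr></table>"
)
rows = rows_from_html(sample)
print(rows)
```

All values come back as strings, so numeric columns would still need converting before they are sent to the spreadsheet.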
Suppose I have an Excel file excel_file.xlsx and I want to send it to my printer using Python, so I use:
import os
os.startfile('path/to/file','print')
My problem is that this only prints the first sheet of the Excel workbook, but I want all the sheets printed. Is there any way to print the entire workbook?
Also, I used openpyxl to create the file, but it doesn't seem to have any option to select which sheets to print.
Any help would be greatly appreciated.
from openpyxl import load_workbook
import os
import shutil
path_to_workbook = "/Users/username/path/sheet.xlsx"
worksheets_folder = "/Users/username/path/worksheets/"
def main():
    # openpyxl reads the sheet names directly (xlrd 2.0+ no longer opens .xlsx files)
    all_sheet_names = load_workbook(path_to_workbook).sheetnames
    os.makedirs(worksheets_folder, exist_ok=True)
    for working_sheet in all_sheet_names:
        path_to_new_workbook = worksheets_folder + '{}.xlsx'.format(working_sheet)
        shutil.copyfile(path_to_workbook, path_to_new_workbook)
        nwb = load_workbook(path_to_new_workbook)
        print("working_sheet = " + working_sheet)
        # keep only the working sheet in this copy
        for name in all_sheet_names:
            if name != working_sheet:
                nwb.remove(nwb[name])
        nwb.save(path_to_new_workbook)
    ws_files = get_file_names(worksheets_folder, ".xlsx")
    # Uncomment print command
    for file_name in ws_files:
        path_to_file = worksheets_folder + file_name
        # os.startfile(path_to_file, 'print')
        print('PRINT: ' + path_to_file)
    # remove worksheets folder
    shutil.rmtree(worksheets_folder)
def get_file_names(folder, extension):
    names = []
    for file_name in os.listdir(folder):
        if file_name.endswith(extension):
            names.append(file_name)
    return names
if __name__ == '__main__':
    main()
Probably not the best approach, but it should work.
As a workaround, you can create separate .xlsx files that each contain only one worksheet and then print them with os.startfile(path_to_file, 'print').
I had this issue (on Windows) and solved it using the pywin32 module and this code block (in line 5 you can specify the sheets you want to print):
import win32com.client
o = win32com.client.Dispatch('Excel.Application')
o.Visible = True
wb = o.Workbooks.Open('/Users/1/Desktop/Sample.xlsx')
ws = wb.Worksheets([1, 2, 3])
ws.PrintOut()
You could embed VBA that runs when the workbook opens and prints the file to the default printer, using the XlsxWriter utility mentioned in this article:
PBPython's Embed VBA in Excel
It turns out the problem was with Microsoft Excel. os.startfile just sends the file to the system's default app for that file type. I just had to change the default to another app (WPS Office in my case) and the problem was solved.
It seems like you should be able to just loop through the sheets and change which one is active. I tried this and it did print every sheet, BUT for whatever reason the first print grouped two sheets together, so I got one duplicate page per workbook.
import os
import openpyxl as op
# filepath is assumed to be defined earlier as the path to the workbook
wb = op.load_workbook(filepath)
for sheet in wb.sheetnames:
    sel_sheet = wb[sheet]
    # find the max row and max column in the sheet
    max_row = sel_sheet.max_row
    max_column = sel_sheet.max_column
    # identify the sheets that have some data in them
    if max_row > 1 and max_column > 1:
        # make this sheet the active one, re-save the workbook, and print it
        wb.active = wb.sheetnames.index(sheet)
        wb.save(filepath)
        os.startfile(filepath, "print")