Python win32com - Read text in a text box to a cell? - python

I would like to read the text from a text box in an Excel file and save that value to a variable. The problem I am having is with reading the TextBox. I have tried several methods; this one showed the most promise because it does not generate an error, but it does not produce the desired result either. Any suggestions are appreciated. See the code below.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open("C:\\users\\khillstr\\Testing\\Scripts\\Book1.xlsx")
excel.Visible = False
ws = wb.Worksheets
canvas = excel.ActiveSheet.Shapes
for shp in canvas.CanvasItems:
    if shp.TextFrame.Characters:
        print(shp.TextFrame.Characters)
    else:
        print("no")

Canvas has to do with graphics in Excel files. I think you want access to the cells. The code below prints each row as a tuple.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open("C:\\users\\khillstr\\Testing\\Scripts\\Book1.xlsx")
excel.Visible = False
sheet = wb.Worksheets(1)
for row in sheet.UsedRange.Value:
    print(row)
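Note that UsedRange.Value comes back as a tuple of row tuples only when the used range spans more than one cell. For a single cell (for example an empty sheet, whose used range is just A1) it is a bare value such as None, and the loop above would then either fail or iterate over the characters of a string. A small normalising helper, a sketch of my own rather than anything provided by pywin32:

```python
def used_range_rows(value):
    """Normalise a win32com Range.Value result into a list of row tuples.

    A multi-cell range arrives as a tuple of tuples; a single cell arrives
    as a bare value (None when the cell is empty).
    """
    if isinstance(value, tuple):
        return [tuple(row) for row in value]
    # single cell: wrap the scalar so callers can always loop over rows
    return [(value,)]
```

Usage would then be: for row in used_range_rows(sheet.UsedRange.Value): print(row)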

To get the text in a text box on a sheet, you need to use shp.TextFrame.Characters().Caption. Characters is a method (so with win32com it has to be called) and it returns a Characters object, not a string; the text itself is in its Caption property.
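A minimal sketch of that idea, assuming a pywin32 Worksheet object such as wb.Worksheets(1). It walks the sheet's Shapes collection directly (CanvasItems is meant for shapes inside a drawing canvas) and keeps only shapes whose Type is the Office constant msoTextBox, whose value is 17. The function name is illustrative, not part of any library:

```python
MSO_TEXT_BOX = 17  # MsoShapeType.msoTextBox


def read_text_boxes(worksheet):
    """Return {shape name: text} for each text box on a pywin32 Worksheet."""
    texts = {}
    for shp in worksheet.Shapes:
        if shp.Type == MSO_TEXT_BOX:
            # Characters() returns a Characters object; Caption is its plain text
            texts[shp.Name] = shp.TextFrame.Characters().Caption
    return texts
```

To copy a text box into a cell you could then write something like wb.Worksheets(1).Range("A1").Value = read_text_boxes(wb.Worksheets(1))["TextBox 1"], where "TextBox 1" is a hypothetical shape name; check the real name in Excel's Selection Pane.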

import win32com.client as win32
file_name = 'path_to_excel'
row_num, col_num = 1, 1  # example: cell A1
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(file_name)
excel.Visible = False
sheet = wb.Worksheets(1)
deep = lambda r, c: sheet.Cells(r, c).Value
print(deep(row_num, col_num))
wb.Close(False)  # close without saving
excel.Quit()
This code opens the workbook at file_name and reads the value of the cell at row row_num, column col_num.

Related

Use win32com in python to assign column width value, wrap text and add cell border to Excel file

I have some code that converts an Excel file to PDF. Although I know that openpyxl has methods to set column width, wrap text and add cell borders, I am looking for a way to do it with the win32com module. I have already opened the Excel file with win32com, so I can save execution time by not loading it again with openpyxl.
# Import Module
from win32com import client
# Open Microsoft Excel
excel = client.gencache.EnsureDispatch('Excel.Application')
# Make excel work in the background without appearing
excel.Visible = False
# Read Excel File
wb = excel.Workbooks.Open(r'C:\Spaced out data.xlsx')
ws = wb.Worksheets('Sheet1')
# Adjust page setup to landscape (xlLandscape = 2; 1 is xlPortrait)
ws.PageSetup.Orientation = 2
# Set Zoom to false because you want to fit all columns to the width of 1 page.
ws.PageSetup.Zoom = False
# Allow rows to be on multiple pages
ws.PageSetup.FitToPagesTall = False
# Fit all columns to the width of 1 page.
ws.PageSetup.FitToPagesWide = 1
# Convert into PDF File
ws.ExportAsFixedFormat(0, r'C:\Spaced out data.pdf')
wb.Close(SaveChanges=False)
excel.Quit()
My go-to approach is to record an Excel macro and use it as the basis for the Python code. After recording a column width change, enabling wrap text and changing some borders, I came up with this:
from win32com import client
excel = client.gencache.EnsureDispatch('Excel.Application')
excel.Visible = False
wb = excel.Workbooks.Open(r'c:\users\metolone\test.xlsx')
ws = wb.Worksheets('Sheet1')
ws.Range("A:F").ColumnWidth = 10
ws.Range("A1:F1").WrapText = True
ws.Range("A1:F15").Borders(client.constants.xlEdgeLeft).LineStyle = client.constants.xlContinuous
ws.Range("A1:F15").Borders(client.constants.xlEdgeLeft).Weight = client.constants.xlThick
ws.PageSetup.Orientation = 2  # xlLandscape
ws.PageSetup.Zoom = False
ws.PageSetup.FitToPagesTall = False
ws.PageSetup.FitToPagesWide = 1
ws.ExportAsFixedFormat(0, r'c:\users\metolone\test.pdf')
wb.Close(SaveChanges=False)
excel.Quit()
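The recorded macro above only sets the left edge. To outline every cell in a range you can repeat the same two assignments for each border index. A sketch, assuming rng is a pywin32 Range spanning several rows and columns, such as ws.Range("A1:F15"); the numbers are Excel's XlBordersIndex, XlLineStyle and XlBorderWeight constants written out so the helper does not depend on client.constants:

```python
XL_CONTINUOUS = 1  # xlContinuous
XL_THIN = 2        # xlThin
# xlEdgeLeft, xlEdgeTop, xlEdgeBottom, xlEdgeRight, xlInsideVertical, xlInsideHorizontal
BORDER_INDEXES = (7, 8, 9, 10, 11, 12)


def add_grid_borders(rng, weight=XL_THIN):
    """Draw a continuous border on every outer edge and inner line of a Range."""
    for index in BORDER_INDEXES:
        border = rng.Borders(index)
        border.LineStyle = XL_CONTINUOUS
        border.Weight = weight
```

For example, add_grid_borders(ws.Range("A1:F15"), client.constants.xlThick) would give the whole table thick borders before the PDF export.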
To set the column width for a range of cells or for a specific column:
import os
from win32com.client import Dispatch
file = os.path.join(os.getcwd(), 'Portfolio.xls')
excel = Dispatch('Excel.Application')
workbook = excel.Workbooks.Open(file)
worksheet = workbook.Worksheets("Portfolio")
excel.DisplayAlerts = False
excel.Visible = False
worksheet.Range("A1:A14").ColumnWidth = 25  # widens the whole column(s) the range spans (here column A)
worksheet.Columns(1).ColumnWidth = 25  # specific column by number
worksheet.Range("B:B").ColumnWidth = 25  # specific column by letter
worksheet.Columns.AutoFit()  # autofit every column (overrides the widths set above)
workbook.Save()
workbook.Close()
excel.Quit()

Importing Multiple HTML Files Into Excel as Separate Worksheets

I have a number of HTML files that I need to open up or import into a single Excel Workbook and simply save the Workbook. Each HTML file should be on its own Worksheet inside the Workbook.
My existing code does not work: it crashes on the workbook.Open(html) line and probably would on the following lines too. I can't find anything on the web specific to this topic.
import win32com.client as win32
import pathlib as path
def save_html_files_to_worksheets(read_directory):
    read_path = path.Path(read_directory)
    save_path = read_path.joinpath('Single_Workbook_Containing_HTML_Files.xlsx')
    excel_app = win32.gencache.EnsureDispatch('Excel.Application')
    workbook = excel_app.Workbooks.Add()  # create a new excel workbook
    indx = 1  # used to add new worksheets dependent on number of html files
    for html in read_path.glob('*.html'):  # loop through directory getting html files
        workbook.Open(html)  # open the html in the newly created workbook - this doesn't work though
        worksheet = workbook.Worksheets(indx)  # each iteration in loop add new worksheet
        worksheet.Name = 'Test' + str(indx)  # name added worksheets
        indx += 1
    workbook.SaveAs(str(save_path), 51)  # win32com requires string like path, 51 is xlsx extension
    excel_app.Application.Quit()
save_html_files_to_worksheets(r'C:\Users\<UserName>\Desktop\HTML_FOLDER')
The following code does half of what I want, if this helps: it converts each HTML file into a separate Excel file. I need all the HTML files in one Excel file, each on its own worksheet.
import win32com.client as win32
import pathlib as path
def save_as_xlsx(read_directory):
    read_path = path.Path(read_directory)
    excel_app = win32.gencache.EnsureDispatch('Excel.Application')
    for html in read_path.glob('*.html'):
        save_path = read_path.joinpath(html.stem + '.xlsx')
        wb = excel_app.Workbooks.Open(str(html))  # COM expects a string path
        wb.SaveAs(str(save_path), 51)  # 51 = xlOpenXMLWorkbook (.xlsx)
        wb.Close()
    excel_app.Application.Quit()
save_as_xlsx(r'C:\Users\<UserName>\Desktop\HTML_FOLDER')
Here is a link to a sample HTML file you can use (the data in the file is not real): HTML Download Link
One solution would be to open each HTML file into a temporary workbook and copy its sheet from there into the workbook that collects all of them:
workbook = excel_app.Workbooks.Add()
sheet = workbook.Sheets(1)
for html in read_path.glob('*.html'):  # 'html', so the pathlib alias 'path' is not shadowed
    workbook_tmp = excel_app.Workbooks.Open(str(html))
    workbook_tmp.Sheets(1).Copy(Before=sheet)
    workbook_tmp.Close(False)
# Remove the redundant 'Sheet1' without Excel's confirmation prompt
excel_app.DisplayAlerts = False
sheet.Delete()
excel_app.DisplayAlerts = True
workbook.SaveAs(str(save_path), 51)
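If you rename each copied sheet (the question uses 'Test' + str(indx)) to something derived from the file name instead, keep in mind that Excel sheet names are limited to 31 characters, cannot contain \ / ? * [ ] :, cannot start or end with an apostrophe, and must be unique within the workbook regardless of case. A minimal sketch of a hypothetical helper (not part of pywin32) that turns a file stem into an acceptable, unique name:

```python
import re

MAX_SHEET_NAME = 31  # Excel's limit on worksheet name length
_INVALID = re.compile(r"[\\/?*\[\]:]")


def safe_sheet_name(stem, existing):
    """Return a valid sheet name derived from stem that is not in existing.

    Uniqueness is checked case-insensitively, as Excel does.
    """
    base = _INVALID.sub("_", stem)[:MAX_SHEET_NAME].strip("'") or "Sheet"
    taken = {name.lower() for name in existing}
    candidate, n = base, 1
    while candidate.lower() in taken:
        n += 1
        suffix = " ({})".format(n)
        # shorten the base so the suffix still fits in 31 characters
        candidate = base[:MAX_SHEET_NAME - len(suffix)] + suffix
    return candidate
```

Inside the copy loop the copied sheet sits just before sheet, so new = workbook.Sheets(sheet.Index - 1) followed by new.Name = safe_sheet_name(stem, [s.Name for s in workbook.Sheets if s.Index != new.Index]) would rename it, where stem is the current file's name without its extension.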
I believe pandas will make your job much easier.
pip install pandas
Here's an example of how to read multiple tables from a Wikipedia page into pandas DataFrames and save each one to disk.
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_American_films_of_2017"
wikitables = pd.read_html(url, header=0, attrs={"class": "wikitable"})
for idx, df in enumerate(wikitables):
    df.to_csv('{}.csv'.format(idx), index=False)
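For your use case, something like this should work; it writes every table into one workbook, one worksheet per table:
import pathlib as path
import pandas as pd
def save_as_xlsx(read_directory):
    read_path = path.Path(read_directory)
    save_path = read_path.joinpath('Single_Workbook_Containing_HTML_Files.xlsx')
    with pd.ExcelWriter(str(save_path)) as writer:
        for html in read_path.glob('*.html'):
            dfs_from_html = pd.read_html(str(html), header=0)
            for idx, df in enumerate(dfs_from_html):
                # sheet named after the file and table number (Excel caps names at 31 chars)
                sheet_name = '{}_{}'.format(html.stem, idx)[:31]
                df.to_excel(writer, sheet_name=sheet_name, index=False)
** Make sure to set the correct HTML attributes (the attrs argument) in the pd.read_html call.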
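How about this VBA macro (from Kutools for Excel)? It opens every .xml file in a chosen folder and stacks the used range of each file's first sheet into the first sheet of the workbook that contains the macro.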
Sub From_XML_To_XL()
'UpdatebyKutoolsforExcel20151214
Dim xWb As Workbook
Dim xSWb As Workbook
Dim xStrPath As String
Dim xFileDialog As FileDialog
Dim xFile As String
Dim xCount As Long
On Error GoTo ErrHandler
Set xFileDialog = Application.FileDialog(msoFileDialogFolderPicker)
xFileDialog.AllowMultiSelect = False
xFileDialog.Title = "Select a folder [Kutools for Excel]"
If xFileDialog.Show = -1 Then
xStrPath = xFileDialog.SelectedItems(1)
End If
If xStrPath = "" Then Exit Sub
Application.ScreenUpdating = False
Set xSWb = ThisWorkbook
xCount = 1
xFile = Dir(xStrPath & "\*.xml")
Do While xFile <> ""
Set xWb = Workbooks.OpenXML(xStrPath & "\" & xFile)
xWb.Sheets(1).UsedRange.Copy xSWb.Sheets(1).Cells(xCount, 1)
xWb.Close False
xCount = xSWb.Sheets(1).UsedRange.Rows.Count + 2
xFile = Dir()
Loop
Application.ScreenUpdating = True
xSWb.Save
Exit Sub
ErrHandler:
MsgBox "no files xml", , "Kutools for Excel"
End Sub

Print Excel workbook using python

Suppose I have an Excel file excel_file.xlsx and I want to send it to my printer using Python, so I use:
import os
os.startfile('path/to/file','print')
My problem is that this only prints the first sheet of the Excel workbook, but I want all the sheets printed. Is there any way to print the entire workbook?
Also, I used openpyxl to create the file, but it doesn't seem to have any option to select which sheets to print.
Any help would be greatly appreciated.
import os
import shutil
from openpyxl import load_workbook
path_to_workbook = "/Users/username/path/sheet.xlsx"
worksheets_folder = "/Users/username/path/worksheets/"
def main():
    # xlrd 2.x no longer reads .xlsx files, so openpyxl lists the sheet names as well
    all_sheet_names = load_workbook(path_to_workbook).sheetnames
    if not os.path.exists(worksheets_folder):
        os.makedirs(worksheets_folder)
    for working_sheet in all_sheet_names:
        path_to_new_workbook = os.path.join(worksheets_folder, '{}.xlsx'.format(working_sheet))
        # copy the whole workbook, then delete every sheet except the current one
        shutil.copyfile(path_to_workbook, path_to_new_workbook)
        nwb = load_workbook(path_to_new_workbook)
        print("working_sheet = " + working_sheet)
        for name in all_sheet_names:
            if name != working_sheet:
                nwb.remove(nwb[name])
        nwb.save(path_to_new_workbook)
    ws_files = get_file_names(worksheets_folder, ".xlsx")
    # Uncomment the os.startfile line to send each file to the printer (Windows only)
    for ws_file in ws_files:
        path_to_file = os.path.join(worksheets_folder, ws_file)
        # os.startfile(path_to_file, 'print')
        print('PRINT: ' + path_to_file)
    # remove worksheets folder
    shutil.rmtree(worksheets_folder)
def get_file_names(folder, extension):
    names = []
    for file_name in os.listdir(folder):
        if file_name.endswith(extension):
            names.append(file_name)
    return names
if __name__ == '__main__':
    main()
Probably not the best approach, but it should work.
As a workaround, you can create separate .xlsx files that each contain only one worksheet and then print them with os.startfile(path_to_file, 'print').
I had this issue (on Windows) and solved it with the pywin32 module and the code block below; the Worksheets([...]) line is where you specify the sheets you want to print.
import win32com.client
o = win32com.client.Dispatch('Excel.Application')
o.Visible = True
wb = o.Workbooks.Open('/Users/1/Desktop/Sample.xlsx')
ws = wb.Worksheets([1, 2, 3])  # select sheets 1-3 as a group
ws.PrintOut()
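If the goal is simply every sheet, the Workbook object itself also has a PrintOut() method, which should print the whole workbook without selecting sheets first. A minimal sketch, assuming excel is an Excel Application object obtained with win32com.client.Dispatch('Excel.Application') on Windows; the function name is illustrative:

```python
import os


def print_whole_workbook(excel, path):
    """Send every sheet of the workbook at path to the default printer."""
    # Excel resolves relative paths against its own working folder, so pass an absolute one
    wb = excel.Workbooks.Open(os.path.abspath(path))
    try:
        wb.PrintOut()  # Workbook.PrintOut prints the workbook's sheets
    finally:
        wb.Close(False)  # close without saving
```

Usage: print_whole_workbook(win32com.client.Dispatch('Excel.Application'), 'Sample.xlsx').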
You could embed VBA in the file that runs on open and prints it to the default printer, using the XlsxWriter utility mentioned in this article:
PBPython's Embed VBA in Excel
Turns out the problem was with Microsoft Excel. os.startfile just sends the file to the system's default application for that file type, so I just had to change the default to another app (WPS Office in my case) and the problem was solved.
It seems like you should be able to just loop through the sheets and change which one is active. I tried this and it did print out every sheet, BUT for whatever reason on the first print it grouped two sheets together, so it gave me one duplicate page for each workbook.
import os
import openpyxl as op
filepath = 'path/to/file'
wb = op.load_workbook(filepath)
for sheet in wb.sheetnames:
    sel_sheet = wb[sheet]
    # find the max row and max column in the sheet
    max_row = sel_sheet.max_row
    max_column = sel_sheet.max_column
    # only print the sheets that have some data in them
    if max_row > 1 and max_column > 1:
        # make this sheet the active one, save, and print the file
        sheet_names = wb.sheetnames
        wb.active = sheet_names.index(sheet)
        wb.save(filepath)
        os.startfile(filepath, "print")
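A plausible cause of the duplicate page (an assumption, not something confirmed in this answer) is that changing wb.active does not clear the "selected" flag on the sheet that was originally selected, so Excel sees two grouped sheets and prints both. A sketch of a helper that clears that flag on every sheet before activating one, using openpyxl's Worksheet.sheet_view.tabSelected attribute; the helper name is hypothetical:

```python
def activate_only(wb, sheet_name):
    """Make sheet_name the only selected and active sheet of an openpyxl Workbook."""
    for ws in wb.worksheets:
        ws.sheet_view.tabSelected = False  # ungroup any previously selected tabs
    wb.active = wb.sheetnames.index(sheet_name)
    wb[sheet_name].sheet_view.tabSelected = True
```

In the loop above, activate_only(wb, sheet) would take the place of wb.active = sheet_names.index(sheet).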

Adding Excel Sheets to End of Workbook

I am trying to add excel worksheets to the end of a workbook, reserving the first sheet for a summary.
import win32com.client
Excel = win32com.client.DispatchEx('Excel.Application')
Book = Excel.Workbooks.Add()
Excel.Visible = True
Book.Worksheets(3).Delete()
Book.Worksheets(2).Delete()
Sheet = Book.Worksheets(1)
Sheet.Name = "Summary"
Book.Worksheets.Add(After=Sheet)
Sheet = Book.Worksheets(2)
Sheet.Name = "Data1"
This code adds the new sheet to the left, despite using After=Sheet, and when I modify the sheet named "Data1", it overwrites the sheet named "Summary".
This is similar to this problem:
Adding sheets to end of workbook in Excel (normal method not working?)
but the given solutions don't work for me.
Try passing Before=None explicitly:
add = Book.Sheets.Add(Before=None, After=Book.Sheets(Book.Sheets.Count))
add.Name = "Data1"
Alternatively, take the new sheet from Book.ActiveSheet, since a newly added sheet becomes the active one:
Book.Worksheets.Add(After=Sheet)
Sheet = Book.ActiveSheet
Sheet.Name = "Data1"
Or, using gencache.EnsureDispatch, add a sheet after the active sheet and name it in one line:
import win32com.client as win32
xl = win32.gencache.EnsureDispatch('Excel.Application')
xl.Sheets.Add(After=xl.ActiveSheet).Name = "Name_of_your_Sheet"

python win32 COM closing excel workbook

I open several different workbooks (excel xlsx format) in COM, and mess with them. As the program progresses I wish to close one specific workbook but keep the rest open.
How do I close ONE workbook? (instead of the entire excel application)
from win32com.client import Dispatch
from pywintypes import com_error
xl = Dispatch("Excel.Application")
xl.Visible = False
try:
    output = xl.Workbooks.Open(workbookName)
    output2 = xl.Workbooks.Open(workbook2Name)
except com_error:
    print("you screwed up blahblahblah")
    exit()
#work on some stuff
#close output but keep output2 open
The Workbook COM object has a Close() method. Basically, it should be something like:
xl = Dispatch('Excel.Application')
wb = xl.Workbooks.Open('New Workbook.xlsx')
# do some stuff
wb.Close(True) # save the workbook
The above was just a skeleton; here's some code that works on my machine against Office 2010:
from win32com.client import Dispatch
xl = Dispatch('Excel.Application')
wb = xl.Workbooks.Add()
ws = wb.Worksheets.Add()
cell = ws.Cells(1)
cell.Value = 'Some text'
wb.Close(True, r'C:\Path\to\folder\Test.xlsx')
Of course, that creates a new xlsx file. But then I'm able to successfully open and modify the file in the same session as follows:
wb = xl.Workbooks.Open(r'C:\Path\to\folder\Test.xlsx')
ws = wb.Worksheets(1)
cell = ws.Cells(2)
cell.Value = 'Some more text'
wb.Close(True)
Don't know if any of that helps...
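If you no longer hold the Python reference to the workbook, you can look it up by file name through the Workbooks collection instead. A sketch with a hypothetical helper name, assuming xl is the Excel Application object from Dispatch('Excel.Application'); it closes only the matching workbook and leaves the rest of Excel running:

```python
def close_workbook_by_name(xl, name, save=False):
    """Close the open workbook whose Name matches name (case-insensitive).

    Returns True if a workbook was closed, False if none matched.
    """
    for wb in xl.Workbooks:
        if wb.Name.lower() == name.lower():
            wb.Close(save)  # SaveChanges: True saves, False discards
            return True  # stop here; the collection has just changed
    return False
```

For the question's case, close_workbook_by_name(xl, output.Name) would close output and keep output2 open.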
You can also try the following code:
from win32com.client import Dispatch
excel = Dispatch("Excel.Application")
excel.Visible = False
workbook = excel.Workbooks.Open(fileName)
# option 1: save, then quit
excel.DisplayAlerts = False
if saveAs:
    workbook.SaveAs(fullFileNameToSave)
else:
    workbook.Save()
excel.Quit()
# option 2: close every open workbook without saving, then quit
# (map() is lazy in Python 3, so use an explicit loop; list() snapshots the collection first)
for book in list(excel.Workbooks):
    book.Close(False)
excel.Quit()
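This function force-closes every running Excel process (not just one workbook), so any unsaved changes are lost:
import os
def closeFile():
    # /F forces termination, /IM selects processes by image name;
    # os.system returns the command's exit code instead of raising on failure
    return os.system('TASKKILL /F /IM excel.exe')
closeFile()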
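A method that applies an AutoFilter to columns B:G, keeping only the rows whose sixth column in that range (column G) matches the given amount, then pauses:
import time
def setAutoFilter(self, ws, AmountToMatch, timeout):
    amounttopass = f"{AmountToMatch}"
    print("The Amount of that month is ::", amounttopass)
    ws.Range("B:G").AutoFilter(Field=6, Criteria1=amounttopass, VisibleDropDown=False)
    time.sleep(timeout)  # seconds to wait after filtering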
