Python - Win32Com - How to extract hyperlink from Excel spreadsheet cell?

I'm trying to get hyperlinks from individual cells in an excel spreadsheet with the following code:
import win32com.client
import win32ui
app = win32com.client.Dispatch("Excel.Application")
app.visible = True
workbook = app.Workbooks.Open("test.xlsx")
sheet = workbook.Sheets[0]
test_cell = sheet.Range("A8").value
This prints the following:
test_cell
u'Link title'
But when I try to extract the hyperlink, it does not return the link/URL as a string but a 'COMObject unknown':
test_cell = sheet.Range("A8").Hyperlinks
test_cell
<COMObject <unknown>>

sheet.Range("A8").Hyperlinks.Item(1).Address
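The `<COMObject <unknown>>` result is the `Hyperlinks` collection itself rather than a string; an entry has to be picked out of it (COM collections are 1-based) before `Address` can be read. A minimal sketch of that idea, assuming a cell may carry zero or more links; the helper name `hyperlink_addresses` is hypothetical and not part of win32com:

```python
def hyperlink_addresses(cell_range):
    """Return the Address of every hyperlink attached to a COM Range.

    `cell_range` is expected to be something like sheet.Range("A8").
    The function name is a hypothetical helper, not a win32com API.
    """
    links = cell_range.Hyperlinks  # a collection object, not a string
    # COM collections are 1-based, so iterate 1..Count inclusive.
    return [links.Item(i).Address for i in range(1, links.Count + 1)]


# Usage (requires Excel on Windows):
# addresses = hyperlink_addresses(sheet.Range("A8"))
# url = addresses[0] if addresses else None
```

Returning a list avoids an error on cells without a link, where `Item(1)` would fail.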

Related

Copy Pivot Table as Picture with python

How does it work to select a pivot table with Python and then copy it to a .png?
I am trying wb.Worksheets("Test").PivotTables(1) but it keeps throwing an error...
win32c = win32.constants
ws.PivotTables("PivotTable1").CopyPicture(Format=win32c.xlBitmap)
img = ImageGrab.grabclipboard()
image_path = 'C:/Prueba/test.png'
img.save(image_path)
excel.Quit()
Error:
AttributeError: '<win32com.gen_py.Microsoft Excel 16.0 Object Library.PivotTable instance at 0x2569187443096>' object has no attribute 'CopyPicture'
As mentioned in the comments, you can use Excel automation to copy a pivot table to an image file.
For this code, I used a sample Excel file which can be found here:
https://www.timeatlas.com/wp-content/uploads/pivot_table_example.xlsx
Here is the code to copy the pivot table from the sample file:
from win32com.client import Dispatch
import win32com.client as win32
from PIL import ImageGrab
ExcelFile = 'C:/tmp/pivot_table_example.xlsx'
SheetName = "Sheet2" # sheet with pivot table
ImgFile = 'C:/tmp/test.png'
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = True
excel.Workbooks.Open(ExcelFile)
excel.Sheets(SheetName).Select()
excel.Sheets(SheetName).PivotTables(1).TableRange1.Select() # select first pivot table
excel.Sheets(SheetName).PivotTables(1).TableRange1.Copy()
img = ImageGrab.grabclipboard()
img.save(ImgFile)
excel.Quit()
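The `AttributeError` in the question arises because `CopyPicture` is a method of `Range`, not of `PivotTable`. A possible variant of the code above therefore calls it on the pivot table's `TableRange1`. This is only a sketch under assumptions: the helper name `save_pivot_picture` and its `grab` parameter are hypothetical, and the literals 1 and 2 stand in for the Excel constants `xlScreen` and `xlBitmap`.

```python
def save_pivot_picture(sheet, img_path, pivot_index=1, grab=None):
    """Copy a pivot table's range as a bitmap and save it from the clipboard.

    `sheet` is a COM Worksheet. `grab` is a callable returning the clipboard
    image; it defaults to Pillow's ImageGrab.grabclipboard.
    """
    if grab is None:
        from PIL import ImageGrab  # Pillow, imported only when needed
        grab = ImageGrab.grabclipboard

    table_range = sheet.PivotTables(pivot_index).TableRange1
    # Appearance=1 is xlScreen, Format=2 is xlBitmap.
    table_range.CopyPicture(Appearance=1, Format=2)

    img = grab()
    # grabclipboard() returns None (or a list of file names) when the
    # clipboard does not hold an image.
    if img is None or isinstance(img, list):
        raise RuntimeError("clipboard does not contain an image")
    img.save(img_path)
    return img_path


# Usage (requires Excel on Windows):
# save_pivot_picture(excel.Sheets("Sheet2"), "C:/tmp/test.png")
```

Checking the clipboard result before calling `save` turns a confusing `'NoneType' object has no attribute 'save'` into an explicit error.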

Validation and Drop Down in python

I'm attempting to teach myself python skills and took an awesome tutorial from Giraffe Academy and using some of the skills.
I created a file called country.xlsx and installed xlsxwriter to read, validate and create a dropdown box using this tutorial - https://xlsxwriter.readthedocs.io/example_data_validate.html
When I run or debug the code below,
import xlsxwriter
workbook = xlsxwriter.workbook("Countries.xlsx")
worksheet = workbook.sheet_by_name("Sheet1")
workbook = worksheet.set_column("A:A")
workbook = worksheet.set_column("B:B")
workbook = worksheet.set_column("C:C")
workbook = worksheet.set_column("D:D")
workbook - worksheet.set.row(0, 6)
heading1 = "Continent"
heading2 = "Country"
heading3 = "Capital"
heading4 = "Airline"
workbook = worksheet("A1", {heading1})
workbook = worksheet("B1", {heading2})
workbook = worksheet("C1", {heading3})
workbook = worksheet("D1", {heading4})
txt = "Select from the Dropdown List"
workbook = worksheet.data_validation("B15", {"validate": "list", "source" : "=$A$1:$D$7"})
workbook.close()
I receive this error
workbook = xlsxwriter.workbook('Countries.xlsx')
TypeError: 'module' object is not callable
Can someone point me in the right direction??
Looking at the Documentation it seems like you have to use Workbook with a capital W:
workbook = xlsxwriter.Workbook('Countries.xlsx')
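The message "'module' object is not callable" appears because `xlsxwriter.workbook` (lowercase) is a submodule, while `xlsxwriter.Workbook` is the class defined in it. The same mistake can be reproduced with the standard library alone; the sketch below uses `datetime` purely as a stand-in, and `try_call` is a hypothetical helper:

```python
import datetime


def try_call(obj, *args):
    """Call obj(*args) and return either the result or the TypeError raised."""
    try:
        return obj(*args)
    except TypeError as exc:
        return exc


# Calling the module fails, just like xlsxwriter.workbook(...).
result_module = try_call(datetime, 2020, 1, 1)
# Calling the class inside the module works, like xlsxwriter.Workbook(...).
result_class = try_call(datetime.datetime, 2020, 1, 1)

print(type(result_module).__name__, "-", result_module)
print(result_class)
```

When this error shows up, checking whether the name refers to a module or to a class inside it is usually the quickest fix.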

Importing Multiple HTML Files Into Excel as Separate Worksheets

I have a number of HTML files that I need to open up or import into a single Excel Workbook and simply save the Workbook. Each HTML file should be on its own Worksheet inside the Workbook.
My existing code does not work and it crashes on the workbook.Open(html) line and probably will on following lines. I can't find anything searching the web specific to this topic.
import win32com.client as win32
import pathlib as path
def save_html_files_to_worksheets(read_directory):
    read_path = path.Path(read_directory)
    save_path = read_path.joinpath('Single_Workbook_Containing_HTML_Files.xlsx')
    excel_app = win32.gencache.EnsureDispatch('Excel.Application')
    workbook = excel_app.Workbooks.Add() # create a new excel workbook
    indx = 1 # used to add new worksheets dependent on number of html files
    for html in read_path.glob('*.html'): # loop through directory getting html files
        workbook.Open(html) # open the html in the newly created workbook - this doesn't work though
        worksheet = workbook.Worksheets(indx) # each iteration in loop add new worksheet
        worksheet.Name = 'Test' + str(indx) # name added worksheets
        indx += 1
    workbook.SaveAs(str(save_path), 51) # win32com requires string like path, 51 is xlsx extension
    excel_app.Application.Quit()
save_html_files_to_worksheets(r'C:\Users\<UserName>\Desktop\HTML_FOLDER')
The following code does half of what I want, if this helps. It converts each HTML file into a separate Excel file. I need all the HTML files in one Excel file, each on its own worksheet.
import win32com.client as win32
import pathlib as path
def save_as_xlsx(read_directory):
    read_path = path.Path(read_directory)
    excel_app = win32.gencache.EnsureDispatch('Excel.Application')
    for html in read_path.glob('*.html'):
        save_path = read_path.joinpath(html.stem + '.xlsx')
        wb = excel_app.Workbooks.Open(html)
        wb.SaveAs(str(save_path), 51)
    excel_app.Application.Quit()
save_as_xlsx(r'C:\Users\<UserName>\Desktop\HTML_FOLDER')
Here is a link to a sample HTML file you can use, the data in the file is not real: HTML Download Link
One solution would be to open the HTML file into a temporary workbook, and copy the sheet from there into the workbook containing all of them:
workbook = excel_app.Application.Workbooks.Add()
sheet = workbook.Sheets(1)
for html in read_path.glob('*.html'):
    # Pass a plain string; COM does not understand pathlib.Path objects
    workbook_tmp = excel_app.Application.Workbooks.Open(str(html))
    workbook_tmp.Sheets(1).Copy(Before=sheet)
    workbook_tmp.Close()
# Remove the redundant 'Sheet1'
excel_app.Application.DisplayAlerts = False
sheet.Delete()
excel_app.Application.DisplayAlerts = True
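Put together as one function, the temporary-workbook approach might look like the sketch below. It assumes `excel_app` is an already created Excel `Application` object, that the new workbook's first sheet is the only placeholder to remove, and that `DisplayAlerts` is used to suppress the delete confirmation. The function name `merge_html_into_workbook` and its `save_name` parameter are hypothetical.

```python
from pathlib import Path


def merge_html_into_workbook(excel_app, read_directory,
                             save_name="Single_Workbook_Containing_HTML_Files.xlsx"):
    """Copy the first sheet of every *.html file into one new workbook.

    Returns the number of HTML files that were copied.
    """
    read_path = Path(read_directory)
    workbook = excel_app.Workbooks.Add()
    placeholder = workbook.Sheets(1)  # default sheet, removed at the end

    copied = 0
    for html in sorted(read_path.glob("*.html")):
        tmp = excel_app.Workbooks.Open(str(html))  # COM needs a str path
        tmp.Sheets(1).Copy(Before=placeholder)     # copy into the new workbook
        tmp.Close(False)                           # close without saving
        copied += 1

    if copied:
        # A workbook must keep at least one sheet, so delete the placeholder
        # only when something was copied in.
        excel_app.DisplayAlerts = False
        placeholder.Delete()
        excel_app.DisplayAlerts = True

    workbook.SaveAs(str(read_path / save_name), 51)  # 51 = xlsx format
    return copied


# Usage (requires Excel on Windows):
# import win32com.client as win32
# app = win32.gencache.EnsureDispatch('Excel.Application')
# merge_html_into_workbook(app, r'C:\path\to\html_folder')
# app.Quit()
```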
I believe pandas will make your job much easier.
pip install pandas
Here's an example on how to get multiple tables from a wikipedia html and input it into a Pandas DataFrame and save it to disk.
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_American_films_of_2017"
wikitables = pd.read_html(url, header=0, attrs={"class":"wikitable"})
for idx, df in enumerate(wikitables):
    df.to_csv('{}.csv'.format(idx), index=False)
For your use case, something like this should work:
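import pathlib as path
import pandas as pd
def save_as_xlsx(read_directory):
    read_path = path.Path(read_directory)
    save_path = read_path.joinpath('Single_Workbook_Containing_HTML_Files.xlsx')
    # One workbook for everything; each table becomes its own worksheet
    with pd.ExcelWriter(save_path) as writer:
        for html in read_path.glob('*.html'):
            dfs_from_html = pd.read_html(html, header=0)
            for idx, df in enumerate(dfs_from_html):
                # Excel limits sheet names to 31 characters
                df.to_excel(writer, sheet_name='{}_{}'.format(html.stem, idx)[:31], index=False)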
Note: make sure to pass the correct HTML attributes (the attrs argument) to pd.read_html.
How about this?
Sub From_XML_To_XL()
    'UpdatebyKutoolsforExcel20151214
    Dim xWb As Workbook
    Dim xSWb As Workbook
    Dim xStrPath As String
    Dim xFileDialog As FileDialog
    Dim xFile As String
    Dim xCount As Long
    On Error GoTo ErrHandler
    Set xFileDialog = Application.FileDialog(msoFileDialogFolderPicker)
    xFileDialog.AllowMultiSelect = False
    xFileDialog.Title = "Select a folder [Kutools for Excel]"
    If xFileDialog.Show = -1 Then
        xStrPath = xFileDialog.SelectedItems(1)
    End If
    If xStrPath = "" Then Exit Sub
    Application.ScreenUpdating = False
    Set xSWb = ThisWorkbook
    xCount = 1
    xFile = Dir(xStrPath & "\*.xml")
    Do While xFile <> ""
        Set xWb = Workbooks.OpenXML(xStrPath & "\" & xFile)
        xWb.Sheets(1).UsedRange.Copy xSWb.Sheets(1).Cells(xCount, 1)
        xWb.Close False
        xCount = xSWb.Sheets(1).UsedRange.Rows.Count + 2
        xFile = Dir()
    Loop
    Application.ScreenUpdating = True
    xSWb.Save
    Exit Sub
ErrHandler:
    MsgBox "no files xml", , "Kutools for Excel"
End Sub

Adding Excel Sheets to End of Workbook

I am trying to add excel worksheets to the end of a workbook, reserving the first sheet for a summary.
import win32com.client
Excel = win32com.client.DispatchEx('Excel.Application')
Book = Excel.Workbooks.Add()
Excel.Visible = True
Book.Worksheets(3).Delete()
Book.Worksheets(2).Delete()
Sheet = Book.Worksheets(1)
Sheet.Name = "Summary"
Book.Worksheets.Add(After=Sheet)
Sheet = Book.Worksheets(2)
Sheet.Name = "Data1"
This code adds the new sheet to the left, despite using After=Sheet, and when I modify the sheet named "Data1", it overwrites the sheet named "Summary".
This is similar to this problem:
Adding sheets to end of workbook in Excel (normal method not working?)
but the given solutions don't work for me.
Try passing Before=None explicitly, together with the last sheet as After:
add = Book.Sheets.Add(Before=None, After=Book.Sheets(Book.Sheets.Count))
add.Name = "Data1"
Try using Sheet = Book.ActiveSheet, since the newly added sheet becomes the active one:
Book.Worksheets.Add(After=Sheet)
Sheet = Book.ActiveSheet
Sheet.Name = "Data1"
import win32com.client as win32
xl = win32.gencache.EnsureDispatch('Excel.Application')
xl.Sheets.Add(After=xl.ActiveSheet).Name ="Name_of_your_Sheet"
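The "Summary" sheet in the question gets overwritten because the code assumes the new sheet sits at index 2; when Excel inserts it on the left instead, `Book.Worksheets(2)` is still the summary sheet. `Add` returns the sheet it created, so renaming that object sidesteps the index assumption. A small sketch combining this with the `Before=None` suggestion above; the helper name `add_sheet_at_end` is hypothetical:

```python
def add_sheet_at_end(book, name):
    """Append a worksheet after the current last sheet and name it.

    `book` is a COM Workbook object.
    """
    sheets = book.Worksheets
    last = sheets.Item(sheets.Count)  # COM collections are 1-based
    new_sheet = sheets.Add(Before=None, After=last)
    # Rename the object Add() returned instead of looking it up by index,
    # so an unexpected insert position cannot rename another sheet.
    new_sheet.Name = name
    return new_sheet


# Usage (requires Excel on Windows):
# Book.Worksheets(1).Name = "Summary"
# add_sheet_at_end(Book, "Data1")
# add_sheet_at_end(Book, "Data2")
```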

Python win32com - Read text in a text box to a cell?

I would like to read the text from a text box in an Excel File and save that value to a variable. The problem I am having is with the reading of the TextBox. I have tried several methods, this one showed the most promise, as it does not generate an error, but it does not elicit the desired result either. Any suggestions are appreciated. See code below.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open("C:\\users\\khillstr\\Testing\\Scripts\\Book1.xlsx")
excel.Visible = False
ws = wb.Worksheets
canvas = excel.ActiveSheet.Shapes
for shp in canvas.CanvasItems:
    if shp.TextFrame.Characters:
        print(shp.TextFrame.Characters)
    else:
        print("no")
Canvas has to do with graphics in excel files. I think you want access to the cells. Below is code that prints out each row as a tuple.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open("C:\\users\\khillstr\\Testing\\Scripts\\Book1.xlsx")
excel.Visible = False
sheet = wb.Worksheets(1)
for row in sheet.UsedRange.Value:
    print(row)
To get the text in a text box object on a sheet, use shp.TextFrame.Characters().Caption: Characters is a method that returns a Characters object, not a string.
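A sketch of reading every text box on a sheet this way. It assumes the shapes are reached through `sheet.Shapes`, calls `Characters()` as a method, and reads the `Text` property of the result (the `Caption` property mentioned above is an alternative). The helper name `text_box_contents` is hypothetical.

```python
def text_box_contents(sheet):
    """Collect the text of every shape on `sheet` that has a text frame.

    `sheet` is a COM Worksheet object.
    """
    texts = []
    shapes = sheet.Shapes
    for i in range(1, shapes.Count + 1):  # 1-based COM collection
        shp = shapes.Item(i)
        try:
            # Characters() returns a Characters object; .Text is its string.
            texts.append(shp.TextFrame.Characters().Text)
        except Exception:
            # Shapes without text (for example pictures) raise a COM error
            # here; this sketch simply skips them.
            continue
    return texts


# Usage (requires Excel on Windows):
# value = text_box_contents(wb.Worksheets(1))
```

The broad `except` is a simplification for the sketch; real code may want to catch the specific COM error type instead.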
import win32com.client as win32
file_name = 'path_to_excel'
row_num, col_num = 1, 1  # example coordinates (row 1, column 1)
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(file_name)
excel.Visible = False
sheet = wb.Worksheets(1)
deep = lambda r, c: sheet.Cells(r, c).Value  # read the value of one cell
print(deep(row_num, col_num))
excel.Application.Quit()
This code opens the Excel file located at 'path_to_excel' and reads the cell at (Row_Number = row_num, Column_Number = col_num).
