Python openpyxl loses hyperlinks when modifying existing files - python

Strangely, when I load an existing Excel file with openpyxl and save it again, the hyperlinks in the file disappear.
Both openpyxl 1.7.2 and the newest 1.8.5 have this problem.
Can anyone help with this problem?
Or is there a better choice than openpyxl?
I know about xlrd/xlwt and XlsxWriter, but xlwt doesn't support .xlsx files, and XlsxWriter can't read existing files. I need to modify a file many times in my application.
[UPDATED]: Look here. It seems this is a bug that has not been fixed yet?
The following code may be helpful for your test.
#-*- coding: utf-8 -*-
import openpyxl

def create():
    wb = openpyxl.Workbook()
    ws = wb.worksheets[0]
    ws.cell('A1').value = 'Click Me'
    ws.cell('A1').hyperlink = 'http://www.google.com'
    wb.save('test1.xlsx')

def rewrite():
    wb = openpyxl.load_workbook('test1.xlsx')
    ws = wb.worksheets[0]
    wb.save('test2.xlsx')

if __name__ == '__main__':
    create()
    rewrite()
[2017-03-07 UPDATED]: The bug has been fixed, and the problem does not exist any more.

Try to use the HYPERLINK function in Excel. That results in a formula and not a value in that cell, but from a user's standpoint it most probably makes no difference:
ws.cell('A1').value = '=HYPERLINK("http://www.google.com","Click Me")'

As an addendum to Cedric's answer: if you want to use Excel's built-in HYPERLINK function directly, you can build the formula like this:
'=HYPERLINK("{}", "{}")'.format(link, "Link Name")
Without this formatting, Excel could not open the file without repairing it, and the repair removed the values of the cells with the attempted hyperlinks.
e.g. ws.cell(row=1, column=1).value = '=HYPERLINK("{}", "{}")'.format(link, "Link Name")
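The formula approach from these answers can be wrapped in a small helper. This is only a sketch: hyperlink_formula and write_link are hypothetical names, not part of openpyxl, and write_link assumes a recent openpyxl where cells can be assigned as ws["A1"]. Doubling embedded double quotes is how Excel escapes quotes inside string literals, which may avoid the "needs repair" symptom when a label or URL contains a quote.

```python
def hyperlink_formula(url, label):
    """Build an Excel HYPERLINK formula string (hypothetical helper)."""
    # Excel string literals escape a double quote by doubling it.
    safe_url = url.replace('"', '""')
    safe_label = label.replace('"', '""')
    return '=HYPERLINK("{}","{}")'.format(safe_url, safe_label)


def write_link(path, url, label):
    """Write one link formula to A1 of a new workbook (assumes openpyxl is installed)."""
    import openpyxl  # imported lazily so hyperlink_formula works without it

    wb = openpyxl.Workbook()
    ws = wb.active
    # A string starting with "=" is stored by openpyxl as a formula.
    ws["A1"] = hyperlink_formula(url, label)
    wb.save(path)
```

For example, write_link('test1.xlsx', 'http://www.google.com', 'Click Me') would produce the same cell content as the one-liner in the first answer.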

Related

How to fetch the value of an Excel cell that contains a formula? The openpyxl data_only flag doesn't work properly

I have an automation script in which openpyxl writes some data into an Excel file.
That Excel file contains some formulas.
In the next step I want to fetch the computed value of a formula cell in Python using openpyxl or pandas, but openpyxl returns None and pandas returns NaN.
I know about xlwings, but unfortunately xlwings doesn't work on Linux.
If there is any other workaround that works on Linux, please let me know. Thanks in advance.
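The xlrd module can read the cell value even if the cell contains a formula. Try the code below. (Note that xlrd 2.0 and later reads only .xls files, so opening an .xlsx file this way requires an older xlrd version.)
Mark it as solved if it works.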
import xlrd
book = xlrd.open_workbook("excel.xlsx")
sheet = book.sheet_by_index(0)
value= sheet.cell_value(1, 1)
print(value)
You probably need to save the document first and then reopen it. You could try using xlwings or the win32 module to save as.

Python Openpyxl - Add column in write_only spreadsheet

I'm using Python and the openpyxl library, but I'm not able to use the insert_cols() function when my spreadsheet is in write_only=True mode. Basically, I just want to add a new column to my spreadsheet while it is in write_only=True mode.
I'm able to use insert_cols() when loading the workbook with load_workbook(), but not in write_only mode. I have to use write_only mode because my spreadsheets are quite large.
Any ideas on how to add a new column are appreciated.
Thank you.
This is my code:
import openpyxl
from openpyxl import Workbook
from openpyxl import load_workbook

wb = load_workbook(filename=r'path\myExcel.xlsx', read_only=True)
ws = wb['PC Details']
wb_output = Workbook(write_only=True)
ws_output = wb_output.create_sheet(title='PC Details')
for row in ws.rows:
    rowInCorrectFormat = [cell.value for cell in row]
    ws_output.append(rowInCorrectFormat)
    for cell in row:
        print(cell.value)
### THIS IS THE PART OF THE CODE WHICH DOES NOT WORK
ws_output.insert_cols(12)
ws_output['L5'] = 'OK or NOT GOOD?'
###
wb_output.save(r'path\test_Output_optimized.xlsx')
This is the exact error that I'm getting:
ws_output.insert_cols(12)
AttributeError: 'WriteOnlyWorksheet' object has no attribute 'insert_cols'
The problem here lies in the write_only=True flag. Workbooks created with this flag are different from regular workbooks, as the documentation linked below explains.
Methods such as insert_cols and insert_rows are not available on such workbooks.
Possible solutions are to not use this flag, or to add data to the sheet in the way the official documentation suggests.
For working with workbooks in general, you might also find this article interesting: https://medium.com/aubergine-solutions/working-with-excel-sheets-in-python-using-openpyxl-4f9fd32de87f
You can read more in the official documentation: https://openpyxl.readthedocs.io/en/stable/optimized.html
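Since a write-only worksheet only accepts rows through append(), one way to get the effect of insert_cols() is to add the new value to each row before appending it. The following is a minimal sketch of that idea, not an openpyxl feature: with_inserted_column and copy_with_column are hypothetical names, and the second function assumes openpyxl 2.6 or later (for values_only).

```python
def with_inserted_column(row_values, index, value):
    """Return a copy of row_values with value inserted at a 0-based index.

    Rows shorter than index are padded with None so the new value
    always lands in the intended column.
    """
    row = list(row_values)
    if len(row) < index:
        row.extend([None] * (index - len(row)))
    row.insert(index, value)
    return row


def copy_with_column(src_path, dst_path, sheet_name, column, value_for_row):
    """Stream a sheet into a write-only workbook, adding one column.

    column is 1-based like insert_cols(); value_for_row is a callable that
    receives the 1-based row number and returns the new cell value.
    Assumes openpyxl is installed.
    """
    from openpyxl import Workbook, load_workbook

    wb_in = load_workbook(filename=src_path, read_only=True)
    ws_in = wb_in[sheet_name]
    wb_out = Workbook(write_only=True)
    ws_out = wb_out.create_sheet(title=sheet_name)
    for row_number, row in enumerate(ws_in.iter_rows(values_only=True), start=1):
        new_value = value_for_row(row_number)
        ws_out.append(with_inserted_column(row, column - 1, new_value))
    wb_out.save(dst_path)
    wb_in.close()
```

For the question's case, passing column=12 and a callable such as lambda n: 'OK or NOT GOOD?' if n == 5 else None would put the text in L5 and shift the original columns from L onward one position to the right.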

Python openpyxl data_only=True returning None

I have a simple excel file:
A1 = 200
A2 = 300
A3 = =SUM(A1:A2)
This file works in Excel and shows the proper value for the SUM, but with the openpyxl module for Python I cannot get the value in data_only=True mode.
Python code from shell:
wb = openpyxl.load_workbook('writeFormula.xlsx', data_only = True)
sheet = wb.active
sheet['A3']
<Cell Sheet.A3> # python response
print(sheet['A3'].value)
None # python response
while:
wb2 = openpyxl.load_workbook('writeFormula.xlsx')
sheet2 = wb2.active
sheet2['A3'].value
'=SUM(A1:A2)' # python response
Any suggestions as to what I am doing wrong?
It depends upon the provenance of the file. data_only=True depends upon the value of the formula being cached by an application like Excel. If, however, the file was created by openpyxl or a similar library, then it's probable that the formula was never evaluated and, thus, no cached value is available and openpyxl will report None as the value.
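The cached value mentioned here lives in the file itself: an .xlsx file is a zip archive, and each formula cell in a worksheet part is stored as a c element with an f child (the formula) and, if an application has calculated it, a v child (the cached result). The sketch below uses only the standard library to list formula cells that have no cached result. The function name is hypothetical, and the default part name xl/worksheets/sheet1.xml is an assumption that holds for many files but is not guaranteed.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def formulas_without_cached_value(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return references of formula cells that have no cached <v> value.

    xlsx may be a path or a file-like object. sheet_part is an assumption:
    the first sheet is often stored under this name, but the workbook
    relationships are the authoritative source.
    """
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read(sheet_part))
    missing = []
    for cell in root.iterfind(".//m:c", NS):
        has_formula = cell.find("m:f", NS) is not None
        value = cell.find("m:v", NS)
        # An absent or empty <v> means no application has stored a result.
        if has_formula and (value is None or not (value.text or "").strip()):
            missing.append(cell.get("r"))
    return missing
```

If this returns a non-empty list for a file, data_only=True would be expected to give None for those cells until the file has been opened and saved by Excel or another application that calculates formulas.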
I have replicated the issue with Openpyxl and Python.
I am currently using openpyxl version 2.6.3 and Python 3.7.4. Also I am assuming that you are trying to complete an exercise from ATBSWP by Al Sweigart.
I tried and tested Charlie Clark's answer, considering that Excel may indeed cache values. I opened the spreadsheet in Excel, copied and pasted the formula into the same exact cell, and finally saved the workbook. Upon reopening the workbook in Python with Openpyxl with the data_only=True option, and reading the value of this cell, I saw the proper value, 500, instead of the wrong value, the None type.
I hope this helps.
I had the same issue. This may not be the most elegant solution, but this is what worked for me:
import xlwings
from openpyxl import load_workbook
excel_app = xlwings.App(visible=False)
excel_book = excel_app.books.open('writeFormula.xlsx')
excel_book.save()
excel_book.close()
excel_app.quit()
workbook = load_workbook(filename='writeFormula.xlsx', data_only=True)
I have a suggestion for this problem: convert the xlsx file to csv :).
You will still have the original xlsx file. The conversion is done by LibreOffice (that is the subprocess.call() line). You can also use pandas for this as a more Pythonic way.
from subprocess import call
from openpyxl import load_workbook
from csv import reader

filename = "test"
wb = load_workbook(filename + ".xlsx")
spread_range = wb['Sheet1']
# whatever formula there is in cell A1 to be evaluated
print(spread_range.cell(row=1, column=1).value)
wb.close()
# this line can be done with subprocess or os.system()
# libreoffice --headless --convert-to csv $filename --outdir $outdir
call("libreoffice --headless --convert-to csv " + filename + ".xlsx", shell=True)
with open(filename + ".csv", newline='') as f:
    # use a separate name so the imported reader function is not shadowed
    csv_reader = reader(f)
    data = list(csv_reader)
print(data[0][0])
or
# importing pandas as pd
import pandas as pd
# read an excel file and convert
# into a dataframe object
df = pd.DataFrame(pd.read_excel("Test.xlsx"))
# show the dataframe
df
I hope this helps somebody :-)
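A variant of the LibreOffice call above that avoids shell=True, so that file names containing spaces are passed through unchanged. This is a sketch under assumptions: the function names are hypothetical, the flags are the ones quoted in the answer, and the executable may be called soffice rather than libreoffice on some installations.

```python
import subprocess
from pathlib import Path


def build_convert_command(xlsx_path, outdir, binary="libreoffice"):
    """Build the headless conversion command as an argument list."""
    return [
        binary,
        "--headless",
        "--convert-to", "csv",
        "--outdir", str(outdir),
        str(xlsx_path),
    ]


def convert_to_csv(xlsx_path, outdir="."):
    """Run LibreOffice and return the path where the CSV is expected."""
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(build_convert_command(xlsx_path, outdir), check=True)
    return Path(outdir) / (Path(xlsx_path).stem + ".csv")
```

The returned path can then be opened with csv.reader exactly as in the answer.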
Yes, @Beno is right. If you want to edit the file without touching it yourself, you can make a little "robot" that edits your Excel file.
WARNING: This is a roundabout way to edit the Excel file. These libraries depend on your machine, so make sure you set time.sleep properly before continuing with the rest of the code.
For instance, I use time.sleep, subprocess.Popen, and pywinauto.keyboard.send_keys to add a random character to any cell you choose and then save the file. After that, data_only=True works perfectly.
For more info about pywinauto.keyboard: pywinauto.keyboard
# import these stuff
import subprocess
from pywinauto.keyboard import send_keys
import time
import pygetwindow as gw
import pywinauto
excel_path = r"C:\Program Files\Microsoft Office\root\Office16\EXCEL.EXE"
excel_file_path = r"D:\test.xlsx"
def focus_to_window(window_title=None):  # function to focus a window. https://stackoverflow.com/a/65623513/8903813
    window = gw.getWindowsWithTitle(window_title)[0]
    if not window.isActive:
        pywinauto.application.Application().connect(handle=window._hWnd).top_window().set_focus()

subprocess.Popen([excel_path, excel_file_path])
time.sleep(1.5)  # wait for Excel to open. Depends on your machine, set it properly
focus_to_window("Excel")  # focus on the opened file
send_keys('%{F3}')  # Excel's name box | ALT+F3
send_keys('AA1{ENTER}')  # whichever cell you want to insert something into | Type 'AA1' then press Enter
send_keys('Stackoverflow.com')  # put whatever you want | Type 'Stackoverflow.com'
send_keys('^s') # save | CTRL+S
send_keys('%{F4}') # exit | ALT+F4
print("Done")
Sorry for my bad english.
As others have already mentioned, openpyxl only reads the cached formula value in data_only mode. I have used PyWin32 to open and save each XLSX file before it is processed by openpyxl, so that the formula result values can be read. This works well for me, as I don't process large files. This solution works only if you have MS Excel installed on your PC.
import os
import win32com.client
from openpyxl import load_workbook
# Opening and saving the XLSX file, so the result of each stored formula is evaluated and cached and openpyxl can read it.
excel_file = os.path.join(path, file)
excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
excel.DisplayAlerts = False # disabling prompts to overwrite existing file
excel.Workbooks.Open(excel_file )
excel.ActiveWorkbook.SaveAs(excel_file, FileFormat=51, ConflictResolution=2)
excel.DisplayAlerts = True # enabling prompts
excel.ActiveWorkbook.Close()
wb = load_workbook(excel_file)
# read your formula values with openpyxl and do other stuff here
I ran into the same issue. After reading through this thread I managed to fix it by simply opening the excel file, making a change then saving the file again. What a weird issue.

How to append to an existing excel sheet with XLWT in Python

I have created an excel sheet using XLWT plugin using Python. Now, I need to re-open the excel sheet and append new sheets / columns to the existing excel sheet. Is it possible by Python to do this?
After investigation today, (2014-2-18) I cannot see a way to read in a XLS file using xlwt. You can only write from fresh. I think it is better to use openpyxl. Here is a simple example:
from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.create_sheet()
ws.title = 'Pi'
ws['F5'] = 3.14156265
# openpyxl writes the .xlsx format, so save with that extension
wb.save(filename=r'C:\book2.xlsx')
# Re-opening the file:
wb_re_read = load_workbook(filename=r'C:\book2.xlsx')
sheet = wb_re_read['Pi']
print(sheet['F5'].value)
See other examples here: http://pythonhosted.org/openpyxl/usage.html (where this modified example is taken from)
You read in the file using xlrd, and then 'copy' it to an xlwt Workbook using xlutils.copy.copy().
Note that you'll need to install both xlrd and xlutils libraries.
Note also that not everything gets copied over. Things like images and print settings are not copied, for example, and have to be reset.
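The xlrd plus xlutils route described above could look roughly like the following. It is a sketch, assuming xlrd, xlwt and xlutils are installed and that the file is a real .xls workbook; append_sheet_to_xls and iter_cells are hypothetical names.

```python
def iter_cells(rows):
    """Yield (row_index, column_index, value) for a list of rows."""
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            yield r, c, value


def append_sheet_to_xls(path, sheet_name, rows):
    """Reopen an existing .xls file and add a new sheet filled from rows."""
    import xlrd
    from xlutils.copy import copy

    # formatting_info=True asks xlrd to keep cell formatting for the copy.
    book_in = xlrd.open_workbook(path, formatting_info=True)
    book_out = copy(book_in)  # an xlwt.Workbook containing the existing sheets
    sheet = book_out.add_sheet(sheet_name)
    for r, c, value in iter_cells(rows):
        sheet.write(r, c, value)
    book_out.save(path)
```

To add columns to an existing sheet rather than a new sheet, the copied workbook's get_sheet(index) should return a writable sheet whose write(row, col, value) can target cells beyond the current data.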

Extracting Hyperlinks From Excel (.xlsx) with Python

I have been looking at mostly the xlrd and openpyxl libraries for Excel file manipulation. However, xlrd currently does not support formatting_info=True for .xlsx files, so I can not use the xlrd hyperlink_map function. So I turned to openpyxl, but have also had no luck extracting a hyperlink from an excel file with it. Test code below (the test file contains a simple hyperlink to google with hyperlink text set to "test"):
import openpyxl
wb = openpyxl.load_workbook('testFile.xlsx')
ws = wb.get_sheet_by_name('Sheet1')
r = 0
c = 0
print ws.cell(row = r, column = c). value
print ws.cell(row = r, column = c). hyperlink
print ws.cell(row = r, column = c). hyperlink_rel_id
Output:
test
None
I guess openpyxl does not currently support formatting completely either? Is there some other library I can use to extract hyperlink information from Excel (.xlsx) files?
This is possible with openpyxl:
import openpyxl
wb = openpyxl.load_workbook('yourfile.xlsm')
ws = wb['Sheet1']
# This will fail if there is no hyperlink to target
print(ws.cell(row=2, column=1).hyperlink.target)
This bug https://bitbucket.org/openpyxl/openpyxl/issue/152/hyperlink-returns-empty-string-instead-of was fixed at least as of openpyxl-2.4.0b1. The hyperlink attribute of a cell now returns a Hyperlink object:
hl_obj = ws.cell(row=r, column=c).hyperlink  # getting the Hyperlink object for the cell
if hl_obj:
    print(hl_obj.display)
    print(hl_obj.target)
    print(hl_obj.tooltip)  # you can see it when hovering the mouse over the hyperlink in Excel
    print(hl_obj)  # to see other stuff if you need
FYI, the problem with openpyxl is an actual bug.
And, yes, xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx.
In my experience, getting good .xlsx interaction requires moving to IronPython. This lets you work with the Common Language Runtime (clr) and interact directly with Excel.
http://ironpython.net/
import clr
clr.AddReference("Microsoft.Office.Interop.Excel")
import Microsoft.Office.Interop.Excel as Excel
excel = Excel.ApplicationClass()
wb = excel.Workbooks.Open('testFile.xlsx')
ws = wb.Worksheets['Sheet1']
address = ws.Cells(row, col).Hyperlinks.Item(1).Address
A successful solution I've worked with is to install unoconv on the server and implement a method that invokes this command line tool via the subprocess module to convert the file from xlsx to xls, since hyperlink_map.get() works with xls.
For direct manipulation of Excel files it's also worth looking at the excellent XlWings library.
import openpyxl

wb = openpyxl.load_workbook('yourfile.xlsx')
ws = wb['Sheet1']
try:
    print(ws.cell(row=2, column=1).hyperlink.target)
    # This fails if there is no hyperlink
except AttributeError:
    print(ws.cell(row=2, column=1).value)
To handle the exception 'message': "'NoneType' object has no attribute 'target'", we can wrap the access in a try/except block. Then, even if the given cell has no hyperlink, the code prints the content of the cell.
Instead of just .hyperlink, using .hyperlink.target should work. I was also getting None from using just ".hyperlink" on the cell object before that.
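The same fallback can be written without try/except by checking the hyperlink attribute first. This is a minimal sketch with hypothetical helper names; column_links assumes openpyxl 2.4 or later, where cell.hyperlink is either None or a Hyperlink object.

```python
def link_or_value(cell):
    """Return the hyperlink target of a cell, or its value if it has no link."""
    link = getattr(cell, "hyperlink", None)
    # target can be empty for links that point inside the workbook,
    # in which case the cell value is returned instead.
    if link is not None and link.target:
        return link.target
    return cell.value


def column_links(path, sheet_name, column=1):
    """Collect link targets (or plain values) from one column.

    Assumes openpyxl is installed; column is 1-based.
    """
    import openpyxl

    ws = openpyxl.load_workbook(path)[sheet_name]
    rows = ws.iter_rows(min_col=column, max_col=column)
    return [link_or_value(row[0]) for row in rows]
```

With the test file from the question, column_links('testFile.xlsx', 'Sheet1') would be expected to return the Google URL for the linked cell and the plain text for any other cell in column A.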
