Python, openpyxl: I get the wrong value when running get_highest_column() - python

I am practicing with openpyxl and I'm working on an Excel file called 'test.xlsx'. The file only has 3 columns and 7 rows. The .xlsx file was created with LibreOffice.
When I run...
>>> #! python3
>>> import openpyxl
>>> wb = openpyxl.load_workbook('test.xlsx')
>>> sheet = wb.get_sheet_by_name('Sheet1')
>>> sheet.get_highest_column()
1025
The returned value should be 3.
A quick Google search suggested I run:
>>> sheet.calculate_dimension()
and got the return value:
'A1:AMK7'
This should only be 'A1:C7'.
I remember reading that LibreOffice could be part of the problem.
However, I can't switch to MSOffice, and I hate OpenOffice.
Is there any suggestion on how I could fix this, or work around it?
Thanks!

It sounds like you're using older versions of LibreOffice and openpyxl. LibreOffice used to set a default value of "A1:AMK7" for the dimensions, but version 5 doesn't seem to do that any more. openpyxl used to rely on the dimensions tag when reading files but hasn't done so for a while. Please try using openpyxl 2.3-b2.
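To see where the 1025 comes from: the stale dimension 'A1:AMK7' ends in column AMK, and Excel column letters are a bijective base-26 numbering. Below is a minimal, stdlib-only sketch of that conversion. The helper name column_index is made up for illustration; openpyxl provides its own equivalent in openpyxl.utils (column_index_from_string).

```python
def column_index(letters):
    """Convert Excel column letters (e.g. 'C' or 'AMK') to a 1-based index."""
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError("not a column letter: %r" % ch)
        # Each letter is one base-26 digit, with A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


print(column_index("C"))    # 3, the real last column
print(column_index("AMK"))  # 1025, the value reported from the stale dimension
```

So a return value of 1025 suggests the value was taken from the file's stored dimension rather than from the cells that actually hold data.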

Related

wrap_text with openpyxl. How to use documentation to resolve deprecation warning?

I run the following openpyxl command to wrap text in all rows after row 9. It works fine but throws a deprecation warning. I'd love to figure out how to use documentation such as https://openpyxl.readthedocs.io/en/stable/ to determine the current, non-deprecated, way to wrap_text. But I always find the documentation confusing and unhelpful to me. For example, if I search for wrap_text I get this: https://openpyxl.readthedocs.io/en/stable/api/openpyxl.styles.alignment.html#openpyxl.styles.alignment.Alignment.wrapText
But that tells me nothing about how to wrap text. Do I simply not know how to use the documentation? Is there some great mystery I am to unravel so I don't have to endlessly google how to use openpyxl? How does one look at such documentation and figure out how to wrap_text in a cell?
Here is the code:
from openpyxl import load_workbook
from openpyxl.styles import Alignment
file1 = "C:\\folder\\inputFile1.xlsx"
wb = load_workbook(file1)
ws = wb.active
for rows in ws.iter_rows(min_row=10, max_row=None, min_col=None, max_col=None):
    for cell in rows:
        # this .copy() call is what triggers the deprecation warning
        cell.alignment = cell.alignment.copy(wrapText=True)
# both backslashes doubled; a single "\f" would be read as a form feed
wb.save('C:\\folder\\file1_wrap.xlsx')
here is the deprecation warning:
C:\Users\Jcurran\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:10: DeprecationWarning: Call to deprecated function copy (Use copy(obj) or cell.obj = cell.obj + other).
Remove the CWD from sys.path while we load stuff.
How might I figure out the way to find the information required to use the current (non-deprecated) approach to wrapping text in cells via the documentation at https://openpyxl.readthedocs.io/en/stable/?
I am using Jupyter for my environment. Shift tab or tab doesn't give me anything useful.
Any suggestions? I crave self sufficiency but can't grasp how to navigate the documentation for answer. There must be some clue somewhere? Some source code perhaps that I do not know how to locate?
I learned that I did not have the latest openpyxl version. pip install openpyxl installed 2.5. I upgraded it to 3.0.
Now when I look at https://openpyxl.readthedocs.io/en/stable/index.html it makes more sense :-)
I now know that the "Working with Styles" section of the openpyxl 3.0 documentation is the place to go for formatting data.
So I click that, and go to https://openpyxl.readthedocs.io/en/stable/styles.html
That page shows me this:
>>> alignment=Alignment(horizontal='general',
...                     vertical='bottom',
...                     text_rotation=0,
...                     wrap_text=False,
...                     shrink_to_fit=False,
...                     indent=0)
and I could use that info to wrap text with this line:
cell.alignment = Alignment(wrapText=True)
Now things are starting to make sense for me. :-) Thanks!

Failing to open an Excel file with Python

I'm on a Debian GNU/Linux computer, working with Python 2.7.9.
As part of my job, I have been writing Python scripts that read inputs in various formats (e.g. Excel, CSV, TXT) and parse the information into more standardized files. It's not my first time opening or working with Excel files.
There's a particular file which is giving me problems, I just can't open it. When I tried with xlrd (version 0.9.3), it gave me the following error:
xlrd.open_workbook('sample.xls')
XLRDError: Unsupported format, or corrupt file: BOF not
workbook/worksheet: op=0x0009 vers=0x0002 strm=0x000a build=0 year=0
-> BIFF21
I tried to investigate the matter on my own, found a couple of answers in StackOverflow but I couldn't open it anyway. This particular answer I found may be the problem (the second explanation), but it doesn't include a workaround: https://stackoverflow.com/a/16518707/4345659
A tool that could convert the file to csv/txt would also solve the problem.
I already tried with:
xlrd
openpyxl
xlsx2csv (the shell tool)
A sample file is available here:
https://ufile.io/r4m6j
As a side note, I can open it with LibreOffice Calc and MS Excel, so I could eventually change it to csv that way. The thing is, I need to do it all with a python script.
Thanks in advance!
It seems like an MS problem. The xls file is very strange; maybe you should contact xlrd support.
But I have a crazy workaround for you: xls2ods. It works for me even though xls2csv doesn't (sic!).
So, install catdoc first:
$ sudo apt-get install catdoc
Then convert your xls file to ods and open ods using pyexcel_ods or whatever you prefer. To use pyexcel_ods install it first using pip install pyexcel_ods.
import subprocess
from pyexcel_ods import get_data
file_basename = 'sample'
returncode = subprocess.call(['xls2ods', '{}.xls'.format(file_basename)])
if returncode > 0:
    # consider using subprocess.Popen if you need more control over stderr
    exit(returncode)
data = get_data('{}.ods'.format(file_basename))
print(data)
I'm getting following output:
OrderedDict([(u'sample',
[[u'labo',
u'codfarm',
u'farmacia',
u'direccion',
u'localidad',
u'nom_medico',
u'matricula',
u'troquel',
u'producto',
u'cant_total']])])
Here is a kludge I would use:
Assuming you have LibreOffice on Debian, you could either convert all your *.xls files into *.csv using:
import os
os.system("libreoffice --headless --convert-to csv *.xls")
# or use subprocess.call
... and then work consistently with csv.
Or you could convert only the corrupted file(s) when needed using a try/except block:
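import os
import xlrd
from xlrd import XLRDError

try:
    xlrd.open_workbook('sample.xls')
except XLRDError:
    # fall back to a LibreOffice conversion when xlrd rejects the file
    os.system("libreoffice --headless --convert-to csv sample.xls")
    # mycsv = open("sample.csv", "r")
    # for line in mycsv.readlines():
    #     ...
    #     ...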
OBS: Keep LibreOffice closed while running the script.
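The same conversion can be driven through subprocess instead of os.system, which avoids shell quoting problems and raises an error when the conversion fails. This is only a sketch under a few assumptions: it targets Python 3 (shutil.which does not exist in 2.7), the binary is reachable as "libreoffice" on PATH, and the helper names build_convert_command and convert_to_csv are made up for illustration.

```python
import os
import shutil
import subprocess


def build_convert_command(xls_path, out_dir, binary="libreoffice"):
    """Return the argument list for a headless LibreOffice CSV conversion."""
    return [binary, "--headless", "--convert-to", "csv",
            "--outdir", out_dir, xls_path]


def convert_to_csv(xls_path, out_dir="."):
    """Run the conversion and return the path where the CSV is expected."""
    if shutil.which("libreoffice") is None:
        raise RuntimeError("libreoffice not found on PATH")
    # check_call raises CalledProcessError on a non-zero exit status
    subprocess.check_call(build_convert_command(xls_path, out_dir))
    base = os.path.splitext(os.path.basename(xls_path))[0]
    return os.path.join(out_dir, base + ".csv")
```

Calling convert_to_csv('sample.xls') would then take the place of the os.system line inside the except block.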
Alternatively there are other tools out there to do the conversion. Here is one (which I have not tested): https://github.com/dilshod/xlsx2csv
If you are targeting Windows, have Excel installed, and are familiar with Excel VBA, you will have a quick solution using the comtypes package:
http://pythonhosted.org/comtypes/
You will have direct access to Excel by its COM interfaces.
This code opens an xls file and saves it as a csv file, using the comtypes package:
import comtypes.client as cl
progId = "Excel.Application.15"
xl = cl.CreateObject(progId)
wb = xl.Workbooks.Open(r"C:\Users\aUser\Desktop\thermoList.xls")
wb.SaveAs(r"C:\Users\aUser\Desktop\thermoList.csv",FileFormat=6)
xl.DisplayAlerts = False
xl.Quit()
I could not test it with "sample.xls", which is corrupt.
You could try with another file.
You might need to adjust the progId according to your version of Excel.
It's a file format issue. I'm not sure what file type it is, but it's not Excel. I just opened the file, saved it as sample2.xls, and compared the types:
How are you creating this file?
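One way to check the file type from Python is to look at the first bytes of the file. The sketch below is hedged: the OLE2 and ZIP signatures are the well-known container signatures for .xls and .xlsx, while the "bare BIFF2 stream" check is an assumption based on the BOF opcode 0x0009 reported in the xlrd error above. The function names are made up for illustration.

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 container used by regular .xls files
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx files are ZIP archives


def sniff_spreadsheet(header):
    """Guess the container type from the first bytes of a file."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    # Assumption: a raw BIFF2 stream starts with a BOF record, opcode 0x0009
    # stored little-endian.
    if header[:2] == b"\x09\x00":
        return "biff2-stream"
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_spreadsheet(fh.read(8))
```

If sniff_file('sample.xls') reports something other than "ole2", that would be consistent with the file not being a regular Excel workbook.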
If you need to get the words as a list of strings:
text_file = open("sample.xls", "r")
lines = text_file.read().replace(chr(200), '').replace(chr(0), '').replace(chr(1), '').replace(chr(5), '').replace(chr(2), '').replace(chr(3), '').replace(chr(4), '').replace(chr(6), '').replace(chr(7), '').replace(chr(8), '').replace(chr(9), '').replace(chr(10), '').replace(chr(12), '').replace(chr(15), '').replace(chr(16), '').replace(chr(17), '').replace(chr(18), '').replace(chr(49), '').replace('Arial', '')
for line in lines.split(chr(128)):
    print(line)
the output:
The file you provided is corrupted, so there is no way for other responders to test it and recommend a good solution. The exception you posted confirms that.
As a solution you can try to debug a few things; see the steps below:
You mentioned you tried the xlrd library. Check whether your xlrd module is up to date by executing this:
Python 2.7.9
>>> import xlrd
>>> xlrd.__VERSION__
Update to the latest official version if needed.
Try to open any other *.xls file and see if it works with the Python version and library you're using.
Check the module documentation. It's pretty good, and it describes how to use the module on various platforms (Windows vs. Linux): http://xlrd.readthedocs.io/en/latest/dates.html
You can always reach out to the community (there is still a chance that you are hitting some weird state or bug): https://github.com/python-excel/xlrd/issues
Hope that helps.
I was unable to open your Excel file either. As yadayada said, I think the problem is the data source. If you really want to figure out the reason, I suggest asking about the Excel file itself rather than about Python.
This always works for me with any xls or xlsx file:
import xlrd
import unicodecsv

def csv_from_excel(filename_xls, filename_csv):
    # replace 'cp1251' with the actual encoding of your file
    wb = xlrd.open_workbook(filename_xls, encoding_override='cp1251')
    sh = wb.sheet_by_index(0)
    your_csv_file = open(filename_csv, 'wb')
    wr = unicodecsv.writer(your_csv_file)
    for rownum in xrange(sh.nrows):  # Python 2; use range() on Python 3
        wr.writerow(sh.row_values(rownum))
    your_csv_file.close()
So I don't work with Excel files directly; I convert them to csv first. Maybe this will help you.

Add icon sets to an existing excel file with python

I'm trying to add some icon sets to an existing excel file using python.
The Excel file is written using xlsxwriter. As xlsxwriter does not support icon sets, I close the file, reopen it with openpyxl, add the icon sets and save it again. The problem is that I lose all conditional formatting added previously. Opening the file in openpyxl with "keep_vba=True" results in a non-readable xlsx file.
Any ideas how to achieve this?
Thanks in advance!
P.S.: Missed some details. Sorry for that. I write xlsx files in both cases (xlsxwriter and openpyxl) and use python 2.7 and the latest versions of openpyxl and xlsxwriter on a windows machine with excel 2013. Icon sets are little symbols like arrows (up, down) which can be used in conditional formatting.
openpyxl supports conditional formatting and icon sets.
See the official documentation: Conditional Formatting > IconSet
Here is an example:
>>> from openpyxl.formatting.rule import IconSet, FormatObject
>>> first = FormatObject(type='percent', val=0)
>>> second = FormatObject(type='percent', val=33)
>>> third = FormatObject(type='percent', val=67)
>>> iconset = IconSet(iconSet='3TrafficLights1', cfvo=[first, second, third], showValue=None, percent=None, reverse=None)
>>> # assign the icon set to a rule
>>> from openpyxl.formatting.rule import Rule
>>> rule = Rule(type='iconSet', iconSet=iconset)

Python 2.7 Openpyxl UserWarning

Why do I receive this warning message every time I run my code? (below). Is it possible to get rid of it? If so, how do I do that?
My code:
from openpyxl import load_workbook
from openpyxl import Workbook
wb = load_workbook('NFL.xlsx', data_only = True)
ws = wb.active
sh = wb["Sheet1"]
ptsDiff = (sh['J127'].value)
print ptsDiff
The code works but I get this warning message:
Warning (from warnings module):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/openpyxl/reader/worksheet.py", line 320
warn(msg)
UserWarning: Unknown extension is not supported and will be removed
This warning appears when openpyxl cannot understand/read an extension (source). Here is the list of built-in extensions that openpyxl currently knows it doesn't support:
Conditional Formatting
Data Validation
Sparkline Group
Slicer List
Protected Range
Ignored Error
Web Extension
Slicer List
Timeline Ref
Also see the Worksheet extension list specification.
Try to add single quotes to your data_only parameter like this:
wb = load_workbook('NFL.xlsx', data_only = 'True')
This works for me.
Using Python 3.5 under Anaconda3, Excel 2016, Windows 10: I had the same problem initially with an xlsx file. I tried to convert it to csv and that did not work. What worked was: select the entire spreadsheet, copy it into Notepad, select the Notepad text, paste it into a new spreadsheet, and save as xlsx. It looks like any extra formatting results in a warning.
The first answer already explains what is wrong. If you only want to get rid of the message (which is displayed in red for some reason), you can go to the file location given in the warning and comment out (#) the line that says warn(msg). This stops the message from being displayed, and the code still works fine in my experience. I am not sure if this will still work once the code is compiled, but it should work on the same machine.
PS: I had the same error, and this is what I did because I thought it could be confusing for the end user.
PS: You can use a try/except error catcher too, but this is quicker.
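An alternative that leaves the installed library untouched is to filter the message with the standard warnings module. The sketch below is self-contained: noisy_loader is a made-up stand-in that emits the same UserWarning text as in the question. In real code the load_workbook call would sit inside the with block instead.

```python
import warnings


def noisy_loader():
    """Stand-in for load_workbook: emits the same kind of UserWarning."""
    warnings.warn("Unknown extension is not supported and will be removed",
                  UserWarning)
    return "workbook"


# record=True collects whatever still gets through, so the effect is visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only this message; other warnings are still reported.
    warnings.filterwarnings("ignore", message="Unknown extension",
                            category=UserWarning)
    result = noisy_loader()

print(result, len(caught))  # workbook 0
```

Because the filter is installed inside catch_warnings, it is removed again when the with block ends, so the rest of the program keeps its normal warning behaviour.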

Use Python to Write VBA Script?

This might be a bit of a stretch, but is it possible for a Python script to create VBA in MS Excel (or any other MS Office product that uses VBA) using pythonwin or any other module?
Where this idea came from was Python's openpyxl module's inability to do column autowidth. The script I have creates a workbook in memory and eventually saves it to disk. There are quite a few sheets and within each sheet, there are quite a few columns. I got to thinking: what if I just use Python to import a VBA script (saved somewhere in Notepad or something) into the VBA editor in Excel and then run that script from Python using pythonwin?
Something like:
Workbooks.worksheets.Columns("A:Z").EntireColumn.Autofit
Before you comment, yes I have seen lots of pythonic examples of how to work around auto adjusting columns in openpyxl, but I see some interesting opportunities that can be had utilizing the functionality you get from VBA that may not be available in python.
Anyway, I dug around the internet a bit and didn't see anything that indicates I can, so I thought I'd ask.
Cheers,
Mike
Yes, it is possible. You can start by looking at how to generate a VBA macro from VB in that Microsoft KB article.
The Python code below illustrates how you can do the same; it is a basic port of the first half of the KB sample code:
import win32com.client as win32
import comtypes, comtypes.client
xl = win32.gencache.EnsureDispatch('Excel.Application')
xl.Visible = True
ss = xl.Workbooks.Add()
sh = ss.ActiveSheet
xlmodule = ss.VBProject.VBComponents.Add(1) # vbext_ct_StdModule
sCode = '''sub VBAMacro()
msgbox "VBA Macro called"
end sub'''
xlmodule.CodeModule.AddFromString(sCode)
You can look at the visible automated Excel macros, and you will see the VBAMacro defined above.
The top answer only adds the macro. If you actually want to execute it, there is one more step:
import win32com.client as win32
xl = win32.gencache.EnsureDispatch('Excel.Application')
xl.Visible = True
ss = xl.Workbooks.Add()
xlmodule = ss.VBProject.VBComponents.Add(1)
xlmodule.Name = 'testing123'
code = '''sub TestMacro()
msgbox "Testing 1 2 3"
end sub'''
xlmodule.CodeModule.AddFromString(code)
ss.Application.Run('testing123.TestMacro')
Adding a module name will help deconflict from any existing scripts.
