xlrd: struct.error: unpack requires a string argument of length 512 - python

I was using xlrd 0.6.1 and 0.7.1 to open my xls files, and both complained:
Traceback (most recent call last):
File "../../xls2csv.py", line 53, in <module>
book = xlrd.open_workbook(args[0])
File "build/bdist.linux-i686/egg/xlrd/__init__.py", line 366, in open_workbook
File "build/bdist.linux-i686/egg/xlrd/__init__.py", line 760, in __init__
File "build/bdist.linux-i686/egg/xlrd/compdoc.py", line 149, in __init__
struct.error: unpack requires a string argument of length 512
I googled around and found this advice helped:
Open the xls file with OpenOffice and save it to a new file. The problem will go away.
I'm posting it here in case someone else runs into the same problem.

If you have an xls file that opens OK in Excel, OpenOffice Calc, or Gnumeric, but isn't opened by xlrd, then you should e-mail the xlrd author (sjmachin at lexicon dot net) with the details and a copy of the file, so that xlrd can be improved; this will benefit you and all other xlrd users.
Update after examining the source:
The stack trace that you supplied was from the antique 0.6.1 version; why on earth are you using that?
According to my reading of the code, xlrd should have emitted a message like this: "WARNING * file size (SIZE) not 512 + multiple of sector size (512)" ... did it?
This is already out of spec. Often the cause is that the data payload (the Workbook stream) is not a multiple of 512 bytes, it is the last structure written, and the writer has not bothered to pad it out. In that case it is safe to continue, as the missing padding will not be accessed.
However, in your case, where xlrd falls off the end of the file, it is following a chain of index sectors (MS calls it the "double indirect FAT") that is used when the file size is bigger than about 7 MB. The last 4 bytes in each of those sectors contain the sector number of the next sector in the chain (or a special end-of-chain value). Consequently, if one of those sectors is shorter than 512 bytes, the file is corrupt. Recovering from that without even a warning message is NOT something that I'd call good behaviour, and NOT something I'd be advocating SO users to rely on.
Please contact me via e-mail to discuss how I can get a copy of this file (under a non-disclosure agreement, if necessary).
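The size rule behind that warning can be checked without xlrd. The following is only a minimal sketch, not xlrd's own code: it assumes the file is an OLE2 compound document with the standard 8-byte signature, a 512-byte header, and the sector shift stored as a little-endian 16-bit value at offset 30. The function name `compound_file_size_check` is made up for illustration.

```python
import os
import struct

# Standard signature at the start of an OLE2 compound document.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def compound_file_size_check(path):
    """Return (sector_size, leftover_bytes) for an OLE2 compound document.

    leftover_bytes is 0 when the file size is 512 + a multiple of the
    sector size; anything else means the last sector is short.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        header = fh.read(512)
    if len(header) < 512 or header[:8] != OLE2_SIGNATURE:
        raise ValueError("not an OLE2 compound document")
    # Sector shift: sector size is 2 ** shift (9 gives 512 bytes).
    (sector_shift,) = struct.unpack_from("<H", header, 30)
    sector_size = 1 << sector_shift
    return sector_size, (size - 512) % sector_size
```

A non-zero second value only says that the last sector is short. As explained above, whether that is harmless depends on which structure the short sector belongs to.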

I came across this issue too when running xlrd on a procedurally created XLS from a provider.
My solution was to run LibreOffice to convert the file, after which I could use xlrd successfully on the file!
libreoffice --headless --convert-to xls --outdir converted original/not_working.xls
Which I did in Python3 by:
from subprocess import call
call(["libreoffice", "--headless",
"--convert-to", "xls",
"--outdir", "converted" , "original/not_working.xls"])
Sources:
https://unix.stackexchange.com/questions/354043/convert-xlsx-to-xls-in-linux-shell-script#354054
https://www.computerhope.com/forum/index.php?topic=160219.0

Related

TypeError while using Openpyxl to read file

I was trying out Openpyxl, and wrote the following code:
from openpyxl import load_workbook, __version__
workbook = load_workbook(filename="Contacts.xlsx")
sheet = workbook.active
That's it, no other code.
I got the following error on running:
(TL;DR, due to line 2, I got
"TypeError: __init__() got an unexpected keyword argument 'extLst'")
"C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\Scripts\python.exe" "C:/Users/taijee/Documents/PythonAll/Python Scripts/newfile.py"
Traceback (most recent call last):
File "C:/Users/taijee/Documents/PythonAll/Python Scripts/newfile.py", line 2, in <module>
workbook = load_workbook(filename="Contacts.xlsx")
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
reader.read()
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\reader\excel.py", line 279, in read
apply_stylesheet(self.archive, self.wb)
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\styles\stylesheet.py", line 192, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\styles\stylesheet.py", line 102, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\descriptors\serialisable.py", line 83, in from_tree
obj = desc.from_tree(el)
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\descriptors\sequence.py", line 85, in from_tree
return [self.expected_type.from_tree(el) for el in node]
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\descriptors\sequence.py", line 85, in <listcomp>
return [self.expected_type.from_tree(el) for el in node]
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\styles\fills.py", line 64, in from_tree
return PatternFill._from_tree(child)
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\styles\fills.py", line 102, in _from_tree
return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'extLst'
Process finished with exit code 1
I've searched for similar problems and found one, but it was from 2014 so the solution given was to download version 2.0 of Openpyxl instead of 2.2.
The openpyxl version is 3.0.4, python version 3.8.3, and Contacts.xlsx definitely exists in the same folder.
Edit:
Making a new file had worked, but I recently made another program using openpyxl. It was working fine but it suddenly broke, giving an error very similar to the one above (I didn't record it at the time, but the last line TypeError: __init__() got an unexpected keyword argument 'extLst' definitely matched).
I had made no changes to the program between the successful runs and the unsuccessful ones, so I concluded the file was at fault. I had opened and closed the file once, though I don't remember whether the very next run was the one when it broke. After the error appeared, when I opened the file I couldn't save any changes - it gave some sort of error.
Note: I don't have Office, so I'm using Planmaker of the SoftMaker Office Suite to open the .xlsx files when I need to.
I deleted the file and created a new one with the same name so as to not change the code, but the problems persisted with the new file, including the inability to save changes.
Only when I created a new file with a different name did the code work.
It may not even be an openpyxl problem, but rather a Planmaker problem, though I don't understand why the error would be the one I keep getting.
Nevertheless, if someone could explain why this happens or how to fix it, I'll be really grateful.
Meanwhile, I'll see if a file I never use openpyxl on will still give me the same problem, and if it does, whether openpyxl gives the same error with that file when used in a program.
I too was using Planmaker of the SoftMaker Office Suite, and got the same error when I tried to edit one of my .xlsx files with openpyxl. The only solution I found was to stop using Planmaker and go back to MS Excel.
I don't like MS Excel any more than the next guy, but it works flawlessly with openpyxl.
Probably not the solution you seek, but it's a solution for me at least.
I got the same error, "TypeError: __init__() got an unexpected keyword argument 'extLst'", because I had changed and saved a tabular workbook file in xlsx format with the Planmaker software.
I then saved the file with LibreOffice Calc. After that, loading the xlsx file with openpyxl worked.
This might be a workaround for some users.
Just a note, in case others stumble over this: openpyxl in the latest version (3.0.10) now seems to raise an error on the "unexpected keyword". Before, you could use the warnings module to ignore it. No more.
Since I am working on Linux, going to MS Excel is not an option, so I rewrote my latest Python project to use pylightxl instead of openpyxl. It's more hassle, as it has a lot less functionality, but it does not choke on SoftMaker's extensions.
– Hendrik
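To see whether a particular file is affected before handing it to openpyxl, the stylesheet can be inspected with the standard library alone. This is a diagnostic sketch under assumptions, not part of openpyxl: it assumes the stylesheet is stored at the conventional `xl/styles.xml` path inside the xlsx archive and that the offending element is a child of `<patternFill>`, as the traceback above suggests. The function name and the set of expected children are illustrative.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by xlsx stylesheets.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
# Children that a <patternFill> element is normally expected to have.
EXPECTED_CHILDREN = {"fgColor", "bgColor"}


def unexpected_fill_children(xlsx, styles_part="xl/styles.xml"):
    """Return sorted names of unexpected child elements inside <patternFill>."""
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read(styles_part))
    found = set()
    for fill in root.iter(NS + "patternFill"):
        for child in fill:
            name = child.tag.rsplit("}", 1)[-1]  # strip the namespace
            if name not in EXPECTED_CHILDREN:
                found.add(name)
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(unexpected_fill_children(sys.argv[1]))
```

If the result lists `extLst`, re-saving the file with LibreOffice Calc, as described above, is one way to get a file that openpyxl can load.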

How to open smile file

I want to export some data to an app that is installed on my mobile phone. So I exported some dummy data in the app, in order to investigate how I can produce data to import.
First step: It's a gzipped file. No problem, that is what gunzip is for.
Second step:
$ file export
export: Smile binary data version 0: binary encoded, shared String values disabled, shared field names enabled
I have never heard of a smile file (which is quite ugly to google because of the emoticons), but I found pySmile. Problem: I am not even a noob regarding python. To be more specific: I don't know anything about python.
But I tried it anyways.
import pysmile
import sys
# Smile is a binary format, so read the file in binary mode
f = open(sys.argv[1], 'rb')
a = f.read()
f.close()
print(repr(a))
o = pysmile.decode(a)
print(o)
This worked pretty well with a smile file I generated myself, but with the given export smile file I get the following error:
Traceback (most recent call last):
File "dec.py", line 7, in <module>
o=pysmile.decode(a)
File "/usr/local/lib/python2.7/dist-packages/pysmile/decode.py", line 224, in decode
state.copy_shared_value_string()
File "/usr/local/lib/python2.7/dist-packages/pysmile/decode.py", line 151, in copy_shared_value_string
raise SMILEDecodeError('Cannot lookup shared value, sharing disabled!')
pysmile.decode.SMILEDecodeError: Cannot lookup shared value, sharing disabled!
After that I tried to investigate where the difference between the two files is:
export: Smile binary data version 0: binary encoded, shared String values disabled, shared field names enabled
dummyf: Smile binary data version 0: binary encoded, shared String values enabled, shared field names enabled
This, in addition to the error trace, led me to my question: How can I enable sharing in pysmile (decode and encode), and is there another Python-free method to convert a smile file to a text file and (which is even more important) the other way around?
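The difference that `file` reports comes from the fourth byte of the Smile header. The sketch below reads those flags directly. It assumes the header layout as I understand it from the Smile format specification: the three bytes `:)\n`, then one byte holding the version in the high four bits, with bit 0x01 for shared field names, 0x02 for shared String values and 0x04 for raw binary. The function name is made up, and this only inspects a file; it does not change how pysmile decodes it.

```python
def smile_header_flags(data):
    """Decode the 4-byte Smile header at the start of `data` (bytes)."""
    if len(data) < 4 or data[:3] != b":)\n":
        raise ValueError("data does not start with a Smile header")
    flags = data[3]
    return {
        "version": flags >> 4,
        "shared_field_names": bool(flags & 0x01),
        "shared_string_values": bool(flags & 0x02),
        "raw_binary": bool(flags & 0x04),
    }


# Header byte 0x01 corresponds to the `export` file described above:
# shared field names enabled, shared String values disabled.
print(smile_header_flags(b":)\n\x01"))
```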

xlrd - issue on opening file

I'm using xlrd 0.9.4 and I would like to verify that the file I must open is valid.
To do this, I wrote this code in accordance with this question:
import xlrd
from xlrd import XLRDError

try:
    book = xlrd.open_workbook(file_path)
    print("Done")
except XLRDError:
    print("Wrong type of file.")
where file_path is the path of my file.
This works fine; the problem is the following. First of all, I have a valid .xls file, so the script prints Done. Now, assume that the valid .xls file is renamed (including the extension), for example from test.xls to test.txt.
If I run the script, I get the same result (Done).
If instead I use a "real" .txt file (empty or with some text), the script prints Wrong type of file.
Does this behavior happen because the "structure" of the file has not changed? Am I doing something wrong? Is there another type of exception that I can add to the except branch?
Thanks in advance
You can see how xlrd checks the file before reading it. The xlrd source defines the "magic" bytes at lines 18-19, and at line 85 the first bytes of the file are compared with this byte sequence. If they are not equal, an exception is raised. The file extension is not involved.
Signatures for different file types can be found there.
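The same kind of signature check can be done by hand before calling xlrd. This is a minimal sketch, not xlrd's actual code: it only recognizes the OLE2 compound document signature used by typical .xls files and the ZIP signature used by .xlsx files, and the function name `sniff_spreadsheet_type` is hypothetical.

```python
# Signature of an OLE2 compound document (typical .xls container).
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
# Signature of a ZIP archive (.xlsx files are ZIP archives).
ZIP_SIGNATURE = b"PK\x03\x04"


def sniff_spreadsheet_type(path):
    """Guess the container type from the first bytes, ignoring the extension."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head == OLE2_SIGNATURE:
        return "ole2"
    if head[:4] == ZIP_SIGNATURE:
        return "zip"
    return None
```

A renamed test.txt that is really an .xls file would still return "ole2" here, which matches the behavior described in the question.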

AIFF-C file cannot be read with aifc module in python

I am trying to read a compressed .aiff file stored in a local directory. I get this:
>>> import aifc
>>> s = aifc.open('/Users/machinename/Desktop/folder/AudioTrack.aiff','r')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/aifc.py", line 942, in open
return Aifc_read(f)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/aifc.py", line 347, in __init__
self.initfp(f)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/aifc.py", line 317, in initfp
self._read_comm_chunk(chunk)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/aifc.py", line 497, in _read_comm_chunk
raise Error, 'cannot read compressed AIFF-C files'
aifc.Error: cannot read compressed AIFF-C files
>>>
I believe there must be a workaround for this. Here you can see that aifc supports AIFF-C files as well.
A simple question, yet I could not find a solution on the web.
Old post, but... There seem to be two possible issues with this.
1 - You might need to pip install cl. If AIFC fails to import the cl module, it'll report the error you mention.
2 - There seems to be a bug in the aifc.py source (at least the one I found) where it expects uncompressed files to specify compression as 'NONE'. However some files seem to report 'raw ' (notice the extra space at the end) and AIFC does not recognize this as a compression format.
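To find out which compression type a file actually declares, the COMM chunk can be read directly. The sketch below is an illustration under assumptions rather than a fix for aifc: it assumes the usual AIFF-C layout, in which the 4-byte compression type follows the 18 bytes of channel count, frame count, sample size and sample rate in the COMM chunk. The function name is made up.

```python
import struct


def aifc_compression_type(path):
    """Return the 4-byte compression type of an AIFF-C file, or None.

    None is returned for plain AIFF files (which have no compression
    field) and when no usable COMM chunk is found.
    """
    with open(path, "rb") as fh:
        head = fh.read(12)
        if len(head) < 12:
            raise ValueError("file too short")
        form, _size, form_type = struct.unpack(">4sI4s", head)
        if form != b"FORM" or form_type not in (b"AIFF", b"AIFC"):
            raise ValueError("not an AIFF or AIFF-C file")
        if form_type == b"AIFF":
            return None
        while True:
            header = fh.read(8)
            if len(header) < 8:
                return None
            chunk_id, size = struct.unpack(">4sI", header)
            if chunk_id == b"COMM":
                body = fh.read(size)
                return body[18:22] if len(body) >= 22 else None
            # Skip this chunk; chunks are padded to an even length.
            fh.seek(size + (size & 1), 1)
```

A result of `b"NONE"` would be the value the answer above says aifc expects for uncompressed data; anything else, such as `b"raw "`, points to the second issue.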
You might find that scikits.audiolab (which requires libsndfile, mega-nerd.com/libsndfile/, to be installed) does what you need. For example, I recently needed to get the duration of an .aif file (in seconds):
import scikits.audiolab
aiff_file = scikits.audiolab.Sndfile('best_song_ever.aif')
print aiff_file.nframes / float(aiff_file.samplerate)
You can do a bunch of other cool stuff too (Full API docs).
I hope that helps!

scipy.io.wavfile gives "WavFileWarning: chunk not understood" error

I'm trying to read a .wav file using scipy. I do this:
from scipy.io import wavfile
filename = "myWavFile.wav"
print "Processing " + filename
samples = wavfile.read(filename)
And I get this ugly error:
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/wavfile.py:121: WavFileWarning: chunk not understood
warnings.warn("chunk not understood", WavFileWarning)
Traceback (most recent call last):
File "fingerFooler.py", line 15, in <module>
samples = wavfile.read(filename)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/wavfile.py", line 127, in read
size = struct.unpack(fmt, data)[0]
struct.error: unpack requires a string argument of length 4
I'm using Python 2.6.6, numpy 1.6.2, and scipy 0.11.0
Here's a wav file that causes the problem.
Any thoughts? What's wrong here?
The file is no longer available (not surprising after 9 months!), but for future reference the most likely cause is that it had extra metadata which scipy can't parse.
In my case, it was default metadata (copyright, track name etc.) which was added by Audacity. You can open the file in Audacity and use File ... Open Metadata Editor to see it. Then use the 'Clear' button to strip it, and try again.
The current version of scipy supports the following RIFF chunks: 'fmt', 'fact', 'data' and 'LIST'. The Wikipedia page on RIFF has a bit more detail on how a WAV file is structured; for example, yours might have included an unsupported-but-popular INFO chunk.
I don't know anything about the WAV file format, but digging into the scipy code it looks like scipy isn't familiar with the chunk that's present towards the end of the file (chunk ID is bext, 2753632 bytes in, if that helps). That chunk is declared as 603 bytes long so it reads past it expecting another chunk ID 603 bytes later -- it doesn't find it (runs out of file) and falls over.
Have you tried it on other WAV files successfully? How was this one generated?
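To see which chunks a WAV file contains before handing it to scipy, the RIFF structure can be walked with the standard library. This is a rough sketch, not scipy's parser: it assumes a plain little-endian RIFF/WAVE file, and the function name `list_riff_chunks` is made up for illustration.

```python
import struct


def list_riff_chunks(path):
    """Return a list of (chunk_id, declared_size, offset) for a WAV file."""
    chunks = []
    with open(path, "rb") as fh:
        head = fh.read(12)
        if len(head) < 12:
            raise ValueError("file too short")
        riff, _size, form = struct.unpack("<4sI4s", head)
        if riff != b"RIFF" or form != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = fh.read(8)
            if len(header) < 8:
                break  # end of file, or a truncated chunk header
            chunk_id, size = struct.unpack("<4sI", header)
            offset = fh.tell() - 8
            chunks.append((chunk_id.decode("ascii", "replace"), size, offset))
            # Skip the chunk body; chunks are padded to an even length.
            fh.seek(size + (size & 1), 1)
    return chunks
```

Any ID outside the list of supported chunks, such as `bext`, is a candidate for the warning. A chunk with an odd declared size, like the 603 bytes mentioned above, is followed by one pad byte, which this sketch skips.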
The easiest solution to this problem is to convert the wav file into another wav file using SoX.
$ sox wavfile.wav wavfile2.wav
Works for me!
I had the same error and was able to convert the file into one that scipy can read.
My original file was from Logic Pro. I then used Audacity to read the file.
I also got this error because of (presumably) metadata introduced by Audacity. I exported my wav file from another DAW (Ableton Live), and scipy.io.wavfile loaded it without error.
Solved this problem when exporting from Reaper:
simply deselect "Write BWF ('bext') chunk" in the Render to File window.
