scipy.io.wavfile gives "WavFileWarning: chunk not understood" error - python

I'm trying to read a .wav file using scipy. I do this:
from scipy.io import wavfile
filename = "myWavFile.wav"
print "Processing " + filename
samples = wavfile.read(filename)
And I get this ugly error:
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/wavfile.py:121: WavFileWarning: chunk not understood
warnings.warn("chunk not understood", WavFileWarning)
Traceback (most recent call last):
File "fingerFooler.py", line 15, in <module>
samples = wavfile.read(filename)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/io/wavfile.py", line 127, in read
size = struct.unpack(fmt, data)[0]
struct.error: unpack requires a string argument of length 4
I'm using Python 2.6.6, numpy 1.6.2, and scipy 0.11.0
Here's a wav file that causes the problem.
Any thoughts? What's wrong here?

The file is no longer available (not surprising after 9 months!), but for future reference the most likely cause is that it had extra metadata which scipy can't parse.
In my case, it was default metadata (copyright, track name, etc.) which was added by Audacity - you can open the file in Audacity and use File ... Open Metadata Editor to see it. Then use the 'Clear' button to strip it, and try again.
The current version of scipy supports the following RIFF chunks - 'fmt', 'fact', 'data' and 'LIST'. The Wikipedia page on RIFF has a bit more detail on how a WAV file is structured; for example, yours might have included an unsupported-but-popular INFO chunk.
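If you want to see which chunks your file actually contains before stripping anything, here is a minimal sketch (standard library only; the filename is just the question's placeholder) that walks the top-level RIFF structure and prints each chunk ID and size:
import struct

def list_riff_chunks(path):
    # Walk the top-level chunks of a RIFF/WAVE file and print each ID and size.
    with open(path, 'rb') as f:
        riff, riff_size, form_type = struct.unpack('<4sI4s', f.read(12))
        print('%r, declared size %d, form type %r' % (riff, riff_size, form_type))
        while True:
            header = f.read(8)
            if len(header) < 8:  # end of file
                break
            chunk_id, chunk_size = struct.unpack('<4sI', header)
            print('%r: %d bytes' % (chunk_id, chunk_size))
            # Chunks are word-aligned, so skip a pad byte when the size is odd.
            f.seek(chunk_size + (chunk_size & 1), 1)

list_riff_chunks('myWavFile.wav')  # e.g. reveals an unexpected 'LIST'/INFO or 'bext' chunk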

I don't know anything about the WAV file format, but digging into the scipy code it looks like scipy isn't familiar with the chunk that's present towards the end of the file (chunk ID is bext, 2753632 bytes in, if that helps). That chunk is declared as 603 bytes long so it reads past it expecting another chunk ID 603 bytes later -- it doesn't find it (runs out of file) and falls over.
Have you tried it on other WAV files successfully? How was this one generated?

The easiest solution to this problem is to convert the wav file into another wav file using SoX.
$ sox wavfile.wav wavfile2.wav
Works for me!
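If you need to do the same conversion from inside a Python script, a small sketch like this (assuming the sox binary is on your PATH) wraps the same command:
import subprocess

def rewrite_with_sox(src, dst):
    # Re-encode the file with SoX; the rewritten copy drops the chunks scipy trips over.
    subprocess.check_call(['sox', src, dst])

rewrite_with_sox('wavfile.wav', 'wavfile2.wav')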

I had the same error and was able to convert the file into something scipy could read.
My original file was from Logic Pro; I then used Audacity to open the file and re-export it.

I also got this error because of (presumably) metadata introduced by Audacity. I exported my wav file from another DAW (Ableton Live), and scipy.io.wavfile loaded it without error.

Solved this problem when exporting from Reaper:
simply deselect "Write BWF ('bext') chunk" in the Render to File window.

Related

How to open smile file

I want to export some data to an app that is installed on my mobile phone. So I exported some dummy data from the app in order to investigate how I can produce data for it to import.
First step: It's a gzipped file. No problem, that is what gunzip is for.
Second step:
$ file export
export: Smile binary data version 0: binary encoded, shared String values disabled, shared field names
I have never heard of a smile file (which is quite ugly to google because of the emoticons), but I found pySmile. Problem: I am not even a noob regarding python. To be more specific: I don't know anything about python.
But I tried it anyways.
import pysmile
import sys

# Read the raw Smile bytes and decode them with pysmile
f = open(sys.argv[1], 'rb')
a = f.read()
f.close()
print repr(a)
o = pysmile.decode(a)
print o
This worked pretty well with a smile file I generated myself, but with the given export smile file I get the following error:
Traceback (most recent call last):
File "dec.py", line 7, in <module>
o=pysmile.decode(a)
File "/usr/local/lib/python2.7/dist-packages/pysmile/decode.py", line 224, in decode
state.copy_shared_value_string()
File "/usr/local/lib/python2.7/dist-packages/pysmile/decode.py", line 151, in copy_shared_value_string
raise SMILEDecodeError('Cannot lookup shared value, sharing disabled!')
pysmile.decode.SMILEDecodeError: Cannot lookup shared value, sharing disabled!
After that I tried to investigate where the difference between the two files is:
export: Smile binary data version 0: binary encoded, shared String values disabled, shared field names enabled
dummyf: Smile binary data version 0: binary encoded, shared String values enabled, shared field names enabled
This, in addition to the error trace, led me to my question: how can I enable sharing in pysmile (decode and encode), and is there another (Python-free) method to convert a smile file to a text file and (which is even more important) the other way around?

AIFF-C file cannot be read with aifc module in python

I am trying to read a compressed .aiff file stored in my local directory. I get this:
>>>import aifc
>>>s = aifc.open('/Users/machinename/Desktop/folder/AudioTrack.aiff','r')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/aifc.py", line 942, in open
return Aifc_read(f)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/aifc.py", line 347, in __init__
self.initfp(f)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/aifc.py", line 317, in initfp
self._read_comm_chunk(chunk)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/aifc.py", line 497, in _read_comm_chunk
raise Error, 'cannot read compressed AIFF-C files'
aifc.Error: cannot read compressed AIFF-C files
>>>
I believe there must be a workaround for this. Here you can see that aifc supports AIFF-C files as well.
A simple question, yet I could not find a solution on the web.
Old post, but... There seem to be two possible issues with this.
1 - You might need to pip install cl. If AIFC fails to import the cl module, it'll report the error you mention.
2 - There seems to be a bug in the aifc.py source (at least the one I found) where it expects uncompressed files to specify compression as 'NONE'. However some files seem to report 'raw ' (notice the extra space at the end) and AIFC does not recognize this as a compression format.
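To check whether your file is one of those 'raw ' cases, here is a sketch (standard library only; the path is the one from the question) that reads the COMM chunk directly and returns the compression tag that aifc would see:
import chunk

def aiff_compression_tag(path):
    # Return the 4-byte compression type from the COMM chunk of an AIFF/AIFF-C
    # file, e.g. b'NONE', b'raw ' or b'sowt'. Plain AIFF files carry no tag.
    with open(path, 'rb') as f:
        form = chunk.Chunk(f)          # outer FORM chunk (big-endian by default)
        if form.getname() != b'FORM':
            raise ValueError('not an IFF file')
        form_type = form.read(4)       # b'AIFF' or b'AIFC'
        try:
            while True:
                sub = chunk.Chunk(form)
                if sub.getname() == b'COMM':
                    sub.read(18)       # channels, frames, sample size, sample rate
                    return sub.read(4) if form_type == b'AIFC' else None
                sub.skip()
        except EOFError:
            return None

print(aiff_compression_tag('/Users/machinename/Desktop/folder/AudioTrack.aiff'))
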
You might find that scikits.audiolab (requires that libsndfile, from mega-nerd.com/libsndfile/, is installed) does what you need. For example, I recently needed to get the duration of an .aif file (in seconds):
import scikits.audiolab
aiff_file = scikits.audiolab.Sndfile('best_song_ever.aif')
print aiff_file.nframes / float(aiff_file.samplerate)
You can do a bunch of other cool stuff too (Full API docs).
I hope that helps!

Is there a way to determine in Python (or other language) to see if a JPG image is corrupt?

I was wondering if there is a way in Python (or another language) to open a JPEG file and determine whether or not it is corrupt (for instance, if I terminate a download of a JPG file before it completes, I am unable to open and view the file). Are there libraries that allow this to be done easily?
You can try using PIL. But just opening a truncated JPG file won't fail, and neither will the verify method. Trying to load it will raise an exception, though:
First we mangle a good jpg file:
> du mvc-002f.jpg
56 mvc-002f.jpg
> dd if=mvc-002f.jpg of=broken.jpg bs=1k count=20
20+0 records in
20+0 records out
20480 bytes transferred in 0.000133 secs (154217856 bytes/sec)
Then we try the Python Imaging Library:
>>> import Image
>>> im = Image.open('broken.jpg')
>>> im.verify()
>>> im = Image.open('broken.jpg') # im.verify() invalidates the file pointer
>>> im.load()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/PIL/ImageFile.py", line 201, in load
raise IOError("image file is truncated (%d bytes not processed)" % len(b))
IOError: image file is truncated (16 bytes not processed)
As user827992 said, even a truncated image can usually still be partially decoded and shown.
You could do it using the PIL package:
import Image
def is_image_ok(fn):
    try:
        Image.open(fn)
        return True
    except:
        return False
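Since just opening the file does not catch truncation (as noted in the first answer), a slightly stricter variant also calls load(); this is just a sketch, using the same old-style PIL import as above:
import Image

def is_image_intact(fn):
    # open() only parses the header; load() decodes the pixel data and
    # raises IOError if the file is truncated.
    try:
        im = Image.open(fn)
        im.load()
        return True
    except (IOError, SyntaxError):
        return False
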
I don't think so.
The JPEG standard is more of a container format than a standard about the implementation.
The word "corrupted" usually means that the file no longer represents the original data but can still, most of the time, be decoded: it will produce an undefined output, not the one it is supposed to produce, but fed to a JPEG decoder it will most likely still output something. And since there is no unique bit arrangement that every JPEG file must follow, you can't detect this programmatically - there is no specific pattern to check for, and even if there were, you couldn't say that a bit is in the wrong place or missing without knowing the original content, when all you have to parse is the actual file.
Also, the header of the file can be corrupted, but in that case the file is usually flagged as corrupt without regard to what it contains - it is corrupt in the way any generic file can be.

xlrd: struct.error: unpack requires a string argument of length 512

I was using xlrd 0.6.1 and 0.7.1 to open my xls files; both complained:
Traceback (most recent call last):
File "../../xls2csv.py", line 53, in <module>
book = xlrd.open_workbook(args[0])
File "build/bdist.linux-i686/egg/xlrd/__init__.py", line 366, in open_workbook
File "build/bdist.linux-i686/egg/xlrd/__init__.py", line 760, in __init__
File "build/bdist.linux-i686/egg/xlrd/compdoc.py", line 149, in __init__
struct.error: unpack requires a string argument of length 512
I googled around and found this advice helped:
open the xls file with OpenOffice and save it to a new file; the problem will go away.
Just in case someone else gets the same problem, I'm posting it here.
If you have an xls file that opens OK in Excel, OpenOffice Calc, or Gnumeric, but isn't opened by xlrd, then you should e-mail the xlrd author (sjmachin at lexicon dot net) with the details and a copy of the file, so that xlrd can be improved; this will benefit you and all other xlrd users.
Update after examining the source:
The stack trace that you supplied was from the antique 0.6.1 version; why on earth are you using that?
According to my reading of the code, xlrd should have emitted a message like this: `WARNING * file size (SIZE) not 512 + multiple of sector size (512)' ... did it?
This is already out of spec. Often the cause is that the data payload (the Workbook stream) is not a multiple of 512 bytes, it is the last structure written, and the writer has not bothered to pad it out. In that case it is safe to continue, as the missing padding will not be accessed.
However, in your case where xlrd falls off the end of the file it is following a chain of index sectors (MS calls it the "double indirect FAT") that is used when the file size is bigger than about 7 MB. The last 4 bytes in each of those sectors contains the sector number of the next sector in the chain (or a special end-of-chain value). Consequently if one of those sectors is shorter than 512 bytes, the file is corrupt. Recovering from that without even a warning message is NOT something that I'd call good behaviour, and NOT something I'd be advocating SO users to rely on.
Please contact me via e-mail to discuss how I can get a copy of this file (under a non-disclosure agreement, if necessary).
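As a quick way to check whether a particular .xls file has the size problem described above, a sketch like this (the filename is just an example) reports whether the file is a 512-byte header plus a whole number of 512-byte sectors:
import os

def check_xls_padding(path):
    # A typical OLE2 compound document is a 512-byte header followed by 512-byte sectors.
    size = os.path.getsize(path)
    leftover = (size - 512) % 512
    if leftover:
        print('%s: %d bytes; last sector is %d bytes short' % (path, size, 512 - leftover))
    else:
        print('%s: %d bytes; 512 + a multiple of 512' % (path, size))

check_xls_padding('not_working.xls')
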
I came across this issue too when running xlrd on a procedurally created XLS from a provider.
My solution was to run libreoffice to convert the file, after which I could use xlrd successfully on the file!
libreoffice --headless --convert-to xls --outdir converted original/not_working.xls
Which I did in Python3 by:
from subprocess import call
call(["libreoffice", "--headless",
      "--convert-to", "xls",
      "--outdir", "converted", "original/not_working.xls"])
Sources:
https://unix.stackexchange.com/questions/354043/convert-xlsx-to-xls-in-linux-shell-script#354054
https://www.computerhope.com/forum/index.php?topic=160219.0

Python decompressing gzip chunk-by-chunk

I've a memory- and disk-limited environment where I need to decompress the contents of a gzip file sent to me in string-based chunks (over xmlrpc binary transfer). However, both zlib.decompress() and zlib.decompressobj()/decompress() barf on the gzip header. I've tried offsetting past the gzip header (documented here), but still haven't managed to avoid the barf. The gzip library itself only seems to support decompressing from files.
The following snippet gives a simplified illustration of what I would like to do (except in real life the buffer will be filled from xmlrpc, rather than reading from a local file):
#! /usr/bin/env python
import zlib

CHUNKSIZE = 1000
d = zlib.decompressobj()

f = open('23046-8.txt.gz', 'rb')
buffer = f.read(CHUNKSIZE)
while buffer:
    outstr = d.decompress(buffer)
    print(outstr)
    buffer = f.read(CHUNKSIZE)

outstr = d.flush()
print(outstr)
f.close()
Unfortunately, as I said, this barfs with:
Traceback (most recent call last):
File "./test.py", line 13, in <module>
outstr = d.decompress(buffer)
zlib.error: Error -3 while decompressing: incorrect header check
Theoretically, I could feed my xmlrpc-sourced data into a StringIO and then use that as a fileobj for gzip.GzipFile(); however, in real life, I don't have memory available to hold the entire file contents in memory as well as the decompressed data. I really do need to process it chunk-by-chunk.
The fall-back would be to change the compression of my xmlrpc-sourced data from gzip to plain zlib, but since that impacts other sub-systems I'd prefer to avoid it if possible.
Any ideas?
gzip and zlib use slightly different headers.
See How can I decompress a gzip stream with zlib?
Try d = zlib.decompressobj(16+zlib.MAX_WBITS).
And you might try changing your chunk size to a power of 2 (say CHUNKSIZE=1024) for possible performance reasons.
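Putting that into the question's loop, a sketch of the corrected version looks like this (only the decompressobj() call changes):
import zlib

CHUNKSIZE = 1000
# 16 + MAX_WBITS tells zlib to expect and skip a gzip header.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)

f = open('23046-8.txt.gz', 'rb')
buffer = f.read(CHUNKSIZE)
while buffer:
    print(d.decompress(buffer))
    buffer = f.read(CHUNKSIZE)
print(d.flush())
f.close()
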
I've got a more detailed answer here: https://stackoverflow.com/a/22310760/1733117
d = zlib.decompressobj(zlib.MAX_WBITS|32)
Per the documentation, this automatically detects the header (zlib or gzip).
