OSError: Unable to open file (File signature not found) h5 file - python

I am trying to read an h5 file:
import h5py
import numpy as np
from sklearn.preprocessing import LabelEncoder
# Reads the training data file. This only reads the speaker list needed to encode the targets with scikit-learn's LabelEncoder.
dataServer = h5py.File('Librispeech_960_train_list.h5', 'r')
sLabels = dataServer['lSpeaker'][:]
encoder_spk = LabelEncoder()
encoder_spk.fit(sLabels)
num_spk_class = np.unique(sLabels).shape[0]
But I get this error:
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5f.pyx", line 85, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
I don't know how to solve this OSError.
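HDF5 raises "file signature not found" when it cannot find the 8-byte HDF5 format signature at byte offset 0 (or at one of the user-block offsets 512, 1024, 2048, ...). In other words, the bytes on disk are not a valid HDF5 file, for example a truncated download or some other file saved with an .h5 extension. A minimal diagnostic sketch, using only the standard library (the helper names looks_like_hdf5 and show_head are hypothetical), checks for the signature and prints the first bytes so you can see what the file really contains:

```python
import os

# The 8-byte signature defined by the HDF5 file format specification.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the HDF5 signature is found at offset 0, 512, 1024, 2048, ...

    A user block may precede the superblock, so the signature is searched for
    at 0 and then at every power of two starting at 512.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False


def show_head(path, n=64):
    """Print the first n raw bytes of the file (e.g. HTML or text instead of HDF5)."""
    with open(path, "rb") as f:
        print(f.read(n))
```

If looks_like_hdf5('Librispeech_960_train_list.h5') returns False, show_head on the same path usually reveals what was saved instead. h5py also offers h5py.is_hdf5(path) for the yes/no part of this check.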

Related

Read h5 files through a NAS

I'm reading h5 files remotely. Previously they were on a server, but more recently I had to store some of them on a NAS device. When I try to read the ones on the NAS, some of them fail with the following error:
HDF5ExtError: HDF5 error back trace
File "C:\ci\hdf5_1593121603621\work\src\H5Dio.c", line 199, in H5Dread
can't read data
File "C:\ci\hdf5_1593121603621\work\src\H5Dio.c", line 603, in H5D__read
can't read data
File "C:\ci\hdf5_1593121603621\work\src\H5Dcontig.c", line 621, in H5D__contig_read
contiguous read failed
File "C:\ci\hdf5_1593121603621\work\src\H5Dselect.c", line 283, in H5D__select_read
read error
File "C:\ci\hdf5_1593121603621\work\src\H5Dselect.c", line 218, in H5D__select_io
read error
File "C:\ci\hdf5_1593121603621\work\src\H5Dcontig.c", line 956, in H5D__contig_readvv
can't perform vectorized sieve buffer read
File "C:\ci\hdf5_1593121603621\work\src\H5VM.c", line 1500, in H5VM_opvv
can't perform operation
File "C:\ci\hdf5_1593121603621\work\src\H5Dcontig.c", line 753, in H5D__contig_readvv_sieve_cb
block read failed
File "C:\ci\hdf5_1593121603621\work\src\H5Fio.c", line 118, in H5F_block_read
read through page buffer failed
File "C:\ci\hdf5_1593121603621\work\src\H5PB.c", line 732, in H5PB_read
read through metadata accumulator failed
File "C:\ci\hdf5_1593121603621\work\src\H5Faccum.c", line 260, in H5F__accum_read
driver read request failed
File "C:\ci\hdf5_1593121603621\work\src\H5FDint.c", line 205, in H5FD_read
driver read request failed
File "C:\ci\hdf5_1593121603621\work\src\H5FDsec2.c", line 725, in H5FD_sec2_read
file read failed: time = Tue May 10 11:37:06 2022
, filename = 'Y:/myFolder\myFile.h5', file descriptor = 4, errno = 22, error message = 'Invalid argument', buf = 0000020F03F14040, total read size = 16560000, bytes this sub-read = 16560000, bytes actually read = 18446744073709551615, offset = 480252764
End of HDF5 error back trace
Problems reading the array data.
I don't really understand the error. It always happens for the same files, but I can open those files and read the data myself with HDFView. If I put a file on the server, I can read it without problems using the same lines of code (the path is correct in both cases):
hdf5store = pd.HDFStore(myPath[fich])
datacopy = hdf5store['my_data']
By the way, the error occurs on the second line of code. Right now I don't have access to the server, and I can't copy the file locally because I don't have enough disk space. Does anyone know how to fix this so that I can keep working through the NAS?
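One detail in the back trace is worth decoding: "bytes actually read = 18446744073709551615" is 2**64 - 1, which is simply -1 printed as an unsigned 64-bit number, and errno 22 is EINVAL ("Invalid argument"). So the operating system's read call itself returned -1 for a single 16,560,000-byte read at offset 480252764; the HDF5 structure was parsed fine up to that point. That is consistent with the file being readable on the server and in HDFView, and points at the NAS/network-drive read path rather than at corrupted data (an interpretation, not a confirmed diagnosis). The arithmetic can be checked with the standard library:

```python
import errno
import struct

# Value copied from the "bytes actually read" field of the back trace.
bytes_actually_read = 18446744073709551615

# It is the largest unsigned 64-bit integer...
print(bytes_actually_read == 2**64 - 1)

# ...and reinterpreting the same 8 bytes as a signed 64-bit integer gives -1,
# the usual "read failed" return value of the low-level read call.
signed = struct.unpack("<q", struct.pack("<Q", bytes_actually_read))[0]
print(signed)

# errno 22 from the trace maps to EINVAL ("Invalid argument").
print(errno.errorcode[22])
```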

How can I read Minecraft .mca files so that I can extract individual blocks in Python?

I can't find a way to read the Minecraft world files so that I can use them in Python.
I've looked around the internet but can find no tutorials, and only a few libraries that claim they can do this but don't actually work.
from nbt import *
nbtfile = nbt.NBTFile("r.0.0.mca",'rb')
I expected this to work, but instead I got an error saying the file is not gzip-compressed.
Full error:
Traceback (most recent call last):
File "C:\Users\rober\Desktop\MinePy\MinecraftWorldReader.py", line 2, in <module>
nbtfile = nbt.NBTFile("r.0.0.mca",'rb')
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 628, in __init__
self.parse_file()
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 652, in parse_file
type = TAG_Byte(buffer=self.file)
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 99, in __init__
self._parse_buffer(buffer)
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 105, in _parse_buffer
self.value = self.fmt.unpack(buffer.read(self.fmt.size))[0]
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\gzip.py", line 276, in read
return self._buffer.read(size)
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\gzip.py", line 463, in read
if not self._read_gzip_header():
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\gzip.py", line 411, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'\x00\x00')
Use anvil-parser (install it with pip install anvil-parser).
Reading
import anvil
region = anvil.Region.from_file('r.0.0.mca')
# You can also provide the region file name instead of the object
chunk = anvil.Chunk.from_region(region, 0, 0)
# If `section` is not provided, will get it from the y coords
# and assume it's global
block = chunk.get_block(0, 0, 0)
print(block) # <Block(minecraft:air)>
print(block.id) # air
print(block.properties) # {}
https://pypi.org/project/anvil-parser/
According to this page, an .mca file is not a plain NBT file. It begins with an 8 KiB header that contains the offsets of the chunks within the region file and the timestamps of each chunk's last update.
See the official announcement and this page for more information.
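This also explains the original "Not a gzipped file (b'\x00\x00')" error: the first bytes of an .mca file belong to the header's location table, not to a gzip stream. The sketch below assumes the publicly documented Region file layout (not taken from the page above): the first 4 KiB hold 1024 big-endian entries, each a 3-byte offset counted in 4096-byte sectors plus a 1-byte sector count; the next 4 KiB hold 1024 big-endian 4-byte timestamps; each chunk then starts with a 4-byte big-endian length (which includes the following byte), a 1-byte compression type (1 = gzip, 2 = zlib, 3 = uncompressed), and the compressed NBT data. The helper names read_chunk_nbt and read_timestamp are hypothetical, and only compression types 1 to 3 are handled:

```python
import gzip
import struct
import zlib

SECTOR = 4096  # region files are organised in 4 KiB sectors


def _index(cx, cz):
    # Entry index of a chunk inside the 32x32 region (coordinates taken mod 32).
    return (cx % 32) + (cz % 32) * 32


def read_chunk_nbt(region_bytes, cx, cz):
    """Return the decompressed NBT bytes of chunk (cx, cz), or None if it is absent."""
    i = _index(cx, cz)
    entry = region_bytes[i * 4:i * 4 + 4]
    offset = int.from_bytes(entry[:3], "big")  # in sectors from the file start
    sectors = entry[3]
    if offset == 0 and sectors == 0:
        return None  # chunk not generated yet
    start = offset * SECTOR
    length, compression = struct.unpack(">IB", region_bytes[start:start + 5])
    payload = region_bytes[start + 5:start + 4 + length]  # length counts the type byte
    if compression == 1:
        return gzip.decompress(payload)
    if compression == 2:
        return zlib.decompress(payload)
    if compression == 3:
        return payload
    raise ValueError("unsupported compression type %d" % compression)


def read_timestamp(region_bytes, cx, cz):
    """Return the last-update timestamp stored in the second 4 KiB of the header."""
    start = SECTOR + _index(cx, cz) * 4
    return struct.unpack(">I", region_bytes[start:start + 4])[0]
```

Usage would look like reading the whole file with open("r.0.0.mca", "rb") and calling read_chunk_nbt(data, 0, 0). The returned bytes are still raw NBT and need an NBT parser, which is the work anvil-parser does for you.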

Tablib xlsx file BadZipfile issue

I get an error when opening an .xlsx file on Windows 8 with the tablib library.
Python version: 2.7.14
The error is as follows:
python suit_simple_sheet_product.py
Traceback (most recent call last):
File "suit_simple_sheet_product.py", line 19, in <module>
data = tablib.Dataset().load(open(BASE_PATH).read())
File "C:\Python27\lib\site-packages\tablib\core.py", line 446, in load
format = detect_format(in_stream)
File "C:\Python27\lib\site-packages\tablib\core.py", line 1157, in detect_format
if fmt.detect(stream):
File "C:\Python27\lib\site-packages\tablib\formats\_xls.py", line 25, in detect
xlrd.open_workbook(file_contents=stream)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 120, in open_workbook
zf = zipfile.ZipFile(timemachine.BYTES_IO(file_contents))
File "C:\Python27\lib\zipfile.py", line 770, in __init__
self._RealGetContents()
File "C:\Python27\lib\zipfile.py", line 811, in _RealGetContents
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
The path is:
BASE_PATH = 'C:\Users\anju\Downloads\automate\catalog-5090 fabric detail and price list.xlsx'
Excel .xlsx files are actually zip files. For the unzipping to work correctly, the file must be opened in binary mode, so you need to open it like this:
import tablib
BASE_PATH = r'c:\my folder\my_test.xlsx'
# 'rb' reads the raw bytes of the zip archive without any newline translation
with open(BASE_PATH, 'rb') as f:
    data = tablib.Dataset().load(f.read())
print(data)
Also prefix the string with r (a raw string literal) so that Python does not interpret the backslashes in your path as escape sequences.
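Both points can be seen in a small standard-library sketch. The question's path contains "\automate", and in a normal string literal "\a" is an escape sequence for the BEL character, so the path silently changes; a raw string keeps the backslash. The second half builds a tiny zip in memory (a stand-in for an .xlsx, not a real workbook) to show that zip data starts with the bytes b"PK\x03\x04", which is why the file has to be read as bytes rather than as text:

```python
import io
import zipfile

# "\a" in a normal literal is one character (BEL); in a raw literal it is two.
print(len("\a"), len(r"\a"))

# Build a minimal zip archive in memory as a stand-in for an .xlsx file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
data = buf.getvalue()

print(data[:4])                               # zip local-file-header signature
print(zipfile.is_zipfile(io.BytesIO(data)))   # the raw bytes are a valid zip
```

Reading the same bytes in text mode on Windows can translate line endings and alter the archive, after which the zip check fails with "File is not a zip file".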

zipfile.BadZipfile: Bad CRC-32 for file | Read only file

I have a read-only file inside a password-protected zip file, and I need to extract it to the /tmp directory.
I get a CRC-32 error, which suggests that the file is corrupted, yet I know it isn't; it is in fact a read-only file. Any suggestions?
Error:
Traceback (most recent call last):
File "/tmp/usercode.py", line 45, in <module>
zip.extractall('/tmp',pwd = "piso")
File "/usr/lib64/python2.7/zipfile.py", line 1040, in extractall
self.extract(zipinfo, path, pwd)
File "/usr/lib64/python2.7/zipfile.py", line 1028, in extract
return self._extract_member(member, path, pwd)
File "/usr/lib64/python2.7/zipfile.py", line 1084, in _extract_member
shutil.copyfileobj(source, target)
File "/usr/lib64/python2.7/shutil.py", line 49, in copyfileobj
buf = fsrc.read(length)
File "/usr/lib64/python2.7/zipfile.py", line 632, in read
data = self.read1(n - len(buf))
File "/usr/lib64/python2.7/zipfile.py", line 672, in read1
self._update_crc(data, eof=(self._compress_left==0))
File "/usr/lib64/python2.7/zipfile.py", line 647, in _update_crc
raise BadZipfile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipfile: Bad CRC-32 for file 'alien-12.txt'
Code:
# importing required modules
from zipfile import ZipFile
# specifying the zip file name
file_name = "/tmp/alien-12.zip"
# opening the zip file in READ mode
with ZipFile(file_name, 'r') as zip:
    # printing all the contents of the zip file
    zip.printdir()
    # extracting all the files
    print('Extracting all the files now...')
    zip.extractall('/tmp',pwd = "piso")
    print('Done!')
If I change the line:
zip.extractall('/tmp',pwd = "piso")
then I get this error:
IOError: [Errno 30] Read-only file system:
I then tried to debug it by first printing what is in the zip file, but calling zipfile.testzip() raises an error:
Error:
RuntimeError: File alien-12.txt is encrypted, password required for extraction
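The RuntimeError happens because testzip() reads every member using the archive-level password, and none has been set. A minimal sketch (the helper name check_zip is hypothetical) sets it with ZipFile.setpassword() first, after which testzip() returns the name of the first member whose CRC fails, or None if all members are fine. This is also a useful way to rule out a wrong password: the traditional zip encryption checks the password against a single byte, so a wrong password occasionally passes that check and only shows up later as "Bad CRC-32". The read-only attribute, by contrast, is stored separately from the file data and is not part of the CRC.

```python
import zipfile


def check_zip(file_name, password):
    """Verify every member's CRC using the given archive password (bytes)."""
    with zipfile.ZipFile(file_name, "r") as zf:
        # testzip() opens each member with the archive-level password,
        # so it must be set first; otherwise encrypted members raise RuntimeError.
        zf.setpassword(password)
        # Name of the first member with a bad CRC, or None if all are OK.
        return zf.testzip()
```

For the archive in the question this would be check_zip("/tmp/alien-12.zip", b"piso"). A bytes password works on both Python 2.7 and Python 3; Python 3 rejects a str password with a TypeError.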

Error while reading wav file: wave.Error: unknown format: 6 [duplicate]

I am trying to open a WAV file with the wave module, but I keep getting the same error no matter what I try.
The line with the error is the following:
wav = wave.open(f)
This is the error message:
Traceback (most recent call last):
File "annotate.py", line 47, in <module>
play(file)
File "annotate.py", line 33, in play
wav = wave.open(f)
File "C:\Program Files (x86)\Python\lib\wave.py", line 498, in open
return Wave_read(f)
File "C:\Program Files (x86)\Python\lib\wave.py", line 163, in __init__
self.initfp(f)
File "C:\Program Files (x86)\Python\lib\wave.py", line 143, in initfp
self._read_fmt_chunk(chunk)
File "C:\Program Files (x86)\Python\lib\wave.py", line 269, in _read_fmt_chunk
raise Error('unknown format: %r' % (wFormatTag,))
wave.Error: unknown format: 49
f is a string containing the path to a .WAV file, and the file plays fine in all of my media players.
I have of course imported the wave module.
I tried f both as a relative and an absolute path.
I tried replacing "WAV" by "wav".
What is causing this error?
Python's wave module works with a specific type of WAV: PCM (WAVE_FORMAT_PCM: 0x0001).
In your case, the file is a WAV of type WAVE_FORMAT_GSM610 (0x0031, which is 49 in decimal, the number shown in the error).
You can convert the file to PCM with a program such as Audacity or with a codec-conversion library.
You can see a list of WAV types here:
https://www.videolan.org/developers/vlc/doc/doxygen/html/vlc__codecs_8h.html
Python's wave module source code:
https://github.com/python/cpython/blob/master/Lib/wave.py
The file is compressed and the wave module does not support this type of compression.
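To see which format a file uses before handing it to wave.open, you can read the wFormatTag field of the RIFF "fmt " chunk directly. This is a minimal sketch assuming a standard little-endian RIFF/WAVE file (the helper name wav_format_tag is hypothetical); per the answer above, 1 means PCM and 49 means GSM 6.10:

```python
import struct


def wav_format_tag(path):
    """Return the wFormatTag of a RIFF/WAVE file's "fmt " chunk (1 = PCM)."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no fmt chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                # wFormatTag is the first 16-bit field of the fmt chunk.
                return struct.unpack("<H", f.read(2))[0]
            # Skip this chunk; RIFF chunks are padded to an even size.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

A file for which wav_format_tag(f) returns something other than 1 will need converting to PCM before wave.open can read it.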
