I am accessing some .mat files hosted on a SimpleHTTP server and trying to load them using SciPy's loadmat:
from scipy.io import loadmat
import requests
r = requests.get(requestURL,stream=True)
print loadmat(r.raw)
However, I am getting an error
io.UnsupportedOperation: seek
After looking around it seems that r.raw is a file object which is not seekable, and therefore I can't use loadmat on it. I tried the solution here (modified slightly to work with Python 2.7), but maybe I did something wrong because when I try
seekfile = SeekableHTTPFile(requestURL,debug=True)
print 'closed', seekfile.closed
It tells me that the file is closed, and therefore I can't use loadmat on that either. I can provide the code for my modified version if that would be helpful, but I was hoping that there is some better solution to my problem here.
I can't copy/download the .mat file because I don't have write permission on the server where my script is hosted. So I'm looking for a way to get a seekable file object that loadmat can use.
Edit: attempt using StringIO
import StringIO
stringfile = StringIO.StringIO()
stringfile.write(repr(r.raw.read())) # repr needed because of null characters
loaded = loadmat(stringfile)
stringfile.close()
r.raw.close()
results in:
ValueError: Unknown mat file type, version 48, 92
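The ValueError here most likely comes from the repr() call, which stores the printable representation of the bytes rather than the bytes themselves. Wrapping the untouched bytes in an in-memory buffer avoids both that problem and the original seek error. A minimal sketch (the payload bytes are made up; real code would use the bytes from requests.get(requestURL).content):

```python
import io

# Hypothetical payload standing in for the downloaded .mat bytes.
raw = b"MATLAB 5.0 MAT-file\x00\x01\x02"

# io.BytesIO wraps the bytes in a seekable in-memory file object,
# which is what scipy.io.loadmat needs: loadmat(io.BytesIO(raw)).
buf = io.BytesIO(raw)
print(buf.seekable())   # True, unlike r.raw
buf.seek(0)
assert buf.read() == raw  # the bytes are untouched; no repr() mangling
```

io.BytesIO exists in both Python 2.7 and 3, so this works in either.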
Related
I have seen this question floating around without any definitive answer, such as here. I have .mat data converted from a different data structure and am trying to load it in Python using scipy.io.loadmat. For some files, this approach works perfectly fine, but for others I get this error:
mat = sio.loadmat(i, verify_compressed_data_integrity=False)
File "/Users/aeglick/opt/anaconda3/lib/python3.8/site-packages/scipy-1.7.1-py3.8-macosx-10.9-x86_64.egg/scipy/io/matlab/mio.py", line 226, in loadmat
matfile_dict = MR.get_variables(variable_names)
File "/Users/aeglick/opt/anaconda3/lib/python3.8/site-packages/scipy-1.7.1-py3.8-macosx-10.9-x86_64.egg/scipy/io/matlab/mio4.py", line 390, in get_variables
hdr, next_position = self.read_var_header()
File "/Users/aeglick/opt/anaconda3/lib/python3.8/site-packages/scipy-1.7.1-py3.8-macosx-10.9-x86_64.egg/scipy/io/matlab/mio4.py", line 346, in read_var_header
hdr = self._matrix_reader.read_header()
File "/Users/aeglick/opt/anaconda3/lib/python3.8/site-packages/scipy-1.7.1-py3.8-macosx-10.9-x86_64.egg/scipy/io/matlab/mio4.py", line 108, in read_header
raise ValueError('Mat 4 mopt wrong format, byteswapping problem?')
ValueError: Mat 4 mopt wrong format, byteswapping problem?
I'm not sure what is causing this issue. I save the .mat files the same way every time so they should all be readable. I also tried h5py and get a similar error. Are there any suggestions on how I can read my data files?
You might be saving the .mat files in different "version" formats without realizing it. If your code calls save(...) without explicitly specifying a format, it uses the default version for your Matlab session, which is a persisted per-user preference that you can set inside the Matlab GUI. And if you don't set a default format in Preferences, the default version that save(...) uses varies with the version of Matlab.
The differences between MAT-file versions are significant. In particular, v7.3 completely changed the format to an HDF5-based one (which I don't think scipy.io.loadmat supports). See https://www.mathworks.com/help/matlab/import_export/mat-file-versions.html.
Check your actual .mat file versions. And if you want your code to be really portable, change the save(...) calls in your code to explicitly specify the MAT-file version using a '-v<whatever>' argument.
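To check from the Python side, you can peek at the file's header bytes. This is a hypothetical helper based on the documented header layout (v4 files have no text header, v5/v6/v7 files begin with "MATLAB 5.0 MAT-file", and v7.3 files are HDF5 containers whose user block begins with "MATLAB 7.3 MAT-file"); it is a rough sketch, not an exhaustive check:

```python
def mat_flavor(header):
    """Guess a MAT-file's flavor from its first bytes (rough sketch)."""
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        return "v7.3 (HDF5-based; use h5py, not scipy.io.loadmat)"
    if header.startswith(b"MATLAB"):
        return "v5/v6/v7 (readable by scipy.io.loadmat)"
    return "v4, or not a MAT-file"

# Example headers (made up, but following the documented layout):
print(mat_flavor(b"MATLAB 5.0 MAT-file, Platform: GLNXA64"))
print(mat_flavor(b"MATLAB 7.3 MAT-file" + b"\x00" * 100))
```

In practice you would pass the first 128 bytes read from the file in binary mode.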
I am new to python and am having trouble reading a *.npy file that somebody else saved. If I use the following commands:
import numpy as np
np.load('lat.npy')
I get the following error:
ValueError: Cannot load file containing pickled data when allow_pickle=False
So, I set allow_pickle=True:
np.load('lat.npy',allow_pickle=True)
Then, I get a different error:
OSError: Failed to interpret file 'lat.npy' as a pickle
Maybe it is relevant that I am on a PC, and the other file was written on a Mac.
Am I doing something wrong? (I am sorry if this question has been asked already.) Thank you!
I learned that my colleague's data file was written in python 2, while I am using python 3. Using the np.load command with the following options will work:
np.load('lat.npy',allow_pickle=True,fix_imports=True,encoding='latin1')
It seems I need to set all of those options, but the 'encoding' argument seems especially important. The doc for numpy.load says about the encoding argument, "Only useful when loading Python 2 generated pickled files in Python 3, which includes npy/npz files containing object arrays."
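For context, allow_pickle only matters for object arrays: numpy falls back to pickle for dtype=object, and np.load then refuses to unpickle by default. A small sketch with made-up data (an in-memory buffer stands in for the .npy file):

```python
import io
import numpy as np

# Object arrays can't be stored as flat binary, so np.save pickles them.
arr = np.array([{"lat": 51.5}, {"lat": 48.9}], dtype=object)
buf = io.BytesIO()
np.save(buf, arr)

buf.seek(0)
try:
    np.load(buf)                       # raises ValueError: pickled data
    loaded = None
except ValueError:
    buf.seek(0)
    loaded = np.load(buf, allow_pickle=True)

assert loaded[0]["lat"] == 51.5
```

The fix_imports and encoding arguments then only come into play when the pickle was written by Python 2.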
I am trying to read a matlab file using scipy
import scipy.io as sio
data = sio.loadmat(filepath)
but I get the error
ValueError: Did not fully consume compressed contents of an miCOMPRESSED element. This can indicate that the .mat file is corrupted.
In Matlab I can open this file without any problem. I also tried to save it again, but nothing changed...
Can you help me?
Here: https://drive.google.com/drive/folders/0B3vXKJ_zYaCJanZfOUVIcGJyR0E
you can find 2 files saved in the same way..
I can open part_000, but not part_001.... why? :(
The problem seems to be caused by the compression. .mat files are compressed automatically from version 7 onward.
Therefore, I suggest trying to save the file in the earlier, uncompressed .mat file version 6:
save(filename, 'data', '-v6');
The problem is with scipy.io.loadmat's verify_compressed_data_integrity keyword argument, which defaults to True. It's trying to do some error checking of the headers, but can raise an error even when the data is extracted just fine. See this related GitHub issue. I'm unsure of the implications of switching this off full-time, but if you use the following, it should resolve your issue in the meantime (I can't check it against your data, it's no longer available at the provided URL).
import scipy.io as sio
data = sio.loadmat(filepath, verify_compressed_data_integrity=False)
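If you would rather not disable the check everywhere, one pattern is to try the strict load first and only fall back when it trips. A sketch (the helper name is mine; per the GitHub issue mentioned above, the check can fire even when the data is fine):

```python
import scipy.io as sio

def loadmat_tolerant(path):
    # Prefer the full integrity check; fall back to skipping it only
    # when the strict load raises.
    try:
        return sio.loadmat(path)
    except ValueError:
        return sio.loadmat(path, verify_compressed_data_integrity=False)
```

This keeps the safety net for files that load cleanly while still reading the ones that trip the header check.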
I can load both files with Octave, and rewrite the one that causes problems
>> data1 = load('part_0001.mat');
>> save -v7 part_0002.mat -struct data1
In Python the rewritten file loads fine, just like your 0000.mat file.
In [8]: data2=loadmat('part_0002.mat')
In [10]: data2.keys()
Out[10]: dict_keys(['RealTime', 'AccNorm', 'Alt', 'FsP', 'DeviceTime', 'FsA', 'Acc', 'imatemp', 'Time', '__version__', '__globals__', '__header__'])
The rewritten file is actually a bit smaller. A V6 file is 13M, and also loadable.
>> save -v6 part_0003.mat -struct data1
So there must be some glitch in loadmat's handling of the V7 format.
Sometimes .mat files become corrupt, so Matlab can't recognize the data type and is unable to load them. When saving the .mat file, try setting long_field_names=True:
scipy.io.savemat(filename, mdict, long_field_names=True)
I have a matrix-factorization process that I'm running on picloud. The output is a set of numpy arrays (ndarray).
Now, I want to save it to my bucket, but I'm not able to zero in on the right way to do it. Let's assume that the array to be saved is P.
I tried:
cloud.bucket.putf(P,'p.csv')
but that returned an error: "IOError: File object is not seekable. Cannot transmit".
I tried
numpy.ndarray.tofile(P,f, sep=",", format="%s") #outputing the array to a file object f
cloud.bucket.putf(f,'p.csv') #saving the file object f in the bucket.
I tried a couple of other things, including using numpy.savetxt (as I would if I ran it locally), but I'm not able to solve this between the picloud documentation and Stack Exchange questions. I haven't tried pickle yet, though. I felt this was something straightforward, but I'm feeling quite silly after spending a few hours on this.
As you guessed, you want to pickle the array as follows:
import cloud
import cPickle as pickle
# to write
cloud.bucket.putf(pickle.dumps(P), 'p.csv')
# to read
obj = pickle.loads(cloud.bucket.getf('p.csv').read())
This is a general way to serialize and store any Python object in your PiCloud Bucket. I also recommend that you store your csv files under a prefix to keep it organized [1].
[1] http://docs.picloud.com/bucket.html#namespacing-with-prefix
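If the object is always an ndarray, numpy's own .npy format is an alternative to pickle. A sketch using an in-memory buffer (the putf call is commented out because it assumes the PiCloud API from the answer above; P here is a stand-in array):

```python
import io
import numpy as np

P = np.arange(6, dtype=np.float64).reshape(2, 3)  # stand-in for the factor matrix

# np.save writes the portable .npy format into a seekable BytesIO,
# which sidesteps the "File object is not seekable" error.
buf = io.BytesIO()
np.save(buf, P)
buf.seek(0)
# cloud.bucket.putf(buf, 'p.npy')   # assumed call, mirroring putf above

# Reading it back later:
restored = np.load(io.BytesIO(buf.getvalue()))
assert np.array_equal(P, restored)
```

Unlike pickle, the .npy format is also readable from other numpy versions and documents the dtype and shape in its header.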
This specific question stems from the attempt to handle large data sets produced by a MATLAB algorithm so that I can process them with Python algorithms.
Background: I have large arrays in MATLAB (typically 20x20x40x15000 [i,j,k,frame]) and I want to use them in Python. So I save the array to a *.mat file and use scipy.io.loadmat(fname) to read the *.mat file into a numpy array. However, a problem arises: if I try to load the entire *.mat file in Python, a memory error occurs. To get around this, I slice the *.mat file into pieces, so that I can load the pieces one at a time into a Python array. If I divide up the *.mat file by frame, I now have 15,000 *.mat files, which quickly becomes a pain to work with (at least on Windows). So my solution is to use zipped files.
Question: Can I use scipy to directly read a *.mat file from a zipped file without first unzipping the file to the current working directory?
Specs: Python 2.7, Windows XP
Current code:
import scipy.io
import zipfile
import numpy as np
def readZip(zfilename, dim, frames):
    data = np.zeros((dim[0], dim[1], dim[2], frames), dtype=np.float32)
    zfile = zipfile.ZipFile(zfilename, "r")
    for i, info in enumerate(zfile.infolist()):
        fname = info.filename
        zfile.extract(fname)
        mat = scipy.io.loadmat(fname)
        data[:, :, :, i] = mat['export']
        mat.clear()
    return data
Tried code:
mat=scipy.io.loadmat(zfile.read(fname))
produces this error:
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
mat=scipy.io.loadmat(zfile.open(fname))
produces this error:
fileobj.seek(0)
UnsupportedOperation: seek
Any other suggestions on handling the data are appreciated.
Thanks!
I am pretty sure that the answer to my question is NO and there are better ways to accomplish what I am trying to do.
Regardless, with the suggestion from J.F. Sebastian, I have devised a solution.
Solution: Save the data in MATLAB in the HDF5 format, namely hdf5write(fname, '/data', data_variable). This produces a *.h5 file which then can be read into python via h5py.
python code:
import h5py
r = h5py.File(fname, 'r+')
data = r['data']
I can now index directly into the data; however, it stays on the hard drive.
print data[:,:,:,1]
Or I can load it into memory.
data_mem = data[:]
However, this once again gives memory errors. So, to get it into memory I can loop through each frame and add it to a numpy array.
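The frame-by-frame copy described above might look like the sketch below. It assumes the '/data' layout produced by the hdf5write call, with the frame as the last axis; a tiny stand-in file is built first so the sketch runs anywhere (the real arrays are 20x20x40x15000):

```python
import os
import tempfile
import h5py
import numpy as np

# Build a tiny stand-in file with the same '/data' layout.
fname = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(fname, "w") as f:
    f["data"] = np.arange(24, dtype=np.float32).reshape(2, 2, 2, 3)

with h5py.File(fname, "r") as f:
    data = f["data"]                      # stays on disk until sliced
    out = np.empty(data.shape, dtype=data.dtype)
    for frame in range(data.shape[3]):    # one frame in memory at a time
        out[:, :, :, frame] = data[:, :, :, frame]
```

Each slice is read from disk on demand, so peak memory is one frame plus the output array; if the output itself is too big, process each frame inside the loop instead of accumulating.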
h5py FTW!
In one of my frozen applications we bundle some files into the .bin file that py2exe creates, then pull them out like this:
z = zipfile.ZipFile(os.path.join(myDir, 'common.bin'))
data = z.read('schema-new.sql')
I am not certain if that would feed your .mat files into scipy, but I'd consider it worth a try.
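Along those lines, wrapping the member's bytes in io.BytesIO should give loadmat the seekable object it wants, without extracting anything to disk. A sketch (the zip is built on the fly here so it runs standalone; a real script would open the 15,000-member archive instead):

```python
import io
import zipfile
import numpy as np
import scipy.io

# Build a throwaway in-memory zip holding one .mat file per "frame"
# (hypothetical stand-in for the archive from the question).
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w") as z:
    mbuf = io.BytesIO()
    scipy.io.savemat(mbuf, {"export": np.ones((2, 2, 2), dtype=np.float32)})
    z.writestr("frame_0000.mat", mbuf.getvalue())

# Read each member into a seekable BytesIO and hand it to loadmat directly.
with zipfile.ZipFile(zbuf, "r") as z:
    for info in z.infolist():
        mat = scipy.io.loadmat(io.BytesIO(z.read(info.filename)))
```

Since zfile.read() returns the member's full bytes, this trades a little memory per member for never touching the working directory, which addresses the original question directly.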