Reading Matrix File (mtx) using Python: not enough values to unpack - python

I want to read a ".mtx" file using Python. The matrix file (31x31) comes from a transportation simulation tool (PTV Visum). I used the following code:
from scipy.io import mmread
A = mmread('./saclay/demand_visum.mtx')
I got the message:
ValueError: not enough values to unpack (expected 5, got 1)
Thanks a lot for your help.

I've investigated a little and found that there is a library for loading these files, called matrixconverters. You can load a PTV Visum matrix with the following code:
from matrixconverters.read_ptv import ReadPTVMatrix
the_matrix = ReadPTVMatrix(filename="pathtoyourfile")
That's a short answer: it works with some exports but not with others, depending on your format. The concrete MTX subformat is given by the first line of your .mtx file; for example, the $V;D3 format didn't work for me, but others did.
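For context: scipy.io.mmread expects a Matrix Market file, whose banner line contains five fields (e.g. "%%MatrixMarket matrix coordinate real general"). A Visum export whose first line is something like "$V;D3" has only one field there, which is exactly what the "expected 5, got 1" error complains about. A minimal sketch (untested, path taken from the question) that picks a reader based on that first line:
from scipy.io import mmread
from matrixconverters.read_ptv import ReadPTVMatrix

path = './saclay/demand_visum.mtx'

# Peek at the first line to decide which format we are dealing with
with open(path, 'r', errors='ignore') as f:
    header = f.readline().strip()

if header.startswith('%%MatrixMarket'):
    A = mmread(path)                            # genuine Matrix Market file
else:
    the_matrix = ReadPTVMatrix(filename=path)   # PTV Visum export (e.g. "$V;D3")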

Related

Storing multiple GeoTiffs in HDF5 file in Python

I want to store multiple GeoTiff files in one HDF5 file to use it for further analysis, since the function I am supposed to use can only deal with HDF5 (so basically like a raster stack in R, but stored in HDF5). I have to use Python. I am relatively new to the HDF5 format (and geoanalysis in Python generally) and don't really know how to approach this issue. Especially keeping the geolocation/projection information seems tricky to me. So far I tried:
import h5py
import rasterio
r1 = rasterio.open("filename.tif")
r2 = rasterio.open("filename2.tif")
with h5py.File('path/test.h5', 'w') as hdf:
    hdf.create_dataset('GeoTiff1', data=r1)
    hdf.create_dataset('GeoTiff2', data=r2)
Yielding the following error:
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
I am pretty sure this is not at all the correct approach, and I'm happy about any suggestions.
What you can try is to do this:
import numpy as np
spec_dtype = h5py.special_dtype(vlen=np.dtype('float64'))
Just make a spec_dtype variable with a float64 type, then pass it to create_dataset:
with h5py.File('path/test.h5', 'w') as hdf:
    hdf.create_dataset('GeoTiff1', data=r1, dtype=spec_dtype)
    hdf.create_dataset('GeoTiff2', data=r2, dtype=spec_dtype)
Apply these and hopefully it will work.
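Not part of the answer above, but a sketch of an alternative worth trying: the original TypeError comes from passing the rasterio DatasetReader objects (r1, r2) themselves to create_dataset, so reading the pixels into a numpy array first and keeping the georeferencing as HDF5 attributes may be closer to a "raster stack". store_geotiff is a hypothetical helper and the attribute names are my own; it assumes rasterio's DatasetReader API (.read(), .crs, .transform):
import h5py
import rasterio

def store_geotiff(hdf, name, path):
    # copy a GeoTIFF's pixel data and georeferencing into an HDF5 dataset
    with rasterio.open(path) as src:
        arr = src.read()                      # numpy array, shape (bands, rows, cols)
        dset = hdf.create_dataset(name, data=arr)
        dset.attrs['crs_wkt'] = src.crs.to_wkt() if src.crs else ''
        dset.attrs['transform'] = src.transform.to_gdal()

with h5py.File('path/test.h5', 'w') as hdf:
    store_geotiff(hdf, 'GeoTiff1', 'filename.tif')
    store_geotiff(hdf, 'GeoTiff2', 'filename2.tif')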
Using HDFql in Python, your use-case could be solved as follows:
import HDFql
HDFql.execute("SHOW FILE SIZE filename.tif, filename2.tif")
HDFql.cursor_next()
HDFql.execute("CREATE DATASET path/test.h5 GeoTiff1 AS OPAQUE(%d) VALUES FROM BINARY FILE filename.tif" % HDFql.cursor_get_bigint())
HDFql.cursor_next()
HDFql.execute("CREATE DATASET path/test.h5 GeoTiff2 AS OPAQUE(%d) VALUES FROM BINARY FILE filename2.tif" % HDFql.cursor_get_bigint())

Trouble reading npy file

I am new to python and am having trouble reading a *.npy file that somebody else saved. If I use the following commands:
import numpy as np
np.load('lat.npy')
I get the following error:
ValueError: Cannot load file containing pickled data when allow_pickle=False
So, I set allow_pickle=True:
np.load('lat.npy',allow_pickle=True)
Then, I get a different error:
OSError: Failed to interpret file 'lat.npy' as a pickle
Maybe it is relevant that I am on a PC, and the other file was written on a Mac.
Am I doing something wrong? (I am sorry if this question has been asked already.) Thank you!
I learned that my colleague's data file was written in python 2, while I am using python 3. Using the np.load command with the following options will work:
np.load('lat.npy',allow_pickle=True,fix_imports=True,encoding='latin1')
It seems I need to set all of those options, but the 'encoding' argument seems especially important. The doc for numpy.load says about the encoding argument, "Only useful when loading Python 2 generated pickled files in Python 3, which includes npy/npz files containing object arrays."
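A side note (not from the original answer): once the file loads with those compatibility options, it can be worth re-saving it so future loads don't need allow_pickle at all. A sketch, assuming the array really holds plain numeric values (e.g. latitudes):
import numpy as np

lat = np.load('lat.npy', allow_pickle=True, fix_imports=True, encoding='latin1')

# re-save as a plain float array so it loads cleanly under Python 3 without pickle
np.save('lat_py3.npy', np.asarray(lat, dtype=float))
lat = np.load('lat_py3.npy')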

Issues loading fieldtrip data with MNE python

I am having challenges loading fieldtrip data with MNE python. When I enter the code below
import mne
original_data = mne.io.read_raw_fif('data.fif', preload=False)
original_info = original_data.info
data_from_ft = mne.read_epochs_fieldtrip('data.mat', original_info)
I get this error:
TypeError: only size-1 arrays can be converted to Python scalars
Is the Fieldtrip data in the form of epochs or raw? It is difficult to say without looking at the data, but you can try using mne.io.read_raw_fieldtrip() or read_evoked_fieldtrip() as the case may be.
It also helps to look at the docs and to ask on the MNE mailing list, which is more active.
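In case it helps, a minimal sketch of those two alternatives (assuming the FieldTrip struct was saved under MNE's default variable name 'data'):
import mne

original_info = mne.io.read_raw_fif('data.fif', preload=False).info

# if the .mat holds continuous (raw) data rather than epochs
raw = mne.io.read_raw_fieldtrip('data.mat', original_info)

# if it holds averaged (evoked) data
evoked = mne.read_evoked_fieldtrip('data.mat', original_info)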

How to output a Numpy array to a file object for Picloud

I have a matrix-factorization process that I'm running on picloud. The output is a set of numpy arrays (ndarray).
Now, I want to save it to my bucket, but I'm not able to zero in on the right way to do it. Let's assume that the array to be saved is P.
I tried:
cloud.bucket.putf(P,'p.csv')
but that returned an error: "IOError: File object is not seekable. Cannot transmit".
I tried
numpy.ndarray.tofile(P,f, sep=",", format="%s") #outputing the array to a file object f
cloud.bucket.putf(f,'p.csv') #saving the file object f in the bucket.
I tried a couple of other things, including using numpy.savetxt (as I would if I ran it locally), but I'm not able to solve this between the picloud documentation and stackexchange questions. I haven't tried pickle yet, though. I felt this was something straightforward, but I'm feeling quite silly after spending a few hours on this.
As you guessed, you want to pickle the array as follows:
import cloud
import cPickle as pickle
# to write
cloud.bucket.putf(pickle.dumps(P), 'p.csv')
# to read
obj = pickle.loads(cloud.bucket.getf('p.csv').read())
This is a general way to serialize and store any Python object in your PiCloud Bucket. I also recommend that you store your csv files under a prefix to keep it organized [1].
[1] http://docs.picloud.com/bucket.html#namespacing-with-prefix
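If you would rather avoid pickle, the same pattern works with NumPy's own .npy serialization; a sketch under the same assumptions about putf/getf as the snippet above (Python 2, as in the PiCloud era):
import cloud
import numpy as np
from cStringIO import StringIO

# write: serialize the array in .npy format into an in-memory buffer
buf = StringIO()
np.save(buf, P)
cloud.bucket.putf(buf.getvalue(), 'p.npy')

# read it back
P = np.load(StringIO(cloud.bucket.getf('p.npy').read()))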

Read matlab file (*.mat) from zipped file without extracting to directory in Python

This specific questions stems from the attempt to handle large data sets produced by a MATLAB algorithm so that I can process them with python algorithms.
Background: I have large arrays in MATLAB (typically 20x20x40x15000 [i,j,k,frame]) and I want to use them in Python. So I save the array to a *.mat file and use scipy.io.loadmat(fname) to read the *.mat file into a numpy array. However, a problem arises: if I try to load the entire *.mat file in Python, a memory error occurs. To get around this, I slice the *.mat file into pieces, so that I can load the pieces one at a time into a Python array. If I divide up the *.mat file by frame, I now have 15,000 *.mat files, which quickly becomes a pain to work with (at least in Windows). So my solution is to use zipped files.
Question: Can I use scipy to directly read a *.mat file from a zipped file without first unzipping the file to the current working directory?
Specs: Python 2.7, Windows XP
Current code:
import scipy.io
import zipfile
import numpy as np
def readZip(zfilename, dim, frames):
    data = np.zeros((dim[0], dim[1], dim[2], frames), dtype=np.float32)
    zfile = zipfile.ZipFile(zfilename, "r")
    i = 0
    for info in zfile.infolist():
        fname = info.filename
        zfile.extract(fname)
        mat = scipy.io.loadmat(fname)
        data[:, :, :, i] = mat['export']
        mat.clear()
        i = i + 1
    return data
Tried code:
mat=scipy.io.loadmat(zfile.read(fname))
produces this error:
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
mat=scipy.io.loadmat(zfile.open(fname))
produces this error:
fileobj.seek(0)
UnsupportedOperation: seek
Any other suggestions on handling the data are appreciated.
Thanks!
I am pretty sure that the answer to my question is NO and there are better ways to accomplish what I am trying to do.
Regardless, with the suggestion from J.F. Sebastian, I have devised a solution.
Solution: Save the data in MATLAB in the HDF5 format, namely hdf5write(fname, '/data', data_variable). This produces a *.h5 file which then can be read into python via h5py.
python code:
import h5py
r = h5py.File(fname, 'r+')
data = r['data']
I can now index directly into the data; however, it stays on the hard drive.
print data[:,:,:,1]
Or I can load it into memory.
data_mem = data[:]
However, this once again gives memory errors. So, to get it into memory I can loop through each frame and add it to a numpy array.
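A minimal sketch of that frame-by-frame loop (variable names are illustrative; it assumes the [i, j, k, frame] layout described above):
import h5py
import numpy as np

with h5py.File(fname, 'r') as r:
    data = r['data']                                   # stays on disk as an h5py dataset
    data_mem = np.empty(data.shape, dtype=np.float32)  # preallocate in RAM
    for frame in range(data.shape[3]):
        data_mem[:, :, :, frame] = data[:, :, :, frame]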
h5py FTW!
In one of my frozen applications we bundle some files into the .bin file that py2exe creates, then pull them out like this:
z = zipfile.ZipFile(os.path.join(myDir, 'common.bin'))
data = z.read('schema-new.sql')
I am not certain if that would feed your .mat files into scipy, but I'd consider it worth a try.
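Building on that, a sketch that wraps the bytes from zfile.read() in an in-memory, seekable buffer before handing them to loadmat, which sidesteps the seek error from zfile.open() (assuming scipy.io.loadmat accepts a file-like object, which current versions do):
import io
import zipfile
import scipy.io

zfile = zipfile.ZipFile(zfilename, 'r')
for info in zfile.infolist():
    buf = io.BytesIO(zfile.read(info.filename))  # seekable, never touches the working directory
    mat = scipy.io.loadmat(buf)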
