Read very long array from mat with scipy - python

I have a result file from Dymola (.mat v4) which stores all variables in a huge 1D array (more or less 2 GB of data in one array...). I can't do anything about the file format as we are bound to use Dymola. When trying to read the file using scipy (with Python 2.7.13 64-bit), I get the following error:
C:\Users\...\scipy\io\matlab\mio4.py:352: RuntimeWarning: overflow encountered in long_scalars
  remaining_bytes = hdr.dtype.itemsize * n
C:\...\scipy\io\matlab\mio4.py:172: RuntimeWarning: overflow encountered in long_scalars
  num_bytes *= d
Traceback (most recent call last):
File
...
self.mat = scipy.io.loadmat(fileName, chars_as_strings=False)
File "C:\...\scipy\io\matlab\mio.py", line 136, in loadmat
matfile_dict = MR.get_variables(variable_names)
File "C:\...\scipy\io\matlab\mio4.py", line 399, in get_variables
mdict[name] = self.read_var_array(hdr)
File "C:\...\scipy\io\matlab\mio4.py", line 374, in read_var_array
return self._matrix_reader.array_from_header(header, process)
File "C:\...\scipy\io\matlab\mio4.py", line 137, in array_from_header
arr = self.read_full_array(hdr)
File "C:\...\scipy\io\matlab\mio4.py", line 207, in read_full_array
return self.read_sub_array(hdr)
File "C:\...\scipy\io\matlab\mio4.py", line 178, in read_sub_array
"`variable_names` kwarg to `loadmat`" % hdr.name)
ValueError: Not enough bytes to read matrix 'data_2'; is this a badly-formed file? Consider listing matrices with `whosmat` and loading named matrices with `variable_names` kwarg to `loadmat`
The error/problem is pretty clear to me. My question: Are there any workarounds? Can I still read the file and get the data? Is it possible to split the array while reading it?

I suggest you turn on conversion to the SDF file format, which is based on HDF5 and handles large files better. See Simulation/Setup in Dymola.
Alternatively, you can reduce the number of variables stored in the file using Variable Selections in Dymola.
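If you go the SDF route, the exported file is plain HDF5, so it can be inspected and read incrementally with h5py. A minimal sketch, assuming the converted file is called result.sdf; the signal path '/Time' is hypothetical and needs to be looked up in your own file:
import h5py

# Hypothetical file and dataset names; list the file's contents first.
with h5py.File('result.sdf', 'r') as f:
    f.visit(print)                   # print every group/dataset path
    time = f['/Time'][:]             # read one signal completely
    partial = f['/Time'][0:100000]   # or read it in slices to limit memory use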

Related

trace32 python api read memory address - how?

I am trying to use a Python script to read from memory through TRACE32. I've found the following document: https://www2.lauterbach.com/pdf/api_remote.pdf
I came up with the following code:
local_buffer = ctypes.POINTER(ctypes.c_uint32)
t32api.T32_ReadMemory(byteAddress=addr, access=0x0, buffer=local_buffer, size=size)
print(local_buffer)
Of course there is an initialization of the t32api object - that works. But the code I pasted here causes the following python error:
Traceback (most recent call last):
File "<path_to_python_script>", line 599, in <module>
main()
File "<path_to_python_script>", line 590, in main
process()
File "<path_to_python_script>", line 269, in process
NumberOfEmpr = read_addr(0xf0083100)
File "<path_to_python_script>", line 148, in read_addr
return read_addr_t32(addr, size)
File "<path_to_python_script>", line 137, in read_addr_t32
t32api.T32_ReadMemory(byteAddress=addr, access=0x0, buffer=local_buffer, size=size)
OSError: exception: access violation writing 0xXXXXXXXX
Of course 0xXXXXXXXX is a placeholder for some address; I am guessing it is the address of local_buffer.
If anyone knows how to fix this I would be thankful.
The problem is that the buffer pointer you pass to T32_ReadMemory() must not just be a pointer type: it has to point to memory that actually exists.
So you need to change
local_buffer = ctypes.POINTER(ctypes.c_uint32)
t32api.T32_ReadMemory(byteAddress=addr, access=0x0, buffer=local_buffer, size=size)
print(local_buffer)
to
local_buffer = (ctypes.c_ubyte * size)()
t32api.T32_ReadMemory(byteAddress=addr, access=0x0, buffer=local_buffer, size=size)
print(local_buffer)
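As a follow-up, the filled buffer is just raw bytes. Assuming a 32-bit little-endian target (an assumption, check your platform), it could be turned into an integer like this sketch:
import ctypes

size = 4                                   # hypothetical read size in bytes
local_buffer = (ctypes.c_ubyte * size)()   # filled by T32_ReadMemory()
value = int.from_bytes(bytes(local_buffer), byteorder='little')
print(hex(value))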
Two remarks independent of your problem:
I would suggest using T32_ReadMemoryObj() instead of T32_ReadMemory().
Check trace32_and_python.pdf. Newer TRACE32 versions include a Python module which you can simply import.

How can read Minecraft .mca files so that in python I can extract individual blocks?

I can't find a way of reading Minecraft world files that I can use in Python.
I've looked around the internet but can find no tutorials, and the few libraries that claim to do this never actually work.
from nbt import *
nbtfile = nbt.NBTFile("r.0.0.mca",'rb')
I expected this to work, but instead I got errors about the file not being compressed, or something of the sort.
Full error:
Traceback (most recent call last):
File "C:\Users\rober\Desktop\MinePy\MinecraftWorldReader.py", line 2, in <module>
nbtfile = nbt.NBTFile("r.0.0.mca",'rb')
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 628, in __init__
self.parse_file()
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 652, in parse_file
type = TAG_Byte(buffer=self.file)
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 99, in __init__
self._parse_buffer(buffer)
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nbt\nbt.py", line 105, in _parse_buffer
self.value = self.fmt.unpack(buffer.read(self.fmt.size))[0]
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\gzip.py", line 276, in read
return self._buffer.read(size)
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\gzip.py", line 463, in read
if not self._read_gzip_header():
File "C:\Users\rober\AppData\Local\Programs\Python\Python36-32\lib\gzip.py", line 411, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'\x00\x00')
Use the anvil-parser package (install it with pip install anvil-parser).
Reading
import anvil
region = anvil.Region.from_file('r.0.0.mca')
# You can also provide the region file name instead of the object
chunk = anvil.Chunk.from_region(region, 0, 0)
# If `section` is not provided, will get it from the y coords
# and assume it's global
block = chunk.get_block(0, 0, 0)
print(block) # <Block(minecraft:air)>
print(block.id) # air
print(block.properties) # {}
https://pypi.org/project/anvil-parser/
According to this page, a .mca file is not simply an NBT file. It begins with an 8 KiB header which contains the offsets of the chunks within the region file and the timestamps of their last updates.
I recommend reading the official announcement and this page for more information.
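If you want to see what that header means in practice, here is a minimal sketch of the region layout described above (4 KiB of location entries, 4 KiB of timestamps, then length-prefixed, usually zlib-compressed chunk payloads); a maintained parser such as anvil-parser handles the edge cases for you:
import gzip
import struct
import zlib

def read_chunk_nbt(region_path, cx, cz):
    """Return the decompressed NBT payload of chunk (cx, cz), or None if absent."""
    with open(region_path, 'rb') as f:
        # First 4 KiB: 1024 location entries, 3-byte sector offset + 1-byte sector count.
        f.seek(4 * ((cx & 31) + 32 * (cz & 31)))
        entry = f.read(4)
        sector_offset = int.from_bytes(entry[:3], 'big')
        if sector_offset == 0:
            return None  # chunk not generated yet
        # Chunk data: 4-byte big-endian length, 1-byte compression type, payload.
        f.seek(sector_offset * 4096)
        length = struct.unpack('>I', f.read(4))[0]
        compression = f.read(1)[0]   # 1 = gzip, 2 = zlib (the common case)
        payload = f.read(length - 1)
        return gzip.decompress(payload) if compression == 1 else zlib.decompress(payload)

# nbt_bytes = read_chunk_nbt('r.0.0.mca', 0, 0)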

h5py randomly unable to open object (component not found)

I'm trying to load HDF5 datasets in a PyTorch training loop.
Regardless of num_workers in the DataLoader, this randomly throws "KeyError: 'Unable to open object (component not found)'" (traceback below).
I'm able to start the training loop, but can't get through a quarter of an epoch without this error, which happens for random 'datasets' (each a 2D array). I'm able to load these arrays separately in the console using the regular f['group/subroup'][()], so it doesn't appear that the HDF file is corrupted or that there's anything wrong with the datasets/arrays.
I've tried:
adjusting num_workers as per various other issues people have had with PyTorch - it still happens with num_workers=0.
upgrading/downgrading torch, numpy and python versions.
using f.close() at the end of the data loader __getitem__.
using a fresh conda env and installing dependencies.
calling parent groups first, then initialising the array, e.g. X = f[ID] then X = X[()].
using double slashes in the hdf path.
Because this recurs with num_workers=0, I figure it's not a multithreading issue although the traceback seems to point to lines from /torch/utils/data/dataloader that prep the next batch.
I just can't figure out why h5py can't see the odd individual dataset, randomly.
IDs are strings to match hdf paths eg:
ID = "ID_12345//Ep_-1//AN_67891011//ABC"
excerpt from dataloader:
def __getitem__(self, index):
    ID = self.list_IDs[index]
    # Start hdf file in read mode:
    f = h5py.File(self.hdf_file, 'r', libver='latest', swmr=True)
    X = f[ID][()]
    X = X[:, :, np.newaxis]  # torchvision 0.2.1 needs (H x W x C) for transforms
    y = self.y_list[index]
    if self.transform:
        X = self.transform(X)
    return ID, X, y
Expected: the training loop runs.
Actual: IDs / datasets / examples load fine initially, then after somewhere between 20 and 200 steps...
Traceback (most recent call last):
  File "Documents/BSSA-loc/mamdl/models/main_v3.py", line 287, in <module>
    main()
  File "Documents/BSSA-loc/mamdl/models/main_v3.py", line 203, in main
    for i, (IDs, images, labels) in enumerate(train_loader):
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 615, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/james/Documents/BSSA-loc/mamdl/src/data_loading/Data_loader_v3.py", line 59, in __getitem__
    X = f[ID][()]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/h5py/_hl/group.py", line 262, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (component not found)'
For the record, my best guess is that this was due to a bug in my code for HDF construction, which was stopped and started multiple times in append mode.
Some datasets appeared complete when queried with f['group/subroup'][()] but could not be loaded with the PyTorch dataloader.
Haven't had this issue since rebuilding the HDF differently.
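For anyone hitting the same thing, a cheap sanity check is to walk the ID list once before training and confirm that every path resolves to a dataset. A sketch reusing the attribute names from the question (hdf_file, list_IDs):
import h5py

def find_missing_ids(hdf_file, id_list):
    """Return every ID that does not resolve to an h5py Dataset."""
    missing = []
    with h5py.File(hdf_file, 'r') as f:
        for ID in id_list:
            obj = f.get(ID)          # returns None instead of raising KeyError
            if not isinstance(obj, h5py.Dataset):
                missing.append(ID)
    return missing

# missing = find_missing_ids(self.hdf_file, self.list_IDs)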

Persisting a Large scipy.sparse.csr_matrix

I have a very large sparse scipy matrix. Attempting to use save_npz resulted in the following error:
>>> sp.save_npz('/projects/BIGmatrix.npz',W)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py", line 716, in _savez
pickle_kwargs=pickle_kwargs)
File "/usr/local/lib/python3.5/dist-packages/numpy/lib/format.py", line 597, in write_array
array.tofile(fp)
OSError: 6257005295 requested and 3283815408 written
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/scipy/sparse/_matrix_io.py", line 78, in save_npz
np.savez_compressed(file, **arrays_dict)
File "/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py", line 659, in savez_compressed
_savez(file, args, kwds, True)
File "/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py", line 721, in _savez
raise IOError("Failed to write to %s: %s" % (tmpfile, exc))
OSError: Failed to write to /projects/BIGmatrix.npzg6ub_z3y-numpy.npy: 6257005295 requested and 3283815408 written
As such, I wanted to try persisting it to Postgres via psycopg2, but I haven't found a way of iterating over all nonzeros so that I can persist them as rows in a table.
What is the best way to handle this task?
Save the arrays that define the matrix (data, indices, indptr and the shape), and recreate the csr_matrix when loading:
from scipy import sparse
import numpy as np

# Build a random sparse matrix for demonstration
a = np.zeros((1000, 2000))
a[np.random.randint(0, 1000, 100), np.random.randint(0, 2000, 100)] = np.random.randn(100)
b = sparse.csr_matrix(a)

# Save the three defining arrays plus the shape
np.savez("tmp", data=b.data, indices=b.indices, indptr=b.indptr, shape=np.array(b.shape))

# Reload and rebuild the csr_matrix
f = np.load("tmp.npz")
b2 = sparse.csr_matrix((f["data"], f["indices"], f["indptr"]), shape=f["shape"])
(b != b2).sum()  # 0 means the round trip preserved the matrix
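If you still want to push the entries into Postgres, the nonzeros of a CSR matrix are easiest to iterate through its COO form. A sketch reusing b from the snippet above (the actual INSERT/psycopg2 details are left out):
coo = b.tocoo()   # row, col and data arrays of all stored entries
for i, j, value in zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()):
    # insert (i, j, value) as one table row, e.g. via cursor.executemany()
    # or by streaming a CSV-like buffer to copy_from()
    pass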
It seems that the way things go is:
When you invoke scipy.sparse.save_npz(), by default it saves as a compressed file; in order to do so it first creates a temporary uncompressed version of the target file, which it then compresses down to the final result. This means that whatever drive you save to needs to be large enough to hold the uncompressed temp file, which in my case was 47 GB.
I retried the save on a larger drive and the process completed without incident.
Note: the compression can take quite a long time.
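One thing worth noting: scipy.sparse.save_npz also accepts compressed=False, which skips the compression pass entirely (whether that avoids the large temporary file depends on your numpy version, so treat this as something to try rather than a guaranteed fix):
import scipy.sparse as sp

# Uncompressed .npz: larger on disk, but no compression step at save time.
sp.save_npz('/projects/BIGmatrix.npz', W, compressed=False)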

Create HDF5 file using pytables with table format and data columns

I want to read an HDF5 file previously created with PyTables.
The file is read using Pandas, with some conditions, like this:
pd.read_hdf('myH5file.h5', 'anyTable', where='some_conditions')
From another question, I have been told that, in order for an HDF5 file to be queryable with read_hdf's where argument, it must be written in table format and, in addition, some columns must be declared as data columns.
I cannot find anything about this in the PyTables documentation.
The documentation on PyTables' create_table method does not mention anything about it.
So, right now, if I try something like that on my HDF5 file created with PyTables I get the following:
>>> d = pd.read_hdf('test_file.h5','basic_data', where='operation==1')
C:\Python27\lib\site-packages\pandas\io\pytables.py:3070: IncompatibilityWarning:
where criteria is being ignored as this version [0.0.0] is too old (or
not-defined), read the file in and write it out to a new file to upgrade (with
the copy_to method)
warnings.warn(ws, IncompatibilityWarning)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 323, in read_hdf
return f(store, True)
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 305, in <lambda>
key, auto_close=auto_close, **kwargs)
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 665, in select
return it.get_result()
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 1359, in get_result
results = self.func(self.start, self.stop, where)
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 658, in func
columns=columns, **kwargs)
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 3968, in read
if not self.read_axes(where=where, **kwargs):
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 3196, in read_axes
values = self.selection.select()
File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 4482, in select
start=self.start, stop=self.stop)
File "C:\Python27\lib\site-packages\tables\table.py", line 1567, in read_where
self._where(condition, condvars, start, stop, step)]
File "C:\Python27\lib\site-packages\tables\table.py", line 1528, in _where
compiled = self._compile_condition(condition, condvars)
File "C:\Python27\lib\site-packages\tables\table.py", line 1366, in _compile_condition
compiled = compile_condition(condition, typemap, indexedcols)
File "C:\Python27\lib\site-packages\tables\conditions.py", line 430, in compile_condition
raise _unsupported_operation_error(nie)
NotImplementedError: unsupported operand types for *eq*: int, bytes
EDIT:
The traceback mentions an IncompatibilityWarning and version [0.0.0]; however, if I check my versions of pandas and tables I get:
>>> import pandas
>>> pandas.__version__
'0.15.2'
>>> import tables
>>> tables.__version__
'3.1.1'
So, I am totally confused.
I had the same issue, and this is what I have done:
1. Create an HDF5 file with PyTables;
2. Read this HDF5 file with pandas.read_hdf, using parameters like where=where_string, columns=selected_columns.
I got the warning message below, plus other error messages:
D:\Program Files\Anaconda3\lib\site-packages\pandas\io\pytables.py:3065: IncompatibilityWarning: where criteria is being ignored as this version [0.0.0] is too old (or not-defined), read the file in and write it out to a new file to upgrade (with the copy_to method)
  warnings.warn(ws, IncompatibilityWarning)
I tried commands like this:
hdf5_store = pd.HDFStore(hdf5_file, mode = 'r')
h5cpt_store_new = hdf5_store.copy(hdf5_new_file, complevel=9, complib='blosc')
h5cpt_store_new.close()
Then I ran the command exactly as in step 2, and it works.
pandas.__version__
'0.17.1'
tables.__version__
'3.2.2'
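If the file can be written from pandas instead of raw PyTables, the where queries work out of the box once the frame is stored in table format with the relevant data columns declared. A small sketch using the names from the question (test_file.h5, basic_data, operation); the toy data is made up:
import pandas as pd

df = pd.DataFrame({'operation': [1, 2, 1], 'value': [0.1, 0.2, 0.3]})  # toy data

# format='table' makes the store queryable; data_columns lists the columns
# that may appear in a `where` expression.
df.to_hdf('test_file.h5', key='basic_data', format='table', data_columns=['operation'])

d = pd.read_hdf('test_file.h5', key='basic_data', where='operation == 1')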
