Building a huge numpy array using pytables - python

How can I create a huge numpy array using PyTables? I tried this, but it gives me the "ValueError: array is too big." error:
import numpy as np
import tables as tb
ndim = 60000
h5file = tb.openFile('test.h5', mode='w', title="Test Array")
root = h5file.root
h5file.createArray(root, "test", np.zeros((ndim,ndim), dtype=float))
h5file.close()

Piggybacking off of @b1r3k's response: to create an array that you are not going to access all at once (i.e. you never bring the whole thing into memory), you want to use a CArray (chunked array). The idea is that you then fill and access it incrementally:
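import numpy as np
import tables as tb

ndim = 60000
# PyTables 3.x names: open_file / create_carray (openFile / createCArray are the old 2.x names)
with tb.open_file('test.h5', mode='w', title="Test Array") as h5file:
    # Nothing is loaded into RAM here; the array lives in the HDF5 file
    x = h5file.create_carray(h5file.root, 'x', tb.Float64Atom(), shape=(ndim, ndim))
    x[:100, :100] = np.random.random(size=(100, 100))  # Now put in some data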
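A minimal sketch of filling the whole CArray incrementally, block of rows by block of rows, assuming PyTables 3.x. The helper name fill_carray_in_blocks, the block_rows parameter and the random-number data source are hypothetical placeholders; the zlib Filters setting is an optional choice, since CArray supports compression.

```python
import numpy as np
import tables as tb

def fill_carray_in_blocks(path, ndim, block_rows=1000):
    """Create an (ndim, ndim) float64 CArray on disk and fill it one block of
    rows at a time, so only block_rows * ndim values are in memory at once."""
    # Optional compression (zlib, level 5); CArray supports HDF5 filters
    filters = tb.Filters(complevel=5, complib="zlib")
    with tb.open_file(path, mode="w", title="Test Array") as h5file:
        x = h5file.create_carray(h5file.root, "x", tb.Float64Atom(),
                                 shape=(ndim, ndim), filters=filters)
        for start in range(0, ndim, block_rows):
            stop = min(start + block_rows, ndim)
            # Hypothetical data source: replace with however you compute each block
            x[start:stop, :] = np.random.random(size=(stop - start, ndim))
    return path
```

With ndim = 60000 and block_rows = 1000, each block is 1000 * 60000 * 8 bytes = 480 MB in RAM; lower block_rows if that is too much.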

You could try the tables.CArray class, as it supports compression, but...
I think the question is more about numpy than PyTables, because you are creating the array with numpy before storing it with PyTables.
That means you need a lot of RAM to execute np.zeros((ndim, ndim)): 60000 * 60000 float64 values take 60000 * 60000 * 8 bytes = 28.8 GB, and this is probably where the "ValueError: array is too big." exception is raised.
If the matrix is not dense, you could use one of the sparse matrix representations available in SciPy: http://docs.scipy.org/doc/scipy/reference/sparse.html
Another solution is to access your array in chunks if you don't need the whole array at once; check out this thread: Very large matrices using Python and NumPy
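For the sparse route, a minimal sketch with scipy.sparse, assuming the nonzero entries are known as (row, col, value) triples. The helper build_sparse and the example entries are illustrative. Only the nonzeros (plus a row-pointer array of ndim + 1 integers for CSR) are stored, so a 60000 x 60000 matrix with a handful of entries fits easily in memory.

```python
import numpy as np
from scipy import sparse

def build_sparse(ndim, entries):
    """entries: iterable of (row, col, value) for the nonzero elements."""
    rows, cols, vals = zip(*entries)
    # COO builds the matrix straight from coordinate lists; zeros are never stored
    coo = sparse.coo_matrix((np.array(vals, dtype=np.float64), (rows, cols)),
                            shape=(ndim, ndim))
    return coo.tocsr()  # CSR supports fast arithmetic and row slicing

m = build_sparse(60000, [(0, 0, 1.0), (123, 45678, 2.5), (59999, 59999, -4.0)])
```

To persist it, sparse.save_npz("test_sparse.npz", m) writes only the stored entries, and sparse.load_npz reads it back.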

Related

Problems with obtaining and saving 2D slices from a 3D array

I'm trying to save a 2D slice of a 3D array that I'm slicing with the following code:
import nibabel as nib
import numpy as np
from nibabel.testing import data_path
import os
vol1= np.load("teste01.npy")
zSlice= (vol1[1, :, :]).squeeze()
print (zSlice.shape)
np.save(zSlice, os.path.join("D:/Volumes convertidos LIDC/slice01.npy"))
I'm getting an error: TypeError: expected str, bytes or os.PathLike object, not ndarray
Is there any way to fix this? I need 2D arrays so that I can feed my images into an automatic lung vessel segmentation model, but I only have 3D images. Is there a way to obtain all the slices from a 3D image instead of slicing it manually (as I'm trying to do)?
You just mixed up the parameters for numpy.save. Use the filename as the first parameter and the data as the second:
np.save(os.path.join("D:/Volumes convertidos LIDC", "slice01.npy"), zSlice)
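For the second part of the question (getting every slice rather than one), a minimal sketch that loops over one axis and saves each 2D slice. save_all_slices, its prefix parameter and the file naming are hypothetical; which axis is the axial one depends on how your volume is oriented, so axis=0 is an assumption.

```python
import os
import numpy as np

def save_all_slices(volume, out_dir, axis=0, prefix="slice"):
    """Save every 2D slice of a 3D array along `axis` as its own .npy file."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # moveaxis brings the slicing axis to the front so we can iterate over it
    for i, slice_2d in enumerate(np.moveaxis(volume, axis, 0)):
        path = os.path.join(out_dir, f"{prefix}{i:03d}.npy")
        np.save(path, slice_2d)  # filename first, array second
        paths.append(path)
    return paths
```

Usage: save_all_slices(np.load("teste01.npy"), "D:/Volumes convertidos LIDC") writes slice000.npy, slice001.npy, and so on. If the model accepts arrays directly, you can skip the files and just iterate over np.moveaxis(volume, axis, 0).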

VTK vtkDataSet to 3D numpy array and back

I'm taking my first steps with VTK and I'm struggling quite a bit given the lack of documentation.
I've got a .vtk file containing a vtkDataSet object that I didn't create. I need to export its content, convert it to a 3D numpy matrix, customise it and its tensor, and then write everything back into a vtkDataSet object and a .vtk file.
So far I've only managed to save the coordinates of the points (which is not what I need) into a numpy array using vtk_to_numpy from vtk.util.numpy_support. However, I need a 3D numpy matrix representing the volume rendering of it.
As for the tensor, I figured out how and where to save my 9-element tensor in the file. I'm just not sure how to set it properly so that it is associated with the points.
The last step, going from a 3D numpy array back to VTK, looks feasible using numpy.ravel and numpy_to_vtk from vtk.util.numpy_support.
Here's some code I'm using as a test:
import numpy as np
import vtk
from vtk.util.numpy_support import numpy_to_vtk, vtk_to_numpy

# reader for mrtrix vtk file
reader = vtk.vtkDataSetReader()
file_name = 'my_file.vtk'
reader.SetFileName(file_name)
reader.Update()
# get the vtkDataSet
data_set = reader.GetOutput()
# these are the coordinates of the points
# I'd need the 3D numpy volume rendering matrix instead
point_array = vtk_to_numpy(data_set.GetPoints().GetData())
# test tensor
# I'd need to save a tensor for every element of the 3D numpy matrix
tensor = numpy_to_vtk(np.zeros([data_set.GetNumberOfPoints(), 9]))
tensor.SetName('Tensors_')
point_data = data_set.GetPointData()
point_data.SetAttribute(tensor, vtk.vtkDataSetAttributes.TENSORS)  # TENSORS == 4
This may be useful in your case:
https://github.com/marcomusy/vedo/blob/master/vedo/examples/volumetric/numpy2volume1.py
and retrieve the numpy array from the resulting Volume object (vol) with, e.g.:
print('numpy array from Volume:', vol.getPointArray().shape)
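On the ravel step: the pitfall is index order. The sketch below assumes the data set is structured (a vtkImageData / structured points grid whose dimensions come from GetDimensions()), where VTK stores point data with x varying fastest, then y, then z. If the file holds a point set instead (GetPoints() works on it, as in the question's code), it would first need resampling onto a regular grid, e.g. with VTK's vtkProbeFilter, which is not shown. The helper names are hypothetical.

```python
import numpy as np

def vtk_flat_to_volume(flat, dims):
    """Reshape point data from a structured VTK grid into a 3D (or 4D) array.

    flat: array from vtk_to_numpy, shape (n_points,) for scalars or
          (n_points, n_components), e.g. 9 for a tensor.
    dims: (nx, ny, nz), e.g. from vtkImageData.GetDimensions().
    Returns an array indexed as volume[i, j, k] (plus a trailing component axis).
    """
    flat = np.asarray(flat)
    # VTK orders points with x varying fastest, then y, then z: that is
    # Fortran (column-major) order for an (nx, ny, nz) array.
    return flat.reshape(tuple(dims) + flat.shape[1:], order="F")

def volume_to_vtk_flat(volume):
    """Inverse of vtk_flat_to_volume: flatten back to VTK point order."""
    n_points = volume.shape[0] * volume.shape[1] * volume.shape[2]
    flat = volume.reshape((n_points,) + volume.shape[3:], order="F")
    # Return a C-contiguous copy, the layout numpy_to_vtk works with
    return np.ascontiguousarray(flat)
```

Usage, under the same assumption: vol = vtk_flat_to_volume(vtk_to_numpy(data_set.GetPointData().GetScalars()), data_set.GetDimensions()), and after editing, data_set.GetPointData().SetScalars(numpy_to_vtk(volume_to_vtk_flat(vol), deep=True)). A (nx, ny, nz, 9) tensor volume round-trips the same way.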

How to effectively store a very large list in python

Question: I have a big collection of 3D images that I would like to store in one file. How should I do this efficiently?
Background: The dataset has about 1,000 3D MRI images, each 256 by 256 by 156. To avoid frequently opening and closing files, I was trying to store all of them in one big list and export it.
So far I have tried reading each MRI in as a 3D numpy array and appending it to a list. When I tried to save it using numpy.save, it consumed all my memory and exited with a "Memory Error".
Here is the code i tried:
import numpy as np
import nibabel as nib
import os

file_list = os.listdir('path/to/files')
data = []
for file in file_list:
    mri = nib.load(os.path.join('path/to/files', file))
    mri_array = np.array(mri.dataobj)
    data.append(mri_array)
np.save('imported.npy', data)
Expected Outcome:
Is there a better way to store such dataset without consuming too much memory?
The HDF5 file format and NumPy's memmap are the two options I would reach for first if you want to put all your data into one file. Neither requires loading all the data into memory at once.
Python has the h5py package to handle HDF5 files. HDF5 has a lot of features, and I would generally lean toward this option. It would look something like this:
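import h5py

# mri_images: iterable of 3D arrays; use a generator that loads one file at a
# time, otherwise the list itself fills your memory again
with h5py.File('data.h5', 'w') as h5file:  # h5py >= 3 opens files read-only by default
    for n, image in enumerate(mri_images):
        h5file[f'image{n}'] = image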
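A variant, as a sketch: put every image into one 4D dataset instead of 1,000 separate ones, with one chunk per image and gzip compression. write_h5_stack is a hypothetical helper, and the chunk layout and compression choice are assumptions to tune for how you read the data (smaller chunks suit reading sub-volumes).

```python
import h5py

def write_h5_stack(path, images, shape, dtype):
    """Write images into one chunked, gzip-compressed dataset named 'images'.

    images: iterable of 3D arrays (ideally a generator loading one file at a time).
    shape: shape of the whole stack, e.g. (1000, 256, 256, 156).
    """
    with h5py.File(path, "w") as f:
        # One chunk per image, so reading images[n] touches exactly one chunk
        dset = f.create_dataset("images", shape=shape, dtype=dtype,
                                chunks=(1,) + tuple(shape[1:]),
                                compression="gzip")
        for n, image in enumerate(images):
            dset[n] = image  # only this image needs to be in RAM
```

Reading back is lazy: inside with h5py.File('data.h5', 'r') as f, the expression f['images'][10] loads only the eleventh image.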
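memmap works with raw binary files, so it is not feature-rich at all (the file does not even record the array's shape or dtype). This would look something like:
import numpy as np
# Use your images' actual dtype; int is 64-bit on most platforms (about 82 GB for this shape)
bin_file = np.memmap('data.bin', mode='w+', dtype=int, shape=(1000, 256, 256, 156))
for n, image in enumerate(mri_images):
    bin_file[n] = image
bin_file.flush()  # write pending changes to disk
del bin_file      # close the memory map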
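If you want to stay with a memory map but keep shape and dtype with the data, np.lib.format.open_memmap creates a .npy file (header included) that you can fill in place and later reopen with np.load(..., mmap_mode='r'). A sketch; create_npy_stack is a hypothetical name.

```python
import numpy as np

def create_npy_stack(path, images, shape, dtype):
    """Write images one at a time into a memory-mapped .npy file.

    Unlike a raw np.memmap file, the .npy header records shape and dtype,
    so the file can be reopened later without knowing them.
    """
    stack = np.lib.format.open_memmap(path, mode="w+", dtype=dtype, shape=shape)
    for n, image in enumerate(images):
        stack[n] = image  # only this image needs to be in RAM
    stack.flush()  # make sure everything is written to disk
    del stack
    return path
```

Reopen lazily with images = np.load('data.npy', mmap_mode='r'); indexing images[0] reads only that image from disk.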

How to save numpy masked array to file

What is the most efficient way of saving a numpy masked array? Unfortunately numpy.save doesn't work:
import numpy as np
a = np.ma.zeros((500, 500))
np.save('test', a)
This gives a:
NotImplementedError: Not implemented yet, sorry...
One way seems to be using pickle, but that unfortunately is not very efficient (huge file sizes), and not platform-independent. Also, netcdf4 seems to work, but it has a large overhead just to save a simple array.
Has anyone had this problem before? I'm tempted to just numpy.save array.data and, separately, the mask.
import numpy as np
a = np.ma.zeros((500, 500))
a.dump('test')
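then read it with (allow_pickle=True is required since NumPy 1.16.3, because dump writes a pickle)
a = np.load('test', allow_pickle=True)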
The current accepted answer is somewhat obsolete, and badly inefficient if the array being stored is sparse (it relies on uncompressed pickling of the array).
A better way to save/load a masked array would be to use an npz file:
import numpy as np
# Saving masked array 'arr':
np.savez_compressed('test.npz', data=arr.data, mask=arr.mask)
# Loading array back
with np.load('test.npz') as npz:
    arr = np.ma.MaskedArray(**npz)
If you have a fixed mask that doesn't need to be saved, then you can just save the valid values:
a = np.ma.MaskedArray(values,mask)
np.save('test', a.compressed())
You can then recover it doing something like:
compressed = np.load('test.npy')  # np.save appended the .npy extension
values = np.zeros_like(mask, dtype=compressed.dtype)
np.place(values, ~mask, compressed)
a = np.ma.MaskedArray(values, mask)
A simple way to do it would be to save the data and mask of the masked array separately:
np.save('DIN_WOA09.npy',DIN_woa.data)
np.save('mask_WOA09.npy',DIN_woa.mask)
Then later, you can reconstruct the masked array from the data and mask.
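The save/reconstruct round trip can be sketched as a pair of helpers (hypothetical names). One detail worth handling: when nothing is masked, arr.mask is the scalar nomask rather than a full array, so np.ma.getmaskarray is used to always save a full boolean mask.

```python
import numpy as np

def save_masked(arr, data_path, mask_path):
    """Save a masked array as two .npy files: the data and the mask."""
    np.save(data_path, np.ma.getdata(arr))
    # getmaskarray always returns a full boolean array, even for `nomask`
    np.save(mask_path, np.ma.getmaskarray(arr))

def load_masked(data_path, mask_path):
    """Rebuild the masked array from the two files."""
    return np.ma.MaskedArray(np.load(data_path), mask=np.load(mask_path))
```

Usage: save_masked(DIN_woa, 'DIN_WOA09.npy', 'mask_WOA09.npy'), then DIN_woa = load_masked('DIN_WOA09.npy', 'mask_WOA09.npy').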
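Saving it inside a dictionary lets you keep its original format and mask without any trouble (note that the dictionary is pickled). Something like:
b = {}
b['a'] = a
np.save('b', b)
should work fine. Loading it back requires allow_pickle=True (since NumPy 1.16.3):
b = np.load('b.npy', allow_pickle=True).item()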

Python NumPy Convert FFT To File

I was wondering if it's possible to get the frequencies present in a WAV file with NumPy, and then alter those frequencies and create a new WAV file from them? I would like to do some filtering on a file, but I have yet to see a way to read a WAV file into NumPy, filter it, and then output the filtered version. If anyone could help, that would be great.
SciPy provides functions for doing FFTs on NumPy arrays, and also provides functions for reading and writing them to WAV files. e.g.
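from scipy.io.wavfile import read, write
from scipy.fft import rfft, irfft
import numpy as np

rate, data = read('input.wav')  # data: (n_samples,) or (n_samples, n_channels)
spectrum = rfft(data, axis=0)   # complex spectrum of each channel
filtered = function_that_does_the_filtering(spectrum)
output = irfft(filtered, n=data.shape[0], axis=0)  # n keeps odd-length signals intact
write('output.wav', rate, output.astype(data.dtype))  # clip first if filtering can push samples out of range
(data, spectrum, filtered and output are all numpy arrays; function_that_does_the_filtering stands for your own filter.)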
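As one possible stand-in for the filtering step, a sketch of a brick-wall low-pass filter that zeroes every FFT bin above a cutoff. It uses numpy.fft, which offers the same rfft/irfft interface plus rfftfreq for the bin frequencies; lowpass and cutoff_hz are hypothetical names.

```python
import numpy as np

def lowpass(samples, rate, cutoff_hz):
    """Zero every FFT bin above cutoff_hz and transform back.

    samples: 1D (mono) or 2D (n_samples, n_channels) array, as returned by
    scipy.io.wavfile.read. Returns float64 samples of the same shape.
    """
    n = samples.shape[0]
    spectrum = np.fft.rfft(samples, axis=0)
    freqs = np.fft.rfftfreq(n, d=1.0 / rate)  # frequency (Hz) of each bin
    spectrum[freqs > cutoff_hz] = 0           # drop everything above the cutoff
    return np.fft.irfft(spectrum, n=n, axis=0)
```

Usage: rate, data = read('input.wav'), then write('output.wav', rate, lowpass(data, rate, 3000).astype(data.dtype)). A hard cutoff like this can cause audible ringing; a smoother filter design (for example scipy.signal.butter with sosfiltfilt) is the usual alternative.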
