Memory management with Python and VTK

I wrote a rather clunky Python program that creates a unit cell of a certain geometry and duplicates it so that I get, e.g., a 3x3x3 array of the unit cell. The result is saved as an .stl file. The goal was to create these structures directly instead of using CAD.
My problem is that the computation time is annoying (~2 minutes for 8x8x8, which is smaller than what I usually need). The biggest problem is that 10x10x10 isn't even possible: VTK immediately throws the error "Unable to allocate [large number] of elements of size 8 bytes." This leads me to believe that my memory management is insufficient (non-existent).
I read about vtkSmartPointer, but I can only seem to find explanations for C++. How do I correctly use vtkSmartPointer with Python? I should add that I don't have any experience with C++ whatsoever.
An MWE is pretty much impossible due to the program size. Here is a shortened example of part of my algorithm pipeline instead:
import vtk

appendFilter = vtk.vtkAppendPolyData()
# create all 12 struts and combine them into one object
for i in range(1, 13, 1):
    tf = create_strut(i, node_dist, amp, d, sides, mode, render=False)  # method that creates my unit cell out of 12 struts; parameters are irrelevant
    appendFilter.AddInputData(tf.GetOutput())
appendFilter.Update()

# clean up poly data
cleanFilter = vtk.vtkCleanPolyData()
cleanFilter.SetInputConnection(appendFilter.GetOutputPort())
cleanFilter.Update()

# cut the cell to its right size
planes = plane_collection(node_dist)  # method that generates 6 clipping planes
clip = vtk.vtkClipClosedSurface()
clip.SetInputData(cleanFilter.GetOutput())
clip.SetClippingPlanes(planes)
clip.Update()

# assemble an array from the unit cell
array = duplicate_cells(clip, xyz[0], xyz[1], xyz[2], node_dist)

# save as .stl
data = array.GetOutputPort()
stlwriter = vtk.vtkSTLWriter()
stlwriter.SetInputConnection(data)
stlwriter.SetFileName("Z:/example.stl")
stlwriter.Update()
stlwriter.Write()
This should demonstrate that I use a lot of filters and never care about deleting them or anything. What is the correct/preferred way to clean up my memory?

Since I posted this, I have read a lot about VTK and Python and came to the conclusion that Python takes care of memory management by itself; smart pointers and Delete() should only be necessary in a C++ environment.
Furthermore, I upgraded from Python 2.7 and VTK 6.2 to Python 3.4 and VTK 7.0, which vastly improved the overall performance of the program. I therefore think the problem was with the old VTK version.
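For completeness: if you do want to release intermediate filters explicitly, dropping the Python references is normally all that is needed, since the wrappers manage the underlying VTK reference counts. A minimal sketch, assuming a pipeline like the one above:

import gc
import vtk

appendFilter = vtk.vtkAppendPolyData()
# ... build and run the pipeline as above ...

# deleting the Python reference decrements the VTK reference count;
# the C++ object is destroyed once nothing refers to it any more
del appendFilter
gc.collect()  # optional: collects any lingering reference cycles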

Related

How to make a for loop in python, which is called multiple times consecutively, execute faster?

I'd like to state off the bat that I don't have a lot of experience with NumPy, and deeper explanations would be appreciated (even obvious ones).
Here's my issue:
converted_X = X
for col in X:
    curr_data = X[col]
    i = 0
    for pix in curr_data:
        inv_pix = 255.0 - pix
        curr_data[i] = inv_pix
        i += 1
    converted_X[col] = curr_data.values
Context: X is a DataFrame with images of handwritten digits (70k images, 784 pixels/image).
The entire point of doing this is to change the black background to white and white numbers to black.
The only problem I'm facing is that it takes a ridiculously long time. I tried using rich.Progress() to track its execution, and it showed an astonishing 4-hour ETA.
Also, I'm executing this code block in the Jupyter notebook extension of VS Code (might be relevant).
I know it probably has to do with a ton of inefficiencies and underuse of NumPy functionality, but I need guidance.
Thanks in advance.
Never write a for loop in Python over numpy data; avoiding that is how you make it fast.
Most of the time there is a way to have numpy do the for loop for you (that is, process the data in batches; obviously there is still a loop somewhere, but not one you wrote in Python).
Here, it seems you are trying to compute an inverted image, whose pixels are 255 minus the original pixels.
Just write inverted_image = 255 - image.
Addendum: note that, used like plain Python containers, numpy arrays are quite inefficient. If you use them just as 2D arrays that you read and write with low-level instructions (setting values individually), then most of the time even good old Python lists are faster. For example, in your case (I just tried it), on my machine your code is 9 times slower with ndarrays than the exact same code using a plain Python list of lists of values.
The whole point of ndarrays is that they are fast when you use them with numpy functions that process the whole array in one batch for you, which would not be as easy with Python lists.
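To see the difference yourself, here is a minimal timing sketch (the array size and repeat count are arbitrary):

import timeit
import numpy as np

img = np.random.randint(0, 256, size=784).astype(np.float64)

def invert_loop(a):
    out = a.copy()
    for i in range(len(a)):       # element-by-element Python loop
        out[i] = 255.0 - a[i]
    return out

print(timeit.timeit(lambda: invert_loop(img), number=1000))    # slow
print(timeit.timeit(lambda: 255.0 - img, number=1000))         # vectorized, far faster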
If X is a numpy array, you can do the following, without any loops:
converted_X = 255.0 - X
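Since X in the question is actually a pandas DataFrame rather than a bare ndarray, the same vectorized expression still applies; a small sketch with a stand-in DataFrame:

import numpy as np
import pandas as pd

# stand-in for the 70k x 784 pixel DataFrame from the question
X = pd.DataFrame(np.random.randint(0, 256, size=(5, 784)).astype(float))
converted_X = 255.0 - X   # elementwise on the whole DataFrame, no Python-level loop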

tensorflow.dataset.shuffle = tensorflow.dataset.prefetch + then shuffle internally?

According to the documentation of tf.dataset.shuffle, it will fill a buffer of size k and then shuffle inside of it. However, I don't want the order of the data to be changed; I only want it to be buffered. Then I found tf.dataset.prefetch, which says "This allows later elements to be prepared while the current element is being processed."
From that description I guess prefetch is what I want (i.e. pre-loading the data while the previous data are being used in training), but while trying to look into the code of tf.dataset.shuffle to see whether it actually calls tf.dataset.prefetch, I got stuck at these lines (pasted below) and cannot find where shuffle_dataset_v3 is defined.
variant_tensor = gen_dataset_ops.shuffle_dataset_v3(
    input_dataset._variant_tensor,  # pylint: disable=protected-access
    buffer_size=self._buffer_size,
    seed=self._seed,
    seed2=self._seed2,
    seed_generator=gen_dataset_ops.dummy_seed_generator(),
    reshuffle_each_iteration=self._reshuffle_each_iteration,
    **self._flat_structure)
My main question is whether prefetch is a replacement for shuffle in terms of buffering the data; it would also be nice if someone could point me to where shuffle_dataset_v3 is implemented.
Yes. Prefetch is for buffering data.
gen_dataset_ops and the other gen_xxx_ops modules are not included in the source code because they are automatically generated by Bazel to wrap the C++ implementation for use in Python. You should be able to find the gen_xxx_ops code in your local installation, for example at ${PYTHON_ROOT}/site-packages/tensorflow/python/ops/gen_dataset_ops.py.
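For reference, a minimal prefetch sketch (assuming TF 2.x): the element order is preserved; only the loading overlaps with consumption.

import tensorflow as tf

ds = tf.data.Dataset.range(10)
# keep a buffer of ready elements while the current one is being processed;
# unlike shuffle, the order is unchanged
ds = ds.prefetch(tf.data.AUTOTUNE)
for x in ds:
    print(x.numpy())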

Multiplying a numpy array within Psychopy

TL;DR: Can I multiply a numpy.average by 2? If yes, how?
For an orientation discrimination experiment, during which people report how well they are able to discriminate the angle between a visible grating and a non-visible reference grating, I want to calculate the Just Noticeable Difference (JND).
At the end of the code I have this:
# write JND to logfile (average of last 10 reversals)
if len(staircase[stairnum].reversalIntensities) < 10:
    dataFile.write('JND = %.3f\n' % numpy.average(staircase[stairnum].reversalIntensities))
else:
    dataFile.write('JND = %.3f\n' % numpy.average(staircase[stairnum].reversalIntensities[-10:]))
This is where the JND is written to the file, and I thought it would be easy to multiply that numpy.average value by 2, but that doesn't work. My next thought was to make two different variables containing the same array and use numpy.sum to add them together.
# Possible solution
x = numpy.average(staircase[stairnum].reversalIntensities[-10:])
y = numpy.average(staircase[stairnum].reversalIntensities[-10:])
numpy.sum(x, y, [et cetera])
I am sure the procedure is very simple, but my current programming skills are limited and the PsychoPy and Python reference materials did not provide what I was looking for (if they do, please share!).
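For what it's worth, a minimal sketch of the multiplication itself (assuming staircase, stairnum and dataFile are defined as in the question): numpy.average returns a scalar, so it can simply be multiplied by 2 before formatting.

# double the averaged JND before writing it out
jnd = numpy.average(staircase[stairnum].reversalIntensities[-10:])
dataFile.write('JND = %.3f\n' % (2 * jnd))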

Voxelize STL file?

Basically, I have a corpus of ~10,000 STL files, and I need to turn them all into 32x32x32 arrays of 1s and 0s (voxels).
I already have this script that turns STL files into voxels: https://github.com/rcpedersen/stl-to-voxel. However, sometimes, even though I specify that I need a 32x32x32 array, it gives me some huge array, and along with being buggy it takes forever (about 600 files processed in 48 hours...).
Would it be easier to try to fix this script, or to write my own? It doesn't seem like voxelizing an STL would be a hard task, but I don't know any of the methods out there for this; any strategies or tips would be greatly appreciated.
Sorry to be a bummer, but voxelisation is actually quite a hard task, and not something Python is suited to doing quickly. Even for a simple slice/crossing test I would expect a C++ implementation to beat Python by a factor of 100. I recommend libigl, or do it on the GPU for real-time performance; look for conservative rasterization. That approach is for "good" meshes that are non-intersecting and closed, though; otherwise it becomes a lot harder. Look for "generalized winding numbers", also in igl.
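If a pure-Python route is still preferred, one option worth trying is the trimesh package, which can voxelize a surface at a given pitch. A rough sketch (the file name is a placeholder, and the resulting grid is only roughly 32 voxels along the longest axis, so it would still need padding or cropping to get exactly 32x32x32):

import numpy as np
import trimesh

mesh = trimesh.load("part.stl")        # placeholder path
pitch = mesh.extents.max() / 32.0      # longest bounding-box side spans ~32 voxels
vox = mesh.voxelized(pitch)            # surface voxelization
grid = vox.matrix.astype(np.uint8)     # dense array of 0s and 1s
print(grid.shape)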
Basically, voxelizing a facet surface means separating inside from outside. It can be done in different ways: the easiest is to find the signed distance for each voxel, but that requires the input mesh to be closed; another way is to compute the winding number. You can find implementations of both in MeshLib. There is also a Python module that can help you:
pip install --upgrade pip
pip install meshlib
from meshlib import mrmeshpy as mm
# load mesh
mesh = mm.loadMesh(mm.Path("path_to_file.stl"))
mtvParams = mm.MeshToVolumeParams()
# signed will have negative values inside mesh and positive outside, but requires closed mesh
mtvParams.type = mm.MeshToVolumeParamsType.Signed
# voxels with precise distance values: 3 inside, 3 outside
mtvParams.surfaceOffset = 3
# find correct voxel size to have 32x32x32 volume
meshBox = mesh.computeBoundingBox()
boxSize = meshBox.max-meshBox.min
mtvParams.voxelSize = boxSize / 27.0
voxels = mm.meshToVolume(mesh,mtvParams)
# save voxels as tiff slices
vsParams = mm.VoxelsSaveSavingSettings()
vsParams.path = "save_voxels_dir"
vsParams.slicePlane = mm.SlicePlane.XY
mm.saveAllSlicesToImage(voxels,vsParams)

Python: Fastest way to iterate this through a large file

Right, I'm iterating through a large binary file
I need to minimise the time of this loop:
def NB2(self, ID_LEN):
    r1 = np.fromfile(ReadFile.fid, dTypes.NB_HDR, 1)
    num_receivers = r1[0][0]
    num_channels = r1[0][1]
    num_samples = r1[0][5]
    blockReturn = np.zeros((num_samples, num_receivers, num_channels))
    for rec in range(0, num_receivers):
        for chl in range(0, num_channels):
            for smpl in range(0, num_samples):
                r2_iq = np.fromfile(ReadFile.fid, np.int16, 2)
                blockReturn[smpl, rec, chl] = np.sqrt(
                    math.fabs(r2_iq[0]) * math.fabs(r2_iq[0])
                    + math.fabs(r2_iq[1]) * math.fabs(r2_iq[1]))
    return blockReturn
So, what's going on is as follows:
r1 is the header of the file; dTypes.NB_HDR is a dtype I made:
NB_HDR = np.dtype([('f3', np.uint32), ('f4', np.uint32), ('f5', np.uint32), ('f6', np.int32), ('f7', np.int32), ('f8', np.uint32)])
That gets all the information about the forthcoming data block, and nicely puts us in the right position within the file (the start of the data block!).
In this data block there is:
4096 samples per channel,
4 channels per receiver,
9 receivers.
So num_receivers, num_channels, num_samples will always be the same (at the moment anyway), but as you can see this is a fairly large amount of data. Each 'sample' is a pair of int16 values that I want to find the magnitude of (hence Pythagoras).
This NB2 code is executed for each 'block' in the file; for a 12GB file (which is how big they are) there are about 20,900 blocks, and I've got to iterate through 1000 of these files (so about 12TB overall). Any speed advantage, even if it's milliseconds, would be massively appreciated.
EDIT: Actually it might be of help to know how I'm moving around inside the file. I have a function as follows:
def navigateTo(self, blockNum, indexNum):
    ReadFile.fid.seek(ReadFile.fileIndex[blockNum][indexNum], 0)
    ReadFile.currentBlock = blockNum
    ReadFile.index = indexNum
Before I run all this code I scan the file and make a list of index locations at ReadFile.fileIndex that I browse using this function and then 'seek' to the absolute location - is this efficient?
Cheers
Because you know the length of a block after you read the header, read the whole block at once. Then reshape the array (very fast, it only affects metadata) and use the np.hypot ufunc:
blockData = np.fromfile(ReadFile.fid, np.int16, num_receivers * num_channels * num_samples * 2)
blockData = blockData.reshape((num_receivers, num_channels, num_samples, 2))
return np.hypot(blockData[:, :, :, 0], blockData[:, :, :, 1])
On my machine it runs in 11ms per block.
import numpy as np

def NB2(self, ID_LEN):
    r1 = np.fromfile(ReadFile.fid, dTypes.NB_HDR, 1)
    num_receivers = r1[0][0]
    num_channels = r1[0][1]
    num_samples = r1[0][5]
    # first, match your array bounds to the way you are walking the file
    blockReturn = np.zeros((num_receivers, num_channels, num_samples))
    for rec in range(0, num_receivers):
        for chl in range(0, num_channels):
            # second, read in all the samples at once if you have enough memory
            r2_iq = np.fromfile(ReadFile.fid, np.int16, 2 * num_samples)
            r2_iq.shape = (-1, 2)  # tell numpy that it is an array of two values
            # create the dot product vector by squaring the data elementwise and then
            # adding those elements together; the result has length num_samples
            r2_iq = r2_iq * r2_iq
            r2_iq = r2_iq[:, 0] + r2_iq[:, 1]
            # get the distance by performing the square root "into" blockReturn
            np.sqrt(r2_iq, out=blockReturn[rec, chl, :])
    return blockReturn
This should help your performance. Two main numpy ideas are at work here. First, your result array's dimensions should match the way your loop dimensions are crafted, for memory locality.
Second, numpy is fast. I've beaten hand-coded C with numpy, simply because it uses LAPACK and vector acceleration. However, to get that power you have to let it manipulate more data at a time. That is why your sample loop has been collapsed: it reads the full set of samples for a receiver and channel in one large read, and then the vector power of numpy is used to calculate the magnitudes via an elementwise dot product.
There is a little more optimization to be had in the magnitude calculation, but numpy recycles buffers for you, making it less important than you might think. I hope this helps!
I'd try to use as few loops and as many constants as possible. Everything that can be done in a linear fashion should be done that way. If values don't change, use constants to reduce lookups and such, because that eats up CPU cycles. This is from a theoretical point of view ;-)
If possible, use highly optimised libraries. I don't exactly know what you are trying to achieve, but I'd rather use an existing FFT library than write one myself :>
One more thing: http://en.wikipedia.org/wiki/Big_O_notation (can be an eye-opener)
Most importantly, you shouldn't do file access at the lowest level of a triple nested loop, whether you do this in C or Python. You've got to read in large chunks of data at a time.
So to speed this up, read in large chunks of data at a time, and process that data using numpy indexing (that is, vectorize your code). This is particularly easy in your case since all your sample data is int16. Just read in big chunks of data, reshape the data into an array that reflects the (receiver, channel, sample) structure, then use the appropriate indexing to multiply and add things for Pythagoras, and the 'sum' command to add up the terms in the resulting array.
This is more of an observation than a solution, but porting that function to C++ and loading it in with the Python API would get you a lot of speed gain to begin with before loop optimization.
