Numpy array via QBuffer to QGeometry - python

My goal is to create a numpy array and load its byte data into a QBuffer. I am wondering how to properly set DataSize, ByteStride, and Count. See my code below:
self.mesh = Qt3DRender.QGeometryRenderer()
self.mesh.setPrimitiveType(Qt3DRender.QGeometryRenderer.Points)

self.geometry = Qt3DRender.QGeometry(self.mesh)
vertex_data_buffer = Qt3DRender.QBuffer(Qt3DRender.QBuffer.VertexBuffer, self.geometry)
data = np.random.rand(1000, 3)
vertex_data_buffer.setData(QByteArray(data.tobytes()))

self.position_attribute = Qt3DRender.QAttribute()
self.position_attribute.setAttributeType(Qt3DRender.QAttribute.VertexAttribute)
self.position_attribute.setBuffer(vertex_data_buffer)
self.position_attribute.setDataType(Qt3DRender.QAttribute.Float)
self.position_attribute.setDataSize(3)   # ??
self.position_attribute.setByteOffset(0)
self.position_attribute.setByteStride(6) # ??
self.position_attribute.setCount(1000)   # ??
self.position_attribute.setName(Qt3DRender.QAttribute.defaultPositionAttributeName())
self.geometry.addAttribute(self.position_attribute)

We managed to fix this.
First of all, the two lines copied below are deprecated and can be removed:
self.position_attribute.setDataType(Qt3DRender.QAttribute.Float)
self.position_attribute.setDataSize(3)
One line is added in their place:
self.position_attribute.setVertexSize(3)
ByteStride should be set to 12: 3 is the number of coordinates per vertex and 4 is the length of a float32 in bytes. Be aware that the numpy array must be created as float32: data = np.random.rand(1000, 3).astype(np.float32).
self.position_attribute.setByteOffset(0)
self.position_attribute.setByteStride(3*4)
self.position_attribute.setCount(1000)
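Putting it all together, the corrected attribute setup looks like this (a sketch following the question's naming, assuming the same Qt3DRender bindings as in the question):
# Vertex data must be float32 so each coordinate is exactly 4 bytes wide.
data = np.random.rand(1000, 3).astype(np.float32)
vertex_data_buffer.setData(QByteArray(data.tobytes()))

self.position_attribute = Qt3DRender.QAttribute()
self.position_attribute.setAttributeType(Qt3DRender.QAttribute.VertexAttribute)
self.position_attribute.setBuffer(vertex_data_buffer)
self.position_attribute.setVertexSize(3)      # 3 components (x, y, z) per vertex
self.position_attribute.setByteOffset(0)      # data starts at the buffer's beginning
self.position_attribute.setByteStride(3 * 4)  # 3 components * 4 bytes (float32) = 12
self.position_attribute.setCount(1000)        # number of vertices, not number of floats
self.position_attribute.setName(Qt3DRender.QAttribute.defaultPositionAttributeName())
self.geometry.addAttribute(self.position_attribute)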

Related

Behavior of Python numpy tofile and fromfile when saving int16

After appending data to an empty array and verifying its shape, I use tofile to save it. When I read it back with fromfile, the shape is much larger (4x).
# Create empty array
rx_data = np.empty((0), dtype='int16')
# File name to write data
file_name = "Test_file.bin"
# Generate two sine waves and add to rx_data array in interleaved fashion
for i in range(100):
    I = np.sin(2*3.14*(i/100))
    rx_data = np.append(rx_data, I)
    Q = np.cos(2*3.14*(i/100))
    rx_data = np.append(rx_data, Q)
print(rx_data.shape)
# Write array to .bin file
rx_data.tofile(file_name)
# Read back data from the file
s_interleaved = np.fromfile(file_name, dtype=np.int16)
print(s_interleaved.shape)
The above code prints (200,) before the array is saved, but (800,) when the array is read back in. Why does this happen?
The problem is that rx_data has a float64 data type by the time you save it. This is because I and Q are float64 values, so when you use np.append, the result is promoted to be compatible with them. Since a float64 takes 8 bytes and an int16 takes 2, reading the 200-element float64 file back as int16 yields 4 times as many elements: 800.
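A minimal demonstration of the promotion (my own example, not from the question):
import numpy as np

a = np.empty(0, dtype=np.int16)
a = np.append(a, np.sin(0.5))  # np.sin returns a float64 value
print(a.dtype)                 # float64 -- the int16 dtype did not survive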
Also, populating an array using np.append is an anti-pattern in numpy. Appending is common with standard Python lists, but with numpy it is usually better to create an array of the shape and data type you need up front, then fill in the values inside the for loop. This is better because the array is allocated only once; np.append creates a new copy of the whole array every time you call it.
rx_data = np.empty(200, dtype="float64")
for i in range(200):
    rx_data[i] = np.sin(...)
Here is code that works because it reads the file back as float64, matching the dtype the array actually had when it was saved. But this append pattern should be avoided in general; prefer pre-allocating an array and filling in the values in the for loop.
# Create empty array
rx_data = np.empty((0), dtype='int16')
# File name to write data
file_name = "Test_file.bin"
# Generate two sine waves and add to rx_data array in interleaved fashion
for i in range(100):
    I = np.sin(2*3.14*(i/100))
    rx_data = np.append(rx_data, I)
    Q = np.cos(2*3.14*(i/100))
    rx_data = np.append(rx_data, Q)
print(rx_data.shape)
# Write array to .bin file
print("rx_data data type:", rx_data.dtype)  # float64, not int16
rx_data.tofile(file_name)
# Read back data from the file with the matching dtype
s_interleaved = np.fromfile(file_name, dtype=np.float64)
print(s_interleaved.shape)
# Test that the arrays are equal
np.allclose(rx_data, s_interleaved)  # True
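For completeness, here is a sketch of the preferred pre-allocated version of the same program (my rewrite, not the asker's code; it produces the same interleaved file):
import numpy as np

file_name = "Test_file.bin"

# Allocate the array once: 100 interleaved (sin, cos) pairs = 200 values.
rx_data = np.empty(200, dtype=np.float64)
for i in range(100):
    rx_data[2*i] = np.sin(2*3.14*(i/100))      # even indices: I samples
    rx_data[2*i + 1] = np.cos(2*3.14*(i/100))  # odd indices: Q samples

rx_data.tofile(file_name)
s_interleaved = np.fromfile(file_name, dtype=np.float64)
print(np.allclose(rx_data, s_interleaved))  # True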

How to slice and loop through a netCDF variable in Python?

I have a netCDF variable with 372 time-steps; I need to slice this variable to read in each individual time-step for subsequent processing.
I have used glob to read in my 12 netCDF files and then defined the variables.
NAME_files = glob.glob('RGL*nc')
NAME_files = NAME_files[0:12]
for n in NAME_files:
    RGL = Dataset(n, mode='r')
    footprint = RGL.variables['fp'][:]
    lons = RGL.variables['lon'][:]
    lats = RGL.variables['lat'][:]
I now need to repeat the code below in a loop for each of the 372 time-steps of the variable 'footprint'.
footprint_2 = RGL.variables['fp'][:,:,1:2]
I'm new to Python and have a poor grasp of looping. Any help would be appreciated, including better explanation/description of my issue.
You need to determine both the dimensions and the shape of the fp variable in order to access it properly. I'm making assumptions here about those values: your code implies 3 dimensions (time, lon, lat), so that is what I assume below.
footprint_2 = RGL.variables['fp'][:,:,1:2]
But the code above gets all the times and all the lons for a single latitude; the slice 1:2 selects just one value.
fp_dims = RGL.variables['fp'].dimensions
print(fp_dims)
# a tuple of dimension names:
# (u'time', u'lon', u'lat')
fp_shape = RGL.variables['fp'].shape
print(fp_shape)
# a tuple of dimension sizes or lengths:
# (372, 30, 30)

n_times = fp_shape[0]  # don't call this "len": that would shadow the builtin
for time_idx in range(n_times):
    # You don't say whether you want a single lon/lat or all the lon/lats
    # for a given time step.
    test = RGL.variables['fp'][time_idx, :, :]
    # or, if you really want this:
    test = RGL.variables['fp'][time_idx, :, 1:2]
    # or a single lon, lat:
    test = RGL.variables['fp'][time_idx, 8, 8]
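Putting the pieces together, here is a sketch of the complete loop over files and time steps (my own assembly of the code above, assuming netCDF4's Dataset as in the question; the per-step processing is a placeholder):
import glob
from netCDF4 import Dataset

NAME_files = sorted(glob.glob('RGL*nc'))[0:12]
for n in NAME_files:
    RGL = Dataset(n, mode='r')
    n_times = RGL.variables['fp'].shape[0]
    for time_idx in range(n_times):
        # One 2-D (lon, lat) footprint for this time step.
        footprint = RGL.variables['fp'][time_idx, :, :]
        # ... process this time step's footprint here ...
    RGL.close()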

Can vtk InsertValue take float arguments?

I have a question about InsertValue
If I understand correctly, it only takes integer arguments. I was wondering if there is a way to have it take float values? Or maybe some other function that does the job of InsertValue but takes float values? I know there is InsertNextValue, but I am not sure it will be efficient in my case since my array is very big (~ 100.000 by 120).
Below is my code. In it I am making the entries of fl integers to get it working for now, but ideally it would be great if I didn't have to do that.
Thanks in advance :)
import vtk
import math
from vtk import vtkStructuredGrid, vtkPoints, vtkFloatArray, vtkXMLStructuredGridWriter
import scipy.io
import numpy
import os

# Loading the matlab files
mats = scipy.io.loadmat('/home/lusine/data/3DDA/donut_for_vtk/20130228_050000_3D_E=1.mat')
# x, y, z coordinates; fl flux values
xx = mats['xvect']
yy = mats['yvect']
zz = mats['zvect']
fl = mats['fluxmesh3d'] # 3d matrix
nx = xx.shape[1]
ny = yy.shape[1]
nz = zz.shape[1]
fl = numpy.nan_to_num(fl)
inx = numpy.nonzero(fl)
l = len(inx[1])

grid = vtk.vtkStructuredGrid()
grid.SetDimensions(nx, ny, nz)  # sets the dimensions of the grid

# vtkPoints represents 3D points; its data model is an array of vx-vy-vz
# triplets accessible by (point or cell) id.
pts = vtk.vtkPoints()
pts.SetNumberOfPoints(nx*ny*nz)  # specify the number of points for this object to hold
p = 0
for i in range(l):
    pts.InsertPoint(p, xx[0][inx[0][i]], yy[0][inx[1][i]], zz[0][inx[2][i]])
    p = p + 1
grid.SetPoints(pts)

cdata = vtk.vtkFloatArray()
cdata.SetNumberOfComponents(1)
cdata.SetNumberOfTuples((nx-1)*(ny-1)*(nz-1))
cdata.SetName('cellData')
p = 0
for i in range(l-1):
    cdata.InsertValue(p, inx[0][i]+inx[1][i]+inx[2][i])
    p = p + 1
grid.GetCellData().SetScalars(cdata)

pdata = vtk.vtkFloatArray()
pdata.SetNumberOfComponents(1)
# Set the number of tuples (a component group) in the array
pdata.SetNumberOfTuples(nx*ny*nz)
# Set the array name
pdata.SetName('pointData')
for i in range(l):
    pdata.InsertValue(int(fl[inx[0][i]][inx[1][i]][inx[2][i]]), inx[0][i]+inx[1][i]+inx[2][i])
grid.GetPointData().SetScalars(pdata)

writer = vtk.vtkXMLStructuredGridWriter()
writer.SetFileName('new_grid.vts')
#writer.SetInput(grid)
writer.SetInputData(grid)
writer.Update()
print 'end'
The first argument of InsertValue requires an integer because it's the index where the value is going to be inserted. If instead of a vtkFloatArray pdata you had a numpy array called p, this would be the equivalent of your instruction:
pdata.InsertValue(a,b) becomes p[a]=b
p[0.1] wouldn't make sense: a must be an integer!
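A quick illustration of the distinction (my own example): the index must be an integer, but the stored value can happily be a float.
import numpy as np

p = np.zeros(5, dtype=np.float32)
p[2] = 3.7      # fine: the index is an integer, the stored value is a float
# p[0.1] = 3.7  # error: float indices are not valid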
But I am a bit lost on the data. What do you mean that your array is ~ 100.000 by 120? Do you have 100.000 points, and each point has a vector of 120 components? In that case, your pdata should have 120 components, and for each point point_index you call
pdata.SetTuple(point_index, [v0, v1, ..., v119])
or
pdata.SetComponent(point_index, 0, v0)
...
pdata.SetComponent(point_index, 119, v119)
If not, are you sure that you have to access pdata based on fl values? You have to be sure that fl is int, that 0 <= fl < ntuples, and that you are not going to have holes. Check whether you can do the same thing that you do for cdata (by the way, in your code p is always equal to i, so you can just use i).
It's also possible to copy a numpy array directly to vtk; see http://vtk.1045678.n5.nabble.com/vtk-to-numpy-how-to-get-a-vtk-array-tp1244891p1244895.html, but you have to be very careful with the structure of your data.
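As a sketch of that route, using the numpy_support helper that ships with VTK (the array here is a made-up stand-in for the question's flux values):
import numpy as np
import vtk
from vtk.util import numpy_support

# A flat float32 array of per-point scalars, one value per grid point.
values = np.random.rand(100000).astype(np.float32)

# Convert in one call instead of looping over InsertValue; deep=1 makes VTK
# take its own copy, so the numpy array can be freed afterwards.
vtk_array = numpy_support.numpy_to_vtk(values, deep=1, array_type=vtk.VTK_FLOAT)
vtk_array.SetName('pointData')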

Accessing data range with h5py

I have an h5 file that contains 62 different attributes. I would like to access the data range of each one of them.
To explain more, here is what I'm doing:
import h5py
the_file = h5py.File("myfile.h5","r")
data = the_file["data"]
att = data.keys()
The previous code gives me a list of attributes: "U", "T", "H", etc.
Let's say I want to know the minimum and maximum value of "U". How can I do that?
This is the output of running "h5dump -H":
HDF5 "myfile.h5" {
GROUP "/" {
GROUP "data" {
ATTRIBUTE "datafield_names" {
DATATYPE H5T_STRING {
STRSIZE 8;
STRPAD H5T_STR_SPACEPAD;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 62 ) / ( 62 ) }
}
ATTRIBUTE "dimensions" {
DATATYPE H5T_STD_I32BE
DATASPACE SIMPLE { ( 4 ) / ( 4 ) }
}
ATTRIBUTE "time_variables" {
DATATYPE H5T_IEEE_F64BE
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
}
DATASET "Temperature" {
DATATYPE H5T_IEEE_F64BE
DATASPACE SIMPLE { ( 256, 512, 1024 ) / ( 256, 512, 1024 ) }
}
It might be a difference in terminology, but hdf5 attributes are accessed via the attrs attribute of a Dataset object. I would call what you have variables or datasets. Anyway...
I'm guessing from your description that the attributes are just arrays; you should be able to do the following to get the data for each one and then calculate the min and max like any numpy array:
attr_data = data["U"][:]  # gets a copy of the array
data_min = attr_data.min()
data_max = attr_data.max()
So if you wanted the min/max of each one, you can just do a for loop over the names, or you could use
for attr_name, attr_value in data.items():
    attr_min = attr_value[:].min()
Edit to answer your first comment:
h5py's objects can be used like python dictionaries. So when you use 'keys()' you are not actually getting data, you are getting the name (or key) of that data. For example, if you run the_file.keys() you will get a list of every hdf5 dataset in the root path of that hdf5 file. If you continue along a path you will end up with the dataset that holds the actual binary data. So for example, you might start with (in an interpreter at first):
the_file = h5py.File("myfile.h5","r")
print the_file.keys()
# this will result in a list of keys maybe ["raw_data","meta_data"] or something
print the_file["raw_data"].keys()
# this will result in another list of keys maybe ["temperature","humidity"]
# eventually you'll get to the dataset that actually has the data or attributes you are looking for
# think of this process as going through a directory structure or a path to get to a file (or a dataset/variable in this case)
the_data_var = the_file["raw_data"]["temperature"]
the_data_array = the_data_var[:]
print the_data_var.attrs.keys()
# this will result in a list of attribute names/keys
an_attr_of_the_data = the_data_var.attrs["measurement_time"][:]
# So now you have "the_data_array" which is a numpy array and "an_attr_of_the_data" which is whatever it happened to be
# you can get the min/max of the data by doing like before
print the_data_array.min()
print the_data_array.max()
Edit 2 - Why do people format their hdf files this way? It defeats the purpose.
I think you may have to talk to the person who made this file if possible. If you made it, then you'll be able to answer my questions for yourself. First, are you sure that in your original example data.keys() returned "U","T",etc.? Unless h5py is doing something magical or if you didn't provide all of the output of the h5dump, that could not have been your output. I'll explain what the h5dump is telling me, but please try to understand what I am doing and not just copy and paste into your terminal.
# Get a handle to the "data" Group
data = the_file["data"]
# As you can see from the dump this data group has 3 attributes and 1 dataset
# The name of the attributes are "datafield_names","dimensions","time_variables"
# This should result in a list of those names:
print data.attrs.keys()
# The name of the dataset is "Temperature" and should be the only item in the list returned by:
print data.keys()
As you can see from the h5dump, there are 62 datafield_names (strings), 4 dimensions (32-bit integers, I think), and 2 time_variables (64-bit floats). It also tells me that Temperature is a 3-dimensional array, 256 x 512 x 1024 (64-bit floats). Do you see where I'm getting this information? Now comes the hard part: you will need to determine how the datafield_names match up with the Temperature array. This was decided by the person who made the file, so you'll have to figure out what each row/column in the Temperature array means. My first guess would be that each row in the Temperature array is one of the datafield_names, maybe with 2 more for each time, but this doesn't work since there are too many rows in the array. Maybe the dimensions fit in there somehow? Lastly, here is how you get each of those pieces of information (continuing from before):
# Get the temperature array (I can't remember if the 3 sets of colons is required, but try it and if not just use one)
temp_array = data["Temperature"][:,:,:]
# Get all of the datafield_names (list of strings of length 62)
datafields = data.attrs["datafield_names"][:]
# Get all of the dimensions (list of integers of length 4)
dims = data.attrs["dimensions"][:]
# Get all of the time variables (list of floats of length 2)
time_variables = data.attrs["time_variables"]
# If you want the min/max of the entire temperature array this should work:
print temp_array.min()
print temp_array.max()
# If you knew that row 0 of the array had the temperatures you wanted to analyze
# then this would work, but it all depends on how the creator organized the data/file:
print temp_array[0].min()
print temp_array[1].max()
I'm sorry I can't be of more help, but without actually having the file and knowing what each field means this is about all I can do. Try to understand how I used h5py to read the information. Try to understand how I translated the header information (h5dump output) into information that I could actually use with h5py. If you know how the data is organized in the array you should be able to do what you want. Good luck, I'll help more if I can.
Since h5py arrays are closely related to numpy arrays, you can use the numpy.min and numpy.max functions to do this:
maxItem = numpy.max(data['U'][:]) # Find the max of item 'U'
minItem = numpy.min(data['H'][:]) # Find the min of item 'H'
Note the ':'; it is needed to pull the data out into a numpy array.
You can call min and max on the DataFrame; with axis=0 they reduce down the rows, giving one value per column:
In [1]: df = pd.DataFrame([[1, 6], [5, 2], [4, 3]], columns=list('UT'))
In [2]: df
Out[2]:
U T
0 1 6
1 5 2
2 4 3
In [3]: df.min(0)
Out[3]:
U 1
T 2
In [4]: df.max(0)
Out[4]:
U 5
T 6
Did you mean data.attrs rather than data itself? If so,
import h5py

with h5py.File("myfile.h5", "w") as the_file:
    dset = the_file.create_dataset('MyDataset', (100, 100), 'i')
    dset.attrs['U'] = (0, 1, 2, 3)
    dset.attrs['T'] = (2, 3, 4, 5)

with h5py.File("myfile.h5", "r") as the_file:
    data = the_file["MyDataset"]
    print({key: (min(value), max(value)) for key, value in data.attrs.items()})
yields
{u'U': (0, 3), u'T': (2, 5)}

Python Image Shuffle Failure - where have I gone wrong?

I'm trying to scramble all the pixels in an image, and my implementation of Knuth's shuffle (as well as someone else's) seems to fail. It seems to be shuffling each row only. I cannot work out why; I just can't see it.
Here is what happens (result image not shown): it ain't very scrambly! Well, it could be more scrambly, and more scrambly it needs to be.
Here's my code:
import Image
from numpy import *

file1 = "lhooq"
file2 = "kandinsky"

def shuffle(ary):
    a = len(ary)
    b = a - 1
    for d in range(b, 0, -1):
        e = random.randint(0, d)
        ary[d], ary[e] = ary[e], ary[d]
    return ary

for filename in [file1, file2]:
    fid = open(filename+".jpg", 'r')
    im = Image.open(fid)
    data = array(im)
    # turn into array
    shape = data.shape
    data = data.reshape((shape[0]*shape[1], shape[2]))
    # Knuth Shuffle
    data = shuffle(data)
    data = data.reshape(shape)
    imout = Image.fromarray(data)
    imout.show()
    fid.close()
When ary is a 2D array, ary[d] is a view into that array rather than a copy of its contents.
Therefore, ary[d], ary[e] = ary[e], ary[d] ends up equivalent to ary[d] = ary[e]; ary[e] = ary[e]: the ary[d] on the RHS is simply a pointer to the d-th row of ary (as opposed to a copy of the pixel values), so by the time the second assignment runs, that row has already been overwritten with row e's data.
To solve this, you can use advanced indexing:
ary[[d,e]] = ary[[e,d]]
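As an aside, for this particular job you can also skip the hand-rolled Knuth shuffle: numpy's built-in np.random.shuffle permutes a 2-D array along its first axis in place, which is exactly what the reshaped (n_pixels, channels) array needs. A minimal sketch:
import numpy as np

# Stand-in for the flattened pixel array: 4 "pixels" with 3 channels each.
data = np.arange(12).reshape(4, 3)
np.random.shuffle(data)  # shuffles whole rows in place; no view-swap pitfall
print(data)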
