Feature matching with flann in opencv

Feature matching with flann in opencv - python

I am working on an image search project for which i have defined/extracted the key point features using my own algorithm. Initially i extracted only single feature and tried to match using cv2.FlannBasedMatcher() and it worked fine which i have implemented as below:
Here vec is 2-d list of float values of shape (10, )
Ex:
[[0.80000000000000004, 0.69999999999999996, 0.59999999999999998, 0.44444444444444448, 0.25, 0.0, 0.5, 2.0, 0, 2.9999999999999996]
[2.25, 2.666666666666667, 3.4999999999999996, 0, 2.5, 1.0, 0.5, 0.37499999999999994, 0.20000000000000001, 0.10000000000000001]
[2.25, 2.666666666666667, 3.4999999999999996, 0, 2.5, 1.0, 0.5, 0.37499999999999994, 0.20000000000000001, 0.10000000000000001]
[2.25, 2.666666666666667, 3.4999999999999996, 0, 2.5, 1.0, 0.5, 0.37499999999999994, 0.20000000000000001, 0.10000000000000001]]
vec1 = extractFeature(img1)
vec2 = extractFeature(img2)
q1 = np.asarray(vec1, dtype=np.float32)
q2 = np.asarray(vec2, dtype=np.float32)
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50) # or pass empty dictionary
flann = cv2.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(q1,q2,k=2)
But now i have one more feature descriptor for each key point along with previous one but of different length.
So now my feature descriptor has shape like this:
[[[0.80000000000000004, 0.69999999999999996, 0.59999999999999998, 0.44444444444444448, 0.25, 0.0, 0.5, 2.0, 0, 2.9999999999999996],[2.06471330e-01, 1.59191645e-02, 9.17678759e-05, 1.32570314e-05, 4.58424252e-10, 1.66717250e-06,6.04810165e-11]
[[2.25, 2.666666666666667, 3.4999999999999996, 0, 2.5, 1.0, 0.5, 0.37499999999999994, 0.20000000000000001, 0.10000000000000001],[ 2.06471330e-01, 1.59191645e-02, 9.17678759e-05, 1.32570314e-05, 4.58424252e-10, 1.66717250e-06, 6.04810165e-11],
[[2.25, 2.666666666666667, 3.4999999999999996, 0, 2.5, 1.0, 0.5, 0.37499999999999994, 0.20000000000000001, 0.10000000000000001],[ 2.06471330e-01, 1.59191645e-02, 9.17678759e-05, 1.32570314e-05, 4.58424252e-10, 1.66717250e-06, 6.04810165e-11],
[[2.25, 2.666666666666667, 3.4999999999999996, 0, 2.5, 1.0, 0.5, 0.37499999999999994, 0.20000000000000001, 0.10000000000000001],[ 2.06471330e-01, 1.59191645e-02, 9.17678759e-05, 1.32570314e-05, 4.58424252e-10, 1.66717250e-06, 6.04810165e-11]]
Now since each point's feature descriptor is a list two lists(descriptors) with different length that is (10, 7, ) so in this case i am getting error:
setting an array element with a sequence.
while converting feature descriptor to numpy array of float datatype:
q1 = np.asarray(vec1, dtype=np.float32)
I understand the reason of this error is different length of lists, so i wonder What would be the right way to implement the same?

You should define a single descriptor of size 10+7=17.
This way, the space descriptor is now of 17 and you should be able to use cv2.FlannBasedMatcher.
Either create a global descriptor of the correct size desc_glob = np.zeros((nb_pts,17)) and fill it manually or find a Python way to do it. Maybe np.reshape((nb_pts,17))?
Edit:
To not favor one descriptor type over the other, you need to weight or normalize the descriptors. This is the same principle than computing a global descriptor distance from two descriptors:
dist(desc1, desc2) = dist(desc1a, desc2a) + lambda * dist(desc1b, desc2b)

Related

Calculating Expected Value With Matrix Values

I have the following input data
class_p = [0.0234375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1748046875, 0.0439453125, 0.0, 0.35302734375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3828125]
league_p = [0.4765625, 0.0, 0.00634765625, 0.4658203125, 0.0, 0.0, 0.046875, 0.0, 0.0, 0.0029296875, 0.0, 0.0, 0.0, 0.0, 0.0]
a2_p = [0.1171875, 0.0, 0.0, 0.1171875, 0.0, 0.0078125, 0.30322265625, 0.31103515625, 0.0, 0.0, 0.0, 0.1435546875, 0.0, 0.0, 0.0]
p1_p = [0.0, 0.03125, 0.375, 0.09375, 0.0234375, 0.0, 0.46875, 0.0078125, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
p2_p = [0.3984375, 0.0, 0.0, 0.3828125, 0.08935546875, 0.08935546875, 0.023345947265625, 0.007720947265625, 0.0, 0.0, 0.0087890625, 0.00018310546875, 0.0, 0.0, 0.0]
class_v = [55, 75, 55, 75, 500, 10000, 55, 55, 55, 75, 75, 55, 55, 500, 55, 55, 75, 75, 55, 55, 55]
league_v = [0, 0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 1500, 1500, 3000]
a2_v= [0, 0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 1500, 1500, 3000]
p1_v = [0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 40, 1500, 1500, 3000]
p2_v = [0, 0, 0, 0, 0, 0, 0, 0, 40, 40, 40, 40, 1500, 1500, 3000]
With that data, I am generating the odds of each combination occurring.
As an example to generate the chance of a given combination
class_p[0]
league_p[6]
a2_p[11]
p1_p[7]
p2_p[3]
I would multiply their values with each other
0.0234375x0.046875x0.1435546875x0.0078125x0.3828125
That would give me 4.716785042546689510345458984375 × 10^-7
Since the given combination had class_p[0], league_p[6], a2_p[11], p1_p[7], p2_p[3], I would take the following values in the "values" arrays.
I would sum
class_v[0] + league_v[6] + a2_v[11] + p1_v[7] + p2_v[3]
That would give me 55+0+40+40+0 = 135
To finalize the process I would do
(0.0234375*0.046875*0.1435546875*0.0078125*0.3828125)*(55+0+40+40+0) = 0.00006367659807
The full final calc is
(0.0234375×0.046875×0.1435546875×0.0078125×0.3828125) (55 + 0 + 40 + 40 + 0)
(combintation_chance)*(combination_value)
I need to do this process for all possible combinations of combintation_chance
This should give me a column of values(1xN). If I sum the values of that column I reach the EV overall, by summing the EV of individual combinations.
Calculating combintation_chance is working just fine. My issue is how to line up the given combination with its corresponding value sum (combination_value). At the moment, I have additional identifiers attached to the *_p arrays and I then do a string comparison with them to determine which combination value to use. This is very slow for billions of comparisons, therefore I am exploring a better approach.
I am using python 3.8 & numpy 1.24
Edit
The question has been adjusted to include much more detail

Broadcasting
Ok, so it seems that this is a simple broadcasting problem.
You want a 5D-array of probabilities, times a 5d-array of values. And, of course, you want it without any for loop.
In numpy the classical way to have numpy do nested loops for you (which is, indeed, way faster than doing them yourself. First rule of numpy is "avoid at all cost to iterate over elements. No for loop"), is to use broadcasting.
Let's start with 2d example (as was your first intention. And that was a good idea. Problem was it was ambiguous, but restraining your question to 2d was not bad).
You have
class_p = np.array([0.0234375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1748046875, 0.0439453125, 0.0, 0.35302734375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3828125])
league_p = np.array([0.4765625, 0.0, 0.00634765625, 0.4658203125, 0.0, 0.0, 0.046875, 0.0, 0.0, 0.0029296875, 0.0, 0.0, 0.0, 0.0, 0.0])
One way (not the only one, but probably the one easier to adapt to any similar question) is to use broadcasting.
If you indeed convert class_p in a column, that is a 21×1 2D array, and league_p into a line, that is a 1×15 2D array, then, if you multiply both, result will be a 21x15 2D array, containing all combinations.
Because
np.array([[1],[2],[3]]) * np.array([[4,5]])
is
[[4,5],
[8,10],
[12,15]]
That's how broadcasting works.
There are several way to convert a 1D-array so a row or a column of a 2D-array. For example you could use .reshape. Like class_p.reshape(-1,1) and league_p.reshape(1,-1). But the fastest is to add a new axis. Like class_p[:,None] and league_p[None,:]. Note that the second way doesn't really create a new array. It is just a different view of the same array. This is way it is faster.
So, our 2D probability map is
class_p[:,None]*league_p[None,:]
Likewise, to have all 21×15 combination of sum of values, you can rely on the same broadcasting to perform additon
class_v[:,None]+league_v[None,:]
Broadcasting solution
So solution, in 2D, using broadcasting, is
class_p[:,None]*league_p[None,:] * (class_v[:,None] + league_v[None,:])
In 5D, with all your variables, it is still manageable (but don't add too much dimensions! it would soon become a huge result. And I suspect what you are really interested at the end is just the sum of all that), this time, not in one line (not that it couldn't be done that way, but, that would be a big line...)
pr = class_p[:,None,None,None,None]*league_p[None,:,None,None,None]*a2_p[None,None,:,None,None]*p1_p[None,None,None,:,None]*p2_p[None,None,None,None,:]
vl = class_v[:,None,None,None,None]+league_v[None,:,None,None,None]+a2_v[None,None,:,None,None]+p1_v[None,None,None,:,None]+p2_v[None,None,None,None,:]
pr*vl
add.outer and multiply.outer
As you can see, in 5D, it is a little bit tedious. But I wanted to show you the principle of broadcasting, before introducing another (not really shorter, but a bit less tedious) way. Way that was already given by Reinderien. But since it was before you clarified the question, it was not the good result, but principle is the same
In 2D
np.multiply.outer(class_p, league_p) * np.add.outer(class_v, league_v)
Unfortunately, those function take only 2 args. So in 5D, you have to chain them
pr = np.multiply.outer(class_p, np.multiply.outer(league_p, np.multiply.outer(a2_p, np.multiply.outer(p1_p, p2_p))))
vl = np.add.outer(class_v, np.add.outer(league_v, np.add.outer(a2_v, np.add.outer(p1_v, p2_v))))
pr * vl
Expected value
Note that if the aim of all this is to compute the expected "value" (whatever that value is), that is Σ p(i,j,k,l,m)×v(i,j,k,l,m), for all possible outcomes, then, doing it that way is probably not a good idea.
For your example, it is manageable. You are computing "only" 1 million possible outcomes that is 1 million probabilities (each of them being 4 multiplications) and 1 million associated values (4 additions each). And the performing 1 million multiplication between those 2 sets of 1 million probabilities and values. And then summing the result, that is one extra million addition. Altogether, that is only 10 millions elementary arithmetic operation. Not much for a modern computer, and response still feels instantaneous. But, yet, it is O(Nᵏ) is both cpu and memory. N being the typical length of an array, and k the number of variables.
But if you intend to add more dimensions (more variables, associated with more set of probabilities and set of values), then that is unnecessary explosive, in both CPU time, and memory (those 5D arrays of probabilities and values are stored), or simply if you intend to perform this computation more than once, that expected value can be computed way faster, using just O(Nk) operations.
I spare you the development (but it is just a matter of expanding sum Σᵢⱼₖₗₘ pᵢpⱼpₖpₗpₘ (vᵢ+vⱼ+vₖ+vₗ+vₘ)), you can compute it faster like this
P1 = class_p.sum()
PV1 = (class_p*class_v).sum()
P2 = league_p.sum()
PV2 = (league_p*league_v).sum()
P3 = a2_p.sum()
PV3 = (a2_p*a2_v).sum()
P4 = p1_p.sum()
PV4 = (p1_p*p1_v).sum()
P5 = p2_p.sum()
PV5 = (p2_p*p2_v).sum()
expectedValue = P1*P2*P3*P4*PV5 + P1*P2*P3*PV4*P5 + P1*P2*PV3*P4*P5 + P1*PV2*P3*P4*P5 + PV1*P2*P3*P4*P5
sameAs = (pr*vl).sum()
It appears more complicated because there are more lines. But each line is along 1 dimension only. So it is replacing an order of magnitude of n₁n₂n₃n₄n₅ operations by an order of magnitude of n₁+n₂+n₃+n₄+n₅ operations, where n₁,...,n₅ are the size of arrays of each of the 5 variables.
So, again, if your objective is to compute expected value, then, trying to compute the 5D arrays (as your question is), is a really costly way.

This doesn't make any attempt to cache intermediate results, etc.
import numpy as np
class_percentages = (0.0, 0.0, 0.0, 0.3, 0.50)
league_percentages = (0.1, 0.0, 0.2, 0.1, 0.05)
class_values = (50, 50, 50, 75, 100)
league_values = (0, 10, 10, 25, 75)
combined = np.add.outer(class_percentages, league_percentages)*np.add.outer(class_values, league_values)
print(combined)
Output:
[[ 5. 0. 12. 7.5 6.25]
[ 5. 0. 12. 7.5 6.25]
[ 5. 0. 12. 7.5 6.25]
[30. 25.5 42.5 40. 52.5 ]
[60. 55. 77. 75. 96.25]]

Summing three consecutive number when equal to or great than 0 - Python

I am using numpy in Python
I have an array of numbers, for example:
arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1)
If i is a position in the array, I want to create a function which creates a running sum of i and the two previous numbers, but only accumulating the number if it is equal to or greater than 0.
In other words, negative numbers in the array become equal to 0 when calculating the three number running sum.
For example, the answer I would be looking for here is
2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6
The new array has two elements less than the original array as the calculation can't be completed for the first two number.
Thank you !

As Dani Mesejo answered, you can use stride tricks. You can either use clip or boolean indexing to handle the <0 elements. I have explained how stride tricks work below -
arr[arr<0]=0 sets all elements below 0 as 0
as_strided takes in the array, the expected shape of the view (7,3) and the number of strides in the respective axes, (8,8). This is the number of bytes you have to move in axis0 and axis1 respectively to access the next element. E.g. If you want to move every 2 elements, then you can set it to (16,8). This means you would move 16 bytes each time to get the element in axis0 (which is 0.1->1.2->0->0.1->.., till a shape of 7) and 8 bytes each time to get element in axis1 (which is 0.1->1->1.2, till a shape of 3)
Use this function with caution! Always use x.strides to define the strides parameter to avoid corrupting memory!
Lastly, sum this array view over axis=1 to get your rolling sum.
arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
w = 3 #rolling window
arr[arr<0]=0
shape = arr.shape[0]-w+1, w #Expected shape of view (7,3)
strides = arr.strides[0], arr.strides[0] #Strides (8,8) bytes
rolling = np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
rolling_sum = np.sum(rolling, axis=1)
rolling_sum
array([2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6])

You could clip, roll and sum:
import numpy as np
def rolling_window(a, window):
"""Recipe from https://stackoverflow.com/q/6811183/4001592"""
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
a = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
res = rolling_window(np.clip(a, 0, a.max()), 3).sum(axis=1)
print(res)
Output
[2.3 2.7 1.7 0.5 0.1 0.6 1.6]

You may use np.correlate to sweep an array of 3 ones over the clipped of arr to get desired output
In [20]: np.correlate(arr.clip(0), np.ones(3), mode='valid')
Out[20]: array([2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6])

arr = np.array([0.1, 1, 1.2, 0.5, -0.3, -0.2, 0.1, 0.5, 1])
def sum_3(x):
collector = []
for i in range(len(arr)-2):
collector.append(sum(arr[i:i+3][arr[i:i+3]>0]))
return collector
#output
[2.3, 2.7, 1.7, 0.5, 0.1, 0.6, 1.6]
Easiest and most comprehensible way. The collector will append the sum of the 3 consecutive numbers if their indices are True otherwise, they are all turned to 0s.
The method is not general, it is for 3 consecutives but you can adapt it.
def sum_any(x,n):
collector = []
for i in range(len(arr)-(n-1)):
collector.append(sum(arr[i:i+n][arr[i:i+n]>0]))
return collector

Masked arrays and view_as_windows (which uses numpy strides under the hood) are built for this purpose:
from skimage.util import view_as_windows
arr = view_as_windows(arr, 3)
arr2 = np.ma.masked_array(arr, arr<0).sum(-1)
output:
[2.3 2.7 1.7 0.5 0.1 0.6 1.6]

Creating a new tensor based on the old

I have the following code:
import numpy as np
import tensorflow as tf
a = np.array([0.5, 0.5])
b = np.array([0.2, 0.2, 0.0, 0.0])
non_zeros = ~tf.equal(b, 0.)
cast_op = tf.cast(non_zeros, tf.float64)
new_vec = tf.multiply(a, cast_op) # won't work
# the required output is [0.5, 0.5, 0.0, 0.0]
I am trying to obtain the vector [0.5, 0.5, 0.0, 0.0] as explained in the code. Does anyone know how to do this? I also looked at tf.fill but that takes a scalar value, so won't work for me.

You get an error because tf.multiply expects tensors of the same shape. What you could do, however, is to simply do this:
a = np.array[0.5, 0.5])
b = np.array([0.2, 0.2, 0.0, 0.0])
b = np.logical_and(b, n.ones(b.shape)).astype(float)
a = np.concatenate((a, np.zeros(b.shape[0] - a.shape[0])))
new_vec = a * b

You can exploit the broadcasting capability of the tf.multiply op.
I've added next to every line the shape of the tensor: please note the usage of tf.expand_dims to add a 1 dimension to the a tensor in order to get, after the multiplication, a tensor with shape (2,4).
This tensor has repeated values (2 rows, 4 columns equal), hence we can just take the first row
import numpy as np
import tensorflow as tf
a = np.array([0.5, 0.5]) #(2)
b = np.array([0.2, 0.2, 0.0, 0.0]) #(4)
non_zeros = ~tf.equal(b, 0.) #(4)
cast_op = tf.cast(non_zeros, tf.float64) # (4)
new_vec = tf.multiply(tf.expand_dims(a, axis=[1]),
cast_op) # (2, 1) * (4) = (2, 4)
new_vec = new_vec[0, :] # (4)
print(new_vec)
sess = tf.InteractiveSession()
print(sess.run(new_vec))
This code produces [0.5 0.5 0. 0.]

How does XGBoost/lightGBM evaluate ndcg for ranking tasks?

I am currently running tests between XGBoost/lightGBM for their ability to rank items. I am reproducing the benchmarks presented here: https://github.com/guolinke/boosting_tree_benchmarks.
I have been able to successfully reproduce the benchmarks mentioned in their work. I want to make sure that I am correctly implementing my own version of the ndcg metric and also understanding the ranking problem correctly.
My questions are:
When creating the validation for the test set using ndcg - there is a test.group file that says the first X rows are group 0, etc. To get the recommendations for the group, I get the predicted values and known relevance scores and sort that list by descending predicted values for each group?
In order to get the final ndcg scores from the lists created above - do I get the ndcg scores and take the mean over all the scores? Is this the same evaluation methodology that XGBoost/lightGBM in the evaluation phase?
Here is my methodology for evaluating the test set after the model has finished training.
For the final tree when I run lightGBM I obtain these values on the validation set:
[500] valid_0's ndcg#1: 0.513221 valid_0's ndcg#3: 0.499337 valid_0's ndcg#5: 0.505188 valid_0's ndcg#10: 0.523407
My final step is to take the predicted output for the test set and calculate the ndcg values for the predictions.
Here is my python code for calculating ndcg:
import numpy as np
def dcg_at_k(r, k):
r = np.asfarray(r)[:k]
if r.size:
return np.sum(np.subtract(np.power(2, r), 1) / np.log2(np.arange(2, r.size + 2)))
return 0.
def ndcg_at_k(r, k):
idcg = dcg_at_k(sorted(r, reverse=True), k)
if not idcg:
return 0.
return dcg_at_k(r, k) / idcg
After I get the predictions for the test set for a particular group (GROUP-0) I have these predictions:
query_id predict
0 0 (2.0, -0.221681199441)
1 0 (1.0, 0.109895548348)
2 0 (1.0, 0.0262799346312)
3 0 (0.0, -0.595343431322)
4 0 (0.0, -0.52689043426)
5 0 (0.0, -0.542221350664)
6 0 (1.0, -0.448015576024)
7 0 (1.0, -0.357090949646)
8 0 (0.0, -0.279677741045)
9 0 (0.0, 0.2182200869)
NOTE
Group-0 actually has about 112 rows.
I then sort the list of tuples in descending order which provides a list of relevance scores:
def get_recommendations(x):
sorted_list = sorted(list(x), key=lambda i: i[1], reverse=True)
return [k for k, _ in sorted_list]
relavance = evaluation.groupby('query_id').predict.apply(get_recommendations)
query_id
0 [4.0, 2.0, 2.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, ...
1 [4.0, 2.0, 2.0, 2.0, 1.0, 1.0, 3.0, 2.0, 1.0, ...
2 [2.0, 3.0, 2.0, 2.0, 1.0, 0.0, 2.0, 2.0, 1.0, ...
3 [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, ...
4 [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ...
Finally, for each query id I calculated the ndcg scores on the relevance list and then take the mean of all the ndcg scores calculated for each query id:
relavance.apply(lambda x: ndcg_at_k(x, 10)).mean()
The value I obtain is ~0.497193.

Cross-posting my Cross Validated answer to this cross-posted question:
https://stats.stackexchange.com/questions/303385/how-does-xgboost-lightgbm-evaluate-ndcg-metric-for-ranking/487487#487487
I happened across this myself, and finally dug into the code to figure it out.
The difference is the handling of a missing IDCG. Your code returns 0, while LightGBM is treating that case as a 1.
The following code produced matching results for me:
import numpy as np
def dcg_at_k(r, k):
r = np.asfarray(r)[:k]
if r.size:
return np.sum(np.subtract(np.power(2, r), 1) / np.log2(np.arange(2, r.size + 2)))
return 0.
def ndcg_at_k(r, k):
idcg = dcg_at_k(sorted(r, reverse=True), k)
if not idcg:
return 1. # CHANGE THIS
return dcg_at_k(r, k) / idcg

I think the problem is caused by data in the same query that have same labels.
In that case, Both XGBoost and LightGBM will produce ndcg 1 for that query.

H5PY - How to store many 2D arrays of different dimensions

I would like to organize my collected data (from computer simulations) into a hdf5 file using Python.
I measured positions and velocities [x,y,z,vx,vy,vz] of all atoms within a certain space region over many time steps. The number of atoms, of course, varies from time step to time step.
A minimal example could look as follows:
[
[ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2] ],
[ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2], [x3,y3,z3,vx3,vy3,vz3] ]
]
(2 time steps,
first time step: 2 atoms,
second time step: 3 atoms)
My idea was to create a hdf5 dataset within Python which stores all the information. At each time step it should store a 2d array of alls positions/velocities of all atoms, i.e.
dataset[0] = [ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2] ]
dataset[1] = [ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2], [x3,y3,z3,vx3,vy3,vz3] ].
The idea is clear, I think. However, I struggle with the definition of the correct data type of the data set with varying array length.
My code looks like this:
import numpy as np
import h5py
file = h5py.File ('file.h5','w')
columnNo = 6
rowtype = np.dtype("%sfloat32" % columnNo)
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )
dataset = file.create_dataset("dset", (2,), dtype=dt)
print dataset.value
testarray = np.array([[1.,2.,3.,2.,3.,4.],[1.,2.,3.,2.,3.,4.]])
print testarray
dataset[0] = testarray
print dataset[0]
This, however, does not work. When I run the script I get the error message "AttributeError: 'float' object has no attribute 'dtype'."
It seems that my defined dtype is wrong.
Does anybody see how it should be defined correctly?
Thanks very much,
Sven

The error in your case is buried, though it is clear it occurs when trying to assign the testarray to the dataset:
Traceback (most recent call last):
File "stack41465480.py", line 26, in <module>
dataset[0] = testarray
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/build/h5py-GhwtGD/h5py-2.6.0/h5py/_objects.c:2577)
...
File "h5py/_conv.pyx", line 712, in h5py._conv.ndarray2vlen (/build/h5py-GhwtGD/h5py-2.6.0/h5py/_conv.c:6171)
AttributeError: 'float' object has no attribute 'dtype'
I'm not skilled with the special_dtype and vlen, but I was able to write a numpy structured arrays to h5py.
import numpy as np
import h5py
file = h5py.File ('file.h5','w')
columnNo = 6
# rowtype = np.dtype("%sfloat32" % columnNo)
rowtype = np.dtype([('f0', '<f4',(6,))])
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )
print('rowtype',rowtype)
print('dt',dt)
dataset = file.create_dataset("dset", (2,), dtype=rowtype)
print('value')
print(dataset.value[0])
arr = np.ones((2,),dtype=rowtype)
print(repr(arr))
dataset[0] = arr[0]
print(dataset.value)
testarray = np.array([([1.,2.,3.,2.,3.,4.],),([2.,3.,4.,1.,2.,3.],)], dtype=rowtype)
print(repr(testarray))
dataset[1] = testarray[1]
print(dataset.value)
print(dataset.value['f0'])
producing
1316:~/mypy$ python3 stack41465480.py
rowtype [('f0', '<f4', (6,))]
dt object
value
([0.0, 0.0, 0.0, 0.0, 0.0, 0.0],)
array([([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],), ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)],
dtype=[('f0', '<f4', (6,))])
[([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],) ([0.0, 0.0, 0.0, 0.0, 0.0, 0.0],)]
array([([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],), ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)],
dtype=[('f0', '<f4', (6,))])
[([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],) ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)]
[[ 1. 1. 1. 1. 1. 1.]
[ 2. 3. 4. 1. 2. 3.]]

Thanks for the quick answer. It helped a lot.
If I now simply change the data type of the data set to
dtype = dt,
I get what I would like to have.
Here, the Python code (for completeness):
import numpy as np
import h5py
file = h5py.File ('file.h5','w')
columnNo = 6
rowtype = np.dtype([('f0', '<f4',(6,))])
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )
print('rowtype',rowtype)
print('dt',dt)
dataset = file.create_dataset("dset", (2,), dtype=dt)
# print('value')
# print(dataset.value[0])
arr = np.ones((3,),dtype=rowtype)
# print(repr(arr))
dataset[0] = arr
# print(dataset.value)
testarray = np.array([([1.,2.,3.,2.,3.,4.],),([2.,3.,4.,1.,2.,3.],)], dtype=rowtype)
# print(repr(testarray))
dataset[1] = testarray
print(dataset.value)
for i in range(2): print dataset[i]
And to corresponding output reads
('rowtype', dtype([('f0', '<f4', (6,))]))
('dt', dtype('O'))
[ array([([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],),
([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],), ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)],
dtype=[('f0', '<f4', (6,))])
array([([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],), ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)],
dtype=[('f0', '<f4', (6,))])]
[([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],) ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)
([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)]
[([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],) ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)]
Just to get it right: The problem in my original code was a bad definition of my rowtype data structure, right?
Best,
Sven

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Feature matching with flann in opencv - python

Related

Calculating Expected Value With Matrix Values

Summing three consecutive number when equal to or great than 0 - Python

Creating a new tensor based on the old

How does XGBoost/lightGBM evaluate ndcg for ranking tasks?

H5PY - How to store many 2D arrays of different dimensions

Categories

Resources