Scipy cKDTree query_pairs versus query_ball_tree - python

I am a bit confused about the differences/similarities between the query_pairs and query_ball_tree methods of Scipy's cKDTree.
Reading the docs we can see that they have the same description:
query_ball_tree(self, other, r[, p, eps]) - Find all pairs of points whose distance is at most r
query_pairs(self, r[, p, eps]) - Find all pairs of points whose distance is at most r.
They even require the same obligatory parameters, except that query_ball_tree also asks for other, which is (from the docs): "The tree containing points to search against.".
So, is there a practical difference between these two methods? Is it preferred to use one over the other? I currently use query_pairs for my purposes, but I am considering alternatives to it.
I have seen people call query_ball_tree between two different trees, something like tree1.query_ball_tree(tree2, ...), which suggests that you can query across trees... but I would guess that tree1.query_ball_tree(tree1, ...) is then equivalent to query_pairs. Any guidance is greatly appreciated.

query_ball_tree finds all pairs of points between self and other whose distance is at most r, returning one neighbor list per point in self; in a self-query each list also contains the point's own index.
query_pairs finds all pairs of points within self whose distance is at most r, each reported once as an unordered pair (i, j) with i < j.
These new official docs might help you:
scipy.spatial.cKDTree.query_ball_tree — SciPy v1.6.0.dev Reference Guide http://scipy.github.io/devdocs/generated/scipy.spatial.cKDTree.query_ball_tree.html#scipy.spatial.cKDTree.query_ball_tree
scipy.spatial.cKDTree.query_pairs — SciPy v1.6.0.dev Reference Guide http://scipy.github.io/devdocs/generated/scipy.spatial.cKDTree.query_pairs.html#scipy.spatial.cKDTree.query_pairs
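To make the relationship concrete, here is a small sketch (points and radius made up for illustration) showing that a self-query with query_ball_tree carries the same information as query_pairs:

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.05, 2.0]])
tree = cKDTree(points)
r = 0.2

# query_pairs: unordered pairs (i, j), i < j, within the same tree.
pairs = tree.query_pairs(r)                # {(0, 1), (2, 3)}

# query_ball_tree against itself: one neighbor list per point,
# each list including the point's own index.
neighbors = tree.query_ball_tree(tree, r)

# Recovering the query_pairs result from the self-query:
pairs_from_ball = {(i, j) for i, js in enumerate(neighbors) for j in js if i < j}
assert pairs == pairs_from_ball
```

So for a single point set the two are interchangeable up to output format; query_ball_tree earns its keep when the two trees really are different.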

Is the geoToH3 function as pseudo-code available?

Is there a (python) or pseudocode example of geoToH3 available? I just need this function and would like to avoid installing the library on my target environment (AWS GLUE, PySpark)
I tried to follow the javascript implementation but even that used C magic internally.
There isn't a pseudocode implementation that I'm aware of, but there's a fairly thorough explanation in the documentation. Roughly:
Select the icosahedron face (0-19) the point lies on (using squared point distance in 3D space)
Project the point into face-oriented IJK coordinates
Convert the IJK coords to an H3 index by calculating the index digits at each resolution and setting the appropriate bits
The core logic can be found here and here. It's not trivial to implement - unless there's a strong reason to avoid installing, that would be the far easier and more reliable option.

Rotation argument for scikit-learn's factor analysis

One of the hallmarks of factor analysis is that it allows for non-orthogonal latent variables.
In R for example this feature is accessible via the rotation parameter of factanal.
Is there any such provision for sklearn.decomposition.FactorAnalysis? Clearly it's not among the arguments - but maybe there is another way to achieve this?
Sadly I have been unable to find many examples of usage for this function.
Interesting question. I think there are no rotations implemented indeed - see this issue.
Maybe this implementation is what you are looking for.
It appears this is now implemented.
Example: https://scikit-learn.org/stable/auto_examples/decomposition/plot_varimax_fa.html
rotation : {'varimax', 'quartimax'}, default=None
If not None, apply the indicated rotation. Currently, varimax and quartimax are implemented. See "The varimax criterion for analytic rotation in factor analysis", H. F. Kaiser, 1958.
New in version 0.24.
Source: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FactorAnalysis.html
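A minimal usage sketch (random toy data, shapes made up for illustration), assuming scikit-learn >= 0.24:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.RandomState(0)
X = rng.randn(100, 6)  # toy data: 100 samples, 6 observed variables

# Fit a two-factor model and apply a varimax rotation to the loadings.
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
loadings = fa.fit(X).components_.T  # (6, 2) varimax-rotated loading matrix
```

Note that varimax is still an orthogonal rotation; for oblique rotations (e.g. promax, as in R's factanal) you would still need an external implementation.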

Using a distance matrix *with errors* to find the coordinates of points

I would like to find the coordinates of a set of points in 3D from a distance matrix that may contain (experimental) errors.
The approach suggested here is not symmetric (treats the first point differently), and that is not adequate when there are uncertainties.
These uncertainties may lead to numerical instabilities as suggested here. But the answer to this question also assumes exact data.
So I would like to see if there is any statistical approach that best uses the redundancy of the data to minimize the error in the predicted coordinates and avoids potential instabilities due to inconsistent distances.
I am aware that the final result is invariant to rigid body translations and rotations.
It would be great if you can suggest algorithms present in or based on numpy/scipy, but general suggestions are also welcome.
After asking this same question on Cross Validated, whuber edited my post by adding the multidimensional-scaling keyword. With this keyword I could find many algorithms, starting from Wikipedia:
https://en.wikipedia.org/wiki/Multidimensional_scaling
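For instance, metric MDS in scikit-learn accepts a precomputed (and noisy) dissimilarity matrix directly and least-squares-fits all pairwise distances at once, which uses the redundancy symmetrically. A sketch on synthetic data (point count and noise level are made up):

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.RandomState(0)
true = rng.rand(8, 3)                                  # hidden 3-D points
# Pairwise Euclidean distance matrix with simulated measurement error.
D = np.linalg.norm(true[:, None, :] - true[None, :, :], axis=-1)
D += rng.normal(scale=0.01, size=D.shape)
D = (D + D.T) / 2                                      # restore symmetry
np.fill_diagonal(D, 0.0)

mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
```

As noted in the question, the recovered coordinates can only match the originals up to translation, rotation, and reflection.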

Manipulate 2D symbolic parametric curves in Python

I am trying to compute intersections, distances and derivatives on 2D symbolic parametric curves (that is, curves defined on the plane by a function of a parameter), but I can't find any Python module that seems to do the job.
So far I have only found libraries that deal with plotting or do numerical approximation, so I thought I could implement it myself as a light overlay on top of a symbolic mathematics library.
I started experimenting with SymPy but I can't wrap my head around it: it doesn't seem to be able to return intervals, even in finite number (for instance solve(x = x) fails!), and only finds a small number of solutions in some simple cases.
What tool would be suitable for the task?
I guess parametric functions fall under the more advanced topics of mathematical analysis, and I haven't yet seen a library that matches your demands. However, you could try looking through the docs of the Sage project...
It would help if you give an example of two curves that you want to define. solve is up to the task for finding intersections of all quadratic curves (it will actually solve quartics and some quintics, too).
When you say "distance" what do you mean - arc length sort of distance or distance from a point to the curve?
As for tangents, that is easily handled with idiff (see its docstring for examples, with help(idiff)).
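A small sketch of both operations in SymPy (the two curves and the circle are made up for illustration):

```python
import sympy as sp

t, s, x, y = sp.symbols('t s x y', real=True)

# Two parametric curves: a parabola and a line.
c1 = (t, t**2)
c2 = (s, 2 - s)

# Intersections: equate components and solve for both parameters.
sols = sp.solve([c1[0] - c2[0], c1[1] - c2[1]], [t, s], dict=True)
# intersections at (t, s) = (-2, -2) and (1, 1)

# Tangent slope of the implicit circle x**2 + y**2 = 1 via idiff.
slope = sp.idiff(x**2 + y**2 - 1, y, x)   # -x/y
```

solve handles polynomial systems like this well; genuinely transcendental parametrizations may need solveset or a numerical fallback such as nsolve.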

kmedoids using Pycluster with various distance functions

I am using Python 2.6 on Windows, working with OpenCV's core module. I searched around for the kmedoids function defined in Pycluster but did not find an accurate answer.
I have installed Pycluster 1.50 on Windows 7. Can somebody explain how to use Euclidean distance, L1 and L2 distance, Hellinger distance and chi-square distance with kmedoids?
Through searching I know so far.
import Pycluster
from Pycluster import distancematrix, kmedoids
The kmedoids function takes four arguments (as mentioned below), one of which is a distance. But I am unable to understand how to specify different distance measures in the kmedoids function:
clusterid, error, nfound = kmedoids (distance, nclusters=2, npass=1, initialid=None)
Any help regarding the matter would be highly appreciated.
As Shambool points out, the documentation gives you the answer: you don't pass a distance function directly, you pass a pairwise distance matrix. So compute that first, with whatever distance metric you want, and then pass it along to kmedoids.
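Since kmedoids only ever sees a precomputed matrix, any metric works. A sketch with hand-rolled chi-square and Hellinger distances on toy histograms (the final kmedoids call is shown as documented and assumes Pycluster is installed):

```python
import numpy as np

def chi_square(a, b):
    # Chi-square histogram distance: 0.5 * sum((a-b)^2 / (a+b)) over nonzero bins.
    denom = a + b
    m = denom > 0
    return 0.5 * np.sum((a[m] - b[m]) ** 2 / denom[m])

def hellinger(a, b):
    # Hellinger distance between nonnegative histograms.
    return np.sqrt(0.5 * np.sum((np.sqrt(a) - np.sqrt(b)) ** 2))

hists = np.random.RandomState(0).rand(6, 8)   # 6 toy histograms, 8 bins each
n = len(hists)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i):
        dist[i, j] = dist[j, i] = chi_square(hists[i], hists[j])

# Then hand the symmetric matrix to kmedoids (requires Pycluster):
# from Pycluster import kmedoids
# clusterid, error, nfound = kmedoids(dist, nclusters=2, npass=5)
```

For plain Euclidean or city-block (L1) distances you can skip the loop and use Pycluster's own distancematrix with the appropriate dist code instead.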
It seems you didn't even bother to look at the documentation, on pages 28-29 this is clearly explained.
