kmedoids using Pycluster with various distance functions - python

I am using Python 2.6 on Windows and working with the OpenCV core module. I searched for information about the kmedoids function defined in Pycluster, but did not find a clear answer.
I have installed Pycluster 1.50 on Windows 7. Can somebody explain how to use Euclidean distance, L1 and L2 distance, Hellinger distance and chi-square distance with kmedoids?
This is what I have found so far:
import Pycluster
from Pycluster import distancematrix, kmedoids
The kmedoids function takes four arguments (shown below), one of which is a distance. But I am unable to understand how to specify different distance measures in the kmedoids function:
clusterid, error, nfound = kmedoids(distance, nclusters=2, npass=1, initialid=None)
Any help regarding the matter would be highly appreciated.

As Shambool points out, the documentation gives you the answer. You don't pass a distance function directly; you pass a pairwise distance matrix. So compute that matrix first, with whatever distance metric you want, and then pass it to kmedoids.
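A minimal sketch of that idea, using only numpy to build the pairwise matrix. The helper names (pairwise_distances, l1, l2, hellinger, chi_square) and the sample histograms are hypothetical, and the chi-square formula is one common convention among several. The final kmedoids call is left as a comment and simply mirrors the signature quoted in the question.

```python
import numpy as np

def l1(a, b):
    # L1 (Manhattan) distance
    return np.abs(a - b).sum()

def l2(a, b):
    # L2 (Euclidean) distance
    return np.sqrt(((a - b) ** 2).sum())

def hellinger(a, b):
    # Hellinger distance; assumes a and b are non-negative and sum to 1
    return np.sqrt(((np.sqrt(a) - np.sqrt(b)) ** 2).sum() / 2.0)

def chi_square(a, b):
    # One common chi-square convention: 0.5 * sum((a - b)^2 / (a + b)),
    # skipping bins where both histograms are empty
    denom = a + b
    mask = denom > 0
    return 0.5 * (((a - b)[mask] ** 2) / denom[mask]).sum()

def pairwise_distances(data, metric):
    # Build a symmetric n x n matrix of distances between the rows of data
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i):
            dist[i, j] = dist[j, i] = metric(data[i], data[j])
    return dist

# Hypothetical normalized histograms, one per row
histograms = np.array([[0.5, 0.5, 0.0],
                       [0.4, 0.6, 0.0],
                       [0.0, 0.1, 0.9]])

distance = pairwise_distances(histograms, hellinger)

# With Pycluster installed, the matrix is then passed as in the question:
# clusterid, error, nfound = kmedoids(distance, nclusters=2, npass=1)
```

Swapping hellinger for l1, l2 or chi_square changes the metric without touching the clustering call.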

It seems you didn't even bother to look at the documentation; this is clearly explained on pages 28-29.

Related

Write a scipy function without using a standard library (exponential power)

My question might come across as stupid or too simple, but I could not work out a solution. Here is my question: I want to write an exponential power distribution function like the one available in scipy. However, I don't want to use scipy for this. How do I go about it?
Here are my efforts so far:
import math
import numpy as np
def ExpPowerFun(x,b, size=1000):
    distribution = b*x**(b-1)*math.exp(1+x**b-math.exp(x**b))
    return distribution
I used this equation based on this scipy doc. To be fair, writing a function from this equation doesn't do much: as you can see, it returns only one value. I want to generate a distribution of random numbers based on scipy's exponential power distribution function without using scipy.
I have looked at the class exponpow_gen in the GitHub code. However, it uses scipy.special (as sc), so it's of little use to me unless there is a workaround that avoids scipy.
I can't figure out how to go about it. Again, this might be a simple task, but I am stuck. Please help.
The simplest way to generate random numbers from a given distribution is to use the inverse of its CDF, the PPF (percent point function): applying the PPF to uniformly distributed numbers gives you the distribution you need.
For your case the PPF (taken directly from the scipy source code with some modifications) is:
np.power(np.log(1-np.log(1-x)), 1.0/b)
Hence your code should look like this:
import numpy as np
import matplotlib.pyplot as plt
def ExpPowerFun(b, size=1000):
    # Uniform samples in [0, 1), pushed through the PPF (inverse CDF)
    x = np.random.rand(size)
    return np.power(np.log(1-np.log(1-x)), 1.0/b)
plt.hist(ExpPowerFun(2.7,10000),20)
plt.show()
Edit: the uniform samples must lie between 0 and 1, of course, since they represent probabilities from 0% to 100%.
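As a rough sanity check of the inverse-CDF approach, the sketch below compares the empirical CDF of the generated samples with the closed-form CDF F(x) = 1 - exp(1 - exp(x**b)), which is what you get by inverting the PPF above (its derivative is the PDF from the question). The function names, the seed, and the sample size are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def exponpow_cdf(x, b):
    # Closed-form CDF obtained by inverting the PPF: 1 - exp(1 - exp(x**b))
    return 1.0 - np.exp(1.0 - np.exp(x ** b))

def exponpow_sample(b, size, rng):
    # Inverse-transform sampling: uniform draws in [0, 1) pushed through the PPF
    u = rng.random(size)
    return np.power(np.log(1.0 - np.log(1.0 - u)), 1.0 / b)

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
b = 2.7

samples = np.sort(exponpow_sample(b, 100000, rng))

# Empirical CDF evaluated at each sorted sample
empirical = np.arange(1, samples.size + 1) / samples.size

# Largest gap between the empirical and analytic CDFs; it should be small
max_gap = np.max(np.abs(empirical - exponpow_cdf(samples, b)))
print(max_gap)
```

A small maximum gap suggests the sampler follows the intended distribution; a large one would point to a mistake in the PPF.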

Python equivalent for ordertrack function in MATLAB

I'm looking for a Python (numpy/scipy) equivalent of the ordertrack function in MATLAB. With this functionality I want to be able to perform order-tracking analysis on vibration measurements from slowly rotating machinery. I searched extensively for examples on Google/Stackexchange, but I could not find anything, although I found plenty of examples of regular FFT spectrum analysis.
More information on the function can be found here: https://nl.mathworks.com/help/signal/ref/ordertrack.htm
You could use the vibration-toolbox package, more precisely the class vibration_toolbox.vibesystem.VibeSystem (the importable module name uses an underscore, since a hyphen is not valid in a Python import).
It is set up a little bit differently than the MATLAB function, but such an instance
from vibration_toolbox.vibesystem import VibeSystem
sys = VibeSystem(M=your_signal_mass, C=your_signal_damping, K=your_signal_stiffness)
is basically a vibration signal instance with a specific mass, damping and stiffness, which would correspond to the signal x in MATLAB's ordertrack function.
The method VibeSystem.freq_response would then be able to calculate the magnitudes you want.
omega, magdb, phase = sys.freq_response(omega=your_signal_rpm, modes=your_signal_orderlist)
magdb should then contain the magnitudes you are looking for.
Unfortunately, I do not have the Signal Processing Toolbox in MATLAB, so I cannot compare the code and show an example.

Scipy cKDTree query_pairs versus query_ball_tree

I am a bit confused about the differences/similarities between the query_pairs and query_ball_tree methods of Scipy's cKDTree.
Reading the docs we can see that they have the same description:
query_ball_tree(self, other, r[, p, eps]) - Find all pairs of points whose distance is at most r
query_pairs(self, r[, p, eps]) - Find all pairs of points whose distance is at most r.
They even require the same obligatory parameters, except for query_ball_tree that asks for other which is (from docs): "The tree containing points to search against.".
So, is there a practical difference between these two methods? Is it preferred to use one over the other? I currently use query_pairs for my purposes, but I am considering alternatives to it.
I have seen people use the former something like tree1.query_ball_tree(tree2, ...), which suggests that you can query between different trees... but I guess that it would be equivalent to query_pairs if we did something like tree1.query_ball_tree(tree1, ...). Any guidance is greatly appreciated.
query_ball_tree finds all pairs of points between self and other whose distance is at most r.
query_pairs finds all pairs of points within self whose distance is at most r.
These newer official docs might help you:
scipy.spatial.cKDTree.query_ball_tree — SciPy v1.6.0.dev Reference Guide http://scipy.github.io/devdocs/generated/scipy.spatial.cKDTree.query_ball_tree.html#scipy.spatial.cKDTree.query_ball_tree
scipy.spatial.cKDTree.query_pairs — SciPy v1.6.0.dev Reference Guide http://scipy.github.io/devdocs/generated/scipy.spatial.cKDTree.query_pairs.html#scipy.spatial.cKDTree.query_pairs
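A small sketch of the difference, using made-up points. The two methods also return different shapes: query_pairs gives a set of index pairs (i, j) with i < j, while query_ball_tree gives one list of neighbour indices per point in self. Querying a tree against itself therefore carries the same information as query_pairs, but it includes each point as its own neighbour and lists every pair twice.

```python
import numpy as np
from scipy.spatial import cKDTree

# Two hypothetical point sets
points_a = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
points_b = np.array([[0.0, 0.5], [5.0, 5.5]])

tree_a = cKDTree(points_a)
tree_b = cKDTree(points_b)

# Pairs inside one tree: a set of (i, j) tuples with i < j
pairs_within_a = tree_a.query_pairs(r=1.5)

# Neighbours across two trees: for each point in tree_a,
# a list of indices into tree_b
neighbours_in_b = tree_a.query_ball_tree(tree_b, r=1.5)

# A tree queried against itself: each point is its own neighbour and
# every pair appears in both directions, so filter with i < j
self_neighbours = tree_a.query_ball_tree(tree_a, r=1.5)
pairs_from_ball = {(i, j)
                   for i, js in enumerate(self_neighbours)
                   for j in js if i < j}

print(pairs_within_a == pairs_from_ball)
```

If only pairs within a single point set are needed, query_pairs avoids the extra filtering step.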

Manipulate 2D symbolic parametric curves in Python

I am trying to compute intersections, distances and derivatives on 2D symbolic parametric curves (that is, a curve defined in the plane by a function) but I can't find any Python module that seems to do the job.
So far I have only found libraries that deal with plotting or do numerical approximation, so I thought I could implement it myself as a light layer on top of a symbolic mathematics library.
I started experimenting with SymPy but I can't wrap my head around it: it doesn't seem able to return intervals, even a finite number of them (for instance, solve(x = x) fails!), and it returns only a small number of solutions in some simple cases.
What tool could be suitable for the task?
I guess that parametric functions relate to the advanced topics of mathematical analysis, and I haven't seen any libraries yet that could match your demands. However you could try to look through the docs of the Sage project...
It would help if you give an example of two curves that you want to define. solve is up to the task for finding intersections of all quadratic curves (it will actually solve quartics and some quintics, too).
When you say "distance" what do you mean - arc length sort of distance or distance from a point to the curve?
As for tangents, that is easily handled with idiff (see its docstring for examples with help(idiff)).
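To make the suggestions above concrete, here is a minimal sketch with two made-up curves (a line and a parabola, both given parametrically) and an implicit circle. The curve definitions and variable names are assumptions chosen for illustration. Intersections come from solve on the component equations, the parametric slope is (dy/ds)/(dx/ds), and idiff handles the implicit case.

```python
import sympy as sp

t, s, x, y = sp.symbols("t s x y")

# Two hypothetical parametric curves, each as (x(param), y(param))
line = (t, t)
parabola = (s, s**2)

# Intersections: parameter values where both coordinates agree
intersections = sp.solve(
    [line[0] - parabola[0], line[1] - parabola[1]],
    [t, s],
    dict=True,
)

# Tangent slope of a parametric curve: dy/dx = (dy/ds) / (dx/ds)
slope = sp.diff(parabola[1], s) / sp.diff(parabola[0], s)

# Tangent slope of an implicit curve F(x, y) = 0 via idiff
circle = x**2 + y**2 - 4
implicit_slope = sp.idiff(circle, y, x)

print(intersections)
print(slope)
print(implicit_slope)
```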

Calculate area between two curves (that are normal distributions)

I need to calculate the area between two curves.
I have lots of data, so I'd like to do it programmatically.
Basically, I always have 2 normal distributions, calculated from a mean value and standard deviation. I would then like to calculate how much they intersect.
Here is an example of what I mean, and also some code in R (that I don't know).
Is there already a function in matplotlib or scipy or some other module that does it for me?
In case I have to implement it myself, I think that I should:
1. find the intersections (there will be max 2)
2. see which function is lower before, [between], and after the intersection
3. calculate the integral of the lower function and add them all together
Is that right? How can I do the single steps? Are there functions, modules, etc that can help?
I don't know R either, but the answer seems to be in the link you provided: just integrate the minimum of your distributions.
You don't need to find intersections, just feed min(f(x), g(x)) to scipy.integrate.quad.
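A minimal sketch of that suggestion, assuming the two curves are normal PDFs given by a mean and a standard deviation. The function name overlap_area and the choice to integrate over ten standard deviations around the means are illustrative assumptions. For very narrow or widely separated distributions the integration limits may need tuning.

```python
from scipy.integrate import quad
from scipy.stats import norm

def overlap_area(mu1, sigma1, mu2, sigma2):
    # Pointwise minimum of the two normal PDFs
    def lower_curve(x):
        return min(norm.pdf(x, mu1, sigma1), norm.pdf(x, mu2, sigma2))

    # Finite limits wide enough to cover essentially all of both PDFs
    lo = min(mu1 - 10 * sigma1, mu2 - 10 * sigma2)
    hi = max(mu1 + 10 * sigma1, mu2 + 10 * sigma2)

    # quad returns (integral estimate, error estimate)
    area, _ = quad(lower_curve, lo, hi, limit=200)
    return area

# Two hypothetical distributions: N(0, 1) and N(1, 1)
print(overlap_area(0.0, 1.0, 1.0, 1.0))
```

The result lies between 0 (no overlap) and 1 (identical distributions), since each PDF integrates to 1.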
