Calculate mean of hue angles - python

I have been struggling with this for some time, despite there being related questions on SO (e.g. this one).
import numpy as np

def circmean(arr):
    arr = np.deg2rad(arr)
    return np.rad2deg(np.arctan2(np.mean(np.sin(arr)), np.mean(np.cos(arr))))
But the results I'm getting don't make sense! I regularly get negative values, e.g.:
test = np.array([323.64,161.29])
circmean(test)
>> -117.53500000000004
I don't know if (a) my function is incorrect, (b) the method I'm using is incorrect, or (c) I just have to do a transformation to the negative values (add 360 degrees?). My research suggests that the problem isn't (a), and I've seen implementations (e.g. here) matching my own, so I'm leaning towards (c), but I really don't know.
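For what it's worth, a minimal sketch of option (c), assuming the circmean above: np.arctan2 returns angles in (-180, 180], so wrapping the result with a modulo brings it back into [0, 360).

def circmean_positive(arr):
    # wrap arctan2's (-180, 180] output back into [0, 360)
    return circmean(arr) % 360

circmean_positive(test)   # 242.465, i.e. -117.535 + 360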

Following this question, I did some research that led me to the circmean function in the scipy library.
Since you're already using the numpy library, a ready-made implementation in scipy should suit your needs.
As noted in my answer to the aforementioned question, I haven't found any documentation for that function, but inspecting its source code revealed the proper way it should be invoked:
>>> import numpy as np
>>> from scipy import stats
>>>
>>> test = np.array([323.64,161.29])
>>> stats.circmean(test, high=360)
242.46499999999995
>>>
>>> test = np.array([5, 350])
>>> stats.circmean(test, high=360)
357.49999999999994
This might not be of much use to you, since some time has passed since you posted your question and you've already implemented the function yourself, but I hope it benefits future readers struggling with the same issue.


R/apcluster and sklearn

I have been involved in analysis using software called DEPICT, which includes affinity propagation analysis in Python.
I am keen to implement a counterpart using R/apcluster for additional analysis. It seems both use correlation, but the results are slightly different. Is it possible to get to the bottom of this? Thanks very much.
from sklearn.cluster import AffinityPropagation

# using almost only default parameters
af_obj = AffinityPropagation(affinity='precomputed', max_iter=10000, convergence_iter=1000)
print("Affinity Propagation parameters:")
for param, val in af_obj.get_params().items():
    print("\t{}: {}".format(param, val))
print("Performing Affinity Propagation..")
af = af_obj.fit(matrix_corr)
as in Python: https://github.com/jinghuazhao/PW-pipeline/blob/master/files/network_plot.py
require(apcluster)
apres <- apcluster(corSimMat,tRaw,details=TRUE)
as in R: https://github.com/jinghuazhao/PW-pipeline/blob/master/files/network.R
Jing hua
It would be great to have all functionality of the R package apcluster available in Python!
To answer your question regarding the different results:
First of all, check whether the correlation/similarity matrices are the same.
Also note that the results are not 100% deterministic, since a small amount of random noise is added internally.
You would also have to check whether all parameters of the two implementations are the same. You will only get matching results if the defaults are exactly the same, and as far as I know they are not: the default damping parameter, for instance, differs.
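As a hedged sketch (my own illustration, not from either package's documentation), one way to start aligning the two runs is to pin the parameters known to differ, e.g. the damping factor; random_state is only available in newer scikit-learn versions:

from sklearn.cluster import AffinityPropagation

af_obj = AffinityPropagation(affinity='precomputed',
                             damping=0.9,            # match apcluster's default lam, as I recall it
                             max_iter=10000,
                             convergence_iter=1000,
                             random_state=0)          # pin the internal noise, if supported
af = af_obj.fit(matrix_corr)                          # matrix_corr as in the snippet above
# On the R side the corresponding call would be apcluster(corSimMat, tRaw, lam=0.9, details=TRUE).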
I hope that helps.

Are numpy-matrix-functions buffered?

Are numpy matrix-specific functions, such as x.max(), buffered (cached) when applied several times?
So should one write:
bincount=np.apply_along_axis(lambda x: np.bincount(x, minlength=data.max()+1), axis=0, arr=data)
or better
data_max=data.max()+1
bincount=np.apply_along_axis(lambda x: np.bincount(x, minlength=data_max), axis=0, arr=data)
where data is e.g.
data = np.array([[1,2,5,4,8,7,8,9,14,8,14,5,2,1],
                 [5,8,7,13,7,8,9,21,5,7,9,24,3,2]])
or of course even much larger
After updating the question, it seems that you are asking whether numpy implements some form of caching of its results. While there is no general response to this question, for a method like ndarray.max, it is clear that no caching is done.
How can we know that without looking at the implementation? Consider that a caching scheme must resolve two problems:
find a place to store the cached result(s);
have a strategy to invalidate the cache once it no longer applies.
Although the first issue is non-trivial, the second one is the real killer. Not only can a numpy array be changed at any time, but the contents of the array can be shared by many objects. Additionally, C code can obtain the address of the internal buffers, and implement its own modifications to the underlying memory. Caching results would effectively disable many interesting uses of numpy.
You can consider numpy as a low-level library that doesn't concern itself with optimizations of that nature. If caching is needed, it should be implemented at a higher level, such as shown in your second example.
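As a minimal sketch of such higher-level caching (my own illustration, not a numpy facility), a thin wrapper can remember the result and recompute it only when told the data changed; note that the invalidation step is exactly the hard part described above:

import numpy as np

class CachedMax:
    """Caches arr.max(); the caller must invalidate after modifying arr."""
    def __init__(self, arr):
        self.arr = arr
        self._max = None
    def max(self):
        if self._max is None:       # compute once, then reuse
            self._max = self.arr.max()
        return self._max
    def invalidate(self):           # call after any in-place modification of arr
        self._max = None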
As Slater Tyranus pointed out, only a benchmark will give a real answer:
import numpy as np
import timeit

def func_a(data):
    return np.apply_along_axis(lambda x: np.bincount(x, minlength=data.max()+1), axis=0, arr=data)

def func_b(data):
    data_max = data.max() + 1
    return np.apply_along_axis(lambda x: np.bincount(x, minlength=data_max), axis=0, arr=data)

setup = '''import numpy as np
data = np.array([[1,2,5,4,8,7,8,9,14,8,14,5,2,1],
                 [5,8,7,13,7,8,9,21,5,7,9,24,3,2]])
from __main__ import func_a, func_b'''

min(timeit.Timer('func_a(data)', setup=setup).repeat(100, 100))
0.02922797203063965
min(timeit.Timer('func_b(data)', setup=setup).repeat(100, 100))
0.018524169921875
I also tested with much larger data. Overall, it pays to compute data_max = data.max() + 1 beforehand; with much bigger arrays the discrepancy gets even larger.

Inverting tridiagonal matrix

I have an equation
Ax = By
where A and B are tridiagonal matrices. I want to calculate the matrix
C = inv(A)·B
The system has to be solved for many different x and y, hence computing C once is handy.
Can someone please tell me a faster method to compute the inverse? I am using Python 3.5 and would prefer a method from numpy; if that is not possible, scipy or Cython are my second and third choices.
I have seen other similar questions, but they do not fully match my problem.
Thank you
There are many methods to do this; one of the simplest is the tridiagonal matrix (Thomas) algorithm, see the Wikipedia page. This algorithm works in O(n) time, and there is a simple NumPy implementation at the following GitHub link.
However, you may also want to implement one of the known algorithms yourself, for example something like an LU factorization.
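For reference, a minimal sketch of the Thomas algorithm (my own, not the implementation from the linked repository), assuming a, b, c are the sub-, main and super-diagonals of A and d is the right-hand side:

import numpy as np

def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system in O(n); a[0] and c[-1] are unused."""
    n = len(b)
    cp = np.empty(n)
    dp = np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                        # forward sweep
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):               # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x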
scipy.linalg.solve_banded is a wrapper for LAPACK, which should in turn call MKL. It seems to run in O(N). A trivial example to show the syntax:
import numpy as np
from scipy.linalg import solve_banded

a = np.array([[1,2,0,0], [-1,2,1,0], [0,1,3,1], [0,0,1,2]])
x = np.array([1,2,3,4])
b = np.dot(a, x)

# build the banded form expected by solve_banded
ab = np.empty((3,4))
ab[0,1:] = np.diag(a, 1)
ab[1,:] = np.diag(a, 0)
ab[2,:-1] = np.diag(a, -1)

y = solve_banded((1,1), ab, b)
print(y)
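A hedged sketch of applying this to the original problem Ax = By: rather than forming C = inv(A)·B explicitly, reuse A's banded form ab from above and solve for x whenever a new y arrives (B and y_vec below are hypothetical examples):

B = np.array([[2,1,0,0], [1,2,1,0], [0,1,2,1], [0,0,1,2]])   # some tridiagonal B
y_vec = np.array([1.0, 2.0, 3.0, 4.0])
x_sol = solve_banded((1, 1), ab, B @ y_vec)                   # solves A x_sol = B y_vec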

Can I decorate an explicit function call such as np.sqrt()

I understand a bit about python function decorators. I think the answer to my question is no, but I want to make sure. With a decorator and a numpy array of x = np.array([1,2,3]) I can override x.sqrt() and change the behavior. Is there some way I can override np.sqrt(x) in Python?
Use case: working on the quantities package. I would like to be able to take the square root of uncertain quantities without changing a code base that currently uses np.sqrt().
Edit:
I'd like to modify np.sqrt in the quantities package so that the following code works (all three should print identical results; note the 0 uncertainty when using np.sqrt()). I hope not to require end-users to modify their code, but to properly wrap/decorate np.sqrt() in the quantities package. Currently many numpy functions are decorated (see https://github.com/python-quantities/python-quantities/blob/ca87253a5529c0a6bee37a9f7d576f1b693c0ddd/quantities/quantity.py), but they seem to work only when x.func() is called, not np.func(x).
import numpy as np
import quantities as pq

x = pq.UncertainQuantity(2, pq.m, 2)
print(x.sqrt())
# 1.41421356237 m**0.5 +/- 0.707106781187 m**0.5 (1 sigma)
print(x**0.5)
# 1.41421356237 m**0.5 +/- 0.707106781187 m**0.5 (1 sigma)
print(np.sqrt(x))
# 1.41421356237 m**0.5 +/- 0.0 dimensionless (1 sigma)
Monkeypatching
If I understand your situation correctly, your use case is not really about decoration (modifying a function you write, in a standard manner) but rather about monkey patching: modifying a function somebody else wrote without actually changing that function's source code.
The idiom for what you then need is something like
import numpy as np # provide local access to the numpy module object
original_np_sqrt = np.sqrt
def my_improved_np_sqrt(x):
# do whatever you please, including:
# - contemplating the UncertainQuantity-ness of x and
# - calling original_np_sqrt as needed
np.sqrt = my_improved_np_sqrt
Of course, this can change only the future meaning of numpy.sqrt, not the past one. So if anybody has imported numpy before the above and has already used numpy.sqrt in a way you would have liked to influence, you lose. (And the name to which they bind the numpy module does not matter.) But after the above code has been executed, the meaning of numpy.sqrt in all modules (whether they imported numpy before it or after it) will be that of my_improved_np_sqrt, whether the creators of those modules like it or not (and of course unless some more monkeypatching of numpy.sqrt is going on elsewhere).
Note that
When you do weird things, Python can become a weird platform!
When you do weird things, Python can become a weird platform!
When you do weird things, Python can become a weird platform!
This is why monkey patching is not normally considered good design style. So if you take that route, make sure you announce it very prominently in all relevant documentation.
Oh, and if you do not want to modify any code other than what is directly or indirectly executed from your own methods, you could introduce a decorator that performs the monkeypatching before the call and the un-monkeypatching (reassigning original_np_sqrt) after the call, and apply that decorator to all your functions in question. Make sure you handle exceptions in that decorator, so that the un-monkeypatching is really executed in all cases.
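A minimal sketch of such a decorator, assuming my_improved_np_sqrt from the snippet above (my own illustration, not part of the quantities package):

import functools
import numpy as np

def with_patched_sqrt(func):
    """Temporarily replaces np.sqrt around each call of func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        original = np.sqrt
        np.sqrt = my_improved_np_sqrt       # patch before the call
        try:
            return func(*args, **kwargs)
        finally:
            np.sqrt = original              # un-patch even if func raises
    return wrapper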
Maybe, as BrenBarn stated, it is enough to write
np.sqrt = decorator(np.sqrt)
because a decorator is just a callable that takes an object and returns a modified object.

incomplete gamma function in python?

scipy.special.gammainc cannot take negative values for the first argument. Are there any other implementations in Python that can? I can do a manual integration for sure, but I'd like to know if there are good alternatives that already exist.
Correct result: 1 - Gamma[-1,1] = 0.85
Using scipy: scipy.special.gammainc(-1, 1) = 0
Thanks.
I typically reach for mpmath whenever I need special functions and I'm not too concerned about performance. (Although its performance in many cases is pretty good anyway.)
For example:
>>> import mpmath
>>> mpmath.gammainc(-1,1)
mpf('0.14849550677592205')
>>> 1-mpmath.gammainc(-1,1)
mpf('0.85150449322407795')
>>> mpmath.mp.dps = 50 # arbitrary precision!
>>> 1-mpmath.gammainc(-1,1)
mpf('0.85150449322407795208164000529866078158523616237514084')
I just had the same issue and ended up using the recurrence relations for the function when a < 0:
http://en.wikipedia.org/wiki/Incomplete_gamma_function#Properties
Note also that the scipy functions gammainc and gammaincc give the regularized forms, γ(a,x)/Γ(a) and Γ(a,x)/Γ(a) respectively.
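As a minimal sketch of that approach (my own illustration, not from the answer above), one can extend the unregularised upper incomplete gamma to scalar a <= 0 using the recurrence Γ(a, x) = (Γ(a+1, x) - x^a e^(-x)) / a and the identity Γ(0, x) = E1(x):

import numpy as np
from scipy.special import exp1, gamma, gammaincc

def upper_gamma(a, x):
    """Unregularised upper incomplete gamma for scalar a, extended to a <= 0."""
    if a > 0:
        return gammaincc(a, x) * gamma(a)   # un-regularise scipy's result
    if a == 0:
        return exp1(x)                      # Gamma(0, x) is the exponential integral E1
    return (upper_gamma(a + 1, x) - x**a * np.exp(-x)) / a

upper_gamma(-1, 1.0)   # ~0.148495, so 1 - Gamma(-1, 1) ~ 0.85 as in the question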
Still an issue in 2021, and they still haven't improved this in scipy. It is especially frustrating that scipy does not even provide unregularised versions of the upper and lower incomplete gamma functions. I also ended up using mpmath, which uses its own data type (mpf, an arbitrary-precision mpmath float). In order to cook up something quick for the upper and lower incomplete gamma functions that works with numpy arrays, and that behaves the way one would expect from evaluating those integrals, I came up with the following:
import numpy as np
from mpmath import gammainc

# In both functions below, a is a float and z is a numpy array.

def gammainc_up(a, z):
    # unregularised upper incomplete gamma, evaluated elementwise over z
    return np.asarray([gammainc(a, zi, regularized=False)
                       for zi in z]).astype(float)

def gammainc_low(a, z):
    # unregularised lower incomplete gamma: integrate from 0 to zi
    return np.asarray([gammainc(a, 0, zi, regularized=False)
                       for zi in z]).astype(float)
Note again, this is for the unregularised functions (Eqs. 8.2.1 and 8.2.2 in the DLMF); the regularised functions (Eqs. 8.2.3 and 8.2.4) can be obtained in mpmath by setting the keyword regularized=True.
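A quick sanity check of gammainc_up against the mpmath value quoted earlier (my own usage example):

print(gammainc_up(-1, np.array([1.0])))   # ~[0.14849551], matching mpmath.gammainc(-1, 1)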
