Error in scipy wilcoxon signed rank test for equal series? - python

I have a problem with the results of scipy's Wilcoxon signed-rank test:
x1 = [29.39958, 29.21756, 29.350915, 29.34911, 29.212635]
sp.wilcoxon(x1, x1, zero_method="wilcox", correction=True)
returns statistic=0.0, pvalue=nan.
But with zero_method="pratt", it returns statistic=0.0, pvalue=0.043114446783075355 instead.
I think there is a mistake there. The statistic (the z-score, if I am not mistaken) is the same in both cases, but the p-values differ.
Am I right and scipy wrong, or am I missing something?
I wanted to check with another module, but neither alglib's (http://www.alglib.net/hypothesistesting/wilcoxonsignedrank.php) nor statsmodels' (http://statsmodels.sourceforge.net/devel/generated/statsmodels.sandbox.descstats.sign_test.html?highlight=wilcoxon) allows for the Pratt correction, which I needed as it was deemed more conservative...
Could you also advise on alternative modules for Python stats? Thanks.
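For reference, a minimal script reproducing the comparison above. The exact behaviour depends on the SciPy version: older releases return nan for zero_method="wilcox" here, while newer ones may raise a ValueError when all differences x - y are zero.
import scipy.stats as sp

x1 = [29.39958, 29.21756, 29.350915, 29.34911, 29.212635]

for method in ("wilcox", "pratt"):
    try:
        # compare a series with itself, so every difference is zero
        print(method, sp.wilcoxon(x1, x1, zero_method=method, correction=True))
    except ValueError as err:
        print(method, "raised ValueError:", err)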

R/apcluster and scikit-learn

I have been involved in an analysis using a software tool called depict, which includes affinity propagation analysis in Python.
I am keen to implement a counterpart using R/apcluster for additional analysis. It seems both use correlation, but the results are slightly different. Is it possible to get to the bottom of this? Thanks very much.
from sklearn.cluster import AffinityPropagation

af_obj = AffinityPropagation(affinity='precomputed', max_iter=10000, convergence_iter=1000)  # using almost only default parameters
print("Affinity Propagation parameters:")
for param, val in af_obj.get_params().items():
    print("\t{}: {}".format(param, val))
print("Performing Affinity Propagation...")
af = af_obj.fit(matrix_corr)
as in Python: https://github.com/jinghuazhao/PW-pipeline/blob/master/files/network_plot.py
require(apcluster)
apres <- apcluster(corSimMat, tRaw, details=TRUE)
as in R: https://github.com/jinghuazhao/PW-pipeline/blob/master/files/network.R
Jing hua
It would be great to have all functionality of the R package apcluster available in Python!
To answer your questions regarding different results:
First of all, check whether the correlation/similarity matrices are the same.
Also note that the results are not 100% deterministic, since a small amount of random noise is added internally.
You would also have to check whether all parameters of the two implementations are the same. If you run both with default parameters, you will only get the same results if the defaults agree exactly, and as far as I know they do not: the default damping parameter, for instance, differs.
I hope that helps.
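To illustrate that last point, here is a sketch of how the damping defaults might be aligned on the Python side. It assumes apcluster's default damping (lam) is 0.9 versus scikit-learn's 0.5, and uses a toy correlation matrix in place of the real one; verify the parameter values against both packages' documentation.
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy similarity matrix standing in for the real correlation matrix.
rng = np.random.RandomState(0)
data = rng.rand(20, 5)
matrix_corr = np.corrcoef(data)

af_obj = AffinityPropagation(
    affinity='precomputed',
    damping=0.9,            # align with apcluster's default lam=0.9 (scikit-learn defaults to 0.5)
    max_iter=10000,
    convergence_iter=1000,
    random_state=0,         # pin the internally added random noise (scikit-learn >= 0.23)
)
labels = af_obj.fit_predict(matrix_corr)
print(labels)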

Minimize function with trust-ncg method proposes value greater than max_trust_radius

As far as I understand the minimize function with the trust-ncg method, the method-specific parameter max_trust_radius is the maximum size of a single optimization step.
However, I am experiencing weird behaviour.
I am working on my doctorate data and have code that invokes the minimize function (with the trust-ncg method), passing the parameters:
opt_par = {
    'initial_trust_radius': 0.1,
    'max_trust_radius': 1,
    'eta': 0.15,
    'gtol': 1e-5,
    'disp': True
}
I invoke minimize function as:
res = minimize(bbox, x0, method='trust-ncg', jac=bbox_der, hess=bbox_hess, options=opt_par)
where
bbox is the function that evaluates the objective function,
x0 is the initial guess,
bbox_der is the gradient function,
bbox_hess is the Hessian function, and
opt_par is the dictionary above with the parameters.
bbox invokes the simulation code and gets the data. It works: minimize goes back and forth proposing new values, and bbox invokes the simulation.
Everything worked well until I hit a weird issue.
The x vector contains 8 values. I noticed that in one of the iterations, the last value is greater than 1.
Given max_trust_radius, I would expect it to be less than 1, but it is 1.0621612802208713e+00.
This causes problems because bbox cannot receive a value of 1 or greater: it invokes a simulation program that has a constraint ruling out such values.
I looked through the scipy code to see whether I could find a bug or something wrong, but I could not.
My main concerns are:
Is there a bug in the scipy minimize code, given that the new value is greater than max_trust_radius?
How can I control the values to prevent them from becoming greater than 1?
What do you suggest for investigating the issue?
The max_trust_radius controls how large steps you are allowed to take:
max_trust_radius : float
Maximum value of the trust-region radius.
No steps that are longer than this value will be proposed.
Since you are very likely to take many steps during the minimization, each of which can be up to 1 long, it is not strange at all that you end up with ||x|| > 1 (assuming ||x0|| = 0).
If your problem is strictly bounded then you need to apply an optimization algorithm that supports bounds on the parameters.
For scipy.optimize.minimize only L-BFGS-B, TNC and SLSQP methods seem to support the bounds= keyword.
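To make that concrete, here is a minimal sketch of a bounded run, with a hypothetical toy objective standing in for the original bbox simulation wrapper:
import numpy as np
from scipy.optimize import minimize

def bbox(x):
    # hypothetical stand-in for the real simulation-based objective
    return np.sum((x - 0.5) ** 2)

x0 = np.full(8, 0.1)
bounds = [(0.0, 0.999)] * 8   # keep every component strictly below the simulator's limit of 1
res = minimize(bbox, x0, method='L-BFGS-B', bounds=bounds)
print(res.x)                  # every component respects the bounds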

Calculate mean of hue angles

I have been struggling with this for some time, despite there being related questions on SO (e.g. this one).
Here's my current implementation:
import numpy as np

def circmean(arr):
    arr = np.deg2rad(arr)
    return np.rad2deg(np.arctan2(np.mean(np.sin(arr)), np.mean(np.cos(arr))))
But the results I'm getting don't make sense! I regularly get negative values, e.g.:
test = np.array([323.64,161.29])
circmean(test)
>> -117.53500000000004
I don't know if (a) my function is incorrect, (b) the method I'm using is incorrect, or (c) I just have to do a transformation to the negative values (add 360 degrees?). My research suggests that the problem isn't (a), and I've seen implementations (e.g. here) matching my own, so I'm leaning towards (c), but I really don't know.
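As a quick check of option (c): wrapping the negative result into [0, 360) with a modulo (using the circmean function above) gives a value consistent with scipy's circmean quoted in the answer below.
test = np.array([323.64, 161.29])
print(circmean(test) % 360)   # ~242.465, i.e. -117.535 wrapped into [0, 360)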
Following this question, I've done some research that led me to find the circmean function in the scipy library.
Considering you're already using the numpy library, I thought that a proper implementation in the scipy library should suit your needs.
As noted in my answer to the aforementioned question, I haven't found any documentation of that function, but inspecting its source code revealed the proper way it should be invoked:
>>> import numpy as np
>>> from scipy import stats
>>>
>>> test = np.array([323.64,161.29])
>>> stats.circmean(test, high=360)
242.46499999999995
>>>
>>> test = np.array([5, 350])
>>> stats.circmean(test, high=360)
357.49999999999994
This might not be of any use to you, since some time passed since you posted your question and considering you've already implemented the function yourself, but I hope it may benefit future readers who are struggling with the same issue.

Is there a discontinuous range class for Python?

I'd like to represent an arbitrarily complex range of real values, which can be discontinuous, i.e.:
0--4 and 5--6 and 7.12423--8
Where I'll be adding new ranges incrementally:
(0--4 and 5--6 and 7.12423--8) | ( 2--7) = (0--7 and 7.12423--8)
I don't really know the right language to describe this, so I'm struggling to search, but it seems like a class probably already exists to do what I want to do. Does it?
There are at least a couple of packages listed in the Python Package Index which deal with intervals:
interval
pyinterval
I've experimented with interval before and found it to be well-written and documented (at the moment its website seems to be unavailable). I've not used pyinterval.
In addition to the interval and pyinterval packages mentioned by Ned, there is also pyinter.
As pyinterval didn't compile on my machine, I only played with interval and pyinter.
The former seems better to me, because it has addition and subtraction operators defined for interval sets, which pyinter does not. Also, when I tried to calculate the union of two discrete points, it worked as expected in interval, but raised an AttributeError ("'int' object has no attribute 'overlaps'") in pyinter.
One very visible difference in pyinter is the __repr__ function of its interval class, which outputs (7,9] instead of Interval(7, 9, lower_closed=False, upper_closed=True) (the latter being the representation of the interval package). While this is nice for quick interactive work, intervals printed this way might be confused with two-element lists. Here, too, I like the interval package's approach more: it has a less ambiguous representation, but additionally defines a __str__ method, so that calling str() or print() on the example interval outputs (7..9].
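For readers who only need the incremental union from the question and want to avoid a dependency, here is a minimal sketch of the merge logic in plain Python (a hypothetical helper, not part of any of the packages above):
def add_range(ranges, new):
    # Insert (lo, hi) into a sorted list of disjoint (lo, hi) tuples,
    # merging any intervals that overlap the new one.
    lo, hi = new
    merged = []
    for a, b in ranges:
        if b < lo or a > hi:                 # disjoint: keep as-is
            merged.append((a, b))
        else:                                # overlapping: absorb into the new interval
            lo, hi = min(lo, a), max(hi, b)
    merged.append((lo, hi))
    return sorted(merged)

ranges = [(0, 4), (5, 6), (7.12423, 8)]
print(add_range(ranges, (2, 7)))             # [(0, 7), (7.12423, 8)]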
