How to use a one-sided Wilcoxon rank-sum test in Python

I have two lists, and I would like to run a Wilcoxon rank-sum test on them.
I saw that there is scipy.stats.ranksums, but it only shows the two-sided test.
How can I do a one-sided test in Python?

I checked, and it seems that only the Wilcoxon signed-rank test has a one-sided option.
It may not be ideal, but you can call R's wilcox.test through rpy2:
import numpy as np
from rpy2.robjects import FloatVector
from rpy2.robjects.packages import importr
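stats = importr('stats')  # importr exposes R's wilcox.test as stats.wilcox_test
x = np.random.poisson(1,size=20)
y = np.random.poisson(3,size=20)
# alternative='less': the alternative hypothesis is that x is shifted to smaller values than y
test = stats.wilcox_test(FloatVector(x),FloatVector(y),alternative='less')
d = { key : test.rx2(key)[0] for key in ['statistic','p.value','alternative'] }
print(d)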
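If you would rather stay in Python, SciPy can also do this directly. The Mann-Whitney U test is equivalent to the Wilcoxon rank-sum test, and scipy.stats.mannwhitneyu takes an alternative argument; in SciPy 1.7 and later, scipy.stats.ranksums accepts alternative as well. A minimal sketch, assuming SciPy >= 1.7 and using small made-up samples in place of the two lists:

```python
from scipy.stats import mannwhitneyu, ranksums

# Hypothetical example data standing in for the two lists in the question.
x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 10]

# One-sided Mann-Whitney U test (the same test as the Wilcoxon rank-sum test).
# alternative='less' asks whether values in x tend to be smaller than those in y.
u_res = mannwhitneyu(x, y, alternative='less')
print(u_res.statistic, u_res.pvalue)

# ranksums uses the normal approximation and does not correct for ties;
# its alternative parameter requires SciPy >= 1.7.
z_res = ranksums(x, y, alternative='less')
print(z_res.statistic, z_res.pvalue)
```

Use alternative='greater' for the opposite direction. When the data contain ties, mannwhitneyu is the safer choice, since ranksums does not handle them.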

Related

Creating a vector with one unknown using a function

I have a problem that I cannot get my head around although there must be a simple way to do so. Basically, I have this function:
Pr(d) = Pr(d_0) - 10*n*log10(d/d_0)
where we can ignore the Pr(d) term for now. Now, I want to pass the following dataframe:
d
0 200
1 600
2 800
3 1000
with d_0 constant. I should actually pass it as an array using df_matrix = df.to_numpy().
What I want is to create a function
import pandas as pd
import numpy as np
from sympy import symbols, solve
from scipy.optimize import fsolve
import math
def recieved_power(pr_d0,d_0, x):
    pr_d0 -10*n*math.log(d/d_0)
that will return a vector with the variable n (unknown). It should return:
-3.0102999566398116*n
-7.781512503836435*n
-9.030899869919434*n
-10.0*n
Is that possible? I cannot just multiply by n afterwards because new factors might be added at a later stage of the work.
Thanks for any insight.
I suggest you change
def recieved_power(pr_d0,d_0, x):
    pr_d0 -10*n*math.log(d/d_0)
into
def recieved_power(pr_d0, d, d_0):
    # log10 (not math.log, the natural log) matches the coefficients in the question
    return lambda n: [pr_d0 - 10 * n * math.log10(single_d / d_0) for single_d in d]
For example, the following code snippet takes the parameters pr_d0, d and d_0 and returns a function which takes n and outputs a list of numbers, each representing a single pr_d0 - 10*n*log10(d/d_0) value:
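import math
d = [200, 600, 800, 1000]
def recieved_power(pr_d0, d, d_0):
    return lambda n: [pr_d0 - 10 * n * math.log10(single_d / d_0) for single_d in d]
# d_0 = 100 reproduces the coefficients in the question (-3.0103*n, -7.7815*n, ...)
func_list = recieved_power(0, d, d_0=100)
print(func_list(3))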
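Since the question already imports symbols from SymPy, another option is to keep n as a SymPy symbol, so that each entry is a symbolic expression like the -3.0102999566398116*n output asked for, and further factors can be multiplied in later. A sketch, assuming base-10 logarithms, pr_d0 = 0 and d_0 = 100 (the values that reproduce the coefficients in the question):

```python
import math

from sympy import symbols

n = symbols('n')  # the unknown stays symbolic


def recieved_power(pr_d0, d, d_0):
    """Return one SymPy expression pr_d0 - 10*n*log10(d/d_0) per distance in d."""
    return [pr_d0 - 10 * n * math.log10(single_d / d_0) for single_d in d]


exprs = recieved_power(0, [200, 600, 800, 1000], d_0=100)
for e in exprs:
    print(e)

# Once n is known (or new factors appear), substitute or multiply as needed.
print([e.subs(n, 2) for e in exprs])
```

The d values could equally come from df['d'].to_numpy(); each element is divided by d_0 as a plain number before SymPy sees it.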

rpy2: passing the R : operator in function arguments

I am facing an issue when trying to execute the R code below with Python rpy2.
from rpy2.robjects import r
import rpy2.robjects as ro
from rpy2.robjects.conversion import localconverter
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr
stats = importr("stats")
with localconverter(ro.default_converter + pandas2ri.converter):
Rdataframe2 = ro.conversion.py2rpy(dtw)
rdism = r["as.dist"](Rdataframe2)
ttclust = r.hclust(rdism)
ttclusterange = r.cutree(ttclust, k='1:3')
I can't find a way to pass the argument k=1:3 to the cutree function.
I keep receiving an error message stating:
""elements of 'k' must be between 1 and %d", :
missing value where TRUE/FALSE needed
It seems that I can't find the right syntax to execute the last line.
Can someone please help me solve this issue?
The 1:3 expression is meant to generate the vector c(1, 2, 3) in R. However, you are not evaluating it in R; rpy2 passes it as the character string '1:3'. Try passing an equivalent list [1, 2, 3] instead, or list(range(1, 3 + 1)). That is:
r.cutree(ttclust, k=list(range(1, 3 + 1)))
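The key difference is that R's a:b includes both ends, while Python's range(a, b) stops before b. A small hypothetical helper (not part of rpy2) that mimics R's colon operator for integers, including counting down, might look like this:

```python
def r_colon(start, end):
    """Hypothetical helper: mimic R's integer start:end (both ends included)."""
    step = 1 if end >= start else -1  # R's 3:1 counts down to 1
    return list(range(start, end + step, step))


print(r_colon(1, 3))  # same values as R's 1:3
print(r_colon(3, 1))  # same values as R's 3:1
```

The result can then be passed as k=r_colon(1, 3); wrapping it in rpy2.robjects.IntVector makes the R integer type explicit if you prefer.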

Compare Methods in rpy2

I have an rpy2 script:
from rpy2.robjects.packages import importr
binom = importr('binom')
from rpy2 import robjects
robjects.r('''library(binom)
p = seq(0,1,.01)
coverage = binom.coverage(p, 10, method="bayes", type = "central")$coverage
''')
I'd like to use it to compare the results from a list of methods please:
methods = [("bayes", type = "central"),("asymptotic")]
for method in methods:
    robjects.globalenv["method"] = robjects.r(method)
    robjects.r('''library(binom)
p = seq(0,1,0.01)
coverage = binom.coverage(p, 10, method=method)$coverage
''')
The first line gives me:
invalid syntax
I'd also like to include the type argument for the Bayes method, but even when I drop it to fix the syntax of my list, I still get the error:
object 'bayes' not found
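robjects.r() receives a string, so for this particular task you can simply replace a placeholder in the R code with the right argument string. The "object 'bayes' not found" error happens because robjects.r(method) evaluates bayes as R code, i.e. it looks up a variable named bayes. Nesting the quotes (single inside double) does the trick: the outer double quotes only delimit the Python string, so after .replace() the inner single quotes end up in the R code as R string literals.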
from rpy2.robjects.packages import importr
binom = importr('binom')
from rpy2 import robjects
methods = ["'bayes', type='central'","'asymptotic'"]
for method in methods:
    # Substitute the placeholder with this run's method argument(s).
    r_string = """library(binom)
p = seq(0,1,0.01)
coverage = binom.coverage(p, 10, method=TECHNIQUE)$coverage
""".replace('TECHNIQUE',method)
    robjects.r(r_string)
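If you prefer to keep the methods in the (method, extra arguments) shape the question was aiming for, you can build each R call string from a tuple instead of using placeholder replacement. A minimal sketch; the helper name coverage_call is hypothetical, and each resulting string is meant to be passed to robjects.r() after library(binom) and p have been set up:

```python
def coverage_call(method, **extra):
    """Hypothetical helper: build an R binom.coverage(...) call as a string."""
    # Values must appear as R string literals, so each one is wrapped in double quotes.
    args = [f'method = "{method}"'] + [f'{key} = "{value}"' for key, value in extra.items()]
    return f"binom.coverage(p, 10, {', '.join(args)})$coverage"


methods = [("bayes", {"type": "central"}), ("asymptotic", {})]
calls = [coverage_call(name, **kwargs) for name, kwargs in methods]
for call in calls:
    print(call)
```

Keeping the method name separate from its extra arguments also makes it easy to store each robjects.r(call) result in a dict keyed by method name for the comparison.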

Using subset from arules package in rpy2

It's easy to use the apriori algorithm from the arules package:
from collections import OrderedDict
import rpy2.interactive as r
arules = r.packages.importr("arules")
from rpy2.robjects.vectors import ListVector
od = OrderedDict()
od["supp"] = 0.0005
od["conf"] = 0.7
od["target"] = 'rules'
result = ListVector(od)
my_rules = arules.apriori(dataset, parameter=result)
However, arules' subset function uses a different format for its subset parameter:
rules.sub <- subset(rules, subset = rhs %in% "marital-status=Never-married" & lift > 2)
Is it possible to use this subset function with rpy2?
If subset is (re)defined in the R package arules, the object arules obtained from importr will contain it. In your Python code this will look like arules.subset.
The parameter subset is a slightly different story because it is an R expression rather than a value. There are several ways to tackle this. One of them is to wrap it in an ad-hoc R function.
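from rpy2.robjects import r
def mysubset(rules, subset_str):
    # Build an ad-hoc R function whose body contains the subset expression,
    # then call it on the rules object.
    r_func = r("function(rules) { arules::subset(rules, subset = %s) }" % subset_str)
    return r_func(rules)
rules_sub = mysubset(my_rules,
                     'rhs %in% "marital-status=Never-married" & lift > 2')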

Embed R code in Python

I need to make computations in a Python program, and I would prefer to make some of them in R. Is it possible to embed R code in Python?
You should take a look at rpy.
This allows you to do:
from rpy import *
And then you can use the object called r to do computations just like you would do in R.
Here is an example extracted from the doc:
>>> from rpy import *
>>>
>>> degrees = 4
>>> grid = r.seq(0, 10, length=100)
>>> values = [r.dchisq(x, degrees) for x in grid]
>>> r.par(ann=0)
>>> r.plot(grid, values, type='lines')
RPy is your friend for this type of thing.
The scipy, numpy and matplotlib packages all do similar things to R and are very complete, but if you want to mix the languages, RPy is the way to go!
from rpy2.robjects import r
def main():
    degrees = 4
    grid = r.seq(0, 10, length=100)
    # dchisq is vectorized, so one call returns an R vector of densities
    values = r.dchisq(grid, degrees)
    r.par(ann=0)
    r.plot(grid, values, type='l')
if __name__ == '__main__':
    main()
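As noted above, SciPy and NumPy cover much of the same ground. For comparison, a pure-Python sketch of the same chi-square density grid, assuming NumPy and SciPy are installed; np.linspace plays the role of R's seq(..., length=100) and scipy.stats.chi2.pdf the role of dchisq:

```python
import numpy as np
from scipy.stats import chi2

degrees = 4
grid = np.linspace(0, 10, 100)       # like R's seq(0, 10, length=100)
values = chi2.pdf(grid, df=degrees)  # like R's dchisq(grid, degrees), vectorized

print(grid[:3])
print(values[:3])
```

Plotting is then plain matplotlib, for example plt.plot(grid, values) after importing matplotlib.pyplot as plt.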
When I need to do R calculations, I usually write R scripts and run them from Python using the subprocess module. I chose this approach because the version of R I had installed (2.16, I think) wasn't compatible with RPy at the time (which wanted 2.14).
So if you already have your R installation "just the way you want it", this may be a better option.
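A minimal sketch of that subprocess approach, assuming Rscript (which ships with R) is on your PATH; run_r_code is a hypothetical helper name. It writes the R code to a temporary file, runs it with Rscript, and returns whatever the script printed:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_r_code(code):
    """Hypothetical helper: run R code with Rscript and return its stdout."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        raise FileNotFoundError("Rscript was not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "script.R"
        script.write_text(code)
        # check=True raises CalledProcessError if the R script exits with an error.
        result = subprocess.run(
            [rscript, str(script)], capture_output=True, text=True, check=True
        )
    return result.stdout
```

For example, run_r_code("cat(mean(c(1, 2, 3)))") should return the string "2". Anything larger than a few values is usually easier to exchange through files (e.g. CSV) than through stdout.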
Using rpy2.robjects (I tried this with some sample R programs):
from rpy2.robjects import r
print(r('''
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
# Get the class of the vector.
print(class(apple))
##########################
# Create the data for the chart.
v <- c(7,12,28,3,41)
# Give the chart file a name.
png(file = "line_chart.png")
# Plot the line chart.
plot(v,type = "o")
# Save the file.
dev.off()
##########################
# Give the chart file a name.
png(file = "scatterplot_matrices.png")
# Plot the matrices between 4 variables giving 12 plots.
# One variable with 3 others and total 4 variables.
pairs(~wt+mpg+disp+cyl,data = mtcars,
main = "Scatterplot Matrix")
# Save the file.
dev.off()
install.packages("plotly", repos = "https://cloud.r-project.org") # an explicit CRAN mirror avoids the interactive mirror prompt
library(plotly) # to load "plotly"
'''))
