Using the R select function in Python using rpy2

I'm trying to convert UniProt accession numbers to Entrez IDs using the Bioconductor package org.Hs.eg.db (which is an S4 object). I'm also trying to do this as part of a Python script with rpy2. Calling the select function gives me errors. Here's the code (the program is 400 lines, so I'm excerpting the relevant parts):
from rpy2.robjects.packages import importr
from rpy2.robjects import StrVector, DataFrame, r
# get UniProt accession numbers from first two columns of data
uniprotA = []
uniprotB = []
for row in interactions:
    uniprotA.append(row[0][10:])
    uniprotB.append(row[1][10:])
# convert to vectors in r
uniprotA = StrVector(uniprotA)
uniprotB = StrVector(uniprotB)
homosap = importr('org.Hs.eg.db')
geneidA = r.select(homosap, keys = uniprotA, columns = "ENTREZID", keytype="UNIPROT")
And here are the error messages:
Traceback (most recent call last):
File "mitab_preprocess.py", line 356, in <module>
reformat_data(interactions)
File "mitab_preprocess.py", line 140, in reformat_data
geneidA = r.select(homosap, keys = uniprotA, columns = "ENTREZID", keytype="UNIPROT")
File "//anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "//anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 102, in __call__
new_args = [conversion.py2ri(a) for a in args]
File "//anaconda/lib/python2.7/site-packages/singledispatch.py", line 210, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "//anaconda/lib/python2.7/site-packages/rpy2/robjects/conversion.py", line 60, in _py2ri
raise NotImplementedError("Conversion 'py2ri' not defined for objects of type '%s'" % str(type(obj)))
NotImplementedError: Conversion 'py2ri' not defined for objects of type '<class 'rpy2.robjects.packages.InstalledSTPackage'>'

homosap is an R package exposed as a Python namespace.
I think that you want to use an object in that namespace as a parameter, not the namespace.
Here it should be homosap.org_Hs_eg_db (I am guessing, I have not tried).
There are several things at play here:
The dot (.) is not a valid character in Python identifiers, so rpy2 translates it to _ (org.Hs.eg.db becomes org_Hs_eg_db).
When R loads a package with library(), all its exported symbols are added to the search path. Coming from Python, this is a bit like from <package> import *. rpy2's importr instead returns a namespace in which the package's symbols are exposed as attributes.
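A minimal sketch of the renaming rule described above, in plain Python so it runs without R. It only imitates rpy2's default dot-to-underscore translation (r_to_python_name is a hypothetical helper, not part of rpy2) and ignores rpy2's handling of name conflicts. The commented call at the end follows the answer's guess and is untested.

```python
def r_to_python_name(r_name: str) -> str:
    """Imitate rpy2's default renaming of R symbols for Python attribute access.

    Simplified illustration only: '.' becomes '_'; rpy2's handling of
    conflicting names is not modelled here.
    """
    return r_name.replace(".", "_")


# With rpy2 and Bioconductor installed, the call from the question would then
# look roughly like this (untested, following the guess above):
#   homosap = importr('org.Hs.eg.db')
#   geneidA = r.select(homosap.org_Hs_eg_db, keys=uniprotA,
#                      columns="ENTREZID", keytype="UNIPROT")

print(r_to_python_name("org.Hs.eg.db"))  # org_Hs_eg_db
```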

Related

How to get all inventory groups variables in hierarchy via Python API?

I want to collect the variables of all inventory host groups into a hierarchical data structure and send them to Consul to make them available at runtime.
Calling this method (https://github.com/ansible/ansible/blob/devel/lib/ansible/inventory/manager.py#L160), I got the error:
inventory.get_vars()
Traceback (most recent call last):
File "<input>", line 1, in <module>
inventory.get_vars()
File "<>/.virtualenvs/ansible27/lib/python2.7/site-packages/ansible/inventory/manager.py", line 160, in get_vars
return self._inventory.get_vars(args, kwargs)
AttributeError: 'InventoryData' object has no attribute 'get_vars'
my script
import pprint
pp = pprint.PrettyPrinter(indent=4).pprint
from ansible.parsing.dataloader import DataLoader
from ansible.vars.manager import VariableManager
from ansible.inventory.manager import InventoryManager
loader = DataLoader()
inventory = InventoryManager(loader=loader, sources='inventories/itops-vms.yml')
variable_manager = VariableManager(loader=loader, inventory=inventory)
# shows groups as well
pp(inventory.groups)
# shows dict as well with content
pp(variable_manager.get_vars())
# creates an unhandled exception
inventory.get_vars()
What is the right way to do this?
Python 2.7.15
ansible==2.6.2
OS: macOS High Sierra
The error itself seems to be caused by a bug: the get_vars method of the InventoryManager object calls the get_vars method of the InventoryData object, which is not implemented.
You need to specify the group, for example:
>>> inventory.groups['all'].get_vars()
{u'my_var': u'value'}
You can create a dictionary with that data:
{g: inventory.groups[g].get_vars() for g in inventory.groups}
The above gets only the variables defined in the inventory itself (which is what the question asks about). If you wanted a structure that also includes variables from group_vars, host_vars, etc. (as you indicated in your comment: "I want to get something similar to $ ansible-inventory -i inventories/itops-vms.yml --graph --vars"), you'd need to collect the data from the different sources, just like Ansible does.
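To get the hierarchy the question asks for, rather than the flat dictionary above, one option is to walk the groups recursively starting from 'all'. A minimal sketch, assuming each group exposes name, get_vars() and child_groups as Ansible's Group objects do (check this against your Ansible version); FakeGroup is a hypothetical stand-in so the sketch runs without an inventory file. Like the comprehension above, it only sees variables defined in the inventory itself.

```python
def collect_group_vars(group):
    """Return {'vars': ..., 'children': {...}} for a group and its descendants.

    Assumes the group exposes .name, .get_vars() and .child_groups.
    """
    return {
        "vars": dict(group.get_vars()),
        "children": {
            child.name: collect_group_vars(child) for child in group.child_groups
        },
    }


class FakeGroup:
    """Hypothetical stand-in for ansible.inventory.group.Group."""

    def __init__(self, name, variables=None, children=()):
        self.name = name
        self._vars = variables or {}
        self.child_groups = list(children)

    def get_vars(self):
        return self._vars


web = FakeGroup("web", {"http_port": 80})
all_group = FakeGroup("all", {"my_var": "value"}, [web])
tree = {"all": collect_group_vars(all_group)}
print(tree)
```

With a real inventory the call would be {"all": collect_group_vars(inventory.groups['all'])}. A group with several parents appears once under each parent, which is usually what you want when mirroring the tree.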

PyRal getAttachment

I have a fairly simple use case, but I don't understand the error message I'm receiving.
I'm using the requests and pyral modules. pyral (http://pyral.readthedocs.io/en/latest/interface.html#) is really just a wrapper for Rally's RESTful API. My goal is to get a file (attachment) from a Rally (a CA product) UserStory and store it on the local file system.
For context, here is my environment setup (authenticate to Rally and create an object). I've obviously removed authentication information.
import sys
from pyral import Rally, rallyWorkset
options = [arg for arg in sys.argv[1:] if arg.startswith('--')]
args = [arg for arg in sys.argv[1:] if arg not in options]
server, user, password, apikey, workspace, project = rallyWorkset(options)
rally = Rally(server='rally1.rallydev.com',
              user='**********', password='***********',
              apikey="**************",
              workspace='**************', project='**************',
              server_ping=False)
After that I get a response object for just one user story (see the query for US845); I do this just to simplify the problem.
r = rally.get('UserStory', fetch = True, projectScopeDown=True, query = 'FormattedID = US845')
and then I use the built-in iterator to get the user story from the RallyRESTResponse object.
us = r.next()
From there, it feels like I should be able to easily use the getAttachment() method, which accepts an artifact (us) and a filename (the name of an attachment). I'm able to use getAttachmentNames(us) to return a list of attachment names. The issue arises when I try something like
attachment_names = rally.getAttachmentNames(us) #get attachments for this UserStory
attachment_file = rally.getAttachment(us, attachment_names[0]) #Try to get the first attachment
This returns the following error:
Traceback (most recent call last):
File "<ipython-input-81-a4a342a59c5a>", line 1, in <module>
attachment_file = rally.getAttachment(us, attachment_names[0])
File "C:\Miniconda3\lib\site-packages\pyral\restapi.py", line 1700, in getAttachment
att.Content = base64.decodebytes(att_content.Content) # maybe further txfm to Unicode ?
File "C:\Miniconda3\lib\base64.py", line 552, in decodebytes
_input_type_check(s)
File "C:\Miniconda3\lib\base64.py", line 520, in _input_type_check
raise TypeError(msg) from err
TypeError: expected bytes-like object, not str
I receive a similar error if I try to use
test_obj = rally.getAttachments(us)
Which returns an error like this:
Traceback (most recent call last):
File "<ipython-input-82-06a8cd525177>", line 1, in <module>
rally.getAttachments(us)
File "C:\Miniconda3\lib\site-packages\pyral\restapi.py", line 1721, in getAttachments
attachments = [self.getAttachment(artifact, attachment_name) for attachment_name in attachment_names]
File "C:\Miniconda3\lib\site-packages\pyral\restapi.py", line 1721, in <listcomp>
attachments = [self.getAttachment(artifact, attachment_name) for attachment_name in attachment_names]
File "C:\Miniconda3\lib\site-packages\pyral\restapi.py", line 1700, in getAttachment
att.Content = base64.decodebytes(att_content.Content) # maybe further txfm to Unicode ?
File "C:\Miniconda3\lib\base64.py", line 552, in decodebytes
_input_type_check(s)
File "C:\Miniconda3\lib\base64.py", line 520, in _input_type_check
raise TypeError(msg) from err
TypeError: expected bytes-like object, not str
It seems that I'm fundamentally misunderstanding the parameters that this method requires? Has anyone been able to do this successfully before? For what it's worth, I have no issues using the addAttachment() method with a workflow similar to the above. I've tried converting the filename (string) to UTF-8 with bytes(), but that didn't help.
I've also looked at this example in the pyral source, but I receive exactly the same error when trying to execute it.
https://github.com/klehman-rally/pyral/blob/master/examples/get_attachments.py
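It looks like the issue is in the restapi.py script, in this line:
att.Content = base64.decodebytes(att_content.Content)
In Python 2 the base64 module has no decodebytes function at all, and in Python 3 decodebytes accepts only bytes-like objects, while att_content.Content is a str (hence the TypeError in the question). The available functions are described in the Python documentation for the base64 module ("RFC 3548: Base16, Base32, Base64 Data Encodings").
So the workaround is to replace decodebytes with base64.b64decode in restapi.py, since b64decode accepts an ASCII str as well as bytes. At least, it works for me.
For example, the file is located here on Mac OS X:
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pyral/restapi.py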
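A quick check of the behaviour described above, using only the Python 3 standard library; payload is a made-up stand-in for att_content.Content, which pyral hands over as a str.

```python
import base64

# Made-up base64 text standing in for att_content.Content (a str in Python 3).
payload = base64.b64encode(b"hello attachment").decode("ascii")

decodebytes_failed = False
try:
    base64.decodebytes(payload)  # the call made in pyral's restapi.py
except TypeError as exc:
    decodebytes_failed = True
    print("decodebytes failed:", exc)

# b64decode accepts an ASCII str as well as bytes ...
data = base64.b64decode(payload)
# ... and decodebytes works once the str is encoded to bytes first.
same = base64.decodebytes(payload.encode("ascii"))
print(data == same)  # True
```

Encoding the str before calling decodebytes is an alternative patch to restapi.py if you prefer to keep that function.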
I have used the code below to get all attachments, since getAttachments is not working as expected. It creates a file with the same name as each attachment in the current directory.
import base64
from pyral import Rally, RallyRESTResponse
# server, USER_NAME, PASSWORD, workspace and project are placeholders for your own settings
rally = Rally(server, user=USER_NAME, password=PASSWORD, workspace=workspace, project=project)
criterion = 'FormattedID = US57844'
response = rally.get('HierarchicalRequirement', query=criterion, order="FormattedID",
                     pagesize=200, limit=400, projectScopeDown=True)
artifact = response.next()
context, augments = rally.contextHelper.identifyContext()
for att in artifact.Attachments:
    # Fetch the AttachmentContent object that holds the base64-encoded payload
    resp = rally._getResourceByOID(context, 'AttachmentContent', att.Content.oid, project=None)
    if resp.status_code not in [200, 201, 202]:
        break
    res = RallyRESTResponse(rally.session, context, "AttachmentContent.x", resp, "full", 1)
    if res.errors or res.resultCount != 1:
        print("breaking the for loop")
        break
    att_content = res.next()
    # Decode the base64 payload and write it to a file named after the attachment
    with open(att.Name, 'wb') as output:
        output.write(base64.b64decode(att_content.Content))

boto does not like EMR BootstrapAction parameter

I'm trying to launch an AWS EMR cluster using the boto library, and everything works well.
However, I need to install the required Python libraries, so I tried to add a bootstrap action using boto.emr.bootstrap_action.
But it gives the error below:
Traceback (most recent call last):
File "run_on_emr_cluster.py", line 46, in <module>
steps=[step])
File "/usr/local/lib/python2.7/dist-packages/boto/emr/connection.py", line 552, in run_jobflow
bootstrap_action_args = [self._build_bootstrap_action_args(bootstrap_action) for bootstrap_action in bootstrap_actions]
File "/usr/local/lib/python2.7/dist-packages/boto/emr/connection.py", line 623, in _build_bootstrap_action_args
bootstrap_action_params['ScriptBootstrapAction.Path'] = bootstrap_action.path
AttributeError: 'str' object has no attribute 'path'
Code below;
from boto.emr.connection import EmrConnection
conn = EmrConnection('...', '...')
from boto.emr.step import StreamingStep
step = StreamingStep(name='mapper1',
                     mapper='s3://xxx/mapper1.py',
                     reducer='s3://xxx/reducer1.py',
                     input='s3://xxx/input/',
                     output='s3://xxx/output/')
from boto.emr.bootstrap_action import BootstrapAction
bootstrap_action = BootstrapAction(name='install related packages', path="s3://xxx/bootstrap.sh", bootstrap_action_args=None)
job = conn.run_jobflow(name='emr_test',
                       log_uri='s3://xxx/logs',
                       master_instance_type='m1.small',
                       slave_instance_type='m1.small',
                       num_instances=1,
                       action_on_failure='TERMINATE_JOB_FLOW',
                       keep_alive=False,
                       bootstrap_actions='[bootstrap_action]',
                       steps=[step])
What's the proper way of passing bootstrap arguments?
You are passing the bootstrap_actions argument as a literal string rather than as a list containing the BootstrapAction object you just created. Try this:
job = conn.run_jobflow(name='emr_test',
                       log_uri='s3://xxx/logs',
                       master_instance_type='m1.small',
                       slave_instance_type='m1.small',
                       num_instances=1,
                       action_on_failure='TERMINATE_JOB_FLOW',
                       keep_alive=False,
                       bootstrap_actions=[bootstrap_action],
                       steps=[step])
Notice that the bootstrap_actions argument is different here: it is a list containing the BootstrapAction object, not a string.
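The traceback makes sense once you notice that boto builds the arguments with a list comprehension over bootstrap_actions (line 552 in the traceback). Iterating over the string '[bootstrap_action]' yields single characters, and a character has no path attribute. A small sketch reproducing this; BootstrapAction here is a hypothetical stand-in, not boto's class, and build_paths only mimics the shape of boto's loop.

```python
class BootstrapAction:
    """Hypothetical stand-in for boto.emr.bootstrap_action.BootstrapAction."""

    def __init__(self, name, path, bootstrap_action_args=None):
        self.name = name
        self.path = path
        self.bootstrap_action_args = bootstrap_action_args or []


def build_paths(bootstrap_actions):
    # One entry per action, each reading the .path attribute.
    return [action.path for action in bootstrap_actions]


action = BootstrapAction("install related packages", "s3://xxx/bootstrap.sh")
print(build_paths([action]))  # ['s3://xxx/bootstrap.sh']

error_message = ""
try:
    build_paths('[bootstrap_action]')  # iterates over '[', 'b', 'o', ...
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```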

AttributeError for custom types with mixer

I have stumbled into a pretty interesting bug in klen mixer library for Python.
https://github.com/klen/mixer
This bug occurs whenever you try to set up a model with a column of type sqlalchemy.dialects.postgresql.INET. Trying to blend such a model produces the following traceback:
mixer: ERROR: Traceback (most recent call last):
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 612, in blend
return type_mixer.blend(**values)
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 130, in blend
for name, value in defaults.items()
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 130, in <genexpr>
for name, value in defaults.items()
File "/home/cllamach/PythonProjects/mixer/mixer/mix_types.py", line 220, in gen_value
return type_mixer.gen_field(field)
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 209, in gen_field
return self.gen_value(field.name, field, unique=unique)
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 254, in gen_value
gen = self.get_generator(field, field_name, fake=fake)
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 304, in get_generator
field.scheme, field_name, fake, kwargs=field.params)
File "/home/cllamach/PythonProjects/mixer/mixer/backend/sqlalchemy.py", line 178, in make_generator
stype, field_name=field_name, fake=fake, args=args, kwargs=kwargs)
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 324, in make_generator
fabric = self.__factory.gen_maker(scheme, field_name, fake)
File "/home/cllamach/PythonProjects/mixer/mixer/factory.py", line 157, in gen_maker
if not func and fcls.__bases__:
AttributeError: Mixer (<class 'tests.test_flask.IpAddressUser'>): 'NoneType' object has no attribute '__bases__'
I debugged this error all the way down to a couple of methods in the code. The first method, get_generator, does the following:
if key not in self.__generators:
    self.__generators[key] = self.make_generator(
        field.scheme, field_name, fake, kwargs=field.params)
And here comes the weird part. In this statement, field.scheme has a value, specifically a Column object from SQLAlchemy, but when it is passed down to the make_generator method it arrives as None. So far I have seen no other code between these two methods; I have debugged with ipdb and other tools, and I have tried calling the method manually with ipdb, and the scheme is still passed as None.
I know this may be too specific an issue, but I would like to know whether anyone has encountered this kind of issue before, as this is a first for me.
Mixer is choking on an unknown column type. It stores all the types it knows in GenFactory.types as a dict and calls types.get(column_type), which of course returns None for an unrecognized type. I ran into this because I defined a couple of custom SQLAlchemy types with sqlalchemy.types.TypeDecorator.
To solve this problem, you'll have to monkey-patch your types into Mixer's type system. Here's how I did it:
def _setup_mixer_with_custom_types():
    import arrow
    from mixer._faker import faker
    from mixer.backend.sqlalchemy import (
        GenFactory,
        mixer,
    )
    from myproject.customcolumntypes import (
        IntegerTimestamp,
        UTCDateTimeTimestamp,
    )
    # Build an Arrow object from a fake datetime
    def arrow_generator():
        return arrow.get(faker.date_time())
    # Register the generator directly for each custom column type
    GenFactory.generators[IntegerTimestamp] = arrow_generator
    GenFactory.generators[UTCDateTimeTimestamp] = arrow_generator
    return mixer
mixer = _setup_mixer_with_custom_types()
Note that you don't actually have to touch GenFactory.types because it's just an intermediary step that Mixer skips if it can find your type directly on GenFactory.generators.
In my case, I also had to define a custom generator (to accommodate Arrow), but you may not need to. Mixer uses the fake-factory library to generate fake data, and you can see what they're using by looking at the GenFactory.generators dict.
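To see why an unregistered type ends in "'NoneType' object has no attribute '__bases__'", here is a simplified model of the lookup described above, loosely following the gen_maker line shown in the traceback. It is not Mixer's code: the two small registries, gen_maker and CustomType are hypothetical, and the real GenFactory holds many more entries.

```python
class CustomType:
    """Hypothetical custom column type, standing in for a TypeDecorator subclass."""


# Simplified registries; the real GenFactory holds many more entries.
types = {bool: bool, int: int}
generators = {int: lambda: 42}


def gen_maker(column_type):
    # A type registered directly in `generators` skips the `types` mapping.
    func = generators.get(column_type)
    if func is not None:
        return func
    fcls = types.get(column_type)  # None for an unrecognized type
    # Fall back to a generator registered for a base class; when fcls is None
    # this attribute access raises, as in the question's traceback.
    for base in fcls.__bases__:
        func = generators.get(base)
        if func is not None:
            return func
    return None


print(gen_maker(bool)())  # 42, found through bool's base class int

error_message = ""
try:
    gen_maker(CustomType)
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'NoneType' object has no attribute '__bases__'

# The monkey-patch from the answer: register the custom type directly.
generators[CustomType] = lambda: "custom value"
print(gen_maker(CustomType)())  # custom value
```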
You have to get the column type into GenFactory.generators, which by default only contains some standard types. Instead of monkey-patching, you might subclass GenFactory and then specify your own class upon Mixer generation.
In this case, we'll customize the already subclassed GenFactory and Mixer variants from backend.sqlalchemy:
from mixer.backend.sqlalchemy import Mixer, GenFactory
from customtypes import CustomType # The column type
def get_mixer():
    class CustomFactory(GenFactory):
        # No need to preserve entries, the parent class attribute is
        # automatically extended through GenFactory's metaclass
        generators = {
            CustomType: lambda: 42  # Or any other function
        }
    return Mixer(factory=CustomFactory)
You can use whatever function you like as generator, it just has to return the desired value. Sometimes, directly using something from faker might be enough.
In the same way, you can also customize the other attributes of GenFactory, i.e. fakers and types.
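The comment in the snippet relies on GenFactory's metaclass merging a subclass's generators with the parent's. A minimal sketch of that kind of metaclass, assuming the merge is a plain dict update in which subclass entries win; MergingMeta, BaseFactory and CustomType are hypothetical names, and this is an illustration of the idea rather than Mixer's implementation.

```python
class MergingMeta(type):
    """Merge dict registries with the parents' instead of replacing them."""

    merged_attrs = ("generators", "fakers", "types")

    def __new__(mcs, name, bases, namespace):
        for attr in mcs.merged_attrs:
            combined = {}
            # Start from the parents' (already merged) dicts ...
            for base in reversed(bases):
                combined.update(getattr(base, attr, {}))
            # ... and let the subclass's own entries override them.
            combined.update(namespace.get(attr, {}))
            namespace[attr] = combined
        return super().__new__(mcs, name, bases, namespace)


class CustomType:
    """Hypothetical custom column type."""


class BaseFactory(metaclass=MergingMeta):
    generators = {int: lambda: 0, str: lambda: ""}


class CustomFactory(BaseFactory):
    # Only the new entry is listed; the parent's entries are kept.
    generators = {CustomType: lambda: 42}


print(sorted(t.__name__ for t in CustomFactory.generators))
# ['CustomType', 'int', 'str']
```

Because the merge happens when the class is created, the parent's registry is left untouched, which is the practical advantage over monkey-patching GenFactory.generators globally.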

lmtest/lrtest throws data.frame error when data passed through rpy2

I have an error that occurs only when I call lrtest (from the lmtest package) from a user-defined function via rpy2.
R:
continuous.test <- function(dat) {
  require('lmtest')
  options(warn=-1)
  model <- lm(formula='pheno ~ .', data=dat)
  anova <- lrtest(model,'interaction')
  pval <- anova$"Pr(>Chisq)"[2]
}
When I call this function from the R interpreter, everything runs correctly. However, I receive an error when calling it from the following snippet of Python code. Note that this particular Python file makes many other successful calls to rpy2.
Python:
...
kway_dat = R.DataFrame(dataframe) # this is a valid dataframe, it's used in other calls.
...
R.r("source('/path/to/user/defined/file/perm_test.r')")
continuous_test = R.r['continuous.test']
pval = continuous_test(kway_dat)
Error:
Error in is.data.frame(data) : object 'dat' not found
Traceback (most recent call last):
File "./test_r_.py", line 83, in <module>
pval = continuous_test(kway_dat)
File "/usr/lib/python2.6/site-packages/rpy2-2.2.6dev_20120806-py2.6-linux-x86_64.egg/rpy2/robjects/functions.py", line 82, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/usr/lib/python2.6/site-packages/rpy2-2.2.6dev_20120806-py2.6-linux-x86_64.egg/rpy2/robjects/functions.py", line 34, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in is.data.frame(data) : object 'dat' not found
Troubleshooting:
I have tested the code in R and everything works fine.
I have passed a data frame from Python to R through rpy2 and called is.data.frame(dat) from an R function, and it returns TRUE, so the issue is with lmtest/lrtest combined with rpy2.
Any help would be great. Thanks all!
It would be easier to help with a self-contained example (so one can reproduce exactly what you are experiencing).
A possible answer still: you might want to check that the content of the file /path/to/user/defined/file/perm_test.r is really what you think it is.
I am also adding a stub for a self-contained example:
r_code = """
require('lmtest')
options(warn=-1)
continuous.test <- function(dat) {
  model <- lm(formula='pheno ~ .', data=dat)
  anova <- lmtest::lrtest(model,'interaction')
  pval <- anova$"Pr(>Chisq)"[2]
}
"""
from rpy2.robjects import packages
my_r_pack = packages.SignatureTranslatedAnonymousPackage(r_code, "my_r_pack")
# [build a demo kway_dat here]
my_r_pack.continuous_test(kway_dat)
Answer Found
The issue was lrtest's internal call to update() on the model: once inside lrtest, dat was out of scope. Updating the model manually and using lrtest's alternative form, lrtest(model0, model1), avoids the issue entirely.
Thanks to Achim Zeileis who replied incredibly promptly.
