I'm using an unstable Python library whose API is changing across its various git submodules. Every few weeks, some member or method gets moved or renamed. For example, the expression
vk.rinkront[0].asyncamera.printout()
that worked a few weeks back now has the asyncamera member located at
vk.toplink.enabledata.tx[0].asyncamera[0].print()
I manage to figure out the method's new location by grepping, git diffing, using IPython autocomplete, and generally bumbling my way around. This process is painful, because the repository is heavily abstracted and the member names observable from the Python shell do not necessarily appear in the git submodule code.
Is there a Python routine that performs a graph traversal of the object hierarchy while checking for keywords (e.g. print)?
(If not, I'll hack up some BFS/DFS that checks the child node type before pushing the __dict__, array, dir(), etc. contents onto a queue/stack. But someone must have come across this problem before and come up with a more elegant solution.)
-------EDITS----------
To sum up: is there an existing library such that the following code would work?
import unstable_library
import object_inspecter.search
vk = unstable_library.create() #initializing vk
object_inspecter.search(vk,"print") #searching for lost method
Oy, I feel for you... You should probably just lock down on a version, use it until the API becomes stable, and then upgrade later. That's why complex projects rely on dependency version control.
There is a way, but it's not really standard. The dir() built-in function returns a list of string names of all attributes and methods of an object. With a little bit of wrangling, you could write a script that recursively digs down.
The ultimate issue, though, is that you're going to run into infinite recursion when you try to investigate class members, since object graphs are usually cyclic. You'll need to add smarts to either recognize the pattern or limit the depth of your searching.
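For a single flat level, dir() plus a list comprehension already gets you most of the way; a tiny illustration (math and the "sq" keyword are just stand-ins):

import math

# filter one object's attribute names by keyword - the answer below generalizes this
hits = [attr for attr in dir(math) if "sq" in attr]
print(hits)  # contains 'sqrt' (and 'isqrt' on Python 3.8+)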
You can use a fairly simple graph traversal to do this:
import re
import types

def find_thing(ob, pattern, seen=None, name=None, curDepth=0, maxDepth=99, isDict=True):
    if seen is None:
        seen = set()  # fresh visited-set per top-level call; a mutable default would leak between calls
    if ob is None or id(ob) in seen:
        return []
    seen.add(id(ob))
    name = str(name or getattr(ob, "__name__", str(ob)))
    base_case = check_base_cases(ob, name, pattern)
    if base_case is not None:
        return base_case
    if curDepth >= maxDepth:
        return []
    return recursive_step(ob, pattern, seen=seen, name=name, curDepth=curDepth, isDict=isDict)
Now just define your two steps (check_base_cases and recursive_step)... something like:
def check_base_cases(ob, name, pattern):
    # return a list of matches for leaf values, or None to mean "keep recursing"
    if isinstance(ob, str):
        return [ob] if re.match(pattern, ob) else []
    if isinstance(ob, (int, float)):  # Python 3 has no separate long type
        return [ob] if str(ob) == pattern else []
    if isinstance(ob, (types.FunctionType, types.MethodType, types.BuiltinFunctionType)):
        return [name] if re.match(pattern, name) else []
    return None
def recursive_step(ob, pattern, seen, name, curDepth, isDict=True):
    matches = []
    if isinstance(ob, (list, tuple)):
        for i, sub_ob in enumerate(ob):
            matches.extend(find_thing(sub_ob, pattern, seen=seen,
                                      name='%s[%s]' % (name, i), curDepth=curDepth + 1))
        return matches
    if isinstance(ob, dict):
        for key, item in ob.items():
            if re.match(pattern, str(key)):
                matches.append('%s["%s"]' % (name, key) if isDict else '%s.%s' % (name, key))
            else:
                matches.extend(find_thing(item, pattern, seen=seen,
                                          name='%s["%s"]' % (name, key) if isDict else '%s.%s' % (name, key),
                                          curDepth=curDepth + 1))
        return matches
    # generic object: inspect its public attributes via dir()
    data = {x: getattr(ob, x) for x in dir(ob) if not x.startswith("_")}
    return find_thing(data, pattern, seen=seen, name=name, curDepth=curDepth + 1, isDict=False)
Finally, you can test it like so:
print(find_thing(vk,".*print.*"))
I used the following example:
class vk(object):
    class enabledata:
        class camera:
            class object2:
                def printX(*args):
                    pass
            asynccamera = [object2]
        tx = [camera]
    toplink = {'enabledata': enabledata}
Running the following:
print(find_thing(vk, '.*print.*'))
#['vk.toplink["enabledata"].camera.asynccamera[0].printX']
I'd be very grateful to learn something useful; for now, I've been moving blindly.
So the problem lies in Python's ast.NodeTransformer. I was wondering whether one can add a function to an existing class this way without going mad.
This is how I have proceeded so far:
import ast, inspect, cla # cla is the name of the class to which we want to add a new function

klass = inspect.getsource(cla)
tree = ast.parse(klass)
st = '''def function(): return 1'''
Foo = ast.parse(st)

class AddFunc(ast.NodeTransformer):
    def visit_ClassDef(self, node):
        return node, node.body + Foo.body
        self.generic_visit(node)

inst = AddFunc()
stuff = inst.visit(tree)
# now the trouble begins, at compiling:
co = compile(stuff, filename='<ast>', mode='exec')
# I get TypeError: required field "lineno" missing from stmt
I have tried (unsuccessfully, as you could probably guess) to handle this by using the ast library's helper functions ast.fix_missing_locations() and ast.copy_location(), but in most cases I've ended up guessing, or facing an AttributeError raised on the tuple returned inside the AddFunc class. Does anybody have an idea how to manage this?
kolko's answer is correct: in this instance, you need to return an AST node. Foo in this case is a module, and its body holds the statements constructing the function, which can simply be spliced into the class definition.
It helps to understand that ast.NodeTransformer is very particular:
it dispatches to the visit method named after the specific class, ignoring the class hierarchy
it checks the return value with isinstance(value, ast.AST) and isinstance(value, list); anything else is ignored!
You may also get this error or the closely related "lineno missing from expr" even when you're using ast.fix_missing_locations to add location data.
If this happens, it's because your AST has an element that's not allowed in that place. Use astunparse.dump to carefully examine the structure. Within a module body there can only be statements; within a function def there can only be statements; and only specific places can hold an expression.
To solve this, write out the Python you're expecting to generate, parse it, dump it, and check it against what you're generating.
I found the answer on Twitter:
https://twitter.com/mitsuhiko/status/91169383254200320
mitsuhiko: "TypeError: required field "lineno" missing from stmt" — no, what you actually mean is "tuple is not a statement'. #helpfulpython
You return a tuple here, but you must return an AST node:
class AddFunc(ast.NodeTransformer):
    def visit_ClassDef(self, node):
        node.body += Foo.body
        return node
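For completeness, here is a minimal end-to-end sketch of the corrected flow, using an inline stand-in class instead of the asker's cla module; the ast.fix_missing_locations call supplies the lineno/col_offset fields that the spliced-in nodes lack:

import ast

tree = ast.parse("class Foo:\n    x = 1\n")   # stand-in for inspect.getsource(cla)
addition = ast.parse("def function(self): return 1")

class AddFunc(ast.NodeTransformer):
    def visit_ClassDef(self, node):
        node.body += addition.body  # splice the new function into the class body
        return node                 # return the node itself, not a tuple

tree = AddFunc().visit(tree)
ast.fix_missing_locations(tree)     # fill in location info for the new nodes
namespace = {}
exec(compile(tree, filename='<ast>', mode='exec'), namespace)
print(namespace['Foo']().function())  # -> 1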
In Python I seem to frequently need to make dicts/lists within dicts/lists within dicts/lists, and then access these structures in complex if/elif/else trees. Is there some way to make a shorthand for accessing a certain level of this data structure, to make the code more concise?
This is an example line of code now:
schema[exp][node]['properties']['date'] = i.partition('(')[2].rpartition(')')[0].strip()
which is followed by a whole heap of other lines starting with "schema[exp][node]['properties']['foo']"
What I would like is something like:
reference_maker(schema[exp][node]['properties']['date'], schema_props)
schema_props['date'] = i.partition('(')[2].rpartition(')')[0].strip()
but I can't even really think where to begin.
If you're not worried about it changing:
schema_props = schema[exp][node]['properties']
schema_props['date'] = ...
But if you want the reference to hang around and auto-update:
schema_props = lambda: schema[exp][node]['properties']
schema_props()['date'] = ...
node = node + 1
# this now uses the next node
schema_props()['date'] = ...
Or without the lambda:
def schema_props():
    return schema[exp][node]['properties']

schema_props()['date'] = ...
node = node + 1
# this now uses the next node
schema_props()['date'] = ...
Not sure I understand but what’s the problem with the following?
schema_props = schema[exp][node]['properties']
schema_props['date'] = i.partition('(')[2].rpartition(')')[0].strip()
Of course, you have to be careful that schema_props always points to a still-valid entry in your dict. I.e., once you manually reset schema[exp][node]['properties'] itself, your schema_props reference will not update the original dict anymore.
For more elaborate indirection handling, you could build your own collection types which may then always keep a reference to the base dict. (See also: http://docs.python.org/2/library/collections.html#collections-abstract-base-classes)
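As a sketch of that last idea (all names here are made up for illustration): a small MutableMapping that re-resolves the key path into the base dict on every access, so it keeps tracking schema[exp][node]['properties'] even as exp or node change:

from collections.abc import MutableMapping

class LiveView(MutableMapping):
    """A dict-like view that re-resolves a key path into a root dict on every access."""
    def __init__(self, root, path_fn):
        self.root = root        # the base nested dict
        self.path_fn = path_fn  # callable returning the current key path
    def _target(self):
        d = self.root
        for key in self.path_fn():
            d = d[key]
        return d
    def __getitem__(self, key): return self._target()[key]
    def __setitem__(self, key, value): self._target()[key] = value
    def __delitem__(self, key): del self._target()[key]
    def __iter__(self): return iter(self._target())
    def __len__(self): return len(self._target())

# usage: schema_props = LiveView(schema, lambda: (exp, node, 'properties'))
#        schema_props['date'] = ...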
I have a recursive generator function that creates a tree of ChainMap contexts, and finally does something with the context at the end of the tree. It looks like this (parent_context is a ChainMap, hierarchy is a list):
def recursive_generator(parent_context, hierarchy):
    next_level = hierarchy[0]
    next_level_contexts = get_contexts(next_level)  # returns a list of dicts
    for context in next_level_contexts:
        # note: new_child().update(...) would return None; pass the dict to new_child instead
        child_context = parent_context.new_child(context)
        if next_level == hierarchy[-1]:
            yield do_something(**child_context)
        else:
            yield from recursive_generator(child_context, hierarchy[1:])
Now I'd like to flag one level of the hierarchy such that the operation suspends after finishing that level, serializes the state to disk to be picked up later where it left off. Is there a way to do this without losing the elegance of the recursion?
I know that you can't pickle generators, so I thought about refactoring into an iterator object. But I think yield from is necessary for the recursion here (edit: at least without some tedious management of the stack), so I think it needs to be a generator, no? Is there a workaround for this?
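(For reference, the failure is immediate and easy to reproduce with a throwaway generator:)

import pickle

def gen():
    yield 1

try:
    pickle.dumps(gen())
except TypeError as exc:
    print(exc)  # e.g. "cannot pickle 'generator' object" on CPython 3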
you seem to be exploring a tree with DFS. so you could construct the tree in memory and make the DFS explicit. then just store the tree and restart at the left-most node (i think?).
that's effectively "tedious management of the stack", but it has a nice picture that would help implement it (at least for me, looking at your problem as DFS of a tree makes the implementation seem fairly obvious - before i thought of it like that, it seemed quite complicated - but i may be missing something).
sorry if that's obvious and insufficient...
[edit]
class Inner:
    def __init__(self, parent_context, hierarchy):
        self.children = []
        next_level = hierarchy[0]
        next_level_contexts = get_contexts(next_level)
        for context in next_level_contexts:
            child_context = parent_context.new_child(context)
            if next_level == hierarchy[-1]:
                self.children.append(Leaf(child_context))
            else:
                self.children.append(Inner(child_context, hierarchy[1:]))

    def do_something(self):
        # this will do something on the left-most leaf
        self.children[0].do_something()

    def prune(self):
        # this will remove the left-most leaf
        if isinstance(self.children[0], Leaf):
            self.children.pop(0)
        else:
            self.children[0].prune()
            if not self.children[0]:  # drop the subtree once it is empty
                self.children.pop(0)

    def __bool__(self):
        return bool(self.children)

class Leaf:
    def __init__(self, context):
        self.context = context

    def do_something(self):
        do_something(**self.context)
the code above hasn't been tested. i ended up using classes for nodes as a tuple seemed too confusing. you create the tree by creating the parent node. then you can "do something" by calling do_something, after which you will want to remove the "done" leaf with prune:
tree = Inner(initial_context, initial_hierarchy)
while tree:
    tree.do_something()
    tree.prune()
i am pretty sure it will contain bugs, but hopefully it's enough to show the idea. sorry i can't do more but i need to repot plants....
ps it's amusing that you can write code with generators, but didn't know what DFS was. you might enjoy reading the "algorithm design manual" - it's part textbook and part reference, and it doesn't treat you like an idiot (i too have no formal education in computer science, and i thought it was a good book).
[edited to change to left-most first, which is what you had before, i think]
and alko has a good point...
Here's what I ended up doing:
from collections import ChainMap

def recursive_generator(parent_context, hierarchy):
    next_level = hierarchy[0]
    next_level_contexts = get_contexts(next_level)  # returns a list of dicts
    for context in next_level_contexts:
        child_context = parent_context.new_child(context)
        if next_level == hierarchy[-1]:
            yield child_context
        else:
            yield from recursive_generator(child_context, hierarchy[1:])

def traverse_tree(hierarchy):
    return list(recursive_generator(ChainMap(), hierarchy))

def do_things(contexts, start, stop):
    for context in contexts[start:stop]:
        yield do_something(**context)
Then I can pickle the list returned by traverse_tree and later load it and run it in pieces with do_things. This is all in a class with a lot more going on of course, but this gets to the gist of it.
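A quick sketch of how the pieces fit together (the file name and slice bounds are arbitrary, and this assumes the contexts hold only picklable values):

import pickle

contexts = traverse_tree(hierarchy)        # expand the whole tree up front
with open("contexts.pkl", "wb") as f:
    pickle.dump(contexts, f)               # ChainMaps of plain dicts pickle fine

# later, possibly in a fresh process:
with open("contexts.pkl", "rb") as f:
    contexts = pickle.load(f)
results = list(do_things(contexts, start=100, stop=200))  # resume where you left off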
I had a program that read in a text file and pulled out the necessary variables for serialization into Turtle format and storage in an RDF graph. The code I had was crude, and I was advised to separate it into functions. As I am new to Python, I had no idea how to do this. Below are some of the functions of the program.
I am getting confused as to when parameters should be passed into the functions and when they should be initialized with self. Here are some of my functions. If I could get an explanation as to what I am doing wrong, that would be great.
#!/usr/bin/env python
from rdflib import URIRef, Graph
from StringIO import StringIO
import subprocess as sub

class Wordnet():
    def __init__(self, graph):
        self.graph = Graph()

    def process_file(self, file):
        file = open("new_2.txt", "r")
        return file

    def line_for_loop(self, file):
        for line in file:
            self.split_pointer_part()
            self.split_word_part()
            self.split_gloss_part()
            self.process_lex_filenum()
            self.process_synset_offset()
            # +more functions............
            self.print_graph()

    def split_pointer_part(self, before_at, after_at, line):
        before_at, after_at = line.split('#', 1)
        return before_at, after_at

    def get_num_words(self, word_part, num_words):
        """ 1 as default, may want 0 as an invalid case """
        """ do if else statements on l3 variable """
        if word_part[3] == '0a':
            num_words = 10
        else:
            num_words = int(word_part[3])
        return num_words

    def get_pointers_list(self, pointers, after_at, num_pointers, pointerList):
        pointers = after_at.split()[0:0 + 4 * num_pointers:4]
        pointerList = iter(pointers)
        return pointerList

    # ............code to create triples for graph...............

    def print_graph(self):
        print graph.serialize(format='nt')

def main():
    wordnet = Wordnet()
    my_file = wordnet.process_file()
    wordnet.line_for_loop(my_file)

if __name__ == "__main__":
    main()
Your question is mainly about what object-oriented programming is. I will try to explain quickly, but I recommend reading a proper tutorial on it, like:
http://www.voidspace.org.uk/python/articles/OOP.shtml
http://net.tutsplus.com/tutorials/python-tutorials/python-from-scratch-object-oriented-programming/
and/or http://www.tutorialspoint.com/python/python_classes_objects.htm
When you create a class and instantiate it (with mywordnet = Wordnet(somegraph)), you can reuse the mywordnet instance many times. Each variable you set on self in Wordnet is stored in that instance. So, for instance, self.graph is always available if you call any method of mywordnet. If you didn't store it in self.graph, you would need to specify it as a parameter in each method (function) that requires it, which would be tedious if all of those method calls require the same graph anyway.
So to look at it another way: everything you set with self can be seen as a sort of configuration for that specific instance of Wordnet. It influences the Wordnet behaviour. You could, for instance, have two Wordnet instances, each instantiated with a different graph but otherwise identical. That way you can choose which graph to print to depending on which Wordnet instance you use, while everything else stays the same.
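One caveat about the posted code: the __init__ shown in the question ignores its graph argument and always builds a fresh Graph(). For the two-instances-two-graphs idea to work, it has to store what it is given; a minimal correction (reusing the rdflib Graph import from the question):

class Wordnet():
    def __init__(self, graph):
        self.graph = graph  # keep the caller's graph instead of discarding it

wordnet_a = Wordnet(Graph())
wordnet_b = Wordnet(Graph())
# wordnet_a.graph and wordnet_b.graph are now independent graphs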
I hope this helps you out a little.
First, I suggest you figure out the basic functional decomposition on its own - don't worry about writing a class at all.
For example,
def split_pointer_part(self, before_at, after_at, line):
    before_at, after_at = line.split('#', 1)
    return before_at, after_at
doesn't touch any instance variables (it never refers to self), so it can just be a standalone function.
It also exhibits a peculiarity I see in your other code: you pass two arguments (before_at, after_at) but never use their values. If the caller doesn't already know what they are, why pass them in?
So, a free function should probably look like:
def split_pointer_part(line):
    """get tuple (before #, after #)"""
    return line.split('#', 1)
If you want to put this function in your class scope (so it doesn't pollute the top-level namespace, or just because it's a logical grouping), you still don't need to pass self if it isn't used. You can make it a static method:
@staticmethod
def split_pointer_part(line):
    """get tuple (before #, after #)"""
    return line.split('#', 1)
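Either way, callers then use it without any instance state (the input string is just an illustration):

before, after = Wordnet.split_pointer_part("dog#n#1")
# before == 'dog', after == 'n#1'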
One thing that would be very helpful for you is a good visual debugger. There's a nice free one for Python called Winpdb. There are also excellent debuggers in the commercial products IntelliJ IDEA/PyCharm, Komodo IDE, WingIDE, and Visual Studio (with the Python Tools add-in). Probably a few others too.
I highly recommend setting up one of these debuggers and running your code under it. It will let you step through your code line by line and see what happens with all your variables and objects.
You may find people who tell you that real programmers don't need or shouldn't use debuggers. Don't listen to them: a good debugger is one of the very best tools to help you learn a new language or to get familiar with a piece of code.
I have been doing a lot of searching, and I don't think I've really found what I have been looking for. I will try my best to explain what I am trying to do, and hopefully there is a simple solution, and I'll be glad to have learned something new.
This is ultimately what I am trying to accomplish: using nosetests, decorate some test cases with the attribute selector plugin, then execute the test cases that match given criteria via the -a switch at command-line invocation. The attribute values of the executed tests are then stored in an external location. The command-line call I'm using is like below:
nosetests \testpath\ -a attribute='someValue'
I have also created a customized nosetests plugin which stores the test cases' attributes and writes them to an external location. The idea is that I can select a batch of tests, and by storing the attributes of these tests, I can filter on these results later for reporting purposes. I am accessing the method attributes in my plugin by overriding the "wantMethod" method with code similar to the following:
def set_attribs(self, method, attribute):
    if hasattr(method, attribute):
        if not self.method_attributes.has_key(method.__name__):
            self.method_attributes[method.__name__] = {}
        self.method_attributes[method.__name__][attribute] = getattr(method, attribute)

def wantMethod(self, method):
    self.set_attribs(method, "attribute1")
    self.set_attribs(method, "attribute2")
    pass
I have this working for pretty much all the tests, except for one case where the test is using the "yield" keyword. What is happening is that the generated methods execute fine, but the method attributes are empty for each of the generated functions.
Below is an example of what I am trying to achieve. The test below retrieves a list of values and, for each of those values, yields the results from another function:
@attr(attribute1='someValue', attribute2='anotherValue')
def sample_test_generator(self):
    for (key, value) in _input_dictionary.items():
        f = partial(self._do_test, key, value)
        f.attribute1 = 'someValue'
        yield (lambda x: f(), key)

def _do_test(self, input1, input2):
    # Some code
From what I have read, and think I understand, when yield is called, it creates a new callable function which then gets executed. I have been trying to figure out how to retain the attribute values from my sample_test_generator method, but I have not been successful. I thought I could create a partial method and then add the attribute to it, but no luck. The tests execute without any errors; it just seems that from my plugin's perspective the method attributes aren't present, so they don't get recorded.
I realize this a pretty involved question, but I wanted to make sure that the context for what I am trying to achieve is clear. I have been trying to find information that could help me for this particular case, but I feel like I've reached a stumbling block now, so I would really like to ask the experts for some advice.
Thanks.
** Update **
After reading through the feedback and playing around some more, it looks like if I modify the lambda expression, it achieves what I am looking for. In fact, I didn't even need to create the partial function:
def sample_test_generator(self):
    for (key, value) in _input_dictionary.items():
        yield (lambda: self._do_test)
The only downside to this approach is that the test name will not change. As I played around more, it looks like in nosetests, when a test generator is used, the test name in the results actually changes based on the parameters it yields. The same thing was happening when I was using the lambda expression with a parameter.
For example:
Using a lambda expression with a parameter:
yield (lambda x: self._do_test, "value1")
In the nosetests plugin, when you access the test case name, it is displayed as "sample_test_generator(value1)".
Using a lambda expression without a parameter:
yield (lambda: self._do_test)
The test case name in this case would be "sample_test_generator". In my example above, if there are multiple values in the dictionary, the yield call occurs multiple times, yet the test name always remains "sample_test_generator". This is not as bad as getting unique test names but being unable to store the attribute values at all. I will keep playing around, but thanks for the feedback so far!
EDIT
I forgot to come back and provide my final update on how I was able to get this to work in the end. There was a little confusion on my part at first, and after I looked through it some more, I figured out that it had to do with how the tests are recognized:
My original implementation assumed that every test that gets picked up for execution goes through the "wantMethod" call from the plugin's base class. This is not true when "yield" is used to generate the test, because at this point, the test method has already passed the "wantMethod" call.
However, once the test case is generated through the "yield" call, it does go through the "startTest" call from the plugin base class, and this is where I was finally able to store the attribute successfully.
So in a nutshell, my test execution order looked like this:
nose -> wantMethod(method_name) -> yield -> startTest(yielded_test_name)
In my override of the startTest method, I have the following:
def startTest(self, test):
    # If a test is spawned by using the 'yield' keyword, the test name is the parent test
    # name with an appended '(' character.
    # Example: if the parent test is "smoke_test", the generated test is "smoke_test('input')".
    test_name = str(test)  # derive the name from nose's test wrapper (missing in my original paste)
    parent_test_name = test_name.split('(')[0]
    if self.method_attributes.has_key(test_name):
        self._test_attrib = self.method_attributes[test_name]
    elif self.method_attributes.has_key(parent_test_name):
        self._test_attrib = self.method_attributes[parent_test_name]
    else:
        self._test_attrib = None
With this implementation, along with my override of wantMethod, each test spawned by the parent test case also inherits attributes from the parent method, which is what I needed.
Again, thanks to all who sent replies. This was quite a learning experience.
Would this fix your name issue?
def _actual_test(x, y):
    assert x == y

def test_yield():
    _actual_test.description = "test_yield_%s_%s" % (5, 5)
    yield _actual_test, 5, 5
    _actual_test.description = "test_yield_%s_%s" % (4, 8)  # fail
    yield _actual_test, 4, 8
    _actual_test.description = "test_yield_%s_%s" % (2, 2)
    yield _actual_test, 2, 2
The rename survives @attr too.
does this work?
@attr(attribute1='someValue', attribute2='anotherValue')
def sample_test_generator(self):
    def get_f(f, key):
        return lambda x: f(), key

    for (key, value) in _input_dictionary.items():
        f = partial(self._do_test, key, value)
        f.attribute1 = 'someValue'
        yield get_f(f, key)

def _do_test(self, input1, input2):
    # Some code
The problem is that the local variables change after you create the lambda.
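This late-binding behaviour is easy to demonstrate in isolation, and a default argument is the usual way to freeze the current value:

fs = [lambda: i for i in range(3)]
print([f() for f in fs])            # [2, 2, 2] - every closure sees the final i

fs = [lambda i=i: i for i in range(3)]
print([f() for f in fs])            # [0, 1, 2] - the default freezes each value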