Retrieving attributes of elements in Python using libxml2 - python

I'm writing my first Python script using libxml2 to retrieve data from an XML file. The file looks like the following:
<myGroups1>
<myGrpContents name="ABC" help="abc_help">
<myGrpKeyword name="abc1" help="help1"/>
<myGrpKeyword name="abc2" help="help2"/>
<myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>
There are many similar groups in the file. My intention is to get the attributes "name" and "help" and write them to another file in a different format. But with the following code I'm only able to get as far as the myGroups1 element.
doc = libxml2.parseFile(cmmfilename)
root2 = doc.children
child = root2.children
while child is not None:
    if not child.isBlankNode():
        if child.type == "element":
            print "\t Element ", child.name, " with ", child.lsCountNode(), "child(ren)"
            print "\t and content ", repr(child.content)
    child = child.next
How can I iterate deeper to the elements and get the attributes? Any help in this would be deeply appreciated.

The question "python. how to get attribute value with libxml2" probably has the kind of answer you're looking for.
When I'm faced with a problem like this and would rather not read the docs for some reason, I find it helpful to explore the library interactively. I suggest you use an interactive Python REPL (I like bpython) to try this. Here's the session in which I came up with a solution:
>>> import libxml2
>>> xml = """<myGroups1>
... <myGrpContents name="ABC" help="abc_help">
... <myGrpKeyword name="abc1" help="help1"/>
... <myGrpKeyword name="abc2" help="help2"/>
... <myGrpKeyword name="abc3" help="help3"/>
... </myGrpContents>
... </myGroups1>"""
>>> tree = libxml2.parseMemory(xml, len(xml)) # I found this method by looking through `dir(libxml2)`
>>> tree.children
<xmlNode (myGroups1) object at 0x10aba33b0>
>>> a = tree.children
>>> a
<xmlNode (myGroups1) object at 0x10a919ea8>
>>> a.children
<xmlNode (text) object at 0x10ab24368>
>>> a.properties
>>> b = a.children
>>> b.children
>>> b.properties
>>> b.next
<xmlNode (myGrpContents) object at 0x10a921290>
>>> b.next.content
'\n \n \n \n'
>>> b.next.next.content
'\n'
>>> b.next.next.next.content
Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'content'
>>> b.next.next.next
>>> b.next.properties
<xmlAttr (name) object at 0x10aba32d8>
>>> b.next.properties.children
<xmlNode (text) object at 0x10ab40f38>
>>> b.next.properties.children.content
'ABC'
>>> b.next.properties.children.name
'text'
>>> b.next.properties.next
<xmlAttr (help) object at 0x10ab40fc8>
>>> b.next.properties.next.name
'help'
>>> b.next.properties.next.content
'abc_help'
>>> list(tree)
[<xmlDoc (None) object at 0x10a921248>, <xmlNode (myGroups1) object at 0x10aba32d8>, <xmlNode (text) object at 0x10aba3878>, <xmlNode (myGrpContents) object at 0x10aba3d88>, <xmlNode (text) object at 0x10aba3950>, <xmlNode (myGrpKeyword) object at 0x10aba3758>, <xmlNode (text) object at 0x10aba3320>, <xmlNode (myGrpKeyword) object at 0x10aba3f38>, <xmlNode (text) object at 0x10aba3560>, <xmlNode (myGrpKeyword) object at 0x10aba3998>, <xmlNode (text) object at 0x10aba33f8>, <xmlNode (text) object at 0x10aba38c0>]
>>> good = list(tree)[5]
>>> good.properties
<xmlAttr (name) object at 0x10aba35f0>
>>> good.prop('name')
'abc1'
>>> good.prop('help')
'help1'
>>> good.prop('whoops')
>>> good.hasProp('whoops')
>>> good.hasProp('name')
<xmlAttr (name) object at 0x10ab40ef0>
>>> good.hasProp('name').content
'abc1'
>>> for thing in tree:
...     if thing.hasProp('name') and thing.hasProp('help'):
...         print thing.prop('name'), thing.prop('help')
...
...
...
ABC abc_help
abc1 help1
abc2 help2
abc3 help3
Because it's bpython, I cheated a little bit - there's a rewind key, so I mistyped more than this, but otherwise this is pretty close.

I haven't used libxml2, but I looked into this case and found the following.
Try either:
if child.type == "element":
    if child.name == "myGrpKeyword":
        print child.prop('name')
        print child.prop('help')
or
if child.type == "element":
    if child.name == "myGrpKeyword":
        for property in child.properties:
            if property.type == 'attribute':
                # check which attribute this is
                if property.name == 'name':
                    print property.content
                if property.name == 'help':
                    print property.content
Refer to http://ukchill.com/technology/getting-started-with-libxml2-and-python-part-1/
Update:
Try a recursive function:
def explore(child):
    # walk this node and all of its following siblings
    while child is not None:
        if not child.isBlankNode():
            if child.type == "element":
                print child.prop('name')
                print child.prop('help')
                # descend into the element's own children
                explore(child.children)
        child = child.next
doc = libxml2.parseFile(cmmfilename)
root2 = doc.children
child = root2.children
explore(child)
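For comparison, here is a minimal sketch of the same recursive walk written with the standard library's xml.etree.ElementTree instead of libxml2. This is a swapped-in parser, not the API used in the answers above. The function name collect_attrs is made up for illustration, and the sample XML is copied from the question.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
XML = """<myGroups1>
<myGrpContents name="ABC" help="abc_help">
<myGrpKeyword name="abc1" help="help1"/>
<myGrpKeyword name="abc2" help="help2"/>
<myGrpKeyword name="abc3" help="help3"/>
</myGrpContents>
</myGroups1>"""


def collect_attrs(element, found=None):
    """Recursively gather (tag, name, help) for elements carrying both attributes.

    collect_attrs is a hypothetical helper, not part of any library.
    """
    if found is None:
        found = []
    name = element.get("name")        # None when the attribute is absent
    help_text = element.get("help")
    if name is not None and help_text is not None:
        found.append((element.tag, name, help_text))
    for child in element:             # iterating an Element yields its direct children
        collect_attrs(child, found)
    return found


pairs = collect_attrs(ET.fromstring(XML))
for tag, name, help_text in pairs:
    print(tag, name, help_text)
```

Unlike the libxml2 tree, ElementTree does not expose whitespace as separate text nodes, so no blank-node check is needed in this variant.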

Related

What is the difference between getattr() and calling the attribute?

Here is some code I have written. I presumed both versions would give the same answer, but they don't! How are they different?
from collections import deque
d = deque()
for _ in range(int(input())):
    method, *n = input().split()
    getattr(d, method)(*n)
print(*d)
and
from collections import deque
d = deque()
for _ in range(int(input())):
    method, *n = input().split()
    d.method(*n)
print(*d)
getattr(...) gets a named attribute from an object; getattr(x, 'y') is equivalent to x.y.
d.method(*n), on the other hand, looks up an attribute literally named "method" on the deque object, which raises AttributeError: 'collections.deque' object has no attribute 'method'.
>>> from collections import deque
>>> d = deque()
>>> dir(d) # removed dunder methods for readability
[
"append",
"appendleft",
"clear",
"copy",
"count",
"extend",
"extendleft",
"index",
"insert",
"maxlen",
"pop",
"popleft",
"remove",
"reverse",
"rotate",
]
>>> method = "insert"
>>> d.method
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'collections.deque' object has no attribute 'method'
>>> insert_method = getattr(d, method)
>>> insert_method
<built-in method insert of collections.deque object at 0x000001E5638AC0>
>>> help(insert_method)
Help on built-in function insert:
insert(...) method of collections.deque instance
D.insert(index, object) -- insert object before index
>>> insert_method(0, 1)
>>> d
deque([1])
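The lookup-by-name pattern can be sketched as a small function. This is only an illustration: run_commands is a hypothetical helper, and it assumes every argument in a command is an integer, which the original question does not state.

```python
from collections import deque


def run_commands(lines):
    """Apply 'method arg ...' commands to a fresh deque, looking up each method by name."""
    d = deque()
    for line in lines:
        method_name, *args = line.split()
        # getattr(d, "append") is the same bound method as d.append,
        # but the name is chosen at run time from the input string.
        method = getattr(d, method_name)
        # Assumption for this sketch: all arguments are integers.
        method(*(int(a) for a in args))
    return d


result = run_commands(["append 1", "append 2", "appendleft 3", "pop"])
print(*result)
```

A misspelled command name would still raise AttributeError here, since getattr is called without a default value.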

AttributeError: 'str' object has no attribute 'decode' Having this error on the second line of my code

This is my code:
if __name__ == "__main__":
    key = "0123456789abcdef0123456789abcdef".decode('hex')  # this line raises the error
    plain_1 = "1weqweqd"
    plain_2 = "23444444"
    plain_3 = "dddd2225"
    print(plain_1)
    print(plain_2)
    print(plain_3)
    cipher = Present(key)
Output
AttributeError: 'str' object has no attribute 'decode'
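This happens because you are calling decode on a str. In Python 3 only bytes objects can be decoded; a str can only be encoded. Encode the string first with key.encode() (or write a bytes literal such as b"foo") to get a bytes object. Note that encode() yields the bytes of the text itself, not the bytes that the hex digits stand for.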
>>> foo = "adarcfdzer"
>>> type(foo)
<class 'str'>
>>> foo = foo.encode()
>>> type(foo)
<class 'bytes'>
>>> foo = foo.decode()
>>> type(foo)
<class 'str'>
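The "...".decode('hex') call in the question is a Python 2 idiom for turning a hex string into raw bytes. In Python 3 the built-in bytes.fromhex does that conversion. A minimal sketch using the key string from the question (the Present class is not shown in the question, so it is left out here):

```python
hex_key = "0123456789abcdef0123456789abcdef"

# Python 3 counterpart of the Python 2 idiom hex_key.decode('hex')
key = bytes.fromhex(hex_key)

print(type(key))   # <class 'bytes'>
print(len(key))    # 16 bytes, i.e. a 128-bit key
print(key.hex())   # converts back to the original hex string
```

By contrast, hex_key.encode() would produce 32 bytes (one per hex character), which is probably not what a cipher expecting a 128-bit key wants.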

How do I clear a Python threading.local object?

How can I clear all the attributes off an instance of Python's threading.local()?
You can clear its underlying __dict__:
>>> import threading
>>> l = threading.local()
>>> l
<thread._local object at 0x7fe8d5af5fb0>
>>> l.ok = "yes"
>>> l.__dict__
{'ok': 'yes'}
>>> l.__dict__.clear()
>>> l.__dict__
{}
>>> l.ok
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'thread._local' object has no attribute 'ok'
Accessing the __dict__ directly is specifically called out as a valid way to interact with the local object in the _threading_local module documentation:
Thread-local objects support the management of thread-local data.
If you have data that you want to be local to a thread, simply create
a thread-local object and use its attributes:
>>> mydata = local()
>>> mydata.number = 42
>>> mydata.number
42
You can also access the local-object's dictionary:
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
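One detail worth illustrating: the __dict__ of a thread-local object belongs to the thread that accesses it, so clearing it affects only the calling thread. The following is a small sketch of that behaviour in Python 3; the variable names and the two events used for synchronisation are made up for the example.

```python
import threading

data = threading.local()
data.ok = "yes"                      # attribute set in the main thread

seen_in_worker = {}
worker_ready = threading.Event()
main_cleared = threading.Event()


def worker():
    data.ok = "worker value"         # stored separately for this thread
    worker_ready.set()
    main_cleared.wait()              # wait until the main thread has cleared its own data
    seen_in_worker["ok"] = getattr(data, "ok", None)


t = threading.Thread(target=worker)
t.start()
worker_ready.wait()

data.__dict__.clear()                # clears only the calling (main) thread's attributes
main_cleared.set()
t.join()

print(hasattr(data, "ok"))           # False
print(seen_in_worker["ok"])          # worker value
```

To clear the attributes held by another thread, that thread would have to run the clear() call itself.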

In Python, if I write type("myclass", (foo,), {"bar()": barbar}), how can I access the attribute "bar()"?

>>> FooChild = type("FooChild", (Foo,), {"echobar()":echo_bar})
>>> FooChild().echobar()
Traceback (most recent call last):
File "<pyshell#214>", line 1, in <module>
FooChild().echobar()
AttributeError: 'FooChild' object has no attribute 'echobar'
>>> FooChild().echobar
Traceback (most recent call last):
File "<pyshell#215>", line 1, in <module>
FooChild().echobar
AttributeError: 'FooChild' object has no attribute 'echobar'
>>> hasattr(FooChild, "echobar()")
True
>>> FooChild().echobar()()
Traceback (most recent call last):
File "<pyshell#217>", line 1, in <module>
FooChild().echobar()()
AttributeError: 'FooChild' object has no attribute 'echobar'
Remove those parentheses:
FooChild = type("FooChild", (Foo,), {"echobar":echo_bar})
The name of a function does not include the parentheses. Appending them calls the function. Without the parentheses you have a reference to the function itself (e.g. for passing a function to things like sort or map).
echobar() is an invalid identifier in Python, so you can't access it directly, i.e. using the dot syntax:
>>> FooChild = type("FooChild", (Foo,), {"echobar()":10})
Use __dict__ or getattr:
>>> FooChild.__dict__['echobar()']
10
>>> getattr(FooChild, 'echobar()')
10
If you want to use it as an attribute, then simply get rid of the parentheses:
>>> FooChild = type("FooChild", (Foo,), {"echobar":10})
>>> FooChild.echobar
10
If you want to use it as a method, then:
>>> def echobar(self):return 10
>>> FooChild = type("FooChild", (Foo,), {'echobar':echobar})
>>> FooChild().echobar()
10
If you really intend to have a function named echobar() in your class, the only means of accessing it is getattr:
class Foo(object):pass
echo_bar =lambda *a: 'bar'
FooChild = type("FooChild", (Foo,), {"echobar()":echo_bar})
print getattr(FooChild(), 'echobar()')()
# bar
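The answers above can be condensed into one Python 3 sketch that shows both spellings side by side. The class names Odd and Plain are made up for the example; Foo and echo_bar follow the names used in the thread.

```python
class Foo:
    pass


def echo_bar(self):
    return "bar"


# A key that is not a valid identifier is stored on the class,
# but dot syntax cannot reach it; getattr can.
Odd = type("Odd", (Foo,), {"echobar()": echo_bar})
via_getattr = getattr(Odd(), "echobar()")()

# A plain identifier as the key gives an ordinary method.
Plain = type("Plain", (Foo,), {"echobar": echo_bar})
via_dot = Plain().echobar()

print(via_getattr, via_dot)
print("echobar()".isidentifier(), "echobar".isidentifier())
```

str.isidentifier() is a quick way to check whether a name can be used with dot syntax before passing it as a key to type().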

Gremlin / Bulbflow: how to get an integer result out of execute()

Sorry if this question is too stupid to ask... I am a newbie on Python+Django+Bulbs+Neo4j.
I am attempting --without success-- to get an integer produced by g.gremlin.execute() while using Python+Django shell, as detailed below.
First, the query in Neo4j's Gremlin console:
gremlin> g.v(2).out
==> v[6]
==> v[4]
==> v[8]
==> v[7]
gremlin> g.v(2).out.count()
==> 4
What I intend to do is get this result in the Python+Django shell and assign it to a variable, as tried below:
>>> from bulbs.neo4jserver import Graph
>>> from bulbs.model import Node,Relationship
>>> g = Graph()
>>> sc = " g.v(vertex_id).out.count()"
>>> params = dict(vertex_id = 2)
>>> val = g.gremlin.execute(sc,params)
>>> val
<bulbs.neo4jserver.client.Neo4jResponse object at 0x243cfd0>
I can't get any further from here.
>>> val.one()
<bulbs.neo4jserver.client.Neo4jResult object at 0x2446b90>
>>> val.one().data
>>> val.one().results
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'Neo4jResult' object has no attribute 'results'
Could anyone please tell me what I am doing wrong?
Many thanks!
Raw result data is going to be in the Result object's raw attribute:
>>> from bulbs.neo4jserver import Graph
>>> from bulbs.model import Node,Relationship
>>> g = Graph()
>>> script = " g.v(vertex_id).out.count()"
>>> params = dict(vertex_id = 2)
>>> resp = g.gremlin.execute(script,params)
>>> result = resp.one()
>>> result.raw
NOTE: result.data returns an element's property data, so it will be empty unless you are returning a vertex or edge, i.e. a node or relationship in Neo4j parlance.
See...
https://github.com/espeed/bulbs/blob/master/bulbs/neo4jserver/client.py#L60
https://github.com/espeed/bulbs/blob/master/bulbs/neo4jserver/client.py#L88
https://github.com/espeed/bulbs/blob/master/bulbs/neo4jserver/client.py#L167
To see what Neo4j Server returned in the server response, you can output the Response headers and content:
>>> from bulbs.neo4jserver import Graph
>>> from bulbs.model import Node,Relationship
>>> g = Graph()
>>> script = "g.v(vertex_id).out.count()"
>>> params = dict(vertex_id = 2)
>>> resp = g.gremlin.execute(script,params)
>>> resp.headers
>>> resp.content
And if you set the loglevel to DEBUG in Config, you'll be able to see what's being sent to the server on each request. When DEBUG is enabled, Bulbs also sets the raw attribute on the Response object (not to be confused with the raw attribute that is always set on the Result object). Response.raw will contain the raw server response:
>>> from bulbs.neo4jserver import Graph, DEBUG
>>> from bulbs.model import Node,Relationship
>>> g = Graph()
>>> g.config.set_logger(DEBUG)
>>> script = " g.v(vertex_id).out.count()"
>>> params = dict(vertex_id = 2)
>>> resp = g.gremlin.execute(script,params)
>>> resp.raw
See...
https://github.com/espeed/bulbs/blob/master/bulbs/config.py#L70
https://github.com/espeed/bulbs/blob/master/bulbs/neo4jserver/client.py#L221
http://bulbflow.com/quickstart/#enable-debugging
To turn off DEBUG, set the loglevel back to ERROR:
>>> from bulbs.neo4jserver import ERROR
>>> g.config.set_logger(ERROR)
See...
http://bulbflow.com/quickstart/#disable-debugging
