Python parse standalone-full.xml from Wildfly

Python parse standalone-full.xml from Wildfly - python

I'm trying to parse the standalone-full.xml from Wildfly 8.1 Final with python to extract some information as datasources.
The example XML below.
<?xml version="1.0" ?>
<server xmlns="urn:jboss:domain:2.1">
<profile>
<subsystem xmlns="urn:jboss:domain:datasources:2.0">
<datasources>
<datasource jndi-name="java:jboss/datasources/JNDI" pool-name="JNDI" enabled="true">
<connection-url>jdbc:oracle:thin:#//HOST</connection-url>
<driver>ojdbc6</driver>
<pool>
<min-pool-size>50</min-pool-size>
<max-pool-size>100</max-pool-size>
</pool>
<security>
<user-name>USER</user-name>
<password>USER</password>
</security>
<validation>
<valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleValidConnectionChecker"/>
<validate-on-match>false</validate-on-match>
<background-validation>true</background-validation>
<background-validation-millis>10000</background-validation-millis>
<exception-sorter class-name="org.jboss.resource.adapter.jdbc.vendor.OracleExceptionSorter"/>
</validation>
</datasource>
<drivers>
<driver name="h2" module="com.h2database.h2">
<xa-datasource-class>org.h2.jdbcx.JdbcDataSource</xa-datasource-class>
</driver>
<driver name="ojdbc6" module="oracle.ojdbc">
<xa-datasource-class>oracle.ojdbc.xa.client.OracleXADataSource</xa-datasource-class>
</driver>
</drivers>
</datasources>
</subsystem>
</profile>
EDIT: How can I get deeper in the tree?
I tried something like this:
In[16]: from lxml import etree
In[18]: xml = etree.parse('standalone-full.xml')
In[21]: root = xml.getroot()
In[28]: children = root[0].getchildren()
In[31]: children[0]
Out[31]: <Element {urn:jboss:domain:datasources:2.0}subsystem at 0x4bef208>
In[32]: datasources = children[0]
In[33]: datasources.getchildren()
Out[33]: [<Element {urn:jboss:domain:datasources:2.0}datasources at 0x4befa48>]

Your question is rather unspecific, but as far as I can see from the regex you posted, you want to grab the text values of the connection-url, user-name, and password nodes under each datasource node that has a pool-name attribute with a value of JNDI. Here is one possibility of doing that (tested under Python 2.7):
import xml.etree.cElementTree as ET
ns = {'ds': 'urn:jboss:domain:datasources:2.0'}
root = ET.parse('standalone-full.xml').getroot()
children = root.findall(".//ds:datasource[#pool-name='JNDI']", ns)
for child in children:
print child.find("ds:connection-url", ns).text
security = child.find("ds:security", ns)
print security.find("ds:user-name", ns).text
print security.find("ds:password", ns).text

You could use Augeas to parse it:
$ augtool -At "Xml.lns incl $PWD/standalone-full.xml"
augtool> get //standalone-full.xml//datasource//password/#text
//standalone-full.xml//datasource//password/#text = USER
Just use the python-augeas bindings with Python:
import augeas
a = augeas.Augeas(flags=augeas.Augeas.NO_MODL_AUTOLOAD)
a.transform("Xml", "/home/raphink/bas/augeas/standalone-full.xml")
a.load()
v = a.get("//standalone-full.xml//datasource//password/#text")

I've solved my problem with regex which is a bad idea but it works.
import re
data = "standalone-full.xml"
regex_result = re.findall(r'.*:domain:datasources[\S\s]*?pool-name="JNDI"[\S\s]*?connection-url>.*' +
'#//(.*)<.*[\S\s]*?user-name>(.*)<.*\s*<password>(.*)<', data, re.M)

Related

Iterating through xml file

I am trying to get all surnames from xml file, but if I am trying to use find, It throws an exception
TypeError: 'NoneType' object is not iterable
This is my code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for elem in root:
for subelem in elem:
for subsubelem in subelem.find('surname'):
print(subsubelem.text)
When I remove the find('surname') from code, It returning all texts from subsubelements.
This is xml:
<?xml version="1.0" encoding="UTF-8"?>
<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:id>1</pp:id>
<pp:customers>
<pp:customer>
<pp:name>John</pp:name>
<pp:surname>Walker</pp:surname>
<pp:adress>
<pp:street>Walker street</pp:street>
<pp:number>1/1</pp:number>
<pp:state>England</pp:state>
</pp:adress>
<pp:created>2021-03-08Z</pp:created>
</pp:customer>
<pp:customer>
<pp:name>Michael</pp:name>
<pp:surname>Jordan</pp:surname>
<pp:adress>
<pp:street>Jordan street</pp:street>
<pp:number>28</pp:number>
<pp:state>USA</pp:state>
</pp:adress>
<pp:created>2021-03-09Z</pp:created>
</pp:customer>
</pp:customers>
</pp:card>
How should I fix it?

Not really a python person, but should the "find" statement include the "pp:" in its search, such as,
find('pp:surname')
Neither the opening nor closing tags actually match "surname".

Use the namespace when you call findall
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:id>1</pp:id>
<pp:customers>
<pp:customer>
<pp:name>John</pp:name>
<pp:surname>Walker</pp:surname>
<pp:adress>
<pp:street>Walker street</pp:street>
<pp:number>1/1</pp:number>
<pp:state>England</pp:state>
</pp:adress>
<pp:created>2021-03-08Z</pp:created>
</pp:customer>
<pp:customer>
<pp:name>Michael</pp:name>
<pp:surname>Jordan</pp:surname>
<pp:adress>
<pp:street>Jordan street</pp:street>
<pp:number>28</pp:number>
<pp:state>USA</pp:state>
</pp:adress>
<pp:created>2021-03-09Z</pp:created>
</pp:customer>
</pp:customers>
</pp:card>'''
ns = {'pp': 'http://xmlns.page.com/path/subpath'}
root = ET.fromstring(xml)
names = [sn.text for sn in root.findall('.//pp:surname', ns)]
print(names)
output
['Walker', 'Jordan']

lxml non-recursive full tag

Given the following xml:
<node a='1' b='1'>
<subnode x='25'/>
</node>
I would like to extract the tagname and all attributes for the first node, i.e., the verbatim code:
<node a='1' b='1'>
without the subnode.
For example in Python, tostring returns too much:
from lxml import etree
root = etree.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
print(etree.tostring(root))
returns
b'<node a="1" b="1"><subnode x="25">some text</subnode></node>'
The following gives the desired result, but is much too verbose:
tag = root.tag
for att, val in root.attrib.items():
tag += ' '+att+'="'+val+'"'
tag = '<'+tag+'>'
print(tag)
result:
<node a="1" b="1">
What is an easier (and guaranteed attribute order preserving) way of doing this?

You can remove all of the subnodes.
from lxml import etree
root = etree.fromstring("<node a='1' b='1'><subnode x='25'>some text</subnode></node>")
for subnode in root.xpath("//subnode"):
subnode.getparent().remove(subnode)
etree.tostring(root) # '<node a="1" b="1"/>'
Alternatively, you can use a simple regex. Order is guaranteed.
import re
res = re.search('<(.*?)>', etree.tostring(root))
res.group(1) # "node a='1' b='1'"

Parsing XML with ElementTree in Python

I have XML like this:
<parameter>
<name>ec_num</name>
<value>none</value>
<units/>
<url/>
<id>2455</id>
<m_date>2008-11-29 13:15:14</m_date>
<user_id>24</user_id>
<user_name>registry</user_name>
</parameter>
<parameter>
<name>swisspro</name>
<value>Q8H6N2</value>
<units/>
I want to parse the XML and extract the <value> entry which is just below the <name> entry marked 'swisspro'. I.e. I want to parse and extract the 'Q8H6N2' value.
How would I do this using ElementTree?

It would by much easier to do via lxml, but here' a solution using ElementTree library:
import xml.etree.ElementTree as ET
data = """<parameters>
<parameter>
<name>ec_num</name>
<value>none</value>
<units/>
<url/>
<id>2455</id>
<m_date>2008-11-29 13:15:14</m_date>
<user_id>24</user_id>
<user_name>registry</user_name>
</parameter>
<parameter>
<name>swisspro</name>
<value>Q8H6N2</value>
<units/>
</parameter>
</parameters>"""
tree = ET.fromstring(data)
for parameter in tree.iter(tag='parameter'):
name = parameter.find('name')
if name is not None and name.text == 'swisspro':
print parameter.find('value').text
break
prints:
Q8H6N2
The idea is pretty simple: iterate over all parameter tags, check the value of the name tag and if it is equal to swisspro, get the value element.
Hope that helps.

Here is an example:
xml file
<span style="font-size:13px;"><?xml version="1.0" encoding="utf-8"?>
<root>
<person age="18">
<name>hzj</name>
<sex>man</sex>
</person>
<person age="19" des="hello">
<name>kiki</name>
<sex>female</sex>
</person>
</root></span>
parse method
from xml.etree import ElementTree
def print_node(node):
'''print basic info'''
print "=============================================="
print "node.attrib:%s" % node.attrib
if node.attrib.has_key("age") > 0 :
print "node.attrib['age']:%s" % node.attrib['age']
print "node.tag:%s" % node.tag
print "node.text:%s" % node.text
def read_xml(text):
'''read xml file'''
# root = ElementTree.parse(r"D:/test.xml") #first method
root = ElementTree.fromstring(text) #second method
# get element
# 1 by getiterator
lst_node = root.getiterator("person")
for node in lst_node:
print_node(node)
# 2 by getchildren
lst_node_child = lst_node[0].getchildren()[0]
print_node(lst_node_child)
# 3 by .find
node_find = root.find('person')
print_node(node_find)
#4. by findall
node_findall = root.findall("person/name")[1]
print_node(node_findall)
if __name__ == '__main__':
read_xml(open("test.xml").read())

Reading Maven Pom xml in Python

I have a pom file that has the following defined:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.welsh</groupId>
<artifactId>my-site</artifactId>
<version>1.0.0</version>
<packaging>pom</packaging>
<profiles>
<profile>
<build>
<plugins>
<plugin>
<groupId>org.welsh.utils</groupId>
<artifactId>site-tool</artifactId>
<version>1.0</version>
<executions>
<execution>
<configuration>
<mappings>
<property>
<name>homepage</name>
<value>/content/homepage</value>
</property>
<property>
<name>assets</name>
<value>/content/assets</value>
</property>
</mappings>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>
And I am looking to build a dictionary off the name & value elements under property under the mappings element.
So what I'm trying to figure out how to get all possible mappings elements (Incase of multiple build profiles) so I can get all property elements under it and from reading about Supported XPath syntax the following should print out all possible text/value elements:
import xml.etree.ElementTree as xml
pomFile = xml.parse('pom.xml')
root = pomFile.getroot()
for mapping in root.findall('*/mappings'):
for prop in mapping.findall('.//property'):
logging.info(prop.find('name').text + " => " + prop.find('value').text)
Which is returning nothing. I tried just printing out all the mappings elements and get:
>>> print root.findall('*/mappings')
[]
And when I print out the everything from root I get:
>>> print root.findall('*')
[<Element '{http://maven.apache.org/POM/4.0.0}modelVersion' at 0x10b38bd50>, <Element '{http://maven.apache.org/POM/4.0.0}groupId' at 0x10b38bd90>, <Element '{http://maven.apache.org/POM/4.0.0}artifactId' at 0x10b38bf10>, <Element '{http://maven.apache.org/POM/4.0.0}version' at 0x10b3900d0>, <Element '{http://maven.apache.org/POM/4.0.0}packaging' at 0x10b390110>, <Element '{http://maven.apache.org/POM/4.0.0}name' at 0x10b390150>, <Element '{http://maven.apache.org/POM/4.0.0}properties' at 0x10b390190>, <Element '{http://maven.apache.org/POM/4.0.0}build' at 0x10b390310>, <Element '{http://maven.apache.org/POM/4.0.0}profiles' at 0x10b390390>]
Which made me try to print:
>>> print root.findall('*/{http://maven.apache.org/POM/4.0.0}mappings')
[]
But that's not working either.
Any suggestions would be great.
Thanks,

The main issues of the code in the question are
that it doesn't specify namespaces, and
that it uses */ instead of // which only matches direct children.
As you can see at the top of the XML file, Maven uses the namespace http://maven.apache.org/POM/4.0.0. The attribute xmlns in the root node defines the default namespace. The attribute xmlns:xsi defines a namespace that is only used for xsi:schemaLocation.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
To specify tags like profile in methods like find, you have to specify the namespace as well. For example, you could write the following to find all profile-tags.
import xml.etree as xml
pom = xml.parse('pom.xml')
for profile in pom.findall('//{http://maven.apache.org/POM/4.0.0}profile'):
print(repr(profile))
Also note that I'm using //. Using */ would have the same result for your specific xml file above. However, it would not work for other tags like mappings. Since * represents only one level, */child can be expanded to parent/tag or xyz/tag but not to xyz/parent/tag.
Now, you should be able to come up with something like this to find all mappings:
pom = xml.parse('pom.xml')
map = {}
for mapping in pom.findall('//{http://maven.apache.org/POM/4.0.0}mappings'
'/{http://maven.apache.org/POM/4.0.0}property'):
name = mapping.find('{http://maven.apache.org/POM/4.0.0}name').text
value = mapping.find('{http://maven.apache.org/POM/4.0.0}value').text
map[name] = value
Specifying the namespaces like this is quite verbose. To make it easier to read, you can define a namespace map and pass it as second argument to find and findall:
# ...
nsmap = {'m': 'http://maven.apache.org/POM/4.0.0'}
for mapping in pom.findall('//m:mappings/m:property', nsmap):
name = mapping.find('m:name', nsmap).text
value = mapping.find('m:value', nsmap).text
map[name] = value

Ok, found out that when I remove maven stuff from the project element so its just <project> I can do this:
for mapping in root.findall('*//mappings'):
logging.info(mapping)
for prop in mapping.findall('./property'):
logging.info(prop.find('name').text + " => " + prop.find('value').text)
Which would result in:
INFO:root:<Element 'mappings' at 0x10d72d350>
INFO:root:homepage => /content/homepage
INFO:root:assets => /content/assets
However, if I leave the Maven stuff in at the top I can do this:
for mapping in root.findall('*//{http://maven.apache.org/POM/4.0.0}mappings'):
logging.info(mapping)
for prop in mapping.findall('./{http://maven.apache.org/POM/4.0.0}property'):
logging.info(prop.find('{http://maven.apache.org/POM/4.0.0}name').text + " => " + prop.find('{http://maven.apache.org/POM/4.0.0}value').text)
Which results in:
INFO:root:<Element '{http://maven.apache.org/POM/4.0.0}mappings' at 0x10aa7f310>
INFO:root:homepage => /content/homepage
INFO:root:assets => /content/assets
However, I'd love to be able to figure out how to avoid having to account for the maven stuff since it locks me into this one format.
EDIT:
Ok, I managed to get something a bit more verbose:
import xml.etree.ElementTree as xml
def getMappingsNode(node, nodeName):
if node.findall('*'):
for n in node.findall('*'):
if nodeName in n.tag:
return n
else:
return getMappingsNode(n, nodeName)
def getMappings(rootNode):
mappingsNode = getMappingsNode(rootNode, 'mappings')
mapping = {}
for prop in mappingsNode.findall('*'):
key = ''
val = ''
for child in prop.findall('*'):
if 'name' in child.tag:
key = child.text
if 'value' in child.tag:
val = child.text
if val and key:
mapping[key] = val
return mapping
pomFile = xml.parse('pom.xml')
root = pomFile.getroot()
mappings = getMappings(root)
print mappings

Accessing XMLNS attribute with Python Elementree?

How can one access NS attributes through using ElementTree?
With the following:
<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" book="1" category="ABS" date="2009-12-22">
When I try to root.get('xmlns') I get back None, Category and Date are fine, Any help appreciated..

I think element.tag is what you're looking for. Note that your example is missing a trailing slash, so it's unbalanced and won't parse. I've added one in my example.
>>> from xml.etree import ElementTree as ET
>>> data = '''<data xmlns="http://www.foo.net/a"
... xmlns:a="http://www.foo.net/a"
... book="1" category="ABS" date="2009-12-22"/>'''
>>> element = ET.fromstring(data)
>>> element
<Element {http://www.foo.net/a}data at 1013b74d0>
>>> element.tag
'{http://www.foo.net/a}data'
>>> element.attrib
{'category': 'ABS', 'date': '2009-12-22', 'book': '1'}
If you just want to know the xmlns URI, you can split it out with a function like:
def tag_uri_and_name(elem):
if elem.tag[0] == "{":
uri, ignore, tag = elem.tag[1:].partition("}")
else:
uri = None
tag = elem.tag
return uri, tag
For much more on namespaces and qualified names in ElementTree, see effbot's examples.

Look at the effbot namespaces documentation/examples; specifically the parse_map function. It shows you how to add an *ns_map* attribute to each element which contains the prefix/URI mapping that applies to that specific element.
However, that adds the ns_map attribute to all the elements. For my needs, I found I wanted a global map of all the namespaces used to make element look up easier and not hardcoded.
Here's what I came up with:
import elementtree.ElementTree as ET
def parse_and_get_ns(file):
events = "start", "start-ns"
root = None
ns = {}
for event, elem in ET.iterparse(file, events):
if event == "start-ns":
if elem[0] in ns and ns[elem[0]] != elem[1]:
# NOTE: It is perfectly valid to have the same prefix refer
# to different URI namespaces in different parts of the
# document. This exception serves as a reminder that this
# solution is not robust. Use at your own peril.
raise KeyError("Duplicate prefix with different URI found.")
ns[elem[0]] = "{%s}" % elem[1]
elif event == "start":
if root is None:
root = elem
return ET.ElementTree(root), ns
With this you can parse an xml file and obtain a dict with the namespace mappings. So, if you have an xml file like the following ("my.xml"):
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"\
>
<feed>
<item>
<title>Foo</title>
<dc:creator>Joe McGroin</dc:creator>
<description>etc...</description>
</item>
</feed>
</rss>
You will be able to use the xml namepaces and get info for elements like dc:creator:
>>> tree, ns = parse_and_get_ns("my.xml")
>>> ns
{u'content': '{http://purl.org/rss/1.0/modules/content/}',
u'dc': '{http://purl.org/dc/elements/1.1/}'}
>>> item = tree.find("/feed/item")
>>> item.findtext(ns['dc']+"creator")
'Joe McGroin'

Try this:
import xml.etree.ElementTree as ET
import re
import sys
with open(sys.argv[1]) as f:
root = ET.fromstring(f.read())
xmlns = ''
m = re.search('{.*}', root.tag)
if m:
xmlns = m.group(0)
print(root.find(xmlns + 'the_tag_you_want').text)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python parse standalone-full.xml from Wildfly - python

I've solved my problem with regex which is a bad idea but it works. import re data = "standalone-full.xml" regex_result = re.findall(r'.:domain:datasources[\S\s]?pool-name="JNDI"[\S\s]?connection-url>.' + '#//(.)<.[\S\s]?user-name>(.)<.\s<password>(.*)<', data, re.M)

Related

Iterating through xml file

lxml non-recursive full tag

Parsing XML with ElementTree in Python

Reading Maven Pom xml in Python

Accessing XMLNS attribute with Python Elementree?

Categories

Resources

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python parse standalone-full.xml from Wildfly - python

I've solved my problem with regex which is a bad idea but it works. import re data = "standalone-full.xml" regex_result = re.findall(r'.*:domain:datasources[\S\s]*?pool-name="JNDI"[\S\s]*?connection-url>.*' + '#//(.*)<.*[\S\s]*?user-name>(.*)<.*\s*<password>(.*)<', data, re.M)

Related

Iterating through xml file

lxml non-recursive full tag

Parsing XML with ElementTree in Python

Reading Maven Pom xml in Python

Accessing XMLNS attribute with Python Elementree?

Categories

Resources

I've solved my problem with regex which is a bad idea but it works. import re data = "standalone-full.xml" regex_result = re.findall(r'.:domain:datasources[\S\s]?pool-name="JNDI"[\S\s]?connection-url>.' + '#//(.)<.[\S\s]?user-name>(.)<.\s<password>(.*)<', data, re.M)