How can I get an attribute value - Python

I use BS4 to parse an .xml file. I want to get the numeric value of the res attribute, but I get None.
How can I do it?
Source XML:
<digitizer id="1" integrated="true" csrmusttouch="falsehardprox="true"
physidcsrs="false" pnpid="49154" kind="MULTI_TOUCH" maxcsrs="10">
<monitor left="0" top="0" right="1920" bottom="1080" />
<properties>
<property name="x" logmin="0" logmax="16383" res="621.7457275" unit="cm" hidusage="0x00010030" guid="{598A6A8F-52C0-4BA0-93AF-AF357411A561}" />
<property name="y" logmin="0" logmax="16383" res="983.9639893" unit="cm" hidusage="0x00010031" guid="{B53F9F75-04E0-4498-A7EE-C30DBB5A9011}" />
<property name="status" logmin="0" logmax="15" res="0" unit="DEFAULT" hidusage="0x000d0042, 0x000d003c, 0x000d0044" guid="{6E0E07BF-AFE7-4CF7-87D1-AF6446208418}" />
<property name="time" logmin="0" logmax="2147483647" res="1" unit="DEFAULT" guid="{436510C5-FED3-45D1-8B76-71D3EA7A829D}" />
<property name="contactid" logmin="0" logmax="31" res="1.861861944" unit="cm" hidusage="0x000d0051" guid="{02585B91-049B-4750-9615-DF8948AB3C9C}" />
Python Code
a = data_xml.find('digitizer',id="1")
b = a.find('properties')
print(b.get('res'))
Result
None

I have taken your data as an HTML string:
html="""<digitizer id="1" integrated="true" csrmusttouch="falsehardprox="true"
physidcsrs="false" pnpid="49154" kind="MULTI_TOUCH" maxcsrs="10">
<monitor left="0" top="0" right="1920" bottom="1080" />
<properties>
<property name="x" logmin="0" logmax="16383" res="621.7457275" unit="cm" hidusage="0x00010030" guid="{598A6A8F-52C0-4BA0-93AF-AF357411A561}" />
<property name="y" logmin="0" logmax="16383" res="983.9639893" unit="cm" hidusage="0x00010031" guid="{B53F9F75-04E0-4498-A7EE-C30DBB5A9011}" />
<property name="status" logmin="0" logmax="15" res="0" unit="DEFAULT" hidusage="0x000d0042, 0x000d003c, 0x000d0044" guid="{6E0E07BF-AFE7-4CF7-87D1-AF6446208418}" />
<property name="time" logmin="0" logmax="2147483647" res="1" unit="DEFAULT" guid="{436510C5-FED3-45D1-8B76-71D3EA7A829D}" />
<property name="contactid" logmin="0" logmax="31" res="1.861861944" unit="cm" hidusage="0x000d0051" guid="{02585B91-049B-4750-9615-DF8948AB3C9C}" />"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
Code:
You can find all the property tags and then read the res value of each one:
a = soup.find('digitizer', attrs={"id": "1"})
properties = a.find_all("property")
res_lst = [i['res'] for i in properties]
print(res_lst)
Output:
['621.7457275', '983.9639893', '0', '1', '1.861861944']

Your XML seems to be malformed. After reformatting it:
<digitizer id="1" integrated="true" csrmusttouch="" falsehardprox="true" physidcsrs="false" pnpid="49154" kind="MULTI_TOUCH" maxcsrs="10">
<monitor left="0" top="0" right="1920" bottom="1080"/>
<properties>
<property name="x" logmin="0" logmax="16383" res="621.7457275" unit="cm" hidusage="0x00010030" guid="{598A6A8F-52C0-4BA0-93AF-AF357411A561}" />
<property name="y" logmin="0" logmax="16383" res="983.9639893" unit="cm" hidusage="0x00010031" guid="{B53F9F75-04E0-4498-A7EE-C30DBB5A9011}" />
<property name="status" logmin="0" logmax="15" res="0" unit="DEFAULT" hidusage="0x000d0042, 0x000d003c, 0x000d0044" guid="{6E0E07BF-AFE7-4CF7-87D1-AF6446208418}" />
<property name="time" logmin="0" logmax="2147483647" res="1" unit="DEFAULT" guid="{436510C5-FED3-45D1-8B76-71D3EA7A829D}" />
<property name="contactid" logmin="0" logmax="31" res="1.861861944" unit="cm" hidusage="0x000d0051" guid="{02585B91-049B-4750-9615-DF8948AB3C9C}" />
You can easily parse it like this:
from bs4 import BeautifulSoup
with open('data.xml') as raw_results:
    results = BeautifulSoup(raw_results, 'lxml')
for element in results.find_all("properties"):
    for property_tag in element.find_all("property"):
        print(property_tag['res'])
Output:
621.7457275
983.9639893
0
1
1.861861944
You can find more info about parsing attribute values from xml in the tutorial where the code is from.
Edit: Note that I slightly modified the code to fit your question.
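For comparison, here is a standard-library sketch of the same idea using xml.etree.ElementTree. It assumes a trimmed, well-formed copy of the XML; the SAMPLE string and the names missing and res_values are illustrative and not from the original post. It also shows why the original attempt printed None: the res attribute lives on each property element, not on properties.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed sample (an assumption; the real file has more attributes).
SAMPLE = """
<digitizer id="1" kind="MULTI_TOUCH">
  <properties>
    <property name="x" res="621.7457275" unit="cm" />
    <property name="y" res="983.9639893" unit="cm" />
    <property name="status" res="0" unit="DEFAULT" />
  </properties>
</digitizer>
"""

root = ET.fromstring(SAMPLE)  # root is the <digitizer> element
properties = root.find("properties")

# <properties> itself has no "res" attribute, so .get() returns None.
missing = properties.get("res")

# Each <property> child carries its own "res" attribute (as a string).
res_values = [prop.get("res") for prop in properties.findall("property")]

print(missing)
print(res_values)
```

The values come back as strings; wrap them in float() if numbers are needed.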

Related

Delete everything in a file after the last appearance of a string

I want to write a program that looks through files, finds every incomplete file (one without </Module> at the end), prints the last abnumber found in that file, and then deletes every line from there on (including the line with that abnumber).
My file looks like this:
<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
<object id="1238" name="name2" abnumber="4">
<item name="item8" value="something12:
<item name="item9" value="233" />
At the end it should look like this:
<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
with 4 printed.
I started with something like this, but I feel like I am doing everything wrong:
import os
Mainfile = 'path'
for filename in os.listdir(Mainfile):
    lines = filename.readlines()
    if not "</Module>" in lines:
        with open(filename, 'r+', encoding="utf-8") as file:
            line_list = list(file)
            line_list.reverse()
            for line in line_list:
                if line.find('absno') != -1:
                    print(line)
You can use the re module to get your result:
<object([\s\S]*?)<\/object> matches each complete <object ...> ... </object> block
abnumber=\"([0-9.]+) captures the abnumber of the incomplete tag
<Module.*|<object(?:[\s\S]*?)<\/object> matches the parts that make up the well-formed XML data
import re
data = """<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
<object id="1238" name="name2" abnumber="4">
<item name="item8" value="something12:
<item name="item9" value="233" />"""
# Remove every complete <object>...</object> block; what remains holds the incomplete tag.
invalid_XML_Tag = re.sub(r"<object([\s\S]*?)<\/object>", '', data)
abnumber_value = re.findall(r'abnumber="([0-9.]+)', invalid_XML_Tag)
print("abnumber of invalid tag => {0}".format(abnumber_value))
# Keep the <Module ...> line and every complete <object> block.
correct_xml_format = re.findall(r"<Module.*|<object(?:[\s\S]*?)<\/object>", data)
print("".join(correct_xml_format))
Output:
abnumber of invalid tag => ['4']
<Module bs="Mainfile_1"><object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object><object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object><object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
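The regex approach above can also be wrapped into a helper that works on a file's text and keeps the original line breaks. This is only a sketch under stated assumptions: the function name drop_incomplete_tail and the shortened sample are made up for illustration, and a file counts as complete when it contains </Module>.

```python
import re

def drop_incomplete_tail(text):
    """Return (cleaned_text, abnumber); abnumber is None if nothing was cut."""
    if "</Module>" in text:
        return text, None
    # Look for an <object ...> that opens after the last complete </object>.
    last_close = text.rfind("</object>")
    search_from = last_close + len("</object>") if last_close != -1 else 0
    dangling = text.find("<object", search_from)
    if dangling == -1:
        return text, None
    # The dangling opening tag carries the abnumber to report.
    match = re.search(r'abnumber="([0-9.]+)"', text[dangling:])
    kept = text[:dangling].rstrip() + "\n"
    return kept, (match.group(1) if match else None)

# Shortened sample in the shape of the question's file (hypothetical data).
sample = (
    '<Module bs="Mainfile_1">\n'
    '<object id="1000" name="namex" abnumber="1">\n'
    '<item name="item0" value="100" />\n'
    '</object>\n'
    '<object id="1238" name="name2" abnumber="4">\n'
    '<item name="item8" value="something12:\n'
)
cleaned, number = drop_incomplete_tail(sample)
print(number)
print(cleaned, end="")
```

To apply it to real files, read each file's contents, call the helper, and write cleaned back only when number is not None.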

How to match a string to mapping xml tag using Python?

I would like to map a value from an XML file.
<Country>
<number no="2008" info="update">
<detail name="man1" class="A1\X4">
<string name="ruth" />
<string name="amy" />
</detail>
<detail name="man2" class="A2">
<string name="lisa" />
<string name="graham" />
</detail>
</number>
</Country>
I need to get the value of no in <number no="2008" by matching on class="A1\X4".
I tried it this way:
stringno = 'A1'
for family in ReadXML.findall('number/detail[@class="{}"]/..'.format(stringno)):
    name = family.get('no')
    print(name)
It only works if stringno = "A1\X4", but I need it to match when stringno = 'A1'. Is there a matching function in Python that solves this problem, something like -like or -contains?
Thank you for the information.
Hi, what about iterating over the elements instead?
Full code
import xml.etree.ElementTree as ET
tree = ET.parse('myXml.xml')
root = tree.getroot()
stringno = 'A1'
for family in root.findall('number'):
    for elem in family:
        # Default to '' so elements without a class attribute do not raise a TypeError.
        if stringno in elem.get('class', ''):
            print('no: {}, name: {}, class: {}'.format(family.get('no'), elem.get('name'), elem.get('class')))
Input
myXml.xml
<Country>
<number no="2008" info="update">
<detail name="man1" class="A1\X4">
<string name="ruth" />
<string name="amy" />
</detail>
<detail name="man2" class="A2">
<string name="lisa" />
<string name="graham" />
</detail>
</number>
<number no="2009" info="update">
<detail name="man1" class="A1\X5">
<string name="ruth" />
<string name="amy" />
</detail>
<detail name="man2" class="A3">
<string name="lisa" />
<string name="graham" />
</detail>
</number>
</Country>
Output
no: 2008, name: man1, class: A1\X4
no: 2009, name: man1, class: A1\X5
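The loop is needed because the XPath subset in xml.etree.ElementTree only supports exact attribute comparisons and has no contains() function. A minimal sketch that wraps the same idea in a reusable helper follows; the function name numbers_with_class and the shortened sample are illustrative, not from the original answer.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the raw string keeps the backslash in class="A1\X4" literal.
SAMPLE = r"""
<Country>
  <number no="2008" info="update">
    <detail name="man1" class="A1\X4" />
    <detail name="man2" class="A2" />
  </number>
  <number no="2009" info="update">
    <detail name="man1" class="A3" />
  </number>
</Country>
"""

def numbers_with_class(root, needle):
    """Return the 'no' of every <number> with a <detail> whose class contains needle."""
    matches = []
    for number in root.findall("number"):
        classes = (detail.get("class", "") for detail in number.findall("detail"))
        if any(needle in cls for cls in classes):
            matches.append(number.get("no"))
    return matches

root = ET.fromstring(SAMPLE)
print(numbers_with_class(root, "A1"))
```

A plain substring test would also match a class such as A10; if only a prefix should match, cls.startswith(needle) is a stricter alternative.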

How to remove only the parent element and not its child elements in Python?

A similar question to the one in JavaScript
I have an XML document and want to comment out just the parent tag, without its children,
as in the example below:
<object id="12">
<process name="Developer">
<appdef>
<attributes>
<attribute name="X">
<ProcessValue datatype="number" value="15" />
</attribute>
<attribute name="Y">
<ProcessValue datatype="number" value="59" />
</attribute>
</attributes>
</appdef>
</process>
</object>
and comment out just the <object> tags:
<!--<object id="12">-->
<process name="Developer">
<appdef>
<attributes>
<attribute name="X">
<ProcessValue datatype="number" value="15" />
</attribute>
<attribute name="Y">
<ProcessValue datatype="number" value="59" />
</attribute>
</attributes>
</appdef>
</process>
<!--</object>-->
I have code that comments out the tag, but it also comments out all of its children.
Thank you very much, I appreciate any help.
Because of some confusion, I am attaching the whole code:
from xml.dom import minidom
xml = """\
<bpr:release xmlns:bpr="http://www.blueprism.co.uk/product/release">
<object id="0e694daf-836e-44a9-816a-9b8127abb7b2" name="Developer 2
ex" xmlns="http://www.blueprism.co.uk/product/process">
<process name="Developer 2 ex" version="1.0" bpversion="5.0.33.0"
narrative="BO for automation the HTML page
" type="object"
runmode="Exclusive">
<appdef>
<attributes>
<attribute name="X">
<ProcessValue datatype="number" value="15" />
</attribute>
<attribute name="Y">
<ProcessValue datatype="number" value="59" />
</attribute>
</attributes>
</appdef>
</process>
</object>
</bpr:release>
"""
def comment_node(node):
    comment = node.ownerDocument.createComment(node.toxml())
    print(comment)
    node.parentNode.replaceChild(comment, node)
    return comment
doc = minidom.parseString(xml).documentElement
comment_node(doc.getElementsByTagName('object')[-1])
xml = doc.toxml()
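One possible way to get the requested output with minidom is to turn the start and end tags into two separate comment nodes and move the children up to the parent between them. The sketch below is only an illustration: the function name comment_out_tag_only and the reduced sample are assumptions, the node must not be the document's root element, and attribute values containing quotes or "--" are not handled.

```python
from xml.dom import minidom

def comment_out_tag_only(node):
    """Replace node's start and end tags with comments, keeping its children in place."""
    doc = node.ownerDocument
    parent = node.parentNode
    # Rebuild the text of the opening tag from the tag name and its attributes.
    attrs = "".join(
        ' {}="{}"'.format(name, value) for name, value in node.attributes.items()
    )
    opening = doc.createComment("<{}{}>".format(node.tagName, attrs))
    closing = doc.createComment("</{}>".format(node.tagName))
    parent.insertBefore(opening, node)
    # Move each child in front of the node; insertBefore detaches it from node first.
    while node.firstChild is not None:
        parent.insertBefore(node.firstChild, node)
    # The now-empty node is swapped for the closing comment.
    parent.replaceChild(closing, node)

# Reduced sample (hypothetical) with the object nested under a root element.
SAMPLE = '<root><object id="12"><process name="Developer"><appdef/></process></object></root>'
doc = minidom.parseString(SAMPLE)
comment_out_tag_only(doc.getElementsByTagName("object")[0])
result = doc.documentElement.toxml()
print(result)
```

In the full document from the question, the object element also carries an xmlns declaration; once its tag is commented out, the children are no longer serialized under that declaration, so this may need extra care there.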

Parse an element in an XML file using minidom

I am trying to get the string "W:\_fdsw\Projects\HIL\releases\release_9_1_0\Config\CDNG\UX_EF_TSHIL\UX_EF_TSHIL.CDP" and save it in a variable using minidom.
<TOOL id="CONTROLDESKNG" xsi:type="tool">
<TOOL-HOST xsi:type="unicode">tsp:QMUC633107:5018</TOOL-HOST>
<TOOL-NAME xsi:type="unicode">CONTROLDESKNG</TOOL-NAME>
<START-OPTION xsi:type="integer">0</START-OPTION>
<START-PRIORITY xsi:type="integer">0</START-PRIORITY>
<SETTINGS xsi:type="dynamicPropertySet">
<PROPERTY format-rev="1" name="ExpName" propertyType="string" readonly="false" xsi:type="_property">
<VALUE xsi:type="unicode">K53MU</VALUE>
</PROPERTY>
<PROPERTY format-rev="1" name="ModelDir" propertyType="uri" readonly="false" xsi:type="_property">
<VALUE xsi:type="unicode">W:\_fdsw\Projects\HIL\releases\release_9_1_0\Config\CDNG\UX_EF_TSHIL\Variable Descriptions\UX_EF_TSHIL.sdf(#14)</VALUE>
</PROPERTY>
<PROPERTY format-rev="1" name="PrjFile" propertyType="uri" readonly="false" xsi:type="_property">
<VALUE xsi:type="unicode">W:\_fdsw\Projects\HIL\releases\release_9_1_0\Config\CDNG\UX_EF_TSHIL\UX_EF_TSHIL.CDP</VALUE>
</PROPERTY>
<PROPERTY format-rev="1" name="RecordingFormat" propertyType="string" readonly="false" xsi:type="_property">
<VALUE xsi:type="unicode">MDF</VALUE>
</PROPERTY>
<PROPERTY format-rev="1" name="ToolState" propertyType="string" readonly="false" xsi:type="_property">
<VALUE xsi:type="unicode">Online</VALUE>
</PROPERTY>
<PROPERTY format-rev="1" name="VersionCDNG" propertyType="string" readonly="false" xsi:type="_property">
<VALUE xsi:type="unicode">5.5</VALUE>
</PROPERTY>
<PROPERTY format-rev="1" name="VersionHILAPI" propertyType="string" readonly="false" xsi:type="_property">
<VALUE xsi:type="unicode">2015-B</VALUE>
</PROPERTY>
</SETTINGS>
</TOOL>
My code is:
xmldoc = minidom.parse('C:\Users\qxn5622\Desktop\EF10018\DEFAULT.tbc')
propertyList = xmldoc.getElementsByTagName('PROPERTY')
for prop in propertyList:
    if prop.attributes["name"].value == "ModelDir":
        myString = prop.getElementsByTagName("VALUE").value
I think the problem is that the element I am trying to get doesn't have an id.
Can anybody help me?
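This might help:
from xml.dom.minidom import parse
# Raw string so the backslashes in the Windows path are not treated as escapes.
dom = parse(r'C:\Users\qxn5622\Desktop\EF10018\DEFAULT.tbc')
propertyList = dom.getElementsByTagName('PROPERTY')
for prop in propertyList:
    if prop.getAttribute('name') == "PrjFile":
        # getElementsByTagName returns a list of nodes; the text is the first child of VALUE.
        myString = prop.getElementsByTagName("VALUE")
        print(myString[0].firstChild.nodeValue)
Output: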
W:\_fdsw\Projects\HIL\releases\release_9_1_0\Config\CDNG\UX_EF_TSHIL\UX_EF_TSHIL.CDP
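The same lookup can be sketched as a small function that returns the value instead of printing it, which makes it easy to store in a variable. The helper name property_value and the shortened sample are illustrative assumptions; the xsi:type attributes are left out of the sample because the snippet in the question does not include the namespace declaration they would need.

```python
from xml.dom.minidom import parseString

# Shortened, hypothetical sample; the raw string keeps the backslashes literal.
SAMPLE = r"""
<SETTINGS>
  <PROPERTY name="ExpName" propertyType="string">
    <VALUE>K53MU</VALUE>
  </PROPERTY>
  <PROPERTY name="PrjFile" propertyType="uri">
    <VALUE>W:\_fdsw\Projects\UX_EF_TSHIL.CDP</VALUE>
  </PROPERTY>
</SETTINGS>
"""

def property_value(dom, name):
    """Return the text of the first <VALUE> inside the <PROPERTY> called name, or None."""
    for prop in dom.getElementsByTagName("PROPERTY"):
        if prop.getAttribute("name") == name:
            values = prop.getElementsByTagName("VALUE")
            # Guard against a missing or empty <VALUE> element.
            if values and values[0].firstChild is not None:
                return values[0].firstChild.nodeValue
    return None

dom = parseString(SAMPLE)
prj_file = property_value(dom, "PrjFile")
print(prj_file)
```

The element does not need an id; matching on the name attribute of PROPERTY is enough to pick the right one.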

Using XML ElementTree to create list of objects with attributes

I use the Python requests module to get XML from the TeamCity REST API that looks like this:
<triggers count="10">
<trigger id="TRIGGER_1240" type="buildDependencyTrigger">
<properties count="2">
<property name="afterSuccessfulBuildOnly" value="true"/>
<property name="dependsOn" value="bt191"/>
</properties>
</trigger>
<trigger id="TRIGGER_1241" type="buildDependencyTrigger">
<properties count="2">
<property name="afterSuccessfulBuildOnly" value="true"/>
<property name="dependsOn" value="bt171"/>
</properties>
</trigger>
<trigger id="TRIGGER_1242" type="buildDependencyTrigger">
<properties count="2">
<property name="afterSuccessfulBuildOnly" value="true"/>
<property name="dependsOn" value="bt167"/>
</properties>
</trigger>
<trigger id="TRIGGER_1243" type="buildDependencyTrigger">
<properties count="2">
<property name="afterSuccessfulBuildOnly" value="true"/>
<property name="dependsOn" value="bt164"/>
</properties>
</trigger>
<trigger id="TRIGGER_1244" type="buildDependencyTrigger">
<properties count="2">
<property name="afterSuccessfulBuildOnly" value="true"/>
<property name="dependsOn" value="bt364"/>
</properties>
</trigger>
<trigger id="TRIGGER_736" type="buildDependencyTrigger">
<properties count="2">
<property name="afterSuccessfulBuildOnly" value="true"/>
<property name="dependsOn" value="Components_Ratchetdb"/>
</properties>
</trigger>
<trigger id="TRIGGER_149" type="buildDependencyTrigger">
<properties count="2">
<property name="afterSuccessfulBuildOnly" value="true"/>
<property name="dependsOn" value="Components_Filedb"/>
</properties>
</trigger>
<trigger id="TRIGGER_150" type="buildDependencyTrigger">
<properties count="2">
<property name="afterSuccessfulBuildOnly" value="true"/>
<property name="dependsOn" value="bt168"/>
</properties>
</trigger>
<trigger id="TRIGGER_1130" type="buildDependencyTrigger">
<properties count="2">
<property name="afterSuccessfulBuildOnly" value="true"/>
<property name="dependsOn" value="Components_Tbldb"/>
</properties>
</trigger>
<trigger id="vcsTrigger" type="vcsTrigger" inherited="true">
<properties count="3">
<property name="quietPeriod" value="60"/>
<property name="quietPeriodMode" value="USE_DEFAULT"/>
<property name="triggerRules" value="-:version.properties
-:comment=^Incremented:**
-:**/*-schema.sql"/>
</properties>
</trigger>
I am trying to create a list of "trigger" objects using a class. Ideally the object would have id, type, and a list of properties attributes as dictionaries of {name : value}. My code so far is:
class triggerList:
    def __init__(self, triggerId, triggerType):
        self.id = triggerId
        self.type = triggerType
        self.properties = []
    def add_property(self, buildProperty):
        self.properties.append(buildProperty)
def getAllTriggers(buildId):
    url = path + 'buildTypes/id:' + buildId + '/triggers'
    r = requests.get(url, auth=auth)
    tree = ElementTree.fromstring(r.content)
    listOfTriggers = []
    for trigger in tree.iter('trigger'):
        triggerType = trigger.get('type')
        triggerId = trigger.get('id')
        triggerName = str(triggerId)
        triggerName = triggerList(triggerId, triggerType)
        listOfTriggers.append(triggerName)
        for triggerProperty in tree.iter('property'):
            propertyName = triggerProperty.get('name')
            propertyValue = triggerProperty.get('value')
            propDict = {propertyName : propertyValue}
            triggerName.add_property(propDict)
This gives me a list of objects but every object has a list of every property dictionary. This is the output:
a = listOfTriggers[1]
print a.id, a.type, a.properties
>>> TRIGGER_1241 buildDependencyTrigger [{'afterSuccessfulBuildOnly': 'true'}, {'dependsOn': 'bt191'}, {'afterSuccessfulBuildOnly': 'true'}, {'dependsOn': 'bt171'}, {'afterSuccessfulBuildOnly': 'true'}, {'dependsOn': 'bt167'}, {'afterSuccessfulBuildOnly': 'true'}, {'dependsOn': 'bt164'}, {'afterSuccessfulBuildOnly': 'true'}, {'dependsOn': 'bt364'}, {'afterSuccessfulBuildOnly': 'true'}, {'dependsOn': 'Components_Ratchetdb'}, {'afterSuccessfulBuildOnly': 'true'}, {'dependsOn': 'Components_Filedb'}, {'afterSuccessfulBuildOnly': 'true'}, {'dependsOn': 'bt168'}, {'afterSuccessfulBuildOnly': 'true'}, {'dependsOn': 'Components_Tbldb'}, {'quietPeriod': '60'}, {'quietPeriodMode': 'USE_DEFAULT'}, {'triggerRules': '-:version.properties\n-:comment=^Incremented:**\n-:**/*-schema.sql'}]
I don't know how to stop the loop for just the properties for a specific trigger. Is there a way to use ElementTree to only get the properties for a specific trigger? Is there a more efficient way to create this object?
Not directly answering the question, but you may be reinventing the wheel here. Check the lxml.objectify package:
The main idea is to hide the usage of XML behind normal Python
objects, sometimes referred to as data-binding. It allows you to use
XML as if you were dealing with a normal Python object hierarchy.
Accessing the children of an XML element deploys object attribute
access. If there are multiple children with the same name, slicing and
indexing can be used. Python data types are extracted from XML content
automatically and made available to the normal Python operators.
Simple mistake: I was iterating over the whole tree rather than over the current trigger. The inner loop should be:
for triggerProperty in trigger.iter('property'):
    propertyName = triggerProperty.get('name')
    propertyValue = triggerProperty.get('value')
    propDict = {propertyName : propertyValue}
    triggerName.add_property(propDict)
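Putting the fix together, a self-contained sketch might look like the following. It parses an inline two-trigger sample instead of calling the TeamCity API, and the names Trigger and parse_triggers are illustrative. As a design choice that differs from the question, each trigger gets one {name: value} dict rather than a list of single-entry dicts.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the TeamCity response (hypothetical data).
SAMPLE = """
<triggers count="2">
  <trigger id="TRIGGER_1240" type="buildDependencyTrigger">
    <properties count="2">
      <property name="afterSuccessfulBuildOnly" value="true"/>
      <property name="dependsOn" value="bt191"/>
    </properties>
  </trigger>
  <trigger id="vcsTrigger" type="vcsTrigger" inherited="true">
    <properties count="1">
      <property name="quietPeriod" value="60"/>
    </properties>
  </trigger>
</triggers>
"""

class Trigger:
    def __init__(self, trigger_id, trigger_type, properties):
        self.id = trigger_id
        self.type = trigger_type
        self.properties = properties  # one {name: value} dict per trigger

def parse_triggers(xml_text):
    tree = ET.fromstring(xml_text)
    triggers = []
    for trigger in tree.iter("trigger"):
        # Iterate from the current trigger, not from the whole tree.
        props = {p.get("name"): p.get("value") for p in trigger.iter("property")}
        triggers.append(Trigger(trigger.get("id"), trigger.get("type"), props))
    return triggers

triggers = parse_triggers(SAMPLE)
for t in triggers:
    print(t.id, t.type, t.properties)
```

With a real response, r.content from requests can be passed to parse_triggers in place of SAMPLE.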
