Python Elementtree access children through attribute value

How can I access the item whose name attribute is "test5", and then its children "loc" and "rot", using Python and ElementTree?
After that I want to store each of the x, y and z values of the loc element in a separate variable.
<item name="test1">
<loc x="0" y="0" z="0"/>
<rot x="1" y="0" z="0" radian="0"/>
</item>
<item name="test2">
<loc x="22" y="78.7464" z="109.131"/>
<rot x="-1" y="0" z="0" radian="1.35263"/>
</item>
<item name="test3">
<loc x="-28" y="-106.911" z="71.0443"/>
<rot x="0" y="0.779884" z="-0.625923" radian="3.14159"/>
</item>
<item name="test4">
<loc x="38" y="51.6772" z="94.9353"/>
<rot x="1" y="0" z="0" radian="0.218166"/>
</item>
<item name="test5">
<loc x="-38" y="-86.9568" z="64.2009"/>
<rot x="0" y="-0.108867" z="0.994056" radian="3.14159"/>
</item>
I have tried multiple variants, but I have no clue how to do it.

This is one way to do it:
>>> import xml.etree.ElementTree as ET
>>> data = '''<root>
... <item name="test1">
... <loc x="0" y="0" z="0"/>
... <rot x="1" y="0" z="0" radian="0"/>
... </item>
... <item name="test2">
... <loc x="22" y="78.7464" z="109.131"/>
... <rot x="-1" y="0" z="0" radian="1.35263"/>
... </item>
... <item name="test3">
... <loc x="-28" y="-106.911" z="71.0443"/>
... <rot x="0" y="0.779884" z="-0.625923" radian="3.14159"/>
... </item>
... <item name="test4">
... <loc x="38" y="51.6772" z="94.9353"/>
... <rot x="1" y="0" z="0" radian="0.218166"/>
... </item>
... <item name="test5">
... <loc x="-38" y="-86.9568" z="64.2009"/>
... <rot x="0" y="-0.108867" z="0.994056" radian="3.14159"/>
... </item>
... </root>'''
>>> tree = ET.fromstring(data)
>>> for child in tree.findall("./item[@name='test5']/*"):
...     print(child.tag, child.attrib)
...
This gives:
loc {'x': '-38', 'y': '-86.9568', 'z': '64.2009'}
rot {'x': '0', 'y': '-0.108867', 'z': '0.994056', 'radian': '3.14159'}
It uses ElementTree's XPath notation to select the children of the item you are interested in. child.attrib is a dictionary, so you can read the values of x, y and z as child.attrib['x'], child.attrib['y'] and child.attrib['z'].
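To cover the second half of the question (one variable per coordinate), here is a minimal sketch. It assumes the items are wrapped in a <root> element as in the answer above, and that the values should be converted to float; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample data, wrapped in a <root> element.
data = """<root>
<item name="test4">
<loc x="38" y="51.6772" z="94.9353"/>
<rot x="1" y="0" z="0" radian="0.218166"/>
</item>
<item name="test5">
<loc x="-38" y="-86.9568" z="64.2009"/>
<rot x="0" y="-0.108867" z="0.994056" radian="3.14159"/>
</item>
</root>"""

root = ET.fromstring(data)

# Select the item by attribute value, then its two children by tag name.
item = root.find("./item[@name='test5']")
loc = item.find("loc")
rot = item.find("rot")

# Attribute values are strings, so convert them before doing arithmetic.
x, y, z = (float(loc.get(axis)) for axis in ("x", "y", "z"))
radian = float(rot.get("radian"))

print(x, y, z, radian)
```

find() returns None when nothing matches, so real code should check item, loc and rot before using them.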

Related

Delete everything in file after last appearance string

I want to write a program that looks through files and finds every incomplete file (one without </Module> at the end). For each such file it should print the last abnumber found in the file and delete every line after it, including the line that contains that abnumber.
My file looks like this:
<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
<object id="1238" name="name2" abnumber="4">
<item name="item8" value="something12:
<item name="item9" value="233" />
and at the end it should look like this:
<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
and 4 should be printed.
I started with something like this, but I feel like I am doing everything wrong:
import os
Mainfile = 'path'
for filename in os.listdir(Mainfile):
    lines = filename.readlines()
    if not "</Module>" in lines:
        with open(filename, 'r+', encoding="utf-8") as file:
            line_list = list(file)
            line_list.reverse()
            for line in line_list:
                if line.find('absno') != -1:
                    print(line)

You can use re to get your result:
<object([\s\S]*?)<\/object> matches each complete <object> ... </object> block
abnumber=\"([0-9.]+) captures the abnumber of the incomplete block
<Module.*|<object(?:[\s\S]*?)<\/object> matches the parts that make up the valid XML data
import re
data = """<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
<object id="1238" name="name2" abnumber="4">
<item name="item8" value="something12:
<item name="item9" value="233" />"""
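# Remove every complete <object>...</object> block; what is left is the incomplete tail.
invalid_XML_Tag = re.sub(r"<object([\s\S]*?)</object>", '', data)
# Raw strings keep \s and \S from being treated as (invalid) string escapes.
abnnumber_value = re.findall(r'abnumber="([0-9.]+)', invalid_XML_Tag)
print("abnumber of invalid tag => {0}".format(abnnumber_value))
# Keep only the <Module ...> line and the complete <object> blocks.
correct_xml_format = re.findall(r"<Module.*|<object(?:[\s\S]*?)</object>", data)
print("".join(correct_xml_format))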
Output:
abnumber of invalid tag => ['4']
<Module bs="Mainfile_1"><object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object><object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object><object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
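The question asks for the same thing done on files rather than on a string. The following is a line-based sketch of that, not the answer's method: the helper name truncate_incomplete is hypothetical, and it assumes that each abnumber sits on the opening line of its <object> and that a complete file contains the text </Module>.

```python
import re
import tempfile
from pathlib import Path


def truncate_incomplete(path):
    """If the file lacks </Module>, cut it just before the last line that
    carries an abnumber attribute and return that abnumber as a string.
    Returns None when the file is complete or has no abnumber at all."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if "</Module>" in text:
        return None  # complete file, leave it untouched
    lines = text.splitlines(keepends=True)
    # Walk backwards to find the last line with an abnumber attribute.
    for idx in range(len(lines) - 1, -1, -1):
        match = re.search(r'abnumber="([^"]*)"', lines[idx])
        if match:
            # Keep everything before that line, drop the line and the rest.
            path.write_text("".join(lines[:idx]), encoding="utf-8")
            return match.group(1)
    return None


# Small demonstration on a temporary file (shortened sample data).
sample = (
    '<Module bs="Mainfile_1">\n'
    '<object id="1000" name="namex" abnumber="1">\n'
    '<item name="item0" value="100" />\n'
    '</object>\n'
    '<object id="1238" name="name2" abnumber="4">\n'
    '<item name="item8" value="something12:\n'
)
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "broken.xml"
    target.write_text(sample, encoding="utf-8")
    last_abnumber = truncate_incomplete(target)
    remaining = target.read_text(encoding="utf-8")

print(last_abnumber)
```

To process a whole directory, the helper could be called for each path returned by Path(folder).iterdir(). Note that os.listdir returns bare file names, which is why the attempt in the question cannot call readlines() on them.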

How can I get attribute number

I use BS4 to parse an .xml file. I want to get the value of the res attribute, but I get None.
How can I do it?
Source XML:
<digitizer id="1" integrated="true" csrmusttouch="falsehardprox="true"
physidcsrs="false" pnpid="49154" kind="MULTI_TOUCH" maxcsrs="10">
<monitor left="0" top="0" right="1920" bottom="1080" />
<properties>
<property name="x" logmin="0" logmax="16383" res="621.7457275" unit="cm" hidusage="0x00010030" guid="{598A6A8F-52C0-4BA0-93AF-AF357411A561}" />
<property name="y" logmin="0" logmax="16383" res="983.9639893" unit="cm" hidusage="0x00010031" guid="{B53F9F75-04E0-4498-A7EE-C30DBB5A9011}" />
<property name="status" logmin="0" logmax="15" res="0" unit="DEFAULT" hidusage="0x000d0042, 0x000d003c, 0x000d0044" guid="{6E0E07BF-AFE7-4CF7-87D1-AF6446208418}" />
<property name="time" logmin="0" logmax="2147483647" res="1" unit="DEFAULT" guid="{436510C5-FED3-45D1-8B76-71D3EA7A829D}" />
<property name="contactid" logmin="0" logmax="31" res="1.861861944" unit="cm" hidusage="0x000d0051" guid="{02585B91-049B-4750-9615-DF8948AB3C9C}" />
Python Code
a = data_xml.find('digitizer',id="1")
b = a.find('properties')
print(b.get('res'))
Result
None

I have taken your data as html
html="""<digitizer id="1" integrated="true" csrmusttouch="falsehardprox="true"
physidcsrs="false" pnpid="49154" kind="MULTI_TOUCH" maxcsrs="10">
<monitor left="0" top="0" right="1920" bottom="1080" />
<properties>
<property name="x" logmin="0" logmax="16383" res="621.7457275" unit="cm" hidusage="0x00010030" guid="{598A6A8F-52C0-4BA0-93AF-AF357411A561}" />
<property name="y" logmin="0" logmax="16383" res="983.9639893" unit="cm" hidusage="0x00010031" guid="{B53F9F75-04E0-4498-A7EE-C30DBB5A9011}" />
<property name="status" logmin="0" logmax="15" res="0" unit="DEFAULT" hidusage="0x000d0042, 0x000d003c, 0x000d0044" guid="{6E0E07BF-AFE7-4CF7-87D1-AF6446208418}" />
<property name="time" logmin="0" logmax="2147483647" res="1" unit="DEFAULT" guid="{436510C5-FED3-45D1-8B76-71D3EA7A829D}" />
<property name="contactid" logmin="0" logmax="31" res="1.861861944" unit="cm" hidusage="0x000d0051" guid="{02585B91-049B-4750-9615-DF8948AB3C9C}" />"""
from bs4 import BeautifulSoup
soup=BeautifulSoup(html,"html.parser")
Code:
You can find all the property tags and then read the res value of each one:
a = soup.find('digitizer',attrs={"id":"1"})
properties=a.find_all("property")
res_lst=[i['res'] for i in properties]
Output:
['621.7457275', '983.9639893', '0', '1', '1.861861944']

Your XML seems to be malformed. After reformatting it:
<digitizer id="1" integrated="true" csrmusttouch="" falsehardprox="true" physidcsrs="false" pnpid="49154" kind="MULTI_TOUCH" maxcsrs="10">
<monitor left="0" top="0" right="1920" bottom="1080"/>
<properties>
<property name="x" logmin="0" logmax="16383" res="621.7457275" unit="cm" hidusage="0x00010030" guid="{598A6A8F-52C0-4BA0-93AF-AF357411A561}" />
<property name="y" logmin="0" logmax="16383" res="983.9639893" unit="cm" hidusage="0x00010031" guid="{B53F9F75-04E0-4498-A7EE-C30DBB5A9011}" />
<property name="status" logmin="0" logmax="15" res="0" unit="DEFAULT" hidusage="0x000d0042, 0x000d003c, 0x000d0044" guid="{6E0E07BF-AFE7-4CF7-87D1-AF6446208418}" />
<property name="time" logmin="0" logmax="2147483647" res="1" unit="DEFAULT" guid="{436510C5-FED3-45D1-8B76-71D3EA7A829D}" />
<property name="contactid" logmin="0" logmax="31" res="1.861861944" unit="cm" hidusage="0x000d0051" guid="{02585B91-049B-4750-9615-DF8948AB3C9C}" />
You can then parse it like this:
from bs4 import BeautifulSoup
with open('data.xml') as raw_results:
    results = BeautifulSoup(raw_results, 'lxml')
for element in results.find_all("properties"):
    for property_tag in element.find_all("property"):
        print(property_tag['res'])
Output:
621.7457275
983.9639893
0
1
1.861861944
You can find more info about parsing attribute values from xml in the tutorial where the code is from.
Edit: Note that I slightly modified the code to fit your question.
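The reason the original attempt printed None is that res is an attribute of each <property> element, not of the <properties> container. The sketch below illustrates this with the standard library's ElementTree instead of BS4, purely so that it runs without extra packages. It assumes the XML has first been repaired so that it is well-formed, and it uses a shortened copy of the data.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed version of the sample data (an assumption:
# the broken csrmusttouch attribute has been dropped).
xml_text = """<digitizer id="1" kind="MULTI_TOUCH">
<monitor left="0" top="0" right="1920" bottom="1080" />
<properties>
<property name="x" logmin="0" logmax="16383" res="621.7457275" unit="cm" />
<property name="y" logmin="0" logmax="16383" res="983.9639893" unit="cm" />
</properties>
</digitizer>"""

root = ET.fromstring(xml_text)
properties = root.find("properties")

# The container itself has no res attribute, so this is None.
container_res = properties.get("res")

# The attribute lives on each child, so collect it from the children.
res_by_name = {
    prop.get("name"): float(prop.get("res"))
    for prop in properties.findall("property")
}

print(container_res)
print(res_by_name)
```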

Change the attribute of the xml tree in python

I have a problem changing an attribute in an XML file.
My tree looks like this:
<Objects>
<BigObj Version="2.2" Name="Something">
<ItemList>
<Item Name="s_1" Selected="false"/>
<Item Name="s_2" Selected="false"/>
<Item Name="s_3" Selected="true"/>
<Item Name="s_4" Selected="false"/>
</ItemList>
</BigObj >
</Objects>
I need to check whether each "s_x" name is in a list of names. If it is, Selected should be changed to true; if it is not, Selected should be set to false (or stay false).
I've tried to do that with this code:
lslist = ["s_1","s_4"]
for child in root.findall("./Objects/BigObj/ItemList/Item"):
    for idx in lslist:
        if idx in child.find("Name").text:
            child.set('Selected', "true")
        else:
            child.set('Selected', "false")
But I get an AttributeError: 'NoneType' object has no attribute 'text'

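The code below works. Name is an attribute of Item, not a child element, so child.find("Name") returns None; read it from item.attrib instead.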
import xml.etree.ElementTree as ET
lslist = ["s_1", "s_4"]
xml = '''<Objects>
<BigObj Version="2.2" Name="Something">
<ItemList>
<Item Name="s_1" Selected="false"/>
<Item Name="s_2" Selected="false"/>
<Item Name="s_3" Selected="true"/>
<Item Name="s_4" Selected="false"/>
</ItemList>
</BigObj ></Objects>'''
root = ET.fromstring(xml)
items = root.findall('.//Item')
for item in items:
    # str(True) is "True"; lower() keeps the lowercase style of the original file
    item.attrib['Selected'] = str(item.attrib['Name'] in lslist).lower()
ET.dump(root)
output
<Objects>
<BigObj Version="2.2" Name="Something">
<ItemList>
<Item Name="s_1" Selected="true" />
<Item Name="s_2" Selected="false" />
<Item Name="s_3" Selected="false" />
<Item Name="s_4" Selected="true" />
</ItemList>
</BigObj></Objects>

Extract info based on name tag from XML file by beautifulsoup python

In Python 3.5 I'm using Biopython's Entrez module to extract some information from the pmc database on the PubMed biomedical website. From this XML file:
<DocSum>
<Id>5412469</Id>
<Item Name="PubDate" Type="Date">2017 Apr 22</Item>
<Item Name="EPubDate" Type="Date">2017 Apr 22</Item>
<Item Name="Source" Type="String">Int J Mol Sci</Item>
<Item Name="AuthorList" Type="List">
<Item Name="Author" Type="String">Guo Y</Item>
<Item Name="Author" Type="String">Bao Y</Item>
<Item Name="Author" Type="String">Yang W</Item>
</Item>
<Item Name="Title" Type="String">Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis</Item>
<Item Name="Volume" Type="String">18</Item>
<Item Name="Issue" Type="String">4</Item>
<Item Name="Pages" Type="String">890</Item>
<Item Name="ArticleIds" Type="List">
<Item Name="pmid" Type="String">28441730</Item>
<Item Name="doi" Type="String">10.3390/ijms18040890</Item>
<Item Name="pmcid" Type="String">PMC5412469</Item>
</Item>
<Item Name="DOI" Type="String">10.3390/ijms18040890</Item>
<Item Name="FullJournalName" Type="String">International Journal of Molecular Sciences</Item>
<Item Name="SO" Type="String">2017 Apr 22;18(4):890</Item>
I want to extract the item with Name="Title", that is, exactly the line below:
<Item Name="Title" Type="String">Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis</Item>
How can I do this?
So far I have used this code:
for tag in soup.findAll("docsum"): # I'm working with multiple articles in one file
    for a_tag in tag.findAll("item"):
        a_recs.append(a_tag.text)
return a_recs
But it returns all the values in one list, while I want just the title. The result looks like this:
['2017 Apr 22', '2017 Apr 22', 'Int J Mol Sci', '\nGuo Y\nBao Y\nYang W\n', 'Guo Y', 'Bao Y', 'Yang W', 'Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis', '18', '4', '890', '\n28441730\n10.3390/ijms18040890\nPMC5412469\n', '28441730', '10.3390/ijms18040890', 'PMC5412469', '10.3390/ijms18040890', 'International Journal of Molecular Sciences', '2017 Apr 22;18(4):890']

Try:
>>> data = '''
... <DocSum>
... <Id>5412469</Id>
... <Item Name="PubDate" Type="Date">2017 Apr 22</Item>
... <Item Name="EPubDate" Type="Date">2017 Apr 22</Item>
... <Item Name="Source" Type="String">Int J Mol Sci</Item>
... <Item Name="AuthorList" Type="List">
... <Item Name="Author" Type="String">Guo Y</Item>
... <Item Name="Author" Type="String">Bao Y</Item>
... <Item Name="Author" Type="String">Yang W</Item>
... </Item>
... <Item Name="Title" Type="String">Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis</Item>
... <Item Name="Volume" Type="String">18</Item>
... <Item Name="Issue" Type="String">4</Item>
... <Item Name="Pages" Type="String">890</Item>
... <Item Name="ArticleIds" Type="List">
... <Item Name="pmid" Type="String">28441730</Item>
... <Item Name="doi" Type="String">10.3390/ijms18040890</Item>
... <Item Name="pmcid" Type="String">PMC5412469</Item>
... </Item>
... <Item Name="DOI" Type="String">10.3390/ijms18040890</Item>
... <Item Name="FullJournalName" Type="String">International Journal of Molecular Sciences</Item>
... <Item Name="SO" Type="String">2017 Apr 22;18(4):890</Item>'''
>>>
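>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(data, 'xml')
>>> a_recs = []
>>> for tag in soup.find_all("DocSum"):
...     # find() returns the first matching tag, or None if there is none
...     title = tag.find("Item", {"Name": "Title"})
...     if title is not None:
...         a_recs.append(title.text)
...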
>>> a_recs
['Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis']
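The same selection can also be sketched with the standard library's ElementTree and an attribute predicate, which avoids the extra parser dependency. This is an alternative to the BeautifulSoup answer, not a replacement for it. The wrapping <eSummaryResult> element and the closing </DocSum> tag are assumptions added to make the shortened sample well-formed.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the outer element and closing tags are assumed.
data = """<eSummaryResult>
<DocSum>
<Id>5412469</Id>
<Item Name="Source" Type="String">Int J Mol Sci</Item>
<Item Name="AuthorList" Type="List">
<Item Name="Author" Type="String">Guo Y</Item>
</Item>
<Item Name="Title" Type="String">Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis</Item>
<Item Name="Volume" Type="String">18</Item>
</DocSum>
</eSummaryResult>"""

root = ET.fromstring(data)

# One title per DocSum: pick only the direct Item child whose Name is "Title".
titles = [
    item.text
    for item in root.findall("./DocSum/Item[@Name='Title']")
]

print(titles)
```

Because the path names DocSum/Item explicitly, nested items such as the authors inside AuthorList are not matched.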

Element is no longer valid

I have a problem with Python and Selenium.
I want to click a link. This is my Python code:
toubao_luru_xpath='//div[87]/xml/items/item[2]/item[@path=policynewbiz/inputapplication/chooseproduct.jsp]'
#url=policynewbiz/inputapplication/chooseproduct.jsp
print WebDriverWait(browser,10).until(EC.presence_of_element_located((By.XPATH,toubao_luru_xpath)))
print browser.find_element_by_xpath(toubao_luru_xpath)
#print browser.find_element_by_xpath(toubao_luru_xpath).click()
The error is:
File "C:\Python27\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Yes,
And this is the HTML code:
<html>
<DIV style="DISPLAY: none"><xml id=__menu>
<items>
<item name="quotation" label="散单报价" >
<item name="input" label="录入" path="policyquotation/createquotation/chooseproduct.jsp" icon="../image/icon/1.gif" visible="false" ></item>
<item name="quotationinput" label="录入" icon="../image/icon/1.gif" visible="false" command="commandCreateQuotationTemplate" ></item>
<item name="quotationinput2014" label="录入" path="policyquotation_v2/chooseproduct2014.jsp" icon="../image/icon/1.gif" ></item>
<item name="enterquotation" label="enterquotation" visible="false" ></item>
<item name="queryQuotation" label="查询" path="policyquotation/qryquotationlist.jsp" icon="../image/icon/2.gif" visible="false" ></item>
<item name="queryQuotation2" label="查询" path="policyquotation_v2/qryquotationlist.jsp" icon="../image/icon/2.gif" ></item>
<item name="packageWork" label="套餐指定" path="policyquotation/package-manage-work.jsp" icon="../image/icon/3.gif" visible="false" ></item>
<item name="querypackage" label="套餐管理" path="policyquotation/query-package-list.jsp" icon="../image/icon/4.gif" visible="false" ></item>
<item name="quotationfollow" label="报价跟进" path="policyquotation/followquotation.jsp" icon="../image/icon/5.gif" visible="false" ></item>
<item name="entererror" label="entererror" path="error.jsp" visible="false" ></item>
<item name="quotationfollownew" label="报价跟进" path="policyquotation_v2/followquotation.jsp" icon="../image/icon/5.gif" visible="false" ></item>
</item>
<item name="application" label="投保" >
<item name="input" label="录入" path="policynewbiz/inputapplication/chooseproduct.jsp" icon="../image/icon/1.gif" ></item>
</html>
I want to click the last <item>.

This should work. The selector matches only one element in the HTML you provided, and I'm assuming there aren't two links to the same URL.
driver.find_element_by_css_selector("item[path='policynewbiz/inputapplication/chooseproduct.jsp']").click()
