I have a problem with Python and Selenium.
I want to click a link. This is my Python code:
toubao_luru_xpath='//div[87]/xml/items/item[2]/item[@path=policynewbiz/inputapplication/chooseproduct.jsp]'
#url=policynewbiz/inputapplication/chooseproduct.jsp
print WebDriverWait(browser,10).until(EC.presence_of_element_located((By.XPATH,toubao_luru_xpath)))
print browser.find_element_by_xpath(toubao_luru_xpath)
#print browser.find_element_by_xpath(toubao_luru_xpath).click()
The error is:
File "C:\Python27\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Yes,
And this is the HTML code:
<html>
<DIV style="DISPLAY: none"><xml id=__menu>
<items>
<item name="quotation" label="散单报价" >
<item name="input" label="录入" path="policyquotation/createquotation/chooseproduct.jsp" icon="../image/icon/1.gif" visible="false" ></item>
<item name="quotationinput" label="录入" icon="../image/icon/1.gif" visible="false" command="commandCreateQuotationTemplate" ></item>
<item name="quotationinput2014" label="录入" path="policyquotation_v2/chooseproduct2014.jsp" icon="../image/icon/1.gif" ></item>
<item name="enterquotation" label="enterquotation" visible="false" ></item>
<item name="queryQuotation" label="查询" path="policyquotation/qryquotationlist.jsp" icon="../image/icon/2.gif" visible="false" ></item>
<item name="queryQuotation2" label="查询" path="policyquotation_v2/qryquotationlist.jsp" icon="../image/icon/2.gif" ></item>
<item name="packageWork" label="套餐指定" path="policyquotation/package-manage-work.jsp" icon="../image/icon/3.gif" visible="false" ></item>
<item name="querypackage" label="套餐管理" path="policyquotation/query-package-list.jsp" icon="../image/icon/4.gif" visible="false" ></item>
<item name="quotationfollow" label="报价跟进" path="policyquotation/followquotation.jsp" icon="../image/icon/5.gif" visible="false" ></item>
<item name="entererror" label="entererror" path="error.jsp" visible="false" ></item>
<item name="quotationfollownew" label="报价跟进" path="policyquotation_v2/followquotation.jsp" icon="../image/icon/5.gif" visible="false" ></item>
</item>
<item name="application" label="投保" >
<item name="input" label="录入" path="policynewbiz/inputapplication/chooseproduct.jsp" icon="../image/icon/1.gif" ></item>
</html>
I want to click the last <item>.
This should work. The selector is unique in the HTML you provided, and I'm assuming there aren't two links to the same URL, so it should be good.
driver.find_element_by_css_selector("item[path='policynewbiz/inputapplication/chooseproduct.jsp']").click()
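One likely reason the original XPath times out is that the attribute value in its predicate is not quoted, so XPath reads policynewbiz/inputapplication/chooseproduct.jsp as a path of child elements instead of a string. The following is a minimal sketch of the quoted form, checked with the standard library's ElementTree against a trimmed copy of the menu rather than with Selenium. The trimmed markup and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the menu from the question.
menu = """<items>
  <item name="quotation" label="quote">
    <item name="input" path="policyquotation/createquotation/chooseproduct.jsp"/>
  </item>
  <item name="application" label="apply">
    <item name="input" path="policynewbiz/inputapplication/chooseproduct.jsp"/>
  </item>
</items>"""

root = ET.fromstring(menu)
target_path = "policynewbiz/inputapplication/chooseproduct.jsp"

# The attribute value is quoted, so it is compared as a string.
matches = root.findall(".//item[@path='%s']" % target_path)
print(len(matches), matches[0].get("name"))
```

In Selenium the same predicate would be written as //item[@path='policynewbiz/inputapplication/chooseproduct.jsp']. Note that the menu sits inside a DIV with DISPLAY: none, so even a matching element may not be clickable until the page actually renders it.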
Related
I want to write a program that looks through files, finds every incomplete file (one without </Module> at the end), prints the last abnumber found in that file, and deletes every line after it (including the line with that abnumber).
So my file looks like that:
<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
<object id="1238" name="name2" abnumber="4">
<item name="item8" value="something12:
<item name="item9" value="233" />
and at the end it should look like:
<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
with 4 printed.
I started by doing something like this, but I feel like I am doing everything wrong:
import os
Mainfile = 'path'
for filename in os.listdir(Mainfile):
lines = filename.readlines()
if not "</Module>" in lines:
with open(filename, 'r+', encoding="utf-8") as file:
line_list = list(file)
line_list.reverse()
for line in line_list:
if line.find('absno') != -1:
print(line)
You can use the re module to get your result:
<object([\s\S]*?)<\/object> matches each complete <object> ... </object> block
abnumber=\"([0-9.]+) captures the abnumber of the incomplete block
<Module.*|<object(?:[\s\S]*?)<\/object> collects the well-formed parts of the XML data
import re
data = """<Module bs="Mainfile_1">
<object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object>
<object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object>
<object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
<object id="1238" name="name2" abnumber="4">
<item name="item8" value="something12:
<item name="item9" value="233" />"""
# Raw strings keep the regex backslashes from being read as string escapes.
invalid_XML_Tag = re.sub(r"<object([\s\S]*?)<\/object>", '', data)
abnnumber_value = re.findall(r'abnumber="([0-9.]+)', invalid_XML_Tag)
print("abnumber of invalid tag => {0}".format(abnnumber_value))
correct_xml_format = re.findall(r"<Module.*|<object(?:[\s\S]*?)<\/object>", data)
print("".join(correct_xml_format))
Output:
abnumber of invalid tag => ['4']
<Module bs="Mainfile_1"><object id="1000" name="namex" abnumber="1">
<item name="item0" value="100" />
<item name="item00" value="100" />
</object><object id="1001" name="namey" abnumber="2">
<item name="item1" value="100" />
<item name="item00" value="100" />
</object><object id="1234" name="name1" abnumber="3">
<item name="item1" value="something11:
something11" />
<item name="item2" value="233" />
<item name="item3" value="233" />
<item name="item4" value="something12:
12something" />
</object>
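The answer above works on a string. To apply the same idea to the files in a folder, as the question asks, something like the following sketch could be used. The function names truncate_incomplete and fix_folder are hypothetical, and the sketch assumes UTF-8 files small enough to read into memory and that "</object>" never appears inside an attribute value.

```python
import re
from pathlib import Path

OBJECT_RE = re.compile(r"<object\b[\s\S]*?</object>")
ABNUMBER_RE = re.compile(r'abnumber="([0-9.]+)"')


def truncate_incomplete(text):
    """Cut text after the last complete <object> block.

    Returns (cleaned_text, abnumber), where abnumber belongs to the
    dropped, incomplete block (None if no such block was found).
    """
    end = 0
    for match in OBJECT_RE.finditer(text):
        end = match.end()
    if end == 0:
        # No complete <object>: keep only the opening <Module ...> tag.
        header = re.search(r"<Module[^>]*>", text)
        end = header.end() if header else 0
    dropped = ABNUMBER_RE.search(text, end)
    return text[:end] + "\n", (dropped.group(1) if dropped else None)


def fix_folder(folder):
    """Rewrite every file in folder that lacks a closing </Module> tag."""
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if "</Module>" in text:
            continue  # the file is complete, leave it alone
        cleaned, abnumber = truncate_incomplete(text)
        print(path.name, abnumber)
        path.write_text(cleaned, encoding="utf-8")
```

Calling fix_folder on a directory overwrites the incomplete files in place, so it is worth trying it on a copy of the data first.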
I have a problem changing an attribute in an XML file.
My tree looks like this:
<Objects>
<BigObj Version="2.2" Name="Something">
<ItemList>
<Item Name="s_1" Selected="false"/>
<Item Name="s_2" Selected="false"/>
<Item Name="s_3" Selected="true"/>
<Item Name="s_4" Selected="false"/>
</ItemList>
</BigObj >
</Objects>
I need to check whether each "s_x" name is in a list of names. If it is, the value of Selected should change to true; if it's not, it should be set to false (or stay false).
I've tried to do that with this code:
lslist = ["s_1","s_4"]
for child in root.findall("./Objects/BigObj/ItemList/Item"):
for idx in lslist:
if idx in child.find("Name").text:
child.set('Selected', "true")
else:
child.set('Selected', "false")
But I get an AttributeError: 'NoneType' object has no attribute 'text'
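The below works. Name is an attribute of Item, not a child element, so child.find("Name") returns None; the value has to be read from item.attrib instead.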
import xml.etree.ElementTree as ET
lslist = ["s_1", "s_4"]
xml = '''<Objects>
<BigObj Version="2.2" Name="Something">
<ItemList>
<Item Name="s_1" Selected="false"/>
<Item Name="s_2" Selected="false"/>
<Item Name="s_3" Selected="true"/>
<Item Name="s_4" Selected="false"/>
</ItemList>
</BigObj ></Objects>'''
root = ET.fromstring(xml)
items = root.findall('.//Item')
for item in items:
item.attrib['Selected'] = str(item.attrib['Name'] in lslist)
ET.dump(root)
output
<Objects>
<BigObj Version="2.2" Name="Something">
<ItemList>
<Item Name="s_1" Selected="True" />
<Item Name="s_2" Selected="False" />
<Item Name="s_3" Selected="False" />
<Item Name="s_4" Selected="True" />
</ItemList>
</BigObj></Objects>
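Note that str() of a Python boolean yields "True" and "False" with a capital letter, while the original file uses lowercase values. If the lowercase spelling matters, a variation along these lines might be closer to what the question asks for. It is only a sketch; the names selected_names and result are made up for the example.

```python
import xml.etree.ElementTree as ET

selected_names = {"s_1", "s_4"}

xml_text = """<Objects>
<BigObj Version="2.2" Name="Something">
<ItemList>
<Item Name="s_1" Selected="false"/>
<Item Name="s_2" Selected="false"/>
<Item Name="s_3" Selected="true"/>
<Item Name="s_4" Selected="false"/>
</ItemList>
</BigObj>
</Objects>"""

root = ET.fromstring(xml_text)
for item in root.iter("Item"):
    # Write the lowercase literals explicitly instead of using str(bool).
    is_selected = item.get("Name") in selected_names
    item.set("Selected", "true" if is_selected else "false")

result = {item.get("Name"): item.get("Selected") for item in root.iter("Item")}
print(result)
```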
In Python 3.5, I'm using Biopython's Entrez module to extract some information from the pmc database on the PubMed biomedical website. From this XML file:
<DocSum>
<Id>5412469</Id>
<Item Name="PubDate" Type="Date">2017 Apr 22</Item>
<Item Name="EPubDate" Type="Date">2017 Apr 22</Item>
<Item Name="Source" Type="String">Int J Mol Sci</Item>
<Item Name="AuthorList" Type="List">
<Item Name="Author" Type="String">Guo Y</Item>
<Item Name="Author" Type="String">Bao Y</Item>
<Item Name="Author" Type="String">Yang W</Item>
</Item>
<Item Name="Title" Type="String">Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis</Item>
<Item Name="Volume" Type="String">18</Item>
<Item Name="Issue" Type="String">4</Item>
<Item Name="Pages" Type="String">890</Item>
<Item Name="ArticleIds" Type="List">
<Item Name="pmid" Type="String">28441730</Item>
<Item Name="doi" Type="String">10.3390/ijms18040890</Item>
<Item Name="pmcid" Type="String">PMC5412469</Item>
</Item>
<Item Name="DOI" Type="String">10.3390/ijms18040890</Item>
<Item Name="FullJournalName" Type="String">International Journal of Molecular Sciences</Item>
<Item Name="SO" Type="String">2017 Apr 22;18(4):890</Item>
I want to extract the item with Name="Title", that is, exactly this line:
<Item Name="Title" Type="String">Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis</Item>
How can I do this?
So far I have been using this code:
for tag in soup.findAll("docsum"): # I'm working with multiple articles in one file
for a_tag in tag.findAll("item"):
a_recs.append(a_tag.text)
return a_recs
But it returns all the values in one list, while I want just the title. For example:
['2017 Apr 22', '2017 Apr 22', 'Int J Mol Sci', '\nGuo Y\nBao Y\nYang W\n', 'Guo Y', 'Bao Y', 'Yang W', 'Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis', '18', '4', '890', '\n28441730\n10.3390/ijms18040890\nPMC5412469\n', '28441730', '10.3390/ijms18040890', 'PMC5412469', '10.3390/ijms18040890', 'International Journal of Molecular Sciences', '2017 Apr 22;18(4):890']
Try:
>>> data = '''
... <DocSum>
... <Id>5412469</Id>
... <Item Name="PubDate" Type="Date">2017 Apr 22</Item>
... <Item Name="EPubDate" Type="Date">2017 Apr 22</Item>
... <Item Name="Source" Type="String">Int J Mol Sci</Item>
... <Item Name="AuthorList" Type="List">
... <Item Name="Author" Type="String">Guo Y</Item>
... <Item Name="Author" Type="String">Bao Y</Item>
... <Item Name="Author" Type="String">Yang W</Item>
... </Item>
... <Item Name="Title" Type="String">Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis</Item>
... <Item Name="Volume" Type="String">18</Item>
... <Item Name="Issue" Type="String">4</Item>
... <Item Name="Pages" Type="String">890</Item>
... <Item Name="ArticleIds" Type="List">
... <Item Name="pmid" Type="String">28441730</Item>
... <Item Name="doi" Type="String">10.3390/ijms18040890</Item>
... <Item Name="pmcid" Type="String">PMC5412469</Item>
... </Item>
... <Item Name="DOI" Type="String">10.3390/ijms18040890</Item>
... <Item Name="FullJournalName" Type="String">International Journal of Molecular Sciences</Item>
... <Item Name="SO" Type="String">2017 Apr 22;18(4):890</Item>'''
>>>
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(data, 'xml')
>>> a_recs = []  # collects one title per DocSum
>>> for tag in soup.find_all("DocSum"):
...     title = tag.find("Item", {"Name": "Title"})
...     if title is not None:
...         a_recs.append(title.text)
...
>>> a_recs
['Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis']
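If installing BeautifulSoup is not an option, the same attribute filter can probably be expressed with the standard library's ElementTree. The sketch below assumes the DocSum elements are wrapped in a single root element (called eSummaryResult here, which is an assumption about the file) and uses a shortened copy of the record.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the record from the question.
data = """<eSummaryResult>
<DocSum>
  <Id>5412469</Id>
  <Item Name="Source" Type="String">Int J Mol Sci</Item>
  <Item Name="Title" Type="String">Regulatory miRNAs in Colorectal Carcinogenesis and Metastasis</Item>
  <Item Name="Volume" Type="String">18</Item>
</DocSum>
</eSummaryResult>"""

root = ET.fromstring(data)

# [@Name='Title'] keeps only the Item elements whose Name attribute is "Title".
titles = [item.text for item in root.findall("./DocSum/Item[@Name='Title']")]
print(titles)
```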
I need to find an ITEM tag that matches two criteria, and then get the name attribute of its parent NODE (NODE/@name) based on that match.
Two issues:
I can't find a way for XPath to do an 'and', for example
item = node.findall('./ITEM[@name="toppas_type" and @value="output file list"]')
Getting the parent NODE info without having to explicitly search for and save it before finding the ITEM, for example something like
parent_name = item.parent.attrib['name']
This is the code I have now:
node_names = []
for node in tree.findall('NODE[@name="vertices"]/NODE'):
for item in node.findall('./ITEM[@name="toppas_type"]'):
if item.attrib['name'] == 'toppas_type' and item.attrib['value'] == 'output file list':
node_names.append(node.attrib['name'])
...to parse a file like this (snippet only) ...
<?xml version="1.0" encoding="ISO-8859-1"?>
<PARAMETERS version="1.6.2" xsi:noNamespaceSchemaLocation="http://open-ms.sourceforge.net/schemas/Param_1_6_2.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<NODE name="vertices" description="">
<NODE name="23" description="">
<ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
<ITEM name="toppas_type" value="tool" type="string" description="" required="false" advanced="false" />
<ITEM name="tool_name" value="FileConverter" type="string" description="" required="false" advanced="false" />
<ITEM name="tool_type" value="" type="string" description="" required="false" advanced="false" />
<ITEM name="x_pos" value="-620" type="double" description="" required="false" advanced="false" />
<ITEM name="y_pos" value="-1380" type="double" description="" required="false" advanced="false" />
</NODE>
<NODE name="24" description="">
<ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
<ITEM name="toppas_type" value="output file list" type="string" description="" required="false" advanced="false" />
<ITEM name="x_pos" value="-440" type="double" description="" required="false" advanced="false" />
<ITEM name="y_pos" value="-1480" type="double" description="" required="false" advanced="false" />
<ITEM name="output_folder_name" value="" type="string" description="" required="false" advanced="false" />
</NODE>
<NODE name="33" description="">
<ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
<ITEM name="toppas_type" value="merger" type="string" description="" required="false" advanced="false" />
<ITEM name="x_pos" value="-620" type="double" description="" required="false" advanced="false" />
<ITEM name="y_pos" value="-1540" type="double" description="" required="false" advanced="false" />
<ITEM name="round_based" value="false" type="string" description="" required="false" advanced="false" />
</NODE>
<!--(snip)-->
</NODE>
</PARAMETERS>
UPDATE:
@Mathias Müller
Great suggestion. Unfortunately, when I try to load the XML file, I get an error. I'm not familiar with lxml, so I'm not sure if I'm using it right.
from lxml import etree
root = etree.DTD("/Users/mikes/Documents/Eclipseworkspace/Bioproximity/Assay-Workflows-Mikes/protein_lfq/protein_lfq-1.1.2.toppas")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "src/lxml/dtd.pxi", line 294, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:187024)
lxml.etree.DTDParseError: Content error in the external subset, line 2, column 1
Unfortunately, ElementTree will not accept that xpath in its tree.find(xpath) or tree.findall(xpath)
Perhaps you do not need nested loops at all, a single XPath expression would suffice. I am not exactly sure what you would like the final result to be, but here is an example with lxml:
>>> import lxml.etree
>>> s = '''<NODE name="vertices" description="">
...
... <NODE name="23" description="">
... <ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
... <ITEM name="toppas_type" value="tool" type="string" description="" required="false" advanced="false" />
... <ITEM name="tool_name" value="FileConverter" type="string" description="" required="false" advanced="false" />
... <ITEM name="tool_type" value="" type="string" description="" required="false" advanced="false" />
... <ITEM name="x_pos" value="-620" type="double" description="" required="false" advanced="false" />
... <ITEM name="y_pos" value="-1380" type="double" description="" required="false" advanced="false" />
... </NODE>
...
... <NODE name="24" description="">
... <ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
... <ITEM name="toppas_type" value="output file list" type="string" description="" required="false" advanced="false" />
... <ITEM name="x_pos" value="-440" type="double" description="" required="false" advanced="false" />
... <ITEM name="y_pos" value="-1480" type="double" description="" required="false" advanced="false" />
... <ITEM name="output_folder_name" value="" type="string" description="" required="false" advanced="false" />
... </NODE>
...
... <NODE name="33" description="">
... <ITEM name="recycle_output" value="false" type="string" description="" required="false" advanced="false" />
... <ITEM name="toppas_type" value="merger" type="string" description="" required="false" advanced="false" />
... <ITEM name="x_pos" value="-620" type="double" description="" required="false" advanced="false" />
... <ITEM name="y_pos" value="-1540" type="double" description="" required="false" advanced="false" />
... <ITEM name="round_based" value="false" type="string" description="" required="false" advanced="false" />
... </NODE>
... <!--(snip)-->
... </NODE>'''
>>> root = lxml.etree.fromstring(s)
>>> root.xpath('/NODE[@name="vertices"]/NODE/ITEM[@name = "toppas_type" and @value = "output file list"]')
[<Element ITEM at 0x102b5f788>]
And if you actually need the name of the parent element, you can move to the parent node with ..:
>>> root.xpath('/NODE[@name="vertices"]/NODE/ITEM[@name = "toppas_type" and @value = "output file list"]/../@name')
['24']
Parsing an XML document from a file
The function etree.DTD is the wrong choice if you would like to parse an XML document from a file. A DTD is not an XML document. Here is how you can do it with lxml:
>>> import lxml.etree
>>> root = lxml.etree.parse("example.xml")
>>> root
<lxml.etree._ElementTree object at 0x106593b00>
Second Update
If the outermost element is PARAMETERS, you need to search like this:
>>> root.xpath('/PARAMETERS/NODE[@name="vertices"]/NODE/ITEM[@name = "toppas_type" and @value = "output file list"]')
[<Element ITEM at 0x106593e18>]
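If staying with the standard library is preferred, ElementTree's limited XPath subset has no `and` operator, but as far as I know two predicates written one after the other act as a logical AND, and a trailing `..` step moves up to the parent. The following is a sketch on a trimmed copy of the file; the trimmed markup and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the .toppas file from the question.
doc = """<PARAMETERS version="1.6.2">
  <NODE name="vertices" description="">
    <NODE name="23" description="">
      <ITEM name="toppas_type" value="tool" type="string" />
    </NODE>
    <NODE name="24" description="">
      <ITEM name="recycle_output" value="false" type="string" />
      <ITEM name="toppas_type" value="output file list" type="string" />
    </NODE>
    <NODE name="33" description="">
      <ITEM name="toppas_type" value="merger" type="string" />
    </NODE>
  </NODE>
</PARAMETERS>"""

root = ET.fromstring(doc)

# Two chained predicates must both hold; "/.." then selects the parent NODE.
query = ("./NODE[@name='vertices']/NODE/"
         "ITEM[@name='toppas_type'][@value='output file list']/..")
node_names = [node.get("name") for node in root.findall(query)]
print(node_names)
```

This avoids the nested loops from the question without adding lxml as a dependency, at the cost of a less expressive query language.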
How can I access the tag with the attribute value "test5", and then its children "loc" and "rot", with Python and ElementTree?
After this I want to store each of the x, y, z values of the loc element in a separate variable.
<item name="test1">
<loc x="0" y="0" z="0"/>
<rot x="1" y="0" z="0" radian="0"/>
</item>
<item name="test2">
<loc x="22" y="78.7464" z="109.131"/>
<rot x="-1" y="0" z="0" radian="1.35263"/>
</item>
<item name="test3">
<loc x="-28" y="-106.911" z="71.0443"/>
<rot x="0" y="0.779884" z="-0.625923" radian="3.14159"/>
</item>
<item name="test4">
<loc x="38" y="51.6772" z="94.9353"/>
<rot x="1" y="0" z="0" radian="0.218166"/>
</item>
<item name="test5">
<loc x="-38" y="-86.9568" z="64.2009"/>
<rot x="0" y="-0.108867" z="0.994056" radian="3.14159"/>
</item>
I have tried multiple variants, but I have no clue how to do it.
This is one way to do it:
>>> import xml.etree.ElementTree as ET
>>> data = '''<root>
... <item name="test1">
... <loc x="0" y="0" z="0"/>
... <rot x="1" y="0" z="0" radian="0"/>
... </item>
... <item name="test2">
... <loc x="22" y="78.7464" z="109.131"/>
... <rot x="-1" y="0" z="0" radian="1.35263"/>
... </item>
... <item name="test3">
... <loc x="-28" y="-106.911" z="71.0443"/>
... <rot x="0" y="0.779884" z="-0.625923" radian="3.14159"/>
... </item>
... <item name="test4">
... <loc x="38" y="51.6772" z="94.9353"/>
... <rot x="1" y="0" z="0" radian="0.218166"/>
... </item>
... <item name="test5">
... <loc x="-38" y="-86.9568" z="64.2009"/>
... <rot x="0" y="-0.108867" z="0.994056" radian="3.14159"/>
... </item>
... </root>'''
>>> tree = ET.fromstring(data)
>>> for child in tree.findall("./item[@name='test5']/"):
... print child.tag, child.attrib
...
This gives:
loc {'y': '-86.9568', 'x': '-38', 'z': '64.2009'}
rot {'y': '-0.108867', 'x': '0', 'z': '0.994056', 'radian': '3.14159'}
It uses XPath notation to access the element you are interested in. Furthermore, child.attrib is a dictionary, so you can access the values of x, y and z as child.attrib['x'] and so on.
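To store the loc values in separate variables, as the question asks, one possible approach is to look up the loc element directly and convert its attributes to floats. This is a sketch on a shortened copy of the data; converting to float is an assumption about how the values will be used.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the data from the question.
data = """<root>
<item name="test4">
<loc x="38" y="51.6772" z="94.9353"/>
<rot x="1" y="0" z="0" radian="0.218166"/>
</item>
<item name="test5">
<loc x="-38" y="-86.9568" z="64.2009"/>
<rot x="0" y="-0.108867" z="0.994056" radian="3.14159"/>
</item>
</root>"""

root = ET.fromstring(data)

# find() returns the first matching element, or None if nothing matches.
loc = root.find("./item[@name='test5']/loc")
rot = root.find("./item[@name='test5']/rot")

# Attribute values are strings, so convert them before doing arithmetic.
x, y, z = (float(loc.get(axis)) for axis in ("x", "y", "z"))
print(x, y, z)
```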