I'm trying to create a small script to tell me if addresses need a certain type of shipping.
I have a list of addresses to input into a website and it will return what type they are. Why is this returning none, even though when I check the element in selenium it's there? And technically it has to be there, to even pass the "EC.presence_of_element_located" code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Chrome()
browser.get('courier_website')
field = browser.find_element_by_id("txt-address-auto-complete")
field.send_keys("12 test Street")
WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, "//li[@class='ui-menu-item']/a[contains(@id, 'ui-id-')]")))
browser.find_element_by_xpath("//li[@class='ui-menu-item']/a[contains(@id, 'ui-id-')]").click()
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="delivery-details-addresstype" and text() != ""]')))
post = browser.find_element_by_xpath('//*[@id="delivery-details-addresstype"]').get_attribute('value')
print(post)
Output is "None"
HTML I'm trying to get the text out of
<table class="delivery-details">
<tbody><tr>
<th colspan="3" id="delivery-details-addresstype">Residential Delivery Zone Address</th>
</tr>
A <th> element has no value attribute, so get_attribute('value') returns None. Instead of browser.find_element_by_xpath(required_path).get_attribute('value'), use:
browser.find_element_by_xpath(required_path).get_attribute('innerHTML')
In some cases 'textContent' has worked for me:
browser.find_element_by_xpath(required_path).get_attribute('textContent')
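To see why 'value' comes back as None, it may help to look at the element outside the browser. The sketch below uses only Python's standard-library xml.etree.ElementTree (not Selenium) on a closed-up copy of the table from the question; the closing tags are an assumption added so the snippet parses. The text sits inside the <th> as a text node, and there is no value attribute to read.

```python
import xml.etree.ElementTree as ET

# Closed-up copy of the question's markup (closing tags added so it parses).
snippet = """
<table class="delivery-details">
  <tbody><tr>
    <th colspan="3" id="delivery-details-addresstype">Residential Delivery Zone Address</th>
  </tr></tbody>
</table>
"""

root = ET.fromstring(snippet.strip())
th = root.find(".//th[@id='delivery-details-addresstype']")

value_attr = th.get("value")  # the element has no such attribute
text = th.text                # the text node inside the element

print(value_attr)
print(text)
```

In a live browser, get_attribute('textContent') reads that same text node.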
I am trying to make a crawler/auto clicker in Python. My goal is to click every row header in a table to expand it and show the nested rows. I cannot seem to find the correct selector to use; I tried driver.find_elements_id and driver.find_elements_xpath, but neither works.
Here is what I am using:
PATH = "C:/Users/Downloads/chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://int.soccerway.com/")
link = driver.find_elements_id("tr")
link.click()
and here is a snippet of the website
<tr class="group-head clickable " id="date_matches-1886" stage-value="14">
<th colspan="4"><h3><span class="flag_16 left_16 australia_16_left">Australia - Queensland NPL Youth League</span></h3></th>
<th class="competition-link"> <span>More…</span></th>
</tr>
The id should be date_matches-1886, not tr as in your example. You may be interested in the function find_element_by_tag_name, but I think find_elements_by_class_name('clickable') is a better fit in your case. Note that the find_elements_* functions return a list, so click the individual items rather than the list itself. Have a look at Locating Elements for more.
from selenium import webdriver
PATH = "C:/Users/Downloads/chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get('https://int.soccerway.com/')
link = driver.find_elements_by_class_name('clickable')
link[0].click()
link[2].click()
I use Python Selenium for web scraping. I would like to collect links that both fall on a specific date (like [01-20]) and have a title containing specific text (like 'public'). How can the code satisfy both conditions?
I tried the following, but no luck.
Thank you in advance!
href:
<td width="89%" height="26">
sth sth public
</td>
<td width="8%" align="center">[01-20]</td>
<tr>
code:
titles = driver.find_elements_by_css_selector("[title*='public']")
for title in titles:
    links=[title.get_attribute('href') for title in driver.find_elements_by_xpath("//td[text()='[01-20]']/preceding::td[1]/a")]
urls = [links.get_attribute("href") for links in driver.find_elements_by_css_selector("[title*='public']")]
for url in urls:
    print(url)
    driver.get(url)
    ###do something
Combine both conditions in a single XPath using the contains() function:
'//td[text()="[01-20]"]/preceding::td[1]/a[contains(@title, "资本")]'
check this video for more info
EDIT: changed xpath to a working answer
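The XPath above applies both conditions in one expression. The same AND logic can be checked offline; here is a rough sketch using the standard-library xml.etree.ElementTree on a made-up table. The hrefs, the titles, the <a> elements, and the matching_links helper are all assumptions for illustration, not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical table, loosely modelled on the question's markup.
TABLE = """
<table>
  <tr>
    <td><a href="/a.html" title="sth sth public">sth sth public</a></td>
    <td>[01-20]</td>
  </tr>
  <tr>
    <td><a href="/b.html" title="sth private">sth private</a></td>
    <td>[01-20]</td>
  </tr>
  <tr>
    <td><a href="/c.html" title="other public">other public</a></td>
    <td>[01-21]</td>
  </tr>
</table>
"""


def matching_links(table_xml, date, keyword):
    """Return hrefs of links whose row carries `date` and whose title contains `keyword`."""
    root = ET.fromstring(table_xml.strip())
    hrefs = []
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) < 2:
            continue
        link = cells[0].find("a")
        date_text = (cells[1].text or "").strip()
        # Both conditions must hold for the same row.
        if link is not None and date_text == date and keyword in link.get("title", ""):
            hrefs.append(link.get("href"))
    return hrefs


links = matching_links(TABLE, "[01-20]", "public")
print(links)
```

Only the first row satisfies both the date and the title condition, so only its href is kept.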
Given this code ("sleep" instances used to help display what's going on):
from splinter import Browser
import time
with Browser() as browser:
    # Visit URL
    url = "https://mdoe.state.mi.us/moecs/PublicCredentialSearch.aspx"
    browser.visit(url)
    browser.fill('ctl00$ContentPlaceHolder1$txtCredentialNumber', 'IF0000000262422')
    # Find and click the 'search' button
    button = browser.find_by_name('ctl00$ContentPlaceHolder1$btnSearch')
    # Interact with elements
    button.first.click()
    time.sleep(5)
    # Only click the link next to "Professional Teaching Certificate Renewal"
    certificate_link = browser.find_by_xpath("//td[. = 'Professional Teaching Certificate Renewal']/following-sibling::td/a")
    certificate_link.first.click()
    time.sleep(10)
I am now trying to get the values from the table that shows after this code runs. I am not well-versed in xpath commands, but based on the response to this question, I have tried these, to no avail:
name = browser.find_by_xpath("//td[. ='Name']/following-sibling::td/a")
name = browser.find_by_xpath("//td[. ='Name']/following-sibling::td/[1]")
name = browser.find_by_xpath("//td[. ='Name']/following-sibling::td/[2]")
I tried [2] because I do notice a colon (:) sibling character between "Name" and the cell containing the name. I just want the string value of the name itself (and all other values in the table).
I do notice a different structure (span is used within td instead of just td) in this case (I also tried td span[. ='Name']... but no dice):
Updated to show more detail
<tr>
<td>
<span class="MOECSBold">Name</span>
</td>
<td>:</td>
<td>
<span id="ContentPlaceHolder1_lblName" class="MOECSNormal">MICHAEL WILLIAM LANCE </span>
</td>
</tr>
This ended up working:
browser.find_by_xpath("//td[span='Name']/following-sibling::td")[1].value
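The index [1] works because following-sibling::td matches two cells, the colon cell and then the value cell. For collecting every label/value pair rather than just the name, a sketch along these lines may help. It uses the standard-library xml.etree.ElementTree on the row from the question instead of splinter; the wrapping <table> element and the table_to_dict helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The row from the question, wrapped in a <table> so it parses on its own.
ROWS = """
<table>
  <tr>
    <td><span class="MOECSBold">Name</span></td>
    <td>:</td>
    <td><span id="ContentPlaceHolder1_lblName" class="MOECSNormal">MICHAEL WILLIAM LANCE </span></td>
  </tr>
</table>
"""


def table_to_dict(table_xml):
    """Map each row's first-cell label to its last-cell value, skipping the ':' cell."""
    root = ET.fromstring(table_xml.strip())
    result = {}
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) < 3:
            continue  # not a label / ':' / value row
        label = "".join(cells[0].itertext()).strip()
        value = "".join(cells[-1].itertext()).strip()
        result[label] = value
    return result


details = table_to_dict(ROWS)
print(details["Name"])
```

itertext() gathers the text of the nested <span>, so it does not matter whether the value sits directly in the <td> or one level deeper.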
I'm currently using selenium with python 2.7 and I'm trying to insert a password into the following form:
<tr id="mockpass">
<td>
<input type="text" value="something1" onfocus="document.getElementById('mockpass').style.display='none';
document.getElementById('realpass').style.display=''; document.getElementById('Irealpass').focus();">
</td>
</tr>
<tr id="realpass" style="display: none;">
<td>
<input type="password" name="Password" id="Irealpass" onblur="if(this.value=='') {document.getElementById('mockpass').style.display='';
document.getElementById('realpass').style.display='none';}">
</td>
</tr>
I tried using the following code, but I keep getting an error while trying to execute the clear command:
passBoxXpath='//*[@id="mockpass"]/td/input'
passBoxElement = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath(passBoxXpath))
passBoxElement.click()
passElement = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath('//*[@name="Password"]'))
passElement = driver.execute_script("arguments[0].style.display = 'block'; return arguments[0];",
passElement)
passElement.clear()
passElement.send_keys("myPassword")
The error:
raise exception_class(message, screen, stacktrace)
InvalidElementStateException: Message: Element is not currently interactable and may not be manipulated
I'm not sure if it has something to do with the focus or the blur handlers that change the element, but I get the element and then fail to access it.
Thanks in advance!
update: the next line solved my case (still don't know why it didn't work):
driver.execute_script('document.getElementById("Irealpass").setAttribute("value","myPassword");')
In this way I didn't need to use the passBoxElement at all or changing the display style.
According to the docs http://selenium-python.readthedocs.io, this code should work.
elem = driver.find_element_by_css_selector("#mockpass input:first-child")
If you get the error anyway, scroll the browser window to the element you are trying to access.
A couple things...
First off, if you are trying to perform user scenarios, you want to avoid using JavascriptExecutor (JSE). JSE allows you to do things on a page that an actual user cannot. Avoid using JSE unless you absolutely have to or if you don't care about user scenarios.
The problem is that the input that you want is hidden in the HTML you provided. You can see that in the 2nd TR, style="display: none;". If you look in the HTML of the first INPUT, you will see that the onfocus hides the first TR
onfocus="document.getElementById('mockpass').style.display='none';
and then unhides the 2nd TR
document.getElementById('realpass').style.display='';
So what you need to do is to focus the first INPUT which will expose the second INPUT.
One thing to note: from the JS in the second INPUT, it looks like if the value is empty, it will rehide the second INPUT and expose the first (undoing what you just did).
onblur="if(this.value=='') {docum ...
I would do something like this
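from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
# element_to_be_clickable takes a single (By, value) locator tuple
wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="mockpass"]/td/input'))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Irealpass"]'))).send_keys("myPassword")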
The following should work:
passwdEle = self.driver.find_element(by='id', value='Irealpass')
self.driver.execute_script('arguments[0].setAttribute("value","****")', passwdEle)
Here is an example web page I am trying to get data from.
http://www.makospearguns.com/product-p/mcffgb.htm
The xpath was taken from chrome development tools, and firepath in firefox is also able to find it, but using lxml it just returns an empty list for 'text'.
from lxml import html
import requests
site_url = 'http://www.makospearguns.com/product-p/mcffgb.htm'
xpath = '//*[@id="v65-product-parent"]/tbody/tr[2]/td[2]/table[1]/tbody/tr/td/table/tbody/tr[2]/td[2]/table/tbody/tr[1]/td[1]/div/table/tbody/tr/td/font/div/b/span/text()'
page = requests.get(site_url)
tree = html.fromstring(page.text)
text = tree.xpath(xpath)
Printing out the tree text with
print(tree.text_content().encode('utf-8'))
shows that the data is there, but it seems the xpath isn't working to find it. Is there something I am missing? Most other sites I have tried work fine using lxml and the xpath taken from chrome dev tools, but a few I have found give empty lists.
1. Browsers frequently change the HTML
Browsers quite frequently change the HTML served to them to make it "valid". For example, if you serve a browser this invalid HTML:
<table>
<p>bad paragraph</p>
<tr><td>Note that cells and rows can be unclosed (and valid) in HTML
</table>
To render it, the browser is helpful and tries to make it valid HTML and may convert this to:
<p>bad paragraph</p>
<table>
<tbody>
<tr>
<td>Note that cells and rows can be unclosed (and valid) in HTML</td>
</tr>
</tbody>
</table>
The above is changed because <p>aragraphs cannot be inside <table>s and <tbody>s are recommended. What changes are applied to the source can vary wildly by browser. Some will put invalid elements before tables, some after, some inside cells, etc...
2. Xpaths aren't fixed, they are flexible in pointing to elements.
Using this 'fixed' HTML:
<p>bad paragraph</p>
<table>
<tbody>
<tr>
<td>Note that cells and rows can be unclosed (and valid) in HTML</td>
</tr>
</tbody>
</table>
If we try to target the text of <td> cell, all of the following will give you approximately the right information:
//td
//tr/td
//tbody/tr/td
/table/tbody/tr/td
/table//*/text()
And the list goes on...
However, a browser will generally give you the most precise (and least flexible) XPath, one that lists every element from the DOM. In this case (XPath positions start at 1, not 0):
/table[1]/tbody[1]/tr[1]/td[1]/text()
3. Conclusion: Browser-given XPaths are usually unhelpful
This is why the XPaths produced by developer tools frequently fail when you apply them to the raw HTML.
The solution: always refer to the raw HTML and use a flexible but precise XPath.
Examine the actual HTML that holds the price:
<table border="0" cellspacing="0" cellpadding="0">
<tr>
<td>
<font class="pricecolor colors_productprice">
<div class="product_productprice">
<b>
<font class="text colors_text">Price:</font>
<span itemprop="price">$149.95</span>
</b>
</div>
</font>
<br/>
<input type="image" src="/v/vspfiles/templates/MAKO/images/buttons/btn_updateprice.gif" name="btnupdateprice" alt="Update Price" border="0"/>
</td>
</tr>
</table>
If you want the price, there is actually only one place to look!
//span[@itemprop="price"]/text()
And this will return:
$149.95
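As a quick offline check of that short XPath, here is a sketch that runs it against a trimmed copy of the price markup above. It uses the standard-library xml.etree.ElementTree, which only works because this fragment happens to be well-formed; a real page would normally need an HTML parser such as lxml.html. The browser_style path is a made-up, shortened stand-in for a browser-copied path that insists on tbody.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the price markup from the raw page source.
PRICE_HTML = """
<table border="0" cellspacing="0" cellpadding="0">
  <tr>
    <td>
      <font class="pricecolor colors_productprice">
        <div class="product_productprice">
          <b>
            <font class="text colors_text">Price:</font>
            <span itemprop="price">$149.95</span>
          </b>
        </div>
      </font>
    </td>
  </tr>
</table>
"""

root = ET.fromstring(PRICE_HTML.strip())

# Flexible but precise: any span carrying itemprop="price".
span = root.find(".//span[@itemprop='price']")
price = span.text

# A rigid path that expects a tbody finds nothing in the raw markup.
browser_style = root.find("./tbody/tr/td/font/div/b/span")

print(price)
print(browser_style)
```

The attribute-based lookup does not care how many wrapper elements sit between the table and the span, which is what makes it survive differences between the raw source and the browser's DOM.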
The XPath is simply wrong
Here is a snippet from the page:
<form id="vCSS_mainform" method="post" name="MainForm" action="/ProductDetails.asp?ProductCode=MCFFGB" onsubmit="javascript:return QtyEnabledAddToCart_SuppressFormIE();">
<img src="/v/vspfiles/templates/MAKO/images/clear1x1.gif" width="5" height="5" alt="" /><br />
<table width="100%" cellpadding="0" cellspacing="0" border="0" id="v65-product-parent">
<tr>
<td colspan="2" class="vCSS_breadcrumb_td"><b>
Home >
You can see that the element with id "v65-product-parent" is of type table and has a subelement tr.
There can be only one element with such an id (otherwise it would be broken XML).
The XPath expects tbody as a child of the given element (table), and there is none in the whole page.
This can be tested by
>>> "tbody" in page.text
False
How did Chrome come up with that XPath?
If you simply download this page with
$ wget http://www.makospearguns.com/product-p/mcffgb.htm
and review its content, it does not contain a single element named tbody.
But if you use Chrome Developer Tools, you find some.
How did they get there?
This often happens when JavaScript comes into play and generates some page content in the browser. But as LegoStormtroopr noted, that is not the case here; this time it is the browser itself that modifies the document to make it correct.
How to get the content of a page dynamically modified within the browser?
You have to give some sort of browser a chance. E.g. if you use selenium, you get it.
byselenium.py
from selenium import webdriver
from lxml import html
url = "http://www.makospearguns.com/product-p/mcffgb.htm"
xpath = '//*[@id="v65-product-parent"]/tbody/tr[2]/td[2]/table[1]/tbody/tr/td/table/tbody/tr[2]/td[2]/table/tbody/tr[1]/td[1]/div/table/tbody/tr/td/font/div/b/span/text()'
browser = webdriver.Firefox()
browser.get(url)
# page_source is the DOM as the browser sees it, including inserted tbody elements
html_source = browser.page_source
print("test tbody", "tbody" in html_source)
tree = html.fromstring(html_source)
text = tree.xpath(xpath)
print(text)
which prints
$ python byselenium.py
test tbody True
['$149.95']
Conclusions
Selenium is great when it comes to changes made within the browser. However, it is a rather heavy tool, and if you can do it a simpler way, do so. LegoStormtroopr has proposed such a simpler solution that works on the plainly fetched web page.
I had a similar issue (Chrome inserting tbody elements when you do Copy as XPath). As others answered, you have to look at the actual page source, though the browser-given XPath is a good place to start. I've found that often, removing tbody tags fixes it, and to test this I wrote a small Python utility script to test XPaths:
#!/usr/bin/env python
import sys, requests
from lxml import html
if (len(sys.argv) < 3):
    print 'Usage: ' + sys.argv[0] + ' url xpath'
    sys.exit(1)
else:
    url = sys.argv[1]
    xp = sys.argv[2]
page = requests.get(url)
tree = html.fromstring(page.text)
nodes = tree.xpath(xp)
if (len(nodes) == 0):
    print 'XPath did not match any nodes'
else:
    # tree.xpath(xp) produces a list, so always just take first item
    print (nodes[0]).text_content().encode('ascii', 'ignore')
(that's Python 2.7, in case the non-function "print" didn't give it away)
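The "remove the tbody tags" step mentioned above can be automated before feeding a browser-copied XPath to a script like this one. Below is a minimal sketch in Python 3; strip_tbody is a hypothetical helper name, and the approach assumes the raw page source has no tbody elements at all (if the source does contain them, stripping would break the path instead).

```python
import re


def strip_tbody(xpath):
    """Remove every /tbody step (with an optional [n] position) from an XPath string."""
    # The lookahead keeps us from touching names that merely start with "tbody".
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


# Shortened version of the browser-copied XPath from the question.
browser_xpath = '//*[@id="v65-product-parent"]/tbody/tr[2]/td[2]/table[1]/tbody/tr/td'
cleaned = strip_tbody(browser_xpath)
print(cleaned)
```

Checking `"tbody" in page.text` first, as shown in an earlier answer, is a reasonable guard for deciding whether to apply the stripping at all.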