HTML:
<g id="OpenLayers.Layer.Vector_101_vroot">
<image id="OpenLayers.Geometry.Point_259_status"..></image>
So the page generates the above, and the numeric part of the id is different on each load.
How do I locate them, or even a group of them that match the pattern, using Selenium and Python?
Use XPaths like the following:
//g[contains(@id, 'OpenLayers.Layer.Vector')]
//image[contains(@id, 'OpenLayers.Geometry.Point')]
Hope it helps!
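To see which ids a `contains(@id, ...)` test would catch without launching a browser, here is a minimal offline sketch. It uses the standard library's ElementTree with plain iteration and a substring check, so it mimics the XPath predicate rather than evaluating XPath. The markup and the numbers 101, 259 and 260 are invented for illustration. One hedged caveat: `g` and `image` are SVG elements, and a browser may not match a bare `//g` because of the SVG namespace; `//*[name()='g' and contains(@id, 'OpenLayers.Layer.Vector')]` is a commonly used workaround.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question; the numeric parts are invented
# and would change on every page load.
SVG = """
<svg>
  <g id="OpenLayers.Layer.Vector_101_vroot">
    <image id="OpenLayers.Geometry.Point_259_status"/>
    <image id="OpenLayers.Geometry.Point_260_status"/>
  </g>
</svg>
"""

def ids_containing(markup: str, tag: str, fragment: str) -> list:
    """Mimic //tag[contains(@id, fragment)] by iterating over every <tag> element."""
    root = ET.fromstring(markup.strip())
    return [el.get("id") for el in root.iter(tag) if fragment in el.get("id", "")]

print(ids_containing(SVG, "image", "OpenLayers.Geometry.Point"))
```

Because only the fixed prefix is tested, both `image` ids match regardless of their numbers.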
According to this answer, you can use a CSS3 substring-matching attribute selector.
The following code clicks an element whose id attribute contains OpenLayers.Layer.Vector.
Python
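from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get('http://localhost:1111/')
# [id*="..."] matches any element whose id attribute contains the substring
browser.find_element(By.CSS_SELECTOR, '[id*="OpenLayers.Layer.Vector"]').click()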
HTML (which is displayed in http://localhost:1111/)
<button id="OpenLayers.Layer.Vector_123" onclick="alert(1);return false">xxx</button>
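The `*=` operator has two CSS3 siblings, `^=` (starts with) and `$=` (ends with), and chaining them gives a stricter selector than a bare substring match, for example `browser.find_elements(By.CSS_SELECTOR, '[id^="OpenLayers.Geometry.Point_"][id$="_status"]')`. The sketch below emulates the three operators in plain Python to show which ids such a selector would match; it is an illustration of the selector semantics, not how Selenium or the browser evaluates them, and the sample ids are hypothetical.

```python
def css_attr_match(value: str, op: str, s: str) -> bool:
    """Emulate the CSS3 substring attribute operators on one attribute value."""
    if not s:
        # Per CSS3, an empty search string makes these selectors match nothing.
        return False
    if op == "*=":
        return s in value
    if op == "^=":
        return value.startswith(s)
    if op == "$=":
        return value.endswith(s)
    raise ValueError(f"unsupported operator: {op}")

def matches_all(value: str, conditions) -> bool:
    """Chained selectors such as [id^="a"][id$="b"] require every condition to hold."""
    return all(css_attr_match(value, op, s) for op, s in conditions)

# Equivalent of [id^="OpenLayers.Geometry.Point_"][id$="_status"]
status_point = [("^=", "OpenLayers.Geometry.Point_"), ("$=", "_status")]
ids = [
    "OpenLayers.Geometry.Point_259_status",
    "OpenLayers.Geometry.Point_259",
    "OpenLayers.Layer.Vector_101_vroot",
]
print([i for i in ids if matches_all(i, status_point)])
```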
No need for any pattern matching; you can use a module called "Beautiful Soup", which has some easy documentation here.
For example, to get all tags with id="OpenLayers.Layer.Vector_101_vroot", use:
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_as_a_string, "html.parser")  # your_html_as_a_string: the page source as a str
soup.find_all(id="OpenLayers.Layer.Vector_101_vroot")
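An exact id only finds the element for one particular page load. Beautiful Soup's `find_all` also accepts a compiled regular expression as an attribute filter, so the changing number can be matched with a pattern. The sketch below assumes the ids always have the shape prefix, digits, suffix; the pattern name `VROOT_ID` and the sample ids are hypothetical. To stay self-contained it checks the pattern with `re` alone, and the Beautiful Soup call is shown only in a comment.

```python
import re

# Hypothetical pattern: fixed prefix and suffix, any run of digits in between.
VROOT_ID = re.compile(r"^OpenLayers\.Layer\.Vector_\d+_vroot$")

# With Beautiful Soup installed, the compiled pattern can be passed directly:
#     soup.find_all(id=VROOT_ID)
# Here the same pattern is checked against sample ids using `re` only.
samples = [
    "OpenLayers.Layer.Vector_101_vroot",
    "OpenLayers.Layer.Vector_7_vroot",
    "OpenLayers.Layer.Vector_101_root",
    "OpenLayers.Geometry.Point_259_status",
]
print([s for s in samples if VROOT_ID.search(s)])
```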
Related
Let's say I have HTML code with three links:
Whatever
Whatever
Whatever
and I want to use Selenium to find all elements that have an href attribute which includes the string "hey" (in this case, the first two links). How would I write Python Selenium code that accomplishes this?
This works:
from selenium.webdriver.common.by import By
all_href = driver.find_elements(By.XPATH, "//*[contains(@href, 'hey')]")
print(len(all_href))
This XPath will do the job:
"//a[contains(@href,'hey')]"
To use it with Selenium you will need a find_elements or findElements method, depending on the language binding you use.
For Selenium in Python, this will give you the list of all such elements:
from selenium.webdriver.common.by import By
all_hey_elements = driver.find_elements(By.XPATH, "//a[contains(@href, 'hey')]")
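To check offline which links such a filter keeps, here is a small sketch using the standard library's `html.parser` instead of Selenium. It collects the href of every `<a>` whose href contains a substring, roughly what `//a[contains(@href, 'hey')]` selects. The class name `HrefCollector` and the three URLs are hypothetical, since the question's original hrefs were not preserved.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect href values of <a> tags whose href contains `needle`."""

    def __init__(self, needle: str):
        super().__init__()
        self.needle = needle
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        href = dict(attrs).get("href")
        if tag == "a" and href and self.needle in href:
            self.hrefs.append(href)

# Hypothetical links standing in for the three "Whatever" anchors.
html = """
<a href="https://example.com/hey/1">Whatever</a>
<a href="https://example.com/say-hey">Whatever</a>
<a href="https://example.com/bye">Whatever</a>
"""
collector = HrefCollector("hey")
collector.feed(html)
print(collector.hrefs)
```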
I want to scrape the URLs within the HTML of the 'Racing-Next to Go' section of www.tab.com.au.
Here is an excerpt of the HTML:
<a ng-href="/racing/2020-07-31/MACKAY/MAC/R/8" href="/racing/2020-07-31/MACKAY/MAC/R/8"><i ng-
All I want to scrape is the last bit of that HTML which is a link, so:
/racing/2020-07-31/MACKAY/MAC/R/8
I have tried to find the element by using xpath, but I can't get the URL I need.
My code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

driver = webdriver.Firefox(service=Service(r"C:\Users\Harrison Pollock\Downloads\Python\geckodriver-v0.27.0-win64\geckodriver.exe"))
driver.get('https://www.tab.com.au/')
elements = driver.find_elements(By.XPATH, '/html/body/ui-view/main/div[1]/ui-view/version[2]/div/section/section/section/race-list/ul/li[1]/a')
for e in elements:
    print(e.text)
You probably want to use get_attribute instead of .text. Documentation here.
from selenium.webdriver.common.by import By

elements = driver.find_elements(By.XPATH, '/html/body/ui-view/main/div[1]/ui-view/version[2]/div/section/section/section/race-list/ul/li[1]/a')
for e in elements:
    print(e.get_attribute("href"))
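The difference between an element's text and its href attribute can be seen offline with the standard library's ElementTree. The snippet is a simplified, well-formed stand-in for the question's excerpt (the original `<i ng-...` child was truncated), and the link text "R8" is invented. ElementTree's `.text` only holds text that appears before the first child, whereas Selenium's `.text` returns the rendered visible text, so `itertext()` is the closer analogue here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; "R8" is invented link text.
snippet = '<li><a href="/racing/2020-07-31/MACKAY/MAC/R/8"><i>R8</i></a></li>'
a = ET.fromstring(snippet).find("a")

print(repr(a.text))           # None: no text directly inside <a> before the <i>
print("".join(a.itertext()))  # R8: all descendant text
print(a.get("href"))          # /racing/2020-07-31/MACKAY/MAC/R/8: the attribute value
```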
Yes, you can use the getAttribute(attributeLocator) function (from the legacy Selenium RC API) for this:
selenium.getAttribute("//xpath@href");
Replace //xpath with the XPath of the element whose href you need.
The value /racing/2020-07-31/MACKAY/MAC/R/8 within the HTML is the value of the href attribute, not the innerText.
Solution
Instead of using the text attribute, you need to use get_attribute("href"), and the effective lines of code will be:
from selenium.webdriver.common.by import By

elements = driver.find_elements(By.XPATH, '/html/body/ui-view/main/div[1]/ui-view/version[2]/div/section/section/section/race-list/ul/li[1]/a')
for e in elements:
    print(e.get_attribute("href"))
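One hedged caveat: Selenium's `get_attribute("href")` generally returns the URL as the browser resolves it (for example `https://www.tab.com.au/racing/...`), not the relative string written in the markup, while Selenium 4's `get_dom_attribute("href")` returns the attribute as written. If only the path is wanted, `urllib.parse` can strip the scheme and host. The helper name `path_of` is hypothetical, and the absolute URL below is an assumed resolved form.

```python
from urllib.parse import urlparse

def path_of(href: str) -> str:
    """Return only the path component, whether href is absolute or already relative."""
    return urlparse(href).path

# Assumed absolute form of the relative href from the question.
print(path_of("https://www.tab.com.au/racing/2020-07-31/MACKAY/MAC/R/8"))
print(path_of("/racing/2020-07-31/MACKAY/MAC/R/8"))
```

Note that `.path` drops any query string or fragment; keep `urlparse(href).query` as well if those matter.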
I am trying to scrape a page using Scrapy Framework.
<div class="info"><span class="label">Establishment year</span> 2014</div>
The tag I want to deal with looks like the above. I want to get the value 2014. I can't use the info or label classes, as they are common throughout the page.
So I tried the XPaths below, but I am getting null:
response.xpath("//span[contains(text(),'Establishment year')]/following-sibling").get()
response.xpath("//span[contains(text(),'Establishment year')]/following-sibling::text()").get()
Any clue what can be the issue?
Since you are trying to extract the text between the tags, you should put the text step at the end. I don't know which website you are trying to scrape, but here is an example of me scraping the text inside the 'a' tag on http://books.toscrape.com/. Here is the code I used:
response.xpath("(//h3)[1]/a/text()").extract_first()
In your second line of code you did not use the text-extraction function correctly. The one you are using is for CSS selectors; for XPath it would be /text(), not ::text(). For your code, I think you should try one of these options. Let me know if it helps.
response.xpath("//span[contains(text(),'Establishment year')]/div/text()").get()
or
response.xpath("//span[contains(text(),'Establishment year')]/span/text()").get()
Extract direct text children (/text()) from the parent element:
>>> from parsel import Selector
>>> selector = Selector(text='<div class="info"><span class="label">Establishment year</span> 2014</div>')
>>> selector.xpath('//*[@class="info"]/text()').get()
' 2014'
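If the shared `info` class is too ambiguous, another option is to anchor on the span's own text and take the text node that follows it. The offline sketch below shows that idea with the standard library's ElementTree rather than parsel: ElementTree stores text following a closing tag as the element's `.tail`, which corresponds to the text node `following-sibling::text()[1]` would select in XPath. `.strip()` removes the leading space seen in `' 2014'`.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring('<div class="info"><span class="label">Establishment year</span> 2014</div>')

# Locate the span by its text rather than by the shared "info"/"label" classes.
span = next(s for s in div.iter("span") if s.text == "Establishment year")

# Text after </span> is stored as the span's .tail in ElementTree.
print(repr(span.tail))    # ' 2014'
print(span.tail.strip())  # 2014
```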
I'm trying to extract the links on this html page:
<div class="listbox">
<div class="mainbox" onclick="www.abc.com">
I've tried using:
//div[@class="listbox"]/a/text()
//div/onclick/text()
but they return an empty list.
An XPath like this should work for you:
//div/div/@onclick
or, more precisely:
//div[@class="listbox"]/div[@class="mainbox"]/@onclick
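The same lookup can be tried offline with the standard library's ElementTree, whose limited XPath supports `[@attr='value']` predicates but cannot select attributes with a trailing `/@onclick`, so the attribute is read with `.get()` instead. The question's snippet is used with closing tags added (an assumption) so it parses as XML.

```python
import xml.etree.ElementTree as ET

# The question's snippet, with closing tags added so it parses as XML.
html = """
<div class="listbox">
  <div class="mainbox" onclick="www.abc.com"></div>
</div>
"""
root = ET.fromstring(html.strip())

# Find every descendant div with class exactly "mainbox", then read onclick.
links = [div.get("onclick") for div in root.findall(".//div[@class='mainbox']")]
print(links)  # ['www.abc.com']
```

Like `@class="mainbox"` in XPath, this predicate compares the whole attribute string, so an element with `class="mainbox other"` would not match.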
In your case, you can obtain the link using Selenium and the getAttribute method.
First find the element (or the elements, and then loop over them) that has the link inside its onclick attribute, then just read it via getAttribute:
Selenium + Java:
String link = driver.findElement(By.className("mainbox")).getAttribute("onclick");
Selenium + Python:
I'm not much of a Python guy, but it should work like this:
from selenium.webdriver.common.by import By
link = driver.find_element(By.CLASS_NAME, "mainbox").get_attribute("onclick")
I have the following piece of HTML:
<p class="attrs"><span>foo:</span> <strong>foo</strong></p>
<p class="attrs"><span>bar:</span> <strong>bar</strong></p>
<p class="attrs"><span>foo2:</span> <strong></strong></p>
<p class="attrs"><span>description:</span> <strong>description body</strong></p>
<p class="attrs"><span>another foo:</span> <strong>foooo</strong></p>
I would like to get the description body using Splinter. I've managed to get a list of p elements using
browser.find_by_css("p.attrs")
xpath = '//p[@class="attrs"]/span[text()="description:"]/following-sibling::strong'
description = browser.find_by_xpath(xpath).first.text
Would you be able to get the description using find_by_tag?
Find by Tag
browser.find_by_tag('span')
Then iterate through all the span tags and look for the one whose text is 'description:'. I used the documentation here.
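The iterate-and-compare idea above can be tried offline on the question's markup. This sketch swaps in the standard library's ElementTree for Splinter purely for illustration; the helper name `value_for` is hypothetical, and the markup is wrapped in a single root `div` (an assumption) so it parses as XML. With Splinter, the same loop would run over the elements returned by `find_by_css("p.attrs")`.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in one root element so it parses as XML.
html = """<div>
<p class="attrs"><span>foo:</span> <strong>foo</strong></p>
<p class="attrs"><span>bar:</span> <strong>bar</strong></p>
<p class="attrs"><span>foo2:</span> <strong></strong></p>
<p class="attrs"><span>description:</span> <strong>description body</strong></p>
<p class="attrs"><span>another foo:</span> <strong>foooo</strong></p>
</div>"""

def value_for(markup: str, label: str):
    """Return the <strong> text of the first p.attrs whose <span> text equals label."""
    for p in ET.fromstring(markup).iter("p"):
        span, strong = p.find("span"), p.find("strong")
        if p.get("class") == "attrs" and span is not None and span.text == label:
            return strong.text if strong is not None else None
    return None

print(value_for(html, "description:"))  # description body
```

An empty `<strong></strong>`, as in the `foo2:` row, yields `None` rather than an empty string.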
You may be able to accomplish this with the following code, if you want to try a different approach using the Selenium library:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(service=Service('PATH_LOCATION_TO_CHROME_DRIVER'))
driver.find_elements(By.CLASS_NAME, "attrs")
Hope this helps! Replace PATH_LOCATION_TO_CHROME_DRIVER with the location of your ChromeDriver. If you search for it, the download should be the first or second link; then place the download inside your Python project's folder.