I am trying to get span elements that share the same class name, so I can click the latest available days in the calendar.
My code is:
days = driver.find_element(By.CLASS_NAME, 'BookingCalendar-date--bookable')
avaibledays = days.find_elements(By.CLASS_NAME, 'BookingCalendar-day')
for i in avaibledays:
    print(i.text)
This works for a single element. But when I try to change the days variable like this:
days = driver.find_elements(By.CLASS_NAME,'BookingCalendar-date--bookable')
I can't get all of them.
Here is the calendar HTML.
I want to click the span inside the last element whose class name is "BookingCalendar-date--bookable".
The span names and classes are the same for the Booking-Calendar-date--unavaible cells.
So basically, I'm trying to get multiple span elements from the cells whose class name is BookingCalendar-date--bookable; the span elements all share the same class name, BookingCalendar-day.
To extract the texts from all the <td class="BookingCalendar-date--bookable " ...> elements using a list comprehension, you can use either of the following locator strategies:
Using CSS_SELECTOR:
print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "td.BookingCalendar-date--bookable span.BookingCalendar-day")])
Using XPATH:
print([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//td[@class='BookingCalendar-date--bookable ']//span[@class='BookingCalendar-day']")])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
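To actually click the latest available day (i.e. the last matching cell), a minimal sketch along the same lines, assuming the last bookable cell corresponds to the latest date:
bookable_days = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td.BookingCalendar-date--bookable span.BookingCalendar-day")))
# click the last, i.e. latest, bookable day
bookable_days[-1].click()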
Related
I'm trying to get the text inside these divs, but I'm not succeeding. They have a code alongside the class (data-v-a5b90146) that doesn't change with each execution.
<div data-v-a5b90146="" class="html-content"> Text to be captured</div>
<div data-v-a5b90146="" class="html-content"><b> TEXT WANTED </b><div><br></div>
I've tried with XPATH, but I was not successful too.
content = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '/html/body/div/div/div/div/div/div/div[1]/div[2]/div[2]/div[4]/div/div/b'))).text
You need to change a couple of things.
presence_of_all_elements_located() returns a list of elements, so you can't use .text on a list. To get the text value of each element you need to iterate over the list and then get the text.
Your XPath looks very fragile. You should use a relative XPath; since the class name is unique, you can locate by the class name.
Your code should look like this:
contents = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//div[@class="html-content"][@data-v-a5b90146]')))
for content in contents:
    print(content.text)
You can use visibility_of_all_elements_located() as well
contents = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="html-content"][@data-v-a5b90146]')))
for content in contents:
    print(content.text)
Both the <div> tags have the attribute class="html-content"
Solution
To extract the texts from the <div> tags, instead of presence_of_all_elements_located() you have to induce WebDriverWait for visibility_of_all_elements_located(), and using a list comprehension you can use either of the following locator strategies:
Using CSS_SELECTOR:
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.html-content")))])
Using XPATH:
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='html-content']")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Try to get the element first, then get the text from the element. Note that presence_of_all_elements_located() returns a list, so to read .text from a single element use presence_of_element_located() instead:
element = WebDriverWait(
    driver,
    10
).until(
    EC.presence_of_element_located((
        By.XPATH, '/html/body/div/div/div/div/div/div/div[1]/div[2]/div[2]/div[4]/div/div/b'
    ))
)
content = element.text
I would like to extract the following text fields, which are located within a <g> tag inside an <svg> tag (the URL: https://www.msci.com/our-solutions/esg-investing/esg-ratings-climate-search-tool). I put in a company name and search for it, expand the last drop-down menu, and want to extract information from the <svg>.
HTML Part I:
HTML Part II:
I tried what has been suggested here: Extracting text from svg using python and selenium
but I did not manage to get it to work.
My code:
test = driver.find_element(By.XPATH, "//*[local-name()='svg' and @class='highcharts-root']//*[local-name()='g' and @class='highcharts-axis-labels highcharts-xaxis-labels']//*[name()='text']").text
print(test)
To extract the texts e.g. Sep-18, Nov-22 you have to induce WebDriverWait for visibility_of_all_elements_located() and you can use either of the following locator strategies:
Using CSS_SELECTOR and text attribute:
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "svg.highcharts-root g.highcharts-axis-labels.highcharts-xaxis-labels text")))])
Using XPATH and get_attribute("innerHTML"):
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[local-name()='svg' and #class='highcharts-root']//*[local-name()='g' and #class='highcharts-axis-labels highcharts-xaxis-labels']//*[local-name()='text']")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
References
You can find a couple of relevant detailed discussions in:
Reading title tag in svg?
Creating XPATH for svg tag
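Since the chart is rendered only after the drop-down is expanded, an end-to-end sketch could look like the following; note that button.rating-history-toggle is only a hypothetical selector for that drop-down and would need to be replaced with the real one:
# hypothetical locator for the drop-down that reveals the chart; replace with the real one
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.rating-history-toggle"))).click()
# then collect the x-axis labels from the rendered Highcharts svg
labels = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "svg.highcharts-root g.highcharts-axis-labels.highcharts-xaxis-labels text")))
print([label.text for label in labels])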
I have a Python script that logs into our company website and then gets each row of data. It logs me in and locates the iframe where the table with the row values is located. From there I try to locate a row in the iframe table; the attached picture highlights the correct XPath expression, but I need all 31 values.
All my code works properly except this:
elem5 = browser.find_element_by_xpath("//td/label[contains(@id, 'driver')][1]")
find_element_by_* returns a single WebElement, whereas you are looking for all 31 elements. So you need find_elements*, and you can use either of the following locator strategies:
Using css_selector:
elements = driver.find_elements(By.CSS_SELECTOR, "td > label[id^='driver']")
Using xpath:
elements = driver.find_elements(By.XPATH, "//td/label[starts-with(@id, 'driver')]")
To find all the desired elements ideally you need to induce WebDriverWait for visibility_of_all_elements_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td > label[id^='driver']")))
Using XPATH:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td/label[starts-with(@id, 'driver')]")))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
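Since the table sits inside an iframe, remember to switch into that frame before looking for the labels. A sketch, where iframe#results is only a placeholder for your actual frame locator:
# switch into the iframe hosting the table; "iframe#results" is a placeholder locator
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe#results")))
labels = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td > label[id^='driver']")))
print([label.text for label in labels])  # should print all 31 values
driver.switch_to.default_content()  # switch back out of the iframe when done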
I need to get the number "3" from this HTML with python selenium
<div class="number">3</div>
This is the XPATH:
//*[#id="roulette-recent"]/div/div[1]/div[1]/div/div
I tried something like
number = navegador.find_element_by_xpath('//*[@id="roulette-recent"]/div/div[1]/div[1]/div/div').get_attribute('class')
If this XPath
//*[@id="roulette-recent"]/div/div[1]/div[1]/div/div
represents the node:
<div class="number">3</div>
and you want to extract the text from it, you should use either:
number = navegador.find_element_by_xpath('//*[@id="roulette-recent"]/div/div[1]/div[1]/div/div').get_attribute('innerText')
print(number)
or
number = navegador.find_element_by_xpath('//*[@id="roulette-recent"]/div/div[1]/div[1]/div/div').text
print(number)
I think you're looking for:
number = navegador.find_element_by_xpath('//*[@id="roulette-recent"]/div/div[1]/div[1]/div/div').text
To print the text 3 you can use either of the following Locator Strategies:
Using css_selector and get_attribute("innerHTML"):
print(navegador.find_element(By.CSS_SELECTOR, "#roulette-recent div.number").get_attribute("innerHTML"))
Using xpath and text attribute:
print(navegador.find_element(By.XPATH, "//*[@id='roulette-recent']//div[@class='number' and text()]").text)
Ideally you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR and text attribute:
print(WebDriverWait(navegador, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#roulette-recent div.number"))).text)
Using XPATH and get_attribute("innerHTML"):
print(WebDriverWait(navegador, 20).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='roulette-recent']//div[@class='number' and text()]"))).get_attribute("innerHTML"))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
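As a follow-up, if you need the value as an integer rather than a string, a small sketch assuming the element text is a plain digit:
text_value = WebDriverWait(navegador, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#roulette-recent div.number"))).text
number = int(text_value)  # e.g. "3" -> 3
print(number)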
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
References
Links to useful documentation:
get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium
I am trying to copy the href value from a website, and the html code looks like this:
<p class="sc-eYdvao kvdWiq">
<a href="https://www.iproperty.com.my/property/setia-eco-park/sale-
1653165/">Shah Alam Setia Eco Park, Setia Eco Park
</a>
</p>
I've tried driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href"), but it returned 'list' object has no attribute 'get_attribute'. Using driver.find_element_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href") returned None. But I can't use XPath because the website has 20+ hrefs, all of which I need to copy; using XPath would only copy one.
If it helps, all the 20+ hrefs are categorised under the same class, which is sc-eYdvao kvdWiq.
Ultimately I want to copy all the 20+ hrefs and export them to a CSV file.
Appreciate any help possible.
You want driver.find_elements if there is more than one element; this will return a list. For the CSS selector, you want to ensure you are selecting those classes that have a child with an href:
elems = driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq [href]")
links = [elem.get_attribute('href') for elem in elems]
You might also need a wait condition for presence of all elements located by CSS selector:
elems = WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sc-eYdvao.kvdWiq [href]")))
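These snippets assume the usual Selenium imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By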
As per the given HTML:
<p class="sc-eYdvao kvdWiq">
Shah Alam Setia Eco Park, Setia Eco Park
</p>
As the href attribute is within the <a> tag, ideally you need to go one level deeper, down to the <a> node. So to extract the value of the href attribute you can use either of the following locator strategies:
Using css_selector:
print(driver.find_element_by_css_selector("p.sc-eYdvao.kvdWiq > a").get_attribute('href'))
Using xpath:
print(driver.find_element_by_xpath("//p[@class='sc-eYdvao kvdWiq']/a").get_attribute('href'))
If you want to extract all the values of the href attribute you need to use find_elements* instead:
Using css_selector:
print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_css_selector("p.sc-eYdvao.kvdWiq > a")])
Using xpath:
print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_xpath("//p[#class='sc-eYdvao kvdWiq']/a")])
Dynamic elements
However, if you observe the values of the class attributes, i.e. sc-eYdvao and kvdWiq, those look like dynamically generated values. So to extract the href attribute you have to induce WebDriverWait for visibility_of_element_located() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a"))).get_attribute('href'))
Using XPATH:
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//p[@class='sc-eYdvao kvdWiq']/a"))).get_attribute('href'))
If you want to extract all the values of the href attribute you can use visibility_of_all_elements_located() instead:
Using CSS_SELECTOR:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a")))])
Using XPATH:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[#class='sc-eYdvao kvdWiq']/a")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
The XPath
//p[@class='sc-eYdvao kvdWiq']/a
returns the elements you are looking for.
Writing the data to a CSV file is not really part of the scraping challenge; a minimal sketch is shown below.
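For the CSV part, a sketch with Python's built-in csv module, reusing the selector from above to collect the href values:
import csv
links = [elem.get_attribute('href') for elem in driver.find_elements_by_css_selector("p.sc-eYdvao.kvdWiq > a")]
with open('links.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['href'])  # header row
    writer.writerows([link] for link in links)  # one link per row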
To crawl any hyperlink or href, the ProxyCrawl API is one option, as it provides pre-built functions for fetching the desired information; just pip install it and follow its documentation to get the required output. The second approach, fetching href links with Python Selenium, is to run the following code.
Source Code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import time
urls = ['https://www.heliosholland.com/Ampullendoos-voor-63-ampullen', 'https://www.heliosholland.com/lege-testdozen']
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 29)
for url in urls:
    driver.get(url)
    image = wait.until(EC.visibility_of_element_located((By.XPATH, '/html/body/div[1]/div[3]/div[2]/div/div[2]/div/div/form/div[1]/div[1]/div/div/div/div[1]/div/img'))).get_attribute('src')
    print(image)
To scrape the link, use .get_attribute('src').
Get all the elements you want with driver.find_elements(By.XPATH, 'path'); this returns a list.
To extract the href link, call get_attribute('href') on each element.
Which gives,
for elem in driver.find_elements(By.XPATH, 'path'):
    print(elem.get_attribute('href'))
Try something like:
elems = driver.find_elements_by_xpath("//p[contains(@class, 'sc-eYdvao') and contains(@class, 'kvdWiq')]/a")
for elem in elems:
    print(elem.get_attribute('href'))