how to scrape links from hidden span class HTML?


I'm learning web scraping by scraping real-world data from real websites.
However, I've never run into this type of issue until now.
Usually you can find the HTML you want by right-clicking part of a website and choosing the Inspect option. I'll jump straight to an example to explain the issue.
In the picture above, the span class marked in red is not there originally, but when I put my cursor on a user's name (without even clicking), a small box for that user pops up and that span class shows up. What I ultimately want to scrape is the link address of a user's profile, which is embedded inside that span class. I'm not sure, but IF I can parse that span class, I guess I can try to scrape the link address; however, I keep failing to parse that hidden span class.
I didn't expect much, and of course my code gave me an empty list, because that span class doesn't show up when my cursor is not on the user's name. But I'm showing my code so you can see what I've done.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")
#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",options=option)
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html")
time.sleep(3)
#parse html
html =driver.page_source
soup=BeautifulSoup(html,"html.parser")
hidden=soup.find_all("span", class_="ui_overlay ui_popover arrow_left")
print (hidden)
Is there a simple and intuitive way to parse that hidden span class using Selenium? If I can parse it, I can use the 'find' function to get the link address for a user and then loop over all the users to get all the link addresses.
Thank you.
=======================updated the question by adding below===================
To explain in more detail what I want to retrieve: I want the link marked with a red arrow in the picture below. Thank you for pointing out that I needed to explain more.
==========================updated code so far=====================
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")
#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",options=option)
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html")
time.sleep(3)
profile=driver.find_element_by_xpath("//div[@class='mainContent']")
profile_pic=profile.find_element_by_xpath("//div[@class='ui_avatar large']")
ActionChains(driver).move_to_element(profile_pic).perform()
ActionChains(driver).move_to_element(profile_pic).click().perform()
#So far I could successfully hover over the first user. A few issues occur after this line.
#The error message says "type object 'By' has no attribute 'xpath'". I thought this would work since I searched on the internet how to enable this function.
waiting=wait(driver, 5).until(EC.element_to_be_clickable((By.xpath,('//span//a[contains(@href,"/Profile/")]'))))
#This also gives me an error message saying "unable to locate the element".
#Some ways of writing this differ between Python and Java, so I searched for how to get the value of the XPath that contains "/Profile/", but it gives me an error.
profile_box=driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]').get_attribute("href")
print (profile_box)
Also, is there any way to iterate through xpath in this case?

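I think you can use the requests library instead of Selenium.
When you hover over a username, the browser sends a request to a URL like the one used below (you can see it as the Request URL in the Network tab of the developer tools).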
import requests
from bs4 import BeautifulSoup
html = requests.get('https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html')
print(html.status_code)
soup = BeautifulSoup(html.content, 'html.parser')
# Find all UID of username
# Split the string "UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293" into UID, SRC
# And recombine to Request URL
name = soup.find_all('div', class_="memberOverlayLink")
for i in name:
    print(i.get('id'))
# Use url to get profile link
response = requests.get('https://www.tripadvisor.com/MemberOverlay?Mode=owa&uid=805E0639C29797AEDE019E6F7DA9FF4E&c=&src=507403702&fus=false&partner=false&LsoId=&metaReferer=')
soup = BeautifulSoup(response.content, 'html.parser')
result = soup.find('a')
print(result.get('href'))
This is the output:
200
UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293
UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293
UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293
UID_805E0639C29797AEDE019E6F7DA9FF4E-SRC_507403702
UID_805E0639C29797AEDE019E6F7DA9FF4E-SRC_507403702
UID_805E0639C29797AEDE019E6F7DA9FF4E-SRC_507403702
UID_6A86C50AB327BA06D3B8B6F674200EDD-SRC_506453752
UID_6A86C50AB327BA06D3B8B6F674200EDD-SRC_506453752
UID_6A86C50AB327BA06D3B8B6F674200EDD-SRC_506453752
UID_97307AA9DD045AE5484EEEECCF0CA767-SRC_500684401
UID_97307AA9DD045AE5484EEEECCF0CA767-SRC_500684401
UID_97307AA9DD045AE5484EEEECCF0CA767-SRC_500684401
UID_E629D379A14B8F90E01214A5FA52C73B-SRC_496284746
UID_E629D379A14B8F90E01214A5FA52C73B-SRC_496284746
UID_E629D379A14B8F90E01214A5FA52C73B-SRC_496284746
/Profile/JLERPercy
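The comments in the code above describe splitting each id into its UID and SRC parts and recombining them into the request URL, but the loop only prints the ids and the request itself uses a hard-coded URL. A minimal sketch of that recombination step follows. It assumes the id format shown in the output and copies the query parameters from the hard-coded MemberOverlay URL; the helper names unique_ids and overlay_url are hypothetical, and whether TripAdvisor still serves this endpoint has not been verified.

```python
import re
from urllib.parse import urlencode

# Matches ids like "UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293"
# (format taken from the printed output above).
MEMBER_ID = re.compile(r"^UID_([0-9A-F]+)-SRC_(\d+)$")


def unique_ids(ids):
    """Drop repeated ids while keeping first-seen order.

    Each id is printed three times above, so without this step every
    profile would be requested three times.
    """
    return list(dict.fromkeys(ids))


def overlay_url(member_id):
    """Build a MemberOverlay request URL from a memberOverlayLink id.

    The query parameters are copied from the hard-coded URL in the answer;
    that the endpoint still accepts them is an untested assumption.
    """
    match = MEMBER_ID.match(member_id)
    if match is None:
        raise ValueError(f"unexpected id format: {member_id!r}")
    uid, src = match.groups()
    params = {
        "Mode": "owa", "uid": uid, "c": "", "src": src,
        "fus": "false", "partner": "false", "LsoId": "", "metaReferer": "",
    }
    return "https://www.tripadvisor.com/MemberOverlay?" + urlencode(params)


ids = [
    "UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293",
    "UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293",
    "UID_805E0639C29797AEDE019E6F7DA9FF4E-SRC_507403702",
]
for member_id in unique_ids(ids):
    print(overlay_url(member_id))
```

In the answer's script, the ids would come from [i.get('id') for i in name], and each resulting URL would then be fetched with requests.get and parsed with soup.find('a') as shown above.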
If you want to use Selenium to get the pop-up box,
you can use ActionChains to hover over the element.
But I think it's less efficient than using requests.
from selenium.webdriver.common.action_chains import ActionChains
# element is the WebElement to hover over (e.g. a user's name or avatar)
ActionChains(driver).move_to_element(element).perform()

Python
The code below extracts the href value. Try it and let me know how it goes.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome('/usr/local/bin/chromedriver') # Optional argument, if not specified will search path.
driver.implicitly_wait(15)
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html")
#finds all the comments or profile pics
profile_pic= driver.find_elements(By.XPATH,"//div[@class='prw_rup prw_reviews_member_info_hsx']//div[@class='ui_avatar large']")
for i in profile_pic:
    #clicks all the profile pic one by one
    ActionChains(driver).move_to_element(i).perform()
    ActionChains(driver).move_to_element(i).click().perform()
    #print the href or link value
    profile_box=driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]').get_attribute("href")
    print (profile_box)
driver.quit()
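The locators above depend on two details of XPath syntax in Python Selenium: the locator constant is the upper-case By.XPATH (Python's By class has no xpath attribute, which is the error reported in the question), and @ selects an attribute, as in @class or @href. The sketch below uses the standard library's xml.etree.ElementTree, which implements only a subset of XPath (it has no contains(), for example), on a made-up snippet that borrows the class names above; it is not TripAdvisor's real markup. It shows that [@class='...'] compares the entire attribute string.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup loosely modelled on the class names used above;
# it is NOT TripAdvisor's real HTML.
SNIPPET = """
<div class="mainContent">
  <div class="prw_rup prw_reviews_member_info_hsx">
    <div class="ui_avatar large"><img src="user1.jpg"/></div>
  </div>
  <div class="prw_rup prw_reviews_member_info_hsx">
    <div class="ui_avatar large"><img src="user2.jpg"/></div>
  </div>
  <div class="ui_avatar small"><img src="other.jpg"/></div>
</div>
"""

root = ET.fromstring(SNIPPET)

# "@class" selects the class attribute; [@class='...'] must equal the whole
# attribute value, so the "ui_avatar small" div is not matched.
AVATAR_PATH = (".//div[@class='prw_rup prw_reviews_member_info_hsx']"
               "//div[@class='ui_avatar large']")
avatars = root.findall(AVATAR_PATH)

for avatar in avatars:
    print(avatar.find("img").get("src"))
```

Because the comparison is on the full string, a partial value such as [@class='ui_avatar'] matches nothing here; XPath engines that support contains(), like the one Selenium uses, are the usual way to match part of an attribute.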
Java example:
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
public class Selenium {
    public static void main(String[] args) {
        System.setProperty("webdriver.chrome.driver", "./lib/chromedriver");
        WebDriver driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
        driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html");
        //finds all the comments or profiles
        List<WebElement> profile= driver.findElements(By.xpath("//div[@class='prw_rup prw_reviews_member_info_hsx']//div[@class='ui_avatar large']"));
        for(int i=0;i<profile.size();i++)
        {
            //Hover on user profile photo
            Actions builder = new Actions(driver);
            builder.moveToElement(profile.get(i)).perform();
            builder.moveToElement(profile.get(i)).click().perform();
            //Wait for user details pop-up
            WebDriverWait wait = new WebDriverWait(driver, 10);
            wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//span//a[contains(@href,'/Profile/')]")));
            //Extract the href value
            String hrefvalue=driver.findElement(By.xpath("//span//a[contains(@href,'/Profile/')]")).getAttribute("href");
            //Print the extracted value
            System.out.println(hrefvalue);
        }
        //close the browser
        driver.quit();
    }
}
output
https://www.tripadvisor.com/Profile/861kellyd
https://www.tripadvisor.com/Profile/JLERPercy
https://www.tripadvisor.com/Profile/rayn817
https://www.tripadvisor.com/Profile/grossla
https://www.tripadvisor.com/Profile/kapmem

Related

Selenium cannot find elements

I am trying to automate retrieving data from "SAP Business Client" using Python and Selenium.
Since I cannot find the element I want even though I am sure my locator is correct, I printed out the HTML content with the following code:
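from time import sleep
from selenium import webdriver
from selenium.webdriver.edge.service import Service
from selenium.webdriver.edge.options import Options
from bs4 import BeautifulSoup as soup
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By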
EDGE_PATH = r"C:\Users\XXXXXX\Desktop\WPy64-3940\edgedriver_win64\msedgedriver"
service = Service(executable_path=EDGE_PATH)
options = Options()
options.use_chromium = True
options.add_argument("headless")
options.add_argument("disable-gpu")
cc_driver = webdriver.Edge(service = service, options=options)
cc_driver.get('https://saps4.sap.XXXX.de/sap/bc/ui5_ui5/ui2/ushell/shells/abap/FioriLaunchpad.html#Z_APSuche-display')
sleep(5)
cc_html = cc_driver.page_source
cc_content = soup(cc_html, 'html.parser')
print(cc_content.prettify())
cc_driver.close()
Now I am surprised, because the printed content is different from what Firefox's "Inspect" function shows. For example, I can find the word "Nachname" in the Firefox HTML content, but no such word exists in the content printed by the code above:
Does anyone have an idea why the printed content is different?
Thank you for any help... Gunardi

The code you get from Selenium is the code without JavaScript processing applied, so you should get the code through JavaScript, using Selenium's interaction with JavaScript:
String javascript = "return arguments[0].innerHTML";
String pageSource = (String) ((JavascriptExecutor) driver).executeScript(javascript, driver.findElement(By.tagName("html")));
pageSource = "<html>" + pageSource + "</html>";
System.out.println(pageSource);
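For readers working in Python, as the question does, the same idea can be sketched with execute_script, which runs JavaScript in the page and returns the result. This assumes driver is an already-configured Selenium WebDriver (such as cc_driver in the question) that has not been closed yet; rendered_html is a hypothetical helper name, and whether it reveals content that page_source misses on this SAP page has not been verified.

```python
def rendered_html(driver):
    """Return the page's current DOM serialized as an HTML string.

    `driver` is assumed to be a Selenium WebDriver (Chrome, Edge, ...).
    execute_script runs the JavaScript inside the page and returns its
    result. outerHTML already includes the <html> element itself, so the
    manual "<html>...</html>" wrapping used with innerHTML is not needed.
    """
    return driver.execute_script("return document.documentElement.outerHTML;")
```

With the question's code, you would call print(soup(rendered_html(cc_driver), 'html.parser').prettify()) before cc_driver.close().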

How to use selenium for webscraping google flights?

I'm trying to pull the airline names and prices for a specific flight. I'm having trouble with the XPath and/or using the right HTML tags, because when I run the code below, all I get back is 14 empty lists.
from selenium import webdriver
from lxml import html
from time import sleep
driver = webdriver.Chrome(r"C:\Users\14074\Python\chromedriver")
URL = 'https://www.google.com/travel/flights/searchtfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA'
driver.get(URL)
sleep(1)
tree = html.fromstring(driver.page_source)
for flight_tree in tree.xpath('//div[@class="TQqf0e sSHqwe tPgKwe ogfYpf"]'):
    title = flight_tree.xpath('.//*[@id="yDmH0d"]/c-wiz[2]/div/div[2]/div/c-wiz/div/c-wiz/div[2]/div[2]/div/div[2]/div[6]/div/div[2]/div/div[1]/div/div[1]/div/div[2]/div[2]/div[2]/span/text()')
    price = flight_tree.xpath('.//span[contains(@data-gs, "CjR")]')
    print(title, price)
#driver.close()
This is just the first part of my code, but I can't really continue without getting this to work. If anyone has any ideas about what I'm doing wrong, that would be amazing! It's been driving me crazy. Thank you!

I noticed a few issues with your code. First of all, I believe that when you open this page, Google will first show you an "I agree to the terms and conditions" pop-up before showing the page content, so you need to click that button first.
Also, you should call find_elements_by_xpath directly on driver instead of parsing the page content, because this also lets you reach the JavaScript-rendered content. You can find more info here: python tree.xpath return empty list
To learn more about scraping with Selenium and Python, you could check out this guide: https://www.webscrapingapi.com/python-selenium-web-scraper/
I used the following code to scrape the titles. (I also changed the XPaths, extracting them directly from Google Chrome: right-click an element -> Inspect, then in the Elements tab right-click the highlighted element -> Copy -> Copy XPath.)
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
# I used these options for the code to work on Windows Subsystem for Linux
option = webdriver.ChromeOptions()
option.add_argument('--no-sandbox')
option.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(ChromeDriverManager().install(), options=option)
URL = 'https://www.google.com/travel/flights/searchtfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA'
driver.get(URL)
driver.find_element_by_xpath('//*[@id="yDmH0d"]/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div[1]/div/button/span').click() # this is necessary to press the I agree button
elements = driver.find_elements_by_xpath('//*[@id="yDmH0d"]/c-wiz[2]/div/div[2]/div/c-wiz/div/c-wiz/div[2]/div[3]/div[3]/c-wiz/div/div[2]/div[1]/div/div/ol/li')
for flight_tree in elements:
    title = flight_tree.find_element_by_xpath('.//*[@class="W6bZuc YMlIz"]').text
    print(title)

I tried the code below, with the window maximized and explicit waits, and could successfully extract the information:
Sample code:
driver = webdriver.Chrome(driver_path)  # driver_path: path to your chromedriver executable
driver.maximize_window()
driver.get("https://www.google.com/travel/flights/searchtfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA")
wait = WebDriverWait(driver, 10)
titles = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div/descendant::h3")))
for name in titles:
    print(name.text)
    price = name.find_element(By.XPATH, "./../following-sibling::div/descendant::span[2]").text
    print(price)
Imports:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output:
Tokyo
₹38,473
Mumbai
₹3,515
Dubai
₹15,846
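The prices come back as display strings such as ₹38,473. If you want to sort or compare flights, a small helper can strip the currency symbol and thousands separators. This sketch assumes whole-number prices formatted like the output above; parse_price is a hypothetical helper, not part of Selenium, and the (title, price) pairs are the values from that output.

```python
import re


def parse_price(text):
    """Convert a displayed price such as '₹38,473' to an int (38473).

    Assumes whole-number prices with ',' as the thousands separator, as in
    the output above; formats like '1.234,56 €' would need other rules.
    """
    digits = re.sub(r"[^\d]", "", text)  # keep only the digits
    if not digits:
        raise ValueError(f"no digits in price text: {text!r}")
    return int(digits)


flights = [("Tokyo", "₹38,473"), ("Mumbai", "₹3,515"), ("Dubai", "₹15,846")]
cheapest = min(flights, key=lambda flight: parse_price(flight[1]))
print(cheapest)  # ('Mumbai', '₹3,515')
```

In the answer's loop, you could append (name.text, price) to a list instead of printing, then apply the same min or sorted call.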

Webdriver not returning some data

I am trying to get some information from a website. The Web Inspector shows the HTML source with the JavaScript-rendered content in it. So I wanted to use chromedriver to render the page in order to extract certain information that cannot be accessed by simply requesting the website.
What confuses me is that even the driver does not return anything.
My code looks like this:
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome('path/Chromedriver')
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
results = soup.find_all("tr", class_="odd")
And the website is:
https://www.amundietf.co.uk/professional/product/view/LU1681038243
Is there anything else that gets rendered into the HTML when the Web Inspector is opened that chromedriver is not able to handle?
Thanks for your answers in advance!

At a minimum, you need to accept the privacy settings and then click validateDisclaimer to enter the site:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
url = "https://www.amundietf.co.uk/professional/product/view/LU1681038243"
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.implicitly_wait(10)
driver.get(url)
driver.find_element_by_id("footer_tc_privacy_button_3").click()
driver.find_element_by_id("validateDisclaimer").click()
WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".fpFrame.fpBannerMore #blockleft>#part_principale_1")))
soup = BeautifulSoup(driver.page_source, 'html.parser')
results = soup.find_all("tr", class_="odd")
print(results)
After that, you need to wait for the page to load and define the elements you are looking for correctly.
Your question really contains several questions that should be solved one by one.
I just pointed out the first of the problems.
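The answer stresses that Selenium code has to wait for the page. WebDriverWait, used above, repeatedly checks a condition until it holds or a timeout expires, instead of sleeping for a fixed time. The sketch below shows that idea in plain Python; wait_until is a hypothetical helper for illustration only, and in real Selenium code you would keep using WebDriverWait with expected_conditions.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have
    passed. A hypothetical illustration of the polling loop behind an
    explicit wait, not Selenium's own implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

Selenium's WebDriverWait works along the same lines: it polls every 0.5 seconds by default and ignores NoSuchElementException between attempts, which is why it is preferable to a fixed time.sleep.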
Update
I solved the issue.
You will need to parse the result yourself.
So, you had these problems:
You did not click the two buttons.
You did not wait for the table you need to load.
You did not use any waits. In Selenium you must use them.
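The answer leaves parsing of results to you. If you only need the text of each cell, a dependency-free sketch with the standard library's html.parser (swapped in for BeautifulSoup here so the example is self-contained) can collect the cells of every row whose class list contains odd, which is what soup.find_all("tr", class_="odd") matches. OddRowParser is a hypothetical class name, and the sample HTML in the test is made up rather than taken from the Amundi page.

```python
from html.parser import HTMLParser


class OddRowParser(HTMLParser):
    """Collect cell text from every <tr> whose class list contains "odd".

    Hypothetical helper; it assumes a flat table (no nested tables).
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per matching row
        self._in_row = False  # currently inside a matching <tr>
        self._cell = None     # text fragments of the current <td>/<th>

    def _flush_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace, roughly as a browser renders them.
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_cell()
            classes = (dict(attrs).get("class") or "").split()
            self._in_row = "odd" in classes
            if self._in_row:
                self.rows.append([])
        elif tag in ("td", "th") and self._in_row:
            self._flush_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr":
            self._flush_cell()
            self._in_row = False

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)
```

With the answer's code, you would feed it the rendered page after the WebDriverWait line: parser = OddRowParser(), then parser.feed(driver.page_source), then print(parser.rows).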

How can I use Selenium (Python) to do a Google Search and then open the results of the first page in new tabs?

As the title says, I'd like to perform a Google Search using Selenium and then open all results of the first page in separate tabs.
Please have a look at the code; I can't get any further (it's only my third day learning Python).
Thank you for your help!
Code:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pyautogui
query = 'New Search Query'
browser = webdriver.Chrome('/Users/MYUSERNAME/Desktop/Desktop-Files/Chromedriver/chromedriver')
browser.get('http://www.google.com')
search = browser.find_element_by_name('q')
search.send_keys(query)
search.send_keys(Keys.RETURN)
element = browser.find_element_by_class_name('LC20lb')
element.click()
The reason I imported pyautogui is that I tried simulating a right-click and then "Open in new tab" for each result, but it was a little confusing :)

Forget about pyautogui; what you want to do can be done in Selenium. The same goes for most of the rest of your imports: you just do not need them. See if this code meets your needs.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
query = 'sins of a solar empire' #my query about a video game
browser = webdriver.Chrome()
browser.get('http://www.google.com')
search = browser.find_element_by_name('q')
search.send_keys(query)
search.send_keys(Keys.RETURN)
links = browser.find_elements_by_class_name('r') #I went on Google Search and found the container class for the link
for link in links:
    url = link.find_element_by_tag_name('a').get_attribute("href") #this code extracts the url of the HTML link
    browser.execute_script('''window.open("{}","_blank");'''.format(url)) # this code uses Javascript to open a new tab and open the given url in that new tab
    print(link.find_element_by_tag_name('a').get_attribute("href"))
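A variant of the loop above, sketched under the assumption that browser is a Selenium WebDriver: passing the URL as an extra argument to execute_script (the script reads it as arguments[0]) avoids building JavaScript with str.format, which breaks if a URL contains a double quote. It also skips empty hrefs and returns the list of URLs it opened. open_in_new_tabs is a hypothetical helper name.

```python
def open_in_new_tabs(driver, urls):
    """Open each URL in its own new tab using JavaScript's window.open.

    `driver` is assumed to be a Selenium WebDriver. The URL is passed as an
    argument to execute_script (read in the script as arguments[0]) rather
    than formatted into the script text. Empty or missing URLs are skipped.
    Returns the URLs that were opened, in order.
    """
    opened = []
    for url in urls:
        if not url:
            continue
        driver.execute_script("window.open(arguments[0], '_blank');", url)
        opened.append(url)
    return opened
```

With the answer's code, you might first collect hrefs = [link.find_element_by_tag_name('a').get_attribute("href") for link in links] and then call open_in_new_tabs(browser, hrefs).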

how to hover over multiple elements using Python

I'm trying to hover over not just one point but multiple points, one after another.
A point here means each user's profile image (there are 5 of them per page).
I'm doing this because I'm trying to parse each user's profile link.
The tricky part is that the HTML is hidden. In other words, it doesn't show up unless I hover over each user's profile or picture.
Let me jump straight to my code.
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from bs4 import BeautifulSoup
from selenium import webdriver
#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")
#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",options=option)
#Get the link
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html")
#This is my first time using XPath, so please bear with me if something is wrong with my code
profile=driver.find_element_by_xpath("//div[@class='mainContent']")
profile_pic=profile.find_element_by_xpath("//div[@class='ui_avatar large']")
ActionChains(driver).move_to_element(profile_pic).perform()
ActionChains(driver).move_to_element(profile_pic).click().perform()
profile_box=driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]').get_attribute("href")
print (profile_box)
How do I hover over multiple users (who share the same XPath) in this case?
===============================updated codes==========================
import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
num_page=0
#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")
#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",options=option)
driver.implicitly_wait(10)
#Type in URL you want to visit
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-Groove_Stone_Getaway-Asheville_North_Carolina.html")
driver.maximize_window()
time.sleep(5)
#loop over multiple pages.
for j in range(1,16,1):
    time.sleep(5)
    try:
        #finds all the comments or profile pics
        profile_pic= driver.find_elements(By.XPATH,"//div[@class='prw_rup prw_reviews_member_info_hsx']//div[@class='ui_avatar large']")
        time.sleep(3)
        for i in profile_pic:
            #clicks all the profile pic one by one
            ActionChains(driver).move_to_element(i).perform()
            time.sleep(2)
            ActionChains(driver).move_to_element(i).click().perform()
            time.sleep(4)
            #print the href or link value
            profile_box=driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]').get_attribute("href")
            time.sleep(3)
            print (profile_box)
    except:
        pass
    #click the next button to go to the next page.
    link = driver.find_element_by_link_text('Next')
    #Another element is covering the element you are to click.
    #You could use execute_script() to click on this.
    driver.execute_script("arguments[0].click();", link)
    #After a certain number of pages, use break function to escape from the loop.
    num_page=num_page+1
    if num_page==14:
        break
Thanks to Yosuva A, I figured out how to hover over multiple users on the same page and parse the data. I tried to extend the code so that it loops over multiple pages (each page includes 5 users).
My updated code does iterate through multiple pages, but at some random point it keeps parsing the same user profile link.
Here's an example of the output I get:
https://www.tripadvisor.com/Profile/Cftra
https://www.tripadvisor.com/Profile/jessicarZ577PF
https://www.tripadvisor.com/Profile/BackPacker115730
https://www.tripadvisor.com/Profile/nanm
https://www.tripadvisor.com/Profile/kukimama
https://www.tripadvisor.com/Profile/ThreeColeys
https://www.tripadvisor.com/Profile/AlanS990
https://www.tripadvisor.com/Profile/S5227HKlisas
https://www.tripadvisor.com/Profile/H1493VRmatthewt
https://www.tripadvisor.com/Profile/H1493VRmatthewt
https://www.tripadvisor.com/Profile/H1493VRmatthewt
https://www.tripadvisor.com/Profile/H1493VRmatthewt
https://www.tripadvisor.com/Profile/H1493VRmatthewt
I thought I needed to add time.sleep calls, so I put those in several places, but I still have the same issue. Could someone help me understand why this happens and how to get around it?
Thank you.

Python example
This code clicks each profile picture in turn and prints its href value.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome('/usr/local/bin/chromedriver') # Optional argument, if not specified will search path.
driver.implicitly_wait(15)
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html")
#finds all the comments or profile pics
profile_pic= driver.find_elements(By.XPATH,"//div[@class='prw_rup prw_reviews_member_info_hsx']//div[@class='ui_avatar large']")
for i in profile_pic:
    #clicks all the profile pic one by one
    ActionChains(driver).move_to_element(i).perform()
    ActionChains(driver).move_to_element(i).click().perform()
    time.sleep(2)
    #print the href or link value
    profile_box=driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]').get_attribute("href")
    print (profile_box)
driver.quit()
Output
https://www.tripadvisor.com/Profile/861kellyd
https://www.tripadvisor.com/Profile/JLERPercy
https://www.tripadvisor.com/Profile/rayn817
https://www.tripadvisor.com/Profile/grossla
https://www.tripadvisor.com/Profile/kapmem
Java Example
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
public class Selenium {
    public static void main(String[] args) {
        System.setProperty("webdriver.chrome.driver", "./lib/chromedriver");
        WebDriver driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
        driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html");
        //finds all the comments or profiles
        List<WebElement> profile= driver.findElements(By.xpath("//div[@class='prw_rup prw_reviews_member_info_hsx']//div[@class='ui_avatar large']"));
        for(int i=0;i<profile.size();i++)
        {
            //Hover on user profile photo
            Actions builder = new Actions(driver);
            builder.moveToElement(profile.get(i)).perform();
            builder.moveToElement(profile.get(i)).click().perform();
            //Wait for user details pop-up
            WebDriverWait wait = new WebDriverWait(driver, 10);
            wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//span//a[contains(@href,'/Profile/')]")));
            //Extract the href value
            String hrefvalue=driver.findElement(By.xpath("//span//a[contains(@href,'/Profile/')]")).getAttribute("href");
            //Print the extracted value
            System.out.println(hrefvalue);
        }
        //close the browser
        driver.quit();
    }
}
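In the updated code from the question, the profile link is looked up with driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]'), which returns the first matching anchor anywhere in the page. One plausible explanation for the repeated links, not confirmed against the live site, is that earlier pop-ups stay in the DOM, so the first match can stop being the pop-up you just opened (and the bare except: pass would hide any error along the way). The sketch below instead looks at every matching anchor and keeps only the one that is currently displayed; it assumes only the active pop-up is visible. visible_profile_href is a hypothetical helper; is_displayed() and get_attribute() are standard WebElement methods.

```python
def visible_profile_href(anchors):
    """Return the href of the one displayed profile link, or None.

    `anchors` is any iterable of objects with Selenium's WebElement methods
    is_displayed() and get_attribute(name), e.g. the list returned by
    driver.find_elements(By.XPATH, '//span//a[contains(@href,"/Profile/")]').
    Assumption: only the pop-up of the currently hovered user is visible.
    """
    # Deduplicate in case one pop-up holds several links to the same profile.
    hrefs = list(dict.fromkeys(
        a.get_attribute("href") for a in anchors if a.is_displayed()
    ))
    if len(hrefs) != 1:
        # No visible pop-up yet, or several: let the caller wait and retry.
        return None
    return hrefs[0]
```

Inside the hover loop, it could be combined with an explicit wait, since WebDriverWait.until passes the driver to the callable and returns its first truthy result: WebDriverWait(driver, 10).until(lambda d: visible_profile_href(d.find_elements(By.XPATH, '//span//a[contains(@href,"/Profile/")]'))).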
