I'm trying to hover over not just one point but multiple points, one by one. A point here means a user's profile image (there are 5 of them on each page).
The reason I'm doing this is that I'm trying to parse each user's profile link.
The tricky part is that the HTML is hidden; in other words, it doesn't show up unless I hover over each user's profile picture.
Let me jump straight to my code.
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from bs4 import BeautifulSoup
from selenium import webdriver
#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")
#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",options=option)
#Get the link
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html")
#This is the first time for me to use Xpath so please understand if there's something wrong with my code
profile=driver.find_element_by_xpath("//div[@class='mainContent']")
profile_pic=profile.find_element_by_xpath("//div[@class='ui_avatar large']")
ActionChains(driver).move_to_element(profile_pic).perform()
ActionChains(driver).move_to_element(profile_pic).click().perform()
profile_box=driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]').get_attribute("href")
print (profile_box)
How do I hover over multiple users (who have the same XPath) in this case?
===============================updated codes==========================
import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

num_page=0
#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")
#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",options=option)
driver.implicitly_wait(10)
#Type in the URL you want to visit
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-Groove_Stone_Getaway-Asheville_North_Carolina.html")
driver.maximize_window()
time.sleep(5)
#Loop over multiple pages.
for j in range(1,16,1):
    time.sleep(5)
    try:
        #Find all the comments or profile pics.
        profile_pic = driver.find_elements(By.XPATH,"//div[@class='prw_rup prw_reviews_member_info_hsx']//div[@class='ui_avatar large']")
        time.sleep(3)
        for i in profile_pic:
            #Hover over and click the profile pics one by one.
            ActionChains(driver).move_to_element(i).perform()
            time.sleep(2)
            ActionChains(driver).move_to_element(i).click().perform()
            time.sleep(4)
            #Print the href or link value.
            profile_box=driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]').get_attribute("href")
            time.sleep(3)
            print (profile_box)
    except:
        pass
    #Click the Next button to go to the next page.
    link = driver.find_element_by_link_text('Next')
    #Another element is covering the element you are to click,
    #so use execute_script() to click on it.
    driver.execute_script("arguments[0].click();", link)
    #After a certain number of pages, break out of the loop.
    num_page=num_page+1
    if num_page==14:
        break
Thanks to Yosuva A, I was able to solve how to hover over multiple users on the same page and parse the data. I then tried to develop the code further so that it loops over multiple pages (each page includes 5 users).
My updated code does iterate through multiple pages, but at some random point it starts parsing only the same user's profile link.
Here's the output example I get:
https://www.tripadvisor.com/Profile/Cftra
https://www.tripadvisor.com/Profile/jessicarZ577PF
https://www.tripadvisor.com/Profile/BackPacker115730
https://www.tripadvisor.com/Profile/nanm
https://www.tripadvisor.com/Profile/kukimama
https://www.tripadvisor.com/Profile/ThreeColeys
https://www.tripadvisor.com/Profile/AlanS990
https://www.tripadvisor.com/Profile/S5227HKlisas
https://www.tripadvisor.com/Profile/H1493VRmatthewt
https://www.tripadvisor.com/Profile/H1493VRmatthewt
https://www.tripadvisor.com/Profile/H1493VRmatthewt
https://www.tripadvisor.com/Profile/H1493VRmatthewt
https://www.tripadvisor.com/Profile/H1493VRmatthewt
I thought I needed to add a time sleep, so I put time.sleep calls on several lines, but I'm still having the same issue. Could someone help me understand why this occurs and how to get over it?
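What I plan to try next, on the guess (not a confirmed diagnosis) that the fixed sleeps sometimes let the loop read the old page before the new reviews render, is replacing the sleep after clicking Next with an explicit staleness wait on the avatars of the page being left:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#Click Next as before, then wait until the avatars from the page we just
#left are detached from the DOM, instead of sleeping a fixed time.
link = driver.find_element_by_link_text('Next')
driver.execute_script("arguments[0].click();", link)
if profile_pic:
    WebDriverWait(driver, 10).until(EC.staleness_of(profile_pic[0]))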
Thank you.
Python example
This code will click all the profile pics one by one and it will print the href value.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome('/usr/local/bin/chromedriver') # Optional argument, if not specified will search path.
driver.implicitly_wait(15)
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html");
#finds all the comments or profile pics
profile_pic= driver.find_elements(By.XPATH,"//div[@class='prw_rup prw_reviews_member_info_hsx']//div[@class='ui_avatar large']")
for i in profile_pic:
    #clicks all the profile pic one by one
    ActionChains(driver).move_to_element(i).perform()
    ActionChains(driver).move_to_element(i).click().perform()
    time.sleep(2)
    #print the href or link value
    profile_box=driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]').get_attribute("href")
    print (profile_box)
driver.quit()
Output
https://www.tripadvisor.com/Profile/861kellyd
https://www.tripadvisor.com/Profile/JLERPercy
https://www.tripadvisor.com/Profile/rayn817
https://www.tripadvisor.com/Profile/grossla
https://www.tripadvisor.com/Profile/kapmem
Java Example
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
public class Selenium {
    public static void main(String[] args) {
        System.setProperty("webdriver.chrome.driver", "./lib/chromedriver");
        WebDriver driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
        driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html");
        //finds all the comments or profiles
        List<WebElement> profile = driver.findElements(By.xpath("//div[@class='prw_rup prw_reviews_member_info_hsx']//div[@class='ui_avatar large']"));
        for (int i = 0; i < profile.size(); i++) {
            //Hover on user profile photo
            Actions builder = new Actions(driver);
            builder.moveToElement(profile.get(i)).perform();
            builder.moveToElement(profile.get(i)).click().perform();
            //Wait for user details pop-up
            WebDriverWait wait = new WebDriverWait(driver, 10);
            wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//span//a[contains(@href,'/Profile/')]")));
            //Extract the href value
            String hrefvalue = driver.findElement(By.xpath("//span//a[contains(@href,'/Profile/')]")).getAttribute("href");
            //Print the extracted value
            System.out.println(hrefvalue);
        }
        //close the browser
        driver.quit();
    }
}
Related
I'm trying to pull the airline names and prices of a specific flight. I'm having trouble with the XPath and/or using the right HTML tags, because when I run the code below, all I get back is 14 empty lists.
from selenium import webdriver
from lxml import html
from time import sleep
driver = webdriver.Chrome(r"C:\Users\14074\Python\chromedriver")
URL = 'https://www.google.com/travel/flights/search?tfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA'
driver.get(URL)
sleep(1)
tree = html.fromstring(driver.page_source)
for flight_tree in tree.xpath('//div[@class="TQqf0e sSHqwe tPgKwe ogfYpf"]'):
    title = flight_tree.xpath('.//*[@id="yDmH0d"]/c-wiz[2]/div/div[2]/div/c-wiz/div/c-wiz/div[2]/div[2]/div/div[2]/div[6]/div/div[2]/div/div[1]/div/div[1]/div/div[2]/div[2]/div[2]/span/text()')
    price = flight_tree.xpath('.//span[contains(@data-gs, "CjR")]')
    print(title, price)
#driver.close()
This is just the first part of my code but I can't really continue without getting this to work. If anyone has some ideas on what I'm doing wrong that would be amazing! It's been driving me crazy. Thank you!
I noticed a few issues with your code. First of all, I believe that when entering this page, Google will first show you the "I agree to terms and conditions" popup before showing you the content of the page, so you need to click on that button first.
Also, you should use the find_elements_by_xpath function directly on the driver instead of on the page source, as this lets you work with the JavaScript-rendered content. You can find more info here: python tree.xpath return empty list
To get more info on how to scrape using selenium and python you could check out this guide: https://www.webscrapingapi.com/python-selenium-web-scraper/
I used the following code to scrape the titles. (I also changed the xpaths to do so, by extracting them directly from google chrome. You can do that by right clicking on an element -> inspect and in the elements tab where the element is, you can right click -> copy -> Copy xpath)
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
# I used these for the code to work on my windows subsystem linux
option = webdriver.ChromeOptions()
option.add_argument('--no-sandbox')
option.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(ChromeDriverManager().install(), options=option)
URL = 'https://www.google.com/travel/flights/search?tfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA'
driver.get(URL)
driver.find_element_by_xpath('//*[@id="yDmH0d"]/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div[1]/div/button/span').click() # this is necessary to press the "I agree" button
elements = driver.find_elements_by_xpath('//*[@id="yDmH0d"]/c-wiz[2]/div/div[2]/div/c-wiz/div/c-wiz/div[2]/div[3]/div[3]/c-wiz/div/div[2]/div[1]/div/div/ol/li')
for flight_tree in elements:
    title = flight_tree.find_element_by_xpath('.//*[@class="W6bZuc YMlIz"]').text
    print(title)
I tried the code below, with the screen maximized and explicit waits, and could successfully extract the information. Please see below.
Sample code:
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.get("https://www.google.com/travel/flights/searchtfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA")
wait = WebDriverWait(driver, 10)
titles = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div/descendant::h3")))
for name in titles:
    print(name.text)
    price = name.find_element(By.XPATH, "./../following-sibling::div/descendant::span[2]").text
    print(price)
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output:
Tokyo
₹38,473
Mumbai
₹3,515
Dubai
₹15,846
PostLinkExtraction = driver.find_element_by_xpath("//article[1]/div[3]/div[1]/div/div[2]/div[1][*[local-name()='a']]").get_attribute('href')
print (PostLinkExtraction)
I'm trying to print the href link from the time stamp on Instagram, under the first post on my timeline. The code above returns None for some reason. Below is the code for anyone who wants to run it and see where I may have gone wrong; the overall goal I want to accomplish is to extract the href link from the <time> tags. Below is an image of where the <time> tags appear in developer tools.
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from time import sleep
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
user = 'username'
passw = 'password'
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://www.instagram.com/')
driver.implicitly_wait(10)
driver.find_element_by_name('username').send_keys(user)
driver.find_element_by_name('password').send_keys(passw)
Login = "//button[#type='submit']"
sleep(2)
driver.find_element_by_xpath(Login).submit()
sleep(1)
# Logs into Instagram
print ('Logged In')
#------------------------ATTENTION
NotNow = "//button[contains(text(),'Not Now')]"
driver.find_element_by_xpath(NotNow).click()
# Clicks Pop Up
print ('Close Pop Up')
# It's weird, but the pop-up opens once, only after this page.
# If it ever causes a problem, delete one of the clicks, or have the
# first click be directed to your Instagram profile's timeline
NotNow = "//button[contains(text(),'Not Now')]"
driver.find_element_by_xpath(NotNow).click()
#Clicks Pop Up; Comment out the line above if it causes an error
print ('Close Pop Up')
#-----------------------------------
driver.refresh()
print ('refreshing')
driver.implicitly_wait(10)
PostLinkExtraction = driver.find_element_by_xpath("//article[1]/div[3]/div[1]/div/div[2]/div[1][*[local-name()='a']]").get_attribute('href')
print (PostLinkExtraction)
I found out the issue is with your XPath. Fix it and you will print out the href of your first post.
PostLinkExtraction = driver.find_element_by_xpath("//article[1]/div[3]/div[1]/div/div[2]/div[1]/a").get_attribute('href')
print (PostLinkExtraction)
Short answer: stop sticking to XPaths and find the elements you're looking for this way:
1. Put all the elements with the same tag in an array.
2. Search for the two or three attributes that make your target unique.
3. Extract it by cycling through the array, and use it.
Easy, fast and clean.
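For instance, here is a minimal sketch of that recipe applied to the timestamp link (the uniqueness check, an anchor wrapping a <time> element, is an assumption about Instagram's markup, not something confirmed in this thread):

anchors = driver.find_elements_by_tag_name('a')  # 1 - all elements with the same tag
for a in anchors:  # 3 - cycle through the array
    # 2 - the attribute that makes the target unique here: the permalink
    # anchor is the one that wraps a <time> element
    if a.find_elements_by_tag_name('time'):
        print(a.get_attribute('href'))
        break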
Page I need to scrape data from: Digikey Search result
Issue
Only 100 rows are allowed in each table, so I have to move between multiple tables using the NextPageButton.
As illustrated in the code below, I actually do that, but every time the results retrieved are from the first table; the click action ActionChains(driver).click(element).perform() doesn't move me on to the next table's results.
Keep in mind that NO new page is opened; the click is intercepted by some JavaScript that does rich UI work on the same page to load a new table of data.
My Expectations
I am just trying to validate that I can move to the next table; then I will edit the code to loop through all of them.
This piece of code should return the data in the second table of results, BUT it actually returns the values from the first table, which loaded initially with the URL. This means that either the click action didn't occur, or it did occur but the WebDriver content isn't being updated when interacting with the page's dynamic JavaScript elements.
I will appreciate any help. Thanks.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver import ActionChains
import time
import sys
url = "https://www.digikey.com/en/products/filter/coaxial-connectors-rf-terminators/382?s=N4IgrCBcoA5QjAGhDOl4AYMF9tA"
chrome_driver_path = "..PATH\\chromedriver"
chrome_options = Options()
chrome_options.add_argument("--headless")
webdriver = webdriver.Chrome(
    executable_path=chrome_driver_path,
    options=chrome_options
)
with webdriver as driver:
    wait = WebDriverWait(driver, 10)
    driver.get(url)
    wait.until(presence_of_element_located((By.CSS_SELECTOR, "tbody")))
    element = driver.find_element_by_css_selector("button[data-testid='btn-next-page']")
    ActionChains(driver).click(element).perform()
    time.sleep(10)  # too much time, I know, but it makes sure this is not a waiting issue; something needs to be updated
    results = driver.find_elements_by_css_selector("tbody")
    for count in results:
        countArr = count.text
        print(countArr)
        print()
    driver.close()
Finally found a SOLUTION!
Source of the solution.
As expected, the issue was in the clicking action itself: the click either isn't performed correctly or isn't performed at all, because another element intercepts it, as illustrated in the linked source question. Executing the click through JavaScript dispatches it directly to the button, bypassing whatever is covering it.
So the solution is to click the button using JavaScript execution.
Change line 30
ActionChains(driver).click(element).perform()
to be as following:
driver.execute_script("arguments[0].click();",element)
That's it.
I'm learning web scraping by scraping real-world data from real websites, yet I've never run into this type of issue until now.
One can usually find the HTML source code they want by right-clicking a part of the website and then clicking the inspect option. I'll jump straight to the example to explain the issue.
From the above picture, the red-marked span class is not there originally, but when I put (did not even click) my cursor on a user's name, a small box for that user pops up and that span class shows up as well. What I ultimately want to scrape is the link address for a user's profile, which is embedded inside that span class. I'm not sure, but IF I can parse that span class, I guess I can try to scrape the link address; however, I keep failing to parse that hidden span class.
I didn't expect much, but my code of course gave me an empty list, because that span class doesn't show up when my cursor is not on the user's name. Still, I show my code below to show what I've done.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")
#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",options=option)
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html")
time.sleep(3)
#parse html
html =driver.page_source
soup=BeautifulSoup(html,"html.parser")
hidden=soup.find_all("span", class_="ui_overlay ui_popover arrow_left")
print (hidden)
Are there any simple and intuitive ways to parse that hidden span class using Selenium? If I can parse it, I may use the find function to parse the link address for a user and then loop over all the users to get all the link addresses.
Thank you.
=======================updated the question by adding below===================
To add a more detailed explanation of what I want to retrieve: I want to get the link pointed at by the red arrow in the picture below. Thank you for pointing out that I needed more explanation.
==========================updated code so far=====================
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")
#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",options=option)
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html")
time.sleep(3)
profile=driver.find_element_by_xpath("//div[@class='mainContent']")
profile_pic=profile.find_element_by_xpath("//div[@class='ui_avatar large']")
ActionChains(driver).move_to_element(profile_pic).perform()
ActionChains(driver).move_to_element(profile_pic).click().perform()
#So far I can successfully hover over the first user. A few issues occur after this line.
#The error message says "type object 'By' has no attribute 'xpath'". I thought this would work, since I searched the internet for how to use this function.
waiting=wait(driver, 5).until(EC.element_to_be_clickable((By.xpath,('//span//a[contains(@href,"/Profile/")]'))))
#This also gives me an error message saying "unable to locate the element".
#Some of the ways to code this differ between Python and Java, so I searched for how to get the value of the xpath that contains "/Profile/", but it still gives me an error.
profile_box=driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]').get_attribute("href")
print (profile_box)
Also, is there any way to iterate through the XPaths in this case?
I think you can use the requests library instead of Selenium.
When you hover over a username, the browser sends a request URL like the one below.
import requests
from bs4 import BeautifulSoup
html = requests.get('https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html')
print(html.status_code)
soup = BeautifulSoup(html.content, 'html.parser')
# Find all UID of username
# Split the string "UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293" into UID, SRC
# And recombine to Request URL
name = soup.find_all('div', class_="memberOverlayLink")
for i in name:
    print(i.get('id'))
# Use url to get profile link
response = requests.get('https://www.tripadvisor.com/MemberOverlay?Mode=owa&uid=805E0639C29797AEDE019E6F7DA9FF4E&c=&src=507403702&fus=false&partner=false&LsoId=&metaReferer=')
soup = BeautifulSoup(response.content, 'html.parser')
result = soup.find('a')
print(result.get('href'))
This is the output:
200
UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293
UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293
UID_D37FB22A0982ED20FA4D7345A60B8826-SRC_511863293
UID_805E0639C29797AEDE019E6F7DA9FF4E-SRC_507403702
UID_805E0639C29797AEDE019E6F7DA9FF4E-SRC_507403702
UID_805E0639C29797AEDE019E6F7DA9FF4E-SRC_507403702
UID_6A86C50AB327BA06D3B8B6F674200EDD-SRC_506453752
UID_6A86C50AB327BA06D3B8B6F674200EDD-SRC_506453752
UID_6A86C50AB327BA06D3B8B6F674200EDD-SRC_506453752
UID_97307AA9DD045AE5484EEEECCF0CA767-SRC_500684401
UID_97307AA9DD045AE5484EEEECCF0CA767-SRC_500684401
UID_97307AA9DD045AE5484EEEECCF0CA767-SRC_500684401
UID_E629D379A14B8F90E01214A5FA52C73B-SRC_496284746
UID_E629D379A14B8F90E01214A5FA52C73B-SRC_496284746
UID_E629D379A14B8F90E01214A5FA52C73B-SRC_496284746
/Profile/JLERPercy
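To tie those two requests together, the id string can be split into its UID and SRC parts and recombined into the MemberOverlay request URL, as the comments in the code above hint. A minimal sketch (the URL template is copied from the request above; the exact query parameters are an assumption and may change):

import requests

def overlay_url(member_id):
    # member_id looks like "UID_805E0639...-SRC_507403702" (see the output above)
    uid_part, src = member_id.split('-SRC_')
    uid = uid_part.replace('UID_', '')
    return ('https://www.tripadvisor.com/MemberOverlay?Mode=owa'
            '&uid=' + uid + '&c=&src=' + src +
            '&fus=false&partner=false&LsoId=&metaReferer=')

# Example: fetch the overlay for one of the ids printed above
response = requests.get(overlay_url('UID_805E0639C29797AEDE019E6F7DA9FF4E-SRC_507403702'))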
If you want to use Selenium to get the popup box, you can use ActionChains to hover over the element.
But I think it's less efficient than using requests.
from selenium.webdriver.common.action_chains import ActionChains
ActionChains(driver).move_to_element(element).perform()
Python
The code below will extract the href value. Try it and let me know how it goes.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome('/usr/local/bin/chromedriver') # Optional argument, if not specified will search path.
driver.implicitly_wait(15)
driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html");
#finds all the comments or profile pics
profile_pic= driver.find_elements(By.XPATH,"//div[@class='prw_rup prw_reviews_member_info_hsx']//div[@class='ui_avatar large']")
for i in profile_pic:
    #clicks all the profile pic one by one
    ActionChains(driver).move_to_element(i).perform()
    ActionChains(driver).move_to_element(i).click().perform()
    #print the href or link value
    profile_box=driver.find_element_by_xpath('//span//a[contains(@href,"/Profile/")]').get_attribute("href")
    print (profile_box)
driver.quit()
Java example:
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
public class Selenium {
    public static void main(String[] args) {
        System.setProperty("webdriver.chrome.driver", "./lib/chromedriver");
        WebDriver driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
        driver.get("https://www.tripadvisor.com/VacationRentalReview-g60742-d7951369-or20-Groove_Stone_Getaway-Asheville_North_Carolina.html");
        //finds all the comments or profiles
        List<WebElement> profile = driver.findElements(By.xpath("//div[@class='prw_rup prw_reviews_member_info_hsx']//div[@class='ui_avatar large']"));
        for (int i = 0; i < profile.size(); i++) {
            //Hover on user profile photo
            Actions builder = new Actions(driver);
            builder.moveToElement(profile.get(i)).perform();
            builder.moveToElement(profile.get(i)).click().perform();
            //Wait for user details pop-up
            WebDriverWait wait = new WebDriverWait(driver, 10);
            wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//span//a[contains(@href,'/Profile/')]")));
            //Extract the href value
            String hrefvalue = driver.findElement(By.xpath("//span//a[contains(@href,'/Profile/')]")).getAttribute("href");
            //Print the extracted value
            System.out.println(hrefvalue);
        }
        //close the browser
        driver.quit();
    }
}
Output
https://www.tripadvisor.com/Profile/861kellyd
https://www.tripadvisor.com/Profile/JLERPercy
https://www.tripadvisor.com/Profile/rayn817
https://www.tripadvisor.com/Profile/grossla
https://www.tripadvisor.com/Profile/kapmem
I've written a script in Python, in combination with Selenium, to get some names and corresponding addresses displayed upon a search; the search keyword is "Saskatoon". However, the data in this case traverse multiple pages. My script does almost everything, except for one thing:
it still runs even though there are no more pages to traverse. The last page also holds the "›" sign for the next-page option, and it is not grayed out.
Here is the link: Page_link
Search_keyword: Saskatoon (in the city/town field).
Here is what I've written:
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("above_link")
time.sleep(3)
search_input = driver.find_element_by_id("cityField")
search_input.clear()
search_input.send_keys("Saskatoon")
search_input.send_keys(Keys.ENTER)
while True:
    try:
        wait.until(EC.visibility_of_element_located((By.LINK_TEXT, "›"))).click()
        time.sleep(2)
    except:
        break
driver.quit()
BTW, I've just taken the name and address part out of this script, as I suppose it is not relevant here. Thanks.
You can use the class attribute of the "›" button: on the last page it is "ng-scope disabled", while on the rest of the pages it is "ng-scope":
wait.until(EC.visibility_of_element_located((By.XPATH, "//li[@class='ng-scope']/a[.='›']"))).click()
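Here is a sketch of how that locator might slot into the loop from the question (assuming the question's imports, plus TimeoutException; an illustration, not code from the answer): once the parent <li> gains the disabled class, the XPath stops matching, the wait times out, and the loop exits.

from selenium.common.exceptions import TimeoutException

while True:
    try:
        # Matches only while the parent <li> has class "ng-scope"; on the last
        # page the class becomes "ng-scope disabled", so the wait times out.
        wait.until(EC.visibility_of_element_located((By.XPATH, "//li[@class='ng-scope']/a[.='›']"))).click()
        time.sleep(2)
    except TimeoutException:
        break
driver.quit()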