Navigate through all the members of ResearchGate with Python Selenium

I am a rookie in Python Selenium. I have to navigate through all the members on the members page of an institution on ResearchGate, which means I have to click the first member to go to their profile page and then go back to the members page to click the next member. I tried a for loop, but every time it clicks only on the first member. Could anyone please guide me? Here is what I have tried.
from selenium import webdriver
import urllib
driver = webdriver.Firefox("/usr/local/bin/")
university="Lawrence_Technological_University"
members= driver.get('https://www.researchgate.net/institution/' + university +'/members')
membersList = driver.find_element_by_tag_name("ul")
list = membersList.find_elements_by_tag_name("li")
for member in list:
    driver.find_element_by_class_name('display-name').click()
    print(driver.current_url)
    driver.back()

You are not actually using the list members in your for loop. The state of the page changes after navigating to a different page and coming back, so you need to find the elements again. One approach to handling this is given below:
for i in range(len(list)):
    membersList = driver.find_element_by_tag_name("ul")
    element = membersList.find_elements_by_tag_name("li")[i]
    element.click()
    driver.back()
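If clicking the list item itself does not open the profile, the same re-find pattern can target the display-name links from the question instead; a minimal sketch, assuming each member row exposes that class:
names = driver.find_elements_by_class_name('display-name')
for i in range(len(names)):
    # Re-locate the links on every pass; the previous references go stale after driver.back()
    names = driver.find_elements_by_class_name('display-name')
    names[i].click()
    print(driver.current_url)
    driver.back()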

Related

Unable to find element by any means

I'm kind of a newbie in Selenium; I started learning it for my job some time ago. Right now I'm working with code that will open the browser, enter the specified website, put the product's ID in the search box, search, and then open it. Once it opens the product, it needs to extract its name and price and write them to a CSV file. I'm struggling with it a bit.
The main problem right now is that Selenium is unable to open the product after searching for it. I've tried by ID, name, and class, and it still didn't work.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import csv
driver = webdriver.Firefox()
driver.get("https://www.madeiramadeira.com.br/")
assert "Madeira" in driver.title
elem = driver.find_element_by_id("input-autocomplete")
elem.clear()
elem.send_keys("525119")
elem.send_keys(Keys.RETURN)
product_link = driver.find_element_by_id('variant-url').click()
The error I get is usually this:
NoSuchElementException: Message: Unable to locate element: [id="variant-url"]
There are multiple elements with id="variant-url", so you can use an index to click on the desired element. I think you also need to handle the cookie pop-up. Check the code below; hope it helps.
# Imports needed for the snippets below
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Disable the notifications window which is displayed on the page
option = Options()
option.add_argument("--disable-notifications")
driver = webdriver.Chrome(r"Driver_Path", chrome_options=option)
driver.get("https://www.madeiramadeira.com.br/")
assert "Madeira" in driver.title
elem = driver.find_element_by_id("input-autocomplete")
elem.clear()
elem.send_keys("525119")
elem.send_keys(Keys.RETURN)
# Click the cookie pop-up accept button
accept = driver.find_element_by_xpath("//button[@id='lgpd-button-agree']")
accept.click()
Or with an explicit wait:
accept = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='lgpd-button-agree']")))
accept.click()
# Use the XPath with an index like [1], [2] to click on the specific element
product_link = driver.find_element_by_xpath("(//a[@id='variant-url'])[1]")
product_link.click()
Or with an explicit wait:
product_link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "(//a[@id='variant-url'])[1]")))
product_link.click()
I used to get this error all the time when starting. The problem is that there are probably multiple elements with the same ID "variant-url". There are two ways you can fix this:
1. By using "driver.find_elementS_by_id"
product_links = driver.find_elements_by_id('variant-url')
product_links[2].click()
This collects all the elements with id 'variant-url' into a list and then clicks the one at index 2. This works, but it is annoying to find the correct index of the button you want to click, and it also takes a long time if there are many elements with the same ID.
2. By using XPaths or CSS selectors
This way is a lot easier, as each element has a specific XPath or selector. It will look like this:
product_link = driver.find_element_by_xpath("XPATH GOES HERE").click()
To get an XPath or selector: 1. go into developer mode on your browser by inspecting the element, 2. right-click the element in the F12 menu, 3. hover over Copy, 4. move to Copy XPath and click on it.
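For example, a CSS-selector version of the same idea, assuming the product links are the a elements with the variant-url id mentioned in the question:
product_links = driver.find_elements_by_css_selector("a[id='variant-url']")
# Index into the matches just like with find_elements_by_id
product_links[0].click()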
Hope this helps you :D

Bypass an anti-scraper that changes the suffix of a URL when looping through many URLs

I've built a scraper that has a parent URL and many children. I built a list with the URLs of the children (all are https) and am looping through it. However, when I get to the second object of the loop, it adds a suffix (?Nao=0) and scrapes the parent again.
I illustrate it below:
links_products = ['https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=24',
'https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=48',
'https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=72']
from selenium import webdriver
driver = webdriver.Chrome('/home/chromedriver')
for i in links_products:
    driver.get(i)
    print(driver.current_url)
The result, which adds '?Nao=0' at the end of each URL, is:
https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=24?Nao=0
https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=48?Nao=0
https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=72?Nao=0
I've tried adding
driver.execute_script('window.history.go(-1)')
driver.refresh()
print(driver.current_url)
Then it prints the URLs I actually want to scrape:
https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=24
https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=48
https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=72
But it only scrapes the parent of the three links above, three times, namely:
https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r
Any suggestions on how to bypass this issue?
P.S. It is the same whether I go through the loop as described above or click the "next" button. It always comes back to the parent.
Just try with these URLs:
https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=0?Nao=24
Keep the Nao=0 and modify the one after ?Nao=...
It worked for me
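A minimal sketch of that suggestion, keeping Nao=0 in the path from the question and varying the value after ?Nao= (the offsets 24, 48, 72 come from the original list):
from selenium import webdriver

driver = webdriver.Chrome('/home/chromedriver')
base = 'https://www.target.com/c/grocery-deals/-/N-5xt0rZ55e6uZ55e69Z55e6tZ5tdv0r&Nao=0'
for offset in (24, 48, 72):
    # Keep Nao=0 in the path and modify only the value after ?Nao=
    driver.get(base + '?Nao=' + str(offset))
    print(driver.current_url)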

Looping through a series of web elements with a common class name

I'm writing code to do the following using Python and Selenium:
1. Go to Google Maps and search London Restaurants
2. Click on the first restaurant to view details, then go back to the previous page and click the next restaurant (i, i+1, i+2, etc.)
Note that all the clickable restaurant results have a common class name ('section-result').
However, when I'm running the code, for some reason the driver is not clicking on the restaurant to go to the details page.
I have tried the following code, which was also suggested in another forum post for this problem, but so far unsuccessfully.
I have also tried a for loop, which I have included in the code section as Option 2.
from selenium import webdriver
import random
import time
import pandas as pd
driver=webdriver.Chrome(executable_path="C:/users/usr/Desktop/chromedriver.exe")
UrlA = "https://www.google.com/maps/search/"
UrlB= "London"
UrlC="Restaurant"
UrlD= UrlA + UrlB + '+' + UrlC
driver.get("http://www.google.com/ncr") #to load page in english language
driver.get(UrlD)
time.sleep(2)
driver.maximize_window()
elements = driver.find_elements_by_class_name('section-result')
Option 1:
for i in elements:
    i.click()
    driver.back()
Option 2:
for i in range (1,20):
    elements[i].click
    driver.back
The code line (i.click()) is not responding, and instead it's going back to the previous page. Please advise the correct modification for the code.
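One pattern worth trying (a sketch, assuming the listings keep the 'section-result' class after navigating back) is to loop by index, re-find the elements on every pass, and call click() with parentheses:
elements = driver.find_elements_by_class_name('section-result')
for i in range(len(elements)):
    # Re-locate the results each time; references found before driver.back() go stale
    elements = driver.find_elements_by_class_name('section-result')
    elements[i].click()  # note the parentheses: .click without them does nothing
    time.sleep(2)
    driver.back()
    time.sleep(2)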

Selenium Python: StaleElementReferenceException with a twist

I'm running into the infamous StaleElementReferenceException error with Selenium. I've checked previous questions on the subject, and the common solution is to add an implicit wait, explicit wait, or time.sleep to give the website time to load. I've tried this, but I am still experiencing the error. Can anyone tell what the issue is?
Here is my code:
links = driver.find_elements_by_css_selector("a.overline-productName")
time.sleep(2)
#finds pricing data of links on page
link_count = 0
for element in links:
    links[link_count].click()
    cents = driver.find_element_by_css_selector("span.cents")
    dollar = driver.find_element_by_css_selector("span.dollar")
    text_price = dollar.text + "." + cents.text
    price = float(text_price)
    print(price)
    print(link_count)
    driver.execute_script("window.history.go(-1)")
    link_count = link_count + 1
    time.sleep(5)
What am I missing?
You're storing your links in a list. The second you follow a link to another page, that set of links is stale. So the next iteration in your loop will attempt to click a stale link from the list.
Even if you go back in history as you do later, that original element reference is gone.
Your best bet is to loop through based on index, and find the links each time you return to the page.
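A minimal sketch of that approach, reusing the selectors from the question:
links = driver.find_elements_by_css_selector("a.overline-productName")
for i in range(len(links)):
    # Re-find the links after every return to the listing page, so no stale reference is reused
    links = driver.find_elements_by_css_selector("a.overline-productName")
    links[i].click()
    cents = driver.find_element_by_css_selector("span.cents")
    dollar = driver.find_element_by_css_selector("span.dollar")
    price = float(dollar.text + "." + cents.text)
    print(price, i)
    driver.back()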

Getting an element by attribute and using driver to click on a child element in web scraping - Python

Webscraping results from Indeed.com
- Searching 'Junior Python' in 'Los Angeles, CA' (Done)
- Sometimes a popup window opens. Close the window if the popup occurs. (Done)
- The top 3 results are sponsored, so skip these and go to the real results
- Click on the result summary section, which opens up a side panel with the full summary
- Scrape the full summary
- When the result summary is clicked, the URL changes. Rather than opening a new window, I would like to scrape the side panel full summary
- Each real result is under ('div':{'data-tn-component':'organicJob'}). I am able to get the job title, company, and short summary using BeautifulSoup. However, I would like to get the full summary in the side panel.
Problem
1) When I try to click on the link (using Selenium) (the job title or the short summary, which opens up the side panel), the code only ends up clicking on the 1st link, which is a sponsored one. I am unable to locate and click on a real result under id='jobOrganic'.
2) Once a real result is clicked on (manually), I can see that the full summary side panel is under <td id='auxCol'> and, within this, under <div id='vjs-desc'>. The full summary is contained within the <p> tags. When I try to have Selenium scrape the full summary using findAll('div',{'id':'vjs-desc'}), all I get is an empty result [].
3) The URL also changes when the side panel is opened. I tried using Selenium to have the driver get the new URL and then soup that URL to get results, but all I'm getting is the 1st sponsored result, which is not what I want. I'm not sure why BeautifulSoup keeps getting the sponsored result, even when I'm running the code under the id='jobOrganic' real results.
Here is my code. I've been working on this for almost the past two days, went through Stack Overflow, documentation, and Google, but was unable to find the answer. I'm hoping someone can point out what I'm doing wrong and why I'm unable to get the full summary.
Thanks and sorry for this being so long.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as bs
url = 'https://www.indeed.com/'
driver = webdriver.Chrome()
driver.get(url)
whatinput = driver.find_element_by_id('text-input-what')
whatinput.send_keys('Junior Python')
whereinput = driver.find_element_by_id('text-input-where')
whereinput.click()
whereinput.clear()
whereinput.send_keys('Los Angeles, CA')
findbutton = driver.find_element_by_xpath('//*[@id="whatWhere"]/form/div[3]/button')
findbutton.click()
try:
    popup = driver.find_element_by_id('prime-popover-close-button')
    popup.click()
except:
    pass
This is where I'm stuck. The result summary is under {'data-tn-component':'organicJob'}, span class='summary'. Once I click on this, the side panel opens up.
soup = bs(driver.page_source,'html.parser')
contents = soup.findAll('div',{"data-tn-component":"organicJob"})
for each in contents:
    summary = driver.find_element_by_class_name('summary')
    summary.click()
This opens the side panel, but it clicks the first sponsored link on the whole page, not the real result. This basically goes outside the 'organicJob' result set for some reason.
url = driver.current_url
driver.get(url)
I tried to get the new URL after clicking on the (sponsored) link to test whether I could even get the side panel full summary (albeit sponsored, for testing purposes).
soup=bs(driver.page_source,'html.parser')
fullsum = soup.findAll('div',{"id":"vjs-desc"})
print(fullsum)
This actually prints out the full summary of the side panel, but it keeps printing the same 1st result over and over through the whole loop, instead of moving to the next one.
The problem is that you are fetching the divs using BeautifulSoup but clicking using Selenium, which is not aware of your collected divs.
You are using the find_element_by_class_name() method of the driver object, so it searches the whole page instead of your intended div object each (in the for loop). Thus, it ends up fetching the same first result from the whole page in each iteration.
One quick workaround is possible using only Selenium (this will be slower, though):
elements = driver.find_elements_by_tag_name('div')
for element in elements:
    # get_attribute returns None for divs without the attribute, so guard before the membership test
    attr = element.get_attribute("data-tn-component")
    if attr and "organicJob" in attr:
        summary = element.find_element_by_class_name('summary')
        summary.click()
The above code will search for all the divs and iterate over them to find divs with a data-tn-component attribute containing organicJob. Once it finds one, it will search for an element with the class name summary and click on that element.
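A tighter variant, if walking every div feels heavy, is to select the organic results directly by that attribute; a sketch assuming the same data-tn-component markup:
results = driver.find_elements_by_css_selector("div[data-tn-component='organicJob']")
for result in results:
    # Each organic result carries its own summary element, so the click stays within that result
    result.find_element_by_class_name('summary').click()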
