I am simply trying to open a web page through the Selenium WebDriver, click a button on it, interact with some elements on the second page, and so on.
I heard that Selenium works best with Python for this purpose, so I wrote my code in it, and at first it worked very well. But gradually, day after day, the code that had been working absolutely fine just stopped working. It stopped interacting with page elements and throws a different error every time. I am sick of this Selenium behaviour. Does anyone know why this happens? Or can you suggest any good alternatives?
import random
from selenium import webdriver

driver = webdriver.Chrome()
driver.get(url)
driver.implicitly_wait(50)

# Dismiss the cookie banner.
cookie = driver.find_elements_by_xpath("//*[contains(text(), 'Decline')]")
cookie[0].click()

# Trigger the search.
buttons = driver.find_elements_by_xpath("//button[contains(text(), 'Search')]")
buttons[0].click()
driver.implicitly_wait(50)

# Close the dialog via its close button.
close = driver.find_elements_by_css_selector("button.close")
close[0].click()

# Pick a random link from each of the first 19 job-info blocks and click it.
parent = driver.find_elements_by_class_name("job-info")
for link in parent[:19]:
    links = link.find_elements_by_tag_name('a')
    hyperlink = random.choice(links)
    driver.implicitly_wait(150)
    driver.find_element_by_link_text(hyperlink.text).click()

driver.close()
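For reference, a version of the same flow written with explicit waits instead of implicitly_wait (a minimal sketch, assuming the same url and locators as above and a reasonably recent Selenium) would look like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 50)

# Each step blocks until its element is actually clickable,
# instead of relying on a global implicit wait.
wait.until(EC.element_to_be_clickable((By.XPATH, "//*[contains(text(), 'Decline')]"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(), 'Search')]"))).click()
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.close"))).click()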
While I was trying to create a bot to automate shopping on a certain page, I ran into a problem that I couldn't fix for a long time. The goal of the program was to open the page, click the button representing a size, and click the buy button to add the item to the cart. It ran in a loop over every item link I supplied. The code of the program:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def buy(driver: webdriver, href: str):
    driver.get(href)
    sizeButton = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, f"//span[contains(text(),'{size.upper()}')]/../..")))
    sizeButton.click()
    buyButton = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='addToCartButton']/button")))
    buyButton.click()
The code worked on the first iteration, but after adding the first item to the cart and switching to the next page, the driver couldn't find the same WebElements. I made sure that the XPaths didn't change, that the new page was in the same window, and that there were no extra iframes. In addition, when the code omitted either one of the "clicks", it worked fine.
After trying many possible fixes, I accidentally ran into a solution: performing both explicit waits first and only then calling the click methods.
def buy(driver: webdriver, href: str):
    driver.get(href)
    sizeButton = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, f"//span[contains(text(),'{size.upper()}')]/../..")))
    buyButton = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='addToCartButton']/button")))
    sizeButton.click()
    buyButton.click()
Is there anyone who can explain to me why the earlier approach didn't work? I lost a lot of time trying to fix it, so I would love to gain new knowledge to avoid such mistakes in the future.
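A related condition worth knowing here is EC.element_to_be_clickable, which waits until the element is both visible and enabled before returning it. A sketch of the same function using it (same assumed locators and the same size variable as above):
def buy(driver: webdriver, href: str):
    driver.get(href)
    # Locate both elements up front; element_to_be_clickable also checks
    # that the element is enabled, not merely visible.
    sizeButton = WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable((By.XPATH, f"//span[contains(text(),'{size.upper()}')]/../..")))
    buyButton = WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable((By.XPATH, "//div[@id='addToCartButton']/button")))
    sizeButton.click()
    buyButton.click()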
I'm writing a script to scrape some data from the following website:
http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/historico/renda-fixa/
It worked as expected for a while, but now it gets stuck loading the page at line 3.
from selenium import webdriver

url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/historico/renda-fixa/'
driver = webdriver.Chrome()
driver.get(url)
What is weird is that the page is in fact fully loaded, as I can browse through it without a problem, but Chrome keeps showing me a "Connecting..." message at the bottom.
When Selenium finally gives up and raises the TimeoutException, the "Connecting..." message disappears and Chrome understands that the page is in fact fully loaded.
If I try to manually open the link in another tab, it does so in less than a second.
Is there a way I can override the built-in "wait until loaded" behaviour and just move on to the next steps, since everything I need is already loaded?
http://www.b3.com.br/lumis/portal/controller/html/SetLocale.jsp?lumUserLocale=pt_BR
This link loads infinitely.
Report a bug and ask the developers to fix it.
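If skipping the built-in wait is acceptable, another option is to change the driver's page-load strategy so that driver.get() returns as soon as the DOM is ready rather than when every request has finished. A minimal sketch, assuming Selenium 4, where ChromeOptions exposes page_load_strategy:
from selenium import webdriver

options = webdriver.ChromeOptions()
options.page_load_strategy = 'eager'  # or 'none' to not wait for loading at all
driver = webdriver.Chrome(options=options)
driver.set_page_load_timeout(30)
driver.get('http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/historico/renda-fixa/')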
I'm trying to save a full-page screenshot of a Chrome Web Store page, using Selenium and Python 3.
I've searched online for different answers, but I keep getting only the "header" part no matter what I try, as if the page doesn't scroll to the next "section".
I tried clicking inside the page to verify it's in focus but that didn't help.
I tried answers based on stitching, and importing Screenshot and Image.
My current code is:
from selenium import webdriver
from Screenshot import Screenshot_Clipping  # Screenshot_Clipping comes from the Selenium-Screenshot package

ob = Screenshot_Clipping.Screenshot()
driver2 = webdriver.Chrome(executable_path=chromedriver)
url = "https://chrome.google.com/webstore/detail/online-game-zone-new-tab/abalcghoakdcaalbfadaacmapphamklh"
driver2.get(url)
img_url = ob.full_Screenshot(driver2, save_path=r'.', image_name='Myimage.png')
print(img_url)
print('done')
driver2.close()
driver2.quit()
But that gives me a picture of just the header section.
What am I doing wrong?
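An alternative that avoids the Selenium-Screenshot package entirely is to run Chrome headless and resize the window to the full height of the document before taking the shot. A sketch under that assumption (headless mode is what allows the oversized window):
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless=new')  # requires a reasonably recent Chrome
driver = webdriver.Chrome(options=options)
driver.get("https://chrome.google.com/webstore/detail/online-game-zone-new-tab/abalcghoakdcaalbfadaacmapphamklh")

# Resize the window to the document's full scroll height, then screenshot.
height = driver.execute_script("return document.body.scrollHeight")
driver.set_window_size(1920, height)
driver.save_screenshot("full_page.png")
driver.quit()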
I'm trying to automate our system with Python 2.7, Selenium WebDriver, and Sikuli. I have a problem on login. Every time I open our system, the first page is an empty page, and it jumps to another page automatically; the new page is the main login page, so Python is always trying to find the element from the first page. The first page sometimes shows:
your session has timeout
I set a really large number for session timeout, but it doesn't work.
Here is my code:
import requests
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Chrome()
driver.get('http://172.16.1.186:8080/C******E/servlet/LOGON')
# time.sleep(15)

# Fill in the login form.
bankid = driver.find_element_by_id("idBANK")
bankid.send_keys("01")
empid = driver.find_element_by_id("idEMPLOYEE")
empid.send_keys("200010")
pwdid = driver.find_element_by_id("idPASSWORD")
pwdid.send_keys("C******e1")
elem = driver.find_element_by_id("maint")
elem.send_keys(Keys.RETURN)
First of all, I can't see any Sikuli usage in your example. If you were using Sikuli, it wouldn't matter how the other page was launched as you'd be interacting with whatever is visible on your screen at that time.
In Selenium, if you have multiple windows you have to switch your driver to the correct one. A quick way to get a list of the available windows is something like this:
for handle in driver.window_handles:
    driver.switch_to_window(handle)
    print "Switched to handle:", handle

element = driver.find_element_by_tag_name("title")
print element.get_attribute("value")
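If the blank page and the login page turn out to share the same window (so there is no second handle to switch to), another option is to wait explicitly for the redirect to finish before touching the form. A sketch, assuming the login field keeps the id idBANK after the redirect:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Block for up to 30 seconds until the login form is actually present,
# instead of relying on a fixed sleep.
bankid = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.ID, "idBANK"))
)
bankid.send_keys("01")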
I need to scrape data from an html table with about 20,000 rows. The table, however, is separated into 200 pages with 100 rows in each page. The problem is that I need to click on a link in each row to access the necessary data.
I was wondering if anyone had any tips to go about doing this because my current method, shown below, is taking far too long.
The first portion is necessary for navigating through Shibboleth. This part is not my concern as it only takes around 20 seconds and happens once.
from selenium import webdriver
from selenium.webdriver.support.ui import Select # for <SELECT> HTML form
driver = webdriver.PhantomJS()
# Here I had to select my school among others
driver.get("http://onesearch.uoregon.edu/databases/alphabetical")
driver.find_element_by_link_text("Foundation Directory Online Professional").click()
driver.find_element_by_partial_link_text('Login with your').click()
# We are now on the login in page where we shall input the information.
driver.find_element_by_name('j_username').send_keys("blahblah")
driver.find_element_by_name('j_password').send_keys("blahblah")
driver.find_element_by_id('login_box_container').submit()
# Select the Search Grantmakers by I.D.
print driver.current_url
driver.implicitly_wait(5)
driver.maximize_window()
driver.find_element_by_xpath("/html/body/header/div/div[2]/nav/ul/li[2]/a").click()
driver.find_element_by_xpath("//input[@id='name']").send_keys("family")
driver.find_element_by_xpath("//input[@id='name']").submit()
This is the part that is taking too long. The scraping part is not included in this code.
# Now I need to get the page source for each link of 20299 pages... :()
list_of_links = driver.find_elements_by_css_selector("a[class='profile-gate-check search-result-link']")

# Hold the links in a list instead of the driver.
list_of_linktext = []
for link in list_of_links:
    list_of_linktext.append(link.text)

# This is the actual loop that clicks on each link on the page.
for linktext in list_of_linktext:
    driver.find_element_by_link_text(linktext).click()
    driver.implicitly_wait(5)
    print driver.current_url
    driver.back()
    driver.implicitly_wait(5)  # Waits to make sure that the page is reached.
Navigating 1 out of the 200 pages takes about 15 minutes. Is there a better way to do this?
I tried using an explicit wait instead of an implicit wait.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

for linktext in list_of_linktext:
    # explicit wait
    WebDriverWait(driver, 2).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "a[class='profile-gate-check search-result-link']"))
    )
    driver.find_element_by_link_text(linktext).click()
    print driver.current_url
    driver.back()
The problem, however, still persists with an avg time of 5 seconds before each page.
For screen scraping, I normally steer clear of Selenium altogether. There are faster, more reliable ways to scrape data from a website.
If you're using Python, you might give BeautifulSoup a try. It seems very similar to other site-scraping tools I've used in the past for other languages (most notably JSoup and NSoup).
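For example, a minimal sketch of that approach, under two assumptions: the result pages are reachable with a plain HTTP client once the authenticated cookies are copied out of the Selenium session above, and the result links keep the CSS classes shown in the question:
import requests
from bs4 import BeautifulSoup

session = requests.Session()
# Reuse the cookies from the already-authenticated Selenium session.
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

# Fetch a results page directly and parse the links without clicking through.
response = session.get(driver.current_url)
soup = BeautifulSoup(response.text, "html.parser")
for link in soup.select("a.profile-gate-check.search-result-link"):
    print("{} -> {}".format(link.get_text(strip=True), link.get("href")))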