Web scraping: click button with Selenium - Python

I am trying to scrape indeed.com for job listings using Python, with Selenium and BeautifulSoup. I want to click the next-page button but can't seem to figure out how. I have looked at many threads, but it is unclear to me which element I am supposed to act on. Here is the web page's HTML; the part highlighted in grey is what comes up when I inspect the next button.
Also, just to mention: I first tried to follow what happens to the URL when mousedown is executed. After reading the addppurlparam function, adding the strings from that function to the URL, and using that URL, I just get thrown back to page one.
Here is my code for the class with selenium meant to click on the button:
from selenium import webdriver
from selenium.webdriver import ActionChains
driver = webdriver.Chrome("C:/Users/alleballe/Downloads/chromedriver.exe")
driver.get("https://se.indeed.com/Internship-jobb")
print(driver.title)
#assert "Python" in driver.title
elem = driver.find_element_by_class_name("pagination-list")
elem = elem.find_element_by_xpath("//li/a[@aria-label='Nästa']")
print(elem)
assert "No results found." not in driver.page_source
assert elem
action = ActionChains(driver).click(elem)
action.perform()
print(elem)
driver.close()

The Indeed site shows 10 results per page.
Your photo shows the wrong section of HTML; instead, you can see that the links contain start=0 for the first page, start=10 for the second, start=20 for the third, and so on.
You could use this knowledge to write code like this:
i = 0
while True:
    driver.get(f'https://se.indeed.com/jobs?q=Internship&start={i}')
    # scrape this page here; break out when a page comes back with no results
    i = i + 10
But, to answer your question directly, you should do:
next_page_link = driver.find_element_by_xpath('/html/head/link[6]')
driver.get(next_page_link.get_attribute('href'))
This finds the <link> element and then navigates to its href (driver.get() expects a URL string, not a WebElement).
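If the position-based /html/head/link[6] feels brittle, here is a sketch of a variant that selects the tag by attribute instead, assuming the page actually exposes a <link rel="next"> element:
next_href = driver.find_element_by_xpath('//head/link[@rel="next"]').get_attribute('href')
driver.get(next_href)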

It works. It paginates to the next page.
driver.find_element_by_class_name("pagination-list").find_element_by_tag_name('a').click()


Selenium webscraper not scraping desired tags

Here are the two tags I am trying to scrape: https://i.stack.imgur.com/a1sVN.png. In case you are wondering, this is the link to that page (the tags I am trying to scrape are not behind the paywall): https://www.wsj.com/articles/chinese-health-official-raises-covid-alarm-ahead-of-lunar-new-year-holiday-11672664635
Below is the Python code I am using. Does anyone know why the tags are not being stored properly in paragraphs?
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'https://www.wsj.com/articles/chinese-health-official-raises-covid-alarm-ahead-of-lunar-new-year-holiday-11672664635'
driver = webdriver.Chrome()
driver.get(url)
paragraphs = driver.find_elements(By.CLASS_NAME, 'css-xbvutc-Paragraph e3t0jlg0')
print(len(paragraphs)) # => prints 0
So you have two problems impacting you:
1. You should wait for the page to load after you get() the webpage. You can do this with something like import time and time.sleep(10).
2. The class tags that you are searching for change on every page load. However, the data-type='paragraph' attribute stays constant, so you can do:
paragraphs = driver.find_elements(By.XPATH, '//*[@data-type="paragraph"]') # search by XPath to find the elements with that data attribute
print(len(paragraphs))
This prints 2 after the page is loaded.
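Putting the two fixes together, a minimal runnable sketch:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.wsj.com/articles/chinese-health-official-raises-covid-alarm-ahead-of-lunar-new-year-holiday-11672664635')
time.sleep(10)  # crude wait for the page to finish loading

# match on the stable data attribute instead of the generated class names
paragraphs = driver.find_elements(By.XPATH, '//*[@data-type="paragraph"]')
print(len(paragraphs))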
Just to add on to @Andrew Ryan's answer: you can use an explicit wait for a shorter and more adaptive waiting time.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

paragraphs = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//*[@data-type="paragraph"]'))
)
print(len(paragraphs))

Scraping Javascript with "onclick"

I am having some trouble scraping the URL below:
http://102.37.123.153/Lists/eTenders/AllItems.aspx
I am using Python with Selenium, but there are many "onclick" JavaScript events to run to get to the lowest level of information. Does anyone know how to automate this?
Thanks
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = 'http://102.37.123.153/Lists/eTenders/AllItems.aspx'
chrome_options = Options()
chrome_options.add_argument("--headless")
browser = webdriver.Chrome('c:/Users/AB/Dropbox/ITProjects/Scraping/chromedriver.exe', options=chrome_options)
browser.get(url)
time.sleep(10)
source = browser.page_source
soup = BeautifulSoup(source, 'html.parser')
for link in soup.find_all('a'):
    if link.get('href') == 'javascript:':
        print(link)
You don't need Selenium for this website; you need patience. Let me explain how you'd approach it.
1. Click X.
2. Y opens; click Y.
3. Z opens; click Z.
4. And so on...
What happened here is that when you clicked X, an AJAX request was made to get Y, and after you clicked Y, another AJAX request was made to get Z; this goes on and on.
So you can just simulate those requests: open the Network tab and see how the site crafts each request, then make the same requests in your code, get the response, and based on it issue the next request; the cycle goes on until you reach the innermost level of the tree.
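Here is a sketch of that request-replay idea; the endpoint and payload below are placeholders, and you would copy the real method, URL, headers, and form data from the Network tab:
import requests

session = requests.Session()
# hypothetical endpoint/payload: substitute the actual request the site makes
resp = session.post('http://102.37.123.153/some/ajax/endpoint', data={'item_id': '123'})
print(resp.status_code)
print(resp.text[:500])  # inspect the response, then craft the next request from it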
This approach has no UI and is, technically speaking, less friendly and harder to implement, but it's more efficient. On the other hand, you can just select your clickable elements with Selenium, like:
elem = driver.find_element_by_xpath('some_xpath')
elem.click()
and it will also work.
I'd also note that sometimes links don't trigger AJAX at all; they just hide info that is already in the source code. To know what you'll receive in your response, right-click the website and choose "View page source"; note that this is different from "Inspect element".
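A quick way to check that, sketched with plain requests (no browser, so anything you see here was in the source all along):
import requests
from bs4 import BeautifulSoup

raw = requests.get('http://102.37.123.153/Lists/eTenders/AllItems.aspx').text
soup = BeautifulSoup(raw, 'html.parser')
print(soup.get_text()[:1000])  # if the tender details show up here, no AJAX was involved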

xpath returns more than one result, how to handle in python

I have started using Selenium with Python. I am able to change the message text using find_element_by_id. I want to do the same with find_element_by_xpath, which is not successful because the xpath has two instances. I want to try this out to learn about xpath.
I want to do web scraping of a page using Python, for which I need clarity on using xpath, mainly for going to the next page.
#This code works:
import time
import requests
from selenium import webdriver
driver = webdriver.Chrome()
url = "http://www.seleniumeasy.com/test/basic-first-form-demo.html"
driver.get(url)
eleUserMessage = driver.find_element_by_id("user-message")
eleUserMessage.clear()
eleUserMessage.send_keys("Testing Python")
time.sleep(2)
driver.close()
#This works fine. I wish to do the same with xpath.
#I inspect the input box in Chrome and copy the xpath '//*[@id="user-message"]', which seems to refer to the other box as well.
#I wish to use the xpath method to write text in this box as follows, which does not work.
driver = webdriver.Chrome()
url = "http://www.seleniumeasy.com/test/basic-first-form-demo.html"
driver.get(url)
eleUserMessage = driver.find_elements_by_xpath('//*[@id="user-message"]')
eleUserMessage.clear()
eleUserMessage.send_keys("Test Python")
time.sleep(2)
driver.close()
To elaborate on my comment, you would use a list like this:
eleUserMessage_list = driver.find_elements_by_xpath('//*[@id="user-message"]')
my_desired_element = eleUserMessage_list[0] # or maybe [1]
my_desired_element.clear()
my_desired_element.send_keys("Test Python")
time.sleep(2)
The only real difference between find_elements_by_xpath and find_element_by_xpath is that the first returns a list that needs to be indexed. Once indexed, it works the same as if you had used the second!
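Alternatively, if you know which occurrence you want, you can make the XPath itself unambiguous and keep using find_element (singular). A sketch, assuming it's the first match you're after:
# parentheses group the match set so [1] picks the first occurrence (XPath indexing is 1-based)
eleUserMessage = driver.find_element_by_xpath('(//*[@id="user-message"])[1]')
eleUserMessage.clear()
eleUserMessage.send_keys("Test Python")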

Python splinter can't click on element by CSS on page

I am trying to automate a booking process on a travel site using splinter, and I am having trouble clicking on a CSS element on the page.
This is my code
import splinter
import time
secret_deals_email = {
    'user[email]': 'adf@sad.com'
}
browser = splinter.Browser()
url = 'http://roomer-qa-1.herokuapp.com'
browser.visit(url)
click_FIND_ROOMS = browser.find_by_css('.blue-btn').first.click()
time.sleep(10)
# click_Book_button = browser.find_by_css('.book-button-row.blue-btn').first.click()
browser.fill_form(secret_deals_email)
click_get_secret_deals = browser.find_by_name('button').first.click()
time.sleep(10)
click_book_first_room_list = browser.find_by_css('.book-button-row-link').first.click()
time.sleep(5)
click_book_button_entry = browser.find_by_css('.entry-white-box.entry_box_no_refund').first.click()
The problem is that whenever the code gets to the page where I need to pick the type of purchase I would like, I can't click any of the options on the page.
I keep getting an error that the element does not exist, no matter what I do.
http://roomer-qa-1.herokuapp.com/hotels/atlanta-hotels/ramada-plaza-atlanta-downtown-capitol-park.h30129/44389932?rate_plan_id=1&rate_plan_token=6b5aad6e9b357a3d9ff4b31acb73c620&
This is the link to the page that is causing me trouble; please help :)
You need to wait until the element is present on the website. You can use the is_element_not_present_by_css method with a while loop to do that:
while browser.is_element_not_present_by_css('.entry-white-box.entry_box_no_refund'):
    time.sleep(1)
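As a sketch of an alternative, splinter's is_element_present_by_css also accepts a wait_time argument that polls for you, so the loop can become a single call:
# blocks for up to 30 seconds and returns True as soon as the element appears
if browser.is_element_present_by_css('.entry-white-box.entry_box_no_refund', wait_time=30):
    browser.find_by_css('.entry-white-box.entry_box_no_refund').first.click()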

How to use find_element_by_link_text() properly to not raise NoSuchElementException?

I have a HTML code like this:
<div class="links nopreview"><span><a class="csiAction"
href="/WebAccess/home.html#URL=centric://REFLECTION/INSTANCE/_CS_Data/null">Home</a></span> • <span><span><a class="csiAction"
href="/WebAccess/home.html#URL=centric://SITEADMIN/_CS_Site">Setup</a></span> • </span><span><a
title="Sign Out" class="csiAction csiActionLink">Sign Out</a></span></div>
I would like to click on the link that has the text Home. As this Home link appears after login, I have code like this:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import re
browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://myServer/WebAccess/login.html") # Load App page
elem = browser.find_element_by_name("LoginID") # Find the Login box
elem.send_keys("Administrator")
elem = browser.find_element_by_name("Password") # Find the Password box
elem.send_keys("Administrator" + Keys.RETURN)
#try:
elem = browser.find_element_by_link_text("Home")
elem.click()
The part up to the login works great. However, the second-to-last line is problematic:
elem = browser.find_element_by_link_text("Home")
It raises a NoSuchElementException even though the Home link is there, as you can see from the HTML code.
raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: u'Unable to locate element: {"method":"link text","selector":"Home"}'
Any guidance as to what I am doing wrong, please?
Have you tried adding an implicit wait so that it waits instead of running too quickly?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import re
browser = webdriver.Firefox() # Get local session of firefox
browser.implicitly_wait(10) #wait 10 seconds when doing a find_element before carrying on
browser.get("http://myServer/WebAccess/login.html") # Load App page
elem = browser.find_element_by_name("LoginID") # Find the Login box
elem.send_keys("Administrator")
elem = browser.find_element_by_name("Password") # Find the Password box
elem.send_keys("Administrator" + Keys.RETURN)
#try:
elem = browser.find_element_by_link_text("Home")
elem.click()
The implicitly_wait call makes the browser poll until the item is on the page and visible to be interacted with.
The most common causes of a NoSuchElementException while the element is actually there are:
1. the element is in a different window/frame, so you have to switch to it first,
2. your page is not loaded, or your method of page load is not reliable.
Solutions could include (a sketch of the first two points follows the link below):
1. check that you're in the right frame/window via driver.window_handles,
2. write a wait wrapper to wait for an element to appear,
3. try XPath instead, like: driver.find_element_by_xpath(u'//a[text()="Foo"]').click(),
4. use pdb to diagnose your problem more efficiently.
See also: How to find_element_by_link_text while having: NoSuchElement Exception?
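A minimal sketch of the first two points, using the browser variable from the question (the window index and the 10-second timeout are arbitrary choices here):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# point 1: check which windows exist and switch to the one holding the link
print(browser.window_handles)
browser.switch_to.window(browser.window_handles[0])

# point 2: wait for the link to appear instead of grabbing it immediately
elem = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.LINK_TEXT, "Home"))
)
elem.click()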
Maybe the element you are looking for doesn't exactly match that text string?
I know it can be tricky if it looks right on-screen, but sometimes there are oddities embedded in the markup, such as tags that make the first char italic:
"<i>H</i>ome" is visually identical to "<em>H</em>ome", but neither matches the plain link text "Home".
Edit: after writing the above answer, I studied the question more closely and discovered that the HTML sample does show "Home" in plain text, but it was not visible due to long lines not wrapping. So I edited the OP to wrap the lines for readability.
New observation: I noticed that the Sign Out link element has a "title" attribute, but the Home link element lacks one; try giving it one and using that.
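If you do add such an attribute, the lookup could then look like this (a sketch; the title value is hypothetical):
# assumes the Home link was given title="Home" in the page's HTML
elem = browser.find_element_by_xpath('//a[@title="Home"]')
elem.click()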
Try adding an implicit wait so that it waits instead of running too quickly.
Or else you can import time and use time.sleep(25).
