I am trying to scrape a website and I am unable to send keys to the email/password boxes to log in.
from config import root_dir, username_placer, password_placer
import os
import pathlib
os.chdir(root_dir)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
import time
import math
import datetime as dt
import pandas as pd
from parse_account_name import parse_account_name
class Placer:
    def __init__(self, account_name):
        # Set webdriver parameters
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--kiosk-printing')
        chrome_options.add_argument("window-size=1440,1080")
        chromedriver = r'.\_reference\chromedriver.exe'
        self.driver = webdriver.Chrome(executable_path=chromedriver, options=chrome_options)
        self.driver.get('https://permits.placer.ca.gov/CitizenAccess/Default.aspx')
        # Log in
        WebDriverWait(self.driver, 30).until(lambda d: d.find_element_by_xpath('//*[@id="ctl00_PlaceHolderMain_LoginBox_txtUserId"]')).send_keys(username_placer)
        self.driver.find_element_by_xpath('//*[@id="ctl00_PlaceHolderMain_LoginBox_txtPassword"]').send_keys(password_placer)
        self.driver.find_element_by_xpath('//*[@id="ctl00_PlaceHolderMain_LoginBox_btnLogin"]').click()
        time.sleep(2)
def search_placer(account_name):
    scraper = Placer(account_name)
    #

if __name__ == '__main__':
    account_name = '''test'''
    result = search_placer(account_name)
    print(result)
It will just return a TimeoutException. I've tried using EC.element_to_be_clickable, but I get the same error. Without waiting and just trying to skip to sending keys, I (predictably) get NoSuchElementException.
If I print the page source, it returns the collapsed HTML.
It seems like the page and elements load by the time it looks for the login box element, so I'm not sure why it can't find it. What should I try at this point?
The issue you are facing is that the elements you are trying to reach are inside an IFRAME. With Selenium, you have to switch to the IFRAME before interacting with that part of the page. Once you are done with the elements inside the IFRAME, make sure you switch back to the default content.
Additional feedback...
If you are going to locate an element using the ID, always use By.ID instead of By.XPATH. It's much simpler syntax and easier to read.
You should always use the EC class methods when possible rather than writing your own custom wait conditions.
Using sleep is bad practice and should be avoided whenever possible. Instead, use a WebDriverWait each time you need to wait for an element, as you did earlier in your code.
Using these suggestions, the updated code would be something like
self.driver.get('https://permits.placer.ca.gov/CitizenAccess/Default.aspx')
self.driver.switch_to.frame(self.driver.find_element(By.ID, "ACAFrame"))
# Log in
WebDriverWait(self.driver, 10).until(EC.element_to_be_clickable((By.ID, "ctl00_PlaceHolderMain_LoginBox_txtUserId"))).send_keys(username_placer)
self.driver.find_element(By.ID, "ctl00_PlaceHolderMain_LoginBox_txtPassword").send_keys(password_placer)
self.driver.find_element(By.ID, "ctl00_PlaceHolderMain_LoginBox_btnLogin").click()
self.driver.switch_to.default_content()
# don't use sleep... instead use a WebDriverWait
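For example, the time.sleep(2) at the end of __init__ could become an explicit wait as well. The locator below is hypothetical, since the post-login page isn't shown; substitute the ID of an element that only appears once you are logged in:

# Hypothetical post-login element; replace the ID with something that actually
# exists only after a successful login on this site.
WebDriverWait(self.driver, 10).until(
    EC.presence_of_element_located((By.ID, "ctl00_HeaderNavigation_lblUserName"))
)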
Related
I have tried to scrape info from that site, specifically from a table. Every time I try, I get an error saying the elements don't exist.
https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d
I tried adding time.sleep(5) to my code, and a scroll-down function to load all the elements, but neither worked.
Do you have any advice for me?
EDIT
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
# Options
chrome_options = Options()
chrome_options.add_argument("--headless")
# Set drive
chrome_driver_path = r"C:\Users\kacpe\OneDrive\Pulpit\Python\Projekty\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path, options=chrome_options)
driver.get("https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//table/tbody/tr[0]")))
    print(element)
except TimeoutException as e:
    print(e)
I added code in response to your request. My main goal is to scrape content from the table at this site. I added explicit waits to my code, and I still can't select anything from that table; it looks like the script doesn't see anything in that area.
One way to try to solve it is to use the XPath of the element, or its relative position, so that Selenium always reaches the same place and returns the value of the information you are searching for.
Ex1: find_element(By.XPATH, '//*[@id="wmd-input"]')  # in that case, it's the input of this check box.
If that doesn't work, try this one.
Ex2: browser.implicitly_wait(30)  # makes Selenium wait up to 30 seconds for elements to appear before failing.
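Putting both suggestions together, a minimal sketch for the polygonscan page from the question (assuming the table lives in the top-level document; note that XPath indices start at 1, so the first row is tr[1], not tr[0]):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(30)  # poll up to 30 seconds for elements to appear
driver.get("https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d")

# XPath is 1-indexed, so the first row would be //table/tbody/tr[1]
for row in driver.find_elements(By.XPATH, "//table/tbody/tr"):
    print(row.text)
driver.quit()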
I am attempting to scrape the website basketball-reference and am running into an issue I can't seem to solve. I am trying to grab the box score element for each game played. This is something I was able to do easily with urlopen, but because other portions of the site require Selenium, I thought I would rewrite the entire process with Selenium.
The issue seems to be that even if I wait for the first element to load using WebDriverWait, when I then move on to grabbing the elements, I get nothing returned.
One thing I found interesting: if I did a full page print using my results from urlopen with something like print(uClient.read()), I would get roughly 300 more lines of HTML after beautifying compared to doing the same with print(driver.page_source), even with an implicit wait set to 5 minutes.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.wait = WebDriverWait(driver, 10)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH,'//*[@id="content"]/div[3]/div[1]')))
box = driver.find_elements_by_class_name('game_summary expanded nohover')
print (box)
driver.quit()
Try the code below; it is working on my computer. Do let me know if you still face a problem.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.wait = WebDriverWait(driver, 60)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div[3]/div[1]')))
boxes = driver.wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[@class=\"game_summary expanded nohover\"]")))
print("Number of Elements Located : ", len(boxes))
for box in boxes:
    print(box.text)
    print("-----------")
driver.quit()
If it resolves your problem, please mark it as the answer. Thanks!
Actually, the site doesn't require Selenium at all. All the data is there through a simple request (it's just inside the comments of the HTML; you would just need to parse that). Secondly, you can grab the box scores quite easily with pandas:
import pandas as pd
dfs = pd.read_html('https://www.basketball-reference.com/boxscores/')
for idx, table in enumerate(dfs[:-2]):
    print(table)
    if (idx+1)%3 == 0:
        print("-----------")
I'm trying to click a button with its class but it throws an ElementNotInteractableException.
Here is the website HTML code
Here is the code I'm using
driver = webdriver.Chrome('chromedriver.exe', chrome_options=options)
driver.get('https://physionet.org/lightwave/?db=noneeg/1.0.0')

def get_spo2hr(subject):
    driver.find_element_by_xpath("//select[@name='record']/option[text()='"+subject+"']").click()
    driver.find_element_by_id('ui-id-3').click()
    driver.find_element_by_id('viewann').click()
    driver.find_element_by_id('viewsig').click()
    driver.find_element_by_id('lwform').click()
    driver.find_element_by_css_selector(".fwd").click()
    driver.save_screenshot('screenie.png')

get_spo2hr('Subject10_SpO2HR')
One thing is that (as said in other answers) the CSS selector is unstable; prefer an XPath.
But the main thing is that a div is overlapping the a element while the DOM is rendering.
Just wait a second until the DOM loads:
import time
time.sleep(1)
Example code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
driver = webdriver.Chrome()
driver.get('https://physionet.org/lightwave/?db=noneeg/1.0.0')
def get_spo2hr(subject):
    driver.find_element_by_xpath("//select[@name='record']/option[text()='"+subject+"']").click()

    import time
    time.sleep(1)

    driver.find_element_by_id('ui-id-3').click()
    driver.find_element_by_id('viewann').click()
    driver.find_element_by_id('viewsig').click()
    driver.find_element_by_id('lwform').click()
    driver.find_element_by_xpath('/html/body/div[1]/main/div/div/div/form/div[3]/table/tbody/tr/td[2]/div/button[3]').click()
    driver.save_screenshot('screenie.png')

get_spo2hr('Subject10_SpO2HR')
I always prefer getting elements by their XPath in suitable situations. With that being said, I modified your code to find the forward button using its XPath, and it works.
Here is the modified code:
driver = webdriver.Chrome('chromedriver.exe', chrome_options=options)
driver.get('https://physionet.org/lightwave/?db=noneeg/1.0.0')

def get_spo2hr(subject):
    driver.find_element_by_xpath("//select[@name='record']/option[text()='" + subject + "']").click()
    driver.find_element_by_id('ui-id-3').click()
    driver.find_element_by_id('viewann').click()
    driver.find_element_by_id('viewsig').click()
    driver.find_element_by_id('lwform').click()
    driver.find_element_by_xpath('/html/body/div[1]/main/div/div/div/form/div[3]/table/tbody/tr/td[2]/div/button[3]').click()
    driver.save_screenshot('screenie.png')

get_spo2hr('Subject10_SpO2HR')
I'm trying to get the xpath of an element in site https://www.tradingview.com/symbols/BTCUSD/technicals/
Specifically the result under the summary speedometer. Whether it's buy or sell.
Using Google Chrome's Copy XPath, I get the result
//*[@id="technicals-root"]/div/div/div[2]/div[2]/span[2]
and to try to get that data in Python, I plugged it into
from lxml import html
import requests
page = requests.get('https://www.tradingview.com/symbols/BTCUSD/technicals/')
tree = html.fromstring(page.content)
status = tree.xpath('//*[@id="technicals-root"]/div/div/div[2]/div[2]/span[2]/text()')
When I print status, I get an empty array, but it doesn't seem like anything is wrong with the XPath. I've read that Google Chrome does some shenanigans with incorrectly written HTML tables, which will output the wrong XPath, but that doesn't seem to be the issue here.
When I run your code, the "technicals-root" div is empty. I assume javascript is filling it in. When you can't get a page statically, you can always turn to Selenium to run a browser and let it figure everything out. You may have to tweak the driver path to get it working in your environment but this works for me:
import time
import contextlib
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
option = webdriver.ChromeOptions()
option.add_argument("--incognito")

with contextlib.closing(webdriver.Chrome(
        executable_path='/usr/lib/chromium-browser/chromedriver',
        chrome_options=option)) as browser:
    browser.get('https://www.tradingview.com/symbols/BTCUSD/technicals/')
    # wait until js has filled in the element - and a bit longer for js churn
    WebDriverWait(browser, 20).until(EC.visibility_of_element_located(
        (By.XPATH,
         '//*[@id="technicals-root"]/div/div/div[2]/div[2]/span')))
    time.sleep(1)
    status = browser.find_elements_by_xpath(
        '//*[@id="technicals-root"]/div/div/div[2]/div[2]/span[2]')
    print(status[0].text)
I'm trying to automate the filling out of an Easily Apply job application on Indeed. Here is an example of a job application on Indeed that uses the Easily Apply approach. I've tried every which way to navigate the nested iframes; however, I cannot find an approach that works. I even found that this question has been asked before, unfortunately, the solution given to the question does not work for me. Below is my code as it stands now:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('indeed_url_goes_here')
driver.find_element_by_class_name('indeed-apply-button').click()
driver.switch_to_frame(driver.find_element_by_xpath('/html/body/iframe'))
driver.switch_to_frame(driver.find_element_by_xpath('//*[@id="indeedapply-modal-preload-iframe"]'))
driver.find_element_by_class_name('applicant.name')
Find the first parent iframe and switch to it and then to the nested frame by index.
Complete working code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://www.indeed.com/viewjob?jk=2e3d019aa34a2801&q=bartender&tk=1a9g51n08a3iof6h&from=web&advn=5333586156877432&sjdu=UvkB_mgi5f7NyMagFcTHP0E6zA3mclLGHWb8Kte-0FV3cY2ZuZvj3LUvh8wnnxrqeYWG3HpvTXBK3G4htWfwgfQeMa0N1Tds6VxYb4V3Vlg&pub=4a1b367933fd867b19b072952f68dceb")
driver.find_element_by_class_name('indeed-apply-button').click()
wait = WebDriverWait(driver, 10)
frame = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "iframe[name$=modal-iframe]")))
driver.switch_to.frame(frame)
driver.switch_to.frame(0)
print(driver.find_element_by_css_selector("h1.jobtitle").text)
Prints the job title from the popup: Bartender/Mixologist.
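If you need to interact with the main page again afterwards, remember to switch back out of the frames:

driver.switch_to.default_content()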
Well, first off, the element doesn't have a class name; it has a regular name and an ID, so use either driver.find_element_by_name or driver.find_element_by_id.
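For example (the name/id values here are placeholders, since the HTML snippet isn't shown in the excerpt):

# Either locator works if the element carries that attribute;
# "email" is a placeholder value here
driver.find_element_by_name("email").send_keys("user@example.com")
# or
driver.find_element_by_id("email").send_keys("user@example.com")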