When I try to scrape this website with Selenium and Python

When I try to scrape the website, it just throws some errors.
I think it may have something to do with my webdriver, but I don't know.
I am trying to get this data so I can put it in a spreadsheet and get some cool statistics.
from selenium import webdriver
url = 'https://prosettings.net/cs-go-pro-settings-gear-list/'
driver = webdriver.Chrome(executable_path="C:\WebDrivers\chromedriver.exe")
driver.get(url)
names = driver.find_elements_by_class_name(" column-player")
for name in names:
    title = name.find_element_by_xpath('.//a').text
    print(name)
Here are the errors I get in the terminal:
d:\downloads\PythonScraping\Test.py:5: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Chrome(executable_path="C:\WebDrivers\chromedriver.exe")
DevTools listening on ws://127.0.0.1:53131/devtools/browser/73ca0453-352e-47a0-a98a-fb539150d6f9
d:\downloads\PythonScraping\Test.py:8: DeprecationWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
names = driver.find_elements_by_class_name(" column-player")
Traceback (most recent call last):
File "d:\downloads\PythonScraping\Test.py", line 8, in <module>
names = driver.find_elements_by_class_name(" column-player")
File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 783, in find_elements_by_class_name
return self.find_elements(by=By.CLASS_NAME, value=name)
File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 1279, in find_elements
return self.execute(Command.FIND_ELEMENTS, {
File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webdriver.py", line 424, in execute
self.error_handler.check_response(response)
File "C:\Users\terk0\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
(Session info: chrome=98.0.4758.102)
Stacktrace:
Backtrace:
Ordinal0 [0x00EF69A3+2582947]
Ordinal0 [0x00E8A6D1+2139857]
Ordinal0 [0x00D83A98+1063576]
Ordinal0 [0x00D862B7+1073847]
Ordinal0 [0x00D8617E+1073534]
Ordinal0 [0x00D863F0+1074160]
Ordinal0 [0x00DAFCB2+1244338]
Ordinal0 [0x00DB013B+1245499]
Ordinal0 [0x00DD9F8C+1417100]
Ordinal0 [0x00DC8594+1344916]
Ordinal0 [0x00DD834A+1409866]
Ordinal0 [0x00DC8366+1344358]
Ordinal0 [0x00DA5176+1200502]
Ordinal0 [0x00DA6066+1204326]
GetHandleVerifier [0x0109BE02+1675858]
GetHandleVerifier [0x0115036C+2414524]
GetHandleVerifier [0x00F8BB01+560977]
GetHandleVerifier [0x00F8A8D3+556323]
Ordinal0 [0x00E9020E+2163214]
Ordinal0 [0x00E95078+2183288]
Ordinal0 [0x00E951C0+2183616]
Ordinal0 [0x00E9EE1C+2223644]
BaseThreadInitThunk [0x7586FA29+25]
RtlGetAppContainerNamedObjectPath [0x77957A9E+286]
RtlGetAppContainerNamedObjectPath [0x77957A6E+238]

There are 2 problems here:
1) Instead of
names = driver.find_elements_by_class_name(" column-player")
it should be
names = driver.find_elements_by_class_name("column-player")
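(I know there are spaces before the column-player class name in the HTML, but you still should not put them inside the locator. Selenium turns a class-name locator into the CSS selector "." + name, so " column-player" becomes ". column-player", which is not a valid selector and raises the InvalidSelectorException you see.)
2) You should add a wait so that you access these elements only after the page has completely loaded.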
This should work better:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://prosettings.net/cs-go-pro-settings-gear-list/'
# Selenium 4 style: pass the driver path through a Service object (avoids the executable_path deprecation warning)
driver = webdriver.Chrome(service=Service(r"C:\WebDrivers\chromedriver.exe"))
wait = WebDriverWait(driver, 20)
driver.get(url)
# wait until at least one element is visible
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".column-player")))
# short additional wait for all the other elements to finish loading
time.sleep(0.5)
# find_elements(By...) replaces the deprecated find_elements_by_* helpers
names = driver.find_elements(By.CLASS_NAME, "column-player")
for name in names:
    title = name.find_element(By.XPATH, './/a').text
    print(title)
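A class attribute copied straight from the HTML may contain stray spaces or several classes at once, while By.CLASS_NAME accepts exactly one class name. The sketch below is a hypothetical helper (class_attr_to_css is not part of Selenium) that turns such a raw attribute value into a compound CSS selector for use with By.CSS_SELECTOR. It assumes plain class names that need no CSS escaping.

```python
def class_attr_to_css(class_attr: str) -> str:
    """Turn a raw HTML class attribute value into a compound CSS class selector.

    Hypothetical helper: assumes every class is a plain identifier that
    needs no CSS escaping.
    """
    classes = class_attr.split()  # split() drops leading, trailing and repeated whitespace
    if not classes:
        raise ValueError("class attribute contains no class names")
    # ".a.b" matches elements that carry both class a and class b
    return "".join("." + cls for cls in classes)


print(class_attr_to_css(" column-player"))          # .column-player
print(class_attr_to_css("column-player sorting "))  # .column-player.sorting
```

With Selenium this would be used as driver.find_elements(By.CSS_SELECTOR, class_attr_to_css(" column-player")).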

Related

Python Selenium webdriver doesn't open Chrome, and if it does, it keeps refreshing it without any result

I am trying to open web.whatsapp.com using Selenium through Python. It opens the Chrome web browser but doesn't navigate to the site; it shows a blank "data:," page instead.
import pandas as pd
import webbrowser
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
options=Options()
options.add_experimental_option("detach",True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
#driver.get("'https://web.whatsapp.com/send?phone='+x+'&text='+message+''")
driver.get("'https://ynet.co.il")
#driver = webdriver.Edge()
import time
I expected the browser to open on the website's page, but instead the blank "data:," page opens and I get these errors:
File "D:\liranew\Lib\Main.py", line 12, in <module>
driver.get("'https://ynet.co.il")
File "D:\liranew\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 455, in get
self.execute(Command.GET, {"url": url})
File "D:\liranew\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 444, in execute
self.error_handler.check_response(response)
File "D:\liranew\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 249, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
(Session info: chrome=107.0.5304.88)
Stacktrace:
Backtrace:
Ordinal0 [0x010EACD3+2075859]
Ordinal0 [0x0107EE61+1633889]
Ordinal0 [0x00F7B680+571008]
Ordinal0 [0x00F6E8FE+518398]
Ordinal0 [0x00F6D2A3+512675]
Ordinal0 [0x00F6D5AD+513453]
Ordinal0 [0x00F7D0CE+577742]
Ordinal0 [0x00FDBC7D+965757]
Ordinal0 [0x00FC731C+881436]
Ordinal0 [0x00FDB56A+963946]
Ordinal0 [0x00FC7136+880950]
Ordinal0 [0x00F9FEFD+720637]
Ordinal0 [0x00FA0F3F+724799]
GetHandleVerifier [0x0139EED2+2769538]
GetHandleVerifier [0x01390D95+2711877]
GetHandleVerifier [0x0117A03A+521194]
GetHandleVerifier [0x01178DA0+516432]
Ordinal0 [0x0108682C+1665068]
Ordinal0 [0x0108B128+1683752]
Ordinal0 [0x0108B215+1683989]
Ordinal0 [0x01096484+1729668]
BaseThreadInitThunk [0x74F86359+25]
RtlGetAppContainerNamedObjectPath [0x773C7C14+228]
RtlGetAppContainerNamedObjectPath [0x773C7BE4+180]
You have to remove the stray ' in front of https://ynet.co.il. The string passed to driver.get() must be the bare URL, i.e.:
driver.get("https://ynet.co.il")
It's that simple.
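The commented-out WhatsApp line has the same problem in another form: the whole expression, including the variables x and message, sits inside one string literal, so nothing is substituted and the URL starts with a quote. A minimal standard-library sketch, assuming the phone and text query parameters shown in the question; both helper names are hypothetical. The first function is a cheap sanity check before calling driver.get(); the second builds the query string with proper percent-encoding.

```python
from urllib.parse import quote, urlencode, urlsplit


def looks_like_http_url(url: str) -> bool:
    """Cheap sanity check before driver.get(): http(s) scheme and a host must be present."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_whatsapp_send_url(phone: str, message: str) -> str:
    """Build the web.whatsapp.com/send URL, percent-encoding the query values."""
    # quote_via=quote encodes spaces as %20 and characters like & as %26
    query = urlencode({"phone": phone, "text": message}, quote_via=quote)
    return "https://web.whatsapp.com/send?" + query


print(looks_like_http_url("'https://ynet.co.il"))  # False: the stray quote hides the scheme
print(looks_like_http_url("https://ynet.co.il"))   # True
print(build_whatsapp_send_url("1234567890", "hello world"))
# https://web.whatsapp.com/send?phone=1234567890&text=hello%20world
```

In the script this would become driver.get(build_whatsapp_send_url(x, message)).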

Python: scrollIntoView on an element throws an error

I am trying to get the names of teams by ID. By moving the focus (scrolling) to each element, I collect the element names into a list and a text file. At some point the web page reloads and the screen freezes; after that the focus stops moving and no more team names are added to the list or the text file.
I even tried time.sleep(3), but it still could not get any team name data.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from datetime import datetime
driver = webdriver.Chrome(r"C:\Users\Admin\Downloads\chromedriver_win32 (1)\chromedriver.exe")
driver.get("https://www.nba.com/schedule?pd=false&region=1")
driver.implicitly_wait(5)
element_to_click=driver.find_element(By.ID,"onetrust-accept-btn-handler") #.click()
element_to_click.click()
element_to_save=driver.find_element(By.XPATH,"//div/div/div/div/h4")
f=open('new_result_file00.txt','w')#before optional read=write mode was ,r+,
f.write(element_to_save.text)
f.write("\n")
f.write(str(datetime.today()))
myList=[]
myList.append(1)
elements_to_save=driver.find_elements(By.XPATH,"//*[@data-id='nba:schedule:main:team:link']")
i=1
for element in elements_to_save:
    driver.execute_script("arguments[0].scrollIntoView();", element)
    try:
        f.write(element.text)
        myList.append(element.text)
    except Exception as e:
        print("err",i)
    i=i+1
f.write(" \n ")
f.write(str(datetime.today()))
f.close()
Error traceback:
err 1
Traceback (most recent call last):
File "C:\pythonPro\w_crawl\w01_nba.py", line 23, in <module>
driver.execute_script("arguments[0].scrollIntoView();", element)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 491, in execute_script
return self.execute(command, {
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 428, in execute
self.error_handler.check_response(response)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=105.0.5195.127)
Stacktrace:
Backtrace:
Ordinal0 [0x004FDF13+2219795]
Ordinal0 [0x00492841+1779777]
Ordinal0 [0x003A423D+803389]
Ordinal0 [0x003A6D04+814340]
Ordinal0 [0x003A6BC2+814018]
Ordinal0 [0x003A755F+816479]
Ordinal0 [0x003FFC1B+1178651]
Ordinal0 [0x003EE7FC+1107964]
Ordinal0 [0x003FF192+1175954]
Ordinal0 [0x003EE616+1107478]
Ordinal0 [0x003C7F89+950153]
Ordinal0 [0x003C8F56+954198]
GetHandleVerifier [0x007F2CB2+3040210]
GetHandleVerifier [0x007E2BB4+2974420]
GetHandleVerifier [0x00596A0A+565546]
GetHandleVerifier [0x00595680+560544]
Ordinal0 [0x00499A5C+1808988]
Ordinal0 [0x0049E3A8+1827752]
Ordinal0 [0x0049E495+1827989]
Ordinal0 [0x004A80A4+1867940]
BaseThreadInitThunk [0x75B8FA29+25]
RtlGetAppContainerNamedObjectPath [0x77357B5E+286]
RtlGetAppContainerNamedObjectPath [0x77357B2E+238]
Process finished with exit code 1
I added expected conditions and WebDriverWait and set a 10-second wait until the elements load (with 5 seconds I still got the error), and now everything works:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
elements_to_save = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@data-id='nba:schedule:main:team:link']")))
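A StaleElementReferenceException means the element reference was obtained before the page re-rendered, so waiting longer only helps if the re-render finishes before the elements are collected. A more robust, commonly used remedy is to look the element up again and retry the step. The sketch below is a library-agnostic retry loop (retry_on is a hypothetical name, not a Selenium API); with Selenium you would pass StaleElementReferenceException from selenium.common.exceptions and a callable that re-finds the element on every attempt.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on(
    action: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Call `action`; if it raises one of `exceptions`, sleep `delay` seconds and retry.

    At most `attempts` calls are made; the exception from the final call propagates.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for _ in range(attempts - 1):
        try:
            return action()
        except exceptions:
            time.sleep(delay)
    return action()  # last attempt: let any exception reach the caller


# Hypothetical Selenium usage: look the element up again on every attempt
# instead of reusing a reference that may have gone stale after a re-render.
#
# from selenium.common.exceptions import StaleElementReferenceException
# xpath = "//*[@data-id='nba:schedule:main:team:link']"
# for i in range(len(driver.find_elements(By.XPATH, xpath))):
#     text = retry_on(
#         lambda: driver.find_elements(By.XPATH, xpath)[i].text,
#         (StaleElementReferenceException,),
#     )
```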

How to comment on Blogspot within an iframe with Selenium (Python)

I would like to comment on a Blogspot post with Selenium, ChromeDriver and Python. I tried many methods, but they all failed. How can I get my code below to work?
driver.get(url)
iframe = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.NAME, 'comment-editor')))
driver.switch_to.frame(iframe)
element=WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.NAME, 'commentBody')))
actionChains = ActionChains(driver)
actionChains.move_to_element(element).click().perform()
actionChains.move_to_element(element).send_keys(text).perform()
I'm getting an error on this line:
element=WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.NAME, 'commentBody')))
Please help me post a comment with Selenium.
Edit:
Test URL: https://lf2011b8308.blogspot.com/2011/12/macronutrients-carbohydrates-proteins.html
Error stacktrace:
Traceback (most recent call last):
File "C:/Users/Hotto/PycharmProjects/blogspot/chromes.py", line 51, in <module>
element=WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.NAME, 'commentBody')))
File "C:\Users\Hotto\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\support\wait.py", line 89, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
Backtrace:
Ordinal0 [0x00405FD3+2187219]
Ordinal0 [0x0039E6D1+1763025]
Ordinal0 [0x002B3E78+802424]
Ordinal0 [0x002E1C10+990224]
Ordinal0 [0x002E1EAB+990891]
Ordinal0 [0x0030EC92+1174674]
Ordinal0 [0x002FCBD4+1100756]
Ordinal0 [0x0030CFC2+1167298]
Ordinal0 [0x002FC9A6+1100198]
Ordinal0 [0x002D6F80+946048]
Ordinal0 [0x002D7E76+949878]
GetHandleVerifier [0x006A90C2+2721218]
GetHandleVerifier [0x0069AAF0+2662384]
GetHandleVerifier [0x0049137A+526458]
GetHandleVerifier [0x00490416+522518]
Ordinal0 [0x003A4EAB+1789611]
Ordinal0 [0x003A97A8+1808296]
Ordinal0 [0x003A9895+1808533]
Ordinal0 [0x003B26C1+1844929]
BaseThreadInitThunk [0x7697343D+18]
RtlInitializeExceptionChain [0x77729812+99]
RtlInitializeExceptionChain [0x777297E5+54]
To send a character sequence to the commentBody field, note that the elements are within an <iframe>, so you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired element to be clickable.
You can use the following locator strategies:
driver.get('https://lf2011b8308.blogspot.com/2011/12/macronutrients-carbohydrates-proteins.html')
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[name='comment-editor']")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "commentBody"))).send_keys("Akif")
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
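After the comment is typed, the driver is still inside the iframe. If the script then needs anything on the main page, it must call driver.switch_to.default_content() first. A minimal sketch of a hypothetical context manager (inside_frame is not part of Selenium) that pairs the two switches, so the switch back happens even if something inside the frame fails:

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def inside_frame(driver: Any, frame_reference: Any) -> Iterator[Any]:
    """Switch `driver` into an iframe for the duration of a with-block.

    `frame_reference` is anything driver.switch_to.frame() accepts: a frame
    WebElement, a frame name or id string, or an integer index. On exit,
    even when the block raises, the driver goes back to the top-level document.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.default_content()


# Hypothetical usage with the Blogspot page (assumes the imports listed above):
# iframe = WebDriverWait(driver, 20).until(
#     EC.presence_of_element_located((By.NAME, "comment-editor")))
# with inside_frame(driver, iframe):
#     WebDriverWait(driver, 20).until(
#         EC.element_to_be_clickable((By.NAME, "commentBody"))).send_keys("Akif")
# # back on the main page here
```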

Selenium TimeoutException not handling error

I'm writing a script that tests usernames and passwords against a list of IPs. There are four outcomes:
The IP/webpage is not reachable.
There is no login page.
The username/password fail.
The login is successful.
For some reason, the TimeoutException is not handling the error. The script crashes when an IP is not reachable.
#!/usr/bin/env python3
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
username = "8000"
password = "0"
def default_password(site_ip):
    ops = webdriver.ChromeOptions()
    ops.add_argument('--ignore-certificate-errors')
    ops.add_argument('--ignore-ssl-errors')
    ops.add_argument('--headless')
    driver = webdriver.Chrome('/mnt/c/Windows/chromedriver.exe',options=ops)
    try:
        driver.get("https://"+site_ip)
    except TimeoutException as ex:
        print("Connection timed out")
        driver.close()
    try:
        driver.find_element(By.ID, "username").send_keys(username)
        driver.find_element(By.ID, "password").send_keys(password)
        driver.find_element(By.ID, "submit").click()
    except NoSuchElementException as ex:
        print ("No login page. Try "+site_ip+" in browser.")
        driver.close()
    try:
        success = WebDriverWait(driver, 80).until(EC.presence_of_element_located((By.ID, "loggedInUsername")))
        if 'success' in locals():
            print("Login successful with default password for "+ site_ip)
            driver.close()
    except TimeoutException as ex:
        print ("Login Unsuccessful")
        driver.close()
default_password(192.168.0.1)
Here's the traceback I get (identifying information removed):
DevTools listening on ws://127.0.0.1:59113/devtools/browser/2bc2e501-4b4a-4d9a-989b-5e72c7b25a68
Traceback (most recent call last):
File "/default_pass.py", line 56, in <module>
default_password("xx.xx.xx.xx")
File "/default_pass.py", line 24, in default_password
driver.get("https://"+site_ip)
File "/.local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 447, in get
self.execute(Command.GET, {'url': url})
File "/.local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
self.error_handler.check_response(response)
File "/.local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: net::ERR_CONNECTION_TIMED_OUT
(Session info: headless chrome=103.0.5060.114)
Stacktrace:
Backtrace:
Ordinal0 [0x00616463+2188387]
Ordinal0 [0x005AE461+1762401]
Ordinal0 [0x004C3D78+802168]
Ordinal0 [0x004C04E8+787688]
Ordinal0 [0x004B654D+746829]
Ordinal0 [0x004B710A+749834]
Ordinal0 [0x004B675A+747354]
Ordinal0 [0x004B5D3F+744767]
Ordinal0 [0x004B4C28+740392]
Ordinal0 [0x004B50FD+741629]
Ordinal0 [0x004C5544+808260]
Ordinal0 [0x0051D2DD+1168093]
Ordinal0 [0x0050C7DC+1099740]
Ordinal0 [0x0051CC22+1166370]
Ordinal0 [0x0050C5F6+1099254]
Ordinal0 [0x004E6BE0+945120]
Ordinal0 [0x004E7AD6+948950]
GetHandleVerifier [0x008B71F2+2712546]
GetHandleVerifier [0x008A886D+2652765]
GetHandleVerifier [0x006A002A+520730]
GetHandleVerifier [0x0069EE06+516086]
Ordinal0 [0x005B468B+1787531]
Ordinal0 [0x005B8E88+1805960]
Ordinal0 [0x005B8F75+1806197]
Ordinal0 [0x005C1DF1+1842673]
BaseThreadInitThunk [0x75B2FA29+25]
RtlGetAppContainerNamedObjectPath [0x77157A7E+286]
RtlGetAppContainerNamedObjectPath [0x77157A4E+238]
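You were close enough. Instead of passing the raw IP address, i.e. 192.168.0.1 as in your code, you need to pass it as a string, which then gets appended to https:// within default_password(site_ip):
default_password("192.168.0.1")
Also note that the traceback shows a WebDriverException (net::ERR_CONNECTION_TIMED_OUT), not a TimeoutException. TimeoutException is a subclass of WebDriverException, not the other way round, so except TimeoutException does not catch this error. Catch WebDriverException (from selenium.common.exceptions) around driver.get() if an unreachable IP should be reported instead of crashing the script.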
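Because site_ip is concatenated with "https://", every entry must be a string. Since the script works through a list of IPs, a small check with the standard ipaddress module can reject malformed entries before any browser starts. A minimal sketch, assuming IPv4 addresses as in the question; validate_ipv4_list is a hypothetical name.

```python
import ipaddress
from typing import Iterable, List, Tuple


def validate_ipv4_list(ip_strings: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split raw IP strings into (valid, invalid) lists, normalising the valid ones."""
    valid: List[str] = []
    invalid: List[str] = []
    for raw in ip_strings:
        candidate = raw.strip()  # tolerate whitespace and newlines from a text file
        try:
            valid.append(str(ipaddress.IPv4Address(candidate)))
        except ipaddress.AddressValueError:
            invalid.append(candidate)
    return valid, invalid


valid, invalid = validate_ipv4_list(["192.168.0.1", " 10.0.0.5\n", "10.0.0.256"])
print(valid)    # ['192.168.0.1', '10.0.0.5']
print(invalid)  # ['10.0.0.256']
```

Each entry in valid can then be passed straight to default_password(ip).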

Selenium Python not scraping from this website

I was trying to interact with this website using Selenium in Python. I wrote this code to select a radio button using XPath, but a weird error shows up in my terminal. Can anyone please solve this problem? I tried, but I can't figure it out.
My code:
from select import select
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import csv
import time
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get('https://www2.illinois.gov/idoc/Offender/Pages/InmateSearch.aspx')
button = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table[2]/tbody/tr/td[1]/table/tbody/tr/td/form/table/tbody/tr/td/input[2]')
button.click()
driver.implicitly_wait(10)
driver.quit()
Error :
DevTools listening on ws://127.0.0.1:62348/devtools/browser/ce37da62-856d-4159-ad45-9eca8e63115a
E:\Fiverr job\Orders\1\test.py:18: DeprecationWarning: find_element_by_xpath is deprecated. Please use find_element(by=By.XPATH, value=xpath) instead
button = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table[2]/tbody/tr/td[1]/table/tbody/tr/td/form/table/tbody/tr/td/input[2]')
Traceback (most recent call last):
File "E:\Fiverr job\Orders\1\test.py", line 18, in <module>
button = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table[2]/tbody/tr/td[1]/table/tbody/tr/td/form/table/tbody/tr/td/input[2]')
File "E:\Fiverr job\Orders\1\env\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 526, in find_element_by_xpath
return self.find_element(by=By.XPATH, value=xpath)
File "E:\Fiverr job\Orders\1\env\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1251, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "E:\Fiverr job\Orders\1\env\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 430, in execute
self.error_handler.check_response(response)
File "E:\Fiverr job\Orders\1\env\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/table/tbody/tr/td/table[2]/tbody/tr/td[1]/table/tbody/tr/td/form/table/tbody/tr/td/input[2]"}
(Session info: chrome=102.0.5005.63)
Stacktrace:
Backtrace:
Ordinal0 [0x0054D953+2414931]
Ordinal0 [0x004DF5E1+1963489]
Ordinal0 [0x003CC6B8+837304]
Ordinal0 [0x003F9500+1021184]
Ordinal0 [0x003F979B+1021851]
Ordinal0 [0x00426502+1205506]
Ordinal0 [0x004144E4+1131748]
Ordinal0 [0x00424812+1198098]
Ordinal0 [0x004142B6+1131190]
Ordinal0 [0x003EE860+976992]
Ordinal0 [0x003EF756+980822]
GetHandleVerifier [0x007BCC62+2510274]
GetHandleVerifier [0x007AF760+2455744]
GetHandleVerifier [0x005DEABA+551962]
GetHandleVerifier [0x005DD916+547446]
Ordinal0 [0x004E5F3B+1990459]
Ordinal0 [0x004EA898+2009240]
Ordinal0 [0x004EA985+2009477]
Ordinal0 [0x004F3AD1+2046673]
BaseThreadInitThunk [0x7648FA29+25]
RtlGetAppContainerNamedObjectPath [0x77BE7A7E+286]
RtlGetAppContainerNamedObjectPath [0x77BE7A4E+238]
This is because the element you are trying to click is located inside an iframe, so you must first switch to it before finding and clicking the desired button:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
# assumes chromedriver can be found automatically (e.g. it is on PATH); otherwise pass its location via a Service object
driver = webdriver.Chrome()
driver.get('https://www2.illinois.gov/idoc/Offender/Pages/InmateSearch.aspx')
# wait a little to be sure the iframe is loaded
time.sleep(2)
# find and switch to the iframe
iframe = driver.find_element(By.XPATH, '//*[@id="soi-iframe"]')
driver.switch_to.frame(iframe)
button = driver.find_element(By.XPATH, '/html/body/table/tbody/tr/td/table[2]/tbody/tr/td[1]/table/tbody/tr/td/form/table/tbody/tr/td/input[2]')
button.click()
driver.quit()
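A fixed time.sleep(2) either wastes time or, on a slow connection, may still be too short. Explicit waits instead poll a condition until it holds or a timeout expires. The sketch below shows only that polling idea with the standard library; wait_until is a hypothetical name and this is a simplified illustration, not Selenium's WebDriverWait implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Simplified illustration of the idea behind explicit waits; it is not
    Selenium's WebDriverWait implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

With Selenium itself, WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "soi-iframe"))) waits for the iframe and switches into it in one step, replacing both the sleep and the manual switch.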