Web Scraping "Simple Site" with Selenium Hangs - python

I'm scraping a seemingly simple site that doesn't require a login, or any interaction with elements. However, when I use Selenium/requests/etc., the code just hangs. I've tried matching the headers to what I find using developer tools to no avail. I'm wondering if someone can point me in the right direction.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy
from fake_useragent import UserAgent
URL = 'https://www.cmegroup.com/CmeWS/mvc/xsltTransformer.do?xlstDoc=/XSLT/md/blocks-records.xsl&url=/da/BlockTradeQuotes/V1/Block/BlockTrades?exchange=XCBT,XCME,XCEC,DUMX,XNYM&foi=FUT,OPT,SPD&assetClassId=8&tradeDate=06212021&sortCol=time&sortBy=desc&_=1624332329760'
agent = UserAgent()
userAgent = agent.random
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f'--user-agent={userAgent}')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
driver.get(URL)
The option arguments are recommended for getting chromedriver to run on Google Colab. I've tried it locally without them with the same result.

Related

Connection Issue Python Selenium Webdriver on Tor Browser

I was trying to get around the request limit of GitHub contribution graph (e.g.,https://github.com/crobby/webhook/graphs/contributors) in my web-scraping) during web scraping. So I decide to use webdriver on Tor.
I can open my web driver with the Tor browser. But it cannot stuck at the connecting stage as shown in the screenshot.
I can open links with the web driver, but I still encountered the request limit after it scraped several links. Does anyone have a hint about the potential issue?
Here is my code:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.options import Options
import os
torexe = os.popen(r'C:/Users/fredr/Desktop/Tor Browser/Browser/TorBrowser/Tor/tor.exe')
profile = FirefoxProfile(r'C:/Users/fredr/Desktop/Tor Browser/Browser/TorBrowser/Data/Browser/profile.default')
profile.set_preference('network.proxy.type', 1)
profile.set_preference('network.proxy.socks', '127.0.0.1')
profile.set_preference('network.proxy.socks_port', 9050)
profile.set_preference("network.proxy.socks_remote_dns", False)
profile.update_preferences()
options = Options()
options.binary_location = r'C:/Users/fredr/Desktop/Tor Browser/Browser/firefox.exe'
driver = webdriver.Firefox(firefox_profile= profile, executable_path=r'C:/Users/fredr/Downloads/geckodriver.exe', options=options)
driver.get("http://check.torproject.org")

page_load_strategy cannot be set for selenium webdriver

A site I often access via Chrome Webdriver suddenly takes much longer to load. All needed elements seem to be there but the page is still loading and blocking the code after the driver.get(url) from executing. I then tried to set my page_load_strategy to 'eager' or 'none', but that doesn't seem to change anything, and is also not reflected in the driver's capabilities.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.page_load_strategy = 'eager'
driver = webdriver.Chrome(r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe") , options=options)
driver.capabilities #'pageLoadStrategy': 'normal'
Any ideas what I'm doing wrong?

Browser.get() not working using Selenium using python

I am really bad at coding and am trying to cop a 3070. I tried to create this bot, and the browser opens, but the website link does not open (if that makes sense). Here's my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# For Chrome
chrome_options = Options()
chrome_options.add_argument(
"--user-data-dir=C:/Users/ajith/AppData/Local/Google/Chrome/User Data")
chrome_options.add_argument('--profile-directory=Profile 1')
browser = webdriver.Chrome(
"C:/Users/ajith/OneDrive/Desktop/bot/chromedriver.exe", chrome_options=chrome_options)
# Bestbuy 3090 Page
browser.get(
"https://www.bestbuy.com/site/nvidia-geforce-rtx-3070-8gb-gddr6-pci-express-4-0-graphics-card-dark-platinum-and-black/6429442.p?skuId=6429442")` ` `

Loading Selenium user profile in Tor Chrome on Windows

This code works for Windows where it launches Chrome connected via Tor. Keep in mind you have to have Tor browser running beforehand. How can I enable the user-profile and start the browser logged in? I have tried the regular method. I have only 1 profile. Default. Doesn't seem to be working. Any clues?
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
tor_proxy = "127.0.0.1:9150"
chrome_options = Options()
'''chrome_options.add_argument("--test-type")'''
chrome_options.add_argument('--ignore-certificate-errors')
'''chrome_options.add_argument('--disable-extensions')'''
chrome_options.add_argument('disable-infobars')
'''chrome_options.add_argument("--incognito")'''
chrome_options.add_argument('--user-data=C:\\Users\\user\\AppData\\Local\\Google\\Chrome\\User Data\\Default')
chrome_options.add_argument('--proxy-server=socks5://%s' % tor_proxy)
driver = webdriver.Chrome(executable_path='C:\\chromedriver.exe', options=chrome_options)
driver.get('https://www.gmail.com')
time.sleep(4)
driver.switch_to.frame(0)
driver.find_element_by_id("introAgreeButton").click()
Use this instead.
chrome_options.add_argument("user-data-dir=C:\\Users\\user\\AppData\\Local\\Google\\Chrome\\User Data")
you don't have to specify the profile directory (Default) as it is used by default if nothing is explicitly specified using below code. SO in your case use only the above line of code
chrome_options.add_argument("profile-directory=Profile 1")
Eg:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument(r"user-data-dir=C:\Users\prave\AppData\Local\Google\Chrome\User Data")
driver =webdriver.Chrome("./chromedriver.exe",options=options)
driver.get(url)
Output:

Disable python chrome driver extensions without lose driver path

I started having issues running a python script which uses selenium and chrome driver, so I want to disable extensions of chrome driver using python without losing the path where the driver is located.
Currently I have the below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
path_to_Ie = 'C:\\Python34\\ChromeDriver\\ChromeDriver.exe'
browser = webdriver.Chrome(executable_path = path_to_Ie)
url = 'https://wwww.test.com/'
browser.get(url)
and I would like to add the below lines:
chrome_options = Options()
chrome_options.add_argument("--disable-extensions")
browser = webdriver.Chrome(chrome_options=chrome_options)
Any idea how I can merge both codes to make it work properly?
Thanks a lot.
Just provide multiple keyword arguments:
chrome_options = Options()
chrome_options.add_argument("--disable-extensions")
browser = webdriver.Chrome(executable_path=path_to_Ie, chrome_options=chrome_options)

Categories

Resources