I was trying to get around the request limit of the GitHub contributors graph (e.g., https://github.com/crobby/webhook/graphs/contributors) during web scraping, so I decided to use a WebDriver over Tor.
I can open my web driver with the Tor Browser, but it gets stuck at the connecting stage, as shown in the screenshot.
I can open links with the web driver, but I still hit the request limit after it has scraped several links. Does anyone have a hint about what the issue might be?
Here is my code:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.options import Options
import os
torexe = os.popen(r'C:/Users/fredr/Desktop/Tor Browser/Browser/TorBrowser/Tor/tor.exe')
profile = FirefoxProfile(r'C:/Users/fredr/Desktop/Tor Browser/Browser/TorBrowser/Data/Browser/profile.default')
profile.set_preference('network.proxy.type', 1)
profile.set_preference('network.proxy.socks', '127.0.0.1')
profile.set_preference('network.proxy.socks_port', 9050)
profile.set_preference("network.proxy.socks_remote_dns", False)
profile.update_preferences()
options = Options()
options.binary_location = r'C:/Users/fredr/Desktop/Tor Browser/Browser/firefox.exe'
driver = webdriver.Firefox(firefox_profile=profile, executable_path=r'C:/Users/fredr/Downloads/geckodriver.exe', options=options)
driver.get("http://check.torproject.org")
Here is my code. When I visit https://www.myip.com/ and refresh the page after one minute, my IP address changes. How can I prevent my IP from changing while browsing the site with Selenium, Python, and Firefox?
import os
import time
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver import FirefoxProfile
from selenium import webdriver
torexe = os.popen(r'C:\Users\sam\OneDrive\Desktop\Tor Browser\Browser\firefox.exe')
time.sleep(5)
binary = FirefoxBinary(r"C:\Users\sam\OneDrive\Desktop\Tor Browser\Browser\firefox.exe")
profile = FirefoxProfile(r"C:\Users\sam\OneDrive\Desktop\Tor Browser\Browser\TorBrowser\Data\Browser\profile.default")
profile.set_preference('network.proxy.type', 1)
profile.set_preference('network.proxy.socks', '127.0.0.1')
profile.set_preference('network.proxy.socks_port', 9150)
profile.update_preferences()
firefox_options = webdriver.FirefoxOptions()
driver = webdriver.Firefox(profile, binary)
driver.get("https://www.google.com/")
#driver.quit()
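The answer is: you can't.
Selenium is definitely not the tool for this kind of thing. It only drives the browser; the IP address a site sees is determined by the Tor circuit your traffic takes.
Your question is like asking:
"What T-shirt should I wear in order to prevent https://www.myip.com/ from changing my PC's IP address?"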
I'm scraping a seemingly simple site that doesn't require a login, or any interaction with elements. However, when I use Selenium/requests/etc., the code just hangs. I've tried matching the headers to what I find using developer tools to no avail. I'm wondering if someone can point me in the right direction.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy
from fake_useragent import UserAgent
URL = 'https://www.cmegroup.com/CmeWS/mvc/xsltTransformer.do?xlstDoc=/XSLT/md/blocks-records.xsl&url=/da/BlockTradeQuotes/V1/Block/BlockTrades?exchange=XCBT,XCME,XCEC,DUMX,XNYM&foi=FUT,OPT,SPD&assetClassId=8&tradeDate=06212021&sortCol=time&sortBy=desc&_=1624332329760'
agent = UserAgent()
userAgent = agent.random
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f'--user-agent={userAgent}')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver', options=chrome_options)
driver.get(URL)
The option arguments are recommended for getting chromedriver to run on Google Colab. I've tried it locally without them with the same result.
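One way to narrow down a hang like this is to put an explicit timeout on the request, so that a server which accepts the connection but never answers surfaces as an error instead of an indefinite wait. A minimal sketch using only the standard library; `fetch_with_timeout` is a hypothetical helper and the User-Agent value is a placeholder, not a header known to satisfy this site.

```python
import urllib.request


def fetch_with_timeout(url, timeout=10.0, user_agent="Mozilla/5.0"):
    """Return the response body as bytes, or None on a network error or timeout."""
    # The User-Agent here is a placeholder assumption, not a known-good value.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print(f"request failed: {exc!r}")
        return None
```

In Selenium, the analogous setting is `driver.set_page_load_timeout(seconds)`, which makes `driver.get()` raise instead of waiting indefinitely.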
I am trying to connect to the Tor Browser but get a "proxyConnectFailure" error. I have made multiple attempts to get even the basics of the Tor Browser connected, all in vain. Any help would be greatly appreciated:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary(r"C:\Users\Admin\Desktop\Tor Browser\Browser\firefox.exe")
profile = FirefoxProfile(r"C:\Users\Admin\Desktop\Tor Browser\Browser\TorBrowser\Data\Browser\profile.default")
# Configured profile settings.
proxyIP = "127.0.0.1"
proxyPort = 9150
proxy_settings = {
    "network.proxy.type": 1,
    "network.proxy.socks": proxyIP,
    "network.proxy.socks_port": proxyPort,
    "network.proxy.socks_remote_dns": True,
}
driver = webdriver.Firefox(firefox_binary=binary,proxy=proxy_settings)
def interactWithSite(driver):
    driver.get("https://www.google.com")
    driver.save_screenshot("screenshot.png")
interactWithSite(driver)
To connect to a Tor Browser through a FirefoxProfile you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
import os
torexe = os.popen(r'C:\Users\AtechM_03\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe')
profile = FirefoxProfile(r'C:\Users\AtechM_03\Desktop\Tor Browser\Browser\TorBrowser\Data\Browser\profile.default')
profile.set_preference('network.proxy.type', 1)
profile.set_preference('network.proxy.socks', '127.0.0.1')
profile.set_preference('network.proxy.socks_port', 9050)
profile.set_preference("network.proxy.socks_remote_dns", False)
profile.update_preferences()
driver = webdriver.Firefox(firefox_profile=profile, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("http://check.torproject.org")
You can find a relevant discussion in "How to use Tor with Chrome browser through Selenium".
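The four proxy preferences above recur in every snippet in this thread, with only the port and the DNS flag changing. As a minimal sketch (the helper names `tor_proxy_prefs` and `apply_prefs` are hypothetical, not part of Selenium), they could be collected in one place. It assumes that a `tor.exe` started on its own listens on SOCKS port 9050, while the Tor instance bundled with a running Tor Browser uses 9150; `remote_dns` is left as a parameter because the snippets here disagree on its value.

```python
def tor_proxy_prefs(host="127.0.0.1", port=9050, remote_dns=False):
    """Return the Firefox preferences that route traffic through a SOCKS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_remote_dns": remote_dns,
    }


def apply_prefs(profile, prefs):
    """Copy each preference onto any object exposing set_preference(name, value)."""
    for name, value in prefs.items():
        profile.set_preference(name, value)
    return profile
```

With Selenium installed, `apply_prefs(profile, tor_proxy_prefs(port=9150))` would stand in for the four `set_preference` calls on a `FirefoxProfile`.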
I would like to expand on @DebanjanB's answer by adding the Linux counterpart:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
import os
torexe = os.popen('some/path/tor-browser_en-US/Browser/start-tor-browser')
# in my case, I installed it under a folder tor-browser_en-US after
# downloading and extracting it from
# https://www.torproject.org/download/ for linux
profile = FirefoxProfile(
    'some/path/tor-browser_en-US/Browser/TorBrowser/Data/Browser/profile.default')
profile.set_preference('network.proxy.type', 1)
profile.set_preference('network.proxy.socks', '127.0.0.1')
profile.set_preference('network.proxy.socks_port', 9050)
profile.set_preference("network.proxy.socks_remote_dns", False)
profile.update_preferences()
firefox_options = webdriver.FirefoxOptions()
firefox_options.binary_location = '/usr/bin/firefox'
# /usr/bin/firefox is default location of firefox - for me anyway
driver = webdriver.Firefox(
    firefox_profile=profile, options=firefox_options,
    executable_path='wherever/you/installed/geckodriver')
# I keep my geckodriver(s) in a special folder sorted by versions.
# Geckodriver downloadable here:
# https://github.com/mozilla/geckodriver/releases/
driver.get("http://check.torproject.org")
The accepted answer does not work for opening .onion sites (I believe this has to do with the Tor network not allowing access from a regular Firefox).
With the latest Tor Browser (from the Tor Browser Bundle), starting it through Selenium causes an error that prevents the browser from starting its own Tor proxy, which leads to proxy and timeout errors (regardless of whether a Tor proxy was started from Python, started manually, or not started at all). This could also be because port 9050 or 9150 is already in use by a Tor proxy and therefore unavailable to the browser's own Tor instance, but that does not explain the error that occurs when no Tor proxy is running.
The solution I have found is to start the Tor proxy as usual, either manually or with os.popen("tor.exe"), and to configure the Tor Browser not to start its own Tor proxy.
Here's the code:
import os
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
# Start the standalone Tor proxy before launching the browser
os.popen(r'e:\bla\bla\bla\tor\Tor\tor.exe')
binary = FirefoxBinary(r'e:\bla\bla\bla\Tor Browser\Browser\firefox.exe')
fp = FirefoxProfile(r'e:\foo\bar\bla\Tor Browser\Browser\TorBrowser\Data\Browser\profile.default')
fp.set_preference('extensions.torlauncher.start_tor', False)  # note this
fp.set_preference('network.proxy.type', 1)
fp.set_preference('network.proxy.socks', '127.0.0.1')
fp.set_preference('network.proxy.socks_port', 9050)
fp.set_preference("network.proxy.socks_remote_dns", True)
fp.update_preferences()
driver = webdriver.Firefox(firefox_profile=fp, firefox_binary=binary)
driver.get("http://check.torproject.org")
driver.get('https://www.bbcnewsv2vjtpsuy.onion/')
Note: fp.set_preference('extensions.torlauncher.start_tor', False) configures the Tor Browser not to start its own Tor instance, so that it uses the proxy configuration and the Tor instance started above.
Lo and behold, the TBB (Tor Browser Bundle) now works like a normal Firefox browser under automation.
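Since the failure modes described above hinge on whether a Tor proxy is actually listening on 9050 or 9150, it can help to probe the port before starting the browser. A minimal sketch using only the standard library; `socks_port_open` is a hypothetical helper, and a successful TCP connect only shows that something is listening there, not that it is Tor.

```python
import socket


def socks_port_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for port in (9050, 9150):
        state = "open" if socks_port_open(port=port) else "closed"
        print(f"127.0.0.1:{port} is {state}")
```

If neither port reports as open, the proxy was probably not started (or is still starting up), which would be consistent with a proxy or timeout error in the browser.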
I log in to Gmail automatically with Selenium WebDriver, but when I close the browser and open it again, whether normally or through Selenium WebDriver, I have to log in again. Why?
This is my code:
from selenium.webdriver.firefox.options import Options
from selenium import webdriver
import time
fp = webdriver.FirefoxProfile(r"K:\Desktop\Kiranaf\Profile")
options = Options()
options.set_preference('media.peerconnection.enabled', False)
options.set_preference('javascript.enabled', False)
driver = webdriver.Firefox(firefox_profile=fp, options=options)
driver.get("http://gmail.com")
...
Thanks for answering my question! :(
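A likely explanation, offered as an assumption rather than a confirmed diagnosis: `webdriver.FirefoxProfile(path)` works on a temporary copy of the profile directory, so cookies set during the automated session are not written back to the original folder. One workaround is to save and restore cookies explicitly. A minimal sketch, where `save_cookies`, `load_cookies`, and the `cookies.json` file name are hypothetical; it relies only on the driver's `get_cookies()` and `add_cookie()` methods, and a site such as Gmail may still reject replayed cookies.

```python
import json


def save_cookies(driver, path="cookies.json"):
    """Write the current session's cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(driver.get_cookies(), fh)


def load_cookies(driver, path="cookies.json"):
    """Add previously saved cookies to the session and return how many were added.

    The driver should already be on a page of the cookies' domain,
    otherwise the browser refuses to accept them.
    """
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

The idea would be to call `save_cookies(driver)` after logging in and, in the next session, `driver.get(...)` the site first, then `load_cookies(driver)` and reload the page.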
While running my Python Selenium script with Firefox, I ran into a page saying
Your connection is not secure
It does not let me add an exception, and it also blocks
Confirm Security Exception
(even when I set the preferences manually). So I have been trying profile preferences such as "webdriver_assume_untrusted_issuer" and "webdriver_accept_untrusted_certs", but nothing helps. I'm not sure how to tackle this.
I need some help here.
Currently using the following...
Python 3.4.4
selenium==3.4.1
Linux (32-bit)
Firefox 60.6.1esr (32-bit)
Everything seems to be compatible, so no issue here.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
cap = DesiredCapabilities().FIREFOX
profile = webdriver.FirefoxProfile()
profile.set_preference("webdriver_assume_untrusted_issuer", False)
profile.update_preferences()
browser = webdriver.Firefox(capabilities=cap,firefox_profile=profile)
browser.get('my url')
and
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
cap = DesiredCapabilities().FIREFOX
profile = webdriver.FirefoxProfile()
profile.set_preference("webdriver_accept_untrusted_certs", True)
browser = webdriver.Firefox(capabilities=cap,firefox_profile=profile)
browser.get('my url')
I want to get rid of the "Your connection is not secure" page.
For Firefox you can use:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# Copy the defaults so the shared FIREFOX dict is not modified
desired_caps = DesiredCapabilities.FIREFOX.copy()
desired_caps.update({'acceptInsecureCerts': True, 'acceptSslCerts': True})
driver = webdriver.Firefox(capabilities=desired_caps)
For Chrome:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')
driver = webdriver.Chrome(options=options)