I have searched on Stack Overflow and googled how to use proxies with Selenium. I found two different ways, but neither is working for me. Can you guys please help me figure out what I am doing wrong?
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType
proxy = "YYY.YYY.YYY.YY:XXXX"
prox = Proxy()
prox.proxy_type = ProxyType.MANUAL
prox.http_proxy = proxy
prox.https_proxy = proxy
capabilities = webdriver.DesiredCapabilities.CHROME
prox.add_to_capabilities(capabilities)
options = webdriver.ChromeOptions()
options.add_experimental_option('detach', True)
driver = webdriver.Chrome(desired_capabilities=capabilities, options=options)
The code above did not work. The page would open, but when I went to "Whatsmyip.com" I could still see my home IP.
I then tried another method I found on this link:
https://www.browserstack.com/guide/set-proxy-in-selenium
proxy = "YYY.YYY.YYY.YY:XXXX"
options = webdriver.ChromeOptions()
options.add_experimental_option('detach', True)
options.add_argument("--proxy--server=%s" % proxy)
driver = webdriver.Chrome(options = options)
Same result as with the previous method: the browser opens, but it still shows my home IP.
Worth mentioning that I tried both USER:PASS proxies and IP-authorized proxies. Neither worked!
In addition to helping me get proxies working, I would also like to understand why these methods are different. On the one hand, the Selenium documentation talks about a Proxy class that you import from "selenium.webdriver.common.proxy"; on the other hand, the second method uses Chrome's options directly and not Selenium's Proxy class. I am confused as to why there are two methods, and of course which one works more reliably.
Thanks
Authenticated proxies aren't supported by Chrome by default. If you still need them, refer to Selenium-Profiles.
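As an aside, another route people commonly take for USER:PASS proxies is the third-party selenium-wire package, which injects the credentials for you; whether it fits depends on your setup. A rough sketch with placeholder credentials and host:
# pip install selenium-wire
from seleniumwire import webdriver  # note: seleniumwire, not selenium

# Placeholder credentials and address -- replace with your own proxy
sw_options = {
    "proxy": {
        "http": "http://USER:PASS@host:port",
        "https": "https://USER:PASS@host:port",
        "no_proxy": "localhost,127.0.0.1",
    }
}

driver = webdriver.Chrome(seleniumwire_options=sw_options)
driver.get("http://lumtest.com/myip.json")  # should report the proxy's IP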
Your second code snippet should work as follows (note that the flag is --proxy-server, with a single dash in the middle):
from selenium import webdriver

proxy = "https://host_or_ip:port"  # or "socks5://..." or "http://..."
options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=%s" % proxy)
driver = webdriver.Chrome(options=options)
driver.get("http://lumtest.com/myip.json")  # test proxy
input("Press ENTER to exit")
If it still doesn't work, check your proxy with curl or python requests.
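For example, a quick sanity check of the proxy outside of Selenium could look like the sketch below (the proxy address is a placeholder):
import requests

proxy = "http://host_or_ip:port"  # placeholder -- use your real proxy here
try:
    r = requests.get(
        "http://lumtest.com/myip.json",
        proxies={"http": proxy, "https": proxy},
        timeout=10,
    )
    print(r.status_code, r.text)  # should report the proxy's IP, not yours
except requests.RequestException as exc:
    print("Proxy did not respond:", exc)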
Related
I have been searching loads of forums for how to use a proxy in Python with the Selenium library, to prevent "max number" timeouts while web scraping through Selenium.
I found the script below in many forums, but it just doesn't seem to work for me whatsoever... Could anyone please help me and give me some advice on how to implement a proxy in Chrome through Python with Selenium?
Thanks a lot!
SCRIPT:
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
chromedriver = directory....
PROXY = "177.202.59.58:8080"
chrome_options = Options()
chrome_options.add_argument('--proxy-server=%s' % PROXY)
chrome = webdriver.Chrome(chromedriver, options=chrome_options)
chrome.get("https://whatismyipaddress.com")
There's nothing wrong with your code. That proxy is just not available/not working anymore.
Try to find another proxy that has better uptime. Keep in mind that public proxies have noticeable latency, so the page will load pretty slowly.
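If it helps, here is a rough sketch of cycling through a list of candidate public proxies with requests and launching Chrome with the first one that responds (the second address below is just a placeholder):
import requests
from selenium import webdriver

# Candidate public proxies -- these lists go stale constantly
candidates = ["177.202.59.58:8080", "0.0.0.0:8080"]  # second entry is a placeholder

working = None
for p in candidates:
    try:
        requests.get("http://example.com",
                     proxies={"http": f"http://{p}", "https": f"http://{p}"},
                     timeout=5)
        working = p
        break
    except requests.RequestException:
        continue

if working:
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument(f"--proxy-server={working}")
    chrome = webdriver.Chrome(options=chrome_options)
    chrome.get("https://whatismyipaddress.com")
else:
    print("No candidate proxy responded")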
I'm using Selenium webdriver to open a webpage and I set up a proxy for the driver to use. The code is listed below:
PATH = "C:\Program Files (x86)\chromedriver.exe"
PROXY = "212.237.16.60:3128" # IP:PORT or HOST:PORT
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f'--proxy-server={PROXY}')
proxy = Proxy()
proxy.auto_detect = False
proxy.http_proxy = PROXY
proxy.sslProxy = PROXY
proxy.socks_proxy = PROXY
capabilities = webdriver.DesiredCapabilities.CHROME
proxy.add_to_capabilities(capabilities)
driver = webdriver.Chrome(PATH, chrome_options=chrome_options,desired_capabilities=capabilities)
driver.get("https://whatismyipaddress.com")
The problem is that the web driver is not using the given proxy and accesses the page with my normal IP. I have already tried every piece of code I could find on the internet and it didn't work. I also tried to set the proxy directly in my PC settings: when I open a normal Chrome window it works fine (so it's not a problem with the proxy server), but if I open a page with the driver it still uses my normal IP and somehow bypasses the proxy. I also tried changing the proxy settings of the IDE (PyCharm) and it's still not working. I'm out of ideas; could someone help me?
This should work.
Code snippet:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
PROXY = "212.237.16.60:3128"

# add the proxy to chrome_options
chrome_options.add_argument(f'--proxy-server={PROXY}')

driver = webdriver.Chrome(PATH, options=chrome_options)

# to check the new IP
driver.get("https://api.ipify.org/?format=json")
Note: the chrome_options parameter is deprecated now; you have to use options instead.
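For completeness, here is a sketch of the same snippet in current Selenium 4 style, where the driver path goes through a Service object instead of a positional argument (assuming a Selenium 4.6+ install, which can also locate chromedriver on its own):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

PROXY = "212.237.16.60:3128"

options = Options()
options.add_argument(f"--proxy-server={PROXY}")

# If chromedriver is not on PATH, point Service at it explicitly:
# service = Service(executable_path=r"C:\path\to\chromedriver.exe")
service = Service()

driver = webdriver.Chrome(service=service, options=options)
driver.get("https://api.ipify.org/?format=json")  # check the new IP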
The challenge I see is that, through selenium, I am trying to click on a website element (a div with some js attached). The "button" navigates you to another page.
How can I configure the browser to automatically route the requests through a proxy?
My proxy is set up as follows:
http://api.myproxy.com?key=AAA111BBB6&url=http://awebsitetobrowse.com
I am trying to put webdriver (chrome) behind the proxy
from selenium import webdriver
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(chrome_options=options)
where options, so far, is some basic configuration of the browser window size.
I have seen quite a few examples (ex1, ex2, ex3), but I somehow fail to find one that suits my needs.
import os
dir_path = os.path.dirname(os.path.realpath(__file__)) + "\\chromedriver.exe"
PROXY = "http://api.scraperapi.com?api_key=1234&render=true"
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % PROXY)
driver = webdriver.Chrome(executable_path = dir_path, chrome_options=chrome_options)
driver.get("https://stackoverflow.com/questions/11450158/how-do-i-set-proxy-for-chrome-in-python-webdriver")
It seems like the proxy address you are using is not an actual proxy; it is an API that returns the HTML content of the page itself after handling proxies, captchas, or IP blocking. Still, there can be a different solution for each scenario. Some of those are as follows.
Scenario 1
In my view, you are using this API in the wrong manner: if your API returns the response of the
visited page through its own proxies,
it should be used directly in 'driver.get()' with
address="http://api.scraperapi.com/?api_key=YOURAPIKEY&url="+url_to_be_visited_via_api
Example code for this would look like:
import os
from selenium import webdriver

dir_path = os.path.dirname(os.path.realpath(__file__)) + "\\chromedriver.exe"

APIKEY = "1234"  # replace with your API key (kept as a string so it can be concatenated)
apiURL = "http://api.scraperapi.com/?api_key=" + APIKEY + "&render=true&url="
visit_url = "https://stackoverflow.com/questions/11450158/how-do-i-set-proxy-for-chrome-in-python-webdriver"

driver = webdriver.Chrome(executable_path=dir_path)
driver.get(apiURL + visit_url)
Scenario 2
But if you have an API that returns a proxy address and login
credentials in its response, then that proxy can be plugged into Chrome options and used
with Chrome itself.
This applies if the response of the API is something like
"PROTOCOL://user:password@proxyserver:proxyport" (in case of authentication)
"PROTOCOL://proxyserver:proxyport" (in case of no authentication)
In both cases PROTOCOL can be http, https, socks4, socks5, etc. (Note that Chrome itself ignores embedded user:password credentials in --proxy-server, so the authenticated form typically needs extra tooling.)
And that code should look like:
import os

import requests
from selenium import webdriver

dir_path = os.path.dirname(os.path.realpath(__file__)) + "\\chromedriver.exe"

proxyapi = "http://api.scraperapi.com?api_key=1234&render=true"
proxy = requests.get(proxyapi).text  # expected to return "PROTOCOL://[user:password@]proxyserver:proxyport"
visit_url = "https://stackoverflow.com/questions/11450158/how-do-i-set-proxy-for-chrome-in-python-webdriver"

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=' + proxy)

driver = webdriver.Chrome(executable_path=dir_path, chrome_options=chrome_options)
driver.get(visit_url)
Scenario 3
But if the API itself is a proxy with no authentication, then it can be plugged into Chrome options and used
with Chrome itself.
And that code should look like:
import os

from selenium import webdriver

dir_path = os.path.dirname(os.path.realpath(__file__)) + "\\chromedriver.exe"

proxyapi = "http://api.scraperapi.com?api_key=1234&render=true"
visit_url = "https://stackoverflow.com/questions/11450158/how-do-i-set-proxy-for-chrome-in-python-webdriver"

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=' + proxyapi)

driver = webdriver.Chrome(executable_path=dir_path, chrome_options=chrome_options)
driver.get(visit_url)
So the solution to use depends on which of these scenarios applies.
Well, after countless experiments, I have figured out that it works with:
apiURL = "http://api.scraperapi.com/?api_key="+APIKEY+"&render=true&url="
while it fails miserably with
apiURL = "http://api.scraperapi.com?api_key="+APIKEY+"&render=true&url="
I have to admit my ignorance here: I thought the two should be equivalent.
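For what it's worth, the two forms parse to the same query string and differ only in an empty path versus a "/" path, so whether they behave the same is ultimately up to the server (or any URL handling in between). A small illustration with the standard library, using a dummy key:
from urllib.parse import urlparse

# Dummy API key purely for illustration
with_slash = urlparse("http://api.scraperapi.com/?api_key=1234&render=true&url=")
no_slash = urlparse("http://api.scraperapi.com?api_key=1234&render=true&url=")

print(with_slash.path)  # prints "/"
print(no_slash.path)    # prints "" (empty path)
print(with_slash.query == no_slash.query)  # True -- same query string either way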
When I try to use a proxy in Firefox it doesn't work: the code doesn't give me any error and everything seems to run, but the IP doesn't change. When I do the same with Chrome, it works.
Can you please tell me what I'm doing wrong?
from selenium import webdriver
PROXY = "ipproxy"
firefox_options = webdriver.FirefoxOptions()
firefox_options.add_argument ('--proxy-server=%s' % PROXY)
firefox = webdriver.Firefox(options=firefox_options)
firefox.get("https://www.youtube.com/")
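A likely explanation: --proxy-server is a Chromium command-line flag, so Firefox simply ignores it. With Firefox the proxy is usually set through profile preferences or Selenium's Proxy class. A minimal sketch using preferences, with a placeholder host and port:
from selenium import webdriver

PROXY_HOST = "host_or_ip"  # placeholder
PROXY_PORT = 8080          # placeholder

firefox_options = webdriver.FirefoxOptions()
firefox_options.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
firefox_options.set_preference("network.proxy.http", PROXY_HOST)
firefox_options.set_preference("network.proxy.http_port", PROXY_PORT)
firefox_options.set_preference("network.proxy.ssl", PROXY_HOST)
firefox_options.set_preference("network.proxy.ssl_port", PROXY_PORT)

firefox = webdriver.Firefox(options=firefox_options)
firefox.get("https://www.youtube.com/")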
I've done plenty of searching; however, there are a lot of confusing snippets out there that are very similar.
I've attempted to use DesiredCapabilities, ChromeOptions, Options, and a series of arguments, but nothing is working :( It fails to set a proxy.
For example (ChromeOptions)
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy=https://' + proxy_ip_and_port)
chrome_options.add_argument('--proxy-auth=' + proxy_user_and_pass)
chrome_options.add_argument('--proxy-type=https')
browser = webdriver.Chrome("C:\drivers\chromedriver.exe")
Another example (Options)
options = Options()
options.add_argument('--proxy=https://' + proxy_ip_and_port)
options.add_argument('--proxy-auth=' + proxy_user_and_pass)
options.add_argument('--proxy-type=https')
browser = webdriver.Chrome("C:\drivers\chromedriver.exe", chrome_options=options)
I've also used --proxy-server instead of --proxy-auth, --proxy-type, etc., even in the format of: '--proxy-server=http://' + proxy_user_and_pass + '@' + proxy_ip_and_port
Another example (DesiredCapabilities)
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
capabilities = dict(DesiredCapabilities.CHROME)
capabilities['proxy'] = {'proxyType': 'MANUAL',
'httpProxy': proxy_ip_and_port,
'ftpProxy': proxy_ip_and_port,
'sslProxy': proxy_ip_and_port,
'noProxy': '',
'class': "org.openqa.selenium.Proxy",
'autodetect': False}
capabilities['proxy']['socksUsername'] = proxy_user
capabilities['proxy']['socksPassword'] = proxy_pass
browser = webdriver.Chrome(executable_path="C:\drivers\chromedriver.exe", desired_capabilities=capabilities)
I've tried in Firefox too but the same issue happens, it uses the browser with my normal IP.
According to the latest documentation (Jul 2020) you set the DesiredCapabilities for either FIREFOX or CHROME.
I've tested it for Firefox. You can check your browser's connection settings afterwards to validate the proxy was set correctly.
from selenium import webdriver
PROXY = "<HOST>:<PORT>" # HOST can be IP or name
webdriver.DesiredCapabilities.FIREFOX['proxy'] = {
"httpProxy": PROXY,
"ftpProxy": PROXY,
"sslProxy": PROXY, # this is the https proxy
"proxyType": "MANUAL",
}
with webdriver.Firefox() as driver:
    # Open URL
    driver.get("https://selenium.dev")
The proxy-dict itself is documented in the Selenium wiki. You'll see here that the attribute sslProxy sets the proxy for https.
I haven't tested it for Chrome though. If it doesn't work, you may find clues in Google's ChromeDriver documentation. According to this, you also need to instantiate the webdriver with the desired_capabilities parameter (which is then actually very similar to your example, so this is now more of a guess than a proven solution):
caps = webdriver.DesiredCapabilities.CHROME.copy()
caps['proxy'] = ... # like described above
driver = webdriver.Chrome(desired_capabilities=caps)
driver.get("https://selenium.dev")