I am using Python + Selenium WebDriver to automate checks.
I am stuck on websites that request HTTP authentication through a popup window.
I am trying to use the "authenticate" method with the following code:
#init.
driver = webdriver.Firefox()
driver.get(url)
#get to the auth popup window by clicking relevant link
elem = driver.find_element_by_id("login_link")
elem.click()
#use authenticate alert method
driver.switch_to.alert.authenticate("login", "password")
The (scarce) documentation on this method indicates that it should submit the provided credentials and validate the HTTP authentication. It doesn't, and I am getting the following error:
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/alert.py", line 105, in authenticate
    self.driver.execute(Command.SET_ALERT_CREDENTIALS, {'username': username, 'password': password})
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 201, in execute
    self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 159, in check_response
    raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message: Unrecognized command: POST /session/c30d03e1-3835-42f5-ace0-968aef486b36/alert/credentials
Is there something I am missing here, or has anybody come across the same issue and resolved it?
PS: the http://username:password@url trick doesn't work for me under my test conditions.
Basic authentication is pretty easy to work around for automated testing, without having to deal with native alerts/dialogs or other browser differences.
The approach I've used very successfully in the Java world is to set up a BrowserMob proxy server in code and register a RequestInterceptor to intercept all incoming requests that match the host / URL pattern in question. When a request would otherwise need Basic auth, add an Authorization HTTP header with the required credentials: 'Basic ' plus the Base64-encoded 'user:pass' string (so for 'foo:bar' you'd set the value Basic Zm9vOmJhcg==).
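For instance, producing that header value in Python:
import base64

# 'foo:bar' encodes to 'Zm9vOmJhcg==', giving the header: Authorization: Basic Zm9vOmJhcg==
token = base64.b64encode(b"foo:bar").decode("ascii")
print("Basic " + token)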
Start the server, set it as a web proxy for Selenium traffic, and when a request is made that requires authentication, the proxy adds the header, the site sees valid credentials, and the browser never needs to pop up the dialog.
Although the technique might seem laborious, by having the header set automatically for every request you don't have to explicitly add user:pass@ to any URL that might need it, which matters when there are multiple ways into the auth-ed area. Also, unlike with user:pass@ URLs, you don't have to worry about the browser caching the header (or ceasing to cache it after a certain amount of time), or about crossing between HTTP and HTTPS.
That technique works very well in Java, but how do you achieve the same thing in Python?
You could use this Python wrapper for Browsermob, which exposes its REST API in Python. This is the REST call you'll need:
POST /proxy/[port]/headers - Set and override HTTP Request headers.
For example setting a custom User-Agent. Payload data should be json
encoded set of headers (not url-encoded)
So, from the earlier example (pseudocode):
POST localhost:8787/proxy/<proxy_port>/headers '{"Authorization": "Basic Zm9vOmJhcg=="}'
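As a rough sketch of the same idea using the browsermob-proxy Python wrapper instead of raw REST calls (the binary path and target URL below are placeholders):
from browsermobproxy import Server
from selenium import webdriver

server = Server("/path/to/browsermob-proxy")  # placeholder path to the BrowserMob Proxy binary
server.start()
proxy = server.create_proxy()

# Add the Basic auth header to every request that goes through the proxy
proxy.headers({"Authorization": "Basic Zm9vOmJhcg=="})

profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
driver.get("http://protected.example.com/")  # placeholder URL behind Basic auth

driver.quit()
server.stop()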
Alternatively, you could see this answer for a custom Python proxy server using Twisted.
Basic authentication is possible in the URL, but you'll have to set a preference:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference("network.http.phishy-userpass-length", 255)
driver = webdriver.Firefox(profile)
driver.get("http://admin:admin#the-internet.herokuapp.com/basic_auth")
If it doesn't work in your case, then it is not basic authentication.
Related
Due to privacy concerns, I cannot share the URL publicly.
I have been able to access this site successfully using Python Requests with session = requests.Session(); r = session.post(url, auth=HttpNtlmAuth(USERNAME, PASSWORD), proxies=proxies) (laid out as a block below), which works great, and I can parse the webpage with bs4. I have tried to retrieve cookies using session.cookies.get_dict(), but it returns an empty dict (presumably because the site is hosted on SharePoint). My original thought was to retrieve the cookies and then use them to access the site.
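For readability, here is that call as a block. The HttpNtlmAuth class is assumed to come from the requests_ntlm package, and the URL, credentials, and proxy settings below are placeholders:
import requests
from requests_ntlm import HttpNtlmAuth  # assumption: NTLM support provided by the requests_ntlm package

url = "https://internal.example-sharepoint.com/page"  # placeholder; the real URL is private
USERNAME, PASSWORD = "DOMAIN\\user", "secret"  # placeholder credentials
proxies = {"http": "http://10.10.1.10:3128", "https": "http://10.10.1.10:3128"}  # placeholder proxy

session = requests.Session()
r = session.post(url, auth=HttpNtlmAuth(USERNAME, PASSWORD), proxies=proxies)
print(session.cookies.get_dict())  # returns an empty dict in this case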
The issue I'm facing is that when you navigate to the URL, a box comes up asking for credentials, and only after entering them are you taken to the page. You cannot inspect the page that the box sits on, which means I can't use send_keys() etc. to log in with Selenium/chromedriver.
I read through some documentation but was unable to find a way to supply the username/password when calling driver = webdriver.Chrome(path_driver) or in subsequent calls.
Any help/thoughts would be appreciated.
When right-clicking the credentials box shown below, there is no option to inspect the webpage.
Context
I am currently attempting to build a small-scale bot using the Selenium and Requests modules in Python.
However, the webpage I want to interact with is running behind Cloudflare.
My Python script runs over Tor using the stem module.
My traffic analysis is based on Firefox's Developer Tools -> Network tab, with "Persist Logs" enabled.
My findings so far:
Selenium's Firefox webdriver can often access the webpage without going through the "checking your browser" page (status code 503) or the captcha page (status code 403).
A Requests session with the same user agent always ends up on the captcha page (status code 403).
If Cloudflare were checking my JavaScript functionality, shouldn't my Requests call also return 503?
Code Example
driver = webdriver.Firefox(firefox_profile=fp, options=fOptions)
driver.get("https://www.cloudflare.com") # usually returns code 200 without verifying the browser
session = requests.Session()
# ... applied socks5 proxy for both http and https ... #
session.headers.update({"user-agent": driver.execute_script("return navigator.userAgent;")})
page = session.get("https://www.cloudflare.com")
print(page.status_code) # return code 403
print(page.text) # returns "captcha page"
Both Selenium and Requests are using the same user agent and IP.
Both are using GET without any parameters.
How does Cloudflare distinguish between these two kinds of traffic?
Am I missing something?
I tried to transfer cookies from the webdriver to the Requests session to see whether a bypass was possible, but had no luck.
Here is the code I used:
for c in driver.get_cookies():
    session.cookies.set(c['name'], c['value'], domain=c['domain'])
There are additional JavaScript properties exposed to the webpage when a browser is driven by Selenium (for example, navigator.webdriver is set to true). If you can disable them, you may be able to fix the problem.
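A minimal sketch of that idea for Firefox, assuming the dom.webdriver.enabled preference (it hides navigator.webdriver in some Firefox versions, but newer builds may ignore it):
from selenium import webdriver

profile = webdriver.FirefoxProfile()
# Assumption: this preference suppresses the navigator.webdriver flag in some Firefox versions
profile.set_preference("dom.webdriver.enabled", False)
driver = webdriver.Firefox(firefox_profile=profile)
print(driver.execute_script("return navigator.webdriver;"))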
Cloudflare doesn't only check HTTP headers or JavaScript; it also analyses the TLS handshake (the client's TLS fingerprint). I'm not sure exactly how it does this, but I've found that it can be circumvented by using NSS instead of OpenSSL (though NSS is not well integrated with Requests).
The captcha response depends on the browser fingerprint. It's not just a matter of sending the right cookies and User-Agent.
Copy all the headers from the Network tab in the developer console and send all the key-value pairs as headers with the Requests library.
In principle, this method should work, as shown in the sketch below.
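A quick sketch of that approach; the header values here are placeholders, and in practice you would copy exactly what your own browser sends from the Network tab:
import requests

headers = {
    # Placeholder values: replace with the headers your browser actually sends
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "keep-alive",
}

session = requests.Session()
session.headers.update(headers)
page = session.get("https://www.cloudflare.com")
print(page.status_code)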
How do I get an Authorization token from a webpage using Python Requests? I have used Requests' basic auth to log in, which worked, but subsequent pages are not accepting the basic auth and return "Authuser is not validated".
There is a login URL where I have successfully logged in using Requests' basic auth. The succeeding pages didn't accept the basic auth credentials; they needed an Authorization header instead. After looking into the browser's inspect tool, I found out that this Authorization header's value is generated as part of the session's local storage. Is there any way to get this session value without using the webdriver API?
Sounds like what you need is a Requests persistent session.
import requests

s = requests.Session()
# then simply make the request like you already are
r = s.get('https://stackoverflow.com/')
# the cookies are persisted
s.cookies.get_dict()
> {'prov': ......}
I can't really get more specific without more info about the site you're using.
At work, I sit behind a proxy. When I connect to the company WiFi and open up a browser, a pop-up box usually appears asking for my company credentials before it will let me navigate to any internal/external site.
I am using the Python Requests package to automate pulling data from an external site, but I am encountering a 401 error related to not having authenticated first. This only happens when I haven't authenticated through the browser; if I authenticate with the browser first and then use Python Requests, everything works and I'm able to reach any site.
My question is: how do I perform the work authentication step using Python? I want to automate this process so that I can set up a cron job that grabs data from an external source every night.
I've tried providing a blank URL:
import requests
response = requests.get('')
But requests.get() requires a properly structured URL. I want to emulate opening a browser and capturing the pop-up that asks for authentication, which does not depend on any particular URL being used.
From the requests documentation
import requests
proxies = {
"http": "http://10.10.1.10:3128",
"https": "http://10.10.1.10:1080",
}
requests.get("http://example.org", proxies=proxies)
I have been struggling to find much information on this, so I have turned here for help.
I am running UI tests of a web app using Robot Framework. When a test fails I want a log of the HTTP requests so I can look back and see what failed, i.e. things not loading, 500 errors, etc.
So far I haven't managed to find anything within Robot Framework or Selenium.
Another option is to see whether there is a Python library for logging this sort of thing, or whether it would be a reasonable task to create one.
I have also looked into using AutoIt to drive the browser's internal network logging tools, but using those is a whole task of its own and I am not sure how well it would work. I am sure I can't be the first person to want this functionality.
I have continued to look into this and have found that a viable option may be a packet sniffer using pcapy. However, I have no experience with network programming and don't know how I would process packets to keep only GET and POST requests and their responses, so any help would be much appreciated.
Cheers
Selenium only emulates user behaviour, so it does not help you here. You could use a proxy that logs all the traffic and lets you examine it; BrowserMob Proxy lets you do that. See Create Webdriver from Selenium2Library on how to configure a proxy for your browser.
This way you can ask your proxy to return the traffic after you notice a failure in your test.
I have implemented the same thing using BrowserMob Proxy. It captures network traffic based on the test requirements.
The first function, CaptureNetworkTraffic(), opens the browser with the configuration provided in its parameters.
The second function, Parse_Request_Response(), reads the HAR file produced by the first function and returns the relevant network data depending on which parameters are set.
e.g.
print(Parse_Request_Response("g:\\har.txt", "google.com", True, True, False, False, False))
In this case, it will look for URLs containing "google.com" and return the response content and request headers for those URLs.
from browsermobproxy import Server
from selenium import webdriver
import json

def CaptureNetworkTraffic(url, server_ip, headers, file_path):
    '''
    Capture network traffic from the browser. Using this function we can
    capture headers/cookies/HTTP calls made from the browser.
    url       - page URL to load
    server_ip - IP to remap the example hosts to
    headers   - dictionary of headers to set on every request
    file_path - file in which the HAR gets stored
    '''
    port = {'port': 9090}
    server = Server("G:\\browsermob\\bin\\browsermob-proxy", port)  # path to the BrowserMob Proxy binary
    server.start()
    proxy = server.create_proxy()
    proxy.remap_hosts("www.example.com", server_ip)
    proxy.remap_hosts("www.example1.com", server_ip)
    proxy.remap_hosts("www.example2.com", server_ip)
    proxy.headers(headers)
    profile = webdriver.FirefoxProfile()
    profile.set_proxy(proxy.selenium_proxy())
    driver = webdriver.Firefox(firefox_profile=profile)
    proxy.new_har("google", {'captureHeaders': True, 'captureContent': True})
    driver.get(url)
    har = proxy.har  # grab the HAR JSON blob before shutting the proxy down
    server.stop()
    driver.quit()
    with open(file_path, 'w') as file1:
        json.dump(har, file1)
def Parse_Request_Response(filename, url, response=False, request_header=False, request_cookies=False, response_header=False, response_cookies=False):
    resp = {}
    har_data = open(filename, 'rb').read()
    har = json.loads(har_data)
    for i in har['log']['entries']:
        if url in i['request']['url']:
            resp['request'] = i['request']['url']
            if response:
                resp['response'] = i['response']['content']
            if request_header:
                resp['request_header'] = i['request']['headers']
            if request_cookies:
                resp['request_cookies'] = i['request']['cookies']
            if response_header:
                resp['response_header'] = i['response']['headers']
            if response_cookies:
                resp['response_cookies'] = i['response']['cookies']
    return resp
if __name__ == "__main__":
    headers = {"User-Agent": "Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3"}
    CaptureNetworkTraffic("http://www.google.com", "192.168.1.1", headers, "g:\\har.txt")
    print(Parse_Request_Response("g:\\har.txt", "google.com", False, True, False, False, False))