I have been struggling to find much information on this, so I have turned here for help.
I am running UI tests of a web app using Robot Framework. When a test fails I want a log of the HTTP requests so I can look back and see what went wrong, e.g. things not loading, 500 errors, etc.
So far I haven't managed to find anything within Robot Framework or Selenium that does this.
Another option is to find a Python library for logging this sort of thing, or to write one if that would be a reasonable task.
I have also looked into using AutoIt to drive the browser's internal network logging tools, but that is a whole task of its own and I am not sure how well it would work. Surely I am not the first person to want this functionality?
I have continued looking into this and found that a viable option may be a packet sniffer using pcapy. However, I have no experience with network programming and no idea how I would process packets so as to capture only GET/POST requests and their responses. Any help would be much appreciated.
Cheers
Selenium only emulates user behaviour, so it does not help you here. You could use a proxy that logs all the traffic and lets you examine it. BrowserMob Proxy lets you do that. See Create Webdriver from Selenium2Library on how to configure the proxy for your browser.
This way you can ask your proxy to return the traffic after you notice a failure in your test.
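For illustration, a minimal Python sketch of that idea, assuming browsermob-proxy is installed locally; the path and the dump_traffic_on_failure helper are placeholders you would wire into your own test teardown:
from browsermobproxy import Server
from selenium import webdriver
import json

server = Server("/path/to/browsermob-proxy")   # placeholder path to the proxy binary
server.start()
proxy = server.create_proxy()

profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())      # route browser traffic through the proxy
driver = webdriver.Firefox(firefox_profile=profile)

proxy.new_har("ui-test")                       # start recording traffic
driver.get("http://example.com")               # ... your test steps ...

def dump_traffic_on_failure(path="failed_test.har"):
    # hypothetical helper: call this from your test teardown when a test fails
    with open(path, "w") as f:
        json.dump(proxy.har, f)                # proxy.har returns the recorded traffic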
I have implemented the same thing using BrowserMobProxy. It captures network traffic based on the test requirements.
The first function, CaptureNetworkTraffic(), opens the browser with the configuration provided in the parameters.
The second function, Parse_Request_Response(), reads the HAR file produced by the first function and returns the corresponding network data based on the parameters passed.
e.g.
print(Parse_Request_Response("g:\\har.txt","google.com",True,True,False,False,False))
In this case, it will look for entries whose URL contains "google.com" and return the response content and request headers for those URLs.
from browsermobproxy import Server
from selenium import webdriver
import json
def CaptureNetworkTraffic(url,server_ip,headers,file_path):
    '''
    This function can be used to capture network traffic from the browser.
    With it we can capture the headers/cookies/HTTP calls made from the browser.
    url       - page URL
    server_ip - host to which the example URLs below are remapped
    headers   - dictionary of the headers to be set
    file_path - file in which the HAR gets stored
    '''
    port = {'port': 9090}
    server = Server("G:\\browsermob\\bin\\browsermob-proxy", port)  # path to the BrowserMob Proxy binary
    server.start()
    proxy = server.create_proxy()
    proxy.remap_hosts("www.example.com", server_ip)
    proxy.remap_hosts("www.example1.com", server_ip)
    proxy.remap_hosts("www.example2.com", server_ip)
    proxy.headers(headers)
    profile = webdriver.FirefoxProfile()
    profile.set_proxy(proxy.selenium_proxy())
    driver = webdriver.Firefox(firefox_profile=profile)
    options = {'captureHeaders': True, 'captureContent': True}
    proxy.new_har("google", options)
    driver.get(url)
    har = proxy.har  # grab the HAR JSON blob before stopping the server
    server.stop()
    driver.quit()
    with open(file_path, 'w') as f:
        json.dump(har, f)
def Parse_Request_Response(filename,url,response=False,request_header=False,request_cookies=False,response_header=False,response_cookies=False):
    resp = {}
    with open(filename, 'r') as f:
        har = json.loads(f.read())
    for entry in har['log']['entries']:
        if url in entry['request']['url']:
            resp['request'] = entry['request']['url']
            if response:
                resp['response'] = entry['response']['content']
            if request_header:
                resp['request_header'] = entry['request']['headers']
            if request_cookies:
                resp['request_cookies'] = entry['request']['cookies']
            if response_header:
                resp['response_header'] = entry['response']['headers']
            if response_cookies:
                resp['response_cookies'] = entry['response']['cookies']
    return resp
if __name__ == "__main__":
    headers = {"User-Agent":"Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3"}
    CaptureNetworkTraffic("http://www.google.com","192.168.1.1",headers,"g:\\har.txt")
    print(Parse_Request_Response("g:\\har.txt","google.com",False,True,False,False,False))
I'm not sure how else to describe this. I'm trying to log into a website using the requests library in Python, but it doesn't seem to capture all of the cookies from the login, and subsequent requests to the site go back to the login page.
The code I'm using is as follows (with redactions):
with requests.Session() as s:
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
Looking at the developer tools in Chrome, I can see the amember_nr cookie being set when I log in. However, after checking r.cookies it seems only PHPSESSID was captured; there's no sign of the amember_nr cookie.
The value in PyCharm only shows:
{RequestsCookieJar: 1}<RequestsCookieJar[<Cookie PHPSESSID=kjlb0a33jm65o1sjh25ahb23j4 for .website.co.uk/>]>
Why does this code fail to save 'amember_nr' and is there any way to retrieve it?
SOLUTION:
It appears the only way I can get this to work properly is by using Selenium, selecting the elements on the page, and automating the typing/clicking. The following code produces the desired result.
from seleniumrequests import Chrome
driver = Chrome()
driver.get('http://www.website.co.uk')
username = driver.find_element_by_xpath("//input[@name='amember_login']")
password = driver.find_element_by_xpath("//input[@name='amember_pass']")
username.send_keys("username")
password.send_keys("password")
driver.find_element_by_xpath("//input[@type='submit']").click() # Page is logged in and all relevant cookies saved.
You can try this:
with requests.Session() as s:
    s.get('https://www.website.co.uk/login')
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
The initial GET request will set the required cookies.
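To check that the missing cookie is now being picked up, you can inspect the session's cookie jar after the POST (cookie names taken from the question):
print(s.cookies.get_dict())  # should now list amember_nr alongside PHPSESSID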
FYI, I would use something like Burp Suite to capture ALL the data being sent to the server and work out which headers etc. are required. Sometimes servers do referrer checking, set cookies via JavaScript or wonky scripting; I've even seen JavaScript obfuscation and blocking of user-agent strings not on a whitelist. It's likely that the server wants something in the headers before it will give you the cookie.
Also, you can have Python use Burp as a proxy, so you can see exactly what gets sent to the server and what comes back.
https://github.com/freeload101/Python/blob/master/CS_HIDE/CS_HIDE.py (proxy support)
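As a rough sketch of routing Requests through Burp (assuming Burp is listening on its default 127.0.0.1:8080; verify=False is only there to cope with Burp's self-signed certificate while debugging):
import requests

burp_proxy = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}

with requests.Session() as s:
    s.proxies.update(burp_proxy)
    r = s.post('https://www.website.co.uk/login',
               data={'amember_login': 'username', 'amember_password': 'password'},
               verify=False)  # Burp presents its own certificate, so skip verification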
Context
I am currently attempting to build a small-scale bot using the Selenium and Requests modules in Python.
However, the webpage I want to interact with is running behind Cloudflare.
My Python script is running over Tor using the stem module.
My traffic analysis is based on Firefox's Developer Tools -> Network tab, with Persist Logs enabled.
My findings so far:
Selenium's Firefox webdriver can often access the webpage without going through the "checking your browser" page (return code 503) or the captcha page (return code 403).
A Requests session object with the same user agent always results in the captcha page (return code 403).
If Cloudflare were checking my JavaScript functionality, shouldn't my Requests call return 503?
Code Example
from selenium import webdriver
import requests

# fp (FirefoxProfile) and fOptions (Options) are configured elsewhere in my script
driver = webdriver.Firefox(firefox_profile=fp, options=fOptions)
driver.get("https://www.cloudflare.com") # usually returns code 200 without verifying the browser

session = requests.Session()
# ... applied socks5 proxy for both http and https ... #
session.headers.update({"user-agent": driver.execute_script("return navigator.userAgent;")})
page = session.get("https://www.cloudflare.com")
print(page.status_code) # returns code 403
print(page.text) # returns the "captcha page"
Both Selenium and Requests are using the same user agent and IP.
Both are using GET without any parameters.
How does Cloudflare distinguish between these two kinds of traffic?
Am I missing something?
I tried to transfer cookies from the webdriver to the requests session to see if a bypass is possible, but had no luck.
Here is the code I used:
for c in driver.get_cookies():
    session.cookies.set(c['name'], c['value'], domain=c['domain'])
There are additional JavaScript APIs exposed to the webpage when using Selenium. If you can disable them, you may be able to fix the problem.
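One way to try this (just a sketch, and not guaranteed against current Cloudflare checks): in Firefox the navigator.webdriver flag is commonly hidden by flipping the dom.webdriver.enabled preference before starting the driver:
from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference("dom.webdriver.enabled", False)  # hide navigator.webdriver from page scripts
driver = webdriver.Firefox(firefox_profile=profile)
driver.get("https://www.cloudflare.com")
print(driver.execute_script("return navigator.webdriver;"))  # should no longer report true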
Cloudflare doesn't only check HTTP headers or JavaScript; it also analyses the TLS handshake. I'm not sure exactly how it does this, but I've found that it can be circumvented by using NSS instead of OpenSSL (though NSS is not well integrated into Requests).
The captcha response depends on the browser fingerprint; it's not just about sending cookies and a User-Agent.
Copy all the headers from the Network tab in the developer console and send all the key-value pairs as headers with the requests library.
Logically, this method should work.
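A sketch of that approach; the header values below are placeholders standing in for whatever your own Network tab shows:
import requests

headers = {
    # copy every header shown for the request in the Network tab
    "User-Agent": "Mozilla/5.0 ...",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://www.example.com/",
}

session = requests.Session()
page = session.get("https://www.example.com", headers=headers)
print(page.status_code)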
I'm trying to log in to this website with my credentials from a Python script, but the problem is that the XHR request shown as login in Chrome dev tools only appears for a moment and then vanishes, so I can't see the parameters (which are supposed to be recorded) needed to log in. However, I do see that login XHR if I enter my password incorrectly, though the form data then looks incomplete.
What I've tried so far (with an incomplete payload, because of what Chrome dev tools let me see):
import requests
url = "https://member.angieslist.com/gateway/platform/v1/session/login"
payload = {"identifier":"username","token":"sometoken"}
res = requests.post(url, json=payload, headers={
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://member.angieslist.com/member/login"
})
print(res.url)
How can I log in to that site, filling in the appropriate parameters, by issuing a POST request?
There is a checkbox called Persist Logs in the Network tab; if it is switched on, the data about the POST request remains visible. I think you should use a requests session if you need the script to stay logged in. It can be done with:
import requests
url = 'https://member.angieslist.com/gateway/platform/v1/session/login'
s = requests.session()
payload = {"identifier":"youremail","token":"your password"}
res = s.post(url,json=payload,headers={"User-Agent":"Mozilla/5.0",'Referer': 'https://member.angieslist.com/member/login?redirect=%2Fapp%2Faccount'}).text
print(res)
The POST request returns a JSON response with all the details of the user.
I'm trying to get contest data from the url: "https://www.draftkings.com/contest/gamecenter/32947401"
If you go to this URL and aren't logged in, it'll just re-direct you to the lobby. If you're logged in, it'll actually show you the contest results.
Here are some things I tried:
-First, I used Chrome's dev tools networking tab to watch requests while I manually logged in.
-I then tried copying the cookie that I thought contained the authentication info; it was of the form:
'ajs_anonymous_id=%123123123123123, mlc=true; optimizelyEndUserId'
-I then stored that cookie as an environment variable and ran this code:
HEADERS= {'cookie': os.environ['MY_COOKIE'] }
requests.get(draft_kings_url, headers= HEADERS)
No luck; this just gave me the lobby.
I then tried requests' built-in HTTPBasicAuth and HTTPDigestAuth.
No luck here either.
I'm no Python expert by far, and I've pretty much exhausted what I know and the search results I've found. Any ideas?
The tool that you want is Selenium. Something along the lines of:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get(r"https://www.draftkings.com/contest/gamecenter/32947401" )
username = browser.find_element_by_id("user")
username.send_keys("username")
password = browser.find_element_by_id("password")
password.send_keys("top_secret")
login = browser.find_element_by_name("login")
login.click()
Use Fiddler to see the exact request being made when you try to log in. Then use the Session class from the requests package.
import requests
session = requests.Session()
session.get('YOUR_URL_LOGIN_PAGE')
This will save all the cookies from that URL in your session variable (like when you use a browser).
Then make a POST request to the login URL with the appropriate data, as in the sketch further below.
You don't have to pass cookie data manually, as it is generated automatically when you first visit a website. However, you can set some headers explicitly, like the User-Agent, with:
session.headers.update({'header_name':'header_value'})
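Putting those steps together, a rough sketch; the login endpoint and form field names are placeholders, so take the real ones from what Fiddler shows:
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})

# first visit picks up the initial cookies, like a browser would
session.get('https://www.draftkings.com/')

# placeholder endpoint and field names: use the exact request Fiddler captured
login_url = 'https://www.draftkings.com/...'
resp = session.post(login_url, data={'username': 'you', 'password': 'secret'})
print(resp.status_code)

# once logged in, the same session can fetch the contest page
contest = session.get('https://www.draftkings.com/contest/gamecenter/32947401')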
HTTPBasicAuth & HTTPDigestAuth might not work based on the website.
I am using Python + Selenium WebDriver to automate checks.
I am stuck on websites that request HTTP authentication through a popup window.
I am trying to use the "authenticate" method with the following code:
#init.
driver = webdriver.Firefox()
driver.get(url)
#get to the auth popup window by clicking relevant link
elem = driver.find_element_by_id("login_link")
elem.click()
#use authenticate alert method
driver._switch_to.alert.authenticate("login", "password")
The (scarce) info/docs related to this method indicate that it should submit the provided credentials and complete the HTTP auth. It doesn't, and I am getting the following error:
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/alert.py", line 105, in authenticate
    self.driver.execute(Command.SET_ALERT_CREDENTIALS, {'username':username, 'password':password})
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 201, in execute
    self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 159, in check_response
    raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message: Unrecognized command: POST /session/c30d03e1-3835-42f5-ace0-968aef486b36/alert/credentials
Is there something I am missing here? Has anybody come across the same issue and resolved it?
PS: the http://username:password@url trick doesn't work for me in my test conditions.
Basic authentication is pretty easy to work around for automated testing, without having to deal with native alerts/dialogs or other browser differences.
The approach I've used very successfully in the Java world is to set up a BrowserMob proxy server in code and register a RequestInterceptor to intercept all incoming requests (that match the host / URL pattern in question). When you have a request that would otherwise need Basic auth, add an Authorization HTTP header with the required credentials: 'Basic ' plus the Base64-encoded 'user:pass' string. So for 'foo:bar' you'd set the value Basic Zm9vOmJhcg==.
Start the server, set it as the web proxy for the Selenium traffic, and when a request is made that requires authentication, the proxy will add the header, the site will accept the credentials, and the browser will never need to pop up the dialog.
Although the technique might seem laborious, by having the header set automatically for every request you don't have to explicitly add user:pass@ to any URL that might need it, which matters when there are multiple ways into the auth-ed area. Also, unlike with user:pass@ URLs, you don't have to worry about the browser caching the header (or ceasing to cache it after a certain amount of time), or about crossing between HTTP and HTTPS.
That technique works very well, but how to achieve this in Python?
You could use this Python wrapper for Browsermob, which exposes its REST API in Python. This is the REST call you'll need:
POST /proxy/[port]/headers - Set and override HTTP Request headers.
For example setting a custom User-Agent. Payload data should be json
encoded set of headers (not url-encoded)
So, from the earlier example (pseudocode):
POST localhost:8787/proxy/<proxy_port>/headers '{"Authorization": "Basic Zm9vOmJhcg=="}'
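With that wrapper the same thing can be sketched in Python; the install path is a placeholder, and proxy.headers() is the wrapper call behind the REST endpoint above:
import base64
from browsermobproxy import Server
from selenium import webdriver

server = Server("/path/to/browsermob-proxy")      # placeholder path to the proxy binary
server.start()
proxy = server.create_proxy()

# 'Basic ' + base64('foo:bar') -> 'Basic Zm9vOmJhcg=='
token = base64.b64encode(b"foo:bar").decode("ascii")
proxy.headers({"Authorization": "Basic " + token})

profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
driver.get("https://the-internet.herokuapp.com/basic_auth")  # no auth dialog should appear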
Alternatively, you could see this answer for a custom Python proxy server using Twisted.
Basic authentication is possible in the URL, but you'll have to set a preference:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference("network.http.phishy-userpass-length", 255)
driver = webdriver.Firefox(profile)
driver.get("http://admin:admin#the-internet.herokuapp.com/basic_auth")
If it doesn't work in your case, then it is not basic authentication.