python selenium get_log("performance") won't log webworker requests - python

Following many articles, one can log XHR calls in an automated browser (using Selenium) as below:
capabilities = DesiredCapabilities.CHROME
capabilities["loggingPrefs"] = {"performance": "ALL"}  # newer: goog:loggingPrefs
driver = webdriver.Chrome(
    desired_capabilities=capabilities, executable_path="./chromedriver"
)
...
logs_raw = driver.get_log("performance")
My problem is that the target request is performed by a "WebWorker", so it's not listed in the performance list of the browser main thread.
Opening Chrome and manually selecting the web worker's scope in the dev console, "performance.getEntries()" gets me the request I want.
My question is: how can someone perform such an action in Selenium (Python preferable)?
Nowhere in the Python Selenium docs or the DevTools API have I found something similar.
Thanks in advance.
Edit: after some digging I found that it has something to do with the JavaScript execution context; I have no clue how to switch that in Selenium.
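For context, each entry returned by get_log("performance") is a dict whose "message" field is a JSON string wrapping one DevTools event. A minimal sketch of pulling request URLs out of such a list follows; the helper name network_requests and the sample data are made up for illustration, and this only filters what the log already contains, so it does not by itself surface worker traffic.

```python
import json


def network_requests(log_entries):
    """Return the URLs of all Network.requestWillBeSent events in a performance log."""
    urls = []
    for entry in log_entries:
        # The "message" field is a JSON string; the DevTools event sits under its "message" key.
        event = json.loads(entry["message"])["message"]
        if event.get("method") == "Network.requestWillBeSent":
            urls.append(event["params"]["request"]["url"])
    return urls


# Hand-written sample in the shape of the log entries (not real captured data).
sample = [
    {"level": "INFO", "timestamp": 0, "message": json.dumps({
        "webview": "hypothetical-id",
        "message": {"method": "Network.requestWillBeSent",
                    "params": {"request": {"url": "https://example.com/api"}}}})},
    {"level": "INFO", "timestamp": 1, "message": json.dumps({
        "webview": "hypothetical-id",
        "message": {"method": "Page.loadEventFired", "params": {}}})},
]
print(network_requests(sample))
```

With a real driver the call would be network_requests(driver.get_log("performance")).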

Related

Unable to programatically login to a website

So I am trying to login programatically (python) to https://www.datacamp.com/users/sign_in using my email & password.
I have tried 2 methods of login: one using the requests library and another using Selenium (code below). Both times I get a [403] response.
Could someone please help me login programatically to it ?
Thank you !
Using Requests library.
import requests
r = requests.get("https://www.datacamp.com/users/sign_in")
r  # gives <Response [403]>
Using Selenium webdriver.
driver = webdriver.Chrome(executable_path=driver_path, options=option)
driver.get("https://www.datacamp.com/users/sign_in")
driver.find_element_by_id("user_email") # there is supposed to be form element with id=user_email for inputting email
Implicit wait at least should have worked, like this:
from selenium import webdriver
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.implicitly_wait(10)
url = "https://www.datacamp.com/users/sign_in"
driver.get(url)
driver.find_element_by_id("user_email").send_keys("test@dsfdfs.com")
driver.find_element_by_css_selector("#new_user>button[type=button]").click()
BUT
The real issue is that the site uses anti-scraping software.
If you open Console and go to request itself you'll see:
It means that the site blocks your connection even before you try to login.
Here is similar question with different solutions: Can a website detect when you are using Selenium with chromedriver?
Not all answers will work for you, try different approaches suggested.
With Firefox you'll have the same issue (I've already checked).
You have to add a wait after driver.get("https://www.datacamp.com/users/sign_in") and before driver.find_element_by_id("user_email") to let the page load.
Try something like WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'user_email')))

Selenium Firefox browser is stuck after downloading pdf

Was hoping someone could help me understand what's going on:
I'm using Selenium with Firefox browser to download a pdf (need Selenium to login to the corresponding website):
le = browser.find_elements_by_xpath('//*[@title="Download PDF"]')
time.sleep(5)
if le:
    pdf_link = le[0].get_attribute("href")
    browser.get(pdf_link)
The code does download the pdf, but after that just stays idle.
This seems to be related to the following browser settings:
fp.set_preference("pdfjs.disabled", True)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
If I disable the first, it doesn't hang, but opens pdf instead of downloading it. If I disable the second, a "Save As" pop-up window shows up. Could someone explain how to handle this?
For me, the best way to solve this was to let Firefox render the PDF in the browser via pdf.js and then send a subsequent fetch via the Python requests library with the selenium cookies attached. More explanation below:
There are several ways to render a PDF via Firefox + Selenium. If you're using the most recent version of Firefox, it'll most likely render the PDF via pdf.js so you can view it inline. This isn't ideal because now we can't download the file.
You can disable pdf.js via Selenium options but this will likely lead to the issue in this question where the browser gets stuck. This might be because of an unknown MIME-Type but I'm not totally sure. (There's another StackOverflow answer that says this is also due to Firefox versions.)
However, we can bypass this by passing Selenium's cookie session to requests.session().
Here's a toy example:
import requests
from selenium import webdriver
pdf_url = "/url/to/some/file.pdf"
# setup driver with options
driver = webdriver.Firefox()  # pass your own options here
# do whatever you need to do to auth/login/click/etc.
# navigate to the PDF URL in case the PDF link issues a
# redirect because requests.session() does not persist cookies
driver.get(pdf_url)
# get the URL from Selenium
current_pdf_url = driver.current_url
# create a requests session
session = requests.session()
# add Selenium's cookies to requests
selenium_cookies = driver.get_cookies()
for cookie in selenium_cookies:
    session.cookies.set(cookie["name"], cookie["value"])
# Note: If headers are also important, you'll need to use
# something like seleniumwire to get the headers from Selenium
# Finally, re-send the request with requests.session
pdf_response = session.get(current_pdf_url)
# access the bytes response from the session
pdf_bytes = pdf_response.content
I highly recommend using seleniumwire over regular selenium because it extends Python Selenium to let you return headers, wait for requests to finish, use proxies, and much more.
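The toy example stops at pdf_bytes. A small sketch of the last step, writing the bytes to disk, could look like the following. The helper name save_pdf is hypothetical; the only fact it relies on is that PDF files begin with the bytes %PDF, which makes it easy to notice when the server returned a login page or an error instead of the document.

```python
import tempfile
from pathlib import Path


def save_pdf(pdf_bytes, destination):
    """Write pdf_bytes to destination after checking the PDF magic header."""
    if not pdf_bytes.startswith(b"%PDF"):
        # Probably an HTML login or error page rather than the file we wanted.
        raise ValueError("response does not look like a PDF")
    path = Path(destination)
    path.write_bytes(pdf_bytes)
    return path


# Demo with fake bytes; in the example above you would pass pdf_response.content.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_pdf(b"%PDF-1.4 fake body", Path(tmp) / "file.pdf")
    saved_size = saved.stat().st_size
    print(saved.name, saved_size)
```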

How to capture network traffic using selenium webdriver and browsermob proxy on Python?

I would like to capture network traffic by using Selenium Webdriver on Python. Therefore, I must use a proxy (like BrowserMobProxy)
When I use webdriver.Chrome:
from browsermobproxy import Server
server = Server("~/browsermob-proxy")
server.start()
proxy = server.create_proxy()
from selenium import webdriver
co = webdriver.ChromeOptions()
co.add_argument('--proxy-server={host}:{port}'.format(host='localhost', port=proxy.port))
driver = webdriver.Chrome(executable_path = "~/chromedriver", chrome_options=co)
proxy.new_har
driver.get(url)
proxy.har # returns a HAR
for ent in proxy.har['log']['entries']:
    print ent['request']['url']
the webpage is loaded properly and all requests are available and accessible in the HAR file.
But when I use webdriver.Firefox:
# The same as above
# ...
from selenium import webdriver
profile = webdriver.FirefoxProfile()
driver = webdriver.Firefox(firefox_profile=profile, proxy = proxy.selenium_proxy())
proxy.new_har
driver.get(url)
proxy.har # returns a HAR
for ent in proxy.har['log']['entries']:
    print ent['request']['url']
The webpage cannot be loaded properly and the number of requests in the HAR file is smaller than the number of requests that should be.
Do you have any idea what the problem of proxy settings in the second code? How should I fix it to use webdriver.Firefox properly for my purpose?
Just stumbled across this project https://github.com/derekargueta/selenium-profiler. Spits out all network data for a URL. Shouldn't be hard to hack and integrate into whatever tests you're running.
Original source: https://www.openhub.net/p/selenium-profiler
For me, following code component works just fine.
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
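To check whether the Firefox run really captures fewer requests than the Chrome run, it can help to count HAR entries per host and compare the two. This is only a sketch: the helper name requests_per_host and the sample HAR are invented for illustration, and it assumes nothing beyond the har['log']['entries'][i]['request']['url'] layout already used in the question.

```python
from collections import Counter
from urllib.parse import urlsplit


def requests_per_host(har):
    """Count HAR entries by the host part of each request URL."""
    return Counter(
        urlsplit(entry["request"]["url"]).netloc
        for entry in har["log"]["entries"]
    )


# Hand-written sample HAR (not real captured traffic).
sample_har = {"log": {"entries": [
    {"request": {"url": "https://example.com/"}},
    {"request": {"url": "https://example.com/app.js"}},
    {"request": {"url": "https://cdn.example.net/lib.js"}},
]}}
counts = requests_per_host(sample_har)
print(counts)
```

Running it on proxy.har from both browsers and diffing the two counters shows which hosts are missing from the Firefox capture.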
I am sharing my solution, this would not write any logs to any file
but you can collect all sort of messages such as Errors,
Warnings, Logs, Info, Debug , CSS, XHR as well as Requests(traffic)
1. We are going to create Firefox profile so that we can enable option of
"Persist Logs" on Firefox (you can try it to enable on your default browser and see
if it launches with "Persist Logs" without creating firefox profile )
2. we need to modify the Firefox initialize code
where this line will do magic : options.AddArgument("--jsconsole");
so complete Selenium Firefox code would be, this will open Browser Console
everytime you execute your automation :
else if (browser.Equals(Constant.Firefox))
{
    var profileManager = new FirefoxProfileManager();
    FirefoxProfile profile = profileManager.GetProfile("ConsoleLogs");
    FirefoxDriverService service = FirefoxDriverService.CreateDefaultService(DrivePath);
    service.FirefoxBinaryPath = DrivePath;
    profile.SetPreference("security.sandbox.content.level", 5);
    profile.SetPreference("dom.webnotifications.enabled", false);
    profile.AcceptUntrustedCertificates = true;
    FirefoxOptions options = new FirefoxOptions();
    options.AddArgument("--jsconsole");
    options.AcceptInsecureCertificates = true;
    options.Profile = profile;
    options.SetPreference("browser.popups.showPopupBlocker", false);
    driver = new FirefoxDriver(service.FirefoxBinaryPath, options, TimeSpan.FromSeconds(100));
    driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(10);
}
3. Now you can write your logic since you have traffic/ logging window open so don't
go to next execution if test fails. That way Browser Console will keep your errors
messages and help you to troubleshoot further
Browser : Firefox v 61
How can you launch Browser Console for firefox:
1. open firefox (and give any URL )
2. Press Ctrl+Shift+J (or Cmd+Shift+J on a Mac)
Link : https://developer.mozilla.org/en-US/docs/Tools/Browser_Console

Can we Zoom the browser window in python selenium webdriver?

I am trying to ZOOM IN and ZOOM OUT the Chrome( selenium webdriver) only using keyboard. I have tried --
from selenium.webdriver.common.keys import Keys
driver.find_element_by_tag_name("body").send_keys(Keys.CONTROL,Keys.SUBTRACT)
but it is not working. Need answer in python.
I was just struggling with this. I managed to find something that works for me, hopefully it works for you:
driver.execute_script("document.body.style.zoom='zoom %'")
Have 'zoom%' = whatever zoom level you want. (e.g. '67%')
Environment:
Selenium 3.6.0
chromedriver 2.33
Chrome version 62.0.3202.75 (Official Build) (64-bit)
macOS Sierra 10.12.6
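Since 'zoom %' above is a placeholder, a tiny helper can build the script string for a given percentage. This is just a sketch; the name zoom_script is made up, and it produces the same CSS zoom statement as the line above.

```python
def zoom_script(percent):
    """Build the JavaScript that sets the CSS zoom of the page body."""
    if percent <= 0:
        raise ValueError("zoom percentage must be positive")
    return "document.body.style.zoom='{}%'".format(percent)


script = zoom_script(67)
print(script)
```

It would then be used as driver.execute_script(zoom_script(67)).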
I tried the ways (without use the CSS) that people suggested in other questions in the past. For example, the answers in this question: Selenium webdriver zoom in/out page content.
Or this: Test zoom levels of page on browsers
without success.
So, I thought: if not with the shortcuts, what could be a different way to do that?
The idea is to use the "chrome://settings/" page in order to change the zoom:
Ok I know, for example from Going through Chrome://settings by Selenium, that every settings should be set in the ChromeOptions.
From this question I noticed that in the list of preferences the only relevant parameter (I think) could be:
// Double that indicates the default zoom level.
const char kPartitionDefaultZoomLevel[] = "partition.default_zoom_level";
I tried, without success.
I want to repeat that I know it isn't the correct approach (and that will be different with different browser versions), but it works and, at least, was useful for me to understand how to go inside a shadow root element with selenium.
The following method return the elements inside a shadow root:
def expand_shadow_element(element):
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    return shadow_root
It is useful because in the chrome://settings/ page there are shadow root elements.
In order to do that in my browser, this is the path:
root1=driver.find_element_by_xpath("*//settings-ui")
shadow_root1 = expand_shadow_element(root1)
container= shadow_root1.find_element_by_id("container")
root2= container.find_element_by_css_selector("settings-main")
shadow_root2 = expand_shadow_element(root2)
root3=shadow_root2.find_element_by_css_selector("settings-basic-page")
shadow_root3 = expand_shadow_element(root3)
basic_page = shadow_root3.find_element_by_id("basicPage")
settings_section= basic_page.find_element_by_xpath(".//settings-section[@section='appearance']")
root4= settings_section.find_element_by_css_selector("settings-appearance-page")
shadow_root4=expand_shadow_element(root4)
and finally:
settings_animated_pages= shadow_root4.find_element_by_id("pages")
neon_animatable=settings_animated_pages.find_element_by_css_selector("neon-animatable")
zoomLevel= neon_animatable.find_element_by_xpath(".//select[@id='zoomLevel']/option[@value='0.5']")
zoomLevel.click()
The entire code:
driver = webdriver.Chrome(executable_path=r'/pathTo/chromedriver')
def expand_shadow_element(element):
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    return shadow_root
driver.get('chrome://settings/')
root1=driver.find_element_by_xpath("*//settings-ui")
shadow_root1 = expand_shadow_element(root1)
container= shadow_root1.find_element_by_id("container")
root2= container.find_element_by_css_selector("settings-main")
shadow_root2 = expand_shadow_element(root2)
root3=shadow_root2.find_element_by_css_selector("settings-basic-page")
shadow_root3 = expand_shadow_element(root3)
basic_page = shadow_root3.find_element_by_id("basicPage")
settings_section= basic_page.find_element_by_xpath(".//settings-section[@section='appearance']")
root4= settings_section.find_element_by_css_selector("settings-appearance-page")
shadow_root4=expand_shadow_element(root4)
settings_animated_pages= shadow_root4.find_element_by_id("pages")
neon_animatable=settings_animated_pages.find_element_by_css_selector("neon-animatable")
zoomLevel= neon_animatable.find_element_by_xpath(".//select[@id='zoomLevel']/option[@value='0.5']")
zoomLevel.click()
driver.get("https://www.google.co.uk/")
EDIT
As suggested by @Florent B in the comments, we can obtain the same result simply with:
driver.get('chrome://settings/')
driver.execute_script('chrome.settingsPrivate.setDefaultZoom(1.5);')
driver.get("https://www.google.co.uk/")
firefox solution for me,
Zoom body browser
zoom is a non-standard property, use transform instead:
driver.execute_script("document.body.style.transform = 'scale(0.8)'")
https://github.com/SeleniumHQ/selenium/issues/4244
driver.execute_script('document.body.style.MozTransform = "scale(0.50)";')
driver.execute_script('document.body.style.MozTransformOrigin = "0 0";')
Yes, you can invoke the Chrome driver to zoom without having to use CSS. There are methods packaged into the Chrome DevTools Protocol Viewer, one being Input.synthesizePinchGesture aka zoom in/out.
For ease of use with the DevTools Protocol API, we will use a class called MyChromeDriver that subclasses webdriver.Chrome and adds a new method for sending these commands to Chrome:
# selenium
from selenium import webdriver
# json
import json
class MyChromeDriver(webdriver.Chrome):
    def send_cmd(self, cmd, params):
        resource = "/session/%s/chromium/send_command_and_get_result" % self.session_id
        url = self.command_executor._url + resource
        body = json.dumps({'cmd':cmd, 'params': params})
        response = self.command_executor._request('POST', url, body)
        return response.get('value')
1. Setup our webdriver and get some page:
webdriver = MyChromeDriver()
webdriver.get("https://google.com")
2. Send Chrome the Input.synthesizePinchGesture command along with its parameters via our new method send_cmd:
webdriver.send_cmd('Input.synthesizePinchGesture', {
    'x': 0,
    'y': 0,
    'scaleFactor': 2,
    'relativeSpeed': 800, # optional
    'gestureSourceType': 'default' # optional
})
3. Voilà! Chrome's zoom is invoked.
As a side note, there are tons of other commands you could use with send_cmd. Find them here: https://chromedevtools.github.io/devtools-protocol/
Based off this answer: Take full page screen shot in Chrome with Selenium
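On Selenium 4.x the subclass may not be needed, because Chrome drivers there expose execute_cdp_cmd(cmd, cmd_args) for sending DevTools commands. A hedged sketch of the same pinch gesture through that method follows; the helper name pinch_zoom and the FakeDriver stand-in are made up so the snippet runs without a browser, and a real webdriver.Chrome instance would be passed in practice.

```python
def pinch_zoom(driver, scale_factor, x=0, y=0):
    """Send Input.synthesizePinchGesture through the driver's CDP bridge."""
    params = {"x": x, "y": y, "scaleFactor": scale_factor}
    return driver.execute_cdp_cmd("Input.synthesizePinchGesture", params)


class FakeDriver:
    """Stand-in that records CDP calls instead of talking to Chrome."""

    def __init__(self):
        self.calls = []

    def execute_cdp_cmd(self, cmd, cmd_args):
        self.calls.append((cmd, cmd_args))
        return {}


fake = FakeDriver()
pinch_zoom(fake, 2)
print(fake.calls)
```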
As you mentioned, you need this to work in Chrome. The current solutions are only for Firefox, so here are a few updates and options:
Zoom the CSS :
driver.execute_script("document.body.style.zoom='150%'")
This option did work for me. But it zooms the CSS, not the Chrome browser itself, so it is probably not what you are looking for.
Zoom In & Zoom Out the Chrome Browser :
After 4131, 4133 and 1621 the fullscreen() mode got supported to Zoom In through Selenium-Java Clients but it's not yet publicly released to PyPI.
I can see it's implemented but not pushed. Selenium 3.7 (Python) will be out soon. The push to sync version numbers will include that.
Configure the webdriver to open the Browser :
If your requirement is to execute the Test Suite in Full Screen mode, you can always use the Options Class and configure the webdriver instance with --kiosk argument as follows:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--kiosk")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('https://www.google.co.in')
# zoom in firefox browser
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
driver.get("about:preferences")
driver.execute_script("arguments[0].click();", driver.find_element(By.XPATH, "//*[@id='defaultZoom']"))
ActionChains(driver).click(driver.find_element(By.XPATH, "//*[@value='50']")).perform()

Can't capture HAR using Python Selenium Script with BrowserMob-Proxy

Goal:
I want to run a Selenium Python script through BrowserMob-Proxy, which will capture and output a HAR file capture.
Problem:
I have a functional (very basic) Python script (shown below). When it is altered to utilize BrowserMob-Proxy to capture HAR however, it fails. Below I provide two different scripts that both fail, but for differing reasons (details provided after code snippets).
BrowserMob-Proxy Explanation:
As mentioned before, I am using both 0.6.0 AND 2.0-beta-8. The reasoning for this is that A) LightBody (lead designer of BMP) recently indicated that his most current release (2.0-beta-9) is not functional and advises users to use 2.0-beta-8 instead and B) from what I can tell from reading various site/stackoverflow information is that 0.6.0 (acquired through PIP) is used to make calls to the Client.py/Server.py, whereas 2.0-beta-8 is used to initiate the Server. To be honest, this confuses me. When importing BMP's Server however, it requires a batch (.bat) file to initiate the server, which is not provided in 0.6.0, but is with 2.0-beta-8...if anyone can shed some light on this area of confusion (I suspect it is the root of my problems described below), then I'd be most appreciative.
Software Specs:
Operating System: Windows 7 (64x) -- running in VirtualBox
Browser: FireFox (32.0.2)
Script Language: Python (2.7.8)
Automated Web Browser: Selenium (2.43.0) -- installed via PIP
BrowserMob-Proxy: 0.6.0 AND 2.0-beta-8 -- see explanation above
Selenium Script (this script works):
"""This script utilizes Selenium to obtain the Google homepage"""
from selenium import webdriver
driver = webdriver.Firefox() # Opens FireFox browser.
driver.get('https://google.com/') # Gets google.com and loads page in browser.
driver.quit() # Closes Firefox browser
This script succeeds in running and does not produce any errors. It is provided for illustrative purposes to indicate it works before adding BMP logic.
Script ALPHA with BMP (does not work):
"""Using the same functional Selenium script, produce ALPHA_HAR.har output"""
from browsermobproxy import Server
server = Server('C:\Users\Matt\Desktop\\browsermob-proxy-2.0-beta-8\\bin\\browsermob-proxy')
server.start()
proxy = server.create_proxy()
from selenium import webdriver
driver = webdriver.Firefox() # Opens FireFox browser.
proxy.new_har("ALPHA_HAR") # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har # Returns a HAR JSON blob
server.stop()
This code will succeed in running the script and will not produce any errors. However, when searching the entirety of my hard drive, I never succeed in locating ALPHA_HAR.har.
Script BETA with BMP (does not work):
"""Using the same functional Selenium script, produce BETA_HAR.har output"""
from browsermobproxy import Server
server = Server("C:\Users\Matt\Desktop\\browsermob-proxy-2.0-beta-8\\bin\\browsermob-proxy")
server.start()
proxy = server.create_proxy()
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
proxy.new_har("BETA_HAR") # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har # Returns a HAR JSON blob
server.stop()
This code was taken from http://browsermob-proxy-py.readthedocs.org/en/latest/. When running the above code, FireFox will attempt to get google.com, but will never succeed in loading the page. Eventually it will time out without producing any errors. And BETA_HAR.har can't be found anywhere on my hard drive. I have also noticed that, when trying to use this browser to visit any other site, it will similarly fail to load (I suspect this is due to the proxy not being configured properly).
Try this:
from browsermobproxy import Server
from selenium import webdriver
import json
server = Server("path/to/browsermob-proxy")
server.start()
proxy = server.create_proxy()
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
proxy.new_har("http://stackoverflow.com", options={'captureHeaders': True})
driver.get("http://stackoverflow.com")
result = json.dumps(proxy.har, ensure_ascii=False)
print(result)
server.stop()
driver.quit()
I use phantomJS, here is an example of how to use it with python:
import browsermobproxy as mob
import json
from selenium import webdriver
BROWSERMOB_PROXY_PATH = '/usr/share/browsermob/bin/browsermob-proxy'
url = 'http://google.com'
s = mob.Server(BROWSERMOB_PROXY_PATH)
s.start()
proxy = s.create_proxy()
proxy_address = "--proxy=127.0.0.1:%s" % proxy.port
service_args = [ proxy_address, '--ignore-ssl-errors=yes', ] #so that i can do https connections
driver = webdriver.PhantomJS(service_args=service_args)
driver.set_window_size(1400, 1050)
proxy.new_har(url)
driver.get(url)
har_data = json.dumps(proxy.har, indent=4)
screenshot = driver.get_screenshot_as_png()
imgname = "google.png"
harname = "google.har"
with open(imgname, 'wb') as save_img:  # PNG data is binary
    save_img.write(screenshot)
with open(harname, 'w') as save_har:
    save_har.write(har_data)
driver.quit()
s.stop()
What worked for me was to downgrade java version to java11. I used jenv to install and manage multiple java versions.
When you do:
proxy.har
you get the HAR back as a Python dict (parsed JSON); nothing is written to disk. If you need a file, serialize it yourself:
import json
with open('BETA_HAR.har', 'w') as myFile:
    json.dump(proxy.har, myFile)
Then you will find your .har
Finding your HAR file
Inherently, the HAR object generated by the proxy is just that: an object in memory. The reason you can't find it on your hard drive is because it's not being saved there unless you write it there yourself. This is a pretty simple operation, as the HAR is just JSON.
with open("harfile", "w") as harfile:
    harfile.write(json.dumps(proxy.har))
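Serializing with the json module matters here: the plain str() of a dict uses single quotes and None, which is not valid JSON. A small round-trip sketch that writes a HAR-shaped dict and reads it back is shown below; the helper names save_har and load_har and the sample dict are invented for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_har(har, path):
    """Write a HAR dict to path as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(har, handle, ensure_ascii=False, indent=2)


def load_har(path):
    """Read a HAR file back into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hand-written stand-in for proxy.har (not real captured data).
sample_har = {"log": {"entries": [{"request": {"url": "https://www.google.com/"},
                                   "comment": None}]}}

with tempfile.TemporaryDirectory() as tmp:
    har_path = Path(tmp) / "BETA_HAR.har"
    save_har(sample_har, har_path)
    reloaded = load_har(har_path)

print(reloaded == sample_har)
```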
Why does ALPHA not work?
When you start dumping your HAR file, you'll find that your HAR file is empty with the ALPHA script. This is because you are not adding the proxy to the settings for Firefox, meaning that it will just connect directly bypassing your proxy.
What about BETA?
This code is written correctly as far as connecting to the proxy, although personally I prefer adding the proxy to the capabilities and passing those through. The code for that is:
cap = webdriver.DesiredCapabilities.FIREFOX.copy()
proxy.add_to_capabilities(cap)
driver = webdriver.Firefox(capabilities=cap)
I would guess that your issue lies with the proxy itself. Check the bmp.log and/or server.log files in the location of the python script and see what it is saying if something is going wrong.
Another alternative is that selenium is reporting back that the webpage has loaded before it actually has finished getting all of the elements, and as such your proxy is shutting down too early. Try making the script wait a bit longer before shutting down the proxy, or running it interactively through the interpreter.
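One way to "wait a bit longer" without guessing a fixed sleep is to poll the HAR until the number of entries stops growing. The following is a rough sketch under that assumption; wait_for_har_to_settle is a hypothetical helper, and FakeProxy only exists so the snippet runs without BrowserMob.

```python
import time


def wait_for_har_to_settle(get_har, quiet_checks=3, interval=1.0, timeout=30.0):
    """Poll get_har() until the entry count is unchanged for quiet_checks polls.

    Returns True once the count is stable, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    last_count = -1
    stable = 0
    while time.monotonic() < deadline:
        count = len(get_har()["log"]["entries"])
        if count == last_count:
            stable += 1
            if stable >= quiet_checks:
                return True
        else:
            # Traffic is still arriving; restart the quiet-period counter.
            stable = 0
            last_count = count
        time.sleep(interval)
    return False


class FakeProxy:
    """Stand-in whose har grows over the first few reads, then stays constant."""

    def __init__(self, counts):
        self._counts = list(counts)
        self.calls = 0

    @property
    def har(self):
        n = self._counts[min(self.calls, len(self._counts) - 1)]
        self.calls += 1
        return {"log": {"entries": [{}] * n}}


fake = FakeProxy([1, 2, 3])
settled = wait_for_har_to_settle(lambda: fake.har, interval=0)
print(settled, fake.calls)
```

In the scripts above this would be called as wait_for_har_to_settle(lambda: proxy.har) right before server.stop().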
