Using Firefox 54, Python 2.7 and Selenium 3.8.1 I'm trying to load some extensions using a Firefox Profile but Selenium fails to load them. The browser, however, does load with this code:
from selenium import webdriver
def setup_firefox_profile():
firefox_custom_profile = webdriver.FirefoxProfile(
r'C:\Users\Owner\AppData\Roaming\Mozilla\Firefox\Profiles\1hmz9w1k.GenericFootprintHardened2')
firefox_custom_profile.add_extension(r'C:\Users\Owner\AppData\Roaming\Mozilla\Firefox\Profiles\1hmz9w1k.GenericFootprintHardened2\extensions\bestproxyswitcher#bestproxyswitcher.com.xpi')
# C:\Users\Owner\AppData\Roaming\Mozilla\Firefox\Profiles\<profile folder>
# Adblock, Best Proxy Switcher, //BetterPrivacy, Clear Flash Cookies, Canvas Fingerprint Blocker, Cookie Quick Manager, Disable WebRTC, Ghostery, Privacy Badger,
firefox_custom_profile.set_preference("extensions.bestproxyswitcher.currentVersion", "5.4.2")
firefox_settings = webdriver.Firefox(firefox_profile=firefox_custom_profile)
return firefox_settings
driver = setup_firefox_profile()
driver.get('http://duckduckgo.com')
Attempting to load the profile with a manual session I am able to see the extensions load.
It appears the profile does not load correctly. I found instructions to go to "about:support" and then "Open Folder" under the Profile Folder heading. The directory did not match the profile in name or path ("C:\Users\Owner\AppData\Local\Temp\tmpp4lzxe")
Can anyone see why the profile is not being loaded?
Is the code for loading the extensions also wrong?
Related
Was hoping someone could help me understand what's going on:
I'm using Selenium with Firefox browser to download a pdf (need Selenium to login to the corresponding website):
le = browser.find_elements_by_xpath('//*[#title="Download PDF"]')
time.sleep(5)
if le:
pdf_link = le[0].get_attribute("href")
browser.get(pdf_link)
The code does download the pdf, but after that just stays idle.
This seems to be related to the following browser settings:
fp.set_preference("pdfjs.disabled", True)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
If I disable the first, it doesn't hang, but opens pdf instead of downloading it. If I disable the second, a "Save As" pop-up window shows up. Could someone explain how to handle this?
For me, the best way to solve this was to let Firefox render the PDF in the browser via pdf.js and then send a subsequent fetch via the Python requests library with the selenium cookies attached. More explanation below:
There are several ways to render a PDF via Firefox + Selenium. If you're using the most recent version of Firefox, it'll most likely render the PDF via pdf.js so you can view it inline. This isn't ideal because now we can't download the file.
You can disable pdf.js via Selenium options but this will likely lead to the issue in this question where the browser gets stuck. This might be because of an unknown MIME-Type but I'm not totally sure. (There's another StackOverflow answer that says this is also due to Firefox versions.)
However, we can bypass this by passing Selenium's cookie session to requests.session().
Here's a toy example:
import requests
from selenium import webdriver
pdf_url = "/url/to/some/file.pdf"
# setup driver with options
driver = webdriver.Firefox(..options)
# do whatever you need to do to auth/login/click/etc.
# navigate to the PDF URL in case the PDF link issues a
# redirect because requests.session() does not persist cookies
driver.get(pdf_url)
# get the URL from Selenium
current_pdf_url = driver.current_url
# create a requests session
session = requests.session()
# add Selenium's cookies to requests
selenium_cookies = driver.get_cookies()
for cookie in selenium_cookies:
session.cookies.set(cookie["name"], cookie["value"])
# Note: If headers are also important, you'll need to use
# something like seleniumwire to get the headers from Selenium
# Finally, re-send the request with requests.session
pdf_response = session.get(current_pdf_url)
# access the bytes response from the session
pdf_bytes = pdf_response.content
I highly recommend using seleniumwire over regular selenium because it extends Python Selenium to let you return headers, wait for requests to finish, use proxies, and much more.
I currently have a script that will log on to my company's wiki, visit a page, and select a download to pdf option available on the page. However, when this option is chosen, this dialogue box
I have read there is a way to create a Firefox profile that will suppress the creation of dialogue boxes, but I am unfamiliar with the library.
from splinter import Browser
browser = Browser()
browser.visit('https://company.wiki.com')
browser.find_by_id('login-link').click()
browser.fill('os_username', 'user')
browser.fill('os_password', 'pass')
browser.find_by_name('login').click()
browser.visit('https://pageoncompany.wiki.com')
browser.find_by_xpath('//*[#id="navigation"]/ul/li[4]').click()
browser.find_by_id('action-export-pdf-link').click()
I was able to set the preference in the Firefox Browser, then call my firefox profile
browser = Browser('firefox', profile=r'C:\Users\craab\AppData\Roaming\Mozilla\Firefox\Profiles\0lot9hun.default')
Here is my problem: I'm trying to use selenium to access a webpage and the special about this page is it is an auto redirecting page (you open that page and after few seconds, it automatically redirect to another page). When i use driver = webdriver.Firefox(), my IDM catched that link just perfectly after few seconds.
And because i don't want the browser to come up so i use Phantomjs instead, ut it not working. My application just can get the loading page url (bitdl-1336...) but not the redirected link. Please help!
This is my code:
link = 'http://torrent.ajee.sh/hash.php?hash=' + self.global_hash_code
driver = webdriver.PhantomJS('phantomjs.exe')
driver.get(str(link))
element = driver.find_element_by_link_text('Download Zip')
element.click()
time.sleep(10)
msg = QMessageBox.information(self, QString('Thành công'),QString(driver.current_url))
And this is the result:
Please help!
Sorry about my english
Not exactly an answer to your PhantomJS-specific question, but a workaround to the problem.
And because i don't want the browser to come up so i use Phantomjs instead
You can continue using Firefox, but start it in a Virtual Display, see more information at:
How do I run Selenium in Xvfb?
You may also need to let the browser automatically save the archive in a specified directory, see:
How do I automatically download files from a pop up dialog using selenium-python
Access to file download dialog in Firefox
I am writing a Python program using Selenium Webdriver api that utilizes Firefox browser to browse, and I need the first page of the add on that shows it's version to be disabled and not gets shown when the browser gets to work. My add-on is NoScript.
Here is my code for Firefox profile :
def fpp():
ffprofile = webdriver.FirefoxProfile()
ffprofile.add_extension(extension='NS.xpi')
ffprofile.set_preference("extensions.noscript.currentVerison" , "2.6.9.35")
ffprofile.update_preferences()
return webdriver.Firefox(ffprofile)
def driver(url1):
m = fpp()
m.get(url1)
However, this line doesn't prevent the starting windows from showing up:
ffprofile.set_preference("extensions.noscript.currentVerison" , "2.6.9.35")
What is the problem and how do I fix it?
noscript preferences start with noscript (no need for extensions.). And you need to set the version instead of currentVersion. Works for me:
ffprofile.set_preference("noscript.version", "2.6.9.35")
Goal:
I want to run a Selenium Python script through BrowserMob-Proxy, which will capture and output a HAR file capture.
Problem:
I have a functional (very basic) Python script (shown below). When it is altered to utilize BrowserMob-Proxy to capture HAR however, it fails. Below I provide two different scripts that both fail, but for differing reasons (details provided after code snippets).
BrowserMob-Proxy Explanation:
As mentioned before, I am using both 0.6.0 AND 2.0-beta-8. The reasoning for this is that A) LightBody (lead designer of BMP) recently indicated that his most current release (2.0-beta-9) is not functional and advises users to use 2.0-beta-8 instead and B) from what I can tell from reading various site/stackoverflow information is that 0.6.0 (acquired through PIP) is used to make calls to the Client.py/Server.py, whereas 2.0-beta-8 is used to initiate the Server. To be honest, this confuses me. When importing BMP's Server however, it requires a batch (.bat) file to initiate the server, which is not provided in 0.6.0, but is with 2.0-beta-8...if anyone can shed some light on this area of confusion (I suspect it is the root of my problems described below), then I'd be most appreciative.
Software Specs:
Operating System: Windows 7 (64x) -- running in VirtualBox
Browser: FireFox (32.0.2)
Script Language: Python (2.7.8)
Automated Web Browser: Selenium (2.43.0) -- installed via PIP
BrowserMob-Proxy: 0.6.0 AND 2.0-beta-8 -- see explanation below
Selenium Script (this script works):
"""This script utilizes Selenium to obtain the Google homepage"""
from selenium import webdriver
driver = webdriver.Firefox() # Opens FireFox browser.
driver.get('https://google.com/') # Gets google.com and loads page in browser.
driver.quit() # Closes Firefox browser
This script succeeds in running and does not produce any errors. It is provided for illustrative purposes to indicate it works before adding BMP logic.
Script ALPHA with BMP (does not work):
"""Using the same functional Selenium script, produce ALPHA_HAR.har output"""
from browsermobproxy import Server
server = Server('C:\Users\Matt\Desktop\\browsermob-proxy-2.0-beta-8\\bin\\browsermob-proxy')
server.start()
proxy = server.create_proxy()
from selenium import webdriver
driver = webdriver.Firefox() # Opens FireFox browser.
proxy.new_har("ALPHA_HAR") # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har # Returns a HAR JSON blob
server.stop()
This code will succeed in running the script and will not produce any errors. However, when searching the entirety of my hard drive, I never succeed in locating ALPHA_HAR.har.
Script BETA with BMP (does not work):
"""Using the same functional Selenium script, produce BETA_HAR.har output"""
from browsermobproxy import Server
server = Server("C:\Users\Matt\Desktop\\browsermob-proxy-2.0-beta-8\\bin\\browsermob-proxy")
server.start()
proxy = server.create_proxy()
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
proxy.new_har("BETA_HAR") # Creates a new HAR
driver.get("https://www.google.com/") # Gets google.com and loads page in browser.
proxy.har # Returns a HAR JSON blob
server.stop()
This code was taken from http://browsermob-proxy-py.readthedocs.org/en/latest/. When running the above code, FireFox will attempt to get google.com, but will never succeed in loading the page. Eventually it will time out without producing any errors. And BETA_HAR.har can't be found anywhere on my hard drive. I have also noticed that, when trying to use this browser to visit any other site, it will similarly fail to load (I suspect this is due to the proxy not being configured properly).
Try this:
from browsermobproxy import Server
from selenium import webdriver
import json
server = Server("path/to/browsermob-proxy")
server.start()
proxy = server.create_proxy()
profile = webdriver.FirefoxProfile()
profile.set_proxy(self.proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
proxy.new_har("http://stackoverflow.com", options={'captureHeaders': True})
driver.get("http://stackoverflow.com")
result = json.dumps(proxy.har, ensure_ascii=False)
print result
proxy.stop()
driver.quit()
I use phantomJS, here is an example of how to use it with python:
import browsermobproxy as mob
import json
from selenium import webdriver
BROWSERMOB_PROXY_PATH = '/usr/share/browsermob/bin/browsermob-proxy'
url = 'http://google.com'
s = mob.Server(BROWSERMOB_PROXY_PATH)
s.start()
proxy = s.create_proxy()
proxy_address = "--proxy=127.0.0.1:%s" % proxy.port
service_args = [ proxy_address, '--ignore-ssl-errors=yes', ] #so that i can do https connections
driver = webdriver.PhantomJS(service_args=service_args)
driver.set_window_size(1400, 1050)
proxy.new_har(url)
driver.get(url)
har_data = json.dumps(proxy.har, indent=4)
screenshot = driver.get_screenshot_as_png()
imgname = "google.png"
harname = "google.har"
save_img = open(imgname, 'a')
save_img.write(screenshot)
save_img.close()
save_har = open(harname, 'a')
save_har.write(har_data)
save_har.close()
driver.quit()
s.stop()
What worked for me was to downgrade java version to java11. I used jenv to install and manage multiple java versions.
When you do:
proxy.har
You need to parse that response, proxy.har is a JSON object, so if you need to generate a file, you need to do this:
myFile = open('BETA_HAR.har','w')
myFile.write( str(proxy.har) )
myFile.close()
Then you will find your .har
Finding your HAR file
Inherently, the HAR object generated by the proxy is just that: an object in memory. The reason you can't find it on your hard drive is because it's not being saved there unless you write it there yourself. This is a pretty simple operation, as the HAR is just JSON.
with open("harfile", "w") as harfile:
harfile.write(json.dumps(proxy.har))
Why does ALPHA not work?
When you start dumping your HAR file, you'll find that your HAR file is empty with the ALPHA script. This is because you are not adding the proxy to the settings for Firefox, meaning that it will just connect directly bypassing your proxy.
What about BETA?
This code is written correctly as far as connecting to the proxy, although personally I prefer adding the proxy to the capabilities and passing those through. The code for that is:
cap = webdriver.DesiredCapabilities.FIREFOX.copy()
proxy.add_to_capabilities(cap)
driver = webdriver.Firefox(capabilities=cap)
I would guess that your issue lies with the proxy itself. Check the bmp.log and/or server.log files in the location of the python script and see what it is saying if something is going wrong.
Another alternative is that selenium is reporting back that the webpage has loaded before it actually has finished getting all of the elements, and as such your proxy is shutting down too early. Try making the script wait a bit longer before shutting down the proxy, or running it interactively through the interpreter.