Selenium: How to disable image loading with Firefox and Python?

I have read similar questions and one was supposed to be the answer, but when I tried it, it only gave a partial solution.
I refer to this question: Disable images in Selenium Python
My problem is that after applying that solution some of the images no longer appear, but images that come from tags like:
<img href="www.xxx.png">
are still being loaded.
Is there a way to tell Firefox/Selenium not to fetch them?
If not, is there a way to discard them from the DOM that I get back via:
self._browser.get(url)
content=self._browser.page_source
for example, by doing some kind of find-and-replace on the DOM tree?
The browser configuration is the same browser from the previous question:
firefox_profile = webdriver.FirefoxProfile()
# Disable CSS
firefox_profile.set_preference('permissions.default.stylesheet', 2)
# Disable images
firefox_profile.set_preference('permissions.default.image', 2)
# Disable Flash
firefox_profile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', 'false')
# Set the modified profile while creating the browser object
self._browser = webdriver.Firefox(firefox_profile=firefox_profile)
Update:
I kept digging. When I inspect the page source that the Selenium/Firefox combination returns, I see that the images were not fetched and were kept as links.
But when I did:
self._browser.save_screenshot("info.png")
I got a 24 MB file with all the img links loaded.
Can anyone explain this?
Thanks!

You can disable images using the following code:
firefox_profile = webdriver.FirefoxProfile()
firefox_profile.set_preference('permissions.default.image', 2)
firefox_profile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', False)  # boolean preference, so pass a bool rather than the string 'false'
driver = webdriver.Firefox(firefox_profile=firefox_profile)
If you need to block a specific URL, I think you need to add a line such as:
127.0.0.1 www.someSpecificUrl.com
to the hosts file before the test starts and delete it after the test finishes.
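The add-then-remove step for the hosts entry could be sketched roughly as below. This is only an illustration: the helper name `blocked_host` and the idea of passing the hosts path as a parameter are assumptions, and the demonstration runs against a throwaway file rather than the real hosts file (which normally requires administrator rights to edit).

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def blocked_host(hostname, hosts_path):
    """Temporarily map `hostname` to 127.0.0.1 in the hosts file at `hosts_path`."""
    with open(hosts_path, "r", encoding="utf-8") as f:
        original = f.read()
    entry = "127.0.0.1 {}\n".format(hostname)
    # Make sure the new entry starts on its own line.
    separator = "" if original == "" or original.endswith("\n") else "\n"
    with open(hosts_path, "w", encoding="utf-8") as f:
        f.write(original + separator + entry)
    try:
        yield
    finally:
        # Restore the file exactly as it was, even if the test failed.
        with open(hosts_path, "w", encoding="utf-8") as f:
            f.write(original)


# Demonstration against a throwaway copy rather than the real hosts file.
with tempfile.TemporaryDirectory() as tmp:
    demo_hosts = os.path.join(tmp, "hosts")
    with open(demo_hosts, "w", encoding="utf-8") as f:
        f.write("127.0.0.1 localhost\n")
    with blocked_host("www.someSpecificUrl.com", demo_hosts):
        with open(demo_hosts, encoding="utf-8") as f:
            during = f.read()
    with open(demo_hosts, encoding="utf-8") as f:
        after = f.read()
```

In a real test run the browser session would go inside the `with blocked_host(...)` block, so the entry is removed again even when the test raises.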

In the latest Firefox versions permissions.default.image can't be changed. To disable the images, either switch to ChromeDriver or use alternative extensions as suggested here.
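For the fallback the question asks about (a find-and-replace on the returned page source), a minimal sketch might look like this. The function name `strip_img_tags` is made up for illustration, and a regular expression is only a rough tool for HTML: it does not handle attribute values that contain ">". It also only changes the string you get back; it does not stop the browser from fetching anything.

```python
import re

# Matches <img ...> and <img ... /> tags; attribute values containing ">" are not handled.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def strip_img_tags(html):
    """Return `html` with every <img> tag removed."""
    return IMG_TAG.sub("", html)


sample = '<p>before<img src="a.png"><IMG href="www.xxx.png" />after</p>'
cleaned = strip_img_tags(sample)
```

With the code from the question this would be applied as `content = strip_img_tags(self._browser.page_source)`.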

Related

Selenium: boolean setting in about:config of Firefox webdriver

For a test suite, I'm running a python script controlling a Firefox instance using selenium webdriver. I want to change the setting dom.disable_open_during_load in about:config to true. Although this is the default setting in my default Firefox profile, selenium changes it to false (user-defined) whenever I'm starting a webdriver instance. It seems to use an anonymous, slightly changed profile?! I can then manually change it back, but I was struggling to do it with code: neither using a new profile nor using a pre-set profile configured with Firefox' profile manager solves the problem.
from selenium import webdriver
FFprofile = webdriver.FirefoxProfile()
FFprofile.set_preference('dom.disable_open_during_load', 'true') # I also tried True, 1 - with and without quotes
# FFprofile = webdriver.FirefoxProfile('C:/Users/ExampleUser/AppData/Local/Mozilla/Firefox/Profiles/owieroiuysd.testprofile')
FFdriver = webdriver.Firefox(firefox_profile=FFprofile)
FFdriver.get('http://www.google.com')
I can change various settings this way, but it doesn't work for this one. Where does the changed value false "user-defined" come from? Is it an automatic setting of selenium somewhere? I'm using:
geckodriver 0.16.1
selenium 3.4.2.
Firefox 53.0.3 (64bit)
python 3.4.4
Edit: I just found this question on SO, dealing with the same problem in java.
If this turns out to be impossible, probably there is a nice work-around? Any ideas?
fp = webdriver.FirefoxProfile()
fp.DEFAULT_PREFERENCES['frozen']["dom.disable_open_during_load"] = True
Don't use profile.set_preference('dom.disable_open_during_load', True), as a value set that way will be overridden by the frozen preferences.
profile.set_preference('dom.disable_open_during_load', True)
is the correct way to do it, but it won't work for this particular property because, according to the following article, the user is not allowed to change it. The same approach works for other parameters,
e.g.
profile.set_preference('browser.download.manager.showWhenStarting', False)
https://www.stigviewer.com/stig/mozilla_firefox/2015-06-30/finding/V-19743
Solution:
Create a new profile, modify this setting directly in the profile's JS file, and then provide the path to this local profile. I have not tested this solution, so I am not sure whether it will work.
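Writing the setting into a profile's JS file could be sketched as follows. This assumes the file meant is `user.js` in the profile folder, which holds `user_pref(...)` lines; the helper name `write_user_js` is hypothetical, and whether Selenium's frozen defaults still override the value afterwards is not verified here (the answer itself is untested).

```python
import json
import os
import tempfile


def write_user_js(profile_dir, prefs):
    """Write `prefs` as user_pref(...) lines to user.js inside `profile_dir`."""
    lines = []
    for name, value in sorted(prefs.items()):
        # json.dumps renders True/False as true/false and quotes strings,
        # which matches the literal syntax of simple preference values.
        lines.append("user_pref({}, {});".format(json.dumps(name), json.dumps(value)))
    path = os.path.join(profile_dir, "user.js")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path


# Demonstration in a temporary directory standing in for a real profile folder.
with tempfile.TemporaryDirectory() as profile_dir:
    js_path = write_user_js(profile_dir, {"dom.disable_open_during_load": True})
    with open(js_path, encoding="utf-8") as f:
        written = f.read()
```

Note that the boolean `True` and the string `'true'` produce different lines (`true` versus `"true"`), which is one reason the quoted variants tried in the question do not set a boolean preference.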
This particular setting seems to be difficult for some reason...
Although I wasn't able to find a solution, I got inspired by this webpage and found a decent work-around using Firefox' developer toolbar:
ActionChains(self.FFdriver).key_down(Keys.SHIFT).send_keys(Keys.F2).key_up(Keys.SHIFT).perform()
time.sleep(0.1)  # this seems to be necessary
ActionChains(self.FFdriver).send_keys('pref set dom.disable_open_during_load true').perform()
ActionChains(self.FFdriver).send_keys(Keys.ENTER).perform()
ActionChains(self.FFdriver).key_down(Keys.SHIFT).send_keys(Keys.F2).key_up(Keys.SHIFT).perform()
If anyone should know or find a better way, please comment!

Selenium in Python to download file: even after setting Firefox Profile the Download Window opens

I am trying to use Selenium in Python to download a file from a website. In order to do that, I have read that I need to change the settings in my Firefox profile to avoid opening the download dialogue window. I provided sample code below. This code works absolutely great at home, but it does not function properly on my work PC. I suspect that somehow Python cannot change the settings of the Firefox profile: the code below does not throw an error and runs through, but in the end the download dialogue window still opens.
from selenium import webdriver
import os
profile = webdriver.FirefoxProfile("C:\\Users\\Ric\\Documents\\Python Scripts\\FirefoxProfileCopies\\ric.copy")
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', os.getcwd())
profile.set_preference('browser.helperApps.neverAsk.saveToDisk',('application/vnd.ms-excel'))
browser = webdriver.Firefox(profile)
browser.get("http://www.sample-videos.com/download-sample-xls.php")
elem1 = browser.find_element_by_css_selector(".push-form > table:nth-child(2) > tbody:nth-child(2) > tr:nth-child(4) > td:nth-child(4) > a:nth-child(1)")
elem1.click()
This code works perfectly with my Firefox and its profile at home, but not with my computer at work. Does anybody know why this might be? Thank you in advance.
EDIT
I tried to add all the MIME types from the Microsoft webpage, but still, the download manager window opens. When I stop the code before it opens the download link and look at the settings of the Firefox profile in use with about:config, the following values are displayed:
So, after a lot of trying, I decided to look at the settings in Firefox again, since it worked with an empty profile. I managed to resolve my issue and finally made the download window disappear by going to the Firefox settings and changing the settings for applications:
Then, in this menu, search for Excel and change the value from "asking every time" to "save file/download file". Sorry if these entries differ from the actual ones in Firefox; my Firefox is in German. After doing this, my issue was resolved. I hope this helps somebody else :) and thanks to anderson.
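For the step of listing several MIME types mentioned in the edit, a small helper can build the value passed to browser.helperApps.neverAsk.saveToDisk. This is a sketch under the assumption that the preference takes a comma-separated list; the helper name `never_ask_value` and the second MIME type (the one commonly used for .xlsx files) are illustrative additions. As described above, application handlers stored in an existing profile may still bring up the dialog regardless of this value.

```python
def never_ask_value(mime_types):
    """Join MIME types into one comma-separated string, dropping blanks and repeats."""
    seen = []
    for mime in mime_types:
        mime = mime.strip()
        if mime and mime not in seen:
            seen.append(mime)
    return ",".join(seen)


excel_types = [
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.ms-excel",  # repeated on purpose to show de-duplication
]
value = never_ask_value(excel_types)
```

The result would then be passed as `profile.set_preference('browser.helperApps.neverAsk.saveToDisk', value)`.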

PhantomJS loads much less HTML than other drivers

I'm trying to load one web page and get some elements from it. So the first thing I do is to check the page using "inspect element". When I search for the tags I'm looking for, I can see them (in Chrome).
But when I try to do driver.get(url) and then driver.find_element_by_..., it doesn't find those elements because they aren't in the source code.
I think that it is probably because it doesn't load the whole page but only a part.
Here is an example:
I'm trying to find ads on the web page.
PREPARED_TABOOLA_BLOCK = """//div[contains(@id,'taboola') and not(ancestor::div[contains(@id,'taboola')])]"""
driver = webdriver.PhantomJS(service_args=["--load-images=false"])
# driver = webdriver.Chrome()
driver.maximize_window()
def find_taboola_blocks_selenium(url):
    driver.get(url)
    taboola_blocks = driver.find_elements_by_xpath(PREPARED_TABOOLA_BLOCK)
    return taboola_blocks
print len(find_taboola_blocks_selenium('http://www.breastfeeding-problems.com/breastfeeding-a-sick-baby.html'))
driver.get('http://www.breastfeeding-problems.com/breastfeeding-a-sick-baby.html')
print len(driver.page_source)
OUTPUTS:
Using PhantomJS:
0
85103
Using ChromeDriver:
3
420869
Do you know how to make PhantomJS load as much HTML as possible, or any other way to solve this?
Can you compare the request that ChromeDriver is making versus the request you are making in PhantomJS? Since you are only doing GET for the specified url, you may not be including other request parameters that are needed to get the advertisements.
The open() method may give you a better representation of what you are looking for here: http://phantomjs.org/api/webpage/method/open.html
This happens because PhantomJS, by default, renders in a really small window, which makes the site serve its mobile version. With the PhantomJSDriver, calling maximizeWindow() (or maximize_window() in Python) does absolutely nothing, since there is no rendered window to maximize. You have to set the window's render size explicitly with:
edit: Below is the Java solution. I'm not entirely sure what the Python solution would be when setting the window size, but it should be similar.
driver.manage().window().setSize(new Dimension(1920, 1200));
edit again: Found the python version:
driver.set_window_size(1920, 1200)
Hope that helps!
PhantomJS 1.x is a really old browser. It only uses SSLv3 (now disabled on most sites) by default and doesn't implement most cutting edge functionality.
Advertisement scripts are usually delivered over HTTPS (SSLv3/TLS) and usually use some obscure feature of JavaScript which is not well tested or simply not implemented in PhantomJS.
If you use PhantomJS < v1.9.8, then you should use these command-line options (service_args): --ignore-ssl-errors=true --ssl-protocol=any.
If iframes or strange cross-domain requests are necessary for the page/ads to work, then add --web-security=false to the service_args.
If this still doesn't solve the problem, then try upgrading to PhantomJS 2.0.0. You might need to compile it yourself on Linux.
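The flags mentioned above can be collected into a service_args list like this. It is a minimal sketch: the function name `phantomjs_service_args` and its parameters are made up for illustration, and only flags that already appear in this thread are used.

```python
def phantomjs_service_args(load_images=False, relax_ssl=True, web_security=True):
    """Build a PhantomJS command-line argument list from the flags discussed above."""
    args = ["--load-images={}".format("true" if load_images else "false")]
    if relax_ssl:
        # Suggested above for PhantomJS versions older than 1.9.8.
        args += ["--ignore-ssl-errors=true", "--ssl-protocol=any"]
    if not web_security:
        # Only needed when iframes or cross-domain requests must work.
        args.append("--web-security=false")
    return args


service_args = phantomjs_service_args(web_security=False)
```

The list would then be passed as in the question, `webdriver.PhantomJS(service_args=service_args)`, followed by `driver.set_window_size(1920, 1200)`.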

Follow a link with Ghost.py

I'm trying to use Ghost.py to do some web scraping. I'm trying to follow a link, but Ghost doesn't seem to actually evaluate the JavaScript and follow the link. My problem is that I'm in an HTTPS session and cannot use redirection. I've also looked at other options (like Selenium), but I cannot install a browser on the machine that will run the script. I also have some JavaScript evaluation further on, so I cannot use mechanize.
Here's what I do...
## Open the website
page,resources = ghost.open('https://my.url.com/')
## Fill textboxes of the form (the form didn't have a name)
result, resources = ghost.set_field_value("input[name=UserName]", "myUser")
result, resources = ghost.set_field_value("input[name=Password]", "myPass")
## Submitting the form
result, resources = ghost.evaluate( "document.getElementsByClassName('loginform')[0].submit();", expect_loading=True)
## Print the link to make sure that's the one I want to follow
#result, resources = ghost.evaluate( "document.links[4].href")
## Click the link
result, resources = ghost.evaluate( "document.links[4].click()")
#print ghost.content
When I look at ghost.content, I'm still on the same page and result is empty. I noticed that when I add expect_loading=True when trying to evaluate the click, I get a timeout error.
When I try to run the JavaScript in the Chrome Developer Tools console, I get
event.returnValue is deprecated. Please use the standard
event.preventDefault() instead.
but the page does load up the linked url correctly.
Any ideas are welcome.
Charles
I think you are using the wrong methods for that.
If you want to submit the form there's a special method for that:
page, resources = ghost.fire_on("loginform", "submit", expect_loading=True)
Also there's a special ghost.py method for performing a click:
ghost.click('#some-selector')
Another possibility, if you just want to open that link, could be:
link_url = ghost.evaluate("document.links[4].href")[0]
ghost.open(link_url)
You only have to find the right selectors for that.
I don't know on which page you want to perform the task, thus I can't fix your code. But I hope this will help you.

Do not want the Images to load and CSS to render on Firefox in Selenium WebDriver - Python

I am using Selenium 2 with Python bindings to fetch some data from our partner's site, but on average it takes around 13 seconds to perform this operation.
I was looking for a way to disable images, CSS, Flash, etc.
I am using Firefox 3.6 and also pyvirtualdisplay to prevent the Firefox window from opening. Any other optimization to speed up Firefox would also be helpful.
I have already tried the network.http.* options, but they do not help much.
I have also set permissions.default.image = 2.
I have figured out a way to prevent Firefox from loading CSS, images and Flash.
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
def disableImages(self):
    ## get the Firefox profile object
    firefoxProfile = FirefoxProfile()
    ## Disable CSS
    firefoxProfile.set_preference('permissions.default.stylesheet', 2)
    ## Disable images
    firefoxProfile.set_preference('permissions.default.image', 2)
    ## Disable Flash (boolean preference, so pass a bool rather than the string 'false')
    firefoxProfile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', False)
    ## Set the modified profile while creating the browser object
    self.browserHandle = webdriver.Firefox(firefoxProfile)
Thanks again @Simon and @ernie for your suggestions.
New Edit
It has been a long time since I wrote this, and the field of web automation (whether for testing or for crawling/scraping) has changed a lot. The major browsers now offer a --headless flag and even an interactive shell. There is no more need to change the good old DISPLAY variable on Linux.
Firefox has also changed, adopting components from the Servo engine, which is written in Rust. I tried the profile below with a contemporary version (specifically, 62.0). Some preferences worked, some did not. Keep that in mind.
I'm just extending kyrenia's answer to this question. However, disabling CSS might prevent jQuery from manipulating DOM elements. Use QuickJava and the preferences below:
profile.set_preference("network.http.pipelining", True)
profile.set_preference("network.http.proxy.pipelining", True)
profile.set_preference("network.http.pipelining.maxrequests", 8)
profile.set_preference("content.notify.interval", 500000)
profile.set_preference("content.notify.ontimer", True)
profile.set_preference("content.switch.threshold", 250000)
profile.set_preference("browser.cache.memory.capacity", 65536) # Increase the cache capacity.
profile.set_preference("browser.startup.homepage", "about:blank")
profile.set_preference("reader.parse-on-load.enabled", False) # Disable reader, we won't need that.
profile.set_preference("browser.pocket.enabled", False) # Duck pocket too!
profile.set_preference("loop.enabled", False)
profile.set_preference("browser.chrome.toolbar_style", 1) # Text on Toolbar instead of icons
profile.set_preference("browser.display.show_image_placeholders", False) # Don't show thumbnails on not loaded images.
profile.set_preference("browser.display.use_document_colors", False) # Don't show document colors.
profile.set_preference("browser.display.use_document_fonts", 0) # Don't load document fonts.
profile.set_preference("browser.display.use_system_colors", True) # Use system colors.
profile.set_preference("browser.formfill.enable", False) # Autofill on forms disabled.
profile.set_preference("browser.helperApps.deleteTempFileOnExit", True) # Delete temporary files.
profile.set_preference("browser.shell.checkDefaultBrowser", False)
profile.set_preference("browser.startup.homepage", "about:blank")
profile.set_preference("browser.startup.page", 0) # blank
profile.set_preference("browser.tabs.forceHide", True) # Disable tabs, We won't need that.
profile.set_preference("browser.urlbar.autoFill", False) # Disable autofill on URL bar.
profile.set_preference("browser.urlbar.autocomplete.enabled", False) # Disable autocomplete on URL bar.
profile.set_preference("browser.urlbar.showPopup", False) # Disable list of URLs when typing on URL bar.
profile.set_preference("browser.urlbar.showSearch", False) # Disable search bar.
profile.set_preference("extensions.checkCompatibility", False) # Addon update disabled
profile.set_preference("extensions.checkUpdateSecurity", False)
profile.set_preference("extensions.update.autoUpdateEnabled", False)
profile.set_preference("extensions.update.enabled", False)
profile.set_preference("general.startup.browser", False)
profile.set_preference("plugin.default_plugin_disabled", False)
profile.set_preference("permissions.default.image", 2) # Image load disabled again
What does it do? You can see what each line does in its comment. I've also included a couple of about:config entries that improve performance. For example, the code above does not load the document's fonts or colors, but it does load CSS, so jQuery (or any other library) can manipulate DOM elements without raising an error. (In more detail: you still download the CSS, but the browser skips the declarations that contain a specific font-family or color definition. So the browser downloads and loads the CSS but uses system defaults for styling, and renders the page faster.)
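One way to keep such a list manageable is to hold the preferences in a dictionary and apply them in a loop. The sketch below uses a small subset of the preferences above; `apply_preferences` and `RecordingProfile` are hypothetical names, and `RecordingProfile` is only a stand-in so the example runs without a browser. With Selenium installed, a FirefoxProfile object would be passed instead, since it offers the same `set_preference(name, value)` call used throughout this thread.

```python
# A few of the preferences listed above, kept in one place.
SPEED_PREFS = {
    "permissions.default.image": 2,
    "browser.display.use_document_fonts": 0,
    "browser.display.use_document_colors": False,
    "browser.startup.homepage": "about:blank",
}


def apply_preferences(profile, prefs):
    """Call profile.set_preference(name, value) for each entry in `prefs`."""
    for name, value in prefs.items():
        profile.set_preference(name, value)
    return profile


class RecordingProfile:
    """Hypothetical stand-in for FirefoxProfile that just records what was set."""

    def __init__(self):
        self.prefs = {}

    def set_preference(self, name, value):
        self.prefs[name] = value


recorded = apply_preferences(RecordingProfile(), SPEED_PREFS).prefs
```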
For more information, check out this article.
Edit (Tests)
I just ran a performance test. You should not take the results too seriously, since I ran the test only once, just to give you an idea.
I ran the test on an old machine with a 2.2 GHz Intel Pentium processor, 3 GB of RAM and a 4 GB swap area, running Ubuntu 14.04 x64.
The test has three steps:
Driver Loading Performance: The seconds spent loading the driver in the webdriver module.
Page Loading Performance: The seconds spent loading the page. This depends on the internet speed, but the rendering time is included as well.
DOM Inspecting Performance: The seconds spent inspecting the DOM on the page.
I used this page as the subject and inspected it with the CSS selector .xxy a. Then I ran each configuration separately, one by one.
Selenium, Firefox, No Profile
Driver Loading Performance: 13.124099016189575
Page Loading Performance: 3.2673521041870117
DOM Inspecting Performance: 67.82778096199036
Selenium, Firefox, Profile Above
Driver Loading Performance: 7.535895824432373
Page Loading Performance: 2.9704301357269287
DOM Inspecting Performance: 64.25136017799377
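A measurement like the one above can be reproduced with a small timing helper. This is a sketch, not the script that produced those numbers: `time_phases` is a made-up name, and the three callables are placeholders that just sleep; in a real run they would start the driver, load the page, and query the DOM.

```python
import time


def time_phases(phases):
    """Run each (label, callable) pair in order and return {label: seconds}."""
    timings = {}
    for label, action in phases:
        start = time.perf_counter()
        action()
        timings[label] = time.perf_counter() - start
    return timings


# Placeholder callables standing in for the real browser steps.
timings = time_phases([
    ("Driver Loading Performance", lambda: time.sleep(0.01)),
    ("Page Loading Performance", lambda: time.sleep(0.01)),
    ("DOM Inspecting Performance", lambda: time.sleep(0.01)),
])

for label, seconds in timings.items():
    print("{}: {}".format(label, seconds))
```

Running each configuration several times and averaging would make the comparison more trustworthy than a single run.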
Edit (About Headlessness)
I ran a test maybe a month ago, but I did not record the results. However, I want to mention that the driver loading, page loading and DOM inspecting times drop below ten seconds when Firefox is run headless. That was really cool.
Unfortunately, the option firefox_profile.set_preference('permissions.default.image', 2) no longer seems to disable images with the latest version of Firefox (for the reason, see alecxe's answer to my question "Can't turn off images in Selenium / Firefox").
The best solution I found was to use the Firefox extension QuickJava, which among other things can disable images: https://addons.mozilla.org/en-us/firefox/addon/quickjava/
My Python code:
from selenium import webdriver
firefox_profile = webdriver.FirefoxProfile()
firefox_profile.add_extension(folder_xpi_file_saved_in + "\\quickjava-2.0.6-fx.xpi")
firefox_profile.set_preference("thatoneguydotnet.QuickJava.curVersion", "2.0.6.1") ## Prevents loading the 'thank you for installing screen'
firefox_profile.set_preference("thatoneguydotnet.QuickJava.startupStatus.Images", 2) ## Turns images off
firefox_profile.set_preference("thatoneguydotnet.QuickJava.startupStatus.AnimatedImage", 2) ## Turns animated images off
driver = webdriver.Firefox(firefox_profile)
driver.get(web_address_desired)
Disabling CSS (and I think Flash) still works with Firefox preferences, but these and other features can also be switched off by adding the lines:
firefox_profile.set_preference("thatoneguydotnet.QuickJava.startupStatus.CSS", 2) ## CSS
firefox_profile.set_preference("thatoneguydotnet.QuickJava.startupStatus.Cookies", 2) ## Cookies
firefox_profile.set_preference("thatoneguydotnet.QuickJava.startupStatus.Flash", 2) ## Flash
firefox_profile.set_preference("thatoneguydotnet.QuickJava.startupStatus.Java", 2) ## Java
firefox_profile.set_preference("thatoneguydotnet.QuickJava.startupStatus.JavaScript", 2) ## JavaScript
firefox_profile.set_preference("thatoneguydotnet.QuickJava.startupStatus.Silverlight", 2)
You can disable images/css using the Web Developer toolbar Addon.
https://addons.mozilla.org/en-US/firefox/addon/web-developer/
Go to CSS -> Disable and Images -> Disable.
For everyone interested in still using the original straight-forward approach suggested by Anupam:
Just install firefox version 20.0.1 (https://ftp.mozilla.org/pub/firefox/releases/20.0.1/) - works perfectly fine.
Other versions may work as well (versions 32 and higher and versions 3.6.9 and lower do NOT work)
Tossing in my 2¢.
Better to use JavaScript snippets to accomplish this.
driver.execute_script(
'document.querySelectorAll("img").forEach(function(ev){ev.remove()});'
)
That will remove the img elements. If you do this right after you load the page, they will have little chance to download image data.
Here is a similar solution I found elsewhere on StackOverflow. (Can't find it anymore)
driver.execute_script(
"document.head.parentNode.removeChild(document.head)"
)
