I'm testing a site through lots of proxies, and the problem is that some of those proxies are awfully slow, so my code gets stuck loading pages every now and then.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://example.com/example-page.php")
element = browser.find_element_by_id("someElement")
I've tried lots of stuff like explicit waits and implicit waits and have been searching around for quite a while, but I still haven't found a solution or workaround. Nothing seems to affect the page-loading line browser.get("http://example.com/example-page.php"), and that's why it's always stuck there.
Anybody got a solution for this?
Update 1:
JimEvans' answer solved my previous problem, and here you can find the Python patch for this new feature.
New problem:
browser = webdriver.Firefox()
browser.set_page_load_timeout(30)
browser.get("http://example.com/example-page.php")
element = browser.find_element_by_id("elementA")
element.click() ## assume it's a link to a new page http://example.com/another-example.php
another_element = browser.find_element_by_id("another_element")
As you can see, browser.set_page_load_timeout(30) only affects browser.get("http://example.com/example-page.php"), which means that if this page loads for over 30 seconds it will throw a timeout exception. But the timeout has no power over page loads triggered by actions such as element.click(). Although click() does not block until the new page loads entirely, another_element = browser.find_element_by_id("another_element") is the new pain in the ass, because both explicit waits and implicit waits wait for the whole page to load before they start looking for that element. In some extreme cases this can take HOURS. What can I do about it?
You could try using the page load timeout introduced in the library. The implementation of it is not universal, but it's exposed for certain by the .NET and Java bindings, has been implemented in the Firefox driver now, and will be in the IE driver in the forthcoming 2.22. In Java, to set the page load timeout to 15 seconds, the code would look like this:
driver.manage().timeouts().pageLoadTimeout(15, TimeUnit.SECONDS);
If it's not exposed in the Python language bindings, I'm sure the maintainer would eagerly accept a patch that implemented it.
You can still speed up your script execution by waiting for the presence (not the visibility) of the expected element for, say, 5-8 seconds and then sending a window.stop() JavaScript call (to stop loading further resources) without waiting for the entire page to load; alternatively, catch the page-load timeout exception after 5-8 seconds and then call window.stop().
This helps because, if the page does not use a lazy-loading technique (loading only the visible elements and loading the rest only after scrolling), it loads every element before returning the window ready state, so it will be slower if any of the elements takes a long time to render.
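For illustration, here is a minimal sketch of the timeout-then-window.stop() variant (the URL and element id are placeholders):
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.set_page_load_timeout(8)  # give the page 5-8 seconds, per the advice above
try:
    browser.get("http://example.com/example-page.php")
except TimeoutException:
    # stop loading further resources; whatever has rendered so far stays usable
    browser.execute_script("window.stop();")
element = browser.find_element_by_id("someElement")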
Related
I am using selenium with Firefox to automate some tasks on Instagram. It basically goes back and forth between user profiles and the notifications page and does tasks based on what it finds.
It has one infinite loop that makes sure the task keeps going. I have a sleep() call every few steps, but the memory usage keeps increasing. I have something like this in Python:
while True:
    expected_conditions()
    ...doTask()
    driver.back()
    expected_conditions()
    ...doAnotherTask()
    driver.forward()
    expected_conditions()
I never close the driver because that would slow down the program by a lot, as it has a lot of queries to process. Is there any way to keep the memory usage from increasing over time without closing or quitting the driver?
EDIT: Added explicit conditions but that did not help either. I am using Firefox in headless mode.
Well, this is a serious problem I had been going through for some days, but I have found the solution: you can add some flags to optimize your memory usage.
# Note: these are Chromium-style flags; import the Options class that matches your browser
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--no-sandbox")
options.add_argument("--disable-application-cache")
options.add_argument("--disable-gpu")
options.add_argument("--disable-dev-shm-usage")
These are the flags I added. Before I added them, RAM usage kept increasing, and after it crossed 4GB (on my 8GB machine) the machine got stuck; after I added these flags, memory usage didn't cross 500MB. And as DebanjanB answers, if you are running a for or while loop, try putting a sleep of a few seconds after each iteration; it gives the browser time to kill the unused threads.
To start with, Selenium has very little control over the amount of RAM used by Firefox. As you mentioned, the browser client i.e. Mozilla goes back and forth between user profiles and the notifications page on Instagram and does tasks based on what it finds, which is too broad as a single usecase. So the first and foremost task would be to break the infinite loop pertaining to your usecase up into smaller tests.
time.sleep()
Inducing time.sleep() virtually puts a blanket over the underlying issue. However, while using Selenium and WebDriver to execute tests through your automation framework, using time.sleep() without any specific condition defeats the purpose of automation and should be avoided at any cost. As per the documentation:
time.sleep(secs) suspends the execution of the current thread for the given number of seconds. The argument may be a floating point number to indicate a more precise sleep time. The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine. Also, the suspension time may be longer than requested by an arbitrary amount because of the scheduling of other activity in the system.
You can find a detailed discussion in How to sleep webdriver in python for milliseconds
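For instance, a sub-second pause is just a float argument:
import time

time.sleep(0.3)  # suspends the current thread for roughly 300 milliseconds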
Analysis
There were previous instances when Firefox consumed about 80% of the RAM.
However, as per this discussion, some users feel that the more memory is used the better, because it means no RAM is wasted: Firefox uses RAM to make its processes faster, since application data is transferred much faster in RAM.
Solution
You can implement either/all of the generic/specific steps as follows:
Upgrade Selenium to the current level, Version 3.141.59.
Upgrade GeckoDriver to the GeckoDriver v0.24.0 level.
Upgrade Firefox to the Firefox v65.0.2 level.
Clean your Project Workspace through your IDE and Rebuild your project with required dependencies only.
If your base Web Client version is too old, then uninstall it and install a recent GA and released version of Web Client.
Some extensions allow you to block unnecessary content, for example:
uBlock Origin allows you to hide ads on websites.
NoScript allows you to selectively enable and disable all scripts running on websites.
To open the Firefox client with an extension, you can download the extension (i.e. the XPI file) from https://addons.mozilla.org and use the add_extension(extension='webdriver.xpi') method to add it to a FirefoxProfile as follows:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.add_extension(extension='extension_name.xpi')
driver = webdriver.Firefox(firefox_profile=profile, executable_path=r'C:\path\to\geckodriver.exe')
If your tests don't require CSS, you can disable it following this discussion.
Use Explicit Waits or Implicit Waits.
Use driver.quit() to close all the browser windows and terminate the WebDriver session, because if you do not call quit() at the end of the program, the WebDriver session will not be closed properly and the files will not be cleared from memory. This may result in memory leak errors.
Creating a new Firefox profile and using it every time you run your test cases in Firefox will eventually increase the performance of execution. Without this, a new profile is created on every run and caching information is stored in it; if driver.quit() somehow does not get called before a failure, you end up each time with new profiles holding cached information, which consumes memory.
// ------------ Creating a new firefox profile -------------------
1. If Firefox is open, close Firefox.
2. Press Windows+R on the keyboard. A Run dialog will open.
3. In the Run dialog box, type in firefox.exe -P
Note: You can use -P or -ProfileManager (either one should work).
4. Click OK.
5. Create a new profile and set its location to the RAM drive.
// ----------- Associating Firefox profile -------------------
ProfilesIni profile = new ProfilesIni();
FirefoxProfile myprofile = profile.getProfile("automation_profile");
WebDriver driver = new FirefoxDriver(myprofile);
Please share the execution performance with the community if you plan to implement this approach.
There is no fix for that as of now.
I suggest you use driver.close() approach.
I was also struggling with the RAM issue. What I did was count the number of loop iterations, and when the count reached a certain number (for me it was 200) I called driver.close(), then started the driver back up and reset the count.
This way I did not need to close the driver on every iteration of the loop, and it had less effect on performance too.
Try this. Maybe it will help in your case too.
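Here is a minimal sketch of that restart-every-N-iterations pattern (do_task() is a placeholder for whatever your loop body actually does):
from selenium import webdriver

RESTART_EVERY = 200  # the threshold that worked for me; tune it for your machine
driver = webdriver.Firefox()
count = 0
while True:
    do_task(driver)  # placeholder for one iteration of your workflow
    count += 1
    if count >= RESTART_EVERY:
        driver.close()  # release the browser's memory
        driver = webdriver.Firefox()  # start a fresh session
        count = 0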
In Python (Selenium):
driver = webdriver.Chrome()
driver.get("https://www.baidu.com")
for keywords in open('klist', 'r'):
    driver.get("https://www.baidu.com")
    driver.find_element_by_class_name('...').click()
    ....
Although the whole page appears, it just hangs and keeps loading, so a lot of time is wasted.
It doesn't freeze every time, but once it does, it can hang for several minutes before the next step.
I guess it hangs because some resource loads slowly. You can emulate such behavior manually by setting a low bandwidth speed in the Network tab of the developer tools (Chrome).
To find out exactly which resource is causing the problem, in case it's not reproducible by hand, you can use a proxy like Fiddler, Browsermob, or whatever your favorite proxy is.
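For example, here is a rough sketch using the browsermob-proxy Python bindings (the path to the proxy binary is a placeholder) to record a HAR file you can inspect for slow requests:
from browsermobproxy import Server
from selenium import webdriver

server = Server("/path/to/browsermob-proxy")  # placeholder path to the proxy binary
server.start()
proxy = server.create_proxy()

options = webdriver.ChromeOptions()
options.add_argument("--proxy-server={}".format(proxy.proxy))
driver = webdriver.Chrome(chrome_options=options)

proxy.new_har("slow-page")  # start recording network traffic
driver.get("https://www.baidu.com")
har = proxy.har  # inspect the HAR entries for requests with long timings
driver.quit()
server.stop()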
I am using python selenium to parse large amount of data from more than 10,000+ urls. The browser is Firefox.
For each url, a Firefox browser is opened; after data parsing it is closed, and the script waits 5 seconds before opening the next url through Firefox.
However, it has happened twice these days: everything was running great when, all of a sudden, the newly opened browser was blank and not loading the url at all. In my real-life experience, sometimes even when I manually open a browser and search for something, it is blank too.
The problem is that when this happens, there is no error at all. Even though I wrote except code to catch any exception, and I'm running the code with the nohup command, which would record any exception too, no error is recorded. And once this happens, the code won't execute any more, and many urls are left there without being parsed.... If I re-run the code on the remaining urls, it works fine again.
Here is my code (all the 10,000+ urls are in comment_urls list):
for comment_url in comment_urls:
    driver = webdriver.Firefox(executable_path='/Users/devadmin/Documents/geckodriver')
    driver.get(comment_url)
    time.sleep(5)
    try:
        # here is my data parsing code .....
        driver.quit()  # the browser will be closed when the data has been parsed
        time.sleep(5)  # and wait 5 seconds
    except:
        with open(error_comment_reactions, 'a') as error_output:
            error_output.write(comment_url + "\n")
        driver.quit()
        time.sleep(5)
At the same time, in the data parsing part, if there is any exception, my code will also record it, close the driver, and wait 5 seconds. But so far, no error has been recorded at all.
I tried to find similar problems and solutions online, but those are not helpful.
So, currently, I have 2 questions in mind:
Have you met this problem before, and do you know how to deal with it? Is it a network problem, a selenium problem, or a browser problem?
Or is there any way in python to tell that the browser is not loading the url, so the code can close it?
For the second problem, prefer to use a work queue to parse the urls. One app should add all of them to a queue (Redis, RabbitMQ, Amazon SQS, etc.), and a second app should take one url from the queue and try to parse it. If it succeeds, it should delete the url from the queue and move on to the next one. On an exception it should exit with status 1 (e.g. sys.exit(1)) to stop the app. Use a shell script to run the second app: when it returns 1, meaning an error occurred, restart it. Shell script: Get exit(1) from Python in shell
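Here is a minimal sketch of that worker, assuming the first app has already pushed all the urls into a Redis list named urls:
import sys
import redis
from selenium import webdriver

r = redis.Redis()
driver = webdriver.Firefox()
while True:
    url = r.lindex("urls", 0)  # peek at the next url without removing it
    if url is None:
        break  # queue drained, nothing left to parse
    try:
        driver.get(url.decode())
        # ... data parsing code goes here ...
        r.lpop("urls")  # success: now remove the url from the queue
    except Exception:
        sys.exit(1)  # exit status 1 tells the shell wrapper to restart the app
driver.quit()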
To answer your 2 questions:
1) Yes, I have found selenium to be unpredictable at times. This is usually a problem when opening a browser for the first time, which I will talk about in my solution. Try not to close the browser unless you need to.
2) Yes, you can use the WebDriverWait() class in selenium.webdriver.support.wait.
You said you are parsing thousands of comments, so just make a new get request with the webdriver you already have open.
I use this in my own scraper with the below code:
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get("http://someurl.com")
table = WebDriverWait(browser, 60).until(EC.presence_of_element_located((By.TAG_NAME, "table")))
The variable browser is just a webdriver.Firefox() instance.
It is a bit long but what it does is wait for a specific html tag to exist on the page with a timeout of 60 seconds.
It is possible that your own time.sleep() calls are locking the thread up as well. Try not to use sleeps to compensate for things like this.
Below is my script; it works fine, but not to my requirement.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://somewebsite.com/')
#nextline of script
In the above example, it opens the browser and immediately moves to the next step.
I want the script to wait until I close the browser manually before moving to the next step
(as I want to log in and download a few files from the server, then move on).
I agree with alecxe that you generally should automate the whole process. However, there are cases where you may be writing "throwaway code" or a proof-of-concept where it might be advantageous to have manual control of part of the process. If I found myself in such a situation, I'd do something like this:
import time
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://google.com/')
try:
while True:
# This will fail when the browser is closed.
browser.execute_script("")
time.sleep(0.2)
# Setting such a wide exception handler is generally not advisable but
# I'm not convinced there is a definite set of exceptions that
# Selenium will stick to if it cannot contact the browser. And I'm not
# convinced the set cannot change from release to release.
except:
has_quit = False
while not has_quit:
try:
# This is to allow Selenium to run cleanup code.
browser.quit()
has_quit = True
except: # See comment above regarding such wide handlers...
pass
# Continue with the script...
print "Whatever"
The call to browser.quit() is so that Selenium can cleanup after itself. It is very important for Firefox in particular because Selenium will create a bunch of temporary files which can fill up /tmp (on a Unix-type system, I don't know where Selenium puts the files on a Windows system) over time. In theory Selenium should be able to handle gracefully the case where the browser no longer exists by the time browser.quit() is called but I've found cases where an internal exception was not caught and browser.quit() would fail right away. (By the way, this supports my comment about the set of exceptions that Selenium can raise if the browser is dead being unclear: even Selenium does not know what exceptions Selenium can raise, which is why browser.quit() sometimes fails.) Repeating the call until it is successful seems to work.
Note that browser becomes effectively unusable as soon as you close the browser. You'll have to spawn a new browser if you wish to do more browserly things.
Also, it is not generally possible to distinguish between the user closing the browser and a browser crash.
If the page is not fully loaded, you can always wait for a specific element on the page to show up, for example, your download button.
Or you can wait for all JavaScript to load.
wait.until(new Predicate<WebDriver>() {
    public boolean apply(WebDriver driver) {
        return ((JavascriptExecutor) driver).executeScript("return document.readyState").equals("complete");
    }
});
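If you are in Python, a rough equivalent of that wait (assuming an existing driver) would be:
from selenium.webdriver.support.wait import WebDriverWait

WebDriverWait(driver, 60).until(
    lambda d: d.execute_script("return document.readyState") == "complete"
)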
I'm using PhantomJS as a webdriver to load some urls. Usually the program runs fine. However, it hangs on driver.get(url) a lot, and I'm wondering if there is anything I can do about it?
driver = webdriver.PhantomJS(executable_path= path_to_phantomjs_exe, service_log_path= path_to_ghostdriver_log)
driver.get(url)
It will just hang trying to load a certain url forever, but if I try it again, it might work. Are webdrivers/PhantomJS really just that unstable? I guess the last resort would be to constantly call driver.get(url) until it finally loads, but is that really going to be necessary? Thanks!
EDIT: It seems to only hang when loading the first link out of a list of them. It eventually does load, but only after a few minutes. The rest of the links load within seconds. Any help at all would be great.
I've answered this exact problem on this post here: Geb/Selenium tests hang loading new page but copied it here because I see that this question is older.
I hope you can find a way to implement this into your code, but this is what worked for me when I was having a similar situation with PhantomJS hanging.
I traced it to hanging on a driver.get() call, which for me suggested that something wasn't going through, or that the webdriver just wasn't, for some reason, passing the load-successful signal back to the driver to let the script continue.
So, I added the following:
driver = webdriver.PhantomJS()
# set timeout information
driver.set_page_load_timeout(15)
I tested this with a timeout of 5 seconds and it just didn't wait long enough, so nothing would happen. 15 seconds worked great for me, but that may be something you should test for yourself.
On top of this, I also created a loop around every place where the webdriver could time out, so that the .get() command could be re-sent. Using a stacked try/except, I arrived at this:
from time import sleep

finished = 0
while finished == 0:
    try:
        driver.get(url3)
        finished = 1
    except:
        sleep(5)
I have seen an except handler written as:
except TimeoutException as e:
    # Handle your exception here
    print(e)
but I had no use for this. It might be nice to know how to catch specific exceptions, though.
See this solution for more options for a timeout: Setting timeout on selenium webdriver.PhantomJS
So I was having the same problem:
driver = webdriver.PhantomJS(executable_path= path_to_phantomjs_exe, service_log_path= path_to_ghostdriver_log)
driver.get(url)
So I changed the service_log_path to:
service_log_path=os.path.devnull
This seemed to work for me!!!
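Putting it together with the same variables as above, the constructor call becomes:
import os

driver = webdriver.PhantomJS(executable_path=path_to_phantomjs_exe,
                             service_log_path=os.path.devnull)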