I'm using PhantomJS as a webdriver to load some URLs. Usually the program runs fine. However, it hangs on driver.get(url) a lot, and I'm wondering if there is anything I can do about it?
driver = webdriver.PhantomJS(executable_path= path_to_phantomjs_exe, service_log_path= path_to_ghostdriver_log)
driver.get(url)
It will just hang trying to load a certain url forever. But if I try it again, it might work. Are webdrivers/PhantomJS really just that unstable? I guess the last resort would be to call driver.get(url) repeatedly until it finally loads, but is that really going to be necessary? Thanks!
EDIT: It seems to only hang when loading the first link out of a list of them. It eventually does load, however, but after a few minutes. The rest of the links load within seconds. Any help at all would be great.
I've answered this exact problem in this post: Geb/Selenium tests hang loading new page, but I've copied it here because I see that this question is older.
I hope you can find a way to implement this into your code, but this is what worked for me when I was having a similar situation with PhantomJS hanging.
I traced it to a hang on a driver.get() call, which suggested that either the request wasn't going through, or the webdriver, for some reason, never reported the successful load back to the driver, which would have allowed the script to continue.
So, I added the following:
driver = webdriver.PhantomJS()
# set timeout information
driver.set_page_load_timeout(15)
I tested this with a timeout of 5 seconds, which didn't wait long enough and nothing would happen; 15 seconds worked well for me, but that's a value you may want to tune yourself.
On top of this, I also wrapped every call that could time out in a loop, so that driver.get() could be retried. With a try/except inside a while loop, it looks like this:
while finished == 0:
    try:
        driver.get(url3)
        finished = 1
    except:
        sleep(5)
I have seen an except handler written as:
except TimeoutException as e:
    # Handle your exception here
    print(e)
but I had no use for this. It might be nice to know how to catch specific exceptions, though.
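Catching the specific TimeoutException also makes it easy to factor the retry loop above into a small helper. This is only a sketch: `retry` is an illustrative name, and a plain callable stands in for driver.get() so the pattern is self-contained; in practice you would pass `lambda: driver.get(url)` and selenium's TimeoutException.

```python
import time

def retry(action, attempts=5, delay=5, exceptions=(Exception,)):
    """Call `action` until it succeeds or `attempts` runs out.

    `exceptions` should be narrowed to the errors you expect
    (e.g. selenium's TimeoutException) rather than a bare Exception.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise          # give up after the last attempt
            time.sleep(delay)  # back off before retrying

# With selenium this would look like:
#   retry(lambda: driver.get(url), exceptions=(TimeoutException,))
```

The delay between attempts mirrors the sleep(5) in the loop above; raising on the final attempt keeps a permanently dead url from looping forever.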
See this solution for more options for a timeout: Setting timeout on selenium webdriver.PhantomJS
So I was having the same problem:
driver = webdriver.PhantomJS(executable_path= path_to_phantomjs_exe, service_log_path= path_to_ghostdriver_log)
driver.get(url)
So I changed the service_log_path to:
service_log_path=os.path.devnull
This seemed to work for me!
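For reference, os.path.devnull (an alias of os.devnull) resolves to a writable sink on every platform, so passing it as service_log_path simply discards the ghostdriver log instead of writing it to disk:

```python
import os

# '/dev/null' on Unix, 'nul' on Windows
print(os.path.devnull)

# anything written to it is discarded
with open(os.path.devnull, "w") as sink:
    sink.write("this log line goes nowhere")
```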
Related
The following code works for somewhere from 0 to 40 iterations but eventually stalls on browser.get(url) even though the timeout parameter option is set.
browser = webdriver.Chrome(chrome_options = options)
browser.set_page_load_timeout(5)
for url in links:
    try:
        browser.get(url)
    except TimeoutException:
        print("Webpage loading cut off")
The website queried is onvista.de, which updates data dynamically; that, however, shouldn't matter to the page_load_timeout option.
I've tried to work around it by setting up a thread before calling the get method and sending the escape key from that thread after 5 seconds; that, however, failed because a second thread can't access the webdriver while the first thread is using it.
I have really no idea what the issue could be at this point, so thanks for every answer!
I'll post this as an answer, in case anyone has the same problem.
For some reason I had Selenium 2.56 installed; updating to version 3 via pip install -U selenium solved the issue.
I have a very complex py.test python-selenium test setup where I create a Firefox webdriver inside a py.test fixture. Here is some idea of what I am doing:
'driver.py':
class Driver(object):
    """
    Driver class with basic wrappers around the selenium webdriver
    and other convenience methods.
    """

    def __init__(self, config, options):
        """Sets the driver and the config."""
        self.remote = options.getoption("--remote")
        self.headless = not options.getoption("--with-head")
        if self.headless:
            self.display = Display(visible=0, size=(13660, 7680))
            self.display.start()
        # Start the selenium webdriver
        self.webdriver = fixefox_module.get_driver()
'conftest.py':
@pytest.fixture
def basedriver(config, options):
    driver = driver.Driver(config, options)
    yield driver
    print("Debug 1")
    driver.webdriver.quit()
    print("Debug 2")
And when running the test I can only see Debug 1 printed out. The whole process stops at this point and does not seem to proceed. The whole selenium test is stuck at the webdriver.quit() call.
The tests, however, completed successfully...
What reasons could be for that behavior?
Addendum:
The reason why the execution hangs seems to be a popup that asks the user if he wants to leave the page because of unsaved data. That means that the documentation for the quit method is incorrect. It states:
Quits the driver and close every associated window.
This is a non-trivial problem, and selenium handles it really inconsistently. The quit method should, as documented, just close the browser window(s), but it does not. Instead you get a popup asking the user if he wants to leave the page:
The nasty thing is that this popup appears only after the user called
driver.quit()
One way to fix this is to set the following profile for the driver
from selenium import webdriver
profile = webdriver.FirefoxProfile()
# other settings here
profile.set_preference("dom.disable_beforeunload", True)
driver = webdriver.Firefox(firefox_profile = profile)
The close warning is enabled by default in Firefox, as you can see in about:config, and you can disable it for your profile:
And since,
The reason why the execution hangs seems to be a popup that asks the
user if he wants to leave the page because of unsaved data.
You can set browser.tabs.warnOnClose in your Firefox configuration profile as follows:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.tabs.warnOnClose", False)
driver = webdriver.Firefox(firefox_profile = profile)
You can look at profile.DEFAULT_PREFERENCES, which is the JSON at python/site-packages/selenium/webdriver/firefox/webdriver_prefs.json.
As far as I understood, there are basically two questions asked, which I will try to answer:
Why does failure of the driver.webdriver.quit() method call leave the script in a hung/unresponsive state instead of raising any exception?
Why was the test case still a pass if the script never completed its execution cycle?
For answering the first question I will try to explain the Selenium architecture, which will clear up most of our doubts.
So how does Selenium WebDriver function?
Every statement or command you write using a Selenium client library is converted to the JSON Wire Protocol over HTTP and passed to the browser driver (chromedriver, geckodriver). These generated HTTP requests (following a REST architecture) reach the browser driver, which runs an HTTP server internally; it forwards the request to the real browser, which generates the appropriate response and sends it back to the browser driver. The driver in turn uses the JSON Wire Protocol to send the response back to the Selenium client library, which finally decides how to proceed based on the response received.
Now, coming back to the question of why the script hangs: we can conclude that the browser is still working on the request it received, so no response is sent back to the browser driver, which in turn leaves the Selenium library's quit() call on hold, i.e. waiting for the request to finish.
There are a variety of workarounds available, one of which Alex has already explained. But I believe there is a better general way to handle such conditions, since in my experience Selenium can leave us in a hung/frozen state in other cases too. I personally prefer a thread-kill approach with a timeout: since the Selenium calls block the main thread, we can allocate a specific amount of time to the blocking call and abandon it if the timeout is reached.
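The thread-with-timeout idea might be sketched as follows. Note that Python cannot forcibly kill a thread, so this sketch runs the blocking call in a worker thread and gives up on it after the timeout; `call_with_timeout` is an illustrative helper, not Selenium API, and on timeout the caller would still need to clean up externally (e.g. kill the browser/driver process).

```python
import threading

def call_with_timeout(action, timeout):
    """Run `action` in a worker thread; raise if it doesn't finish in time.

    Python can't kill a thread, so a timed-out worker keeps running in the
    background; real cleanup means terminating the browser/driver process.
    """
    result = {}

    def runner():
        result["value"] = action()

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout)          # wait at most `timeout` seconds
    if worker.is_alive():
        raise TimeoutError("action did not finish within %s s" % timeout)
    return result.get("value")

# With selenium this might wrap the hanging call:
#   call_with_timeout(lambda: driver.quit(), timeout=10)
```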
Now moving to the second question:
Why was the test case still a pass if the script never completed its
execution cycle?
Well, I don't know much about how pytest works, but I do have a basic idea of how test engines operate, and I'll try to answer based on that.
For starters, it's not normally possible for a test case to pass before the full script run completes. If your test cases are passing anyway, there are only a few possible scenarios, such as:
Your test methods never actually used the method that leaves the whole execution hanging.
You called the method inside the test teardown phase (in terms of Java's TestNG test engine: @AfterClass, @AfterTest, @AfterGroups, @AfterMethod, @AfterSuite), meaning your test execution had already completed. This might be why the tests show up as completing successfully.
I am still not sure of the exact cause behind the second scenario. I will keep looking and update the post if I come up with something.
@Alex: can you update the question with more detail, i.e. your current test design, which I can explore to find a better explanation?
So I was able to reproduce your issue using the sample HTML file below:
<html>
<body>
    Please enter a value for me: <input name="name">
    <script>
        window.onbeforeunload = function(e) {
            return 'Dialog text here.';
        };
    </script>
    <h2>ask questions on exit</h2>
</body>
</html>
Then I ran a sample script which reproduces the hang
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://localhost:8090/index.html")
driver.find_element_by_name("name").send_keys("Tarun")
driver.quit()
This hangs selenium-python indefinitely, which is not a good thing. The issue is that window.onload and window.onbeforeunload are hard for Selenium to handle because of when they fire in the page lifecycle: onload happens even before Selenium has injected its own code to suppress alert and confirm dialogs, and I am pretty sure onbeforeunload is out of Selenium's reach as well.
So there are multiple ways to get around.
Change in app
Ask the devs not to use onload or onbeforeunload events. Will they listen? Not sure
Disable beforeunload in profile
This is what you have already posted in your answer
from selenium import webdriver
profile = webdriver.FirefoxProfile()
# other settings here
profile.set_preference("dom.disable_beforeunload", True)
driver = webdriver.Firefox(firefox_profile = profile)
Disable the events through code
try:
    driver.execute_script("window.onunload = null; window.onbeforeunload = null")
finally:
    pass
driver.quit()
This works only if you don't have multiple tabs open, or if the tab that generates the popup is in focus, but it is a good generic way to handle this situation.
Not letting Selenium hang
Well, the reason selenium hangs is that it sends a request to geckodriver, which then forwards it to Firefox, and one of these just doesn't respond while it waits for the user to close the dialog. The problem is that the Selenium Python driver doesn't set any timeout on this connection.
Solving the problem is as simple as adding the two lines of code below:
import socket

socket.setdefaulttimeout(10)
try:
    driver.quit()
finally:
    # Set this back to something higher if you want
    socket.setdefaulttimeout(60)
But the issue with this approach is that the driver/browser will still not be closed. This is where you need an even more robust approach to kill the browser, as discussed in the answer below:
In Python, how to check if Selenium WebDriver has quit or not?
Code from the above link, to make this answer complete:
from selenium import webdriver
import psutil

driver = webdriver.Firefox()
driver.get("http://tarunlalwani.com")

driver_process = psutil.Process(driver.service.process.pid)
if driver_process.is_running():
    print("driver is running")
    firefox_process = driver_process.children()
    if firefox_process:
        firefox_process = firefox_process[0]
        if firefox_process.is_running():
            print("Firefox is still running, we can quit")
            driver.quit()
        else:
            print("Firefox is dead, can't quit. Let's kill the driver")
            firefox_process.kill()
else:
    print("driver has died")
The best way to guarantee you run your teardown code in pytest is to define a finalizer function and add it as a finalizer to that fixture. This guarantees that even if something fails before the yield command, you still get your teardown.
To avoid a popup hanging up your teardown, invest in some WebDriverWait(...).until calls that time out whenever you want them to. A popup appears, the test cannot proceed, it times out, and teardown is called.
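A minimal sketch of the finalizer approach, using request.addfinalizer; `FakeDriver` is a stand-in for the question's Driver wrapper (whose real definition lives in driver.py), so the sketch is self-contained. Unlike code placed after a bare yield, a registered finalizer runs even if the fixture setup after registration, or the test itself, fails:

```python
import pytest

class FakeDriver:
    """Stand-in for the question's Driver wrapper, for illustration only."""
    def quit(self):
        print("driver closed")

@pytest.fixture
def basedriver(request):
    driver = FakeDriver()
    # Teardown registered here is guaranteed to run, even if the
    # test body (or later setup) raises an exception.
    request.addfinalizer(driver.quit)
    return driver
```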
For ChromeDriver users:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('no-sandbox')
driver = webdriver.Chrome(chrome_options=options)
driver.close()
driver.quit()
credits to...
https://bugs.chromium.org/p/chromedriver/issues/detail?id=1135
I am using python selenium to parse a large amount of data from more than 10,000 urls. The browser is Firefox.
For each url, a Firefox browser is opened; after data parsing it is closed, and the script waits 5 seconds before opening the next url through Firefox.
However, it has happened twice these days: everything was running great, then all of a sudden the newly opened browser was blank and not loading the url at all. In my own experience, sometimes even when I manually open a browser and search for something, it comes up blank too.
The problem is that when this happens there is no error at all, even though I wrote except code to catch any exception; I'm also running the code under nohup, which would record any exception, but nothing is logged. And once this happens, the code isn't executed any more, and many urls are left unparsed... If I re-run the code on the remaining urls, it works fine again.
Here is my code (all the 10,000+ urls are in comment_urls list):
for comment_url in comment_urls:
    driver = webdriver.Firefox(executable_path='/Users/devadmin/Documents/geckodriver')
    driver.get(comment_url)
    time.sleep(5)
    try:
        # here is my data parsing code .....
        driver.quit()  # the browser will be closed when the data has been parsed
        time.sleep(5)  # and wait 5 seconds
    except:
        with open(error_comment_reactions, 'a') as error_output:
            error_output.write(comment_url + "\n")
        driver.quit()
        time.sleep(5)
At the same time, if there is any exception in that data-parsing part, my code will also record it, close the driver, and wait 5 seconds. But so far, no error has been recorded at all.
I tried to find similar problems and solutions online, but those are not helpful.
So, currently, I have 2 questions in mind:
Have you met this problem before, and do you know how to deal with it? Is it a network problem, a selenium problem, or a browser problem?
Or is there any way in Python to tell that the browser is not loading the url, and close it?
For the second problem, prefer to use a work queue to parse the urls. One app should add all of them to a queue (redis, rabbitmq, amazon sqs, etc.), and a second app should take one url from the queue and try to parse it. If it succeeds, it should delete the url from the queue and move on to the next one. On exception, it should exit with status 1 (e.g. sys.exit(1)) to stop the app. Use a shell script to run the second app: when it returns 1, meaning an error occurred, restart the app. Shell script: Get exit(1) from Python in shell
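The producer/consumer split above can be sketched with the stdlib queue module standing in for redis/rabbitmq, and a plain `parse` callable standing in for the scraping step (both are illustrative names). In the real setup a failed item would exit the process with status 1 and a shell loop would restart the worker; here the failure is just collected so the flow is visible:

```python
import queue

def drain(urls, parse):
    """Work through every url on a queue; report successes and failures.

    Sketch only: a real worker would sys.exit(1) on failure and be
    restarted by a shell loop, leaving the failed url on the queue.
    """
    work = queue.Queue()
    for url in urls:
        work.put(url)

    done, failed = [], []
    while not work.empty():
        url = work.get()
        try:
            parse(url)
            done.append(url)    # success: item leaves the queue for good
        except Exception:
            failed.append(url)  # real app: exit(1); shell restarts worker
    return done, failed
```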
To answer your 2 questions:
1) Yes I have found selenium to be unpredictable at times. This is usually a problem when opening a browser for the first time which I will talk about in my solution. Try not to close the browser unless you need to.
2) Yes you can use the WebDriverWait() class in selenium.webdriver.support.wait
You said you are parsing thousands of comments so just make a new get request with the webdriver you have open.
I use this in my own scraper with the below code:
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get("http://someurl.com")
table = WebDriverWait(browser, 60).until(EC.presence_of_element_located((By.TAG_NAME, "table")))
The browser variable is just a webdriver.Firefox() instance.
It is a bit long but what it does is wait for a specific html tag to exist on the page with a timeout of 60 seconds.
It is possible that you are experiencing your own time.sleep() locking the thread up as well. Try not to use sleeps to compensate for things like this.
Below is my script; it works fine, but not to my requirement.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://somewebsite.com/')
#nextline of script
In the above example, it opens the browser and immediately moves to the next step.
I want the script to wait until I close the browser manually before moving to the next step
(as I want to log in and download a few files from the server first).
I agree with alecxe that you generally should automate the whole process. However, there are cases where you may be writing "throwaway code" or a proof-of-concept where it might be advantageous to have manual control of part of the process. If I found myself in such a situation, I'd do something like this:
import time
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://google.com/')
try:
    while True:
        # This will fail when the browser is closed.
        browser.execute_script("")
        time.sleep(0.2)
# Setting such a wide exception handler is generally not advisable but
# I'm not convinced there is a definite set of exceptions that
# Selenium will stick to if it cannot contact the browser. And I'm not
# convinced the set cannot change from release to release.
except:
    has_quit = False
    while not has_quit:
        try:
            # This is to allow Selenium to run cleanup code.
            browser.quit()
            has_quit = True
        except:  # See comment above regarding such wide handlers...
            pass

# Continue with the script...
print("Whatever")
The call to browser.quit() is so that Selenium can cleanup after itself. It is very important for Firefox in particular because Selenium will create a bunch of temporary files which can fill up /tmp (on a Unix-type system, I don't know where Selenium puts the files on a Windows system) over time. In theory Selenium should be able to handle gracefully the case where the browser no longer exists by the time browser.quit() is called but I've found cases where an internal exception was not caught and browser.quit() would fail right away. (By the way, this supports my comment about the set of exceptions that Selenium can raise if the browser is dead being unclear: even Selenium does not know what exceptions Selenium can raise, which is why browser.quit() sometimes fails.) Repeating the call until it is successful seems to work.
Note that browser becomes effectively unusable as soon as you close the browser. You'll have to spawn a new browser if you wish to do more browserly things.
Also, it is not generally possible to distinguish between the user closing the browser and a browser crash.
If the page is not fully loaded, you can always wait for a specific element on the page to show up, for example, your download button.
Or you can wait for all JavaScript to load.
wait.until(new Predicate<WebDriver>() {
    public boolean apply(WebDriver driver) {
        return ((JavascriptExecutor) driver).executeScript("return document.readyState").equals("complete");
    }
});
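Since most of this thread uses the Python bindings, the same readyState wait can be expressed there with a plain callable passed to WebDriverWait. The predicate is pulled out as a named function (an illustrative name) so it can be reused and tested on its own:

```python
def page_is_ready(driver):
    """True once the browser reports document.readyState == 'complete'."""
    return driver.execute_script("return document.readyState") == "complete"

# With selenium this would be used as:
#   from selenium.webdriver.support.wait import WebDriverWait
#   WebDriverWait(driver, 30).until(page_is_ready)
```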
I'm testing a site with lots of proxies, and the problem is some of those proxies are awfully slow. Therefore my code is stuck at loading pages every now and then.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://example.com/example-page.php")
element = browser.find_element_by_id("someElement")
I've tried lots of things like explicit waits and implicit waits, and have been searching around for quite a while, but still haven't found a solution or workaround. Nothing seems to affect the page-loading line browser.get("http://example.com/example-page.php"), and that's why it always gets stuck there.
Anybody got a solution for this?
Update 1:
JimEvans' answer solved my previous problem, and here you can find python patch for this new feature.
New problem:
browser = webdriver.Firefox()
browser.set_page_load_timeout(30)
browser.get("http://example.com/example-page.php")
element = browser.find_element_by_id("elementA")
element.click() ## assume it's a link to a new page http://example.com/another-example.php
another_element = browser.find_element_by_id("another_element")
As you can see, browser.set_page_load_timeout(30) only affects browser.get("http://example.com/example-page.php"): if that page takes over 30 seconds to load, a timeout exception is thrown. But it has no power over page loads triggered by element.click(). Although click() does not block until the new page loads entirely, another_element = browser.find_element_by_id("another_element") becomes the new pain in the ass, because both explicit and implicit waits wait for the whole page to load before starting to look for that element. In some extreme cases this can take even HOURS. What can I do about it?
You could try using the page-load timeout introduced in the library. The implementation of it is not universal, but it is exposed for certain by the .NET and Java bindings, has been implemented in the Firefox driver, and will be in the IE driver in the forthcoming 2.22. In Java, to set the page load timeout to 15 seconds, the code would look like this:
driver.manage().timeouts().pageLoadTimeout(15, TimeUnit.SECONDS);
If it's not exposed in the Python language bindings, I'm sure the maintainer would eagerly accept a patch that implemented it.
You can still speed up your script execution by waiting for the presence (not the visibility) of the expected element for, say, 5-8 seconds, and then sending a window.stop() JavaScript call (to stop loading further elements) without waiting for the entire page to load; or by catching the page-load timeout exception after 5-8 seconds and then calling window.stop().
This is because, if the page has not adopted a lazy-loading technique (loading only the visible elements and the rest only after scrolling), it loads every element before returning the window ready state, so it will be slower if any element takes a long time to render.
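The timeout-then-window.stop() flow might be sketched like this. `get_with_stop` is an illustrative helper, and the builtin TimeoutError stands in for selenium's TimeoutException so the sketch is self-contained; it assumes driver.set_page_load_timeout(...) has already been set to the 5-8 second budget:

```python
def get_with_stop(driver, url, timeout_exc=TimeoutError):
    """Load `url`, but cut the load short instead of failing outright.

    Assumes a short page-load timeout was already configured on the
    driver; on timeout, window.stop() halts further resource loading
    and the caller proceeds with whatever has rendered so far.
    """
    try:
        driver.get(url)
    except timeout_exc:
        driver.execute_script("window.stop();")
```

With real selenium you would pass timeout_exc=TimeoutException and follow the call with a presence wait for the element you actually need.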