Starting webdriver faster in Selenium Python

Is there a way to start chromedriver faster in Selenium?
It's taking about 6 seconds to actually start, and that's literally half the runtime of my code.

Check out ChromeDriverService; it will reduce the launch time, and you will see the difference when you have multiple tests running.

It can be done by using ChromeDriverService.
When we use the ChromeDriver class directly, it starts and terminates the ChromeDriver server process for every test, which can waste a significant amount of time for large test suites where a ChromeDriver instance is created per test.
Alternatively, we can start the ChromeDriver server separately before running the tests and connect to it using the Remote WebDriver.
Please refer to the official documentation at
https://sites.google.com/a/chromium.org/chromedriver/getting-started
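For example, a minimal sketch of the second approach with the Selenium 3 Python API (the chromedriver path is an example for your system): start the ChromeDriver server once, then attach a Remote session to it for each test so the server start-up cost is paid only once.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Start the ChromeDriver server once for the whole suite.
service = Service('/path/to/chromedriver')  # adjust the path for your system
service.start()

# Attach each test's session to the already-running server.
driver = webdriver.Remote(service.service_url,
                          desired_capabilities=webdriver.DesiredCapabilities.CHROME)
driver.get('https://www.example.com')
driver.quit()    # ends this session; the server keeps running

service.stop()   # shut the server down after the whole suite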

Related

Selenium using too much RAM with Firefox

I am using selenium with Firefox to automate some tasks on Instagram. It basically goes back and forth between user profiles and notifications page and does tasks based on what it finds.
It has one infinite loop that keeps the task going. I have a sleep() call every few steps, but the memory usage keeps increasing. I have something like this in Python:
while True:
    expected_conditions()
    ...doTask()
    driver.back()
    expected_conditions()
    ...doAnotherTask()
    driver.forward()
    expected_conditions()
I never close the driver because that would slow down the program a lot, as it has a lot of queries to process. Is there any way to keep the memory usage from increasing over time without closing or quitting the driver?
EDIT: I added explicit conditions but that did not help either. I am using Firefox in headless mode.
Well, this is a serious problem I had been going through for some days, but I have found a solution. You can add some flags to optimize your memory usage.
from selenium.webdriver.chrome.options import Options  # assuming Chrome-style Options here, since these are Chromium flags

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument('--no-sandbox')
options.add_argument('--disable-application-cache')
options.add_argument('--disable-gpu')
options.add_argument("--disable-dev-shm-usage")
These are the flags I added. Before I added them, RAM usage kept increasing, and once it crossed 4GB (my machine has 8GB) the machine got stuck. After I added these flags, memory usage didn't cross 500MB. And, as DebanjanB's answer suggests, if you are running a for or while loop, try putting a few seconds of sleep after each iteration to give some time to kill the unused threads.
To start with, Selenium has very little control over the amount of RAM used by Firefox. As you mentioned, the browser client i.e. Mozilla Firefox goes back and forth between user profiles and the notifications page on Instagram and does tasks based on what it finds, which is too broad for a single usecase. So the first and foremost task would be to break up the infinite loop pertaining to your usecase into smaller tests.
time.sleep()
Inducing time.sleep() virtually puts a blanket over the underlying issue. However, while using Selenium and WebDriver to execute tests through your automation framework, using time.sleep() without any specific condition defeats the purpose of automation and should be avoided at all costs. As per the documentation:
time.sleep(secs) suspends the execution of the current thread for the given number of seconds. The argument may be a floating point number to indicate a more precise sleep time. The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine. Also, the suspension time may be longer than requested by an arbitrary amount because of the scheduling of other activity in the system.
You can find a detailed discussion in How to sleep webdriver in python for milliseconds
Analysis
There were previous instances when Firefox consumed about 80% of the RAM.
However, as per this discussion, some users feel that the more memory is used the better, because it means no RAM is wasted: Firefox uses RAM to make its processes faster, since application data is transferred much faster in RAM.
Solution
You can implement any or all of the following generic/specific steps:
Upgrade Selenium to current levels (Version 3.141.59).
Upgrade GeckoDriver to GeckoDriver v0.24.0 level.
Upgrade Firefox version to Firefox v65.0.2 levels.
Clean your Project Workspace through your IDE and Rebuild your project with required dependencies only.
If your base Web Client version is too old, then uninstall it and install a recent GA released version of the Web Client.
Some extensions allow you to block unnecessary content, for example:
uBlock Origin allows you to hide ads on websites.
NoScript allows you to selectively enable and disable all scripts running on websites.
To open the Firefox client with an extension, download the extension (i.e. the XPI file) from https://addons.mozilla.org and use the add_extension(extension='webdriver.xpi') method to add it to a FirefoxProfile as follows:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.add_extension(extension='extension_name.xpi')
driver = webdriver.Firefox(firefox_profile=profile, executable_path=r'C:\path\to\geckodriver.exe')
If your tests don't require CSS, you can disable CSS following this discussion.
Use Explicit Waits or Implicit Waits instead of hardcoded sleeps (see the sketch below).
Use driver.quit() to close all the browser windows and terminate the WebDriver session, because if you do not call quit() at the end of the program the WebDriver session will not be closed properly and the files will not be cleared from memory, which may result in memory leak errors.
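As an illustration of the explicit-wait point above, a minimal sketch (the URL and locator are hypothetical examples):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get('https://www.instagram.com/')
# Wait up to 10 seconds for the element instead of a fixed time.sleep();
# the locator below is only illustrative.
element = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "a.notification"))
)
element.click()
driver.quit()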
Creating a new Firefox profile and reusing it every time you run the test cases should eventually improve execution performance. Without doing so, a new profile is created on every run and caching information is written into it, and if driver.quit() somehow does not get called before a failure, you end up with more and more new profiles containing cached information that keeps consuming memory.
// ------------ Creating a new Firefox profile -------------------
1. If Firefox is open, close Firefox.
2. Press Windows+R on the keyboard. A Run dialog will open.
3. In the Run dialog box, type in firefox.exe -P
Note: You can use -P or -ProfileManager (either one should work).
4. Click OK.
5. Create a new profile and set its location to the RAM drive.
// ----------- Associating the Firefox profile (Java) -------------------
ProfilesIni profile = new ProfilesIni();
FirefoxProfile myprofile = profile.getProfile("automation_profile");
WebDriver driver = new FirefoxDriver(myprofile);
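The same association in Python would look roughly like this (a sketch; the profile and geckodriver paths are examples for your system):
from selenium import webdriver

# Load the previously created profile instead of generating a fresh one on every run.
profile = webdriver.FirefoxProfile(r'C:\path\to\automation_profile')
driver = webdriver.Firefox(firefox_profile=profile,
                           executable_path=r'C:\path\to\geckodriver.exe')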
Please share execution performance with community if you plan to implement this way.
There is no fix for that as of now.
I suggest you use the driver.close() approach.
I was also struggling with the RAM issue, and what I did was count the number of loop iterations; when the count reached a certain number (for me it was 200) I called driver.close(), then started the driver back up again and reset the count.
This way I did not need to close the driver every time the loop executed, which also has less effect on performance.
Try this. Maybe it will help in your case too.
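A rough sketch of that restart-every-N-iterations idea (RESTART_EVERY, start_driver() and do_tasks() are illustrative names, and the sketch uses driver.quit() so the browser is fully torn down before restarting):
from selenium import webdriver

RESTART_EVERY = 200

def start_driver():
    return webdriver.Firefox()

def do_tasks(driver):
    pass  # placeholder for the actual scraping/automation steps

driver = start_driver()
count = 0
while True:
    do_tasks(driver)
    count += 1
    if count >= RESTART_EVERY:
        driver.quit()              # release the browser's accumulated memory
        driver = start_driver()    # start fresh
        count = 0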

Python, Selenium and Chromedriver - endless loop using find_element_by_id causes CPU problem

Good day to all! I've been experiencing this problem for a week now but I don't think I can solve it and I also do not see any solution based on articles online. Hopefully someone can help me here...
My scenario:
I need to monitor prices from 6 different tables in one page that changes almost every second. By end of day, I would close the browser (by pressing the X button) and terminate the script (by pressing Control+C) then run again in the morning and let it run through out the day. The script is written in python and is using selenium to read the prices. The browser I use is Chrome. My OS is Windows 2008 R2; Selenium version is 3.14.1
Here is the relevant part of the code. It just plainly reads the prices within the tables using find_elements_by_id inside an infinite loop with a 1-second interval.
while True:
    close1 = float(browser.find_element_by_id('bnaBox1').find_elements_by_id('lastprc1')[0].text.encode('ascii','ignore'))
    close2 = float(browser.find_element_by_id('bnaBox2').find_elements_by_id('lastprc2')[0].text.encode('ascii','ignore'))
    close3 = float(browser.find_element_by_id('bnaBox3').find_elements_by_id('lastprc3')[0].text.encode('ascii','ignore'))
    close4 = float(browser.find_element_by_id('bnaBox4').find_elements_by_id('lastprc4')[0].text.encode('ascii','ignore'))
    close5 = float(browser.find_element_by_id('bnaBox5').find_elements_by_id('lastprc5')[0].text.encode('ascii','ignore'))
    close6 = float(browser.find_element_by_id('bnaBox6').find_elements_by_id('lastprc6')[0].text.encode('ascii','ignore'))
    time.sleep(1)
    ...
During the first few minutes of the run, the script consumes a minimal amount of CPU (approx 20~30 percent), but after a few more minutes, consumption slowly shoots up to 100%! There are no other processes running on the machine besides the script.
Troubleshooting I've done so far (none of it solved my issue):
upgraded my Chrome to the latest version - v71 and chromedriver 2.44
rolled back Chrome to previous versions (v62, v68, v69, v70)
rolled back chromedriver to versions 2.42 and 2.43
cleared my %TEMP% files
rebooted the machine (multiple times)
The program only reads values within the tables, but I suspect that somewhere in the background, as the script runs, unnecessary data is piling up, which causes the CPU to hit the ceiling.
Hoping that someone can help me figure out what causes this CPU problem and resolve the issue.
It would be tough to guess the exact reason for 100% CPU usage without any visibility into your code blocks, specifically the WebDriver configuration. So this answer will be based on generic guidelines, as follows:
Never close the browser by pressing the X button. Always invoke driver.quit() within the tearDown(){} method to close and destroy the WebDriver and Web Client instances gracefully (see the tearDown() sketch at the end of this answer).
You can find a detailed discussion in PhantomJS web driver stays in memory
Never terminate the script by pressing Control+C. In case there are zombie WebDriver or Web Browser instances left behind, you can remove them programmatically.
You can find a detailed discussion in Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?
A couple of useful ChromeOptions() and their usage are as follows:
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")          # open Browser in maximized mode
options.add_argument("disable-infobars")         # disabling infobars
options.add_argument("--disable-extensions")     # disabling extensions
options.add_argument("--disable-gpu")            # applicable to Windows OS only
options.add_argument("--disable-dev-shm-usage")  # overcome limited resource problems
options.add_argument("--no-sandbox")             # bypass OS security model
Using hardcoded sleeps in the form of time.sleep(1) is a big No.
You can find a detailed discussion in How to sleep webdriver in python for milliseconds
In case you are using Chrome in headless mode, there has been a lot of discussion about the unpredictable CPU and memory consumption of Chrome headless sessions.
You can find a detailed discussion in Limit chrome headless CPU and memory usage
Always keep your Test Environment updated with the latest released binaries as follows:
Upgrade ChromeDriver to current ChromeDriver v2.44 level.
Keep Chrome version between Chrome v69-71 levels. (as per ChromeDriver v2.44 release notes)
Clean your Project Workspace through your IDE and Rebuild your project with required dependencies only.
If your base Web Client version is too old, then uninstall it through Revo Uninstaller and install a recent GA released version of the Web Client.
Take a system reboot.
Execute your Tests.
From Space and Memory Management perspective:
(WindowsOS only) Use CCleaner tool to wipe off all the OS chores before and after the execution of your Test Suite.
(LinuxOS only) Free Up and Release the Unused/Cached Memory in Ubuntu/Linux Mint before and after the execution of your Test Suite.
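As mentioned in the first point of this list, a minimal unittest-style sketch of calling quit() from tearDown() (the test body itself is only illustrative):
import unittest
from selenium import webdriver

class PriceMonitorTest(unittest.TestCase):
    def setUp(self):
        self.browser = webdriver.Chrome()

    def test_read_prices(self):
        self.browser.get('https://www.example.com')
        # ... read the price tables here ...

    def tearDown(self):
        self.browser.quit()  # gracefully destroys the WebDriver and browser instances

if __name__ == '__main__':
    unittest.main()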
Have you tried releasing memory inside the loop?
Maybe by picking up the values (into a list outside the loop?) and then resetting those variables to None you can avoid excessive memory consumption.
...
while True:
    ...
    close1 = close2 = close3 = close4 = close5 = close6 = None
    ...
You can also try forcing the garbage collector:
import gc

while True:
    ...
    gc.collect()
If you think the cause may be a script on the page, another way to narrow down the problem is to enable Chrome remote debugging and inspect the page:
--remote-debugging-port=9222
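For instance, a sketch of passing that flag through ChromeOptions (the port is just an example); you can then open http://localhost:9222 in another browser to inspect what the page is doing:
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--remote-debugging-port=9222")
browser = webdriver.Chrome(options=options)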
I hope some of this helps you.

Opening more than 9 sessions with Selenium

I am trying to use Selenium to visit a website with a few dozen sessions at a time, but whenever I try to set up more than 9 sessions, it says "chromedriver.exe is not responding" and the sessions start closing themselves.
Here is my code:
from selenium import webdriver
import time
url = "website URL"
amount = 36
def generateBrowsers():
    for x in range(0, amount):
        driver = webdriver.Chrome(executable_path="C:/Users/user/Documents/chromedriver_win32/chromedriver.exe")
        driver.get(url)
        time.sleep(3)
generateBrowsers()
Does anyone know what could be wrong?
Logically, your code block has no errors.
But as you are trying to open 36 sessions at a time, you need to consider the following facts:
Each call to driver = webdriver.Chrome(executable_path="C:/Users/user/Documents/chromedriver_win32/chromedriver.exe") will initiate:
1. A new WebDriver instance
2. A new Web Browser instance
Each of the WebDriver and Web Browser instances will need to occupy some amount of:
1. CPU
2. Memory
3. Network
4. Cache
Now, as you execute your test suite from a system that also runs a lot of other applications (some of them possibly on start-up), everything has to fit within the available CPU, Memory, Network and Cache. So whenever the usage of the mentioned resources gets beyond the threshold level, either the next new chromedriver.exe or the next chrome.exe will be unable to spawn properly. In your case chromedriver.exe was unable to spawn. Hence you see the error:
chromedriver.exe is not responding
Solution
If you have a requirement to spawn 36 sessions at a time, you need to use:
Selenium in a Grid configuration: Selenium Grid consists of a Hub and Nodes, and you will be able to distribute the required number of sessions among a number of Nodes.
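Once the Hub and Nodes are running, each session is pointed at the Hub instead of a local chromedriver; a minimal sketch (the Hub URL assumes a default local Grid on port 4444):
from selenium import webdriver

driver = webdriver.Remote(
    command_executor='http://localhost:4444/wd/hub',
    desired_capabilities=webdriver.DesiredCapabilities.CHROME,
)
driver.get('https://www.example.com')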

Python / Celery / Selenium continuous task (avoid reopening browser)

The biggest issue I have with Selenium is the long browser re-opening time (I am using it to scrape every few minutes). I am also using proxies and running multiple browsers with Python's threading, all starting/stopping every few minutes (whenever a new job comes in).
Threading also means only 1 CPU is used and performance suffers.
I've been thinking about starting to use Celery (out-of-the-box multi-core support) and making the workers (different proxy/browser each) run indefinitely (in a while loop) with open Selenium browser instances waiting to receive the exact URLs to scrape, fed via something like Redis.
Is it a good idea to be running continuous tasks like this with celery? Is there any better way to do it?
It's never a good idea to hold open instances of Selenium indefinitely; best practice is to reopen with each task.
So for your question, in my opinion it's not a good idea.
Let me offer you another architecture instead.
Use Docker to run your Selenium machines, basically creating a Selenium Grid (first result on Google) using Docker.
Once everything is set up correctly the task becomes easy: with multiprocessing, send all the jobs to your Selenium hub in parallel, and they will run simultaneously on as many containers as you need.
Once the job is done, you can destroy the containers and start fresh with the next cycle.
Using Docker will also allow you to scale your operation very easily.
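A rough sketch of that architecture (the hub address and URLs are placeholders; it assumes a dockerised Selenium Grid hub is already running):
from multiprocessing import Pool
from selenium import webdriver

HUB = 'http://localhost:4444/wd/hub'

def scrape(url):
    driver = webdriver.Remote(command_executor=HUB,
                              desired_capabilities=webdriver.DesiredCapabilities.CHROME)
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # fresh browser per task, as recommended above

if __name__ == '__main__':
    with Pool(4) as pool:
        print(pool.map(scrape, ['https://example.com/a', 'https://example.com/b']))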

Multiple Selenium instances on same machine?

What is the best way to run multiple selenium instances in parallel if the tests are for:
same browser type
same machine
I've read this: https://code.google.com/p/selenium/wiki/ScalingWebDriver and it seems there is a systemic problem with running multiple Selenium instances on the same machine. But I'd like to ask the community if there is a way that I'm not seeing.
I have a working Selenium instance running e2e tests. However, I would now like to run 5 of those Selenium instances in parallel using the same browser type.
I've looked into Selenium Grid 2 and I'm not sure if it fits my use case. It seems like the main point of Selenium Grid 2 is to be able to distribute test according to browser version / operating system. But in my case, each test is for the same type, same browser version.
Running the standalone test works great!
Running multiple Firefox processes:
But as I try to scale by spawning multiple Firefox processes, I get errors that mostly involve HTTP and request failures, including BadClient, StatusError and Exception: Request cannot be sent:
webdriver.Firefox()
Using Grid
I've dug into the webdriver.Firefox() code and it looks like under the scenes, it's connecting locally:
class WebDriver(RemoteWebDriver):
    def __init__(self, firefox_profile=None, firefox_binary=None, timeout=30, capabilities=None, proxy=None):
        ...
        RemoteWebDriver.__init__(self,
            command_executor=ExtensionConnection("127.0.0.1", self.profile, self.binary, timeout))
The RemoteWebDriver instance seems to just connect to localhost on a free port that it finds. Which seems like the same command used by Grid when registering a node:
java -jar selenium-server-standalone-2.44.0.jar -role node -hub http://localhost:4444/grid/register
Does Grid have any relevance for running parallel Selenium instances on the same machine? Or is it primarily a load balancer for instances running on various different machines?
Is it possible to have reliable, non-flaky instances of Selenium running in parallel on the same machine? I get HTTP flakiness when I run them in parallel (lots of "request cannot be sent" or "Bad Status Error" or the browser closing before info can be read from the socket).
What's the point of Selenium Grid 2? Does it just act as a load balancer for parallel test runs on multiple machines? If I ran grid locally with the hub and node on the same machine (all for FF), would it effectively be the same as me running multiple webdriver.Firefox() processes?
Or is there some more magic behind the scenes?
