Multiple Selenium instances on same machine? - python

What is the best way to run multiple selenium instances in parallel if the tests are for:
same browser type
same machine
I've read this: https://code.google.com/p/selenium/wiki/ScalingWebDriver and it seems there is a systemic problem with running multiple Selenium instances on the same machine. But I'd like to ask the community if there is a way that I'm not seeing.
I have a working Selenium instance running e2e tests. However, I would now like to run 5 of those Selenium instances in parallel, all using the same browser type.
I've looked into Selenium Grid 2 and I'm not sure it fits my use case. It seems like the main point of Selenium Grid 2 is to distribute tests according to browser version / operating system. But in my case, every test targets the same browser type and version.
Running the standalone test works great!
Running multiple Firefox processes:
But as I try to scale by spawning multiple webdriver.Firefox() processes, I get errors that are mostly HTTP request errors, including BadClient and StatusError and Exception: Request cannot be sent.
Using Grid
I've dug into the webdriver.Firefox() code and it looks like, under the hood, it's connecting locally:
class WebDriver(RemoteWebDriver):
    def __init__(self, firefox_profile=None, firefox_binary=None, timeout=30, capabilities=None, proxy=None):
        ...
        RemoteWebDriver.__init__(self,
            command_executor=ExtensionConnection("127.0.0.1", self.profile,
                                                 self.binary, timeout))
The RemoteWebDriver instance seems to just connect to localhost on a free port that it finds, which seems like the same thing the command used by Grid does when registering a node:
java -jar selenium-server-standalone-2.44.0.jar -role node -hub http://localhost:4444/grid/register
Does Grid have any relevance for running parallel Selenium instances on the same machine? Or is it primarily a load balancer for instances running on different machines?
Is it possible to have reliable, non-flaky instances of Selenium running in parallel on the same machine? I get HTTP flakiness when I run them in parallel (lots of "Request cannot be sent", "Bad Status Error", or the browser closing before info can be read from the socket).
What's the point of Selenium Grid 2? Does it just act as a load balancer for parallel test runs on multiple machines? If I ran Grid locally with the hub and node on the same machine (all for FF), would it effectively be the same as me running multiple webdriver.Firefox() processes?
Or is there some more magic behind the scenes?
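For concreteness, here is a minimal sketch of the "multiple webdriver.Firefox() processes" approach described above, using multiprocessing; the URL and process count are illustrative placeholders, not from the original post:

from multiprocessing import Pool

from selenium import webdriver

def run_test(url):
    # Each worker process gets its own browser and its own driver.
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always release the browser, even on failure

if __name__ == "__main__":
    urls = ["http://example.com"] * 5  # 5 parallel instances, as in the question
    pool = Pool(processes=5)
    print(pool.map(run_test, urls))
    pool.close()
    pool.join()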

Related

Starting webdriver faster Selenium Python

Is there a way to start the chromedriver faster in Selenium?
It's taking about 6 seconds for it to actually start, and that's literally half the runtime of my code.
Check ChromeDriverService; it will reduce the launch time, and you will see the difference when you have multiple tests running.
When we use the ChromeDriver class directly, it starts/terminates the ChromeDriver server process for every test, which can waste a significant amount of time in large test suites where a ChromeDriver instance is created per test.
Alternatively, we can start the ChromeDriver server separately before running the tests and connect to it using Remote WebDriver.
Please refer to the official documentation at
https://sites.google.com/a/chromium.org/chromedriver/getting-started
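In Python, the pattern from that documentation looks roughly like the sketch below; the chromedriver path is a placeholder:

from selenium import webdriver
import selenium.webdriver.chrome.service as service

# Start the ChromeDriver server once, up front, instead of paying the
# launch cost inside every webdriver.Chrome() call.
svc = service.Service('/path/to/chromedriver')  # placeholder path
svc.start()

# Each test attaches to the already-running server via Remote.
driver = webdriver.Remote(svc.service_url, webdriver.DesiredCapabilities.CHROME)
driver.get('http://example.com')
driver.quit()  # ends this session but leaves the server running

svc.stop()  # shut the server down after the whole suite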

Opening more than 9 sessions with Selenium

I am trying to use Selenium to visit a website with a few dozen sessions at a time, but whenever I try to set up more than 9 sessions, it says "chromedriver.exe is not responding" and the sessions start closing themselves.
Here is my code:
from selenium import webdriver
import time

url = "website URL"
amount = 36

def generateBrowsers():
    for x in range(0, amount):
        driver = webdriver.Chrome(executable_path="C:/Users/user/Documents/chromedriver_win32/chromedriver.exe")
        driver.get(url)
        time.sleep(3)

generateBrowsers()
Does anyone know what could be wrong?
Logically, your code block has no errors.
But as you are trying to open 36 sessions at a time, you need to consider the following facts:
Each call to driver = webdriver.Chrome(executable_path="C:/Users/user/Documents/chromedriver_win32/chromedriver.exe") will initiate:
1. A new WebDriver instance
2. A new Web Browser instance
Each WebDriver instance and Web Browser instance will occupy some amount of:
1. CPU
2. Memory
3. Network
4. Cache
Now, as you execute your test suite, your system, which also runs a lot of other applications (some of them possibly launched on startup), tries to accommodate everything within the available CPU, memory, network, and cache. So whenever the usage of these resources goes beyond the threshold level, the next new chromedriver.exe or chrome.exe will be unable to spawn properly. In your case chromedriver.exe was unable to spawn. Hence you see the error:
chromedriver.exe is not responding
Solution
If you have a requirement to spawn 36 sessions at a time, you need to use:
Selenium in a Grid configuration: Selenium Grid consists of a Hub and Nodes, and you will be able to distribute the required number of sessions among a number of Nodes.
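As a rough sketch of the client side once a hub and node are running (the hub URL assumes the default port and is not from the original answer):

from selenium import webdriver

# Instead of 36 local webdriver.Chrome() calls, point each session at the
# Grid hub and let it queue and distribute sessions across the nodes.
driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub",  # assumed default hub URL
    desired_capabilities=webdriver.DesiredCapabilities.CHROME,
)
driver.get("http://example.com")
driver.quit()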

Python / Celery / Selenium continuous task (avoid reopening browser)

The biggest issue I have with Selenium is the long browser re-opening time (I'm using it to scrape every few minutes). I am also using proxies and running multiple browsers with Python's threading, all starting/stopping every few minutes (whenever a new job comes in).
Threading also means only 1 CPU is used and performance suffers.
I've been thinking about starting to use Celery (out-of-the-box multi-core support) and making workers (one proxy/browser each) run indefinitely (a while loop) with open instances of Selenium browsers, waiting to get exact URLs to scrape, fed via something like Redis.
Is it a good idea to be running continuous tasks like this with celery? Is there any better way to do it?
It's never a good idea to hold open instances of Selenium indefinitely;
best practice is to reopen with each task.
So, for your question: in my opinion it's not a good idea.
Let me offer you another architecture instead.
Use Docker to run your Selenium machines;
basically, create a Selenium Grid (first result in google link)
using Docker.
Once everything is set up correctly, the task becomes easy: with multiprocessing, send all the jobs to your Selenium hub in parallel,
and they will run simultaneously on as many containers as you need.
Once the job is done, you can destroy the containers and start fresh with the next cycle.
Using Docker will also allow you to scale your operation very easily.
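A minimal sketch of that dispatch step, assuming a Dockerized Grid hub is already listening on the default port (the hub URL, capability, and job list are illustrative):

from multiprocessing import Pool

from selenium import webdriver

HUB = "http://localhost:4444/wd/hub"  # assumed default hub address

def scrape(url):
    # Fresh Remote session per task, as recommended above: open,
    # scrape, quit, so no browser is held open indefinitely.
    driver = webdriver.Remote(
        command_executor=HUB,
        desired_capabilities=webdriver.DesiredCapabilities.FIREFOX,
    )
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()

if __name__ == "__main__":
    jobs = ["http://example.com/a", "http://example.com/b"]  # fed from Redis in practice
    pool = Pool(processes=4)
    print(pool.map(scrape, jobs))
    pool.close()
    pool.join()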

Automating Tasks through a VNC Connection within a Browser

So first off, overall what I'm trying to accomplish is for a base machine (as in a VPS) to run automated tasks through Firefox using Python.
Now the goal is to have Firefox run the given tasks in the browser itself, then connect to a VPS (through the browser) using a VNC connection and issue tasks to that VPS as well (this is the part I'm having trouble with), with as little memory required as possible for maximum efficiency.
To give an example, if you've used Digital Ocean, you can view your VPS's specific screen or terminal within the current browser.
To be clear, the VPS OS I'm using to run the base process is Linux, though the VPS that the program is connecting to (through the browser) is using a Windows OS.
My problem is that after running through all of the scripted tasks using Selenium in Python (with Firefox), once I open up the VPS in the browser, I can't figure out how to access it properly or issue jobs to be completed.
I've thought about maybe using (x,y) coordinates for mouse clicks, though I can't say this would exactly work (I tested it with iMacros, though not yet with Selenium).
So in a nutshell, I'm running base tasks in Firefox to start, then connecting to a VPS, and finally issuing more tasks to be completed from Firefox to that VPS that's using a Windows OS environment.
Any suggestions on how to make this process simpler, more efficient, or more reliable?
There is a class in Java called the Robot class which can handle almost all keyboard operations.
There is a similar thing present in Python: gtk.gdk.Display.
Refer below:
Is there a Python equivalent to Java's AWT Robot class?
Take a screenshot via a python script. [Linux]
OR
Python ctypes keybd_event simulate ctrl+alt+delete
Demo Java code:
import java.awt.Robot;
import java.awt.event.KeyEvent;

try {
    // Press Ctrl, Alt, Delete, then release all three keys.
    Robot robot = new Robot();
    robot.keyPress(KeyEvent.VK_CONTROL);
    robot.keyPress(KeyEvent.VK_ALT);
    robot.keyPress(KeyEvent.VK_DELETE);
    robot.keyRelease(KeyEvent.VK_CONTROL);
    robot.keyRelease(KeyEvent.VK_ALT);
    robot.keyRelease(KeyEvent.VK_DELETE);
} catch (Exception ex) {
    System.out.println(ex.getMessage());
}
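Since the question is in Python, a rough equivalent of that demo using the third-party pyautogui library (an assumption here; the links above suggest gtk.gdk.Display or ctypes instead) would be:

import pyautogui  # third-party: pip install pyautogui

# hotkey() presses the keys in order and releases them in reverse,
# simulating the Ctrl+Alt+Delete chord from the Java demo.
pyautogui.hotkey('ctrl', 'alt', 'delete')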
Hope it will help you :)

python - number of pyro connections

I'm using Python and writing something that connects to a remote object using Pyro4.
When running some unit tests (using pyunit) that repeatedly connect to a remote object with Pyro, I found I couldn't run more than 9 tests or they would get stuck and just hang there.
I've now managed to fix this by using
with Pyro4.Proxy("PYRONAME:name") as pyroObject:
    # do something with the object...
whereas before I was creating the object in the test setUp:
def setUp(self):
    self.pyroObject = Pyro4.Proxy("PYRONAME:name")
and then using self.pyroObject within the tests
Does anyone know why this has fixed the issue? Thanks
When you don't clean up the proxy objects, they keep a connection alive to the Pyro daemon. By default the daemon accepts 16 concurrent connections.
If you use the with ... as ... syntax, you close the proxy cleanly after you're done using it, and this releases a connection in the daemon, making it available for a new proxy.
You can raise that limit of 16 by increasing Pyro's threadpool size via the config. Alternatively, you could use the multiplex server type instead of the default threaded one, as sketched below.
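Both server-side options in a minimal sketch (the size value is illustrative; both settings must be applied before the daemon is created):

import Pyro4

# Option 1: allow more concurrent proxy connections with the default
# threaded server type.
Pyro4.config.THREADPOOL_SIZE = 64  # illustrative value

# Option 2 (alternative): switch to the multiplex server type, which
# serves many connections from a single thread instead of one thread
# per proxy.
Pyro4.config.SERVERTYPE = "multiplex"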
