Selenium Grid connection errors and timeout exceptions

My team and I have been having some trouble with our testing infrastructure. We have test suites that run against different browsers on different platforms using different provisioned settings, all of which works just fine. Our only issue is that, from time to time, we run into strange connection failures and session errors.
The connection failure comes from urllib2, which the official Selenium bindings use for communication:
urllib2.URLError: <urlopen error [Errno 110] Connection timed out>
It occurs across all browsers on both our Windows and Mac nodes.
We also see a session-related error:
selenium.common.exceptions.WebDriverException: Message: Session [insert-session-id-here] was terminated due to TIMEOUT
For context, our Windows node has a maximum of 5 sessions, with 5 max instances for Chrome and Firefox. Our tests also run in parallel.
The connection error only occurs when our hub and nodes have been running for an extended period of time. I've found that shutting down the hub using the lifecycle manager and starting it back up seems to do the trick, until the error appears again.
The session error occurs unexpectedly across different tests on different browsers. From what I've read, it may be due to parallelization, but I have no idea what the root cause is.
Hub Configuration
https://pastebin.com/17VQHbrA
Node Configuration
https://pastebin.com/XDit2yT1
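
One way to keep an occasional hub connection timeout from failing a whole test is to retry session creation with a growing delay. The following is only a minimal sketch under assumptions, not a fix for the root cause: it is written for Python 3 (where urllib.error.URLError is a subclass of OSError), and retry_with_backoff and create_session are hypothetical names. In real use, create_session would wrap whatever call opens the remote session.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=1.0,
                       exceptions=(OSError,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries and
    re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    calls = {"count": 0}

    def create_session():
        # Hypothetical stand-in for opening a Grid session:
        # fails twice with a timeout, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise OSError(110, "Connection timed out")
        return "session"

    print(retry_with_backoff(create_session, attempts=5, base_delay=0.01))
```

Retrying only hides the symptom. If the timeouts appear only after the hub has been up for a long time, the retry count and delays seen in the logs may still help narrow down when the hub stops responding.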

Related

Random WinError 10060 when trying to connect to websocket

Background - using a Windows machine (Windows 10/11), connected to the internet through an ethernet cable. Using Python's websocket-client module in PyCharm.
Problem - one fine day, out of the blue, I am no longer able to connect to FTX's websocket wss://ftx.com/ws/ on my local windows machine and kept getting the error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
I tried on my VPS, a linux machine, with the same exact code and it works fine.
Attempts to fix the problem were:
A full reformat of computer, windows 10 > windows 10, and reinstall python 3.10 and pycharm.
Then upgrading windows 10 > windows 11.
Made sure all drivers are up to date, and also reinstalling them.
Resetting network settings from Windows
Tested the websocket connection with another endpoint and it worked fine, e.g. wss://api.gemini.com/v1/marketdata/BTCUSD
None of the above worked.
My websocket connection is in a while loop, something like this:
    while True:
        try:
            ws.run_forever()
            time.sleep(0.2)
        except:
            pass
So it would keep retrying.
There was a weird behavior when, from windows, I disable, and re-enable the network adapter, the websocket connection would suddenly be successful. And if I were to disable and re-enable the network adapter again; sometimes it would continue to fail, or sometimes the connection would be successful again. Note this is while the code is still executing in the while loop.
And, if I were to stop and re-run the code, it would never work, and the error WinError 10060 would arise.
At this point I am stumped and have no clue how to solve this problem, as it was totally random and out of the blue. Looking for help and advice, please!
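
Since the same code works from the VPS, it may help to separate plain TCP reachability from anything specific to the websocket library. Below is a minimal diagnostic sketch, assuming Python 3. The helper name probe_tcp is hypothetical, and it only attempts a raw TCP connect to each address the host name resolves to.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try a plain TCP connect to every address the host name resolves to.

    Returns a list of (address, error) tuples, where error is None on
    success or the exception message on failure.
    """
    results = []
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            results.append((sockaddr[0], None))
        except OSError as exc:
            # Timeouts, refusals and unreachable networks all land here.
            results.append((sockaddr[0], str(exc)))
        finally:
            sock.close()
    return results
```

Calling probe_tcp("ftx.com", 443) on both the Windows machine and the VPS (443 is assumed here because it is the default port for wss://) and comparing the results would show whether the failure happens before the websocket handshake even starts, and whether only some of the resolved addresses are affected.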

OSError: [WinError 1450] Insufficient system resources exist to complete the requested service - when running PyWinTrace (python-based ETW solution)

I'm trying to run a Python script based on Pywintrace by FireEye which is a python-based Event Tracing for Windows (ETW) solution. It allows you to specify ETW providers and keywords in order to get real-time output of Windows events printed to your screen.
Initially I was able to run the script ~5 times without a problem, but since then I keep getting the following error:
OSError: [WinError 1450] Insufficient system resources exist to complete the requested service.
I don't know why it stopped working because nothing obvious changed about my environment. Also, I've tried to run the script on two separate machines (Windows 10) as well as on a Windows 7 VM, but I always get the same error.
I found a potential solution for the 1450 error on two separate questions on here, followed the steps for all machines, however, the problem persisted. Those steps included changing the registry entries "PoolUsageMax" and "PagedPoolSize" under memory management.
(The steps followed are from the answers on the following two questions:
OSError: [WinError 1450] Insufficient system resources exist to complete the requested service using Selenium in Python through Anaconda
System error 1450 has occurred. Insufficient system resources exist to complete the requested service)
Any idea how else I can try and fix this error and run the code?
Thank you in advance.

Suddenly: WinError 10054 Python Socket.py dependency

I know questions about WinError 10054 have been asked before, but solutions were not applicable in my case.
I have a dozen scripts running on a daily basis, but as of yesterday a few of them crashed with the following error:
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
The scripts are both running on a local computer and a server (acceptance/production) and the strange thing is that on both machines the scripts now crash. In one case the script crashes on ftplib at the moment I try to retrieve a file from the ftp server by ftp.retrbinary(). Another script crashes when I try to close a webdriver, used for controlling the Chrome browser. Both are dependent on the socket.py library and the error traces back to: return self._sock.recv_into(b). Has there been some update recently that could cause this error?

Aborted connection error in flask

If the client closes an established connection that it has made with the flask server, I get the following error in the terminal:
[Errno 10053] An established connection was aborted by the software in your host machine
It seems that when Flask tries to write to the closed stream, it hits an error and complains.
It looked like just a warning, since the application does not quit after printing the error, but the problem is that my server stops serving other requests despite still being alive on the system.
I have read similar questions but they did not help. How can I prevent this issue? I use both Windows and Linux operating systems.

The cause of this problem is, as we previously discussed, insufficient permissions to perform the network operation. The remedy to the problem was to run the process as an administrator and/or to modify the system policy to allow connections on the restricted port.

Recovering Celery From a Database Outage

I have Celeryd/RabbitMQ running on a Fedora box, communicating with a MySQL database on a separate box. I've noticed that, on rare occasions, if there's even the slightest problem connecting to the MySQL database (even for a few seconds), celeryd will crash with the error:
OperationalError: (2003, "Can't connect to MySQL server on 'mydatabasedomain' (111)")
and fail to reconnect even when the database becomes available again.
Currently, I'm forced to manually restart the celeryd service to get celery running again. Is there a more graceful and automatic way to recover from these types of event? Is there any feature of celeryd to just quietly wait, logging the OperationalError, and reconnect instead of exiting out entirely?

I don't know of any way to fix this by simply using a config flag, but you could consider running your worker under supervisor (see http://supervisord.org).
This is even mentioned in the celery docs (http://celery.readthedocs.org/en/latest/tutorials/daemonizing.html#supervisord) including a link to some example config files.
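
To illustrate what a process supervisor does for the worker, here is a minimal sketch of a restart loop in Python. It is not a replacement for supervisord: the function name supervise is hypothetical, and the command passed to it is whatever you already use to start the worker.

```python
import subprocess
import time


def supervise(command, max_restarts=5, delay=5.0, sleep=time.sleep):
    """Run command, restarting it after each non-zero exit.

    Stops when the command exits with status 0 or after max_restarts
    restarts. Returns the list of exit codes observed, in order.
    """
    exit_codes = []
    while True:
        code = subprocess.call(command)  # blocks until the process exits
        exit_codes.append(code)
        if code == 0 or len(exit_codes) > max_restarts:
            return exit_codes
        # Pause so a briefly unavailable database has time to come back.
        sleep(delay)
```

A real supervisor adds logging, start at boot, and signal handling on top of this loop, which is why using supervisor itself is usually the better choice.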
