I tried to use Selenium to scrape pages from a list of addresses, but sometimes the execution just stops. It seems the execution doesn't always get past driver.close(), and it happens completely at random. Below is the code I use to scrape multiple pages.
I would appreciate it if anyone could suggest a way to ensure that the driver closes after scraping the data.
from selenium import webdriver
addresses = ['address1', 'address2',...]
results = []
for address in addresses:
    driver = get_chromedriver()  # returns webdriver instance
    driver.get(f"https://www.example.com/{address}")
    values = scrape_some_data()
    driver.close()
    driver.quit()
    results.append(values)
# do something with the list of values
A few things I have noticed which might, or might not, help solve your issue:
Unless you really need to, it might be better to call driver = get_chromedriver() outside the loop and run driver.quit() after the loop completes. That will speed up your execution significantly, since the browser will not need to re-open for every address. However, if you need a fresh browser instance for each visit to the same website, you may have to stick with your current approach.
driver.quit() should be sufficient for your use; there is no need to call driver.close() as well here.
If you definitely want multiple instances, it might be better to use threading, as sketched below. I've heard of a few cases where issues occur when a loop destroys and recreates the driver over and over.
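For instance, a minimal sketch of the threaded approach, reusing the question's hypothetical get_chromedriver() and scrape_some_data() helpers:
from concurrent.futures import ThreadPoolExecutor

def scrape(address):
    driver = get_chromedriver()  # assumed to return a webdriver instance
    try:
        driver.get(f"https://www.example.com/{address}")
        return scrape_some_data()  # assumed helper from the question
    finally:
        driver.quit()  # always release the browser, even if scraping raises

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(scrape, addresses))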
Try changing your code as below.
Declare the webdriver instance once and use driver.get to open each URL in the same browser.
Also, I suggest appending all the values before you quit the webdriver.
from selenium import webdriver
driver = get_chromedriver() # returns webdriver instance
addresses = ['address1', 'address2',...]
results = []
for address in addresses:
    driver.get(f"https://www.example.com/{address}")
    values = scrape_some_data()
    results.append(values)
driver.quit()
# do something with the list of values
Difference between driver.close() and driver.quit():
The close() method closes the current window.
The quit() method quits the driver and closes every associated window.
So, if you want to close just one window, use close(); to close all windows and end the session, use quit().
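For example, a small sketch of the difference with two windows open (example.com is just a stand-in URL):
main = driver.current_window_handle
driver.execute_script("window.open('https://www.example.com')")  # open a second window
driver.switch_to.window(driver.window_handles[-1])
driver.close()   # closes only this second window; the session keeps running
driver.switch_to.window(main)
driver.quit()    # ends the session and closes every remaining window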
One more suggestion: add explicit waits so that all your data has loaded before the webdriver is closed.
To use explicit waits, import:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
And use them like:
wait = WebDriverWait(driver, 20)
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "css_selector")))  # for a list of elements
Take this as an example: How to find and compare text with the style property using Selenium/Python?
If none of the above suggestions work, try closing the webdriver in a finally block, as sketched below.
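A minimal sketch of that, again using the question's get_chromedriver() and scrape_some_data() helpers:
driver = get_chromedriver()
try:
    for address in addresses:
        driver.get(f"https://www.example.com/{address}")
        results.append(scrape_some_data())
finally:
    driver.quit()  # runs even if scraping raises an exception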
I am trying to automate a process to download data from a website. The code works if I run it step by step, but if I run it all at once it fails, giving the error:
ElementNotInteractableException: Message: element not interactable
I have worked around this using time.sleep(x amount of time), but it still seems to fail intermittently. I am having trouble implementing implicit waits. Any help would be appreciated. Code below.
import time
from selenium import webdriver
browser = webdriver.Chrome(executable_path=r'path\to\chromedriver.exe')
browser.get("https://map.sarig.sa.gov.au/")
browser.maximize_window()
browser.switch_to.frame(browser.find_element_by_id('MapViewer'))
browser.find_element_by_xpath('//*[#id="TourWidget"]/div[1]/span').click()
browser.find_element_by_xpath('//*[#id="menuAllMapLayers"]/div[2]/p').click()
browser.find_element_by_xpath('//*[#id="238"]/li[1]/div/div/span[1]').click()
time.sleep(3)
browser.find_element_by_xpath('//*[#id="238"]/li[1]/div/div/span[1]').click()
browser.find_element_by_xpath('//*[#id="238"]/li[3]/div/div/label/span').click()
browser.find_element_by_xpath('//*[#id="239"]/li[1]/div/div/span[1]').click()
browser.find_element_by_xpath('//*[#id="239"]/li[3]/div/div/label/span').click()
browser.find_element_by_xpath('//*[#id="menuActiveLayers"]').click()
browser.find_element_by_xpath('//*[#id="groupOptions238"]/span').click()
time.sleep(3)
browser.find_element_by_xpath('//*[#id="238"]/li[2]/div/div[3]/div[2]/span').click()
browser.find_element_by_xpath('//*[#id="groupOptions239"]/span').click()
time.sleep(3)
browser.find_element_by_xpath('//*[#id="239"]/li[2]/div/div[3]/div[2]/span').click()
Use ActionChains to get access to pause(3) instead of using sleep(3). It could also help to use waits, checking whether your elements are actually "visible" rather than merely "present" (see expected_conditions).
It's a lot of dropdowns, so maybe they are not visible all the time, but you can run these checks after doing a move_to_element() so the element is actually visible. A sketch follows below.
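A minimal sketch of that idea, reusing one of the question's locators purely as an example:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 10)
# wait until the element is actually visible, not merely present in the DOM
layer = wait.until(EC.visibility_of_element_located(
    (By.XPATH, '//*[@id="238"]/li[1]/div/div/span[1]')))
# move_to_element() hovers over the element; pause(3) replaces time.sleep(3)
ActionChains(browser).move_to_element(layer).pause(3).click(layer).perform()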
I am creating a program in Python that drives a web browser. When the internet is slow, the program gets an error (the xpath is not found) and stops. I am also using the sleep function.
How can I create a while loop around the xpath lookup?
Or please explain any other method.
I have done this...
browser.find_element_by_xpath('//*[#id="react-root"]/section/nav/div[2]/div/div/div[2]/div[1]/div/span').click()
time.sleep(3)
What I want is this: when the internet gets slow, the program should wait for the xpath and then click.
Try using Explicit Wait
See this link : https://www.selenium.dev/documentation/en/webdriver/waits/#explicit-wait
You can and should use webdriver wait.
It was introduced exactly for this purpose.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 20)
wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="react-root"]/section/nav/div[2]/div/div/div[2]/div[1]/div/span'))).click()
You also should use better locators.
//*[@id="react-root"]/section/nav/div[2]/div/div/div[2]/div[1]/div/span
doesn't look good at all: a long positional path like this breaks as soon as the page structure changes.
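As an illustration only (the attribute here is hypothetical, since the right choice depends on the actual page markup), prefer a short locator keyed to a stable attribute:
# hypothetical: anchor on a stable attribute instead of a long positional path
wait.until(EC.visibility_of_element_located(
    (By.CSS_SELECTOR, "nav [aria-label='Home']"))).click()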
I found this Python script on GitHub that sends automated WhatsApp Web messages through Selenium.
#https://www.github.com/iamnosa
#Let's import the Selenium package
from selenium import webdriver
#Let's use Firefox as our browser
web = webdriver.Firefox()
web.get('http://web.whatsapp.com')
input()
#Replace Mr Kelvin with the name of your friend to spam
elem = web.find_element_by_xpath('//span[contains(text(),"Mr Kelvin")]')
elem.click()
elem1 = web.find_elements_by_class_name('input')
while True:
    elem1[1].send_keys('hahahahahahaha')
    web.find_element_by_class_name('send-container').click()
Even though it was meant for spamming, I was trying to adapt it for a good purpose, but the script as it stands doesn't seem to work. Instead of sending a message through WhatsApp Web, it simply loads a QR authentication screen and then does nothing after I authenticate with my cellphone.
Any clue as to why this is happening? I'm running the latest version of Selenium WebDriver on Firefox, and geckodriver has already been extracted to /usr/bin/.
I realise this post is older, but it still seems to be viewed frequently.
The keystroke explanation from @vhad01 makes sense but did not work for me.
A simple dirty workaround that worked for me:
Replace input() with
import time
time.sleep(25)
where 25 is the number of seconds to wait before the code continues executing (15 should also be sufficient to scan the QR code).
The way I handle the QR code scan is by detecting whether the chat search bar is present on the page.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chatlist_search = ".jN-F5.copyable-text.selectable-text"
web.get("https://web.whatsapp.com")
WebDriverWait(web, 60).until(EC.visibility_of_element_located((By.CSS_SELECTOR, chatlist_search)))
This will wait until the chat search bar is rendered on the page, or time out after 60 seconds.
This line:
input()
waits for you to press Enter before continuing.
Simply press Enter after scanning.
I was writing a Selenium script to schedule my messages and I came across your question. Yes, the problem is that input() line.
Instead of using input(), you could use time.sleep(); no doubt it will work, but a better approach is implicitly_wait(15).
time.sleep() makes you wait even after scanning: the script stops completely for the given number of seconds.
With implicitly_wait(), if the element appears before the specified time, the script starts executing immediately; otherwise it throws a NoSuchElementException. A short sketch follows.
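A minimal sketch of the implicit-wait approach, reusing the script's web object and example contact:
web.implicitly_wait(15)  # every find_element call now polls for up to 15 seconds
elem = web.find_element_by_xpath('//span[contains(text(),"Mr Kelvin")]')
elem.click()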
I used a rather different method for whatsapp_login() and QR scanning. To see it, check my repo: https://github.com/shauryauppal/PyWhatsapp
You might like this approach too.
A better way is to scan the QR code, hit return in the command line, and then proceed further with your code.
browser = webdriver.Firefox()
browser.get('https://web.whatsapp.com/')
print('Please Scan the QR Code and press enter')
input()
This is all you need, and the logic is straightforward to apply to this problem.
My script does not move on to execute the next line after clicking the link below.
monkey_click_by_id(driver, "fwupdatelink")
Is there any way to make it reliably continue after clicking it?
The issue could be that the webdriver moves on too quickly, before your script gets a chance to get the information it needs. My suggestion is to make it wait until the element is visible, which matches an actual user seeing the link show up in their browser, just to be safe.
Try adding this code before your script. It waits up to 10 seconds for the element with ID "fwupdatelink" to become visible, continuing as soon as it does.
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.ID,"fwupdatelink")))
You could also wrap the wait in a try/except, in case the driver raises a timeout error; a sketch follows below.
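A minimal sketch of that, assuming the same driver and locator as above:
from selenium.common.exceptions import TimeoutException

try:
    WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.ID, "fwupdatelink")))
except TimeoutException:
    print('"fwupdatelink" never became visible; handle or re-raise here')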
I'm using PhantomJS to collect data about an HTML page. My code is something like this:
from selenium import webdriver
class PageElements():
    def __init__(self, url):
        self.driver = webdriver.PhantomJS()
        self.driver.get(url)
        self.elements, self.attribute_types = self._load_elements(self.driver)

    def _load_elements(self, driver):
        """This is not relevant"""
So, I execute the code in an IPython Notebook from time to time, to test things out. After a while, my Activity Monitor shows a pile of leftover phantomjs processes (the screenshots originally here showed them stacking up).
The processes keep running even after I add a destructor like:
def __del__(self):
    self.driver.close()
What is happening? I would really appreciate a "why this is happening" answer rather than a "do this" one. Why isn't my destructor working?
I opened @forivall's links and saw the Selenium code. The PhantomJS webdriver has its own destructor (making mine redundant), so why isn't it working in this case?
__del__() tends to be unreliable in Python. Not only do you not know when it will be called, you have no guarantee it will ever be called at all. try/finally constructs, or (even better) with blocks (a.k.a. context managers), are much more reliable.
That said, I had a similar issue even when using context managers: phantomjs processes were left running all over the place. I was invoking phantomjs through Selenium as follows:
from selenium import webdriver
from contextlib import closing
with closing(webdriver.PhantomJS()) as driver:
    do_stuff(driver)
contextlib's closing() function ensures that the close() method of its argument gets called whatever happens, but as it turns out, driver.close(), while available, is the wrong method for cleaning up a webdriver session. driver.quit() is the proper way to clean up. So instead of the above, do one of the following:
from selenium import webdriver
from contextlib import contextmanager
@contextmanager
def quitting(quitter):
    try:
        yield quitter
    finally:
        quitter.quit()

with quitting(webdriver.PhantomJS()) as driver:
    do_stuff(driver)
or
from selenium import webdriver
driver = webdriver.PhantomJS()
try:
    do_stuff(driver)
finally:
    driver.quit()
(The above two snippets are equivalent)
Credit goes to @Richard's comment on the original question for pointing me toward .quit().
As of July 2016, following the discussion on this GitHub issue, the best solution is to run the following instead of driver.close():
import signal
driver.service.process.send_signal(signal.SIGTERM)
driver.quit()
Just running driver.quit() will kill the node process but not the phantomjs child process that it spawned.
self.driver = webdriver.PhantomJS()
This creates a web browser that Selenium then uses to run the tests. Each time Selenium runs, it opens a new instance of the web browser rather than checking whether there is a previous one it could reuse. If you do not call .close() (or better, .quit()) at the end of the test, the browser will continue to run in the background.
As you have seen, running the test multiple times leaves multiple browsers orphaned.
What's the difference between this case and objects that Python usually destroys automatically with its garbage collector?
The difference is that this creates something outside of Python's domain: a new OS-level process. Perhaps webdriver.PhantomJS should have its own __del__ that shuts that process down, and perhaps the behaviour should be more robust, but that's not the design decision the Selenium developers went with, probably because most of the other drivers are not headless (so it's obvious that the windows are open).
Unfortunately, neither the official nor the unofficial Selenium documentation has much clarification or best-practice advice on this. (See the comments below on __del__ behaviour.)
links to source:
phantomjs/webdriver.py
remote/webdriver.py (superclass)
I was also struggling with the same problem, and I solved it using this source link.
I replaced self.process.kill() in selenium/webdriver/phantomjs/service.py with self.process.send_signal(signal.SIGTERM).
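Sketched as a before/after of that edit (the exact placement inside service.py may vary by Selenium version):
# selenium/webdriver/phantomjs/service.py
import signal

# before:
#     self.process.kill()
# after:
self.process.send_signal(signal.SIGTERM)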
After that change, driver.quit() kills every phantomjs process when the program completes or when it is cancelled with Ctrl+C.