web scraping contact list selenium python - python

How can I loop through the contacts in a Discord group using Selenium in Python?
I tried the code below and got this error:
selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
The problem is that the scroller and the contacts are constantly updating...
This is the code I tried:
while True:
    num = 0
    try:
        users_list = driver.find_elements_by_css_selector("div.memberOnline-1CIh-0.member-3W1lQa")
        for user in users_list:
            num += 1
            user.click()
            driver.execute_script("arguments[0].scrollIntoView();", user)
            print('User number {}'.format(num))
    except StaleElementReferenceException and ElementClickInterceptedException:
        print('bad')
        driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", users_list)

From the code you posted, you only scroll the element, so the likely reason for the Stale exception is that you don't wait for the page to finish loading, or at least for the contacts to finish loading.
For debugging, you can simply add a long sleep before the loop, like sleep(15), and replace it with an explicit wait in production code, like:
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "myDynamicElement"))
)
Details of explicit waits are in the Selenium documentation.
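Applied to the member list from the question, a minimal sketch might look like the following (the CSS class names are taken from the question and are an assumption here, since Discord regenerates them between builds):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait until at least one member element is present before iterating.
users_list = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, "div.memberOnline-1CIh-0.member-3W1lQa")
    )
)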
Because you call click() inside the loop, you need to find the elements again inside the loop:
while True:
    num = 0
    try:
        time.sleep(15)
        users_list = driver.find_elements_by_css_selector(
            "div.memberOnline-1CIh-0.member-3W1lQa")
        length = len(users_list)
        for num in range(0, length):
            user = users_list[num]
            user.click()
            time.sleep(15)
            driver.execute_script("arguments[0].scrollIntoView();", user)
            print('User number {}'.format(num + 1))
            # because the `click` above changes the page,
            # selenium will treat it as a new page;
            # element references found on the `old` page cannot work on the `new` page,
            # so you need to find the elements that belonged to the `old` page again.
            # find users_list again from the `new` page
            users_list = driver.find_elements_by_css_selector(
                "div.memberOnline-1CIh-0.member-3W1lQa")
    except (StaleElementReferenceException, ElementClickInterceptedException):
        print('bad')
        driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight",
                              users_list)

Related

How to break a loop if a certain element is disabled and get text from multiple pages in Selenium Python

I am new to Python and Selenium. I have written code to extract data from multiple pages, but there is a problem in it.
I am not able to break out of the while loop that clicks the next page while one is available. The next-page element becomes disabled after reaching the last page, but the code still runs.
xpath: '//button[@aria-label="Next page"]'
Full SPAN: class="awsui_icon_h11ix_31bp4_98 awsui_size-normal-mapped-height_h11ix_31bp4_151 awsui_size-normal_h11ix_31bp4_147 awsui_variant-normal_h11ix_31bp4_219"
I am able to get the list of data which I want to extract from the webpage, but I only get the last page's data, and only when I close the webpage from my end, ending the while loop.
Full Code:
opts = webdriver.ChromeOptions()
opts.headless = True
driver = webdriver.Chrome(ChromeDriverManager().install())
base_url = "XYZ"
driver.maximize_window()
driver.get(base_url)
driver.set_page_load_timeout(50)
element = WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.ID, 'all-my-groups')))
driver.find_element(by=By.XPATH, value='//*[@id="sim-issueListContent"]/div[1]/div/div/div[2]/div[1]/span/div/input').send_keys('No Stock')
dfs = []
page_counter = 0
while True:
    wait = WebDriverWait(driver, 30)
    wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")))
    cards = driver.find_elements_by_xpath("//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")
    sims = []
    for card in cards:
        sims.append([card.text])
    df = pd.DataFrame(sims)
    dfs.append(df)
    print(page_counter)
    page_counter += 1
    try:
        wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@aria-label="Next page"]'))).click()
    except:
        break
driver.close()
driver.quit()
I am also attaching an image of the class; sorry, I cannot share the URL as it is on a private domain.
The easiest option is to let your wait.until() fail via timeout when the "Next page" button is missing. Right now your line wait = WebDriverWait(driver, 30) is setting the timeout to 30 seconds; assuming the page normally loads much faster than that, you could change the timeout to be 5 seconds and then the loop will end faster once you're at the last page. If your page load times are sometimes slow then you should make sure the timeout won't accidentally cut off too early; if the load times are consistently fast then you might be able to get away with an even shorter timeout interval.
Alternatively, you could look through the specific target webpage more carefully to find some element that a) is always present and b) can be used to determine whether we're on the final page or not. Then you could read the value of that element and decide whether to break the loop before trying to find the "Next page" button. This could save a couple of seconds of waiting on the final loop iteration (avoid waiting for timeout) but may not be worth the trouble.
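A sketch of the first (shorter-timeout) idea, assuming the card data itself reliably loads within the longer wait: keep the 30-second wait for the data and use a separate, shorter wait only for the pagination click.

from selenium.common.exceptions import TimeoutException

next_wait = WebDriverWait(driver, 5)  # short timeout just for the "Next page" button

while True:
    # ... scrape the cards on the current page as before ...
    try:
        next_wait.until(EC.element_to_be_clickable(
            (By.XPATH, '//button[@aria-label="Next page"]'))).click()
    except TimeoutException:
        # No clickable "Next page" button within 5 seconds: assume last page.
        break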
Change the condition below:
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@aria-label="Next page"]'))).click()
except:
    break
as shown below; the [@disabled] check is the difference that will make sure we exit the while loop if the button is disabled.
if len(driver.find_elements_by_xpath('//button[@aria-label="Next page"][@disabled]')) > 0:
    break
else:
    driver.find_element_by_xpath('//button[@aria-label="Next page"]').click()
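Slotted into the question's loop, this would replace the existing try/except. A sketch (the @disabled attribute is an assumption about how this particular button reports its state; some component libraries use aria-disabled instead):

while True:
    # ... collect the cards on the current page as before ...
    if driver.find_elements_by_xpath('//button[@aria-label="Next page"][@disabled]'):
        break  # button is present but disabled: last page reached
    driver.find_element_by_xpath('//button[@aria-label="Next page"]').click()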

ElementClickInterceptedException: element click intercepted. Other element would receive the click: Selenium Python

I have seen other questions about this error, but in my case the other element is the one that should receive the click. In detail: the webdriver scrolls through a Google search and must click every website it finds, but the program prevents that. How can I make it NOT visit a site it has already clicked?
This is the function. The program loops over it; after the first iteration it scrolls down and the error occurs:
def get_info():
    browser.switch_to.window(browser.window_handles[2])
    description = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "h3"))
    ).text
    site = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "cite"))
    )
    site.click()
    url = browser.current_url
    # removes the https:// and the / of the url
    # to get just the domain of the website
    try:
        link = url.split("https://")
        link1 = link[1].split("/")
        link2 = link1[0]
        link3 = link2.split("www.")
        real_link = link3[1]
    except IndexError:
        link = url.split("https://")
        link1 = link[1].split("/")
        real_link = link1[0]
    time.sleep(3)
    screenshot = browser.save_screenshot("photos/" + "(" + real_link + ")" + ".png")
    global content
    content = []
    content.append(real_link)
    content.append(description)
    print(content)
    browser.back()
    time.sleep(5)
    browser.execute_script("window.scrollBy(0,400)", "")
    time.sleep(5)
You can create a list of clicked websites and check every time whether that link has already been clicked. Here's the demo code:
clicked_website = []
url = browser.current_url
clicked_website.append(url)
# Now while clicking
if <new_url> not in clicked_website:
    <>.click()
This is just an idea of how to implement it. Your code is messy and I didn't understand it clearly, so adapt it into your own code yourself.
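A minimal sketch of how this could be wired into get_info(), following the answer's idea of checking browser.current_url after the click (the list would need to live outside the function, e.g. as a module-level list as shown):

clicked_website = []

def get_info():
    browser.switch_to.window(browser.window_handles[2])
    site = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "cite"))
    )
    site.click()
    url = browser.current_url
    if url in clicked_website:
        browser.back()               # already visited this site, skip it
        return
    clicked_website.append(url)
    # ... screenshot, data collection and browser.back() as in the original function ...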

Clicking through multiple links using selenium

I'm trying to break a bigger problem I have into smaller chunks.
Main question
I am currently inputting a boxer's name into an autocomplete box, selecting the first option that comes up (the boxer's name), then clicking "view more" until I get a list of all the boxer's fights and the "view more" button stops appearing.
I am then trying to create a list of the onclick hrefs I would like to click, then iteratively clicking each one and getting the HTML from each page/fight. Ideally I want to extract the text in particular.
This is the code I have written:
page_link = 'http://beta.compuboxdata.com/fighter'
chromedriver = 'C:\\Users\\User\\Downloads\\chromedriver'
cdriver = webdriver.Chrome(chromedriver)
cdriver.maximize_window()
cdriver.get(page_link)
wait = WebDriverWait(cdriver, 20)
wait.until(EC.visibility_of_element_located((By.ID, 's2id_autogen1'))).send_keys('Deontay Wilder')
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'select2-result-label'))).click()
while True:
    try:
        element = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'view_more'))).click()
    except TimeoutException:
        break
# fighters = cdriver.find_elements_by_xpath("//div[@class='row row-bottom-margin-5']/div[2]")
links = [x.get_attribute('onclick') for x in wait.until(EC.visibility_of_element_located((By.XPATH, "//*[contains(@onclick, 'get_fight_report')]/a")))]
htmls = []
for link in links:
    cdriver.get(link)
    htmls.append(cdriver.page_source)
Running this however gives me the error message:
ElementClickInterceptedException Traceback (most recent call last)
<ipython-input-229-1ee2547c0362> in <module>
10 while True:
11 try:
---> 12 element = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'view_more'))).click()
13 except TimeoutException:
14 break
ElementClickInterceptedException: Message: element click intercepted: Element <a class="view_more" href="javascript:void(0);" onclick="_search('0')"></a> is not clickable at point (274, 774). Other element would receive the click: <div class="col-xs-12 col-sm-12 col-md-12 col-lg-12">...</div>
(Session info: chrome=78.0.3904.108)
UPDATE
I have looked at a few answers with similar error messages and tried this:
while True:
    try:
        element = cdriver.find_element_by_class_name('view_more')
        webdriver.ActionChains(cdriver).move_to_element(element).click(element).perform()
        # element = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'view_more'))).click()
    except TimeoutException:
        break
links = [x.get_attribute('onclick') for x in wait.until(EC.visibility_of_element_located((By.XPATH, "//*[contains(@onclick, 'get_fight_report')]/a")))]
htmls = []
for link in links:
    cdriver.get(link)
    htmls.append(cdriver.page_source)
but this seems to create some sort of infinite loop at the ActionChains point; it seems to be constantly waiting for the view more href to appear.
The click function should already scroll the window so the element is in view, so you don't need that action chain (I think...), but the original error shows some other element OVER the view more button.
You may need to remove (or hide) this element from the DOM, or, if it's an HTML dialog, "dismiss" it. Pinpointing this covering element is key, and then deciding on a strategy to uncover the view more button.
Your site http://beta.compuboxdata.com/fighter doesn't seem to be working at the moment, so I can't dig in further.
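If the covering element turns out to be something like a fixed header, footer, or cookie banner, two common workarounds (a sketch, not verified against this particular site) are centring the button in the viewport before clicking, or dispatching the click in JavaScript so the overlay is bypassed:

view_more = cdriver.find_element_by_class_name('view_more')

# Option 1: centre the element so a fixed header/footer no longer covers it.
cdriver.execute_script("arguments[0].scrollIntoView({block: 'center'});", view_more)
view_more.click()

# Option 2: click via JavaScript, which ignores whatever is drawn on top.
cdriver.execute_script("arguments[0].click();", view_more)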

Unable to fetch all the necessary links during Iteration - Selenium Python

I am a newbie to Selenium with Python. I am trying to fetch the profile URLs, which come 10 per page. Without using while, I am able to fetch all 10 URLs, but only for the first page. When I use while, it iterates, but fetches only 3 or 4 URLs per page.
I need to fetch all 10 links and keep iterating through the pages. I think I must do something about StaleElementReferenceException.
Kindly help me solve this problem.
The code is given below.
def test_connect_fetch_profiles(self):
    driver = self.driver
    search_data = driver.find_element_by_id("main-search-box")
    search_data.clear()
    search_data.send_keys("Selenium Python")
    search_submit = driver.find_element_by_name("search")
    search_submit.click()
    noprofile = driver.find_elements_by_xpath("//*[text() = 'Sorry, no results containing all your search terms were found.']")
    self.assertFalse(noprofile)
    while True:
        wait = WebDriverWait(driver, 150)
        try:
            profile_links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[contains(@href,'www.linkedin.com/profile/view?id=')][text()='LinkedIn Member' or contains(@href,'Type=NAME_SEARCH')][contains(@class,'main-headline')]")))
            for each_link in profile_links:
                page_links = each_link.get_attribute('href')
                print(page_links)
                driver.implicitly_wait(15)
                appendFile = open("C:\\Users\\jayaramb\\Documents\\profile-links.csv", 'a')
                appendFile.write(page_links + "\n")
                appendFile.close()
                driver.implicitly_wait(15)
            next = wait.until(EC.visibility_of(driver.find_element_by_partial_link_text("Next")))
            if next.is_displayed():
                next.click()
            else:
                print("End of Page")
                break
        except ValueError:
            print("It seems no values to fetch")
        except NoSuchElementException:
            print("No Elements to Fetch")
        except StaleElementReferenceException:
            print("No Change in Element Location")
        else:
            break
Please let me know if there are any other effective ways to fetch the required profile URL and keep iterating through pages.
I created a similar setup, which works all right for me. I've had some problems with Selenium trying to click on the next-button and throwing a WebDriverException instead, likely because the next-button was not in view. Hence, instead of clicking the next-button, I get its href attribute and load the new page with driver.get(), avoiding an actual click and making the test more stable.
def test_fetch_google_links():
    links = []
    # Setup driver
    driver = webdriver.Firefox()
    driver.implicitly_wait(10)
    driver.maximize_window()
    # Visit google
    driver.get("https://www.google.com")
    # Enter search query
    search_data = driver.find_element_by_name("q")
    search_data.send_keys("test")
    # Submit search query
    search_button = driver.find_element_by_xpath("//button[@type='submit']")
    search_button.click()
    while True:
        # Find and collect all anchors
        anchors = driver.find_elements_by_xpath("//h3//a")
        links += [a.get_attribute("href") for a in anchors]
        try:
            # Find the next page button
            next_button = driver.find_element_by_xpath("//a[@id='pnnext']")
            location = next_button.get_attribute("href")
            driver.get(location)
        except NoSuchElementException:
            break
    # Do something with the links
    for l in links:
        print(l)
    print("Found {} links".format(len(links)))
    driver.quit()

Scraper: Try skips code in while loop (Python)

I am working on my first scraper and have run into an issue. My scraper accesses a website and saves links from each result page. Now, I only want it to go through 10 pages. The problem comes when the search results have fewer than 10 pages. I tried using a while loop along with a try statement, but it does not seem to work. After the scraper goes through the first page of results, it does not return any links on the successive pages; however, it does not give me an error and stops once it reaches 10 pages or hits the exception.
Here is a snippet of my code:
links = []
page = 1
while(page <= 10):
    try:
        # Get information from the propertyInfo class
        properties = WebDriverWait(driver, 10).until(lambda driver: driver.find_elements_by_xpath('//div[@class = "propertyInfo item"]'))
        # For each listing
        for p in properties:
            # Find all elements with a tags
            tmp_link = p.find_elements_by_xpath('.//a')
            # Get the link from the second element to avoid error
            links.append(tmp_link[1].get_attribute('href'))
        page += 1
        WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click())
    except ElementNotVisibleException:
        break
I really appreciate any pointers on how to fix this issue.
You are explicitly catching the ElementNotVisibleException exception and stopping on it. This way you won't see any error message. The error is probably on this line:
WebDriverWait(driver, 10).until(lambda driver:
    driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click())
I assume the lambda here should be a test that is run until it succeeds, so it shouldn't perform an action like click. I actually believe you don't need to wait here at all; the page should already be fully loaded, so you can just click on the link:
driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click()
This will either move to the next page (and the WebDriverWait at the start of the loop will wait for it) or raise an exception if no next link is found.
Also, you'd better minimize the try ... except scope; this way you won't capture something unintentionally. E.g. here you only want to surround the next-link finding code, not the whole loop body:
# ...
while(page <= 10):
    # Scrape this page
    properties = WebDriverWait(driver, 10).until(...)
    for p in properties:
        # ...
    page += 1
    # Try to pass to next page
    try:
        driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click()
    except ElementNotVisibleException:
        # Break if no next link is found
        break
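One caveat (an assumption, since the site's markup for the last page isn't shown): if the next link is removed from the DOM entirely rather than just hidden, find_element_by_xpath raises NoSuchElementException instead of ElementNotVisibleException, so catching both keeps the break working either way:

from selenium.common.exceptions import ElementNotVisibleException, NoSuchElementException

# inside the while loop: try to pass to the next page
try:
    driver.find_element_by_xpath('//*[@id="paginador_siguiente"]/a').click()
except (ElementNotVisibleException, NoSuchElementException):
    # Break if the next link is hidden or missing
    break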
