Can't reach the bottom of a webpage - python

I've written a script in Python with Selenium to handle an infinite-scrolling webpage. The problem I'm facing is that it scrolls a few times and then quits the browser; it never reaches the bottom. I tried an Explicit Wait as well, but that results in even fewer scrolls. How can I reach the bottom, i.e. the point where there is no more scrolling to do?
This is my attempt:
import time
from selenium import webdriver

url = "https://www.instagram.com/explore/tags/travelphotoawards/"
driver = webdriver.Chrome()
driver.get(url)

last_len = len(driver.find_elements_by_css_selector(".v1Nh3 a"))
new_len = last_len

while True:
    last_len = new_len
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    items = driver.find_elements_by_css_selector(".v1Nh3 a")
    new_len = len(items)
    if last_len == new_len:
        break

driver.quit()
Edit:
If I try it like below, I can scroll as many times as I want, but hardcoding the number of scrolls is not a good way to cope with this:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

url = "https://www.instagram.com/explore/tags/travelphotoawards/"
driver = webdriver.Chrome()
driver.get(url)

for scroll in range(1, 10):  # I can scroll as many times as I want, but it is fully hardcoded
    item = driver.find_element_by_tag_name("body")
    item.send_keys(Keys.END)
    elems = driver.find_elements_by_css_selector(".v1Nh3 a")
    time.sleep(3)

driver.quit()
I hope there is some way to keep scrolling automatically until the page reaches the bottom.

A few things here. In the case of infinite scrolling, I would follow these rules:
Disable images so that scrolling is faster.
Never trust a condition to be true if it is not consistent. Test it continuously for a period, and only trust it once it stays consistent.
Try not to scroll for too long; infinite scrolling can cause the browser to clog up too much memory and sometimes even crash.
Dump data in batches after every scroll. So on the first page load I would dump all the page data, and then on every scroll I would dump just the delta (see the sketch after the script below).
Below is an updated script which should work better for you. Remember that nothing is perfect, so you need to make your script adapt to failures:
import time
from selenium import webdriver

# Disable images so that scrolling is faster
option = webdriver.ChromeOptions()
chrome_prefs = {}
option.experimental_options["prefs"] = chrome_prefs
chrome_prefs["profile.default_content_settings"] = {"images": 2}
chrome_prefs["profile.managed_default_content_settings"] = {"images": 2}

driver = webdriver.Chrome(chrome_options=option)
url = "https://www.instagram.com/explore/tags/travelphotoawards/"
driver.get(url)

last_len = len(driver.find_elements_by_css_selector(".v1Nh3 a"))
new_len = last_len
consistent = 0

while True:
    last_len = new_len
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    items = driver.find_elements_by_css_selector(".v1Nh3 a")
    new_len = len(items)
    if last_len == new_len:
        # only trust the "no new items" condition once it has held 3 times in a row
        consistent += 1
        if consistent == 3:
            break
    else:
        consistent = 0

driver.quit()
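To illustrate the batch-dumping advice: after each scroll you can write out only the links you have not seen before. A minimal sketch, assuming the same ".v1Nh3 a" selector and a hypothetical output file "hrefs.txt"; a set of seen hrefs is used because Instagram may also remove older nodes from the DOM, so slicing by element count alone is not reliable:

seen = set()

def dump_new_hrefs(driver):
    # collect the currently rendered links and append only the unseen ones
    items = driver.find_elements_by_css_selector(".v1Nh3 a")
    with open("hrefs.txt", "a") as out:
        for a in items:
            href = a.get_attribute("href")
            if href not in seen:
                seen.add(href)
                out.write(href + "\n")

Call dump_new_hrefs(driver) once after the initial load and then once per scroll iteration.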

Every time there is a scroll, older images disappear from the DOM, so you might get the same number, or even a smaller number, of images after the scroll.
Each image has a unique href, so you can compare the href of the last image to the previous last image:
last_href = driver.find_elements_by_css_selector('.v1Nh3 > a')[-1].get_attribute('href')
new_href = last_href

while True:
    last_href = new_href
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    new_href = driver.find_elements_by_css_selector('.v1Nh3 > a')[-1].get_attribute('href')
    if last_href == new_href:  # the last image did not change, so nothing new was loaded
        break

Related

Not able to scrape data because of dynamic changes in element identities

I was trying to scrape Zomato restaurants that have ratings above 4.0 from https://www.zomato.com/pune/order-food-online?delivery_subzone=1165, but the class names (and everything else) change after the next few elements load.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome(executable_path='./chromedriver.exe')
driver.get('https://www.zomato.com/pune/order-food-online?delivery_subzone=1165')

rating = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//p[@class="sc-1hez2tp-0 sc-lhdg1m-2 hDJwRc"]'))
)

for item in rating:
    stars = item.text
    if stars > '4.0':
        title = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//p[@class='sc-1hez2tp-0 sc-izFuNb jbErXF']"))
        )

time.sleep(10)
driver.close()
I'm doing this with Selenium.
Go to the page.
Filter out the restaurants with 4.0+ ratings using the filters provided on the page, via the XPath //div[contains(text(),'Rating: 4.0+')] (use a click() method).
All of the restaurant cards have the image alt Restaurant Card, so you can use the CSS selector img[alt='Restaurant Card'] to get all the cards appearing after filtering, and keep their number in a count variable.
As you keep scrolling, you need to keep adding to this count variable.
Edit: Here is the whole script for you, which gives the count of restaurants as 117:
import time
from selenium import webdriver
from bs4 import BeautifulSoup

##### Web scraper for an infinite scrolling page #####
driver = webdriver.Chrome(executable_path=r"path_to-chromedriver")
driver.get("https://www.zomato.com/pune/delivery-in-budhwar-peth")
time.sleep(10)  # Allow 10 seconds for the web page to open
driver.find_element_by_xpath("//div[contains(text(),'Rating: 4.0+')]").click()

scroll_pause_time = 1  # You can set your own pause time. My laptop is a bit slow so I use 1 sec
screen_height = driver.execute_script("return window.screen.height;")  # get the screen height
i = 1
count = 0

while True:
    # scroll one screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
    i += 1
    time.sleep(scroll_pause_time)
    # update the scroll height each time after scrolling, as it can change as the page loads
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    # break the loop when the height we need to scroll to is larger than the total scroll height
    if screen_height * i > scroll_height:
        break

soup = BeautifulSoup(driver.page_source, "html.parser")
for img in soup.find_all('img', alt='Restaurant Card'):
    count += 1
print('Count of all restaurants is', count)
driver.quit()

How to save data from multiple pages using webdriver into a single CSV

I'm trying to save data from Google Scholar using Selenium (webdriver). So far I can print the data that I want, but when I save it into a CSV it only saves the first page.
from selenium import webdriver
from selenium.webdriver.common.by import By
# Import statements for explicit wait
from selenium.webdriver.support.ui import WebDriverWait as W
from selenium.webdriver.support import expected_conditions as EC
import time

exec_path = r"C:\Users\gvste\Desktop\proyecto\chromedriver.exe"
URL = r"https://scholar.google.com/citations?view_op=view_org&hl=en&authuser=2&org=8337597745079551909"

button_locators = ['//*[@id="gsc_authors_bottom_pag"]/div/button[2]',
                   '//*[@id="gsc_authors_bottom_pag"]/div/button[2]',
                   '//*[@id="gsc_authors_bottom_pag"]/div/button[2]']
wait_time = 3
driver = webdriver.Chrome(executable_path=exec_path)
driver.get(URL)
wait = W(driver, wait_time)
# driver.maximize_window()

for j in range(len(button_locators)):
    button_link = wait.until(EC.element_to_be_clickable((By.XPATH, button_locators[j])))
    address = driver.find_elements_by_class_name("gsc_1usr")
    # for post in address:
    #     print(post.text)
    time.sleep(4)
    with open('post.csv', 'a') as s:
        for i in range(len(address)):
            addresst = address
            # if addresst == 'NONE':
            #     addresst = str(address)
            # else:
            addresst = address[i].text.replace('\n', ',')
            s.write(addresst + '\n')
    button_link.click()
    time.sleep(4)
# driver.quit()
You only get the first page of data because your program stops after it clicks the next-page button. You have to put all of that in a for loop.
Notice I wrote range(7) because I know there are 7 pages to open; in reality we should never do that. Imagine we had thousands of pages. We should add some logic to check whether the "next page" button exists and loop until it doesn't (see the sketch after the code below).
exec_path = r"C:\Users\gvste\Desktop\proyecto\chromedriver.exe"
URL = r"https://scholar.google.com/citations?view_op=view_org&hl=en&authuser=2&org=8337597745079551909"

button_locators = "/html/body/div/div[8]/div[2]/div/div[12]/div/button[2]"
wait_time = 3
driver = webdriver.Chrome(executable_path=exec_path)
driver.get(URL)
wait = W(driver, wait_time)
time.sleep(4)

# 7 pages. In reality, we should get this number programmatically
for page in range(7):
    # read data from the new page
    address = driver.find_elements_by_class_name("gsc_1usr")
    # write to file
    with open('post.csv', 'a') as s:
        for i in range(len(address)):
            addresst = address[i].text.replace('\n', ',')
            s.write(addresst + '\n')
    # find and click the next page button
    button_link = wait.until(EC.element_to_be_clickable((By.XPATH, button_locators)))
    button_link.click()
    time.sleep(4)
Also, in the future you should look to change all these time.sleep() calls to wait.until(). Sometimes your page loads quicker and the program could do its job faster; or, even worse, your network might lag and a fixed sleep would break your script.
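A minimal sketch of the "loop until the next-page button is gone" idea, assuming the same locator works on every page and that the button stops being clickable (or disappears) on the last page; treat it as a sketch, not a tested implementation:

from selenium.common.exceptions import TimeoutException

def scrape_all_pages(driver, wait, next_button_xpath):
    # keep dumping pages until the "next" button can no longer be clicked
    while True:
        address = driver.find_elements_by_class_name("gsc_1usr")
        with open('post.csv', 'a') as s:
            for entry in address:
                s.write(entry.text.replace('\n', ',') + '\n')
        try:
            button_link = wait.until(EC.element_to_be_clickable((By.XPATH, next_button_xpath)))
        except TimeoutException:
            break  # no clickable next button, so this was the last page
        button_link.click()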

Python Selenium: how to click on table content when changing table page

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup
import time

url = "https://www.bungol.ca/"
driver = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')
driver.get(url)

# Select toronto by default
driver.find_element_by_xpath("/html/body/section/div[2]/div/div[1]/form/div/select/optgroup[1]/option[1]").click()
time.sleep(1)
driver.find_element_by_xpath("/html/body/section/div[2]/div/div[1]/form/div/button").click()
driver.find_element_by_xpath("/html/body/nav/div[1]/ul[1]/li[3]/select/option[8]").click()
# select last 2 years
driver.find_element_by_xpath("//*[@id='activeListings']").click()
# opening sold listings in that area
driver.find_element_by_xpath("/html/body/div[5]/i").click()  # closes property type slide
driver.find_element_by_xpath("//*[@id='navbarDropdown']").click()
driver.find_element_by_xpath("//*[@id='listViewToggle']").click()

def data_collector():
    hidden_next = driver.find_element_by_class_name("nextPaginate")
    # inputs in textbox
    inputElement = driver.find_element_by_id('navbarSearchAddressInput')
    inputElement.send_keys('M3B2B6')
    time.sleep(1)
    # inputElement.send_keys(Keys.ENTER)
    row_count = 3
    table = driver.find_elements_by_css_selector("#listViewTableBody")
    while hidden_next.is_displayed():  # while there is a next page button to be pressed
        time.sleep(3)  # delay for table refresh
        # row_count = len(driver.find_elements_by_css_selector("html body#body div#listView.table-responsive table#listViewTable.table.table-hover.mb-0 tbody#listViewTableBody tr.mb-2"))
        for row in range(row_count):  # loop through the rows found
            # alternate rows by changing the tr index
            driver.find_element_by_xpath("/html/body/div[8]/table/tbody/tr[" + str(row + 1) + "]/td[1]").click()
            time.sleep(2)
            print(driver.find_element_by_css_selector("#listingStatus").text)  # sold price
            # closes the pop-up after getting the data
            driver.find_element_by_css_selector('.modal-xl > div:nth-child(1) > div:nth-child(1) > button:nth-child(1)').click()
            time.sleep(1)
        # clicks the next page button for the table
        driver.find_element_by_xpath("//*[@id='listViewNextPaginate']").click()

if __name__ == "__main__":
    data_collector()
The code loops through all the rows in the first table (currently set to 3 for testing), clicks on each row so the pop-up shows up, grabs the information, and closes the pop-up. But when it flips to the next page, it doesn't click on any of the rows of the second page, and it doesn't show an error for not finding the row XPath either. Instead it shows an error for the pop-up close button, because the pop-up never opened (the row was never clicked).
How do I make it click the rows after the table flips to the next page?
For table reference: https://www.bungol.ca/map/location/toronto/?
Close the property slider on the left, then click Tools -> Open List.
In my browser I also can't open the pop-up when I click on a row on the second page, so I think this may be a fault of the website.
If you want to check whether an element exists, you can use this code:
from selenium.common.exceptions import NoSuchElementException

def check_exists_by_xpath(xpath, driver):
    try:
        driver.find_element_by_xpath(xpath)
    except NoSuchElementException:
        return False
    return True
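For example, you could guard each row click with it; the row XPath below is the hypothetical one from the question:

row_xpath = "/html/body/div[8]/table/tbody/tr[1]/td[1]"
if check_exists_by_xpath(row_xpath, driver):
    driver.find_element_by_xpath(row_xpath).click()
else:
    print("row not found, skipping")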
Try this. My understanding is that your script goes through the listings, opens a listing, grabs the listing status, closes the listing, and does the same for all the listings.
If my understanding is correct, the code below may help you. It would be better to change the implicit waits and time.sleep() calls to explicit waits (see the sketch after the code) and to clean up the functions.
That said, I did not fully test the code, but it did navigate to more than one page of listings and collect data.
from selenium.webdriver import Firefox
from selenium.webdriver.support.select import Select
import time

driver = Firefox(executable_path=r'path to geckodriver.exe')
driver.get('https://www.bungol.ca/')
driver.maximize_window()
driver.implicitly_wait(10)

# Select toronto by default
driver.find_element_by_css_selector('#locationChoice button[type="submit"]').click()
sold_in_the_last = Select(driver.find_element_by_id('soldInTheLast'))
sold_in_the_last.select_by_visible_text('2 Years')
driver.find_element_by_id('activeListings').click()

# opening sold listings in that area
driver.find_element_by_css_selector('#leftSidebarClose>i').click()
driver.find_element_by_id('navbarDropdown').click()
driver.find_element_by_id('listViewToggle').click()

def get_listings():
    listings_table = driver.find_element_by_id('listViewTableBody')
    listings_table_rows = listings_table.find_elements_by_tag_name('tr')
    return listings_table_rows

def get_sold_price(listing):
    listing.find_element_by_css_selector('td:nth-child(1)').click()
    time.sleep(2)
    sold_price = driver.find_element_by_id('listingStatus').text
    time.sleep(2)
    close = driver.find_elements_by_css_selector('.modal-content>.modal-body>button[class="close"]')
    close[2].click()
    time.sleep(2)
    return sold_price

def data_collector():
    data = []
    time.sleep(2)
    next = driver.find_element_by_id('listViewNextPaginate')
    # get all the listings prior to the last page
    while next.is_displayed():
        listings = get_listings()
        for listing in listings:
            data.append(get_sold_price(listing))
        next.click()
    # get listings from the last page
    listings = get_listings()
    for listing in listings:
        data.append(get_sold_price(listing))
    return data

if __name__ == '__main__':
    from pprint import pprint
    data = data_collector()
    pprint(data)
    print(len(data))
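As noted above, the fixed sleeps could become explicit waits. A minimal sketch, assuming the same listingStatus ID used in the code above:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

wait = WebDriverWait(driver, 10)
# wait for the listing modal's status text instead of sleeping a fixed 2 seconds
sold_price = wait.until(EC.presence_of_element_located((By.ID, 'listingStatus'))).text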

Scraping an updating JavaScript page in Python

I've been working on a research project that is looking to obtain a list of reference articles from the Brazil Hemeroteca (the desired page reference, http://memoria.bn.br/DocReader/720887x/839, needs to be collected from two hidden elements on the following page: http://memoria.bn.br/DocReader/docreader.aspx?bib=720887x&pasta=ano%20189&pesq=Milho). I asked a question a few weeks back that was answered, and I was able to get things running well in that regard, but now I've hit a new snag and I'm not exactly sure how to get around it.
The problem is that after the first form is filled in, the page redirects to a second page, a JavaScript/AJAX-enabled page, and I need to spool through all of the matches by clicking a button at the top of the page. The problem I'm encountering is that when clicking the next-page button I'm dealing with elements on the page that are updating, which leads to stale elements. I've tried to implement a few pieces of code to detect when this "stale" effect occurs, to indicate that the page has changed, but this has not provided much luck. Here is the code I've implemented:
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time

saveDir = "C:/tmp"

print("Opening Page...")
browser = webdriver.Chrome()
url = "http://bndigital.bn.gov.br/hemeroteca-digital/"
browser.get(url)

print("Searching for elements")
fLink = ""
fails = 0
frame_ref = browser.find_elements_by_tag_name("iframe")[0]
iframe = browser.switch_to.frame(frame_ref)
journal = browser.find_element_by_id("PeriodicoCmb1_Input")

search_journal = "Relatorios dos Presidentes dos Estados Brasileiros (BA)"
search_timeRange = "1890 - 1899"
search_text = "Milho"

xpath_form = "//input[@name='PesquisarBtn1']"
xpath_journal = "//li[text()='" + search_journal + "']"
xpath_timeRange = "//input[@name='PeriodoCmb1' and not(@disabled)]"
xpath_timeSelect = "//li[text()='" + search_timeRange + "']"
xpath_searchTerm = "//input[@name='PesquisaTxt1']"

print("Locating Journal/Periodical")
journal.click()
dropDownJournal = WebDriverWait(browser, 60).until(EC.presence_of_element_located((By.XPATH, xpath_journal)))
dropDownJournal.click()

print("Waiting for Time Selection")
try:
    timeRange = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_timeRange)))
    timeRange.click()
    time.sleep(1)
    print("Locating Time Range")
    dropDownTime = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_timeSelect)))
    dropDownTime.click()
    time.sleep(1)
except:
    print("Failed...")

print("Adding Search Term")
searchTerm = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_searchTerm)))
searchTerm.clear()
searchTerm.send_keys(search_text)
time.sleep(5)

print("Perform search")
submitButton = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, xpath_form)))
submitButton.click()

# Wait for the second page to load, pull what we need from it.
download_list = []
browser.switch_to_window(browser.window_handles[-1])
print("Waiting for next page to load...")
matches = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, "//span[@id='OcorNroLbl']")))
print("Next page ready, found match element... counting")
countText = matches.text
countTotal = int(countText[countText.find("/") + 1:])
print("A total of " + str(countTotal) + " matches have been found, standing by for page load.")

for i in range(1, countTotal + 2):
    print("Waiting for page " + str(i - 1) + " to load...")
    while fLink in download_list:
        try:
            jIDElement = browser.find_element_by_xpath("//input[@name='HiddenBibAlias']")
            jPageElement = browser.find_element_by_xpath("//input[@name='hPagFis']")
            fLink = "http://memoria.bn.br/DocReader/" + jIDElement.get_attribute('value') + "/" + jPageElement.get_attribute('value') + "&pesq=" + search_text
        except:
            fails += 1
            time.sleep(1)
            if fails == 10:
                print("Locked on a page, attempting to push to next.")
                nextPageButton = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH, "//input[@id='OcorPosBtn']")))
                nextPageButton.click()
                # raise
    while fLink == "":
        jIDElement = browser.find_element_by_xpath("//input[@name='HiddenBibAlias']")
        jPageElement = browser.find_element_by_xpath("//input[@name='hPagFis']")
        fLink = "http://memoria.bn.br/DocReader/" + jIDElement.get_attribute('value') + "/" + jPageElement.get_attribute('value') + "&pesq=" + search_text
    fails = 0
    print("Link obtained: " + fLink)
    download_list.append(fLink)
    if i != countTotal:
        print("Moving to next page...")
        nextPageButton = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH, "//input[@id='OcorPosBtn']")))
        nextPageButton.click()
There are two "bugs" I'm trying to solve with this block. First, the very first page is always skipped in the loop (i.e. fLink = ""), even though there is a test in there for it; I'm not sure why this occurs. The other bug is that the code will hang on specific pages, completely at random, and the only way out is to break the code execution.
This block has been modified a few times, so I know it's not the most "elegant" of solutions, but I'm starting to run out of time.
After taking a day off to think about it (and get some more sleep), I was able to figure out what was going on. The above code has three big faults. The first is that it does not distinguish the StaleElementReferenceException from the NoSuchElementException, both of which can occur while the page is shifting. Second, the loop condition was checking iteratively that a page wasn't in the list, so on the first run the loop never executed and the blank link was loaded in directly (I should have used a do-while there, but I made more modifications instead). Finally, I made the silly error of only checking whether the first hidden element was changing, when in fact that is the journal ID, which is pretty much constant throughout.
The revisions began with an adaptation of code from another SO answer to implement a "hold" condition until either one of the hidden elements changed:
from selenium.common.exceptions import StaleElementReferenceException
from selenium.common.exceptions import NoSuchElementException
import time

def hold_until_element_changed(driver, element1_xpath, element2_xpath, old_element1_text, old_element2_text):
    while True:
        try:
            element1 = driver.find_element_by_xpath(element1_xpath)
            element2 = driver.find_element_by_xpath(element2_xpath)
            if (element1.get_attribute('value') != old_element1_text) or (element2.get_attribute('value') != old_element2_text):
                break
        except StaleElementReferenceException:
            break
        except NoSuchElementException:
            return False
        time.sleep(1)
    return True
I then modified the original looping condition, going back to the original for-loop counter I had created, without an internal loop; instead I call the above function to "hold" until the page has flipped, and voila, it worked like a charm. (Note: I also raised the timeout on the next-page button, as that was what caused the locking condition.)
for i in range(1, countTotal + 1):
    print("Waiting for page " + str(i) + " to load...")
    bibxpath = "//input[@name='HiddenBibAlias']"
    pagexpath = "//input[@name='hPagFis']"
    jIDElement = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, bibxpath)))
    jPageElement = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, pagexpath)))
    jidtext = jIDElement.get_attribute('value')
    jpagetext = jPageElement.get_attribute('value')
    fLink = "http://memoria.bn.br/DocReader/" + jidtext + "/" + jpagetext + "&pesq=" + search_text
    print("Link obtained: " + fLink)
    download_list.append(fLink)
    if i != countTotal:
        print("Moving to next page...")
        nextPageButton = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, "//input[@id='OcorPosBtn']")))
        nextPageButton.click()
        # wait for the next page to be ready
        change = hold_until_element_changed(browser, bibxpath, pagexpath, jidtext, jpagetext)
        if change == False:
            print("Something went wrong.")
All in all, a good exercise in thought and some helpful links for me to consider when posting future questions. Thanks!

Scroll in Selenium Webdriver (Python)

Prerequisites.
You need an Instagram account to use this script.
Set up a test environment:
Log in and open the needed list (this part works correctly):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep

driver = webdriver.Chrome(
# driver = webdriver.Firefox(
# driver = webdriver.PhantomJS(
    service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any'])
driver.get("https://instagram.com/accounts/login")

username = driver.find_element_by_name("username")
password = driver.find_element_by_name("password")
username1 = 'instagram'          # change it!
password1 = 'instagrampassword1' # change it!
username.send_keys(username1)
password.send_keys(password1)

submit_button = driver.find_element_by_css_selector(
    '#react-root > div > article > div > div:nth-child(1) > div > form > span > button')
submit_button.click()
sleep(2)

link = 'https://www.instagram.com/youtube/'
driver.get(link)
driver.implicitly_wait(2)
driver.find_elements_by_class_name("_218yx")[2].click()
Wrong scroll.
How do I fix this block? How do I focus and scroll correctly on this page?
My attempts:
driver.find_element_by_class_name("_cx1ua").send_keys(Keys.NULL)  # focus
# The element has been deleted entirely, or
# the element is no longer attached to the DOM.
driver.find_element_by_class_name("_q44m8").send_keys(Keys.NULL)
# cannot focus element
driver.find_element_by_class_name("_qjr85").send_keys(Keys.NULL)
# cannot focus element

for i in range(5):
    driver.find_element_by_class_name("_cx1ua").send_keys(Keys.END)
=============================================================
To @Moshisho:
We need to focus on some element to activate it.
The question is which element we need to choose to focus, and how?
It is not the "body"; something like this, but not this:
background = driver.find_element_by_css_selector("body")
# background = driver.find_element_by_css_selector("div._2uju6")
for i in range(5):
    background.send_keys(Keys.SPACE)
    time.sleep(1)
Without it, this command does not work.
To @Naveen:

print(driver.find_element_by_css_selector("div._a1rcs").location_once_scrolled_into_view)  # {'x': 0, 'y': 0}
print(driver.find_element_by_class_name("_cx1ua").location_once_scrolled_into_view)  # {'x': 376, 'y': 229}
print(driver.find_element_by_class_name("_q44m8").location_once_scrolled_into_view)  # {'x': 376, 'y': 180}
print(driver.find_element_by_class_name("_qjr85").location_once_scrolled_into_view)  # {'x': 376, 'y': 180}

And what's next?

driver.execute_script("window.scrollTo(0, 3000);")  # does not work
Try the following code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep

driver = webdriver.Chrome(
# driver = webdriver.Firefox(
# driver = webdriver.PhantomJS(
    service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any'])
driver.maximize_window()
driver.get("https://instagram.com/accounts/login")

username = driver.find_element_by_name("username")
password = driver.find_element_by_name("password")
username1 = 'instagramlogin1'    # change it!
password1 = 'instagrampassword1' # change it!
username.send_keys(username1)
password.send_keys(password1)

submit_button = driver.find_element_by_css_selector(
    '#react-root > div > article > div > div:nth-child(1) > div > form > span > button')
submit_button.click()
sleep(2)

link = 'https://www.instagram.com/youtube/'
driver.get(link)
driver.implicitly_wait(2)

following = driver.find_element_by_xpath("//a[@href='/youtube/following/']/span")
total_following = int(following.text)
print("total no. of users following:", total_following)

# click on 239 following, displays 10 users
following.click()

loaded_following = driver.find_elements_by_xpath("//ul[@class='_539vh _4j13h']/li")
loaded_till_now = len(loaded_following)

while loaded_till_now < total_following:
    print("following users loaded till now:", loaded_till_now)
    print(loaded_following[loaded_till_now - 1])
    loaded_following[loaded_till_now - 1].location_once_scrolled_into_view
    # driver.execute_script("arguments[0].focus();", loaded_following[loaded_till_now-1])
    driver.find_element_by_tag_name('body').send_keys(Keys.END)  # triggers an AJAX request to load more users; loads 10 users at a time
    sleep(1)  # tried without sleep, but it throws StaleElementReferenceException, as it takes time to get the response and update the DOM
    loaded_following = driver.find_elements_by_xpath("//ul[@class='_539vh _4j13h']/li")
    loaded_till_now = len(loaded_following)

# All 239 users are loaded.
driver.quit()
I observed that the browser sends an AJAX request to load more users; this action is triggered when you scroll using the mouse or press the Space or End keys.
In order to scroll the window, you need to execute JavaScript. Try this:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

EDIT: in order to focus an element (it needs to be able to receive focus, e.g. an anchor, input, or button) you also need to use the JavaScript executor:

elementToFocus = driver.find_element_by_id("yourID")
driver.execute_script("arguments[0].focus();", elementToFocus)
I'm working with a dynamic React app, and I need to scroll to the page's bottom to make React render all the data.
For unknown reasons, the solutions based on executing JS didn't work for me. However, I got a send_keys solution working:
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException

# scroll to the bottom to load everything
WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((By.XPATH, "//body"))
)
attempt_num = 2
while attempt_num > 0:
    try:
        elem = driver.find_element_by_xpath("//body")
        elem.click()
        elem.send_keys(Keys.END)
    except StaleElementReferenceException as e:
        print(e)
        attempt_num = attempt_num - 1
The click() on the body and the retry for StaleElementReferenceException are crucial. I haven't found a more elegant way than to retry.
See the top answer of "How to avoid StaleElementReferenceException in Selenium?".
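If the retry pattern appears in several places, a small generic wrapper keeps it out of the main flow. A sketch, not specific to any page; the helper name is made up:

from selenium.common.exceptions import StaleElementReferenceException

def retry_on_stale(action, attempts=3):
    # run `action` again if the DOM re-renders underneath it
    for _ in range(attempts):
        try:
            return action()
        except StaleElementReferenceException:
            pass  # element went stale; look it up again on the next attempt
    raise StaleElementReferenceException("element kept going stale")

# usage: re-find the body and press END, retrying if React re-renders it
retry_on_stale(lambda: driver.find_element_by_xpath("//body").send_keys(Keys.END))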
