Using Selenium to scroll down an infinite-scroll page doesn't work

I am trying to use Selenium to scroll down this webpage infinitely: https://gfycat.com/discover/trending-gifs
I tried this code:
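from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = "https://gfycat.com/discover/trending-gifs"
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--disable-extensions")
# Selenium 4 takes the driver path through a Service object instead of executable_path
driver = webdriver.Chrome(service=Service(r"C:\chromedriver.exe"), options=options)
driver.get(url)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
driver.quit()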
But the page did not scroll down.
I also tried:
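from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

for i in range(10):
    # find_element_by_css_selector was removed in Selenium 4; use find_element(By...)
    driver.find_element(By.CSS_SELECTOR, 'html').send_keys(Keys.END)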
But it did not scroll down either.

For an infinitely scrolling website you can use a loop like the one below. The while True loop keeps scrolling until the page stops growing, and you need to import the time module so the site has time to load new content after each scroll.
import time


def scroll(driver, timeout=5):
    # Get scroll height
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait for the website to load new content
        time.sleep(timeout)
        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Nothing new was loaded, so we have reached the end
            break
        last_height = new_height
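The loop above can stop too early if one slow request hasn't finished within a single pause, and it never stops on a feed that keeps growing. A minimal sketch of a more defensive variant, assuming the window itself scrolls (not an inner container): it only gives up after `patience` consecutive pauses with no height change, and it never performs more than `max_rounds` scrolls. The function name and both parameters are illustrative, not part of Selenium.

```python
import time


def scroll_until_stable(driver, pause=2.0, patience=3, max_rounds=200):
    """Scroll the window to the bottom until the document height stops growing.

    `driver` is any object with Selenium's `execute_script` method.
    Returns the number of scroll rounds performed.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    unchanged = 0  # consecutive rounds in which no new content appeared
    rounds = 0
    while rounds < max_rounds and unchanged < patience:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazy-loaded content time to arrive
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            unchanged += 1
        else:
            unchanged = 0
            last_height = new_height
    return rounds
```

Call it after driver.get(url) and before driver.quit(). In the question's first snippet the browser is closed immediately after the single scroll, which leaves no time for anything new to load.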

Related

Web scraping social media followers, but the list is in the hundreds of thousands: Selenium runs out of memory

So I've been using Selenium in Chrome to go to a social media profile and scrape the usernames of its followers. However, the list is in the 100s of thousands and the page only loads a limited amount. My solution was to tell Selenium to scroll down endlessly and scrape usernames using 'driver.find_elements' as it goes, but after a few hundred usernames Chrome soon crashes with the error code "Ran out of memory".
Am I even capable of getting that entire list?
Is Selenium even the right tool to use or should I use Scrapy? Maybe both?
I'm at a loss on how to move forward from here.
Here's my code, just in case:
import time

from easygui import ccbox
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

choice = ccbox("Run the test?", "", ("Run it", "I'm not ready yet"))
if not choice:
    quit()

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
time.sleep(60)  # gives me time to log in manually and open the followers list

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # No growth yet: nudge the page up and back down to retrigger loading
        driver.execute_script("window.scrollTo(0, 1080);")
        time.sleep(1)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
    last_height = new_height

I figured it out! Every "follower" is its own element, and endless scrolling kept all of these elements in memory until Chrome hit its limit. I solved this by deleting the elements with JavaScript after scrolling a certain amount, then rinse and repeat until reaching the bottom :)
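The answer doesn't include its code, so here is a minimal sketch of the same idea under stated assumptions: each follower row matches a CSS selector you pass in (the "li.follower" in the usage line is a placeholder, not the real site's markup), the window itself scrolls (if the list lives in a scrollable pop-up you would scroll that element instead), and the site keeps loading rows after earlier ones are removed, which depends on how its infinite list is implemented. Each round reads the text of the current rows inside the browser, deletes those nodes, then scrolls; usernames accumulate in a Python set so the page's DOM does not keep growing. The function and constant names are hypothetical.

```python
import time

# Runs in the browser: read the text of every matching node, then remove the
# nodes so the DOM stays small. arguments[0] is the CSS selector from Python.
COLLECT_AND_REMOVE_JS = """
const nodes = Array.from(document.querySelectorAll(arguments[0]));
const texts = nodes.map(n => n.textContent.trim());
nodes.forEach(n => n.remove());
return texts;
"""


def scrape_and_prune(driver, selector, pause=1.0, patience=3, max_rounds=10000):
    """Collect text from rows matching `selector`, deleting the rows as we go.

    Stops after `patience` consecutive rounds that yield no new text.
    """
    seen = set()
    empty_rounds = 0
    for _ in range(max_rounds):
        batch = driver.execute_script(COLLECT_AND_REMOVE_JS, selector) or []
        new_items = set(batch) - seen
        seen.update(new_items)
        empty_rounds = 0 if new_items else empty_rounds + 1
        if empty_rounds >= patience:
            break
        # Ask the page for the next chunk of rows
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
    return seen
```

Usage would be something like usernames = scrape_and_prune(driver, "li.follower"). Writing each batch to a file inside the loop, instead of keeping everything in the set, would also keep the Python side's memory bounded.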

Selenium Python: unable to scroll down on TikTok while fetching videos

I am trying to use Selenium with Python to open a TikTok user page and scroll down to load all of the user's videos.
I can open the URL and get the page source, including the data for all loaded videos, but when I scroll down, sleep for a while, and get the source again, it is the same, with the same videos, and nothing new is loaded!
from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import json
from bs4 import BeautifulSoup
import time

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome('chromedriver', options=options)
wd.get("https://www.tiktok.com/#tiktok")
time.sleep(20)
#wd.implicitly_wait(10)
#print(wd.page_source)
SCROLL_PAUSE_TIME = 20
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
print(wd.page_source)
I also tried this code to scroll down:
wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(10)
print(wd.page_source)
But again, nothing new shows up in the page source! I am using Google Colab; any help?
Update: changed the variable "driver" to "wd".
Update: this is the install code for the Chromium driver:
# install chromium, its driver, and selenium
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium

Inside the while loop you start using a nonexistent variable called driver. I changed it to wd and the page scrolled down, but the site then showed that there was a problem loading more content from there.
The code also throws an error:
[9612:864:0614/164525.919:ERROR:util.cc(127)] Can't create base directory: C:\Program Files\Google\GoogleUpdater
I searched for this error, and it seems to be related to the versions of Chrome and chromedriver, as stated here: https://www.reddit.com/r/selenium/comments/uqt9z9/cant_create_base_directory/
That's as far as I got; I hope it helps. :)
Here's my current code:
from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import json
from bs4 import BeautifulSoup
import time

options = webdriver.ChromeOptions()
#options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome('chromedriver', options=options)
wd.get("https://www.tiktok.com/#tiktok")
time.sleep(20)
#wd.implicitly_wait(10)
#print(wd.page_source)
SCROLL_PAUSE_TIME = 20
# Get scroll height
last_height = wd.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = wd.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
print(wd.page_source)
I took out the headless argument so I could see the results more clearly.
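To check the version-mismatch explanation from the linked thread, Selenium 4 exposes both versions through wd.capabilities. A small sketch, assuming Chrome's usual capability layout: a "browserVersion" key, plus a "chrome" entry whose "chromedriverVersion" string starts with the version number followed by a build hash in parentheses. It compares only the major version numbers, which is what ChromeDriver expects to agree with the browser. The helper names are hypothetical.

```python
def major_version(version_string):
    """Return the leading major number, e.g. 114 from '114.0.5735.90 (...)'."""
    return int(version_string.strip().split(".")[0])


def chrome_versions(capabilities):
    """Extract (browser_major, driver_major) from a Chrome capabilities dict."""
    browser = capabilities["browserVersion"]
    driver = capabilities["chrome"]["chromedriverVersion"]
    return major_version(browser), major_version(driver)


def versions_match(capabilities):
    """True when Chrome and chromedriver share the same major version."""
    browser_major, driver_major = chrome_versions(capabilities)
    return browser_major == driver_major
```

Usage: print(chrome_versions(wd.capabilities), versions_match(wd.capabilities)) right after creating wd.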

How to scroll to the end of a page with a finite number of loads? Selenium - Python

I would like to scroll to the end of a page like: https://fr.finance.yahoo.com/quote/GM/history?period1=1290038400&period2=1612742400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true
The problem is that using this:
# Get scroll height after the first page load
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(2)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
does not work. Yes, it should work for pages with infinite loading, but it doesn't work for Yahoo Finance, which has a finite number of loads; the condition should still break when it reaches the end. So I'm quite confused at the moment.
We could also use:
# find_elements returns an empty (falsy) list instead of raising when tfoot is missing
while driver.find_elements(By.TAG_NAME, 'tfoot'):
    # Scroll down three times to load the table
    for i in range(0, 3):
        driver.execute_script("window.scrollBy(0, 5000)")
        time.sleep(2)
but it sometimes gets stuck at a certain load.
What would be the best way to do this?

This requires pip install undetected-chromedriver, but it will get the job done.
It's just my webdriver of choice; you can do exactly the same with normal Selenium.
from time import sleep as s
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import undetected_chromedriver as uc

options = uc.ChromeOptions()
options.headless = False
driver = uc.Chrome(options=options)
driver.get('https://fr.finance.yahoo.com/quote/GM/history?period1=1290038400&period2=1612742400&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '#consent-page > div > div > div > div.wizard-body > div.actions.couple > form > button'))).click()  # clicks the cookie warning or whatever
last_scroll_pos = 0
while True:
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'body'))).send_keys(Keys.DOWN)
    s(.01)
    current_scroll_pos = str(driver.execute_script('return window.pageYOffset;'))
    if current_scroll_pos == last_scroll_pos:
        print('scrolling is finished')
        break
    last_scroll_pos = current_scroll_pos
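Sending one Down key per iteration costs a WebDriver round trip for every few pixels. A sketch of the same stopping rule (stop once window.pageYOffset no longer changes) that scrolls by a larger step with JavaScript instead; step, pause, and patience are illustrative values you would tune, and requiring several unchanged checks in a row is an assumption meant to avoid stopping while the table is still loading its next chunk. The function name is hypothetical.

```python
import time

# Runs in the browser: scroll by arguments[0] pixels and report the new offset
SCROLL_AND_REPORT_JS = """
window.scrollBy(0, arguments[0]);
return window.pageYOffset;
"""


def scroll_until_position_stops(driver, step=1000, pause=0.5, patience=3, max_rounds=5000):
    """Scroll by `step` pixels until the vertical offset stops changing.

    Returns the final vertical offset in pixels.
    """
    last_pos = driver.execute_script("return window.pageYOffset;")
    unchanged = 0
    for _ in range(max_rounds):
        pos = driver.execute_script(SCROLL_AND_REPORT_JS, step)
        time.sleep(pause)  # let the next rows load before scrolling again
        if pos == last_pos:
            unchanged += 1
            if unchanged >= patience:
                break
        else:
            unchanged = 0
            last_pos = pos
    return last_pos
```

After the cookie click, scroll_until_position_stops(driver) would replace the while True loop above.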

Chromedriver.exe window showing up even in headless mode

The headless argument only stops the Chrome browser window from opening; the chromedriver.exe window still opens. Is there any way to prevent both windows from opening?
Webdriver code
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
options.add_argument('disable-infobars')
driver = webdriver.Chrome(options=options)
driver.get(link)
NovelBox.scroll(driver)
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()
Scroll function
def scroll(driver):
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait to load page
        time.sleep(1)
        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # If heights are the same it will exit the function
            break
        last_height = new_height

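from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
# Newer Selenium versions use a command-line flag instead of options.headless = True
options.add_argument("--headless")
driver = webdriver.Chrome(service=Service(r"PATH\TO\SELENIUM\DRIVER\EXE"), options=options)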

The problem was with Python IDLE/Shell. When I run the script through Python IDLE or the shell, the chromedriver.exe window opens, but not when I run it in Visual Studio Code or from the terminal.

Python script to scroll down a non-scrollable page

I have an Airtable table I review on occasion, and I tried to create a Python script using Selenium to scroll down the full page until it reaches the end. Here's the code, but I can't get it to scroll down. I don't get any errors, but it seems like it doesn't connect with the page to scroll. Any help is appreciated. Thanks
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from time import sleep

url = 'https://airtable.com/embed/shrqYt5kSqMzHV9R5/tbl8c8kanuNB6bPYr?backgroundColor=green&viewControls=on'
driver = webdriver.Chrome()
driver.get(url)
driver.fullscreen_window()
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//html')))
scroll_pause_time = 5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    sleep(scroll_pause_time)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

The reason you are not able to scroll your table is that the page does not scroll in the normal way: it uses an "antiscroll" custom scrollbar. You can check this by simply trying to scroll manually. To load the rows, you have to drag the vertical scrollbar by clicking on it. To do so, we can use the drag_and_drop_by_offset method of the ActionChains class, as below:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# After your page is loaded
page_height = driver.get_window_size()['height']  # get the window height
scroll_bar = driver.find_element(By.XPATH, "//div[contains(@class,'antiscroll-scrollbar-vertical')]")
# Subtract 160 from the window height to compensate for the difference between window and screen height
ActionChains(driver).drag_and_drop_by_offset(scroll_bar, 0, page_height - 160).click().perform()
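Another common approach when the window itself does not scroll is to set scrollTop on the element that actually holds the rows, instead of dragging its scrollbar. A minimal sketch: you pass in that element, and the loop stops once the element's scrollHeight stops growing. The selector in the usage line, ".antiscroll-inner", is an assumption based on the antiscroll class names in the answer and may need to be confirmed with the browser's developer tools. If the grid virtualizes its rows (keeps only the visible ones in the DOM), you would need to read the data on every round rather than once at the end. The function name is hypothetical.

```python
import time

# Runs in the browser: jump the container to its bottom and report its height
SCROLL_ELEMENT_JS = """
const el = arguments[0];
el.scrollTop = el.scrollHeight;
return el.scrollHeight;
"""


def scroll_element_to_end(driver, element, pause=1.0, patience=3, max_rounds=500):
    """Scroll an inner container to its bottom until its height stops growing.

    Returns the container's final scrollHeight in pixels.
    """
    last_height = None
    unchanged = 0
    for _ in range(max_rounds):
        height = driver.execute_script(SCROLL_ELEMENT_JS, element)
        time.sleep(pause)  # wait for more rows to render
        if height == last_height:
            unchanged += 1
            if unchanged >= patience:
                break
        else:
            unchanged = 0
            last_height = height
    return last_height
```

Usage would look like scroll_element_to_end(driver, driver.find_element(By.CSS_SELECTOR, ".antiscroll-inner")).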
