I am trying to use Selenium with Python to open a TikTok user page and scroll down to load all of the user's videos.
I can open the URL and get the source code, including the data for all initially loaded videos, but when I scroll down, sleep for a while, and get the source code again, the page code is the same, with the same videos, and nothing new is loaded!
from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import json
from bs4 import BeautifulSoup
import time
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome('chromedriver',options=options)
wd.get("https://www.tiktok.com/#tiktok")
time.sleep(20)
#wd.implicitly_wait(10)
#print(wd.page_source)
SCROLL_PAUSE_TIME = 20
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
print(wd.page_source)
I also tried this code to scroll down:
wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(10)
print(wd.page_source)
but again nothing new shows up in the page source!
I am using Google Colab. Any help?
Update: changed the variable "driver" to "wd".
Update: this is the install code for the Chromium driver:
# install chromium, its driver, and selenium
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
Inside the while loop you start using a nonexistent variable called driver; I changed it to wd and it scrolled down, but the site then showed that there was a problem loading more content from there.
The code also throws an error:
[9612:864:0614/164525.919:ERROR:util.cc(127)] Can't create base directory: C:\Program Files\Google\GoogleUpdater
I searched for this error and it seems to be related to a mismatch between the versions of Chrome and ChromeDriver, as stated here: https://www.reddit.com/r/selenium/comments/uqt9z9/cant_create_base_directory/
That's as far as I got; hope it helps. :)
Here's my current code
from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import json
from bs4 import BeautifulSoup
import time
options = webdriver.ChromeOptions()
#options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome('chromedriver',options=options)
wd.get("https://www.tiktok.com/#tiktok")
time.sleep(20)
#wd.implicitly_wait(10)
#print(wd.page_source)
SCROLL_PAUSE_TIME = 20
# Get scroll height
last_height = wd.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = wd.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
print(wd.page_source)
I took out the headless argument to see the results more clearly
Related
So I've been using Selenium in Chrome to go to a social media profile and scrape the usernames of its followers. However, the list is in the 100s of thousands and the page only loads a limited amount. My solution was to tell Selenium to scroll down endlessly and scrape usernames using 'driver.find_elements' as it goes, but after a few hundred usernames Chrome soon crashes with the error code "Ran out of memory".
Am I even capable of getting that entire list?
Is Selenium even the right tool to use or should I use Scrapy? Maybe both?
I'm at a loss on how to move forward from here.
Here's my code just in case
from easygui import *
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

choice = ccbox("Run the test?", "", ("Run it", "I'm not ready yet"))
if choice == False:
    quit()

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
time.sleep(60) #this is a wait to give me time to manually log in and go
#to followers list
SCROLL_PAUSE_TIME = 0.5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        driver.execute_script("window.scrollTo(0, 1080);")
        time.sleep(1)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
    last_height = new_height
I figured it out! Every "follower" had its own element, and endless scrolling kept storing all of those elements in memory until it hit a limit. I solved this by deleting already-processed elements with JavaScript after scrolling a certain amount, then rinse and repeat until reaching the bottom :)
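A minimal sketch of that pattern, assuming followers are rendered under a hypothetical `div.followers-list` container and that a `scrape` callback extracts usernames from the currently loaded elements (both the selector and the callback are placeholders, not the real page's markup):

```python
import time

def scroll_and_prune(driver, scrape, pause=0.5, batch=200, max_rounds=10000):
    """Scroll to the bottom repeatedly; after each round, scrape the loaded
    items and delete them from the DOM so memory usage stays bounded."""
    collected = []
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        # Harvest the currently rendered items before removing them.
        collected.extend(scrape(driver))
        driver.execute_script(
            # Placeholder selector: replace with the real follower-item selector.
            "document.querySelectorAll('div.followers-list li')"
            f".forEach((el, i) => {{ if (i < {batch}) el.remove(); }});"
        )
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    return collected
```

Here `scrape(driver)` would call `driver.find_elements(...)` and return the usernames; removing each harvested batch keeps Chrome's live element count roughly constant instead of letting it grow into the hundreds of thousands.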
I am trying to use selenium to scroll down infinitely this webpage https://gfycat.com/discover/trending-gifs
I try this code:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--disable-extensions")
url = "https://gfycat.com/discover/trending-gifs"
driver = webdriver.Chrome(options=options, executable_path=r"C:\chromedriver.exe")
driver.get(url)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
driver.quit()
But no scroll down happened.
I also tried:
from selenium.webdriver.common.keys import Keys
for i in range(10):
    driver.find_element_by_css_selector('html').send_keys(Keys.END)
But again, no scrolling happened.
For infinite scrolling of a website you can use this approach in Selenium. As you can see, I am using a while loop to make the scrolling indefinite; you should also import the time module to wait for the website to load:
import time

def scroll(driver):
    timeout = 5
    # Get scroll height
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait for the website to load
        time.sleep(timeout)
        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # The height did not change, so the page is fully loaded
            break
        last_height = new_height
I have an Airtable table I review on occasion and tried to create a Python script using selenium to scroll down a full page until it gets to the end. Here's the code but I can't get it to scroll down. I don't get any errors but it seems like it doesn't connect with the page to scroll. Any help is appreciated. Thanks
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from time import sleep

url = 'https://airtable.com/embed/shrqYt5kSqMzHV9R5/tbl8c8kanuNB6bPYr?backgroundColor=green&viewControls=on'
driver = webdriver.Chrome()
driver.get(url)
driver.fullscreen_window()
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//html')))
scroll_pause_time = 5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    sleep(scroll_pause_time)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
The reason you are not able to scroll the table is that the page itself is not scrollable (it uses an "antiscroll" container). You can check this by simply trying to scroll manually: to load more of the page you have to drag the vertical scrollbar by clicking on it. To do that we can use the drag_and_drop_by_offset method of the ActionChains class, as below:
from selenium.webdriver import ActionChains
# After your page is loaded
page_height = driver.get_window_size()['height']  # Get page height
scroll_bar = driver.find_element_by_xpath("//div[contains(@class,'antiscroll-scrollbar-vertical')]")
ActionChains(driver).drag_and_drop_by_offset(scroll_bar, 0, page_height - 160).click().perform()  # Subtracted 160 from the page height to compensate for the difference between window and screen height
https://www.narendramodi.in/category/text-speeches -> I wanted to scrape this page. As it is a dynamic one, I need to scroll down to the bottom of the page and then get the HTML content to scrape it. But when this website is opened through the Selenium Chrome webdriver, it does not load dynamically as I scroll down, whether I scroll manually or automatically. When the website is opened in normal Chrome, it works just fine. I even tried the Firefox driver, and the result is the same. Here's the code that I have tried:
import time
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r'C:/tools/drivers/chromedriver.exe')
driver.get('https://www.narendramodi.in/news')
# https://stackoverflow.com/a/27760083
SCROLL_PAUSE_TIME = 2.0
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
print(last_height)
while True:
    # Scroll down to bottom
    time.sleep(SCROLL_PAUSE_TIME)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    print(new_height)
    if new_height == last_height:
        break
    last_height = new_height
res = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()
soup = BeautifulSoup(res, 'lxml')
How can I scrape this entire page?
Some websites detect the use of Selenium and stop loading their content.
You can try tuning Selenium's settings or using a package like selenium-stealth (PyPI link: https://pypi.org/project/selenium-stealth/)
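As a sketch of the "tuning Selenium's settings" option, these are Chrome flags and options commonly used to make automation harder to detect; treat them as a starting point rather than a guarantee, since sites can still fingerprint Selenium in other ways:

```python
STEALTH_ARGS = [
    # Stops Chrome from exposing navigator.webdriver = true to the page.
    "--disable-blink-features=AutomationControlled",
]

def apply_stealth(options):
    """Apply common anti-detection tweaks to a ChromeOptions-like object."""
    for arg in STEALTH_ARGS:
        options.add_argument(arg)
    # Hide the "Chrome is being controlled by automated test software" infobar
    # and skip loading the automation extension Selenium normally injects.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    return options
```

You would use it as options = apply_stealth(webdriver.ChromeOptions()) before creating the driver; selenium-stealth goes further by also patching JavaScript properties such as navigator.plugins after the driver starts.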
I am using selenium to scrape an infinite scrolling page.
I am trying to use this code:
import time
import pandas as pd
import numpy as np
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
url = 'https://twitter.com/search?f=tweets&q=csubwaystats%20since%3A2018-05-28%20until%3A2018-08-28'
browser.get(url)
time.sleep(1)
SCROLL_PAUSE_TIME = 0.5
# Get scroll height
last_height = webdriver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    webdriver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = webdriver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
I obtained this code from multiple sources, the most recent being:
How can I scroll a web page using selenium webdriver in python?
I updated it to include "webdriver" instead of "driver" because I import selenium as webdriver. It doesn't work otherwise.
My issue is that when I run the code I get:
AttributeError: module 'selenium.webdriver' has no attribute 'execute_script'
I don't really understand what this means and how to fix it? I haven't been able to find information on this.
I am new to python and so am probably missing something obvious but any advice would be appreciated.
webdriver is the name of the module, not your instance of it. In fact, you assigned the instance you created to the name browser with this line: browser = webdriver.Chrome()
so instead of calling webdriver.execute_script() (which will give you an AttributeError), you must call it using your instance, like this: browser.execute_script().
To make it work you have to create an instance of webdriver, e.g.:
from selenium import webdriver
driver = webdriver.Chrome() # webdriver.Ie(), webdriver.Firefox()...
last_height = driver.execute_script("return document.body.scrollHeight")
You can download Chromedriver from here
You also need to add path to Chromedriver to your environment variable PATH or just put downloaded file into the same folder as your Python executable...
AttributeError: module 'selenium.webdriver' has no attribute 'execute_script'
You are getting this error because execute_script is not a module-level function: you cannot call it directly on selenium.webdriver. It is an instance method, so you should create an instance of the driver class first. Please check here to learn more about classes.
This will now work, because execute_script is called on an instance:
last_height = browser.execute_script("return document.body.scrollHeight")
Your final code would then look like this:
import time
import pandas as pd
import numpy as np
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
url = 'https://twitter.com/search?f=tweets&q=csubwaystats%20since%3A2018-05-28%20until%3A2018-08-28'
browser.get(url)
time.sleep(1)
SCROLL_PAUSE_TIME = 0.5
# Get scroll height
last_height = browser.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    browser.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
For others: check your method name. In my case I had written the Java method name instead of the Python one:
driver.execute_script("script") # Python
driver.ExecuteScript("script"); # Java
Posting this here because it's the top Google result for the error.