I have tried to do this:
driver_1.execute_script("window.scrollTo(0, document.body.scrollHeight);")
but it does nothing, so I made a loop to scroll the page by steps:
import time

initial_value = 0
end = 300000
for i in range(1000, end, 1000):
    driver_1.execute_script("window.scrollTo(" + str(initial_value) + ", " + str(i) + ")")
    time.sleep(0.5)
    initial_value = i
    print('scrolling >>>>')
It kinda works, but I don't know how long a given page is, so I have to use a big number as the maximum height, which gives me two problems. First, even a big number may not be large enough to scroll some pages; second, if the page is shorter than that limit, I lose quite a lot of time waiting for the script to finish while it is doing nothing.
You need something to rely on, some indicator that tells you when to stop scrolling. Here is an example use case where we stop scrolling once more than N particular elements have already loaded (a minimal sketch follows the links below):
Slow scrolling down the page using Selenium
Similar use case:
Headless endless scroll selenium
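As a minimal sketch of that stop condition, assuming the items you are loading match a hypothetical CSS selector div.post and that N = 100 is a sensible threshold:

import time

N = 100  # hypothetical threshold - stop once this many items are loaded
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)  # give the page a moment to render the next batch
    # "div.post" is a placeholder selector - use whatever matches your items
    if len(driver.find_elements_by_css_selector("div.post")) > N:
        break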
FYI, you may have noticed another way to scroll to the bottom - scrolling a footer into view:
footer = driver.find_element_by_tag_name("footer")
driver.execute_script("arguments[0].scrollIntoView();", footer)
To scroll the page to the end, you could simply send the END key:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://stackoverflow.com/search?tab=newest&q=selenium")
driver.find_element_by_tag_name("body").send_keys(Keys.END)
You could also scroll the full height:
driver = webdriver.Firefox()
driver.get("http://stackoverflow.com/search?tab=newest&q=selenium")
driver.execute_script("window.scrollBy(0, document.documentElement.scrollHeight)")
Hey I found another solution that worked perfectly for me. Check this answer here.
Also, this implementation:
driver.find_element_by_tag_name("body").send_keys(Keys.END)
does not work for pages that use an infinite scrolling design (see the sketch below for a workaround).
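For infinite scrolling pages, a common pattern is to scroll, wait, and stop once the document height no longer grows. A minimal sketch, assuming new content keeps extending document.body.scrollHeight:

import time

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # wait for the next batch of content to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # the height stopped growing, so we reached the real bottom
    last_height = new_height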
I have tried driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") after the page has loaded to no avail. It simply does nothing.
Why isn't it working? Is there another method I can use?
scrollTo is usually the preferred way but not possible on every site.
Alternatively you can use this:
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

elem = driver.find_element(By.TAG_NAME, "html")
elem.send_keys(Keys.END)
However, I would much prefer requests instead of selenium.
There are several ways to scroll the page with Selenium.
In addition to
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
you can try:
from selenium.webdriver.common.keys import Keys
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)
But the most powerful way, which normally works even when the previous methods do not, is this: locate some element outside the visible screen, perhaps at the bottom of the page, and apply the following to it:
from selenium.webdriver.common.by import By

bottom_element = driver.find_element(By.XPATH, bottom_element_locator)
bottom_element.location_once_scrolled_into_view
This property is intended to return the (x, y) coordinates of the element on the page, but as a side effect it also scrolls the page right down to the target element.
Another way to scroll to the bottom of the page is to emulate the CTRL + END key combination:
from selenium.webdriver.common.keys import Keys
driver.find_element_by_css_selector('body').send_keys(Keys.CONTROL + Keys.END)
I am trying to scrape product's delivery date data from a bunch of lists of product urls.
I am running the Python file in a terminal with multiprocessing, so ultimately this opens multiple Chrome browsers (10 to 15 of them), which slows down my computer quite a bit.
My code basically clicks a block that contains shipping options, which shows a pop-up box with the estimated delivery time. I have included an example of a product url in the code below.
I noticed that some of my Chrome browsers freeze and do not locate the element and click it as written in my code. I've incorporated refreshing the page into my code just in case that does the trick, but it doesn't seem like the frozen browsers are even refreshing.
I don't know why it would do that, as I have set the webdriver to wait until the element is clickable. Do I just increase the time in time.sleep() or the seconds in WebDriverWait() to resolve this?
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import (NoSuchElementException,
                                        StaleElementReferenceException,
                                        TimeoutException)

chromedriver = "path to chromedriver"
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(chromedriver, options=options)

# Example url
url = "https://shopee.sg/Perfect-Diary-X-Sanrio-MagicStay-Loose-Powder-Weightless-Soft-velvet-Blurring-Face-Powder-With-Cosmetic-Puff-Oil-Control-Smooth-Face-Powder-Waterproof-Applicable-To-Mask-Face-Cinnamoroll-Purin-Gudetama-i.240344695.5946255000?ads_keyword=makeup&adsid=1095331&campaignid=563217&position=1"
driver.get(url)
time.sleep(2)

retries = 0
try:
    WebDriverWait(driver, 60).until(EC.element_to_be_clickable(
        (By.XPATH, '//div[@class="flex flex-column"]//div[@class="shopee-drawer "]'))).click()
    while retries <= 5:
        try:
            WebDriverWait(driver, 60).until(EC.element_to_be_clickable(
                (By.XPATH, '//div[@class="flex flex-column"]//div[@class="shopee-drawer "]'))).click()
            break
        except TimeoutException:
            driver.refresh()
            retries += 1
except (NoSuchElementException, StaleElementReferenceException):
    delivery_date = None
The element you desire is only displayed when you hover the mouse over it. The element is an svg, which you need to handle accordingly.
You can use this XPath to hover the mouse:
((//*[name()='svg']//*[name()='g'])/*[name()='path'][starts-with(@d, 'm11 2.5c0 .1 0 .2-.1.3l')])
To get the text from the pop-up, you need to check with this XPath:
((//div[@class='shopee-drawer__contents']//descendant::div[4]))
You can use get_attribute("innerText") to get all the values
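A minimal sketch tying those steps together with ActionChains (the XPaths are the ones above; waiting for the pop-up to render is left to you):

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Hover over the svg icon so the pop-up gets rendered
icon = driver.find_element(By.XPATH, "((//*[name()='svg']//*[name()='g'])/*[name()='path'][starts-with(@d, 'm11 2.5c0 .1 0 .2-.1.3l')])")
ActionChains(driver).move_to_element(icon).perform()

# Read the pop-up contents once it is visible
popup = driver.find_element(By.XPATH, "((//div[@class='shopee-drawer__contents']//descendant::div[4]))")
print(popup.get_attribute("innerText"))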
You can also check the same answer here; I hope it helps.
First, don't use the headless option in webdriver.ChromeOptions(), so the webpage window pops up and you can observe whether the element is clicked or not.
Second, your code just clicks the element; it doesn't GET anything. After the click opens the new window, it should do something like this:
items = WebDriverWait(driver, 60).until(
    EC.visibility_of_all_elements_located((By.CLASS_NAME, 'load-done'))
)
for item in items:
    deliver_date = item.get_attribute('text')
    print(deliver_date)
I am working on a project to scroll through a page and grab every post that everyone posted. The problem is that my code does not read every post (just 2 or 3, then it skips to the next). Below is my code; I would like it to read every post. I have also tried changing the sleep() duration, the pixel count while scrolling, and the scroll-into-view options, but with no improvement.
import time

# scrolling and grabbing data
for i in range(1000):
    element = driver.find_element_by_xpath('//div[contains(@class, "mnk10 copy-txt")]')
    # driver.execute_script("return document.body.scrollHeight / 2", element)
    driver.execute_script("arguments[0].scrollIntoView(true)", element)
    # driver.execute_script("arguments[0].scrollBy(0, -300)", element)
    # driver.execute_script("return, document.body.scrollHeight/4", element)
    data1 = driver.find_element_by_xpath('//div[contains(@class, "mnk10 copy-txt")]').get_attribute('dat-plin-txt')
    print(data1)
    time.sleep(2)
You could try to scroll by a fixed amount of one post instead of "scrollIntoView(true)", using the following script part:
driver.execute_script("arguments[0].scrollBy(0, 500)", element)
You might or might not have to change the "500" part.
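If you'd rather not hard-code the pixel amount, a small sketch of an alternative is to read the post's rendered height and scroll the window by that much (this assumes element is the post you just processed):

# element.size returns the rendered dimensions of the element
post_height = element.size['height']
driver.execute_script("window.scrollBy(0, arguments[0]);", post_height)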
In Python 3 and Selenium I have this script to automate the search of terms on a site with public information:
from selenium import webdriver
# Driver Path
CHROME = '/usr/bin/google-chrome'
CHROMEDRIVER = '/home/abraji/Documentos/Code/chromedriver_linux64/chromedriver'
# Chosen browser options
chrome_options = webdriver.chrome.options.Options()
chrome_options.add_argument('--window-size=1920,1080')
chrome_options.binary_location = CHROME
# Website accessed
link = 'https://pjd.tjgo.jus.br/BuscaProcessoPublica?PaginaAtual=2&Passo=7'
# Search term
nome = "MARCONI FERREIRA PERILLO JUNIOR"
# Waiting time
wait = 60
# Open browser
browser = webdriver.Chrome(CHROMEDRIVER, options = chrome_options)
# Implicit wait
browser.implicitly_wait(wait)
# Access the link
browser.get(link)
# Search by term
browser.find_element_by_xpath("//*[@id='NomeParte']").send_keys(nome)
browser.find_element_by_xpath("//*[@id='btnBuscarProcPublico']").click()
# Searches for the text of the last icon - the last page button
element = browser.find_element_by_xpath("//*[@id='divTabela']/div[2]/div[2]/div[4]/div[2]/ul/li[9]/a").text
element
'»'
When searching for terms, this site paginates the results and always shows the "»" button as the last pagination button.
The next-to-last button in this case will be "›".
So I always need to capture the text of the button two positions before the last one - here, the number "8" - to automate the page changes, because that tells me how many clicks on "next page" will be needed.
Please, when I search with XPath, how do I capture the element two positions before the last?
I know this is not an answer to the original question.
But clicking the next button several times is not a good practice.
I checked the network traffic and saw that they are calling their API URL with an offset parameter. You should be able to use this URL with the proper offset as you need.
If you really need to access the button two positions before the last, you need to get all the navigation buttons first and then access them by index, as follows.
elems = browser.find_elements_by_xpath(xpath)
elems[-3]  # "»" is elems[-1], "›" is elems[-2], so the "8" button is elems[-3]
EDIT
I just tested their API and it works when the proper cookie value is given.
This way is much faster than automation using Selenium.
Use Selenium just to extract the cookie value to be used in the header of the web request; a sketch follows.
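As a minimal sketch of that approach (the API URL and the parameter names here are assumptions - take the real ones from the network traffic you observed):

import requests

# Grab the session cookies from the Selenium-driven browser
cookies = {c['name']: c['value'] for c in browser.get_cookies()}

# Hypothetical endpoint and offset parameter - replace them with the
# real API URL seen in the network tab
api_url = "https://pjd.tjgo.jus.br/BuscaProcessoPublica"
for offset in range(0, 100, 10):
    response = requests.get(api_url, params={"PaginaAtual": 2, "Offset": offset}, cookies=cookies)
    print(response.status_code)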
I've never studied HTML seriously, so what I'm going to say is probably not right.
While I was writing my Selenium code, I noticed that some buttons on some webpages do not redirect to other pages but instead change the structure of the first one. From what I've understood, this happens because some JavaScript code modifies it.
So, when I want to get some data which is not present on the first page load, I just have to click the right sequence of buttons to obtain it, right?
The page I want to load is https://watch.nba.com/, and what I want to get is the match list of a given day. The fastest path to get it is to open the calendar:
calendary = driver.find_element_by_class_name("calendar-btn")
calendary.click()
and click the selected day:
page = calendary.find_element_by_xpath("//*[contains(text(), '" + time.strftime("%d") + "')]")
page.click()
Running this code, I get this error:
selenium.common.exceptions.ElementNotVisibleException
I read somewhere that the problem is that the page is not correctly loaded, or that the element is not visible/clickable, so I tried this:
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.XPATH, "//*[contains(text(), '" + time.strftime("%d") + "')]")))
page.click()
But this time I get this error:
selenium.common.exceptions.TimeoutException
Can you help me to solve at least one of these two problems?
The reason you are getting such behavior is that this page is loaded with iframes (I can see 15 on the main page). Once you click the calendar button, you will need to switch your context either to the default content or to the specific iframe where the calendar resides. There is tons of code out there that shows you how to switch into and out of an iframe; a short sketch follows. Hope this helps.
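A minimal sketch of the switching pattern (the frame locator here is hypothetical - inspect the page to find the iframe that actually contains the calendar):

from selenium.webdriver.common.by import By

# Switch into the iframe that holds the calendar (locator is a guess)
frame = driver.find_element(By.CSS_SELECTOR, "iframe.calendar-frame")
driver.switch_to.frame(frame)

# ... interact with the calendar elements here ...

# Switch back to the top-level document when done
driver.switch_to.default_content()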