I am trying to web scrape a dynamically loaded page with Selenium. I can copy and paste the URL below into a normal Chrome browser and it works perfectly fine, but when I use Selenium it returns the wrong page of horse races, for a different day. It seems to work the first time you run the code, but it retains some sort of memory: you cannot run it again with a different date, as it just returns the original date.
from selenium import webdriver

url = "https://www.tab.com.au/racing/meetings/2021-06-11"
driver = webdriver.Chrome('xxxxxxxx')  # 'xxxxxxxx' is the path to chromedriver
driver.get(url)
Has anyone ever come across something like this with Selenium?
I want to get the latest result from the Aviator game each time it crashes. I'm trying to do it with Python and Selenium, but I can't get it to work. The website takes some time to load, which complicates the process, since the classes are not loaded from the beginning.
This is the website I'm trying to scrape: https://estrelabet.com/ptb/bet/main
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

url = 'https://estrelabet.com/ptb/bet/main'

options = Options()
options.headless = True
navegador = webdriver.Chrome(options=options)
navegador.get(url)
# find_element() needs a locator, e.g. navegador.find_element(By.CSS_SELECTOR, '...')
navegador.quit()
This is what I've done so far. I want to get all the elements in the results (payout) block, and get each of these results individually.
I tried to extract the data using Selenium, but it was impossible since the IDs and elements were dynamic. I was able to extract the data using an OCR library called Tesseract. I share the code I used for this purpose; I hope it helps you:
AviatorScraping on GitHub
I unfortunately cannot stop a page from loading using Selenium in Python.
I have tried:
driver.execute_script("window.stop();")
driver.set_page_load_timeout(10)
webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
The page is a .cgi that constantly loads. I would like to either scrape data from a class on the page or the page title; however, neither works with the three methods above.
When I try to manually press ESC, or click the cross, it works perfectly.
Thank you for reading.
You didn't share your code or the page you are working on, so we can only guess.
So, in case you really tried all of the above correctly and it still didn't help, try adding the eager page load strategy to your driver options.
The eager page load strategy makes WebDriver wait until the initial HTML document has been completely loaded and parsed (i.e. the DOMContentLoaded event has fired), and skips loading of stylesheets, images and subframes.
With it your code will look something like this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.page_load_strategy = 'eager'
driver = webdriver.Chrome(options=options)
# Navigate to url
driver.get(your_page_url)
UPD
You are trying to upload a file with Selenium and doing it wrong.
To upload a file with Selenium you need to send the full file path to the file input element.
So, if the file you want to upload is located at C:/Model.lp, your code should be:
driver.find_element_by_xpath("//input[@name='field.1']").send_keys("C:/Model.lp")
Let's use the URL https://www.google.cl/#q=stackoverflow as an example. Using Chrome Developer Tools on the first link given by the search, we see this HTML code:
Now, if I run this code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = urlopen("https://www.google.cl/#q=stackoverflow")
soup = BeautifulSoup(url, "html.parser")
print(soup.prettify())
I won't find the same elements. In fact, I won't find any link from the results given by the Google search. The same goes if I use the requests module. Why does this happen? Can I do something to get the same results as if I were requesting from a web browser?
Since the HTML is generated dynamically, likely by a modern single-page JavaScript framework like Angular or React (or even just plain JavaScript), you will need to actually drive a browser to the site using Selenium or PhantomJS before parsing the DOM.
Here is some skeleton code.
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get("http://google.com")
html = driver.execute_script("return document.documentElement.innerHTML")
soup = BeautifulSoup(html, "html.parser")
Here is the selenium documentation for more info on running selenium, configurations, etc.:
http://selenium-python.readthedocs.io/
edit:
you will likely need to add a wait before grabbing the HTML, since it may take a second or so to load certain elements of the page. See below for a reference to the explicit wait documentation for Python Selenium:
http://selenium-python.readthedocs.io/waits.html
Another source of complication is that certain parts of the page might be hidden until after user interaction. In this case you will need to code your Selenium script to interact with the page in certain ways before grabbing the HTML.
I am trying to crawl a website with Python and get its users' info. But when I download the source of the pages, it is different from what I see in Inspect Element in Chrome. I googled, and it seems I should use Selenium, but I don't know how to use it. This is the code I have; when I look at driver.page_source it is still the same source as the "view source" page in Chrome and doesn't look like the source in Inspect Element.
I would really appreciate it if someone could help me fix this.
import os
from selenium import webdriver
chromedriver = "/Users/adam/Downloads/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.get("http://www.tudiabetes.org/forum/users/Bug74/activity")
driver.quit()
It's called XHR.
Your page was loaded from another call: your URL only loads the structure of the page, and the meat of the page comes from a different source via XHR, as a JSON-formatted string, not from the page load itself.
You should really consider using requests and bs4 to query that source instead.
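You can find the XHR endpoint in Chrome DevTools under Network > XHR while the profile page loads. A sketch of what querying it with requests might look like is below; the endpoint URL and the JSON field names are assumptions, so check them against what DevTools actually shows:

```python
import requests

# Hypothetical XHR endpoint - the real URL and field names must be taken
# from the Network tab in DevTools while the page loads.
ENDPOINT = "http://www.tudiabetes.org/forum/user_actions.json?username=Bug74"


def extract_actions(payload):
    """Pull the excerpt of each activity record out of the JSON payload."""
    return [a.get("excerpt", "") for a in payload.get("user_actions", [])]


if __name__ == "__main__":
    resp = requests.get(ENDPOINT, timeout=10)
    resp.raise_for_status()
    print(extract_actions(resp.json()))
```

Hitting the JSON source directly is usually faster and more stable than driving a browser, since there is no rendering step and no dynamic IDs to chase.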
I am pretty new to using Python with Selenium web testing.
I am creating a handful of test cases for my website, and I would like to see how long it takes specific pages to load. I was wondering if there is a way to print the page load time after or during the test.
Here is a basic example of what one of my test cases looks like:
import time
import unittest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("some URL")
driver.implicitly_wait(10)
element = driver.find_element_by_name("username")
element.send_keys("User")
element = driver.find_element_by_name("password")
element.send_keys("Pass")
element.submit()
time.sleep(2)
driver.close()
In this example I would like to see how long it took for the page to load after submitting my log in information.
I have found a way around this by running my tests as Python unit tests. I now record my steps using the Selenium IDE and export them into a Python file. I then modify the file as needed. After the test runs, it shows the time by default.