How can I web scrape Aviator game results? - python

I want to get the latest result from the Aviator game each time it crashes. I'm trying to do it with Python and Selenium, but I can't get it to work: the website takes some time to load, which complicates the process because the elements (and their classes) are not present when the page first loads.
This is the website I'm trying to scrape: https://estrelabet.com/ptb/bet/main
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = 'https://estrelabet.com/ptb/bet/main'
options = Options()
# options.headless was deprecated and later removed in Selenium 4
options.add_argument('--headless=new')
navegador = webdriver.Chrome(options=options)
navegador.get(url)
# fails as written: find_element() needs a locator (By, value)
navegador.find_element()
navegador.quit()
This is what I've done so far. I want to get all the elements in the results block:
[image: payout block]
and then read each of these results individually:
[image: result]

I tried to extract the data using Selenium, but it was not possible because the IDs and elements were dynamic. I was able to extract the data using an OCR library called Tesseract instead. I'm sharing the code I used for this purpose; I hope it helps you:
AviatorScraping github
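If you would rather stay with Selenium than use OCR, the late-loading part of the question is usually handled with an explicit wait. The following is only a minimal sketch under several assumptions: `parse_multiplier`, `latest_if_changed`, and `read_payouts` are hypothetical helper names, the CSS selector has to come from your own inspection of the page (none is given here), and the newest payout is assumed to be listed first. If the game is rendered inside an iframe, you would also need `driver.switch_to.frame(...)` before looking up elements.

```python
import re


def parse_multiplier(text):
    """Convert a payout label such as '1.57x' to a float, or None if it does not match."""
    match = re.fullmatch(r"\s*(\d+(?:[.,]\d+)?)\s*x?\s*", text, flags=re.IGNORECASE)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


def latest_if_changed(previous, current):
    """Return the newest payout if the list differs from the last poll, else None.

    Assumption: the page lists the newest payout first.
    """
    if current and current != previous:
        return current[0]
    return None


def read_payouts(driver, css_selector, timeout=30):
    """Wait until elements matching css_selector exist, then parse their text.

    css_selector is a placeholder you must supply; it is not taken from the site.
    """
    # Imported here so the pure helpers above can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
    )
    values = [parse_multiplier(element.text) for element in elements]
    return [value for value in values if value is not None]
```

A polling loop would call `read_payouts` every second or so, pass the previous and current lists to `latest_if_changed`, and record the value whenever it is not `None`.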

Related

How can I scrape data from this table with Python?

Unfortunately, I am an absolute beginner in the field of web scraping, but I would like to deal with it intensively in the near future. I want to save the data of a table to an Excel file with a Python script, which in itself is not a problem. However, the source code of the website does not contain any of the values that I would like to have. When I inspect the page, the values appear in the HTML structure, but when I query them by XPath, I am told that this is not permitted. The Chrome add-on "DataMiner" can read out the values. How can I achieve this myself in Python? The picture shows the data I want to scrape. Unfortunately, this data is not included in the source code.
import time
from selenium import webdriver

url = 'https://herakles.webuntis.com/WebUntis/monitor?school=Europaschule%20Gym%20Rhauderfehn&monitorType=subst&format=Test%20Sch%C3%BCler'
browser = webdriver.Chrome()
browser.get(url)
time.sleep(5)  # crude fixed wait so the page's JavaScript can fill in the table
htmlSource = browser.page_source
print(htmlSource)
Update: The script now prints out the source code, but when searching for an element by the XPath, it still doesn't show anything. As I already said, I'm completely new to Python and web-scraping.
image
Here's a version that uses requests only. You can obtain the payload data from the Network tab of your browser's dev tools.
import requests
get_url="https://herakles.webuntis.com/WebUntis/monitor?school=Europaschule%20Gym%20Rhauderfehn&monitorType=subst&format=Test%20Sch%C3%BCler"
post_url="https://herakles.webuntis.com/WebUntis/monitor/substitution/data?school=Europaschule Gym Rhauderfehn"
payload={"formatName":"Test Schüler","schoolName":"Europaschule Gym Rhauderfehn","date":20211204,"dateOffset":0,"strikethrough":True,"mergeBlocks":True,"showOnlyFutureSub":True,"showBreakSupervisions":False,"showTeacher":True,"showClass":True,"showHour":True,"showInfo":True,"showRoom":True,"showSubject":True,"groupBy":1,"hideAbsent":True,"departmentIds":[],"departmentElementType":-1,"hideCancelWithSubstitution":True,"hideCancelCausedByEvent":False,"showTime":False,"showSubstText":True,"showAbsentElements":[],"showAffectedElements":[1],"showUnitTime":True,"showMessages":True,"showStudentgroup":False,"enableSubstitutionFrom":True,"showSubstitutionFrom":1600,"showTeacherOnEvent":False,"showAbsentTeacher":True,"strikethroughAbsentTeacher":True,"activityTypeIds":[2,3],"showEvent":True,"showCancel":True,"showOnlyCancel":False,"showSubstTypeColor":False,"showExamSupervision":False,"showUnheraldedExams":False}
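with requests.Session() as s:
    # Load the monitor page first so the session holds whatever cookies it sets
    r = s.get(get_url)
    s.headers['Content-Type'] = "application/json;charset=UTF-8"
    # Post the same JSON payload the page itself sends
    r = s.post(post_url, json=payload)
    print(r.json())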
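The payload above hardcodes `"date": 20211204`. Assuming this field is simply the day written as a `YYYYMMDD` integer (inferred from the value itself, not from any WebUntis documentation), a small helper can fill it in for whichever day you need. `untis_date` is a name made up for this sketch.

```python
from datetime import date


def untis_date(day):
    """Format a datetime.date as a YYYYMMDD integer, e.g. 2021-12-04 -> 20211204."""
    return int(day.strftime("%Y%m%d"))


# Usage: overwrite the hardcoded value before posting, for example
# payload["date"] = untis_date(date.today())
print(untis_date(date(2021, 12, 4)))  # 20211204
```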

Selenium get(url) showing wrong page

I am trying to web scrape a dynamically loaded page with Selenium. I can copy and paste the URL below into a normal Chrome browser and it works perfectly fine, but when I use Selenium, it returns the wrong page, showing horse races for a different day. It seems to work the first time I run the code, but it then appears to retain some sort of memory: running it again with a different date just returns the original date.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
url = "https://www.tab.com.au/racing/meetings/2021-06-11"
driver = webdriver.Chrome('xxxxxxxx')
driver.get(url)
Has anyone ever come across something like this with Selenium?

Selenium Webdriver for webscraping on multiple websites concurrently?

I am working on webscraping flight prices using Selenium Webdriver. I want my code to be able to search flight prices for multiple trips. As of now my code works for 1 destination only.
Most answers that I find online involve using a for loop over the specific URLs of multiple destinations, which is not applicable to my case because the URLs depend on the destinations that I choose.
Does anyone know how I can search for these prices concurrently, without waiting for individual searches to complete? Or is there perhaps an even faster way to do this?
Thanks!
I believe you could use a ProcessPoolExecutor to fetch the flights concurrently. Here is an example that I have used with Selenium:
Script to execute your Selenium function:
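# MultiProcess
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor, as_completed

urls = [url1, url2, url3]  # placeholders: replace with your own search URLs
N = 4  # Number of worker processes that you want to use

if __name__ == "__main__":
    futures = []
    # Execute each bot
    with ProcessPoolExecutor(N) as executor:
        for url in urls:
            command = [sys.executable, "mySeleniumScript.py", url]
            # subprocess.call blocks until the script exits, so at most N run at once
            futures.append(executor.submit(subprocess.call, command))
        for future in as_completed(futures):
            print("exit code:", future.result())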
In this case, your Python script containing the Selenium scraper should read the URL from its command-line arguments, like this:
mySeleniumScript.py
from selenium import webdriver
import sys
url = sys.argv[1]
driver = webdriver.Firefox()
driver.get(url)
# *** Your scraper logic here ***
driver.quit()
Hopefully this points you in the right direction. Let me know how it goes!
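As an alternative to launching one Python process per URL, a thread pool is often enough, because the time is spent waiting on the browser and the network rather than on Python code. The sketch below is not the approach above but a variation on it: `run_concurrently` is a hypothetical helper, and `scrape` stands for whatever function you write that takes one URL and returns its result. Each call to `scrape` should create and quit its own driver rather than share one across threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_concurrently(scrape, urls, max_workers=4):
    """Call scrape(url) for every URL in a thread pool.

    Returns a dict mapping each URL to its result, or to the exception it raised,
    so one failed search does not discard the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(scrape, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if a single URL fails
                results[url] = exc
    return results
```

For example, `run_concurrently(my_flight_scraper, search_urls, max_workers=3)` would keep up to three browser sessions busy at once, where `my_flight_scraper` and `search_urls` are your own function and list.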

Why does python and my web browser show different codes for the same link?

Let's use the url https://www.google.cl/#q=stackoverflow as an example. Using Chrome Developer Tools on the first link given by the search we see this html code:
Now, if I run this code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
response = urlopen("https://www.google.cl/#q=stackoverflow")
soup = BeautifulSoup(response, "html.parser")
print(soup.prettify())
I won't find the same elements. In fact, I won't find any of the links from the results of the Google search. The same goes if I use the requests module. Why does this happen? Can I do something to get the same results as if I were requesting the page from a web browser?
Since the HTML is generated dynamically, likely by a modern single-page JavaScript framework like Angular or React (or even just plain JavaScript), you will need to actually drive a browser to the site using Selenium or PhantomJS before parsing the DOM.
Here is some skeleton code.
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get("http://google.com")
html = driver.execute_script("return document.documentElement.innerHTML")
soup = BeautifulSoup(html, "html.parser")
Here is the selenium documentation for more info on running selenium, configurations, etc.:
http://selenium-python.readthedocs.io/
Edit:
You will likely need to add a wait before grabbing the HTML, since certain elements of the page may take a second or so to load. See the explicit wait documentation for Python Selenium:
http://selenium-python.readthedocs.io/waits.html
Another source of complication is that certain parts of the page might be hidden until after user interaction. In this case you will need to code your Selenium script to interact with the page in certain ways before grabbing the HTML.
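A further detail applies to the exact URL in the question: everything after `#` is a URL fragment, which the client keeps to itself and does not send to the server. So `urlopen` receives the plain home page, and in a real browser it is the page's JavaScript that reads `#q=stackoverflow` and loads the results. The standard library can show the split:

```python
from urllib.parse import urldefrag, urlsplit

url = "https://www.google.cl/#q=stackoverflow"

parts = urlsplit(url)
print(repr(parts.query))     # '' because there is no query string at all
print(repr(parts.fragment))  # 'q=stackoverflow'

# urldefrag separates the part that is actually requested from the fragment
requested, fragment = urldefrag(url)
print(requested)  # https://www.google.cl/
```

A query-string form such as `https://www.google.cl/search?q=stackoverflow` does reach the server, although the returned page may still differ from what a browser renders after running its scripts.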

Python Selenium: Looking for ways to print page load time

I am pretty new to using Python with Selenium for web testing.
I am creating a handful of test cases for my website and I would like to see how long it takes for specific pages to load. I was wondering if there is a way to print the page load time after or during the test.
Here is a basic example of what one of my test cases looks like:
import time
import unittest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("some URL")
driver.implicitly_wait(10)
# find_element_by_name was removed in Selenium 4; use find_element(By.NAME, ...)
element = driver.find_element(By.NAME, "username")
element.send_keys("User")
element = driver.find_element(By.NAME, "password")
element.send_keys("Pass")
element.submit()
time.sleep(2)
driver.close()
In this example I would like to see how long it took for the page to load after submitting my log in information.
I have found a way around this by running my tests as Python unit tests. I now record my steps using the Selenium IDE and export them to a Python file, which I then modify as needed. After the test runs, the test runner shows the elapsed time by default.
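If you want the number inside the script rather than from the test runner, two rough options are to time the step yourself or to ask the browser for its own measurement. This is a sketch, not a definitive recipe: `timed` and `browser_load_ms` are hypothetical helper names, and the second one assumes the browser still exposes the older `window.performance.timing` object.

```python
import time


def timed(action):
    """Run action() and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


# Selenium runs the script as a function body, so a top-level return is allowed.
NAV_TIMING_JS = (
    "const t = window.performance.timing;"
    "return t.loadEventEnd - t.navigationStart;"
)


def browser_load_ms(driver):
    """Ask the browser how long the current page took to load, in milliseconds."""
    return driver.execute_script(NAV_TIMING_JS)
```

In the login example, `_, seconds = timed(element.submit)` followed by `print(seconds)` would report how long the submit call took to return; calling `browser_load_ms(driver)` once the next page has finished loading gives the browser's own figure.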
