Python webScraping - Betclic URL's - python

EDIT:
So I found a way to do it by clicking on the Countries elements, see my answer.
Still have one question that would make this better:
When I execute the scrollIntoView(true) on a country <li> it goes under another element (<div class="sportList_subtitle">Desportos</div>) and is not clickable.
Is there some javascript or selenium function like "scrollIntoClickable"?
ORIGINAL:
I'm trying to scrape info from Betclic website with python and BeautifulSoup + Selenium.
Given the URL for each game has the structure: "domain"/"sports_url"/"competition_url"/"match_url"
Example: https://www.betclic.pt/futebol-s1/liga-dos-campeoes-c8/rennes-chelsea-m2695669
You can try it in your own language; they translate the actual URL string, but the structure and IDs are the same.
The only thing left is grabbing all the different "competition_url" values.
So my question now is: from the "sports_url" (https://www.betclic.pt/futebol-s1), how can I get all the sub "competition_url" values?
The problem is with the "hidden" URLs under each country's name on the left panel. Those only appear after you click the arrow next to each country's name, like a drop-down list. The click event adds the class "is-active" to the <li> for that country, and also a <ul> at the end of that <li>. It's this added <ul> that has the list of URLs I'm trying to get.
Code before click:
<!---->
<li class="sportList_item has-children ng-star-inserted" routerlinkactive="active-link" id="rziat-DE">
<div class="sportList_itemWrapper prebootFreeze">
<div class="sportlist_icon flagsIconBg is-DE"></div>
<div class="sportlist_name">Alemanha</div>
</div>
<!---->
</li>
Code after click (reduced for presentation):
<li class="sportList_item has-children ng-star-inserted is-active" routerlinkactive="active-link" id="rziat-DE">
<div class="sportList_itemWrapper prebootFreeze">
<div class="sportlist_icon flagsIconBg is-DE"></div>
<div class="sportlist_name">Alemanha</div>
</div>
<!---->
<ul class="sportList_listLv2 ng-star-inserted">
<!---->
<li class="sportList_item ng-star-inserted" routerlinkactive="active-link">
<a class="sportList_itemWrapper prebootFreeze" id="competition-link-5" href="/futebol-s1/alemanha-bundesliga-c5">
<div class="sportlist_icon"></div>
<div class="sportlist_name">Alemanha - Bundesliga</div>
</a>
</li>(...)
</li>(...)
</li>(...)
</li>
</ul>
</li>
In this example, it is that "/futebol-s1/alemanha-bundesliga-c5" that I'm looking for.
Is there a way to get all those URLs? Or the "hidden" <ul>, for that matter?
Maybe a way to simulate the click and parse the HTML code again?
Thanks in advance!
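Once a country has been clicked and the extra <ul> is present, the competition links can be read from the re-fetched page source. The following is a minimal sketch, assuming the expanded markup matches the "after click" snippet above. It uses the standard-library html.parser instead of BeautifulSoup only to stay dependency-free; the CompetitionLinkParser class and the BASE_URL constant are hypothetical names, and in practice the HTML string would come from driver.page_source after the click.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.betclic.pt"  # assumption: site root used to resolve relative hrefs


class CompetitionLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose id starts with 'competition-link'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        link_id = attributes.get("id") or ""
        href = attributes.get("href")
        if link_id.startswith("competition-link") and href:
            self.links.append(urljoin(BASE_URL, href))


# Stand-in for driver.page_source after the country <li> was clicked.
expanded_html = """
<li class="sportList_item has-children ng-star-inserted is-active" id="rziat-DE">
  <ul class="sportList_listLv2 ng-star-inserted">
    <li class="sportList_item ng-star-inserted">
      <a class="sportList_itemWrapper prebootFreeze" id="competition-link-5"
         href="/futebol-s1/alemanha-bundesliga-c5">
        <div class="sportlist_name">Alemanha - Bundesliga</div>
      </a>
    </li>
  </ul>
</li>
"""

parser = CompetitionLinkParser()
parser.feed(expanded_html)
print(parser.links)
```

The same parser can be fed the page source again after each country click, adding the results to a set to avoid duplicates.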

So I found a way to do it by clicking on the Countries elements.
Still have one question that would make this better:
When I execute the scrollIntoView(true) on a country <li> it goes under another element (<div class="sportList_subtitle">Desportos</div>) and is not clickable.
Is there some javascript or selenium function like "scrollIntoClickable"?
How I'm doing it now:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
url = "https://www.betclic.pt/"
driver.get(url)
link_set = set()
# All sport entries in the left menu
all_sports = driver.find_element_by_css_selector(
    ("body > app-desktop > div.layout > div > app-left-menu > div >"
     " app-sports-nav-bar > div > div:nth-child(2) > ul")
).find_elements_by_tag_name("li")
# Dismiss the cookie banner if it is shown
try:
    cookies = driver.find_element_by_css_selector("body > app-desktop > bc-gb-cookie-banner > div > div > button")
    cookies.click()
except Exception:
    print("Cookie error or not found...")
for sport in all_sports:
    sport.click()
    has_container = driver.find_element_by_tag_name("app-block-ext").size.get('height') > 0
    if not has_container:
        # No country list: the competition links are directly on the page
        for competition in driver.find_elements_by_css_selector("a[id*='block-link-']"):
            link_set.add(competition.get_attribute("href"))
            driver.execute_script("arguments[0].scrollIntoView(true);", competition)
    else:
        driver.execute_script("arguments[0].scrollIntoView(true);", driver.find_element_by_tag_name("app-block-ext"))
        all_countries = driver.find_elements_by_css_selector("li[id^='rziat']")
        for country in all_countries:
            # Clicking a country expands its <ul> of competition links
            country.click()
            competitions = driver.find_elements_by_css_selector("a[id^='competition-link']")
            for element in competitions:
                link_set.add(element.get_attribute("href"))
            driver.execute_script("arguments[0].scrollIntoView(true);", country)
for link in sorted(link_set):
    print(link)
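On the "scrollIntoClickable" question: as far as I know there is no built-in with that name, but the DOM scrollIntoView method also accepts an options object, and {block: 'center'} places the element in the middle of the viewport instead of at the top edge, where a fixed header can cover it. The following is a minimal sketch; click_centered is a hypothetical helper name, and it assumes the overlapping element only covers the top part of the panel.

```python
def click_centered(driver, element):
    """Scroll `element` to the vertical center of the viewport, then click it.

    `driver` is expected to be a Selenium WebDriver and `element` a WebElement;
    only execute_script() and click() are used.
    """
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center', inline: 'nearest'});",
        element,
    )
    element.click()
```

In the loop above, click_centered(driver, country) would take the place of country.click(), and the trailing scrollIntoView(true) call on the country would no longer be needed.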

Related

Python Selenium How to find elements by XPATH with info from TAG and SUB TAG

HTML:
<div id="related">
<a class="123" href="url">
<h3 class="456">
<span id="id00" aria-label="TEXT HERE">
</span>
</h3>
</a>
<a class="123" href="url">
<h3 class="456">
<span id="id00" aria-label="NOT HERE">
</span>
</h3>
</a>
</div>
I'm trying to find and click on the <a> element (inside the div with id="related") that has class="123" and whose <span> has an aria-label containing "TEXT":
items = driver.find_elements(By.XPATH, "//div[@id='related']//a[@class='123'][contains(@href, 'url')]//span[contains(@aria-label, 'TEXT']")
But it's not finding the <a>; it's only finding the span.
Then I want to do:
items[3].click()
How can I do that?
Your XPath has some typos.
Try this:
items = driver.find_elements(By.XPATH, "//div[@id='related']//a[@class='123'][contains(@href,'watch?v=')]//span[contains(@aria-label,'TEXT')]")
This will give you the span element inside the presented block.
To locate the a element you should use another XPath.
UPD
Finding all the a elements that are inside the div with @id='related' and that contain a span with a specific aria-label attribute translates directly to XPath like this:
items = driver.find_elements(By.XPATH, "//div[@id='related']//a[@class='123' and .//span[contains(@aria-label,'TEXT')]]")

Find subdivs within selenium in python ( selenium.webdriver.firefox.webelement)

I use selenium to access all divs which contain roster information:
# this returns a list of selenium.webdriver.firefox.webelement.FirefoxWebElement objects
divs = driver.find_elements_by_class_name('appointment-template')
Each of these divs should look like this:
<div class="appointment-template" id="appointment-3500508">
<p class="title">Mentoruur</p>
<p class="time">11:15<span class="time-end"> - 12:15</span></p>
<ul class="facility">
<li onclick=""
title="HIC_Online">ONLINE
</li>
</ul>
<ul class="docent">
<li onclick=""
title="HANSJE">HANSJE
</li>
</ul>
<ul class="group">
<li onclick=""
title="ASD123">ASD123
</li>
</ul>
The next thing I want to do is access values like the docent name and time values that lie within this div:
for div in divs:
    print(div.find_element_by_class_name('title'))
    print(div.find_element_by_class_name('time'))
This does not seem to work:
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: .title
How can I use selenium to get the values like:
Mentoruur
11:15 - 12:15
Hansje
To get the title (Mentoruur), try the CSS selector below:
div.appointment-template p.title
Use it like this:
title = driver.find_element(By.CSS_SELECTOR, "div.appointment-template p.title").text
print(title)
To get the time:
div.appointment-template p.time
Code:
time = driver.find_element(By.CSS_SELECTOR, "div.appointment-template p.time").text
print(time)
You can get the other values the same way.
To locate an element inside another element, it's better to use this technique:
for div in divs:
    print(div.find_element_by_xpath('.//p[@class="title"]').text)
    print(div.find_element_by_xpath('.//p[@class="time"]').text)
The dot . in front of the XPath expression means "from here". This is what we need when searching inside a specific parent element.
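The "search from here" idea can be tried without a browser. The following is a minimal sketch that uses the standard-library xml.etree.ElementTree as a stand-in for Selenium (an assumption made so the example runs offline; ElementTree supports only a subset of XPath). Each .//p[@class='...'] lookup starts at the given div, so every appointment yields its own title and time. The second appointment is made-up sample data.

```python
import xml.etree.ElementTree as ET

# The first div follows the question's markup; the second one is invented sample data.
page = ET.fromstring("""
<body>
  <div class="appointment-template" id="appointment-3500508">
    <p class="title">Mentoruur</p>
    <p class="time">11:15<span class="time-end"> - 12:15</span></p>
  </div>
  <div class="appointment-template" id="appointment-2">
    <p class="title">Example</p>
    <p class="time">13:00<span class="time-end"> - 14:00</span></p>
  </div>
</body>
""")

rows = []
for div in page.findall(".//div[@class='appointment-template']"):
    # The leading dot makes each search relative to the current div.
    title = div.find(".//p[@class='title']")
    time_el = div.find(".//p[@class='time']")
    # itertext() joins the text of <p> with the text of its nested <span>.
    rows.append((title.text, "".join(time_el.itertext())))

for row in rows:
    print(row)
```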

Python & Selenium: How can I obtain a href element from a specific div? My element is stale

I am new to using selenium with python for web scraping. The webpage I am trying to scrape data from has href elements within a specific div that I am trying to access. I have tried using find_element_by_xpath() to obtain this, however it is stating it cannot find the element. I then tried to find the div using the class and from this find the href, but it states my element is stale. I am struggling to understand why is it stale as I have found this second method seems to work for people on tutorials/stackoverflow.
The basic HTML is like:
<div class=div1>
<ul>
<li>
<a href='path/to/div1stuff/1'>Generic string 1</a>
<a href='path/to/div1stuff/2'>Generic string 2</a>
<a href='path/to/div1stuff/3'>Generic string 3</a>
</li>
</ul>
</div>
<div class=div2>
<ul>
<li>
<a href='path/to/div2stuff/1'>Generic string 1</a>
<a href='path/to/div2stuff/2'>Generic string 2</a>
<a href='path/to/div2stuff/3'>Generic string 3</a>
</li>
</ul>
</div>
And my python code:
from selenium import webdriver
class Scraper(object):
    def __init__(self):
        pass
    def execute(self):
        """ Run class methods """
        self.home = "https://www.website2scrape.com/"
        self.get_stuff()
    def get_stuff(self):
        """ Get stuff """
        driver = webdriver.Firefox("/usr/local/bin/")
        driver.get(self.home)
        # Example 1
        driver.find_element_by_xpath("//div[@class='div2']//a[contains(@href,'Generic string 2')]").click()
        # Example 2
        elements = driver.find_elements_by_css_selector("div.div2")
        for element in elements:
            print(element.get_attribute("href"))
Example 1 gives the error that the element can't be found.
Example 2 gives the error that the element is stale.
I am trying to click on the "Generic string 2" href from div2; however, if I just get the href by using:
driver.find_element_by_xpath('//a[contains(@href, "Generic string 2")]')
it clicks on the href from div1. How can I get the href from a specific div class?
In the first example you have to use text() instead of @href:
driver.find_element_by_xpath("//div[@class='div2']//a[contains(text(),'Generic string 2')]").click()
In the second example you search for href in the div, but it is in the a, so you have to add a to the selector:
elements = driver.find_elements_by_css_selector("div.div2 a")
Minimal working code:
import selenium.webdriver
driver = selenium.webdriver.Firefox()
html_content = """
<div class=div1>
<ul>
<li>
<a href='path/to/div1stuff/1'>Generic string 1</a>
<a href='path/to/div1stuff/2'>Generic string 2</a>
<a href='path/to/div1stuff/3'>Generic string 3</a>
</li>
</ul>
</div>
<div class=div2>
<ul>
<li>
<a href='path/to/div2stuff/1'>Generic string 1</a>
<a href='path/to/div2stuff/2'>Generic string 2</a>
<a href='path/to/div2stuff/3'>Generic string 3</a>
</li>
</ul>
</div>
"""
driver.get("data:text/html;charset=utf-8," + html_content)
elements = driver.find_elements_by_css_selector("div.div2 a")
for x in elements:
    print(x.get_attribute('href'))
item = driver.find_element_by_xpath("//div[@class='div2']//a[contains(text(),'Generic string 2')]")
print(item.get_attribute('href'))
item.click()
The XPath below clicks on the second link under the div2 element.
Solution 1:
element = driver.find_element_by_xpath("//div[@class='div2']//ul//li//a[2]")
element.click()
If you want to click based on the link text, you can use the code below.
Solution 2:
driver.find_element_by_xpath("//div[@class='div2']//a[contains(text(),'Generic string 2')]").click()
To click based on the href attribute:
Solution 3:
driver.find_element_by_xpath("//div[@class='div2']//ul/li//a[contains(@href,'path/to/div2stuff/2')]").click()

I want to click on the <li> item in the left navigation of a webpage, using Python and Selenium webdriver to locate it

I want to click on MAKE UP in the left navigation. Please find the attached image and the link for the webpage:
Image for the Webpage
Link for the Webpage
I am currently using the code below to click on the item, but I am not getting any result. I am able to access the elements by class name ('has-sub') and can even print them, but I can't click them:
obc = driver.find_elements_by_class_name('has-sub')
for ea in obc:
    if ea.text == "Makeup":
        ea.click()
Just for the more info below is the html code for the webpage
<li class="has-sub" style="height: 38px;">
Makeup
<ul class="submenu" style="top: 0px;">
<li>
<a id="SBN_facet_Face" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/face" escapexml="false">Face </a>
</li>
<li>
<a id="SBN_facet_Lips" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/lips" escapexml="false">Lips </a>
</li>
<li>
<a id="SBN_facet_Eyes" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/eyes" escapexml="false">Eyes </a>
</li>
<li>
<a id="SBN_facet_Nails" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/nails" escapexml="false">Nails </a>
</li>
<li>
<a id="SBN_facet_Brushes & Tools" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/beauty-brushes-accessories" escapexml="false">Brushes & Tools </a>
</li>
<li>
<a id="SBN_facet_Makeup" href="http://shop.davidjones.com.au/djs/en/davidjones/beauty/beauty-makeup" escapexml="false">All Makeup </a>
</li>
</ul>
</li>
Any help will be appreciated.
I am able to click using the code below.
wait = WebDriverWait(driver, 10)
elements = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//li[@class='has-sub']")))
for element in elements:
    if element.find_elements_by_link_text("Makeup"):
        element.click()
        break
innerElements = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//li[@class='has-sub open']/ul/li")))
for innerElement in innerElements:
    if innerElement.text == "Face":
        innerElement.click()
        break
I hope this helps.
The problem here is that you are trying to click on the <li> element, while the text is in the <a> under that element. So what you need to do is:
obc = driver.find_elements_by_xpath("//li[@class='has-sub']/a[contains(text(), 'Makeup')]")
I tested the XPath on your webpage and it worked.
As per the HTML you have provided, to click on MAKE UP in the left navigation pane, you can use the following code block:
obc = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='aside all-open']/ul//li[@class='has-sub']/a")))
for ea in obc:
    if 'Makeup' in ea.get_attribute("innerHTML"):
        ea.click()
        break

Why did changing my xpath make my selenium click work consistently?

I am running a series of selenium tests with python. I have a navigation on the page I'm testing that has this structure:
<ul>
<li class="has-sub">
<a href="#">
<span> First nav </span>
</a>
<ul style="display:block">
<li>
<a href="#">
<span> First subnav </span>
</a>
</li>
<li>...</li>
<li>...</li>
<li>...</li>
</ul>
</li>
<li>...</li>
</ul>
I want to click on First subnav. To do that, I first click on First nav (the first span) to open up the list, and then click on First subnav. I use a WebDriverWait to wait for the element to be visible and then click on it via its XPath,
//span[1]
I often got timeout exceptions while waiting for the subnav span to be visible after clicking on the nav, which made me think something was wrong with the click on the first nav that opens the list. So I changed the XPath of the first nav (//span[1]) to
//li[@class='has-sub']/descendant::span[text()='First subnav']
and now I never get timeout exceptions when waiting for the subnav span to be visible. So it seems that it now clicks on the nav span every time, opening the list, so there is no timeout when I try to get to the subnav. Does anyone have any idea why that is?
Here is my python code as well:
Inside the LoadCommPage class:
def click_element(self, by, locator):
    try:
        WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((by, locator)))
        print("pressing element " + str(locator))
        self.driver.find_element(by, locator).click()
    except TimeoutException:
        print("no clickable element in 10 sec")
        print(self.traceback.format_exc())
        self.driver.close()
Inside the main test (load_comm_page is an instance of LoadCommPage, where click_element is defined):
load_comm_page.click_element(*LoadCommPageLocators.sys_ops_tab)
And another class for the locators:
class LoadCommPageLocators(object):
    firstnav_tab = (By.XPATH, "//li[@class='has-sub']/descendant::span[text()='First nav']")
XPath indexes begin at 1, not 0, so the XPath
//span[1]
looks for a first span element (more precisely, a span that is the first span child of its parent), whereas
//span[2]
looks for a second span.
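Note that the [1] predicate is applied per parent: //span[1] matches every span that is the first span child of its own parent, not only the first span in the document (that would be (//span)[1]). With the navigation markup from the question, both the nav span and the subnav span qualify, which may explain why a click located via //span[1] was unreliable. The following is a minimal sketch that uses the standard-library xml.etree.ElementTree as an offline stand-in for the browser (an assumption; it applies the same per-parent position rule in this simple case).

```python
import xml.etree.ElementTree as ET

nav = ET.fromstring("""
<ul>
  <li class="has-sub">
    <a href="#"><span> First nav </span></a>
    <ul>
      <li><a href="#"><span> First subnav </span></a></li>
    </ul>
  </li>
</ul>
""")

# [1] is evaluated relative to each parent, so both spans match.
first_per_parent = [s.text.strip() for s in nav.findall(".//span[1]")]
print(first_per_parent)

# Filtering on the text pins down exactly one element.
by_text = [s.text.strip() for s in nav.iter("span") if s.text.strip() == "First subnav"]
print(by_text)
```

A locator that names the text, like the descendant::span[text()='...'] form in the question, avoids this ambiguity.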
