How to work with multiple classes using BeautifulSoup & BS4

I am trying to write a YouTube scraper, and as part of the task I need to match elements that have multiple classes in bs4.
The HTML looks like this:
<span id="video-title" class="style-scope ytd-playlist-panel-video-renderer">
</span>
My aim is to use the class attribute to get all 50 tracks and work with them.
I tried the following, and it returns nothing:
soup_obj.find_all("span", {"class":"style-scope ytd-playlist-panel-video-renderer"})
I also tried the Selenium style (a dot instead of a space between the classes):
soup_obj.find_all("span", {"class":"style-scope.ytd-playlist-panel-video-renderer"})
Does anyone have an idea what is wrong?

This should work
soup_obj.find_all("span", {"class":["style-scope", "ytd-playlist-panel-video-renderer"]})
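Note that a list passed as the class value matches a span carrying any of the listed classes, not necessarily both. A minimal sketch of the difference, assuming the made-up sample HTML below (the song titles and the extra spans are not from the question) and using select() with a CSS selector when both classes are required:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first span mirrors the question.
html = """
<span id="video-title" class="style-scope ytd-playlist-panel-video-renderer">Song A</span>
<span class="style-scope other">Not a title</span>
<span class="ytd-playlist-panel-video-renderer style-scope">Song B</span>
"""
soup = BeautifulSoup(html, "html.parser")

# A list means "any of these classes", so the unrelated span matches too.
either = soup.find_all(
    "span", {"class": ["style-scope", "ytd-playlist-panel-video-renderer"]}
)

# A CSS selector with chained classes requires both, in any order.
both = soup.select("span.style-scope.ytd-playlist-panel-video-renderer")

# A plain string is compared against the exact attribute value,
# so it is sensitive to the order of the classes.
exact = soup.find_all("span", class_="style-scope ytd-playlist-panel-video-renderer")

print(len(either), len(both), len(exact))
```

If only spans with both classes are wanted, the select() form is usually the least surprising of the three.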

With Selenium you can't pass multiple class names to:
driver.find_elements(By.CLASS_NAME, "classname")
If you want to use only the class attribute, you must pass a single class name, and you can use either of the following locator strategies:
Using the class name style-scope:
elements = driver.find_elements(By.CLASS_NAME, "style-scope")
Using the class name ytd-playlist-panel-video-renderer:
elements = driver.find_elements(By.CLASS_NAME, "ytd-playlist-panel-video-renderer")
To match both class names, you can use either of the following locator strategies:
Using CSS_SELECTOR:
elements = driver.find_elements(By.CSS_SELECTOR, "span.style-scope.ytd-playlist-panel-video-renderer")
Using XPATH:
elements = driver.find_elements(By.XPATH, "//span[@class='style-scope ytd-playlist-panel-video-renderer']")
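If the class names come from a list, the compound CSS selector can be assembled rather than typed by hand. A small sketch, where css_for_classes is a hypothetical helper (not part of Selenium) and the class names are assumed to need no CSS escaping:

```python
def css_for_classes(tag, class_names):
    """Build a compound selector such as 'span.a.b' (hypothetical helper).

    Assumes the class names contain no characters that need CSS escaping.
    """
    return tag + "".join("." + name for name in class_names)


selector = css_for_classes(
    "span", ["style-scope", "ytd-playlist-panel-video-renderer"]
)
print(selector)

# With a live driver this would be used as:
# elements = driver.find_elements(By.CSS_SELECTOR, selector)
```

Unlike the XPath comparison against the whole class attribute, the chained CSS form does not depend on the order of the classes in the HTML.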

Related

How to access HTML elements in a nested fashion using Selenium and Python

I'm trying to figure out how to access HTML elements in a nested fashion through Selenium and Python.
For example I have:
box = driver.find_element_by_tag_name('tbody')
which represents the body of the data I'd like to mine. I'd like to iterate through each row in this body (each row characterized by a <tr> tag) using something like:
for driver.find_element_by_tag_name('tr') in box:
But obviously that's not possible because box is a Selenium object and is non-iterable.
What's the best way to do something like this?

A good approach is to construct locator strategies that traverse from the parent down to the descendants, as follows:
Using CSS_SELECTOR:
for element in driver.find_elements(By.CSS_SELECTOR, "tbody tr"):
    print(element.text)
Using XPATH:
for element in driver.find_elements(By.XPATH, "//tbody//tr"):
    print(element.text)

How to use multiple attributes (including a partial string match) with find_elements in Selenium for Python

Currently I have this line of code which correctly selects this type of object on the webpage I'm trying to manipulate with Selenium:
pointsObj = driver.find_elements(By.CLASS_NAME,'treeImg')
What I need to do is add in a partial string match condition as well which looks in the section "CLGV (AHU-01_ahu_ChilledWtrVlvOutVolts)" in the line below.
<span class="treeImg v65point" style="cursor:pointer;">CLGV (AHU-01_ahu_ChilledWtrVlvOutVolts)</span>
I found online there's the ChainedBy option but I can't think of how to reference that text in the span. Do I need to use XPath? I tried that for a second but I couldn't think of how to parse it.

To match on both the class name and the innerText, you can use either of the following locator strategies:
xpath using the class name treeImg and a partial innerText:
pointsObj = driver.find_elements(By.XPATH,"//span[contains(@class, 'treeImg') and contains(., 'AHU-01_ahu_ChilledWtrVlvOutVolts')]")
xpath using all the class names and the entire innerText:
pointsObj = driver.find_elements(By.XPATH,"//span[@class='treeImg v65point' and text()='CLGV (AHU-01_ahu_ChilledWtrVlvOutVolts)']")
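One caveat with contains(@class, 'treeImg') is that it also matches longer class names that merely contain that substring. A common XPath 1.0 idiom pads the attribute with spaces and looks for the padded token instead. The sketch below builds such an expression; has_class and span_with_class_and_text are hypothetical helper names, and the text fragment is assumed to contain no single quotes:

```python
def has_class(name):
    """XPath 1.0 predicate that matches one whole class token."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


def span_with_class_and_text(class_name, text_fragment):
    """XPath for a span with the given class whose text contains a fragment.

    Assumes text_fragment contains no single quotes.
    """
    return f"//span[{has_class(class_name)} and contains(., '{text_fragment}')]"


xpath = span_with_class_and_text("treeImg", "AHU-01_ahu_ChilledWtrVlvOutVolts")
print(xpath)

# With a live driver this would be used as:
# pointsObj = driver.find_elements(By.XPATH, xpath)
```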

Find all elements with href tag containing certain text with Selenium and Python

Let's say I have HTML code with three links:
Whatever
Whatever
Whatever
and I want to use Selenium to find all elements whose href attribute includes the string "hey" (in this case the first two links). How would I write Python Selenium code that accomplishes this?

This works:
all_href = driver.find_elements(By.XPATH, "//*[contains(@href, 'hey')]")
print(len(all_href))

This XPath will do the job:
"//a[contains(@href,'hey')]"
To use it with Selenium you need a find_elements or findElements method, depending on the language binding you use.
For Selenium in Python this will give you the list of all such elements:
all_hey_elements = driver.find_elements(By.XPATH, "//a[contains(@href, 'hey')]")
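If the search string comes from a variable rather than a literal, quotes inside it can break the expression, because XPath 1.0 has no escape character inside string literals. A possible sketch of a quoting helper follows; xpath_literal and href_contains are hypothetical names, not Selenium functions:

```python
def xpath_literal(text):
    """Return text as a valid XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types are present: split on single quotes and glue the
    # pieces back together with concat(), quoting each ' as "'".
    sep = ", \"'\", "
    return "concat(" + sep.join(f"'{part}'" for part in text.split("'")) + ")"


def href_contains(fragment):
    """XPath for all links whose href contains the given fragment."""
    return f"//a[contains(@href, {xpath_literal(fragment)})]"


print(href_contains("hey"))

# With a live driver this would be used as:
# all_hey_elements = driver.find_elements(By.XPATH, href_contains("hey"))
```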

Get data-ref using selenium in python

I have a website with an element like this:
<html>
<body><div class="class1" data-ref="data"></div></body>
</html>
I want to use Selenium in Python 3 to get the data-ref value (i.e. "data").
I know you can get the text using .text. Is there something like that for data-ref?

Just do this:
get_attribute("attribute name")
In your case
get_attribute("data-ref")

You can first find the element by its XPath and then read the data-ref value with the get_attribute method.
You can do it like this:
element = driver.find_element(By.XPATH, "//div[@class='class1']")
# Read the value of data-ref
value = element.get_attribute("data-ref")

To retrieve the value of the data-ref attribute, i.e. data, you can use either of the following locator strategies:
Using XPATH:
print(driver.find_element(By.XPATH, "//div[@class='class1']").get_attribute("data-ref"))
Using CSS_SELECTOR:
print(driver.find_element(By.CSS_SELECTOR, "div.class1").get_attribute("data-ref"))

How to use function ends-with in selenium?

I read that lxml does not support XPath 2.0 and therefore can't use ends-with, and it seems Selenium can't use ends-with either. How can I use it, or what can I replace it with? Thank you very much!
HTML sample
<span id="xxxxx_close">wwwww</span>
The 'xxxxx' part of the id is random

You can apply an ends-with CSS selector:
By.cssSelector("[id$=_close]")
There's no need to include the span tag in the CSS selector.

The ends-with XPath function is part of XPath v2.0, but the current implementation of Selenium supports only XPath v1.0.
For the HTML you have shared, you can identify the element with either of the following locator strategies:
XPath using contains():
xpath using contains for the id attribute:
driver.findElement(By.xpath("//span[contains(@id,'_close')]")).click();
xpath using contains for the id attribute and the inner text:
driver.findElement(By.xpath("//span[contains(@id,'_close') and contains(.,'wwwww')]")).click();
Alternatively, you can use a CSS selector as follows:
css_selector using the ends-with (i.e. $ wildcard) clause for the id attribute:
driver.find_element(By.CSS_SELECTOR, "span[id$='_close']").click()
css_selector using the contains (i.e. * wildcard) clause for the id attribute:
driver.find_element(By.CSS_SELECTOR, "span[id*='_close']").click()
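If XPath is required and contains() is too loose (it also matches ids where _close appears in the middle), ends-with can be emulated in XPath 1.0 with substring() and string-length(). A minimal sketch; ends_with is a hypothetical helper name, and the suffix is assumed to contain no single quotes:

```python
def ends_with(attr, suffix):
    """XPath 1.0 predicate emulating ends-with(@attr, suffix).

    Compares the last string-length(suffix) characters of the attribute
    with the suffix. Assumes suffix contains no single quotes.
    """
    return (
        f"substring(@{attr}, string-length(@{attr}) "
        f"- string-length('{suffix}') + 1) = '{suffix}'"
    )


xpath = f"//span[{ends_with('id', '_close')}]"
print(xpath)

# With a live driver this would be used as:
# driver.find_element(By.XPATH, xpath).click()
```

When a CSS selector is acceptable, span[id$='_close'] stays the shorter option.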
