Select value of an element property with Selenium - python

I need to extract the value of an attribute of a span tag using Selenium.
This is the HTML code:
<small class="time">
<a title="2015" class="class2 class3 class4 class5" href="url">
<span data-long-form="true" data-time-ms="1438835437000" data-time="1438835437" data-aria-label-part="last" class="class6 class7">Aug 5</span>
</a>
</small>
I need to extract the value of the data-time attribute of the span tag. Here is the Python code I am trying to use:
try:
    timestamp = element.find_element_by_xpath(".//small[contains(@class, 'time')]/a[1]/span[1]")
    print "timestamp", timestamp.value_of_css_property("data-time")
except exp.NoSuchElementException:
    print "Timestamp location not proper"
I also tried:
timestamp = element.find_element_by_css_selector(".class2.class3.class4.class5").value_of_css_property("date-time")
but both return an empty result.
Any idea what is causing this?

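value_of_css_property() returns the computed value of a CSS property (such as color or display). data-time is an HTML attribute, not a CSS property, so it comes back empty. Use get_attribute() instead: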
element = driver.find_element_by_css_selector("small.time span[data-time]")
element.get_attribute("data-time")
Note that in your second attempt, you've used date-time instead of data-time.
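A quick way to see the difference without a browser: the sketch below uses Python's standard html.parser (not Selenium) to read the data-time attribute from the question's markup, much as get_attribute("data-time") would, and then converts it to a datetime. DataTimeCollector is a hypothetical helper, and treating data-time as Unix seconds is an assumption (it is consistent with data-time-ms being the same value times 1000).

```python
from datetime import datetime, timezone
from html.parser import HTMLParser

# Markup copied from the question.
HTML = """
<small class="time">
<a title="2015" class="class2 class3 class4 class5" href="url">
<span data-long-form="true" data-time-ms="1438835437000" data-time="1438835437" data-aria-label-part="last" class="class6 class7">Aug 5</span>
</a>
</small>
"""


class DataTimeCollector(HTMLParser):
    """Hypothetical helper: records the data-time attribute of every <span> that has one."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "span" and "data-time" in attributes:
            self.values.append(attributes["data-time"])


parser = DataTimeCollector()
parser.feed(HTML)
parser.close()

raw = parser.values[0]  # a string, like the value get_attribute("data-time") returns
when = datetime.fromtimestamp(int(raw), tz=timezone.utc)  # assumes Unix seconds
print(raw, when.isoformat())
```

For this value the UTC date works out to 2015-08-06 04:30:37, while the page shows Aug 5, presumably because the site renders the timestamp in a time zone behind UTC.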

Related

Scraping the attribute of the first child from multiple div (selenium)

I'm trying to scrape the class name of the first child (span) of multiple divs.
Here is the HTML code:
<div class="ui_column is-9">
<span class="name1"></span>
<span class="...">...</span>
...
<div class="ui_column is-9">
<span class="name2"></span>
<span class="...">...</span>
...
<div class ..
URL of the page for the complete code.
I'm doing this for the first five divs with the following code:
i = 0
liste = []
while i <= 4:
    parent = driver.find_elements_by_xpath("//div[@class='ui_column is-9']")[i]
    child = parent.find_element_by_xpath("./child::*")
    class_name = child.get_attribute('class')
    i = i + 1
    liste.append(class_name)
But is there an easier way to do it?
You can directly get all these first span elements and then extract their class attribute values as follows:
liste = []
first_spans = driver.find_elements_by_xpath("//div[@class='ui_column is-9']//span[1]")
for element in first_spans:
    class_name = element.get_attribute('class')
    liste.append(class_name)
You can also extract the class attribute values of only the first 5 elements by limiting the loop to 5 iterations, for example by looping over first_spans[:5].
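To check the XPath logic without a browser, here is a stdlib sketch using xml.etree.ElementTree, which supports a small XPath subset that includes [@attr='value'] and positional [1] predicates. The markup is a hypothetical, well-formed simplification of the question's (the other, something_else and not_wanted class names are made up). It uses /span[1] (the first span that is a direct child of the div), the stricter form of //span[1]: with //, a first span nested deeper inside the div would also match.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the question's markup.
MARKUP = """
<root>
  <div class="ui_column is-9">
    <span class="name1"></span>
    <span class="other"></span>
  </div>
  <div class="ui_column is-9">
    <span class="name2"></span>
    <span class="other"></span>
  </div>
  <div class="something_else">
    <span class="not_wanted"></span>
  </div>
</root>
"""

root = ET.fromstring(MARKUP)

# One query instead of re-running the div lookup on every loop iteration:
# the first <span> child of every <div> whose class is exactly "ui_column is-9".
first_spans = root.findall(".//div[@class='ui_column is-9']/span[1]")

# Slicing keeps at most the first five, like limiting the loop to 5 iterations.
liste = [span.get("class") for span in first_spans[:5]]
print(liste)
```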
Update
Now that you have updated your question, the answer is different and much simpler.
You can get the desired elements directly and extract their class attribute values as follows:
liste = []
first_spans = driver.find_elements_by_xpath("//div[@class='ui_column is-9']//span[contains(@class,'ui_bubble_rating')]")
for element in first_spans:
    class_name = element.get_attribute('class')
    liste.append(class_name)

How to get full date value from HTML using Selenium (Python)

I am trying to get a date from HTML using Selenium. The problem is that in the HTML (when viewed through the browser console) I see the full date, like 01.01.2001, but Selenium returns 01.January. What should I do to get the full date?
The relevant part of the HTML (as seen in the browser console) looks like this:
<div>
<h4 class="status-param"> Name and Surname </h4>
<h4 class="status-param"> Date of Birth: 01.01.2001 </h4> <!--problem only with this -->
<h4 class="status-param"> 12.12.2012 00:00 </h4>
</div>
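My code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome('chrome/chromedriver')
driver.get(html)
result = []
try:
    elements = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "status-param"))
    )
    for element in elements:
        result.append(element.text)
finally:
    driver.quit()
I get the following: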
'Name and Surname'
'Date of Birth: 01.January'
'12.12.2012'
Use the XPath below:
(//div/h4[@class='status-param'])[2]
and use it like this:
date_html = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "(//div/h4[@class='status-param'])[2]"))).get_attribute('innerHTML')
print(date_html)
The above should print the raw contents of that <h4>, including the Date of Birth: label, e.g. Date of Birth: 01.01.2001.
Update 1:
You can simply split the string into two parts at the colon and take the second part, like this:
print(date_html.split(":")[1].strip())
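Splitting on a plain ":" works, whereas "\\:" is a backslash followed by a colon and does not occur in this text. Below is a stdlib sketch of the parsing step on a hard-coded string that stands in for the innerHTML value (an assumption; no browser is involved), which also turns the date text into a date object.

```python
from datetime import datetime

# Assumed value: what get_attribute('innerHTML') returns for the second <h4>
# in the question's markup (label included, surrounding spaces kept).
inner_html = " Date of Birth: 01.01.2001 "

# split(":", 1) splits at the first colon only. "\\:" is a two-character
# string (backslash + colon) that never occurs here, so split("\\:")
# would return the whole string as a single element.
label, date_text = (part.strip() for part in inner_html.split(":", 1))

# Assumes day-first dd.mm.yyyy; 01.01.2001 alone cannot rule out mm.dd.yyyy.
birth_date = datetime.strptime(date_text, "%d.%m.%Y").date()
print(label, "->", birth_date.isoformat())
```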

how to get desired text from element in linkedin scraping using python and selenium

The a element below has two text strings, "First Name" and "View First Names's profile". With the Python code below, get_text() returns both strings, but I want only the first one, "First Name". Please let me know how to drop the second string, "View First Names's profile".
all_classes = src.find_all('div', {'class': 'mb1'})
for linkClass in all_classes:
    linkClass = linkClass.find_all('a', {'class': 'app-aware-link'})
    for element in linkClass:
        name = element.get_text().strip()
        Name.append(name)
HTML:
<a class="app-aware-link" href="https://www.linkedin.com/in/shreyansjain-iitdhn?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAABpqUi4Bg1wC5QB22-ydCRRB580Zd4gutQ8">
<span dir="ltr">
<span aria-hidden="true"><!-- -->First Name<!-- --></span><span class="visually-hidden"><!-- -->View First Names’s profile<!-- --></span>
</span>
</a>
To extract the first name in Selenium, I would use the CSS selector below for the first name:
.app-aware-link span[dir='ltr'] span:first-of-type
For the profile text:
.app-aware-link span[dir='ltr'] span:last-of-type
and then extract the text of each like this:
First name:
for name in driver.find_elements(By.CSS_SELECTOR, ".app-aware-link span[dir='ltr'] span:first-of-type"):
    print(name.text)
Profile name:
for profile_name in driver.find_elements(By.CSS_SELECTOR, ".app-aware-link span[dir='ltr'] span:last-of-type"):
    print(profile_name.text)
Try this:
all_classes = src.find_all('div', {'class': 'mb1'})
for linkClass in all_classes:
    links = linkClass.find_all('a', {'class': 'app-aware-link'})
    for element in links:
        # src is a BeautifulSoup object, so use BeautifulSoup's select_one();
        # Selenium's find_elements_by_xpath() does not exist on these tags.
        first_name = element.select_one('span > span')  # first span nested in a span: the visible name
        if first_name is not None:
            name = first_name.get_text().strip()
            Name.append(name)
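Instead of relying on position (first span, first-of-type), you can also key on the aria-hidden="true" attribute, which marks the visible name, and skip the visually-hidden screen-reader text. This is a swapped-in technique, shown here as a stdlib html.parser sketch on the question's markup; VisibleNameParser is a hypothetical helper and it assumes the markup stays as shown.

```python
from html.parser import HTMLParser

# Markup copied from the question.
HTML = """
<a class="app-aware-link" href="https://www.linkedin.com/in/shreyansjain-iitdhn?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAABpqUi4Bg1wC5QB22-ydCRRB580Zd4gutQ8">
<span dir="ltr">
<span aria-hidden="true"><!-- -->First Name<!-- --></span><span class="visually-hidden"><!-- -->View First Names’s profile<!-- --></span>
</span>
</a>
"""


class VisibleNameParser(HTMLParser):
    """Hypothetical helper: collects text only from <span aria-hidden="true">."""

    def __init__(self):
        super().__init__()
        self.capturing = False
        self.buffer = []
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("aria-hidden") == "true":
            self.capturing = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self.capturing:
            self.capturing = False
            self.names.append("".join(self.buffer).strip())

    def handle_data(self, data):
        # <!-- --> comments go to handle_comment(), so they never reach this method.
        if self.capturing:
            self.buffer.append(data)


parser = VisibleNameParser()
parser.feed(HTML)
parser.close()
print(parser.names)
```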

Selenium python - How to scroll to text whose parent element has class="min"

How can I scroll to text only when its parent element has the min class?
<div>
<div class="item filter_2 firstPart">
<div class="date">16/10/2018</div>
<div class="time">04:00</div>
<div class="event">Ningbo, China</div>
<div class="subevent">HE, Yecong - Kecmanovic, Miomir</div>
<div class="odds">
<div class="odd" idq="2998675069">
<div class="tq">1HH</div>
<div class="value">8.00</div>
</div>
<div class="odd min" idq="2998675068">
<div class="tq">2HH</div>
<div class="value">1.03</div>
</div>
</div>
</div>
</div>
I would like to scroll to the text only if the min class is present.
Here is what I have tried:
new_text = ['2.10', '2.15', '2.20', '2.25', '2.30', '2.35', '2.40',
            '2.45', '2.50', '2.55', '2.60', '2.65', '2.70',
            '2.75', '2.80', '2.85', '2.90', '2.95', '3.10']
for text in new_text:
    if text in driver.page_source:
        parent = driver.find_element_by_css_selector(".odd.min")
        child = parent.find_element_by_xpath("//div[@class='value' and text()='" + text + "']")
        if child:
            print(text)
            element = child
            driver.execute_script('arguments[0].scrollIntoView();', element)
            driver.save_screenshot('lo7.png')
            break
    else:
        print("No odd found")
        continue
The problem with this code is that it also scrolls to text whose parent does not have the min class.
//div[@class='odd min']/div[@class='tq']/text()
You can try this XPath expression to get the value "2HH". In Selenium, though, find_element() must return an element rather than a text node, so drop the trailing /text() and read the element's .text instead.
The problem is with your XPath locator. You locate parent and then search from there with parent.find_element_by_xpath("//div..."), expecting only its descendants to be searched. If you want an XPath to start from the parent's context, you need to add a . at the start, e.g. ".//div[@class='value' and ...". Without that ., your XPath searches the entire page, as you discovered.
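The scoping rule can be seen without a browser in this stdlib sketch using xml.etree.ElementTree on the question's markup. ElementTree only accepts relative paths on an element, so instead of reproducing Selenium's page-wide //div it contrasts searching from the document root with searching from parent; in both cases a path starting with . stays inside the element it is called on.

```python
import xml.etree.ElementTree as ET

# Markup copied from the question.
HTML = """<div>
<div class="item filter_2 firstPart">
<div class="date">16/10/2018</div>
<div class="time">04:00</div>
<div class="event">Ningbo, China</div>
<div class="subevent">HE, Yecong - Kecmanovic, Miomir</div>
<div class="odds">
<div class="odd" idq="2998675069">
<div class="tq">1HH</div>
<div class="value">8.00</div>
</div>
<div class="odd min" idq="2998675068">
<div class="tq">2HH</div>
<div class="value">1.03</div>
</div>
</div>
</div>
</div>"""

root = ET.fromstring(HTML)
parent = root.find(".//div[@class='odd min']")

# Searching from the root finds every value on the "page"...
page_wide = [e.text for e in root.findall(".//div[@class='value']")]
# ...while the same relative path called on parent only looks inside it.
scoped = [e.text for e in parent.findall(".//div[@class='value']")]

print(page_wide)
print(scoped)
```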
There is a better way to do this: don't take a bunch of screenshots; just pull the odds you want and compare them to your desired list.
values_from_page = driver.find_elements_by_css_selector(".odd.min > div.value")  # all odds elements from the page
odds = {e.text for e in values_from_page if e.is_displayed()}  # keep only visible elements and collect their text in a set
print(odds)
new_text = ['2.10', '2.15', '2.20', '2.25', '2.30', '2.35', '2.40',
            '2.45', '2.50', '2.55', '2.60', '2.65', '2.70',
            '2.75', '2.80', '2.85', '2.90', '2.95', '3.10']
missing_odds = set(new_text).difference(odds)  # new_text odds that are missing from the page
print(missing_odds)
This is untested code, but it should be pretty close. It should also run much faster, because you scrape the page only once instead of twice per item in new_text, plus a scroll and a screenshot for each one.
When you take a screenshot, someone has to look at it to verify it. That takes manual work and time, so avoid it whenever possible. Let the automation do the validation for you and only report when something is wrong or missing. If missing_odds is empty (len(missing_odds) == 0), then all the items in new_text were found. Anything that is printed was missing from the page.
Hopefully that helps get you started in the right direction.

Python selenium webdriver. Find elements with specified class name

I am using Selenium to parse a page containing markup that looks a bit like this:
<html>
<head><title>Example</title></head>
<body>
<div>
<span class="Fw(500) D(ib) Fz(42px)">1</span>
<span class="Fw(500) D(ib) Fz(42px) Green XYZ">2</span>
</div>
</body>
</html>
I want to fetch all span elements that contain the class foobar.
I have tried both of these (the variable wd is an instance of selenium.webdriver):
elem = wd.find_elements_by_css_selector("span[class='Fw(500) D(ib) Fz(42px).']")
elem = wd.find_element_by_xpath("//span[starts-with(@class, 'Fw(500) D(ib) Fz(42px))]")
Neither of them works.
How can I select only the elements whose class starts with Fw(500) D(ib) Fz(42px),
i.e. both span elements in the sample markup given?
Try the following:
elem = wd.find_elements_by_css_selector("span.foobar")
If foo and bar are two separate classes (separated by a space), try the following:
elem = wd.find_elements_by_css_selector("span.foo.bar")
Edit: If your class contains non-alphabetic characters and you want to find elements whose class starts with Fw(500) D(ib) Fz(42px), use an attribute selector, since characters such as ( and ) would have to be escaped in a .class selector:
elem = wd.find_elements_by_css_selector("span[class ^= 'Fw(500) D(ib) Fz(42px)']")
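The selectors in this thread match the class attribute in different ways: [class='...'] compares the whole string, [class^='...'] (or XPath starts-with()) checks a prefix, and .Green matches one space-separated class token. The stdlib sketch below mimics those three rules in plain Python on the question's markup; SpanCollector is a hypothetical helper, and this illustrates the matching semantics only, not how Selenium evaluates selectors.

```python
from html.parser import HTMLParser

# Markup copied from the question.
HTML = """
<div>
<span class="Fw(500) D(ib) Fz(42px)">1</span>
<span class="Fw(500) D(ib) Fz(42px) Green XYZ">2</span>
</div>
"""

PREFIX = "Fw(500) D(ib) Fz(42px)"


class SpanCollector(HTMLParser):
    """Hypothetical helper: records (class attribute, text) for every <span>."""

    def __init__(self):
        super().__init__()
        self.spans = []
        self._class = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._class = dict(attrs).get("class") or ""
            self._text = []

    def handle_data(self, data):
        if self._class is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._class is not None:
            self.spans.append((self._class, "".join(self._text)))
            self._class = None


collector = SpanCollector()
collector.feed(HTML)
collector.close()

# Like [class='...']: exact string equality, so only the first span matches.
exact = [text for cls, text in collector.spans if cls == PREFIX]
# Like [class^='...'] or starts-with(@class, '...'): both spans match.
prefix = [text for cls, text in collector.spans if cls.startswith(PREFIX)]
# Like span.Green: whole-token match against the space-separated class list.
token = [text for cls, text in collector.spans if "Green" in cls.split()]
print(exact, prefix, token)
```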
Or find the elements by XPath:
//span[@class='foobar']
This matches span elements whose class attribute is exactly foobar.
