How to get full date value from HTML using Selenium (Python)

I am trying to get a date from HTML using Selenium. In the HTML (viewed through the browser console) I see the full date, e.g. 01.01.2001, but Selenium returns 01.January. What should I do to get the full date?
The relevant part of the HTML (as seen in the browser console) looks like this:
<div>
<h4 class="status-param"> Name and Surname </h4>
<h4 class="status-param"> Date of Birth: 01.01.2001 </h4> <!--problem only with this -->
<h4 class="status-param"> 12.12.2012 00:00 </h4>
</div>
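My code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome('chrome/chromedriver')
driver.get(html)  # html holds the page URL
result = []
try:
    # wait up to 10 seconds for the status-param elements to be present
    elements = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "status-param"))
    )
    for element in elements:
        result.append(element.text)
finally:
    driver.quit()
I get the following: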
'Name and Surname'
'Date of Birth: 01.January'
'12.12.2012'

Use the XPath below:
(//div/h4[@class='status-param'])[2]
and use it like this:
elements = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "(//div/h4[@class='status-param'])[2]"))).get_attribute('innerHTML')
print(elements)
This should print the element's raw content, which includes the full date 01.01.2001.
Update 1:
You can split the string into two parts at the colon and grab the second part like this:
print(elements.split(":")[1].strip())
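Splitting on the colon depends on the exact label text. As an alternative sketch, assuming the date always has the DD.MM.YYYY shape shown in the question, a regular expression can pull it out of whatever string Selenium returns. The helper name extract_date is hypothetical, and the sample string stands in for the innerHTML value:

```python
import re

# Matches dates shaped like 01.01.2001 (assumed format: DD.MM.YYYY)
DATE_PATTERN = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")

def extract_date(text):
    """Return the first DD.MM.YYYY date found in text, or None."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None

# Sample value standing in for get_attribute('innerHTML')
raw = " Date of Birth: 01.01.2001 "
print(extract_date(raw))
```

Because the helper returns None when no date is found, a truncated value such as 01.January is easy to detect instead of being silently stored.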

Related

how to get desired text from element in linkedin scraping using python and selenium

The 'a' element below contains two text strings, "First Name" and "View First Names's profile". With the Python code below, get_text() returns both strings, but I want only the first one, i.e. "First Name". Please let me know how to drop the second string, i.e. "View First Names's profile".
all_classes = src.find_all('div', {'class':'mb1'})
for linkClass in all_classes:
    linkClass = linkClass.find_all('a', {'class': 'app-aware-link'})
    for element in linkClass:
        name = element.get_text().strip()
        Name.append(name)
HTML
<a class="app-aware-link" href="https://www.linkedin.com/in/shreyansjain-iitdhn?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAABpqUi4Bg1wC5QB22-ydCRRB580Zd4gutQ8">
<span dir="ltr">
<span aria-hidden="true"><!-- -->First Name<!-- --></span><span class="visually-hidden"><!-- -->View First Names’s profile<!-- --></span>
</span>
</a>
To extract the first name in Selenium, I would do this.
Use the CSS_SELECTOR below for the first name:
.app-aware-link span[dir='ltr'] span:first-of-type
Profile name:
.app-aware-link span[dir='ltr'] span:last-of-type
and extract the text from each like this:
First name:
for name in driver.find_elements(By.CSS_SELECTOR, ".app-aware-link span[dir='ltr'] span:first-of-type"):
    print(name.text)
Profile name:
for profile_name in driver.find_elements(By.CSS_SELECTOR, ".app-aware-link span[dir='ltr'] span:last-of-type"):
    print(profile_name.text)
Try this:
all_classes = src.find_all('div', {'class':'mb1'})
for linkClass in all_classes:
    linkClass = linkClass.find_all('a', {'class': 'app-aware-link'})
    for element in linkClass:
        # BeautifulSoup tags have no find_elements_by_xpath; pick the visible span instead
        first_name = element.find('span', {'aria-hidden': 'true'})
        if first_name is not None:
            name = first_name.get_text().strip()
            Name.append(name)

Text value is empty while extracting div text

I'm trying to extract the text "Quesadilla", but when I try to get the text it comes back empty.
HTML trying to extract from:
<div data-baseweb="block" data-testid="menu-item-name" class="d0 bd ib ic id ie be ed f7 bh fz if">
<div lines="2" class="ig ih ii bq">Quesadilla</div></div>
</div>
Code:
menuItemNames = driver.find_elements_by_xpath("//div[@data-testid='menu-item-name']")
for menuItemName in menuItemNames:
    print(menuItemName.text)
Is there a better way to do this?
When I use the above code, I get the text of some divs, while others return empty or null.
Try a CSS_SELECTOR with an explicit wait for more reliability:
CSS selector:
div[data-testid='menu-item-name'] div
Code with explicit wait:
driver = webdriver.Chrome("C:\\Users\\etc\\Desktop\\Selenium+Python\\chromedriver.exe")
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("Your URL")
ele = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div[data-testid='menu-item-name'] div "))).text
print(ele)
If there are several elements (say 10) matching div[data-testid='menu-item-name'], you could use the CSS selector with find_elements like this:
elements = driver.find_elements(By.CSS_SELECTOR, "div[data-testid='menu-item-name']")
for ele in elements:
    print(ele.text)
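WebElement.text only returns text that is rendered as visible, so items that are hidden or not yet displayed can come back empty. A possible fallback, sketched here under the assumption that this is the cause of the empty values, is to read the textContent attribute when .text is empty. The helper name element_text is hypothetical; it relies only on the .text property and the get_attribute() method that Selenium elements provide:

```python
def element_text(element):
    """Return visible text, falling back to raw DOM text when it is empty."""
    visible = element.text.strip()
    if visible:
        return visible
    # textContent is read from the DOM even when the element is not displayed
    raw = element.get_attribute("textContent")
    return (raw or "").strip()
```

Inside the loop above, this could be called as print(element_text(ele)) in place of print(ele.text).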

find an element under a <ul> <li> <a> "some text" </a> </li> </ul> tag as shown in figure

This is the HTML code. I want to get "1" and all the other values written in the nested <li> <a> tags.
I have tried
total = driver.find_element_by_xpath("//a[text()='...']/following-sibling::a").text
and
totl = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='ng-binding']")))
print (totl.text)
but nothing works. I would appreciate any help with this.
To get text from a WebElement, the element must be visible, so wait for the visibility of all elements. Code examples to get all a elements (total will be a list of WebElements):
total = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'ul[uib-pagination] li a')))
# or
total = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'li.pagination-page a')))
To get text from total:
# texts of all links in total
total_texts = [element.text for element in total]
print(total_texts)
# text of the first one
first_page_number = total[0].text
# text of the last one
last_page_number = total[-1].text
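If the goal is the page count rather than the raw strings, the link texts can be filtered and converted to integers. This is a sketch on plain strings; the sample list is made up for illustration and assumes the pagination links may include non-numeric entries such as arrows or "...". The helper name page_numbers is hypothetical:

```python
def page_numbers(texts):
    """Keep only the entries that are plain decimal digits and convert them to int."""
    return [int(t) for t in (s.strip() for s in texts) if t.isdecimal()]

# Hypothetical link texts, as total_texts might look
sample_texts = ["«", "1", "2", "3", "...", "10", "»"]
numbers = page_numbers(sample_texts)
print(numbers)
print(max(numbers))
```

Taking max() of the numeric entries avoids relying on the last link being a page number, which may not hold when the list ends with a "next" arrow.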

Python selenium webdriver. Find elements with specified class name

I am using Selenium to parse a page containing markup that looks a bit like this:
<html>
<head><title>Example</title></head>
<body>
<div>
<span class="Fw(500) D(ib) Fz(42px)">1</span>
<span class="Fw(500) D(ib) Fz(42px) Green XYZ">2</span>
</div>
</body>
</html>
I want to fetch all span elements that contain the class foobar.
I have tried both of these (the variable wd is an instance of selenium.webdriver):
elem = wd.find_elements_by_css_selector("span[class='Fw(500) D(ib) Fz(42px).']")
elem = wd.find_element_by_xpath("//span[starts-with(@class, 'Fw(500) D(ib) Fz(42px))]")
Neither of them works.
How can I select only the elements whose class starts with Fw(500) D(ib) Fz(42px),
i.e. both span elements in the sample markup given?
Try as below:
elem = wd.find_elements_by_css_selector("span.foobar")
If there is a space between the classes foo and bar, then try as below:
elem = wd.find_elements_by_css_selector("span.foo.bar")
Edited: If your class contains non-alphabetic characters and you want to find elements whose class starts with Fw(500) D(ib) Fz(42px), then try as below:
elem = wd.find_elements_by_css_selector("span[class ^= 'Fw(500) D(ib) Fz(42px)']")
Try to find elements by XPath:
//span[@class='foobar']
This should work.
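The difference between the two selector styles can be illustrated without a browser. The sketch below mimics, in plain Python, how [class^='...'] compares against the raw attribute string while .foo.bar checks for class tokens in any order. The function names are hypothetical and only approximate the CSS semantics:

```python
def matches_prefix(class_attr, prefix):
    """Approximates span[class^='prefix']: raw string prefix comparison."""
    return class_attr.startswith(prefix)

def matches_tokens(class_attr, *tokens):
    """Approximates span.a.b: every token present, order irrelevant."""
    return set(tokens) <= set(class_attr.split())

first = "Fw(500) D(ib) Fz(42px)"
second = "Fw(500) D(ib) Fz(42px) Green XYZ"
reordered = "Green Fw(500) D(ib) Fz(42px)"

for value in (first, second, reordered):
    print(matches_prefix(value, "Fw(500) D(ib) Fz(42px)"),
          matches_tokens(value, "Fw(500)", "D(ib)", "Fz(42px)"))
```

The prefix form matches both spans from the question but would miss an element whose classes appear in a different order, which the token form still matches.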

Select value of an element property with Selenium

I need to extract the value of a span tag property using Selenium.
This is the HTML code:
<small class="time">
<a title="2015" class="class2 class3 class4 class5" href="url">
<span data-long-form="true" data-time-ms="1438835437000" data-time="1438835437" data-aria-label-part="last" class="class6 class7">Aug 5</span>
</a>
</small>
I need to extract the value of the "date-time" property of the span tag. Here is the Python code I am trying to use:
try:
    timestamp = element.find_element_by_xpath(".//small[contains(@class, 'time')]/a[1]/span[1]")
    print("timestamp", timestamp.value_of_css_property("data-time"))
except exp.NoSuchElementException:
    print("Timestamp location not proper")
I also tried:
timestamp = element.find_element_by_css_selector(".class2.class3.class4.class5").value_of_css_property("date-time")
but both return a blank result.
Any idea what is causing this problem?
Use get_attribute(). value_of_css_property() reads CSS style properties, not HTML attributes such as data-time:
element = driver.find_element_by_css_selector("small.time span[data-time]")
element.get_attribute("data-time")
Note that in your second attempt, you've used date-time instead of data-time.
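The data-time value in the markup looks like a Unix timestamp in seconds (data-time-ms appears to be the same instant in milliseconds). Assuming that interpretation, the string returned by get_attribute() can be converted to a datetime as sketched below. The helper name parse_epoch_seconds is hypothetical, and the literal stands in for the attribute value:

```python
from datetime import datetime, timezone

def parse_epoch_seconds(value):
    """Convert a string of epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)

# Value copied from the data-time attribute in the question
moment = parse_epoch_seconds("1438835437")
print(moment.isoformat())
```

The date shown in the span text may differ from the UTC date, since the page presumably renders the time in a local time zone.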
