Extract id of the div using selenium in Python - python

Can someone help me extract the id text (the underlined one in the attached image) from this link: https://opennem.org.au/facilities/au/?selected=DEIBDL&status=operating,committed,commissioning,retired
To see it, right-click any of the items in the table and choose Inspect.
I know how to extract the table text, but I want the id text that I underlined.

The selected element can be located with this XPath:
//div[@class='card is-selected']
or with this CSS selector:
div.is-selected
So, you can use this Selenium code to extract the id attribute value:
from selenium.webdriver.common.by import By
element_id = driver.find_element(By.XPATH, "//div[@class='card is-selected']").get_attribute("id")
Note that this gives you the id as it appears in the dev tools, which is not always the text you see as a user on the page.
To get the text the user sees, you will need this:
name = driver.find_element(By.CSS_SELECTOR, "div.is-selected .station-name").text
This can be done with XPath as well.
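For the XPath route, keep in mind that `@class='card is-selected'` only matches when the class attribute is exactly that string. A minimal sketch of the usual XPath 1.0 idiom for matching a single class token (the helper name `has_class` is hypothetical) builds the XPath counterpart of `div.is-selected .station-name`:

```python
def has_class(class_name: str) -> str:
    """Return an XPath 1.0 predicate matching elements whose class list contains class_name.

    Padding the normalized class attribute with spaces avoids false matches such as
    'is-selected-old' when looking for 'is-selected'.
    """
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {class_name} ')"


# XPath counterpart of the CSS selector "div.is-selected .station-name".
STATION_NAME_XPATH = f"//div[{has_class('is-selected')}]//*[{has_class('station-name')}]"

if __name__ == "__main__":
    print(STATION_NAME_XPATH)
    # With Selenium 4 this would be used as:
    # driver.find_element(By.XPATH, STATION_NAME_XPATH).text
```

The CSS selector is shorter; the XPath form is mainly useful when you need XPath-only features such as axes or text matching.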

Related

How to click on a label tag within nested divs in Selenium Python

I am trying to click on a 'Select all' label on a website, but I am having trouble. The HTML has the label nested within several divs. HTML example here.
I have tried various XPath expressions, but nothing actually clicks the element. Any ideas?
Here is my latest attempt:
driver.find_element(By.XPATH, "//div//label[contains(., 'Select all')]")
UPDATE
I was able to select the options individually with the following code:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
select_element = driver.find_element(By.ID, 'availableList')
select_object = Select(select_element)
all_available_options = select_object.options
count_of_options = len(all_available_options)
for x in range(count_of_options):
    select_object.select_by_index(x)
The element has a unique ID, so you can use that instead of XPath:
driver.find_element(By.ID,'availableAll').click()
Or you can use the full XPath; that will work too.
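The loop from the update can be wrapped in a small reusable helper. A sketch, assuming `availableList` is a `<select multiple>` element (selecting every option only makes sense there); it relies only on the `is_multiple`, `options` and `select_by_index` members of Selenium's `Select` class, and the helper name `select_all_options` is hypothetical:

```python
def select_all_options(select_object) -> int:
    """Select every option of a multi-select wrapped in selenium's Select.

    Returns the number of options selected. Works with any object exposing
    `is_multiple`, `options` and `select_by_index`, which Select provides.
    """
    if not select_object.is_multiple:
        raise ValueError("select_all_options() needs a <select multiple> element")
    option_count = len(select_object.options)
    for index in range(option_count):
        select_object.select_by_index(index)
    return option_count


# Usage with Selenium 4 (assumes `driver` is already set up):
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support.ui import Select
# select_all_options(Select(driver.find_element(By.ID, "availableList")))
```

Raising on a single-select list is a design choice: there, each `select_by_index` call would simply replace the previous selection, which is rarely what "select all" means.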

Python Selenium pulling child property from parent?

I'm trying to scrape a web form for text in specific fields. However, I can't do it with XPath because some forms are missing fields that won't be included in the page when it loads (e.g. if /html/blah/blah/p[3] is the initials field on one form, it might be the first name field on another form but have the same XPath). The structure for the fields is like this:
<p><strong>Initials:</strong> WT</p>
So, using Python Selenium, I'm doing:
driver.find_element_by_xpath("//*[contains(text(), 'Initials:')]"), which successfully pulls the "Initials:" text between the strong tags, but I specifically need the text after it, in this case WT. The element has a "nextSibling.data" property that contains the WT value, but from my googling I don't think it's possible to pull that property with Python Selenium. Does anyone know a way to pull the WT text following the XPath query?
The 'WT' text is in an awkward spot: it is a text node that follows the <strong> element rather than an element of its own, so Selenium's find_element methods cannot return it. The only way I know to grab that text is to use p_element.get_attribute('outerHTML'), which in this instance should return the string '<p><strong>Initials:</strong> WT</p>'. I doubt this is the cleanest solution, but here's a way to parse that text out:
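from selenium.webdriver.common.by import By
strong_close_tag = '</strong>'
p_close_tag = '</p>'
# ".." selects the parent <p> of the matched <strong> element
p_element = driver.find_element(By.XPATH, "//*[contains(text(), 'Initials:')]/..")
outer_html = p_element.get_attribute('outerHTML')
print(outer_html[outer_html.index(strong_close_tag) + len(strong_close_tag):outer_html.index(p_close_tag)].strip())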
Or use p_element.get_attribute('innerHTML'), which should return just <strong>Initials:</strong> WT. Then, similarly, grab the text after the </strong> closing tag, maybe like this:
p_element = driver.find_element(By.XPATH, "//*[contains(text(), 'Initials:')]/..")
print(p_element.get_attribute('innerHTML').split("</strong>", 1)[1].strip())
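An alternative that avoids slicing HTML: Selenium's `.text` on the `<p>` returns its rendered text with the label included (here `Initials: WT`), so removing the `<strong>` element's text is often enough. A minimal sketch of just that string step; the helper name `value_after_label` is hypothetical:

```python
def value_after_label(paragraph_text: str, label_text: str) -> str:
    """Return the text that follows label_text in paragraph_text, stripped.

    paragraph_text is what p_element.text returns, e.g. "Initials: WT";
    label_text is the <strong> element's text, e.g. "Initials:".
    """
    paragraph_text = paragraph_text.strip()
    label_text = label_text.strip()
    if not paragraph_text.startswith(label_text):
        raise ValueError(f"{paragraph_text!r} does not start with {label_text!r}")
    return paragraph_text[len(label_text):].strip()


if __name__ == "__main__":
    # Selenium 4 usage (assumes `driver` and `By` are already set up):
    # strong_element = driver.find_element(By.XPATH, "//strong[normalize-space()='Initials:']")
    # p_element = strong_element.find_element(By.XPATH, "..")
    # print(value_after_label(p_element.text, strong_element.text))
    print(value_after_label("Initials: WT", "Initials:"))  # prints "WT"
```

If you would rather read the text node itself, the `nextSibling.data` property from the question is reachable through JavaScript: `driver.execute_script("return arguments[0].nextSibling.data;", strong_element)`, where `arguments[0]` is the `<strong>` WebElement passed in. The returned string may include the leading space.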

Clicking on 'href' from <a> tag

I have the following in Python:
#Searching for company
varA = soup.find(Microsoft)
#Finding the <a> tag which contains href
#{<a data-deptmodal="true" href="https://someURL BASED ON COMPANY NAME">TEXT BASED ON COMPANY NAME</a>}
button = org.find_previous('a')
driver.find_element_by_tag_name(button).click()
and I get this error:
TypeError: Object of type 'Tag' is not JSON serializable
How do I make the WebDriver click on my href after I get the soup?
Please note that my href changes every time I change the company name.
To add to the existing comment: BeautifulSoup is an HTML parser. It helps you extract data from the HTML, but it does not interact with the page in any way; for instance, it cannot click the link.
If you need to click the link in the browser, do it via Selenium. In your case, locating the link by its text fits the problem really well: By.LINK_TEXT (or By.PARTIAL_LINK_TEXT for a substring match), which older Selenium versions exposed as .find_element_by_link_text() and .find_element_by_partial_link_text():
driver.find_element(By.LINK_TEXT, "Microsoft").click()
Documentation reference: Locating Hyperlinks by Link Text.
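If you already have the `<a>` tag from the soup, another option is not to click at all: read its `href` and navigate there with `driver.get(href)`. With BeautifulSoup that lookup is just `tag["href"]`; the sketch below does the same with the standard-library `html.parser` so it runs on its own. The function name `find_href_by_text` and the sample markup in the test are hypothetical. Note that navigating directly skips any JavaScript the page runs when the link is clicked.

```python
from html.parser import HTMLParser
from typing import Optional


class _LinkFinder(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []  # list of (href, text) tuples
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        # Only collect text while inside an <a href=...> element.
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            self.links.append((self._current_href, "".join(self._current_text).strip()))
            self._current_href = None


def find_href_by_text(html: str, link_text: str) -> Optional[str]:
    """Return the href of the first <a> whose text contains link_text, or None."""
    finder = _LinkFinder()
    finder.feed(html)
    for href, text in finder.links:
        if link_text in text:
            return href
    return None


# Usage with Selenium (assumes `driver` is already set up):
# href = find_href_by_text(driver.page_source, "Microsoft")
# if href:
#     driver.get(href)  # relative hrefs need urllib.parse.urljoin(driver.current_url, href)
```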

Selecting with non-class tag in scrapy python

I am trying to scrape a title from a website, but the problem is that it has no class or id.
Usually I use this to get a title that has a class:
titles = response.xpath('//a[@class="result-title hdrlnk"]/text()').extract()
Now I am trying to extract the text shown in this screenshot; can you please help me fix it? https://i.stack.imgur.com/k6aCN.png
You can locate a specific node by any attribute (not only class and id) or by its position relative to other nodes.
A few examples for the text in your screenshot:
response.xpath('//div[@class="job-title-text"]/a/text()')
response.xpath('//a[contains(@onclick,"clickJObTitle")]/text()')
response.xpath('//a[contains(@href,"jobdetails")]/text()')
response.css('div.job-title-text a::text')
response.css('a[onclick*=clickJObTitle]::text')
response.css('a[href*=jobdetails]::text')
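To try the attribute-based rules offline, here is a rough sketch using the standard library's `xml.etree.ElementTree` on made-up, well-formed markup (the sample HTML is hypothetical; the real page structure is only known from the screenshot). ElementTree supports only a subset of XPath, so the `contains()` variants are done as plain Python substring checks; in Scrapy itself the `response.xpath` / `response.css` calls above remain the way to go.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup shaped like the screenshot: the <a> has no class or id.
SAMPLE = """
<html><body>
  <div class="job-title-text">
    <a href="/jobdetails?id=1" onclick="clickJObTitle(1)">Data Engineer</a>
  </div>
  <div class="other"><a href="/about">About us</a></div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# Same idea as //div[@class="job-title-text"]/a/text(): select via the parent's class.
by_parent = [a.text for a in root.findall(".//div[@class='job-title-text']/a")]

# Same idea as //a[contains(@href,"jobdetails")]/text(): ElementTree has no
# contains(), so the substring test is done in Python.
by_href = [a.text for a in root.iter("a") if "jobdetails" in a.get("href", "")]

# Same idea as //a[contains(@onclick,"clickJObTitle")]/text().
by_onclick = [a.text for a in root.iter("a") if "clickJObTitle" in a.get("onclick", "")]

print(by_parent, by_href, by_onclick)
```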
See also:
https://www.w3schools.com/xml/xpath_syntax.asp
https://www.w3schools.com/cssref/css_selectors.asp

Alternative selection for element with unique ID - python selenium?

The text field has a unique ID that changes on every load, but the text field itself is the same!
Each time, the ID looks like this:
id-e9eeb082b846435682bfe4ce10359f17 - CSS
//*[@id="id-e9eeb082b846435682bfe4ce10359f17"] - XPath
html body div.main.main_bottom-740 div.container div.row div.col-md-24.col-xs-24 div.article-area.article-area_without-top-radius form.step2 div.row div.col-md-12.col-xs-12 input#id-e9eeb082b846435682bfe4ce10359f17.input.input__field.input_full-width.input_with-label-above.input_email-validation - full CSS path
I want to know how to select it another way; neither the XPath nor the ID works.
Some HTML in your question would help.
If the ID changes each time you load it you could try something like this:
//form[contains(@class,'step2')]/div[contains(@class,'row')]/div[contains(@class,'col-xs-12')]/input[contains(@id,'id-')]
Or you might be able to add some form of Regex to it.
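Browsers evaluate XPath 1.0, which has no regular-expression function, so a practical variant is to match the stable prefix in the selector (`input[id^='id-']` in CSS, or `starts-with(@id, 'id-')` in XPath) and, if that is still ambiguous, apply the regex in Python to the candidates' IDs. A sketch, assuming the generated part is always 32 lowercase hex digits as in the example ID; the helper name `pick_generated_id` is hypothetical:

```python
import re
from typing import Iterable, Optional

# Assumption: generated IDs look like "id-" followed by 32 lowercase hex digits,
# as in id-e9eeb082b846435682bfe4ce10359f17.
GENERATED_ID = re.compile(r"id-[0-9a-f]{32}")

# Selectors that match only the stable prefix of the ID.
CSS_PREFIX_SELECTOR = "input[id^='id-']"
XPATH_PREFIX_SELECTOR = "//form[contains(@class,'step2')]//input[starts-with(@id,'id-')]"


def pick_generated_id(candidate_ids: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first ID that fully matches the generated-ID pattern, or None."""
    for candidate in candidate_ids:
        # get_attribute("id") can return None, so skip empty values.
        if candidate and GENERATED_ID.fullmatch(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # With Selenium 4 (assumes `driver` and `By` are set up):
    # inputs = driver.find_elements(By.CSS_SELECTOR, CSS_PREFIX_SELECTOR)
    # element_id = pick_generated_id(el.get_attribute("id") for el in inputs)
    # target = driver.find_element(By.ID, element_id)
    print(pick_generated_id(["id-short", "id-e9eeb082b846435682bfe4ce10359f17"]))
```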
