XPath: Select specific item from property string


I'm trying to write an XPath that extracts just the URL from an element. I've gotten down to each of the listed blocks, but can't get any further than the element with its long string of attributes.
Example markup:
<div class="abc class">
<a class="123" title="abc" keys="xyz" href="url string">
Right now I have:
.//*[@id='content']/div/div[1]/a
That only retrieves the whole element, with every attribute from class through href. What would I need to retrieve just the "url string" from that part? Would this need a subsequent 'for' loop in the Python code?

A pure XPath solution would involve just adding @href to the expression:
.//*[@id='content']/div/div[1]/a/@href
In Python, assuming you are using lxml.html, you can get the attribute using .attrib:
for link in root.xpath(".//*[@id='content']/div/div[1]/a"):
    print(link.attrib['href'])
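The attribute lookup described above can be sketched without installing anything. This is only an illustration: it swaps lxml for the standard library's xml.etree.ElementTree, which supports a subset of XPath (no trailing /@href step), so the attribute is read with .get() instead. The PAGE markup is a hypothetical, well-formed version of the snippet in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment, closed properly so ElementTree can parse it.
PAGE = """
<html><body>
  <div id="content">
    <div>
      <div class="abc class">
        <a class="123" title="abc" keys="xyz" href="url string">link</a>
      </div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Same path as in the answer; ElementTree understands this subset of XPath.
links = root.findall(".//*[@id='content']/div/div[1]/a")

# ElementTree cannot select attributes in the path itself, so read href here.
hrefs = [link.get("href") for link in links]
print(hrefs)
```

With lxml the /@href form from the answer returns the attribute strings directly, so the list comprehension would not be needed there.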

Try to avoid positional indexes such as div[1].
If your class name is unique, you can do it like this:
//*[@id='content']/div/div[@class='abc class']/a[@keys='xyz']/@href
Hope it helps :)

Related

I'm having trouble selecting an element using Selenium with Python

I want to read out the text in this HTML element using Selenium with Python. I just can't find a way to find or select it without using the text (I don't want that because its content changes).
<div font-size="14px" color="text" class="sc-gtsrHT jFEWVt">0.101 ONE</div>
Do you have an idea how I could select it? The conventional ways listed in the documentation don't seem to work for me. To be honest, I'm not very good with HTML, which doesn't make things any easier.
Thank you in advance

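Try this (find_element_by_class_name accepts only a single class name, so a CSS selector is used here to match both classes):
text = browser.find_element_by_css_selector('.sc-gtsrHT.jFEWVt').text
Or use a loop if you have several elements:
elements = browser.find_elements_by_css_selector('.sc-gtsrHT.jFEWVt')
for e in elements:
    print(e.text)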

print(browser.find_element_by_xpath("//*[@class='sc-gtsrHT jFEWVt']").text)
The element has two class names, and by_class_name accepts only one, so match the full class attribute with XPath as shown.
This works only if the class names aren't generated dynamically; otherwise you'd have to right-click the element in the browser's inspector and copy its XPath, or find a unique identifier.
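To see why matching on the whole class attribute can be brittle, here is a small sketch that runs on static markup with the standard library's xml.etree.ElementTree rather than Selenium. The second div (same classes, reversed order) and the helper has_classes are hypothetical, added only for illustration.

```python
import xml.etree.ElementTree as ET

# First div is from the question; the second is a made-up variant
# with the same two classes in a different order.
SNIPPET = """
<body>
  <div font-size="14px" color="text" class="sc-gtsrHT jFEWVt">0.101 ONE</div>
  <div font-size="14px" color="text" class="jFEWVt sc-gtsrHT">0.202 ONE</div>
</body>
"""

root = ET.fromstring(SNIPPET.strip())

# [@class='...'] compares the entire attribute string, so order matters.
exact = [d.text for d in root.findall(".//div[@class='sc-gtsrHT jFEWVt']")]

def has_classes(elem, *names):
    """Return True if every name appears as a separate class token."""
    tokens = elem.get("class", "").split()
    return all(name in tokens for name in names)

# Token-based matching, similar in spirit to the CSS selector .a.b
by_token = [d.text for d in root.iter("div")
            if has_classes(d, "sc-gtsrHT", "jFEWVt")]

print(exact)
print(by_token)
```

The token-based check behaves like a compound CSS class selector, which is one reason a CSS selector is often a safer choice than an exact @class comparison.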

Find it by XPath, as long as the font-size and color attributes are consistent. For example:
//div[@font-size='14px' and @color='text' and starts-with(@class,'sc-')]
I guess the class name is randomly generated?

Remove HTML Tags python

I have looked everywhere for a solution to my problem, but none of them seem to work. Essentially, I want to know the simplest way to remove HTML tags from a string. For example,
PriceTag = Soup.find_all(class_="text-robux-lg wait-for-i18n-format-render")
print(PriceTag)
This returns [<span class="text-robux-lg wait-for-i18n-format-render">1,250</span>] which is very much expected, but I don't know how to take 'PriceTag' and remove the HTML tags.

Try using the .text attribute on a single element (for example, the result of Soup.find()):
print(PriceTag.text)
This removes the HTML tags and returns the inner text of the selected element.
Because find_all returns a list of elements, you need a for-loop to traverse it:
for price_tag in PriceTag:
    print(price_tag.text)

I am not that experienced, but I'll have a go at your question:
for price in PriceTag:
    print(price.text.strip())
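If the markup is already a plain string and BeautifulSoup is not at hand, tags can also be dropped with the standard library. The following is a minimal sketch, not what the answers above use: TextExtractor and strip_tags are hypothetical names, and the approach simply collects the text nodes that html.parser reports.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text found outside of tags.
        self.parts.append(data)

def strip_tags(markup):
    """Return the text content of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).strip()

price = strip_tags(
    '<span class="text-robux-lg wait-for-i18n-format-render">1,250</span>'
)
print(price)
```

For elements that BeautifulSoup has already parsed, the .text loop shown in the answers is shorter and is usually the better fit.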

selenium exact match based on text

If I have some HTML:
<span class="select2-selection__rendered" id="select2-plotResults-container" role="textbox" aria-readonly="true" title="50">50</span>
And I want to find it using something like:
driver.find_element_by_xpath('//*[contains(text(), "50")]')
The problem is that there is a 500 somewhere earlier on the webpage and it's picking up on that. Is there a way to search for an exact match to 50?

Instead of contains, search for a specific text value:
driver.find_element_by_xpath('//*[text()="50"]')
And if you know it will be a span element, you can be a little more specific:
driver.find_element_by_xpath('//span[text()="50"]')
Note that your question asks how to find an element by its text value. If it applies to your situation, you should instead look for a specific class or id that is known and consistent.
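The difference between a substring match and an exact match can be checked on static markup. This sketch uses the standard library's xml.etree.ElementTree instead of Selenium, so contains() is imitated with Python's in operator (ElementTree's XPath subset has no contains()). The extra span holding 500 is a hypothetical stand-in for the earlier element on the page.

```python
import xml.etree.ElementTree as ET

# The second span is from the question; the first is a made-up element
# standing in for the "500" that appears earlier on the page.
SNIPPET = """
<div>
  <span title="500">500</span>
  <span class="select2-selection__rendered" id="select2-plotResults-container" role="textbox" aria-readonly="true" title="50">50</span>
</div>
"""

root = ET.fromstring(SNIPPET.strip())

# Substring test, analogous to contains(text(), "50"): matches both spans.
substring_hits = [s.get("title") for s in root.iter("span")
                  if "50" in (s.text or "")]

# Exact test, analogous to text()="50": matches only the intended span.
exact_hits = [s.get("title") for s in root.findall(".//span[.='50']")]

print(substring_hits)
print(exact_hits)
```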

You can search for it by its absolute Xpath. For that, inspect the page and find the element. Then right-click it and copy its Xpath or full Xpath.
Otherwise you can use the id:
driver.find_element_by_id("select2-plotResults-container")
Here is more on locating elements.

Use something like this:
msg_box = driver.find_element_by_class_name('_3u328') and driver.find_element_by_xpath('//div[@data-tab = "{}"]'.format('1'))

Selenium Webscraping Twitter - Getting hold of tweet timestamp?

When inspecting a twitter results page, within the following class:
<small class="time">
....
</small>
Is a timestamp for each tweet 'data-time':
<span class="_timestamp js-short-timestamp js-relative-timestamp" data-time="1510698047" data-time-ms="1510698047000" data-long-form="true" aria-hidden="true">12m</span>
Within selenium i am using the following code:
tweet_date = browser.find_elements_by_class_name('_timestamp')
But looking at a single entry only returns, in this case, 12m.
How is it possible to access one of the other attributes of the element, such as data-time, within Selenium?

I usually use find_elements_by_xpath, which lets you grab a specific element from a page without worrying about class names. Or at least that's how it seems to work.
EDIT
Alright, I think I've got it figured out. First, find the element by XPath and assign it.
ts = browser.find_elements_by_xpath('//*[@id="stream-item-tweet-929138668551380992"]/div/div[2]/div[1]/small/a/span')
If you use "elements" instead of "element", you get a list back, so you'll need to pick an item from it like this.
ts = ts[0]
Then you can use the get_attribute method to get the value of 'data-time' in the HTML.
raw_time = ts.get_attribute('data-time')
Returns
raw_time == '1510358895'

Thank you to SuperStew who found the key to the answer - get_attribute()
My final solution for anyone wondering:
tweet_date = browser.find_elements_by_class_name("_timestamp")
And then for any date in that list:
tweet_date[1].get_attribute('data-time')
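get_attribute returns the value as a string. Assuming data-time holds Unix epoch seconds (the matching data-time-ms value, 1000 times larger, suggests so, but that is an assumption), it can be turned into a readable timestamp with the standard library. The value below is the one from the markup in the question.

```python
from datetime import datetime, timezone

# String as it would come back from get_attribute('data-time').
raw_time = "1510698047"

# Assumption: the value is seconds since the Unix epoch, in UTC.
tweet_time = datetime.fromtimestamp(int(raw_time), tz=timezone.utc)

print(tweet_time.isoformat())  # 2017-11-14T22:20:47+00:00
```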

Selenium XPath multiple attributes including text

Here is the HTML I'm dealing with
<a class="_54nc" href="#" role="menuitem">
<span>
<span class="_54nh">Other...</span>
</span>
</a>
I can't seem to get my XPath structured correctly to find this element with the link. There are other elements on the page with the same attributes as <a class="_54nc"> so I thought I would start with the child and then go up to the parent.
I've tried a number of variations, but I would think something like this:
crawler.get_element_by_xpath('//span[@class="_54nh"][contains(text(), "Other")]/../..')
None of the things I've tried seem to be working. Any ideas would be much appreciated.

Or, more cleanly, use //*[.='Other...']/../.. Here . refers to the element's own text content, and each /.. steps up one level to the parent element.
Alternatively, if you just want to find the a tag, use the CSS selector [role='menuitem'], which is a better option if the role attribute is unique.

How about trying this (note that contains() is case-sensitive):
crawler.get_element_by_xpath('//a[@class="_54nc"][./span/span[contains(text(), "Other")]]')

Try this:
crawler.get_element_by_xpath("//a[@class='_54nc'][.//span[.='Other...']]")
This searches for the a element with class "_54nc" that contains a span with the exact text "Other...". You can replace "Other..." with other text to find the corresponding element(s).
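The child-then-parent idea from the question can be tried on static markup. This is a sketch using the standard library's xml.etree.ElementTree rather than Selenium; it has no contains(), so the exact-text predicate [.='Other...'] is used. The first link, with the text "Something else", is hypothetical and only there to show that the path picks the right one.

```python
import xml.etree.ElementTree as ET

# Second link is from the question; the first is a made-up sibling
# with the same attributes but different text.
SNIPPET = """
<div>
  <a class="_54nc" href="#" role="menuitem">
    <span>
      <span class="_54nh">Something else</span>
    </span>
  </a>
  <a class="_54nc" href="#" role="menuitem">
    <span>
      <span class="_54nh">Other...</span>
    </span>
  </a>
</div>
"""

root = ET.fromstring(SNIPPET.strip())

# Find the inner span by class and exact text, then step up two parents.
matches = root.findall(".//span[@class='_54nh'][.='Other...']/../..")

for link in matches:
    print(link.tag, "".join(link.itertext()).strip())
```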
