How to use a selector to select an HTML element's attribute? - python

Currently, I am using Scrapy.
The selector works fine when matching
<xxx> something to match </xxx>
But I want to match
<xxx name="something I want match"> xxx </xxx>
What I want to match is inside the element's opening tag.
I know regex is one solution. Is there an easier way to do this?

I found two ways to do this:
1. sel.xpath('//baseTag/@attrName')
2. sel.css('baseTag::attr(attrName)')

Related

How to use Regex in CSS Selector scrapy

I need to get a ul tag by its class name, but the class name comes in many variants in which only two letters change: product-gallerytw__thumbs is one and product-galleryfp__thumbs is another. I need a CSS selector that uses a regex so that either of these (or any other combination) is found.
I can't use XPath because the location changes.
img_ul = response.css('.product-gallerytw__thumbs')
print(img_ul)
This is what I am trying to do, but I have not found a way to use a regex inside .css().
You actually can use XPath:
img_ul = response.xpath("//*[contains(@class,'product-gallery')]")
or, if you really need to specify everything but the two characters:
img_ul = response.xpath("//*[contains(@class,'product-gallery')][contains(@class,'__thumbs')]")
There is nothing a CSS selector can do that XPath can't. In fact, Scrapy translates CSS selectors into XPath internally.

selenium exact match based on text

If I have some HTML:
<span class="select2-selection__rendered" id="select2-plotResults-container" role="textbox" aria-readonly="true" title="50">50</span>
And I want to find it using something like:
driver.find_element_by_xpath('//*[contains(text(), "50")]')
The problem is that there is a 500 somewhere earlier on the webpage and it's picking up on that. Is there a way to search for an exact match of 50?
Instead of contains, search for a specific text value:
driver.find_element_by_xpath('//*[text()="50"]')
And if you know it will be a span element, you can be a little more specific:
driver.find_element_by_xpath('//span[text()="50"]')
Note that your question asks how to find an element by its text value. If your situation allows it, you should instead look for a specific class or id, provided it is known and consistent.
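To see why contains() picks up the 500, you can evaluate the same XPath expressions offline. This sketch uses lxml rather than Selenium (an assumption made so that it runs without a browser), and the extra 500 span is invented sample markup; the expressions themselves are the ones you would pass to find_element_by_xpath.

```python
from lxml import html  # assumption: lxml is installed; used here instead of a live browser

# Hypothetical page fragment: the first span stands in for the stray "500"
page = html.fromstring("""
<div>
  <span title="500">500</span>
  <span class="select2-selection__rendered" id="select2-plotResults-container" title="50">50</span>
</div>
""")

# contains() is a substring test, so the "500" span matches as well
loose = [el.get("title") for el in page.xpath('//span[contains(text(), "50")]')]

# text()="50" requires a text node that is exactly "50"
exact = [el.get("id") for el in page.xpath('//span[text()="50"]')]

# normalize-space() also tolerates whitespace around the value in the markup
trimmed = page.xpath('//span[normalize-space()="50"]')

print(loose, exact, len(trimmed))
```

The normalize-space() variant is worth considering if the real page pads the number with spaces or newlines.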
You can search for it by its absolute Xpath. For that, inspect the page and find the element. Then right-click it and copy its Xpath or full Xpath.
Otherwise you can use the id:
driver.find_element_by_id("select2-plotResults-container")
Here is more on locating elements.
Use something like this:
msg_box=driver.find_element_by_class_name('_3u328') and driver.find_element_by_xpath('//div[@data-tab = "{}"]'.format('1'))

How to use BeautifulSoup to get only strings from tags that have specific start?

I am scraping usernames and all of them are in the same a tag and their hrefs all start the same, like this:
Sadastyczny
I tried finding only if they have the class link5 but there are other values that have that class which I don't want to scrape. So is there a way to search for all the tags which have the
href="http://lolprofile.net/summoner"
in them but not the rest since that obviously is different for every username?
From the BeautifulSoup documentation.
Using a regular expression, you can match the links. If you have never used regular expressions, you can use this:
soup.find_all(href=re.compile(r"^http://lolprofile\.net/summoner"))
Don't forget to import the re module!
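Here is a small self-contained sketch of that approach, plus a CSS alternative using the "starts with" attribute selector. The two links are hypothetical sample data (the real href paths are not shown in the question), and bs4 is assumed to be installed.

```python
import re
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed

# Hypothetical sample markup; the path after /summoner is invented for illustration
html_doc = """
<a class="link5" href="http://lolprofile.net/summoner/example-region/Sadastyczny">Sadastyczny</a>
<a class="link5" href="http://lolprofile.net/leaderboards">Leaderboards</a>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Regex filter: keep only hrefs that start with the summoner prefix
pattern = re.compile(r"^http://lolprofile\.net/summoner")
names_regex = [a.get_text(strip=True) for a in soup.find_all("a", href=pattern)]

# CSS alternative: ^= means "attribute value starts with"
names_css = [
    a.get_text(strip=True)
    for a in soup.select('a[href^="http://lolprofile.net/summoner"]')
]

print(names_regex, names_css)
```

Both variants skip the second link even though it has the same link5 class.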

Select only tags that contain text with XPath in Selenium

I want to select only tags (without children/descendants) that contain text.
This is what I am looking for:
//*/descendant::text()[normalize-space()]
It doesn't work in Selenium. Is there a way to use this expression in Selenium with find_elements_by_xpath()?
When you say:
only tags (without children/descendants)
I'm assuming you mean no child/descendant element nodes. If this is correct, this xpath should (I don't use selenium) work...
//*[normalize-space() and not(*)]
This will select any element that contains text (other than whitespace) and doesn't contain a child element.
For example, it will not match p but it will match b in this case:
<p>text <b>more text</b></p>
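A quick way to check this outside Selenium is to run the expression with lxml (an assumption on my part, chosen only because it needs no browser). In Selenium you would pass the same string to find_elements_by_xpath() and read each element's .text.

```python
from lxml import html  # assumption: lxml is installed; stands in for the browser here

root = html.fromstring("<p>text <b>more text</b></p>")

# Elements that contain non-whitespace text and have no child elements
leaf_tags = [el.tag for el in root.xpath("//*[normalize-space() and not(*)]")]

print(leaf_tags)
```

Because the expression returns element nodes rather than text nodes, Selenium can work with the result.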
Try:
//*[contains(text(), 'textToFind')]
This looks for any tag whose text contains the specified string.

Find text and elements with python and selenium?

When I go to a certain webpage I am trying to find a certain element and piece of text:
<span class="Bold Orange Large">0</span>
This didn't work: (It gave an error of compound class names or something...)
elem = browser.find_elements_by_class_name("Bold Orange Large")
So I tried this: (but I'm not sure it worked because I don't really understand the right way to do css selectors in selenium...)
elem = browser.find_elements_by_css_selector("span[class='Bold Orange Large']")
Once I find the span element, I want to find the number that is inside...
num = elem.(what to put here??)
Any help with css selectors, class names, and finding element text would be great!!
Thanks.
EDIT:
My other problem is that there are multiple of those exact span elements but with different numbers inside..how can I deal with that?
You're correct in your usage of CSS selectors! Your first attempt failed because the class attribute contains spaces, so it is a compound of three class names (Bold, Orange, Large), and find_elements_by_class_name accepts only a single class name. I think that is a bad development practice to begin with, so it's not your problem. Selenium itself does not include an HTML editor, because that has already been done elsewhere.
Try looking here: How to find/replace text in html while preserving html tags/structure.
Also this one is relevant and popular as well: RegEx match open tags except XHTML self-contained tags
