Pulling text value from <a class> <b class> with Selenium - python

I'm using Selenium to pull ownership information for a given PredictIt market (e.g., https://www.predictit.org/Home/SingleOption?contractId=7347#data). The number of shares owned is nested in:
How can I pull out the number?
I've tried
self.driver.find_element_by_class_name("showpointer showOwnership").text
self.driver.find_element_by_id('showpointer showOwnership')
self.driver.find_element_by_class_name("label alert-success label-lg")
self.find_element_by_css_selector("spand[class='label alert-success label-lg']")
self.driver.findElement(By.cssSelector("#ctrlNotesWindow .notesData > .notesDate")).getText())
all to no avail. Any suggestions would be greatly appreciated! Thanks :)
Edit: All errors have been:
"NoSuchElement Error"

Why your attempts failed:
the "by class name" locator requires a single class name, not several
there are no id attributes present, so the "by id" locator would match nothing
your CSS selector looks for a non-existent spand element. Plus, you are calling find_element_by_css_selector on self instead of self.driver
your last attempt is Java, not Python
Judging by what you've posted, I would use a CSS selector checking classes of a and b:
self.driver.find_element_by_css_selector("a.showOwnership > b.label").text
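To make the first point concrete, here is a minimal sketch of how a compound class attribute maps onto a CSS selector. classes_to_css is a hypothetical helper, not part of Selenium. The class strings are the ones from the question's attempts, and it is an assumption that the second one belongs to the b element.

```python
def classes_to_css(tag, class_attr):
    """Turn a tag name and a space-separated class attribute into a CSS selector.

    Hypothetical helper for illustration; it does not call Selenium.
    """
    # "label alert-success label-lg" holds three class names, not one
    class_names = class_attr.split()
    return tag + "".join("." + name for name in class_names)


anchor_selector = classes_to_css("a", "showpointer showOwnership")
label_selector = classes_to_css("b", "label alert-success label-lg")

# A string like this is what a CSS locator expects
print(anchor_selector + " > " + label_selector)
```

The resulting string can be passed to a CSS selector locator. Using fewer classes, as in the answer above, is usually enough and less brittle.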

Related

Selenium Webscraping Twitter - Getting hold of tweet timestamp?

When inspecting a twitter results page, within the following class:
<small class="time">
....
</small>
Is a timestamp for each tweet 'data-time':
<span class="_timestamp js-short-timestamp js-relative-timestamp" data-time="1510698047" data-time-ms="1510698047000" data-long-form="true" aria-hidden="true">12m</span>
Within selenium i am using the following code:
tweet_date = browser.find_elements_by_class_name('_timestamp')
But looking at a single entry only returns, in this case, 12m.
How is it possible to access one of the other properties within the class within selenium?
I usually use find_elements_by_xpath, which lets you grab a specific element from a page without worrying about names. At least, that's how it seems to work.
EDIT
Alright, so I think I've got it figured out. First, find the element by XPath and assign it.
ts=browser.find_elements_by_xpath('//*[@id="stream-item-tweet-929138668551380992"]/div/div[2]/div[1]/small/a/span')
Because this uses "elements" rather than "element", it returns a list, so you'll need to take the first item:
ts=ts[0]
Then you can use the get_attribute method to get the info associated with 'data-time' in the html.
raw_time=ts.get_attribute('data-time')
Returns
raw_time == '1510358895'
Thank you to SuperStew who found the key to the answer - get_attribute()
My final solution for anyone wondering:
tweet_date = browser.find_elements_by_class_name("_timestamp")
And then for any date in that list:
tweet_date[1].get_attribute('data-time')
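The difference between an element's visible text (12m) and its attributes (data-time) can be shown without a browser. The following is a minimal sketch that uses the standard library's html.parser in place of Selenium, on the span from the question. TimestampCollector is a hypothetical name, and treating data-time as seconds since the Unix epoch in UTC is an assumption based on its format.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser

SNIPPET = (
    '<small class="time">'
    '<span class="_timestamp js-short-timestamp js-relative-timestamp" '
    'data-time="1510698047" data-time-ms="1510698047000" '
    'data-long-form="true" aria-hidden="true">12m</span>'
    '</small>'
)


class TimestampCollector(HTMLParser):
    """Collect the data-time attribute of every span carrying the _timestamp class."""

    def __init__(self):
        super().__init__()
        self.raw_times = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        class_names = (attributes.get("class") or "").split()
        if tag == "span" and "_timestamp" in class_names:
            raw = attributes.get("data-time")
            if raw is not None:
                self.raw_times.append(raw)


collector = TimestampCollector()
collector.feed(SNIPPET)

# Assumption: data-time is seconds since the Unix epoch, in UTC
posted = [datetime.fromtimestamp(int(raw), tz=timezone.utc) for raw in collector.raw_times]
print(collector.raw_times, [moment.isoformat() for moment in posted])
```

With Selenium, the string returned by get_attribute('data-time') can be converted the same way.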

python selenium accessing html elements on more than two levels

I have a page that has to be scraped. I use this Python code:
div = driver.find_element_by_class_name("parent")
data = div.find_elements_by_class_name("child1")
# I cannot access the web elements of data, e.g. data.find_elements_by
for tag in data:
    # I cannot print the information of each div here
    pass
The HTML:
<div class="Parent">
    <div class="child1">
        <div class="heading">
            data
        </div>
    </div>
    <div class="child1 child2">strong text
        <div class="heading">
            <span>data</span>
        </div>
    </div>
</div>
Is there an easy way to access the data?
Well, you can access HTML tags or text in different ways: http://selenium-python.readthedocs.io/locating-elements.html
For multiple elements you can use:
find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector
As far as I'm aware there is no single simple solution; it depends on the specifics of the information you're looking for.
For instance, let's say you're using XPath (my personal preference):
Absolute XPath:
/html/body/div[2]/div/div/footer/section[3]/div/ul/li[3]/a
The XPath above will technically work, but each of those nested
relationships must be present 100% of the time, or the locator
will not function. An XPath chosen this way is known as an absolute XPath.
There is a good chance that it will change with every release. It
is always better to choose a relative XPath, as it reduces
the chance of an element-not-found exception.
Relative XPath: //*[@id='social-media']/ul/li[3]/a
By choosing the right way to select the data, you can extract only the information you need. Look into each of these methods to understand them better: you're asking for one line of code, and each method has its pros and cons (times when it is useful and times when it is not).
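To illustrate why a positional path is more fragile than one anchored on an id, here is a minimal sketch using the standard library's xml.etree.ElementTree in place of a browser. Both page versions are made up for the example, and ElementTree supports only a subset of XPath, so the positional path is written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same page; the second adds a banner div.
PAGE_V1 = """<html><body>
<div id="header"></div>
<div id="social-media"><ul>
<li><a href="/one">One</a></li>
<li><a href="/two">Two</a></li>
<li><a href="/three">Three</a></li>
</ul></div>
</body></html>"""

PAGE_V2 = PAGE_V1.replace(
    '<div id="header"></div>',
    '<div id="header"></div>\n<div id="banner"></div>',
)

POSITIONAL = "body/div[2]/ul/li[3]/a"  # depends on every level staying put
ANCHORED = ".//*[@id='social-media']/ul/li[3]/a"  # depends only on the id


def link_text(page, path):
    """Return the text of the first match for path, or None if nothing matches."""
    match = ET.fromstring(page).find(path)
    return None if match is None else match.text


for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    print(name, link_text(page, POSITIONAL), link_text(page, ANCHORED))
```

In the second version the positional path lands on the banner div and finds nothing, while the id-anchored path still reaches the same link.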
It seems you want to access the text inside the heading div. If so, you can try the code below.
element=driver.find_element_by_class_name("heading")
data=element.text
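Note that find_elements_by_class_name returns a list, so each item has to be searched on its own rather than calling a find method on the list itself. Here is a minimal sketch of that loop, using the standard library's xml.etree.ElementTree in place of a live page. The markup is the question's HTML with the attribute values quoted (an assumption about what was intended), and has_class is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PAGE = """<div class="Parent">
<div class="child1"><div class="heading">data</div></div>
<div class="child1 child2">strong text<div class="heading"><span>data</span></div></div>
</div>"""


def has_class(elem, name):
    """True if name is one of the space-separated classes on elem."""
    return name in elem.get("class", "").split()


root = ET.fromstring(PAGE)
headings = []
for child in root.iter("div"):
    if not has_class(child, "child1"):
        continue
    # Search inside each matching block, one block at a time
    for inner in child.iter("div"):
        if has_class(inner, "heading"):
            # itertext() also picks up text nested in the span
            headings.append("".join(inner.itertext()).strip())

print(headings)
```

With Selenium the same shape applies: loop over the list from find_elements_by_class_name("child1") and call find_element_by_class_name("heading") on each item.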
Assuming you are asking for a way to loop through data where the information sits under different locators at different nesting levels, there are multiple ways:
Look for selectors that match your pattern and pick the one that fits your problem; a CSS/XPath selector reference will help.
If there are many selectors (but they are used consistently), you can use the ByChained/ByAll selectors. Look at the Java implementation, which you can mimic; it looks like this:
selector1 = .Heading .child2
selector3 = .Heading .child3 span
selector2 = .Heading .child1
ByAll(selector1, selector2, selector3)
If the parent is the only matching selector and there is no way to know about the child selectors, another way is to read the innerText/textContent property from a common parent:
driver.findElement(By.cssSelector(".child1")).getAttribute("innerText")
If none of these solves your problem, and your application is dynamic enough to use different references and nesting levels each time across the page, then it was not meant to be scraped, so you should look for other ways of scraping it.
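The ByAll idea (collect whatever matches any of several selectors) is easy to mimic in Python. This is a minimal sketch, not the Java implementation: find_by_all is a hypothetical function, the paths are examples written for the question's HTML with quoted attributes, and xml.etree.ElementTree stands in for the browser.

```python
import xml.etree.ElementTree as ET

PAGE = """<div class="Parent">
<div class="child1"><div class="heading">data</div></div>
<div class="child1 child2">strong text<div class="heading"><span>data</span></div></div>
</div>"""


def find_by_all(root, paths):
    """Return elements matching any of the paths, without duplicates."""
    seen = set()
    matches = []
    for path in paths:
        for elem in root.findall(path):
            # An element matched by an earlier path is not added again
            if id(elem) not in seen:
                seen.add(id(elem))
                matches.append(elem)
    return matches


root = ET.fromstring(PAGE)
paths = [
    ".//div[@class='child1']/div[@class='heading']",
    ".//div[@class='heading']/span",
    ".//div[@class='heading']",
]
found = find_by_all(root, paths)
tags = [elem.tag for elem in found]
texts = ["".join(elem.itertext()).strip() for elem in found]
print(tags, texts)
```

With Selenium, the same loop can be written over find_elements_by_css_selector calls. Selenium elements compare equal when they refer to the same node, so the de-duplication could use the elements themselves.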

to get xpath in python selenium for ID and class together

In python selenium, how to create xpath for below code which needs only id and class:
<button type="button" id="ext-gen756" class=" x-btn-text">Save</button>
And I also need to select Global ID from below drop-down without clicking it.
<div class="x-combo-list-item">Global ID</div>
My solution below is not working:
//div[@class='x-combo-list-item']/div[contains(.,'Global ID')]
I do not want to use the drop-down item's sequence number, like:
//div[@class='x-combo-list-item']/div[1]
If you want to combine id and class in your XPath, try this:
driver.find_element_by_xpath('//button[@id="ext-gen756"][@class=" x-btn-text"]')
You can also do the same using and:
driver.find_element_by_xpath('//button[@id="ext-gen756" and @class=" x-btn-text"]')
EDITED
Your XPath seems incorrect. Use the following:
driver.find_element_by_xpath('//div[@class="x-combo-list-item"][contains(.,"Global ID")]')
Answering my own question after coming back to it a long time later. I posted it when I was new to XPath.
<button type="button" id="ext-gen756" class=" x-btn-text">Save</button>
in terms of id and class:
driver.find_element_by_xpath("//button[@id='ext-gen756'][@class=' x-btn-text']")
Also, if the ids are dynamic and change on every page reload, you may try:
driver.find_element_by_xpath("//button[@type='button'][contains(@id,'ext-gen')][@class=' x-btn-text']")
Here I have used @type, and contains() for @id, since the prefix (ext-gen) usually stays the same for dynamic ids.
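The attribute predicates can be tried offline. The following is a minimal sketch using the standard library's xml.etree.ElementTree, which supports chained [@attr='value'] predicates but not contains(), so the dynamic-id case is filtered in Python instead. The surrounding form and the extra button and list item are made up for the example.

```python
import xml.etree.ElementTree as ET

# The Save button and the Global ID item come from the question;
# the wrapper and the other two elements are hypothetical.
PAGE = """<form>
<button type="button" id="ext-gen756" class=" x-btn-text">Save</button>
<button type="button" id="ext-gen757" class=" x-btn-text">Cancel</button>
<div class="x-combo-list-item">Local ID</div>
<div class="x-combo-list-item">Global ID</div>
</form>"""

root = ET.fromstring(PAGE)

# id and class together: two predicates on the same step
save = root.find(".//button[@id='ext-gen756'][@class=' x-btn-text']")

# Select the list item by its text rather than by position
global_id = root.find(".//div[@class='x-combo-list-item'][.='Global ID']")

# Dynamic ids: match on the stable prefix and the button text
dynamic = [
    button
    for button in root.iter("button")
    if button.get("id", "").startswith("ext-gen") and button.text == "Save"
]

print(save.text, global_id.text, [button.get("id") for button in dynamic])
```

The leading space in " x-btn-text" matters for an exact @class comparison, which is why it is kept in the predicate.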

XPath: Select specific item from property string

Trying to drill down to a specific Xpath of a url in a longer string. I've gotten down to each of the listed blocks, but can't seem to get any further than the long string of properties.
example code:
<div class="abc class">
<a class="123" title="abc" keys="xyz" href="url string">
Right now I have...
.//*[@id='content']/div/div[1]/a
That only retrieves the whole string of data, from class through href. What would I need to just retrieve the "url string" from that part? Would this need to be accomplished with a subsequent 'for' argument in the python input?
A pure XPath solution would involve just adding @href to the expression:
.//*[@id='content']/div/div[1]/a/@href
In Python, assuming you are using lxml.html, you can get the attribute using .attrib:
for link in root.xpath(".//*[@id='content']/div/div[1]/a"):
    print(link.attrib['href'])
Try to avoid the positional index (div[1]).
If your class name is unique, you can do it like this:
//*[@id='content']/div/div[@class='abc class']/a[@keys='xyz']/@href
Hope it will help you :)
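Both approaches (positional and attribute-based) can be compared in a small offline sketch. It uses the standard library's xml.etree.ElementTree rather than lxml, so the href is read with .get() on the matched element instead of a trailing /@href step. The wrapper elements around the question's snippet, including the element with id "content", are assumptions.

```python
import xml.etree.ElementTree as ET

# The inner div and a element come from the question; the wrappers are assumed.
PAGE = """<body>
<div id="content"><div>
<div class="abc class"><a class="123" title="abc" keys="xyz" href="url string">link</a></div>
</div></div>
</body>"""

root = ET.fromstring(PAGE)

# Positional version, as in the question
by_position = root.findall(".//*[@id='content']/div/div[1]/a")

# Attribute-based version, which avoids the index
by_attributes = root.findall(".//div[@class='abc class']/a[@keys='xyz']")

hrefs_by_position = [link.get("href") for link in by_position]
hrefs_by_attributes = [link.get("href") for link in by_attributes]
print(hrefs_by_position, hrefs_by_attributes)
```

Selecting the element and then reading one attribute from it is also how it works in Selenium, where get_attribute('href') plays the role of .get().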

Selenium XPath multiple attributes including text

Here is the HTML I'm dealing with
<a class="_54nc" href="#" role="menuitem">
<span>
<span class="_54nh">Other...</span>
</span>
</a>
I can't seem to get my XPath structured correctly to find this element with the link. There are other elements on the page with the same attributes as <a class="_54nc"> so I thought I would start with the child and then go up to the parent.
I've tried a number of variations, but I would think something like this:
crawler.get_element_by_xpath('//span[@class="_54nh"][contains(text(), "Other")]/../..')
None of the things I've tried seem to be working. Any ideas would be much appreciated.
Or, more cleanly, use //*[.='Other...']/../.., where each .. steps up to the parent element.
Alternatively, if you want to find the a tag, use the CSS selector [role='menuitem'], which is a better option if the role attribute is unique.
How about trying this:
crawler.get_element_by_xpath('//a[@class="_54nc"][./span/span[contains(text(), "Other")]]')
Try this:
crawler.get_element_by_xpath("//a[@class='_54nc'][.//span[.='Other...']]")
This searches for an a element with class "_54nc" that contains a span with the exact text "Other...". You can replace "Other..." with other text to find the corresponding element(s).
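The child-then-parent approach from the question can be checked offline. This is a minimal sketch with the standard library's xml.etree.ElementTree, which supports .. and exact-text predicates but not contains(). The wrapper div and the second link are made up so that the class alone is ambiguous.

```python
import xml.etree.ElementTree as ET

# The first link is the question's HTML; the second is a hypothetical sibling
# sharing the same attributes.
PAGE = """<div>
<a class="_54nc" href="#" role="menuitem"><span><span class="_54nh">Other...</span></span></a>
<a class="_54nc" href="#" role="menuitem"><span><span class="_54nh">Settings</span></span></a>
</div>"""

root = ET.fromstring(PAGE)

# Start at the inner span with the exact text, then step up twice to the link
links = root.findall(".//span[@class='_54nh'][.='Other...']/../..")

# For comparison: the class alone matches both links
same_class = root.findall(".//a[@class='_54nc']")

print([link.tag for link in links], len(same_class))
```

Only one link survives the text predicate, whereas matching on the class alone returns both.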
