I intended to get texts next to the div-class like below codes.
<div class="review-contents__text">소재가 좀 저렴해 보이지만 그래도 입으면 휠씬 나아보여요</div>
First, I make a code like this.
texts_outer_review1 = driver.find_elements(By.CSS_SELECTOR, "div.review-contents > div.review-contents__text")
outer_TextData.append(texts_outer_review1)
But, It shows what seems to be interpreted in Selenium lib as captured image
Can I get the texts next to class names By codes?:
texts_outer.review1.texts()
driver.find_elements(...).texts()
I will try to run codes above, But I don't know it's right.
Or Should I add more sentences in 'driver.find_elements'?
what you just need is 'text' attribute.
but as you used find_element's' method, you would have list type.
so the solution is
result = [element.text for element in texts_outer_review1]
or
result = [element.get_property('text') for element in texts_outer_review1]
and you can append or extend(cause the result is list type) on outer_TextData
Related
I am learning web scraping using selenium and I've come into an issue when trying to select an attribute inside of a selenium object. I can get the broader data if I just print elems.text inside the loop (this outputs the whole paragraph for each listing) however when I try to access the xpath of the h2 title tag of all the listings inside this broader element, it only appends the first listing to the titles array, whereas I want all of them. I checked the XPATH and they are the same for each listing. How can I get all of the listings instead of just the first one?
titles = []
driver.get("https://www.sellmytimesharenow.com/timeshare/All+Timeshare/vacation/buy-timeshare/")
results = driver.find_elements(By.CLASS_NAME, "results-list")
for elems in results:
print(elems.text) #this prints out full description paragraphs
elem_title = elems.find_element(By.XPATH, '//*[#id="search-page"]/div[3]/div/div/div[2]/div/div[2]/div/a[1]/div/div[1]/div/h2')
titles.append(elem_title.text)
If you aren't limited to accessing the elements by XPATH only, then here is my solution:
results = driver.find_elements(By.CLASS_NAME, "result-box")
for elems in results:
titles.append(elems.text.split("\n")[0])
When you try getting the listings, you use find_elements(By.CLASS_NAME, "results-list"), but on the website, there is only one element with the class name "results-list". This aggregates all the text in this div into one long string and therefore you can't get the heading.
However, there are multiple elements with the class name "result-box", so find_elements will store each as its own item in "results". Because the title of each listing is on the first line, you can split the text of each element by the newline.
So I have this site and I'm trying to obtain the location and size of an element based on this xpath "//div[#class='titlu']"
How you can see that is visible and has nothing special.
Now the problem I've faced is that when I'm doing the search for xpath like this
e = self.driver.find_element_by_xpath(xpath) the location and size
of e are both 0
Also, for some reason, if I'm trying to get the text like this:
e.text is going to show me an empty string, and I need to get the actual text in this way e.get_attribute("textContain")
So do you have any idea how can I get the location and size of this element?
There are two elements matching this xpath. driver.find_element_by_xpath returns the first one while you are looking for the second one. Use the ancestor <div> with id attribute for unique xpath
"//div[#id='content-detalii']//div[#class='titlu']"
I have been struggling with this for a while now.
I have tried various was of finding the xpath for the following highlighted HTML
I am trying to grab the dollar value listed under the highlighted Strong tag.
Here is what my last attempt looks like below:
try:
price = browser.find_element_by_xpath(".//table[#role='presentation']")
price.find_element_by_xpath(".//tbody")
price.find_element_by_xpath(".//tr")
price.find_element_by_xpath(".//td[#align='right']")
price.find_element_by_xpath(".//strong")
print(price.get_attribute("text"))
except:
print("Unable to find element text")
I attempted to access the table and all nested elements but I am still unable to access the highlighted portion. Using .text and get_attribute('text') also does not work.
Is there another way of accessing the nested element?
Or maybe I am not using XPath as it properly should be.
I have also tried the below:
price = browser.find_element_by_xpath("/html/body/div[4]")
UPDATE:
Here is the Full Code of the Site.
The Site I am using here is www.concursolutions.com
I am attempting to automate booking a flight using selenium.
When you reach the end of the process of booking and receive the price I am unable to print out the price based on the HTML.
It may have something to do with the HTML being a java script that is executed as you proceed.
Looking at the structure of the html, you could use this xpath expression:
//div[#id="gdsfarequote"]/center/table/tbody/tr[14]/td[2]/strong
Making it work
There are a few things keeping your code from working.
price.find_element_by_xpath(...) returns a new element.
Each time, you're not saving it to use with your next query. Thus, when you finally ask it for its text, you're still asking the <table> element—not the <strong> element.
Instead, you'll need to save each found element in order to use it as the scope for the next query:
table = browser.find_element_by_xpath(".//table[#role='presentation']")
tbody = table.find_element_by_xpath(".//tbody")
tr = tbody.find_element_by_xpath(".//tr")
td = tr.find_element_by_xpath(".//td[#align='right']")
strong = td.find_element_by_xpath(".//strong")
find_element_by_* returns the first matching element.
This means your call to tbody.find_element_by_xpath(".//tr") will return the first <tr> element in the <tbody>.
Instead, it looks like you want the third:
tr = tbody.find_element_by_xpath(".//tr[3]")
Note: XPath is 1-indexed.
get_attribute(...) returns HTML element attributes.
Therefore, get_attribute("text") will return the value of the text attribute on the element.
To return the text content of the element, use element.text:
strong.text
Cleaning it up
But even with the code working, there’s more that can be done to improve it.
You often don't need to specify every intermediate element.
Unless there is some ambiguity that needs to be resolved, you can ignore the <tbody> and <td> elements entirely:
table = browser.find_element_by_xpath(".//table[#role='presentation']")
tr = table.find_element_by_xpath(".//tr[3]")
strong = tr.find_element_by_xpath(".//strong")
XPath can be overkill.
If you're just looking for an element by its tag name, you can avoid XPath entirely:
strong = tr.find_element_by_tag_name("strong")
The fare row may change.
Instead of relying on a specific position, you can scope using a text search:
tr = table.find_element_by_xpath(".//tr[contains(text(), 'Base Fare')]")
Other <table> elements may be added to the page.
If the table had some header text, you could use the same text search approach as with the <tr>.
In this case, it would probably be more meaningful to scope to the #gdsfarequite <div> rather than something as ambiguous as a <table>:
farequote = browser.find_element_by_id("gdsfarequote")
tr = farequote.find_element_by_xpath(".//tr[contains(text(), 'Base Fare')]")
But even better, capybara-py provides a nice wrapper on top of Selenium, helping to make this even simpler and clearer:
fare_quote = page.find("#gdsfarequote")
base_fare_row = fare_quote.find("tr", text="Base Fare"):
base_fare = tr.find("strong").text
If I have the following HTML <a id="id" class="class" href="href">Element Text</a> how would I get "id" to be returned? My current procedure is:
print("Attribute: " + element.get_attribute('id'))
print("Property: " + element.get_property('id'))
print("Class: " + element.get_attribute('class'))
But all of those return empty strings. I am, however, able to get the text by using element.text
EDIT: Here's a more in depth explanation
I'm looking for an element but that element's ID varies. There is, however, an element linked to the element I want that I can find using it's xpath and comparing it's text to a specific text that I know beforehand. The ID of that element is something in the form of someID_XX. By taking the XX and appending it to another fixed string, I can then search for the element that I actually want. My issue is that once I get the second element (not the one I want directly, but the one that can lead me to the one I want) I can't seem to get it's ID attribute even though it seems to have one in the html. My question is, how do I get the id attribute?
For me is working with
element.get_attribute('id')
At the beginning didn't work because I selected incorrectly a hidden javascript element without id.
But if the error persist on your case, you can try this:
element.get_attribute('outerHTML')
get the DOM for the element, and then parse the html to extract the id
I have trouble getting nested Selectors to work as described in the documentation of Scrapy (http://doc.scrapy.org/en/latest/topics/selectors.html)
Here's what I got:
sel = Selector(response)
level3fields = sel.xpath('//ul/something/*')
for element in level3fields:
site = element.xpath('/span').extract()
When I print out "element" in the loop I get < Selector xpath='stuff seen above' data="u'< span class="something">text< /span>>
Now I got two problems:
Firstly, within the element, there should also be an "a"-node (as in <a href), but it doesn't show up in the print out, only if I extract it directly, then it does show up. Is that just a printing error or doesn't the "element-Selector" hold the a-node (without extraction)
when I print out "site" above, it should show a list with the span-nodes. However, it doesn't, it only prints out an empty list.
I tried a combination of changes (multiple to no slashes and stars (*) in different places), but none of it brought me any closer.
Essentially, I just want to get a nested Selector which gives me the span-node in the second step (the loop).
Anyone got any tips?
Regarding your first question, it's just a print "error". __repr__ and __str__ methods on Selectors only print the first 40 characters of the data (element represented as HTML/XML or text content). See https://github.com/scrapy/scrapy/blob/master/scrapy/selector/unified.py#L143
In your loop on level3fields you should use relative XPath expressions. Using /span will look for span elements directly under the root node, that's not what you want I guess.
Try this:
sel = Selector(response)
level3fields = sel.xpath('//ul/something')
for element in level3fields:
site = element.xpath('.//span').extract()