How to combine find_all() and find_next() in BeautifulSoup? - python

So I have pieces of HTML like this that I'm trying to parse. What I want to grab is the price ("84.00 USD"):
<div class="HeaderAndValues_headerDetailSection__3c2SZ ProductCatalog_price__25i2r">
<div class="HeaderAndValues_header__3dB61">Wholesale</div>
<span class="notranslate">
<div class="">84.00 USD</div>
</span>
</div>
soup.find(text="Wholesale").find_next().text gives me exactly what I need, but only for the first search result. Is there any way I could combine find_all() and find_next()? Something like soup.find_all(text="Wholesale").find_next() that would grab the next text for each found "Wholesale".

OK, I've found it! Someone might still find this useful:
[x.find_next().text for x in page.find_all(text = "Wholesale")]
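For anyone landing here later, a minimal self-contained sketch of that approach (the sample HTML is taken from the question; newer bs4 releases prefer string= over text=, and strip=True just guards against surrounding whitespace):
from bs4 import BeautifulSoup

html = '''
<div class="HeaderAndValues_headerDetailSection__3c2SZ ProductCatalog_price__25i2r">
  <div class="HeaderAndValues_header__3dB61">Wholesale</div>
  <span class="notranslate"><div class="">84.00 USD</div></span>
</div>
'''

page = BeautifulSoup(html, "html.parser")

# find_all() yields every "Wholesale" text node; find_next() jumps to the
# element that follows each one.
prices = [x.find_next().get_text(strip=True) for x in page.find_all(string="Wholesale")]
print(prices)  # ['84.00 USD']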

Related

Exclude span from parsing with requests-html

I need help with parsing a web page with Python and requests-html lib. Here is the <div> that I want to analyze:
<div class="answer"><span class="marker">А</span>Te<b>x</b>t</div>
It renders as:
Text
I need to get Te<b>x</b>t as a result of parsing, without <div> and <span> but with <b> tags.
Using element as a requests-html object, here is what I am getting.
element.html:
<div class="answer"><span class="marker">А</span>Te<b>x</b>t</div>
element.text:
ATe\nx\nt
element.full_text:
AText
Could you please tell me how can I get rid of <span> but still get <b> tags in the parsing result?
Don't overcomplicate it.
How about some simple string processing to get the string between two boundaries:
Use element.html
Take everything after the closing </span>
Take everything before the closing </div>
Like this:
myHtml = '<div class="answer"><span class="marker">А</span>Te<b>x</b>t</div>'
myAnswer = myHtml.split("</span>")[1]
myAnswer = myAnswer.split("</div>")[0]
print(myAnswer)
output:
Te<b>x</b>t
Seems to work for the sample you provided. If you have more complex requirements, let us know and I'm sure someone can adapt this further.
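A minimal sketch of that splitting idea as a reusable helper, assuming the markup always has exactly one </span> before the text you want:
# Hypothetical helper: returns the markup between </span> and </div>.
def inner_after_span(html):
    return html.split("</span>", 1)[1].split("</div>", 1)[0]

myHtml = '<div class="answer"><span class="marker">А</span>Te<b>x</b>t</div>'
print(inner_after_span(myHtml))  # Te<b>x</b>t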

How can I click an element that contains this text?

I have the following HTML:
<div class="selected_text">
<span class="memo">
<span class="title">01.</span>I want This
</span>
<span class="blank"></span>
<span class="price">
100
</span>
</div>
I want to find the element that contains the text I want This and click it.
So I tried to use
driver.find_element_by_xpath('//span[contains(text(),"want")]').click()
but it's not working.
How can I perform a click on this element?
It seems there is an issue with your XPath. Use . instead of text() in the contains() method; refer to this answer to understand the difference between them.
Use an explicit wait to avoid unnecessary timeout issues.
Refer to the code below:
WebDriverWait(driver, 45).until(EC.element_to_be_clickable((By.XPATH, "//span[contains(.,'I want This')]"))).click()
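For completeness, that one-liner assumes the usual Selenium support imports (and a driver that is already on the page):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC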

When using BeautifulSoup, the HTML has the needed data at a different index number in some search results

I am having an issue where a website's format causes certain information within a container to have different index numbers from one search result to the next.
I am scraping pieces of data from search results, and the locations/index numbers differ in a few cases.
Basically, the exact text I need scraped from the HTML below is "7XB21".
<dl class="last">
<dt>Part Code:</dt>
<dd>
"7XB21"
<span class="separator">,</span>
</dd>
<dt>Weight:</dt>
<dd>97</dd>
</dl>
This is easy to do with the Python code below, as it gets me the result I need, which is "7XB21":
modelcode_container = container.find_all("dd")
modelcode = (modelcode_container[5].text)
HOWEVER!
Some of the scraped HTML, while structured the same way, includes extra information that the first example lacks. Here is an example of the troublesome HTML:
<dl class="last">
<dt>Stock id:</dt>
<dd>c12
<span class="separator">,</span>
</dd>
<dt>Part Code:</dt>
<dd>
"8B727"
<span class="separator">,</span>
</dd>
<dt>Weight:</dt>
<dd>102</dd>
</dl>
Do you see the difference? I would need to specify a different index number to capture the proper data, which is "8B727" in this case.
I am not sure how to go about setting that up; any help would be appreciated. Thank you!
If you are certain that <dt>Part Code:</dt> occurs before it, you can use find_next_sibling() to get the <dd> tag next to it.
soup.find('dt',text="Part Code:").find_next_sibling('dd')
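A short sketch of how that might look per search result; the find(text=True) call and the quote-stripping are assumptions based on the quotes and comma shown in your snippet:
# Keys off the "Part Code:" label rather than a positional index, so the
# extra "Stock id" row doesn't matter.
part_dd = container.find("dt", text="Part Code:").find_next_sibling("dd")
part_code = part_dd.find(text=True).strip().strip('"')  # '7XB21' or '8B727'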

Selenium, select a particular span element

I'm trying to use Selenium with Python to run some tests. I'm having trouble selecting an element. This element is part of a drop-down list and looks like this:
<li data-original-index="16">
<a tabindex="0" class="" data-normalized-text="<span class='text'>Porto</span>">
<li data-original-index="17">
<a tabindex="0" class="" data-normalized-text="<span class='text'>Santarem</span>">
And so on. I want to select the one with the span text "Porto".
I tried the following, but with no success:
driver.find_element_by_xpath("//span[text()="Porto"]")
Any idea on how can I do this?
Try
driver.find_element_by_link_text("Porto")
Based on the HTML you posted, it seems like it might be:
driver.find_element_by_xpath('//a[@data-normalized-text="<span class=\'text\'>Porto</span>"]')
I could be of more help if you posted all the HTML.
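If quoting the full attribute value gets awkward, a looser contains() match on the same attribute is one possible sketch (assuming 'Porto' is unique enough among those values):
# Match on a substring of the attribute so quoting and whitespace matter less.
driver.find_element_by_xpath("//a[contains(@data-normalized-text, 'Porto')]").click()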

How to read a particular value from a web page in Python/Selenium

I want to read the amount value (24.40) from this HTML.
<div id="order-total" class="clear-fix" style="margin-bottom:20px;">
<h3 class="col-left">Order total</h3>
<h3 class="col-right" style="display: block;">
<span class="credit-total-to-order" data-total-to-order="24.40">$ 24.40</span>
credits
</h3>
</div>
xpath - /html/body/div/header/section/form/div[5]/h3[2]/span
css - html body.ui-lang-en div#slave-edit.string-v2 header#slave-edit-header.edit section#order-form form#frm-order-translation div#order-total.clear-fix h3.col-right span.credit-total-to-order
I know I should use find_element_by_class_name or find_element_by_css_selector, but I'm not sure what the argument should be.
How can I do it?
Why not select the text from the element and parse the string to get the answer you need? For example, you can split the string and discard the dollar sign to return the number you want.
someString = driver.find_element_by_css_selector(".credit-total-to-order").text
amount = someString.split(' ')[1]  # '24.40'
Bear in mind - this will only work for the example you have provided.
It's not necessary to use find_element_by_class_name or find_element_by_css_selector. You can achieve it with XPath like this:
driver.find_element_by_xpath("//span[#class='credit-total-to-order']").text
UPDATE:
As per your updated HTML, it looks like the style makes your element hidden. Meanwhile, I also noticed that the value you want is also stored in the data-total-to-order attribute.
So you can do something like this:
driver.find_element_by_xpath("//span[@class='credit-total-to-order']").get_attribute("data-total-to-order")
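Putting it together, a sketch that reads the attribute and converts it to a number (the float() call is just an assumption about how you want to use the value):
# Read the value from the data attribute, which works even if the span is hidden.
total = driver.find_element_by_xpath(
    "//span[@class='credit-total-to-order']"
).get_attribute("data-total-to-order")
print(float(total))  # 24.4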
