Selenium: How to wait then click? - Python

I'm using Selenium for automation, and I want to click each of the <li> elements in the <ul>, then wait before clicking the next one. This is my code, but it doesn't seem to be the solution:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def navBar():
    driver = setup()
    navBar_List = driver.find_element_by_class_name("nav")
    listItem = navBar_List.find_elements_by_tag_name("li")
    for item in listItem:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.TAG_NAME, "li")))
        item.click()
Here is the HTML code:
<ul class="nav navbar-nav">
<li tabindex="0">
<a class="h">
<div class="icon-left-navbar">
...
</div>
</a>
</li>
<li tabindex="0">
<a class="h">
<div class="icon-left-navbar">
...
</div>
</a>
</li>
<li tabindex="0">
<a class="h">
<div class="icon-left-navbar">
...
</div>
</a>
</li>
</ul>

Is a fixed pause like time.sleep(0.1) (Python's equivalent of Thread.sleep(100)) an option?

Locate your li elements with .find_elements.
Use XPath to identify them: //ul[@class='nav navbar-nav']//li.
In a loop you can use an incrementing index to wait for each li in turn. I imagine it will produce something like the below:
(xpath)[1]
(xpath)[2]
etc...
And try the following code:
listItem = WebDriverWait(driver, 30).until(
    EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='nav navbar-nav']//li")))
for x in range(1, len(listItem) + 1):
    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, "(//ul[@class='nav navbar-nav']//li)[" + str(x) + "]"))).click()
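(This uses the same WebDriverWait / EC / By imports as the code in the question.) Re-locating each li by its indexed XPath on every pass, instead of reusing the elements from the initial lookup, also avoids a StaleElementReferenceException if a click re-renders the menu.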

Related

How to click on li element of a dropdown list using Selenium in Python

I'm trying to select the li element "US" in a dropdown list on the following website: https://proxyscrape.com/free-proxy-list
Here is the Python code I have, but it does not work:
driver.find_element_by_xpath('/html/body/main/div/div[1]/div[3]/div/div[1]/div[2]').click()
time.sleep(4)
driver.find_element_by_css_selector("#list httpcountry countryselect [value='US']")
And here is the HTML I'm working with:
<div class="nice-select selectortypes open" tabindex="0">
<span>
Country: <span class="current">all</span>
</span>
<ul class="list httpcountry countryselect">
<li data-value="all" class="option">all</li>
<li data-value="US" class="option">US</li>
<li data-value="ES" class="option">ES</li>
<li data-value="RU" class="option">RU</li>
<li data-value="PL" class="option">PL</li>
<li data-value="BD" class="option">BD</li>
<li data-value="IR" class="option">IR</li>
<li data-value="FR" class="option">FR</li>
<li data-value="CN" class="option">CN</li>
<li data-value="CA" class="option">CA</li>
<li data-value="PK" class="option">PK</li>
<li data-value="IN" class="option">IN</li>
<li data-value="ID" class="option">ID</li>
<li data-value="BR" class="option">BR</li>
<li data-value="DE" class="option">DE</li>
<li data-value="GB" class="option">GB</li>
<li data-value="TH" class="option">TH</li>
<li data-value="SG" class="option">SG</li>
<li data-value="EG" class="option">EG</li>
<li data-value="UA" class="option">UA</li>
</ul>
</div>
Any clue on how to select this element?
Solution:
You need to wait before clicking on the country dropdown list: a top banner appears, the webdriver loses focus, and the dropdown closes.
Here is the code I wrote; the script passed after adding two sleeps in these two places:
from time import sleep

driver.get('https://proxyscrape.com/free-proxy-list')
# Locate the option list, then step up to its clickable parent container.
country_list = driver.find_element_by_css_selector('.list.socks4country.countryselect').find_element_by_xpath('./..')
sleep(2)  # wait out the top banner so it doesn't steal focus from the dropdown
country_list.click()
sleep(1)  # give the dropdown time to open
country_list.find_element_by_css_selector('[data-value="US"]').click()
us = driver.find_element_by_css_selector('[class="list socks4country countryselect"] [data-value="US"]')
assert us.get_attribute('class') == 'option selected'
If you look at what happens when you select the US option, you can see that it changes the parameters in the request here:
From:
Download
To:
Download
So you don't actually want to click the US option; you probably want to send a request with the appropriate parameters instead.
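A minimal sketch of that idea with the requests library; the endpoint and parameter names below are assumptions, so copy the real values from the "Download" link (or the browser's network tab) after selecting US:

import requests

# Hypothetical endpoint and query parameters: replace them with the real
# values taken from the Download link after switching the country to US.
DOWNLOAD_URL = "https://proxyscrape.com/download"  # placeholder URL
params = {"country": "US"}

response = requests.get(DOWNLOAD_URL, params=params)
response.raise_for_status()
print(response.text)  # the proxy list filtered to US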

Extracting full URL from href tag in scrapy

I'm trying to use Scrapy to scrape the URLs of the offers on this site.
This is the code I tried:
url = response.css('a[data-tracking="click_body"]::attr(href)').extract()
But my code returns something very different from a URL.
Here is the HTML code of the div I'm interested in.
<div class="offer-item-details">
<header class="offer-item-header">
<h3>
<a href="https://www.otodom.pl/oferta/gdansk-pod-inwestycje-cicha-lokalizacja-ID46DXu.html#ab04badaa0" data-tracking="click_body" data-tracking-data="{"touch_point_button":"title"}" data-featured-name="promo_top_ads">
<strong class="visible-xs-block">42 m²</strong>
<span class="text-nowrap">
<span class="offer-item-title">Gdańsk/ Pod Inwestycje/ Cicha Lokalizacja</span>
</span>
</a>
</h3>
<p class="text-nowrap"><span class="hidden-xs">Mieszkanie na sprzedaż: </span>Gdańsk, Ujeścisko-Łostowice, Łostowice</p>
<div class="vas-list-no-offer">
<a class="button-observed observe-link favourites-button observed-text svg-heart add-to-favourites" data-statkey="ad.observed.list" rel="nofollow" data-id="60688916" href="#" title="Obserwuj">
<div class="observed-text-container" style="display: flex;">
<span class="icon observed-60688916"></span>
<i class="icon-heart-filled"></i>
<div class="observed-label">Dodaj do ulubionych</div>
</div>
</a>
</div>
</header>
<ul class="params
" data-tracking="click_body" data-tracking-data="{"touch_point_button":"body"}">
<li class="offer-item-rooms hidden-xs">2 pokoje</li>
<li class="offer-item-price">
346 000 zł </li>
<li class="hidden-xs offer-item-area">42 m²</li>
<li class="hidden-xs offer-item-price-per-m">8 238 zł/m²</li>
</ul>
</div>
Copied selector of that tag:
#offer-item-ad_id45Wog > div.offer-item-details > header > h3 > a
Copied xPath
//*[@id="offer-item-ad_id45Wog"]/div[1]/header/h3/a
Copied full xPath
/html/body/div[3]/main/section[2]/div/div/div[1]/div/article[1]/div[1]/header/h3/a
Your code already gives you a list of the URLs: extract() returns a list here. To have Scrapy export the data, loop over the list and yield an item for each URL.
urls = response.css('a[data-tracking="click_body"]::attr(href)').extract()
for a in urls:
    yield {'url': a}
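For context, here is a minimal spider built around that loop; the spider name and start URL are placeholders, not from the original post:

import scrapy

class OffersSpider(scrapy.Spider):
    name = "offers"
    # Placeholder: point this at the actual listing page being scraped.
    start_urls = ["https://www.otodom.pl/"]

    def parse(self, response):
        # Each matching <a> carries an absolute offer URL in its href attribute.
        for href in response.css('a[data-tracking="click_body"]::attr(href)').extract():
            yield {"url": href}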

Using Selenium Webdriver, grabbing data not showing up in innerhtml

I am trying to use selenium to grab text data from a page.
Printing the HTML attributes:
element = driver.find_element_by_id("divresults")
print(element.get_attribute('innerHTML'))
Result:
<div id="divDesktopResults"> </div>
print(element.get_attribute('outerHTML'))
Result:
<div id="divresults" data-bind="html:resultsContent"><div id="divDesktopResults"> </div></div>
I also tried grabbing this element:
driver.find_element_by_css_selector("span[class='glyphicon glyphicon-tasks']")
Result:
Message: no such element: Unable to locate element: {"method":"css selector","selector":"span[class='glyphicon glyphicon-tasks']"}
This is the code as copied from the browser. There is much more beneath 'divresults' that did not show up in the innerHTML printout:
<div id="divresults" data-bind="html:resultsContent">
<div>
<div class="row" style="font-size:8pt;">
<a data-toggle="tooltip" style="text-decoration:underline" href="#pdfviewer?ID=D218101736">
<strong>D218101736 </strong>
<span class="glyphicon glyphicon-new-window"></span>
</a>
<div class="btn-group" style="font-size:8pt;margin-left:10px;" id="btnD218101736">
<span style="display:none;font-size:8pt;" id="lblD218101736"> Added To Cart</span>
<button type="button" style="font-size:8pt;" class="btn btn-primary dropdown-toggle" data-toggle="dropdown"> Add To Cart
<span class="caret"></span>
</button>
<ul class="dropdown-menu" role="menu">
<li> <strong>Regular ($7.00)</strong> </li>
<li> <strong>Certified ($12.00)</strong> </li>
</ul>
</div>
</div> <br>
<ul class="nav nav-tabs compact">
<li class="active">
<a data-toggle="tab" href="#D218101736_Doc">
<span class="glyphicon glyphicon-file"></span>
<span>Doc Info</span>
</a>
</li>
<li class="hidden-xs">
<a data-toggle="tab" href="#D218101736_Thumbnail">
<span class="glyphicon glyphicon-th-large"></span>
<span>Thumbnail</span>
</a>
</li>
....
How do I get the data beneath divresults in this instance?
My guess is that it's one of two things:
There is more than one element that matches that locator. To investigate this, try running $$("#divresults") in the dev console and make sure it returns only one element. If it returns more than one, run $$("#divresults")[0] and make sure the element returned is the one you want. If it is, go on to step 2. If it isn't, you will need to find a more specific locator. If you want our help with that, you will need to provide a link to the page or more of the HTML surrounding the desired element.
You need to add a wait so that the contents of the element can finish loading. You could wait for a locator like #divresults strong or any number of locators to find some of the elements that were missing. You would wait for them to be visible (or at least present). See the docs for more info and options.
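A minimal sketch of the wait described in point 2, assuming the usual Selenium imports and using the #divresults strong locator mentioned above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the dynamically loaded children of #divresults
# to appear before reading its innerHTML.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#divresults strong")))
element = driver.find_element_by_id("divresults")
print(element.get_attribute("innerHTML"))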

Can I access the subchild of a parent in XPath?

So, as the title states, I have some HTML from http://chem.sis.nlm.nih.gov/chemidplus/name/acetone that I am parsing, and I want to extract some data, like the Acetone under MeSH Heading, as in my similar post How to set up XPath query for HTML parsing?
<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<button class="toggle1Col" title="Toggle display between 1 column of wider results and multiple columns.">↔</button>
<h3>Name of Substance</h3>
<div class="yui3-g-r">
<div class="yui3-u-1-4">
<ul>
<li id="ds2">
<div>2-Propanone</div>
</li>
</ul>
</div>
<div class="yui3-u-1-4">
<ul>
<li id="ds3">
<div>Acetone</div>
</li>
</ul>
</div>
<div class="yui3-u-1-4">
<ul>
<li id="ds4">
<div>Acetone [NF]</div>
</li>
</ul>
</div>
<div class="yui3-u-1-4">
<ul>
<li id="ds5">
<div>Dimethyl ketone</div>
</li>
</ul>
</div>
</div>
<h3>MeSH Heading</h3>
<ul>
<li id="ds6">
<div>Acetone</div>
</li>
</ul>
</div>
</div>
Previously, on other pages, I would do mesh_name = tree.xpath('//*[text()="MeSH Heading"]/..//div')[1].text_content() to extract the data, because those pages had similar structures; now I see that is not the case, as I didn't account for the inconsistency. So, is there a way to navigate to the node I want and then obtain its subchild, so the query works consistently across different pages?
Would doing tree.xpath('//*[text()="MeSH Heading"]//preceding-sibling::text()[1]') work?
From what I understand, you need to get the list of items by a heading title.
How about making a reusable function that would work for every heading in the "Names and Synonyms" container:
from lxml.html import parse

tree = parse("http://chem.sis.nlm.nih.gov/chemidplus/name/acetone")

def get_contents_by_title(tree, title):
    # Match the h3 by its text, step to the element right after it,
    # and collect the text of every div inside.
    return tree.xpath("//h3[. = '%s']/following-sibling::*[1]//div/text()" % title)

print(get_contents_by_title(tree, "Name of Substance"))
print(get_contents_by_title(tree, "MeSH Heading"))
Prints:
['2-Propanone', 'Acetone', 'Acetone [NF]', 'Dimethyl ketone']
['Acetone']
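The key is following-sibling::*[1]: it selects whatever element immediately follows the matching h3, so the same query works whether the items sit in a yui3-g-r grid of columns (Name of Substance) or in a bare ul (MeSH Heading).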

Edit text from html with BeautifulSoup

I'm currently trying to extract the HTML elements which have text of their own and wrap that text with a special tag.
For example, my HTML looks like this:
<ul class="myBodyText">
<li class="fields">
This text still has children
<b>
Simple Text
</b>
<div class="s">
<ul class="section">
<li style="padding-left: 10px;">
Hello <br/>
World
</li>
</ul>
</div>
</li>
</ul>
I'm trying to wrap the markers only around the text nodes, so I can further parse them at a later time. The result should look like this:
<ul class="bodytextAttributes">
<li class="field">
[Editable]This text still has children[/Editable]
<b>
[Editable]Simple Text[/Editable]
</b>
<div class="sectionFields">
<ul class="section">
<li style="padding-left: 10px;">
[Editable]Hello [/Editable]<br/>
[Editable]World[/Editable]
</li>
</ul>
</div>
</li>
</ul>
My script so far iterates just fine, but the placement of the edit placeholders isn't working, and I currently have no idea how to check this:
def parseSection(node):
    b = str(node)
    changes = set()
    tag_start, tag_end = extractTags(b)  # helper defined elsewhere
    # index 0 is the element itself
    for cell in node.findChildren()[1:]:
        if cell.findChildren():
            cell = parseSection(cell)
        else:
            # safe to extract with regular expressions, only 1 standardized tag created by BeautifulSoup
            subtag_start, subtag_end = extractTags(str(cell))
            changes.add((str(cell), "[/EditableText]{0}[EditableText]{1}[/EditableText]{2}[EditableText]".format(subtag_start, str(cell.text), subtag_end)))
    text = extractText(b)  # helper defined elsewhere
    for change in changes:
        text = text.replace(change[0], change[1])
    return bs("{0}[EditableText]{1}[/EditableText]{2}".format(tag_start, text, tag_end), "html.parser")
The script generates the following:
<ul class="myBodyText">
[EditableText]
<li class="fields">
This text still has children
[/EditableText]
<b>
[EditableText]
Simple Text
[/EditableText]
</b>
[EditableText]
<div class="s">
<ul class="section">
<li style="padding-left: 10px;">
Hello [/EditableText]
<br/>
[EditableText][/EditableText]
<br/>
[EditableText]
World
</li>
</ul>
</div>
</li>
[/EditableText]
</ul>
How can I check this and fix it? I'm grateful for every possible answer.
There is a built-in replace_with() method that fits the use case nicely:
from bs4 import BeautifulSoup

soup = BeautifulSoup(data, "html.parser")
# Collect every non-empty text node, then wrap each one in the markers.
for node in soup.find_all(text=lambda x: x.strip()):
    node.replace_with("[Editable]{}[/Editable]".format(node))
print(soup.prettify())
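Note that find_all(text=...) materializes the list of matching NavigableString nodes before the loop starts, so each replace_with() call swaps a node in place without disturbing the iteration.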
