I am trying to use Selenium to grab text data from a page.
Printing the HTML attributes:
element = driver.find_element_by_id("divresults")
print(element.get_attribute('innerHTML'))
Results:
<div id="divDesktopResults"> </div>
print(element.get_attribute('outerHTML'))
Results:
<div id="divresults" data-bind="html:resultsContent"><div id="divDesktopResults"> </div></div>
I tried grabbing this element:
driver.find_element_by_css_selector("span[class='glyphicon glyphicon-tasks']")
Results:
Message: no such element: Unable to locate element: {"method":"css selector","selector":"span[class='glyphicon glyphicon-tasks']"}
This is the HTML as copied from the browser. There is much more below 'divresults' than showed up in the innerHTML printout:
<div id="divresults" data-bind="html:resultsContent">
<div>
<div class="row" style="font-size:8pt;">
<a data-toggle="tooltip" style="text-decoration:underline" href="#pdfviewer?ID=D218101736">
<strong>D218101736 </strong>
<span class="glyphicon glyphicon-new-window"></span>
</a>
<div class="btn-group" style="font-size:8pt;margin-left:10px;" id="btnD218101736">
<span style="display:none;font-size:8pt;" id="lblD218101736"> Added To Cart</span>
<button type="button" style="font-size:8pt;" class="btn btn-primary dropdown-toggle" data-toggle="dropdown"> Add To Cart
<span class="caret"></span>
</button>
<ul class="dropdown-menu" role="menu">
<li> <strong>Regular ($7.00)</strong> </li>
<li> <strong>Certified ($12.00)</strong> </li>
</ul>
</div>
</div> <br>
<ul class="nav nav-tabs compact">
<li class="active">
<a data-toggle="tab" href="#D218101736_Doc">
<span class="glyphicon glyphicon-file"></span>
<span>Doc Info</span>
</a>
</li>
<li class="hidden-xs">
<a data-toggle="tab" href="#D218101736_Thumbnail">
<span class="glyphicon glyphicon-th-large"></span>
<span>Thumbnail</span>
</a>
</li>
....
How do I get the data beneath divresults in this case?
My guess is that it's one of two things:
1. There is more than one element that matches that locator. To investigate this, try using $$("#divresults") in the dev console and make sure that it returns 1. If it returns more than one, run $$("#divresults")[0] and make sure the element returned is the one you want. If it is, go on to step 2. If it isn't, you will need to find a more specific locator. If you want our help, you will need to provide a link to the page or more of the HTML surrounding the desired element.
2. You need to add a wait so that the contents of the element can finish loading. You could wait for a locator like #divresults strong, or any other locator that matches some of the elements that were missing. Wait for them to be visible (or at least present). See the docs for more info and options.
I am using Selenium for Python to scrape a site with multiple pages. To get to the next page, I use driver.find_element(By.XPATH, xpath). However, the XPath changes from page to page, so I want to use other attributes instead.
I tried to find it by class, using "page-link": driver.find_element(By.CLASS_NAME, "page-link"). However, the "page-link" class is also present in the disabled list item. As a result, the Selenium driver won't stop after the last page, in this case page 2.
I want to stop the driver from clicking the disabled item on the page, i.e. I want it to ignore the last item in the list, the one with "page-item disabled", aria-disabled="true" and aria-hidden="true". The idea is that if the script can't find that item, it will end a while loop that relies on the ">" button being enabled.
See the source code below.
Please advise.
<nav>
<ul class="pagination">
<li class="page-item">
<a class="page-link" href="https://www.blucap.net/app/FlightsReport?fromdate=2023-02-01&todate=2023-02-28&filterByMemberId=&view=View%20Report&page=1" rel="prev" aria-label="« Previous">‹</a>
</li>
<li class="page-item">
<a class="page-link" href="https://www.blucap.net/app/FlightsReport?fromdate=2023-02-01&todate=2023-02-28&filterByMemberId=&view=View%20Report&page=1">1</a>
</li>
<li class="page-item active" aria-current="page">
<span class="page-link">2</span>
</li>
<li class="page-item disabled" aria-disabled="true" aria-label="Next »">
<span class="page-link" aria-hidden="true">›</span>
</li>
</ul>
</nav>
To go to the next page, there are a couple of approaches:
You can use find_element() to click the descendant <span> of the <li> whose aria-label starts with "Next" but which doesn't have aria-disabled="true", as follows:
driver.find_element(By.XPATH, "//li[starts-with(@aria-label, 'Next') and not(@aria-disabled='true')]/span").click()
I want to find and click the last product on a product page that isn't sold. I'm using XPath to click on the product, but I am having issues with:
Selecting, exclusively, an unsold product.
Selecting the last unsold product.
This is an example of the HTML:
<li class="product_container">
<a data-testid="product__item">…</a>
<div class="hover overlay">
<img>..</img>
</div>
</li>
<li class="product_container">
<a data-testid="product__item">…</a>
<div class="hover overlay">
<div data-testid="product__sold">Sold</div>
</div>
</li>
The first <li> is an unsold product and the second is a sold product (its hover overlay says "Sold").
So far I can find the last loaded element that matches a[@data-testid="product__item"], but every attempt I've made to find an element that doesn't contain div[@data-testid='product__sold'] has failed.
I apologise in advance if my writing and terminology are off; this is the first script I've attempted.
Based on this HTML:
<li class="product_container">
<a data-testid="product__item">...</a>
<div class="hover overlay">
<img>..</img>
</div>
</li>
<li class="product_container">
<a data-testid="product__item">...</a>
<div class="hover overlay">
<div data-testid="product__sold">Sold</div>
</div>
</li>
You need:
//li[not(descendant::div[@data-testid='product__sold'])][position()=last()]/a
The result is:
<a data-testid="product__item">...</a>
You can search for an element that doesn't have a following sibling div containing a div with data-testid="product__sold":
(//a[last()][@data-testid="product__item"][not(following-sibling::div/div[@data-testid="product__sold"])])[last()]
I'm trying to use Scrapy to scrape the URLs of the offers on this site.
This is the code I tried:
url = response.css('a[data-tracking="click_body"]::attr(href)').extract()
But my code returns something very different from a URL.
Here is the HTML code of the div I'm interested in.
<div class="offer-item-details">
<header class="offer-item-header">
<h3>
<a href="https://www.otodom.pl/oferta/gdansk-pod-inwestycje-cicha-lokalizacja-ID46DXu.html#ab04badaa0" data-tracking="click_body" data-tracking-data='{"touch_point_button":"title"}' data-featured-name="promo_top_ads">
<strong class="visible-xs-block">42 m²</strong>
<span class="text-nowrap">
<span class="offer-item-title">Gdańsk/ Pod Inwestycje/ Cicha Lokalizacja</span>
</span>
</a>
</h3>
<p class="text-nowrap"><span class="hidden-xs">Mieszkanie na sprzedaż: </span>Gdańsk, Ujeścisko-Łostowice, Łostowice</p>
<div class="vas-list-no-offer">
<a class="button-observed observe-link favourites-button observed-text svg-heart add-to-favourites" data-statkey="ad.observed.list" rel="nofollow" data-id="60688916" href="#" title="Obserwuj">
<div class="observed-text-container" style="display: flex;">
<span class="icon observed-60688916"></span>
<i class="icon-heart-filled"></i>
<div class="observed-label">Dodaj do ulubionych</div>
</div>
</a>
</div>
</header>
<ul class="params" data-tracking="click_body" data-tracking-data='{"touch_point_button":"body"}'>
<li class="offer-item-rooms hidden-xs">2 pokoje</li>
<li class="offer-item-price">
346 000 zł </li>
<li class="hidden-xs offer-item-area">42 m²</li>
<li class="hidden-xs offer-item-price-per-m">8 238 zł/m²</li>
</ul>
</div>
Copied selector of that tag:
#offer-item-ad_id45Wog > div.offer-item-details > header > h3 > a
Copied XPath:
//*[@id="offer-item-ad_id45Wog"]/div[1]/header/h3/a
Copied full XPath:
/html/body/div[3]/main/section[2]/div/div/div[1]/div/article[1]/div[1]/header/h3/a
Your code gives you a list of the URLs: extract() returns every match as a list. To have Scrapy output the data, loop over the list inside your parse method and yield one item per URL:
url = response.css('a[data-tracking="click_body"]::attr(href)').extract()
for a in url:
    yield {'url': a}
I'm using Selenium for automation and I want to click each element of the <ul>, then wait before clicking the next one. This is my code, but it doesn't seem to work:
def navBar():
    driver = setup()
    navBar_List = driver.find_element_by_class_name("nav")
    listItem = navBar_List.find_elements_by_tag_name("li")
    for item in listItem:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.TAG_NAME, "li")))
        item.click()
Here is the HTML code:
<ul class="nav navbar-nav">
<li tabindex="0">
<a class="h">
<div class="icon-left-navbar">
...
</div>
</a>
</li>
<li tabindex="0">
<a class="h">
<div class="icon-left-navbar">
...
</div>
</a>
</li>
<li tabindex="0">
<a class="h">
<div class="icon-left-navbar">
...
</div>
</a>
</li>
</ul>
Is Thread.sleep(100) an option?
Locate your li elements with .find_elements.
Use an XPath to identify them: //ul[@class='nav navbar-nav']//li.
In a loop, use an incrementing index to wait for each li in turn. I imagine the indexed locators would look like this:
(xpath)[1]
(xpath)[2]
etc...
And try the following code:
listItem = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='nav navbar-nav']//li")))
for x in range(1, len(listItem) + 1):
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "(//ul[@class='nav navbar-nav']//li)[" + str(x) + "]"))).click()
This is the HTML for uploading a photo:
<div id="choose-photo" class="controls avatar-settings inline-upload-avatar dropdown center">
<div class="uploader-image uploader-avatar clearfix">
<div class="dropdown-menu">
<div class="dropdown-caret">
<span class="caret-outer"></span>
<span class="caret-inner"></span>
</div>
<ul tabindex="-1" role="menu" aria-hidden="true">
<li id="photo-choose-existing" class="photo-choose-existing upload-photo" role="presentation">
<button type="button" class="dropdown-link" role="menuitem">Prześlij zdjęcie</button>
<div class="photo-selector">
<button class="btn" type="button">
Zmień zdjęcie
</button>
<span class="photo-file-name">Nie wybrano pliku</span>
<div class="image-selector">
<input type="hidden" name="media_file_name" class="file-name">
<input type="hidden" name="media_data_empty" class="file-data">
<label class="t1-label">
<span class="u-hiddenVisually">Dodaj zdjęcie</span>
<input type="file" name="media_empty" class="file-input js-tooltip" tabindex="-1" accept="image/gif,image/jpeg,image/jpg,image/png" data-original-title="Dodaj zdjęcie">
</label>
</div>
</div>
</li>
<li id="photo-choose-webcam" class="u-hidden" role="presentation">
<button type="button" class="dropdown-link">Zrób zdjęcie</button>
</li>
<li id="photo-delete-image" class="u-hidden" role="presentation">
<button type="button" class="dropdown-link" role="menuitem">Usuń</button>
</li>
<li class="dropdown-divider" role="presentation"></li>
<li class="cancel-options" role="presentation">
<button type="button" class="dropdown-link" role="menuitem">Anuluj</button>
</li>
</ul>
</div>
</div>
</div>
I've written a simple method to send the file path to the input (which is not visible on screen):
fileInput = driver.find_element_by_name('media_empty')
fileInput.send_keys(path)
But it doesn't do anything, and I'm not getting any errors either.
So, here's a second method, which may work:
<div class="ProfileAvatarEditing-buttonContainer">
<button class="ProfileAvatarEditing-button u-boxShadowInsetUserColorHover" type="button" tabindex="2">
<div class="ProfileAvatarEditing-addAvatarHelp">
<span class="Icon Icon--cameraPlus"></span>
<p>Dodaj zdjęcie profilowe</p>
</div>
<div class="ProfileAvatarEditing-changeAvatarHelp">
<span class="Icon Icon--camera"></span>
<p>Zmień zdjęcie profilowe</p>
</div>
<div class="ProfileAvatarEditing-dropAvatarHelp">
<span class="Icon Icon--cameraPlus"></span>
<p>Upuść zdięcie profilowe tutaj</p>
</div>
</button>
Here the user can drag and drop a file. I've found the question "Selenium: Drag and Drop from file system to webdriver?", but I still don't know how to use it in this situation.
So the question is how to send the file path to the input to trigger the file upload. When you choose a file from the file dialog or drag and drop it, a confirmation window with a preview of your photo appears, and all that's left is to click confirm. But I don't know how to send the file in the first place.
Any help will be appreciated.
edit:
I've found a solution (my own answer below):
fileInput = driver.find_element_by_xpath('//*[@id="photo-choose-existing"]/div/div/label/input')
fileInput.send_keys(path)
But there's one more problem: the photo is uploaded, but the file dialog still opens and I don't know how to close it. I tried accessing it:
dialog = driver.switch_to.active_element['value']
but I don't know how to close it.
Strangely enough, I found that send_keys does indeed work. When I inspected the HTML in a different browser, the input's name wasn't "media_empty" anymore but something different ("media[]" or similar). So I used an XPath instead, and I was stunned that it actually worked:
fileInput = driver.find_element_by_xpath('//*[@id="photo-choose-existing"]/div/div/label/input')
fileInput.send_keys(path)
Try using the code below:
fileInput = driver.find_element_by_css_selector("div.image-selector label.t1-label input")
driver.execute_script("arguments[0].setAttribute('value', 'YOUR_PATH_HERE')",fileInput)
This assumes the element is present on the page; if it isn't, explicitly wait for it to exist first.
Then try this:
driver.execute_script("document.getElementById('ID_HERE').setAttribute('value', 'PATH_HERE')")
Hope this helps!