How to identify elements within #document - python

I'm using xpath to scrape information from a dynamic html table. The information I'm trying to scrape is inside of a tag called #document and I'm not sure how to include this in the xpath since it doesn't follow the normal <>...</> format of most html elements. I also don't have the option to select the xpath when I open the options to the left of the tag in the inspector. What do I do here? I've included a snippet below of the tag I'm referring to.
<iframe>
#document
<html>...</html>
</iframe>

The <html> underneath #document belongs to the <iframe>. To access the elements inside that <html>, you have to:
Induce WebDriverWait for the desired frame to be available and switch to it as follows:
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"iframe_xpath")))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Related

Python Selenium - How to scrape URL from src attribute using Selenium and Python

I'm trying to download a bunch of images and categorize them into folders using Selenium. To do so, I need to grab two IDs associated with each image from its URL. However, I'm having trouble scraping the image link from the src attribute. Whether I try to grab it by tag, XPath, or another method, the end result is merely "None".
Here's an example of an inspected image page:
<html style="height: 100%;"
><head><meta name="viewport" content="width=device-width, minimum-scale=0.1">
<title>index.php (2448×3264)</title>
</head>
<body style="margin: 0px; background: #0e0e0e; height: 100%">
<img style="-webkit-user-select: none;margin: auto;cursor: zoom-in;background-color: hsl(0, 0%, 90%);transition: background-color 300ms;" src="https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&fieldname=DT006_picture&p=show" width="444" height="593">
</body>
</html>
For this example, I would need to grab "LQCMY" and "DT006_picture" as strings from the URL above. The code below shows my attempt at scraping the URL link (edited down since prior screens I click through are locked behind passwords that I can't give out).
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Image = '/html/body/div[1]/div[2]/div/table/tbody/tr[1]/td[1]/a'
driver.find_element_by_xpath(Image).click()
Image_URL = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH, Image))).get_attribute('src')
print(Image_URL)
Are there certain src's that can't be scraped, or am I scraping the wrong tag?
I've tried grabbing by tag but that also returns "None" as well.
Image_URL = driver.find_element_by_xpath(Image).get_attribute('src')
Other posts said WebDriverWait would help, but I've tried adjusting the wait time and am still receiving "None" too
To print the value of the src attribute you can use either of the following locator strategies:
Using css_selector:
print(driver.find_element_by_css_selector("body img[style*='webkit-user-select'][src^='https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=']").get_attribute("src"))
Using xpath:
print(driver.find_element_by_xpath("//body//img[contains(@style, 'webkit-user-select') and starts-with(@src, 'https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=')]").get_attribute("src"))
Ideally you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "body img[style*='webkit-user-select'][src^='https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=']"))).get_attribute("src"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//body//img[contains(@style, 'webkit-user-select') and starts-with(@src, 'https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=')]"))).get_attribute("src"))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in Python Selenium - get href value
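Once the src value is in hand, the two IDs the question asks for can be read from the query string rather than sliced out by hand. A minimal sketch using only the standard library; the helper name picture_ids is made up for illustration, and it assumes the URL always carries both an id and a fieldname parameter:

```python
from urllib.parse import urlparse, parse_qs


def picture_ids(src):
    """Return (id, fieldname) taken from the query string of an image URL."""
    query = parse_qs(urlparse(src).query)
    # parse_qs maps each key to a list of values; take the first of each.
    return query["id"][0], query["fieldname"][0]


src = "https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&fieldname=DT006_picture&p=show"
image_id, field_name = picture_ids(src)
print(image_id, field_name)  # LQCMY DT006_picture
```

A missing parameter raises KeyError here, which is probably what you want while sorting images into folders, since a silently wrong folder name is harder to spot.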

How to identify the iframe using Selenium?

HTML:
<iframe allowpaymentrequest="true" allowtransparency="true" src="https://shopify.wintopay.com/
cd_frame_id_="ca9e4ad6a1559de159faff5c1f563d59"
name="WinCCPay"
id="win-cc-pay-frame"
I'm trying to input text in a CC field. Apparently it's in an iframe. I picked the last iframe in the HTML and tried to select it using the identifiers above, but I keep getting an error that the element couldn't be found.
iframe= wd.find_element_by_id("win-cc-pay-frame")
wd.switch_to.frame(iframe)
The frame is currently being shown in the browser so no need for implicit wait.
To identify the <iframe> you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
You can use either of the following Locator Strategies:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#win-cc-pay-frame[name='WinCCPay'][src^='https://shopify.wintopay.com']")))
Using XPATH:
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@id='win-cc-pay-frame' and @name='WinCCPay'][starts-with(@src, 'https://shopify.wintopay.com')]")))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
The problem may be that the name and id of the element are dynamic and change for each unique checkout window. Can you check whether the iframe tag has a class attribute and find the element by that attribute instead?
It must be similar to:
iframe = wd.find_element_by_class_name('card-pay-iframe')
wd.switch_to.frame(iframe)
...
wd.switch_to.default_content()
good coding! ¯_(ツ)_/¯
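The switch-in / switch-out pair above is easy to leave unbalanced if something raises in between. One way to keep it tidy is a small context manager; this is only a sketch, the name inside_frame is invented here, and it relies on nothing beyond the driver's switch_to.frame() and switch_to.default_content() calls already shown:

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame`, then always return to the top-level document."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs even if the body raises, so later lookups start from the main page.
        driver.switch_to.default_content()


# Hypothetical usage with the driver from the answer above:
# with inside_frame(wd, wd.find_element_by_class_name('card-pay-iframe')):
#     ...  # locate and fill the card field here
```

Note that default_content() goes back to the top-level page, so for nested iframes you would have to switch down again from the top.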

How to click on GWT enabled elements using Selenium and Python

I'm new to Selenium and also new to development.
I've been trying to work out this XPath with no results, which is why I'm looking for help.
I want to click the checkbox, but it has a dynamic id, and I haven't managed to locate it based on title="Viewers".
Please note there is a whole list of checkboxes, each with a name to its right in a div, which I want to include in my tests.
HTML:
<span class="row-selection"><input type="checkbox" value="on" id="gwt-uid-2215" tabindex="0"><label for="gwt-uid-2215"></label></span> <div class="row-label" title="Viewers">Viewers</div>
The desired element is a GWT enabled element so to click on the element you have to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following xpath based Locator Strategy:
Using xpath based on title attribute:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@title='Viewers']//preceding::span[1]//label"))).click()
Using xpath based on innerHTML:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[text()='Viewers']//preceding::span[1]//label"))).click()
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
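Since the question mentions a whole list of such checkboxes, it may help to build the title-based XPath from the row title instead of hard-coding "Viewers". A rough sketch; checkbox_label_xpath is a made-up helper, and it assumes the title contains no single quote and that every row follows the same span-then-div layout as the HTML shown:

```python
def checkbox_label_xpath(row_title):
    """Build the title-based XPath for the label next to a given row."""
    # Assumption: row_title contains no single quote, so it can sit inside '...'.
    return f"//div[@title='{row_title}']//preceding::span[1]//label"


print(checkbox_label_xpath("Viewers"))
# //div[@title='Viewers']//preceding::span[1]//label
```

The returned string can be passed as the second item of the (By.XPATH, ...) tuple in the wait calls above.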

Why can't Selenium find this class on Wikipedia?

I am trying to pull a table from wikipedia. When I try and pull it using the following driver.find_element_by_class_name(name) it will not work. However when going to the html source code I can explicitly see the class name that I am looking for.
I do realize there are other ways to pull this table and I have moved on to easier ways. I am curious as to why Selenium does not find the class when it is in the HTML.
from selenium import webdriver
driver = webdriver.Chrome(r"\chromedriver_win32\chromedriver.exe")
driver.get(r'https://en.wikipedia.org/wiki/List_of_airports_in_the_United_States')
driver.implicitly_wait(2)
driver.find_element_by_class_name(name='wikitable sortable jquery-tablesorter')
However, the error I get is
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".wikitable sortable jquery-tablesorter"}
(Session info: chrome=75.0.3770.142)
wikitable sortable jquery-tablesorter is 3 class names: wikitable, sortable, and jquery-tablesorter. .find_element_by_class_name() only takes a single parameter consisting of a single class name, e.g. .find_element_by_class_name("wikitable"). That may or may not find the element you want based on whether that class name uniquely locates the element that you want.
Another option would be to use a CSS selector so that you can use all three classes in a single locator, e.g.
.wikitable.sortable.jquery-tablesorter
where the . indicates a class name in CSS selector syntax. See the CSS selector references below for more info on CSS selectors and their syntax.
W3C Selectors Overview
Selenium Tips: CSS Selectors
Taming Advanced CSS Selectors
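If you are copying the class attribute straight out of the inspector, the conversion to a compound class selector is mechanical. A small sketch in plain Python; the function name classes_to_css is only illustrative:

```python
def classes_to_css(class_attr):
    """Turn a class attribute value into a compound CSS class selector."""
    # split() with no argument also copes with repeated or stray whitespace.
    return "".join("." + name for name in class_attr.split())


print(classes_to_css("wikitable sortable jquery-tablesorter"))
# .wikitable.sortable.jquery-tablesorter
```

The result matches only elements that carry all three classes, which is the same thing the hand-written selector above expresses.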
To handle a dynamic element, use WebDriverWait with visibility_of_element_located and the following CSS selector.
WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".wikitable.sortable.jquery-tablesorter")))
You need the following imports.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
If you want to print the contents of the table:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(r"\chromedriver_win32\chromedriver.exe")
driver.get(r'https://en.wikipedia.org/wiki/List_of_airports_in_the_United_States')
print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".wikitable.sortable.jquery-tablesorter"))).text)
Please pass a single class name to find_element_by_class_name(); it does not accept several space-separated names. So, instead of writing:
driver.find_element_by_class_name(name='wikitable sortable jquery-tablesorter')
Please write:
driver.find_element_by_class_name('wikitable')
Hope it helps :)

Python Selenium: Getting dynamic content within iframe

I am trying to scrape the available apartment listings from the following webpage: https://3160599v2.onlineleasing.realpage.com/
I am using the Python implementation of Selenium, but so far I haven't found an effective solution to programmatically get the content. My most basic code is the following, which currently just returns the non-dynamic HTML source code:
from selenium import webdriver
driver = webdriver.Chrome('/path_to_driver')
driver.get('https://3160599v2.onlineleasing.realpage.com/')
html = driver.page_source
The returned html variable does not contain the apartment listings I need.
If I 'Inspect' the element using Chrome's built-in inspect tool, I can see that the content is within an un-classed iframe: <iframe frameborder="0" realpage-oll-widget="RealPage-OLL-Widget" style="width: 940px; border: none; overflow: hidden; height: 2251px;"></iframe>
Several children down within this iframe you can also see the div <div class="main-content"> which contains all the info I need.
Other solutions I have tried include implementing an explicit WebDriverWait:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, 'main-content')))
I get a TimeoutException with this method as the element is never found.
I also tried using the driver.switch_to.frame() method, with no success.
The only steps that have actually allowed me to get the apartment listings out of the webpage have been (using Chrome):
Manually right-click on an element of the listings within the webpage
Click Inspect
Find the div 'main-content'
Manually right-click on this div and select Copy -> Copy Element
This is not an effective solution since I'm seeking to automate this process.
How can I get this dynamically generated content out of the webpage in a programmatic way?
Try the code below to switch to the iframe:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
wait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//iframe[@realpage-oll-widget="RealPage-OLL-Widget"]')))
Also note that the method for switching to a static iframe is switch_to.frame(), not switch-to.frame()
You cannot directly see content that lives inside an iframe; you need to change frames first. Select the iframe element, then switch to it with driver.switch_to.frame().
iframe = driver.find_element_by_id('iframe')
driver.switch_to.frame(iframe)
After that you can access the iframe's content.
Alternatively, you can read the src attribute of the iframe and open that page with Selenium. In the end, the iframe content is just another HTML page.
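For the src-based route, the attribute may be a relative URL, so it has to be resolved against the page address before being passed to driver.get(). A hedged sketch with the standard library; iframe_target is an invented name and the "/widget/listings" value is purely hypothetical. Keep in mind that the iframe quoted in the question shows no src attribute at all, in which case this route does not apply and the helper returns None:

```python
from urllib.parse import urljoin


def iframe_target(page_url, iframe_src):
    """Resolve an iframe's src against the page URL; None if there is no src."""
    if not iframe_src:
        # An iframe filled in by script may have no src to navigate to.
        return None
    return urljoin(page_url, iframe_src)


page = "https://3160599v2.onlineleasing.realpage.com/"
print(iframe_target(page, "/widget/listings"))  # hypothetical src value
# https://3160599v2.onlineleasing.realpage.com/widget/listings
```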
