Python scraping hidden data - python

I need to scrape text from an element, but first I have to hover over it. The original HTML looks like this:
<app-tooltip-widget _nghost-tmf-c12="">
<div _ngcontent-tmf-c12="" triggers="" class="">
<img alt="Info icon" class="img-fluid shipments-info-icon" src="info.png">
</div>
</app-tooltip-widget>
When I hover over it, it becomes:
<app-tooltip-widget _nghost-sqv-c21="">
<div _ngcontent-sqv-c21="" triggers="" class="" aria-describedby="tooltip-21">
<img alt="Info icon" class="img-fluid shipments-info-icon" src="info.png">
</div>
</app-tooltip-widget>
The attribute aria-describedby="tooltip-21" appears on the div.
I need to scrape the tooltip text that it refers to.
This is what I'm trying:
driver = webdriver.Chrome()
driver.get('example.com')
driver.maximize_window()
men_menu = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//div[@_ngcontent-sqv-c21=""]')))
ActionChains(driver).move_to_element(men_menu).perform()
data = driver.find_element_by_xpath('//*[@aria-describedby="tooltip-23"]').text
print(data)
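One possible approach, sketched under assumptions: in ARIA, the value of aria-describedby is the id of the describing element, so after the hover the tooltip can be looked up by that id instead of hard-coding tooltip-21 or tooltip-23. The names `read_tooltip` and `first_described_id` and the CSS selectors are illustrative only, and the sketch assumes the tooltip is rendered in the same document as the icon (not in an iframe) and that the page has a single info icon. The generated _ngcontent attributes are avoided because they differ between the two snippets above.

```python
def first_described_id(attr_value):
    """Return the first id listed in an aria-describedby value, or None."""
    ids = (attr_value or "").split()
    return ids[0] if ids else None


def read_tooltip(driver, timeout=10):
    """Hover over the info icon and return the tooltip text (hypothetical helper)."""
    # Imported lazily so first_described_id works without Selenium installed.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)

    # Locate the icon by its class, which is the same in both snippets.
    icon = wait.until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, "img.shipments-info-icon"))
    )
    ActionChains(driver).move_to_element(icon).perform()

    # The hover adds aria-describedby to the wrapping div; wait for it to appear.
    holder = wait.until(
        EC.presence_of_element_located(
            (By.CSS_SELECTOR, "app-tooltip-widget div[aria-describedby]")
        )
    )
    tooltip_id = first_described_id(holder.get_attribute("aria-describedby"))
    if tooltip_id is None:
        return ""

    # Assumption: an element with that id exists and carries the tooltip text.
    tooltip = wait.until(EC.visibility_of_element_located((By.ID, tooltip_id)))
    return tooltip.text
```

It would be called as `print(read_tooltip(driver))` after `driver.get(...)`.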

Related

Getting links and background image from a certain div's using - Selenium in Python

I'm trying to get all the links, and the background images of those links, inside a specific div, but I can't seem to get them.
I want the links and background images inside the paint_wrap div, not paint_color.
HTML:
<div id="timer" style="display:inline-block">
<div class="paint_wrap">
<div class="paint_color">
<img src="http://link.com/img/6EAwxqqt6J7aKcn6B6pO.gif" class="pick" width="30" height="30" border="0" alt="">
<img src="http://link.com/img/arrow.gif" class="arrow" width="27" height="15" border="0" alt="">
</div>
<img src="http://link.com/tx/o.gif" width="15" height="15" border="0" alt="">
</div>
</div>
WHAT I HAVE TRIED:
WebDriverWait(self.driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='timer']/div/a"))) # Wait is fine
elements = self.driver.find_element(By.XPATH, "//*[@id='timer']/div/a")
for element in elements:
    print(element.get_attribute("href"))
    print(element.value_of_css_property("background-image"))
I have also tried using a CSS selector.
THE ERROR I GET:
'WebElement' object is not iterable
I'm hoping maybe someone could point me in the right direction.
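You have to use 'find_elements' instead of 'find_element' in the line below. 'find_element' returns a single WebElement, which cannot be iterated, while 'find_elements' returns a list:
elements = self.driver.find_elements(By.XPATH, "//*[@id='timer']/div/a")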
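To make the difference concrete, here is a small sketch of the loop built on `find_elements`, which always returns a list (empty when nothing matches). The function name `collect_links` is made up for illustration, and the locator strategy is passed as the plain string "xpath", which is the value of `By.XPATH`, so the snippet has no import-time dependency on Selenium.

```python
def collect_links(driver, xpath="//*[@id='timer']/div/a"):
    """Return (href, background-image) pairs for every element matching xpath."""
    results = []
    # find_elements (plural) returns a list, so iterating over it is safe.
    for element in driver.find_elements("xpath", xpath):
        results.append(
            (
                element.get_attribute("href"),
                element.value_of_css_property("background-image"),
            )
        )
    return results
```

In the question's code this would be called as `collect_links(self.driver)` after the wait has succeeded.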

How i can click on a div with role button without text? Using Python Selenium

This is an example of the HTML:
<div aria-label="Continue" class="my-class" data-visualcompletion="ignore"></div>
<div class="div1-class">
<div class="div1-class2">
<span class="area-span" dir="auto">
<span class="text-span">Continue</span>
</span>
</div>
</div>
<div class="div2-class" data-visualcompletion="ignore"></div>
I'm trying:
continue = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, 'my-class')) )
continue.click()
but it doesn't work in any of the ways I tried.
You should check your XPath in the browser console first.
Then try locating the element like this:
driver.find_element(By.XPATH, "//div[contains(@class,'my-class')]")
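Separately from the locator, it may be worth noting that `continue` is a reserved word in Python, so the assignment `continue = ...` in the question fails with a SyntaxError before Selenium is ever called. A quick check with the standard library follows, together with a CSS selector built from the aria-label in the HTML above. The variable name `continue_button` and the selector are suggestions of this sketch, not something the answer proposed.

```python
import keyword

# Reserved words cannot be used as variable names.
print(keyword.iskeyword("continue"))         # True
print(keyword.iskeyword("continue_button"))  # False

# A selector that targets the div by its aria-label instead of its class.
CONTINUE_SELECTOR = 'div[aria-label="Continue"]'
print(CONTINUE_SELECTOR)
```

With a valid name, the wait could read `continue_button = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, CONTINUE_SELECTOR)))`, followed by `continue_button.click()`.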

How to get javascript generated html that I see by clicking "inspect element" in browser?

I'm trying to get the hours of the available time slots from this webpage (the boxes below the calendar):
https://magicescape.it/le-stanze/lo-studio-di-harry-houdini/
I've read other related questions and wrote this code
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
url = 'https://magicescape.it/le-stanze/lo-studio-di-harry-houdini/'
wait_time = 10
options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)
driver.get(url)
driver.switch_to.frame(0)
wait = WebDriverWait(driver, wait_time)
first_result = wait.until(presence_of_element_located((By.ID, "sb_main")))
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup)
driver.quit()
After switching to the iframe containing the time slots, I get this from printing soup
<script id="time_slots_view" type="text/html"><div class="slots-view{{#ifCond (getThemeOption 'timeline_modern_display') '==' 'as_table'}} as-table{{/ifCond}}">
<div class="timeline-wrapper">
<div class="tab-pd">
<div class="container-caption">
{{_t 'available_services_on_this_day'}}
</div>
{{#if error_message}}
<div class="alert alert-danger alert-dismissible" role="alert">
{{error_message}}
</div>
{{/if}}
{{>emptyTimePart is_empty=is_empty is_loaded=is_loaded}}
<div id="sb_time_slots_container"></div>
{{> bookingTimeLegendPart legend="only_available" time_diff=0}}
</div>
</div>
</div></script>
<script id="time_slot_view" type="text/html"><div class="slot">
<a class="sb-cell free {{#ifPluginActive 'slots_count'}}{{#if available_slots}}has-available-slot{{/if}}{{/ifPluginActive}}" href="#{{bookingStepUrl time=time date=date}}">
{{formatDateTime datetime 'time' time_diff}}
{{#ifCond (getThemeOption 'timeline_show_end_time') '==' 1}}
-<span class="end-time">
{{formatDateTime end_datetime 'time' time_diff}}
</span>
{{/ifCond}}
{{#ifPluginActive 'slots_count'}}
{{#if available_slots}}
<span class="slot--available-slot">
{{available_slots}}
{{#ifConfigParam 'slots_count_show_total' '==' true}} / {{total_slots}} {{/ifConfigParam}}
</span>
{{/if}}
{{/ifPluginActive}}
</a>
</div></script>
while from right click > inspect element in the webpage I get this
<div class="slots-view">
<div class="timeline-wrapper">
<div class="tab-pd">
<div class="container-caption">
Orari d'inizio disponibili
</div>
<div id="sb_time_slots_container">
<div class="slot">
<a class="sb-cell free " href="#book/location/4/service/6/count/1/provider/6/date/2020-03-09/time/23:00:00/">
23:00
</a>
</div>
</div>
<div class="time-legend">
<div class="available">
<div class="circle">
</div>
- Disponibile
</div>
</div>
</div>
</div>
</div>
How can I get the hour of the available slots (23:00 in this example) using selenium?
To get the desired response you need to:
1. Correctly identify the iframe you want to switch to (and switch to it). You were trying to switch to frame[0] but needed frame[1]. The following code removes the reliance on indexes and uses XPath instead.
2. Get the elements containing the time. Again, this uses XPath to identify all child divs of the element with id=sb_time_slots_container.
We then iterate over these child divs and get the text property, which is nested within an <a> inside each div.
For both steps 1 and 2 you should also use wait.until so that the content has time to load.
...
driver.get(url)
wait = WebDriverWait(driver, wait_time)
# Wait until the iframe exists then switch to it
iframe_element = wait.until(presence_of_element_located((By.XPATH, '//*[@id="prenota"]//iframe')))
driver.switch_to.frame(iframe_element)
# Wait until the times exist then get a list of them
wait.until(presence_of_element_located((By.XPATH, '//*[@id="sb_time_slots_container"]/div')))
# find_elements_by_xpath was removed in Selenium 4.3; find_elements(By.XPATH, ...) is the current form
all_time_elems = driver.find_elements(By.XPATH, '//*[@id="sb_time_slots_container"]/div')
# Iterate over each element and print the time out
for elem in all_time_elems:
    print(elem.find_element(By.TAG_NAME, "a").text)
driver.quit()
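As an optional cross-check, the start time is also encoded in each slot's href (see the inspected HTML in the question), so it can be parsed from `get_attribute("href")` as well as read from the link text. This is only a sketch: the function name `slot_from_href` is hypothetical, and the pattern assumes every link follows the /date/YYYY-MM-DD/time/HH:MM:SS/ layout of the single example shown.

```python
import re

# Captures the date and the HH:MM part of the time from a booking link.
SLOT_PATTERN = re.compile(r"/date/(\d{4}-\d{2}-\d{2})/time/(\d{2}:\d{2}):\d{2}/")


def slot_from_href(href):
    """Return (date, 'HH:MM') parsed from a slot link, or None if it does not match."""
    match = SLOT_PATTERN.search(href)
    if match is None:
        return None
    return match.group(1), match.group(2)


example = "#book/location/4/service/6/count/1/provider/6/date/2020-03-09/time/23:00:00/"
print(slot_from_href(example))  # ('2020-03-09', '23:00')
```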

Select checkbox using selenium in python

I want to select a checkbox using Selenium in Python. Below is the HTML of the checkbox. The span element is highlighted when I hover the mouse over the checkbox.
HTML
<div id="rc-anchor-container" class="rc-anchor rc-anchor-normal rc-anchor-light">
<div id="recaptcha-accessible-status" class="rc-anchor-aria-status" aria-hidden="true">Recaptcha requires verification. </div>
<div class="rc-anchor-error-msg-container" style="display:none"><span class="rc-anchor-error-msg" aria-hidden="true"></span></div>
<div class="rc-anchor-content">
<div class="rc-inline-block">
<div class="rc-anchor-center-container">
<div class="rc-anchor-center-item rc-anchor-checkbox-holder"><span class="recaptcha-checkbox goog-inline-block recaptcha-checkbox-unchecked rc-anchor-checkbox" role="checkbox" aria-checked="false" id="recaptcha-anchor" tabindex="0" dir="ltr" aria-labelledby="recaptcha-anchor-label"><div class="recaptcha-checkbox-border" role="presentation"></div><div class="recaptcha-checkbox-borderAnimation" role="presentation"></div><div class="recaptcha-checkbox-spinner" role="presentation"></div><div class="recaptcha-checkbox-spinnerAnimation" role="presentation"></div><div class="recaptcha-checkbox-checkmark" role="presentation"></div></span></div>
</div>
</div>
<div class="rc-inline-block">
<div class="rc-anchor-center-container">
<label class="rc-anchor-center-item rc-anchor-checkbox-label" aria-hidden="true" role="presentation" id="recaptcha-anchor-label"><span aria-live="polite" aria-labelledby="recaptcha-accessible-status"></span>I'm not a robot</label>
</div>
</div>
</div>
<div class="rc-anchor-normal-footer">
<div class="rc-anchor-logo-portrait" aria-hidden="true" role="presentation">
<div class="rc-anchor-logo-img rc-anchor-logo-img-portrait"></div>
<div class="rc-anchor-logo-text">reCAPTCHA</div>
</div>
<div class="rc-anchor-pt">Privacy<span aria-hidden="true" role="presentation"> - </span>Terms</div>
</div>
</div>
I am trying the following code, but it raises selenium.common.exceptions.NoSuchElementException.
My Code
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
chromedriver = r'C:\Program Files (x86)\Google\Chrome\chromedriver'
browser = webdriver.Chrome(chromedriver)
browser.get(url)
checkBox = browser.find_element_by_id("recaptcha-anchor")
checkBox.click()
This is a reCAPTCHA widget; it is not like the normal elements on the page.
You have to switch to the captcha's iframe with Selenium first; then you can work with the checkbox element.
To do that, first save the main window handle so that you can get back to it when you're done with the reCAPTCHA.
from selenium.webdriver.common.by import By
# save the main window handle
mainwindow = browser.current_window_handle
# get the reCAPTCHA iframe, then switch to it
frame = browser.find_element(By.TAG_NAME, "iframe")
browser.switch_to.frame(frame)
# now you can access the checkbox element
browser.find_element(By.ID, "recaptcha-anchor").click()
# switch back to the main window
browser.switch_to.window(mainwindow)
for further info about how to deal with the recaptcha challenge check this link

Can't select textbox in selenium

I am trying to access the comment textbox in a generic Huffington Post article. When I right-click and inspect the element, I get the following HTML:
<div class="UFIInputContainer">
<div class="_1cb _5yk1">
<div class="_5yk2" tabindex="-2">
<div class="_5rp7">
with the line <div class="_1cb _5yk1"> highlighted.
from selenium import webdriver
driver = webdriver.Chrome()
'''
Just pretend that I put in some code to log in to facebook
so I can actually post a comment on huffington post
'''
driver.get('http://www.huffingtonpost.com/entry/worst-suicide-squad-reviews_us_57a1e213e4b0693164c34744?')
'''
Just a random article about a movie
'''
comment_box = driver.find_element_by_css_selector('._1cb._5yk1')
'''
since this is a compound class I think I should use find_by_css_selector
'''
When I run this, though, I get the error message "no such element found". I have tried other ways of getting hold of the comment textbox, but I get the same error message and I am at a loss as to how to access it. I am hoping somebody can shed some light on this problem.
Edit: here is a more complete version of the HTML:
<html lang="en" id="facebook" class="svg ">
<head>...</head>
<body dir="ltr" class="plugin chrome webkit win x1 Locale_en_US">
<div class="_li">
<div class="pluginSkinLight pluginFontHelvetica">
<div id="u_0_0">
<div data-reactroot class="_56q9">
<div class="_2pi8">
<div class="_491z clearfix">...</div>
<div spacing="medium" class="_4uyl _1zz8 _2392 clearfix" direction="left">
<div class="_ohe lfloat">...</div>
<div class>
<div class="UFIInputContainer">
<div class="_1cb _5yk1">
<div class="_5yk2" tabindex="-2">
<div class="_5rp7">
</div>
</div>
<div class="UFICommentAttachmentButtons clearfix">...</div>
<!-- react-empty: 39 -->
<div class="_4uym">...</div>
</div>
</div>
</div>
::after
You have to switch to the iframe containing the text box. Try the following approach:
Clicking the load-comments button might be required first, if that button is displayed.
from selenium.webdriver.common.by import By
load_comment = driver.find_element(By.CSS_SELECTOR, '.comment-button.js-comment-button')
load_comment.click()
driver.switch_to.frame(driver.find_element(By.CSS_SELECTOR, '.fb_ltr.fb_iframe_widget_lift'))
comment_box = driver.find_element(By.CSS_SELECTOR, '._1cb._5yk1')
comment_box.send_keys('Test')
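The iframe answers on this page share one pattern: switch into the frame, work with its elements, then switch back out. A minimal sketch of that pattern as a context manager follows, so that the switch back happens even if a lookup inside the frame fails. The name `inside_frame` is invented for this example; `switch_to.frame` and `switch_to.default_content` are the Selenium calls it relies on.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame` for the duration of the with-block, then back out."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Return to the top-level document, even if the block raised.
        driver.switch_to.default_content()
```

Usage would look like `with inside_frame(driver, iframe_element):` followed by the `find_element` and `send_keys` calls in the indented block.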
