I'm trying to web scrape the person's name and company.
This is what I've tried.
<div id="viewcontact">
<table width="100%">
<tbody><tr>
<td style="display: inline-block; width: 30%">
<div class="formsection_light" style="margin-top:-8px;background:#eaeaea;">
<div style="padding-bottom:10px;">
<div class="left">
<h1>Company Name</h1>
<p class="f16">Person's Name</p>
<div class="theme">
Person's Name
</div>
</div>
<div class="right" style="margin-top:5px;">
driver.find_element_by_xpath('//h1[@class="left"]')
driver.find_element_by_class_name("f16")
The output was empty: no errors, but nothing was scraped.
Try something like this:
details = driver.find_elements_by_xpath("//div[@id = 'viewcontact']//tr")
for detail in details:
    # In the posted markup the <h1> holds the company and the <p> holds the person's name
    cpny = detail.find_element_by_tag_name("h1").text  # or .get_attribute("innerText")
    name = detail.find_element_by_tag_name("p").text
    print("{} : {}".format(name, cpny))
To get the company name:
wait = WebDriverWait(driver, 50)
print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.formsection_light h1"))).text)
To get the first person's name:
print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.formsection_light p.f16"))).text)
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
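One likely reason the first locator in the question matches nothing: `left` is the class of the `<div>` that wraps the `<h1>`, not of the `<h1>` itself. The sketch below illustrates this without a browser, using the standard library's `xml.etree.ElementTree` (which supports a small XPath subset) on a trimmed, well-formed copy of the posted markup. The `SNIPPET` string is an assumption made for illustration; the real page may differ.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the markup from the question (illustrative only).
SNIPPET = """
<div id="viewcontact">
  <div class="formsection_light">
    <div class="left">
      <h1>Company Name</h1>
      <p class="f16">Person's Name</p>
    </div>
  </div>
</div>
""".strip()

root = ET.fromstring(SNIPPET)

# The question's locator asks for an <h1> whose own class is "left"; none exists.
wrong = root.find(".//h1[@class='left']")

# "left" is the class of the parent <div>, so step through it instead.
company = root.find(".//div[@class='left']/h1").text
person = root.find(".//p[@class='f16']").text

print(wrong, company, person)
```

The same reasoning should carry over to Selenium: a locator such as `//div[@class="left"]/h1` targets the heading through its parent.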
I'm using Google Chrome browser. I'm running a python script to choose the correct dates in the datepicker. It is unable to select the correct date. It keeps selecting the end date to be "02/01/2022" but I want to choose the date of five (5) days ago from today's date every time I run the script. For example, today is "02/08/2022" so it should choose "02/03/2022" as the end date. The start date, "12/01/2021" is correct.
Here's my code:
from selenium import webdriver
import time
import os.path
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
import calendar
import datetime
from datetime import date
chrome_options = Options()
chrome_options.add_argument("--incognito")
driver = webdriver.Chrome("/Users/myname/Documents/chromedriver", options=chrome_options)
todays_date = date.today()
print(todays_date)
driver.get("https://accessdata.broadridge.com/node952064/")
try:
    myElem = WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.XPATH, '//*[@id="main"]/table/tbody/tr[2]/td[2]/form/table[2]/tbody/tr[3]/td[2]/input')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
driver.find_element(By.XPATH, '//*[@id="main"]/table/tbody/tr[2]/td[2]/form/table[2]/tbody/tr[6]/td[2]/span/img').click()
driver.find_element(By.XPATH, '//*[@id="span8"]').click()
time.sleep(2)
driver.find_element(By.XPATH, '//*[@id="ext-gen157"]/div[3]/table/tbody/tr/td[4]/div/img').click()
driver.find_element(By.XPATH, '//*[@id="ext-gen209"]').click()
driver.find_element(By.XPATH, '//*[@id="main"]/table/tbody/tr[2]/td[2]/table/tbody/tr[1]/td[11]/a').click()
driver.find_element(By.XPATH, '//*[@id="dateAnchor1_0"]/img').click()
driver.find_element(By.XPATH, '//*[@id="caldiv"]/table/tbody/tr/td/center/table[2]/tbody/tr[2]/td[4]/a').click()
driver.find_element(By.ID, 'dateAnchor2_0').click()
driver.find_element(By.XPATH, '//*[@id="caldiv"]/table/tbody/tr/td/center/table[2]/tbody/tr[2]/td[3]/a').click()
How can I select the correct date for five days ago from today?
Here's the HTML code:
<td align="left" nowrap="">
<table border="0" cellspacing="0" cellpadding="0">
<tbody><tr>
<td align="right" nowrap="">
<div id="firstValue1_0" style="visibility: visible">
<input type="text" name="values1" value="12/01/2021" size="11" maxlength="10" onkeypress="if(document.getElementById('firstCalendar1_0').style.visibility == 'visible' ) return processKeyPress(this); else return true;" onfocus="self.status='Date Format is ' + dtFormat;" onblur="self.status=' ';" onchange="if(document.getElementById('firstCalendar1_0').style.visibility == 'visible' ){if(isValidDate(this)) valueOnChange(0);} else valueOnChange(0);">
<span id="firstValue3_0" style="display: none">% may be used as a wildcard character. </span>
<div id="firstCalendar1_0" style="visibility: visible;display:inline">
<a id="dateAnchor1_0" name="dateAnchor1_0" onclick="dateSelect(0, 'values1');" style="vertical-align:middle"><img border="0" src="images/calendar.gif"></a>
</div>
</div>
</td>
<td valign="center" nowrap="">
<div id="firstValue2_0" style="display: none">
<span id="search0" class="search_sm" onclick="popupSearch('810', 'values1', 'Trade Date', 0)" onmouseover="doImgSwapOver('search_sm',0,true)" onmouseout="doImgSwapOut('search_sm',0,true)">
<img name="search_sm" src="images/btn_search_sm.gif" width="24" height="22" alt="Search Trade Date" border="0" align="absmiddle">
</span>
<a name="searchAnchor0" id="searchAnchor0"></a>
</div>
</td>
<td valign="center" nowrap="">
<div id="secondValue1_0" style="visibility: visible">
and
</div>
</td>
<td valign="center" nowrap="">
<div id="secondValue2_0" style="visibility: visible">
<input type="text" name="values2" value="02/03/2022" size="11" maxlength="10" onkeypress="return processKeyPress(this);" onfocus="self.status='Date Format is ' + dtFormat;" onblur="self.status=' ';" onchange="if(isValidDate(this)) valueOnChange(0);">
<a id="dateAnchor2_0" name="dateAnchor2_0" onclick="dateSelect(0, 'values2');">
<img border="0" src="images/calendar.gif">
</a>
</div>
</td>
<td>
</td>
<td valign="center" nowrap="">
<!-- visibility -->
<div id="tinMask_0" style="visibility:hidden">
<!-- Tin Mask Selection Box -->
<table>
<tbody><tr>
<td>
<!-- Previously set to SHOW -->
<!-- Previously set to HIDE -->
<!-- Not previously set -->
<!-- Tin Privileges -->
<!-- No Tin Privileges -->
<select name="tinMaskFlgs" onchange="flagOnChange(0,document.forms[0].tinMaskFlgs,document.forms[0].columnFilterTinMaskFlgs);">
<option value="false">Show Tin Values</option>
<option value="true" selected="">Hide Tin Values </option>
</select>
<!-- true -->
<!-- false -->
</td>
</tr>
</tbody></table>
</div>
</td>
</tr>
</tbody></table>
</td>
driver.find_element(By.NAME, "values2").send_keys("02/03/2022")
You can try sending keys to the input element whose name is values2.
I figured out the answer. I had to clear the value first.
from datetime import datetime, timedelta
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
date_element = driver.find_element(By.NAME, 'values2')
date_element.clear()
five_days_prior = datetime.now() - timedelta(5)
end_date = five_days_prior.strftime("%m/%d/%Y")
date_element.send_keys(end_date)
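The date arithmetic can be checked on its own, without a browser. The following is a minimal sketch; the helper name `end_date_string` and its `days_back` parameter are hypothetical and not part of the original script. It reproduces the example from the question, where 02/08/2022 should yield 02/03/2022.

```python
from datetime import date, timedelta


def end_date_string(today, days_back=5):
    """Return the date `days_back` days before `today`, formatted as MM/DD/YYYY."""
    return (today - timedelta(days=days_back)).strftime("%m/%d/%Y")


# Example from the question: run on 02/08/2022, expect 02/03/2022.
example = end_date_string(date(2022, 2, 8))
print(example)

# Subtracting a timedelta also handles month boundaries.
across_month = end_date_string(date(2022, 3, 2))
print(across_month)
```

In the script, the result of `end_date_string(date.today())` could be passed to `send_keys` after calling `clear()` on the input.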
So I'm trying to write a test for a webpage which has some elements within an iframe. I've been able to successfully run the test using webdriver.Firefox() without any problems but if I switch it over to webdriver.Chrome() I get a timeout exception on the following lines of code:
self.driver.switch_to.frame(0)
self.activity_status = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#overview > div.details.w-66 > div > div.duration-and-status > span.status.stat_incomplete#')))
It'd be great to get a solution to this as I'm all out of ideas.
Thanks for your help.
edit, partial html for the page:
<iframe id="iframe_course_details" allowfullscreen="" src="../Course/Details.aspx?HidePageNav=true&IsInIframe=true"></iframe>
Close
Edit (Inactive)
Edit
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_favourite" class="favourite button tooltipstered" style="display: none;">Favourite</span>
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_basket_dull" class="add-to-basket button delete tooltipstered" style="display: none;">Enrolled (Remove From Enrolments)</span>
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_basket" class="add-to-basket button tooltipstered">Add to Enrolments</span>
<span id="ctl00_cph_main_content_area_ucCourseDetails_spn_print" class="print button tooltipstered">Print</span>
</div>
<section id="overview" style="opacity: 1;">
<div id="fullname" class="fullname w-100" style="display: none;">
</div>
<div class="image w-33" style="cursor: pointer;">
<div style="background-image:url(/App_Themes/MainTheme-responsive/Images/Course/webcast.jpg);"></div></div>
<div class="details w-66">
<div class="inner">
<h2>testing activity</h2>
<div class="star-rating-num-ratings">
<div class="star-rating">
<span></span><span></span><span></span><span></span><span></span>
</div>
<span class="num-of-ratings">0 Ratings</span>
</div>
<div class="duration-and-status">
<span class="duration">
<label>
Duration:
</label>
<span>0</span>
</span>
<span class="status stat_incomplete">Started</span>
</div>
Edit 2:
So we've managed to find a solution to this, and it's even more confusing than the original problem:
WebDriverWait(self.driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'iframe_course_details')))
time.sleep(0)
self.activity_status = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="overview"]/div[3]/div/div[2]/span[2]')))
I'd be really curious to hear some theories on why this works; it times out without the time.sleep(0).
If you reference the iframe element directly rather than by an integer index, it should work in both Firefox and Chrome.
self.driver.switch_to.frame(self.driver.find_element_by_id("iframe_course_details"))
You can locate the iframe element any way you wish, e.g. by CSS selector or XPath.
As the desired element is within an <iframe>, to interact with it you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Code Block:
# as per your comment assuming -> there is only one frame on the page
WebDriverWait(self.driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME,"iframe")))
self.element = self.activity_status = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#overview > div.details.w-66 > div > div.duration-and-status > span.status.stat_incomplete#')))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Reference
You can find a relevant detailed discussion in:
Ways to deal with #document under iframe
I'm pretty new at using Python but this seems like a pretty straight forward script I'm trying to write. I have been able to log-in to the website properly, but to get to the next step I am trying to click on a button that says "Market Express".
I am able to see the xpath (//*[@id="MarketExpress"]) as well as the button's id (MarketExpress). When I run the module I receive this error: "Unable to locate element: //*[@id="MarketExpress"]"
I have even double checked the xpath using Firefox's addon "xpath finder" to make sure I have the right code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver import ActionChains
usernameStr = '***'
passwordStr = '***'
driver = webdriver.Firefox()
driver.get(('https://www.myurl.com'))
username = driver.find_element_by_id('USERID')
username.send_keys(usernameStr)
password = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.ID, 'currentPassword')))
password.send_keys(passwordStr)
nextButton = driver.find_element_by_id('submit-button')
nextButton.click()
password = driver.find_element_by_name('currentPassword')
password.send_keys(passwordStr)
nextButton = driver.find_element_by_name('Submit')
nextButton.click()
marketExpress = driver.find_element_by_xpath('//*[@id="MarketExpress"]').click()
I have tried so many different things but cannot get the script to click this button, I would appreciate any help!
Below is the html where the button is:
<input class="crtordbtn" type="button" value="Market Express" id="MarketExpress" onclick="parent.location.href='/OMAPX?userId=051220665&clientId=8&UserType=null&BuyerCookie=null';">
Below is the table where I believe the button is in:
<div id="sidebar-left" height="50%" style="margin-right:10px">
<div id="bar" style="margin-right:-3px"><h1>Select To Order</h1></div>
<div style="width:100%; height:50%; border-left:1px solid #cccccc;border-right:1px solid #cccccc;border-bottom:1px solid #cccccc;margin-bottom:10px;">
<table width="235px" cellspacing="0" cellpadding="0" border="0">
<!-- <tr align="center">
<td style="padding: 5px 40px 0px 40px;">
<input class="crtordbtn" type="button" value="eSysco Express" id="esyscoExpress" onClick="parent.location.href='http://flex2.esysco.net';" />
</td>
</tr>
<tr >
<td style="padding: 5px 40px 0px 40px;text-align:left">
<p>Our latest order management application with improved performance and enhanced usability</p>
</td>
</tr>-->
<tbody><tr align="center">
<td style="padding: 5px 40px 0px 40px;">
<input class="crtordbtn" type="button" value="Market Express" id="MarketExpress" onclick="parent.location.href='/OMAPX?userId=051220665&clientId=8&UserType=null&BuyerCookie=null';">
</td></tr>
<tr>
<td style="padding: 5px 40px 0px 40px;text-align:left">
<p>Our latest order management application with improved performance and enhanced usability</p>
</td></tr>
<tr>
</tr>
</tbody></table>
</div>
</div>
The desired element appears to be rendered dynamically, so wait for it to become clickable with WebDriverWait before clicking it. You can use either of the following locators:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.crtordbtn#MarketExpress[onclick*='OMAPX?userId']"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@class='crtordbtn' and @id='MarketExpress'][contains(@onclick, 'OMAPX?userId')]"))).click()
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
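Try this. It should work. (The posted table has no id, so the path is anchored on the enclosing sidebar-left div instead.)
driver.find_element_by_xpath("//div[@id='sidebar-left']//table/tbody/tr//input[@class='crtordbtn']").click()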
Perhaps trying with the id will work. Hope it helps
driver.find_element_by_id("MarketExpress").click()
I'm a relative newcomer to selenium so this might be something incredibly simple but I can't seem to access an element even though it appears on the page. I don't think it can be that it hasn't loaded yet because I can reference other elements. The line of code I am trying to use and the html is below.
max_questions = driver.find_element_by_xpath(xpath="//span[contains(@class, 'total-questions')]")
<div data-v-404a90e7="" data-v-084771db="" class="header animated fadeInDown anim-300-duration">
<div data-v-404a90e7="" class="left-section half-width">
<div data-v-404a90e7="" flow="right" class="menu-icon animated fadeIn anim-300-duration">
<div data-v-404a90e7="" class="menu-icon-image"></div>
</div>
<div data-v-404a90e7="" class="question-number-wrapper text-unselectable animated fadeIn anim-300-duration">
<span data-v-404a90e7="" class="current-question">1</span>
<span data-v-404a90e7="" class="total-questions">/10</span>
</div>
</div>
<div data-v-404a90e7="" class="right-section half-width">
<div data-v-404a90e7="" class="room-code animated fadeIn anim-300-duration">712851</div>
<div data-v-404a90e7="" flow="left" class="exit-game-btn-wrapper animated fadeIn anim-300-duration">
<div data-v-404a90e7="" class="exit-game-icon"></div>
</div>
</div>
</div>
You can use WebDriverWait with expected_conditions:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('d:\\chromedriver\\chromedriver.exe')
driver.get(url)
wait = WebDriverWait(driver, 10)
max_questions = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[contains(@class, 'total-questions')]")))
print(max_questions.text)
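WebDriverWait works by polling a condition until it returns a truthy value or the timeout expires. The sketch below illustrates that idea with the standard library only; `wait_until`, `poll_interval`, and the simulated `find_total_questions` lookup are hypothetical names for this example, and the code is a simplified illustration rather than Selenium's actual implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the element text only becomes available on the third poll.
attempts = {"count": 0}


def find_total_questions():
    attempts["count"] += 1
    return "/10" if attempts["count"] >= 3 else None


text = wait_until(find_total_questions, timeout=2.0, poll_interval=0.01)
print(text)
```

A single immediate lookup would have failed here, which is the same situation as calling `find_element` before the element has been rendered.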
I am trying to target properties on a real estate website. Ideally, I want to pull the property marketing URL, the title, location, and email of each listing. The properties are all listed as so:
<div class="propertyList">
<div id="propertyList74495-sale" class="deal_on_market propertyListItem" data-property-id="74495-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=74495-sale" data-listing-id="148815"></div>
<table>
<tbody>
<tr>
<td class="thumbnail">
<a target="_top" href="http://svncommercialadvisors.com/properties/?propertyId=74495-sale"></a>
</td>
<td class="addressInfo">
<a target="_top" href="http://svncommercialadvisors.com/properties/?propertyId=74495-sale">
Engelberg Antik's
</a>
<p class="propertiesListCityStateZip">
<img src="/images/map-marker-tiny.png?1427481879" alt="Map-marker-tiny"></img>
Salem, OR
</p>
<p class="description">
Outstanding downtown Salem opportunity, right next…
</p>
<div class="smallAttributes">
<div></div>
<div></div>
<div></div>
</div>
</td>
<td class="propertyInfo">
<div>
$479,900
</div>
<div>
13,612 SF
</div>
<div>
Street Retail
</div>
</td>
</tr>
</tbody>
</table>
<div class="contactAdvisor">
::before
or call
503.588.0400
for more information
</div>
<div class="links"></div>
<div id="propertyList61436-sale" class="deal_under_contract propertyListItem" data-property-id="61436-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=61436-sale" data-listing-id="124490"></div>
<div id="propertyList89374-sale" class="deal_on_market propertyListItem" data-property-id="89374-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=89374-sale" data-listing-id="173124"></div>
<div id="propertyList84437-sale" class="deal_on_market propertyListItem" data-property-id="84437-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=84437-sale" data-listing-id="164488"></div>
<div id="propertyList84478-sale" class="deal_on_market propertyListItem" data-property-id="84478-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=84478-sale" data-listing-id="164538"></div>
...
This was my first attempt at it:
from selenium import webdriver
import sys
import smtplib
import pymongo
newProperties = []
driver = webdriver.Firefox()
driver.get('http://svncommercialadvisors.com/properties/')
for property in driver.find_elements_by_class_name('propertyList'):
    # get title, location
    info = property.find_elements_by_class_name('addressInfo')
    email = property.find_elements_by_partial_link_text('.com')
When I run the above, it doesn't raise any errors about elements not being found. However, when I print the elements, nothing appears. How can I better locate the elements? I would like something like this appended to a list:
-title: Engelberg Antik's
-location: Salem, OR
-url: http://svncommercialadvisors.com/properties/?propertyId=74495-sale
-email: brokeremail@svn.com
The key problem here is that the search results are loaded in an iframe.
You need to switch to iframe before searching for properties.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get('http://svncommercialadvisors.com/properties/')
# wait for frame to appear and switch
frame = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#buildout iframe")))
driver.switch_to.frame(frame)
for property in driver.find_elements_by_class_name('propertyList'):
    info = property.find_element_by_class_name('addressInfo')
    email = property.find_element_by_partial_link_text('Email')
    print(info.text)
    print(email.get_attribute('href'))
I've also applied two fixes:
replaced find_elements_by_class_namme with find_elements_by_class_name
replaced property.find_elements_by_partial_link_text('.com') with property.find_element_by_partial_link_text('Email')
It prints:
Engelberg Antik's
Salem, OR
Outstanding downtown Salem opportunity, right next door to the newly renovated Roth and McGilchri...
mailto:jennifer.martin@svn.com
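To get from that printed output to the list of records the question asks for, the text can be split into fields. This is a minimal sketch under two assumptions: the first two non-empty lines of `info.text` are the title and the location (as in the output above), and the listing URL has already been read separately, for example from the link's `href`. The helper name `listing_record` is hypothetical.

```python
def listing_record(info_text, mailto_href, url):
    """Build one record from the addressInfo text, the email link href, and the listing URL."""
    # Assumes line 1 is the title and line 2 is the location, as in the printed output.
    lines = [line.strip() for line in info_text.splitlines() if line.strip()]
    prefix = "mailto:"
    email = mailto_href[len(prefix):] if mailto_href.startswith(prefix) else mailto_href
    return {"title": lines[0], "location": lines[1], "url": url, "email": email}


new_properties = []
new_properties.append(
    listing_record(
        "Engelberg Antik's\nSalem, OR\nOutstanding downtown Salem opportunity, right next...",
        "mailto:jennifer.martin@svn.com",
        "http://svncommercialadvisors.com/properties/?propertyId=74495-sale",
    )
)
print(new_properties[0])
```

Inside the loop, `info.text` and `email.get_attribute('href')` would be passed in place of the literal strings.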