I am trying to scrape property listings from a real estate website. Ideally, I want to pull the marketing URL, title, location, and email of each listing. The properties are all listed like this:
<div class="propertyList">
<div id="propertyList74495-sale" class="deal_on_market propertyListItem" data-property-id="74495-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=74495-sale" data-listing-id="148815"></div>
<table>
<tbody>
<tr>
<td class="thumbnail">
<a target="_top" href="http://svncommercialadvisors.com/properties/?propertyId=74495-sale"></a>
</td>
<td class="addressInfo">
<a target="_top" href="http://svncommercialadvisors.com/properties/?propertyId=74495-sale">
Engelberg Antik's
</a>
<p class="propertiesListCityStateZip">
<img src="/images/map-marker-tiny.png?1427481879" alt="Map-marker-tiny"></img>
Salem, OR
</p>
<p class="description">
Outstanding downtown Salem opportunity, right next…
</p>
<div class="smallAttributes">
<div></div>
<div></div>
<div></div>
</div>
</td>
<td class="propertyInfo">
<div>
$479,900
</div>
<div>
13,612 SF
</div>
<div>
Street Retail
</div>
</td>
</tr>
</tbody>
</table>
<div class="contactAdvisor">
or call
503.588.0400
for more information
</div>
<div class="links"></div>
<div id="propertyList61436-sale" class="deal_under_contract propertyListItem" data-property-id="61436-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=61436-sale" data-listing-id="124490"></div>
<div id="propertyList89374-sale" class="deal_on_market propertyListItem" data-property-id="89374-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=89374-sale" data-listing-id="173124"></div>
<div id="propertyList84437-sale" class="deal_on_market propertyListItem" data-property-id="84437-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=84437-sale" data-listing-id="164488"></div>
<div id="propertyList84478-sale" class="deal_on_market propertyListItem" data-property-id="84478-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=84478-sale" data-listing-id="164538"></div>
...
This was my first attempt:
from selenium import webdriver
import sys
import smtplib
import pymongo
newProperties = []
driver = webdriver.Firefox()
driver.get('http://svncommercialadvisors.com/properties/')
for property in driver.find_elements_by_class_name('propertyList'):
    # get title, location
    info = property.find_elements_by_class_name('addressInfo')
    email = property.find_elements_by_partial_link_text('.com')
When I run the above, the driver doesn't raise any errors about being unable to locate elements. However, when I print the elements, nothing appears. How can I locate them more reliably? I would like something like this, appended to a list for each listing:
-title: Engelberg Antik's
-location: Salem, OR
-url: http://svncommercialadvisors.com/properties/?propertyId=74495-sale
-email: brokeremail@svn.com
The key problem here is that the search results are loaded in an iframe.
You need to switch to the iframe before searching for properties.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get('http://svncommercialadvisors.com/properties/')
# wait for frame to appear and switch
frame = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#buildout iframe")))
driver.switch_to.frame(frame)
for property in driver.find_elements_by_class_name('propertyList'):
    info = property.find_element_by_class_name('addressInfo')
    email = property.find_element_by_partial_link_text('Email')
    print(info.text)
    print(email.get_attribute('href'))
I've also applied two fixes:
- replaced find_elements_by_class_namme with find_elements_by_class_name
- replaced property.find_elements_by_partial_link_text('.com') with property.find_element_by_partial_link_text('Email')
It prints:
Engelberg Antik's
Salem, OR
Outstanding downtown Salem opportunity, right next door to the newly renovated Roth and McGilchri...
mailto:jennifer.martin@svn.com
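To get from this printed output to the record format the question asks for (title, location, url, email), the pieces can be split up in plain Python. A minimal sketch, assuming info.text always has the layout shown above (title, then city/state, then description, one per line) and that the email link's href starts with mailto:. The helper name listing_to_dict is hypothetical, not part of Selenium:

```python
def listing_to_dict(info_text, email_href, listing_url):
    """Turn the scraped pieces of one listing into a dict.

    Assumes info_text is title / location / description, one per line,
    as printed by info.text in the answer above.
    """
    # Drop blank lines and surrounding whitespace
    lines = [line.strip() for line in info_text.splitlines() if line.strip()]
    title = lines[0] if lines else ""
    location = lines[1] if len(lines) > 1 else ""
    # Strip the "mailto:" prefix from the link, if present
    prefix = "mailto:"
    email = email_href[len(prefix):] if email_href.startswith(prefix) else email_href
    return {"title": title, "location": location, "url": listing_url, "email": email}


sample_text = "Engelberg Antik's\nSalem, OR\nOutstanding downtown Salem opportunity..."
record = listing_to_dict(
    sample_text,
    "mailto:jennifer.martin@svn.com",
    "http://svncommercialadvisors.com/properties/?propertyId=74495-sale",
)
print(record)
```

Inside the loop you would call it once per listing and append the result to a list. The listing URL could, for example, be read from the data-listing-url attribute of each propertyListItem div with get_attribute('data-listing-url'); that choice of source is an assumption based on the markup in the question.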
I'm using the Google Chrome browser and running a Python script to choose the correct dates in a datepicker. It is unable to select the correct date: it keeps selecting "02/01/2022" as the end date, but I want it to choose the date five (5) days before today every time I run the script. For example, today is "02/08/2022", so it should choose "02/03/2022" as the end date. The start date, "12/01/2021", is correct.
Here's my code:
from selenium import webdriver
import time
import os.path
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
import calendar
import datetime
from datetime import date
chrome_options = Options()
chrome_options.add_argument("--incognito")
driver = webdriver.Chrome("/Users/myname/Documents/chromedriver", options=chrome_options)
todays_date = date.today()
print(todays_date)
driver.get("https://accessdata.broadridge.com/node952064/")
try:
    myElem = WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.XPATH, '//*[@id="main"]/table/tbody/tr[2]/td[2]/form/table[2]/tbody/tr[3]/td[2]/input')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")
driver.find_element(By.XPATH, '//*[@id="main"]/table/tbody/tr[2]/td[2]/form/table[2]/tbody/tr[6]/td[2]/span/img').click()
driver.find_element(By.XPATH, '//*[@id="span8"]').click()
time.sleep(2)
driver.find_element(By.XPATH, '//*[@id="ext-gen157"]/div[3]/table/tbody/tr/td[4]/div/img').click()
driver.find_element(By.XPATH, '//*[@id="ext-gen209"]').click()
driver.find_element(By.XPATH, '//*[@id="main"]/table/tbody/tr[2]/td[2]/table/tbody/tr[1]/td[11]/a').click()
driver.find_element(By.XPATH, '//*[@id="dateAnchor1_0"]/img').click()
driver.find_element(By.XPATH, '//*[@id="caldiv"]/table/tbody/tr/td/center/table[2]/tbody/tr[2]/td[4]/a').click()
driver.find_element(By.ID, 'dateAnchor2_0').click()
driver.find_element(By.XPATH, '//*[@id="caldiv"]/table/tbody/tr/td/center/table[2]/tbody/tr[2]/td[3]/a').click()
How can I select the correct date for five days ago from today?
Here's the HTML code:
<td align="left" nowrap="">
<table border="0" cellspacing="0" cellpadding="0">
<tbody><tr>
<td align="right" nowrap="">
<div id="firstValue1_0" style="visibility: visible">
<input type="text" name="values1" value="12/01/2021" size="11" maxlength="10" onkeypress="if(document.getElementById('firstCalendar1_0').style.visibility == 'visible' ) return processKeyPress(this); else return true;" onfocus="self.status='Date Format is ' + dtFormat;" onblur="self.status=' ';" onchange="if(document.getElementById('firstCalendar1_0').style.visibility == 'visible' ){if(isValidDate(this)) valueOnChange(0);} else valueOnChange(0);">
<span id="firstValue3_0" style="display: none">% may be used as a wildcard character. </span>
<div id="firstCalendar1_0" style="visibility: visible;display:inline">
<a id="dateAnchor1_0" name="dateAnchor1_0" onclick="dateSelect(0, 'values1');" style="vertical-align:middle"><img border="0" src="images/calendar.gif"></a>
</div>
</div>
</td>
<td valign="center" nowrap="">
<div id="firstValue2_0" style="display: none">
<span id="search0" class="search_sm" onclick="popupSearch('810', 'values1', 'Trade Date', 0)" onmouseover="doImgSwapOver('search_sm',0,true)" onmouseout="doImgSwapOut('search_sm',0,true)">
<img name="search_sm" src="images/btn_search_sm.gif" width="24" height="22" alt="Search Trade Date" border="0" align="absmiddle">
</span>
<a name="searchAnchor0" id="searchAnchor0"></a>
</div>
</td>
<td valign="center" nowrap="">
<div id="secondValue1_0" style="visibility: visible">
and
</div>
</td>
<td valign="center" nowrap="">
<div id="secondValue2_0" style="visibility: visible">
<input type="text" name="values2" value="02/03/2022" size="11" maxlength="10" onkeypress="return processKeyPress(this);" onfocus="self.status='Date Format is ' + dtFormat;" onblur="self.status=' ';" onchange="if(isValidDate(this)) valueOnChange(0);">
<a id="dateAnchor2_0" name="dateAnchor2_0" onclick="dateSelect(0, 'values2');">
<img border="0" src="images/calendar.gif">
</a>
</div>
</td>
<td>
</td>
<td valign="center" nowrap="">
<!-- visibility -->
<div id="tinMask_0" style="visibility:hidden">
<!-- Tin Mask Selection Box -->
<table>
<tbody><tr>
<td>
<!-- Previously set to SHOW -->
<!-- Previously set to HIDE -->
<!-- Not previously set -->
<!-- Tin Privileges -->
<!-- No Tin Privileges -->
<select name="tinMaskFlgs" onchange="flagOnChange(0,document.forms[0].tinMaskFlgs,document.forms[0].columnFilterTinMaskFlgs);">
<option value="false">Show Tin Values</option>
<option value="true" selected="">Hide Tin Values </option>
</select>
<!-- true -->
<!-- false -->
</td>
</tr>
</tbody></table>
</div>
</td>
</tr>
</tbody></table>
</td>
driver.find_element(By.NAME, "values2").send_keys("02/03/2022")
You can try sending keys to the input tag with the name values2.
I figured out the answer. I had to clear the value first.
from datetime import datetime, timedelta
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
date_element = driver.find_element(By.NAME, 'values2')
date_element.clear()
five_days_prior = datetime.now() - timedelta(5)
end_date = five_days_prior.strftime("%m/%d/%Y")
date_element.send_keys(end_date)
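The date arithmetic itself can be pulled into a small function so it can be checked without opening a browser. A sketch, assuming the field expects the MM/DD/YYYY format seen in value="12/01/2021"; end_date_string is a hypothetical name, and passing in today's date explicitly is only there to make the result reproducible:

```python
from datetime import date, timedelta


def end_date_string(today=None, days_back=5):
    """Return the date `days_back` days before `today` formatted as MM/DD/YYYY."""
    if today is None:
        today = date.today()
    return (today - timedelta(days=days_back)).strftime("%m/%d/%Y")


# The example from the question: run on 02/08/2022, the end date should be 02/03/2022
print(end_date_string(date(2022, 2, 8)))
```

In the script, the returned string would go to date_element.send_keys(...) after date_element.clear(), exactly as in the answer above.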
I'm new to Python and Selenium. I'm trying to automate filling out a form to claim a game gift.
Here is the HTML; it uses list items instead of a select tag.
<tr>
<th>Server Name</th>
<td>
<!-- toDev: The value set in the li's data-value attribute goes into the hidden input below. --> <input
type="hidden" name="server" class="js-selected-server">
<ul class="serialForm__select js-select-form" data-type="server">
<li><span class="js-selected-text">Select Server</span>
<div class="serialForm__select--btn">
<img class="iconArrow" data-type="primary"
src="/img/nav/nav_arrow01.svg"> <img class="iconArrow"
data-type="secondary" src="/img/nav/nav_arrow02.svg">
</div></li>
<li data-value="1">Orchard(KOR)</li>
<li data-value="2">Dynasty(CHN)</li>
<li data-value="3">Warlord(SEA)</li>
<li data-value="4">Conquest(US)</li>
<li data-value="5">Invincible(JP)</li>
<li data-value="6">Inferno(TW)</li>
<li data-value="7">Ascension(KOR)</li>
</ul>
</td>
</tr>
<tr>
The script is able to click the dropdown menu, but it cannot pick any of the listed items.
I tried the following, and none of it works:
# dropdown = Select(select_server)
# dropdown.select_by_visible_text('Conquest(US)')
# dropdown.select_by_index('4')
My code:
from selenium import webdriver
from selenium.webdriver.support.select import Select
import time
edge_options = {
    "executable_path": "/edgedriver_mac64/msedgedriver",
    # of course, change the path to the driver you downloaded
    "capabilities": {
        "platformName": 'mac os x',  # I got this from Chrome driver's capabilities
        # "os" : "OS X", # also ok.
    }
}
browser = webdriver.Edge(**edge_options)
browser.get('https://kstory.hangame.com/en/index')
time.sleep(2)
select_server = browser.find_element_by_css_selector(
"span[class='js-selected-text']")
time.sleep(2)
select_server.click()
## below is still being tested, not working yet
select_server.send_keys('Conquest(US)')
# dropdown = Select(select_server)
# dropdown.select_by_visible_text('Conquest(US)')
# dropdown.select_by_index('4')
Any help is appreciated, thanks.
You can try this:
driver.find_element_by_xpath("//ul[@data-type='server']/li[text()='Conquest(US)']").click()
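The Select attempts fail because Selenium's Select class only wraps real <select> elements (it raises UnexpectedTagNameException for any other tag), so a <ul>/<li> dropdown has to be clicked like ordinary elements. To see offline which <li> the locator above targets, the same path can be run with Python's standard xml.etree.ElementTree, which supports a limited XPath subset. This is only an illustration under assumptions: the snippet is a trimmed copy of the question's markup (the <img> tags are dropped so it parses as XML), and ElementTree's [.='text'] predicate stands in for XPath's text():

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the dropdown markup from the question, wrapped in <td>
SNIPPET = """
<td>
  <ul class="serialForm__select js-select-form" data-type="server">
    <li><span class="js-selected-text">Select Server</span></li>
    <li data-value="1">Orchard(KOR)</li>
    <li data-value="2">Dynasty(CHN)</li>
    <li data-value="3">Warlord(SEA)</li>
    <li data-value="4">Conquest(US)</li>
    <li data-value="5">Invincible(JP)</li>
    <li data-value="6">Inferno(TW)</li>
    <li data-value="7">Ascension(KOR)</li>
  </ul>
</td>
"""

root = ET.fromstring(SNIPPET)

# Same idea as the Selenium XPath: the <ul> with data-type="server",
# then the <li> whose text is exactly "Conquest(US)"
option = root.find(".//ul[@data-type='server']/li[.='Conquest(US)']")

# data-value is what the page copies into the hidden "server" input
print(option.get("data-value"))
```

With Selenium 4's locator style, the equivalent call would be driver.find_element(By.XPATH, "//ul[@data-type='server']/li[text()='Conquest(US)']").click(), after the dropdown has been opened.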
I'm trying to web scrape the person's name and company.
This is what I've tried.
<div id="viewcontact">
<table width="100%">
<tbody><tr>
<td style="display: inline-block; width: 30%">
<div class="formsection_light" style="margin-top:-8px;background:#eaeaea;">
<div style="padding-bottom:10px;">
<div class="left">
<h1>Company Name</h1>
<p class="f16">Person's Name</p>
<div class="theme">
Person's Name
</div>
</div>
<div class="right" style="margin-top:5px;">
driver.find_element_by_xpath('//h1[@class="left"]')
driver.find_element_by_class_name("f16")
The output was empty: no errors, it just didn't scrape anything.
Try something like this:
details = driver.find_elements_by_xpath("//div[@id='viewcontact']//tr")
for detail in details:
    company = detail.find_element_by_tag_name("h1").text  # or .get_attribute("innerText")
    person = detail.find_element_by_tag_name("p").text
    print("{} : {}".format(company, person))
To get the company name:
wait = WebDriverWait(driver, 50)
print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.formsection_light h1"))).text)
To get the first person's name:
print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.formsection_light p.f16"))).text)
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
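The original locator '//h1[@class="left"]' asks for an <h1> whose own class is "left", but in the markup the class belongs to the wrapping <div>. A small offline check with the standard library's xml.etree.ElementTree (limited XPath subset) shows the difference. The snippet is a trimmed, closed-up copy of the question's HTML, so treat it as an illustration rather than the real page:

```python
import xml.etree.ElementTree as ET

# Trimmed and closed-up copy of the markup from the question
SNIPPET = """
<div id="viewcontact">
  <div class="left">
    <h1>Company Name</h1>
    <p class="f16">Person's Name</p>
  </div>
</div>
"""

root = ET.fromstring(SNIPPET)

# No <h1> carries class="left", so this finds nothing
print(root.find(".//h1[@class='left']"))

# Going through the <div class="left"> finds both values
company = root.find(".//div[@class='left']/h1").text
person = root.find(".//div[@class='left']/p[@class='f16']").text
print(company, "|", person)
```

The corresponding Selenium XPath would be //div[@class="left"]/h1. Note that @class='...' is an exact string match; if the div carried several classes, Selenium's full XPath would need contains(@class, 'left') instead (ElementTree does not support contains()).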
Let's say I have some HTML that looks like this, and I use CSS selectors to make a list of elements:
<div class="item-cell">
<div class="item-container">
<div class ="item-price">
<div class = "item-info">
<span class = "price"> </span>
<div class="item-cell">
<div class="item-container">
<div class ="item-price">
<div class = "item-info">
<span class = "price"> </span>
elements = driver.find_elements_by_css_selector('div.item-cell div.item-container')
Now I have a list of elements at the item-container level. How would I go about finding the href value of each element in elements?
I was thinking of doing something like:
for element in elements:
    element.get_attribute("href")
I know I could target the href level directly, but I want to check whether each container contains an href and, if it does, get the value in that container. If I go straight to the href level, it will just skip the containers that don't have one.
You could try this
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
driver = webdriver.Chrome()
driver.get("file://{PATH_TO_YOUR_FILE}")
elements = driver.find_elements_by_css_selector('div.item-cell div.item-container')
for element in elements:
    try:
        link = element.find_element_by_tag_name('a')
        print(link.get_attribute('href'))
    except NoSuchElementException:
        print('No Data Available!')
driver.close()
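An alternative to try/except is to ask each container for all of its anchors and check whether the list is empty; in Selenium that is element.find_elements(By.TAG_NAME, 'a'), which returns an empty list instead of raising. The offline sketch below mirrors that logic with the standard library's xml.etree.ElementTree so it can run without a browser. The <a href> in the first container is hypothetical, since the question's HTML contains no links, and hrefs_per_container is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the first container holds a link
SNIPPET = """
<body>
  <div class="item-cell">
    <div class="item-container">
      <a href="https://example.com/item/1">Item 1</a>
      <div class="item-info"><span class="price"> </span></div>
    </div>
  </div>
  <div class="item-cell">
    <div class="item-container">
      <div class="item-info"><span class="price"> </span></div>
    </div>
  </div>
</body>
"""


def hrefs_per_container(root):
    """Return one entry per container: the first link's href, or None if it has no link."""
    result = []
    for container in root.iterfind(".//div[@class='item-cell']/div[@class='item-container']"):
        links = container.findall(".//a")  # empty list instead of an exception
        result.append(links[0].get("href") if links else None)
    return result


print(hrefs_per_container(ET.fromstring(SNIPPET)))
```

Keeping a None placeholder means the output stays aligned with the list of containers, which is what the question asks for.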
Also, I'd suggest closing your divs with </div> and adding https:// before your URLs, like this:
<div class="item-cell">
<div class="item-container">
<div class="item-price">
</div>
<div class="item-info">
<span class="price"> </span>
</div>
</div>
</div>
<div class="item-cell">
<div class="item-container">
<div class="item-price">
</div>
<div class="item-info">
<span class="price"> </span>
</div>
</div>
</div>
<div class="item-cell">
<div class="item-container">
</div>
</div>
If you don't add https:// before your URLs and you open the page from a local file, the browser will treat them as relative paths and resolve them against the local file's location.
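To see the effect without a browser: Python's urllib.parse.urljoin applies the standard relative-URL resolution rules, which for simple cases like this match what a browser does with an href. The file path and URLs below are hypothetical:

```python
from urllib.parse import urljoin

# Hypothetical location of the local HTML file opened with driver.get("file://...")
page_url = "file:///home/user/items.html"

# Without a scheme, the href is a relative path next to the local file
print(urljoin(page_url, "www.example.com/item/1"))

# With https:// it is an absolute URL and is left unchanged
print(urljoin(page_url, "https://www.example.com/item/1"))
```

The first call yields file:///home/user/www.example.com/item/1, which is the kind of value get_attribute('href') would report for a scheme-less link on a local page.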
I'm trying to get the hours of the available time slots from this webpage (the boxes below the calendar):
https://magicescape.it/le-stanze/lo-studio-di-harry-houdini/
I've read other related questions and written this code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
url = 'https://magicescape.it/le-stanze/lo-studio-di-harry-houdini/'
wait_time = 10
options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)
driver.get(url)
driver.switch_to.frame(0)
wait = WebDriverWait(driver, wait_time)
first_result = wait.until(presence_of_element_located((By.ID, "sb_main")))
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup)
driver.quit()
After switching to the iframe containing the time slots, I get this when printing soup:
<script id="time_slots_view" type="text/html"><div class="slots-view{{#ifCond (getThemeOption 'timeline_modern_display') '==' 'as_table'}} as-table{{/ifCond}}">
<div class="timeline-wrapper">
<div class="tab-pd">
<div class="container-caption">
{{_t 'available_services_on_this_day'}}
</div>
{{#if error_message}}
<div class="alert alert-danger alert-dismissible" role="alert">
{{error_message}}
</div>
{{/if}}
{{>emptyTimePart is_empty=is_empty is_loaded=is_loaded}}
<div id="sb_time_slots_container"></div>
{{> bookingTimeLegendPart legend="only_available" time_diff=0}}
</div>
</div>
</div></script>
<script id="time_slot_view" type="text/html"><div class="slot">
<a class="sb-cell free {{#ifPluginActive 'slots_count'}}{{#if available_slots}}has-available-slot{{/if}}{{/ifPluginActive}}" href="#{{bookingStepUrl time=time date=date}}">
{{formatDateTime datetime 'time' time_diff}}
{{#ifCond (getThemeOption 'timeline_show_end_time') '==' 1}}
-<span class="end-time">
{{formatDateTime end_datetime 'time' time_diff}}
</span>
{{/ifCond}}
{{#ifPluginActive 'slots_count'}}
{{#if available_slots}}
<span class="slot--available-slot">
{{available_slots}}
{{#ifConfigParam 'slots_count_show_total' '==' true}} / {{total_slots}} {{/ifConfigParam}}
</span>
{{/if}}
{{/ifPluginActive}}
</a>
</div></script>
while with right-click > Inspect Element on the webpage I get this:
<div class="slots-view">
<div class="timeline-wrapper">
<div class="tab-pd">
<div class="container-caption">
Orari d'inizio disponibili
</div>
<div id="sb_time_slots_container">
<div class="slot">
<a class="sb-cell free " href="#book/location/4/service/6/count/1/provider/6/date/2020-03-09/time/23:00:00/">
23:00
</a>
</div>
</div>
<div class="time-legend">
<div class="available">
<div class="circle">
</div>
- Disponibile
</div>
</div>
</div>
</div>
</div>
How can I get the hour of the available slots (23:00 in this example) using selenium?
To get the desired response you need to:
1. Correctly identify the iframe you want to switch to (and switch to it). You were trying to switch to frame[0] but needed frame[1]. The following code removes the reliance on indexes and uses XPath instead.
2. Get the elements containing the time. Again, this uses XPath to identify all child divs of the element with id=sb_time_slots_container.
3. Iterate over these child divs and get the text property of the <a> nested within each of them.
For both steps 1 and 2 you should also use wait.until so that the content has time to load.
...
driver.get(url)
wait = WebDriverWait(driver, wait_time)
# Wait until the iframe exists then switch to it
iframe_element = wait.until(presence_of_element_located((By.XPATH, '//*[@id="prenota"]//iframe')))
driver.switch_to.frame(iframe_element)
# Wait until the times exist then get an array of them
wait.until(presence_of_element_located((By.XPATH, '//*[@id="sb_time_slots_container"]/div')))
all_time_elems = driver.find_elements_by_xpath('//*[@id="sb_time_slots_container"]/div')
# Iterate over each element and print the time out
for elem in all_time_elems:
    print(elem.find_element_by_tag_name("a").text)
driver.quit()
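The inspected markup also encodes each slot's date and time in the link's href (".../date/2020-03-09/time/23:00:00/"), which can serve as a fallback if the <a> text ever comes back empty. A sketch, assuming every slot link follows that layout; parse_slot_href and SLOT_RE are hypothetical names, and in the loop above the input would come from elem.find_element_by_tag_name("a").get_attribute("href"). A regex search is used rather than a full match because get_attribute('href') may return the resolved absolute URL with the page address in front of the fragment:

```python
import re

# Matches the date/time segments seen in the slot links, e.g.
# "#book/location/4/service/6/count/1/provider/6/date/2020-03-09/time/23:00:00/"
SLOT_RE = re.compile(r"/date/(\d{4}-\d{2}-\d{2})/time/(\d{2}):(\d{2}):\d{2}")


def parse_slot_href(href):
    """Return (date, 'HH:MM') from a slot link, or None if the link doesn't match."""
    match = SLOT_RE.search(href)
    if match is None:
        return None
    day, hour, minute = match.groups()
    return day, f"{hour}:{minute}"


print(parse_slot_href("#book/location/4/service/6/count/1/provider/6/date/2020-03-09/time/23:00:00/"))
```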