I have been using Selenium to scrape Home Depot, but the price comes back as NoneType. When I checked, the product price box is stuck on loading, although in a regular browser it loads almost instantly. Here is the code I'm using:
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(r"C:\Users\User\PycharmProjects\untitled\drivers\chromedriver_win32\chromedriver.exe")
driver.set_page_load_timeout(10)
driver.get('https://www.homedepot.ca/product/malibu-wide-plank-maple-cardiff-3-8-inch-thick-x-6-1-2-inch-wide-x-varying-length-engineered-click-hardwood-flooring-23-64-sq-ft-case-/1001341771')
time.sleep(5)
price = driver.find_element_by_class_name('hdca-product__description-pricing-price-value')
print(price.text)
Has anyone else encountered this?
Coincidentally, I also scraped Home Depot's website. I used CSS selectors:
productPrice = product.css('.price__dollars::text').getall()
I used Scrapy; Selenium isn't necessary for this website since the price isn't dynamically loaded.
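A note on that selector: .price__dollars::text returns the text as strings, and if the site splits dollars and cents across separate nodes (as the price__dollars class name suggests), they still need to be recombined. A pure-Python sketch; the sample strings are hypothetical:

```python
def to_number(dollars, cents='00'):
    """Combine scraped fragments like '$1,299' and '99' into a float."""
    digits = ''.join(ch for ch in dollars if ch.isdigit())
    return float(f"{digits}.{cents}")

print(to_number('$1,299', '99'))  # 1299.99
```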
I want to scrape the website, but right now I'm testing by automating browser input (weights) to the website, and it's still not working. I have seen many videos where everyone does the same thing, but sometimes the result is "TIMEOUT" or "sub.click() is not clickable".
I have been working on this website for days now but couldn't scrape the data.
Here's a link to the website --> the site takes multiple inputs and, after the enquiry, displays a table below, but the site is very slow, so the table takes time to appear.
Weights can only range from 0.1 to 2.0.
The table is displayed only after user input; the HTML table div cannot be seen beforehand, but after submitting, its HTML appears in the DOM.
I'm not an expert in coding; scraping data is new to me. Any help would be really appreciated. Thank you!
from selenium import webdriver
from selenium.webdriver.common.by import By
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
mydriver = webdriver.Chrome(options=options)
mydriver.get('https://www.yunexpress.com/price/query.html')
mydriver.maximize_window()
eng = mydriver.find_element(By.XPATH, "//a[@class='sellang-en']")
eng.click()
weight = mydriver.find_element(By.XPATH, "//input[@name='txtweight']")
weight.send_keys('0.5')
xyz = mydriver.find_element(By.XPATH, "//div[@class='price-search-title']")
mydriver.execute_script("arguments[0].scrollIntoView();", xyz)
sub = mydriver.find_element(By.XPATH, "//button[@class='price-submit']")
sub.click()
result = mydriver.find_element(By.XPATH, "//div[@class='layui-none']")
print(result.text)  # find_element returns a single element, not a list
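Since the form only accepts weights from 0.1 to 2.0, the values to feed into send_keys can be generated ahead of time if the goal is to query every weight. A pure-Python sketch; the helper name is made up:

```python
def weight_steps(lo=0.1, hi=2.0, step=0.1):
    """Return '0.1', '0.2', ..., '2.0' as strings suitable for send_keys."""
    n = round((hi - lo) / step)
    return [f"{lo + i * step:.1f}" for i in range(n + 1)]

weights = weight_steps()
print(weights[0], weights[-1], len(weights))  # 0.1 2.0 20
```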
Note: I didn't find a working solution in any of the similar questions:
How to find price from udemy website with web scraping?
Scraping Data From Udemy, AngularJs Site Using PHP
How to GET promotional price using Udemy API?
My problem is how to scrape course prices from Udemy using Python & Selenium.
This is the link:
https://www.udemy.com/courses/development/?p=1
My attempt is below.
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
options = Options()
driver = webdriver.Chrome(ChromeDriverManager().install())
url = "https://www.udemy.com/courses/development/?p=1"
driver.get(url)
time.sleep(2)
#data = driver.find_element('//div[@class="price-text--price-part"]')
#data = driver.find_element_by_xpath('//div[contains(@class, "price-text--price-part"]')
#data = driver.find_element_by_css_selector('div.udlite-sr-only[attrName="price-text--price-part"]')
print(data)
None of them worked for me. So, is there a way to select elements by a class that contains specific text?
In this example, the text to find is: "price-text--price-part"
The first XPath doesn't highlight any element in the DOM.
The second XPath is missing the closing parenthesis for contains():
//div[contains(@class, "price-text--price-part"]
should be
//div[contains(@class, "price-text--price-part")]
Try like below; it might work. (When I tried, the website detected me as a bot and the price was not loaded.)
driver.get("https://www.udemy.com/courses/development/?p=1")
options = driver.find_elements_by_xpath("//div[contains(@class,'course-list--container')]/div[contains(@class,'popper')]")
for opt in options:
    title = opt.find_element_by_xpath(".//div[contains(@class,'title')]").text  # the leading dot scopes the XPath to this element
    price = opt.find_element_by_xpath(".//div[contains(@class,'price-text--price-part')]/span[2]/span").text
    print(f"{title}: {price}")
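The contains(@class, ...) pattern in that answer can also be generated programmatically, which helps avoid the bracket mistake from the question. A small sketch; the helper name is made up:

```python
def xpath_class_contains(tag, fragment):
    """Build an XPath matching any <tag> whose class attribute contains fragment."""
    return f"//{tag}[contains(@class, '{fragment}')]"

print(xpath_class_contains('div', 'price-text--price-part'))
# //div[contains(@class, 'price-text--price-part')]
```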
I am trying to extract data from https://www.realestate.com.au/
First I create my url based on the type of property that I am looking for and then I open the url using selenium webdriver, but the page is blank!
Any idea why it happens? Is it because this website doesn't provide web scraping permission? Is there any way to scrape this website?
Here is my code:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
PostCode = "2153"
propertyType = "house"
minBedrooms = "3"
maxBedrooms = "4"
page = "1"
url = "https://www.realestate.com.au/sold/property-{p}-with-{mib}-bedrooms-in-{po}/list-{pa}?maxBeds={mab}&includeSurrounding=false".format(p = propertyType, mib = minBedrooms, po = PostCode, pa = page, mab = maxBedrooms)
print(url)
# url should be "https://www.realestate.com.au/sold/property-house-with-3-bedrooms-in-2153/list-1?maxBeds=4&includeSurrounding=false"
driver = webdriver.Edge("./msedgedriver.exe") # edit the address to where your driver is located
driver.get(url)
time.sleep(3)
src = driver.page_source
soup = BeautifulSoup(src, 'html.parser')
print(soup)
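The URL construction itself can be checked without opening a browser, which rules out a malformed link as the cause of the blank page. The same format call from the snippet above, run standalone:

```python
PostCode = "2153"
propertyType = "house"
minBedrooms = "3"
maxBedrooms = "4"
page = "1"
url = ("https://www.realestate.com.au/sold/property-{p}-with-{mib}-bedrooms-in-{po}"
       "/list-{pa}?maxBeds={mab}&includeSurrounding=false").format(
          p=propertyType, mib=minBedrooms, po=PostCode, pa=page, mab=maxBedrooms)
print(url)
# https://www.realestate.com.au/sold/property-house-with-3-bedrooms-in-2153/list-1?maxBeds=4&includeSurrounding=false
```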
You may be passing the link incorrectly; try
driver.get("your link")
API reference: https://selenium-python.readthedocs.io/api.html
I did try to access realestate.com.au through Selenium, and in a different use case through Scrapy.
I even got results from a Scrapy crawl by using a proper user-agent and cookie, but after a few days realestate.com.au detects Selenium/Scrapy and blocks the requests.
Additionally, it is clearly written in their terms & conditions that indexing any content on their website is strictly prohibited.
You can find more information / analysis in these questions:
Chrome browser initiated through ChromeDriver gets detected
selenium isn't loading the page
Bottom line: you would have to get past their bot detection if you want to scrape the content.
My main purpose is to go to this specific website, click each of the products, have enough time to scrape the data from the clicked product, then go back and click another product, until all the products have been clicked through and scraped (the scraping code is not included).
My code opens Chrome, navigates to the desired website, and builds a list of links to click by class name. This is the part I am stuck on: I believe I need a for-loop to iterate through the list of links, clicking each one and going back to the original page, but I can't figure out why this won't work.
Here is my code:
import csv
import time
from selenium import webdriver
import selenium.webdriver.chrome.service as service
import requests
from bs4 import BeautifulSoup
url = "https://www.vatainc.com/infusion/adult-infusion.html?limit=all"
service = service.Service('path to chromedriver')
service.start()
capabilities = {'chrome.binary': 'path to chrome'}
driver = webdriver.Remote(service.service_url, capabilities)
driver.get(url)
time.sleep(2)
links = driver.find_elements_by_class_name('product-name')
for link in links:
    link.click()
    driver.back()
    link.click()
I have another solution to your problem.
When I tested your code it showed strange behaviour: clicking a link navigates away and invalidates the element references, so the remaining links in the list go stale. I fixed all the problems I had by using XPath.
url = "https://www.vatainc.com/infusion/adult-infusion.html?limit=all"
driver.get(url)
links = [x.get_attribute('href') for x in driver.find_elements_by_xpath("//*[contains(@class, 'product-name')]/a")]
htmls = []
for link in links:
    driver.get(link)
    htmls.append(driver.page_source)
Instead of going back and forth, I saved all the links (named links) and iterated over that list.
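Once the page sources are collected in htmls, they can be parsed offline without touching the browser again. A stdlib sketch using html.parser; the HTML snippet below is a made-up stand-in for driver.page_source:

```python
from html.parser import HTMLParser

class ProductNameParser(HTMLParser):
    """Collect text inside elements whose class contains 'product-name'."""
    def __init__(self):
        super().__init__()
        self.names = []
        self._depth = 0  # >0 while inside a product-name subtree

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get('class', '')
        if 'product-name' in cls or self._depth:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.names.append(data.strip())

html = '<div class="product-name"><a href="/p1">Adult Infusion Arm</a></div>'
p = ProductNameParser()
p.feed(html)
print(p.names)  # ['Adult Infusion Arm']
```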
I would like to scrape some interest rates. I need to use Selenium to access dynamically loaded content. For the Selenium part, the following works fine:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from scrapy.selector import Selector
chromedriver = "/usr/local/bin/chromedriver"
driver = webdriver.Chrome(chromedriver)
driver.get("http://www.infochoice.com.au/banking/savings-account/term-deposit-interest-rates.aspx")
driver.find_element_by_xpath("//select[@name='SavingsTerm']/option[text()='7 days']").click()
Now I would like to parse the html content to get the interest rates using something like:
xpath("//*[@id='IC_ProductList107Rate']/table/tbody/tr[5]/td/text()").extract()
It should be very easy; however, I am new to Python and could not figure out a suitable procedure so far.
How can this be implemented?
I don't know if I understood correctly, but you can try this. Note that Selenium XPaths must select elements, not text nodes, so drop the trailing /text() (that syntax only works in Scrapy/lxml selectors):
driver.find_element_by_xpath("//*[@id='IC_ProductList107Rate']/table/tbody/tr[5]/td").text
or
driver.find_element_by_xpath("//*[@id='IC_ProductList107Rate']/table/tbody/tr[5]/td").get_attribute(element_attribute_value)
element_attribute_value can be 'value', 'innerText', etc., depending on which attribute your HTML element has.
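Whichever way the cell text is retrieved, it arrives as a string and usually needs converting to a number. A small parse step; the '2.50%' sample format is an assumption about the site's markup:

```python
def parse_rate(text):
    """Convert a scraped rate string like '2.50%' or '3.10 % p.a.' to a float."""
    return float(text.strip().split('%')[0].strip())

print(parse_rate('2.50%'))  # 2.5
```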