How to scrape price from Udemy? - python

Note: I couldn't find a relevant working solution in any of these similar questions:
How to find price from udemy website with web scraping?
Scraping Data From Udemy, AngularJS Site Using PHP
How to GET promotional price using Udemy API?
My problem: how do I scrape course prices from Udemy using Python and Selenium?
This is the link:
https://www.udemy.com/courses/development/?p=1
My attempt is below.
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
options = Options()
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
url = "https://www.udemy.com/courses/development/?p=1"
driver.get(url)
time.sleep(2)
#data = driver.find_element('//div[@class="price-text--price-part"]')
#data = driver.find_element_by_xpath('//div[contains(@class, "price-text--price-part"]')
#data = driver.find_element_by_css_selector('div.udlite-sr-only[attrName="price-text--price-part"]')
print(data)
None of them worked for me. So, is there a way to select elements by classes that contain a specific text?
In this example, the text to find is: "price-text--price-part"

The first XPath doesn't highlight any element in the DOM.
The second XPath is missing the closing parenthesis for contains():
//div[contains(@class, "price-text--price-part"]
should be
//div[contains(@class, "price-text--price-part")]
Try the code below; it might work. (When I tried it, the website detected me as a bot and the price was not loaded.)
driver.get("https://www.udemy.com/courses/development/?p=1")
courses = driver.find_elements_by_xpath("//div[contains(@class,'course-list--container')]/div[contains(@class,'popper')]")
for course in courses:
    # Use a leading dot in the XPath to search within the element.
    title = course.find_element_by_xpath(".//div[contains(@class,'title')]").text
    price = course.find_element_by_xpath(".//div[contains(@class,'price-text--price-part')]/span[2]/span").text
    print(f"{title}: {price}")
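To see why contains(@class, ...) is needed here: Udemy appends a hashed suffix to its class names, so an exact @class match finds nothing. A small offline sketch with lxml (the markup and the "--Tu6MH" suffix are made up for illustration) shows the difference:

```python
from lxml import html

# Hypothetical markup mimicking Udemy's price element; the "--Tu6MH"
# suffix stands in for the random suffix Udemy appends to class names.
snippet = """
<div>
  <div class="price-text--price-part--Tu6MH">
    <span>Current price</span><span><span>$13.99</span></span>
  </div>
  <div class="price-text--price-part--Tu6MH">
    <span>Current price</span><span><span>$9.99</span></span>
  </div>
</div>
"""

tree = html.fromstring(snippet)
# An exact @class="price-text--price-part" match fails because of the suffix;
# contains(@class, ...) matches the stable prefix instead.
exact = tree.xpath('//div[@class="price-text--price-part"]')
partial = tree.xpath('//div[contains(@class, "price-text--price-part")]/span[2]/span/text()')
print(exact)    # []
print(partial)  # ['$13.99', '$9.99']
```

The same contains() trick carries over unchanged to Selenium's XPath locators.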

Related

Data scraping from dynamic sites

I am scraping data from a dynamic site (https://www.mozzartbet.com/sr#/betting) using this code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
s = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(service = s)
driver.get('https://www.mozzartbet.com/sr#/betting')
driver.maximize_window()
results = driver.find_elements('xpath', '//*[@id="focus"]/section[1]/div/div[2]/div[2]/article/div/div[2]/div/div[1]/span[2]/span')
for result in results:
    print(result.text)
I want to scrape the quotes (odds) for all matches in the Premier League. This code worked properly once, but the next time I ran it, the results list contained 0 elements, even though the same XPath still returns what I want in the browser's inspect tool.
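One common reason such a long absolute XPath suddenly returns 0 elements is that it breaks the moment the page layout shifts (or the content is rendered later by JavaScript). A rough offline sketch (the markup is invented, not the real site) of why an attribute-anchored XPath is more robust than a positional one:

```python
from lxml import etree

# Two versions of the same (invented) odds markup; the second has one extra
# wrapper div, which is enough to break an absolute positional XPath.
v1 = '<section><div><span class="odd">2.10</span></div></section>'
v2 = '<section><div><div><span class="odd">2.10</span></div></div></section>'

absolute = '/section/div/span[@class="odd"]/text()'
relative = '//span[@class="odd"]/text()'

results = []
for page in (v1, v2):
    tree = etree.fromstring(page)
    results.append((tree.xpath(absolute), tree.xpath(relative)))
print(results)  # [(['2.10'], ['2.10']), ([], ['2.10'])]
```

For dynamically rendered pages, combining a robust locator with an explicit wait (WebDriverWait) rather than running the query immediately after driver.get() usually resolves the "0 elements" symptom.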

Using Python (Selenium) to Scrape IMDB (.click() is not working)

I am trying to scrape a list of specific movies from IMDB using this tutorial.
The code works fine except for the click that is meant to get the URL, which is then saved in content; that part is not working. The issue is that nothing changes in Chrome when running the code. I'd really appreciate it if anyone can help.
content = driver.find_element_by_class_name("tF2Cxc").click()
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import time
movie = 'Wolf Totem'
driver = webdriver.Chrome(executable_path=r"D:\chromedriver.exe")
#Go to Google
driver.get("https://www.google.com/")
#Enter the keyword
driver.find_element_by_name("q").send_keys(movie + " imdb")
time.sleep(1)
#Click the google search button
driver.find_element_by_name("btnK").send_keys(Keys.ENTER)
time.sleep(1)
You are using the wrong locator.
To open a search result on the Google page you should use this:
driver.find_element_by_xpath("//div[@class='yuRUbf']/a").click()
This locator matches all 10 search results, so the first match is the first search result.
Also, clicking that element will not give you any content; it just opens the first link, located below the title of the first search result.
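To illustrate the difference between the two locators, here is an offline sketch with BeautifulSoup on a mock of Google's result markup (class names taken from the question and answer; Google changes them frequently, so treat them as assumptions):

```python
from bs4 import BeautifulSoup

# Mock of one Google search result: div.tF2Cxc is just a container,
# while the clickable link is the <a> inside div.yuRUbf.
serp = """
<div class="g"><div class="tF2Cxc">
  <div class="yuRUbf"><a href="https://www.imdb.com/title/tt2909116/">
    <h3>Wolf Totem (2015) - IMDb</h3></a></div>
</div></div>
"""
soup = BeautifulSoup(serp, "html.parser")

container = soup.select_one("div.tF2Cxc")
link = soup.select_one("div.yuRUbf > a")
print(container.get("href"))  # None -- the container div carries no URL
print(link["href"])           # the actual result URL
```

This is why clicking div.tF2Cxc does nothing visible: the navigation target lives on the inner anchor, not on the container.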

Home Depot Product Page Not Showing Price Info

I have been using Selenium to web scrape Home Depot, but the page returns NoneType for the price. When I checked, the product price box was stuck on loading, yet in a regular browser it loads almost instantly. Here is the code I'm using:
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(r"C:\Users\User\PycharmProjects\untitled\drivers\chromedriver_win32\chromedriver.exe")
driver.set_page_load_timeout(10)
driver.get('https://www.homedepot.ca/product/malibu-wide-plank-maple-cardiff-3-8-inch-thick-x-6-1-2-inch-wide-x-varying-length-engineered-click-hardwood-flooring-23-64-sq-ft-case-/1001341771')
time.sleep(5)
price = driver.find_element_by_class_name('hdca-product__description-pricing-price-value')
print(price.text)
Has anyone else encountered this?
Coincidentally, I also scraped Home Depot's website.
I used CSS selectors:
productPrice = product.css('.price__dollars::text').getall()
I used Scrapy; Selenium isn't necessary for this website since it isn't dynamically loaded.
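Note that the ::text pseudo-element is specific to Scrapy's selectors (the parsel library); it is not standard CSS. A rough equivalent with BeautifulSoup, on markup invented to match the selector:

```python
from bs4 import BeautifulSoup

# Invented price markup matching the '.price__dollars' selector above.
doc = """
<div class="price">
  <span class="price__dollars">89</span><span class="price__cents">99</span>
</div>
"""
soup = BeautifulSoup(doc, "html.parser")
# Scrapy's '.price__dollars::text' ~= select the nodes, then take their text.
dollars = [el.get_text() for el in soup.select(".price__dollars")]
print(dollars)  # ['89']
```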

Clicking multiple items on one page using selenium

My main goal is to go to this specific website, click each of the products, allow enough time to scrape the data from the clicked product, then go back and click another product from the page, until all the products have been clicked through and scraped (the scraping code is not included).
My code opens Chrome, navigates to the desired website, and builds a list of links to click by class name. This is the part I am stuck on: I believe I need a for-loop to iterate through the list of links, clicking each and going back to the original page, but I can't figure out why this won't work.
Here is my code:
import csv
import time
from selenium import webdriver
import selenium.webdriver.chrome.service as service
import requests
from bs4 import BeautifulSoup
url = "https://www.vatainc.com/infusion/adult-infusion.html?limit=all"
service = service.Service('path to chromedriver')
service.start()
capabilities = {'chrome.binary': 'path to chrome'}
driver = webdriver.Remote(service.service_url, capabilities)
driver.get(url)
time.sleep(2)
links = driver.find_elements_by_class_name('product-name')
for link in links:
    link.click()
    driver.back()
    link.click()
I have another solution to your problem.
When I tested your code, it showed strange behaviour. I fixed all the problems I had by using XPath.
url = "https://www.vatainc.com/infusion/adult-infusion.html?limit=all"
driver.get(url)
links = [x.get_attribute('href') for x in driver.find_elements_by_xpath("//*[contains(@class, 'product-name')]/a")]
htmls = []
for link in links:
    driver.get(link)
    htmls.append(driver.page_source)
Instead of going back and forward, I saved all the links (as links) and iterated over that list.
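Once htmls holds the page sources, each product can be parsed offline, for example with BeautifulSoup. The markup below is a made-up stand-in for a real product page (real class names on the site may differ):

```python
from bs4 import BeautifulSoup

# Stand-in for one entry of the collected page sources; the name,
# price, and class names here are invented for illustration.
pages = ["""<html><body>
<h1 class="product-name">Adult Injection Arm</h1>
<div class="price">$1,095.00</div>
</body></html>"""]

products = []
for source in pages:
    soup = BeautifulSoup(source, "html.parser")
    name = soup.find(class_="product-name").get_text(strip=True)
    price = soup.find(class_="price").get_text(strip=True)
    products.append((name, price))
print(products)  # [('Adult Injection Arm', '$1,095.00')]
```

Separating fetching (Selenium) from parsing (BeautifulSoup) also makes the click/back stale-element problem disappear, since no live DOM references are held across navigations.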

LXML XPATH - Data returned from one site and not another

I'm just learning python and decided to play with some website scraping.
I created 1 that works, and a second, almost identical as far as I can tell, that doesn't work, and I can't figure out why.
from lxml import html
import requests
page = requests.get('https://thronesdb.com/set/Core')
tree = html.fromstring(page.content)
cards = [tree.xpath('//a[@class = "card-tip"]/text()'), tree.xpath('//td[@data-th = "Faction"]/text()'),
         tree.xpath('//td[@data-th = "Cost"]/text()'), tree.xpath('//td[@data-th = "Type"]/text()'),
         tree.xpath('//td[@data-th = "STR"]/text()'), tree.xpath('//td[@data-th = "Traits"]/text()'),
         tree.xpath('//td[@data-th = "Set"]/text()'), tree.xpath('//a[@class = "card-tip"]/@data-code')]
print(cards)
That one does what I expect (I know it's not pretty). It grabs certain elements from a table on the site.
This one returns [[]]:
from lxml import html
import requests
page = requests.get('http://www.redflagdeals.com/search/#!/q=baby%20monitor')
tree = html.fromstring(page.content)
offers = [tree.xpath('//a[@class = "offer_title"]/text()')]
print(offers)
What I expect it to do is give me a list that has the text from each offer_title element on the page.
The XPath I'm aiming at, grabbed from Firebug, is:
/html/body/div[1]/div/div/div/section/div[2]/ul[1]/li[2]/div/h3/a
Here's the actual string from the site:
Angelcare Digital Video And Sound Monitor - $89.99 ($90.00 Off)
I have also read a few other questions, but they didn't answer how this could work the first way but not the second. (I can't post links to them because of the link restrictions on new accounts.)
Titles:
Python - Unable to Retrieve Data From Webpage Table Using Beautiful Soup or lxml xpath
Python lxml xpath no output
Trouble with scraping text from site using lxml / xpath()
Any help would be appreciated. I did some reading on the lxml website about xpath, but I may be missing something in the way I'm building a query.
Thanks!
The reason the first snippet works is that the required data is present in the initial DOM, while on the second page the data is generated dynamically by JavaScript. requests cannot execute JavaScript, so that content never appears in the response.
You can use, for example, Selenium + PhantomJS to get the required data, as below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
driver = webdriver.PhantomJS(executable_path='/path/to/phantomJS')
driver.get('http://www.redflagdeals.com/search/#!/q=baby%20monitor')
xpath = '//a[@class = "offer_title"]'
wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))
offers = [link.get_attribute('textContent') for link in driver.find_elements_by_xpath(xpath)]
