Facing difficulty automating the browser using Selenium; it is not working - Python

I want to scrape the website, but right now I'm testing by automating the browser to input weights into the website, and it's still not working. I have seen many videos where everyone does the same thing, but sometimes the result is a timeout error or "element is not clickable" on sub.click().
I have been working on this website for days now but couldn't scrape the data.
Here's a link to the website (it's in the code below). The website takes multiple inputs, and after the enquiry it displays a table below, but the site is slow, so it takes time to show the table.
Weights can only range from 0.1 to 2.0.
The table is displayed only after user input; the table's HTML div cannot be seen at first, but after submitting, its HTML appears in the DOM.
I'm not an expert in coding; scraping data is new to me. Any help would be really appreciated. Thank you!
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
mydriver = webdriver.Chrome(options=options)
mydriver.get('https://www.yunexpress.com/price/query.html')
mydriver.maximize_window()

# Switch the site to English
eng = mydriver.find_element(By.XPATH, "//a[@class='sellang-en']")
eng.click()

# Enter the weight (valid range is 0.1 to 2.0)
weight = mydriver.find_element(By.XPATH, "//input[@name='txtweight']")
weight.send_keys('0.5')

# Scroll the search form into view before clicking submit
xyz = mydriver.find_element(By.XPATH, "//div[@class='price-search-title']")
mydriver.execute_script("arguments[0].scrollIntoView();", xyz)

sub = mydriver.find_element(By.XPATH, "//button[@class='price-submit']")
sub.click()

# find_element returns a single element, not a list, so don't index it
result = mydriver.find_element(By.XPATH, "//div[@class='layui-none']")
print(result.text)
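Since the table only appears after the query finishes and the site is slow, explicit waits are usually the fix for both the timeout and the "not clickable" errors. Below is a minimal sketch using WebDriverWait with the same locators as above; the 30-second timeout and the choice of wait conditions are assumptions, not something the site documents.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.yunexpress.com/price/query.html')
wait = WebDriverWait(driver, 30)  # poll for up to 30 seconds

# Wait until the weight input exists before typing into it
weight = wait.until(EC.presence_of_element_located(
    (By.XPATH, "//input[@name='txtweight']")))
weight.send_keys('0.5')

# Wait until the submit button is clickable instead of clicking blindly
sub = wait.until(EC.element_to_be_clickable(
    (By.XPATH, "//button[@class='price-submit']")))
driver.execute_script("arguments[0].scrollIntoView();", sub)
sub.click()

# The result table renders slowly, so wait for it to become visible
result = wait.until(EC.visibility_of_element_located(
    (By.XPATH, "//div[@class='layui-none']")))
print(result.text)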

Related

Data scraping from dynamic sites

I am scraping data from a dynamic site (https://www.mozzartbet.com/sr#/betting) using this code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Use a raw string so the backslashes in the Windows path stay literal
s = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(service=s)
driver.get('https://www.mozzartbet.com/sr#/betting')
driver.maximize_window()
results = driver.find_elements('xpath', '//*[@id="focus"]/section[1]/div/div[2]/div[2]/article/div/div[2]/div/div[1]/span[2]/span')
for result in results:
    print(result.text)
I want to scrape the odds for all matches in the Premier League. This code worked properly once, but for some reason, the next time I ran it, the results list contained 0 elements, even though the same XPath (the one in the code above) returned what I wanted when I tried it in the browser's inspect panel.
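Since the site renders the odds with JavaScript after the initial page load, an empty list usually just means the elements were not there yet when find_elements ran. A minimal sketch, continuing from the code above, using an explicit wait with the same XPath (the 20-second timeout is an assumption):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

xpath = '//*[@id="focus"]/section[1]/div/div[2]/div[2]/article/div/div[2]/div/div[1]/span[2]/span'

# Block until at least one match of the XPath is present, then collect all
wait = WebDriverWait(driver, 20)
results = wait.until(EC.presence_of_all_elements_located((By.XPATH, xpath)))
for result in results:
    print(result.text)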

How to scrape price from Udemy?

Note: I didn't find a relevant working solution in any of the other similar questions:
How to find price from udemy website with web scraping?
Scraping Data From Udemy , AngularJs Site Using PHP
How to GET promotional price using Udemy API?
My problem is: how do I scrape course prices from Udemy using Python and Selenium?
This is the link:
https://www.udemy.com/courses/development/?p=1
My attempt is below.
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
driver = webdriver.Chrome(ChromeDriverManager().install())
url = "https://www.udemy.com/courses/development/?p=1"
driver.get(url)
time.sleep(2)
#data = driver.find_element('//div[@class="price-text--price-part"]')
#data = driver.find_element_by_xpath('//div[contains(@class, "price-text--price-part"]')
#data = driver.find_element_by_css_selector('div.udlite-sr-only[attrName="price-text--price-part"]')
print(data)
None of them worked for me. So, is there a way to select elements by classes that contain a specific text?
In this example, the text to find is: "price-text--price-part"
The first XPath doesn't highlight any element in the DOM.
The second XPath is missing the closing bracket for contains():
//div[contains(@class, "price-text--price-part"]
should be
//div[contains(@class, "price-text--price-part")]
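As a side note, since the question asks about matching classes that contain a specific text: the CSS attribute operator *= does the same job as contains() in XPath. A small sketch, equivalent to the corrected XPath above:
# [class*="..."] matches any element whose class attribute contains the substring
data = driver.find_elements_by_css_selector('div[class*="price-text--price-part"]')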
Try it like below; it might work. (When I tried, the website detected me as a bot and the price was not loaded.)
driver.get("https://www.udemy.com/courses/development/?p=1")
options = driver.find_elements_by_xpath("//div[contains(#class,'course-list--container')]/div[contains(#class,'popper')]")
for opt in options:
title = opt.find_element_by_xpath(".//div[contains(#class,'title')]").text # Use a dot in the xpath to find element within in an element.
price = opt.find_element_by_xpath(".//div[contains(#class,'price-text--price-part')]/span[2]/span").text
print(f"{title}: {price}")

Using Python (Selenium) to Scrape IMDB (.click() is not working)

I am trying to scrape a list of specific movies from IMDB using this tutorial.
The code is working fine except for the click that should open the URL and save the page in content; it is not working. The issue is that nothing changes in Chrome when running the code. I would really appreciate it if anyone could help.
content = driver.find_element_by_class_name("tF2Cxc").click()
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import time
movie = 'Wolf Totem'
driver = webdriver.Chrome(executable_path=r"D:\chromedriver.exe")
#Go to Google
driver.get("https://www.google.com/")
#Enter the keyword
driver.find_element_by_name("q").send_keys(movie + " imdb")
time.sleep(1)
#Click the google search button
driver.find_element_by_name("btnK").send_keys(Keys.ENTER)
time.sleep(1)
You are using the wrong locator.
To open a search result on the Google results page you should use this:
driver.find_element_by_xpath("//div[#class='yuRUbf']/a").click()
This locator matches all 10 search results, and find_element returns the first match, so it clicks the first search result.
Also, clicking that element will not give you any content by itself; it just opens the first link under the title of the first search result.
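If the goal is the page content rather than the click itself, one option is to read the result's href and navigate to it directly. A small sketch using the same locator (the variable names here are made up):
# Read the first result's URL instead of clicking it
link = driver.find_element_by_xpath("//div[@class='yuRUbf']/a")
imdb_url = link.get_attribute("href")
driver.get(imdb_url)           # go to the IMDB page itself
content = driver.page_source   # the page HTML is now available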

Home Depot Product Page Not Showing Price Info

I have been using Selenium to scrape Home Depot, but the page returns NoneType for the price. When I checked, the product price box is stuck on loading, but when I use a regular browser it loads almost instantly. Here is the code I'm using:
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(r"C:\Users\User\PycharmProjects\untitled\drivers\chromedriver_win32\chromedriver.exe")
driver.set_page_load_timeout(10)
driver.get('https://www.homedepot.ca/product/malibu-wide-plank-maple-cardiff-3-8-inch-thick-x-6-1-2-inch-wide-x-varying-length-engineered-click-hardwood-flooring-23-64-sq-ft-case-/1001341771')
time.sleep(5)
price = driver.find_element_by_class_name('hdca-product__description-pricing-price-value')
print(price.text)
Has anyone else encountered this?
Coincidentally, I also scraped Home Depot's website. I used CSS selectors:
productPrice = product.css('.price__dollars::text').getall()
I used Scrapy; Selenium isn't necessary for this website, since the price isn't dynamically loaded.
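For context, here is a minimal Scrapy spider along the lines this answer describes. The spider name and the output field are made up, and the .price__dollars selector comes from the answer above, so the site's markup may have changed since it was posted:
import scrapy

class HomeDepotPriceSpider(scrapy.Spider):
    # Hypothetical spider name; any identifier works
    name = "homedepot_price"
    start_urls = [
        "https://www.homedepot.ca/product/malibu-wide-plank-maple-cardiff-3-8-inch-thick-x-6-1-2-inch-wide-x-varying-length-engineered-click-hardwood-flooring-23-64-sq-ft-case-/1001341771",
    ]

    def parse(self, response):
        # ::text with getall() returns the matched text nodes as a list
        dollars = response.css(".price__dollars::text").getall()
        yield {"price_dollars": dollars}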

BeautifulSoup not returning everything on Facebook

I'm trying to extract all the pages liked by a given person on Facebook. Therefore, I'm using Python with BeautifulSoup and Selenium to automate the connection.
However, even though my code works, it doesn't actually return all the results (on my own profile, for instance, it only returns about 20% of all pages).
I read that it might be the parser used in BeautifulSoup, but I tried a bunch of them (html.parser, lxml...) and it's always the same thing.
Could it be because Facebook is dynamically generating the pages with AJAX? But then, I'm using Selenium, which should interpret that correctly!
Here is my code:
from selenium import webdriver
from bs4 import BeautifulSoup
import time

id_user = ""  # target profile id (left blank in the post)
driver = webdriver.Chrome()
driver.get('https://facebook.com')

# Credentials removed from the post
driver.find_element_by_id('email').send_keys('')
driver.find_element_by_id('pass').send_keys('')
driver.find_element_by_id('loginbutton').click()
time.sleep(2)

pages_liked = "https://www.facebook.com/search/" + id_user + "/pages-liked"
driver.get(pages_liked)
soup = BeautifulSoup(driver.page_source, 'html.parser')
likes_divs = soup.find_all('a', class_="_32mo")
for div in likes_divs:
    print(div['href'].split("/?")[0])
    print(div.find('span').text)
Thank you very much,
Loïc
Facebook is famous for making web scrapers' lives difficult... That said, it looks like you did your homework correctly; the snippet looks right to the point.
Start by looking into driver.page_source, i.e. what Selenium actually receives. If the information is in there, the problem is within BeautifulSoup; if it's not, Facebook found a strategy to hide the page (looking at the browser signature or fingerprint - yes, these are different concepts).
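A quick way to run that check, using the _32mo class from the question:
# Dump what Selenium actually received and look for the link class
html = driver.page_source
print("_32mo" in html)        # is the class present in the HTML at all?
print(html.count("_32mo"))    # a rough count of how many links arrived
If the count is far below the real number of liked pages, the missing entries were never in the HTML Selenium got, so BeautifulSoup is not the culprit.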
