I am trying to scrape products' delivery date data from several lists of product URLs.
I run the Python file from the terminal with multiprocessing, so it ultimately opens multiple Chrome browsers (10 to 15 of them), which slows down my computer quite a bit.
My code clicks a block that contains shipping options, which shows a pop-up box with the estimated delivery time. I have included an example of a product URL in the code below.
I noticed that some of my Chrome browsers freeze and do not locate and click the element the way my code specifies. I've incorporated a page refresh via the webdriver into my code just in case that would do the trick, but the frozen browsers don't even seem to be refreshing.
I don't know why this happens, since I have set the webdriver to wait until the element is clickable. Do I just increase the time in time.sleep() or the seconds in WebDriverWait() to resolve this?
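For context, my multiprocessing setup looks roughly like this sketch (the worker body is a placeholder; each real worker creates its own webdriver.Chrome, runs the scraping code below, and quits the browser when done, and `processes` caps how many browsers run at once):

```python
from multiprocessing import Pool

def scrape(url):
    # placeholder worker: a real one would create its own webdriver.Chrome,
    # run the scraping code below, and quit the browser when done
    return url, "delivery-date-placeholder"

def scrape_all(urls, processes=4):
    # caps the number of simultaneous Chrome browsers at `processes`
    with Pool(processes=processes) as pool:
        return pool.map(scrape, urls)
```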
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException, StaleElementReferenceException
import time

chromedriver = "path to chromedriver"
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(chromedriver, options=options)
# Example url
url = "https://shopee.sg/Perfect-Diary-X-Sanrio-MagicStay-Loose-Powder-Weightless-Soft-velvet-Blurring-Face-Powder-With-Cosmetic-Puff-Oil-Control-Smooth-Face-Powder-Waterproof-Applicable-To-Mask-Face-Cinnamoroll-Purin-Gudetama-i.240344695.5946255000?ads_keyword=makeup&adsid=1095331&campaignid=563217&position=1"
driver.get(url)
time.sleep(2)
retries = 0
try:
    WebDriverWait(driver, 60).until(EC.element_to_be_clickable(
        (By.XPATH, '//div[@class="flex flex-column"]//div[@class="shopee-drawer "]'))).click()
    while retries <= 5:
        try:
            WebDriverWait(driver, 60).until(EC.element_to_be_clickable(
                (By.XPATH, '//div[@class="flex flex-column"]//div[@class="shopee-drawer "]'))).click()
            break
        except TimeoutException:
            driver.refresh()
            retries += 1
except (NoSuchElementException, StaleElementReferenceException):
    delivery_date = None
The element you want is displayed only when you hover the mouse over it. The element is an svg, which you need to handle accordingly.
You can use this XPath to hover the mouse:
(//*[name()='svg']//*[name()='g'])/*[name()='path'][starts-with(@d, 'm11 2.5c0 .1 0 .2-.1.3l')]
To get the text from the pop-up, check this XPath:
//div[@class='shopee-drawer__contents']//descendant::div[4]
You can use get_attribute("innerText") to get all the values. I hope it will help.
First, don't use the headless option in webdriver.ChromeOptions(), so the browser window pops up and you can observe whether the element is actually clicked.
Second, your code only clicks the element; it doesn't help you GET anything. After the click opens the pop-up, it should do something like this:
items = WebDriverWait(driver, 60).until(
    EC.visibility_of_all_elements_located((By.CLASS_NAME, 'load-done'))
)
for item in items:
    deliver_date = item.get_attribute('text')
    print(deliver_date)
I'm creating a web scraper with Selenium that will add products to a cart and then cycle through cities, states, and zip codes to give me the total cost of shipping plus taxes for each area.
The website I'm using is: https://www.power-systems.com/shop/product/proelite-competition-kettlebell
Everything in my code appears to work normally: the window opens, it closes a few pop-ups, and I can see Selenium select the proper option. However, regardless of what I've tried, after Selenium clicks the "Add to Cart" button it always adds the first option, despite having selected the proper one.
Here is what I have been trying:
# created to simplify the code since I'll be using this option often
def element_present_click1(path_type, selector):
    element_present = EC.element_to_be_clickable((path_type, selector))
    WebDriverWait(driver, 30).until(element_present)
    try:
        driver.find_element(path_type, selector).click()
    except:
        clicker = driver.find_element(path_type, selector)
        driver.execute_script("arguments[0].click();", clicker)

path = r"C:\Program Files (x86)\msedgedriver.exe"
driver = webdriver.Edge(path)
driver.get('https://www.power-systems.com/shop/product/proelite-competition-kettlebell')
element_present_click1(By.CSS_SELECTOR,'button[name="bluecoreCloseButton"]')
element_present_click1(By.CSS_SELECTOR,'a[id="hs-eu-decline-button"]')
###this will correctly select the proper element
element_present_click1(By.XPATH, f"//span[text()='32 kg']")
###after this is clicked, it will always add the first option, which is 8 kg
driver.find_element(By.CSS_SELECTOR,'button.btn.btn-primary.add-to-cart.js-add-to-cart-button').submit()
I've tried a few different things: adding time.sleep() after clicking the option, refreshing the browser page, and selecting the option twice. No matter what I try, when I click add to cart it always adds the first option.
Is there something I'm missing? Any help would be appreciated.
You are using a wrong selector in the last step.
button.btn.btn-primary.add-to-cart.js-add-to-cart-button is not a unique locator.
You need to click the button inside the selected element block.
This will work:
driver.find_element(By.CSS_SELECTOR, ".variant-info.selected .add-to-cart-rollover").click()
It looks to me like find_element returns ONLY the first element it can find. Having taken a look at find_element, it looks like you'd want to replace
driver.find_element(By.CSS_SELECTOR,'button...').submit()
with
random.choice(driver.find_elements(By.CSS_SELECTOR,'button...')).submit()
I am trying to find news articles related to different web searches within a custom time frame. If someone knows a better way to do this than with Selenium, please do tell. I tried to use APIs, but they are all expensive.
My Code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path=r".\chromedriver.exe")
driver.get("https://www.google.com")
cookie_button = driver.find_element_by_id('L2AGLb')
cookie_button.click()
search_bar = driver.find_element_by_name("q")
search_bar.clear()
search_bar.send_keys("Taylor Swift")
search_bar.send_keys(Keys.RETURN)
news_button = driver.find_element_by_css_selector('[data-hveid="CAEQAw"]')
news_button.click()
tools_button = driver.find_element_by_id('hdtb-tls')
tools_button.click()
recent_button = driver.find_element_by_class_name('KTBKoe')
recent_button.click()
custom_range_button = driver.find_elements_by_tag_name('g-menu-item')[6]
custom_range_button.click()
driver.close()
I am trying to click the custom range button (shown in a screenshot that is not included here).
I have tried selecting the span, all of the parent divs, and now the g-menu-item itself; in every case I received the error:
selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
I understand this can happen when the element is not visible or cannot be clicked, but I also tried adding sleep(2) between each operation, and I can see in the browser that Selenium opens that it reaches the point shown in the image and then can't find the custom range button.
In short:
Do you know how I can click this custom range button?
Or do you know how to scrape a lot of news articles from specific dates (without expensive APIs)?
Those g-menu-item elements are a special type of tag.
You can locate them using:
//*[name()='g-menu-item']
This XPath will point to all of the nodes.
Further, I see that you are interested in the 6th item:
(//*[name()='g-menu-item'])[6]
In code:
custom_range_button = driver.find_element_by_xpath("(//*[name()='g-menu-item'])[6]")
custom_range_button.click()
I'm kind of a newbie in Selenium; I started learning it for my job some time ago. Right now I'm working on code that opens the browser, enters the specified website, puts the product ID in the search box, searches, and then opens the product. Once it opens the product, it needs to extract the name and price and write them to a CSV file. I'm struggling with it a bit.
The main problem right now is that Selenium is unable to open the product after searching for it. I've tried by ID, name, and class, and it still didn't work.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import csv
driver = webdriver.Firefox()
driver.get("https://www.madeiramadeira.com.br/")
assert "Madeira" in driver.title
elem = driver.find_element_by_id("input-autocomplete")
elem.clear()
elem.send_keys("525119")
elem.send_keys(Keys.RETURN)
product_link = driver.find_element_by_id('variant-url').click()
The error I get is usually this:
NoSuchElementException: Message: Unable to locate element: [id="variant-url"]
There are multiple elements with id="variant-url", so you could use an index to click the desired element. You also need to handle the cookie pop-up. Check the code below; I hope it will help.
# Disable the notifications window which is displayed on the page
option = Options()
option.add_argument("--disable-notifications")
driver = webdriver.Chrome(r"Driver_Path", chrome_options=option)
driver.get("https://www.madeiramadeira.com.br/")
assert "Madeira" in driver.title
elem = driver.find_element_by_id("input-autocomplete")
elem.clear()
elem.send_keys("525119")
elem.send_keys(Keys.RETURN)
# Click the cookie popup button
accept = driver.find_element_by_xpath("//button[@id='lgpd-button-agree']")
accept.click()
OR with an explicit wait:
accept = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='lgpd-button-agree']")))
accept.click()
# Use the XPath with an index like [1] or [2] to click a specific element
product_link = driver.find_element_by_xpath("(//a[@id='variant-url'])[1]")
product_link.click()
OR with an explicit wait:
product_link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "(//a[@id='variant-url'])[1]")))
product_link.click()
I used to get this error all the time when I was starting out. The problem is that there are probably multiple elements with the same ID "variant-url". There are two ways you can fix this.
By using "driver.find_elementS_by_id":
product_links = driver.find_elements_by_id('variant-url')
product_links[2].click()
This creates a list of all the elements with id 'variant-url' and then clicks index 2. This works, but it is annoying to find the correct index of the button you want to click, and it takes a long time if there are many elements with the same ID.
By using XPaths or CSS selectors:
This way is a lot easier, as each element has a specific XPath or selector. It will look like this:
product_link = driver.find_element_by_xpath("XPATH GOES HERE").click()
To get an XPath or selector: 1. go into developer mode in your browser by inspecting the element; 2. right-click the element in the F12 menu; 3. hover over Copy; 4. click Copy XPath.
Hope this helps you :D
In Python 3 with Selenium, I have this script to automate searching for terms on a site with public information:
from selenium import webdriver
# Driver Path
CHROME = '/usr/bin/google-chrome'
CHROMEDRIVER = '/home/abraji/Documentos/Code/chromedriver_linux64/chromedriver'
# Chosen browser options
chrome_options = webdriver.chrome.options.Options()
chrome_options.add_argument('--window-size=1920,1080')
chrome_options.binary_location = CHROME
# Website accessed
link = 'https://pjd.tjgo.jus.br/BuscaProcessoPublica?PaginaAtual=2&Passo=7'
# Search term
nome = "MARCONI FERREIRA PERILLO JUNIOR"
# Waiting time
wait = 60
# Open browser
browser = webdriver.Chrome(CHROMEDRIVER, options = chrome_options)
# Implicit wait
browser.implicitly_wait(wait)
# Access the link
browser.get(link)
# Search by term
browser.find_element_by_xpath("//*[@id='NomeParte']").send_keys(nome)
browser.find_element_by_xpath("//*[@id='btnBuscarProcPublico']").click()
# Searches for the text of the last icon - the last page button
element = browser.find_element_by_xpath("//*[@id='divTabela']/div[2]/div[2]/div[4]/div[2]/ul/li[9]/a").text
element
'»'
When searching for terms, this site paginates the results and always shows the "»" button as the last pagination button.
The next-to-last button will be "›".
So I always need to capture the text of the button two positions before the last one. In this case that is the number "8", which tells me how many clicks on "next page" will be needed to automate the page changes.
Please, how do I capture the element two positions before the last one with XPath?
I know this is not an answer to the original question.
But clicking the next button several times is not a good practice.
I checked the network traffic and see that they are calling their API url with offset parameter. You should be able to use this URL with proper offset as you need.
If you really need to access the button two positions before the last one, get all the navigation buttons first and then access them by index:
elems = browser.find_elements_by_xpath(xpath)
elems[-3]  # two positions before the last element
EDIT
I just tested their API, and it works when the proper cookie value is given.
This approach is much faster than automation using Selenium.
Use Selenium just to extract the cookie value to be used in the header of the web request.
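As a sketch of that approach (the API URL and the offset/limit parameter names here are placeholders; read the real ones from your browser's network tab):

```python
import requests

def make_api_session(selenium_driver):
    """Copy Selenium's cookies into a requests.Session for fast paging."""
    session = requests.Session()
    for cookie in selenium_driver.get_cookies():
        session.cookies.set(cookie["name"], cookie["value"])
    return session

def fetch_page(session, api_url, offset, limit=10):
    # page through the results by bumping `offset` instead of clicking "next"
    return session.get(api_url, params={"offset": offset, "limit": limit})
```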
I've never studied HTML seriously, so what I'm about to say may not be right.
While writing my Selenium code, I noticed that some buttons on some webpages do not redirect to other pages but instead change the structure of the current one. From what I've understood, this happens because some JavaScript code modifies it.
So, when I want to get data that is not present on the first page load, I just have to click the right sequence of buttons to obtain it, right?
The page I want to load is https://watch.nba.com/, and what I want to get is the match list for a given day. The fastest path is to open the calendar:
calendary = driver.find_element_by_class_name("calendar-btn")
calendary.click()
and click the selected day:
page = calendary.find_element_by_xpath("//*[contains(text(), '" + time.strftime("%d") + "')]")
page.click()
running this code, I get this error:
selenium.common.exceptions.ElementNotVisibleException
I read somewhere that the problem is that the page is not correctly loaded, or the element is not visible/clickable, so I tried this:
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.XPATH, "//*[contains(text(), '" + time.strftime("%d") + "')]")))
page.click()
But this time I get this error:
selenium.common.exceptions.TimeoutException
Can you help me to solve at least one of these two problems?
The reason you are getting this behavior is that the page is loaded with iframes (I can see 15 on the main page). Once you click the calendar button, you will need to switch your context to either the default content or the specific iframe where the calendar resides. There is plenty of code out there that shows how to switch into and out of an iframe. Hope this helps.