I am trying to web scrape some data from a website, and to do that I have to get through an age verification popup using Selenium. I was wondering if there is a way to change the store location in the popup. Below is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r
import time
from selenium.webdriver.support.ui import Select
PATH="chromedriver.exe"
driver=webdriver.Chrome(PATH)
url1="https://cannacabana.com/collections/all?page=1"
driver.get(url1)
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#store-select")))).select_by_visible_text('ajax')
If someone can help, I would really appreciate it. Thanks!
Yes, you can select the store location.
That dropdown is a Select element, and Selenium has special support for this element type: you can pick an option by its displayed text, its index, or its value attribute, as described in the documentation.
So your code could be something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r
import time
from selenium.webdriver.support.ui import Select
PATH="chromedriver.exe"
driver=webdriver.Chrome(PATH)
url1="https://cannacabana.com/collections/all?page=1"
driver.get(url1)
select_element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "select#store-select")))
select = Select(select_element)
select.select_by_value('rideau')
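For reference, a minimal sketch of all three selection methods the Select wrapper offers; the option strings here are placeholders, not values taken from the actual page:
select.select_by_visible_text('Some Store')  # match an option's displayed text
select.select_by_value('some-value')         # match an option's value attribute
select.select_by_index(0)                    # pick the first option in the list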
I would like to scrape the "Name" and "Address" from the following site:
https://register.fca.org.uk/s/firm?id=001b000000MfNWNAA3
However, I am struggling to reference the correct field within the page and return the results.
Where I need your help is a working solution where the query grabs the "name" from the webpage and outputs that "name".
Code:
import string
import pandas as pd
from lxml import html
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
from IPython.core.display import display, HTML
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
Example Reference:
driver = webdriver.Chrome(chrome_options = options, executable_path=r'C:\Downloads\chromedriver.exe')
driver.get("https://register.fca.org.uk/s/firm?id=001b000000MfNWNAA3")
title = driver.find_elements(By.CSS_SELECTOR,'.slds-media__body h1 > a')
print(title.text)
Looking forward to your help!
Use WebDriverWait and wait for visibility_of_element_located().
driver.get("https://register.fca.org.uk/s/firm?id=001b000000MfNWNAA3")
name=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".slds-media__body h1"))).text
print(name)
address=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"h4[data-aura-rendered-by] ~p:nth-of-type(1)"))).text
print(address)
You need to import the libraries below.
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
To extract the Name and Address, ideally you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using Name:
driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h1"))).text)
Using Address:
driver.get('https://register.fca.org.uk/s/firm?id=001b000000MfQU0AAN')
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[.//div[contains(., 'Address')]]//following-sibling::p[1]"))).text)
Console Output:
Mason Owen and Partners Ltd
Unity Building
20 Chapel Street
Liverpool
Merseyside
L3 9AG
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
In addition to using WebDriverWait and visibility_of_element_located like others are suggesting, it's sometimes necessary to scroll an item into view.
This is a little function to make it more convenient to execute the JavaScript that does it:
def scrollto(element):
    driver.execute_script("return arguments[0].scrollIntoView(true);", element)
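A quick usage sketch under the same assumptions as the answers above (the .slds-media__body h1 selector comes from this thread): wait for the element to be present, scroll it into view, then read its text.
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".slds-media__body h1")))
scrollto(element)
print(element.text)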
I was trying to select an option using Selenium in Python.
Below is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r
import time
from selenium.webdriver.support.ui import Select
PATH="chromedriver.exe"
driver=webdriver.Chrome(PATH)
url1="https://cannacabana.com/collections/all?page=1"
driver.get(url1)
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#store-select")))).select_by_visible_text('bayview')
I am getting a timeout error. Could it be because the website has an optgroup? I am not able to find a way through it.
No, the implementation uses ".//" to locate the options, so the optgroup doesn't matter. You can see the implementation here.
I believe "bayview" is the option's value attribute, not its visible text, so you should use select_by_value() instead.
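A minimal sketch of the corrected call, assuming 'bayview' really is the value attribute of that option:
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#store-select")))).select_by_value('bayview')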
I am trying to web scrape multiple pages. My code seems to work really well for just page one, but when I use a loop to scrape, for example, the first 5 pages, I get the error below:
TimeoutException: Message:
Stacktrace:
Backtrace:
My code is below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r
import time
from selenium.webdriver.support.ui import Select
PATH="chromedriver.exe"
driver=webdriver.Chrome(PATH)
_list=[]
for page_num in range(1,3):
    #print("----")
    url=f"https://valuebuds.com/pages/search-results-page?tab=products&page={page_num}"
    driver.get(url)
    Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
    driver.find_element_by_class_name("agree").click()
    title=driver.find_elements_by_class_name("snize-overhidden")
    for j in title:
        Pro=j.find_element_by_class_name("snize-title").text
        Price=j.find_element_by_class_name("snize-price-list").text
        Desc=j.find_element_by_class_name("snize-description").text
        prec_item={
            "Product":Pro,
            "Price":Price,
            "Description":Desc
        }
        _list.append(prec_item)
df = pd.DataFrame(_list)
df.to_csv("Value Buds HTML Pricing.csv")
print("saved to file.")
Please advise! Thanks in advance.
The code block
Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
driver.find_element_by_class_name("agree").click()
is only relevant when you land on the home page for the first time.
Once you have selected the year and clicked the Agree button, you will be able to see all the pages of presented results with no need to select the year again.
So your code could be something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests as r
import time
from selenium.webdriver.support.ui import Select
PATH="chromedriver.exe"
driver=webdriver.Chrome(PATH)
_list=[]
for page_num in range(1,3):
    #print("----")
    url=f"https://valuebuds.com/pages/search-results-page?tab=products&page={page_num}"
    driver.get(url)
    if page_num == 1:
        Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "select#year_field")))).select_by_visible_text('1999')
        driver.find_element_by_class_name("agree").click()
    else:
        time.sleep(2)
    title=driver.find_elements_by_class_name("snize-overhidden")
    for j in title:
        Pro=j.find_element_by_class_name("snize-title").text
        Price=j.find_element_by_class_name("snize-price-list").text
        Desc=j.find_element_by_class_name("snize-description").text
        prec_item={
            "Product":Pro,
            "Price":Price,
            "Description":Desc
        }
        _list.append(prec_item)
df = pd.DataFrame(_list)
df.to_csv("Value Buds HTML Pricing.csv")
print("saved to file.")
I have added a delay for the non-first iterations to let the pages load before you scrape their data.
It would be better to use an Expected Conditions explicit wait there. I don't know exactly which condition to use, so I have left that to your decision; one possible sketch follows.
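For example, a minimal sketch (my assumption, not verified against the site) that replaces the sleep with a wait for the product tiles from your own locator to become visible:
    else:
        # wait until the product tiles are visible instead of sleeping
        WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "snize-overhidden")))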
I am trying to scrape this website: http://sekolah.data.kemdikbud.go.id/
I want to select the "Jenjang" field, "SMA" value. After that, I need to click the "Cari Sekolah" button.
Unfortunately, my code does not work. I manage to select SMA but then can't click "Cari Sekolah" to start the query. Does anyone know how to fix this?
Here is my code:
from selenium import webdriver
from selenium.webdriver import Chrome
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time
from selenium.webdriver.support.ui import Select
option = webdriver.ChromeOptions()
option.add_argument('--incognito')
chromedriver_path = "/Users/rs26/Desktop/learnpython/web/chromedriver"
driver = Chrome(executable_path=chromedriver_path, chrome_options=option)
url="http://sekolah.data.kemdikbud.go.id/"
driver.get(url)
wait = WebDriverWait(driver,15)
select_element = Select(driver.find_element_by_id("bentuk"))
select_element.select_by_value("SMA")
wait.until(EC.element_to_be_clickable((By.XPATH,"//button[text()='Cari Sekolah']"))).click()
Please find below a solution for selecting from the custom (select2) dropdown.
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
# Solution 1:
driver = webdriver.Chrome(executable_path=r"C:\New folder\chromedriver.exe")
driver.get('http://sekolah.data.kemdikbud.go.id/')
driver.maximize_window()
element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "select2-bentuk-container")))
element.click()
search_input = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//input[@class='select2-search__field']")))
search_input.send_keys("SMA")
select=WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "select2-results__option")))
select.click()
You can use the form button[type=submit] CSS selector to click:
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,"form button[type=submit]"))).click()
This code, when given a list of cities, searches on Google, extracts data, and then converts it into a dataframe.
In some cases I have to use different XPaths to extract the data; there are three XPaths in total.
Trying to do this:
if 1 doesn't work, go to 2;
if 2 doesn't work, go to 3;
if 3 doesn't work, use driver.quit().
I tried this code, using NoSuchElementException:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup
import pandas as pd
df_output = pd.DataFrame(columns=["City", "pincode"])
url = "https://www.google.com/"
chromedriver = ('/home/me/chromedriver/chromedriver.exe')
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(30)
driver.get(url)
search = driver.find_element_by_name('q')
mlist1=['polasa']
for i in mlist1:
    try:
        search.send_keys(i,' pincode')
        search.send_keys(Keys.RETURN)
        WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.XPATH, '//div[@class="IAznY"]//div[@class="title"]')))
        elmts = driver.find_elements_by_xpath('//div[@class="IAznY"]//div[@class="title"]')
        df_output = df_output.append(pd.DataFrame(columns=["City", "pincode"], data=[[i, elmts[0].text]]))
        driver.quit()
    except NoSuchElementException:
        try:
            elements = driver.find_element_by_xpath("//div[@class='Z0LcW']")
            df_output = df_output.append(pd.DataFrame(columns=["City", "pincode"], data=[[i, elements.text]]))
            driver.quit()
        except NoSuchElementException:
            try:
                elements = driver.find_element_by_xpath("//div[@class='Z0LcW AZCkJd']")
                df_output = df_output.append(pd.DataFrame(columns=["City", "pincode"], data=[[i, elements.text]]))
                driver.quit()
            except:
                driver.quit()
This code works using one of the 3 tags; I need to combine the 3 tags in a single piece of code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from bs4 import BeautifulSoup
import pandas as pd
import re
import os
import html5lib
import json
import time
url = "https://www.google.com/"
chromedriver = ('/home/me/chromedriver/chromedriver.exe')
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(30)
driver.get(url)
search = driver.find_element_by_name('q')
search.send_keys('polasa',' pincode')
search.send_keys(Keys.RETURN)
elements=driver.find_element_by_xpath("//div[@class='Z0LcW']")
elements.text
You don't really need 3 try/except blocks. You can do this without throwing exceptions by locating elements (plural) for a given locator and then checking the length of the returned collection. If the length is 0, no elements were found.
The locators you are using don't require XPath, so you can instead use a CSS selector and combine all three with an OR, avoiding the three separate checks. (Note: you can do the same thing with XPath, but the result is messier and harder to read.)
Here are your 3 locators combined into one using OR (the comma) in CSS selector syntax:
div.IAznY div.title, div.Z0LcW, div.Z0LcW.AZCkJd
...and the updated code using the combined locator, without the nested try/except:
...
locator = (By.CSS_SELECTOR, 'div.IAznY div.title, div.Z0LcW, div.Z0LcW.AZCkJd')
for i in mlist1:
    search.send_keys(i, ' pincode')
    search.send_keys(Keys.RETURN)
    # the expected condition takes the locator tuple itself, not unpacked arguments
    WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located(locator))
    elements = driver.find_elements(*locator)
    df_output = df_output.append(pd.DataFrame(columns=["City", "pincode"], data=[[i, elements[0].text]]))
driver.quit()
NOTE: I used your original locators and wasn't getting any results with any of the three. Are you sure they are correct?
Also note... I pulled the driver.quit() out of the loop. I'm not sure whether you intended it to be inside or not, but from the code provided, if the try succeeded on the first iteration the browser would quit. You only have one item in the list, so you probably haven't noticed this yet, but you would have been confused when you added another item.
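If you want the explicit zero-length check described above (rather than relying on the wait alone, which raises TimeoutException when nothing shows up), a minimal sketch with the same locator:
from selenium.common.exceptions import TimeoutException
try:
    WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located(locator))
except TimeoutException:
    pass  # no match within the timeout; fall through to the length check
elements = driver.find_elements(*locator)
if len(elements) == 0:
    print(f"no pincode result found for {i}")  # none of the three variants matched
else:
    df_output = df_output.append(pd.DataFrame(columns=["City", "pincode"], data=[[i, elements[0].text]]))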