Hi, I'm trying to select the name "Saleem" from the HTML. It is inside a table, but I don't know if that is relevant. The name might not always be there, so I'm trying to find a way to select the element only if the name is included in the table. For example, "Liam" does not appear when searched, but "saleem" does. How do I click the link that appears when Saleem is searched? For some reason, Selenium can't find the element with the code I wrote below.
Here is the website (I just put Saleem in the name category and searched):
https://sanctionssearch.ofac.treas.gov/default.aspx
I tried the code below, but unfortunately it does not work.
driver.find_element_by_id("btnDetails").click()
<a id="btnDetails" href='javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("ctl00$MainContent$gvSearchResults$ctl02$btnDetails", "", false, "", "Details.aspx?id=5839", false, true))' style="color:Blue">AL-IFRI, Saleem </a>
Any help is appreciated!
Yes, you can use it inside a try..except block:
from selenium.common.exceptions import NoSuchElementException
# YOUR CODE
try:
    webdriver.find_element_by_id('btnDetails')
except NoSuchElementException:
    # Element does not exist
    pass
else:
    # Element exists
    pass
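Alternatively, the plural find_elements_by_id returns an empty list instead of raising, so presence can be tested without try/except. Below is a minimal runnable sketch of that pattern; FakeDriver is a stand-in class (not a real Selenium API) so it runs without a browser, but a real driver's find_elements_by_id is called the same way:

```python
# Sketch of the find_elements approach: the plural form returns a list,
# empty when nothing matches, so no exception handling is needed.
# FakeDriver is a stand-in so this runs without a browser; with Selenium
# you would call driver.find_elements_by_id("btnDetails") the same way.
class FakeDriver:
    def __init__(self, ids_on_page):
        self.ids_on_page = ids_on_page

    def find_elements_by_id(self, element_id):
        # return matching "elements" or an empty list
        return [element_id] if element_id in self.ids_on_page else []

driver = FakeDriver(ids_on_page={"btnDetails"})

if driver.find_elements_by_id("btnDetails"):
    result = "element exists"
else:
    result = "element does not exist"
print(result)
```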
This will select the search result for the name Saleem and take you to Saleem's page. You can do whatever you want on that page, then come back to the search results with browser.back().
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser_options = Options()
browser_options.add_argument('--user-agent="Mozilla/5.0 (Windows NT 4.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36"')
browser_options.add_argument('start-maximized')
browser = webdriver.Chrome(executable_path='C:/bin/chromedriver.exe',options=browser_options)
browser.get('https://sanctionssearch.ofac.treas.gov/default.aspx')
search_box = WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="ctl00_MainContent_txtLastName"]')))
search_box.send_keys('Saleem')
search_box.send_keys(Keys.ENTER)
time.sleep(4)
from selenium.common.exceptions import NoSuchElementException

try:
    results = browser.find_element_by_xpath('//*[@id="gvSearchResults"]/tbody').find_elements_by_tag_name('td')
    for result in results:
        button = result.find_element_by_tag_name('a')
        button.click()
        # Do something on the details page
        browser.back()
        browser.refresh()
except NoSuchElementException:
    # No results table for this name
    pass
I am trying to write code that scrapes all reviews for a single hotel on TripAdvisor. The code runs through all pages except the last one, where it has a problem. It says the problem is the next.click() in the loop. I assume this is because "next" is still present in the DOM, just disabled. Does anyone know how to fix this? I basically want it to not try to click next when it reaches the last page, when the button is disabled but still technically present. Any help would be much appreciated!
#maybe3.1
from argparse import Action
from calendar import month
from distutils.command.clean import clean
from lib2to3.pgen2 import driver
from os import link
import unittest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import ElementNotInteractableException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from dateutil import relativedelta
from selenium.webdriver.common.action_chains import ActionChains
import time
import datetime
from selenium.common.exceptions import StaleElementReferenceException
from selenium.common.exceptions import NoSuchElementException
import pandas as pd
import requests
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Extract the HTML and create a BeautifulSoup object.
url = ('https://www.tripadvisor.com/Hotel_Review-g46833-d256905-Reviews-Knights_Inn_South_Hackensack-South_Hackensack_New_Jersey.html#REVIEWS')
user_agent = ({'User-Agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) \
AppleWebKit/537.36 (KHTML, like Gecko) \
Chrome/90.0.4430.212 Safari/537.36',
'Accept-Language': 'en-US, en;q=0.5'})
driver = webdriver.Chrome()
driver.get(url)
# Find and extract the data elements.
wait = WebDriverWait(driver,30)
wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="component_15"]/div/div[3]/div[13]/div')))
# explicit wait here
next = driver.find_element(By.XPATH, './/a[@class="ui_button nav next primary "]')
here = next.is_displayed()
while here == True:
    time.sleep(2)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    time.sleep(2)
    Titles = []
    for title in soup.findAll('a', {'Qwuub'}):
        Titles.append(title.text.strip())
    reviews = []
    for review in soup.findAll('q', {'class': 'QewHA H4 _a'}):
        reviews.append(review.text.strip())
    next.click()
    if here != True:
        time.sleep(2)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        time.sleep(8)
        break
# Create the dictionary.
dict = {'Review Title':Titles,'Reviews/Feedback':reviews}
# Create the dataframe.
datafr = pd.DataFrame.from_dict(dict)
datafr.head(10)
# Convert dataframe to CSV file.
datafr.to_csv('hotels1.855.csv', index=False, header=True)
This question might be in the same vein as:
python selenium to check if this text field is disabled or not
You can check if an element is enabled with:
driver.find_element_by_id("id").is_enabled()
You can also wrap the code in a try/except block.
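Putting the two together, the loop can check is_enabled() before every click and stop on the last page. Here is a runnable sketch of that guard; FakeNextButton is a stand-in object (not a Selenium API) so it runs without a browser, and in real Selenium code next_btn would come from driver.find_element(...):

```python
# Sketch: guard every click with is_enabled() so the loop stops on the
# last page, where the "next" button is still present but disabled.
# FakeNextButton stands in for the real element so this runs without a
# browser; with Selenium you would re-find the button on each page.
class FakeNextButton:
    def __init__(self, total_pages):
        self.page = 1
        self.total_pages = total_pages

    def is_enabled(self):
        # disabled once we are on the last page
        return self.page < self.total_pages

    def click(self):
        self.page += 1

next_btn = FakeNextButton(total_pages=3)
visited = 1
while next_btn.is_enabled():
    next_btn.click()
    visited += 1
print(visited)
```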
import time

page = 2
while True:
    try:
        # your code
        driver.find_element(By.XPATH, f"//a[@class='pageNum ' and text()='{page}']").click()
        page += 1
        time.sleep(1)
    except:
        break
This should be a simple loop that goes through all the pages and waits until the a tag in question is no longer valid.
I have a problem even opening a website using "webdriver Chrome". Just trying to open the website ends with "Access denied" information and I don't know why.
Below is my code:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
class PriceCheckPhoenix:
    def __init__(self):
        self.url_login = "https://www.phoenixcontact.com/online/portal/pl?1dmy&urile=wcm%3apath%3a/plpl/web/home"
        self.create_session()

    def create_session(self):
        # Run browser with webdriver
        driver = webdriver.Chrome(executable_path="D:/chromedriver_v84.exe")
        driver.get(self.url_login)
        time.sleep(2)
        # Find link to sub-website with login
        link = driver.find_element_by_xpath('//*[@id="pxc-funcnav"]/div[3]/ul/li[1]/a').get_attribute("href")
        driver.get(link)
        time.sleep(100)
Description of the code:
#1 I create a Chrome browser session
#2 Load the first website from self.url_login
#3 It is loaded
#4 I need to find the link behind the active text on the website to log in
#5 I found it and tried to open it, but the response after getting the link is:
Access Denied
You don't have permission to access
"http://www.phoenixcontact.com/online/portal/pl/pxc/offcontext/login/!ut/p/z1/tZJNa4NAEIZ_Sw45yszuuro9WkO1xqY2EqN7EbXGWPzYFDGlv74Gcio0oYTMZRgY3mcYHpAQg-yysa6yoe67rJnmRBqpu4zownzixDEYx2cWmIYTeYgrHSKQIFVRv0MieJZTZEITglFNLwTXRPaw03RGC6Qm10nOTttFN6hhD4lqVDPHY5nPcd-3JSQTy0ypQ5C4Onl5XUcmvgXCttzNWo-WCNuxLo-w6frPdjot_CfZxWsEciPhSjy7a7xN7xt_63M8kJdNmlSrPw4HaU2G9N1Qfg0Q_1Zke4JeiPHIeQH_KAshVE0a-GkQ24EPqm0F41WbLh5XWuKN3-fm78KgsmazH7dw0Ts!/dz/d5/L0lJSklKQ2dwUkEhIS9JRGpBQUF4QUFFUkNwcVlxLzRObEdRb1lwTWhUalVFZyEvWjZfR0FMNjE0ODI4RzNEQzBJMklPMlA2OTFHMDMvWjdfR0FMNjE0ODI4RzNEQzBJMklPMlA2OTFHSTcvdGFyZ2V0Vmlldy9sb2dpbg!!/" on this server.
Reference #18.d58655f.1597921471.5b29112
Does anyone know what is wrong here? :( When I try to load the website from the link in a normal Chrome browser it's all fine :/
Thank you all for any help.
Please try the below code and let me know if it works for you:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
options = Options()
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.517 Safari/537.36'
options.add_argument('user-agent={0}'.format(user_agent))
driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 20)
action = ActionChains(driver)
driver.get("https://www.phoenixcontact.com/online/portal/pl?1dmy&urile=wcm%3apath%3a/plpl/web/home")
Login_Btn = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@class='pxc-fn-login']/a")))
action.move_to_element(Login_Btn).click().perform()
Note - Please make the changes in your code accordingly.
Google search brought me here. After trying several options, undetected-chromedriver with a very simple script without any options worked for me.
import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get(<url here>)
I'm trying to select 'Newest' from the drop-down menu.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
options = webdriver.ChromeOptions()
# options.add_argument('--headless')
driver = webdriver.Chrome(chrome_options=options)
url = 'https://play.google.com/store/apps/details?id=com.whatsapp&hl=en&showAllReviews=true'
driver.get(url)
state_selection = driver.find_element_by_xpath("//div[.='%s']" % "Most relevant")
state_selection.click()
state_selection.send_keys(Keys.UP)
state_selection.send_keys(Keys.UP)
state_selection2 = driver.find_element_by_xpath("//div[.='%s']" % "Newest")
state_selection2.send_keys(Keys.RETURN)
but as soon as it reaches "Newest" and I send the command to press Enter (as shown in the code), it resets to "Most relevant". I'm not able to get my head around how to achieve this.
After you have clicked state_selection, something like this will click "Newest":
driver.find_element_by_xpath("//div[@role='option']/span[contains(text(),'Newest')]").click()
The more robust method would be working with WebDriverWait to allow the DOM to update, so:
WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//div[@role='option']/span[contains(text(),'Newest')]"))).click()
Note you need these imports for WebDriverWait:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
There are different ways to select from a drop-down:
Index
Value
Visible text
If you use an XPath and the values change in the future, it will pick whatever element is present at that location, so it is better to select by visible text:
state_selection = Select(driver.find_element_by_xpath("//div[.='%s']" % "Most relevant"))
state_selection.select_by_visible_text("Dropdown Visible Text")
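For reference, the three strategies look like this. The snippet below uses a stand-in class (not a Selenium API) so it runs without a browser; Selenium's selenium.webdriver.support.select.Select exposes methods with these exact names when wrapped around a real select element:

```python
# The three Select strategies (index, value, visible text), demonstrated
# on a stand-in class so no browser is needed.
class FakeSelect:
    def __init__(self, options):
        # options given as (value, visible_text) pairs
        self.options = options
        self.selected = None

    def select_by_index(self, index):
        self.selected = self.options[index][1]

    def select_by_value(self, value):
        self.selected = next(text for val, text in self.options if val == value)

    def select_by_visible_text(self, text):
        self.selected = next(t for _, t in self.options if t == text)

sel = FakeSelect([("relevant", "Most relevant"), ("newest", "Newest")])
sel.select_by_visible_text("Newest")
print(sel.selected)
```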
In the below URL I need to click a mail-icon hyperlink. Sometimes it does not work even though the code is correct; in that case the driver needs to wait up to 10 seconds and then go to the next level.
https://www.sciencedirect.com/science/article/pii/S1001841718305011
tags = driver.find_elements_by_xpath('//a[@class="author size-m workspace-trigger"]//*[local-name()="svg"]')
if tags:
    for tag in tags:
        tag.click()
How do I use an explicit or implicit wait here, around tag.click()?
From my understanding, after the element is clicked it should wait until the author popup appears, then extract using details()?
tags = driver.find_elements_by_css_selector('svg.icon-envelope')
if tags:
    for tag in tags:
        tag.click()
        # wait until author dialog/popup on the right appears
        WebDriverWait(driver, 10).until(
            lambda d: d.find_element_by_class_name('e-address')  # selector for email
        )
        try:
            details()
            # close the popup
            driver.find_element_by_css_selector('button.close-button').click()
        except Exception as ex:
            print(ex)
            continue
As an aside, you can extract the author contact e-mails (the same ones behind the click) from a JSON-like string in one of the page's scripts:
from selenium import webdriver
import json
d = webdriver.Chrome()
d.get('https://www.sciencedirect.com/science/article/pii/S1001841718305011#!')
script = d.find_element_by_css_selector('script[data-iso-key]').get_attribute('innerHTML')
script = script.replace(':false',':"false"').replace(':true',':"true"')
data = json.loads(script)
authors = data['authors']['content'][0]['$$']
emails = [author['$$'][3]['$']['href'].replace('mailto:','') for author in authors if len(author['$$']) == 4]
print(emails)
d.quit()
You can also use requests to get all the recommendations info
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
}
data = requests.get('https://www.sciencedirect.com/sdfe/arp/pii/S1001841718305011/recommendations?creditCardPurchaseAllowed=true&preventTransactionalAccess=false&preventDocumentDelivery=true', headers = headers).json()
print(data)
You have to wait until the element is clickable. You can do that with the WebDriverWait function.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get('url')
elements = driver.find_elements_by_xpath('xpath')
for element in elements:
    try:
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.LINK_TEXT, element.text)))
    finally:
        element.click()
You can try the approach below to click on the hyperlinks containing the mail icon. When a click is initiated, a popup box shows up containing additional information, and the following script can fetch the email address from there. It's always a great trouble to dig out anything when svg elements are involved. I've used the BeautifulSoup library in order to use its .extract() function to kick out the svg elements so that the script can reach the content.
from bs4 import BeautifulSoup
from contextlib import closing
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
with closing(webdriver.Chrome()) as driver:
    driver.get("https://www.sciencedirect.com/science/article/pii/S1001841718305011")
    for elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[starts-with(@name,'baut')]")))[-2:]:
        elem.click()
        soup = BeautifulSoup(driver.page_source, "lxml")
        [item.extract() for item in soup.select("svg")]
        email = soup.select_one("a[href^='mailto:']").text
        print(email)
Output:
weibingzhang@ecust.edu.cn
junhongqian@ecust.edu.cn
Use the builtin time.sleep() function:
from time import sleep

tags = driver.find_elements_by_xpath('//a[@class="author size-m workspace-trigger"]//*[local-name()="svg"]')
if tags:
    for tag in tags:
        sleep(10)
        tag.click()
I have written a script in Python using Selenium to fetch the business summary (which is within a p tag) located at the bottom right corner of a webpage under the header Company profile. The webpage is heavily dynamic, so I thought to use a browser simulator. I have created a CSS selector, which is able to parse the summary if I copy the HTML elements directly from that webpage and try it locally. For some reason, when I tried the same selector within my script below, it doesn't do the trick. It throws a timeout exception error instead. How can I fetch it?
This is my try:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
link = "https://in.finance.yahoo.com/quote/AAPL?p=AAPL"
def get_information(driver, url):
    driver.get(url)
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[id$='-QuoteModule'] p[class^='businessSummary']")))
    driver.execute_script("arguments[0].scrollIntoView();", item)
    print(item.text)

if __name__ == "__main__":
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 20)
    try:
        get_information(driver, link)
    finally:
        driver.quit()
It seems that there is no Business Summary block initially; it is generated after you scroll the page down. Try the solution below:
from selenium.webdriver.common.keys import Keys
def get_information(driver, url):
    driver.get(url)
    driver.find_element_by_tag_name("body").send_keys(Keys.END)
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[id$='-QuoteModule'] p[class^='businessSummary']")))
    print(item.text)
You have to scroll the page down twice until the element will be present:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

link = "https://in.finance.yahoo.com/quote/AAPL?p=AAPL"

def get_information(driver, url):
    driver.get(url)
    driver.find_element_by_tag_name("body").send_keys(Keys.END)  # scroll page
    time.sleep(1)  # small pause between
    driver.find_element_by_tag_name("body").send_keys(Keys.END)  # one more time
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[id$='-QuoteModule'] p[class^='businessSummary']")))
    driver.execute_script("arguments[0].scrollIntoView();", item)
    print(item.text)

if __name__ == "__main__":
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 20)
    try:
        get_information(driver, link)
    finally:
        driver.quit()
If you scroll only once, it won't work properly for some reason (at least for me). I think it depends on the window dimensions; on a smaller window you have to scroll more than on a bigger one.
Here is a much simpler approach using requests and working with the JSON data that is already in the page. I would also recommend always using requests if possible. It may take some extra work, but the end result is a lot more reliable / cleaner. You could also take my example a lot further and parse the JSON to work directly with it (you need to clean up the text to be valid JSON). In my example I just use split, which was faster to write, but it could lead to problems down the road when doing something more complex.
import requests
from lxml import html
url = 'https://in.finance.yahoo.com/quote/AAPL?p=AAPL'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
r = requests.get(url, headers=headers)
tree = html.fromstring(r.text)
data= [e.text_content() for e in tree.iter('script') if 'root.App.main = ' in e.text_content()][0]
data = data.split('longBusinessSummary":"')[1]
data = data.split('","city')[0]
print (data)
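For example, instead of split you could cut out the object assigned to root.App.main and parse it with json.loads. This is a sketch on a tiny made-up string standing in for the real script content (which is far larger); longBusinessSummary is the key that the split above targets:

```python
import json
import re

# Sketch: parse the embedded JSON instead of using split(). The raw
# string below is a tiny made-up stand-in for the real script content,
# which assigns a large object to root.App.main.
raw = 'root.App.main = {"longBusinessSummary": "Apple Inc. designs consumer electronics.", "city": "Cupertino"};'

# capture everything between "root.App.main = " and the trailing ";"
match = re.search(r'root\.App\.main\s*=\s*(\{.*\})\s*;', raw)
data = json.loads(match.group(1))
print(data['longBusinessSummary'])
```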