Wait for every element in Selenium - python

I want to retrieve every div class='abcd' element from a website using Selenium together with the waiter and XPATH helpers from the explicit package.
The source code is something like this:
<div class='abcd'>
<a> Something </a>
</div>
<div class='abcd'>
<a> Something else </a>
...
When I run the following code (Python) I get only 'Something' as a result. I'd like to iterate over every instance of the div class='abcd' appearing in the source code of the website.
from explicit import waiter, XPATH
from selenium import webdriver
driver = webdriver.Chrome(PATH)
result = waiter.find_element(driver, "//div[@class='abcd']/a", by=XPATH).text
Sorry if the explanation isn't very technical; I'm only starting with web scraping. Thanks!

I've used it like this; you can use the same approach if you like this procedure.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(PATH)
css_selector = "div.abcd"
results = WebDriverWait(driver, 10).until(expected_conditions.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector)))
for result in results:
    print(result.text)
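If you prefer to keep the XPath from the question, a minimal sketch with plain Selenium works the same way; the key change is waiting for all matching elements (plural) rather than grabbing the first one. The 10-second timeout is an arbitrary choice:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(PATH)  # PATH as in the question
wait = WebDriverWait(driver, 10)

# Wait until every matching anchor is present, then collect the texts
elements = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='abcd']/a")))
print([element.text for element in elements])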

Related

How to perform click on element that inside flexbox

I'm working on a web-scraping project and I've run into a problem: I couldn't locate the element (1H) using find_element_by_xpath/id/css-selector/class_name and perform click() on it. Does anyone have any ideas how to make it work? Thanks in advance!
Here's the relevant part of my code:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from datetime import timedelta, date
import time
import datetime
import re
import mouse
def scrape():
    website = 'https://www.binance.com/en/trade-margin/BTC_USDT'
    path = '/Users/admin/Downloads/chromedriver'
    driver = webdriver.Chrome(path)
    driver.get(website)
    driver.maximize_window()
    # click on 1H
    actions = ActionChains(driver)
    one_hour = driver.find_element_by_xpath('//div[@class="css-e2pgpg"][@id="1h"]')
    actions.click(one_hour).perform()
Here's the HTML (it is a flexbox):
<div class="css-e2pgpg">
    <div id="Time" class="css-1pj8e72">Time</div>
    <div id="15m" class="css-1pj8e72">15m</div>
    <div id="1h" class="css-ktyfxp">1H</div> <!-- I want to locate this element and perform a click on it -->
    <div id="4h" class="css-1pj8e72">4H</div>
    <div id="1d" class="css-1pj8e72">1D</div>
    <div id="1w" class="css-1pj8e72">1W</div>
</div>
I went to see the webpage myself. In my opinion, you don't have to locate the flexbox itself.
Just try with:
driver.find_element_by_id('1h').click()
Also, I would add some kind of wait until the page loads fully. You can use time.sleep, an implicit wait, or an explicit wait for an element.
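A minimal sketch of that suggestion, waiting until the button is clickable before clicking it (the 10-second timeout is an arbitrary choice):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait until the 1H button is clickable, then click it
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, '1h'))).click()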
If you are just looking to click on the 1H web element, you can do it using the code below. We have to induce an explicit wait to get the job done.
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.binance.com/en/trade-margin/BTC_USDT")
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, "//div[text()='1H']"))).click()
    print("Clicked")
except:
    pass
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
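As a side note, the bare except above swallows every failure silently; a slightly safer sketch of the same block catches only the timeout:
from selenium.common.exceptions import TimeoutException

try:
    wait.until(EC.element_to_be_clickable((By.XPATH, "//div[text()='1H']"))).click()
    print("Clicked")
except TimeoutException:
    print("1H button never became clickable")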

Working with Selenium, clicking on JS as href link

I was hoping for some clarification or suggestions on how to get Selenium working with a hyperlink. I have tried selecting it by pretty much every locator possible. I have also attached an image with the source to look at and compare...
Example of html:
<div class="x-grid3-cell-inner x-grid3-col-ACTION_COLUMN" id="5005a00001rJ22q_ACTION_COLUMN"><span>Edit</span> | </div>
Examples:
#clicky = driver.find_element_by_xpath('//a[contains(@href,"javascript:srcUp(%27%2F5005a00001pKfzm%3Fisdtp%3Dvw%27);")]').click()
#clicky = driver.find_elements_by_partial_link_text('javascript:srcUp(%27%2F5005a00001pKfzm%3Fisdtp%3Dvw%27);').click()
#clicky = driver.find_element_by_xpath("//a[@href='javascript:srcUp(%27%2F5005a00001pKfzm%3Fisdtp%3Dvw%27);']").click()
Try this:
driver.find_element_by_xpath("//div[.//a[contains(@class,'chatterFollowUnfollowAction')]]//a[contains(@href,'javascript:srcUp')]").click()
In order to add an explicit wait before accessing this element, use this:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 20)
wait.until(EC.visibility_of_element_located((By.XPATH, "//div[.//a[contains(@class,'chatterFollowUnfollowAction')]]//a[contains(@href,'javascript:srcUp')]"))).click()
Try this:
someElement = driver.find_element_by_xpath("//a[contains(@class, 'chatterFollowUnfollowAction') and contains(@title, 'Follow this case')]")
driver.execute_script("arguments[0].click();", someElement)
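A JavaScript click like this bypasses Selenium's visibility and interactability checks, which is why it often works when a plain .click() fails on an overlapped link. If the element only needs to be brought into view, a lighter-touch sketch is to scroll first and then click normally (someElement as located above):
# Scroll the element into view, then try the regular Selenium click
driver.execute_script("arguments[0].scrollIntoView(true);", someElement)
someElement.click()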

How to handle drop down list with div tag and multiple attributes?

I am trying to automate the location selection process; however, I am struggling with it.
So far, I can only open the menu and select the first item.
And my code is:
import bs4
from bs4 import BeautifulSoup
import selenium
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(ChromeDriverManager().install())
url = 'https://www.ebay.com/b/Food-Beverages/14308/bn_1642947?listingOnly=1&rt=nc'
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
button = driver.find_element_by_id('gh-shipto-click')  # find the location button
button.click()
button2 = driver.find_element_by_id('gh-shipto-click-body-cnt')  # open the menu
button2.click()
driver.find_element(By.XPATH, "//div[@role='menuitemradio']").click()  # choose the first location
I believe the attribute "data-makeup-index" (shown in the pic) would help, but I don't know how to use it.
Since some of you may not be able to find the "Ship to" button, here is the HTML code I copied from the web.
<li id="gh-shipto-click" class="gh-eb-li">
<div class="gh-menu">
<button _sp="m570.l46241" title="Ship to" class="gh-eb-li-a gh-icon" aria-expanded="false" aria-controls="gh-shipto-click-o" aria-label="Ship to Afghanistan"><i class="flgspr gh-flag-bg flaaf"></i><span>Ship to</span></button>
<i class="flgspr gh-flag-bg flaaf"></i>
I have found my answer as below:
driver.find_element(By.XPATH, "//div[@role='menuitemradio'][{an integer}]").click()
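If you would rather pick a country through the data-makeup-index attribute mentioned in the question than by a positional index, a hedged sketch (this assumes the attribute holds the item's index as a string, which is worth verifying in the live DOM):
# Select the menu item whose data-makeup-index is 3 (hypothetical value)
driver.find_element(By.XPATH, "//div[@role='menuitemradio'][@data-makeup-index='3']").click()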
Although the Ship to button was not visible for my location, I inspected the page from a different geolocation, and I was able to find this element with style="display: none;".
I removed the none temporarily, and the element got displayed:
<li id="gh-shipto-click" class="gh-eb-li gh-modal-active" style="display:none;" aria-hidden="false">
You could find the element by these XPaths and handle the dropdown:
# This XPath takes you to the specific div where you need to provide the element index [1][2][3]; the ID in this XPath is dynamic
driver.find_element_by_xpath("((//*[contains(@id,'nid-')])[2])//child::div/div")
driver.find_element_by_xpath("//*[@class='cn']")
I know you got your solution, but this is what I've tried; it might help.

Use Python to Scrape for Data in Family Search Records

I am trying to scrape the following record table in familysearch.org. I am using the Chrome webdriver with Python, using BeautifulSoup and Selenium.
Upon inspecting the page I am interested in, I wanted to scrape the following bit of HTML. Note this is only one element of a familysearch.org table that has 100 names.
<span role="cell" class="td " name="name" aria-label="Name"> <dom-if style="display: none;"><template is="dom-if"></template></dom-if> <dom-if style="display: none;"><template is="dom-if"></template></dom-if> <span><sr-cell-name name="Jame Junior " url="ZS" relationship="Principal" collection-name="Index"></sr-cell-name></span> <dom-if style="display: none;"><template is="dom-if"></template></dom-if> </span>
Alternatively, the name also shows in this bit of HTML
<a class="name" href="/ark:ZS">Jame Junior </a>
From all of this, I only want to get the name "Jame Junior". I have tried using driver.find_elements_by_class_name("name"), but it prints nothing.
This is the code I used
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import pandas as pd
from getpass import getpass
username = input("Enter Username: ")
password = getpass("Enter Password: ")
chrome_path= r"C:\Users...chromedriver_win32\chromedriver.exe"
driver= webdriver.Chrome(chrome_path)
driver.get("https://www.familysearch.org/search/record/results?q.birthLikeDate.from=1996&q.birthLikeDate.to=1996&f.collectionId=...")
usernamet = driver.find_element_by_id("userName")
usernamet.send_keys(username)
passwordt = driver.find_element_by_id("password")
passwordt.send_keys(password)
login = driver.find_element_by_id("login")
login.submit()
driver.get("https://www.familysearch.org/search/record/results?q.birthLikeDate.from=1996&q.birthLikeDate.to=1996&f.collectionId=.....")
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, "name")))
#for tag in driver.find_elements_by_class_name("name"):
#    print(tag.get_attribute('innerHTML'))
soup = BeautifulSoup(driver.page_source, "html.parser")
for tag in soup.find_all("sr-cell-name"):
    print(tag["name"])
Try to access the sr-cell-name tag.
Selenium:
for tag in driver.find_elements_by_tag_name("sr-cell-name"):
    print(tag.get_attribute("name"))
BeautifulSoup:
for tag in soup.find_all("sr-cell-name"):
    print(tag["name"])
EDIT: You might need to wait for the element to fully appear on the page before parsing it. You can do this using the presence_of_element_located method:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("...")
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, "name")))
for tag in driver.find_elements_by_class_name("name"):
    print(tag.get_attribute('innerHTML'))
I was looking to do something very similar and have semi-decent Python/Selenium scraping experience. Long story short, FamilySearch (and many other sites, I'm sure) uses some technology (I'm not a JS or web guy) that involves a shadow DOM with shadow hosts. The tags are essentially invisible to BeautifulSoup or Selenium.
Solution: pyshadow
https://github.com/sukgu/pyshadow
You may also find this link helpful:
How to handle elements inside Shadow DOM from Selenium
I have now been able to successfully find elements I couldn't before, but am still not all the way where I'm trying to get. Good luck!
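For what it's worth, here is a minimal pyshadow sketch for the table above (untested against FamilySearch itself; the login step and URL are elided as in the question):
from selenium import webdriver
from pyshadow.main import Shadow

driver = webdriver.Chrome()
driver.get("...")  # log in and open the results page first, as in the question
shadow = Shadow(driver)

# pyshadow pierces shadow roots that ordinary Selenium locators cannot reach
for tag in shadow.find_elements("sr-cell-name"):
    print(tag.get_attribute("name"))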

Not able to Scrape data using BeautifulSoup

I'm using Selenium to log in to the webpage and fetch it for scraping.
I'm able to get the page.
I have searched the HTML for a table that I want to scrape; here it is:
<table cellspacing="0" class=" tablehasmenu table hoverable sensors" id="table_devicesensortable">
This is the script:
rawpage = driver.page_source  # storing the webpage in a variable
souppage = BeautifulSoup(rawpage, 'html.parser')  # parsing the webpage
tbody = souppage.find('table', attrs={'id': 'table_devicesensortable'})  # scraping the table
I'm able to get the parsed webpage in the souppage variable, but I'm not able to scrape the table and store it in the tbody variable.
The required table might be generated dynamically, so you need to wait for its presence on the page:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
tbody = wait(driver, 10).until(EC.presence_of_element_located((By.ID, "table_devicesensortable")))
Also note that there is no need to use BeautifulSoup, as Selenium has enough built-in methods and properties to do the same job for you, e.g.
headers = tbody.find_elements_by_tag_name("th")
rows = tbody.find_elements_by_tag_name("tr")
cells = tbody.find_elements_by_tag_name("td")
cell_values = [cell.text for cell in cells]
etc...
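Building on that, a minimal sketch that turns the waited-for table into a list of row dictionaries (this assumes the first tr holds the th headers, which is worth checking against the real markup):
# Read the header names once, then map each data row onto them
header_names = [th.text for th in tbody.find_elements_by_tag_name("th")]
rows = []
for row in tbody.find_elements_by_tag_name("tr")[1:]:  # skip the header row
    values = [td.text for td in row.find_elements_by_tag_name("td")]
    if values:
        rows.append(dict(zip(header_names, values)))
print(rows)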
I was searching stackoverflow for this issue and came across this post:
BeautifulSoup returning none when element definitely exists
By reading the answer provided by luiyezheng I got the hint that the data might be fetched dynamically. So the table might get created dynamically, and hence I was unable to find it.
So the workaround is: before storing the webpage, I put in a delay. The code goes like this:
time.sleep(4)
rawpage = driver.page_source  # storing the webpage in a variable
souppage = BeautifulSoup(rawpage, "html.parser")  # parsing the webpage
tbody = souppage.find("table", {"id": "table_devicesensortable"})  # scraping the table
I hope it might help someone.
As per the HTML you have shared, to scrape the <table> you have to induce WebDriverWait with the expected_conditions clause set to presence_of_element_located, and to achieve that you can use either of the following code blocks:
Using class:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//table[@class=' tablehasmenu table hoverable sensors' and @id='table_devicesensortable']")))
rawpage = driver.page_source  # storing the webpage in a variable
souppage = BeautifulSoup(rawpage, "html.parser")  # parsing the webpage
tbody = souppage.find("table", {"class": " tablehasmenu table hoverable sensors"})  # scraping the table
Using id:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//table[@class=' tablehasmenu table hoverable sensors' and @id='table_devicesensortable']")))
rawpage = driver.page_source  # storing the webpage in a variable
souppage = BeautifulSoup(rawpage, "html.parser")  # parsing the webpage
tbody = souppage.find("table", {"id": "table_devicesensortable"})  # scraping the table
