I linked two pictures below. Looking within both `a` tags, I want to extract only the 'quick apply' job postings, which are defined with target='_self', as opposed to the external apply postings, which are defined by target='_blank'. I want a conditional to exclude all the _blank postings. I'm confused, but I assume it would follow some logic like:
quick_apply = driver.find_element(By.XPATH, "//a[@data-automation='job-detail-apply']")
internal = driver.find_element(By.XPATH, "//a[@target='_self']")
if internal in quick_apply:
    quick_apply.click()
else:
    pass
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service

driver_service = Service(executable_path=r"C:\Program Files (x86)\chromedriver.exe")
driver = webdriver.Chrome(service=driver_service)
driver.maximize_window()  # load web driver
wait = WebDriverWait(driver, 5)

driver.get('https://www.seek.com.au/data-jobs-in-information-communication-technology/in-All-Perth-WA')
looking_job = [x.get_attribute('href') for x in driver.find_elements(By.XPATH, "//a[@data-automation='jobTitle']")]
for job in looking_job:
    driver.get(job)
    quick_apply = driver.find_element(By.XPATH, "//a[@data-automation='job-detail-apply']").click()
You can merge both conditions into a single XPath.
1. Use WebDriverWait() and wait for the element to be clickable.
2. Use a try..except block to check whether the element is there before clicking.
3. On some pages you will find two similar elements where only the last one is clickable; that's why you need the last() option to identify the element.
Code:
driver.get('https://www.seek.com.au/data-jobs-in-information-communication-technology/in-All-Perth-WA')
looking_job = [x.get_attribute('href') for x in driver.find_elements(By.XPATH, "//a[@data-automation='jobTitle']")]
for job in looking_job:
    driver.get(job)
    try:
        quick_apply = WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
            (By.XPATH, "(//a[@data-automation='job-detail-apply' and @target='_self'])[last()]")))
        quick_apply.click()
    except:
        print("No records found")
I'm a beginner at programming. I'm trying to get my Instagram follower list, but I only get 12 followers. I first tried clicking the box and scrolling down, but it didn't work.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import time
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
url = "https://www.instagram.com/"
driver.get(url)
time.sleep(1)

kullaniciAdiGir = driver.find_element(By.XPATH, "//*[@id='loginForm']/div/div[1]/div/label/input")
kullaniciAdiGir.send_keys("USERNAME")
sifreGir = driver.find_element(By.XPATH, "//input[@name='password']")
sifreGir.send_keys("PASS")
girisButonu = driver.find_element(By.XPATH, "//*[@id='loginForm']/div/div[3]/button/div").click()
time.sleep(5)

driver.get(url="https://www.instagram.com/USERNAME/")
time.sleep(3)
kutucuk = driver.get(url="https://www.instagram.com/USERNAME/followers/")
time.sleep(5)
box = driver.find_element(By.XPATH, "//div[@class='xs83m0k xl56j7k x1iy3rx x1n2onr6 x1sy10c2 x1h5jrl4 xieb3on xmn8rco x1hfn5x7 x13wlyjk x1v7wizp x1l0w46t xa3vuyk xw8ag78']")
box.click()
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)

takipciler = driver.find_elements(By.CSS_SELECTOR, "._ab8y._ab94._ab97._ab9f._ab9k._ab9p._abcm")
for takipci in takipciler:
    print(takipci.text)
    time.sleep(10)
How can I fix it? How can I scroll down in the box? Thanks.
You can select multiple elements with this:
# get all followers
followers = driver.find_elements(By.CSS_SELECTOR, "._ab8y._ab94._ab97._ab9f._ab9k._ab9p._abcm")
# loop over each follower
for user in followers:
    pass  # do something here
Using CSS selectors is, in my opinion, much easier.
Also note I used find_elements, not find_element, as the latter only returns a single result.
As the data is loaded dynamically, you will in fact have to scroll to make the site load more results, then compare what you already have against what's loaded, probably by executing some JavaScript, such as scrolling the last element in that container into view.
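That scroll-and-compare loop can be sketched as a small helper. This is only a sketch of the pattern, not Instagram-specific code; the Selenium wiring in the comments is illustrative, and the selectors there are made up (Instagram's class names change constantly):

```python
def scroll_until_stable(scroll_once, count_items, max_rounds=50):
    """Call scroll_once() repeatedly until count_items() stops growing."""
    prev = -1
    for _ in range(max_rounds):
        cur = count_items()
        if cur == prev:  # nothing new loaded, so we are done
            break
        prev = cur
        scroll_once()    # real code should also wait here for results to load
    return prev

# With Selenium it could be wired up roughly like this (hypothetical selectors):
# dialog = driver.find_element(By.CSS_SELECTOR, "div[role='dialog']")
# scroll_until_stable(
#     scroll_once=lambda: driver.execute_script(
#         "arguments[0].scrollTop = arguments[0].scrollHeight;", dialog),
#     count_items=lambda: len(dialog.find_elements(By.CSS_SELECTOR, "a[role='link']")),
# )
```

The helper stops either when a scroll loads no new items or after max_rounds passes, so it cannot loop forever on a page that keeps loading.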
Or take a look here for an alternative solution:
https://www.folkstalk.com/2022/10/how-to-get-a-list-of-followers-on-instagram-python-with-code-examples.html
Or look at Instagram's API; there is probably something in there for getting your followers.
So this question has been asked before, but I am still struggling to get it working.
The webpage has a table with links, and I want to iterate through it, clicking each of the links.
This is my code so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")
try:
    element = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, "table-scroll")))
    table = element.find_elements_by_xpath("//table//tbody/tr")
    for row in table[1:]:
        print(row.get_attribute('innerHTML'))
        # link.click()
finally:
    driver.close()
Sample of output
<td>FOUR</td>
<td>4imprint Group plc</td>
<td>Media & Publishing</td>
<td>888</td>
<td>888 Holdings</td>
<td>Hotels & Entertainment Services</td>
<td>ASL</td>
<td>Aberforth Smaller Companies Trust</td>
<td>Collective Investments</td>
How do I click the href and iterate to the next one?
Many thanks.
Edit: I went with this solution (a few small tweaks on Prophet's solution):
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException
import time
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")
actions = ActionChains(driver)

# close the cookies banner
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "ensCloseBanner"))).click()
# wait for the first link in the table
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
# extra wait to make sure all the links are loaded
time.sleep(1)
# get the total number of links
links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
for index, val in enumerate(links):
    try:
        # get the links again after getting back to the initial page in the loop
        links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
        # scroll to the n-th link; it may be out of the initially visible area
        actions.move_to_element(links[index]).perform()
        links[index].click()
        # scrape the data on the new page, then go back with the following command
        driver.execute_script("window.history.go(-1)")  # alternatively: driver.back()
        WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
        time.sleep(2)
    except StaleElementReferenceException:
        pass
To perform what you want to do here, you first need to close the cookies banner at the bottom of the page.
Then you can iterate over the links in the table.
Since clicking each link opens a new page, after scraping the data there you will have to go back to the main page to get the next link. You cannot just collect all the links into a list and then iterate over that list, because by navigating to another web page all the elements Selenium grabbed on the initial page become stale.
Your code can be something like this:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time

driver = webdriver.Chrome(executable_path=r'C:\Users\my_path\chromedriver_96.exe')
driver.get(r"https://www.fidelity.co.uk/shares/ftse-350/")
actions = ActionChains(driver)

# close the cookies banner
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "ensCloseBanner"))).click()
# wait for the first link in the table
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
# extra wait to make sure all the links are loaded
time.sleep(1)
# get the total number of links
links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
for index, val in enumerate(links):
    # get the links again after getting back to the initial page in the loop
    links = driver.find_elements_by_xpath('//table//tbody/tr/td/a')
    # scroll to the n-th link; it may be out of the initially visible area
    actions.move_to_element(links[index]).perform()
    links[index].click()
    # scrape the data on the new page, then go back with the following command
    driver.execute_script("window.history.go(-1)")  # alternatively: driver.back()
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table//tbody/tr/td/a")))
    time.sleep(1)
You basically have to do the following:
Click the cookies button if available.
Get all the links on the page.
Iterate over the list of links: scroll to each web element, click it, and then navigate back to the original screen.
Code:
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get("https://www.fidelity.co.uk/shares/ftse-350/")
try:
    wait.until(EC.element_to_be_clickable((By.ID, "ensCloseBanner"))).click()
    print('Clicked on the cookies button')
except:
    print('Could not click on the cookies button')
    pass
driver.execute_script("window.scrollTo(0, 750)")
try:
    all_links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table//tbody/tr/td/a")))
    print("We have got to deal with", len(all_links), 'links')
    j = 0
    for link in range(len(all_links)):
        links = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table//tbody/tr/td/a")))
        driver.execute_script("arguments[0].scrollIntoView(true);", links[j])
        time.sleep(1)
        links[j].click()
        # here write the code to scrape something once the click is performed
        time.sleep(1)
        driver.execute_script("window.history.go(-1)")
        j = j + 1
        print(j)
except:
    print('Bot could not execute all the links properly')
    pass
Import:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
PS: to handle stale element references you'd have to define the list of web elements again inside the loop.
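That re-find-inside-the-loop advice generalises to a small retry helper. This is only a sketch of the pattern; the exception class below is a stand-in (in real code you would import StaleElementReferenceException from selenium.common.exceptions), and the callables would wrap real Selenium calls:

```python
class StaleElementReferenceException(Exception):
    """Stand-in for selenium.common.exceptions.StaleElementReferenceException."""

def retry_on_stale(action, refetch, retries=3):
    """Run action(); if the element went stale, refetch and try again."""
    for _ in range(retries):
        try:
            return action()
        except StaleElementReferenceException:
            refetch()  # e.g. re-run driver.find_elements(...) here
    raise StaleElementReferenceException("still stale after %d retries" % retries)
```

Usage would look like retry_on_stale(lambda: links[j].click(), lambda: refresh_links()), keeping the retry bookkeeping out of the scraping loop itself.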
I am using selenium to try to scrape data from a website (https://www.mergentarchives.com/), and I am attempting to get the innerText from this element:
<div class="x-paging-info" id="ext-gen200">Displaying reports 1 - 15 of 15</div>
This is my code so far:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
driver.maximize_window()
search_url = 'https://www.mergentarchives.com/search.php'
driver.get(search_url)
assert 'Mergent' in driver.title
company_name_input = '//*[#id="ext-comp-1009"]'
search_button = '//*[#id="ext-gen287"]'
driver.implicitly_wait(10)
driver.find_element_by_xpath(company_name_input).send_keys('3com corp')
driver.find_element_by_xpath(search_button).click()
driver.implicitly_wait(20)
print(driver.find_element_by_css_selector('#ext-gen200').text)
Basically I am just filling out a search form, which works, and it takes me to a search results page, where the number of results is listed in a div element. When I attempt to print the text of this element, I simply get a blank space; nothing is written and there is no error.
[Finished in 21.1s]
What am I doing wrong?
I think you may need an explicit wait:
wait = WebDriverWait(driver, 10)
info = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[#class = 'x-paging-info' and #id='ext-gen200']"))).get_attribute('innerHTML')
print(info)
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You may need to add a condition verifying whether the search results have loaded, and once they are loaded you can use the code below:
print(driver.find_element_by_id('ext-gen200').text)
My code looks like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t

PATH = r"D:\CDriver\chromedriver.exe"
driver = webdriver.Chrome(PATH)
website = "https://jobs.siemens.com/jobs?page=1"
driver.get(website)
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "_ngcontent-wfx-c163="""))
    )
    print(element.text)
except:
    driver.quit()
driver.quit()
I'm trying to get the 6 numbers inside <span _ngcontent-wfx-c163="">215022</span> but can't seem to get it working. Many others have had problems using span, but they had a class inside it; mine doesn't.
How can I print the contents of the span tag that I have bolded?
If you are looking to extract the req ID, you can use the below CSS_SELECTOR:
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "p.req-id.ng-star-inserted>span")))
Note that there are 10 spans for the req ID. You may use find_elements instead of find_element, or perhaps EC.presence_of_all_elements_located, which will give you a list object; you can manipulate the list as per your requirement.
Read more about their difference here.
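The practical difference between the two calls can be shown without a browser by mimicking their contracts with plain lists (the exception class and functions below are stand-ins, not Selenium's API): find_element returns the first match and raises when there is none, while find_elements always returns a list, possibly empty.

```python
class NoSuchElementException(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""

def find_element(matches):
    """find_element contract: first match only; raises if nothing matched."""
    if not matches:
        raise NoSuchElementException("no such element")
    return matches[0]

def find_elements(matches):
    """find_elements contract: every match, possibly an empty list."""
    return list(matches)
```

This is why find_elements is the safer choice when zero matches is a normal outcome: an empty list simply makes the loop body run zero times instead of raising.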
I've created a Python script together with Selenium to parse specific content from a webpage. I can get this result, AARONS INC, located under QUOTE, in many different ways, but the way I wish to scrape it is by using a pseudo selector, which unfortunately Selenium doesn't support. The commented-out line within the script below shows that Selenium doesn't support pseudo selectors.
However, when I use a pseudo selector within driver.execute_script(), I can parse it flawlessly. To make this work I had to use a hardcoded delay for the element to become available. Now I wish to do the same by wrapping driver.execute_script() within an Explicit Wait condition.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 20)
driver.get("https://www.nyse.com/quote/XNYS:AAN")
time.sleep(15)
# item = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span:contains('AARONS')")))
item = driver.execute_script('''return $('span:contains("AARONS")')[0];''')
print(item.text)
How can I wrap driver.execute_script() within Explicit Wait condition?
This is one of the ways you can achieve that. Give it a shot.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

with webdriver.Chrome() as driver:
    wait = WebDriverWait(driver, 10)
    driver.get('https://www.nyse.com/quote/XNYS:AAN')
    item = wait.until(
        lambda driver: driver.execute_script('''return $('span:contains("AARONS")')[0];''')
    )
    print(item.text)
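This works because WebDriverWait.until simply polls its callable until the return value is truthy: the element found by execute_script is truthy, while JavaScript's undefined comes back as Python's None and keeps the loop going. A stripped-down sketch of that polling loop (Selenium's real implementation also handles a list of ignored exceptions):

```python
import time

def until(condition, timeout=10.0, poll=0.5):
    """Minimal WebDriverWait.until: poll condition() until it is truthy."""
    deadline = time.time() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.time() > deadline:
            raise TimeoutError("condition never became truthy")
        time.sleep(poll)
```

So any callable taking the driver works as a condition, not just the helpers in expected_conditions; that is what makes the lambda wrapper above possible.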
You could do the whole thing in a browser script instead, which is probably safer:
item = driver.execute_async_script("""
    var span, interval = setInterval(() => {
        if(span = $('span:contains("AARONS")')[0]){
            clearInterval(interval)
            arguments[0](span)
        }
    }, 1000)
""")
Here is a simple approach.
url = 'https://www.nyse.com/quote/XNYS:AAN'
driver.get(url)
# wait for the element to be present
ele = WebDriverWait(driver, 30).until(
    lambda driver: driver.execute_script('''return $('span:contains("AARONS")')[0];'''))
# print the text of the element
print(ele.text)