I'm trying to scrape some profiles of people on LinkedIn for a specific job. To do this, I was trying to find the People button and click it so that I only look at the relevant people.
The path is as follows:
From the signed-out LinkedIn home page -> I sign in and land on the LinkedIn home page -> I type "hr" in the search bar and hit Enter.
In the result page of hr, on the left side of the page, there is a navigation list that says "On this page". One of the options includes "People" and that is what I want to target.
The link to the page is: https://www.linkedin.com/search/results/all/?keywords=hr&origin=GLOBAL_SEARCH_HEADER&sid=Xj2
The HTML of the button for 'People' in the navigation list is:
<li>
<button aria-current="false" class="search-navigation-panel_button" data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ==" role="link" type="button"> People </button>
</li>
I have tried to find this button through By.LINK_TEXT using the keyword People, but it did not work. I have also tried By.XPATH with "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']", but that does not find it either.
How can I make selenium find this custom attribute so I can find this button through data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ=="?
Another issue that I am having is that I can target all the relevant people on the page and loop through them but I cannot extract the link of each of the profiles. It only takes the first link of the first person and never updates the variable again through the loop.
For example, if the first person is Ian, and the second is Brian, it gives me the link for Ian's profile even if 'users' is Brian.
Debugging the loop I can see the correct list of people in all_users but it only gets the href of the first person in the list and never updates.
Here is the code of that:
all_users = driver.find_elements(By.XPATH, "//*[contains(@class, 'entity-result__title-line entity-result__title-line--2-lines')]")
for users in all_users:
    print(users)
    get_links = users.find_element(By.XPATH, "//*[contains(@href, 'miniProfileUrn')]")
    print(get_links.get_attribute('href'))
I have also tried to do By.XPATH "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']" but it also does not find it.
The data-target-section-id that you mention is not the same as the one that the button has (PTFmMNSPSz2LQRzwynhRBQ==). Check that this is not dynamic before targeting it.
Your xPath is not bad but as I told you, fix the target-id:
driver.find_element(By.XPATH, "//button[@data-target-section-id='PTFmMNSPSz2LQRzwynhRBQ==']").click()
Where "driver" is your WebDriver instance.
Given the HTML:
<li>
<button aria-current="false" class="search-navigation-panel_button" data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ==" role="link" type="button"> People </button>
</li>
The data-target-section-id attribute values like PTFmMNSPSz2LQRzwynhRBQ== are dynamically generated and are bound to change sooner or later. They may change the next time you access the application afresh, or even the next time the application starts up, so they can't be used in locators.
Solution
The desired element is dynamic, so to click on it you need to induce WebDriverWait for element_to_be_clickable(), and you can use either of the following locator strategies:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.search-navigation-panel_button[data-target-section-id]"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='search-navigation-panel_button' and @data-target-section-id][contains(., 'People')]"))).click()
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
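Because the attribute value changes between sessions, one option is to key the locator on the visible label instead. The sketch below is only an illustration: button_by_label_xpath is a hypothetical helper (not part of Selenium) that builds an XPath matching a button by its whitespace-trimmed text. Trimming matters here because the HTML above shows the label as " People " with surrounding spaces.

```python
def button_by_label_xpath(label, required_attr=None):
    """Build an XPath for a <button> whose trimmed text equals `label`.

    `required_attr` optionally requires that an attribute is present,
    whatever its (possibly dynamic) value is.
    """
    if "'" in label:
        # Keep the sketch simple: no escaping of quotes inside the label.
        raise ValueError("label must not contain a single quote")
    xpath = f"//button[normalize-space()='{label}']"
    if required_attr:
        xpath += f"[@{required_attr}]"
    return xpath


people_xpath = button_by_label_xpath("People", "data-target-section-id")
print(people_xpath)

# Usage with the imports listed above (assumes an existing `driver`):
# WebDriverWait(driver, 20).until(
#     EC.element_to_be_clickable((By.XPATH, people_xpath))
# ).click()
```

The attribute is only tested for presence, so the locator should keep working when its value is regenerated.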
If you want to locate several elements with the same attribute replace find_element with find_elements. See if that works to find not just the first element matching your search, but all elements with that attribute.
Review the Selenium: Locating Elements documentation and see if you can try each and every option they have for locating elements.
Something else to try:
button_element = driver.find_element(By.XPATH, "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']")
button_element.click()
It looks like the reason your People button locator isn't working is because the data-target-section-id is dynamic. Mine is showing as hopW8RkwTN2R9dPgL6Fm/w==. We can get around that by using an XPath to locate the element based on the text contained, "People", e.g.
//button[text()='People']
Turns out that matches two elements on the page because many of the left nav links are repeated as rounded buttons on the top of the page so we can further refine our locator to
//button[text()='People'][@data-target-section-id]
Having said that, that link only scrolls the page so you don't really need to click that.
From there, you want to get the links to each person listed under the People heading. We first need the DIV that contains the People section. It's kinda messy because the IDs on those elements are also dynamic so we need to find the H2 that contains "People" and then work our way back up the DOM to the DIV that contains only that section. We can get that using the XPath below
//div[@class='search-results-container']/div[.//h2[text()='People']]
From there, we want all of the A tags that uniquely link to a person... and there's a lot of A tags in that section but most are not ones we want so we need to do more filtering. I found that the below XPath locates each unique URL in that section.
//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]
Combining the two XPaths, we get
//div[@class='search-results-container']/div[.//h2[text()='People']]//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]
which locates all unique URLs belonging to a person in the People section of the page.
Using this, your code would look like
all_users = driver.find_elements(By.XPATH, "//div[@class='search-results-container']/div[.//h2[text()='People']]//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]")
for user in all_users:
    print(user.get_attribute('href'))
NOTE: The reason your code was only returning the first href repeatedly is that you are searching from an existing element with an XPath that starts with "//", which searches the whole document rather than just that element. You need to add a "." at the start of the XPath to start searching from the referenced element.
get_links = users.find_element(By.XPATH, ".//*[contains(@href, 'miniProfileUrn')]")
                                          ^ add period here
I've eliminated that step in my code so you won't need it there.
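If you do keep the per-element search from the question, the "." fix can be wrapped in a small helper. This is a minimal sketch, not part of Selenium: make_relative and hrefs_within are hypothetical names, and the XPATH constant simply holds the string that By.XPATH stands for, so the block has no Selenium import of its own.

```python
XPATH = "xpath"  # the string value of selenium's By.XPATH


def make_relative(xpath):
    """Prefix an absolute XPath with '.' so it is evaluated from the context element."""
    return "." + xpath if xpath.startswith("/") else xpath


def hrefs_within(cards, link_xpath):
    """Return one href per card, searching only inside each card."""
    relative = make_relative(link_xpath)
    return [card.find_element(XPATH, relative).get_attribute("href") for card in cards]


# Usage (assumes `all_users` from the question's code):
# for href in hrefs_within(all_users, "//*[contains(@href, 'miniProfileUrn')]"):
#     print(href)
```

Each card is searched on its own, so the second card should yield the second person's link instead of repeating the first match in the document.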
Related
I have a list of domains that I would like to loop over and screenshot using selenium. However, the cookie consent column means the full page is not viewable. Most of them have different consent buttons - what is the best way of accepting these? Or is there another method that could achieve the same results?
urls for reference: docjournals.com, elcomercio.com, maxim.com, wattpad.com, history10.com
You'll need to click accept individually for every website.
You can do that, using
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, "your_XPATH_locator").click()
To get around the XPATH selectors varying from page to page you can use
driver.current_url and use the url to figure out which selector you need to use.
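As a rough sketch of that idea, the lookup can be keyed on the host name taken from driver.current_url. The names CONSENT_SELECTORS and consent_selector_for are hypothetical, and the selector strings are placeholders: the real XPath for each site still has to be found by inspecting it.

```python
from urllib.parse import urlparse

# Hypothetical placeholders, one XPath per site.
CONSENT_SELECTORS = {
    "docjournals.com": "example_selector_1",
    "elcomercio.com": "example_selector_2",
}


def consent_selector_for(url, selectors=CONSENT_SELECTORS):
    """Return the consent-button selector registered for the URL's host, or None."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return selectors.get(host)


# Usage (assumes an existing `driver` that has already loaded a page):
# selector = consent_selector_for(driver.current_url)
# if selector is not None:
#     driver.find_element(By.XPATH, selector).click()
```

Returning None for unknown hosts lets the caller skip the click instead of raising on sites that have no registered selector.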
Or alternatively if you iterate over them anyways you can do it like this:
page_1 = {
    'url': 'https://docjournals.com',  # driver.get() needs a full URL; https is assumed here
    'selector': 'example_selector_1',
}
page_2 = {
    'url': 'https://elcomercio.com',
    'selector': 'example_selector_2',
}
pages = [page_1, page_2]
for page in pages:
    driver.get(page['url'])
    driver.find_element(By.XPATH, page['selector']).click()
From the snapshot, as you can observe, different URLs have different consent buttons. They may vary with respect to:
innerText
tag
attributes
implementation (iframe / shadowRoot)
Conclusion
There can't be a generic solution to accept/deny the cookie consent, as at times:
You may need to induce WebDriverWait for the element_to_be_clickable() and click on the consent.
You may need to switch to an iframe. See: Unable to locate cookie acceptance window within iframe using Python Selenium
You may need to traverse within a shadowRoot. See: How to get past a cookie agreement page using Python and Selenium?
I am trying to scrape a website with product listings that if clicked on redirect the user to a new tab with further information/contact the seller details. I am trying to retrieve said URL without actually having to click on each listing in the catalog and wait for the page to load as this would take a lot of time.
I have searched in the web inspector for the "href", but the only link available is to the image source of each listing. However, I noticed that after clicking each element, a GET request is sent to this URL (https://api.wallapop.com/api/v3/items/v6g2v4y045ze?language=es), which contains pretty much all the information I need. I'm not sure if it's of any use, but it's the furthest I've gotten.
UPDATE: I tried the code I was suggested (with modifications to specifically find the 'href' attributes in the clickable elements), but I get None returned. I have been looking into finding an 'onclick' element or something similar that might have what I'm looking for, but so far it looks like the solution will end up being clicking each element and extracting all the information from there.
elements123 = driver.find_elements(By.XPATH, '//a[contains(@class,"ItemCardList__item")]')
for e in elements123:
    print(e.get_attribute('href'))
I appreciate any insights, thank you in advance.
You need something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://google.com")
# Get all the elements available with tag name 'a'
elements = driver.find_elements(By.TAG_NAME, 'a')
for e in elements:
    print(e.get_attribute('href'))
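If the listing anchors turn out to carry no href (the UPDATE in the question reports None), another avenue is the API URL the question observed. The sketch below only rebuilds that URL shape from an item ID. item_api_url is a hypothetical helper, and where the ID can be read from the listing markup is not established in this thread, nor is whether the endpoint answers requests made outside the browser.

```python
from urllib.parse import quote, urlencode

# Base path copied from the request URL quoted in the question.
API_BASE = "https://api.wallapop.com/api/v3/items"


def item_api_url(item_id, language="es"):
    """Rebuild the per-item API URL seen in the browser's network tab."""
    return f"{API_BASE}/{quote(item_id, safe='')}?{urlencode({'language': language})}"


print(item_api_url("v6g2v4y045ze"))
```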
I am trying to access a link that is only available when I select a filter button element.
Filter Button
Desired Link
I have tried to access the element using CSS Selector, since the link text contains "include-out-of-stock".
driver.get("https://www.target.com/c/young-adult/-/N-qh1tf?Nao=0")
#Selects the filter button
link = driver.find_element(By.ID, "filterButton")
link.click()
#The code that is giving me issues. It doesn't find the desired link even though it's in the HTML inspector
element = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a[href*='include-out-of-stock']")))
element.click()
However, the element is seemingly unfound as I encounter a TimeoutException. I did play around to see if xpath would work, but I still meet the same issues. Is the element not interactable since it's not directly on the webpage? Could I just not be accessing the element right?
Your locator is incorrect: the button has no id attribute with that value. It is identified by its data-test attribute, filtersButton.
A simple xpath for that button would be
//button[@data-test='filtersButton']
In a previous question
a user provides the following solution to the problem.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='title login_title' and text()='Login']//following::div[1]//input[@class='text header_login_text_box ignore_interaction']"))).send_keys("someemail@email.com")
However, when I go into my chrome inspect element, I get the following XPATH by going copy>XPATH, which when added like the following, no longer works. It also doesn't give an error, just no email is typed into the box.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='__w2_wHsckeIm21_email']"))).send_keys("someemail@email.com")
What's the difference between the two? Why does only the first one work and how do I obtain this long working version of xpath.
Well, not a concrete solution as such but do try out ChroPath Plugin.
Also available on Chrome Web Store.
https://autonomiq.io/chropath/
First of all you don't need such a long xpath to locate the email element. Simply you can use
//form[@class='inline_login_form']//input[@name='email']
And I don't recommend using id to identify in this case as it's dynamic (meaning the id will change each time you navigate to this page). So it's not a good idea to use the id to locate the element.
There are multiple ways to write locator for this element like
//form[@class='inline_login_form']//input[@name='email']
//input[@name='email'][@class='text header_login_text_box ignore_interaction']
//input[@name='email'][starts-with(@class,'text header_login_text_box')]
I don't want to keep listing all the possible options. I chose //form[@class='inline_login_form']//input[@name='email'] because it makes clear that I am locating the input element with the name email under the form. If you try to locate the element with only //input[@name='email'], there are 2 matching elements and Selenium will pick the first one (which we don't want in this case), and it is not interactable.
If you want to learn more about xpath and how to develop the correct xpath for your target element refer to this post
I suspect the id is not a stable selector for Quora.
When I try to repeat your steps today I find the XPath is slightly different, because the ID of the input field is different.
Today: //*[@id="__w2_wtEXFdHr21_email"]
In your example: //*[@id='__w2_wHsckeIm21_email']
XPath is loosely speaking a description of how you navigate the DOM to get to the element(s) of interest. There are many ways to get to a particular element. Chrome's dev tools will give you one way (or two if you count "Copy full XPath").
The question you linked has several answers that suggest different XPath expressions, and also CSS selectors. The ones looking for an input with name = email will find more than one element, where the input you're looking for is not the first.
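To see which part of such an ID is generated, the two IDs quoted above can be compared directly. This is a minimal sketch with hypothetical helper names (stable_parts, xpath_for_stable_id). With only two samples it cannot prove that the shared fragments stay fixed, and the resulting XPath may still match more than one input on the page.

```python
import os


def stable_parts(id_a, id_b):
    """Return the prefix and suffix that two observed IDs have in common."""
    prefix = os.path.commonprefix([id_a, id_b])
    # Reverse both strings to reuse commonprefix for the shared ending.
    suffix = os.path.commonprefix([id_a[::-1], id_b[::-1]])[::-1]
    return prefix, suffix


def xpath_for_stable_id(tag, prefix, suffix):
    """Build an XPath 1.0 expression that ignores the changing middle of the ID."""
    return f"//{tag}[starts-with(@id, '{prefix}') and contains(@id, '{suffix}')]"


prefix, suffix = stable_parts("__w2_wtEXFdHr21_email", "__w2_wHsckeIm21_email")
print(prefix, suffix)
print(xpath_for_stable_id("input", prefix, suffix))
```

The differing middle section is the part a copied XPath should not depend on, which is why the name-based locators suggested in the other answers are the safer choice.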
I'm using Python / Selenium to submit a form then I have the web driver waiting for the next page to load by using an expected condition using class id.
My problem is that there are two pages that can be displayed, but they do not share a unique element (that I can find) that is not in the original page. One page has a unique class of mobile_txt_holder and the other possible page has a class of notfoundcopy. I would like to use a wait that looks for mobile_txt_holder OR notfoundcopy to appear.
Is it possible to combine two expected conditions into one wait?
Basic idea of what I am looking for but obviously won't work:
WebDriverWait(driver, 30).until(EC.presence_of_element_located(
(By.CLASS_NAME, "mobile_txt_holder")))
or .until(EC.presence_of_element_located((By.CLASS_NAME, "notfoundcopy")))
I really just need the program to wait until the next page loads so that I can parse the source.
Sample HTML:
<p class="notfoundcopy">Unfortunately, the number you entered is not in our tracking system.</p>
Apart from combining two expected_conditions with an or clause, we can simply construct a CSS selector that covers the requirement. The following selector matches an element with either the mobile_txt_holder class or the notfoundcopy class:
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".mobile_txt_holder, .notfoundcopy"))
)
You can find a detailed discussion in selenium two xpath tests in one
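If you also need to know which of the two pages loaded, a custom wait condition is one way to do it, since WebDriverWait.until accepts any callable that takes the driver and returns a truthy value. The sketch below is an illustration under that assumption: first_class_present is a hypothetical helper, and CLASS_NAME simply holds the string that By.CLASS_NAME stands for, so the block has no Selenium import of its own.

```python
CLASS_NAME = "class name"  # the string value of selenium's By.CLASS_NAME


def first_class_present(*class_names):
    """Build a wait condition that returns the first class name found, else False."""
    def condition(driver):
        for name in class_names:
            # find_elements returns an empty list instead of raising when nothing matches.
            if driver.find_elements(CLASS_NAME, name):
                return name
        return False
    return condition


page_loaded = first_class_present("mobile_txt_holder", "notfoundcopy")

# Usage (assumes an existing `driver` and the WebDriverWait import):
# which = WebDriverWait(driver, 30).until(page_loaded)
# `which` is then "mobile_txt_holder" or "notfoundcopy".
```

Selenium 4 also provides EC.any_of for combining existing expected conditions, which may be enough if you do not need to know which one matched.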