Python Selenium LinkedIn company web scraping

I'm trying to use web scraping (via Python and Selenium) to create a worksheet with companies of interest to my boss. Most of it is working, I just can't seem to get hold of the "Next Page" button. Relative and absolute XPaths, CSS selectors, nothing seems to work, since every time you generate/switch pages they're different. (The relative XPath is usually '//*[@id="ember{SOME RANDOM NUMBER}"]'.) What could I do? There are other buttons with the same relative XPath structure on the page.

The "Next" page button has the same XPath on all the pages:
//button[@aria-label="Next"]
You should locate this element by the aria-label attribute, not by the id attribute value.
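For example, a minimal sketch of clicking it with an explicit wait (assuming driver is your WebDriver instance and the results page is already loaded):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait until the pagination "Next" button is clickable, then click it
WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.XPATH, '//button[@aria-label="Next"]'))
).click()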

You can also locate the 'Next' element by its classes. Note that find_element_by_class_name does not accept a compound class name (the value contains a space), so use a CSS selector instead:
next_button = wd.find_element(By.CSS_SELECTOR, '.artdeco-pagination__button.next')
next_button.click()


Selenium not finding custom attribute 'data-target-section-id'

I'm trying to scrape some profiles of people on LinkedIn for a specific job. To do this I was trying to find the People button and click it so I can look specifically at the relevant people.
The path is as follows:
From the signed-out LinkedIn home page -> I sign in and go to the LinkedIn home page -> I type "hr" in the search bar and hit enter.
On the results page for "hr", on the left side of the page, there is a navigation list that says "On this page". One of the options is "People", and that is what I want to target.
The link to the page is: https://www.linkedin.com/search/results/all/?keywords=hr&origin=GLOBAL_SEARCH_HEADER&sid=Xj2
The HTML of the 'People' button in the navigation list is:
<li>
<button aria-current="false" class="search-navigation-panel_button" data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ==" role="link" type="button"> People </button>
</li>
I have tried to find this button through By.LINK_TEXT with the keyword "People", but that did not work. I have also tried By.XPATH with "//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']" but it also does not find it.
How can I make Selenium find this custom attribute so I can locate this button through data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ=="?
Another issue that I am having is that I can target all the relevant people on the page and loop through them, but I cannot extract the link of each of the profiles. It only takes the link of the first person and never updates the variable again through the loop.
For example, if the first person is Ian and the second is Brian, it gives me the link for Ian's profile even when 'users' is Brian.
Debugging the loop I can see the correct list of people in all_users, but it only gets the href of the first person in the list and never updates.
Here is the code of that:
all_users = driver.find_elements(By.XPATH, "//*[contains(@class, 'entity-result__title-line entity-result__title-line--2-lines')]")
for users in all_users:
    print(users)
    get_links = users.find_element(By.XPATH, "//*[contains(@href, 'miniProfileUrn')]")
    print(get_links.get_attribute('href'))
I have also tried to do By.XPATH
"//button[@data-target-section-id='RIK0XK7NRnS21bVSiNaicw==']" but
it also does not find it.
The data-target-section-id that you mention is not the same as the one that the button has (PTFmMNSPSz2LQRzwynhRBQ==). Check that this attribute is not dynamic before targeting it.
Your XPath is not bad, but as I said, fix the target id:
driver.find_element(By.XPATH, "//button[@data-target-section-id='PTFmMNSPSz2LQRzwynhRBQ==']").click()
Where "driver" is your WebDriver instance.
Given the HTML:
<li>
<button aria-current="false" class="search-navigation-panel_button" data-target-section-id="PTFmMNSPSz2LQRzwynhRBQ==" role="link" type="button"> People </button>
</li>
The data-target-section-id attribute values like PTFmMNSPSz2LQRzwynhRBQ== are dynamically generated and are bound to change sooner or later. They may change the next time you access the application afresh, or even on the next application startup, so they can't be used in locators.
Solution
The desired element is a dynamic element, so to click on it you need to induce WebDriverWait for element_to_be_clickable(), and you can use either of the following locator strategies:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.search-navigation-panel_button[data-target-section-id]"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='search-navigation-panel_button' and @data-target-section-id][contains(., 'People')]"))).click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
If you want to locate several elements with the same attribute, replace find_element with find_elements. See if that finds not just the first element matching your search, but all elements with that attribute.
Review the Selenium Locating Elements documentation and try each of the options it lists for locating elements.
Something else to try: locate the enclosing <li> of the button, then click the <button> tag inside it:
list_element = driver.find_element(By.XPATH, "//li[button[@data-target-section-id='PTFmMNSPSz2LQRzwynhRBQ==']]")
list_element.find_element(By.TAG_NAME, "button").click()
It looks like the reason your People button locator isn't working is because the data-target-section-id is dynamic. Mine is showing as hopW8RkwTN2R9dPgL6Fm/w==. We can get around that by using an XPath to locate the element based on the text contained, "People", e.g.
//button[text()='People']
Turns out that matches two elements on the page, because many of the left nav links are repeated as rounded buttons at the top of the page, so we can further refine our locator to
//button[text()='People'][@data-target-section-id]
Having said that, that link only scrolls the page so you don't really need to click that.
From there, you want to get the links to each person listed under the People heading. We first need the DIV that contains the People section. It's kinda messy because the IDs on those elements are also dynamic so we need to find the H2 that contains "People" and then work our way back up the DOM to the DIV that contains only that section. We can get that using the XPath below
//div[@class='search-results-container']/div[.//h2[text()='People']]
From there, we want all of the A tags that uniquely link to a person... and there's a lot of A tags in that section but most are not ones we want so we need to do more filtering. I found that the below XPath locates each unique URL in that section.
//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]
Combining the two XPaths, we get
//div[@class='search-results-container']/div[.//h2[text()='People']]//a[contains(@href,'miniProfileUrn')][contains(@class,'scale-down')]
which locates all unique URLs belonging to a person in the People section of the page.
Using this, your code would look like
all_users = driver.find_elements(By.XPATH, "//div[#class='search-results-container']/div[.//h2[text()='People']]//a[contains(#href,'miniProfileUrn')][contains(#class,'scale-down')]")
for user in all_users:
print(user.get_attribute('href'))
NOTE: The reason your code was only returning the first href repeatedly is that you are searching from an existing element with an XPath, so you need to add a "." at the start of the XPath to indicate that the search should start from the referenced element.
get_links = users.find_element(By.XPATH, ".//*[contains(@href, 'miniProfileUrn')]")
^ add period here
I've eliminated that step in my code so you won't need it there.
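For reference, here is a minimal sketch of the original loop with only the relative "." fix applied (keeping the locators from the question), which also resolves the repeated-href issue:
all_users = driver.find_elements(By.XPATH, "//*[contains(@class, 'entity-result__title-line entity-result__title-line--2-lines')]")
for users in all_users:
    # the leading "." restricts the search to descendants of the current element
    get_links = users.find_element(By.XPATH, ".//*[contains(@href, 'miniProfileUrn')]")
    print(get_links.get_attribute('href'))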

How to scrollIntoView() inside a specific dropdown(div) in Python

I am trying to scrape a website that requires me to first fill out certain dropdowns.
However, most of the dropdown selections are hidden and only appear in the DOM tree when I scroll down WITHIN the dropdown. Is there a solution I can use to somehow mimic a scroll wheel, or are there other libraries that could complement Selenium?
There are several ways to scroll an element into view, but the most reliable one in Selenium is executing JavaScript's scrollIntoView() function.
For example, I use this snippet for scraping twitch.tv in my blog:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.twitch.tv/directory/game/Art")
# find last item and scroll to it
driver.execute_script("""
let items=document.querySelectorAll('.tw-tower>div');
items[items.length-1].scrollIntoView();
""")
The JavaScript finds all "items" on the pagination page and scrolls the last one into view. In your case you would use:
driver.execute_script("""
let item = document.querySelector('DROPDOWN CSS SELECTOR');
item.scrollIntoView();
""")
You can read more about it here: https://scrapfly.io/blog/web-scraping-with-selenium-and-python/#advanced-selenium-functions
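If the dropdown loads more options only as you scroll inside it, a rough sketch of one approach (the .dropdown-menu li selector, the URL, and the 1-second pause are placeholders you would adjust for your page) is to keep scrolling the last loaded option into view until the option count stops growing:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL for your target page

previous_count = -1
while True:
    # options currently present in the DOM inside the dropdown (placeholder selector)
    options = driver.find_elements(By.CSS_SELECTOR, ".dropdown-menu li")
    if not options or len(options) == previous_count:
        break  # nothing loaded, or no new options appeared: stop scrolling
    previous_count = len(options)
    # scroll the last loaded option into view to trigger loading the next batch
    driver.execute_script("arguments[0].scrollIntoView();", options[-1])
    time.sleep(1)  # give the page a moment to render the additional options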
requests and BeautifulSoup are two Python libraries that can assist with scraping data: requests fetches the page at a given URL, and BeautifulSoup parses the returned HTML so you can navigate its tags.
In order to inspect a specific part of a website you just need to right-click the item you want to scrape and choose Inspect. This will reveal, in the developer tools, the hidden elements you speak of under that specific tag.
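A minimal sketch of that approach (the URL and the CSS selector are placeholders; note that requests only sees the server-rendered HTML, so content injected by JavaScript will not appear):
import requests
from bs4 import BeautifulSoup

# fetch the page (placeholder URL)
response = requests.get("https://example.com")
soup = BeautifulSoup(response.text, "html.parser")

# print the text of each option in a dropdown (placeholder selector)
for option in soup.select("select option"):
    print(option.get_text(strip=True))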

Selenium and XPath unable to locate a button

I am trying to save some stuff with Selenium to my Yandex account. The problem is that when I run the code to click the "Save to Yandex.Disk" button, Selenium gives me an "unable to locate the element" message.
That's my code:
browser.find_element_by_xpath('/html/body/div[1]/div/div[2]/div[1]/div[1]/div/div[1]/div[3]/button[1]').click()
This is the page with the "Save to Yandex.Disk" button: https://yadi.sk/d/0ReZErv_cLl1-w
I read that you can also locate elements by name or by CSS selector, but when I try to copy the element with the Firefox inspector, the browser gives me strange code.
Any suggestions?
Of course, I get the same error whether or not I am logged into Yandex.
Thank you
You can use this XPath to detect the needed element:
//div[@class = 'folder-content content content_other content_dir']//button[contains(@class, "save")]
Use this XPath //*[contains(text(),'Save to Yandex.Disk')] to click the "Save to Yandex.Disk" button.
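A short sketch of how either of these locators could be used with an explicit wait (assuming browser is the WebDriver instance from the question and the page above is already open):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait for the "Save to Yandex.Disk" button to become clickable, then click it
WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//*[contains(text(),'Save to Yandex.Disk')]"))
).click()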
First, we need to improve the XPath you are using to find the element: you should use a relative XPath, but in your case you are using an absolute one. I use the ChroPath extension in Chrome to find the XPath of an element.
Below is the XPath I would recommend you use, although the two XPaths mentioned above can also be used.
Let me know if you have any more queries; I can put together a good XPath for you.
//div[@class='folder-content__header']//span[contains(text(),'Save to Yandex.Disk')]
Below is one of the XPaths I have used in my own project. Take a look, maybe it can broaden your horizons:
//label[contains(text(),'Plant Code*')]//parent::div[@class='rb_Work_FieldContainer']//following-sibling::div[contains(@class,'rb_Work_FieldValueArea rb_Work_FieldValueArea_create ')]//textarea[@class='textarea']

How to find title="xyz" element with Selenium (Python)

I am trying to click an element with Selenium that has a specific title attribute. I have tried to use an XPath before; however, the problem is that there are two buttons on the website with the same XPath. If one button is active, it has the same XPath as the other when that one is active, and vice versa.
The only thing that differentiates these two buttons is the title attribute.
<a class="qPKfxd" href="SOME LINK" title="List">
Basically, I am trying to click that element only if the title is "List".
Has anyone got an idea of how to specify that with Selenium?
Please let me know if you need to view more code.
You can locate an element by attribute.
XPath:
//a[@title="List"]
CSS selector:
[title="List"]
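A quick sketch of clicking it by that attribute (assuming driver is your WebDriver instance and the element is present on the page):
from selenium.webdriver.common.by import By

# click the link only if its title attribute is "List"
driver.find_element(By.CSS_SELECTOR, 'a[title="List"]').click()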

Python Selenium clicking list item within ul

I am working with Python and Selenium to click on the Photo/Video button on a Facebook page. The HTML associated with this seems to have a list item (li) inside a ul. The HTML is as in the following image; the circled button is the one I am trying to press.
Can anyone please tell me how I should press the Photo/Video button?
Can you try this code?
I used the XPath method and contains() to compare the text in the div.
By the way, the matched div itself does not respond to click; the clickable element seems to be an <a> tag among its parents.
The syntax for selecting a parent in XPath is /.. and I used that here:
https://stackoverflow.com/a/3655588/12582501
driver.find_element_by_xpath('//div[contains(text(),"Photo/Video")]/../../../a').click()
Facebook has an interesting thing: test IDs.
With these IDs you can click any of the clickable elements on the site:
driver.find_element_by_xpath('//div[@data-testid="photo-video-button"]').click()
This way your code keeps working even when the page contains another element with the text "Photo/Video".
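A slightly more robust sketch of the same idea, waiting for the button before clicking (assuming Selenium 4 style imports; the data-testid value is taken from the answer above and may differ on your page):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait until the Photo/Video button is clickable, then click it
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//div[@data-testid="photo-video-button"]'))
).click()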
