Selenium on Twitter for user information: no such element - Python

I'm trying to use Selenium to get specific users' information (e.g., number of followers) by opening their pages via user IDs. The thing is, although I can find the needed information in the browser inspector, I cannot locate it with Selenium, even with the help of ChroPath, which suggests an XPath or CSS selector you can use to locate it. It keeps saying: no such element... I'm quite confused. I'm not even trying to log in automatically or anything.
Here is the code:
from selenium import webdriver
driver = webdriver.Chrome(executable_path='E:/data mining/chromedriver.exe')
driver.get('https://twitter.com/intent/user?user_id=823730524426477568')
ele = driver.find_element_by_class_name('css-901oao css-16my406 r-poiln3 r-bcqeeo r-qvutc0').text
print(ele)
Error:
Message: no such element: Unable to locate element: {"method":"css selector","selector":".r-poiln3 r-bcqeeo r-qvutc0"}
(Session info: chrome=88.0.4324.104)
It's so strange: everything is right there on the first page, and I don't even need to scroll down to see the information, but it still won't be scraped...

To fix the immediate error in your code, try this:
ele = driver.find_element_by_css_selector('.css-901oao.css-16my406.r-poiln3.r-bcqeeo.r-qvutc0').text
However, because of how the site is built, that alone won't get you the result you want. It appears that not only do they rotate the class names, but there isn't enough variability in how the elements are labelled to make anything but XPath a viable option for getting specific data (unless you want to go through a list of all the elements with the same class to find what you need). After some initial site interactions, these XPaths worked for me:
following = driver.find_element_by_xpath('//*[@id="react-root"]/div/div/div[2]/main/div/div/div/div[1]/div/div[2]/div/div/div[1]/div[2]/div[4]/div[1]/a/span[1]/span').text
followers = driver.find_element_by_xpath('//*[@id="react-root"]/div/div/div[2]/main/div/div/div/div[1]/div/div[2]/div/div/div[1]/div[2]/div[4]/div[2]/a/span[1]/span').text
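Note that Twitter renders the page with JavaScript, so these elements may not exist yet at the moment find_element_by_xpath runs. A minimal sketch of wrapping the lookup in an explicit wait (the 10-second timeout is an arbitrary choice):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)  # poll the DOM for up to 10 seconds
followers_xpath = '//*[@id="react-root"]/div/div/div[2]/main/div/div/div/div[1]/div/div[2]/div/div/div[1]/div[2]/div[4]/div[2]/a/span[1]/span'
followers = wait.until(EC.presence_of_element_located((By.XPATH, followers_xpath))).text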

Problem is, there are over 100 elements with classes 'css-901oao css-16my406 r-poiln3 r-bcqeeo r-qvutc0' on that page.
To avoid an absolute XPath you could use something like this, e.g. for the number of followers:
Find the span that has a descendant span with the text 'Followers'. Its preceding sibling on the same level is a span whose child span holds the follower count.
ele = driver.find_element_by_xpath("//span[descendant::span[text()='Followers']]/preceding-sibling::span/span").text
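The same pattern should work for the following count by swapping the label text, assuming the 'Following' label has the same surrounding structure:
following = driver.find_element_by_xpath("//span[descendant::span[text()='Following']]/preceding-sibling::span/span").text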

The selectors you are using are very fragile. ChroPath generates fragile XPaths that change on the next run, so the script fails. You might like to use the relative XPaths generated by SelectorsHub, which are much more robust.

Related

find_element_by_css_selector('a').get_attribute('href') returns NoSuchElementException

I am trying to scrape the target website for product links. The program should open the required URL in the browser and scrape all the links with a particular class name, but for some reason I am getting a NoSuchElementException for this piece of code:
links = driver.find_elements_by_class_name("styles__StyledTitleLink-mkgs8k-5")
for link in links:
    driver.implicitly_wait(15)
    product_links.append(link.find_element_by_css_selector('a').get_attribute('href'))
I tried printing out the text of each link with link.text in the for loop, and the code is actually selecting the required elements, but for some reason it is not able to extract the href URL from each link. I am not sure what I am doing wrong.
This is the entire error message:
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"a"} (Session info: chrome=83.0.4103.106)
The error suggests there is no element matching the CSS selector 'a' inside the elements you found, so you need to try other locators to identify them. Try an XPath such as //a[contains(text(),'text of that element')].
You are looking for a class name generated by a build tool; note the random string at the end of the class name. These generated classes won't be the same on every site or build.
If you want to scrape them, find a different, generic class, or match all classes containing the substring "StyledTitleLink".
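A minimal sketch of that substring approach using a CSS attribute selector, assuming the elements carrying the generated class are themselves the a tags:
# match any element whose class attribute contains the stable substring
links = driver.find_elements_by_css_selector("a[class*='StyledTitleLink']")
product_links = [link.get_attribute('href') for link in links]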
You should try to find a different solution to your problem.

How do I access nested HTML elements using Selenium?

I am using a school class schedule website, and I want to access the div element that contains info on how many seats are in a class and who is teaching it, in order to scrape it. I first find the element which contains the div I want; after that I try to find the div itself using XPaths. The problem I face is that when I try to use either find_element_by_xpath or find_elements_by_xpath to get the div, I get this error:
'list' object has no attribute 'find_element_by_xpath'
Is this error happening because the div element I want to find is nested? Is there a way to get nested elements using a div tag?
Here is the code I currently have:
driver = webdriver.Chrome(ChromeDriverManager().install())
url = "https://app.testudo.umd.edu/soc/202008/INST"
driver.get(url)
section_container = driver.find_elements_by_id('INST366')
sixteen_grid = section_container.find_element_by_xpath(".//div[#class = 'sections sixteen colgrid']").text
The info I want is this:
<div class="sections sixteen colgrid"></div>
It's currently inside this div:
<div id="INST366" class="course"></div>
I would greatly appreciate it if anyone could help me out with this.
From the documentation of find_elements_by_id:
Returns : list of WebElement - a list with elements if any was found. An empty list if not
Which means section_container is a list. You can't call find_element_by_xpath on a list but you can on each element within the list because they are WebElement.
What does the documentation say about find_element_by_id?
Returns : WebElement - the element if it was found
In this case you can use find_element_by_xpath directly. Which one should you use? It depends on whether you need just the first match to keep digging for information, or all of the matches.
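A minimal sketch of the difference (the class selector here is illustrative, based on the class="course" attribute shown in the question):
# find_elements_* returns a list: iterate and query each WebElement
for course in driver.find_elements_by_class_name('course'):
    print(course.find_element_by_xpath(".//div[@class='sections sixteen colgrid']").text)

# find_element_* returns a single WebElement: chain the call directly
section_container = driver.find_element_by_id('INST366')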
After fixing that, you will encounter a second problem: your information is only displayed after JavaScript runs when you click on "Show Sections", so you need to do that before locating what you want. For that, find the a element and click on it.
The new code will look like this:
from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
url = "https://app.testudo.umd.edu/soc/202008/INST"
driver.get(url)
section_container = driver.find_element_by_id('INST366')
section_container.find_element_by_xpath(".//a[#class='toggle-sections-link']").click()
sleep(1)
section_info = section_container.find_element_by_xpath(".//div[#class='sections sixteen colgrid']").text
driver.quit()

Python, Selenium: can't find element by xpath when ul list is too long

I'm trying to create a program extracting all persons I follow on Instagram. I'm using Python, Selenium and Chromedriver.
To do so, I first get the number of followed persons and click on the 'following' button:
nb_abonnements = int(webdriver.find_element_by_xpath('/html/body/span[1]/section[1]/main/div[1]/header/section[1]/ul/li[3]/a/span').text)
sleep(randrange(1,3))
abonnements = webdriver.find_element_by_xpath('/html/body/span[1]/section[1]/main/div[1]/header/section[1]/ul/li[3]/a')
abonnements.click()
I then use the following code to get the followed accounts, scrolling the popup when I can't find the next one:
followers_panel = webdriver.find_element_by_xpath('/html/body/div[3]/div/div/div[2]')
while i < nb_abonnements:
    try:
        print(i)
        followed = webdriver.find_element_by_xpath('/html/body/div[3]/div/div/div[2]/ul/div/li[{}]/div/div[2]/div/div/div/a'.format(i+1)).text
        # the followed accounts are in a ul list
        i += 1
        followed_list.append(followed)
    except NoSuchElementException:
        webdriver.execute_script("arguments[0].scrollBy(0,400)", followers_panel)
        sleep(7)
The problem is that once i reaches 12, the program raises the exception and scrolls. From there, it still can't find the next follower and is stuck in a loop where it does nothing but scroll. I've checked the source code of the IG page, and it turns out the path is still good, but apparently I can't access the elements the way I did before, probably because the ul list in which I am accessing them has become too long (line 5 of the program).
I can't work out how to solve this. I hope you can be of some help.
UPDATE: the DOM looks like this:
html
  body
    span
      script
      ...
      div[3]
        div
          ...
          div
            div
              div[2]
                ul
                  div
                    li
                    li
                    li
                    li
                    ...
                    li
The ul is the list of followers.
The li elements contain the info I'm trying to extract (usernames). Even when I go to the webpage myself, open the popup window, scroll a little and let everything load, I can't find the element I'm looking for by typing the XPath manually into the DOM search bar, even though the path is correct; I can verify that by looking at the DOM.
I've tried various webdrivers for Selenium; currently I am using chromedriver 2.45.615291. I've also added an explicit wait for the element to show (WebDriverWait(webdriver, 10).until(EC.presence_of_element_located((By.XPATH, '/html/body/div[3]/div/div/div[2]/ul/div/li[{}]/div/div[2]/div/div/div/a'.format(i+1))))), but I just get a timeout exception: selenium.common.exceptions.TimeoutException: Message:.
It just seems like once the ul list is too long (that is, from the moment I've scrolled down enough to load new people), I can't access any element of the list by its XPath, even the elements that were already loaded before I began scrolling.
Instead of using an XPath for each child element, find the ul element, then find all the child elements at once using something like ul_element.find_elements_by_tag_name(). Then iterate through each element in the collection and get the required text.
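A minimal sketch of that approach (the panel XPath is taken from the question; using 'li' as the tag name is an assumption about the list structure):
panel = webdriver.find_element_by_xpath('/html/body/div[3]/div/div/div[2]')
for item in panel.find_elements_by_tag_name('li'):  # every entry loaded so far
    followed_list.append(item.text)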
I've found a solution: I just access the element through an XPath like this: find_element_by_xpath("(//*[@class='FPmhX notranslate _0imsa '])[{}]".format(i)). I don't know why it didn't work the other way, but like this it works just fine.

Unable to click on the link text; have tried find_element_by_link_text and quite a few other things, but it's not working

[screenshot of the HTML]
Here is a screenshot of the HTML I am struggling with. I want to click on "Smart Watches" in the left nav, and I am using the following code to click on it:
driver.implicitly_wait(30)
driver.find_element_by_link_text('Smart Watches').click()
But I am getting the following error, and I am clueless as to why I just can't find it on the page:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"link text","selector":"Smart Watches"} (Session info: chrome=60.0.3112.113) (Driver info: chromedriver=2.29.461591 (62ebf098771772160f391d75e589dc567915b233),platform=Windows NT 6.2.9200 x86_64)
I have also tried an explicit wait with expected conditions, as follows:
wait = WebDriverWait(driver, 20)
link = wait.until(expected_conditions.presence_of_element_located((By.LINK_TEXT,'"Smart Watches"')))
link.click()
But even that gives me a TimeoutException.
Here is the link to the page where I have been stuck since morning:
https://www.kogan.com/au/shop/phones/
I am very new to coding; any help would be appreciated! I just want to know why find_element_by_link_text is not working here; it looks weird to me!
Thanks in advance
The problem is that when you use find_element_by_link_text(), the text must be an exact match to the text contained in the link. In your HTML picture, you can see "Smart Watches", but what you aren't seeing is that the SPAN just below, still inside the A, is collapsed. Most likely, if you expand it you will see additional text that you must include if you are going to use find_element_by_link_text().
Another option is find_element_by_partial_link_text(), which is more like a contains() instead of an equals(). Depending on the page, it may find too many matches; you would have to try it and see if it works.
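A minimal sketch of that option (assuming the visible link text does contain "Smart Watches"):
# partial match sidesteps the extra text hidden in the collapsed SPAN
driver.find_element_by_partial_link_text('Smart Watches').click()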
Yet another option is using an XPath. There are a lot of different ways to create an XPath for this depending on exactly what you want.
This is the most general and thus most likely to find unwanted links, but it may work. It's pretty much the same as find_element_by_partial_link_text():
//a[contains(.,'Smart Watches')]
Other options include
//a[starts-with(.,'Smart Watches')]
//li[@data-filter-facet='smart-watches']/a[contains(.,'Smart Watches')]
//li[@data-filter-facet='smart-watches']/a[starts-with(.,'Smart Watches')]
... and so on...
You can try this way, accessing the text:
driver.find_element_by_xpath("//a[contains(text(), 'Smart Watches')]").click()
I don't know why link text does not work here, but partial link text does. Please see my Java code for the same:
WebDriver driver=new FirefoxDriver();
driver.get("https://www.kogan.com/au/shop/phones/");
WebElement watch=driver.findElement(By.partialLinkText("Smart Watch"));
WebDriverWait waitElement=new WebDriverWait(driver, 30);
waitElement.until(ExpectedConditions.elementToBeClickable(watch));
watch.click();
You need to include the double quotation marks, since they are part of the link text in the HTML:
driver.find_element_by_link_text('"Smart Watches"').click()
Most of the time, the link may take some more time to load after your page has loaded. Rather than using an implicit wait, use an explicit wait:
wait = WebDriverWait(driver, 30)
link = wait.until(expected_conditions.presence_of_element_located((By.LINK_TEXT,"Smart Watches")))
link.click()
It could also be the case that the link is inside another frame; in that case you will have to switch to that frame first.
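A minimal sketch of the frame case (the iframe selector here is hypothetical):
# enter the frame that contains the link, click, then return to the main document
driver.switch_to.frame(driver.find_element_by_css_selector('iframe'))
driver.find_element_by_link_text('Smart Watches').click()
driver.switch_to.default_content()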

Selenium: get element by CSS selector

I am trying to get user details from each block, as follows:
driver.get("https://www.facebook.com/public/karim-pathan")
wait = WebDriverWait(driver, 10)
li_link = []
for s in driver.find_elements_by_class_name('clearfix'):
    print s
    print s.find_element_by_css_selector('_8o._8r.lfloat._ohe').get_attribute('href')
    print s.find_element_by_tag_name('img').get_attribute('src')
It says:
unable to find element with css selector
Any hint appreciated.
Just a mere guess, based on the assumption that you are not logged in: you are getting the exception because, for some of the clearfix elements, an element matching ._8o._8r.lfloat._ohe does not exist, so your code isn't reaching the required elements. Anyhow, if you are trying to fetch the href and img source of the results, you need not iterate over all the clearfix elements. As suggested by @leo.fcx, your CSS was incorrect; using the CSS he provided, you can achieve the desired result as:
driver.get("https://www.facebook.com/public/karim-pathan")
for s in driver.find_elements_by_css_selector('._8o._8r.lfloat._ohe'): // there didn't seemed to iterate over each class of clearfix
print s.get_attribute('href')
print s.find_element_by_tag_name('img').get_attribute('src')
P.S. Sorry for any syntax errors; I've never explored the Python bindings. :)
Since you are using all the class names that the element has, adding a . to the beginning of your CSS selector should fix it.
Try this:
s.find_element_by_css_selector('._8o._8r.lfloat._ohe')
instead of:
s.find_element_by_css_selector('_8o._8r.lfloat._ohe')
Adding to what @leo.fcx pointed out about the selector, wait for the search results to become visible:
wait.until(EC.visibility_of_element_located((By.ID, "all_search_results")))
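For completeness, that wait call relies on imports that the snippets above don't show; a minimal sketch:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.ID, "all_search_results")))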
