Selenium parsing whole document instead of webelement - python

This problem is really driving me crazy! Here's my code:
list_divs = driver.find_elements_by_xpath("//div[@class='myclass']")
print(f'Number of divs found: {len(list_divs)}')  # Correct number displayed
for art in list_divs:
    mybtn = art.find_elements_by_xpath('//button')  # There are 2 buttons in each div
    print(f'Number of buttons found = {len(mybtn)}')  # Incorrect number (129 instead of 2)
    mybtn[2].click()  # Wrong button clicked!
The button that gets clicked is NOT inside the art HTML but at the very beginning of the webpage! It seems like Selenium is parsing the whole document instead of the web element art...
I've printed the outerHTML of the variable art and it's correct: only the div code, which contains the 2 buttons. So why is find_elements_by_xpath(), applied to the web element art, parsing the whole HTML page instead of just the div?
Totally incomprehensible to me!

Because you are using mybtn = art.find_elements_by_xpath('//button'), where //button ignores your search context: an XPath beginning with // always searches from the document root. Change it to:
mybtn = art.find_elements_by_xpath('.//button')
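For example, a minimal sketch of the corrected loop (assuming, as in the question, that each div contains exactly two buttons and that the second one is wanted; adjust the index as needed):
list_divs = driver.find_elements_by_xpath("//div[@class='myclass']")
for art in list_divs:
    # The leading dot anchors the search to the current element
    mybtn = art.find_elements_by_xpath('.//button')
    print(f'Number of buttons found = {len(mybtn)}')  # Should now print 2
    mybtn[1].click()  # Indexes are zero-based, so [1] is the second button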

I can't post any HTML code (the page is about 1,000 lines long).
So far, the only way I've found around this is to avoid searching within web elements and instead query the entire page for each element I need:
list_divs = driver.find_elements(By.XPATH, "//div[@class='myclass']")
buttons = driver.find_elements(By.XPATH, "//div[@class='myclass']//button")
and then iterate through the lists to access the button I need for each div, as in the sketch below. It works perfectly like this. I still don't get how an XPath applied to a given piece of HTML can return something that is not inside that HTML...
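A minimal sketch of that pairing (assuming exactly two buttons per div, returned in document order):
from selenium.webdriver.common.by import By

list_divs = driver.find_elements(By.XPATH, "//div[@class='myclass']")
buttons = driver.find_elements(By.XPATH, "//div[@class='myclass']//button")
for i, div in enumerate(list_divs):
    # Buttons come back in document order, two per div
    first_btn, second_btn = buttons[2 * i], buttons[2 * i + 1]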
I'll run more tests with other webpages to see whether the problem comes from Selenium.
Thanks for help!

Related

How would you click all texts on a page with Xpath - Python

So, this won't be a long description, but I am trying to have XPath click on all of the elements (more specifically, text elements) on a page. I really don't know where to start, and all of the other questions about clicking everything on a page are based on a class, not on text via XPath.
Here is some of my code:
browser.find_element_by_xpath("//*[text()='sample']").click()
I really don't know how I would go about to make it click all of the "sample" texts throughout the whole page.
Thanks in advance!
Well, let's say that you have lots of divs or spans that contain text. Let's take divs as the example:
<div class="some class name" visibility="visible" some other attribute> Text here </div>
Now, when you go to developer mode (F12), in the Elements section, if you try //div[contains(@class,'some class name')] and there is more than one match, then you can store all of them in a list, just like below:
div_list = driver.find_elements(By.XPATH, "//div[contains(@class,'some class name')]")
This will give you a list of div web elements. Now you have a Python list and you can manipulate it as per your requirement:
for div_text in div_list:
    print(div_text.text)
The same way, you can try spans or other web elements.
You just need to use that XPath to collect a list of elements instead, like this:
my_elements = browser.find_elements_by_xpath("//*[text()='sample']")
for element in my_elements:
    element.click()
That loop may not work as is (you may need to add a wait for the elements), but that's the idea.
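For instance, here is a minimal sketch that re-finds the elements on every pass, since a click can change the DOM and leave earlier references stale (the 10-second timeout is an arbitrary choice):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 10)
count = len(browser.find_elements(By.XPATH, "//*[text()='sample']"))
for i in range(count):
    # Re-locate the elements each time in case a click mutated the page
    elements = wait.until(
        EC.presence_of_all_elements_located((By.XPATH, "//*[text()='sample']")))
    elements[i].click()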

Python selenium crawling

Here is code
driver = webdriver.Chrome()
driver.get('https://tieba.baidu.com/f?kw=比特币&ie=utf-8&tab=good')
driver.find_elements_by_css_selector('a.j_th_tit')[0].click()
a = driver.find_elements_by_css_selector('div.d_post_content.j_d_post_content.clearfix')
for i in a:
    print(i.text)
Here is the HTML I'm struggling with. There are many texts on the page, but they all have the same class: d_post_content j_d_post_content clearfix.
<div id='post_content_52497574149' class='d_post_content j_d_post_content clearfix' style='display:;'> Here is the text that I need to get; it is written in Chinese and Stack Overflow may not permit me to write Chinese in the body </div>
I want to automatically access the website and get some text for my homework assignment. With the code above I can open the website and click the link, but I cannot access the text I need. All of the texts I need are in that class, so I tried to access the class to get them, but it didn't work. When I check the length of the list a, len(a) is zero. Could anyone help me?
This line brings you to a new tab:
driver.find_elements_by_css_selector('a.j_th_tit')[0].click()
So you need to switch to it first. After performing the above, just add this line:
driver.switch_to.window(driver.window_handles[-1])
When you click the link in this statement:
driver.find_elements_by_css_selector('a.j_th_tit')[0].click()
a new tab is opened, but you are not switching to that tab.
I would recommend adding this statement before you actually call find_elements_by_css_selector:
driver.switch_to.window(driver.window_handles[-1])
It will solve your issue.
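Putting both answers together, a minimal sketch of the whole flow (selectors taken from the question; the 10-second wait is an arbitrary choice):
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://tieba.baidu.com/f?kw=比特币&ie=utf-8&tab=good')
driver.find_elements_by_css_selector('a.j_th_tit')[0].click()

# The click opens a new tab, so wait for it and switch to the newest handle
WebDriverWait(driver, 10).until(lambda d: len(d.window_handles) > 1)
driver.switch_to.window(driver.window_handles[-1])

posts = driver.find_elements_by_css_selector('div.d_post_content.j_d_post_content.clearfix')
for post in posts:
    print(post.text)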

How do I find a more descriptive XML Path for my Selenium webscrape?

I'm building a website scraper using Selenium and I want to "click" the highlighted div in the image below.
My current code (which works, but isn't very descriptive) is:
button = driver.find_element_by_xpath("//div/div/div/div/div/div/div/div[5]/div[8]")
button.click()
I'm glad it works, but it feels fragile, since I'm accessing the divs purely by index, without any other identifying features. Is there a way, at least for the last div, to specify my choice by the text within its span? What would the syntax be for choosing the div that contains a span with the text "Grandmaster"?
It's worth noting that this is the only div in any of the "filter-group"s that contains the text "Grandmaster". Is there a way to select this div specifically, without listing all the nested divs (as I've done in my code above)?
Any other ideas on how to make the XPath a bit more robust would be appreciated.
What would the syntax be for choosing the div that contains a span with the text "Grandmaster"?
The syntax would be:
driver.find_element_by_xpath("//*[contains(text(), 'Grandmaster')]")
What would the syntax be for choosing the div that contains a span with the text "Grandmaster"?
You can use this XPath:
//span[contains(., 'Grandmaster')]/parent::div
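As a usage sketch in Selenium (the variable name is illustrative):
# Locate the span by its text, then step up to its parent div
grandmaster_div = driver.find_element_by_xpath("//span[contains(., 'Grandmaster')]/parent::div")
grandmaster_div.click()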

Xpath clicking not working at all

Quick info: I'm using Mac OS, Python 3.
I have like 800 links that need to be clicked on a page (and many more pages to go so need automation).
They were hidden because you only see those links when you hover over them.
I fixed that by injecting a CSS rule (just mentioning it in case it's the reason this isn't working).
When I try to find the elements by XPath, it doesn't click the links afterwards, and it also doesn't always find all of them, just 4 (even when more are displayed in view).
HTML:
Display
When I click Copy XPath in the inspector it gives me:
//*[@id="tiles"]/li[3]/div[2]/ul/li[2]/a
But it doesn't work when I use it like this:
driver.find_elements_by_xpath('//*[@id="tiles"]/li[3]/div[2]/ul/li[2]/a')
So two questions:
How do I get them all?
How do I get it to click on each of them?
The pattern in the XPath is the same, with /li[3] being the only number that changes, so I created a for loop to generate them all based on the count on the page, which worked successfully.
So if it can be done with the XPaths I generated myself, corresponding to what I get when I copy the XPath in the inspector, then I only need question 2 answered.
PS: this is the HTML of the parent of that first element:
<li onclick="openPopup(event, 'collect', {item_id: 165214})" class="collect" data-item-id="165214">Display</li>
This XPath,
//a[.="Display"]
will select all a links with anchor text equal to "Display".
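For instance, to click them all (a sketch; if a click navigates away or mutates the page, the remaining references may go stale):
for link in driver.find_elements_by_xpath('//a[.="Display"]'):
    link.click()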
As per your question, the HTML you have shared, and your code attempts, there is no necessity to get the <li> tags. Instead we will get the <a> tags in a list. So, to answer your first question, How do I get them all, you can use the following line of code:
all_Display = driver.find_elements_by_xpath("//*[@id='tiles']//li/div[2]/ul/li[@class='collect']/a[@title='Display']")
Next, to click on each of them, you have to create a loop to iterate through all the <a> tags as follows:
all_Display = driver.find_elements_by_xpath("//*[@id='tiles']//li/div[2]/ul/li[@class='collect']/a[@title='Display']")
for each_Display in all_Display:
    each_Display.click()
Using an XPath with elements by position is not ideal. Instead use a CSS selector to match the attributes for the targeted elements.
Something like:
all_Display = driver.find_elements_by_css_selector("#tiles li[onclick][data-item-id] a[title]")
You can then click them in a loop if none of them is loading a new page:
for element in all_Display:
    element.click()
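Since the question notes the links are only visible on hover, another hedged option (instead of injecting a CSS rule) is to hover over each one with ActionChains before clicking:
from selenium.webdriver.common.action_chains import ActionChains

for element in driver.find_elements_by_css_selector("#tiles li[onclick][data-item-id] a[title]"):
    # Hover to reveal the hidden link, then click it
    ActionChains(driver).move_to_element(element).click(element).perform()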

Should be Simple XPATH?

Using Python and Selenium, I'm trying to click a link if it contains certain text, in this case say 14:10, and this is the DIV I'm after.
<div class="league_check" id="hr_selection_18359391" onclick="HorseRacingBranchWindow.showEvent(18359391);" title="Odds Available"> <span class="race-status"> <img src="/i/none_v.gif" width="12" height="12" onclick="HorseRacingBranchWindow.toggleSelection(18359391); cancelBubble(event);"> </span>14:10 * </div>
I've been watching the browser move manually. I know the DIV has loaded before my code fires, but I can't figure out what it is actually doing.
It looked pretty straightforward. I'm not great at XPath, but I usually manage the basics.
justtime = "14:10"
links = Driver.find_elements_by_xpath("//div[contains(.,justtime)]")
As far as I can see, no other link on that page contains the text 14:10, but when I loop through links and print them out, it shows basically every link on the page.
I've tried to narrow it down to that class name and the contained text:
justtime = "14:10"
links = Driver.find_elements_by_xpath("//div[contains(.,justtime) and (contains(@class, 'league_check'))]")
This doesn't return anything at all. I'm really stumped; it's making no sense to me at all.
Currently, your XPath doesn't make use of the justtime Python variable. Instead, it references a child element <justtime>, which doesn't exist within the <div>. An expression of the form contains(., nonExistentElement) always evaluates to true, because nonExistentElement translates to an empty string here. This is probably why your initial XPath returned more elements than expected.
Try incorporating the value of the justtime variable into your XPath using string interpolation, and don't forget to enclose the value in quotes so that it is properly recognized as an XPath string literal:
justtime = "14:10"
links = Driver.find_elements_by_xpath("//div[contains(.,'%s')]" % justtime)
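To also narrow by the class name, as attempted in the question, the same interpolation can be combined with the class predicate (a sketch, untested against the actual page):
justtime = "14:10"
links = Driver.find_elements_by_xpath(
    "//div[contains(., '%s') and contains(@class, 'league_check')]" % justtime)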
You need to wait for the element to be clickable:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.ID, 'someid')))
