Unable to get all children (dynamic loading) selenium python

This question has already been answered, and one of the easiest approaches is to search for a known tag name within the element:
child_elements = element.find_elements_by_tag_name("<tag name>")
However, for the element pasted below, only 9 out of 25 instances of the tag name are returned. I am a novice in JavaScript, so I am unable to pin down the reason. In this example, I am trying to get the dt tags within the ol element. The code snippet I am using is:
par_element = browser.find_element_by_class_name('search-results__result-list')
child_elements = par_element.find_elements_by_tag_name("dt")
The element skeleton/structure from the page source is shown in the image below (the structure is the same for all the div tags; one is expanded as an example):
I have also tried getting the class name result-lockup__name directly, and it still returns only 9 out of the 25 instances. What could be the reason?
EDIT
Initially, not all the elements were loaded, so I had to scroll through the page with:
browser.execute_script('window.scrollTo(0,document.body.scrollHeight)')
When the problem occurred again and I could not figure it out, I raised this question. Apparently even the scroll is not helping, as certain elements appear hidden.
After manually scrolling through them again, with the code paused, I was able to "enable" them.
Is this a kind of mask to protect sites from being scraped? It now seems I would have to scroll in increments to reveal them all, but is there a smarter way?

The elements are loading dynamically, so you need to scroll the page slowly to get all the child elements. Try the code below; hopefully it will work. This is just a workaround.
import time

from selenium.webdriver.common.keys import Keys

element_list = []
while True:
    # nudge the page down, then give lazy-loaded items time to render
    browser.find_element_by_tag_name("body").send_keys(Keys.DOWN)
    time.sleep(2)
    listlen_before = len(element_list)
    par_element = browser.find_element_by_class_name('search-results__result-list')
    child_elements = par_element.find_elements_by_tag_name("dt")
    for ele in child_elements:
        if ele.text not in element_list:
            element_list.append(ele.text)
    listlen_after = len(element_list)
    # stop once a full pass adds no new entries
    if listlen_before == listlen_after:
        break
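A smarter variant of the same idea (my sketch, not part of the original answer) is to scroll in viewport-sized increments and stop once the bottom is reached and the page height stops growing:

import time

last_height = browser.execute_script("return document.body.scrollHeight")
while True:
    # scroll one viewport at a time so each batch of lazy-loaded items can render
    browser.execute_script("window.scrollBy(0, window.innerHeight);")
    time.sleep(2)
    new_height = browser.execute_script("return document.body.scrollHeight")
    at_bottom = browser.execute_script(
        "return window.innerHeight + window.pageYOffset >= document.body.scrollHeight;")
    if at_bottom and new_height == last_height:
        break
    last_height = new_height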

Related

Selenium - How can I click the next item in a list with a For loop?

I'm very new to programming so apologies in advance if I'm not communicating my issue clearly.
Essentially, using Selenium I have created a list of elements on a webpage by finding all the elements with the same class name I'm looking for.
In this case, I'm finding songs, which have the html class 'item-song' on this website.
On the website, there are lots of clickable options for each listed song. I just want to click the title of the song, which opens a popup modal window in which I edit the note attached to the song, then click save, which closes the popup.
I have successfully been able to do that by using what I guess would be called the title’s XPATH 'relative' to the song class.
songs = driver.find_elements(By.CLASS_NAME, "item-song")
songs[0].find_element(By.XPATH, "div[5]/a").click()
# other code that ends by closing popup
This works, hooray! It also works for any other list index that I put in that line of code.
However, it does not work sequentially, or in a for loop.
i.e.
songs[0].find_element(By.XPATH, "div[5]/a").click()
# other code
time.sleep(5) # to ensure the popup has finished closing
songs[1].find_element(By.XPATH, "div[5]/a").click()
Does not work.
for song in songs:
    song.find_element(By.XPATH, "div[5]/a").click()
    # other code
    time.sleep(5)
    continue
Also does not work.
I get a traceback error:
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
After going back to the original page, the song does now say note(1) so I suppose the site has changed slightly. But as far as I can tell, the 'songs' list object and the xpath for the title of the next song should be exactly the same. To verify this, I even tried:
for song in songs:
    print(song)
    print(songs)
    print()
    song.find_element(By.XPATH, "div[5]/a").click()
    # other code
Sure enough, on the first iteration, print(song) matched the first index of print(songs) and on the second iteration, print(song) matches the second index of print(songs). And print(songs) is identical both times. (Only prints twice as the error happens halfway through the second iteration)
Any help is greatly appreciated, I'm stumped!
---------------------------------
Edit: Of course, it would be easier if my songs list could be all the song titles instead of the class 'item-song'; that was what I tried first. However, I couldn't find anything common between the titles in the HTML that would let me use find_elements to get just the song title element, as each song has a different title, and there are also other items like videos listed in between the songs.
Through the comments, the solution is to use an iterative loop and an xpath.
songs = driver.find_elements(By.CLASS_NAME, "item-song")
for i in range(1, len(songs) + 1):  # XPath ordinals start at 1
    driver.find_element(By.XPATH, "(//*[@class='item-song'])[" + str(i) + "]/div[5]/a").click()
Breaking this down:
this: By.XPATH, "//*[@class='item-song']" is the same as this: By.CLASS_NAME, "item-song". The former is the xpath equivalent of the latter. I did this so we can build a single identification string to the link instead of trying to find elements within elements.
The [" + str(i) + "] is the iteration for the loop. If you were to print this you'd see (//*[@class='item-song'])[1]/div[5]/a then (//*[@class='item-song'])[2]/div[5]/a. That [x] is the ordinal identifier - it means the xth instance of the element in the DOM. The parentheses ensure the entire expression is matched before the ordinal is applied - you can sometimes get unexpected matches without them.
The last part /div[5]/a is just the original solution. Doing div[5] isn't great. Your link must ALWAYS be inside the 5th div else it will fail - but as I can't see your application I can't comment on another way.
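As a hedged aside (my addition, not from the original answer): if the title link carries any stable attribute, a relative locator would be sturdier than div[5]. For example, assuming a hypothetical class name on the link:

# 'song-title' is a hypothetical class name; substitute whatever stable
# attribute the real markup offers
driver.find_element(
    By.XPATH, "(//*[@class='item-song'])[" + str(i) + "]//a[contains(@class, 'song-title')]"
).click()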
The original approach throws a StaleElementReferenceException because of the way Selenium stores identified elements.
Once you've identified an element by doing driver.find_elements(By.CLASS_NAME, "item-song"), Selenium essentially captures a reference to it - it doesn't store the identifier you used. Stick a breakpoint after you identify an element and you'll see something like this:
(That image is from Visual Studio, as that's what I have to hand, but you can see the ID is a GUID.)
Once you change the page, that reference is lost.
Repeat the same steps, identify the same object, and the ID is unique every time. This is the same breakpoint and the same element on a second test run:
Page has changed == Selenium can no longer find it == Stale element.
The solution in this answer works because we're not storing an element.
Every action in the loop freshly identifies the element.
..Then add some clever pun about fresh vs stale... ;-)

Python Selenium Error - StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

Today I'm having trouble with an "a href" button that does not have any ID to identify it. Let me explain the problem a little more... I have a structure like this one (let's assume XXX is an anonymized path):
wait = WebDriverWait(driver, 5)
el = wait.until(EC.presence_of_element_located((By.ID, 'XXX1')))
entries = el.find_elements_by_tag_name('tr')
for i in range(len(entries)):
    if entries[i].find_element_by_xpath(XXX2).text == compare:
        el = wait.until(EC.element_to_be_clickable((By.ID, XXX3)))
        el.click()
        el = wait.until(EC.presence_of_element_located((By.ID, XXX4)))
        entries2 = el.find_elements_by_tag_name('tr')
        for j in range(len(entries2)):
            # Some statements...
            xpath = ".../a"
            your_element = WebDriverWait(driver, 10) \
                .until(EC.element_to_be_clickable((By.XPATH, xpath)))  # Here the problem
            your_element.click()
I'm getting information from a hybrid page (dynamic and static) using ChromeDriver. Once I get a big table, every row contains a button that shows more info, so I need to click it to open it as well; the main problem is that when this operation iterates, the error in the title is shown. In summary: first I search for something and click the search button, then I get a table where every row (in the last column) has a button that, once clicked, shows more information. Consequently I need to open it and close it before moving to the next row. It works with the first row, but with the second one it crashes. I would really appreciate any advice on how to handle this problem.
Thanks in advance!
The problem is that your click changes the DOM within the loop. For me this never worked.
One solution is to re-query within the loop to make sure you're at the correct position. Your third line:
entries = el.find_elements_by_tag_name('tr')
should be executed on every iteration, using a counter to make sure you are at the correct position in your <tr> entries, as sketched below.
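A minimal sketch of that idea, assuming the table container keeps the placeholder id 'XXX1' from the question:

wait = WebDriverWait(driver, 5)
table = wait.until(EC.presence_of_element_located((By.ID, 'XXX1')))
total = len(table.find_elements_by_tag_name('tr'))
for i in range(total):
    # re-find the table and its rows on every pass so the references
    # come from the current DOM and never go stale
    table = wait.until(EC.presence_of_element_located((By.ID, 'XXX1')))
    row = table.find_elements_by_tag_name('tr')[i]
    # ... click the row's button, read the details, close it ...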

How to select, copy and paste everything within an element using selenium webdriver (python)

Here is basically what I'm trying to do. I have two websites; website A contains the data I need to move over to website B. In essence,
I'm migrating data from website A to B, as website A is going down
soon.
What I need to move is not just text; it can be text, images, or hyperlinked text, and there is some formatting I need to keep. I think it is simplest to copy and paste rather than store all of this data in a way that would let me insert it into website B exactly as if it had been copied and pasted. Before my code solution, they were literally copying and pasting everything from A to B. Right now I have everything implemented (getting the links and everything else required) in my code, but I can't move the data over. So basically here's what I'm doing right before I try to copy and paste the data. I am using Python 3.
original_window = driver.current_window_handle
driver.execute_script("window.open()")
wait.until(EC.number_of_windows_to_be(2))
driver.switch_to.window(driver.window_handles[1])
actURL = a.getlink()
driver.get(actURL)
e = a.getactivitydata(driver)
driver.close()
driver.switch_to.window(driver.window_handles[0])
Here a is a custom object with a getlink method that returns the link to the website A page I need the data from. It also has a getactivitydata method, which is where I want to select, copy, and return the content. The method's code is:
def getactivitydata(self, driver):
    r = driver.page_source
    soup = BeautifulSoup(r, 'html.parser')  # Raw html obj
    ty = self.typef
    if ty == 'page':
        elem = driver.find_element_by_id("page-content")
        end = driver.find_element_by_class_name('course-nav')
        a = ActionChains(driver)
        #elem.send_keys("bar")
        a.move_to_element(elem)
        a.click_and_hold().perform()
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        a.move_to_element(end)
        a.key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform()
        #elem.send_keys(Keys.CONTROL, 'a')  # Select all
        #elem.send_keys(Keys.CONTROL, 'c')  # Copy
        return elem
    elif ty == 'quiz':
        pass
    elif ty == 'assign':
        pass
    elif ty == 'folder':
        pass
    elif ty == 'glossary':
        pass
    elif ty == 'resource':
        pass
    elif ty == 'forum':
        pass
ty represents the type of page, as each page will need to be handled slightly differently. What I want this to do is select all of the text and images inside the HTML element with the id 'page-content'. When I run the code (with plenty of other code that works) I get the following exception:
selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
This exception is raised on the line where I try to actually copy the data I need.
The elements that actually contain the text are <h1>, <h2>, etc., and images are contained in <img> tags. If I were to somehow loop through all of these, highlight them all, and then copy and paste, how would I do that while preserving the order in which the images appear within the text? And how would I know the number of <hn> tags, as it is different for every page? I have tried a few different elements/methods to select and copy the text, and so far I have not been able to successfully highlight any text (that I can visually see; I use Chrome WebDriver 89).
The smallest HTML tag they all belong to is <div class="row">, and there are multiple rows in even a basic single box of text/images.
Any help or guidance is welcome. I'm not against using a method other than copy/paste, but I do need it to output on website B as if it had been copied and pasted. Also, there are types of pages that have multiple separate elements (like an online quiz, where you have question 1, a)..., b)..., question 2, etc.). Thank you!
So, just to update anyone down the line trying to do the same thing: clicking and dragging did not work; what did work was
a = ActionChains(driver)
#elem.send_keys("bar")
elem = driver.find_element(By.ID, "maincontent")
#wait = WebDriverWait(driver, 10)
#first = wait.until(EC.element_to_be_clickable(elem))
a.move_to_element_with_offset(elem, 0, 0)
a.key_down(Keys.SHIFT)
a.double_click(elem).double_click(elem)
end = driver.find_element_by_xpath('/html/body/div[1]/div[3]/div[3]')
a.move_to_element(end).double_click(end)
a.key_up(Keys.SHIFT)
a.key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform()
Basically, what I'm doing here is holding Shift and quadruple-clicking (I think three clicks would technically do) at the start of the element and at another element that's always below it. I tried to implement an offset but could not get it to work.
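As an alternative sketch (my suggestion, not part of the original answer), the browser's own Selection/Range API can select an element's entire contents without simulating clicks; 'maincontent' is the same id used above:

# select everything inside #maincontent via the DOM Selection API,
# then send Ctrl+C as before
driver.execute_script("""
    var el = document.getElementById('maincontent');
    var range = document.createRange();
    range.selectNodeContents(el);
    var sel = window.getSelection();
    sel.removeAllRanges();
    sel.addRange(range);
""")
ActionChains(driver).key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform()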

Get a page with Selenium but wait for unknown element value to not be empty

Context
This is a repost of Get a page with Selenium but wait for element value to not be empty, which was closed without any valid reason so far as I can tell.
The linked answers in the closure reasoning both rely on knowing what the expected text value will be. In each answer, it explicitly shows the expected text hardcoded into the WebDriverWait call. Furthermore, neither of the linked answers even remotely touch upon the final part of my question:
[whether the expected conditions] come before or after the page Get
"Duplicate" Questions
How to extract data from the following html?
Assert if text within an element contains specific partial text
Original Question
I'm grabbing a web page using Selenium, but I need to wait for a certain value to load. I don't know what the value will be, only what element it will be present in.
It seems that using the expected condition text_to_be_present_in_element_value or text_to_be_present_in_element is the most likely way forward, but I'm having difficulty finding any actual documentation on how to use these and I don't know if they come before or after the page Get:
webdriver.get(url)
Rephrase
How do I get a page using Selenium but wait for an unknown text value to populate an element's text or value before continuing?
I'm sure my answer is not the best one, but here is part of my own code, which helped me with something similar to your question.
In my case I had trouble with the loading time of the DOM: sometimes it took 5 seconds, sometimes 1 second, and so on.
url = 'www.somesite.com'
browser.get(url)
Because in my case browser.implicitly_wait(7) was not enough, I made a simple for loop to check whether the content is loaded.
some code...
for try_html in range(7):
    """ Make 7 tries to check if the element is loaded """
    browser.implicitly_wait(7)
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')
    raw_data = soup.find_all('script', type='application/ld+json')
    # if 'sku' is not found in the html page we skip to another try,
    # else we stop trying and scrape the page
    if 'sku' not in html:
        continue
    else:
        scrape(raw_data)
        break
It's not perfect, but you can try it.
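For the original question, an explicit wait avoids polling page_source entirely, and it always comes after the get() call, since it needs the page's DOM to poll against. A minimal sketch, where "target" is a hypothetical element id standing in for the real one:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By

browser.get(url)
# poll for up to 10 seconds until the element's text is non-empty,
# without needing to know what the text will be
WebDriverWait(browser, 10).until(
    lambda d: d.find_element(By.ID, "target").text.strip() != ""
)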

Python crawler not finding specific Xpath

I asked my previous question here:
Xpath pulling number in table but nothing after next span
This worked, and I managed to see the number I wanted in a Firefox plugin called XPath Checker; the results are shown below.
So I know I can find this number with this xpath, but when I try to run a Python script to find and save the number, it says it cannot find it.
try:
    views = browser.find_element_by_xpath("//div[@class='video-details-inside']/table//span[@class='added-time']/preceding-sibling::text()")
except NoSuchElementException:
    print("NO views")
    views = 'n/a'
    pass
I know that pass is not best practice, but I am just testing this at the moment, trying to find the number. I'm wondering if I need to change something at the end of the xpath, like adding .text, as XPath Checker normally shows results a little differently, like below:
I needed to use the xpath I gave rather than the one used in the picture above because I only want the number and not the date. You can see part of the source in my previous question.
Thanks in advance! Scratching my head here.
The xpath used in find_element_by_xpath() has to point to an element, not a text node and not an attribute. This is a critical thing here.
The easiest approach here would be to:
get the td's text (parent)
get the span's text (child)
remove child's text from parent's
Code:
span = browser.find_element_by_xpath("//div[@class='video-details-inside']/table//span[@class='added-time']")
td = span.find_element_by_xpath('..')
views = td.text.replace(span.text, '').strip()
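An alternative sketch (my addition, not from the original answer) reads the text node directly through the DOM, assuming the number's text node immediately precedes the span:

# ask the browser for the raw text node just before the span
views = browser.execute_script(
    "return arguments[0].previousSibling.textContent;", span).strip()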
