I can't scrape the 'Resolution' field from this webpage, which I believe is rendered with JavaScript.
Webpage address:
https://support.na.sage.com/selfservice/viewdocument.do?noCount=true&externalId=60390&sliceId=1&noCount=true&isLoadPublishedVer=&docType=kc&docTypeID=DT_Article&stateId=4183&cmd=displayKC&dialogID=197243&ViewedDocsListHelper=com.kanisa.apps.common.BaseViewedDocsListHelperImpl&openedFromSearchResults=true
I need to extract Description, Cause, and Resolution.
I have tried various ways to locate the elements, including:
find_element_by_xpath
find_element_by_id
find_element_by_class_name.
None of them gave the desired result.
Could you point me in the right direction?
https://support.na.sage.com/selfservice/viewContent.do?externalId=60390&sliceId=1
This is the correct URL whose HTML you can crawl; use the Network tab of your browser's DevTools to find it.
Example with Chrome's DevTools Network tab (screenshot).
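Once you have that URL, a minimal sketch of fetching it directly with requests and parsing it with BeautifulSoup (no Selenium needed); the section lookup at the end is an assumption about the markup and will likely need adjusting to what the response actually contains:

# A minimal sketch, assuming the viewContent.do URL returns the article as plain HTML.
# The section lookup is illustrative only; adjust it to the page's actual markup.
import requests
from bs4 import BeautifulSoup

url = ("https://support.na.sage.com/selfservice/viewContent.do"
       "?externalId=60390&sliceId=1")

response = requests.get(url, timeout=30)
soup = BeautifulSoup(response.text, "html.parser")

# Find each section heading by its text, then print the surrounding element's text.
for section in ("Description", "Cause", "Resolution"):
    heading = soup.find(string=lambda s: s and s.strip() == section)
    if heading is not None:
        print(section, "->", heading.find_parent().get_text(" ", strip=True))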
Related
Using Selenium in Python.
I have a scraping tool that opens a site, grabs text elements using XPath, cleans those elements, and closes the page. The tool is getting too bulky to keep cleaning the elements while the driver is still connected. So what I want to do instead is open the page, grab the entire HTML source, close the page, and then pull what I want out of the source using XPath. But since the page is now just text, I'm unable to use the XPath methods in Selenium. Any recommendations?
I am trying to scrape some content from a website, but Selenium's driver.page_source does not contain all the content I need because the website is dynamically rendered. When opening DevTools in Chrome, you are able to inspect all of the DOM elements, even those rendered dynamically. This made me believe that there must be a way to do this in Selenium as well.
Any suggestions?
Get the inner HTML of the html or body element:
driver.find_element_by_xpath("/html/body").get_attribute('innerHTML')
If that does not get everything, please post the source html/website.
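If you want to keep running XPath queries after the driver is closed, a minimal sketch is to capture the rendered HTML as a string and parse it offline with lxml; the URL and the final XPath are placeholders:

# A minimal sketch: capture the rendered DOM, quit the browser, then run XPath
# offline against the captured string with lxml. URL and XPath are placeholders.
from selenium import webdriver
from lxml import html as lxml_html

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL

# Rendered DOM as a string (the innerHTML approach shown above works the same way).
page_html = driver.page_source
driver.quit()

# Parse the captured string and query it with the same XPath syntax as before.
tree = lxml_html.fromstring(page_html)
print(tree.xpath("//h1/text()"))  # placeholder XPath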
I know there are plenty of ways to get a page's HTML source by passing its URL.
But is there a way to get the current HTML of a page after it displays data in response to some action?
For example: a simple HTML page with a button (that's the source HTML) that displays random data when you click it.
Thanks
I believe you're looking for a class of tools collectively known as "headless browsers". The only one I've used that is available in Python (and can vouch for) is Selenium WebDriver, but there are plenty to choose from if you search for headless browsers for Python.
https://pypi.org/project/selenium
With this you should be able to programmatically load a web page, look up and click the button in the virtually rendered DOM, then lookup the innerHTML property of the targeted element.
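As a rough sketch of that flow with Selenium and headless Chrome (the URL, button ID, and result ID below are made-up placeholders):

# A minimal sketch with headless Chrome: load the page, click a button, then read
# the target element's innerHTML. The URL and element IDs are placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://example.com")                 # placeholder URL
    driver.find_element(By.ID, "load-data").click()   # hypothetical button id

    # Wait until the dynamically generated element shows up, then grab its HTML.
    target = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "result"))  # hypothetical id
    )
    print(target.get_attribute("innerHTML"))
finally:
    driver.quit()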
I would like to get the page source of the link I obtained from an <a> href, without making Selenium change the page.
I am getting the <a> element using
driver.find_element(By.XPATH, "//a[contains(@class, 'css-1xyedec e1pf1lj70')]")
Then I can get the link in the href using
elem.get_attribute('href')
But I cannot find a way to get the page source of the link using Selenium without changing the browser's current page.
EDIT: Here is the website on which I am trying to do it. The <a> element is located, for each sale, in the div that contains the photo and the section with the title, price, and so on.
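As a sketch of one possible approach, not confirmed by the thread: collect the href values with Selenium, then fetch each linked page with a separate requests session so the driver never navigates away (the URL and class names are placeholders):

# A sketch (an assumption, not from the thread): gather hrefs with Selenium, then
# fetch each linked page with requests so the browser never changes page.
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/sales")  # placeholder URL

links = driver.find_elements(By.XPATH, "//a[contains(@class, 'css-1xyedec e1pf1lj70')]")
hrefs = [link.get_attribute("href") for link in links]

# Fetch each linked page outside the browser; the driver stays on the listing page.
for href in hrefs:
    page_html = requests.get(href, timeout=30).text
    print(href, len(page_html))

driver.quit()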
Check this answer: https://sqa.stackexchange.com/questions/17022/how-to-fill-captcha-using-test-automation.
You cannot automate a CAPTCHA. You should ask the dev team for a workaround.
I would ask to have the CAPTCHA disabled in the test environment. There is no sense in having it there.
When I search for an XPath in my browser after inspecting the page, it shows the required result, but when I use the same XPath on my response in Scrapy, it returns an empty list.
So when I find an element in the browser, it shows the number of matching elements (see the picture for an example).
Now, when I run the same XPath on my response in the Scrapy shell, I get an empty list, even though the response status is 200. What could be causing this?
Your browser renders JavaScript, and this changes the HTML. So, in this case, you need a JavaScript rendering engine for your requests in Scrapy. Please look at scrapy-splash to render the JS and get the same results as in the browser.
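A minimal sketch of a Splash-backed spider, assuming a Splash instance is running at http://localhost:8050 and that the scrapy-splash middlewares are enabled in the project settings as described in its README; the URL and XPath are placeholders:

# A minimal sketch: route requests through Splash so JavaScript runs before the
# response reaches the spider. URL and XPath are placeholders.
import scrapy
from scrapy_splash import SplashRequest


class JsSpider(scrapy.Spider):
    name = "js_spider"

    def start_requests(self):
        yield SplashRequest(
            "https://example.com",  # placeholder URL
            callback=self.parse,
            args={"wait": 2},       # give the page time to render
        )

    def parse(self, response):
        # The same XPath you tested in the browser should now match, because
        # response.text contains the rendered HTML.
        yield {"titles": response.xpath("//h1/text()").getall()}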
Also, if you use the Chrome browser, some tags can differ a little from what you get with requests or Scrapy, because Chrome will automatically add elements to the HTML (for example, <tbody> inside tables).