I can't scrape the 'Resolution' field from this webpage, which I believe is rendered with JavaScript.
Webpage address:
https://support.na.sage.com/selfservice/viewdocument.do?noCount=true&externalId=60390&sliceId=1&noCount=true&isLoadPublishedVer=&docType=kc&docTypeID=DT_Article&stateId=4183&cmd=displayKC&dialogID=197243&ViewedDocsListHelper=com.kanisa.apps.common.BaseViewedDocsListHelperImpl&openedFromSearchResults=true
I need to extract Description, Cause, and Resolution.
I have tried various ways to locate the elements, including:
find_element_by_xpath
find_element_by_id
find_element_by_class_name.
None of them gave the desired result.
Could you point me in the right direction?
https://support.na.sage.com/selfservice/viewContent.do?externalId=60390&sliceId=1
This is the correct URL whose HTML you can crawl; use the Network tab of your browser's DevTools to find it.
Example with Chrome's DevTools Network tab (screenshot).
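Once you have that URL, a minimal sketch of fetching it directly with requests and parsing it with BeautifulSoup (no Selenium needed); the section lookup at the end is an assumption about the markup and will likely need adjusting to what the response actually contains:

# A minimal sketch, assuming the viewContent.do URL returns the article as plain HTML.
# The section lookup is illustrative only; adjust it to the page's actual markup.
import requests
from bs4 import BeautifulSoup

url = ("https://support.na.sage.com/selfservice/viewContent.do"
       "?externalId=60390&sliceId=1")

response = requests.get(url, timeout=30)
soup = BeautifulSoup(response.text, "html.parser")

# Find each section heading by its text, then print the surrounding element's text.
for section in ("Description", "Cause", "Resolution"):
    heading = soup.find(string=lambda s: s and s.strip() == section)
    if heading is not None:
        print(section, "->", heading.find_parent().get_text(" ", strip=True))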
Related
Using Selenium in Python.
I have a scraping tool that opens a site, grabs text elements using XPath, cleans those elements, and closes the page. The tool is getting too bulky to keep cleaning the elements while the driver is still connected. So what I want to do instead is open the page, grab the entire HTML source, close the page, and then pull what I want out of the source using XPath. But since the page is now just text, I'm unable to use the XPath methods in Selenium. Any recommendations?
I am trying to scrape some content from a website, but Selenium's driver.page_source does not contain all the content I need because the website is dynamically rendered. When opening DevTools in Chrome, you are able to inspect all of the DOM elements, even those rendered dynamically. This made me believe that there must be a way to do this in Selenium as well.
Any suggestions?
Get the inner HTML of the html or body element:
driver.find_element_by_xpath("/html/body").get_attribute('innerHTML')
If that does not get everything, please post the source html/website.
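If you want to keep running XPath queries after the driver is closed, a minimal sketch is to capture the rendered HTML as a string and parse it offline with lxml; the URL and the final XPath are placeholders:

# A minimal sketch: capture the rendered DOM, quit the browser, then run XPath
# offline against the captured string with lxml. URL and XPath are placeholders.
from selenium import webdriver
from lxml import html as lxml_html

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL

# Rendered DOM as a string (the innerHTML approach shown above works the same way).
page_html = driver.page_source
driver.quit()

# Parse the captured string and query it with the same XPath syntax as before.
tree = lxml_html.fromstring(page_html)
print(tree.xpath("//h1/text()"))  # placeholder XPath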
I know there are plenty of ways to get a page's HTML source by passing its URL.
But is there a way to get the current HTML of a page after it displays data in response to some action?
For example: a simple HTML page with a button (that's the source HTML) that displays random data when you click it.
Thanks
I believe you're looking for a class of tools collectively known as "headless browsers". The only one I've used that is available in Python (and can vouch for) is Selenium WebDriver, but there are plenty to choose from if you search for headless browsers for Python.
https://pypi.org/project/selenium
With this you should be able to programmatically load a web page, look up and click the button in the virtually rendered DOM, then lookup the innerHTML property of the targeted element.
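As a rough sketch of that flow with Selenium and headless Chrome (the URL, button ID, and result ID below are made-up placeholders):

# A minimal sketch with headless Chrome: load the page, click a button, then read
# the target element's innerHTML. The URL and element IDs are placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://example.com")                 # placeholder URL
    driver.find_element(By.ID, "load-data").click()   # hypothetical button id

    # Wait until the dynamically generated element shows up, then grab its HTML.
    target = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "result"))  # hypothetical id
    )
    print(target.get_attribute("innerHTML"))
finally:
    driver.quit()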
I would like to get the page source of the link I obtained from an <a> href, without making Selenium change the page.
I am getting the <a> element using
driver.find_element(By.XPATH, "//a[contains(@class, 'css-1xyedec e1pf1lj70')]")
Then I can get the link in the href using
elem.get_attribute('href')
But I cannot find a way to get the page source of the link using Selenium without changing the browser's current page.
EDIT: Here is the website on which I am trying to do it. The <a> element is located, for each sale, in the div that contains the photo and the section with the title, price, and so on.
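As a sketch of one possible approach, not confirmed by the thread: collect the href values with Selenium, then fetch each linked page with a separate requests session so the driver never navigates away (the URL and class names are placeholders):

# A sketch (an assumption, not from the thread): gather hrefs with Selenium, then
# fetch each linked page with requests so the browser never changes page.
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/sales")  # placeholder URL

links = driver.find_elements(By.XPATH, "//a[contains(@class, 'css-1xyedec e1pf1lj70')]")
hrefs = [link.get_attribute("href") for link in links]

# Fetch each linked page outside the browser; the driver stays on the listing page.
for href in hrefs:
    page_html = requests.get(href, timeout=30).text
    print(href, len(page_html))

driver.quit()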
Check this answer: https://sqa.stackexchange.com/questions/17022/how-to-fill-captcha-using-test-automation.
You cannot automate a CAPTCHA. You should ask the dev team for a workaround.
I would ask to have the CAPTCHA disabled in the test environment. There is no sense in having it there.
When I search for an XPath in my browser after inspecting the page, it shows the required result, but when I use the same XPath on my response in Scrapy, it returns an empty list.
So when I find an element in the browser, it shows the number of matching elements (see the picture for an example).
Now, when I run the same XPath on my response in the Scrapy shell, I get an empty list, even though the response status is 200. What could be causing this?
Your browser renders JavaScript, and this changes the HTML. So, in this case, you need a JavaScript rendering engine for your requests in Scrapy. Please look at scrapy-splash to render the JS and get the same results as in the browser.
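A minimal sketch of a Splash-backed spider, assuming a Splash instance is running at http://localhost:8050 and that the scrapy-splash middlewares are enabled in the project settings as described in its README; the URL and XPath are placeholders:

# A minimal sketch: route requests through Splash so JavaScript runs before the
# response reaches the spider. URL and XPath are placeholders.
import scrapy
from scrapy_splash import SplashRequest


class JsSpider(scrapy.Spider):
    name = "js_spider"

    def start_requests(self):
        yield SplashRequest(
            "https://example.com",  # placeholder URL
            callback=self.parse,
            args={"wait": 2},       # give the page time to render
        )

    def parse(self, response):
        # The same XPath you tested in the browser should now match, because
        # response.text contains the rendered HTML.
        yield {"titles": response.xpath("//h1/text()").getall()}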
Also, if you use the Chrome browser, some tags can differ a little from what you get with requests or Scrapy, because Chrome will automatically add elements to the HTML (for example, <tbody> inside tables).