How to get image of particular web element as base64 - python

I'm using Selenium WebDriver and Python to create automation scripts for web application testing. I need to implement a verification that compares two base64-encoded PNG strings: a saved reference image and the current image of the same web element on the page. Selenium has a method that returns a screenshot of the whole page as a base64 string:
driver.get_screenshot_as_base64()
But how can I get a base64 screenshot of just a particular image element on the page, rather than the whole page, without downloading it?
P.S. Other ways of comparing two images are also acceptable :)

There is an answer to another question that explains how to take a screenshot of an element here. Once you have that, you should be able to do a pixel-by-pixel comparison of the two images; code examples for that are easy to find.
I don't see much information on base64 images. Comparing them would be a really convenient approach, since you would just do a quick string comparison, but Selenium doesn't seem to support taking a screenshot of an element in base64. You could probably take the screenshot and convert both it and the reference image to base64, but that would likely be more work than using a library or existing code for comparing two images, which has been done many times before and is all over the web.
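One common way to get an element-only image, sketched here as an assumption rather than the linked answer's exact code, is to take a full screenshot, cut out the element's rectangle using its `location` and `size`, and re-encode the result as base64. The sketch assumes Pillow is installed, the page is scrolled to the top with the element inside the viewport, and a device pixel ratio of 1 unless you pass `scale`; `element_screenshot_base64` is a hypothetical helper name.

```python
import base64
import io

from PIL import Image


def element_screenshot_base64(driver, element, scale=1.0):
    """Crop one element out of a screenshot and return it as a base64 PNG string.

    Assumes the element is inside the viewport and the page is not scrolled.
    `scale` is the device pixel ratio; on HiDPI screens pass e.g. 2.0.
    """
    png_bytes = driver.get_screenshot_as_png()  # screenshot as raw PNG bytes
    page = Image.open(io.BytesIO(png_bytes))

    loc = element.location  # {'x': ..., 'y': ...} in CSS pixels
    size = element.size     # {'width': ..., 'height': ...} in CSS pixels
    left = int(loc["x"] * scale)
    top = int(loc["y"] * scale)
    right = int((loc["x"] + size["width"]) * scale)
    bottom = int((loc["y"] + size["height"]) * scale)

    cropped = page.crop((left, top, right, bottom))
    buffer = io.BytesIO()
    cropped.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```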

The following should work according to the docs, but does not, and there is an open issue for it here: https://github.com/SeleniumHQ/selenium/issues/912. In the meantime, I would suggest https://stackoverflow.com/a/15870708/1415130
Find your web page element however you want (see the docs: Locating Elements):
from selenium.webdriver.common.by import By
login_form = driver.find_element(By.ID, 'loginForm')  # find_element_by_id() was removed in Selenium 4.3
Then take a screenshot of the element:
screenshot = login_form.screenshot_as_base64  # a property, not a method: no parentheses
To compare screenshots, I'm using Pillow.
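A minimal sketch of such a Pillow comparison, assuming Pillow is installed and both strings are base64-encoded PNGs. It compares decoded pixels instead of the raw strings, since two PNG files with identical pixels can still differ byte for byte. `decode_png` and `images_match` are hypothetical helper names.

```python
import base64
import io

from PIL import Image, ImageChops


def decode_png(b64_string):
    """Turn a base64-encoded PNG string into an RGB Pillow image."""
    # Converting to RGB matters: for RGBA images getbbox() looks only at alpha
    # by default, which would hide colour differences.
    return Image.open(io.BytesIO(base64.b64decode(b64_string))).convert("RGB")


def images_match(b64_a, b64_b):
    """True if both base64 PNGs have the same size and identical pixels."""
    a, b = decode_png(b64_a), decode_png(b64_b)
    if a.size != b.size:
        return False
    # getbbox() returns None when the difference image is all zeros.
    return ImageChops.difference(a, b).getbbox() is None
```

For example, `images_match(saved_b64, login_form.screenshot_as_base64)`, where `saved_b64` is a reference string you stored earlier. An exact check like this will fail on anti-aliasing or font-rendering differences, so a tolerance on the difference image may be needed in practice.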

Related

How do I check if a website is responsive using python?

I am using python3 in combination with beautifulsoup.
I want to check whether a website is responsive. My first thought was to check the website's meta tags and see if they contain something like this:
<meta name="viewport" content="width=device-width, initial-scale=1.0">
The accuracy of this method is not very good, but I have not found anything better.
Does anybody have an idea?
Basically, I want to do the same as Google does here: https://search.google.com/test/mobile-friendly but reduced to a yes/no output of whether the website is responsive.
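The meta-tag heuristic from the question can be sketched as follows. It swaps BeautifulSoup for Python's built-in `html.parser` so it runs without extra packages, and it carries the same accuracy limits: a viewport tag says nothing about whether the CSS actually adapts. `ViewportMetaFinder` and `declares_device_width` are hypothetical names.

```python
from html.parser import HTMLParser


class ViewportMetaFinder(HTMLParser):
    """Records the content of the <meta name="viewport"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.viewport_content = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; self-closing
        # <meta ... /> tags are also routed here by the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "viewport":
            self.viewport_content = attributes.get("content", "")


def declares_device_width(html):
    """Heuristic only: True if the viewport meta tag uses width=device-width."""
    finder = ViewportMetaFinder()
    finder.feed(html)
    content = (finder.viewport_content or "").replace(" ", "").lower()
    return "width=device-width" in content
```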
(Just a suggestion)
I am not an expert on this, but my first thought is that you need to render the website and see whether it "responds" to different screen sizes. I would normally use something like PhantomJS for this.
Apparently, you can do this in Python with Selenium (more info at https://stackoverflow.com/a/15699761/3727050). A more comprehensive list of technologies that can be used for this task can be found here. Note that these resources seem somewhat outdated, and some solutions fall back to calling PhantomJS through a Python subprocess.
The linked Google test seems to load the page in a small browser and check:
That the font size is readable
That the distance between clickable elements is large enough for the page to be usable
However, I would do the following:
Load the page in desktop mode and record each div's style.
Gradually reduce the screen size and see what percentage of these divs change style.
In most cases, going from a large screen down to phone size, you should see 1-3 distinct layouts, which should be identifiable from the percentage of elements changing style.
The above does not guarantee that the page is "mobile-friendly" (i.e. usable on a mobile device), but it shows whether the CSS is responsive.
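The steps above could be sketched like this, assuming an already created Selenium `driver`. `set_window_size` and `execute_script` are real WebDriver methods; the helper names, the list of widths, and the choice of style properties (`display`, `position`, `float`, `flex-direction`) are assumptions to tune for your pages. Divs are matched by their order in the DOM.

```python
# JavaScript executed in the page: one signature string per <div>, built from a
# few computed-style properties that typically change at CSS breakpoints.
# The property list is an assumption; adjust it for your pages.
STYLE_SNAPSHOT_JS = """
return Array.from(document.querySelectorAll('div')).map(function (el) {
  var s = window.getComputedStyle(el);
  return [s.display, s.position, s.cssFloat, s.flexDirection].join('|');
});
"""


def snapshot_styles(driver, width, height=900):
    """Resize the browser window and return the style signature of every <div>."""
    # set_window_size sets the outer window, so the viewport is slightly narrower.
    driver.set_window_size(width, height)
    return driver.execute_script(STYLE_SNAPSHOT_JS)


def fraction_changed(baseline, current):
    """Share of <div> signatures that differ from the baseline snapshot.

    Divs are matched by DOM order; divs added or removed count as changed.
    """
    total = max(len(baseline), len(current))
    if total == 0:
        return 0.0
    changed = sum(1 for old, new in zip(baseline, current) if old != new)
    changed += abs(len(baseline) - len(current))
    return changed / total


def responsive_profile(driver, widths=(1400, 1024, 768, 480, 360)):
    """Map each narrower width to the fraction of divs that changed style."""
    baseline = snapshot_styles(driver, widths[0])
    return {w: fraction_changed(baseline, snapshot_styles(driver, w)) for w in widths[1:]}
```

A page with non-responsive CSS should stay near 0.0 at every width, while a responsive one typically shows one to three jumps; where to put the yes/no threshold is left open.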

How to get visible text from a webpage using Selenium & python?

I am trying to grab a bunch of numbers that are presented in a table on a web page that I’ve accessed using Python and Selenium running headless on a Raspberry Pi. The numbers are not in the page source; rather, they are deeply embedded in complex HTML served by several URLs called by the main page (the numbers update every few seconds). I know I could parse the HTML to get the numbers I want, but the numbers are already sitting on the front page in perfect format, all in one place. I can select and copy the numbers when I view the web page in Chrome on my PC.
How can I use python and get Selenium webdriver to get me those numbers? Can Selenium simply provide all the visible text on a page? How? (I've tried driver.page_source but the text returned does not contain the numbers). Or is there a way to essentially copy text and numbers from a table visible on the screen using python and Selenium? (I’ve looked into xdotool but didn’t find enough documentation to help). I’m just learning Selenium so any suggestions will be much appreciated!
Well, I figured out the answer to my question. It's embarrassingly easy. This line gets just what I need - all the text that is visible on the web page:
from selenium.webdriver.common.by import By
page_text = driver.find_element(By.TAG_NAME, 'body').text
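If you then need the numbers themselves, a small regular expression over that text does the job. A sketch, assuming numbers look like `42`, `-3.5` or `1,234.56`; `visible_numbers` is a hypothetical helper, and it passes the locator as the string `"tag name"`, which is the value of `By.TAG_NAME`, so no Selenium import is needed here.

```python
import re

# Integers and decimals such as 42, -3.5 or 1,234.56 (thousands commas allowed).
NUMBER_PATTERN = re.compile(r"-?\d{1,3}(?:,\d{3})+(?:\.\d+)?|-?\d+(?:\.\d+)?")


def visible_numbers(driver):
    """Return every number in the page's rendered text, in order, as floats."""
    # "tag name" is the string value of Selenium's By.TAG_NAME locator.
    text = driver.find_element("tag name", "body").text
    return [float(match.replace(",", "")) for match in NUMBER_PATTERN.findall(text)]
```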
There are several reasons why you may not be able to get some information from the page:
The information hasn't loaded yet. You must wait for some time until it is ready; see this topic for a better understanding. Sometimes page elements are added dynamically by JavaScript and load very slowly.
The information may consist of a different type of data. For example, you may be waiting for text containing numbers, but the page shows an image with numbers instead. In this situation you must change your approach and use other functions to get what you need.
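For the first case, you can poll the rendered text until the data appears. Selenium's `WebDriverWait` (in `selenium.webdriver.support.ui`) provides this kind of polling; the dependency-free loop below only illustrates the idea. `wait_for_body_text`, the default timeout and the poll interval are assumptions.

```python
import re
import time


def wait_for_body_text(driver, pattern, timeout=10.0, poll=0.5):
    """Poll the page's rendered text until `pattern` appears, then return the text.

    Raises TimeoutError if the pattern does not show up within `timeout` seconds.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        # "tag name" is the string value of Selenium's By.TAG_NAME locator.
        text = driver.find_element("tag name", "body").text
        if regex.search(text):
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{pattern!r} not found within {timeout} s")
        time.sleep(poll)
```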

How to print online webpage target element into image programmatically?

Given an online webpage:
https://stackoverflow.com/users/1974961
Given a target element with id="REPUTATION" (here artificially bordered in red) in that webpage:
How can I print this element into an image, reputation_1974961.ext?
Take a look at this library: https://www.npmjs.com/package/html2png
The html2png library lets you pass in an HTML string to its render method, and it will render the HTML into a PNG (returned as a buffer in its callback). You should then be able to save the buffer contents to a file using standard file I/O.
As for grabbing the HTML string of just that element: grab the full page with request or your request library of choice, then use something like Cheerio to target just the element you want and get its HTML. (Cheerio: https://www.npmjs.com/package/cheerio ).
There may be some gotchas; for example, you may also need to grab some styling from the returned HTML and copy it into the rendering string, but this should point you in the right direction :)
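If you would rather stay in Python, Selenium's element screenshot (the same feature discussed in the first question) can capture the element directly. A minimal sketch, assuming Selenium 4 with an already created `driver` (e.g. `webdriver.Chrome()`) and that the page really has an element with `id="REPUTATION"` as the question states; `save_element_screenshot` is a hypothetical helper.

```python
def save_element_screenshot(driver, url, element_id, path):
    """Open `url`, find the element with the given id and save it as a PNG file.

    Returns the boolean reported by Selenium's WebElement.screenshot().
    """
    driver.get(url)
    # "id" is the string value of Selenium's By.ID locator constant.
    element = driver.find_element("id", element_id)
    # Selenium always writes PNG data, so `path` should end in ".png".
    return element.screenshot(path)
```

For example: `save_element_screenshot(driver, "https://stackoverflow.com/users/1974961", "REPUTATION", "reputation_1974961.png")`.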
Not exactly using a div id, but I was able to get this much using imgkit and playing around with wkhtmltopdf options. You need to install imgkit and wkhtmltopdf as mentioned in the link.
The crop options given might be different for you, so play around with them. You can find all the wkhtmltopdf options here.
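import imgkit

# Crop rectangle in pixels: height, width, and the x/y offset of its top-left corner
# (adjust these values for your page)
options = {
    'crop-h': '300',
    'crop-w': '400',
    'crop-x': '100',
    'crop-y': '430'
}
imgkit.from_url('https://stackoverflow.com/users/1974961/hugolpz?tab=questions', 'out.jpg', options=options)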
Output (out.jpg)
This is not perfect as you can see, but is certainly one of the options you can consider.

Scraping information from a flash object on a website using python or any other method

I was just wondering if it is possible to scrape information from this website that is contained in a Flash file (http://www.tomtom.com/lib/doc/licensing/coverage/).
I am trying to get all the text from the different components of this website.
Can anyone suggest a good starting point in Python, or any simpler method?
I believe the following blog post answers your question well. The author had the same need, to scrape Flash content using Python, and ran into the same problem. He realized that he just needed to instantiate a browser (even an in-memory one that did not display anything on screen) and then scrape its output. I think this could be a successful approach for what you need, and he explains it clearly.
http://blog.motane.lu/2009/06/18/pywebkitgtk-execute-javascript-from-python/

I need to extract the DOM element of the main content in Python

I'm working on extracting the main content from a web page in Python without removing anything such as images, yet most libraries just give me back the text itself or cleaned DOM elements.
I need the DOM elements themselves that contain the article's main content, including images.
Is there any library for that purpose?
Thanks
If you mean getting the whole DOM node, including its <img src=""> tags, then I believe beautifulsoup4 can do that:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
As for the actual image files, as far as I know you have to make a separate request for each image.
Alternatively, you can use Selenium (https://pypi.python.org/pypi/selenium). It drives your browser (Firefox, Chrome), so it can do almost anything when it comes to extracting web content.
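A sketch of the BeautifulSoup route, assuming `beautifulsoup4` is installed. Picking the "main content" uses a simple heuristic chosen for illustration (prefer an `<article>`, else the `<div>` with the most direct `<p>` children), not a library feature; `main_content` is a hypothetical name. The returned node still contains its `<img>` tags, and the image URLs are made absolute so you can fetch each one separately.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def main_content(html, base_url):
    """Return (main content node, absolute image URLs) using a simple heuristic."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find("article")
    if node is None:
        divs = soup.find_all("div")
        if not divs:
            return None, []
        # Heuristic (an assumption): the div with the most direct <p> children.
        node = max(divs, key=lambda div: len(div.find_all("p", recursive=False)))
    # The node keeps its <img> tags; the image bytes need separate requests.
    image_urls = [urljoin(base_url, img["src"]) for img in node.find_all("img", src=True)]
    return node, image_urls
```

`str(node)` gives the node's HTML with the images still in place.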
