Rendering float divs from html to pdf - python

Is there any way to generate a PDF from HTML with floating divs (I can even use fixed width and height values for the divs), margins and padding in Python? Does anybody know of Python libraries, or maybe system tools, that handle the CSS float property correctly? Any info will be helpful.
I have tried wkhtmltopdf. Pisa excluded immediately...

Not Python, but you could try PhantomJS (http://phantomjs.org/): write a short JavaScript script that loads the page, then call .render to generate a PDF.

Related

How do I check if a website is responsive using python?

I am using python3 in combination with beautifulsoup.
I want to check whether a website is responsive. My first thought was to check the site's meta tags and see if they contain something like this:
content="width=device-width, initial-scale=1.0"
This method is not very accurate, but I have not found anything better.
Does anybody have an idea?
Basically I want to do the same as Google does here: https://search.google.com/test/mobile-friendly, reduced to a single output saying whether the website is responsive or not (Y/N).
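The meta-tag heuristic described above could be sketched roughly as follows. This is only a minimal sketch: it uses the standard library's html.parser instead of beautifulsoup so that it runs without extra packages, and the names ViewportFinder and has_responsive_viewport are made up for illustration. As noted above, a viewport tag alone does not prove the layout actually adapts.

```python
from html.parser import HTMLParser


class ViewportFinder(HTMLParser):
    """Remembers the content of the last <meta name="viewport"> tag seen."""

    def __init__(self):
        super().__init__()
        self.viewport = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "viewport":
            self.viewport = attr_map.get("content", "")


def has_responsive_viewport(html):
    """Return True if the page declares width=device-width in its viewport."""
    finder = ViewportFinder()
    finder.feed(html)
    if finder.viewport is None:
        return False
    # Ignore spacing and case differences inside the content attribute.
    return "width=device-width" in finder.viewport.replace(" ", "").lower()
```

The HTML string would typically come from whatever HTTP client you already use; only the parsing step is shown here.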
(Just a suggestion)
I am not an expert on this but my first thought is that you need to render the website and see if it "responds" to different screen sizes. I would normally use something like phantomjs to do this.
Apparently, you can do this in python with selenium (more info at https://stackoverflow.com/a/15699761/3727050). A more comprehensive list of technologies that can be used for this task can be found here. Note that these resources seem a bit old/outdated and some solutions fall back to a python subprocess calling phantomjs.
The linked Google test seems to load the page in a small browser and check:
that the font size is readable;
that the distance between clickable elements is large enough for the page to be usable.
I would however do the following:
Load the page in desktop mode, record each div's style.
Gradually reduce the size of the screen and see what percentage of these change style.
In most cases, going from a large screen to a phone size, you should see 1-3 distinct layouts, which should be identifiable from the percentage of elements changing style.
The above does not guarantee that the page is "mobile-friendly" (i.e. usable on a mobile device), but it shows whether the CSS is responsive.
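The comparison step of this idea might look something like the sketch below. It assumes you have already recorded, for each screen width, a dict mapping an element identifier to its style (for example by driving a browser with selenium; that part is not shown). The function names and the 0.2 threshold are hypothetical choices for illustration, not something taken from the Google test.

```python
def changed_fraction(baseline, snapshot):
    """Fraction of baseline elements whose recorded style differs in snapshot."""
    if not baseline:
        return 0.0
    changed = sum(
        1 for key, style in baseline.items() if snapshot.get(key) != style
    )
    return changed / len(baseline)


def count_layouts(snapshots, threshold=0.2):
    """Count distinct layouts in snapshots ordered from widest to narrowest.

    A new layout is counted whenever the fraction of elements that changed
    relative to the previous width exceeds the threshold (an assumed value).
    """
    layouts = 1 if snapshots else 0
    for previous, current in zip(snapshots, snapshots[1:]):
        if changed_fraction(previous, current) > threshold:
            layouts += 1
    return layouts
```

Comparing each width against the previous one, rather than always against the desktop snapshot, is a design choice here: it makes each breakpoint show up as a single jump. A result of 1 would suggest the styles never change with width, so the CSS is probably not responsive.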

How to print online webpage target element into image programmatically?

Given an online webpage :
https://stackoverflow.com/users/1974961
Given a target element with id="REPUTATION" (here artificially bordered in red) in that webpage :
How can I print this element into an image reputation_1974961.ext?
Take a look at this library: https://www.npmjs.com/package/html2png
The html2png library lets you pass in an HTML string to its render method, and it will render the HTML into a PNG (returned as a buffer in its callback). You should then be able to save the buffer contents to a file using standard file I/O.
As for grabbing the HTML string of just that element: grab the full page with request or your request library of choice, then use something like Cheerio to target just the element you want and get its HTML. (Cheerio: https://www.npmjs.com/package/cheerio ).
There may be some gotchas, such as you may need to also grab some styling from the returned HTML and copy that into the rendering string, too, but this should help you find the right direction :)
This does not exactly use a div id, but I was able to get this far using imgkit and playing around with the wkhtmltopdf options. You need to install imgkit and wkhtmltopdf as mentioned in the link.
The crop options given might be different for you, so play around with them. You can find all the wkhtmltopdf options here.
import imgkit

# Crop region in pixels; adjust these values for your page
options = {
    'crop-h': '300',
    'crop-w': '400',
    'crop-x': '100',
    'crop-y': '430',
}
imgkit.from_url('https://stackoverflow.com/users/1974961/hugolpz?tab=questions', 'out.jpg', options=options)
Output (out.jpg)
This is not perfect as you can see, but is certainly one of the options you can consider.

Weasyprint HTML to PDF huge gap in right margin

I'm using Weasyprint to print an HTML template to PDF, and I keep getting a gap of 10cm on the right side.
I'm using @page { size: letter; } as the only page attribute.
I've tried setting the page size manually, but I still keep getting a huge space to the right of all the pages.
Any thoughts on what could be the problem?
Found the solution. It was a CSS problem. The class used to style the body was not at the beginning of the css file and that caused erratic behavior with other styles declared before it.

How to programmatically measure the elements' sizes in HTML source code using python?

I'm doing webpage layout analysis in python. A fundamental task is to programmatically measure the elements' sizes given the HTML source code, so that we can obtain statistical data such as content/ad ratio, ad block position and ad block size for the webpage corpus.
An obvious approach is to use the width/height attributes, but they're not always available. Besides, things like width: 50% need to be calculated after loading into the DOM. So I guess loading the HTML source code into a browser with a predefined window size (like mechanize, although I'm not sure if the window size can be set) is a good way to try, but mechanize doesn't support returning an element's size anyway.
Is there any universal way (without width/height attributes) to do it in python, preferably with some library?
Thanks!
I suggest you take a look at Ghost, a WebKit web client written in Python. It has JavaScript support, so you can easily call JavaScript functions and get their return values.
This example shows how to find the width of the Google search box:
>>> from ghost import Ghost
>>> ghost = Ghost()
>>> ghost.open('https://google.lt')
>>> width, resources = ghost.evaluate("document.getElementById('gbqfq').offsetWidth;")
>>> width
541.0 # google text box width 541px
To properly get all the final sizes, you need to render the contents, taking into account all CSS style sheets, and possibly all JavaScript. Therefore, the only ways to get the sizes from a Python program are to have a full web browser implementation in Python, use a library that can do so, or pilot a browser off-process, remotely.
The latter approach can be done using the Selenium tools. To see how you can get the result of JavaScript expressions from within a Python program, check: Can Selenium web driver have access to javascript global variables?

Python : Rendering part of webpage with proper styling from server

I am building a screen clipping app.
So far:
I can get the html mark up of the part of the web page the user has selected including images and videos.
I then send them to a server to process the html with BeautifulSoup to sanitize the html and convert all relative paths if any to absolute paths
Now I need to render the part of the page. But I have no way to render the styling. Is there any library to help me in this matter or any other way in python ?
One way would be to fetch the whole webpage with urllib2 and remove the parts of the body I don't need and then render it.
But there must be a more pythonic way :)
Note: I don't want a screenshot. I am trying to render proper html with styling.
Thanks :)
Download the complete webpage, extract the style elements and the stylesheet link elements, and download the files referenced by the latter. That should give you the CSS used on the page.
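A rough sketch of the extraction step, assuming the page HTML has already been downloaded as a string. It uses only the standard library (html.parser and urllib.parse); StyleCollector and collect_css are illustrative names, and downloading the returned stylesheet URLs is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StyleCollector(HTMLParser):
    """Collects inline <style> contents and <link rel="stylesheet"> URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.stylesheet_urls = []
        self.inline_css = []
        self._buffer = None  # holds text chunks while inside a <style> element

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "style":
            self._buffer = []
        elif tag == "link":
            rel = (attr_map.get("rel") or "").lower().split()
            href = attr_map.get("href")
            if "stylesheet" in rel and href:
                # Resolve relative hrefs against the page URL.
                self.stylesheet_urls.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "style" and self._buffer is not None:
            self.inline_css.append("".join(self._buffer))
            self._buffer = None


def collect_css(html, base_url):
    """Return (absolute stylesheet URLs, list of inline CSS strings)."""
    collector = StyleCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.stylesheet_urls, collector.inline_css
```

The returned URLs can then be fetched with the same HTTP client used for the page itself, and the combined CSS placed alongside the clipped markup. Stylesheets pulled in through @import inside other stylesheets are not followed by this sketch.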
