Django - Creating & Store PDF Files using XHTML2PDF - python

As of now we are using XHTML2PDF to dynamically generate PDFs and outputting to browser whenever required. Now our requirements is changed to generate the PDF only once and store it in the server. The link should be displayed to user to view the PDF. Could you please point out any resources or snippets to achieve this?

This is pretty easy to do. Observe:
from django.core.files.base import ContentFile
# get_pdf_contents should return the binary information for
# a properly formed pdf doc.
pdf_contents = get_pdf_contents()
file_to_be_saved = ContentFile(pdf_contents)
item = Item.objects.get(pk=1)
item.myfilefield.save('blarg.pdf', file_to_be_saved)
The get_pdf_contents function shouldn't be too hard to write - basically take whatever function you have already and chop it off before it funnels the results into an HttpResponse object. If you need help with that, post the code you have already.

Related

Print an object's URL in Django

OK, I think this is actually simpler than I'm making it, but I can't figure out what to search for in the django documentation.
Here's what I need to do. I'm about to run a script (using the writing custom django-management commands guide) which will add a whole bunch of records to my database. A mass import of all the student data. After I create these 130 new records, I need to create a QR code for each of them. I've found several QR code generator sites, but that's slow as hell and the school won't pay for me to sign up for the ones that come with mass QR code generation. No problem, this is Python. Batteries included!
I have the qrcode module installed, but what I don't understand is how I can get the URL after it creates the object. I know which view I want (the relative url is /awards/student/<student_id>/). I know that in the HTML template language, that {% url 'StudentHome' student.id %} gets me that url. Can I access that from the command line and pass it to qrcode, which would then generate and save the qrcodes as png files?
Use reverse method to reverse a URL.
reverse("url_name", args=(args1))
As for your case
reverse("StudentHome", args=(student.id))
For getting absolute Uri you can use this method.
abs_uri = request.build_absolute_uri(reverse("StudentHome", args=(student.id)))
Refs: docs

Django reportlab - Print current page (admin console) to PDF

What I am trying to accomplish is to allow users to view information in the django admin console and allow them to save and print out a PDF of the information infront of them based upon how ever they sorted/filtered the data.
I have seen a lot of documentation on report lab but mostly for just drawing lines and what not. How can I simply output the admin results to a PDF? If that is even possible. I am open to other suggestions if report lab is not the ideal way to get this done.
Thanks in advance.
Better use some kind of html2pdf because you already have html there.
If html2pdf doesn't do what you need, you can do everything you want to do with ReportLab. Have a look at the ReportLab manual, in particular the parts on Platypus. This is a part of the ReportLab library that allows you to build PDFs out of objects representing page parts (paragraphs, tables, frames, layouts, etc.).

How to use a report from a view inside another view in Django?

I have a form where I upload a file, and I generate a report out of it. The thing is, I would also like to make the report available for download, as an archive. I would like to somehow include the CSS and the JS ( that I inherit from my layout ) inside the report, but I don't really know how to go about this.
So far, I am not storing the file ( the report's being generated from ) on server side, I delete it after I'm done with it.
The only solution I could think of so far, was: from my archive generating view, use urllib to post to the form generating the report, save the response, and just rewrite the links to the stylesheet/JS files.
Is there a simpler way to go about this? Is there a way to keep some files on server side as long as the client's session lives?
Use the HttpResponse in the view you use to show the generated report instead of posting with urllib. If you have something along the lines of
def report_view(request):
...
return render_to_response(request,....)
Then use the response object to create an archive
def report_view(request):
...
archive_link = "/some/nice/url/to/the/archive"
response = render_to_response(request, ... { "archive-link" : archive_link})
store_archive(response)
return response
def store_archive(response):
# here you will need to find css/js files etc
# and bundle them in whatever type of archive you like
# then temporarily store that archive so it can be accessed by the archive_link
# you previously used in your view to allow for downloading
def report_archive_view(request):
# serve the temporarily stored archive, then delete it if you like
You can find all you need to know about HttpResponse in the Django docs.
Although this might work for you I doubt that is what you are really after, maybe what you are really looking for is to generate a pdf report using ReportLab?
you could always keep the files, and have a cron job that deletes files whose session has expired

PDF Form Field Manipulation

I'm making a web interface to autofill pdf forms with user data from a database. The admin needs to be able to upload a pdf (right now targeted at IRS pdf forms) and then associate the fields in the pdf with data fields in the database.
I need a way to help the admin associate the field names (stuff like "topmostSubform[0].Page2[0].p2-t66[0]") with the the data fields in the database. I'm looking for a way to modify the PDF programatically to in some way provide this information.
Basically I'm open to suggestions on how I might make the field names appear in an obvious manner on a modified version of the original pdf. The closest I've gotten is being able to insert Tooltips into the fields in the pdf by just editting the raw pdf line by line. However when editting the pdf in this manner the field names are gibberish, and so I can't just use them.
An optimal solution would be anything that could automatically parse a pdf and set each field's tooltip to be the fields name. Anything that can be run from the command line, or any python tool, or just a basic how to correctly parse a field's name from a raw pdf file would be amazing.
There may be an easier solution than this, but you could definitely get the job done with http://www.reportlab.com/software/opensource/rl-toolkit/'>ReportLab.
If you can save the current tax forms as an image, you could determine where each of the items need to be written and develop your code so that it automatically layers the appropriate values from the database on top of the image (the tax form, or whatever it might be).
Once you've determined 1) What fields need to be pulled from the database, and 2) where they 're supposed to go on within the form...
this is essentially what you'd be doing:
from reportlab.pdfgen import canvas
report_string_values = ['Alex',500,500],['Guido',400,400],
c = canvas.Canvas('hello.pdf')
c.drawImage(background_image,x_pos,y_pos) # x_pos and w_pos are # pixels from bl origin
for rsv in report_string_values:
c.drawString(rsv.x_pos,rsv.,rsv.text)
c.showPage()
c.save()
A postscript parser lives here: https://github.com/haxwithaxe/py-ps-parser
I've been interested in playing with it, but haven't yet.
This may be way off your intended track; but, it might be worth a think. I've been working on parsing scanned structured documents into Django model instances. Using tesseract and unpaper to do the pre-processing and OCR, I get over 99% accuracy. That lets me parse the OCR output text with the Levenshtein and re modules and do a simple new_instance = MyModel(parsed1, parsed2, ...).
It seems that you are trying to do something similar. Looking at the forms at http://www.irs.gov/formspubs/ They tend to have text labels left-adjacent to the fields. Using something like py-tesseract, you should be able to OCR the labels, overlay the OCR text over the form image and allow the user to select/edit the field labels.
There is a nice little tool, ocrfeeder https://live.gnome.org/OCRFeeder, that is written in python and should give you a basic idea of how the process works in a desktop app. Good luck.
Government Forms are usually not a standard PDF but a JavaScript driven XFA in a PDF wrapper, thus to enter the data programmatically you need a lookup table as the order is rarely the visual order.
Here the first field "single yes or no" "topmostSubform[0].Page1[0].c1_01[0]" is a checkbox designated well down the list of entries. of course none in this Form are "topmostSubform[0].Page2[0].p2-t66[0]" so you need totally different look-up table for each XFA. Otherwise follow the entries (luckilly there is some sence of sequencing in this form) so free format field "topmostSubform[0].Page1[0].f1_01[0]" is near "dependent:" etc.
There are XFA dedicated applications that can extract the positions of static fields, but if the fields are dynamically adapting then the page position would be a moving feast.
For XFA you need an intelligent dedicated Adobe listing (often xml / xlsx input output supplied on request from the relevant department), or build your own if Acrobat Pro does not block the attempt.
I may be interpreting the question wrong but I have a lot of experience in pdf generation with python/django because of the site that I worked on for 5 months. I would suggest using texlive. Basically what I did was built a generic tex template for a document and then used django templating to insert the fields. I rendered the template as if it were html using render_to_string and then generated it using the pdflatex command. I ran pdflatex using pythons subprocess module and a little extra. To do the generating I used this guys pdflatex module http://bit.ly/KaDMBp , with some modifications. All the things you need are in the core.py inside of the pdflatex directory.
Ex tex document (test.tex) )
\begin{document}
my name is {{input_name}} and i live in {{input_location}}.
\end{document}
Ex rendering template with django templating and render_to_string )
params={input_name:"andrew",input_location:"nyc"}
tex_doc = render_to_string('test.tex', params)
Ex generating as pdf)
pdflatex = PDFLatex(texfile=tex_path,outputdir=pdf_path)
pdflatex.transform()
Latex has a somewhat annoying, difficult learning curve but if you put in the time you can learn what you need to know in order to create these pdfs.
Hope this helps.
The SDAPS framework was designed for scenarios like this: It aids in batch-processing PDF-based forms, extract contents from designated fields and e.g. funnel those into a database for further processing.

Generate ODT/DOC(X) and convert to PDF, without OO.o/MS

I have a WSGI application that generates invoices and stores them as PDF.
So far I have solved similar problems with FPDF (or equivalents), generating the PDF from scratch like a GUI. Sadly this means the entire formatting logic (positioning headers, footers and content, styling) is in the application, where it really shouldn't be.
As the templates already exist in Office formats (ODT, DOC, DOCX), I would prefer to simply use those as a basis and fill in the actual content. I've found the Appy framework, which does pretty much that with annotated ODT files.
That still leaves the bigger problem open, tho: converting ODT (or DOC, or DOCX) to PDF. On a server. Running Linux. Without GUI libraries. And thus, without OO.o or MS Office.
Is this at all possible or am I better off keeping the styling in my code?
The actual content that would be filled in is actually quite restricted: a few paragraphs, some of which may be optional, a headline or two, always at the same place, and a few rows of a table. In HTML this would be trivial.
EDIT: Basically, I want a library that can generate ODT files from ODF files acting as templates and a library that can convert the result into PDF (which is probably the crux).
I don't know how to go about automatic ODT -> PDF conversion, but a simpler route might be to generate your invoices as HTML and convert them to PDF using http://www.xhtml2pdf.com/. I haven't tried the library myself, but it definitely seems promising.
You can use QTextDocument, QTextCursor and QTextDocumentWriter in PyQt4. A simple example to show how to write to an odt file:
>>>from pyqt4 import QtGui
# Create a document object
>>>doc = QtGui.QTextDocument()
# Create a cursor pointing to the beginning of the document
>>>cursor = QtGui.QTextCursor(doc)
# Insert some text
>>>cursor.insertText('Hello world')
# Create a writer to save the document
>>>writer = QtGui.QTextDocumentWriter()
>>>writer.supportedDocumentFormats()
[PyQt4.QtCore.QByteArray(b'HTML'), PyQt4.QtCore.QByteArray(b'ODF'), PyQt4.QtCore.QByteArray(b'plaintext')]
>>>odf_format = writer.supportedDocumentFormats()[1]
>>>writer.setFormat(odf_format)
>>>writer.setFileName('hello_world.odt')
>>>writer.write(doc) # Return True if successful
True
If not sure the difference between odt and odf in this case. I checked the file type and it said 'application/vnd.oasis.opendocument.text'. So I assume it is odt. You can print to a pdf file by using QPrinter.
More information at:
http://qt-project.org/doc/qt-4.8/

Categories

Resources