What I am trying to accomplish is to allow users to view information in the Django admin console and to save and print a PDF of the information in front of them, based on however they have sorted/filtered the data.
I have seen a lot of documentation on ReportLab, but mostly for just drawing lines and whatnot. How can I simply output the admin results to a PDF, if that is even possible? I am open to other suggestions if ReportLab is not the ideal way to get this done.
Thanks in advance.
You'd be better off using some kind of html2pdf tool, because you already have the HTML there.
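A minimal sketch of the "you already have HTML" idea, assuming the rows are the already sorted/filtered results (in a Django admin view they would typically come from the changelist's queryset; that wiring is not shown). The hypothetical rows_to_html helper only builds an escaped HTML table with the standard library; the resulting string could then be handed to an HTML-to-PDF converter, for instance xhtml2pdf (the successor of pisa), whose pisa.CreatePDF(html, dest=output_file) writes the PDF.

```python
from html import escape


def rows_to_html(headers, rows, title="Report"):
    """Render already sorted/filtered rows as a minimal HTML document.

    Every value is converted to str and HTML-escaped, so characters such
    as & or < in the data cannot break the markup.
    """
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>"
        f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
        "</body></html>"
    )


html = rows_to_html(["Name", "Total"], [["A & B", 3], ["C", 7]])
```

Keeping the HTML generation separate from the PDF conversion means the same function can feed a preview page in the admin and the converter.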
If html2pdf doesn't do what you need, you can do everything you want to do with ReportLab. Have a look at the ReportLab manual, in particular the parts on Platypus. This is a part of the ReportLab library that allows you to build PDFs out of objects representing page parts (paragraphs, tables, frames, layouts, etc.).
I want to edit a certain page while creating a PDF with ReportLab only. (There are some solutions using PyPDF2, but I want to use only ReportLab, if that is possible.)
Description of what I am doing / trying to do:
I am creating a PDF file that gives the reader a good overview of certain data. I get the data from a server. I know the data structure, but how much data I get varies from PDF to PDF. That's why some PDFs will be 20 pages long and some can be 50+ pages.
After getting all the data and creating a beautiful PDF (this part is done by now), I want to go back to page 2 of the PDF and create my own, highly customized table of contents.
But I can't find anywhere how to edit a certain page after creating several new pages.
What I've done so far to try to solve my problem:
I read the documentation
I checked stackoverflow
I checked git-repos
Help would be really appreciated. If it is not possible to edit a certain page after other pages have been added with ReportLab, I'm thinking about using PyPDF2 instead. I have little to no experience with PyPDF2 so far, so if you have some good links you can send me, I'd be very thankful.
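As far as I know, a ReportLab canvas writes pages one after another, so page 2 cannot be reopened once showPage() has moved on. A common workaround is to lay the document out twice: a first pass only works out on which page each section will start, with page 2 kept free, and a second pass draws the real document, including a table of contents built from those numbers. Platypus automates this with its TableOfContents flowable and doc.multiBuild(), which repeats the build until the page numbers settle. The plain-Python sketch below only illustrates the first pass; it assumes a cover on page 1, one TOC entry per line, every section starting on a new page, and a hypothetical "number of lines" measure of section length.

```python
import math


def section_start_pages(section_lengths, lines_per_page, toc_pages):
    """Pass 1: return the page on which each section starts.

    Page 1 is the cover, the next `toc_pages` pages are kept free for the
    table of contents, and every section starts on a fresh page.
    """
    page = 2 + toc_pages
    starts = []
    for n_lines in section_lengths:
        starts.append(page)
        # A section occupies at least one page, more if it overflows.
        page += max(1, math.ceil(n_lines / lines_per_page))
    return starts


def build_toc(titles, section_lengths, lines_per_page):
    """Return the TOC lines that pass 2 would draw on the reserved page(s)."""
    toc_pages = max(1, math.ceil(len(titles) / lines_per_page))  # one entry per line
    starts = section_start_pages(section_lengths, lines_per_page, toc_pages)
    return [f"{title} .... {page}" for title, page in zip(titles, starts)]


toc = build_toc(["Overview", "Details", "Appendix"], [10, 100, 5], lines_per_page=40)
```

In pass 2 you would draw `toc` on page 2 and then render the sections exactly as in pass 1, so the numbers stay correct.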
I'm building a website with a Django backend. I want to serve medical billing data from a database that Django will have access to. However, all of the data we receive is in Excel spreadsheets, so I've been looking for a way to get the data from a spreadsheet and import it into a Django model. I know there are some Django packages that can do this, but I'm having a hard time understanding how to use them. On top of that, I'm using Python 3 for this project. I've used win32com for Excel automation in the past, so I could write a function that grabs the data from the spreadsheet. What I want to figure out is how to write that data to a Django model. Any advice is appreciated.
Use http://www.python-excel.org/ and consider this process:
Make a view where the user can upload the .xls file.
Open the file with xlrd: xlrd.open_workbook(filename).
Extract the rows and create dicts that map the data you want to sync to the DB.
Use the models to add, update or delete the information.
If you follow this process, you can learn a lot about how loading and extracting work and how they fit your requirements. I recommend doing steps 2 and 3 in the shell first, so you can experiment more quickly and avoid the upload/test/error cycle of a Django view.
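Steps 2 and 3 could look roughly like this in a shell session. The sketch only relies on sheet.nrows and sheet.row_values(i), which xlrd's Sheet objects provide, so it can be tried with any object that has those two members; the column titles and model field names in COLUMN_TO_FIELD are hypothetical. Note that recent xlrd releases (2.x) read only the old .xls format, so .xlsx files would need another reader such as openpyxl.

```python
def sheet_to_dicts(sheet, header_row=0):
    """Turn an xlrd-style sheet into a list of dicts keyed by the header row.

    Only sheet.nrows and sheet.row_values(i) are used.
    """
    headers = [str(h).strip() for h in sheet.row_values(header_row)]
    records = []
    for i in range(header_row + 1, sheet.nrows):
        values = sheet.row_values(i)
        if not any(str(v).strip() for v in values):
            continue  # skip completely blank rows
        records.append(dict(zip(headers, values)))
    return records


# Hypothetical mapping from spreadsheet column titles to model field names.
COLUMN_TO_FIELD = {"Patient ID": "patient_id", "Amount Billed": "amount"}


def to_model_kwargs(record, mapping=COLUMN_TO_FIELD):
    """Rename spreadsheet columns to the keyword arguments a model expects."""
    return {field: record.get(column) for column, field in mapping.items()}
```

In `manage.py shell` the sheet would come from `xlrd.open_workbook(filename).sheet_by_index(0)`, and each `to_model_kwargs(...)` result could be passed to something like `Billing.objects.update_or_create(...)` for step 4 (`Billing` being a hypothetical model).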
Hope this works as a starting point for you.
Why don't you use django-import-export?
It's a Django app that lets you import Excel files from the admin section.
It's very easy to install; here you can find the installation tutorial, and here an example.
Excel spreadsheets can be saved (exported) as .csv files, and there are plenty of examples and explanations online of how to work with them, such as here and here.
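Assuming the sheet has been exported from Excel as CSV (Save As, CSV file type), the standard library's csv.DictReader already turns each row into a dict keyed by the header row, which is the shape needed for building model instances. The column names below are made up for illustration, and every value comes back as a string, so numbers and dates still need converting.

```python
import csv
import io


def read_csv_records(f):
    """Read CSV rows from a text file object into dicts keyed by the header row."""
    return list(csv.DictReader(f))


# In-memory sample; for a real export use
# open("billing.csv", newline="", encoding=...) with the encoding Excel used.
sample = io.StringIO("Patient ID,Amount Billed\np1,10.00\np2,20.50\n")
records = read_csv_records(sample)
```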
In general, if you are having difficulty understanding documentation or packages, my advice would be to search for specific examples or see if whatever you are trying to do has already been done. Play with it to get a working understanding, and then modify it to fit your needs.
I have a parent file type that is folderish, and I would like to include a thumbnail of the first page of a child pdf in the template. Can someone roughly outline the tools and process you can imagine would achieve this, so that I can investigate further?
Extracting the first page of a PDF can be done with Ghostscript.
This is an example script which builds a Ghostscript command and stores the images. I took it from collective.pdfpeek, which, by the way, could solve your problem right away :-)
Until a few days ago I would have recommended not using it, since it was a little buggy, but they recently shipped a new version, so give it a try! I'm not sure whether it supports DX yet.
So your workflow should be:
Upload a PDF.
Subscribe to the modified/creation events.
Create an image of the first page using Ghostscript (check my command, or collective.pdfpeek).
Store it as a blob (NamedBlobImage) on your uploaded PDF.
Also implement some queueing, as collective.pdfpeek does, so you don't block all your threads with Ghostscript commands.
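The command script referred to above is not reproduced here, so this is a sketch built from standard Ghostscript options rather than the collective.pdfpeek code: it renders only page 1 of a PDF to a PNG. The function names are hypothetical, the binary is assumed to be gs (on Windows it is usually gswin64c), and storing the result as a NamedBlobImage from an event subscriber is left out.

```python
import subprocess


def first_page_thumbnail_cmd(pdf_path, png_path, dpi=72, gs="gs"):
    """Build a Ghostscript argument list that renders only page 1 to PNG."""
    return [
        gs,
        "-q",               # suppress informational output
        "-dSAFER",          # restrict file operations the document may perform
        "-dBATCH",          # exit when finished
        "-dNOPAUSE",        # don't prompt between pages
        "-sDEVICE=png16m",  # 24-bit colour PNG output
        f"-r{dpi}",         # output resolution
        "-dFirstPage=1",
        "-dLastPage=1",
        f"-sOutputFile={png_path}",
        pdf_path,
    ]


def make_thumbnail(pdf_path, png_path, dpi=72):
    """Run Ghostscript; raises subprocess.CalledProcessError on failure."""
    subprocess.run(first_page_thumbnail_cmd(pdf_path, png_path, dpi), check=True)
```

Passing an argument list (rather than a shell string) avoids quoting problems with uploaded file names; make_thumbnail is what a queued worker would call.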
OR
Give collective.pdfpeek a shot!
BTW:
IMHO, at a large scale, preview generation for PDFs needs to be implemented as a service that stores/manages the images for you.
I need to be able to allow a user to save and preview a PDF file, but not print it. The ReportLab instructions seem pretty straightforward, but I didn't see it in the docs for pisa.
Thanks!
Previously I was using the POD Appy Framework. It is very easy and clean to work with. You can check the documentation: http://appyframework.org/pod.html
I'm making a web interface to autofill pdf forms with user data from a database. The admin needs to be able to upload a pdf (right now targeted at IRS pdf forms) and then associate the fields in the pdf with data fields in the database.
I need a way to help the admin associate the field names (things like "topmostSubform[0].Page2[0].p2-t66[0]") with the data fields in the database. I'm looking for a way to modify the PDF programmatically to provide this information somehow.
Basically, I'm open to suggestions on how to make the field names appear in an obvious way on a modified version of the original PDF. The closest I've gotten is inserting tooltips into the fields by editing the raw PDF line by line. However, when editing the PDF this way the field names are gibberish, so I can't just use them.
An optimal solution would be anything that can automatically parse a PDF and set each field's tooltip to the field's name. Anything that can be run from the command line, any Python tool, or just a basic explanation of how to correctly parse a field's name from a raw PDF file would be amazing.
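A rough sketch of the "parse a field's name from a raw PDF" part, assuming the PDF is uncompressed (a tool such as qpdf can produce one, e.g. qpdf --qdf --object-streams=disable in.pdf out.pdf). AcroForm field dictionaries carry their partial name in /T and their tooltip (alternate name) in /TU, and a full name such as topmostSubform[0].Page2[0].p2-t66[0] is the /T values of a field and its parents joined with dots. The regex below only handles literal-string names with simple backslash escapes; it does not decode hex or UTF-16 strings, handle unescaped nested parentheses, or follow /Parent links, and writing the names back into /TU is a separate step not shown.

```python
import re

# A /T entry whose value is a PDF literal string "( ... )".
# /TU does not match (the "U" sits between /T and the parenthesis), nor does /Tx.
_T_ENTRY = re.compile(rb"/T\s*\(((?:[^()\\]|\\.)*)\)", re.DOTALL)


def partial_field_names(raw_pdf: bytes):
    """Return the /T (partial field name) values found in an uncompressed PDF."""
    names = []
    for m in _T_ENTRY.finditer(raw_pdf):
        # Simplified unescaping: drop the backslash, so \( \) \\ become ( ) \.
        # Escapes such as \n or octal codes are not decoded.
        value = re.sub(rb"\\(.)", rb"\1", m.group(1), flags=re.DOTALL)
        names.append(value.decode("latin-1"))
    return names
```

Usage: `partial_field_names(open("out.pdf", "rb").read())` on the qpdf output.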
There may be an easier solution than this, but you could definitely get the job done with ReportLab (http://www.reportlab.com/software/opensource/rl-toolkit/).
If you can save the current tax forms as an image, you could determine where each of the items needs to be written and develop your code so that it automatically layers the appropriate values from the database on top of the image (the tax form, or whatever it might be).
Once you've determined 1) what fields need to be pulled from the database, and 2) where they're supposed to go within the form, this is essentially what you'd be doing:
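from reportlab.pdfgen import canvas

background_image = 'tax_form.png'  # scanned form image (placeholder path)
x_pos, y_pos = 0, 0  # image position, in points from the bottom-left origin

# (text, x, y) for each value pulled from the database
report_string_values = [('Alex', 500, 500), ('Guido', 400, 400)]

c = canvas.Canvas('hello.pdf')
c.drawImage(background_image, x_pos, y_pos)
for text, x, y in report_string_values:
    c.drawString(x, y, text)
c.showPage()
c.save()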
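One detail worth handling when placing the values: ReportLab's canvas measures in points (1/72 inch) from the bottom-left corner, while positions read off a scanned image are usually pixels from the top-left corner. A small helper (hypothetical name) to convert between them, assuming the scan is stretched over the whole page, e.g. c.drawImage(background_image, 0, 0, width=612, height=792) for US Letter:

```python
def image_px_to_canvas_pt(x_px, y_px, dpi, page_height_pt):
    """Convert a pixel position on a scan (origin top-left) to canvas
    coordinates in points (origin bottom-left).

    Assumes the scan was made at `dpi` and is drawn at full page size.
    """
    scale = 72.0 / dpi  # points per pixel
    return x_px * scale, page_height_pt - y_px * scale


x, y = image_px_to_canvas_pt(200, 100, dpi=144, page_height_pt=792)
```

The returned pair can be passed straight to `c.drawString(x, y, text)`.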
A postscript parser lives here: https://github.com/haxwithaxe/py-ps-parser
I've been interested in playing with it, but haven't yet.
This may be way off your intended track, but it might be worth a thought. I've been working on parsing scanned structured documents into Django model instances. Using tesseract and unpaper for the pre-processing and OCR, I get over 99% accuracy. That lets me parse the OCR output text with the Levenshtein and re modules and then do a simple new_instance = MyModel(parsed1, parsed2, ...).
It seems that you are trying to do something similar. Looking at the forms at http://www.irs.gov/formspubs/, they tend to have text labels immediately to the left of the fields. Using something like py-tesseract, you should be able to OCR the labels, overlay the OCR text on the form image and let the user select/edit the field labels.
There is a nice little tool, OCRFeeder (https://live.gnome.org/OCRFeeder), which is written in Python and should give you a basic idea of how the process works in a desktop app. Good luck.
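The answer matches OCR text using the Levenshtein and re modules; the sketch below swaps in the standard library's difflib similarity ratio instead, to show how OCR'd labels could be snapped to a list of expected labels despite recognition errors. The labels and the 0.75 cutoff are made-up examples to tune against real OCR output.

```python
import difflib


def match_labels(ocr_labels, known_labels, cutoff=0.75):
    """Map each OCR'd label to the most similar known label, or None.

    Uses difflib.get_close_matches, which ranks candidates by
    SequenceMatcher ratio and drops those below `cutoff`.
    """
    result = {}
    for label in ocr_labels:
        best = difflib.get_close_matches(label, known_labels, n=1, cutoff=cutoff)
        result[label] = best[0] if best else None
    return result


matches = match_labels(["Dependnt:", "Fi1ing status", "zzz"], ["Dependent:", "Filing status"])
```

Labels that come back as None are the ones to show to the user for manual selection.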
Government forms are usually not a standard PDF but a JavaScript-driven XFA form in a PDF wrapper, so to enter the data programmatically you need a lookup table, as the field order is rarely the visual order.
Here the first field, "single yes or no" ("topmostSubform[0].Page1[0].c1_01[0]"), is a checkbox designated well down the list of entries. Of course, none of the fields in this form is "topmostSubform[0].Page2[0].p2-t66[0]", so you need a totally different lookup table for each XFA form. Otherwise, follow the entries (luckily there is some sense of sequencing in this form), so the free-format field "topmostSubform[0].Page1[0].f1_01[0]" is near "dependent:", etc.
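A sketch of the per-form lookup table idea: a hand-maintained mapping from XFA field names to database columns (the entries and column names are hypothetical), plus a helper that splits a name like topmostSubform[0].Page1[0].f1_01[0] into its (name, index) parts, e.g. to group fields by page. Actually writing the values into the XFA form is not shown.

```python
import re

# Hypothetical lookup table for one specific form: field name -> DB column.
FIELD_LOOKUP = {
    "topmostSubform[0].Page1[0].c1_01[0]": "filing_status_single",
    "topmostSubform[0].Page1[0].f1_01[0]": "first_name",
}

_SEGMENT = re.compile(r"([^.\[\]]+)\[(\d+)\]")


def split_field_name(name):
    """Split 'a[0].b[1]' into [('a', 0), ('b', 1)]."""
    return [(part, int(idx)) for part, idx in _SEGMENT.findall(name)]


def values_for_form(record, lookup=FIELD_LOOKUP):
    """Pick the value for each form field from a DB record given as a dict.

    Fields whose column is missing from the record are skipped.
    """
    return {field: record[column] for field, column in lookup.items() if column in record}
```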
There are dedicated XFA applications that can extract the positions of static fields, but if the fields adapt dynamically, their position on the page becomes a moving target.
For XFA you need an intelligent, dedicated Adobe field listing (often supplied as XML/XLSX input/output on request from the relevant department), or you can build your own, if Acrobat Pro does not block the attempt.
I may be interpreting the question wrong, but I have a lot of experience with PDF generation in Python/Django from a site I worked on for 5 months. I would suggest using TeX Live. Basically, I built a generic TeX template for a document and then used Django templating to insert the fields. I rendered the template as if it were HTML, using render_to_string, and then generated the PDF with the pdflatex command. I ran pdflatex using Python's subprocess module and a little extra. To do the generating I used this guy's pdflatex module, http://bit.ly/KaDMBp, with some modifications. Everything you need is in core.py inside the pdflatex directory.
Example TeX document (test.tex):
\documentclass{article}
\begin{document}
My name is {{input_name}} and I live in {{input_location}}.
\end{document}
Example: rendering the template with Django templating and render_to_string:
from django.template.loader import render_to_string

params = {'input_name': 'andrew', 'input_location': 'nyc'}
tex_doc = render_to_string('test.tex', params)
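Example: generating the PDF (PDFLatex is the class from the modified pdflatex module; tex_path is the .tex file containing tex_doc and pdf_path the output directory):
pdflatex = PDFLatex(texfile=tex_path, outputdir=pdf_path)
pdflatex.transform()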
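Because the linked pdflatex module is a modified third-party copy, here is a standard-library sketch of the same two steps, assuming pdflatex from TeX Live is on the PATH. latex_escape() protects user values (Django's autoescaping targets HTML characters such as & and <, not LaTeX specials such as % or _), and compile_tex() writes the rendered source and runs pdflatex through subprocess, as the answer describes. Both helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Characters with special meaning in LaTeX and their escaped forms.
_LATEX_SPECIAL = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(value):
    """Escape LaTeX special characters in one pass, character by character."""
    return "".join(_LATEX_SPECIAL.get(ch, ch) for ch in str(value))


def compile_tex(tex_source, workdir, name="document"):
    """Write the rendered TeX into workdir, run pdflatex, return the PDF path.

    Raises subprocess.CalledProcessError if pdflatex reports an error.
    """
    workdir = Path(workdir)
    tex_path = workdir / f"{name}.tex"
    tex_path.write_text(tex_source, encoding="utf-8")
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         f"-output-directory={workdir}", str(tex_path)],
        check=True,
        cwd=workdir,
    )
    return workdir / f"{name}.pdf"
```

For example, escape the context with `params = {k: latex_escape(v) for k, v in params.items()}` before calling render_to_string, wrap the template body in `{% autoescape off %}...{% endautoescape %}` so Django does not HTML-escape the result, and then call `compile_tex(tex_doc, some_temp_dir)`.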
LaTeX has a somewhat annoying, difficult learning curve, but if you put in the time you can learn what you need to know to create these PDFs.
Hope this helps.
The SDAPS framework was designed for scenarios like this: it aids in batch-processing PDF-based forms, extracting contents from designated fields and, e.g., funnelling those into a database for further processing.