Pdftron - creating new element with same styles as existing element - python

I am trying to create a PDF editing prototype using the PDFTron software.
I have successfully created an interface where the user can click on an image rendered from the PDF, select a region, and is then presented with a text input where he/she can enter text that will replace the content in the PDF file.
Now the text-replacing part is problematic. Since there is no API doc for Python (only examples), I am following the Java / Android API documentation.
Where I am for now: I have the following code to find the elements that are in the user-selected rectangle. The values x1, y1, x2, y2 are PDF coordinates based on the user's selection in the front end.
rect = Rect(x1, y1, x2, y2)
text = ''
extractor = TextExtractor()
extractor.Begin(page)
line = extractor.GetFirstLine()
words = []
while line.IsValid():
    word = line.GetFirstWord()
    while word.IsValid():
        elRect = word.GetBBox()
        elRect.Normalize()
        if elRect.IntersectRect(elRect, rect):
            text += ' ' + word.GetString()
            words.append(word)
        word = word.GetNextWord()
    line = line.GetNextLine()
words is basically an array where I store the content that will later need to be replaced by the new element.
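The selection test in the loop above boils down to an axis-aligned rectangle overlap check. Here is a minimal pure-Python sketch of that check, independent of PDFNet. The `Box` tuple and the `normalize` and `boxes_overlap` helpers are hypothetical names used only for illustration:

```python
from collections import namedtuple

# Hypothetical stand-in for a PDF rectangle (not the PDFNet Rect class).
Box = namedtuple("Box", "x1 y1 x2 y2")

def normalize(box):
    """Return the box with (x1, y1) as lower-left and (x2, y2) as upper-right."""
    return Box(min(box.x1, box.x2), min(box.y1, box.y2),
               max(box.x1, box.x2), max(box.y1, box.y2))

def boxes_overlap(a, b):
    """True if the two rectangles share any area."""
    a, b = normalize(a), normalize(b)
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2

selection = Box(100, 700, 300, 650)   # user drag, corners given in any order
word_box = Box(120, 660, 180, 672)    # made-up word bounding box
print(boxes_overlap(selection, word_box))
```

Normalizing both rectangles first means the check does not depend on which corner the user started dragging from.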
Now the problem: I want the new element to have the same style and font as the old text.
The API documentation tells me that using
style = words[0].GetStyle()
gives me style of the word and I can get font from style using
font = style.GetFont()
doc : https://www.pdftron.com/pdfnet/mobile/docs/Android/pdftron/PDF/TextExtractor.Style.html
But the returned font is of class Obj, not Font.
And apparently creating a new text element with a font requires an object of the Font class,
because
element = eb.CreateTextBegin(font, 10.0);
generates an error:
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/alan/.virtualenvs/pdfprint/local/lib/python2.7/site-packages/PDFNetPython2.py", line 5056, in CreateTextBegin
    def CreateTextBegin(self, *args): return _PDFNetPython2.ElementBuilder_CreateTextBegin(self, *args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'ElementBuilder_CreateTextBegin'.
  Possible C/C++ prototypes are:
    pdftron::PDF::ElementBuilder::CreateTextBegin(pdftron::PDF::Font,double)
    pdftron::PDF::ElementBuilder::CreateTextBegin()
Perhaps there is better approach to achieving same result?
Edit1
Reading the docs, I found that you can create a Font object from an Obj like this:
font = Font(style.GetFont())
I am still stuck on creating an element with those styles, though.
/edit1
Edit2
I use the following code to test writing into the file:
style = elements[0].GetStyle()
font = Font(style.GetFont())
fontsize = style.GetFontSize()
eb = ElementBuilder()
element = eb.CreateTextBegin(font, 10.0)
writer.WriteElement(element)
element = eb.CreateTextRun('My Name')
element.SetTextMatrix(10, 0, 0, 10, 100, 100)
gstate = element.GetGState()
gstate.SetTextRenderMode(GState.e_fill_text)
gstate.SetStrokeColorSpace(ColorSpace.CreateDeviceRGB())
gstate.SetStrokeColor(ColorPt(1, 1, 1))
element.UpdateTextMetrics()
writer.WriteElement(element)
writer.WriteElement(eb.CreateTextEnd())
writer.End()
from core.helpers import ensure_dir
ensure_dir(output_filename)
doc.Save(output_filename, SDFDoc.e_linearized)
doc.Close()
What I can't figure out is:
How to copy styles from an existing element.
How to position the new element in the document.
Why this test code does not give me visible results. As far as I can see, a new file gets created, but it does not have "My Name" anywhere in it.
/Edit2

Based on the code above, it looks like you want to append some text to an existing page using the font style (font name + color) of the first word on the page.
There are a couple of issues with the above code. You are setting the stroke color rather than the fill color, and with the e_fill_text render mode only the fill color is painted:
gstate.SetTextRenderMode(GState.e_fill_text)
gstate.SetStrokeColorSpace(ColorSpace.CreateDeviceRGB())
gstate.SetStrokeColor(ColorPt(1, 1, 1))
Try this instead:
gstate.SetTextRenderMode(GState.e_fill_text)
gstate.SetFillColorSpace(ColorSpace.CreateDeviceRGB())
gstate.SetFillColor(ColorPt(1, 0, 0))  # hardcoded to red, for testing purposes only
The main issue is most likely related to font handling. You are reusing an existing font and assuming that it uses ‘standard encoding’, which is probably not the case. Fonts in existing PDFs are also often subsetted (the font does not contain the full set of glyphs, only the characters that are used in the document). As a result, you may see .notdef glyphs or whitespace instead of the expected text. This and some other issues are covered here:
https://groups.google.com/d/msg/pdfnet-sdk/RBTuJG2uILk/pGkrKnqZ_YIJ
https://groups.google.com/d/msg/pdfnet-sdk/2y8s5aehq-c/xyknr9W5r-cJ
As a solution, instead of using the embedded font directly, you can find a matching system font (e.g. based on the font name and other properties) and create a new font. PDFNet offers the utility methods Font.Create(doc, font) and Font.Create(doc, "Font name").
These methods create a Unicode font, so you should use eb.CreateUnicodeTextRun() rather than eb.CreateTextRun().
Alternatively, you could use an AcroForm as a template (see the InteractiveForms sample) and pdfdoc.FlattenAnnotations() to end up with a read-only version of the document.
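When matching an embedded font to a system font by name, keep in mind that subsetted fonts in a PDF carry a tag of six uppercase letters and a plus sign in front of the real name (for example ABCDEF+Arial). Below is a minimal sketch of stripping that tag before the lookup. The helper names `base_font_name` and `is_subset` are hypothetical and the sample font names are made up:

```python
import re

# A subset font name looks like "ABCDEF+RealName": six uppercase letters, then "+".
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

def base_font_name(pdf_font_name):
    """Strip the subset tag (if any) so the name can be matched against system fonts."""
    return _SUBSET_TAG.sub("", pdf_font_name)

def is_subset(pdf_font_name):
    """True if the font name carries a subset tag."""
    return bool(_SUBSET_TAG.match(pdf_font_name))

for name in ("ABCDEF+Arial-BoldMT", "Helvetica"):
    kind = "subset" if is_subset(name) else "full"
    print(name, "->", base_font_name(name), "(%s)" % kind)
```

The cleaned name is what one would presumably pass to Font.Create(doc, "Font name"); whether a matching system font exists still has to be checked separately.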

Related

change all fonts in powerpoint without opening the file

I want to change all the fonts in about 100 PowerPoint files without opening the files. There are several shape types on each slide, and each might have a different font. I used the python-pptx package and wrote the following code to change the fonts of all text in a PowerPoint presentation. Although it does not raise any error, it does not work: the fonts in the file are still whatever they were, for example Arial. I also added print(shape.text) to make sure that it finds all the text, and there seems to be no issue there. Is it a bug, or am I missing something?
from pptx import Presentation
prs = Presentation('f10.pptx')
for i, slide in enumerate(prs.slides):
    for shape in slide.shapes:
        print(shape.has_text_frame)
        if shape.has_text_frame:
            print(shape.text)
            for p in shape.text_frame.paragraphs:
                for r in p.runs:
                    print(r.font.name)
                    r.font.name = 'Tahoma'
                    print(r.font.name)
prs.save('f10_tahoma.pptx')
Besides, it seems that the package does not work for UTF-8 (non-Latin) characters. I added a text box on the last slide with the following code:
text_frame = shape.text_frame
text_frame.clear() # not necessary for newly-created shape
p = text_frame.paragraphs[0]
run = p.add_run()
run.text = 'سلام '
font = run.font
font.name = 'Andalus'
font.size = Pt(18)
I run this before saving the file, to add a text box with UTF-8 characters. The text box is added, and when I check the font it shows as Andalus, but the text is not actually rendered in Andalus.
With Aspose.Slides for Python via .NET, you can easily change all fonts for all texts in your presentations. The following code example shows you how to do this:
import aspose.slides as slides
with slides.Presentation('example.pptx') as presentation:
    for slide in presentation.slides:
        for shape in slide.shapes:
            if isinstance(shape, slides.AutoShape):
                for paragraph in shape.text_frame.paragraphs:
                    for portion in paragraph.portions:
                        portion.portion_format.latin_font = slides.FontData('Tahoma')
You can also evaluate the Aspose.Slides Cloud SDK for Python for manipulating presentations. This REST-based API allows you to make 150 free API calls per month for API learning and presentation processing.
Aspose Slides Online Viewer can be used to view presentations without PowerPoint installed.
I work at Aspose.
What language is the text in the file? Run.font properties work fine for UTF-8 text, but PowerPoint stores a separate typeface for complex scripts such as Arabic. Access to that secondary font is unfortunately not implemented in python-pptx, which could explain at least part of the behavior you're seeing.
For Roman-character text (like what we're using here), there are a couple of things to check:
The font in question needs to be installed on the machine PowerPoint is running on when the document is opened. Otherwise PowerPoint will substitute a font.
The font (typeface) name used in the XML will not always exactly match what appears in the PowerPoint drop-down selection box. You need to give that name to python-pptx in the exact form in which it should appear in the XML. You may need to make a working example file by hand, perhaps containing a single slide with a single text box for simplicity, and then inspect the XML of that file to find the "spelling" PowerPoint uses for that typeface.
You could do that with code like this:
prs = Presentation("example.pptx")
shape = prs.slides[0].shapes[0]
print(shape._element.xml)
You should be able to locate the typeface name somewhere in an element like <a:rPr> or <a:defRPr>.
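To illustrate what to look for in that XML, here is a small sketch that lists the typefaces set on a run's properties using only the standard library. The XML fragment is a hand-written sample, not output from a real file, and `run_typefaces` is a hypothetical helper. In DrawingML, `a:latin` holds the Latin typeface and `a:cs` the complex-script typeface used for scripts such as Arabic:

```python
import xml.etree.ElementTree as ET

# DrawingML main namespace, bound to the "a:" prefix in slide XML.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Hand-written sample of a text run, as it might appear inside a slide's XML.
SAMPLE = """
<a:r xmlns:a="%s">
  <a:rPr lang="ar-SA" sz="1800">
    <a:latin typeface="Tahoma"/>
    <a:cs typeface="Andalus"/>
  </a:rPr>
  <a:t>example</a:t>
</a:r>
""" % A_NS

def run_typefaces(run_xml):
    """Map each font slot (latin, ea, cs, sym) present in the run to its typeface."""
    run = ET.fromstring(run_xml.strip())
    rpr = run.find("{%s}rPr" % A_NS)
    faces = {}
    if rpr is not None:
        for slot in ("latin", "ea", "cs", "sym"):
            el = rpr.find("{%s}%s" % (A_NS, slot))
            if el is not None:
                faces[slot] = el.get("typeface")
    return faces

print(run_typefaces(SAMPLE))
```

If a run shows only an `a:latin` entry, that would be consistent with Arabic text ignoring the font name that was set.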

Python- How to add text on desired coordinates on new word file

If the size of the document (.doc) is, say, 1000x3000 (width x height), I need to place the text "Hello" at the coordinate point 400x300.
How can I do this using Python plus any libraries? (Ubuntu platform)
A similar problem has been addressed using Java:
How to add text on desired coordinates on new word file using openxml
You can't place regular text at an arbitrary position, but you can use a text box instead. Make sure to set the top and left margins to 0. The top-left corner of the text will then be at the specified coordinates.
Unfortunately, python-docx doesn't support floating shapes. But you can do it using COM automation on a Windows computer where Word is installed. You need PyWin32 for this.
import win32com.client as win32
word = win32.gencache.EnsureDispatch('Word.Application')
doc = word.Documents.Add()
tb = doc.Shapes.AddTextbox(1, 400, 300, 100, 100)  # orientation (1 = horizontal), left, top, width, height in points
tb.TextFrame.TextRange.Text = "Hello"
tb.TextFrame.MarginTop = 0
tb.TextFrame.MarginLeft = 0
tb.Fill.Visible = 0
tb.Line.Visible = 0
doc.SaveAs2("Hello.docx")
doc.Close()
word.Application.Quit()
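Word's shape methods take positions and sizes in points (1/72 inch). If the 400x300 target is measured in pixels, it needs converting first. Here is a minimal sketch, assuming a 96 DPI source. The DPI value and the helper name `px_to_pt` are assumptions for illustration:

```python
def px_to_pt(pixels, dpi=96.0):
    """Convert a pixel measurement to points (1 pt = 1/72 inch) at the given DPI."""
    return pixels * 72.0 / dpi

# Assumed: the question's 400x300 target is in pixels at 96 DPI.
left_pt = px_to_pt(400)
top_pt = px_to_pt(300)
print(left_pt, top_pt)
```

The resulting values would replace the 400 and 300 arguments in the AddTextbox call.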

Creating a lightweight fallback font with fontforge and fonttools

For a web app I need a way to prevent the browser from falling back to another font if my web font doesn't include a character. It seems the only way to do this is to add another font to the font stack that includes "all" possible characters.
There are existing fallback fonts, but those are more like debugging helpers, as they show the code point as a number; therefore they are much too heavy (>2MB).
The fallback font for my usecase should just show something like a box to signal a missing character.
My idea was to generate a simple font with only one glyph and apply a feature file which will replace all glyphs with this one.
My script for fontforge:
import fontforge
import fontTools.feaLib.builder as feaLibBuilder
from fontTools.ttLib import TTFont
font_name = 'maeh.ttf'
font = fontforge.font()
glyph = font.createChar(33, "theone")
pen = glyph.glyphPen()
pen.moveTo((100,100))
pen.lineTo((100,500))
pen.lineTo((500,500))
pen.lineTo((500,100))
pen.closePath()
for i in range(34, 99):
    glyph = font.createChar(i)
    glyph.width=10
font.cidConvertTo('Adobe', 'Identity', 0) # doesn't make a difference
font.generate(font_name)
font = TTFont(font_name)
feaLibBuilder.addOpenTypeFeatures(font, 'fallback.fea')
font.save("fea_"+font_name)
My feature file:
languagesystem DFLT dflt;
@all=[\00035-\00039];
#@all=[A-Z] this works
feature liga {
    sub @all by theone;
} liga;
But the above results in a
KeyError: ('cid00037', 'SingleSubst[0]', 'Lookup[0]', 'LookupList')
with changing numbers for cid00037.
If I use the commented-out A-Z class from the feature file, it works, so this approach doesn't seem to be completely wrong.
Why can't fonttools find the glyphs if I specify the range in CID notation?
Is there another way to create a class for the OpenType feature file that includes all glyphs?
While working on the above problem, somebody pointed me to the Adobe NotDef font, which is pretty much what I was looking for. For some reason I wasn't able to convert the .otf of Adobe NotDef to woff or woff2 with fontforge. All the online tools for creating web font files, such as fontsquirrel, failed as well. To create the woff file I used sfnt2woff from the woff-tools package. For the woff2 file I used https://github.com/google/woff2.
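On the second question, one way to avoid CID ranges altogether might be to write the class out with the font's actual glyph names, which fontTools exposes through TTFont.getGlyphOrder(). Below is a minimal sketch that generates such a feature file as text. The glyph names are placeholders and the helper `build_fallback_fea` is hypothetical. It has not been verified against the font from the question:

```python
def build_fallback_fea(glyph_names, target="theone"):
    """Return feature-file text that substitutes every listed glyph with `target`."""
    # Skip the target itself and .notdef, neither of which needs substituting.
    members = [g for g in glyph_names if g not in (target, ".notdef")]
    return "\n".join([
        "languagesystem DFLT dflt;",
        "@all = [%s];" % " ".join(members),
        "feature liga {",
        "    sub @all by %s;" % target,
        "} liga;",
        "",
    ])

# Placeholder names; in practice they could come from TTFont(path).getGlyphOrder().
names = [".notdef", "theone", "quotedbl", "numbersign", "dollar"]
print(build_fallback_fea(names))
```

Because the class is built from the names the font really contains, the lookup should not reference glyphs that are missing from the glyph order.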

Python reportlab paragraph function only draws a third of the input text to a pdf file

Forgive me if this is a common issue; I could not find an answer.
As the title states, I've got a problem with ReportLab drawing text to a .pdf file.
The input is a big plain-text string, extracted from a JSON object.
This is the code I use to generate and automatically open the PDF file.
To open the document I use a system call.
import json
import subprocess
import urllib2
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import ParagraphStyle
from reportlab.lib.units import inch
from reportlab.platypus import Paragraph, SimpleDocTemplate
def importAsPdf():
    # Open the plain text
    document = json.load(urllib2.urlopen(documenturl))
    documentId = document["id"].encode("utf-8")
    text = document["text"].encode("utf-8")
    print(text)
    # Create the pdf file
    doc = SimpleDocTemplate("pdftextfile.pdf")
    parts = []
    # Set page width and height; just a standard A4 page measurement
    PAGE_WIDTH, PAGE_HEIGHT = A4
    aW = PAGE_WIDTH - 4*inch  # available width and height
    aH = PAGE_HEIGHT - 4*inch
    # Set up the paragraph style
    style = ParagraphStyle(name='fancy')
    style.fontSize = 12
    style.leading = 18
    # Build the pdf
    p = Paragraph(text, style)
    parts.append(p)
    doc.build(parts)
    print(doc.filename)
    # Open the pdf
    subprocess.call(('gnome-open', "pdftextfile.pdf"))
When it opens the PDF, only about one third of the text is in there, and the other two thirds are nowhere to be found.
It doesn't throw any exception or anything; it just stops somewhere mid-sentence, about one third of the way through.
Any thoughts?
edit:
I've found that it stops at the only "<" in the text. Are these a problem for generating PDF files with ReportLab?
Yes, this happens because Paragraph lets you use XML tags inside the text to further format it. For example, using your variables:
text = """<font size=12>Hello<br/>World!</font>"""
p = Paragraph(text, style)
parts.append(p)
doc.build(parts)
That would produce the following paragraph with a font size of 12:
Hello
World!
So when the tags are incomplete, ReportLab malfunctions. If you want to use angle brackets, I suspect you must escape them in the given text. Cheers
I have found that a "<" in the text stops ReportLab from completely drawing the text to a PDF document. I removed the "<" with text.replace("<", "") and everything is now fine.
As to the why, I have no idea.
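Rather than deleting the character, the text can be escaped so that it is treated as literal content instead of markup. Here is a minimal sketch using only the standard library. Passing the result to Paragraph(safe, style) is the assumed usage and is not shown, to keep the example free of dependencies:

```python
from xml.sax.saxutils import escape

raw = "if a < b & b > c then stop"
safe = escape(raw)  # replaces &, < and > with XML entities
print(safe)
```

This keeps the original characters visible in the output, whereas text.replace("<", "") silently drops them.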

Draw bold/italic text with PIL?

How to draw bold/italic text with PIL? ImageFont.truetype(file, size) only has an option to specify the font size.
Use the bold/italic version of the font.
There is no bold parameter so far, but you can easily work around that by adding a stroke to the text in the same color as the text, which makes it look bold. The following code shows how to use a stroke:
draw.text((x, y), text, fill=color, font=font, stroke_width=2,
          stroke_fill=color)
A rather hacky solution for making a font bold if (for whatever reason) you don't have a separate bold version of the font is to draw the same text several times with a slight offset.
andaleMono = ImageFont.truetype(ANDALE_MONO_PATH,16)
text = "hello world"
mainOffset = (50,50)
xoff, yoff = mainOffset
draw.text(mainOffset,text,font=andaleMono,fill='black')
draw.text((xoff+1,yoff+1),text,font=andaleMono,fill='black')
draw.text((xoff-1,yoff-1),text,font=andaleMono,fill='black')
Many fonts use separate TTF files for their bold/italic versions, so I'd imagine that if you just specify that file, it would work.
Well, this is my first comment. Here we go.
I'll try to clarify the procedure. At first, what I did was use the "name" of the font like this:
font = ImageFont.truetype("C:\Windows\Fonts\\Arial Negrita.ttf",25)
but only got some errors like this:
Traceback (most recent call last):
File "C:/Users/555STi/PycharmProjects/PIL/img.py", line 8, in <module>
font = ImageFont.truetype("C:\Windows\Fonts\Arial negrita.ttf",25)
File "C:\Python27\lib\site-packages\PIL\ImageFont.py", line 262, in truetype
return FreeTypeFont(font, size, index, encoding)
File "C:\Python27\lib\site-packages\PIL\ImageFont.py", line 142, in __init__
self.font = core.getfont(font, size, index, encoding)
IOError: cannot open resource
Then I remembered that fonts sometimes have other "names" or "filenames", so I went to the fonts folder and opened the Arial font, which displayed all the styles like negrita (bold), cursiva (italic), etc.
I right-clicked on the "negrita" style and selected "Properties", and there was the "real name" of the font.
In my case, the name was "ariblk".
Then, finally, I just used the name like this:
font = ImageFont.truetype("C:\Windows\Fonts\\ariblk.ttf",25)
I know this post is old, but today it helped me get to the solution, so I hope this helps somebody.
=)
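A side note on the paths above: in a normal Python string a backslash can start an escape sequence (for example, \a is the bell character), so Windows paths are safer written as raw strings or with doubled backslashes. Here is a short sketch using the arialbd.ttf file name as an example path:

```python
# "\a" in a normal string is the bell character, not a backslash followed by "a".
bell = "\a"

raw_path = r"C:\Windows\Fonts\arialbd.ttf"        # raw string: backslashes kept as-is
escaped_path = "C:\\Windows\\Fonts\\arialbd.ttf"  # doubled backslashes
print(raw_path == escaped_path)
```

Either form can be passed to ImageFont.truetype without the path being altered by escape processing.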
With reference to the other answers here, my search for the file name of the bold variant of Arial produced the following (arialbd.ttf):
import numpy as np
import matplotlib.font_manager
def FindFontsVariantsWithBase(fontBase="arial"):
    system_fonts = matplotlib.font_manager.findSystemFonts(fontpaths=None, fontext='ttf')
    # for font in np.sort(system_fonts):
    #     print(font)
    fonts = np.sort(system_fonts).tolist()
    res = [i for i in fonts if fontBase in i]
    print(res)
FindFontsVariantsWithBase("arial")
['C:\WINDOWS\Fonts\arial.ttf', 'C:\WINDOWS\Fonts\arialbd.ttf', 'C:\WINDOWS\Fonts\arialbi.ttf', 'C:\WINDOWS\Fonts\ariali.ttf', 'C:\Windows\Fonts\arial.ttf', 'C:\Windows\Fonts\arialbd.ttf', 'C:\Windows\Fonts\arialbi.ttf', 'C:\Windows\Fonts\ariali.ttf']
