How do I pass multiple StringIO into python-pdfkit?

Goal:
- Use the Django templating language.
- Render the template in memory (no disk writes).
- Push the rendered content to a StringIO instance.
- Pass that instance to python-pdfkit.
Issue:
I keep getting TypeError: coercing to Unicode: need string or buffer, instance found when I pass more than one file in the list.
The code below works without the [] and with just one StringIO instance.
from django.template import loader, Context
from django import template
import StringIO
import pdfkit
STATIC_URL = "https://d1i1yohwujljp9.cloudfront.net/static/"
t = loader.get_template('pdf_coverpage.html')
c = template.Context( {'STATIC_URL': STATIC_URL })
output = StringIO.StringIO()
output.write(t.render(c))
output1 = StringIO.StringIO()
output1.write(t.render(c))
pdfkit.from_file([ output, output1 ] , 'out.pdf' )
Traceback.
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "C:\Python27\lib\site-packages\pdfkit\api.py", line 44, in from_file
configuration=configuration)
File "C:\Python27\lib\site-packages\pdfkit\pdfkit.py", line 37, in __init__
self.source = Source(url_or_file, type_)
File "C:\Python27\lib\site-packages\pdfkit\source.py", line 12, in __init__
self.checkFiles()
File "C:\Python27\lib\site-packages\pdfkit\source.py", line 28, in checkFiles
if not os.path.exists(path):
File "C:\Python27\lib\genericpath.py", line 18, in exists
os.stat(path)
TypeError: coercing to Unicode: need string or buffer, instance found

It's not your fault. pdfkit treats each element of the list as a file path rather than a file-like object, which is why os.path.exists() fails inside checkFiles (pdfkit/source.py in your traceback).
I had a similar situation, with HTML spread across multiple templates. I put it all in one string and passed a single StringIO to pdfkit, using CSS to manage page breaks and other wkhtmltopdf formatting options.
Hope that helps.
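
The single-document approach described above might look roughly like this. It is only a sketch: combine_pages, render_pdf and PAGE_BREAK are hypothetical names, the fragments are assumed to be already-rendered HTML body fragments (for example the strings returned by t.render(c)), and the page-break CSS is one common way to force a new page, not something the answer specifies.

```python
# Hypothetical separator: an empty block that asks the renderer to break the page.
PAGE_BREAK = '<div style="page-break-after: always;"></div>'


def combine_pages(fragments):
    """Join rendered HTML fragments into one string, one page break between each."""
    return PAGE_BREAK.join(fragments)


def render_pdf(fragments, out_path):
    """Write all fragments to a single PDF via pdfkit.from_string.

    pdfkit is imported here so combine_pages can be used without it;
    this call needs pdfkit and wkhtmltopdf to be installed.
    """
    import pdfkit
    return pdfkit.from_string(combine_pages(fragments), out_path)
```

With the question's template this would be called as render_pdf([t.render(c), t.render(c)], 'out.pdf'), which avoids passing a list of StringIO objects altogether.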

Using StringIO doesn't appear to be a recommended approach in the documentation.
I just tried this and it worked fine. Is there a reason you don't want to do it this way?
pdfkit.from_url(['google.com', 'yandex.ru', 'engadget.com'], 'out.pdf')
https://pypi.python.org/pypi/pdfkit

Related

Python Configparser. Whitespace causes AttributeError

I receive some files that come with an .ini file. I have to read the file names from the [FILES] section.
Sometimes there is extra whitespace in another section of the .ini file, which raises an exception in the ConfigParser module.
An example of a "bad" ini file:
[LETTER]
SUBJECT=some text
some text
 and text with whitespace in the beginning
[FILES]
0=file1.txt
1=file2.doc
My code (Python 3.7):
import configparser
def get_files_from_ini_file(info_file):
    ini = configparser.ConfigParser(allow_no_value=True)
    ini.read(info_file)  # ERROR is here
    if ini.has_section("FILES"):
        pocket_files = [ini.get("FILES", i) for i in ini.options("FILES")]
        return pocket_files
print(get_files_from_ini_file("D:\\bad.ini"))
Traceback (most recent call last):
File "D:/test.py", line 10, in <module>
print(get_files_from_ini_file("D:\\bad.ini"))
File "D:/test.py", line 5, in get_files_from_ini_file
ini.read(info_file) # ERROR
File "C:\Users\ap\AppData\Local\Programs\Python\Python37-32\lib\configparser.py", line 696, in read
self._read(fp, filename)
File "C:\Users\ap\AppData\Local\Programs\Python\Python37-32\lib\configparser.py", line 1054, in _read
cursect[optname].append(value)
AttributeError: 'NoneType' object has no attribute 'append'
I can't influence the files I receive, so is there any way to ignore this error? In fact, I only need to parse the [FILES] section.
I have tried empty_lines_in_values=False, with no result.
Maybe that's an invalid ini file and I should write my own parser?

If you only need the "FILES" part, a simple way is to:
- open the file and read it into a string
- get the part after "[FILES]" using the .split() method
- add "[FILES]" back in front of that string
- use the configparser read_string method on the result
This is a hacky solution, but it should work:
import configparser
def get_files_from_ini_file(info_file):
    with open(info_file, 'r') as file:
        ini_string = file.read()
    # Keep only the text from the last "[FILES]" header onwards
    useful_part = "[FILES]" + ini_string.split("[FILES]")[-1]
    ini = configparser.ConfigParser(allow_no_value=True)
    ini.read_string(useful_part)
    if ini.has_section("FILES"):
        pocket_files = [ini.get("FILES", i) for i in ini.options("FILES")]
        return pocket_files
print(get_files_from_ini_file("D:\\bad.ini"))
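
A slightly more defensive variation on the same idea is sketched below. It is an assumption-laden alternative rather than the answer's method: files_section_values is a hypothetical helper that works on the file's text, keeps only the lines belonging to the wanted section (so it also copes when another section follows [FILES]), and then hands those lines to configparser.

```python
import configparser


def files_section_values(ini_text, section="FILES"):
    """Return the values of one section, ignoring every other section's lines."""
    kept = []
    inside = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        # A "[NAME]" line switches the current section on or off.
        if stripped.startswith("[") and stripped.endswith("]"):
            inside = stripped[1:-1] == section
        if inside:
            kept.append(line)
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string("\n".join(kept))
    if not parser.has_section(section):
        return []
    return [parser.get(section, key) for key in parser.options(section)]


# Sample text modelled on the "bad" file from the question.
BAD_INI = (
    "[LETTER]\n"
    "SUBJECT=some text\n"
    "some text\n"
    " and text with whitespace in the beginning\n"
    "[FILES]\n"
    "0=file1.txt\n"
    "1=file2.doc\n"
)
print(files_section_values(BAD_INI))
```

For a file on disk, read it with open(...).read() and pass the resulting string in.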

How to iterate through and delete certain files from Python fcache?

In my PyQt5 app, I've been using fcache (https://pypi.org/project/fcache/) to cache lots of small files to the user's temp folder for speed. It's working well for caching, but now I need to be able to iterate through the cached files and selectively delete files that are no longer needed.
However when I try to iterate through the FileCache object, I'm getting an error.
thisCache is the name of my cache, and if I print(thisCache) I get:
which is fine.
Then if I do print(thisCache.keys()) I get KeysView(<fcache.cache.FileCache object at 0x000001F7BA0F2848>), which seems correct (I think?). Similarly, printing .values() gives me a ValuesView.
Then if I do print(len(thisCache.keys())) I get 1903, showing that there are 1903 files in there, which is probably correct. But here's where I get stuck.
If I try to iterate through the KeysView in any way, I get an error. Each of the following attempts:
for f in thisCache.values():
for f in thisCache.keys():
always throws an error:
Process finished with exit code -1073740791 (0xC0000409)
I'm fairly new to Python, so am I just misunderstanding how I'm supposed to iterate through this list? Or is there a bug or gotcha here that I need to work around?
Thanks
::::::::: EDIT ::::::::
After a bit of a delay, here's a reproducible (but not especially minimal or high-quality) bit of example code.
import random
import string
from fcache.cache import FileCache
from shutil import copyfile
def random_string(stringLength=10):
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for i in range(stringLength))
cacheName = "TestCache"
cache = FileCache(cacheName)
sourceFile = "C:\\TestFile.mov"
targetCount = 50
# copy the file 50 times:
for w in range(1, targetCount+1):
    fileName = random_string(50) + ".mov"
    targetPath = cache.cache_dir + "\\" + fileName
    print("Copying file ", w)
    copyfile(sourceFile, targetPath)
    cache[str(w)] = targetPath
print("Cached", targetCount, "items.")
print("Syncing cache...")
cache.sync()
# iterate through the cache:
print("Item keys:", cache.keys())
for key in cache.keys():
    v = cache[key]
    print(key, v)
print("Cache read.")
There is one dependency, which is having a file called "C:\TestFile.mov" on your system, but the path isn't important so this can be pointed to any file. I've tested with other file formats, with the same result.
The error that is thrown is:
Traceback (most recent call last):
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\encodings\hex_codec.py", line 19, in hex_decode
return (binascii.a2b_hex(input), len(input))
binascii.Error: Non-hexadecimal digit found
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\stuart.bruce\PycharmProjects\testproject\test_code.py", line 32, in <module>
for key in cache.keys():
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\_collections_abc.py", line 720, in __iter__
yield from self._mapping
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\site-packages\fcache\cache.py", line 297, in __iter__
yield self._decode_key(key)
File "C:\Users\stuart.bruce\AppData\Local\Programs\Python\Python37\lib\site-packages\fcache\cache.py", line 211, in _decode_key
bkey = codecs.decode(key.encode(self._keyencoding), 'hex_codec')
binascii.Error: decoding with 'hex_codec' codec failed (Error: Non-hexadecimal digit found)
Line 32 of test_code.py (as mentioned in the error) is the line for key in cache.keys():, so this is where a non-hexadecimal character seems to be found. But firstly I'm not sure why, and secondly I don't know how to get around it.
(PS. Please note that if you run this code, you'll end up with 50 copies of your chosen file in your temp folder, and nothing will tidy it up automatically!)

After reading the sources of fcache, it seems that the cache_dir should only be used by fcache itself, as it reads all its files to find previously created cache data.
The program (or, better, the module) crashes because you created the other files in that directory, and it cannot deal with them.
The solution is to use another directory to store those files.
import os
# ...
data_dir = os.path.join(os.path.dirname(cache.cache_dir), 'data')
if not os.path.exists(data_dir):
    os.mkdir(data_dir)
for w in range(1, targetCount+1):
    fileName = random_string(50) + ".mov"
    targetPath = os.path.join(data_dir, fileName)
    copyfile(sourceFile, targetPath)
    cache[str(w)] = targetPath
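
Once the data files live outside cache_dir, the selective clean-up the question asks about could be sketched as below. prune_cache and should_keep are hypothetical names, and the function only assumes that the cache behaves like a mutable mapping from keys to file paths (as the KeysView output in the question suggests); it is demonstrated here with a plain dict rather than a real FileCache.

```python
import os


def prune_cache(cache, should_keep):
    """Delete entries, and the files they point to, that should_keep rejects.

    cache: any mutable mapping of key -> file path.
    should_keep: callable(key, path) -> bool.
    Returns the list of removed keys.
    """
    removed = []
    # Copy the keys first so entries can be deleted while looping.
    for key in list(cache.keys()):
        path = cache[key]
        if should_keep(key, path):
            continue
        if os.path.exists(path):
            os.remove(path)
        del cache[key]
        removed.append(key)
    return removed
```

For example, prune_cache(cache, lambda key, path: int(key) <= 10) would keep only the first ten entries from the test script. With a real FileCache, a cache.sync() afterwards, as in the question's code, may be needed to persist the deletions.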

Getting TypeError: ord() expected string of length 1, but int found error

Code is
from PyPDF2 import PdfFileReader
with open('HTTP_Book.pdf','rb') as file:
    pdf=PdfFileReader(file)
    pagedd=pdf.getPage(0)
    print(pagedd.extractText())
This code raises the error shown below:
TypeError: ord() expected string of length 1, but int found
I searched the internet and found this: Troubleshooting "TypeError: ord() expected string of length 1, but int found"
but it doesn't help much. I understand the background of this error, but I'm not sure how it is related here.
I tried a different PDF file and it works fine. So what is wrong: is it the PDF file, or is PyPDF2 unable to handle it? I know this method is not very reliable, as per the documentation:
This works well for some PDF files, but poorly for others, depending on the generator used
How should this be handled?
Traceback:
Traceback (most recent call last):
File "pdf_reader.py", line 71, in <module>
print(pagedd.extractText())
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\pdf.py", line 2595, in extractText
content = ContentStream(content, self.pdf)
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\pdf.py", line 2673, in __init__
stream = BytesIO(b_(stream.getData()))
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\generic.py", line 841, in getData
decoded._data = filters.decodeStreamData(self)
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 350, in decodeStreamData
data = LZWDecode.decode(data, stream.get("/DecodeParms"))
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 255, in decode
return LZWDecode.decoder(data).decode()
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 228, in decode
cW = self.nextCode();
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 205, in nextCode
nextbits=ord(self.data[self.bytepos])
TypeError: ord() expected string of length 1, but int found

I found the issue. This is just a limitation of PyPDF2. I used tika and BeautifulSoup to parse and extract the text instead, and that worked fine, although it needs a little more work.
from tika import parser
from bs4 import BeautifulSoup
raw=parser.from_file('HTTP_Book.pdf',xmlContent=True)['content']
data=BeautifulSoup(raw,'lxml')
message=data.find(class_='page') # for first page
print(message.text)
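
For context on the error itself: the last frame of the traceback calls ord() on one element of the stream data, and in Python 3 indexing a bytes object already yields an int. The sketch below only illustrates that behaviour; byte_at and ord_fails_on_bytes_item are hypothetical helpers, not part of PyPDF2, and this is not a patch for the library.

```python
def ord_fails_on_bytes_item(data):
    """Return True if ord(data[0]) raises TypeError, as it does for bytes on Python 3."""
    try:
        ord(data[0])
    except TypeError:
        return True
    return False


def byte_at(data, pos):
    """Return the integer value at data[pos] for bytes (int items) or str (1-char items)."""
    item = data[pos]
    return item if isinstance(item, int) else ord(item)


sample = b"\x80\x0b"
print(ord_fails_on_bytes_item(sample))
print(byte_at(sample, 0), byte_at(sample, 1))
```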

Django Issues with Uploading a Python File

My code so far:
from django.core.files import File
file = open('ChicagoTimesScores.py')
djangofile = File(file)
myfile = File(file)
myfile.save('new', djangofile)
file.close()
I'm aware that I'm repeating myfile, but I was a bit lost and tried what I could to see if it would work.
My error code:
Traceback (most recent call last):
File "C:/Python33/TestRun.py", line 6, in <module>
myfile.save('new', djangofile)
AttributeError: 'File' object has no attribute 'save'
I'm trying to save this Python file with Django, but it appears that Python doesn't recognize these attributes.
Oh, and yeah, I installed django correctly. No issues.

The Django File class doesn't have a "save" method. Here are the Django docs for the File class:
https://docs.djangoproject.com/en/dev/ref/files/file/

PyPDF's PdfFileReader() having problems reading file, file not callable

So here is my import:
from pyPdf import PdfFileWriter, PdfFileReader
Here is where I write my pdf:
filenamer = filename + '.pdf'
pdf = PdfPages(filenamer)
(great naming convention, I know!)
I write some things to it.
I close it here:
pdf.close()
Here is where I try and read it:
input1 = PdfFileReader(file(filenamer, "rb"))
And here is the error:
Traceback (most recent call last):
File "./datamine.py", line 405, in <module>
input1 = PdfFileReader(file(filenamer, "rb"))
TypeError: 'file' object is not callable
I don't understand the error, because I know the file exists, and when I comment out this line and the subsequent lines that use input1, the program runs fine.

It looks like you've assigned an open file to the name file, and then you can't use the builtin any more.
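
The same kind of shadowing can be reproduced in a few lines. This is an illustrative sketch only: the question is Python 2, where file is a builtin, whereas the example uses Python 3 and shadows open instead; shadowing_demo is a hypothetical name.

```python
import io


def shadowing_demo():
    """Rebind the name of a builtin to a file object, then try to call it."""
    open = io.BytesIO(b"data")  # the local name now hides the builtin open()
    try:
        open("some.pdf", "rb")  # calls the BytesIO object, not the builtin
    except TypeError as exc:
        return str(exc)
    return None


print(shadowing_demo())
```

Renaming the variable that holds the open file (for example to pdf_file) leaves the builtin usable again.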
