Python issue in writing file to pdf - python

I am working with pdf content in python and my input from a service response is of the type _io.BufferedRandom. I need to save this file as pdf within my service for further usage
response = open('test_file.pdf', 'rb+')
this is the input to my service and is of the type _io.BufferedRandom
with open('output.pdf', 'wb+') as f:
f.write(response)
doing this I get the error - TypeError: a bytes-like object is required, not '_io.BufferedRandom'
Any help is appreciated thank you.

As an open method return the file object to open a file for reading/write or append. like
open(filename, mode)
f = open('workfile', 'w')
and in your case, you try to write file object to another file, not the content
f.write(response)
So you will need to use read function as
f.read(size) - read file and return a string (in text mode) or bytes object (in binary mode).
so the final procedure will be
with open('output.pdf', 'wb+') as f:
f.write(response.read())

Related

Python Write to file as binary (application-octet stream) file type format

We have a file without any extension. This file is given by client and used by another program
Sample content of the file is given below. It is just json content.
{"pid":23,"list":[{"pid":11}]}
From properties we can see that,this file is of type Binary (application/octet-stream) . Now we will read this file and load it as json and do some modifications to the this and finally we will write it to a new result file.
import json
r = {"a": 2, "B": 3}
with open("jres", "wb") as w:
txt = json.dumps(r, separators=(',', ':'))
w.write(txt.encode())
After writing to the file, the file type is changed as plain/text.
How to create result file with same file type as previous one? If we use the result file (plain/text), the application is not accepting it. Hence we are trying to write the file in the accepted format which is Binary (application/octet-stream)
with open('filename', 'w', encoding='utf-16') as fw:
fw.write('ee')
This will help.
you can try encoding the content before writing to file stream
with open('filename', 'wb') as fw:
content = 'ee'.encode('utf-16')
fw.write(content)

Python3 write gzip file - memoryview: a bytes-like object is required, not 'str'

I want to write a file. Based on the name of the file this may or may not be compressed with the gzip module. Here is my code:
import gzip
filename = 'output.gz'
opener = gzip.open if filename.endswith('.gz') else open
with opener(filename, 'wb') as fd:
print('blah blah blah'.encode(), file=fd)
I'm opening the writable file in binary mode and encoding my string to be written. However I get the following error:
File "/usr/lib/python3.5/gzip.py", line 258, in write
data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Why is my object not a bytes? I get the same error if I open the file with 'w' and skip the encoding step. I also get the same error if I remove the '.gz' from the filename.
I'm using Python3.5 on Ubuntu 16.04
For me, changing the gzip flag to 'wt' did the job. I could write the original string, without "byting" it.
(tested on python 3.5, 3.7 on ubuntu 16).
From python 3 gzip doc - quoting: "... The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt', or 'xt' for text mode..."
import gzip
filename = 'output.gz'
opener = gzip.open if filename.endswith('.gz') else open
with opener(filename, 'wt') as fd:
print('blah blah blah', file=fd)
!zcat output.gz
> blah blah blah
you can convert it to bytes like this.
import gzip
with gzip.open(filename, 'wb') as fd:
fd.write('blah blah blah'.encode('utf-8'))
print is a relatively complex function. It writes str to a file but not the str that you pass, it writes the str that is the result of rendering the parameters.
If you have bytes already, you can use fd.write(bytes) directly and take care of adding a newline if you need it.
If you don't have bytes, make sure fd is opened to receive text.
You can serialize it using pickle.
First serializing the object to be written using pickle, then using gzip.
To save the object:
import gzip, pickle
filename = 'non-serialize_object.zip'
# serialize the object
serialized_obj = pickle.dumps(object)
# writing zip file
with gzip.open(filename, 'wb') as f:
f.write(serialized_obj)
To load the object:
import gzip, pickle
filename = 'non-serialize_object.zip'
with gzip.open(filename, 'rb') as f:
serialized_obj = f.read()
# de-serialize the object
object = pickle.loads(serialized_obj)

Open file to read from web form

I am working on a project where I have to upload a file from file storage (via web form) to MongoDB. In order to achieve this, I need to open the file in "rb" mode, then encode the file and finally upload to MongoDb. I am stuck when opening the file "rb" mode.
if form.validate():
for inFile in request.files.getlist("file"):
connection = pymongo.MongoClient()
db = connection.test
uploads = db.uploads
with open(inFile, "rb") as fin:
f = fin.read()
encoded = Binary(f,0)
try:
uploads.insert({"binFile": encoded})
check = True
except Exception as e:
self.errorList.append("Document upload is unsuccessful"+e)
check = False
The above code is throwing TypeError: coercing to Unicode: need string or buffer, FileStorage found in the open step, i.e. this line:
with open(inFile, "rb") as fin:
Is there a way I can change my code to make it work?
Thanks in advance
The FileStorage object is already file-like so you can use as a file. You don't need to use open on it, just call inFile.read().
If this doesn't work for you for some reason, you can save the file to disk first using inFile.save() and open it from there.
Reference: http://werkzeug.pocoo.org/docs/0.11/datastructures/#werkzeug.datastructures.FileStorage

PyPDF2 IOError: [Errno 22] Invalid argument on PyPdfFileReader Python 2.7

Goal = Open file, encrypt file, write encrypted file.
Trying to use the PyPDF2 module to accomplish this. I have verified theat "input" is a file type object. I have researched this error and it translates to "file not found". I believe that it is linked somehow to the file/file path but am unsure how to debug or troubleshoot. and getting the following error:
Traceback (most recent call last):
File "CommissionSecurity.py", line 52, in <module>
inputStream = PyPDF2.PdfFileReader(input)
File "build\bdist.win-amd64\egg\PyPDF2\pdf.py", line 1065, in __init__
File "build\bdist.win-amd64\egg\PyPDF2\pdf.py", line 1660, in read
IOError: [Errno 22] Invalid argument
Below is the relevant code. I'm not sure how to correct this issue because I'm not really sure what the issue is. Any guidance is appreciated.
for ID in FileDict:
if ID in EmailDict :
path = "C:\\Apps\\CorVu\\DATA\\Reports\\AlliD\\Monthly Commission Reports\\Output\\pdcom1\\"
#print os.listdir(path)
file = os.path.join(path + FileDict[ID])
with open(file, 'rb') as input:
print type(input)
inputStream = PyPDF2.PdfFileReader(input)
output = PyPDF2.PdfFileWriter()
output = inputStream.encrypt(EmailDict[ID][1])
with open(file, 'wb') as outputStream:
output.write(outputStream)
else : continue
I think your problem might be caused by the fact that you use the same filename to both open and write to the file, opening it twice:
with open(file, 'rb') as input :
with open(file, 'wb') as outputStream :
The w mode will truncate the file, thus the second line truncates the input.
I'm not sure what you're intention is, because you can't really try to read from the (beginning) of the file, and at the same time overwrite it. Even if you try to write to the end of the file, you'll have to position the file pointer somewhere.
So create an extra output file that has a different name; you can always rename that output file to your input file after both files are closed, thus overwriting your input file.
Or you could first read the complete file into memory, then write to it:
with open(file, 'rb') as input:
inputStream = PyPDF2.PdfFileReader(input)
output = PyPDF2.PdfFileWriter()
output = input.encrypt(EmailDict[ID][1])
with open(file, 'wb') as outputStream:
output.write(outputStream)
Notes:
you assign inputStream, but never use it
you assign PdfFileWriter() to output, and then assign something else to output in the next line. Hence, you never used the result from the first output = line.
Please check carefully what you're doing, because it feels there are numerous other problems with your code.
Alternatively, here are some other tips that may help:
The documentation suggests that you can also use the filename as first argument to PdfFileReader:
stream – A File object or an object that supports the standard read
and seek methods similar to a File object. Could also be a string
representing a path to a PDF file.
So try:
inputStream = PyPDF2.PdfFileReader(file)
You can also try to set the strict argument to False:
strict (bool) – Determines whether user should be warned of all
problems and also causes some correctable problems to be fatal.
Defaults to True.
For example:
inputStream = PyPDF2.PdfFileReader(file, strict=False)
Using open(file, 'rb') was causing the issue becuase PdfFileReader() does that automagically. I just removed the with statement and that corrected the problem.
with open(file, 'rb') as input:
inputStream = PyPDF2.PdfFileReader(input)
This error raised up because of PDF file is empty.
My PDF file was empty that's why my error was raised up. So First of all i fill my PDF file with some data and Then start reeading it using PyPDF2.PdfFileReader,
And it solved my Problem!!!
Late but, you may be opening an invalid PDF file or an empty file that's named x.pdf and you think it's a PDF file

TypeError: coercing to Unicode: need string or buffer

This code returns the following error message:
with open (infile, mode='r', buffering=-1) as in_f, open (outfile, mode='w', buffering=-1) as out_f:
TypeError: coercing to Unicode: need string or buffer, file found
# Opens each file to read/modify
infile=open('110331_HS1A_1_rtTA.result','r')
outfile=open('2.txt','w')
import re
with open (infile, mode='r', buffering=-1) as in_f, open (outfile, mode='w', buffering=-1) as out_f:
f = (i for i in in_f if i.rstrip())
for line in f:
_, k = line.split('\t',1)
x = re.findall(r'^1..100\t([+-])chr(\d+):(\d+)\.\.(\d+).+$',k)
if not x:
continue
out_f.write(' '.join(x[0]) + '\n')
Please someone help me.
You're trying to open each file twice! First you do:
infile=open('110331_HS1A_1_rtTA.result','r')
and then you pass infile (which is a file object) to the open function again:
with open (infile, mode='r', buffering=-1)
open is of course expecting its first argument to be a file name, not an opened file!
Open the file once only and you should be fine.
For the less specific case (not just the code in the question - since this is one of the first results in Google for this generic error message. This error also occurs when running certain os command with None argument.
For example:
os.path.exists(arg)
os.stat(arg)
Will raise this exception when arg is None.
You're trying to pass file objects as filenames. Try using
infile = '110331_HS1A_1_rtTA.result'
outfile = '2.txt'
at the top of your code.
(Not only does the doubled usage of open() cause that problem with trying to open the file again, it also means that infile and outfile are never closed during the course of execution, though they'll probably get closed once the program ends.)
Here is the best way I found for Python 2:
def inplace_change(file,old,new):
fin = open(file, "rt")
data = fin.read()
data = data.replace(old, new)
fin.close()
fin = open(file, "wt")
fin.write(data)
fin.close()
An example:
inplace_change('/var/www/html/info.txt','youtub','youtube')

Categories

Resources