I'm having issues with code that had been working for weeks.
The problem comes from this part of my code:
ifile = open('0_Inputs/CompaniesList.csv', "r", encoding = 'utf-8')
I got the following message:
TypeError: open() got an unexpected keyword argument 'encoding'
If I try:
ifile = open('0_Inputs/CompaniesList.csv', "r")
then I get another error:
OSError: cannot identify image file '0_Inputs/CompaniesList.csv'
I'm doing from PyPDF2 import PdfFileReader, PdfFileWriter, but I don't know whether there's a conflict between libraries?
Thank you!
You should use the csv module instead of opening the file this way.
import csv
with open('0_Inputs/CompaniesList.csv', newline='') as ifile:
    for row in csv.reader(ifile):
        pass  # process each row here
More information: https://docs.python.org/3/library/csv.html
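"cannot identify image file" is the message Pillow's Image.open produces, which suggests (an assumption, since the question doesn't show all of its imports) that some import has replaced the built-in open in the script's namespace. A minimal sketch of how to check for that; open_is_shadowed and fake_image_open are hypothetical names used only for illustration:

```python
import builtins

def open_is_shadowed(namespace):
    """Return True if 'open' in the namespace is not the built-in open."""
    candidate = namespace.get("open", builtins.open)
    return candidate is not builtins.open

# Hypothetical stand-in for an open() pulled in by a star-import.
def fake_image_open(path, mode="r"):
    raise OSError("cannot identify image file %r" % path)

shadowed_namespace = {"open": fake_image_open}
clean_namespace = {}

print(open_is_shadowed(shadowed_namespace))
print(open_is_shadowed(clean_namespace))
```

In a real script you would call open_is_shadowed(globals()). If it reports a shadowed name, calling builtins.open(...) explicitly, or removing the star-import, should bring back the expected behaviour.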
I am doing an assignment and need to extract text from a PDF using PyPDF2, but I am getting this error while trying to do that. How can I fix it?
Can someone help me? Thank you in advance.
import PyPDF2
textFile = open('foo.txt', 'w')
file = open('foo.pdf','rb')
readpdf = PyPDF2.PdfFileReader(file)
print(readpdf.getNumPages())
1
read_pdf = readpdf.getPage(0)
textFile.write(read_pdf.extractText())
--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-5a892ea3012b> in <module>
----> 1 textFile.write(read_pdf.extractText())
ValueError: I/O operation on closed file.
file.close()
textFile.close()
I am not sure how you ended up with this error, but this might help:
textFile = open('foo.txt', 'w')
read_pdf = readpdf.getPage(0)
textFile.write(read_pdf.extractText())
Opening the file right before you do something with it seems to work for me, so give it a try and we'll see ;]
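A ValueError with "I/O operation on closed file" means the file object had already been closed when write() was called. One possible cause (an assumption, since the question doesn't show the order in which the notebook cells were run) is that the textFile.close() cell ran before the write. A minimal sketch with a hypothetical scratch file reproduces the error and shows that reopening fixes it:

```python
import os
import tempfile

# Hypothetical scratch file standing in for foo.txt.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")

text_file = open(path, "w")
text_file.close()  # the handle is closed here

try:
    text_file.write("hello")  # writing to the closed handle fails
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Reopening the file right before writing works.
with open(path, "w") as text_file:
    text_file.write("hello")

with open(path) as check:
    content = check.read()
print(content)
```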
If you use with open(...), you don't need to close the files yourself; the with statement closes them automatically, even if an exception is raised.
import PyPDF2
with open('foo.txt', 'w') as textFile:
    with open('foo.pdf', 'rb') as file:
        readpdf = PyPDF2.PdfFileReader(file)
        print(readpdf.getNumPages())
        read_pdf = readpdf.getPage(0)
        textFile.write(read_pdf.extractText())
I want to change the metadata of the pdf file using this code:
from PyPDF2 import PdfFileReader, PdfFileWriter
title = "Vice-présidence pour l'éducation"
fin = open(filename, 'rb')
reader = PdfFileReader(fin)
writer = PdfFileWriter()
writer.appendPagesFromReader(reader)
metadata = reader.getDocumentInfo()
metadata.update({'/Title':title})
writer.addMetadata(metadata)
fout = open(filename, 'wb')
writer.write(fout)
fin.close()
fout.close()
It works fine if the title is in English (no accents), but when it has accents I get the following error:
TypeError: createStringObject should have str or unicode arg
How can I add a title with accents to the metadata?
Thank you
The only way to get this error message is to pass a value of the wrong type as the string parameter of the createStringObject(string) function in the library itself.
It checks for a string or bytes type using these definitions in utils.py:
import builtins
bytes_type = type(bytes()) # Works the same in Python 2.X and 3.X
string_type = getattr(builtins, "unicode", str)
I can only reproduce your error if I rewrite your code with an obviously wrong type like this (code is rewritten using with statement but only the commented line is important):
from PyPDF2 import PdfFileReader, PdfFileWriter
with open(inputfile, "rb") as fr, open(outputfile, "wb") as fw:
    reader = PdfFileReader(fr)
    writer = PdfFileWriter()
    writer.appendPagesFromReader(reader)
    metadata = reader.getDocumentInfo()
    # metadata.update({'/Title': "Vice-présidence pour l'éducation"})
    metadata.update({'/Title': [1, 2, 3]})  # <- wrong type here !
    writer.addMetadata(metadata)
    writer.write(fw)
It seems that the type of your string title = "Vice-présidence pour l'éducation" does not match whatever bytes_type or string_type resolves to. Either the title variable has an unexpected type (which I cannot see in your code, perhaps because it was reduced to an MCVE), or bytes_type or string_type does not resolve to the types the library author intended (this could be a bug in the library or a broken installation; it is hard for me to tell).
Without reproducible code, it's hard to provide a solution, but hopefully this points you in the right direction. It may be enough to convert your string to whatever type bytes_type or string_type resolves to. Other solutions would have to come from the library side or would simply be hacks.
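To see which case applies, you could check the value against the same two types the utils.py lines resolve to. This is only a diagnostic sketch: describe_value is a hypothetical helper, not part of PyPDF2, and it reuses nothing from the library beyond the two definitions quoted from utils.py.

```python
import builtins

# Same resolution as the utils.py definitions quoted in this answer.
bytes_type = type(bytes())
string_type = getattr(builtins, "unicode", str)

def describe_value(value):
    """Hypothetical helper: report which accepted type (if any) a value matches."""
    if isinstance(value, string_type):
        return "text"
    if isinstance(value, bytes_type):
        return "bytes"
    return "unsupported: " + type(value).__name__

print(describe_value("Vice-présidence pour l'éducation"))
print(describe_value("Vice-présidence".encode("utf-8")))
print(describe_value([1, 2, 3]))
```

Running describe_value(title) right before metadata.update(...) in your own script would show whether title is one of the types the check accepts.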
I am trying to write some data out into a unicode XML file with the following statement:
filepath = 'G:\Kodi EPG\ChannelGuide.xml'
with open(filepath, "w", encoding = 'UTF-8') as xml_file:
    xml_file.write(file_blanker)
    xml_file.close
...but am getting the following error:
Traceback (most recent call last):
  File "G:\Python27\Kodi\Sky TV Guide Scraper.py", line 35, in <module>
    class tv_guide:
  File "G:\Python27\Kodi\Sky TV Guide Scraper.py", line 47, in tv_guide
    with open(filepath, "w", encoding = 'UTF-8') as xml_file:
TypeError: 'encoding' is an invalid keyword argument for this function
I have seen this given as an accepted answer to a question on here, but that was for Python 3.x. Is the syntax slightly different for version 2?
Thanks
Yes, the syntax is different in Python 2 as far as the encoding argument is concerned.
Python 2 open signature:
open(name[, mode[, buffering]])
Python 3 open signature:
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
As you can see, in Python 2.7 open doesn't accept the encoding argument, hence the TypeError.
However you can use the built-in io module to open your files. This would allow you to specify the encoding, and also provide compatibility with Python3. For example,
import io
filepath = r'G:\Kodi EPG\ChannelGuide.xml'
with io.open(filepath, "w", encoding = 'UTF-8') as xml_file:
    xml_file.write(file_blanker)
Note that you don't have to explicitly close your files when using the with statement.
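As a quick check that io.open really writes UTF-8, here is a small round trip. The scratch path and the file_blanker content are placeholders (assumptions), since the question doesn't show what file_blanker contains:

```python
import io
import os
import tempfile

# Hypothetical scratch path standing in for the XML file in the question.
filepath = os.path.join(tempfile.mkdtemp(), "ChannelGuide.xml")
file_blanker = u"<tv>caf\u00e9</tv>"  # placeholder content; must be unicode text

with io.open(filepath, "w", encoding="UTF-8") as xml_file:
    xml_file.write(file_blanker)

# Read the raw bytes back to see how the text was encoded on disk.
with io.open(filepath, "rb") as raw:
    raw_bytes = raw.read()

print(raw_bytes)
```

One thing to watch on Python 2: a file opened with io.open in text mode only accepts unicode objects, so a plain byte str has to be decoded (or written as a u"..." literal) before it is passed to write().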
Goal: open a file, encrypt it, and write the encrypted file.
I'm trying to use the PyPDF2 module to accomplish this. I have verified that "input" is a file type object. I have researched this error and it translates to "file not found". I believe it is linked somehow to the file or file path, but I am unsure how to debug or troubleshoot it. I am getting the following error:
Traceback (most recent call last):
  File "CommissionSecurity.py", line 52, in <module>
    inputStream = PyPDF2.PdfFileReader(input)
  File "build\bdist.win-amd64\egg\PyPDF2\pdf.py", line 1065, in __init__
  File "build\bdist.win-amd64\egg\PyPDF2\pdf.py", line 1660, in read
IOError: [Errno 22] Invalid argument
Below is the relevant code. I'm not sure how to correct this issue because I'm not really sure what the issue is. Any guidance is appreciated.
for ID in FileDict:
    if ID in EmailDict :
        path = "C:\\Apps\\CorVu\\DATA\\Reports\\AlliD\\Monthly Commission Reports\\Output\\pdcom1\\"
        #print os.listdir(path)
        file = os.path.join(path + FileDict[ID])
        with open(file, 'rb') as input:
            print type(input)
            inputStream = PyPDF2.PdfFileReader(input)
            output = PyPDF2.PdfFileWriter()
            output = inputStream.encrypt(EmailDict[ID][1])
            with open(file, 'wb') as outputStream:
                output.write(outputStream)
    else : continue
I think your problem might be caused by the fact that you use the same filename to both open and write to the file, opening it twice:
with open(file, 'rb') as input :
with open(file, 'wb') as outputStream :
The w mode will truncate the file, thus the second line truncates the input.
I'm not sure what your intention is, because you can't read from the beginning of the file and overwrite it at the same time. Even if you want to write to the end of the file, you'll have to position the file pointer somewhere.
So create an extra output file that has a different name; you can always rename that output file to your input file after both files are closed, thus overwriting your input file.
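The write-to-another-file-then-rename approach can be sketched with the standard library only. rewrite_via_temp and its transform argument are hypothetical names, and the example uses a scratch file with a trivial transformation rather than a real PDF and PyPDF2:

```python
import os
import tempfile

def rewrite_via_temp(path, transform):
    """Hypothetical helper: read path, write transform(data) to a temp file,
    then replace the original once both files are closed."""
    with open(path, "rb") as source:
        data = source.read()
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as target:
        target.write(transform(data))
    os.replace(temp_path, path)  # Python 3.3+; overwrites the original

# Usage with a scratch file instead of a real PDF.
scratch = os.path.join(tempfile.mkdtemp(), "report.bin")
with open(scratch, "wb") as handle:
    handle.write(b"original")

rewrite_via_temp(scratch, lambda data: data.upper())

with open(scratch, "rb") as handle:
    result = handle.read()
print(result)
```

os.replace only exists on Python 3.3 and later. On Python 2 you would fall back to os.rename, which on Windows refuses to overwrite an existing file, so the original would have to be removed first.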
Or you could first read the complete file into memory, then write to it:
with open(file, 'rb') as input:
    inputStream = PyPDF2.PdfFileReader(input)
    output = PyPDF2.PdfFileWriter()
    output = input.encrypt(EmailDict[ID][1])
with open(file, 'wb') as outputStream:
    output.write(outputStream)
Notes:
you assign inputStream, but never use it
you assign PdfFileWriter() to output, and then assign something else to output in the next line. Hence, you never used the result from the first output = line.
Please check carefully what you're doing, because it feels there are numerous other problems with your code.
Alternatively, here are some other tips that may help:
The documentation suggests that you can also use the filename as first argument to PdfFileReader:
stream – A File object or an object that supports the standard read
and seek methods similar to a File object. Could also be a string
representing a path to a PDF file.
So try:
inputStream = PyPDF2.PdfFileReader(file)
You can also try to set the strict argument to False:
strict (bool) – Determines whether user should be warned of all
problems and also causes some correctable problems to be fatal.
Defaults to True.
For example:
inputStream = PyPDF2.PdfFileReader(file, strict=False)
Using open(file, 'rb') was causing the issue because PdfFileReader() does that automagically. I just removed the with statement and that corrected the problem.
with open(file, 'rb') as input:
    inputStream = PyPDF2.PdfFileReader(input)
This error is raised because the PDF file is empty.
My PDF file was empty, which is why the error was raised. I first filled the PDF file with some data and then read it with PyPDF2.PdfFileReader,
and that solved my problem!
Late, but you may be opening an invalid PDF file, or an empty file that is merely named x.pdf and that you assume is a PDF.
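A cheap pre-check can catch both cases before the file is handed to PyPDF2. This is a sketch only: looks_like_pdf is a hypothetical helper, and it assumes the file begins directly with the %PDF- header, which a few otherwise readable PDFs with leading junk bytes do not.

```python
import os
import tempfile

def looks_like_pdf(path):
    """Hypothetical pre-check: non-empty file that starts with the PDF header."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"

# Two scratch files: an empty one and one with only a PDF header.
folder = tempfile.mkdtemp()
empty_path = os.path.join(folder, "empty.pdf")
header_path = os.path.join(folder, "header.pdf")
open(empty_path, "wb").close()
with open(header_path, "wb") as handle:
    handle.write(b"%PDF-1.4\n")

print(looks_like_pdf(empty_path))
print(looks_like_pdf(header_path))
```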
This code returns the following error message:
with open (infile, mode='r', buffering=-1) as in_f, open (outfile, mode='w', buffering=-1) as out_f:
TypeError: coercing to Unicode: need string or buffer, file found
# Opens each file to read/modify
infile=open('110331_HS1A_1_rtTA.result','r')
outfile=open('2.txt','w')
import re
with open (infile, mode='r', buffering=-1) as in_f, open (outfile, mode='w', buffering=-1) as out_f:
    f = (i for i in in_f if i.rstrip())
    for line in f:
        _, k = line.split('\t',1)
        x = re.findall(r'^1..100\t([+-])chr(\d+):(\d+)\.\.(\d+).+$',k)
        if not x:
            continue
        out_f.write(' '.join(x[0]) + '\n')
Can someone please help me?
You're trying to open each file twice! First you do:
infile=open('110331_HS1A_1_rtTA.result','r')
and then you pass infile (which is a file object) to the open function again:
with open (infile, mode='r', buffering=-1)
open is of course expecting its first argument to be a file name, not an opened file!
Open the file once only and you should be fine.
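A small reproduction with a hypothetical scratch file shows the difference. On Python 3 the wording of the TypeError differs from the Python 2 message in the question, but the cause is the same:

```python
import os
import tempfile

# Hypothetical scratch file standing in for the input file.
path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "w") as handle:
    handle.write("line one\n")

already_open = open(path, "r")
try:
    open(already_open, "r")  # wrong: a file object is not a file name
    second_open_failed = False
except TypeError:
    second_open_failed = True
finally:
    already_open.close()
print(second_open_failed)

# Right: pass the file name and open it exactly once.
with open(path, "r") as in_f:
    first_line = in_f.readline()
print(first_line)
```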
A more general note, beyond the code in the question, since this page is one of the first Google results for this generic error message: the error also occurs when certain os functions are called with None as the argument.
For example:
os.path.exists(arg)
os.stat(arg)
Will raise this exception when arg is None.
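If None can reach such a call, guarding against it first avoids the exception. A minimal sketch, where safe_exists is a hypothetical helper; the TypeError from os.stat(None) is also raised on Python 3, though with a different message:

```python
import os

def safe_exists(path):
    """Hypothetical guard: treat a missing (None) path as 'does not exist'."""
    if path is None:
        return False
    return os.path.exists(path)

try:
    os.stat(None)
    stat_raised = False
except TypeError:
    stat_raised = True

print(stat_raised)
print(safe_exists(None))
```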
You're trying to pass file objects as filenames. Try using
infile = '110331_HS1A_1_rtTA.result'
outfile = '2.txt'
at the top of your code.
(Not only does the doubled usage of open() cause that problem with trying to open the file again, it also means that infile and outfile are never closed during the course of execution, though they'll probably get closed once the program ends.)
Here is the best way I found for Python 2:
def inplace_change(file,old,new):
    fin = open(file, "rt")
    data = fin.read()
    data = data.replace(old, new)
    fin.close()
    fin = open(file, "wt")
    fin.write(data)
    fin.close()
An example:
inplace_change('/var/www/html/info.txt','youtub','youtube')
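The same function can be written with with statements so the file is closed even if reading or writing fails. This is a variant sketch rather than a replacement for the answer's code; the scratch file and its contents are made up for the demonstration:

```python
import os
import tempfile

def inplace_change(path, old, new):
    """Replace every occurrence of old with new in the file at path."""
    with open(path, "rt") as handle:
        data = handle.read()
    with open(path, "wt") as handle:
        handle.write(data.replace(old, new))

# Usage on a hypothetical scratch file.
scratch = os.path.join(tempfile.mkdtemp(), "info.txt")
with open(scratch, "wt") as handle:
    handle.write("watch youtub videos\n")

inplace_change(scratch, "youtub", "youtube")

with open(scratch, "rt") as handle:
    updated = handle.read()
print(updated)
```

Like the original, this reads the whole file into memory, so it suits small text files. Note that str.replace also matches inside longer words, so a file that already contains "youtube" would end up with "youtubee".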