Getting an error: Line Contains Null, Not sure the cause [duplicate] - python

This question already has answers here:
"Line contains NULL byte" in CSV reader (Python)
(14 answers)
Closed 3 years ago.
I am getting and error: line contains NUL. I think it means there's a strange character in my CSV file. But this program and import file worked on a different machine (both Macs), so I don't know if the cause is a different version of Python or how I am running it. From reading the other entries, I am thinking this line may also be the cause:
reader = csv.reader(open(filePath, 'r', encoding="utf-8-sig", errors="ignore"))
Appreciate any help / advice!
paths CWD=/Users/sternit/Downloads/Ten-code-4, CPD=/Users/sternit/Downloads/Ten-code/
Traceback (most recent call last):
File "/Users/sternit/Downloads/Ten-code-4/Master.py", line 145, in <module>
main()
File "/Users/sternit/Downloads/Ten-code-4/Master.py", line 114, in main
playerLists = loadFiles(CPD + "PlayerFiles/")
File "/Users/sternit/Downloads/Ten-code-4/Master.py", line 50, in loadFiles
for n, row in enumerate(reader):
_csv.Error: line contains NUL

this should work fine:
data_initial = open(filePath, "rb")
data = csv.reader((line.replace('\0','') for line in data_initial), delimiter=",")

If the csv module says you have a "NULL" (silly message, should be "NUL") byte in your reading file, I would suggest checking out what is in your file.
Try use rb, it might make problem go away:
reader = csv.reader(open(filePath, 'rb', encoding="utf-8-sig", errors="ignore"))
Depends on how the file generated, there might include NULL byte, so you might need to
Open it in an editor, to see whether it is a reasonable CSV file, if the file too big, use nano or head in CLI.
Using another library like pandas, which could be more robust.
If the problem persists, you can replace all the '\x00', with empty string:
fi = open(filePath, 'rb')
data = fi.read()
fi.close()
fo = open('mynew.csv', 'wb')
fo.write(data.replace('\x00', ''))
fo.close()

Related

CSV Should Return Strings, Not Bytes Error

I am trying to read CSV files from a directory that is not in the same directory as my Python script.
Additionally the CSV files are stored in ZIP folders that have the exact same names (the only difference being one ends with .zip and the other is a .csv).
Currently I am using Python's zipfile and csv libraries to open and get the data from the files, however I am getting the error:
Traceback (most recent call last): File "write_pricing_data.py", line 13, in <module>
for row in reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
My code:
import os, csv
from zipfile import *
folder = r'D:/MarketData/forex'
localFiles = os.listdir(folder)
for file in localFiles:
zipArchive = ZipFile(folder + '/' + file)
with zipArchive.open(file[:-4] + '.csv') as csvFile:
reader = csv.reader(csvFile, delimiter=',')
for row in reader:
print(row[0])
How can I resolve this error?
It's a bit of a kludge and I'm sure there's a better way (that just happens to elude me right now). If you don't have embedded new lines, then you can use:
import zipfile, csv
zf = zipfile.ZipFile('testing.csv.zip')
with zf.open('testing.csv', 'r') as fin:
# Create a generator of decoded lines for input to csv.reader
# (the csv module is only really happy with ASCII input anyway...)
lines = (line.decode('ascii') for line in fin)
for row in csv.reader(lines):
print(row)

PyPDF2 IOError: [Errno 22] Invalid argument on PyPdfFileReader Python 2.7

Goal = Open file, encrypt file, write encrypted file.
Trying to use the PyPDF2 module to accomplish this. I have verified theat "input" is a file type object. I have researched this error and it translates to "file not found". I believe that it is linked somehow to the file/file path but am unsure how to debug or troubleshoot. and getting the following error:
Traceback (most recent call last):
File "CommissionSecurity.py", line 52, in <module>
inputStream = PyPDF2.PdfFileReader(input)
File "build\bdist.win-amd64\egg\PyPDF2\pdf.py", line 1065, in __init__
File "build\bdist.win-amd64\egg\PyPDF2\pdf.py", line 1660, in read
IOError: [Errno 22] Invalid argument
Below is the relevant code. I'm not sure how to correct this issue because I'm not really sure what the issue is. Any guidance is appreciated.
for ID in FileDict:
if ID in EmailDict :
path = "C:\\Apps\\CorVu\\DATA\\Reports\\AlliD\\Monthly Commission Reports\\Output\\pdcom1\\"
#print os.listdir(path)
file = os.path.join(path + FileDict[ID])
with open(file, 'rb') as input:
print type(input)
inputStream = PyPDF2.PdfFileReader(input)
output = PyPDF2.PdfFileWriter()
output = inputStream.encrypt(EmailDict[ID][1])
with open(file, 'wb') as outputStream:
output.write(outputStream)
else : continue
I think your problem might be caused by the fact that you use the same filename to both open and write to the file, opening it twice:
with open(file, 'rb') as input :
with open(file, 'wb') as outputStream :
The w mode will truncate the file, thus the second line truncates the input.
I'm not sure what you're intention is, because you can't really try to read from the (beginning) of the file, and at the same time overwrite it. Even if you try to write to the end of the file, you'll have to position the file pointer somewhere.
So create an extra output file that has a different name; you can always rename that output file to your input file after both files are closed, thus overwriting your input file.
Or you could first read the complete file into memory, then write to it:
with open(file, 'rb') as input:
inputStream = PyPDF2.PdfFileReader(input)
output = PyPDF2.PdfFileWriter()
output = input.encrypt(EmailDict[ID][1])
with open(file, 'wb') as outputStream:
output.write(outputStream)
Notes:
you assign inputStream, but never use it
you assign PdfFileWriter() to output, and then assign something else to output in the next line. Hence, you never used the result from the first output = line.
Please check carefully what you're doing, because it feels there are numerous other problems with your code.
Alternatively, here are some other tips that may help:
The documentation suggests that you can also use the filename as first argument to PdfFileReader:
stream – A File object or an object that supports the standard read
and seek methods similar to a File object. Could also be a string
representing a path to a PDF file.
So try:
inputStream = PyPDF2.PdfFileReader(file)
You can also try to set the strict argument to False:
strict (bool) – Determines whether user should be warned of all
problems and also causes some correctable problems to be fatal.
Defaults to True.
For example:
inputStream = PyPDF2.PdfFileReader(file, strict=False)
Using open(file, 'rb') was causing the issue becuase PdfFileReader() does that automagically. I just removed the with statement and that corrected the problem.
with open(file, 'rb') as input:
inputStream = PyPDF2.PdfFileReader(input)
This error raised up because of PDF file is empty.
My PDF file was empty that's why my error was raised up. So First of all i fill my PDF file with some data and Then start reeading it using PyPDF2.PdfFileReader,
And it solved my Problem!!!
Late but, you may be opening an invalid PDF file or an empty file that's named x.pdf and you think it's a PDF file

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

I am getting an interesting error while trying to use Unpickler.load(), here is the source code:
open(target, 'a').close()
scores = {};
with open(target, "rb") as file:
unpickler = pickle.Unpickler(file);
scores = unpickler.load();
if not isinstance(scores, dict):
scores = {};
Here is the traceback:
Traceback (most recent call last):
File "G:\python\pendu\user_test.py", line 3, in <module>:
save_user_points("Magix", 30);
File "G:\python\pendu\user.py", line 22, in save_user_points:
scores = unpickler.load();
EOFError: Ran out of input
The file I am trying to read is empty.
How can I avoid getting this error, and get an empty variable instead?
Most of the answers here have dealt with how to mange EOFError exceptions, which is really handy if you're unsure about whether the pickled object is empty or not.
However, if you're surprised that the pickle file is empty, it could be because you opened the filename through 'wb' or some other mode that could have over-written the file.
for example:
filename = 'cd.pkl'
with open(filename, 'wb') as f:
classification_dict = pickle.load(f)
This will over-write the pickled file. You might have done this by mistake before using:
...
open(filename, 'rb') as f:
And then got the EOFError because the previous block of code over-wrote the cd.pkl file.
When working in Jupyter, or in the console (Spyder) I usually write a wrapper over the reading/writing code, and call the wrapper subsequently. This avoids common read-write mistakes, and saves a bit of time if you're going to be reading the same file multiple times through your travails
I would check that the file is not empty first:
import os
scores = {} # scores is an empty dict already
if os.path.getsize(target) > 0:
with open(target, "rb") as f:
unpickler = pickle.Unpickler(f)
# if file is not empty scores will be equal
# to the value unpickled
scores = unpickler.load()
Also open(target, 'a').close() is doing nothing in your code and you don't need to use ;.
It is very likely that the pickled file is empty.
It is surprisingly easy to overwrite a pickle file if you're copying and pasting code.
For example the following writes a pickle file:
pickle.dump(df,open('df.p','wb'))
And if you copied this code to reopen it, but forgot to change 'wb' to 'rb' then you would overwrite the file:
df=pickle.load(open('df.p','wb'))
The correct syntax is
df=pickle.load(open('df.p','rb'))
As you see, that's actually a natural error ..
A typical construct for reading from an Unpickler object would be like this ..
try:
data = unpickler.load()
except EOFError:
data = list() # or whatever you want
EOFError is simply raised, because it was reading an empty file, it just meant End of File ..
You can catch that exception and return whatever you want from there.
open(target, 'a').close()
scores = {};
try:
with open(target, "rb") as file:
unpickler = pickle.Unpickler(file);
scores = unpickler.load();
if not isinstance(scores, dict):
scores = {};
except EOFError:
return {}
if path.exists(Score_file):
try :
with open(Score_file , "rb") as prev_Scr:
return Unpickler(prev_Scr).load()
except EOFError :
return dict()
Had the same issue. It turns out when I was writing to my pickle file I had not used the file.close(). Inserted that line in and the error was no more.
I have encountered this error many times and it always occurs because after writing into the file, I didn't close it. If we don't close the file the content stays in the buffer and the file stays empty.
To save the content into the file, either file should be closed or file_object should go out of scope.
That's why at the time of loading it's giving the ran out of input error because the file is empty. So you have two options :
file_object.close()
file_object.flush(): if you don't wanna close your file in between the program, you can use the flush() function as it will forcefully move the content from the buffer to the file.
This error comes when your pickle file is empty (0 Bytes). You need to check the size of your pickle file first. This was the scenario in my case. Hope this helps!
Note that the mode of opening files is 'a' or some other have alphabet 'a' will also make error because of the overwritting.
pointer = open('makeaafile.txt', 'ab+')
tes = pickle.load(pointer, encoding='utf-8')
temp_model = os.path.join(models_dir, train_type + '_' + part + '_' + str(pc))
# print(type(temp_model)) # <class 'str'>
filehandler = open(temp_model, "rb")
# print(type(filehandler)) # <class '_io.BufferedReader'>
try:
pdm_temp = pickle.load(filehandler)
except UnicodeDecodeError:
pdm_temp = pickle.load(filehandler, fix_imports=True, encoding="latin1")
from os.path import getsize as size
from pickle import *
if size(target)>0:
with open(target,'rb') as f:
scores={i:j for i,j in enumerate(load(f))}
else: scores={}
#line 1.
we importing Function 'getsize' from Library 'OS' sublibrary 'path' and we rename it with command 'as' for shorter style of writing. Important is hier that we loading only one single Func that we need and not whole Library!
line 2.
Same Idea, but when we dont know wich modul we will use in code at the begining, we can import all library using a command '*'.
line 3.
Conditional Statement... if size of your file >0 ( means obj is not an empty). 'target' is variable that schould be a bit earlier predefined.
just an Example : target=(r'd:\dir1\dir.2..\YourDataFile.bin')
Line 4.
'With open(target) as file:' an open construction for any file, u dont need then to use file.close(). it helps to avoid some typical Errors such as "Run out of input" or Permissions rights.
'rb' mod means 'rea binary' that u can only read(load) the data from your binary file but u cant modify/rewrite it.
Line5.
List comprehension method in applying to a Dictionary..
line 6. Case your datafile is empty, it will not raise an any Error msg, but return just an empty dictionary.

Confusing Error when Reading from a File in Python

I'm having a problem opening the names.txt file. I have checked that I am in the correct directory. Below is my code:
import os
print(os.getcwd())
def alpha_sort():
infile = open('names', 'r')
string = infile.read()
string = string.replace('"','')
name_list = string.split(',')
name_list.sort()
infile.close()
return 0
alpha_sort()
And the error I got:
FileNotFoundError: [Errno 2] No such file or directory: 'names'
Any ideas on what I'm doing wrong?
You mention in your question body that the file is "names.txt", however your code shows you trying to open a file called "names" (without the ".txt" extension). (Extensions are part of filenames.)
Try this instead:
infile = open('names.txt', 'r')
As a side note, make sure that when you open files you use universal mode, as windows and mac/unix have different representations of carriage returns (/r/n vs /n etc.). Universal mode gets python to handle this, so it's generally a good idea to use it whenever you need to read a file. (EDIT - should read: a text file, thanks cameron)
So the code would just look like this
infile = open( 'names.txt', 'rU' ) #capital U indicated to open the file in universal mode
This doesn't solve that issue, but you might consider using with when opening files:
with open('names', 'r') as infile:
string = infile.read()
string = string.replace('"','')
name_list = string.split(',')
name_list.sort()
return 0
This closes the file for you and handles exceptions as well.

Python newbie: trying to create a script that opens a file and replaces words

im trying to create a script that opens a file and replace every 'hola' with 'hello'.
f=open("kk.txt","w")
for line in f:
if "hola" in line:
line=line.replace('hola','hello')
f.close()
But im getting this error:
Traceback (most recent call last):
File "prueba.py", line 3, in
for line in f: IOError: [Errno 9] Bad file descriptor
Any idea?
Javi
open('test.txt', 'w').write(open('test.txt', 'r').read().replace('hola', 'hello'))
Or if you want to properly close the file:
with open('test.txt', 'r') as src:
src_text = src.read()
with open('test.txt', 'w') as dst:
dst.write(src_text.replace('hola', 'hello'))
Your main issue is that you're opening the file for writing first. When you open a file for writing, the contents of the file are deleted, which makes it quite difficult to do replacements! If you want to replace words in the file, you have a three-step process:
Read the file into a string
Make replacements in that string
Write that string to the file
In code:
# open for reading first since we need to get the text out
f = open('kk.txt','r')
# step 1
data = f.read()
# step 2
data = data.replace("hola", "hello")
f.close()
# *now* open for writing
f = open('kk.txt', 'w')
# step 3
f.write(data)
f.close()
You've opened the file for writing, but you're reading from it. Open the original file for reading and a new file for writing. After the replacement, rename the original out and the new one in.
You could also have a look at the with statement.

Categories

Resources