Python help reading csv file failing due to line-endings


I'm trying to create a script that checks the computer's host name, searches a master list for it, and returns the corresponding value from the csv file. It should then open another file and do a find and replace. I know this should be easy, but I haven't done much in Python before. Here is what I have so far...
masterlist.txt (tab delimited)
Name UID
Bob-Smith.local bobs
Carmen-Jackson.local carmenj
David-Kathman.local davidk
Jenn-Roberts.local jennr
Here is the script that I have created thus far
#GET CLIENT HOST NAME
import socket
host = socket.gethostname()
print host
#IMPORT MASTER DATA
import csv, sys
filename = "masterlist.txt"
reader = csv.reader(open(filename, "rU"))
#PRINT MASTER DATA
for row in reader:
    print row
#SEARCH ON HOSTNAME AND RETURN UID
#REPLACE VALUE IN FILE WITH UID
#import fileinput
#for line in fileinput.FileInput("filetoreplace",inplace=1):
# line = line.replace("replacethistext","UID")
# print line
Right now, it's just set to print the master list. I'm not sure if the list needs to be parsed and placed into a dictionary or what. I really need to figure out how to search the first field for the hostname and then return the field in the second column.
Thanks in advance for your help,
Aaron
UPDATE: I removed line 194 and last line from masterlist.txt and then re-ran the script. The results were the following:
Traceback (most recent call last):
  File "update.py", line 3, in <module>
    for row in csv.DictReader(open(fname), delimiter='\t'):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/csv.py", line 103, in next
    self.fieldnames
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/csv.py", line 90, in fieldnames
    self._fieldnames = self.reader.next()
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
The current script being used is...
import csv
fname = "masterlist.txt"
for row in csv.DictReader(open(fname), delimiter='\t'):
    print(row)

The two occurrences of '\xD5' in line 194 and the last line have nothing to do with the problem.
The problem appears to be a bug, or a misleading error message, or incorrect/vague documentation, in the Python 2.6 csv module.
In the file, the lines are terminated by '\x0D' aka '\r' in the Classic Mac tradition. The last line is not terminated, but that has nothing to do with the problem.
The docs for csv.reader say "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference." It is widely known that it does make a difference on Windows. However opening the file with 'rb' or 'r' makes no difference in this case -- still the same error message.
The docs for csv.Dialect.lineterminator say "The string used to terminate lines produced by the writer. It defaults to '\r\n'. Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future." It appears to be recognising '\r' as new-line but not as end-of-line/end-of-field.
The error message "_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?" is confusing; it has recognised '\r' as a new-line, but it is not treating that new-line as an end-of-line (and thus implicitly an end-of-field).
It appears necessary to open the file in 'rU' mode to get it to "work". It's not apparent why the same '\r' recognised in universal-newline mode is any better.
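For readers on Python 3, where the 'U' mode flag is gone (it was removed in Python 3.11), the csv documentation recommends opening the file with newline='' instead. Here is a minimal sketch, assuming a tab-delimited file with Classic Mac '\r' line endings. The helper name read_rows and the sample data are illustrative only and not part of the original script.

```python
import csv
import os
import tempfile


def read_rows(path, delimiter="\t"):
    """Read all rows from a delimited text file, whatever its line endings."""
    # newline='' passes line endings to the csv module untranslated, while the
    # file object still splits lines on '\r', '\n' and '\r\n'.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter=delimiter))


# Build a small sample file with bare '\r' line endings (hypothetical data).
sample = b"Name\tUID\rBob-Smith.local\tbobs\rCarmen-Jackson.local\tcarmenj\r"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)

rows = read_rows(tmp.name)
os.remove(tmp.name)
print(rows)
```

The same call works unchanged for files that use '\n' or '\r\n', which is why it is a reasonable default when the origin of the file is unknown.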

To iterate over a reader you'd do:
>>> import csv
>>> for row in csv.DictReader(open(fname), delimiter='\t'):
...     print(row)
...
{'Name': 'Bob-Smith.local', 'UID': 'bobs'}
{'Name': 'Carmen-Jackson.local', 'UID': 'carmenj'}
{'Name': 'David-Kathman.local', 'UID': 'davidk'}
{'Name': 'Jenn-Roberts.local', 'UID': 'jennr'}
But since you want to associate Name with UID:
>>> reader = csv.reader(open("masterlist.txt"), delimiter='\t')
>>> _ = next(reader) # just discarding header
>>> d = dict(reader)
>>> d['Carmen-Jackson.local']
'carmenj'

I would populate a dictionary like this:
>>> import csv
>>> name_to_UID = {}
>>> for row in csv.DictReader(open(filename, 'rU'), delimiter='\t'):
...     name_to_UID[row['Name']] = row['UID']
...
>>> name_to_UID['Carmen-Jackson.local']
'carmenj'
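To tie this back to the original goal (look up the local host name and return its UID), the dictionary can be combined with socket.gethostname(). This is a sketch in Python 3 only: load_uid_map is a hypothetical helper, and an in-memory StringIO stands in for the real masterlist.txt so that the example is self-contained.

```python
import csv
import io
import socket


def load_uid_map(lines, delimiter="\t"):
    """Build a {hostname: uid} dict from rows with 'Name' and 'UID' columns."""
    return {row["Name"]: row["UID"]
            for row in csv.DictReader(lines, delimiter=delimiter)}


# Stand-in for open("masterlist.txt", newline="") (sample data from the question).
master = io.StringIO(
    "Name\tUID\nBob-Smith.local\tbobs\nCarmen-Jackson.local\tcarmenj\n"
)
name_to_uid = load_uid_map(master)

host = socket.gethostname()
# .get() returns None instead of raising KeyError when the host is not listed.
uid = name_to_uid.get(host)
print(host, uid)
```

Using .get() rather than indexing is a design choice: a machine missing from the master list then yields None, which the caller can check before attempting the find and replace.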

Related

Appending encrypted data to file

I am using the cryptography library for python. My goal is to take a string, encrypt it and then write it to a file.
This may be done multiple times, with each appending to the end of the file additional data; which is also encrypted.
I have tried a few solutions, such as:
Using the hazmat level API to avoid as much meta data stored in the encrypted text.
Writing each encrypted string to a new line in a text file.
This is the code that uses ECB mode and the hazmat API. It attempts to read the file and decrypt line by line. I understand it is unsafe, my main use is to log this data only locally to a file and then use a safe PKCS over the wire.
from cryptography import fernet
key = 'WqSAOfEoOdSP0c6i1CiyoOpTH2Gma3ff_G3BpDx52sE='
crypt_obj = fernet.Fernet(key)
file_handle = open('test.txt', 'a')
data = 'Hello1'
data = crypt_obj.encrypt(data.encode())
file_handle.write(data.decode() + '\n')
file_handle.close()
file_handle_two = open('test.txt', 'a')
data_two = 'Hello2'
data_two = crypt_obj.encrypt(data_two.encode())
file_handle_two.write(data_two.decode() + '\n')
file_handle_two.close()
file_read = open('test.txt', 'r')
file_lines = file_read.readlines()
file_content = ''
for line in file_lines:
    line = line[:-2]
    file_content = crypt_obj.decrypt(line.encode()).decode()
    print(file_content)
file_read.close()
For the code above I get the following error:
Traceback (most recent call last):
  File "C:\Dev\Python\local_crypt_test\venv\lib\site-packages\cryptography\fernet.py", line 110, in _get_unverified_token_data
    data = base64.urlsafe_b64decode(token)
  File "C:\Users\19097\AppData\Local\Programs\Python\Python39\lib\base64.py", line 133, in urlsafe_b64decode
    return b64decode(s)
  File "C:\Users\19097\AppData\Local\Programs\Python\Python39\lib\base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Dev\Python\local_crypt_test\main.py", line 25, in <module>
    file_content = crypt_obj.decrypt(line.encode()).decode()
  File "C:\Dev\Python\local_crypt_test\venv\lib\site-packages\cryptography\fernet.py", line 83, in decrypt
    timestamp, data = Fernet._get_unverified_token_data(token)
  File "C:\Dev\Python\local_crypt_test\venv\lib\site-packages\cryptography\fernet.py", line 112, in _get_unverified_token_data
    raise InvalidToken
cryptography.fernet.InvalidToken
Process finished with exit code 1
These examples are only to demonstrate the issue, my real code looks much different so you may ignore errors in the example that do not pertain to my main issue. That is, appending encrypted data to a file and decrypting/reading that data from the file at a later time. The file does not need to be in any specific format, as long as it can be read from and decrypted to obtain the original message. Also, the mode of operation is not tied to ECB, if your example uses another type, that works too.
I am honestly stumped and would appreciate any help!

There are a couple details at play here...
1. Trailing newline character(s) are included in each line
When you loop through file_lines, each line includes the trailing newline character(s).
I say "character(s)" because this can vary based on the platform (e.g. Linux/macOS = '\n' versus Windows = '\r\n').
2. base64 decoding silently discards invalid characters
Fernet.encrypt(data) returns a bytes instance containing a base64 encoded "Fernet token".
Conversely, the first step Fernet.decrypt(token) takes is decoding the token by calling base64.urlsafe_b64decode(). This function uses the default non-validating behavior in which characters not within the base64 set are discarded (described here).
Note: This is why the answer from TheTS happens to work despite leaving the extraneous newline character intact.
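Both effects can be reproduced with the standard library alone. The sketch below uses a short base64 string as a stand-in for a Fernet token (an assumption made to keep the example dependency-free). It shows that a trailing newline is ignored on decoding, while cutting two characters also removes one '=' of padding and triggers the "Incorrect padding" error from the traceback.

```python
import base64
import binascii

token = base64.urlsafe_b64encode(b"hello")  # b'aGVsbG8='

# A trailing newline is outside the base64 alphabet and is silently dropped.
with_newline = base64.urlsafe_b64decode(token + b"\n")

# Cutting two characters removes the newline *and* the '=' padding.
truncated = (token + b"\n")[:-2]  # b'aGVsbG8'
try:
    base64.urlsafe_b64decode(truncated)
    padding_error = False
except binascii.Error:
    padding_error = True

print(with_newline, truncated, padding_error)
```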
Solution
I'd recommend making sure you provide Fernet.decrypt() the token exactly as produced by Fernet.encrypt(). I'm guessing this is what you were trying to do by stripping the last two characters.
Here's an approach that should be safe and not platform dependent.
When you call open() for writing, provide the newline='\n' argument to prevent the default behavior of converting instances of '\n' to the platform dependent os.linesep value (in the section describing the newline argument, see the second bullet point detailing how the argument applies when writing files).
When processing each line, use rstrip('\n') to remove the expected trailing newline.
Here's a code example that demonstrates this:
#!/usr/bin/python3
from cryptography import fernet
to_encrypt = ['Hello1', 'Hello2']
output_file = 'test.txt'
key = 'WqSAOfEoOdSP0c6i1CiyoOpTH2Gma3ff_G3BpDx52sE='
crypt = fernet.Fernet(key)
print("ENCRYPTING...")
for data in to_encrypt:
    data_bytes = data.encode('utf-8')
    token_bytes = crypt.encrypt(data_bytes)
    print(f'data: {data}')
    print(f'token_bytes: {token_bytes}\n')
    with open(output_file, 'a', newline='\n') as f:
        f.write(token_bytes.decode('utf-8') + '\n')
print("\nDECRYPTING...")
with open(output_file, 'r') as f:
    for line in f:
        # Create a copy of line which shows the trailing newline.
        line_escaped = line.encode('unicode_escape').decode('utf-8')
        line_stripped = line.rstrip('\n')
        token_bytes = line_stripped.encode('utf-8')
        data = crypt.decrypt(token_bytes).decode('utf-8')
        print(f'line_escaped: {line_escaped}')
        print(f'token_bytes: {token_bytes}')
        print(f'decrypted data: {data}\n')
Output:
Note the trailing newline when line_escaped is printed.
$ python3 solution.py
ENCRYPTING...
data: Hello1
token_bytes: b'gAAAAABi-LAo-h8w-ayc267hrLbswMZtkT4RQQ9wt0EusYNrZGjuzbpyRLoKDZZF4oQPOU-iH1PnCc7vSIOoTVMLlCFnHTkN6A=='
data: Hello2
token_bytes: b'gAAAAABi-LAoHUT8Iu1bVMcGSIrFRvtVZQFh4O52XYSCgd0leYWS-n38irhv3Ch7oEx6SXazHwAL7a57ncFoMJTQQAms52yf3w=='
DECRYPTING...
line_escaped: gAAAAABi-LAo-h8w-ayc267hrLbswMZtkT4RQQ9wt0EusYNrZGjuzbpyRLoKDZZF4oQPOU-iH1PnCc7vSIOoTVMLlCFnHTkN6A==\n
token_bytes: b'gAAAAABi-LAo-h8w-ayc267hrLbswMZtkT4RQQ9wt0EusYNrZGjuzbpyRLoKDZZF4oQPOU-iH1PnCc7vSIOoTVMLlCFnHTkN6A=='
decrypted data: Hello1
line_escaped: gAAAAABi-LAoHUT8Iu1bVMcGSIrFRvtVZQFh4O52XYSCgd0leYWS-n38irhv3Ch7oEx6SXazHwAL7a57ncFoMJTQQAms52yf3w==\n
token_bytes: b'gAAAAABi-LAoHUT8Iu1bVMcGSIrFRvtVZQFh4O52XYSCgd0leYWS-n38irhv3Ch7oEx6SXazHwAL7a57ncFoMJTQQAms52yf3w=='
decrypted data: Hello2

from cryptography import fernet
key = 'WqSAOfEoOdSP0c6i1CiyoOpTH2Gma3ff_G3BpDx52sE='
crypt_obj = fernet.Fernet(key)
file_handle = open('test.txt', 'a')
data = 'Hello1'
data = crypt_obj.encrypt(data.encode('utf-8'))
file_handle.write(data.decode('utf-8') + '\n')
file_handle.close()
file_handle_two = open('test.txt', 'a')
data_two = 'Hello2'
data_two = crypt_obj.encrypt(data_two.encode('utf-8'))
file_handle_two.write(data_two.decode('utf-8') + '\n')
file_handle_two.close()
file_read = open('test.txt', 'r')
file_lines = file_read.readlines()
file_content = ''
for line in file_lines:
    # line = line[:-2]
    file_content = crypt_obj.decrypt(line.encode('utf-8')).decode()
    print(file_content)
file_read.close()
By removing the last characters from the string you also remove important characters for decoding.

Extra blank line is getting printed at the end of the output in Python

I am trying to read a file from command line and trying to replace all the commas in that file with blank. Below is my code:
import sys
datafile = sys.argv[1]
with open(datafile, 'r') as data:
    plaintext = data.read()
plaintext = plaintext.replace(',', '')
print(plaintext)
But while printing the plaintext I am getting one extra blank row at the end. Why is it happening and how can I get rid of that?

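The file most likely ends with a newline, and print() adds another one. Strings are immutable, so rstrip() returns a new string that you have to use:
print(plaintext.rstrip('\n'))
This should remove the extra line.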
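An alternative that leaves the text untouched is to tell print() not to append its own newline. The sketch below captures the output in StringIO buffers purely so that the difference is visible. The sample text is made up and stands in for the file contents.

```python
import io

# Text as it would come back from data.read() for a file ending in '\n'.
plaintext = "a,b,c\n1,2,3\n".replace(",", "")

default_out = io.StringIO()
print(plaintext, file=default_out)          # print() appends its own '\n'

trimmed_out = io.StringIO()
print(plaintext, end="", file=trimmed_out)  # suppress the extra newline

print(repr(default_out.getvalue()))
print(repr(trimmed_out.getvalue()))
```

In the original script this amounts to writing print(plaintext, end='').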

Getting an error: Line Contains Null, Not sure the cause [duplicate]

This question already has answers here:
"Line contains NULL byte" in CSV reader (Python)
(14 answers)
Closed 3 years ago.
I am getting an error: line contains NUL. I think it means there's a strange character in my CSV file. But this program and import file worked on a different machine (both Macs), so I don't know if the cause is a different version of Python or how I am running it. From reading the other entries, I am thinking this line may also be the cause:
reader = csv.reader(open(filePath, 'r', encoding="utf-8-sig", errors="ignore"))
Appreciate any help / advice!
paths CWD=/Users/sternit/Downloads/Ten-code-4, CPD=/Users/sternit/Downloads/Ten-code/
Traceback (most recent call last):
  File "/Users/sternit/Downloads/Ten-code-4/Master.py", line 145, in <module>
    main()
  File "/Users/sternit/Downloads/Ten-code-4/Master.py", line 114, in main
    playerLists = loadFiles(CPD + "PlayerFiles/")
  File "/Users/sternit/Downloads/Ten-code-4/Master.py", line 50, in loadFiles
    for n, row in enumerate(reader):
_csv.Error: line contains NUL

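This should work (in Python 3 the csv module needs text lines, so the file is opened in text mode rather than "rb"):
data_initial = open(filePath, "r", encoding="utf-8-sig", errors="ignore", newline="")
data = csv.reader((line.replace('\0', '') for line in data_initial), delimiter=",")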

If the csv module says you have a "NULL" (silly message, it should say "NUL") byte in the file you are reading, I would suggest checking what is actually in the file.
In Python 3 the csv module expects text, and open() rejects an encoding argument in binary mode, so keep the file in text mode and pass newline='':
reader = csv.reader(open(filePath, 'r', encoding="utf-8-sig", errors="ignore", newline=""))
Depending on how the file was generated, it may really contain NUL bytes, so you might need to:
Open it in an editor to see whether it is a reasonable CSV file; if the file is too big, use nano or head on the command line.
Use another library such as pandas, which could be more robust.
If the problem persists, you can replace every '\x00' byte with an empty string:
fi = open(filePath, 'rb')
data = fi.read()
fi.close()
fo = open('mynew.csv', 'wb')
fo.write(data.replace(b'\x00', b''))
fo.close()
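If rewriting the file on disk is not desirable, the NUL characters can also be filtered out while reading. The following is a sketch in Python 3. strip_nul is a hypothetical helper, and the StringIO object stands in for the real file opened in text mode.

```python
import csv
import io


def strip_nul(lines):
    """Yield each line with NUL characters removed."""
    for line in lines:
        yield line.replace("\0", "")


# Stand-in for open(filePath, encoding="utf-8-sig", errors="ignore", newline="").
source = io.StringIO("name,score\0\nalice,\x0010\n", newline="")
rows = list(csv.reader(strip_nul(source)))
print(rows)
```

NUL bytes often appear when a UTF-16 file is read as if it were an 8-bit encoding. If that turns out to be the case, opening the file with the correct encoding is likely a better fix than stripping the bytes.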

CSV new-line character seen in unquoted field error

The following code worked until today, when I imported a file from a Windows machine and got this error:
new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
import csv
class CSV:
    def __init__(self, file=None):
        self.file = file
    def read_file(self):
        data = []
        file_read = csv.reader(self.file)
        for row in file_read:
            data.append(row)
        return data
    def get_row_count(self):
        return len(self.read_file())
    def get_column_count(self):
        new_data = self.read_file()
        return len(new_data[0])
    def get_data(self, rows=1):
        data = self.read_file()
        return data[:rows]
How can I fix this issue?
def upload_configurator(request, id=None):
    """
    A view that allows the user to configurator the uploaded CSV.
    """
    upload = Upload.objects.get(id=id)
    csvobject = CSV(upload.filepath)
    upload.num_records = csvobject.get_row_count()
    upload.num_columns = csvobject.get_column_count()
    upload.save()
    form = ConfiguratorForm()
    row_count = csvobject.get_row_count()
    colum_count = csvobject.get_column_count()
    first_row = csvobject.get_data(rows=1)
    first_two_rows = csvobject.get_data(rows=5)

It would be good to see the csv file itself, but this might work for you. Replace:
file_read = csv.reader(self.file)
with:
file_read = csv.reader(self.file, dialect=csv.excel_tab)
Or, open a file with universal newline mode and pass it to csv.reader, like:
reader = csv.reader(open(self.file, 'rU'), dialect=csv.excel_tab)
Or, use splitlines(), like this:
def read_file(self):
    with open(self.file, 'r') as f:
        data = [row for row in csv.reader(f.read().splitlines())]
    return data

I realize this is an old post, but I ran into the same problem and don't see the correct answer, so I will give it a try.
Python Error:
_csv.Error: new-line character seen in unquoted field
This is caused by trying to read Macintosh (pre OS X formatted) CSV files. These are text files that use CR for end of line. If using MS Office, make sure you select either plain CSV format or CSV (MS-DOS). Do not use CSV (Macintosh) as the save-as type.
My preferred EOL version would be LF (Unix/Linux/Apple), but I don't think MS Office provides the option to save in this format.

For Mac OS X, save your CSV file in "Windows Comma Separated (.csv)" format.

If this happens to you on mac (as it did to me):
Save the file as CSV (MS-DOS Comma-Separated)
Run the following script
with open(csv_filename, 'rU') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print ', '.join(row)

Try to run dos2unix on your windows imported files first

This is an error that I faced. I had saved .csv file in MAC OSX.
While saving, save it as "Windows Comma Separated Values (.csv)" which resolved the issue.

This worked for me on OSX.
import csv
# allow a string to be read as if it were a file
from io import StringIO
# library to map other strange (accented) characters to plain ASCII
from unidecode import unidecode
# cleanse input file with Windows formatting to a plain string
with open(filename, 'rb') as fID:
    uncleansedBytes = fID.read()
# decode the file using the correct encoding scheme
# (probably this old windows one)
uncleansedText = uncleansedBytes.decode('Windows-1252')
# replace carriage-returns with new-lines
cleansedText = uncleansedText.replace('\r', '\n')
# map any other non-ASCII characters to their closest ASCII equivalents
# (asciiText is not used below; pass it to StringIO if ASCII-only text is wanted)
asciiText = unidecode(cleansedText)
# read each line of the csv file and store as an array of dicts,
# use first line as field names for each dict.
reader = csv.DictReader(StringIO(cleansedText))
for line_entry in reader:
    # do something with your read data
    pass

I know this was answered quite some time ago, but the answers did not solve my problem. I am using DictReader and StringIO for my csv reading due to some other complications. I was able to solve the problem more simply by replacing the line terminators explicitly:
with urllib.request.urlopen(q) as response:
    raw_data = response.read()
    encoding = response.info().get_content_charset('utf8')
    data = raw_data.decode(encoding)
    if '\r\n' not in data:
        # probably bare '\r' line endings rather than Windows ones...try to update them
        data = data.replace('\r', '\r\n')
Might not be reasonable for enormous CSV files, but worked well for my use case.
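A slightly more general variant of the same idea is to normalize every line-ending style to '\n' before handing the text to DictReader. This is a sketch under the assumption that the whole payload fits in memory. normalize_newlines is a hypothetical helper and the sample data is made up.

```python
import csv
import io


def normalize_newlines(text):
    """Convert CRLF and bare CR line endings to LF."""
    # Handle CRLF first so that it does not turn into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Mixed line endings: bare CR, CRLF and LF in one string.
data = "Name,UID\rBob-Smith.local,bobs\r\nJenn-Roberts.local,jennr\n"
records = list(csv.DictReader(io.StringIO(normalize_newlines(data))))
print(records)
```

One caveat: this also rewrites any carriage returns that sit inside quoted fields, which may or may not matter for a given file.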

Alternative and fast solution: I faced the same error. I reopened the "weird" csv file in Gnumeric on my Lubuntu machine and exported it as a csv file. This corrected the issue.

Getting "newline inside string" while reading the csv file in Python?

I have this utils.py file in Django Architecture:
def range_data(ip):
    r = []
    f = open(os.path.join(settings.PROJECT_ROOT, 'static', 'csv ',
                          'GeoIPCountryWhois.csv'))
    for num,row in enumerate(csv.reader(f)):
        if row[0] <= ip <= row[1]:
            r.append([r[4]])
            return r
        else:
            continue
    return r
Here the ip parameter is just an IPv4 address. I am using the open source MaxMind GeoIPCountryWhois.csv file.
The first few lines of GeoIPCountryWhois.csv:
"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"
"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"
"1.0.4.0","1.0.7.255","16778240","16779263","AU","Australia"
"1.0.8.0","1.0.15.255","16779264","16781311","CN","China"
"1.0.16.0","1.0.31.255","16781312","16785407","JP","Japan"
"1.0.32.0","1.0.63.255","16785408","16793599","CN","China"
"1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan"
"1.0.128.0","1.0.255.255","16809984","16842751","TH","Thailand"
I have also read about the issue, but didn't find the explanations very understandable. Would you please help me solve this error?
In my utils method, I am looking up the country name for the IP address passed as the parameter.

I had a similar problem earlier today: a closing quote was missing from a line. The solution was to instruct the reader to perform no special processing of quote characters (quoting=csv.QUOTE_NONE).
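A short sketch of what that looks like in Python 3, using made-up rows in which the second line lacks its closing quote. With QUOTE_NONE the quote characters stay in the fields, so they are stripped by hand afterwards.

```python
import csv

# The second line is missing its closing quote (hypothetical damaged input).
lines = [
    '"1.0.0.0","1.0.0.255","AU"\n',
    '"1.0.1.0","1.0.3.255","CN\n',
]

rows = [
    [field.strip('"') for field in row]
    for row in csv.reader(lines, quoting=csv.QUOTE_NONE)
]
print(rows)
```

The trade-off is that a comma inside a quoted field would now split the field, so this only suits files whose values never contain the delimiter.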

You can preprocess the csv by normalizing the newlines, like below.
import csv
content = open("GeoIPCountryWhois.csv", "r").read().replace('\r\n','\n')
with open("GeoIPCountryWhois2.csv", "w") as g:
    g.write(content)
Then use GeoIPCountryWhois2.csv for the csv reader.
A wild guess: passing a lineterminator may solve your problem (although the csv documentation says the reader ignores lineterminator, so this may have no effect):
for num,row in enumerate(csv.reader(f,lineterminator='\n')):
See also: http://docs.python.org/lib/csv-fmt-params.html

You must open your files as binary:
def range_data(ip):
    r = []
    f = open(os.path.join(settings.PROJECT_ROOT, 'static', 'csv ',
                          'GeoIPCountryWhois.csv'), 'rb')
    for num,row in enumerate(csv.reader(f)):
        # Your things.
Note the 'rb' mode there; otherwise the file could be opened with native line endings, and the CSV reader doesn't handle the various forms very well. Certainly the copy of GeoIPCountryWhois.csv that I downloaded has clean \n line endings.
This is documented for the .reader() method:
If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.
If, however, your csv file is so corrupted as to still contain unexpected newline characters in unexpected places, use this file subclass instead as a stop-gap measure:
class CleanlinesFile(file):
    def next(self):
        line = super(CleanlinesFile, self).next()
        return line.replace('\r', '').replace('\n', '') + '\n'
This class guarantees there will be no newlines anywhere in the returned results except as the very last character (just the way the csv module wants it). Use it instead of the open call; the 'rb' mode modifier becomes optional in this case:
def range_data(ip):
    r = []
    f = CleanlinesFile(os.path.join(settings.PROJECT_ROOT, 'static', 'csv ',
                                    'GeoIPCountryWhois.csv'))
    for num,row in enumerate(csv.reader(f)):
        # Your things.
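Separately from the newline error, the lookup in the question compares dotted-quad strings, which orders them lexicographically (for example, "10.0.0.1" sorts before "9.0.0.0"), and it appends r[4] where row[4] was presumably meant. Here is a sketch in Python 3 of a numeric lookup that uses the integer range columns shown in the sample data. country_for_ip is a hypothetical replacement for range_data, and the two sample rows are copied from the question.

```python
import csv
import io
import ipaddress


def country_for_ip(ip, rows):
    """Return (code, name) for the first range containing ip, else None."""
    target = int(ipaddress.IPv4Address(ip))
    for row in rows:
        # Columns 2 and 3 hold the range bounds as integers.
        if int(row[2]) <= target <= int(row[3]):
            return row[4], row[5]
    return None


# Stand-in for the real file opened with newline="".
sample = io.StringIO(
    '"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"\n'
    '"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"\n'
)
print(country_for_ip("1.0.1.5", csv.reader(sample)))
```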
