Getting "newline inside string" while reading the csv file in Python? - python

I have this utils.py file in my Django project:
def range_data(ip):
    r = []
    f = open(os.path.join(settings.PROJECT_ROOT, 'static', 'csv ',
                          'GeoIPCountryWhois.csv'))
    for num, row in enumerate(csv.reader(f)):
        if row[0] <= ip <= row[1]:
            r.append([r[4]])
            return r
        else:
            continue
    return r
Here the ip parameter is just an IPv4 address; I am using the open-source MaxMind GeoIPCountryWhois.csv file.
Some starting content of GeoIPCountryWhois.csv:
"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"
"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"
"1.0.4.0","1.0.7.255","16778240","16779263","AU","Australia"
"1.0.8.0","1.0.15.255","16779264","16781311","CN","China"
"1.0.16.0","1.0.31.255","16781312","16785407","JP","Japan"
"1.0.32.0","1.0.63.255","16785408","16793599","CN","China"
"1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan"
"1.0.128.0","1.0.255.255","16809984","16842751","TH","Thailand"
I have also read about the issue, but didn't find the explanations very understandable. Would you please help me solve this error?
In my method in utils.py, I am checking the country name for the IP address passed as a parameter.

I had a similar problem earlier today: there was an end quote missing from a line, and the solution was to instruct the reader to perform no special processing of quote characters (quoting=csv.QUOTE_NONE).
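For example, a minimal sketch of that setting applied to a file like the one in the question (the filename is the asker's):
import csv

with open('GeoIPCountryWhois.csv') as f:
    # QUOTE_NONE disables special processing of quote characters, so a
    # stray unbalanced quote can no longer pull a newline into a field
    for row in csv.reader(f, quoting=csv.QUOTE_NONE):
        print(row)
Note that with QUOTE_NONE the surrounding quotes stay in the data, so each field may need a .strip('"') afterwards.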

You can preprocess the csv by normalizing the line endings, like below.
import csv

content = open("GeoIPCountryWhois.csv", "r").read().replace('\r\n', '\n')
with open("GeoIPCountryWhois2.csv", "w") as g:
    g.write(content)
Then use GeoIPCountryWhois2.csv for the csv reader.
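For instance, a minimal usage sketch with the cleaned file, mirroring the loop from the question:
import csv

with open("GeoIPCountryWhois2.csv") as f:
    for num, row in enumerate(csv.reader(f)):
        print(num, row)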
A wild guess: using a lineterminator may solve your problem:
for num, row in enumerate(csv.reader(f, lineterminator='\n')):
See also: http://docs.python.org/lib/csv-fmt-params.html

You must open your files as binary:
def range_data(ip):
    r = []
    f = open(os.path.join(settings.PROJECT_ROOT, 'static', 'csv ',
                          'GeoIPCountryWhois.csv'), 'rb')
    for num, row in enumerate(csv.reader(f)):
        # Your things.
Note the 'rb' mode there; otherwise the file could be opened with native line endings, and the CSV reader doesn't handle the various forms very well. Certainly the copy of GeoIPCountryWhois.csv that I downloaded has clean \n line endings.
This is documented for the .reader() method:
If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.
If, however, your csv file is so corrupted as to still contain unexpected newline characters in unexpected places, use this file subclass instead as a stop-gap measure:
class CleanlinesFile(file):
    def next(self):
        line = super(CleanlinesFile, self).next()
        return line.replace('\r', '').replace('\n', '') + '\n'
This class guarantees there will be no newlines anywhere in the returned results except as the very last character (just the way the csv module wants it). Use it instead of the open call; the 'rb' mode modifier becomes optional in this case:
def range_data(ip):
    r = []
    f = CleanlinesFile(os.path.join(settings.PROJECT_ROOT, 'static', 'csv ',
                                    'GeoIPCountryWhois.csv'))
    for num, row in enumerate(csv.reader(f)):
        # Your things.

Related

How does one parse a .lua file with Python and pull out the require statements?

I am not very good at parsing files but have something I would like to accomplish. The following is a snippet of a .lua script that has some require statements. I would like to use Python to parse this .lua file and pull the 'require' statements out.
For example, here are the require statements:
require "common.acme_1"
require "common.acme_2"
require "acme_3"
require "common.core.acme_4"
From the example above I would then like to split the directory from the required file. In the example 'require "common.acme_1"' the directory would be common and the required file would be acme_1. I would then just add the .lua extension to acme_1. I need this information so I can validate that the file exists on the file system (which I know how to do) and then run it against luac (the compiler) to make sure it is a valid lua file (which I also know how to do).
I simply need help pulling these require statements out using Python and splitting the directory name from the filename.
You can do this with built-in string methods, but since the parsing is a little bit complicated (paths can be multi-part), the simplest solution might be to use a regex. With a regex, you can do the parsing and splitting using groups:
import re

data = '''
require "common.acme_1"
require "common.acme_2"
require "acme_3"
require "common.core.acme_4"
'''

finds = re.findall(r'require\s+"(([^."]+\.)*)?([^."]+)"', data, re.MULTILINE)
print [dict(path=x[0].rstrip('.'), file=x[2]) for x in finds]
The first group is the path (including the trailing .), the second group is the inner group needed for matching repeated path parts (discarded), and the third group is the file name. If there is no path you get path=''.
Output:
[{'path': 'common', 'file': 'acme_1'}, {'path': 'common', 'file': 'acme_2'}, {'path': '', 'file': 'acme_3'}, {'path': 'common.core', 'file': 'acme_4'}]
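Building on the finds list above, here is one way (my own illustrative follow-up, not part of the original answer) to turn each match into a relative file path for the existence check mentioned in the question:
import os.path

parsed = [dict(path=x[0].rstrip('.'), file=x[2]) for x in finds]
for item in parsed:
    # a dotted path like 'common.core' becomes the directories ['common', 'core']
    parts = item['path'].split('.') if item['path'] else []
    rel_path = os.path.join(*(parts + [item['file'] + '.lua']))
    print(rel_path)  # e.g. common/acme_1.lua, acme_3.lua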
Here ya go!
import sys
import os.path

if len(sys.argv) != 2:
    print "Usage:", sys.argv[0], "<inputfile.lua>"
    exit()

f = open(sys.argv[1], "r")
lines = f.readlines()
f.close()

for line in lines:
    if line.startswith("require "):
        path = line.replace('require "', '').replace('"', '').replace("\n", '').replace(".", "/") + ".lua"
        fName = os.path.basename(path)
        path = path.replace(fName, "")
        print "File: " + fName
        print "Directory: " + path
        # do what you want to each file & path here
Here's a crazy one-liner; I'm not sure if this is exactly what you wanted, and it's most certainly not the most optimal one...
In [270]: import re
In [271]: [[s[::-1] for s in rec[::-1].split(".", 1)][::-1] for rec in re.findall(r"require \"([^\"]*)", text)]
Out[271]:
[['common', 'acme_1'],
 ['common', 'acme_2'],
 ['acme_3'],
 ['common.core', 'acme_4']]
This is straightforward.
One-liners are great, but they take too much effort to understand at first glance, and this is not a job for regular expressions, in my opinion.
mylines = [line.split('require')[-1] for line in open('mylua.lua').readlines() if line.startswith('require')]
paths = []
for line in mylines:
    if 'common.' in line:
        paths.append(('common', line.split('common.')[-1]))
    else:
        paths.append(('', line))
You could use finditer:
import re

lua = '''
require "common.acme_1"
require "common.acme_2"
require "acme_3"
require 'common.core.acme_4'
'''

print [m.group(2) for m in re.finditer(r'^require\s+(\'|")([^\'"]+)(\1)', lua, re.S | re.M)]
# ['common.acme_1', 'common.acme_2', 'acme_3', 'common.core.acme_4']
Then just split on the '.' to split into paths:
for e in [m.group(2) for m in re.finditer(r'^require\s+(\'|")([^\'"]+)(\1)', lua, re.S | re.M)]:
    parts = e.split('.')
    if parts[:-1]:
        print '/'.join(parts[:-1]), parts[-1]
    else:
        print parts[0]
Prints:
common acme_1
common acme_2
acme_3
common/core acme_4
file = '/path/to/test.lua'

def parse():
    with open(file, 'r') as f:
        requires = [line.split()[1].strip('"') for line in f.readlines() if line.startswith('require ')]
    for r in requires:
        filename = r.replace('.', '/') + '.lua'
        print(filename)
The with statement opens the file in question. The next line builds a list from all lines that start with 'require ', splitting each one, keeping only the part after 'require', and stripping off the double quotes. Then it goes through the list, replaces the dots with slashes, and appends '.lua'. The print statement shows the results.

CSV new-line character seen in unquoted field error

The following code worked until today, when I imported a file from a Windows machine and got this error:
new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
import csv

class CSV:
    def __init__(self, file=None):
        self.file = file

    def read_file(self):
        data = []
        file_read = csv.reader(self.file)
        for row in file_read:
            data.append(row)
        return data

    def get_row_count(self):
        return len(self.read_file())

    def get_column_count(self):
        new_data = self.read_file()
        return len(new_data[0])

    def get_data(self, rows=1):
        data = self.read_file()
        return data[:rows]
How can I fix this issue?
def upload_configurator(request, id=None):
    """
    A view that allows the user to configure the uploaded CSV.
    """
    upload = Upload.objects.get(id=id)
    csvobject = CSV(upload.filepath)
    upload.num_records = csvobject.get_row_count()
    upload.num_columns = csvobject.get_column_count()
    upload.save()

    form = ConfiguratorForm()
    row_count = csvobject.get_row_count()
    column_count = csvobject.get_column_count()
    first_row = csvobject.get_data(rows=1)
    first_two_rows = csvobject.get_data(rows=5)
It would be good to see the csv file itself, but this might work for you; give it a try. Replace:
file_read = csv.reader(self.file)
with:
file_read = csv.reader(self.file, dialect=csv.excel_tab)
Or, open a file with universal newline mode and pass it to csv.reader, like:
reader = csv.reader(open(self.file, 'rU'), dialect=csv.excel_tab)
Or, use splitlines(), like this:
def read_file(self):
    with open(self.file, 'r') as f:
        data = [row for row in csv.reader(f.read().splitlines())]
    return data
I realize this is an old post, but I ran into the same problem and didn't see the correct answer, so I will give it a try.
Python Error:
_csv.Error: new-line character seen in unquoted field
This is caused by trying to read Macintosh (pre-OS X formatted) CSV files. These are text files that use CR for end of line. If using MS Office, make sure you select either plain CSV format or CSV (MS-DOS). Do not use CSV (Macintosh) as the save-as type.
My preferred EOL version would be LF (Unix/Linux/Apple), but I don't think MS Office provides the option to save in this format.
For Mac OS X, save your CSV file in "Windows Comma Separated (.csv)" format.
If this happens to you on mac (as it did to me):
Save the file as CSV (MS-DOS Comma-Separated)
Run the following script
with open(csv_filename, 'rU') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print ', '.join(row)
Try running dos2unix on your Windows-imported files first.
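If dos2unix isn't handy, a rough Python equivalent (my own sketch, not part of the original suggestion; the filenames are placeholders) is to normalize CRLF and bare CR endings to LF before parsing:
def normalize_newlines(src, dst):
    # work on raw bytes so the csv module never sees mixed line endings
    with open(src, 'rb') as f:
        data = f.read()
    # convert CRLF first, then any leftover classic-Mac CR
    data = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    with open(dst, 'wb') as f:
        f.write(data)

normalize_newlines('import.csv', 'import_unix.csv')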
This is an error that I faced. I had saved the .csv file in Mac OS X.
While saving, save it as "Windows Comma Separated Values (.csv)", which resolved the issue.
This worked for me on OSX.
import csv
# allow strings to be opened as files
from io import StringIO
# library to map other strange (accented) characters back into UTF-8
from unidecode import unidecode

# cleanse input file with Windows formatting to a plain UTF-8 string
with open(filename, 'rb') as fID:
    uncleansedBytes = fID.read()

# decode the file using the correct encoding scheme
# (probably this old windows one)
uncleansedText = uncleansedBytes.decode('Windows-1252')

# replace carriage-returns with new-lines
cleansedText = uncleansedText.replace('\r', '\n')

# map any other non UTF-8 characters into UTF-8
asciiText = unidecode(cleansedText)

# read each line of the csv file and store as an array of dicts,
# using the first line as field names for each dict
reader = csv.DictReader(StringIO(cleansedText))
for line_entry in reader:
    # do something with your read data
    pass
I know this has been answered for quite some time, but it did not solve my problem. I am using DictReader and StringIO for my csv reading due to some other complications. I was able to solve the problem more simply by replacing delimiters explicitly:
with urllib.request.urlopen(q) as response:
    raw_data = response.read()
    encoding = response.info().get_content_charset('utf8')
    data = raw_data.decode(encoding)
    if '\r\n' not in data:
        # probably a windows delimited thing... try to update it
        data = data.replace('\r', '\r\n')
Might not be reasonable for enormous CSV files, but worked well for my use case.
Alternative and fast solution: I faced the same error. I reopened the "weird" csv file in Gnumeric on my Lubuntu machine and exported it as a csv file. This corrected the issue.

Confusing Error when Reading from a File in Python

I'm having a problem opening the names.txt file. I have checked that I am in the correct directory. Below is my code:
import os

print(os.getcwd())

def alpha_sort():
    infile = open('names', 'r')
    string = infile.read()
    string = string.replace('"', '')
    name_list = string.split(',')
    name_list.sort()
    infile.close()
    return 0

alpha_sort()
And the error I got:
FileNotFoundError: [Errno 2] No such file or directory: 'names'
Any ideas on what I'm doing wrong?
You mention in your question body that the file is "names.txt", however your code shows you trying to open a file called "names" (without the ".txt" extension). (Extensions are part of filenames.)
Try this instead:
infile = open('names.txt', 'r')
As a side note, make sure that when you open files you use universal newline mode, as Windows and Mac/Unix have different representations of line endings (\r\n vs \n, etc.). Universal mode gets Python to handle this, so it's generally a good idea to use it whenever you need to read a file. (EDIT - should read: a text file, thanks cameron)
So the code would just look like this:
infile = open('names.txt', 'rU')  # the capital U indicates opening the file in universal mode
This doesn't solve that issue, but you might consider using with when opening files:
with open('names', 'r') as infile:
    string = infile.read()
    string = string.replace('"', '')
    name_list = string.split(',')
    name_list.sort()
    return 0
This closes the file for you automatically, even if an exception is raised inside the block.

Python help reading csv file failing due to line-endings

I'm trying to create a script that will check the computer's host name, then search a master list for that value to return a corresponding value from the csv file, then open another file and do a find and replace. I know this should be easy, but I haven't done much in Python before. Here is what I have so far...
masterlist.txt (tab delimited)
Name UID
Bob-Smith.local bobs
Carmen-Jackson.local carmenj
David-Kathman.local davidk
Jenn-Roberts.local jennr
Here is the script that I have created thus far
# GET CLIENT HOST NAME
import socket

host = socket.gethostname()
print host

# IMPORT MASTER DATA
import csv, sys

filename = "masterlist.txt"
reader = csv.reader(open(filename, "rU"))

# PRINT MASTER DATA
for row in reader:
    print row

# SEARCH ON HOSTNAME AND RETURN UID

# REPLACE VALUE IN FILE WITH UID
#import fileinput
#for line in fileinput.FileInput("filetoreplace", inplace=1):
#    line = line.replace("replacethistext", "UID")
#    print line
Right now, it's just set to print the master list. I'm not sure if the list needs to be parsed and placed into a dictionary or what. I really need to figure out how to search the first field for the hostname and then return the field in the second column.
Thanks in advance for your help,
Aaron
UPDATE: I removed line 194 and last line from masterlist.txt and then re-ran the script. The results were the following:
Traceback (most recent call last):
  File "update.py", line 3, in <module>
    for row in csv.DictReader(open(fname), delimiter='\t'):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/csv.py", line 103, in next
    self.fieldnames
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/csv.py", line 90, in fieldnames
    self._fieldnames = self.reader.next()
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
The current script being used is...
import csv

fname = "masterlist.txt"
for row in csv.DictReader(open(fname), delimiter='\t'):
    print(row)
The two occurrences of '\xD5' in line 194 and the last line have nothing to do with the problem.
The problem appears to be a bug, or a misleading error message, or incorrect/vague documentation, in the Python 2.6 csv module.
In the file, the lines are terminated by '\x0D', aka '\r', in the Classic Mac tradition. The last line is not terminated, but that has nothing to do with the problem.
The docs for csv.reader say "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference." It is widely known that it does make a difference on Windows. However opening the file with 'rb' or 'r' makes no difference in this case -- still the same error message.
The docs for csv.Dialect.lineterminator say "The string used to terminate lines produced by the writer. It defaults to '\r\n'. Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future." It appears to be recognising '\r' as new-line but not as end-of-line/end-of-field.
The error message "_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?" is confusing; it's recognised '\r' as a new-line, but it's not treating new-line as an end-of line (and thus implicitly end-of-field).
It appears necessary to open the file in 'rU' mode to get it to "work". It's not apparent why the same '\r' recognised in universal-newline mode is any better.
To iterate over a reader you'd do:
>>> import csv
>>> for row in csv.DictReader(open(fname), delimiter='\t'):
...     print(row)
...
{'Name': 'Bob-Smith.local', 'UID': 'bobs'}
{'Name': 'Carmen-Jackson.local', 'UID': 'carmenj'}
{'Name': 'David-Kathman.local', 'UID': 'davidk'}
{'Name': 'Jenn-Roberts.local', 'UID': 'jennr'}
But since you want to associate Name with UID:
>>> reader = csv.reader(open("masterlist.txt"), delimiter='\t')
>>> _ = next(reader) # just discarding header
>>> d = dict(reader)
>>> d['Carmen-Jackson.local']
'carmenj'
I would populate a dictionary like this:
>>> import csv
>>> name_to_UID = {}
>>> for row in csv.DictReader(open(filename, 'rU'), delimiter='\t'):
...     name_to_UID[row['Name']] = row['UID']
...
>>> name_to_UID['Carmen-Jackson.local']
'carmenj'

How to modify a text file?

I'm using Python, and would like to insert a string into a text file without deleting or copying the file. How can I do that?
Unfortunately there is no way to insert into the middle of a file without re-writing it. As previous posters have indicated, you can append to a file or overwrite part of it using seek, but if you want to add stuff at the beginning or in the middle, you'll have to rewrite it.
This is an operating system thing, not a Python thing. It is the same in all languages.
What I usually do is read from the file, make the modifications and write it out to a new file called myfile.txt.tmp or something like that. This is better than reading the whole file into memory because the file may be too large for that. Once the temporary file is completed, I rename it the same as the original file.
This is a good, safe way to do it because if the file write crashes or aborts for any reason, you still have your untouched original file.
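A minimal sketch of that pattern (the helper name and details are my own illustration, not the answerer's code):
import os
import tempfile

def rewrite_with_insert(path, new_line, after_line_no):
    # create the temporary file in the same directory so the final
    # rename does not cross filesystems
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix='.tmp')
    with os.fdopen(fd, 'w') as tmp:
        with open(path) as src:
            for i, line in enumerate(src):
                tmp.write(line)
                if i == after_line_no:
                    tmp.write(new_line + '\n')
    # the original is only replaced once the new copy is complete
    # (on Windows you may need to move the original aside first)
    os.rename(tmp_path, path)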
Depends on what you want to do. To append you can open it with "a":
with open("foo.txt", "a") as f:
    f.write("new line\n")
If you want to prepend something you have to read from the file first:
with open("foo.txt", "r+") as f:
    old = f.read()  # read everything in the file
    f.seek(0)  # rewind
    f.write("new line\n" + old)  # write the new line before
The fileinput module of the Python standard library will rewrite a file inplace if you use the inplace=1 parameter:
import sys
import fileinput

# replace all occurrences of 'sit' with 'SIT' and insert a line after the 5th
for i, line in enumerate(fileinput.input('lorem_ipsum.txt', inplace=1)):
    sys.stdout.write(line.replace('sit', 'SIT'))  # replace 'sit' and write
    if i == 4:
        sys.stdout.write('\n')  # write a blank line after the 5th line
Rewriting a file in place is often done by saving the old copy with a modified name. Unix folks add a ~ to mark the old one. Windows folks do all kinds of things -- add .bak or .old -- or rename the file entirely or put the ~ on the front of the name.
import shutil

shutil.move(aFile, aFile + "~")

destination = open(aFile, "w")
source = open(aFile + "~", "r")
for line in source:
    destination.write(line)
    if <some condition>:
        destination.write(<some additional line> + "\n")
source.close()
destination.close()
Instead of shutil, you can use the following.
import os
os.rename(aFile, aFile + "~")
Python's mmap module will allow you to insert into a file. The following sample shows how it can be done in Unix (Windows mmap may be different). Note that this does not handle all error conditions and you might corrupt or lose the original file. Also, this won't handle unicode strings.
import os
from mmap import mmap

def insert(filename, str, pos):
    if len(str) < 1:
        # nothing to insert
        return

    f = open(filename, 'r+')
    m = mmap(f.fileno(), os.path.getsize(filename))
    origSize = m.size()

    # or this could be an error
    if pos > origSize:
        pos = origSize
    elif pos < 0:
        pos = 0

    m.resize(origSize + len(str))
    m[pos + len(str):] = m[pos:origSize]
    m[pos:pos + len(str)] = str
    m.close()
    f.close()
It is also possible to do this without mmap with files opened in 'r+' mode, but it is less convenient and less efficient as you'd have to read and temporarily store the contents of the file from the insertion position to EOF - which might be huge.
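For illustration, a rough sketch of that mmap-free variant (my own, with the same caveat that the tail of the file is held in memory):
def insert_without_mmap(filename, text, pos):
    with open(filename, 'r+') as f:
        f.seek(pos)
        tail = f.read()  # everything from the insertion point to EOF
        f.seek(pos)
        f.write(text + tail)  # rewrite the tail shifted right by len(text)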
As mentioned by Adam, you have to take your system limitations into consideration before you decide on an approach: do you have enough memory to read the whole file into memory, replace parts of it, and re-write it?
If you're dealing with a small file or have no memory issues, this might help:
Option 1)
Read the entire file into memory, do a regex substitution on the entire line or part of it, and replace it with that line plus the extra line. You will need to make sure that the 'middle line' is unique in the file, or, if you have timestamps on each line, this should be pretty reliable.
import re

# open file with r+b (allow write and binary mode)
f = open("file.log", 'r+b')
# read entire content of file into memory
f_content = f.read()
# basically match middle line and replace it with itself and the extra line
f_content = re.sub(r'(middle line)', r'\1\nnew line', f_content)
# return pointer to top of file so we can re-write the content with replaced string
f.seek(0)
# clear file content
f.truncate()
# re-write the content with the updated content
f.write(f_content)
# close file
f.close()
Option 2)
Figure out the middle line and replace it with that line plus the extra line.
# open file with r+b (allow write and binary mode)
f = open("file.log", 'r+b')
# get array of lines
f_content = f.readlines()
# get middle line
middle_line = len(f_content)/2
# overwrite middle line
f_content[middle_line] += "\nnew line"
# return pointer to top of file so we can re-write the content with replaced string
f.seek(0)
# clear file content
f.truncate()
# re-write the content with the updated content
f.write(''.join(f_content))
# close file
f.close()
Wrote a small class for doing this cleanly.
import tempfile

class FileModifierError(Exception):
    pass

class FileModifier(object):

    def __init__(self, fname):
        self.__write_dict = {}
        self.__filename = fname
        self.__tempfile = tempfile.TemporaryFile()
        with open(fname, 'rb') as fp:
            for line in fp:
                self.__tempfile.write(line)
        self.__tempfile.seek(0)

    def write(self, s, line_number='END'):
        if line_number != 'END' and not isinstance(line_number, (int, float)):
            raise FileModifierError("Line number %s is not a valid number" % line_number)
        try:
            self.__write_dict[line_number].append(s)
        except KeyError:
            self.__write_dict[line_number] = [s]

    def writeline(self, s, line_number='END'):
        self.write('%s\n' % s, line_number)

    def writelines(self, s, line_number='END'):
        for ln in s:
            self.writeline(ln, line_number)

    def __popline(self, index, fp):
        try:
            ilines = self.__write_dict.pop(index)
            for line in ilines:
                fp.write(line)
        except KeyError:
            pass

    def close(self):
        self.__exit__(None, None, None)

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        with open(self.__filename, 'w') as fp:
            for index, line in enumerate(self.__tempfile.readlines()):
                self.__popline(index, fp)
                fp.write(line)
            for index in sorted(self.__write_dict):
                for line in self.__write_dict[index]:
                    fp.write(line)
        self.__tempfile.close()
Then you can use it this way:
with FileModifier(filename) as fp:
    fp.writeline("String 1", 0)
    fp.writeline("String 2", 20)
    fp.writeline("String 3")  # To write at the end of the file
If you know some Unix, you could try the following:
Notes: $ means the command prompt
Say you have a file my_data.txt with content as such:
$ cat my_data.txt
This is a data file
with all of my data in it.
Then, using the os module, you can run the usual sed commands:
import os
# Identifiers used are:
my_data_file = "my_data.txt"
command = "sed -i 's/all/none/' my_data.txt"
# Execute the command
os.system(command)
If you aren't aware of sed, check it out, it is extremely useful.
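As an aside, the subprocess module is generally preferred over os.system for running external commands; here is the same sed invocation expressed that way (a sketch using the answer's example file and pattern):
import subprocess

# passing the arguments as a list sidesteps shell quoting issues
subprocess.call(["sed", "-i", "s/all/none/", "my_data.txt"])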
