Python CSV writer to quote strings with extra spaces - python

My data looks something like this:
data = [
    [" trailing space", 19, 100],
    [" ", 19, 100],
]
writer = csv.writer(csv_filename, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
Output
trailing space,19,100
,19,100
What I want
" trailing space",19,100
" ",19,100
Python's default CSV writer offers QUOTE_MINIMAL, but that mode does not quote a string just because it contains extra spaces. In my case those spaces are critical, and without quoting, a reader such as LibreOffice strips them.
Is there a built-in option, or a quick, cheap way, to tell the writer to quote strings that consist only of spaces or are padded with them?
Also, QUOTE_NONNUMERIC quotes too much. The actual data is huge (a few hundred megabytes, 60-70% of it strings). It may sound silly, but I'm trying to reduce the CSV size by minimizing the quotes.

It's a bit of a hack, but one way of achieving this (here through pandas' to_csv) could be
df.to_csv(quoting=csv.QUOTE_MINIMAL, escapechar=' ')
It's not documented, but QUOTE_MINIMAL seems to quote fields containing escapechar, even though the escape character otherwise has no effect (as quoting is not QUOTE_NONE and doublequote is True by default).

Why not just use QUOTE_NONNUMERIC? That'll quote all strings, not just those with spaces, but it'll certainly quote those too.
with open("quote.csv", "w", newline="") as fp:
    writer = csv.writer(fp, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(data)
gives me
(3.5.1) dsm@notebook:~/coding$ cat quote.csv
" leading space",19,100
" ",19,100

Have you tried the csv writer in Python with custom quoting?
Just make sure you know what you are quoting, and take care to escape things manually.
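A minimal sketch of that manual route, assuming it is acceptable to bypass csv.writer and format each row by hand. The helper names format_field and write_rows are hypothetical (they are not part of the csv module), and the quoting rule used here (quote on leading or trailing whitespace, delimiter, quote character, or line break) is only one reading of what the question needs:

```python
import csv
import io


def format_field(value, delimiter=",", quotechar='"'):
    """Render one field, quoting it only when that is needed to round-trip."""
    if value is None:
        return ""
    text = str(value)
    needs_quotes = (
        text != text.strip()  # leading or trailing whitespace
        or any(ch in text for ch in (delimiter, quotechar, "\r", "\n"))
    )
    if needs_quotes:
        # Double any embedded quote characters, as the default dialect does.
        return quotechar + text.replace(quotechar, quotechar * 2) + quotechar
    return text


def write_rows(fp, rows, delimiter=",", lineterminator="\r\n"):
    """Write rows to an open text file object using format_field."""
    for row in rows:
        fp.write(delimiter.join(format_field(v, delimiter) for v in row))
        fp.write(lineterminator)


data = [[" trailing space", 19, 100], [" ", 19, 100]]
buf = io.StringIO()
write_rows(buf, data)
output = buf.getvalue()

# Reading the text back with the standard reader keeps the spaces.
round_trip = list(csv.reader(io.StringIO(output, newline="")))
print(output)
```

Numbers and plain strings stay unquoted, so the extra quotes are limited to fields that would otherwise lose information.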

Try removing the quoting altogether. Will keep all quote characters as required.
writer = csv.writer(csv_filename, delimiter=',', quoting=csv.QUOTE_NONE)

Related

How to write string to csv that contain escape chars?

I am trying to write a list of strings to csv using csv.writer.
writer = csv.writer(f)
writer.writerow(some_text)
However, some of the strings contain a random escape character, which seems to cause the following error: _csv.Error: need to escape, but no escapechar set
I've tried using the escapechar option in csv.writer like the following
writer = csv.writer(f, escapechar='\\')
but this seems to be only a partial solution, since the newline characters (\n) are not recognized.
How would I solve this problem? An example of a problematic string would be the following:
problem_string = "this \n sentence \% is \n problematic \g"
What format do you want to achieve in the end? Writing this to a CSV seems to lead to some odd outcomes anyway.
In any case, both of these snippets work for me without errors, giving slightly different results with respect to the escape characters. Note that writerow expects a sequence of fields, so the string is wrapped in a list here; a bare string would be written one character per column.
With a normal string:
import csv
with open('test2.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    # "\n" is a real newline here; "\%" and "\g" keep their backslashes
    problem_string = "this \n sentence \% is \n problematic \g"
    csvwriter.writerow([problem_string])
With a raw string:
import csv
with open('test2.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    # raw string: "\n" stays as a backslash followed by "n"
    problem_string = r"this \n sentence \% is \n problematic \g"
    csvwriter.writerow([problem_string])
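As a quick check of the default dialect, here is a small sketch that writes the problem string as a single field and reads it back. It uses an in-memory io.StringIO buffer instead of a file (my substitution, to keep the example self-contained) and writes the backslashes as "\\" so that the literal is unambiguous:

```python
import csv
import io

# Real newlines plus literal backslashes, as in the question's example.
problem_string = "this \n sentence \\% is \n problematic \\g"

buf = io.StringIO(newline="")
writer = csv.writer(buf)
# writerow takes a sequence of fields, so the string goes inside a list.
writer.writerow([problem_string, "second field"])

buf.seek(0)
rows = list(csv.reader(buf))
print(rows[0][0] == problem_string)
```

With the default settings the writer quotes the field that contains newlines, and the reader reassembles it, so no escapechar is needed for this case.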

Unwanted blanks while writing data into csv using python [duplicate]

import csv
with open('test.csv', 'w') as outfile:
    writer = csv.writer(outfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['hi', 'dude'])
    writer.writerow(['hi2', 'dude2'])
The above code generates a file, test.csv, with an extra \r at each row, like so:
hi,dude\r\r\nhi2,dude2\r\r\n
instead of the expected
hi,dude\r\nhi2,dude2\r\n
Why is this happening, or is this actually the desired behavior?
Python 3:
The official csv documentation recommends opening the file with newline='' on all platforms to disable universal newlines translation:
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    ...
The CSV writer terminates each line with the lineterminator of the dialect, which is '\r\n' for the default excel dialect on all platforms because that's what RFC 4180 recommends.
Python 2:
On Windows, always open your files in binary mode ("rb" or "wb"), before passing them to csv.reader or csv.writer.
Although the file is a text file, CSV is regarded as a binary format by the libraries involved, with \r\n separating records. If that separator is written in text mode, the Python runtime replaces the \n with \r\n, hence the \r\r\n observed in the file.
See this previous answer.
While @john-machin gives a good answer, it's not always the best approach. For example, it doesn't work on Python 3 unless you encode all of your inputs to the CSV writer. Also, it doesn't address the issue if the script wants to use sys.stdout as the stream.
I suggest instead setting the 'lineterminator' attribute when creating the writer:
import csv
import sys
doc = csv.writer(sys.stdout, lineterminator='\n')
doc.writerow('abc')
doc.writerow(range(3))
That example will work on Python 2 and Python 3 and won't produce the unwanted newline characters. Note, however, that it may produce undesirable newlines (omitting the CR character on Unix operating systems).
In most cases, however, I believe that behavior is preferable and more natural than treating all CSV as a binary format. I provide this answer as an alternative for your consideration.
In Python 3 (I haven't tried this in Python 2), you can also simply do
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(mystuff)
    ...
as per documentation.
More on this in the doc's footnote:
If newline='' is not specified, newlines embedded inside quoted fields
will not be interpreted correctly, and on platforms that use \r\n
linendings on write an extra \r will be added. It should always be
safe to specify newline='', since the csv module does its own
(universal) newline handling.
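The footnote's point can be checked on any platform with a small sketch. It assumes that opening a text stream with newline='\r\n' is a fair stand-in for the default text-mode translation on Windows; the helper name csv_bytes is hypothetical:

```python
import csv
import io


def csv_bytes(newline):
    """Write one row through a text wrapper and return the raw bytes."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    csv.writer(text).writerow(["hi", "dude"])
    text.flush()
    data = raw.getvalue()
    text.detach()  # keep the wrapper from closing the buffer later
    return data


# '\n' is translated to '\r\n' on write, as in Windows text mode.
translated = csv_bytes("\r\n")
# newline='' disables translation, as the documentation recommends.
untranslated = csv_bytes("")

print(translated)
print(untranslated)
```

The writer's own '\r\n' terminator passes through unchanged only in the second case; in the first, the '\n' is expanded again and the doubled '\r' appears.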
You can pass the lineterminator='\n' parameter to csv.writer.
import csv
delimiter = '\t'
with open('tmp.csv', 'w', encoding='utf-8') as stream:
    # quotechar=None instead of '': newer Python versions reject an empty quotechar
    writer = csv.writer(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE, quotechar=None, lineterminator='\n')
    writer.writerow(['A1', 'B1', 'C1'])
    writer.writerow(['A2', 'B2', 'C2'])
    writer.writerow(['A3', 'B3', 'C3'])
You have to pass newline="\n" to the open function, like this:
with open('file.csv', 'w', newline="\n") as out:
    csv_out = csv.writer(out, delimiter=';')
Note that if you use DictWriter, you will have a new line from the open function and a new line from the writerow function.
You can use newline='' within the open function to remove the extra newline.

Python CSV Parsing, Escaped Quote Character

I am trying to parse a CSV file using the csv.reader, my data is separated by commas and each value starts and ends with quotation marks. Example:
"This is some data", "New data", "More \"data\" here", "test"
My problem is with the third value: data that contains quotation marks uses an escape character to show that they are part of the data. The Python CSV reader does not honor this escape character by default, which results in incorrect parsing.
I tried code like the following:
with open(filepath) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',', quotechar='\\"')
But I get an error complaining that quotechar is not 1 character.
My current solution is to replace every \" with a single quote ' before parsing with csv.reader. However, I would like to know if there is a better way that does not modify the original data.
The issue here is that you need to define an escapechar, so that the csv reader knows to treat \" as ".
csv.reader(csv_file, quotechar='"', delimiter=',', escapechar='\\')
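A short sketch applying this to the sample line from the question. One addition that is not in the answer above: the sample has a space after each comma, so skipinitialspace=True is passed as well; without it, the later fields would not start with the quote character and would not be treated as quoted.

```python
import csv

# The sample line from the question; a raw string keeps the backslashes literal.
line = r'"This is some data", "New data", "More \"data\" here", "test"'

reader = csv.reader(
    [line],
    delimiter=",",
    quotechar='"',
    escapechar="\\",
    skipinitialspace=True,  # assumption: needed for the space after each comma
)
fields = next(reader)
print(fields)
# ['This is some data', 'New data', 'More "data" here', 'test']
```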

Python CSV Reader splitting on comma inside of quotes

import csv
csv_reader_results = csv.reader(["办公室弥漫着\"女红\"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉,,,,用心做东西感觉真好!!!"],
                                escapechar='\\',
                                quotechar='"',
                                delimiter=',',
                                quoting=csv.QUOTE_ALL,
                                skipinitialspace=True)
for result in csv_reader_results:
    print(result[0])
What I'm expecting is:
办公室弥漫着"女红"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉,,,,用心做东西感觉真好!!!
But what I'm getting is:
办公室弥漫着"女红"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉
Because it splits on the four commas inside the sentence.
I'm escaping the quotes inside of the sentence. I've set the quotechar and escapechar for csv.reader. What am I doing wrong here?
Edit:
I used the answer by j6m8 https://stackoverflow.com/a/19881343/3945463 as a workaround. But it would be preferable to learn the correct way to do this with csv reader.
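The thread does not include an explanation, so the following is only a sketch of one likely cause. In the Python literal above, \" is consumed by Python itself, so the string handed to the reader contains plain quote characters, no backslashes, and no enclosing quotes. The reader then sees an unquoted field and splits it at every comma. The sentence is shortened here for brevity, and the name options is just a local helper:

```python
import csv

# Shortened version of the sentence from the question (for brevity only).
sentence = '办公室弥漫着"女红"缝扣子,,,,用心做东西感觉真好!!!'

options = dict(escapechar="\\", quotechar='"', delimiter=",", skipinitialspace=True)

# Passed as-is, the line has no enclosing quotes, so every comma splits it.
split_row = next(csv.reader([sentence], **options))

# A CSV line that carries its own enclosing quotes and backslash-escaped
# inner quotes is parsed as a single field.
csv_line = '"' + sentence.replace('"', '\\"') + '"'
whole_row = next(csv.reader([csv_line], **options))

print(len(split_row))
print(whole_row[0] == sentence)
```

If the data really arrives as unquoted text, the reader has no way to tell field-separating commas from commas in the sentence, so the quoting has to be present in the input itself.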

Insert whitespace after delimiter with CSV writer

f = open("file1.csv", "r")
g = open("file2.csv", "w")
a = csv.reader(f, delimiter=";", skipinitialspace=True)
b = csv.writer(g, delimiter=";")
for line in a:
    b.writerow(line)
In the above code, I try to load file1.csv using the csv module in Python 2.7 and then write it to file2.csv using a csv.writer.
My issue comes from existing whitespace (a single space character) after the delimiter in the input file. I need to remove it in order to do some data manipulation later on, so I used the skipinitialspace=True argument for the reader. However, I cannot get the writer to print the space character after the delimiter, which disturbs any subsequent diffing of the two files.
I tried to use the Sniffer class to auto-generate a Dialect but I guess my input files (coming from a large complex legacy system, with dozens of fields and poor quoting and escaping) are proving to be too complex for this.
In more simple terms I'm looking for the answers to the following questions:
How can I insert a space character after each delimiter in the writer?
Incidentally, what are the reasons to prohibit the use of multi-character strings as delimiters? delimiter="; " would've solved my problem.
You can wrap your file objects in proxies that add the whitespace:
>>> class DelimitedFile(file):
...     def write(self, value):
...         super(DelimitedFile, self).write(value.replace(";", "; "))
...
>>> f = DelimitedFile("foo", "w")
>>> f.write("hello;world")
>>> f.close()
>>> open("foo").read()
'hello; world'
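The file built-in used above exists only in Python 2. A rough Python 3 equivalent, as a sketch: csv.writer only needs an object with a write method, so a thin wrapper is enough. The class name SpacedDelimiterWriter is hypothetical, and lineterminator='\n' is set only to keep the example output simple:

```python
import csv
import io


class SpacedDelimiterWriter:
    """Hypothetical file-like proxy that pads each ';' the csv writer emits."""

    def __init__(self, target):
        self._target = target

    def write(self, text):
        # Every semicolon in the formatted row gets a trailing space.
        return self._target.write(text.replace(";", "; "))


out = io.StringIO()
writer = csv.writer(SpacedDelimiterWriter(out), delimiter=";", lineterminator="\n")
writer.writerow(["a", "b", "c"])
result = out.getvalue()
print(repr(result))
```

One caveat of this approach: the replacement is purely textual, so a semicolon inside a quoted field value is padded as well.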
If you left the whitespace you want written in (removing/restoring it during processing), or put it back after processing but before writing, that would take care of it.
One solution would be to write to a StringIO object, and then to replace the semicolons with '; ', or to do so during processing of the lines, if you do any other processing.
As for the first, I would probably do something like this:
for line in a:
    # line is a list of fields; pad every field except the first with a space
    b.writerow(line[:1] + [' ' + field for field in line[1:]])
As for the second, I have no idea.
