remove double quotes in each row (csv writer)

remove double quotes in each row (csv writer) - python

I'm writing API results to CSV file in python 3.7. Problem is it adds double quotes ("") to each row when it writes to file.
I'm passing format as csv to API call, so that I get results in csv format and then I'm writing it to csv file, store to specific location.
Please suggest if there is any better way to do this.
Here is the sample code..
with open(target_file_path, 'w', encoding='utf8') as csvFile:
writer = csv.writer(csvFile, quoting=csv.QUOTE_NONE, escapechar='\"')
for line in rec.split('\r\n'):
writer.writerow([line])
when I use escapechar='\"' it adds (") at the of every column value.
here is sample records..
2264855868",42.38454",-71.01367",07/15/2019 00:00:00",07/14/2019 20:00:00"
2264855868",42.38454",-71.01367",07/15/2019 01:00:00",07/14/2019 21:00:00"

API gives string/bytes which you can write directly in file.
data = request.get(..).content
open(filename, 'wb').write(data)
With csv.writer you would have to convert string/bytes to Python's data using csv.reader and then convert it back to string/bytes with csv.writer - so there is no sense to do it.
The same method should work if API send any file: JSON, CSV, XML, PDF, images, audio, etc.
For bigger files you could use chunk/stream in requests. Doc: requests - Advanced Usage

Have you tried removing the backward-slash from escapechar='\"'? It shouldn't be necessary, since you are using single quotes for the string.
EDIT: From the documentation:
A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the following character.
And the delimeter:
A one-character string used to separate fields. It defaults to ','
So it is going to escape the delimeter (,) with whatever you set as the escapechar, in this case ,
If you don't want any escape, try leaving it empty

Try:
import codecs
def find_replace(file, search_characters, replace_with):
text = codecs.open(file, "r", "utf-8-sig")
text = ''.join([i for i in text]).replace(
search_characters, replace_with)
x = codecs.open(file, "w", "utf-8-sig")
x.writelines(text)
x.close()
if __name__ == '__main__':
file = "target_file_path"
search_characters = '"'
replace_with = ''
find_replace(file, search_characters, replace_with)
output:
2264855868,42.38454,-71.01367,07/15/2019 00:00:00,07/14/2019 20:00:00
2264855868,42.38454,-71.01367,07/15/2019 01:00:00,07/14/2019 21:00:00

Related

Python CSV Parsing, Escaped Quote Character

I am trying to parse a CSV file using the csv.reader, my data is separated by commas and each value starts and ends with quotation marks. Example:
"This is some data", "New data", "More \"data\" here", "test"
My problem is with the third value, the data I get which has quotation marks within it has an escape character to show it is part of the data. The python CSV reader does not use this escape character so it results in incorrect parsing.
I tried code like below:
with open(filepath) as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',', quotechar='\\"')
But I get an error complaining the quotechar is not 1 character.
My current solution is just to replace all characters \" characters with a single quote ' before parsing with csv.reader - however, I would like to know if there is a better way without modifying the original data.

The issue here is that you need to define an escapechar, so that the csv reader knows to treat \" as ".
csv.reader(csv_file, quotechar='"', delimiter=',', escapechar='\\')

Escape commas when writing string to CSV

I need to prepend a comma-containing string to a CSV file using Python. Some say enclosing the string in double quotes escapes the commas within. This does not work. How do I write this string without the commas being recognized as seperators?
string = "WORD;WORD 45,90;WORD 45,90;END;"
with open('doc.csv') as f:
prepended = string + '\n' + f.read()
with open('doc.csv', 'w') as f:
f.write(prepended)

So as you point out, you can typically quote the string as below. Is the system that reads these files not recognizing that syntax? If you use python's csv module it will handle the proper escaping:
with open('output.csv', 'w', newline='') as f:
writer = csv.writer(f)
writer.writerows(myIterable, quoting=csv.QUOTE_ALL)
The quoted strings would look like:
"string1","string 2, with, commas"
Note if you have a quote character within your string it will be written as "" (two quote chars in a row):
"string1","string 2, with, commas, and "" a quote"

Writing a string to CSV using line escapes in python 3

Working in Python 3.7.
I'm currently pulling data from an API (Qualys's API, fetching a report) to be specific. It returns a string with all the report data in a CSV format with each new line designated with a '\r\n' escape.
(i.e. 'foo,bar,stuff\r\n,more stuff,data,report\r\n,etc,etc,etc\r\n')
The problem I'm having is writing this string properly to a CSV file. Every iteration of code I've tried writes the data cell by cell when viewed in Excel with the \r\n appended to where ever it was in the string all on one row, rather than on a new line.
(i.e |foo|bar|stuff\r\n|more stuff|data|report\r\n|etc|etc|etc\r\n|)
I'm just making the switch from 2 to 3 so I'm almost positive it's a syntactical error or an error with my understanding of how python 3 handles new line delimiters or something along those lines, but even after reviewing documentation, here and blog posts I just cant either cant get my head around it, or I'm consistently missing something.
current code:
def dl_report(id, title):
data = {'action': 'fetch', 'id': id}
res = a.request('/api/2.0/fo/report/', data=data)
print(type(res)) #returns string
#input('pause')
f_csv = open(title,'w', newline='\r\n')
f_csv.write(res)
f_csv.close
but i've also tried:
with open(title, 'w', newline='\r\n') as f:
writer = csv.writer(f,<tried encoding here, no luck>)
writer.writerows(res)
#anyone else looking at this, this didn't work because of the difference
#between writerow() and writerows()
and I've also tried various ways to declare newline, such as:
newline=''
newline='\n'
etc...
and various other iterations along these lines. Any suggestions or guidance or... anything at this point would be awesome.
edit:
Ok, I've continued to work on it, and this kinda works:
def dl_report(id, title):
data = {'action': 'fetch', 'id': id}
res = a.request('/api/2.0/fo/report/', data=data)
print(type(res)) #returns string
reader = csv.reader(res.split(r'\r\n'), delimiter=',')
with open(title, 'w') as outfile:
writer = csv.writer(outfile, delimiter= '\n')
writer.writerow(reader)
But its ugly, and does create errors in the output CSV (some rows (less than 1%) don't parse as a CSV row, probably a formatting error somewhere..), but more concerning is that it works wonky when a "\" is presented in data.
I would really be interested in a solution that works... better? More pythonic? more consistently would be nice...
Any ideas?

Based on your comments, the data you're being served doesn't actually include carriage returns or newlines, it includes the text representing the escapes for carriage returns and newlines (so it really has a backslash, r, backslash, n in the data). It's otherwise already in the form you want, so you don't need to involve the csv module at all, just interpret the escapes to their correct value, then write the data directly.
This is relatively simple using the unicode-escape codec (which also handles ASCII escapes):
import codecs # Needed for text->text decoding
# ... retrieve data here, store to res ...
# Converts backslash followed by r to carriage return, by n to newline,
# and so on for other escapes
decoded = codecs.decode(res, 'unicode-escape')
# newline='' means don't perform line ending conversions, so you keep \r\n
# on all systems, no adding, no removing characters
# You may want to explicitly specify an encoding like UTF-8, rather than
# relying on the system default, so your code is portable across locales
with open(title, 'w', newline='') as f:
f.write(decoded)
If the strings you receive are actually wrapped in quotes (so print(repr(s)) includes quotes on either end), it's possible they're intended to be interpreted as JSON strings. In that case, just replace the import and creation of decoded with:
import json
decoded = json.loads(res)

If I understand your question correctly, can't you just replace the string?
with open(title, 'w') as f: f.write(res.replace("¥r¥n","¥n"))

Check out this answer:
Python csv string to array
According to CSVReader's documentation, it expects \r\n as the line delimiter by default. Your string should work fine with it. If you load the string into the CSVReader object, then you should be able to check for the standard way to export it.

Python strings use the single \n newline character. Normally, a \r\n is converted to \n when a file is read
and the newline is converted \n or \r\n depending on your system default and the newline= parameter on write.
In your case, \r wasn't removed when you read it from the web interface. When you opened the file with newline='\r\n', python expanded the \n as it was supposed to, but the \r passed through and now your neline is \r\r\n. You can see that by rereading the text file in binary mode:
>>> res = 'foo,bar,stuff\r\n,more stuff,data,report\r\n,etc,etc,etc\r\n'
>>> open('test', 'w', newline='\r\n').write(res)
54
>>> open('test', 'rb').read()
b'foo,bar,stuff\r\r\n,more stuff,data,report\r\r\n,etc,etc,etc\r\r\n'
Since you already have the line endings you want, just write in binary mode and skip the conversions:
>>> open('test', 'wb').write(res.encode())
54
>>> open('test', 'rb').read()
b'foo,bar,stuff\r\n,more stuff,data,report\r\n,etc,etc,etc\r\n'
Notice I used the system default encoding, but you likely want to standardize on an encoding.

Reading a tab separated file with Python Pandas

I have encountered a problem reading a tab separated file using Pandas.
All the cell values have double quotations but for some rows, there is an extra double quotation that breaks the whole procedure. For instance:
Column A Column B Column C
"foo1" "121654" "unit"
"foo2" "1214" "unit"
"foo3" "15884""
The error I get is: Error tokenizing data. C error: Expected 31 fields in line 8355, saw 58
The code I used is:
csv = pd.read_csv(file, sep='\t',  lineterminator='\n', names=None)
and it works fine for the rest of the files but not for the ones where this extra double quotation appears.

If you cannot change the buggy input, the best way would be to read the input file into a io.StringIO object, replacing the double quotes, then pass this file-like object to pd.read (it supports filenames and file-like objects)
That way you don't have to create a temporary file or to alter the input data.
import io
with open(file) as f:
fileobject = io.StringIO(f.read().replace('""','"'))
csv = pd.read_csv(fileobject, sep='\t', lineterminator='\n', names=None)

You can do the preprocessing step to fix the quotation issue:
with open(file, 'r') as fp:
text = fp.read().replace('""', '"')
with open(file, 'w') as fp:
fp.write(text)

Python csv writer : "Unknown Dialect" Error

I have a very large string in the CSV format that will be written to a CSV file.
I try to write it to CSV using the simplest if the python script
results=""" "2013-12-03 23:59:52","/core/log","79.223.39.000","logging-4.0",iPad,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,3,"1385593191.865",true,ERROR,"app_error","iPad/Unknown/webkit/537.51.1",NA,"Does+not",false
"2013-12-03 23:58:41","/core/log","217.7.59.000","logging-4.0",Win32,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,4,"1385593120.68",true,ERROR,"app_error","Win32/Unknown/msie/9.0",NA,"Does+not,false
"2013-12-03 23:58:19","/core/client_log","79.240.195.000","logging-4.0",Win32,"5.1","1.0.1.59-266060",NA,NA,NA,NA,6,"1385593099.001",true,ERROR,"app_error","Win32/5.1/mozilla/25.0",NA,"Could+not:+{"url":"/all.json?status=ongoing,scheduled,conflict","code":0,"data":"","success":false,"error":true,"cached":false,"jqXhr":{"readyState":0,"responseText":"","status":0,"statusText":"error"}}",false"""
resultArray = results.split('\n')
with open(csvfile, 'wb') as f:
writer = csv.writer(f)
for row in resultArray:
writer.writerows(row)
The code returns
"Unknown Dialect"
Error
Is the error because of the script or is it due to the string that is being written?
EDIT
If the problem is bad input how do I sanitize it so that it can be used by the csv.writer() method?

You need to specify the format of your string:
with open(csvfile, 'wb') as f:
writer = csv.writer(f, delimiter=',', quotechar="'", quoting=csv.QUOTE_ALL)
You might also want to re-visit your writing loop; the way you have it written you will get one column in your file, and each row will be one character from the results string.
To really exploit the module, try this:
import csv
lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]
reader = csv.reader(lines, quotechar="'")
with open('out.csv', 'wb') as f:
writer = csv.writer(f)
writer.writerows(list(reader))
out.csv will have:
A,bunch+of,multiline,"CSV,LIKE,STRING"
If you want to quote all the column values, then add quoting=csv.QUOTE_ALL to the writer object; then you file will have:
"A","bunch+of","multiline","CSV,LIKE,STRING"
To change the quotes to ', add quotechar="'" to the writer object.

The above code does not give csv.writer.writerows input that it expects. Specifically:
resultArray = results.split('\n')
This creates a list of strings. Then, you pass each string to your writer and tell it to writerows with it:
for row in resultArray:
writer.writerows(row)
But writerows does not expect a single string. From the docs:
csvwriter.writerows(rows)
Write all the rows parameters (a list of row objects as described above) to the writer’s file object, formatted according to the current dialect.
So you're passing a string to a method that expects its argument to be a list of row objects, where a row object is itself expected to be a sequence of strings or numbers:
A row must be a sequence of strings or numbers for Writer objects
Are you sure your listed example code accurately reflects your attempt? While it certainly won't work, I would expect the exception produced to be different.
For a possible fix - if all you are trying to do is to write a big string to a file, you don't need the csv library at all. You can just write the string directly. Even splitting on newlines is unnecessary unless you need to do something like replacing Unix-style linefeeds with DOS-style linefeeds.
If you need to use the csv module after all, you need to give your writer something it understands - in this example, that would be something like writer.writerow(['A','bunch+of','multiline','CSV,LIKE,STRING']). Note that that's a true Python list of strings. If you need to turn your raw string "'A','bunch+of','multiline','CSV,LIKE,STRING'" into such a list, I think you'll find the csv library useful as a reader - no need to reinvent the wheel to handle the quoted commas in the substring 'CSV,LIKE,STRING'. And in that case you would need to care about your dialect.

you can use 'register_dialect':
for example for escaped formatting:
csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.