Python CSV Parsing, Escaped Quote Character - python

I am trying to parse a CSV file using csv.reader. My data is separated by commas, and each value starts and ends with quotation marks. Example:
"This is some data", "New data", "More \"data\" here", "test"
My problem is with the third value: data that contains quotation marks uses an escape character to show that they are part of the data. The Python CSV reader does not recognize this escape character by default, so the line is parsed incorrectly.
I tried the code below:
with open(filepath) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',', quotechar='\\"')
But I get an error complaining that the quotechar is not 1 character long.
My current solution is to replace every \" with a single quote ' before parsing with csv.reader. However, I would like to know if there is a better way that does not modify the original data.

The issue here is that you need to define an escapechar, so that the csv reader knows to treat \" as ".
csv.reader(csv_file, quotechar='"', delimiter=',', escapechar='\\')
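A minimal sketch of that call on the sample line from the question, assuming the line arrives exactly as shown, with a space after each comma. Because of those spaces, skipinitialspace=True is added here. That option is not part of the answer above, but without it the reader does not treat a quote that follows a space as the start of a quoted field.

```python
import csv

# Sample line from the question; a raw string keeps the backslashes literal.
line = r'"This is some data", "New data", "More \"data\" here", "test"'

reader = csv.reader(
    [line],                  # csv.reader accepts any iterable of lines
    quotechar='"',
    delimiter=',',
    escapechar='\\',         # treat \" inside a field as a literal "
    skipinitialspace=True,   # assumption: ignore the space after each comma
)
row = next(reader)
print(row)
```

The third field comes back as More "data" here, with the backslashes removed and the original file left untouched.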

Related

How to write string to csv that contain escape chars?

I am trying to write a list of strings to csv using csv.writer.
writer = csv.writer(f)
writer.writerow(some_text)
However, some of the strings contain a random escape character, which seems to be causing the following error: _csv.Error: need to escape, but no escapechar set
I've tried using the escapechar option in csv.writer, like the following:
writer = csv.writer(f, escapechar='\\')
but this seems to be only a partial solution, since the newline characters (\n) are not recognized.
How would I solve this problem? An example of a problematic string would be the following:
problem_string = "this \n sentence \% is \n problematic \g"
What format do you want to achieve in the end? Writing this to a csv seems to be leading to some odd outcomes anyway.
In any case, both of these snippets work for me without errors, giving slightly different results with respect to escape characters.
With a normal string:
import csv
with open('test2.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    problem_string = "this \n sentence \% is \n problematic \g"
    # Wrap the string in a list; a bare string is written one character per field
    csvwriter.writerow([problem_string])
With a raw string:
import csv
with open('test2.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    problem_string = r"this \n sentence \% is \n problematic \g"
    # Wrap the string in a list; a bare string is written one character per field
    csvwriter.writerow([problem_string])
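A quick way to check what the default writer does with such a string is to write it to an in-memory buffer and read it back. This is only a sketch: io.StringIO stands in for the file, and the backslashes are doubled so the literal has the same value as the normal string above without relying on unrecognized escape sequences.

```python
import csv
import io

# Same value as the normal string above: real newlines plus literal backslashes.
problem_string = "this \n sentence \\% is \n problematic \\g"

buf = io.StringIO(newline='')        # in-memory stand-in for the file
csv.writer(buf).writerow([problem_string])

written = buf.getvalue()             # the field is quoted because it contains newlines
buf.seek(0)
rows = list(csv.reader(buf))         # the reader reassembles the multi-line field

print(repr(written))
print(rows == [[problem_string]])
```

With the default settings no escapechar is needed here: the writer wraps the field in quotes, and the reader returns the original string, newlines included.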

remove double quotes in each row (csv writer)

I'm writing API results to a CSV file in Python 3.7. The problem is that it adds double quotes ("") to each row when it writes to the file.
I'm passing csv as the format to the API call, so I get the results in CSV format, and then I write them to a CSV file stored at a specific location.
Please suggest if there is a better way to do this.
Here is the sample code:
with open(target_file_path, 'w', encoding='utf8') as csvFile:
    writer = csv.writer(csvFile, quoting=csv.QUOTE_NONE, escapechar='\"')
    for line in rec.split('\r\n'):
        writer.writerow([line])
When I use escapechar='\"' it adds (") at the end of every column value.
Here are some sample records:
2264855868",42.38454",-71.01367",07/15/2019 00:00:00",07/14/2019 20:00:00"
2264855868",42.38454",-71.01367",07/15/2019 01:00:00",07/14/2019 21:00:00"
The API gives you a string/bytes, which you can write directly to a file.
data = requests.get(..).content
with open(filename, 'wb') as f:
    f.write(data)
With csv.writer you would have to convert the string/bytes to Python data using csv.reader and then convert it back to a string/bytes with csv.writer, so there is no point in doing it.
The same method should work if the API sends any file: JSON, CSV, XML, PDF, images, audio, etc.
For bigger files you could use chunked/streamed downloads in requests. Doc: requests - Advanced Usage
Have you tried removing the backslash from escapechar='\"'? It shouldn't be necessary, since you are using single quotes for the string.
EDIT: From the documentation:
A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the following character.
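And the delimiter:
A one-character string used to separate fields. It defaults to ','
So it is going to escape the delimiter (,) with whatever you set as the escapechar, in this case "
If you don't want any escaping, try leaving escapechar unset. Note that with QUOTE_NONE the writer then raises _csv.Error (need to escape, but no escapechar set) if a field contains the delimiter.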
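The effect described in the documentation excerpt can be reproduced in a few lines. A sketch under two assumptions: io.StringIO replaces the real file, and a backslash is used as the escapechar instead of the question's quote character, only so the escapes are easy to spot. The sample value is a shortened version of one record from the question.

```python
import csv
import io

line = '2264855868,42.38454,-71.01367'

# The whole line written as ONE field with QUOTE_NONE:
# every comma inside the field has to be escaped.
buf = io.StringIO(newline='')
csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar='\\').writerow([line])
escaped = buf.getvalue()

# The same line split into separate fields first: nothing needs escaping.
buf = io.StringIO(newline='')
csv.writer(buf).writerow(line.split(','))
clean = buf.getvalue()

print(repr(escaped))
print(repr(clean))
```

This suggests the stray characters come from passing an entire CSV line as a single field, so the writer has to escape the commas it finds inside that field.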
Try:
import codecs

def find_replace(file, search_characters, replace_with):
    # Read the whole file and replace every occurrence of search_characters
    with codecs.open(file, "r", "utf-8-sig") as src:
        text = src.read().replace(search_characters, replace_with)
    # Write the cleaned text back to the same file
    with codecs.open(file, "w", "utf-8-sig") as dst:
        dst.write(text)

if __name__ == '__main__':
    file = "target_file_path"
    search_characters = '"'
    replace_with = ''
    find_replace(file, search_characters, replace_with)
output:
2264855868,42.38454,-71.01367,07/15/2019 00:00:00,07/14/2019 20:00:00
2264855868,42.38454,-71.01367,07/15/2019 01:00:00,07/14/2019 21:00:00

Escape commas when writing string to CSV

I need to prepend a comma-containing string to a CSV file using Python. Some say that enclosing the string in double quotes escapes the commas within it. This does not work. How do I write this string without the commas being recognized as separators?
string = "WORD;WORD 45,90;WORD 45,90;END;"
with open('doc.csv') as f:
    prepended = string + '\n' + f.read()
with open('doc.csv', 'w') as f:
    f.write(prepended)
So as you point out, you can typically quote the string as below. Is the system that reads these files not recognizing that syntax? If you use Python's csv module, it will handle the proper escaping:
import csv
with open('output.csv', 'w', newline='') as f:
    # quoting is an option of csv.writer, not of writerows
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(myIterable)
The quoted strings would look like:
"string1","string 2, with, commas"
Note if you have a quote character within your string it will be written as "" (two quote chars in a row):
"string1","string 2, with, commas, and "" a quote"

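The quoting behaviour described in the answer above can be checked without touching a file. A minimal sketch, assuming io.StringIO as a stand-in for output.csv and a single made-up row:

```python
import csv
import io

# Hypothetical row: the second field contains commas and a quote character.
rows = [['string1', 'string 2, with, commas, and " a quote']]

buf = io.StringIO(newline='')
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)   # quote every field
writer.writerows(rows)
output = buf.getvalue()

# Reading the text back with the default reader restores the original fields.
parsed = list(csv.reader(io.StringIO(output, newline='')))

print(output)
print(parsed == rows)
```

The commas stay inside the second field, and the embedded quote is written doubled and read back as a single quote.
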
Python CSV Reader splitting on comma inside of quotes

import csv
from csv import reader
csv_reader_results = reader(["办公室弥漫着\"女红\"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉,,,,用心做东西感觉真好!!!"],
                            escapechar='\\',
                            quotechar='"',
                            delimiter=',',
                            quoting=csv.QUOTE_ALL,
                            skipinitialspace=True)
for result in csv_reader_results:
    print(result[0])
What I'm expecting is:
办公室弥漫着"女红"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉,,,,用心做东西感觉真好!!!
But what I'm getting is:
办公室弥漫着"女红"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉
Because it splits on the four commas inside the sentence.
I'm escaping the quotes inside of the sentence. I've set the quotechar and escapechar for csv.reader. What am I doing wrong here?
Edit:
I used the answer by j6m8 https://stackoverflow.com/a/19881343/3945463 as a workaround. But it would be preferable to learn the correct way to do this with csv reader.
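One likely cause, offered as a sketch rather than a confirmed diagnosis: in a normal Python string literal, \" is just ", so the reader never sees a backslash, and the sentence is not wrapped in quote characters, so its commas act as delimiters. The example below uses a short made-up line instead of the original sentence, assumes the field is meant to be enclosed in quotes, and uses a raw string so the backslashes reach the reader.

```python
import csv

# Hypothetical line: the whole field is wrapped in quotes, and the inner
# quotes carry a real backslash (the raw string keeps it).
line = r'"say \"hi\" now,,,,and more"'

rows = list(csv.reader([line],
                       escapechar='\\',
                       quotechar='"',
                       delimiter=','))

print(rows[0][0])
```

With the enclosing quotes in place, the four commas stay inside the single field and the escaped quotes come back as plain quote characters.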

csv module automatically writing unwanted carriage returns

When using Python's csv module to create a CSV, it automatically puts carriage return characters at the end of strings if the string has a comma inside it, e.g.:
['this one will have a carriage return, at the end','this one wont']
In an Excel sheet this will turn out like:
| |this on|
because of the extra carriage return. It also surrounds the string that contains the comma with double quotes, as expected.
The code I am using is:
with open(oldfile, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)
How do I create a CSV using the same data format that won't have carriage returns when the strings contain commas? I don't mind the strings being surrounded by double quotes, though.
Here's a link to the diagnosis of the problem with the output .csv:
Excel showing empty cells when importing file created with csv module
It's the accepted answer.
I have changed my code to:
with open(oldfile, 'w', newline='', quoting=csv.QUOTE_MINIMAL) as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)
I am now getting the error:
TypeError: 'quoting' is an invalid keyword argument for this function
Python's built-in csv module has the option csv.QUOTE_MINIMAL. quoting is an argument of csv.writer, not of open, which is why passing it to open raises the TypeError. When this option is passed to the writer, it adds quote marks when the delimiter is in the given string: "your text, with comma", "other field". This will eliminate the need for carriage returns.
The code is:
with open(oldfile, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
    for row in data:
        writer.writerow(row)
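To see exactly which characters the writer emits, the row from the question can be written to an in-memory buffer. A sketch, assuming io.StringIO in place of the real file. The lineterminator='\n' variant is an optional extra, not something the answer above requires.

```python
import csv
import io

row = ['this one will have a carriage return, at the end', 'this one wont']

# Default dialect: each row ends with '\r\n', and only the field
# that contains a comma is quoted.
buf = io.StringIO(newline='')
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(row)
default_output = buf.getvalue()

# Optional: ask the writer to end rows with a bare '\n' instead.
buf = io.StringIO(newline='')
csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator='\n').writerow(row)
lf_output = buf.getvalue()

print(repr(default_output))
print(repr(lf_output))
```

The carriage return belongs to the row terminator rather than to the quoted field, so it appears once per row regardless of which fields contain commas.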
