How to parse csv with quoted strings - advanced case - python

I'm trying to get the csv module to parse lines containing quoted strings and quoted separators. Unfortunately, I'm not able to achieve the desired result with any dialect/format parameters. Is there any way to parse this:
'"AAA", BBB, "CCC, CCC"'
and get this:
['"AAA"', 'BBB', '"CCC, CCC"'] # 3 elements, one quoted separator
?
Two fundamental requirements:
Quotations have to be preserved
Quoted (not escaped) separators have to be copied as regular characters
Is it possible?

There are 2 issues to overcome:
spaces around comma separator: skipinitialspace=True does the job (see also Python parse CSV ignoring comma with double-quotes)
preserving quoting when reading: replacing each quote with three quotes makes the reader keep one literal quote on each side of a quoted field
That second part is described in the documentation as:
Dialect.doublequote
Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.
standalone example, without file:
import csv
data = ['"AAA", BBB, "CCC, CCC"'.replace('"','"""')]
cr = csv.reader(data,skipinitialspace=True)
row = next(cr)
print(row)
result:
['"AAA"', 'BBB', '"CCC, CCC"']
with a file as input:
import csv
with open("input.csv", newline="") as f:
    # triple every quote on the fly so the reader keeps the original quotes
    cr = csv.reader((l.replace('"', '"""') for l in f), skipinitialspace=True)
    for row in cr:
        print(row)
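The quote-tripling trick above can be wrapped in a small helper for single lines. This is a minimal sketch: the function name `split_preserving_quotes` is hypothetical, and it has only been checked against the input from the question. Fields that already contain doubled quotes are outside what it handles.

```python
import csv

def split_preserving_quotes(line):
    """Split one CSV line, keeping the quotes around quoted fields.

    Hypothetical helper: triples every quote so that csv.reader emits one
    literal quote where the original opening/closing quote was.
    """
    tripled = line.replace('"', '"""')
    return next(csv.reader([tripled], skipinitialspace=True))

print(split_preserving_quotes('"AAA", BBB, "CCC, CCC"'))
# ['"AAA"', 'BBB', '"CCC, CCC"']
```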

Have you tried this?
import csv
with open('file.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        print(row)

Related

How to use multiple different characters for quotechar in Python csv reader

I am trying to read a CSV file that sometimes uses double quotes (") for strings and sometimes uses single quotes (') for strings.
I would like to read the file to properly handle these strings.
It is not necessary but it would be helpful if " don't " was parsed correctly. This is why I want to avoid just replacing every ' for ".
A crude way to handle this would be to use regex to detect any single quotation which is either preceded by a space, or followed by a space.
We can then replace just these quotations with " and ignore the ones which have letters directly next to them.
CSV
"""Let's do a test""","""We will replace all 'single' quotation's not within""","""A word to """""
Python
import re
pattern = r'((?<=\s)\')|(\'(?=\s))'
data = []
with open('hello.csv', 'r') as file:
    for row in file.readlines():
        data.append(re.sub(pattern, '"', row))
Output
['"""Let\'s do a test""","""We will replace all "single" quotation\'s not within""","""A word to """""\n']
You can use quoting=csv.QUOTE_NONE to prevent quote processing by csv.reader, then use ast.literal_eval to interpret values as Python literals (or, if that fails, keep them as strings).
import io
import csv
from ast import literal_eval
def unquote(item):
    item = item.strip()
    try:
        return literal_eval(item)
    except (ValueError, SyntaxError):
        # not a valid Python literal: keep the raw string
        return item
f = io.StringIO(r'''
bare, "John \"O'Brien\" Smith", 'John "O\'Brien" Smith', 42
'''.strip())
reader = csv.reader(f, quoting=csv.QUOTE_NONE)
for row in reader:
    parsed_row = [unquote(item) for item in row]
    print(parsed_row)
# => ['bare', 'John "O\'Brien" Smith', 'John "O\'Brien" Smith', 42]
Note though that since they are evaluated as Python literals, any unquoted field values that represent valid Python literals (e.g. True or 42) will not remain as strings.
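If unquoted values such as 42 or True should stay strings, one option is to evaluate only the items that are visibly wrapped in matching quotes. The following is a sketch under that assumption; `unquote_strings_only` and `QUOTE_CHARS` are hypothetical names, not part of the answer above.

```python
import csv
from ast import literal_eval

QUOTE_CHARS = ('"', "'")  # hypothetical: the two quote styles from the question

def unquote_strings_only(item):
    """Evaluate quoted items as Python string literals; leave the rest as text."""
    item = item.strip()
    if len(item) >= 2 and item[0] in QUOTE_CHARS and item[-1] == item[0]:
        try:
            return literal_eval(item)
        except (ValueError, SyntaxError):
            return item  # malformed literal: keep the raw text
    return item

line = r'''
bare, "John \"O'Brien\" Smith", 'don\'t', 42, True
'''.strip()

row = next(csv.reader([line], quoting=csv.QUOTE_NONE))
parsed = [unquote_strings_only(item) for item in row]
print(parsed)
# ['bare', 'John "O\'Brien" Smith', "don't", '42', 'True']
```

As with the original approach, QUOTE_NONE splits on every comma, so this only suits data whose quoted values contain no commas.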

Python reading CSV: a column wrapped in doubled double quotes (""text,text2"") is split at the embedded comma

I have been sent a CSV by another system, with a comma as the delimiter. One row has a column with a sample value like:
,""ABC. & XYZ (CfdfB,afGgM)_0110"" , .
This column causes an error for that row.
While debugging, I read the file with Python and printed the row; this particular value is printed as:
'ABC. & XYZ (CfdfB', ' afGgM)_0110""'
So the value is getting split, the reason being the doubled " and the comma in between.
The code used is:
import csv
with open(abccsv, "r", newline='', encoding="UTF-8") as file:
    reader = csv.reader(file, quotechar='"', delimiter=",", quoting=csv.QUOTE_ALL)
    # counter = 0
    for row in reader:
        print(row)
Try a delimiter other than a comma. The value is split because the reader uses the comma as its delimiter for CSV files. Try a tab or some other character.
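When the file comes from another system and its delimiter cannot be changed, a possible workaround is to rewrite the `""…""` wrapping into ordinary `"…"` quoting before the line reaches csv.reader. The sketch below is a heuristic, not a general CSV repair: the pattern `DOUBLED_WRAP` and the helper `collapse_doubled_wrapping` are hypothetical, and they assume the wrapped text contains no double quote of its own.

```python
import csv
import re

# Heuristic: a whole field of the form ""text"", where text contains no
# double quote and does not start with a comma (so an empty "" field is skipped).
DOUBLED_WRAP = re.compile(r'(^|,)[ \t]*""([^",][^"]*)""[ \t]*(?=[,\r\n]|$)')

def collapse_doubled_wrapping(line):
    """Turn ,""text"", into ,"text", so the embedded comma stays in one field."""
    return DOUBLED_WRAP.sub(r'\1"\2"', line)

raw = '1,""ABC. & XYZ (CfdfB,afGgM)_0110"" ,2'

# Default behaviour: "" is read as an empty quoted string, then the field
# continues unquoted, so the comma splits it.
print(next(csv.reader([raw])))
# ['1', 'ABC. & XYZ (CfdfB', 'afGgM)_0110"" ', '2']

print(next(csv.reader([collapse_doubled_wrapping(raw)])))
# ['1', 'ABC. & XYZ (CfdfB,afGgM)_0110', '2']
```

For a file, the same function could be applied lazily, for example `csv.reader(collapse_doubled_wrapping(l) for l in file)`.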

Write without quotes in empty columns in a CSV?

I need to modify some columns of a CSV file to add some text to them. Once I've modified those columns, I write the whole row, with the modified columns, to a new CSV file, but it does not keep the original format, as it adds "" in the empty columns.
The original CSV is a special dialect that I've registered as:
csv.register_dialect('puntocoma', delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL)
And this is part of my code:
with open(fileName, 'rt', newline='', encoding='ISO8859-1') as fdata, \
     open(r'SampleFiles\Servergiro\fout.csv',
          'wt', newline='', encoding='ISO8859-1') as fout:
    reader = csv.DictReader(fdata, dialect='puntocoma')
    writer = csv.writer(fout, dialect='puntocoma')
I am reading the CSV with DictReader from the csv module.
Then I modify the column that I need:
for row in reader:
    for (key, value) in row.items():
        if key == 'C' or key == 'D' or key == 'E':
            if row[key] != "":
                row[key] = '<something>' + value + '</something>'
And I write the modified content as follows:
content = list(row[i] for i in fields)
writer.writerow(content)
The original CSV has content like (header included):
"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"Ma";"No";"O 3";"O 4";"O 5"
"3123131";"Lorem";;;;;;;;;;;"Ipsum";"Ar";"Maquina Lorem";;;
"3003321";"HD 2.5' EP";;"asät 600 MB<br />Ere qweqsI (SAS)<br />tre qwe 15000 RPM<br />sasd ty 2.5 Zor<br />Areämis tyn<br />Ser Ja<br />Ütr ewas/s";;;;;;;;;"rew";"Asert ";"Trebol";"Casa";;
"3026273";"Sertro 5 M";;;;;;;;;;;"Rese";"Asert ";"Trebol";"Casa";;
But my modified CSV writes the following:
"3123131";"<something>Lorem</something>";"";"";"";"";"";"";"";"";"";"";"<something>Ipsum</something>";"<something>Ar</something>";"<something>Maquina Lorem</something>";"";"";""
I've modified the original question, adding the headers of the CSV. (The names of the headers are not the original ones.)
How can I write the new CSV without quotes in the empty columns? My guess is that it is about the dialect, but in reality it is a quote-all dialect except for columns that are empty.
It seems that you either have quotes everywhere (QUOTE_ALL) or no quotes unless a field needs them (QUOTE_MINIMAL) (and other exotic options useless here).
I first posted a solution which wrote in a file buffer, then replaced the double quotes by nothing, but it was really a hack and could not manage strings containing quotes properly.
A better solution is to manage the quoting manually: force quotes if the string is not empty, and write none if it is empty:
import csv
with open("input.csv", newline="") as fr, open("output.csv", "w", newline="") as fw:
    csv.register_dialect('puntocoma', delimiter=';', quotechar='"')
    cr = csv.reader(fr, dialect="puntocoma")
    # quotechar=None disables the quote character (recent Python versions reject an empty string here)
    cw = csv.writer(fw, delimiter=';', quotechar=None, escapechar="\\", quoting=csv.QUOTE_NONE)
    cw.writerows(['"{}"'.format(x.replace('"', '""')) if x else "" for x in row] for row in cr)
Here we tell csv to write no quotes at all (and we even disable the quote character). The manual quoting consists of generating the rows with a list comprehension that quotes a string only if it is not empty, and doubles any quotes inside the string.
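One thing to watch with the QUOTE_NONE writer: it still escapes any delimiter that appears inside a field, so a value containing `;` would get a backslash added inside the manual quotes. A possible variation, sketched below, is to skip csv.writer for output and join the manually quoted fields directly. The helpers `quote_non_empty` and `format_row` are hypothetical names, and the sample line is made up for illustration.

```python
import csv
import io

def quote_non_empty(value, quotechar='"'):
    """Quote a value and double its inner quotes; leave empty values bare."""
    if value == "":
        return ""
    return quotechar + value.replace(quotechar, quotechar * 2) + quotechar

def format_row(row, delimiter=";"):
    """Build one output line (without line terminator) from a parsed row."""
    return delimiter.join(quote_non_empty(v) for v in row)

# Made-up sample: empty columns, an inner quote, and a delimiter inside a field.
source = '"3123131";"Lorem";;"say ""hi""";"a;b"\r\n'

reader = csv.reader(io.StringIO(source, newline=""), delimiter=";", quotechar='"')
out = io.StringIO(newline="")
for row in reader:
    out.write(format_row(row) + "\r\n")

print(out.getvalue() == source)  # True: the original layout is reproduced
```

Because the reader returns an empty string for both an empty column and a `""` column, every empty value ends up unquoted on output.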

How to remove more than one space when reading text file

Problem: I cannot seem to parse the information in a text file because Python reads each line as one full string, not as individual separate strings. The whitespace between the values is not a \t, which is why the line is not split. Is there a way for Python to flexibly remove the spaces and put a comma or \t instead?
Example DATA:
MOR125-1 MOR129-1 0.587
MOR125-1 MOR129-3 0.598
MOR129-1 MOR129-3 0.115
The code I am using:
import csv
with open("Distance_Data_No_Bootstrap_RAW.txt", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    d = list(reader)
for i in range(3):
    print(d[i])
Output:
['MOR125-1 MOR129-1 0.587']
['MOR125-1 MOR129-3 0.598']
['MOR129-1 MOR129-3 0.115']
Desired Output:
['MOR125-1', 'MOR129-1', '0.587']
['MOR125-1', 'MOR129-3', '0.598']
['MOR129-1', 'MOR129-3', '0.115']
You can simply declare the delimiter to be a space, and ask csv to skip initial spaces after a delimiter. That way, your separator is in fact the regular expression ' +', that is one or more spaces.
rd = csv.reader(fd, delimiter=' ', skipinitialspace=True)
for row in rd:
    print(row)
['MOR125-1', 'MOR129-1', '0.587']
['MOR125-1', 'MOR129-3', '0.598']
['MOR129-1', 'MOR129-3', '0.115']
You can instruct csv.reader to use space as delimiter and skip all the extra space:
reader = csv.reader(f, delimiter=" ", skipinitialspace=True)
For detailed information about available parameters check Python docs:
Dialect.delimiter
A one-character string used to separate fields. It defaults to ','.
Dialect.skipinitialspace
When True, whitespace immediately following the delimiter is ignored. The default is False.
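To see the two parameters working together, here is a small self-contained sketch on in-memory data with runs of spaces (the sample text is an assumption modelled on the question's data). It also shows `str.split()` with no arguments, a different technique that splits on any run of whitespace and may be enough when the file has no quoted fields.

```python
import csv
import io

# Assumed sample: columns separated by a varying number of spaces.
text = "MOR125-1   MOR129-1    0.587\nMOR125-1 MOR129-3  0.598\n"

# csv.reader: space as delimiter, extra spaces after a delimiter are skipped.
rows_csv = list(csv.reader(io.StringIO(text), delimiter=" ", skipinitialspace=True))

# Plain str.split(): splits on any whitespace run (spaces or tabs).
rows_split = [line.split() for line in text.splitlines() if line.strip()]

print(rows_csv)
# [['MOR125-1', 'MOR129-1', '0.587'], ['MOR125-1', 'MOR129-3', '0.598']]
print(rows_csv == rows_split)
```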

How can I quote or escape characters with csv writer in Python

I am writing the csv file like this
for a in products:
    mylist = []
    for h in headers['product']:
        mylist.append(a.get(h))
    writer.writerow(mylist)
A few of my fields are text fields that can contain any characters, like , " ' \n or anything else. What is the safest way to write them to a CSV file? The file will also have integers and floats.
You should use QUOTE_ALL quoting option:
import io
import csv
row = ["AAA \n BBB ,222 \n CCC;DDD \" EEE ' FFF 111"]
output = io.StringIO(newline="")
wr = csv.writer(output, quoting=csv.QUOTE_ALL)
wr.writerow(row)
# Test: read the written text back and compare it with the original row
contents = output.getvalue()
parsedRow = list(csv.reader(io.StringIO(contents, newline="")))[0]
if parsedRow == row:
    print("BINGO!")
Using csv.QUOTE_ALL will ensure that all of your entries are quoted, like so:
"value1","value2","value3"
while using csv.QUOTE_NONE will give you:
value1,value2,value3
Additionally, QUOTE_ALL doubles every quote inside an entry: somedata"user"somemoredata will become
"somedata""user""somemoredata" in your written .csv
However, if you use QUOTE_NONE and set your escapechar to the backslash character (for example), every quote will be written as \" instead.
# rows: any iterable of row sequences
with open("test.csv", "w", newline="") as out:
    create = csv.writer(out, quoting=csv.QUOTE_NONE, escapechar='\\', quotechar='"')
    for element in rows:
        create.writerow(element)
and the previous example will become somedata\"user\"somemoredata, which is clean. It will also escape any commas that you have in your elements the same way.
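The two styles can be compared side by side on in-memory buffers. This is a sketch with hypothetical helpers `dump` and `load` and made-up sample rows; note that numbers come back as strings after a round trip.

```python
import csv
import io

def dump(row, **fmt):
    """Write one row to an in-memory buffer and return the CSV text."""
    buf = io.StringIO(newline="")
    csv.writer(buf, **fmt).writerow(row)
    return buf.getvalue()

def load(text, **fmt):
    """Parse CSV text and return its first row."""
    return next(csv.reader(io.StringIO(text, newline=""), **fmt))

# QUOTE_ALL: every field is quoted, inner quotes are doubled.
row = ['somedata"user"somemoredata', "a,b", "two\nlines", 42, 3.5]
quoted = dump(row, quoting=csv.QUOTE_ALL)
print(repr(quoted))
# '"somedata""user""somemoredata","a,b","two\nlines","42","3.5"\r\n'
print(load(quoted) == [str(v) for v in row])  # True

# QUOTE_NONE + escapechar: nothing is quoted, special characters are escaped.
simple = ['somedata"user"somemoredata', "a,b", 42]
escaped = dump(simple, quoting=csv.QUOTE_NONE, escapechar="\\")
print(repr(escaped))
# 'somedata\\"user\\"somemoredata,a\\,b,42\r\n'
print(load(escaped, quoting=csv.QUOTE_NONE, escapechar="\\"))
# ['somedata"user"somemoredata', 'a,b', '42']
```

The reader has to be given the same quoting and escapechar settings as the writer, otherwise the backslashes are read back as data.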
