I am writing the CSV file like this:
for a in products:
    mylist = []
    for h in headers['product']:
        mylist.append(a.get(h))
    writer.writerow(mylist)
Some of my fields are text fields that can contain any characters, such as , " ' \n or anything else. What is the safest way to write them to a CSV file? The file will also contain integers and floats.
You should use QUOTE_ALL quoting option:
import StringIO
import csv
row = ["AAA \n BBB ,222 \n CCC;DDD \" EEE ' FFF 111"]
output = StringIO.StringIO()
wr = csv.writer(output, quoting=csv.QUOTE_ALL)
wr.writerow( row )
# Test:
contents = output.getvalue()
parsedRow = list(csv.reader([contents]))[0]
if parsedRow == row: print "BINGO!"
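For Python 3, where StringIO lives in the io module and print is a function, the same round trip could be sketched roughly as follows. The sample values and the names rows, parsed and expected are made up for illustration. Note that with QUOTE_ALL the integer and float come back as strings and would need converting after reading.

```python
import csv
import io

# Illustrative data: awkward text plus an integer and a float.
rows = [["AAA \n BBB ,222", 'CCC;DDD " EEE \' FFF', 42, 3.5]]

# newline="" leaves line-ending handling to the csv module.
output = io.StringIO(newline="")
writer = csv.writer(output, quoting=csv.QUOTE_ALL)
writer.writerows(rows)

# Read the text back; every field returns as a string.
output.seek(0)
parsed = list(csv.reader(output))

# Compare against the string form of the original values.
expected = [[str(value) for value in row] for row in rows]
print(parsed == expected)
```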
Using csv.QUOTE_ALL will ensure that all of your entries are quoted like so:
"value1","value2","value3" while using csv.QUOTE_NONE will give you: value1,value2,value3
Additionally, QUOTE_ALL will double every quote character inside an entry. For example, somedata"user"somemoredata will become
"somedata""user""somemoredata" in your written .csv
However, if you use csv.QUOTE_NONE and set your escapechar to the backslash character (for example), every quote in an entry will be written as \" instead.
create = csv.writer(open("test.csv", "wb"), quoting=csv.QUOTE_NONE, escapechar='\\', quotechar='"')
for element in file:
    create.writerow(element)
and the previous example will become somedata\"user\"somemoredata which is clean. It will also escape any commas that you have in your elements the same way.
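A small Python 3 sketch of that escaping behaviour, writing to an in-memory buffer instead of test.csv. The second field "a,b" and the names buffer, line and restored are assumptions added for illustration; reading with the same settings should undo the escaping.

```python
import csv
import io

row = ['somedata"user"somemoredata', "a,b"]

# QUOTE_NONE never wraps fields in quotes, so special characters
# (the quote character and the delimiter) are prefixed with escapechar.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_NONE, escapechar="\\", quotechar='"')
writer.writerow(row)
line = buffer.getvalue()
print(line)

# A reader configured the same way removes the backslashes again.
buffer.seek(0)
reader = csv.reader(buffer, quoting=csv.QUOTE_NONE, escapechar="\\", quotechar='"')
restored = next(reader)
print(restored == row)
```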
Related
I am trying to read a CSV file that sometimes uses double quotes (") for strings and sometimes uses single quotes (') for strings.
I would like to read the file to properly handle these strings.
It is not necessary but it would be helpful if " don't " was parsed correctly. This is why I want to avoid just replacing every ' for ".
A crude way to handle this would be to use regex to detect any single quotation which is either preceded by a space, or followed by a space.
We can then replace just these quotations with " and ignore the ones which have letters directly next to them.
CSV
"""Let's do a test""","""We will replace all 'single' quotation's not within""","""A word to """""
Python
import re
pattern = r'((?<=\s)\')|(\'(?=\s))'
data = []
with open('hello.csv', 'r') as file:
    for row in file.readlines():
        data.append(re.sub(pattern, '"', row))
Output
['"""Let\'s do a test""","""We will replace all "single" quotation\'s not within""","""A word to """""\n']
You can use quoting=csv.QUOTE_NONE to prevent quote processing by csv.reader, then use ast.literal_eval to interpret values as Python literals (or, if that fails, keep them as strings).
import io
import csv
from ast import literal_eval
def unquote(item):
    item = item.strip()
    try:
        return literal_eval(item)
    except (ValueError, SyntaxError):  # not a valid Python literal: keep the raw string
        return item
f = io.StringIO(r'''
bare, "John \"O'Brien\" Smith", 'John "O\'Brien" Smith', 42
'''.strip())
reader = csv.reader(f, quoting=csv.QUOTE_NONE)
for row in reader:
    parsed_row = [unquote(item) for item in row]
    print(parsed_row)
# => ['bare', 'John "O\'Brien" Smith', 'John "O\'Brien" Smith', 42]
Note though that since they are evaluated as Python literals, any unquoted field values that represent valid Python literals (e.g. True or 42) will not remain as strings.
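If that side effect is unwanted, one possible variation is to evaluate only the items that start with a quote character and to keep everything else as text. The helper name unquote_strings_only is hypothetical, and this is a sketch of the idea, not a complete parser.

```python
from ast import literal_eval


def unquote_strings_only(item):
    """Strip quotes from quoted items; leave everything else as a string."""
    item = item.strip()
    if item[:1] in ("'", '"'):
        try:
            value = literal_eval(item)
        except (ValueError, SyntaxError):
            # Looks quoted but is not a valid literal: keep it untouched.
            return item
        if isinstance(value, str):
            return value
    return item


samples = ["42", "True", " 'it works' ", '"John \\"O\'Brien\\" Smith"', "'unterminated"]
print([unquote_strings_only(s) for s in samples])
```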
I have been sent a CSV file by another system, with comma as the delimiter. One row has a column with a sample value like this:
,""ABC. & XYZ (CfdfB,afGgM)_0110"" , .
This column causes an error when the row is processed.
While debugging, I read the file with Python and printed the row; this particular value is printed as:
'ABC. & XYZ (CfdfB', ' afGgM)_0110""'
So this value is getting split, the reason being the doubled " and the comma in between.
The code used is:
with open(abccsv, "r", newline='', encoding="UTF-8") as file:
    reader = csv.reader(file, quotechar='"', delimiter=",", quoting=csv.QUOTE_ALL)
    # counter = 0
    for row in reader:
        print(row)
Try a delimiter other than comma. The value is split because the reader uses comma as its delimiter for CSV files. Try tab or something else.
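As a rough illustration, assuming the sending system can be asked for a tab-separated export (that is an assumption, and the surrounding fields x and y are invented), the sketch below parses a line shaped like the problem row with the default comma settings and then the same value from a tab-separated line.

```python
import csv
import io

# A line shaped like the problem row: doubled quotes around a value with a comma.
comma_line = 'x,""ABC. & XYZ (CfdfB, afGgM)_0110"",y\n'
comma_row = next(csv.reader(io.StringIO(comma_line)))

# The same value in a hypothetical tab-separated export, without the quotes.
tab_line = 'x\tABC. & XYZ (CfdfB, afGgM)_0110\ty\n'
tab_row = next(csv.reader(io.StringIO(tab_line), delimiter="\t"))

# The comma version yields one field more than the tab version.
print(len(comma_row), len(tab_row))
print(tab_row)
```

With a tab delimiter the comma inside the value is an ordinary character, so the field stays in one piece.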
I want to export some data from DB to CSV file. I need to add a '|' delimiter to specific fields. At the moment when I export file, I use something like that:
- To specific fields (at the end and beginning) I add '|':
....
if response.value_display.startswith('|'):
    sheets[response.sheet.session][response.input.id] = response.value_display
else:
    sheets[response.sheet.session][response.input.id] = '|'+response.value_display+'|'
....
And I have CSV writer function settings like that:
self.writer = csv.writer(self.queue, dialect=dialect,
                         lineterminator='\n',
                         quotechar='',
                         quoting=csv.QUOTE_NONE,
                         escapechar=' ',
                         **kwargs)
Now it works, but when I have DateTime fields (which contain a space), the writer adds an extra space.
With the default settings, the CSV writer sometimes adds double quotes at the beginning and end of a field, but I don't know why or what it depends on.
To remove your extra spaces I would just do something like this:
with open("the_file.csv") as f:  # read your CSV file
    text = f.read()
with open("the_file.csv", "w") as f:  # write it back
    f.write(text.replace("  ", " "))  # finds any two spaces and replaces them with one
How to add the delimiter is specific to the situation. If you want to add it at the end or at the beginning:
delimiter = "|"
my_str = my_str + delimiter
or
delimiter = "|"
my_str = delimiter + my_str
If you want to add the delimiter somewhere else you may have to get creative as it would be based on the context.
I'm not sure about the double quotes. I'd replace them the same way as the spaces:
with open("the_file.csv") as f:  # read your CSV file
    text = f.read()
with open("the_file.csv", "w") as f:  # write it back
    f.write(text.replace("\"", "'"))
This assumes you want to replace each double quote with a single quote.
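A possible explanation for the extra spaces, offered as an assumption about the writer settings in the question: with QUOTE_NONE, the csv writer prefixes every occurrence of the escape character in the data with the escape character itself, so an escapechar that also appears in the fields gets doubled. The sketch below shows the mechanism with a hyphen as the escape character and compares it with a backslash, which does not occur in the data. The helper write_row and the sample row are hypothetical.

```python
import csv
import io


def write_row(row, escapechar):
    """Write one row with QUOTE_NONE and return the resulting text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONE, quotechar=None,
                        escapechar=escapechar, lineterminator="\n")
    writer.writerow(row)
    return buffer.getvalue()


row = ["|2020-01-02 03:04:05|", "|abc|"]

# The escape character is itself escaped wherever it occurs in the data,
# so every hyphen in the date is written twice.
hyphen_escaped = write_row(row, "-")

# A backslash does not occur in the data, so the row is written unchanged.
backslash_escaped = write_row(row, "\\")

print(hyphen_escaped)
print(backslash_escaped)
```

Choosing an escape character that never appears in the exported values avoids having to clean up the file afterwards.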
I'm trying to get the csv module to parse lines containing quoted strings and quoted separators. Unfortunately I'm not able to achieve the desired results with any dialect/format parameters. Is there any way to parse this:
'"AAA", BBB, "CCC, CCC"'
and get this:
['"AAA"', 'BBB', '"CCC, CCC"'] # 3 elements, one quoted separator
?
Two fundamental requirements:
Quotations have to be preserved
Quoted, and not escaped separators have to be copied as regular characters
Is it possible?
There are 2 issues to overcome:
spaces around comma separator: skipinitialspace=True does the job (see also Python parse CSV ignoring comma with double-quotes)
preserving quoting when reading: replacing each quote with three quotes preserves the quotes
That second part is described in the documentation as:
Dialect.doublequote
Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.
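To make the two modes concrete, here is a small sketch that writes the same row once with the default doubling and once with doublequote=False plus an escapechar, then reads both back. The helpers dump and load and the sample row are made up for illustration.

```python
import csv
import io


def dump(row, **fmtparams):
    """Write one row to a string using the given formatting parameters."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n", **fmtparams).writerow(row)
    return buffer.getvalue()


def load(text, **fmtparams):
    """Parse the first row of a string using the given formatting parameters."""
    return next(csv.reader(io.StringIO(text), **fmtparams))


row = ['say "hi"', "plain"]

doubled = dump(row)                                      # quotes are doubled
escaped = dump(row, doublequote=False, escapechar="\\")  # quotes are escaped

print(doubled)
print(escaped)
print(load(doubled) == row)
print(load(escaped, doublequote=False, escapechar="\\") == row)
```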
standalone example, without file:
import csv
data = ['"AAA", BBB, "CCC, CCC"'.replace('"','"""')]
cr = csv.reader(data,skipinitialspace=True)
row = next(cr)
print(row)
result:
['"AAA"', 'BBB', '"CCC, CCC"']
with a file as input:
import csv
with open("input.csv") as f:
    cr = csv.reader((l.replace('"','"""') for l in f),skipinitialspace=True)
    for row in cr:
        print(row)
Have you tried this?
import csv
with open('file.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        print(row)
Problem: I cannot seem to parse the information in a text file because Python reads each line as one full string instead of individual separate strings. The whitespace between the values is not a \t, which is why the line does not get separated. Is there a way for Python to flexibly remove the spaces and put a comma or \t instead?
Example DATA:
MOR125-1 MOR129-1 0.587
MOR125-1 MOR129-3 0.598
MOR129-1 MOR129-3 0.115
The code I am using:
with open("Distance_Data_No_Bootstrap_RAW.txt","rb") as f:
    reader = csv.reader(f,delimiter="\t")
    d=list(reader)
for i in range(3):
    print d[i]
Output:
['MOR125-1 MOR129-1 0.587']
['MOR125-1 MOR129-3 0.598']
['MOR129-1 MOR129-3 0.115']
Desired Output:
['MOR125-1', 'MOR129-1', '0.587']
['MOR125-1', 'MOR129-3', '0.598']
['MOR129-1', 'MOR129-3', '0.115']
You can simply declare the delimiter to be a space, and ask csv to skip initial spaces after a delimiter. That way, your separator is in fact the regular expression ' +', that is one or more spaces.
rd = csv.reader(fd, delimiter=' ', skipinitialspace=True)
for row in rd:
    print(row)
['MOR125-1', 'MOR129-1', '0.587']
['MOR125-1', 'MOR129-3', '0.598']
['MOR129-1', 'MOR129-3', '0.115']
You can instruct csv.reader to use space as delimiter and skip all the extra space:
reader = csv.reader(f, delimiter=" ", skipinitialspace=True)
For detailed information about available parameters check Python docs:
Dialect.delimiter
A one-character string used to separate fields. It defaults to ','.
Dialect.skipinitialspace
When True, whitespace immediately following the delimiter is ignored. The default is False.
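Putting the two parameters together, a self-contained Python 3 sketch might look like this. It reads the sample rows from an in-memory string instead of the real file, and the run of four spaces between values is an assumption about the data. For comparison it also uses str.split(), which splits on any run of whitespace, as a swapped-in alternative that does not use the csv module.

```python
import csv
import io

# Sample data with several spaces between the columns (assumed layout).
raw = (
    "MOR125-1    MOR129-1    0.587\n"
    "MOR125-1    MOR129-3    0.598\n"
    "MOR129-1    MOR129-3    0.115\n"
)

# Space as delimiter; skipinitialspace swallows the extra spaces that follow it.
rows = list(csv.reader(io.StringIO(raw), delimiter=" ", skipinitialspace=True))

# Alternative without csv: split() with no arguments splits on whitespace runs.
split_rows = [line.split() for line in raw.splitlines()]

for row in rows:
    print(row)
```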