Writing to a CSV without getting quote marks, escapechar error - python

I have an output I am writing to a CSV. I need to add csv.QUOTE_NONE but I can't seem to find the right location without it producing an error.
variable:
variable = ['20', '10', '30,30']
Note: some of the variables I am using will contain strings i.e ['Test','Output', '100']
code:
with open('file.csv', 'w') as csv_file:
    writerc = csv.writer(csv_file)
    for item in variable():
        writerc.writerow(item)
When using the above code, it produces the following line in the CSV.
20,10,"30,30"
The required write is:
20,10,30,30
If I use quoting=csv.QUOTE_NONE I get an escapechar error _csv.Error: need to escape, but no escapechar set - this is resolved if I set an escapechar but this then adds a character in place of the quotation marks.
Any ideas?

You could try further splitting your data before writing it. This would avoid it needing to use quote characters automatically.
It works by creating a new list of values possibly containing multiple new split entries, for example your '30,30' would become ['30', '30']. Next it uses Python's chain function to flatten these sub-lists back into a single list which can then be written to your output CSV file.
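import itertools
import csv
data = [['20', '10', '30,30'], ['Test','Output', '100']]
# In Python 3, open in text mode with newline='' (use 'wb' on Python 2)
with open('file.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    for line in data:
        # Split each value on commas, then flatten the pieces into one row
        csv_output.writerow(list(itertools.chain.from_iterable(v.split(',') for v in line)))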
This would give you the following file.csv:
20,10,30,30
Test,Output,100
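If quoting is never wanted and every value is safe to write as-is, another option is to skip csv.writer and join the values yourself. This is only a minimal sketch, assuming Python 3; the helper name rows_to_unquoted_csv is made up for illustration. Keep in mind that a file written this way cannot be read back into the original three fields.

```python
def rows_to_unquoted_csv(rows):
    """Join every row with plain commas, never adding quote characters."""
    return "".join(",".join(row) + "\n" for row in rows)


data = [['20', '10', '30,30'], ['Test', 'Output', '100']]
text = rows_to_unquoted_csv(data)
print(text, end="")
```

The returned string can then be written with a plain open('file.csv', 'w') and f.write(text).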

I think the problem lies in the fact that .csv means comma-separated values, so it treats "," in the value as the separator or delimiter. That's why the double quotes are used automatically to escape.
I suggest you use the pandas library which makes it easier to deal with this issue.
Code
import pandas as pd
df = pd.DataFrame({'variable' : ['20', '10', '30,30']})
# Note that I use '\t' as the separator instead of ',' and get a .tsv file which
# is essentially the same as .csv file except the separator.
df.to_csv(sep = '\t', path_or_buf='file.tsv', index=False)
You can see the difference between these two separators in the full script. Another thing: your code suggests that you use variable as the name of the column, but your output suggests that you use variable as the name of the row (or index). Anyway, my answer assumes that variable is the name of the column. Hope it helps ;)
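The same separator idea also works with the standard csv module, without pandas. The sketch below is an assumption-laden illustration rather than the answerer's script: it writes to an in-memory buffer instead of file.tsv so the result can be inspected, and it treats variable as a single row.

```python
import csv
import io

variable = ['20', '10', '30,30']

# io.StringIO stands in for open('file.tsv', 'w', newline='')
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t')
writer.writerow(variable)

tsv_text = buffer.getvalue()
print(repr(tsv_text))
```

Because the comma is no longer the delimiter, the value '30,30' is written without quote marks.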

Related

How to replace characters in a csv file

I'm doing some measurements in the lab and want to transform them into some nice Python plots. The problem is the way the software exports CSV files, as I can't find a way to properly read the numbers. It looks like this:
-10;-0,0000026
-8;-0,00000139
-6;-0,000000546
-4;-0,000000112
-2;-5,11E-09
0,0000048;6,21E-09
2;0,000000318
4;0,00000304
6;0,0000129
8;0,0000724
10;0,000268
Separation by ; is fine, but I need every , to be ..
Ideally I would like Python to be able to read numbers such as 6.21E-09 as well, but I should be able to fix that in excel...
My main issue: Change every , to . so Python can read them as a float.
The simplest way would be to read the values as strings and then use the .replace() method on them. For example:
txt = "0,0000048;6,21E-09"
txt = txt.replace(',', '.')
You could also read the CSV file with a library (I don't know how you are reading the file) and, depending on the library, change the 'delimiter' (to ; for example). CSV stands for comma-separated values and, as the name implies, it separates columns by means of ',' by default.
You can do whatever you want in Python, for example:
import csv
with open('path_to_csv_file', 'r', newline='') as csv_file:
    data = list(csv.reader(csv_file, delimiter=';'))
# Swap the decimal comma for a dot in both columns, then convert to float
data = [tuple(float(value.replace(',', '.')) for value in row) for row in data]
with open('path_to_csv_file', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=';')
    writer.writerows(data)
You could also consider a regex that matches every ',' in the text and replaces each match with '.'.
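To illustrate the approach on a few of the rows from the question, here is a small sketch, assuming Python 3. The function name parse_decimal_comma is invented for this example. It also shows that float() already understands scientific notation such as 6.21E-09, so no detour through Excel should be needed.

```python
import csv
import io

# A few rows copied from the question
raw = "-2;-5,11E-09\n0,0000048;6,21E-09\n2;0,000000318\n"


def parse_decimal_comma(text):
    """Parse ';'-separated rows whose numbers use ',' as the decimal mark."""
    reader = csv.reader(io.StringIO(text), delimiter=';')
    return [tuple(float(v.replace(',', '.')) for v in row) for row in reader]


points = parse_decimal_comma(raw)
for x, y in points:
    print(x, y)
```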

Issue with parsing csv from Django web form

I was hoping someone could help me with this. I'm getting a file from a form in Django; the file is a CSV and I'm trying to read it with Python's csv library. The problem is that when I apply csv.reader and turn the result into a list in order to print it, I find that csv.reader is not splitting my file correctly.
Here are some images to show the problem
This is my csv file:
This is my code:
And this is the printed value of the variable file_readed:
As you can see in the picture, it seems to be splitting my file character by character with some exceptions.
I thank you for any help you can provide me.
If you are pulling from a web form, try getting the csv as a string, confirm in a print or debug tool that the result is correct, and then pass it to csv using StringIO.
from io import StringIO
import csv
# Read the uploaded file's bytes and decode them to text
csv_string = form.files['carga_cie10'].read().decode(encoding="ISO-8859-1")
csv_file = StringIO(csv_string)
reader = csv.reader(csv_file, delimiter=',', quotechar='"')
for row in reader:
    print(row)
Another thing to check is the line endings. Note that csv.reader() recognises either \r or \n as end-of-line and ignores the lineterminator argument, so that setting only affects writers; the web form might still send unexpected line endings. Inspect the string you get from the web form to confirm.
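One likely cause of the character-by-character split (an assumption, since the original code is only available as an image) is passing the decoded string itself to csv.reader. The reader iterates over whatever it is given, and iterating over a string yields single characters. The sketch below uses an invented two-line sample to compare that with wrapping the string in StringIO or splitting it into lines first.

```python
import csv
from io import StringIO

# Invented sample standing in for the decoded upload
csv_string = "A00,Cholera\nA01,Typhoid fever\n"

# Passing the string directly: every character is treated as its own line
per_char = list(csv.reader(csv_string))

# Wrapping it in StringIO (or splitting into lines) gives real rows
rows = list(csv.reader(StringIO(csv_string)))
rows_from_lines = list(csv.reader(csv_string.splitlines()))

print(len(per_char), "pieces when iterating characters")
print(rows)
```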
That CSV does not seem right: some lines have more fields than others.
CSV stands for Comma Separated Values, so each line needs the same number of fields separated by commas, or else parsing will go wrong.
From your lines it looks like you expect 3 columns, but some lines have 2 or 4 fields, and some have an opening " in one field, a comma, then the closing " in the next field.
Check whether your script works with other CSV files.
Most likely you need to specify the delimiter. Since you haven't set it explicitly, I guess the reader is confused.
csv.reader(csvfile, delimiter=',')
However, since there are quoted fields containing the comma delimiter, you may also need to change the delimiter to a tab or something else when the CSV file is created.
The problem is here:
print(list(file_readed))
'list' is causing printing of every element within the csv as an individual unit.
Try this instead:
with open('carga_cie10') as f:
    reader = csv.reader(f)
    for row in reader:
        print(" ".join(row))
Edit:
import pandas as pd
file_readed = pd.read_csv(file_csv)
print(file_readed)
The output should look clean. Pandas is highly useful in situations where data needs to be read, manipulated, changed, etc.

Pandas to_csv with multiple separators

I want to convert a pandas dataframe to csv with multiple separators. Is there a way?
dataframe.to_csv("file.csv", sep="%%")
Error: delimiter must be 1-character string
The easiest way might be to use a unique single-character separator first, then replace it:
tsv = dataframe.to_csv(sep='\t') # use '\1' if your data contains tabs
psv = tsv.replace('\t', '%%')
with open('file.csv', 'w') as outfile:
    outfile.write(psv)
P.S.: Consider using an extension other than .csv since it's not comma separated.
I think there might be some bugs with replace, as John says, because it can't guarantee that every replaced character is actually a separator.
Besides, since to_csv returns the whole output as a string, it might lead to a memory error if the data is big.
Here is another feasible solution.
with open('test_pandas.txt', 'w') as f:
    for index, row in dataframe.iterrows():
        l = map(str, row.values.tolist())
        line = '%%'.join(l)
        f.write(line+'\n')
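The row-by-row joining can be sketched without pandas so it is easy to try in isolation. The helper name join_rows and the sample rows are invented for illustration; with a real DataFrame the rows could come from dataframe.itertuples(index=False). Nothing is escaped here, so a value that itself contains %% would make the output ambiguous.

```python
def join_rows(rows, sep="%%"):
    """Build one text line per row, joining the values with a multi-character separator."""
    return "".join(sep.join(map(str, row)) + "\n" for row in rows)


# Invented sample rows; any iterable of row tuples works
rows = [(1, "a,b", 2.5), (2, "c", 3.0)]
text = join_rows(rows)
print(text, end="")
```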

How to parse a string using a CSV parser in Python?

I need to parse a string using a CSV parser. I've found this solution in many places, but it doesn't work for me. I was using Python 3.4, now I changed it to 2.7.9 and still nothing...
import csv
import StringIO
csv_file = StringIO.StringIO(line)
csv_reader = csv.reader(csv_file)
for data in csv_reader:
    # do something
Could anyone please suggest me another way to parse this string using a CSV parser? Or how can I make this work?
Obs: I have a string in a CSV format, with fields that have commas inside, that's why I can't parse it in the standard way.
You need to put double quotes around elements that contain commas.
The CSV format is described by RFC 4180, which states:
Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes.
So for instance:
import StringIO
import csv
# the text between double quotes will be treated
# as a single element and not parsed by commas
line = '1,2,3,"1,2,3",4'
csv_file = StringIO.StringIO(line)
csv_reader = csv.reader(csv_file)
for data in csv_reader:
    # output: ['1', '2', '3', '1,2,3', '4']
    print data
As another option, you can change the delimiter. The default for csv.reader is delimiter=',' and quotechar='"' but both of these can be changed depending on your needs.
Semicolon Delimiter:
line = '1;2;3;1,2,3;4'
csv_file = StringIO.StringIO(line)
csv_reader = csv.reader(csv_file, delimiter=';')
for data in csv_reader:
    # output: ['1', '2', '3', '1,2,3', '4']
    print data
Vertical Bar Quotechar
line = '1,2,3,|1,2,3|,4'
csv_file = StringIO.StringIO(line)
csv_reader = csv.reader(csv_file, quotechar='|')
for data in csv_reader:
    # output: ['1', '2', '3', '1,2,3', '4']
    print data
Also, the python csv module works on python 2.6 - 3.x, so that shouldn't be the problem.
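The snippets above are written for Python 2 (the StringIO module and the print statement). For readers on Python 3, a roughly equivalent sketch would use io.StringIO and the print() function:

```python
import csv
from io import StringIO

# The quoted field is kept together even though it contains commas
line = '1,2,3,"1,2,3",4'
rows = list(csv.reader(StringIO(line)))
print(rows)
```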
The obvious solution that jumps out of the page, rather than reimplementing CSV parsing, is to preprocess the data first and replace all of the commas within strings by some never used token character (or even the word COMMA), then feeding that into the CSV parser, and then going back through the data and replacing the tokens back with commas.
Sorry, I've not tried this myself in Python, but I had issues with quotes in my data in another language, and that's how I solved it.
Also, Bcorso's answer is much more complete. Mine is just a quick hack to get around a common limitation.

Python csv writer : "Unknown Dialect" Error

I have a very large string in the CSV format that will be written to a CSV file.
I am trying to write it to CSV using the simplest of Python scripts:
results=""" "2013-12-03 23:59:52","/core/log","79.223.39.000","logging-4.0",iPad,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,3,"1385593191.865",true,ERROR,"app_error","iPad/Unknown/webkit/537.51.1",NA,"Does+not",false
"2013-12-03 23:58:41","/core/log","217.7.59.000","logging-4.0",Win32,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,4,"1385593120.68",true,ERROR,"app_error","Win32/Unknown/msie/9.0",NA,"Does+not,false
"2013-12-03 23:58:19","/core/client_log","79.240.195.000","logging-4.0",Win32,"5.1","1.0.1.59-266060",NA,NA,NA,NA,6,"1385593099.001",true,ERROR,"app_error","Win32/5.1/mozilla/25.0",NA,"Could+not:+{"url":"/all.json?status=ongoing,scheduled,conflict","code":0,"data":"","success":false,"error":true,"cached":false,"jqXhr":{"readyState":0,"responseText":"","status":0,"statusText":"error"}}",false"""
resultArray = results.split('\n')
with open(csvfile, 'wb') as f:
    writer = csv.writer(f)
    for row in resultArray:
        writer.writerows(row)
The code returns an "Unknown Dialect" error.
Is the error because of the script or is it due to the string that is being written?
EDIT
If the problem is bad input how do I sanitize it so that it can be used by the csv.writer() method?
You need to specify the format of your string:
with open(csvfile, 'wb') as f:
    writer = csv.writer(f, delimiter=',', quotechar="'", quoting=csv.QUOTE_ALL)
You might also want to re-visit your writing loop; the way you have it written you will get one column in your file, and each row will be one character from the results string.
To really exploit the module, try this:
import csv
lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]
reader = csv.reader(lines, quotechar="'")
with open('out.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(list(reader))
out.csv will have:
A,bunch+of,multiline,"CSV,LIKE,STRING"
If you want to quote all the column values, then add quoting=csv.QUOTE_ALL to the writer object; then your file will have:
"A","bunch+of","multiline","CSV,LIKE,STRING"
To change the quotes to ', add quotechar="'" to the writer object.
The above code does not give csv.writer.writerows input that it expects. Specifically:
resultArray = results.split('\n')
This creates a list of strings. Then, you pass each string to your writer and tell it to writerows with it:
for row in resultArray:
    writer.writerows(row)
But writerows does not expect a single string. From the docs:
csvwriter.writerows(rows)
Write all the rows parameters (a list of row objects as described above) to the writer’s file object, formatted according to the current dialect.
So you're passing a string to a method that expects its argument to be a list of row objects, where a row object is itself expected to be a sequence of strings or numbers:
A row must be a sequence of strings or numbers for Writer objects
Are you sure your listed example code accurately reflects your attempt? While it certainly won't work, I would expect the exception produced to be different.
For a possible fix - if all you are trying to do is to write a big string to a file, you don't need the csv library at all. You can just write the string directly. Even splitting on newlines is unnecessary unless you need to do something like replacing Unix-style linefeeds with DOS-style linefeeds.
If you need to use the csv module after all, you need to give your writer something it understands - in this example, that would be something like writer.writerow(['A','bunch+of','multiline','CSV,LIKE,STRING']). Note that that's a true Python list of strings. If you need to turn your raw string "'A','bunch+of','multiline','CSV,LIKE,STRING'" into such a list, I think you'll find the csv library useful as a reader - no need to reinvent the wheel to handle the quoted commas in the substring 'CSV,LIKE,STRING'. And in that case you would need to care about your dialect.
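A rough sketch of that read-then-write round trip, assuming Python 3 and a single-quoted input like the example string: the second input row is invented, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Raw CSV-like text; the second row is made up for illustration
raw = "'A','bunch+of','multiline','CSV,LIKE,STRING'\n'B','second','row','X,Y'"

# csv.reader accepts any iterable of lines, so splitlines() is enough
parsed = list(csv.reader(raw.splitlines(), quotechar="'"))

# io.StringIO stands in for open('out.csv', 'w', newline='')
out = io.StringIO()
csv.writer(out).writerows(parsed)  # writerows receives a list of row lists
written = out.getvalue()
print(written, end="")
```

The key point is that writerows receives a list of lists here, not a single string.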
You can also use csv.register_dialect to define a named dialect, for example one for escaped formatting:
csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)
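As a sketch of how such a registered dialect might then be used (assuming Python 3; the sample row and the name not_registered are invented), the dialect name is passed to csv.writer, while looking up a name that was never registered raises csv.Error:

```python
import csv
import io

csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)

# io.StringIO stands in for a real output file
out = io.StringIO()
writer = csv.writer(out, dialect='escaped')
writer.writerow(['A', 'bunch+of', 'CSV,LIKE,STRING'])
written = out.getvalue()
print(written, end="")

# A dialect name that was never registered cannot be looked up
try:
    csv.get_dialect('not_registered')
    lookup_failed = False
except csv.Error:
    lookup_failed = True
print(lookup_failed)
```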
