When using Python's csv module to create a CSV, it automatically puts carriage return characters at the end of strings if the string has a comma inside it, e.g.:
['this one will have a carriage return, at the end','this one wont']
In an Excel sheet this will turn out like:
| |this on|
because of the extra carriage return. It will also surround the string containing the comma with double quotes, as expected.
The code I am using is:
with open(oldfile, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)
How do I create a CSV using the same data format that won't have carriage returns when the strings have commas inside? I don't mind the strings being surrounded by double quotes, though.
Here's a link to the diagnosis of the problem with the output .csv:
Excel showing empty cells when importing file created with csv module
It's the accepted answer.
I have changed my code to:
with open(oldfile, 'w', newline='', quoting=csv.QUOTE_MINIMAL) as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)
I am now getting the error:
TypeError: 'quoting' is an invalid keyword argument for this function
The built-in csv module of Python has the option csv.QUOTE_MINIMAL. When this option is passed as an argument to the writer, it adds quote marks when the delimiter is in the given string: "your text, with comma", "other field". This eliminates the need for carriage returns.
The code is:
with open(oldfile, 'w') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
    for row in data:
        writer.writerow(row)
Related
I am trying to write a list of strings to csv using csv.writer.
writer = csv.writer(f)
writer.writerow(some_text)
However, some of the strings contain a random escape character, which seems to be causing the following error: _csv.Error: need to escape, but no escapechar set
I've tried using the escapechar option in csv.writer like the following
writer = csv.writer(f, escapechar='\\')
but this seems to be a partial solution, since the newline characters (\n) are not recognized.
How would I solve this problem? An example of a problematic string would be the following:
problem_string = "this \n sentence \% is \n problematic \g"
What format do you want to achieve in the end? Writing this to a csv seems to be leading to some odd outcomes anyway.
In any case, both of these pieces of code work for me without errors, each giving slightly different results with respect to the escape characters.
With normal string:
import csv

with open('test2.csv', 'w') as csvfile:
    csvwriter = csv.writer(csvfile)
    problem_string = "this \n sentence \% is \n problematic \g"
    csvwriter.writerow(problem_string)
With raw input:
import csv

with open('test2.csv', 'w') as csvfile:
    csvwriter = csv.writer(csvfile)
    problem_string = r"this \n sentence \% is \n problematic \g"
    csvwriter.writerow(problem_string)
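Note that writerow() iterates whatever it is given, so a bare string is written as one column per character. If the goal is to keep the whole sentence in a single field, a minimal sketch (assuming that is the intent; the backslashes are written explicitly here to avoid invalid escape sequences) would wrap the string in a list:

import csv

with open('test2.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    # The string is wrapped in a list so the whole sentence becomes one field.
    problem_string = "this \n sentence \\% is \n problematic \\g"
    csvwriter.writerow([problem_string])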
I have a task to convert one CSV file from UTF8 encoding to ANSI encoding and format it. I have learned that ANSI is really the encoding of the system in some sense, but this is not the issue at the moment.
Before converting, I have to be able to read my CSV file first. The file contains null values and the headers are different from the rest of the file. Whenever I try to read it, I always get an error regarding the NULL values or the headers. The headers are different in the sense that they do not have any quotation at all, but the rest of the file has 3 quotation marks on each side of strings (and for some reason also around NUL values). The file columns are comma-separated and each row ends with a new line.
When I try to read with QUOTE_NONNUMERIC (to get rid of the null values), I get the error:
batch_data = list(reader)
ValueError: could not convert string to float: 'my_first_column_name'
When I try with QUOTE_ALL (in hopes to quote the headers as well), I get the error:
batch_data = list(reader)
_csv.Error: line contains NULL byte
Here is my code:
import csv
file_in = r'<file_in>'
with open(file_in, mode='r') as infile:
    reader = csv.reader(infile, quoting=csv.QUOTE_NONNUMERIC)
    batch_data = list(reader)
    for row in batch_data:
        print(row, end="")
After reading some materials I understand that I probably have to read the headers separately from the rest of the file. How would one do that exactly? I tried to skip them with reader.next(), but then I get an error that next is not a method of reader. At this point I can't believe how much of my time this has already taken, reading through the documentation and trying different things.
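For reference, in Python 3 the reader object has no .next() method; the header row is skipped with the built-in next() function instead. A minimal sketch, assuming the same <file_in> placeholder as above:

import csv

with open(r'<file_in>', mode='r') as infile:
    reader = csv.reader(infile)
    header = next(reader)   # consume the first row as the header
    for row in reader:      # iterate over the remaining data rows
        print(row)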
SOLUTION
I ended up using list(next(reader)) to skip over the header and then replacing all the null values with an empty string. Then I opened the output file with the configuration I needed and wrote my rows into it. I still have to deal with different formatting, since this solution puts double quotes around every cell and I need some cells to be numbers. It also replaced all single quotes in cells with double quotes, which still has to be fixed. The file_in and file_out variables hold the input and output file locations.
import csv
import json
file_in = r'<input_filepath>'
file_out = r'<output_filepath>'
with open(file_in, mode='r', encoding='utf-8-sig') as infile:
    reader = csv.reader(infile, quoting=csv.QUOTE_ALL)
    header_data = list(next(reader))
    infile = infile.read().replace('\0', '').splitlines()
    reader2 = csv.reader(infile)
    with open(file_out, 'w', newline='', encoding='cp1252') as outfile:
        writer = csv.writer(outfile, delimiter=';', quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(header_data)
        for row in reader2:
            row = str(row).replace('"', '').replace("'", '"')
            row = json.loads(row)
            writer.writerow(row)
import csv
from csv import reader

csv_reader_results = reader(["办公室弥漫着\"女红\"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉,,,,用心做东西感觉真好!!!"],
                            escapechar='\\',
                            quotechar='"',
                            delimiter=',',
                            quoting=csv.QUOTE_ALL,
                            skipinitialspace=True)

for result in csv_reader_results:
    print(result[0])
What I'm expecting is:
办公室弥漫着"女红"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉,,,,用心做东西感觉真好!!!
But what I'm getting is:
办公室弥漫着"女红"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉
Because it splits on the four commas inside the sentence.
I'm escaping the quotes inside of the sentence. I've set the quotechar and escapechar for csv.reader. What am I doing wrong here?
Edit:
I used the answer by j6m8 https://stackoverflow.com/a/19881343/3945463 as a workaround. But it would be preferable to learn the correct way to do this with csv reader.
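As a side note, the escapechar never actually reaches the parser here: Python itself turns \" into a plain " inside the source string, and because the field does not start with the quote character, the reader has no reason to treat the embedded commas as data. A minimal sketch, assuming the input line can be quoted in the standard CSV way (whole field wrapped in the quotechar, embedded quotes doubled), where the default reader then keeps the commas inside one field:

import csv

# The whole field is wrapped in " and the inner quotes are doubled, which is
# standard CSV quoting, so the commas stay inside the single field.
line = '"办公室弥漫着""女红""缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉,,,,用心做东西感觉真好!!!"'
for result in csv.reader([line]):
    print(result[0])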
I tried to write an output file as a CSV file but am getting either an error or not the expected result. I am using Python 3.5.2 and also 2.7.
Getting error in Python 3.5:
wr.writerow(var)
TypeError: a bytes-like object is required, not 'str'
and
In Python 2.7, I am getting all the column results in one column.
Expected Result:
An output file same format as the input file.
Code:
import csv

f1 = open("input_1.csv", "r")
resultFile = open("out.csv", "wb")
wr = csv.writer(resultFile, quotechar=',')

def sort_duplicates(f1):
    for i in range(0, len(f1)):
        f1.insert(f1.index(f1[i])+1, f1[i])
        f1.pop(i+1)

for var in f1:
    #print (var)
    wr.writerow([var])
If I use resultFile = open("out.csv", "w"), I get one extra row in the output file.
If I use the above code, I get one extra row and one extra column.
On Python 3, csv requires that you open the file in text mode, not binary mode. Drop the b from your file mode. You should really use newline='' too:
resultFile = open("out.csv", "w", newline='')
Better still, use the file object as a context manager to ensure it is closed automatically:
with open("input_1.csv", "r") as f1, \
open("out.csv", "w", newline='') as resultFile:
wr = csv.writer(resultFile, dialect='excel')
for var in f1:
wr.writerow([var.rstrip('\n')])
I've also stripped the lines from f1 (just to remove the newline) and put the line in a list; csv.writer.writerow wants a sequence with columns, not a single string.
Quoting the csv.writer() documentation:
If csvfile is a file object, it should be opened with newline='' [1]. [...] All other non-string data are stringified with str() before being written.
[1] If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
Others have answered that you should open the output file in text mode when using Python 3, i.e.
with open('out.csv', 'w', newline='') as resultFile:
...
But you also need to parse the incoming CSV data. As it is, your code reads each line of the input CSV file as a single string. Then, without splitting that line into its constituent fields, it passes the string to the CSV writer. As a result, the csv.writer will treat the string as a sequence and output each character, including any terminating newline character, as a separate field. For example, if your input CSV file contains:
1,2,3,4
Your output file would be written like this:
1,",",2,",",3,",",4,"
"
You should change the for loop to this:
for row in csv.reader(f1):
    # process the row
    wr.writerow(row)
Now the input CSV file will be parsed into fields and row will contain a list of strings - one for each field. For the previous example, row would be:
for row in csv.reader(f1):
    print(row)

['1', '2', '3', '4']
And when that list is passed to the csv.writer the output to the file will be:
1,2,3,4
Putting all of that together you get this code:
import csv
with open('input_1.csv') as f1, open('out.csv', 'w', newline='') as resultFile:
    wr = csv.writer(resultFile, dialect='excel')
    for row in csv.reader(f1):
        wr.writerow(row)
Open the file without the b mode.
The b mode opens your file as binary.
You can open the file with w:
open_file = open("filename.csv", "w")
You are opening the input file in normal read mode, but the output file is opened in binary mode. The correct way is:
resultFile = open("out.csv", "w")
As shown above, if you replace "wb" with "w" it will work.
I have a very large string in the CSV format that will be written to a CSV file.
I try to write it to a CSV file using the simplest of Python scripts:
results=""" "2013-12-03 23:59:52","/core/log","79.223.39.000","logging-4.0",iPad,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,3,"1385593191.865",true,ERROR,"app_error","iPad/Unknown/webkit/537.51.1",NA,"Does+not",false
"2013-12-03 23:58:41","/core/log","217.7.59.000","logging-4.0",Win32,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,4,"1385593120.68",true,ERROR,"app_error","Win32/Unknown/msie/9.0",NA,"Does+not,false
"2013-12-03 23:58:19","/core/client_log","79.240.195.000","logging-4.0",Win32,"5.1","1.0.1.59-266060",NA,NA,NA,NA,6,"1385593099.001",true,ERROR,"app_error","Win32/5.1/mozilla/25.0",NA,"Could+not:+{"url":"/all.json?status=ongoing,scheduled,conflict","code":0,"data":"","success":false,"error":true,"cached":false,"jqXhr":{"readyState":0,"responseText":"","status":0,"statusText":"error"}}",false"""
resultArray = results.split('\n')

with open(csvfile, 'wb') as f:
    writer = csv.writer(f)
    for row in resultArray:
        writer.writerows(row)
The code returns an "Unknown Dialect" error.
Is the error because of the script or is it due to the string that is being written?
EDIT
If the problem is bad input how do I sanitize it so that it can be used by the csv.writer() method?
You need to specify the format of your string:
with open(csvfile, 'wb') as f:
    writer = csv.writer(f, delimiter=',', quotechar="'", quoting=csv.QUOTE_ALL)
You might also want to re-visit your writing loop; the way you have it written you will get one column in your file, and each row will be one character from the results string.
To really exploit the module, try this:
import csv
lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]
reader = csv.reader(lines, quotechar="'")
with open('out.csv', 'wb') as f:
writer = csv.writer(f)
writer.writerows(list(reader))
out.csv will have:
A,bunch+of,multiline,"CSV,LIKE,STRING"
If you want to quote all the column values, then add quoting=csv.QUOTE_ALL to the writer object; then your file will have:
"A","bunch+of","multiline","CSV,LIKE,STRING"
To change the quotes to ', add quotechar="'" to the writer object.
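For instance, a minimal sketch combining both options on the writer (using Python 3-style file opening here; the output filename is only illustrative):

import csv

lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]
reader = csv.reader(lines, quotechar="'")

with open('out_quoted.csv', 'w', newline='') as f:
    # Quote every field and use ' instead of " as the quote character.
    writer = csv.writer(f, quoting=csv.QUOTE_ALL, quotechar="'")
    writer.writerows(list(reader))

out_quoted.csv should then contain:
'A','bunch+of','multiline','CSV,LIKE,STRING'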
The above code does not give csv.writer.writerows input that it expects. Specifically:
resultArray = results.split('\n')
This creates a list of strings. Then you pass each of those strings to your writer and call writerows with it:
for row in resultArray:
    writer.writerows(row)
But writerows does not expect a single string. From the docs:
csvwriter.writerows(rows)
Write all the rows parameters (a list of row objects as described above) to the writer’s file object, formatted according to the current dialect.
So you're passing a string to a method that expects its argument to be a list of row objects, where a row object is itself expected to be a sequence of strings or numbers:
A row must be a sequence of strings or numbers for Writer objects
Are you sure your listed example code accurately reflects your attempt? While it certainly won't work, I would expect the exception produced to be different.
For a possible fix - if all you are trying to do is to write a big string to a file, you don't need the csv library at all. You can just write the string directly. Even splitting on newlines is unnecessary unless you need to do something like replacing Unix-style linefeeds with DOS-style linefeeds.
If you need to use the csv module after all, you need to give your writer something it understands - in this example, that would be something like writer.writerow(['A','bunch+of','multiline','CSV,LIKE,STRING']). Note that that's a true Python list of strings. If you need to turn your raw string "'A','bunch+of','multiline','CSV,LIKE,STRING'" into such a list, I think you'll find the csv library useful as a reader - no need to reinvent the wheel to handle the quoted commas in the substring 'CSV,LIKE,STRING'. And in that case you would need to care about your dialect.
You can use register_dialect; for example, for escaped formatting:
csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)
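A minimal usage sketch, assuming the registered dialect is then referred to by name when creating a writer (the filename and sample row are only illustrative):

import csv

csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)

with open('escaped_out.csv', 'w', newline='') as f:
    writer = csv.writer(f, dialect='escaped')
    # Every field is quoted; embedded quote characters are doubled.
    writer.writerow(['plain', 'has "quotes"', 'has, comma'])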