Python CSV Reader splitting on comma inside of quotes - python

import csv
csv_reader_results = csv.reader(["办公室弥漫着\"女红\"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉,,,,用心做东西感觉真好!!!"],
                                escapechar='\\',
                                quotechar='"',
                                delimiter=',',
                                quoting=csv.QUOTE_ALL,
                                skipinitialspace=True)
for result in csv_reader_results:
    print(result[0])
What I'm expecting is:
办公室弥漫着"女红"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉,,,,用心做东西感觉真好!!!
But what I'm getting is:
办公室弥漫着"女红"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉
Because it splits on the four commas inside the sentence.
I'm escaping the quotes inside of the sentence. I've set the quotechar and escapechar for csv.reader. What am I doing wrong here?
Edit:
I used the answer by j6m8 https://stackoverflow.com/a/19881343/3945463 as a workaround. But it would be preferable to learn the correct way to do this with csv reader.
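One likely explanation, offered as a sketch rather than a definitive diagnosis: inside an ordinary Python string literal, `\"` is just a `"` character, so csv.reader never sees a backslash. In addition, the field does not begin with the quote character, so the parser treats it as unquoted and splits on every comma. The example below assumes the line should look the way it would in a file, that is, wrapped in double quotes with backslash-escaped inner quotes. The names `raw_line` and `rows` are illustrative only.

```python
import csv

# Raw string literal, so the backslashes actually reach csv.reader.
# The whole field is wrapped in double quotes, as it would be in a CSV file.
raw_line = r'"办公室弥漫着\"女红\"缝扣子.编蝴蝶结..手绣花...呵呵..原来做 些也会有幸福的感觉,,,,用心做东西感觉真好!!!"'

rows = list(csv.reader([raw_line],
                       escapechar='\\',
                       quotechar='"',
                       delimiter=',',
                       skipinitialspace=True))

# One row, one field; the commas inside the quotes are kept.
print(rows[0][0])
```

If the data really arrives without surrounding quotes, the csv module has no way to tell those commas from delimiters, and a workaround like the one linked above is still needed.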

Related

How to write string to csv that contain escape chars?

I am trying to write a list of strings to csv using csv.writer.
writer = csv.writer(f)
writer.writerow(some_text)
However, some of the strings contain a random escape character, which seems to be causing the following error : _csv.Error: need to escape, but no escapechar set
I've tried using the escapechar option in csv.writer like the following
writer = csv.writer(f, escapechar='\\')
but this seems to be a partial solution, since all the newline characters(\n) are not recognized.
How would I solve this problem? An example of a problematic string would be the following:
problem_string = "this \n sentence \% is \n problematic \g"
What format do you want to achieve in the end? Writing this to a csv seems to be leading to some odd outcomes anyway.
In any case, both of these snippets work for me without errors, and they give slightly different results with respect to escape characters. Note that writerow expects a sequence of fields, so the string is wrapped in a list; passing a bare string would write one character per column.
With normal string:
import csv
with open('test2.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    problem_string = "this \n sentence \% is \n problematic \g"
    csvwriter.writerow([problem_string])  # one field containing real newlines
With raw input:
import csv
with open('test2.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    problem_string = r"this \n sentence \% is \n problematic \g"
    csvwriter.writerow([problem_string])  # one field containing literal backslash-n
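To make the behaviour easier to check, here is a minimal sketch that writes the string to an in-memory buffer instead of a file. It assumes the error in the question came from a writer that was not allowed to quote (for example `quoting=csv.QUOTE_NONE`), which is a guess, since the question does not show the full writer setup. The names `buf`, `round_trip` and `error_message` are illustrative.

```python
import csv
import io

# "\\%" and "\\g" are the same characters as "\%" and "\g" in the question,
# written with explicit backslashes to avoid invalid-escape warnings.
problem_string = "this \n sentence \\% is \n problematic \\g"

# Default quoting (QUOTE_MINIMAL): the field is wrapped in quotes,
# so the embedded newlines survive a write/read round trip.
buf = io.StringIO(newline='')
csv.writer(buf).writerow([problem_string])
round_trip = list(csv.reader(io.StringIO(buf.getvalue(), newline='')))

# With quoting disabled and no escapechar, the writer cannot represent
# the embedded newline and raises csv.Error.
try:
    csv.writer(io.StringIO(), quoting=csv.QUOTE_NONE).writerow([problem_string])
    error_message = None
except csv.Error as exc:
    error_message = str(exc)

print(round_trip[0][0] == problem_string)
print(error_message)
```

If the newlines need to stay intact, letting the writer quote the field is usually simpler than adding an escapechar.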

Some CSV cells are wrapped in "quotes" while others are not

I am a newbie to Python. I am not able to debug the code. Can someone please guide how to debug?
import csv
import operator

with open(inputFile, mode='rt', newline='') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    header = next(reader, None)
    # Sort the remaining rows by their second column
    rows = sorted(reader, key=operator.itemgetter(1))
with open(outputFile, 'w', newline='') as final:
    writer = csv.writer(final, delimiter=',')
    writer.writerow(header)
    for eachRow in rows:
        writer.writerow(eachRow)
In some cases the output is
"","xxx"
In other cases, I see
,xxx,
I tried adding an exception block, but ran into problems with indentation.
When you instantiate csv.writer you can tell it what quoting behavior you want. Pass in quoting=csv.QUOTE_ALL to tell it to meticulously quote everything.
writer = csv.writer(final, delimiter=',', quoting=csv.QUOTE_ALL)
However, this is typically not necessary; any reasonable CSV implementation will allow and expect most fields to be unquoted. The only fields which really need to be quoted are the ones which contain literal double quotes or commas (or more generally speaking, literal instances of the column separator or the quoting character; there are common CSV dialects like TSV etc which use a different delimiter).
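As a rough illustration of the difference, the sketch below renders the same row with the default csv.QUOTE_MINIMAL and with csv.QUOTE_ALL. The helper `render` and the sample row are made up for this example; `lineterminator="\n"` is set only to keep the printed output tidy.

```python
import csv
import io

def render(row, quoting):
    """Return the CSV text csv.writer produces for one row."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(row)
    return buf.getvalue()

row = ["", "xxx", "a,b", 'say "hi"']

minimal = render(row, csv.QUOTE_MINIMAL)   # quotes only where required
quote_all = render(row, csv.QUOTE_ALL)     # quotes every field

print(minimal, end="")    # ,xxx,"a,b","say ""hi"""
print(quote_all, end="")  # "","xxx","a,b","say ""hi"""
```

Both lines parse back to the same four fields, which is why mixed quoting in the output is normally harmless.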

Python CSV Parsing, Escaped Quote Character

I am trying to parse a CSV file using the csv.reader, my data is separated by commas and each value starts and ends with quotation marks. Example:
"This is some data", "New data", "More \"data\" here", "test"
My problem is with the third value, the data I get which has quotation marks within it has an escape character to show it is part of the data. The python CSV reader does not use this escape character so it results in incorrect parsing.
I tried code like below:
with open(filepath) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',', quotechar='\\"')
But I get an error complaining the quotechar is not 1 character.
My current solution is to replace every \" sequence with a single quote ' before parsing with csv.reader. However, I would like to know if there is a better way that does not modify the original data.
The issue here is that you need to define an escapechar, so that the csv reader knows to treat \" as ".
csv.reader(csv_file, quotechar='"', delimiter=',', escapechar='\\')
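A small sketch of this on the sample line from the question, read from a list instead of a file. One assumption beyond the answer above: the sample has a space after each comma, so `skipinitialspace=True` is added here so that the quote following the space is still recognised as an opening quote. The names `sample` and `fields` are illustrative.

```python
import csv

# Raw string so the backslashes are preserved exactly as in the data file.
sample = r'"This is some data", "New data", "More \"data\" here", "test"'

fields = next(csv.reader([sample],
                         quotechar='"',
                         delimiter=',',
                         escapechar='\\',
                         skipinitialspace=True))

print(fields)
# ['This is some data', 'New data', 'More "data" here', 'test']
```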

csv module automatically writing unwanted carriage returns

When using Python's csv module to create a CSV file, it automatically puts carriage return characters at the end of strings that contain a comma, e.g.:
['this one will have a carriage return, at the end','this one wont']
in an excel sheet this will turn out like:
| |this on|
because of the extra carriage return, it will also surround the string with the comma inside in double quotes, as expected.
The code I am using is:
with open(oldfile, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)
How do I create a CSV file from the same data that won't have carriage returns when the strings contain commas? I don't mind the strings being surrounded by double quotes.
Here's a link to the diagnosis of the problem with the output .csv:
Excel showing empty cells when importing file created with csv module
It's the accepted answer.
I have changed my code to:
with open(oldfile, 'w', newline='', quoting=csv.QUOTE_MINIMAL) as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)
I am now getting the error:
TypeError: 'quoting' is an invalid keyword argument for this function
Python's built-in csv module has the option csv.QUOTE_MINIMAL, which is also the writer's default. It is an argument to csv.writer, not to open(), which is why the TypeError above appears. With this option the writer adds quote marks only when a field contains the delimiter, the quote character or a line break: "your text, with comma",other field. Keep newline='' on open() so that the file object does not add extra carriage returns of its own.
The code is:
with open(oldfile, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
    for row in data:
        writer.writerow(row)
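For anyone who wants to verify what ends up on disk, the sketch below writes one row to a temporary file and reads the raw bytes back. It assumes Python 3; the temporary path and the sample row are invented for the example. The point it illustrates is that `quoting` belongs to csv.writer and `newline=''` belongs to open().

```python
import csv
import os
import tempfile

data = [['this one has a comma, inside', 'this one does not']]
path = os.path.join(tempfile.mkdtemp(), 'out.csv')  # throwaway location

# newline='' stops the file object from translating the writer's "\r\n".
with open(path, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(data)

with open(path, 'rb') as f:
    raw = f.read()

# Exactly one "\r\n", at the end of the row; only the comma field is quoted.
print(raw)
```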

Reading ASCII with field delimiter as ctrl A and line delimiting as \n into python

I have an ASCII dataset that has ctrl A field delimiting and \n as the line delimiter. I am looking to read this into Python and am wondering how to deal with it. In particular I would like to be able to read this information into a pandas dataframe.
I currently have;
import pandas as pd
input = pd.read_csv('000000_0', sep='^A')
The error I then get is
__main__:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does
not support regex separators; you can avoid this warning by specifying engine='python'.
I then don't know how I am specifying the line delimiter too.
Any ideas?
Thanks in advance!
Instead of writing "^A", use the hex escape for the character. It works like a charm:
import pandas as pd
data = pd.read_csv('000000_0', sep='\x01')
Use pd.read_csv with parameter sep=chr(1)
from io import StringIO
import pandas as pd
mycsv = """a{0}b{0}c
d{0}e{0}f""".format(chr(1))
pd.read_csv(StringIO(mycsv), sep=chr(1))
   a  b  c
0  d  e  f
If by CTRL+A you mean the ASCII-Code for SOH (start of heading), try splitting your data on newline first to get the rows, and split these on "\x01", which is the hex code for SOH. But without any code, data, expected result or error message, this is mostly guessing.
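A minimal sketch of that manual approach, assuming the fields never contain a newline or an SOH character themselves. The sample text in `raw` is made up; real data would come from reading the file.

```python
# Two lines, three fields each, separated by SOH ("\x01").
raw = "a\x01b\x01c\nd\x01e\x01f\n"

# Split into lines first, skip the empty trailing piece, then split each line.
rows = [line.split("\x01") for line in raw.split("\n") if line]

print(rows)
# [['a', 'b', 'c'], ['d', 'e', 'f']]
```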
Try this
import csv
# Read a comma-delimited file and rewrite it with Ctrl-A (chr(1)) as the delimiter
with open("/Users/778123/Documents/Splunk/data/DMS3^idms_core^20200723140421.csv", newline='') as src, \
        open("/Users/778123/Documents/Splunk/data/DMS3^idms_core^test.csv", 'w', newline='') as dst:
    reader = csv.reader(src, delimiter=',')
    writer = csv.writer(dst, delimiter=chr(1), quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(reader)
Python's csv library is pretty good at reading delimited files ;-)
Taking an example from the docs linked above:
import csv
with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))
This will automatically iterate over the lines in the file (and so handle the newline characters), and you can set the delimiter as shown.
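Adapted to the Ctrl-A case, a sketch might look like the following. It reads from an in-memory buffer rather than the file from the question, and the sample content is invented; with a real file, `open(filename, newline='')` would take the place of `io.StringIO`.

```python
import csv
import io

# Stand-in for the data file: fields separated by "\x01", lines by "\n".
sample = io.StringIO("a\x01b\x01c\nd\x01e\x01f\n", newline='')

rows = list(csv.reader(sample, delimiter="\x01"))

for row in rows:
    print(', '.join(row))
```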
