Escape commas when writing string to CSV - python

I need to prepend a comma-containing string to a CSV file using Python. Some say enclosing the string in double quotes escapes the commas within, but this does not work for me. How do I write this string without the commas being recognized as separators?
string = "WORD;WORD 45,90;WORD 45,90;END;"
with open('doc.csv') as f:
prepended = string + '\n' + f.read()
with open('doc.csv', 'w') as f:
f.write(prepended)

So as you point out, you can typically quote the string as shown below. Is the system that reads these files not recognizing that syntax? If you use Python's csv module, it will handle the proper escaping for you:
import csv

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(myIterable)
The quoted strings would look like:
"string1","string 2, with, commas"
Note that if you have a quote character within your string, it will be written as "" (two quote chars in a row):
"string1","string 2, with, commas, and "" a quote"

Related

How to write a string to csv that contains escape chars?

I am trying to write a list of strings to csv using csv.writer.
writer = csv.writer(f)
writer.writerow(some_text)
However, some of the strings contain a random escape character, which seems to be causing the following error: _csv.Error: need to escape, but no escapechar set
I've tried using the escapechar option in csv.writer like the following
writer = csv.writer(f, escapechar='\\')
but this seems to be only a partial solution, since the newline characters (\n) are still not recognized.
How would I solve this problem? An example of a problematic string would be the following:
problem_string = "this \n sentence \% is \n problematic \g"
What format do you want to achieve in the end? Writing this to a CSV seems to lead to some odd outcomes anyway.
In any case, both of these snippets work for me without errors, each giving slightly different results with respect to the escape characters.
With a normal string:
import csv

with open('test2.csv', 'w') as csvfile:
    csvwriter = csv.writer(csvfile)
    problem_string = "this \n sentence \% is \n problematic \g"
    # writerow treats the string as a sequence, so each character
    # becomes its own field
    csvwriter.writerow(problem_string)
With a raw string:
import csv

with open('test2.csv', 'w') as csvfile:
    csvwriter = csv.writer(csvfile)
    problem_string = r"this \n sentence \% is \n problematic \g"
    csvwriter.writerow(problem_string)
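If the goal is actually to keep the whole sentence in one cell, a small sketch like this (wrapping the string in a list so writerow sees a single field; the file name is just an example) may be closer to what was intended:

import csv

problem_string = r"this \n sentence \% is \n problematic \g"

with open('test3.csv', 'w', newline='') as csvfile:
    # QUOTE_ALL quotes the whole field, so the backslashes survive
    # without needing an escapechar.
    csvwriter = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    csvwriter.writerow([problem_string])  # one row, one field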

remove double quotes in each row (csv writer)

I'm writing API results to a CSV file in Python 3.7. The problem is that it adds double quotes (") to each row when it writes to the file.
I'm passing format as csv to the API call so that I get the results in CSV format, and then I'm writing them to a CSV file at a specific location.
Please suggest if there is a better way to do this.
Here is the sample code:
with open(target_file_path, 'w', encoding='utf8') as csvFile:
    writer = csv.writer(csvFile, quoting=csv.QUOTE_NONE, escapechar='\"')
    for line in rec.split('\r\n'):
        writer.writerow([line])
when I use escapechar='\"' it adds (") at the end of every column value.
here are the sample records:
2264855868",42.38454",-71.01367",07/15/2019 00:00:00",07/14/2019 20:00:00"
2264855868",42.38454",-71.01367",07/15/2019 01:00:00",07/14/2019 21:00:00"
The API gives you a string/bytes which you can write directly to a file.
data = requests.get(..).content
open(filename, 'wb').write(data)
With csv.writer you would have to convert the string/bytes to Python data using csv.reader and then convert it back to string/bytes with csv.writer - so there is no sense in doing it.
The same method should work if the API sends any other file type: JSON, CSV, XML, PDF, images, audio, etc.
For bigger files you could use chunk/stream in requests. Doc: requests - Advanced Usage
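A rough sketch of the chunked/streamed download mentioned above (the URL and chunk size are placeholders, not from the question):

import requests

url = "https://example.com/export.csv"  # hypothetical endpoint

with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open("target.csv", "wb") as f:
        # iter_content yields the body piece by piece instead of
        # loading the whole response into memory.
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)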
Have you tried removing the backslash from escapechar='\"'? It shouldn't be necessary, since you are using single quotes for the string.
EDIT: From the documentation:
A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the following character.
And the delimiter:
A one-character string used to separate fields. It defaults to ','
So it is going to escape the delimiter (,) with whatever you set as the escapechar, in this case the double quote (").
If you don't want any escaping, try leaving escapechar out.
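A small in-memory demonstration of that behaviour, reusing a value from the question's output: with QUOTE_NONE the writer puts the escapechar (here ") in front of every delimiter inside the field, which is exactly the stray quote seen at the end of each value.

import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar='"')

# One row with a single field that happens to contain commas,
# which is what writer.writerow([line]) does in the question.
writer.writerow(['2264855868,42.38454,-71.01367'])

print(buf.getvalue())  # prints: 2264855868",42.38454",-71.01367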
Try:
import codecs

def find_replace(file, search_characters, replace_with):
    text = codecs.open(file, "r", "utf-8-sig")
    text = ''.join([i for i in text]).replace(
        search_characters, replace_with)
    x = codecs.open(file, "w", "utf-8-sig")
    x.writelines(text)
    x.close()

if __name__ == '__main__':
    file = "target_file_path"
    search_characters = '"'
    replace_with = ''
    find_replace(file, search_characters, replace_with)
output:
2264855868,42.38454,-71.01367,07/15/2019 00:00:00,07/14/2019 20:00:00
2264855868,42.38454,-71.01367,07/15/2019 01:00:00,07/14/2019 21:00:00

Python CSV Parsing, Escaped Quote Character

I am trying to parse a CSV file using csv.reader; my data is separated by commas and each value starts and ends with quotation marks. Example:
"This is some data", "New data", "More \"data\" here", "test"
My problem is with the third value: the quotation marks inside the data are preceded by an escape character to show they are part of the value. The Python CSV reader does not use this escape character by default, so it results in incorrect parsing.
I tried code like below:
with open(filepath) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',', quotechar='\\"')
But I get an error complaining the quotechar is not 1 character.
My current solution is just to replace all \" sequences with a single quote ' before parsing with csv.reader - however, I would like to know if there is a better way that does not modify the original data.
The issue here is that you need to define an escapechar, so that the csv reader knows to treat \" as ".
csv.reader(csv_file, quotechar='"', delimiter=',', escapechar='\\')
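For example, a self-contained check against the sample line from the question (reading from an in-memory buffer rather than a file):

import csv
import io

# The line from the question, as it would appear in the file.
raw = r'"This is some data", "New data", "More \"data\" here", "test"'

reader = csv.reader(
    io.StringIO(raw),
    quotechar='"',
    delimiter=',',
    escapechar='\\',
    skipinitialspace=True,  # the sample has a space after each comma
)
print(next(reader))
# ['This is some data', 'New data', 'More "data" here', 'test']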

Modify default separators in json.dump in python 2.7.1

In a json.dump call (Python 2.7.1) the output uses the default separators (',' and ': '). I want to remove the comma and the colon so that my outputs are simply separated by whitespace.
I also want to remove the opening and closing braces. Is there any particular separators attribute or string formatting that allows me to do this, or is there any other solution?
For example after applying
with open(foutput, 'a') as f1:
    json.dump(newdict, f1, sort_keys=True, indent=4)
I am getting output as :
{
    "0.671962000": 51.61292129099999,
    "0.696699155": 51.61242420999999,
    "0.721436310": 51.610724798999996,
    "0.746173465": 51.60536924799999,
    "0.770910620": 51.58964636499999,
    "0.795647775": 51.543248571999996,
    "0.820384930": 51.381941735
}
But I want output of the following form instead:
0.671962000 -28.875564044
0.696699155 -28.876061125
0.721436310 -28.877760536
0.746173465 -28.883116087
0.770910620 -28.898838970
Please note I only want this in python.
Thanks in advance!
You are not producing JSON, so don't use the JSON module. You are producing CSV data, with a space as delimiter. Use the csv module, or use simple string formatting.
Using the csv module:
import csv
with open(foutput, 'a', newline='') as f1:  # newline='' requires Python 3; on Python 2 open in binary mode ('ab') instead
    writer = csv.writer(f1, delimiter=' ')
    writer.writerows(sorted(newdict.items()))
or simply using string formatting:
with open(foutput, 'a') as f1:
    for key, value in sorted(newdict.items()):
        f1.write('{} {}\n'.format(key, value))
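For instance, with a couple of the key/value pairs from the question, the formatting loop would print:

newdict = {"0.671962000": 51.61292129099999, "0.696699155": 51.61242420999999}

for key, value in sorted(newdict.items()):
    print('{} {}'.format(key, value))
# 0.671962000 51.61292129099999
# 0.696699155 51.61242420999999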

Python: Converting Binary Literal text file to Normal Text

I have a text file in this format:
b'Chapter 1 \xe2\x80\x93 BlaBla'
b'Boy\xe2\x80\x99s Dead.'
And I want to read those lines and convert them to
Chapter 1 - BlaBla
Boy's Dead.
and replace them on the same file.
I tried encoding and decoding already with print(line.encode("UTF-8", "replace")) and that didn't work
strings = [
    b'Chapter 1 \xe2\x80\x93 BlaBla',
    b'Boy\xe2\x80\x99s Dead.',
]

for string in strings:
    print(string.decode('utf-8', 'ignore'))
--output:--
Chapter 1 – BlaBla
Boy’s Dead.
and replace them on the same file.
There is no computer programming language in the world that can do that. You have to write the output to a new file, delete the old file, and rename the new file to the old file. However, python's fileinput module can perform that process for you:
import fileinput as fi
import sys

with open('data.txt', 'wb') as f:
    f.write(b'Chapter 1 \xe2\x80\x93 BlaBla\n')
    f.write(b'Boy\xe2\x80\x99s Dead.\n')

with open('data.txt', 'rb') as f:
    for line in f:
        print(line)

with fi.input(
        files='data.txt',
        inplace=True,
        backup='.bak',
        mode='rb') as f:
    for line in f:
        string = line.decode('utf-8', 'ignore')
        print(string, end="")
~/python_programs$ python3.4 prog.py
b'Chapter 1 \xe2\x80\x93 BlaBla\n'
b'Boy\xe2\x80\x99s Dead.\n'
~/python_programs$ cat data.txt
Chapter 1 – BlaBla
Boy’s Dead.
Edit:
import fileinput as fi
import re

pattern = r"""
    \\          #Match a literal backslash...
    x           #Followed by an x...
    [a-f0-9]{2} #Followed by any hex character, 2 times
"""
repl = ''

with open('data.txt', 'w') as f:
    print(r"b'Chapter 1 \xe2\x80\x93 BlaBla'", file=f)
    print(r"b'Boy\xe2\x80\x99s Dead.'", file=f)

with open('data.txt') as f:
    for line in f:
        print(line.rstrip())  #Output goes to terminal window

with fi.input(
        files='data.txt',
        inplace=True,
        backup='.bak') as f:
    for line in f:
        line = line.rstrip()[2:-1]
        new_line = re.sub(pattern, repl, line, flags=re.X)
        print(new_line)  #Writes to file, not your terminal window
~/python_programs$ python3.4 prog.py
b'Chapter 1 \xe2\x80\x93 BlaBla'
b'Boy\xe2\x80\x99s Dead.'
~/python_programs$ cat data.txt
Chapter 1 BlaBla
Boys Dead.
Your file does not contain binary data, so you can read it (or write it) in text mode. It's just a matter of escaping things correctly.
Here is the first part:
print(r"b'Chapter 1 \xe2\x80\x93 BlaBla'", file=f)
Python converts certain backslash escape sequences inside a string to something else. One of the backslash escape sequences that python converts is of the format:
\xNN #=> e.g. \xe2
The backslash escape sequence is four characters long, but python converts the backslash escape sequence into a single character.
However, I need each of the four characters to be written to the sample file I created. To keep python from converting the backslash escape sequence into one character, you can escape the beginning '\' with another '\':
\\xNN
But being lazy, I didn't want to go through your strings and escape each backslash escape sequence by hand, so I used:
r"...."
An r string keeps every backslash as a literal character, which has the same effect as escaping each one by hand. As a result, python writes all four characters of the \xNN sequence to the file.
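A quick interactive check of the difference (nothing here beyond the \xe2 example already used above):

s1 = "\xe2"   # the escape sequence collapses to a single character: 'â'
s2 = r"\xe2"  # the r string keeps all four characters: \, x, e, 2
print(len(s1), len(s2))  # 1 4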
The next problem is replacing a backslash in a string using a regex--I think that was your problem to begin with. When a file contains a \, python reads that into a string as \\ to represent a literal backslash. As a result, if the file contains the four characters:
\xe2
python reads that into a string as:
"\\xe2"
which when printed looks like:
\xe2
The bottom line is: if you can see a '\' in a string that you print out, then the backslash is being escaped in the string. To see what's really inside a string, you should always use repr().
string = "\\xe2"
print(string)
print(repr(string))
--output:--
\xe2
'\\xe2'
Note that if the output has quotes around it, then you are seeing everything in the string. If the output doesn't have quotes around it, then you can't be sure exactly what's in the string.
To construct a regex pattern that matches a literal back slash in a string, the short answer is: you need to use double the amount of back slashes that you would think. With the string:
"\\xe2"
you would think that the pattern would be:
pattern = "\\x"
but based on the doubling rule, you actually need:
pattern = "\\\\x"
And remember r strings? If you use an r string for the pattern, then you can write what seems reasonable, and the r string keeps the backslashes literal, which has the same effect as doubling them:
pattern = r"\\x"  #=> equivalent to "\\\\x"
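A quick sanity check of the doubling rule, reusing one of the lines from the sample file:

import re

line = r"Chapter 1 \xe2\x80\x93 BlaBla"  # contains literal \xNN sequences

# Both patterns match a literal backslash followed by 'x' and two hex digits:
print(re.sub("\\\\x[a-f0-9]{2}", "", line))  # plain string, backslashes doubled twice
print(re.sub(r"\\x[a-f0-9]{2}", "", line))   # r string, doubled once
# Both print: Chapter 1  BlaBla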
