Modify default separators in json.dump in python 2.7.1

In a json.dump method (python 2.7.1) the output has the default separator as (',' and ': '). I want to remove the comma and the colon so that my outputs are simply separated by white space.
I also want to remove the opening and closing braces. Is there any particular attribute of separator or string formatting that allows me to do this or is there any other solution?
For example after applying
with open(foutput, 'a') as f1:
    json.dump(newdict, f1, sort_keys=True, indent=4)
I am getting output as :
{
"0.671962000": 51.61292129099999,
"0.696699155": 51.61242420999999,
"0.721436310": 51.610724798999996,
"0.746173465": 51.60536924799999,
"0.770910620": 51.58964636499999,
"0.795647775": 51.543248571999996,
"0.820384930": 51.381941735,
}
But I want the below type output instead of that:
0.671962000 -28.875564044
0.696699155 -28.876061125
0.721436310 -28.877760536
0.746173465 -28.883116087
0.770910620 -28.898838970
Please note I only want this in python.
Thanks in advance!

You are not producing JSON, so don't use the JSON module. You are producing CSV data, with a space as delimiter. Use the csv module, or use simple string formatting.
Using the csv module (shown for Python 3; on Python 2.7, open the file in 'ab' mode and drop the newline argument):
import csv
with open(foutput, 'a', newline='') as f1:
    writer = csv.writer(f1, delimiter=' ')
    writer.writerows(sorted(newdict.items()))
or simply using string formatting:
with open(foutput, 'a') as f1:
    for key, value in sorted(newdict.items()):
        f1.write('{} {}\n'.format(key, value))
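If the values should also be written with a fixed number of decimals, as in the desired output, the string-formatting approach can be extended. The following is a minimal sketch, assuming Python 3; the helper name write_pairs, its precision parameter, and the in-memory buffer are illustrative and not part of the original answer. Note that the keys are strings, so they are sorted lexicographically, which matches numeric order only when they all have the same width.

```python
import io


def write_pairs(mapping, stream, precision=9):
    """Write 'key value' lines, sorted by key, with fixed decimal places."""
    for key, value in sorted(mapping.items()):
        # Nested field {p} supplies the number of decimals to print.
        stream.write('{} {:.{p}f}\n'.format(key, value, p=precision))


# Two entries taken from the question's sample dictionary.
newdict = {
    "0.696699155": 51.61242420999999,
    "0.671962000": 51.61292129099999,
}

buf = io.StringIO()  # stands in for the opened output file
write_pairs(newdict, buf)
print(buf.getvalue())
```

Any object with a write method works as the stream, so the same function can be handed the file opened with open(foutput, 'a').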

Related

remove double quotes in each row (csv writer)

I'm writing API results to CSV file in python 3.7. Problem is it adds double quotes ("") to each row when it writes to file.
I'm passing format as csv to API call, so that I get results in csv format and then I'm writing it to csv file, store to specific location.
Please suggest if there is any better way to do this.
Here is the sample code..
with open(target_file_path, 'w', encoding='utf8') as csvFile:
    writer = csv.writer(csvFile, quoting=csv.QUOTE_NONE, escapechar='\"')
    for line in rec.split('\r\n'):
        writer.writerow([line])
When I use escapechar='\"', it adds (") at the end of every column value.
Here are some sample records:
2264855868",42.38454",-71.01367",07/15/2019 00:00:00",07/14/2019 20:00:00"
2264855868",42.38454",-71.01367",07/15/2019 01:00:00",07/14/2019 21:00:00"

The API returns a string/bytes that you can write directly to a file:
data = requests.get(..).content
with open(filename, 'wb') as f:
    f.write(data)
With csv.writer you would have to convert the string/bytes to Python data using csv.reader and then convert it back to string/bytes with csv.writer, so there is no point in doing that.
The same method should work whatever kind of file the API sends: JSON, CSV, XML, PDF, images, audio, etc.
For bigger files you could use chunked/streamed downloads in requests. Doc: requests - Advanced Usage

Have you tried removing the backslash from escapechar='\"'? It shouldn't be necessary, since you are using single quotes for the string.
EDIT: From the documentation:
A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the following character.
And the delimiter:
A one-character string used to separate fields. It defaults to ','
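So it is going to escape the delimiter (,) with whatever you set as the escapechar, in this case ".
If you don't want any escaping, try leaving it empty. Be aware that with QUOTE_NONE and no escapechar, the writer raises an error as soon as a field contains the delimiter.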
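The effect described above can be reproduced in a few lines. This is a minimal sketch, assuming Python 3; the function names and the shortened sample data are illustrative. It contrasts passing each whole line as a single field (the question's approach) with splitting the line into fields first. The plain split(',') is a simplification that would not cope with quoted fields containing commas.

```python
import csv
import io


def write_as_single_field(text):
    """Question's approach: each whole line becomes one CSV field."""
    buf = io.StringIO(newline='')
    writer = csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar='"')
    for line in text.split('\r\n'):
        # The commas are inside the field, so the writer escapes each one.
        writer.writerow([line])
    return buf.getvalue()


def write_as_fields(text):
    """Split each line into fields so the commas are real delimiters."""
    buf = io.StringIO(newline='')
    writer = csv.writer(buf)
    for line in text.split('\r\n'):
        writer.writerow(line.split(','))
    return buf.getvalue()


# First three columns of the question's sample records.
rec = "2264855868,42.38454,-71.01367\r\n2264855868,42.38454,-71.01367"
print(write_as_single_field(rec))  # a '"' appears before every comma
print(write_as_fields(rec))        # plain comma-separated output
```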

Try:
import codecs
def find_replace(file, search_characters, replace_with):
    # Read the whole file, replace the characters, and write it back.
    with codecs.open(file, "r", "utf-8-sig") as src:
        text = src.read().replace(search_characters, replace_with)
    with codecs.open(file, "w", "utf-8-sig") as dst:
        dst.write(text)
if __name__ == '__main__':
    file = "target_file_path"
    search_characters = '"'
    replace_with = ''
    find_replace(file, search_characters, replace_with)
output:
2264855868,42.38454,-71.01367,07/15/2019 00:00:00,07/14/2019 20:00:00
2264855868,42.38454,-71.01367,07/15/2019 01:00:00,07/14/2019 21:00:00

Escape commas when writing string to CSV

I need to prepend a comma-containing string to a CSV file using Python. Some say enclosing the string in double quotes escapes the commas within. This does not work. How do I write this string without the commas being recognized as separators?
string = "WORD;WORD 45,90;WORD 45,90;END;"
with open('doc.csv') as f:
    prepended = string + '\n' + f.read()
with open('doc.csv', 'w') as f:
    f.write(prepended)

So as you point out, you can typically quote the string as below. Is the system that reads these files not recognizing that syntax? If you use Python's csv module, it will handle the proper escaping (note that quoting is an argument of csv.writer, not of writerows):
import csv
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(myIterable)
The quoted strings would look like:
"string1","string 2, with, commas"
Note if you have a quote character within your string it will be written as "" (two quote chars in a row):
"string1","string 2, with, commas, and "" a quote"
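The quoting behaviour described in this answer can be checked without touching the file system. This is a minimal sketch, assuming Python 3; the in-memory buffer and the variable names are illustrative. It writes two rows with QUOTE_ALL and then reads them back to confirm that the embedded commas and the doubled quote survive the round trip.

```python
import csv
import io

rows = [
    ["string1", "string 2, with, commas"],
    ["string1", 'string 2, with, commas, and " a quote'],
]

# Write every field quoted; an embedded quote is doubled by default.
buf = io.StringIO(newline='')
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerows(rows)
written = buf.getvalue()
print(written)

# Reading the text back recovers the original fields unchanged.
parsed = list(csv.reader(io.StringIO(written, newline='')))
```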

Python csv writer : "Unknown Dialect" Error

I have a very large string in the CSV format that will be written to a CSV file.
I am trying to write it to CSV using the simplest possible Python script:
results=""" "2013-12-03 23:59:52","/core/log","79.223.39.000","logging-4.0",iPad,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,3,"1385593191.865",true,ERROR,"app_error","iPad/Unknown/webkit/537.51.1",NA,"Does+not",false
"2013-12-03 23:58:41","/core/log","217.7.59.000","logging-4.0",Win32,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,4,"1385593120.68",true,ERROR,"app_error","Win32/Unknown/msie/9.0",NA,"Does+not,false
"2013-12-03 23:58:19","/core/client_log","79.240.195.000","logging-4.0",Win32,"5.1","1.0.1.59-266060",NA,NA,NA,NA,6,"1385593099.001",true,ERROR,"app_error","Win32/5.1/mozilla/25.0",NA,"Could+not:+{"url":"/all.json?status=ongoing,scheduled,conflict","code":0,"data":"","success":false,"error":true,"cached":false,"jqXhr":{"readyState":0,"responseText":"","status":0,"statusText":"error"}}",false"""
resultArray = results.split('\n')
with open(csvfile, 'wb') as f:
    writer = csv.writer(f)
    for row in resultArray:
        writer.writerows(row)
The code returns an "Unknown Dialect" error.
Is the error because of the script or is it due to the string that is being written?
EDIT
If the problem is bad input how do I sanitize it so that it can be used by the csv.writer() method?

You need to specify the format of your string:
with open(csvfile, 'wb') as f:
    writer = csv.writer(f, delimiter=',', quotechar="'", quoting=csv.QUOTE_ALL)
You might also want to re-visit your writing loop; the way you have it written you will get one column in your file, and each row will be one character from the results string.
To really exploit the module, try this:
import csv
lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]
reader = csv.reader(lines, quotechar="'")
with open('out.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(list(reader))
out.csv will have:
A,bunch+of,multiline,"CSV,LIKE,STRING"
If you want to quote all the column values, then add quoting=csv.QUOTE_ALL to the writer object; then your file will have:
"A","bunch+of","multiline","CSV,LIKE,STRING"
To change the quotes to ', add quotechar="'" to the writer object.

The above code does not give csv.writer.writerows input that it expects. Specifically:
resultArray = results.split('\n')
This creates a list of strings. Then, you pass each string to your writer and tell it to writerows with it:
for row in resultArray:
    writer.writerows(row)
But writerows does not expect a single string. From the docs:
csvwriter.writerows(rows)
Write all the rows parameters (a list of row objects as described above) to the writer’s file object, formatted according to the current dialect.
So you're passing a string to a method that expects its argument to be a list of row objects, where a row object is itself expected to be a sequence of strings or numbers:
A row must be a sequence of strings or numbers for Writer objects
Are you sure your listed example code accurately reflects your attempt? While it certainly won't work, I would expect the exception produced to be different.
For a possible fix - if all you are trying to do is to write a big string to a file, you don't need the csv library at all. You can just write the string directly. Even splitting on newlines is unnecessary unless you need to do something like replacing Unix-style linefeeds with DOS-style linefeeds.
If you need to use the csv module after all, you need to give your writer something it understands - in this example, that would be something like writer.writerow(['A','bunch+of','multiline','CSV,LIKE,STRING']). Note that that's a true Python list of strings. If you need to turn your raw string "'A','bunch+of','multiline','CSV,LIKE,STRING'" into such a list, I think you'll find the csv library useful as a reader - no need to reinvent the wheel to handle the quoted commas in the substring 'CSV,LIKE,STRING'. And in that case you would need to care about your dialect.
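The reader-then-writer route suggested in the last paragraph could look like the following. This is a minimal sketch, assuming Python 3 and single-quoted input; the second input row and the in-memory buffer are made up for illustration.

```python
import csv
import io

# Raw CSV-like text; the second row is a made-up example.
raw = ("'A','bunch+of','multiline','CSV,LIKE,STRING'\n"
       "'B','second','row','x,y'")

# csv.reader accepts any iterable of lines and handles the quoted commas.
rows = list(csv.reader(raw.splitlines(), quotechar="'"))

# Each row is now a list of strings, which is what writerows expects.
buf = io.StringIO(newline='')
csv.writer(buf).writerows(rows)
result = buf.getvalue()
print(result)
```

Only the fields that contain a comma are quoted on output, because the writer uses minimal quoting by default.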

You can use register_dialect, for example for escaped formatting:
csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)
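Once registered, the dialect is selected by name when creating a writer. This is a minimal sketch, assuming Python 3; the sample row and the name 'not_registered' are illustrative. It also shows that asking for a name that was never registered raises csv.Error.

```python
import csv
import io

csv.register_dialect('escaped', escapechar='\\', doublequote=True,
                     quoting=csv.QUOTE_ALL)

# Use the registered dialect by name.
buf = io.StringIO(newline='')
writer = csv.writer(buf, dialect='escaped')
writer.writerow(['plain', 'with "quote"', 'a,b'])
line = buf.getvalue()
print(line)

# A name that was never registered is rejected with csv.Error.
try:
    csv.writer(io.StringIO(), dialect='not_registered')
    error_message = None
except csv.Error as exc:
    error_message = str(exc)
print(error_message)
```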

Regex to remove doubled double quotes from CSV

I have an excel sheet that has a lot of data in it in one column in the form of a python dictionary from a sql database. I don't have access to the original database and I can't import the CSV back into sql with the local infile command due to the fact that the keys/values on each row of the CSV are not in the same order. When I export the excel sheet to CSV I get:
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"
What is the best way to remove the " before and after the curly brackets as well as the extra " around the keys/values?
I also need to leave the integers alone that don't have quotes around them.
I am trying to then import this into python with the json module so that I can print specific keys but I can't import them with the doubled double quotes. I ultimately need the data saved in a file that looks like:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Any help is most appreciated!

Easy:
text = re.sub(r'"(?!")', '', text)
Given the input file: TEST.TXT:
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"
The script:
import re
f = open("TEST.TXT","r")
text_in = f.read()
text_out = re.sub(r'"(?!")', '', text_in)
print(text_out)
produces the following output:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}

This should do it:
import re
with open('old.csv') as old, open('new.csv', 'w') as new:
    new.writelines(re.sub(r'"(?!")', '', line) for line in old)
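Since the goal is to load the result with the json module, it is worth checking that each cleaned line actually parses. This is a minimal sketch, assuming Python 3 and the two sample lines from the question held in a list instead of a file; the variable names are illustrative.

```python
import json
import re

lines = [
    '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"',
    '"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"',
]

# Drop every double quote that is not immediately followed by another one.
cleaned = [re.sub(r'"(?!")', '', line) for line in lines]

# Each cleaned line should now be a valid JSON object.
records = [json.loads(line) for line in cleaned]
print(records[0]['first_name'], records[0]['age'])
```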

If the input file is just as shown, and of the small size you mention, you can load the whole file in memory, make the substitutions, and then save it. IMHO, you don't need a regex to do this. The easiest-to-read code that does this is:
with open(filename) as f:
    text = f.read()
text = text.replace('""', '"')
text = text.replace('"{', '{')
text = text.replace('}"', '}')
with open(filename, "w") as f:
    f.write(text)
I tested it with the sample input and it produces:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Which is exactly what you want.
If you want, you can also pack the code and write
with open(inputFilename) as fin:
    with open(outputFilename, "w") as fout:
        fout.write(fin.read().replace('""', '"').replace('"{', '{').replace('}"', '}'))
but I think the first one is much clearer and both do exactly the same.

I think you are overthinking the problem. Why not just replace the data?
l = list()
with open('foo.txt') as f:
    for line in f:
        l.append(line.replace('""', '"').replace('"{', '{').replace('}"', '}'))
s = ''.join(l)
print(s)  # or save it to file
It generates:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Use a list to store the intermediate lines and then call .join to improve performance, as explained in "Good way to append to a string".

You can actually use the csv module and a regex to do this (note that this version also puts quotes around the integers):
st='''\
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\
'''
import csv, re
data=[]
reader=csv.reader(st, dialect='excel')
for line in reader:
    data.extend(line)
s=re.sub(r'(\w+)',r'"\1"',''.join(data))
s=re.sub(r'({[^}]+})',r'\1\n',s).strip()
print(s)
Prints
{"first_name":"John","last_name":"Smith","age":"30"}
{"first_name":"Tim","last_name":"Johnson","age":"34"}
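A variation on the csv-module idea keeps the integers unquoted by letting csv.reader undo the CSV quoting and handing the resulting field to the json module. This is a minimal sketch, assuming Python 3.7+ (so that dictionaries keep their key order) and the question's two sample lines held in a list; the variable names are illustrative.

```python
import csv
import json

lines = [
    '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"',
    '"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"',
]

# csv.reader removes the outer quotes and collapses each "" into ".
# Every row then holds a single field containing the JSON text.
records = [json.loads(row[0]) for row in csv.reader(lines)]

# Re-serialise compactly, one object per line.
output = '\n'.join(json.dumps(rec, separators=(',', ':')) for rec in records)
print(output)
```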

Insert whitespace after delimiter with CSV writer

f = open("file1.csv", "r")
g = open("file2.csv", "w")
a = csv.reader(f, delimiter=";", skipinitialspace=True)
b = csv.writer(g, delimiter=";")
for line in a:
    b.writerow(line)
In the above code, I try to load file1.csv using the csv module in Python2.7, and then write it in file2.csv using a csv.writer.
My issue comes from existing whitespace (a single space character) after the delimiter in the input file. I need to remove it in order to do some data manipulation later on, so I used the skipinitialspace=True argument for the reader. However, I cannot get the writer to print the space character after the delimiter, which disturbs any subsequent diffing of the two files.
I tried to use the Sniffer class to auto-generate a Dialect but I guess my input files (coming from a large complex legacy system, with dozens of fields and poor quoting and escaping) are proving to be too complex for this.
In more simple terms I'm looking for the answers to the following questions:
How can I insert a space character after each delimiter in the writer?
Incidentally, what are the reasons to prohibit the use of multi-character strings as delimiters? delimiter="; " would've solved my problem.

You can wrap your file objects in proxies that add the whitespace:
>>> class DelimitedFile(file):
...     def write(self, value):
...         super(DelimitedFile, self).write(value.replace(";", "; "))
...
>>> f = DelimitedFile("foo", "w")
>>> f.write("hello;world")
>>> f.close()
>>> open("foo").read()
'hello; world'
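The built-in file type used above exists only in Python 2. The same proxy idea can be expressed by wrapping any file-like object, since csv.writer only needs something with a write method. This is a minimal sketch, assuming Python 3; the class name SpacedDelimiterFile is hypothetical. As with the original, a semicolon inside a quoted field would also get a space appended.

```python
import csv
import io


class SpacedDelimiterFile:
    """File-like wrapper that adds a space after each delimiter on write."""

    def __init__(self, fileobj, delimiter=';'):
        self._fileobj = fileobj
        self._delimiter = delimiter

    def write(self, value):
        # csv.writer calls write() once per formatted row.
        spaced = value.replace(self._delimiter, self._delimiter + ' ')
        return self._fileobj.write(spaced)


buf = io.StringIO(newline='')  # stands in for the real output file
writer = csv.writer(SpacedDelimiterFile(buf), delimiter=';')
writer.writerow(['hello', 'world', '42'])
print(buf.getvalue())
```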

If you left the whitespace you want written in (removing/restoring it during processing), or put it back after processing but before writing, that would take care of it.

One solution would be to write to a StringIO object, and then to replace the semicolons with '; ', or to do so during processing of the lines, if you do any other processing.
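The StringIO variant could be sketched as follows, assuming Python 3 and made-up sample data in place of file1.csv. The rows are read with skipinitialspace=True, written to an in-memory buffer, and the space is restored in one pass before the text would be saved. The blanket replace also affects semicolons inside quoted fields, so it only suits data where that cannot happen.

```python
import csv
import io

# Made-up stand-in for the contents of file1.csv.
source = "a; b; c\r\n1; 2; 3\r\n"

reader = csv.reader(io.StringIO(source, newline=''),
                    delimiter=';', skipinitialspace=True)

# Write to memory first instead of straight to file2.csv.
buf = io.StringIO(newline='')
csv.writer(buf, delimiter=';').writerows(reader)

# Put the space back after every delimiter before saving the text.
restored = buf.getvalue().replace(';', '; ')
print(restored)
```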

As for the first, I would probably do something like this:
for line in a:
    # line is a list of fields; prefix every field except the first with a space
    b.writerow(line[:1] + [' ' + field for field in line[1:]])
As for the second, I have no idea.
