Python JSON encoding Verification - python

I am trying to scan a text document, find certain sections, and write them to a file in JSON format.
Unfortunately, I am not too sure how to use json and would appreciate it if someone could tell me how to encode the data as JSON properly.
Thank you everyone!
import json

# Save the word and its type to the database file, one JSON document per line
word = [{'WORD': strWrd, 'TYPE': strWrdtyp}]
with open(input_lang + '.dic', 'a') as outfile:
    try:
        json.dump(word, outfile)
        outfile.write('\n')
        # No explicit close is needed: the with block closes the file
    except (TypeError, ValueError) as err:
        print('Error: %s' % err)
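One common way to keep many records in a single file is to write one JSON document per line, which is what the snippet above already does when it appends a newline after each dump. The sketch below assumes that layout and shows both writing and reading back. The helper names `append_entry` and `read_entries` and the file name `en.dic` are made up for illustration.

```python
import json
import os
import tempfile


def append_entry(path, word, word_type):
    """Append one word record to `path` as a single JSON line (hypothetical helper)."""
    entry = {"WORD": word, "TYPE": word_type}
    with open(path, "a", encoding="utf-8") as outfile:
        outfile.write(json.dumps(entry) + "\n")


def read_entries(path):
    """Read the records back, parsing each non-empty line as its own JSON document."""
    with open(path, encoding="utf-8") as infile:
        return [json.loads(line) for line in infile if line.strip()]


# Usage with a throwaway directory; 'en.dic' is only an example name.
dic_path = os.path.join(tempfile.mkdtemp(), "en.dic")
append_entry(dic_path, "run", "verb")
append_entry(dic_path, "cat", "noun")
entries = read_entries(dic_path)
print(entries)
```

Note that a file built this way is not one JSON document as a whole, so it has to be read line by line rather than with a single json.load call.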

Related

python - json keep returning JSONDecodeError when reading from file

I want to write data to a JSON file. If it does not exist, I want to create the file and write data to it. I wrote code for it, but I'm getting json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0).
Here's part of my code:
data = {"foo": "1"}
with open("file.json", "w+") as f:
    try:
        file_data = json.load(f)
        json.dump(data, f)
        print("got data from file:", file_data)
    except json.decoder.JSONDecodeError:
        json.dump(data, f)
        print("wrote")
I added print statements so that I can track what's going on, but if I run this code multiple times, I keep getting the "wrote" message.
Thanks in advance for help!
The problem is that you open the file with mode "w+", which truncates it: as soon as the file is opened, its previous content is gone.
json.load then fails because an empty file is not valid JSON.
So I'd suggest opening the file separately for reading and for writing:
import json

with open("file.json") as json_file:
    file_data = json.load(json_file)
print("got data from file:", file_data)

data = {"foo": "1"}
with open("file.json", "w") as json_file:
    json.dump(data, json_file)
Hope it helps!
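The question also asks for the file to be created when it does not exist yet, and reading first would then raise FileNotFoundError. A minimal sketch of one way to cover both cases follows. The function name `load_or_create` is hypothetical, and treating an empty or invalid file the same as a missing one is an assumption about what is wanted here.

```python
import json
import os
import tempfile


def load_or_create(path, default):
    """Return the JSON content of `path`; write `default` there if the file is missing or invalid."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # Open for writing only after reading has failed, so valid data is never truncated.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(default, f)
        return default


path = os.path.join(tempfile.mkdtemp(), "file.json")
first = load_or_create(path, {"foo": "1"})   # file does not exist yet, so it is created
second = load_or_create(path, {"foo": "2"})  # file exists now, so its content is returned
print(first, second)
```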

Decode from Escaped Unicode to Arabic using Python

I was trying to decode a JSON file that contains escaped Unicode text (\uHHHH). The original text is Arabic.
My research led me to the following Python code:
s = '\u00d8\u00b5\u00d9\u0088\u00d8\u00b1 \u00d8\u00a7\u00d9\u0084\u00d9\u008a\u00d9\u0088\u00d9\u0085\u00d9\u008a\u00d8\u00a7\u00d8\u00aa'
ouy= s.encode('utf-8').decode('unicode-escape').encode('latin1').decode('utf-8')
print(ouy)
The resulting text is: صÙر اÙÙÙÙÙات
which still has to be fixed with an online tool to become the original text: صور اليوميات
Is there any way to perform that fix in the code above?
I would appreciate your help, thanks in advance.
You can use this script to update all JSON files:
import json

filename = 'YourFile.json'  # file name we want to compress
newname = filename.replace('.json', '.min.json')  # Output file name
with open(filename, encoding="utf8") as fp:
    print("Compressing file: " + filename)
    print('Compressing...')
    jload = json.load(fp)
    newfile = json.dumps(jload, indent=None, separators=(',', ':'), ensure_ascii=False)
    newfile = newfile.encode('latin1').decode('utf-8')  # remove this
    #print(newfile)
    with open(newname, 'w', encoding="utf8") as f:  # add encoding="utf8"
        f.write(newfile)
print('Compression complete!')

How to save a path to json file as raw string

I want to save a path to a JSON file, with the code below:
import json
import os


def writeToJasonFile(results, filename):
    with open(filename, "w") as fp:
        try:
            fp.write(json.dumps(results))
        except Exception as e:
            # The "%s" placeholder is required; without it the formatting itself raises TypeError
            raise Exception("Exception while writing results: %s" % e)


if __name__ == '__main__':
    file_path = os.getcwd()
    writeToJasonFile(file_path, 'test.json')
When I open the JSON file, the string is saved with escaped backslashes: "C:\\test\\Python Script"
How can I dump it as a raw string, i.e. "C:\test\Python Script"?
I could do it another way: I replace '\' with '/' in the path string and then save it. Windows is able to open the location in this format, "C:/test/Python Script". If someone has an answer to the original question, please post it here.
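On the original question: JSON itself requires a backslash inside a string to be written as \\, so the doubled backslashes are only the on-disk representation. Reading the file back with the json module restores the single backslashes. A small round trip illustrates this, with a hard-coded example path standing in for os.getcwd():

```python
import json

# In Python source "\\" is one backslash, so this string holds C:\test\Python Script
path = "C:\\test\\Python Script"

encoded = json.dumps(path)     # JSON text, backslashes escaped as the format requires
decoded = json.loads(encoded)  # parsing undoes the escaping

print(path)      # C:\test\Python Script
print(encoded)   # "C:\\test\\Python Script"
print(decoded)   # C:\test\Python Script
```

A file containing "C:\test\Python Script" with single backslashes would not be valid JSON with the intended meaning, because \t would be read as a tab and \P is not a legal escape.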

Unicode issue in Python 2.7

I saved tweets in a JSON file.
This is my code:
def on_data(self, data):
    try:
        with codecs.open('python.json', 'a', encoding='utf-8') as f:
            f.write(data)
            print("Tweet ajoute au JSON")
            return True
    except BaseException as e:
        print("Error on_data: %s" % str(e))
    return True
But I get this kind of character sequence: \u0e40\u0e21\u0e19\u0e0a
I tried everything to avoid these escapes, but nothing works (utf-8, latin2...).
If you want the non-ASCII characters written directly in the JSON file, you need to encode the JSON with the ensure_ascii=False option.
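Since the data arriving in on_data is already JSON text with the escapes in it, it has to be parsed and serialized again for ensure_ascii=False to take effect. The sketch below uses Python 3 and a made-up one-field payload in place of a real tweet; in Python 2.7 the same json calls exist, but the result should be written through a file opened with an explicit encoding, as the question already does with codecs.open.

```python
import json

# A line as an API might deliver it: valid JSON with the non-ASCII text escaped.
raw = '{"text": "\\u0e40\\u0e21\\u0e19\\u0e0a"}'

tweet = json.loads(raw)                           # escapes become real characters
escaped = json.dumps(tweet)                       # default ensure_ascii=True escapes them again
readable = json.dumps(tweet, ensure_ascii=False)  # keeps the characters as they are

print(escaped)
print(readable)
```

Both outputs are valid JSON and parse to the same data, so the escaped form is harmless to any program that reads the file with a JSON parser.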

Parsing XML from websites and save the code?

I would like to parse the xml code from a website like
http://ops.epo.org/3.1/rest-services/published-data/publication/docdb/EP1000000/biblio
and save it in another xml or csv file.
I tried it with this:
import urllib.request

web_data = urllib.request.urlopen("http://ops.epo.org/3.1/rest-services/published-data/publication/docdb/EP1000000/biblio")
str_data = web_data.read()
try:
    f = open("file.xml", "w")
    f.write(str(str_data))
    print("SUCCESS")
except:
    print("ERROR")
But in the saved XML there is a '\n' between every element and a b' at the beginning.
How can I save the XML data without all the '\n' and the b'?
If you write the XML file in binary mode, you don't need to convert the data you read into a string of characters first. The literal '\n' sequences and the leading b' both come from calling str() on a bytes object, so writing the bytes unchanged gets rid of them. The response can also be copied a line at a time, and the logic of your code could be structured a little better IMO, as shown below:
import urllib.request

web_data = urllib.request.urlopen("http://ops.epo.org/3.1/rest-services"
                                  "/published-data/publication"
                                  "/docdb/EP1000000/biblio")
with open("file.xml", "wb") as f:
    try:
        # Iterating over the response yields the body one line (as bytes) at a time
        for line in web_data:
            f.write(line)
    except Exception as exc:
        print('ERROR')
        print(str(exc))
    else:
        print('SUCCESS')
read() returns the data as bytes, and you can save it without converting it with str(). You have to open the file in binary mode ("wb") and write the data.
import urllib.request

web_data = urllib.request.urlopen("http://ops.epo.org/3.1/rest-services/published-data/publication/docdb/EP1000000/biblio")
data = web_data.read()
try:
    # "wb" writes the bytes unchanged; the with block closes the file
    with open("file.xml", "wb") as f:
        f.write(data)
    print("SUCCESS")
except Exception:
    print("ERROR")
BTW: To convert bytes to a string (unicode) you have to use e.g. decode('utf-8').
If you use str(), Python builds the string from the printable representation of the bytes object, which is why it starts with b' and shows every newline as \n.
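The difference is easy to see without any network access. In this small sketch the bytes literal is a made-up stand-in for what web_data.read() returns:

```python
# Stand-in for the bytes returned by web_data.read(); the content is made up.
data = b'<?xml version="1.0" encoding="UTF-8"?>\n<root/>\n'

as_repr = str(data)             # representation of the bytes object, with b'...' and a literal \n
as_text = data.decode("utf-8")  # real text with real newlines

print(as_repr)
print(as_text)
```

Decoding with 'utf-8' assumes the document really is UTF-8 encoded; the encoding declared in the XML prolog or in the HTTP headers is the value to use if it differs.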
