Merge JSON files JSONDecodeError - python

Goal: Merge JSON files into one big file
Background: I am using the code below, taken from Issue with merging multiple JSON files in Python:
import json
import glob

result = []
for f in glob.glob("/Users/EER/Desktop/JSON_Combo/*.json"):
    with open(f, "rb") as infile:
        result.append(json.load(infile))

with open("merged_file.json", "wb") as outfile:
    json.dump(result, outfile)
However, I get the following error:
JSONDecodeError: Extra data: line 2 column 1 (char 5733)
I checked Python json.loads shows ValueError: Extra data, JSONDecodeError: Extra data: line 1 column 228 (char 227), and ValueError: Extra Data error when importing json file using python, but they are a bit different from my case. A potential reason for the error seems to be that my .json files are a list of strings, but I am not sure.
Question: Any thoughts on how to fix this error?

There is an invalid JSON file among your files. Find out which one caused the error by catching it with try/except:
import json
import glob

result = []
for f in glob.glob("/Users/EER/Desktop/JSON_Combo/*.json"):
    with open(f, "rb") as infile:
        try:
            result.append(json.load(infile))
        except ValueError:
            print(f)  # report the file that failed to parse

# Note: open the output in text mode ("w"); json.dump writes str, not bytes.
with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)
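The "Extra data" error usually means a file contains more than one JSON document, for example one document per line (JSON Lines). If the loop above prints such a file, a minimal sketch that parses it line by line instead; "offending.json" is a placeholder for whatever filename the loop printed:

import json

# Sketch: parse a JSON Lines file (one JSON document per line).
records = []
with open("offending.json") as infile:
    for line in infile:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))

The records list can then be appended to result before the final dump.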

Related

python - json keep returning JSONDecodeError when reading from file

I want to write data to a json file. If it does not exist, I want to create that file and write data to it. I wrote code for it, but I'm getting json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0).
Here's part of my code:
data = {"foo": "1"}
with open("file.json", "w+") as f:
try:
file_data = json.load(f)
json.dump(data, f)
print("got data from file:", file_data)
except json.decoder.JSONDecodeError:
json.dump(data, f)
print("wrote")
I put in print statements so that I can "track" what's going on, but no matter how many times I run this code, I keep getting the wrote message.
Thanks in advance for help!
The problem is that you open the file in write/read mode ("w+"), which truncates the file the moment it is opened.
Then json.load obviously fails, because the now-empty file is no longer valid JSON.
So I'd suggest opening the file for reading and writing separately:
import json

with open("file.json") as json_file:
    file_data = json.load(json_file)
    print("got data from file:", file_data)

data = {"foo": "1"}
with open("file.json", "w") as json_file:
    json.dump(data, json_file)
Hope it helps!
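If the file may not exist yet, which was the other half of the question, a minimal sketch that reads it when present and creates it otherwise, keeping the same prints as the original:

import json

data = {"foo": "1"}
try:
    with open("file.json") as json_file:
        file_data = json.load(json_file)
    print("got data from file:", file_data)
except FileNotFoundError:
    # First run: the file does not exist yet, so create it.
    with open("file.json", "w") as json_file:
        json.dump(data, json_file)
    print("wrote")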

Opening file sometimes results in json.decoder.JSONDecodeError

I have a file that is updated with new content and saved every hour:
with open('data.txt', 'w') as outfile:
    json.dump(data, outfile)
I then read this file at any given time:
with open('data.txt') as json_file:
    data = json.load(json_file)
The problem I'm having is that sometimes the file is still being updated with new content when I try to read it, which results in a json.decoder.JSONDecodeError. How can I avoid this error? Maybe a try/except that waits for the file to be readable?
The easiest way would be to catch json.JSONDecodeError:
import json
import time

try:
    with open('data.txt') as json_file:
        data = json.load(json_file)
except json.JSONDecodeError:
    # The writer may still be mid-dump; wait and retry once.
    time.sleep(10)
    with open('data.txt') as json_file:
        data = json.load(json_file)
Or you can try answer from https://stackoverflow.com/a/11115521/4916849.
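Another option is to remove the race on the writer side instead of retrying on the reader side. A minimal sketch, assuming the hourly writer can be changed: dump to a temporary file and swap it into place with os.replace, which is atomic when both paths are on the same filesystem, so readers only ever see a complete file:

import json
import os

def save_atomic(data, path):
    tmp_path = path + ".tmp"  # hypothetical temp name next to the target
    with open(tmp_path, "w") as outfile:
        json.dump(data, outfile)
    os.replace(tmp_path, path)  # atomic swap; readers never see a partial file

save_atomic({"example": 1}, "data.txt")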

How to open a json.gz file and return a dictionary in Python

I have downloaded a compressed json file and want to open it as a dictionary.
I used json.load, but I still get a string instead of a dictionary.
I want to extract a keyword list from the json file. Is there a way I can do it even though my data is a string?
Here is my code:
import gzip
import json

with gzip.open("19.04_association_data.json.gz", "r") as f:
    data = f.read()

with open('association.json', 'w') as json_file:
    json.dump(data.decode('utf-8'), json_file)

with open("association.json", "r") as read_it:
    association_data = json.load(read_it)

print(type(association_data))
# The actual output is 'str' but I expect 'dict'
In the first with block you already have the uncompressed data; there is no need to write it out and read it back. Worse, json.dump was given a str, so it wrote a single JSON string, and loading that gives you a str again. Parse the decompressed bytes directly:
import gzip
import json

with gzip.open("19.04_association_data.json.gz", "r") as f:
    data = f.read()

j = json.loads(data.decode('utf-8'))
print(type(j))
Open the file using the gzip package from the standard library (docs), then read it directly into json.loads():
import gzip
import json

with gzip.open("19.04_association_data.json.gz", "rb") as f:
    data = json.loads(f.read())  # json.loads accepts UTF-8 bytes directly
To read from a json.gz, you can use the following snippet:
import json
import gzip

with gzip.open("file_path_to_read", "rt") as f:
    expected_dict = json.load(f)
The result is of type dict.
If you want to write to a json.gz, you can use the following snippet:
import json
import gzip

with gzip.open("file_path_to_write", "wt") as f:
    json.dump(expected_dict, f)

How to avoid partial file if exception thrown during json dump?

The following code throws
TypeError: Object of type 'datetime' is not JSON serializable
which I know how to resolve. However, my real question is how to cleanly structure the code to avoid a partial file if any exception occurs in json.dump.
import datetime
import json

def save(data):
    with open('data.txt', 'w') as outfile:
        json.dump(data, outfile)

data = dict(sometime=datetime.datetime.now())
save(data)
The above code throws an exception and results in a partial file like:
{"sometime":
Should I dump to a string first inside a try/except? If so, are there any memory implications to be aware of? Or should I delete the file in an except block?
Use a try/except block like:
Code:
import json
import os

def save_json(data, filename):
    try:
        with open(filename, 'w') as outfile:
            json.dump(data, outfile)
    except Exception:
        # The dump failed; remove the partial file if one was created.
        try:
            os.unlink(filename)
        except FileNotFoundError:
            pass
and if you want to preserve the exception:
def save_json(data, filename):
    try:
        with open(filename, 'w') as outfile:
            json.dump(data, outfile)
    except Exception:
        if os.path.exists(filename):
            os.unlink(filename)
        raise
Test Code:
import datetime
import json
import os

data = dict(sometime=datetime.datetime.now())
save_json(data, 'data.txt')
That depends on whether your JSON data is under your control or comes from an unknown source. If it is from somewhere you can't predict, use a try...except block. Otherwise, fix your program so that the data is always serializable.
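On the asker's own suggestion: dumping to a string first does avoid the partial file, because a serialization error is raised before the file is ever opened. The memory cost is one extra copy of the serialized text, which is usually negligible. A minimal sketch:

import json

def save(data, filename):
    # Serialize first; a TypeError here leaves the file untouched.
    text = json.dumps(data)
    with open(filename, 'w') as outfile:
        outfile.write(text)

For very large objects where the extra string matters, writing to a temporary file and renaming it over the target (as in the previous question) avoids both the partial file and the extra copy.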

Combining Regex files in Python

I have 48 .rx.txt files and I'm trying to combine them using Python. I know that when you combine .rx.txt files, you have to put a "|" between the files.
Here's the code that I'm using:
import glob

read_files = filter(lambda f: f != 'final.txt' and f != 'result.txt', glob.glob('*.txt'))
with open("REGEXES.rx.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
        outfile.write('|')
But when I try to run that I get this error:
Traceback (most recent call last):
File "/Users/kosay.jabre/Desktop/Password Assessor/RegexesNEW/CombineFilesCopy.py", line 10, in <module>
outfile.write('|')
TypeError: a bytes-like object is required, not 'str'
Any ideas on how I can combine my files into one file?
Your REGEXES.rx.txt is opened in binary mode, but with outfile.write('|') you are attempting to write a string to it instead of bytes. It seems that all of your files contain text data, so instead of opening them in binary mode, open them as text, i.e.:
with open("REGEXES.rx.txt", "w") as outfile:
for f in read_files:
with open(f, "r") as infile:
outfile.write(infile.read())
outfile.write('|')
In Python 2.7.x your code will work fine, but in Python 3.x you should add the b prefix to the string: outfile.write(b'|'). That marks the literal as bytes, which can be written to a file opened in binary mode.
Your code for Python 3.x will then be:
import glob

read_files = filter(lambda f: f != 'final.txt' and f != 'result.txt', glob.glob('*.txt'))
with open("REGEXES.rx.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
        outfile.write(b'|')
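Note that both versions also write a '|' after the last file, and a trailing empty alternative makes the combined pattern match the empty string. A sketch that joins the contents instead, so the separator appears only between files:

import glob

read_files = [f for f in glob.glob('*.txt')
              if f not in ('final.txt', 'result.txt')]
patterns = []
for f in read_files:
    with open(f) as infile:
        patterns.append(infile.read().strip())
with open("REGEXES.rx.txt", "w") as outfile:
    outfile.write('|'.join(patterns))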
