Combining JSON objects live

I have a JSON file that is getting continuously appended with new data. Each time it gets updated I need it to be "well-formed". The problem is that my JSON looks like this (each item is dumped serially):
{"one": 1},
{"two": 2}
I need the data to be properly formed so enclosing in square brackets could work, or an outer curly bracket. But I'm not quite sure how to do that.
[
{"one": 1},
{"two": 2}
]
Here is the code performing the JSON writing:
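import json

def printJSONFile(data):
    # `serialize` and `fullpath` are defined elsewhere in the script
    json_dump = json.dumps(data, default=serialize)
    try:
        # Append one serialized object, followed by a comma and a newline
        with open(fullpath, "a+") as jf:
            jf.write(json_dump + ",\n")
    except IOError:
        print("ERROR: Unable to open/write to {}".format(fullpath))
    return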
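One possible way to get a well-formed result after every update, sketched under the assumption that the writer can drop the trailing comma and emit exactly one object per line: the file stays append-only, and a reader wraps the lines in a list whenever it needs a single JSON array. The helper names `append_record` and `load_records` are hypothetical and not part of the code above.

```python
import json


def append_record(path, data):
    """Append one JSON object per line, without a trailing comma."""
    with open(path, "a", encoding="utf-8") as jf:
        jf.write(json.dumps(data) + "\n")


def load_records(path):
    """Parse every non-empty line as JSON and return the objects as a list."""
    with open(path, encoding="utf-8") as jf:
        return [json.loads(line) for line in jf if line.strip()]
```

Calling `json.dumps(load_records(path))` then gives the bracketed, comma-separated form at any point, even while the file is still being appended to.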

Related

How to write a new dictionary to a json file

I have this function that turns dictionaries into JSON objects and writes them to a JSON file:
import json

def somethin(a1,a2,a3):
    someDict = {"a": a1,
                "b": a2,
                "c": a3
                }
    json_obj = json.dumps(someDict)
    file = open('somefile.json', 'a',encoding="utf-8")
    file.write(json_obj)
The first element written to the file doesn't cause any problem, but after adding the second JSON object I get "SyntaxError: JSON.parse: unexpected non-whitespace character after JSON data" with the JSON file I made:
{
"a": 1,
"b":2,
"c": 3
}
{
"a1": 1,
"b1":2,
"c1": 3
}
How can I modify the code for the json output file to be correct?

There are a few issues here:
file = open('somefile.json', 'a',encoding="utf-8")
You're implicitly assuming that the file will be closed here. That happens to be true on CPython, where the file object is closed once its reference count drops to zero, but not on all Python implementations. A better approach would be:
with open('somefile.json', 'a',encoding="utf-8") as file:
    file.write(json_obj)
This uses a context manager to close the file explicitly.
Second, you can avoid creating an extra string by writing directly to the file:
with open('somefile.json', 'a',encoding="utf-8") as file:
    json.dump(someDict, file)
Third, having multiple JSON objects in a file is not valid JSON. There are a few approaches you could take here. One is to wrap the JSON objects in a list:
[
{
"a": 1,
"b":2,
"c": 3
},
{
"a1": 1,
"b1":2,
"c1": 3
}
]
So, start the file with an open bracket, and write a comma after every JSON object, except the last one, then end the file with a close bracket.
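Rather than patching brackets and commas by hand, the list can also be kept valid by loading it, appending to it, and rewriting it. A minimal sketch, assuming the file is small enough to rewrite on every update; `append_to_json_array` is a hypothetical helper name.

```python
import json
import os


def append_to_json_array(path, obj):
    """Append obj to the JSON array stored at path, creating the file if needed."""
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, encoding="utf-8") as f:
            items = json.load(f)
    else:
        items = []
    items.append(obj)
    # Rewrite the whole file so it holds a valid JSON array after every call.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, indent=4)
    return items
```

The cost is that each update rewrites the entire file, which is why this only suits files that stay reasonably small.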
A second approach is to newline-separate your JSON objects, like this:
{"a": 1,"b":2,"c": 3}
{"a1": 1, "b1":2,"c1": 3}
Each line is a JSON object. You'd read this like so:
import json

with open("filename", "rt") as file:
    for line in file:
        obj = json.loads(line)
        # do something with obj
        # ...
The advantage of this approach is that you can load each JSON object into memory individually, without having to load the entire file at once. The disadvantage is that the file as a whole is no longer a single valid JSON document, so anything that expects one document, such as JavaScript's JSON.parse, will reject it. (If you want the best of both worlds, you can keep the list format and read it incrementally with a package like ijson, but that's more complex.)

Can you append to a dictionary from a foreign python file?

So I have a project I'm working on for fun but it requires me to append to a dictionary from another python file. In file1.py it will look like
Name: Eric <-- user input
Age: 27 <-- user input
and file2.py,
information = {'Eric':27}
I know that I can temporarily append to a dictionary while running the code, but it seems to reset after I close the program. For example, I recently saw this in a Stack Overflow question:
d = {'key': 'value'}
print(d)
# {'key': 'value'}
d['mynewkey'] = 'mynewvalue'
print(d)
# {'key': 'value', 'mynewkey': 'mynewvalue'}
But this, too, resets after every run, so I thought that the only way to save the dictionary is to write it to another file. Is there any way that I can achieve this, or maybe a better alternative?

You can use JSON to save the data to a file.
This saves the data stored in your dictionary:
import json

my_dict = {"key": "value", "key2": "value2"}
with open("output_file.txt", "w") as file:
    json.dump(my_dict, file, indent=4)
To use that data again, you can load that file:
import json

with open("output_file.txt") as file:
    my_dict = json.load(file)
print(my_dict)  # Will print {'key': 'value', 'key2': 'value2'}
JSON stands for JavaScript Object Notation, and it's a way to represent data as text, for example in a string or a file.
So the json module can convert a string into data, if the string is valid JSON:
import json

string_data = '{"key": "value"}'
dictionary = json.loads(string_data)
print(type(string_data))  # <class 'str'>
print(type(dictionary))  # <class 'dict'>
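Putting the two halves together for the name and age case from the question, a minimal sketch of a dictionary that survives between runs could look like this. The function names `load_information` and `add_person` and the idea of passing the file path as a parameter are assumptions made for illustration.

```python
import json
import os


def load_information(path):
    """Return the saved dictionary, or an empty one if the file does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def add_person(path, name, age):
    """Load the saved dictionary, add one entry, and write it back."""
    information = load_information(path)
    information[name] = age
    with open(path, "w", encoding="utf-8") as f:
        json.dump(information, f, indent=4)
    return information
```

With this, `add_person("information.json", "Eric", 27)` in one run and `load_information("information.json")` in a later run would see the same entry.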

Commas between two json object

I am creating a JSON file from a file in a pseudo-XML format. However, I get commas between the JSON objects, which I don't want.
This is a sample of what I get:
[{"a": a , "b": b } , {"a": a , "b": b }]
However I want this:
{"a": a , "b": b } {"a": a , "b": b }
It might not be valid JSON, but I want it that way so that I can shuffle it by doing:
shuf -n 100000 original.json > sample.json
Otherwise, it will be just one big line of JSON.
This is my code:
import json
import optparse

from bs4 import BeautifulSoup


def read_html_file(file_name):
    # Read the whole input file and parse it with BeautifulSoup
    with open(file_name, "r", encoding="ISO-8859-1") as f:
        html = f.read()
    parsed_html = BeautifulSoup(html, "html.parser")
    return parsed_html


def process_reviews(parsed_html):
    # Build one dict per <review> element
    reviews = []
    for r in parsed_html.findAll('review'):
        review_text = r.find('review_text').text
        asin = r.find('asin').text
        rating = r.find('rating').text
        product_type = r.find('product_type').text
        reviewer_location = r.find('reviewer_location').text
        reviews.append({
            'review_text': review_text.strip(),
            'asin': asin.strip(),
            'rating': rating.strip(),
            'product_type': product_type.strip(),
            'reviewer_location': reviewer_location.strip()
        })
    return reviews


def write_json_file(file_name, reviews):
    # Dump the whole list as a single JSON array
    with open('{f}.json'.format(f=file_name), 'w') as outfile:
        json.dump(reviews, outfile)


if __name__ == '__main__':
    parser = optparse.OptionParser()
    parser.add_option('-f', '--file_name', action="store", dest="file_name",
                      help="name of the input html file to parse", default="positive.html")
    options, args = parser.parse_args()
    file_name = options.file_name
    html = read_html_file(file_name)
    reviews_list = process_reviews(html)
    write_json_file(file_name, reviews_list)
The outer [ ] is there because of reviews = [], and I can remove it manually, but I also don't want commas between my JSON objects.

What you are asking for is just not JSON. The standard requires a comma between the elements of an array, and a file containing several top-level objects with nothing around them is not a single JSON document. You have two options going forward:
1. Update your parser to match the standard (highly recommended).
2. If you really want the structure you specified, for display purposes or other internal processing, capture the JSON object and transform it into something else, but please do not call it JSON, because it isn't.
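If the goal is a file that `shuf` can sample line by line, one common convention is to write each object on its own line instead of dumping the whole list. A minimal sketch of that idea follows; `write_json_lines` is a hypothetical replacement for `write_json_file`, and it is each line, not the file as a whole, that is valid JSON.

```python
import json


def write_json_lines(file_name, reviews):
    """Write one compact JSON object per line, with no brackets or separating commas."""
    with open("{f}.json".format(f=file_name), "w", encoding="utf-8") as outfile:
        for review in reviews:
            # json.dumps escapes newlines inside strings, so each object stays on one line.
            outfile.write(json.dumps(review) + "\n")
```

A consumer can then call `json.loads` on every sampled line independently.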

There are a few concepts you're mixing up in your question:
1. What you have is not a dict, but a list of dicts.
2. Strictly speaking, you don't have JSON, either in your input list or in your expected output.
Now for the solution. If you simply want to print your objects without the commas separating them, you can join the elements of your list:
sample = [{"a": "a" , "b": "b" } , {"a": "a" , "b": "b" }]
print(" ".join([str(element) for element in sample]))
If what you really want is to manipulate the data as JSON, you have two options using the json library:
Treat each element of your sample as JSON and manipulate it individually
The elements are dicts that map directly to JSON objects, so you can use the json library to pretty-print them as strings (dumps) or manipulate them in any other way:
import json

for element in sample:
    print(json.dumps(element, indent = 4))
Turn your whole sample list into one JSON object
You can add all your elements under a single key, for example a key called elements:
sample_json = {"elements": []}
for data in sample:
    sample_json["elements"].append(data)
# Output from sample_json
# {'elements': [{'a': 'a', 'b': 'b'}, {'a': 'a', 'b': 'b'}]}
Or you can add every element under a different key. As an example, I'll create a counter, and each value of the counter will be the key for one element:
sample_json = {}
counter = 0
for data in sample:
    sample_json[counter] = data
    counter += 1
# Output from sample_json
# {0: {'a': 'a', 'b': 'b'}, 1: {'a': 'a', 'b': 'b'}}
You could use text keys as well for this second case.
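One detail worth knowing about the counter-keyed variant: JSON object keys are always strings, so integer keys do not survive serialization unchanged. A small sketch illustrating this with the same `sample` data (the names `text` and `restored` are only for illustration):

```python
import json

sample = [{"a": "a", "b": "b"}, {"a": "a", "b": "b"}]

# Build the counter-keyed dictionary with integer keys 0, 1, ...
sample_json = {index: data for index, data in enumerate(sample)}

# Serializing turns the integer keys into strings.
text = json.dumps(sample_json)

# A round trip therefore gives back string keys, not the original integers.
restored = json.loads(text)
```

Using text keys from the start avoids this mismatch between the in-memory dictionary and what is read back later.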

pythonic way of iterating over a collection of json objects stored in a text file

I have a text file that contains several thousand JSON objects (meaning the textual representation of JSON), one after the other. They're not separated, and I would prefer not to modify the source file. How can I load/parse each JSON object in Python? (I have seen this question, but if I'm not mistaken, it only works for a list of JSON objects that are already separated by commas.) My file looks like this:
{"json":1}{"json":2}{"json":3}{"json":4}{"json":5}...

I don't see a clean way to do this without using the real JSON parser. The other options, modifying the text or using a non-JSON parser, are risky. So the best way to go is to find a way to iterate using the real JSON parser, so that you're sure to comply with the JSON spec.
The core idea is to let the real JSON parser do all the work of identifying the groups:
import json, re

combined = '{"json":1}{"json":2}{"json":3}{"json":4}{"json":5}'
start = 0
while start != len(combined):
    try:
        # Succeeds only when the remainder is a single object (the last one)
        result = json.loads(combined[start:])
        end = len(combined)
    except ValueError as e:
        # Find the location where the parsing failed ("Extra data: line 1 column N ...")
        end = start + int(re.search(r'column (\d+)', e.args[0]).group(1)) - 1
        result = json.loads(combined[start:end])
    start = end
    print(result)
This outputs:
{'json': 1}
{'json': 2}
{'json': 3}
{'json': 4}
{'json': 5}

I think the following would work as long as there are no non-comma-delimited json arrays of json sub-objects inside any of the outermost json objects. It's somewhat brute-force in that it reads the whole file into memory and attempts to fix it.
import json

def get_json_array(filename):
    # Read the whole file, insert the missing commas, and wrap it in brackets
    with open(filename, 'rt') as jsonfile:
        json_array = '[{}]'.format(jsonfile.read().replace('}{', '},{'))
    return json.loads(json_array)

for obj in get_json_array('multiobj.json'):
    print(obj)
Output:
{'json': 1}
{'json': 2}
{'json': 3}
{'json': 4}
{'json': 5}

Instead of modifying the source file, just make a copy. Use a regex to replace }{ with },{ and then hopefully a pre-built JSON reader will take care of it nicely.
EDIT: quick solution:
from re import sub

# inputfile and outfile are the paths of the original file and of the repaired copy
with open(inputfile, 'r') as fin:
    text = sub(r'\}\{', r'},{', fin.read())
with open(outfile, 'w') as fout:
    fout.write('[')
    fout.write(text)
    fout.write(']')

>>> import ast
>>> s = '{"json":1}{"json":2}{"json":3}{"json":4}{"json":5}'
>>> [ast.literal_eval(ele + '}') for ele in s.split('}')[:-1]]
[{'json': 1}, {'json': 2}, {'json': 3}, {'json': 4}, {'json': 5}]
Provided you have no nested objects and splitting on '}' is feasible, this can be accomplished pretty simply.

Here is one pythonic way to do it:
from json.scanner import make_scanner
from json import JSONDecoder

def load_jsons(multi_json_str):
    s = multi_json_str.strip()
    scanner = make_scanner(JSONDecoder())
    idx = 0
    objects = []
    while idx < len(s):
        obj, idx = scanner(s, idx)
        objects.append(obj)
    return objects
I think json was never supposed to be used this way, but it solves your problem.
I agree with @Raymond Hettinger: you need to use json itself to do the work, because text manipulation doesn't work for complex JSON objects. His answer parses the exception message to find the split position. It works, but it looks like a hack and hence is not pythonic :)
EDIT:
Just found out this is actually supported by the json module. Use raw_decode, which returns the decoded object together with the index in the string where it stopped:
decoder = JSONDecoder()
first_obj, end = decoder.raw_decode(multi_json_str)
Read http://pymotw.com/2/json/index.html#mixed-data-streams
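Building on `raw_decode`, the whole file content can be walked object by object. This is a minimal sketch rather than the answerer's own code: `iter_json_objects` is a hypothetical name, and it assumes the entire text is already in memory as one string.

```python
from json import JSONDecoder


def iter_json_objects(text):
    """Yield each top-level JSON value found in text, one after another."""
    decoder = JSONDecoder()
    idx = 0
    while idx < len(text):
        # raw_decode does not accept leading whitespace, so skip it first.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx == len(text):
            break
        # raw_decode returns the decoded value and the index just past it.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj
```

Because the real parser finds the end of each value, nested objects and strings that happen to contain `}{` are handled correctly, unlike with the text-replacement approaches.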

How to parse json with ijson and python

I have JSON data as an array of dictionaries which comes as the request payload.
[
{ "Field1": 1, "Feld2": "5" },
{ "Field1": 3, "Feld2": "6" }
]
I tried ijson.items(f, '') which yields the entire JSON object as one single item. Is there a way I can iterate the items inside the array one by one using ijson?
Here is the sample code I tried, which yields the JSON as one single object:
import ijson

f = open("metadatam1.json")
objs = ijson.items(f, '')
for o in objs:
    print(str(o) + "\n")
Output:
[{'Feld2': u'5', 'Field1': 1}, {'Feld2': u'6', 'Field1': 3}]

I'm not very familiar with ijson, but from reading some of its code, it looks like calling items with a prefix of "item" should yield the elements of the array rather than the top-level object:
for item in ijson.items(f, "item"):
    # do stuff with the item dict
    print(item)
