Incrementally Append to JSON File in a For Loop - python

Is there a way to append single JSON objects to a JSON file from inside a for loop in Python? I would prefer not to store all my data in one giant JSON object and dump it all at once, as I am planning on performing millions of API requests. I would like to make a single API request, dump the result into a JSON file, and then move on to the next API request and dump that into the same JSON file.
The code below overwrites the JSON file; I am looking for something that appends.
for url in urls:
    r = sesh.get(url)
    data = r.json()
    with open('data.json', 'w') as outfile:
        json.dump(data, outfile)
Such that:
with open('data.json') as outfile:
    data = json.load(outfile)
type(data)
>> dict
r.json() returns something like this:
{'attribute1':1, 'attribute2':10}

Update
Well since I don't have access to your API I just placed some sample responses, in the format you supplied, inside an array.
import json
urls = ['{"attribute1":1, "attribute2":10}', '{"attribute1":67, "attribute2":32}', '{"attribute1":37, "attribute2":12}']
json_arr = []
for url in urls:
    data = json.loads(url)
    json_arr.append(data)
with open('data.json', 'w') as outfile:
    json.dump(json_arr, outfile)
Basically, we keep an array and append each API response to it. Then we can write the accumulated JSON to a file. Also, if you want to update the same JSON file across different executions of the code, you can read the existing output file into an array at the beginning of the code and then carry on with my example.
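The "read the existing output file first" idea could look roughly like the sketch below. The helper name load_existing, the temporary output path, and the two simulated runs are assumptions made for illustration; the sample strings stand in for real API responses.

```python
import json
import os
import tempfile


def load_existing(path):
    """Return the list already stored at path, or an empty list if the file is absent."""
    if not os.path.exists(path):
        return []
    with open(path) as infile:
        return json.load(infile)


# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), 'data.json')

# Stand-ins for API responses, in the format used above.
first_run = ['{"attribute1":1, "attribute2":10}']
second_run = ['{"attribute1":67, "attribute2":32}']

for responses in (first_run, second_run):  # each pass simulates one execution of the script
    json_arr = load_existing(out_path)     # start from whatever was saved before
    for response in responses:
        json_arr.append(json.loads(response))
    with open(out_path, 'w') as outfile:   # rewrite the whole array
        json.dump(json_arr, outfile)

with open(out_path) as infile:
    result = json.load(infile)
```

Note that this rewrites the whole file on every run, so it keeps the file valid JSON at the cost of holding all records in memory.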
Change write mode to append
Try changing this:
with open('data.json', 'w') as outfile:
To this:
with open('data.json', 'a') as outfile:
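One caveat with append mode: several objects dumped back to back do not form a single JSON document, so json.load on the whole file will fail. A common workaround, which is a different technique from the one asked about, is to write one object per line (often called JSON Lines) and parse the file line by line. The sketch below assumes a hypothetical file name data.jsonl and uses hard-coded dicts in place of API responses.

```python
import json
import os
import tempfile

# Hypothetical output file; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), 'data.jsonl')

# Stand-ins for the dicts returned by r.json().
responses = [{"attribute1": 1, "attribute2": 10}, {"attribute1": 67, "attribute2": 32}]

for data in responses:
    with open(out_path, 'a') as outfile:  # append mode: earlier lines are kept
        json.dump(data, outfile)
        outfile.write('\n')               # one JSON object per line

# Reading back: parse each non-empty line on its own.
records = []
with open(out_path) as infile:
    for line in infile:
        if line.strip():
            records.append(json.loads(line))
```

Each parsed line is a dict, and nothing has to be held in memory while the requests are running.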

The previous answer is surprisingly close to what you need to do.
So I will build upon it.
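import json
json_arr = ['{"attribute1":1, "attribute2":10}', '{"attribute1":67, "attribute2":32}', '{"attribute1":37, "attribute2":12}']
# start the file with the opening bracket of a JSON array
with open('data.json', 'w') as outfile:
    outfile.write('[')
for i, element in enumerate(json_arr):
    # append mode, so earlier elements are not overwritten
    with open('data.json', 'a') as outfile:
        if i > 0:
            outfile.write(',')  # separator before every element except the first
        json.dump(json.loads(element), outfile)  # parse the sample string, then dump the dict
# close the array so the file is valid JSON
with open('data.json', 'a') as outfile:
    outfile.write(']')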

Related

save json file within a loop, python

In a Jupyter notebook, I ran this code in a cell:
import json
import time
for i in range(10):
    with open('data.json', 'w') as f:
        json.dump({"counter":i}, f)
        time.sleep(10000)
Easy so far, but after executing the cell, the actual data.json file is not updated during each iteration; it only gets updated at the end of the program. In other words, data.json as a file object stays open until the end of the code.
How can I update the file on disk inside the loop?
The json module doesn't work that way AFAIK. You'll have to load the JSON data into a dictionary/list, make your changes, and then write the file again:
import json
# function to read JSON files
def read_json(path):
    with open(path, 'r') as file:
        return json.load(file)
# function to write JSON files
def write_json(path, data, indent=4):
    with open(path, 'w') as file:
        json.dump(data, file, indent=indent)
# read some JSON data
json_data = read_json('./my_json_file.json')
# ... do some stuff to the data
# write the data back to the file
write_json('./my_json_file.json', json_data)
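If the goal is simply for the file on disk to reflect each iteration, one option is to make sure the file is closed (or at least flushed) before the loop pauses. The sketch below is an illustration under assumptions: it uses a temporary path, a short sleep, and a list named seen that records what a reader would find on disk after each write. The flush and os.fsync calls are optional extras for cases where the file must stay open.

```python
import json
import os
import tempfile
import time

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), 'data.json')
seen = []  # what a reader finds on disk after each iteration

for i in range(3):
    with open(out_path, 'w') as f:
        json.dump({"counter": i}, f)
        f.flush()             # push Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the OS to write it to disk
    # The with block has ended, so the file is closed before we wait.
    with open(out_path) as f:
        seen.append(json.load(f)["counter"])
    time.sleep(0.01)          # the pause happens outside the with block
```

Because the sleep sits outside the with block, the file is already complete on disk while the loop waits.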

Write a json file from list

I have the following list data that I want to save in a JSON file to be accessed later:
data = [{"nomineesWidgetModel":{"title":"","description":"",
"refMarker":"ev_nom","eventEditionSummary":{"awards":[{"awardName":"Oscar","trivia":[]}]}}}]
If saved as txt:
for item in data:
    with open('./data/awards.txt', 'w', encoding='utf-8') as f:
        f.write(', '.join(str(item) for item in data))
Output:
{"nomineesWidgetModel":{"title":"","description":"","refMarker":"ev_nom",
"eventEditionSummary":{"awards":[{"awardName":"Oscar","trivia":[]}]}}}
But I get an error when opening the file later in Jupyter Notebook.
If saved as JSON:
for item in data:
    with open('data.json', 'w', encoding='utf-8') as f:
        json.dump(item, f, ensure_ascii=False, indent=4)
Output with extra backslashes:
"{\"nomineesWidgetModel\":{\"title\":\"\",\"description\":\"\",\"refMarker\":\"ev_nom\",
\"eventEditionSummary\":{\"awards\":[{\"awardName\":\"Oscar\",\"trivia\":[],}]}}
Is there a simpler way to do this without having to import the file and replace the extra slashes?
Just use json as usual:
import json
data = [{"nomineesWidgetModel":{"title":"","description":"", "refMarker":"ev_nom","eventEditionSummary":{"awards":[{"awardName":"Oscar","trivia":[]}]}}}]
with open('data.json', 'w') as f:
    json.dump(data, f, indent=4)
Thanks to @Alexander's explanation above, I was able to save the content I was scraping in a dict rather than a list, and then save it as JSON while iterating over the pages with:
with open('data.json', 'a') as file:
    json.dump(data, file, indent=1)

Load JSON file into a dictionary and not string or list

I have created a JSON file after scraping data online with the following simplified code:
for item in range(items_to_scrape):
    az_text = []
    for n in range(first_web_page, last_web_page):
        reviews_html = requests.get(page_link)
        tree = fromstring(reviews_html.text)
        page_link = base_url + str(n)
        review_text_tags = tree.xpath(xpath_1)
        for r_text in review_text_tags:
            review_text = r_text.text
            az_text.append(review_text)
    az_reviews = {}
    az_reviews[item] = az_text
    with open('data.json', 'w') as outfile:
        json.dump(az_reviews, outfile)
There might be a better way to create a JSON file with the first key equal to the item number and the second key equal to the list of reviews for that item. However, I am currently stuck at opening the JSON file to see which items have already been scraped.
The structure of the JSON file looks like this:
{
"asin": "0439785960",
"reviews": [
"Don’t miss this one!",
"Came in great condition, one of my favorites in the HP series!",
"Don’t know how these books are so good and I’ve never read them until now. Whether you’ve watched the movies or not, read these books"
]
}
The unsuccessful attempt that seems to be closer to the solution is the following:
import json
from pprint import pprint
json_data = open('data.json', 'r').read()
json1_file = json.loads(json_data)
print(type(json1_file))
print(json1_file["asin"])
It returns a string that replicates exactly the result of the print() function I used during the scraping process to check what the JSON file was going to look like, but I can't access the asins or reviews using json1_file["asin"] or json1_file["reviews"] since the file read is a string and not a dictionary.
TypeError: string indices must be integers
Using the json.load() function I still print the right content, but I cannot figure out how to access the dictionary-like object from the JSON file to iterate through keys and values.
The following code prints the content of the file, but raises an error (AttributeError: '_io.TextIOWrapper' object has no attribute 'items') when I try to iterate through keys and values:
with open('data.json', 'r') as content:
    print(json.load(content))
    for key, value in content.items():
        print(key, value)
What is wrong with the code above and what should be adjusted to load the file into a dictionary?
string indices must be integers
You're writing out the data as a string, not a dictionary. Remove the dumps call and only use dump:
with open('data.json', 'w') as outfile:
    json.dump(az_reviews, outfile, indent=2, ensure_ascii=False)
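To see why an extra dumps matters, the sketch below encodes the same dictionary once and twice and then decodes each result. The sample dictionary is a shortened stand-in for the scraped data, and the variable names are chosen for illustration only.

```python
import json

# Shortened stand-in for the scraped reviews.
az_reviews = {"asin": "0439785960", "reviews": ["Don't miss this one!"]}

single_encoded = json.dumps(az_reviews)              # JSON text for an object
double_encoded = json.dumps(json.dumps(az_reviews))  # JSON text for a *string* holding JSON

loaded_once = json.loads(single_encoded)   # a dict
loaded_twice = json.loads(double_encoded)  # still a str, so ["asin"] raises TypeError
```

Decoding the double-encoded text once gives back a string, which is why indexing it with a key fails with "string indices must be integers"; a second json.loads would be needed to recover the dictionary.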
what should be adjusted to load the file into a dictionary?
Once the file contains a JSON object and not a string, nothing needs to change, except perhaps calling json.load on the file directly instead of read() followed by json.loads.
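As a minimal sketch of that loading step: the AttributeError in the question comes from calling .items() on the file object rather than on the value json.load returns. The temporary path and the shortened sample data below are assumptions for illustration.

```python
import json
import os
import tempfile

# Write a small sample file so the sketch is self-contained (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), 'data.json')
with open(path, 'w') as outfile:
    json.dump({"asin": "0439785960", "reviews": ["Don't miss this one!"]}, outfile)

with open(path, 'r') as content:
    data = json.load(content)    # keep the parsed result instead of only printing it

pairs = []
for key, value in data.items():  # iterate over the dict, not the file object
    pairs.append((key, value))
```

Here data is a regular dict, so data["asin"] and data["reviews"] work as expected.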
Another problem seems to be that you're overwriting the file on every loop iteration.
Instead, you probably want to loop first and then open one file and write to it afterwards:
data = {}
for item in range(items_to_scrape):
    pass # add to data
# put all data in one file
with open('data.json', 'w') as f:
    json.dump(data, f)
In this scenario, I suggest that you store the asin as a key, with the reviews as values
asin = "123456" # some scraped value
data[asin] = reviews
Or write a unique file for each scrape, which you then must loop over to read them all.
for item in range(items_to_scrape):
    data = {}
    # add to data
    with open('data{}.json'.format(item), 'w') as f:
        json.dump(data, f)
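Reading those per-item files back could be done with glob, roughly as sketched below. The temporary directory, the placeholder contents, and the "item" and "reviews" keys are assumptions for illustration, not part of the original scraper.

```python
import glob
import json
import os
import tempfile

# Hypothetical output directory with one file per scraped item.
out_dir = tempfile.mkdtemp()
for item in range(3):
    data = {"item": item, "reviews": ["review %d" % item]}  # placeholder content
    with open(os.path.join(out_dir, 'data{}.json'.format(item)), 'w') as f:
        json.dump(data, f)

# Loop over every matching file and merge the records into one dict.
combined = {}
for path in sorted(glob.glob(os.path.join(out_dir, 'data*.json'))):
    with open(path) as f:
        record = json.load(f)
    combined[record["item"]] = record["reviews"]
```

The trade-off is many small files instead of one large one, which makes it easy to resume a scrape but adds a merge step when reading.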

Json file decode in python

I'm working with Python. I have a JSON structure in a dictionary and I have exported it to a file. Now I need to reload the structure from the file into a dictionary (in order to update it), but I'm experiencing some problems. This is my code:
#export the structure
with open('data.json','w') as f:
    data = {}
    data['test'] = '1'
    f.write(json.dumps(data))
#reload the structure
with open('data.json','r') as f:
    dict = {}
    dict = json.loads(f.read())
The error is: No JSON object could be decoded.
Try:
with open('data.json', 'w') as f:
    f.write(json.dumps(data))
with open('data.json', 'r') as f:
    data = json.load(f)

How to add a key-value to JSON data retrieved from a file?

I am new to Python and I am playing with JSON data. I would like to retrieve the JSON data from a file and add to that data a JSON key-value "on the fly".
That is, my json_file contains JSON data like the following:
{"key1": {"key1A": ["value1", "value2"], "key1B": {"key1B1": "value3"}}}
I would like to add the "ADDED_KEY": "ADDED_VALUE" key-value pair to the above data so that I can use the following JSON in my script:
{"ADDED_KEY": "ADDED_VALUE", "key1": {"key1A": ["value1", "value2"], "key1B": {"key1B1": "value3"}}}
I am trying to write something like the following in order to accomplish this:
import json
json_data = open(json_file)
json_decoded = json.load(json_data)
# What do I have to do here?!
json_data.close()
Your json_decoded object is a Python dictionary; you can simply add your key to that, then re-encode and rewrite the file:
import json
# json_file is the path; use a separate name for the file handle so the path is not overwritten
with open(json_file) as f:
    json_decoded = json.load(f)
json_decoded['ADDED_KEY'] = 'ADDED_VALUE'
with open(json_file, 'w') as f:
    json.dump(json_decoded, f)
I used the open file objects as context managers here (with the with statement) so Python automatically closes the file when done.
The object returned by json.loads() behaves just like native Python lists/dictionaries:
import json
with open("your_json_file.txt", 'r') as f:
    data = json.loads(f.read()) #data becomes a dictionary
#do things with data here
data['ADDED_KEY'] = 'ADDED_VALUE'
#and then just write the data back on the file
with open("your_json_file.txt", 'w') as f:
    f.write(json.dumps(data, sort_keys=True, indent=4, separators=(',', ': ')))
#I added some options for pretty printing, play around with them!
For more info check out the official doc
You can do
json_decoded['ADDED_KEY'] = 'ADDED_VALUE'
OR
json_decoded.update({"ADDED_KEY":"ADDED_VALUE"})
which works nicely if you want to add more than one key/value pair.
Of course, you may want to check for the existence of ADDED_KEY first - depends on your needs.
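A minimal sketch of that existence check, using a shortened stand-in for the decoded data; the value "OTHER_VALUE" is made up to show that an existing key is left alone.

```python
# Shortened stand-in for the dictionary returned by json.load.
json_decoded = {"key1": {"key1A": ["value1", "value2"]}}

# Only add the key if it is not already present.
if "ADDED_KEY" not in json_decoded:
    json_decoded["ADDED_KEY"] = "ADDED_VALUE"

# setdefault does the same check in one call; the key exists now, so nothing changes.
json_decoded.setdefault("ADDED_KEY", "OTHER_VALUE")
```

By contrast, plain assignment and update() both overwrite an existing value silently.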
And I assume you might want to save that data back to the file:
with open(json_file, 'w') as f:
    json.dump(json_decoded, f)
