Update json file - python

I have json file with some data, and would like to occasionally update this file.
I read the file:
with open('index.json', 'rb') as f:
idx = json.load(f)
then check for presence of a key from potentially new data, and if key is not present update the file:
with open('index.json', mode='a+') as f:
json.dump(new_data, f, indent=4)
However this procedure just creates new json object (python dict) and appends it as new object in output json file, making the file not valid json file.
Is there any simple way to append new data to json file without overwriting whole file, by updating the initial dict?

One way to do what you're after is to write one JSON object per line in the file. I'm using that approach and it works quite well.
A nice benefit is that you can read the file more efficiently (memory-wise) because you can read one line at a time. If you need all of them, there's no problem with assembling a list in Python, but if you don't you're operating much faster and you can also append.
So to initially write all your objects, you'd do something like this:
with open(json_file_path, "w") as json_file:
for data in data_iterable:
json_file.write("{}\n".format(json.dumps(data)))
Then to read efficiently (will consume little memory, no matter the file size):
with open(json_file_path, "r") as json_file:
for line in json_file:
data = json.loads(line)
process_data(data)
To update/append:
with open(json_file_path, "a") as json_file:
json_file.write("{}\n".format(json.dumps(new_data)))
Hope this helps :)

Related

Merge multiple JSON objects into a single JSON object

I have a log file containing one JSON record per line.
{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":"AA:i-096379450e69ed082","arn":"arn:aws:sts::34502sdsdsd:assumed-role/RDSAccessRole/i-096379450e69ed082","accountId":"34502sdsdsd","accessKeyId":"ASIAVAVKXAXXXXXXXC","sessionContext":{"sessionIssuer":{"type":"Role","principalId":"AROAVAVKXAKDDDDD","arn":"arn:aws:iam::3450291sdsdsd:role/RDSAccessRole","accountId":"345029asasas","userName":"RDSAccessRole"},"webIdFederationData":{},"attributes":{"mfaAuthenticated":"false","creationDate":"2021-04-27T04:38:52Z"},"ec2RoleDelivery":"2.0"}},"eventTime":"2021-04-27T07:24:20Z","eventSource":"ssm.amazonaws.com","eventName":"ListInstanceAssociations","awsRegion":"us-east-1","sourceIPAddress":"188.208.227.188","userAgent":"aws-sdk-go/1.25.41 (go1.13.15; linux; amd64) amazon-ssm-agent/","requestParameters":{"instanceId":"i-096379450e69ed082","maxResults":20},"responseElements":null,"requestID":"a5c63b9d-aaed-4a3c-9b7d-a4f7c6b774ab","eventID":"70de51df-c6df-4a57-8c1e-0ffdeb5ac29d","readOnly":true,"resources":[{"accountId":"34502914asasas","ARN":"arn:aws:ec2:us-east-1:3450291asasas:instance/i-096379450e69ed082"}],"eventType":"AwsApiCall","managementEvent":true,"eventCategory":"Management","recipientAccountId":"345029149342"}
{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":"AROAVAVKXAKPKZ25XXXX:AmazonMWAA-airflow","arn":"arn:aws:sts::3450291asasas:assumed-role/dev-1xdcfd/AmazonMWAA-airflow","accountId":"34502asasas","accessKeyId":"ASIAVAVKXAXXXXXXX","sessionContext":{"sessionIssuer":{"type":"Role","principalId":"AROAVAVKXAKPKZXXXXX","arn":"arn:aws:iam::345029asasas:role/service-role/AmazonMWAA-dlp-dev-1xdcfd","accountId":"3450291asasas","userName":"dlp-dev-1xdcfd"},"webIdFederationData":{},"attributes":{"mfaAuthenticated":"false","creationDate":"2021-04-27T07:04:08Z"}},"invokedBy":"airflow.amazonaws.com"},"eventTime":"2021-04-27T07:23:46Z","eventSource":"logs.amazonaws.com","eventName":"CreateLogStream","awsRegion":"us-east-1","sourceIPAddress":"airflow.amazonaws.com","userAgent":"airflow.amazonaws.com","errorCode":"ResourceAlreadyExistsException","errorMessage":"The specified log stream already exists","requestParameters":{"logStreamName":"scheduler.py.log","logGroupName":"dlp-dev-DAGProcessing"},"responseElements":null,"requestID":"40b48ef9-fc4b-4d1a-8fd1-4f2584aff1e9","eventID":"ef608d43-4765-4a3a-9c92-14ef35104697","readOnly":false,"eventType":"AwsApiCall","apiVersion":"20140328","managementEvent":true,"eventCategory":"Management","recipientAccountId":"3450291asasas"}
My goal is to merge this into a single json object which should look like:
{"Records":[{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":.....
I have been trying out merging them through Python dict merge but not able to get it to work.
Can anyone provide some pointers.
If your records are stored separated by newlines in a text file I would recommend the following approach by opening the file, parsing the records, and adding them to a dict which you can later dump with the native json library.
import json
data = {'records': []}
with open("data.txt", 'r') as f:
lines = f.readlines()
for line in lines:
data['records'].append(json.loads(line))
print(json.dumps(data))
I would do it following way, let file.txt content be
{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":"AA:i-096379450e69ed082","arn":"arn:aws:sts::34502sdsdsd:assumed-role/RDSAccessRole/i-096379450e69ed082","accountId":"34502sdsdsd","accessKeyId":"ASIAVAVKXAXXXXXXXC","sessionContext":{"sessionIssuer":{"type":"Role","principalId":"AROAVAVKXAKDDDDD","arn":"arn:aws:iam::3450291sdsdsd:role/RDSAccessRole","accountId":"345029asasas","userName":"RDSAccessRole"},"webIdFederationData":{},"attributes":{"mfaAuthenticated":"false","creationDate":"2021-04-27T04:38:52Z"},"ec2RoleDelivery":"2.0"}},"eventTime":"2021-04-27T07:24:20Z","eventSource":"ssm.amazonaws.com","eventName":"ListInstanceAssociations","awsRegion":"us-east-1","sourceIPAddress":"188.208.227.188","userAgent":"aws-sdk-go/1.25.41 (go1.13.15; linux; amd64) amazon-ssm-agent/","requestParameters":{"instanceId":"i-096379450e69ed082","maxResults":20},"responseElements":null,"requestID":"a5c63b9d-aaed-4a3c-9b7d-a4f7c6b774ab","eventID":"70de51df-c6df-4a57-8c1e-0ffdeb5ac29d","readOnly":true,"resources":[{"accountId":"34502914asasas","ARN":"arn:aws:ec2:us-east-1:3450291asasas:instance/i-096379450e69ed082"}],"eventType":"AwsApiCall","managementEvent":true,"eventCategory":"Management","recipientAccountId":"345029149342"}
{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":"AROAVAVKXAKPKZ25XXXX:AmazonMWAA-airflow","arn":"arn:aws:sts::3450291asasas:assumed-role/dev-1xdcfd/AmazonMWAA-airflow","accountId":"34502asasas","accessKeyId":"ASIAVAVKXAXXXXXXX","sessionContext":{"sessionIssuer":{"type":"Role","principalId":"AROAVAVKXAKPKZXXXXX","arn":"arn:aws:iam::345029asasas:role/service-role/AmazonMWAA-dlp-dev-1xdcfd","accountId":"3450291asasas","userName":"dlp-dev-1xdcfd"},"webIdFederationData":{},"attributes":{"mfaAuthenticated":"false","creationDate":"2021-04-27T07:04:08Z"}},"invokedBy":"airflow.amazonaws.com"},"eventTime":"2021-04-27T07:23:46Z","eventSource":"logs.amazonaws.com","eventName":"CreateLogStream","awsRegion":"us-east-1","sourceIPAddress":"airflow.amazonaws.com","userAgent":"airflow.amazonaws.com","errorCode":"ResourceAlreadyExistsException","errorMessage":"The specified log stream already exists","requestParameters":{"logStreamName":"scheduler.py.log","logGroupName":"dlp-dev-DAGProcessing"},"responseElements":null,"requestID":"40b48ef9-fc4b-4d1a-8fd1-4f2584aff1e9","eventID":"ef608d43-4765-4a3a-9c92-14ef35104697","readOnly":false,"eventType":"AwsApiCall","apiVersion":"20140328","managementEvent":true,"eventCategory":"Management","recipientAccountId":"3450291asasas"}
then
with open('file.txt', 'r') as f:
jsons = [i.strip() for i in f.readlines()]
with open('total.json', 'w') as f:
f.write('{"Records":[')
f.write(','.join(jsons))
f.write(']}')
will produce total.json with desired shape and being legal JSON if every line inside file.txt is legal JSON.

Python writing to json file object by object

I am looping through a bunch of urls to medium sized json files, that I'm trying to combine into a single file. Currently it's appending the json data for each file to a list, that I can then write to a single Json file
data = []
for i in range(50):
item = tree.getroot()[i][0]
with urllib.request.urlopen(url + item.text) as f:
for line in f:
data.append(json.loads(line))
# Save to file
with open('data.json', 'w') as outfile:
json.dump(data, outfile)
However, I'm not sure how scalable this method is and I will eventually have to combine 100s of files this way. If the data list becomes way to too large I'm thinking that trying to write it in one go would cause my system to crash due to memory issues? Is there a way to write continuously to a json file, so instead of appending inside the loop like this:
data.append(json.loads(line))
I would instead write to the file inside each loop with something like this:
with open('data.json', 'w') as outfile:
json.writedata, outfile)
That way I'll be building the file as I go and can clear the memory between each loop?

Using python ijson to read a large json file with multiple json objects

I'm trying to parse a large (~100MB) json file using ijson package which allows me to interact with the file in an efficient way. However, after writing some code like this,
with open(filename, 'r') as f:
parser = ijson.parse(f)
for prefix, event, value in parser:
if prefix == "name":
print(value)
I found that the code parses only the first line and not the rest of the lines from the file!!
Here is how a portion of my json file looks like:
{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000}
{"name":"engine_speed","value":772,"timestamp":1364323939.027000}
{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}
{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.035000}
In my opinion, I think ijson parses only one json object.
Can someone please suggest how to work around this?
Since the provided chunk looks more like a set of lines each composing an independent JSON, it should be parsed accordingly:
# each JSON is small, there's no need in iterative processing
import json
with open(filename, 'r') as f:
for line in f:
data = json.loads(line)
# data[u'name'], data[u'engine_speed'], data[u'timestamp'] now
# contain correspoding values
Unfortunately the ijson library (v2.3 as of March 2018) does not handle parsing multiple JSON objects. It can only handle 1 overall object, and if you attempt to parse a second object, you will get an error: "ijson.common.JSONError: Additional data". See bug reports here:
https://github.com/isagalaev/ijson/issues/40
https://github.com/isagalaev/ijson/issues/42
https://github.com/isagalaev/ijson/issues/67
python: how do I parse a stream of json arrays with ijson library
It's a big limitation. However, as long as you have line breaks (new line character) after each JSON object, you can parse each one line-by-line independently, like this:
import io
import ijson
with open(filename, encoding="UTF-8") as json_file:
cursor = 0
for line_number, line in enumerate(json_file):
print ("Processing line", line_number + 1,"at cursor index:", cursor)
line_as_file = io.StringIO(line)
# Use a new parser for each line
json_parser = ijson.parse(line_as_file)
for prefix, type, value in json_parser:
print ("prefix=",prefix, "type=",type, "value=",value)
cursor += len(line)
You are still streaming the file, and not loading it entirely in memory, so it can work on large JSON files. It also uses the line streaming technique from: How to jump to a particular line in a huge text file? and uses enumerate() from: Accessing the index in 'for' loops?

Dictionary to string not being read back as a dictionary

Since the Json And Pickle methods aren't working out, i've decided to save my dictionaries as strings, and that works, but they arent being read.
I.E
Dictionary
a={'name': 'joe'}
Save:
file = open("save.txt", "w")
file.write(str(a))
file.close()
And that works.
But my load method doesn't read it.
Load:
f = open("save.txt", "r")
a = f
f.close()
So, it just doesn't become f.
I really don't want to use json or pickle, is there any way I could get this method working?
First, you're not actually reading anything from the file (the file is not its contents). Second, when you fix that, you're going to get a string and need to transform that into a dictonary.
Fortunately both are straightforward to address....
from ast import literal_eval
with open("save.txt") as infile:
data = literal_eval(infile.read())

How to store something other than a string in a file

I'm trying to write some code to create a file that will write data about a "character". I've been able to write strings using:
f = open('player.txt','w')
f.write("Karatepig")
f.close()
f = open('player.txt','r')
f.read()
The issue is, how do I store something other than a string to a file? Can I convert it from a string to a value?
Files can only store strings, so you have to convert other values to strings when writing, and converting them back to original values when reading.
The Python standard library has a whole section dedicated to data persistence that can help make this task easier.
However, for simple types, it is perhaps easiest to use the json module to serialize data to a file and read it back again with ease:
import json
def write_data(data, filename):
with open(filename, 'w') as outfh:
json.dump(data, outfh)
def read_data(filename):
with open(filename, 'r') as infh:
json.load(infh)

Categories

Resources