do both json.load() and json.dump() at once in Python

I am trying to do something like this, which uses reading, appending, and writing at the same time:
with open("data.json", mode="a+") as file:
# 1.Reading old data
data = json.load(file)
# 2. Updating old data with new data
data.update(new_dict)
# 3.Writing into json file
json.dump(data,file,indent=4)
But it shows json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

First you need to open the file in mode="r+". Update the old data with the new, then seek(0) back to the beginning of the file, write your updated json data, and then truncate the rest:
with open("data.json", mode="r+") as file:
file.seek(0, 2)
if file.tell():
file.seek(0)
data = json.load(file)
data.update(new_dict)
else:
data = new_dict
file.seek(0)
json.dump(data, file, indent=4)
file.truncate()
The reason it doesn't work in a+ mode is twofold. First, a+ opens the file with the position at the end, so json.load() reads nothing and raises the "Expecting value ... (char 0)" error. Second, in a+ mode every write goes to the end of the file, irrespective of seek(0), so even if the read succeeded, your updated json data would just get appended after the old data like normal text, and the result would no longer be valid json syntax.
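You can see the append behaviour for yourself with a minimal sketch (demo.txt here is just a throwaway example file):
with open("demo.txt", "w") as f:
    f.write("old")

with open("demo.txt", "a+") as f:
    f.seek(0)          # move to the start...
    f.write("new")     # ...but in "a+" the write still lands at the end

with open("demo.txt") as f:
    print(f.read())    # prints "oldnew"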
See the documentation for open() for more detailed info on how the different modes work.

Related

Merge multiple JSON objects into a single JSON object

I have a log file containing one JSON record per line.
{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":"AA:i-096379450e69ed082","arn":"arn:aws:sts::34502sdsdsd:assumed-role/RDSAccessRole/i-096379450e69ed082","accountId":"34502sdsdsd","accessKeyId":"ASIAVAVKXAXXXXXXXC","sessionContext":{"sessionIssuer":{"type":"Role","principalId":"AROAVAVKXAKDDDDD","arn":"arn:aws:iam::3450291sdsdsd:role/RDSAccessRole","accountId":"345029asasas","userName":"RDSAccessRole"},"webIdFederationData":{},"attributes":{"mfaAuthenticated":"false","creationDate":"2021-04-27T04:38:52Z"},"ec2RoleDelivery":"2.0"}},"eventTime":"2021-04-27T07:24:20Z","eventSource":"ssm.amazonaws.com","eventName":"ListInstanceAssociations","awsRegion":"us-east-1","sourceIPAddress":"188.208.227.188","userAgent":"aws-sdk-go/1.25.41 (go1.13.15; linux; amd64) amazon-ssm-agent/","requestParameters":{"instanceId":"i-096379450e69ed082","maxResults":20},"responseElements":null,"requestID":"a5c63b9d-aaed-4a3c-9b7d-a4f7c6b774ab","eventID":"70de51df-c6df-4a57-8c1e-0ffdeb5ac29d","readOnly":true,"resources":[{"accountId":"34502914asasas","ARN":"arn:aws:ec2:us-east-1:3450291asasas:instance/i-096379450e69ed082"}],"eventType":"AwsApiCall","managementEvent":true,"eventCategory":"Management","recipientAccountId":"345029149342"}
{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":"AROAVAVKXAKPKZ25XXXX:AmazonMWAA-airflow","arn":"arn:aws:sts::3450291asasas:assumed-role/dev-1xdcfd/AmazonMWAA-airflow","accountId":"34502asasas","accessKeyId":"ASIAVAVKXAXXXXXXX","sessionContext":{"sessionIssuer":{"type":"Role","principalId":"AROAVAVKXAKPKZXXXXX","arn":"arn:aws:iam::345029asasas:role/service-role/AmazonMWAA-dlp-dev-1xdcfd","accountId":"3450291asasas","userName":"dlp-dev-1xdcfd"},"webIdFederationData":{},"attributes":{"mfaAuthenticated":"false","creationDate":"2021-04-27T07:04:08Z"}},"invokedBy":"airflow.amazonaws.com"},"eventTime":"2021-04-27T07:23:46Z","eventSource":"logs.amazonaws.com","eventName":"CreateLogStream","awsRegion":"us-east-1","sourceIPAddress":"airflow.amazonaws.com","userAgent":"airflow.amazonaws.com","errorCode":"ResourceAlreadyExistsException","errorMessage":"The specified log stream already exists","requestParameters":{"logStreamName":"scheduler.py.log","logGroupName":"dlp-dev-DAGProcessing"},"responseElements":null,"requestID":"40b48ef9-fc4b-4d1a-8fd1-4f2584aff1e9","eventID":"ef608d43-4765-4a3a-9c92-14ef35104697","readOnly":false,"eventType":"AwsApiCall","apiVersion":"20140328","managementEvent":true,"eventCategory":"Management","recipientAccountId":"3450291asasas"}
My goal is to merge this into a single json object which should look like:
{"Records":[{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":.....
I have been trying to merge them through a Python dict merge but am not able to get it to work.
Can anyone provide some pointers?
If your records are stored one per line in a text file, I would recommend the following approach: open the file, parse the records, and add them to a dict, which you can later dump with the native json library.
import json

data = {'records': []}
with open("data.txt", 'r') as f:
    for line in f:
        data['records'].append(json.loads(line))
print(json.dumps(data))
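If you want the merged object in a file rather than printed, a small extension of the same idea works (the output name merged.json is just an example):
with open("merged.json", "w") as out:
    json.dump(data, out)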
I would do it the following way. Let file.txt content be:
{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":"AA:i-096379450e69ed082","arn":"arn:aws:sts::34502sdsdsd:assumed-role/RDSAccessRole/i-096379450e69ed082","accountId":"34502sdsdsd","accessKeyId":"ASIAVAVKXAXXXXXXXC","sessionContext":{"sessionIssuer":{"type":"Role","principalId":"AROAVAVKXAKDDDDD","arn":"arn:aws:iam::3450291sdsdsd:role/RDSAccessRole","accountId":"345029asasas","userName":"RDSAccessRole"},"webIdFederationData":{},"attributes":{"mfaAuthenticated":"false","creationDate":"2021-04-27T04:38:52Z"},"ec2RoleDelivery":"2.0"}},"eventTime":"2021-04-27T07:24:20Z","eventSource":"ssm.amazonaws.com","eventName":"ListInstanceAssociations","awsRegion":"us-east-1","sourceIPAddress":"188.208.227.188","userAgent":"aws-sdk-go/1.25.41 (go1.13.15; linux; amd64) amazon-ssm-agent/","requestParameters":{"instanceId":"i-096379450e69ed082","maxResults":20},"responseElements":null,"requestID":"a5c63b9d-aaed-4a3c-9b7d-a4f7c6b774ab","eventID":"70de51df-c6df-4a57-8c1e-0ffdeb5ac29d","readOnly":true,"resources":[{"accountId":"34502914asasas","ARN":"arn:aws:ec2:us-east-1:3450291asasas:instance/i-096379450e69ed082"}],"eventType":"AwsApiCall","managementEvent":true,"eventCategory":"Management","recipientAccountId":"345029149342"}
{"eventVersion":"1.08","userIdentity":{"type":"AssumedRole","principalId":"AROAVAVKXAKPKZ25XXXX:AmazonMWAA-airflow","arn":"arn:aws:sts::3450291asasas:assumed-role/dev-1xdcfd/AmazonMWAA-airflow","accountId":"34502asasas","accessKeyId":"ASIAVAVKXAXXXXXXX","sessionContext":{"sessionIssuer":{"type":"Role","principalId":"AROAVAVKXAKPKZXXXXX","arn":"arn:aws:iam::345029asasas:role/service-role/AmazonMWAA-dlp-dev-1xdcfd","accountId":"3450291asasas","userName":"dlp-dev-1xdcfd"},"webIdFederationData":{},"attributes":{"mfaAuthenticated":"false","creationDate":"2021-04-27T07:04:08Z"}},"invokedBy":"airflow.amazonaws.com"},"eventTime":"2021-04-27T07:23:46Z","eventSource":"logs.amazonaws.com","eventName":"CreateLogStream","awsRegion":"us-east-1","sourceIPAddress":"airflow.amazonaws.com","userAgent":"airflow.amazonaws.com","errorCode":"ResourceAlreadyExistsException","errorMessage":"The specified log stream already exists","requestParameters":{"logStreamName":"scheduler.py.log","logGroupName":"dlp-dev-DAGProcessing"},"responseElements":null,"requestID":"40b48ef9-fc4b-4d1a-8fd1-4f2584aff1e9","eventID":"ef608d43-4765-4a3a-9c92-14ef35104697","readOnly":false,"eventType":"AwsApiCall","apiVersion":"20140328","managementEvent":true,"eventCategory":"Management","recipientAccountId":"3450291asasas"}
then
with open('file.txt', 'r') as f:
    jsons = [i.strip() for i in f.readlines()]
with open('total.json', 'w') as f:
    f.write('{"Records":[')
    f.write(','.join(jsons))
    f.write(']}')
This will produce total.json with the desired shape, and it will be legal JSON as long as every line inside file.txt is legal JSON.
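As a quick sanity check (assuming the snippets above have run), you can load the result back; json.load() will raise if the concatenation produced malformed JSON:
import json

with open('total.json') as f:
    merged = json.load(f)
print(len(merged['Records']))   # 2 for the two example records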

json dump generating unnecessary curly braces

This question might have been asked many times, but I am still unable to understand how to use a json file. I use json.dump(data, filename). While dumping, I get an unnecessary {} at the end of the file, so json.load() gives me the error below.
simplejson.scanner.JSONDecodeError: Extra data: line 1 column 1865 - line 1 column 1867 (char 1864 - 1866)
I read that there is no way to load a first or second dictionary. I have also read that there is a separator argument that can be used with json.dump, but I see no use for it here. Should I be using encoding/decoding here?
My json.dump file:
{
"deployCI2": ["094fd196-20f0-4e8d-b946-f74a56d2f319", "6a1ce382-98c6-4058-a929-95a7d2415fd0"],
"deployCI3": ["c8fff661-4482-4908-b722-4fac0227a8b0", "929cf1fa-3fa6-4f95-8464-d58e5490f4cf"],
"deployCI4": ["9f8ffa3c-460d-43a9-8113-58e891340e1b", "6e535e92-4da2-4228-a6ab-c8fc8d31adcd", "8e26a35e-7fb9-43b3-8026-d1283f7b678c", "f40e5c29-b4df-4cfb-9d7f-3bcc9c4dcf9f"],
"HeenaStackABC": [], "HeenaStackABC-DISK_VM1-mm55lkkvccej": ["cc2a89a2-3b27-4f88-af09-b3b0b1301056"]
}{}
Edit: I think the code is doing something here:
with open('stackList.json', 'a') as f:
    for stack in stacks:
        try:
            hlist = hc.resources.list(stack_id=stack.id)
            vlist = [o.physical_resource_id for o in hlist if o.resource_type == 'OS::Cinder::Volume']
            myDict[stack.stack_name] = vlist
        except heatclient.exc.HTTPBadRequest as e:
            pass
    json.dump(myDict, f)
I edited the code like below. I hope this is valid. It removed the last braces:
if len(myDict) != 0:
    json.dump(myDict, f)
Your problem is here:
with open('stackList.json', 'a') as f:
You're opening the file in 'append' mode, so each time the code is executed, it appends the dump to your file. The result you complain about comes from this, combined with myDict being empty on the second run.
You either have to open the file in "w" ("write") mode, which will overwrite the existing content (you can instead create a new dump file for each call), or switch to the JSON Lines format (but then your file will NOT be a valid json file anymore, and any code reading it will have to know to parse it as JSON Lines).
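For illustration, a minimal sketch of the "w"-mode fix, assuming myDict is built exactly as in the question:
import json

# "w" truncates the file on open, so reruns overwrite instead of appending
with open('stackList.json', 'w') as f:
    json.dump(myDict, f)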

Append text file in python with json data

I'm trying to create a simple function which I can use to store json data in a file. I currently have this code:
def data_store(key_id, key_info):
    try:
        with open('data.txt', 'a') as f:
            data = json.load(f)
            data[key_id] = key_info
            json.dump(data, f)
    except Exception:
        print("Error in data store")
The idea is to load whatever data is currently within the text file, then create or edit the json data. So, running the code...
data_store("foo","bar")
The function will then read what's within the text file, then allow me to append the json data, either replacing what's there (if "foo" exists) or creating it if it doesn't exist.
This has been throwing errors at me, however. Any ideas?
a mode would not work for both reading and writing at the same time. Instead, use r+:
with open('data.txt', 'r+') as f:
    data = json.load(f)
    data[key_id] = key_info
    f.seek(0)
    json.dump(data, f)
    f.truncate()
The seek(0) call here moves the cursor back to the beginning of the file. truncate() helps in situations where the new file content is shorter than the old one.
And, as a side note, try to avoid having a bare except clause, and/or log the error and the traceback properly.
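Putting it all together, here is a sketch of data_store using r+ with proper logging (this assumes data.txt already contains valid JSON, e.g. {}):
import json
import logging

def data_store(key_id, key_info):
    try:
        with open('data.txt', 'r+') as f:
            data = json.load(f)        # read the current contents
            data[key_id] = key_info    # create or replace the key
            f.seek(0)                  # rewind before writing
            json.dump(data, f)
            f.truncate()               # drop any leftover old bytes
    except (OSError, json.JSONDecodeError):
        logging.exception("Error in data store")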

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I am trying to import a file which was saved using json.dumps and contains tweet coordinates:
{
    "type": "Point",
    "coordinates": [
        -4.62352292,
        55.44787441
    ]
}
My code is:
>>> import json
>>> data = json.loads('/Users/JoshuaHawley/clean1.txt')
But each time I get the error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I want to end up extracting all the coordinates and saving them separately to a different file so they can then be mapped, but this seemingly simple problem is stopping me from doing so. I have looked at answers to similar errors but don't seem to be able to apply them to this. Any help would be appreciated, as I am relatively new to Python.
json.loads() takes a JSON encoded string, not a filename. You want to use json.load() (no s) instead and pass in an open file object:
with open('/Users/JoshuaHawley/clean1.txt') as jsonfile:
    data = json.load(jsonfile)
The open() command produces a file object that json.load() can then read from, to produce the decoded Python object for you. The with statement ensures that the file is closed again when done.
The alternative is to read the data yourself and then pass it into json.loads().
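For example, reading the contents yourself and decoding the resulting string gives the same result:
with open('/Users/JoshuaHawley/clean1.txt') as jsonfile:
    data = json.loads(jsonfile.read())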
It helped for me to add myfile.seek(0) to move the pointer back to character 0:
with open(storage_path, 'r') as myfile:
    if len(myfile.readlines()) != 0:
        myfile.seek(0)
        Bank_0 = json.load(myfile)
I got the same type of error after reading in a JSON file created from Python.
The same error occurred whether I read it into a string and tried json.loads(), or read it straight from the file with json.load().
In my case, it turned out to be because I had written Python booleans (False/True) straight out to the file.
Trying to read them back in again caused this error.
When I modified the file to valid JSON (true/false), json.load() worked fine.
I didn't see any SO questions with this as a possible cause for this error, so I'm adding it here for reference.
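A quick illustration of that failure mode:
import json

json.loads('{"flag": true}')   # fine: lowercase true is valid JSON
json.loads('{"flag": True}')   # raises JSONDecodeError: Expecting value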
You may use this function (it works with data):
def read_json_file(filename):
    with open(filename, 'r') as f:
        cache = f.read()
        data = eval(cache)
    return data
Or, you may put this in your code (it has the same effect):
def read_json_file(filename):
    data = []
    with open(filename, 'r') as f:
        data = [json.loads(_.replace('}]}"},', '}]}"}')) for _ in f.readlines()]
    return data
import json

file_path = "C:/Projects/Tryouts/books.json"
with open(file_path, 'r') as j:
    contents = json.loads(j.read())
print(contents)

Update json file

I have a json file with some data and would like to occasionally update this file.
I read the file:
with open('index.json', 'rb') as f:
    idx = json.load(f)
then check for the presence of a key from potentially new data, and if the key is not present, update the file:
with open('index.json', mode='a+') as f:
    json.dump(new_data, f, indent=4)
However, this procedure just creates a new json object (Python dict) and appends it as a new object in the output json file, making the file invalid json.
Is there any simple way to append new data to the json file without overwriting the whole file, by updating the initial dict?
One way to do what you're after is to write one JSON object per line in the file. I'm using that approach and it works quite well.
A nice benefit is that you can read the file more efficiently (memory-wise) because you can read one line at a time. If you need all of them, there's no problem with assembling a list in Python, but if you don't, you're operating much faster, and you can also append.
So to initially write all your objects, you'd do something like this:
with open(json_file_path, "w") as json_file:
    for data in data_iterable:
        json_file.write("{}\n".format(json.dumps(data)))
Then to read efficiently (will consume little memory, no matter the file size):
with open(json_file_path, "r") as json_file:
    for line in json_file:
        data = json.loads(line)
        process_data(data)
To update/append:
with open(json_file_path, "a") as json_file:
    json_file.write("{}\n".format(json.dumps(new_data)))
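Tying this back to the question, here is a sketch that only appends when a key is not already present (the "id" field is just a hypothetical example; adjust it to whatever key your records use):
import json

def append_if_new(path, new_data, key="id"):
    # scan the existing lines for the key before appending
    with open(path, "r") as json_file:
        for line in json_file:
            if json.loads(line).get(key) == new_data[key]:
                return False               # already present; nothing to do
    with open(path, "a") as json_file:
        json_file.write("{}\n".format(json.dumps(new_data)))
    return True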
Hope this helps :)
