Python modify JSON file based on input - python

I have a JSON configuration file which looks something like this:
{
    "generic": {
        "loglevel": 2,
        ...
    },
    "clients": [
        {
            "type": "foo",
            "bar": {
                "bar_1": 0.7,
                "bar_2": 0.95
            },
            ...
        },
        {
            "type": "foo",
            ...
        }
    ]
}
I can modify the contents and store the modified version of it using:
import json
with open("sample.cfg", "r") as config_file:
    config = json.load(config_file)
config["clients"][0]["bar"]["bar_1"] = 100
with open("modified.cfg", "w") as config_file:
    config_file.write(json.dumps(config))
But I would like to modify the file based on some input. Let's say the input is a string changestring:
changestring = 'clients,0,bar,bar_1:1,2,3'
keyval=changestring.split(':')
keys = keyval[0].split(',')
vals = keyval[1].split(',')
But now I don't know how to use the keys in order to access the config path. Is this actually the right way to do this? Or maybe there is a different way to handle it? Thanks.

This is certainly a viable solution. It will work, but if actual users supply the change string, you probably want some way to ensure that the string is valid.
You also probably want to distinguish between integer indices and string indices!
Assuming your code, you could do the following:
import json
with open("sample.cfg", "r") as config_file:
    config = json.load(config_file)
changestring = 'clients,0,bar,bar_1:1,2,3'
keyval = changestring.split(':')
keys = keyval[0].split(',')
vals = keyval[1].split(',')
# Move our "pointer"
obj = config
for key in keys[:-1]:
    try:
        obj = obj[key]
    except TypeError:
        # Probably want a more general solution
        obj = obj[int(key)]
# Update value
obj[keys[-1]] = vals
with open("modified.cfg", "w") as config_file:
    config_file.write(json.dumps(config))
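Python variables are references (aliases) to objects, so by walking a variable down the "index" tree (but not all the way), we get a reference to the nested container we actually want to modify; assigning into it changes config itself. This works for any "depth" of keys supplied, although the final assignment assumes the last key addresses a dict, because a list would need an integer index. I tested this on python2.7.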
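The "more general solution" hinted at in the comment above could be sketched as follows. This is only one possible approach, not the answerer's code: the helper names set_by_path and apply_change are made up for illustration, and the assumption is that a key is converted to an integer exactly when the container being indexed is a list. Values are decoded with json.loads where possible, so "1" becomes the number 1 instead of staying a string.

```python
import json


def set_by_path(root, keys, value):
    """Walk `root` along `keys` and assign `value` at the final key."""
    def step(container, key):
        # Lists need integer indices; dicts loaded from JSON have string keys.
        return int(key) if isinstance(container, list) else key

    obj = root
    for key in keys[:-1]:
        obj = obj[step(obj, key)]
    obj[step(obj, keys[-1])] = value
    return root


def apply_change(config, changestring):
    """Apply a change string of the form 'key,key,...:value,value,...'."""
    path, _, raw_values = changestring.partition(":")
    if not path or not raw_values:
        raise ValueError("expected 'key,key,...:value,value,...'")
    values = []
    for item in raw_values.split(","):
        try:
            # Valid JSON literals such as 1 or 0.7 are decoded ...
            values.append(json.loads(item))
        except ValueError:
            # ... anything else is kept as a plain string.
            values.append(item)
    return set_by_path(config, path.split(","), values)


config = {"clients": [{"type": "foo", "bar": {"bar_1": 0.7, "bar_2": 0.95}}]}
apply_change(config, "clients,0,bar,bar_1:1,2,3")
```

A malformed change string raises ValueError, and a path that does not exist raises KeyError or IndexError, which gives a caller a place to report invalid user input.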

If I am interpreting your question correctly, your keys and vals lists will look like this:
keys = ["clients", "0", "bar", "bar_1"]
vals = ["1", "2", "3"]
So to update the value of config["clients"][0]["bar"]["bar_1"] you can do this (note that "clients" is a list, so its index has to be converted to an integer):
config[keys[0]][int(keys[1])][keys[2]][keys[3]] = vals[index]
Here index is the position in the vals list of the value you want to write into your JSON.

Related

Filtering a json file by removing objects that contain certain keys

I'm having a bit of trouble filtering my json file. Basically I have a json file where each line is a different json object (I know this is not the normal valid format but it's what I have to work with), and I want to go through each line and check if it contains either 1 of 2 keys (e.g. "name" or "firstname"). If either of the 2 keys exist in the json object, I want to keep it. And if not, I want to remove it. So at the end, I will have an output json file that doesn't include the objects missing those keys.
I've tried out a bunch of different things but I can't seem to get it to work, this is what I have so far:
jsonList = []
with open(filename) as f:
    for json_line in f:
        obj = json.loads(json_line)
        checker(obj)
def checker(obj):
    check = 0
    if ("name" in obj):
        check = 1
    if ("firstname" in obj):
        check = 1
    if (check == 1):
        jsonList.append(obj)
When I try printing jsonList after it just gives me an empty list [], so my check variable never changed to 1 even though there are json objects in my file that have those keys.
My json file looks something like this: (note: number of things inside each object isn't guaranteed so I can't just check for that)
{"name": "name1", "date": "2018-11-13", "age": 32}
{"firstname": "name2", "date": "2019-05-09", "age": 40}
{"date": "2019-11-04", "age": 35}
Does anyone have any ideas on what I could do? Or if you know why what I tried here didn't work?
Your original code seems to work for me. I use the checker function as-is without modification:
import json
from io import StringIO
from pprint import pprint
jsonList = []
filedata = StringIO("""\
{"name": "name1", "date": "2018-11-13", "age": 32}
{"firstname": "name2", "date": "2019-05-09", "age": 40}
{"date": "2019-11-04", "age": 35}\
""")
def checker(obj):
    check = 0
    if ("name" in obj):
        check = 1
    if ("firstname" in obj):
        check = 1
    if (check == 1):
        jsonList.append(obj)
for json_line in filedata:
    obj = json.loads(json_line)
    checker(obj)
pprint(jsonList)
Output:
[{'age': 32, 'date': '2018-11-13', 'name': 'name1'},
{'age': 40, 'date': '2019-05-09', 'firstname': 'name2'}]
Steps to Optimize
There's a couple different approaches to optimize your code, but the easiest way I'd suggest is with set.intersection to compare a set of required keys against the keys in a dict object. If there are any matches, then we add the dict object as it's valid.
jsonList = []
need_one_of_keys = {'name', 'firstname'}
for json_line in filedata:
    obj = json.loads(json_line)
    if need_one_of_keys.intersection(obj):
        jsonList.append(obj)
pprint(jsonList)
Another approach worth mentioning is to combine the in operator (a dict membership test) with the builtin any function:
jsonList = []
need_one_of_keys = frozenset(['name', 'firstname'])
for json_line in filedata:
    obj = json.loads(json_line)
    if any(key in obj for key in need_one_of_keys):
        jsonList.append(obj)
pprint(jsonList)
You are calling check(obj), which is not a method.
Please call checker(obj)
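Since the question asks for an output file that only contains the matching objects, the filtering shown in the answers above can be combined with writing the result back out line by line. The following is a minimal sketch under a few assumptions: the function name filter_json_lines is invented here, blank lines are simply skipped, and StringIO objects stand in for the real input and output files.

```python
import json
from io import StringIO

REQUIRED_KEYS = {"name", "firstname"}


def filter_json_lines(source, sink, required=REQUIRED_KEYS):
    """Copy to `sink` only the objects that have at least one required key.

    Returns the number of objects kept.
    """
    kept = 0
    for line in source:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)
        # isdisjoint is False as soon as one required key is present in obj.
        if not required.isdisjoint(obj):
            sink.write(json.dumps(obj) + "\n")
            kept += 1
    return kept


source = StringIO(
    '{"name": "name1", "date": "2018-11-13", "age": 32}\n'
    '{"firstname": "name2", "date": "2019-05-09", "age": 40}\n'
    '{"date": "2019-11-04", "age": 35}\n'
)
sink = StringIO()
kept = filter_json_lines(source, sink)
```

With real files, source and sink would be the objects returned by open(...) for reading and writing; nothing is held in memory beyond the current line.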

How can I use jsonpath in python to change an element value in the json object

I have the following json object (Say car_details.json):
{
    "name": "John",
    "age": 30,
    "cars":
    [
        {
            "car_model": "Mustang",
            "car_brand": "Ford"
        },
        {
            "car_model": "cx-5",
            "car_brand": "Mazda"
        }
}
I want to change the value of car_model from cx-5 to cx-9 through python code.
I am providing the json path to this element, through an external file. The json-path expression is basically represented as a string. Something like this:
'cars[2].car_model'
And the new value is also provided through an external file as a string:
'cx-9'
Now how do I parse through car_details.json using the jsonpath expression, change its value to the one provided as a string, and finally return the modified json object?
P.S. I want to do this through Python code.
This is an approach without using json module. Load your data in variable. Then iterate over cars key/values. If you find the key that is the value you are looking for set it to new value.
Also note: you need to close your array block, otherwise your above json is not valid. Generally I use an online json parser to check if my data is valid etc. (may be helpful in future).
data = {
    "name": "John",
    "age": 30,
    "cars":
    [
        {
            "car_model": "Mustang",
            "car_brand": "Ford"
        },
        {
            "car_model": "cx-5",
            "car_brand": "Mazda"
        }
    ]
}
for cars in data['cars']:
    for key, value in cars.items():
        if key == "car_model" and value == "cx-5":
            cars[key] = "cx-9"
print(data)
If you want to load your json object from a file, let's assume it is called "data.json" and is in the same directory as the python script you are going to run:
import json
with open('data.json') as json_data:
    data = json.load(json_data)
for cars in data['cars']:
    for key, value in cars.items():
        if key == "car_model" and value == "cx-5":
            cars[key] = "cx-9"
print(data)
Now if you'd like to write the content to the original file or new file, in this case I am writing to a file called "newdata.json":
import json
with open('data.json') as json_data:
    data = json.load(json_data)
print(data)
# external.txt holds the replacement value, e.g. cx-9
with open('external.txt') as f:
    content = f.read().strip()  # drop the trailing newline
print(content)
for cars in data['cars']:
    for key, value in cars.items():
        if key == "car_model" and value == "cx-5":
            cars[key] = content
with open('newdata.json', 'w') as outfile:
    json.dump(data, outfile)
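The answer above searches for the old value instead of following the path string from the question. If the path expressions stay as simple as dotted names with bracketed indices, they can be applied with the standard library alone. The sketch below is an assumption-laden illustration, not a real JSONPath implementation: parse_path and set_path are hypothetical helper names, only the name and [index] forms are understood, and indices are zero-based, so the Mazda entry is cars[1] rather than cars[2].

```python
import re

# Matches either a bare name (car_model) or a bracketed index ([1]).
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path):
    """Turn 'cars[1].car_model' into ['cars', 1, 'car_model']."""
    tokens = []
    for name, index in _TOKEN.findall(path):
        tokens.append(int(index) if index else name)
    return tokens


def set_path(data, path, value):
    """Follow `path` through nested dicts/lists and assign `value` at the end."""
    tokens = parse_path(path)
    target = data
    for token in tokens[:-1]:
        target = target[token]
    target[tokens[-1]] = value
    return data


data = {
    "name": "John",
    "age": 30,
    "cars": [
        {"car_model": "Mustang", "car_brand": "Ford"},
        {"car_model": "cx-5", "car_brand": "Mazda"},
    ],
}
set_path(data, "cars[1].car_model", "cx-9")
```

For full JSONPath syntax (wildcards, filters, recursive descent) a dedicated library would be the better choice; this sketch only covers the simple form shown in the question.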

How to define a multi-level dictionary structure and insert data into that?

I am trying to create a multi-level Python dictionary using defaultdict.The structure of the dictionary is like below:
{
    "source1": {
        "gene": {
            "gene1": {
                "location": [
                    [
                        10,
                        200
                    ]
                ],
                "mrna": {
                    "1": {
                        "location": [
                            [
                                10,
                                200
                            ]
                        ],
                        "product": "hypothetical",
                        "CDS": {
                            "location": [
                                [
                                    10,
                                    50
                                ],
                                [
                                    100,
                                    200
                                ]
                            ]
                        }
                    }
                }
            }
        }
    }
}
But in Python, in cases like this, we need to define the structure before inserting any data.
My attempt at defining the structure is:
from collections import defaultdict
dct = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(
lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list)))))))
Now to insert data in the above structure, I am using below codes to create the above-defined format.
dct['source1']['gene']['gene1']['location'].append([10, 200])
dct['source1']['gene']['gene1']['mrna']['1']['location'].append(['10', '200'])
dct['source1']['gene']['gene1']['mrna']['1']['product'] = 'hypothetical'
dct['source1']['gene']['gene1']['mrna']['1']['CDS']['location'].append([10, 50])
dct['source1']['gene']['gene1']['mrna']['1']['CDS']['location'].append([100, 200])
But I am getting some error. So can any one help me to create the multi-level dictionary?
Your dictionary definition uses different datatypes than the ones you want to add (also check ChristianFigueroa's answer regarding your spelling).
If I run your code I get the error AttributeError: 'collections.defaultdict' object has no attribute 'append'. Lists only appear after seven keys in your nested defaultdict, while dct['source1']['gene']['gene1']['location'] uses only four, so it is still a defaultdict. This is why I created your dictionary with the right datatypes (excuse lazy me for using plain dictionaries instead of defaultdicts).
dct = {} #Dict
dct['source1'] = {} #Dict
dct['source1']['gene'] = {} #Dict
dct['source1']['gene']['gene1'] = {} #Dict
dct['source1']['gene']['gene1']['location'] = [] #List
dct['source1']['gene']['gene1']['mrna'] = {} #Dict
dct['source1']['gene']['gene1']['mrna']['1'] = {} #Dict
dct['source1']['gene']['gene1']['mrna']['1']['location'] = [] #List
dct['source1']['gene']['gene1']['mrna']['1']['product'] = '' #String
dct['source1']['gene']['gene1']['mrna']['1']['CDS'] = {} #Dict
dct['source1']['gene']['gene1']['mrna']['1']['CDS']['location'] = [] #List
dct['source1']['gene']['gene1']['location'].append([10, 200])
dct['source1']['gene']['gene1']['mrna']['1']['location'].append(['10', '200'])
dct['source1']['gene']['gene1']['mrna']['1']['product'] = 'hypothetical'
dct['source1']['gene']['gene1']['mrna']['1']['CDS']['location'].append([10, 50])
dct['source1']['gene']['gene1']['mrna']['1']['CDS']['location'].append([100, 200])
I hope you see what I did there.
The build-up of subdicts can be done automatically:
>>> from collections import defaultdict
>>> f = lambda: defaultdict(f)
>>> d = f()
>>> d['usa']['texas'] = 'lone star'
>>> d['usa']['ohio'] = 'buckeye'
>>> d['canada']['alberta'] = 'flames'
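Applied to the gene structure from the question, the self-referencing defaultdict could look like the sketch below. The name tree is arbitrary, and the choice to create list leaves explicitly with setdefault is an assumption made here, because a recursive defaultdict would otherwise create another dict at that key.

```python
import json
from collections import defaultdict


def tree():
    """A defaultdict whose missing keys become further trees."""
    return defaultdict(tree)


dct = tree()
gene1 = dct["source1"]["gene"]["gene1"]
mrna = gene1["mrna"]["1"]

# Leaves that hold lists are created explicitly with setdefault.
gene1.setdefault("location", []).append([10, 200])
mrna.setdefault("location", []).append([10, 200])
mrna["product"] = "hypothetical"
mrna["CDS"].setdefault("location", []).append([10, 50])
mrna["CDS"]["location"].append([100, 200])

# defaultdict is a dict subclass, so json can serialize it directly;
# the round trip yields plain dicts and lists.
plain = json.loads(json.dumps(dct))
```

Converting to plain dicts at the end also avoids a common surprise with defaultdict: merely reading a missing key later on would silently create it.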
In your code, you're trying to get dct["source1"]["gene"]["gene1"]["mrna"]["1"]["CDS"]["location"] but there's no "CDS" key, just a "cds". Replace the "CDS" with "cds".
dct["source1"]["gene"]["gene1"]["mrna"]["1"]["cds"]["location"].append( ... )
Python keys are case-sensitive, so make sure you are matching the string exactly.
Also, I would recommend not putting your data into super specific dicts like that because then it gets harder to debug and actually see where something went wrong, like the case-sensitive "CDS" thing.

Use Python and JSON to recursively get all keys associated with a value

Given data organized in JSON format (example below), how can we get the path of keys and sub-keys associated with a given value?
For example, given the input "23314" we need to return a list with:
Fanerozoico, Cenozoico, Quaternario, Pleistocenico, Superior.
Since the data is a JSON file, we decode it using Python and the json library:
import json
def decode_crono(crono_file):
    with open(crono_file) as json_file:
        data = json.load(json_file)
From here on we do not know how to process it to get what we need.
We can access keys like this:
k = data["Fanerozoico"]["Cenozoico"]["Quaternario"]["Pleistocenico"].keys()
or values like this:
v = data["Fanerozoico"]["Cenozoico"]["Quaternario"]["Pleistocenico"]["Superior"].values()
but this is still far from what we need.
{
    "Fanerozoico": {
        "id": "20000",
        "Cenozoico": {
            "id": "23000",
            "Quaternario": {
                "id": "23300",
                "Pleistocenico": {
                    "id": "23310",
                    "Superior": {
                        "id": "23314"
                    },
                    "Medio": {
                        "id": "23313"
                    },
                    "Calabriano": {
                        "id": "23312"
                    },
                    "Gelasiano": {
                        "id": "23311"
                    }
                }
            }
        }
    }
}
It's a little hard to understand exactly what you are after here, but it seems like for some reason you have a bunch of nested json and you want to search it for an id and return a list that represents the path down the json nesting. If so, the quick and easy path is to recurse on the dictionary (that you got from json.load) and collect the keys as you go. When you find an 'id' key that matches the id you are searching for you are done. Here is some code that does that:
def all_keys(search_dict, key_id):
    def _all_keys(search_dict, key_id, keys=None):
        if not keys:
            keys = []
        for i in search_dict:
            if search_dict[i] == key_id:
                return keys + [i]
            if isinstance(search_dict[i], dict):
                potential_keys = _all_keys(search_dict[i], key_id, keys + [i])
                if 'id' in potential_keys:
                    keys = potential_keys
                    break
        return keys
    return _all_keys(search_dict, key_id)[:-1]
The reason for the nested function is to strip off the 'id' key that would otherwise be on the end of the list.
This is really just to give you an idea of what a solution might look like. Beware the python recursion limit!
Based on the assumption that you need the full dictionary path until a key named id has a particular value, here's a recursive solution that iterates the whole dict. Bear in mind that:
The code is not optimized at all
For very deeply nested JSON objects it might exceed Python's recursion limit and raise a RuntimeError (RecursionError on Python 3.5+)
It will stop at the first matching value found (in theory there shouldn't be more than one if the JSON is semantically correct)
The code:
import json
SEARCH_KEY_NAME = "id"
FOUND_FLAG = ()
CRONO_FILE = "a.jsn"
def decode_crono(crono_file):
    with open(crono_file) as json_file:
        return json.load(json_file)
def traverse_dict(dict_obj, value):
    for key in dict_obj:
        key_obj = dict_obj[key]
        if key == SEARCH_KEY_NAME and key_obj == value:
            # Found it: the empty tuple ends the path being built by the callers
            return FOUND_FLAG
        elif isinstance(key_obj, dict):
            inner = traverse_dict(key_obj, value)
            if inner is not None:
                return (key,) + inner
    return None
if __name__ == "__main__":
    value = "23314"
    json_dict = decode_crono(CRONO_FILE)
    result = traverse_dict(json_dict, value)
    print(result)
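Both answers warn about the recursion limit. One way around it is to keep an explicit stack instead of recursing. The following is a sketch of that idea rather than a drop-in replacement for either answer: find_path is a name chosen here, and it assumes every node is a dict and that ids are stored under the key "id" as in the question.

```python
def find_path(tree, wanted, id_key="id"):
    """Return the list of keys leading to the dict whose id equals `wanted`.

    Returns None when no such dict exists. Uses an explicit stack, so the
    nesting depth is not bounded by Python's recursion limit.
    """
    stack = [(tree, [])]
    while stack:
        node, path = stack.pop()
        if node.get(id_key) == wanted:
            return path
        for key, child in node.items():
            if isinstance(child, dict):
                stack.append((child, path + [key]))
    return None


data = {
    "Fanerozoico": {
        "id": "20000",
        "Cenozoico": {
            "id": "23000",
            "Quaternario": {
                "id": "23300",
                "Pleistocenico": {
                    "id": "23310",
                    "Superior": {"id": "23314"},
                    "Medio": {"id": "23313"},
                    "Calabriano": {"id": "23312"},
                    "Gelasiano": {"id": "23311"},
                },
            },
        },
    }
}
path = find_path(data, "23314")
```

Because each stack entry carries its own copy of the path, no backtracking logic is needed; the cost is some extra list copying on large trees.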

how to parse json where key is variable in python?

I am parsing a log file in JSON format, which contains data in the form of key: value pairs.
I am stuck at a place where the key itself is variable. Please look at the attached code.
In this code I am able to access keys like username, event_type, ip, etc.
The problem for me is accessing the values inside the "submission" key, where
i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1 is a variable key which will change for different users.
How can I access it as a variable?
{
    "username": "batista",
    "event_type": "problem_check",
    "ip": "127.0.0.1",
    "event": {
        "submission": {
            "i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
                "input_type": "choicegroup",
                "question": "",
                "response_type": "multiplechoiceresponse",
                "answer": "MenuInflater.inflate()",
                "variant": "",
                "correct": true
            }
        },
        "success": "correct",
        "grade": 1,
        "correct_map": {
            "i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
                "hint": "",
                "hintmode": null,
                "correctness": "correct",
                "npoints": null,
                "msg": "",
                "queuestate": null
            }
        }
    }
}
This is my code so far:
import json
import pprint
with open("log.log") as infile:
    # Loop until we have parsed all the lines.
    for line in infile:
        # Read lines until we find a complete object
        while True:
            try:
                json_data = json.loads(line)
                username = json_data['username']
                print("username :- " + username)
                break  # object parsed, move on to the next one
            except ValueError:
                line += next(infile)
How can I access the i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1 key and the data inside it?
You don't need to know the key in advance; you can simply iterate over the dictionary (on Python 2 you can use iteritems() instead of items()):
for k, v in obj['event']['submission'].items():
    print(k, v)
Suppose you have a dictionary d = {"a": "b"}; then d.popitem() gives you the tuple ("a", "b"), i.e. (key, value), so you can get at a key-value pair without knowing the key. Be aware that popitem() also removes that pair from the dictionary.
In your case, if j is the main dictionary, then j["event"]["submission"].popitem() would give you the tuple
("i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1", {
    "input_type": "choicegroup",
    "question": "",
    "response_type": "multiplechoiceresponse",
    "answer": "MenuInflater.inflate()",
    "variant": "",
    "correct": True
})
Hope this is what you were asking.
Using the Python json module, you end up with a dictionary of parsed values from the JSON data above:
import json
parsed = json.loads(this_sample_data_in_question)
# parsed is a dictionary, and so are the values under the "correct_map" and "submission" keys within "event"
So you can iterate over the keys and values of the data as with any normal dictionary, for example:
for k, v in parsed.items():
    print(k, v)
Now you could find the (possible different values) of "i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1" key in a quick way like this:
import json
parsed = json.loads(the_data_in_question_as_string)
event = parsed['event']
for key, val in event.items():
    if key in ('correct_map', 'submission'):
        section = event[key]
        for possible_variable_key, its_value in section.items():
            print(possible_variable_key, its_value)
Of course there may be better ways of iterating over the dictionary; which one you choose depends on your coding taste, or on performance if your data is considerably larger than the sample posted here.
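When a section such as "submission" is expected to hold exactly one variable key, the pair can also be read without looping and without removing it from the dictionary. The sketch below assumes that one-key shape; single_entry is a hypothetical helper name, and the record is a shortened version of the log entry from the question.

```python
def single_entry(mapping):
    """Return the only (key, value) pair of a one-item dict without modifying it."""
    if len(mapping) != 1:
        raise ValueError("expected exactly one key, got %d" % len(mapping))
    return next(iter(mapping.items()))


record = {
    "username": "batista",
    "event": {
        "submission": {
            "i4x-IITB-CS101-problem-33e4aac93dc84f368c93b1d08fa984fc_2_1": {
                "answer": "MenuInflater.inflate()",
                "correct": True,
            }
        }
    },
}
problem_id, submission = single_entry(record["event"]["submission"])
```

The explicit length check makes the assumption visible: if a log entry ever contains several problems under "submission", the call fails loudly and a loop over items() is the right tool instead.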
