modifying json - deleting certain elements within a json structure using python - python

My json structure is as follows :
"AGENT": {
"pending": [],
"active": null,
"completed": [
**{
"result": {
"job1.AGENT": "SUCCESS",
"job2.AGENT": "SUCCESS"
},
"return_value": {
"job1.AGENT": "",
"job2.AGENT": ""
},
"visible": true,
"global": true,
"locale": [
"en_US"
],
"complete_time": "2018-01-24T17:44:33.484Z",
"persist": true,
"type": "script",
"script": "<script_name>.py",
"preset_status": "CONFIGURING",
"parameters": {},
"submit_time": "2018-01-24T17:44:26.747Z"
}**,
{
"result": {
..
},
"return_value": {
..
},
"visible": true,
"global": true,
"locale": [
"en_US"
],
"complete_time": "2018-04-2T17:44:40.049Z",
"submit_time": "2018-04-2T17:44:26.817Z"
}
I need to delete the entire result block based on complete_time, like delete the result block before 2018-04-03
How can i acheive this in python ?
I have tried the following so far :
json_data = json.dumps(data)
item_dict = json.loads(data)
print item_dict["AGENT"]["completed"][0]["complete_time"]
This prints the complete time. However my problem is "AGENT" is not a constant string. The string can vary. Also I will need to figure out the logic to remove the entire json block based on complete_time

Ok, I assume that you were able to correctly load the json into a Python dictionnary, let call it item_dict, but the key may vary.
What you need now it to walk inside that Python object, and decode the complete_time field. Unfortunately, Python strptime does not know about the Z time zone, so we will have to skip that last character.
Additionaly, you should never modify a collection object while iterating it, so the bullet proof way is to store indices to remove and later remove them. Code could be:
datelimit = datetime.datetime(2018, 4, 1) # limit date for completed_time
to_remove = []
dateformat = '%Y-%m-%dT%H:%M:%S.%f'
for k, v in item_dict.items(): # enumerate top_level objects
for i, block in enumerate(v['completed']): # enumerate inner blocks
complete_time = datetime.datetime.strptime( # skip last char from complete_time
block["complete_time"][:-1], dateformat)
# print(k, i, complete_time) # uncomment for tests
if complete_time < datelimit: # too old
to_remove.append((k, i)) # store the index for later processing
for k, i in reversed(to_remove): # start from the end to keep consistent indices
del item_dict[k]["completed"][i] # actual deletion

Related

Automatically entering next JSON level using Python in a similar way to JQ in bash

I am trying to use Python to extract pricePerUnit from JSON. There are many entries, and this is just 2 of them -
{
"terms": {
"OnDemand": {
"7Y9ZZ3FXWPC86CZY": {
"7Y9ZZ3FXWPC86CZY.JRTCKXETXF": {
"offerTermCode": "JRTCKXETXF",
"sku": "7Y9ZZ3FXWPC86CZY",
"effectiveDate": "2020-11-01T00:00:00Z",
"priceDimensions": {
"7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7": {
"rateCode": "7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7",
"description": "Processed translation request in AWS GovCloud (US)",
"beginRange": "0",
"endRange": "Inf",
"unit": "Character",
"pricePerUnit": {
"USD": "0.0000150000"
},
"appliesTo": []
}
},
"termAttributes": {}
}
},
"CQNY8UFVUNQQYYV4": {
"CQNY8UFVUNQQYYV4.JRTCKXETXF": {
"offerTermCode": "JRTCKXETXF",
"sku": "CQNY8UFVUNQQYYV4",
"effectiveDate": "2020-11-01T00:00:00Z",
"priceDimensions": {
"CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7": {
"rateCode": "CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7",
"description": "$0.000015 per Character for TextTranslationJob:TextTranslationJob in EU (London)",
"beginRange": "0",
"endRange": "Inf",
"unit": "Character",
"pricePerUnit": {
"USD": "0.0000150000"
},
"appliesTo": []
}
},
"termAttributes": {}
}
}
}
}
}
The issue I run into is that the keys, which in this sample, are 7Y9ZZ3FXWPC86CZY, CQNY8UFVUNQQYYV4.JRTCKXETXF, and CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7 are a changing string that I cannot just type out as I am parsing the dictionary.
I have python code that works for the first level of these random keys -
with open('index.json') as json_file:
data = json.load(json_file)
json_keys=list(data['terms']['OnDemand'].keys())
#Get the region
for i in json_keys:
print((data['terms']['OnDemand'][i]))
However, this is tedious, as I would need to run the same code three times to get the other keys like 7Y9ZZ3FXWPC86CZY.JRTCKXETXF and 7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7, since the string changes with each JSON entry.
Is there a way that I can just tell python to automatically enter the next level of the JSON object, without having to parse all keys, save them, and then iterate through them? Using JQ in bash I can do this quite easily with jq -r '.terms[][][]'.
If you are really sure, that there is exactly one key-value pair on each level, you can try the following:
def descend(x, depth):
for i in range(depth):
x = next(iter(x.values()))
return x
You can use dict.values() to iterate over the values of a dict. You can also use next(iter(dict.values())) to get a first (only) element of a dict.
for demand in data['terms']['OnDemand'].values():
next_level = next(iter(demand.values()))
print(next_level)
If you expect other number of children than 1 in the second level, you can just nest the fors:
for demand in data['terms']['OnDemand'].values():
for sub_demand in demand.values()
print(sub_demand)
If you are insterested in the keys too, you can use dict.items() method to iterate over dict keys and values at the same time:
for demand_key, demand in data['terms']['OnDemand'].items():
for sub_demand_key, sub_demand in demand.items()
print(demand_key, sub_demand_key, sub_demand)

Proper way to iterate Python dictionary and build new one

I'm trying to find a cogent way to check to see whether certain keys exist in a dictionary and use those to build a new one.
Here is my example json:
"dmarc": {
"record": "v=DMARC1; p=none; rua=mailto:dmarc.spc#test.domain; adkim=s; aspf=s",
"valid": true,
"location": "test.domain",
"warnings": [
"DMARC record at root of test.domain has no effect"
],
"tags": {
"v": {
"value": "DMARC1",
"explicit": true
},
"p": {
"value": "none",
"explicit": true
},
"rua": {
"value": [
{
"scheme": "mailto",
"address": "ssc.dmarc.spc#canada.ca",
"size_limit": null
}
],
"explicit": true
},
"adkim": {
"value": "s",
"explicit": true
},
"aspf": {
"value": "s",
"explicit": true
},
"fo": {
"value": [
"0"
],
"explicit": false
},
"pct": {
"value": 100,
"explicit": false
},
"rf": {
"value": [
"afrf"
],
"explicit": false
},
"ri": {
"value": 86400,
"explicit": false
},
"sp": {
"value": "none",
"explicit": false
}
}
}
}
What I'm specifically looking to do, is pull record, valid, location, tags-p, tags-sp, and tags-pct in a programmatic way, instead of doing a bunch of try/excepts. For example, to get valid, I do:
try:
res_dict['valid'] = jsonData['valid']
except KeyError:
res_dict['valid'] = None
Now, this is easy enough to loop/repeat for top level key/values, but how would I accomplish this for the nested key/values?
No, you don't need a try-except block for the same. You can check if the key exists using:
if jsonData.get("valid"):
res_dict["valid"] = jsonData.get("valid")
The .get("key") method returns the value for the given key, if present in the dictionary. If not, then it will return None (if get() is used with only one argument).
If you want it to return something else if it doesn't find the key then suppose:
jsonData.get("valid", "invalid_something_else")
One way of handling this is by taking advantage of the fact that the result of dict.keys can be treated as a set. See the following code.
my_keys = {'record', 'valid', 'location'} # you can add more here
new_dict = {}
available_keys = my_keys & jsonData.keys()
for key in available_keys:
new_dict[key] = jsonData[key]
Above, we define the keys we are interested in within the my_keys set. We then get the available keys by taking the intersection of the keys in the dictionary and the keys we are interested in. This, in effect, only gets the keys that we are interested in that are also defined in the dictionary. Finally, we just iterate through the available_keys and build the new dictionary.
However, this does not set keys to None if they do not exist in the input dictionary. For that, it may be best to use the get method as mentioned in other answers, like so:
my_keys = ['record', 'valid', 'location'] # you can add more here
new_dict = {}
for key in my_keys:
new_dict[key] = jsonData.get(key)
The get method allows us to attempt to get the value for a key in the dictionary. If that key is not defined, it returns None. You can also change the returned default by adding an extra argument to the get method like so new_dict[key] = jsonData.get(key, "some other default value")
Simple: instead of dict['key'] use
dict.get('key', {}) for all nodes that are not leaves, and
dict.get('key', DEFAULT) for leaves, where DEFAULT is whatever you need.
If you omit DEFAULT and 'key' is absent, you get None. See the docs.
E.g.:
jsonData.get('record', "") # empty string if no 'record' key
jsonData.get('valid', False) # False if no 'valid' key
jsonData.get('location') # None if no 'location'
jsonData.get('tags', {}).get('p') # None if no 'tags' and/or no 'p'
jsonData.get('tags', {}).get('p', {}) # {} if no 'tags' and/or no 'p'
jsonData.get('tags', {}).get('p', {}).get('explicit', False) # and so on
The above presumes that you don't traverse lists (JSON arrays). If you do, you can still use
dict.get('key', [])
but if you have to dive deeper from there, you will probably have to loop over list items.

python TypeError: string indices must be integers json

Can some one tell me what I am doing wrong ?I am Getting this error..
went through the earlier post of similar error. couldn't able to understand..
import json
import re
import requests
import subprocess
res = requests.get('https://api.tempura1.com/api/1.0/recipes', auth=('12345','123'), headers={'App-Key': 'some key'})
data = res.text
extracted_recipes = []
for recipe in data['recipes']:
extracted_recipes.append({
'name': recipe['name'],
'status': recipe['status']
})
print extracted_recipes
TypeError: string indices must be integers
data contains the below
{
"recipes": {
"47635": {
"name": "Desitnation Search",
"status": "SUCCESSFUL",
"kitchen": "eu",
"active": "YES",
"created_at": 1501672231,
"interval": 5,
"use_legacy_notifications": false
},
"65568": {
"name": "Validation",
"status": "SUCCESSFUL",
"kitchen": "us-west",
"active": "YES",
"created_at": 1522583593,
"interval": 5,
"use_legacy_notifications": false
},
"47437": {
"name": "Gateday",
"status": "SUCCESSFUL",
"kitchen": "us-west",
"active": "YES",
"created_at": 1501411588,
"interval": 10,
"use_legacy_notifications": false
}
},
"counts": {
"total": 3,
"limited": 3,
"filtered": 3
}
}
You are not converting the text to json. Try
data = json.loads(res.text)
or
data = res.json()
Apart from that, you probably need to change the for loop to loop over the values instead of the keys. Change it to something the following
for recipe in data['recipes'].values()
There are two problems with your code, which you could have found out by yourself by doing a very minimal amount of debugging.
The first problem is that you don't parse the response contents from json to a native Python object. Here:
data = res.text
data is a string (json formatted, but still a string). You need to parse it to turn it into it's python representation (in this case a dict). You can do it using the stdlib's json.loads() (general solution) or, since you're using python-requests, just by calling the Response.json() method:
data = res.json()
Then you have this:
for recipe in data['recipes']:
# ...
Now that we have turned data into a proper dict, we can access the data['recipes'] subdict, but iterating directly over a dict actually iterates over the keys, not the values, so in your above for loop recipe will be a string ( "47635", "65568" etc). If you want to iterate over the values, you have to ask for it explicitly:
for recipe in data['recipes'].values():
# now `recipe` is the dict you expected

What Am I Missing When Parsing this JSON Output

What am I missing when trying to parse this JSON output with Python? The JSON looks like this:
{
"start": 0,
"terms": [
"process_name:egagent.exe"
],
"highlights": [],
"total_results": 448,
"filtered": {},
"facets": {},
"results": [
{
"username": "SYSTEM",
"alert_type": "test"
},
{
"username": "SYSTEM2",
"alert_type": "test"
}
]
}
The Python I'm trying to use to access this is simple. I want to grab username, but everything I try throws an error. When it doesn't throw an error, I seem to get the letter of each one. So, if I do:
apirequest = requests.get(requesturl, headers=headers, verify=False)
readable = json.loads(apirequest.content)
#print readable
for i in readable:
print (i[0])
I get s, t, h, t, f, f, r, which are the first letters of each item. If I try i[1], I get the second letter of each item. When I try by name, say, i["start"], I get an error saying the string indices must be integers. I'm pretty confused and I am new to Python, but I haven't found anything on this yet. Please help! I just want to access the username fields, which is why I am trying to do the for loop. Thanks in advance!
Try this:
for i in readable["results"]:
print i["username"]
Load your json string:
import json
s = """
{
"start": 0,
"terms": [
"process_name:egagent.exe"
],
"highlights": [],
"total_results": 448,
"filtered": {},
"facets": {},
"results": [
{
"username": "SYSTEM",
"alert_type": "test"
},
{
"username": "SYSTEM2",
"alert_type": "test"
}
]
}
"""
And print username for every result:
print [res['username'] for res in json.loads(s)['results']]
Output:
[u'SYSTEM', u'SYSTEM2']
for i in readable will iterate i through each key in the readable dictionary. If you then print i[0], you are printing the first character of each key.
Given that you want the values associated with the "username" key in the entries of the list which is associated with the "results" key, you can get them like this:
for result in readable["results"]:
print (result["username"])
If readable is your JSON object (dict), you can access elements like you'd in every map, using their keys.
readable["results"][0]["username"]
Should give you "SYSTEM" string as result.
To print every username, do:
for result in readable["results"]:
print(result["username"])
If your JSON is a str object, you have to deserialize it with json.loads(readable) first.

Adding key to values in json using Python

This is the structure of my JSON:
"docs": [
{
"key": [
null,
null,
"some_name",
"12345567",
"test_name"
],
"value": {
"lat": "29.538208354844658",
"long": "71.98762580927113"
}
},
I want to add the keys to the key list. This is what I want the output to look like:
"docs": [
{
"key": [
"key1":null,
"key2":null,
"key3":"some_name",
"key4":"12345567",
"key5":"test_name"
],
"value": {
"lat": "29.538208354844658",
"long": "71.98762580927113"
}
},
What's a good way to do it. I tried this but doesn't work:
for item in data['docs']:
item['test'] = data['docs'][3]['key'][0]
UPDATE 1
Based on the answer below, I have tweaked the code to this:
for number, item in enumerate(data['docs']):
# pprint (item)
# print item['key'][4]
newdict["key1"] = item['key'][0]
newdict["yek1"] = item['key'][1]
newdict["key2"] = item['key'][2]
newdict["yek2"] = item['key'][3]
newdict["key3"] = item['key'][4]
newdict["latitude"] = item['value']['lat']
newdict["longitude"] = item['value']['long']
This creates the JSON I am looking for (and I can eliminate the list I had previously). How does one make this JSON persist outside the for loop? Outside the loop, only the last value from the dictionary is added otherwise.
In your first block, key is a list, but in your second block it's a dict. You need to completely replace the key item.
newdict = {}
for number,item in enumerate(data['docs']['key']):
newdict['key%d' % (number+1)] = item
data['docs']['key'] = newdict

Categories

Resources