I have an unordered dict/json object of data. In fact I have many of those, line by line in a file. There are three keys/objects in each one. I never know which of the three has the data I need to add right back onto the other two. I cannot control how the data is initially written, whether I like it or not.
Currently I iterate over each of the three keys/objects until I find the one that has the fields I need, and I save those fields to variables. Now, how do I go back over the other two keys/objects, which I might have already iterated over, and add the fields and values to them? As I said, there are many of these in a file, so the loop will just move on to the next one if I don't... reiterate?
Code:
import json
import re

with open(inputfile) as f:
    for line in f:
        try:
            # The file holds one big JSON object per line. Load the current line as JSON.
            line = json.loads(line)
            for result in line['scan_result']:
                # Check whether this object's filename field has the extra data I need to parse out and place in the others.
                if "meta_data" in result['filename']:
                    print("FOUND METADATA")
                    #print result['filename']
                    regmatch = re.match(r".*meta_data_(.+?)_(.+?):(.+?)$", result['filename'])
                    if regmatch:
                        print("REG MATCH -------------")
                        #print regmatch.groups()
                        timecreated = regmatch.group(1)
                        author = regmatch.group(2)
                        mime_type = regmatch.group(3)
        except ValueError:
            # The original snippet is cut off before its except clause; skipping bad lines is an assumption.
            continue
So as you can see, I have the data pulled out. I just need to figure out how to put it back into the JSON objects I just iterated over. I'm open to doing this other ways too. Maybe sorting the object first and then running through it?
If it helps, the data structure looks like this. The order of the parent is never known though. This is one "line" (json object) in the file:
{
    "filename": abc.gif
    id : 13241
    parent : 999
    interesting_file_stuff : {
        stuff : 123
        stuff2 : 456
    }
}
{
    "filename": hello.zip+meta_data_stuff_here
    id : 999
    parent : NA
    interesting_file_stuff : {
        stuff : 5435
        stuff2 : 24223
    }
}
{
    "filename": xyz.exe
    id : 8342
    parent : 999
    interesting_file_stuff : {
        stuff : 2
        stuff2 : 3232
    }
}
Add an extra boolean while loop.
You could add an inner while True loop that repeats until you hit a break statement; the outer loop then advances to the next value.
for line in f:
    while True:
        # do stuff
        if condition:
            break
        # do more stuff
Looks like for loops can't go backwards, so I'll have to manually loop through with a while loop, giving complete control of the iterations and which way I go.
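Another option, sketched here under assumptions, is to walk the parsed list twice: once to find the entry that carries the metadata, and once more to copy the fields onto every entry. This assumes that line['scan_result'] is a list of dicts once parsed (the question's loop suggests so); the function name propagate_metadata, the sample filename, and the sample values are hypothetical, while the regex and field names come from the question.

```python
import json
import re

# Pattern taken from the question; group meanings follow the question's variable names.
META_RE = re.compile(r".*meta_data_(.+?)_(.+?):(.+?)$")


def propagate_metadata(record):
    """Copy metadata parsed from one entry's filename onto every entry in the record."""
    results = record["scan_result"]  # assumed to be a list of dicts

    # Pass 1: find the entry whose filename carries the metadata.
    extra = None
    for result in results:
        match = META_RE.match(result["filename"])
        if match:
            extra = {
                "timecreated": match.group(1),
                "author": match.group(2),
                "mime_type": match.group(3),
            }
            break

    # Pass 2: walk the same list again and add the fields to every entry.
    if extra is not None:
        for result in results:
            result.update(extra)
    return record


# Hypothetical sample line, shaped like the structure described in the question.
sample_line = json.dumps({"scan_result": [
    {"filename": "abc.gif", "id": 13241, "parent": 999},
    {"filename": "hello.zip+meta_data_2015_bob:application/zip", "id": 999, "parent": "NA"},
    {"filename": "xyz.exe", "id": 8342, "parent": 999},
]})
record = propagate_metadata(json.loads(sample_line))
```

Because the second pass runs only after the metadata is known, it does not matter where in the list the metadata entry sits. The entry that holds the metadata also receives the fields here; skip it in the second loop if that is not wanted.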
Related
I'm trying to clean AWS Cloudwatch's log data, which is delivered in JSON format when queried via boto3. Each log line is stored as an array of dictionaries. For example, one log line takes the following form:
[
{
"field": "field1",
"value": "abc"
},
{
"field": "field2",
"value": "def"
},
{
"field": "field3",
"value": "ghi"
}
]
If this were in a standard key-value format (e.g., {'field1':'abc'}), I would know exactly what to do with it. I'm just getting stuck on untangling the extra layer of hierarchy introduced by the field/value keys. The ultimate goal is to convert the entire response object into a data frame like the following:
| field1 | field2 | field3 |
|--------|--------|--------|
| abc    | def    | ghi    |
(and so on for the rest of the response object, one row per log line.)
Last bit of info: each array has the same set of fields, and there is no nesting deeper than the example I've provided here. Thank you in advance :)
I was able to do this using nested loops. Not my favorite - I always feel like there has to be a more elegant solution than crawling through every single inch of the object, but this data is simple enough that it's still very fast.
import pandas as pd

logList = []  # Empty list to store the dictionaries (i.e., log lines)
for line in logs:  # logs = response object
    line_dict = {}
    # Flatten each field/value dict into a single key-value pair
    for entry in line:
        line_dict[entry['field']] = entry['value']
    logList.append(line_dict)
df = pd.json_normalize(logList)
For anyone else working with CloudWatch logs, the actual log lines (like those I displayed above) are nested in an array called 'results' in the boto3 response object. So you'd need to extract that array first, or point the outer loop to it (i.e., for line in response['results']).
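The same flattening can probably be written more compactly with a dict comprehension per log line. The sketch below uses a hypothetical response dict shaped like the example above (a 'results' list whose items are lists of field/value dicts); the second row's values are made up for illustration.

```python
# Hypothetical stand-in for the boto3 response; only the 'results' key matters here.
response = {
    "results": [
        [
            {"field": "field1", "value": "abc"},
            {"field": "field2", "value": "def"},
            {"field": "field3", "value": "ghi"},
        ],
        [
            {"field": "field1", "value": "jkl"},
            {"field": "field2", "value": "mno"},
            {"field": "field3", "value": "pqr"},
        ],
    ]
}

# One flat dict per log line: {'field1': ..., 'field2': ..., 'field3': ...}
rows = [
    {entry["field"]: entry["value"] for entry in line}
    for line in response["results"]
]
```

A list of flat dicts like rows can be passed straight to pd.DataFrame(rows) to get one row per log line and one column per field.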
I am trying to loop through a JSON file to get specific values, however, when doing so the loop is printing three times. I only want the value to print once and have tried breaking the loop but it still has not worked.
Python Code:
with open(filename) as json_filez:
    dataz = json.load(json_filez)
    for i in dataz:
        for i in dataz['killRelated']:
            print(i["SteamID"])
            break
and a snippet of my json file is
{
"killRelated": [
{
"SteamID": "76561198283763531",
"kill": "15,302",
"shotacc": "16.1%"
}
],
"metaData": [
{
"test": "lol"
}
],
"miscData": [
{
"damageGiven": "2,262,638",
"gamePlayed": "1,292",
"moneyEarned": "50,787,000",
"score": "31,122",
"timePlayed": "22d 11h 56m"
}
]
}
and this is my output:
76561198283763531
76561198283763531
76561198283763531
Expected output:
76561198283763531
The return from json.load is a dictionary, and you are only interested in one entry in that, keyed by 'killRelated'. Now the "values" against each dictionary entry are lists, so that is what you need to be iterating through. And each element of such a list is a dictionary that you can again access via a key.
So your code could be:
with open(filename) as json_filez:
    dataz = json.load(json_filez)
    for kr in dataz['killRelated']:  # iterate through the list under the top-level keyword
        print(kr["SteamID"])
Now in your sample data, there's only one entry in the dataz['killRelated'] list, so you'll only get that one printed. But in general, you should expect multiple entries, and cater for the possibility of none. You can handle that with try/except or by checking key existence; here's the latter:
with open(filename) as json_filez:
    dataz = json.load(json_filez)
    if 'killRelated' in dataz:  # check for the top keyword
        for kr in dataz['killRelated']:  # iterate through the list under this keyword
            if 'SteamID' in kr:  # check for the next level keyword (keys are case-sensitive)
                print(kr["SteamID"])  # report it
You were getting three output lines because your outer loop iterated across all keyword entries in dataz (although without examining them), and then each time within that also iterated across the dataz['killRelated'] list. Your addition of break only stopped that inner loop, which for the particular data you had was redundant anyway because it was only going to print one entry.
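For comparison, the try/except alternative mentioned above might look like the following sketch. The helper name steam_ids is hypothetical, and the sample string is a trimmed copy of the JSON snippet from the question.

```python
import json


def steam_ids(dataz):
    """Collect every SteamID under 'killRelated', tolerating missing keys."""
    try:
        entries = dataz["killRelated"]
    except KeyError:
        return []  # no 'killRelated' section at all

    ids = []
    for kr in entries:
        try:
            ids.append(kr["SteamID"])
        except KeyError:
            continue  # this entry has no SteamID; skip it
    return ids


# Trimmed copy of the question's JSON, parsed from a string instead of a file.
sample = json.loads(
    '{"killRelated": [{"SteamID": "76561198283763531", "kill": "15,302"}],'
    ' "metaData": [{"test": "lol"}]}'
)
found = steam_ids(sample)
```

Each lookup gets its own try block so that one entry without a SteamID does not stop the rest of the list from being processed.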
Your code is correct. You should check your JSON file, or share your full JSON text, since that may be where the problem is. I ran your code with the JSON snippet you provided and it works as expected.
import json
with open("test.json") as json_filez:
    dataz = json.load(json_filez)
    for i in dataz:
        for i in dataz['killRelated']:
            print(i["SteamID"])
and the result is as below:
76561198283763531
I have a JSON file in which one part changes dynamically (not the entire file, just one key, which I point out below), and I need to iterate with a for loop at that point so that I can grab the required elements inside that bracket. Below is a snippet of what the JSON looks like.
"A" : {
"uni/aa/bb" (----> This changes randomly): [ {
"Name" : "cc",
"Id" : "1",
}, {
"Name" : "cc",
"Id" : "1",
} ]
}
I used re.search to match the pattern I get at that point. But no luck. I need to store Name and Id values ultimately. Any suggestions?
resp = json.loads(resp) ---> This gives me above mentioned json output
Here are samples of the code I am trying.
for k in resp['A']:
    for v in k['uni/aa/bb']: #---> TypeError: string indices must be integers
for k in resp['A']:
    m = re.search('uni/(.*)') #--> Because this is changing dynamically
    if m:
        name = str(m['Name']) #---> TypeError: '_sre.SRE_Match' object has no attribute '__getitem__'
If you always want the string key of the child of the "A" object and don't mind removing the item from the parsed JSON, you can try popping it, like this: key, value = resp["A"].popitem(). The key is the uni/aa/bb string, whatever it may be, and value is the list stored under it. Because popitem removes the entry from resp["A"], traverse further down through value, e.g. value[0]["Name"].
Another approach: instead of for v in k['uni/aa/bb']:, iterate over the key/value pairs of the inner dictionary with for key, value in resp['A'].items(): on Python 3.x, or for key, value in resp['A'].iteritems(): on Python 2.x.
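A minimal sketch of the items() approach follows. It assumes Python 3 and that the response parses to a dict like the snippet in the question; the sample string drops the trailing commas from that snippet, since they are not valid JSON.

```python
import json

# Sample shaped like the question's snippet; the key under "A" is the part that changes.
resp = json.loads(
    '{"A": {"uni/aa/bb": [{"Name": "cc", "Id": "1"}, {"Name": "cc", "Id": "1"}]}}'
)

found = []
for key, items in resp["A"].items():  # key is whatever the dynamic string happens to be
    for item in items:                # each item is a dict with "Name" and "Id"
        found.append((key, item["Name"], item["Id"]))
```

The loop never needs to know the dynamic key in advance, so no regular expression is required.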
I'm working on a project and this is driving me nuts. I have searched online and found a few answers that worked for my other JSON-related queries, but this one is a bit of a nightmare: I keep getting a traceback error.
This is my JSON:
ServerReturnJson = {
"personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
"persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
"name":"ramsey,",
"userData":null
}
data = responseIdentify.read()
print("The following data return : " + data)
#Parse json data to print just
load = json.loads(data)
print(load[0]['name'])
That's where my problem is: I am unable to get the value from name. I then tried a for statement and got this error:
Traceback (most recent call last):
File "C:\Python-Windows\random_test\cogT2.py", line 110, in <module>
for b in load[0]['name']:
KeyError: 0
using this for loop
for b in load[0]['name']:
    print b[load]
Any support would be most welcome. I'm sure it's something simple; I just cannot figure it out.
Understanding how to reference nested dicts and lists in JSON is the hardest part. Here are a few things to consider.
Using your original data
ServerReturnJson = {
    "personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
    "persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
    "name":"ramsey,",
    "userData":'null'
}
# No index here, just the dictionary key
print(ServerReturnJson['name'])
Added second person by making a list of dicts
ServerReturnJson = [
    {
        "personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
        "persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
        "name":"ramsey",
        "userData": 'null'
    },
    {
        "personId": "234123412341234234",
        "persistedFaceIds": ["1241234123423"],
        "name": "miller",
        "userData": 'null'
    }
]
# You can use the index here since you have a list of dictionaries
print(ServerReturnJson[1]['name'])
# You can iterate like this
for item in ServerReturnJson:
    print(item['name'])
Thanks for your support. Basically, the Microsoft Face API returns JSON with no index, as Chris said in his first example.
The above example works only if you add the following:
data = responseIdentify.read() # read the incoming response from the server
ServerReturnJson = json.loads(data)
so the complete answer is as follows :
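import json

# Stand-in for the raw JSON text returned by responseIdentify.read()
dataJson = '''{
    "personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
    "persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
    "name":"ramsey,",
    "userData":null
}'''
# add json load here; json.loads expects a string (not a dict) and returns a dictionary
ServerReturnJson = json.loads(dataJson)
# No index here, just the dictionary key
print(ServerReturnJson['name'])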
Credit to Chris, thanks. One last thing: Chris mentioned that "Understanding how to reference nested dicts and lists in JSON is the hardest part". 100% agreed.
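The KeyError: 0 in the question comes from indexing a dictionary with 0 as if it were a list. If a response might be either a single object or a list of objects, one way to handle both is sketched below; the helper name names_from is hypothetical, and the second payload reuses the made-up "miller" entry from the answer above.

```python
import json


def names_from(payload):
    """Return the 'name' of every person in a JSON payload (single object or list)."""
    parsed = json.loads(payload)
    if isinstance(parsed, dict):
        parsed = [parsed]  # wrap a single object so both shapes are handled alike
    return [person["name"] for person in parsed]


single = '{"personId": "59c16cab-9f28-454e-8c7c-213ac6711dfc", "name": "ramsey,", "userData": null}'
several = '[{"name": "ramsey", "userData": null}, {"name": "miller", "userData": null}]'

single_names = names_from(single)
several_names = names_from(several)
```

Note that json.loads turns the JSON literal null into Python's None, so the raw response text does not need to be edited before parsing.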
Really can't get out of this...
Here's my python code:
for i in range(len(realjson)):
    store["Store"]={
        "id"          :realjson[i]['id'].strip(),
        "retailer_id" :RETAILER_ID,
        "name"        :find(realjson[i]["title"],">","<").strip(),
        "address"     :realjson[i]["address"].strip(),
        "city"        :realjson[i]["address"].split(",")[-4].strip(),
        "province"    :realjson[i]["address"].split(",")[-3].strip(),
        "group"       :realjson[i]["address"].split(",")[-1].strip(),
        "zip"         :realjson[i]["address"].split(",")[-2].strip(),
        "url"         :"http://blabla.com?id="+realjson[i]["id"].strip(),
        "lat"         :realjson[i]["lat"].strip(),
        "lng"         :realjson[i]["lon"].strip(),
        "phone"       :realjson[i]["telephone_number"].replace("<br />Phone Number: ","").strip()
    }
    stores.append(store)
    print stores[i]
When I print the list inside the for loop, it works correctly.
But when I print the list outside the loop, like this:
print stores
it contains only the last element that I appended, repeated for the entire length of the list.
Do you have any advice to help me?
Thank you.
You reuse a mutable object in your loop:
store['Store']
Create a new copy in the loop instead:
newstore = store.copy()
newstore['Store'] = { ... }
store["Store"]={ ... }
If you expect this line to create a new dictionary with just one key, then what you actually want is
store = {"Store": { ... }}
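The difference can be seen in a small sketch. The records list below is a hypothetical stand-in for realjson with a single field; the first loop reproduces the problem and the second applies the fix of building a new dictionary on each iteration.

```python
# Hypothetical stand-in for realjson, reduced to one field.
records = [{"id": " 1 "}, {"id": " 2 "}]

# Problem: the same dict object is appended every time, so each list slot
# refers to one shared dict that ends up holding the last record.
store = {}
aliased = []
for rec in records:
    store["Store"] = {"id": rec["id"].strip()}
    aliased.append(store)

# Fix: create a fresh dict per iteration, so every list slot is independent.
stores = []
for rec in records:
    store = {"Store": {"id": rec["id"].strip()}}
    stores.append(store)
```

In the first loop, list.append stores a reference to the dict rather than a copy, which is why printing inside the loop looks right while the finished list shows the last element repeated.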