How to parse empty JSON property/element in Python

I am attempting to parse some JSON that I am receiving from a RESTful API, but I am having trouble accessing the data in Python because it appears that there is an empty property name.
A sample of the JSON returned:
{
"extractorData" : {
"url" : "RetreivedDataURL",
"resourceId" : "e38e1a7dd8f23dffbc77baf2d14ee500",
"data" : [ {
"group" : [ {
"CaseNumber" : [ {
"text" : "PO-1994-1350",
"href" : "http://www.referenceURL.net"
} ],
"DateFiled" : [ {
"text" : "03/11/1994"
} ],
"CaseDescription" : [ {
"text" : "Mary v. JONES"
} ],
"FoundParty" : [ {
"text" : "Lastname, MARY BETH (Plaintiff)"
} ]
}, {
"CaseNumber" : [ {
"text" : "NP-1998-2194",
"href" : "http://www.referenceURL.net"
}, {
"text" : "FD-1998-2310",
"href" : "http://www.referenceURL.net"
} ],
"DateFiled" : [ {
"text" : "08/13/1993"
}, {
"text" : "06/02/1998"
} ],
"CaseDescription" : [ {
"text" : "IN RE: NOTARY PUBLIC VS REDACTED"
}, {
"text" : "REDACTED"
} ],
"FoundParty" : [ {
"text" : "Lastname, MARY H (Plaintiff)"
}, {
"text" : "Lastname, MARY BETH (Defendant)"
} ]
} ]
} ]
And the Python code I am attempting to use:
import requests
import json
FirstName = raw_input("Please Enter First name: ")
LastName = raw_input("Please Enter Last Name: ")
with requests.Session() as c:
    url = ('https://www.requestURL.net/?name={}&lastname={}').format(LastName, FirstName)
    page = c.get(url)
    data = page.content
    theJSON = json.loads(data)
def myprint(d):
    stack = d.items()
    while stack:
        k, v = stack.pop()
        if isinstance(v, dict):
            stack.extend(v.iteritems())
        else:
            print("%s: %s" % (k, v))
print myprint(theJSON["extractorData"]["data"]["group"])
I get the error:
TypeError: list indices must be integers, not str
I am new to parsing JSON in Python, and to anything beyond simple Python in general, so please excuse my ignorance. What leads me to believe there is an empty property is that when I view the JSON in an online visual tool, I see empty brackets where I would expect a property name (the screenshot is not preserved here).
Any help parsing this data into text would be of great help.
EDIT: I am now able to reference a specific node with this code:
for d in group:
    print group[0]['CaseNumber'][0]["text"]
But how can I iterate over all the dictionaries in the group list and print every node labeled "CaseNumber"? It should exist in every one of them, e.g.
print group[0]['CaseNumber'][0]["text"]
then
for d in group:
    print group[1]['CaseNumber'][0]["text"]
and so on. Perhaps by incrementing an integer until it reaches the end? I am not quite sure.

If you look at the JSON carefully, the value of the data key is actually a list, but theJSON["extractorData"]["data"]["group"] indexes it with a string as if it were a dictionary, which raises the TypeError.
Reduced to its structure, your JSON looks something like this:
{
    "extractorData": {
        "url": "string",
        "resourceId": "string",
        "data": [{
            "group": []
        }]
    }
}
So to access group, first retrieve data, which is a list (sample here is the parsed JSON, theJSON in your code):
data = sample['extractorData']['data']
Then iterate over data and get the group inside each element:
for d in data:
    group = d['group']
I hope this clarifies things a bit for you.
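To answer the follow-up in the EDIT (listing every CaseNumber in every group), the same idea can be taken one level deeper. This is only a minimal sketch: `raw` is a trimmed, hypothetical stand-in for the API response, and `case_numbers` is an illustrative helper name that does not appear in the original code.

```python
import json

# Trimmed stand-in for the API response in the question (hypothetical data).
raw = """
{"extractorData": {"data": [{"group": [
  {"CaseNumber": [{"text": "PO-1994-1350"}]},
  {"CaseNumber": [{"text": "NP-1998-2194"}, {"text": "FD-1998-2310"}]}
]}]}}
"""
sample = json.loads(raw)


def case_numbers(parsed):
    """Collect the "text" of every CaseNumber entry in every group."""
    found = []
    for item in parsed["extractorData"]["data"]:      # "data" is a list
        for group in item.get("group", []):           # "group" is a list of dicts
            for case in group.get("CaseNumber", []):  # "CaseNumber" is a list too
                found.append(case["text"])
    return found


print(case_numbers(sample))
```

Looping over the lists directly avoids keeping a manual counter, and it copes with groups that hold more than one case number.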

Related

how to read nested lists information from a Json file using Python

Here is part of my JSON file. I want to read "information" under "runs" -> "results" -> "properties".
I am trying the following:
with open(inputFile, "r") as readFile:
    data = json.load(readFile)
print(type(data))
print("Run data type is: ",type(data['runs']))
#print("properties data type is: ", type(data['runs']['properties']))
# error: print("results data type is: ", type(data['runs']['properties']))TypeError: list indices must be integers or slices, not str
for info in data['runs']:
    res = info.get('results',{})
    #res = info.get('results', {}).get('properties', None)
    #Error: AttributeError: 'list' object has no attribute 'get'
    #inf = info.get('properties')
    print(res)
None of the commented-out parts work; I have included the error message each one produces.
How can I read "information" in a loop?
{
"$schema" : "https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0-rtm.4.json",
"version" : "2.1.0",
"runs" : [ {
"tool" : { ...},
"artifacts" : [ ...],
"results" : [ {
"ruleId" : "DECL_MISMATCH",
"ruleIndex" : 0,
"message" : {
"text" : "XXXXX"
},
"level" : "error",
"baselineState" : "unchanged",
"rank" : 100,
"kind" : "fail",
"properties" : {
"tags" : [ "databaseId", "metaFamily", "family", "group", "information", "severity", "status", "comment", "justified", "assignedTo", "ticketKey", "color" ],
"databaseId" : 54496,
"metaFamily" : "Defect",
"family" : "Defect",
"group" : "Programming",
"information" : "Impact: High",
"severity" : "Unset",
"status" : "Unreviewed",
"comment" : "",
"justified" : false,
"color" : "RED"
},
"locations" : [ {
"physicalLocation" : {
"artifactLocation" : {
"index" : 0
}
},
"logicalLocations" : [ {
"fullyQualifiedName" : "File Scope",
"kind" : "function"
} ]
} ]
} ]
} ]
}

Because the properties key sits inside a list (results), you have to supply an index before you can access it. In the JSON you posted that index is 0, so the code should probably look like this:
with open(inputFile, "r") as readFile:
    data = json.load(readFile)
print(type(data))
print("Run data type is: ",type(data['runs']))
#print("properties data type is: ", type(data['runs']['properties']))
# error: print("results data type is: ", type(data['runs']['properties']))TypeError: list indices must be integers or slices, not str
for info in data['runs']:
    # res = info.get('results',{})
    res = info.get('results', {})[0].get('properties', None)
    #inf = info.get('properties')
    print(res)

for run in data['runs']:
    for result in run['results']:
        properties = result['properties']
        print("information = {}".format(properties['information']))
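If some results might lack a "properties" object, the nested loop can be made tolerant of missing keys. The sketch below assumes the SARIF-like shape from the question; the second result (ruleId "NO_PROPERTIES") is invented purely to show the missing-key case, and `collect_information` is a hypothetical helper name.

```python
import json

# Trimmed stand-in for the file in the question; the second result is
# invented to illustrate a result without "properties".
raw = """
{"runs": [{"results": [
  {"ruleId": "DECL_MISMATCH", "properties": {"information": "Impact: High"}},
  {"ruleId": "NO_PROPERTIES"}
]}]}
"""
data = json.loads(raw)


def collect_information(report):
    """Return the "information" value of every result that has one."""
    values = []
    for run in report.get("runs", []):            # "runs" is a list
        for result in run.get("results", []):     # "results" is a list
            info = result.get("properties", {}).get("information")
            if info is not None:
                values.append(info)
    return values


print(collect_information(data))
```

Using a list default for `.get()` on list-valued keys and a dict default on dict-valued keys keeps each step valid even when a key is absent.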

Can't Access Data in a List of Dictionaries

I'm trying to access some data within nested ordered dictionaries. The dictionary was created with the xmltodict module. Ideally I would build my own dictionaries, but this structure is out of my control.
I've tried to access the values in numerous ways.
Example:
Using a for loop:
I can access the first level with v["name"], which gives me Child_Policy and Parent_Policy.
When I do v["class"]["name"] I would expect to get "Test1", but that's not the case.
I've also tried variations such as v[("class", )], with no luck.
Any input would be much appreciated.
The data below is retrieved from a device via XML and converted to a dictionary with xmltodict.
[
{
"#xmlns": "http://cisco.com/ns/yang/Cisco-IOS-XE-policy",
"name": "Child_Policy",
"class": [
{
"name": "Test1",
"action-list": {
"action-type": "bandwidth",
"bandwidth": {
"percent": "30"
}
}
},
{
"name": "Test2",
"action-list": {
"action-type": "bandwidth",
"bandwidth": {
"percent": "30"
}
}
}
]
},
{
"#xmlns": "http://cisco.com/ns/yang/Cisco-IOS-XE-policy",
"name": "Parent_Policy",
"class": {
"name": "class-default",
"action-list": [
{
"action-type": "shape",
"shape": {
"average": {
"bit-rate": "10000000"
}
}
},
{
"action-type": "service-policy",
"service-policy": "Child_Policy"
}
]
}
}
]
My expected result is to retrieve the values from the nested dictionary and produce output similar to this:
Queue_1: Test1
Action_1: bandwidth
Allocation_1: 40
Queue_2: Test2
Action_2: bandwidth
Allocation_2: 10
I have no issue formatting the output; just getting the values is the issue.
I had some time tonight, so I changed the code to be dynamic:
int = 0
int_2 = 0
for v in policy_dict.values():
    print("\n")
    print("{:15} {:<35}".format("Policy: ", v[0]["name"]))
    print("_______")
    for i in v:
        int_2 = int_2 + 1
        try:
            print("\n")
            print("{:15} {:<35}".format("Queue_%s: " % int_2, v[0]["class"][int]["name"]))
            print("{:15} {:<35}".format("Action_%s: " % int_2, v[0]["class"][int]["action-list"]["action-type"]))
            print("{:15} {:<35}".format("Allocation_%s: " % int_2, v[0]["class"][int]["action-list"]["bandwidth"]["percent"]))
            int = int + 1
        except KeyError:
            break
            pass

According to the sample you posted you can try to retrieve values like:
v[0]["class"][0]["name"]
This outputs:
Test1
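One reason v["class"]["name"] fails is visible in the sample itself: "class" is a list in Child_Policy but a single dictionary in Parent_Policy, and "action-list" varies the same way. A possible way to cope is to normalize both to lists before looping. The sketch below assumes the policies are already available as a plain list (`policies` is a trimmed copy of the sample data); `as_list` and `queue_rows` are hypothetical helper names.

```python
def as_list(value):
    """Wrap a single item in a list; pass lists through; treat None as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Trimmed copy of the sample data from the question.
policies = [
    {"name": "Child_Policy",
     "class": [
         {"name": "Test1",
          "action-list": {"action-type": "bandwidth", "bandwidth": {"percent": "30"}}},
         {"name": "Test2",
          "action-list": {"action-type": "bandwidth", "bandwidth": {"percent": "30"}}},
     ]},
    {"name": "Parent_Policy",
     "class": {"name": "class-default",
               "action-list": [
                   {"action-type": "shape"},
                   {"action-type": "service-policy", "service-policy": "Child_Policy"},
               ]}},
]


def queue_rows(policy_list):
    """Return (policy, class, action type, bandwidth percent or None) tuples."""
    rows = []
    for policy in policy_list:
        for cls in as_list(policy.get("class")):
            for action in as_list(cls.get("action-list")):
                percent = action.get("bandwidth", {}).get("percent")
                rows.append((policy["name"], cls["name"], action["action-type"], percent))
    return rows


for row in queue_rows(policies):
    print(row)
```

Normalizing first means the loop body never has to care whether the XML contained one element or several.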

JSON parsing in python using JSONPath

In the JSON below, I want to access the email-id and 'gamesplayed' field for each user.
"UserTable" : {
"abcd#gmailcom" : {
"gameHistory" : {
"G1" : [ {
"category" : "1",
"questiontext" : "What is the cube of 2 ?"
}, {
"category" : "2",
"questiontext" : "What is the cube of 4 ?"
} ]
},
"gamesplayed" : 2
},
"xyz#gmailcom" : {
"gameHistory" : {
"G1" : [ {
"category" : "1",
"questiontext" : "What is the cube of 2 ?"
}, {
"category" : "2",
"questiontext" : "What is the cube of 4 ?"
} ]
},
"gamesplayed" : 2
}
}
Following is the code that I am using to try to access the users' email IDs:
for user in jp.match("$.UserTable[*].[0]", game_data):
    print("User ID's {}".format(user_id))
This is the error I'm getting:
File "C:\ProgramData\Anaconda3\lib\site-packages\jsonpath_rw\jsonpath.py", line 444, in find
return [DatumInContext(datum.value[self.index], path=self, context=datum)]
KeyError: 0
And when I run the following line to access the 'gamesplayed' field for each user, the IDE crashes.
print (parser.ExtentedJsonPathParser().parse("$.*.gamesplayed").find(gd_info))

If you would like to use JSONPath, please try this.
Python code:
import json
from jsonpath_rw import parse

# json_path is the path to the JSON file
with open(json_path) as json_file:
    raw_data = json.load(json_file)
jsonpath_expr = parse('$.UserTable')
players = [match.value for match in jsonpath_expr.find(raw_data)][0]
emails = players.keys()
result = [{'email': email, 'gamesplayed': players[email]['gamesplayed']} for email in emails]
print(result)
Output:
[{'email': 'abcd#gmailcom', 'gamesplayed': 2}, {'email': 'xyz#gmailcom', 'gamesplayed': 2}]

Python represents valid JSON objects as dictionaries, so first parse the JSON string into a Python dictionary.
import json
dic = json.loads(json_str)
You can now access a value by using its key as an index: value = dic[key].
users = dic["UserTable"]
for user in users:
    email = user
    gamesplayed = users[user]["gamesplayed"]
    print("{} played {} game(s).".format(email, gamesplayed))
Output:
abcd#gmailcom played 2 game(s).
xyz#gmailcom played 2 game(s).

Correctly referencing a JSON in Python. Strings versus Integers and Nested items

Sample JSON file below
{
"destination_addresses" : [ "New York, NY, USA" ],
"origin_addresses" : [ "Washington, DC, USA" ],
"rows" : [
{
"elements" : [
{
"distance" : {
"text" : "225 mi",
"value" : 361715
},
"duration" : {
"text" : "3 hours 49 mins",
"value" : 13725
},
"status" : "OK"
}
]
}
],
"status" : "OK"
}
I'm looking to reference the text value for distance and duration. I've done some research, but I'm still not sure what I'm doing wrong.
I have a workaround that uses several lines of code, but I'm looking for a clean one-line solution.
Thanks for your help!

If you're using the regular JSON module:
import json
And you're opening your JSON like this:
json_data = open("my_json.json").read()
data = json.loads(json_data)
# Equivalent to:
data = json.load(open("my_json.json"))
# Note: json.load reads from a file object, json.loads parses a string
Then this should do what you want:
distance_text, duration_text = [data['rows'][0]['elements'][0][key]['text'] for key in ['distance', 'duration']]
Hope this is what you wanted!
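For comparison, here is a self-contained sketch that parses the sample from the question out of a string and reads the two text values step by step. The intermediate name `element` is an illustrative choice; the key point is that "rows" and "elements" are lists and need an integer index, while "distance" and "duration" are objects and need a string key.

```python
import json

# The sample response from the question, embedded as a string.
raw = """
{
  "destination_addresses": ["New York, NY, USA"],
  "origin_addresses": ["Washington, DC, USA"],
  "rows": [{"elements": [{
      "distance": {"text": "225 mi", "value": 361715},
      "duration": {"text": "3 hours 49 mins", "value": 13725},
      "status": "OK"
  }]}],
  "status": "OK"
}
"""
data = json.loads(raw)

# Lists take integer indices; objects take string keys.
element = data["rows"][0]["elements"][0]
distance_text = element["distance"]["text"]
duration_text = element["duration"]["text"]

print(distance_text, duration_text)
```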

Why are my index fields still shown as "analyzed" even after I index them as "not_analyzed"?

I have a lot of data (in JSON format) in Amazon SQS. I have a simple Python script that pulls data from the SQS queue and then indexes it in ES. My problem is that even though my script specifies "not_analyzed", the index fields still show as "analyzed" in the index settings of the Kibana 4 dashboard.
Here is my Python code:
doc = {
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "type_name": {
            "dynamic_templates": [
                {
                    "strings": {
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            ]
        }
    }
}
es = Elasticsearch()
h = { "Content-type":"application/json" }
res = requests.request("POST","http://localhost:9200/"+index_name+"/",headers=h,data=json.dumps(doc))
post = es.index(index=index_name , doc_type='server' , id =1 , body=json.dumps(new_list))
print "------------------------------"
print "Data Pushed Successfully to ES"
I am not sure what's wrong here?

The doc_type you're using when indexing (= server) doesn't match the one you have in your index mappings (= type_name).
So if you index your documents like this instead (note the changed doc_type), it will work:
post = es.index(index=index_name, doc_type='type_name', id=1, body=json.dumps(new_list))
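A mismatch like this can be caught before indexing with a small pure-Python check on the index body. This is only a sketch for the typed mappings used in the question; `mapped_types` and `check_doc_type` are hypothetical helpers, not part of the Elasticsearch client.

```python
def mapped_types(index_body):
    """Return the type names declared under "mappings" in an index body."""
    return sorted(index_body.get("mappings", {}))


def check_doc_type(doc_type, index_body):
    """Return True when doc_type has an explicit mapping in index_body."""
    return doc_type in mapped_types(index_body)


# Same mapping structure as the doc dictionary in the question.
index_body = {
    "mappings": {
        "type_name": {
            "dynamic_templates": [
                {"strings": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"},
                }}
            ]
        }
    }
}

for candidate in ("server", "type_name"):
    print(candidate, check_doc_type(candidate, index_body))
```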
