Python - Parse JSON file using ijson

I have a JSON file of around 600 MB. Its structure is:
[
{
"metadata":{
"batchSize":100,
"totalRecords":"1000",
"batchIndex":1
},
"notificationData":[
{
"brandId":"A",
"sourceUniqueId":"12345",
"contentType":"PDF",
"transactionId":"cABCD",
"batchId":"ABC_1"
}
]
},
{
"metadata":{
"batchSize":100,
"totalRecords":"1000",
"batchIndex":2
},
"notificationData":[
{
"brandId":"B",
"sourceUniqueId":"789",
"contentType":"PDF1",
"transactionId":"XYZ",
"batchId":"XYZ_1"
}
]
}
]
As output, I want the notificationData array for a given batchIndex. I am able to find a specific batchIndex, but how do I get the notificationData of that batchIndex? batchIndex is unique. Once the notificationData is found, execution should stop; there is no need to parse the complete file.
import ijson
f = open('batchIndex.json')
data = ijson.items(f, 'item.metadata')
jsons = (o for o in data if o['batchIndex'] == input_batchIndex)
for j in jsons:
    print(j)
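With the prefix 'item.metadata', ijson yields only the metadata objects. By the time a match is found, the sibling notificationData is no longer reachable. One way around this is to iterate over whole batches with the 'item' prefix, using the same ijson.items call as in the question, and return as soon as batchIndex matches. That stops parsing, so the rest of the file is never processed. The following is a minimal sketch, assuming each top-level batch is small enough to hold in memory. The function names are hypothetical.

```python
from typing import Iterable, Optional


def find_notification_data(batches: Iterable[dict], batch_index: int) -> Optional[list]:
    """Return notificationData of the first batch whose metadata.batchIndex matches.

    Returns as soon as a match is found, so a lazy iterator (such as the one
    produced by ijson.items) is not consumed any further.
    """
    for batch in batches:
        if batch["metadata"]["batchIndex"] == batch_index:
            return batch["notificationData"]
    return None  # no batch with that index


def load_notification_data(path: str, batch_index: int) -> Optional[list]:
    # ijson is third-party (pip install ijson); imported here so that
    # find_notification_data can be reused without it installed.
    import ijson

    with open(path, "rb") as f:
        # The "item" prefix yields each element of the top-level array, one at a time.
        return find_notification_data(ijson.items(f, "item"), batch_index)
```

For example: load_notification_data('batchIndex.json', int(input_batchIndex)). ijson yields JSON integers as Python int. If input_batchIndex comes from input(), it is a string and needs the int() conversion to compare equal.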

How do I return an upper field in a JSON with python?

I need some help returning an ID once I've found a certain string. My JSON looks something like this:
{
"id": "id1"
"field1": {
"subfield1": {
"subrield2": {
"subfield3": {
"subfield4": [
"string1",
"string2",
"string3"
]
}
}
}
}
"id": "id2"
"field1": {
"subfield1": {
"subrield2": {
"subfield3": {
"subfield4": [
"string4",
"string5",
"string6"
]
}
}
}
}
}
Now, I need to get the ID from a certain string, for example:
For "string5" I need to return "id2"
For "string2" I need to return "id1"
In order to find these strings, I have used the objectpath Python module like this: json_Tree.execute('$..subfield4')
After analyzing a huge number of strings, I need to return the ones that meet my criteria. I have the strings that I need (for example "string3"), but now I have to return their IDs.
Thank you!!
Note: I don't have a lot of experience with coding; I started working on a Python project a few months ago and have been stuck on this for a while.
Making some assumptions about the actual structure of the data, namely that it looks like this:
[
{
"id": "id1",
"subfield1": {
"subfield2": {
"subfield3": {
"subfield4": [
"string1",
"string2",
"string3"
]
}
}
}
}
// And so on
]
And assuming that each string (string1, string2, etc.) appears under only one id, you can construct the mapping like so:
import json
from typing import List

with open("data.json") as fh:  # hypothetical file name
    data: List[dict] = json.load(fh)  # the JSON parsed as a list of dicts
string_to_id_mapping = {}
for record in data:
    for string in record["subfield1"]["subfield2"]["subfield3"]["subfield4"]:
        string_to_id_mapping[string] = record["id"]
assert string_to_id_mapping["string3"] == "id1"
If each string can appear in multiple ids then the following will catch all of them:
import json
from collections import defaultdict
from typing import List

with open("data.json") as fh:  # hypothetical file name
    data: List[dict] = json.load(fh)  # the JSON parsed as a list of dicts
string_to_id_mapping = defaultdict(set)
for record in data:
    for string in record["subfield1"]["subfield2"]["subfield3"]["subfield4"]:
        string_to_id_mapping[string].add(record["id"])
assert string_to_id_mapping["string3"] == {"id1"}
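The exact nesting path may be uncertain. The question's sample has a "field1" level and a "subrield2" key that the answer assumes away. In that case, a recursive search avoids hard-coding the path. The following is a minimal sketch, assuming the data is a list of records that each have an "id" key. The helper names are hypothetical.

```python
from typing import Any, Iterable, List


def contains_string(node: Any, target: str) -> bool:
    """Recursively check whether target appears anywhere inside a parsed JSON value."""
    if isinstance(node, str):
        return node == target
    if isinstance(node, dict):
        return any(contains_string(value, target) for value in node.values())
    if isinstance(node, list):
        return any(contains_string(item, target) for item in node)
    return False  # numbers, booleans and None can't match a string


def ids_containing(records: Iterable[dict], target: str) -> List[str]:
    """Return the "id" of every record whose other fields contain target at any depth."""
    return [
        record["id"]
        for record in records
        # Skip the "id" key itself so searching for "id1" doesn't match the id field.
        if any(contains_string(v, target) for k, v in record.items() if k != "id")
    ]
```

Each call walks every record. If you need to look up many strings, building the mapping once, as in the answer above, is faster.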

How to get this json object in python?

I want to get the first key from this JSON using Python 3.7 without knowing its name.
Here is the JSON:
{
"intent":[
{
"confidence":0.99313362101529,
"value":"sendmessage"
}
],
"wikipedia_search_query":[
{
"suggested":true,
"confidence":0.93804001808167,
"value":"message",
"type":"value"
}
],
"messenger_recipient":[
{
"confidence":0.93138399364195,
"value":"me",
"type":"value"
}
]
}
EDIT:
I want to compare the name of the first key like so:
if jsonobj[0] == "wikipedia_search_query":
    dosomething()
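Dictionaries preserve insertion order, and json.loads inserts keys in the order they appear in the text. The language has guaranteed this since Python 3.7; in CPython 3.6 it was an implementation detail. However, the JSON specification treats objects as unordered, so there's no guarantee that your incoming JSON will list its keys in the order you expect. That said, if you can guarantee the order, here's a working example: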
import json
js = """{
"intent":[
{
"confidence":0.99313362101529,
"value":"sendmessage"
}
],
"wikipedia_search_query":[
{
"suggested":true,
"confidence":0.93804001808167,
"value":"message",
"type":"value"
}
],
"messenger_recipient":[
{
"confidence":0.93138399364195,
"value":"me",
"type":"value"
}
]
}"""
json_data = json.loads(js)
first_key = next(iter(json_data))
first_value = json_data[first_key]
print(first_key)
print(first_value)
Output
intent
[{'confidence': 0.99313362101529, 'value': 'sendmessage'}]
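The comparison in the question's EDIT won't work as written. A dict is indexed by key, not by position, so jsonobj[0] raises KeyError: 0. The following is a minimal sketch of the comparison, reusing the next(iter(...)) idiom from the answer. first_key_of is a hypothetical helper name, and the sample JSON is trimmed down to just its keys.

```python
import json


def first_key_of(obj: dict):
    """Return the first key in insertion order, or None for an empty dict."""
    # next() with a default avoids StopIteration when the object has no keys.
    return next(iter(obj), None)


js = '{"intent": [], "wikipedia_search_query": [], "messenger_recipient": []}'
json_data = json.loads(js)

if first_key_of(json_data) == "wikipedia_search_query":
    print("first key is wikipedia_search_query")
else:
    print("first key is", first_key_of(json_data))  # prints: first key is intent
```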

PDAL filter doesn't work: Unable to parse pipeline

This is my first time using PDAL. I'm using Python 3.6 and PDAL 1.9.
import pdal
json_s = """{
"test.las",
{
"type":"filters.outlier",
"method":"statistical",
"mean_k":12,
"multiplier":0.5
},
{
"type":"filters.range",
"limits":"Classification![7:7]"
},
"testOut.las"
}"""
pipeline = pdal.Pipeline(json_s)
count = pipeline.execute()
It raises this error:
RuntimeError: JSON pipeline: Unable to parse pipeline.
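I checked the sample code on the website and it looks the same. I just don't know why it doesn't work.
PDAL expects a JSON object with a "pipeline" key holding an array of stages: the input file, any filters, then the output file. JSON has no comments, so replace the placeholder line below with real stage options:
json_s = """{
"pipeline":[
"input.las",
{
#anything you need
},
"output.las"
]
}"""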
In your case, try:
json_s = """{
"pipeline":[
"test.las",
{
"type":"filters.outlier",
"method":"statistical",
"mean_k":12,
"multiplier":0.5
},
{
"type":"filters.range",
"limits":"Classification![7:7]"
},
"testOut.las"
]
}"""
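Hand-writing the pipeline inside a triple-quoted string makes it easy to slip in invalid JSON, as happened in the question's snippet. The following is a minimal sketch, assuming the python-pdal bindings (pdal.Pipeline and execute(), as used in the question). It builds the pipeline from Python dicts and lists and lets json.dumps serialize it. The function names are hypothetical.

```python
import json


def build_pipeline_json(infile: str, outfile: str) -> str:
    """Build the pipeline from Python objects so json.dumps always emits valid JSON."""
    pipeline = {
        "pipeline": [
            infile,
            {
                "type": "filters.outlier",
                "method": "statistical",
                "mean_k": 12,
                "multiplier": 0.5,
            },
            {
                "type": "filters.range",
                "limits": "Classification![7:7]",
            },
            outfile,
        ]
    }
    return json.dumps(pipeline)


def run_pipeline(infile: str, outfile: str) -> int:
    # python-pdal is third-party; imported here so build_pipeline_json works without it.
    import pdal

    pipeline = pdal.Pipeline(build_pipeline_json(infile, outfile))
    return pipeline.execute()  # the question stores this return value as count
```

Usage: count = run_pipeline("test.las", "testOut.las").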

Difficulties using Python request (POST) + API

I'm trying to use a simple API with Python. I get the data with the code below, but I can't seem to parse it. When I print the type of variable "c", it says "unicode". I want a JSON object or dictionary so I can use the information.
I have tried various ways to solve this, but I'm not sure whether the output from the API (below) is actually JSON or why it doesn't work properly.
import requests
import json
import urllib
test1 ={
"query": [
{
"code": "SNI2007",
"selection": {
"filter": "item",
"values": [
"47.4+47.54"
]
}
},
{
"code": "ContentsCode",
"selection": {
"filter": "item",
"values": [
"HA0101A9",
"HA0101B4"
]
}
},
{
"code": "Tid",
"selection": {
"filter": "item",
"values": [
"2010M01",
"2010M02",
"2010M03",
"2010M04",
"2010M05",
"2010M06",
"2010M07",
"2010M08",
"2010M09",
"2010M10",
"2010M11",
"2010M12",
"2011M01",
"2011M02",
"2011M03",
"2011M04",
"2011M05",
"2011M06",
"2011M07",
"2011M08",
"2011M09",
"2011M10",
"2011M11",
"2011M12",
"2012M01",
"2012M02",
"2012M03",
"2012M04",
"2012M05",
"2012M06",
"2012M07",
"2012M08",
"2012M09",
"2012M10",
"2012M11",
"2012M12",
"2013M01",
"2013M02",
"2013M03",
"2013M04",
"2013M05",
"2013M06",
"2013M07",
"2013M08",
"2013M09",
"2013M10",
"2013M11",
"2013M12",
"2014M01",
"2014M02",
"2014M03",
"2014M04",
"2014M05",
"2014M06",
"2014M07",
"2014M08",
"2014M09",
"2014M10",
"2014M11",
"2014M12",
"2015M01",
"2015M02",
"2015M03",
"2015M04",
"2015M05",
"2015M06",
"2015M07",
"2015M08",
"2015M09",
"2015M10",
"2015M11",
"2015M12",
"2016M01",
"2016M02",
"2016M03",
"2016M04",
"2016M05",
"2016M06",
"2016M07",
"2016M08",
"2016M09",
"2016M10",
"2016M11",
"2016M12",
"2017M01",
"2017M02",
"2017M03",
"2017M04",
"2017M05",
"2017M06",
"2017M07",
"2017M08",
"2017M09",
"2017M10",
"2017M11",
"2017M12",
"2018M01",
"2018M02",
"2018M03",
"2018M04"
]
}
}
],
"response": {
"format": "json"
}
}
response = requests.post("http://api.scb.se/OV0104/v1/doris/sv/ssd/START/HA/HA0101/HA0101B/Detoms07", json = test1)
dat = response.content
b = json.dumps(dat)
c = json.loads(b)
print type(b)
This is what I get if I print the "response.content" variable.
{"columns":[{"code":"SNI2007","text":"näringsgren SNI 2007","type":"d"},{"code":"Tid","text":"månad","type":"t"},{"code":"HA0101A9","text":"Löpande priser","type":"c"},{"code":"HA0101B4","text":"Fasta priser","type":"c"}],"comments":[],"data":[{"key":["47.4+47.54","2010M01"],"values":["90.3","45.0"]},{"key":["47.4+47.54","2010M02"],"values":["80.9","40.3"]},{"key":["47.4+47.54","2010M03"],"values":["91.3","45.7"]},{"key":["47.4+47.54","2010M04"],"values":["83.9","43.5"]},{"key":["47.4+47.54","2010M05"],"values":["87.4","45.7"]},{"key":["47.4+47.54","2010M06"],"values":["97.6","52.6"]},{"key":["47.4+47.54","2010M07"],"values":["99.5","54.2"]},{"key":["47.4+47.54","2010M08"],"values":["105.2","57.3"]},{"key":["47.4+47.54","2010M09"],"values":["108.9","60.4"]},{"key":["47.4+47.54","2010M10"],"values":["107.9","60.7"]},{"key":["47.4+47.54","2010M11"],"values":["107.9","61.3"]},{"key":["47.4+47.54","2010M12"],"values":["181.9","106.1"]},{"key":["47.4+47.54","2011M01"],"values":["95.3","55.9"]},{"key":["47.4+47.54","2011M02"],"values":["80.1","47.3"]},{"key":["47.4+47.54","2011M03"],"values":["88.8","53.5"]},{"key":["47.4+47.54","2011M04"],"values":["79.4","48.5"]},{"key":["47.4+47.54","2011M05"],"values":["85.9","53.0"]},{"key":["47.4+47.54","2011M06"],"values":["90.2","57.3"]},{"key":["47.4+47.54","2011M07"],"values":["95.5","61.1"]},{"key":["47.4+47.54","2011M08"],"values":["97.1","62.3"]},{"key":["47.4+47.54","2011M09"],"values":["96.3","62.4"]},{"key":["47.4+47.54","2011M10"],"values":["97.0","63.6"]},{"key":["47.4+47.54","2011M11"],"values":["104.5","69.2"]},{"key":["47.4+47.54","2011M12"],"values":["171.4","113.9"]},{"key":["47.4+47.54","2012M01"],"values":["93.7","62.8"]},{"key":["47.4+47.54","2012M02"],"values":["78.3","53.1"]},{"key":["47.4+47.54","2012M03"],"values":["87.2","60.1"]},{"key":["47.4+47.54","2012M04"],"values":["82.7","57.4"]},{"key":["47.4+47.54","2012M05"],"values":["81.1","56.8"]},{"key":["47.4+47.54","2012M06"],"values":["92.8","66.3"]},{"key":
["47.4+47.54","2012M07"],"values":["88.4","64.0"]},{"key":["47.4+47.54","2012M08"],"values":["92.7","68.0"]},{"key":["47.4+47.54","2012M09"],"values":["96.1","71.5"]},{"key":["47.4+47.54","2012M10"],"values":["92.4","69.7"]},{"key":["47.4+47.54","2012M11"],"values":["99.2","75.9"]},{"key":["47.4+47.54","2012M12"],"values":["147.5","115.5"]},{"key":["47.4+47.54","2013M01"],"values":["89.6","70.6"]},{"key":["47.4+47.54","2013M02"],"values":["75.5","59.9"]},{"key":["47.4+47.54","2013M03"],"values":["79.5","63.7"]},{"key":["47.4+47.54","2013M04"],"values":["76.2","62.0"]},{"key":["47.4+47.54","2013M05"],"values":["79.0","65.0"]},{"key":["47.4+47.54","2013M06"],"values":["84.6","70.5"]},{"key":["47.4+47.54","2013M07"],"values":["85.7","73.0"]},{"key":["47.4+47.54","2013M08"],"values":["91.6","77.8"]},{"key":["47.4+47.54","2013M09"],"values":["90.6","77.4"]},{"key":["47.4+47.54","2013M10"],"values":["93.0","79.8"]},{"key":["47.4+47.54","2013M11"],"values":["97.4","84.3"]},{"key":["47.4+47.54","2013M12"],"values":["151.0","133.0"]},{"key":["47.4+47.54","2014M01"],"values":["92.3","81.6"]},{"key":["47.4+47.54","2014M02"],"values":["75.7","67.6"]},{"key":["47.4+47.54","2014M03"],"values":["82.3","74.5"]},{"key":["47.4+47.54","2014M04"],"values":["79.6","72.7"]},{"key":["47.4+47.54","2014M05"],"values":["80.3","73.9"]},{"key":["47.4+47.54","2014M06"],"values":["92.7","85.9"]},{"key":["47.4+47.54","2014M07"],"values":["88.0","82.7"]},{"key":["47.4+47.54","2014M08"],"values":["94.4","88.6"]},{"key":["47.4+47.54","2014M09"],"values":["100.2","95.3"]},{"key":["47.4+47.54","2014M10"],"values":["103.0","98.9"]},{"key":["47.4+47.54","2014M11"],"values":["104.4","100.0"]},{"key":["47.4+47.54","2014M12"],"values":["159.9","154.1"]},{"key":["47.4+47.54","2015M01"],"values":["95.9","93.3"]},{"key":["47.4+47.54","2015M02"],"values":["80.5","78.3"]},{"key":["47.4+47.54","2015M03"],"values":["90.4","88.5"]},{"key":["47.4+47.54","2015M04"],"values":["82.6","81.2"]},
{"key":["47.4+47.54","2015M05"],"values":["85.9","84.4"]},{"key":["47.4+47.54","2015M06"],"values":["97.5","96.8"]},{"key":["47.4+47.54","2015M07"],"values":["95.1","95.0"]},{"key":["47.4+47.54","2015M08"],"values":["93.7","93.8"]},{"key":["47.4+47.54","2015M09"],"values":["98.4","99.4"]},{"key":["47.4+47.54","2015M10"],"values":["105.5","107.5"]},{"key":["47.4+47.54","2015M11"],"values":["114.6","116.9"]},{"key":["47.4+47.54","2015M12"],"values":["159.9","164.9"]},{"key":["47.4+47.54","2016M01"],"values":["91.4","95.8"]},{"key":["47.4+47.54","2016M02"],"values":["84.7","90.1"]},{"key":["47.4+47.54","2016M03"],"values":["89.6","96.2"]},{"key":["47.4+47.54","2016M04"],"values":["87.9","94.8"]},{"key":["47.4+47.54","2016M05"],"values":["84.6","92.1"]},{"key":["47.4+47.54","2016M06"],"values":["95.0","105.6"]},{"key":["47.4+47.54","2016M07"],"values":["93.0","104.3"]},{"key":["47.4+47.54","2016M08"],"values":["96.1","106.9"]},{"key":["47.4+47.54","2016M09"],"values":["98.2","110.5"]},{"key":["47.4+47.54","2016M10"],"values":["103.2","116.4"]},{"key":["47.4+47.54","2016M11"],"values":["116.6","132.3"]},{"key":["47.4+47.54","2016M12"],"values":["155.6","177.2"]},{"key":["47.4+47.54","2017M01"],"values":["94.7","108.3"]},{"key":["47.4+47.54","2017M02"],"values":["79.2","91.4"]},{"key":["47.4+47.54","2017M03"],"values":["88.8","102.8"]},{"key":["47.4+47.54","2017M04"],"values":["80.3","93.9"]},{"key":["47.4+47.54","2017M05"],"values":["82.9","97.4"]},{"key":["47.4+47.54","2017M06"],"values":["94.2","111.0"]},{"key":["47.4+47.54","2017M07"],"values":["88.3","103.4"]},{"key":["47.4+47.54","2017M08"],"values":["91.0","105.8"]},{"key":["47.4+47.54","2017M09"],"values":["92.6","107.9"]},{"key":["47.4+47.54","2017M10"],"values":["97.9","115.2"]},{"key":["47.4+47.54","2017M11"],"values":["121.2","142.7"]},{"key":["47.4+47.54","2017M12"],"values":["149.7","177.7"]},{"key":["47.4+47.54","2018M01"],"values":["98.1","116.3"]},{"key":["47.4+47.54","2018M02"],"values":["79.0","94.6"]},
{"key":["47.4+47.54","2018M03"],"values":["93.0","112.9"]},{"key":["47.4+47.54","2018M04"],"values":["85.6","104.3"]}]}
There are two strategies you can use. response.json() parses the body with requests' built-in JSON decoding and gives you back a dict of key-value pairs. Alternatively, if you want to use the json library directly, do json_data = json.loads(response.text) and let it parse the text instead. Your current code doesn't work because json.dumps(dat) encodes the already-JSON response as a single JSON string, so json.loads(b) just gives that string back (hence "unicode").
In general, response.json() is probably all you need.
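The following is a minimal Python 3 sketch of both the pitfall and the fix. It uses a trimmed copy of the response shown above in place of a live request, so response.json() appears only in comments. The row layout follows the "columns" list in that response. Each "key" is [SNI2007, Tid], and the two "values" are Löpande priser (current prices) and Fasta priser (fixed prices), both delivered as strings.

```python
import json

# A trimmed copy of the API response body shown above (bytes, like response.content).
raw = (b'{"columns":[],"comments":[],"data":['
       b'{"key":["47.4+47.54","2010M01"],"values":["90.3","45.0"]},'
       b'{"key":["47.4+47.54","2010M02"],"values":["80.9","40.3"]}]}')

# What the question's code does. json.dumps() rejects bytes in Python 3, so decode
# first. dumps() wraps the JSON text in a JSON string literal, and loads() hands
# that same text back as a string, not a dict.
double_encoded = json.loads(json.dumps(raw.decode("utf-8")))
print(type(double_encoded))  # <class 'str'>

# Parsing the body once gives a dict; with a live request, response.json() does this.
payload = json.loads(raw)

# Turn each "data" entry into (month, current prices, fixed prices).
rows = [
    (entry["key"][1], float(entry["values"][0]), float(entry["values"][1]))
    for entry in payload["data"]
]
print(rows)  # [('2010M01', 90.3, 45.0), ('2010M02', 80.9, 40.3)]
```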

How do I read a json file into python?

I'm new to JSON and Python; any help on this would be greatly appreciated.
I've read about json.loads, but I'm confused.
How do I read a file into Python using json.loads?
Below is my JSON file format:
{
"header": {
"platform":"atm"
"version":"2.0"
}
"details":[
{
"abc":"3"
"def":"4"
},
{
"abc":"5"
"def":"6"
},
{
"abc":"7"
"def":"8"
}
]
}
My requirement is to read the values of all "abc" and "def" keys in details and add them to a new list like this: [(1,2),(3,4),(5,6),(7,8)]. The new list will be used to create a Spark DataFrame.
Open the file, and get a filehandle:
fh = open('thefile.json')
https://docs.python.org/2/library/functions.html#open
Then pass the file handle to json.load() (don't use loads; that's for strings):
import json
data = json.load(fh)
https://docs.python.org/2/library/json.html#json.load
From there, you can easily work with a Python dictionary that represents your JSON-encoded data.
new_list = [(detail['abc'], detail['def']) for detail in data['details']]
Note that your JSON is also invalid: it needs comma delimiters in many places, but that's not the question.
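The steps above can be put together with the string values converted to integers, as the next answer does. The following is a minimal sketch, assuming the file holds valid JSON like the corrected version in the next answer. read_details is a hypothetical name.

```python
import json
from typing import List, TextIO, Tuple


def read_details(fh: TextIO) -> List[Tuple[int, int]]:
    """Parse a JSON file object and return (abc, def) pairs from "details" as ints."""
    data = json.load(fh)  # json.load reads a file object; json.loads parses a string
    return [(int(detail["abc"]), int(detail["def"])) for detail in data["details"]]
```

Usage: with open('thefile.json') as fh: pairs = read_details(fh). The with block closes the file for you. If you have a SparkSession named spark, the resulting list of tuples can then be passed to spark.createDataFrame(pairs, ["abc", "def"]).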
I'm trying to understand your question as best as I can, but it looks like it was formatted poorly.
First off, your JSON blob is not valid JSON; it is missing quite a few commas. This is probably what you are looking for:
{
"header": {
"platform": "atm",
"version": "2.0"
},
"details": [
{
"abc": "3",
"def": "4"
},
{
"abc": "5",
"def": "6"
},
{
"abc": "7",
"def": "8"
}
]
}
Now, assuming you are trying to parse this in Python, you can do the following:
import json
json_blob = '{"header": {"platform": "atm","version": "2.0"},"details": [{"abc": "3","def": "4"},{"abc": "5","def": "6"},{"abc": "7","def": "8"}]}'
json_obj = json.loads(json_blob)
final_list = []
for single in json_obj['details']:
    final_list.append((int(single['abc']), int(single['def'])))
print(final_list)
This will print the following: [(3, 4), (5, 6), (7, 8)]