How to parse json with ijson and python

I have JSON data as an array of dictionaries which comes as the request payload.
[
    { "Field1": 1, "Feld2": "5" },
    { "Field1": 3, "Feld2": "6" }
]
I tried ijson.items(f, '') which yields the entire JSON object as one single item. Is there a way I can iterate the items inside the array one by one using ijson?
Here is the sample code I tried which is yielding the JSON as one single object.
import ijson

f = open("metadatam1.json")
objs = ijson.items(f, '')
for o in objs:
    print(str(o) + "\n")
[{'Feld2': u'5', 'Field1': 1}, {'Feld2': u'6', 'Field1': 3}]

I'm not very familiar with ijson, but reading some of its code it looks like calling items with a prefix of "item" should work to get the items of the array, rather than the top-level object:
for item in ijson.items(f, "item"):
    print(item)  # do stuff with the item dict
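A minimal sketch of the "item" prefix in action, assuming the third-party ijson package is installed. The helper name iter_array_items and the in-memory payload are made up for illustration, and the json.load fallback is only there so the snippet still runs without ijson (that fallback does not stream).

```python
import io
import json

try:
    import ijson  # third-party package, assumed to be installed
except ImportError:
    ijson = None

# Hypothetical in-memory stand-in for the request payload from the question
payload = b'[{"Field1": 1, "Feld2": "5"}, {"Field1": 3, "Feld2": "6"}]'


def iter_array_items(stream):
    """Yield the elements of a top-level JSON array one at a time."""
    if ijson is not None:
        # "item" is the prefix ijson uses for elements of the top-level array
        for obj in ijson.items(stream, "item"):
            yield obj
    else:
        # Fallback without ijson: loads the whole document first (no streaming)
        for obj in json.load(stream):
            yield obj


records = list(iter_array_items(io.BytesIO(payload)))
for record in records:
    print(record)
```

For a real file, opening it in binary mode (open(path, "rb")) and passing the file object in place of the BytesIO should behave the same way.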

Related

How to print out a value in a json, with only 1 'searchstring'

payload = {
    "data": {
        "name": "John",
        "surname": "Doe"
    }
}
print(payload["data"]["name"])
I want to print out the value of 'name' inside the json. I know the way to do it like above. But is there also a way to print out the value of 'name' with only 1 'search string'?
I'm looking for something like this
print(payload["data:name"])
Output:
John

If you were dealing with nested attributes of an object I would suggest operator.attrgetter; however, the itemgetter in the same module does not seem to support nested key access. It is fairly easy to implement something similar, though:
payload = {
    "data": {
        "name": "John",
        "surname": "Doe",
        "address": {
            "postcode": "667"
        }
    }
}
def get_key_path(d, path):
    # Remember latest object
    obj = d
    # For each key in the given list of keys
    for key in path:
        # Look up that key in the last object
        if key not in obj:
            raise KeyError(f"Object {obj} has no key {key}")
        # now we know the key exists, replace
        # last object with obj[key] to move to
        # the next level
        obj = obj[key]
    return obj
print(get_key_path(payload, ["data"]))
print(get_key_path(payload, ["data", "name"]))
print(get_key_path(payload, ["data", "address", "postcode"]))
Output:
$ python3 ~/tmp/so.py
{'name': 'John', 'surname': 'Doe', 'address': {'postcode': '667'}}
John
667
You can always later decide on a separator character and use a single string instead of a list of keys; however, you need to make sure this character does not appear in any valid key. For example, using |, the only change you need to make in get_key_path is:
def get_key_path(d, path):
    obj = d
    for key in path.split("|"):  # Here
        ...
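A self-contained sketch of the separator idea, written as a separate helper rather than as the answer's exact code. The name get_by_path and the sep parameter are hypothetical, and it relies on plain dict indexing to raise KeyError for a missing key.

```python
payload = {
    "data": {
        "name": "John",
        "surname": "Doe",
        "address": {"postcode": "667"},
    }
}


def get_by_path(d, path, sep="|"):
    """Walk nested dicts using a single separator-delimited string."""
    obj = d
    for key in path.split(sep):
        obj = obj[key]  # raises KeyError if the key is missing at this level
    return obj


print(get_by_path(payload, "data|name"))
print(get_by_path(payload, "data|address|postcode"))
print(get_by_path(payload, "data:name", sep=":"))
```

Making the separator a parameter lets the caller pick a character that cannot occur in the keys of the data at hand.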

There isn't a built-in way to do this with a single 'search string'. You can use the get() method, but, as with square brackets, you first have to look up the dictionary stored under the data key.
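For illustration, chained get() calls might look like the sketch below. The "age" key and the "unknown" default are invented for the example; the empty-dict default is what keeps the second call from failing when the outer key is absent.

```python
payload = {
    "data": {
        "name": "John",
        "surname": "Doe"
    }
}

# Each get() handles one level; {} is the default if "data" is missing
name = payload.get("data", {}).get("name")

# A missing inner key returns the supplied default instead of raising KeyError
age = payload.get("data", {}).get("age", "unknown")

print(name)
print(age)
```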

You could try creating your own function that uses something like:
str.split(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).
def get_leaf_value(d, search_string):
    if ":" not in search_string:
        return d[search_string]
    next_d, next_search_string = search_string.split(':', 1)
    # Recurse into the nested dictionary with the rest of the search string
    return get_leaf_value(d[next_d], next_search_string)
payload = {
    "data": {
        "name": "John",
        "surname": "Doe"
    }
}
print(payload["data"]["name"])
print(get_leaf_value(payload, "data:name"))
Output:
John
John
This approach only works if your data consists entirely of nested dictionaries, as in your example (i.e., no lists in non-leaf nodes), and if : does not appear in any key.

Here is an alternative. It may be overkill, depending on your needs.
jq uses a single "search string" - an expression called 'jq program' by the author - to extract and transform data. It is a powerful tool meaning the jq program can be quite complex. Reading a good tutorial is almost a must.
import pyjq
payload = ... as posted in the question ...
expr = '.data.name'
name = pyjq.one(expr, payload) # "John"
The original project (written in C) is located here. The Python jq libraries are built on top of that C code.

reading from a json file using python

I'm trying to read from this JSON file and print the values. I can't figure out how to print all the values from the first dictionary in the list.
I want to print the following:
website: https://www.amazon.com/Apple-iPhone-GSM-Unlocked-64GB/dp/B07
price: 382,76
How can I do it?
JSON file:
[
    {
        "website": "https://www.amazon.com/Apple-iPhone-GSM-Unlocked-64GB/dp/B078P5BK5G",
        "price": "382,76"
    },
    {
        "website": "https://www.ebay.com/itm/Apple-iPhone-8-Plus-GSM-Unlocked-64GB-Gold-Renewed-Gold-64-GB-Gold-64-GB-/143340730792",
        "price": "609,15"
    }
]
Python code:
Tried this
import json
with open('./result.json') as json_file:
    data = json.load(json_file)
    for p in data:
        print(p["price"])
Output is the prices of the products:
382,76
609,15
Instead of printing the prices it should print the values in the first dict in the list. Any good tips on how to do this?

You are looping over the list of dictionaries. If you want to loop over the values of the first dictionary, you first need to get the first element, and loop over that one.
first_dict = data[0]
for value in first_dict.values():
    print(value)
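To get the "key: value" layout asked for in the question, one option is to loop over items() instead of values(). This is a sketch that inlines the JSON from the question as a string so it runs without result.json; the variable names are arbitrary.

```python
import json

# JSON from the question, inlined so the sketch does not need result.json
raw = """[
    {
        "website": "https://www.amazon.com/Apple-iPhone-GSM-Unlocked-64GB/dp/B078P5BK5G",
        "price": "382,76"
    },
    {
        "website": "https://www.ebay.com/itm/Apple-iPhone-8-Plus-GSM-Unlocked-64GB-Gold-Renewed-Gold-64-GB-Gold-64-GB-/143340730792",
        "price": "609,15"
    }
]"""

data = json.loads(raw)
first_dict = data[0]

# items() yields (key, value) pairs, so both can go on the same line
lines = [f"{key}: {value}" for key, value in first_dict.items()]
print("\n".join(lines))
```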

Python list formatting with JSON

I'm a newbie in Python trying to turn information from an Excel file into JSON output.
I'm trying to parse this Python list:
value = ['Position: Backstab, Gouge,', 'SumPosition: DoubleParse, Pineapple']
into this JSON format:
"value": [
    {
        "Position": [
            "Backstab, Gouge,"
        ]
    },
    {
        "SumPosition": [
            "DoubleParse, Pineapple"
        ]
    }
]
Please note:
This list was previously a string:
value = 'Position: Backstab, Gouge, SumPosition: DoubleParse, Pineapple'
Which I turned into a list by using re.split().
I've already turned the string into a list by using re.split, but I still can't turn the inside of the string into a dict, and the value from the dict into a list.
Is that even possible? Should I format the list/string as JSON, or prepare the string beforehand so that it can be passed to json.dump?
Thanks in advance!

You can iterate over the list to achieve the desired result.
d = {'value': []}
for val in value:
    k, v = val.split(':')
    tmp = {k.strip(): [v.strip()]}
    d['value'].append(tmp)
print(d)
{'value': [{'Position': ['Backstab, Gouge,']},
           {'SumPosition': ['DoubleParse, Pineapple']}]}

Here is a quick way.
value = ['Position: Backstab, Gouge,',
         'SumPosition: DoubleParse, Pineapple']
dictionary_result = {}
for line in value:
    key, vals = line.split(':')
    vals = vals.split(',')
    dictionary_result[key] = vals
Remaining tasks for you: trim off whitespace and empty strings from result lists like [' Backstab', ' Gouge', ''], and actually convert the data from a Python dict to a JSON file.
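One possible way to finish those remaining tasks, as a sketch rather than the answerer's intended solution: strip each piece, drop the empty string left by the trailing comma, and serialize with json.dumps. The names result and json_text are arbitrary.

```python
import json

value = ['Position: Backstab, Gouge,',
         'SumPosition: DoubleParse, Pineapple']

result = {}
for line in value:
    # Split on the first colon only, in case a value ever contains one
    key, vals = line.split(':', 1)
    # Strip whitespace and skip the empty piece left by a trailing comma
    result[key.strip()] = [v.strip() for v in vals.split(',') if v.strip()]

# Serialize to a JSON string
json_text = json.dumps(result, indent=4)
print(json_text)
```

To write a file instead of a string, json.dump(result, outfile) with an open file object does the same serialization.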

Write a list of objects in JSON using Python

I am trying to output the following JSON from my python (2.7) script:
[
    {
        "id": "1002-00001",
        "name": "Name 1"
    },
    {
        "id": "1002-00002",
        "display": "Name 2"
    }
]
What data structure in Python will output this when using json.dumps?
The outermost item is a python list, but what should be the type of items inside the list? It looks like a dictionary with no keys?

Hopefully this clarifies the notes in comments that are not clear for you. It's achieved by appending (in this case small) dictionaries into a list.
import json
# Added an extra entry with an integer type. Doesn't have to be string.
full_list = [['1002-00001', 'Name 1'],
             ['1002-00002', 'Name 2'],
             ['1002-00003', 2]]
output_list = []
for item in full_list:
    sub_dict = {}
    sub_dict['id'] = item[0]  # key-value pair defined
    sub_dict['name'] = item[1]
    output_list.append(sub_dict)  # Just put the mini dictionary into a list
# See Python data structure
print(output_list)
# Specifically using json.dumps as requested in question.
# Automatically adds double quotes to strings for json formatting in printed
# output but keeps ints (unquoted)
json_object = json.dumps(output_list)
print(json_object)
# Writing to a file
with open('SO_jsonout.json', 'w') as outfile:
    json.dump(output_list, outfile)
# What I think you are confused about with the "keys" is achieved with an
# outer dictionary (but isn't necessary to make a valid data structure, just
# one that you might be more used to seeing)
outer_dict = {}
outer_dict['so_called_missing_key'] = output_list
print(outer_dict)

how to delete json object using python?

I am using Python to delete and update a JSON file generated from data provided by the user, so that only a few items are stored in the database. I want to delete a particular object from the JSON file.
My JSON file is:
[
    {
        "ename": "mark",
        "url": "Lennon.com"
    },
    {
        "ename": "egg",
        "url": "Lennon.com"
    }
]
I want to delete the JSON object with ename mark.
As I am new to python I tried to delete it by converting objects into dict but it is not working. Is there any other way to do it?
I tried this:
index=0
while index < len(data):
    next=index+1
    if(data[index]['ename']==data[next]['ename']):
        print "match found at"
        print "line %d and %d" %(next,next+1)
        del data[next]
    index +=1

Here's a complete example that loads the JSON file, removes the target object, and then outputs the updated JSON object to file.
#!/usr/bin/python
# Load the JSON module and use it to load your JSON file.
# I'm assuming that the JSON file contains a list of objects.
import json
with open("file.json") as infile:
    obj = json.load(infile)
# Iterate through the objects in the JSON and pop (remove)
# the obj once we find it.
for i in range(len(obj)):
    if obj[i]["ename"] == "mark":
        obj.pop(i)
        break
# Output the updated file with pretty JSON
with open("updated-file.json", "w") as outfile:
    outfile.write(
        json.dumps(obj, sort_keys=True, indent=4, separators=(',', ': '))
    )
The main point is that we find the object by iterating through the objects in the loaded list, and then pop the object off the list once we find it. If you need to remove more than one object in the list, then you should store the indices of the objects you want to remove, and then remove them all at once after you've reached the end of the for loop (you don't want to modify the list while you iterate through it).
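The "collect the indices first, remove afterwards" idea from the paragraph above could be sketched as follows. The third record ("john") and the targets set are invented so that there is more than one object to remove; deleting from the highest index down keeps the remaining indices valid.

```python
import json

# Data from the question plus one hypothetical extra record ("john")
records = json.loads("""[
    {"ename": "mark", "url": "Lennon.com"},
    {"ename": "egg", "url": "Lennon.com"},
    {"ename": "john", "url": "Lennon.com"}
]""")

targets = {"mark", "egg"}  # hypothetical names to remove

# First pass: only record the indices, do not modify the list yet
to_remove = [i for i, rec in enumerate(records) if rec["ename"] in targets]

# Second pass: delete from the end so earlier indices are not shifted
for i in reversed(to_remove):
    del records[i]

print(json.dumps(records, indent=4))
```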

The proper way to work with JSON is to deserialize it, modify the resulting objects, and then, if needed, serialize them back to JSON.
To do so, use the json module. In short, use <deserialized object> = json.loads(<some json string>) for reading json and <json output> = json.dumps(<your object>) to create json strings.
In your example this would be:
import json
o = json.loads("""[
    {
        "ename": "mark",
        "url": "Lennon.com"
    },
    {
        "ename": "egg",
        "url": "Lennon.com"
    }
]""")
# kick out the unwanted item from the list
# (list() is needed on Python 3, where filter returns an iterator)
o = list(filter(lambda x: x['ename'] != "mark", o))
output_string = json.dumps(o)

Your JSON file contains a list of objects, which are dictionaries in Python. Just replace the list with a new one that doesn't have the object in it:
import json
with open('testdata.json', 'rb') as fp:
    jsondata = json.load(fp)
jsondata = [obj for obj in jsondata if obj['ename'] != 'mark']
print(json.dumps(jsondata, indent=4))

You need to use the json module. I'm assuming python2. Try this:
import json
json_data = json.loads('<json_string>')
for i in range(len(json_data)):
    # The objects in the question are keyed by "ename"
    if json_data[i]["ename"] == "mark":
        del json_data[i]
        break

You have a list there with two items, which happen to be dictionaries. To remove the first, you can use list.remove(item) or list.pop(0) or del list[0].
http://docs.python.org/2/tutorial/datastructures.html#more-on-lists
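A short sketch comparing the three list operations mentioned above, using the data from the question. Each one works on its own copy so the results can be compared; the variable names are arbitrary.

```python
data = [
    {"ename": "mark", "url": "Lennon.com"},
    {"ename": "egg", "url": "Lennon.com"},
]

# list.remove(item): removes the first element equal to the given item
by_remove = list(data)
by_remove.remove({"ename": "mark", "url": "Lennon.com"})

# list.pop(0): removes the element at index 0 and returns it
by_pop = list(data)
popped = by_pop.pop(0)

# del list[0]: removes the element at index 0 without returning it
by_del = list(data)
del by_del[0]

print(by_remove)
print(popped)
```

All three leave only the "egg" object in this case; remove() needs the full item (or an equal one), while pop() and del need its index.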
