Extracting fields from a json file - python

I am provided with the following json file.
{
"results": [
{
"id": "1234",
"useless_field1": null,
"useless_field2": null,
"data": [
{
"type": "type_1",
"useless_field3": null,
"volumne": "5"
}
]
}
]
}
I would like to extract only the following fields: id, type, volume. I did it in the following way.
def extract(json_response):
return [{
item['id'],
item['data'][0]['type'],
item['data'][0]['volume']
} for item in json_response['results']]
It is possible for the type to be a list. Is my solution efficient? Are there any alternatives of solving such problems?

Related

How can I retrieve relative document in MongoDB?

I'm using Flask with Jinja2 template engine and MongoDB via pymongo. This are my documents from two collections (phone and factory):
phone = db.get_collection("phone")
{
"_id": ObjectId("63d8d39206c9f93e68d27206"),
"brand": "Apple",
"model": "iPhone XR",
"year": NumberInt("2016"),
"image": "https://apple-mania.com.ua/media/catalog/product/cache/e026f651b05122a6916299262b60c47d/a/p/apple-iphone-xr-yellow_1.png",
"CPU": {
"manufacturer": "A12 Bionic",
"cores": NumberInt("10")
},
"misc": [
"Bluetooth 5.0",
"NFC",
"GPS"
],
"factory_id": ObjectId("63d8d42b7a4d7a7e825ef956")
}
factory = db.get_collection("factory")
{
"_id": ObjectId("63d8d42b7a4d7a7e825ef956"),
"name": "Foxconn",
"stock": NumberInt("1000")
}
In my python code to retrieve the data I do:
models = list(
phone.find({"brand": brand}, projection={"model": True, "image": True, "factory_id": True})
)
How can I retrieve relative factory document by factory_id and have it as an embedded document in a models list?
I think you are looking for this query using aggregation stage $lookup.
So this query:
First $match by your desired brand.
Then do a "join" between collections based on the factory_id and store it in an array called "factory". The $lookup output is always an array because can be more than one match.
Last project only values you want. In this case, as _id is unique you can get the factory using $arrayElemAt position 0.
So the code can be like this (I'm not a python expert)
models = list(
phone.aggregate([
{
"$match": {
"brand": brand
}
},
{
"$lookup": {
"from": "factory",
"localField": "factory_id",
"foreignField": "_id",
"as": "factories"
}
},
{
"$project": {
"model": True,
"image": True,
"factory": {
"$arrayElemAt": [
"$factories",
0
]
}
}
}
])
)

how to handle complex json where key is not always present using python?

I need support in getting the value of Key issues which is not always present in JSON.
my JSON object is as below -
{
"records":[
{
"previousAttempts": [],
"id": "aaaaa-aaaaaa-aaaa-aaa",
"parentId": null
},
{
"previousAttempts": [],
"id": "aaaaa-aaaaaa-aaaa-aaa",
"parentId": null,
"issues":[
{
"type": "warning",
"category": "General"
},
{
"type": "warning",
"category": "General"
}
]
}
]
}
This should work for you
import json
data = json.loads(json_data)
issues = [r.get('issues', []) for r in data['records']]
You can use Pydantic to deserialize JSON string to Pydantic object. You can make a dict object from pydantic object then. You can set default value for fields you need.
You can use validators to validate loaded values.
from pydantic import BaseModel, Field
class AttemptModel(BaseModel):
pass
class IssueModel(BaseModel):
type: str
category: str
class RecordModel(BaseModel):
id: str
parent_id: int = Field(alias='parentId', default=None)
previous_attempts: list[AttemptModel] = Field(alias='previousAttempts')
issues: list[IssueModel] = Field(default=[])
class DataModel(BaseModel):
records: list[RecordModel]
data = """{
"records":[
{
"previousAttempts": [],
"id": "aaaaa-aaaaaa-aaaa-aaa",
"parentId": null
},
{
"previousAttempts": [],
"id": "aaaaa-aaaaaa-aaaa-aaa",
"parentId": null,
"issues":[
{
"type": "warning",
"category": "General"
},
{
"type": "warning",
"category": "General"
}
]
}
]
}
"""
data_model = DataModel.parse_raw(data)
print(data_model.dict())
You can find addition documentation here: https://pydantic-docs.helpmanual.io/
Also it contain installation steps.

How to get a value from JSON

This is the first time I'm working with JSON, and I'm trying to pull url out of the JSON below.
{
"name": "The_New11d112a_Company_Name",
"sections": [
{
"name": "Products",
"payload": [
{
"id": 1,
"name": "TERi Geriatric Patient Skills Trainer,
"type": "string"
}
]
},
{
"name": "Contact Info",
"payload": [
{
"id": 1,
"name": "contacts",
"url": "https://www.a3bs.com/catheterization-kits-8000892-3011958-3b-scientific,p_1057_31043.html",
"contacts": [
{
"name": "User",
"email": "Company Email",
"phone": "Company PhoneNumber"
}
],
"type": "contact"
}
]
}
],
"tags": [
"Male",
"Airway"
],
"_id": "0e4cd5c6-4d2f-48b9-acf2-5aa75ade36e1"
}
I have been able to access description and _id via
data = json.loads(line)
if 'xpath' in data:
xpath = data["_id"]
description = data["sections"][0]["payload"][0]["description"]
However, I can't seem to figure out a way to access url. One other issue I have is there could be other items in sections, which makes indexing into Contact Info a non starter.
Hope this helps:
import json
with open("test.json", "r") as f:
json_out = json.load(f)
for i in json_out["sections"]:
for j in i["payload"]:
for key in j:
if "url" in key:
print(key, '->', j[key])
I think your JSON is damaged, it should be like that.
{
"name": "The_New11d112a_Company_Name",
"sections": [
{
"name": "Products",
"payload": [
{
"id": 1,
"name": "TERi Geriatric Patient Skills Trainer",
"type": "string"
}
]
},
{
"name": "Contact Info",
"payload": [
{
"id": 1,
"name": "contacts",
"url": "https://www.a3bs.com/catheterization-kits-8000892-3011958-3b-scientific,p_1057_31043.html",
"contacts": [
{
"name": "User",
"email": "Company Email",
"phone": "Company PhoneNumber"
}
],
"type": "contact"
}
]
}
],
"tags": [
"Male",
"Airway"
],
"_id": "0e4cd5c6-4d2f-48b9-acf2-5aa75ade36e1"
}
You can check it on http://json.parser.online.fr/.
And if you want to get the value of the url.
import json
j = json.load(open('yourJSONfile.json'))
print(j['sections'][1]['payload'][0]['url'])
I think it's worth to write a short function to get the url(s) and make a decision whether or not to use the first found url in the returned list, or skip processing if there's no url available in your data.
The method shall looks like this:
def extract_urls(data):
payloads = []
for section in data['sections']:
payloads += section.get('payload') or []
urls = [x['url'] for x in payloads if 'url' in x]
return urls
This should print out the URL
import json
# open json file to read
with open('test.json','r') as f:
# load json, parameter as json text (file contents)
data = json.loads(f.read())
# after observing format of JSON data, the location of the URL key
# is determined and the data variable is manipulated to extract the value
print(data['sections'][1]['payload'][0]['url'])
The exact location of the 'url' key:
1st (position) of the array which is the value of the key 'sections'
Inside the array value, there is a dict, and the key 'payload' contains an array
In the 0th (position) of the array is a dict with a key 'url'
While testing my solution, I noticed that the json provided is flawed, after fixing the json flaws(3), I ended up with this.
{
"name": "The_New11d112a_Company_Name",
"sections": [
{
"name": "Products",
"payload": [
{
"id": 1,
"name": "TERi Geriatric Patient Skills Trainer",
"type": "string"
}
]
},
{
"name": "Contact Info",
"payload": [
{
"id": 1,
"name": "contacts",
"url": "https://www.a3bs.com/catheterization-kits-8000892-3011958-3b-scientific,p_1057_31043.html",
"contacts": [
{
"name": "User",
"email": "Company Email",
"phone": "Company PhoneNumber"
}
],
"type": "contact"
}
]
}
],
"tags": [
"Male",
"Airway"
],
"_id": "0e4cd5c6-4d2f-48b9-acf2-5aa75ade36e1"}
After utilizing the JSON that was provided by Vincent55.
I made a working code with exception handling and with certain assumptions.
Working Code:
## Assuming that the target data is always under sections[i].payload
from json import loads
line = open("data.json").read()
data = loads(line)["sections"]
for x in data:
try:
# With assumption that there is only one payload
if x["payload"][0]["url"]:
print(x["payload"][0]["url"])
except KeyError:
pass

JSON or Python dict / list decoding problem

I have been using the Python script below to try and retrieve and extract some data from Flightradar24, it would appear that it extracts the data in JSON format and will print the data out ok fully using json.dumps, but when I attempt to select the data I want (the status text in this case) using get it gives the following error:
'list' object has no attribute 'get'
Is the Data in JSON or a List ? I'm totally confused now.
I'm fairly new to working with data in JSON format, any help would be appreciated!
Script:
import flightradar24
import json
flight_id = 'BA458'
fr = flightradar24.Api()
flight = fr.get_flight(flight_id)
y = flight.get("data")
print (json.dumps(flight, indent=4))
X= (flight.get('result').get('response').get('data').get('status').get('text'))
print (X)
Sample of output data:
{
"result": {
"request": {
"callback": null,
"device": null,
"fetchBy": "flight",
"filterBy": null,
"format": "json",
"limit": 25,
"page": 1,
"pk": null,
"query": "BA458",
"timestamp": null,
"token": null
},
"response": {
"item": {
"current": 16,
"total": null,
"limit": 25
},
"page": {
"current": 1,
"total": null
},
"timestamp": 1546241512,
"data": [
{
"identification": {
"id": null,
"row": 4852575431,
"number": {
"default": "BA458",
"alternative": null
},
"callsign": null,
"codeshare": null
},
"status": {
"live": false,
"text": "Scheduled",
"icon": null,
"estimated": null,
"ambiguous": false,
"generic": {
"status": {
"text": "scheduled",
"type": "departure",
"color": "gray",
"diverted": null
},
You can use print(type(variable_name)) to see what type it is. The .get(key[,default]) is not supported on lists - it is supported for dict's.
X = (flight.get('result').get('response').get('data').get('status').get('text'))
# ^^^^^^^^ does not work, data is a list of dicts
as data is a list of dicts:
"data": [ # <<<<<< this is a list
{
"identification": {
"id": null,
"row": 4852575431,
"number": {
"default": "BA458",
"alternative": null
},
"callsign": null,
"codeshare": null
},
"status": {
This should work:
X = (flight.get('result').get('response').get('data')[0].get('status').get('text')
The issue, as pointed out by #PatrickArtner, is your data is actually a list rather than a dictionary. As an aside, you may find your code more readable if you were to use a helper function to apply dict.get repeatedly on a nested dictionary:
from functools import reduce
def ng(dataDict, mapList):
"""Nested Getter: Iterate nested dictionary"""
return reduce(dict.get, mapList, dataDict)
X = ng(ng(flight, ['result', 'response', 'data'])[0], ['status'[, 'text']])

How to make a 'outer' JSON key for JSON object with python

I would like to make the following JSON syntax output with python:
data={
"timestamp": "1462868427",
"sites": [
{
"name": "SiteA",
"zone": 1
},
{
"name": "SiteB",
"zone": 7
}
]
}
But I cannot manage to get the 'outer' data key there.
So far I got this output without the data key:
{
"timestamp": "1462868427",
"sites": [
{
"name": "SiteA",
"zone": 1
},
{
"name": "SiteB",
"zone": 7
}
]
}
I have tried with this python code:
sites = [
{
"name":"nameA",
"zone":123
},
{
"name":"nameB",
"zone":324
}
]
data = {
"timestamp": 123456567,
"sites": sites
}
print(json.dumps(data, indent = 4))
But how do I manage to get the outer 'data' key there?
Once you have your data ready, you can simply do this :
data = {'data': data}
JSON doesn't have =, it's all key:value.
What you're looking for is
data = {
"data": {
"timestamp": 123456567,
"sites": sites
}
}
json.dumps(data)
json.dumps() doesn't care for the name you give to the data object in python. You have to specify it manually inside the object, as a string.

Categories

Resources