Python - Parsing JSON Data Set

Python - Parsing JSON Data Set - python

I am trying to parse a JSON data set that looks something like this:
{"data":[
{
"Rest":0,
"Status":"The campaign is moved to the archive",
"IsActive":"No",
"StatusArchive":"Yes",
"Login":"some_login",
"ContextStrategyName":"Default",
"CampaignID":1111111,
"StatusShow":"No",
"StartDate":"2013-01-20",
"Sum":0,
"StatusModerate":"Yes",
"Clicks":0,
"Shows":0,
"ManagerName":"XYZ",
"StatusActivating":"Yes",
"StrategyName":"HighestPosition",
"SumAvailableForTransfer":0,
"AgencyName":null,
"Name":"Campaign_01"
},
{
"Rest":82.6200000000008,
"Status":"Impressions will begin tomorrow at 10:00",
"IsActive":"Yes",
"StatusArchive":"No",
"Login":"some_login",
"ContextStrategyName":"Default",
"CampaignID":2222222,
"StatusShow":"Yes",
"StartDate":"2013-01-28",
"Sum":15998,"StatusModerate":"Yes",
"Clicks":7571,
"Shows":5535646,
"ManagerName":"XYZ",
"StatusActivating":"Yes",
"StrategyName":"HighestPosition",
"SumAvailableForTransfer":0,
"AgencyName":null,
"Name":"Campaign_02"
}
]
}
Lets assume that there can be many of these data sets.
I would like to iterate through each one of them and grab the "Name" and the "Campaign ID" parameter.
So far my code looks something like this:
decoded_response = response.read().decode("UTF-8")
data = json.loads(decoded.response)
for item in data[0]:
for x in data[0][item] ...
-> need a get name procedure
-> need a get campaign_id procedure
Probably quite straight forward! I am not good with lists/dictionaries :(

Access dictionaries with d[dict_key] or d.get(dict_key, default) (to provide default value):
jsonResponse=json.loads(decoded_response)
jsonData = jsonResponse["data"]
for item in jsonData:
name = item.get("Name")
campaignID = item.get("CampaignID")
I suggest you read something about dictionaries.

Related

how can i take a specific element from this list in python?

I'm working with the Microsoft Azure face API and I want to get only the glasses response.
heres my code:
########### Python 3.6 #############
import http.client, urllib.request, urllib.parse, urllib.error, base64, requests, json
###############################################
#### Update or verify the following values. ###
###############################################
# Replace the subscription_key string value with your valid subscription key.
subscription_key = '(MY SUBSCRIPTION KEY)'
# Replace or verify the region.
#
# You must use the same region in your REST API call as you used to obtain your subscription keys.
# For example, if you obtained your subscription keys from the westus region, replace
# "westcentralus" in the URI below with "westus".
#
# NOTE: Free trial subscription keys are generated in the westcentralus region, so if you are using
# a free trial subscription key, you should not need to change this region.
uri_base = 'https://westcentralus.api.cognitive.microsoft.com'
# Request headers.
headers = {
'Content-Type': 'application/json',
'Ocp-Apim-Subscription-Key': subscription_key,
}
# Request parameters.
params = {
'returnFaceAttributes': 'glasses',
}
# Body. The URL of a JPEG image to analyze.
body = {'url': 'https://upload.wikimedia.org/wikipedia/commons/c/c3/RH_Louise_Lillian_Gish.jpg'}
try:
# Execute the REST API call and get the response.
response = requests.request('POST', uri_base + '/face/v1.0/detect', json=body, data=None, headers= headers, params=params)
print ('Response:')
parsed = json.loads(response.text)
info = (json.dumps(parsed, sort_keys=True, indent=2))
print(info)
except Exception as e:
print('Error:')
print(e)
and it returns a list like this:
[
{
"faceAttributes": {
"glasses": "NoGlasses"
},
"faceId": "0f0a985e-8998-4c01-93b6-8ef4bb565cf6",
"faceRectangle": {
"height": 162,
"left": 177,
"top": 131,
"width": 162
}
}
]
I want just the glasses attribute so it would just return either "Glasses" or "NoGlasses"
Thanks for any help in advance!

I think you're printing the whole response, when really you want to drill down and get elements inside it. Try this:
print(info[0]["faceAttributes"]["glasses"])
I'm not sure how the API works so I don't know what your specified params are actually doing, but this should work on this end.
EDIT: Thank you to #Nuageux for noting that this is indeed an array, and you will have to specify that the first object is the one you want.

I guess that you can get few elements in that list, so you could do this:
info = [
{
"faceAttributes": {
"glasses": "NoGlasses"
},
"faceId": "0f0a985e-8998-4c01-93b6-8ef4bb565cf6",
"faceRectangle": {
"height": 162,
"left": 177,
"top": 131,
"width": 162
}
}
]
for item in info:
print (item["faceAttributes"]["glasses"])
>>> 'NoGlasses'

Did you try:
glasses = parsed[0]['faceAttributes']['glasses']

This looks more like a dictionary than a list. Dictionaries are defined using the { key: value } syntax, and can be referenced by the value for their key. In your code, you have faceAttributes as a key that for value contains another dictionary with a key glasses leading to the last value that you want.
Your info object is a list with one element: a dictionary. So in order to get at the values in that dictionary, you'll need to tell the list where the dictionary is (at the head of the list, so info[0]).
So your reference syntax will be:
#If you want to store it in a variable, like glass_var
glass_var = info[0]["faceAttributes"]["glasses"]
#Or if you want to print it directly
print(info[0]["faceAttributes"]["glasses"])
What's going on here? info[0] is the dictionary containing several keys, including faceAttributes,faceId and faceRectangle. faceRectangle and faceAttributes are both dictionaries in themselves with more keys, which you can reference to get their values.
Your printed tree there is showing all the keys and values of your dictionary, so you can reference any part of your dictionary using the right keys:
print(info["faceId"]) #prints "0f0a985e-8998-4c01-93b6-8ef4bb565cf6"
print(info["faceRectangle"]["left"]) #prints 177
print(info["faceRectangle"]["width"]) #prints 162
If you have multiple entries in your info list, then you'll have multiple dictionaries, and you can get all the outputs as so:
for entry in info: #Note: "entry" is just a variable name,
# this can be any name you want. Every
# iteration of entry is one of the
# dictionaries in info.
print(entry["faceAttributes"]["glasses"])
Edit: I didn't see that info was a list of a dictionary, adapted for that fact.

Dynamically create list/dict during for loop for conversion to JSON

I am trying to build a list/dict that will be converted to JSON later on. I am trying to write the code that builds and populates the multiple levels of the JSON format I ultimately need. I am having an issue wrapping my head around this. Thank you for the help.
What I ultimately need -> Populate this list/dict:
dataset_permission_json = []
with this format:
{
"projects":[
{
"project":"test-project-1",
"datasets":[
{
"dataset":"testing1",
"permissions":[
{
"role":"READER",
"google_group":"testing1#test.com"
}
]
},
{
"dataset":"testing2",
"permissions":[
{
"role":"OWNER",
"google_group":"testing2#test.com"
}
]
},
{
"dataset":"testing3",
"permissions":[
{
"role":"READER",
"google_group":"testing3#test.com"
}
]
},
{
"dataset":"testing4",
"permissions":[
{
"role":"WRITER",
"google_group":"testing4#test.com"
}
]
}
]
}
]
}
I have multiple for loops that successfully print out the information I am pulling from an external API but I to be able to enter that data into the list/dict. The dynamic values I am trying to input are:
'project' i.e. test-project-1
'dataset' i.e. testing1
'role' i.e. READER
'google_group' i.e. testing1#test.com
I have tried things like:
dataset_permission_json.update({'project': project})
but cannot figure out how not to overwrite the data during the multiple for loops.
for project in projects:
print(project) ## Need to add this variable to 'projects'
for bq_group in bq_groups:
delegated_credentials = credentials.create_delegated(bq_group)
http_auth = delegated_credentials.authorize(Http())
list_datasets_in_project = bigquery_service.datasets().list(projectId=project).execute()
datasets = list_datasets_in_project.get('datasets',[])
print(dataset['datasetReference']['datasetId']) ##Add the dataset to 'datasets' under the project
for dataset in datasets:
get_dataset_permissions_result = bigquery_service.datasets().get(projectId=project, datasetId=dataset['datasetReference']['datasetId']).execute()
dataset_permissions = get_dataset_permissions_result.get('access',[])
### ADD THE NEXT LEVEL 'permissions' level here?
for dataset_permission in dataset_permissions:
if 'groupByEmail' in dataset_permission:
if bq_group in dataset_permission['groupByEmail']:
print(dataset['datasetReference']['datasetId'] && dataset_permission['groupByEmail']) ##Add to each dataset
I appreciate the help.
EDIT: Updated Progress
Ok I have created the nested structure that I was looking for using StackOverflow
Things are great except for the last part. I am trying to append the role & group to each 'permission' nest, but after everything runs the data is only appended to the last 'permission' nest in the JSON structure. It seems like it is overwriting itself during the for loop. Thoughts?
Updated for loop:
for project in projects:
for bq_group in bq_groups:
delegated_credentials = credentials.create_delegated(bq_group)
http_auth = delegated_credentials.authorize(Http())
list_datasets_in_project = bigquery_service.datasets().list(projectId=project).execute()
datasets = list_datasets_in_project.get('datasets',[])
for dataset in datasets:
get_dataset_permissions_result = bigquery_service.datasets().get(projectId=project, datasetId=dataset['datasetReference']['datasetId']).execute()
dataset_permissions = get_dataset_permissions_result.get('access',[])
for dataset_permission in dataset_permissions:
if 'groupByEmail' in dataset_permission:
if bq_group in dataset_permission['groupByEmail']:
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions']
permission = {'group': dataset_permission['groupByEmail'],'role': dataset_permission['role']}
dataset_permission_json['permissions'] = permission
UPDATE: Solved.
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions']
permission = {'group': dataset_permission['groupByEmail'],'role': dataset_permission['role']}
dataset_permission_json['projects'][project]['datasets'][dataset['datasetReference']['datasetId']]['permissions'] = permission

How to create a function that subtracts the current value from previous value every time it's called

I am querying a dataset that works like this: each time I query it produces a different value of Bytes_Written and Bytes_Read. What I cannot seem to accomplish is to subtract the current value from the previous value and keep doing this every second.
Here is what the data looks like:
{
"Number of Devices": 2,
"Block Devices": {
"bdev0": {
"Backend_Device_Path": "/dev/disk/by-path/ip-192.168.26.1:3260-iscsi-iqn.2010-10.org.openstack:volume-d1c8e7c6-8c77-444c-9a93-8b56fa1e37f2-lun-010.0.0.142",
"Capacity": "2147483648",
"Guest_Device_Name": "vdb",
"IO_Operations": "97069",
"Bytes_Written": "34410496",
"Bytes_Read": "363172864"
},
"bdev1": {
"Backend_Device_Path": "/dev/disk/by-path/ip-192.168.26.1:3260-iscsi-iqn.2010-10.org.openstack:volume-b27110f9-41ba-4bc6-b97c-b5dde23af1f9-lun-010.0.0.146",
"Capacity": "2147483648",
"Guest_Device_Name": "vdb",
"IO_Operations": "93",
"Bytes_Written": "0",
"Bytes_Read": "380928"
}
}
}
The code that queries the data:
def counterVolume_one():
#Get data
url = 'http://url:8080/vrio/blk'
r = requests.get(url)
data = r.json()
wanted = {'Bytes_Written', 'Bytes_Written', 'IO_Operation'}
for d in data['Block Devices'].itervalues():
values = {k: v for k, v in d.iteritems() if k in wanted}
print json.dumps(values)
counterVolume_one()
The way I want to get the output is:
{"IO_Operations": "97069", "Bytes_Read": "363172864", "Bytes_Written": "34410496"}
{"IO_Operations": "93", "Bytes_Read": "380928", "Bytes_Written": "0"}
Here is what I want to accomplish:
first time query = first set of values
after 1 sec
second time query = first set of values-second set of values
after 1 sec
third time query = second set of values-third set of values
Expected output would be a json object as below
{'bytes-read': newvalue, 'bytes-written': newvalue, 'io_operations': newvalue}

The simplest fix may be to modify the counterVolume_one() function so that it accepts parameters that define the current state, and that you'd update as you collect data. For example, the following code collects and sums the the fields your interested in from the JSON documents:
FIELDS = ('Bytes_Written', 'Bytes_Read', 'IO_Operation')
def counterVolume_one(state):
url = 'http://url:8080/vrio/blk'
r = requests.get(url)
data = r.json()
for field in FIELDS:
state[field] += data[field]
return state
state = {"Bytes_Written": 0, "Bytes_Read": 0, "IO_Operation": 0}
while True:
counterVolume_one(state)
time.sleep(1)
for field in FIELDS:
print("{field:s}: {count:d}".format(field=field,
count=state[field]))
A more correct fix might be to use a class to hold the state, and that has a method that updates the state. But, I think following the idea above will get you the solution quickest.

How to turn a dataframe of categorical data into a dictionary

I have a dataframe that I need to transform into JSON. I think it would be easier to first turn it into a dictionary, but I can't figure out how. I need to transform it into JSON so that I can visualize it with js.d3
Here is what the data looks like currently:
NAME, CATEGORY, TAG
Ex1, Education, Books
Ex2, Transportation, Bus
Ex3, Education, Schools
Ex4, Education, Books
Ex5, Markets, Stores
Here is what I want the data to look like:
Data = {
Education {
Books {
key: Ex1,
key: Ex2
}
Schools {
key: Ex3
}
}
Transportation {
Bus {
key: Ex2
}
}
Markets {
Stores {
key: Ex5
}
}
(I think my JSON isn't perfect here, but I just wanted to convey the general idea).

This code is thanks to Brent Washburne's very helpful answer above. I just needed to remove the tags column because for now it was too messy (many of the rows had more than one tag separated by commas). I also added a column (of integers) which I wanted connected to the names. Here it is:
import json, string
import pprint
def to_json(file):
data = {}
for line in open(file):
fields = map(string.strip, line.split(','))
categories = data.get(fields[1], [])
to_append = {}
to_append[fields[0]] = fields[3]
categories.append(to_append)
data[fields[1]] = categories
return json.dumps(data)
print to_json('data.csv')

You can't use 'key' as a key more than once, so the innermost group is a list:
import json, string
def to_json(file):
data = {}
for line in open(file):
fields = map(string.strip, line.split(','))
categories = data.get(fields[1], {})
tags = categories.get(fields[2], [])
tags.append(fields[0])
categories[fields[2]] = tags
data[fields[1]] = categories
return json.dumps(data)
print to_json('data.csv')
Result:
{"Markets": {"Stores": ["Ex5"]}, "Education": {"Schools": ["Ex3"], "Books": ["Ex1", "Ex4"]}, "Transportation": {"Bus": ["Ex2"]}}

Parsing json file with changeable structure in Python

I'm using Yahoo Placemaker API which gives different structure of json depending on input.
Simple json file looks like this:
{
'document':{
'itemDetails':{
'id'='0'
'prop1':'1',
'prop2':'2'
}
'other':{
'propA':'A',
'propB':'B'
}
}
}
When I want to access itemDetails I simply write json_file['document']['itemDetails'].
But when I get more complicated response, such as
{
'document':{
'1':{
'itemDetails':{
'id'='1'
'prop1':'1',
'prop2':'2'
}
},
'0':{
'itemDetails':{
'id'='0'
'prop1':'1',
'prop2':'2'
},
'2':{
'itemDetails':{
'id'='1'
'prop1':'1',
'prop2':'2'
}
'other':{
'propA':'A',
'propB':'B'
}
}
}
the solution obviously does not work.
I use id, prop1 and prop2 to create objects.
What would be the best approach to automatically access itemDetails in the second case without writing json_file['document']['0']['itemDetails'] ?

If I understand correctly, you want to loop through all of json_file['document']['0']['itemDetails'], json_file['document']['1']['itemDetails'], ...
If that's the case, then:
item_details = {}
for key, value in json_file['document']:
item_details[key] = value['itemDetails']
Or, a one-liner:
item_details = {k: v['itemDetails'] for k, v in json_file['document']}
Then, you would access them as item_details['0'], item_details['1'], ...
Note: You can suppress the single quotes around 0 and 1, by using int(key) or int(k).
Edit:
If you want to access both cases seamlessly (whether there is one result or many), you could check:
if 'itemDetails' in json_file['document']:
item_details = {'0': json_file['document']['itemDetails']}
else:
item_details = {k: v['itemDetails'] for k, v in json_file['document'] if k != 'other'}
Then loop through the item_details dict.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python - Parsing JSON Data Set - python

Access dictionaries with d[dict_key] or d.get(dict_key, default) (to provide default value): jsonResponse=json.loads(decoded_response) jsonData = jsonResponse["data"] for item in jsonData: name = item.get("Name") campaignID = item.get("CampaignID") I suggest you read something about dictionaries.

Related

how can i take a specific element from this list in python?

Dynamically create list/dict during for loop for conversion to JSON

How to create a function that subtracts the current value from previous value every time it's called

How to turn a dataframe of categorical data into a dictionary

Parsing json file with changeable structure in Python

Categories

Resources