Python 2.7.6, KeyError: 'data' when trying to run script

First of all, I am totally new to Python. I am a graphic designer and I need to get group members' photos for a group logo. I have found this:
https://github.com/lionaneesh/IIITD-Students-Collage
and it pretty much should do the thing I need, but apparently I am doing something wrong and it does not work as intended.
When I execute this script:
import json
from urllib2 import urlopen
fp = open("test2.txt")
data = json.loads(fp.read())
fp.close()
user_photos = {} # id -> [User's Name, Photo URL]
for user in data["data"]:
    print user
    page = urlopen("http://graph.facebook.com/" + user["id"] + "?fields=picture")
    page_data = json.loads(page.read())
    photo_url = page_data["picture"]["data"]["url"]
    user_photos[user["id"]] = [user["name"], photo_url]
fp = open("user_photos.json", "w")
fp.write(json.dumps(user_photos))
I get this error:
Traceback (most recent call last):
  File "C:\test.py", line 11, in <module>
    for user in data["data"]:
KeyError: 'data'
>>>
Could someone explain to me how to fix it or where to look for help?
Edit: this is how the data in test2.txt looks:
{
"id": "1390694364479028",
"members": {
"data": [
{
"name": "Patryk Wiśniewski",
"administrator": false,
"id": "321297624692717"
},
{
"name": "Backed PL",
"administrator": false,
"id": "1440205746235525"
},
and so on, with the other group members' info

You probably just don't have a JSON field for "data" in test2.txt

KeyError means that there is no such key in a dict object. So the JSON in your file does not have the structure your script expects, which would be something like this:
{"data": [{"id": "10000"}, {"id": "20000"}, {"id": "30000"}]}
It would help if you posted the contents of test2.txt or the output of print(data).
Edit: according to your test2.txt file, the loop should look like this:
for user in data["members"]["data"]:
    print user
    page = urlopen("http://graph.facebook.com/" + user["id"] + "?fields=picture")
    page_data = json.loads(page.read())
    photo_url = page_data["picture"]["data"]["url"]
    user_photos[user["id"]] = [user["name"], photo_url]
Simply change data["data"] to data["members"]["data"] to make your script work.
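A minimal sketch of the difference, using an inline sample shaped like the excerpt in the question instead of the real file. The `sample` string and the second member entry are stand-ins, not data from the question. Listing the top-level keys first shows why data["data"] fails:

```python
import json

# Stand-in for the contents of test2.txt, trimmed to two members.
sample = '''
{
  "id": "1390694364479028",
  "members": {
    "data": [
      {"name": "Backed PL", "administrator": false, "id": "1440205746235525"},
      {"name": "Example Member", "administrator": false, "id": "1"}
    ]
  }
}
'''

data = json.loads(sample)

# The top level only has "id" and "members", so data["data"] raises KeyError.
top_level_keys = sorted(data.keys())
print(top_level_keys)

# The member list sits one level down, under "members".
member_ids = [user["id"] for user in data["members"]["data"]]
print(member_ids)
```

Printing the keys of whatever json.loads returned is usually the quickest way to find the right path before writing the loop.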

Looking at the docs, your txt file should have exactly the same structure as the following, apart from the details:
{
"data": [
{
"name": "Arushi Jain",
"administrator": false,
"id": "100000582289046"
},
{
"name": "Ajay Yadav",
"administrator": false,
"id": "100004213058283"
},
and so on ........
],
"paging": {
"next": "https://graph.facebook.com/114462201948585/members?limit=5000&offset=5000&__after_id=712305377"
}
}
This is how yours should look:
{
"data": [
{
"name": "Patryk Wiśniewski",
"administrator": false,
"id": "321297624692717"
},
{
"name": "Patryk Kurowski",
"administrator": false,
"id": "1429534777317507"
},
{
"name": "Jan Konieczny",
"administrator": false,
"id": "852450774783365"
}
],
"paging": {
"next": "https://graph.facebook.com/114462201948585/members?limit=5000&offset=5000&__after_id=712305377"
}
}
The data["data"] lookup is the very first thing executed in the loop, so if the file's structure does not match exactly, it fails with the error you are seeing.

Related

Parse complex JSON in Python

EDITED WITH LARGER JSON:
I have the following JSON and I need to get id element: 624ff9f71d847202039ec220
results": [
{
"id": "62503d2800c0d0004ee4636e",
"name": "2214524",
"settings": {
"dataFetch": "static",
"dataEntities": {
"variables": [
{
"id": "624ffa191d84720202e2ed4a",
"name": "temp1",
"device": {
"id": "624ff9f71d847202039ec220",
"name": "282c0240ea4c",
"label": "282c0240ea4c",
"createdAt": "2022-04-08T09:01:43.547702Z"
},
"chartType": "line",
"aggregationMethod": "last_value"
},
{
"id": "62540816330443111016e38b",
"device": {
"id": "624ff9f71d847202039ec220",
"name": "282c0240ea4c",
},
"chartType": "line",
}
]
}
...
Here is my code (EDITED)
url = "API_URL"
response = urllib.urlopen(url)
data = json.loads(response.read().decode("utf-8"))
print url
all_ids = []
for i in data['results']: # i is a dictionary
    for variable in i['settings']['dataEntities']['variables']:
        print(variable['id'])
        all_ids.append(variable['id'])
But I have the following error:
for variable in i['settings']['dataEntities']['variables']:
KeyError: 'dataEntities'
Could you please help?
Thanks!!
What is it printing when you print(fetc)? If you format the json, it will be easier to read, the current nesting is very hard to comprehend.
fetc is a string, not a dict. If you want the dict, you have to use the key.
Try:
url = "API_URL"
response = urllib.urlopen(url)
data = json.loads(response.read().decode("utf-8"))
print url
for i in data['results']:
    print(json.dumps(i['settings']))
    print(i['settings']['dataEntities'])
EDIT: To get to the id field, you'll need to dive further.
i['settings']['dataEntities']['variables'][0]['id']
So if you want all the ids, you'll have to loop over the variables (assuming the list has more than one entry), and if you want them for all the settings, you'll need to loop over those too.
Full solution for you to try (EDITED after you uploaded the full JSON):
url = "API_URL"
response = urllib.urlopen(url)
data = json.loads(response.read().decode("utf-8"))
print url
all_ids = []
for i in data['results']: # i is a dictionary
    for variable in i['settings']['dataEntities']['variables']:
        print(variable['id'])
        all_ids.append(variable['id'])
        all_ids.append(variable['device']['id'])
Let me know if that works.
The shared JSON is not valid. A valid JSON similar to yours is:
{
"results": [
{
"settings": {
"dataFetch": "static",
"dataEntities": {
"variables": [
{
"id": "624ffa191d84720202e2ed4a",
"name": "temp1",
"span": "inherit",
"color": "#2ccce4",
"device": {
"id": "624ff9f71d847202039ec220"
}
}
]
}
}
}
]
}
To get a list of ids from your JSON you need a nested for loop. A Pythonic way to write that is:
all_ids = [y["device"]["id"] for x in my_json["results"] for y in x["settings"]["dataEntities"]["variables"]]
Where my_json is your initial JSON.
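The KeyError: 'dataEntities' in the question suggests that at least one entry in results has no such key, although that is an assumption since the full response is not shown. Under that assumption, a defensive sketch could use dict.get() with defaults so that incomplete entries are skipped instead of raising. The sample data below is hypothetical:

```python
# Hypothetical sample: the second result has no "dataEntities" key.
my_json = {
    "results": [
        {"settings": {"dataFetch": "static",
                      "dataEntities": {"variables": [
                          {"id": "624ffa191d84720202e2ed4a",
                           "device": {"id": "624ff9f71d847202039ec220"}}]}}},
        {"settings": {"dataFetch": "static"}},
    ]
}

device_ids = []
for result in my_json["results"]:
    # .get() returns the default instead of raising KeyError.
    entities = result.get("settings", {}).get("dataEntities", {})
    for variable in entities.get("variables", []):
        device = variable.get("device")
        if device and "id" in device:
            device_ids.append(device["id"])

print(device_ids)
```

Whether silently skipping incomplete entries is acceptable depends on the application; logging them may be the better choice.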

How to get a value from JSON

This is the first time I'm working with JSON, and I'm trying to pull url out of the JSON below.
{
"name": "The_New11d112a_Company_Name",
"sections": [
{
"name": "Products",
"payload": [
{
"id": 1,
"name": "TERi Geriatric Patient Skills Trainer,
"type": "string"
}
]
},
{
"name": "Contact Info",
"payload": [
{
"id": 1,
"name": "contacts",
"url": "https://www.a3bs.com/catheterization-kits-8000892-3011958-3b-scientific,p_1057_31043.html",
"contacts": [
{
"name": "User",
"email": "Company Email",
"phone": "Company PhoneNumber"
}
],
"type": "contact"
}
]
}
],
"tags": [
"Male",
"Airway"
],
"_id": "0e4cd5c6-4d2f-48b9-acf2-5aa75ade36e1"
}
I have been able to access description and _id via
data = json.loads(line)
if 'xpath' in data:
    xpath = data["_id"]
    description = data["sections"][0]["payload"][0]["description"]
However, I can't seem to figure out a way to access url. One other issue I have is there could be other items in sections, which makes indexing into Contact Info a non starter.
Hope this helps:
import json
with open("test.json", "r") as f:
    json_out = json.load(f)
for i in json_out["sections"]:
    for j in i["payload"]:
        for key in j:
            if "url" in key:
                print(key, '->', j[key])
I think your JSON is damaged; it should look like this:
{
"name": "The_New11d112a_Company_Name",
"sections": [
{
"name": "Products",
"payload": [
{
"id": 1,
"name": "TERi Geriatric Patient Skills Trainer",
"type": "string"
}
]
},
{
"name": "Contact Info",
"payload": [
{
"id": 1,
"name": "contacts",
"url": "https://www.a3bs.com/catheterization-kits-8000892-3011958-3b-scientific,p_1057_31043.html",
"contacts": [
{
"name": "User",
"email": "Company Email",
"phone": "Company PhoneNumber"
}
],
"type": "contact"
}
]
}
],
"tags": [
"Male",
"Airway"
],
"_id": "0e4cd5c6-4d2f-48b9-acf2-5aa75ade36e1"
}
You can check it on http://json.parser.online.fr/.
And if you want to get the value of url:
import json
j = json.load(open('yourJSONfile.json'))
print(j['sections'][1]['payload'][0]['url'])
I think it's worth writing a short function to collect the url(s); you can then decide whether to use the first url in the returned list or skip processing when the data contains no url.
The function could look like this:
def extract_urls(data):
    payloads = []
    for section in data['sections']:
        payloads += section.get('payload') or []
    urls = [x['url'] for x in payloads if 'url' in x]
    return urls
This should print out the URL
import json
# open json file to read
with open('test.json','r') as f:
    # load json, parameter as json text (file contents)
    data = json.loads(f.read())
# after observing format of JSON data, the location of the URL key
# is determined and the data variable is manipulated to extract the value
print(data['sections'][1]['payload'][0]['url'])
The exact location of the 'url' key:
Index 1 of the array that is the value of the key 'sections'
That element is a dict, and its key 'payload' contains an array
Index 0 of that array is a dict with the key 'url'
While testing my solution, I noticed that the JSON provided is flawed. After fixing the flaws (3), I ended up with this:
{
"name": "The_New11d112a_Company_Name",
"sections": [
{
"name": "Products",
"payload": [
{
"id": 1,
"name": "TERi Geriatric Patient Skills Trainer",
"type": "string"
}
]
},
{
"name": "Contact Info",
"payload": [
{
"id": 1,
"name": "contacts",
"url": "https://www.a3bs.com/catheterization-kits-8000892-3011958-3b-scientific,p_1057_31043.html",
"contacts": [
{
"name": "User",
"email": "Company Email",
"phone": "Company PhoneNumber"
}
],
"type": "contact"
}
]
}
],
"tags": [
"Male",
"Airway"
],
"_id": "0e4cd5c6-4d2f-48b9-acf2-5aa75ade36e1"}
Using the JSON provided by Vincent55, I wrote working code with exception handling and certain assumptions.
Working Code:
## Assuming that the target data is always under sections[i].payload
from json import loads
line = open("data.json").read()
data = loads(line)["sections"]
for x in data:
    try:
        # With assumption that there is only one payload
        if x["payload"][0]["url"]:
            print(x["payload"][0]["url"])
    except KeyError:
        pass
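Since the question notes that the position of the Contact Info section may vary, another option is to search for the key at any depth rather than index into fixed positions. The sketch below is one way to do that; `find_values` is a hypothetical helper name, and the sample data is a trimmed stand-in with a placeholder URL:

```python
def find_values(node, wanted_key):
    """Yield every value stored under wanted_key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == wanted_key:
                yield value
            # Keep descending: the value may itself contain the key.
            for found in find_values(value, wanted_key):
                yield found
    elif isinstance(node, list):
        for item in node:
            for found in find_values(item, wanted_key):
                yield found


# Trimmed stand-in for the corrected JSON; the URL is a placeholder.
data = {
    "name": "The_New11d112a_Company_Name",
    "sections": [
        {"name": "Products", "payload": [{"id": 1, "type": "string"}]},
        {"name": "Contact Info",
         "payload": [{"id": 1, "url": "https://example.com/item.html"}]},
    ],
}

urls = list(find_values(data, "url"))
print(urls)
```

This trades precision for flexibility: it returns every "url" value in the document, so it only fits when the key name is unambiguous.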

Need read some data from JSON

I need to get (id, name, faction id) for each deputy in this JSON:
{
"id": "75785",
"title": "(за основу)",
"asozdUrl": null,
"datetime": "2011-12-21T12:20:26+0400",
"votes": [
{
"deputy": {
"id": "99111772",
"name": "Абалаков Александр Николаевич",
"faction": {
"id": "72100004",
"title": "КПРФ"
}
},
"result": "accept"
},
{
"deputy": {
"id": "99100491",
"name": "Абдулатипов Рамазан Гаджимурадович",
"faction": {
"id": "72100024",
"title": "ЕР"
}
},
"result": "none"
}
... etc.
My code looks like this:
urlData = "https://raw.githubusercontent.com/data-dumaGovRu/vote/master/poll/2011-12-21/75785.json"
response = urllib.request.urlopen(urlData)
content = response.read()
data = json.loads(content.decode("utf8"))
for i in data:
    #print(data["name"])
And I don't know what to do with that #print line. How should I write it?
You can access the list containing the deputies with data['votes']. Iterating through the list, you can access the keys you're interested in as you would with dict key lookups. Nested dicts imply you have to walk through the keys starting from the root to your point of interest:
for d in data['votes']:
    print(d['deputy']['id'], d['deputy']['name'], d['deputy']['faction']['id'])
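If the values are needed later rather than just printed, the same walk can collect them into a list of tuples. This is a small sketch on a trimmed stand-in for the downloaded JSON; the deputy names and faction titles are placeholders:

```python
# Trimmed stand-in for the downloaded JSON; names are placeholders.
data = {
    "id": "75785",
    "votes": [
        {"deputy": {"id": "99111772", "name": "Deputy A",
                    "faction": {"id": "72100004", "title": "Faction A"}},
         "result": "accept"},
        {"deputy": {"id": "99100491", "name": "Deputy B",
                    "faction": {"id": "72100024", "title": "Faction B"}},
         "result": "none"},
    ],
}

deputies = []
for vote in data["votes"]:
    deputy = vote["deputy"]
    # One (id, name, faction id) tuple per deputy.
    deputies.append((deputy["id"], deputy["name"], deputy["faction"]["id"]))

print(deputies)
```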

Connecting many json files to one

I get many JSON strings from a MySQL DB and need to combine them.
For example:
{
"type": "device",
"name": "Lampe",
"controls": [
{
"type": "switch",
"name": "Betrieb",
"topic": "/lampe/schalter"
}
]
}
Combined, these devices should go into an array in a single JSON file:
{
"name": "Test-System",
"devices": [
{
"type": "device",
"name": "Lampe",
"controls": [
{
"type": "switch",
"name": "Betrieb",
"topic": "/lampe/schalter"
}
]
},
{
other Device
}
]
}
I do not understand how to do this in Python.
Does someone have an idea how to do it?
The json module can be used.
#!/usr/bin/env python3.5
import json
# Parse each device JSON file.
device1 = json.load(open("device-switch-Lampe.json"))
device2 = json.load(open("device-sensor-Wert.json"))
# more devices ...
obj = {"name": "Test-System", "devices": [device1, device2]}
print(json.dumps(obj))
Output (prettified):
{
"devices": [{
"type": "device",
"controls": [{
"type": "switch",
"topic": "/lampe/schalter",
"name": "Betrieb"
}],
"name": "Lampe"
}, {
"type": "device",
"controls": [{
"type": "sensor",
"topic": "/sensor/wert",
"name": "Wert"
}],
"name": "Sensor"
}],
"name": "Test-System"
}
There are two ways you could do this - by working on strings, or by working with Python-JSON data structures. The former would be something like
# untested code
s = '''{
"name": "Test-System",
"devices": [ '''
while True:
    j = get_json_from_DB()
    if not j: break # null string or None
    s = s + j + ',\n'
s = s[:-2] + ']\n}\n' # [:-2] drops the last ',\n' from the loop
Or if you want to work with Python loaded-JSON then
import json
# untested code
s = {
"name": "Test-System",
"devices": []
}
while True:
    j = get_json_from_DB()
    if not j: break # null string or None
    s['devices'].append( json.loads(j) )
# str = json.dumps(s) # ought to be valid
The latter validates every incoming JSON string (json.loads() raises an exception for any bad JSON) and will be more efficient for large numbers of devices. It is therefore to be preferred unless you are working on a RAM-constrained embedded system with small numbers of devices, where its greater memory footprint is a problem.
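A runnable sketch of the second approach, with a plain list standing in for the database rows (the `rows` list, the Sensor device, and the deliberately broken third row are assumptions for illustration). It also shows the validation effect: a bad row is counted and skipped rather than corrupting the output.

```python
import json

# Stand-ins for the JSON strings that would come back from the database.
rows = [
    '{"type": "device", "name": "Lampe", "controls": '
    '[{"type": "switch", "name": "Betrieb", "topic": "/lampe/schalter"}]}',
    '{"type": "device", "name": "Sensor", "controls": '
    '[{"type": "sensor", "name": "Wert", "topic": "/sensor/wert"}]}',
    '{"type": "device", "name": broken',  # deliberately invalid JSON
]

system = {"name": "Test-System", "devices": []}
skipped = 0
for row in rows:
    try:
        system["devices"].append(json.loads(row))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        skipped += 1

# Serialize the combined structure back to one JSON document.
combined = json.dumps(system, indent=2)
print(combined)
```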

Issues decoding Collections+JSON in Python

I've been trying to decode a JSON response in Collections+JSON format using Python for a while now but I can't seem to overcome a small issue.
First of all, here is the JSON response:
{
"collection": {
"href": "http://localhost:8000/social/messages-api/",
"items": [
{
"data": [
{
"name": "messageID",
"value": 19
},
{
"name": "author",
"value": "mike"
},
{
"name": "recipient",
"value": "dan"
},
{
"name": "pm",
"value": "0"
},
{
"name": "time",
"value": "2015-03-31T15:04:01.165060Z"
},
{
"name": "text",
"value": "first message"
}
]
}
],
"version": "1.0",
"links": []
}
}
And here is how I am attempting to extract data:
response = urllib2.urlopen('myurl')
responseData = response.read()
jsonData = json.loads(responseData)
test = jsonData['collection']['items']['data']
When I run this code I get the error:
list indices must be integers, not str
If I use an integer, e.g. 0, instead of a string it merely shows 'data' instead of any useful information, unlike if I were to simply output 'items'. Similarly, I can't seem to access the data within a data child, for example:
test = jsonData['collection']['items'][0]['name']
This will argue that there is no element called 'name'.
What is the proper method of accessing JSON data in this situation? I would also like to iterate over the collection, if that helps.
I'm aware of a package that can be used to simplify working with Collections+JSON in Python, collection-json, but I'd rather be able to do this without using such a package.
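One way to read this structure, offered as a sketch: items is a list, and each item's data is itself a list of name/value pairs, so both levels need an index or a loop before 'name' or 'value' can be looked up. The dict below is a trimmed stand-in for what json.loads would return for the response above:

```python
# Trimmed stand-in for the parsed response (what json.loads would return).
jsonData = {
    "collection": {
        "items": [
            {"data": [
                {"name": "messageID", "value": 19},
                {"name": "author", "value": "mike"},
                {"name": "text", "value": "first message"},
            ]}
        ]
    }
}

messages = []
for item in jsonData["collection"]["items"]:
    # Fold the list of {"name": ..., "value": ...} pairs into one dict.
    message = {pair["name"]: pair["value"] for pair in item["data"]}
    messages.append(message)

print(messages[0]["author"])
```

For a single value without the loop, the full path would be jsonData['collection']['items'][0]['data'][0]['name'].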
