Extracting a specific key's value from a dictionary - Python

I am learning to use Python and the Twitter API. A user's information was saved as a JSON file.
Basically, the JSON file stores a list of dictionaries:
data = [{1}, {2}, {3}, {4}, {5}]
Each dictionary holds some information, for example:
[
{
"created_at": "2018-04-28 13:12:07",
"favorite_count": 0,
"followers_count": 2,
"id_str": "990217093206310912",
"in_reply_to_screen_name": null,
"retweet_count": 0,
"screen_name": "SyerahMizi",
"text": "u can count on me like 123 \ud83d\ude0a\ud83d\udc6d"
},
{
"created_at": "2018-04-26 04:21:48",
"favorite_count": 0,
"followers_count": 2,
"id_str": "989358860937846785",
"in_reply_to_screen_name": null,
"retweet_count": 0,
"screen_name": "SyerahMizi",
"text": "Never give up"
}
]
I am simply trying to print only the "text" value of each dictionary, but I keep getting an error:
TypeError: list indices must be integers, not str
Here is what I have so far:
import json
with open('981452637_tweetlist.json') as json_file:
    json_data = json.load(json_file)
lst = json_file['text'][0]
print lst
Any help or explanation as to what I need would be great. Thank you!

As the error says, list indices must be integers, not strings. The parsed JSON is a list of dictionaries, so index the list first and then look up the key in the dictionary. Also index json_data (the parsed result), not json_file (the file handle). Corrected code:
import json
with open('981452637_tweetlist.json') as json_file:
    json_data = json.load(json_file)
lst = json_data[0]['text']  # list index first, then dictionary key
print(lst)

You can use a for-loop.
for d in json_data:
    print(d["text"])
If you wanted it on the same line, you can instead make a list and append to it in the for-loop.
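A minimal sketch of that list-based variant, assuming json_data has the shape shown in the question. The two sample tweets are inlined (with the text shortened) so the snippet runs without the original file:

```python
import json

# Inline sample standing in for the contents of the tweet-list file.
raw = '''[
  {"screen_name": "SyerahMizi", "text": "u can count on me like 123"},
  {"screen_name": "SyerahMizi", "text": "Never give up"}
]'''
json_data = json.loads(raw)

# Collect every "text" value instead of printing inside the loop.
texts = []
for d in json_data:
    texts.append(d["text"])

# Join the collected strings so they print on a single line.
print(", ".join(texts))
```

The loop could also be written as a list comprehension, [d["text"] for d in json_data], which builds the same list.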

Related

JSON Parsing in Python - help extracting dictionaries inside a list

I've searched and there's a similar problem here, but the solution states to fix the JSON. I really can't fix the JSON, as it is produced by a REST API.
{
"__metadata": {
"uri": "http://website:6405/biprws/v1/cmsquery?page=1&pagesize=50"
},
"first": {
"__deferred": {
"uri": "http://website:6405/biprws/v1/cmsquery?page=1&pagesize=50"
}
},
"last": {
"__deferred": {
"uri": "http://website:6405/biprws/v1/cmsquery?page=1&pagesize=50"
}
},
"entries": [
{
"SI_ID": 31543,
"SI_NAME": "Some Client",
"SI_PARENTID": 31414,
"SI_PATH": {
"SI_FOLDER_NAME1": "COR OPS",
"SI_FOLDER_ID1": 31414,
"SI_FOLDER_OBTYPE1": 1,
"SI_FOLDER_NAME2": "CLIENT",
"SI_FOLDER_ID2": 28178,
"SI_FOLDER_OBTYPE2": 1,
"SI_NUM_FOLDERS": 2
}
}
]
}
I need to be able to get the folder names from SI_PATH, but that is where I am having issues. I can access "entries" fine, as the whole JSON is treated as a dict, but the problem comes after that. If I get "entries", it's just a list with a length of 1:
import json
data = json.load(open('file.json'))
print(type(data))
print(data['entries'])
print(type(data['entries']))
Sample output below:
<class 'dict'>
[{'SI_ID': 31543, 'SI_NAME': 'Some Client', 'SI_PARENTID': 31414, 'SI_PATH': {'SI_FOLDER_NAME1': 'COR OPS', 'SI_FOLDER_ID1': 31414, 'SI_FOLDER_OBTYPE1': 1, 'SI_FOLDER_NAME2': 'CLIENT', 'SI_FOLDER_ID2': 28178, 'SI_FOLDER_OBTYPE2': 1, 'SI_NUM_FOLDERS': 2}}]
<class 'list'>
I can use pandas to put 'entries' into a DataFrame and pull out the SI_PATH values, but I'm not sure how to access each of them.
f = pd.DataFrame(data['entries'])
print(f['SI_PATH'].values)
Output of this:
[{'SI_FOLDER_NAME1': 'COR OPS', 'SI_FOLDER_ID1': 31414, 'SI_FOLDER_OBTYPE1': 1, 'SI_FOLDER_NAME2': 'CLIENT', 'SI_FOLDER_ID2': 28178, 'SI_FOLDER_OBTYPE2': 1, 'SI_NUM_FOLDERS': 2}]
But I'm unsure how to access the items individually from this point. If possible, I really want to stick with just importing json.
Since there is only one item in the list that is data['entries']:
print(data['entries'][0]['SI_ID'])
Prints:
31543
Since it is a list of dicts, why not loop over it:
for items in data['entries']:
    print(items.get("SI_ID"))
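The same indexing reaches the folder names the question asks about. The sketch below assumes, based only on the sample, that the keys are numbered SI_FOLDER_NAME1 through SI_FOLDER_NAME<SI_NUM_FOLDERS>; folder_names is a hypothetical helper, and the sample is trimmed and inlined so the snippet runs on its own:

```python
import json

# Trimmed copy of the sample response from the question.
raw = '''{
  "entries": [
    {
      "SI_ID": 31543,
      "SI_NAME": "Some Client",
      "SI_PATH": {
        "SI_FOLDER_NAME1": "COR OPS",
        "SI_FOLDER_ID1": 31414,
        "SI_FOLDER_NAME2": "CLIENT",
        "SI_FOLDER_ID2": 28178,
        "SI_NUM_FOLDERS": 2
      }
    }
  ]
}'''
data = json.loads(raw)


def folder_names(entry):
    """Hypothetical helper: list the SI_FOLDER_NAME<n> values of one entry.

    Assumes the keys are numbered 1..SI_NUM_FOLDERS, as in the sample.
    """
    path = entry["SI_PATH"]
    count = path["SI_NUM_FOLDERS"]
    return [path["SI_FOLDER_NAME%d" % i] for i in range(1, count + 1)]


for entry in data["entries"]:
    print(entry["SI_ID"], folder_names(entry))
```

If the real API does not always number the keys this way, filtering path.items() for keys that start with "SI_FOLDER_NAME" would be a looser alternative.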

python TypeError: string indices must be integers json

Can someone tell me what I am doing wrong? I am getting this error.
I went through the earlier posts about similar errors but couldn't understand them.
import json
import re
import requests
import subprocess
res = requests.get('https://api.tempura1.com/api/1.0/recipes', auth=('12345','123'), headers={'App-Key': 'some key'})
data = res.text
extracted_recipes = []
for recipe in data['recipes']:
    extracted_recipes.append({
        'name': recipe['name'],
        'status': recipe['status']
    })
print extracted_recipes
TypeError: string indices must be integers
data contains the following:
{
"recipes": {
"47635": {
"name": "Desitnation Search",
"status": "SUCCESSFUL",
"kitchen": "eu",
"active": "YES",
"created_at": 1501672231,
"interval": 5,
"use_legacy_notifications": false
},
"65568": {
"name": "Validation",
"status": "SUCCESSFUL",
"kitchen": "us-west",
"active": "YES",
"created_at": 1522583593,
"interval": 5,
"use_legacy_notifications": false
},
"47437": {
"name": "Gateday",
"status": "SUCCESSFUL",
"kitchen": "us-west",
"active": "YES",
"created_at": 1501411588,
"interval": 10,
"use_legacy_notifications": false
}
},
"counts": {
"total": 3,
"limited": 3,
"filtered": 3
}
}
You are not converting the text to json. Try
data = json.loads(res.text)
or
data = res.json()
Apart from that, you probably need to change the for loop to iterate over the values instead of the keys. Change it to something like the following:
for recipe in data['recipes'].values():
There are two problems with your code, which you could have found out by yourself by doing a very minimal amount of debugging.
The first problem is that you don't parse the response contents from json to a native Python object. Here:
data = res.text
data is a string (JSON formatted, but still a string). You need to parse it to turn it into its Python representation (in this case a dict). You can do it using the stdlib's json.loads() (general solution) or, since you're using python-requests, just by calling the Response.json() method:
data = res.json()
Then you have this:
for recipe in data['recipes']:
    # ...
Now that we have turned data into a proper dict, we can access the data['recipes'] subdict, but iterating directly over a dict actually iterates over the keys, not the values, so in your for loop above recipe will be a string ("47635", "65568", etc.). If you want to iterate over the values, you have to ask for it explicitly:
for recipe in data['recipes'].values():
    # now `recipe` is the dict you expected
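Both fixes can be tried without calling the API. This is a sketch that parses a trimmed, inlined copy of the response from the question with json.loads(); the variable text stands in for res.text:

```python
import json

# Stand-in for res.text: a trimmed copy of the response shown in the question.
text = '''{
  "recipes": {
    "47635": {"name": "Desitnation Search", "status": "SUCCESSFUL"},
    "65568": {"name": "Validation", "status": "SUCCESSFUL"}
  }
}'''

# Fix 1: parse the JSON string into a dict before indexing it.
data = json.loads(text)

# Iterating the sub-dict directly yields its keys (strings).
recipe_ids = list(data["recipes"])

# Fix 2: iterate over .values() to get the inner dicts.
extracted_recipes = []
for recipe in data["recipes"].values():
    extracted_recipes.append({
        "name": recipe["name"],
        "status": recipe["status"],
    })

print(recipe_ids)
print(extracted_recipes)
```

Using .items() instead of .values() would give access to the recipe id and the recipe dict at the same time.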

JSON parse Python

I have some JSON data that I got from VK.
{
"response": [{
"id": 156603484,
"name": "Equestria in the Space",
"screen_name": "equestriaspace",
"is_closed": 0,
"type": "group",
"is_admin": 1,
"admin_level": 3,
"is_member": 1,
"description": "Официально сообщество Equestria in the Space!",
"photo_50": "https://pp.userap...089/u0_mBSE4E34.jpg",
"photo_100": "https://pp.userap...088/O6vENP0IW_w.jpg",
"photo_200": "https://pp.userap...086/rwntMz6YwWM.jpg"
}]
}
I wanted to print only "name", but when I tried, it gave me an error:
TypeError: list indices must be integers or slices, not str
My code is:
method_url = 'https://api.vk.com/method/groups.getById?'
data = dict(access_token=access_token, gid=group_id)
response = requests.post(method_url, data)
result = json.loads(response.text)
print (result['response']['name'])
Any idea how I can fix it? On Google I found how to parse JSON with one array, but here there are two or something.
P.S. Don't be too hard on me. I am new to Python and just learning.
What sort of data structure is the value of the key response?
i.e. how would you get it if I gave you the following instead?
"response": [{
"id": 156603484,
"name": "Equestria in the Space",
"screen_name": "equestriaspace",
"is_closed": 0,
"type": "group",
"is_admin": 1,
"admin_level": 3,
"is_member": 1,
"description": "Официально сообщество Equestria in the Space!",
"photo_50": "https://pp.userap...089/u0_mBSE4E34.jpg",
"photo_100": "https://pp.userap...088/O6vENP0IW_w.jpg",
"photo_200": "https://pp.userap...086/rwntMz6YwWM.jpg"
},
{
"not_a_real_response": "just some garbage actually"
}]
You would need to pick out the first response in that array of responses, as the commenters have already pointed out.
name = result['response'][0]['name']
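A small sketch of the same idea, using a shortened, inlined copy of the two-element example above in place of the live VK response. It shows both the single-index lookup and a loop that skips elements without a "name" key:

```python
import json

# Shortened copy of the two-element example; stands in for response.text.
text = '''{
  "response": [
    {"id": 156603484, "name": "Equestria in the Space"},
    {"not_a_real_response": "just some garbage actually"}
  ]
}'''
result = json.loads(text)

# result["response"] is a list, so pick an element by integer index first.
first_name = result["response"][0]["name"]

# Or walk the whole list and keep only the elements that have a "name".
names = [group["name"] for group in result["response"] if "name" in group]

print(first_name)
print(names)
```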

What Am I Missing When Parsing this JSON Output

What am I missing when trying to parse this JSON output with Python? The JSON looks like this:
{
"start": 0,
"terms": [
"process_name:egagent.exe"
],
"highlights": [],
"total_results": 448,
"filtered": {},
"facets": {},
"results": [
{
"username": "SYSTEM",
"alert_type": "test"
},
{
"username": "SYSTEM2",
"alert_type": "test"
}
]
}
The Python I'm trying to use to access this is simple. I want to grab username, but everything I try throws an error. When it doesn't throw an error, I seem to get the letter of each one. So, if I do:
apirequest = requests.get(requesturl, headers=headers, verify=False)
readable = json.loads(apirequest.content)
#print readable
for i in readable:
    print (i[0])
I get s, t, h, t, f, f, r, which are the first letters of each item. If I try i[1], I get the second letter of each item. When I try by name, say, i["start"], I get an error saying the string indices must be integers. I'm pretty confused and I am new to Python, but I haven't found anything on this yet. Please help! I just want to access the username fields, which is why I am trying to do the for loop. Thanks in advance!
Try this:
for i in readable["results"]:
    print(i["username"])
Load your json string:
import json
s = """
{
"start": 0,
"terms": [
"process_name:egagent.exe"
],
"highlights": [],
"total_results": 448,
"filtered": {},
"facets": {},
"results": [
{
"username": "SYSTEM",
"alert_type": "test"
},
{
"username": "SYSTEM2",
"alert_type": "test"
}
]
}
"""
And print username for every result:
print [res['username'] for res in json.loads(s)['results']]
Output:
[u'SYSTEM', u'SYSTEM2']
for i in readable will iterate i through each key in the readable dictionary. If you then print i[0], you are printing the first character of each key.
Given that you want the values associated with the "username" key in the entries of the list which is associated with the "results" key, you can get them like this:
for result in readable["results"]:
    print (result["username"])
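To see why the original loop printed single letters, the sketch below parses a trimmed, inlined copy of the JSON from the question and compares iterating the dict itself with iterating the list under "results":

```python
import json

# Trimmed copy of the JSON from the question.
s = '''{
  "start": 0,
  "terms": ["process_name:egagent.exe"],
  "total_results": 448,
  "results": [
    {"username": "SYSTEM", "alert_type": "test"},
    {"username": "SYSTEM2", "alert_type": "test"}
  ]
}'''
readable = json.loads(s)

# Iterating a dict yields its keys, so i[0] is the first letter of each key.
first_letters = [i[0] for i in readable]

# Iterating the list under "results" yields the inner dicts.
usernames = [result["username"] for result in readable["results"]]

print(first_letters)
print(usernames)
```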
If readable is your JSON object (a dict), you can access its elements as you would in any map, using their keys.
readable["results"][0]["username"]
Should give you "SYSTEM" string as result.
To print every username, do:
for result in readable["results"]:
    print(result["username"])
If your JSON is a str object, you have to deserialize it with json.loads(readable) first.

SimpleJson handling of same named entities

I'm using the Alchemy API in App Engine, so I'm using the simplejson library to parse responses. The problem is that the responses have entries that share the same name:
{
"status": "OK",
"usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
"url": "",
"language": "english",
"entities": [
{
"type": "Person",
"relevance": "0.33",
"count": "1",
"text": "Michael Jordan",
"disambiguated": {
"name": "Michael Jordan",
"subType": "Athlete",
"subType": "AwardWinner",
"subType": "BasketballPlayer",
"subType": "HallOfFameInductee",
"subType": "OlympicAthlete",
"subType": "SportsLeagueAwardWinner",
"subType": "FilmActor",
"subType": "TVActor",
"dbpedia": "http://dbpedia.org/resource/Michael_Jordan",
"freebase": "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000029161",
"umbel": "http://umbel.org/umbel/ne/wikipedia/Michael_Jordan",
"opencyc": "http://sw.opencyc.org/concept/Mx4rvViVq5wpEbGdrcN5Y29ycA",
"yago": "http://mpii.de/yago/resource/Michael_Jordan"
}
}
]
}
So the problem is that "subType" is repeated, and the dict that loads returns holds just "TVActor" rather than a list. Is there any way to get around this?
RFC 4627, which defines application/json, says:
An object is an unordered collection of zero or more name/value pairs
And:
The names within an object SHOULD be unique.
It means that AlchemyAPI should not return multiple "subType" names inside the same object and claim that it is a JSON.
You could try requesting the same data in XML format (outputMode=xml) to avoid the ambiguity in the results, or convert the values of duplicate keys into lists:
import simplejson as json
from collections import defaultdict

def multidict(ordered_pairs):
    """Convert duplicate keys values to lists."""
    # read all values into lists
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    # unpack lists that have only 1 item
    for k, v in d.items():
        if len(v) == 1:
            d[k] = v[0]
    return dict(d)

# `text` is the JSON string to decode (see the example below)
print(json.JSONDecoder(object_pairs_hook=multidict).decode(text))
Example
text = """{
"type": "Person",
"subType": "Athlete",
"subType": "AwardWinner"
}"""
Output
{u'subType': [u'Athlete', u'AwardWinner'], u'type': u'Person'}
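For comparison, here is a short sketch of what happens without a custom hook. It uses the standard-library json module rather than simplejson (an assumption made so the snippet runs anywhere) and the same small example text:

```python
import json

text = '''{
  "type": "Person",
  "subType": "Athlete",
  "subType": "AwardWinner"
}'''

# Default behaviour: a later duplicate silently replaces the earlier one.
plain = json.loads(text)

# Passing list as the hook keeps every (name, value) pair in document order.
pairs = json.loads(text, object_pairs_hook=list)

print(plain)
print(pairs)
```

Keeping the raw pairs is the least lossy option; a hook such as multidict above then decides how to fold them back into a dict.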
RFC 4627, for the application/json media type, recommends unique keys but doesn't forbid duplicates explicitly:
The names within an object SHOULD be unique.
From rfc 2119:
SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
This is a known problem.
You can solve it by renaming the duplicate keys or by keeping the pairs in a list.
You can use this code if you want:
import json

def parse_object_pairs(pairs):
    """
    Take a list of (key, value) tuples and check it for duplicate keys.
    If there are duplicates, return the list of pairs itself;
    otherwise return a dict built from the pairs.
    >>> parse_object_pairs([("color", "red"), ("size", 3)])
    {'color': 'red', 'size': 3}
    >>> parse_object_pairs([("color", "red"), ("size", 3), ("color", "blue")])
    [('color', 'red'), ('size', 3), ('color', 'blue')]
    :param pairs: list of tuples.
    :return: dict or list that contains the pairs.
    """
    dict_without_duplicate = dict()
    for k, v in pairs:
        if k in dict_without_duplicate:
            # duplicate key found: hand back the raw pairs unchanged
            return pairs
        else:
            dict_without_duplicate[k] = v
    return dict_without_duplicate

decoder = json.JSONDecoder(object_pairs_hook=parse_object_pairs)
str_json_can_be_with_duplicate_keys = '{"color": "red", "size": 3, "color": "red"}'
data_after_decode = decoder.decode(str_json_can_be_with_duplicate_keys)
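Because this hook returns either a dict or a list of pairs, calling code has to handle both shapes. The sketch below uses a condensed restatement of parse_object_pairs so that it runs on its own; values_for is a hypothetical helper, not part of the answer above:

```python
import json


def parse_object_pairs(pairs):
    # Condensed restatement of the hook above: a dict when every key is
    # unique, otherwise the untouched list of (key, value) pairs.
    keys = [k for k, _ in pairs]
    return dict(pairs) if len(keys) == len(set(keys)) else pairs


def values_for(decoded, key):
    """Hypothetical helper: every value stored under key, for either shape."""
    items = decoded.items() if isinstance(decoded, dict) else decoded
    return [v for k, v in items if k == key]


decoder = json.JSONDecoder(object_pairs_hook=parse_object_pairs)
unique = decoder.decode('{"color": "red", "size": 3}')
duplicated = decoder.decode('{"color": "red", "size": 3, "color": "blue"}')

print(unique, values_for(unique, "color"))
print(duplicated, values_for(duplicated, "color"))
```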
