I'm new to Python (and coding in general). I'm querying a web service that returns JSON with information on every employee. I would like to pull just a couple of attributes for each employee, but I'm having some trouble.
I have this script so far:
import json
import urllib2
req = urllib2.Request('http://server.company.com/api')
response = urllib2.urlopen(req)
the_page = response.read()
j = json.loads(the_page)
print j[1]['name']
The JSON that it returns looks like this...
[
    {
        "name": "bill jones",
        "address": "123 something st",
        "city": "somewhere",
        "state": "somestate",
        "zip": "12345",
        "phone_number": "800-555-1234"
    },
    {
        "name": "jane doe",
        "address": "456 another ave",
        "city": "metropolis",
        "state": "ny",
        "zip": "10001",
        "phone_number": "555-555-5554"
    }
]
You can see that with the script I can return the name of the employee at index 1. But I would like something more along the lines of: print j[0 through len(j)]['name'], so that it prints the name (and preferably the phone number too) of every employee in the JSON list.
I'm fairly sure I'm approaching something wrong, but I need some feedback and direction.
Your JSON is a list of dict objects. By doing j[1], you are accessing the item in the list at index 1. To get all the records, iterate over the elements of the list:
for item in j:
    print item['name']
where j is the result of j = json.loads(the_page), as in your question.
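For readers on Python 3, here is a minimal sketch of the same loop that also picks up the phone number. The inline the_page string is a shortened stand-in for the real response body, and the contacts name is only illustrative.

```python
import json

# Stand-in for the_page; in the real script this comes from response.read().
the_page = """[
  {"name": "bill jones", "phone_number": "800-555-1234"},
  {"name": "jane doe", "phone_number": "555-555-5554"}
]"""

employees = json.loads(the_page)

# Collect (name, phone_number) pairs; .get() returns None if a key is missing.
contacts = [(emp.get("name"), emp.get("phone_number")) for emp in employees]

for name, phone_number in contacts:
    print(name, phone_number)
```

Using .get() instead of emp["name"] is a design choice: a record without a phone number yields None rather than raising a KeyError.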
Slightly nicer for mass-conversions than repeated dict lookup is using operator.itemgetter:
from future_builtins import map # Only on Py2, to get lazy, generator based map
from operator import itemgetter
for name, phone_number in map(itemgetter('name', 'phone_number'), j):
    print name, phone_number
If you only needed to look up individual things occasionally (so you didn't always need name or phone_number), then regular dict lookups would make sense. This just optimizes the case where you're always retrieving the same set of items by pushing work to builtin functions (which, on the CPython reference interpreter, are implemented in C, so they run a bit faster than hand-rolled code). Using a generator based map isn't strictly necessary, but it avoids making (potentially large) temporary lists when you're just going to iterate the result anyway.
It's basically just a faster version of:
for emp in j:
    name, phone_number = emp['name'], emp['phone_number']
    print name, phone_number
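On Python 3 the builtin map is already lazy, so the future_builtins import is not needed there. A small sketch of the same idea, with j filled in by sample records taken from the question (get_contact and pairs are illustrative names):

```python
from operator import itemgetter

j = [
    {"name": "bill jones", "phone_number": "800-555-1234", "zip": "12345"},
    {"name": "jane doe", "phone_number": "555-555-5554", "zip": "10001"},
]

# With several keys, itemgetter returns a tuple of the values in that order.
get_contact = itemgetter("name", "phone_number")

# list() is only here so the result can be inspected; a plain loop over
# map(get_contact, j) would avoid building the intermediate list.
pairs = list(map(get_contact, j))

for name, phone_number in pairs:
    print(name, phone_number)
```

Note that itemgetter raises KeyError if any record lacks one of the requested keys, just as a direct emp['name'] lookup would.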
I have a very heavily nested JSON file with multiple blocks inside it.
The following is an excerpt of the file. It has more than 6 levels of nesting like this:
{
    "title": "main questions",
    "type": "static",
    "value":
    {
        "title": "state your name",
        "type": "QUESTION",
        "locator": "namelocator"
    }
}
Can anyone help me parse this so that I can find the title and locator wherever type = QUESTION (the type may vary across different parts of the file),
and do so concurrently (sequential processing would kill the system considering the scale of the file)?
I have been using the following code to get the values of title and locator separately:
pip install jsonpath  (in the Anaconda terminal)
from jsonpath import JSONPath
import json as js
with open(f) as fh:  # f is the path to the .json file
    data = js.load(fh)
JSONPath('$.[?(@.type== "QUESTION")].locator').parse(data)
JSONPath('$.[?(@.type== "QUESTION")].title').parse(data)
The problem is:
I am getting the lists of locators and titles, but they are all jumbled, since there is no way to know the order in which the function parses the file.
I have been stuck on this problem for a while. The only solution I can see is to go across the file to find every type == QUESTION and then loop again to find the locators and titles (which is not computationally feasible for a huge batch of files).
The key is to parse once, and treat the objects you find as objects, so you group the correct title and locator together. They are easy to split if you need.
Here's a code sample demonstrating all the various answers I made in comments. I don't know what exact library you're using, but they all seem to implement the same JSONPath, so you can probably use this. Just change the function names and parameter order to fit whatever library you actually have.
from jsonpath import jsonpath
import json
text = """{
"title": "main questions",
"type": "static",
"value":
{
"title": "state your name",
"type": "QUESTION",
"locator": "namelocator"
}
}"""
# use jsonpath to find the question nodes
data = json.loads(text)
questions_parsed = jsonpath(obj=data, expr='$.[?(@.type== "QUESTION")]')
print (questions_parsed)
[{'title': 'state your name', 'type': 'QUESTION', 'locator': 'namelocator'}]
# python code to parse the same structure
def find_questions(data):
    if isinstance(data, dict):
        if 'type' in data and 'QUESTION' == data['type']:
            # TODO: write a dataclass, or validate that it has title and locator
            yield data
        elif 'value' in data and isinstance(data['value'], dict):
            value = data['value']
            yield from find_questions(value)
    elif isinstance(data, list):
        for item in data:
            yield from find_questions(item)
questions = [(question['title'], question['locator']) for question in find_questions(json.loads(text))]
Like I said, it's easy to split the one object into separate lists if you need them:
How to unzip a list of tuples into individual lists?
titles, locators = (list(t) for t in zip(*questions))
print(titles)
print(locators)
['state your name']
['namelocator']
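The find_questions function above only follows the "value" key. If the real file nests questions under other keys or inside lists, a more general walker may be needed. The following is a sketch under that assumption: the "children" and "items" keys and the second question are hypothetical, added only to exercise the recursion.

```python
import json

def iter_questions(node):
    """Yield every dict whose "type" is "QUESTION", at any nesting depth."""
    if isinstance(node, dict):
        if node.get("type") == "QUESTION":
            yield node
        # Descend into every value, not only the "value" key.
        for child in node.values():
            yield from iter_questions(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_questions(child)

# Hypothetical sample: the first question is from the excerpt, the rest is made up.
sample = json.loads("""{
  "title": "main questions",
  "type": "static",
  "value": {
    "title": "state your name",
    "type": "QUESTION",
    "locator": "namelocator"
  },
  "children": [
    {"type": "group", "items": [
      {"title": "state your age", "type": "QUESTION", "locator": "agelocator"}
    ]}
  ]
}""")

# Title and locator come from the same dict, so they can never get out of step.
pairs = [(q.get("title"), q.get("locator")) for q in iter_questions(sample)]
print(pairs)
```

Because each question is handled as one object in a single pass, the pairing problem from the question does not arise, and the file is traversed only once.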
I used this implementation:
pip show jsonpath
Name: jsonpath
Version: 0.82
Summary: An XPath for JSON
Home-page: http://www.ultimate.com/phil/python/#jsonpath
Author: Phil Budne
Author-email: phil@ultimate.com
License: MIT
I have a json whose first few lines are:
{
"type": "Topology",
"objects": {
"counties": {
"type": "GeometryCollection",
"bbox": [-179.1473399999999, 17.67439566600018, 179.7784800000003, 71.38921046500008],
"geometries": [{
"type": "MultiPolygon",
"id": 53073,
"arcs": [
[
[0, 1, 2]
]
]
},
I built a python dictionary from that data as follows:
import json
with open('us.json') as f:
    data = json.load(f)
It's a very long json (each county in the US). Yet when I run: len(data) it returns 4. I was a bit confused by that. So I set out to probe further and explore the data:
data['id']
data['geometry']
Both of these raise a KeyError. Yet I know that this JSON file defines those properties. In fact, that's all the JSON is: the id for each county ('id') and a series of polygon coordinates for each county ('geometry'). Entering data does indeed return the whole JSON, and I can see the properties that way, but that doesn't help much.
My ultimate aim is to add a property to the json file, somewhat similar to this:
Add element to a json in python
The difference is I'm adding a property that is from a tsv. If you'd like all the details you may find my json and tsv here:
https://gist.github.com/diggetybo/ca9d3c2fed76ddc7185cf966a65b8718
For clarity, let me summarize what I'm asking:
My question is: Why can't I access the properties in the above way? Can someone provide a way to access the properties I'm interested in ('id','geometries') Or better yet, demonstrate how to add a property?
Thank you
json.load
Deserialize fp (a .read()-supporting file-like object containing a
JSON document) to a Python object using this conversion table.
[] denotes a list and {} denotes a dictionary. So this is an example that gets each id:
with open("us.json") as f:
    c = json.load(f)
for i in c["objects"]["counties"]["geometries"]:
    print(i["id"])
And the structure of your data is like this:
{
"type":"xx",
"objects":"xx",
"arcs":"xx",
"transform":"xx"
}
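So len(data) is 4: it counts only the top-level keys. The 'id' key is nested under objects, counties, geometries, which is why data['id'] raises a KeyError. You can append data or add a new element just as you would with any list or dict. See the json module documentation for more details.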
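Since the question also asks how to add a property, here is a hedged sketch. It assumes the TSV has an "id" column matching the county ids plus one value column; the column name "rate", the sample values, and the second county are all hypothetical, and the small data dict stands in for the parsed us.json.

```python
import csv
import io

# Tiny stand-in for the dict produced by json.load(f) on us.json.
data = {
    "type": "Topology",
    "objects": {
        "counties": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "MultiPolygon", "id": 53073, "arcs": [[[0, 1, 2]]]},
                {"type": "Polygon", "id": 30105, "arcs": [[3, 4]]},
            ],
        }
    },
}

# Hypothetical TSV content; a real script would open the .tsv file instead.
tsv_text = "id\trate\n53073\t0.078\n"

# Build an id -> value lookup once, then attach the value to each geometry.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rates = {int(row["id"]): float(row["rate"]) for row in reader}

for geometry in data["objects"]["counties"]["geometries"]:
    # Counties missing from the TSV get None rather than raising KeyError.
    geometry["rate"] = rates.get(geometry["id"])
```

The modified dict can then be written back out with json.dump(data, f) on a file opened for writing.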
Hope this helps.
I've got the following data in a CSV file (a few hundred lines) that I'm trying to massage into sensible JSON to post into a rest api
I've gone with the bare minimum fields required, but here's what I've got:
dateAsked,author,title,body,answers.author,answers.body,topics.name,answers.accepted
13-Jan-16,Ben,Cant set a channel ,"Has anyone had any issues setting channels. it stays at '0'. It actually tells me there are '0' files.",Silvio,"I'm not sure. I think you can leave the cable out, because the control works. But you could try and switch two port and see if problem follows the serial port. maybe 'extended' clip names over 32 characters.
Please let me know if you find out!
Best regards.",club_k,TRUE
Here's a sample of JSON that is roughly like where I need to get to:
json_test = """{
"title": "Can I answer a question?",
"body": "Some text for the question",
"author": "Silvio",
"topics": [
{
"name": "club_k"
}
],
"answers": [
{
"author": "john",
"body": "I\'m not sure. I think you can leave the cable out. Please let me know if you find out! Best regards.",
"accepted": "true"
}
]
}"""
Pandas seems to import it into a dataframe okay (ish) but keeps telling me I can't serialize it to json - also need to clean it and sanitise, but that should be fairly easy to achieve within the script.
There must also be a way to do this in Pandas, but I'm beating my head against a wall here - as the columns for both answers and topics can't easily be merged together into a dict or a list in python.
You can use a csv.DictReader to process the CSV file as a dictionary for each row. Using the field names as keys, a new dictionary can be constructed that groups common keys into a nested dictionary keyed by the part of the field name after the dot. The nested dictionary is held within a list, although it is unclear whether that is really necessary: the nested dictionary could probably be placed immediately under the top level without requiring a list. Here's the code to do it:
import csv
import json
json_data = []
for row in csv.DictReader(open('/tmp/data.csv')):
    data = {}
    for field in row:
        key, _, sub_key = field.partition('.')
        if not sub_key:
            data[key] = row[field]
        else:
            if key not in data:
                data[key] = [{}]
            data[key][0][sub_key] = row[field]
    # print(json.dumps(data, indent=True))
    # print('---------------------------')
    json_data.append(json.dumps(data))
For your data, with the print() statements enabled, the output would be:
{
"body": "Has anyone had any issues setting channels. it stays at '0'. It actually tells me there are '0' files.",
"author": "Ben",
"topics": [
{
"name": "club_k"
}
],
"title": "Cant set a channel ",
"answers": [
{
"body": "I'm not sure. I think you can leave the cable out, because the control works. But you could try and switch two port and see if problem follows the serial port. maybe 'extended' clip names over 32 characters. \nPlease let me know if you find out!\n Best regards.",
"accepted ": "TRUE",
"author": "Silvio"
}
],
"dateAsked": "13-Jan-16"
}
---------------------------
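The question also mentions cleaning the data, and the "accepted " key in the output above carries a trailing space from the header. A minimal sketch of one way to tidy both keys and values while grouping follows. The inline CSV text is a shortened stand-in for the real file, the clean helper is hypothetical, and converting TRUE/FALSE to JSON booleans assumes the target API accepts booleans rather than strings.

```python
import csv
import io
import json

# Shortened stand-in for the real file; note the stray space after "accepted".
csv_text = (
    "title,author,answers.author,answers.accepted ,topics.name\n"
    "Cant set a channel ,Ben,Silvio,TRUE,club_k\n"
)

def clean(value):
    """Strip surrounding whitespace and turn TRUE/FALSE into booleans."""
    value = value.strip()
    if value.upper() in ("TRUE", "FALSE"):
        return value.upper() == "TRUE"
    return value

records = []
for row in csv.DictReader(io.StringIO(csv_text)):
    record = {}
    for field, value in row.items():
        # Strip the header too, so "answers.accepted " becomes "accepted".
        key, _, sub_key = field.strip().partition(".")
        if sub_key:
            record.setdefault(key, [{}])[0][sub_key] = clean(value)
        else:
            record[key] = clean(value)
    records.append(record)

print(json.dumps(records[0], indent=2))
```

Keeping the cleaning in one small function makes it easy to extend later, for example to normalise the date format.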
I'm sorry if this has been answered (I looked and did not find anything.) Please let me know and I will delete immediately.
I am writing a program that makes an API call which returns multiple lists of different lengths depending on the call (e.g. a Facebook API call: enter the person's name and a list of pictures is returned, and each picture has a list of who "liked" it. I want to store a list of lists of these "likes").
# Import urllib for the API request
import urllib.request
import urllib.parse
# First, a function that takes two arguments: first and last name.
# It returns a list of all photos the person has been tagged in on Facebook.
def id_list_generator(first, last):
    # Please note I don't actually know the Facebook API; this part will not be reproducible
    pic_id_request = urllib.request.urlopen(f'www.facebook.com/pics/id/term={first}+{last}[person]')
    pic_id_list = pic_id_request.read()
    id_list = []
    for i in pic_id_list:
        id_list.append(i)
    return id_list
# Now, for each ID of a picture, I will generate a list of people who "liked" that picture.
# This is where I have trouble. I don't know how to store these lists of lists.
for i in id_list:
    pic_list = urllib.request.urlopen(f'www.facebook.com/pics/id/like/term={i}[likes]')
    print(pic_list)
This would print multiple lists of "likes" for each picture the person was tagged in:
foo, bar
bar, baz
baz, foo, qux
norf
I don't really know how to store these honestly.
I was thinking of using a list that would look like this after appending:
foo = [["foo", "bar"], ["bar","baz"],["baz","foo","qux"],["norf"]]
But really I'm not sure what type of storage to use in this case. I thought of using a dictionary of a dictionary, but I don't know if the key can be iterable. I feel like there is a simple answer to this that I am missing.
Well, you could have a list of dictionaries. Here's an example:
facebook_likes = [{
    "first_name": "John",
    "last_name": "Smith",
    "image_link": "link",
    "likes": ["foo"]
}, {
    "first_name": "John",
    "last_name": "Doe",
    "image_link": "link",
    "likes": ["foo", "bar"]
}]
for like in facebook_likes:
    print(like)
    print(like["likes"])
    print(like["likes"][0])
You should also look into JSON objects.
JSON is one of the standard response formats you get back from API calls.
Fortunately, it's very simple to transform a Python dict into a JSON object and vice versa.
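As a sketch of both ideas together, the likes can be stored in a dict keyed by picture ID and then round-tripped through JSON. The picture IDs here are hypothetical. On the question of keys: a dict key has to be hashable (a string, number, or tuple works; a list does not), while the values can be lists of any length.

```python
import json

# Hypothetical picture IDs mapped to the lists of likes from the question.
likes_by_picture = {
    "pic_1": ["foo", "bar"],
    "pic_2": ["bar", "baz"],
    "pic_3": ["baz", "foo", "qux"],
}

# setdefault creates the list on first use, so lists can be built incrementally.
likes_by_picture.setdefault("pic_4", []).append("norf")

# dict -> JSON text -> dict
as_json = json.dumps(likes_by_picture)
restored = json.loads(as_json)

print(restored["pic_3"])
```

One thing to keep in mind with the round trip: JSON object keys are always strings, so integer keys would come back as strings after json.loads.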
If you just want to sort by the first element in each list, Python does that by default for 2D lists. Refer to this thread: Python sort() first element of list
I started reading about underscore.js today. It is a library for JavaScript that adds some functional programming goodies I'm used to using in Python. One pretty cool shorthand method is pluck.
Indeed in Python I often need to pluck out some specific attribute, and end up doing this:
users = [{
"name" : "Bemmu",
"uid" : "297200003"
},
{
"name" : "Zuck",
"uid" : "4"
}]
uids = map(lambda x:x["uid"], users)
If the underscore shorthand is somewhere in Python, this would be possible:
uids = pluck(users, "uid")
It's of course trivial to add, but is that in Python somewhere already?
Just use a list comprehension in whatever function is consuming uids:
instead of
uids = map(operator.itemgetter("uid"), users)
foo(uids)
do
foo([x["uid"] for x in users])
If you just want uids to iterate over, you don't need to make a list -- use a generator instead. (Replace [] with ().)
For example:
def print_all(it):
    """ Trivial function."""
    for i in it:
        print(i)
print_all(x["uid"] for x in users)
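If a named helper is still preferred, it takes only a couple of lines to write. This is a hypothetical pluck function, not something from the standard library, using the underscore.js argument order (collection first, key second):

```python
from operator import itemgetter

def pluck(records, key):
    """Return the value stored under `key` in each mapping, in order."""
    return list(map(itemgetter(key), records))

users = [
    {"name": "Bemmu", "uid": "297200003"},
    {"name": "Zuck", "uid": "4"},
]

uids = pluck(users, "uid")
print(uids)
```

Returning a list keeps the result reusable; dropping the list() call would make the helper lazy on Python 3, in line with the generator advice above.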
The funcy module (https://github.com/Suor/funcy) provides a pluck function.
Provided that funcy is available on your host, the following code should work as expected:
from funcy import pluck
users = [{
"name" : "Bemmu",
"uid" : "297200003"
},
{
"name" : "Zuck",
"uid" : "4"
}]
uids = pluck("uid", users)
Note that the order of arguments differs from the one used by underscore.js.