Python web scraping nested dict key pairs - AttributeError - python

I'm attempting to scrape PGA stats from the API below.
url = 'https://statdata.pgatour.com/r/stats/current/02671.json?userTrackingId=exp=1594257225~acl=*~hmac=464d3dfcda2b2ccb384b77ac7241436f25b7284fb2eb0383184f48cbdff33cc4'
response = requests.get(url)
pga_stats = response.json()
I would like to select only the nested keys identified in this image. I've been able to traverse to the 'years' key with the code below, but I receive the following AttributeError for anything beyond that.
test = pga_stats.get('tours')[0].get('years')
(prints reduced dictionary)
test = pga_stats.get('tours')[0].get('years').get('stats')
'list' object has no attribute 'get'
My end goal is to write this player data to a csv file. Any suggestions would be greatly appreciated.

pga_stats.get('tours')[0].get('years') returns a list, not a dict, and lists have no get method. You actually want to call get on its first element, like this:
test = pga_stats.get('tours')[0].get('years')[0].get('stats')
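Since the end goal is a csv file, here is a minimal sketch of how the traversal and the write step could fit together. It only assumes the nesting named in the question (tours is a list, years is a list, stats is a list of dicts); the sample payload, the "player" and "value" keys, and the helper names first_stats and rows_to_csv are made up for illustration and will not match the real API response.

```python
import csv
import io

# Hypothetical sample mirroring only the nesting named in the question:
# tours (list) -> years (list) -> stats (list). The keys inside each
# stat row ("player", "value") are invented for illustration.
pga_stats = {
    "tours": [
        {"years": [
            {"stats": [
                {"player": "Player A", "value": "70.1"},
                {"player": "Player B", "value": "70.4"},
            ]}
        ]}
    ]
}


def first_stats(payload):
    """Return the stats list of the first year of the first tour, or []."""
    tours = payload.get("tours") or []
    if not tours:
        return []
    years = tours[0].get("years") or []
    if not years:
        return []
    return years[0].get("stats") or []


def rows_to_csv(rows, fieldnames, out):
    """Write a list of dicts to the file-like object `out` as CSV."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


rows = first_stats(pga_stats)
# An in-memory buffer keeps the sketch self-contained; to write a real file,
# pass open("stats.csv", "w", newline="") instead.
buffer = io.StringIO()
rows_to_csv(rows, ["player", "value"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Indexing with [0] only picks the first tour and the first year; if you need every year, loop over the years list instead.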

Related

how to get nested data with pandas and request

I'm going crazy trying to get data through an API call using requests and pandas. It looks like it's nested data, but I can't get the data I need.
https://xorosoft.docs.apiary.io/#reference/sales-orders/get-sales-orders
Above is the API documentation. I'm just trying to keep it simple and get the ItemNumber and QtyRemainingToShip, but I can't even figure out how to access the nested data. I'm trying to use a DataFrame to get it, but I'm just lost. Any help would be appreciated. I keep getting stuck at the 'Data' level.
type(json['Data'])
df = pd.DataFrame(['Data'])
df.explode('SoEstimateHeader')
df.explode('SoEstimateHeader')
Cell In [64], line 1
df.explode([0:])
^
SyntaxError: invalid syntax
I used the link to grab a sample response from the API documentation page you provided. From the code you posted, it looks like you are already able to get the data, and I'm assuming that you already have it as a dictionary.
From what I can tell, I don't think you need pandas here, unless it's a downstream requirement of the task you are doing. To get the ItemNumber and QtyRemainingToShip you can use the code below.
# get the interesting part of the data out of the api response
data_list = json['Data']
#the data_list is only one element long, so grab the first element which is of type dictionary
data = data_list[0]
# the dictionary has two keys at the top level
so_estimate_header = data['SoEstimateHeader']
# similar to the data list the value associated with "SoEstimateItemLineArr" is of type list and has 1 element in it, so we grab the first & only element.
so_estimate_item_line_arr = data['SoEstimateItemLineArr'][0]
# now we can grab the pieces of information we're interested in out of the dictionary
qtyremainingtoship = so_estimate_item_line_arr["QtyRemainingToShip"]
itemnumber = so_estimate_item_line_arr["ItemNumber"]
print("QtyRemainingToShip: ", qtyremainingtoship)
print("ItemNumber: ", itemnumber)
Output
QtyRemainingToShip: 1
ItemNumber: BC
Side Note
As a side note, I wouldn't name any variable json, because that's also the name of the standard Python module for parsing JSON. It will confuse future readers, and the variable will shadow the module if you ever have to import json.
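The code above grabs only the first order and the first line item. If a response can hold several of each, a loop over both lists is a natural extension. This is only a sketch: the key names come from the answer above, but the sample payload (including the second line item "XY") and the helper name remaining_by_item are invented for illustration.

```python
# Hypothetical sample shaped like the structure described above; the
# values, and the second line item, are invented for illustration.
payload = {
    "Data": [
        {
            "SoEstimateHeader": {},
            "SoEstimateItemLineArr": [
                {"ItemNumber": "BC", "QtyRemainingToShip": 1},
                {"ItemNumber": "XY", "QtyRemainingToShip": 3},
            ],
        }
    ]
}


def remaining_by_item(payload):
    """Collect (ItemNumber, QtyRemainingToShip) for every line of every order."""
    pairs = []
    for order in payload.get("Data", []):
        for line in order.get("SoEstimateItemLineArr", []):
            pairs.append((line.get("ItemNumber"), line.get("QtyRemainingToShip")))
    return pairs


pairs = remaining_by_item(payload)
for item_number, qty in pairs:
    print(f"ItemNumber: {item_number}  QtyRemainingToShip: {qty}")
```

Using .get with a default keeps the loop from raising if an order has no line items, at the cost of silently skipping it.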

From an array of objects, print a list of only one object property (Python)

This is my first time using Python and I'm tasked with the following: print a list of cities from this JSON: http://jsonplaceholder.typicode.com/users
I'm trying to print out a list that should read:
Gwenborough
Wisokyburgh
McKenziehaven
South Elvis
etc.
This is the code I have so far:
import json
import requests
response = requests.get("https://jsonplaceholder.typicode.com/users")
users = json.loads(response.text)
print(users)
When I run $python3 -i api.py (file is named api.py) I'm able to print the list from the JSON file in my terminal. However I have been stuck trying to figure out how to print the cities only. I'm assuming it would look something like users.address.city but any attempt at figuring out the code has resulted in the following error: AttributeError: 'list' object has no attribute 'address'.
Any help you could provide would be greatly appreciated. Thanks!
As users is a list, it should be:
print(users[0]['address']['city'])
This is how you can access nested properties in JSON response.
You can also loop over the users and print their city in the same format.
for user in users:
    print(user['address']['city'])
You can get city name with user['address']['city']
and use loop to get all city names
like this
for user in users:
    print(user['address']['city'])
output :
Gwenborough
Wisokyburgh
McKenziehaven
South Elvis
Roscoeview
South Christy
Howemouth
Aliyaview
Bartholomebury
Lebsackbury
[Program finished]
First of all, there is no need to call json.loads(response.text): the requests package has a built-in .json() method that parses the response body for you. You could do something like this:
response = requests.get("https://jsonplaceholder.typicode.com/users")
data = response.json()
# optional
print(data)
# data is a list of user dicts, so loop over it to get all the cities
for dt in data:
    print(dt['address']['city'])
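If you want the cities as a list rather than printed one by one, a list comprehension is a compact option. The sketch below uses a small inline sample in the same shape as the /users response so it runs without a network call; the "name" values and the third entry without an address are invented to show how a missing key could be skipped.

```python
# Inline sample in the same shape as the /users response, trimmed to the
# fields used here; the third entry (no address) is invented to show the guard.
users = [
    {"name": "User 1", "address": {"city": "Gwenborough"}},
    {"name": "User 2", "address": {"city": "Wisokyburgh"}},
    {"name": "User 3"},
]

# Keep only users that actually have an address with a city.
cities = [
    user["address"]["city"]
    for user in users
    if "city" in user.get("address", {})
]
print("\n".join(cities))
```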

Can't Get Python To Parse JSON From Site

I'm trying to get my Python script to parse some data (the price) from a specific json file on a site, but I am unable to get it working.
It can extract the whole page fine, but it cannot extract certain data just by itself.
Here is the JSON I am trying to extract data from:
[{
"id": 1696146,
"name": "Genos",
"photo_url": "https://hobbydb-production.s3.amazonaws.com/processed_uploads/collectible_photo/collectible_photo/image/324461/1556082253-24867-7610/Genos_Vinyl_Art_Toys_60fb245b-1af9-4ad1-a5a2-c90d3e8291a6_medium.jpg",
"preorder": false,
"price": "$40.00",
"price_after_discount": "$40.00",
"seller_username": "BatmanPajamas",
"url": "https://www.hobbydb.com/marketplaces/2/cart/1696146"
}]
Here is the code I have got that allows me to get the entire json:
import urllib.request, json
with urllib.request.urlopen("https://www.hobbydb.com/api/collectibles/for_sale_search?limit=5&original_site_id=10748&market_id=2") as url:
    data = json.loads(url.read().decode())
    print(data)
I have tried various pieces of code, but every time I get:
TypeError: list indices must be integers or slices, not str
Any ideas how I can parse the price from this JSON?
The outer brackets ([]) indicate that the response is a list of items. So you need to loop over the items in the list; each item is a dict, and you can then access its keys. Here's how I do it with requests:
import requests
resp = requests.get("https://www.hobbydb.com/api/collectibles/for_sale_search?limit=5&original_site_id=10748&market_id=2")
# requests has built-in support for JSON, so there is no need to import the json module
for product in resp.json():
    print(product["price"])
To iterate over the JSON array:
for item in data:
    for keys in item.keys():
        print(item[keys])
To display only the price:
for item in data:
    print(item['price'])
I think the problem you are having is because this JSON object starts with an array (which will be a list once we load it as a Python object). First, you need to use the json library from the standard lib. Then, you have to access the object using the list index, then the dict keys.
Try this:
import urllib.request, json
with urllib.request.urlopen("https://www.hobbydb.com/api/collectibles/for_sale_search?limit=5&original_site_id=10748&market_id=2") as url:
    data = json.loads(url.read().decode())
    print(data)
toy = data[0]
price = toy['price']
Also, keep in mind that the with statement only manages the connection: it closes url when the block ends. It does not create a new variable scope, so data, toy and price remain available to the code that follows the block.
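Once you have the price, note that it is a string such as "$40.00", not a number. A rough sketch of turning it into something you can do arithmetic with is below. It parses a trimmed copy of the sample JSON from the question instead of calling the site, and the helper name price_to_decimal is made up; it assumes prices always look like "$" followed by digits, which may not hold for every listing.

```python
import json
from decimal import Decimal

# The sample response from the question, trimmed to the fields used here.
raw = '[{"id": 1696146, "name": "Genos", "price": "$40.00"}]'

data = json.loads(raw)  # a JSON array becomes a Python list of dicts


def price_to_decimal(text):
    """Convert a price string such as "$40.00" to a Decimal (assumes a leading "$")."""
    return Decimal(text.lstrip("$").replace(",", ""))


# Map each item's name to its numeric price.
prices = {item["name"]: price_to_decimal(item["price"]) for item in data}
print(prices)
```

Decimal is used rather than float so that currency amounts are not subject to binary rounding.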

Python http request and loop over contents of JSON

I'm trying to learn Python and have following problem:
I get an error while running this as it cannot see the 'name' attribute in data.
It works when I grab one by one items from JSON. However when I want to do it in a loop it fails.
I assume my error is wrong request. That it cannot read JSON correctly and see attributes.
import requests
import json
def main():
    req = requests.get('http://pokeapi.co/api/v2/pokemon/')
    print("HTTP Status Code: " + str(req.status_code))
    print(req.headers)
    json_obj = json.loads(req.content)
    for i in json_obj['name']:
        print(i)
if __name__ == '__main__':
    main()
You want to access the name key of each entry in the results list of your json_obj, like this:
for pokemon in json_obj['results']:
    print(pokemon['name'])
I was able to guess that you want to access the results keys because I have looked at the result of
json_obj.keys()
that is
dict_keys(['count', 'previous', 'results', 'next'])
All the pokemon are stored in a list under the key results, so you first need to get that list and then iterate over it.
for result in json_obj['results']:
    print(result['name'])
A couple of things: as already mentioned, iterating through json_obj['name'] doesn't really make sense - use json_obj['results'] instead.
Also, you can use req.json() which is a method that comes with the requests library by default. That will turn the response into a dictionary which you can then iterate through as usual (.iteritems() or .items(), depending if you're using Python 2 or 3).
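To make the "look at the keys first" approach concrete, here is a small sketch that runs without a network call. The top-level keys match the json_obj.keys() output quoted above; the two entries under "results" are placeholder values, not actual API output.

```python
# Hypothetical sample in the shape reported by json_obj.keys() above;
# the entries under "results" are placeholders.
json_obj = {
    "count": 2,
    "previous": None,
    "next": None,
    "results": [
        {"name": "bulbasaur"},
        {"name": "ivysaur"},
    ],
}

# Looking at the top-level keys first shows where the list lives.
top_level_keys = sorted(json_obj.keys())
print(top_level_keys)

# "results" holds a list of dicts, so iterate over it and read "name" from each.
names = [entry["name"] for entry in json_obj.get("results", [])]
print(names)
```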

How to print same dictionary object from multiple urls with grequest?

I have a list of URLs that all use the same JSON structure. I am trying to pull specific dictionary objects from all of the URLs at once with grequests. I am able to do it with one URL, though there I am using requests:
import requests
import json
main_api = 'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-1ST&type=both&depth=50'
json_data = requests.get(main_api).json()
Quantity = json_data['result']['buy'][0]['Quantity']
Rate = json_data['result']['buy'][0]['Rate']
Quantity_2 = json_data['result']['sell'][0]['Quantity']
Rate_2 = json_data['result']['sell'][0]['Rate']
print ("Buy")
print(Rate)
print(Quantity)
print ("")
print ("Sell")
print(Rate_2)
print(Quantity_2)
I want to be able to print what I printed above, for every URL. But I do not know where to begin. This is what I have so far:
import grequests
import json
urls = [
'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-1ST&type=both&depth=50',
'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-2GIVE&type=both&depth=50',
'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-ABY&type=both&depth=50',
]
requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests)
I thought it would be something like print(response.json(['result']['buy'][0]['Quantity'] for response in responses)) but that does not work at all, and python returns the following: print(responses.json(['result']['buy'][0]['Quantity'] for response in responses)) AttributeError: 'list' object has no attribute 'json'. I am very new to python, and coding in general, and I would appreciate any help.
Your responses variable is a list of Response objects. If you simply print the list with
print(responses)
it gives you
[<Response [200]>, <Response [200]>, <Response [200]>]
The brackets [] tell you that this is a list, and it contains three Response objects.
When you type responses.json(...) you are telling Python to call the json() method on the list object. The list, however, does not offer such a method; only the objects in the list have it.
What you need to do is access an element in the list and call the json() method on that element. This is done by specifying the position of the list element you want to access, like this:
print(responses[0].json()['result']['buy'][0]['Quantity'])
This will access the first element in the responses list.
Of course, it is not practical to access each list element individually if you want to output many items. That's why there are loops. Using a loop you can simply say: do this for each element in my list. This looks like this:
for response in responses:
    print("Buy")
    print(response.json()['result']['buy'][0]['Quantity'])
    print(response.json()['result']['buy'][0]['Rate'])
    print("Sell")
    print(response.json()['result']['sell'][0]['Quantity'])
    print(response.json()['result']['sell'][0]['Rate'])
    print("----")
The for-each-loop executes the indented lines of code for each element in the list. The current element is available in the response variable.
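Two refinements may be worth considering: parse each response once instead of calling json() four times, and guard against failed requests, since as far as I know grequests.map puts None in the list for a request that failed. The sketch below works on already parsed payloads so it runs without grequests; the helper names top_of_book and describe and the sample numbers are invented for illustration.

```python
def top_of_book(order_book):
    """Return the first buy and sell entries from a parsed order book dict."""
    result = order_book["result"]
    return result["buy"][0], result["sell"][0]


def describe(order_books):
    """Build the output lines for each parsed response; None marks a failed request."""
    lines = []
    for book in order_books:
        if book is None:
            lines.append("request failed")
            continue
        buy, sell = top_of_book(book)
        lines += ["Buy", str(buy["Rate"]), str(buy["Quantity"]),
                  "Sell", str(sell["Rate"]), str(sell["Quantity"]), "----"]
    return lines


# Hypothetical parsed payloads standing in for
# [r.json() if r is not None else None for r in responses].
sample = [
    {"result": {"buy": [{"Quantity": 1.5, "Rate": 0.5}],
                "sell": [{"Quantity": 2.0, "Rate": 0.25}]}},
    None,
]
lines = describe(sample)
print("\n".join(lines))
```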
