How to print the same dictionary object from multiple URLs with grequests? - python

I have a list of URLs that all use the same JSON structure. I am trying to pull specific dictionary objects from all of the URLs at once with grequests. I am able to do it with one URL, though there I am using requests:
import requests
import json
main_api = 'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-1ST&type=both&depth=50'
json_data = requests.get(main_api).json()
Quantity = json_data['result']['buy'][0]['Quantity']
Rate = json_data['result']['buy'][0]['Rate']
Quantity_2 = json_data['result']['sell'][0]['Quantity']
Rate_2 = json_data['result']['sell'][0]['Rate']
print ("Buy")
print(Rate)
print(Quantity)
print ("")
print ("Sell")
print(Rate_2)
print(Quantity_2)
I want to be able to print what I printed above, for every URL. But I do not know where to begin. This is what I have so far:
import grequests
import json
urls = [
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-1ST&type=both&depth=50',
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-2GIVE&type=both&depth=50',
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-ABY&type=both&depth=50',
]
requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests)
I thought it would be something like print(response.json(['result']['buy'][0]['Quantity'] for response in responses)) but that does not work at all, and Python returns the following:
print(responses.json(['result']['buy'][0]['Quantity'] for response in responses))
AttributeError: 'list' object has no attribute 'json'
I am very new to Python, and coding in general, and I would appreciate any help.

Your responses variable is a list of Response objects. If you simply print the list with
print(responses)
it gives you
[<Response [200]>, <Response [200]>, <Response [200]>]
the brackets [] tell you that this is a list, and it contains three Response objects.
When you type responses.json(...) you are telling Python to call the json() method on the list object. The list, however, does not offer such a method; only the objects inside the list do.
What you need to do is access an element in the list and call the json() method on that element. This is done by specifying the position of the list element you want to access, like this:
print(responses[0].json()['result']['buy'][0]['Quantity'])
This will access the first element in the responses list.
Of course, it is not practical to access each list element individually if you want to output many items. That's why there are loops. Using a loop, you can simply say: do this for each element in my list. It looks like this:
for response in responses:
    json_data = response.json()  # parse the body once per response
    print("Buy")
    print(json_data['result']['buy'][0]['Quantity'])
    print(json_data['result']['buy'][0]['Rate'])
    print("Sell")
    print(json_data['result']['sell'][0]['Quantity'])
    print(json_data['result']['sell'][0]['Rate'])
    print("----")
The for-each-loop executes the indented lines of code for each element in the list. The current element is available in the response variable.
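The loop prints the numbers but not which market they belong to. As a small extension (a sketch, not part of the original answer), you can pair each response with the URL it came from, since grequests.map() returns the responses in the same order as the requests you pass in:
import grequests
urls = [
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-1ST&type=both&depth=50',
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-2GIVE&type=both&depth=50',
    'https://bittrex.com/api/v1.1/public/getorderbook?market=BTC-ABY&type=both&depth=50',
]
responses = grequests.map(grequests.get(u) for u in urls)
for url, response in zip(urls, responses):
    if response is None:  # grequests.map() leaves None in the list when a request fails
        print(url, "failed")
        continue
    json_data = response.json()  # parse the body once and reuse it
    print(url)
    print("Buy:", json_data['result']['buy'][0]['Rate'], json_data['result']['buy'][0]['Quantity'])
    print("Sell:", json_data['result']['sell'][0]['Rate'], json_data['result']['sell'][0]['Quantity'])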

Related

Input variable name as raw string into request in python

I am very new to Python.
I am trying to loop through a URL request via Python, and I want to change one variable each time it loops.
My code looks something like this:
codes = ["MCDNDF3","MCDNDF4"]
#count = 0
for x in codes:
    response = requests.get(url_part1 + str(codes) + url_part3, headers=headers)
    print(response.content)
    print(response.status_code)
    print(response.url)
I want the URL to change on every loop, to url_part1+code+url_part3 and then url_part1+NEXTcode+url_part3.
Sadly my request mangles the string from the variable into "%5B'MCDNDF3'%5D".
It should be inserted as a plain string on each loop. I don't know if I need URL encoding, as I don't have any special characters in the request; the code should just change to MCDNDF3 and, in the next request, to MCDNDF4.
Any thoughts?
Thanks!
In your for loop, the first line should be:
response = requests.get(url_part1 + x + url_part3, headers=headers)
This will work assuming url_part1 and url_part3 are regular strings. x is already a string, as your codes list (at least in your example) contains only strings. %5B and %5D are [ and ] URL-encoded, respectively. You got that mangled string because you called str() on a single-element list:
>>> str(["This is a string"])
"['This is a string']"
If url_part1 and url_part3 are raw strings, as you seem to indicate, please update your question to show how they are defined. Feel free to use example.com if you don't want to reveal your actual target URL. You should probably be calling str() on them before constructing the full URL.
You're passing in the whole list (codes) when you probably want x.
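To make this concrete, here is a minimal sketch of the corrected loop. The url_part1, url_part3 and headers values below are hypothetical placeholders, since the question does not show how they are defined:
import requests
url_part1 = 'https://example.com/lookup?code='  # hypothetical placeholder
url_part3 = '&format=json'                      # hypothetical placeholder
headers = {'User-Agent': 'my-script'}           # hypothetical placeholder
codes = ["MCDNDF3", "MCDNDF4"]
for x in codes:
    # use the loop variable x, not the whole codes list
    response = requests.get(url_part1 + x + url_part3, headers=headers)
    print(response.url)  # ...code=MCDNDF3... on the first pass, ...code=MCDNDF4... on the second
    print(response.status_code)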

Python web scraping nested dict key pairs - AttributeError

I'm attempting to scrape PGA stats from the API below.
url = 'https://statdata.pgatour.com/r/stats/current/02671.json?userTrackingId=exp=1594257225~acl=*~hmac=464d3dfcda2b2ccb384b77ac7241436f25b7284fb2eb0383184f48cbdff33cc4'
response = requests.get(url)
pga_stats = response.json()
I would like to select only the nested keys identified in this image. I've been able to traverse to the 'years' key with the code below, but I receive the following AttributeError for anything beyond that.
test = pga_stats.get('tours')[0].get('years')
(prints reduced dictionary)
test = pga_stats.get('tours')[0].get('years').get('stats')
'list' object has no attribute 'get'
My end goal is to write this player data to a csv file. Any suggestions would be greatly appreciated.
pga_stats.get('tours')[0].get('years') returns a list, not a dict. You actually want to call the get method on its first element, like this:
test = pga_stats.get('tours')[0].get('years')[0].get('stats')
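For the CSV part of the question, a minimal sketch with csv.DictWriter could look like this. It reuses the pga_stats object from the question and assumes the entries in the stats list are flat dictionaries that all share the same keys; the real payload may be nested more deeply, so adjust the row handling to the actual structure:
import csv
stats = pga_stats.get('tours')[0].get('years')[0].get('stats')
with open('pga_stats.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=sorted(stats[0].keys()))
    writer.writeheader()
    for entry in stats:  # assumes every entry is a flat dict with the same keys
        writer.writerow(entry)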

Can't Get Python To Parse JSON From Site

I'm trying to get my Python script to parse some data (the price) from a specific json file on a site, but I am unable to get it working.
It can extract the whole page fine, but it cannot extract certain data just by itself.
Here is the JSON I am trying to extract data from:
[{
    "id": 1696146,
    "name": "Genos",
    "photo_url": "https://hobbydb-production.s3.amazonaws.com/processed_uploads/collectible_photo/collectible_photo/image/324461/1556082253-24867-7610/Genos_Vinyl_Art_Toys_60fb245b-1af9-4ad1-a5a2-c90d3e8291a6_medium.jpg",
    "preorder": false,
    "price": "$40.00",
    "price_after_discount": "$40.00",
    "seller_username": "BatmanPajamas",
    "url": "https://www.hobbydb.com/marketplaces/2/cart/1696146"
}]
Here is the code I have got that allows me to get the entire json:
import urllib.request, json
with urllib.request.urlopen("https://www.hobbydb.com/api/collectibles/for_sale_search?limit=5&original_site_id=10748&market_id=2") as url:
    data = json.loads(url.read().decode())
    print(data)
I have tried various pieces of code, but every time I get:
TypeError: list indices must be integers or slices, not str
Any ideas how I can parse the price from this JSON?
The outer brackets ([]) indicate that the response is a list of items. So you need to loop over the list (or index into it) before you can access the fields you're after. Here's how I do it with requests:
import requests
resp = requests.get("https://www.hobbydb.com/api/collectibles/for_sale_search?limit=5&original_site_id=10748&market_id=2")
#requests has built-in support for json, so no need to import json module
for product in resp.json():
    print(product["price"])
To iterate over the JSON array:
for item in data:
    for key in item.keys():
        print(item[key])
To display only the price:
for item in data:
    print(item['price'])
I think the problem you are having is that this JSON object starts with an array (which will be a list once we load it as a Python object). First, you need to use the json library from the standard library. Then, you have to access the object using the list index, then the dict keys.
Try this:
import urllib.request, json
with urllib.request.urlopen("https://www.hobbydb.com/api/collectibles/for_sale_search?limit=5&original_site_id=10748&market_id=2") as url:
    data = json.loads(url.read().decode())
    print(data)
    toy = data[0]
    price = toy['price']
Also, keep in mind that the with statement closes the connection once your script moves past that block, so do all of the reading and json.loads() inside it. The data, toy and price variables assigned inside the block remain accessible afterwards, since with does not create a new scope in Python.
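Putting the pieces together, a short sketch that stays with urllib and loops over every listing (equivalent to the requests-based answer above):
import urllib.request, json
with urllib.request.urlopen("https://www.hobbydb.com/api/collectibles/for_sale_search?limit=5&original_site_id=10748&market_id=2") as url:
    data = json.loads(url.read().decode())
# data is a list of dicts, so index into it or loop over it rather than using a string key
for item in data:
    print(item['name'], item['price'])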

Python http request and loop over contents of JSON

I'm trying to learn Python and have the following problem:
I get an error while running this, because it cannot find the 'name' attribute in the data.
It works when I grab items from the JSON one by one. However, when I try to do it in a loop it fails.
I assume my request is wrong and that it cannot read the JSON correctly and see the attributes.
import requests
import json
def main():
    req = requests.get('http://pokeapi.co/api/v2/pokemon/')
    print("HTTP Status Code: " + str(req.status_code))
    print(req.headers)
    json_obj = json.loads(req.content)
    for i in json_obj['name']:
        print(i)

if __name__ == '__main__':
    main()
You want to access the name attribute of the results attribute in your json_object like this:
for pokemon in json_obj['results']:
    print(pokemon['name'])
I was able to guess that you want to access the results key because I looked at the output of
json_obj.keys()
which is
dict_keys(['count', 'previous', 'results', 'next'])
All the pokemons are stored in a list under the results key, so you first need to get that list and then iterate over it.
for result in json_obj['results']:
    print(result['name'])
A couple of things: as already mentioned, iterating through json_obj['name'] doesn't really make sense; use json_obj['results'] instead.
Also, you can use req.json(), a method that comes with the requests library by default. That turns the response into a dictionary which you can then iterate through as usual (.iteritems() or .items(), depending on whether you're using Python 2 or 3).
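A short sketch of that req.json() variant (Python 3, so .items() rather than .iteritems()):
import requests
def main():
    req = requests.get('http://pokeapi.co/api/v2/pokemon/')
    json_obj = req.json()  # no separate json.loads(req.content) needed
    for key, value in json_obj.items():  # the top-level keys: count, next, previous, results
        print(key, type(value))
    for pokemon in json_obj['results']:
        print(pokemon['name'])
if __name__ == '__main__':
    main()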

retrieved URLs, trouble building payload to use requests module

I'm a Python novice, thanks for your patience.
I retrieved a web page, using the requests module. I used Beautiful Soup to harvest a few hundred href objects (links). I used uritools to create an array of full URLs for the target pages I want to download.
I don't want everybody who reads this note to bombard the web server with requests, so I'll show a hypothetical example that is realistic for just 2 hrefs. The array looks like this:
hrefs2 = ['http://ku.edu/pls/WP040?PT001F01=910&pf7331=11',
          'http://ku.edu/pls/WP040?PT001F01=910&pf7331=12']
If I were typing these into 100s of lines of code, I understand what to do in order to retrieve each page:
from lxml import html
import requests
url = 'http://ku.edu/pls/WP040/'
payload = {'PT001F01' : '910', 'pf7331' : '11'}
r = requests.get(url, params = payload)
Then get the second page
payload = {'PT001F01' : '910', 'pf7331' : '12'}
r = requests.get(url, params = payload)
And keep typing in payload objects. Not all of the hrefs I'm dealing with are sequential, and not all of the payloads differ only in the last integer.
I want to automate this and I don't see how to create the payloads from the hrefs2 array.
While fiddling with uritools, I found urisplit, which can give me the part I need to parse into a payload:
[urisplit(x)[3] for x in hrefs2]
['PT001F01=910&pf7331=11',
'PT001F01=910&pf7331=12']
Each one of those has to be turned into a payload object and I don't understand what to do.
I'm using Python3 and I used uritools because that appears to be the standards-compliant replacement of urltools.
I fell back on shell script to get pages with wget, which does work, but it is so un-Python-ish that I'm asking here for what to do. I mean, this does work:
import subprocess
for i in hrefs2:
    subprocess.call(["wget", i])
You can pass the full url to requests.get() without splitting up the parameters.
>>> requests.get('http://ku.edu/pls/WP040?PT001F01=910&pf7331=12')
<Response [200]>
If for some reason you don't want to do that, you'll need to split up the parameters somehow. I'm sure there are better ways to do it, but the first thing that comes to mind is:
a = ['PT001F01=910&pf7331=11',
     'PT001F01=910&pf7331=12']

# list to store all url parameters after they're converted to dicts
urldata = []

# iterate over the list of params
for param in a:
    data = {}
    # split the string into key=value pairs
    for kv in param.split('&'):
        # split each pair up
        b = kv.split('=')
        # the first part is the key, the second is the value
        data[b[0]] = b[1]
    # after converting every kv pair in the parameter, add the result to the list
    urldata.append(data)
You could do this with less code, but I wanted to be clear about what was going on. I'm sure there is already a module out there that does this for you, too.
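For what it's worth, the standard library already covers this: urllib.parse.parse_qsl splits a query string into key/value pairs, and dict() turns them into a payload you can hand to requests. A sketch reusing the hrefs2 list from the question (the actual GET is left commented out so nobody hammers the server):
from urllib.parse import urlsplit, parse_qsl
import requests
hrefs2 = ['http://ku.edu/pls/WP040?PT001F01=910&pf7331=11',
          'http://ku.edu/pls/WP040?PT001F01=910&pf7331=12']
url = 'http://ku.edu/pls/WP040/'
for href in hrefs2:
    payload = dict(parse_qsl(urlsplit(href).query))
    print(payload)  # {'PT001F01': '910', 'pf7331': '11'}
    # r = requests.get(url, params=payload)  # uncomment to actually fetch the page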
