Using "Next Page" in while loop - python

I am retrieving data from an API endpoint that returns a maximum of 100 data points per request. Each response has a "next page" field that I could use to retrieve the next 100 data points, and so on (there are about 70,000 in total), by plugging the next-page URL back into the GET request. How can I use a for loop or while loop to retrieve all the data available from the endpoint by automatically plugging the "next page" URL back into the GET request?
Here is the code I'm using. The problem is that when I execute the while loop I get the same response every time, because it keeps running on the first response. I can't work out how to fix this.
response = requests.get(url + '/api/named_users?limit=100', headers=headers)
users = []
resp_json = response.json()
users.append(resp_json)
while resp_json.get('next_page') != '':
    response = s.get(resp_json.get('next_page'), headers = headers)
    resp_json = response.json()
    users.append(resp_json)
To summarize: I want to take the "next page" URL in every response to get the next 100 data points and append it to a list each time until I have all the data fetched.

You can do it with a recursive function.
For example, something like this:
# Define the helper before calling it, otherwise Python raises a NameError.
def next_page(url, users):
    # Keep following next_page URLs until the API returns an empty or missing one.
    # s is assumed to be the requests.Session used in the question.
    if url:
        response = s.get(url, headers=headers)
        resp_json = response.json()
        users.append(resp_json)
        return next_page(resp_json.get('next_page'), users)
    return users
response = requests.get(url + '/api/named_users?limit=100', headers=headers)
users = []
resp_json = response.json()
users.append(resp_json)
users = next_page(resp_json.get('next_page'), users)
Many APIs also return the total number of items and the number of items per request, so you can work out how many pages there are and loop over them.
Here is some pseudo-code:
# The first page is already fetched, so request only the remaining pages.
number_of_pages = math.ceil(total_number_of_items / items_returned_per_request)  # needs "import math"
for i in range(number_of_pages - 1):
    response = s.get(resp_json.get('next_page'), headers=headers)
    resp_json = response.json()
    users.append(resp_json)
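With about 70,000 items at 100 per page, that is roughly 700 requests, so a plain while loop may be a safer fit than deep recursion. The following is only a minimal sketch, assuming the last page returns an empty or missing next_page. The names fetch_all_pages and get_json are hypothetical: get_json stands for any callable that returns the parsed JSON for a URL, for example lambda u: requests.get(u, headers=headers).json().

```python
def fetch_all_pages(first_url, get_json):
    """Follow 'next_page' links iteratively and return every page's JSON.

    get_json is a hypothetical callable: it takes a URL and returns the
    parsed JSON body as a dict.
    """
    pages = []
    url = first_url
    while url:  # stops on '', None, or a missing 'next_page' key
        page = get_json(url)
        pages.append(page)
        url = page.get('next_page')
    return pages
```

Because the URL is re-read from the newest response on every iteration, the loop cannot keep requesting the first page.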

Related

get request payload in python

My code sends a GET request using query parameters that depend on a page number.
After that I have to loop over the response to get some IDs, and also read the next page number from the same response,
then send a new GET request with that next page number and get the IDs from the new response as well.
My code works fine, but I'm using two loops, which I don't think is the right way. I couldn't do it with one loop. Any ideas?
def get():
    response = requests.get(url, headers=header)
    data = response.text
    data = json.loads(data)
    check_if_theres_next_page = data['pagination']['hasMorePages']
    check_for_next_page_number = data['pagination']['nextPage']
    last_page_number = data['pagination']['lastPage']
    orders = data['orders']
    list_of_ids = []
    for manufacturingOrderId in orders:
        ids = manufacturingOrderId['manufacturingOrderId']
        list_of_ids.append(ids)
    if check_for_next_page_number == 4:
        check_for_next_page_number = last_page_number
    if check_if_theres_next_page:
        url_ = url + '&page_number=' + str(check_for_next_page_number)
        response = requests.get(url_, headers=header)
        data = response.text
        data = json.loads(data)
        orders = data['orders']
        for manufacturingOrderId_ in orders:
            ids = manufacturingOrderId_['manufacturingOrderId']
            list_of_ids.append(ids)
        if "nextPage" in data['pagination']:
            check_for_next_page_number = data['pagination']['nextPage']
        else:
            check_if_theres_next_page = False
    return list_of_ids
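One way to merge the two loops is to treat the first page like every other page. This is a minimal sketch, assuming the response layout shown in the question (orders, pagination.hasMorePages, pagination.nextPage) and that pages start at 1. The names collect_order_ids and fetch_page are hypothetical: fetch_page(page_number) stands for the GET request with the page number added to the URL, returning parsed JSON. The special case for page 4 in the original is left out.

```python
def collect_order_ids(fetch_page, first_page=1):
    """Collect manufacturingOrderId values from every page using one loop.

    fetch_page is a hypothetical callable: it takes a page number and
    returns the parsed JSON for that page.
    """
    list_of_ids = []
    page_number = first_page
    while page_number is not None:
        data = fetch_page(page_number)
        for order in data['orders']:
            list_of_ids.append(order['manufacturingOrderId'])
        pagination = data['pagination']
        # Move on only if the API says there is more and tells us where.
        if pagination.get('hasMorePages') and 'nextPage' in pagination:
            page_number = pagination['nextPage']
        else:
            page_number = None
    return list_of_ids
```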

How to format a new function to use each part of an array in a request URL

response = requests.get('requestURL', params=params, cookies=cookies, headers=headers)
response_data_ids = response.json()
listing_ids = [result['id'] for result in response_data_ids['results']]
print("got listing ids")
def check_listing_price(listing_ids,detuction_total):
    detuction_total == 100
    response = requests.post(f'apiurl.com/me/stock/{result['id']}' for result in response_data['results']]', cookies=cookies, headers=headers)
    response_data_price = response.json()
    price = [result['priceCents'] for result in response_data_ids['results']]
    updated_price = price - detuction_total
I got the array to be saved when formatting the id from all results, but I want to make a loop where each element ([0], [1], [2], etc.) of the array is used in the request URL. From each response I then want to pull priceCents and subtract deduction_total from it, and use the result in another function. (I have the requests working with the API URL; I am just trying to figure out the proper syntax/formatting.) Also, should I create a new function, def get_listing_ids, to format everything correctly, or is this not needed?
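A loop over the IDs, one request per ID, is probably the simplest shape for this. The sketch below is only an illustration under assumptions: updated_prices and fetch_listing are hypothetical names, and it assumes each per-listing response is a JSON object with a priceCents field. fetch_listing(listing_id) stands for the working request from the question, for example a function that calls the API with f'apiurl.com/me/stock/{listing_id}' and returns response.json().

```python
def updated_prices(listing_ids, deduction_total, fetch_listing):
    """Return {listing_id: priceCents - deduction_total} for each listing.

    fetch_listing is a hypothetical callable: it takes one listing id and
    returns that listing's parsed JSON, assumed to contain 'priceCents'.
    """
    prices = {}
    for listing_id in listing_ids:  # one request per id, instead of indexing [0], [1], ...
        listing = fetch_listing(listing_id)
        prices[listing_id] = listing['priceCents'] - deduction_total
    return prices
```

The returned dictionary can then be passed to whichever function needs the new prices.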

Wikipedia All-Pages API after 30 requests returns same pages titles

I want to extract all Wikipedia titles via the API. Each response contains a continue key that is used to get the next logical batch, but after 30 requests the continue key starts to repeat, which means I am receiving the same pages.
I have tried the code below, following the Wikipedia documentation:
https://www.mediawiki.org/wiki/API:Allpages
def get_response(self, url):
    resp = requests.get(url=url)
    return resp.json()
appcontinue = []
url = 'https://en.wikipedia.org/w/api.php?action=query&list=allpages&format=json&aplimit=500'
json_resp = self.get_response(url)
next_batch = json_resp["continue"]["apcontinue"]
url +='&apcontinue=' + next_batch
appcontinue.append(next_batch)
while True:
    json_resp = self.get_response(url)
    url = url.replace(next_batch, json_resp["continue"]["apcontinue"])
    next_batch = json_resp["continue"]["apcontinue"]
    appcontinue.append(next_batch)
I expected to receive more than 10,000 unique continue keys, since one response can contain at most 500 titles.
Wikipedia has 5,673,237 articles in English.
Actual result: I made more than 600 requests and got only 30 unique continue keys.
json_resp["continue"] contains two key-value pairs: apcontinue and continue. You should add both of them to your query. See https://www.mediawiki.org/wiki/API:Query#Continuing_queries for more details.
Also, I think it'll be easier to use the params parameter of requests.get instead of manually replacing the continue values. Perhaps something like this:
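import requests
def get_response(url, params):
    resp = requests.get(url, params=params)
    return resp.json()
url = 'https://en.wikipedia.org/w/api.php?action=query&list=allpages&format=json&aplimit=500'
params = {}
while True:
    json_resp = get_response(url, params)
    ...
    # The last batch has no "continue" key, so stop there.
    if "continue" not in json_resp:
        break
    # Send both continuation values (apcontinue and continue) with the next request.
    params = json_resp["continue"]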
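The same idea can be wrapped in a generator that yields titles one by one. This is a minimal sketch, assuming the list=allpages response keeps its pages under query.allpages with a title field each. The names iter_titles and fetch are hypothetical: fetch(params) stands for any callable that sends the query and returns parsed JSON, for example lambda p: requests.get('https://en.wikipedia.org/w/api.php', params=p).json().

```python
def iter_titles(fetch, base_params):
    """Yield page titles, following the 'continue' block until it disappears.

    fetch is a hypothetical callable: it takes a params dict and returns
    the parsed JSON response.
    """
    params = dict(base_params)
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("allpages", []):
            yield page["title"]
        cont = data.get("continue")
        if not cont:
            break  # no continuation block: this was the last batch
        # Merge every continuation value into the original query parameters.
        params = {**base_params, **cont}
```

Here base_params would hold the query from the question, such as {"action": "query", "list": "allpages", "format": "json", "aplimit": 500}.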

handle url pagination with python generator

Currently I'm fetching only the first page from the server. Part of the JSON is:
{"status":"success","count":100,"total":22188,"next":"https://pimber.ly/api/v2/products/?sinceId=5981e16fcde47c0854dc540b","previous":"https://pimber.ly/api/v2/products/?maxId=5981e01dcde47c0854dc4afd","sinceId":"5981e01dcde47c0854dc4afd","maxId":"5981e16fcde47c0854dc540b","data":[.....]}
and the function is:
_fetch_data = response.json()
while _fetch_data['next'] is not None:
    response = requests.get(
        url=API_DOMAIN',
        headers=headers
    )
    _page_data = response.json()['data']
    for _data in _page_data:
        yield _data
In its current state the function only processes the first page, and it will do that forever. How can I fix it to follow next so that it fetches all the data?
I guess it should be
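_fetch_data = response.json()
# Yield the first page before following the "next" links.
for _data in _fetch_data['data']:
    yield _data
while _fetch_data['next'] is not None:
    response = requests.get(_fetch_data['next'], headers=headers)
    _fetch_data = response.json()
    for _data in _fetch_data['data']:
        yield _data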

How to read the next page on API using python?

I need help writing a loop so that each GET request fetches the next page from the API.
I start by getting the first response. It includes a key for the next page, next_key:
{
    "result": [
        {
            ...,
            ...
        }
    ],
    "next_key": 123
}
Below is my current attempt
import requests
import json
url = "https://flespi.io/gw/channels/all/messages"
headers = {"Authorization": "FlespiToken 23ggh45"}
def getFirst():
    data = {"limit_count":100, "limit_size":10000}
    params = {"data":json.dumps(data, separators=(",", ":"))}
    reqFirst = requests.get(url, params=params, headers=headers).json()
    return reqFirst["next_key"] ## this returns "123"
def getDataNext():
    data = {"limit_count":100, "limit_size":10000, "curr_key":getFirst()}
    params = {"data":json.dumps(data, separators=(",", ":"))}
    reqNext = requests.get(url, params=params, headers=headers)
    jsonData = reqNext.json()
    while True:
        if "next_key" in jsonData:
            data = {"limit_count":100, "limit_size":10000,"curr_key":jsonData["next_key"]}
            params = {"data":json.dumps(data, separators=(",", ":"))}
            req = requests.get(url, params=params, headers=headers).json() ## this should do GET request for the third page and so on...
            print req["next_key"] # this returns "3321" which is the value for "next_key" in second page
        else:
            pass
getDataNext()
The full URL, including limit count, limit size and curr key, is as follows: https://flespi.io/gw/channels/all/messages?data=%7B%22curr_key%22%123%2C%22limit_count%22%3A100%2C%22limit_size%22%3A10000%7D
As you can see, this only ever requests the page for jsonData["next_key"], because jsonData is never updated inside the loop. What I want is for each GET request to read the next_key from the latest response and use it in the next GET request.
I thought about incrementing curr_key, but the key is random and I do not know how many pages there are.
I believe there must be a simple solution for this, but I could not think of it. Thank you for your help and suggestions.
Try this:
has_next_key = False
nextKey = ""
if "next_key" in jsonData:
    has_next_key = True
    nextKey = jsonData["next_key"]
while has_next_key:
    data = {"limit_count":100, "limit_size":10000,"curr_key":nextKey}
    params = {"data":json.dumps(data, separators=(",", ":"))}
    req = requests.get(url, params=params, headers=headers).json()  # GET request for the next page
    if "next_key" in req:
        nextKey = req["next_key"]  # remember the key so the next iteration requests the following page
        print(nextKey)
    else:
        # no next_key, stop the loop
        has_next_key = False
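The first request and the follow-up requests can also share one loop, so no separate getFirst step is needed. This is a minimal sketch, assuming (as the answer above does) that the last page omits next_key. The names build_params, fetch_all_messages and get_page are hypothetical: get_page(params) stands for requests.get(url, params=params, headers=headers).json().

```python
import json


def build_params(curr_key=None, limit_count=100, limit_size=10000):
    """Build the 'data' query parameter, adding curr_key only after page one."""
    data = {"limit_count": limit_count, "limit_size": limit_size}
    if curr_key is not None:
        data["curr_key"] = curr_key
    return {"data": json.dumps(data, separators=(",", ":"))}


def fetch_all_messages(get_page):
    """Collect 'result' items from every page by following next_key.

    get_page is a hypothetical callable: it takes the params dict and
    returns the parsed JSON for that page.
    """
    results = []
    curr_key = None
    while True:
        page = get_page(build_params(curr_key))
        results.extend(page.get("result", []))
        if "next_key" not in page:
            break  # assumed end-of-data signal
        curr_key = page["next_key"]
    return results
```

If the API keeps returning next_key even when there is no more data, the loop would need a different stop condition, such as an empty result list.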
