I'm trying to iterate over an API until it has pulled all the records. Any ideas/hints would be really appreciated. The API returns a maximum of 5000 records per call by default, and there are almost 30000 rows in the account object.
As per the docs: if there are more than 5000 records to fetch, make another API call with Offset set to 5001 so that the remaining records (again a maximum of 5000) are fetched.
import requests
import json

url = 'https://xyzabc.com/account'
headers = {'content-type': 'application/json', 'Accesskey': '1234'}
body = {
    "select": ["accountid", "accountname", "location"],
    "offset": 0
}

response = requests.post(url, data=json.dumps(body), headers=headers)
account = response.json()
Since the offset tells the API where to start, you can fetch everything in a loop like this:
import requests
import json

url = 'https://xyzabc.com/account'
headers = {'content-type': 'application/json', 'Accesskey': '1234'}

# Please check if your API specs give you a better way to get the total number
# of records and use that instead - it may need a separate API call.
total_records = 1000000000

# Collect the result of every API call in this list (one entry per batch of 5000)
account_batches = []

# Go from 0 to total_records in steps of 5000 records
try:
    for i in range(0, total_records, 5000):
        body = {"select": ["accountid", "accountname", "location"],
                "offset": i}
        response = requests.post(url, data=json.dumps(body), headers=headers)
        account_batches.append(response.json())
except Exception as e:
    print(f"Connection error - {e}")  # Handle it your way

for batch in account_batches:
    for account in batch:
        ...  # Your logic for every account fetched.
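If the API doesn't expose a reliable total count, an alternative is to stop as soon as a batch comes back with fewer than 5000 records. A minimal sketch, assuming each call returns a plain list of account records:

import requests
import json

url = 'https://xyzabc.com/account'
headers = {'content-type': 'application/json', 'Accesskey': '1234'}

accounts = []
offset = 0
while True:
    body = {"select": ["accountid", "accountname", "location"],
            "offset": offset}
    response = requests.post(url, data=json.dumps(body), headers=headers)
    batch = response.json()  # assumed: a list of account records
    accounts.extend(batch)
    if len(batch) < 5000:
        # A short (or empty) batch means there is nothing left to fetch
        break
    offset += 5000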
I need to get the IDs of all groups a user belongs to, or links to those groups.
The code executes, but nothing gets written to the file.
Does anyone have a clue, or a different way to get the group IDs?
I need them for an algorithm that will post specific ads to every group, but I can't get the IDs.
import requests

# Replace ACCESS_TOKEN with a valid access token
ACCESS_TOKEN = "TOKEN_ACCCES"

# Set the endpoint URL
url = "https://graph.facebook.com/me/groups"

# Set the HTTP headers
headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}

# Set the params for the request
params = {
    "limit": 100,  # The maximum number of groups to retrieve per request
}

# Initialize an empty list to store the group IDs
group_ids = []

# Flag to determine whether there are more groups to retrieve
more_groups = True

# Keep making requests until all groups have been retrieved
while more_groups:
    # Send the request
    response = requests.get(url, headers=headers, params=params)

    # Check the response status code
    if response.status_code != 200:
        print("Failed to retrieve groups")
        print(response.text)
        break

    # Extract the data from the response
    data = response.json()

    # Add the group IDs to the list
    group_ids.extend([group["id"] for group in data["data"]])

    # Check if there are more groups to retrieve
    if "paging" in data and "next" in data["paging"]:
        # Update the URL and params for the next request
        url = data["paging"]["next"]
        params = {}
    else:
        # No more groups to retrieve
        more_groups = False

# Write the group IDs to a file
with open("group_ids.txt", "w") as f:
    f.write(",".join(group_ids))

print("Done!")
I would like to retrieve all records (total 50,000) from an API endpoint. The endpoint only returns a maximum of 1000 records per page. Here's the function to get the records.
import json
import requests
import pandas as pd

def get_products(token, page_number):
    url = "https://example.com/manager/nexus?page={}&limit={}".format(page_number, 1000)
    header = {
        "Authorization": "Bearer {}".format(token)
    }
    response = requests.get(url, headers=header)
    product_results = response.json()
    total_list = []
    for result in product_results['Data']:
        date = result['date']
        price = result['price']
        name = result['name']
        total_list.append((date, price, name))
    columns = ['date', 'price', 'name']
    df = pd.DataFrame(total_list, columns=columns)
    results = json.dumps(total_list)
    return df, results
How can I loop through each page until the final record without hardcoding the page numbers? Currently, I'm hardcoding the page numbers as below for the first 2 pages to get 2000 records as a test.
import numpy as np

for page_number in np.arange(1, 3):
    token = get_token()
    product_df, product_json = get_products(token, page_number)
    if page_number == 1:
        product_all = product_df
    else:
        product_all = pd.concat([product_all, product_df])

print(product_all)
Thank you.
I don't know the exact behavior of the endpoint, but assuming that requesting a page number beyond the last page returns an empty list, you can simply check whether the result is empty:
page_number = 1
token = get_token()
product_df, product_json = get_products(token, page_number)
product_all = product_df

while product_df.size:
    page_number = page_number + 1
    token = get_token()
    product_df, product_json = get_products(token, page_number)
    product_all = pd.concat([product_all, product_df])

print(product_all)
If you are sure that a full page always contains exactly 1000 records, you could instead check whether the result count is less than 1000 and stop the loop, as sketched below.
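A minimal sketch of that variant, reusing the get_products and get_token helpers from the question and assuming a full page always holds exactly 1000 rows:

page_number = 1
all_pages = []

while True:
    token = get_token()
    product_df, product_json = get_products(token, page_number)
    all_pages.append(product_df)
    if len(product_df) < 1000:
        # A short page means this was the last one
        break
    page_number += 1

product_all = pd.concat(all_pages, ignore_index=True)
print(product_all)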
It depends on how your backend GET method returns the JSON. Since page and limit are required, you could rewrite the backend so the JSON returns all the data instead of just 1000 records at a time. Otherwise, loop over a fixed number of pages:
num = int(50000 / 1000)  # 50 pages of 1000 records each

for i in range(1, num + 1):  # range excludes the upper bound, so go to num + 1
    token = get_token()
    product_df, product_json = get_products(token, i)
    if i == 1:
        product_all = product_df
    else:
        product_all = pd.concat([product_all, product_df])

print(product_all)
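If the total record count is not an exact multiple of the page size, a hedged variation is to round the number of pages up with math.ceil so the final partial page is not skipped (still reusing get_token and get_products from the question):

import math

total_records = 50000  # assumed to be known up front
page_size = 1000
num_pages = math.ceil(total_records / page_size)

pages = []
for i in range(1, num_pages + 1):
    token = get_token()
    product_df, product_json = get_products(token, i)
    pages.append(product_df)

product_all = pd.concat(pages, ignore_index=True)
print(product_all)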
I am retrieving data from an API endpoint which only allows me to retrieve a maximum of 100 data points at a time. There is a "next page" field within the response which I could use to retrieve the next 100 data points, and so on (there are about 70,000 in total), by plugging the next-page URL back into the GET request. How can I use a for loop or while loop to retrieve all the data available from the endpoint by automatically plugging the "next page" URL back into the GET request?
Here is the code I'm using. The problem is that when I execute the while loop, I get the same response every time because it keeps running on the first response instance. I can't work out how to adjust this.
response = requests.get(url + '/api/named_users?limit=100', headers=headers)
users = []
resp_json = response.json()
users.append(resp_json)

while resp_json.get('next_page') != '':
    response = s.get(resp_json.get('next_page'), headers=headers)
    resp_json = response.json()
    users.append(resp_json)
To summarize: I want to take the "next page" URL in every response to get the next 100 data points and append it to a list each time until I have all the data fetched.
You can do it with a recursive function.
For example, something like this:
import requests

s = requests.Session()

def next_page(page_url, users):
    if page_url != '':
        response = s.get(page_url, headers=headers)
        resp_json = response.json()
        users.append(resp_json)
        if resp_json.get('next_page') != '':
            return next_page(resp_json.get('next_page'), users)
    return users

response = s.get(url + '/api/named_users?limit=100', headers=headers)
users = []
resp_json = response.json()
users.append(resp_json)
users = next_page(resp_json.get('next_page'), users)
But in general, APIs return the total number of items and the number of items per request, so you can easily paginate and loop through all items.
Here is some pseudo-code:

total_pages = ceil(total_number_of_items / items_returned_per_request)
for page in range(1, total_pages + 1):
    response = s.get(base_url, headers=headers,
                     params={'page': page, 'limit': items_returned_per_request})
    resp_json = response.json()
    users.append(resp_json)
I am trying to call an API from Python to get a result set of about 21,500 records, with a PageSize limit/default of 4000 records. I do not know the total number of pages, and there are no "next_url" or "last_page_url" links. The only thing given is the total number of results, 21205, which I can divide by the PageSize limit of 4000 to get 5.30125 pages.
There are 2 possible approaches I am thinking of; I am just not sure how to put them into code.
First: use a while loop to check whether the result set equals the PageSize of 4000, and if so loop through another page.
Second: use a for loop, round the 5.3 total pages up to 6 so all records are covered, and paginate through with page += 1.
Lastly, I need to append all the records to a pandas DataFrame so I can export them to a SQL table.
Any help is greatly appreciated.
url = "https://api2.enquiresolutions.com/v3/?Id=XXXX&ListId=161585&PageSize=4000"
auth = { 'Ocp-Apim-Subscription-Key': 'XXX', 'Content-Type': 'application/json'}
params = {'PageNumber': page}
res = requests.get(url=url, headers=auth, params=params).json()
df = pd.DataFrame(res['result'])
total_result= df['total'][0]
total_pages = int(total_result) /4000
properties = json_normalize(df['individuals'],record_path=['properties'],meta=
['casenumber','individualid','type'])
properties['Data'] = properties.label.str.cat(properties.id,sep='_')
properties = properties.drop(['label','id'],axis=1)
pivotprop = properties.pivot(index='individualid', columns='Data', values='value')
data = pivotprop.reset_index()
data.to_sql('crm_Properties',con=engine, if_exists='append'
Are you looking for something like this? You just loop until the result size is less than 4000 and consolidate the data in a list:
url = "https://api2.enquiresolutions.com/v3/?Id=XXXX&ListId=161585&PageSize=4000"
auth = { 'Ocp-Apim-Subscription-Key': 'XXX', 'Content-Type': 'application/json'}
page = 0
params = {'PageNumber': page}
pages_remaining = True
full_res = []
while pages_remaining:
res = requests.get(url=url, headers=auth, params=params).json()
full_res.append(res['result'])
page += 4000
params = {'PageNumber' : page}
if not len(res['result']) == 4000:
pages_remaining = False
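To get from full_res to the DataFrame and SQL table the question asks for, here is a minimal sketch, assuming each res['result'] entry is a list of record dicts and that engine is the SQLAlchemy engine already used in the question:

import pandas as pd

# Flatten the list of pages into one flat list of records
all_records = [record for page_result in full_res for record in page_result]

df = pd.DataFrame(all_records)
df.to_sql('crm_Properties', con=engine, if_exists='append', index=False)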
I want to extract all Wikipedia titles via the API. Each response contains a continue key which is used to get the next logical batch, but after 30 requests the continue key starts to repeat, which means I am receiving the same pages.
I have tried the code below, following the Wikipedia documentation:
https://www.mediawiki.org/wiki/API:Allpages
def get_response(self, url):
    resp = requests.get(url=url)
    return resp.json()

appcontinue = []
url = 'https://en.wikipedia.org/w/api.php?action=query&list=allpages&format=json&aplimit=500'

json_resp = self.get_response(url)
next_batch = json_resp["continue"]["apcontinue"]
url += '&apcontinue=' + next_batch
appcontinue.append(next_batch)

while True:
    json_resp = self.get_response(url)
    url = url.replace(next_batch, json_resp["continue"]["apcontinue"])
    next_batch = json_resp["continue"]["apcontinue"]
    appcontinue.append(next_batch)
I am expecting to receive more than 10000 unique continue keys, since one response can contain a maximum of 500 titles and Wikipedia has 5,673,237 articles in English.
Actual result: I made more than 600 requests and there are only 30 unique continue keys.
json_resp["continue"] contains two pairs of values, one is apcontinue and the other is continue. You should add them both to your query. See https://www.mediawiki.org/wiki/API:Query#Continuing_queries for more details.
Also, I think it'll be easier to use the params parameter of requests.get instead of manually replacing the continue values. Perhaps something like this:
import requests

def get_response(url, params):
    resp = requests.get(url, params)
    return resp.json()

url = 'https://en.wikipedia.org/w/api.php?action=query&list=allpages&format=json&aplimit=500'
params = {}

while True:
    json_resp = get_response(url, params)
    params = json_resp["continue"]
    ...
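To round out that loop, here is a minimal sketch of how the titles and the stop condition could look, assuming (per the MediaWiki docs) that the response omits the continue element once the last batch is reached and that the titles live under query.allpages:

import requests

url = 'https://en.wikipedia.org/w/api.php'
base_params = {'action': 'query', 'list': 'allpages', 'format': 'json', 'aplimit': 500}

titles = []
continue_params = {}

while True:
    json_resp = requests.get(url, params={**base_params, **continue_params}).json()
    titles.extend(page['title'] for page in json_resp['query']['allpages'])
    if 'continue' not in json_resp:
        break  # no continue element means there are no more pages
    continue_params = json_resp['continue']  # carries both apcontinue and continue

print(f"Fetched {len(titles)} titles")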