I am trying to call a paginated API in a loop; the request looks like the code below. The API documentation is new and does not mention next, offset, or any other pagination parameter. I tried adding {'page': page} to the params, but it still returns the first 100 rows. Any help would be appreciated.
headers = {'key':'key'}
data = requests.get(url,headers=headers).json()
Sample response:
{'results': [],
'page': 1,
'results_per_page': 100,
'total_results': 25000}
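One way to walk such an API is to derive the page count from the total_results and results_per_page counters in the sample response and request each page in turn. The sketch below is a minimal illustration, not the API's documented behaviour: the query parameter name 'page' is an assumption (the docs do not confirm it), and fetch_all and fake_get_page are hypothetical names. The fake fetcher stands in for the real endpoint so the loop can be run offline.

```python
import math


def fetch_all(get_page):
    """Collect rows from every page; get_page(page) returns one decoded response."""
    first = get_page(1)
    rows = list(first['results'])
    # Number of pages implied by the counters in the sample response.
    pages = math.ceil(first['total_results'] / first['results_per_page'])
    for page in range(2, pages + 1):
        rows.extend(get_page(page)['results'])
    return rows


# With requests, get_page could be something like the following
# (the 'page' parameter name is an assumption, not taken from the API docs):
#   lambda page: requests.get(url, headers=headers, params={'page': page}).json()


def fake_get_page(page):
    """Stand-in for the real API: 250 rows served 100 per page."""
    data = list(range(250))
    start = (page - 1) * 100
    return {'results': data[start:start + 100], 'page': page,
            'results_per_page': 100, 'total_results': 250}


all_rows = fetch_all(fake_get_page)
print(len(all_rows))
```

If the 'page' value in each response never changes from 1, the server is probably ignoring the parameter, and the name or location of the parameter (query string versus request body) is the thing to check with the API provider.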
I have a script setup to pull a JSON from an API and I need to convert objects into different columns for a single row layout for a SQL server. See the example below for the body raw layout of an example object:
"answers": {
    "agent_star_rating": {
        "question_id": 145,
        "question_text": "How satisfied are you with the service you received from {{ employee.first_name }} today?",
        "comment": "John was exceptionally friendly and knowledgeable.",
        "selected_options": {
            "1072": {
                "option_id": 1072,
                "option_text": "5",
                "integer_value": 5
            }
        }
    },
In said example I need the output for all parts of agent_star_rating to be individual columns so all data spits out 1 row for the entire survey on our SQL server. I have tried mapping several keys like so:
agent_star_rating = [list(response['answers']['agent_star_rating']['selected_options'].values())[0]['integer_value']]
agent_question = (response['answers']['agent_star_rating']['question_text'])
agent_comment = (response['answers']['agent_star_rating']['comment'])
response['agent_question'] = agent_question
response['agent_comment'] = agent_comment
response['agent_star_rating'] = agent_star_rating
I get the expected result until we reach a point where some surveys have skipped a field such as ['question_text'], and then we get a missing-key error (KeyError). This happens with other objects as well, and I have not been able to come up with a way to handle these missing keys. If there is a better way to format the output than the key-mapping approach I've used, I'd love to hear ideas! I'm new to Python and pandas, so pardon any improper terminology!
I would do something like this:
# values that you always capture
row = ['value1', 'value2', ...]
# defaults used when the response lacks an attribute
gottem_attrs = {'question_id': '',
                'question_text': '',
                'comment': '',
                'selected_options': ''}
# find and save the values that the response has
for attr in response['answers']['agent_star_rating']:
    gottem_attrs[attr] = response['answers']['agent_star_rating'][attr]
# then you have your final row
# (dict.values() is a view in Python 3, so convert it to a list first)
final_row = row + list(gottem_attrs.values())
If the response has a value for an attribute, this code will save it. Otherwise, it will save an empty string for that value.
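The same idea can also be written with dict.get, which returns a default instead of raising KeyError. The following is a rough sketch rather than a definitive solution: dig and flatten_rating are hypothetical helper names, and the column names are taken from the question's own code.

```python
def dig(data, *keys, default=None):
    """Follow keys through nested dicts; return default if any level is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data


def flatten_rating(response):
    """Build one flat row for agent_star_rating, using None for skipped fields."""
    rating = dig(response, 'answers', 'agent_star_rating') or {}
    options = list((rating.get('selected_options') or {}).values())
    return {
        'agent_question': rating.get('question_text'),
        'agent_comment': rating.get('comment'),
        'agent_star_rating': options[0].get('integer_value') if options else None,
    }


complete = {'answers': {'agent_star_rating': {
    'question_id': 145,
    'question_text': 'How satisfied are you?',
    'comment': 'Friendly and knowledgeable.',
    'selected_options': {'1072': {'option_id': 1072, 'option_text': '5',
                                  'integer_value': 5}}}}}
skipped = {'answers': {'agent_star_rating': {'question_id': 145}}}

full_row = flatten_rating(complete)
partial_row = flatten_rating(skipped)
print(full_row)
print(partial_row)
```

Every row ends up with the same keys, which is what a single-row SQL layout needs; missing fields simply become None (NULL) instead of stopping the script.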
I am hoping someone can help me solve this problem I am having with a nested JSON response. I have been trying to crack this for a few weeks now with no success.
Using a site's API, I am trying to create a dictionary that holds three pieces of information for each user, extracted from the JSON responses. The first JSON response holds the user's uid and crmid, which I require.
The API comes back with a large JSON response, with an object for each account. An extract of this for a single account can be seen below:
{
'uid': 10,
'key': '[ N#839374',
'customerUid': 11,
'selfPaid': True,
'billCycleAllocationMethodUid': 1,
'stmtPaidForAccount': False,
'accountInvoiceDeliveryMethodUid': 1,
'payerAccountUid': 0,
'countryCode': None,
'currencyCode': 'GBP',
'languageCode': 'en',
'customFields':
{
'field':
[{
'name': 'CRMID',
'value': '11001'
}
]
},
'consentDetails': [],
'href': '/accounts/10'}
I have made a loop which extracts each UID for each account:
get_accounts = requests.get('https://website.com/api/v1/accounts?access_token=' + auth_key)
all_account_info = get_accounts.json()
account_info = all_account_info['resource']
account_information = {}
for account in account_info:
    account_uid = account['uid']
I am now trying to extract the CRMID value, in this case '11001': {'name': 'CRMID', 'value': '11001'}.
I have been struggling all week to make this work, I have two problems:
I would like to extract the UID (which I have done) and the CRMID from the deeply nested 'customFields' dictionary in the JSON response. I manage to get as far as ['key'][0], but I am not sure how to access the next dictionary that is nested in the list.
I would like to store this information in a dictionary in the format below:
{'accounts': [{'uid': 10, 'crmid': 11001, 'amount': ['bill': 4027]}{'uid': 11, 'crmid': 11002, 'amount': ['bill': 1054]}]}
(The 'bill' information is going to come from a separate JSON response.)
My problem is that, with every loop I design, the dictionary seems to hold only one account (the last account it loops over). I can't figure out a way to append to the dictionary instead of overwriting it while using a loop. If anyone has a useful link on how to do this, it would be much appreciated.
My end goal is to have a single dictionary which holds the three pieces of information for each account (uid, crmid, bill). I'm then going to export this into a CSV document.
Any help, guidance, useful links etc would be much appreciated.
Regarding question 1, it may help to print each level as you go down and then work out how to access the object returned at that level. If it is a list, use index notation like [0]; if it is a dictionary, use key notation like ['key'].
Regarding question 2, your dictionary needs unique keys. You are probably looping over and replacing the whole thing each time.
The final structure you suggest is a bit off, imagine it as:
accounts: {
    '10': {
        'uid': '10',
        'crmid': 11001,
        'amount': {
            'bill': 4027
        }
    },
    '11': {
        'uid': '11',
        'crmid': 11011,
        'amount': {
            'bill': 4028
        }
    }
}
etc.
So you can access accounts['10']['crmid'] or accounts['10']['amount']['bill'] for example.
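Putting both points together, a minimal sketch might look like the following. It assumes the account objects have the shape shown in the question; extract_crmid, build_accounts, accounts_to_csv, and the bills mapping (uid to bill amount, standing in for the separate JSON response) are hypothetical names introduced for illustration.

```python
import csv
import io


def extract_crmid(account):
    """Return the value of the custom field named 'CRMID', or None if absent."""
    fields = (account.get('customFields') or {}).get('field') or []
    for field in fields:
        if field.get('name') == 'CRMID':
            return field.get('value')
    return None


def build_accounts(account_info, bills):
    """Key each entry by uid so every loop iteration adds instead of overwriting."""
    accounts = {}
    for account in account_info:
        uid = account['uid']
        accounts[uid] = {'uid': uid,
                         'crmid': extract_crmid(account),
                         'amount': {'bill': bills.get(uid)}}
    return accounts


def accounts_to_csv(accounts):
    """Write one CSV line per account and return the text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=['uid', 'crmid', 'bill'])
    writer.writeheader()
    for entry in accounts.values():
        writer.writerow({'uid': entry['uid'], 'crmid': entry['crmid'],
                         'bill': entry['amount']['bill']})
    return buffer.getvalue()


# Sample data shaped like the extract in the question; bills is made up.
account_info = [
    {'uid': 10, 'customFields': {'field': [{'name': 'CRMID', 'value': '11001'}]}},
    {'uid': 11, 'customFields': {'field': [{'name': 'CRMID', 'value': '11002'}]}},
]
bills = {10: 4027, 11: 1054}

accounts = build_accounts(account_info, bills)
csv_text = accounts_to_csv(accounts)
print(csv_text)
```

Searching the field list by name rather than taking index [0] keeps the lookup working if an account carries more than one custom field.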
So what I need to do is to pass some information from XML files into elasticsearch and then search those files with tfidf weights applied to them. I also need to output the top 20 best results. I want to do this with python.
So far I have been able to pass the XML data and create an index successfully through python by creating arrays and then indexing them through a json-like format. I am aware that this means that while indexing most other options that are available through elasticsearch get a default value however I was unable to find a way to do this in a different way. What remains for me to do since all the data is passed into the index, is to search for it. I am given 10 documents that contain the title and a small summary of the text contained and I need to return the top 20 results with tfidf through elasticsearch. This is how I gather the 10 text files that need to be searched for in my index and this is how I try to search for them.
queries = []
with open("testingQueries.txt") as file:
    queries = [i.strip() for i in file]
for query_text in queries:
    query = {
        'query': {
            'more_like_this': {
                'fields': ['document.text'],
                'like': query_text
            }
        }
    }
    results = es.search(index=INDEX_NAME, body=query)
    print(str(results) + "\n")
As you can see I haven't added an analyzer in this query and I have no idea how to add tfidf weights to search for these queries in my data. I've been searching for an answer everywhere but most answers are either not python related or do not really solve my problem. The search results that I am getting are also not giving me the top 20 results...in fact they aren't giving me any results. The output looks like this: {'took': 14, 'timed_out': False, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}, 'hits': {'total': 0, 'max_score': None, 'hits': []}}
When I try the same with 'match' instead of 'more_like_this', I get many more results with hits, but I would still need tfidf scores and the top 20 documents that are similar to my queries.
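A possible cause of the empty result, offered as a guess rather than a diagnosis: more_like_this drops terms that occur fewer than min_term_freq times in the input text (default 2) or in fewer than min_doc_freq documents (default 5), so a short title-plus-summary query can end up with no usable terms. Separately, a search returns 10 hits unless size is set. The sketch below only builds the request body; build_mlt_query is a hypothetical helper, and the field name is the one from the question.

```python
def build_mlt_query(query_text, fields=('document.text',), size=20):
    """Build a more_like_this search body that returns up to `size` hits."""
    return {
        'size': size,  # the search default is 10 hits
        'query': {
            'more_like_this': {
                'fields': list(fields),
                'like': query_text,
                # Lowered from the defaults (2 and 5) so short inputs keep their terms.
                'min_term_freq': 1,
                'min_doc_freq': 1,
            }
        }
    }


body = build_mlt_query('title and short summary of a document')
print(body)
# Intended use, with es and INDEX_NAME as defined in the question:
#   results = es.search(index=INDEX_NAME, body=body)
#   for hit in results['hits']['hits']:
#       print(hit['_score'], hit['_id'])
```

Hits come back sorted by relevance score, so the first 20 are the top 20. Whether that score is classic TF-IDF or BM25 depends on the Elasticsearch version and the index's similarity setting, which the question does not state.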
(I've added the google-analytics api tags but I suspect that my issue is more a fundamental flaw in my approach to a loop, detailed below)
I'm using Python to query the Google Analytics API (V4). Having already successfully connected to the API with my credentials, I'm trying to loop over each 10k result set returned by the API to get the full results set.
When querying the API you pass a dict that looks something like this:
{'reportRequests':[{'viewId': '1234567', # my actual view id goes here of course
'pageToken': 'go', # can be any string initially (I think?)
'pageSize': 10000,
'samplingLevel': 'LARGE',
'dateRanges': [{'startDate': '2018-06-01', 'endDate': '2018-07-13'}],
'dimensions': [{'name': 'ga:date'}, {'name': 'ga:dimension1'}, {'name': 'ga:dimension2'}, {'name': 'ga:userType'}, {'name': 'ga:landingpagePath'}, {'name': 'ga:deviceCategory'}],
'metrics': [{'expression': 'ga:sessions'}, {'expression': 'ga:bounces'}, {'expression': 'ga:goal1Completions'}]}]}
According to the documentation on Google Analytics API V4 on the pageToken parameter:
"A continuation token to get the next page of the results. Adding this to the request will return the rows after the pageToken. The pageToken should be the value returned in the nextPageToken parameter in the response to the reports.batchGet request. "
My understanding is that I need to query the API in chunks of 10,000 (max query result size allowed) and that to do this I must pass the value of nextPageToken field returned in each query result into the new query.
From my research, it sounds like the nextPageToken field will be an empty string when all the results have been returned.
So, I tried a while loop. To get to the loop stage I built some functions:
## generates the dimensions in the right format to use in the query
def generate_dims(dims):
    dims_ar = []
    for i in dims:
        d = {'name': i}
        dims_ar.append(d)
    return dims_ar

## generates the metrics in the right format to use in the query
def generate_metrics(mets):
    mets_ar = []
    for i in mets:
        m = {'expression': i}
        mets_ar.append(m)
    return mets_ar

## generate the query dict
def query(pToken, dimensions, metrics, start, end):
    api_query = {
        'reportRequests': [
            {'viewId': VIEW_ID,
             'pageToken': pToken,
             'pageSize': 10000,
             'samplingLevel': 'LARGE',
             'dateRanges': [{'startDate': start, 'endDate': end}],
             'dimensions': generate_dims(dimensions),
             'metrics': generate_metrics(metrics)
            }]
    }
    return api_query
Example output of the above 3 functions:
sessions1_qr = query(pToken = pageToken,
dimensions = ['ga:date', 'ga:dimension1', 'ga:dimension2',
'ga:userType', 'ga:landingpagePath',
'ga:deviceCategory'],
metrics = ['ga:sessions', 'ga:bounces', 'ga:goal1Completions'],
start = '2018-06-01',
end = '2018-07-13')
The results of this look like the first code block in this post.
So far so good. Here's the loop I attempted:
def main(query):
    global pageToken, store_response
    # debugging, was hoping to see print output on each iteration (I didn't)
    print(pageToken)
    while pageToken != "":
        analytics = initialize_analyticsreporting()
        response = get_report(analytics, query)
        pageToken = response['reports'][0]['nextPageToken'] # < IT ALL COMES DOWN TO THIS LINE HERE
        store_response['pageToken'] = response
    return(False) # don't actually need the function to return anything, just append to global store_response.
Then I tried to run it:
pageToken = "go" # can be any string to get started
store_response = {}
sessions1 = main(sessions1_qr)
The following happens:
The console remains busy
The line print(pageToken) prints once to the console, showing the initial value of pageToken
store_response dict has one item in it, not many as was hoped for
So, it looks like my loop runs once only.
Having stared at the code, I suspect it has something to do with the value of the query parameter that I pass to main(). When I initially call main(), the value of query is the same as the first code block above (the variable sessions1_qr, the dict with all the API call parameters). On each loop iteration this is supposed to update so that the value of pageToken is replaced with the response's nextPageToken value.
Put another way and in short, I need to update the input of the loop with a result from the previous iteration of the loop. My logic is clearly flawed so any help very much appreciated.
This is the approach I would take to solve this:
def main(query):
    global pageToken, store_response
    while pageToken != "":
        # debugging, was hoping to see print output on each iteration (I didn't)
        print(pageToken)
        analytics = initialize_analyticsreporting()
        response = get_report(analytics, query)
        # note that this has changed -- you were using 'pageToken' as a key
        # which would overwrite each response
        store_response[pageToken] = response
        pageToken = response['reports'][0]['nextPageToken'] # update the pageToken
        query['reportRequests'][0]['pageToken'] = pageToken # update the query
    return(False) # don't actually need the function to return anything, just append to global store_response.
i.e. update the query data structure manually, and store each of the responses with the pageToken as the dictionary key.
Presumably the last page has '' as the nextPageToken so your loop will stop.
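The same loop can be written without globals. This is a sketch under two assumptions that the thread does not confirm: that the first request may omit pageToken altogether, and that the last page either omits nextPageToken or returns it empty (the code handles both). fetch_all_pages and fake_get_report are hypothetical names; get_report here takes only the query, so the real call would be wrapped, for example as lambda q: get_report(analytics, q).

```python
import copy


def fetch_all_pages(get_report, query):
    """Call get_report repeatedly, feeding each nextPageToken into the next request."""
    query = copy.deepcopy(query)  # avoid mutating the caller's dict
    request = query['reportRequests'][0]
    request.pop('pageToken', None)  # assumption: the first request needs no token
    responses = []
    while True:
        response = get_report(query)
        responses.append(response)
        # .get() covers a last page that omits nextPageToken entirely
        token = response['reports'][0].get('nextPageToken')
        if not token:
            break
        request['pageToken'] = token
    return responses


# Stand-in for the real API call: three pages, tokens '10000' and '20000'.
seen_tokens = []


def fake_get_report(query):
    token = query['reportRequests'][0].get('pageToken')
    seen_tokens.append(token)
    next_token = {None: '10000', '10000': '20000'}.get(token)
    report = {'data': {'rows': []}}
    if next_token:
        report['nextPageToken'] = next_token
    return {'reports': [report]}


original_query = {'reportRequests': [{'viewId': '1234567', 'pageToken': 'go',
                                      'pageSize': 10000}]}
pages = fetch_all_pages(fake_get_report, original_query)
print(len(pages), seen_tokens)
```

Collecting the responses in a list keeps them in request order and avoids the key-overwriting problem that the dictionary version had.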
I'm working on a project and this is driving me nuts. I have searched online and found a few answers that worked for my other JSON-related queries, but this one is a bit of a nightmare: I keep getting a traceback error.
this is my json
ServerReturnJson = {
"personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
"persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
"name":"ramsey,",
"userData":null
}
data = responseIdentify.read()
print("The following data return : " + data)
#Parse json data to print just
load = json.loads(data)
print(load[0]['name'])
And that's where my problem is: I am unable to get the value from name. I tried a for statement next, and then I get this error:
Traceback (most recent call last):
File "C:\Python-Windows\random_test\cogT2.py", line 110, in <module>
for b in load[0]['name']:
KeyError: 0
using this for loop
for b in load[0]['name']:
print b[load]
Any support would be most welcome. I'm sure it's something simple; I just cannot figure it out.
Understanding how to reference nested dicts and lists in JSON is the hardest part. Here's a few things to consider.
Using your original data
ServerReturnJson = {
"personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
"persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
"name":"ramsey,",
"userData":'null'
}
# No index here, just the dictionary key
print(ServerReturnJson['name'])
Added second person by making a list of dicts
ServerReturnJson = [{
"personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
"persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
"name":"ramsey",
"userData": 'null'
},
{
"personId": "234123412341234234",
"persistedFaceIds": ["1241234123423"],
"name": "miller",
"userData": 'null'
}
]
# You can use the index here since you have a list of dictionaries
print(ServerReturnJson[1]['name'])
# You can iterate like this
for item in ServerReturnJson:
    print(item['name'])
Thanks for your support. Basically, the Microsoft Face API returns JSON with no index, as Chris said in his first example.
The above example works only if you add the following
data = responseIdentify.read() # read the incoming response from the server
ServerReturnJson = json.loads(data)
so the complete answer is as follows :
import json
# JSON text as it arrives from the server (a string, not a dict)
dataJson = '''{
"personId":"59c16cab-9f28-454e-8c7c-213ac6711dfc",
"persistedFaceIds":["3aaafe27-9013-40ae-8e2a-5803dad90d04"],
"name":"ramsey,",
"userData":null
}'''
# add json load here
ServerReturnJson = json.loads(dataJson)
# No index here, just the dictionary key
print(ServerReturnJson['name'])
Credits to Chris, thanks. One last thing: Chris mentioned "Understanding how to reference nested dicts and lists in JSON is the hardest part". 100% agreed.
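For code that may receive either a single object or a list of objects, a small sketch like the following avoids the KeyError: 0 from the question. names_from_payload is a hypothetical helper name, and the payloads are shortened versions of the examples above.

```python
import json


def names_from_payload(raw):
    """Parse JSON text and return the 'name' of each person it contains."""
    parsed = json.loads(raw)
    # A JSON object becomes a dict (no [0] index); a JSON array becomes a list.
    people = parsed if isinstance(parsed, list) else [parsed]
    return [person.get('name') for person in people]


single = '{"personId": "59c16cab", "name": "ramsey,", "userData": null}'
several = '[{"name": "ramsey"}, {"name": "miller"}]'

single_names = names_from_payload(single)
several_names = names_from_payload(several)
print(single_names)
print(several_names)
```

Note that JSON null is parsed to Python None by json.loads, so it does not need to be quoted as 'null' when the data arrives as text.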