Recursive API Calls using AWS Lambda Functions Python - python

I've never written a recursive python script before. I'm used to splitting up a monolithic function into sub AWS Lambda functions. However, this particular script I am working on is challenging to break up into smaller functions.
Here is the code I am currently using for context. I am using one api request to return a list of objects within a table.
url_pega_EEvisa = requests.get('https://cloud.xxxx.com:443/prweb/api/v1/data/D_pxCaseList?caseClass=xx-xx-xx-xx', auth=(username, password))
pega_EEvisa_raw = url_pega_EEvisa.json()
pega_EEvisa = pega_EEvisa_raw['pxResults']
This returns every object(primary key) within a particular table as a list. For example,
['XX-XXSALES-WORK%20PO-1', 'XX-XXSALES-WORK%20PO-10', 'XX-XXSALES-WORK%20PO-100', 'XX-XXSALES-WORK%20PO-101', 'XX-XXSALES-WORK%20PO-102', 'XX-XXSALES-WORK%20PO-103', 'XX-XXSALES-WORK%20PO-104', 'XX-XXSALES-WORK%20PO-105', 'XX-XXSALES-WORK%20PO-106', 'XX-XXSALES-WORK%20PO-107']
I then use this list to populate more get requests using a for loop which then grabs me all the data per object.
for t in caseid:
url = requests.get(('https://cloud.xxxx.com:443/prweb/api/v1/cases/{}'.format(t)), auth=(username, password)).json()
data.append(url)
This particular lambda function takes about 15min which is the limit for one AWS Lambda function. Ideally, I'd like to split up the list into smaller parts and run the same process. I am struggling marking the point where it last ran before failure and passing that information on to the next function.
Any help is appreciated!

I'm not sure if I entirely understand what you want to do with the data once you've fetched all the information about the case, but it terms of breaking up the work once lambda is doing into many lambdas, you should be able to chunk out the list of cases and pass them to new invocations of the same lambda. Python psuedocode below, hopefully it helps illustrate the idea. I stole the chunks method from this answer that would help break the list into batches
import boto3
import json
client = boto3.client('lambda')
def handler
url_pega_EEvisa = requests.get('https://cloud.xxxx.com:443/prweb/api/v1/data/D_pxCaseList?caseClass=xx-xx-xx-xx', auth=(username, password))
pega_EEvisa_raw = url_pega_EEvisa.json()
pega_EEvisa = pega_EEvisa_raw['pxResults']
for chunk in chunks(pega_EEvisa, 10)
client.invoke(
FunctionName='lambdaToHandleBatchOfTenCases',
Payload=json.dumps(chunk)
)
Hopefully that helps? Let me know if this was not on target 😅

Related

Struggling with how to iterate data

I am learning Python3 and I have a fairly simple task to complete but I am struggling how to glue it all together. I need to query an API and return the full list of applications which I can do and I store this and need to use it again to gather more data for each application from a different API call.
applistfull = requests.get(url,authmethod)
if applistfull.ok:
data = applistfull.json()
for app in data["_embedded"]["applications"]:
print(app["profile"]["name"],app["guid"])
summaryguid = app["guid"]
else:
print(applistfull.status_code)
I next have I think 'summaryguid' and I need to again query a different API and return a value that could exist many times for each application; in this case the compiler used to build the code.
I can statically call a GUID in the URL and return the correct information but I haven't yet figured out how to get it to do the below for all of the above and build a master list:
summary = requests.get(f"url{summaryguid}moreurl",authmethod)
if summary.ok:
fulldata = summary.json()
for appsummary in fulldata["static-analysis"]["modules"]["module"]:
print(appsummary["compiler"])
I would prefer to not yet have someone just type out the right answer but just drop a few hints and let me continue to work through it logically so I learn how to deal with what I assume is a common issue in the future. My thought right now is I need to move my second if up as part of my initial block and continue the logic in that space but I am stuck with that.
You are on the right track! Here is the hint: the second API request can be nested inside the loop that iterates through the list of applications in the first API call. By doing so, you can get the information you require by making the second API call for each application.
import requests
applistfull = requests.get("url", authmethod)
if applistfull.ok:
data = applistfull.json()
for app in data["_embedded"]["applications"]:
print(app["profile"]["name"],app["guid"])
summaryguid = app["guid"]
summary = requests.get(f"url/{summaryguid}/moreurl", authmethod)
fulldata = summary.json()
for appsummary in fulldata["static-analysis"]["modules"]["module"]:
print(app["profile"]["name"],appsummary["compiler"])
else:
print(applistfull.status_code)

how can i write unit test for function that is making network request without changing it's interface?

I read that Unit tests run fast. If they don’t run fast, they aren’t unit tests. A test is not a unit test if 1. It talks to a database. 2. It communicates across a network. 3. It touches the file system. 4. You have to do special things to your environment (such as editing configuration files) to run it. in Working Effectively with legacy code (book).
I have a function that is downloading the zip from the internet and then converting it into a python object for a particular class.
import typing as t
def get_book_objects(date: str) -> t.List[Book]:
# download the zip with the date from the endpoint
res = requests.get(f"HTTP-URL-{date}")
# code to read the response content in BytesIO and then use the ZipFile module
# to extract data.
# parse the data and return a list of Book object
return books
let's say I want to write a unit test for the function get_book_objects. Then how am I supposed to write a unit test without making a network request? I mean I prefer file system read-over a network request because it will be way faster than making a request to the network although it is written that a good unit test also not touches the file system I will be fine with that.
So even if I want to write a unit test where I can provide a local zip file I have to modify the existing function to open the file from the local file system or I have to add some additional parameter to the function so I can send a zip file path from unit test function.
What will you do to write a good unit test in this kind of situation?
What will you do to write a good unit test in this kind of situation?
In the TDD world, the usual answer would be to delegate the work to a more easily tested component.
Consider:
def get_book_objects(date: str) -> t.List[Book]:
# This is the piece that makes get_book_objects hard
# to isolate
http_get = requests.get
# download the zip with the date from the endpoint
res = http_get(f"HTTP-URL-{date}")
# code to read the response content in BytesIO and then use the ZipFile module
# to extract data.
# parse the data and return a list of Book object
return books
which might then become something like
def get_book_objects(date: str) -> t.List[Book]:
# This is the piece that makes get_book_objects hard
# to isolate
http_get = requests.get
return get_book_objects_v2(http_get, date)
def get_book_objects_v2(http_get, date: str) -> t.List[Book]
# download the zip with the date from the endpoint
res = http_get(f"HTTP-URL-{date}")
# code to read the response content in BytesIO and then use the ZipFile module
# to extract data.
# parse the data and return a list of Book object
return books
get_book_objects is still hard to test, but it is also "so simple that there are obviously no deficiencies". On the other hand, get_book_objects_v2 is easy to test, because your test can control what callable is passed to the subject, and can use any reasonable substitute you like.
What we've done is shift most of the complexity/risk into a "unit" that is easier to test. For the function that is still hard to test, we'll use other techniques.
When authors talk about tests "driving" the design, this is one example - we're treating "complicated code needs to be easy to test" as a constraint on our design.
You've already identified the correct reference (Working Effectively with Legacy Code). The material you want is the discussion of seams.
A seam is a place where you can alter behavior in your program without editing in that place.
(In my edition of the book, the discussion begins in Chapter 4).

boto3 eks client how to generate presigned url

I'm trying to update a docker image within a deployment in EKS. I'm running a python code from a lambda function. However, I don't know how to use generate_presigned_url(). What should I pass as ClientMethod parameter???
import boto3
client = boto3.client("eks")
url = client.generate_presigned_url()
These are the clientMethods that you could perform in case of EKS.
'associate_encryption_config'
'associate_identity_provider_config'
'can_paginate'
'create_addon'
'create_cluster'
'create_fargate_profile'
'create_nodegroup'
'delete_addon'
'delete_cluster'
'delete_fargate_profile'
'delete_nodegroup'
'describe_addon'
'describe_addon_versions'
'describe_cluster'
'describe_fargate_profile'
'describe_identity_provider_config'
'describe_nodegroup'
'describe_update'
'disassociate_identity_provider_config'
'generate_presigned_url'
'get_paginator'
'get_waiter'
'list_addons'
'list_clusters'
'list_fargate_profiles'
'list_identity_provider_configs'
'list_nodegroups'
'list_tags_for_resource'
'list_updates'
'tag_resource'
'untag_resource'
'update_addon'
'update_cluster_config'
'update_cluster_version'
'update_nodegroup_config'
'update_nodegroup_version'
You can get more information about these method in the documentation here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/eks.html#client
After over two weeks I suppose you've found your answer, anyway the ClientMethod mentioned (and, not really well explained on the boto3 docs) is just one of the methods you can use with the EKS client itself. I honestly think this is what KnowledgeGainer was trying to say by listing all the methods, basically you can just pick one. This would give you the presigned URL.
For example, here I'm using one method that isn't requiring any additional arguments, list_clusters:
>>> import boto3
>>> client = boto3.client("eks")
>>> client.generate_presigned_url("list_clusters")
'https://eks.eu-west-1.amazonaws.com/clusters?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAQKOXLHHBFT756PNG%2F20210528%2Feu-west-1%2Feks%2Faws4_request&X-Amz-Date=20210528T014603Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=d25dNCC17013ad9bc75c04b6e067105c23199c23cbadbbbeForExample'
If the method requires any additional arguments, you add those into Params as a dictionary:
>>> method_params = {'name': <your_cluster_name>}
>>> client.generate_presigned_url('describe_cluster', Params=method_params)

Imgur API: Dictionary values magically turns into None?

I know voodoo magic probably isn't the cause of this - but it sure seems like it!
I have the following code snippets, making use of the imgur API. The imgur object is the client which the imgur API uses and contains an attribute credits which displays the number of access credits the user has on the website.
imgur = imgurpython.ImgurClient(client_id, client_secret)
Calling:
imgur.credits
Returns the credits as normal, i.e.:
{'ClientLimit': 12500, 'UserReset': 1503185179, 'UserLimit': 500, 'UserRemaining': 0, 'ClientRemaining': 12000}
However when I attempt to call the dictionary in a later function:
def check_credits(imgur):
'''
Receives a client - and if there is not much credits left,
wait until the credit refills - i.e. pause the program
'''
credits = imgur.credits
credits_remaining = credits['UserRemaining']
reset_time = credits['UserReset']
if credits_remaining < 10:
print('not enough credits, remaining: %i' % credits_remaining)
now = int(dt.utcnow().timestamp())
wait_time = reset_time - now
print('waiting for: %i' % wait_time)
time.sleep(wait_time)
Sometimes the values in the dictionaries seem to turn into None instead of the integers they are supposed to be. In this case both reset_time and credits_remaining sometimes turn out to be None. In order to allow my code to run I'm having to add try-catches all over the code and it's getting quite frustrating. By the way, this function is called whenever the error ImgurClientRateLimitError, which is when imgur.credits['UserRemaining'] == 0. I'm wondering if anyone know why this may have been the case.
Upon looking at the source code for the client it seems that this is updated automatically upon each request. The updated values are obtained from the response headers after a call to ImgurClient.make_request. The header values are obtained from dict.get which can return None if the key does not exist in the headers dictionary. The code for reference is here: https://github.com/Imgur/imgurpython/blob/master/imgurpython/client.py#L143
I am not sure if these headers are still used on errors like 404 or 403 but I would investigate further from there. It seems though that because of this behavior you would need to either cache previous values or manually call the ImgurClient.get_credits method in these cases to get the real values. Whichever fix you go with is up to you.

How to call the function it's in?

I am creating a Python script which is calling the Instagram API and creating an array of all the photos. The API results are paginated, so it only shows 25 results. If there are more photos, it gives you a next_url which contains the next batch.
I have the script made in PHP, and I am doing something like this in my function:
// loop through this current function again with the next batch of photos
if($data->pagination->next_url) :
$func = __FUNCTION__;
$next_url = json_decode(file_get_contents($data->pagination->next_url, true));
$func($next_url);
endif;
How can I do something like this in Python?
My function looks sort of like this:
def add_images(url):
if url['pagination']['next_url']:
try:
next_file = urllib2.urlopen(url['pagination']['next_url'])
next_json = f.read()
finally:
# THIS DOES NOT WORK
next_url = json.loads(next_json)
add_images(next_url)
return
But obviously I can't just call add_images() from within. What are my options here?
You can call add_images() from within add_images(). Last time I checked, recursion still works in Python ;-).
However, since Python does not support tail call elimination, you need to be wary of stack overflows. The default recursion limit for CPython is 1,000 (available via sys.getrecursionlimit()), so you probably don't need to worry.
However, nowadays with generators and the advent of async I'd consider such JavaScript style recursive callback calls unpythonic. You might instead consider using generators and/or coroutines:
def get_images(base_url):
url = base_url
while url:
with contextlib.closing(urllib2.urlopen(url)) as url_file:
json_data = url_file.read()
# get_image_urls() extracts the images from JSON and returns an iterable.
# python 3.3 and up have "yield from"
# (see https://www.python.org/dev/peps/pep-0380/)
for img_url in get_image_urls(json_data):
yield img_url
# dict.get() conveniently returns None or
# the provided default argument when the
# element is missing.
url = json_data.get('pagination', {}).get('next_url')
images = list(get_images(base_url));

Categories

Resources