I am writing a Python script that calls the Instagram API and builds an array of all the photos. The API results are paginated, so each response contains only 25 results. If there are more photos, the response includes a next_url that points to the next batch.
I already have the script working in PHP, where my function does something like this:
// loop through this current function again with the next batch of photos
if($data->pagination->next_url) :
    $func = __FUNCTION__;
    $next_url = json_decode(file_get_contents($data->pagination->next_url, true));
    $func($next_url);
endif;
How can I do something like this in Python?
My function looks sort of like this:
def add_images(url):
    if url['pagination']['next_url']:
        try:
            next_file = urllib2.urlopen(url['pagination']['next_url'])
            next_json = f.read()
        finally:
            # THIS DOES NOT WORK
            next_url = json.loads(next_json)
            add_images(next_url)
    return
But obviously I can't just call add_images() from within. What are my options here?
You can call add_images() from within add_images(). Last time I checked, recursion still works in Python ;-).
However, since Python does not support tail call elimination, you need to be wary of stack overflows. The default recursion limit for CPython is 1,000 (available via sys.getrecursionlimit()), so you probably don't need to worry.
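A minimal sketch of the recursive version, written for Python 3. It assumes a hypothetical `fetch` helper that returns the already-parsed JSON for a URL; here it is backed by an in-memory dict rather than the real Instagram API, and the `data` key and page contents are made up for illustration.

```python
import sys

# Hypothetical stand-in for the API: maps a URL to its parsed JSON response.
FAKE_PAGES = {
    "page1": {"data": ["a.jpg", "b.jpg"], "pagination": {"next_url": "page2"}},
    "page2": {"data": ["c.jpg"], "pagination": {}},
}

def fetch(url):
    """Return the parsed JSON for url (assumed helper, not a real API call)."""
    return FAKE_PAGES[url]

def add_images(page, images):
    """Append this page's photos, then recurse into the next page if there is one."""
    images.extend(page["data"])
    next_url = page.get("pagination", {}).get("next_url")
    if next_url:
        add_images(fetch(next_url), images)
    return images

photos = add_images(fetch("page1"), [])
print(photos)
# Each page costs one stack frame, so the page count must stay below this limit.
print(sys.getrecursionlimit())
```

Passing the accumulator list down the calls avoids relying on a global, and the `.get()` chain stops the recursion cleanly when the last page has no next_url.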
That said, nowadays, with generators and the advent of async, I'd consider such JavaScript-style recursive callback calls unpythonic. You might instead consider using generators and/or coroutines:
import contextlib
import json
import urllib2

def get_images(base_url):
    url = base_url
    while url:
        with contextlib.closing(urllib2.urlopen(url)) as url_file:
            # Parse the response body so it can be used as a dict below.
            json_data = json.loads(url_file.read())
        # get_image_urls() extracts the images from the parsed JSON and returns an iterable.
        # python 3.3 and up have "yield from"
        # (see https://www.python.org/dev/peps/pep-0380/)
        for img_url in get_image_urls(json_data):
            yield img_url
        # dict.get() conveniently returns None or
        # the provided default argument when the
        # element is missing.
        url = json_data.get('pagination', {}).get('next_url')

images = list(get_images(base_url))
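On Python 3 the same loop can be written with `yield from`. The sketch below is only an illustration under assumptions: `fetch_json` is a hypothetical callable supplied by the caller that returns parsed JSON for a URL, the `data` key is assumed to hold each page's images, and the pages are an in-memory stand-in for the API.

```python
def iter_images(first_url, fetch_json):
    """Yield every image across all pages, following next_url until it is missing."""
    url = first_url
    while url:
        page = fetch_json(url)
        # 'data' is an assumed key holding this page's images.
        yield from page.get("data", [])
        url = page.get("pagination", {}).get("next_url")

# In-memory stand-in for the paginated API (illustrative data only).
pages = {
    "p1": {"data": [1, 2], "pagination": {"next_url": "p2"}},
    "p2": {"data": [3], "pagination": {"next_url": "p3"}},
    "p3": {"data": [4, 5]},
}
all_images = list(iter_images("p1", pages.__getitem__))
print(all_images)
```

Because the generator uses a plain loop, the number of pages is not bounded by the recursion limit, and a caller can stop iterating early without fetching the remaining pages.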
I am learning Python 3 and have a fairly simple task to complete, but I am struggling to glue it all together. I need to query an API and return the full list of applications, which I can do. I store this list and need to use it again to gather more data for each application from a different API call.
applistfull = requests.get(url,authmethod)
if applistfull.ok:
    data = applistfull.json()
    for app in data["_embedded"]["applications"]:
        print(app["profile"]["name"],app["guid"])
        summaryguid = app["guid"]
else:
    print(applistfull.status_code)
Next, I have (I think) 'summaryguid', and I need to query a different API and return a value that could exist many times for each application; in this case, the compiler used to build the code.
I can statically put a GUID in the URL and get the correct information back, but I haven't yet figured out how to do the below for all of the applications above and build a master list:
summary = requests.get(f"url{summaryguid}moreurl",authmethod)
if summary.ok:
    fulldata = summary.json()
    for appsummary in fulldata["static-analysis"]["modules"]["module"]:
        print(appsummary["compiler"])
I would prefer to not yet have someone just type out the right answer but just drop a few hints and let me continue to work through it logically so I learn how to deal with what I assume is a common issue in the future. My thought right now is I need to move my second if up as part of my initial block and continue the logic in that space but I am stuck with that.
You are on the right track! Here is the hint: the second API request can be nested inside the loop that iterates through the list of applications in the first API call. By doing so, you can get the information you require by making the second API call for each application.
import requests

applistfull = requests.get("url", authmethod)
if applistfull.ok:
    data = applistfull.json()
    for app in data["_embedded"]["applications"]:
        print(app["profile"]["name"],app["guid"])
        summaryguid = app["guid"]
        summary = requests.get(f"url/{summaryguid}/moreurl", authmethod)
        fulldata = summary.json()
        for appsummary in fulldata["static-analysis"]["modules"]["module"]:
            print(app["profile"]["name"],appsummary["compiler"])
else:
    print(applistfull.status_code)
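To build the master list the question asks for, rather than printing, the nested loop can fill a dictionary. This is a rough sketch under assumptions: `fetch_summary` is a hypothetical callable standing in for the second API request, and the application names, GUIDs and compiler names are invented sample data shaped like the responses in the question.

```python
def collect_compilers(applications, fetch_summary):
    """Return {app name: sorted list of distinct compilers} for every application."""
    master = {}
    for app in applications:
        # One summary lookup per application, keyed by its GUID.
        summary = fetch_summary(app["guid"])
        modules = summary["static-analysis"]["modules"]["module"]
        compilers = {module["compiler"] for module in modules}
        master[app["profile"]["name"]] = sorted(compilers)
    return master

# Illustrative data only; real responses come from the two API calls.
apps = [
    {"guid": "g1", "profile": {"name": "Billing"}},
    {"guid": "g2", "profile": {"name": "Portal"}},
]
summaries = {
    "g1": {"static-analysis": {"modules": {"module": [
        {"compiler": "JAVAC_8"}, {"compiler": "JAVAC_8"}]}}},
    "g2": {"static-analysis": {"modules": {"module": [
        {"compiler": "MSIL"}, {"compiler": "JAVAC_8"}]}}},
}
master_list = collect_compilers(apps, summaries.__getitem__)
print(master_list)
```

Using a set per application removes the duplicates that appear when several modules share one compiler.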
I've never written a recursive python script before. I'm used to splitting up a monolithic function into sub AWS Lambda functions. However, this particular script I am working on is challenging to break up into smaller functions.
Here is the code I am currently using for context. I am using one api request to return a list of objects within a table.
url_pega_EEvisa = requests.get('https://cloud.xxxx.com:443/prweb/api/v1/data/D_pxCaseList?caseClass=xx-xx-xx-xx', auth=(username, password))
pega_EEvisa_raw = url_pega_EEvisa.json()
pega_EEvisa = pega_EEvisa_raw['pxResults']
This returns every object (primary key) in a particular table as a list. For example,
['XX-XXSALES-WORK%20PO-1', 'XX-XXSALES-WORK%20PO-10', 'XX-XXSALES-WORK%20PO-100', 'XX-XXSALES-WORK%20PO-101', 'XX-XXSALES-WORK%20PO-102', 'XX-XXSALES-WORK%20PO-103', 'XX-XXSALES-WORK%20PO-104', 'XX-XXSALES-WORK%20PO-105', 'XX-XXSALES-WORK%20PO-106', 'XX-XXSALES-WORK%20PO-107']
I then use this list to populate more get requests using a for loop which then grabs me all the data per object.
for t in caseid:
    url = requests.get(('https://cloud.xxxx.com:443/prweb/api/v1/cases/{}'.format(t)), auth=(username, password)).json()
    data.append(url)
This particular Lambda function takes about 15 minutes, which is the limit for a single AWS Lambda function. Ideally, I'd like to split the list into smaller parts and run the same process on each. I am struggling to mark the point where it last ran before failing and to pass that information on to the next function.
Any help is appreciated!
I'm not sure I entirely understand what you want to do with the data once you've fetched all the information about each case, but in terms of breaking up the work that one Lambda is doing into many Lambdas, you should be able to chunk the list of cases and pass the chunks to new invocations of the same Lambda. Python pseudocode below; hopefully it helps illustrate the idea. I took the chunks method from another answer; it breaks the list into batches.
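import json
import boto3
import requests

client = boto3.client('lambda')

def chunks(lst, n):
    # Yield successive n-sized chunks from lst.
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

def handler(event, context):
    # username and password are assumed to be defined elsewhere, as in the question.
    url_pega_EEvisa = requests.get('https://cloud.xxxx.com:443/prweb/api/v1/data/D_pxCaseList?caseClass=xx-xx-xx-xx', auth=(username, password))
    pega_EEvisa_raw = url_pega_EEvisa.json()
    pega_EEvisa = pega_EEvisa_raw['pxResults']
    for chunk in chunks(pega_EEvisa, 10):
        # InvocationType='Event' invokes asynchronously, so this handler does not wait for each batch.
        client.invoke(
            FunctionName='lambdaToHandleBatchOfTenCases',
            InvocationType='Event',
            Payload=json.dumps(chunk)
        )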
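The chunking step can be tried on its own, without AWS. The sketch below is only an illustration: the 25 case IDs are generated sample values in the same shape as the question's list, and each JSON string is what would be passed as the Payload of one invocation.

```python
import json

def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Illustrative case IDs in the same shape as the question's list.
case_ids = ["XX-XXSALES-WORK%20PO-{}".format(n) for n in range(1, 26)]

# One JSON payload per worker invocation.
payloads = [json.dumps(batch) for batch in chunks(case_ids, 10)]

print(len(payloads))                   # number of worker invocations needed
print(len(json.loads(payloads[-1])))   # size of the final, partial batch
```

Since each worker receives its batch explicitly in the payload, no invocation needs to know where a previous one stopped.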
Hopefully that helps? Let me know if this was not on target 😅
I have been learning Python for about a month, and I spent the last week reading the Scrapy source code.
I can understand modules like spiders or crawler well enough, but then I found something interesting in 'shell.py' that totally confused me. Here is the code.
def _request_deferred(request):
    """Wrap a request inside a Deferred.

    This function is harmful, do not use it until you know what you are doing.

    This returns a Deferred whose first pair of callbacks are the request
    callback and errback. The Deferred also triggers when the request
    callback/errback is executed (ie. when the request is downloaded)

    WARNING: Do not call request.replace() until after the deferred is called.
    """
    # zeal4u: what's the meaning of codes below?
    request_callback = request.callback
    request_errback = request.errback

    def _restore_callbacks(result):
        request.callback = request_callback
        request.errback = request_errback
        return result

    d = defer.Deferred()
    d.addBoth(_restore_callbacks)
    if request.callback:
        d.addCallbacks(request.callback, request.errback)

    request.callback, request.errback = d.callback, d.errback
    return d
So my question is: why does it assign 'request.callback' and 'request.errback' to the local variables 'request_callback' and 'request_errback', and then assign them back in the nested function '_restore_callbacks'?
I know this can't be useless, but what does it really mean and how does it work?
Or should I read some related modules to figure it out? Please give me some advice. : )
I want to write a tool in Python to prepare a simulation study by creating for each simulation run a folder and a configuration file with some run-specific parameters.
study/
    study.conf
    run1
        run.conf
    run2
        run.conf
The tool should read the overall study configuration from a file including (1) static parameters (key-value pairs), (2) lists for iteration parameters, and (3) some small code snippets to calculate further parameters from the previous ones. The latter are run specific depending on the permutation of the iteration parameters used.
Before writing the run.conf files from a template, I need to run some code like this to determine the specific key-value pairs from the code snippets for that run
code = compile(code_str, 'foo.py', 'exec')
rv=eval(code, context, { })
However, as the Python documentation confirms, this just returns None.
The code string and context dictionary in the example are filled elsewhere. For this discussion, this snippet should do it:
code_str="""import math
math.sqrt(width**2 + height**2)
"""
context = {
    'width' : 30,
    'height' : 10
}
I have done this before in Perl and Java+JavaScript. There, you just give the code snippet to some evaluation function or script engine and get in return a value (object) from the last executed statement -- not a big issue.
Now, in Python I struggle with the fact that eval() is too narrow, allowing only a single expression, and exec() doesn't return a value at all. I need to import modules and sometimes do slightly more complex calculations, e.g., 5 lines of code.
Isn't there a better solution that I don't see at the moment?
During my research, I found some very good discussions about Python's eval() and exec(), and also some tricky workarounds that go via stdout and parse the return value from there. The latter would work, but it is not very nice and is already 5 years old.
The exec function will modify the global parameter (dict) passed to it. So you can use the code below
code_str="""import math
Result1 = math.sqrt(width**2 + height**2)
"""
context = {
    'width' : 30,
    'height' : 10
}
exec(code_str, context)
print (context['Result1']) # approximately 31.62
Every variable that code_str creates ends up as a key:value pair in the context dictionary. So the dict is the "object" you mentioned for JavaScript.
Edit1:
If you only need the result of the last line in code_str and want to avoid writing something like Result1=..., try the code below:
code_str="""import math
math.sqrt(width**2 + height**2)
"""
context = { 'width' : 30, 'height' : 10 }
lines = [l for l in code_str.split('\n') if l.strip()]
lines[-1] = '__myresult__='+lines[-1]
exec('\n'.join(lines), context)
print (context['__myresult__'])
This approach is not as robust as the former one, but it should work in most cases. If you need to manipulate the code in a more sophisticated way, take a look at Abstract Syntax Trees (the ast module).
Since this whole exec() / eval() thing in Python is a bit weird ... I have chosen to redesign the whole thing based on a proposal in the comments to my question (thanks @jonrsharpe).
Now the whole study specification is a .py module that the user can edit. From there, the configuration setup is written directly to a central object of the whole package. When the tool runs, the configuration module is imported using the code below:
# Note: imp is deprecated since Python 3.4 and was removed in Python 3.12.
import imp
import os
import sys

# import the configuration as a module
(path, name) = os.path.split(filename)
(name, _) = os.path.splitext(name)
(file, filename, data) = imp.find_module(name, [path])
try:
    module = imp.load_module(name, file, filename, data)
except ImportError as e:
    print(e)
    sys.exit(1)
finally:
    file.close()
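On current Python versions the imp module is no longer available (it was removed in Python 3.12), so the same idea would have to go through importlib. The following is a sketch under assumptions, not the author's code: `load_config_module` is a hypothetical helper name, and the demo writes a throwaway `study_conf.py` to a temporary directory just to have something to import.

```python
import importlib.util
import os
import tempfile

def load_config_module(filename):
    """Import the Python file at `filename` and return it as a module object."""
    name = os.path.splitext(os.path.basename(filename))[0]
    spec = importlib.util.spec_from_file_location(name, filename)
    module = importlib.util.module_from_spec(spec)
    # Run the file's top-level code inside the new module's namespace.
    spec.loader.exec_module(module)
    return module

# Demo: write a tiny study configuration and load it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "study_conf.py")
    with open(path, "w") as handle:
        handle.write("import math\nwidth = 30\nheight = 10\n"
                     "diagonal = math.sqrt(width**2 + height**2)\n")
    config = load_config_module(path)

print(config.width, config.diagonal)
```

Because the configuration is ordinary Python, derived parameters such as `diagonal` are computed at import time and can be read as plain attributes afterwards.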
I came across similar needs and finally figured out an approach by playing with ast:
import ast

code = """
def tf(n):
    return n*n
r=tf(3)
{"vvv": tf(5)}
"""

ast_ = ast.parse(code, '<code>', 'exec')
final_expr = None
for field_ in ast.iter_fields(ast_):
    if 'body' != field_[0]: continue
    if len(field_[1]) > 0 and isinstance(field_[1][-1], ast.Expr):
        final_expr = ast.Expression()
        final_expr.body = field_[1].pop().value
ld = {}
rv = None
exec(compile(ast_, '<code>', 'exec'), None, ld)
if final_expr:
    rv = eval(compile(final_expr, '<code>', 'eval'), None, ld)

print('got locals: {}'.format(ld))
print('got return: {}'.format(rv))
If the last clause is an expression, it is evaluated with eval instead of exec and its value is returned; otherwise everything is exec'ed and None is returned.
Output:
got locals: {'tf': <function tf at 0x10103a268>, 'r': 9}
got return: {'vvv': 25}
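The same idea can be wrapped in a small helper and applied to the snippet from the question. This is a sketch under assumptions: `run_snippet` is a hypothetical name, and it uses one shared dictionary for both globals and locals so that imports made by the snippet are visible to its final expression.

```python
import ast

def run_snippet(code_str, context):
    """Execute code_str in context; return the value of its last expression, if any."""
    tree = ast.parse(code_str, "<snippet>", "exec")
    last_expr = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Detach the trailing expression so it can be evaluated separately.
        last_expr = ast.Expression(body=tree.body.pop().value)
    # Run everything except the trailing expression.
    exec(compile(tree, "<snippet>", "exec"), context)
    if last_expr is None:
        return None
    return eval(compile(last_expr, "<snippet>", "eval"), context)

code_str = """import math
math.sqrt(width**2 + height**2)
"""
context = {"width": 30, "height": 10}
result = run_snippet(code_str, context)
print(result)
```

As with any use of exec() and eval(), this should only be fed snippets from a trusted source, since the code runs with the full privileges of the calling process.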
I usually program web applications in PHP. I am now learning the CherryPy framework to write web programs in Python. What I am trying to do is accept an HTTP GET request from a user, use the variables in the query string to run some functions, and finally return the result. But I think I am getting stuck on CherryPy's programming conventions.
I tried to use the index function to accept the variables sent by the GET method and then run the functions as follows:
import cherrypy
from sqlalchemy import *
from function import function

class WelcomePage:
    def index(self,number):
        if not number:
            return "failure";
        else:
            number1=number;
            return ;
        now = datetime.datetime.utcnow();
        code = generateCode();
        finalResult();

    def finalResult():
        return code;
The above code does not work: it simply ends at the return statement of index(self, number), and the remaining functions never run. I guess I am breaking the object-oriented structure of Python (which differs from PHP).
So is the following the only proper way to handle the GET input and return a result after the calculation?
@cherrypy.expose
def index(self,number):
    if not number:
        return "failure";
    else:
        cal = Calculation();
        other =OtherFunction();
        code=cal.doSomething(number1);
        final =other.doAnotherThing(code);
        return final;
That is, I need to call all the other functions inside the index function and then return the result from it. Is that the only way to work with HTTP GET (or POST) variables? Is there an alternative way to write the code that does not require calling every function inside index (which seems to make index very long)? Is there a cleaner, clearer way to process the GET variables with different functions and then return the calculated result? I have been googling for days but still can't figure out a good way.
First, coming from PHP, you should get comfortable with Python's syntax (e.g. you don't need semicolons unless you put several statements on one line) and semantics (e.g. the object-oriented features Python offers for arranging your code, your "programming structure"). Second, you should understand how the workflow of a CherryPy application differs from a common PHP deployment (the article "PHP is meant to die" is a good read on this). To your code, CherryPy is a threaded application server: your code stays in memory and can run in the background. With more power comes more responsibility.
Once you know the basics, you can arrange your code any way you like: compose or decompose it, put it in functions, classes, separate modules, or any other arrangement Python can handle. CherryPy imposes no limit here. In its simplest form it could look as follows.
#!/usr/bin/env python

import cherrypy

config = {
    'global' : {
        'server.socket_host' : '127.0.0.1',
        'server.socket_port' : 8080,
        'server.thread_pool' : 4
    }
}

class App:

    def _foo(self, value):
        return self._bar(value)[::-1]

    def _bar(self, value):
        return value * 2

    @cherrypy.expose
    def index(self, value = None):
        if not value:
            # when you visit /
            raise cherrypy.HTTPError(500)
        else:
            # when you visit e.g. /?value=abc
            return self._foo(value)

if __name__ == '__main__':
    cherrypy.quickstart(App(), config = config)
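Because the handler is an ordinary method, the helper chain can be exercised without starting a server. The sketch below copies the two helpers into a plain class with no CherryPy import, so the expose decorator is omitted and a ValueError stands in for cherrypy.HTTPError; both substitutions are assumptions made only to keep the example dependency-free.

```python
class App:
    def _bar(self, value):
        # Repeat the input twice, e.g. 'abc' -> 'abcabc'.
        return value * 2

    def _foo(self, value):
        # Reverse the doubled string.
        return self._bar(value)[::-1]

    def index(self, value=None):
        if not value:
            # Stand-in for cherrypy.HTTPError(500) so the sketch has no dependency.
            raise ValueError("missing 'value' query parameter")
        return self._foo(value)

# Equivalent to visiting /?value=abc on the running application.
response = App().index("abc")
print(response)
```

The index method stays short because it only validates the input and delegates; the actual processing lives in the helper methods, which could equally be moved to other classes or modules.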