I'm confused about Scrapy Source Codes in shell.py

I have been learning Python for about a month, and last week I started reading the Scrapy source code.
Modules like spiders and crawler were easy enough to follow, but then I found something interesting in 'shell.py' that totally confused me. Here is the code:
from twisted.internet import defer

def _request_deferred(request):
    """Wrap a request inside a Deferred.

    This function is harmful, do not use it until you know what you are doing.

    This returns a Deferred whose first pair of callbacks are the request
    callback and errback. The Deferred also triggers when the request
    callback/errback is executed (ie. when the request is downloaded)

    WARNING: Do not call request.replace() until after the deferred is called.
    """
    # zeal4u: what's the meaning of the code below?
    request_callback = request.callback
    request_errback = request.errback

    def _restore_callbacks(result):
        request.callback = request_callback
        request.errback = request_errback
        return result

    d = defer.Deferred()
    d.addBoth(_restore_callbacks)
    if request.callback:
        d.addCallbacks(request.callback, request.errback)

    request.callback, request.errback = d.callback, d.errback
    return d
So my question is: why does it save 'request.callback' and 'request.errback' into the local variables 'request_callback' and 'request_errback', and then assign them back in the nested function '_restore_callbacks'?
I know this can't be useless, but what does it really mean, and how does it work?
Or should I read some related modules to figure it out? Please give me some advice. : )
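Reading the function line by line, the local variables exist because the function temporarily hijacks the request: it overwrites request.callback and request.errback with the Deferred's own callback/errback, so whoever calls them after the download fires the Deferred instead. Without request_callback and request_errback, the original callbacks would be lost at that point. The closure _restore_callbacks keeps them, and because it is registered first with addBoth, it puts them back on the request (on success or failure) before the original callback runs. The sketch below replays that ordering with a much-simplified MiniDeferred and a hypothetical FakeRequest standing in for Twisted's Deferred and scrapy.Request; neither is the real class, and a real Deferred does more (for example, switching between the success and failure chains).

```python
class MiniDeferred:
    """Much-simplified stand-in for twisted.internet.defer.Deferred.

    It keeps (callback, errback) pairs and runs them in order when fired.
    Unlike Twisted, it never switches between the success and failure paths;
    it only exists to show the ordering of the steps.
    """

    def __init__(self):
        self._chain = []
        self.result = None

    def addCallbacks(self, callback, errback=None):
        self._chain.append((callback, errback))
        return self

    def addBoth(self, fn):
        # The same function handles success and failure, like Twisted's addBoth
        return self.addCallbacks(fn, fn)

    def _fire(self, result, failed):
        for callback, errback in self._chain:
            step = errback if failed else callback
            if step is not None:
                result = step(result)
        self.result = result

    def callback(self, result):
        self._fire(result, failed=False)

    def errback(self, failure):
        self._fire(failure, failed=True)


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: only the two attributes used."""

    def __init__(self, callback=None, errback=None):
        self.callback = callback
        self.errback = errback


def request_deferred(request):
    # Same logic as Scrapy's _request_deferred, with MiniDeferred swapped in.
    # 1. Remember the original callbacks; request.callback is overwritten below,
    #    so without these locals the originals would be lost.
    request_callback = request.callback
    request_errback = request.errback

    # 2. This closure still sees the saved originals and puts them back.
    def _restore_callbacks(result):
        request.callback = request_callback
        request.errback = request_errback
        return result

    d = MiniDeferred()
    d.addBoth(_restore_callbacks)  # runs first, on success or failure
    if request.callback:
        d.addCallbacks(request.callback, request.errback)  # then the original code

    # 3. Hijack: whoever "downloads" the request now fires the Deferred.
    request.callback, request.errback = d.callback, d.errback
    return d


def parse(response):
    return "parsed " + response


req = FakeRequest(callback=parse)
d = request_deferred(req)
print(req.callback == d.callback)  # True: the request now points at the Deferred
req.callback("<html>")             # simulate the downloader calling the callback
print(req.callback is parse)       # True: the original callback is restored
print(d.result)                    # parsed <html>
```

So the restore step makes the request look untouched again as soon as the Deferred fires, while the Deferred still gets to run the original callback as the next link in its chain.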

Related

Recursive API Calls using AWS Lambda Functions Python

I've never written a recursive Python script before. I'm used to splitting a monolithic function into smaller AWS Lambda functions. However, this particular script is hard to break up into smaller functions.
Here is the code I am currently using, for context. I make one API request that returns a list of objects within a table.
import requests

url_pega_EEvisa = requests.get('https://cloud.xxxx.com:443/prweb/api/v1/data/D_pxCaseList?caseClass=xx-xx-xx-xx', auth=(username, password))
pega_EEvisa_raw = url_pega_EEvisa.json()
pega_EEvisa = pega_EEvisa_raw['pxResults']
This returns every object (primary key) within a particular table as a list. For example:
['XX-XXSALES-WORK%20PO-1', 'XX-XXSALES-WORK%20PO-10', 'XX-XXSALES-WORK%20PO-100', 'XX-XXSALES-WORK%20PO-101', 'XX-XXSALES-WORK%20PO-102', 'XX-XXSALES-WORK%20PO-103', 'XX-XXSALES-WORK%20PO-104', 'XX-XXSALES-WORK%20PO-105', 'XX-XXSALES-WORK%20PO-106', 'XX-XXSALES-WORK%20PO-107']
I then use this list to make further GET requests in a for loop, which fetches all the data for each object.
for t in caseid:
    url = requests.get('https://cloud.xxxx.com:443/prweb/api/v1/cases/{}'.format(t), auth=(username, password)).json()
    data.append(url)
This Lambda function takes about 15 minutes, which is the maximum run time for a single AWS Lambda invocation. Ideally, I'd like to split the list into smaller parts and run the same process on each. I am struggling with recording the point where it last ran before failing and passing that information on to the next function.
Any help is appreciated!

I'm not sure I entirely understand what you want to do with the data once you've fetched all the information about the cases, but in terms of breaking up the work one Lambda is doing into many Lambdas, you should be able to chunk the list of cases and pass the chunks to new invocations of the same Lambda. Python sketch below; hopefully it helps illustrate the idea. I stole the chunks method from this answer, which breaks the list into batches.
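import json

import boto3
import requests

client = boto3.client('lambda')

def chunks(lst, n):
    """Yield successive n-sized batches from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

def handler(event, context):
    # username and password are assumed to be defined elsewhere, as in the question
    url_pega_EEvisa = requests.get('https://cloud.xxxx.com:443/prweb/api/v1/data/D_pxCaseList?caseClass=xx-xx-xx-xx', auth=(username, password))
    pega_EEvisa_raw = url_pega_EEvisa.json()
    pega_EEvisa = pega_EEvisa_raw['pxResults']
    for chunk in chunks(pega_EEvisa, 10):
        client.invoke(
            FunctionName='lambdaToHandleBatchOfTenCases',
            InvocationType='Event',  # asynchronous: don't wait for each batch to finish
            Payload=json.dumps(chunk)
        )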
Hopefully that helps? Let me know if this was not on target 😅
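The answer fans the batches out up front. For the other part of the question, recording where a run stopped, one option is to let the function watch its own remaining time and hand the unprocessed tail of the list to a fresh invocation. A minimal sketch, assuming the case IDs arrive in the event as {"case_ids": [...]} (a hypothetical payload shape) and that CASE_URL is a placeholder; context.get_remaining_time_in_millis() and context.function_name are part of the Lambda Python context object. What to do with the fetched results is left open, as in the answer.

```python
from typing import Callable, List


def process_until_deadline(
    case_ids: List[str],
    fetch_case: Callable[[str], dict],
    time_left_ms: Callable[[], int],
    continue_later: Callable[[List[str]], None],
    safety_margin_ms: int = 60_000,
) -> List[dict]:
    """Fetch cases in order and hand off the unprocessed tail before time runs out.

    The loop index is the checkpoint: everything from case_ids[i] onward has
    not been fetched yet, so exactly that slice is passed to continue_later.
    """
    results = []
    for i, case_id in enumerate(case_ids):
        if time_left_ms() < safety_margin_ms:
            continue_later(case_ids[i:])  # e.g. re-invoke this Lambda with the rest
            break
        results.append(fetch_case(case_id))
    return results


# In a real handler the three callables might be wired up like this (not run here;
# the {"case_ids": [...]} payload shape and CASE_URL are assumptions):
#
# def handler(event, context):
#     process_until_deadline(
#         event["case_ids"],
#         fetch_case=lambda t: requests.get(CASE_URL.format(t), auth=(username, password)).json(),
#         time_left_ms=context.get_remaining_time_in_millis,
#         continue_later=lambda rest: client.invoke(
#             FunctionName=context.function_name,
#             InvocationType="Event",
#             Payload=json.dumps({"case_ids": rest}),
#         ),
#     )

# Local demo with fake dependencies: each fetch "costs" 5 s of a 20 s budget.
budget = {"ms": 20_000}
handed_off = []


def fake_fetch(case_id):
    budget["ms"] -= 5_000
    return {"id": case_id}


done = process_until_deadline(
    ["PO-1", "PO-2", "PO-3", "PO-4", "PO-5"],
    fetch_case=fake_fetch,
    time_left_ms=lambda: budget["ms"],
    continue_later=handed_off.append,
    safety_margin_ms=6_000,
)
print([case["id"] for case in done])  # ['PO-1', 'PO-2', 'PO-3']
print(handed_off)                     # [['PO-4', 'PO-5']]
```

Passing the remaining slice itself, rather than an index, keeps each invocation independent: the next run does not need to know anything about the previous one.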

Is it possible to inject python code in Kwargs and how could I prevent this user input

I'm currently in the middle of writing my Bachelor's thesis, for which I'm creating a database system with Postgres and Flask.
To keep my data safe, I was working on a file to prevent SQL injections, since users should be able to submit a string via an HTTP request. Since most of the functions I use to analyze the HTTP request take kwargs built from a JSON dict in the request, I was wondering whether it is possible to inject Python code into those kwargs.
And if so, are there ways to prevent that?
To make it easier to understand what I mean, here are some example requests and code:
import json

from flask import Flask

app = Flask(__name__)

def calc_sum(a, b):
    c = a + b
    return c

@app.route('/<target>/<value>')
def handle_request(target, value):
    if target == 'calc_sum':
        cmd = json.loads(value)
        return str(calc_sum(**cmd))
    return 'unknown target', 404
Example requests:
Normal : localhost:5000/calc_sum/{"a":1, "b":2}
Injected : localhost:5000/calc_sum/{"a":1, "b:2 ): print("ham") def new_sum(a=1, b=2):return a+b":2 }
Since I'm away from the machine where all my code is, I'm unable to test it, and to be honest I'm not sure my code example would work. But I hope it conveys what I mean.
I hope you can help me, or at least nudge me in the right direction. I've searched for this, but all I can find are tutorials on "how to use kwargs".
Best regards.

Yes, you can, but not in the URL path. Try to use query arguments like these: localhost:5000/calc_sum?func=a%2Bb&a=1&b=2 (the + has to be URL-encoded as %2B, otherwise it arrives as a space)
and to get these arguments, you need to do this in Flask:
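from flask import Flask, request

app = Flask(__name__)

@app.route('/<target>')
def handle_request(target):
    if target == 'calc_sum':
        func = request.args.get('func')
        a = int(request.args.get('a'))
        b = int(request.args.get('b'))
        # eval returns the value of the expression; exec would always return None.
        # Either one runs arbitrary code taken from the URL.
        result = eval(func, {}, {'a': a, 'b': b})
        return str(result)
    return 'unknown target', 404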
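exec and eval execute Python code contained in strings; exec always returns None, so eval is the one to use when you need the value of an expression.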
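On the original question, a short check suggests that json.loads plus **kwargs cannot run injected code by itself: json.loads only builds plain data, and the question's injected string is not even valid JSON because of the unescaped quotes around "ham". A valid variant just becomes a dictionary key with an odd name, which calc_sum(**cmd) rejects with a TypeError. By contrast, exec and eval run whatever string they are given, so they should not receive request data. The sketch below adds an allow-list of callable targets plus argument and type checks; ALLOWED and call_from_json are hypothetical names, not Flask APIs.

```python
import inspect
import json


def calc_sum(a, b):
    return a + b


# Hypothetical allow-list: only these functions can be reached from a request.
ALLOWED = {"calc_sum": calc_sum}


def call_from_json(target, value):
    """Call an allow-listed function with keyword arguments decoded from JSON.

    json.loads only produces dicts, lists, strings, numbers, booleans and None;
    it never executes the text it parses. What is left to check is which keys
    and which value types arrive.
    """
    func = ALLOWED.get(target)
    if func is None:
        raise ValueError("unknown target: {!r}".format(target))
    kwargs = json.loads(value)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(kwargs, dict):
        raise ValueError("expected a JSON object")
    try:
        # Signature.bind raises TypeError for unexpected or missing arguments
        inspect.signature(func).bind(**kwargs)
    except TypeError as exc:
        raise ValueError("bad arguments: {}".format(exc)) from None
    for name, arg in kwargs.items():
        # calc_sum expects numbers; bool is rejected because it subclasses int
        if isinstance(arg, bool) or not isinstance(arg, (int, float)):
            raise ValueError("{} must be a number".format(name))
    return func(**kwargs)


# The question's injection attempt, with the inner "ham" quotes dropped so that it
# is at least valid JSON. The payload just becomes an oddly named key.
INJECTED = '{"a": 1, "b:2 ): print(1) def new_sum(a=1, b=2):return a+b": 2}'

if __name__ == "__main__":
    print(call_from_json("calc_sum", '{"a": 1, "b": 2}'))  # 3
    try:
        call_from_json("calc_sum", INJECTED)
    except ValueError as exc:
        print("rejected:", exc)
```

In the Flask view, handle_request would call call_from_json(target, value) and turn a ValueError into a 400 response.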

Imgur API: Dictionary values magically turns into None?

I know voodoo magic probably isn't the cause of this - but it sure seems like it!
I have the following code snippets, which use the Imgur API. The imgur object is the client the Imgur API uses; it has an attribute, credits, which holds the number of access credits the user has on the website.
imgur = imgurpython.ImgurClient(client_id, client_secret)
Calling:
imgur.credits
Returns the credits as normal, i.e.:
{'ClientLimit': 12500, 'UserReset': 1503185179, 'UserLimit': 500, 'UserRemaining': 0, 'ClientRemaining': 12000}
However, when I try to read the dictionary in a later function:
import time

def check_credits(imgur):
    '''
    Receives a client and, if there are not many credits left,
    waits until the credits refill, i.e. pauses the program
    '''
    credits = imgur.credits
    credits_remaining = credits['UserRemaining']
    reset_time = credits['UserReset']
    if credits_remaining < 10:
        print('not enough credits, remaining: %i' % credits_remaining)
        # time.time() is the current Unix timestamp; dt.utcnow().timestamp()
        # is off by the local UTC offset, because timestamp() treats the
        # naive utcnow() value as local time
        now = int(time.time())
        wait_time = reset_time - now
        print('waiting for: %i' % wait_time)
        time.sleep(wait_time)
Sometimes the values in the dictionary seem to turn into None instead of the integers they are supposed to be; in this case both reset_time and credits_remaining sometimes come back as None. To keep my code running I'm having to add try/except blocks all over it, and it's getting quite frustrating. By the way, this function is called whenever ImgurClientRateLimitError is raised, which happens when imgur.credits['UserRemaining'] == 0. Does anyone know why this might be happening?

Looking at the source code for the client, it seems that credits is updated automatically on each request. The updated values are read from the response headers after a call to ImgurClient.make_request, using dict.get, which returns None if the key does not exist in the headers dictionary. The code for reference is here: https://github.com/Imgur/imgurpython/blob/master/imgurpython/client.py#L143
I am not sure whether these headers are still sent on errors such as 404 or 403, but I would investigate further from there. Because of this behavior, though, it seems you would need to either cache the previous values or call the ImgurClient.get_credits method manually in these cases to get the real values. Which fix you go with is up to you.
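A minimal sketch of the "cache the previous values" option, assuming imgur.credits is a plain dict shaped like the one printed in the question. merge_credits and last_known are hypothetical names, not part of imgurpython: each call keeps the most recent non-None value for every key, so a response without the rate-limit headers cannot wipe out the numbers already seen.

```python
from typing import Dict, Optional


def merge_credits(raw: Dict[str, Optional[int]], last_known: Dict[str, int]) -> Dict[str, int]:
    """Update last_known with every non-None value in raw and return a copy.

    Keys whose value is None in raw keep whatever value was seen last.
    """
    for key, value in raw.items():
        if value is not None:
            last_known[key] = value
    return dict(last_known)


last_known: Dict[str, int] = {}
# A normal response: both values present
merge_credits({"UserRemaining": 0, "UserReset": 1503185179}, last_known)
# A response without the headers: the client reports None for both
current = merge_credits({"UserRemaining": None, "UserReset": None}, last_known)
print(current["UserRemaining"], current["UserReset"])  # 0 1503185179
```

Inside check_credits this would become credits = merge_credits(imgur.credits, last_known). A key that has never had a real value is still missing from the result, which is the case where calling ImgurClient.get_credits, the other option mentioned, makes more sense.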

Python GTK get selected value from the treeview

I am working on a mini GUI project, and I am struggling to figure out how to get the selected value from the list and return it to the main function so that I can use it somewhere else. Can someone help me, please?
####
self.device_list_store = gtk.ListStore(str, str, str, str, str)
for device in self.get_dev_list():
    self.device_list_store.append(list(device))
device_list_treeview = gtk.TreeView(self.device_list_store)
selected_row = device_list_treeview.get_selection()
selected_row.connect("changed", self.item_selected)
####
def item_selected(self, selection):
    model, row = selection.get_selected()
    if row is not None:
        selected_device = model[row][0]
At the moment, the item_selected function does not return anything. I want to return selected_device to the main function so I can use it in other functions as well.
EDIT: I've edited the code above to remove formatting errors, @jcoppens

As you can see in the documentation, the item_selected function is called with one parameter, tree_selection. But if you define the function inside a class, it also requires the self parameter, which is normally added automatically. In your (confusing) example, no class is defined, so I suspect the problem is that your program is incomplete.
Also, I suspect you don't want device_list_treeview = gtk.T... in the for loop:
for device in self.get_dev_list():
    self.device_list_store.append(list(device))
    device_list_treeview = gtk.TreeView(self.device_list_store)
And I suspect you want selected_device = mod... indented below the if:
if row is not None:
selected_device = model[row][0]
Please turn your example into a complete, correctly formatted program.
BTW: item_selected is not a good name for the signal handler. It is also called if the item is unselected (which is why the signal is called 'changed')!
And importantly: although you should first read the basic Python and GTK tutorials, you should then consider using lazka's excellent reference for all the Python APIs. There's a link on that page to download it in full and keep it at hand on your computer.
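Because item_selected is a signal handler, GTK calls it and discards whatever it returns, so there is no "main function" to return to. A common alternative, sketched here without GTK so it runs anywhere, is to store the value on the object and optionally push it to a callback. DeviceChooser, on_selection_changed and FakeSelection (a test double for TreeSelection.get_selected()) are hypothetical names; the handler name follows the point that "changed" also fires when the row is unselected.

```python
class DeviceChooser:
    """Hypothetical holder for the current selection (no GTK needed to run).

    In the real program this would be the same class that builds the TreeView,
    where the handler would be connected with
        device_list_treeview.get_selection().connect("changed", self.on_selection_changed)
    """

    def __init__(self, on_device_chosen=None):
        self.selected_device = None  # other methods read this attribute
        self._on_device_chosen = on_device_chosen

    def on_selection_changed(self, selection):
        # get_selected() returns (model, iter); iter is None when nothing is selected
        model, row = selection.get_selected()
        self.selected_device = model[row][0] if row is not None else None
        # The handler's return value is ignored, so push the value onward instead
        if self._on_device_chosen is not None:
            self._on_device_chosen(self.selected_device)

    def print_selected(self):
        # Any other method can simply use the stored value
        print("current device:", self.selected_device)


class FakeSelection:
    """Test double mimicking TreeSelection.get_selected() (hypothetical)."""

    def __init__(self, model, row):
        self._model = model
        self._row = row

    def get_selected(self):
        return self._model, self._row


devices = [["sda", "disk"], ["sdb", "usb"]]
seen = []
chooser = DeviceChooser(on_device_chosen=seen.append)
chooser.on_selection_changed(FakeSelection(devices, 1))
chooser.print_selected()  # current device: sdb
chooser.on_selection_changed(FakeSelection(devices, None))
print(seen)               # ['sdb', None]
```

Storing the value suits code that reads it later (for example when a button is clicked); the callback suits code that must react immediately to every change.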

Python CherryPy Web Programming Structure

I have usually written web applications in PHP. Now I am learning to write web programs with the CherryPy framework for Python. What I am trying to do is accept an HTTP GET request from the user, use the variables in the query string to run some functions, and finally return the result. But I think I am stuck on CherryPy's programming conventions.
I tried to use the index method to accept the variables sent by the GET method and then call the functions, as follows:
import cherrypy
from sqlalchemy import *
from function import function

class WelcomePage:
    def index(self, number):
        if not number:
            return "failure";
        else:
            number1 = number;
            return ;
        now = datetime.datetime.utcnow();
        code = generateCode();
        finalResult();

    def finalResult():
        return code;
The above code does not work: execution just ends at the return statement in index(self, number), and the code after it never runs. I guess I am breaking Python's object-oriented structure (which differs from PHP's).
So is the following the only proper way to handle the GET input and return the result after the calculation?
@cherrypy.expose
def index(self, number):
    if not number:
        return "failure";
    else:
        cal = Calculation();
        other = OtherFunction();
        code = cal.doSomething(number);
        final = other.doAnotherThing(code);
        return final;
That is, I need to call all the other functions inside index and then return the result from index. Is that the only way to work with HTTP GET (or POST) variables? Is there an alternative that doesn't require calling every function inside index (which would make index very long)? Is there a cleaner, clearer way to process the GET variable with different functions and then return the calculated result? I have searched Google for many days but still can't figure out a good way.

First, coming from PHP, you should learn Python's syntax (e.g. you don't need semicolons unless you put several statements on one line) and semantics (e.g. Python's OO capabilities for arranging your code, your "programming structure"). Second, understand the workflow difference between a CherryPy application and a common PHP deployment (you can read "PHP is meant to die", for instance). For your code, CherryPy is a threaded application server: your code stays in memory and can run in the background. That gives you more power, and with it more responsibility.
Once you know the basics, you can arrange your code however you like: compose or decompose it, put it in functions, classes, separate modules, or any other arrangement Python can handle. CherryPy imposes no limits here. In its simplest form, it could look something like this:
#!/usr/bin/env python
import cherrypy

config = {
    'global': {
        'server.socket_host': '127.0.0.1',
        'server.socket_port': 8080,
        'server.thread_pool': 4
    }
}

class App:

    def _foo(self, value):
        return self._bar(value)[::-1]

    def _bar(self, value):
        return value * 2

    @cherrypy.expose
    def index(self, value=None):
        if not value:
            # when you visit /
            raise cherrypy.HTTPError(500)
        else:
            # when you visit e.g. /?value=abc
            return self._foo(value)

if __name__ == '__main__':
    cherrypy.quickstart(App(), config=config)
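One way to keep index short, sketched under the assumption that the question's steps (doSomething, doAnotherThing) can become plain methods: put the processing in an ordinary class that knows nothing about CherryPy, and let the exposed handler make a single call. NumberPipeline and its three steps are hypothetical placeholders; only the commented wiring uses CherryPy, and it is not executed here.

```python
class NumberPipeline:
    """Hypothetical processing steps kept outside the web layer.

    Each step is a small method; process() chains them, so the exposed
    CherryPy handler only has to make one call.
    """

    def parse(self, raw):
        # Query-string values arrive as strings
        return int(raw)

    def calculate(self, number):
        # Stand-in for the question's doSomething()
        return number * 2

    def render(self, code):
        # Stand-in for doAnotherThing(): turn the result into response text
        return "result: {}".format(code)

    def process(self, raw):
        return self.render(self.calculate(self.parse(raw)))


# Wiring it into CherryPy would then look roughly like this (not run here):
#
# class WelcomePage:
#     def __init__(self):
#         self.pipeline = NumberPipeline()
#
#     @cherrypy.expose
#     def index(self, number=None):
#         if not number:
#             return "failure"
#         return self.pipeline.process(number)

print(NumberPipeline().process("21"))  # result: 42
```

Keeping the steps out of the handler also means they can be tested without starting the server, and each step can move to its own module as it grows.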
