Django URL dispatcher not matching named group - python

I'm trying to make a Django site, but the group matching in the URL dispatcher is giving me "p" no matter what I enter into the URL. Here are the pertinent parts of my code:
From user's urls.py (it does get included in the main urls.py)
url(r'^lookup?(?P<match_str>\w+)/$', views.lookup, name='user_lookup')
From views.py
def lookup(request, match_str):
    users = User.objects.filter(name__contains=match_str)
    json = serializers.serialize("json", users)
    return json
And a couple log entries:
[01/Jul/2014 22:43:17] "GET /user/lookup/?z HTTP/1.1" 500 11363
[01/Jul/2014 22:43:18] "GET /user/lookup/?za HTTP/1.1" 500 11363
On closer inspection, it looks like my AJAX is actually sending two calls, and the second call is actually what's being matched. The logs for the second calls of the above log lines are:
[01/Jul/2014 22:43:17] "GET /merchant/lookup?z HTTP/1.1" 301 0
[01/Jul/2014 22:43:18] "GET /merchant/lookup?za HTTP/1.1" 301 0
I put a "debug" line in the view to print match_str, and no matter what I put in the URL, I get 'p'. What is going on here?
Per karthikr's request, here's the result of print request.GET, match_str:
<QueryDict: {u'za': [u'']}> p

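Your regex doesn't match the URL from the log the way you expect. The GET goes to /user/lookup/, and the user/ prefix is presumably consumed by the include in the main urls.py, so this pattern is only tested against lookup/. Django also matches URL patterns against the path only, never the query string, so nothing after the ? in the request can end up in match_str. What you are seeing comes from the unescaped ? in lookup?, which makes the preceding p optional: the regex matches looku, skips the optional p, and the named group then captures that p, which is why match_str is always 'p'.
Either put the value in the path, with a regex such as ^lookup/(?P<match_str>\w+)/$, so that the request lookup/someuser/ creates a named group match_str with the value someuser, or keep the pattern as ^lookup/$ and read the value from request.GET in the view.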
I recommend using one of the many online regex testers to play with the URL regex.
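The same experiment can be run without an online tester, using Python's re module. This is only a sketch: it assumes the path Django tests against this pattern is lookup/ (the user/ prefix having been stripped by the including urls.py), and PATH_BASED is a hypothetical alternative pattern, not something from the question.

```python
import re

# Pattern from the question.
ORIGINAL = r'^lookup?(?P<match_str>\w+)/$'
# Hypothetical alternative that carries the value in the path instead.
PATH_BASED = r'^lookup/(?P<match_str>\w+)/$'


def captured(pattern, path):
    """Return the match_str group, or None when the pattern does not match."""
    match = re.match(pattern, path)
    return match.group('match_str') if match else None


# The query string is not part of the path, so only 'lookup/' is tested.
print(captured(ORIGINAL, 'lookup/'))              # p
print(captured(PATH_BASED, 'lookup/someuser/'))   # someuser
print(captured(PATH_BASED, 'lookup/'))            # None
```

The first call shows where the 'p' comes from: the optional p is given up so that \w+ has something to capture.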

Related

How to capture response.code for each url that is attempted to scrape?

I have a large list of URLs to scrape, and after multiple tests I noticed that the output from running the spider includes a results section showing all the response codes the crawler encountered. But when I run my code, which has this simple line in it, ALL the URLs come back with a code of 200:
urlStatusCode = response.status
In the debug window the breakdown looks like this, and I was hoping to capture the same thing in my file so that I can easily identify which URLs I need to validate, and adjust the code if needed.
Response Count 200 = 2494
Response Count 301 = 122
Response Count 404 = 37
I know what they all mean, but I would like to capture the actual codes in the CSV file that the scrape creates so that I can investigate the troubled URLs.
I don't think you want to capture 301 response codes. When Scrapy finds a 301, by default it yields a new request for the redirect target (a new URL), and your callback only receives the response to the final URL (after following all redirects).
As for 404 responses, they never reach your callback by default. If you want your callback to receive these responses, you have two options:
Add 404 to the HTTPERROR_ALLOWED_CODES setting, so that 404 responses also reach your callbacks
Use an errback to handle 404 responses
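A minimal sketch of the first option, assuming the setting lives in the project's settings.py. The helper name status_row and the example URL are made up for illustration, and the SimpleNamespace object only stands in for a real Scrapy response so the helper can be tried without running a crawl.

```python
from types import SimpleNamespace

# settings.py fragment: let 404 responses through to the spider callbacks
# instead of having them filtered out before they arrive.
HTTPERROR_ALLOWED_CODES = [404]


def status_row(response):
    """Build one output row from any object that has .url and .status."""
    return {"url": response.url, "status": response.status}


# Stand-in for a Scrapy response, used only to exercise the helper here.
fake_404 = SimpleNamespace(url="http://example.com/missing", status=404)
row = status_row(fake_404)
print(row)
```

Inside a spider callback, yielding status_row(response) would put the status next to each URL in the exported CSV.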

Python get url from string(regex)

So what I am trying to do is extract all URLs from a list of HTTP requests. They should be stripped of the protocol, the parameters and the slash at the end of the path (if it exists). So for example:
10.4.180.222 [5/Feb/2018:08:03:40 +0100] "GET http://somewebsite.com/ HTTP/1.1" 200 1080
10.4.180.222 [5/Feb/2018:08:03:11 +0100] "GET http://www.somewebsite.cc/somesubdomain/ HTTP/1.1" 200 3056
10.4.180.222 [5/Feb/2018:08:03:11 +0100] "GET https://www.somewebsite.ua HTTP/1.1" 200 3056
Should be:
somewebsite.com
www.somewebsite.cc/somepath
www.somewebsite.ua
I've tried to do this in two steps, without using any sophisticated regex (just a general one for any URL):
urls = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file.read())
And then using urlparse.
domain = '{url.netloc}{url.path}'.format(url=urlparse(url))
It works almost fine. However, I am getting paths ending with a slash:
www.somewebsite.cc/somepath/
So I've decided to use regex. However, I know only the basics, so I can't come up with anything that works well. Right now I have something like this, but it doesn't cover the "/" thing and different protocols :/
Thank you for any advice :)
((?:www\.+)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*))
If the end slash is your only problem, this is the solution.
urls = [ x.rstrip('/') for x in re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file.read()) ]
In other words, just do
urls = [ x.rstrip('/') for x in < your regex goes here > ].
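Putting the two steps from the question together with the rstrip fix might look like the sketch below. It assumes Python 3 (urlparse imported from urllib.parse), and URL_RE is a deliberately simplified stand-in for the longer regex in the question, not a replacement recommended by the answer.

```python
import re
from urllib.parse import urlparse

# Simplified stand-in pattern: a URL runs until whitespace or a double quote.
URL_RE = re.compile(r'https?://[^\s"]+')


def clean_urls(log_text):
    """Return host + path for every URL in the text, without a trailing slash."""
    cleaned = []
    for raw in URL_RE.findall(log_text):
        parts = urlparse(raw)
        # netloc + path drops the protocol and the query string;
        # rstrip('/') removes the slash at the end of the path.
        cleaned.append((parts.netloc + parts.path).rstrip('/'))
    return cleaned


sample = (
    '10.4.180.222 [5/Feb/2018:08:03:40 +0100] "GET http://somewebsite.com/ HTTP/1.1" 200 1080\n'
    '10.4.180.222 [5/Feb/2018:08:03:11 +0100] "GET https://www.somewebsite.ua HTTP/1.1" 200 3056\n'
)
print(clean_urls(sample))  # ['somewebsite.com', 'www.somewebsite.ua']
```

Stripping the slash after urlparse, rather than on the raw match, also handles URLs where a query string follows the trailing slash.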

Django request.GET.get() truncating url string

I am sending a message from a Chrome extension to a Django app running locally using chrome.runtime.sendMessage. I am able to capture the message in the URL, but somehow the whole GET parameter is not being captured. For example,
"GET /sensitiveApi/?text=%20%20%20%20The%20Idiots%20-%20Rainbow%20Six%20Siege%20Funny%20Moments%20&%20Epic%20Stuff%20%20We%27re%20back%20with%20some%20Rainbow%20Six%20Siege%20funny%20moments!%20All%20clips%20were%20streamed%20live%20on%20my%20Twitch:%20https://www.twitch.tv/teosgameMore%20Siege%20funny%20moments:%20https://www.youtube.com/playlist?list...Discord:%20https://discord.gg/teoTwitter:%20https://twitter.com/LAGxPeanutPwnerInstagram:%20https://www.instagram.com/photeographPeople%20in%20video:Alex:%20https://twitter.com/AlexandraRose_GKatie:%20https://www.twitch.tv/katielouise_jKatja:%20https://www.twitch.tv/katjawastakenPaddy:%20https://twitter.com/Patward96Smii7y:%20https://www.youtube.com/user/SMii7YSnedger:%20https://www.twitch.tv/snedgerStefan:%20https://twitter.com/lagxsourTortilla:%20https://twitter.com/Tortilla_NZColderMilk:%20https://www.youtube.com/user/ColderMilkColderMilk%20Twitch:%20https://www.twitch.tv/colder_milkColderMilk:%20Twitter:%20https://twitter.com/colder_milkMusic%20used:Outro:%20Come%20Back%20from%20San%20Francisco%20(Instrumental)%20by%20Rameses%20B%20https://www.youtube.com/watch?v=fBWac...%20Go%20check%20out%20his%20music!%20:)%20https://www.youtube.com/RamesesB2 HTTP/1.1" 200 2
This is one request that I want to capture, and I am doing request.GET.get('text', ''), but all it returns is this:
The Idiots - Rainbow Six Siege Funny Moments
How do I capture the whole GET parameter?
This is how I use chrome.runtime.sendMessage,
chrome.runtime.sendMessage({
method: 'GET',
action: 'xhttp',
url: "http://127.0.0.1:8000/sensitiveApi/?text=",
data : text
});
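The text contains an unescaped ampersand (&), which needs to be percent-encoded. Since & separates query parameters, the value of text ends at the first one: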
>>> import urllib
>>> print(urllib.quote('&'.encode('utf-8')))
%26
For example, the URL http://www.example.com?fields=name&age would look like this with the & encoded:
url = http://www.example.com?fields=name%26age
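The truncation can be reproduced with the standard library alone. The sketch below assumes Python 3 (urllib.parse rather than the Python 2 urllib.quote shown above) and uses a shortened version of the text from the question. In the extension itself, the JavaScript counterpart of quote would be encodeURIComponent.

```python
from urllib.parse import parse_qs, quote, urlsplit

BASE = "http://127.0.0.1:8000/sensitiveApi/?text="
text = "Funny Moments & Epic Stuff"

# Only spaces encoded, as in the logged request: the raw & survives.
bad_url = BASE + text.replace(" ", "%20")
# Everything encoded, including & (safe="" leaves no reserved characters as-is).
good_url = BASE + quote(text, safe="")

bad_value = parse_qs(urlsplit(bad_url).query)["text"][0]
good_value = parse_qs(urlsplit(good_url).query)["text"][0]

print(repr(bad_value))   # 'Funny Moments '
print(repr(good_value))  # 'Funny Moments & Epic Stuff'
```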

404 while connecting to /hello/1 but 200 while connecting to any other number such as /hello/12 in flask

Trying to learn flask but stuck with some error or maybe an issue.
def check_int(no):
    return "number is %d" % no
app.add_url_rule('/hello/<int:no>', 'nothign_specific', check_int)
So when I do a curl call to http://127.0.0.1:5000/hello/1 it fails, whereas the same curl call with any other number apart from 1 passes.
http://127.0.0.1:5000/hello/<any number apart from 1 passes>
127.0.0.1 - - [05/Aug/2016 14:17:48] "GET /hello/1/ HTTP/1.1" 404 -
127.0.0.1 - - [05/Aug/2016 14:18:01] "GET /hello/12 HTTP/1.1" 200 -
Can someone let me know what's happening here?
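In Flask, whether your route (or rule) definition has a trailing slash is significant. Your rule has none, but the log shows that the failing request went to /hello/1/ with a trailing slash. If you add a trailing / to your URL rule, i.e.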
'/hello/<int:no>/'
then you would be able to use both (request with or without /).
According to the Flask docs, a route with a trailing slash is treated like a folder name in a file system: if accessed without the slash, Flask will recognize it and redirect you to the one with the slash. In contrast, a route that is defined without a trailing slash is treated like the pathname of a file, i.e. it will return a 404 when accessed with a trailing slash.
Read more: http://flask.pocoo.org/docs/0.11/quickstart/, section "Unique URLs / Redirection Behavior"
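Both behaviours can be checked side by side with Flask's test client. This is a sketch only: the /hello2 prefix and the endpoint names no_slash and with_slash are made up so that the two rule styles can coexist in one app.

```python
from flask import Flask

app = Flask(__name__)


def check_int(no):
    return "number is %d" % no


# Rule without a trailing slash: treated like a file name.
app.add_url_rule('/hello/<int:no>', 'no_slash', check_int)
# Rule with a trailing slash: treated like a folder name.
app.add_url_rule('/hello2/<int:no>/', 'with_slash', check_int)

client = app.test_client()
no_slash_exact = client.get('/hello/1').status_code
no_slash_extra = client.get('/hello/1/').status_code      # extra slash: not found
with_slash_exact = client.get('/hello2/1/').status_code
with_slash_missing = client.get('/hello2/1').status_code  # missing slash: redirect

print(no_slash_exact, no_slash_extra, with_slash_exact, with_slash_missing)
```

The redirect status for the missing slash depends on the Flask/Werkzeug version (301 in older releases, 308 in newer ones), so the sketch does not assume a specific value.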

How to get real-time POST updates with Django and the Instagram API?

I am trying to use python-instagram to get Instagram media in real time.
I use api.create_subscription with a tag, and my callback URL is a Django web page on a remote web server.
My Python script (I run it on my local computer):
api = InstagramAPI(client_id='my_id', client_secret='my_secret')
sub = api.create_subscription(object='tag', object_id='test', aspect='media', callback_url='my_url/insta')
print sub
while 1:
    pass
My Django view, called via the callback URL (I run it on the remote web server):
def getInstagramPicture(request):
    if request.method == "GET":
        mode = request.GET.get("hub.mode")
        challenge = request.GET.get("hub.challenge")
        verify_token = request.GET.get("hub.verify_token")
        return HttpResponse(challenge)
    if request.method == "POST":
        print "post"
I think the subscription works well. Web server terminal logs:
[20/Jan/2015 13:30:11] "GET /insta?hub.challenge=1aed90578d1743a3afb865cc2a6b69cc&hub.mode=subscribe HTTP/1.1" 301 0
[20/Jan/2015 13:30:11] "GET /insta/?hub.challenge=1aed90578d1743a3afb865cc2a6b69cc&hub.mode=subscribe HTTP/1.1" 200 32
And local terminal log:
sub {'meta': {'code': 200}, 'data': {'object': 'tag', 'object_id': 'test', 'aspect': 'media', 'callback_url': 'my_url/insta', 'type': 'subscription', 'id': '15738925'}}
But my problem is that when I post a picture on Instagram with the tag "test", my view is not called, and I see this in my web server terminal:
[20/Jan/2015 13:31:24] "POST /insta HTTP/1.1" 500 65563
Why is my view not called when I post a picture on Instagram?
You need to correct your callback URL to match what is defined in your urlconf. You currently tell Instagram that your callback URL is <server>/insta, but from the request logs it seems your urlconf is expecting <server>/insta/ (with a trailing slash).
This works ok for GET requests, as seen below
[20/Jan/2015 13:30:11] "GET /insta?hub.challenge=1aed90578d1743a3afb865cc2a6b69cc&hub.mode=subscribe HTTP/1.1" 301 0
[20/Jan/2015 13:30:11] "GET /insta/?hub.challenge=1aed90578d1743a3afb865cc2a6b69cc&hub.mode=subscribe HTTP/1.1" 200 32
(note the first line is a 301 redirect)
Django's default APPEND_SLASH setting ensures that the request for /insta is automatically redirected to the /insta/ view that you have defined.
However, Django can't do that for a POST request: a client following the redirect would normally reissue it as a GET, so the POST data would be lost. This is why you see a 500 error in your logs when a POST request is made to the non-existent URL /insta:
[20/Jan/2015 13:31:24] "POST /insta HTTP/1.1" 500 65563
See also: https://stackoverflow.com/a/9739046/202168
The code shown does not appear to handle POST requests at all. It merely prints out "post" and returns nothing.
The web server log shows an HTTP 500 error (Internal Server Error) and a 64 KiB error page that probably tells you exactly why.
You need to implement a handler for POST requests, and this handler would probably be similar to the GET handler that you already have.
It seems Instagram uses the POST method to access your callback URL.
Try this:
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

# Exempt the view from CSRF checks, since the POST comes from an external service
@csrf_exempt
def getInstagramPicture(request):
    if request.method == "GET":
        mode = request.GET.get("hub.mode")
        challenge = request.GET.get("hub.challenge")
        verify_token = request.GET.get("hub.verify_token")
        return HttpResponse(challenge)
    if request.method == "POST":
        mode = request.POST.get("hub.mode")
        challenge = request.POST.get("hub.challenge")
        verify_token = request.POST.get("hub.verify_token")
        return HttpResponse(challenge)
