Python get url from string(regex) - python

I am trying to extract all URLs from a list of HTTP requests. They should be stripped of the protocol, the parameters, and the slash at the end of the path (if there is one). For example:
10.4.180.222 [5/Feb/2018:08:03:40 +0100] "GET http://somewebsite.com/ HTTP/1.1" 200 1080
10.4.180.222 [5/Feb/2018:08:03:11 +0100] "GET http://www.somewebsite.cc/somepath/ HTTP/1.1" 200 3056
10.4.180.222 [5/Feb/2018:08:03:11 +0100] "GET https://www.somewebsite.ua HTTP/1.1" 200 3056
Should be:
somewebsite.com
www.somewebsite.cc/somepath
www.somewebsite.ua
I've tried to do this in two steps, first with a general-purpose URL regex (nothing sophisticated):
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file.read())
And then with urlparse:
domain = '{url.netloc}{url.path}'.format(url=urlparse(url))
It works almost fine. However, the path still ends with a slash:
www.somewebsite.cc/somepath/
So I've decided to use a regex instead. However, I only know the basics, so I can't come up with anything that works well. Right now I have something like this, but it doesn't handle the trailing "/" or different protocols :/
((?:www\.+)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*))
Thank you for any advice :)

If the trailing slash is your only problem, strip it from each match:
urls = [x.rstrip('/') for x in re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', file.read())]
In other words, just do
urls = [x.rstrip('/') for x in <your regex goes here>]
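Stripping the slash from the raw match still leaves the protocol in place, so it may help to combine both steps from the question. The following is a minimal sketch, not a definitive solution: the simplified pattern `URL_RE`, the helper names `clean_url` and `extract_urls`, and the sample lines are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Sample lines modelled on the log format in the question (illustrative only).
LOG_LINES = [
    '10.4.180.222 [5/Feb/2018:08:03:40 +0100] "GET http://somewebsite.com/ HTTP/1.1" 200 1080',
    '10.4.180.222 [5/Feb/2018:08:03:11 +0100] "GET http://www.somewebsite.cc/somepath/?a=1 HTTP/1.1" 200 3056',
    '10.4.180.222 [5/Feb/2018:08:03:11 +0100] "GET https://www.somewebsite.ua HTTP/1.1" 200 3056',
]

# Simplified pattern (an assumption): a URL runs until whitespace or a quote.
URL_RE = re.compile(r'https?://[^\s"]+')


def clean_url(url):
    """Drop scheme, query and fragment; strip trailing slashes from the path."""
    parts = urlparse(url)
    return parts.netloc + parts.path.rstrip('/')


def extract_urls(text):
    """Find every http(s) URL in text and return its cleaned form."""
    return [clean_url(match) for match in URL_RE.findall(text)]


cleaned = extract_urls("\n".join(LOG_LINES))
print(cleaned)
# ['somewebsite.com', 'www.somewebsite.cc/somepath', 'www.somewebsite.ua']
```

Letting `urlparse` do the splitting keeps the regex short, because the pattern only has to find where a URL starts and ends.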

Related

Why is the Flask 'POST' method not working?

I'm trying to create a small chat platform in Flask. Nothing fancy, just an experiment. My problem is that Flask doesn't seem to receive any input from my POST requests, even though the requests show up in the console.
text = {"text":typed} #both userenter and typed are strings
username = {"username":userenter}
requests.post(send_url, username)
requests.post(send_url, text)
typed = ''
That's the most important part of the client code, as the rest is mainly getting things printed and getting keystrokes.
This is the flask code:
chat = {"Chat":["Welcome to my chatting thing!\n"]}
@app.route('/chat', methods = ['POST'])
def chatt():
    global chat
    username = request.form['username']
    Text = request.form['text']
    chat["Chat"].append(username)
    chat["Chat"].append(": ")
    chat["Chat"].append(Text)
    chat["Chat"].append("\n")
    return
For the flask code, my main goal is to take the input from the post and append it to the list within the dictionary 'chat'
Console logs:
172.31.128.1 - - [16/Feb/2023 23:07:42] "GET /get-chat HTTP/1.1" 200 -
172.31.128.1 - - [16/Feb/2023 23:07:43] "GET /get-chat HTTP/1.1" 200 -
172.31.128.1 - - [16/Feb/2023 23:07:44] "GET /get-chat HTTP/1.1" 200 -
172.31.128.1 - - [16/Feb/2023 23:07:45] "POST /chat HTTP/1.1" 400 -
172.31.128.1 - - [16/Feb/2023 23:07:45] "POST /chat HTTP/1.1" 400 -
172.31.128.1 - - [16/Feb/2023 23:07:46] "GET /get-chat HTTP/1.1" 200 -
172.31.128.1 - - [16/Feb/2023 23:07:47] "GET /get-chat HTTP/1.1" 200 -
172.31.128.1 - - [16/Feb/2023 23:07:48] "GET /get-chat HTTP/1.1" 200 -
172.31.128.1 - - [16/Feb/2023 23:07:49] "GET /get-chat HTTP/1.1" 200 -
The console does register that the posts got through, but nothing was added to the list.
I've tried playing around with the code and trying out different syntaxes, as well as printing it out. However, no matter what I've tried, it feels like the console is the only proof that the POST statements went through.
Edit #1:
I have tried using debuggers such as breakpoint() and some others. Didn't work.
Also, somebody pointed out that my POST was returning 400 instead of 200, which indicates a malformed request. If anybody knows a solution to that, please tell me.
Tell me if you need any more information about my code.
The response is a 400 error; check what that means.
You're sending two requests instead of one, but your view expects both parameters in the same request:
data = {"text": typed, "username": userenter}
requests.post(send_url, data)
You may also want to receive some response, so the view should return something that the client can pick up.
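To make the cause of the 400 concrete, here is a minimal sketch of a view that expects both fields in one request and always returns a response. It assumes Flask is installed; the function name `post_message`, the JSON replies, and the sample values are illustrative choices, not the asker's code. Flask's test client is used so the example runs without starting a server.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
chat = {"Chat": ["Welcome to my chatting thing!\n"]}


@app.route("/chat", methods=["POST"])
def post_message():
    # request.form["key"] aborts with 400 when the key is missing;
    # .get() returns None instead, so the check can be made explicit.
    username = request.form.get("username")
    text = request.form.get("text")
    if username is None or text is None:
        return jsonify(error="both 'username' and 'text' are required"), 400
    chat["Chat"].append(f"{username}: {text}\n")
    return jsonify(ok=True)  # a view must return a response


# Exercise the route in-process with Flask's test client.
client = app.test_client()
both_fields = client.post("/chat", data={"username": "alice", "text": "hi"})
only_text = client.post("/chat", data={"text": "hi"})
print(both_fields.status_code, only_text.status_code)  # 200 400
```

The second request mirrors what the original client did: it sends only one of the two fields, so the view rejects it.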

Ruby net/http GET requests with empty body [duplicate]

This question already has answers here:
Ruby - net/http - following redirects
I'm trying to get a simple GET request working in Ruby, but I'm seeing some strange behavior.
I have an Open Web Analytics application running with Docker and it is reachable at http://127.0.0.1:8080/.
I can reach the login site and everything works fine.
Now I want to make a GET request with Ruby and analyze the body of the response, but I cannot get it to work. In other languages such as Python, or with a simple GET request from the terminal, it works fine. Why not with Ruby?
Here is my very basic Ruby code:
require 'net/http'
url = 'http://127.0.0.1:8080/'
uri = URI(url)
session = Net::HTTP.new(uri.host, uri.port)
response = session.get(uri.request_uri)
puts response.body
This doesn't output anything. If I look into the NGINX logs from the container, I can see the request being made, but the redirect is not followed as it is with the other methods (see below).
172.23.0.1 - - [02/Feb/2023:20:02:59 +0000] "GET / HTTP/1.1" 302 5 "-" "Ruby" "-" 0.088 0.088 . -
If I do a simple GET over the terminal, it works:
GET http://127.0.0.1:8080/
will output the correct body, and in the NGINX logs I can see the following:
172.23.0.1 - - [02/Feb/2023:20:20:10 +0000] "GET / HTTP/1.1" 302 5 "-" "lwp-request/6.61 libwww-perl/6.61" "-" 0.086 0.088 . -
172.23.0.1 - - [02/Feb/2023:20:20:10 +0000] "GET /index.php?owa_do=base.loginForm&owa_go=http%3A%2F%2F127.0.0.1%3A8080%2F& HTTP/1.1" 200 3200 "-" "lwp-request/6.61 libwww-perl/6.61" "-" 0.086 0.088 . -
Doing it in Python with the following basic code also works and gives similar results as with the terminal GET version:
import requests
x = requests.get("http://127.0.0.1:8080/")
print(x.content)
What am I doing wrong?
Got it working by following the redirects, as described in the linked question:
# Keep requesting until the response is no longer a redirect.
begin
  response = Net::HTTP.get_response(URI.parse(url))
  url = response['location']
end while response.is_a?(Net::HTTPRedirection)
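The same idea can be shown in Python with the low-level `http.client` module, which, like the Ruby call in the question, returns the 302 response itself instead of following it. This is a sketch under stated assumptions: `RedirectingHandler` is a hypothetical stand-in for the real application, and `fetch` and `max_redirects` are names invented for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin, urlsplit


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the app: '/' redirects to '/login'."""

    def do_GET(self):
        if self.path == "/":
            self.send_response(302)
            self.send_header("Location", "/login")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"login form"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(url, max_redirects=5):
    """GET url, following Location headers by hand."""
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        conn = http.client.HTTPConnection(parts.hostname, parts.port)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        response = conn.getresponse()
        body = response.read()
        location = response.getheader("Location")
        conn.close()
        if response.status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)  # also resolves relative targets
            continue
        return response.status, body
    raise RuntimeError("too many redirects")


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, body = fetch(f"http://127.0.0.1:{server.server_port}/")
server.shutdown()
server.server_close()
print(status, body)  # 200 b'login form'
```

Two details differ from the Ruby loop: the redirect count is capped so a redirect cycle cannot loop forever, and `urljoin` resolves a relative `Location` value against the current URL.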

Pythonic way of parsing possibly quoted fields

To do this, I'd normally write a function that pulls one field at a time from the input string, and then loop until the input string is empty.
But there must be a more pythonic way of doing it that splits everything up at once.
Fields in the input string are separated by a space, and fields that contain spaces are enclosed by quotation marks. Quoted fields do not contain quotation marks.
A real example of this format is a web server's access_log file:
216.244.66.234 - - [01/Nov/2019:19:20:07 +0000] "GET /robots.txt HTTP/1.1" 200 67 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)"
EDIT:
access_log was a bad choice as an example, as it contains a bracket-delimited field that contains a space.
But since there is a simple solution to my original question (shlex.split()), I'll revise this question to include processing the bracketed field too (again with no internal delimiter character).
What I'm looking for is an example of parsing a string into fields in a way other than using a function to pull one token out of the string at a time.
IIUC, you could use shlex.split:
from shlex import split
s = '216.244.66.234 - - [01/Nov/2019:19:20:07 +0000] "GET /robots.txt HTTP/1.1" 200 67 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)"'
for field in split(s):
    print(field)
Output:
216.244.66.234
-
-
[01/Nov/2019:19:20:07
+0000]
GET /robots.txt HTTP/1.1
200
67
-
Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)
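As the output shows, shlex.split() breaks the bracketed timestamp into two fields. One possible way to handle the revised question is a single regex with one alternative per field type. This is a sketch, not part of the original answer: `FIELD_RE` and `split_fields` are illustrative names, the user-agent string is shortened, and it relies on the question's stated assumption that quoted and bracketed fields never contain their own delimiter.

```python
import re

# One alternative per field type: "quoted", [bracketed], or a bare token.
FIELD_RE = re.compile(r'"([^"]*)"|\[([^\]]*)\]|(\S+)')


def split_fields(line):
    """Split line into fields, dropping the surrounding quotes or brackets."""
    # Exactly one group takes part in each match; lastindex is its number.
    return [m.group(m.lastindex) for m in FIELD_RE.finditer(line)]


line = ('216.244.66.234 - - [01/Nov/2019:19:20:07 +0000] '
        '"GET /robots.txt HTTP/1.1" 200 67 "-" '
        '"Mozilla/5.0 (compatible; DotBot/1.1)"')
fields = split_fields(line)
for field in fields:
    print(field)
```

With this input the timestamp comes back as the single field `01/Nov/2019:19:20:07 +0000`, and the whole line is split in one pass without a token-at-a-time loop.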

404 while connecting to /hello/1 but 200 while connecting to any other number such as /hello/12 in flask

I'm trying to learn Flask but am stuck on an error, or maybe an issue.
def check_int(no):
    return "number is %d" % no
app.add_url_rule('/hello/<int:no>', 'nothign_specific', check_int)
When I make a curl call to http://127.0.0.1:5000/hello/1 it fails, whereas the same call with any number other than 1 succeeds.
http://127.0.0.1:5000/hello/<any number apart from 1 passes>
127.0.0.1 - - [05/Aug/2016 14:17:48] "GET /hello/1/ HTTP/1.1" 404 -
127.0.0.1 - - [05/Aug/2016 14:18:01] "GET /hello/12 HTTP/1.1" 200 -
Can someone let me know what's happening here?
In Flask, the trailing slash in a route (or rule) definition is significant. Your rule has no trailing slash, but the failing request in your log is /hello/1/, which has one. If you add a trailing / to your URL rule, i.e.
'/hello/<int:no>/'
then you will be able to use both (requests with or without the /).
According to the Flask docs, a route with a trailing slash is treated like a folder name in a file system: if it is accessed without the slash, Flask recognizes it and redirects you to the URL with the slash. In contrast, a route defined without a trailing slash is treated like the pathname of a file, i.e. it returns 404 when accessed with a trailing slash.
Read more: http://flask.pocoo.org/docs/0.11/quickstart/, section "Unique URLs / Redirection Behavior"
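The behaviour can be checked without curl by registering both kinds of rule and querying them through Flask's test client. This is a minimal sketch assuming Flask is installed; the second rule `/hi/<int:no>/` and the endpoint names are invented for the comparison.

```python
from flask import Flask

app = Flask(__name__)


def check_int(no):
    return "number is %d" % no


# Same view under two rules: one without and one with a trailing slash.
app.add_url_rule("/hello/<int:no>", "without_slash", check_int)
app.add_url_rule("/hi/<int:no>/", "with_slash", check_int)

client = app.test_client()
results = {
    path: client.get(path).status_code
    for path in ("/hello/1", "/hello/1/", "/hi/1/", "/hi/1")
}
for path, status in results.items():
    print(path, status)
```

`/hello/1` and `/hi/1/` should return 200 and `/hello/1/` should return 404. `/hi/1` should return a redirect to `/hi/1/`; the exact redirect status code depends on the installed Werkzeug version.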

Django URL dispatcher not matching named group

I'm trying to make a Django site, but the group matching in the URL dispatcher is giving me "p" no matter what I enter into the URL. Here are the pertinent parts of my code:
From the user app's urls.py (it does get included in the main urls.py):
url(r'^lookup?(?P<match_str>\w+)/$', views.lookup, name='user_lookup')
From views.py:
def lookup(request, match_str):
    users = User.objects.filter(name__contains=match_str)
    json = serializers.serialize("json", users)
    return json
And a couple log entries:
[01/Jul/2014 22:43:17] "GET /user/lookup/?z HTTP/1.1" 500 11363
[01/Jul/2014 22:43:18] "GET /user/lookup/?za HTTP/1.1" 500 11363
On closer inspection, it looks like my AJAX is actually sending two calls, and the second call is actually what's being matched. The logs for the second calls of the above log lines are:
[01/Jul/2014 22:43:17] "GET /merchant/lookup?z HTTP/1.1" 301 0
[01/Jul/2014 22:43:18] "GET /merchant/lookup?za HTTP/1.1" 301 0
I put a "debug" line in the view to print match_str, and no matter what I enter, I get 'p'. What is going on here?
Per karthikr's request, here's the result of print request.GET, match_str
<QueryDict: {u'za': [u'']}> p
Your regex doesn't match the URL from the log. The GET goes to /user/lookup/, but the pattern itself does not contain the string user. With the regex changed to ^lookup/\?(?P<match_str>\w+)$, the string lookup/?someuser produces a named group match_str with the value someuser.
I recommend using one of the many online regex testers to play with the URL regex.
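Python's own re module works as such a tester. The sketch below applies both patterns to plain strings; the variable names are illustrative. It suggests where the constant 'p' comes from: the unescaped ? makes the final p of lookup optional, so for the path lookup/ the group captures that p. Note also that Django matches URL patterns against the path only, so the query string (?za) is not part of the matched text and arrives in request.GET, as the printed QueryDict in the question shows.

```python
import re

# Pattern from the question: the unescaped "?" makes the preceding "p" optional.
original = re.compile(r'^lookup?(?P<match_str>\w+)/$')
# Pattern from the answer: the "?" is escaped, so it matches a literal "?".
escaped = re.compile(r'^lookup/\?(?P<match_str>\w+)$')

# What the dispatcher sees for /user/lookup/?za once the "user/" prefix
# and the query string are removed.
path_only = "lookup/"

captured = original.match(path_only).group("match_str")
print(captured)  # p

# The escaped pattern needs a literal "?" in the text, which the path lacks.
print(escaped.match(path_only))  # None

# On a string that does contain the "?", the named group works as described.
print(escaped.match("lookup/?someuser").group("match_str"))  # someuser
```

Because the dispatcher never sees the query string, reading the value from request.GET, or moving it into the path with a pattern such as ^lookup/(?P<match_str>\w+)/$, may be the more reliable route.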
