Python regular expression for HTTP Request header

I have a question about Python regex; I don't have much experience with it yet. I am working with HTTP request messages and parsing them with regex. As you know, HTTP GET messages are in this format:
GET / HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: 10.2.0.12
Connection: Keep-Alive
I want to parse the URI, method, User-Agent, and Host fields of the message. My regex for this job is:
regex = re.compile(r'^({0})\s+(\S+)\s+[^\n]*$\n.*^User-Agent:\s*(\S+)[^\n]*$\n.*^Host:\s*(\S+)[^\n]*$\n'.format('|'.join(methods)), re.MULTILINE | re.DOTALL)
But when a message arrives like this:
GET / HTTP/1.0
Host: 10.2.0.12
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Connection: Keep-Alive
I cannot capture them, because the positions of Host and User-Agent have changed. So I need a generic regex that matches all of these messages, even when the order of the Host, User-Agent, and other fields varies.

Readability Counts (The Zen of Python)
Use findall() for each subexpression you want to find. This way your regex will be short, readable, and independent of the location of the subexpression.
Define a simple, readable regex:
>>> user=re.compile("User-Agent: (.*?)\n")
Test it with two different http headers:
>>> s1='''GET / HTTP/1.0
Host: 10.2.0.12
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Connection: Keep-Alive'''
>>> s2='''GET / HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: 10.2.0.12
Connection: Keep-Alive'''
>>> user.findall(s1)
['Wget/1.12 (linux-gnu)']
>>> user.findall(s2)
['Wget/1.12 (linux-gnu)']

You could parse the whole header block into a dictionary, like so:
headers = """GET / HTTP/1.0
Host: 10.2.0.12
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Connection: Keep-Alive"""
headers = headers.splitlines()
firstLine = headers.pop(0)
(verb, url, version) = firstLine.split()
d = {'verb': verb, 'url': url, 'version': version}
for h in headers:
    h = h.split(': ', 1)  # maxsplit=1, so values containing ': ' stay intact
    if len(h) < 2:
        continue
    field = h[0]
    value = h[1]
    d[field] = value
print d
print d['User-Agent']
print d['url']
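Since HTTP header lines follow the same "Name: value" shape as RFC 822 mail headers, the standard library's email parser can do the splitting for you. A sketch of the same idea in Python 3:

```python
from email.parser import Parser

raw = """GET / HTTP/1.0
Host: 10.2.0.12
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Connection: Keep-Alive"""

# Split off the request line, then let the stdlib parse the rest;
# header lookups on the resulting object are case-insensitive.
request_line, _, header_block = raw.partition("\n")
verb, url, version = request_line.split()
headers = Parser().parsestr(header_block)
print(verb, url, version)     # GET / HTTP/1.0
print(headers["user-agent"])  # Wget/1.12 (linux-gnu)
```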

Related

How exactly do I pass a POST body parameter using python requests

I'm trying to provide an "mfa-code" parameter to a POST request using Python requests, but the response says the parameter "mfa-code" is missing, even though I provide it via requests.post(url, data={"mfa-code": "0000"}) and have also tried requests.post(url, json={"mfa-code": "0000"}).
Here is what I'm trying to send.
POST /login2 HTTP/1.1
Host: redacted.net
Cookie: session=qexMWyQnLtSlBI8B005qnVW4OYvEwEV2; verify=wiener
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://redacted.net/login2
Content-Type: application/x-www-form-urlencoded
Content-Length: 13
Origin: https://redacted.net
Upgrade-Insecure-Requests: 1
Dnt: 1
Sec-Gpc: 1
Te: trailers
Connection: close
mfa-code=0000
And this is the request I'm sending with my python script
import requests
url = "redacted.net"
data={"mfa-code": "0000"}
r = requests.post(url, data=data)
print(r.text)
This results in a response only stating:
"Missing parameter 'mfa-code'"
I took note of how mfa-code is surrounded with ' in the response, so I went to Burp Repeater and put single quotes around mfa-code, and sure enough received the same error.
I then tried other options like json=json.dumps(data), but with the same result, as the request needs a POST body parameter of the form variable=data and not a JSON object.
What am I missing here?
Or is this something python requests cannot do?
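One difference between the captured request and the script is the session cookie: if the server bounces the login before reading the body, that alone could produce the "missing parameter" error. A sketch of attaching the cookies (built offline via prepare(), nothing is sent; the URL and cookie values simply mirror the capture above):

```python
import requests

# Cookies taken from the captured request above; without them the
# server may never process the form body at all.
req = requests.Request(
    "POST",
    "https://redacted.net/login2",
    data={"mfa-code": "0000"},  # form-encoded, matching the capture
    cookies={"session": "qexMWyQnLtSlBI8B005qnVW4OYvEwEV2",
             "verify": "wiener"},
)
prepared = req.prepare()                 # build the request without sending
print(prepared.body)                     # mfa-code=0000
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
# requests.Session().send(prepared) would actually transmit it.
```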

Python Requests module - proxy not working

I'm trying to connect to website with python requests, but not with my real IP. So, I found some proxy on the internet and wrote this code:
import requests
proksi = {
'http': 'http://5.45.64.97:3128'
}
x = requests.get('http://www.whatsmybrowser.org/', proxies = proksi)
print(x.text)
When I get the output, the proxy simply doesn't work: the site returns my real IP address. What did I do wrong? Thanks.
The answer is quite simple. Although it is a proxy service, it doesn't guarantee 100% anonymity. When you send the HTTP GET request via the proxy server, the request sent by your program to the proxy server is:
GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Now, when the proxy server sends this request to the actual destination, it sends:
GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Via: 1.1 naxserver (squid/3.1.8)
X-Forwarded-For: 122.126.64.43
Cache-Control: max-age=18000
Connection: keep-alive
As you can see, it throws your IP (in my case, 122.126.64.43) in the HTTP header: X-Forwarded-For and hence the website knows that the request was sent on behalf of 122.126.64.43
Read more about this header at: https://www.rfc-editor.org/rfc/rfc7239
If you want to host your own squid proxy server and want to disable setting X-Forwarded-For header, read: http://www.squid-cache.org/Doc/config/forwarded_for/
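For the Squid case specifically, the relevant directive in squid.conf (per the forwarded_for documentation linked above) looks like this; delete removes the header outright, while off replaces your address with "unknown":

```
# /etc/squid/squid.conf
forwarded_for delete
```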

Python - sending http requests as strings

How can I send HTTP requests as strings with Python? Something like this:
r = """GET /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.stackoverflow.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive"""
answer = send(r)
print answer # gives me the response as string
Assuming Python 3, it is recommended that you use urllib.request.
But since you specifically ask for providing the HTTP message as a string,
I assume you want to do low-level stuff, so you can also use http.client:
import http.client
connection = http.client.HTTPConnection('www.python.org')
connection.request('GET', '/')
response = connection.getresponse()
print(response.status)
print(response.msg)
answer = response.read()
print(answer)
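If you literally want to put the request string on the wire yourself, you can drop below http.client to a plain socket. A self-contained sketch: a throwaway local server stands in for www.stackoverflow.com so nothing leaves localhost; in practice you would point the socket at the real host on port 80.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Minimal stand-in server so the example runs offline.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# The raw request as a string; note the CRLF line endings and the
# blank line that terminates the header block.
raw = ("GET /hello.htm HTTP/1.1\r\n"
       f"Host: {host}:{port}\r\n"
       "Connection: close\r\n"
       "\r\n")

with socket.create_connection((host, port)) as sock:
    sock.sendall(raw.encode("ascii"))
    answer = b""
    while True:  # read until the server closes the connection
        chunk = sock.recv(4096)
        if not chunk:
            break
        answer += chunk

server.shutdown()
print(answer.decode("latin-1"))  # status line, headers, then "hello"
```

The trade-off is that you now own everything http.client normally handles: CRLF framing, Host headers, chunked encoding, redirects, and so on.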

Remove default HTTP Headers from HTTPie's request

There are a couple of default headers that HTTPie sets. I'm wondering if there is a way to remove some header, like Accept-Encoding?
The reason I'd like to unset Accept-Encoding is to check our server's behavior around HTTP compression.
Per https://github.com/jakubroztocil/httpie#http-headers , you can override those headers. For example, set Accept-Encoding to empty to achieve the same effect as if you had removed it -- per the rules at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3 .
Add each header name followed by a colon and nothing after it:
No headers:
http -v https://jsonplaceholder.typicode.com/todos/1 \
Accept: \
Accept-Encoding: \
Connection: \
Host: \
User-Agent:
Request:
GET /todos/1 HTTP/1.1
Host: jsonplaceholder.typicode.com
Response:
HTTP/1.1 200 OK
...
Standard:
http -v https://jsonplaceholder.typicode.com/todos/1
Request:
GET /todos/1 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: jsonplaceholder.typicode.com
User-Agent: HTTPie/0.9.8
Response:
HTTP/1.1 200 OK
...
The -v option displays the request. Also remember: no spaces after \ in multiline bash commands.

Add data to HTTP GET request using urllib2

If, for example, I send a request to Google, it looks like this in Fiddler:
GET http://www.google.com HTTP/1.1
Accept-Encoding: identity
Host: google.com
Connection: close
User-Agent: Python-urllib/2.7
and I want to make it look like this:
GET http://www.google.com HTTP/1.1
Accept-Encoding: identity
Host: google.com
Connection: close
User-Agent: Python-urllib/2.7
a=112&b=2335
How can I do it using urllib2 or urllib?
Thanks!!
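In Python 3 terms (urllib.request replaces urllib2), GET parameters conventionally travel in the query string rather than in a body, so the usual approach is to append them to the URL. A sketch, built offline:

```python
from urllib.parse import urlencode
from urllib.request import Request  # urllib2's Python 3 successor

# Encode the parameters into the query string instead of a body.
params = {"a": "112", "b": "2335"}
req = Request("http://www.google.com/?" + urlencode(params))
print(req.full_url)  # http://www.google.com/?a=112&b=2335
# urlopen(req) would actually send the request.
```

Note that passing data= to urllib2 silently switches the method to POST; Python 3's Request accepts an explicit method="GET" alongside data if you truly need a body on a GET, though servers rarely honor one.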
