Python 3.3 - Improving server security with BaseHTTPRequestHandler

I have lately been improving security on my web server, which I wrote myself using http.server and BaseHTTPRequestHandler. I have blocked (403'd) the essential server files, which I do not want users to be able to access: the Python server script itself, all databases, and some HTML templates.
However, in a post on Stack Overflow I read that using open(curdir + sep + self.path) in a do_GET handler can potentially make every file on your computer readable.
Can someone explain this to me? If self.path is ip:port/index.html every time, how can someone access files that are above the root / directory?
I understand that the user (obviously) can change index.html to anything else, but I don't see how they can access directories above root.
Also, if you're wondering why I'm not using nginx or apache: I wanted to create my own web server and website for learning purposes. I have no intention of running an actual website myself, and if I ever do, I will probably rent a server or use existing server software.
from os import curdir, sep
import http.server
import template  # my own template engine module

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            if "SOME BLOCKED FILE OR DIRECTORY" in self.path:
                self.send_error(403, "FORBIDDEN")
                return
            # I have about 6 more of these 403 parts, but I left them out for readability
            if self.path.endswith(".html"):
                if self.path.endswith("index.html"):
                    # template is the Template Engine that I created to create dynamic HTML content
                    parser = template.TemplateEngine()
                    content = parser.get_content("index", False, "None", False)
                    self.send_response(200)
                    self.send_header("Content-type", "text/html")
                    self.end_headers()
                    self.wfile.write(content.encode("utf-8"))
                    return
                elif self.path.endswith("auth.html"):
                    parser = template.TemplateEngine()
                    content = parser.get_content("auth", False, "None", False)
                    self.send_response(200)
                    self.send_header("Content-type", "text/html")
                    self.end_headers()
                    self.wfile.write(content.encode("utf-8"))
                    return
                elif self.path.endswith("about.html"):
                    parser = template.TemplateEngine()
                    content = parser.get_content("about", False, "None", False)
                    self.send_response(200)
                    self.send_header("Content-type", "text/html")
                    self.end_headers()
                    self.wfile.write(content.encode("utf-8"))
                    return
                else:
                    try:
                        f = open(curdir + sep + self.path, "rb")
                        self.send_response(200)
                        self.send_header("Content-type", "text/html")
                        self.end_headers()
                        self.wfile.write(f.read())
                        f.close()
                        return
                    except IOError:
                        self.send_response(404)
                        self.send_header("Content-type", "text/html")
                        self.end_headers()
                        return
            else:
                if self.path.endswith(".css"):
                    h1 = "Content-type"
                    h2 = "text/css"
                elif self.path.endswith(".gif"):
                    h1 = "Content-type"
                    h2 = "image/gif"
                elif self.path.endswith(".jpg"):
                    h1 = "Content-type"
                    h2 = "image/jpeg"
                elif self.path.endswith(".png"):
                    h1 = "Content-type"
                    h2 = "image/png"
                elif self.path.endswith(".ico"):
                    h1 = "Content-type"
                    h2 = "image/x-icon"
                elif self.path.endswith(".py"):
                    h1 = "Content-type"
                    h2 = "text/plain"
                elif self.path.endswith(".js"):
                    h1 = "Content-type"
                    h2 = "application/javascript"
                else:
                    h1 = "Content-type"
                    h2 = "text/plain"
                f = open(curdir + sep + self.path, "rb")
                self.send_response(200)
                self.send_header(h1, h2)
                self.end_headers()
                self.wfile.write(f.read())
                f.close()
                return
        except IOError:
            if "html_form_action.asp" in self.path:
                pass
            else:
                self.send_error(404, "File not found: %s" % self.path)
        except Exception as e:
            self.send_error(500)
            print("Unknown exception in do_GET: %s" % e)

You're making an invalid assumption:
If the self.path is ip:port/index.html every time, how can someone access files that are above the root / directory?
But self.path is never ip:port/index.html. Try logging it and see what you get.
For example, if I request http://example.com:8080/foo/bar/index.html, the self.path is not example.com:8080/foo/bar/index.html, but just /foo/bar/index.html. In fact, your code couldn't possibly work otherwise, because curdir+ sep + self.path would give you a path starting with ./example.com:8080/, which won't exist.
And then ask yourself what happens if it's /../../../../../../../etc/passwd.
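To make this concrete, here is a small standalone sketch of what the naive concatenation resolves to (the exact result depends on your working directory, but it always lands outside it):

```python
import os
from os import curdir, sep

# a malicious request path, exactly as it would arrive in self.path
path = "/../../../../../../../etc/passwd"

naive = curdir + sep + path        # what the handler passes to open()
resolved = os.path.abspath(naive)  # collapse the '..' components

# `resolved` now points outside the server directory entirely
print(resolved)
```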
This is one of many reasons to use os.path instead of string manipulation for paths. For example, instead of this:
f = open(curdir + sep + self.path, "rb")
Do this:
path = os.path.abspath(os.path.join(curdir, self.path))
if os.path.commonprefix((path, curdir)) != curdir:
    # illegal!
I'm assuming that curdir here is an absolute path, not just from os import curdir or some other thing that's more likely to give you . than anything else. If it's the latter, make sure to abspath it as well.
This can catch other ways of escaping the jail as well as passing in .. strings… but it's not going to catch everything. For example, if there's a symlink pointing out of the jail, there's no way abspath can tell that someone's gone through the symlink.
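Putting those pieces together, a defensive check might look like the following sketch. It is an illustration, not the asker's code: SERVE_ROOT, is_safe_path and the use of realpath/commonpath (Python 3.4+) are my own choices; realpath also resolves symlinks, which addresses the caveat above:

```python
import os

SERVE_ROOT = os.path.realpath(os.curdir)  # assumed absolute document root

def is_safe_path(request_path, root=SERVE_ROOT):
    """Resolve request_path under root; return None if it escapes the jail."""
    # realpath resolves both '..' components and symlinks
    candidate = os.path.realpath(os.path.join(root, request_path.lstrip("/")))
    # commonpath (3.4+) compares whole path components; commonprefix works on
    # characters, so it would treat '/srv/www-evil' as being inside '/srv/www'
    try:
        if os.path.commonpath((candidate, root)) != root:
            return None
    except ValueError:  # e.g. paths on different Windows drives
        return None
    return candidate

print(is_safe_path("/index.html"))                       # a path under the root
print(is_safe_path("/../../../../../../etc/passwd"))     # None: escapes the root
```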

self.path contains the request path. If I were to send a GET request and ask for the resource located at /../../../../../../../etc/passwd, I would break out of your application's current folder and be able to access any file on your filesystem (that you have permission to read).

Related

How to make CGIHTTPRequestHandler execute my PHP scripts with POST data

I am currently trying to do simple web stuff with the http.server module in Python.
When I try to POST a form to my script, it does not receive the POST data: $_POST is empty, and so is file_get_contents('php://input').
This is my post_test.html:
#!/usr/bin/php
<!DOCTYPE html>
<html>
<body>
    <form method="post" action="post_test.html">
        Name: <input type="text" name="fname">
        <input type="submit">
    </form>
    <?php
    if ($_SERVER["REQUEST_METHOD"] == "POST") {
        // collect value of input field
        echo "RAW POST: " . file_get_contents('php://input') . "<br>";
        $name = $_POST['fname'];
        if (empty($name)) {
            echo "Name is empty";
        } else {
            echo $name;
        }
    }
    ?>
</body>
</html>
And this is my server script:
import urllib.parse
from http.server import CGIHTTPRequestHandler, HTTPServer

hostName = "localhost"
serverPort = 8080

handler = CGIHTTPRequestHandler
handler.cgi_directories.append('/php-cgi')

class MyServer(handler):
    def do_GET(self):
        # get path without first '/' and without anything besides path and filename
        file = self.path[1:].split("?")[0].split("#")[0]
        # if the file is in the script list, execute from php-cgi, else load from webapp
        php_handle = ["post_test.html"]
        if file in php_handle:
            self.path = "/php-cgi" + self.path
        else:
            self.path = "/webapp" + self.path
        CGIHTTPRequestHandler.do_GET(self)

    def do_POST(self):
        # get path without first '/' and without anything besides path and filename
        file = self.path[1:].split("?")[0].split("#")[0]
        # if the file is in the script list, execute from php-cgi
        php_handle = ["post_test.html"]
        if file in php_handle:
            length = int(self.headers['Content-Length'])
            post_data = urllib.parse.parse_qs(self.rfile.read(length).decode('utf-8'))
            for key, data in post_data.items():
                self.log_message(key + ": " + str.join(", ", data))
            self.path = "/php-cgi" + self.path
        CGIHTTPRequestHandler.do_POST(self)

    def do_HEAD(self):
        CGIHTTPRequestHandler.do_HEAD(self)

if __name__ == "__main__":
    webServer = HTTPServer((hostName, serverPort), MyServer)
    try:
        webServer.serve_forever()
    except KeyboardInterrupt:
        pass
    webServer.server_close()
    print("Server stopped.")
Okay, I was finally able to solve the problem: #!/usr/bin/php is wrong; it should be #!/usr/bin/php-cgi.
Reason: plain php does NOT receive the POST data, and there is no way to hand it over; php-cgi is made for the webserver use case and can handle it.
How to solve the next problem: to run php-cgi successfully you have to create a php.ini in the current directory, with two settings.
The first allows executing it directly, the second sets the directory where the scripts are. If you don't set it, you will be greeted with a 404 Not Found.
php.ini:
cgi.force_redirect = 0
doc_root = /home/username/server-directory
Here server-directory is the folder containing the php.ini and the php-cgi folder. (The folder name has nothing to do with the php-cgi binary; that was just bad naming on my part.)
If you try to recreate this, it should work now.

Python HTTPServer: returning multipart/form-data (multiple binary files)

Due to the environment I am working in, I prefer to use only the standard libraries.
My goal is to return multiple binary files in a single do_GET response.
Below is a stub that I just can't figure out why it is not working. I did browse SO extensively, and other places including the RFCs.
I am testing this via curl and Firefox, to no avail.
Any hints appreciated.
import os
from urllib import parse

# `files` is assumed to be a list of paths to the binary files, defined elsewhere
def do_GET(self):
    parsed_path = parse.urlparse(self.path)
    #ret="mensaje de vuelta"
    #print("*** {} ***".format(ret))
    #message = ret+'\r\n'
    self.send_response(200)
    self.send_header('Content-Type', 'multipart/form-data; boundary=qazwsxedcrfv')
    self.end_headers()
    self.wfile.write(b"\r\n--qazwsxedcrfv\r\n")
    self.wfile.write(b'Content-Disposition: form-data; name="datafile1"; filename="' +
                     os.path.basename(files[0]).encode("utf8") + b'"\r\n')
    self.wfile.write(b'Content-Type: image/jpg\r\n')
    self.wfile.write(b"\r\n")
    pic = open(files[0], "rb").read()
    self.wfile.write(pic)
    self.wfile.write(b"\r\n--qazwsxedcrfv\r\n")
    self.wfile.write(b'Content-Disposition: form-data; name="datafile1"; filename="' +
                     os.path.basename(files[1]).encode("utf8") + b'"\r\n')
    self.wfile.write(b'Content-Type: image/jpg\r\n')
    self.wfile.write(b"\r\n")
    pic = open(files[1], "rb").read()
    self.wfile.write(pic)
    self.wfile.write(b"\r\n--qazwsxedcrfv--\r\n")
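One thing that makes responses like this easier to debug is assembling the whole multipart body in memory first, so a correct Content-Length header can be sent before it. A sketch (the helper name and the octet-stream content type are my own choices, not from the question):

```python
import os

BOUNDARY = b"qazwsxedcrfv"

def build_multipart_body(paths):
    """Assemble a multipart/form-data body as bytes, one part per file."""
    parts = []
    for p in paths:
        with open(p, "rb") as fh:
            data = fh.read()
        name = os.path.basename(p).encode("utf-8")
        parts.append(b"--" + BOUNDARY + b"\r\n" +
                     b'Content-Disposition: form-data; name="datafile"; filename="' +
                     name + b'"\r\n' +
                     b"Content-Type: application/octet-stream\r\n\r\n" +
                     data + b"\r\n")
    parts.append(b"--" + BOUNDARY + b"--\r\n")  # closing delimiter
    return b"".join(parts)
```

With the body built up front, the handler can call send_header('Content-Length', str(len(body))) before end_headers() and then write it in one go.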

How to disable output buffering in Python SimpleHTTPServer

The script snippet below is part of a script that implements a SimpleHTTPServer instance which triggers a third-party module upon a GET request. I am able to capture the third-party module's stdout messages and send them out to the web browser.
Currently, the script collects all the stdout messages and dumps them to the client only once the invoked module has finished.
Since I want each message to appear in the browser as it is written to stdout, output buffering needs to be disabled.
How do I do that in Python's SimpleHTTPServer?
def do_GET(self):
    global key
    stdout_ = sys.stdout  # keep track of the previous value
    stream = cStringIO.StringIO()
    sys.stdout = stream
    ''' Present frontpage with user authentication. '''
    if self.headers.getheader('Authorization') == None:
        self.do_AUTHHEAD()
        self.wfile.write('no auth header received')
        pass
    elif self.headers.getheader('Authorization') == 'Basic ' + key:
        if None != re.search('/api/v1/check/*', self.path):
            recordID = self.path.split('/')[-1]
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.send_header('Access-Control-Allow-Origin', '*')
            self.send_header('Access-Control-Allow-Methods', 'GET,POST,PUT,OPTIONS')
            self.send_header("Access-Control-Allow-Headers", "X-Requested-With, Content-Type, Authorization")
            self.end_headers()
            notStarted = True
            while True:
                if notStarted is True:
                    self.moduleXYZ.start()
                    notStarted = False
                if "finished" in stream.getvalue():
                    sys.stdout = stdout_  # restore the previous stdout
                    self.wfile.write(stream.getvalue())
                    break
Update
I modified the approach to fetch the status messages from the class instead of using stdout. I included Martijn's nice idea of how to keep track of changes.
When I run the server now, I realize that I really need threading: the script waits until the module is finished before it proceeds to the while loop.
Should I rather implement threading in the server or in the module class?
def do_GET(self):
    global key
    ''' Present frontpage with user authentication. '''
    if self.headers.getheader('Authorization') == None:
        self.do_AUTHHEAD()
        self.wfile.write('no auth header received')
        pass
    elif self.headers.getheader('Authorization') == 'Basic ' + key:
        if None != re.search('/api/v1/check/*', self.path):
            recordID = self.path.split('/')[-1]
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.send_header('Access-Control-Allow-Origin', '*')
            self.send_header('Access-Control-Allow-Methods', 'GET,POST,PUT,OPTIONS')
            self.send_header("Access-Control-Allow-Headers", "X-Requested-With, Content-Type, Authorization")
            self.end_headers()
            self.moduleABC.startCrawl()
            sent = 0
            while True:
                if self.moduleABC.done:
                    print "done"
                    break
                output = self.moduleABC.statusMessages
                self.wfile.write(output[sent:])
                sent = len(output)
        else:
            self.send_response(403)
            self.send_header('Content-Type', 'application/json')
            self.end_headers()
Update 2 (working)
This is my updated GET method. The class object of the third-party module is instantiated in the GET method, and the module's main method is run in a thread. I use Martijn's ideas to monitor progress.
It took me a while to figure out that it is necessary to append some extra bytes to the status text sent to the browser to force a buffer flush!
Thanks for your help with this.
def do_GET(self):
    global key
    abcd = abcdModule(u"abcd")
    ''' Present frontpage with user authentication. '''
    if self.headers.getheader('Authorization') == None:
        self.do_AUTHHEAD()
        self.wfile.write('no auth header received')
        pass
    elif self.headers.getheader('Authorization') == 'Basic ' + key:
        if None != re.search('/api/v1/check/*', self.path):
            recordID = self.path.split('/')[-1]
            abcd.setMasterlist([urllib.unquote(recordID)])
            abcd.useCaching = False
            abcd.maxRecursion = 1
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.send_header('Access-Control-Allow-Origin', '*')
            self.send_header('Access-Control-Allow-Methods', 'GET,POST,PUT,OPTIONS')
            self.send_header("Access-Control-Allow-Headers", "X-Requested-With, Content-Type, Authorization")
            self.end_headers()
            thread.start_new_thread(abcd.start, ())
            sent = 0
            while True:
                if abcd.done:
                    print "done"
                    break
                output = abcd.statusMessages
                if len(output) == sent + 1:
                    print abcd.statusMessages[-1]
                    self.wfile.write(json.dumps(abcd.statusMessages))
                    self.wfile.write("".join([" " for x in range(1, 1000)]))
                    sent = len(output)
        else:
            self.send_response(403)
            self.send_header('Content-Type', 'application/json')
            self.end_headers()
    else:
        self.do_AUTHHEAD()
        self.wfile.write(self.headers.getheader('Authorization'))
        self.wfile.write('not authenticated')
        pass
    return
You really want to fix moduleXYZ to not use stdout as the only means of output. This makes the module unsuitable for use in a multithreaded server, for example; two separate threads calling moduleXYZ will lead to output being woven together in unpredictable ways.
However, there is no stream buffering going on here. You are instead capturing all stdout in a cStringIO object, and only when you see the string "finished" in the captured string do you output the result. What you should do there instead is continuously output that value, tracking how much of it you already sent out:
self.moduleXYZ.start()
sent = 0
while True:
    output = stream.getvalue()
    self.wfile.write(output[sent:])
    sent = len(output)
    if "finished" in output:
        sys.stdout = stdout_
        break
Better still, just connect stdout to self.wfile and have the module write directly to the response; you'll need a different method to detect if the module thread is done in this case:
old_stdout = sys.stdout
sys.stdout = self.wfile
self.moduleXYZ.start()
while True:
    if self.moduleXYZ.done():
        sys.stdout = old_stdout
        break
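The offset-tracking idea can be demonstrated on its own, outside the server (here io.StringIO stands in for both the captured stdout and the response stream):

```python
import io

stream = io.StringIO()  # stands in for the captured stdout
out = io.StringIO()     # stands in for self.wfile

sent = 0
for chunk in ("step 1\n", "step 2\n", "finished\n"):
    stream.write(chunk)       # the module reports progress...
    output = stream.getvalue()
    out.write(output[sent:])  # ...and only the new part is forwarded
    sent = len(output)
    if "finished" in output:
        break

print(out.getvalue() == stream.getvalue())  # True: nothing sent twice or lost
```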

How to get current URL in python web page?

I am a noob in Python. I just installed it and spent 2 hours googling how to get at a simple parameter sent in the URL to a Python script.
Found this.
Very helpful, except I cannot for anything in the world figure out how to replace
import urlparse
url = 'http://foo.appspot.com/abc?def=ghi'
parsed = urlparse.urlparse(url)
print urlparse.parse_qs(parsed.query)['def']
With what do I replace url = 'string' to make it work?
I just want to access http://site.com/test/test.py?param=abc and see abc printed.
Final code after Alex's answer:
url = os.environ["REQUEST_URI"]
parsed = urlparse.urlparse(url)
print urlparse.parse_qs(parsed.query)['param']
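For reference, under Python 3 the urlparse module was folded into urllib.parse, so the equivalent of the final code looks like this (the fallback URL is only illustrative, for when REQUEST_URI is not set):

```python
import os
from urllib.parse import urlparse, parse_qs

url = os.environ.get("REQUEST_URI", "http://site.com/test/test.py?param=abc")
parsed = urlparse(url)
print(parse_qs(parsed.query)["param"])  # ['abc'] for the fallback URL
```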
If you don't have any libraries to do this for you, you can construct your current URL from the HTTP request that gets sent to your script via the browser.
The headers that interest you are Host and whatever's after the HTTP method (probably GET, in your case). Here are some more explanations (first link that seemed ok, you're free to Google some more :).
This answer shows you how to get the headers in your CGI script:
If you are running as a CGI, you can't read the HTTP header directly,
but the web server put much of that information into environment
variables for you. You can just pick it out of os.environ[].
If you're doing this as an exercise, then it's fine because you'll get to understand what's behind the scenes. If you're building anything reusable, I recommend you use libraries or a framework so you don't reinvent the wheel every time you need something.
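As a concrete sketch of that advice, a CGI script can reassemble the current URL and query parameters from the environment variables the web server sets (variable names per the CGI specification; the defaults here are only illustrative):

```python
import os
from urllib.parse import parse_qs

host = os.environ.get("HTTP_HOST", "localhost")
path = os.environ.get("PATH_INFO", "") or os.environ.get("SCRIPT_NAME", "/")
query = os.environ.get("QUERY_STRING", "")

current_url = "http://" + host + path + ("?" + query if query else "")
params = parse_qs(query)  # e.g. {'param': ['abc']} for ?param=abc
print(current_url, params)
```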
This is how I capture, in a Python 3 CGI script, (A) the URL, (B) the GET parameters and (C) the POST data:
import sys, os, io

# CAPTURE URL
myDomainSelf = os.environ.get('SERVER_NAME')
myPathSelf = os.environ.get('PATH_INFO')
myURLSelf = myDomainSelf + myPathSelf

# CAPTURE GET DATA
myQuerySelf = os.environ.get('QUERY_STRING')

# CAPTURE POST DATA
myTotalBytesStr = os.environ.get('HTTP_CONTENT_LENGTH')
if myTotalBytesStr == None:
    myJSONStr = '{"error": {"value": true, "message": "No (post) data received"}}'
else:
    myTotalBytes = int(myTotalBytesStr)
    myPostDataRaw = io.open(sys.stdin.fileno(), "rb").read(myTotalBytes)
    myPostData = myPostDataRaw.decode("utf-8")

# Write RAW to FILE
mySpy = "myURLSelf: [" + str(myURLSelf) + "]\n"
mySpy = mySpy + "myQuerySelf: [" + str(myQuerySelf) + "]\n"
mySpy = mySpy + "myPostData: [" + str(myPostData) + "]\n"

# You need to define your own myPath here
myFilename = "spy.txt"
myFilePath = os.path.join(myPath, myFilename)
myFile = open(myFilePath, "w")
myFile.write(mySpy)
myFile.close()
Here are some other useful CGI environment vars:
AUTH_TYPE
CONTENT_LENGTH
CONTENT_TYPE
GATEWAY_INTERFACE
PATH_INFO
PATH_TRANSLATED
QUERY_STRING
REMOTE_ADDR
REMOTE_HOST
REMOTE_IDENT
REMOTE_USER
REQUEST_METHOD
SCRIPT_NAME
SERVER_NAME
SERVER_PORT
SERVER_PROTOCOL
SERVER_SOFTWARE

Fetch a file from a local url with Python requests?

I am using Python's requests library in one method of my application. The body of the method looks like this:
def handle_remote_file(url, **kwargs):
    response = requests.get(url, ...)
    buff = StringIO.StringIO()
    buff.write(response.content)
    ...
    return True
I'd like to write some unit tests for that method, however, what I want to do is to pass a fake local url such as:
class RemoteTest(TestCase):
    def setUp(self):
        self.url = 'file:///tmp/dummy.txt'

    def test_handle_remote_file(self):
        self.assertTrue(handle_remote_file(self.url))
When I call requests.get with a local URL, I get the KeyError exception below:
requests.get('file:///tmp/dummy.txt')
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.pyc in connection_from_host(self, host, port, scheme)
76
77 # Make a fresh ConnectionPool of the desired type
78 pool_cls = pool_classes_by_scheme[scheme]
79 pool = pool_cls(host, port, **self.connection_pool_kw)
80
KeyError: 'file'
The question is how can I pass a local url to requests.get?
PS: I made up the above example. It possibly contains many errors.
As @WooParadog explained, the requests library doesn't know how to handle local files. However, the current version allows you to define transport adapters.
Therefore you can simply define your own adapter which will be able to handle local files, e.g.:
import os
import requests
from requests_testadapter import Resp

class LocalFileAdapter(requests.adapters.HTTPAdapter):
    def build_response_from_file(self, request):
        file_path = request.url[7:]
        with open(file_path, 'rb') as file:
            buff = bytearray(os.path.getsize(file_path))
            file.readinto(buff)
            resp = Resp(buff)
            r = self.build_response(request, resp)
            return r

    def send(self, request, stream=False, timeout=None,
             verify=True, cert=None, proxies=None):
        return self.build_response_from_file(request)

requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
requests_session.get('file://<some_local_path>')
I'm using the requests-testadapter module in the above example.
Here's a transport adapter I wrote which is more featureful than b1r3k's and has no additional dependencies beyond Requests itself. I haven't tested it exhaustively yet, but what I have tried seems to be bug-free.
import requests
import os, sys

if sys.version_info.major < 3:
    from urllib import url2pathname
else:
    from urllib.request import url2pathname

class LocalFileAdapter(requests.adapters.BaseAdapter):
    """Protocol Adapter to allow Requests to GET file:// URLs

    @todo: Properly handle non-empty hostname portions.
    """

    @staticmethod
    def _chkpath(method, path):
        """Return an HTTP status for the given filesystem path."""
        if method.lower() in ('put', 'delete'):
            return 501, "Not Implemented"  # TODO
        elif method.lower() not in ('get', 'head'):
            return 405, "Method Not Allowed"
        elif os.path.isdir(path):
            return 400, "Path Not A File"
        elif not os.path.isfile(path):
            return 404, "File Not Found"
        elif not os.access(path, os.R_OK):
            return 403, "Access Denied"
        else:
            return 200, "OK"

    def send(self, req, **kwargs):  # pylint: disable=unused-argument
        """Return the file specified by the given request

        @type req: C{PreparedRequest}
        @todo: Should I bother filling `response.headers` and processing
               If-Modified-Since and friends using `os.stat`?
        """
        path = os.path.normcase(os.path.normpath(url2pathname(req.path_url)))
        response = requests.Response()

        response.status_code, response.reason = self._chkpath(req.method, path)
        if response.status_code == 200 and req.method.lower() != 'head':
            try:
                response.raw = open(path, 'rb')
            except (OSError, IOError) as err:
                response.status_code = 500
                response.reason = str(err)

        if isinstance(req.url, bytes):
            response.url = req.url.decode('utf-8')
        else:
            response.url = req.url

        response.request = req
        response.connection = self

        return response

    def close(self):
        pass
(Despite the name, it was completely written before I thought to check Google, so it has nothing to do with b1r3k's.) As with the other answer, follow this with:
requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
r = requests_session.get('file:///path/to/your/file')
The easiest way seems to be using requests-file.
https://github.com/dashea/requests-file (available through PyPI too)
"Requests-File is a transport adapter for use with the Requests Python library to allow local filesystem access via file:// URLs."
This in combination with requests-html is pure magic :)
packages/urllib3/poolmanager.py pretty much explains it: requests doesn't support local URLs.
pool_classes_by_scheme = {
    'http': HTTPConnectionPool,
    'https': HTTPSConnectionPool,
}
In a recent project, I had the same issue. Since requests doesn't support the "file" scheme, I patched our code to load the content locally. First, I define a function to replace requests.get:
import six

def local_get(self, url):
    """Fetch a stream from local files."""
    p_url = six.moves.urllib.parse.urlparse(url)
    if p_url.scheme != 'file':
        raise ValueError("Expected file scheme")
    filename = six.moves.urllib.request.url2pathname(p_url.path)
    return open(filename, 'rb')
Then, somewhere in test setup or decorating the test function, I use mock.patch to patch the get function on requests:
@mock.patch('requests.get', local_get)
def test_handle_remote_file(self):
    ...
This technique is somewhat brittle -- it doesn't help if the underlying code calls requests.request or constructs a Session and calls that. There may be a way to patch requests at a lower level to support file: URLs, but in my initial investigation, there didn't seem to be an obvious hook point, so I went with this simpler approach.
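As a side note, on Python 3 the same helper needs nothing beyond the standard library (six is only a 2/3 compatibility shim); a self-contained sketch, with the self parameter dropped:

```python
import os
import tempfile
import urllib.parse
import urllib.request

def local_get(url):
    """Fetch a stream from a local file:// URL (Python 3, without six)."""
    p_url = urllib.parse.urlparse(url)
    if p_url.scheme != 'file':
        raise ValueError("Expected file scheme")
    filename = urllib.request.url2pathname(p_url.path)
    return open(filename, 'rb')

# usage: round-trip a dummy file through a file:// URL
path = os.path.join(tempfile.mkdtemp(), "dummy.txt")
with open(path, "w") as fh:
    fh.write("hello")
with local_get("file://" + urllib.request.pathname2url(path)) as fh:
    print(fh.read())  # b'hello'
```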
To load a file from a local URL, e.g. an image file you can do this:
import urllib
from PIL import Image
Image.open(urllib.request.urlopen('file:///path/to/your/file.png'))
I think a simple solution for this would be creating a temporary HTTP server using Python and using it:
Put all your files in a temporary folder, e.g. tempFolder.
Go to that directory and create a temporary HTTP server in terminal/cmd as per your OS, using the command python -m http.server 8000 (note: 8000 is the port number).
This will give you a link to the HTTP server; you can access it from http://127.0.0.1:8000/.
Open your desired file in the browser and copy that link as your URL.
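The same recipe can be scripted end-to-end with the standard library (file name and port choice are illustrative; requests.get would work against the resulting http:// URL just as well as urllib does here):

```python
import http.server
import os
import tempfile
import threading
import urllib.request
from functools import partial

# 1. put a file into a temporary folder
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "dummy.txt"), "w") as fh:
    fh.write("hello")

# 2. serve that folder on an ephemeral port in a background thread
handler = partial(http.server.SimpleHTTPRequestHandler, directory=folder)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# 3. fetch the file over http:// instead of file://
url = "http://127.0.0.1:%d/dummy.txt" % server.server_address[1]
body = urllib.request.urlopen(url).read()
server.shutdown()
print(body)  # b'hello'
```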
