I'm trying to run python on the web to do some CSS/JSS extraction from websites. I'm using mod_wsgi as my interface for python. I've been following this website to get an idea on getting started.
The below is their sample code.
#! /usr/bin/env python
# Our tutorial's WSGI server
from wsgiref.simple_server import make_server

def application(environ, start_response):
    # Sorting and stringifying the environment key, value pairs
    response_body = ['%s: %s' % (key, value)
                     for key, value in sorted(environ.items())]
    response_body = '\n'.join(response_body)
    status = '200 OK'
    response_headers = [('Content-Type', 'text/plain'),
                        ('Content-Length', str(len(response_body)))]
    start_response(status, response_headers)
    return [response_body]

# Instantiate the WSGI server.
# It will receive the request, pass it to the application
# and send the application's response to the client
httpd = make_server(
    'localhost',  # The host name.
    8051,         # A port number where to wait for the request.
    application   # Our application object name, in this case a function.
)

# Wait for a single request, serve it and quit.
httpd.handle_request()
This runs fine with Python 2.7, but I can't get it to run on Python 3. For my CSS/JSS extraction, I modified the code above and added my own functionality, which uses BeautifulSoup and urllib3. Those modules need Python 3, while the WSGI code only works for me on Python 2.7, so I can't merge the two. When trying to run BS and urllib in python3, I get an error. But when I try to run the WSGI code with Python 3, the web page simply fails to load.
Any help would be greatly appreciated! Any workarounds, or suggestions as well.
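One likely reason the sample fails under Python 3 is that WSGI (PEP 3333) expects the response body to be bytes, while the tutorial code returns a str. A minimal sketch of a Python 3 version, assuming that is the only incompatibility in play; the `serve_once` helper name is made up for this example:

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    # Sort and stringify the environment key/value pairs.
    lines = ['%s: %s' % (key, value) for key, value in sorted(environ.items())]
    # Under Python 3 the body must be bytes, so encode it explicitly.
    response_body = '\n'.join(lines).encode('utf-8')
    status = '200 OK'
    response_headers = [
        ('Content-Type', 'text/plain; charset=utf-8'),
        # Content-Length counts bytes, so measure the encoded body.
        ('Content-Length', str(len(response_body))),
    ]
    start_response(status, response_headers)
    return [response_body]


def serve_once(host='localhost', port=8051):
    """Serve a single request, then return (hypothetical helper)."""
    httpd = make_server(host, port, application)
    httpd.handle_request()
    httpd.server_close()
```

Calling `serve_once()` and opening http://localhost:8051/ should then show the environment dump. With the application on Python 3, the BeautifulSoup and urllib3 code can live in the same process.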
Related
Need to run Python scripts with CGI options from a local Python server
At the moment I use CGI on my Apache server to pass every GET and POST request to my Python scripts, which tells them what to do.
For example, if I send a GET request to 127.0.0.1:8080?filename=yomomma
my Python script should print 'yomomma'.
#!/usr/bin/python3
import cgi, os
form = cgi.FieldStorage()
fileitem = form['filename']
print(fileitem)
Here's the server I'm running in Python (apparently I have no idea what I'm doing):
from http.server import *
from urllib import parse
import os
import cgi

class GetHandler(CGIHTTPRequestHandler):
    def do_GET(self):
        form = cgi.FieldStorage()
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.end_headers()
        self.wfile.write('<meta http-equiv="refresh" content=1; URL=http://127.0.0.1:8080" /><pre>'.encode('utf-8'))
        self.wfile.write(str(os.popen('python my_site.py').read()).encode('utf-8'))

if __name__ == '__main__':
    from http.server import HTTPServer
    server = HTTPServer(('localhost', 8080), GetHandler)
    print('Starting server, use <Ctrl-C> to stop')
    server.serve_forever()
I'd like to be able to point this at any Python file and have that file read the CGI parameters.
It's unclear what you're asking to do here.
If you just want to run CGI scripts on HTTPServer, that's very simple. CGIHTTPRequestHandler is not meant to be subclassed, which is what you have done, and you don't need to rewrite the do_X functions. Like any other server, it simply returns the output of a CGI script, provided the script is under one of the cgi_directories folders. Read this.
So in the main function, you would have:
server = HTTPServer(('localhost', 8080), CGIHTTPRequestHandler)
Then just call a CGI script normally:
127.0.0.1:8080/cgi-bin/myscript.cgi?filename=yomomma
If you want to utilise http.server to handle requests, you need to use BaseHTTPRequestHandler, not CGIHTTPRequestHandler. Getting form data is then slightly different but easy. Read the section of this under "HTTP POST"
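A minimal sketch of the second approach, assuming Python 3 and that only the query string of a GET request matters. The `query_value` and `run` helpers are names invented for this example; the handler simply echoes the `filename` parameter back:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def query_value(path, name, default=''):
    """Return the first value of query parameter `name` in a request path."""
    params = parse_qs(urlparse(path).query)
    return params.get(name, [default])[0]


class GetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path holds the raw request target, e.g. "/?filename=yomomma".
        filename = query_value(self.path, 'filename')
        body = filename.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(host='localhost', port=8080):
    """Start the server; call this to try the handler (hypothetical helper)."""
    server = HTTPServer((host, port), GetHandler)
    server.serve_forever()
```

After calling `run()`, a request to 127.0.0.1:8080/?filename=yomomma should return the parameter value as plain text.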
I have just made a server (only for localhost) in Python that uses CGI so I can run and test my Python scripts. This is the code of the file that starts the server:
#!/usr/bin/env python
#-*- coding:utf-8 -*-
import BaseHTTPServer
import CGIHTTPServer
import cgitb
cgitb.enable() ## This line enables CGI error reporting
server = BaseHTTPServer.HTTPServer
handler = CGIHTTPServer.CGIHTTPRequestHandler
server_address = ("", 8000)
handler.cgi_directories = ["/"]
httpd = server(server_address, handler)
httpd.serve_forever()
When I access some script in the server (http://127.0.0.1:8000/index.py) there is no problem, but when I access the server (http://127.0.0.1:8000/) it shows:
Error response
Error code 403.
Message: CGI script is not executable ('//').
Error code explanation: 403 = Request forbidden -- authorization will not help.
It seems index files aren't used as the default when I request a folder instead of a specific file.
I would like http://127.0.0.1/ to serve http://127.0.0.1/index.py. Thanks for everything.
Python's built-in HTTP server is extremely basic, so it doesn't include such a feature. However, you can implement it yourself by subclassing CGIHTTPRequestHandler, probably replacing the is_cgi function.
If you use handler.cgi_directories = ["/cgi"], you can place an index.html file in "/". And of course, if you want a cgi script index.py as default, you can use the index.html for forwarding...
I tried modifying the is_cgi function and it works!
def is_cgi(self):
    collapsed_path = _url_collapse_path(self.path)
    if collapsed_path == '//':
        self.path = '/index.py'
        collapsed_path = _url_collapse_path(self.path)
    dir_sep = collapsed_path.find('/', 1)
    head, tail = collapsed_path[:dir_sep], collapsed_path[dir_sep + 1:]
    if head in self.cgi_directories:
        self.cgi_info = head, tail
        return True
    return False
I put this method into the following class:
class CGIHandlerOverloadIndex(CGIHTTPServer.CGIHTTPRequestHandler):
I am using CGIHTTPServer.py to create a simple CGI server. I want my CGI script to set the response code itself when an operation goes wrong. How can I do that?
A code snippet from my CGI script:
if authmxn.authenticate():
    stats = Stats()
    print "Content-Type: application/json"
    print 'Status: 200 OK'
    print
    print json.dumps(stats.getStats())
else:
    print 'Content-Type: application/json'
    print 'Status: 403 Forbidden'
    print
    print json.dumps({'msg': 'request is not authenticated'})
A snippet from the request handler:
def run_cgi(self):
    '''
    rest of code
    '''
    if not os.path.exists(scriptfile):
        self.send_error(404, "No such CGI script (%s)" % `scriptname`)
        return
    if not os.path.isfile(scriptfile):
        self.send_error(403, "CGI script is not a plain file (%s)" %
                        `scriptname`)
        return
    ispy = self.is_python(scriptname)
    if not ispy:
        if not (self.have_fork or self.have_popen2):
            self.send_error(403, "CGI script is not a Python script (%s)" %
                            `scriptname`)
            return
        if not self.is_executable(scriptfile):
            self.send_error(403, "CGI script is not executable (%s)" %
                            `scriptname`)
            return

    if not self.have_fork:
        # Since we're setting the env in the parent, provide empty
        # values to override previously set values
        for k in ('QUERY_STRING', 'REMOTE_HOST', 'CONTENT_LENGTH',
                  'HTTP_USER_AGENT', 'HTTP_COOKIE'):
            env.setdefault(k, "")

    self.send_response(200, "Script output follows") # overrides the headers
    decoded_query = query.replace('+', ' ')
It is possible to implement support for a Status: code message header that overrides the HTTP status line (first line of HTTP response, e.g. HTTP/1.0 200 OK). This requires:
sub-classing CGIHTTPRequestHandler in order to trick it into writing the CGI script's output into a StringIO object instead of directly to a socket.
Then, once the CGI script is complete, update the HTTP status line with the value provided in the Status: header.
It's a hack, but it's not too bad and no standard library code needs to be patched.
import BaseHTTPServer
import SimpleHTTPServer

from CGIHTTPServer import CGIHTTPRequestHandler
from cStringIO import StringIO

class BufferedCGIHTTPRequestHandler(CGIHTTPRequestHandler):
    def setup(self):
        """
        Arrange for CGI response to be buffered in a StringIO rather than
        sent directly to the client.
        """
        CGIHTTPRequestHandler.setup(self)
        self.original_wfile = self.wfile
        self.wfile = StringIO()
        # prevent use of os.dup(self.wfile...) forces use of subprocess instead
        self.have_fork = False

    def run_cgi(self):
        """
        Post-process CGI script response before sending to client.
        Override HTTP status line with value of "Status:" header, if set.
        """
        CGIHTTPRequestHandler.run_cgi(self)
        self.wfile.seek(0)
        headers = []
        for line in self.wfile:
            headers.append(line)    # includes new line character
            if line.strip() == '':  # blank line signals end of headers
                body = self.wfile.read()
                break
            elif line.startswith('Status:'):
                # Use status header to override premature HTTP status line.
                # Header format is: "Status: code message"
                status = line.split(':')[1].strip()
                # strip() removed the line ending, so put CRLF back to keep
                # the status line separate from the next header.
                headers[0] = '%s %s\r\n' % (self.protocol_version, status)

        self.original_wfile.write(''.join(headers))
        self.original_wfile.write(body)

def test(HandlerClass = BufferedCGIHTTPRequestHandler,
         ServerClass = BaseHTTPServer.HTTPServer):
    SimpleHTTPServer.test(HandlerClass, ServerClass)

if __name__ == '__main__':
    test()
Needless to say, this is probably not the best way to go and you should look at a non-CGIHTTPServer solution, e.g. a micro-framework such as bottle, a proper web-server (from memory, CGIHTTPServer is not recommended for production use), fastcgi, or WSGI - just to name a few options.
With the standard library HTTP server you cannot do this. From the library documentation:
Note CGI scripts run by the CGIHTTPRequestHandler class cannot execute redirects (HTTP code 302), because code 200 (script output follows) is sent prior to execution of the CGI script. This pre-empts the status code.
This means that the server does not support the Status: <status-code> <reason> header from the script. You correctly identified the portion in the code that shows this support does not exist: The server sends status code 200 before even running the script. There is no way you can change this from within the script.
There are several tickets related to this in the Python bugtracker, some with patches, see e.g., issue13893. So one option for you would be to patch the standard library to add this feature.
However, I would strongly suggest you switch to another technology instead of CGI (or run a real web server).
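For comparison, under WSGI the application picks the status line itself, so the 200/403 decision from the question needs no workaround. A minimal sketch, assuming Python 3; `is_authenticated` is a hypothetical stand-in for `authmxn.authenticate()` (here it compares a made-up `X-Token` header to a placeholder), and a stub dict stands in for `Stats().getStats()`:

```python
import json
from wsgiref.simple_server import make_server


def is_authenticated(environ):
    # Hypothetical check: stands in for authmxn.authenticate().
    return environ.get('HTTP_X_TOKEN') == 'secret'


def application(environ, start_response):
    if is_authenticated(environ):
        status = '200 OK'
        payload = {'stats': 'placeholder'}  # stands in for Stats().getStats()
    else:
        status = '403 Forbidden'
        payload = {'msg': 'request is not authenticated'}
    body = json.dumps(payload).encode('utf-8')
    # The status chosen above becomes the real HTTP status line.
    start_response(status, [
        ('Content-Type', 'application/json'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


def serve(host='localhost', port=8000):
    """Run the app with the reference WSGI server (for local testing only)."""
    make_server(host, port, application).serve_forever()
```

The same `application` callable can later be hosted by a production WSGI server without changes.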
I have used mod_wsgi to create a web server that can be called locally. Now I just found out I need to change it so it runs through the Apache server. I'm hoping to do this without rewriting my whole script.
import cgi  # needed for cgi.FieldStorage below
from wsgiref.simple_server import make_server

class FileUploadApp(object):
    firstcult = ""

    def __init__(self, root):
        self.root = root

    def __call__(self, environ, start_response):
        if environ['REQUEST_METHOD'] == 'POST':
            post = cgi.FieldStorage(
                fp=environ['wsgi.input'],
                environ=environ,
                keep_blank_values=True
            )
        body = u"""
<html><body>
<head><title>title</title></head>
<h3>text</h3>
<form enctype="multipart/form-data" action="http://localhost:8088" method="post">
</body></html>
"""
        return self.__bodyreturn(environ, start_response,body)

    def __bodyreturn(self, environ, start_response,body):
        start_response(
            '200 OK',
            [
                ('Content-type', 'text/html; charset=utf8'),
                ('Content-Length', str(len(body))),
            ]
        )
        return [body.encode('utf8')]

def main():
    PORT = 8080
    print "port:", PORT
    ROOT = "/home/user/"
    httpd = make_server('', PORT, FileUploadApp(ROOT))
    print "Serving HTTP on port %s..."%(PORT)
    httpd.serve_forever() # Respond to requests until process is killed

if __name__ == "__main__":
    main()
I am hoping to find a way to avoid creating the server myself and to be able to run multiple instances of my script.
The documentation at:
http://code.google.com/p/modwsgi/wiki/ConfigurationGuidelines
explains what mod_wsgi is expecting to be given.
If you also read:
http://blog.dscpl.com.au/2011/01/implementing-wsgi-application-objects.html
you will learn about the various ways that WSGI application entry points can be constructed.
From that you should identify that FileUploadApp fits one of the described ways of defining a WSGI application and thus you only need satisfy the requirement that mod_wsgi has of the WSGI application object being accessible as 'application'.
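In practice that usually means creating the instance at module level under the name `application` and dropping the `make_server` call. A minimal sketch with a stripped-down stand-in for `FileUploadApp` (the response body and the `ROOT` value are placeholders, and the Apache side, e.g. a `WSGIScriptAlias` pointing at this file, is assumed to be configured separately):

```python
class FileUploadApp(object):
    """Stripped-down stand-in for the class in the question."""

    def __init__(self, root):
        self.root = root

    def __call__(self, environ, start_response):
        body = u'<html><body><h3>text</h3></body></html>'.encode('utf-8')
        start_response('200 OK', [
            ('Content-Type', 'text/html; charset=utf-8'),
            ('Content-Length', str(len(body))),
        ])
        return [body]


ROOT = '/home/user/'

# mod_wsgi looks for a module-level callable named "application" by default.
# No make_server()/serve_forever() here: Apache owns the listening socket.
application = FileUploadApp(ROOT)
```

Because Apache starts and manages the processes, running several instances becomes a matter of mod_wsgi configuration rather than script code.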
I have a Python script that I'd like to run from the browser. It seems mod_wsgi is the way to go, but the method feels too heavyweight and would require modifying the script's output. Ideally I'd like a PHP-style approach. The script doesn't take any input and will only be accessible on an internal network.
I'm running apache on Linux with mod_wsgi already set up, what are the options here?
I would go the micro-framework approach just in case your requirements change - and you never know, it may end up being an app rather just a basic dump... Perhaps the simplest (and old fashioned way!?) is using CGI:
Duplicate your script and include print 'Content-Type: text/plain\n' before any other output to sys.stdout
Put that script somewhere apache2 can access it (your cgi-bin for instance)
Make sure the script is executable
Make sure .py is added to the Apache CGI handler
But I don't see any way this is going to be a fantastic advantage (in the long run at least).
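The first step above might look like the following sketch, assuming Python 3. `run_report` is a hypothetical stand-in for whatever the existing script does; the shebang line and the Apache CGI handler setup are assumptions about the local install:

```python
#!/usr/bin/env python3
import sys


def run_report():
    # Hypothetical stand-in for the existing script's work.
    return 'report line 1\nreport line 2\n'


def cgi_response(text):
    # A CGI response is headers, one blank line, then the body.
    return 'Content-Type: text/plain\n\n' + text


if __name__ == '__main__':
    sys.stdout.write(cgi_response(run_report()))
```

Keeping the header in a small wrapper like this leaves the original script's logic untouched, which makes a later move to WSGI or a micro-framework easier.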
You could use any of python's micro frameworks to quickly run your script from a server. Most include their own lightweight servers.
From cherrypy home page documentation
import cherrypy

class HelloWorld(object):
    def index(self):
        # run your script here
        return "Hello World!"
    index.exposed = True

cherrypy.quickstart(HelloWorld())
Additionally, Python provides the tools necessary to do what you want in its standard library, using HTTPServer.
A basic server using BaseHTTPServer:
import time
import BaseHTTPServer

HOST_NAME = 'example.net' # !!!REMEMBER TO CHANGE THIS!!!
PORT_NUMBER = 80 # Maybe set this to 9000.

class MyHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_HEAD(s):
        s.send_response(200)
        s.send_header("Content-type", "text/html")
        s.end_headers()

    def do_GET(s):
        """Respond to a GET request."""
        s.send_response(200)
        s.send_header("Content-type", "text/html")
        s.end_headers()
        s.wfile.write("<html><head><title>Title goes here.</title></head>")
        s.wfile.write("<body><p>This is a test.</p>")
        # If someone went to "http://something.somewhere.net/foo/bar/",
        # then s.path equals "/foo/bar/".
        s.wfile.write("<p>You accessed path: %s</p>" % s.path)
        s.wfile.write("</body></html>")

if __name__ == '__main__':
    server_class = BaseHTTPServer.HTTPServer
    httpd = server_class((HOST_NAME, PORT_NUMBER), MyHandler)
    print time.asctime(), "Server Starts - %s:%s" % (HOST_NAME, PORT_NUMBER)
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()
    print time.asctime(), "Server Stops - %s:%s" % (HOST_NAME, PORT_NUMBER)
What's nice about the micro-frameworks is that they abstract away writing headers and such (but they should still give you access to them if required).