I have a python CGI script that receives a POST request containing a specific HTTP header. How do you read and parse the headers received? I am not using BaseHTTPRequestHandler or HTTPServer. I receive the body of the post with sys.stdin.read(). Thanks.
It is possible to get a custom request header's value in an Apache CGI script with Python.
Apache's mod_cgi sets an environment variable for each HTTP request header it receives. The variables set in this manner all have an HTTP_ prefix, so for example x-client-version: 1.2.3 will be available as the variable HTTP_X_CLIENT_VERSION.
So, to read the above custom header just call os.environ["HTTP_X_CLIENT_VERSION"].
The below script will print all HTTP_* headers and values:
#!/usr/bin/env python
import os
print "Content-Type: text/html"
print "Cache-Control: no-cache"
print
print "<html><body>"
for headername, headervalue in os.environ.iteritems():
    if headername.startswith("HTTP_"):
        print "<p>{0} = {1}</p>".format(headername, headervalue)
print "</body></html>"
You might also want to look at the cgi module included with Python's standard library. Its cgi.parse_header(string) function is helpful for splitting a header value into its main value and a dictionary of parameters.
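For example, in a Python 2 interactive session:

>>> import cgi
>>> cgi.parse_header('text/html; charset=utf-8')
('text/html', {'charset': 'utf-8'})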
Related
I'm passing a URL to a python script using cgi.FieldStorage():
http://localhost/cgi-bin/test.py?file=http://localhost/test.xml
test.py just contains
#!/usr/bin/env python
import cgi
print "Access-Control-Allow-Origin: *"
print "Content-Type: text/plain; charset=x-user-defined"
print "Accept-Ranges: bytes"
print
print cgi.FieldStorage()
and the result is
FieldStorage(None, None, [MiniFieldStorage('file', 'http:/localhost/test.xml')])
Note that the URL only contains http:/localhost - how do I pass the full encoded URI so that file is the whole URI? I've tried percent-encoding the file parameter (http%3A%2F%2Flocalhost%2Ftest.xml), but this also doesn't work.
The screenshot shows that the output on the webpage isn't what is expected, even though the encoded URL is correct.
Your CGI script works fine for me using Apache 2.4.10 and Firefox (curl also). What web server and browser are you using?
My guess is that you are using Python's CGIHTTPServer, or something based on it, which exhibits the problem you describe. CGIHTTPServer assumes that it is being given a path to a CGI script, so it collapses the path without regard to any query string that might be present. Collapsing the path removes duplicate forward slashes as well as relative path elements such as "..".
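The collapsing is roughly what posixpath.normpath does to a path with a doubled slash (shown only as an illustration of the effect, not CGIHTTPServer's actual code):

>>> import posixpath
>>> posixpath.normpath('http://localhost/test.xml')
'http:/localhost/test.xml'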
If you are using this web server, I don't see any obvious way around the problem by changing the URL. You won't be using it in production anyway, so perhaps look at another web server such as Apache, nginx, or lighttpd.
The problem is with your query parameters: you should be encoding them:
>>> from urllib import urlencode
>>> urlencode({'file': 'http://localhost/test.xml', 'other': 'this/has/forward/slashes'})
'other=this%2Fhas%2Fforward%2Fslashes&file=http%3A%2F%2Flocalhost%2Ftest.xml'
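The full request URL would then look like this (using the cgi-bin path from the question):

>>> from urllib import urlencode
>>> 'http://localhost/cgi-bin/test.py?' + urlencode({'file': 'http://localhost/test.xml'})
'http://localhost/cgi-bin/test.py?file=http%3A%2F%2Flocalhost%2Ftest.xml'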
I'm new to Python, and I'm trying to run a simple script (on a Mac, if that's important).
Now, this code, gives me Internal Server Error:
#!/usr/bin/python
print 'hi'
But this one works like a charm (the only difference is an extra print statement):
#!/usr/bin/python
print
print 'hi'
Any explanation? Thanks!
Update:
When I run this script from the Terminal everything is fine. But when I run it from the browser:
http://localhost/cgi-bin/test.py
I get this error (and again, only if I don't add the extra print statement).
I use the Apache server, of course.
Looks like you're running your script as a CGI script (your edit confirms that you're using CGI), and the initial (empty) print is required to signify the end of the headers.
Check Apache's error log (probably /var/log/apache2/error.log) to see if it says 'Premature end of script headers'.
EDIT: a bit more explanation:
A CGI script in Apache is responsible for generating its own HTTP response.
An HTTP response consists of a header block, an empty line, and the body. Even though you should generate some headers, it's not mandatory to do so. However, you do need to output the empty line; Apache expects it to be there, and if it isn't (or if you only output a body that can't be parsed as headers), Apache will generate an error.
That's why your first version didn't work, but your second did: adding the empty print added the required empty line that Apache was expecting.
This will also work:
#!/usr/bin/env python
print "Content-Type: text/html" # header block
print "Vary: *" # also part of the header block
print "X-Test: hello world" # this too is a header, but a made-up one
print # empty line
print "hi" # body
I can't seem to get the Python module cgitb to output the stack trace in a browser. I have no problems in a shell environment. I'm running CentOS 6 with Python 2.6.
Here is an example simple code that I am using:
import cgitb; cgitb.enable()
print "Content-type: text/html"
print
print 1/0
I get an Internal Server error instead of the printed detailed report. I have tried different error types, different browsers, etc.
When I don't have an error, Python of course works fine, and it prints the error fine in a shell. The point of cgitb is to print a detailed error report instead of returning an "Internal Server Error" in the browser for most exceptions. Basically I'm just trying to get cgitb to work in a browser environment.
Any Suggestions?
Okay, I got my problem fixed, and the OP's code brought me to it: even though cgitb will output HTML by default, it will not output a header! Apache does not like that and may give you a confusing error like:
<...blablabla>: Response header name '<!--' contains invalid characters, aborting request
It indicates that Apache was still working its way through the headers when it encountered HTML. Look at what the OP prints before the error is triggered: that is a header, and you need it, including the empty line.
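In other words, the header block and the blank line must go out before anything can raise, along these lines (essentially the OP's script, repeated here as a sketch):

#!/usr/bin/env python
import cgitb; cgitb.enable()

print "Content-Type: text/html"   # header block first
print                             # blank line ends the headers
print 1/0                         # cgitb's HTML report now lands in the body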
I will just quote the docs:
Make sure that your script is readable and executable by "others"; the Unix file mode should be 0755 octal (use chmod 0755 filename).
Make sure that the first line of the script contains #! starting in column 1 followed by the pathname of the Python interpreter, for instance:
#!/usr/local/bin/python
I just need to write a simple python CGI script to parse the contents of a POST request containing JSON. This is only test code so that I can test a client application until the actual server is ready (written by someone else).
I can read the cgi.FieldStorage() and dump the keys() but the request body containing the JSON is nowhere to be found.
I can also dump os.environ, which provides lots of info, except that I don't see a variable containing the request body.
Any input appreciated.
Chris
If you're using CGI, just read data from stdin:
import sys
data = sys.stdin.read()
Note that if you call cgi.FieldStorage() earlier in your code, you can't get the body data from stdin afterwards, because stdin can only be read once.
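A minimal test handler along those lines, as a sketch: it reads exactly CONTENT_LENGTH bytes (the length CGI provides for the body), parses the JSON, and echoes it back purely for illustration:

#!/usr/bin/env python
import json
import os
import sys

# CGI passes the body length in the CONTENT_LENGTH environment variable
length = int(os.environ.get("CONTENT_LENGTH", 0))
data = sys.stdin.read(length) if length else ""
payload = json.loads(data)        # parse the JSON request body

print "Content-Type: application/json"
print                             # blank line ends the headers
print json.dumps(payload)         # echo the parsed object back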
I am looking to download a file from a http url to a local file. The file is large enough that I want to download it and save it chunks rather than read() and write() the whole file as a single giant string.
The interface of urllib.urlretrieve is essentially what I want. However, I cannot see a way to set request headers when downloading via urllib.urlretrieve, which is something I need to do.
If I use urllib2, I can set request headers via its Request object. However, I don't see an API in urllib2 to download a file directly to a path on disk like urlretrieve. It seems that instead I will have to use a loop to iterate over the returned data in chunks, writing them to a file myself and checking when we are done.
What would be the best way to build a function that works like urllib.urlretrieve but allows request headers to be passed in?
What is the harm in writing your own function using urllib2?
import urllib2

def urlretrieve(urlfile, fpath):
    chunk = 4096
    f = open(fpath, "w")
    while 1:
        data = urlfile.read(chunk)
        if not data:
            print "done."
            break
        f.write(data)
        print "Read %s bytes" % len(data)
    f.close()
and use a Request object to set the headers:
request = urllib2.Request("http://www.google.com")
request.add_header('User-agent', 'Chrome XXX')
urlretrieve(urllib2.urlopen(request), "/tmp/del.html")
If you want to use urllib and urlretrieve, subclass urllib.URLopener and use its addheader() method to adjust the headers (e.g. addheader('Accept', 'sound/basic'), which I'm pulling from the docstring of urllib.URLopener.addheader).
To install your URLopener for use by urllib, see the example in the urllib._urlopener section of the docs (note the underscore):
import urllib

class MyURLopener(urllib.URLopener):
    pass  # your override here, perhaps to __init__

urllib._urlopener = MyURLopener()  # note: an instance, not the class
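Alternatively, you can skip the global installation and call retrieve() on an instance directly, combining it with addheader() as described above (the Accept header is just the docstring's example; FancyURLopener is the subclass that also follows redirects):

import urllib

opener = urllib.FancyURLopener()
opener.addheader('Accept', 'sound/basic')        # extra request header
opener.retrieve('http://www.google.com', '/tmp/del.html')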
However, regarding your comment on the question: you'll be pleased to hear that reading an empty string from read() is indeed the signal to stop. This is how urlretrieve decides when to stop, for example. TCP/IP and sockets abstract the reading process, blocking while waiting for additional data until the other end reaches EOF and closes the connection, at which point read() returns an empty string. An empty string means no more data is coming; you don't have to worry about ordered packet re-assembly, as that has all been handled for you. If that's your concern about urllib2, I think you can safely use it.