Check the request headers with url2lib - python

I am trying to fetch a page with Python, and using the cookie jar.
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
opener.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)')]
response = opener.open('http://www.example.com/')
print response.info()
Using the above, I can get the response headers, but other than WireShark, can I see the request headers? What urllib2 was sending?

No, though the docs indicate that there isn't much added. You could setup an http server in python, send the request to it first, pull out the headers, and then check those. But if you have wireshark, it's less work to just use that.

Related

Python - How do I wait for server response using requests

I use the following code to retrieve a web page.
import requests
payload = {'name': temp} #I extract temp from another page.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:49.0) Gecko/20100101 Firefox/49.0','Accept': 'text/html, */*; q=0.01','Accept-Language': 'en-US,en;q=0.5', 'X-Requested-With': 'XMLHttpRequest' }
full_url = url.rstrip() + '/test/log?'
r = requests.get(full_url, params=payload, headers=headers, stream=True)
for line in r.iter_lines():
if line:
print line
However for some reason the http response is lacking the text inside tags.
I found out that if I send the request to Burp, intercept it and wait for 3 secs before forwarding it, then I get the complete html page containing the text inside the tags....
I still could not find the cause. Ideas?
From requests documentation:
By default, when you make a request, the body of the response is
downloaded immediately. You can override this behaviour and defer
downloading the response body until you access the Response.content
attribute with the stream parameter:
Body Content Workflow
In other words try removing stream=True in your requests.get()
or
You will have all the content when you access r.content, where r is the response.

Why isn't my Python program working? It uses HTTP Headers

EDIT: I changed the code and it still doesn't work! I used the links from the answer to do it but it didn't work!
Why does this not work? When I run it takes a long time to run and never finishes!
import urllib
import urllib2
url = 'https://www.locationary.com/index.jsp?ACTION_TOKEN=tile_loginBar_jsp$JspView$LoginAction'
values = {'inUserName' : 'USER',
'inUserPass' : 'PASSWORD'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
req.add_header('Host', 'www.locationary.com')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')
req.add_header('Accept-Language', 'en-us,en;q=0.5')
req.add_header('Accept-Encoding','gzip, deflate')
req.add_header('Accept-Charset','ISO-8859-1,utf-8;q=0.7,*;q=0.7')
req.add_header('Connection','keep-alive')
req.add_header('Referer','http://www.locationary.com/')
req.add_header('Cookie','site_version=REGULAR; __utma=47547066.1079503560.1321924193.1322707232.1324693472.36; __utmz=47547066.1321924193.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); nickname=jacob501; locaCountry=1033; locaState=1795; locaCity=Montreal; jforumUserId=1; PMS=1; TurnOFfTips=true; Locacookie=enable; __utma=47547066.1079503560.1321924193.1322707232.1324693472.36; __utmz=47547066.1321924193.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); nickname=jacob501; PMS=1; __utmb=47547066.15.10.1324693472; __utmc=47547066; JSESSIONID=DC7F5AB08264A51FBCDB836393CB16E7; PSESSIONID=28b334905ab6305f7a7fe051e83857bc280af1a9; __utmc=47547066; __utmb=47547066.15.10.1324693472; ACTION_RESULT_CODE=ACTION_RESULT_FAIL; ACTION_ERROR_TEXT=java.lang.NullPointerException')
req.add_header('Content-Type','application/x-www-form-urlencoded')
#user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
#headers = { 'User-Agent' : user_agent }
response = urllib2.urlopen(req)
page = response.read()
print page
The remote server (the one at www.locationary.com) is waiting for the content of your HTTP post request, based on the Content-Type and Content-Length headers. Since you're never actually sending said awaited data, the remote server waits — and so does read() — until you do so.
I need to know how to send the content of my http post request.
Well, you need to actually send some data in the request. See:
urllib2 - The Missing Manual
How do I send a HTTP POST value to a (PHP) page using Python?
Final, "working" version:
import urllib
import urllib2
url = 'https://www.locationary.com/index.jsp?ACTION_TOKEN=tile_loginBar_jsp$JspView$LoginAction'
values = {'inUserName' : 'USER',
'inUserPass' : 'PASSWORD'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
req.add_header('Host', 'www.locationary.com')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')
req.add_header('Accept-Language', 'en-us,en;q=0.5')
req.add_header('Accept-Charset','ISO-8859-1,utf-8;q=0.7,*;q=0.7')
req.add_header('Connection','keep-alive')
req.add_header('Referer','http://www.locationary.com/')
req.add_header('Cookie','site_version=REGULAR; __utma=47547066.1079503560.1321924193.1322707232.1324693472.36; __utmz=47547066.1321924193.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); nickname=jacob501; locaCountry=1033; locaState=1795; locaCity=Montreal; jforumUserId=1; PMS=1; TurnOFfTips=true; Locacookie=enable; __utma=47547066.1079503560.1321924193.1322707232.1324693472.36; __utmz=47547066.1321924193.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); nickname=jacob501; PMS=1; __utmb=47547066.15.10.1324693472; __utmc=47547066; JSESSIONID=DC7F5AB08264A51FBCDB836393CB16E7; PSESSIONID=28b334905ab6305f7a7fe051e83857bc280af1a9; __utmc=47547066; __utmb=47547066.15.10.1324693472; ACTION_RESULT_CODE=ACTION_RESULT_FAIL; ACTION_ERROR_TEXT=java.lang.NullPointerException')
req.add_header('Content-Type','application/x-www-form-urlencoded')
response = urllib2.urlopen(req)
page = response.read()
print page
Don't explicitly set the Content-Length header
Remove the req.add_header('Accept-Encoding','gzip, deflate') line, so that the response doesn't have to be decompressed (or — exercise left to the reader — ungzip it yourself)

python urllib2 not returning https page

When I try to post data from http to https, urllib2 does not return desired https webpage instead website asks to enable cookies.
To get first http page:
proxyHandler = urllib2.ProxyHandler({'http': "http://proxy:port" })
opener = urllib2.build_opener(proxyHandler)
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')]
urllib2.install_opener(opener)
resp = urllib2.urlopen(url)
content = resp.read()
When I extract data from above page and post data to second https page, urllib2 returns success status 200 and page asks to enable cookies.
I've checked the post data, its fine. I'm getting cookies from website but not sure whether they are being sent with next request or not as I read in python docs that urllib2 automatically handles cookies.
To get second https page:
resp = urllib2.urlopen(url, data=postData)
content = resp.read()
I also tried to set proxy handler to this as read in a reply to similar problem on stackoverflow somewhere but got same result:
proxyHandler = urllib2.ProxyHandler({'https': "http://proxy:port" })
urllib2 "handles" cookies in responses but it doesn't not automatically store them and resend them with later requests. You'll need to use the cooklib module for that.
There are some examples in the documentation that show how it works with urllib2.

How to use same cookies in multiple request in python?

I am using this code:
def req(url, postfields):
proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
opener = urllib2.build_opener(proxy_support)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
return opener.open(url).read()
To make a simple http get request (using tor as proxy).
Now I would like to know how to make multiple request using the same cookie.
For example:
req('http://loginpage', 'postfields')
source = req('http://pageforloggedinonly', 0)
#do stuff with source
req('http://anotherpageforloggedinonly', 'StuffFromSource')
I know that my function req doesn't support POST (yet), but I have sent postfields using httplib so I guess I can figure that by myself, but I don't understand how to use cookies, I saw some examples but they are all one request only, I want to reuse the cookie from the first login request in the succeeding requests, or saving/using the cookie from a file (like curl does), that would make everything easier.
The code I posted I only to illustrate what I am trying to achieve, I think I will use httplib(2) for the final app.
UPDATE:
cookielib.LWPCOokieJar worked fine, here's a sample I did for testing:
import urllib2, cookielib, os
def request(url, postfields, cookie):
urlopen = urllib2.urlopen
cj = cookielib.LWPCookieJar()
Request = urllib2.Request
if os.path.isfile(cookie):
cj.load(cookie)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
txheaders = {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
req = Request(url, postfields, txheaders)
handle = urlopen(req)
cj.save(cookie)
return handle.read()
print request('http://google.com', None, 'cookie.txt')
The cookielib module is what you need to do this. There's a nice tutorial with some code samples.

How can I change a user agent string programmatically?

I would like to write a program that changes my user agent string.
How can I do this in Python?
I assume you mean a user-agent string in an HTTP request? This is just an HTTP header that gets sent along with your request.
using Python's urllib2:
import urllib2
url = 'http://foo.com/'
# add a header to define a custon User-Agent
headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' }
req = urllib2.Request(url, '', headers)
response = urllib2.urlopen(req).read()
In urllib, it's done like this:
import urllib
class AppURLopener(urllib.FancyURLopener):
version = "MyStrangeUserAgent"
urllib._urlopener = AppURLopener()
and then just use urllib.urlopen normally. In urllib2, use req = urllib2.Request(...) with a parameter of headers=somedict to set all the headers you want (including user agent) in the new request object req that you make, and urllib2.urlopen(req).
Other ways of sending HTTP requests have other ways of specifying headers, of course.
Using Python you can use urllib to download webpages and use the version value to change the user-agent.
There is a very good example on http://wolfprojects.altervista.org/changeua.php
Here is an example copied from that page:
>>> from urllib import FancyURLopener
>>> class MyOpener(FancyURLopener):
... version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11)
Gecko/20071127 Firefox/2.0.0.11'
>>> myopener = MyOpener()
>>> page = myopener.open('http://www.google.com/search?q=python')
>>> page.read()
[…]Results <b>1</b> - <b>10</b> of about <b>81,800,000</b> for <b>python</b>[…]
urllib2 is nice because it's built in, but I tend to use mechanize when I have the choice. It extends a lot of urllib2's functionality (though much of it has been added to python in recent years). Anyhow, if it's what you're using, here's an example from their docs on how you'd change the user-agent string:
import mechanize
cookies = mechanize.CookieJar()
opener = mechanize.build_opener(mechanize.HTTPCookieProcessor(cookies))
opener.addheaders = [("User-agent", "Mozilla/5.0 (compatible; MyProgram/0.1)"),
("From", "responsible.person#example.com")]
Best of luck.
As mentioned in the answers above, the user-agent field in the http request header can be changed using builtin modules in python such as urllib2. At the same time, it is also important to analyze what exactly the web server sees. A recent post on User agent detection gives a sample code and output, which gives a description of what the web server sees when a programmatic request is sent.
If you want to change the user agent string you send when opening web pages, google around for a Firefox plugin. ;) For example, I found this one. Or you could write a proxy server in Python, which changes all your requests independent of the browser.
My point is, changing the string is going to be the easy part; your first question should be, where do I need to change it? If you already know that (at the browser? proxy server? on the router between you and the web servers you're hitting?), we can probably be more helpful. Or, if you're just doing this inside a script, go with any of the urllib answers. ;)
Updated for Python 3.2 (py3k):
import urllib.request
headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' }
url = 'http://www.google.com'
request = urllib.request.Request(url, b'', headers)
response = urllib.request.urlopen(request).read()

Categories

Resources