debugging connection with urllib2+httplib.debuglevel sometimes not showing debug info - python

While trying to get a login script working, I kept getting the same login page back, so I turned on debugging of the HTTP stream (I can't use Wireshark or the like because the site uses HTTPS).
I got no debug output, so I copied the example, and that works. Any query to google.com shows debug output, but a query to my target page does not. What is the difference? If it were a redirect I would expect to see the first GET/redirect headers, and http://google.com redirects as well.
import urllib
import urllib2
import pdb
h=urllib2.HTTPHandler(debuglevel=1)
opener = urllib2.build_opener(h)
urllib2.install_opener(opener)
print '================================'
data = urllib2.urlopen('http://google.com').read()
print '================================'
data = urllib2.urlopen('https://google.com').read()
print '================================'
data = urllib2.urlopen('https://members.poolplayers.com/default.aspx').read()
print '================================'
data = urllib2.urlopen('https://google.com').read()
When I run it, I get this:
$ python ex.py
================================
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: google.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 301 Moved Permanently\r\n'
header: Location: http://www.google.com/
header: Content-Type: text/html; charset=UTF-8
header: Date: Sat, 02 Jul 2011 16:20:11 GMT
header: Expires: Mon, 01 Aug 2011 16:20:11 GMT
header: Cache-Control: public, max-age=2592000
header: Server: gws
header: Content-Length: 219
header: X-XSS-Protection: 1; mode=block
header: Connection: close
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.google.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Sat, 02 Jul 2011 16:20:12 GMT
header: Expires: -1
header: Cache-Control: private, max-age=0
header: Content-Type: text/html; charset=ISO-8859-1
header: Set-Cookie: PREF=ID=4ca9123c4f8b617f:FF=0:TM=1309623612:LM=1309623612:S=o3GqHRj5_3BkKFuJ; expires=Mon, 01-Jul-2013 16:20:12 GMT; path=/; domain=.google.com
header: Set-Cookie: NID=48=eZdXW-qQQC2fRrXps3HpzkGgeWbMCnyT_taxzdvW1icXS1KSM0SSYOL7B8-OPsw0eLLAbvCW863Viv9ICDj4VAL7dmHtF-gsPfro67IFN5SP6WyHHpLL7JsS_-MOvwSD; expires=Sun, 01-Jan-2012 16:20:12 GMT; path=/; domain=.google.com; HttpOnly
header: Server: gws
header: X-XSS-Protection: 1; mode=block
header: Connection: close
================================
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.google.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Sat, 02 Jul 2011 16:20:14 GMT
header: Expires: -1
header: Cache-Control: private, max-age=0
header: Content-Type: text/html; charset=ISO-8859-1
header: Set-Cookie: PREF=ID=d613768b3704482b:FF=0:TM=1309623614:LM=1309623614:S=xLxMwBVKEG_bb1bo; expires=Mon, 01-Jul-2013 16:20:14 GMT; path=/; domain=.google.com
header: Set-Cookie: NID=48=im_KcHyhG2LrrGgLsQjYlwI93lFZa2jZjEYBzdn-xXEyQnoGo8xkP0234fROYV5DScfY_6UbbCJFtyP_V00Ji11kjZwJzR63LfkLoTlEqiaY7FQCIky_8hA2NEqcXwJe; expires=Sun, 01-Jan-2012 16:20:14 GMT; path=/; domain=.google.com; HttpOnly
header: Server: gws
header: X-XSS-Protection: 1; mode=block
header: Connection: close
================================
================================
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.google.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Sat, 02 Jul 2011 16:20:16 GMT
header: Expires: -1
header: Cache-Control: private, max-age=0
header: Content-Type: text/html; charset=ISO-8859-1
header: Set-Cookie: PREF=ID=dc2cb55e6476c555:FF=0:TM=1309623616:LM=1309623616:S=o__g-Zcpts392D9_; expires=Mon, 01-Jul-2013 16:20:16 GMT; path=/; domain=.google.com
header: Set-Cookie: NID=48=R5gy1aTMjL8pghxQmfUkJaMLc3SxmpFxu5XpoZELAsZrdf8ogQLwyo9Vbk_pRkoETvKE-beWbHHBZu3xgJDt6IsjwmSHPaMGSzxXvsWERxsbKwQMy-wlLSfasvUq5x6q; expires=Sun, 01-Jan-2012 16:20:16 GMT; path=/; domain=.google.com; HttpOnly
header: Server: gws
header: X-XSS-Protection: 1; mode=block
header: Connection: close

HTTPHandler only handles http:// URLs, so its debuglevel does not apply to HTTPS requests. You'll need an HTTPSHandler as well:
h = urllib2.HTTPSHandler(debuglevel=1)
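A minimal sketch of how the two handlers could be combined so that both http:// and https:// requests are logged. The helper name build_debug_opener is hypothetical, and the import fallback is only there so the same snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent module


def build_debug_opener(level=1):
    """Return an opener whose HTTP and HTTPS handlers both print debug output."""
    http_handler = urllib2.HTTPHandler(debuglevel=level)
    https_handler = urllib2.HTTPSHandler(debuglevel=level)
    # Passing handler instances replaces the default handlers of the same class.
    return urllib2.build_opener(http_handler, https_handler)


opener = build_debug_opener()
urllib2.install_opener(opener)

# With the opener installed, a call such as
#   urllib2.urlopen('https://members.poolplayers.com/default.aspx').read()
# should print send:/reply:/header: lines for the HTTPS request as well.
```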

Related

How to return http response after modifying header content type in python

I am writing a controller endpoint in Python and I want to modify the response code, content type, etc. How can I achieve this?
@http.route('/get/data/', methods=['GET', 'POST'], type="http", auth='public', csrf=False)
def fetchSms(self, **kwargs):
    mydata = {"date": "2018-10-13T00:46:17.25Z"}
    return simplejson.dumps(mydata)
I have to return mydata to the client after setting content-type="application/json", and the response code should be 200.
Current Output:
HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Mon, 15 Oct 2018 16:57:49 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 192
Connection: keep-alive
Set-Cookie: session_id=43edf5hfgh436sdgdga9d7618deb74af7; Expires=Sun, 13-Jan-2019 16:57:49 GMT; Max-Age=7776000; Path=/

Trying to avoid varnish cache

I'm trying to bypass the Varnish cache from the client side. With nginx 1.6.1 it works by adding a random URL parameter (see X-XHR-Current-Location), so the response doesn't get "X-Cache": "HIT".
Server: nginx/1.6.1
Date: Fri, 04 Sep 2015 13:13:02 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 20762
Status: 200 OK
X-XHR-Current-Location: /shop.json?1441372381.857126?1441372381.854355
X-UA-Compatible: IE=Edge,chrome=1
ETag: "de6da75aa7b7d6ce34bd736ccf991f36"
X-Request-Id: a39fc3b8d44687039a18499dd22a2c7d
X-Runtime: 0.371739
X-Rack-Cache: miss
Accept-Ranges: bytes
X-Varnish: 534989417
Age: 0
Via: 1.1 varnish
Cache-Control: private, max-age=0, must-revalidate
Pragma: no-cache
X-Cache: MISS
X-Cache: MISS from localhost
X-Cache-Lookup: MISS from localhost:3128
Connection: close
But as soon as the request hits an nginx/1.8.0 server, the URL somehow gets stripped (see X-XHR-Current-Location) and the random parameter is removed. The "X-Cache" header then also returns "HIT".
Server: nginx/1.8.0
Date: Fri, 04 Sep 2015 13:13:14 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 3555
Status: 200 OK
X-XHR-Current-Location: /shop/301316.json
X-UA-Compatible: IE=Edge,chrome=1
ETag: "2e88dffe16a385872368e19e0370a999"
X-Request-Id: 3404c637c6a499d8e32a6e5c243e4d69
X-Runtime: 0.065267
X-Rack-Cache: miss
Accept-Ranges: bytes
X-Varnish: 561085217 561069463
Age: 823
Via: 1.1 varnish
Cache-Control: private, max-age=0, must-revalidate
Pragma: no-cache
X-Cache: HIT
X-Cache: MISS from localhost
X-Cache-Lookup: MISS from localhost:3128
Connection: close
I guess that's also the reason I sometimes get old results. Is there any way I can avoid the "HIT", or make the request look like a new URL to the nginx/1.8.0 servers?
Thanks in advance!
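For illustration, here is one way the random URL parameter could be appended so that the result is always a well-formed query string (the first response above shows two "?" in the URL). This is only a sketch: the function name add_cache_buster and the parameter name "_" are assumptions, and if the server-side configuration strips or normalizes the query string, as the second response suggests, a client-side parameter alone may not help.

```python
import uuid

try:
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit, parse_qsl  # Python 2
    from urllib import urlencode


def add_cache_buster(url, param='_'):
    """Return url with a fresh random query parameter, keeping existing ones."""
    parts = urlsplit(url)
    # Keep the existing parameters, but drop any previous cache-buster value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, uuid.uuid4().hex))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))


print(add_cache_buster('http://example.com/shop.json?page=2'))
```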

HTTP client in Python fails to receive

I'm trying to write an HTTP client in Python using the socket library and can't get the receive part working.
Here is my code:
import socket, sys
class httpBase:
    def __init__(self, host, port=80):
        self.s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.s.connect((host, port))
    def send(self, msg):
        self.s.sendall(msg)
    def recive(self):
        data = ''
        while 1:
            Tdata = self.s.recv(128)
            print("||" + data + "|")
            data += Tdata
            if data.decode() == '': break
        return data
http = httpBase('www.google.com')
http.send('GET / HTTP/1.1\r\n\r\n'.encode())
print(http.recive())
The problem is that without the print inside the recive function I get nothing back; the code just waits and I have to force-stop it.
Here is the response from google:
|||
||HTTP/1.1 302 Found
Location: http://www.google.co.il/?gfe_rd=ctrl&ei=et8hU-qsFaXY8ge6moCYAg&gws_rd=cr
Cache-Control: private
|
||HTTP/1.1 302 Found
Location: http://www.google.co.il/?gfe_rd=ctrl&ei=et8hU-qsFaXY8ge6moCYAg&gws_rd=cr
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=502cb60127440cb1:FF=0:TM=1394728826:LM=1394728826:S=gXXQi28MXZy3d-U7|
||HTTP/1.1 302 Found
Location: http://www.google.co.il/?gfe_rd=ctrl&ei=et8hU-qsFaXY8ge6moCYAg&gws_rd=cr
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=502cb60127440cb1:FF=0:TM=1394728826:LM=1394728826:S=gXXQi28MXZy3d-U7; expires=Sat, 12-Mar-2016 16:40:26 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=pnIbo1mi1JNuqB9sxTHn41_sdPg6Za-1nQLp_Wk8|
||HTTP/1.1 302 Found
Location: http://www.google.co.il/?gfe_rd=ctrl&ei=et8hU-qsFaXY8ge6moCYAg&gws_rd=cr
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=502cb60127440cb1:FF=0:TM=1394728826:LM=1394728826:S=gXXQi28MXZy3d-U7; expires=Sat, 12-Mar-2016 16:40:26 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=pnIbo1mi1JNuqB9sxTHn41_sdPg6Za-1nQLp_Wk8h3zii3-ibcyo8zdcKg8WmJjbYYr_hCX4NWWvMTCw1dVwTHKtJbo1M6ay977MwX5hswJ6XeadRFIpd5Pe4La2HBRF; expires=Fri, 12-Sep-2014 16:40:26 GMT;|
||HTTP/1.1 302 Found
Location: http://www.google.co.il/?gfe_rd=ctrl&ei=et8hU-qsFaXY8ge6moCYAg&gws_rd=cr
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=502cb60127440cb1:FF=0:TM=1394728826:LM=1394728826:S=gXXQi28MXZy3d-U7; expires=Sat, 12-Mar-2016 16:40:26 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=pnIbo1mi1JNuqB9sxTHn41_sdPg6Za-1nQLp_Wk8h3zii3-ibcyo8zdcKg8WmJjbYYr_hCX4NWWvMTCw1dVwTHKtJbo1M6ay977MwX5hswJ6XeadRFIpd5Pe4La2HBRF; expires=Fri, 12-Sep-2014 16:40:26 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.|
||HTTP/1.1 302 Found
Location: http://www.google.co.il/?gfe_rd=ctrl&ei=et8hU-qsFaXY8ge6moCYAg&gws_rd=cr
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=502cb60127440cb1:FF=0:TM=1394728826:LM=1394728826:S=gXXQi28MXZy3d-U7; expires=Sat, 12-Mar-2016 16:40:26 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=pnIbo1mi1JNuqB9sxTHn41_sdPg6Za-1nQLp_Wk8h3zii3-ibcyo8zdcKg8WmJjbYYr_hCX4NWWvMTCw1dVwTHKtJbo1M6ay977MwX5hswJ6XeadRFIpd5Pe4La2HBRF; expires=Fri, 12-Sep-2014 16:40:26 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Date: Thu, 13 Mar 2014 16:40:26 GMT
Server: gws
Content-Length: 277
X-XSS-Protection:|
||HTTP/1.1 302 Found
Location: http://www.google.co.il/?gfe_rd=ctrl&ei=et8hU-qsFaXY8ge6moCYAg&gws_rd=cr
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=502cb60127440cb1:FF=0:TM=1394728826:LM=1394728826:S=gXXQi28MXZy3d-U7; expires=Sat, 12-Mar-2016 16:40:26 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=pnIbo1mi1JNuqB9sxTHn41_sdPg6Za-1nQLp_Wk8h3zii3-ibcyo8zdcKg8WmJjbYYr_hCX4NWWvMTCw1dVwTHKtJbo1M6ay977MwX5hswJ6XeadRFIpd5Pe4La2HBRF; expires=Fri, 12-Sep-2014 16:40:26 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Date: Thu, 13 Mar 2014 16:40:26 GMT
Server: gws
Content-Length: 277
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic
<HTML><HEAD><meta http-equiv="content-type" content=|
||HTTP/1.1 302 Found
Location: http://www.google.co.il/?gfe_rd=ctrl&ei=et8hU-qsFaXY8ge6moCYAg&gws_rd=cr
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=502cb60127440cb1:FF=0:TM=1394728826:LM=1394728826:S=gXXQi28MXZy3d-U7; expires=Sat, 12-Mar-2016 16:40:26 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=pnIbo1mi1JNuqB9sxTHn41_sdPg6Za-1nQLp_Wk8h3zii3-ibcyo8zdcKg8WmJjbYYr_hCX4NWWvMTCw1dVwTHKtJbo1M6ay977MwX5hswJ6XeadRFIpd5Pe4La2HBRF; expires=Fri, 12-Sep-2014 16:40:26 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Date: Thu, 13 Mar 2014 16:40:26 GMT
Server: gws
Content-Length: 277
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.g|
This looks like the same problem as in "Python Recv() stalling": the request uses HTTP/1.1, so the server keeps the connection open after responding and recv() keeps waiting for more data. See that question for details.
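A minimal sketch of a receive loop that terminates, assuming the client asks the server to close the connection (Connection: close) and stops when recv() returns an empty result. The names build_request and receive_all are hypothetical, and the Host header is added because HTTP/1.1 requires it. Note that the loop checks the newly received chunk rather than the accumulated data.

```python
import socket


def build_request(host, path='/'):
    """Build a GET request that asks the server to close the connection."""
    return ('GET {0} HTTP/1.1\r\n'
            'Host: {1}\r\n'
            'Connection: close\r\n'
            '\r\n').format(path, host).encode('ascii')


def receive_all(sock, bufsize=4096):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # an empty result means the peer closed the connection
            break
        chunks.append(chunk)
    return b''.join(chunks)


# Possible usage (requires network access):
# s = socket.create_connection(('www.google.com', 80))
# s.sendall(build_request('www.google.com'))
# print(receive_all(s))
# s.close()
```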

urllib2.urlopen failing while urllib.urlopen working on same URL

I am trying to use urllib and urllib2 to scrape some data from a particular website.
The urllib code was primarily for reading and processing the data, while the code section using urllib2 was mainly for reading and storing the data.
The external site went through some changes, and while the urllib code section kept working, the urllib2 section simply keeled over.
So I did some checks and noticed that urllib2.urlopen(URL) always returned an empty string while urllib.urlopen(URL) always worked OK.
I dug deeper and enabled debug logging on both the urllib and urllib2 modules:
>>> response2 =urllib2.urlopen('http://www.xxxxxxxxltd.com/web/guest/attendancelist')
send: 'GET /web/guest/attendancelist HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.xxxxxxxxltd.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 302 Moved Temporarily\r\n'
header: Server: nginx/0.7.67
header: Date: Thu, 28 Nov 2013 19:12:28 GMT
header: Transfer-Encoding: chunked
header: Connection: close
header: Location: http://www.xxxxxxxxplc.com/web/guest/attendancelist
send: 'GET /web/guest/attendancelist HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.xxxxxxxxplc.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 301 Moved Permanently\r\n'
header: Server: Apache-Coyote/1.1
header: Location: /home/new/attendancelist.jsp
header: Content-Length: 0
header: Date: Thu, 28 Nov 2013 19:12:26 GMT
header: Connection: close
send: 'GET /home/new/attendancelist.jsp HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.xxxxxxxxplc.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.6\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Apache-Coyote/1.1
header: Set-Cookie: JSESSIONID=F02B1F76CCCF6F41BE48951F6E1A6205; Path=/home
header: Content-Type: text/html;charset=utf-8
header: Content-Length: 0
header: Date: Thu, 28 Nov 2013 19:12:26 GMT
header: Connection: close
And....
>>> html3=urllib.urlopen('http://www.xxxxxxxxltd.com/web/guest/attendancelist')
send: 'GET /web/guest/attendancelist HTTP/1.0\r\nHost: www.xxxxxxxxltd.com\r\nUser-Agent: Python-urllib/1.17\r\n\r\n'
reply: 'HTTP/1.1 302 Moved Temporarily\r\n'
header: Server: nginx/0.7.67
header: Date: Thu, 28 Nov 2013 19:10:36 GMT
header: Connection: close
header: Location: http://www.xxxxxxxxplc.com/web/guest/attendancelist
send: 'GET /web/guest/attendancelist HTTP/1.0\r\nHost: www.xxxxxxxxplc.com\r\nUser-Agent: Python-urllib/1.17\r\n\r\n'
reply: 'HTTP/1.1 301 Moved Permanently\r\n'
header: Server: Apache-Coyote/1.1
header: Location: /home/new/attendancelist.jsp
header: Content-Length: 0
header: Date: Thu, 28 Nov 2013 19:10:34 GMT
header: Connection: close
send: 'GET /home/new/attendancelist.jsp HTTP/1.0\r\nHost: www.xxxxxxxxplc.com\r\nUser-Agent: Python-urllib/1.17\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Apache-Coyote/1.1
header: Set-Cookie: JSESSIONID=8CFB903B80C42CA3DA37EDF90D84FF99; Path=/home
header: Content-Type: text/html;charset=utf-8
header: Date: Thu, 28 Nov 2013 19:10:35 GMT
header: Connection: close
As can be seen, the urllib2 request flow sends significantly more headers (one of which is the Connection header, with its value set to close).
Can anyone help me find out why urllib2 fails to retrieve the data while the urllib module works well?
I am fairly certain that it has something to do with the Connection headers, but I would like some confirmation and an explanation of the reasoning.
Thanks.
I would suggest debugging using curl to replicate the headers the two versions of urllib are using. With a bit of trial and error you should be able to find the header causing the problem and go from there.

urllib2 python (Transfer-Encoding: chunked)

I used the following python code to download the html page:
response = urllib2.urlopen(current_URL)
msg = response.read()
print msg
For a page such as this one, it opens the URL without error but then prints only part of the HTML page!
Below you can find the HTTP headers of the page. I think the problem is due to "Transfer-Encoding: chunked".
It seems urllib2 returns only the first chunk, and I am having difficulty reading the remaining ones. How can I read the remaining chunks?
Server: nginx/1.0.5
Date: Wed, 27 Feb 2013 14:41:28 GMT
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Connection: close
Set-Cookie: route=c65b16937621878dd49065d7d58047b2; Path=/
Set-Cookie: JSESSIONID=EE18E813EE464664EA64086D5AE9A290.tpdjo13v_3; Path=/
Pragma: No-cache
Cache-Control: no-cache,no-store,max-age=0
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Vary: Accept-Encoding
Content-Language: fr
I've found that if the Accept-Language header is specified, the server doesn't drop the TCP connection; otherwise it does.
curl -H "Accept-Language:uk,en-US;q=0.8,en;q=0.6,ru;q=0.4" -v 'http://www.legifrance.gouv.fr/affichJuriJudi.do?oldAction=rechJuriJudi&idTexte=JURITEXT000024053954&fastReqId=660326373&fastPos=1'
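The same header could be sent from urllib2 roughly as follows. This is a sketch rather than a confirmed fix: the helper name build_request is hypothetical, the header value is copied from the curl command above, and the import fallback only exists so the snippet also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent module


def build_request(url, accept_language='uk,en-US;q=0.8,en;q=0.6,ru;q=0.4'):
    """Return a Request for url that carries an Accept-Language header."""
    request = urllib2.Request(url)
    request.add_header('Accept-Language', accept_language)
    return request


# Possible usage with the variable from the question (requires network access):
# response = urllib2.urlopen(build_request(current_URL))
# msg = response.read()
```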
