Fiddler does not capture my script's requests - python

my code:
import urllib2

proxy = urllib2.ProxyHandler({'http': '127.0.0.1:8888'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
f = urllib2.urlopen('http://www.google.com')
print f.read()
This request does not show up in Fiddler's capture. Does anyone know how to configure Fiddler so that the request is captured?
EDIT: the request works, and I can see the contents. Also, if I close Fiddler, the request fails, as expected, because there is no proxy. It is just that I do not see anything in Fiddler.
EDIT2: I see traffic from a .NET test console application that I wrote. But I do not see traffic from my python script.

I ran into exactly the same issue. While Fiddler2 is open, even if I change the handler to
proxy = urllib2.ProxyHandler({'http': 'http://asdfl.com:13212/'}) (a non-existent proxy server), the script can still fetch the page content. My guess was that once Fiddler2 sets up its system proxy, urllib2 ignores the ProxyHandler entirely for some reason, but I couldn't figure out why.
I figured it out; see this thread on Stack Overflow:
urllib2 doesn't use proxy (Fiddler2), set using ProxyHandler
In Fiddler2, go to Tools -> Fiddler Options... -> Connections, remove the trailing semicolon from the value in the "IE should bypass Fiddler for ..." field, and restart Fiddler2.
This solution solved my problem; I hope it helps anyone else struggling with it.
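If you want to confirm that urllib2 really honors your ProxyHandler (rather than silently using the system proxy Fiddler sets up), a small check like the following might help; the deliberately unreachable proxy address and the timeout value are arbitrary assumptions:

import urllib2

# A deliberately unreachable proxy; if the ProxyHandler is honored, this should fail with URLError
proxy = urllib2.ProxyHandler({'http': 'http://127.0.0.1:1/'})
opener = urllib2.build_opener(proxy)
try:
    opener.open('http://www.google.com', timeout=5)
    print "proxy setting was ignored (or something else answered on that port)"
except urllib2.URLError as e:
    print "ProxyHandler is being used:", e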

Related

HTTPS request using python requests library

I am trying to send an HTTPS request using the python requests library.
My code is:
full_url = ''.join(['https://', get_current_site(request).domain, '/am/reply'])
data = {'agent_type':'trigger','input':platform,'user':request.user.id}
print "hi" ### this is printing
a = requests.get(full_url,params=data,verify=False) ##the execution is stucked here even error are not appearing
print "hello" ## this code is not printed
The problem is that nothing executes after the requests call; the whole script gets stuck at that point.
I tried the same code in the python shell and it runs perfectly.
Is there any way I can debug the request and response in real time, or can someone suggest a solution?
The whole code was working fine over http, but after switching to https it stopped working. I even tried supplying the certificate file, but with no success.
This is normal. Some websites only accept HTTP, some only HTTPS, and some both; HTTP uses port 80 and HTTPS uses port 443. HTTPS is HTTP over TLS, so the client and server exchange extra information (certificates, handshake data) before any request goes through. Check the requests documentation on SSL certificate verification:
http://docs.python-requests.org/en/master/user/advanced/#ssl-cert-verification
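If the call simply hangs with no error, a timeout at least makes the failure visible. A minimal sketch, using placeholder values in place of the URL and payload built from the Django request (the 10-second timeout is an arbitrary choice):

import requests

full_url = 'https://example.com/am/reply'  # placeholder for the URL built from get_current_site()
data = {'agent_type': 'trigger', 'input': 'web', 'user': 1}  # placeholder payload

try:
    # Without a timeout, requests can block indefinitely on a bad TLS handshake or a firewalled port
    resp = requests.get(full_url, params=data, verify=False, timeout=10)
    print resp.status_code
    print resp.text[:200]
except requests.exceptions.SSLError as e:
    print "SSL error:", e
except requests.exceptions.Timeout:
    print "request timed out after 10 seconds"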

Python urllib2 trace route

I'm using Python and urllib2 to make POST requests, and it works successfully. However, when I make several POSTs one after the other, I sometimes get the error "502 proxy in use". Our company does use a proxy, but I'm not set up to go through it since I'm working internally. Is there a way to get a trace route of how the POST request is being routed using urllib2 and Python?
Thanks
I'm not sure what you mean by "a trace route". traceroute is an IP thing, two levels below HTTP. And I doubt you want anything like that. You can find out whether there were any redirects, whether a proxy was used, etc., either by using a general-purpose sniffer or, much more simply, by just asking urllib2.
For example, let's say your code looks like this:
import urllib
import urllib2

url = 'http://example.com'
data = urllib.urlencode({'spam': 'eggs'})
req = urllib2.Request(url, data)
resp = urllib2.urlopen(req)
respdata = resp.read()
Then req.has_proxy() will tell you whether it's going to use a proxy, resp.geturl() == url will tell you whether there was a redirect, etc. Read the docs for all the info available.
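Continuing the example above, both checks are one-liners after the urlopen call (has_proxy() only gives a meaningful answer once the opener has processed the request):

print req.has_proxy()        # True if a ProxyHandler routed this request through a proxy
print resp.geturl() == url   # False if the request was redirected to a different URL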
Meanwhile, if you don't want a proxy, you can either disable whatever settings urllib2 picked up that made it auto-configure the proxy (e.g., unset http_proxy before running your script), override the default handler chain to make sure there's no ProxyHandler, build an explicit OpenerDirector instead of using the default one, etc.
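A minimal sketch of the last two options, assuming you simply want to guarantee that no proxy is used (the URL is a placeholder):

import urllib2

# An empty mapping tells ProxyHandler to use no proxies at all, overriding any
# http_proxy/https_proxy environment settings the default opener would pick up.
no_proxy = urllib2.ProxyHandler({})
opener = urllib2.build_opener(no_proxy)
urllib2.install_opener(opener)

resp = urllib2.urlopen('http://example.com')
print resp.geturl()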

How do you open a URL with Python without using a browser?

I want to open a URL with Python code but I don't want to use the "webbrowser" module. I tried that already and it worked (It opened the URL in my actual default browser, which is what I DON'T want). So then I tried using urllib (urlopen) and mechanize. Both of them ran fine with my program but neither of them actually sent my request to the website!
Here is part of my code:
finalURL="http://www.locationary.com/access/proxy.jsp?ACTION_TOKEN=proxy_jsp$JspView$SaveAction&inPlaceID=" + str(newPID) + "&xxx_c_1_f_987=" + str(ZA[z])
print finalURL
print ""
br.open(finalURL)
page = urllib2.urlopen(finalURL).read()
When I go into the site, locationary.com, it doesn't show that any changes have been made! When I used "webbrowser" though, it did show changes on the website after I submitted my URL. How can I do the same thing that webbrowser does without actually opening a browser?
I think the website wants a "GET"
I'm not sure what OS you're working on, but if you use something like httpscoop (mac) or fiddler (pc) or wireshark, you should be able to watch the traffic and see what's happening. It may be that the website does a redirect (which your browser is following) or there's some other subsequent activity.
Start an HTTP sniffer, make the request using the web browser and watch the traffic. Once you've done that, try it with the python script and see if the request is being made, and what the difference is in the HTTP traffic. This should help identify where the disconnect is.
An HTTP GET doesn't need any specific code or action on the client side: it's just the base URL (http://server/) + path + optional query string.
If the URL is correct, then the code above should work. Some pointers what you can try next:
Is the URL really correct? Use Firebug or a similar tool to watch the network traffic which gives you the full URL plus any header fields from the HTTP request.
Maybe the site requires you to log in, first. If so, make sure you set up cookies correctly.
Some sites require a correct "Referer" field (to protect themselves against deep linking). Add the referrer header your browser used to the request (see the sketch after this list).
The server's log file is a great source of information for troubleshooting such problems, when you have access to it.
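A minimal sketch of the second and third points with urllib2; the URLs and the Referer value here are hypothetical placeholders:

import urllib2
import cookielib

# Keep cookies so a login performed earlier in the session carries over to later requests
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

req = urllib2.Request('http://www.locationary.com/access/proxy.jsp')  # placeholder URL
req.add_header('Referer', 'http://www.locationary.com/')              # the HTTP header is spelled "Referer"
page = opener.open(req).read()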

Does urllib2.urlopen() cache stuff?

This isn't mentioned in the python documentation. I've recently been testing a website by simply refreshing it with urllib2.urlopen() to extract certain content, and I notice that sometimes when I update the site, urllib2.urlopen() doesn't seem to get the newly added content. So I wonder: does it cache stuff somewhere?
So I wonder: does it cache stuff somewhere?
It doesn't.
If you don't see new data, this could have many reasons. Most bigger web services use server-side caching for performance reasons, for example caching proxies like Varnish and Squid, or application-level caching.
If the problem is caused by server-side caching, there's usually no way to force the server to give you the latest data.
For caching proxies like Squid, things are different. Usually, Squid adds some additional headers to the HTTP response (visible via response.info().headers).
If you see a header field called X-Cache or X-Cache-Lookup, this means that you aren't connected to the remote server directly, but through a transparent proxy.
If you see something like X-Cache: HIT from proxy.domain.tld, the response you got is cached. The opposite is X-Cache: MISS from proxy.domain.tld, which means the response is fresh.
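A quick sketch of checking for those headers with urllib2 (the URL is a placeholder):

import urllib2

resp = urllib2.urlopen('http://example.com')
headers = resp.info()
# Either header being present suggests a transparent caching proxy in the path
print headers.getheader('X-Cache')         # e.g. "HIT from proxy.domain.tld"
print headers.getheader('X-Cache-Lookup')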
Very old question, but I had a similar problem which this solution did not resolve.
In my case I had to spoof the User-Agent like this:
request = urllib2.Request(url)
request.add_header('User-Agent', 'Mozilla/5.0')
content = urllib2.build_opener().open(request).read()
Hope this helps anyone...
Your web server or an HTTP proxy may be caching content. You can try to disable caching by adding a Pragma: no-cache request header:
request = urllib2.Request(url)
request.add_header('Pragma', 'no-cache')
content = urllib2.build_opener().open(request).read()
If you make changes and test the behaviour from the browser and from urllib, it is easy to make a stupid mistake.
In the browser you are logged in, but with urllib.urlopen the site may always redirect your app to the same login page, so if you only check the page size or the top of your common layout, you could think that your changes have no effect.
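A small sketch for catching that case, assuming a placeholder URL for the page you edited:

import urllib2

url = 'http://example.com/page-you-edited'   # placeholder
resp = urllib2.urlopen(url)
# If the site bounced you to a login page, the final URL differs from the one requested
print resp.geturl()
print len(resp.read())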
I find it hard to believe that urllib2 does not do caching, because in my case, upon restart of the program the data is refreshed. If the program is not restarted, the data appears to be cached forever. Also retrieving the same data from Firefox never returns stale data.

Python urllib.urlopen() call doesn't work with a URL that a browser accepts

If I point Firefox at http://bitbucket.org/tortoisehg/stable/wiki/Home/ReleaseNotes, I get a page of HTML. But if I try this in Python:
import urllib
site = 'http://bitbucket.org/tortoisehg/stable/wiki/Home/ReleaseNotes'
req = urllib.urlopen(site)
text = req.read()
I get the following:
500 Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
What am I doing wrong?
You are not doing anything wrong; bitbucket does some user-agent detection (to detect mercurial clients, for example). Just changing the user agent fixes it (as long as the new value doesn't have "urllib" as a substring).
You should file an issue regarding this: http://bitbucket.org/jespern/bitbucket/issues/new/
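A minimal sketch of that workaround with urllib2 (the User-Agent string is just an example value):

import urllib2

site = 'http://bitbucket.org/tortoisehg/stable/wiki/Home/ReleaseNotes'
# Any User-Agent that doesn't contain "urllib" appears to get past the server-side check
req = urllib2.Request(site, headers={'User-Agent': 'Mozilla/5.0'})
text = urllib2.urlopen(req).read()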
You're doing nothing wrong, on the surface, and as the error page says you should contact the site's administrators because they're the ones with the server logs which may explain what's happening. Fortunately, bitbucket's site admins are a friendly bunch!
No doubt there is some header or combination of headers that browsers set one way, urllib sets another way, and a bug on the server gets tickled in the latter case. You may want to see exactly what headers are being sent e.g. with firebug in firefox, and reproduce those until you isolate exactly the server bug; most likely it's going to be the user agent or some "accept"-ish header that's tickling that bug.
I don't think you're doing anything wrong -- it looks like this server was just down? Your script worked fine for me ('text' contained the same data as that displayed in the browser).
