I'm trying to connect to a website with Python requests, but not from my real IP address. So I found a proxy on the internet and wrote this code:
import requests
proksi = {
'http': 'http://5.45.64.97:3128'
}
x = requests.get('http://www.whatsmybrowser.org/', proxies = proksi)
print(x.text)
When I look at the output, the proxy simply doesn't work: the site returns my real IP address. What did I do wrong? Thanks.
The answer is quite simple. Although it is a proxy service, it doesn't guarantee 100% anonymity. When you send the HTTP GET request via the proxy server, the request sent by your program to the proxy server is:
GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Now, when the proxy server sends this request to the actual destination, it sends:
GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Via: 1.1 naxserver (squid/3.1.8)
X-Forwarded-For: 122.126.64.43
Cache-Control: max-age=18000
Connection: keep-alive
As you can see, it puts your IP (in my case, 122.126.64.43) in the X-Forwarded-For HTTP header, and hence the website knows that the request was sent on behalf of 122.126.64.43.
Read more about forwarding headers (RFC 7239 defines the standardized Forwarded header, the successor to X-Forwarded-For) at: https://www.rfc-editor.org/rfc/rfc7239
If you want to host your own squid proxy server and want to disable setting X-Forwarded-For header, read: http://www.squid-cache.org/Doc/config/forwarded_for/
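To check what a given proxy reveals, you can inspect the headers that actually arrive at the destination. The sketch below assumes you already have those headers as a dict (for example, from a header-echo page or a server you control); the function name leaked_client_ips is hypothetical and not part of requests or squid.

```python
def leaked_client_ips(received_headers):
    """Return the client IPs that a proxy exposed via X-Forwarded-For."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in received_headers.items()}
    forwarded_for = normalized.get("x-forwarded-for", "")
    # The header may hold a comma-separated chain: client, proxy1, proxy2, ...
    return [ip.strip() for ip in forwarded_for.split(",") if ip.strip()]


# Headers as the destination saw them in the capture above.
seen_by_site = {
    "Host": "www.whatsmybrowser.org",
    "Via": "1.1 naxserver (squid/3.1.8)",
    "X-Forwarded-For": "122.126.64.43",
}
print(leaked_client_ips(seen_by_site))
```

An empty list means this particular header did not leak an address; it does not prove the proxy is anonymous, since other headers (such as Via) can still reveal that a proxy is in use.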
I'd like to know which HTTP GET request headers are sent by requests.get() on the client side.
requests.get('http://localhost:9000')
The request headers sent by the Python command above, as captured with netcat, are shown below. However, I can't find a way to inspect the HTTP GET request headers directly on the client side.
GET / HTTP/1.1
Host: localhost:9000
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.18.4
import requests
import sys
req = requests.Request('GET', 'http://localhost:9000')
print(req.headers)
prepared = req.prepare()
s = requests.Session()
page = s.send(prepared)
The request headers sent by the Python code above, as captured with netcat, are the following.
GET / HTTP/1.1
Host: localhost:9000
Accept-Encoding: identity
Also, req.headers can be used to inspect the headers, but it is not exactly what is sent, since it does not contain the Accept-Encoding header. The headers sent this way also differ from those sent by the first approach.
$ ./main.py
{}
Is there a way to directly inspect the HTTP GET headers sent by the first approach?
Also, why are the requests sent by the two methods not the same? Wouldn't it be better to make them consistent to avoid possible confusion?
I'd like to know which HTTP GET request headers are sent by requests.get() on the client side
If I understand correctly, you want to view the headers that were actually sent by requests.get().
You can access them through the .request.headers attribute of the response:
import requests
r = requests.get("http://example.com")
print(r.request.headers)
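As for why the two approaches send different headers: requests.get() goes through a Session, which merges its default headers (User-Agent, Accept-Encoding, Accept, Connection) into the request, whereas calling prepare() on a bare Request skips that merge. A minimal sketch that only prepares the requests and never sends them, so nothing needs to be listening on the URL:

```python
import requests

url = "http://localhost:9000"
req = requests.Request("GET", url)

# Prepared on its own: no session defaults are merged in.
bare = req.prepare()

# Prepared through a Session: the session's default headers are merged in,
# which is what requests.get() does internally.
session = requests.Session()
merged = session.prepare_request(req)

print(dict(bare.headers))
print(dict(merged.headers))
```

Sending the result of session.prepare_request(req) with session.send() should therefore give headers much closer to those of requests.get().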
I'm getting "urllib.error.HTTPError: HTTP Error 401: Unauthorized" after trying this code:
import urllib.request
_HTTPHandler = urllib.request.HTTPBasicAuthHandler()
_HTTPHandler.add_password(None,'http://192.168.1.205','admin','password')
opener = urllib.request.build_opener(_HTTPHandler)
opener.open('http://192.168.1.205/api/swis/resource')
I'm sure that the user/password is correct. I've tested it with the Postman app by setting a Basic Auth header, and I receive the correct response.
My question is: how can I see the headers that are being used by the "opener", so I can check whether they are being generated correctly?
For manual debugging, you can set the debuglevel of your HTTPHandler:
handler = urllib.request.HTTPHandler(debuglevel=1)
This will produce rich output in stdout, where you can see the full request/response dump:
send: GET / HTTP/1.1
Accept-Encoding: identity
Host: www.example.com
Connection: close
reply: HTTP/1.1 200 OK
header: Date: Mon May 22 2017 12:21:31 GMT
header: Cache-Control: private
header: Content-Type: text/html; charset=ISO-8859-1
Relevant docs: https://docs.python.org/3/library/urllib.request.html
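Putting the debug handler together with the Basic Auth handler from the question might look like the sketch below. It uses HTTPPasswordMgrWithDefaultRealm because a realm of None only acts as a catch-all with that password manager; whether that is the actual cause of the 401 here is an assumption, not something the question confirms. The fetch function is a hypothetical helper and is not called automatically.

```python
import urllib.request

url = "http://192.168.1.205/api/swis/resource"  # URL from the question

# With this manager, a realm of None matches any realm the server announces.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "http://192.168.1.205", "admin", "password")
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# debuglevel=1 makes http.client print each request and response to stdout.
debug_handler = urllib.request.HTTPHandler(debuglevel=1)

opener = urllib.request.build_opener(debug_handler, auth_handler)


def fetch():
    """Open the URL; the send:/reply:/header: lines are printed as a side effect."""
    with opener.open(url) as response:
        return response.read()
```

Calling fetch() should print both the initial request and, after the server's 401 challenge, the retried request carrying the Authorization header. For https URLs the same debuglevel argument would go to urllib.request.HTTPSHandler instead.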
I'm currently writing an HTTP server with Python sockets, and I'm working on my GET and POST requests. I got my GET implementation working fine, but the body of the POST requests won't show up.
Code snippet:
self.host = ''
self.port = 8080
self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
self.listener.bind((self.host, self.port))
self.listener.listen(1)
while True:
    client_connection, client_address = self.listener.accept()
    request = client_connection.recv(2048)
    print(request)
This code prints the HTTP headers when a POST request arrives from the web page:
POST /test.txt HTTP/1.1
Host: localhost:8080
Content-Type: application/x-www-form-urlencoded
Origin: http://localhost:8080
Content-Length: 21
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/601.6.17 (KHTML, like Gecko) Version/9.1.1 Safari/601.6.17
Referer: http://localhost:8080/
Accept-Language: nb-no
Accept-Encoding: gzip, deflate
But there is no body. Why am I not receiving the HTTP body when I know it is sent?
Thanks!
while True:
    client_connection, client_address = self.listener.accept()
    request = client_connection.recv(2048)
    print(request)
recv does not read exactly 2048 bytes; it reads up to 2048 bytes. If some data has arrived, recv returns it even if more data may follow. My guess is that in your case the client sends the HTTP header first and then the body. If Nagle's algorithm is off on the client side (which is common), it is likely that your first recv only gets the header and that you need another recv for the body. This would explain what happens in your case: you get the header but not the body, since you don't call recv again.
But even that would be too simple an implementation, and it will go wrong sooner or later. To do it properly, you should implement the HTTP protocol correctly: first read the HTTP header, which might need multiple recv calls if the header is large. Then parse the header, determine the size of the body (Content-Length header), and read the remaining bytes.
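The steps above can be sketched roughly as follows. This is a minimal illustration, not a complete HTTP implementation: it assumes the body length is given by Content-Length (no chunked transfer encoding), and the function name read_http_request is hypothetical.

```python
import socket


def read_http_request(conn, bufsize=2048):
    """Read one HTTP request from conn and return (header_bytes, body_bytes)."""
    data = b""
    # Keep reading until the blank line that ends the header block.
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(bufsize)
        if not chunk:  # peer closed the connection early
            raise ConnectionError("connection closed before end of headers")
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")

    # Find Content-Length (header names are case-insensitive).
    content_length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())

    # Read whatever part of the body has not arrived yet.
    while len(body) < content_length:
        chunk = conn.recv(bufsize)
        if not chunk:
            raise ConnectionError("connection closed before end of body")
        body += chunk
    return head, body[:content_length]
```

In the server loop from the question, this would replace the single recv call: head, body = read_http_request(client_connection).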
I'm working with socket operations and have coded a basic interception proxy in Python. It works fine, but some hosts return 400 Bad Request responses.
These requests do not look malformed though. Here's one:
GET http://www.baltour.it/ HTTP/1.1
Host: www.baltour.it
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Same request, raw:
GET http://www.baltour.it/ HTTP/1.1\r\nHost: www.baltour.it\r\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\n\r\n
The code I use to send the request is the most basic socket operation (though I don't think the problem lies there, since it works fine with most hosts):
socket_client.send(request_raw)
while socket_client.recv is used to get the response (but no problems here, the response is well-formed, though its status is 400).
Any ideas?
When not talking to a proxy, you are not supposed to put the http://hostname part in the request line; see section 5.1.2 of RFC 2616, the HTTP/1.1 specification:
The most common form of Request-URI is that used to identify a resource on an origin server or gateway. In this case the absolute path of the URI MUST be transmitted (see section 3.2.1, abs_path) as the Request-URI, and the network location of the URI (authority) MUST be transmitted in a Host header field.
(emphasis mine); abs_path is the absolute path part of the request URI, not the full absolute URI itself.
E.g. the server expects you to send:
GET / HTTP/1.1
Host: www.baltour.it
A receiving server should be tolerant of the incorrect behaviour, however. The server seems to violate the RFC here as well. Further on in the same section it reads:
To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies.
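Since some servers reject the absolute form anyway, an interception proxy can rewrite the request line before forwarding it to the origin server. A minimal sketch, assuming the raw request is available as bytes; to_origin_form is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlsplit


def to_origin_form(raw_request):
    """Rewrite 'GET http://host/path HTTP/1.1' to 'GET /path HTTP/1.1'."""
    request_line, sep, rest = raw_request.partition(b"\r\n")
    method, target, version = request_line.split(b" ", 2)
    if target.lower().startswith((b"http://", b"https://")):
        parts = urlsplit(target)
        # An empty path means the root resource.
        target = parts.path or b"/"
        if parts.query:
            target += b"?" + parts.query
    # The Host header and everything after the request line stay untouched.
    return b" ".join([method, target, version]) + sep + rest
```

With the request from the question, socket_client.send(to_origin_form(request_raw)) would send GET / HTTP/1.1 followed by the original headers.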
If, for example, I send a request to Google, it looks like this in Fiddler:
GET http://www.google.com HTTP/1.1
Accept-Encoding: identity
Host: google.com
Connection: close
User-Agent: Python-urllib/2.7
and I want to make it look like this:
GET http://www.google.com HTTP/1.1
Accept-Encoding: identity
Host: google.com
Connection: close
User-Agent: Python-urllib/2.7
a=112&b=2335
How can I do it using urllib2 or urllib?
Thanks!!
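One possible approach, sketched here for Python 3's urllib.request (in Python 2 the corresponding class is urllib2.Request), is to pass the encoded form data as data. Note that urllib switches the method to POST when data is present; the explicit method="GET" below is only an assumption made to match the request line shown above. A body on a GET request has no defined meaning in HTTP, so POST is usually what you want.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Encode the form fields as application/x-www-form-urlencoded bytes.
body = urlencode([("a", 112), ("b", 2335)]).encode("ascii")

# Supplying data alone would make this a POST; method="GET" overrides that.
req = Request("http://www.google.com", data=body, method="GET")

print(req.get_method(), req.data)
# To send it, pass req to urllib.request.urlopen().
```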