tl;dr: I used httplib to try to create a connection to a site. I failed, and I'd love some guidance!
I've run into some trouble. I've read about Python's socket and httplib modules, although it seems I have some problems with the syntax.
Here it is:
connection = httplib.HTTPConnection('www.site.org', 80, timeout=10, 1.2.3.4)
The syntax is this:
httplib.HTTPConnection(host[, port[, strict[, timeout[, source_address]]]])
How does "source_address" behave? Can I make requests with any IP from it?
Wouldn't I need an User-Agent for it?
Also, how do I check if the connect is successful?
if connection:
    print "Connection Successful."
(As far as I know, HTTP doesn't need an "are you alive?" ping every second; as long as both client and server are okay, a request will be processed whenever it is made. So I can't just ping constantly.)
Creating the object does not actually connect to the website:
HTTPConnection.connect():
Connect to the server specified when the object was created.
source_address is a (host, port) tuple giving the local address the socket binds to before connecting. It only chooses which of your machine's own addresses the connection originates from; it does not let you send requests from arbitrary IPs. It is also unrelated to the User-Agent, which is just an HTTP request header you can set per request. Either way, it is an optional parameter.
There is no attribute that tells you whether a connection was made, but connect() (and request(), which connects implicitly) raises socket.error if the connection fails, so you can catch that.
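A minimal sketch of that check (Python 2 httplib, with the same hypothetical host as the question):
import httplib
import socket

connection = httplib.HTTPConnection('www.site.org', 80, timeout=10)
try:
    connection.connect()  # raises socket.error if the TCP connection fails
    print "Connection Successful."
except socket.error as e:
    print "Connection failed: %s" % e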
Assuming what you want to do is get the contents of the website root, you can use this:
from httplib import HTTPConnection

conn = HTTPConnection("www.site.org", 80, timeout=10)
conn.connect()  # optional: request() connects implicitly if needed
conn.request("GET", "/")  # pass the path here, not the full URL
resp = conn.getresponse()
data = resp.read()
print(data)
(slammed together from the HTTPConnection documentation)
Honestly though, you should not be using httplib, but instead urllib2 or another HTTP library that is less... low-level.
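For comparison, a rough urllib2 equivalent of the fetch above (same hypothetical URL; urllib2 handles connecting and building the request line for you):
import urllib2

resp = urllib2.urlopen("http://www.site.org/", timeout=10)
print resp.read()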
Related
I've been working through a LinkedIn Learning course trying to learn some Python, but I've run into a problem that's stopped my progress. I'm trying to work with JSON and pull data from a website, but I keep getting an error saying "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host failed to respond".
I'm using VSCode and have tried on both my work's network (which is heavily restricted, though not to this webpage for browsing) and on my home network. Is there some sort of network permission that would be stopping access? I experienced the same issue when trying to complete an API training course that used the OpenNotify API.
This is the code I'm trying to use.
import urllib.request

def main():
    webUrl = urllib.request.urlopen("https://www.google.com")
    print("result code: " + str(webUrl.getcode()))

if __name__ == "__main__":
    main()
As ping works but telnet to port 80 does not, outbound HTTP on port 80 is blocked on your machine. I assume your browser's HTTP connections go through a proxy (browsing clearly works, how else would you read Stack Overflow?). You need to add some code to your Python program that handles the proxy.
You can take a look here for more detailed info.
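For instance, a minimal sketch using urllib.request's ProxyHandler (the proxy address below is a placeholder; substitute your network's real proxy host and port):
import urllib.request

# Hypothetical proxy address -- replace with your network's real proxy.
proxy = urllib.request.ProxyHandler({
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
})
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)  # later urlopen() calls go through the proxy

webUrl = urllib.request.urlopen("https://www.google.com")
print("result code: " + str(webUrl.getcode()))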
But why not try the requests library? It is pretty straightforward and easy to use as well.
Here's an example:
>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
'{"type":"User"...'
>>> r.json()
{'private_gists': 419, 'total_private_repos': 77, ...}
You can start using it by doing pip install requests, and here's the documentation.
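If your network does require a proxy, requests supports that too; a short sketch with a placeholder proxy address (requests also honors the HTTP_PROXY/HTTPS_PROXY environment variables automatically):
import requests

# Hypothetical proxy address -- replace with your network's real proxy.
proxies = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}
r = requests.get("https://www.google.com", proxies=proxies, timeout=10)
print("result code: " + str(r.status_code))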
Below is the simple script I'm using to redirect regular HTTP requests on port 8080; it redirects them (or at least causes them to be redirected) right away, depending on the source IP address.
It works for HTTP, but I would like to have the same behavior for HTTPS requests coming in over port 443. Assume that, if the redirection were not present, clients connecting to this simple server would be able to handshake with the redirect target via a self-signed certificate.
import SimpleHTTPServer
import SocketServer

LISTEN_PORT = 8080
source = "127.0.0.1"
target = "http://target/"

class simpleHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_POST(self):
        # client_address is a (host, port) tuple on the request handler
        clientAddressString = str(self.client_address)
        if source in clientAddressString:
            # redirect incoming request
            self.send_response(301)
            new_path = '%s%s' % (target, self.path)
            self.send_header('Location', new_path)
            self.end_headers()

handler = SocketServer.TCPServer(("", LISTEN_PORT), simpleHandler)
handler.serve_forever()
I can use a self-signed certificate and have access to the files "server.crt" and "server.key" that are normally used for this connection (without the redirecting Python server in the middle). I am not sure what happens when I put a redirection in between like this, although I assume it has to be part of the handshake chain.
How can I achieve this behavior?
Is there anything I should modify apart from the new target and the response code in the response headers?
I will split my answer into Networking and Python parts.
On the Networking side, you cannot redirect at the SSL layer; you need a full HTTPS server, and you redirect the GET/POST request once the SSL handshake is complete. The response code and the actual do_POST or do_GET implementation are exactly the same for HTTP and HTTPS.
As a side note, don't you get any issues with redirecting POSTs? When you answer a POST with a 301, the browser will not resend the POST data to your new target, so something is likely to break at the application level.
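If you do need clients to replay the POST body, a 307 redirect preserves the request method; a sketch of that variation (standalone, with the same placeholder target as the question):
import SimpleHTTPServer

target = "http://target/"  # placeholder target, as in the question

class redirect307Handler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_POST(self):
        # 307 tells the client to repeat the POST (method and body) at the
        # new location; a 301 is typically replayed as a GET without a body.
        self.send_response(307)
        self.send_header('Location', '%s%s' % (target, self.path))
        self.end_headers()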
On the Python side, you can augment an HTTP server into an HTTPS one by wrapping the socket:
import BaseHTTPServer, SimpleHTTPServer
import ssl

handler = BaseHTTPServer.HTTPServer(("", LISTEN_PORT), simpleHandler)
# certfile must be a single PEM file containing the certificate and key
handler.socket = ssl.wrap_socket(handler.socket, certfile='path/to/combined/cert_and_key.pem', server_side=True)
handler.serve_forever()
Hope this helps.
Using Python httplib or http.client, what code do I need in my HTTP client to:
use an HTTP HEAD request,
contact a web server by specifying only its IP address,
contact a web server without specifying any webpage (or homepage) in the request, and
extend the HTTP connection using Keep-Alive messages?
I used the following code example, but it has two problems:
It does not keep the HTTP connection alive (no Keep-Alive),
It gives me the error "500 Domain Not Found" if I use the IP address instead of the domain name.
import http.client

connection = http.client.HTTPConnection("www.python.org")
connection.request("HEAD", "")
response = connection.getresponse()
print(response.status, response.reason)
requests allows you to:
send requests with the HEAD method:
import requests
resp = requests.head("http://www.python.org")
use sessions for automatic Keep-Alive (connections are pooled and reused):
s = requests.Session()
resp = s.head("http://www.python.org")
resp2 = s.get("http://www.python.org/")
Regarding using the IP address instead of the domain: that has nothing to do with your request itself. Most sites use some kind of virtual hosting, which means the server picks the site to serve based on the Host header and won't answer for a bare IP address. If you ask by IP address you may get a 500 error or an error page.
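If you really must connect by IP, one workaround is to send the Host header yourself; a sketch (203.0.113.10 is a hypothetical documentation-range address, not python.org's real IP):
import requests

# The explicit Host header tells a virtual-hosting server which site you
# want, even though the TCP connection was made to a bare IP address.
resp = requests.head("http://203.0.113.10/", headers={"Host": "www.python.org"})
print(resp.status_code, resp.reason)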
I am currently using Python + Mechanize to retrieve pages from a local server. As you can see, the code uses "localhost" as a proxy; the proxy is an instance of the Fiddler2 debug proxy. This works exactly as expected and indicates that my machine can reach test_box.
import time
import mechanize

url = r'http://test_box.test_domain.com:8000/helloWorldTest.html'
browser = mechanize.Browser()
browser.set_proxies({"http": "127.0.0.1:8888"})
browser.add_password(url, "test", "test1234")

start_timer = time.time()
resp = browser.open(url)
resp.read()
latency = time.time() - start_timer
However, when I remove the browser.set_proxies statement, it stops working. I get the error "urlopen error [Errno 10061] No connection could be made because the target machine actively refused it". The point is that I can access test_box from my machine with any browser, which also indicates that test_box can be reached from my machine.
My suspicion is that this has something to do with Mechanize trying to guess the proper proxy settings: my browsers are configured to go through a web proxy for every domain except test_domain.com. So I suspect Mechanize tries to use that web proxy when it actually should not.
How can I tell Mechanize NOT to guess any proxy settings and instead force it to connect directly to test_box?
Argh, found it myself. The docstring says:
"To avoid all use of proxies, pass an empty proxies dict."
This fixed the issue.
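In code, that looks something like this (same test URL as above):
import mechanize

browser = mechanize.Browser()
browser.set_proxies({})  # empty dict: use no proxies at all, connect directly
resp = browser.open(r'http://test_box.test_domain.com:8000/helloWorldTest.html')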
I have a web server using gevent.pywsgi.WSGIServer (http://www.gevent.org/gevent.pywsgi.html) and I need to handle a non-HTTP request as well as normal HTTP requests.
Server:
web_server = gevent.pywsgi.WSGIServer(('', 8080), viewer_command_server)
web_server.serve_forever()
Handler:
def viewer_command_server(env, start_response):
    if env['REQUEST_METHOD'].upper() == "PUT":
        path = env["PATH_INFO"]
        start_response("200 OK", [("Content-Type", "text/html"), ("Cache-Control", "no-cache"), ("Connection", "keep-alive")])
        return [""]
This handles normal PUT requests, but I would also like to serve the crossdomain.xml file used by a Flash application. The problem is that I get the following when the Flash application tries to retrieve its crossdomain.xml file:
"socket fileno=13 sock=66.228.55.170:9090 peer=96.54.202.251:63380: Invalid HTTP method: '<policy-file-request/>\x00'
96.54.202.251 - - [2012-05-21 22:58:53] "<policy-file-request/>" 400 0 2.940527
"
Is there any way to handle this request as well?
Adobe recommends running a separate TCP server on port 843 to serve this file.
I would like to keep everything on port 8080.
The protocol spoken on port 843 is not HTTP. See http://www.adobe.com/devnet/flashplayer/articles/socket_policy_files.html.
A valid HTTP request looks like
GET /path HTTP/1.0
(See e.g. http://www.jmarshall.com/easy/http/#sample for more examples.)
If there's a way to tell the Flash Player client to look for the policy file on some port other than 843, then maybe there's a way to tell it to use HTTP instead of this custom XML-ish "<policy-file-request/>" message, and then and only then could you handle this from your HTTP server.
Anything is possible but I don't think it sounds like a good idea at all to handle non-HTTP requests as part of your WSGI server on the same port 8080 that it uses for HTTP.
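That said, if you follow Adobe's recommendation, a standalone policy server on port 843 is only a few lines with gevent; a rough sketch (the policy XML is an assumption, adjust the allowed domains and ports to your needs; binding a port below 1024 needs root):
import gevent.server

# Hypothetical policy: allow any domain to open sockets to port 8080.
POLICY = ('<?xml version="1.0"?>'
          '<cross-domain-policy>'
          '<allow-access-from domain="*" to-ports="8080"/>'
          '</cross-domain-policy>\x00')  # Flash expects a NUL terminator

def handle(sock, address):
    request = sock.recv(1024)
    if request.startswith('<policy-file-request/>'):
        sock.sendall(POLICY)
    sock.close()

gevent.server.StreamServer(('', 843), handle).serve_forever()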
I managed to peel this one back a bit further today. Buried in the Adobe documentation is a note that, if you are using a raw socket, Flash will go looking for your cross-domain file using their raw XML query. It does appear to work if you specify 'http': it then goes and gets the cross-domain file via HTTP. The problem for me was that I was using a raw TCP socket in my Flash script, so it went off and tried to get the cross-domain file from that server.
So, to keep things simple, I will change the network calls to use HTTP. That is what they are doing anyway (I was using a sample I found that does streaming via an HTTP multipart response).