Python Contacting Web Failing to Respond

I've been working through a LinkedIn Learning course trying to learn some Python, but I've run into a problem that's stopped my progress. I'm trying to work with JSONs and pull data from a website, but I keep getting an error saying that "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host failed to respond".
I'm using VSCode and have tried on both my work's network (which is heavily restricted, though not to this webpage for browsing) and on my home network. Is there some sort of network permission that would be stopping access? I experienced the same issue when trying to complete an API training course that used the OpenNotify API.
This is the code I'm trying to use.
import urllib.request

def main():
    webUrl = urllib.request.urlopen("https://www.google.com")
    print("result code: " + str(webUrl.getcode()))

if __name__ == "__main__":
    main()

Since ping works but telnet to port 80 does not, direct outbound HTTP connections are being blocked. Your browser's HTTP traffic presumably goes through a proxy (browsing works, otherwise how would you be reading Stack Overflow?), so you need to add some code to your Python program that handles that proxy.
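With plain urllib, the proxy can be configured explicitly through a ProxyHandler. A minimal sketch, assuming a hypothetical proxy at proxy.example.com:8080 (substitute your network's real proxy address):

```python
import urllib.request

# Hypothetical proxy address -- replace with your network's real proxy
proxy = urllib.request.ProxyHandler({
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
})
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)  # all later urlopen() calls use the proxy

# Now urllib.request.urlopen("https://www.google.com") would be routed
# through the proxy instead of attempting a direct connection.
```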
But why not try the requests library? It is straightforward and easy to use. Here's an example:
>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
'{"type":"User"...'
>>> r.json()
{'private_gists': 419, 'total_private_repos': 77, ...}
You can get started by running pip install requests; the official documentation covers the rest.

Related

How to access a website using SOCKS proxy in Python

I'm attending an online Python course for beginners. The content of a unit is to teach students to extract all links in the source code of a webpage. The code is as follows, with Block_of_Code unknown:
def get_page(url):
    <Block_of_Code>

def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote

def print_all_links(page):
    while True:
        url, endpos = get_next_target(page)
        if url:
            print(url)
            page = page[endpos:]
        else:
            break

print_all_links(get_page('https://youtube.com'))
If I were not in China, Block_of_Code would not be a problem for me. As far as I know, it could be:
import urllib.request
return urllib.request.urlopen(url).read().decode('utf-8')
But here in China, certain websites (youtube included) are blocked. So the above code doesn't apply to them.
My goal for Block_of_Code is to get the source code of any website, whether blocked or not.
I have searched on Google and found some code using a SOCKS proxy, but none of it worked. For example, I wrote and tried the following code based on this article (having executed pip install PySocks).
import socket
import socks
import urllib.request
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 2012)
socket.socket = socks.socksocket
return urllib.request.urlopen(url).read().decode('utf-8')
The error message is:
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
The reason I searched for code using a SOCKS proxy is that I have always used a SOCKS proxy service to visit blocked websites. By launching an app provided by my service provider, I can visit those sites in a web browser like Firefox. (My SOCKS proxy port is 2012.)
Nevertheless, any kind of solution is welcome, SOCKS proxy or not, as long as it lets me get the source of any page.
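One commonly suggested alternative to monkey-patching the socket module is to hand the SOCKS address directly to the requests library (a sketch, not verified against the asker's proxy; it requires pip install requests[socks]). The socks5h scheme makes DNS resolution also happen through the proxy, which matters for blocked domains:

```python
import requests

# Local SOCKS5 proxy from the question; socks5h:// resolves hostnames
# through the proxy, so DNS lookups for blocked sites aren't done locally.
PROXIES = {
    "http":  "socks5h://127.0.0.1:2012",
    "https": "socks5h://127.0.0.1:2012",
}

def get_page(url):
    # Every request made through this function is tunneled via the proxy
    return requests.get(url, proxies=PROXIES, timeout=15).text
```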
I'm using Python 3.6.3 on Windows 10.

Python Script won't connect using Proxy Settings, Pycharm appears to be connected

I'm trying to run a few simple Python scripts to retrieve information from the internet; unfortunately I'm behind a corporate proxy, which makes this somewhat tricky. So far I have installed CNTLM and configured it to work with PyCharm 1.4. With my manual proxy settings, PyCharm's 'Check Connection' to www.google.com returns 'Connection successful'.
However, when I run even simple scripts from PyCharm, they all seem to time out. Any advice? By way of a code example, the following returns a 503 response. Thanks!
import requests

URL = "http://google.com"
try:
    response = requests.get(URL)
    print response
except Exception as e:
    print "Something went wrong:"
    print e
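One likely cause: PyCharm's 'Check Connection' uses the IDE's own proxy settings, which a script never sees, so requests has to be told about the CNTLM listener itself. A sketch, assuming CNTLM's default port 3128 (check the Listen setting in your cntlm.ini):

```python
import requests

# CNTLM's default listen address -- adjust to match your cntlm.ini
CNTLM = "http://127.0.0.1:3128"

session = requests.Session()
session.proxies.update({"http": CNTLM, "https": CNTLM})

# session.get("http://google.com", timeout=10) would now be routed
# through the local CNTLM proxy instead of attempting a direct connection.
```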

Using httplib to connect to a website in Python

tl;dr: I used httplib to create a connection to a site. I failed, and I'd love some guidance!
I've run into some trouble. I've read about Python's socket and httplib modules, although I seem to have some problems with the syntax.
Here it is:
connection = httplib.HTTPConnection('www.site.org', 80, timeout=10, 1.2.3.4)
The syntax is this:
httplib.HTTPConnection(host[, port[, strict[, timeout[, source_address]]]])
How does "source_address" behave? Can I make requests with any IP from it?
Wouldn't I need an User-Agent for it?
Also, how do I check if the connect is successful?
if connection:
    print "Connection Successful."
(As far as I know, HTTP doesn't need a "are you alive" ping every one second, as long as both client & server are okay, when a request is made, it'll be processed. So I can't constantly ping.)
Creating the object does not actually connect to the website:
HTTPConnection.connect():
Connect to the server specified when the object was created.
source_address is the local (host, port) pair that the client socket binds to before connecting; it is not sent to the server as part of the request. You cannot use it to make requests appear to come from an arbitrary IP, because the bind only succeeds for addresses actually assigned to your machine. It has nothing to do with the User-Agent (that is an HTTP header), and it is an optional parameter.
There is also no attribute that tells you whether the connection succeeded; connect() simply raises an exception when it fails.
Assuming what you want to do is get the contents of the website root, you can use this:
from httplib import HTTPConnection
conn = HTTPConnection("www.site.org", 80, timeout=10)
conn.connect()
conn.request("GET", "/")  # the request target is a path, not a full URL
resp = conn.getresponse()
data = resp.read()
print(data)
(slammed together from the HTTPConnection documentation)
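On Python 3 (where httplib became http.client), the behavior of source_address can be demonstrated end-to-end against a throwaway local server. This is a self-contained sketch that binds the client end to 127.0.0.1 (an address the machine actually owns, since arbitrary IPs cannot be bound):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Echo back the client's address exactly as the server saw it
        payload = self.client_address[0].encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# source_address binds the *client* end of the socket; port 0 = any free port
conn = http.client.HTTPConnection("127.0.0.1", server.server_port,
                                  timeout=5, source_address=("127.0.0.1", 0))
conn.request("GET", "/")
body = conn.getresponse().read()
print(body)  # b'127.0.0.1' -- the address the client socket was bound to
conn.close()
server.shutdown()
```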
Honestly though, you should not be using httplib, but instead urllib2 or another HTTP library that is less... low-level.

How to NOT use a proxy with Python Mechanize

I am currently using Python + Mechanize for retrieving pages from a local server. As you can see the code uses "localhost" as a proxy. The proxy is an instance of the Fiddler2 debug proxy. This works exactly as expected. This indicates that my machine can reach the test_box.
import time
import mechanize
url = r'http://test_box.test_domain.com:8000/helloWorldTest.html'
browser = mechanize.Browser()
browser.set_proxies({"http": "127.0.0.1:8888"})
browser.add_password(url, "test", "test1234")
start_timer = time.time()
resp = browser.open(url)
resp.read()
latency = time.time() - start_timer
However, when I remove the browser.set_proxies statement it stops working. I get the error "urlopen error [Errno 10061] No connection could be made because the target machine actively refused it". The point is that I can access the test_box from my machine with any browser, which also indicates that test_box can be reached from my machine.
My suspicion is that this has something to do with Mechanize trying to guess the proper proxy settings. That is: my Browsers are configured to go to a web proxy for any domain but test_domain.com. So I suspect that mechanize tries to use the web proxy while it should actually not use the proxy.
How can I tell mechanize to NOT guess any proxy settings and instead force it to try to connect directly to the test_box?
Argh, found it out myself. The docstring says:
"To avoid all use of proxies, pass an empty proxies dict."
This fixed the issue.
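So the fix is one line: browser.set_proxies({}). Plain urllib follows the same rule, which can be shown without mechanize installed (a sketch; an empty mapping turns off proxy detection entirely and forces direct connections):

```python
import urllib.request

# mechanize equivalent (per its docstring): browser.set_proxies({})
# ProxyHandler({}) disables all proxy detection, including any proxy
# settings picked up from environment variables like http_proxy.
no_proxy_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
```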

Logging into a website which uses Microsoft Forefront "Threat Management Gateway"

I want to use python to log into a website which uses Microsoft Forefront, and retrieve the content of an internal webpage for processing.
I am not new to python but I have not used any URL libraries.
I checked the following posts:
How can I log into a website using python?
How can I login to a website with Python?
How to use Python to login to a webpage and retrieve cookies for later usage?
Logging in to websites with python
I have also tried a couple of modules such as requests, but I am still unable to understand how this should be done. Is it enough to send the username and password? Or should I somehow use cookies to authenticate? Any sample code would be really appreciated.
This is the code I have so far:
import requests

NAME = 'XXX'
PASSWORD = 'XXX'
URL = 'https://intra.xxx.se/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=3'

def main():
    # Start a session so we can have persistent cookies
    session = requests.session()

    # This is the form data that the page sends when logging in
    login_data = {
        'username': NAME,
        'password': PASSWORD,
        'SubmitCreds': 'login',
    }

    # Authenticate
    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    r = session.get('https://intra.xxx.se/?t=1-2')
    print r

main()
but the above code results in the following exception on the session.post line:
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='intra.xxx.se', port=443): Max retries exceeded with url: /CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=3 (Caused by <class 'socket.error'>: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond)
UPDATE:
I noticed that I was providing the wrong username/password. Once that was corrected, I get an HTTP 200 response with the above code, but when I try to access any internal site I get an HTTP 401 response. Why is this happening? What is wrong with the above code? Should I be using the cookies somehow?
TMG can be notoriously fussy about what types of connections it blocks. The next step is to find out why TMG is blocking your connection attempts.
If you have access to the TMG server, log in to it, start the TMG management user-interface (I can't remember what it is called) and have a look at the logs for failed requests coming from your IP address. Hopefully it should tell you why the connection was denied.
It seems you are attempting to connect to it over an intranet. One way I've seen it block connections is if it receives them from an address it considers to be on its 'internal' network. (TMG has two network interfaces as it is intended to be used between two networks: an internal network, whose resources it protects from threats, and an external network, where threats may come from.) If it receives on its external network interface a request that appears to have come from the internal network, it assumes the IP address has been spoofed and blocks the connection. However, I can't be sure that this is the case as I don't know what this TMG server's internal network is set up as nor whether your machine's IP address is on this internal network.
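If the TMG logs aren't accessible, a quick client-side check is whether the login POST actually produced an auth cookie; an HTTP 200 with an empty cookie jar usually means the login form was re-served rather than accepted. A generic requests sketch (the cookie name varies by deployment, so this just lists whatever the gateway set):

```python
import requests

session = requests.Session()
# In the question's code this would follow:
#     session.post(URL, data=login_data)
# A fresh session starts with an empty jar; after a successful TMG login
# it should contain the gateway's authentication cookie.
for cookie in session.cookies:
    print(cookie.name, cookie.domain, cookie.path)
print(len(session.cookies), "cookie(s) in the jar")
```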
