I am trying to use Tor with Python and urllib2 and am stuck. Both this call
print opener.open('http://check.torproject.org/').read()
and
telnet 127.0.0.1 9051
give me the following error:
514 Authentication Required.
Here is the code I want to use:
import urllib2
# using Tor!
proxy_support = urllib2.ProxyHandler({"http": "127.0.0.1:9051"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
# every urlopen connection will then use the Tor proxy, like this one:
urllib2.urlopen('http://www.google.com').read()
But I receive the same 514 Authentication Required error on the urllib2.urlopen call. Any suggestions on why this is occurring?
In Vidalia (Settings -> Advanced), Authentication is set to 'Randomly Generate'.
I am using Python 2.6.5, urllib2 and Tor.
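A Google search suggests (and the Tor manual confirms) that 9051 is Tor's default control port. The actual SOCKS proxy runs on port 9050 by default, and that is the one you need to use. The control port expects authenticated control-protocol commands, which is why it answers with 514 Authentication Required. However, Vidalia does not use the default ports without extra configuration.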
The other problem is that urllib2 is by default unable to work with SOCKS proxies. For possible solutions see these two questions.
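Because Vidalia may not use the default ports, it can help to check which local port actually speaks SOCKS5 before pointing any code at it. The following is a minimal Python 3 sketch using only the standard library. It sends a SOCKS5 greeting offering "no authentication" and reports whether the listener answers the way a SOCKS5 proxy should. The probed ports are assumptions: 9050 and 9051 are Tor's defaults, and 9150 is the SOCKS port Tor Browser bundles commonly use. The control port should fail the check, because it speaks Tor's text-based control protocol instead.

```python
import socket

# SOCKS5 greeting: version 5, one authentication method offered, method 0x00 (no auth)
SOCKS5_NO_AUTH_GREETING = b"\x05\x01\x00"


def is_socks5_no_auth_reply(reply):
    """True if a method-selection reply means: SOCKS5, no authentication needed."""
    return reply[:2] == b"\x05\x00"


def looks_like_socks5(host, port, timeout=3.0):
    """Send a SOCKS5 greeting to host:port and report whether it answers like a SOCKS5 proxy."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(SOCKS5_NO_AUTH_GREETING)
            reply = sock.recv(2)
    except OSError:  # refused, timed out, reset, ...
        return False
    return is_socks5_no_auth_reply(reply)


if __name__ == "__main__":
    for port in (9050, 9051, 9150):  # assumed candidates: SOCKS, control, Tor Browser SOCKS
        print(port, looks_like_socks5("127.0.0.1", port))
```

A True result only means the listener speaks SOCKS5. Whether traffic through it actually leaves via the Tor network still has to be checked separately.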
I'm attending an online Python course for beginners. One unit teaches students to extract all links from the source code of a webpage. The code is as follows, with Block_of_Code left unknown:
def get_page(url):
    <Block_of_Code>
def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote
def print_all_links(page):
    while True:
        url, endpos = get_next_target(page)
        if url:
            print(url)
            page = page[endpos:]
        else:
            break
print_all_links(get_page('https://youtube.com'))
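The link-extraction part of the course code can be exercised without any network access, blocked or not, by feeding it a page string directly. The sketch below (Python 3) reuses the course's get_next_target unchanged and adds a hypothetical get_all_links variant that collects the URLs in a list instead of printing them, which makes the result easy to inspect. The sample HTML is made up for illustration.

```python
def get_next_target(page):
    """Return (url, end_position) for the first '<a href=' in page, or (None, 0)."""
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote


def get_all_links(page):
    """Same loop as print_all_links, but collects the URLs in a list (hypothetical helper)."""
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links


# Made-up sample page, so the parsing can be checked offline.
sample = '<p><a href="https://example.com/">Example</a> and <a href="/about">About</a></p>'
print(get_all_links(sample))  # ['https://example.com/', '/about']
```

Like the original loop, this stops at the first empty href="", because an empty string is falsy.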
If I were not in China, Block_of_Code would not be a problem for me. As far as I know, it could simply be:
import urllib.request
def get_page(url):
    return urllib.request.urlopen(url).read().decode('utf-8')
But here in China, certain websites (YouTube included) are blocked, so the code above doesn't work for them.
My goal for Block_of_Code is to get the source code of any website, whether blocked or not.
I searched Google and found some code samples that use a SOCKS proxy, but none of them worked. For example, I wrote and tried the following code based on this article (after running pip install PySocks):
import socket
import socks  # provided by PySocks
import urllib.request
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 2012)
socket.socket = socks.socksocket  # sockets created from now on go through the SOCKS proxy
def get_page(url):
    return urllib.request.urlopen(url).read().decode('utf-8')
The error message is:
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
I was looking for SOCKS proxy code because I have always used a SOCKS proxy service to visit blocked websites. By launching an app provided by my service provider, I can visit those websites in a web browser such as Firefox. (My SOCKS proxy port is 2012.)
Nevertheless, any kind of solution is welcome, whether it uses a SOCKS proxy or not, as long as it lets me get the source of any page.
I'm using Python 3.6.3 on Windows 10.
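One option worth trying is requests with a socks5h:// proxy URL instead of monkey-patching socket.socket. This assumes the provider's app really exposes a SOCKS5 listener on port 2012. Some clients offer SOCKS4 or an HTTP proxy instead, which would need a different scheme. With socks5h the proxy also resolves the hostname, so local DNS answers for blocked sites are never used. This is a sketch, not a confirmed fix for the ConnectionResetError above. It needs pip install requests[socks], and the helper name socks5h_proxies is made up here.

```python
def socks5h_proxies(port, host="127.0.0.1"):
    """requests-style proxies dict: HTTP and HTTPS via SOCKS5, hostnames resolved by the proxy."""
    proxy_url = "socks5h://{}:{}".format(host, port)
    return {"http": proxy_url, "https": proxy_url}


def get_page(url, port=2012):
    """Fetch url through the local SOCKS5 proxy and return the decoded page text."""
    # Imported here so socks5h_proxies() works even without requests installed;
    # socks5h:// support additionally needs `pip install requests[socks]` (PySocks).
    import requests

    response = requests.get(url, proxies=socks5h_proxies(port), timeout=30)
    response.raise_for_status()
    return response.text
```

With this definition, print_all_links(get_page('https://youtube.com')) from the course code should send its request through the proxy, provided the port assumption above holds.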
Here is the code I have so far:
import socks
import socket
import requests
import json
socks.setdefaultproxy(proxy_type=socks.PROXY_TYPE_SOCKS5, addr="127.0.0.1", port=9050)
socket.socket = socks.socksocket
data = json.loads(requests.get("http://freegeoip.net/json/").text)
It works fine. The problem is that when I use a .onion URL, it shows this error:
Failed to establish a new connection: [Errno -2] Name or service not known
After researching a little, I found that although the HTTP request is made over Tor, the DNS resolution still occurs over the clearnet. What is the proper way to also have the domain resolved over the Tor network, so that I can connect to .onion URLs?
Try to avoid the monkey patching if possible. If you're using a modern version of requests with SOCKS support installed (pip install requests[socks]), you should already have this functionality.
import requests
import json
proxies = {
'http': 'socks5h://127.0.0.1:9050',
'https': 'socks5h://127.0.0.1:9050'
}
data = requests.get("http://altaddresswcxlld.onion", proxies=proxies).text
print(data)
It's important to specify the proxies using the socks5h:// scheme so that DNS resolution is handled over SOCKS, which lets Tor resolve the .onion address properly.
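Mixing up the two schemes is an easy mistake. With socks5:// the .onion name is handed to the local resolver, which cannot resolve it. On Linux this typically surfaces as the "Name or service not known" error from the question. A small guard can catch this before any request is sent. The check_onion_proxies function below is hypothetical, written in Python 3 with the standard library only. It deliberately accepts only socks5h:// for .onion hosts. Other remote-resolving schemes such as socks4a:// are not handled in this sketch.

```python
from urllib.parse import urlsplit


def check_onion_proxies(url, proxies):
    """Raise ValueError if an .onion URL would go through a proxy URL that is not socks5h://.

    With socks5:// the client resolves the hostname itself, and an ordinary DNS
    resolver cannot resolve .onion names; socks5h:// lets Tor do the lookup.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".")
    if not host.endswith(".onion"):
        return  # ordinary hostnames can be resolved locally if needed
    proxy = proxies.get(parts.scheme, "")
    if not proxy.startswith("socks5h://"):
        raise ValueError(
            "{} must be fetched through a socks5h:// proxy, but the {!r} proxy is {!r}".format(
                host, parts.scheme, proxy
            )
        )


if __name__ == "__main__":
    proxies = {
        "http": "socks5h://127.0.0.1:9050",
        "https": "socks5h://127.0.0.1:9050",
    }
    check_onion_proxies("http://altaddresswcxlld.onion", proxies)  # passes silently
```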
There is a simpler solution, but it requires Kali Linux. On that OS you can install the tor service and kalitorify, start the tor service with
sudo service tor start
and start kalitorify with
sudo kalitorify -t
Your traffic will then be sent through Tor, and you can access .onion sites just as if they were normal sites.
I am using the stem example for connecting to the Tor network. It should connect a client to the Tor network, and it seems to be doing so, but when I check my IP address it is wrong: it is not a Tor IP. Any ideas why this happens and, more importantly, how I can fix it? :)
import StringIO
import socket
import urllib
import socks  # SocksiPy module
import stem.process
from stem.util import term
SOCKS_PORT = 7000
# Set socks proxy and wrap the urllib module
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', SOCKS_PORT)
socket.socket = socks.socksocket
# Perform DNS resolution through the socket
def getaddrinfo(*args):
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]
socket.getaddrinfo = getaddrinfo
def query(url):
    """
    Uses urllib to fetch a site using SocksiPy for Tor over the SOCKS_PORT.
    """
    try:
        return urllib.urlopen(url).read()
    except:
        return "Unable to reach %s" % url
# Start an instance of Tor configured to only exit through Russia. This prints
# Tor's bootstrap information as it starts. Note that this likely will not
# work if you have another Tor instance running.
def print_bootstrap_lines(line):
    if "Bootstrapped " in line:
        print term.format(line, term.Color.BLUE)
print term.format("Starting Tor:\n", term.Attr.BOLD)
tor_process = stem.process.launch_tor_with_config(
    config = {
        'SocksPort': str(SOCKS_PORT),
        'ExitNodes': '{ru}',
    },
    init_msg_handler = print_bootstrap_lines,
)
I get this output:
richard@Tornado:~/Documents/Masters Project$ python russiaExample.py
Starting Tor:
May 26 21:56:49.000 [notice] Bootstrapped 80%: Connecting to the Tor network.
May 26 21:56:50.000 [notice] Bootstrapped 85%: Finishing handshake with first hop.
May 26 21:56:50.000 [notice] Bootstrapped 90%: Establishing a Tor circuit.
May 26 21:56:50.000 [notice] Bootstrapped 100%: Done.
However, when I visit https://check.torproject.org/ to check that I am using Tor, it says I am not, and my normal IP is shown.
What is causing this? The output above suggests that a circuit to Tor was established fine, but it seems it is not being used.
Am I on the right lines here?
Thanks guys
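You have to set up your browser to use Tor as a proxy. The script only starts Tor; it does not route your browser through it.
If you are using Firefox:
Go to Edit > Preferences > Advanced and open the settings for "Configure how Firefox connects to the Internet".
Choose manual proxy configuration, enter 127.0.0.1 as the SOCKS Host and 7000 (the SOCKS_PORT from the script) as the Port.
Go to whatismyip.com and you will see a new IP.
Or check https://check.torproject.org/ to confirm that you are using Tor successfully.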
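Note also that the script in the question starts Tor but never calls query(), so nothing in the script itself actually goes through the new circuit. To check from Python rather than from the browser, a sketch like the one below asks the check service's JSON endpoint through the same SOCKS port. It rests on three assumptions: that https://check.torproject.org/api/ip returns an object with IsTor and IP fields, that requests[socks] is installed, and that the Tor process launched by the script is still running on port 7000. The function names are made up here.

```python
import json

# Assumed endpoint of the Tor check service; it is expected to return JSON such as
# {"IsTor": true, "IP": "..."}.
CHECK_API = "https://check.torproject.org/api/ip"


def parse_tor_check(body):
    """Return (is_tor, ip) from the check service's JSON response body."""
    data = json.loads(body)
    return bool(data.get("IsTor")), data.get("IP")


def check_through_tor(socks_port=7000):
    """Ask the check service whether a request through the local SOCKS port exits via Tor."""
    import requests  # needs `pip install requests[socks]` for socks5h:// support

    proxy = "socks5h://127.0.0.1:{}".format(socks_port)
    response = requests.get(
        CHECK_API, proxies={"http": proxy, "https": proxy}, timeout=60
    )
    response.raise_for_status()
    return parse_tor_check(response.text)
```

In the question's script this could be called right after launch_tor_with_config(...) returns, for example print(check_through_tor(SOCKS_PORT)), followed by tor_process.kill() when done.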
I am currently using Python + Mechanize to retrieve pages from a local server. As you can see, the code uses a proxy on the local machine (127.0.0.1:8888), which is an instance of the Fiddler2 debugging proxy. This works exactly as expected, which shows that my machine can reach test_box.
import time
import mechanize
url = r'http://test_box.test_domain.com:8000/helloWorldTest.html'
browser = mechanize.Browser()
browser.set_proxies({"http": "127.0.0.1:8888"})
browser.add_password(url, "test", "test1234")
start_timer = time.time()
resp = browser.open(url)
resp.read()
latency = time.time() - start_timer
However, when I remove the browser.set_proxies statement, it stops working. I get the error "urlopen error [Errno 10061] No connection could be made because the target machine actively refused it". The point is that I can access test_box from my machine with any browser, which also shows that test_box is reachable from my machine.
My suspicion is that this has something to do with Mechanize trying to guess the proper proxy settings. My browsers are configured to go through a web proxy for every domain except test_domain.com, so I suspect that mechanize uses that web proxy when it should actually connect without one.
How can I tell mechanize to NOT guess any proxy settings and instead force it to try to connect directly to the test_box?
Argh, I figured it out myself. The set_proxies docstring says:
"To avoid all use of proxies, pass an empty proxies dict."
Calling browser.set_proxies({}) fixed the issue.
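Python's standard library follows the same rule. Passing an empty dict to urllib.request.ProxyHandler disables proxies entirely, including ones picked up from environment variables such as http_proxy. Below is a minimal Python 3 sketch of an opener that always connects directly. The function name build_direct_opener is made up here.

```python
import urllib.request


def build_direct_opener():
    """Return an opener that never uses a proxy.

    ProxyHandler({}) registers no proxy schemes, so the opener ends up with no
    proxy handling at all. Because a ProxyHandler is passed explicitly,
    build_opener() also skips its default one, which would read proxies from
    environment variables (and, on Windows and macOS, from the system settings).
    """
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


# Usage (requires the test server to be reachable):
# build_direct_opener().open("http://test_box.test_domain.com:8000/helloWorldTest.html")
```

This mirrors browser.set_proxies({}) in mechanize. The HTTP authentication from the question would still have to be added separately, for example with an HTTPBasicAuthHandler.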
I'm trying to access a website with Python through Tor, but I'm having problems. I started my attempts with this thread and the one referenced in it: How to make urllib2 requests through Tor in Python?
First I tried the original code snippet:
import urllib2
proxy_handler = urllib2.ProxyHandler({"tcp":"http://127.0.0.1:9050"})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
Then I tried the modified code posted in one of the answers, which people said worked for them. It does download the page, but my IP address is still the same:
proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
opener = urllib2.build_opener(proxy_support)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
print opener.open('http://www.google.com').read()
I have Tor set up in the standard configuration, following the documentation on the Ubuntu and Tor sites, and nmap shows the Tor proxy running on port 9050:
9050/tcp open tor-socks
However, my IP address doesn't change when I run either of the above scripts. Is Python not respecting the http environment variables, or is there a code problem that I'm missing?
Tor provides a SOCKS proxy. Since urllib2 can only handle HTTP proxies, you'll have to use a SOCKS implementation.
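In practice that SOCKS implementation is usually a library such as PySocks (the socks module used in the other snippets above). To show what such a library does under the hood, here is a minimal Python 3 sketch using only the standard library. It performs the SOCKS5 handshake with a proxy on 127.0.0.1:9050, which is assumed to be Tor's SOCKS port as in the nmap output. It then hands the tunnelled socket to http.client. It sends the hostname to the proxy (address type 0x03), so Tor does the DNS lookup. It supports plain HTTP only; HTTPS would additionally need the socket wrapped with ssl. Separately, 8118 in the second snippet is Privoxy's default port. That snippet only reaches Tor if Privoxy (or a similar HTTP proxy) is running there and forwarding to Tor.

```python
import http.client
import socket
import struct


def build_connect_request(host, port):
    """SOCKS5 CONNECT request with address type 0x03 (domain name), so the proxy resolves it."""
    name = host.encode("idna")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname length must be 1..255 bytes for SOCKS5")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3, then length-prefixed name and big-endian port
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack(">H", port)


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes the connection early."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("SOCKS proxy closed the connection")
        data += chunk
    return data


def socks5_connect(proxy_host, proxy_port, host, port, timeout=None):
    """Open a TCP connection to host:port tunnelled through a SOCKS5 proxy (no auth)."""
    sock = socket.create_connection((proxy_host, proxy_port), timeout=timeout)
    try:
        sock.sendall(b"\x05\x01\x00")  # greeting: version 5, one method, "no auth"
        if recv_exact(sock, 2) != b"\x05\x00":
            raise ConnectionError("proxy does not accept SOCKS5 without authentication")
        sock.sendall(build_connect_request(host, port))
        ver, rep, _rsv, atyp = recv_exact(sock, 4)
        if ver != 5 or rep != 0:
            raise ConnectionError("SOCKS5 CONNECT failed, reply code %d" % rep)
        # Discard the bound address and port that the proxy reports back.
        if atyp == 0x01:
            recv_exact(sock, 4)
        elif atyp == 0x04:
            recv_exact(sock, 16)
        elif atyp == 0x03:
            recv_exact(sock, recv_exact(sock, 1)[0])
        else:
            raise ConnectionError("unknown address type %d" % atyp)
        recv_exact(sock, 2)
        return sock
    except BaseException:
        sock.close()
        raise


class SocksHTTPConnection(http.client.HTTPConnection):
    """Plain-HTTP connection whose socket is tunnelled through a SOCKS5 proxy."""

    def __init__(self, host, port=None, proxy=("127.0.0.1", 9050), **kwargs):
        super().__init__(host, port, **kwargs)
        self.proxy = proxy  # assumed Tor SOCKS address

    def connect(self):
        self.sock = socks5_connect(
            self.proxy[0], self.proxy[1], self.host, self.port, timeout=self.timeout
        )


# Usage with Tor running (not executed here):
#   conn = SocksHTTPConnection("check.torproject.org", 80)
#   conn.request("GET", "/")
#   print(conn.getresponse().status)
```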