Python urllib over TOR? [duplicate] - python

This question already has answers here:
How to route urllib requests through the TOR network? [duplicate]
(3 answers)
Closed 7 years ago.
Sample code:
#!/usr/bin/python
import socks
import socket
import urllib2
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS4, "127.0.0.1", 9050, True)
socket.socket = socks.socksocket
print urllib2.urlopen("http://almien.co.uk/m/tools/net/ip/").read()
TOR is running a SOCKS proxy on port 9050 (its default). The request goes through TOR, surfacing at an IP address other than my own. However, TOR console gives the warning:
"Feb 28 22:44:26.233 [warn] Your application (using socks4 to port 80) is giving Tor only an IP address. Applications that do DNS resolves themselves may leak information. Consider using Socks4A (e.g. via privoxy or socat) instead. For more information, please see https://wiki.torproject.org/TheOnionRouter/TorFAQ#SOCKSAndDNS."
i.e. DNS lookups aren't going through the proxy. But that's what the 4th parameter to setdefaultproxy is supposed to do, right?
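The warning's suggestion, SOCKS4a, differs from plain SOCKS4 only in how the destination is encoded. A minimal sketch of the two CONNECT requests follows; the byte layout comes from the SOCKS4/SOCKS4a protocol descriptions, and the function names are illustrative, not part of SocksiPy. It shows why a "remote DNS" flag cannot help once the application has already resolved the name: SOCKS4a can carry a hostname, but only if the library is handed one.

```python
import socket
import struct


def socks4_connect_request(ip, port, user_id=b""):
    """Plain SOCKS4 CONNECT: only an already-resolved IPv4 address fits."""
    header = struct.pack(">BBH", 4, 1, port)  # version 4, command 1 = CONNECT
    return header + socket.inet_aton(ip) + user_id + b"\x00"


def socks4a_connect_request(host, port, user_id=b""):
    """SOCKS4a CONNECT: the proxy resolves `host` on the client's behalf.

    The invalid address 0.0.0.x (x != 0) tells the proxy that a
    NUL-terminated hostname follows the NUL-terminated user ID.
    """
    header = struct.pack(">BBH", 4, 1, port)
    fake_ip = bytes([0, 0, 0, 1])  # 0.0.0.1 marks a SOCKS4a request
    return header + fake_ip + user_id + b"\x00" + host.encode("ascii") + b"\x00"
```

With plain SOCKS4 the hostname never appears in the request, so the lookup must already have happened locally, which is exactly what Tor is warning about.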
From http://socksipy.sourceforge.net/readme.txt:
setproxy(proxytype, addr[, port[, rdns[, username[, password]]]])
rdns - This is a boolean flag that modifies the behavior regarding DNS resolving. If it is set to True, DNS resolving will be performed remotely, on the server.
Same effect with both PROXY_TYPE_SOCKS4 and PROXY_TYPE_SOCKS5 selected.
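For SOCKS5 the equivalent of remote DNS is a CONNECT request whose address type is "domain name" (ATYP 0x03 in RFC 1928), sent after the method-negotiation greeting. The sketch below builds only that request; the function name is hypothetical. It again shows that remote resolution depends on the hostname reaching the SOCKS layer unresolved.

```python
import struct


def socks5_connect_domain(host, port):
    """Build a SOCKS5 CONNECT request (RFC 1928) that carries a domain name.

    ATYP 3 means the proxy, not the client, resolves the name.
    """
    name = host.encode("ascii")
    if len(name) > 255:
        raise ValueError("SOCKS5 domain names are limited to 255 bytes")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name)
    return (struct.pack(">BBBB", 5, 1, 0, 3)
            + bytes([len(name)]) + name   # length-prefixed hostname
            + struct.pack(">H", port))    # port in network byte order
```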
It can't be a local DNS cache (if urllib2 even supports that) because it happens when I change the URL to a domain that this computer has never visited before.

The problem is that httplib.HTTPConnection uses the socket module's create_connection helper function which does the DNS request via the usual getaddrinfo method before connecting the socket.
The solution is to make your own create_connection function and monkey-patch it into the socket module before importing urllib2, just like we do with the socket class.
import socks
import socket
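def create_connection(address, timeout=None, source_address=None):
    # Pass the (hostname, port) pair straight to the SOCKS socket so the
    # proxy resolves the name; timeout and source_address are ignored here.
    sock = socks.socksocket()
    sock.connect(address)
    return sock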
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)
# patch the socket module
socket.socket = socks.socksocket
socket.create_connection = create_connection
import urllib2
# Now you can go ahead and scrape those shady darknet .onion sites
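To see where the hostname is handed to the socket layer, and therefore why replacing create_connection keeps it away from the local resolver, here is a small offline sketch. It uses Python 3's http.client (the successor of httplib), which also calls socket.create_connection; the recording function is a hypothetical stand-in that never opens a connection.

```python
import http.client
import socket

seen = []


def recording_create_connection(address, timeout=None, source_address=None):
    """Stand-in for socket.create_connection that records what it is given."""
    seen.append(address)
    raise ConnectionRefusedError("demo only: no real connection is made")


original = socket.create_connection
socket.create_connection = recording_create_connection
try:
    # Python 3's HTTPConnection copies socket.create_connection when the
    # connection object is created, so the patch must come first.
    conn = http.client.HTTPConnection("example.onion", 80, timeout=5)
    try:
        conn.request("GET", "/")
    except ConnectionRefusedError:
        pass
finally:
    socket.create_connection = original

print(seen)  # [('example.onion', 80)]
```

The address arrives as an unresolved (hostname, port) pair; the stock create_connection would pass it to getaddrinfo, while the patched version passes it to the SOCKS socket instead.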

The problem is that you are importing urllib2 before you set up the socks connection.
Try this instead:
import socks
import socket
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS4, '127.0.0.1', 9050, True)
socket.socket = socks.socksocket
import urllib2
print urllib2.urlopen("http://almien.co.uk/m/tools/net/ip/").read()
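Whether import order matters depends on how the importing code refers to the socket module. A lookup through the module at call time (socket.create_connection) sees a later patch, while a name copied with `from socket import ...` before the patch keeps pointing at the original. Patching first covers both cases. A minimal Python 3 sketch; `bound_at_import` and `patched` are illustrative names:

```python
import socket

# Simulates a module that did "from socket import create_connection"
# before the patch was applied.
from socket import create_connection as bound_at_import


def patched(address, timeout=None, source_address=None):
    raise RuntimeError("patched version called")


original = socket.create_connection
socket.create_connection = patched
try:
    # Code that looks the name up through the module sees the patch...
    sees_patch_via_module = socket.create_connection is patched
    # ...but a name copied before patching still refers to the original.
    sees_patch_via_copy = bound_at_import is patched
finally:
    socket.create_connection = original

print(sees_patch_via_module, sees_patch_via_copy)  # True False
```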
Manual request example:
import socks
import urlparse
SOCKS_HOST = 'localhost'
SOCKS_PORT = 9050
SOCKS_TYPE = socks.PROXY_TYPE_SOCKS5
url = 'http://www.whatismyip.com/automation/n09230945.asp'
parsed = urlparse.urlparse(url)
# Named sock so it does not shadow the standard socket module
sock = socks.socksocket()
sock.setproxy(SOCKS_TYPE, SOCKS_HOST, SOCKS_PORT)
# Pass the hostname, not an IP, so Tor resolves it (rdns defaults to True)
sock.connect((parsed.netloc, 80))
# HTTP needs CRLF line endings and an empty line after the headers
sock.sendall('GET %(uri)s HTTP/1.1\r\n'
             'Host: %(host)s\r\n'
             'Connection: close\r\n'
             '\r\n' % dict(
                 uri=parsed.path or '/',
                 host=parsed.netloc,
             ))
print sock.recv(1024)
sock.close()
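When building a request by hand it is easy to drop the query string or forget the blank line that ends the headers. A minimal Python 3 helper, assuming only a plain GET with no extra headers is needed (the function name is illustrative), could look like this; its output is what you would pass to sendall on the SOCKS socket:

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return the raw bytes of a minimal HTTP/1.1 GET request for `url`.

    Keeps the query string, uses CRLF line endings and terminates the
    header block with an empty line, as HTTP requires.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        "GET %s HTTP/1.1" % target,
        "Host: %s" % parts.netloc,
        "Connection: close",
        "",  # these two empty entries produce the final CRLF CRLF
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```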

I've published an article with complete source code showing how to use urllib2 + SOCKS + Tor on http://blog.databigbang.com/distributed-scraping-with-multiple-tor-circuits/
Hope it solves your issues.

Related

Python requests-html with Tor

The requirement is to scrape anonymously or change the IP after a certain number of calls. I use the https://github.com/kennethreitz/requests-html module to parse the HTML, but I get the error below:
socks.SOCKS5Error: 0x01: General SOCKS server failure
Code
import socks
import socket
import requests_html
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, addr='127.0.0.1', port=int('9150'))
socket.socket = socks.socksocket
session = requests_html.HTMLSession()
r = session.get('http://icanhazip.com')
r.html.render(sleep=5)
print(r.html.text)
But it works perfectly fine with the requests module:
import socks
import socket
import requests
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, addr='127.0.0.1', port=int('9150'))
socket.socket = socks.socksocket
print(requests.get("http://icanhazip.com").text)
Any help to solve the issue with requests-html module would be highly appreciated.
Try:
session = requests_html.HTMLSession(browser_args=["--no-sandbox","--proxy-server=127.0.0.1:9150"])
It depends on how your proxy is set up to use Tor, but this worked for me!
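If the proxy is Tor's own SocksPort, Chromium generally needs the socks5:// scheme in --proxy-server, because a value without a scheme is treated as an HTTP proxy, and Tor's SocksPort does not speak HTTP. The sketch below builds such a flag list; the helper name is hypothetical, and it assumes your requests-html version accepts browser_args as in the answer above (e.g. `HTMLSession(browser_args=chromium_tor_args())`). The --host-resolver-rules value follows the pattern Chromium's SOCKS proxy documentation gives for preventing local DNS lookups. Passing the proxy to the browser this way also avoids patching socket.socket for the whole process.

```python
def chromium_tor_args(host="127.0.0.1", port=9150):
    """Chromium flags that send browser traffic through a SOCKS5 proxy."""
    return [
        "--no-sandbox",
        # Explicit socks5:// scheme; without it Chromium assumes HTTP.
        "--proxy-server=socks5://%s:%d" % (host, port),
        # Fail every local lookup except for the proxy host itself,
        # so hostnames are only resolved by the proxy.
        "--host-resolver-rules=MAP * ~NOTFOUND , EXCLUDE %s" % host,
    ]
```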

Python General SOCKS server failure when assigning socket.socket

I know similar questions have been asked several times:
General SOCKS server failure with python tor but working from tor browser
General SOCKS server failure when switching identity using stem
General SOCKS server failure while using tor proxy
I checked all related posts and googled a lot, but still got stuck.
I'm on Win10. I downloaded the Tor Browser, ran it, and confirmed with netstat -aon (in an administrator prompt) that it is listening on 127.0.0.1:9150.
Then I run the following example code in Python:
import socks
import socket
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9150)
socket.socket = socks.socksocket
The last line socket.socket = socks.socksocket gives the Error message.
socks.GeneralProxyError: Socket error: 0x01: General SOCKS server failure
It's supposed to return a socket object which is assigned to socket.socket that opens a socket. Like this example:
https://deshmukhsuraj.wordpress.com/2015/03/08/anonymous-web-scraping-using-python-and-tor/
Can anyone tell me what's wrong?
Thanks.
Update
Thanks to drew010's answer, this code works (with the Tor Browser running and listening on port 9150):
import requests
proxies = {
'http': 'socks5h://127.0.0.1:9150',
'https': 'socks5h://127.0.0.1:9150'
}
url = 'http://icanhazip.com'
# request without Tor (original IP)
r = requests.get(url)
print(r.text)
# request with Tor (Tor IP)
r = requests.get(url, proxies=proxies)
print(r.text)
# Force change IP
from stem.control import Controller
from stem import Signal
with Controller.from_port(port = 9151) as controller:
    controller.authenticate('mypassword')
    controller.signal(Signal.NEWNYM)
# Changed Tor IP
r = requests.get(url, proxies=proxies)
print(r.text)
Note that a control password must first be configured in torrc (the HashedControlPassword option) for authenticate('mypassword') to succeed.
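For reference, stem's authenticate() and signal() calls correspond to plain-text commands on the ControlPort defined in Tor's control-spec. The sketch below only builds those lines (the helper name is illustrative; stem remains the practical choice). Sending them over a TCP connection to the ControlPort should produce a "250 OK" reply for each accepted command.

```python
def control_commands(password):
    """Raw Tor control-protocol lines for password auth plus NEWNYM."""
    # The password is sent as a quoted string: escape backslashes and quotes.
    quoted = '"%s"' % password.replace("\\", "\\\\").replace('"', '\\"')
    lines = ["AUTHENTICATE " + quoted, "SIGNAL NEWNYM", "QUIT"]
    # Every control-protocol line ends with CRLF.
    return "".join(line + "\r\n" for line in lines).encode("utf-8")
```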
By doing "socket.socket = socks.socksocket" you replace the class used for every socket created afterwards with socksocket, which means that from then on you can use regular sockets and they will go through your SOCKS proxy.
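Because the replacement is global, it also affects sockets that should not go through the proxy, such as local connections. A minimal sketch of a narrower variant, assuming you only need the proxy inside one block of code (the helper name is hypothetical; you would pass socks.socksocket as the replacement):

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket_class(replacement):
    """Temporarily replace socket.socket, restoring the original afterwards.

    Only sockets created inside the with-block use `replacement`.
    """
    original = socket.socket
    socket.socket = replacement
    try:
        yield
    finally:
        socket.socket = original
```

Usage: `with patched_socket_class(socks.socksocket): ...`. The patch is still process-wide while the block runs, so other threads see it too, and code that captured socket.socket earlier is unaffected.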

What's the correct way to use a unix domain socket in requests framework?

A POST request with the requests framework is usually made like this:
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)
But: How do I connect to a unix socket instead of doing a TCP connection?
On a related note, how should the socket path be encoded in the URL?
libcurl allows an application to supply its own socket on which to perform the request
LDAP invented its own scheme, ldapi, where the socket name is %-encoded in the host field
httpie uses the http+unix scheme with the %-encoded path in the host field
These are some examples, but is there an RFC or established best practice?
There's no need to reinvent the wheel:
https://github.com/msabramo/requests-unixsocket
URL scheme is http+unix and socket path is percent-encoded into the host field:
import requests_unixsocket
session = requests_unixsocket.Session()
# Access /path/to/page from /tmp/profilesvc.sock
r = session.get('http+unix://%2Ftmp%2Fprofilesvc.sock/path/to/page')
assert r.status_code == 200
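The percent-encoded host in that URL can be produced with the standard library. A small sketch (the helper name is illustrative); `safe=""` makes quote() encode the slashes too:

```python
from urllib.parse import quote


def unix_socket_url(socket_path, request_path="/"):
    """Build an http+unix URL in the style used by requests-unixsocket."""
    return "http+unix://%s%s" % (quote(socket_path, safe=""), request_path)
```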
If you are looking for a minimalistic and clean approach to this in Python 3, here's a working example that will talk to Ubuntu's snapd on a unix domain socket.
import requests
import socket
import pprint
from urllib3.connection import HTTPConnection
from urllib3.connectionpool import HTTPConnectionPool
from requests.adapters import HTTPAdapter
class SnapdConnection(HTTPConnection):
    def __init__(self):
        super().__init__("localhost")
    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect("/run/snapd.socket")
class SnapdConnectionPool(HTTPConnectionPool):
    def __init__(self):
        super().__init__("localhost")
    def _new_conn(self):
        return SnapdConnection()
class SnapdAdapter(HTTPAdapter):
    def get_connection(self, url, proxies=None):
        return SnapdConnectionPool()
    # Recent requests releases (2.32.x) call this method instead of get_connection
    def get_connection_with_tls_context(self, request, verify, proxies=None, cert=None):
        return SnapdConnectionPool()
session = requests.Session()
session.mount("http://snapd/", SnapdAdapter())
response = session.get("http://snapd/v2/system-info")
pprint.pprint(response.json())
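The same idea as SnapdConnection, overriding connect() to open an AF_UNIX socket, also works with only the standard library's http.client, if you can do without the requests API. The sketch below is for Unix-like systems; the class and function names are illustrative, and run_demo() starts a throwaway server on a temporary socket so it can run without snapd.

```python
import http.client
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        # "localhost" only ends up in the Host header; no DNS lookup happens.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        # Replace the TCP connect with an AF_UNIX connect.
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


class _EchoPathHandler(BaseHTTPRequestHandler):
    """Toy handler for the demo: replies with the requested path."""

    def do_GET(self):
        body = ("path=" + self.path).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Default logging reads client_address[0], which is empty for AF_UNIX.
        pass


def run_demo():
    """Serve on a temporary unix socket and GET one path from it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        server = socketserver.UnixStreamServer(path, _EchoPathHandler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            conn = UnixHTTPConnection(path)
            conn.request("GET", "/v2/system-info")
            resp = conn.getresponse()
            result = (resp.status, resp.read())
            conn.close()
        finally:
            server.shutdown()
            server.server_close()
    return result
```

Against snapd you would instead create `UnixHTTPConnection("/run/snapd.socket")` and request "/v2/system-info" directly.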
You can use socat to create a TCP to UNIX socket proxy, something like:
socat TCP-LISTEN:80,reuseaddr,fork UNIX-CLIENT:/tmp/foo.sock
And then send your http requests to that proxy. The server listening on UNIX socket /tmp/foo.sock still has to understand HTTP because socat does not do any message conversion.
requests has no implementation to work with unix sockets out-of-the-box.
But you can create a custom adapter that connects to the unix socket, sends the request and reads the response.
The only methods you need to implement are .send() and .close(); both are easy and straightforward.
After registering the adapter in session object you can use requests machinery with UNIX transport.
Another solution by https://stackoverflow.com/users/1105249/luis-masuelli is to use the httpx package instead,
>>> import httpx
>>> # Define a transporter
>>> transport = httpx.HTTPTransport(uds="/tmp/profilesvc.sock")
>>> client = httpx.Client(transport=transport)
>>> response = client.get('http://path/to/page')
>>> assert response.status_code == 200

Stem as python tor client - stuck on loading descriptors

I'm trying to connect to Tor with Python stem, but when I try to connect (using the modified example) it just won't work. Here's my code
(I'm using Python 3.4.1):
import socket
import urllib.request
import socks
import stem.process
from stem.util import term
SOCKS_PORT = 7000
# Set socks proxy and wrap the urllib module
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', SOCKS_PORT)
socket.socket = socks.socksocket
# Perform DNS resolution through the socket
def getaddrinfo(*args):
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]
socket.getaddrinfo = getaddrinfo
def query(url):
    """Uses urllib to fetch a site using SocksiPy for Tor over the SOCKS_PORT."""
    try:
        return urllib.request.urlopen(url).read().decode('utf-8', 'replace')
    except Exception:
        return "Unable to reach %s" % url
def print_bootstrap_lines(line):
    if "Bootstrapped " in line:
        print(term.format(line, term.Color.BLUE))
print(term.format("Starting Tor:\n", term.Attr.BOLD))
# Raw string so the backslashes in the Windows path are not escape sequences
tor_process = stem.process.launch_tor_with_config(
    tor_cmd=r"C:\Users\Nadav\Desktop\Tor Browser\Tor\tor.exe",
    config={
        'SocksPort': str(SOCKS_PORT),
        'ExitNodes': '{ru}',
    },
    init_msg_handler=print_bootstrap_lines,
)
print(term.format("\nChecking our endpoint:\n", term.Attr.BOLD))
print(term.format(query("https://www.atagar.com/echo.php"), term.Color.BLUE))
tor_process.kill()
The socks port can be different from the port to manipulate tor using stem / stem.control
Hopefully this helps to get things working for you:
import requesocks as requests
from stem import Signal
from stem.control import Controller
# proxies for requests
proxies = {'http': 'socks5://127.0.0.1:9150',
'https': 'socks5://127.0.0.1:9150'}
# when using the Controller
with Controller.from_port(port=9151) as controller:
    controller.authenticate()
    controller.signal(Signal.NEWNYM)
Notice that the SOCKS port is different from the port for the Controller. You can find the Controller port in your torrc file (for me it was called torrc-defaults).
Looks something like this:
# Bind to this address to listen to connections from SOCKS-speaking
# applications.
SocksPort 9150
ControlPort 9151
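If you want a script to pick these values up rather than hard-coding 9150/9151, a small parser is enough for simple torrc files. This is a sketch under stated assumptions: it handles only plain "Keyword value" lines, keeps the last occurrence of each option, and the function name is hypothetical.

```python
def torrc_ports(text):
    """Return the SocksPort and ControlPort numbers found in torrc text."""
    ports = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts
        for wanted in ("SocksPort", "ControlPort"):
            # Match option names case-insensitively.
            if keyword.lower() == wanted.lower():
                # Value may be "9151", "127.0.0.1:9151" or have flags after it.
                port = value.split()[0].rsplit(":", 1)[-1]
                if port.isdigit():
                    ports[wanted] = int(port)
    return ports
```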
Hope this helps!

Mechanize you different ip

I'm playing around with mechanize on a website that appears differently based on your ip.
Is there a way to change your IP in mechanize?
I've tried:
br.set_proxies({"http": '127.0.0.1:80'})
but that times out. Is there something else I'm supposed to do to make this work?
No, I do not believe this is possible. The IP address is set on outgoing packets by your network stack, outside of mechanize's control.
You can use Tor with mechanize; it will let you use a different IP and stay anonymous.
import socks
import socket
def create_connection(address, timeout=None, source_address=None):
    sock = socks.socksocket()
    sock.connect(address)
    return sock
And run this code before creating the mechanize browser:
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket
socket.create_connection = create_connection
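The create_connection replacement above silently ignores the timeout argument, so a stalled Tor circuit can block forever. A sketch of a variant that honors it, written as a factory so it can be tested without Tor; you would call `make_create_connection(socks.socksocket)`. The factory name is hypothetical, and socket._GLOBAL_DEFAULT_TIMEOUT is the private sentinel the standard library passes when no timeout was given.

```python
import socket


def make_create_connection(socket_factory):
    """Build a create_connection replacement that uses `socket_factory`."""

    def create_connection(address, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
                          source_address=None):
        sock = socket_factory()
        # Only override the default when the caller asked for a timeout.
        if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
            sock.settimeout(timeout)
        try:
            # address is still (hostname, port), so the proxy does the lookup.
            sock.connect(address)
        except Exception:
            sock.close()
            raise
        return sock

    return create_connection
```

As in the original, source_address is accepted but not used.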
