In a small test project, I'm trying to test a remote proxy server. We have a client that runs on the local machine and talks to the proxy server. I've set up my Python code to talk through the proxy like so:
proxy_handler = urllib.request.ProxyHandler({'http': 'http://localhost:' + self.proxy_port})
opener = urllib.request.build_opener(proxy_handler)
urllib.request.install_opener(opener)
And then I have code that opens the url and reads N bytes at a time:
try:
    video_stream = urllib.request.urlopen(self.url, timeout=10)
    while something_is_true:
        data = video_stream.read(bucket_size)
except urllib.error.URLError:
    # handle errors
except Exception:
    # handle errors
The problem I'm having is that I can start the test and have it run for several minutes. During that time, I can turn off the remote proxy and the code continues to read from the video_stream connection. No exceptions are thrown.
If I run the code with the remote proxy off from the start, it fails on urlopen(), which I'd expect. But I'd also expect the read() to fail if I stop the remote proxy during the run.
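One detail worth checking in that loop: once the TCP connection is established, killing the proxy often doesn't produce an exception at all; read() on the dead connection simply starts returning b''. Below is a minimal sketch of a read loop that treats an empty read as a dropped connection (the helper name and return value are my own choices, not part of the original code):

```python
def read_stream(stream, bucket_size):
    """Read bucket_size chunks until the stream is exhausted or dropped.

    Returns the total number of bytes read. An empty read() result means
    EOF or a closed connection -- not "no data yet" -- so we stop there.
    """
    total = 0
    while True:
        data = stream.read(bucket_size)
        if not data:  # b'' signals the remote side is gone
            break
        total += len(data)
    return total
```

With a real response object you would break out (or raise) at the empty read instead of looping forever; a stalled-but-still-open connection should separately surface as socket.timeout thanks to the timeout=10 already passed to urlopen.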
I have an API written in Flask and am testing the endpoints with nosetests using requests to send a request to the API. During the tests, I randomly get an error
ConnectionError: HTTPConnectionPool(host='localhost', port=5555): Max retries exceeded with url: /api (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fe4e794fd50>: Failed to establish a new connection: [Errno 111] Connection refused',))
This error only seems to happen when running tests and randomly affects anywhere between none and all of the tests. All of my tests are run from one subclass of unittest.TestCase:
class WebServerTests(unittest.TestCase):
    # Args to run web server with
    server_args = {'port': WEB_SERVER_PORT, 'debug': True}
    # Process to run web server
    server_process = multiprocessing.Process(
        target=les.web_server.run_server, kwargs=server_args)

    @classmethod
    def setup_class(cls):
        """
        Set up testing
        """
        # Start server
        cls.server_process.start()

    @classmethod
    def teardown_class(cls):
        """
        Clean up after testing
        """
        # Kill server
        cls.server_process.terminate()
        cls.server_process.join()

    def test_api_info(self):
        """
        Tests /api route that gives information about API
        """
        # Test to make sure the web service returns the expected output, which at
        # the moment is just the version of the API
        url = get_endpoint_url('api')
        response = requests.get(url)
        assert response.status_code == 200, 'Status Code: {:d}'.format(
            response.status_code)
        assert response.json() == {
            'version': module.__version__}, 'Response: {:s}'.format(response.json())
Everything is happening on localhost and the server is listening on 127.0.0.1. My guess would be that too many requests are being sent to the server and some are being refused, but I'm not seeing anything like that in the debug logs. I had also thought that it may be an issue with the server process not being up before the requests were being made, but the issue persists with a sleep after starting the server process. Another attempt was to let requests attempt retrying the connection by setting requests.adapters.DEFAULT_RETRIES. That didn't work either.
I've tried running the tests on two machines both normally and in docker containers and the issue seems to occur regardless of the platform on which they are run.
Any ideas of what may be causing this and what could be done to fix it?
It turns out that my problem was indeed the server not having enough time to start up, so the tests were running before it could respond to requests. I thought I had tried to fix this with a sleep, but I had accidentally placed it after creating the process instead of after starting the process. In the end, changing
cls.server_process.start()
to
cls.server_process.start()
time.sleep(1)
fixed the issue.
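A fixed sleep works, but it can still be flaky on a slow machine (or waste a second on a fast one). A more robust variant, sketched below, polls the port until the server actually accepts a connection (the helper name and deadline are my own choices):

```python
import socket
import time

def wait_for_port(port, host='127.0.0.1', timeout=10.0):
    """Poll until something accepts TCP connections on (host, port).

    Returns True once a connection succeeds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.1)  # not up yet; back off briefly and retry
    return False
```

In setup_class that would be `cls.server_process.start()` followed by `assert wait_for_port(WEB_SERVER_PORT)`, which fails fast with a clear message if the server never comes up.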
I'm trying to execute a few simple Python scripts to retrieve information from the internet, but unfortunately I'm behind a corporate proxy, which makes this somewhat tricky. So far I have installed CNTLM and configured it to work with PyCharm 1.4. When I 'Check Connection' to www.google.com in PyCharm using my manual proxy settings, it returns 'Connection Successful'.
However, when I try to run simple scripts from PyCharm, they all seem to time out. Any advice? By way of a code example, this will return a 503 response. Thanks!
import requests

URL = "http://google.com"

try:
    response = requests.get(URL)
    print response
except Exception as e:
    print "Something went wrong:"
    print e
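One thing to keep in mind is that 'Check Connection' uses PyCharm's own proxy settings, which a script using requests knows nothing about. A common approach, sketched below, is to point requests at the local CNTLM instance explicitly (I'm assuming CNTLM's default listen port of 3128 here; check the Listen line in your cntlm.conf):

```python
def cntlm_proxies(port=3128):
    """Build a proxies dict for requests, pointing at a local CNTLM.

    Assumption: CNTLM listens on localhost port 3128 (its default);
    adjust to whatever Listen value is in your cntlm.conf.
    """
    proxy = "http://127.0.0.1:{0}".format(port)
    return {"http": proxy, "https": proxy}

# response = requests.get("http://google.com", proxies=cntlm_proxies(), timeout=10)
```

Alternatively, setting the http_proxy/https_proxy environment variables to the same address lets requests pick the proxy up without any code changes.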
I have used this link and successfully run a Python script using uWSGI, although I just followed the doc line by line.
I have a GPS device which is sending data to a remote server. The device's documentation says it connects to the server using TCP, which I assume means plain HTTP, since a simple device like a GPS tracker would not be able to do HTTPS (I hope I am right here). I have configured my Nginx server to forward all incoming HTTP requests to a Python script for processing via uWSGI.
What I want to do is simply print the URL or query string on the HTML page. As I don't have control of the device side (I can only configure the device to send data to an IP + port), I have no clue how the data is coming in. Below is my access log:
[23/Jan/2016:01:50:32 +0530] "(009591810720BP05000009591810720160122A1254.6449N07738.5244E000.0202007129.7200000000L00000008)" 400 172 "-" "-" "-"
Now, I have looked at this link on how to get URL parameter values, but I don't have a clue what the parameter is here.
I tried modifying my wsgi.py file as:
import requests

r = requests.get("http://localhost.com/")
# or r = requests.get("http://localhost.com/?"), as I am directly routing
# incoming HTTP requests to the python script and they might not have
# any parameters, just data
text1 = r.status_code

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["<h1 style='color:blue'>Hello There shailendra! %s </h1>" % (text1)]
but when I restarted Nginx, I got an internal server error. Can someone help me understand what I am doing wrong here? (Literally, I have no clue about the parameters of the application function. I tried to read this link, but what I got from it is that the environ argument takes care of many CGI environment variables.)
Can someone please help me figure out what I am doing wrong and point me to a doc or resource?
Thanks.
Why are you using localhost ".com"?
Since you are running the web server on the same machine, you should change the line to
r = requests.get("http://localhost/")
Also, move the lines below out of wsgi.py and put them in testServerConnection.py:
import requests

r = requests.get("http://localhost/")
# the incoming HTTP request is routed directly to the python script and
# might not have any parameters, just data
text1 = r.status_code
Start NGINX
and you also might have to run (I am not sure how uWSGI is set up with your Nginx):
uwsgi --socket 0.0.0.0:8080 --protocol=http -w wsgi
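For reference, the same flags can also live in a uWSGI ini file instead of on the command line (a sketch; the filename and layout are assumptions):

```ini
[uwsgi]
; equivalent of: uwsgi --socket 0.0.0.0:8080 --protocol=http -w wsgi
socket = 0.0.0.0:8080
protocol = http
module = wsgi
```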
Run testServerConnection.py to send a test request to the localhost web server and print the response.
I got the answer to my question. Basically, to process a TCP request, you need to open a socket and accept the TCP connection on a specific port (the one you specified on the hardware):
import SocketServer

class MyTCPHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        # self.request is the TCP socket connected to the client
        self.data = self.request.recv(1024).strip()
        print "{} wrote:".format(self.client_address[0])
        # data which is received
        print self.data

if __name__ == "__main__":
    # replace IP by your server IP
    HOST, PORT = <IP of the server>, 8000
    # Create the server, binding to the given host on port 8000
    server = SocketServer.TCPServer((HOST, PORT), MyTCPHandler)
    # Activate the server; this will keep running until you
    # interrupt the program with Ctrl-C
    server.serve_forever()
After you get the data, you can do anything with it. Since my data format was described in the GPS datasheet, I was able to parse the string and extract the latitude and longitude from it.
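For anyone with a similar device, here is a sketch of the coordinate extraction, assuming the report encodes latitude/longitude NMEA-style as ddmm.mmmm[N/S] and dddmm.mmmm[E/W] (which matches the sample log line above, but verify against your own datasheet; the function name and regex are mine):

```python
import re

def parse_latlon(payload):
    """Extract (lat, lon) in decimal degrees from a report string.

    Assumption: coordinates appear as ddmm.mmmm[N/S] dddmm.mmmm[E/W],
    e.g. '1254.6449N07738.5244E'. Returns None if no match is found.
    """
    m = re.search(r'(\d{4}\.\d+)([NS])(\d{5}\.\d+)([EW])', payload)
    if m is None:
        return None
    raw_lat, ns, raw_lon, ew = m.groups()
    # degrees are the leading digits, the rest is minutes / 60
    lat = int(raw_lat[:2]) + float(raw_lat[2:]) / 60.0
    lon = int(raw_lon[:3]) + float(raw_lon[3:]) / 60.0
    if ns == 'S':
        lat = -lat
    if ew == 'W':
        lon = -lon
    return lat, lon
```

On the sample line above this yields roughly (12.9107, 77.6421), i.e. 12°54.6449'N, 77°38.5244'E.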
I am currently using Python + Mechanize to retrieve pages from a local server. As you can see, the code uses localhost (127.0.0.1:8888) as a proxy; the proxy is an instance of the Fiddler2 debug proxy. This works exactly as expected, which indicates that my machine can reach the test_box.
import time
import mechanize

url = r'http://test_box.test_domain.com:8000/helloWorldTest.html'
browser = mechanize.Browser()
browser.set_proxies({"http": "127.0.0.1:8888"})
browser.add_password(url, "test", "test1234")

start_timer = time.time()
resp = browser.open(url)
resp.read()
latency = time.time() - start_timer
However, when I remove the browser.set_proxies statement, it stops working. I get the error "urlopen error [Errno 10061] No connection could be made because the target machine actively refused it". The point is that I can access the test_box from my machine with any browser, which also indicates that test_box can be reached from my machine.
My suspicion is that this has something to do with Mechanize trying to guess the proper proxy settings. That is: my Browsers are configured to go to a web proxy for any domain but test_domain.com. So I suspect that mechanize tries to use the web proxy while it should actually not use the proxy.
How can I tell mechanize to NOT guess any proxy settings and instead force it to try to connect directly to the test_box?
Argh, found it out myself. The docstring says:
"To avoid all use of proxies, pass an empty proxies dict."
This fixed the issue.
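In other words, replacing the set_proxies call with browser.set_proxies({}) forces a direct connection. Without any explicit setting, mechanize falls back to guessing proxies from the environment, the same mechanism urllib exposes as getproxies() (shown below purely as an illustration of the selection rule; the variable names are mine):

```python
try:
    from urllib.request import getproxies  # Python 3
except ImportError:
    from urllib import getproxies          # Python 2

# What mechanize falls back to when no proxies are set explicitly:
# the environment's http_proxy/https_proxy (and platform settings).
guessed = getproxies()

# An explicit empty dict overrides the guess entirely:
explicit = {}  # i.e. browser.set_proxies({}) -> direct connections only
```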
I want to use python to log into a website which uses Microsoft Forefront, and retrieve the content of an internal webpage for processing.
I am not new to python but I have not used any URL libraries.
I checked the following posts:
How can I log into a website using python?
How can I login to a website with Python?
How to use Python to login to a webpage and retrieve cookies for later usage?
Logging in to websites with python
I have also tried a couple of modules such as requests. Still, I am unable to understand how this should be done. Is it enough to enter a username/password, or should I somehow use cookies to authenticate? Any sample code would really be appreciated.
This is the code I have so far:
import requests

NAME = 'XXX'
PASSWORD = 'XXX'
URL = 'https://intra.xxx.se/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=3'

def main():
    # Start a session so we can have persistent cookies
    session = requests.session()
    # This is the form data that the page sends when logging in
    login_data = {
        'username': NAME,
        'password': PASSWORD,
        'SubmitCreds': 'login',
    }
    # Authenticate
    r = session.post(URL, data=login_data)
    # Try accessing a page that requires you to be logged in
    r = session.get('https://intra.xxx.se/?t=1-2')
    print r

main()
but the above code results in the following exception on the session.post line:
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='intra.xxx.se', port=443): Max retries exceeded with url: /CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=3 (Caused by <class 'socket.error'>: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond)
UPDATE:
I noticed that I was providing wrong username/password.
Once that was updated, I got an HTTP 200 response with the above code, but when I try to access any internal site I get an HTTP 401 response. Why is this happening? What is wrong with the above code? Should I be using the cookies somehow?
TMG can be notoriously fussy about what types of connections it blocks. The next step is to find out why TMG is blocking your connection attempts.
If you have access to the TMG server, log in to it, start the TMG management user-interface (I can't remember what it is called) and have a look at the logs for failed requests coming from your IP address. Hopefully it should tell you why the connection was denied.
It seems you are attempting to connect to it over an intranet. One way I've seen it block connections is if it receives them from an address it considers to be on its 'internal' network. (TMG has two network interfaces as it is intended to be used between two networks: an internal network, whose resources it protects from threats, and an external network, where threats may come from.) If it receives on its external network interface a request that appears to have come from the internal network, it assumes the IP address has been spoofed and blocks the connection. However, I can't be sure that this is the case as I don't know what this TMG server's internal network is set up as nor whether your machine's IP address is on this internal network.