Selenium in Python
I've been using urllib2 to access web pages, but it doesn't support JavaScript, so I took a look at Selenium. I'm quite confused even after reading its docs.
I downloaded the Selenium IDE add-on for Firefox and tried some simple things.
from selenium import selenium
import unittest, time, re

class test(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*chrome", "http://www.wikipedia.org/")
        self.selenium.start()

    def test_test(self):
        sel = self.selenium
        sel.open("/")
        sel.type("searchInput", "pacific ocean")
        sel.click("go")
        sel.wait_for_page_to_load("30000")

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()
I just access wikipedia.org and type "pacific ocean" in the search field, but when I try to run it, it gives me a lot of errors.
If running the script results in a [Errno 111] Connection refused error such as this:
% test.py
E
======================================================================
ERROR: test_test (__main__.test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/unutbu/pybin/test.py", line 11, in setUp
    self.selenium.start()
  File "/data1/unutbu/pybin/selenium.py", line 189, in start
    result = self.get_string("getNewBrowserSession", [self.browserStartCommand, self.browserURL, self.extensionJs])
  File "/data1/unutbu/pybin/selenium.py", line 219, in get_string
    result = self.do_command(verb, args)
  File "/data1/unutbu/pybin/selenium.py", line 207, in do_command
    conn.request("POST", "/selenium-server/driver/", body, headers)
  File "/usr/lib/python2.6/httplib.py", line 898, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.6/httplib.py", line 935, in _send_request
    self.endheaders()
  File "/usr/lib/python2.6/httplib.py", line 892, in endheaders
    self._send_output()
  File "/usr/lib/python2.6/httplib.py", line 764, in _send_output
    self.send(msg)
  File "/usr/lib/python2.6/httplib.py", line 723, in send
    self.connect()
  File "/usr/lib/python2.6/httplib.py", line 704, in connect
    self.timeout)
  File "/usr/lib/python2.6/socket.py", line 514, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
----------------------------------------------------------------------
Ran 1 test in 0.063s
FAILED (errors=1)
then the solution is most likely that you need to get the Selenium server running first.
In the download for SeleniumRC you will find a file called selenium-server.jar (as of a few months ago, that file was located at SeleniumRC/selenium-server-1.0.3/selenium-server.jar).
On Linux, you could run the selenium server in the background with the command
java -jar /path/to/selenium-server.jar 2>/dev/null 1>&2 &
You will find more complete instructions on how to set up the server here.
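To confirm the server is actually up before re-running the tests, a quick TCP check against port 4444 suffices. A minimal sketch (the helper name server_listening is mine, not part of Selenium):

```python
import socket

def server_listening(host="localhost", port=4444, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True
    except OSError:
        return False
```

If this returns False, the [Errno 111] Connection refused error is expected: nothing is listening, so start selenium-server.jar first.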
I would suggest using WebDriver; you can find it here: http://code.google.com/p/selenium/downloads/list. If you want to write tests as a coder (and not with your mouse), it works better than the RC version you're trying to use, if only because it doesn't require a running Selenium RC server (JAR) instance. It simply drives a browser binary, either one you supply or one already installed on your system, for example Firefox.
I ran into this issue in my project and found that the problem was several webdriver.get calls with a very small time interval between them. My fix was not to add a delay but to remove the unneeded calls, and the error disappeared.
Hope this helps somebody.
Related
python read and run commands from a remote text file
I have a script that is supposed to read a text file from a remote server and then execute whatever is in that file. For example, if the text file contains the command "ls", the computer will run it and list the directory. Also, please don't suggest urllib2 or whatever; I want to stick with Python 3.x. As soon as I run it I get this error:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    data = urllib.request.urlopen(IP_Address)
  File "C:\Users\jferr\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\jferr\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 509, in open
    req = Request(fullurl, data)
  File "C:\Users\jferr\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 328, in __init__
    self.full_url = url
  File "C:\Users\jferr\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 354, in full_url
    self._parse()
  File "C:\Users\jferr\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 383, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'domain.com/test.txt'

Here is my code:

import os  # needed for os.system below
import urllib.request

IP_Address = "domain.com/test.txt"
data = urllib.request.urlopen(IP_Address)
for line in data:
    print("####")
    os.system(line)

Edit: yes, I realize this is a bad idea. It is a school project; we are playing red team and are supposed to get access to a kiosk. I figured that instead of writing code to get around intrusion detection and firewalls, it would be easier to execute commands from a remote server. Thanks for the help, everyone!
The error occurs because your URL does not include a protocol. Include "http://" (or "https://" if you're using SSL/TLS) and it should work. As others have commented, this is a dangerous thing to do, since someone could run arbitrary commands on your system this way.
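To make the fix defensive, you can prepend a scheme only when the URL lacks one. A small sketch (the helper name ensure_scheme is mine):

```python
from urllib.parse import urlparse

def ensure_scheme(url, default="http"):
    """Prepend default:// when the URL has no scheme, else return it unchanged."""
    if urlparse(url).scheme:
        return url
    return "{}://{}".format(default, url)
```

With this, ensure_scheme("domain.com/test.txt") yields "http://domain.com/test.txt", which urlopen accepts.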
Try "http://localhost/domain.com/test.txt" or the remote address. If you use localhost, you need to run an HTTP server there.
Changing Permissions on Python Modules so I Don't Need to "sudo" My Python Script Calls
Is there a way to reconfigure all my Python 3 modules from which I call certain utilities (i.e., urlopen) so that I no longer need to preface my Python 3 script calls with "sudo", without having to rebuild my Ubuntu VM? Example, with my script code as follows:

import socks
import socket
from urllib.request import urlopen
from time import sleep
from bs4 import BeautifulSoup

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket

url_name1 = "http://www.google.com"
print("url name is : " + url_name1)
print("About to open the web page")
sleep(5)
webpage = urlopen(url_name1)
print("Web page opened successfully")
sleep(5)
html = webpage.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
print("HTML extracted")
sleep(5)

Without prefacing the command with "sudo", the output looks like this:

$ python3 sample_script2.py
url name is : http://www.google.com
About to open the web page
1599238298 WARNING torsocks[29740]: [connect] Connection to a local address are denied since it might be a TCP DNS query to a local DNS server. Rejecting it for safety reasons.
(in tsocks_connect() at connect.c:193)

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/socks.py", line 832, in connect
    super(socksocket, self).connect(proxy_addr)
PermissionError: [Errno 1] Operation not permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py", line 1326, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.8/http/client.py", line 1240, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1286, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1235, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1006, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 946, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 917, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/usr/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
  File "/usr/lib/python3/dist-packages/socks.py", line 100, in wrapper
    return function(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/socks.py", line 844, in connect
    raise ProxyConnectionError(msg, error)
socks.ProxyConnectionError: Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 1] Operation not permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "sample_script2.py", line 14, in <module>
    webpage = urlopen(url_name1)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 1355, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.8/urllib/request.py", line 1329, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 1] Operation not permitted>
$

Adding "sudo" to the command yields the following:

jbottiger#ubuntu:~/DarkWeb$ sudo python3 sample_script2.py
[sudo] password for jbottiger:
url name is : http://www.google.com
About to open the web page
Web page opened successfully
HTML extracted
Printing soup object text
[... several kilobytes of the Google homepage HTML/JavaScript omitted ...]
jbottiger#ubuntu:~/DarkWeb$
I posed this question to my professor, who recommended prefacing my python3 command with "torsocks" after enabling torsocks on my Ubuntu VM (torsocks must be installed and configured before running the script). After that, remove the following two statements from the script:

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket

Now when I enter "torsocks python3 <script_name>.py", I no longer receive these errors, including when opening a dark-web page. According to my professor, Dr. Terrence O'Connor, PhD (Florida Institute of Technology), both my original approach of specifying a proxy (i.e., tor) in the script and using torsocks to tunnel the traffic of a specific command (python3, in my case) are viable methods of connecting to the Tor network via the proxy service on my Ubuntu VM. The second method he recommended worked better for me than the first.
When initially creating a chromedriver in python, http.client.BadStatusLine: '' is thrown
When creating a new chromedriver instance (in Python):

webdriver.Chrome("./venv/selenium/webdriver/chromedriver")

I get an error http.client.BadStatusLine: ''. I am not navigating to a site or using a server, just creating a new chromedriver. I am in a virtualenv that has the most recent versions of Selenium (3.0.1) and chromedriver (2.24.1). This was working fine a few days ago, and I didn't change any code. I'm not really sure where to begin debugging this. My first step was to run pip install --upgrade -r requirements.txt to make sure all packages were up to date. My only idea now is that Selenium is not handling the default start page (whose URL is data:,) because there is no response; however, as that is the default behavior, I would be surprised if Selenium could not handle its own default behavior. Any help would be much appreciated!

When the code is run (via python from the bash terminal), a new chromedriver instance is successfully created, but the error http.client.BadStatusLine: '' gets thrown, and the Python terminal loses its connection to the chromedriver.
Full code:

import pythonscripts

# Creates a new webdriver
driver = pythonscripts.md()
# Never gets here; attempts to use driver get NameError: name 'driver' is not defined

Pythonscripts md method:

def md():
    return webdriver.Chrome("./venv/selenium/webdriver/chromedriver")

Full error output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/brydenr/server_scripts/cad_tests/pythonscripts.py", line 65, in md
    return webdriver.Chrome("./venv/selenium/webdriver/chromedriver")
  File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
    desired_capabilities=desired_capabilities)
  File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
    response = self.execute(Command.NEW_SESSION, capabilities)
  File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 407, in execute
    return self._request(command_info[0], url, body=data)
  File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 439, in _request
    resp = self._conn.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 1171, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

Tried doing:

try:
    webdriver.Chrome("./venv/selenium/webdriver/chromedriver")
except Exception:
    webdriver.Chrome("./venv/selenium/webdriver/chromedriver")

The result is two of the same traceback as before, and two chromedriver instances. It seems like this question points to an error in urllib, but it is for a slightly different situation.
This happened to me after I updated chrome to the latest version. I just updated chromedriver to 2.25 and it works again.
Python script suddenly throwing timeout exceptions
I have a Python script that downloads product feeds from multiple affiliates in different ways. This didn't give me any problems until last Wednesday, when it started throwing all kinds of timeout exceptions from different locations. Examples:

Here I connect with an FTP service:

ftp = FTP(host=self.host)

threw:

Exception in thread Thread-7:
Traceback (most recent call last):
  File "C:\Python27\Lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\Lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Users\Administrator\Documents\Crawler\src\Crawlers\LDLC.py", line 23, in main
    ftp = FTP(host=self.host)
  File "C:\Python27\Lib\ftplib.py", line 120, in __init__
    self.connect(host)
  File "C:\Python27\Lib\ftplib.py", line 138, in connect
    self.welcome = self.getresp()
  File "C:\Python27\Lib\ftplib.py", line 215, in getresp
    resp = self.getmultiline()
  File "C:\Python27\Lib\ftplib.py", line 201, in getmultiline
    line = self.getline()
  File "C:\Python27\Lib\ftplib.py", line 186, in getline
    line = self.file.readline(self.maxline + 1)
  File "C:\Python27\Lib\socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
timeout: timed out

Or downloading an XML file:

xmlFile = urllib.URLopener()
xmlFile.retrieve(url, self.feedPath + affiliate + "/" + website + '.' + fileType)
xmlFile.close()

throws:

  File "C:\Users\Administrator\Documents\Crawler\src\Crawlers\FeedCrawler.py", line 106, in save
    xmlFile.retrieve(url, self.feedPath + affiliate + "/" + website + '.' + fileType)
  File "C:\Python27\Lib\urllib.py", line 240, in retrieve
    fp = self.open(url, data)
  File "C:\Python27\Lib\urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "C:\Python27\Lib\urllib.py", line 346, in open_http
    errcode, errmsg, headers = h.getreply()
  File "C:\Python27\Lib\httplib.py", line 1139, in getreply
    response = self._conn.getresponse()
  File "C:\Python27\Lib\httplib.py", line 1067, in getresponse
    response.begin()
  File "C:\Python27\Lib\httplib.py", line 409, in begin
    version, status, reason = self._read_status()
  File "C:\Python27\Lib\httplib.py", line 365, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "C:\Python27\Lib\socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
IOError: [Errno socket error] timed out

These are just two examples, but there are other methods, like authenticate or other API-specific methods, where my script throws these timeout errors. It never showed this behavior until Wednesday, and it starts throwing them at random times: sometimes at the beginning of the crawl, sometimes later on. My script behaves this way on both my server and my local machine. I've been struggling with it for two days now but can't figure it out.

This is what I know might have caused it. On Wednesday one affiliate script broke down with the following error:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>

I didn't change anything in my script, but suddenly it stopped crawling that affiliate and threw that error every time I tried to authenticate. I looked it up and found it was due to an OpenSSL error (where did that come from?). I fixed it by adding the following before the authenticate method:

if hasattr(ssl, '_create_unverified_context'):
    ssl._create_default_https_context = ssl._create_unverified_context

Little did I know, this was just the start of my problems... At that same time, I changed from Python 2.7.8 to Python 2.7.9. It seems that this is the moment everything broke down and started throwing timeouts. I tried changing my script in endless ways, but nothing worked, and as I said, it's not just one method that throws them. I also switched back to Python 2.7.8, but that didn't do the trick either. Basically, everything that makes a request to an external source can throw an error.

Final note: my script is multi-threaded. It downloads product feeds from different affiliates at the same time. It used to run 10 threads per affiliate without a problem. Now I tried lowering that to 3 per affiliate, but it still throws these errors. Setting it to 1 is not an option, because that would take ages. I don't think that's the problem anyway, because it used to work fine. What could be wrong?
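Whatever the root cause turns out to be, a common way to make a crawler resilient to intermittent timeouts like these is to set explicit timeouts and retry transient socket-level failures a bounded number of times. A generic sketch (the helper name with_retries is mine, not from the original script):

```python
import socket
import time

def with_retries(func, attempts=3, delay=0.5):
    """Call func(); on a socket-level error, wait `delay` seconds and retry.

    Gives up and re-raises the last error after `attempts` tries.
    """
    last_err = None
    for _ in range(attempts):
        try:
            return func()
        except (socket.timeout, IOError) as err:
            last_err = err
            time.sleep(delay)
    raise last_err
```

Usage would be, for example, with_retries(lambda: FTP(host=self.host)) instead of calling FTP directly, so a single dropped connection doesn't kill the whole thread.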
Python and proxy - urllib2.URLError: <urlopen error [Errno 110] Connection timed out>
I tried to google and search for similar questions on Stack Overflow, but still can't solve my problem. I need my Python script to perform HTTP connections via a proxy. Below is my test script:

import urllib2, urllib

proxy = urllib2.ProxyHandler({'http': 'http://255.255.255.255:3128'})
opener = urllib2.build_opener(proxy, urllib2.HTTPHandler)
urllib2.install_opener(opener)

conn = urllib2.urlopen('http://www.whatismyip.com/')
return_str = conn.read()

webpage = open('webpage.html', 'w')
webpage.write(return_str)
webpage.close()

This script works absolutely fine on my local computer (Windows 7, Python 2.7.3), but when I try to run it on the server, it gives me the following error:

Traceback (most recent call last):
  File "proxy_auth.py", line 18, in <module>
    conn = urllib2.urlopen('http://www.whatismyip.com/')
  File "/home/myusername/python/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/home/myusername/python/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/home/myusername/python/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/home/myusername/python/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/home/myusername/python/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/home/myusername/python/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 110] Connection timed out>

I also tried the requests library and got the same error:

# testing the requests library
r = requests.get('http://www.whatismyip.com/', proxies={'http': 'http://255.255.255.255:3128'})

If I don't set a proxy, the program works fine:

# this works fine
conn = urllib2.urlopen('http://www.whatismyip.com/')

I think the problem is that on my shared hosting account it is not possible to set an environment variable for the proxy... or something like that. Are there any workarounds or alternative approaches that would let me set proxies for HTTP connections? How should I modify my test script?
The problem was closed ports. I had to buy a dedicated IP before tech support could open the ports I needed. Now my script works fine. Conclusion: when you are on shared hosting, most ports are probably closed, and you will have to contact tech support to open them.
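An [Errno 110] Connection timed out (as opposed to [Errno 111] Connection refused) is typical of a firewalled or silently dropped port, which is consistent with this diagnosis. You can classify what a port does before blaming urllib2; a small Python 3 sketch (the helper name probe is mine):

```python
import socket

def probe(host, port, timeout=3.0):
    """Classify a TCP port: 'open', 'refused', or 'timeout' (likely filtered)."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
```

A "timeout" result for the proxy's host and port from the hosting server, combined with "open" from your local machine, would point at the host's firewall rather than your code.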