Python - check if file/webpage exists

Python - check if file/webpage exists - python

I would like to use Python to check if a file/webpage exists based off its response code and act accordingly. However, I have a requirement to use HTTPS and to also provide username and password credentials. I couldn't get it running through curl (doesn't like HTTPS) but had success by using wget (with --spider and --user and --password). I suppose I can try incorporating wget into the script via os.system but it prints out a lot of output that would be very tricky to parse and if the URI does not exist (aka 404), I think gets stuck "awaiting response..".
I've had a look at urllib2 around the web and have seen people do some stuff, but I'm not sure if this addresses my situation and the solutions are always very convoluted (such as Python urllib2, basic HTTP authentication, and tr.im) . Anyway, if I can get some guidance on what the easiest avenue for me to pursue is using python, that would be appreciated.
edit: using the os.system method (and providing wget with "-q") seems to return a different number if the URI exists or not, so that gives me something to work with for now.

You can make a HEAD request using python requests.
import requests
r = requests.head('http://google.com/sjklfsjd', allow_redirects=True, auth=('user', 'pass'))
assert r.status_code != 404
If the request fails with a ConnectionError, the website does not exist. If you only want to check whether a certain page exists, you will get a successful response but the status code will be 404.
Requests has a pretty nice interface so I recommend checking it out. You'll probably like it a lot as it is very intuitive and powerful (while being lightweight).

urllib2 is the way to go to open any web page
urllib2.urlopen('http://google.com')
for added functionality, you'll need an opener with handlers. I reckon you'll only need the https because you're barely extracting any info
opener = urllib2.build_opener(
urllib2.HTTPSHandler())
opener.open('https://google.com')
add data and it will automatically become a POST request, or so i believe:
opener.open('https://google.com',data="username=bla&password=da")
the object you'll receive will have a code attribute.
That's the basic gist of it, do add as many handlers as you like, i believe they can't hurt.
source: https://docs.python.org/2.7/library/urllib2.html#httpbasicauthhandler-objects

You should use urllib2 to check that:
import urllib2, getpass
url = raw_input('Enter the url to search: ')
username = raw_input('Enter your username: ')
password = getpass.getpass('Enter your password: ')
if not url.startswith('http://') or not url.startswith('https://'):
url = 'http://'+url
def check(url):
try:
urllib2.urlopen(url)
return True
except urllib2.HTTPError:
return False
if check(url):
print 'The webpage exists!'
else:
print 'The webpage does not exist!'
opener = urllib2.build_opener(
urllib2.HTTPSHandler())
opener.open(url,data="username=%s&password=%s" %(username, password))
This runs as:
bash-3.2$ python url.py
Enter the url to search: gmail.com
Enter your username: aj8uppal
Enter your password:
The webpage exists!

Related

Python Requests: Check if Login was successful

I've looked all over the place for the solution I'm looking for but just can't find it. Basically, I'm developing a tool which takes a list of URLs from a text document, logs into them with your username/password, and returns which ones work.
I have the login figured out and all, but how do I actually return if the login works? Sites entered on the list won't necessarily use cookies and will be developed on various platforms.
Here's my login code:
r = requests.post(action, data=values, verify=False)
print(r.headers)
print(r.status_code)
print(r.content)
print(r.url)
sleep(1)
I'd like everything after that first line to be in an if statement if the login actually works, but I can't figure out how to actually determine that.
Thanks.

You can parse the r.content to check if you login or not, like string success or some flag means that you login.
if the r.content for login user and not login user are same, you can request a special link that need to login and find a flag to check.

status bit will let you know
r = requests.post(action, data=values, verify=False)
authi = requests.get(r.url, auth=('user', 'pass'))
>>> authi.status_code
200
You will find more status bit info here
STATUS CODE DEFINITION

Trying to access a password protected url using python

I've had problems accessing www.bizi.si or more specifically
http://www.bizi.si/BALMAR-D-O-O/ for instance. If you look at it without registering you won't see any financial data. But if you use the free registration, I used username: Lukec, password: lukec12345, you can see some of the financial data. I've used this next code:
import urllib.parse
import urllib.request
import re
import csv
username = 'Lukec'
password = 'lukec12345'
url = 'http://www.bizi.si/BALMAR-D-O-O/'
values = {'username':username, 'password':password}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url,data,values)
resp = urllib.request.urlopen(req,data)
respData = resp.read()
paragraphs = re.findall('<tbody>(.*?)</tbody>',str(respData))
And my len(paragraphs) is zero. I would really be grateful if anyone of you would be able to tell me how to access the page correctly. I know that the length being zero isnt the best indicator, but also the len(respData) if I use values as stated in my code or if I take it out of my code is the same, so I know I have not accessed the page through username, password.
Thank you for you're help in advance and have a nice day.

There are two issues here:
You are not using POST, but GET for the request.
There is no <tbody> element in the HTML produced; any such tags have been automatically added by your browser, do not rely on them being there.
To create a POST request, use:
req = urllib.request.Request(url, data, method='POST')
resp = urllib.request.urlopen(req)
Note that I removed the values argument (those are not headers, the third positional argument to Request() and you don't pass in a data argument when using a Request object.
The resulting HTML returned does not necessarily include the same data that is sent to a browser; you probably need to maintain a session here, return the cookies that the site sets.
It is far easier to do this with better tools such as the requests library and BeautifulSoup (the latter lets you parse HTML without having to resort to regular expressions), which can be combined with the robobrowser project to help you fill and submit forms on websites.
Note however that the page forms and state are managed by ASP.NET JavaScript code and are not easily reverse-engineered even by robobrowser. When you log in with a browser (which has run the JavaScript code for you), the POST looks like this:
{'__EVENTTARGET': ['ctl00$ctl00$loginBoxPopup$loginBox1$ButtonLogin'],
'__VSTATE': ['H4sIAAAAAAAEAO18zXPbSJKvBYmkPixLdrfVs909EkbTPZa7JYrfH27bsyAJSRA/QJEUbXNmVg8kIAoSCNAASFmanXd7t+4XsaeNjY2NcEfsfU8bsZeek3V8t72+/+D9D/t+CYAU9WW7pydidzZMm0AhKzMrMysrqwqV4n+Mzf1s3Le4t5c1dNs0NKuivOypplI2LDsjtY7yysne3oLP92XL1kKhL9xrVZHM1gEn9yW9pcjhL1rNWqUidhXd9+CdaFnNsBTZt/JOxIxmtI6A+dXbMT1Iy1b7iu9X74Nb5n0PR/Fa3YOipOqDe9bQvij2NFutq8pxeG5O3pd9/Dvws0anK+knOcWWVM3KmGjxQLFyki2FL/Oa802vhRPRZCoejsfmFpjFOxsmTK7odjWfE3JW5P+O3Rq7devWf+BDd/pMUOF/Vk8sW+kE0Z6mQF1Dt4Kbiq6YaitYUC37f4R/8xsPRdDtaGSV7Vgtw9TU5ipbV0wLBE9iwRD9W2WzEKpnKk90pWebkrbKlntNTW2ht2vGkaI/aSaTUrwVT4TT0ZgSSqV/97txiODfU8He8u1Z6qkyudd3uQZu3ZqcnJxigDDmfefoYQLfSYdu5DOzwOzPkdr3N7maCQdTTM94NdXWFN9sRtI6ksnKQQP/ZMKWF27T5R7jI7pAC44Ka/n+bcxFXWW7pqGe9g1ZP5RWWdtsG31Vl1hVZy3bMFW7r3jcVtm8Kpvqm++UvsT2oK7ERmKZVTYaCoXYrKIdKkE2J/XffOdSdyRbdcpn39tKX9WOwL1rWJrR11Wq30crOhBUQGXJPnLuh4p9KLH9AaLSYSULnQOBe2xTPVWDlhqUGT80WWJ83485ra6ystfqSEvXtX4E1aUjW12FZpLds0CoHEp9HWN1lcWQWWVPpb5yKulKCyU2l6uvXpWS7KUeGDKVZKNJ5jgCa6mr2uQImmLrBnBN4813qmbIzDRZfaKmvLJ9k0FPBZmZIQ3GfX4C0PNt93nCNnuKzMy6TysHtt19tL5+fHw8oFxX9X3D7Egt9VBZ76rWkQGR1mXmjkvxr9OPrZapdm3WPukqT5ZtNLsOFSUXuvx0dprFRzZavQ5sGjxG/yqavrJ8gey3l+l+u8xaZgtwT6BTpQO791U5qEuHsiMX9F8vVRuDirVoMBwNRoOH1q/DwcRvl58+Xma/ZpfXXX5Plx9+MzvtytKHt3akLvuE1Xua9s05sH10FabCe69C4cUYBxfh+z3dGeSsdWAcF6Vu2YRYyvFKS9KhlSCvsi3DMOXn3v0F+t4wOqtsz0QnujTE9CH7e5ffiKhZhwWa+2LlwS8fQK0Bz4ffnOOq++zKEDdIA37lIfvkCRt6eI5DH1NBINGH1qDPOZmqIz5t1YoFNLa8/M0FlIvq0ueywRwxrhjMJb9osCuMlWN2pDNXrlMQmEFJlsuS3oDhvOlvpVTdK3OlhigW92ovynzk4RX5LrIObuZXLnbE5TYsxc7CVRVzpX3kdtJlM/9ipLveadyBQS6JIQAs2Kq18nsiwph/tC9pFkIL4VcV+9FyV9WX8ajLyqtHYfYPD6815yWurlCO4L93OD2iC+KGKbXbUlNTBq0cGJgM3IfrWJOh+T6sUHBiIVgutxB/jyDQ0M9XOgOHZY8hpXEcNLCiWHH8eXmvqUn6EUbde3LvGD1LIZlGWhhp4IuVZVnt/0aV/+bJA1q3FKQTzGV7zm2vjtnVMPcqhmGTV2CKV8wHv1t+GGwdqJpsQqaHQeXlSuhhULJttGXTxIV2lsumeiSd/VFlEbWvRphrhHcc0LOxJ9xV5eA/VYVmfEVeoeg6QPnD7PTjQTiSmXk3fs4goCOm9iUQyMxdZ8KsI0Cjqw5khnGR/m7smiDb1aDoMtvSJMt6stxqrUm6pJ3YastaftqV2oo3WQf3IE9dgto93VTaZHBTkaHbUJZ3BPBWSwqaiqCr9soIkcyMY4qfZMbouk9i+xxJfBOmccwuME6Fsxag5QOuiz+TerbRwtIBE5ayZilmX20pa/AWX+KaWcert1CgJWBQsjqv1rmsuyDEAqRj6DLaoIY/3lc1KJWVbKWN5YNikVdF/nn8yqrsNi1yfp3pWXAQyzpfZIXfd5FFwctZ2v3D2Hus7ThZatlSMJNTzeCg0eAzpRmsogCKoCfxyYYjvwB+q+xPlO5tyz4s7+J/gky/+R2ZDevF0YVhwHkYLnejfwJf4jojL/gdT/nC9ZcRr0HR8f3xOnYfTGAUjqHjX5ylxTotcjZURZN9E2JZqC7eIWBd0nqKC2WE3OLM3i8ImjF6utyW5xZuM2MMw4wzE4yP8TMBZpKZYqaZGeb2wu1539SmWOFL29U852NC6fa8bxIA4ew1PYfDeL6/vbsp1htbYq7EsdWCWOdLwjYhJwk5L1ZEFzkUxfO9klircMROXMtXOI9NCDV3xAxXKI1CI4DeFasVPlfa5l3GrhQpVEyVxVyFq7sAwpwsi8XdivtMYk1VubowkDsGwFwVJOBUFqsDugTRNbjhc7y9MLYvM1OXTbs06fOLmT2hyC9N+XxUyi1Nt+X5+/8+dus2t50T62ev33wrlDg/V+befMtPZvhCTShlBRRKfE7I1+5mhGqtImQ5VsywVbFWECYyBT7nyxTEPO/PiFsQdSoDW9VLQPJloGx2OlPhSG0wnMhUctxUpsI3xDrqJ1E6+0HI8oEsV8qLdc6X5Qvb/P0sX8njzqIjBr3GFycJSlz9TkGcphtsuc3lp7JCJb9b4Er8zJtvgX/2WqxnhXsogwP+e53HF6cIJBbRyGSOhyIlIR/ICXXoyvlyYmabn8a1AtOidT8VS9x9ukG0NQC3xDq7WeFyfHbKhYL+bk4ssOWKwBa2dzOFba4kEMSVma2J5QLUm8yJxbMfuAIfyIlgWeemqM9FYuXP7ZYLfP4jV0+OrXNVagl8+GnXQR1DorjN38GVcEC2KZa4GedRcCSa88plvlYh3ScIOLVZEatgv80HttC76Nc7WyJ6Ya1aEJz+8cPZYRjcKtu7hckt9FMNKt3ZqpSpH9byYoOcQchVMBIYYXNeKAjkmuzAC+Zgu9Kbb91+IjmFhljgJuHmGDzQe5tv8CAQp7Z3K2evyZP8ea6IJnx5rsQVpvPEGhLVxUAe46Yi5OguoG0MOIJv8748+qsUcK45Dk9lvvJxXqzWMPp4ciKngyt5cAaQLwCjQVQ0Omfz7hh1xOOm8jDnD3ydhIA0ebS5W9oUa2C62yhy/gINWNGPvuMqtUCBmqsDKtSg/NSgdwFAqSRO4laDTJVAQcT2mc/exn0N3bpJwNmCEyxYeAIMeNt5QueINTF/DyMBXZ0lscuiY3lfYZcGW2E374yKqSK3ffYaaub9RS4PXwoUYRkMqknc89S7gSKfq4s5VPClTf7sNe4YiXkOdxpO3KdFIV/g6mc/UCNueOGLLLnV7r2igNbXRg3oA6jETTtXONCbbyeLAjwEnna3KOYrYgmK1So8YgHULsKQNJLnncKIjweKcBY46awbvRAcMjDtRHEX9i1x+YIYKHENDggzJYwmz1+mURbZIoaiGKA4CAeZAjux5ninWCmKZz9MilWhQM+BMl91BkNZACau9Tw3g1gIaWqVGp+nUFrgt+BdiKG5Ol/Ic1OkMwZ3iQ+g1ODhmwibNXG7BG5QSSzkJnHH7hzWRaHOwfUmyrXd7UB5NytCnOkKBcK1jQpXDNDIx4NzR9Uc7hSlEAnJxoSKCFrYhoj38FziL4Qe1DYcB9zm71XAZhBDnegu3EGnUDsUFDBeAujvksPHvY84y3wF/uY4V7XAIThz0y7A8YuK64u+yu7Za36+yjt2Gwror/JFjLZAlXdGP+4ISggK7py07YW2e8MpajjM7w5APPSBtFl+ukrd5TjbZFUswMR17iOa4gSaQc5tEoCbYazxuBM691m1HmQRoshIbN2ZZoOeL2xNVet8jWMh0c+pJKDktHEZ76/cWoQUYZsaohY50kj4bLTmEtVttw5jFiwnz15zBbLYXchWRNDkK2v1Cvwiw6GKumK34hRqiEUzTmGbr5T4bReIBlxgZRfDtDZJcSOL4HCbChuI0JgNuWnnwXGHj85eF8n1nXlimy+g9a0BjJxjMIbmHVitJJJaZS4vfDQEODMMhSJhCn199ppGrx83TBu+GofgEKgJFGE5f00sFIXSZK2SIWfkAzR0MdJvY7YTaX1AU4yvVkFkRxV68M2307VdzIROxwUwbOAK/CzuCCgUM8hDnCeepQjpIoGQ3/bVhRxf9NeFMno/UBeciOJHYIIquG3T7FrHUCWHrldoYHKT9d0GTQ3cXINz5rTBGPA3uHrlzbf+Bo0N3u/MltlptFLgGyAUAijWAPNB4oowTVd3jTFz9sMu6ooZxEtax4yFx8Pp+FhkPBxL4Zsei46H46Gx2Fh8LIFSeCw5lhpLoxTBsg+LMyYcZcIxPAMvnWDCcSaMa5IJp5gwocWYSAi3OBMB43iCiYBzPMlEokwEVQADhMfUeCSEW5qJhpgoYaaYKGGCRQL0CYASESYaY6JxJppgokkmCow0EwsxMaqLMrEIE8M1xsTiTCzBxJJMLMXEiD7GxMEinUQxzsTDDEQnaRMJBtJBMkgFidBiPM0kQkwizKApcEzEIFSKAVECqieSDP3HM3imU0ySxMItzCQjTDLKJGGERJpJxscjYUieDDFJUKUBSTJJ4KWZVIhJhZlUhElFmVSMScWZFDDAIJVkUikmlWbSISYdZtJEHmHSUTSPRsA7Dd7JGJOO4xZnYOZ0koEIaUhCGCFYIASiELRCo5FQGl+CERnUDqGZEGhDVEavJvEFTjgM2jDwSN4waKkfk4CnQqAnGHQJkRSAp4CXIhj4h6ke/QCSFJGBbRgsqb/R2RFiSb1NXR1BfYTsCBEiEIE0iVAZNLABSYC+D0dAGkU5BRQyCrkA9b/TBFCjJAHwoqiPUj3B6JmkxTcGWvKDGPktaGLATxOMuoDgwCezpqMffz3+72Pt/4YfWWY+G2xYJs43LIyYw2aFcXcqcwtzN+3xmFnmDoP6ed9YCFsg2oqNhb091FjE21yNRb3921jc202hB50NE0qRkLeVwnDztn3knt5GjaKIt3Mkt/Z2e+MRgjr7x/G4A6VmJkBGZWpgujR43S/5MILcndn4ULuc+GfULjbQLj3ULj3QLpYeaJdOD7VzgK52DtTVjqCudjGCutqlnSI140un3Ycb1fv51R3/cMuvop+XBuo7e/tRA/jODbDgw143Z3SkN9+pgzMRKI02Z2q9w3MIiTRXVM6+l/QRPDJDoKSc6pJuwB60v8+7daylsLrCyoZmvPlOgtHI835xo0T+0S5Z8GPrnzN7Z39sKrrEapJl66ouuWLNNyRLaV4Ak2z3GpJMFBcqSLx5V+hRMFn0bs48+6PUvwCnHvx8eG4ksQZrNDX1SB2gKKQi1Pil+0IuIOpZeqfru0uvswtGW9WLoNRWHn4jMw+gKpCWZq6rBY+HczdXM1+/rXLtHZzD76iPv6M+/bbGv3lb5ZPRwHIj3uh7kokCX6Oh6d6dlyTjE5iUonSJ0CV8BYDQPI5A5o3wC1EsQG61VxWGnnWDCPP3ubvqaVOhszT2iK6meriwtraGtThWYlg0O/s2jgVosmoqsmX3pY9zhtZ2TgnhNJYH5EqK0UW5o9jKCJhOS+UhusLSGWYfdwwFiw5JWVM67SiHdFSbEkfoDUs3+pf46JJmOCOBBhRwO3RKCmdczI00MDj0VIboH49WQ8FDCY3dz5vw7MtKfDVQkV2xjnpd8KbDVEO2e5BQ6T+kp64J2x0a/gbYt5WlET7XNb54oX5wNDwUYz4HZc0RCVYu4L/FWp+LzT7CjWHZKgzM9lXT6HtclH7Aiz1zomPGYZxKVJRTxew7R7CXe6YrWTe1tTBqQWPY7icXZD2HX9ThLXw/JkcTGnmuwe5Wy/yW42YLZe+MmlVPPYNJ+qH06ZtvVdJ1cIKNXvG6QuGq5AxKpymxbB8WcI1ySv0zTBCAcciDFNk4MtBDEgk2zB1QPkGMbavnR/yDRubLA4ghO4Avq7ZpnH2P5pua1HbP6jFkJG0k6UDpzw6xZEWT5kW5q1rquWi3uY5h2uopdYO0cKlpr6GlnHc+z67DDu0edLtgjnsb54f7nrB3z0Eekzsebw9h1n30Kr/ISX2wB1c3FeDIUcGQyW8V8JCl/pE6sPogW4CVNNWT5+6wnwa5D8Wy0UdPQAJjkLpgoBtIaEVvKX06HAMuDOcYhtQ9YHsm24fvmJh2e7p1pHTwgNYOe5+Rc2xgh0kv/AQWTsIVaGdGPvJzqjuHkPOcvS6J1ZpTi1lcpnluYPBPLj07h3RSN+jnM0Itx91xb9R7XVj2Y9Ir6OE4oitt6Wfo0aDTm5cqPr8oSU6EFENBlhzrSbCeTvEMLXvjEPGlIvIPr9YPBjDVcw+/dpiL/IaQFbD7ZzdFehfPVdwGyiK9cBhE5082DYuma9Phc+4ol0yFfTV20UMJP9lSbRPhwVD2gyxm+L7quujncPwulgvkCFcqR2PaNdVfUYvVmlgmyQrOK156NUEtOt2ZpddprtCfV12be+sKDBln6DphNPhglA9Zte6KLZTqPO5DJp8OmLhzhsNB1fuKZQcfXTJgnisL2NXTi6WymK8INYehy1vYYumFUrXG14nrohdCg7AlTYwOW7QQdPooeJ+7dQuTcIqm41SMLnG6JOiSpEuKLmlc0iG6hOlC03aaKNJEkSaKNFGkiSJNFGlQYDFPlzBdIrfoGqVLjC5xuiTokqRLii5EEiaSMJHQ2iBMqwTsTulCFGGiCBNFmJAjhBwh5IjDP0LYEcKOEHaEsCOEHSH+MaeC6KJEFyW6KDUSJbIo1UaJLEpkUSKLElmUKGJEESOKGFHEaDfJ/Tl2buNvW339+h3ruvHhdmFipDzcOnjnV7LsLKwCo0u5CxuMycVxTbMOFsdty2gvTlBTeLAULfJ//oLz+AafucHD5MzAZCOqL90+WJptL905WJr7b6it73wTufhZAVFe2zDMjuXmYoBBVZe66G478v+Y61MDvsmop+pPSQv4Xz86LYCyHQaJvMGhzKvsT5DkHSkAX75H++9z5P/L9+DjHvFTGgtt7CcOHJek5MQDU9n3fXVd2oeXALpOySyQfM3JDl2XmYnhMF8gh/Z7I943tzCxOOVkrR3YHc23kjF0KG7TXqRFO5JDWtSzfUM7NFBwJ78joxNcmmi7wkAsN9lgYnFc0uxL+ayIDGbL8S3nzcSlZNfLEWhiGIdcvtNu5uxMSeqrbSw5d03N94trtP596A/rHeNQWaP11brsJbYs3fd9dJ6I86hvqPJKiPafNwRQxiNavqEFi95zdGBRc92NxQ72r27AHmx61hTdsBWimPAorksSdTSQbLWlG2vewlJZvxCBXCvNeglBsHagZhhaTe36vqoZLJYlIPJ6y01TMlVT0tUO2+saptR0e20xwOuUxSYfLExca4Wlu9fajIxzzYwx6eQqu17lukDA0Fs3vBO5kTW8aGLpZz+GwkcGgCXXr7FkhisUucpabk1cEy93gjuEprzMsarARkOJaCwUjQzTe6ejkWQilIqEQiEnqdd/gwyQd5uy9YpSt6ZaK/FIOJmMr7JYX4RTCOkPLvj5317JwmbZSIzN/K2TDP0A3Cba5wk7/3vssdWVdJbiDzZr3SfL6EtFsdckWTYVy1p+ei27zON1Inu6yl4mhxGwoltrGbKy/JRSvz3MK4ia0ULQsE+WnzqCeWiyF31GREy5lF7qXPfA0JXlET42ZgwX+BTrQDYWi6aHvD7yIhlzf5TpBMy80JFUzTYe9UwVG46/RrjVgrpiLzKmPtotwylqUuhIbSck/MPYNX7g1Dpzh4SYGJSs7qtf81w8Z9RfSFs7DelZ1nxVTm8YKTWX07ZKqV5L4Pu6JdXCm19G9kNWOpLrtyPdlxX7OHlUfN5/tnMQK0RCRyf5Qqa0vXWSNFMprcyVQ9Hq/g6vlEFUV7a04+dNM5o62Dmx0qEvo7Tu+mREUTcAHI6Iq9oqJRuvK6/sYaYeJC0dcx0uF+IEjjO4HEf3I46PcZXjTJPjd7iddqbB8RZXce5weO7ifYfL7HA8z3mv+/1e247/j8wmE+54xuM9X5GzaPrBADx0t/V9xXY2Uwr9FcCRBo+j12WIL26nstF0cDpLcwT+d1S9ZxvsEza8mk6z/G5lECDRzPlC0nGetbIX39gLg4TFPgg+zWIzo7CWQZvOfkt55E4dlIRGOW9Zo6fb+8P16vjAmZzlqhOjpofB4et3BYeReO5aI/CnzBpeNBkLUwwZybsbCw3/KmAs4maJ+b0o6oj9GTM28rIUgE99//S+fryfyti93ZIVfl7aifV3hC6c79gyT+qvpFNzu26VI1uyoZ4C2nhVFjp8vXGYKHOJZq2W2ai+iAKeOYjtb0RzTW278+zLSLNla6WdyE5DK5mb3czLaP7Fq3gx+ywXTbafx8yTrNl2fXn83HX8rhLMwsRFJf7+fZVISfqRHVeeZaXaYcnaiYXr9m7+RVnd2Hq2vxtN7lZ4K90/aeT6z05CnNnfCGdyiX6ef3lY2slkmvvlnePYfnkzcxp/GRcqpWoumkqWas3CK6hT3Xx28CpXTSbqWw11s7fVFa6Iz1wQ+h/fV+hSezPx/MR61T1uFnLF3Xi+3KxGQzCo8MyM67WadLCVr3R60mFms7gFScIHx+Uyt//ypVqt5NvPZCVVfl4vHhQSae2oEI9xG+rRSymhylrkJN8Bfr5eq2Or2+TkTG2n265YVwSfvsHu//KTVIhYfe7woFJoHh/WMulQNHsgmhnIk9isSRsHr7q70ecbcSu78aIC++7z7dP6zm69lTosV0PNciLXPmk8z0nJbVAkX+zU+6excvHgefKkqecytRehbvfVgcaVGr3Obo2M9bxsvTx6W3/8NCeSd7XnYvE01NjdN3dCfSUuHJe4rrHxMnwi5huV/ahy/PzFaTO0D3EbL3byL8KJTKOBEdHXzNSpnjN1e4vPp4uHuVOjpeVf9iKv7FziVepKX9CAvzOYv65dJlwfM8Z/NIXvR1MEfjTFzLlis+dFb/KnwuiLAgDv+r582w4EC/I1Aq5fWDxeeelwcfV//fng/P3Z8b6lnCcXD7OKr08nHuYRX5NAfClz+GrK8Hmu8DBJ+Dw7+HJa8DAf2JWQCaUp3yWUZEJRJhyixJdQignhCmCMCSWYUPzjr8dnL7zYuXhW7Teae2pHodRhKsmD1GFi/yF7+EP28Ifs4Q/Zwx+yhz9kD3/IHv6QPfwXmj1MS5kPCcQfEog/JBD/BX6c3crEyIbO3dz9Se/ORk4cB6dbo3++OrH0V6NvfhfHe6qsDM5RHHzfyPEIvRbZp/dSI69Krj3l9jun3LfoJzy8w5Thrvfi853B32Y77xiZQWFiUPAPtsa+C/tbel/J9Ns+f1pupiMx2X1T472CXZyrS5oqS3S2uWkave4QbfBmYXEE4rsCcd6TDA+E/AOLOW9w/wmBV9JYXeqw7m+ssDr9uTrbV958B+jZ9+cneMHHTXP96cXfXhh86oRpm4res3XV++EVesfr7fDpkKlpKqddTXJynkZOm87Pmt7G/y1VmXfxZTXp4Mh4dANjWe17hxIPmj1NU2zrwQ0NOeg97S21DoamPq0e0DHa6C/dWH3jkC0bpr2vaIf00yMOgmodSZp+nk7qvEM/VA9Yq2sa9BsJRhNGtDSFMu86Drp+2Hu8jibeLURdlVVW7kvDVEtpJEnZE6336C2GHf3cnNxGIju1yiB7ThlJOHyHrI/Xb7Tn43X0zNPhcP3grOwHZ/0v76wB9x0n+65jpJEshovh+Huf6OR0jnog5UBbNm6ukzgn9l3DVo4044OXf/Dy/9yQ/MFhPzjsf3mHff+w7HeXyg76j8qV8063Pr32Z5UoZ2Dd2wS4x/7+H/3bf4GLvyX4/wE1p0V+k1QAAA=='],
'ctl00$ctl00$SearchAdvanced1$ActivitiesAndProductsSearch1$RadioButtonList1': ['TSMEDIA '
'dejavnost'],
'ctl00$ctl00$SearchAdvanced1$DropDownListYearSelection': ['2013'],
'ctl00$ctl00$SearchAdvanced1$SteviloZaposlenihDo': ['do'],
'ctl00$ctl00$SearchAdvanced1$SteviloZaposlenihOd': ['od'],
'ctl00$ctl00$SearchAdvanced1$ddlLegalEvents': ['0'],
'ctl00$ctl00$loginBoxPopup$loginBox1$Password': ['lukec12345'],
'ctl00$ctl00$loginBoxPopup$loginBox1$UserName': ['Lukec'],
'ctl00_ctl00_ScriptManager1_HiddenField': [';;AjaxControlToolkit, '
'Version=3.5.40412.0, '
'Culture=neutral, '
'PublicKeyToken=28f01b0e84b6d53e:sl:1547e793-5b7e-48fe-8490-03a375b13a33:475a4ef5:effe2a26:3ac3e789:5546a2b:d2e10b12:37e2e5c9:5a682656:12bbc599:1d3ed089:497ef277:a43b07eb:751cdd15:dfad98a5:3cf12cf1'],
'hiddenInputToUpdateATBuffer_CommonToolkitScripts': ['1']}
That's a lot more information than a simple username / password combination.
See post request using python to asp.net page for approaches on how to handle such pages instead.

HTTP Error 401: Authorization Required while downloading a file from HTTPS website and saving it

Basically i need a program that given a URL, it downloads a file and saves it. I know this should be easy but there are a couple of drawbacks here...
First, it is part of a tool I'm building at work, I have everything else besides that and the URL is HTTPS, the URL is of those you would paste in your browser and you'd get a pop up saying if you want to open or save the file (.txt).
Second, I'm a beginner at this, so if there's info I'm not providing please ask me. :)
I'm using Python 3.3 by the way.
I tried this:
import urllib.request
response = urllib.request.urlopen('https://websitewithfile.com')
txt = response.read()
print(txt)
And I get:
urllib.error.HTTPError: HTTP Error 401: Authorization Required
Any ideas? Thanks!!

You can do this easily with the requests library.
import requests
response = requests.get('https://websitewithfile.com/text.txt',verify=False, auth=('user', 'pass'))
print(response.text)
to save the file you would type
with open('filename.txt','w') as fout:
fout.write(response.text):
(I would suggest you always set verify=True in the resquests.get() command)
Here is the documentation:

Doesn't the browser also ask you to sign in? Then you need to repeat the request with the added authentication like this:
Python urllib2, basic HTTP authentication, and tr.im
Equally good: Python, HTTPS GET with basic authentication

If you don't have Requests module, then the code below works for python 2.6 or later. Not sure about 3.x
import urllib
testfile = urllib.URLopener()
testfile.retrieve("https://randomsite.com/file.gz", "/local/path/to/download/file")

You can try this solution: https://github.qualcomm.com/graphics-infra/urllib-siteminder
import siteminder
import getpass
url = 'https://XYZ.dns.com'
r = siteminder.urlopen(url, getpass.getuser(), getpass.getpass(), "dns.com")
Password:<Enter Your Password>
data = r.read() / pd.read_html(r.read()) # need to import panda as pd for the second one

Does urllib2 support server which returns both basic and digest authentication

I'm using urllib2.
I have problem to login a server which returns both basic and digest authentication.
it returns :
WWW-Authenticate: Digest realm="rets#aus.rets.interealty.com",nonce="c068c3d7d30cc0cd80db4d1c599e6d54",opaque="e75078c8-a825-474b-b101-f8ca2d1627ca",qop="auth"
WWW-Authenticate: Basic realm="rets#aus.rets.interealty.com"
here's my code :
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(realm=None, uri='http://aus.rets.interealty.com', user='user', passwd='pwd')
opener = urllib2.build_opener(urllib2.HTTPDigestAuthHandler(passman))
urllib2.install_opener(opener)
retsRequest= urllib2.Request('http://aus.rets.interealty.com/Login.asmx/Login')
retsRequest.add_header("User-Agent", 'userAgent')
retsRequest.add_header("RETS-Version",'retsVersion')
response=urllib2.urlopen(retsRequest)
print response.read()
I can login this server using IE and it seems IE uses digest authentication.

I have not encountered the same problem, but was once surprised to realize that the opener with urllib2.HTTPBasicAuthHandler makes actually two requests instead of one: the first without authentification and then as fallback the second with the authentification.
That is it may turn out that your case would require three requests, where the third request might forget the authentification from the second - one should check.
And you should probably add both: urllib2.HTTPDigestAuthHandler, and urllib2.HTTPBasicAuthHandler to the opener.

Recently I have time to review this question, I think I found the answer.It's kind of a bug of python urllib2. In urllib2:
class AbstractDigestAuthHandler:
def http_error_auth_reqed(self, auth_header, host, req, headers):
authreq = headers.get(auth_header, None)
if self.retried > 5:
# Don't fail endlessly - if we failed once, we'll probably
# fail a second time. Hm. Unless the Password Manager is
# prompting for the information. Crap. This isn't great
# but it's better than the current 'repeat until recursion
# depth exceeded' approach <wink>
raise HTTPError(req.get_full_url(), 401, "digest auth failed",
headers, None)
else:
self.retried += 1
if authreq:
scheme = authreq.split()[0]
if scheme.lower() == 'digest':
return self.retry_http_digest_auth(req, authreq)
Here authreq is :
Basic realm="rets#tra.rets.interealty.com", Digest realm="rets#tra.rets.interealty.com",nonce="512f616ed13813817feddb4fb0ce9e2d",opaque="84f5e814-d38a-44b4-8239-3f5be6ee3153",qop="auth"
authreq.split()[0] will be "Basic", it will never be "digest", so urllib2 will not do the second request in a digest authentication.
Basicly urllib2 assumes only one authentication can be in the first 401 response header. Unfortunately this server returns two types of authentication.
To solve this problem, either you can use the basic authentication or you can use other library like 'requests'.

Python: How do you login to a page and view the resulting page in a browser?

I've been googling around for quite some time now and can't seem to get this to work. A lot of my searches have pointed me to finding similar problems but they all seem to be related to cookie grabbing/storing. I think I've set that up properly, but when I try to open the 'hidden' page, it keeps bringing me back to the login page saying my session has expired.
import urllib, urllib2, cookielib, webbrowser
username = 'userhere'
password = 'passwordhere'
url = 'http://example.com'
webbrowser.open(url, new=1, autoraise=1)
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
opener.open('http://example.com', login_data)
resp = opener.open('http://example.com/afterlogin')
print resp
webbrowser.open(url, new=1, autoraise=1)

First off, when doing cookie-based authentication, you need to have a CookieJar to store your cookies in, much in the same way that your browser stores its cookies a place where it can find them again.
After opening a login-page through python, and saving the cookie from a successful login, you should use the MozillaCookieJar to pass the python created cookies to a format a firefox browser can parse. Firefox 3.x no longer uses the cookie format that MozillaCookieJar produces, and I have not been able to find viable alternatives.
If all you need to do is to retrieve specific (in advance known format formatted) data, then I suggest you keep all your HTTP interactions within python. It is much easier, and you don't have to rely on specific browsers being available. If it is absolutely necessary to show stuff in a browser, you could render the so-called 'hidden' page through urllib2 (which incidentally integrates very nicely with cookielib), save the html to a temporary file and pass this to the webbrowser.open which will then render that specific page. Further redirects are not possible.

I've generally used the mechanize library to handle stuff like this. That doesn't answer your question about why your existing code isn't working, but it's something else to play with.

The provided code calls:
opener.open('http://example.com', login_data)
but throws away the response. I would look at this response to see if it says "Bad password" or "I only accept IE" or similar.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python - check if file/webpage exists - python

Related

Python Requests: Check if Login was successful

Trying to access a password protected url using python

HTTP Error 401: Authorization Required while downloading a file from HTTPS website and saving it

Does urllib2 support server which returns both basic and digest authentication

Python: How do you login to a page and view the resulting page in a browser?

Categories

Resources