HTTP Error 401 using Mechanize for Python scraping script - python

I am writing a script to automatically scrape information from my company's directory website using mechanize. However, the interpreter raises _response.httperror_seek_wrapper: HTTP Error 401: Authorization Required on br.open(url) when I run my script.
This is the portion of my code where the interpreter runs into the error.
from sys import path
path.append("./mechanize/mechanize")
import _mechanize
from base64 import b64encode

def login(url, username, password):
    b64login = b64encode('%s:%s' % (username, password))
    br = _mechanize.Browser()
    br.set_handle_robots(False)
    br.addheaders.append(('Authorization', 'Basic %s' % b64login))
    br.open(url)
    r = br.response()
    print r.read()
The site I am trying to access is an internal site on my company's network, and it uses a GlobalSign certificate for authentication on company-issued computers.
I am sure the authentication information I am inputting is correct, and I have looked everywhere for a solution. Any hints on how to resolve this? Thanks!

It looks like your authentication methods don't match up. You state that your company uses GlobalSign certificates, but your code is using Basic authentication. The two are not the same thing.
From a brief look at the mechanize documentation (limited as it is), you don't implement authentication by manually adding headers. It has its own add_password method for handling authentication.
Also, as a general HTTP authentication policy, you should not use preemptive authentication by adding the Authorization header yourself. Set your code up with the necessary credentials (based on your library's documentation) and let it handle the authentication negotiation.
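Mechanize's add_password follows the password-manager pattern it inherited from the standard library's urllib2 (urllib.request in Python 3). A minimal stdlib sketch of the same idea, with a placeholder URL and credentials; mechanize's equivalent is br.add_password(url, username, password):

```python
import urllib.request

# Register credentials with a password manager instead of hand-building the
# Authorization header; the handler answers the server's 401 challenge.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://directory.example.com/",
                          "username", "password")

auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)
# opener.open("https://directory.example.com/")  # performs the authenticated request
```

Letting the opener negotiate means the credentials are only sent once the server actually challenges, and the correct scheme is chosen for you.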

Related

How to pass credential to REST API

I am using the Python code below, but it keeps throwing a wrong-username-or-password error. It looks like the credentials are not being parsed correctly, but I know they are correct since the same request works when I use cURL from a DOS command prompt.
import requests as re
import json
re.packages.urllib3.disable_warnings()
url = 'https://nwsppl300p:9090/nwrestapi/v3/global/clients/?q=hostname:BMCJCA001T.corpads.local'
auth = ('cormama.remote\jamal', 'Inventigation100get$pump')
r = re.get(url, auth=auth,verify=False)
print (r.content)
Getting message
b'{"message":"Unauthorized access: The username or password is incorrect","status":{"code":401,"codeClass":"Client Error","reasonPhrase":"Unauthorized"},"timestamp":"2022-06-17T15:00:14-04:00","userAgentRequest":{"headers":[{"name":"Accept","value":"*/*"},{"name":"Accept-Language"},{"name":"Content-Type"}],"method":"GET","query":"q=hostname:BMCJCA001T.corpads.local","url":"https://nwsppl300p:9090/nwrestapi/v3/global/clients/"},"version":"19.5.0.5.Build.154"}'
It seems to me that you are either providing the wrong creds, or perhaps in the wrong format.
Are you able to access your site in a browser using those credentials?
Do you know how to use Fiddler Classic?
You can use Fiddler to capture the call the browser makes (turn on HTTPS decryption) and inspect it to understand the format needed. Note: Fiddler is a proxy, so if you leave it running while debugging it may interfere with VS Code if you are using that to debug. You can use the following to get past proxy certificate errors:
os.environ['CURL_CA_BUNDLE'] = ''
The example below requires that I POST a JSON body with my creds in order to get my auth token. Your site may be different, but you can use this approach to figure out what it needs specifically.
In the example shown (the field names come from the Fiddler capture and will differ per site):
userName = {"email": "someEmail", "password": "somepass"}
auth = re.post(url, json=userName)
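A sketch of the full flow this answer describes. The URLs, the "token" field name, and the Bearer scheme are all assumptions here; replace them with whatever the Fiddler capture shows your API actually uses:

```python
import requests

def bearer_headers(token):
    # Attach the token from the login response to subsequent API calls.
    # "Bearer" is one common scheme; your API may expect a different prefix.
    return {"Authorization": "Bearer " + token}

# Hypothetical endpoints -- substitute what Fiddler shows:
# login = requests.post("https://example.com/api/login",
#                       json={"email": "someEmail", "password": "somepass"})
# token = login.json()["token"]  # field name taken from the captured response
# r = requests.get("https://example.com/api/clients",
#                  headers=bearer_headers(token))
```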

Why do I keep on getting GET request error from GitHub API

I'm trying to learn the requests library in python and I'm following a guide. I'm sending a get request to api.github.com/user but I keep on getting a Status Code of 401. For username, I was using my email at first, but I thought that was what was making it fail so I changed it to my GitHub username and it still doesn't work. Is there anything I'm doing wrong or are there solutions?
import requests
from getpass import getpass
response = requests.get(
    "https://api.github.com/user",
    auth=('username', getpass())
)
print(response)
You can no longer authenticate to the GitHub API using Basic authentication (a username and password). That ability has been removed. This API endpoint requires authentication because it tells you the current user, and when you're not logged in, there is no current user.
You'll need to generate a personal access token with the appropriate scopes and use it to authenticate to the API instead. You can also use an OAuth token if you're using an OAuth app, but it doesn't sound like you need that in this case.
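A sketch using a personal access token instead (the token value below is a placeholder). Building a prepared request lets you inspect exactly what would be sent before performing the call:

```python
import requests

token = "ghp_exampletoken"  # hypothetical personal access token
req = requests.Request(
    "GET",
    "https://api.github.com/user",
    headers={"Authorization": "token " + token},
).prepare()

# requests.Session().send(req) would perform the call
```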

Python web/server authentication

I am trying to create a very basic app that will be able to connect to a web server which host my college assignments, results, and more and notify me when ever there's something new on it. Currently I am trying to get the hang of the requests module, but I am not able to login as the server uses this kind of authentication, and it gives me error 401 unauthorized.
I tried searching for how to authenticate to web servers and tried using sockets, with no luck. Could you please help me figure out how to do this?
EDIT: I am using python 3.4
After inspecting the headers in the response for that URL, I think the server is trying to use NTLM authentication.
Try installing requests-ntlm (e.g. with pip install requests_ntlm) and then doing this:
import requests
from requests_ntlm import HttpNtlmAuth
requests.get('http://moodle.mcast.edu.mt:8085/',
             auth=HttpNtlmAuth('domain\\username', 'password'))
You need to attach a Basic authentication header to the socket request yourself. Example (Python 2):
import base64
mySocket.send('GET / HTTP/1.1\r\nAuthorization: Basic %s\r\n\r\n' % base64.b64encode('user:pass'))
Python 3 (note that b64encode returns bytes, so decode it before formatting it into the string, or you will send the literal text b'...'):
import base64
template = 'GET / HTTP/1.1\r\nAuthorization: Basic %s\r\n\r\n'
token = base64.b64encode('user:pass'.encode('UTF-8')).decode('ascii')
mySocket.send(bytes(template % token, 'UTF-8'))
The site might not supply a programmatic API with which to authorize requests. If that is the case, you could try using Selenium to open a browser and fill in the details for you. Selenium apparently has handling for alert boxes too, though I haven't used that myself.

JIRA REST API and kerberos authentication

I am struggling with Jira REST API authentication via kerberos. Basic authentication works as expected.
If I access the login page with a web browser (after running kinit) and then use the generated JSESSIONID in my Python script, I can use the REST API without getting a 401. But I have no idea how to do that from my Python script; I tried requests_kerberos, but when I request the login page, it simply returns the basic login form instead of logging me in automatically.
Do you know how to use the JIRA REST API with Kerberos authentication?
Thanks for your answers.
After a day of struggle I finally figured it out.
First you have to send an HTTP GET request to ${jira-url}/step-auth-gss:
r = requests.get("https://example-jira.com/step-auth-gss", auth=requests_kerberos.HTTPKerberosAuth())
Then you get the JSESSIONID from the cookie header and you can REST away:
rd = requests.get(url, headers={"Cookie": "JSESSIONID=%s" % r.cookies['JSESSIONID']})
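Rather than copying the Cookie header by hand, a requests.Session carries cookies from the handshake to later calls automatically. A sketch (the Kerberos call is commented out because it needs requests_kerberos and a live server; the cookie value set below is a stand-in for what the server would return):

```python
import requests
# from requests_kerberos import HTTPKerberosAuth

session = requests.Session()
# session.get("https://example-jira.com/step-auth-gss", auth=HTTPKerberosAuth())

# The JSESSIONID set by the server lands in the session's cookie jar;
# simulate that here to show later requests would send it automatically:
session.cookies.set("JSESSIONID", "ABC123", domain="example-jira.com")

# session.get("https://example-jira.com/rest/api/2/myself")  # cookie sent for you
```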
As explained by VaclavDedik, the first step is to get a valid JSESSIONID cookie (along with atlassian.xsrf.token and crowd.token_key cookies if you use Crowd for user management and SSO) upon successful Kerberos authentication on a private Jira resource / URL.
In Python, the PycURL package makes it very easy to authenticate with Kerberos. You can install it on Windows/Mac OS/Linux either with easy_install or pip. The PycURL package relies on libcurl. You will need to check that your libcurl version is >=7.38.0 as the HTTPAUTH_NEGOTIATE directive was introduced in that very version.
Then, it is as simple as:
import pycurl
curl = pycurl.Curl()
# GET JSESSIONID
curl.setopt(pycurl.COOKIEFILE, "")
curl.setopt(pycurl.HTTPAUTH, pycurl.HTTPAUTH_NEGOTIATE)
curl.setopt(pycurl.USERPWD, ':')
curl.setopt(pycurl.URL, <ANY_JIRA_PRIVATE_URL>)
curl.perform()
# Then REST request
curl.setopt(pycurl.URL, <YOUR_JIRA_REST_URL>)
curl.perform()
curl.close()
Please, check out the following page for detailed examples in Python, PowerShell and Groovy: https://www.cleito.com/products/iwaac/documentation/integrated-windows-authentication-for-non-browser-clients/
Though this is the official documentation of the Cleito IWAAC plugin mentioned by Xabs, this will work with any server-side Kerberos plugin for Jira

How can I pass my ID and my password to a website in Python using Google App Engine?

Here is a piece of code that I use to fetch a web page HTML source (code) by its URL using Google App Engine:
from google.appengine.api import urlfetch

url = "http://www.google.com/"
result = urlfetch.fetch(url)
if result.status_code == 200:
    print "content-type: text/plain"
    print
    print result.content
Everything is fine here, but sometimes I need to get the HTML source of a page on a site where I am registered, and I can only access that page if I first provide my ID and password. (It could be any site, actually, such as a mail provider like Yahoo: https://login.yahoo.com/config/mail?.src=ym&.intl=us or any other site where users register for free accounts.)
Can I somehow do this in Python (through Google App Engine)?
You can check for an HTTP status code of 401, "authorization required", and provide the kind of HTTP authorization (basic, digest, whatever) that the site is asking for -- see e.g. here for more details (there's not much that's GAE specific here -- it's a matter of learning HTTP details and obeying them!-).
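For Basic auth specifically, urlfetch.fetch accepts a headers dict, so you can build the Authorization header with the standard library. A minimal sketch, with placeholder credentials; the function name is my own:

```python
import base64

def basic_auth_headers(username, password):
    # Basic auth value is base64("user:pass"); pass the resulting dict as
    # urlfetch.fetch(url, headers=basic_auth_headers(...))
    token = base64.b64encode(('%s:%s' % (username, password)).encode('ascii')).decode('ascii')
    return {'Authorization': 'Basic %s' % token}
```

For digest or form-based logins the flow is more involved (challenge parsing or form submission), so check which scheme the 401 response's WWW-Authenticate header asks for first.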
As Alex said, you can check the status code and see what type of authorization the site wants, but you cannot generalize this: some sites give no hint, or only allow login through a non-standard form. In those cases you may have to automate the login process by submitting the form, either with a library like twill (http://twill.idyll.org/) or by coding a specific form submission for each site.
