I would like to access a resource at a particular URL. Let's say I only have access to a PC (without admin rights) on which I cannot use the requests module, for various reasons.
Normally, I would call an API and perform HTTP GET and HTTP POST requests with:
import requests
url = r"https://httpbin.org/json"
r = requests.get(url)
If I wanted to provide headers and authentication details, I would add
headers = {"Content-Type": "application/json"}
auth = ("username", "password")
r = requests.post(url, auth=auth, headers=headers)
as well as the payload in the API's data exchange format (either JSON or XML).
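For example, here is a minimal sketch of a JSON POST (httpbin.org simply echoes the payload back):
import requests
url = "https://httpbin.org/post"
headers = {"Content-Type": "application/json"}
auth = ("username", "password")
payload = {"key": "value"}
# json= serializes the dict to JSON (and would set the Content-Type header on its own)
r = requests.post(url, auth=auth, headers=headers, json=payload)
print(r.status_code, r.json())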
Unfortunately, I cannot use the requests module on the aforementioned system. However, I can use the selenium module with the Internet Explorer webdriver (no Firefox and no Chrome).
I tried to access the url of the API with
from selenium import webdriver
driver = webdriver.Ie()
driver.get(url)
This opens an authentication popup, which I cannot reach with Selenium's switch_to functions. Ideally, I would like to perform an HTTP POST via Selenium and provide authentication as well as header information. Would that be possible?
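One possible workaround, sketched below and not tested on every IE configuration: have the browser itself issue the request with a synchronous XMLHttpRequest, which accepts basic-auth credentials directly in xhr.open(). The payload string is a placeholder, and the request must target the same origin as the loaded page (or a CORS-enabled endpoint):
script = """
var xhr = new XMLHttpRequest();
xhr.open('POST', arguments[0], false, arguments[1], arguments[2]);  // false = synchronous
xhr.setRequestHeader('Content-Type', 'application/json');
xhr.send(arguments[3]);
return xhr.responseText;
"""
response_text = driver.execute_script(script, url, "username", "password", '{"key": "value"}')
print(response_text)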
I am new to Python and web scraping, and I'm trying to scrape a website that uses JavaScript. I have managed to automate the login sequence via Selenium; however, when I try to send the API call to get the data, I am not able to get anything. I'm assuming it's because the API call requires some sort of authentication. How can I get past this?
Here's my code:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import time
import pandas as pd
import requests
import json
username = 'xxx'
password = 'xxx'
url = 'https://www.example.com/login'
#log in
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
driver.find_element(By.XPATH, '//*[@id="username"]').send_keys(username)
driver.find_element(By.XPATH, '//*[@id="password"]').send_keys(password)
driver.find_element(By.XPATH, '//*[@id="login_button"]').click()
# go to User Lines
driver.get('http://www.example.com/lines')
time.sleep(5)
response = requests.request("GET", url, headers=headers, data=payload)  # headers and payload are defined elsewhere
subs = json.loads(response.text)
print(subs)
Every time an HTTP request is made, some metadata is included: the header data, cookies, and perhaps other session data. It has to be sent every time, because that's the only way to maintain a 'session'.
If you log in with Selenium, the browser manages your session there. Making a request with the Python requests library has nothing to do with Selenium, and most likely the authentication you're missing is exactly what the Selenium login provides.
So you have a few options:
1. Make the API call using Selenium. After logging in, just get() the API URL; the page source should be the data within a tag.
2. Log in using the requests library. Instead of using Selenium, you can use requests exclusively. This can be tedious: you'll have to inspect the network calls in the devtools and piece together what you need to replicate with requests to simulate the login that happens in the browser. You would also need a persistent session; create one with requests.Session() and make your requests through that session instance rather than through the requests module directly. Once that works, you can make the API request as you were. This method also has the fastest runtime, since you're not rendering a whole browser, running its JavaScript, and making all the network requests that entails.
3. Pass the session data from Selenium to your requests Session instance. I haven't tried this, but since session data is just strings passed along in the headers, you can probably get the cookies from Selenium and add them to your Session to make the API call without Selenium; a minimal sketch follows below.
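For option 3, a minimal sketch, assuming driver is the logged-in Selenium instance and api_url is the endpoint you were calling (both placeholders):
import requests
s = requests.Session()
# copy the browser's cookies into the requests session
for cookie in driver.get_cookies():
    s.cookies.set(cookie['name'], cookie['value'])
response = s.get(api_url)
subs = response.json()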
I can log in to a website with Selenium and collect all of its cookies.
But then I have to submit a request to the site quickly, and Selenium is very slow.
That's why I want to grab the cookies with Selenium and send the requests via the requests module.
My Selenium code (first I log in to the website and collect all cookies with the code below):
browser.get('https://www.example.com/login')
cookiem1 = browser.get_cookies()
print(cookiem1)
In the second stage, I go to another page of the website and make a request:
s = requests.Session()
for cookie in cookiem1:
    s.cookies.set(cookie['name'], cookie['value'])
r = s.get("https://example.com/postcomment")
print(r.content)
I pass the cookies along this way, but when I send the request via the requests module, the site does not authorize my user.
My error:
"errorMessage": "Unauthorized user",\r\n "errorDetails": "No cookie"
It seems the site does not recognize my session with this code.
Thanks in advance
Try this:
import requests as re
# copy every cookie collected by Selenium into a persistent requests session
ck = browser.get_cookies()
s = re.Session()
for c in ck:
    s.cookies.set(c['name'], c['value'])
response = s.get("https://example.com/postcomment")
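If the site still answers "No cookie", the cookies' domain and path attributes may be what's missing; here is a variant that carries them over (still a sketch, since which attributes the site actually checks is an assumption):
import requests
s = requests.Session()
for c in browser.get_cookies():
    # keep domain and path so the cookie matches the target URL
    s.cookies.set(c['name'], c['value'], domain=c.get('domain'), path=c.get('path', '/'))
response = s.get("https://example.com/postcomment")
print(response.status_code)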
So I'm trying to log in to my Hotmail account via Python and keep getting this response on the page when I make this request:
r = requests.post('https://login.live.com', auth=('Email', 'Pass'), verify=False)
Cookies must be allowed
Your browser is currently set to block cookies. Your browser must allow cookies before you can use a Microsoft account.
Cookies are small text files stored on your computer that tell Microsoft sites and services when you're signed in. To learn how to allow cookies, see online help in your web browser.
I would also like to mention that I am trying to HTTP POST to this page because I would rather handle the cookies in the response and access other pages of my Microsoft profile (rather than just accessing my email via the SMTP server).
Thanks!
Edit :
import requests
s = requests.Session()
r = s.get('https://login.live.com', verify=False)
r = s.post('https://login.live.com', auth=('user', 'pass'), verify=False)
print(r.status_code)
print(r.text)
Use requests.Session to persist a session (with cookies included):
import requests
s = requests.Session()
res = s.get('https://login.live.com')  # any cookies set here are stored on the Session
cookies = dict(res.cookies)
res = s.post('https://login.live.com',
             auth=('Email', 'Password'),
             verify=False,
             cookies=cookies)  # optional: the Session resends its stored cookies anyway
How can I use automatic NTLM authentication from Python on Windows?
I want to be able to access the TFS REST API from Windows without hardcoding my password, the same way I do from a web browser (Firefox's network.automatic-ntlm-auth.trusted-uris, for example).
I found this answer which works great for me because:
I'm only going to run it from Windows, so portability isn't a problem
The response is a simple json document, so no need to store an open session
It uses the WinHTTP.WinHTTPRequest.5.1 COM object to handle authentication natively:
import win32com.client
URL = 'http://bigcorp/tfs/page.aspx'
COM_OBJ = win32com.client.Dispatch('WinHTTP.WinHTTPRequest.5.1')
COM_OBJ.SetAutoLogonPolicy(0)  # 0 = always send the logged-on user's credentials
COM_OBJ.Open('GET', URL, False)  # False = synchronous request
COM_OBJ.Send()
print(COM_OBJ.ResponseText)
You can do that with https://github.com/requests/requests-kerberos. Under the hood it uses https://github.com/mongodb-labs/winkerberos. The latter is marked as beta and I'm not sure how stable it is, but I have had requests-kerberos in use for a while without any issues.
A more stable solution might be https://github.com/brandond/requests-negotiate-sspi, which uses pywin32's SSPI implementation.
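A minimal sketch with requests-kerberos (the TFS URL is the placeholder from the question; mutual_authentication=OPTIONAL relaxes mutual authentication, which some servers don't complete):
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL
response = requests.get('http://bigcorp/tfs/page.aspx', auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL))
print(response.text)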
I found a solution here: https://github.com/mullender/python-ntlm/issues/21
pip install requests
pip install requests_negotiate_sspi
import requests
from requests_negotiate_sspi import HttpNegotiateAuth
GetUrl = "http://servername/api/controller/Methodname"  # set your Web API URL here
response = requests.get(GetUrl, auth=HttpNegotiateAuth())
print("Get Request Output:")
print("--------------------")
print(response.content)
For requests over HTTPS:
import requests
from requests_negotiate_sspi import HttpNegotiateAuth
import urllib3
urllib3.disable_warnings()  # suppress the warning triggered by verify=False
GetUrl = "https://servername/api/controller/Methodname"  # set your Web API URL here
response = requests.get(GetUrl, auth=HttpNegotiateAuth(), verify=False)
print("Get Request Output:")
print("--------------------")
print(response.content)
NTLM credentials are based on data obtained during the interactive logon process and include a one-way hash of the password, so you have to provide the credentials.
Python has the requests_ntlm library, which allows for HTTP NTLM authentication.
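A minimal sketch with requests_ntlm; note that, unlike the SSPI options above, it needs the credentials spelled out (the URL and DOMAIN\username are placeholders):
import requests
from requests_ntlm import HttpNtlmAuth
url = "http://servername/api/controller/Methodname"  # placeholder Web API URL
response = requests.get(url, auth=HttpNtlmAuth('DOMAIN\\username', 'password'))
print(response.status_code)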
You can reference this article to access the TFS REST API :
Python Script to Access Team Foundation Server (TFS) Rest API
If you are using TFS 2017 or VSTS, you can try using a Personal Access Token in a Basic Auth HTTP header along with your REST request.
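A sketch of the PAT approach: requests builds the Basic Auth header from an empty username and the token as the password (the URL and token are placeholders):
import requests
pat = "your-personal-access-token"  # placeholder
url = "http://servername/tfs/DefaultCollection/_apis/projects?api-version=2.0"
response = requests.get(url, auth=("", pat))
print(response.status_code)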
I am trying to use urllib2 through a proxy; however, after trying just about every variation of passing my verification details with urllib2, I either get a request that hangs forever and returns nothing, or I get 407 errors. I can connect to the web fine using my browser, which connects via a proxy auto-config (PAC) file and redirects accordingly; however, I can't seem to do anything from the command line with curl, wget, urllib2, etc., even if I use the proxies that the PAC file redirects to. I tried setting my proxy to each of the proxies from the PAC file using urllib2, none of which work.
My current script looks like this:
import urllib2 as url
proxy = url.ProxyHandler({'http': 'username:password@my.proxy:8080'})
auth = url.HTTPBasicAuthHandler()
opener = url.build_opener(proxy, auth, url.HTTPHandler)
url.install_opener(opener)
url.urlopen("http://www.google.com/")
which throws HTTP Error 407: Proxy Authentication Required. I also tried:
import urllib2 as url
handlePass = url.HTTPPasswordMgrWithDefaultRealm()
handlePass.add_password(None, "http://my.proxy:8080", "username", "password")
auth_handler = url.HTTPBasicAuthHandler(handlePass)
opener = url.build_opener(auth_handler)
url.install_opener(opener)
url.urlopen("http://www.google.com")
which hangs the way curl or wget do before timing out.
What do I need to do to diagnose the problem? How is it possible that I can connect via my browser but not from the command line on the same computer, using what would appear to be the same proxy and credentials?
Might it have something to do with the router? If so, how can it distinguish between browser HTTP requests and command-line HTTP requests?
Frustrations like this are what drove me to use Requests. If you're doing significant amounts of work with urllib2, you really ought to check it out. For example, to do what you wish to do using Requests, you could write:
import requests
from requests.auth import HTTPProxyAuth
proxy = {'http': 'http://my.proxy:8080'}
auth = HTTPProxyAuth('username', 'password')
r = requests.get('http://www.google.com/', proxies=proxy, auth=auth)
print(r.text)
Or you could wrap it in a Session object and every request will automatically use the proxy information (plus it will store & handle cookies automatically!):
s = requests.Session()
s.proxies = proxy  # with current requests, set these as Session attributes
s.auth = auth
r = s.get('http://www.google.com/')
print(r.text)