Using Selenium to access a site hosted on sharepoint

For privacy concerns, I cannot distribute the url publicly.
I have been able to access this site successfully with python requests: session = requests.Session(); r = session.post(url, auth=HttpNtlmAuth(USERNAME, PASSWORD), proxies=proxies). That works well, and I can parse the page with bs4. I tried to read the cookies with session.cookies.get_dict(), but it returns an empty dict (I assume because the site is hosted on SharePoint). My original idea was to retrieve the cookies and then use them to access the site.
The issue I'm facing is that when you navigate to the url, a box pops up asking for credentials; once they are entered, you are taken to the url. You cannot inspect the page the box appears on, which means I can't use send_keys() etc. to log in with selenium/chromedriver.
I read through some documentation but could not find a way to supply the username/password when calling driver = webdriver.Chrome(path_driver) or in later calls.
Any help/thoughts would be appreciated.
Right-clicking the credentials box gives no option to inspect the page.

Related

Unable to log in a site using payload with appropriate parameters as it doesn't show up in chrome dev tools

I'm trying to log in to this website with my credentials from a python script, but the XHR request that shows up as login in chrome dev tools appears for a moment and then vanishes, so I can't see the parameters needed to log in. I do see that login XHR if I enter a wrong password, but the form data then looks incomplete.
What I've tried so far (with an incomplete payload, because of the chrome dev tools issue):
import requests
url = "https://member.angieslist.com/gateway/platform/v1/session/login"
payload = {"identifier":"username","token":"sometoken"}
res = requests.post(url,json=payload,headers={
"User-Agent":"Mozilla/5.0",
"Referer":"https://member.angieslist.com/member/login"
})
print(res.url)
How can I log in to that site by sending a POST request with the appropriate parameters?

There is a checkbox called Preserve log in the Network tab; if it is switched on, the data about the POST request remains after the page navigates. I think you should use a requests session if you need the script to stay logged in. It can be done like this:
import requests
url = 'https://member.angieslist.com/gateway/platform/v1/session/login'
s = requests.session()
payload = {"identifier":"youremail","token":"your password"}
res = s.post(url,json=payload,headers={"User-Agent":"Mozilla/5.0",'Referer': 'https://member.angieslist.com/member/login?redirect=%2Fapp%2Faccount'}).text
print(res)
The POST request returns a JSON document with all the user's details.

Authenticating on ADFS with Python script

I need to parse a site that sits behind an ADFS service, and I'm struggling to authenticate to it.
Are there any options to get in?
From what I can see, most solutions are for backend applications or for "system users" (with app_id, app_secret).
In my case I can't use those; I only have a login and password.
Example of the problem:
In chrome I open www.example.com and it redirects me to https://login.microsoftonline.com/ and then to https://federation-sts.example.com/adfs/ls/?blabla with a login and password form.
How can I get access to it with python3?

ADFS uses complicated redirection and CSRF protection techniques. Thus, it is better to use a browser automation tool to perform the authentication and parse the webpage afterwards. I recommend the selenium toolkit with python bindings. Here is a working example:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
def MS_login(usrname, passwd):  # call this with username and password
    driver = webdriver.Edge()  # change to your browser (Firefox, Chrome, ...)
    driver.delete_all_cookies()  # clean up prior login sessions
    driver.get('https://login.microsoftonline.com/')  # change the url to your website
    time.sleep(5)  # wait for redirection and rendering
    driver.find_element(By.XPATH, "//input[@name='loginfmt']").send_keys(usrname)
    driver.find_element(By.XPATH, "//input[@type='submit']").click()
    time.sleep(5)
    driver.find_element(By.XPATH, "//input[@name='passwd']").send_keys(passwd)
    driver.find_element(By.XPATH, "//input[@name='KMSI' and @type='checkbox']").click()
    driver.find_element(By.XPATH, "//input[@type='submit']").click()
    time.sleep(5)
    driver.find_element(By.XPATH, "//input[@type='submit']").click()
    # Successfully logged in
    # parse the site ...
    # The browser is left open so the returned driver stays usable;
    # call driver.quit() once you have finished parsing.
    return driver
This script launches Microsoft Edge to open the website. It types the username and password into the correct DOM elements and then lets the browser handle the rest. It has been tested on the webpage "https://login.microsoftonline.com". You may need to modify it to suit your website.
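The fixed time.sleep(5) pauses in the script can be either wasteful or too short on a slow redirect. A common alternative is to poll until the element you need is present. Selenium provides WebDriverWait in selenium.webdriver.support.ui for this purpose; the library-free sketch below only illustrates the idea, and the helper name wait_until and its parameters are made up for this example.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)
```

With a driver like the one in the script, and By imported from selenium.webdriver.common.by, a call such as wait_until(lambda: driver.find_elements(By.NAME, 'passwd')) could stand in for a time.sleep(5), since find_elements returns an empty list until the field exists.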

To answer your question "how to get in with python": I am assuming you want to perform some web scraping on pages that are secured by Azure AD authentication.
In this kind of scenario, you have to follow the steps below.
For this script we will only need to import the following:
import requests
from lxml import html
First, we would like to create our session object. This object will allow us to persist the login session across all our requests.
session_requests = requests.session()
Second, we would like to extract the csrf token from the web page, this token is used during login. For this example we are using lxml and xpath, we could have used regular expression or any other method that will extract this data.
login_url = "https://bitbucket.org/account/signin/?next=/"
result = session_requests.get(login_url)
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='csrfmiddlewaretoken']/@value")))[0]
Next, we build the payload, a dictionary holding the user name, the password, and the CSRF token extracted above.
payload = {
    "username": "<USER NAME>",
    "password": "<PASSWORD>",
    "csrfmiddlewaretoken": authenticity_token
}
Then we perform the login phase. In this phase, we send a POST request to the login url with the payload as the data. We also add a referer header pointing at the same url.
result = session_requests.post(
    login_url,
    data = payload,
    headers = dict(referer=login_url)
)
Note:- This is just an example.
Step 2:
Scrape content
Now that we have logged in successfully, we can perform the actual scraping:
url = 'https://bitbucket.org/dashboard/overview'
result = session_requests.get(
    url,
    headers = dict(referer = url)
)
In other words, you need to work out the login request payload that Azure AD expects, log in through a session object, and then do the scraping with that same session.
Here is a very good example of Web scraping of a secured website.
Hope it helps.
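The token-extraction step in this answer can also be sketched without lxml, using only the standard library's html.parser (a swapped-in technique, not what the answer itself uses). The class and function names and the sample HTML are made up for illustration; only the csrfmiddlewaretoken field name comes from the answer.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attributes without a value are reported as None, hence the "or".
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(page_html):
    """Return a dict of all hidden form fields found in page_html."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    return parser.fields


# Hypothetical login page fragment, for demonstration only.
sample = '<form><input type="hidden" name="csrfmiddlewaretoken" value="abc123"/></form>'
fields = extract_hidden_fields(sample)
payload = {"username": "<USER NAME>", "password": "<PASSWORD>", **fields}
print(payload["csrfmiddlewaretoken"])
```

Collecting every hidden field, rather than one named token, is a design choice here: login forms often carry several hidden values that the server expects to be posted back.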

How can I set the cookie by using requests in python?

Hello, I'm trying to get information from a website that requires a login.
I already get a 200 response from the request URL, where I POST my ID, password, and other fields.
The headers dict holds the request headers shown in the Network tab of the chrome developer tools. The form_data dict holds the ID and password.
login_site = requests.post(requestUrl, headers=headers, data=form_data)
status_code = login_site.status_code
print(status_code)
I got 200
The code below is the way I've tried.
1. Session.
When I tried to set cookies with a session, it failed. I've heard that a session can carry the cookies over when I scrape other pages that need a login.
session = requests.Session()
session.post(requestUrl, headers=headers, data=form_data)
test = session.get('~~') #the website that I want to scrape
print(test.status_code)
I got 403
2. Manually set cookie
I manually built the cookie dict from the cookies I could get
cookies = {'wcs_bt':'...','_production_session_id':'...'}
r = requests.post('http://engoo.co.kr/dashboard', cookies = cookies)
print(r.status_code)
I also got 403
Actually, I don't know what I should put in the cookies dict. When I get 'wcs_bt=AAA; _production_session_id=BBB; _ga=CCC;', should I change it to a dict {'wcs_bt':'AAA', ...}?
When I print the cookies with
login_site = requests.post(requestUrl, headers=headers, data=form_data)
print(login_site.cookies)
this code only gives me
RequestsCookieJar[Cookie _production_session_id=BBB]
So that attempt failed as well.
How can I scrape the site with the cookie?

Scraping a modern (circa 2017 or later) Web site that requires a login can be very tricky, because it's likely that some important portion of the login process is implemented in Javascript.
Unless you execute that Javascript exactly as a browser would, you won't be able to complete the login. Unfortunately, the basic Python libraries won't help.
Consider Selenium with Python, which is used for testing Web sites but can be used to automate any interaction with a Web site.
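On the narrower question above of turning a Cookie header string into a dict: requests accepts a plain name-to-value dict for its cookies argument, so the string can simply be split on semicolons. A minimal sketch follows; the function name is made up for illustration, and the conversion alone does not address the 403, which may have other causes.

```python
def cookie_header_to_dict(header):
    """Turn 'a=1; b=2;' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for part in header.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty pieces such as the one after a trailing ';'
        name, _, value = part.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


cookies = cookie_header_to_dict("wcs_bt=AAA; _production_session_id=BBB; _ga=CCC;")
print(cookies)
```

The resulting dict can be passed as cookies=cookies to requests.get or requests.post.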

Creating a connection to a subscription site in python

I am looking to open a connection with python to http://www.horseandcountry.tv which takes my login parameters via the POST method. I would like to open a connection to this website in order to scrape the site for all video links (this, I also don't know how to do yet but am using the project to learn).
My question is how do I pass my credentials to the individual pages of the website? For example if all I wanted to do was use python code to open a browser window pointing to http://play.horseandcountry.tv/live/ and have it open with me already logged in, how do I go about this?

As far as I know you have two options depending how you want to crawl and what you need to crawl:
1) Use urllib. You can send your POST request with the necessary login credentials. This is the low-level solution: it is fast, but it doesn't handle high-level things like JavaScript.
2) Use selenium. With it you can drive a browser (Chrome, Firefox, others) and run actions from your python code. It is much slower, but it works well with more "sophisticated" websites.
What I usually do: I try the first option, and if I encounter a problem such as a JavaScript security layer on the website, I go for option 2. Moreover, selenium can open a real web browser on your desktop and let you watch your scraping.
In any case, just google "urllib/selenium login to website" and you'll find what you need.
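Option 1 could look roughly like the sketch below, using only the standard library. The form field names account_email and account_password are borrowed from the requests answer on this page and are an assumption about what the site expects; the helper names are made up for illustration. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_opener():
    """Return an opener that stores cookies, plus the jar it stores them in."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, email, password):
    """Build a form-encoded POST request carrying the credentials."""
    form = urllib.parse.urlencode(
        {"account_email": email, "account_password": password}  # assumed field names
    ).encode("utf-8")
    return urllib.request.Request(login_url, data=form, method="POST")


opener, jar = make_opener()
request = build_login_request("https://play.horseandcountry.tv/login/", "your_email", "your_password")
# opener.open(request) would send the POST; cookies set by the server land in
# `jar` and are sent again on later opener.open(...) calls for the same site.
```

Reusing the same opener for every page is what keeps the login alive, because the cookie jar travels with it.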

If you want to avoid using Selenium (opening web browsers), you can use requests; it can log in to the website and grab anything you need in the background.
Here is how you can log in to that website with requests.
import requests
from bs4 import BeautifulSoup
# Login form data
payload = {
    'account_email': 'your_email',
    'account_password': 'your_password',
    'submit': 'Sign In'
}
with requests.Session() as s:
    # Log in to the website.
    response = s.post('https://play.horseandcountry.tv/login/', data=payload)
    # Check whether the login succeeded
    soup = BeautifulSoup(response.text, 'lxml')
    logged_in = soup.find('p', attrs={'class': 'navbar-text pull-right'})
    print(s.cookies)
    print(response.status_code)
    # soup.find returns None when the element is missing (e.g. the login failed)
    if logged_in is not None and logged_in.text.startswith('Logged in as'):
        print('Logged In Successfully!')
If you need an explanation of this, you can check this answer or the requests documentation.

You could also use the requests module. It is one of the most popular. Here are some questions that relate to what you would like to do.
Log in to website using Python Requests module
logging in to website using requests

Using Python to request draftkings.com info that requires login?

I'm trying to get contest data from the url: "https://www.draftkings.com/contest/gamecenter/32947401"
If you go to this URL and aren't logged in, it'll just re-direct you to the lobby. If you're logged in, it'll actually show you the contest results.
Here are some things I tried:
- First, I used the Network tab of Chrome's dev tools to watch the requests while I logged in manually
- I then copied the cookie that I thought contained the authentication info; it was of the form:
'ajs_anonymous_id=%123123123123123, mlc=true; optimizelyEndUserId'
- I then stored that cookie as an environment variable and ran this code:
HEADERS= {'cookie': os.environ['MY_COOKIE'] }
requests.get(draft_kings_url, headers= HEADERS)
No luck, this just gave me the lobby.
I then tried the auth helpers built into requests:
HTTPBasicAuth
HTTPDigestAuth
No luck here either.
I'm no python expert by far, and I've pretty much exhausted what I know and the search results I've found. Any ideas?

The tool that you want is selenium. Something along the lines of:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get("https://www.draftkings.com/contest/gamecenter/32947401")
username = browser.find_element(By.ID, "user")
username.send_keys("username")
password = browser.find_element(By.ID, "password")
password.send_keys("top_secret")
login = browser.find_element(By.NAME, "login")  # was selenium.find_element_by_name, an undefined name
login.click()

Use Fiddler to see the exact request the site makes when you log in. Then use the Session class from the requests package.
import requests
session = requests.Session()
session.get('YOUR_URL_LOGIN_PAGE')
This saves all the cookies from that url in your session object (as a browser would).
Then make a POST request to the login url with the appropriate data.
You don't have to pass cookie data manually, because the session stores the cookies the site sets on your first visit. However, you can set some headers explicitly, such as User-Agent, with:
session.headers.update({'header_name':'header_value'})
HTTPBasicAuth and HTTPDigestAuth might not work, depending on the website.
