I would like to authenticate to linkedin.com and get some content.
I use the requests Python module and do something like this:
import requests
from BeautifulSoup import BeautifulSoup
client = requests.Session()
HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = HOMEPAGE_URL + '/uas/login-submit'
html = client.get(HOMEPAGE_URL).content
soup = BeautifulSoup(html)
csrf = soup.find(id="loginCsrfParam-login")['value']
login_information = {
    'session_key': 'my_login',
    'session_password': 'my_password',
    'loginCsrfParam': csrf,
}
client.post(LOGIN_URL, data=login_information)
content = client.get(HOMEPAGE_URL + '/vsearch/c').content
And I got the content, all right.
But now I want to use the Tornado framework to do the same work.
I get loginCsrfParam in a similar way and make the POST request:
login_information = {
    'session_key': 'my_login',
    'session_password': 'my_password',
    'loginCsrfParam': csrf,
}
body = urllib.urlencode(login_information)  # needs `import urllib` (Python 2)
http_client.fetch(LOGIN_URL,
                  handle_request_post,
                  method='POST',
                  headers=None,
                  body=body)
And after the response arrives:
http_client.fetch(HOMEPAGE_URL + '/vsearch/c',
                  handle_request_get_content,
                  method='GET')
But I simply get a login page back.
What's wrong?
Tornado's AsyncHTTPClient doesn't have any concept of a session; each request is independent. It looks like requests.Session is transferring something from the login request to the vsearch request, most likely cookies. You'll need to read the Set-Cookie headers from the login response and send those cookies on any following requests (perhaps using the http.cookiejar module).
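For example, here is a rough, untested sketch of carrying the cookies by hand with the coroutine-style client (the URLs and form fields are copied from the question; http.cookies.SimpleCookie is used here only to parse the Set-Cookie headers, and the follow_redirects handling is my assumption, not something verified against LinkedIn's actual login flow):
from http.cookies import SimpleCookie
from urllib.parse import urlencode
from tornado.httpclient import AsyncHTTPClient

HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = HOMEPAGE_URL + '/uas/login-submit'

async def login_and_fetch(csrf, username, password):
    client = AsyncHTTPClient()
    body = urlencode({
        'session_key': username,
        'session_password': password,
        'loginCsrfParam': csrf,
    })
    # don't follow the redirect, or the Set-Cookie headers on the first
    # response may be lost before we get a chance to read them
    login_response = await client.fetch(LOGIN_URL, method='POST', body=body,
                                        follow_redirects=False,
                                        raise_error=False)
    # parse every Set-Cookie header into name/value pairs
    jar = SimpleCookie()
    for header in login_response.headers.get_list('Set-Cookie'):
        jar.load(header)
    cookie_header = '; '.join('%s=%s' % (name, morsel.value)
                              for name, morsel in jar.items())
    # replay the cookies manually on the next request
    content_response = await client.fetch(HOMEPAGE_URL + '/vsearch/c',
                                          headers={'Cookie': cookie_header},
                                          raise_error=False)
    return content_response.body
Something like tornado.ioloop.IOLoop.current().run_sync(lambda: login_and_fetch(csrf, 'my_login', 'my_password')) would drive it; in a longer script you would keep merging any new Set-Cookie headers into the same jar after every request.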
Related
I'm just learning about the requests module in Python and want to test it.
First I looked at the Chrome network tab to see the request URL, the method (POST) and the form data used. The response tab in Chrome shows the expected data as JSON, but after doing it in a Python file like this:
import requests
uid = "23415191"
url = "https://info.gbfteamraid.fun/web/userrank"
honors = requests.post(url, data={
    "method": "getUserDayPoint",
    "params": {"teamraidid": "teamraid056", "userid": uid}
}).text
print(honors)
This gives me HTML elements of the site's homepage instead of the JSON response. I also tried Postman and it gives me the same result.
You are being redirected to a different page, maybe because you are not authenticated. Most probably you need to authenticate your request. I have added basic auth, which uses an id and password. Try this with the id and password that you have for that website and see if it works. Just replace <your username> and <your password> with your id and password.
import requests
from requests.auth import HTTPBasicAuth
uid = "11111111"
url = "https://info.gbfteamraid.fun/web/userrank"
honors = requests.post(url, auth=HTTPBasicAuth(username="<your username>", password="<your password>"), data={
    "method": "getUserDayPoint",
    "params": {"teamraidid": "teamraid056", "userid": uid}
}).text
print(honors)
I have tried using a REST client (ARC) to make a POST request to a private API and I get the correct response, but when I switch to Python and make the same request using the requests package, the response is this:
b'{"code":"Invalid Checksum","message":"Invalid Checksum"}'
I am using the same URL, headers and body. Where could I possibly be going wrong?
Here is the code snippet:
import requests
import json
request_args = {"Id": -1,"startDate": "2018-01-13","endDate": "2018-01-14","ProviderId": 1}
headers = {'Authorization':'xxxxxx','Content-Type':'application/json','content-md5':'yyyy'}
base_url = "https://myendpoint"
response = requests.post(base_url, data=request_args, headers=headers)
print(response.content)
I am a Python newbie and I've been trying to scrape my university's website to get a list of my grades, with no luck so far.
I try to log in by making a POST request with the form data shown in the code, and then make a GET request for request_URL. However, when I print the requested URL's content, the login URL's content comes up, so it is pretty clear that I can't get past the login page.
import requests
from bs4 import BeautifulSoup
login_URL = 'http://gram-web.ionio.gr/unistudent/login.asp'
request_URL = 'http://gram-web.ionio.gr/unistudent/stud_CResults.asp?studPg=1&mnuid=mnu3&'
payload = {
    'userName': 'username',
    'pwd': 'password',
    'submit1': 'Login',
    'loginTrue': 'login'
}
with requests.Session() as session:
    post = session.post(login_URL, data=payload)
    r = session.get(request_URL)
    root = BeautifulSoup(r.text, "html.parser")
    print(root)
I assume there is some kind of token value involved in the process, because that's what the POST request looks like when I try to log in manually. Has anybody seen this before? Is this why I cannot log in?
These are also the request headers.
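If such a token exists, I imagine it would have to be scraped from the login page first and merged into the payload before posting, roughly like this (reusing the names from the snippet above; the hidden-input handling is just a guess, I haven't confirmed the actual field names on the form):
with requests.Session() as session:
    # fetch the login page first and copy any hidden inputs (e.g. a token)
    # from its form into the payload before submitting it
    login_page = session.get(login_URL)
    form = BeautifulSoup(login_page.text, "html.parser").find('form')
    for hidden in form.find_all('input', type='hidden'):
        payload[hidden.get('name')] = hidden.get('value', '')
    post = session.post(login_URL, data=payload)
    r = session.get(request_URL)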
I saw this post - Passing csrftoken with python Requests
I've been working through it trying to make it work for Greenhouse. I'm trying to build a script that will automate profile creation.
I can fetch data using GET and cookies, but I think I'm getting stuck with X-CSRF. I downloaded the Live HTTP headers plugin for Mozilla to get the CSRF token, but I'm unsure how to pass it in.
So far what I have:
csrf = 'some_csrf_token'
cookie = 'some_cookie_id'
data = {'person_first_name': 'Morgan'} ## this is submitting my name on the form
url = 'https://app.greenhouse.io/people/new?hiring_plan_id=24047' ##submission form page
headers = {'Cookie':cookie}
r = requests.post(url, data=data, headers=headers)
Any thoughts on how I should construct my requests.post?
If you want requests to handle the cookies for you, you should use a session.
session = requests.session()
logindata = {'authenticity_token': 'whatevertokenis',
             'user[email]': 'your@loginemail.com',
             'user[password]': 'yourpassword',
             'user[remember_me]': '0'}
login = session.post('https://app.greenhouse.io/users/sign_in', data=logindata)  # this should log you in; I don't have an account there to test
data = {'person_first_name': 'Morgan'}
url = 'https://app.greenhouse.io/people/new?hiring_plan_id=24047'
r = session.post(url, data=data)  # unless you need to set a user agent or referrer, you may not need to add any headers
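The 'whatevertokenis' placeholder can't stay hardcoded in practice. A rough, untested sketch of pulling the token out of the page before logging in (assuming the sign-in page exposes it as a hidden authenticity_token input, which is the usual Rails convention):
from bs4 import BeautifulSoup

# fetch the sign-in page and read the hidden authenticity_token field
signin_page = session.get('https://app.greenhouse.io/users/sign_in')
soup = BeautifulSoup(signin_page.text, 'html.parser')
logindata['authenticity_token'] = soup.find('input', {'name': 'authenticity_token'})['value']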
I use the code below to try to grab a LinkedIn page, but it seems this method doesn't log me in; it just shows me the unauthorized home page.
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup
payload = {
    'session-key': 'my account',
    'session-password': 'my password'
}
URL = 'https://www.linkedin.com/uas/login'
s = requests.session()
s.post(URL, data=payload)
r = s.get('http://www.linkedin.com/nhome')
soup = BeautifulSoup(r.text)
print(soup)
This is much more complicated than what you've got so far.
You will need to do something like:
Load https://www.linkedin.com/uas/login
Parse the response with BeautifulSoup to get the login form, with all the hidden form fields etc. (The CSRF ones are particularly important, as the server will reject a POST request without the correct values).
Build your POST data dictionary from the parsed login form data + your username and password
POST that data to https://www.linkedin.com/uas/login-submit (you might have to fake some of the headers too, as it might only accept requests marked as AJAX)
Finally GET http://www.linkedin.com/nhome
You can see this whole process by opening the developer tools in Chrome/Firefox and going through the login process in the network tab.
Something like this should work:
import requests
from bs4 import BeautifulSoup
# Get login form
URL = 'https://www.linkedin.com/uas/login'
session = requests.session()
login_response = session.get('https://www.linkedin.com/uas/login')
login = BeautifulSoup(login_response.text, 'html.parser')
# Get hidden form inputs
inputs = login.find('form', {'name': 'login'}).findAll('input', {'type': ['hidden', 'submit']})
# Create POST data
post = {input.get('name'): input.get('value') for input in inputs}
post['session_key'] = 'username'
post['session_password'] = 'password'
# Post login
post_response = session.post('https://www.linkedin.com/uas/login-submit', data=post)
# Get home page
home_response = session.get('http://www.linkedin.com/nhome')
home = BeautifulSoup(home_response.text, 'html.parser')