How can I use python-requests to grab a LinkedIn page?

I used the code below to try to grab a LinkedIn page, but it seems this method doesn't log me in; it just shows me the unauthenticated home page.
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup
payload = {
    'session-key': 'my account',
    'session-password': 'my password'
}
URL = 'https://www.linkedin.com/uas/login'
s = requests.session()
s.post(URL, data=payload)
r = s.get('http://www.linkedin.com/nhome')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup)

This is much more complicated than what you've got so far.
You will need to do something like:
Load https://www.linkedin.com/uas/login
Parse the response with BeautifulSoup to get the login form, with all the hidden form fields etc. (The CSRF ones are particularly important, as the server will reject a POST request without the correct values).
Build your POST data dictionary from the parsed login form data + your username and password
POST that data to https://www.linkedin.com/uas/login-submit (you might have to fake some of the headers too, as it might only accept requests marked as AJAX)
Finally GET http://www.linkedin.com/nhome
You can see this whole process by opening the developer tools in Chrome/Firefox and going through the login process in the Network tab.
Something like this should work:
import requests
from bs4 import BeautifulSoup
# Get the login form
URL = 'https://www.linkedin.com/uas/login'
session = requests.session()
login_response = session.get(URL)
login = BeautifulSoup(login_response.text, 'html.parser')
# Collect the hidden form inputs (including the CSRF fields)
inputs = login.find('form', {'name': 'login'}).find_all('input', {'type': ['hidden', 'submit']})
# Build the POST data from the hidden fields plus your credentials
post = {field.get('name'): field.get('value') for field in inputs}
post['session_key'] = 'username'
post['session_password'] = 'password'
# Post the login
post_response = session.post('https://www.linkedin.com/uas/login-submit', data=post)
# Get the home page
home_response = session.get('http://www.linkedin.com/nhome')
home = BeautifulSoup(home_response.text, 'html.parser')
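If the login-submit endpoint still rejects the POST, the header-faking mentioned above may be what's missing. A minimal sketch, assuming (to be verified against the real login request in the network tab) that the endpoint expects browser- or AJAX-style headers:
# Assumption: whether these headers are actually required must be confirmed
# against the real login request in the browser's network tab.
session.headers.update({
    'User-Agent': 'Mozilla/5.0',           # look like a regular browser
    'X-Requested-With': 'XMLHttpRequest',  # mark the request as AJAX
})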

Related

Web Scraping using Requests - Python

I am trying to get data using the Requests library, but I'm doing something wrong. My explanation of the manual search:
URL - https://www9.sabesp.com.br/agenciavirtual/pages/template/siteexterno.iface?idFuncao=18
I fill in the “Informe o RGI” field, click the Prosseguir button (like Next), and get the expected result page. Before coding, I did the manual search and checked the Form Data in the browser's developer tools.
And then I tried it with this code:
import requests
data = { "frmhome:rgi1": "0963489410"}
url = "https://www9.sabesp.com.br/agenciavirtual/block/send-receive-updates"
res = requests.post(url, data=data)
print(res.text)
My output is:
<session-expired/>
What am I doing wrong?
Many thanks.
When you go to the site using a browser, a session is created and stored in a cookie on your machine, and the browser sends those cookies with every request. You receive a session-expired error because your script isn't sending any session data with its request.
Try this code. It requests the entry page first and stores the cookies. The cookies are then sent with the POST request.
import requests
session = requests.Session() # start session
# get the entry page first so the server sets its session cookies
response = session.get('https://www9.sabesp.com.br/agenciavirtual/pages/home/paginainicial.iface', timeout=30)
cks = session.cookies # save the cookies that hold the session data
print(session.cookies.get_dict())
data = { "frmhome:rgi1": "0963489410"}
url = "https://www9.sabesp.com.br/agenciavirtual/block/send-receive-updates"
res = requests.post(url, data=data, cookies=cks) # send cookies with request
print(res.text)
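Since the cookies already live on the session object, an equivalent variant (a small simplification, not part of the original answer) is to send the POST through the session itself and let it attach the cookies automatically:
res = session.post(url, data=data)  # the session sends its stored cookies by itself
print(res.text)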

Can't log in to a webpage using Python

I am a Python newbie and I've been trying to scrape my university's website to get a list of my grades, with no luck so far.
I try to log in by making a POST request with the form data shown in the code, and then make a GET request for request_URL. However, when I print the requested URL's content, the login page's content comes up instead, so it is pretty clear that I can't get past the login page.
import requests
from bs4 import BeautifulSoup
login_URL = 'http://gram-web.ionio.gr/unistudent/login.asp'
request_URL = 'http://gram-web.ionio.gr/unistudent/stud_CResults.asp?studPg=1&mnuid=mnu3&'
payload = {
    'userName': 'username',
    'pwd': 'password',
    'submit1': 'Login',
    'loginTrue': 'login'
}
with requests.Session() as session:
    post = session.post(login_URL, data=payload)
    r = session.get(request_URL)
    root = BeautifulSoup(r.text, "html.parser")
    print(root)
I assume there is some kind of token value involved in the process, because that's what the POST request (and its headers) looks like when I log in manually in the browser. Has anybody seen this before? Is this why I cannot log in?
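If a hidden token is indeed involved, the hidden-field technique from the LinkedIn answer above should apply here too. A minimal sketch, assuming the token is carried in hidden input fields of the login form (the form structure and field names must be verified against the real page):
# Sketch only: fetch the login page first, merge any hidden inputs (such as
# a token) into the payload, then POST. Which hidden fields exist is an
# assumption to check against the actual form.
with requests.Session() as session:
    login_page = session.get(login_URL)
    form = BeautifulSoup(login_page.text, "html.parser").find("form")
    hidden = {i.get("name"): i.get("value", "") for i in form.find_all("input", {"type": "hidden"})}
    payload.update(hidden)
    post = session.post(login_URL, data=payload)
    r = session.get(request_URL)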

Request login and data pull

I'm trying to use Python and Requests to log in to a website but can't seem to get it to work. I'm pretty new to Python, but Requests appears to be pretty simple to use.
I can successfully initiate a session and get the HTML data, but I can't figure out why the POST is not successfully logging me in.
My code is as follows:
import requests
from bs4 import BeautifulSoup
url="https://www.trilink-usa-portal.com/cs/index.php"
page = requests.get(url)
soup = BeautifulSoup(page.text,"html.parser")
print(soup.prettify())
payload = {
    'usercode': '{username}',
    'password': '{password}',
    'login': 'Login',
    'sid': '{sid}',
    'redirect': '{redirect}'
}
with requests.Session() as s:
    s.get(url)
    p = s.post("https://www.trilink-usa-portal.com/cs/index.php?login", data=payload)
    print(p.text)
    r = s.get("https://www.trilink-usa-portal.com/cs/rpt01V2.php")
    print(r.text)
I'm not sure if I need the sid or redirect, but any help would be appreciated. My print of r just shows the HTML of the login page.
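If sid and redirect are real hidden fields on the login form, one option (a sketch; the field names come from the payload above, and whether the form actually exposes them is an assumption) is to read their values from the already-parsed page instead of hard-coding placeholders:
# Sketch: pull hidden sid/redirect values out of the parsed login page.
for name in ('sid', 'redirect'):
    field = soup.find('input', {'name': name})
    if field is not None:
        payload[name] = field.get('value', '')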

Using the python requests module to log in to a WordPress-based website

I'm trying to log in to a WordPress-based website using Python's requests module and beautifulsoup4. It seems like the code fails to successfully log in. Also, there is no CSRF token on the website. How do I successfully log in to the website?
import requests
import bs4 as bs
with requests.session() as c:
    link = "https://gpldl.com/sign-in/"  # link to the webpage to be logged in to
    initial = c.get(link)  # issue the GET request
    login_data = {"log": "*****", "pwd": "******"}  # login data from any account on the site; stars must be replaced with username and password
    page_login = c.post(link, data=login_data)  # post the login data to the link
    print(page_login)  # check the status of the requested page
    page = c.get("https://gpldl.com/my-gpldl-account/")  # request the source code of the logged-in page
    good_data = bs.BeautifulSoup(page.content, "lxml")  # parse it with BS4
    print(good_data.title)  # printing this gives the title seen when the page is accessed from a logged-out account
You are sending your POST request to the wrong URL; the correct one is https://gpldl.com/wp-login.php. Also, there are five parameters for the payload: log, pwd, rememberme, redirect_to, and redirect_to_automatic.
So it should be:
login_data = {
    "log": "*****",
    "pwd": "******",
    "rememberme": "forever",
    "redirect_to": "https://gpldl.com/my-gpldl-account/",
    "redirect_to_automatic": "1"
}
page_login = c.post('https://gpldl.com/wp-login.php', data=login_data)
Edit:
You can use the Chrome dev tools to find all of this info while logging in; the values show up in the captured login request in the Network tab.
As for the rememberme key, I would suggest doing exactly what a browser does. Also add some headers to your request, especially User-Agent, because some websites simply don't welcome logins made this way.
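A minimal sketch of setting such headers on the session from the question's code (the User-Agent string is only an example value, not a requirement of this site):
# Sketch: send a browser-like User-Agent with every request on this session.
c.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
})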

Not able to log in to an HTTPS site (https://malwr.com) through a Python script

I need to log in to the malwr site through a Python script.
I tried various modules like mechanize and requests, but had no success logging in to the site using a script.
I want to create an automation script to download files from the malware analysis site by parsing the HTML page, but due to the login issue I am not able to parse the href attributes of the HTML page to get the links to download files.
Below is my code:
import urllib, urllib2, cookielib
username = 'myuser'
password = 'mypassword'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('https://malwr.com/account/login/', login_data)
resp = opener.open('https://malwr.com/analysis/MDMxMmY0NjMzNjYyNDIyNDkzZTllOGVkOTc5ZTQ5NWU/')
print resp.read()
Am I doing something wrong?
The key thing to do is to parse the csrf token from the form and to pass it along with the username and password in the POST parameters to the https://malwr.com/account/login/ endpoint.
Here is the solution using requests and BeautifulSoup libraries.
First, it opens a session to maintain cookies for "staying logged in" throughout the web-scraping session; then it gets a csrf token from the login page. The next step is sending a POST request to log in. After that, you can open the "analysis" pages and retrieve the links:
from urlparse import urljoin
from bs4 import BeautifulSoup
import requests
base_url = 'https://malwr.com/'
url = 'https://malwr.com/account/login/'
username = 'username'
password = 'password'
session = requests.Session()
# getting csrf value
response = session.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
form = soup.form
csrf = form.find('input', attrs={'name': 'csrfmiddlewaretoken'}).get('value')
# logging in
data = {
'username': username,
'password': password,
'csrfmiddlewaretoken': csrf
}
session.post(url, data=data)
# getting analysis data
response = session.get('https://malwr.com/analysis/MDMxMmY0NjMzNjYyNDIyNDkzZTllOGVkOTc5ZTQ5NWU/')
soup = BeautifulSoup(response.content, 'html.parser')
link = soup.find('section', id='file').find('table')('tr')[-1].a.get('href')
link = urljoin(base_url, link)
print link
Prints:
https://malwr.com/analysis/file/MDMxMmY0NjMzNjYyNDIyNDkzZTllOGVkOTc5ZTQ5NWU/sample/7fe8157c0aa251b37713cf2dc0213a3ca99551e41fb9741598eb75c294d1537c/
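Note that this answer's code targets Python 2 (the urlparse module and print as a statement). On Python 3 the equivalent import and print call would be:
# Python 3 equivalents of the Python 2 pieces above
from urllib.parse import urljoin
print(link)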
