How to give a session from requests to requests_html? - python

I want to scrape data from a site that requires a login. I used the requests library to log in, but I don't get the JavaScript-rendered data that way. So I also use requests_html to get the JS data, but now I can't pass the session from requests to requests_html, or reuse the active session to scrape.
I know about selenium, but when I use it there is always a reCAPTCHA on the page, so I decided to use requests_html.
If there are other, possibly easier, approaches, I will gladly accept suggestions.
Here is my code:
from requests_html import HTMLSession
import requests

url = '...'
url2 = '...'
headers = {
    ...
}
data = {
    '_csrf': '...',
    'User[username]': '...',
    'User[password]': '...'
}

session = requests.Session()
session.post(url, headers=headers, data=data)
session = HTMLSession()
r = session.get(url2)
r.html.render()
print(r.html.html)

Why don't you use requests_html.HTMLSession as your session object, instead of requests.Session?
It inherits from requests.Session, so it's perfectly capable of making HTTP requests like post.

from bs4 import BeautifulSoup
from requests_html import HTMLSession

payload = {
    'username': 'admin',
    'password': 'password',
    'Login': 'Login'
}

with HTMLSession() as c:
    r = c.get(url)
    login_html = BeautifulSoup(r.html.html, "html.parser")
    csrf_token_name = None
    csrf_token_value = None
    # pick up the hidden CSRF input from the login form
    for tag in login_html.find_all('input'):
        if tag.attrs.get('type') == 'hidden':
            csrf_token_name = tag.attrs['name']
            csrf_token_value = tag.attrs['value']
    payload[csrf_token_name] = csrf_token_value
    p = c.post(url, data=payload)
    r = c.get('http://localhost/vulnerabilities/xss_r/?name=xx1xx')
    if 'Reflected' in r.text:
        print('test')
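A side note on why the asker's original snippet stays logged out: `session = HTMLSession()` creates a brand-new session with an empty cookie jar, discarding the cookies the login POST stored. A minimal sketch with plain requests showing this (the cookie value is a stand-in, no network needed):

```python
import requests

# Cookies set during one request live in session.cookies and are sent on
# every later request made from the same Session object.
s = requests.Session()
s.cookies.set("sessionid", "abc123")  # stand-in for a Set-Cookie from a login

# A freshly created session starts with an empty jar: the login is "lost".
s2 = requests.Session()
print(s.cookies.get("sessionid"))   # abc123
print(s2.cookies.get("sessionid"))  # None
```

So the fix is to use one HTMLSession for both the login POST and the later get/render, never two session objects.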

Related

Why can't I log in to a website using the requests module

So I needed to log in to a website, as I need to do an action that requires logging in first.
Here's my code:
import requests
from bs4 import BeautifulSoup

logdata = {'username': 'xxx', 'password': 'xxx'}
url = 'https://darkside-ro.com/?module=account&action=login&return_url='
with requests.Session() as s:
    r = s.post(url, data=logdata)
    html = r.text
    soup = BeautifulSoup(html, "html.parser")
    print(soup.title.get_text())
It gives me the title of the page you see when you're not logged in :(
I'm not sure why I flagged this as a duplicate, sorry.
Okay, so I created a dummy account and tried logging in - I noticed that when I submit the form, the following data are sent to https://darkside-ro.com/?module=account&action=login&return_url=.
So to fix your issue, you have to include a server in your logdata dictionary.
import requests
from bs4 import BeautifulSoup

logdata = {
    'username': 'abc123456',
    'password': 'abc123456',
    'server': 'Darkside RO'
}
url = 'https://darkside-ro.com/?module=account&action=login&return_url='
with requests.Session() as s:
    r = s.post(url, data=logdata)
    html = r.text
    soup = BeautifulSoup(html, 'html.parser')
    print(soup.title.get_text())
Running the code above will print
Darkside RO - The Rise of Skywalker
PS: When you do things like this again, it's a good idea to check for hidden inputs in the form by inspecting the elements. On the site above, the form has
<input type="hidden" name="server" value="Darkside RO">
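Inspecting by hand works, but hidden inputs can also be collected programmatically. A stdlib-only sketch (no third-party parser needed; the form snippet is the one shown above):

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> tags."""
    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "hidden":
            self.hidden[a.get("name")] = a.get("value", "")

def hidden_fields(html):
    parser = HiddenInputCollector()
    parser.feed(html)
    return parser.hidden

form = '<form><input type="hidden" name="server" value="Darkside RO"></form>'
print(hidden_fields(form))  # {'server': 'Darkside RO'}
```

The resulting dict can be merged straight into the login payload before posting.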

Module 'requests' doesn't go through with the login

I am trying to get information from a website using the requests module. To get to the information you have to be logged in, and then you can access the page. I looked at the input tags and noticed that they are called login_username and login_password, but for some reason the POST doesn't go through. I also read here that someone solved it by waiting a few seconds before requesting the other page; that didn't help either.
Here is my code:
import requests
import time

# This URL will be the URL that your login form points to with the "action" tag.
loginurl = 'https://jadepanel.nephrite.ro/login'
# This URL is the page you actually want to pull down with requests.
requesturl = 'https://jadepanel.nephrite.ro/clan/view/123'
payload = {
    'login_username': 'username',
    'login_password': 'password'
}
with requests.Session() as session:
    post = session.post(loginurl, data=payload)
    time.sleep(3)
    r = session.get(requesturl)
    print(r.text)
login_username and login_password are not the only necessary parameters. If you look at the /login/ POST request in the browser developer tools, you would see that there is also a _token being sent.
This is something you would need to parse out of the login HTML. So the flow would be the following:
1. get the https://jadepanel.nephrite.ro/login page
2. parse the HTML and extract the _token value
3. make a POST request with login, password and token
4. use the logged-in session to navigate the site
For the HTML parsing we could use BeautifulSoup (there are other options, of course):
from bs4 import BeautifulSoup

login_html = session.get(loginurl).text
soup = BeautifulSoup(login_html, "html.parser")
token = soup.find("input", {"name": "_token"})["value"]
payload = {
    'login_username': 'username',
    'login_password': 'password',
    '_token': token
}
Complete code:
import time

import requests
from bs4 import BeautifulSoup

# This URL will be the URL that your login form points to with the "action" tag.
loginurl = 'https://jadepanel.nephrite.ro/login'
# This URL is the page you actually want to pull down with requests.
requesturl = 'https://jadepanel.nephrite.ro/clan/view/123'

with requests.Session() as session:
    login_html = session.get(loginurl).text
    soup = BeautifulSoup(login_html, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]
    payload = {
        'login_username': 'username',
        'login_password': 'password',
        '_token': token
    }
    post = session.post(loginurl, data=payload)
    time.sleep(3)
    r = session.get(requesturl)
    print(r.text)

Login in a website with requests

I need to log in to a website with requests, but everything I have tried doesn't work:
from bs4 import BeautifulSoup as bs
import requests

s = requests.session()
url = 'https://www.ent-place.fr/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=5'

def authenticate():
    headers = {'username': 'myuser', 'password': 'mypasss', '_Id': 'submit'}
    page = s.get(url)
    soup = bs(page.content)
    value = soup.form.find_all('input')[2]['value']
    headers.update({'value_name': value})
    auth = s.post(url, params=headers, cookies=page.cookies)

authenticate()
or:
import requests

payload = {
    'inUserName': 'user',
    'inUserPass': 'pass'
}
with requests.Session() as s:
    p = s.post('https://www.ent-place.fr/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=5', data=payload)
    print(p.text)
    print(p.status_code)
    r = s.get('A protected web page url')
    print(r.text)
When I check it with .status_code, it returns 200, but I want 401 or 403 so I can write something like 'if logged in'...
I also found this, but I think it works in Python 2; I use Python 3 and I don't know how to convert it:
import requests
import sys

payload = {
    'username': 'sopier',
    'password': 'somepassword'
}
with requests.Session(config={'verbose': sys.stderr}) as c:
    c.post('http://m.kaskus.co.id/user/login', data=payload)
    r = c.get('http://m.kaskus.co/id/myform')
    print 'sopier' in r.content
Does somebody know how to do this? I have tested every script I have found, and none of them work...
When you submit the logon form, the POST request is sent to https://www.ent-place.fr/CookieAuth.dll?Logon, not https://www.ent-place.fr/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=5 -- you get redirected to that URL afterwards.
When I tested this, the post request contains the following parameters:
curl:Z2F
flags:0
forcedownlevel:0
formdir:5
username:username
password:password
SubmitCreds.x:69
SubmitCreds.y:9
SubmitCreds:Ouvrir une session
So, you'll likely need to supply those additional parameters as well.
Also, the line s.post(url, params=headers, cookies=page.cookies) is not correct. You should pass headers to the keyword argument data, not params -- params is encoded into the request URL, while you need to send these values as form data. (And I'm assuming you really mean payload where you say headers.)
s.post(url, data=headers, cookies=page.cookies)
The site you're trying to login to has an onClick JavaScript when you process the login form. requests won't be able to execute JavaScript for you. This may cause issues with the site functionality.
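The params-vs-data distinction above can be seen without touching the network by inspecting a PreparedRequest (the example.com URL here is just a stand-in):

```python
import requests

# params are encoded into the query string; data becomes the form-encoded body.
req = requests.Request(
    'POST', 'https://example.com/CookieAuth.dll',
    params={'curl': 'Z2F'},
    data={'username': 'myuser', 'password': 'mypasss'},
).prepare()

print(req.url)   # https://example.com/CookieAuth.dll?curl=Z2F
print(req.body)  # username=myuser&password=mypasss
```

The server reads the login credentials from the body, which is why posting them via params fails.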

Python requests post doing nothing

I am using requests and cfscrape library to login to https://kissanime.to/Login
import requests
import cfscrape

'''Login to website'''
def login(self, usr, pw):
    login_url = 'https://kissanime.to/Login'
    sess = requests.Session()
    # login credentials
    payload = {
        'username': usr,
        'password': pw,
        'redirect': ''
    }
    # Creating cfscrape instance of the session
    scraper_sess = cfscrape.create_scraper(sess)
    a = scraper_sess.post(login_url, data=payload)
    print(a.text)
    print(a.status_code)
a.text gives me the same login page
a.status_code gives me 200
That means my login is not working at all. Am I missing something? According to chrome's network monitor, I should also get status code 302
POST data (screenshot omitted):
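A practical way to check for the expected 302: requests follows redirects by default, and any intermediate 302 ends up in response.history, so an empty history after a login POST usually means the server re-rendered the login page instead of redirecting. A sketch with stand-in response objects (no network involved):

```python
from types import SimpleNamespace

def login_succeeded(response):
    # A successful form login typically answers with a redirect, which
    # requests records in response.history before following it.
    return any(r.status_code in (301, 302, 303, 307) for r in response.history)

redirected = SimpleNamespace(history=[SimpleNamespace(status_code=302)])
rerendered = SimpleNamespace(history=[])
print(login_succeeded(redirected))  # True
print(login_succeeded(rerendered))  # False
```

The same function works unchanged on a real requests.Response returned by session.post.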
I solved it using mechanicalsoup
Code:
import cfscrape
import mechanicalsoup
from bs4 import BeautifulSoup

'''Login to website'''
def login(self, usr, pw):
    login_url = 'https://kissanime.to/Login'
    # Creating cfscrape instance
    self.r = cfscrape.create_scraper()
    login_page = self.r.get(login_url)
    # Creating a mechanicalsoup browser instance with
    # response object of cfscrape
    browser = mechanicalsoup.Browser(self.r)
    soup = BeautifulSoup(login_page.text, 'html.parser')
    # grab the login form
    login_form = soup.find('form', {'id': 'formLogin'})
    # find login and password inputs
    login_form.find('input', {'name': 'username'})['value'] = usr
    login_form.find('input', {'name': 'password'})['value'] = pw
    browser.submit(login_form, login_page.url)
This content is from Requests Documentation:
Many web services that require authentication accept HTTP Basic Auth. This is the simplest kind, and Requests supports it straight out of the box.
from requests.auth import HTTPBasicAuth
requests.get('https://api.github.com/user', auth=HTTPBasicAuth('user', 'pass'))
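What HTTPBasicAuth actually does is set the Authorization header, which can be checked offline with a PreparedRequest (the GitHub URL is the one from the quote above):

```python
import requests
from requests.auth import HTTPBasicAuth

# Basic auth is just base64-encoded "user:pass" in the Authorization header;
# the tuple shorthand auth=('user', 'pass') is equivalent.
req = requests.Request(
    'GET', 'https://api.github.com/user',
    auth=HTTPBasicAuth('user', 'pass'),
).prepare()

print(req.headers['Authorization'])  # Basic dXNlcjpwYXNz
```

Note that this only applies to sites using HTTP Basic Auth; form-based logins like the one in this question need the POST-with-payload approach instead.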
You have to send the payload as JSON:
import json

import requests
import cfscrape

'''Login to website'''
def login(self, usr, pw):
    login_url = 'https://kissanime.to/Login'
    sess = requests.Session()
    # login credentials
    payload = {
        'username': usr,
        'password': pw,
        'redirect': ''
    }
    # Creating cfscrape instance of the session
    scraper_sess = cfscrape.create_scraper(sess)
    a = scraper_sess.post(login_url, data=json.dumps(payload))
    print(a.text)
    print(a.status_code)
Reference: http://docs.python-requests.org/en/master/user/authentication/

What's wrong with my requests.Session for python crawler?

I'm coding a crawler for www.researchgate.net, but it seems that I'll be stuck on the login page forever.
Here's my code:
import requests
from bs4 import BeautifulSoup

session = requests.Session()
params = {'login': 'my_email', 'password': 'my_password'}
session.post("https://www.researchgate.net/application.Login.html", data=params)
s = session.get("https://www.researchgate.net/search.Search.html?type=researcher&query=zhang")
print BeautifulSoup(s.text).title
Can anybody find anything wrong with my code? Why does s redirect to the login page every time?
There are hidden fields in the login form that probably need to be supplied (I can't test - I don't have a login there).
One is request_token which is set to a long base64 encoded string. Others are invalidPasswordCount and loginCookie which might also be required.
Further to that there is a session cookie that you might need to send with the login credentials.
To make this work will require an initial GET to get the request_token, which you need to extract somehow - e.g. with BeautifulSoup. If you use your requests session then the cookie will be presented in the following POST, so you shouldn't need to worry about that.
import requests
from bs4 import BeautifulSoup

session = requests.Session()

# initial GET to retrieve token and set cookies
r = session.get('https://www.researchgate.net/application.Login.html')
soup = BeautifulSoup(r.text)
request_token = soup.find('input', attrs={'name': 'request_token'})['value']

params = {'login': 'my_email', 'password': 'my_password',
          'request_token': request_token,
          'invalidPasswordCount': 0,
          'loginCookie': 'yes'}
session.post("https://www.researchgate.net/application.Login.html", data=params)
s = session.get("https://www.researchgate.net/search.Search.html?type=researcher&query=zhang")
print BeautifulSoup(s.text).title
Thanks to mhawke; I modified my original code as he suggested and finally logged in successfully.
Here's my new code:
import requests
from bs4 import BeautifulSoup

session = requests.Session()
loginpage = session.get("https://www.researchgate.net/application.Login.html")
request_token = BeautifulSoup(loginpage.text).form.find("input", {"name": "request_token"}).attrs["value"]
print request_token
params = {"request_token": request_token,
          "invalidPasswordCount": "0",
          'login': 'my_email',
          'password': 'my_password',
          "setLoginCookie": "yes"
          }
session.post("https://www.researchgate.net/application.Login.html", data=params)
#print s.cookies.get_dict()
s = session.get("https://www.researchgate.net/search.Search.html?type=researcher&query=zhang")
print BeautifulSoup(s.text).title
