Passing CSRF token - python

This doesn't get past the login screen. I don't think I am passing in the CSRF token correctly. How should I do it?
from bs4 import BeautifulSoup
import requests
url = 'https://app.greenhouse.io/people/new?hiring_plan_id=24047'
cookies = {'_session_id':'my_session_id'}
client = requests.session()
soup = BeautifulSoup(client.get(url, cookies=cookies).content)
csrf_metatags = soup.find_all('meta',attrs={'name':'csrf-token'})[0].get('content')
posting_data = dict(person_first_name='Morgan') ## this is what I want to post to the form
headers = dict(Referer=url, csrf_token=csrf_metatags)
r = client.post(url, data=posting_data, headers=headers)
Thanks!

If you inspect the page source, you'll find that the form has a hidden input like this:
<input name="authenticity_token" type="hidden"
value="2auOlN425EcdnmmoXmd5HFCt4PkEOhq0gpjOCzxNKns=" />
You can catch this value with:
csrf_data = soup.find("input", {"name": "authenticity_token"}).get("value")
Now re-attach the value to the posting data, as you did with person_first_name:
posting_data = dict(person_first_name='Morgan',
authenticity_token=csrf_data)
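The question tried to send the token in a header named csrf_token. Rails-style applications usually read a header token from X-CSRF-Token instead, so the following is a minimal sketch of building such headers from the csrf-token meta tag. It uses the standard-library HTMLParser rather than BeautifulSoup so it runs without third-party packages. The names MetaTokenParser, csrf_headers and sample_page are hypothetical, the token is a sample value, and whether this particular site accepts the header is an assumption.

```python
from html.parser import HTMLParser


class MetaTokenParser(HTMLParser):
    """Record the content of the first <meta name="csrf-token"> tag."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_token_tag = tag == "meta" and attributes.get("name") == "csrf-token"
        if is_token_tag and self.token is None:
            self.token = attributes.get("content")


def csrf_headers(html, referer):
    """Build request headers that carry the page's CSRF token."""
    parser = MetaTokenParser()
    parser.feed(html)
    if parser.token is None:
        raise ValueError("no csrf-token meta tag found")
    # X-CSRF-Token is the header name Rails checks; assumed to apply here.
    return {"Referer": referer, "X-CSRF-Token": parser.token}


# Sample page for illustration only; a real page comes from client.get(url).
sample_page = (
    '<html><head><meta name="csrf-token" '
    'content="2auOlN425EcdnmmoXmd5HFCt4PkEOhq0gpjOCzxNKns="></head></html>'
)
headers = csrf_headers(
    sample_page, "https://app.greenhouse.io/people/new?hiring_plan_id=24047"
)
print(headers["X-CSRF-Token"])
```

With a real session, the returned dictionary could be passed as headers=headers to client.post. Putting the token in the form data, as shown above, remains the more direct route.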

Related

How to scrape data behind a login

I want to extract posts from a forum named "positive wellbeing during isolation" on HealthUnlocked.com.
I can extract posts without logging in, but I cannot extract them when logged in. I used url = 'https://solaris.healthunlocked.com/posts/positivewellbeing/popular?pageNumber={0}'.format(page) to extract posts, but I don't know how to connect it to the login, since that URL returns JSON.
I would appreciate it if you could help me.
import requests, json
import pandas as pd
from bs4 import BeautifulSoup
from time import sleep
url = "https://healthunlocked.com/private/programs/subscribed?user-id=1290456"
payload = {
    "username" : "my username goes here",
    "Password" : "my password goes here"
}
s= requests.Session()
p= s.post(url, data = payload)
headers = {"user-agent": "Mozilla/5.0"}
pages =2
data = []
listtitles=[]
listpost=[]
listreplies=[]
listpostID=[]
listauthorID=[]
listauthorName=[]
for page in range(1,pages):
    url = 'https://solaris.healthunlocked.com/posts/positivewellbeing/popular?pageNumber={0}'.format(page)
    r = requests.get(url,headers=headers)
    posts = json.loads(r.text)
    for post in posts:
        sleep(3.5)
        listtitles.append(post['title'])
        listreplies.append(post["totalResponses"])
        listpostID.append(post["postId"])
        listauthorID.append(post["author"]["userId"])
        listauthorName.append(post["author"]["username"])
        url = 'https://healthunlocked.com/positivewellbeing/posts/{0}'.format(post['postId'])
        r = requests.get(url,headers=headers)
        soup = BeautifulSoup(r.text, 'lxml')
        listpost.append(soup.select_one('div.post-body').get_text('|', strip=True))
## save to CSV
df=pd.DataFrame(list(zip(*[listpostID,listtitles,listpost,listreplies,listauthorID,listauthorName]))).add_prefix('Col')
df.to_csv('out1.csv',index=False)
print(df)
sleep(2)
For most websites, you first have to get a token by logging in. Most of the time, this is a cookie. Then, in authorized requests, you send that cookie along. Open the network tab in the developer tools and log in with your username and password. You'll be able to see how the login request is formatted and where it is sent. From there, just try to replicate it in your code.
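Note that the code in the question posts the login with the session s but then fetches pages with plain requests.get, which does not carry the session's cookies. Below is a minimal sketch of keeping both steps on one session. The function name login_and_fetch and its parameters are hypothetical, and the real login URL and form field names have to be copied from the network tab.

```python
def login_and_fetch(session, login_url, credentials, data_url, headers=None):
    """Log in with `session`, then fetch `data_url` with the same session.

    `session` is expected to behave like requests.Session: cookies set by
    the login response are stored on it and sent with later requests.
    """
    login_response = session.post(login_url, data=credentials, headers=headers)
    login_response.raise_for_status()

    # Same session object, so the login cookies travel with this request.
    response = session.get(data_url, headers=headers)
    response.raise_for_status()
    return response


# Usage with requests (not executed here; URLs and fields are placeholders):
#   import requests
#   with requests.Session() as s:
#       response = login_and_fetch(s, login_url, payload, data_url,
#                                  headers={"user-agent": "Mozilla/5.0"})
#       posts = response.json()
```

Taking the session as a parameter keeps the function easy to try against a stand-in object before pointing it at the real site.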

Why can't I log in to a website using the requests module

So I needed to login to a website as I need to do an action that requires logging in first.
Here's my code:
import requests
from bs4 import BeautifulSoup
logdata = {'username': 'xxx', 'password': 'xxx'}
url = 'https://darkside-ro.com/?module=account&action=login&return_url='
with requests.Session() as s:
    r = s.post(url, data=logdata)
    html = r.text
    soup = BeautifulSoup(html, "html.parser")
    print(soup.title.get_text())
it gives me the title of when you're not logged in :(
I'm not sure why I flagged this as a duplicate, sorry.
Okay, so I created a dummy account and tried logging in. When I submitted the form, I noticed that the data sent to https://darkside-ro.com/?module=account&action=login&return_url= includes a server field in addition to username and password.
So to fix your issue, you have to include a server entry in your logdata dictionary.
import requests
from bs4 import BeautifulSoup
logdata = {
'username': 'abc123456',
'password': 'abc123456',
'server': 'Darkside RO'
}
url = 'https://darkside-ro.com/?module=account&action=login&return_url='
with requests.Session() as s:
    r = s.post(url, data=logdata)
    html = r.text
    soup = BeautifulSoup(html, 'html.parser')
    print(soup.title.get_text())
Running the code above will print
Darkside RO - The Rise of Skywalker
PS: When you do this kind of thing again, it is a good idea to check the form for hidden inputs by inspecting its elements. On the site above, the form has
<input type="hidden" name="server" value="Darkside RO">
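As a rough sketch of that check done in code, the snippet below collects every hidden input from a page and merges the values into the login data. It uses the standard-library HTMLParser so it runs without extra packages. The names HiddenInputParser, hidden_fields and sample_form are hypothetical, and the sample form is a simplified stand-in for the real page.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        if tag == "input" and input_type == "hidden":
            name = attributes.get("name")
            if name:
                self.fields[name] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden form fields found in `html`."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Simplified stand-in for the login form; a real page comes from s.get(url).text.
sample_form = """
<form method="post">
  <input type="hidden" name="server" value="Darkside RO">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

logdata = {"username": "xxx", "password": "xxx"}
logdata.update(hidden_fields(sample_form))
print(logdata)
```

For the sample form this adds 'server': 'Darkside RO' to logdata. On a page with several forms, the result would mix fields from all of them, so the parser would need to be restricted to the login form.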

CSRF Token Missing When Posting Request To DVWA Using Python Requests Library

I'm trying to make a program that will allow me to submit a username and password on a website. For this, I am using DVWA (Damn Vulnerable Web Application), which is running on localhost:8080.
But whenever I try to send a POST request, it always returns an error.
csrf token is incorrect
Here's my code:
import requests
url = 'http://192.168.43.1:8080/login.php'
data_dict = {"username": "admin", "password": "password", "Login": "Login"}
response = requests.post(url, data_dict)
print(response.text)
You need to make a GET request to that URL first and parse the correct "CSRF" value from the response (in this case user_token). In the response HTML, you can find the hidden value:
<input type="hidden" name="user_token" value="28e01134ddf00ec2ea4ce48bcaf0fc55">
Also, it seems that you need to send the cookies from the first GET request with the following request. This is done automatically when you use a requests.Session() object. You can see the cookies with, for example, print(resp.cookies) on the first response.
Here is the modified code. I'm using the BeautifulSoup library to parse the HTML: it finds the correct input field and gets its value.
The POST request afterwards sends this value in the user_token parameter.
from bs4 import BeautifulSoup
import requests
with requests.Session() as s:
    url = 'http://192.168.43.1:8080/login.php'
    resp = s.get(url)
    parsed_html = BeautifulSoup(resp.content, features="html.parser")
    input_value = parsed_html.body.find('input', attrs={'name':'user_token'}).get("value")
    data_dict = {"username": "admin", "password": "password", "Login": "Login", "user_token":input_value}
    response = s.post(url, data_dict)
    print(response.content)

Module 'requests' doesn't go through with the login

I am trying to get information from a website by using the requests module. To get to the information you have to be logged in, and then you can access the page. I looked into the input tags and noticed that they are called login_username and login_password, but for some reason the POST doesn't go through. I also read here that someone solved it by waiting a few seconds before going through to the other page, but that didn't help either.
Here is my code:
import requests
import time
#This URL will be the URL that your login form points to with the "action" tag.
loginurl = 'https://jadepanel.nephrite.ro/login'
#This URL is the page you actually want to pull down with requests.
requesturl = 'https://jadepanel.nephrite.ro/clan/view/123'
payload = {
'login_username': 'username',
'login_password': 'password'
}
with requests.Session() as session:
    post = session.post(loginurl, data=payload)
    time.sleep(3)
    r = session.get(requesturl)
    print(r.text)
login_username and login_password are not all the necessary parameters. If you look at the /login/ POST request in the browser developer tools, you would see that there is also a _token being sent.
This is something you would need to parse out of the login HTML. So the flow would be the following:
1. Get the https://jadepanel.nephrite.ro/login page.
2. Parse the HTML and extract the _token value.
3. Make a POST request with the login, password and token.
4. Use the logged-in session to navigate the site.
For the HTML parsing we could use BeautifulSoup (there are other options, of course):
from bs4 import BeautifulSoup
login_html = session.get(loginurl).text
soup = BeautifulSoup(login_html, "html.parser")
token = soup.find("input", {"name": "_token"})["value"]
payload = {
'login_username': 'username',
'login_password': 'password',
'_token': token
}
Complete code:
import time
import requests
from bs4 import BeautifulSoup
# This URL will be the URL that your login form points to with the "action" tag.
loginurl = 'https://jadepanel.nephrite.ro/login'
# This URL is the page you actually want to pull down with requests.
requesturl = 'https://jadepanel.nephrite.ro/clan/view/123'
with requests.Session() as session:
    login_html = session.get(loginurl).text
    soup = BeautifulSoup(login_html, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]
    payload = {
        'login_username': 'username',
        'login_password': 'password',
        '_token': token
    }
    post = session.post(loginurl, data=payload)
    time.sleep(3)
    r = session.get(requesturl)
    print(r.text)

python-requests and complicated forms

I'm trying to make a web scraper for my university website, but I can't get past the login page.
import requests
URL = "https://login.ull.es/cas-1/login?service=https%3A%2F%2Fcampusvirtual.ull.es%2Flogin%2Findex.php%3FauthCAS%3DCAS"
USER = "myuser"
PASS = "mypassword"
payload = {
"username": USER,
"password": PASS,
"warn": "false",
"lt": "LT-2455188-fQ7b5JcHghCg1cLYvIMzpjpSEd0rlu",
"execution": "e1s1",
"_eventId": "submit",
"submit": "submit"
}
with requests.Session() as s:
    r = s.post(URL, data=payload)
    #r = s.get(r"http://campusvirtual.ull.es/my/index.php")
    with open("test.html","w") as f:
        f.write(r.text)
That code is obviously not working and I don't know where the mistake is. I tried putting only the username and the password in the payload (the other values are marked as hidden in the page source), but that also fails.
Can anyone point me in the right direction? Thanks. (Sorry for my English.)
The "lt": "LT-2455188-fQ7b5JcHghCg1cLYvIMzpjpSEd0rlu" is a session ID or some sort of anti-CSRF protection or similar (wild guess: an HMAC-ed random ID number). What matters is that it is not a constant value, so you will have to read it from the same URL by issuing a GET request.
In the GET response you have something like:
<input type="hidden" name="lt" value="LT-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" />
Additionally, there is a JSESSIONID cookie that might be important.
This should be your flow:
1. GET the URL.
2. Extract the lt parameter and the JSESSIONID cookie from the response.
3. Fill the payload['lt'] field.
4. Set the cookie header.
5. POST to the same URL.
Extracting the cookie is very simple; see the requests documentation.
Extracting the lt parameter is a bit more difficult, but you can do it with the BeautifulSoup package. Assuming that you have the response body in a variable named text, you can use:
from bs4 import BeautifulSoup
payload['lt'] = BeautifulSoup(text, 'html.parser').find('input', {'name': 'lt', 'type': 'hidden'}).get('value')
