Web Scraping using Requests - Python

I am trying to get data using the Requests library, but I'm doing something wrong. To explain, here is the manual search I am trying to reproduce:
URL - https://www9.sabesp.com.br/agenciavirtual/pages/template/siteexterno.iface?idFuncao=18
I fill in the “Informe o RGI” field, click the Prosseguir button (like Next), and the account results are displayed.
Before writing any code, I did the search manually and checked the Form Data in the browser's developer tools.
And then I tried it with this code:
import requests
data = { "frmhome:rgi1": "0963489410"}
url = "https://www9.sabesp.com.br/agenciavirtual/block/send-receive-updates"
res = requests.post(url, data=data)
print(res.text)
My output is:
<session-expired/>
What am I doing wrong?
Many thanks.

When you go to the site in a browser, a session is created and its ID is stored in a cookie on your machine. The browser then sends that cookie along with every request, including the POST. You receive a session-expired error because your script isn't sending any session data with its request.
Try this code. It requests the entry page first and stores the cookies. The cookies are then sent with the POST request.
import requests
session = requests.Session() # start session
# get entry page with cookies
response = session.get('https://www9.sabesp.com.br/agenciavirtual/pages/home/paginainicial.iface', timeout=30)
cks = session.cookies  # the session's cookies, populated by the GET above
print(session.cookies.get_dict())
data = { "frmhome:rgi1": "0963489410"}
url = "https://www9.sabesp.com.br/agenciavirtual/block/send-receive-updates"
res = requests.post(url, data=data, cookies=cks) # send cookies with request
print(res.text)
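Since the cookies already live on the Session object, an equivalent and slightly tidier variant is to send the POST through the session itself; requests then attaches the stored cookies automatically:
res = session.post(url, data=data)  # the session sends its own cookies
print(res.text)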

Related

There is no valid form to send username and login using Python requests

I am trying to scrape a website called quantumonline.com. There are 2 forms there, neither of which has 'acctname' and 'pswrd' as values I can fill in. Here is my code:
import requests

session = requests.session()
data = {
    'acctname': 'myusername',
    'pswrd': 'mypassword'
}
session.post('http://quantumonline.com/login.cfm', data=data)
Afterward, I try to access a secure page on it using the same session, and it tells me to register. I have also tried using
data = {
    'acctname': 'myusername',
    'pswrd': 'mypassword',
    'submit': 'Login'
}
I have no clue why it won't work. Any help is appreciated.
It turns out the website requires the requests Session to have visited the homepage (picking up its cookies) before it will accept a login request. To everyone who helped me inch closer to the answer: thank you.
Here is the working code:
import requests

session = requests.Session()
data = {
    'acctname': 'username',
    'pswrd': 'password',
    'action': 'login_test.cfm'
}
headers = {'User-Agent': 'Mozilla/5.0'}
session.get('http://quantumonline.com', headers=headers)  # get cookies
response = session.post('http://quantumonline.com/login_test.cfm', headers=headers, data=data)
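To verify the login actually stuck, you can fetch a members-only page with the same session and check the response; the URL below is a placeholder for whichever secure page you were testing:
check = session.get('http://quantumonline.com/somepage.cfm', headers=headers)  # placeholder URL
print('register' not in check.text.lower())  # crude check: True suggests you are logged in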

How to use Python requests to log in to a website

I'm trying to log in to and scrape a job site, and send myself a notification whenever certain keywords are found. I think I have correctly traced the XPath for the value of the field "login[iovation]", but I cannot extract the value. Here is what I have done so far to log in:
import requests
from lxml import html

header = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
login_url = 'https://www.upwork.com/ab/account-security/login'
session_requests = requests.session()

# get csrf
result = session_requests.get(login_url)
tree = html.fromstring(result.text)
auth_token = list(set(tree.xpath('//*[@name="login[_token]"]/@value')))
auth_iovat = list(set(tree.xpath('//*[@name="login[iovation]"]/@value')))

# create payload
payload = {
    "login[username]": "myemail@gmail.com",
    "login[password]": "pa$$w0rD",
    "login[_token]": auth_token,
    "login[iovation]": auth_iovat,
    "login[redir]": "/home"
}

# perform login
scrapeurl = 'https://www.upwork.com/ab/find-work/'
result = session_requests.post(login_url, data=payload, headers=dict(referer=login_url))

# test the result
print(result.text)
This is a screenshot of the form data from a successful manual login.
This is because Upwork uses something called iovation (https://www.iovation.com/) to reduce fraud. iovation takes a digital fingerprint of your device/browser, which is sent via the login[iovation] parameter.
If you look at the JavaScript loaded on the site, you will find two scripts being loaded from the iesnare.com domain. This domain and many others are owned by iovation and drop third-party JavaScript to identify your device/browser.
I think if you copy the login[iovation] string from a successful login and send it along with all the HTTP headers as-is, including the browser's user agent, in your Python code, you should be OK.
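A minimal sketch of that idea, assuming you copy the values below from a successful browser login (the credentials, token, fingerprint, and header values here are placeholders, not real ones):
import requests

session_requests = requests.Session()
session_requests.headers.update({
    "User-Agent": "Mozilla/5.0",  # replace with your real browser's user agent
    "Referer": "https://www.upwork.com/ab/account-security/login",
})
payload = {
    "login[username]": "you@example.com",             # placeholder
    "login[password]": "yourpassword",                # placeholder
    "login[_token]": "TOKEN_PARSED_FROM_LOGIN_PAGE",  # placeholder CSRF token
    "login[iovation]": "STRING_COPIED_FROM_BROWSER",  # placeholder fingerprint
    "login[redir]": "/home",
}
result = session_requests.post("https://www.upwork.com/ab/account-security/login", data=payload)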
Are you sure that result is returning a 2XX status code?
When I run result = session_requests.get(login_url), it fetches a 403 status code, which means I'm not even reaching login_url itself.
They have an official API now, so there's no need for scraping; just register for API keys.

Can't emulate browser behavior with requests

I'm trying to send a POST request to a website to get a JSON response. I can see the JSON response in the Chrome inspector when I click on a link, but I can't get it using requests.
First I tried using a requests Session to get the cookies and send them with the POST request, to no avail.
import requests

session = requests.Session()
session.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
response = session.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997')
print(response.text)
Second, I used Selenium+PhantomJS to get the cookies and used them in requests. No results!
from selenium import webdriver
import requests

browser = webdriver.PhantomJS(PHANTOMJS_PATH)  # PHANTOMJS_PATH: path to the phantomjs binary
browser.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
all_cookie = {}
for cookie in browser.get_cookies():
    all_cookie[cookie['name']] = cookie['value']
rep = requests.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997', cookies=all_cookie)
It only works when I manually take the cookies from Chrome.
I can't see what the problem is!
import requests

session = requests.Session()
session.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
response = session.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997')
print(response.json())
Calling the json() method parses the JSON response. Using a requests Session keeps the session persistent, so the cookies are carried over automatically; you can inspect what came back via the cookies attribute:
response.cookies  # the CookieJar returned with the response
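As a quick sanity check, you can confirm that the initial GET actually stored cookies on the session before you send the POST:
print(session.cookies.get_dict())  # should be non-empty after the first GET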

save session/cookies with requests

I want to send a POST request to a login form whose action is /ShowLogin.xpl. I wrote code using requests that sends the POST request with the username and password credentials. However, when I try to print out the content of the web page, I still see the login page, because the session or the cookie isn't saved. Can anyone help me?
The url is: https://www.xplace.com/ShowLogin.xpl
My code is:
import requests

payload = {"j_username": "myusername", "j_password": "mypassword"}
with requests.Session() as s:
    webpage = s.post("https://www.xplace.com/ShowLogin.xpl", data=payload)
    # an authorised request
    r = s.get("https://www.xplace.com/ShowRecommendedProjects.xpl")
    print(r.text)
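As a debugging step (a sketch, not a confirmed fix), you can print the status code and the session's cookie jar inside the with block, right after the POST, to see whether the server set a session cookie at all:
print(webpage.status_code)       # 200 alone doesn't prove the login worked
print(s.cookies.get_dict())      # an empty dict means no session cookie was set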

Passing a CSRF token into requests.post()

I saw this post - Passing csrftoken with python Requests
I've been working through it, trying to make it work for Greenhouse. I'm trying to build a script that will automate profile creation.
I can fetch data using GET and cookies, but I think I'm getting stuck with the X-CSRF token. I downloaded the Live HTTP Headers plugin for Mozilla to get the CSRF token, but I'm unsure how to pass it in.
Here's what I have so far:
csrf = 'some_csrf_token'
cookie = 'some_cookie_id'
data = {'person_first_name': 'Morgan'}  # this is submitting my name on the form
url = 'https://app.greenhouse.io/people/new?hiring_plan_id=24047'  # submission form page
headers = {'Cookie': cookie}
r = requests.post(url, data=data, headers=headers)
Any thoughts on how I should construct my requests.post?
If you want requests to handle the cookies for you, you should use a session.
import requests

session = requests.session()
logindata = {
    'authenticity_token': 'whatevertokenis',
    'user[email]': 'your@loginemail.com',
    'user[password]': 'yourpassword',
    'user[remember_me]': '0'
}
# this should log you in; I don't have an account there to test
login = session.post('https://app.greenhouse.io/users/sign_in', data=logindata)
data = {'person_first_name': 'Morgan'}
url = 'https://app.greenhouse.io/people/new?hiring_plan_id=24047'
# unless you need to set a user agent or referrer address, you may not need to add headers
r = session.post(url, data=data)
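If the authenticity_token rotates per session, hardcoding it won't survive a new visit. A common pattern (a sketch, assuming the token sits in a hidden input named authenticity_token on the sign-in page) is to parse it out first and drop it into logindata:
import requests
from lxml import html

session = requests.session()
signin = session.get('https://app.greenhouse.io/users/sign_in')
tree = html.fromstring(signin.text)
# [0] fails loudly if the hidden input isn't found
token = tree.xpath('//input[@name="authenticity_token"]/@value')[0]
logindata['authenticity_token'] = token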
