I was trying to do this:
import requests

s = requests.Session()
login_data = dict(userName='user', password='pwd')
ra = s.post('http://example/checklogin.php', data=login_data)
print(ra.content)
print(ra.headers)
ans = dict(answer='5')
f = s.cookies
r = s.post('http://example/level1.php', data=ans, cookies=f)
print(r.content)
But the second POST request returns a 404 error. Can someone help me figure out why?
In the latest versions of requests, the Session object comes with cookie persistence built in; see the requests Session Objects docs. So you don't need to add the cookies manually. Just:
import requests

s = requests.Session()
login_data = dict(userName='user', password='pwd')
ra = s.post('http://example/checklogin.php', data=login_data)
print(ra.content)
print(ra.headers)
ans = dict(answer='5')
# no cookies argument needed -- the session re-sends them automatically
r = s.post('http://example/level1.php', data=ans)
print(r.content)
Just print the cookies to check whether you are logged in:
for cookie in s.cookies:
    print(cookie.name, cookie.value)
Also, is the example site yours? If not, it may be rejecting bots/crawlers. You can change your request's User-Agent so that it looks like you are using a browser.
For example:
import requests

s = requests.Session()
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36'
}
login_data = dict(userName='user', password='pwd')
ra = s.post('http://example/checklogin.php', data=login_data, headers=headers)
print(ra.content)
print(ra.headers)
ans = dict(answer='5')
r = s.post('http://example/level1.php', data=ans, headers=headers)
print(r.content)
I want to log in to a site (webnovel.com) through Facebook. Here is my code:
import requests
from bs4 import BeautifulSoup
import re

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0'
}
with requests.Session() as session:
    session.headers.update(headers)
    homepage = session.get('https://www.webnovel.com')
    loginFb = session.get('https://ptlogin.webnovel.com/login/facebook?appid=900&areaid=1&returnurl=https%3A%2F%2Fwww.webnovel.com%2FloginSuccess&auto=1&autotime=0&source=&ver=2&fromuid=0&target=iframe&option=&logintab=&popup=1&format=redirect', allow_redirects=False)
    loginFb2 = session.get(loginFb.headers['Location'], allow_redirects=False)
    loginFb3 = session.get(loginFb2.headers['Location'], allow_redirects=False)
    soup = BeautifulSoup(loginFb3.text, 'html.parser')
    action_url = soup.find('form', id='login_form')['action']
    link = re.search(r'https.*', action_url).group()
    hexInLink = re.findall(r'%[0-9A-Z]{2,2}', link)
    for item in hexInLink:
        link = link.replace(item, bytearray.fromhex(item[1:]).decode())
    action_url = link
    inputs = soup.find('form', id='login_form').findAll('input', {'type': ['hidden', 'submit']})
    post_data = {input.get('name'): input.get('value') for input in inputs}
    post_data['email'] = 'email'
    post_data['pass'] = 'pass'
    scripts = soup.findAll('script')
    scripts_string = '/n/'.join([script.text for script in scripts])
    datr_search = re.search(r'\["_js_datr","([^"]*)"', scripts_string, re.DOTALL)
    if datr_search:
        datr = datr_search.group(1)
    cookies = {'_js_datr': datr}
    r = session.post(action_url, data=post_data, cookies=cookies, allow_redirects=True)
    print(r.history[0].url)
But I get redirected to the same Facebook page without being logged in. Apparently the POST request doesn't get accepted, but I don't know what else to change or what information to add.
Thank you for your help.
Check whether _js_datr is present on the first page. I had the same problem and found that datr was sometimes missing.
Also, you can save all the pages to separate files (the loginFb3.text and r.text contents) for debugging.
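For example, a quick way to dump them (the file names here are just placeholders):
# save the rendered pages so you can open and inspect them in a browser
with open('loginFb3.html', 'w', encoding='utf-8') as f:
    f.write(loginFb3.text)
with open('result.html', 'w', encoding='utf-8') as f:
    f.write(r.text)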
I'm using this code to try to do some web scraping. I'm trying to access my school grades using requests and Beautiful Soup, and I'm having a lot of trouble logging in. I just get the error:
TypeError: 'NoneType' object has no attribute '__getitem__'
Here's the code that I'm using:
import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
login_data = {
    'name': 'my_username',
    'pass': 'my_password',
    'form_id': 'new_login_form',
    'op': 'Login'
}
with requests.Session() as s:
    url = 'https://irc.d125.org'
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html5lib')
    login_data['form_build_id'] = soup.find('input', attrs={'name': 'form_build_id'})['value']
    r = s.post(url, data=login_data, headers=headers)
    print(r.content)
Any help is appreciated! Thanks so much!
When the login button is pressed, the site sends an XHR request with the login information. The following should work; just put your username and password in the spaces provided.
Code
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
login_data = {
    'UserName': 'REPLACE_USER',  # enter username
    'Password': 'REPLACE_PASSWORD',  # enter password
    'RememberMe': False,
}
with requests.Session() as s:
    url = 'https://irc.d125.org/Login'
    s.get(url, headers=headers)  # prime the session cookies
    r = s.post(url, data=login_data)
    print(r.text)
You should use something to render the JavaScript on the page before posting the data. A good approach is to put your login script inside a Scrapy spider in combination with Splash (see https://github.com/scrapy-plugins/scrapy-splash), as sketched below.
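A minimal sketch of that setup, assuming a Splash instance running at localhost:8050; the UserName/Password field names are taken from the answer above and should be verified against the real form:

import scrapy
from scrapy_splash import SplashRequest, SplashFormRequest

class GradesLoginSpider(scrapy.Spider):
    name = 'grades_login'
    custom_settings = {
        'SPLASH_URL': 'http://localhost:8050',  # assumes a local Splash instance
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        },
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
    }

    def start_requests(self):
        # render the login page through Splash so its JavaScript runs first
        yield SplashRequest('https://irc.d125.org/Login', self.login, args={'wait': 2})

    def login(self, response):
        # submit the rendered form; the field names are assumptions
        yield SplashFormRequest.from_response(
            response,
            formdata={'UserName': 'my_username', 'Password': 'my_password'},
            callback=self.after_login,
        )

    def after_login(self, response):
        self.logger.info('Response length after login: %d', len(response.text))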
You can use Selenium. I use it to get my grades from my school's page, too.
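A minimal Selenium sketch, assuming the UserName/Password field names from the answer above and a standard submit button (inspect the real page to confirm the selectors):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # needs a matching chromedriver on your PATH
driver.get('https://irc.d125.org/Login')
driver.find_element(By.NAME, 'UserName').send_keys('my_username')  # assumed field name
driver.find_element(By.NAME, 'Password').send_keys('my_password')  # assumed field name
driver.find_element(By.CSS_SELECTOR, 'button[type="submit"]').click()  # assumed selector
print(driver.page_source)  # should now be the logged-in page
driver.quit()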
I'm trying to set the user agent for my urllib request:
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cj),
    urllib.request.HTTPRedirectHandler(),
    urllib.request.ProxyHandler({'http': proxy})
)
and finally:
response3 = opener.open("https://www.google.com:443/search?q=test", timeout=timeout_value).read().decode("utf-8")
What would be the best way to set the user-agent header to the following?
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36
With urllib we have two options, as far as I know.
build_opener returns an OpenerDirector object, which has an addheaders attribute. We can change the user-agent and other headers with that attribute.
opener.addheaders = [('User-Agent', 'My User-Agent')]
url = 'http://httpbin.org/user-agent'
r = opener.open(url, timeout=5)
text = r.read().decode("utf-8")
Alternatively, we can install the OpenerDirector object as the global opener with install_opener and use urlopen to submit the request. Now we can use Request to set the headers.
urllib.request.install_opener(opener)
url = 'http://httpbin.org/user-agent'
headers = {'user-agent': "My User-Agent"}
req = urllib.request.Request(url, headers=headers)
r = urllib.request.urlopen(req, timeout=5)
text = r.read().decode("utf-8")
Personally, I prefer the second method because it is more consistent. Once we install the opener all requests will have the same handlers, and we can continue using urllib the same way. However, if you don't want to use those handlers for all requests you should choose the first method and use addheaders to set headers for a specific OpenerDirector object.
With requests things are simpler.
We can use the session.headers attribute if we want to change the user-agent or other headers for all requests,
s = requests.Session()
s.headers['user-agent'] = "My User-Agent"
r = s.get(url, timeout=5)
or use the headers parameter if we want to set headers for a specific request only.
headers = {'user-agent': "My User-Agent"}
r = requests.get(url, headers=headers, timeout=5)
I am trying to log in to the member area of the following website:
https://trader.degiro.nl/
Unfortunately, I have tried many ways without success. The POST form seems to expect JSON, which is why I sent JSON instead of the usual POST data:
import requests
session = requests.Session()
data = {"username":"test", "password":"test", "isRedirectToMobile": "false", "loginButtonUniversal": ""}
url = "https://trader.degiro.nl/login/#/login"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36'}
r = session.post(url, headers=headers, json={'json_payload': data})
Does anyone have an idea why it doesn't work?
Looking at the request my browser sends, the code should be:
url = "https://trader.degiro.nl/login/secure/login"
...
r = session.post(url, headers=headers, json=data)
That is, there's no need to wrap the data in json_payload, and the URL is slightly different from the one for viewing the login page.
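Putting it together, a sketch of the full request (based on the code from the question; the endpoint comes from watching the browser's network tab):

import requests

session = requests.Session()
data = {"username": "test", "password": "test",
        "isRedirectToMobile": "false", "loginButtonUniversal": ""}
url = "https://trader.degiro.nl/login/secure/login"  # the endpoint the browser actually posts to
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36'}
r = session.post(url, headers=headers, json=data)  # send the dict itself as the JSON body
print(r.status_code)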
I have tried logging into GitHub using the following code:
url = 'https://github.com/login'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
           'login': 'username',
           'password': 'password',
           'authenticity_token': 'Token that keeps changing',
           'commit': 'Sign in',
           'utf8': '%E2%9C%93'
           }
res = requests.post(url)
print(res.text)
Now, res.text prints the code of the login page. I understand that this may be because the token keeps changing continuously. I have also tried setting the URL to https://github.com/session, but that does not work either.
Can anyone tell me a way to generate the token? I am looking for a way to log in without using the API. I had asked another question where I mentioned that I was unable to log in. One comment said that I was not doing it right and that it is possible to log in just by using the requests module, without the help of the GitHub API.
ME: So, can I log in to Facebook or GitHub using the POST method? I have tried that and it did not work.
THE USER: Well, presumably you did something wrong.
Can anyone please tell me what I did wrong?
After the suggestion about using sessions, I have updated my code:
s = requests.Session()
headers = {Same as above}
s.put('https://github.com/session', headers=headers)
r = s.get('https://github.com/')
print(r.text)
I still can't get past the login page.
I think you get back to the login page because you are redirected, and since your code doesn't send back your cookies, you can't keep a session. You are looking for session persistence, and requests provides it:
Session Objects: The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
http://docs.python-requests.org/en/master/user/advanced/
Actually, in a POST request the parameters should be in the request body, not in the headers, so the login data should go in the data parameter. For GitHub, the authenticity token is present in the value attribute of an input tag, and it can be extracted with the BeautifulSoup library.
This code works fine:
import requests
from getpass import getpass
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
login_data = {
    'commit': 'Sign in',
    'utf8': '%E2%9C%93',
    'login': input('Username: '),
    'password': getpass()
}
url = 'https://github.com/session'
session = requests.Session()
response = session.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html5lib')
login_data['authenticity_token'] = soup.find(
    'input', attrs={'name': 'authenticity_token'})['value']
response = session.post(url, data=login_data, headers=headers)
print(response.status_code)
response = session.get('https://github.com', headers=headers)
print(response.text)
This code works perfectly:
import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
login_data = {
    'commit': 'Sign in',
    'utf8': '%E2%9C%93',
    'login': 'your-username',
    'password': 'your-password'
}
with requests.Session() as s:
    url = "https://github.com/session"
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html5lib')
    login_data['authenticity_token'] = soup.find('input', attrs={'name': 'authenticity_token'})['value']
    r = s.post(url, data=login_data, headers=headers)
You can also try the PyGithub library, which wraps the GitHub API, to perform common GitHub tasks.
Check the link below:
https://github.com/PyGithub/PyGithub
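A minimal sketch of what that looks like (it assumes you authenticate with a personal access token, since GitHub has deprecated password authentication for the API):

from github import Github  # pip install PyGithub

g = Github('your-personal-access-token')  # placeholder token
user = g.get_user()
print(user.login)
for repo in user.get_repos():
    print(repo.full_name)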