I am struggling with python ,I want to Write a Python script that creates a cookie and Counts how many times the cookie is called during a session
I just have tried as per below:
import Cookie
import os
if os.environ.has_key('HTTP_COOKIE'):
cookie=SimpleCookie(os.environ['HTTP_COOKIE'])
cookie=SimpleCookie()
for key in initialvalues.keys():
if not cookie.has_key(key):
cookie[key]=intialvalues[key]
return cookie
if __name__=='__main__':
c=getCookie({'counter':0})
c['counter']=int(c['counter'].value)+1
print c
But I know it is wrong, can someone help me to write down the script?
Any help would be appreciated
I'm confused by your question. What I believe you want to do is request some webpage and count how many times your cookie was found? You can gather cookies using a CookieJar:
import urllib, urllib2, cookielib
url = "http://example.com/cookies"
form_data = {'username': '', 'password': '' }
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
form_data = urllib.urlencode(form_data)
# data returned from this pages contains redirection
resp = opener.open(url, form_data)
print resp.read()
for cookie in jar:
# Look for your cookie
Related
I am trying to login to a website using python.
The login URL is :
https://login.flash.co.za/apex/f?p=pwfone:login
and the 'form action' url is shown as :
https://login.flash.co.za/apex/wwv_flow.accept
When I use the ' inspect element' on chrome when logging in manually, these are the form posts that show up (pt_02 = password):
There a few hidden items that I'm not sure how to add into the python code below.
When I use this code, the login page is returned:
import requests
url = 'https://login.flash.co.za/apex/wwv_flow.accept'
values = {'p_flow_id': '1500',
'p_flow_step_id': '101',
'p_page_submission_id': '3169092211412',
'p_request': 'LOGIN',
'p_t01': 'solar',
'p_t02': 'password',
'p_checksum': ''
}
r = requests.post(url, data=values)
print r.content
How can I adjust this code to perform a login?
Chrome network:
This is more or less your script should look like. Use session to handle the cookies automatically. Fill in the username and password fields manually.
import requests
from bs4 import BeautifulSoup
logurl = "https://login.flash.co.za/apex/f?p=pwfone:login"
posturl = 'https://login.flash.co.za/apex/wwv_flow.accept'
with requests.Session() as s:
s.headers = {"User-Agent":"Mozilla/5.0"}
res = s.get(logurl)
soup = BeautifulSoup(res.text,"lxml")
values = {
'p_flow_id': soup.select_one("[name='p_flow_id']")['value'],
'p_flow_step_id': soup.select_one("[name='p_flow_step_id']")['value'],
'p_instance': soup.select_one("[name='p_instance']")['value'],
'p_page_submission_id': soup.select_one("[name='p_page_submission_id']")['value'],
'p_request': 'LOGIN',
'p_arg_names': soup.select_one("[name='p_arg_names']")['value'],
'p_t01': 'username',
'p_arg_names': soup.select_one("[name='p_arg_names']")['value'],
'p_t02': 'password',
'p_md5_checksum': soup.select_one("[name='p_md5_checksum']")['value'],
'p_page_checksum': soup.select_one("[name='p_page_checksum']")['value']
}
r = s.post(posturl, data=values)
print r.content
since I cannot recreate your case I can't tell you what exactly to change, but when I was doing such things I used Postman to intercept all requests my browser sends. So I'd install that, along with browser extension and then perform login. Then you can view the request in Postman, also view the response it received there, what's more it provides you with Python code of request too, so you could simply copy and use it then.
Shortly, use Pstman, perform login, clone their request.
I need to log me in a website with requests, but all I have try don't work :
from bs4 import BeautifulSoup as bs
import requests
s = requests.session()
url = 'https://www.ent-place.fr/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=5'
def authenticate():
headers = {'username': 'myuser', 'password': 'mypasss', '_Id': 'submit'}
page = s.get(url)
soup = bs(page.content)
value = soup.form.find_all('input')[2]['value']
headers.update({'value_name':value})
auth = s.post(url, params=headers, cookies=page.cookies)
authenticate()
or :
import requests
payload = {
'inUserName': 'user',
'inUserPass': 'pass'
}
with requests.Session() as s:
p = s.post('https://www.ent-place.fr/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=5', data=payload)
print(p.text)
print(p.status_code)
r = s.get('A protected web page url')
print(r.text)
When I try this with the .status_code, it return 200 but I want 401 or 403 for do a script like 'if login'...
I have found this but I think it works in python 2, but I use python 3 and I don't know how to convert... :
import requests
import sys
payload = {
'username': 'sopier',
'password': 'somepassword'
}
with requests.Session(config={'verbose': sys.stderr}) as c:
c.post('http://m.kaskus.co.id/user/login', data=payload)
r = c.get('http://m.kaskus.co/id/myform')
print 'sopier' in r.content
Somebody know how to do ?
Because each I have test test all script I have found and it don't work...
When you submit the logon, the POST request is sent to https://www.ent-place.fr/CookieAuth.dll?Logon not https://www.ent-place.fr/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=5 -- You get redirected to that URL afterwards.
When I tested this, the post request contains the following parameters:
curl:Z2F
flags:0
forcedownlevel:0
formdir:5
username:username
password:password
SubmitCreds.x:69
SubmitCreds.y:9
SubmitCreds:Ouvrir une session
So, you'll likely need to supply those additional parameters as well.
Also, the line s.post(url, params=headers, cookies=page.cookies) is not correct. You should pass headers into the keyword argument data not params -- params encodes to the request url -- you need to pass it in the form data. And I'm assuming you really mean payload when you say headers
s.post(url, data=headers, cookies=page.cookies)
The site you're trying to login to has an onClick JavaScript when you process the login form. requests won't be able to execute JavaScript for you. This may cause issues with the site functionality.
I am new to Python and Web Scraping and I am trying to write a very basic script that will get data from a webpage that can only be accessed after logging in. I have looked at a bunch of different examples but none are fixing the issue. This is what I have so far:
from bs4 import BeautifulSoup
import urllib, urllib2, cookielib
username = 'name'
password = 'pass'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('WebpageWithLoginForm')
resp = opener.open('WebpageIWantToAccess')
soup = BeautifulSoup(resp, 'html.parser')
print soup.prettify()
As of right now when I print the page it just prints the contents of the page as if I was not logged in. I think the issue has something to do with the way I am setting the cookies but I am really not sure because I do not fully understand what is happening with the cookie processor and its libraries.
Thank you!
Current Code:
import requests
import sys
EMAIL = 'usr'
PASSWORD = 'pass'
URL = 'https://connect.lehigh.edu/app/login'
def main():
# Start a session so we can have persistant cookies
session = requests.session(config={'verbose': sys.stderr})
# This is the form data that the page sends when logging in
login_data = {
'username': EMAIL,
'password': PASSWORD,
'LOGIN': 'login',
}
# Authenticate
r = session.post(URL, data=login_data)
# Try accessing a page that requires you to be logged in
r = session.get('https://lewisweb.cc.lehigh.edu/PROD/bwskfshd.P_CrseSchdDetl')
if __name__ == '__main__':
main()
You can use the requests module.
Take a look at this answer that i've linked below.
https://stackoverflow.com/a/8316989/6464893
I'm coding a crawler for www.researchgate.net, but it seems that I'll be stuck in the login page forever.
Here's my code:
import requests
from bs4 import BeautifulSoup
session = requests.Session()
params = {'login': 'my_email', 'password': 'my_password'}
session.post("https://www.researchgate.net/application.Login.html", data = params)
s = session.get("https://www.researchgate.net/search.Search.html?type=researcher&query=zhang")
print BeautifulSoup(s.text).title
Can anybody find anything wrong with my code? Why does s redirect to login page every time?
There are hidden fields in the login form that probably need to be supplied (I can't test - I don't have a login there).
One is request_token which is set to a long base64 encoded string. Others are invalidPasswordCount and loginCookie which might also be required.
Further to that there is a session cookie that you might need to send with the login credentials.
To make this work will require an initial GET to get the request_token, which you need to extract somehow - e.g. with BeautifulSoup. If you use your requests session then the cookie will be presented in the following POST, so you shouldn't need to worry about that.
import requests
from bs4 import BeautifulSoup
session = requests.Session()
# initial GET to retrieve token and set cookies
r = session.get('https://www.researchgate.net/application.Login.html')
soup = r.BeautifulSoup(r.text)
request_token = soup.find('input', attrs={'name':'request_token'})['value']
params = {'login': 'my_email', 'password': 'my_password', 'request_token': request_token, 'invalidPasswordCount': 0, 'loginCookie': 'yes'}
session.post("https://www.researchgate.net/application.Login.html", data=params)
s = session.get("https://www.researchgate.net/search.Search.html?type=researcher&query=zhang")
print BeautifulSoup(s.text).title
Thanks for mhawke, I modified my original code as he suggested and I finally logged in successfully.
Here's my new code:
import requests
from bs4 import BeautifulSoup
session = requests.Session()
loginpage = session.get("https://www.researchgate.net/application.Login.html")
request_token = BeautifulSoup(loginpage.text).form.find("input",{"name":"request_token"}).attrs["value"]
print request_token
params = {"request_token":request_token,
"invalidPasswordCount":"0",
'login': 'my_email',
'password': 'my_password',
"setLoginCookie":"yes"
}
session.post("https://www.researchgate.net/application.Login.html", data = params)
#print s.cookies.get_dict()
s = session.get("https://www.researchgate.net/search.Search.html?type=researcher&query=zhang")
print BeautifulSoup(s.text).title
After much reading here on Stackoverflow as well as the web I'm still struggling with getting things to work.
My challenge: to get access to a restricted part of a website for which I'm a member using Python and urllib2.
From what I've read the code should be like this:
mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
url = 'http://www.domain.com'
mgr.add_password(None, url, 'username', 'password')
handler = urllib2.HTTPBasicAuthHandler(mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
try:
response = urllib2.urlopen('http://www.domain.com/restrictedpage')
page = response.read()
print page.geturl()
except IOError, e:
print e
The print doesn't print "http://www.domain.com/restrictedpage", but shows "http://www.domain.com/login" so my credentials aren't stored/processed and I'm being redirected.
How can I get this to work? I've been trying for days and keep hitting the same dead ends. I've tried all the examples I could find to no avail.
My main question is: what's needed to authenticate to a website using Python and urllib2?
Quick question: what am I doing wrong?
Check first manually what is really happening when you are successfully authenticated (instructions with Chrome):
Open develper tools in Chrome (Ctrl + Shift + I)
Click Network tab
Go and do the authentication manually (go the the page, type user + passwd + submit)
check the POST method in the Network tab of the developer tools
check the Request Headers, Query String Parameters and Form Data. There you find all the information needed what you need to have in your own POST.
Then install "Advanced Rest Client (ARC)" Chrome extension
Use the ARC to construct a valid POST for authentication.
Now you know what to have in your headers and form data. Here's a sample code using Requests that worked for me for one particular site:
import requests
USERNAME = 'user' # put correct usename here
PASSWORD = 'password' # put correct password here
LOGINURL = 'https://login.example.com/'
DATAURL = 'https://data.example.com/secure_data.html'
session = requests.session()
req_headers = {
'Content-Type': 'application/x-www-form-urlencoded'
}
formdata = {
'UserName': USERNAME,
'Password': PASSWORD,
'LoginButton' : 'Login'
}
# Authenticate
r = session.post(LOGINURL, data=formdata, headers=req_headers, allow_redirects=False)
print r.headers
print r.status_code
print r.text
# Read data
r2 = session.get(DATAURL)
print "___________DATA____________"
print r2.headers
print r2.status_code
print r2.text
For HTTP Basic Auth you can refer this : http://www.voidspace.org.uk/python/articles/authentication.shtml