I am trying to write a small web-based proxy using python, I can fetch and show normal websites, but I can not login to facebook/gmail/...anything with login .
I have seen some examples of authentication here
http://docs.python.org/release/2.5.2/lib/urllib2-examples.html but I don't know how I can make a general solution for all web sites with login , any idea?
my code is :
def showurl():
url=request.vars.url
response = urllib2.urlopen(url)
html = response.read()
return html
Your proxy-server needs to store cookies, search stackoverflow for cookielib.
Many web sites authenticate clients in different way, so your job is to fake client as much as possible with your proxy-server. Some web sites authenticate by browser type, some by creating cookies and storing sessionId in it, or other JavaScript hidden content that allows to do some authentication steps.
As far as my small experience, all important stuff ends in cookies.
This is just flat example how to use cookielib.
import urllib, urllib2, cookielib, getpass
username = ''
button = 'submit'
www_login = 'http://website.com'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders.append(('User-agent', 'Mozilla/4.0'))
opener.addheaders.append( ('Referer', '/dev/null') )
login_data = urllib.urlencode({'username' : username, 'password': getpass.getpass("Password:"), 'login' : button})
resp = opener.open(www_login, login_data)
print resp.read()
EDITED:
Don't mislead yourself with "Basic HTTP Authentication" and authentication by facebook/gmail because it is different stuff. "Basic HTTP Authentication" or "Digest HTTP Authentication" is done by web-server not web-site that you want to log in.
http://www.voidspace.org.uk/python/articles/authentication.shtml#id24
Related
I have a script for Python 2 to login into a webpage and then move inside to reach a couple of files pointed to on the same site, but different pages. Python 2 let me open the site with my credentials and then create a opener.open() to keep the connection available to navigate to the other pages.
Here's the code that worked in Python 2:
$Your admin login and password
LOGIN = "*******"
PASSWORD = "********"
ROOT = "https:*********"
#The client have to take care of the cookies.
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
#POST login query on '/login_handler' (post data are: 'login' and 'password').
req = urllib2.Request(ROOT + "/login_handler",
urllib.urlencode({'login': LOGIN,
'password': PASSWORD}))
opener.open(rep)
#Set the right accountcode
for accountcode, queues in QUEUES.items():
req = urllib2.Request(ROOT + "/switch_to" + accountcode)
opener.open(req)
I need to do the same thing in Python 3. I have tried with request module and urllib, but although I can establish the first login, I don't know how to keep the opener to navigate the site. I found the OpenerDirector but it seems like I don't know how to do it, because I haven't reached my goal.
I have used this Python 3 code to get the result desired but unfortunately I can't get the csv file to print it.
enter image description here
Question: I don't know how to keep the opener to navigate the site.
Python 3.6ยป Documentation urllib.request.build_opener
Use of Basic HTTP Authentication:
import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
uri='https://mahler:8092/site-updates.py',
user='klem',
passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.example.com/login.html')
csv_content = f.read()
Use python requests library for python 3 and session.
http://docs.python-requests.org/en/master/user/advanced/#session-objects
Once you login your session will be automatically managed. You dont need to create your own cookie jar. Following is the sample code.
s = requests.Session()
auth={"login":LOGIN,"pass":PASS}
url=ROOT+/login_handler
r=s.post(url, data=auth)
print(r.status_code)
for accountcode, queues in QUEUES.items():
req = s.get(ROOT + "/switch_to" + accountcode)
print(req.text) #response text
Im trying to login and scrape a job site and send me notification when ever certain key words are found.I think i have correctly traced the xpath for the value of feild "login[iovation]" but i cannot extract the value, here is what i have done so far to login
import requests
from lxml import html
header = {"User-Agent":"Mozilla/4.0 (compatible; MSIE 5.5;Windows NT)"}
login_url = 'https://www.upwork.com/ab/account-security/login'
session_requests = requests.session()
#get csrf
result = session_requests.get(login_url)
tree=html.fromstring(result.text)
auth_token = list(set(tree.xpath('//*[#name="login[_token]"]/#value')))
auth_iovat = list(set(tree.xpath('//*[#name="login[iovation]"]/#value')))
# create payload
payload = {
"login[username]": "myemail#gmail.com",
"login[password]": "pa$$w0rD",
"login[_token]": auth_token,
"login[iovation]": auth_iovation,
"login[redir]": "/home"
}
#perform login
scrapeurl='https://www.upwork.com/ab/find-work/'
result=session_requests.post(login_url, data = payload, headers = dict(referer = login_url))
#test the result
print result.text
This is screen shot of form data when i login successfully
This is because upworks uses something called iOvation (https://www.iovation.com/) to reduce fraud. iOvation uses digital fingerprint of your device/browser, which are sent via login[iovation] parameter.
If you look at the javascripts loaded on your site, you will find two javascript being loaded from iesnare.com domain. This domain and many others are owned by iOvaiton to drop third party javascript to identify your device/browser.
I think if you copy the string from the successful login and send it over along with all the http headers as is including the browser agent in python code, you should be okie.
Are you sure that result is fetching 2XX code
When I am this code result = session_requests.get(login_url)..its fetching me a 403 status code, which means I am not going to login_url itself
They have an official API now, no need for scraping, just register for API keys.
I am planning to write a website crawler in Python using Requests and PyQuery.
However, the site I am targeting requires me to be signed into my account. Using Requests, is it possible for me to establish a session with the server (using my credentials for the site), and use this session to crawl sites that I have access to only when logged in?
I hope this question is clear, thank you.
Yes it is possible.
I don't know about PyQuery but I've made crawlers that log in to sites using urllib2.
All you need is to use cookiejar to handle cookies and send the login form using a request.
If you ask something more specific I will try to be more explicit too.
LE:
urllib2 is not a mess. It's the best library for such things in my opinion.
Here's a code snipet that will log in to a site (after that you can just parse the site normally):
import urllib
import urllib2
import cookielib
"""Adding cookie support"""
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
"""Next we will log in to the site. The actual url will be different and also the data.
You should check the log in form to see what parameters it takes and what values.
"""
data = {'username' : 'foo',
'password' : 'bar'
}
data = urllib.urlencode(data)
urllib2.urlopen('http://www.siteyouwanttoparse.com/login', data) #this should log us in
"""Now you can parse the site"""
html = urllib2.urlopen('http://www.siteyoutwanttoparse.com').read()
print html
I'm developing a client for some website,
when I use Chrome/Firefox to access the website, it writes some cookies in my local side, in addition to the Cookie field in HTTP response,
I need to extract those additional information from my local files to send a request which can be accepted by the remote server successfully
Can anyone tell me how to do it in Python?
Best,
You have many options. The best one seems to be to use urllib2. Take a look at How to use Python to login to a webpage and retrieve cookies for later usage? for some excellent answers.
Here's the code from the top answer there. It's to log in, set some cookies, and access a restricted page:
import urllib, urllib2, cookielib
username = 'myuser'
password = 'mypassword'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
opener.open('http://www.example.com/login.php', login_data)
resp = opener.open('http://www.example.com/hiddenpage.php')
print resp.read()
I'm trying to use a code read in Kent's Korner for Form-based authentication. At least I'm told the web site I'm trying to read is form-based authenticated.
But I don't seem to be able to get past the login page. The code I'm using is
Import urllib, urllib2, cookielib, string
# configure an opener that will handle cookies
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)
# use the opener to POST to the login form and the protected page
params = urllib.urlencode(dict(username='user', password='stuff'))
f = opener.open('http://www.hammernutrition.com/forums/memberlist.php?mode=viewprofile&u=1323', params)
data = f.read()
f.close()
f = opener.open('http://www.hammernutrition.com/forums/memberlist.php?mode=viewprofile&u=1323')
data = f.read()
f.close()
You can simulate a web browser in Python without using much too resources with mechanize
(Debian/Ubuntu package is called python-mechanize). It handles both cookies and submitting forms, just the way a web browser would do, one great example is a Python Dropbox Uploader script, which you can transform to your needs.