grabbing cookies through python

I have been trying to authenticate to my router for a long time now and have been utterly unsuccessful, so here is my question. What I am trying to do is authenticate to my router and grab the session cookie. After I log into my router using the browser, the URL displayed becomes http://192.168.1.2/DQOPHPHAILDUSWQC/userRpm/Index.htm; I am assuming the string between the router URL itself and userRpm is the cookie. I then want to request the URL again with that session cookie attached.

You can use cookielib; it will automatically handle the cookies for you.
You can load the initial URL and then inspect the contents of the cookie jar like this:
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# any cookies the response sets are stored in cj
data = opener.open("initialURL")
# print(data)
print(cj)
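If you then want to request the router page again with the session cookie attached, keep using the same opener: every request made through it automatically sends back whatever cookies are already in the jar. A minimal sketch along those lines (the router URLs are placeholders for your device's actual pages):
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# first request: any Set-Cookie header from the router ends up in cj
opener.open("http://192.168.1.2/")
for cookie in cj:
    print cookie.name, cookie.value
# second request: the cookie header is attached automatically by the opener
page = opener.open("http://192.168.1.2/userRpm/Index.htm").read()
Note that if the router puts a session token in the URL path itself (as the path shown in your browser suggests) rather than in a cookie, you would have to read that token out of the first response instead; the cookie jar only helps with real cookies.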

Related

How to post data to a website, store cookies, then get another page

I've tried solving the problem using Mechanize, but I couldn't get it to work.
A website only allows access to the data if cookies are sent after login. I need to do the following:
Log in using POST to a page
Store cookies
Access protected page
You can use a Session in Requests. From the documentation:
The Session object allows you to persist certain parameters across
requests. It also persists cookies across all requests made from the
Session instance.
Here's how a log in and subsequent request might look:
import requests
s = requests.Session()
s.verify = 'my_cert_file.crt'
r = s.post('https://secure-site.com/login', data={
    'username': my_username,
    'password': my_password,
})
# logging in sets a cookie which the session remembers
print s.cookies
r = s.get('https://secure-site.com/secure-data')
print r.json()
I came up with the following solution using Mechanize. Cookies are managed by mechanize.Browser.
import mechanize, time
br = mechanize.Browser()
resp = br.open('https://xxxxxxxxxxxxxxxxxxx')
br.select_form(nr=0)
br['username'] = username
br['password'] = password
response = br.submit()
time.sleep(1)
resp_second = br.open('https://secretwebpage')
print resp_second.read()

Establishing session with web app to crawl

I am planning to write a website crawler in Python using Requests and PyQuery.
However, the site I am targeting requires me to be signed into my account. Using Requests, is it possible for me to establish a session with the server (using my credentials for the site), and use this session to crawl sites that I have access to only when logged in?
I hope this question is clear, thank you.
Yes, it is possible.
I don't know about PyQuery, but I've made crawlers that log in to sites using urllib2.
All you need is a cookie jar to handle the cookies and to send the login form in a request.
If you ask something more specific I will try to be more explicit too.
Later edit:
urllib2 is not a mess. It's the best library for such things in my opinion.
Here's a code snippet that will log in to a site (after that you can just parse the site normally):
import urllib
import urllib2
import cookielib
# Adding cookie support
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
# Next we will log in to the site. The actual URL and the data will be different;
# check the login form to see what parameters it takes and what values.
data = {'username': 'foo',
        'password': 'bar'}
data = urllib.urlencode(data)
urllib2.urlopen('http://www.siteyouwanttoparse.com/login', data)  # this should log us in
# Now you can parse the site
html = urllib2.urlopen('http://www.siteyouwanttoparse.com').read()
print html
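Since the question mentions Requests specifically: the same login-then-crawl pattern works there too, because a requests.Session keeps its own cookie jar between calls. A rough sketch, reusing the hypothetical site and form fields from the snippet above (check the real login form for the actual URL and field names):
import requests
s = requests.Session()
# log in; any cookies the server sets are stored on the session
s.post('http://www.siteyouwanttoparse.com/login',
       data={'username': 'foo', 'password': 'bar'})
# later requests reuse those cookies, so pages behind the login are reachable
html = s.get('http://www.siteyouwanttoparse.com/protected').text
You can then feed html straight into PyQuery for parsing.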

read the cookies written by a certain site

I'm developing a client for a website. When I use Chrome/Firefox to access the site, it writes some cookies on my machine, in addition to the Cookie field in the HTTP response. I need to extract that additional information from my local files so I can send a request that the remote server will accept.
Can anyone tell me how to do this in Python?
You have many options. The best one seems to be to use urllib2. Take a look at How to use Python to login to a webpage and retrieve cookies for later usage? for some excellent answers.
Here's the code from the top answer there. It logs in, sets some cookies, and accesses a restricted page:
import urllib, urllib2, cookielib
username = 'myuser'
password = 'mypassword'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# the field names must match the inputs of the site's login form
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
opener.open('http://www.example.com/login.php', login_data)
resp = opener.open('http://www.example.com/hiddenpage.php')
print resp.read()
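If you really do need to reuse cookies the browser has already written to disk, rather than logging in again from Python, cookielib can also load a Netscape-format cookies.txt file. Modern Chrome/Firefox keep their cookies in their own databases, so you would first export them to that format (for example with a browser extension); the file name below is an assumption:
import cookielib, urllib2
# cookies.txt is assumed to be a Netscape-format export of the browser's cookies
cj = cookielib.MozillaCookieJar('cookies.txt')
cj.load(ignore_discard=True, ignore_expires=True)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://www.example.com/hiddenpage.php')
print resp.read()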

how to open facebook/gmail/authentication sites using python urllib2.urlopen?

I am trying to write a small web-based proxy using Python. I can fetch and show normal websites, but I cannot log in to facebook/gmail/...anything with a login.
I have seen some examples of authentication here:
http://docs.python.org/release/2.5.2/lib/urllib2-examples.html but I don't know how I can make a general solution for all websites with a login. Any ideas?
My code is:
def showurl():
    url = request.vars.url
    response = urllib2.urlopen(url)
    html = response.read()
    return html
Your proxy server needs to store cookies; search Stack Overflow for cookielib.
Many websites authenticate clients in different ways, so your job is to make your proxy server imitate a real client as closely as possible. Some sites check the browser type, some create cookies and store a session ID in them, and some use hidden JavaScript content to perform extra authentication steps.
In my limited experience, all the important stuff ends up in cookies.
This is just a flat example of how to use cookielib.
import urllib, urllib2, cookielib, getpass
username = ''
button = 'submit'
www_login = 'http://website.com'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders.append(('User-agent', 'Mozilla/4.0'))
opener.addheaders.append( ('Referer', '/dev/null') )
login_data = urllib.urlencode({'username' : username, 'password': getpass.getpass("Password:"), 'login' : button})
resp = opener.open(www_login, login_data)
print resp.read()
EDITED:
Don't confuse "Basic HTTP Authentication" with the way facebook/gmail authenticate you; they are different things. "Basic HTTP Authentication" or "Digest HTTP Authentication" is done by the web server, not by the website you want to log in to.
http://www.voidspace.org.uk/python/articles/authentication.shtml#id24
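For those web-server side schemes, urllib2 has dedicated handlers. Here is a small sketch of Basic authentication with a placeholder URL and credentials (Digest works the same way with HTTPDigestAuthHandler):
import urllib2
# placeholder URL and credentials, for illustration only
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://www.example.com/', 'user', 'secret')
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)
print opener.open('http://www.example.com/protected/').read()
Logging in to a site like facebook or gmail, on the other hand, means submitting their login form and keeping the cookies they set, as in the snippet above.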

Scrape a web page that requires they give you a session cookie first

I'm trying to scrape an Excel file from a government "muster roll" database. However, the URL I have to use to access this Excel file:
http://nrega.ap.gov.in/Nregs/FrontServlet?requestType=HouseholdInf_engRH&hhid=192420317026010002&actionVal=musterrolls&type=Normal
requires that I have a session cookie from the government site attached to the request.
How could I grab the session cookie with an initial request to the landing page (when they give you the session cookie) and then use it to hit the URL above to grab our excel file? I'm on Google App Engine using Python.
I tried this:
import urllib2
import cookielib
url = 'http://nrega.ap.gov.in/Nregs/FrontServlet?requestType=HouseholdInf_engRH&hhid=192420317026010002&actionVal=musterrolls&type=Normal'
def grab_data_with_cookie(cookie_jar, url):
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
    data = opener.open(url)
    return data
cj = cookielib.CookieJar()
#grab the data
data1 = grab_data_with_cookie(cj, url)
#the second time we do this, we get back the excel sheet.
data2 = grab_data_with_cookie(cj, url)
stuff2 = data2.read()
I'm pretty sure this isn't the best way to do this. How could I do this more cleanly, or even using the requests library?
Using requests this is a trivial task:
>>> url = 'http://httpbin.org/cookies/set/requests-is/awesome'
>>> r = requests.get(url)
>>> print r.cookies
{'requests-is': 'awesome'}
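For the two-step flow in the question (hit the landing page first so the server gives you its session cookie, then fetch the spreadsheet), a Session keeps that cookie for you. A sketch; the landing-page URL and output file name are assumptions:
import requests
s = requests.Session()
# first request: whatever Set-Cookie the server sends ends up in s.cookies
s.get('http://nrega.ap.gov.in/Nregs/FrontServlet')  # landing page (assumed)
# second request: the session cookie is sent back automatically
url = ('http://nrega.ap.gov.in/Nregs/FrontServlet?requestType=HouseholdInf_engRH'
       '&hhid=192420317026010002&actionVal=musterrolls&type=Normal')
resp = s.get(url)
with open('musterroll.xls', 'wb') as f:  # hypothetical output file name
    f.write(resp.content)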
Using cookies and urllib2:
import cookielib
import urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# use opener to open different urls
You can use the same opener for several connections:
data = [opener.open(url).read() for url in urls]
Or install it globally:
urllib2.install_opener(opener)
In the latter case the rest of the code looks the same with or without cookies support:
data = [urllib2.urlopen(url).read() for url in urls]
