I've set up a system to practice a padding oracle attack, and after much work I've discovered that my exploit isn't working because my code isn't maintaining state with a cookie. After reading up on cookies, I could still use a little help modifying my code so it properly maintains state.
I start off by making my cookie jar. This should also grab the cookie from the site I want (to my understanding):
import urllib2
import cookielib

# Cookie jar plus an opener that stores and re-sends cookies automatically
cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
opener.open('http://192.168.1.12/main_login.php')
I have working code that grabs the website data so I can parse it with BeautifulSoup:
usock = urllib2.urlopen("http://192.168.1.12/main_login.php")
data = usock.read()
usock.close()
And code that sends the POST with the appropriate data:
url = 'http://192.168.1.3/check_login.php'
values = {'login_captcha': CAPTCHAguess, 'captchaID': BogusCipher, 'iv': IVprime}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
response.close()
What do I need to change in the above two bits of code so that it will use the cookie to maintain state when pulling and POSTing the data?
You need to use the same opener for all requests for this to work. So instead of:
response = urllib2.urlopen(req)
use:
response = opener.open(req)
Mandatory note in these cases: consider using the excellent requests library.
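Putting it together with the snippets from the question (the hosts and the variables CAPTCHAguess, BogusCipher and IVprime are the question's own; the only point here is that a single opener carries the cookie across both requests):
import urllib
import urllib2
import cookielib

cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))

# GET the login page; the server's session cookie lands in cookieJar.
data = opener.open('http://192.168.1.12/main_login.php').read()

# POST with the same opener, so the stored cookie is sent back automatically.
values = {'login_captcha': CAPTCHAguess, 'captchaID': BogusCipher, 'iv': IVprime}
req = urllib2.Request('http://192.168.1.3/check_login.php', urllib.urlencode(values))
the_page = opener.open(req).read()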
I'm attempting to connect to a website that requires you to have a specific cookie to access it. For the sake of this question, we'll call the cookie 'required_cookie' and the value 'required_value'.
This is my code:
import http.cookiejar
import urllib.request
from urllib.request import Request, urlopen

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('required_cookie', 'required_value'), ('User-Agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

req = Request('https://www.thewebsite.com/')
webpage = urlopen(req).read()
print(webpage)
I'm new to urllib, so please answer as if I'm a beginner.
To do this with urllib, you need to:
Construct a Cookie object. The constructor isn't documented in the docs, but if you run help(http.cookiejar.Cookie) in the interactive interpreter, you can see that it demands values for all 16 attributes. Notice that the docs say, "It is not expected that users of http.cookiejar construct their own Cookie instances."
Add it to the cookiejar with cj.set_cookie(cookie).
Tell the cookiejar to add the correct headers to the request with cj.add_cookie_header(req).
Assuming you've configured the policy correctly, you're set.
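For reference, here is a rough sketch of those three steps with the question's required_cookie (the Cookie field values below are assumptions; adjust the domain and path to the site you're actually targeting):
import http.cookiejar
import urllib.request

cj = http.cookiejar.CookieJar()

# Step 1: construct the Cookie; all 16 attributes are required
# (most can be None/False for a simple cookie).
cookie = http.cookiejar.Cookie(
    version=0, name='required_cookie', value='required_value',
    port=None, port_specified=False,
    domain='www.thewebsite.com', domain_specified=True, domain_initial_dot=False,
    path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)

# Step 2: add it to the jar.
cj.set_cookie(cookie)

# Step 3: let the jar attach the Cookie header to the request.
req = urllib.request.Request('https://www.thewebsite.com/',
                             headers={'User-Agent': 'Mozilla/5.0'})
cj.add_cookie_header(req)

webpage = urllib.request.urlopen(req).read()
print(webpage)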
But this is a huge pain. As the docs for urllib.request say:
See also: The Requests package is recommended for a higher-level HTTP client interface.
And, unless you have some good reason you can't install requests, you really should go that way. urllib is tolerable for really simple cases, and it can be handy when you need to get deep under the covers—but for everything else, requests is much better.
With requests, your whole program becomes a one-liner:
webpage = requests.get('https://www.thewebsite.com/', cookies={'required_cookie': required_value}, headers={'User-Agent': 'Mozilla/5.0'}).text
… although it's probably more readable as a few lines:
cookies = {'required_cookie': required_value}
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.thewebsite.com/', cookies=cookies, headers=headers)
webpage = response.text
With the help of the Kite documentation (https://www.kite.com/python/answers/how-to-add-a-cookie-to-an-http-request-using-urllib-in-python), you can add a cookie this way:
import urllib.request

a_request = urllib.request.Request("http://www.kite.com/")
a_request.add_header("Cookie", "cookiename=cookievalue")
or in a different way:
from urllib.request import Request
url = "https://www.kite.com/"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0', 'Cookie':'myCookie=lovely'})
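Either way, the snippets above only attach the header; you still have to send the request. A minimal continuation (the urlopen call and the decode are additions for illustration, not part of the original answer):
from urllib.request import Request, urlopen

req = Request("https://www.kite.com/",
              headers={'User-Agent': 'Mozilla/5.0', 'Cookie': 'myCookie=lovely'})
with urlopen(req) as response:
    body = response.read().decode('utf-8', errors='replace')
print(body[:200])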
I want to use urllib2 through a proxying site: POST to the form field named "what", submit it, and get the resulting webpage back as a string. I know many have asked this question before (see here, for example), but I couldn't get their solutions to work for my example code below:
url = "http://anonymouse.org/anonwww.html"
posturl = "www.google.ca"
values = {'what':posturl}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
html = response.read()
print html
Piggybacking on Christian's answer:
requests is a very good library for this stuff... however, urllib2 also suffices:
import urllib2

def get_anon_content(url):
    anon_url = 'http://anonymouse.org/cgi-bin/anon-www.cgi/%s' % url
    req = urllib2.Request(anon_url)
    response = urllib2.urlopen(req)
    content = response.read()
    return content

url = 'http://www.google.ca'
print get_anon_content(url)
In your case you can just use this URL:
http://anonymouse.org/cgi-bin/anon-www.cgi/http://www.google.ca
It's the same thing as using Anonymouse through its site, except you don't have to go to the page; you just use the URL directly.
Next time, make it easier on yourself and use requests: you can get the same effect as urllib in about four lines, so check that out.
Good luck :)
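For what it's worth, a rough requests equivalent of the helper above (same Anonymouse URL scheme; a sketch rather than tested code):
import requests

def get_anon_content(url):
    # Anonymouse fetches whatever URL is appended to its CGI endpoint
    anon_url = 'http://anonymouse.org/cgi-bin/anon-www.cgi/%s' % url
    return requests.get(anon_url).text

print(get_anon_content('http://www.google.ca'))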
Using urlopen for URL queries also seems obvious. What I tried is:
import urllib2
query='http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
f = urllib2.urlopen(query)
s = f.read()
f.close()
However, for this specific URL the query fails with HTTP Error 403: Forbidden.
When entering this query in my browser, it works.
Also when using http://www.httpquery.com/ to submit the query, it works.
Do you have suggestions how to use Python right to grab the correct response?
It looks like the site requires cookies (which you can handle with urllib2), but an easier way, if you're going this route, is to use requests:
import requests
session = requests.session()
r = session.get('http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627')
This is generally a much easier and less-stressful method of retrieving URLs in Python.
requests will automatically store and re-use cookies for you. Creating a session is slightly overkill here, but is useful for when you need to submit data to login pages etc..., or re-use cookies across a site... etc...
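For example, a minimal sketch of re-using one session across a login and a follow-up request (the URLs and form fields here are placeholders, not from the question):
import requests

session = requests.session()

# Cookies set by the login response are stored on the session...
session.post('http://example.com/login', data={'user': 'me', 'password': 'secret'})

# ...and sent back automatically on later requests from the same session.
page = session.get('http://example.com/members-only').text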
Using urllib2, it's something like this:
import urllib2, cookielib

cookies = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies))
data = opener.open('http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627').read()
It appears that the urllib2 default user agent is banned by the host. You can simply supply your own user agent string:
import urllib2
url = 'http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
request = urllib2.Request(url, headers={"User-Agent" : "MyUserAgent"})
contents = urllib2.urlopen(request).read()
print contents
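If it turns out that both cookies and a custom User-Agent are needed, the two fixes combine in the same urllib2 style (a sketch, not a verified fix for this particular host):
import urllib2
import cookielib

cookies = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies))
# addheaders replaces the default 'Python-urllib' User-Agent for every request made with this opener
opener.addheaders = [('User-Agent', 'MyUserAgent')]

url = 'http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
contents = opener.open(url).read()
print contents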
I want to crawl bookmakers' betting quotes directly from their webpages. Currently I am trying to get the quotes from a provider called unibet.com. The problem: I need to send a POST request in order to get the appropriate filtering of the quotes I want.
Therefore I go to the following webpage https://www.unibet.com/betting/grid/all-football/germany/bundesliga/1000094994.odds# where, in the upper part of the bets section, there are several checkboxes. I uncheck every box except "Match", click the update button, and record the POST request with Chrome; the form fields it sends are the ones reproduced in the code below.
After that I get a filtered result that only contains the quotes for a match.
Now, I just want to have these quotes. Therefore I wrote the following python code:
import urllib
import urllib2

req = urllib2.Request('https://www.unibet.com/betting/grid/grid.do?eventGroupIds=1000094994')
req.add_header("Content-type", "application/x-www-form-urlencoded")
post_data = [('format', 'iframe'),
             ('filtered', 'true'),
             ('gridSelectedTab', '1'),
             ('_betOfferCategoryTab.filterOptions[1_604139].checked', 'true'),
             ('betOfferCategoryTab.filterOptions[1_604139].checked', 'on'),
             ('_betOfferCategoryTab.filterOptions[1_611318].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_611319].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_611321].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_604144].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_624677].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_604142].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_604145].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_611322].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_604148].checked', 'false'),
             ('gridSelectedTimeframe', '')]
post_data = urllib.urlencode(post_data)
req.add_header('Content-Length', str(len(post_data)))
resp = urllib2.urlopen(req, post_data)
html = resp.read()
The problem: instead of a filtered result, I get the full list of all quotes and bet types, as if all checkboxes had been checked. Why does my Python request return the unfiltered data?
The site stores your preferences in a session cookie. Because you're not capturing and sending that cookie, the site falls back to its default, unfiltered results when you POST the update.
Try this:
import urllib2
import cookielib

cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(
    urllib2.HTTPRedirectHandler(),
    urllib2.HTTPHandler(debuglevel=0),
    urllib2.HTTPSHandler(debuglevel=0),
    urllib2.HTTPCookieProcessor(cookiejar),
)
Now, instead of calling urllib2.urlopen(req, post_data), call opener.open(req, post_data); the opener keeps the session cookie across requests.
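A rough sketch of the whole flow (fetching the grid page first so the site can set its session cookie, and the trimmed-down post_data, are assumptions based on the question, not verified against the live site):
import urllib
import urllib2
import cookielib

cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(
    urllib2.HTTPRedirectHandler(),
    urllib2.HTTPHandler(debuglevel=0),
    urllib2.HTTPSHandler(debuglevel=0),
    urllib2.HTTPCookieProcessor(cookiejar),
)

# First request: let the site set its session cookie in the jar.
opener.open('https://www.unibet.com/betting/grid/all-football/germany/bundesliga/1000094994.odds').read()

# Second request: POST the filter options with the same opener, so the cookie
# recorded above is sent back automatically and the server returns the filtered grid.
post_data = urllib.urlencode([('format', 'iframe'), ('filtered', 'true')])
req = urllib2.Request('https://www.unibet.com/betting/grid/grid.do?eventGroupIds=1000094994')
req.add_header('Content-type', 'application/x-www-form-urlencoded')
html = opener.open(req, post_data).read()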
I'm trying to scrape an Excel file from a government "muster roll" database. However, the URL I use to access this Excel file:
http://nrega.ap.gov.in/Nregs/FrontServlet?requestType=HouseholdInf_engRH&hhid=192420317026010002&actionVal=musterrolls&type=Normal
requires that I have a session cookie from the government site attached to the request.
How could I grab the session cookie with an initial request to the landing page (when they give you the session cookie) and then use it to hit the URL above to grab our excel file? I'm on Google App Engine using Python.
I tried this:
import urllib2
import cookielib

url = 'http://nrega.ap.gov.in/Nregs/FrontServlet?requestType=HouseholdInf_engRH&hhid=192420317026010002&actionVal=musterrolls&type=Normal'

def grab_data_with_cookie(cookie_jar, url):
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
    data = opener.open(url)
    return data

cj = cookielib.CookieJar()

# grab the data
data1 = grab_data_with_cookie(cj, url)

# the second time we do this, we get back the excel sheet.
data2 = grab_data_with_cookie(cj, url)
stuff2 = data2.read()
I'm pretty sure this isn't the best way to do this. How could I do this more cleanly, or even using the requests library?
Using requests this is a trivial task:
>>> url = 'http://httpbin.org/cookies/set/requests-is/awesome'
>>> r = requests.get(url)
>>> print r.cookies
{'requests-is': 'awesome'}
Using cookies and urllib2:
import cookielib
import urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# use opener to open different urls
You can use the same opener for several connections:
data = [opener.open(url).read() for url in urls]
Or install it globally:
urllib2.install_opener(opener)
In the latter case the rest of the code looks the same with or without cookies support:
data = [urllib2.urlopen(url).read() for url in urls]