I am trying to access the Buxfer REST API using Python and urllib2.
The issue is I get the following response:
urllib2.HTTPError: HTTP Error 403: Forbidden
But when I try the same call through my browser, it works fine...
The script goes as follows:
import sys
import urllib2
import simplejson

username = "xxx@xxx.com"
password = "xxx"
#############
def checkError(response):
    result = simplejson.load(response)
    response = result['response']
    if response['status'] != "OK":
        print "An error occurred: %s" % response['status'].replace('ERROR: ', '')
        sys.exit(1)
    return response
base = "https://www.buxfer.com/api";
url = base + "/login?userid=" + username + "&password=" + password;
req = urllib2.Request(url=url)
response = checkError(urllib2.urlopen(req))
token = response['token']
url = base + "/budgets?token=" + token;
req = urllib2.Request(url=url)
response = checkError(urllib2.urlopen(req))
for budget in response['budgets']:
print "%12s %8s %10.2f %10.2f" % (budget['name'], budget['currentPeriod'], budget['limit'], budget['remaining'])
sys.exit(0)
I also tried using the requests library but the same error appears.
The server I am trying to access from is an Ubuntu 14.04 machine; any help explaining or solving why this happens would be appreciated.
EDIT:
This is the full error message:
{
'cookies': <<class 'requests.cookies.RequestsCookieJar'>[]>,
'_content': '
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /api/login
on this server.</p>
<hr>
<address>Apache/2.4.7 (Ubuntu) Server at www.buxfer.com Port 443</address>
</body></html>',
'headers': CaseInsensitiveDict({
'date': 'Sun, 31 Jan 2016 12:06:44 GMT',
'content-length': '291',
'content-type': 'text/html; charset=iso-8859-1',
'server': 'Apache/2.4.7 (Ubuntu)'
}),
'url': u'https://www.buxfer.com/api/login?password=xxxx&userid=xxxx%40xxxx.com',
'status_code': 403,
'_content_consumed': True,
'encoding': 'iso-8859-1',
'request': <PreparedRequest [GET]>,
'connection': <requests.adapters.HTTPAdapter object at 0x7fc7308102d0>,
'elapsed': datetime.timedelta(0, 0, 400442),
'raw': <urllib3.response.HTTPResponse object at 0x7fc7304d14d0>,
'reason': 'Forbidden',
'history': []
}
EDIT 2: (Network parameters in the Google Chrome browser)
Request Method:GET
Status Code:200 OK
Remote Address:52.20.61.39:443
Response Headers
HTTP/1.1 200 OK
Date: Mon, 01 Feb 2016 11:01:10 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Frame-Options: SAMEORIGIN
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Cache-Controle: no-cache
Set-Cookie: login=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=buxfer.com
Set-Cookie: remember=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=buxfer.com
Vary: Accept-Encoding
Content-Encoding: gzip
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: application/x-javascript; charset=utf-8
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate, sdch
Accept-Language:en-US,en;q=0.8,fr;q=0.6
Connection:keep-alive
Cookie:PHPSESSID=pjvg8la01ic64tkkfu1qmecv20; api-session=vbnbmp3sb99lqqea4q4iusd4v3; __utma=206445312.1301084386.1454066594.1454241953.1454254906.4; __utmc=206445312; __utmz=206445312.1454066594.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
Host:www.buxfer.com
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36
EDIT 3:
I can also access it through my local PyCharm console without any issues; it's just when I try to do it from my remote server...
It could be that you need to do a POST rather than a GET request. Most logins work this way.
Using the requests library, you would need:
response = requests.post(
    base + '/login',
    data={
        'userid': username,
        'password': password
    }
)
To affirm @Carl, from the official website:
This command only looks at POST parameters and discards GET parameters.
https://www.buxfer.com/help/api#login
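Putting the pieces together, a minimal sketch of the full login flow with requests (Python 2 print syntax to match the question; the checkError logic is reused from above, adapted to take already-parsed JSON):

import sys
import requests

username = "xxx@xxx.com"  # placeholders, as in the question
password = "xxx"
base = "https://www.buxfer.com/api"

def checkError(result):
    # same unwrapping as the urllib2 version, but on a parsed dict
    response = result['response']
    if response['status'] != "OK":
        print "An error occurred: %s" % response['status'].replace('ERROR: ', '')
        sys.exit(1)
    return response

# login must be a POST: per the docs quoted above, this call discards GET parameters
r = requests.post(base + '/login', data={'userid': username, 'password': password})
token = checkError(r.json())['token']

# later calls can keep passing the token as a query parameter
r = requests.get(base + '/budgets', params={'token': token})
for budget in checkError(r.json())['budgets']:
    print "%12s %8s %10.2f %10.2f" % (budget['name'], budget['currentPeriod'], budget['limit'], budget['remaining'])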
Related
I'm a bit new to the python requests library, and am having some trouble accessing cookies after forms authentication. I captured packets in wireshark, and I'm certain that the cookies are being set (HTTP stream output at bottom). I'm following documentation here: https://2.python-requests.org/en/master/user/quickstart/#cookies.
My request is as follows:
r = requests.post('http://192.168.2.111/Account/LogOn', data = {'User': 'Customer', 'Password': 'Customer', 'button': 'submit'})
If I invoke print(r.cookies), the only thing returned is <RequestsCookieJar[]>.
Or, if I try to access them using r.cookies['mydeviceAG_POEWebTool'], I get a key error:
KeyError: "name='mydeviceAG_POEWebTool', domain=None, path=None"
The HTTP Stream from wireshark is here:
POST /Account/LogOn HTTP/1.1
Host: 192.168.2.111
User-Agent: python-requests/2.24.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 49
Content-Type: application/x-www-form-urlencoded
User=Customer&Password=Customer&button=submit

HTTP/1.1 302 Found
Date: Thu, 24 Sep 2020 10:30:43 GMT
Server: Apache/2.4.25 (Debian)
Location: /States
X-AspNet-Version: 4.0.30319
Cache-Control: private
Set-Cookie: mydeviceAG_POEWebTool=8081AD7E6EEEB463C0AD8458; path=/
Set-Cookie: .mydeviceAG_POEWebTool_AUTH=jmnjqmKeL0ge8fz/sYrP3Xm+ntUTnPLrWBGtKAmAvnkKIjPKYQVn9xVrRa7EUEHLTfB1KNCKjotabnb7QqnDlQlZKuQkJ0J8rLmxuxrtCMFDsa/d6jyUj/PUckJ8V0Te; path=/; expires=Thu, 24 Sep 2020 13:30:44 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 112
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here</h2>
</body><html>
**After reading the link offered by @D-E-N below, and searching around a bit, I tried the code below, which gets me further but only stores the first of two cookies in the cookie jar. The server sends back two cookies:
Set-Cookie: AG_POEWebTool=9E508CCAE50EACB4AD1B33D8; path=/
Set-Cookie: .AG_POEWebTool_AUTH=WaXAtyh/tGnoZFPIiV4xoF6BhwGbr0jlaFq
but the only cookie stored is the AG_POEWebTool one.**
import requests

# proxies is defined elsewhere in my script
with requests.Session() as s:
    s.get('http://192.168.2.111/Account/LogOn', timeout=5,
          params={'ReturnUrl': '%2fLogfiles'}, proxies=proxies)
    r = s.post('http://192.168.2.111/Account/LogOn',
               data={'User': 'Customer', 'Password': 'Customer', 'button': 'submit'},
               timeout=5, params={'ReturnUrl': '%2fLogfiles'}, proxies=proxies)
    cookiedict = s.cookies.get_dict()
    print('URL:')
    for item in cookiedict.items():
        print(item)
    response = s.get('http://192.168.2.111/Logfiles')
This is the response that I get:
cookiedict items:
('AG_POEWebTool', '471B4EB1153E4733E0EA1A40')
Process finished with exit code 0
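One way to see every cookie the server actually sent, rather than only what the jar kept, is to read the raw urllib3 header list (requests merges duplicate headers, so r.headers shows only one Set-Cookie entry). A minimal sketch, reusing the host and form fields from the question:

import requests

with requests.Session() as s:
    # keep the 302 response so its Set-Cookie headers are inspectable here;
    # following the redirect would make r point at the final page instead
    r = s.post('http://192.168.2.111/Account/LogOn',
               data={'User': 'Customer', 'Password': 'Customer', 'button': 'submit'},
               allow_redirects=False)
    for value in r.raw.headers.getlist('Set-Cookie'):  # one entry per header line
        print(value)
    print(s.cookies.get_dict())

Comparing the two lists should show whether the second cookie is being dropped by the jar (e.g. because of its attributes) or never arriving at all.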
I am trying to get a JSON response from this URL.
But the JSON I see in the browser is different from what I get in Python's requests response.
The code and its output:
#code
import requests
r = requests.get("https://www.bigbasket.com/product/get-products/?slug=fruits-vegetables&page=1&tab_type=[%22all%22]&sorted_on=popularity&listtype=pc")
print("Status code: ", r.status_code)
print("JSON: ", r.json())
print("Headers:\n", r.headers)  # headers is a property, not a callable
#output
Status code: 200
JSON: '{"cart_info": {}, "tab_info": [], "screen_name": ""}'
Headers:
{'Content-Type': 'application/json',
'Content-Length': '52',
'Server': 'nginx',
'x-xss-protection': '1; mode=block',
'x-content-type-options': 'nosniff',
'x-frame-options': 'SAMEORIGIN',
'Access-Control-Allow-Origin': 'https://b2b.bigbasket.com',
'Date': 'Sat, 02 Sep 2017 18:43:51 GMT',
'Connection': 'keep-alive',
'Set-Cookie': '_bb_cid=4; Domain=.bigbasket.com; expires=Fri, 28-Aug-2037 18:43:51 GMT; Max-Age=630720000; Path=/, ts="2017-09-03 00:13:51.164"; Domain=.bigbasket.com; expires=Sun, 02-Sep-2018 18:43:51 GMT; Max-Age=31536000; Path=/, _bb_rd=6; Domain=.bigbasket.com; expires=Sun, 02-Sep-2018 18:43:51 GMT; Max-Age=31536000; Path=/'}
This is what Chrome shows in dev tools:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 4206
Server: nginx
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
Content-Encoding: gzip
x-frame-options: SAMEORIGIN
Access-Control-Allow-Origin: https://b2b.bigbasket.com
Date: Sat, 02 Sep 2017 15:43:20 GMT
Connection: keep-alive
Vary: Accept-Encoding
Set-Cookie: ts="2017-09-02 21:13:20.193"; Domain=.bigbasket.com; expires=Sun, 02-Sep-2018 15:43:20 GMT; Max-Age=31536000; Path=/
Set-Cookie: _bb_rd=6; Domain=.bigbasket.com; expires=Sun, 02-Sep-2018 15:43:20 GMT; Max-Age=31536000; Path=/
I also tried separating the query string and specifying it as the params argument, but it gives the same result.
import requests
s = requests.session()
s.get("https://www.bigbasket.com/product/get-products/?slug=fruits-vegetables&page=1&tab_type=[%22all%22]&sorted_on=popularity&listtype=pc")
r = s.get("https://www.bigbasket.com/product/get-products/?slug=fruits-vegetables&page=1&tab_type=[%22all%22]&sorted_on=popularity&listtype=pc")
print("Status code: ", r.status_code)
print("JSON: ", r.json())
This is happening because a different City ID is identified for your web browser and for Requests.
You can check the value of the _bb_cid cookie in both cases.
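If so, pinning _bb_cid to the browser's value should reproduce the browser's output; a minimal sketch (the value 4 comes from the Set-Cookie header in the question, substitute whatever your browser holds):

import requests

url = ("https://www.bigbasket.com/product/get-products/"
       "?slug=fruits-vegetables&page=1&tab_type=[%22all%22]"
       "&sorted_on=popularity&listtype=pc")

# send the same city id the browser uses; without it the server
# apparently falls back to the empty shell seen in the question
r = requests.get(url, cookies={'_bb_cid': '4'})
print("Status code: ", r.status_code)
print("JSON: ", r.json())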
I am currently trying to get data off http://www.spotrac.com/ that requires being signed in. My current attempt uses this code (which I put together by going through a bunch of other Stack Overflow questions on similar topics):
from bs4 import BeautifulSoup as bs
from requests import session

payload = {
    'id': 'contactForm',
    'cmd': 'http://www.spotrac.com/signin/submit/',
    'email': '*****',
    'password': '*****'
}

with session() as c:
    r_login = c.post('http://www.spotrac.com/signin/', data=payload)
    print(r_login.headers)
    response = c.get('http://www.spotrac.com/nba/cleveland-cavaliers/lebron-james')
    print(response.cookies)
    soup = bs(response.text, 'html.parser')
    with open('ex.html', 'w') as f:
        f.write(soup.prettify())
My current code does everything right, except I am not logged in when I'm making the request.
Thanks
You're sending the POST request to the wrong URL, and with an incorrect payload as well.
POST http://www.spotrac.com/signin/submit/ HTTP/1.1
Host: www.spotrac.com
Connection: keep-alive
Content-Length: 86
Cache-Control: max-age=0
Origin: http://www.spotrac.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: http://www.spotrac.com/signin/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8
Cookie: cisession=a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%2206021e191bdbbaf955f111f67b961056%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A11%3A%22119.9.105.6%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A108%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F55.0.2883.87+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1485487245%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3B%7Dd6089620b21ecce6837161605055ae04; _ga=GA1.2.910256341.1481865346; _gali=contactForm
redirect=http%3A%2F%2Fwww.spotrac.com%2F&email=sdfs%40gmail.com&password=lkasjdflksjad
HTTP/1.1 302 Found
Server: nginx
Date: Fri, 27 Jan 2017 04:21:16 GMT
Content-Type: text/html
Content-Length: 0
Connection: keep-alive
Set-Cookie: cisession=a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%22badb1275aee1cdad6736a6b4bb1ce809%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A11%3A%22119.9.105.6%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A108%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F55.0.2883.87+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1485490876%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3B%7Dad486866c32cac526487707cea85b8a9; expires=Fri, 10-Feb-2017 04:21:16 GMT; path=/
Location: http://www.spotrac.com/register/
X-Powered-By: PleskLin
MS-Author-Via: DAV
As you can see from the session above, the correct URL should be http://www.spotrac.com/signin/submit/, and the payload string is redirect=http%3A%2F%2Fwww.spotrac.com%2F&email=sdfs%40gmail.com&password=lkasjdflksjad, which is basically:
payload = {
    'redirect': 'http://www.spotrac.com/',
    'email': mail_address,
    'password': password
}
Also make sure to simulate the headers with the correct parameters, and you're good to go.
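A minimal sketch of the corrected flow, with the URL and payload fields taken from the capture above (email, password, and the trimmed-down headers are placeholders):

from requests import session

payload = {
    'redirect': 'http://www.spotrac.com/',
    'email': 'you@example.com',   # placeholder
    'password': 'your-password'   # placeholder
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)',
    'Referer': 'http://www.spotrac.com/signin/'
}

with session() as c:
    # POST to the submit endpoint from the capture, not to /signin/ itself
    r_login = c.post('http://www.spotrac.com/signin/submit/',
                     data=payload, headers=headers)
    # the cisession cookie set on the 302 keeps the session logged in
    response = c.get('http://www.spotrac.com/nba/cleveland-cavaliers/lebron-james')
    print(response.status_code)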
When I try to open http://www.comicbookdb.com/browse.php (which works fine in my browser) from Python, I get an empty response:
>>> import urllib.request
>>> content = urllib.request.urlopen('http://www.comicbookdb.com/browse.php')
>>> print(content.read())
b''
The same also happens when I set a User-agent:
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Mozilla/5.0')]
>>> content = opener.open('http://www.comicbookdb.com/browse.php')
>>> print(content.read())
b''
Or when I use httplib2 instead:
>>> import httplib2
>>> h = httplib2.Http('.cache')
>>> response, content = h.request('http://www.comicbookdb.com/browse.php')
>>> print(content)
b''
>>> print(response)
{'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'content-location': 'http://www.comicbookdb.com/browse.php', 'expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'content-length': '0', 'set-cookie': 'PHPSESSID=590f5997a91712b7134c2cb3291304a8; path=/', 'date': 'Wed, 25 Dec 2013 15:12:30 GMT', 'server': 'Apache', 'pragma': 'no-cache', 'content-type': 'text/html', 'status': '200'}
Or when I try to download it using cURL:
C:\>curl -v http://www.comicbookdb.com/browse.php
* About to connect() to www.comicbookdb.com port 80
* Trying 208.76.81.137... * connected
* Connected to www.comicbookdb.com (208.76.81.137) port 80
> GET /browse.php HTTP/1.1
User-Agent: curl/7.13.1 (i586-pc-mingw32msvc) libcurl/7.13.1 zlib/1.2.2
Host: www.comicbookdb.com
Pragma: no-cache
Accept: */*
< HTTP/1.1 200 OK
< Date: Wed, 25 Dec 2013 15:20:06 GMT
< Server: Apache
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Set-Cookie: PHPSESSID=0a46f2d390639da7eb223ad47380b394; path=/
< Content-Length: 0
< Content-Type: text/html
* Connection #0 to host www.comicbookdb.com left intact
* Closing connection #0
Opening the URL in a browser or downloading it with Wget seems to work fine, though:
C:\>wget http://www.comicbookdb.com/browse.php
--16:16:26-- http://www.comicbookdb.com/browse.php
=> `browse.php'
Resolving www.comicbookdb.com... 208.76.81.137
Connecting to www.comicbookdb.com[208.76.81.137]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
[ <=> ] 40,687 48.75K/s
16:16:27 (48.75 KB/s) - `browse.php' saved [40687]
As does downloading a different file from the same server:
>>> content = urllib.request.urlopen('http://www.comicbookdb.com/index.php')
>>> print(content.read(100))
b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n\t\t"http://www.w3.org/TR/1999/REC-html'
So why doesn't the other URL work?
It seems the server expects a Connection: keep-alive header, which curl, for example (and, I expect, the other failing clients too), does not add by default.
With curl you can use this command, which will display a non-empty response:
curl -v -H 'Connection: keep-alive' http://www.comicbookdb.com/browse.php
With Python you can use this code:
import httplib2
h = httplib2.Http('.cache')
response, content = h.request('http://www.comicbookdb.com/browse.php', headers={'Connection':'keep-alive'})
print(content)
print(response)
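The same header works with the requests library. Note that urllib.request itself, as far as I can tell, forcibly sends Connection: close on every request and overrides any value you set, which would explain why the original attempts returned nothing. A minimal sketch:

import requests

# explicitly send the Connection header this server appears to require
r = requests.get('http://www.comicbookdb.com/browse.php',
                 headers={'Connection': 'keep-alive'})
print(r.status_code, len(r.text))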
Hi, I am trying to scrape some data from this URL:
http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1
As you may have noticed, if cookies and session data are not yet set, you will be redirected to the base URL (http://www.21cineplex.com/).
I tried to do it like this:
import re
import urllib2
from cookielib import CookieJar

def main():
    try:
        cj = CookieJar()
        baseurl = "http://www.21cineplex.com"
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        opener.open(baseurl)
        urllib2.install_opener(opener)
        movieSource = urllib2.urlopen('http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1').read()
        splitSource = re.findall(r'<ul class="w462">(.*?)</ul>', movieSource)
        print splitSource
    except Exception, e:
        print str(e)
        print "Error occurred in main block"
However, I ended up failing to scrape that particular URL.
A quick inspection reveals that the website sets a session ID (PHPSESSID) and copies it into the client's cookies.
The question is: how do I work around this?
PS: I've tried to install request via pip, however it gives me a 404 (note that the PyPI package is actually named requests, not request):
Getting page https://pypi.python.org/simple/request/
Could not fetch URL https://pypi.python.org/simple/request/: HTTP Error 404: Not Found (request does not have any releases)
Will skip URL https://pypi.python.org/simple/request/ when looking for download links for request
Getting page https://pypi.python.org/simple/
URLs to search for versions for request:
* https://pypi.python.org/simple/request/
Getting page https://pypi.python.org/simple/request/
Could not fetch URL https://pypi.python.org/simple/request/: HTTP Error 404: Not Found (request does not have any releases)
Will skip URL https://pypi.python.org/simple/request/ when looking for download links for request
Could not find any downloads that satisfy the requirement request
Cleaning up...
Thanks to @Chainik, I got it to work now. I ended up modifying my code like this:
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
baseurl = "http://www.21cineplex.com/"
regex = '<ul class="w462">(.*?)</ul>'
opener.open(baseurl)
urllib2.install_opener(opener)
request = urllib2.Request('http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1')
request.add_header('Referer', baseurl)
requestData = urllib2.urlopen(request)
htmlText = requestData.read()
Once the HTML text is retrieved, it's all about parsing its content.
Cheers
Try setting a referer URL, see below.
Without referer URL set (302 redirect):
$ curl -I "http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1"
HTTP/1.1 302 Moved Temporarily
Server: nginx
Date: Thu, 19 Sep 2013 09:19:19 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.4.17
Set-Cookie: PHPSESSID=5effe043db4fd83b2c5927818cb1a7ca; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: kota=3; expires=Fri, 19-Sep-2014 09:19:19 GMT; path=/
Location: http://www.21cineplex.com/
With referer URL set (HTTP/200):
$ curl -I -e "http://www.21cineplex.com/" "http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1"
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 19 Sep 2013 09:19:24 GMT
Content-Type: text/html
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.4.17
Set-Cookie: PHPSESSID=a7abd6592c87e0c1a8fab4f855baa0a4; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: kota=3; expires=Fri, 19-Sep-2014 09:19:24 GMT; path=/
To set the referer URL using urllib, see this post.
-- ab1
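For completeness, the same two-step dance (hit the base URL first so PHPSESSID gets set, then send the Referer) in the requests library; a minimal sketch:

import requests

base = 'http://www.21cineplex.com/'
target = 'http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1'

with requests.Session() as s:  # the Session keeps the PHPSESSID cookie between calls
    s.get(base)
    r = s.get(target, headers={'Referer': base})
    print(r.status_code)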