I am trying to access the Buxfer REST API using Python and urllib2.
The issue is I get the following response:
urllib2.HTTPError: HTTP Error 403: Forbidden
But when I try the same call through my browser, it works fine...
The script goes as follows:
import sys
import urllib2
import simplejson

username = "xxx@xxx.com"
password = "xxx"
#############
def checkError(response):
    result = simplejson.load(response)
    response = result['response']
    if response['status'] != "OK":
        print "An error occurred: %s" % response['status'].replace('ERROR: ', '')
        sys.exit(1)
    return response
base = "https://www.buxfer.com/api";
url = base + "/login?userid=" + username + "&password=" + password;
req = urllib2.Request(url=url)
response = checkError(urllib2.urlopen(req))
token = response['token']
url = base + "/budgets?token=" + token;
req = urllib2.Request(url=url)
response = checkError(urllib2.urlopen(req))
for budget in response['budgets']:
print "%12s %8s %10.2f %10.2f" % (budget['name'], budget['currentPeriod'], budget['limit'], budget['remaining'])
sys.exit(0)
I also tried using the requests library but the same error appears.
The server I am trying to access from is an Ubuntu 14.04 machine; any help explaining or solving why this happens would be appreciated.
EDIT:
This is the full error message:
{
'cookies': <<class 'requests.cookies.RequestsCookieJar'>[]>,
'_content': '
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /api/login
on this server.</p>
<hr>
<address>Apache/2.4.7 (Ubuntu) Server at www.buxfer.com Port 443</address>
</body></html>',
'headers': CaseInsensitiveDict({
'date': 'Sun, 31 Jan 2016 12:06:44 GMT',
'content-length': '291',
'content-type': 'text/html; charset=iso-8859-1',
'server': 'Apache/2.4.7 (Ubuntu)'
}),
'url': u'https://www.buxfer.com/api/login?password=xxxx&userid=xxxx%40xxxx.com',
'status_code': 403,
'_content_consumed': True,
'encoding': 'iso-8859-1',
'request': <PreparedRequest [GET]>,
'connection': <requests.adapters.HTTPAdapter object at 0x7fc7308102d0>,
'elapsed': datetime.timedelta(0, 0, 400442),
'raw': <urllib3.response.HTTPResponse object at 0x7fc7304d14d0>,
'reason': 'Forbidden',
'history': []
}
EDIT 2: (Network parameters in the Google Chrome browser)
Request Method:GET
Status Code:200 OK
Remote Address:52.20.61.39:443
Response Headers
HTTP/1.1 200 OK
Date: Mon, 01 Feb 2016 11:01:10 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Frame-Options: SAMEORIGIN
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Cache-Controle: no-cache
Set-Cookie: login=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=buxfer.com
Set-Cookie: remember=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=buxfer.com
Vary: Accept-Encoding
Content-Encoding: gzip
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: application/x-javascript; charset=utf-8
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate, sdch
Accept-Language:en-US,en;q=0.8,fr;q=0.6
Connection:keep-alive
Cookie:PHPSESSID=pjvg8la01ic64tkkfu1qmecv20; api-session=vbnbmp3sb99lqqea4q4iusd4v3; __utma=206445312.1301084386.1454066594.1454241953.1454254906.4; __utmc=206445312; __utmz=206445312.1454066594.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
Host:www.buxfer.com
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36
EDIT 3:
I can also access it through my local PyCharm console without any issues; it's just when I try to do it from my remote server...
It could be that you need to do a POST rather than a GET request. Most logins work this way.
Using the requests library, you would need:
response = requests.post(
    base + '/login',
    data={
        'userid': username,
        'password': password
    }
)
To affirm @Carl, from the official website:
This command only looks at POST parameters and discards GET parameters.
https://www.buxfer.com/help/api#login
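Putting the pieces together, a minimal sketch of the full login flow with requests (Python 2 print syntax to match the question; the checkError logic is reused from above, adapted to take already-parsed JSON):

import sys
import requests

username = "xxx@xxx.com"  # placeholders, as in the question
password = "xxx"
base = "https://www.buxfer.com/api"

def checkError(result):
    # same unwrapping as the urllib2 version, but on a parsed dict
    response = result['response']
    if response['status'] != "OK":
        print "An error occurred: %s" % response['status'].replace('ERROR: ', '')
        sys.exit(1)
    return response

# login must be a POST: per the docs quoted above, this call discards GET parameters
r = requests.post(base + '/login', data={'userid': username, 'password': password})
token = checkError(r.json())['token']

# later calls can keep passing the token as a query parameter
r = requests.get(base + '/budgets', params={'token': token})
for budget in checkError(r.json())['budgets']:
    print "%12s %8s %10.2f %10.2f" % (budget['name'], budget['currentPeriod'], budget['limit'], budget['remaining'])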
Related
I'm a bit new to the python requests library, and am having some trouble accessing cookies after forms authentication. I captured packets in wireshark, and I'm certain that the cookies are being set (HTTP stream output at bottom). I'm following documentation here: https://2.python-requests.org/en/master/user/quickstart/#cookies.
My request is as follows:
r = requests.post('http://192.168.2.111/Account/LogOn', data = {'User': 'Customer', 'Password': 'Customer', 'button': 'submit'})
If I invoke print(r.cookies), the only thing returned is <RequestsCookieJar[]>.
Or, if I try to access them using r.cookies['mydeviceAG_POEWebTool'], I get a key error:
KeyError: "name='mydeviceAG_POEWebTool', domain=None, path=None"
The HTTP Stream from wireshark is here:
POST /Account/LogOn HTTP/1.1
Host: 192.168.2.111
User-Agent: python-requests/2.24.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 49
Content-Type: application/x-www-form-urlencoded
User=Customer&Password=Customer&button=submit

HTTP/1.1 302 Found
Date: Thu, 24 Sep 2020 10:30:43 GMT
Server: Apache/2.4.25 (Debian)
Location: /States
X-AspNet-Version: 4.0.30319
Cache-Control: private
Set-Cookie: mydeviceAG_POEWebTool=8081AD7E6EEEB463C0AD8458; path=/
Set-Cookie: .mydeviceAG_POEWebTool_AUTH=jmnjqmKeL0ge8fz/sYrP3Xm+ntUTnPLrWBGtKAmAvnkKIjPKYQVn9xVrRa7EUEHLTfB1KNCKjotabnb7QqnDlQlZKuQkJ0J8rLmxuxrtCMFDsa/d6jyUj/PUckJ8V0Te; path=/; expires=Thu, 24 Sep 2020 13:30:44 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 112
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here</h2>
</body><html>
**After reading the link offered by @D-E-N below, and searching around a bit, I tried the code below, which gets me further but only stores the first of two cookies in the cookie jar. The server sends back two cookies:
Set-Cookie: AG_POEWebTool=9E508CCAE50EACB4AD1B33D8; path=/
Set-Cookie: .AG_POEWebTool_AUTH=WaXAtyh/tGnoZFPIiV4xoF6BhwGbr0jlaFq
but the only cookie stored is the AG_POEWebTool one.**
import requests

# proxies is defined elsewhere in my script
with requests.Session() as s:
    s.get('http://192.168.2.111/Account/LogOn', timeout=5,
          params={'ReturnUrl': '%2fLogfiles'}, proxies=proxies)
    r = s.post('http://192.168.2.111/Account/LogOn',
               data={'User': 'Customer', 'Password': 'Customer', 'button': 'submit'},
               timeout=5, params={'ReturnUrl': '%2fLogfiles'}, proxies=proxies)
    cookiedict = s.cookies.get_dict()
    print('URL:')
    for item in cookiedict.items():
        print(item)
    response = s.get('http://192.168.2.111/Logfiles')
This is the response that I get:
cookiedict items:
('AG_POEWebTool', '471B4EB1153E4733E0EA1A40')
Process finished with exit code 0
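One way to see every cookie the server actually sent, rather than only what the jar kept, is to read the raw urllib3 header list (requests merges duplicate headers, so r.headers shows only one Set-Cookie entry). A minimal sketch, reusing the host and form fields from the question:

import requests

with requests.Session() as s:
    # keep the 302 response so its Set-Cookie headers are inspectable here;
    # following the redirect would make r point at the final page instead
    r = s.post('http://192.168.2.111/Account/LogOn',
               data={'User': 'Customer', 'Password': 'Customer', 'button': 'submit'},
               allow_redirects=False)
    for value in r.raw.headers.getlist('Set-Cookie'):  # one entry per header line
        print(value)
    print(s.cookies.get_dict())

Comparing the two lists should show whether the second cookie is being dropped by the jar (e.g. because of its attributes) or never arriving at all.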
I am trying to get a JSON response from this URL.
But the JSON I see in the browser is different from what I get in Python's requests response.
The code and its output:
#code
import requests
r = requests.get("https://www.bigbasket.com/product/get-products/?slug=fruits-vegetables&page=1&tab_type=[%22all%22]&sorted_on=popularity&listtype=pc")
print("Status code: ", r.status_code)
print("JSON: ", r.json())
print("Headers:\n", r.headers)  # headers is a property, not a callable
#output
Status code: 200
JSON: '{"cart_info": {}, "tab_info": [], "screen_name": ""}'
Headers:
{'Content-Type': 'application/json',
'Content-Length': '52',
'Server': 'nginx',
'x-xss-protection': '1; mode=block',
'x-content-type-options': 'nosniff',
'x-frame-options': 'SAMEORIGIN',
'Access-Control-Allow-Origin': 'https://b2b.bigbasket.com',
'Date': 'Sat, 02 Sep 2017 18:43:51 GMT',
'Connection': 'keep-alive',
'Set-Cookie': '_bb_cid=4; Domain=.bigbasket.com; expires=Fri, 28-Aug-2037 18:43:51 GMT; Max-Age=630720000; Path=/, ts="2017-09-03 00:13:51.164"; Domain=.bigbasket.com; expires=Sun, 02-Sep-2018 18:43:51 GMT; Max-Age=31536000; Path=/, _bb_rd=6; Domain=.bigbasket.com; expires=Sun, 02-Sep-2018 18:43:51 GMT; Max-Age=31536000; Path=/'}
This is what Chrome shows in dev tools:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 4206
Server: nginx
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
Content-Encoding: gzip
x-frame-options: SAMEORIGIN
Access-Control-Allow-Origin: https://b2b.bigbasket.com
Date: Sat, 02 Sep 2017 15:43:20 GMT
Connection: keep-alive
Vary: Accept-Encoding
Set-Cookie: ts="2017-09-02 21:13:20.193"; Domain=.bigbasket.com; expires=Sun, 02-Sep-2018 15:43:20 GMT; Max-Age=31536000; Path=/
Set-Cookie: _bb_rd=6; Domain=.bigbasket.com; expires=Sun, 02-Sep-2018 15:43:20 GMT; Max-Age=31536000; Path=/
I also tried separating the query string and specifying it as the params argument, but it gives the same result.
import requests
s = requests.session()
s.get("https://www.bigbasket.com/product/get-products/?slug=fruits-vegetables&page=1&tab_type=[%22all%22]&sorted_on=popularity&listtype=pc")
r = s.get("https://www.bigbasket.com/product/get-products/?slug=fruits-vegetables&page=1&tab_type=[%22all%22]&sorted_on=popularity&listtype=pc")
print("Status code: ", r.status_code)
print("JSON: ", r.json())
This is happening because a different City ID is identified for your web browser and for Requests.
You can check the value of the _bb_cid cookie in both cases.
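If so, pinning _bb_cid to the browser's value should reproduce the browser's output; a minimal sketch (the value 4 comes from the Set-Cookie header in the question, substitute whatever your browser holds):

import requests

url = ("https://www.bigbasket.com/product/get-products/"
       "?slug=fruits-vegetables&page=1&tab_type=[%22all%22]"
       "&sorted_on=popularity&listtype=pc")

# send the same city id the browser uses; without it the server
# apparently falls back to the empty shell seen in the question
r = requests.get(url, cookies={'_bb_cid': '4'})
print("Status code: ", r.status_code)
print("JSON: ", r.json())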
I am currently trying to get data off http://www.spotrac.com/ that requires being signed in. My current attempt uses this code (which I put together by going through a bunch of other Stack Overflow questions on similar topics):
from bs4 import BeautifulSoup as bs
from requests import session

payload = {
    'id': 'contactForm',
    'cmd': 'http://www.spotrac.com/signin/submit/',
    'email': '*****',
    'password': '*****'
}

with session() as c:
    r_login = c.post('http://www.spotrac.com/signin/', data=payload)
    print(r_login.headers)
    response = c.get('http://www.spotrac.com/nba/cleveland-cavaliers/lebron-james')
    print(response.cookies)
    soup = bs(response.text, 'html.parser')
    with open('ex.html', 'w') as f:
        f.write(soup.prettify())
My current code does everything right, except I am not logged in when I'm making the request.
Thanks
You're sending the POST request to the wrong URL, and with an incorrect payload as well.
POST http://www.spotrac.com/signin/submit/ HTTP/1.1
Host: www.spotrac.com
Connection: keep-alive
Content-Length: 86
Cache-Control: max-age=0
Origin: http://www.spotrac.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: http://www.spotrac.com/signin/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8
Cookie: cisession=a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%2206021e191bdbbaf955f111f67b961056%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A11%3A%22119.9.105.6%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A108%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F55.0.2883.87+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1485487245%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3B%7Dd6089620b21ecce6837161605055ae04; _ga=GA1.2.910256341.1481865346; _gali=contactForm
redirect=http%3A%2F%2Fwww.spotrac.com%2F&email=sdfs%40gmail.com&password=lkasjdflksjad
HTTP/1.1 302 Found
Server: nginx
Date: Fri, 27 Jan 2017 04:21:16 GMT
Content-Type: text/html
Content-Length: 0
Connection: keep-alive
Set-Cookie: cisession=a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%22badb1275aee1cdad6736a6b4bb1ce809%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A11%3A%22119.9.105.6%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A108%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F55.0.2883.87+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1485490876%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3B%7Dad486866c32cac526487707cea85b8a9; expires=Fri, 10-Feb-2017 04:21:16 GMT; path=/
Location: http://www.spotrac.com/register/
X-Powered-By: PleskLin
MS-Author-Via: DAV
As you can see from the session above, the correct URL should be http://www.spotrac.com/signin/submit/, and the payload string is redirect=http%3A%2F%2Fwww.spotrac.com%2F&email=sdfs%40gmail.com&password=lkasjdflksjad, which is basically:
payload = {
    'redirect': 'http://www.spotrac.com/',
    'email': mail_address,
    'password': password
}
Also make sure to simulate the headers with the correct parameters, and you're good to go.
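A minimal sketch of the corrected flow, with the URL and payload fields taken from the capture above (email, password, and the trimmed-down headers are placeholders):

from requests import session

payload = {
    'redirect': 'http://www.spotrac.com/',
    'email': 'you@example.com',   # placeholder
    'password': 'your-password'   # placeholder
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)',
    'Referer': 'http://www.spotrac.com/signin/'
}

with session() as c:
    # POST to the submit endpoint from the capture, not to /signin/ itself
    r_login = c.post('http://www.spotrac.com/signin/submit/',
                     data=payload, headers=headers)
    # the cisession cookie set on the 302 keeps the session logged in
    response = c.get('http://www.spotrac.com/nba/cleveland-cavaliers/lebron-james')
    print(response.status_code)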
When I try to open http://www.comicbookdb.com/browse.php (which works fine in my browser) from Python, I get an empty response:
>>> import urllib.request
>>> content = urllib.request.urlopen('http://www.comicbookdb.com/browse.php')
>>> print(content.read())
b''
The same also happens when I set a User-agent:
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-agent', 'Mozilla/5.0')]
>>> content = opener.open('http://www.comicbookdb.com/browse.php')
>>> print(content.read())
b''
Or when I use httplib2 instead:
>>> import httplib2
>>> h = httplib2.Http('.cache')
>>> response, content = h.request('http://www.comicbookdb.com/browse.php')
>>> print(content)
b''
>>> print(response)
{'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'content-location': 'http://www.comicbookdb.com/browse.php', 'expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'content-length': '0', 'set-cookie': 'PHPSESSID=590f5997a91712b7134c2cb3291304a8; path=/', 'date': 'Wed, 25 Dec 2013 15:12:30 GMT', 'server': 'Apache', 'pragma': 'no-cache', 'content-type': 'text/html', 'status': '200'}
Or when I try to download it using cURL:
C:\>curl -v http://www.comicbookdb.com/browse.php
* About to connect() to www.comicbookdb.com port 80
* Trying 208.76.81.137... * connected
* Connected to www.comicbookdb.com (208.76.81.137) port 80
> GET /browse.php HTTP/1.1
User-Agent: curl/7.13.1 (i586-pc-mingw32msvc) libcurl/7.13.1 zlib/1.2.2
Host: www.comicbookdb.com
Pragma: no-cache
Accept: */*
< HTTP/1.1 200 OK
< Date: Wed, 25 Dec 2013 15:20:06 GMT
< Server: Apache
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Set-Cookie: PHPSESSID=0a46f2d390639da7eb223ad47380b394; path=/
< Content-Length: 0
< Content-Type: text/html
* Connection #0 to host www.comicbookdb.com left intact
* Closing connection #0
Opening the URL in a browser or downloading it with Wget seems to work fine, though:
C:\>wget http://www.comicbookdb.com/browse.php
--16:16:26-- http://www.comicbookdb.com/browse.php
=> `browse.php'
Resolving www.comicbookdb.com... 208.76.81.137
Connecting to www.comicbookdb.com[208.76.81.137]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
[ <=> ] 40,687 48.75K/s
16:16:27 (48.75 KB/s) - `browse.php' saved [40687]
As does downloading a different file from the same server:
>>> content = urllib.request.urlopen('http://www.comicbookdb.com/index.php')
>>> print(content.read(100))
b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n\t\t"http://www.w3.org/TR/1999/REC-html'
So why doesn't the other URL work?
It seems the server expects a Connection: keep-alive header, which curl, for example (and, I expect, the other failing clients too), does not add by default.
With curl you can use this command, which will display a non-empty response:
curl -v -H 'Connection: keep-alive' http://www.comicbookdb.com/browse.php
With Python you can use this code:
import httplib2
h = httplib2.Http('.cache')
response, content = h.request('http://www.comicbookdb.com/browse.php', headers={'Connection':'keep-alive'})
print(content)
print(response)
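The same header works with the requests library. Note that urllib.request itself, as far as I can tell, forcibly sends Connection: close on every request and overrides any value you set, which would explain why the original attempts returned nothing. A minimal sketch:

import requests

# explicitly send the Connection header this server appears to require
r = requests.get('http://www.comicbookdb.com/browse.php',
                 headers={'Connection': 'keep-alive'})
print(r.status_code, len(r.text))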
Hi, I am trying to scrape some data from this URL:
http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1
As you may have noticed, if cookies and session data are not yet set, you will be redirected to the base URL (http://www.21cineplex.com/).
I tried to do it like this:
import re
import urllib2
from cookielib import CookieJar

def main():
    try:
        cj = CookieJar()
        baseurl = "http://www.21cineplex.com"
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        opener.open(baseurl)
        urllib2.install_opener(opener)
        movieSource = urllib2.urlopen('http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1').read()
        splitSource = re.findall(r'<ul class="w462">(.*?)</ul>', movieSource)
        print splitSource
    except Exception, e:
        print str(e)
        print "Error occurred in main block"
However, I ended up failing to scrape that particular URL.
A quick inspection reveals that the website sets a session ID (PHPSESSID) and copies it into the client's cookies.
The question is: how do I work around this?
PS: I've tried to install request via pip, however it gives me a 404 (note that the PyPI package is actually named requests, not request):
Getting page https://pypi.python.org/simple/request/
Could not fetch URL https://pypi.python.org/simple/request/: HTTP Error 404: Not Found (request does not have any releases)
Will skip URL https://pypi.python.org/simple/request/ when looking for download links for request
Getting page https://pypi.python.org/simple/
URLs to search for versions for request:
* https://pypi.python.org/simple/request/
Getting page https://pypi.python.org/simple/request/
Could not fetch URL https://pypi.python.org/simple/request/: HTTP Error 404: Not Found (request does not have any releases)
Will skip URL https://pypi.python.org/simple/request/ when looking for download links for request
Could not find any downloads that satisfy the requirement request
Cleaning up...
Thanks to @Chainik, I got it to work now. I ended up modifying my code like this:
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
baseurl = "http://www.21cineplex.com/"
regex = '<ul class="w462">(.*?)</ul>'
opener.open(baseurl)
urllib2.install_opener(opener)
request = urllib2.Request('http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1')
request.add_header('Referer', baseurl)
requestData = urllib2.urlopen(request)
htmlText = requestData.read()
Once the HTML text is retrieved, it's all about parsing its content.
Cheers
Try setting a referer URL, see below.
Without referer URL set (302 redirect):
$ curl -I "http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1"
HTTP/1.1 302 Moved Temporarily
Server: nginx
Date: Thu, 19 Sep 2013 09:19:19 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.4.17
Set-Cookie: PHPSESSID=5effe043db4fd83b2c5927818cb1a7ca; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: kota=3; expires=Fri, 19-Sep-2014 09:19:19 GMT; path=/
Location: http://www.21cineplex.com/
With referer URL set (HTTP/200):
$ curl -I -e "http://www.21cineplex.com/" "http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1"
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 19 Sep 2013 09:19:24 GMT
Content-Type: text/html
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.4.17
Set-Cookie: PHPSESSID=a7abd6592c87e0c1a8fab4f855baa0a4; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: kota=3; expires=Fri, 19-Sep-2014 09:19:24 GMT; path=/
To set the referer URL using urllib, see this post.
-- ab1
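For completeness, the same two-step dance (hit the base URL first so PHPSESSID gets set, then send the Referer) in the requests library; a minimal sketch:

import requests

base = 'http://www.21cineplex.com/'
target = 'http://www.21cineplex.com/nowplaying/jakarta,3,JKT.htm/1'

with requests.Session() as s:  # the Session keeps the PHPSESSID cookie between calls
    s.get(base)
    r = s.get(target, headers={'Referer': base})
    print(r.status_code)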