I have been using an Instagram API client id and auth token for a while to make requests with urllib and json. For the past few days, any client id/auth token I create for an Instagram account systematically returns "HTTP Error 400: BAD REQUEST" when I make a request, whether it's a like, follow, or unfollow; it always returns this error. The script is Python 2.7 based.
It was working great before, and the keys created before this happened still work great! I tried to create new accounts and new keys from the USA with proxies, but the error persists.
Here is the relevant part of the code:
import json
import urllib
import urllib2

user_agent = 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7'
headers = {'User-Agent': user_agent,
           'Content-type': 'application/x-www-form-urlencoded'}

def likePicture(pictureId):
    liked = 0
    try:
        urlLike = "https://api.instagram.com/v1/media/%s/likes"
        values = {'access_token': auth_token,
                  'client_id': client_id}
        newLike = urlLike % (pictureId)
        data = urllib.urlencode(values)
        req = urllib2.Request(newLike, data, headers)
        response = urllib2.urlopen(req)
        result = response.read()
        dataObj = json.loads(result)
        liked = 1
    except Exception, e:
        print e
    return liked
The print e systematically gives me "HTTP Error 400: BAD REQUEST", even if both the key and the account are brand new. And this code works like a charm with older keys (from a week ago).
Any idea or suggestion? Maybe I am blocked somehow by Instagram because I created too many client ids/auth tokens? If that is the case, how do I resolve this situation? (I already tried different proxies without success, so how would they track that?) If someone finds a solution to this problem I will be infinitely grateful!
Cheers, Kevin
First of all:
You may also receive responses with an HTTP response code of 400 (Bad
Request) if we detect spammy behavior by a person using your app.
These errors are unrelated to rate limiting.
Have you read the "Limits" section of the API docs?
When you call Instagram API methods, the response includes two HTTP headers:
X-Ratelimit-Remaining: the remaining number of calls available to your app within the 1-hour window
X-Ratelimit-Limit: the total number of calls allowed within the 1-hour window
So check if you've reached the limit.
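If you want to check programmatically, reading those headers off any response is enough. Here is a minimal sketch matching the question's urllib2 setup; the media id and access token are placeholders:
import urllib2

# MEDIA_ID and ACCESS_TOKEN are placeholders; substitute real values.
url = 'https://api.instagram.com/v1/media/MEDIA_ID/likes?access_token=ACCESS_TOKEN'
response = urllib2.urlopen(url)
# Instagram reports the rate-limit state on every API response.
print response.info().getheader('X-Ratelimit-Remaining')
print response.info().getheader('X-Ratelimit-Limit')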
Keep in mind that multiple calls in a short time window are considered abusive.
Read more:
Limits
P.S.: It's not necessary to forge headers in order to make API calls! This isn't web scraping!
Related
I'm trying to automate logging in to Costco.com to check some member-only prices.
I used dev tools and the Network tab to identify the request that handles the logon, from which I inferred the POST URL and the parameters.
Code looks like:
import requests

s = requests.session()
payload = {'logonId': 'email#email.com',
           'logonPassword': 'mypassword'}

# get this value by Googling "my user agent"
user_agent = {"User-Agent": "myuseragent"}

url = 'https://www.costco.com/Logon'
response = s.post(url, headers=user_agent, data=payload)
print(response.status_code)
When I run this, it just runs and runs and never returns anything. I waited 5 minutes and it was still running.
What am I doing wrong?
Maybe you should try making a GET request first to pick up some cookies before making the POST request. If the POST request still doesn't work, add a timeout so the script stops and you know it isn't working:
r = requests.get(url, verify=False, timeout=10)
This one is tough. Usually, in order to set the proper cookies, a GET request to the URL is required first. We can go directly to https://www.costco.com/LogonForm as long as we change the user agent from the default python-requests one. This is accomplished as follows:
import requests

agent = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/85.0.4183.102 Safari/537.36"
)

with requests.Session() as s:
    headers = {'user-agent': agent}
    s.headers.update(headers)
    logon = s.get('https://www.costco.com/LogonForm')
    # Saved the cookies in a variable, explanation below
    cks = s.cookies
The logon GET request is successful, i.e. status code 200! Taking a look at cks:
print(sorted([c.name for c in cks]))
['C_LOC',
'CriteoSessionUserId',
'JSESSIONID',
'WC_ACTIVEPOINTER',
'WC_AUTHENTICATION_-1002',
'WC_GENERIC_ACTIVITYDATA',
'WC_PERSISTENT',
'WC_SESSION_ESTABLISHED',
'WC_USERACTIVITY_-1002',
'_abck',
'ak_bmsc',
'akaas_AS01',
'bm_sz',
'client-zip-short']
Then, using the network inspector in Google Chrome, clicking login yields the following form data for the post used to log in (place this below cks):
data = {'logonId': username,
        'logonPassword': password,
        'reLogonURL': 'LogonForm',
        'isPharmacy': 'false',
        'fromCheckout': '',
        'authToken': '-1002,5M9R2fZEDWOZ1d8MBwy40LOFIV0=',
        'URL': 'Lw=='}

login = s.post('https://www.costco.com/Logon', data=data, allow_redirects=True)
However, simply trying this makes the request just sit there and redirect infinitely.
Using Burp Suite, I stepped into the post and found the post request as made via the browser. This post has many more cookies than were obtained in the initial get request.
Quite a few more, in fact:
# cookies is equal to the curl from burp, then converted curl to python req
sorted(cookies.keys())
['$JSESSIONID',
'AKA_A2',
'AMCVS_97B21CFE5329614E0A490D45%40AdobeOrg',
'AMCV_97B21CFE5329614E0A490D45%40AdobeOrg',
'C_LOC',
'CriteoSessionUserId',
'OptanonConsent',
'RT',
'WAREHOUSEDELIVERY_WHS',
'WC_ACTIVEPOINTER',
'WC_AUTHENTICATION_-1002',
'WC_GENERIC_ACTIVITYDATA',
'WC_PERSISTENT',
'WC_SESSION_ESTABLISHED',
'WC_USERACTIVITY_-1002',
'WRIgnore',
'WRUIDCD20200731',
'__CT_Data',
'_abck',
'_cs_c',
'_cs_cvars',
'_cs_id',
'_cs_s',
'_fbp',
'ajs_anonymous_id_2',
'ak_bmsc',
'akaas_AS01',
'at_check',
'bm_sz',
'client-zip-short',
'invCheckPostalCode',
'invCheckStateCode',
'mbox',
'rememberedLogonId',
's_cc',
's_sq',
'sto__count',
'sto__session']
Most of these look to be static; however, because there are so many, it's hard to tell which is which and what each is supposed to be. It's here where I myself get stuck, and I am actually really curious how this would be accomplished. In some of the cookie data I can also see some sort of IBM Commerce information, so I am linking Prevent Encryption (Krypto) Of Url Paramaters in IBM Commerce Server 6, as it's the only other SO question pertaining even remotely to this.
Essentially, though, the steps would be to determine the proper cookies to pass for this post (and then the proper cookies and info for the redirect!). I believe some of these are being set by some JS or something, since they are not in the GET response from the site. Sorry I can't be more help here.
If you absolutely need to log in, try using selenium, as it simulates a browser; a rough sketch follows below. Otherwise, if you just want to check if an item is in stock, this guide uses requests and doesn't need to be logged in: https://aryaboudaie.com/python/technical/educational/2020/07/05/using-python-to-buy-a-gift.html
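A minimal selenium sketch of that login flow might look like this; the element ids are assumptions inferred from the form data above, not verified against the live site:
from selenium import webdriver

username = 'email@example.com'  # placeholder
password = 'mypassword'         # placeholder

driver = webdriver.Chrome()
driver.get('https://www.costco.com/LogonForm')
# 'logonId' and 'logonPassword' are assumed element ids, taken from the
# form data above; verify them against the live page before relying on this.
driver.find_element_by_id('logonId').send_keys(username)
driver.find_element_by_id('logonPassword').send_keys(password)
driver.find_element_by_id('logonPassword').submit()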
I am running the following code on the Google App Engine Standard Python 2.7 runtime:
import requests
import re
import datetime

ReviewsURL = 'https://play.google.com/store/getreviews'
payload = {'reviewType': '0',
           'pageNum': 0,  # loads max 40 reviews for each page number
           'id': 'net.one97.paytm',
           'reviewSortOrder': '0',
           'xhr': '1'}

r = requests.post(url=ReviewsURL,
                  data=payload,
                  headers={'X-Requested-With': 'XMLHttpRequest'})

print r.text.encode('cp850', errors='replace').decode("unicode-escape")
This code used to run fine and output the latest 40 reviews, but now it gives the following output:
Status_code = 500
When I run the same code on my Mac, though, it still works fine.
Taking into account the message "Server Error: None for url", it looks like the URL cannot be reached. 500 errors are mainly caused by 2 reasons:
Dynamic responses, which are limited to 32MB. If a script handler generates a response larger than this limit, the server sends back an empty response with a 500 Internal Server Error status code.
Exceeded response time. A request handler has a limited amount of time to generate and return a response to a request, typically around 60 seconds. There are several root causes listed here.
There are many reasons that can cause the response time to be exceeded. I expect you will find yours in the documentation; I am not able to guess which one based on the information we have.
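If it turns out to be the response-time limit, a client-side timeout makes the failure explicit instead of an opaque 500. A minimal sketch reusing the question's ReviewsURL and payload; the 25-second value is an arbitrary choice, kept below the request deadline:
import requests

try:
    r = requests.post(url=ReviewsURL,
                      data=payload,
                      headers={'X-Requested-With': 'XMLHttpRequest'},
                      timeout=25)  # arbitrary; below the GAE request deadline
    r.raise_for_status()
except requests.exceptions.Timeout:
    print 'Upstream call timed out'
except requests.exceptions.HTTPError as e:
    print 'Upstream returned %s' % e.response.status_code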
I can open a webpage such as Nike's page with Python 2.7's urllib2 library on my Ubuntu desktop. But when I move that code to a Google Compute Engine server (with the same O.S.), it starts returning HTTP Error 503: Service Unavailable.
What could be causing this error in one place and not another, and, if possible, how would I go about making my machines behave consistently?
That server returns urllib2.HTTPError: HTTP Error 403: Forbidden unless you pass an 'Accept' header. Using only the 'User-Agent' header failed when I tried. Here is the working code; I've commented out the unnecessary 'User-Agent' and 'Connection' headers, but left them for reference:
import urllib2

user_agent = 'Mozilla/5.0'
req_headers = {
    # 'User-Agent': user_agent,
    # 'Connection': 'Keep-Alive',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
}

request = urllib2.Request('http://www.nike.com/us/en_us/c/men', headers=req_headers)
response = urllib2.urlopen(request)
data = response.read()
print data
Also see this other Stackoverflow answer, which I used as a reference for the 'Accept' string.
HTTP status 503 means, and I quote RFC 2616: "The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header."
So it's not at all about where the request comes from: it's all about the server being temporarily overloaded or in maintenance. Check for a Retry-After header in the response and apply it; if it's missing, "retry later" more generically.
If the problem persists (it shouldn't: 503 means the server is suffering a temporary condition), contact the web site's system administrators and get an explanation of what's going on. To repeat: this is strictly about the web server you're contacting and should be a temporary condition; it's not at all about your client. A minimal retry sketch follows.
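Here is a minimal sketch of honoring Retry-After with the asker's urllib2 setup; the 60-second fallback is an arbitrary choice, and this assumes a delta-seconds value rather than an HTTP-date:
import time
import urllib2

url = 'http://www.nike.com/us/en_us/c/men'
try:
    response = urllib2.urlopen(url)
except urllib2.HTTPError as e:
    if e.code == 503:
        # Honor the server's suggested delay if present.
        retry_after = e.hdrs.getheader('Retry-After')
        time.sleep(int(retry_after) if retry_after else 60)  # arbitrary fallback
        response = urllib2.urlopen(url)  # one retry
    else:
        raise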
I'm learning Python, and as my first project I want to log in to several airline websites and scrape my frequent-flyer mile info. I have successfully been able to log in and scrape American Airlines and United, but I am unable to do it on Delta, US Airways, and British Airways.
The methodology I have been using is watching network traffic in Fiddler2, Chrome, or Firebug. Wireshark seems too complicated at the moment.
For my script to work with American and United, all I did was watch the traffic in Fiddler2, copy the FORM DATA and REQUEST HEADER DATA, and then use the third-party Python Requests library to access the data. Very simple. Very easy. The other airline websites are giving me a lot of trouble.
Let's talk about British Airways specifically. Below are pictures of the FORM DATA and REQUEST HEADER DATA that I took from Fiddler when I logged into my dummy BA account. I have also included the test script that I have been using. I wrote two different versions: one using the Requests library and one using urllib. They both produce the same error, but I thought I would provide both to make it easier for somebody to help me if they don't have the Requests library installed. Use the one you would like.
Basically, when I make a requests.post I get a
10054, 'An existing connection was forcibly closed by the remote host' error.
I have no idea what is going on. I've been searching for 3 days and have come up with nothing. I hope somebody can help me. The code below uses my dummy BA account info (username: python_noob, password: p4ssword). Feel free to use and test it.
Here are some pictures of the Fiddler2 data:
http://i.imgur.com/iOL91.jpg?1
http://i.imgur.com/meLHL.jpg?1
import requests
import urllib.request

def get_BA_login_using_requests():
    url_loginSubmit1 = 'https://www.britishairways.com/travel/loginr/public/en_us'
    url_viewaccount1 = 'https://www.britishairways.com/travel/viewaccount/public/en_us?eId=106011'
    url_viewaccount2 = 'https://www.britishairways.com/travel/viewaccount/execclub/_gf/en_us?eId=106011'
    form_data = {
        'Directional_Login': '',
        'eId': '109001',
        'password': 'p4ssword',
        'membershipNumber': 'python_noob',
    }
    request_headers = {
        'Cache-Control': 'max-age=0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11',
        'Cookie': 'BIGipServerba.com-port80=997762723.20480.0000; v1st=EDAB42A278BE913B; BASessionA=kDtBQWGclJymXtlsTXyYtykDLLsy3KQKvd3wMrbygd7JZZPJfJz2!-1893405604!clx42al01-wl01.baplc.com!7001!-1!-407095676!clx43al01-wl01.baplc.com!7001!-1; BIGipServerba.com-port81=997762723.20736.0000; BA_COUNTRY_CHOICE_COOKIE=us; Allow_BA_Cookies=accepted; BA_COUNTRY_CHOICE_COOKIE=US; opvsreferrer=functional/home/home_us.jsp; realreferrer=; __utma=28787695.2144676753.1356203603.1356203603.1356203603.1; __utmb=28787695.1.10.1356203603; __utmc=28787695; __utmz=28787695.1356203603.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); fsr.s={"v":-2,"rid":"d464cf7-82608645-1f31-3926-49807","ru":"http://www.britishairways.com/travel/globalgateway.jsp/global/public/en_","r":"www.britishairways.com","st":"","to":3,"c":"http://www.britishairways.com/travel/home/public/en_us","pv":1,"lc":{"d0":{"v":1,"s":false}},"cd":0}',
        'Content-Length': '78',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Origin': 'https://www.britishairways.com',
        'Referer': 'https://www.britishairways.com/travel/loginr/public/en_us',
        'Connection': 'keep-alive',
        'Host': 'www.britishairways.com',
    }
    print('Trying to login to British Airways using the Requests library (takes about 1 minute for the error to occur)')
    try:
        r1 = requests.post(url_loginSubmit1, data=form_data, headers=request_headers)
        print('it worked')
    except Exception as e:
        msg = "An exception of type {0} occurred, these were the arguments:\n{1!r}"
        print(msg.format(type(e).__name__, e.args))
    return

def get_BA_login_using_urllib():
    """Tries to request the URL. Returns True if the request was successful; False otherwise.

    https://www.britishairways.com/travel/loginr/public/en_us
    response -- After the function has finished, will possibly contain the response to the request.
    """
    response = None
    print('Trying to login to British Airways using the urllib library (takes about 1 minute for the error to occur)')
    # Create request to URL.
    req = urllib.request.Request("https://www.britishairways.com/travel/loginr/public/en_us")
    # Set request headers.
    req.add_header("Connection", "keep-alive")
    req.add_header("Cache-Control", "max-age=0")
    req.add_header("Origin", "https://www.britishairways.com")
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    req.add_header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
    req.add_header("Referer", "https://www.britishairways.com/travel/home/public/en_us")
    req.add_header("Accept-Encoding", "gzip,deflate,sdch")
    req.add_header("Accept-Language", "en-US,en;q=0.8")
    req.add_header("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3")
    req.add_header("Cookie", 'BIGipServerba.com-port80=997762723.20480.0000; v1st=EDAB42A278BE913B; BIGipServerba.com-port81=997762723.20736.0000; BA_COUNTRY_CHOICE_COOKIE=us; Allow_BA_Cookies=accepted; BA_COUNTRY_CHOICE_COOKIE=US; BAAUTHKEY=BA4760A2434L; BA_ENROLMENT_APPLICATION_COOKIE=1356219482491AT; BASessionA=wKG4QWGSTggNGnsLTnrgQnMxGMyzvspGLCYpjdSZgv2pSgYN1YRn!-1893405604!clx42al01-wl01.baplc.com!7001!-1!-407095676!clx43al01-wl01.baplc.com!7001!-1; HOME_AD_DISPLAY=1; previousCountryInfo=us; opvsreferrer=functional/home/home_us.jsp; realreferrer=; __utma=28787695.2144676753.1356203603.1356216924.1356219076.6; __utmb=28787695.15.10.1356219076; __utmc=28787695; __utmz=28787695.1356203603.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); fsr.s={"v":-2,"rid":"d464cf7-82608645-1f31-3926-49807","ru":"http://www.britishairways.com/travel/globalgateway.jsp/global/public/en_","r":"www.britishairways.com","st":"","to":5,"c":"https://www.britishairways.com/travel/home/public/en_us","pv":31,"lc":{"d0":{"v":31,"s":true}},"cd":0,"f":1356219889982,"sd":0}')
    # Set request body.
    body = b"Directional_Login=&eId=109001&password=p4ssword&membershipNumber=python_noob"
    # Get response to request.
    try:
        response = urllib.request.urlopen(req, body)
        print('it worked')
    except Exception as e:
        msg = "An exception of type {0} occurred, these were the arguments:\n{1!r}"
        print(msg.format(type(e).__name__, e.args))
    return

def main():
    get_BA_login_using_urllib()
    print()
    get_BA_login_using_requests()
    return

main()
Offhand, I'd say you managed to create a malformed or illegal request, and the server (or even a proxy) on the other side simply refuses to process it.
Do use the requests library. It's excellent. urllib is quite outdated (and, well, not fun to use at all).
Get rid of nearly all of the custom headers, in particular Content-Length, Keep-Alive, Connection and Cookie. The first three you should let the requests library take care of, as they're part of the HTTP 1.1 protocol. As for the Cookie header: that, too, will be handled by the requests library, depending on how you use sessions. (You might want to consult the documentation there.) Without any previous cookies, you'll probably get something like a 401 when you try to access the site, or you'll be (transparently) redirected to a login page. Doing the login will set the correct cookies, after which you should be able to retry the original request.
If you use a dict for the post data, you won't need the Content-Type header either. You might want to experiment with using unicode values in said dict; I've found that sometimes makes a difference.
In other words: try to remove as much as you can, and then build it up from there; a stripped-down sketch follows below. Doing things like this typically should not cost more than a handful of lines. Now, scraping a web page, that's another matter: try 'beautifulsoup' for that.
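To illustrate "remove as much as you can", here is a minimal sketch of the same login using a requests session; it is untested against the BA site and only shows the shape of the code:
import requests

with requests.Session() as s:
    # requests fills in Content-Length, Content-Type, Connection and cookies.
    s.headers.update({'User-Agent': 'Mozilla/5.0'})
    # An initial GET picks up the session cookies automatically.
    s.get('https://www.britishairways.com/travel/loginr/public/en_us')
    form_data = {
        'Directional_Login': '',
        'eId': '109001',
        'password': 'p4ssword',
        'membershipNumber': 'python_noob',
    }
    r = s.post('https://www.britishairways.com/travel/loginr/public/en_us',
               data=form_data)
    print(r.status_code)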
P.S.: Don't ever post cookie-data on public forums: they might contain personal or otherwise sensitive data that shady characters might be able to abuse.
It seems there is a bug in the Windows versions of Python 3.3 that is the cause of my problem. I used the answer from here:
HTTPS request results in reset connection in Windows with Python 3
to make progress with the urllib version of my script. I would like to use Requests, so I need to figure out how to do the SSL downgrade workaround with that module. I will make that a separate thread; if anybody has an answer to that, you can post it here as well. Thanks.
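In the meantime, a commonly suggested workaround sketch for Requests is a custom transport adapter that pins an older SSL/TLS version; the protocol choice here is an assumption, not something verified against the BA servers:
import ssl

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager

class DowngradedSSLAdapter(HTTPAdapter):
    # Transport adapter that forces a specific SSL/TLS version.
    def init_poolmanager(self, connections, maxsize, block=False):
        # PROTOCOL_TLSv1 is an assumption; try whichever version the server accepts.
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)

s = requests.Session()
s.mount('https://', DowngradedSSLAdapter())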
I am trying to get my Django app (NOT using Google App Engine) to retrieve data from Google Contacts using the Google Contacts Data API. I am going through the authentication documentation as well as the Data API Python client docs.
The first step (AuthSubRequest), which gets the single-use token, works fine. The next step (AuthSubSessionToken) upgrades the single-use token to a session token. The Python API call UpgradeToSessionToken() simply didn't work for me; it gave me a NonAuthSubToken exception:
gd_client = gdata.contacts.service.ContactsService()
gd_client.auth_token = authsub_token
gd_client.UpgradeToSessionToken()
As an alternative I want to get it working by "manually" constructing the HTTP request:
import urllib2

url = 'https://www.google.com/accounts/AuthSubSessionToken'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'Authorization': 'AuthSub token=' + authsub_token,
    'User-Agent': 'Python/2.6.1',
    'Host': 'https://www.google.com',
    'Accept': 'text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2',
    'Connection': 'keep-alive',
}
req = urllib2.Request(url, None, headers)
response = urllib2.urlopen(req)
This gives me a different error:
HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop. The last 30x error message was: Moved Temporarily
What am I doing wrong here? I'd appreciate help/advice/suggestions with either of the methods I am trying to use: the Python API call (UpgradeToSessionToken) or manually constructing an HTTP request with urllib2.
According to the 2.0 documentation here, there is a set of Python examples:
Running the sample code
A full working sample client, containing all the sample code shown in this document, is available in the Python client library distribution, under the directory samples/contacts/contacts_example.py.
The sample client performs several operations on contacts to demonstrate the use of the Contacts Data API.
Hopefully it will point you in the right direction.
I had a similar issue recently. Mine got fixed by setting "secure" to True.
import gdata.calendar.service

next = 'http://www.coolcalendarsite.com/welcome.pyc'
scope = 'http://www.google.com/calendar/feeds/'
secure = True
session = True

calendar_service = gdata.calendar.service.CalendarService()
# These values are then passed when generating the AuthSub URL:
auth_url = calendar_service.GenerateAuthSubURL(next, scope, secure, session)
There are four different ways to authenticate. Is it really that important for you to use AuthSub? If you can't get AuthSub to work, consider the ClientLogin approach instead; I had no trouble getting that to work. A minimal sketch follows.
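For reference, here is a minimal ClientLogin sketch with the gdata client; the credentials and source string are placeholders:
import gdata.contacts.service

gd_client = gdata.contacts.service.ContactsService()
gd_client.email = 'you@example.com'   # placeholder
gd_client.password = 'your-password'  # placeholder
gd_client.source = 'my-django-app'    # any identifier for your app
gd_client.ProgrammaticLogin()
# After logging in, fetch the contacts feed directly.
feed = gd_client.GetContactsFeed()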