I'm not sure how else to describe this. I'm trying to log into a website using the requests library with Python, but it doesn't seem to capture all the cookies from the login, and subsequent requests to the site go back to the login page.
The code I'm using is as follows: (with redactions)
import requests

with requests.Session() as s:
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
Looking at the developer tools in Chrome, I can see both the PHPSESSID and amember_nr cookies being set when I log in. After checking r.cookies, though, it seems only PHPSESSID was captured; there's no sign of the amember_nr cookie.
The value in PyCharm only shows:
{RequestsCookieJar: 1}<RequestsCookieJar[<Cookie PHPSESSID=kjlb0a33jm65o1sjh25ahb23j4 for .website.co.uk/>]>
Why does this code fail to save 'amember_nr' and is there any way to retrieve it?
SOLUTION:
It appears the only way I can get this code to work properly is by using Selenium, selecting the elements on the page, and automating the typing/clicking. The following code produces the desired result.
from seleniumrequests import Chrome

driver = Chrome()
driver.get('http://www.website.co.uk')
username = driver.find_element_by_xpath("//input[@name='amember_login']")
password = driver.find_element_by_xpath("//input[@name='amember_pass']")
username.send_keys("username")
password.send_keys("password")
driver.find_element_by_xpath("//input[@type='submit']").click()  # Page is logged in and all relevant cookies saved.
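As a follow-up, the cookies the browser collects can then be handed to a plain requests.Session. A minimal sketch, assuming the login above has completed (the /members URL is a hypothetical page behind the login):

import requests

session = requests.Session()
# Copy every cookie the browser collected into the requests session.
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

# Subsequent plain-requests calls now carry the full set of login cookies.
r = session.get('https://www.website.co.uk/members')  # hypothetical URL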
You can try this:
with requests.Session() as s:
    s.get('https://www.website.co.uk/login')
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
The GET request will set the required cookies.
FYI, I would use something like Burp Suite to capture ALL the data being sent to the server and sort out which headers etc. are required... sometimes servers do referrer checking, set cookies via JavaScript or wonky scripting; I've even seen JavaScript obfuscation and blocking of user-agent strings that aren't in a whitelist. It's likely the server is missing something in your headers and so won't give you the cookie.
Also, you can have Python use Burp as a proxy so you can see exactly what gets sent to the server and the response.
https://github.com/freeload101/Python/blob/master/CS_HIDE/CS_HIDE.py (proxy support)
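A minimal sketch of pointing requests at Burp, assuming Burp's proxy listener is on its default 127.0.0.1:8080:

import requests

# Route all traffic through the local Burp proxy so every request and
# response shows up in Burp's HTTP history.
proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8080',
}

# verify=False because Burp intercepts HTTPS with its own CA certificate.
r = requests.get('https://www.website.co.uk/login', proxies=proxies, verify=False)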
Trying to get some values from Duolingo using Python, but urllib is giving me something different from what I see when I navigate to the URL in my browser.
Navigating to the URL (https://www.duolingo.com/2017-06-30/users/215344344?fields=xpGoalMetToday) in a browser gives: {"xpGoalMetToday": false}.
However, trying via the below script:
import urllib.request
url = 'http://www.duolingo.com/2017-06-30/users/215344344?fields=xpGoalMetToday'
user_agent = '[insert my local user agent copied from browser attempt]'
# header variable
headers = { 'User-Agent' : user_agent, "Cache-Control": "no-cache, max-age=0" }
# creating request
req = urllib.request.Request(url, None, headers)
print(urllib.request.urlopen(req).read())
returns just a blank {}.
As you can tell from the above, I've tried a couple of things: adding a user agent, cache control. I've even tried using the requests module and adding authentication (didn't work).
Any ideas? Am I missing something?
Actually, when I open the link in the browser it shows me {} too.
Maybe you have some kind of cookie set in your browser?
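If a cookie is what makes the difference, a minimal sketch of forwarding one explicitly with urllib (the cookie name and value here are hypothetical; copy the real ones from your browser's developer tools):

import urllib.request

url = 'https://www.duolingo.com/2017-06-30/users/215344344?fields=xpGoalMetToday'
headers = {
    'User-Agent': 'Mozilla/5.0',
    # 'jwt_token' is a hypothetical name; use whatever your browser sends.
    'Cookie': 'jwt_token=YOUR_TOKEN_HERE',
}
req = urllib.request.Request(url, None, headers)
print(urllib.request.urlopen(req).read())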
I'm trying to get contest data from the url: "https://www.draftkings.com/contest/gamecenter/32947401"
If you go to this URL and aren't logged in, it'll just redirect you to the lobby. If you're logged in, it'll actually show you the contest results.
Here are some things I tried:
- First, I used Chrome's dev networking tools to watch requests while I manually logged in.
- I then tried copying the cookie that I thought contained the authentication info; it was of the form:
'ajs_anonymous_id=%123123123123123, mlc=true; optimizelyEndUserId'
- I then stored that cookie as an environment variable and ran this code:
HEADERS = {'cookie': os.environ['MY_COOKIE']}
requests.get(draft_kings_url, headers=HEADERS)
No luck; this just gave me the lobby.
I then tried requests' built-in HTTPBasicAuth and HTTPDigestAuth.
No luck here either.
I'm no Python expert by far, and I've pretty much exhausted what I know and the search results I've found. Any ideas?
The tool that you want is Selenium. Something along the lines of:
from selenium import webdriver

browser = webdriver.Firefox()
browser.get("https://www.draftkings.com/contest/gamecenter/32947401")
username = browser.find_element_by_id("user")
username.send_keys("username")
password = browser.find_element_by_id("password")
password.send_keys("top_secret")
login = browser.find_element_by_name("login")
login.click()
Use Fiddler to see the exact request the browser makes when you try to log in. Then use the Session class in the requests package.
import requests
session = requests.Session()
session.get('YOUR_URL_LOGIN_PAGE')
This will save all the cookies from your URL in your session variable (like when you use a browser).
Then make a POST request to the login URL with the appropriate data.
You don't have to manually pass cookie data, as it is generated automatically when you first visit a website. However, you can set some headers explicitly, like User-Agent, with:
session.headers.update({'header_name':'header_value'})
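Putting it together, a minimal sketch of the whole flow (the login URL and form field names are placeholders; take the real ones from Fiddler):

import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})

# The first GET picks up whatever cookies the site sets up front.
session.get('https://www.draftkings.com/account/login')  # placeholder login page

# Field names are placeholders; use the ones Fiddler shows in the real POST.
session.post('https://www.draftkings.com/account/login',
             data={'username': 'you', 'password': 'top_secret'})

# The session now sends the login cookies automatically.
r = session.get('https://www.draftkings.com/contest/gamecenter/32947401')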
HTTPBasicAuth and HTTPDigestAuth might not work, depending on the website.
I have been using the client ID and auth token of the Instagram API for a while to make requests with urllib and json. Since a few days ago, any client ID/auth token I create for an Instagram account systematically returns "HTTP Error 400: BAD REQUEST" when I make a request; whether it's a like, follow, or unfollow, it always returns this error. The script is Python 2.7 based.
It was working great before, and the keys created before this happened still work great! I tried to create new accounts and new keys from the USA with proxies, but the error persists.
Here is the relevant part of the code:
import json
import urllib
import urllib2

# auth_token and client_id are defined elsewhere in the script.
user_agent = 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7'
headers = {'User-Agent': user_agent,
           'Content-type': 'application/x-www-form-urlencoded'}

def likePicture(pictureId):
    liked = 0
    try:
        urlLike = "https://api.instagram.com/v1/media/%s/likes"
        values = {'access_token': auth_token,
                  'client_id': client_id}
        newLike = urlLike % (pictureId)
        data = urllib.urlencode(values)
        req = urllib2.Request(newLike, data, headers)
        response = urllib2.urlopen(req)
        result = response.read()
        dataObj = json.loads(result)
        liked = 1
    except Exception, e:
        print e
    return liked
The print e systematically gives me "HTTP Error 400: BAD REQUEST", even when the key and the account are brand new. And this code works like a charm with older keys (from a week ago).
Any idea or suggestion? Maybe I'm blocked somehow by Instagram because I created too many client IDs/auth tokens? If that's the case, how do I resolve the situation? (I already tried to use different proxies, unsuccessfully, so how would they track that?) If someone finds a solution to this problem I will be infinitely grateful!
Cheers, Kevin
First of all:
You may also receive responses with an HTTP response code of 400 (Bad Request) if we detect spammy behavior by a person using your app. These errors are unrelated to rate limiting.
Have you read the "Limits" section of the API docs?
When you call Instagram API methods, each response carries two HTTP headers:
X-Ratelimit-Remaining: the remaining number of calls available to your app within the 1-hour window
X-Ratelimit-Limit: the total number of calls allowed within the 1-hour window
So check whether you've reached the limit.
Keep in mind that multiple calls in a short time window are considered abusive.
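A quick way to see where you stand is to read those headers off any API response. A sketch with requests (the endpoint and token are placeholders):

import requests

# Placeholder endpoint and token; any authenticated API call returns the headers.
r = requests.get('https://api.instagram.com/v1/users/self/media/recent',
                 params={'access_token': 'YOUR_TOKEN'})
print(r.headers.get('X-Ratelimit-Limit'))      # total calls allowed per hour
print(r.headers.get('X-Ratelimit-Remaining'))  # calls you have left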
Read more:
Limits
P.S.: It's not necessary to forge headers in order to make API calls; this isn't web scraping!
I'm learning Python, and as my first project I want to log in to several airline websites and scrape my frequent flyer mile info. I have successfully been able to log in and scrape American Airlines and United, but I am unable to do it for Delta, US Airways, and British Airways.
The methodology I have been using is watching network traffic in Fiddler2, Chrome, or Firebug. Wireshark seems too complicated at the moment.
For my script to work with the American and United scraping, all I did was watch the traffic in Fiddler2, copy the FORM DATA and REQUEST HEADER DATA, and then use the third-party Requests library to access the data. Very simple. Very easy. The other airline websites are giving me a lot of trouble.
Let's talk about British Airways specifically. Below are pictures of the FORM DATA and REQUEST HEADER DATA that I took from Fiddler when I logged into my dummy BA account. I have also included the test script I have been using. I wrote two different versions: one using the Requests library and one using urllib. They both produce the same error, but I thought I would provide both to make it easier for somebody to help me if they didn't have the Requests library installed. Use the one you would like.
Basically, when I make a requests.post, I am getting a 10054, 'An existing connection was forcibly closed by the remote host' error.
I have no idea what is going on. Been searching for 3 days and come up with nothing. I hope somebody can help me. The below code is using my dummy BA account info. username:python_noob password:p4ssword. Feel free to use and test it.
Here are some pictures to the fiddler2 data
http://i.imgur.com/iOL91.jpg?1
http://i.imgur.com/meLHL.jpg?1
import requests
import urllib.request
def get_BA_login_using_requests():
    url_loginSubmit1 = 'https://www.britishairways.com/travel/loginr/public/en_us'
    url_viewaccount1 = 'https://www.britishairways.com/travel/viewaccount/public/en_us?eId=106011'
    url_viewaccount2 = 'https://www.britishairways.com/travel/viewaccount/execclub/_gf/en_us?eId=106011'
    form_data = {
        'Directional_Login': '',
        'eId': '109001',
        'password': 'p4ssword',
        'membershipNumber': 'python_noob',
    }
    request_headers = {
        'Cache-Control': 'max-age=0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11',
        'Cookie': 'BIGipServerba.com-port80=997762723.20480.0000; v1st=EDAB42A278BE913B; BASessionA=kDtBQWGclJymXtlsTXyYtykDLLsy3KQKvd3wMrbygd7JZZPJfJz2!-1893405604!clx42al01-wl01.baplc.com!7001!-1!-407095676!clx43al01-wl01.baplc.com!7001!-1; BIGipServerba.com-port81=997762723.20736.0000; BA_COUNTRY_CHOICE_COOKIE=us; Allow_BA_Cookies=accepted; BA_COUNTRY_CHOICE_COOKIE=US; opvsreferrer=functional/home/home_us.jsp; realreferrer=; __utma=28787695.2144676753.1356203603.1356203603.1356203603.1; __utmb=28787695.1.10.1356203603; __utmc=28787695; __utmz=28787695.1356203603.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); fsr.s={"v":-2,"rid":"d464cf7-82608645-1f31-3926-49807","ru":"http://www.britishairways.com/travel/globalgateway.jsp/global/public/en_","r":"www.britishairways.com","st":"","to":3,"c":"http://www.britishairways.com/travel/home/public/en_us","pv":1,"lc":{"d0":{"v":1,"s":false}},"cd":0}',
        'Content-Length': '78',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Origin': 'https://www.britishairways.com',
        'Referer': 'https://www.britishairways.com/travel/loginr/public/en_us',
        'Connection': 'keep-alive',
        'Host': 'www.britishairways.com',
    }
    print('Trying to login to British Airways using Requests Library (takes about 1 minute for error to occur)')
    try:
        r1 = requests.post(url_loginSubmit1, data=form_data, headers=request_headers)
        print('it worked')
    except Exception as e:
        msg = "An exception of type {0} occurred, these were the arguments:\n{1!r}"
        print(msg.format(type(e).__name__, e.args))
    return
def get_BA_login_using_urllib():
    """Tries to request the URL. Returns True if the request was successful; False otherwise.

    https://www.britishairways.com/travel/loginr/public/en_us

    response -- After the function has finished, will possibly contain the response to the request.
    """
    response = None
    print('Trying to login to British Airways using urllib Library (takes about 1 minute for error to occur)')
    # Create request to URL.
    req = urllib.request.Request("https://www.britishairways.com/travel/loginr/public/en_us")
    # Set request headers.
    req.add_header("Connection", "keep-alive")
    req.add_header("Cache-Control", "max-age=0")
    req.add_header("Origin", "https://www.britishairways.com")
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    req.add_header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
    req.add_header("Referer", "https://www.britishairways.com/travel/home/public/en_us")
    req.add_header("Accept-Encoding", "gzip,deflate,sdch")
    req.add_header("Accept-Language", "en-US,en;q=0.8")
    req.add_header("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3")
    req.add_header("Cookie", 'BIGipServerba.com-port80=997762723.20480.0000; v1st=EDAB42A278BE913B; BIGipServerba.com-port81=997762723.20736.0000; BA_COUNTRY_CHOICE_COOKIE=us; Allow_BA_Cookies=accepted; BA_COUNTRY_CHOICE_COOKIE=US; BAAUTHKEY=BA4760A2434L; BA_ENROLMENT_APPLICATION_COOKIE=1356219482491AT; BASessionA=wKG4QWGSTggNGnsLTnrgQnMxGMyzvspGLCYpjdSZgv2pSgYN1YRn!-1893405604!clx42al01-wl01.baplc.com!7001!-1!-407095676!clx43al01-wl01.baplc.com!7001!-1; HOME_AD_DISPLAY=1; previousCountryInfo=us; opvsreferrer=functional/home/home_us.jsp; realreferrer=; __utma=28787695.2144676753.1356203603.1356216924.1356219076.6; __utmb=28787695.15.10.1356219076; __utmc=28787695; __utmz=28787695.1356203603.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); fsr.s={"v":-2,"rid":"d464cf7-82608645-1f31-3926-49807","ru":"http://www.britishairways.com/travel/globalgateway.jsp/global/public/en_","r":"www.britishairways.com","st":"","to":5,"c":"https://www.britishairways.com/travel/home/public/en_us","pv":31,"lc":{"d0":{"v":31,"s":true}},"cd":0,"f":1356219889982,"sd":0}')
    # Set request body.
    body = b"Directional_Login=&eId=109001&password=p4ssword&membershipNumber=python_noob"
    # Get response to request.
    try:
        response = urllib.request.urlopen(req, body)
        print('it worked')
    except Exception as e:
        msg = "An exception of type {0} occurred, these were the arguments:\n{1!r}"
        print(msg.format(type(e).__name__, e.args))
    return
def main():
    get_BA_login_using_urllib()
    print()
    get_BA_login_using_requests()
    return

main()
Offhand, I'd say you managed to create a malformed or illegal request, and the server (or even a proxy) on the other side simply refuses to process it.
Do use the requests library. It's excellent. urllib is quite outdated (and, well, not fun to use at all).
Get rid of nearly all of the custom headers. In particular Content-Length, Keep-Alive, Connection and Cookie. The first three you should let the requests library take care of, as they're part of the HTTP 1.1 protocol. With regards to the Cookie: that, too, will be handled by the requests library, depending on how you use sessions. (You might want to consult the documentation there.) Without having any previous cookies, you'll probably get something like a 401 when you try to access the site, or you'll be (transparently) redirected to a login-page. Doing the login will set the correct cookies, after which you should be able to re-try the original request.
If you use a dict for the post-data, you won't need the Content-Type header either. You might want to experiment with using unicode-values in said dict. I've found that that sometimes made a difference.
In other words: try to remove as much as you can, and then build it up from there. Doing things like this typically should not cost more than a handful of lines. Now, scraping a web page, that's another matter: try 'beautifulsoup' for that.
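As a sketch of that stripped-down starting point (it reuses the form fields from the question and leaves everything else for requests to manage):

import requests

session = requests.Session()

form_data = {
    'Directional_Login': '',
    'eId': '109001',
    'password': 'p4ssword',
    'membershipNumber': 'python_noob',
}

# Visit the login page first so the session picks up its initial cookies...
session.get('https://www.britishairways.com/travel/loginr/public/en_us')
# ...then post the form; Content-Length, Connection and Cookie are all
# handled by requests itself.
r = session.post('https://www.britishairways.com/travel/loginr/public/en_us',
                 data=form_data)
print(r.status_code)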
P.S.: Don't ever post cookie-data on public forums: they might contain personal or otherwise sensitive data that shady characters might be able to abuse.
It seems there is a bug in the Windows versions of Python 3.3 that is the cause of my problem. I used the answer from here:
HTTPS request results in reset connection in Windows with Python 3
to make progress with the urllib version of my script. I would like to use Requests, so I need to figure out how to do the SSL downgrade workaround with that module. I will make that a separate thread; if anybody has an answer to that, you can post it here as well. Thx.
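For reference, the commonly cited workaround for requests is to mount a transport adapter that pins an older SSL/TLS version. A sketch, with the protocol constant depending on what the server actually accepts:

import ssl

import requests
from requests.adapters import HTTPAdapter
from urllib3.poolmanager import PoolManager

class TLSv1Adapter(HTTPAdapter):
    """Force connections made through this session to use TLSv1."""
    def init_poolmanager(self, connections, maxsize, block=False, **kwargs):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1,
                                       **kwargs)

session = requests.Session()
session.mount('https://', TLSv1Adapter())  # applies to every https:// request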