How to log in to Costco.com using Python requests - python

I'm trying to automate logging in to Costco.com to check some member-only prices.
I used the dev tools Network tab to identify the request that handles the logon, and from it inferred the POST URL and the parameters.
The code looks like:
import requests

s = requests.session()
payload = {'logonId': 'email@email.com',
           'logonPassword': 'mypassword'}
# get this value by Googling "my user agent"
user_agent = {'User-Agent': 'myuseragent'}
url = 'https://www.costco.com/Logon'
response = s.post(url, headers=user_agent, data=payload)
print(response.status_code)
When I run this, it just runs and runs and never returns anything; I waited 5 minutes and it was still running.
What am I doing wrong?

Maybe you should try a GET request to pick up some cookies before making the POST request. If the POST still doesn't work, add a timeout so the script stops instead of hanging, and you know it isn't working:
r = requests.get(url, verify=False, timeout=10)
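For example, a minimal sketch of that idea using the URLs from the question (using /LogonForm as the page to GET first is an assumption; any page that sets the session cookies would do, and the timeout makes the script raise instead of hanging forever):

import requests

s = requests.Session()
s.headers.update({'User-Agent': 'Mozilla/5.0'})  # substitute your real browser UA string

# GET first so the server can set its session cookies
s.get('https://www.costco.com/LogonForm', timeout=10)

payload = {'logonId': 'email@email.com', 'logonPassword': 'mypassword'}
r = s.post('https://www.costco.com/Logon', data=payload, timeout=10)
print(r.status_code)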

This one is tough. Usually, in order to set the proper cookies, a GET request to the URL is required first. We can go directly to https://www.costco.com/LogonForm so long as we change the user agent from the default python-requests one. This is accomplished as follows:
import requests

agent = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/85.0.4183.102 Safari/537.36"
)
with requests.Session() as s:
    headers = {'user-agent': agent}
    s.headers.update(headers)
    logon = s.get('https://www.costco.com/LogonForm')
    # Save the cookies in a variable, explanation below
    cks = s.cookies
The logon GET request is successful, i.e. status code 200! Taking a look at cks:
print(sorted([c.name for c in cks]))
['C_LOC',
'CriteoSessionUserId',
'JSESSIONID',
'WC_ACTIVEPOINTER',
'WC_AUTHENTICATION_-1002',
'WC_GENERIC_ACTIVITYDATA',
'WC_PERSISTENT',
'WC_SESSION_ESTABLISHED',
'WC_USERACTIVITY_-1002',
'_abck',
'ak_bmsc',
'akaas_AS01',
'bm_sz',
'client-zip-short']
Then, using the Network tab of Chrome's inspector and clicking login, the following form data appears in the POST used to log in (place this below cks):
data = {'logonId': username,
        'logonPassword': password,
        'reLogonURL': 'LogonForm',
        'isPharmacy': 'false',
        'fromCheckout': '',
        'authToken': '-1002,5M9R2fZEDWOZ1d8MBwy40LOFIV0=',
        'URL': 'Lw=='}
login = s.post('https://www.costco.com/Logon', data=data, allow_redirects=True)
However, simply trying this makes the request just sit there and redirect infinitely.
Using Burp Suite, I stepped into the POST and found the request that is made when logging in via the browser. That POST carries many more cookies than the ones obtained in the initial GET request. Quite a few more, in fact:
# cookies is equal to the curl from burp, then converted curl to python req
sorted(cookies.keys())
['$JSESSIONID',
'AKA_A2',
'AMCVS_97B21CFE5329614E0A490D45%40AdobeOrg',
'AMCV_97B21CFE5329614E0A490D45%40AdobeOrg',
'C_LOC',
'CriteoSessionUserId',
'OptanonConsent',
'RT',
'WAREHOUSEDELIVERY_WHS',
'WC_ACTIVEPOINTER',
'WC_AUTHENTICATION_-1002',
'WC_GENERIC_ACTIVITYDATA',
'WC_PERSISTENT',
'WC_SESSION_ESTABLISHED',
'WC_USERACTIVITY_-1002',
'WRIgnore',
'WRUIDCD20200731',
'__CT_Data',
'_abck',
'_cs_c',
'_cs_cvars',
'_cs_id',
'_cs_s',
'_fbp',
'ajs_anonymous_id_2',
'ak_bmsc',
'akaas_AS01',
'at_check',
'bm_sz',
'client-zip-short',
'invCheckPostalCode',
'invCheckStateCode',
'mbox',
'rememberedLogonId',
's_cc',
's_sq',
'sto__count',
'sto__session']
Most of these look to be static, but because there are so many it's hard to tell which is which and what each is supposed to be. This is where I get stuck myself, and I am actually really curious how this would be accomplished. In some of the cookie data I can also see some sort of IBM Commerce information, so I am linking Prevent Encryption (Krypto) Of Url Paramaters in IBM Commerce Server 6, as it's the only other SO question pertaining even remotely to this.
Essentially, though, the steps would be to determine the proper cookies to pass for this POST (and then the proper cookies and info for the redirect!). I believe some of these are being set by JavaScript, since they are not in the GET response from the site. Sorry I can't be more help here.
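If you want to experiment along those lines, a hedged sketch: seed the session with the extra cookies copied out of Burp before retrying the POST (the names below are from the list above; the values are placeholders you would paste from your own logged-in browser session):

# Sketch only: seed the session cookie jar with values captured in Burp.
browser_cookies = {
    'AMCVS_97B21CFE5329614E0A490D45%40AdobeOrg': 'PASTE_VALUE_FROM_BURP',
    'OptanonConsent': 'PASTE_VALUE_FROM_BURP',
    'at_check': 'PASTE_VALUE_FROM_BURP',
}
for name, value in browser_cookies.items():
    s.cookies.set(name, value, domain='.costco.com')
login = s.post('https://www.costco.com/Logon', data=data, allow_redirects=True)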
If you absolutely need to log in, try using selenium, as it simulates a browser. Otherwise, if you just want to check whether an item is in stock, this guide uses requests and doesn't need a login: https://aryaboudaie.com/python/technical/educational/2020/07/05/using-python-to-buy-a-gift.html
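A hedged selenium sketch for the login itself (the element locators are assumptions; inspect the live page to confirm them):

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.costco.com/LogonForm')
driver.find_element_by_id('logonId').send_keys('email@email.com')    # assumed element id
driver.find_element_by_id('logonPassword').send_keys('mypassword')   # assumed element id
driver.find_element_by_css_selector("input[type='submit']").click()  # assumed selector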

Related

Python requests - Session not capturing response cookies

I'm not sure how else to describe this. I'm trying to log into a website using the requests library with Python, but it doesn't seem to capture all the cookies from when I log in, and subsequent requests to the site go back to the login page.
The code I'm using is as follows: (with redactions)
with requests.Session() as s:
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
Looking at the developer tools in Chrome, I can see the amember_nr cookie being set when I log in through the browser.
After checking r.cookies, it seems only PHPSESSID was captured; there's no sign of the amember_nr cookie. The value in PyCharm shows only:
{RequestsCookieJar: 1}<RequestsCookieJar[<Cookie PHPSESSID=kjlb0a33jm65o1sjh25ahb23j4 for .website.co.uk/>]>
Why does this code fail to save 'amember_nr' and is there any way to retrieve it?
SOLUTION:
It appears the only way I can get this code to work properly is with Selenium, selecting the elements on the page and automating the typing/clicking. The following code produces the desired result.
from seleniumrequests import Chrome

driver = Chrome()
driver.get('http://www.website.co.uk')
username = driver.find_element_by_xpath("//input[@name='amember_login']")
password = driver.find_element_by_xpath("//input[@name='amember_pass']")
username.send_keys("username")
password.send_keys("password")
driver.find_element_by_xpath("//input[@type='submit']").click()  # Page is logged in and all relevant cookies saved
You can try this:
with requests.Session() as s:
    s.get('https://www.website.co.uk/login')
    r = s.post('https://www.website.co.uk/login', data={
        'amember_login': 'username',
        'amember_password': 'password'
    })
The get request will set the required cookies.
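Also note that r.cookies only holds cookies set by the final response of that one request; the session jar accumulates everything, including cookies set during redirects. So it's worth checking s.cookies rather than r.cookies before concluding the cookie was never sent:

r = s.post('https://www.website.co.uk/login', data={
    'amember_login': 'username',
    'amember_password': 'password'
})
print(r.cookies)  # cookies set by the final response only
print(s.cookies)  # everything accumulated by the session, redirects included

(If amember_nr still isn't in s.cookies, it is presumably being set by JavaScript, which would explain why the Selenium approach above works.)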
FYI, I would use something like Burp Suite to capture ALL the data being sent to the server and sort out which headers etc. are required. Sometimes servers do referrer checking, set cookies via JavaScript or wonky scripting; I've even seen JavaScript obfuscation and blocking of user-agent strings not on a whitelist. It's likely the server is missing something in your headers before it will give you the cookie.
Also, you can have Python use Burp as a proxy so you can see exactly what gets sent to the server and what comes back:
https://github.com/freeload101/Python/blob/master/CS_HIDE/CS_HIDE.py (proxy support)
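For reference, pointing requests at a local Burp listener is just (assuming Burp's default proxy on 127.0.0.1:8080):

proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8080',
}
# verify=False because Burp intercepts the TLS connection with its own certificate
r = requests.get('https://www.website.co.uk/login', proxies=proxies, verify=False)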

Request not returning same data as browser

Trying to get some values from Duolingo using Python, but urllib gives me something different than what I see when navigating to the URL in my browser.
Navigating to the URL (https://www.duolingo.com/2017-06-30/users/215344344?fields=xpGoalMetToday) in the browser gives: {"xpGoalMetToday": false}.
However, trying it via the script below:
import urllib.request
url = 'http://www.duolingo.com/2017-06-30/users/215344344?fields=xpGoalMetToday'
user_agent = '[insert my local user agent copied from browser attempt]'
# header variable
headers = { 'User-Agent' : user_agent, "Cache-Control": "no-cache, max-age=0" }
# creating request
req = urllib.request.Request(url, None, headers)
print(urllib.request.urlopen(req).read())
returns just a blank {}.
As you can tell from the above, I've tried a couple of things: adding a user agent and cache control. I've even tried the requests module and adding authentication (didn't work).
Any ideas? Am I missing something?
Actually, when I open the link in the browser it shows me {} as well.
Maybe you have some kind of cookie set in your browser?
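If that's the case, one way to test is to copy the Cookie header from the browser's request (dev tools, Network tab) and replay it; the value below is a placeholder:

import urllib.request

url = 'https://www.duolingo.com/2017-06-30/users/215344344?fields=xpGoalMetToday'
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'PASTE_COOKIE_HEADER_FROM_BROWSER',  # placeholder
}
req = urllib.request.Request(url, None, headers)
print(urllib.request.urlopen(req).read())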

Using Python to request draftkings.com info that requires login?

I'm trying to get contest data from the url: "https://www.draftkings.com/contest/gamecenter/32947401"
If you go to this URL and aren't logged in, it'll just re-direct you to the lobby. If you're logged in, it'll actually show you the contest results.
Here are some things I tried:
-First, I used Chrome's dev networking tools to watch requests while I manually logged in.
-I then tried copying the cookie that I thought contained the authentication info; it was of the form:
'ajs_anonymous_id=%123123123123123, mlc=true; optimizelyEndUserId'
-I then stored that cookie as an environment variable and ran this code:
HEADERS = {'cookie': os.environ['MY_COOKIE']}
requests.get(draft_kings_url, headers=HEADERS)
No luck, this just gave me the lobby.
I then tried requests' built-in:
HTTPBasicAuth
HTTPDigestAuth
No luck here either.
I'm no Python expert by far, and I've pretty much exhausted what I know and the search results I've found. Any ideas?
The tool that you want is selenium. Something along the lines of:
from selenium import webdriver

browser = webdriver.Firefox()
browser.get(r"https://www.draftkings.com/contest/gamecenter/32947401")
username = browser.find_element_by_id("user")
username.send_keys("username")
password = browser.find_element_by_id("password")
password.send_keys("top_secret")
login = browser.find_element_by_name("login")
login.click()
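If you then want the contest data in Python rather than in the browser window, one sketch is to copy the logged-in browser's cookies into a requests session afterwards:

import requests

s = requests.Session()
# browser is the logged-in webdriver instance from above
for c in browser.get_cookies():
    s.cookies.set(c['name'], c['value'], domain=c['domain'])
r = s.get("https://www.draftkings.com/contest/gamecenter/32947401")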
Use Fiddler to see the exact request they make when you try to log in. Then use the Session class in the requests package.
import requests
session = requests.Session()
session.get('YOUR_URL_LOGIN_PAGE')
this will save all the cookies from your URL in your session variable (like when you use a browser).
Then make a post request to the login url with appropriate data.
You don't have to manually pass cookie data, as it is auto-generated when you first visit a website. However, you can set some headers explicitly, like the User-Agent, via:
session.headers.update({'header_name': 'header_value'})
HTTPBasicAuth and HTTPDigestAuth may not work, depending on the website.
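Putting that together, the overall shape would be something like this (the login URL and field names are placeholders; copy the real ones out of Fiddler):

import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})
session.get('https://www.draftkings.com/')  # picks up the anonymous session cookies
# Placeholder URL and fields: use the exact request captured in Fiddler
session.post('LOGIN_URL_FROM_FIDDLER', data={'login': 'username', 'password': 'password'})
r = session.get('https://www.draftkings.com/contest/gamecenter/32947401')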

Instagram API Requests Systematically Refused

I have been using the client id and auth token of the Instagram API for a while to make requests with urllib and json. Since a few days ago, any client id/auth token I create for an Instagram account systematically returns "HTTP Error 400: BAD REQUEST" when I make a request; whether it is a like, follow, or unfollow, it always returns this error. The script is Python 2.7 based.
It was working great before, and the keys created before this happened still work great! I tried to create new accounts and new keys from the USA with proxies, but the error persists.
Here is the relevant part of the code:
# Python 2.7; auth_token and client_id are defined elsewhere in the script
import json
import urllib
import urllib2

user_agent = 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7'
headers = {'User-Agent': user_agent,
           'Content-type': 'application/x-www-form-urlencoded'}

def likePicture(pictureId):
    liked = 0
    try:
        urlLike = "https://api.instagram.com/v1/media/%s/likes"
        values = {'access_token': auth_token,
                  'client_id': client_id}
        newLike = urlLike % (pictureId)
        data = urllib.urlencode(values)
        req = urllib2.Request(newLike, data, headers)
        response = urllib2.urlopen(req)
        result = response.read()
        dataObj = json.loads(result)
        liked = 1
    except Exception, e:
        print e
    return liked
The print e systematically gives me "HTTP Error 400: BAD REQUEST", even when the key and the account are brand new. And this code works like a charm with older keys (from a week ago).
Any idea or suggestion? Maybe I am somehow blocked by Instagram because I created too many client ids/auth tokens? If that is the case, how can I resolve the situation? (I already tried using different proxies, unsuccessfully, so how would they track that?) If someone finds a solution to this problem I will be infinitely grateful!
Cheers, Kevin
First of all:
You may also receive responses with an HTTP response code of 400 (Bad
Request) if we detect spammy behavior by a person using your app.
These errors are unrelated to rate limiting.
Have you read the "Limits" section of the API docs?
When you call Instagram API methods, the response includes two HTTP headers:
X-Ratelimit-Remaining: the remaining number of calls available to your app within the 1-hour window
X-Ratelimit-Limit: the total number of calls allowed within the 1-hour window
So check whether you've reached the limit.
Keep in mind that multiple calls in a short time window are considered abusive.
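With requests, for instance, the headers can be read off any API response (a sketch; the endpoint here is just an example, and auth_token is the same access token as in the question's code):

import requests

r = requests.get('https://api.instagram.com/v1/users/self/media/recent',
                 params={'access_token': auth_token})
print(r.headers.get('X-Ratelimit-Remaining'))
print(r.headers.get('X-Ratelimit-Limit'))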
Read more:
Limits
P.S.: It's not necessary to forge headers in order to make API calls; this isn't web scraping!

Python 10054 error when trying to log in to airline website using Requests Library

I'm learning Python, and as my first project I want to log in to several airline websites and scrape my frequent flyer mile info. I have successfully been able to log in and scrape American Airlines and United, but I am unable to do it on Delta, US Airways, and British Airways.
The methodology I have been using is watching network traffic in Fiddler2, Chrome, or Firebug. Wireshark seems too complicated at the moment.
For my script to work with the American and United scraping, all I did was watch the traffic in Fiddler2, copy the FORM DATA and REQUEST HEADER DATA, and then use the third-party Requests library to send the request. Very simple. Very easy. The other airline websites are giving me a lot of trouble.
Let's talk about British Airways specifically. Below are pictures of the FORM DATA and REQUEST HEADER DATA that I took from Fiddler when I logged into my dummy BA account. I have also included the test script I have been using. I wrote two different versions: one using the Requests library and one using urllib. They both produce the same error, but I thought I would provide both to make it easier for somebody to help me if they don't have the Requests library installed. Use whichever you like.
Basically, when I make a requests.post I am getting a
10054, 'An existing connection was forcibly closed by the remote host' error.
I have no idea what is going on. I've been searching for 3 days and have come up with nothing. I hope somebody can help me. The code below uses my dummy BA account info, username: python_noob, password: p4ssword. Feel free to use it for testing.
Here are some pictures of the Fiddler2 data:
http://i.imgur.com/iOL91.jpg?1
http://i.imgur.com/meLHL.jpg?1
import requests
import urllib.request
def get_BA_login_using_requests():
    url_loginSubmit1 = 'https://www.britishairways.com/travel/loginr/public/en_us'
    url_viewaccount1 = 'https://www.britishairways.com/travel/viewaccount/public/en_us?eId=106011'
    url_viewaccount2 = 'https://www.britishairways.com/travel/viewaccount/execclub/_gf/en_us?eId=106011'
    form_data = {
        'Directional_Login': '',
        'eId': '109001',
        'password': 'p4ssword',
        'membershipNumber': 'python_noob',
    }
    request_headers = {
        'Cache-Control': 'max-age=0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11',
        'Cookie': 'BIGipServerba.com-port80=997762723.20480.0000; v1st=EDAB42A278BE913B; BASessionA=kDtBQWGclJymXtlsTXyYtykDLLsy3KQKvd3wMrbygd7JZZPJfJz2!-1893405604!clx42al01-wl01.baplc.com!7001!-1!-407095676!clx43al01-wl01.baplc.com!7001!-1; BIGipServerba.com-port81=997762723.20736.0000; BA_COUNTRY_CHOICE_COOKIE=us; Allow_BA_Cookies=accepted; BA_COUNTRY_CHOICE_COOKIE=US; opvsreferrer=functional/home/home_us.jsp; realreferrer=; __utma=28787695.2144676753.1356203603.1356203603.1356203603.1; __utmb=28787695.1.10.1356203603; __utmc=28787695; __utmz=28787695.1356203603.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); fsr.s={"v":-2,"rid":"d464cf7-82608645-1f31-3926-49807","ru":"http://www.britishairways.com/travel/globalgateway.jsp/global/public/en_","r":"www.britishairways.com","st":"","to":3,"c":"http://www.britishairways.com/travel/home/public/en_us","pv":1,"lc":{"d0":{"v":1,"s":false}},"cd":0}',
        'Content-Length': '78',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Origin': 'https://www.britishairways.com',
        'Referer': 'https://www.britishairways.com/travel/loginr/public/en_us',
        'Connection': 'keep-alive',
        'Host': 'www.britishairways.com',
    }
    print('Trying to login to British Airways using Requests library (takes about 1 minute for error to occur)')
    try:
        r1 = requests.post(url_loginSubmit1, data=form_data, headers=request_headers)
        print('it worked')
    except Exception as e:
        msg = "An exception of type {0} occurred, these were the arguments:\n{1!r}"
        print(msg.format(type(e).__name__, e.args))
    return
def get_BA_login_using_urllib():
    """Tries to request the URL. Returns True if the request was successful; False otherwise.
    https://www.britishairways.com/travel/loginr/public/en_us
    response -- After the function has finished, will possibly contain the response to the request.
    """
    response = None
    print('Trying to login to British Airways using urllib library (takes about 1 minute for error to occur)')
    # Create request to URL.
    req = urllib.request.Request("https://www.britishairways.com/travel/loginr/public/en_us")
    # Set request headers.
    req.add_header("Connection", "keep-alive")
    req.add_header("Cache-Control", "max-age=0")
    req.add_header("Origin", "https://www.britishairways.com")
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    req.add_header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
    req.add_header("Referer", "https://www.britishairways.com/travel/home/public/en_us")
    req.add_header("Accept-Encoding", "gzip,deflate,sdch")
    req.add_header("Accept-Language", "en-US,en;q=0.8")
    req.add_header("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3")
    req.add_header("Cookie", 'BIGipServerba.com-port80=997762723.20480.0000; v1st=EDAB42A278BE913B; BIGipServerba.com-port81=997762723.20736.0000; BA_COUNTRY_CHOICE_COOKIE=us; Allow_BA_Cookies=accepted; BA_COUNTRY_CHOICE_COOKIE=US; BAAUTHKEY=BA4760A2434L; BA_ENROLMENT_APPLICATION_COOKIE=1356219482491AT; BASessionA=wKG4QWGSTggNGnsLTnrgQnMxGMyzvspGLCYpjdSZgv2pSgYN1YRn!-1893405604!clx42al01-wl01.baplc.com!7001!-1!-407095676!clx43al01-wl01.baplc.com!7001!-1; HOME_AD_DISPLAY=1; previousCountryInfo=us; opvsreferrer=functional/home/home_us.jsp; realreferrer=; __utma=28787695.2144676753.1356203603.1356216924.1356219076.6; __utmb=28787695.15.10.1356219076; __utmc=28787695; __utmz=28787695.1356203603.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); fsr.s={"v":-2,"rid":"d464cf7-82608645-1f31-3926-49807","ru":"http://www.britishairways.com/travel/globalgateway.jsp/global/public/en_","r":"www.britishairways.com","st":"","to":5,"c":"https://www.britishairways.com/travel/home/public/en_us","pv":31,"lc":{"d0":{"v":31,"s":true}},"cd":0,"f":1356219889982,"sd":0}')
    # Set request body.
    body = b"Directional_Login=&eId=109001&password=p4ssword&membershipNumber=python_noob"
    # Get response to request.
    try:
        response = urllib.request.urlopen(req, body)
        print('it worked')
    except Exception as e:
        msg = "An exception of type {0} occurred, these were the arguments:\n{1!r}"
        print(msg.format(type(e).__name__, e.args))
    return
def main():
    get_BA_login_using_urllib()
    print()
    get_BA_login_using_requests()
    return

main()
Offhand, I'd say you managed to create a malformed or illegal request, and the server (or even a proxy) on the other side simply refuses to process it.
Do use the requests library. It's excellent. urllib is quite outdated (and, well, not fun to use at all).
Get rid of nearly all of the custom headers. In particular Content-Length, Keep-Alive, Connection and Cookie. The first three you should let the requests library take care of, as they're part of the HTTP 1.1 protocol. With regards to the Cookie: that, too, will be handled by the requests library, depending on how you use sessions. (You might want to consult the documentation there.) Without having any previous cookies, you'll probably get something like a 401 when you try to access the site, or you'll be (transparently) redirected to a login-page. Doing the login will set the correct cookies, after which you should be able to re-try the original request.
If you use a dict for the post-data, you won't need the Content-Type header either. You might want to experiment with using unicode-values in said dict. I've found that that sometimes made a difference.
In other words: try to remove as much as you can, and then build it up from there. Doing things like this typically should not cost more than a handful of lines. Now, scraping a web page, that's another matter: try 'beautifulsoup' for that.
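As a concrete starting point, the stripped-down version of the BA login would look something like this (only the form fields from the question; cookies, Content-Length and the rest left to the library):

import requests

with requests.Session() as s:
    s.headers.update({'User-Agent': 'Mozilla/5.0'})
    # let the session collect the site's cookies first
    s.get('https://www.britishairways.com/travel/loginr/public/en_us')
    r = s.post('https://www.britishairways.com/travel/loginr/public/en_us',
               data={'Directional_Login': '', 'eId': '109001',
                     'password': 'p4ssword', 'membershipNumber': 'python_noob'})
    print(r.status_code)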
P.S.: Don't ever post cookie-data on public forums: they might contain personal or otherwise sensitive data that shady characters might be able to abuse.
It seems there is a bug in the Windows versions of Python 3.3 that is the cause of my problem. I used the answer from here:
HTTPS request results in reset connection in Windows with Python 3
to make progress with the urllib version of my script. I would like to use Requests, so I need to figure out how to do the SSL downgrade workaround with that module. I will make that a separate thread; if anybody has an answer to that, you can post it here as well. Thanks.
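For anyone hitting the same thing with Requests: the usual workaround is a transport adapter that pins the SSL/TLS version. A sketch (the protocol constant to pin is an assumption; pick whichever the server actually accepts):

import ssl

import requests
from requests.adapters import HTTPAdapter
from urllib3.poolmanager import PoolManager

class ForcedTLSAdapter(HTTPAdapter):
    # Pin the handshake to one protocol version instead of letting the
    # client negotiate, which appears to be what triggers the reset.
    def init_poolmanager(self, connections, maxsize, block=False, **kwargs):
        self.poolmanager = PoolManager(num_pools=connections, maxsize=maxsize,
                                       block=block, ssl_version=ssl.PROTOCOL_TLSv1,
                                       **kwargs)

s = requests.Session()
s.mount('https://', ForcedTLSAdapter())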
